Application Practice | Building Shuhai Supply Chain's Data Middle Platform on Apache Doris

2022-07-04 21:38:00 SelectDB

Reading guide: Shuhai Supply Chain is a catering supply chain service company that integrates sales, R&D, procurement, production, quality assurance, warehousing, transportation, information, and finance. Because its business is relatively complex, at the end of 2020 it upgraded its core architecture with Apache Doris, and in 2021 it began building a data middle platform with Apache Doris at its core. This article covers data access, data service orchestration, data security, and Doris applications.

Author: Wang Yongxu, head of the big data team at Shuhai Supply Chain

Business background

Shuhai Supply Chain is a catering supply chain service company that integrates sales, R&D, procurement, production, quality assurance, warehousing, transportation, information, and finance. It provides catering chains and retail customers with end-to-end food supply chain solutions. Because its business is relatively complex, at the end of 2020 it upgraded its core architecture with Apache Doris, and in 2021 it began building a data middle platform with Apache Doris at its core.

Before adopting Doris, we used the CDH data platform. It involved many components, the pipeline was too long, development and maintenance costs were high, and in the end we still had no good OLAP system.

Because our historical data burden was relatively light, after researching and testing Apache Doris we decided to build our data platform with Apache Doris at its core. It offers the following advantages:

  • Supports both high-concurrency point queries and high-throughput ad-hoc queries.
  • Supports both offline batch import and real-time data import.
  • Supports both detail queries and aggregate queries.
  • Compatible with the MySQL protocol and standard SQL.
  • Supports Rollup tables and intelligent query routing to Rollup tables.
  • Supports good multi-table Join strategies and flexible expression queries.
  • Supports online schema change.
  • Supports two-level partitioning by Range and Hash.
  • High availability: tolerates the failure of some nodes.
  • Simple operations: deployment, maintenance, and upgrades are straightforward, with no dependency on external components.

The architecture is as follows :

img

Because metadata, data services, ingestion data quality, and data lineage have been introduced previously, this article focuses on data access, data service orchestration, data security, and Doris applications.

Data access

Data access is an important part of data development. We developed a data access system that is operated from a web page and brings data into Doris with zero code. Its main functions are:

  • Subscribe to MySQL Binlog and load it into Doris tables.
  • Subscribe to Kafka topics and load them into Doris tables.
  • Dynamic data cleaning: code written on the page performs the transformation before data is loaded.
  • Access task merging: to save resources, one task can ingest sharded databases and tables, or multiple topics.
  • Dynamic data quality checks: configure field-level quality rules to validate incoming data.
  • Encryption on load: during ingestion, sensitive data can be encrypted before it enters Doris tables.
  • Error data management: data that failed because of network problems, data errors, or other causes can be reloaded from the page.
  • Data access pipeline monitoring: for example, error data monitoring, anomaly monitoring for the data production link and the data consumption link, per-task data access trend charts, and cluster-wide data access trend charts.
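The cleaning and quality-check steps in the list above can be pictured with a minimal Python sketch. It is an illustration only, not the access system's code: the rule format, the field names (`order_id`, `city`, `amount`), and the functions `clean` and `split_by_quality` are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical field-quality rules: field name -> predicate the value must satisfy.
QUALITY_RULES: Dict[str, Callable[[Any], bool]] = {
    "order_id": lambda v: isinstance(v, int) and v > 0,
    "city": lambda v: isinstance(v, str) and v != "",
    "amount": lambda v: isinstance(v, (int, float)) and v >= 0,
}


def clean(row: Row) -> Row:
    """Hypothetical cleaning step: trim surrounding whitespace from string fields."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def split_by_quality(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Clean every row, then separate rows that pass all rules from rows that fail."""
    good: List[Row] = []
    bad: List[Row] = []
    for row in rows:
        cleaned = clean(row)
        ok = all(
            name in cleaned and rule(cleaned[name])
            for name, rule in QUALITY_RULES.items()
        )
        (good if ok else bad).append(cleaned)
    return good, bad


rows = [
    {"order_id": 1, "city": " Beijing ", "amount": 120.5},
    {"order_id": 2, "city": "", "amount": 80},       # fails: empty city
    {"order_id": 3, "city": "Xi'an", "amount": -5},  # fails: negative amount
]
good, bad = split_by_quality(rows)
print(len(good), "rows ready to load;", len(bad), "rows held for error management")
```

Keeping the rejected rows instead of dropping them matches the idea of error data management: they can be inspected and reloaded later.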

Data access task list:

img

Data access task configuration:

img

Dynamic code processing for data access:

img

Data service orchestration

The data service is the system through which business systems obtain data via APIs. On its pages, APIs can be created, edited, developed and debugged online, rate-limited, and brought online or taken offline. Because APIs may depend on one another through business logic, and that logic cannot be configured within a single API, we developed a data service orchestration feature. By dragging and dropping, APIs can be chained and pass data to one another, while what is exposed externally is still a single API.

For example: users and the cities they belong to are stored in a MySQL data source, while the sales of each city are stored in a Doris data source. The API to be developed must let users view only the sales of their own city. This can be implemented with service orchestration: Node1 looks up the city by user ID, and Node2 takes the output of the upstream node (the city) as input and returns the city's sales as the API output.

img
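The two-node example can be sketched in a few lines of Python. This is only an illustration, not the data service's actual implementation: the in-memory dictionaries stand in for the MySQL and Doris data sources, and every name (`run_pipeline`, `node1_city_by_user`, `node2_sales_by_city`, the sample users and sales figures) is hypothetical.

```python
from typing import Any, Callable, Dict, List

# Stand-ins for the two data sources; the contents are made up for illustration.
MYSQL_USER_CITY = {"u001": "Beijing", "u002": "Xi'an"}
DORIS_CITY_SALES = {"Beijing": 52000, "Xi'an": 31000}

Context = Dict[str, Any]
Node = Callable[[Context], Context]


def node1_city_by_user(ctx: Context) -> Context:
    """Node1: look up the user's city (a real node would query MySQL)."""
    return {"city": MYSQL_USER_CITY[ctx["user_id"]]}


def node2_sales_by_city(ctx: Context) -> Context:
    """Node2: take the upstream city and return its sales (a real node would query Doris)."""
    return {"city": ctx["city"], "sales": DORIS_CITY_SALES[ctx["city"]]}


def run_pipeline(nodes: List[Node], request_params: Context) -> Context:
    """Run nodes in order; each node sees the request parameters plus upstream outputs."""
    ctx = dict(request_params)
    output: Context = {}
    for node in nodes:
        output = node(ctx)
        ctx.update(output)
    return output  # only the last node's output is exposed as the API response


result = run_pipeline([node1_city_by_user, node2_sales_by_city], {"user_id": "u001"})
print(result)
```

The caller still sees one API: it passes a user ID and receives the sales of that user's city, without knowing that two data sources were involved.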

The input and output of each node can be customized. Input can come from API request parameters, from the output of an upstream node, or from global parameters such as the user ID and paging parameters.

img

img

Data security

Data security is a big topic that touches every area. Here we briefly introduce data encryption, data permissions, and data warehouse backup.

Data encryption on load

During data access, you can choose which fields to encrypt. By the time the data lands in a Doris table it is already encrypted, and later data analysis can decrypt it with the key.

Data access encryption configuration:

img

Data permissions

A wide range of people in the company view reports. For the same data model, sales staff, operations staff, factory staff, managers, and others in each city and region must see different data, so row permissions and column permissions need precise control. We therefore developed a data permission system on top of Doris. Data permissions are set up entirely through configuration and can be precise down to rows and columns. The BI reporting system, as an accessing party, integrates the data permission client and implements the corresponding abstract methods.

Example 1: for a given report model, Zhang San can only view data for North China or Northwest China; Li Si and Wang Ming can only view data for Xi'an or Beijing where sales exceed 10,000; Zhang Si and Zhang Wu are unrestricted; everyone else has no access.
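One way to picture Example 1 is as a per-user row predicate applied to the report rows. The sketch below assumes a hypothetical in-memory rule table (`ROW_RULES`) and hypothetical column names (`region`, `city`, `sales`); the article does not describe how the permission system stores or evaluates its rules.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


def xian_or_beijing_over_10000(row: Row) -> bool:
    """Rule shared by Li Si and Wang Ming in Example 1."""
    return row["city"] in {"Xi'an", "Beijing"} and row["sales"] > 10000


# Hypothetical encoding of Example 1. A user missing from the table has no access.
ROW_RULES: Dict[str, Predicate] = {
    "Zhang San": lambda r: r["region"] in {"North China", "Northwest China"},
    "Li Si": xian_or_beijing_over_10000,
    "Wang Ming": xian_or_beijing_over_10000,
    "Zhang Si": lambda r: True,  # unrestricted
    "Zhang Wu": lambda r: True,  # unrestricted
}


def visible_rows(user: str, rows: List[Row]) -> List[Row]:
    """Return only the rows the user's rule allows; unknown users see nothing."""
    rule = ROW_RULES.get(user)
    if rule is None:
        return []
    return [r for r in rows if rule(r)]


rows = [
    {"region": "North China", "city": "Beijing", "sales": 15000},
    {"region": "Northwest China", "city": "Xi'an", "sales": 8000},
    {"region": "East China", "city": "Shanghai", "sales": 20000},
]
for user in ["Zhang San", "Li Si", "Zhang Si", "Someone Else"]:
    print(user, [r["city"] for r in visible_rows(user, rows)])
```

Denying by default (no rule means no rows) mirrors the "others have no access" clause of the example.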

Row-level permission rules for a model:

img

Editing row-level authorization rules:

img

img

Example 2: everyone can view the report data, but each person can only view data for their own city where the amount is greater than 200 or less than 100.
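Example 2 combines conditions with AND and OR. As a rough sketch of how such a rule tree could be turned into a query filter, the Python below renders nested groups as a parameterized WHERE fragment. The `Condition` and `Group` classes, the column names, and the use of `%s` placeholders (the style used by common Python MySQL drivers) are assumptions for illustration; the user's own city is hard-coded as "Beijing" where a real system would look it up.

```python
from dataclasses import dataclass
from typing import List, Union

ALLOWED_OPS = {"=", ">", "<", ">=", "<="}


@dataclass
class Condition:
    column: str
    op: str  # must be one of ALLOWED_OPS
    value: Union[int, float, str]


@dataclass
class Group:
    relation: str  # "AND" or "OR"
    children: List[Union["Condition", "Group"]]


def to_where(node: Union[Condition, Group], params: List[object]) -> str:
    """Render a rule tree as a WHERE fragment, collecting values into params."""
    if isinstance(node, Condition):
        if node.op not in ALLOWED_OPS or not node.column.isidentifier():
            raise ValueError(f"unsupported condition: {node}")
        params.append(node.value)  # values travel separately, never inlined into SQL
        return f"{node.column} {node.op} %s"
    if node.relation not in {"AND", "OR"}:
        raise ValueError(f"unsupported relation: {node.relation}")
    parts = [to_where(child, params) for child in node.children]
    return "(" + f" {node.relation} ".join(parts) + ")"


# Example 2: own city AND (amount > 200 OR amount < 100).
rule = Group("AND", [
    Condition("city", "=", "Beijing"),
    Group("OR", [Condition("amount", ">", 200), Condition("amount", "<", 100)]),
])
params: List[object] = []
where = to_where(rule, params)
print(where, params)
```

Representing the rule as a tree is what allows conditions and relationships to be combined freely, since any group can contain further groups.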

Rule conditions and rule relationships can be combined freely:

img

Personnel tag management:

img

img

Example 3: column permission rules. For each user you can set rules such as forbidding viewing of a column or masking its data.

Column-level permission configuration:

img

img
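The two kinds of column rule from Example 3 (forbid viewing, mask data) can be illustrated with a short Python sketch. The rule table `COLUMN_RULES`, the action names `"deny"` and `"mask"`, the column names, and the masking format are all hypothetical choices made for this illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical column rules per user: "deny" hides the column, "mask" desensitizes it.
COLUMN_RULES: Dict[str, Dict[str, str]] = {
    "Li Si": {"cost": "deny", "phone": "mask"},
}


def mask(value: str) -> str:
    """Keep the first 3 and last 4 characters and replace the rest with '*'."""
    if len(value) <= 7:
        return "*" * len(value)
    return value[:3] + "*" * (len(value) - 7) + value[-4:]


def apply_column_rules(user: str, rows: List[Row]) -> List[Row]:
    """Drop denied columns and mask sensitive ones for the given user."""
    rules = COLUMN_RULES.get(user, {})
    result: List[Row] = []
    for row in rows:
        out: Row = {}
        for column, value in row.items():
            action = rules.get(column)
            if action == "deny":
                continue  # the column is not returned at all
            out[column] = mask(str(value)) if action == "mask" else value
        result.append(out)
    return result


rows = [{"city": "Beijing", "phone": "13812345678", "cost": 42.0}]
print(apply_column_rules("Li Si", rows))
```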

Data warehouse backup

We use Doris as the core of storage and computing. Doris already stores data in multiple replicas, but for disaster recovery we still back up the core ingested data to HDFS. We therefore developed a data warehouse backup system that backs up Doris tables to HDFS on a schedule, either in full or by partition.

Backup schedule configuration:

img

Backup scheduled task list:

img

Doris applications

We use Doris to carry the computation and storage for data analysis. There is also another scenario: data in the business MySQL databases keeps growing, and large volumes of historical data hurt online performance, yet the data cannot simply be deleted because historical data is still queried occasionally. So we built a business historical data archiving system on Doris. Historical data that will no longer change is archived incrementally on a schedule and served for queries through the data service system. Archived data is pushed to the business side, which verifies it and then deletes the historical data.
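The article does not say how each incremental run picks its rows. One common approach, shown here purely as an assumption, is a watermark: each run archives rows newer than the last archived date but older than a retention window. The function `select_for_archive`, the `order_date` field, and the 90-day window are hypothetical.

```python
from datetime import date, timedelta
from typing import Any, Dict, List

Row = Dict[str, Any]


def select_for_archive(
    rows: List[Row], last_archived: date, today: date, keep_days: int
) -> List[Row]:
    """Pick rows after the previous watermark that have aged past the retention window."""
    cutoff = today - timedelta(days=keep_days)
    return [r for r in rows if last_archived < r["order_date"] <= cutoff]


rows = [
    {"id": 1, "order_date": date(2022, 2, 15)},  # archived by an earlier run
    {"id": 2, "order_date": date(2022, 3, 20)},
    {"id": 3, "order_date": date(2022, 4, 1)},
    {"id": 4, "order_date": date(2022, 6, 1)},   # still inside the retention window
]
batch = select_for_archive(
    rows, last_archived=date(2022, 3, 1), today=date(2022, 7, 4), keep_days=90
)
print([r["id"] for r in batch])
```

After a batch is written and verified, the watermark would advance to the newest archived date so the next run does not repeat it.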

Archive plan list:

img

Archive plan configuration:

img

Data push plan configuration:

img

Benefits

Doris, as the core of our data platform, now supports the data query and data analysis needs of dozens of the company's business systems. It delivers excellent query performance for BI analysis and for each business system, and it greatly reduces the cost of data platform maintenance, data development, and data middle platform construction.

  • Real-time data access is stable and reliable: through Stream Load, thousands of tables are ingested in real time, and the total volume ingested each day is on the order of 100 million records.
  • Supports high-concurrency, high-performance online analytical queries: Doris serves online analysis queries in the millions every day, most SQL completes in milliseconds, slow SQL still has plenty of room for optimization, and Doris automatically optimizes queries in some scenarios.
  • By querying the raw ingested tables directly and by creating materialized views and indexes, it supports many real-time query needs with low latency and high concurrency. Multi-table Join performance is also excellent.
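Stream Load, mentioned in the first point above, is an HTTP interface: a client sends a PUT request to `/api/{db}/{table}/_stream_load` with the data as the body and load options as headers. The Python sketch below only builds such a request and does not send it. The host, database, table, and credentials are made-up placeholders, port 8030 is assumed to be the FE HTTP port, and the helper `build_stream_load_request` is hypothetical.

```python
import base64
import json
import uuid
from typing import Any, Dict, List
from urllib.request import Request


def build_stream_load_request(
    fe_host: str,
    http_port: int,
    db: str,
    table: str,
    user: str,
    password: str,
    rows: List[Dict[str, Any]],
) -> Request:
    """Build (but do not send) a Stream Load PUT request carrying rows as a JSON array."""
    url = f"http://{fe_host}:{http_port}/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",                # Stream Load clients typically send this
        "label": f"{table}_{uuid.uuid4().hex}",  # a unique label identifies this load job
        "format": "json",
        "strip_outer_array": "true",             # the body is a JSON array of row objects
    }
    body = json.dumps(rows).encode("utf-8")
    return Request(url, data=body, headers=headers, method="PUT")


# Placeholder values for illustration only.
req = build_stream_load_request(
    "fe.example.com", 8030, "demo_db", "orders", "loader", "secret",
    [{"order_id": 1, "city": "Beijing", "amount": 120.5}],
)
print(req.get_method(), req.full_url)
```

Actually sending the request needs an HTTP client that follows the redirect from the FE to a BE node, which is why the sketch stops at building it.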

Other:

  • Doris has a simple overall architecture and very low operations cost, and it supports online rolling upgrades, which frees people to focus on data middle platform construction and business development.
  • Doris is highly compatible with the MySQL protocol and supports interactive query analysis, providing an efficient data development experience.
  • High availability: data is partitioned and stored in multiple replicas, so the overall service does not become unavailable when some nodes fail.
  • Broad ecosystem compatibility: the community provides plugins that let Flink, DataX, and other big data tools interact with Doris, and importing and exporting data through Broker is simple and fast.
  • The community is active: Doris functionality and performance keep expanding and improving, and when you run into problems you can get attentive help from the community.

Join the community

We welcome more people who love open source to join the Apache Doris community and take part in building it. Besides submitting PRs or issues on GitHub, you are also welcome to take part in the community's day-to-day work, for example:

Respond to the community's call for articles with technical analyses, application practice write-ups, and similar pieces; take part as a speaker in Doris community online and offline events; actively answer questions from Doris community users.

Finally, we welcome more open source technology enthusiasts to join the Apache Doris community, grow together, and build the community ecosystem.

SelectDB is an open source technology company committed to providing the Apache Doris community with a team of full-time engineers, product managers, and support engineers, helping the open source community ecosystem flourish and creating an international industry standard in the field of real-time analytical databases. SelectDB, a new-generation cloud-native real-time data warehouse developed on Apache Doris, runs on multiple clouds and gives users and customers out-of-the-box capability.

Related links:

SelectDB official website:

https://selectdb.com (We Are Coming Soon)

Apache Doris official website:

http://doris.apache.org

Apache Doris GitHub:

https://github.com/apache/doris

Apache Doris developer mailing list:

[email protected]