What is Mycat? A quick introduction

2022-06-11 00:44:00 A music loving programmer


Preface

This article only introduces Mycat and gives an overview of what it is; there are no hands-on steps.


I. What is Mycat?

1. What is Mycat? By definition and classification, it is an open-source distributed database system: a server that implements the MySQL protocol. Front-end users can treat it as a database proxy and access it with MySQL client tools and the command line. Its back end can talk to multiple MySQL servers over the MySQL native protocol, or to most mainstream database servers over JDBC. Its core function is sharding databases and tables: splitting one large table horizontally into N smaller tables that are stored on back-end MySQL servers or in other databases.

2. In its current version, Mycat is no longer just a MySQL proxy. Its back end supports mainstream databases such as MySQL, SQL Server, Oracle, DB2, and PostgreSQL, as well as newer NoSQL stores such as MongoDB, and more storage types are planned. To the end user, whatever the storage method, everything in Mycat is a traditional database table that can be operated on with standard SQL statements. For front-end business systems this greatly reduces development difficulty and speeds up development. During the testing phase, a table can be defined on any storage method Mycat supports, such as a MySQL MyISAM table, a memory table, MongoDB, LevelDB, or MemSQL, which is billed as the world's fastest in-memory database. Just imagine: the user table lives in MemSQL; data that is read far more often than it is written, such as order snapshots, lives in InnoDB; some log data lives in MongoDB; and an Oracle table can even be joined with a MySQL table in one query. Breathtaking, isn't it? In the future, Mycat should also be able to feed computed and analyzed data into Hadoop automatically, and Mycat combined with a Storm/Spark Stream engine should be able to do large-scale data analysis. By now you probably get it. What is Mycat? Mycat is BigSQL: Big Data on SQL Database.

3. Many readers may still be confused after the description above and unsure what Mycat actually is. The next section explains it from the viewpoint of different roles.

II. Understanding Mycat from another angle

1. For a DBA, Mycat can be understood like this:

Mycat is the MySQL server, and the MySQL servers connected behind Mycat are like MySQL storage engines such as InnoDB and MyISAM. Mycat therefore stores no data itself. The data lives on the back-end MySQL servers, so data reliability and transactions are guaranteed by MySQL. In short, Mycat is MySQL's best companion, one that gives MySQL the ability to compete with Oracle.

2. For a software engineer, Mycat can be understood like this:

Mycat is a near-equivalent of a MySQL database server. You connect to Mycat the same way you connect to MySQL, except for the port: Mycat listens on 8066 by default rather than MySQL's 3306, so the port must be included in the connection string. In most cases you can keep using the object-mapping framework you already know with Mycat. For sharded tables, however, it is advisable to stick to basic SQL statements, because they give the best performance, especially with tens of millions or even tens of billions of records.
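As a small illustration of the port difference, the sketch below builds a JDBC-style connection URL. The host names and the schema names `TESTDB` and `db1` are hypothetical placeholders; only the default ports 8066 and 3306 come from the text above.

```python
# Default ports mentioned in the text: Mycat listens on 8066, MySQL on 3306.
MYCAT_DEFAULT_PORT = 8066
MYSQL_DEFAULT_PORT = 3306


def jdbc_url(host: str, schema: str, port: int = MYCAT_DEFAULT_PORT) -> str:
    """Build a MySQL-protocol JDBC URL with the port written explicitly."""
    return f"jdbc:mysql://{host}:{port}/{schema}"


# Hypothetical host and schema names, for illustration only.
print(jdbc_url("mycat-host", "TESTDB"))
print(jdbc_url("mysql-host", "db1", MYSQL_DEFAULT_PORT))
```

Apart from the port, the two URLs have the same shape, which is why existing MySQL clients and drivers can usually be pointed at Mycat without code changes.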

3. For an architect, Mycat can be understood like this:

Mycat is a powerful piece of database middleware. It can be used not only for read/write splitting, database and table sharding, and disaster-recovery backup, but also for multi-tenant application development and cloud platform infrastructure, giving your architecture strong adaptability and flexibility. With the forthcoming Mycat intelligent optimization module, the system's data-access bottlenecks and hotspots become visible at a glance. Based on those statistics you can adjust back-end storage automatically or manually, mapping different tables to different storage engines, without changing a single line of application code.

III. How Mycat works

The principle behind Mycat is not complicated; the complexity is in the code. If the code were not complicated either, Mycat would have become a legend long ago.

The most important verb in how Mycat works is "intercept". It intercepts the SQL statement sent by the user and first analyzes it: shard analysis, route analysis, read/write-splitting analysis, cache analysis, and so on. It then sends the SQL to the real back-end database, processes the returned results as needed, and finally returns them to the user.
[Figure: Mycat routing a query against the sharded orders table; image not available]
In the figure above, the orders table is split into three shard nodes (dataNode, abbreviated dn), which are distributed across two MySQL servers (dataHost), following the form dataNode = database@dataHost. You can therefore shard across anywhere from one to N servers. The sharding rule here is a typical string-enumeration rule. A rule is defined as a sharding column plus a rule function; in this case the sharding column is prov and the rule function is string enumeration.

When Mycat receives a SQL statement, it first parses it and finds the tables involved, then looks up each table's definition. If the table has a sharding rule, Mycat extracts the value of the sharding column from the SQL, applies the rule function, and obtains the list of shards the SQL maps to. It then sends the SQL to those shards for execution, collects and processes the result data returned by all of them, and outputs it to the client. Take select * from orders where prov = ? as an example: with prov=wuhan, the rule function returns dn1 for wuhan, so the SQL is sent to MySQL1, the query runs against db1, and the result is returned to the user.
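The routing steps just described can be sketched in a few lines. This is a toy model, not Mycat's parser or router: the regular expression stands in for real SQL parsing, and apart from wuhan mapping to dn1 (db1 on MySQL1), the enumeration entries and the node layout are assumptions made for illustration.

```python
import re

# Hypothetical enumeration rule: sharding-column value -> shard node.
# Only wuhan -> dn1 comes from the text; the other entries are assumed.
PROV_TO_NODE = {"wuhan": "dn1", "beijing": "dn2", "shanghai": "dn3"}

# Each shard node is a (database, host) pair; this layout is assumed.
NODE_TO_LOCATION = {
    "dn1": ("db1", "mysql1"),
    "dn2": ("db2", "mysql2"),
    "dn3": ("db3", "mysql2"),
}


def route(sql: str) -> list[str]:
    """Return the shard nodes for a query on orders filtered by prov = '...'."""
    match = re.search(r"prov\s*=\s*'(\w+)'", sql, re.IGNORECASE)
    if match is None:
        # No sharding-column value found: the query has to go to every shard.
        return sorted(NODE_TO_LOCATION)
    return [PROV_TO_NODE[match.group(1)]]


nodes = route("select * from orders where prov = 'wuhan'")
print(nodes, [NODE_TO_LOCATION[n] for n in nodes])
```

A query without a usable sharding-column value falls back to all shards, which is why including the sharding column in the WHERE clause matters for performance.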

If the SQL above is changed to select * from orders where prov in (wuhan,beijing), it is sent to both MySQL1 and MySQL2 for execution, and the result sets are merged before being returned to the user. In practice, though, business SQL usually contains order by and limit pagination syntax, which means the result set has to be post-processed on the Mycat side. That part of the code is fairly complex, and the hardest case of all is a join between two tables. For this, Mycat offers ER sharding, global tables, HBT (human brain tech) catlets, integration with the Storm/Spark engines, and a whole arsenal of other techniques, which is why its authors call it the most powerful solution in the industry. This is the power of open source.
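To show why order by and limit need a second processing pass, here is a minimal sketch of merging per-shard results. It assumes each shard has already sorted its own rows; the rows and column layout are made up, and this is a generic k-way merge rather than Mycat's actual merge code.

```python
import heapq
from itertools import islice


def merge_ordered(shard_results, key, limit):
    """Merge per-shard results that are each already sorted by key,
    then keep the first `limit` rows (ORDER BY key LIMIT limit)."""
    merged = heapq.merge(*shard_results, key=key)
    return list(islice(merged, limit))


# Hypothetical rows (order_id, prov), each shard sorted by order_id.
from_mysql1 = [(1, "wuhan"), (4, "wuhan"), (7, "wuhan")]
from_mysql2 = [(2, "beijing"), (3, "beijing"), (9, "beijing")]

page = merge_ordered([from_mysql1, from_mysql2], key=lambda row: row[0], limit=4)
print(page)
```

Note that for a query with an offset, each shard has to return its first offset + limit rows, because the middleware cannot know in advance which shard the requested page will come from.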

Application scenarios

Mycat has been around long enough that its usage scenarios are already very rich, and new users keep coming up with new and innovative solutions. The following are typical application scenarios:

1. Simple read/write splitting. This is the simplest configuration, with support for read/write splitting and master/slave switchover.

2. Database and table sharding. Tables with more than 10 million rows are sharded; a single sharded table of up to 100 billion rows is supported.

3. Multi-tenant applications. Each application gets its own database, but the application connects only to Mycat, so multi-tenancy is achieved without changing the program itself.

4. Reporting systems. Mycat's table-sharding capability is used to handle large-scale report statistics.

5. Integrating multiple data sources.

6. A simple and effective way to query massive data in real time. For example, when 100 million frequently queried records must return results within 3 seconds, and the queries include range queries or queries on other attributes in addition to primary-key lookups, Mycat is probably the simplest and most effective option.

7. Database router. Based on a connection-pool reuse mechanism over MySQL instances, Mycat lets every application share all the connection pools of a MySQL instance to the greatest extent, which greatly improves the database's concurrent-access capacity.
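The first scenario, read/write splitting, comes down to a routing decision per statement. The sketch below is a deliberately naive illustration: it classifies statements by their first keyword and round-robins reads across slaves. The class name, node names, and keyword list are all assumptions, and a real router also has to account for transactions and replication lag.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads over the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over slaves; fall back to the master if there are none.
        self._readers = itertools.cycle(slaves or [master])

    def pick(self, sql: str) -> str:
        first_word = sql.lstrip().split(None, 1)[0].lower()
        if first_word in ("select", "show"):
            return next(self._readers)
        return self.master


router = ReadWriteRouter("master", ["slave1", "slave2"])
for statement in ("select 1", "insert into t values (1)", "select 2"):
    print(statement, "->", router.pick(statement))
```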

Why use Mycat

1. Java applications are tightly coupled to the database.

2. High traffic and high concurrency put pressure on the database.

3. Read and write requests see inconsistent data.

Database middleware comparison

[Figure: database middleware comparison; image not available]

IV. Core concepts of Mycat

Mycat is database middleware: an intermediate service that sits between the database and the application and handles data processing and interaction. The original database is split into multiple shard databases, and the cluster of all shard databases together constitutes the complete data store.
[Figure: an original database split into multiple shard databases; image not available]
As the figure above shows, once the data is split across multiple shard databases, an application that needs to read it has to handle data from multiple data sources. Without database middleware, the application would face the sharded cluster directly: data-source switching, transaction handling, and data aggregation would all have to be done by the application itself. An application that should focus on business logic would then spend much of its effort on the problems that sharding creates, and worst of all, every application would reinvent the same wheel from scratch.

1. Logical library

For an application, there is in fact no need to know that middleware exists; developers only need to know the concept of a database. Database middleware can therefore be regarded as a logical library made up of one or more database clusters.

In the cloud-computing era, database middleware can serve one or more applications in a multi-tenant fashion, with each application accessing either an independent or a shared physical database. A common example is Alibaba Cloud's database service RDS.
[Figure: logical library; image not available]

2. Logical table

Since there is a logical library, there must also be logical tables. In a distributed database, the tables an application reads and writes are logical tables. A logical table's data may be split and distributed across one or more shard databases, or it may not be split at all, in which case it consists of just one table.

3. Sharded table

A sharded table is a table that originally held a large amount of data and needs to be split across multiple databases. Each shard then holds part of the data, and all the shards together make up the complete data set.

4. Non-sharded table

Not every table in a database is large, and some do not need to be split. A non-sharded table, in contrast to a sharded table, is one that needs no data splitting.

5. ER table

Relational databases are built on the entity-relationship model, which describes things and their relationships in the real world; Mycat's ER tables take their name from it. Following this idea, Mycat offers a data-sharding strategy based on ER relationships: a child table's records are stored on the same data shard as the parent records they are associated with, that is, the child table depends on the parent table. Grouping tables this way ensures that joins never cross databases.

Table grouping is a very good approach to the cross-shard join problem, and it is also an important rule in planning how data is split.
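The core of ER sharding is that a child row has no placement rule of its own; it simply follows its parent. A minimal sketch, assuming a hypothetical customer (parent) and order (child) pair and an assumed modulo rule for the parent:

```python
NODES = ["dn1", "dn2", "dn3"]


def node_for_parent(customer_id: int) -> str:
    """Assumed parent rule: shard the parent table by id modulo node count."""
    return NODES[customer_id % len(NODES)]


def node_for_child(order: dict) -> str:
    """ER rule: a child row is stored wherever its parent row lives."""
    return node_for_parent(order["customer_id"])


order = {"order_id": 10, "customer_id": 4}
print(node_for_parent(4), node_for_child(order))
```

Because both lookups resolve to the same node, a join between a customer and its orders can run entirely inside one shard database.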

6. Global table

In a real business system there are often many dictionary-like tables that change very little. A dictionary table has the following characteristics:

1. It changes infrequently.

2. Its total data volume changes little.

3. It is small, rarely exceeding a hundred thousand records.

For such tables, once the business tables have been sharded because of their size, joining them with these auxiliary dictionary tables becomes a thorny problem. Mycat solves the join problem for this kind of table with data redundancy: every shard holds a copy of the data. All dictionary tables, and all tables with dictionary-like characteristics, are defined as global tables.

Data redundancy is a good way to solve the cross-shard join problem, and it is another important principle in planning how data is split.
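The redundancy idea can be modeled with in-memory dictionaries. In this sketch (class, method, and column names are hypothetical, and nothing here reflects Mycat's internals), a write to a global table is applied on every node, so each node can join its local rows against its own copy:

```python
class Cluster:
    def __init__(self, node_names):
        # Each node holds its own full copy of the global (dictionary) table.
        self.global_copy = {name: {} for name in node_names}

    def write_global(self, key, value):
        """A write to a global table is applied on every shard node."""
        for table in self.global_copy.values():
            table[key] = value

    def local_join(self, node, rows, fk):
        """Join shard-local rows with that node's own copy of the dictionary."""
        lookup = self.global_copy[node]
        return [(row, lookup[row[fk]]) for row in rows]


cluster = Cluster(["dn1", "dn2"])
cluster.write_global("WH", "Wuhan")
print(cluster.local_join("dn2", [{"id": 1, "prov_code": "WH"}], "prov_code"))
```

The trade-off is write amplification: every change is repeated on all nodes, which is acceptable precisely because dictionary tables are small and change rarely.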

7. Shard node (dataNode)

After data is split, a large table is distributed across different shard databases. Each database that holds a shard of the table is a shard node (dataNode).

8. Node host (dataHost)

After data is split, each shard node (dataNode) does not necessarily get a machine to itself; several shard databases can live on the same machine. The machine that hosts one or more shard nodes (dataNode) is the node host (dataHost). To get around the concurrency limit of a single node host, try to spread shard nodes (dataNode) with heavy read/write load evenly across different node hosts (dataHost).

9. Sharding rule

Data splitting means dividing a large table into several shard tables, which requires a rule. The rule that decides which shard each piece of data goes to is the sharding rule. Choosing a suitable sharding rule is very important, because it greatly reduces the difficulty of later data processing.
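A sharding rule can be modeled as a sharding column plus a function that maps the column's value to a shard index. The sketch below shows two common generic choices, modulo and range; the class and function names are hypothetical and are not Mycat's built-in rule names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ShardingRule:
    column: str                         # sharding column
    function: Callable[[object], int]   # rule function: value -> shard index

    def node_index(self, row: dict) -> int:
        return self.function(row[self.column])


def mod_by(count: int) -> Callable[[int], int]:
    """Spread values evenly: shard index is value modulo shard count."""
    return lambda value: value % count


def range_by(upper_bounds: list[int]) -> Callable[[int], int]:
    """Index of the first range whose exclusive upper bound exceeds value."""
    def pick(value: int) -> int:
        for index, bound in enumerate(upper_bounds):
            if value < bound:
                return index
        raise ValueError(f"{value} is outside every configured range")
    return pick


by_mod = ShardingRule("id", mod_by(3))
by_range = ShardingRule("id", range_by([1000, 2000]))
print(by_mod.node_index({"id": 10}), by_range.node_index({"id": 1500}))
```

Modulo spreads load evenly but scatters range scans across all shards, while range keeps neighboring values together at the risk of hotspots, which is one reason the choice of rule affects later processing so much.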

10. Global sequence number

After data is split, the primary-key constraint of the original relational database can no longer be relied on in a distributed setting, so an external mechanism has to be introduced to guarantee uniqueness. The mechanism that guarantees globally unique identifiers for data is the global sequence number.
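The simplest possible form of such an external mechanism is one shared counter that every writer consults instead of relying on per-database auto-increment keys. The sketch below is an in-process stand-in for that idea, with a hypothetical class name; it does not represent how Mycat implements its sequences.

```python
import itertools
import threading


class GlobalSequence:
    """A single shared counter: every shard asks it for the next id."""

    def __init__(self, start: int = 1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_id(self) -> int:
        # The lock makes id allocation safe when several threads insert at once.
        with self._lock:
            return next(self._counter)


sequence = GlobalSequence()
ids_for_dn1 = [sequence.next_id() for _ in range(2)]
ids_for_dn2 = [sequence.next_id() for _ in range(2)]
print(ids_for_dn1, ids_for_dn2)
```

Because both shards draw from the same counter, no id is ever handed out twice, whichever shard a row ends up on.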

11. Multi-tenancy

Multi-tenancy is a software architecture technique that explores how to share the same system or program components among multiple users while still keeping each user's data isolated. In the cloud-computing era, multi-tenancy lets a shared data center serve most clients the same, or even customized, services from a single system architecture, while still guaranteeing data isolation between customers. Today's cloud-computing services of all kinds fall into this category, for example Alibaba Cloud's database service (RDS) and Alibaba Cloud servers.

There are three main approaches to multi-tenant data storage:

1. Independent database

One database per tenant. This approach gives the highest level of user data isolation and the best security, but also the highest cost.

Advantages: giving each tenant its own database makes it easier to extend the data model to meet tenants' individual needs, and if a failure occurs, data is relatively easy to recover.

Disadvantages: more database installations, and with them higher maintenance and purchase costs.

2. Shared database, isolated schema

Multiple or all tenants share one database, but each tenant has its own schema.

Advantages: provides a degree of logical data isolation, though not complete isolation, for tenants with higher security requirements; each database can support more tenants.

Disadvantages: data recovery after a failure is difficult, because restoring the database affects other tenants' data; cross-tenant statistics are also somewhat difficult.

3. Shared database, shared schema

Tenants share the same database and the same schema, and tenant data is distinguished by a tenantID column in the tables. This is the mode with the highest degree of sharing and the lowest level of isolation.

Advantages: the lowest maintenance and purchase costs, and the largest number of tenants supported per database.

Disadvantages: the lowest isolation level and the lowest security, so more security work is needed during design and development; data backup and recovery is the hardest, since data has to be backed up and restored one by one.
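The three approaches differ only in where the tenant boundary sits: at the database, at the schema, or in a column. The sketch below makes that explicit. Apart from the tenantID column named in the text, every database, schema, and strategy name is a hypothetical placeholder.

```python
def locate(strategy: str, tenant: str) -> dict:
    """Return where a tenant's data lives under each storage approach."""
    if strategy == "independent_database":
        # Option 1: the tenant boundary is the database itself.
        return {"database": f"db_{tenant}", "schema": "main", "filter": None}
    if strategy == "shared_database_isolated_schema":
        # Option 2: one database, one schema per tenant.
        return {"database": "shared_db", "schema": f"schema_{tenant}", "filter": None}
    if strategy == "shared_database_shared_schema":
        # Option 3: everything is shared; rows are told apart by tenantID.
        return {"database": "shared_db", "schema": "main", "filter": ("tenantID", tenant)}
    raise ValueError(f"unknown strategy: {strategy}")


for name in ("independent_database",
             "shared_database_isolated_schema",
             "shared_database_shared_schema"):
    print(name, locate(name, "acme"))
```

Only the third option needs a filter on every query, which is where its extra security development effort comes from: a single missing tenantID condition exposes another tenant's rows.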

Original site

Copyright notice
This article was written by [A music loving programmer]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/03/202203020627519573.html