TiDB HTAP Getting Started Guide: How Adding a TiFlash Replica Works
2022-06-24 02:21:00 【PingCAP】
TiFlash is the key component that gives TiDB its HTAP form. It is a columnar storage extension of TiKV that provides good isolation while preserving strong consistency. Columnar replicas are replicated asynchronously through the Raft Learner protocol, but reads obtain the Snapshot Isolation consistency level by combining a Raft index check with MVCC. This architecture solves both the isolation problem in HTAP scenarios and the problem of synchronizing data to columnar storage.
Before using TiFlash, you need to add TiFlash replicas to a table. Many users run into problems at this step. The official document "TiFlash replica is always unavailable" summarizes some simple troubleshooting steps.
This article describes how adding TiFlash replicas to a TiDB table works in current versions (all released 4.x and 5.x versions). It is mainly intended for DBAs who are investigating related problems, as a reference for which information to collect and what to try.
Basic concepts
From PD's perspective, a TiFlash instance is similar to a TiKV instance: both are stores, except that a TiFlash store carries the label "key=engine, value=tiflash". After a TiFlash replica is added, PD schedules Regions to TiFlash and keeps those Region peers permanently in the learner role. This relies on the Placement Rules feature.
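As a rough illustration of the label check described above, the following sketch picks the TiFlash stores out of a store listing. The `stores` / `store` / `labels` layout and the sample addresses are assumptions made for the example, not a confirmed copy of PD's output.

```python
from typing import Any


def tiflash_stores(stores_response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the stores that carry the label engine=tiflash."""
    result = []
    for item in stores_response.get("stores", []):
        store = item.get("store", {})
        labels = {lb.get("key"): lb.get("value") for lb in store.get("labels", [])}
        # Compare case-insensitively so "TiFlash" and "tiflash" both match.
        if str(labels.get("engine", "")).lower() == "tiflash":
            result.append(store)
    return result


# Hypothetical sample data: one TiKV store without labels, one TiFlash store.
sample = {
    "stores": [
        {"store": {"id": 1, "address": "10.0.1.1:20160"}},
        {"store": {"id": 5, "address": "10.0.1.5:3930",
                   "labels": [{"key": "engine", "value": "tiflash"}]}},
    ]
}

print([s["id"] for s in tiflash_stores(sample)])  # prints [5]
```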
A TiFlash instance contains a modified version of the TiKV code, which mainly works together with TiKV to handle Raft-layer operations. Its log output is almost the same as TiKV's. When deployed with TiUP, its log is written to tiflash_tikv.log.
A TiFlash instance periodically starts a child process that handles adding and removing TiFlash replicas. If you occasionally see a non-resident process named tiflash_cluster_manager (called "pd buddy" on the official website), this is normal. Its log is written to tiflash_cluster_manager.log.
TiFlash internal component architecture diagram
Stages of adding a TiFlash replica
What each component in the cluster does
Sequence diagram of adding a TiFlash replica
Executing the DDL that changes the replica count
When `alter table <table_name> set tiflash replica <count>` is executed in TiDB, the statement runs as a DDL statement.
Synchronization from progress 0.0 to 1.0
TiDB provides an HTTP interface through which other components can query which tables have TiFlash replicas: `curl http://<tidb-ip>:<tidb-status-port>/tiflash/replica`.
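To show how a caller might consume this interface, here is a minimal sketch that lists the tables whose replica is not yet available. The JSON field names (`id`, `replica_count`, `available`) and the sample body are assumptions for illustration; check the actual response of your TiDB version before relying on them.

```python
import json


def unavailable_table_ids(body: str) -> list[int]:
    """Return the ids of entries that are not yet marked available."""
    entries = json.loads(body)
    # A missing "available" field is treated as "not available" (an assumption).
    return [e["id"] for e in entries if not e.get("available", False)]


# Hypothetical response body; a real one would come from the curl call above.
sample_body = json.dumps([
    {"id": 45, "replica_count": 2, "available": False},
    {"id": 52, "replica_count": 1, "available": True},
])

print(unavailable_table_ids(sample_body))  # prints [45]
```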
TiFlash runs a periodic task that is responsible for the following:
- It pulls from TiDB's tiflash/replica interface the list of tables/partitions that have TiFlash replicas. For a table that is not yet available, if no corresponding Placement Rule exists in PD, the task sets one, with key range [ t__r, t__ ).
- For a table that is not yet available, the task pulls from PD the region_ids in that key range, and from every online TiFlash store the region_ids that store has already synchronized.
- It divides the number of region_ids found on the TiFlash stores (after deduplication) by the number of region_ids in PD, and reports the result as the synchronization progress by sending a POST request to the tiflash/replica interface.
- If a placement rule exists in PD but tiflash/replica has no corresponding table_id, the table/partition has been dropped and its GC time has passed, so the task removes the corresponding rule from PD.
The log output of this component is tiflash_cluster_manager.log. If the cluster has more than one TiFlash node, one of them is elected through PD's built-in etcd to run the tasks above. When troubleshooting through logs, you need to find the node that was responsible for the work during the relevant time period, or collect the relevant logs from all TiFlash nodes.
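The progress calculation described above can be sketched as follows. This is not the cluster manager's actual code: the function name, the input shapes, and the cap at 1.0 are assumptions chosen for the example.

```python
from typing import Iterable, Mapping


def sync_progress(pd_region_ids: Iterable[int],
                  tiflash_store_region_ids: Mapping[str, Iterable[int]]) -> float:
    """Deduplicated region_ids on TiFlash stores divided by region_ids in PD."""
    pd_ids = set(pd_region_ids)
    if not pd_ids:
        return 0.0  # assumption: report no progress when PD returns no Regions
    synced: set[int] = set()
    for region_ids in tiflash_store_region_ids.values():
        synced.update(region_ids)  # the same Region may appear on several stores
    # Assumption: cap at 1.0 in case the stores report stale extra Regions.
    return min(len(synced) / len(pd_ids), 1.0)


# Hypothetical data: 4 Regions in PD, Region 2 present on both TiFlash stores.
print(sync_progress([1, 2, 3, 4], {"store-5": [1, 2], "store-6": [2, 3]}))  # prints 0.75
```

Region 2 is counted once even though two stores hold it, which is why deduplication matters when the replica count is greater than one.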
PD behavior: after receiving the placement rules, PD will:
- First split Regions, making sure that no Region boundary spans both table data and indexes (because TiFlash synchronizes only the data part of a table)
- Send the Region's Leader a scheduling operation to AddLearner on a TiFlash store
TiKV behavior:
- The Region Leader in TiKV accepts and executes PD's AddLearner command
- The Region Leader sends the Region data to the TiFlash Region peer in the form of a Snapshot
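For orientation, a placement rule that keeps learner peers on TiFlash stores might look roughly like the dictionary built below. The field names follow the general shape of PD placement rules as the author of this sketch understands them; the `group_id` value, the rule id, and the hex keys are placeholders and assumptions, not values taken from a real cluster.

```python
import json


def tiflash_learner_rule(rule_id: str, start_key_hex: str, end_key_hex: str,
                         replica_count: int) -> dict:
    """Build a rule asking for `replica_count` learner peers on TiFlash stores."""
    return {
        "group_id": "tiflash",          # assumption: group used for TiFlash rules
        "id": rule_id,                  # hypothetical rule id
        "start_key": start_key_hex,     # hex-encoded start of the table's data range
        "end_key": end_key_hex,         # hex-encoded end of the table's data range
        "role": "learner",              # the peers stay learners, as described above
        "count": replica_count,
        "label_constraints": [
            {"key": "engine", "op": "in", "values": ["tiflash"]},
        ],
    }


# Placeholder keys only; real keys are derived from the table id.
rule = tiflash_learner_rule("example-rule", "aa", "ab", replica_count=2)
print(json.dumps(rule, indent=2))
```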
The Add partition process for a partitioned table that already has TiFlash replicas
When TiDB executes Add partition on a partitioned table that already has TiFlash replicas, it blocks and waits after the partition has been generated (but is not yet visible to users). The DDL completes only after TiFlash reports that the partition_id of the new partition is available. (For details, see the related TiDB PR.)
For TiFlash, adding a partition to a partitioned table is similar to adding a normal table, and the process above applies. The difference is that in this case an extra accelerate-schedule operation is issued to PD, which raises the scheduling priority of the Regions in the partition's key range. The expectation is that when the cluster is busy, the partition becomes available sooner and the DDL blocks for less time.
Why the Add partition operation on a partitioned table blocks:
- If the Add partition DDL did not block, then when a user runs a query (such as count(*)) and the query chooses to read from TiFlash while the Regions of the new partition do not yet have TiFlash replicas, the query would fail because of those few Regions. To the user, queries on the table would appear unstable and prone to failure while Add partition is running.
- To avoid this instability, the Add partition operation blocks: the new partition is not allowed to be read until the TiFlash replicas of its Regions are ready.
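The block-and-wait behavior can be pictured as a polling loop like the one below. This is a generic sketch, not TiDB's DDL code: the function name, the timeout handling, and the simulated availability check are all assumptions.

```python
import time
from typing import Callable


def wait_until_available(is_available: Callable[[], bool],
                         timeout_s: float, interval_s: float = 0.1) -> bool:
    """Poll `is_available` until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_available():
            return True       # the partition may now be exposed to readers
        if time.monotonic() >= deadline:
            return False      # still not ready: keep the partition hidden
        time.sleep(interval_s)


# Simulated check that reports "available" on the third poll.
calls = {"n": 0}


def fake_check() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until_available(fake_check, timeout_s=1.0, interval_s=0.01))  # prints True
```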
Troubleshooting directions for problems at different stages (examples)
Executing `alter table <table_name> set tiflash replica <count>` gets stuck
Generally, this DDL operation only modifies metadata in TiDB, so it should not block for long. If the statement gets stuck, check whether another DDL operation is blocking it (for example, whether an add index operation is running on the same table). You can also refer to other experience with stuck DDL in TiDB: "[FAQ] DDL stuck troubleshooting experience - TiDB common FAQ".
The replica count was changed successfully, but progress stays at 0, or progress advances but "slowly"
- First, rule out the basic problems by following "TiFlash replica is always unavailable".
- If those checks pass, look at tiflash_cluster_manager.log first. See whether there are connection exceptions to TiDB or PD. If there are, determine whether the API query to the relevant component is timing out (curl http://<tidb-ip>:10080/tiflash/replica, see the TiDB and TiFlash synchronization interface) or whether there is a network connectivity problem.
- Then confirm whether a placement rule has been created for the table in question (keyword "Set placement rule … table--r" in tiflash_cluster_manager.log), and check the progress information reported to TiDB (id, region_count, flash_region_count). Confirm that the rule for the table can be queried in PD (see the Placement Rules documentation).
- Pin down what "slow" means in practice. For the problem table, does flash_region_count show "no change" for a long time, or does it only "change slowly" (for example, rising by a few Regions every few minutes)?
- If there is "no change", check which step of the whole working chain has a problem: TiFlash sets the rule in PD -> PD sends the AddLearner scheduling to the Region leader in TiKV -> TiKV synchronizes Region data to TiFlash. Collect the logs of the related components for troubleshooting. Check the warn/error messages in the tikv and tiflash-proxy logs to confirm whether there are errors such as network isolation.
- If it "changes slowly", investigate the current TiFlash load and PD scheduling. Mainly watch the TiFlash-Summary dashboard in Grafana. Under Raft, the "Applying snapshots Count", "Snapshot Predecode Duration", and "Snapshot Flush Duration" panels reflect the concurrency of the data TiFlash receives through ApplySnapshot and how long applying takes. Under Storage Write Stall, "Write Stall Duration" shows whether writes are so frequent that they cause Write Stall. Also collect other information such as CPU and disk IO load, and the TiFlash logs. For adjusting the relevant PD scheduling parameters, see: PD scheduling parameters.
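The distinction between "no change" and "changing slowly" can be made mechanical by sampling flash_region_count over time. The sketch below is one way to do that; the threshold of one Region per minute is a hypothetical value chosen for illustration and should be tuned to your cluster.

```python
def classify_progress(samples: list[tuple[float, int]],
                      slow_rate_per_min: float = 1.0) -> str:
    """Classify (unix_seconds, flash_region_count) samples, oldest first."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    elapsed_min = (t1 - t0) / 60.0
    if elapsed_min <= 0:
        raise ValueError("samples must span a positive time interval")
    if c1 <= c0:
        return "no change"         # inspect the rule -> AddLearner -> snapshot chain
    rate = (c1 - c0) / elapsed_min
    if rate < slow_rate_per_min:   # hypothetical threshold
        return "changing slowly"   # inspect TiFlash load and PD scheduling
    return "progressing"


# Hypothetical samples taken ten minutes apart.
print(classify_progress([(0, 10), (600, 10)]))   # prints no change
print(classify_progress([(0, 10), (600, 13)]))   # prints changing slowly
print(classify_progress([(0, 10), (600, 100)]))  # prints progressing
```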
Add partition gets stuck on a partitioned table that already has TiFlash replicas
According to the comment in the PR, if the DDL is blocked because TiFlash has not finished creating the replica, the log "[ddl] partition replica check failed" is printed. The next directions to investigate are roughly: whether many Regions were creating TiFlash replicas at the time, the pressure of apply snapshot on TiFlash, whether the PD scheduling priority took effect, and so on.
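A quick way to confirm this case is to search the TiDB log for that message. The sketch below scans an iterable of log lines; the surrounding log format in the sample is made up for the example, and only the quoted message comes from the text above.

```python
from typing import Iterable

MARKER = "[ddl] partition replica check failed"


def find_replica_check_failures(lines: Iterable[str]) -> list[str]:
    """Return the log lines that contain the replica-check failure message."""
    return [line.rstrip("\n") for line in lines if MARKER in line]


# Hypothetical log lines; with a real file, pass an open file object instead.
sample_log = [
    "[INFO] some unrelated message\n",
    "[WARN] [ddl] partition replica check failed (details omitted)\n",
]

for hit in find_replica_check_failures(sample_log):
    print(hit)
```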
Appendix:
Some APIs that help with troubleshooting:
Query TiFlash replicas, progress, and so on in TiDB
select * from information_schema.tiflash_replica
View recently executed and pending DDL jobs
admin show ddl jobs
The API that TiDB exposes for TiFlash replica information (the main interface for interaction with TiFlash)
curl http://<tidb-ip>:<tidb-status-port>/tiflash/replica
Query a table's Region information in TiDB
SHOW TABLE <table_name> REGIONS;
Query the Region information for a table_id on a single TiFlash node
echo "DBGInvoke dump_all_region(<table_id>,true)" | curl "http://<tiflash-ip>:<tiflash-http-port>/?query=" --data-binary @-
Query Region information in PD
tiup ctl pd -u http://<pd-ip>:<pd-port> region
Query Placement Rules information in PD
tiup ctl pd -u http://<pd-ip>:<pd-port> config placement-rules show