当前位置:网站首页>Migrate data from CSV files to tidb
Migrate data from CSV files to tidb
2022-07-06 08:01:00 【Tianxiang shop】
This document describes how to start from CSV File migration data to TiDB.
TiDB Lightning Support reading CSV File format , And other delimiter formats , Such as TSV( Tabs separate values ). For others “ Flat file ” Data import of type , You can also refer to this document .
Prerequisite
The first 1 Step : Get ready CSV file
All to be imported CSV Files in the same directory , If you want to TiDB Lightning Identify all CSV file , The file name must meet the following format :
- Containing the data of the whole table CSV file , It needs to be named
${db_name}.${table_name}.csv
. - If a table is distributed in multiple CSV file , these CSV The suffix of document number shall be added to the document name , Such as
${db_name}.${table_name}.003.csv
. The digital part does not need to be continuous , But it must be incremented , And you need to fill the number part with zeros , Make sure the suffix is the same length .
The first 2 Step : Create target table structure
CSV The file itself does not contain table structure information . To put CSV Data import TiDB, You must provide a table structure for the data . You can create a table structure by any of the following methods :
Method 1 : Use TiDB Lightning Create a table structure .
Write include DDL Of the statement SQL The documents are as follows :
- The file name format is
${db_name}-schema-create.sql
, The content should includeCREATE DATABASE
sentence . - The file name format is
${db_name}.${table_name}-schema.sql
, The content should includeCREATE TABLE
sentence .
- The file name format is
Method 2 : Manually downstream TiDB Database and table building .
The first 3 Step : Writing configuration files
New file tidb-lightning.toml
, Contains the following :
[lightning] # journal level = "info" file = "tidb-lightning.log" [tikv-importer] # "local": This mode is used by default , Apply to TB Large amount of data above level , But downstream during import TiDB Unable to provide external services . # "tidb":TB Data volume below level can also be used `tidb` Back end mode , The downstream TiDB Can normally provide services . For more information about the backend mode, see :https://docs.pingcap.com/tidb/stable/tidb-lightning-backends backend = "local" # Set the temporary storage address of sorted key value pairs , The destination path must be an empty directory , The directory space must be larger than the size of the dataset to be imported , It is recommended to set it to and `data-source-dir` Different disk directories and flash media , Monopoly IO You will get better import performance sorted-kv-dir = "/mnt/ssd/sorted-kv-dir" [mydumper] # Source data directory . data-source-dir = "${data-path}" # Local or S3 route , for example :'s3://my-bucket/sql-backup?region=us-west-2' # Definition CSV Format [mydumper.csv] # Field separator , Must not be empty . If the source file contains fields of non string or numeric type ( Such as binary, blob, bit etc. ), It is not recommended to use the default “,” Simple separator , recommend “|+|” And other unusual character combinations separator = ',' # Reference delimiter , Can be zero or more characters . delimiter = '"' # CSV Whether the file contains header . # If true, be lightning Will use the first line of content to resolve the corresponding relationship of fields . header = true # CSV Does it include NULL. # If true,CSV No column of the file can be resolved to NULL. not-null = false # If `not-null` by false( namely CSV Can contain NULL), # Fields with the following values will be resolved to NULL. null = '\N' # Whether the backslash contained in the string ('\') Characters are treated as escape characters backslash-escape = true # Whether to remove the last separator at the end of the line . trim-last-separator = false [tidb] # Information about the target cluster host = ${host} # for example :172.16.32.1 port = ${port} # for example :4000 user = "${user_name}" # for example :"root" password = "${password}" # for example :"rootroot" status-port = ${status-port} # Import process Lightning You need to start from TiDB Of “ Status port ” Get table structure information , for example :10080 pd-addr = "${ip}:${port}" # colony PD The address of ,Lightning adopt PD Get some information , for example 172.16.31.3:2379. When backend = "local" when status-port and pd-addr Must be filled in correctly , Otherwise, an exception will appear in the import .
More information about configuration files , Please see the TiDB Lightning Configuration parameters .
The first 4 Step : Import performance optimization ( Optional )
The unified size of the imported file is about 256 MiB when ,TiDB Lightning Can achieve the best working condition . If you import a single CSV A large file ,TiDB Lightning In the default configuration, only one thread can be used to process , This will slow down the import .
To solve this problem , But first of all, I will CSV The file is divided into multiple files . For general format CSV file , Without reading the entire file , Unable to quickly determine the start and end positions of lines . therefore , By default TiDB Lightning Will not automatically split CSV file . But if you are sure what to import CSV The document complies with specific restrictions , You can enable strict-format
Pattern . When enabled ,TiDB Lightning Will single CSV The large file is divided into a single size of 256 MiB Multiple file blocks for parallel processing .
Be careful
If CSV The file is not strictly formatted , but strict-format
It is mistakenly set to true
, A single complete field spanning multiple rows will be split into two parts , Causes parsing to fail , Even import the damaged data without error .
Strictly formatted CSV In file , Each field occupies only one line , That is, one of the following conditions must be met :
- delimiter It's empty ;
- Each field does not contain CR (\r) or LF(\n).
If you confirm that the conditions are met , It can be started according to the following configuration strict-format
Mode to speed up import .
[mydumper] strict-format = true
The first 5 Step : Perform import
function tidb-lightning
. If you start the program directly from the command line , Maybe because SIGHUP
Signal and exit , Suggest cooperation nohup
or screen
Tools such as , Such as :
nohup tiup tidb-lightning -config tidb-lightning.toml > nohup.out 2>&1 &
After the import starts , You can view the progress in any of the following ways :
- adopt
grep
Log keywordsprogress
View progress , Default 5 Minute update . - Check the progress through the monitoring panel , Please refer to TiDB Lightning monitor .
- adopt Web Page view progress , Please refer to Web Interface .
After import ,TiDB Lightning Will automatically exit . Check the last of the log 5 There will be the whole procedure completed
, It indicates that the import was successful .
Be careful
Whether the import is successful or not , The last line will show tidb lightning exit
. It just means TiDB Lightning The normal exit , Does not mean that the task is completed .
If you encounter problems during import , Please see the TiDB Lightning common problem .
Files in other formats
If the data source is in another format , In addition to the file name, you must still use .csv
Outside the end , The configuration file tidb-lightning.toml
Of [mydumper.csv]
The format definition also needs to be modified accordingly . The common format is modified as follows :
TSV:
# Format example # ID Region Count # 1 East 32 # 2 South NULL # 3 West 10 # 4 North 39 # Format configuration [mydumper.csv] separator = "\t" delimiter = '' header = true not-null = false null = 'NULL' backslash-escape = false trim-last-separator = false
TPC-H DBGEN:
# Format example # 1|East|32| # 2|South|0| # 3|West|10| # 4|North|39| # Format configuration [mydumper.csv] separator = '|' delimiter = '' header = false not-null = true backslash-escape = false trim-last-separator = true
边栏推荐
- The State Economic Information Center "APEC industry +" Western Silicon Valley will invest 2trillion yuan in Chengdu Chongqing economic circle, which will surpass the observation of Shanghai | stable
- Learn Arduino with examples
- 23. Update data
- . Net 6 learning notes: what is NET Core
- 08- [istio] istio gateway, virtual service and the relationship between them
- MySQL view tablespace and create table statements
- "Friendship and righteousness" of the center for national economy and information technology: China's friendship wine - the "unparalleled loyalty and righteousness" of the solidarity group released th
- Interview Reply of Zhuhai Jinshan
- Data governance: Data Governance under microservice architecture
- C语言自定义类型:结构体
猜你喜欢
Linked list interview questions (Graphic explanation)
File upload of DVWA range
Learn Arduino with examples
解决方案:智慧工地智能巡檢方案視頻監控系統
[count] [combined number] value series
861. Score after flipping the matrix
The State Economic Information Center "APEC industry +" Western Silicon Valley will invest 2trillion yuan in Chengdu Chongqing economic circle, which will surpass the observation of Shanghai | stable
Pyqt5 development tips - obtain Manhattan distance between coordinates
在 uniapp 中使用阿里图标
数据治理:主数据的3特征、4超越和3二八原则
随机推荐
How to prevent Association in cross-border e-commerce multi account operations?
使用 TiDB Lightning 恢复 S3 兼容存储上的备份数据
从 CSV 文件迁移数据到 TiDB
Data governance: misunderstanding sorting
Analysis of Top1 accuracy and top5 accuracy examples
P3047 [USACO12FEB]Nearby Cows G(树形dp)
Codeforces Global Round 19(A~D)
The State Economic Information Center "APEC industry +" Western Silicon Valley will invest 2trillion yuan in Chengdu Chongqing economic circle, which will surpass the observation of Shanghai | stable
Flash return file download
Helm install Minio
Hackathon ifm
MES, APS and ERP are essential to realize fine production
What are the ways to download network pictures with PHP
(lightoj - 1410) consistent verbs (thinking)
On why we should program for all
【T31ZL智能视频应用处理器资料】
The ECU of 21 Audi q5l 45tfsi brushes is upgraded to master special adjustment, and the horsepower is safely and stably increased to 305 horsepower
How to estimate the number of threads
The Vice Minister of the Ministry of industry and information technology of "APEC industry +" of the national economic and information technology center led a team to Sichuan to investigate the operat
49. Sound card driven article collection