Sqoop command
2022-07-05 02:46:00 【A vegetable chicken that is working hard】
Data import
- This section only prepares test data for the commands that follow
1. project table data
- database
1.create database sqooptest1;
2.use sqooptest1;
3.create table project(
id int not null auto_increment primary key,
name varchar(100) not null,
type tinyint(4) not null default 0,
description varchar(500) default null,
create_at date default null,
update_at timestamp not null default current_timestamp on update current_timestamp,
status tinyint(4) not null default 0
);
4.insert into project( name,type,description,create_at,status)
values( 'project1',1,'project1 zy','2019-07-27',0);
insert into project( name,type,description,create_at,status)
values( 'project2',1,'project2 zy','2019-07-26',0);
insert into project( name,type,description,create_at,status)
values( 'project2',2,'project2 zy','2019-07-25',0);
- sqoop command
sqoop import --connect jdbc:mysql://node3:3306/sqooptest1 --username root --password a --table project
- result
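- A quick way to verify the import: with no --target-dir or --warehouse-dir, Sqoop writes under the current user's HDFS home directory using the table name. A sketch of the check, assuming the job was run as root:
hdfs dfs -ls /user/root/project
hdfs dfs -cat /user/root/project/part-m-00000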
2. students table data
- database
1.create database sqooptest1;
2.use sqooptest1;
3.create table students(
id int not null primary key,
name varchar(100) not null,
age varchar(100) not null
);
- Data location
E:\JAVA Course \...\11.Hadoop\12.Sqoop\a.txt
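- The loader below expects one record per line, with id, name and age separated by a tab. A few hypothetical sample lines of a.txt (<TAB> stands for a tab character):
1<TAB>tom<TAB>20
2<TAB>jack<TAB>21
3<TAB>lucy<TAB>19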
- Insert data into the database
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class AddBatchMysql {
    public static void main(String[] args) {
        // 1. Ask the user for the data file location
        Scanner sc = new Scanner(System.in);
        System.out.println("File location:");
        String path = sc.nextLine();

        // 2. Read the file line by line; each line holds id, name and age separated by \t
        List<String> list = new ArrayList<String>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(new File(path))))) {
            String str;
            while ((str = br.readLine()) != null) {
                list.add(str);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

        // 3. Insert the data in batches
        System.out.println("Total number of records: " + list.size());
        String sql = "insert into students values(?,?,?)";
        Connection con = null;
        PreparedStatement pstmt = null;
        try {
            con = DriverManager.getConnection(
                    "jdbc:mysql://node3:3306/sqooptest1?serverTimezone=UTC", "root", "a");
            con.setAutoCommit(false); // commit transactions manually
            pstmt = con.prepareStatement(sql);
            int total = 0;
            String s;
            String[] ss;
            for (int i = 0; i < list.size(); i++) {
                s = list.get(i);
                ss = s.split("\t");
                pstmt.setString(1, ss[0]);
                pstmt.setString(2, ss[1]);
                pstmt.setString(3, ss[2]);
                // add the current statement to the batch cache
                pstmt.addBatch();
                if ((i + 1) % 1000 == 0) {
                    // flush the batch every 1000 records
                    int[] res = pstmt.executeBatch();
                    total += sum(res);
                    con.commit();
                    pstmt.clearBatch();
                }
            }
            // flush whatever is left in the batch
            int[] res = pstmt.executeBatch();
            total += sum(res);
            con.commit();
            pstmt.clearBatch();
            System.out.println("Number of records actually inserted: " + total);
        } catch (Exception e) {
            e.printStackTrace();
            try {
                if (con != null) {
                    con.rollback();
                }
            } catch (SQLException e1) {
                e1.printStackTrace();
            }
        } finally {
            if (con != null) {
                try {
                    con.setAutoCommit(true);
                } catch (SQLException e) {
                    e.printStackTrace();
                }
                try {
                    con.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    private static int sum(int[] res) {
        if (res == null || res.length == 0) {
            return 0;
        }
        int total = 0;
        for (int i = 0; i < res.length; i++) {
            total += res[i];
        }
        return total;
    }
}
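- A sketch of compiling and running the loader; the connector jar name is an assumption, any MySQL JDBC driver on the classpath will do:
javac AddBatchMysql.java
java -cp .:mysql-connector-java-8.0.28.jar AddBatchMysql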
- Wait until the data insertion is completed ...
Commands (see the official Sqoop documentation)
sqoop-list-*
1. List databases
- command
sqoop-list-databases --connect jdbc:mysql://localhost:3306/mysql?serverTimezone=UTC --username root --password a --verbose
--verbose: print more information while the tool runs
2. List tables
sqoop-list-tables --connect jdbc:mysql://localhost:3306/mysql?serverTimezone=UTC --username root --password a --verbose
sqoop-import
1. Specify the output path: --target-dir
- --target-dir
sqoop import --connect jdbc:mysql://node3:3306/sqooptest1?serverTimezone=UTC --username root --password a --table project --target-dir /sqooptest/input1/project
Specified directory: /sqooptest/input1/project
Actual directory: /sqooptest/input1/project
- result
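- Re-running the same command fails because the target directory already exists in HDFS. A sketch using Sqoop's --delete-target-dir flag to clear it before importing again:
sqoop import --connect jdbc:mysql://node3:3306/sqooptest1?serverTimezone=UTC --username root --password a --table project --target-dir /sqooptest/input1/project --delete-target-dir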
2. Use the table name as a subdirectory under a warehouse directory: --warehouse-dir
- --warehouse-dir
sqoop import --connect jdbc:mysql://node3:3306/sqooptest1?serverTimezone=UTC --username root --password a --table project --warehouse-dir /sqooptest/input2
Specified directory: /sqooptest/input2
Actual directory: /sqooptest/input2/project
Analysis: Sqoop creates a directory named after the table under the specified directory; the specified directory acts as the data warehouse (warehouse) root.
- result
3. Specify the columns and filter condition
sqoop import --connect jdbc:mysql://node3:3306/sqooptest1?serverTimezone=UTC --username root --password a --table project --warehouse-dir /sqooptest/input3 --columns 'id,name,type' --where 'id>2' -m 1
--table
--columns
--where
-m: use only one mapper; each mapper corresponds to one input split and one output file
Because --table is given, Sqoop assembles the SQL statement automatically; --table cannot be combined with -e or --query.
4. Specify a SQL statement
sqoop import --connect jdbc:mysql://node3:3306/sqooptest1?serverTimezone=UTC --username root --password a --target-dir /sqooptest/input4/project --query 'select id,name,type from project where id>2 and $CONDITIONS' --split-by project.id -m 1
--query: cannot be combined with --table or --columns
$CONDITIONS: a placeholder that Sqoop replaces with the split condition for each mapper
--split-by: the table column used to split the work units; cannot be used together with --autoreset-to-one-mapper
-m: the number of mappers; each mapper corresponds to one input split and one output file
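- For illustration, a sketch of a variant with two mappers (the output directory here is a made-up example); Sqoop replaces $CONDITIONS with a different range predicate on the --split-by column for each mapper:
sqoop import --connect jdbc:mysql://node3:3306/sqooptest1?serverTimezone=UTC --username root --password a --target-dir /sqooptest/input4b/project --query 'select id,name,type from project where id>2 and $CONDITIONS' --split-by project.id -m 2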
5. --direct
- Failed here; a pitfall!
sqoop import --connect jdbc:mysql://node3:3306/sqooptest1?serverTimezone=UTC --username root --password a --table project --warehouse-dir /sqooptest/input5 --direct -m 1
--direct uses the mysqldump command to do the import; since the map tasks run on the cluster nodes, every node must have mysqldump installed (see the sketch below)
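- A sketch of installing the client tools on each node (CentOS 7; the package that provides mysqldump may differ on other distributions):
yum install -y mariadb
which mysqldump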
6. Incremental import
- Used to retrieve only rows that are newer than some previously imported rowsets
- Parameters
--check-column: the column Sqoop examines to decide which rows to import
--incremental (append | lastmodified): how Sqoop determines which rows are new
append: append mode, for tables where new rows get ever-increasing values in the check column
lastmodified: last-modified mode, for tables whose check column is a timestamp updated on every change
--last-value: the maximum value of the check column from the previous import
- insert data
insert into project( name,type,description,create_at,status)
values( 'project5',5,'project5 zy','2019-07-25',0);
insert into project( name,type,description,create_at,status)
values( 'project6',5,'project5 zy','2019-07-25',0);
insert into project( name,type,description,create_at,status)
values( 'project7',5,'project5 zy','2019-07-25',0);
insert into project( name,type,description,create_at,status)
values( 'project8',5,'project5 zy','2019-07-25',0);
- command
sqoop import --connect jdbc:mysql://node3:3306/sqooptest1?serverTimezone=UTC --username root --password a --table project --warehouse-dir /sqooptest/input6 -m 1 --check-column id --incremental append --last-value 3
- result
- insert data
insert into project( name,type,description,create_at,status)
values( 'project9',5,'project5 zy','2019-07-25',0);
insert into project( name,type,description,create_at,status)
values( 'project10',5,'project5 zy','2019-07-25',0);
insert into project( name,type,description,create_at,status)
values( 'project11',5,'project5 zy','2019-07-25',0);
insert into project( name,type,description,create_at,status)
values( 'project12',5,'project5 zy','2019-07-25',0);
- Command: note that the output directory has not changed
sqoop import --connect jdbc:mysql://node3:3306/sqooptest1?serverTimezone=UTC --username root --password a --table project --warehouse-dir /sqooptest/input6 -m 1 --check-column id --incremental append --last-value 7
- Result: the output directory in the figure is indeed input6
- Incremental import by last-modified time
sqoop import --connect jdbc:mysql://node3:3306/sqooptest1?serverTimezone=UTC --username root --password a --table project --warehouse-dir /sqooptest/input8 -m 1 --check-column update_at --incremental lastmodified --last-value "2022-06-29 15:46:12" --append
--append: append the imported data to the dataset that already exists in HDFS
- result
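- With --incremental lastmodified, an alternative to --append is --merge-key (assuming id is the primary key; the output directory below is a made-up example): rows modified since the last run then replace the previously imported copies instead of being appended as duplicates. A sketch:
sqoop import --connect jdbc:mysql://node3:3306/sqooptest1?serverTimezone=UTC --username root --password a --table project --target-dir /sqooptest/input8b/project -m 1 --check-column update_at --incremental lastmodified --last-value "2022-06-29 15:46:12" --merge-key id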
sqoop-job
1. Syntax
sqoop job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]
Note the space after the -- separator
2. Create a job that imports the project table from the sqooptest1 database into Hadoop
- The original command
sqoop import --connect jdbc:mysql://node3:3306/sqooptest1?serverTimezone=UTC --username root --password a --table project --target-dir /sqooptest/input9/project -m 1
- Command to create the job
sqoop job --create yc-job1 -- import --connect jdbc:mysql://node3:3306/sqooptest1?serverTimezone=UTC --username root --password a --table project --target-dir /sqooptest/input10/project -m 1
- Possible problems
1. The job already exists; change the job name or delete the original job.
2.Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
The org.json jar is missing; upload java-json.jar to Sqoop's lib directory (a sketch follows below)
- Two ways to view the created job
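- A minimal sketch of the fix, assuming Sqoop is installed at /usr/local/sqoop147 (the path used later in this article) and java-json.jar is in the current directory:
cp java-json.jar /usr/local/sqoop147/lib/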
sqoop job --list
sqoop job --show yc-job1
- Run the job
sqoop job --exec yc-job1
- View the execution results
3. Log in to the database with a password file
- When the jobs above are run, Sqoop prompts for the MySQL password, which blocks unattended execution
- Section 7.2.1 of the official documentation describes how to configure a password file
- Create a hidden password file
echo -n "a" >/root/.mysql.password
chmod 400 /root/.mysql.password
- List all files, including hidden ones
ls -al
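- Optionally, the password file can live on HDFS instead (the paths below are assumptions); without a file:// scheme, --password-file is resolved against the default file system (HDFS):
hdfs dfs -put /root/.mysql.password /user/root/.mysql.password
hdfs dfs -chmod 400 /user/root/.mysql.password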
- Command to create the job
sqoop job --create yc-job2 -- import --connect jdbc:mysql://node3:3306/sqooptest1?serverTimezone=UTC --username root --password-file file:////root/.mysql.password --table project --target-dir /sqooptest/input11/project -m 1
- View the created job
sqoop job --list
sqoop job --show yc-job2
- Run the job
sqoop job --exec yc-job2
- View the execution results
4. Create an append-mode incremental import job
- In MySQL, check the maximum id of the project table
- Insert some new data
insert into project( name,type,description,create_at,status)
values( 'project12',5,'project5 zy','2019-07-25',0);
insert into project( name,type,description,create_at,status)
values( 'project13',5,'project5 zy','2019-07-25',0);
insert into project( name,type,description,create_at,status)
values( 'project14',5,'project5 zy','2019-07-25',0);
insert into project( name,type,description,create_at,status)
values( 'project15',5,'project5 zy','2019-07-25',0);
- Command to create the job
sqoop job --create yc-job3 -- import --connect jdbc:mysql://node3:3306/sqooptest1?serverTimezone=UTC --username root --password-file file:////root/.mysql.password --table project --target-dir /sqooptest/input12/project -m 1 --check-column id --incremental append --last-value 11
- View the created job
sqoop job --list
- Run the job
sqoop job --exec yc-job3
- View the execution results
- Insert some new data
insert into project( name,type,description,create_at,status)
values( 'project16',5,'project5 zy','2019-07-25',0);
insert into project( name,type,description,create_at,status)
values( 'project17',5,'project5 zy','2019-07-25',0);
- Run the job
sqoop job --exec yc-job3
- View the execution results
- ** The output shows that this job keeps a small metastore underneath (by default a local HSQLDB database under ~/.sqoop) which stores the latest id value, so the next run can continue from there. This makes scheduled jobs convenient: no one has to record the last imported value by hand. **
5. Scheduled jobs
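- The saved state can be inspected at any time; sqoop job --show prints the job definition, including the last value recorded for the incremental import:
sqoop job --show yc-job3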
- Three options
1. Oozie or Azkaban frameworks ***
2. Write a timer yourself (the Thread class, java.util.TimerTask, or the Quartz scheduler framework with cron expressions)
3. Use the crontab that ships with CentOS ****
- Implementation of the third option
- Create the scheduled-task script sqoop_incremental.sh in /usr/local/bin
cd /usr/local/bin
vim sqoop_incremental.sh
#! /bin/bash
/usr/local/sqoop147/bin/sqoop job --exec yc-job3>>/usr/local/sqoop147/myjob.out 2>&1 &
# Explanation
# /usr/local/sqoop147/bin/sqoop: full path to the sqoop command, in case PATH is not set for cron
# /usr/local/sqoop147/myjob.out: the command output is appended to myjob.out
# 2>&1: send the error stream to the same log as standard output
# &: run in the background
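- An optional housekeeping step: make the script executable (not strictly required here, since cron invokes it through bash):
chmod +x /usr/local/bin/sqoop_incremental.sh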
- Create the crontab entry
crontab -e
# run every 5 minutes
*/5 * * * * /usr/bin/bash /usr/local/bin/sqoop_incremental.sh
# Format: minute hour day-of-month month day-of-week command
- Insert some new data
insert into project( name,type,description,create_at,status)
values( 'project18',5,'project5 zy','2019-07-25',0);
- Wait about five minutes
- View the log file: /usr/local/sqoop147/myjob.out
- View the execution results
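- A quick sanity check that the scheduled increments are landing (the target directory comes from yc-job3, the log file from the script above):
tail /usr/local/sqoop147/myjob.out
hdfs dfs -ls /sqooptest/input12/project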