Installation and use of sqoop
2022-07-01 16:34:00 【It addict】
I. Introduction to Sqoop
Purpose: Sqoop is a data exchange tool that moves data in both directions between relational databases such as MySQL or Oracle and HDFS.
How it works: a sqoop command is translated into a MapReduce job, and that job connects to the data sources and performs the transfer.
II. How Sqoop Works

III. Installing Sqoop
Prepare the installation package in advance
sqoop download link
Extraction code: 4tkp
Extract the installation package
Place the installation package in the /software directory and extract it from there into /opt
tar -zxvf sqoop-1.4.6-cdh5.14.2.tar.gz -C /opt
Go to /opt and rename the sqoop directory
cd /opt/
mv sqoop-1.4.6-cdh5.14.2/ sqoop
Configure environment variables
vi /etc/profile
export SQOOP_HOME=/opt/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
Reload the profile so that the change takes effect
source /etc/profile
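A quick way to confirm that the variables took effect is to test whether $SQOOP_HOME/bin is really an entry in PATH. The snippet below is a minimal sketch: the helper name on_path is made up for illustration, and /opt/sqoop is simply the install location used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Sqoop): succeeds if directory $1 is an entry in $PATH.
on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Fall back to the install path used in this article if SQOOP_HOME is not set.
SQOOP_HOME="${SQOOP_HOME:-/opt/sqoop}"

if on_path "$SQOOP_HOME/bin"; then
  echo "$SQOOP_HOME/bin is on PATH"
else
  echo "$SQOOP_HOME/bin is missing from PATH; re-check /etc/profile"
fi
```

Wrapping PATH in colons before matching avoids false positives from directories that merely share a prefix, such as /opt/sqoop/bin2.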
Modify the configuration file
cd /opt/sqoop/conf
mv sqoop-env-template.sh sqoop-env.sh
vi sqoop-env.sh
export HADOOP_COMMON_HOME=/opt/hadoop
export HADOOP_MAPRED_HOME=/opt/hadoop
export HIVE_HOME=/opt/hive
export ZOOKEEPER_HOME=/opt/zookeeper
export ZOOCFGDIR=/opt/zookeeper
export HBASE_HOME=/opt/hbase
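Copy the two prepared JAR files into the /opt/sqoop/lib directory
Verify the installation
sqoop help
If the list of available commands is printed, the installation succeeded
Running sqoop 1.4.5 may report a warning of the form "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
To silence it, go into the bin directory and edit configure-sqoop
cd /opt/sqoop/bin
vi configure-sqoop
Comment out the following lines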
## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
# echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
# echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi
#if [ ! -d "${ACCUMULO_HOME}" ]; then
# echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
# echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
IV. Using Sqoop
1. MySQL -> HDFS
Prepare the SQL script and put it in a directory you know (here /tmp)
Preparation: create the database and tables in MySQL
mysql> create database sqoop;
mysql> use sqoop;
mysql> source /tmp/retail_db.sql
mysql> show tables;

Use sqoop to import the customers table into HDFS
Annotated options (the text after # is explanation only; run the single-line command that follows):
sqoop import
--connect jdbc:mysql://localhost:3306/sqoop    # MySQL database to read from
--driver com.mysql.jdbc.Driver                 # JDBC driver class
--table customers                              # MySQL table
--username root                                # MySQL user name
--password root                                # password
--target-dir /tmp/customers                    # target HDFS path
-m 3                                           # number of map tasks
sqoop import --connect jdbc:mysql://localhost:3306/sqoop --driver com.mysql.jdbc.Driver --table customers --username root --password root --target-dir /tmp/customers -m 3
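The -m 3 option asks for three map tasks, each of which imports one slice of the table. Roughly speaking, Sqoop looks up the minimum and maximum of the split column (the primary key by default) and divides that range among the mappers. The sketch below imitates the idea for an integer column. It is an illustration under simplifying assumptions, not Sqoop's actual splitter, and both the function name split_ranges and the column name id are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified illustration only: prints one WHERE condition per mapper
# for an integer split column. Arguments: column min max mappers
split_ranges() {
  local col=$1 min=$2 max=$3 mappers=$4
  # Slice width, rounded up: ceil((max - min + 1) / mappers)
  local size=$(( (max - min + mappers) / mappers ))
  local lo=$min hi i
  for (( i = 0; i < mappers && lo <= max; i++ )); do
    hi=$(( lo + size - 1 ))
    if (( hi > max )); then
      hi=$max
    fi
    echo "$col >= $lo AND $col <= $hi"
    lo=$(( hi + 1 ))
  done
}

split_ranges id 1 9 3
```

With the values above, the three conditions cover ids 1 to 3, 4 to 6, and 7 to 9. If the split column is heavily skewed, the slices hold very different row counts, which is one reason to choose the split column with care.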
Filter rows with where
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--table orders \
--where "order_id<500" \
--username root \
--password root \
--target-dir /data1/retail_db/orders \
-m 3
Filter columns with columns
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop1 \
--driver com.mysql.jdbc.Driver \
--table emp \
--columns "EMPNO,ENAME,JOB,HIREDATE" \
--where "SAL>2000" \
--username root \
--password root \
--delete-target-dir \
--target-dir /data1/sqoop1/emp \
-m 3
Import with a query statement (escape $CONDITIONS inside double quotes so that the shell passes it to sqoop unchanged)
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--query "select * from orders where order_status!='CLOSED' and \$CONDITIONS" \
--username root \
--password root \
--split-by order_id \
--delete-target-dir \
--target-dir /data1/retail_db/orders \
-m 3
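In a free-form query, sqoop replaces the literal token $CONDITIONS with each mapper's split condition, so the token has to reach sqoop untouched. Inside double quotes the shell would expand it as a variable first. The short demonstration below shows the difference; the variable names broken and escaped are made up for this sketch, and it only prints strings without calling sqoop.

```shell
#!/usr/bin/env bash
# In a typical shell CONDITIONS is unset, which expands the same way as empty.
CONDITIONS=""

# Unescaped inside double quotes: the shell substitutes an empty string.
broken="select * from orders where order_status!='CLOSED' and $CONDITIONS"

# Escaped inside double quotes: the literal token survives for sqoop to replace.
escaped="select * from orders where order_status!='CLOSED' and \$CONDITIONS"

printf 'unescaped: %s\n' "$broken"
printf 'escaped:   %s\n' "$escaped"
```

Wrapping the whole query in single quotes would also keep the token intact, but then the string literals inside the SQL need double quotes or other quoting.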
Incremental import in append mode
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--table orders \
--username root \
--password root \
--incremental append \
--check-column order_date \
--last-value '2014-07-24 00:00:00' \
--target-dir /data1/retail_db/orders \
-m 3
2. Create a job
When creating a job, note that there must be a space between -- and import
sqoop job \
--create mysqlToHdfs \
-- import \
--connect jdbc:mysql://localhost:3306/sqoop \
--table orders \
--username root \
--password root \
--incremental append \
--check-column order_date \
--last-value '0' \
--target-dir /data1/retail_db/orders \
-m 3
List jobs
sqoop job --list
Execute the job
sqoop job --exec mysqlToHdfs
Scheduled execution
crontab -e
* 2 */1 * * sqoop job --exec mysqlToHdfs
With * in the minute field this entry fires every minute from 02:00 to 02:59 each day; put 0 in the first field to run the job once at 02:00.
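One practical point about the crontab entry above: cron starts commands with a minimal environment and does not read /etc/profile, so SQOOP_HOME and the extended PATH may be missing when the job fires. A small wrapper that loads the profile and keeps a log is one way to handle this. The sketch below is an assumption-laden example, not part of Sqoop: the function name run_logged, the SQOOP_LOG variable, and the default log path are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper script for cron.
# run_logged LOGFILE COMMAND [ARGS...]
# Appends a start line, the command's output, and its exit code to LOGFILE.
run_logged() {
  local log=$1 rc=0
  shift
  {
    echo "[$(date '+%F %T')] start: $*"
    "$@" || rc=$?
    echo "[$(date '+%F %T')] exit code: $rc"
  } >>"$log" 2>&1
  return "$rc"
}

# When called with a command, load the login profile first so that
# SQOOP_HOME and PATH match an interactive session, then run the command.
if [ "$#" -gt 0 ]; then
  if [ -r /etc/profile ]; then
    . /etc/profile
  fi
  run_logged "${SQOOP_LOG:-/tmp/sqoop_job.log}" "$@"
fi
```

If this were saved as, say, /opt/sqoop/run_job.sh (a made-up location) and made executable, the crontab line could become: 0 2 * * * /opt/sqoop/run_job.sh sqoop job --exec mysqlToHdfs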
3. Import data into Hive
First create the database in Hive
hive -e "create database if not exists retail_db;"
If the target path already exists, the import reports an error, so delete the existing directory first
hdfs dfs -rm -r hdfs://hadoop1:9000/user/root/orders1
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--table orders \
--username root \
--password root \
--hive-import \
--create-hive-table \
--hive-database retail_db \
--hive-table orders1 \
-m 3
Import data into a Hive partition
Drop the Hive table
drop table if exists orders;
Import
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--query "select order_id,order_status from orders where order_date>='2013-11-03' and order_date<'2013-11-04' and \$CONDITIONS" \
--username root \
--password ok \
--delete-target-dir \
--target-dir /data1/retail_db/orders \
--split-by order_id \
--hive-import \
--hive-database retail_db \
--hive-table orders \
--hive-partition-key "order_date" \
--hive-partition-value "2013-11-03" \
-m 3
Note: the partition column cannot also be imported into the table as an ordinary column, which is why the query does not select order_date
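Because the date appears in both the query and the partition value, it is easy for the two to drift apart when the command is edited by hand for another day. One option is to build the arguments from a single variable. The sketch below does that and only prints the resulting command as a dry run; the helper name partition_query and the variables day, next, and args are made up for illustration, while the option values are the ones used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the free-form query for one day of orders.
# $1 = first day (inclusive), $2 = following day (exclusive), both as YYYY-MM-DD.
partition_query() {
  printf "select order_id,order_status from orders where order_date>='%s' and order_date<'%s' and \$CONDITIONS" "$1" "$2"
}

day=2013-11-03
next=2013-11-04

# Collect the options in an array so that the query and the
# partition value always refer to the same day.
args=(
  import
  --connect jdbc:mysql://localhost:3306/sqoop
  --driver com.mysql.jdbc.Driver
  --query "$(partition_query "$day" "$next")"
  --username root
  --password ok
  --delete-target-dir
  --target-dir /data1/retail_db/orders
  --split-by order_id
  --hive-import
  --hive-database retail_db
  --hive-table orders
  --hive-partition-key order_date
  --hive-partition-value "$day"
  -m 3
)

# Dry run: print the command, shell-quoted, instead of executing it.
printf 'sqoop'
printf ' %q' "${args[@]}"
printf '\n'
```

To actually run the import, the last three lines would be replaced with: sqoop "${args[@]}"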
4. Import data into HBase
1. Create the table in HBase
create 'products','data','category'
2. Import with sqoop
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--username root \
--password ok \
--table products \
--hbase-table products \
--column-family data \
-m 3
5. Export data from HDFS to MySQL
1. Create the table in MySQL
create table customers_demo as select * from customers where 1=2;
2. Upload data
hdfs dfs -mkdir /customerinput
hdfs dfs -put customers.csv /customerinput
3. Export the data
sqoop export \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table customers_demo \
--export-dir /customerinput \
-m 1
6. Write a sqoop script
1. Write the options file job_01.opt, with each option and each value on its own line
import
--connect
jdbc:mysql://localhost:3306/sqoop
--driver
com.mysql.jdbc.Driver
--table
customers
--username
root
--password
root
--target-dir
/data/retail_db/customers
--delete-target-dir
-m
3
2. Execute the script
sqoop --options-file job_01.opt
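Sqoop's documented options-file examples put every option and every value on a separate line, and a line such as "--table customers" is likely to be rejected as one unrecognized argument. A small lint step before running the file can catch this. The check below is a rough sketch under that assumption: the function name check_options_file is hypothetical, and the pattern only looks for an option name followed by a value on the same line.

```shell
#!/usr/bin/env bash
# Hypothetical lint helper (not part of Sqoop).
# Prints every line of file $1 that holds an option and a value together,
# and returns 1 if any such line exists, 0 otherwise.
check_options_file() {
  if grep -nE '^[[:space:]]*--?[A-Za-z][A-Za-z0-9-]*[[:space:]]+[^[:space:]]' "$1"; then
    return 1
  fi
  return 0
}

# Check the file from this article if it is present in the current directory.
if [ -f job_01.opt ]; then
  if check_options_file job_01.opt; then
    echo "job_01.opt: one token per line"
  else
    echo "job_01.opt: split the lines listed above into option and value"
  fi
fi
```

Lines that start with # are comments in an options file and are not flagged, since the pattern only matches lines that begin with a dash.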
