Exploring the native incremental import feature of Sqoop 1.4.4
2022-07-03 12:37:00 【Brother Xing plays with the clouds】
Original idea
Incremental import can be implemented entirely without Sqoop's native incremental feature: a shell script computes a fixed time range from the current time and then assembles the corresponding Sqoop command.
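The script-based approach can be sketched roughly as follows. This is only an illustration under stated assumptions: the connection string, user, table, column, and target directory are placeholders rather than values from a real environment, the window is assumed to be the last complete hour in UTC, and `date -d "@..."` is GNU date syntax. The function only prints the command so it can be inspected before being run.

```shell
#!/usr/bin/env bash
# Sketch: derive a fixed one-hour window from a reference time and print
# (not run) a matching Sqoop import command. All settings are placeholders.

JDBC_URL="jdbc:oracle:thin:@//dbhost:1521/orcl"   # hypothetical
DB_USER="etl_user"                                # hypothetical
TABLE="SOME_TABLE"                                # hypothetical
TIME_COLUMN="LASTMODIFIED"                        # hypothetical
TARGET_DIR="/tmp/some_table_increment"            # hypothetical

# build_import_cmd EPOCH_SECONDS
# Prints a sqoop import command covering the last complete hour before EPOCH_SECONDS.
build_import_cmd() {
  local now="$1"
  local upper=$(( now - now % 3600 ))   # start of the current hour (exclusive bound)
  local lower=$(( upper - 3600 ))       # start of the previous hour (inclusive bound)
  local fmt='+%Y-%m-%d %H:%M:%S'
  local ora_fmt='YYYY-MM-DD HH24:MI:SS'
  local lo hi where
  lo=$(date -u -d "@${lower}" "$fmt")   # GNU date; times are rendered in UTC
  hi=$(date -u -d "@${upper}" "$fmt")
  where="${TIME_COLUMN} >= TO_DATE('${lo}', '${ora_fmt}') AND ${TIME_COLUMN} < TO_DATE('${hi}', '${ora_fmt}')"
  printf 'sqoop import --connect %s --username %s -P --table %s --where "%s" --target-dir %s --append\n' \
    "$JDBC_URL" "$DB_USER" "$TABLE" "$where" "$TARGET_DIR"
}

# Print the command for the current time.
build_import_cmd "$(date +%s)"
```

Because each window is fixed by the clock rather than by the previous run, the script itself has to guarantee that no window is skipped or imported twice; the native feature described next takes over that bookkeeping.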
Introduction to the native incremental import feature
Sqoop provides a native incremental import feature, which is controlled by the following three key arguments:
| Argument | Description |
|---|---|
| --check-column (col) | Specifies the "check column" used to determine which rows fall within the incremental import range. The column must not be a character type; a numeric or date column is the best choice. |
| --incremental (mode) | Specifies the incremental mode: append ("append mode") or lastmodified ("last-modified mode", which suits common requirements better). |
| --last-value (value) | Specifies the upper bound of the check column from the previous import. If the check column holds a last-modified time, --last-value is the time at which the import was last executed. |
Combined with the Saved Jobs mechanism, the --last-value field is updated automatically each time an incremental job is executed. Scheduling the job with cron or Oozie then yields a true incremental update.
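As a rough sketch of the cron side, a small wrapper like the one below could run a saved job and skip a run when the previous one is still in progress. The names `SQOOP_BIN`, `LOCK_DIR`, and `run_saved_job`, as well as the paths in the example crontab line, are assumptions made for illustration; only `sqoop job --exec` comes from Sqoop itself.

```shell
#!/usr/bin/env bash
# Sketch of a wrapper that cron could call to run a saved Sqoop job.
# SQOOP_BIN and LOCK_DIR are hypothetical settings; adjust to your installation.

SQOOP_BIN="${SQOOP_BIN:-sqoop}"
LOCK_DIR="${LOCK_DIR:-/tmp}"

# run_saved_job JOB_NAME
# Runs "sqoop job --exec JOB_NAME" unless a previous run still holds the lock.
run_saved_job() {
  local job="$1"
  local lock="${LOCK_DIR}/sqoop-${job}.lock"
  # mkdir either creates the directory or fails, so it serves as a simple lock.
  if ! mkdir "$lock" 2>/dev/null; then
    echo "job ${job} is already running, skipping" >&2
    return 1
  fi
  "$SQOOP_BIN" job --exec "$job"
  local rc=$?
  rmdir "$lock"
  return "$rc"
}

# Example crontab line (assumed paths), running the job at the top of every hour:
#   0 * * * * /path/to/run_saved_job.sh incretest >> /tmp/incretest.log 2>&1
if [ "$#" -gt 0 ]; then
  run_saved_job "$1"
fi
```

Note that by default Sqoop does not store passwords in its metastore, so an unattended run may stop at a password prompt; the Sqoop user guide describes the `sqoop.metastore.client.record.password` setting for this case.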
Experiment: creating and executing an incremental job
Create the incremental update job:
~/Sqoop/sqoop-1.4.4/bin$ sqoop job --create incretest -- import --connect jdbc:oracle:thin:@192.168.0.138:1521:orcl --username HIVE --password hivefbi --table FBI_SQOOPTEST --hive-import --hive-table INCRETEST --incremental lastmodified --check-column LASTMODIFIED --last-value '2014/8/27 13:00:00'
14/08/27 17:29:37 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/08/27 17:29:37 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
14/08/27 17:29:37 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
14/08/27 17:29:37 WARN tool.BaseSqoopTool: It seems that you've specified at least one of following:
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --hive-home
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --hive-overwrite
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --create-hive-table
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --hive-table
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --hive-partition-key
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --hive-partition-value
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --map-column-hive
14/08/27 17:29:37 WARN tool.BaseSqoopTool: Without specifying parameter --hive-import. Please note that
14/08/27 17:29:37 WARN tool.BaseSqoopTool: those arguments will not be used in this session. Either
14/08/27 17:29:37 WARN tool.BaseSqoopTool: specify --hive-import to apply them correctly or remove them
14/08/27 17:29:37 WARN tool.BaseSqoopTool: from command line to remove this warning.
14/08/27 17:29:37 INFO tool.BaseSqoopTool: Please note that --hive-home, --hive-partition-key,
14/08/27 17:29:37 INFO tool.BaseSqoopTool: hive-partition-value and --map-column-hive options are
14/08/27 17:29:37 INFO tool.BaseSqoopTool: are also valid for HCatalog imports and exports
Execute the job:
~/Sqoop/sqoop-1.4.4/bin$ ./sqoop job --exec incretest
Note the SQL statement that appears in the log:
14/08/27 17:36:23 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM FBI_SQOOPTEST WHERE ( LASTMODIFIED >= TO_DATE('2014/8/27 13:00:00', 'YYYY-MM-DD HH24:MI:SS') AND LASTMODIFIED < TO_DATE('2014-08-27 17:36:23', 'YYYY-MM-DD HH24:MI:SS') )
Here, the lower bound of LASTMODIFIED is the value specified in the job creation statement, and the upper bound is the current time, 2014-08-27 17:36:23.
Verification:
hive> select * from incretest;
OK
2 lion 2014-08-27
Time taken: 0.085 seconds, Fetched: 1 row(s)
Then I inserted a new row into the Oracle table.
Execute the job again:
~/Sqoop/sqoop-1.4.4/bin$ ./sqoop job --exec incretest
The SQL statement shown in the log:
14/08/27 17:47:19 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM FBI_SQOOPTEST WHERE ( LASTMODIFIED >= TO_DATE('2014-08-27 17:36:23', 'YYYY-MM-DD HH24:MI:SS') AND LASTMODIFIED < TO_DATE('2014-08-27 17:47:19', 'YYYY-MM-DD HH24:MI:SS') )
Here, the lower bound of LASTMODIFIED is the upper bound from the previous execution of the job. In other words, for incremental import jobs, Sqoop's Saved Jobs mechanism automatically records the time of each execution and assigns it to --last-value for the next one. We therefore only need to schedule the job with crontab; the Saved Jobs mechanism keeps --last-value up to date, which achieves true incremental import.
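One way to check the recorded value between runs is to look at the output of `sqoop job --show`. The helper below is a minimal sketch, assuming the output lists stored options as `key = value` lines and that the last value appears under the key `incremental.last.value`; the function name `extract_last_value` is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: read the stored --last-value of a saved job from "sqoop job --show".
# Assumes the output contains a line of the form "incremental.last.value = ...".

# extract_last_value
# Reads "sqoop job --show" output on stdin and prints only the stored last value.
extract_last_value() {
  sed -n 's/^[[:space:]]*incremental\.last\.value[[:space:]]*=[[:space:]]*//p'
}

# Usage (requires a working Sqoop installation and may prompt for the password):
#   sqoop job --show incretest | extract_last_value
```

If the printed value matches the upper bound of the most recent BoundingValsQuery in the log, the saved job has picked up where the previous run ended.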
The row newly added to the Oracle table was successfully inserted into the Hive table.
I then added another row to the Oracle table and executed the job again. The behavior was the same: the log shows that the previous upper bound automatically became the lower bound of this import:
14/08/27 17:59:34 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM FBI_SQOOPTEST WHERE ( LASTMODIFIED >= TO_DATE('2014-08-27 17:47:19', 'YYYY-MM-DD HH24:MI:SS') AND LASTMODIFIED < TO_DATE('2014-08-27 17:59:34', 'YYYY-MM-DD HH24:MI:SS') )
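The three log lines form a chain of half-open windows, each one starting where the previous one ended. The short sketch below only replays that bookkeeping with the timestamps from the logs; `print_windows` is a name invented for this illustration and does not call Sqoop.

```shell
#!/usr/bin/env bash
# Sketch: replay how each run's upper bound becomes the next run's lower bound.

# print_windows INITIAL_LAST_VALUE RUN_TIME...
# Prints one half-open window "[lower, upper)" per run.
print_windows() {
  local lower="$1"
  shift
  local upper
  for upper in "$@"; do
    printf '[%s, %s)\n' "$lower" "$upper"
    lower="$upper"   # this run's upper bound is the next run's lower bound
  done
}

# Initial --last-value followed by the three execution times seen in the logs.
print_windows '2014/8/27 13:00:00' \
  '2014-08-27 17:36:23' '2014-08-27 17:47:19' '2014-08-27 17:59:34'
```

Because the lower bound is inclusive (>=) and the upper bound is exclusive (<), a row whose LASTMODIFIED falls exactly on a boundary is picked up by exactly one run.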