Exploring the native incremental import feature of Sqoop 1.4.4
2022-07-03 12:37:00 【Brother Xing plays with the clouds】
Original idea
Incremental import can be implemented without Sqoop's native incremental feature at all: a shell script computes a fixed time range from the current time and then assembles the Sqoop command.
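A minimal sketch of that script-only approach, assuming GNU date and a one-hour window aligned to full hours. The helper name build_sqoop_cmd and the window size are assumptions made for illustration; the connection string, user, and table names are borrowed from the experiment later in this post. The script only prints the command instead of running it, so the result can be reviewed first:

```shell
#!/usr/bin/env bash
# Sketch: build a Sqoop import command for a time window computed by the script.
# build_sqoop_cmd and the one-hour window are illustrative assumptions.

# Print a sqoop import command selecting rows with lower <= LASTMODIFIED < upper.
build_sqoop_cmd() {
  local lower="$1" upper="$2"
  local fmt="'YYYY-MM-DD HH24:MI:SS'"
  local where="LASTMODIFIED >= TO_DATE('${lower}', ${fmt}) AND LASTMODIFIED < TO_DATE('${upper}', ${fmt})"
  printf 'sqoop import --connect %s --username %s -P --table %s --hive-import --hive-table %s --where "%s"\n' \
    'jdbc:oracle:thin:@192.168.0.138:1521:orcl' 'HIVE' 'FBI_SQOOPTEST' 'INCRETEST' "$where"
}

# Window: from the previous full hour up to the current full hour (GNU date syntax).
upper="$(date '+%Y-%m-%d %H:00:00')"
lower="$(date -d '1 hour ago' '+%Y-%m-%d %H:00:00')"

build_sqoop_cmd "$lower" "$upper"
```

The drawback of this approach is that the window depends only on the clock: if a scheduled run is skipped or delayed, the rows from that window are never picked up unless the script keeps its own record of the last successful bound.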
Introduction to the native incremental import feature
Sqoop provides a native incremental import feature, controlled by three key arguments:
Argument | Description |
---|---|
--check-column (col) | Specifies the check column used to determine which rows fall within the incremental import range. The column cannot be a character type; a numeric or date/time column works best. |
--incremental (mode) | Specifies the incremental mode: append or lastmodified (the latter suits most common requirements better). |
--last-value (value) | Specifies the upper bound of the check column from the previous import. If the check column holds a last-modified timestamp, --last-value is the time at which the import was last executed. |
Combined with the saved jobs mechanism, --last-value is updated automatically every time the incremental job is re-executed. Scheduled with cron or Oozie, this yields a truly incremental update.
Experiment: creating and executing an incremental job
Create the incremental job:
~/Sqoop/sqoop-1.4.4/bin$ sqoop job --create incretest -- import --connect jdbc:oracle:thin:@192.168.0.138:1521:orcl --username HIVE --password hivefbi --table FBI_SQOOPTEST --hive-import --hive-table INCRETEST --incremental lastmodified --check-column LASTMODIFIED --last-value '2014/8/27 13:00:00'
14/08/27 17:29:37 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/08/27 17:29:37 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
14/08/27 17:29:37 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
14/08/27 17:29:37 WARN tool.BaseSqoopTool: It seems that you've specified at least one of following:
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --hive-home
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --hive-overwrite
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --create-hive-table
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --hive-table
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --hive-partition-key
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --hive-partition-value
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --map-column-hive
14/08/27 17:29:37 WARN tool.BaseSqoopTool: Without specifying parameter --hive-import. Please note that
14/08/27 17:29:37 WARN tool.BaseSqoopTool: those arguments will not be used in this session. Either
14/08/27 17:29:37 WARN tool.BaseSqoopTool: specify --hive-import to apply them correctly or remove them
14/08/27 17:29:37 WARN tool.BaseSqoopTool: from command line to remove this warning.
14/08/27 17:29:37 INFO tool.BaseSqoopTool: Please note that --hive-home, --hive-partition-key,
14/08/27 17:29:37 INFO tool.BaseSqoopTool: hive-partition-value and --map-column-hive options are
14/08/27 17:29:37 INFO tool.BaseSqoopTool: are also valid for HCatalog imports and exports
Execute the job:
~/Sqoop/sqoop-1.4.4/bin$ ./sqoop job --exec incretest
Note the SQL statement that appears in the log:
14/08/27 17:36:23 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM FBI_SQOOPTEST WHERE ( LASTMODIFIED >= TO_DATE('2014/8/27 13:00:00', 'YYYY-MM-DD HH24:MI:SS') AND LASTMODIFIED < TO_DATE('2014-08-27 17:36:23', 'YYYY-MM-DD HH24:MI:SS') )
Here the lower bound of LASTMODIFIED is the value specified in the job-creation statement, and the upper bound is the current time, 2014-08-27 17:36:23.
Verify:
hive> select * from incretest;
OK
2 lion 2014-08-27
Time taken: 0.085 seconds, Fetched: 1 row(s)
Then I inserted a new row into the Oracle table and executed the job again:
~/Sqoop/sqoop-1.4.4/bin$ ./sqoop job --exec incretest
The SQL statement shown in the log:
14/08/27 17:47:19 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM FBI_SQOOPTEST WHERE ( LASTMODIFIED >= TO_DATE('2014-08-27 17:36:23', 'YYYY-MM-DD HH24:MI:SS') AND LASTMODIFIED < TO_DATE('2014-08-27 17:47:19', 'YYYY-MM-DD HH24:MI:SS') )
Here the lower bound of LASTMODIFIED is the upper bound of the previous execution of this job. In other words, for incremental import jobs, Sqoop's saved jobs mechanism automatically records the upper bound of each execution and assigns it to --last-value for the next one. We therefore only need to schedule this job with crontab: the saved jobs mechanism keeps --last-value up to date, which achieves a truly incremental import.
The row newly added to the Oracle table above was successfully inserted into the Hive table.
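The bookkeeping observed in the two runs above can be mimicked with a toy script. This is only a model of the behaviour seen in the logs, not Sqoop's implementation: the state file and the simulate_run helper are hypothetical, and the timestamps are copied from the log lines.

```shell
#!/usr/bin/env bash
# Toy model of how the stored --last-value advances between runs.
# It only mimics the bookkeeping seen in the logs; it is not Sqoop's code.

state_file="$(mktemp)"
echo '2014/8/27 13:00:00' > "$state_file"   # initial --last-value from job creation

# Simulate one run at time "$1": print the import window, then store the
# upper bound so that it becomes the lower bound of the next run.
simulate_run() {
  local now="$1" last
  last="$(cat "$state_file")"
  echo "LASTMODIFIED >= '${last}' AND LASTMODIFIED < '${now}'"
  echo "$now" > "$state_file"
}

simulate_run '2014-08-27 17:36:23'   # first execution
simulate_run '2014-08-27 17:47:19'   # second execution
rm -f "$state_file"
```

For the real job, the stored value can usually be inspected with sqoop job --show incretest; to my knowledge it is listed there as the incremental.last.value property.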
After adding another row to the Oracle table and executing the job once more, the behaviour is the same: the log shows that the previous upper bound automatically becomes the lower bound of this import:
14/08/27 17:59:34 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM FBI_SQOOPTEST WHERE ( LASTMODIFIED >= TO_DATE('2014-08-27 17:47:19', 'YYYY-MM-DD HH24:MI:SS') AND LASTMODIFIED < TO_DATE('2014-08-27 17:59:34', 'YYYY-MM-DD HH24:MI:SS') )
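Scheduling the saved job with crontab could look roughly like the wrapper below. This is a sketch under assumptions: the script path and hourly schedule in the comment, the SQOOP_BIN variable, and the run_saved_job helper are all hypothetical. Also note that, as far as I know, a saved job does not store the database password by default, so an unattended run may need --password-file at job creation or the sqoop.metastore.client.record.password setting in sqoop-site.xml.

```shell
#!/usr/bin/env bash
# Sketch of a cron wrapper for the saved job "incretest".
# Hypothetical crontab entry (path and schedule are assumptions):
#   0 * * * * /path/to/run_incretest.sh >> /tmp/incretest.log 2>&1

SQOOP_BIN="${SQOOP_BIN:-sqoop}"   # override if sqoop is not on cron's PATH

# Execute a saved Sqoop job by name and log the outcome with a timestamp.
run_saved_job() {
  local job="$1"
  if ! command -v "$SQOOP_BIN" >/dev/null 2>&1; then
    echo "$(date '+%F %T') cannot find $SQOOP_BIN" >&2
    return 127
  fi
  if "$SQOOP_BIN" job --exec "$job"; then
    echo "$(date '+%F %T') job $job finished"
  else
    echo "$(date '+%F %T') job $job failed" >&2
    return 1
  fi
}

run_saved_job incretest || echo "incremental import did not complete" >&2
```

Because cron starts with a minimal environment, pointing SQOOP_BIN at the full path of the sqoop launcher (for example the bin directory used in this experiment) is usually safer than relying on PATH.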