Exploring the native incremental import feature of Sqoop 1.4.4
2022-07-03 12:37:00 【Brother Xing plays with the clouds】
Original idea
Incremental import can be implemented entirely without Sqoop's native incremental feature: a shell script generates a fixed time range based on the current time and then concatenates it into a Sqoop command.
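A minimal sketch of that script-only approach, as a dry run that prints the command instead of executing it. The connection string, table, and column names are borrowed from the experiment later in this article. The helper name build_window_import, the one-hour window, and the use of GNU date are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Dry-run sketch: build a Sqoop import command for a fixed time window.
# build_window_import is a hypothetical helper, not part of Sqoop.

build_window_import() {
  local start="$1" end="$2"
  local fmt="YYYY-MM-DD HH24:MI:SS"
  # Half-open window [start, end) on the LASTMODIFIED column.
  local where="LASTMODIFIED >= TO_DATE('${start}', '${fmt}') AND LASTMODIFIED < TO_DATE('${end}', '${fmt}')"
  # -P prompts for the password; an unattended run needs another way to supply it.
  printf '%s\n' "sqoop import --connect jdbc:oracle:thin:@192.168.0.138:1521:orcl --username HIVE -P --table FBI_SQOOPTEST --hive-import --hive-table INCRETEST --where \"${where}\""
}

# Window: the previous full hour up to the current full hour (GNU date assumed).
end="$(date '+%Y-%m-%d %H:00:00')"
start="$(date -d '1 hour ago' '+%Y-%m-%d %H:00:00')"

# Print the command rather than run it, so the window can be checked first.
build_window_import "$start" "$end"
```

With this approach the script owns the window boundaries, so a skipped or failed run leaves a gap unless the script also records which windows have been loaded.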
Introduction to native incremental import features
Sqoop provides a native incremental import feature, controlled by the following three key arguments:
| Argument | Description |
|---|---|
| --check-column (col) | Specifies a "check column" used to determine the data range of the incremental import. The column cannot be a character type; a numeric or date/time column is the natural choice. |
| --incremental (mode) | Specifies the incremental mode: append ("append mode") or lastmodified ("last-modified mode"). The latter fits common requirements better. |
| --last-value (value) | Specifies the upper bound of the check column from the previous import. If the check column is a last-modified timestamp, --last-value is the time the import was last executed. |
Combined with the Saved Jobs mechanism, the --last-value field is updated automatically each time the incremental job is run. Add time-based scheduling with cron or Oozie and you get a true incremental update.
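The scheduling side can be sketched as a small wrapper that cron calls. Only the sqoop job --exec invocation comes from this article. The SQOOP_BIN variable, the run_incremental_job function, the script path, and the hourly schedule are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper meant to be called from cron.

SQOOP_BIN="${SQOOP_BIN:-sqoop}"   # override to point at a specific install

run_incremental_job() {
  local job="$1"
  # Running the saved job lets Sqoop supply and then update --last-value itself.
  "$SQOOP_BIN" job --exec "$job"
}

# Example crontab entry (assumed path), running at the top of every hour:
#   0 * * * * /path/to/run_incretest.sh >> /tmp/incretest.log 2>&1
# where run_incretest.sh sources this file and calls: run_incremental_job incretest
```

One thing to check before relying on cron: by default a saved job may prompt for the database password on each execution unless the metastore is configured to record it, which would block an unattended run.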
Experiment: creating and executing an incremental job
Create the incremental update job:
~/Sqoop/sqoop-1.4.4/bin$ sqoop job --create incretest -- import --connect jdbc:oracle:thin:@192.168.0.138:1521:orcl --username HIVE --password hivefbi --table FBI_SQOOPTEST --hive-import --hive-table INCRETEST --incremental lastmodified --check-column LASTMODIFIED --last-value '2014/8/27 13:00:00'
14/08/27 17:29:37 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/08/27 17:29:37 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
14/08/27 17:29:37 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
14/08/27 17:29:37 WARN tool.BaseSqoopTool: It seems that you've specified at least one of following:
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --hive-home
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --hive-overwrite
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --create-hive-table
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --hive-table
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --hive-partition-key
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --hive-partition-value
14/08/27 17:29:37 WARN tool.BaseSqoopTool: --map-column-hive
14/08/27 17:29:37 WARN tool.BaseSqoopTool: Without specifying parameter --hive-import. Please note that
14/08/27 17:29:37 WARN tool.BaseSqoopTool: those arguments will not be used in this session. Either
14/08/27 17:29:37 WARN tool.BaseSqoopTool: specify --hive-import to apply them correctly or remove them
14/08/27 17:29:37 WARN tool.BaseSqoopTool: from command line to remove this warning.
14/08/27 17:29:37 INFO tool.BaseSqoopTool: Please note that --hive-home, --hive-partition-key,
14/08/27 17:29:37 INFO tool.BaseSqoopTool: hive-partition-value and --map-column-hive options are
14/08/27 17:29:37 INFO tool.BaseSqoopTool: are also valid for HCatalog imports and exports
Execute the job:
~/Sqoop/sqoop-1.4.4/bin$ ./sqoop job --exec incretest
Note the SQL statement that appears in the log:
14/08/27 17:36:23 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM FBI_SQOOPTEST WHERE ( LASTMODIFIED >= TO_DATE('2014/8/27 13:00:00', 'YYYY-MM-DD HH24:MI:SS') AND LASTMODIFIED < TO_DATE('2014-08-27 17:36:23', 'YYYY-MM-DD HH24:MI:SS') )
Here, the lower bound of LASTMODIFIED is the value specified in the job creation statement, and the upper bound is the current time, 2014-08-27 17:36:23.
Verification:
hive> select * from incretest;
OK
2 lion 2014-08-27
Time taken: 0.085 seconds, Fetched: 1 row(s)
Then I inserted a new row into the Oracle table and executed the job again:
~/Sqoop/sqoop-1.4.4/bin$ ./sqoop job --exec incretest
The SQL statement shown in the log:
14/08/27 17:47:19 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM FBI_SQOOPTEST WHERE ( LASTMODIFIED >= TO_DATE('2014-08-27 17:36:23', 'YYYY-MM-DD HH24:MI:SS') AND LASTMODIFIED < TO_DATE('2014-08-27 17:47:19', 'YYYY-MM-DD HH24:MI:SS') )
Here, the lower bound of LASTMODIFIED is the upper bound from the previous execution of this job. In other words, for incremental import jobs, Sqoop's Saved Jobs mechanism automatically records the last execution time and assigns it to the --last-value parameter of the next execution. So we only need crontab to run this job on a schedule; the job's --last-value is updated automatically by the Saved Jobs mechanism, giving a true incremental import.
The row newly added to the Oracle table above was successfully inserted into the Hive table.
After adding another row to the Oracle table and running the job again, the behavior is the same: the log shows that the previous upper bound automatically became the lower bound of this import:
14/08/27 17:59:34 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM FBI_SQOOPTEST WHERE ( LASTMODIFIED >= TO_DATE('2014-08-27 17:47:19', 'YYYY-MM-DD HH24:MI:SS') AND LASTMODIFIED < TO_DATE('2014-08-27 17:59:34', 'YYYY-MM-DD HH24:MI:SS') )
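The rolling window seen across the three runs can be mimicked with a short simulation. This is only an illustration of the observed behavior, not how Sqoop is implemented: STATE_FILE and next_window are hypothetical, and Sqoop keeps the real value in its own job metastore.

```shell
#!/usr/bin/env bash
# Simulation only: mimics how the saved job rolls its time window forward.
# STATE_FILE and next_window are hypothetical names for this sketch.

STATE_FILE="$(mktemp)"
echo "2014/8/27 13:00:00" > "$STATE_FILE"   # initial --last-value from job creation

next_window() {
  local now="$1"
  local last
  last="$(cat "$STATE_FILE")"
  # Lower bound is the stored value; upper bound is the time of this run.
  echo "LASTMODIFIED >= '${last}' AND LASTMODIFIED < '${now}'"
  # After the run, the upper bound becomes the next lower bound.
  echo "$now" > "$STATE_FILE"
}

# Replay the three executions from the logs above.
next_window "2014-08-27 17:36:23"
next_window "2014-08-27 17:47:19"
next_window "2014-08-27 17:59:34"
```

Because each window is half-open (>= lower, < upper), consecutive runs neither overlap nor leave a gap between them.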