Reading a line-delimited file with Kettle
2022-07-27 03:45:00 【LongJ_ Sir】
Kettle is an open-source ETL tool that supports migrating data from many kinds of data sources. The author last used it N years ago. Recently a batch of text data exported from HDFS arrived, and this tool came to mind again. Text on HDFS is usually stored one record per line; what the author had this time was more than 60 million lines of JSON data, about 20 GB in total. A search around the Internet turned up no description of how to read text made of many such lines, so the author worked it out by hand and is recording it here.
The data format is as follows:
{"type": "Feature", "properties": {},"geometry": {"type": "Polygon","coordinates": [[[-19.441364292429924, 65.52333266640196], [-19.441369748908993, 65.52335128397081], [-19.441456975615232, 65.52334689547278], [-19.441451519136184, 65.5233282779008], [-19.441364292429924, 65.52333266640196]]]}}
{"type": "Feature", "properties": {},"geometry": {"type": "Polygon","coordinates": [[[-22.023537553224426, 64.34071231744036], [-22.023766291888357, 64.34079685621145], [-22.023896688463573, 64.34073070136526], [-22.02366796389063, 64.340646160667], [-22.023537553224426, 64.34071231744036]]]}}
{"type": "Feature", "properties": {},"geometry": {"type": "Polygon","coordinates": [[[-20.53351958043648, 64.21502080919846], [-20.533556158627654, 64.21504083748084], [-20.533638377500047, 64.21501242439349], [-20.53360179930887, 64.21499239609055], [-20.53351958043648, 64.21502080919846]]]}}
This is JSON delimited by line (one object per line), and it needs to be imported into PostgreSQL:
In Kettle, create a new "Transformation", choose "Text file input" from the Input category, and add the files to be extracted.

On the "Content" tab, change the format to Unix. On the "Fields" tab, click "Get Fields" to define the field name.


Then you can preview the records.
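As a rough illustration of what this step amounts to, the sketch below reads a Unix-format text file one line at a time and treats each line as a single record. It is plain Python, not Kettle code; the function name `read_records` and the in-memory sample are assumptions made for the example.

```python
import io
from typing import Iterator, TextIO


def read_records(stream: TextIO) -> Iterator[str]:
    """Yield one record per non-empty line without loading the whole file."""
    for raw_line in stream:
        line = raw_line.rstrip("\n")  # Unix format: records end with "\n"
        if line:
            yield line


# Tiny in-memory stand-in for the real input file (hypothetical content).
sample = io.StringIO('{"type": "Feature"}\n{"type": "Feature"}\n')
records = list(read_records(sample))
print(len(records))
```

Because the generator streams the input, memory use stays flat regardless of file size, which matters for an input of this volume.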

To read the value under a particular key of the JSON, add a "JSON Input" step after it. On the "File" tab, select "Source is defined in a field", and for "Get source from field" choose the field output by the previous step.


On the "Fields" tab, configure the output fields with JSONPath expressions.
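To make the idea concrete, here is a minimal Python sketch of what a simple JSONPath such as `$.geometry.type` selects from one line of this data. The helper `extract`, the output field names `geom_type` and `coordinates`, and the shortened coordinates are assumptions for illustration; the paths configured in the original transformation are not shown in the text, and real JSONPath supports far more than this dotted-key subset.

```python
import json
from typing import Any


def extract(document: Any, path: str) -> Any:
    """Resolve a simple dotted path such as "$.geometry.type".

    Only plain object keys are handled; this is not a full JSONPath engine.
    """
    current = document
    keys = path.lstrip("$").strip(".").split(".")
    for key in filter(None, keys):  # "$" alone yields no keys: return the root
        current = current[key]
    return current


# One record in the same shape as the sample data (coordinates shortened).
line = ('{"type": "Feature", "properties": {}, "geometry": {"type": "Polygon", '
        '"coordinates": [[[-19.44, 65.52], [-19.45, 65.53], [-19.44, 65.52]]]}}')
row = json.loads(line)

# Hypothetical output field names mapped to their paths.
fields = {"geom_type": "$.geometry.type", "coordinates": "$.geometry.coordinates"}
result = {name: extract(row, path) for name, path in fields.items()}
print(result["geom_type"])
```

Each entry in `fields` plays the role of one row in the step's field table: a name for the output column and the path that fills it.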

Next, preview the transformation to check the whole flow.


Then configure an output step to load the data into the database.
With that done, a test run loaded the 65 million lines of text data into PostgreSQL in 37 minutes, which is reasonably fast.
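For a sense of scale, the reported figures can be turned into a throughput estimate. The short calculation below takes the numbers as roughly 65 million lines, about 20 GB, and 37 minutes, and assumes 1 GB = 1000 MB; it is back-of-the-envelope arithmetic, not a benchmark.

```python
rows = 65_000_000   # lines loaded (figure taken from the test run)
minutes = 37        # elapsed time of the test run
data_gb = 20        # approximate input size

seconds = minutes * 60
rows_per_second = rows / seconds
mb_per_second = data_gb * 1000 / seconds  # assuming 1 GB = 1000 MB

print(f"{rows_per_second:,.0f} rows/s, {mb_per_second:.1f} MB/s")
```

That works out to a little over 29,000 rows per second, or about 9 MB of input text per second.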