Kettle reads file split by line
2022-07-27 03:45:00 【LongJ_ Sir】
Kettle is an open-source ETL tool that supports migrating data from many kinds of data sources. I last used it N years ago. Recently I had a batch of text data from HDFS to handle and thought of this tool again. Text on HDFS is usually stored line by line; what I ran into this time was more than 60 million lines of JSON data, about 20 GB in total. After searching around online I could not find a write-up on reading this kind of line-by-line text, so I worked it out myself. Here are my notes.
The data format is as follows:
{"type": "Feature", "properties": {},"geometry": {"type": "Polygon","coordinates": [[[-19.441364292429924, 65.52333266640196], [-19.441369748908993, 65.52335128397081], [-19.441456975615232, 65.52334689547278], [-19.441451519136184, 65.5233282779008], [-19.441364292429924, 65.52333266640196]]]}}
{"type": "Feature", "properties": {},"geometry": {"type": "Polygon","coordinates": [[[-22.023537553224426, 64.34071231744036], [-22.023766291888357, 64.34079685621145], [-22.023896688463573, 64.34073070136526], [-22.02366796389063, 64.340646160667], [-22.023537553224426, 64.34071231744036]]]}}
{"type": "Feature", "properties": {},"geometry": {"type": "Polygon","coordinates": [[[-20.53351958043648, 64.21502080919846], [-20.533556158627654, 64.21504083748084], [-20.533638377500047, 64.21501242439349], [-20.53360179930887, 64.21499239609055], [-20.53351958043648, 64.21502080919846]]]}}
This is JSON delimited by line (one object per line), and it needs to be imported into PostgreSQL.
In Kettle, create a new "Transformation", choose "Text file input" from the Input category, and add the files to be extracted.

On the "Content" tab, change the format to Unix. On the "Fields" tab, click "Get Fields" to define the field name.


You can then preview the records.
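For readers without Kettle at hand, what this step produces can be approximated in a few lines of Python. This is only a sketch of the idea (each line of the file becomes one record with a single string field), not how Kettle works internally. The field name `json_line` and the two-line sample are made up for illustration.

```python
import io


def read_line_records(stream, field_name="json_line"):
    """Yield one single-field record per non-empty line of a text stream.

    `field_name` is a hypothetical name standing in for whatever field
    name was defined on the "Fields" tab.
    """
    for raw in stream:
        # Unix format: each line ends with "\n" only.
        line = raw.rstrip("\n")
        if line:
            yield {field_name: line}


# Tiny in-memory stand-in for the real 20 GB file (abbreviated sample data).
sample = io.StringIO(
    '{"type": "Feature", "properties": {}}\n'
    '{"type": "Feature", "properties": {}}\n'
)

records = list(read_line_records(sample))
print(len(records))
```

Because the function is a generator, it reads the stream line by line instead of loading everything into memory, which matters for a file of this size.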

To read the value under one of the JSON keys, add a "JSON Input" step after it. On the "File" tab, check "Source is defined in a field", and under "Get source from field" select the field output by the previous step.


On the "Fields" tab, configure the fields with JSONPath expressions.
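As a rough illustration of what a JSONPath field does with each line, here is a minimal Python sketch. The paths `$.geometry.type` and `$.geometry.coordinates` are assumptions chosen to match the sample data, not necessarily the ones configured in the transformation. The helper `extract` is hypothetical and handles only simple dotted paths, not full JSONPath.

```python
import json


def extract(json_text, path):
    """Resolve a simple dotted JSONPath such as "$.geometry.type".

    Hypothetical helper: supports only "$.a.b.c" style paths.
    """
    if not path.startswith("$."):
        raise ValueError("only simple '$.a.b' paths are supported in this sketch")
    value = json.loads(json_text)
    for key in path[2:].split("."):
        value = value[key]
    return value


# One line in the same shape as the sample data, with shortened coordinates.
line = (
    '{"type": "Feature", "properties": {}, "geometry": '
    '{"type": "Polygon", "coordinates": '
    "[[[-19.44, 65.52], [-19.45, 65.53], [-19.44, 65.52]]]}}"
)

geometry_type = extract(line, "$.geometry.type")
coordinates = extract(line, "$.geometry.coordinates")
print(geometry_type)
```

Each path configured on the tab corresponds to one call like this per input line, and the result becomes one output field.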

Next, preview the transformation to check the whole flow.


After that, configure an output step to load the data into the database.
That wraps it up. In a test run, loading 65 million lines of text data into PostgreSQL took 37 minutes, which is quite fast.
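To put that figure in perspective, the throughput can be worked out with a little arithmetic. The sketch below assumes the 65 million lines in the test are the same roughly 20 GB data set mentioned at the start, and counts 1 GB as 1024 MB.

```python
rows = 65_000_000   # lines loaded in the test
minutes = 37        # elapsed time reported above
size_gb = 20        # assumed total size of the same data set

seconds = minutes * 60
rows_per_second = rows / seconds
mb_per_second = size_gb * 1024 / seconds

print(f"{rows_per_second:,.0f} rows/s, {mb_per_second:.1f} MB/s")
```

That comes to roughly 29,000 rows per second, or about 9 MB of text per second.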