Oracle explained in one go: splitting a field with multiple separators into multiple rows, splitting multiple rows and columns with multiple separators, and ...
2022-07-07 08:47:00 【They call me technical director】
Contents
I. Splitting a field with multiple separators into multiple rows
3. Handling multiple separators
II. Splitting multiple rows and multiple columns with multiple separators
III. Handling data on the scale of hundreds of millions of hundreds of millions
3. Statistics after turning the multi-row split into single-row splits
Preface:
During a recent big-data requirements analysis I received a "simple" request: measure the workload of our inspectors. When a finished product is tested, 80 to 90 test indicators have to be analyzed to decide whether it qualifies, and only qualified product may leave the factory. Completing all of these test items takes several people working together. For example, testing the main content takes 4 people, testing impurities takes 6, testing particle size takes 5, and so on. Suppose there are 8 inspectors in total. All of them take part in some of the 80 to 90 test items, but there is no way to tell whose workload is saturated and whose is not. So the test items and the inspectors assigned to each are collected through a form and saved in a table. We then only need to work out how many times each person tests each item every day to see how saturated each workload is. Ha ha, isn't this requirement super easy~ However, the nightmare had only just begun ...

I. Splitting a field with multiple separators into multiple rows
First, let's look at how to split one row of data into multiple rows. We use the REGEXP_SUBSTR function together with a suitable regular expression to do the splitting.
To help everyone read the code quickly: [^,] is a negated character class that matches any single character that is not a comma, and "+" means one or more repetitions, so [^,]+ matches one whole item between separators. This suits data such as "1,2,3"; with a stray leading separator, as in ",1,2,3", the empty item is skipped rather than returned. When there are several separators, list all of them inside the brackets, which is what splitting "1,2\3,4,5\6" requires. Note that "|" means "or" only outside brackets; inside [...] it is an ordinary literal character. If you are interested in the details, see the Oracle documentation on regular expressions.
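Before any row generation is involved, it may help to see what the pattern and the fourth argument of REGEXP_SUBSTR do on their own. The following is a minimal sketch; the sample strings are made up for illustration.

```sql
-- Fourth argument = occurrence: pick the 2nd run of non-comma characters.
SELECT REGEXP_SUBSTR('1,2,3', '[^,]+', 1, 2) AS second_item FROM dual;   -- '2'

-- '[^,]+' needs at least one character, so the empty item in '1,,3'
-- is skipped and the 2nd match is '3'.
SELECT REGEXP_SUBSTR('1,,3', '[^,]+', 1, 2) AS second_item FROM dual;    -- '3'

-- Past the last match the function returns NULL.
SELECT REGEXP_SUBSTR('1,2,3', '[^,]+', 1, 4) AS fourth_item FROM dual;   -- NULL
```

The NULL returned past the last match is what a CONNECT BY ... IS NOT NULL condition uses as its stopping point.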
1. The first way to split
Code:
-- Method 1: derive the row count from the number of separators
SELECT REGEXP_SUBSTR('1,2,3', '[^,]+', 1, ROWNUM) AS item
FROM dual
CONNECT BY ROWNUM <= LENGTH('1,2,3') - LENGTH(REGEXP_REPLACE('1,2,3', ',', '')) + 1;
Result:

Analysis:
The query does split '1,2,3' into the 3 rows 1, 2, 3, but attentive readers will notice that this approach can also produce empty rows. The row count is derived from the number of separators, not from the number of matches, so any leading, trailing, or doubled separator yields a NULL row. It is therefore strongly discouraged!
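One way the first method ends up with empty rows can be reproduced with a trailing separator. The sketch below uses a made-up input, '1,2,3,', and contrasts the separator-based count with a match-based count using REGEXP_COUNT (available from Oracle 11g); it is an illustration, not the query from the original post.

```sql
-- Separator-based count: '1,2,3,' contains 3 commas, so 4 rows are generated
-- and the 4th row is NULL because there is no 4th item.
SELECT ROWNUM AS pos,
       REGEXP_SUBSTR('1,2,3,', '[^,]+', 1, ROWNUM) AS item
FROM dual
CONNECT BY ROWNUM <= LENGTH('1,2,3,') - LENGTH(REPLACE('1,2,3,', ',')) + 1;

-- Match-based count: REGEXP_COUNT counts the items themselves,
-- so only the 3 real items come back.
SELECT LEVEL AS pos,
       REGEXP_SUBSTR('1,2,3,', '[^,]+', 1, LEVEL) AS item
FROM dual
CONNECT BY LEVEL <= REGEXP_COUNT('1,2,3,', '[^,]+');
```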
2. The second way
Code:
-- Method 2: keep generating rows until REGEXP_SUBSTR finds no further match
SELECT REGEXP_SUBSTR('1,2,3', '[^,]+', 1, LEVEL) AS item
FROM dual
CONNECT BY REGEXP_SUBSTR('1,2,3', '[^,]+', 1, LEVEL) IS NOT NULL;
Result:

Analysis:
This version is short, and the result matches our expectation: only the 3 rows we need, 1, 2, and 3. This is the recommended way to write it.
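The string literal appears twice in the second method. A minimal variant, assuming the value comes from a single-row source, names it once in a WITH clause; the names `src`, `str`, `pos`, and `item` are illustrative and not from the original post.

```sql
WITH src AS (
  SELECT '1,2,3' AS str FROM dual      -- exactly one source row
)
SELECT LEVEL AS pos,                    -- position of the item in the string
       REGEXP_SUBSTR(str, '[^,]+', 1, LEVEL) AS item
FROM src
CONNECT BY REGEXP_SUBSTR(str, '[^,]+', 1, LEVEL) IS NOT NULL;
```

This form is only safe while `src` returns a single row. With several source rows the same CONNECT BY multiplies rows, which is the problem section II runs into.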
3. Handling multiple separators
Code:
-- Multiple separators: list every separator character inside the brackets
SELECT REGEXP_SUBSTR('1,2\3,4,5\6', '[^,\]+', 1, LEVEL) AS item
FROM dual
CONNECT BY REGEXP_SUBSTR('1,2\3,4,5\6', '[^,\]+', 1, LEVEL) IS NOT NULL;
Result:

Analysis:
We only need to change [^,] to [^,\], that is, list every separator character inside the brackets one after another. No "|" is needed between them: inside a bracket expression "|" is a literal character, so writing [^,|\] would also treat "|" itself as a separator. Is your head buzzing yet? Ha ha, don't worry, there is more buzzing to come.
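One detail worth checking by hand: inside a bracket expression, "|" is an ordinary character rather than the "or" operator, and in Oracle's POSIX-style brackets the backslash is literal as well. The sketch below uses a made-up string containing a pipe to show the difference, then splits the sample string with only its real separators listed.

```sql
-- Inside [...] the '|' is an ordinary character, not "or":
SELECT REGEXP_SUBSTR('a|b,c', '[^,|]+', 1, 1) AS with_pipe,      -- 'a'   ('|' acts as a separator)
       REGEXP_SUBSTR('a|b,c', '[^,]+',  1, 1) AS without_pipe    -- 'a|b' (only ',' separates)
FROM dual;

-- Listing just ',' and '\' is enough for the sample string:
SELECT LEVEL AS pos,
       REGEXP_SUBSTR('1,2\3,4,5\6', '[^,\]+', 1, LEVEL) AS item
FROM dual
CONNECT BY REGEXP_SUBSTR('1,2\3,4,5\6', '[^,\]+', 1, LEVEL) IS NOT NULL;   -- 6 rows: 1 to 6
```

Leaving "|" in the class is harmless as long as the data never contains a pipe, but it silently splits values that do.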

II. Splitting multiple rows and multiple columns with multiple separators
Ha ha, after the section above, ordinary splitting should no longer be a problem, and you probably feel ready to take on anything. That is exactly when the business department tells you that the real data is not that simple: there are multiple columns, each with several separators, and suddenly your little head is not big enough. What to do? Don't worry, let's look at the scenario first.

1. Splitting a single row with multiple columns and multiple separators
Code:
-- Split multiple columns; OR keeps generating rows until every column is exhausted
SELECT REGEXP_SUBSTR('1,2\3,4,5\6', '[^,\]+', 1, LEVEL) AS a,
       REGEXP_SUBSTR('7,8\3,5,9,0,6,4', '[^,\]+', 1, LEVEL) AS a1
FROM dual
CONNECT BY REGEXP_SUBSTR('1,2\3,4,5\6', '[^,\]+', 1, LEVEL) IS NOT NULL
        OR REGEXP_SUBSTR('7,8\3,5,9,0,6,4', '[^,\]+', 1, LEVEL) IS NOT NULL;
Result:

Analysis:
When a single row has several columns to split, the output has as many rows as the column with the most items, and the values are paired by position: [(1,7),(2,8),(3,3),(4,5),(5,9),(6,0),('',6),('',4)]. To show all the data, the CONNECT BY clause joins the per-column conditions with OR. With 3...n columns, add one more OR condition per column.
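With many columns the chain of OR conditions grows long. An alternative sketch, assuming REGEXP_COUNT is available (Oracle 11g and later), bounds LEVEL by the largest item count instead; the names `src`, `a_item`, and `a1_item` are illustrative.

```sql
WITH src AS (
  SELECT '1,2\3,4,5\6'     AS a,
         '7,8\3,5,9,0,6,4' AS a1
  FROM dual
)
SELECT LEVEL AS pos,
       REGEXP_SUBSTR(a,  '[^,\]+', 1, LEVEL) AS a_item,    -- NULL once a runs out (rows 7, 8)
       REGEXP_SUBSTR(a1, '[^,\]+', 1, LEVEL) AS a1_item
FROM src
-- 6 items in a, 8 items in a1, so 8 rows are generated
CONNECT BY LEVEL <= GREATEST(REGEXP_COUNT(a,  '[^,\]+'),
                             REGEXP_COUNT(a1, '[^,\]+'));
```

If a column can be NULL, REGEXP_COUNT returns NULL and so does GREATEST, so each count would need wrapping in NVL(..., 0).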
2. Splitting multiple rows, multiple columns, and multiple separators
Let's look at the original data first, then try splitting it.

Code:
-- "S miscellaneous" is the column name as it survived translation of the original post
-- Separator characters: / \ * (a doubled // is covered by /, since empty items are skipped)
SELECT REGEXP_SUBSTR(PH, '[^/\*]+', 1, LEVEL) AS PH,
       REGEXP_SUBSTR("S miscellaneous", '[^/\*]+', 1, LEVEL) AS "S miscellaneous"
FROM (SELECT PH, "S miscellaneous"
      FROM qz_zb_cpjcjg
      WHERE PH IS NOT NULL
        AND "S miscellaneous" IS NOT NULL
        AND ROWNUM <= 2) qz_zb_cpjcjg
CONNECT BY REGEXP_SUBSTR(PH, '[^/\*]+', 1, LEVEL) IS NOT NULL
        OR REGEXP_SUBSTR("S miscellaneous", '[^/\*]+', 1, LEVEL) IS NOT NULL;
Result:

Analysis:
Suppose the subquery returns 2 rows, where the first column holds 3 values and the second holds 4. The split then produces (1+2+4+8)*2 = 30 rows. With 3 source rows the result is (1+3+9+27)*3 = 120 rows, and with 4 source rows it is (1+4+16+64)*4 = 340 rows. The reason is that this CONNECT BY has no PRIOR condition tying a generated row to its own source row, so every row at one level is connected to every source row at the next level, and level k contains n^k rows. The total therefore grows in a regular way: (1+n+n²+n³)*n = n+n²+n³+n^4, where n is the number of source rows and the highest power is the largest number of items in any split column. So the more rows we have and the more items each field contains, the larger the final result. With 64 rows and 10 items per field, the deepest level alone would hold 64^10, about 1.15 × 10^18, rows, and splitting becomes hopelessly slow.
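A common way to keep this pattern from multiplying rows across source rows is to tie each generated row back to its own source row with PRIOR. This is a widely used idiom rather than the approach described in this post; the sketch uses made-up sample data and assumes a unique key column, here a hypothetical `id`.

```sql
WITH t AS (
  -- Hypothetical sample rows: 3 items in ph and 4 items in s on each row
  SELECT 1 AS id, 'a/b*c'  AS ph, 'x\y/z*w' AS s FROM dual UNION ALL
  SELECT 2,       'd//e/f',       'p*q\r/s'      FROM dual
)
SELECT id,
       LEVEL AS pos,
       REGEXP_SUBSTR(ph, '[^/\*]+', 1, LEVEL) AS ph_item,
       REGEXP_SUBSTR(s,  '[^/\*]+', 1, LEVEL) AS s_item
FROM t
CONNECT BY (   REGEXP_SUBSTR(ph, '[^/\*]+', 1, LEVEL) IS NOT NULL
            OR REGEXP_SUBSTR(s,  '[^/\*]+', 1, LEVEL) IS NOT NULL)
       AND PRIOR id = id                   -- a child row must belong to the same source row
       AND PRIOR SYS_GUID() IS NOT NULL    -- non-deterministic term that avoids the ORA-01436 loop error
ORDER BY id, pos;
```

With this condition each source row contributes only as many rows as its longest field, so the two sample rows yield 4 + 4 = 8 rows instead of 30. The parentheses around the OR conditions matter, because AND binds more tightly than OR.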
III. Handling data on the scale of hundreds of millions of hundreds of millions
As the figure shows, splitting just one item ran for an hour without finishing. In a later test the query used up all the cache and still did not finish. The splitting method above is clearly impractical when there are many rows.

In reality we have 80 to 90 items and at least several hundred rows. With data that large, the final split result would reach hundreds of millions of hundreds of millions ... of rows, far more than 10^64. If we call 10^64 "inconceivable" (不可思议, the traditional Chinese numeral name for 10^64), then this result deserves to be called "beyond inconceivable", because it is made up of hundreds of millions of such "inconceivables". Faced with that, a server with only 128 GB of memory seems to have no chance of handling this volume of data. Do we have to give up?
1. Let's think it over
We found that splitting a single row is very efficient. So we need to find the smallest unit of statistics, and that unit must contain no more than 10 rows; otherwise the split takes unacceptably long. Since the requirement is to count each person's daily workload, the first idea is to use one day as the smallest dimension. That is, count yesterday's workload first, then today's, then combine the historical and current figures per person and take the average to see whether that person's workload is saturated.
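The per-day statistic described above can be sketched as a two-step aggregation. The sketch assumes the split results are already available with one row per test performed; the names `detail`, `test_date`, `tester`, and `tests_done`, and all sample values, are hypothetical.

```sql
WITH detail AS (
  -- Hypothetical split results: one row per test performed by a person on a day
  SELECT DATE '2022-07-05' AS test_date, 'A' AS tester FROM dual UNION ALL
  SELECT DATE '2022-07-05', 'A' FROM dual UNION ALL
  SELECT DATE '2022-07-05', 'B' FROM dual UNION ALL
  SELECT DATE '2022-07-06', 'A' FROM dual UNION ALL
  SELECT DATE '2022-07-06', 'B' FROM dual UNION ALL
  SELECT DATE '2022-07-06', 'B' FROM dual UNION ALL
  SELECT DATE '2022-07-06', 'B' FROM dual
),
daily AS (
  -- Step 1: workload per person per day
  SELECT test_date, tester, COUNT(*) AS tests_done
  FROM detail
  GROUP BY test_date, tester
)
-- Step 2: average daily workload per person
SELECT tester, AVG(tests_done) AS avg_tests_per_day
FROM daily
GROUP BY tester
ORDER BY tester;
-- A: (2 + 1) / 2 = 1.5, B: (1 + 3) / 2 = 2
```

Days on which a person performed no tests do not appear in `daily`, so this average covers only the days that person actually worked.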

2. But is it reasonable?
At first I did exactly that, but the amount of data per day turned out to be far from small, because ....

3. Statistics after turning the multi-row split into single-row splits
The business side said this is simply how the data is, and it cannot be changed. So I went looking for an even smaller unit: splitting one row at a time. In the end, all the workload data was split in under 1 second. The result, each person's workload per test item per date, is then saved.

4. Final result

5. Analysis
The final split is done row by row in a cursor loop inside a stored procedure. The split details are saved into a single table and then summarized and analyzed to obtain the final workload. Since it is getting late, I won't go through the detailed splitting logic for now. If you would like the final stored procedure, feel free to leave a comment~
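Since the stored procedure itself is not shown, here is only a rough sketch of what a row-by-row cursor loop of this kind could look like. Everything in it is an assumption made for illustration: the tables `jc_source` and `jc_detail`, their columns, the procedure name, and the separator set / \ *. It is not the author's procedure.

```sql
-- Hypothetical source: one row per test record, testers packed into one field.
CREATE TABLE jc_source (
  test_date DATE,
  test_item VARCHAR2(50),
  testers   VARCHAR2(400)
);

-- Hypothetical detail table: one row per (date, item, tester).
CREATE TABLE jc_detail (
  test_date DATE,
  test_item VARCHAR2(50),
  tester    VARCHAR2(50)
);

CREATE OR REPLACE PROCEDURE split_testers AS
BEGIN
  FOR r IN (SELECT test_date, test_item, testers
            FROM jc_source
            WHERE testers IS NOT NULL)
  LOOP
    -- Split ONE row at a time against dual, so the hierarchy
    -- never spans several source rows and cannot multiply.
    INSERT INTO jc_detail (test_date, test_item, tester)
    SELECT r.test_date,
           r.test_item,
           REGEXP_SUBSTR(r.testers, '[^/\*]+', 1, LEVEL)
    FROM dual
    CONNECT BY REGEXP_SUBSTR(r.testers, '[^/\*]+', 1, LEVEL) IS NOT NULL;
  END LOOP;
  COMMIT;
END split_testers;
/

-- Summary over the split details: tests per person, per item, per day.
SELECT test_date, test_item, tester, COUNT(*) AS tests_done
FROM jc_detail
GROUP BY test_date, test_item, tester
ORDER BY test_date, test_item, tester;
```

Because each INSERT works on a single value selected from dual, the cost grows linearly with the number of source rows instead of exponentially.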