Excel data extraction technique: a universal formula for extracting numbers from mixed text
2022-06-24 06:10:00 【User 8639654】
In the previous article, Xiaohua walked through three data extraction scenarios in which the formula was tailored to the features of the mixed text. Afterward, some readers quietly asked: "Xiaohua, I can't spot any data features, and I'm too lazy to write a different formula for every scenario. Is there one overpowering, universal formula that can force its way through any kind of mixed text?"
The answer, of course, is yes! However, two situations still need to be told apart. One is extracting a numerical value, which can be positive or negative, has a magnitude, and may contain a decimal point. The other is extracting a numeric string, such as a telephone number or an ID number, which has no decimal point or minus sign and no notion of magnitude.
How are the universal formulas for these two scenarios written, and how do they work? Let Xiaohua explain.
Four. A universal formula for extracting numerical values
Scenario: apart from the target value, the text contains no other digits; otherwise they easily interfere with the result.
Universal formula (the braces mark an array formula, entered with Ctrl+Shift+Enter):
{=-LOOKUP(9^9,-MIDB(A2,MIN(FINDB(LEFT(ROW($1:$11)-2,1),A2&-1/19)),ROW($1:$100)))}
The formula breaks down in detail as follows:
① LEFT(ROW($1:$11)-2,1)
ROW($1:$11) is easy to understand: it returns the row numbers of rows 1 through 11, that is, set A {1,2,3,…,11} of 11 values. Subtracting 2 turns it into set B {-1,0,1,2,…,9}. LEFT then takes the first character on the left of each element of set B, producing character set C {"-","0","1","2",…,"9"}: the minus sign plus the ten digits 0-9. Apart from the decimal point, every numerical value is made up of these 11 characters.
In short, this part constructs the characters that a number is written with. These characters let us lock onto the position of the target value and then extract it.
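To see what fragment ① produces, the same steps can be mimicked outside Excel. This is only a Python sketch of the logic, not Excel code, and the variable names are made up for illustration.

```python
# Mirror LEFT(ROW($1:$11)-2,1) step by step.
rows = list(range(1, 12))                # ROW($1:$11) -> 1, 2, ..., 11  (set A)
shifted = [r - 2 for r in rows]          # subtract 2  -> -1, 0, ..., 9  (set B)
charset = [str(n)[0] for n in shifted]   # LEFT(..., 1): first character as text (set C)

print(charset)
# ['-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
```

The only element that changes shape is -1: its first character is the minus sign, which is how the formula gets "-" into the set without typing it as a literal.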
②FINDB(①,A2&-1/19)
FINDB finds the position of a character in the target text. Unlike FIND, it returns a byte position: each Chinese character or full-width symbol counts as 2 bytes. (FINDB counts this way only when a double-byte language such as Chinese is the default editing language; otherwise it counts every character as 1, like FIND.) So in the mixed text in cell A2, the minus sign "-" is found at position 5 rather than 3.
The formula searches A2&-1/19 rather than A2 alone so that every character of character set C {"-","0","1","2",…,"9"} appears somewhere in the text being searched, which guarantees that FINDB never returns an error value. -1/19 evaluates to -0.0526315789473684, which contains the minus sign and all ten digits. Fragment ② returns the positions at which the characters of set C first appear in A2&-1/19, namely ordinal set D {5,13,10,6,…}.
③MIN(②)
MIN(②) takes the minimum of ordinal set D {5,13,10,6,…} from ②. That minimum is the position in the A2 mixed text where a minus sign or digit first appears, which is the starting position of the target value. This is why no unrelated digits or minus signs may appear to the left of the target value.
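Steps ② and ③ can be imitated in Python as a rough sketch. Several things here are assumptions: the cell value is a hypothetical one (two Chinese characters followed by -299.19) picked to agree with the positions quoted above, `findb` is a made-up helper that approximates FINDB by counting GBK-encoded bytes, and `PAD` is the text Excel is assumed to append for -1/19.

```python
PAD = "-0.0526315789473684"   # assumed text form of -1/19 when appended to A2

def findb(needle: str, text: str, encoding: str = "gbk") -> int:
    """1-based byte position of the first match, imitating FINDB on a double-byte system."""
    char_index = text.index(needle)                  # ValueError plays the role of #VALUE!
    return len(text[:char_index].encode(encoding)) + 1

a2 = "温度-299.19"             # hypothetical cell value, not the article's actual data
charset = list("-0123456789")  # character set C from step ①

positions = [findb(ch, a2 + PAD) for ch in charset]  # ordinal set D
start = min(positions)                               # step ③

print(positions[:4], start)
# [5, 13, 10, 6] 5
```

Without the padding, searching for a digit that is absent from A2 (here "0") would fail, and in Excel a single error would propagate through MIN.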
④-MIDB(A2,③,ROW($1:$100))
MIDB is used here instead of MID to match FINDB: the text is cut by byte position. ROW($1:$100) returns the ordered array {1;2;…;100}, which serves as MIDB's third argument, the number of bytes to extract, so that 1 to 100 bytes are extracted in turn.
MIDB therefore starts at the position determined in ③ and cuts 100 strings of lengths 1 to 100 bytes from the text in A2, giving the set of unequal-length strings E {"-","-2","-29","-299",…,"-299.19"}. The minus sign in front of MIDB negates each string. Strings that are not numbers produce the error #VALUE!, so E turns into a new constant array F of plain numbers and #VALUE! errors: {#VALUE!;2;29;299;299;299.1;299.19;…;299.19}.
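The growing-prefix trick in step ④ is easier to follow when the intermediate arrays are printed. The Python sketch below is an approximation under assumptions: `midb` and `negate` are hypothetical helpers, the cell value is a made-up example, byte counting uses GBK, and `None` stands in for #VALUE!. Python's `float` does not accept exactly the same strings as Excel's text-to-number coercion, so treat it as illustrative only.

```python
def midb(text, start, num_bytes, encoding="gbk"):
    """Approximate MIDB: slice by 1-based byte position (split characters are dropped)."""
    raw = text.encode(encoding)
    return raw[start - 1:start - 1 + num_bytes].decode(encoding, errors="ignore")

def negate(text):
    """Imitate a unary minus applied to text: a number, or None in place of #VALUE!."""
    try:
        return -float(text)
    except ValueError:
        return None

a2 = "温度-299.19"   # hypothetical cell value
start = 5            # result of step ③ for this sample

strings_e = [midb(a2, start, n) for n in range(1, 101)]   # ROW($1:$100) as lengths
array_f = [negate(s) for s in strings_e]

print(strings_e[:7])
# ['-', '-2', '-29', '-299', '-299.', '-299.1', '-299.19']
print(array_f[:7])
# [None, 2.0, 29.0, 299.0, 299.0, 299.1, 299.19]
```

Once the requested length runs past the end of the text, every further slice is the same full string, which is why the tail of array F keeps repeating 299.19.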
⑤-LOOKUP(9^9,④)
LOOKUP has three relevant features:
1. It assumes the lookup range is sorted in ascending order, that is, later values are larger.
2. It returns the largest value that is less than or equal to the lookup value.
3. It ignores error values in the lookup range.
So we use a very large lookup value, 9^9 (387,420,489). Because of feature 1, LOOKUP treats the last non-error value in the range as the largest and returns it. These features neatly skip the error values and pick the last valid number. The minus sign in front of LOOKUP then undoes the negation from step ④ and restores the original sign of the value.
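The effect of step ⑤ can be sketched in Python as "take the last entry that is not an error, then flip the sign back". This is a simplification, not a reimplementation of LOOKUP: `lookup_last_number` is a hypothetical helper, `None` stands in for #VALUE!, and it assumes every number in the array is smaller than 9^9.

```python
def lookup_last_number(values):
    """Approximate LOOKUP(9^9, values): the last non-error entry.

    Assumes all numbers are below 9^9; None represents #VALUE!.
    """
    numbers = [v for v in values if v is not None]
    if not numbers:
        raise ValueError("no numeric value found")   # Excel would return #N/A
    return numbers[-1]

# Array F from step ④ for a sample whose target value is -299.19 (shortened).
array_f = [None, 2.0, 29.0, 299.0, 299.0, 299.1, 299.19, 299.19]

result = -lookup_last_number(array_f)   # the outer minus restores the sign
print(result)
# -299.19
```

For a positive target such as 123, array F would hold -1, -12, -123, and the same two steps would give back 123.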
Five. A universal formula for extracting numeric strings
Usage: extract every digit in the target cell in order and merge them.
Universal formula:
{=SUM(MID(0&A2,LARGE(ISNUMBER(--MID(A2,ROW($1:$100),1))*ROW($1:$100),ROW($1:$100))+1,1)*10^ROW($1:$100)/10)}
A brief breakdown of the formula:
① ISNUMBER(--MID(A2,ROW($1:$100),1))*ROW($1:$100)
MID(A2,ROW($1:$100),1) extracts the characters one at a time. The double minus sign tries to convert each one to a number, which separates digits from other characters, and ISNUMBER then tests whether each result is a number, returning an array of logical values. Finally, multiplying by ROW($1:$100) makes each digit return its position in the A2 mixed text, while every other character returns 0.
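Step ① can be pictured with a small Python sketch. The sample text "a1b20c3" and the helper name `digit_positions` are made up for illustration, and the sketch assumes that only the ASCII digits 0-9 pass the ISNUMBER(--…) test.

```python
DIGITS = "0123456789"

def digit_positions(text, width=100):
    """Imitate ISNUMBER(--MID(A2,k,1))*k for k = 1..width."""
    result = []
    for k in range(1, width + 1):
        ch = text[k - 1:k]                        # MID(A2,k,1); "" past the end of the text
        is_digit = len(ch) == 1 and ch in DIGITS  # assumed equivalent of ISNUMBER(--ch)
        result.append(k if is_digit else 0)       # TRUE*k = k, FALSE*k = 0
    return result

print(digit_positions("a1b20c3", width=10))
# [0, 2, 0, 4, 5, 0, 7, 0, 0, 0]
```

Positions past the end of the text behave like non-digits, so a width of 100 simply pads the array with zeros.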
② LARGE(①,ROW($1:$100))
LARGE re-sorts the position values from ① in descending order. A digit's position value is always greater than 0, and the later the digit appears in the text, the larger its position value; every other character has the value 0. The point of this step is to move all the 0 values to the end while reversing the order of the digit positions.
③ MID(0&A2,②+1,1)
MID extracts characters from 0&A2 one at a time, at the position values from ② plus 1. Because a non-digit has the position value 0, it picks up the leading "0" that was prepended, while the digits are unaffected. Because the digit positions from ② are in reverse order, the extracted digits are in reverse order too.
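Steps ② and ③ amount to a descending sort followed by an indexed pick. A Python sketch under the same assumptions as before (hypothetical sample text "a1b20c3", width cut to 10 for readability):

```python
a2 = "a1b20c3"                               # hypothetical cell value
positions = [0, 2, 0, 4, 5, 0, 7, 0, 0, 0]   # result of step ① for this sample

ranked = sorted(positions, reverse=True)     # LARGE(①, ROW(...)): largest first, zeros last
padded = "0" + a2                            # 0&A2
# MID(0&A2, p+1, 1) uses 1-based position p+1, which is 0-based index p in Python.
picked = [padded[p] for p in ranked]

print(ranked)
# [7, 5, 4, 2, 0, 0, 0, 0, 0, 0]
print(picked)
# ['3', '0', '2', '1', '0', '0', '0', '0', '0', '0']
```

The digits 1, 2, 0, 3 come out as 3, 0, 2, 1, followed by placeholder zeros for every non-digit position.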
④ SUM(③*10^ROW($1:$100)/10)
The first three steps produce an ordered array made up of all the digits in A2, in reverse order, followed by a run of 0s standing in for the non-digit positions. To finish the extraction, the digits still need to be put back in forward order, the 0 placeholders dropped, and everything merged. All of this is handled by *10^ROW($1:$100)/10, which builds one multi-digit number: the first element is multiplied by 1, the second by 10, the third by 100, and so on, so the digits line up from the ones place leftward, and the 0 placeholders land in the highest places, where they contribute nothing. The resulting multi-digit number is the extracted result.
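The weighting in step ④ is ordinary place-value arithmetic. A minimal Python sketch, with `combine` as a hypothetical helper and the reversed digit list taken from the made-up sample "a1b20c3":

```python
def combine(picked):
    """Imitate SUM(③*10^ROW(...)/10): element k is weighted by 10^(k-1)."""
    return sum(int(d) * 10 ** (k - 1) for k, d in enumerate(picked, start=1))

picked = ["3", "0", "2", "1", "0", "0", "0", "0", "0", "0"]   # reversed digits, then placeholders
print(combine(picked))
# 1203  (3*1 + 0*10 + 2*100 + 1*1000)
```

Because the result is a number, leading zeros in the original text disappear (digits 0, 0, 7 would come back as 7), and Excel keeps only about 15 significant digits, so long identifiers may be better served by a text-based approach.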
In fact, Excel 2019 and later versions offer a very simple, no-brainer solution for extracting numeric strings: just join the digits directly with CONCAT.
The universal formula for Excel 2019 and later is as follows:
{=CONCAT(IFERROR(--MID($A2,ROW($1:$100),1),""))}
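The CONCAT version keeps each character if it converts to a number and replaces it with an empty string otherwise. In Python the same idea might look like the sketch below; `concat_digits` is a hypothetical helper, the sample strings are made up, and only ASCII digits are assumed to convert.

```python
DIGITS = "0123456789"

def concat_digits(text, width=100):
    """Imitate CONCAT(IFERROR(--MID(A2,k,1),"")) for k = 1..width."""
    parts = []
    for k in range(1, width + 1):
        ch = text[k - 1:k]                                    # MID(A2,k,1)
        parts.append(ch if len(ch) == 1 and ch in DIGITS else "")
    return "".join(parts)

print(concat_digits("a1b20c3"))
# 1203
print(concat_digits("ID 007-42"))
# 00742
```

Since the result is text rather than a number, leading zeros and long digit runs survive unchanged, which suits telephone and ID numbers.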