Sorting a CSV file by a number embedded in a string column

2022-07-06 08:24:00 【Don't seek great wealth, just seek wealth to rival the country】

The file looks like this:

**
Sort the rows by the fourth number in the first-column string. (Note that the Chinese text in the second column must stay paired with the video sequence name in front of it.)
[Image: screenshot of the source CSV file]
**

Approach:

**
1. Extract the data.
2. Split each first-column string into a list with split.
3. Combine with zip: pair the split lists with the second column. A dict key must be hashable, so it cannot be a list, while a value can be any object. Therefore use the second column as the key and the split first column as the value, then convert the pairs to a dictionary with dict.
4. Sort by value using sorted.
5. Loop over the sorted result with for: join the split pieces back together, append the two columns to two empty lists, then zip them again (this restores the original order of the two columns). You may wonder why I did not simply swap the key and value instead of going to this trouble. As noted in step 3, a dict key cannot be a list, so swapping the key-value pairs does not work.
**
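
The five steps can be sketched on a few in-memory rows, without touching any file. This is a minimal sketch: the sample rows and the helper names `sort_by_fourth_field` and `list_is_hashable` are made up for illustration and are not part of the original code.

```python
# Hypothetical sample rows: (first column, second column)
rows = [
    ("cam_01_a_10", "ten"),
    ("cam_01_a_2", "two"),
    ("cam_01_a_33", "thirty-three"),
]

def sort_by_fourth_field(rows):
    """Sort (name, label) pairs by the integer in the fourth "_"-separated field."""
    # Steps 2-3: split the first column and key the dict by the second column
    split_by_label = {label: name.split("_") for name, label in rows}
    # Step 4: sort by the fourth field of the value, compared as an integer
    ordered = sorted(split_by_label.items(), key=lambda kv: int(kv[1][3]))
    # Step 5: join the pieces back and restore the (first, second) column order
    return [("_".join(parts), label) for label, parts in ordered]

def list_is_hashable():
    """Return False, because using a list as a dict key raises TypeError."""
    try:
        {["a", "b"]: 1}
    except TypeError:
        return False
    return True

sorted_rows = sort_by_fourth_field(rows)
print(sorted_rows)
print(list_is_hashable())
```

Because the second column is the dict key here, two rows that share the same second-column value would overwrite each other, so this shape only suits data where that column is unique.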

The full code is as follows:

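import os
import pandas as pd

# Folder containing the source CSV files
path = r"C:\Users\jam96\PycharmProjects\all_module\pandas_test\a"
# Write the results to a "data" folder next to this script
dir_path = os.path.dirname(os.path.abspath(__file__))
result_path = os.path.join(dir_path, "data")
os.makedirs(result_path, exist_ok=True)

files = os.listdir(path)
print(files)
for num, name in enumerate(files, start=1):
    res = pd.read_csv(os.path.join(path, name), header=None)
    a = res.values[:, 0]
    b = res.values[:, 1]
    # Split every first-column string on "_"
    d = [s.split("_") for s in a]
    # Second column as key, split first column as value
    # (rows with the same second-column value overwrite each other)
    new = dict(zip(b, d))
    # sorted returns a list of (key, value) tuples
    new1 = sorted(new.items(), key=lambda x: int(x[1][3]))
    k = []
    j = []
    for key, parts in new1:
        k.append(key)
        j.append("_".join(parts))
    # Restore the original column order: first column, then second
    result = list(zip(j, k))

    pd1 = pd.DataFrame(data=result)
    # Output files are named C001.csv, C002.csv, ...
    pd1.to_csv(os.path.join(result_path, "C00" + str(num) + ".csv"), index=False, header=False)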

The result is as follows:

[Image: screenshot of the sorted output]
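
The `int(...)` call in the sort key matters. A small sketch of what happens without it, using made-up names (`names` and `fourth_field` are hypothetical): strings compare character by character, so "100" sorts before "9".

```python
# Hypothetical first-column values
names = ["clip_0_a_10", "clip_0_a_9", "clip_0_a_100"]

def fourth_field(name):
    """Return the fourth "_"-separated field as a string."""
    return name.split("_")[3]

# Sorting on the raw string compares "10", "9", "100" lexicographically
as_text = sorted(names, key=fourth_field)
# Converting to int compares 10, 9, 100 numerically
as_number = sorted(names, key=lambda n: int(fourth_field(n)))

print(as_text)    # ['clip_0_a_10', 'clip_0_a_100', 'clip_0_a_9']
print(as_number)  # ['clip_0_a_9', 'clip_0_a_10', 'clip_0_a_100']
```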

Optimized code

Later I realized that the code above is needlessly complicated, partly because I was not yet comfortable with dictionaries. The optimized code is as follows:

import pandas as pd

# Path to the CSV file
path = r"D:\TestSet\csv\dms\abc.csv"
# Read the CSV file (GBK-encoded, no header row)
res = pd.read_csv(path, encoding="gbk", header=None)
# Get the first and second columns
a = res.values[:, 0]
b = res.values[:, 1]
# Pair each first-column string (key) with its second-column value
c = dict(zip(a, b))
# Sort by the fourth "_"-separated field of the key; int() converts it so the comparison is numeric
d = sorted(c.items(), key=lambda x: int(x[0].split("_")[3]))
# Write the sorted result to a new CSV file
e = pd.DataFrame(data=d)
e.to_csv(path_or_buf=r"C:\Users\xdjiang6\PycharmProjects\ Log results batch modify annotation set \data\a.csv", index=False, header=False)
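
One caveat with the dictionary approach: `dict(zip(a, b))` keeps only the last row for any first-column value that appears more than once. If duplicates are possible, the rows can be sorted directly without a dictionary. The sketch below swaps pandas for the standard-library `csv` module and works on an in-memory string so that it runs on its own. The function name `sort_csv_text` and the sample data are hypothetical.

```python
import csv
import io

def sort_csv_text(text, field_index=3, sep="_"):
    """Sort CSV rows by an integer embedded in the first column."""
    rows = list(csv.reader(io.StringIO(text)))
    # Sort whole rows, so both columns stay paired and duplicates survive
    rows.sort(key=lambda row: int(row[0].split(sep)[field_index]))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

# Hypothetical sample: two rows share the same first-column value
sample = "v_1_x_12,甲\nv_1_x_3,乙\nv_1_x_3,丙\nv_1_x_100,丁\n"
sorted_text = sort_csv_text(sample)
print(sorted_text)
```

Since Python's sort is stable, rows with equal numbers keep their original relative order.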