Sorting a CSV file by a number embedded in one column's strings
2022-07-06 08:19:00 【不求大富大贵只求富可敌国】
**The file looks like this:**
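Sort the rows by the value of the fourth number in the first column. (Note that the Chinese text must stay paired with the video entry in front of it.)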
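As a small illustration of the sort key, the sketch below assumes the first-column strings are underscore-separated and that the number of interest is the fourth field (the sample names are hypothetical, since the original file is not shown). It also shows why the field should be converted with `int` before comparing.

```python
# Hypothetical first-column values; the real file's format is an assumption.
names = ["v_0_0_10_a", "v_0_0_9_b", "v_0_0_2_c"]


def fourth_field(name: str) -> int:
    """Return the fourth underscore-separated field as an integer."""
    return int(name.split("_")[3])


# Numeric comparison: 2 < 9 < 10
numeric_order = sorted(names, key=fourth_field)

# String comparison: "10" < "2" < "9", which is usually not what is wanted
text_order = sorted(names, key=lambda s: s.split("_")[3])

print(numeric_order)
print(text_order)
```

Without the `int` conversion the fields are compared character by character, so `"10"` sorts before `"2"`.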
**Approach:**
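1. Extract the data.
2. Split each string in the first column into a list.
3. Combine with zip: pair the split lists with the second column. Dictionary keys must be hashable, so a list cannot be a key, while a value can be any object, including a list. The split first column therefore becomes the value and the second column the key, and dict turns the pairs into a dictionary.
4. Sort by value using sorted.
5. Loop over the sorted result with for, join each split list back into a string, append the two parts to two empty lists, then zip them again (to put the two columns back in their original order). (Some readers may wonder why I don't simply swap the keys and values instead of going to this trouble. As noted in step 3, a dictionary key cannot be a list, so swapping keys and values is not an option.)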
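The steps above can be sketched on in-memory data as follows. This is only an illustration: the two lists stand in for the CSV's columns, and their contents are hypothetical.

```python
# Step 1: hypothetical stand-ins for the two CSV columns.
first_col = ["v_0_0_12_a", "v_0_0_3_b", "v_0_0_7_c"]
second_col = ["甲", "乙", "丙"]

# Step 2: split each first-column string into a list of fields.
split_fields = [name.split("_") for name in first_col]

# Step 3: lists are unhashable, so the second column serves as the key.
by_label = dict(zip(second_col, split_fields))

# Step 4: sort the (key, value) pairs by the fourth field, as an integer.
ordered = sorted(by_label.items(), key=lambda item: int(item[1][3]))

# Step 5: re-join the fields and restore the original column order.
names, labels = [], []
for label, fields in ordered:
    names.append("_".join(fields))
    labels.append(label)
rows = list(zip(names, labels))

print(rows)
```

One caveat of this design: dictionary keys are unique, so if the second column contained the same value twice, only the last such row would survive the `dict` step.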
**The code:**
import os
import pandas as pd

# Directory containing the input CSV files
path = r"C:\Users\jam96\PycharmProjects\all_module\pandas_test\a"
# Write the results to a "data" folder next to this script
dir_path = os.path.dirname(os.path.abspath(__file__))
result_path = os.path.join(dir_path, "data")
os.makedirs(result_path, exist_ok=True)

files = os.listdir(path)
print(files)
num = 1
for file_name in files:
    res = pd.read_csv(filepath_or_buffer=os.path.join(path, file_name), header=None)
    a = res.values[:, 0]
    b = res.values[:, 1]
    # Split each first-column string into a list of fields
    d = []
    for name in a:
        c = name.split("_")
        d.append(c)
    # Second column as key, split first column as value
    new = dict(zip(b, d))
    # sorted() returns a list of (key, value) tuples
    new1 = sorted(new.items(), key=lambda x: int(x[1][3]))
    k = []
    j = []
    for item in new1:
        k.append(item[0])
        j.append("_".join(item[1]))
    # Put the two columns back in their original order
    result = list(zip(j, k))
    pd1 = pd.DataFrame(data=result)
    pd1.to_csv(os.path.join(result_path, "C00" + str(num) + ".csv"), index=False, header=False)
    num += 1
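The result of running it:
Optimized code
I later realized that the code above is redundant and does not need to be that complicated; part of the reason was that I was not yet fluent with dictionaries. The unsplit first-column string is hashable, so it can serve as the dictionary key directly, and the split can happen inside the sort key. The optimized code: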
import pandas as pd

# Path to the CSV file
path = r"D:\TestSet\csv\dms\abc.csv"
# Read the CSV file
res = pd.read_csv(path, encoding="gbk", header=None)
# Take the first and second columns
a = res.values[:, 0]
b = res.values[:, 1]
# Pair the first column (key) with the second column (value)
c = dict(zip(a, b))
# Sort by the fourth field of the key; note that int() converts it to an integer
d = sorted(c.items(), key=lambda x: int(x[0].split("_")[3]))
# Write the sorted result to a new CSV
e = pd.DataFrame(data=d)
e.to_csv(path_or_buf=r"C:\Users\xdjiang6\PycharmProjects\日志结果批量修改标注集\data\a.csv", index=False, header=False)
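As a possible further simplification (not part of the original post), the sort can be done inside pandas with the `key` argument of `DataFrame.sort_values`, available since pandas 1.1. The sketch below assumes the same underscore-separated format and uses hypothetical in-memory data in place of the real file. Because no dictionary is involved, rows with duplicate first-column values are kept.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = "v_0_0_12_a,甲\nv_0_0_3_b,乙\nv_0_0_7_c,丙\nv_0_0_3_b,丁\n"
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Sort on column 0 by its fourth underscore-separated field, as an integer.
# mergesort is stable, so rows with equal keys keep their original order.
sorted_df = df.sort_values(
    by=0,
    key=lambda col: col.str.split("_").str[3].astype(int),
    kind="mergesort",
)

# With no path given, to_csv returns the CSV text instead of writing a file.
print(sorted_df.to_csv(index=False, header=False))
```

To write a file instead, pass a path as the first argument of `to_csv`, as in the code above.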