Training mmdetection on your own dataset: exporting CVAT annotations to COCO format and related operations
2022-07-02 06:26:00 【chenf0】
For the initial setup and the assorted problems encountered along the way, see: https://blog.csdn.net/chenfang0529/article/details/115094036
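I. Export
To train mmdetection on your own dataset, the data here is annotated with CVAT. The annotated source is a video file, so the image frames and the annotation file are exported in COCO format. PASCAL VOC is another commonly used format.
The export contains two folders:
images and annotations
images contains the image frames.
annotations contains the annotation files; only the third file needs to be modified.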
II. Related code
1. Batch-renaming the images
import os

class BatchRename:
    def rename(self):
        # Raw string so the backslashes in the Windows path are not treated as escapes
        path = r"D:\achenf\data\taxi\test\task_2_9_car_test-2021_04_13_13_25_24-coco\images"
        filelist = os.listdir(path)
        total_num = len(filelist)
        start = 595  # number given to the first renamed image
        i = start
        for item in filelist:
            if item.endswith('.jpg'):
                src = os.path.join(os.path.abspath(path), item)
                dst = os.path.join(os.path.abspath(path), str(i) + '.jpg')  # adjust the name format as needed
                # dst = os.path.join(os.path.abspath(path), '00000' + str(i) + '.jpg')  # alternative: custom prefixed name
                try:
                    os.rename(src, dst)  # src: original name, dst: new name
                    i += 1
                except OSError:
                    continue
        # Report how many files were seen and how many .jpg files were actually renamed
        print('total %d files, renamed %d jpg' % (total_num, i - start))

if __name__ == '__main__':
    demo = BatchRename()
    demo.rename()
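One thing to watch in the renaming step: `os.listdir` does not guarantee any particular order, so the numbers assigned to the frames can vary between runs. A minimal sketch of a deterministic variant is shown below. The helper `build_rename_plan` and the sample file names are hypothetical, and whether sorted order matches the image ids in your JSON is an assumption you should verify against your own export.

```python
def build_rename_plan(names, start):
    """Map each .jpg name to '<number>.jpg', numbering in sorted order."""
    # Sort first so the numbering does not depend on directory listing order.
    jpgs = sorted(n for n in names if n.lower().endswith(".jpg"))
    return {old: f"{start + k}.jpg" for k, old in enumerate(jpgs)}


# Hypothetical file names, used only for illustration.
plan = build_rename_plan(
    ["frame_000001.jpg", "frame_000000.jpg", "notes.txt"], start=595
)
print(plan)  # {'frame_000000.jpg': '595.jpg', 'frame_000001.jpg': '596.jpg'}
```

The plan can be reviewed before anything on disk changes, and then applied with `os.rename` for each old/new pair.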
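2. Batch-modifying the JSON file contents
The ids and related fields in the JSON must correspond to the images.
The required JSON contains five sections: info, categories, licenses, annotations, and images.
Only the annotations and images sections need to be modified.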
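For orientation, here is a minimal sketch of what such a file looks like once loaded into Python. The field names follow the standard COCO detection layout, but the values are made up for illustration, and the helper `section_sizes` is hypothetical.

```python
# A tiny hand-written COCO-style dictionary with the five sections.
minimal_coco = {
    "info": {},
    "licenses": [],
    "categories": [{"id": 1, "name": "car"}],
    "images": [{"id": 1, "file_name": "1.jpg", "width": 1920, "height": 1080}],
    "annotations": [
        {
            "id": 1,            # unique id of this annotation
            "image_id": 1,      # must match the "id" of an entry in "images"
            "category_id": 1,   # must match the "id" of an entry in "categories"
            "bbox": [100.0, 200.0, 50.0, 40.0],  # [x, y, width, height]
            "area": 2000.0,
            "iscrowd": 0,
        }
    ],
}


def section_sizes(coco):
    """Return the number of entries in each top-level section."""
    return {key: len(value) for key, value in coco.items()}


print(section_sizes(minimal_coco))
```

The `image_id` link between annotations and images is what has to stay consistent when ids are shifted.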
import json
import os

# Raw string so the backslashes in the Windows path are not treated as escapes
path = r'D:\achenf\data\taxi\train\task_2_8_car_test-2021_04_13_13_25_07-coco\annotations\test'
dirs = os.listdir(path)
num_flag = 0
for file in dirs:  # loop over the files in the folder
    if os.path.splitext(file)[1] == ".json":  # keep only JSON files
        num_flag = num_flag + 1
        print("path ===== ", file)
        print(os.path.join(path, file))
        with open(os.path.join(path, file), 'r') as load_f:
            load_dict = json.load(load_f)
        # print(load_dict)
        # n=len(load_dict["image_id"])
        # print(type(load_dict))
        # for i in load_dict:
        #     print(i)
        for i in load_dict['annotations']:
            i['image_id'] = i['image_id'] + 595  # offset the image reference
            i['id'] = i['id'] + 2032  # offset the annotation id
            # if i['image_id']>=595:
            #     i['id']=i['id']+3015
        for i in load_dict['images']:
            i['id'] = i['id'] + 595  # same offset as image_id above
            i['file_name'] = str(i['id']) + ".jpg"  # file name follows the new id
        with open(os.path.join(path, file), 'w') as dump_f:
            json.dump(load_dict, dump_f)
if num_flag == 0:
    print('No JSON files found in the selected folder; please check the folder')
else:
    print('{} JSON file(s) in total'.format(num_flag))
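After shifting the ids, it can help to confirm that the file is still internally consistent. The sketch below is one possible check, not part of the original workflow: the helper `find_coco_problems` is hypothetical, and the rule that `file_name` equals `<id>.jpg` is taken from the naming used in the script above.

```python
def find_coco_problems(coco):
    """Return a list of human-readable inconsistencies in a COCO-style dict."""
    problems = []
    image_ids = [img["id"] for img in coco["images"]]
    ann_ids = [ann["id"] for ann in coco["annotations"]]
    if len(set(image_ids)) != len(image_ids):
        problems.append("duplicate image ids")
    if len(set(ann_ids)) != len(ann_ids):
        problems.append("duplicate annotation ids")
    known = set(image_ids)
    for ann in coco["annotations"]:
        # Every annotation must reference an image that exists in the file.
        if ann["image_id"] not in known:
            problems.append(
                f"annotation {ann['id']} points to missing image {ann['image_id']}"
            )
    for img in coco["images"]:
        # Assumed naming convention: file_name is '<id>.jpg'.
        if img["file_name"] != f"{img['id']}.jpg":
            problems.append(f"image {img['id']} has file_name {img['file_name']}")
    return problems


# Made-up sample: the second annotation references an image that is not listed.
sample = {
    "images": [{"id": 596, "file_name": "596.jpg"}],
    "annotations": [{"id": 2033, "image_id": 596}, {"id": 2034, "image_id": 700}],
}
print(find_coco_problems(sample))
# ['annotation 2034 points to missing image 700']
```

An empty list means none of these particular checks failed; it does not prove that the ids match the files on disk.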
Finally, merge the corresponding sections of the files.
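The merge itself is not shown in the post. A minimal sketch of one way to do it follows, assuming the ids have already been made disjoint by the offset step and that every file uses the same category list. The helper `merge_coco` is hypothetical.

```python
def merge_coco(parts):
    """Concatenate the images and annotations of several COCO-style dicts."""
    first = parts[0]
    merged = {
        "info": first.get("info", {}),
        "licenses": first.get("licenses", []),
        "categories": first["categories"],
        "images": [],
        "annotations": [],
    }
    for part in parts:
        # Assumption: all parts share the same categories; otherwise remap first.
        if part["categories"] != first["categories"]:
            raise ValueError("category lists differ; remap category ids first")
        merged["images"].extend(part["images"])
        merged["annotations"].extend(part["annotations"])
    # Ids must be unique in the merged file, so fail loudly on collisions.
    for section in ("images", "annotations"):
        ids = [entry["id"] for entry in merged[section]]
        if len(set(ids)) != len(ids):
            raise ValueError(f"duplicate ids in {section}; apply offsets first")
    return merged


# Made-up inputs for illustration.
cats = [{"id": 1, "name": "car"}]
part_a = {"categories": cats,
          "images": [{"id": 1, "file_name": "1.jpg"}],
          "annotations": [{"id": 1, "image_id": 1, "category_id": 1}]}
part_b = {"categories": cats,
          "images": [{"id": 596, "file_name": "596.jpg"}],
          "annotations": [{"id": 2033, "image_id": 596, "category_id": 1}]}
merged = merge_coco([part_a, part_b])
print(len(merged["images"]), len(merged["annotations"]))  # 2 2
```

The result can be written out with `json.dump`, and the image files from each export copied into a single images folder.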
III. Miscellaneous
1. Parsing XML files to count the annotations they contain
import os
import xml.dom.minidom

res = 0
AnnoPath = r'./file_xml/0512/'
Annolist = os.listdir(AnnoPath)
for annotation in Annolist:
    filename = os.path.join(AnnoPath, annotation)
    dom = xml.dom.minidom.parse(filename)  # parse the XML file
    collection = dom.documentElement  # root element of the document
    objectlist = collection.getElementsByTagName('box')  # all <box> elements
    count = objectlist.length
    res = res + count
    print("File:", filename, "annotations:", count)
print("Total annotations:", res)
Result: the script prints the number of annotations in each file, followed by the overall total.
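If a per-label breakdown is useful, the same count can be grouped by label. The sketch below uses `xml.etree.ElementTree` and assumes a layout in which each `<track>` element carries a `label` attribute and contains `<box>` children. The sample XML is made up for illustration, so check the structure of your own files before relying on it; `count_boxes_by_label` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample; real annotation files may be structured differently.
SAMPLE = """<annotations>
  <track id="0" label="car">
    <box frame="0"/>
    <box frame="1"/>
  </track>
  <track id="1" label="bus">
    <box frame="0"/>
  </track>
</annotations>"""


def count_boxes_by_label(xml_text):
    """Count <box> children of each <track>, grouped by the track's label."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for track in root.iter("track"):
        counts[track.get("label")] += len(track.findall("box"))
    return counts


counts = count_boxes_by_label(SAMPLE)
print(dict(counts))  # {'car': 2, 'bus': 1}
```

For files on disk, `ET.parse(filename).getroot()` can replace `ET.fromstring`.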