S series · add data to the text file without adding duplicate values

2022-06-10 04:35:00 Python advanced

The S stands for "water" (filler) and can also be read as "Small". In daily work and study I occasionally come across small, interesting operations I have not seen before. They may not matter much for the problem at hand, and they may not amount to a full article on their own, but I still want to record them, so I write them down here as a running log.

Series title format:

S series ·<< Article title >>

Platform:

  • Windows 10.0

  • Python 3.8

Purpose

Add new data to a text file (which may not exist yet), one value per line, so that the file never contains duplicate values.

Approach

  • The text file does not exist

data = ['243', '122', '782', '577', '478', '334', '334', '738', '122', '112', '634']

The data to save is shown above. The strings '122' and '334' each appear twice, so the list can be de-duplicated before saving.

data2 = list(set(data))

The line above uses set to remove duplicates. A set does not keep the original order, so if the order matters, sort the result by each value's first position in the original list as follows, or write your own de-duplication function.

data2.sort(key=data.index)
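
The "own function" option mentioned above could look like the following minimal sketch. The name dedupe_keep_order is a hypothetical helper, not part of the original code. It relies on dict keys being unique and keeping insertion order, which Python guarantees from version 3.7 on, so no separate sort is needed.

```python
def dedupe_keep_order(items):
    """Return items without duplicates, keeping each first occurrence in order."""
    # dict keys are unique and (Python 3.7+) keep their insertion order
    return list(dict.fromkeys(items))


data = ['243', '122', '782', '577', '478', '334', '334', '738', '122', '112', '634']
data2 = dedupe_keep_order(data)
print(data2)
# ['243', '122', '782', '577', '478', '334', '738', '112', '634']
```

This makes a single pass over the list, whereas sorting with key=data.index searches the list again for every element.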

Because the file does not exist yet, it can be opened with mode='w', and encoding='utf-8' can be specified so that it is saved as UTF-8.

with open('test.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(data2) + '\n')

  • The text file exists

Assume test.txt already exists with the content written above, and the new data below needs to be added to it.

new = ['243', '122', '989', '989', '577', '159']

Two problems need to be solved here :

  1. The new data itself needs to be de-duplicated

  2. The new data also needs to be de-duplicated against the existing data

Here '989' is duplicated within the new data, and '243', '122' and '577' already exist in the file, so only one '989' and '159' should be added to test.txt.

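new = ['243', '122', '989', '989', '577', '159']

# Read the existing values line by line
with open('test.txt', encoding='utf-8') as f:
    data_list = []
    r_data = f.readline()
    while r_data:  # readline() returns '' only at the end of the file
        if r_data.strip():  # skip blank lines instead of stopping at them
            data_list.append(r_data.strip())
        r_data = f.readline()

# Keep only the values that are in new but not yet in the file
new2 = list(set(new).difference(set(data_list)))
new2.sort(key=new.index)  # restore the order the values have in new

if new2:  # write nothing when there is nothing to add, to avoid a blank line
    with open('test.txt', 'a', encoding='utf-8') as f:
        f.write('\n'.join(new2) + '\n')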

The first with open reads the existing content of the text file. It reads line by line to keep memory usage low and strips the newline character from each line as it goes. Reading everything at once would still require handling the newline characters, so line-by-line reading is used here. The set difference then keeps only the values that appear in new but not in the file, and the sort restores their order in new. The second with open uses mode='a' to append those values to the file.
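
For comparison, the read-everything-at-once alternative mentioned above could be sketched as follows. The function name read_values and the throwaway demo file are assumptions made for illustration. Here splitlines() takes care of the newline characters, at the cost of holding the whole file in memory.

```python
import os
import tempfile


def read_values(path):
    """Read the whole file at once and return its non-empty lines."""
    with open(path, encoding='utf-8') as f:
        # splitlines() drops the line endings; strip() removes stray spaces
        return [line.strip() for line in f.read().splitlines() if line.strip()]


# Demo on a throwaway file so that the real test.txt is left alone
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(demo_path, 'w', encoding='utf-8') as f:
    f.write('243\n122\n\n782\n')

values = read_values(demo_path)
print(values)  # ['243', '122', '782']
```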

Using two separate open calls for reading and writing is a little cumbersome. Setting mode to 'a+' opens the file for appending and also allows the data to be read.

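new = ['243', '122', '989', '989', '577', '159', '777']

with open('test.txt', 'a+', encoding='utf-8') as f:
    f.seek(0)  # Place the file cursor at the beginning of the file
    data_list = []
    r_data = f.readline()
    while r_data:  # readline() returns '' only at the end of the file
        if r_data.strip():  # skip blank lines instead of stopping at them
            data_list.append(r_data.strip())
        r_data = f.readline()

    # Keep only the values that are in new but not yet in the file
    new2 = list(set(new).difference(set(data_list)))
    new2.sort(key=new.index)  # restore the order the values have in new
    if new2:  # write nothing when there is nothing to add
        f.write('\n'.join(new2) + '\n')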

Compared with the previous method, only one open call is needed, but the comparison and de-duplication code has to move inside the with block. When a file is opened in 'a' mode, the cursor is placed at the end of the file by default so that new data is appended there. Each read moves the cursor forward, and readline can likewise read through to the end of the file. Because the cursor starts at the end in append mode, reading immediately returns nothing, so seek(0) is needed to move the cursor to the beginning of the file first. After that, the existing data can be read as intended, compared with the data to add, and the new values written.
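
The cursor behaviour described above can be checked with a small experiment. This is only a sketch: the file name cursor_demo.txt and the temporary directory are assumptions, chosen so that the real test.txt is not touched.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), 'cursor_demo.txt')
with open(demo_path, 'w', encoding='utf-8') as f:
    f.write('243\n122\n')

with open(demo_path, 'a+', encoding='utf-8') as f:
    at_end = f.read()      # cursor starts at the end, so nothing is read
    f.seek(0)              # move the cursor to the beginning of the file
    from_start = f.read()  # now the existing content is returned
    f.write('989\n')       # the read left the cursor at the end; data is appended

with open(demo_path, encoding='utf-8') as f:
    final = f.read()

print(repr(at_end))      # ''
print(repr(from_start))  # '243\n122\n'
print(repr(final))       # '243\n122\n989\n'
```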

Summary

This note showed how to add data to a text file without creating duplicate values. It started with the case where the file does not exist and built up step by step to writing in 'a+' mode, which also covers the missing-file case because 'a+' creates the file. With the cursor moved to the beginning, the existing data is read first, and the new data is then de-duplicated against it. Other factors were not taken into account, so there may be mistakes in the code design.
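
As one possible way to package the whole procedure, the sketch below wraps it in a single function. The name append_unique and its return value are hypothetical choices, not part of the original code, and the demo writes to a temporary directory rather than the working directory.

```python
import os
import tempfile


def append_unique(path, values):
    """Append to path the values it does not contain yet; return what was added."""
    with open(path, 'a+', encoding='utf-8') as f:  # 'a+' creates a missing file
        f.seek(0)  # append mode starts at the end; go back to read existing data
        seen = {line.strip() for line in f if line.strip()}
        added = []
        for value in values:
            if value not in seen:
                seen.add(value)  # also filters duplicates inside values
                added.append(value)
        if added:  # avoid writing a lone blank line
            f.write('\n'.join(added) + '\n')
    return added


demo_path = os.path.join(tempfile.mkdtemp(), 'test.txt')
first = append_unique(demo_path, ['243', '122', '782', '577', '478', '334',
                                  '334', '738', '122', '112', '634'])
second = append_unique(demo_path, ['243', '122', '989', '989', '577', '159'])
print(second)  # ['989', '159']
```

Tracking seen values in a set while walking the input keeps the original order without the extra sort, and the same call works whether or not the file exists.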

Like a fierce Falcon , wild and intractable .


Written 2022.6.7

Original site

Copyright notice
This article was written by [Python advanced]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/161/202206100432407417.html