Dealing with duplicate data in Excel with xlwings
2022-07-03 08:22:00 【cuncaojin】
A brief introduction to xlwings
xlwings is a Python library that simplifies communication between Python and Excel.
xlwings: make Excel run fast!
Background, requirements, and solution
A few months ago I helped a friend who works in a hospital by writing a VBA program, learning VBA as I went. It moved data from 2 Excel documents into 3 Excel documents, with template data and charts involved. Worried about data errors, I tested it over and over, and it took 2 days to get it working. While figuring out how to develop in VBA, I found VBA development far too painful.
A few days ago I came across an article titled "Put on Wings and Let Excel Fly: xlwings". The title caught my attention, and that is how I learned about this library.
Then yesterday my friend asked me to help with another Excel problem. This time the requirement is very simple:
1. There is only one table, and it has only 3 columns.
2. Two or more rows count as duplicates if their column 2 and column 3 data are the same; whether column 1 matches is not considered.
3. For each group of duplicates, keep any one row.
In other words: delete the redundant rows whose last two columns contain the same data.
So I thought of xlwings and wanted to try it out and see how it works (I had not even written a Hello World in Python yet).
The result looks like this:
- Solution: see the Code section of this article; Scheme code 2 is recommended.
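The three requirements above can be tried out in plain Python before touching Excel. The following is only a minimal sketch, not the code used later in this article: the function name dedupe_by_last_two and the sample rows are made up for illustration, and it assumes every cell is a string.

```python
def dedupe_by_last_two(rows):
    """Keep the first row for each distinct (column 2, column 3) pair."""
    seen = set()
    kept = []
    for row in rows:
        # Column 1 (row[0]) is ignored on purpose; strip() guards against stray spaces
        key = (row[1].strip(), row[2].strip())
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical sample data: duplicates are decided by the last two columns only
sample = [
    ["catalogue A", "Amoxicillin", "capsule"],
    ["catalogue B", "Amoxicillin", "granule"],
    ["catalogue B", "Amoxicillin ", "capsule"],  # same as the first row after strip()
]
result = dedupe_by_last_two(sample)
print(len(result))  # 2
```

Using a set of already-seen keys means each row is looked at once. The requirement allows keeping any row of a duplicate group; this sketch happens to keep the first one.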
Environment setup
Anaconda is recommended for sorting out the environment; search online for how to use it.
Installing Python 3
Note: grant administrator privileges during installation, otherwise errors are likely.
Installing libraries
pip install library_name[==version]
pip install xlwings
Other libraries (install as needed)
Python syntax notes and things to watch out for
>>> def fib(n):
...     a, b = 0, 1
...     while a < n:
...         print(a, end=' ')
...         a, b = b, a+b
...     print()
...
>>> fib(1000)
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
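Errors caused by Chinese characters
Add #coding=utf-8 as the first line of the file,
or # -*- coding: UTF-8 -*-
(Python 3 already treats source files as UTF-8 by default, so this declaration mainly matters for Python 2 or for files saved in another encoding.)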
The type() function
Used to check the data type of a value; very useful. The output looks like: <class 'list'> <class 'xlwings.main.Range'>
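A small illustration of how type() can be used while debugging. The values here are arbitrary examples chosen for this sketch; xlwings is deliberately not imported, so the xlwings.main.Range case is not reproduced.

```python
values = [[1, 2, 3], (1, 2, 3), range(3), "abc", 3.14, None]

for v in values:
    # type(v) returns the class of v; __name__ is the short class name
    print(type(v).__name__, type(v))

names = [type(v).__name__ for v in values]
```

Printing type() of whatever a library call returns is a quick way to find out whether you received a single value, a flat list, or a list of lists.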
Strict indentation
Because Python has no braces, indentation must be strictly consistent; otherwise the meaning of the code can change, or an error is raised.
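To make the point about indentation concrete, here is a minimal sketch with two made-up functions that differ only in the indentation of one line.

```python
def count_inside(items):
    calls = 0
    for _ in items:
        calls += 1      # indented under the for: runs once per item
    return calls


def count_outside(items):
    calls = 0
    for _ in items:
        pass
    calls += 1          # dedented: runs once, after the loop has finished
    return calls


print(count_inside([10, 20, 30]))   # 3
print(count_outside([10, 20, 30]))  # 1
```

Both versions are valid Python, so no error is reported; only the result differs.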
Sequence types: list, tuple, range
Lists
A list is a mutable sequence.
class list([iterable])
Use a pair of square brackets to denote the empty list: []
Use square brackets, separating items with commas: [a], [a, b, c]
Use a list comprehension: [x for x in iterable]
Use the type constructor: list() or list(iterable)
Tuples
A tuple is an immutable sequence, usually used to store collections of heterogeneous data.
class tuple([iterable])
Use a pair of parentheses to denote the empty tuple: ()
Use a trailing comma for a single-element tuple: a, or (a,)
Separate multiple items with commas: a, b, c or (a, b, c)
Use the built-in tuple(): tuple() or tuple(iterable)
tuple('abc') returns ('a', 'b', 'c') and tuple([1, 2, 3]) returns (1, 2, 3). If no argument is given, the constructor creates an empty tuple, ().
Ranges
The range type represents an immutable sequence of numbers and is commonly used to loop a specific number of times in for loops.
class range(start, stop[, step])
class range(stop)
Example
>>> 3,0,-1
(3, 0, -1)
>>> (3,0,-1)   # Same as above; the parentheses of a tuple can usually be omitted, but omitting them is not recommended
(3, 0, -1)
>>> range(3,0,-1)   # A range of integers from 3 down to 1 (0 is excluded), in steps of -1
range(3, 0, -1)
>>> for i in range(3,0,-1):
...     print(i)
...
3
2
1
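The construction forms listed above can be checked side by side. This is a small sketch with arbitrary variable names; it also shows the practical difference between a mutable list and an immutable tuple.

```python
empty_list = []
from_iterable = list("abc")           # ['a', 'b', 'c']
squares = [x * x for x in range(4)]   # [0, 1, 4, 9]

single = ("a",)                       # the trailing comma makes this a tuple
not_a_tuple = ("a")                   # without the comma it is just the string 'a'
from_list = tuple([1, 2, 3])          # (1, 2, 3)

countdown = list(range(3, 0, -1))     # [3, 2, 1]

# Lists can be changed in place; tuples cannot
from_iterable.append("d")
try:
    from_list[0] = 99
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

print(from_iterable, from_list, tuple_is_mutable)
```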
The for loop really is different from other languages
The only syntax is the for-in form:
"for" target_list "in" expression_list ":" suite
["else" ":" suite]
Therefore, if you want to change the number of iterations while the loop is running, use while instead.
r = 3
for i in range(r):
    print(i)
    # The changes below have no effect on the for loop: i is overwritten on the next iteration,
    # and its values come from the range that was built before the loop started
    i -= 1
    r -= 2
# Output:
# 0
# 1
# 2
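For comparison, here is a minimal sketch of the while alternative next to the for version. The variable names are arbitrary; the point is that a while condition is re-evaluated on every pass, whereas range() is built once.

```python
# for: the range is created once, so changing r inside the loop does nothing
r = 5
seen_by_for = []
for i in range(r):
    seen_by_for.append(i)
    r -= 2

# while: the condition i < limit is checked again before every pass
limit = 5
i = 0
seen_by_while = []
while i < limit:
    seen_by_while.append(i)
    if i == 1:
        limit -= 2   # takes effect immediately: the loop now stops before 3
    i += 1

print(seen_by_for)    # [0, 1, 2, 3, 4]
print(seen_by_while)  # [0, 1, 2]
```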
String concatenation
When the plus sign is used to concatenate strings, non-string values must first be converted with the str() function, for example "Hello " + str(123);
or use print("Hello %s" % 123),
or, more simply, print(a, b, c...),
or concatenate with the format() method:
"Hello {1}{0}{2}".format(1,2,3)   # 'Hello 213'
"Hello {}{}{}".format(1,2,3)   # 'Hello 123'
and so on.
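The options above can be compared in one short script. The f-string line is an addition not mentioned in this article (it needs Python 3.6 or later); everything else mirrors the forms already listed.

```python
n = 123

by_plus = "Hello " + str(n)            # explicit conversion with str()
by_percent = "Hello %s" % n            # printf-style formatting
by_format = "Hello {}".format(n)       # str.format with automatic numbering
by_fstring = f"Hello {n}"              # f-string, Python 3.6+
reordered = "Hello {1}{0}{2}".format(1, 2, 3)

# Without str(), the plus sign refuses to mix str and int
try:
    "Hello " + n
    plus_needs_str = False
except TypeError:
    plus_needs_str = True

print(by_plus, reordered, plus_needs_str)
```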
The colon
After if, for, and similar statements, the colon introduces the block of code that belongs to them.
>>> if x < 0:
...     x = 0
...     print('Negative changed to zero')
... elif x == 0:
...     print('Zero')
... elif x == 1:
...     print('Single')
... else:
...     print('More')
The colon when indexing lists (lists are similar to arrays in C; Python has no C-style array type built in)
>>> x
[1, 3, 5, 8, 9]
>>> x[1:3]   # an index range
[3, 5]
>>> x[:]   # everything
[1, 3, 5, 8, 9]
A two-dimensional array built from 3 sub-lists (array here is numpy's array function):
X = array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
X[:, 0] takes element 0 of every row of matrix X, that is, column 0; X[:, 1] takes column 1 of every row.
X[:, m:n] takes columns m to n-1 of every row of X; the left bound is included and the right bound is excluded.
Recommended reading: the official documentation's explanation of range.
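The X[:, 0] notation works on NumPy arrays but not on plain nested lists, which is what the rest of this article works with. As a hedged sketch, the same column selections can be written with list comprehensions; the helper names column and columns are invented for this example.

```python
X = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]


def column(rows, k):
    """Plain-list equivalent of X[:, k]: element k of every row."""
    return [row[k] for row in rows]


def columns(rows, m, n):
    """Plain-list equivalent of X[:, m:n]: columns m to n-1 of every row."""
    return [row[m:n] for row in rows]


print(column(X, 0))       # [1, 5, 9]
print(columns(X, 1, 3))   # [[2, 3], [6, 7], [10, 11]]

# A plain list rejects the two-part index used by NumPy
try:
    X[:, 0]
    nested_list_supports_it = True
except TypeError:
    nested_list_supports_it = False
```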
a[x:y:z]
x is the start of the slice, y is the end of the slice, and z is the step; the step z defaults to 1.
If z is positive, x and y default to the beginning and the end of the list. For a range r, the contents are given by the formula r[i] = start + step*i where i >= 0 and r[i] < stop.
If z is negative, the order is reversed. The formula is still r[i] = start + step*i, but the constraints become i >= 0 and r[i] > stop.
If r[0] does not satisfy the value constraint, the range is empty. Ranges do support negative indices (the last element has index -1, the second to last has index -2), but these are interpreted as indexing from the end of the sequence determined by the positive indices.
If z is 0, an error is raised, as follows:
>>> a = [1,3,5,8]
>>> a[::]
[1, 3, 5, 8]
>>> a[1:3:]
[3, 5]
>>> a[::2]
[1, 5]
>>> a[::-1]
[8, 5, 3, 1]
>>> a[::-2]
[8, 3]
>>> a[1::-2]
[3]
>>> a[0:3:-1]
[]
>>> a[3:0:-1]
[8, 5, 3]
>>> a[3:0:-2]
[8, 3]
>>> a[1:3:0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: slice step cannot be zero
range(10)[::2] is the slice of the range [0, 10) with a step of 2:
>>> range(10)[::2]
range(0, 10, 2)
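One way to see which indices a slice a[x:y:z] actually visits is the built-in slice.indices() method, which fills in the defaults for a given length. The helper slice_by_hand below is a name made up for this sketch; it is an illustration of the rule, not how Python implements slicing internally.

```python
def slice_by_hand(seq, start=None, stop=None, step=None):
    """Rebuild seq[start:stop:step] by walking the indices explicitly."""
    # indices() resolves None defaults and negative values against len(seq)
    begin, end, stride = slice(start, stop, step).indices(len(seq))
    return [seq[i] for i in range(begin, end, stride)]


a = [1, 3, 5, 8]
print(slice(None, None, -1).indices(len(a)))  # (3, -1, -1)
print(slice_by_hand(a, None, None, -1))       # [8, 5, 3, 1]
print(slice_by_hand(a, 3, 0, -2))             # [8, 3]
```

With a negative step the defaults flip: the walk starts at the last index and stops just before index 0 is passed, which is why a[::-1] reverses the list.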
Code
Scheme code 1 for this requirement (not recommended)
This code does produce the correct result, but note the statement for i in range(rows-1): although rows is modified inside the loop, range(rows-1) is evaluated only once and i is reassigned on every iteration, so the changes to rows do not affect the values i takes. i always runs through every integer in [0, rows-1), using the value rows had when the loop started. See the official explanation of the for statement.
#coding=utf-8
# The first line prevents garbled Chinese characters
import xlwings as xw
import operator
# Start Excel: visible=True shows the window, add_book=False starts it without creating a new workbook
app = xw.App(visible=True, add_book=False)
wb = xw.Book('test.xls')
sht = wb.sheets[0]
rng = sht.range('A1').expand('table')
rows = rng.rows.count
cols = rng.columns.count
print(str(rows) + ", " + str(cols))
# Read the table into a two-dimensional list (named data so it does not shadow the built-in all())
data = sht.range((1, 1), (rows, cols)).value
for i in range(rows-1):
    print("--" + str(i) + "---")
    print(range(rows-1, i, -1))
    # Walk upwards from the last row, so deleting row j does not shift the rows still to be checked
    for j in range(rows-1, i, -1):
        # print(str(i)+", "+str(j))
        # strip() removes leading and trailing whitespace from the cell data so it cannot affect the comparison
        if operator.eq(data[i][1].strip(), data[j][1].strip()) and operator.eq(data[i][2].strip(), data[j][2].strip()):
            # print(data[j])
            # print(str(j))
            rng.rows[j].api.EntireRow.Delete()
            del data[j]
            rows -= 1
wb.save()
wb.close()
app.quit()
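The inner loop in Scheme 1 runs from the last row upwards, range(rows-1, i, -1). The sketch below, using made-up helper names and a flat list of numbers rather than Excel rows, illustrates why deleting while walking backwards is safe and deleting while walking forwards is not.

```python
def drop_backwards(items, should_drop):
    items = list(items)                      # work on a copy
    for j in range(len(items) - 1, -1, -1):
        if should_drop(items[j]):
            del items[j]                     # only shifts elements after j, which were already visited
    return items


def drop_forwards_buggy(items, should_drop):
    items = list(items)
    for j, item in enumerate(items):         # deleting here shifts the elements not yet visited
        if should_drop(item):
            del items[j]
    return items


def is_even(n):
    return n % 2 == 0


print(drop_backwards([2, 4, 6, 7], is_even))        # [7]
print(drop_forwards_buggy([2, 4, 6, 7], is_even))   # [4, 7]
```

In the forward version, removing 2 moves 4 into the slot that was just visited, so 4 is never examined.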
Scheme code 2 (recommended)
#coding=utf-8
import xlwings as xw
# app = xw.App(visible=True, add_book=False)
wb = xw.Book('ok.xls')
# wb = app.books.open(r'D:/cuncaojin/desktop/ok.xls')
# wb = xw.Book(r'D:/cuncaojin/desktop/ok.xls')
sht = wb.sheets[0]
# myList = [['2018 National essential drug procurement catalogue ', ' Albendazole ', ' tablet '], ['2018 National essential drug procurement catalogue ', ' Alfacalcitol ', ' tablet '], ['2018 The country Catalogue of essential drugs ', ' Amikacin ', ' Injection '], ['2018 National essential drug procurement catalogue ', ' Amoxicillin ', ' capsule '], [' Other directories ', ' Amoxicillin ', ' Granule '], [' Other directories ', ' Amoxicillin ', ' Dry suspension '], [' Other directories ', ' Amoxicillin ', ' Dispersible tablets '], [' Other directories ', ' Amoxicillin ', ' capsule '], ['2018 National essential drug procurement catalogue ', ' Amoxicillin ', ' Granule '], [' Catalogue of commonly used low-cost drugs and essential drugs ', ' Amoxicillin ', ' Dispersible tablets 2'], ['2018 National Foundation This drug purchase catalogue ', ' Amoxicillin ', ' capsule '], [' Catalogue of commonly used low-cost drugs and essential drugs ', ' Amoxicillin ', ' Dispersible tablets '], ['2018 National essential drug procurement catalogue ', ' Amoxicillin 3', ' capsule '], ['2018 National essential drug procurement catalogue ', ' Amoxicillin ', ' capsule '], ['2018 National essential drug procurement catalogue ', ' Amoxicillin ', ' capsule ']]
myList = sht[0,0].current_region.value
# print(len(myList),"\n", myList)
i = 0
# len(myList) shrinks as rows are removed, so a while loop is used instead of for
while i < len(myList)-1:
    for j in range(len(myList)-1, i, -1):
        if myList[i][1].strip()==myList[j][1].strip() and myList[i][2].strip()==myList[j][2].strip():
            # Remove the duplicate row by index. The earlier version used myList.remove(myList[j]),
            # which deletes the first row equal to myList[j] and can therefore remove row i instead of row j
            del myList[j]
    # Only rows after i are deleted, so i needs no compensation
    i += 1
sht['F1'].value = myList
# sht['F1'].current_region.autofit()
# sht['F1'].current_region.color = (221,170,244)
# Print the final number of data rows
print("Number of final valid data rows: %d" % len(myList))
wb.save()
wb.close()
# Sometimes quit(), or even kill(), fails to close Excel
# app.quit()
# app.kill()
# Quit the Excel application
for app in xw.apps:
    app.quit()
Other code, just for show
For more usage, see the References section of this article. The official Python API, the official xlwings API, and the official Excel API are strongly recommended.
import matplotlib.pyplot as plt
import xlwings as xw
import pandas as pd
import numpy as np
import os
path = r'E:\yg\desktop\test.xlsx'
# Named exists rather than exit, so it does not shadow the built-in exit()
exists = os.path.exists(path)
app = xw.App(visible=True, add_book=False)
# Open the workbook if it already exists, otherwise create a new one
if exists:
    wb = xw.Book(path)
else:
    wb = app.books.add()
sht = wb.sheets[0]
# Plot random data as a bar chart and place the figure on the sheet
df = pd.DataFrame(np.random.rand(7, 4), columns=['aaa', 'bb', 'c', 'd'])
ax = df.plot(kind='bar')
fig = ax.get_figure()
sht.pictures.add(fig, name='MyPlot', update=True)
wb.save(path)
wb.close()
app.quit()
Help
Because I had never touched Python before and do not know VBA either, this article may contain inaccuracies or errors. If you find any, please point them out.
References
- Python API
- xlwings official API (Chinese)
- xlwings official documentation (English)
- Range.EntireRow property (Excel)
- Demos included in the xlwings project
- Put on Wings and Let Excel Fly: xlwings
- Drawing commonly used charts with Python
- Pandas tutorial
- Novice Tutorial: Python basics tutorial
- Use of colons in Python NumPy arrays
- What the double colon [::] slice does in Python
- Four ways to traverse a list in Python
- List comprehensions of the form xx for xx in yy in Python