Efficient office of fintech (I): automatic generation of trust plan specification
2022-06-23 06:05:00 【cjh_ hit】
Efficient office work in fintech: automatically generating the trust plan specification
Background
Computers have greatly improved work efficiency. But beyond the mature software on the market, the financial industry often has to write its own small tools to meet actual business needs and improve office efficiency.
Yesterday afternoon the company where I am interning gave me a task and said it was urgent: automatically generate the trust plan specification from a mapping between the paragraphs of two Word files. Specifically, one document is the due diligence report, which contains information about the business participants, filled in according to a fixed template. The other document is the plan specification, which also follows a fixed template.
(So there is not much to this project: use Python to copy specified paragraphs from one Word file to specified positions in another Word file.)
Due diligence report:
Plan specification:
The mapping table:
Requirements
In the past, employees copied and pasted by hand. But the company handles a large volume of contracts every day, so a program that generates plan specifications automatically from the mapping is needed to improve office efficiency.
Implementation
After some research, I ended up using the python-docx library to write a program that copies the content between two paragraphs of document A (including text and tables) to the position after a specified paragraph of document B.
Because this is my first contact with python-docx, I am not very familiar with the principles and details of many of its interfaces. The principle seems to be that python-docx maps the structure of a docx file onto XML. Last year I wrote a Java program that processed XML, but I have long since forgotten the details…
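To make the XML idea concrete, here is a minimal sketch using only the standard library. The XML string is a hand-written stand-in for the body of a Word file, not taken from a real document, and the helper names element_text and blocks_between are hypothetical. It walks the children of the body in document order and collects the paragraphs (w:p) and tables (w:tbl) that lie strictly between a start paragraph and an end paragraph, which is the same traversal the program relies on.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hand-written stand-in for word/document.xml (assumption, not a real file).
DOCUMENT_XML = f"""
<w:document xmlns:w="{W}">
  <w:body>
    <w:p><w:r><w:t>1. Borrower</w:t></w:r></w:p>
    <w:p><w:r><w:t>Founded in </w:t></w:r><w:r><w:t>2010.</w:t></w:r></w:p>
    <w:tbl><w:tr><w:tc><w:p><w:r><w:t>Assets</w:t></w:r></w:p></w:tc></w:tr></w:tbl>
    <w:p><w:r><w:t>2. Guarantor</w:t></w:r></w:p>
    <w:p><w:r><w:t>Not copied.</w:t></w:r></w:p>
  </w:body>
</w:document>
"""


def element_text(element):
    """Join the text of every w:t descendant (the runs of a paragraph or table)."""
    return "".join(t.text or "" for t in element.iter(f"{{{W}}}t"))


def blocks_between(xml, start_text, end_text):
    """Return (kind, text) for body children strictly between two paragraphs."""
    body = ET.fromstring(xml).find("w:body", NS)
    collected, inside = [], False
    for child in body:
        kind = child.tag.split("}")[1]  # 'p' for a paragraph, 'tbl' for a table
        text = element_text(child)
        if not inside:
            # Skip everything up to and including the start paragraph.
            inside = kind == "p" and text == start_text
            continue
        if kind == "p" and text == end_text:
            break
        collected.append((kind, text))
    return collected


blocks = blocks_between(DOCUMENT_XML, "1. Borrower", "2. Guarantor")
print(blocks)
```

Paragraphs and tables are siblings under the body element, so moving from one sibling to the next visits them in the order they appear on the page. That is why a table sitting between two paragraphs can be picked up in the same pass as the text.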
The source code is given directly below:
Since this is my first time using python-docx, please forgive and point out anything non-standard or unidiomatic.
Of course, install the library first with pip install python-docx.
from docx import Document
from docx.text.paragraph import Paragraph
from docx.oxml.text.paragraph import CT_P
from docx.oxml.table import CT_Tbl
from docx.table import Table
from copy import deepcopy
import pandas as pd
def copyText(filename, paratext, Para):
    """Insert a copy of paragraph Para after the paragraph matching paratext.

    paratext is either the anchor text (str) or a Paragraph whose text is the anchor.
    """
    document = Document(filename)
    paras = document.paragraphs
    anchor_text = paratext if isinstance(paratext, str) else paratext.text
    print('copy:', anchor_text, Para.text)
    # Default of 0 as in the original: with no match, insert before the first paragraph.
    index = 0
    for i, para in enumerate(paras):
        if para.text == anchor_text:
            index = i + 1  # no break: the last matching paragraph wins
    if index < len(paras):
        newPara = paras[index].insert_paragraph_before()
    else:
        # The anchor is the last paragraph, so append at the end of the document.
        newPara = document.add_paragraph()
    for run in Para.runs:
        # Copy content (including styles)
        newParaRun = newPara.add_run(run.text)
        newParaRun.bold = run.bold
        newParaRun.italic = run.italic
        newParaRun.underline = run.underline
        newParaRun.font.color.rgb = run.font.color.rgb
        try:
            # Assign the style by name; assigning to style.name would rename the target's style.
            newParaRun.style = run.style.name
        except KeyError:
            pass  # The target document does not define this style; keep the default.
    newPara.paragraph_format.alignment = Para.paragraph_format.alignment
    newPara.paragraph_format.first_line_indent = Para.paragraph_format.first_line_indent
    newPara.paragraph_format.left_indent = Para.paragraph_format.left_indent
    newPara.paragraph_format.right_indent = Para.paragraph_format.right_indent
    document.save(filename)
def copyTable(filename, paratext, table):
    """Insert a deep copy of table right after the paragraph matching paratext."""
    document = Document(filename)
    anchor_text = paratext if isinstance(paratext, str) else paratext.text
    paragraph = None
    for para in document.paragraphs:
        if para.text == anchor_text:
            paragraph = para  # no break: the last matching paragraph wins
    if paragraph is None:
        # Without an anchor there is nowhere to insert the table.
        print('copyTable: anchor paragraph not found:', anchor_text)
        return
    new_tbl = deepcopy(table._tbl)
    paragraph._p.addnext(new_tbl)
    document.save(filename)
def Copy_Contents_Between_ParaA_ParaB_to_ParaC(filename1, filename2, Paratext1, Paratext2, Paratext3):
    """Copy everything between paragraphs Paratext1 and Paratext2 of filename1
    to the position after paragraph Paratext3 of filename2."""
    documentA = Document(filename1)
    paragraphs = documentA.paragraphs  # All top-level paragraphs
    ele = None
    for aPara in paragraphs:
        if Paratext1 == aPara.text:  # Matched the start paragraph
            ele = aPara._p.getnext()
            break
    while ele is not None:  # Walk forward through the sibling elements
        if isinstance(ele, CT_P):  # It's a paragraph
            para = Paragraph(ele, documentA)
            if Paratext2 == para.text:  # Reached the end paragraph
                break
            copyText(filename2, Paratext3, para)  # Copy the paragraph
            if para.text != '':
                Paratext3 = para  # The next block goes after the one just copied
        elif isinstance(ele, CT_Tbl):  # It's a table
            table = Table(ele, documentA)
            copyTable(filename2, Paratext3, table)  # Copy the table
        ele = ele.getnext()
if __name__ == '__main__':
    data = pd.read_excel(' To tune out - Plan specification mapping table .xlsx')
    for i in range(len(data[' Plan statement ( To generate table )'])):
        Copy_Contents_Between_ParaA_ParaB_to_ParaC(
            ' Data sources - Due diligence report .docx',
            ' The resulting trust plan statement .docx',
            data[' Due diligence report - Start paragraph '][i],
            data[' Due diligence report - End paragraph '][i],
            data[' Plan statement ( To generate table )'][i])
Plan specification generated after running the program:
As you can see, the data specified in the due diligence report has been copied to the specified positions in the plan specification.
Open problems:
The program above can copy text, tables and styles, but it cannot copy pictures. As far as I can tell, python-docx has no interface for extracting the picture at a given position (at least not in the official documentation), so further development is needed, which requires studying python-docx and some XML. But because time is limited (I still have online classes to watch and homework to write), I will leave this question to my internship supervisor.
If any experts know how to handle images, please share your advice.
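One possible starting point for the picture problem, offered as a sketch rather than a tested solution: in the underlying XML, an inline picture shows up inside a run as a drawing element containing an a:blip whose r:embed attribute holds a relationship id. The fragment below is hand-written and heavily simplified (a real file nests more elements between w:drawing and a:blip), and the function name embedded_image_ids is hypothetical. It only shows how to detect which relationship ids a paragraph refers to, using the standard library.

```python
import xml.etree.ElementTree as ET

NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
}

# Simplified, hand-written paragraph with one picture run (assumption, not a real file).
PARAGRAPH_XML = f"""
<w:p xmlns:w="{NS['w']}" xmlns:a="{NS['a']}" xmlns:r="{NS['r']}">
  <w:r><w:t>Figure: </w:t></w:r>
  <w:r>
    <w:drawing>
      <a:graphic><a:graphicData><a:blip r:embed="rId7"/></a:graphicData></a:graphic>
    </w:drawing>
  </w:r>
</w:p>
"""


def embedded_image_ids(paragraph_xml):
    """Return the relationship ids of all pictures referenced in a paragraph."""
    paragraph = ET.fromstring(paragraph_xml)
    embed_attr = f"{{{NS['r']}}}embed"
    # a:blip can sit at any depth below the run, so search all descendants.
    return [
        blip.get(embed_attr)
        for blip in paragraph.iter(f"{{{NS['a']}}}blip")
        if blip.get(embed_attr)
    ]


image_ids = embedded_image_ids(PARAGRAPH_XML)
print(image_ids)
```

With python-docx itself, such an id can typically be resolved through document.part.related_parts[rId], whose blob attribute holds the image bytes, and the bytes can then be written into the target with run.add_picture(io.BytesIO(blob)). This has not been tried against the documents in this post, and the original size and position of the picture would still have to be carried over separately.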