[VBA Script] Extract the details and resolution status of all comments in a Word document
2022-06-11 01:27:00 【Netherland's meow】
Preface
I have written a tool for Word documents before:
A keyword indexer for .docx documents
The idea behind that tool was to check, late in a project, whether keywords such as TBD/TODO are still left in a document, as a way of judging how complete the document is. I then kept looking into tools for checking document quality and noticed that many of our document reviews are done through comments (some are done on websites, of course), and the resolution status of those comments is not easy to see at a glance:

This is especially true for long documents, where you have to go through the comments one by one (Word does support jumping to the next unresolved comment, of course). That is fine for a single document, but a delivery lead who is responsible for the quality of many documents cannot realistically read through each one, so I think a tool that collects and archives these statistics is worthwhile. Review websites and platforms that do this already exist, so this tool is mainly for my own practice, or for people who have not bought such a platform.
The end goal
Operated through a graphical interface:
1. Select a directory, then recursively collect all Word files in it;
2. For each Word file, extract every comment, including the document path, page number, line number, comment text, commented source text, author, comment time, and resolution status. The resolution status is the key piece of information;
3. Provide an option to extract only unresolved comments;
4. After extraction, write the collected information to an Excel workbook for review.
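Step 1 of the plan can be sketched in Python as follows. This is only a minimal illustration, not the tool itself: the function name `find_word_files` is hypothetical, and it assumes that "Word files" means files ending in .docx or .doc. It also skips the `~$` lock files that Word creates next to a document while it is open.

```python
from pathlib import Path

WORD_SUFFIXES = {".docx", ".doc"}  # assumption: these two extensions are enough


def find_word_files(root):
    """Return every Word file under root, searching subdirectories too."""
    found = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Word writes "~$name.docx" lock files while a document is open
        if path.name.startswith("~$"):
            continue
        if path.suffix.lower() in WORD_SUFFIXES:
            found.append(path)
    return sorted(found)
```

The returned list of paths could then be handed, one file at a time, to whatever does the actual comment extraction.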
Extracting comment information
Extracting comments with Python
At first I thought of using Python to extract the comment information from the docx file, and adapted some existing code:
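    from zipfile import ZipFile

    from bs4 import BeautifulSoup  # the "xml" parser also requires lxml


    def docx_comments_get(file):
        # A .docx file is a ZIP archive; the comment text is stored in word/comments.xml
        with ZipFile(file) as document:
            xml = document.read("word/comments.xml")
        wordObj = BeautifulSoup(xml.decode("utf-8"), features="xml")
        # Each <w:t> element holds one run of comment text
        for text in wordObj.find_all("w:t"):
            print(text.text)


    def main():
        # Raw string so the backslashes in the Windows path are not treated as escapes
        docx_comments_get(r"D:\MyWork\python\Test documentation.docx")

But I found that this only retrieves the comment text, and the other information is hard to get. Even after opening the comments.xml source file inside the docx, the content is quite limited: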

The rest of the information is scattered across the other XML files in the package, and I did not know how to piece it together. So extracting the complete comment information with Python looked like a dead end to me.
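For reference, some per-comment metadata does sit in comments.xml itself: each `w:comment` element carries `w:id`, `w:author` and `w:date` attributes. The sketch below reads those with the standard library only. The function name `read_comment_metadata` is hypothetical, and the namespace constant assumes a document saved in the usual (transitional) Open XML format. Page number, line number and resolution status are not in this file, which is exactly the limitation described above.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the transitional WordprocessingML namespace used by ordinary .docx files
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_comment_metadata(docx_file):
    """Return id, author, date and text for each comment in a .docx file."""
    with zipfile.ZipFile(docx_file) as archive:
        # Raises KeyError if the document has no comments part
        root = ET.fromstring(archive.read("word/comments.xml"))
    comments = []
    for node in root.iter(f"{{{W_NS}}}comment"):
        # Join all text runs that belong to this comment
        text = "".join(t.text or "" for t in node.iter(f"{{{W_NS}}}t"))
        comments.append({
            "id": node.get(f"{{{W_NS}}}id"),
            "author": node.get(f"{{{W_NS}}}author"),
            "date": node.get(f"{{{W_NS}}}date"),
            "text": text,
        })
    return comments
```

This covers only part of the list in step 2; the page and line of a comment depend on layout, which is only known once Word has rendered the document.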
Extracting comments with VBA
So I changed direction and used VBA to get the comment information; Microsoft's own tooling should support Word well. Following this direction, I found that VBA can retrieve a Word document's comment information quite completely. I opened the Visual Basic editor from Word's Developer tab and started writing a macro.
Here is my final macro code:
    Public Sub exportWordComments_Click()
        FileName = Application.ActiveDocument.Name ' File name, e.g. name.docx
        varResult = VBA.Split(FileName, ".")
        FileNameStr = varResult(0) ' File name without the suffix (text before the first dot)
        Path = Application.ActiveDocument.Path
        FilePath = Path & "\" & FileName ' Full path of the current file
        LogPath = Path & "\" & FileNameStr & "_comments.txt" ' Output file for the comment information
        'Debug.Print (FilePath)
        If FileName = "False" Then
            Exit Sub
        End If
        Rows = ActiveDocument.Comments.Count ' Total number of comments
        'Debug.Print (Rows)
        Open LogPath For Output As #1 ' Write to the txt file
        Print #1, "==================================================="
        For i = 1 To Rows
            PageNumber = ActiveDocument.Comments(i).Scope.Information(wdActiveEndPageNumber) ' Page the comment is on
            CharacterLineNumber = ActiveDocument.Comments(i).Scope.Information(wdFirstCharacterLineNumber) ' Line of that page the comment is on
            Scope = ActiveDocument.Comments(i).Scope.Text ' Source text the comment is attached to
            ScopeComment = ActiveDocument.Comments(i).Range.Text ' Comment text
            ScopeDate = ActiveDocument.Comments(i).Date ' Comment time
            ScopeAuthor = ActiveDocument.Comments(i).Contact ' Comment author
            ScopeDone = ActiveDocument.Comments(i).Done ' Whether the comment is resolved
            'Debug.Print ("Source text: " & ActiveDocument.Comments(i).Scope) ' Source text
            'Debug.Print (ActiveDocument.Comments(i).Done)
            'Debug.Print (ActiveDocument.Comments(i).Contact)
            'Debug.Print (ActiveDocument.Comments(i).Creator)
            'Debug.Print (ActiveDocument.Comments(i).Date)
            'Debug.Print (ActiveDocument.Comments(i).Index)
            'Debug.Print (ActiveDocument.Comments(i).Parent)
            'Debug.Print (ActiveDocument.Comments(i).Reference)
            'Debug.Print ("Comment text: " & ActiveDocument.Comments(i).Range) ' Comment text
            'Debug.Print (ActiveDocument.Comments(i).IsInk) ' Whether it is an ink comment
            Print #1, "File: " & FilePath
            Print #1, "Page: " & PageNumber
            Print #1, "Line: " & CharacterLineNumber
            Print #1, "Source text: " & Scope
            Print #1, "Comment: " & ScopeComment
            Print #1, "Date: " & ScopeDate
            Print #1, "Author: " & ScopeAuthor
            Print #1, "Resolved: " & ScopeDone
            Print #1, "==================================================="
        Next
        Print #1, ""
        Close #1
    End Sub

After running the macro, a file named [file name]_comments.txt appears in the same directory as the Word document. Opening it shows the following information:

Postscript
With the most critical first step done, the next steps are to use Python to walk through all the files, call the macro on each one to generate its txt file, merge all the txt files into an Excel sheet, and wrap the whole program in a graphical interface.
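The merging step could look roughly like the sketch below. It is not the finished tool: the names `parse_comment_log` and `records_to_csv` are hypothetical, CSV is used as a stand-in for a real Excel workbook, and it assumes the layout written by the macro, i.e. one `label: value` line per field with an ASCII colon and a line of `=` characters closing each record. A comment whose text spans several lines would need extra handling.

```python
import csv
import io


def parse_comment_log(text):
    """Split the macro's txt output into one dict per comment."""
    records, current = [], {}
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and set(stripped) == {"="}:
            # A separator line closes the record collected so far
            if current:
                records.append(current)
                current = {}
            continue
        if ":" not in stripped:
            continue
        # Split on the first colon only, so paths and times stay intact
        key, value = stripped.split(":", 1)
        current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def records_to_csv(records):
    """Render the records as CSV text, using the first record's labels as columns."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]),
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Because the parser works on text rather than on a file path, the caller decides which encoding to use when reading each txt file, which depends on the system the macro ran on.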
Stay tuned ~