[VBA Script] Extract all comments in a Word document, including their resolved status
2022-06-11 01:27:00 【Netherland's meow】
Preface
I have written a Word document tool before:
A keyword indexer for Word .docx documents
The idea behind that tool was to check, late in a project, whether keywords such as TBD/TODO were still left in a document, as a way of checking how complete the document was. While continuing to explore tools for document quality checks, I noticed that many of our document reviews are done through Word comments (some are done on review websites, of course), and the pending status of those comments is not easy to see at a glance.

This is especially true when the document is long and the comments have to be checked one by one (Word does support jumping to the next unresolved comment, of course). For a single document that is fine. But if you are the delivery lead and responsible for the delivery quality of many documents, reading through them one by one is unrealistic, so I think a tool that collects and archives these statistics is worth having. There are already review websites and platforms that do this, so this tool is mainly for my own practice, or for people who have not bought such a platform.
The end goal
Operation through a graphical interface:
1. Select a directory, then recursively collect all Word files in it;
2. For each Word file, grab all comments, including the document path, page number, line number, comment text, commented original text, comment author, comment time, and resolved status. The resolved status is the core information needed;
3. Provide an option to grab only unresolved comments;
4. After a successful grab, write the information into an Excel document for review.
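Step 1 of the list above can be sketched in a few lines of Python. This is only a minimal sketch, not the final tool: the function name `find_word_files` and the set of extensions are assumptions made for illustration, and skipping files that start with `~$` reflects the temporary lock files Word creates next to open documents.

```python
from pathlib import Path

# Assumed set of extensions to scan; adjust as needed.
WORD_SUFFIXES = {".docx", ".doc"}


def find_word_files(root):
    """Return all Word documents under root, searched recursively."""
    found = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Skip Word's temporary lock files, whose names start with "~$".
        if path.name.startswith("~$"):
            continue
        if path.suffix.lower() in WORD_SUFFIXES:
            found.append(path)
    return sorted(found)
```

Calling `find_word_files(r"D:\MyWork")` would return a sorted list of `Path` objects, one per Word file, which the later steps can then process one at a time.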
Grabbing comment information
Grabbing comments with Python
At first I thought of using Python to grab the comment information from the docx, and adapted some existing code:
    from zipfile import ZipFile
    from bs4 import BeautifulSoup  # the "xml" parser also requires lxml

    def docx_comments_get(file):
        # A .docx file is a zip archive; the comments live in word/comments.xml.
        with ZipFile(file) as document:
            xml = document.read("word/comments.xml")
        wordObj = BeautifulSoup(xml.decode("utf-8"), features="xml")
        # Each w:t element holds a run of comment text.
        for text in wordObj.find_all("w:t"):
            print(text.text)

    def main():
        docx_comments_get(r"D:\MyWork\python\Test documentation.docx")

    if __name__ == "__main__":
        main()

But I found that this approach only retrieves the comment text; the other information is hard to obtain. Even opening the comments.xml source file inside the docx shows that its content is very limited:

The rest of the information is scattered across other XML files in the package, and I did not know how to piece it together. So for me, extracting the complete comment information with Python was basically a dead end.
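For reference, here is roughly how far the comments.xml route goes. Besides the text, each `w:comment` element carries `w:id`, `w:author` and `w:date` attributes, and the sketch below reads them with the standard library only. The function name `read_comments` is an assumption for illustration. The page number, line number and resolved flag are not stored in this part, which is why this does not replace the VBA approach.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML "w:" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_comments(docx_file):
    """Return id, author, date and text for each comment in a .docx file."""
    with zipfile.ZipFile(docx_file) as archive:
        if "word/comments.xml" not in archive.namelist():
            return []  # the document has no comments part
        root = ET.fromstring(archive.read("word/comments.xml"))

    comments = []
    for node in root.iter(f"{{{W_NS}}}comment"):
        # Join all text runs (w:t) inside the comment.
        text = "".join(t.text or "" for t in node.iter(f"{{{W_NS}}}t"))
        comments.append({
            "id": node.get(f"{{{W_NS}}}id"),
            "author": node.get(f"{{{W_NS}}}author"),
            "date": node.get(f"{{{W_NS}}}date"),
            "text": text,
        })
    return comments
```

`read_comments` accepts a file path or a file-like object and returns one dictionary per comment.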
Grabbing comments with VBA
So I changed direction and used VBA to get the comment information, on the assumption that Microsoft's own tooling would support Word well. Following this direction, I found that VBA can read the comment information inside a Word document very completely. I opened the Visual Basic editor from Word's Developer tab and started writing the macro.
Here is my final macro code:
    Public Sub exportWordComments_Click()
        FileName = Application.ActiveDocument.Name ' File name, e.g. name.docx
        varResult = VBA.Split(FileName, ".")
        FileNameStr = varResult(0) ' File name without the suffix (text before the first dot)
        DocPath = Application.ActiveDocument.Path
        FilePath = DocPath & "\" & FileName ' Full path of the current file
        LogPath = DocPath & "\" & FileNameStr & "_comments.txt" ' Output file for the comment information
        'Debug.Print (FilePath)
        If DocPath = "" Then ' The document has never been saved, so there is no directory to write to
            Exit Sub
        End If
        CommentCount = ActiveDocument.Comments.Count ' Total number of comments
        'Debug.Print (CommentCount)
        Open LogPath For Output As #1 ' Output txt file
        Print #1, "==================================================="
        For i = 1 To CommentCount
            PageNumber = ActiveDocument.Comments(i).Scope.Information(wdActiveEndPageNumber) ' Page the comment is on
            CharacterLineNumber = ActiveDocument.Comments(i).Scope.Information(wdFirstCharacterLineNumber) ' Line of that page the comment is on
            ScopeText = ActiveDocument.Comments(i).Scope.Text ' Commented original text
            ScopeComment = ActiveDocument.Comments(i).Range.Text ' Comment text
            ScopeDate = ActiveDocument.Comments(i).Date ' Comment time
            ScopeAuthor = ActiveDocument.Comments(i).Author ' Comment author (name as a string)
            ScopeDone = ActiveDocument.Comments(i).Done ' Whether the comment is resolved
            'Debug.Print ("Original text: " & ActiveDocument.Comments(i).Scope) ' Original text
            'Debug.Print (ActiveDocument.Comments(i).Done)
            'Debug.Print (ActiveDocument.Comments(i).Contact)
            'Debug.Print (ActiveDocument.Comments(i).Creator)
            'Debug.Print (ActiveDocument.Comments(i).Date)
            'Debug.Print (ActiveDocument.Comments(i).Index)
            'Debug.Print (ActiveDocument.Comments(i).Parent)
            'Debug.Print (ActiveDocument.Comments(i).Reference)
            'Debug.Print ("Comment: " & ActiveDocument.Comments(i).Range) ' Comment text
            'Debug.Print (ActiveDocument.Comments(i).IsInk) ' Whether it is an ink (handwritten) comment
            Print #1, "File: " & FilePath
            Print #1, "Page: " & PageNumber
            Print #1, "Line: " & CharacterLineNumber
            Print #1, "Original text: " & ScopeText
            Print #1, "Comment: " & ScopeComment
            Print #1, "Date: " & ScopeDate
            Print #1, "Author: " & ScopeAuthor
            Print #1, "Resolved: " & ScopeDone
            Print #1, "==================================================="
        Next
        Print #1, ""
        Close #1
    End Sub

After the macro runs, a file named <file name>_comments.txt appears in the same directory as the Word document. Opening it shows the following information:

Postscript
With the most critical first step done, the next steps are to use Python to walk through all files recursively, call the macro on each file to generate its txt, merge all the txt files into an Excel sheet, and wrap the whole program in a graphical interface.
Stay tuned for follow-ups.
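As a rough idea of the "organize all txt into a table" step, the sketch below parses the macro's text output into records and writes them as CSV, which Excel can open. It is only a sketch under assumptions: the function names are hypothetical, the field labels are passed in by the caller because they depend on the strings the macro prints, and any line that does not start with a known label is treated as a continuation of the previous field.

```python
import csv
import io


def parse_comment_log(text, labels):
    """Split the macro's text output into one dict per comment."""
    records, current, last = [], {}, None
    for line in text.splitlines():
        # Records are delimited by lines of "=" characters.
        if line.startswith("====="):
            if current:
                records.append(current)
            current, last = {}, None
            continue
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in labels:
            current[key] = value.strip()
            last = key
        elif last is not None:
            # A line without a known label continues the previous field.
            current[last] += "\n" + line.strip()
    if current:
        records.append(current)
    return records


def records_to_csv(records, labels):
    """Render the records as CSV text with one column per label."""
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=list(labels))
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

The CSV text can be written to a file with `open(path, "w", newline="", encoding="utf-8-sig")` so that Excel detects the encoding. Filtering for unresolved comments would then be a simple list comprehension over the records.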