[VBA Script] Extracting the details and resolution status of all comments in a Word document

2022-06-11 01:27:00 【Netherland's meow】

Preface

I have built a tool for Word documents before:

A keyword indexer for Word .docx documents

My idea for that tool was to check, late in a project, whether keywords such as TBD/TODO were still left in a document, as a way of checking how complete the document was. After that I kept exploring tools for document quality checks, and noticed that many of our document reviews are done through comments (some are done on review websites, of course). The resolution status of those comments is not easy to see at a glance.

This is especially true when a document is long: you have to go through the comments one by one (Word does support jumping to the next unresolved comment, of course). That is fine for a single document, but if you are the delivery lead and responsible for the delivery quality of many documents, reading them one by one is clearly unrealistic, so I think a tool that collects and archives these statistics is worthwhile. There are already review websites and platforms that do this, so this tool is mainly practice for me, and something for people who have not bought such a platform.

The end goal

Operated through a graphical interface:

1. Select a directory, then recursively collect all Word files in it;

2. For each Word file, grab all comments, including the document path, page number, line number, comment text, commented original text, author, comment time, and resolution status; the resolution status is the core piece of information;

3. Provide an option to grab only unresolved comments;

4. After a successful run, write the collected information into an Excel document for review.
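Step 1 of this plan could be sketched in Python roughly as follows. This is only an illustration: the function name `find_word_files` is made up here, and it assumes that only `.docx` files are of interest and that Word's temporary lock files (whose names start with `~$`) should be skipped.

```python
from pathlib import Path


def find_word_files(root):
    """Return every .docx file below root, sorted by path.

    Hypothetical helper: files whose names start with "~$" are skipped
    because Word creates them as lock files while a document is open.
    """
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file()
        and p.suffix.lower() == ".docx"
        and not p.name.startswith("~$")
    )
```

Calling `find_word_files` on the selected directory returns a list of `Path` objects that the later steps can loop over.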

Grabbing comment information

Grabbing comments with Python

At first I thought of using Python to grab the comment information from the docx, and adapted some code I had found:

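from zipfile import ZipFile

from bs4 import BeautifulSoup  # the "xml" feature also requires lxml


def docx_comments_get(file):
    # A .docx file is a ZIP archive; the comment text lives in word/comments.xml
    with ZipFile(file) as document:
        xml = document.read("word/comments.xml")
    wordObj = BeautifulSoup(xml.decode("utf-8"), features="xml")
    # Each <w:t> element holds one run of comment text
    texts = wordObj.find_all("w:t")
    for text in texts:
        print(text.text)


def main():
    # Raw string so the backslashes are not treated as escape sequences
    docx_comments_get(r"D:\MyWork\python\Test documentation.docx")


if __name__ == "__main__":
    main()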

But I found that this only retrieves the comment text, and the other information is hard to get at. Even when I opened the comments.xml source file inside the docx, its content was quite limited.

The other information is scattered across the docx's other XML files, and I really did not know how to handle it. So for me, extracting the complete comment information with Python was basically a dead end.
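For reference, here is a minimal sketch of what can be read from comments.xml using only the standard library. It assumes the usual WordprocessingML layout, in which each `w:comment` element carries `w:id`, `w:author` and `w:date` attributes; the function name `read_comment_metadata` is made up for this example. As far as I know, the resolved flag is kept in a separate part in recent Word versions, and page and line numbers are not stored in the file at all because they depend on layout, so this sketch covers none of those.

```python
import xml.etree.ElementTree as ET
from zipfile import ZipFile

# Main WordprocessingML namespace (the "w:" prefix in the XML)
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_comment_metadata(docx_file):
    """Return one dict per <w:comment> element in word/comments.xml.

    Hypothetical helper: it only reads attributes and text stored in
    comments.xml, not page/line numbers or the resolution status.
    """
    with ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/comments.xml"))
    rows = []
    for comment in root.iter(f"{{{W_NS}}}comment"):
        rows.append({
            "id": comment.get(f"{{{W_NS}}}id"),
            "author": comment.get(f"{{{W_NS}}}author"),
            "date": comment.get(f"{{{W_NS}}}date"),
            # Join all text runs inside the comment
            "text": "".join(t.text or "" for t in comment.iter(f"{{{W_NS}}}t")),
        })
    return rows
```

Even with author and date available this way, the page number, line number and resolution status still need another source, which is what pushed me toward Word's own object model.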

Grabbing comments with VBA

So I changed direction and used VBA to get the comment information; Microsoft's own tooling should support Word reasonably well. Following this direction, I found that VBA can extract a Word document's comment information very completely. I opened the Visual Basic editor from Word's Developer tab and started writing a macro.

Here is my final macro code:


Public Sub exportWordComments_Click()

    FileName = Application.ActiveDocument.Name ' File name, e.g. report.docx

    ' Remove the extension; InStrRev keeps any dots inside the base name
    DotPos = InStrRev(FileName, ".")
    If DotPos > 1 Then
        FileNameStr = Left(FileName, DotPos - 1)
    Else
        FileNameStr = FileName
    End If

    Path = Application.ActiveDocument.Path
    If Path = "" Then
        ' The document has never been saved, so there is no folder to write to
        Exit Sub
    End If
    FilePath = Path & "\" & FileName ' Full path of the current file
    LogPath = Path & "\" & FileNameStr & "_comments.txt" ' Output file for the comment information

    Rows = ActiveDocument.Comments.Count ' Total number of comments

    FileNum = FreeFile ' Use the next free file number instead of a hard-coded #1
    Open LogPath For Output As #FileNum
    Print #FileNum, "==================================================="
    For i = 1 To Rows
        With ActiveDocument.Comments(i)
            PageNumber = .Scope.Information(wdActiveEndPageNumber) ' Page the comment is on
            CharacterLineNumber = .Scope.Information(wdFirstCharacterLineNumber) ' Line on that page
            Scope = .Scope.Text ' Original text the comment is attached to
            ScopeComment = .Range.Text ' Comment text
            ScopeDate = .Date ' Comment time
            ScopeAuthor = .Author ' Comment author
            ScopeDone = .Done ' Whether the comment is resolved (needs a recent Word version)
        End With

        ' Other Comment properties tried while debugging:
        ' Contact, Creator, Index, Parent, Reference, IsInk (True for ink comments)
        Print #FileNum, "File: " & FilePath
        Print #FileNum, "Page: " & PageNumber
        Print #FileNum, "Line: " & CharacterLineNumber
        Print #FileNum, "Original text: " & Scope
        Print #FileNum, "Comment: " & ScopeComment
        Print #FileNum, "Date: " & ScopeDate
        Print #FileNum, "Author: " & ScopeAuthor
        Print #FileNum, "Resolved: " & ScopeDone
        Print #FileNum, "==================================================="
    Next i

    Print #FileNum, ""
    Close #FileNum

End Sub

After the macro runs, a file named <file name>_comments.txt appears in the Word document's directory. Opening it shows the extracted comment information.

Postscript

With this most critical first step done, the next step is to use Python to walk all the files recursively, call the macro on each file to generate its txt, collect all the txt files into an Excel sheet, and wrap the whole program in a graphical interface.
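The txt-to-spreadsheet step might look roughly like the sketch below. It is only an illustration under several assumptions: records are separated by lines of `=`, every field sits on one `label: value` line (with either an ASCII or a full-width colon), and the resolution field prints as `True` or `False`. The helper names and the `status_key` parameter are made up here, and the sketch writes CSV, which Excel can open, rather than a real .xlsx file. A comment whose text itself contains line breaks with colons would confuse this simple parser.

```python
import csv
import io

SEPARATOR = "=" * 10  # record separator lines start with a run of "="


def parse_comment_report(text):
    """Split the text of a *_comments.txt report into one dict per comment."""
    records, current, last_key = [], {}, None
    for line in text.splitlines():
        if line.startswith(SEPARATOR):
            if current:
                records.append(current)
            current, last_key = {}, None
            continue
        # The first colon (ASCII or full-width) separates label and value
        positions = [i for i in (line.find(":"), line.find("：")) if i >= 0]
        if positions:
            cut = min(positions)
            last_key = line[:cut].strip()
            current[last_key] = line[cut + 1:].strip()
        elif last_key is not None and line.strip():
            # Continuation of a value that spans several lines
            current[last_key] += "\n" + line.strip()
    if current:
        records.append(current)
    return records


def unresolved(records, status_key):
    """Keep only the records whose status field is not "True"."""
    return [r for r in records if r.get(status_key, "").lower() != "true"]


def to_csv(records):
    """Render the records as CSV text, using the first record's labels as header."""
    fieldnames = list(records[0]) if records else []
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

`status_key` has to match whatever label the macro writes for the resolution status. Since the macro writes the file in the system's ANSI code page, the encoding used to read the txt file also has to be chosen to match.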

Stay tuned for the follow-up~

Original site

Copyright notice
This article was written by [Netherland's meow]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/162/202206110004003576.html