Note: this post is translated from stackoverflow.com. Original question: Pull full names and surname/surnames from the consecutive paragraphs in Excel / Word VBA
excel nlp vba ms-word

Extract full names and surname(s) from consecutive paragraphs in Excel / Word VBA

Posted on 2020-03-31 23:32:42

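First of all: the macro is run from the Excel VBA editor, but it does most of its work in a Word document that is already open. Its goal is to find the full names of the parties to the agreement being analyzed.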

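The problem I have with the code is that a variable number of words needs to be extracted from each consecutive paragraph. If the name is Will SMITH, I need two words; for Carrie Ann MOSS it is three; sometimes it may be Anna Nicole SMITH BURKE, which is four words, sometimes five, and so on.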

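Another idea for getting the full name: it always ends with a comma, and that comma is always the first comma in the paragraph in which the full name appears.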
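As a minimal sketch of that idea, the text before the first comma can be taken directly, and the word count derived from it afterwards if it is still needed. The helper names `NameBeforeComma` and `NameWordCount` are hypothetical and not part of the original post, and the sketch assumes the paragraph text starts with the name (any typed numbering already removed):

```vba
' Hypothetical helper: returns the text before the first comma,
' or an empty string if the paragraph contains no comma.
Function NameBeforeComma(ByVal paraText As String) As String
    Dim commaPos As Long
    commaPos = InStr(1, paraText, ",")
    If commaPos > 1 Then
        NameBeforeComma = Trim$(Left$(paraText, commaPos - 1))
    End If
End Function

' Hypothetical helper: counts the space-separated words in a name,
' ignoring empty items caused by repeated spaces.
Function NameWordCount(ByVal fullName As String) As Long
    Dim part As Variant
    For Each part In Split(Trim$(fullName), " ")
        If Len(part) > 0 Then NameWordCount = NameWordCount + 1
    Next part
End Function
```

Inside the loop this could be called as `NameBeforeComma(Para.Range.Text)`, so the number of words never has to be known in advance.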

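Note!!! The paragraphs we work with are not ListParagraphs. They are plain/normal paragraphs, even though they have indentation and numbering. I get these contracts from people who don't bother to use numbered lists :-(
So, one last time: in the paragraphs we work with, automatic list numbering is not enabled.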

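This is how it looks in Word (a screenshot in the original post): the selected words are the first names and surnames that the macro should extract from the document, not including the comma after the last surname.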


Sub FindNamesCleanDraftWithLoop()
    'Variables declaration
    Dim WordApp As Word.Application
    Dim WordDoc As Word.Document
    Dim ExcelApp As Excel.Application
    Dim MySheet As Excel.Worksheet
    Dim Para As Word.Paragraph
    Dim Rng As Word.Range
    Dim RngStart As Word.Range
    Dim RngEnd As Word.Range

    Dim TextToFind1 As String
    Dim TextToFind2 As String
    Dim firstName As String
    Dim startPos As Long
    Dim endPos As Long

    Application.ScreenUpdating = False

    'Assigning object variables
    Set WordApp = GetObject(, "Word.Application")
    Set ExcelApp = GetObject(, "Excel.Application")
    Set WordDoc = WordApp.ActiveDocument
    Set MySheet = Application.ActiveWorkbook.ActiveSheet
    'Set MySheet = ExcelApp.ActiveWorkbook.ActiveSheet
    Set Rng = WordApp.ActiveDocument.Content
    TextToFind1 = "REGON 364061169, NIP 951-24-09-783,"
    TextToFind2 = "- ad."

    'InStr returns the 1-based position (Long) of the first occurrence of one string within another, or 0 if there is no match.
    startPos = InStr(1, Rng.Text, TextToFind1) - 1    'here we get 1421, we're looking for TextToFind1
    endPos = InStr(1, Rng.Text, TextToFind2) - 1      'here we get 2246, we're looking for "- ad."

    'After subtracting 1, a failed search gives -1 (not 0), so test for a negative value.
    If startPos < 0 Or endPos < 0 Then
        MsgBox "Client's names were not found!"
    Else
        Rng.SetRange Start:=startPos, End:=endPos
        Debug.Print Rng.Paragraphs.Count
        'somewhere here I need your help to write some lines that will
        'recognize how many words need to be pulled to extract the full
        'name/names + surname/surnames and nothing else - we end on the first comma.
        For Each Para In Rng.Paragraphs
            firstName = Trim$(Para.Range.Words(3))
            Debug.Print Para.Range.Words(1) & Para.Range.Words(2) & _
                        Para.Range.Words(3) & Para.Range.Words(4) & _
                        Para.Range.Words(5) & Para.Range.Words(6)
        Next Para
    End If
End Sub

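In the For Each Para ... Next Para loop I need your help writing some lines that determine how many words must be pulled to extract the full first name(s) + surname(s) and nothing else. We stop at the first comma, which means the comma after the last surname is excluded.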


Asked by
michal roesler
Views
60
Ron Rosenfeld 2020-01-31 22:27

If I understand correctly, this may do what you want.

If the NumberDot (the leading number followed by a dot) is not part of the paragraph, you can extract the full name with:

Debug.Print Left(Para, InStr(Para, ",") - 1)

If the NumberDot is part of the paragraph, then:

Dim Start As Long, Length As Long
Start = InStr(Para, ".") + 1
Length = InStr(Para, ",") - Start

Debug.Print Trim(Mid(Para, Start, Length))

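If you want to separate the capitalized part of the name from the rest, please post several examples of the actual strings as TEXT, since I cannot copy/paste a screenshot into Excel.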
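One possible way to do that split, offered only as a sketch: it assumes, as in the question's examples such as Carrie Ann MOSS, that surnames are written entirely in capitals and given names are not. `SplitName` is a hypothetical helper, not something from the original answer, and real contract text may need different rules (for example, an initial such as "J." would be classified as a surname here).

```vba
' Hypothetical helper: splits "Carrie Ann MOSS" into given names and surnames.
' Assumption: a word is a surname if it contains at least one letter
' and is written entirely in upper case.
Sub SplitName(ByVal fullName As String, _
              ByRef givenNames As String, ByRef surnames As String)
    Dim part As Variant
    givenNames = ""
    surnames = ""
    For Each part In Split(Trim$(fullName), " ")
        If Len(part) > 0 Then
            ' Binary comparison keeps the test case-sensitive even if the
            ' module uses Option Compare Text.
            If StrComp(part, UCase$(part), vbBinaryCompare) = 0 _
               And StrComp(part, LCase$(part), vbBinaryCompare) <> 0 Then
                surnames = Trim$(surnames & " " & part)
            Else
                givenNames = Trim$(givenNames & " " & part)
            End If
        End If
    Next part
End Sub
```

The input would be the full name already extracted up to the first comma, so the two results could then be written to separate worksheet columns.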