Warm tip: This article is reproduced from stackoverflow.com, please click
excel nlp vba ms-word

pull full names and surname/surnames form the consecutive paragraphs in Excel / Word VBA

发布于 2020-03-31 22:58:27

First Things First: The macro is run from Excel VBA editor, but performs the biggest part of its job on the previously opened Word document, where it's goal is to find the full names of the people who are the contracting parties in the agreement being analized.

The issue I'm experiencing with the code is that it is variable number of words, that I need to pull from every consecutive paragraph. If the name is Will SMITH, then its two words I need to pull, when it's Carrie Ann MOSS, then it's three words, sometimes it can be Anna Nicole SMITH BURKE, than its four words but when it's Anna Nicole SMITH-BURKE, than its five words and so on.

The other idea to get this full name is, that it always ends with a coma, and this coma is always the first coma in this paragraph, where the full name appears.

ATTENTION !!! The Paragraphs we work with are not ListParagraphs. They are the normal/ordinary ones albeit indented and numbered. I get these contracts from people who don't care to use numbered list :-(
So for the last time: The numbered list is not enabled on those paragraphs we work with.

This is how it looks like in Word and the selected words are the names and surnames that the macro is supposed to extract from the document - excluding the coma after the last surname.

enter image description here

Sub FindNamesCleanDraftWithLoop()
    'Variables declaration
    Dim WordApp As Word.Application
    Dim WordDoc As Word.Document
    Dim ExcelApp As Excel.Application
    Dim MySheet As Excel.Worksheet
    Dim Para As Word.Paragraph
    Dim Rng As Word.Range
    Dim RngStart As Word.Range
    Dim RngEnd As Word.Range

    Dim TextToFind1 As String
    Dim TextToFind2 As String
    Dim firstName As String
    Dim startPos As Long
    Dim endPos As Long

    Application.ScreenUpdating = False

    'Assigning object variables
    Set WordApp = GetObject(, "Word.Application")
    Set ExcelApp = GetObject(, "Excel.Application")
    Set WordDoc = WordApp.ActiveDocument
    Set MySheet = Application.ActiveWorkbook.ActiveSheet
    'Set MySheet = ExcelApp.ActiveWorkbook.ActiveSheet
    Set Rng = WordApp.ActiveDocument.Content
    TextToFind1 = "REGON 364061169, NIP 951-24-09-783,"
    TextToFind2 = "- ad."

    'InStr function returns a Variant (Long) specifying the position of the first occurrence of one string within another.
    startPos = InStr(1, Rng, TextToFind1) - 1    'here we get 1421, we're looking 4 "TextToFind1"
    endPos = InStr(1, Rng, TextToFind2) - 1      'here we get 2246, we're looking 4 "- ad."
    If startPos = 0 Or endPos = 0 Then Exit Sub
    Rng.SetRange Start:=startPos, End:=endPos
    Debug.Print Rng.Paragraphs.Count

    If startPos = 0 Or endPos = 0 Then
        MsgBox ("Client's names were not found!")
    Else
        'somewhere here I need your help to write some lines that will
        'recognize how many words need to be pulled to extract the full
        'name/names + surname/surnames and nothing else - we end on the first coma.
        For Each Para In Rng.Paragraphs
            firstName = Trim$(Para.Range.Words(3))
            Debug.Print Para.Range.Words(1) & Para.Range.Words(2) & _
                        Para.Range.Words(3) & Para.Range.Words(4) & _
                        Para.Range.Words(5) & Para.Range.Words(6)
        Next Para
    End If
End Sub

There in the For Each Para ... Next Para loop, I need your help to write some lines that will recognize how many words need to be pulled to extract the full name/names + surname/surnames and nothing else - we end on the first coma - that means excluding the coma after the last surname.

Questioner
michal roesler
Viewed
62
Ron Rosenfeld 2020-01-31 22:27

This might do what you want, if I have understood correctly.

If the Number and Dot are NOT part of the paragraph, then to extract the full name, you could use:

Debug.Print Left(Para, InStr(Para, ",") - 1)

If the Number and Dot ARE part of the paragraph, then:

   Dim Start As Long, Length As Long
Start = InStr(Para, ".") + 1
Length = InStr(Para, ",") - Start

Debug.Print Trim(Mid(Para, Start, Length))

If you want to split the capitalized portion of the name from the rest, post several samples of the actual strings as TEXT as I am unable to copy/paste your screenshots into Excel.