This article is reproduced from stackoverflow.com.
powershell scripting

How can I filter out text twice in Powershell?

Posted on 2020-04-19 09:37:49

I have a PowerShell script whose output is close to what I want, but there are a few lines and HTML-style tags I still need to remove. I already have the following code to filter the output:

Get-Content "txtfile.txt" | Select-String -Pattern '<fields>' -Context 1

However, if I pipe that output into a second Select-String, I get no results back. I looked at regex examples online, but most of them rely on loops to achieve their objective. I'm more used to the Linux shell, where you can pipe output through multiple greps to filter text. Is there a way to achieve the same thing, or something similar, with PowerShell? Here's the file I'm working with, as requested:

<?xml version="1.0" encoding="UTF-8"?>
<CustomObject xmlns="http://soap.force.com/2006/04/metadata">
<actionOverrides>
    <actionName>Accept</actionName>
    <type>Default</type>
</actionOverrides>
<actionOverrides>
    <actionName>CancelEdit</actionName>
    <type>Default</type>
</actionOverrides>
<actionOverrides>
    <actionName>Today</actionName>
    <type>Default</type>
</actionOverrides>
<actionOverrides>
    <actionName>View</actionName>
    <type>Default</type>
</actionOverrides>
<compactLayoutAssignment>SYSTEM</compactLayoutAssignment>
<enableFeeds>false</enableFeeds>
<fields>
    <fullName>ActivityDate</fullName>
</fields>
<fields>
    <fullName>ActivityDateTime</fullName>
</fields>
<fields>
    <fullName>Guid</fullName>
</fields>
<fields>
    <fullName>Description</fullName>
</fields>
</CustomObject>

So, I only want the text inside the <fullName> elements, and I have the following so far:

get-content "txtfile.txt" | select-string -Pattern '<fields>' -Context 1

This gives me each <fields> line plus one line of context on either side; however, I essentially need only the <fullName> lines, without the XML tags.

Questioner: murkywaters
mklement0 2018-04-17 02:39

The simplest PSv3+ solution is to use PowerShell's built-in XML DOM support, which makes an XML document's nodes accessible as a hierarchy of objects with dot notation:

PS> ([xml] (Get-Content -Raw txtfile.txt)).CustomObject.fields.fullName
ActivityDate
ActivityDateTime
Guid
Description

Note how, even though .fields is an array (representing all child <fields> elements of the top-level <CustomObject> element), .fullName was applied to it directly and returned the values of the child <fullName> elements across all array elements (the <fields> elements) as an array.

This ability to access a property on a collection and have it implicitly applied to the collection's elements, with the results getting collected in an array, is a generic PSv3+ feature called member enumeration.
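
To make member enumeration concrete, here is a minimal sketch that compares it with the explicit pipeline form. The inline document is a trimmed-down stand-in for the file in the question (an assumption for illustration), and the variable names are hypothetical.

```powershell
# Sample document, trimmed from the file in the question (assumed representative).
[xml] $doc = @'
<CustomObject xmlns="http://soap.force.com/2006/04/metadata">
  <fields><fullName>ActivityDate</fullName></fields>
  <fields><fullName>Guid</fullName></fields>
</CustomObject>
'@

# Member enumeration: .fullName is applied to each <fields> element in turn.
$viaEnumeration = $doc.CustomObject.fields.fullName

# The explicit equivalent, spelled out as a pipeline.
$viaPipeline = $doc.CustomObject.fields | ForEach-Object { $_.fullName }

$viaEnumeration   # outputs ActivityDate and Guid, one per line
```

Both variables should hold the same two strings. Keep in mind that member enumeration only applies when the collection itself has no member of that name; a property such as .Count or .Length is answered by the array, not by its elements.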


As an alternative, consider using the Select-Xml cmdlet (available in PSv2 too), which supports XPath queries that generally allow for more complex extraction logic (though not strictly needed here); Select-Xml is a high-level wrapper around the [xml] .NET type's .SelectNodes() method.
The following is the equivalent of the solution above:

$namespaces = @{ ns="http://soap.force.com/2006/04/metadata" }
$xpathQuery = '/ns:CustomObject/ns:fields/ns:fullName'
(Select-Xml -LiteralPath txtfile.txt $xpathQuery -Namespace $namespaces).Node.InnerText

Note:

Unlike with dot notation, XML namespaces must be considered when using Select-Xml.

Given that <CustomObject> and all its descendants are in the default namespace (declared via the xmlns attribute and identified by the URI http://soap.force.com/2006/04/metadata), you must:

  • define this namespace in a hashtable you pass as the -Namespace argument
    • Caveat: Default namespace xmlns is special in that it cannot be used as the key in the hashtable; instead, choose an arbitrary key name such as ns, but be sure to use that chosen key name as the node-name prefix (see next point).
  • prefix all node names in the XPath query with the chosen key name followed by :; e.g., ns:CustomObject
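
As a sketch of the "more complex extraction logic" that XPath allows, and of what happens when the prefix is omitted, the following runs Select-Xml against an in-memory document. The inline XML is a shortened stand-in for the question's file, and the starts-with() filter is an illustrative assumption rather than something the question asked for.

```powershell
[xml] $doc = @'
<CustomObject xmlns="http://soap.force.com/2006/04/metadata">
  <enableFeeds>false</enableFeeds>
  <fields><fullName>ActivityDate</fullName></fields>
  <fields><fullName>ActivityDateTime</fullName></fields>
  <fields><fullName>Guid</fullName></fields>
</CustomObject>
'@

# 'ns' is an arbitrary key; it only has to match the prefix used in the XPath query.
$namespaces = @{ ns = 'http://soap.force.com/2006/04/metadata' }

# Predicate: keep only <fullName> elements whose text starts with "Activity".
$xpathQuery = '/ns:CustomObject/ns:fields/ns:fullName[starts-with(., "Activity")]'
$activityNames = (Select-Xml -Xml $doc -XPath $xpathQuery -Namespace $namespaces).Node.InnerText

# Without the prefix, the same path matches nothing: unprefixed names in
# XPath 1.0 refer to elements in no namespace.
$unprefixed = Select-Xml -Xml $doc -XPath '/CustomObject/fields/fullName'

$activityNames   # outputs ActivityDate and ActivityDateTime, one per line
```

Here $unprefixed ends up empty, which is the typical symptom of a forgotten namespace prefix: no error is raised, the query simply returns nothing.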