Warm tip: This article is reproduced from serverfault.com, please click

Shredding XML in SQL but cross apply adds unwanted records

发布于 2020-12-01 02:36:25

I have to flatten out this XML source into one SQL server table. There are multiple ResponseID per source file and multiple TextAnalyticsItem per ResponseID. I want to limit the records so that I only get the TextAnalyticsItems related to the ResponseID. However, the cross apply method gives me all ResponseId with all TextAnalyticsItem. How do I prevent the additional records?

DECLARE @XMLToParse XML;
SET @XMLToParse = '  
<Responses>
  <Response>
    <ResponseId>7662934</ResponseId>
    <SurveyId>123</SurveyId>
    <RespondentId>234909</RespondentId>
    <QuestionId>141757</QuestionId>
    <ScaleId>3401</ScaleId>
    <AnswerId>17130</AnswerId>
    <ResponseMemo>Useful</ResponseMemo>
    <ResponseRank>0</ResponseRank>
    <ResponseState>0</ResponseState>
    <CompletedDate>2020-07-06T09:07:40</CompletedDate>
    <ModifiedDate>2020-07-06T09:07:41</ModifiedDate>
    <LanguageId>220</LanguageId>
    <ResponseNum>0</ResponseNum>
    <ResponseDate />
    <TextAnalyticsData>
      <TextAnalyticsItem>
        <Level1>Values Standards</Level1>
        <Level2>Better/Best/Brilliant</Level2>
        <Level3>Positive</Level3>
        <Sentiment>1</Sentiment>
      </TextAnalyticsItem>
      <TextAnalyticsItem>
        <Level1>All-Behaviors</Level1>
        <Level2>Positive_</Level2>
        <Sentiment>1</Sentiment>
      </TextAnalyticsItem>
    </TextAnalyticsData>
  </Response>
   <Response>
    <ResponseId>7662078</ResponseId>
    <SurveyId>123</SurveyId>
    <RespondentId>234826</RespondentId>
    <QuestionId>141756</QuestionId>
    <ScaleId>3400</ScaleId>
    <AnswerId>17129</AnswerId>
    <ResponseMemo>Ghjlkk</ResponseMemo>
    <ResponseRank>0</ResponseRank>
    <ResponseState>0</ResponseState>
    <CompletedDate>2020-07-03T07:17:31</CompletedDate>
    <ModifiedDate>2020-07-03T07:17:31</ModifiedDate>
    <LanguageId>220</LanguageId>
    <ResponseNum>0</ResponseNum>
    <ResponseDate />
    <TextAnalyticsData>
      <TextAnalyticsItem>
        <Level1>Nonactionable</Level1>
        <Sentiment>0</Sentiment>
      </TextAnalyticsItem>
    </TextAnalyticsData>
  </Response>
</Responses>'
 SELECT xmlData.A.value('ResponseId[1]', 'VARCHAR(100)') AS ResponseId, 
       xmlData.A.value('SurveyId[1]', 'VARCHAR(100)') AS SurveyId, 
       xmlData.A.value('RespondentId[1]', 'VARCHAR(100)') AS RespondentId, 
       xmlData.A.value('TextAnalyticsData[1]', 'VARCHAR(100)') AS TextAnalyticsData, 
       tbl1.TxtItems.value('Level1[1]', 'VARCHAR(100)') AS Level1, 
       tbl1.TxtItems.value('Level2[1]', 'VARCHAR(100)') AS Level2, 
       tbl1.TxtItems.value('Level3[1]', 'VARCHAR(100)') AS Level3, 
       tbl1.TxtItems.value('Sentiment[1]', 'VARCHAR(100)') AS Sentiment
FROM @XMLToParse.nodes('Responses/Response/TextAnalyticsData/TextAnalyticsItem') tbl1(TxtItems)
     cross apply @XMLToParse.nodes('Responses/Response') xmlData(A)

ORDER BY ResponseId, 
         RespondentId;

So instead of

ResponseId        SurveyId          RespondentId      TextAnalyticsData                                                     Level1            Level2            Level3     Sentiment
---------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------
7662078           123               234826            Nonactionable0                                                        Values Standards  Better/Best/BrilliPositive          1
7662078           123               234826            Nonactionable0                                                        All-Behaviors     Positive_         NULL              1
7662078           123               234826            Nonactionable0                                                        Nonactionable     NULL              NULL              0
7662934           123               234909            Values StandardsBetter/Best/BrilliantPositive1All-BehaviorsPositive_1 Values Standards  Better/Best/BrilliPositive          1
7662934           123               234909            Values StandardsBetter/Best/BrilliantPositive1All-BehaviorsPositive_1 Nonactionable     NULL              NULL              0
7662934           123               234909            Values StandardsBetter/Best/BrilliantPositive1All-BehaviorsPositive_1 All-Behaviors     Positive_         NULL              1

I want to get

ResponseId        SurveyId          RespondentId      TextAnalyticsData                                                     Level1            Level2            Level3     Sentiment
---------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------
7662078           123               234826            Nonactionable0                                                        Nonactionable     NULL              NULL              0
7662934           123               234909            Values StandardsBetter/Best/BrilliantPositive1All-BehaviorsPositive_1 Values Standards  Better/Best/BrilliPositive          1
7662934           123               234909            Values StandardsBetter/Best/BrilliantPositive1All-BehaviorsPositive_1 All-Behaviors     Positive_         NULL              1

Any suggestions? Thanks

Questioner
Fraser
Viewed
0
Yitzhak Khabinsky 2020-12-01 11:15:59

Check it out how to achieve what you need. It simulates one-to-many relationship.

I am not sure what was the intent for the TextAnalyticsData column. That's why I commented it out.

SQL

DECLARE @XMLToParse XML = 
N'<Responses>
  <Response>
    <ResponseId>7662934</ResponseId>
    <SurveyId>123</SurveyId>
    <RespondentId>234909</RespondentId>
    <QuestionId>141757</QuestionId>
    <ScaleId>3401</ScaleId>
    <AnswerId>17130</AnswerId>
    <ResponseMemo>Useful</ResponseMemo>
    <ResponseRank>0</ResponseRank>
    <ResponseState>0</ResponseState>
    <CompletedDate>2020-07-06T09:07:40</CompletedDate>
    <ModifiedDate>2020-07-06T09:07:41</ModifiedDate>
    <LanguageId>220</LanguageId>
    <ResponseNum>0</ResponseNum>
    <ResponseDate />
    <TextAnalyticsData>
      <TextAnalyticsItem>
        <Level1>Values Standards</Level1>
        <Level2>Better/Best/Brilliant</Level2>
        <Level3>Positive</Level3>
        <Sentiment>1</Sentiment>
      </TextAnalyticsItem>
      <TextAnalyticsItem>
        <Level1>All-Behaviors</Level1>
        <Level2>Positive_</Level2>
        <Sentiment>1</Sentiment>
      </TextAnalyticsItem>
    </TextAnalyticsData>
  </Response>
   <Response>
    <ResponseId>7662078</ResponseId>
    <SurveyId>123</SurveyId>
    <RespondentId>234826</RespondentId>
    <QuestionId>141756</QuestionId>
    <ScaleId>3400</ScaleId>
    <AnswerId>17129</AnswerId>
    <ResponseMemo>Ghjlkk</ResponseMemo>
    <ResponseRank>0</ResponseRank>
    <ResponseState>0</ResponseState>
    <CompletedDate>2020-07-03T07:17:31</CompletedDate>
    <ModifiedDate>2020-07-03T07:17:31</ModifiedDate>
    <LanguageId>220</LanguageId>
    <ResponseNum>0</ResponseNum>
    <ResponseDate />
    <TextAnalyticsData>
      <TextAnalyticsItem>
        <Level1>Nonactionable</Level1>
        <Sentiment>0</Sentiment>
      </TextAnalyticsItem>
    </TextAnalyticsData>
  </Response>
</Responses>';

 SELECT r.value('(ResponseId/text())[1]', 'VARCHAR(100)') AS ResponseId
    , r.value('(SurveyId/text())[1]', 'VARCHAR(100)') AS SurveyId
    , r.value('(RespondentId/text())[1]', 'VARCHAR(100)') AS RespondentId
       --xmlData.A.value('TextAnalyticsData[1]', 'VARCHAR(100)') AS TextAnalyticsData, 
    , a.value('(Level1/text())[1]', 'VARCHAR(100)') AS Level1
    , a.value('(Level2/text())[1]', 'VARCHAR(100)') AS Level2
    , a.value('(Level3/text())[1]', 'VARCHAR(100)') AS Level3
    , a.value('(Sentiment/text())[1]', 'VARCHAR(100)') AS Sentiment
FROM @XMLToParse.nodes('/Responses/Response') t1(r)
     CROSS APPLY t1.r.nodes('TextAnalyticsData/TextAnalyticsItem[Level1/text()]') t2(a)
--ORDER BY ResponseId, 
--         RespondentId;

Output

+------------+----------+--------------+------------------+-----------------------+----------+-----------+
| ResponseId | SurveyId | RespondentId |      Level1      |        Level2         |  Level3  | Sentiment |
+------------+----------+--------------+------------------+-----------------------+----------+-----------+
|    7662934 |      123 |       234909 | Values Standards | Better/Best/Brilliant | Positive |         1 |
|    7662934 |      123 |       234909 | All-Behaviors    | Positive_             | NULL     |         1 |
|    7662078 |      123 |       234826 | Nonactionable    | NULL                  | NULL     |         0 |
+------------+----------+--------------+------------------+-----------------------+----------+-----------+