Warm tip: This article is reproduced from serverfault.com, please click

JMESPath filter/MultiSelect on a large JSON file

发布于 2020-12-24 13:18:42

Datasource is a Juniper Router routing table output in json (> 3 GB json file). What I finally would like to have is being able to loop over a list of prefixes and get the prefix as-path combination.

#!/usr/bin/env python

import json
from jmespath import search

jsonData = """{
  "route-information": [
    {
      "route-table": [
        {
          "rt": [
            {
              "rt-destination": [
                {
                  "data": "2001:db8:1::/48"
                }
              ],
              "rt-entry": [
                {
                  "as-path": [
                    {
                      "data": "64511 65551 I"
                    }
                  ]
                }
              ]
            },
            {
              "rt-destination": [
                {
                  "data": "2001:db8:2::/48"
                }
              ],
              "rt-entry": [
                {
                  "as-path": [
                    {
                      "data": "65536 64496 I"
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}"""

data = json.loads(jsonData)

query='"route-information"[]."route-table"[].rt[].{\"destination\": \"rt-destination\", path: \"rt-entry\"}'

output = search(query, data)
print(output)

the above query results in:

[{'destination': [{'data': '2001:db8:1::/48'}], 'path': [{'as-path': [{'data': '64511 65551 I'}]}]}, {'destination': [{'data': '2001:db8:2::/48'}], 'path': [{'as-path': [{'data': '65536 64496 I'}]}]}]

Looks like on the right way. But I would to have the prefix : as-path combination, so I would like to get rid of the "data:" and "as-path.data" part (in the actual json file there are many more objects on this level and I try to get rid of them here).

query='"route-information"[]."route-table"[].rt[].{\"destination\": \"rt-destination\", path: \"rt-entry\".\"as-path\"}

and/or

query='"route-information"[]."route-table"[].rt[].{\"destination\": \"rt-destination\", path: \"rt-entry\".\"as-path\".data}'

results in:

[{'destination': [{'data': '2001:db8:1::/48'}], 'path': None}, {'destination': [{'data': '2001:db8:2::/48'}], 'path': None}]

Any ideas why "None" and or/how to proceed?

Another idea was filtering:

query='"route-information"[]."route-table"[].rt[?"rt-destination".data==`2001:db8:2::/48`]'

and then working my way further down to get the as-path. But the query results in [[]] where as

query='"route-information"[]."route-table"[].rt[?"rt-destination".data=="2001:db8:2::/48"]'

leads to

[[{'rt-destination': [{'data': '2001:db8:1::/48'}], 'rt-entry': [{'as-path': [{'data': '64511 65551 I'}]}]}, {'rt-destination': [{'data': '2001:db8:2::/48'}], 'rt-entry': [{'as-path': [{'data': '65536 64496 I'}]}]}]]

so no filtering at all.

Questioner
tyez
Viewed
0
β.εηοιτ.βε 2020-12-25 05:25:39

If your goal is to have a set of data looking like this:

[
  {
    "destination":  "2001:db8:1::/48",
    "path": "64511 65551 I"
  },
  {
    "destination": "2001:db8:2::/48",
    "path": "65536 64496 I"
  }
]

It can simply be achieved with this query:

route-information[].route-table[].rt[].{destination: rt-destination[0].data, path: rt-entry[0].as-path[0].data}

But I suspect, with the lists you have under rt-destination, rt-entry and as-path, that this query might actually make you miss some data.

Here is one that is less likely going to make you miss data but that is creating a list under path, so the resulting JSON looks like:

[
  {
    "destination": [
      "2001:db8:1::/48"
    ],
    "path": [
      "64511 65551 I"
    ]
  },
  {
    "destination": [
      "2001:db8:2::/48"
    ],
    "path": [
      "65536 64496 I"
    ]
  }
]

And the query is:

route-information[].route-table[].rt[].{destination: rt-destination[*].data, path: rt-entry[*].as-path[*].data | [] }

This query makes use of the flatten operator |, which is explained in the examples under working with nested data.


Here is a script demonstrating this:

import jmespath

data = {
  "route-information": [
    {
      "route-table": [
        {
          "rt": [
            {
              "rt-destination": [
                {
                  "data": "2001:db8:1::/48"
                }
              ],
              "rt-entry": [
                {
                  "as-path": [
                    {
                      "data": "64511 65551 I"
                    }
                  ]
                }
              ]
            },
            {
              "rt-destination": [
                {
                  "data": "2001:db8:2::/48"
                }
              ],
              "rt-entry": [
                {
                  "as-path": [
                    {
                      "data": "65536 64496 I"
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

query = [
    '"route-information"[]."route-table"[].rt[].{destination: "rt-destination"[0].data, path: "rt-entry"[0]."as-path"[0].data}',
    '"route-information"[]."route-table"[].rt[].{destination: "rt-destination"[*].data, path: "rt-entry"[*]."as-path"[*].data | [] }'
]

print(jmespath.search(query[0], data))
print('---------------------------')
print(jmespath.search(query[1], data))

Prints (prettified by hand):

[
   {
      "destination":"2001:db8:1::/48",
      "path":"64511 65551 I"
   },
   {
      "destination":"2001:db8:2::/48",
      "path":"65536 64496 I"
   }
]
---------------------------
[
   {
      "destination":[
         "2001:db8:1::/48"
      ],
      "path":[
         "64511 65551 I"
      ]
   },
   {
      "destination":[
         "2001:db8:2::/48"
      ],
      "path":[
         "65536 64496 I"
      ]
   }
]