Warm tip: This article is reproduced from serverfault.com, please click

How to query an array field (AWS Glue)?

发布于 2020-11-20 09:39:23

I have a table in AWS Glue, and the crawler has defined one field as array. The content is in S3 files that have a json format. The table is TableA, and the field is members.

There are a lot of other fields such as strings, booleans, doubles, and even structs.

I am able to query them all using a simpel query such as:

SELECT
    content.my_boolean,
    content.my_string,
    content.my_struct.value
FROM schema.tableA;

The issue is when I add content.members into the query. The error I get is: [Amazon](500310) Invalid operation: schema "content" does not exist.

Content exists because i am able to select other fiels from the main key in the json (content). Probably is something related with how to perform the query agains array field in Spectrum.

Any idea?

Questioner
Maik
Viewed
1
Hyruma92 2020-11-20 23:33:56

You have to rename the table to extract the fields from the external schema:

SELECT
    a.content.my_boolean,
    a.content.my_string,
    a.content.my_struct.value
FROM schema.tableA a;

I had the same issue on my data, I really don't know why it needs this cast but it works. If you need to access elements of an array you have to explod it like:

SELECT member.<your-field>,
FROM schema.tableA a, a.content.members as member;

Reference