This article is reproduced from stackoverflow.com.
talend

Talend : Transform JSON lines to columns, extracting column names from JSON

Published on 2020-06-27 06:31:18

I have a JSON REST response with a structure somewhat like this one:

{
    "data" : [
        {
            "fields" : [
                { "label" : "John", "value" : "John" },
                { "label" : "Smith", "value" : "/person/4315" },
                { "label" : "43", "value" : "43" },
                { "label" : "London", "value" : "/city/54" }
            ]
        },
        {
            "fields" : [
                { "label" : "Albert", "value" : "Albert" },
                { "label" : "Einstein", "value" : "/person/154" },
                { "label" : "141", "value" : "141" },
                { "label" : "Princeton", "value" : "/city/9541" }
            ]
        }
    ],
    "columns" : ["firstname", "lastname", "age", "city"]
}

I'm looking for a way to transform this data to rows like

| firstname_label   | firstname_value | lastname_label | lastname_value | age_label | age_value | city_label | city_value |
---------------------------------------------------------------------------------------------------------------------------
|      John         |   John          | Smith          |  /person/4315  |  43       |    43     |    London  |  /city/54  |
|      Albert       |   Albert        | Einstein       |  /person/154   |  141      |    141    |  Princeton | /city/9541 |

Of course, the number of columns and their names may change, so I don't know the schema before runtime. I could probably write Java to handle this, but I'd like to know if there's a more standard way.

I'm new to Talend, so I spent hours trying; since my attempts were probably totally wrong, I won't describe them here.

Thanks for your help.

Questioner: Emmanuel BRUNO
Viewed: 29
2020-06-20 17:12

Here's a completely dynamic solution I put together.

[screenshot: overall job design]

First, you need to read the JSON in order to get the column list. Here's what tExtractJSONFields_2 looks like:

[screenshot: tExtractJSONFields_2 configuration]

Then you store the columns and their positions in a tHashOutput (you need to unhide it in File > Project properties > Designer > Palette settings). In tMap_2, you get the position of the column using a sequence:

Numeric.sequence("s", 1, 1) 

The output of this subjob is:

|=-------+--------=|
|position|column   |
|=-------+--------=|
|1       |firstname|
|2       |lastname |
|3       |age      |
|4       |city     |
'--------+---------'
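To make the position logic concrete, here is a plain-Java sketch of how Talend's Numeric.sequence behaves (a named counter that returns its start value on the first call, then increments by the step on every later call). The class and method names are mine, not Talend's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class ColumnPositions {
    // Named counters, mimicking Talend's Numeric.sequence routine.
    private static final Map<String, Integer> counters = new HashMap<>();

    static int sequence(String name, int start, int step) {
        // First call for a given name returns start; each later call adds step.
        Integer current = counters.get(name);
        int next = (current == null) ? start : current + step;
        counters.put(name, next);
        return next;
    }

    public static void main(String[] args) {
        String[] columns = {"firstname", "lastname", "age", "city"};
        for (String column : columns) {
            // Numeric.sequence("s", 1, 1) in tMap_2 yields 1, 2, 3, 4 here.
            System.out.println(sequence("s", 1, 1) + " | " + column);
        }
    }
}
```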

The 2nd step is to read the JSON again, in order to parse the fields property.

[screenshot: second subjob, parsing the fields property]

Like in step 1, you need to add a position to each field, relative to the columns. Here's the expression I used to get the sequence:

(Numeric.sequence("s1", 0, 1) % ((Integer)globalMap.get("tHashOutput_1_NB_LINE"))) + 1 

Note that I'm using a different sequence name, because sequences keep their value throughout the job. I'm using the number of columns from tHashOutput_1 in order to keep things dynamic.
Here's the output from this subjob:

|=-------+---------+---------------=|
|position|label    |value           |
|=-------+---------+---------------=|
|1       |John     |John            |
|2       |Smith    |/person/4315    |
|3       |43       |43              |
|4       |London   |/city/54        |
|1       |Albert   |Albert          |
|2       |Einstein |/person/154     |
|3       |141      |141             |
|4       |Princeton|/city/9541      |
'--------+---------+----------------'
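The modulo in that expression is what wraps the running sequence back to 1 after each record's last field. A minimal sketch, with a hard-coded column count of 4 standing in for tHashOutput_1_NB_LINE:

```java
public class FieldPositions {
    // (Numeric.sequence("s1", 0, 1) % nbColumns) + 1
    static int positionFor(int seq, int nbColumns) {
        return (seq % nbColumns) + 1;
    }

    public static void main(String[] args) {
        int nbColumns = 4; // stands in for tHashOutput_1_NB_LINE
        for (int seq = 0; seq < 8; seq++) {
            System.out.print(positionFor(seq, nbColumns) + " ");
        }
        System.out.println(); // prints: 1 2 3 4 1 2 3 4
    }
}
```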

In the last subjob, you need to join the fields data with the columns, using the column position we stored with each of them.

[screenshot: final subjob, joining fields and columns]

In tSplitRow_1 I generate 2 rows for each incoming row. Each row is a key/value pair. The first row's key is <columnName>_label (like firstname_label, lastname_label), and its value is the label from the fields. The 2nd row's key is <columnName>_value, and its value is the value from the fields.
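What tSplitRow_1 does to one joined row can be sketched like this (the method and variable names are mine, not Talend's):

```java
public class SplitRowDemo {
    // One incoming row carries a column name plus the field's label and value;
    // tSplitRow_1 turns it into two key/value rows.
    static String[][] split(String column, String label, String value) {
        return new String[][] {
            { column + "_label", label },
            { column + "_value", value }
        };
    }

    public static void main(String[] args) {
        for (String[] row : split("lastname", "Smith", "/person/4315")) {
            System.out.println(row[0] + " = " + row[1]);
        }
        // lastname_label = Smith
        // lastname_value = /person/4315
    }
}
```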

[screenshot: tSplitRow_1 configuration]

Once again, we need to add a position to our data in tMap_4, using this expression:

(Numeric.sequence("s2", 0, 1) / ((Integer)globalMap.get("tHashOutput_1_NB_LINE") * 2)) + 1

Note that since we have twice as many rows coming out of tSplitRow, I multiply the number of columns by 2.
This will attribute the same ID for the data that needs to be on the same row in the output file. The output of this tMap will be like:

|=-+---------------+-----------=|
|id|col_label      |col_value   |
|=-+---------------+-----------=|
|1 |firstname_label|John        |
|1 |firstname_value|John        |
|1 |lastname_label |Smith       |
|1 |lastname_value |/person/4315|
|1 |age_label      |43          |
|1 |age_value      |43          |
|1 |city_label     |London      |
|1 |city_value     |/city/54    |
|2 |firstname_label|Albert      |
|2 |firstname_value|Albert      |
|2 |lastname_label |Einstein    |
|2 |lastname_value |/person/154 |
|2 |age_label      |141         |
|2 |age_value      |141         |
|2 |city_label     |Princeton   |
|2 |city_value     |/city/9541  |
'--+---------------+------------'
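Here, integer division is what groups all 8 label/value rows of one record under the same id. A sketch of that expression, again with 4 columns assumed:

```java
public class RowIds {
    // (Numeric.sequence("s2", 0, 1) / (nbColumns * 2)) + 1, integer division
    static int idFor(int seq, int nbColumns) {
        return (seq / (nbColumns * 2)) + 1;
    }

    public static void main(String[] args) {
        int nbColumns = 4; // stands in for tHashOutput_1_NB_LINE
        for (int seq = 0; seq < 16; seq++) {
            System.out.print(idFor(seq, nbColumns) + " ");
        }
        System.out.println(); // eight 1s followed by eight 2s
    }
}
```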

This leads us to the last component, tPivotToColumnsDelimited, which pivots our rows to columns using the unique ID.

[screenshot: tPivotToColumnsDelimited configuration]

And the final result is a csv file like:

id;firstname_label;firstname_value;lastname_label;lastname_value;age_label;age_value;city_label;city_value
1;John;John;Smith;/person/4315;43;43;London;/city/54
2;Albert;Albert;Einstein;/person/154;141;141;Princeton;/city/9541
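The pivot itself amounts to grouping values by id in arrival order and joining them with the delimiter. A plain-Java sketch of that idea, a stand-in for tPivotToColumnsDelimited rather than its real code (shown on a subset of the rows above):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class PivotDemo {
    // Group the value column by id, preserving arrival order, and emit one
    // semicolon-delimited line per id, with the id first.
    static List<String> pivot(String[][] rows) {
        Map<String, StringJoiner> byId = new LinkedHashMap<>();
        for (String[] row : rows) {
            byId.computeIfAbsent(row[0], id -> new StringJoiner(";").add(id))
                .add(row[2]);
        }
        List<String> lines = new ArrayList<>();
        for (StringJoiner joiner : byId.values()) {
            lines.add(joiner.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"1", "firstname_label", "John"},   {"1", "firstname_value", "John"},
            {"1", "lastname_label", "Smith"},   {"1", "lastname_value", "/person/4315"},
            {"2", "firstname_label", "Albert"}, {"2", "firstname_value", "Albert"},
            {"2", "lastname_label", "Einstein"},{"2", "lastname_value", "/person/154"}
        };
        for (String line : pivot(rows)) {
            System.out.println(line);
        }
        // 1;John;John;Smith;/person/4315
        // 2;Albert;Albert;Einstein;/person/154
    }
}
```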

Note that you end up with an extraneous column at the beginning: the row id. It can easily be removed by re-reading the file and dropping that column.
I tried adding a new column along with the corresponding fields in the input json, and it works as expected.