I have a dataframe as follows:
df=
year|text|value
2001|text1|10
2001|text2|11
2002|text2|12
2003|text3|56
2005|text8|8
2005|text1|23
Now, I want to make a list of lists from the dataframe as follows:
l1=[[[10,0,0,23],[11,12,0,0],[0,0,56,0],[0,0,0,8]],['text1','text2','text3','text8'],[2001,2002,2003,2005]]
I want to add zeros in the list when there is no value for the text in a particular year.
I have tried the following code:
series_list, names_list, year_list = [], [], []
for value in list(df['text'].values):
    df1 = df[df['text'] == value]
    series_list.append(list(df1['value'].values))
    names_list.append(value)
    year_list.append(list(df1['year'].values))
I did not get the expected output. I initially tried to make 3 separate lists.
Convert the first two columns to a MultiIndex. Build a rectangular matrix by unstacking one level of the index. Extract the values and arrange them into a list.
matrix = df.set_index(['text', 'year']).unstack(fill_value=0)
matrix.values.tolist()
#[[10, 0, 0, 23], [11, 12, 0, 0], [0, 0, 56, 0], [0, 0, 0, 8]]
Add the index and the columns, if necessary:
matrix.values.tolist() + [matrix.index.tolist()] \
+ [matrix.columns.levels[1].tolist()]
#[[10, 0, 0, 23], [11, 12, 0, 0], [0, 0, 56, 0], [0, 0, 0, 8],
# ['text1', 'text2', 'text3', 'text8'], [2001, 2002, 2003, 2005]]
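For anyone who wants to run this end to end, here is a minimal sketch of the same unstacking idea on the data from the question. It departs from the code above in one way, which is my own assumption about what is convenient: the `value` column is selected before unstacking, so the result has plain single-level year columns and no level selection is needed. The names `rows`, `names`, `years` and `result` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year": [2001, 2001, 2002, 2003, 2005, 2005],
    "text": ["text1", "text2", "text2", "text3", "text8", "text1"],
    "value": [10, 11, 12, 56, 8, 23],
})

# Selecting 'value' first yields a Series indexed by (text, year);
# unstacking it gives one row per text and one plain column per year,
# with 0 wherever a (text, year) pair is missing.
matrix = df.set_index(["text", "year"])["value"].unstack(fill_value=0)

rows = matrix.values.tolist()    # one inner list per text
names = matrix.index.tolist()    # row labels (texts)
years = matrix.columns.tolist()  # column labels (years)

# Nested layout asked for in the question: [values, texts, years].
result = [rows, names, years]
print(result)
```

Note that unstacking sorts both the texts and the years, so the labels come back in sorted order rather than in order of appearance.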
I also want the order of year to be preserved @DYZ
Updated the answer. The change was obvious.
I updated the code as follows and it worked:
[matrix.values.tolist()]+[matrix.index.tolist()]+[matrix.columns.levels[1].tolist()]
It doesn't work with tocolumns because the columns are multilevel @DYZ
Agree, the level selection is necessary.
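On preserving the order of the years: if "preserved" means the order in which the years first appear in the dataframe (that is my reading of the comment, not something confirmed in the thread), one possible sketch is to reindex the unstacked matrix with `df['year'].unique()`, which keeps order of first appearance. The sample data below is a reordered variant of the question's data, made up so that the effect is visible, and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical reordering of the question's data: 2005 appears first.
df = pd.DataFrame({
    "year": [2005, 2005, 2001, 2001, 2002, 2003],
    "text": ["text8", "text1", "text1", "text2", "text2", "text3"],
    "value": [8, 23, 10, 11, 12, 56],
})

# Unstacking sorts the labels, so restore first-appearance order afterwards.
matrix = (
    df.set_index(["text", "year"])["value"]
      .unstack(fill_value=0)
      .reindex(index=df["text"].unique(), columns=df["year"].unique())
)

# Nested layout: [values, texts, years], all in order of first appearance.
result = [
    matrix.values.tolist(),
    matrix.index.tolist(),
    matrix.columns.tolist(),
]
print(result)
```

Because every label passed to `reindex` already exists in the matrix, no new missing values are introduced and the zeros from `fill_value=0` are kept.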