I have counted the number of times each word appears in a set of text documents and stored those counts in a dictionary. Now I want to put those counts in a matrix with the text files as columns and the distinct words as rows. This is the dictionary:
{'test1.txt': {'peer': 1, 'appel': 1, 'moes': 1},
'test2.txt': {'peer': 1, 'appel': 1},
'test3.txt': {'peer': 1, 'moes': 2},
'test4.txt': {'peer': 1, 'moes': 1, 'ananas': 1}}
And the output of the matrix has to look like this:
[['', 'test1.txt', 'test2.txt', 'test3.txt', 'test4.txt'],
['moes', 1, 0, 2, 1],
['appel', 1, 1, 0, 0],
['peer', 1, 1, 1, 1],
['ananas', 0, 0, 0, 1]]
This is the code I have now to print the matrix, but the number of times a word appears in each document is not implemented yet.
term_freq_matrix = []
list_of_files.insert(0," ")
term_freq_matrix.insert(1, list_of_files)
for unique_word in unique_words:
    unique_word = unique_word.split()
    term_freq_matrix.append(unique_word)
print(term_freq_matrix)
Thanks!
To do this without external libraries:
Code:
d = {'test1.txt': {'peer': 1, 'appel': 1, 'moes': 1},
     'test2.txt': {'peer': 1, 'appel': 1},
     'test3.txt': {'peer': 1, 'moes': 2},
     'test4.txt': {'peer': 1, 'moes': 1, 'ananas': 1}}
res = [[''] + list(d.keys())]
for c in set(k for v in d.values() for k in v.keys()):
    res.append([c] + [d[k].get(c, 0) for k in res[0][1:]])
Output (row order may vary between runs):
>>> res
[['', 'test1.txt', 'test2.txt', 'test3.txt', 'test4.txt'],
['peer', 1, 1, 1, 1],
['ananas', 0, 0, 0, 1],
['appel', 1, 1, 0, 0],
['moes', 1, 0, 2, 1]]
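The words are collected with set(), which has no guaranteed order, so the word rows can come out differently from one run to the next. If a stable order matters, one possible variation is sketched below. It is not part of the code above: the function name build_term_matrix is made up for illustration, and it assumes Python 3.7+ so that dicts keep insertion order. It uses dict.fromkeys to drop duplicate words while keeping the order in which they are first seen.

```python
def build_term_matrix(counts):
    """Return a matrix with a header row of file names and one row per word."""
    files = list(counts)
    # dict.fromkeys removes duplicates but keeps first-seen order (Python 3.7+)
    words = list(dict.fromkeys(w for freq in counts.values() for w in freq))
    matrix = [[''] + files]
    for word in words:
        # .get(word, 0) fills in 0 when a file does not contain the word
        matrix.append([word] + [counts[f].get(word, 0) for f in files])
    return matrix


d = {'test1.txt': {'peer': 1, 'appel': 1, 'moes': 1},
     'test2.txt': {'peer': 1, 'appel': 1},
     'test3.txt': {'peer': 1, 'moes': 2},
     'test4.txt': {'peer': 1, 'moes': 1, 'ananas': 1}}

res = build_term_matrix(d)
for row in res:
    print(row)
```

With this input the rows always appear as peer, appel, moes, ananas. Replacing the dict.fromkeys line with sorted(set(...)) would give alphabetical order instead.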