matrix nlp similarity tf-idf word-embedding

Create a matrix from a dict of dicts for calculating similarities between docs

Posted on 2020-03-27 10:22:22

Here is my problem:

I have a dataframe like this:

id   tfidf_weights   
1    {word1: 0.01, word2: 0.01, word3: 0.01, ...}
2    {word4: 0.01, word5: 0.01, word6: 0.01, ...}
3    {word7: 0.01, word8: 0.01, word9: 0.01, ...}
4    {word10: 0.01, word11: 0.01, word12: 0.01, ...}
5    {word13: 0.01, word14: 0.01, word15: 0.01, ...}    
.
.
.

The 'id' column holds the ids of the docs, and 'tfidf_weights' holds the tf-idf weight of each word in each doc.

From this dataframe, I can obtain a dict with the following structure:

mydict = {1:{word1: 0.01, word2: 0.01, word3: 0.01, ...}, 2:{word4: 0.01, word5: 0.01, word6: 0.01, ...}, 3:{word7: 0.01, word8: 0.01, word9: 0.01, ...}, 4:{word10: 0.01, word11: 0.01, word12: 0.01, ...}, 5:{word13: 0.01, word14: 0.01, word15: 0.01, ...}, ...}

What I want to do is obtain, from this dictionary, a matrix like this:

      word1     word2     word3     word4   ...
1     0.01      0.01      0.01      0.01     
2     0.01      0.01      0.01      0.01
3     0.01      0.01      0.01      0.01
4     0.01      0.01      0.01      0.01
5     0.01      0.01      0.01      0.01
.
.
.

Thank you for your help!

Questioner: nipato
Viewed: 67
amdex 2019-07-03 22:21

You can convert a list of dictionaries into a dataframe by using the pandas DataFrame class directly.

import pandas as pd

a = [{"0": 0}, {"1": 1}]
df = pd.DataFrame(a)

To apply this to your problem, all you have to do is turn mydict into a list of dictionaries instead of a dictionary of dictionaries.
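
As a rough sketch of that step, assuming mydict maps each doc id to a {word: weight} dict as in the question (the words and weights below are made up for illustration): a plain list of dicts drops the ids, so they are passed back in as the index. pd.DataFrame.from_dict with orient="index" should also accept the dict of dicts directly. Words that a doc does not contain come out as NaN, so this sketch fills them with 0 before computing a cosine similarity with NumPy, which is one possible similarity measure and not necessarily the one the asker had in mind.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data in the shape described in the question.
mydict = {
    1: {"word1": 0.5, "word2": 0.5},
    2: {"word2": 0.5, "word3": 0.5},
    3: {"word4": 1.0},
}

# Option A: list of dicts, with the doc ids restored as the row index.
matrix_a = pd.DataFrame(list(mydict.values()), index=list(mydict.keys()))

# Option B: build directly from the dict of dicts (outer keys become rows).
matrix_b = pd.DataFrame.from_dict(mydict, orient="index")

# Words missing from a doc are NaN; treat them as a weight of 0.
matrix = matrix_b.fillna(0.0)

# Cosine similarity between docs (rows), assuming no row is all zeros.
values = matrix.to_numpy()
unit = values / np.linalg.norm(values, axis=1, keepdims=True)
similarity = pd.DataFrame(unit @ unit.T, index=matrix.index, columns=matrix.index)
```

Here matrix has one row per doc id and one column per word, and similarity is a square doc-by-doc table. For a large vocabulary a dense frame like this can get big, so a sparse representation may be a better fit.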