
python 2.7 - How to save gensim LDA topics output to csv along with the scores?

Posted on 2015-06-08 20:27:31

How do I save the output? I am using the following code:

%time lda1 = models.LdaModel(corpus1, num_topics=20, id2word=dictionary1, update_every=5, chunksize=10000, passes=100)
Questioner: VP10
Feng Mai 2018-06-15 06:25:29

To export the topic mixture for each document to a csv file:

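import pandas as pd

# lda_model is the trained LdaModel (named lda1 in the question).
# Each lda_model[x] is a list of (topic_id, probability) pairs for one document.
mixture = [dict(lda_model[x]) for x in corpus1]
pd.DataFrame(mixture).to_csv("topic_mixture.csv")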
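
One thing to watch with the snippet above: `lda_model[x]` returns a sparse list, so topics whose probability falls below the model's minimum threshold are left out, and pandas fills those cells with NaN. If you would rather have an explicit 0.0 in every column, something like the following might work. It is only a sketch using the standard `csv` module, written for Python 3; `densify`, `write_mixture_csv` and `sample_docs` are hypothetical names, and the sample values are made up for illustration rather than real model output.

```python
import csv
import io


def densify(doc_topics, num_topics):
    """Turn a sparse [(topic_id, prob), ...] list into a fixed-length row.

    Topics missing from the sparse list are filled with 0.0.
    """
    row = [0.0] * num_topics
    for topic_id, prob in doc_topics:
        row[topic_id] = prob
    return row


def write_mixture_csv(sparse_docs, num_topics, out):
    """Write one row per document: document index, then one column per topic."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["doc"] + ["topic_%d" % t for t in range(num_topics)])
    for i, doc_topics in enumerate(sparse_docs):
        writer.writerow([i] + densify(doc_topics, num_topics))


# Hypothetical stand-in for [lda_model[x] for x in corpus1]; values are invented.
sample_docs = [
    [(0, 0.75), (2, 0.25)],
    [(1, 1.0)],
]

buf = io.StringIO()
write_mixture_csv(sample_docs, 3, buf)
mixture_csv = buf.getvalue()
print(mixture_csv)
```

With a real model you would pass `[lda_model[x] for x in corpus1]` and `lda_model.num_topics` instead of the sample data, and an open file instead of the `StringIO` buffer.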

To export the top words for each topic to a csv file:

top_words_per_topic = []
for t in range(lda_model.num_topics):
    # show_topic returns (word, probability) pairs; prepend the topic id to each
    top_words_per_topic.extend([(t,) + x for x in lda_model.show_topic(t, topn=5)])

pd.DataFrame(top_words_per_topic, columns=['Topic', 'Word', 'P']).to_csv("top_words.csv")

The CSV file will have the following format:

Topic Word  P  
0     w1    0.004437  
0     w2    0.003553  
0     w3    0.002953  
0     w4    0.002866  
0     w5    0.008813  
1     w6    0.003393  
1     w7    0.003289  
1     w8    0.003197 
...
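
Note that the table above is shown whitespace-aligned for readability. The file that `to_csv` actually writes is comma-separated and, by default, also carries the DataFrame index as an unnamed first column. If you want just the three columns, a rough sketch with the standard `csv` module could look like this (Python 3; `top_word_rows`, `write_top_words_csv` and `sample_topics` are hypothetical names, and the words and probabilities are placeholders, not model output):

```python
import csv
import io


def top_word_rows(topics):
    """Flatten {topic_id: [(word, prob), ...]} into (topic, word, prob) rows."""
    rows = []
    for topic_id in sorted(topics):
        for word, prob in topics[topic_id]:
            rows.append((topic_id, word, prob))
    return rows


def write_top_words_csv(rows, out):
    """Write a header line followed by one (Topic, Word, P) row per entry."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Topic", "Word", "P"])
    writer.writerows(rows)


# Hypothetical stand-in for
# {t: lda_model.show_topic(t, topn=2) for t in range(lda_model.num_topics)}
sample_topics = {
    0: [("w1", 0.5), ("w2", 0.25)],
    1: [("w3", 0.125)],
}

top_buf = io.StringIO()
write_top_words_csv(top_word_rows(sample_topics), top_buf)
top_words_csv = top_buf.getvalue()
print(top_words_csv)
```

If you prefer to stay with pandas, passing `index=False` to `to_csv` drops the index column as well.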