Note: this post is translated from stackoverflow.com. Original question: python - Twitter sentiment analysis on a string

machine-learning nlp python scikit-learn sentiment-analysis

python - Twitter sentiment analysis on a string

Posted on 2020-03-27 11:15:37

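I wrote a program that takes Twitter data containing tweets and labels (0 for neutral sentiment, 1 for negative sentiment) and predicts which class a tweet belongs to. The program works well on the training and test sets. However, I am having trouble applying the predict function to a single string, and I am not sure how to do it.

I tried cleaning the string the same way I cleaned the dataset before calling the predict function, but the returned value is in the wrong shape.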

import re

import numpy as np
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()

#Loading dataset
dataset = pd.read_csv('tweet.csv')

#List to hold cleaned tweets
clean_tweet = []

#Cleaning tweets
stop_words = set(stopwords.words('english'))  # build the stopword set once, not once per token
for i in range(len(dataset)):
    tweet = re.sub(r'@\w*', ' ', dataset['tweet'][i])  # remove @mentions first
    tweet = re.sub('[^a-zA-Z]', ' ', tweet)  # then keep letters only (originally this step's result was overwritten)
    tweet = tweet.lower()
    tweet = tweet.split()
    tweet = [ps.stem(token) for token in tweet if token not in stop_words]
    tweet = ' '.join(tweet)
    clean_tweet.append(tweet)

from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features = 3000)
X = cv.fit_transform(clean_tweet)
X =  X.toarray()
y = dataset.iloc[:, 1].values

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)

from sklearn.naive_bayes import GaussianNB
n_b = GaussianNB()
n_b.fit(X_train, y_train)
y_pred  = n_b.predict(X_test) 

some_tweet = "this is a mean tweet"  # How to apply predict function to this string

Asked by: Imanpal Singh

Views: 140

Toodle Pip 2019-07-03 22:50

Call cv.transform([cleaned_new_tweet]) on the new string to map the new tweet into the existing document-term matrix. This returns the tweet in the correct shape.
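A minimal end-to-end sketch of that suggestion follows. It is not the asker's exact pipeline: the four toy tweets stand in for tweet.csv, and the clean() helper is a hypothetical, simplified version of the cleaning loop (no stemming or stopword removal) so that the example runs without NLTK data. The names cv, n_b and some_tweet are taken from the question.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB


def clean(text):
    """Hypothetical, simplified stand-in for the question's cleaning loop."""
    text = re.sub(r'@\w*', ' ', text)        # drop @mentions
    text = re.sub(r'[^a-zA-Z]', ' ', text)   # keep letters only
    return ' '.join(text.lower().split())    # lowercase and normalize spaces


# Assumed toy data standing in for tweet.csv (0 = neutral, 1 = negative)
tweets = [
    "what a lovely day",
    "you are a mean person",
    "see you at lunch",
    "such a mean and hateful tweet",
]
labels = [0, 1, 0, 1]

# Fit the vectorizer and the classifier once, on the training data
cv = CountVectorizer(max_features=3000)
X = cv.fit_transform([clean(t) for t in tweets]).toarray()
n_b = GaussianNB().fit(X, labels)

# Reuse the fitted vectorizer on the new string: wrap it in a list,
# then densify because GaussianNB expects a dense array
some_tweet = "this is a mean tweet"
new_X = cv.transform([clean(some_tweet)]).toarray()
prediction = n_b.predict(new_X)

print(new_X.shape, prediction)
```

The point to notice is that new_X has one row and the same number of columns as X, which is what n_b.predict() requires. The new string has to go through the same cleaning steps as the training tweets, otherwise its tokens may not match the learned vocabulary.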

Imanpal Singh 2019-07-03 22:46:26

cv.transform() gives me an error on my new string: ValueError: Iterable over raw text documents expected, string object received.

Toodle Pip 2019-07-03 22:50:16

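Sorry, cv.transform() takes an iterable, so you need to wrap the new_tweet part in an iterable (for example, a list). I have updated the answer, and it should work now.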
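The difference between the two calls can be reproduced in isolation. The snippet below is a small sketch with an assumed two-document training set; it only illustrates that a bare string is rejected while a one-element list is accepted.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Assumed toy corpus, used only to get a fitted vectorizer
cv = CountVectorizer()
cv.fit(["a mean tweet", "a kind tweet"])

# Passing a bare string raises the ValueError quoted in the comment above
try:
    cv.transform("this is a mean tweet")
    raised = False
except ValueError as err:
    raised = True
    print(err)

# Wrapping the string in a list makes it an iterable of documents
row = cv.transform(["this is a mean tweet"])
print(row.shape)
```

The result is a sparse matrix with a single row, one column per word in the fitted vocabulary.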

Imanpal Singh 2019-07-03 22:56:47

Thanks, it worked. But can you tell me why cv.fit_transform() is wrong here?

Toodle Pip 2019-07-03 23:05:01

stackoverflow.com/questions/38692520/… This should point you in the right direction.
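As one illustration of the issue (a sketch of my own, not taken from the linked answer): fit_transform() learns a new vocabulary from whatever it is given, so calling it on the new tweet produces columns that no longer line up with the features the classifier was trained on. The toy documents below are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Assumed toy documents
train_docs = ["what a lovely day", "such a mean tweet", "see you at lunch"]
new_doc = ["this is a mean tweet"]

# Vocabulary learned from the training documents
cv = CountVectorizer()
X_train = cv.fit_transform(train_docs)

# transform() reuses that vocabulary, so the column count matches X_train
same_space = cv.transform(new_doc)

# fit_transform() on the new document alone learns a different vocabulary
refit = CountVectorizer().fit_transform(new_doc)

print(X_train.shape[1], same_space.shape[1], refit.shape[1])
```

A model fitted on X_train can score same_space, but not refit, because the refitted matrix has a different number of columns and each column refers to a different word.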

Imanpal Singh 2019-07-04 00:48:48

Thanks for the link.
