Twitter sentiment analysis on a string

Posted on 2020-03-27 10:23:34

I've written a program that takes a Twitter dataset of tweets and labels (0 for neutral sentiment and 1 for negative sentiment) and predicts which category a tweet belongs to. The program works well on the training and test sets. However, I'm having trouble applying the predict function to a single string, and I'm not sure how to do that.

I have tried cleaning the string the same way I cleaned the dataset before calling the predict function, but the values returned have the wrong shape.

import re
import numpy as np
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
ps = PorterStemmer()
# Build the stop-word set once instead of once per token
stop_words = set(stopwords.words('english'))

# Loading dataset
dataset = pd.read_csv('tweet.csv')

# List to hold cleaned tweets
clean_tweet = []

# Cleaning tweets
for i in range(len(dataset)):
    # Strip @mentions first, then every non-letter character.
    # (Both substitutions must chain on `tweet`; re-reading the raw
    # column in the second call would discard the first result.)
    tweet = re.sub(r'@\w*', ' ', dataset['tweet'][i])
    tweet = re.sub(r'[^a-zA-Z]', ' ', tweet)
    tweet = tweet.lower()
    tweet = tweet.split()
    tweet = [ps.stem(token) for token in tweet if token not in stop_words]
    tweet = ' '.join(tweet)
    clean_tweet.append(tweet)

# Bag-of-words features limited to the 3000 most frequent terms
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features = 3000)
X = cv.fit_transform(clean_tweet)
X = X.toarray()
y = dataset.iloc[:, 1].values

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)

from sklearn.naive_bayes import GaussianNB
n_b = GaussianNB()
n_b.fit(X_train, y_train)
y_pred = n_b.predict(X_test)

some_tweet = "this is a mean tweet"  # How to apply predict function to this string

Questioner

Imanpal Singh

Imanpal Singh 2019-07-03 22:46:26

Calling cv.transform() on my new string gives me an error: ValueError: Iterable over raw text documents expected, string object received.

Toodle Pip 2019-07-03 22:50:16

Sorry, cv.transform() takes an iterable of documents, so you will need to wrap new_tweet in an iterable such as a list. I've updated the answer and that should work.
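
For readers following along, here is a minimal sketch of that fix. It is not the original answer's code: the four training tweets, their labels, and the `clean` helper are invented for illustration, and stemming and stop-word removal are left out so the snippet runs without NLTK data. The point it shows is that a single string has to be cleaned like the training data and then passed to `transform()` inside a list.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB

# Toy training data, invented for illustration (0 = neutral, 1 = negative).
train_tweets = [
    "what a lovely sunny day",
    "you are a mean and horrible person",
    "having coffee with friends",
    "this is a mean hateful tweet",
]
train_labels = [0, 1, 0, 1]


def clean(text):
    """Hypothetical helper: strip @mentions and non-letters, then lowercase."""
    text = re.sub(r"@\w*", " ", text)
    text = re.sub(r"[^a-zA-Z]", " ", text)
    return " ".join(text.lower().split())


# Fit the vectorizer and the classifier on the training tweets only.
cv = CountVectorizer()
X_train = cv.fit_transform([clean(t) for t in train_tweets]).toarray()
model = GaussianNB().fit(X_train, train_labels)

some_tweet = "this is a mean tweet"

# transform() expects an iterable of documents, so wrap the string in a list.
# The result has one row and the same number of columns as X_train.
X_new = cv.transform([clean(some_tweet)]).toarray()
prediction = model.predict(X_new)
print(X_new.shape, prediction)
```

Because `X_new` has a single row, `prediction` is an array with one label; `prediction[0]` is the predicted class for the string.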

Imanpal Singh 2019-07-03 22:56:47

Thank you, it worked. However, can you tell me why cv.fit_transform() would be wrong here?

Toodle Pip 2019-07-03 23:05:01

stackoverflow.com/questions/38692520/… this should point you in the right direction.
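
Independently of the linked discussion, one way to see the difference is to compare the shapes the two calls produce. The sketch below uses a made-up three-document corpus and a fresh `CountVectorizer` for the second call; it only illustrates that fitting on the new string learns a new vocabulary, so the columns no longer line up with the features the classifier was trained on.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up corpus standing in for the cleaned training tweets.
corpus = [
    "good morning everyone",
    "what a mean horrible tweet",
    "coffee with friends",
]
new_doc = ["this is a mean tweet"]

cv = CountVectorizer()
X_train = cv.fit_transform(corpus)

# transform() reuses the vocabulary learned from the corpus, so the new
# document is encoded with the same columns as the training matrix.
X_ok = cv.transform(new_doc)

# fit_transform() on the new document alone learns a fresh vocabulary
# made only of that document's words, giving a different column count.
X_bad = CountVectorizer().fit_transform(new_doc)

print(X_train.shape[1], X_ok.shape[1], X_bad.shape[1])
```

A model fitted on `X_train` expects the first column count; a matrix with any other width will be rejected at predict time, and calling `fit_transform()` again on the original `cv` would also overwrite the vocabulary the model relies on.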

Imanpal Singh 2019-07-04 00:48:48

Thank you for the link
