This project implements a Recurrent Neural Network (RNN) model for sentiment analysis using TensorFlow and Keras. The goal is to classify text data into positive, neutral, or negative sentiments. We use GridSearchCV to fine-tune hyperparameters and improve the model's accuracy.
To get started, clone this repository and install the necessary packages:
```bash
git clone https://github.com/dlynch42/sentiment-analysis.git
cd sentiment-analysis
pip install -r requirements.txt
```
- pandas
- numpy
- tensorflow
- scikit-learn
- pyprind
- matplotlib
- seaborn
- re (part of the Python standard library; no installation needed)
Install these packages using pip:

```bash
pip install pandas numpy tensorflow scikit-learn pyprind matplotlib seaborn
```
To train the model and make predictions, follow these steps:
- EDA & Preprocessing: Load and preprocess the text data to remove HTML tags, URLs, special characters, and stopwords, then tokenize and stem the text. Create sequence mappings for the model.
- Model Architecture: Initialize and build the RNN model, then train it on `X_train`. Tune and adjust hyperparameters using `GridSearchCV` and `Pipeline`.
- Test Model: Use the trained model to make predictions on new data.
- Conclusion: Analyze results.
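The preprocessing step above can be sketched with the standard library's `re` module. This is a simplified stand-in: the project's actual stopword list and stemmer (e.g. an NLTK stemmer) may differ, and the `STOPWORDS` set and suffix-stripping rules here are illustrative only.

```python
import re

# Minimal stand-in stopword list; the project may use a fuller set (e.g. NLTK's).
STOPWORDS = {"a", "an", "the", "is", "it", "this", "and", "or", "of", "to"}

def preprocess(text):
    """Clean raw review text: strip HTML tags, URLs, and special characters,
    lowercase, tokenize, drop stopwords, and apply a crude suffix stemmer."""
    text = re.sub(r"<[^>]+>", " ", text)        # remove HTML tags
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-zA-Z\s]", " ", text)    # keep letters only
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    # Naive stemming stand-in: strip a few common suffixes.
    stemmed = []
    for t in tokens:
        for suf in ("ing", "ed", "s"):
            if t.endswith(suf) and len(t) > len(suf) + 2:
                t = t[: -len(suf)]
                break
        stemmed.append(t)
    return stemmed

print(preprocess("<br>This movie is amazing! Visit https://example.com"))
# → ['movie', 'amaz', 'visit']
```

The cleaned token lists would then be mapped to integer sequences (e.g. via a vocabulary lookup) and padded or truncated to a fixed `seq_len` before being fed to the model.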
The model architecture consists of the following layers:
- Embedding Layer: Converts input sequences into dense vectors of fixed size.
- LSTM Layers: Capture dependencies in both forward and backward directions using bidirectional LSTMs. Multiple layers can be stacked for deeper representations.
- Dropout Layers: Help prevent overfitting by randomly zeroing activations during training.
- Dense Output Layer: Outputs the final sentiment prediction.
- Optimizer & Loss Function: Adam optimizer with binary cross-entropy (BCE) loss.
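The layer stack above can be sketched in Keras as follows. The sizes (`vocab_size`, `embed_dim`, `seq_len`, `lstm_size`, `num_layers`, the dropout rate, and the learning rate) are illustrative placeholders, not the tuned values; the sigmoid output with BCE loss matches the loss function noted above (a three-class variant would instead use a 3-unit softmax with categorical cross-entropy).

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative sizes; the tuned values come from the hyperparameter search.
vocab_size, embed_dim, seq_len, lstm_size, num_layers = 10000, 64, 100, 64, 2

model = keras.Sequential()
model.add(keras.Input(shape=(seq_len,)))
# Embedding layer: integer token ids -> dense vectors of size embed_dim.
model.add(layers.Embedding(vocab_size, embed_dim))
for i in range(num_layers):
    # All but the last LSTM layer return full sequences so layers can stack.
    return_seqs = i < num_layers - 1
    model.add(layers.Bidirectional(layers.LSTM(lstm_size, return_sequences=return_seqs)))
    model.add(layers.Dropout(0.5))  # regularization against overfitting
# Single sigmoid unit: probability of positive sentiment.
model.add(layers.Dense(1, activation="sigmoid"))

model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])
```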
We focused on the following hyperparameters to optimize the model:
- `seq_len`: Sequence length of the input data.
- `lstm_size`: Number of units in the LSTM layers.
- `num_layers`: Number of LSTM layers.
- `batch_size`: Size of the batches during training.
- `learning_rate`: Learning rate for the optimizer.
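In the project, `GridSearchCV` is paired with the Keras model (typically via a scikit-learn wrapper such as scikeras's `KerasClassifier`). To stay self-contained and runnable, the sketch below demonstrates the same grid-search mechanics with a plain scikit-learn `Pipeline` on synthetic stand-in data; the estimator, features, and parameter grid are illustrative, not the project's actual ones.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the project uses tokenized review sequences.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# Stand-in grid; the project's grid covers seq_len, lstm_size, num_layers,
# batch_size, and learning_rate instead.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

`GridSearchCV` exhaustively cross-validates every combination in the grid, so the search cost grows multiplicatively with each hyperparameter added; with an expensive-to-train RNN, keeping each list short matters.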
The best model achieved a test accuracy of 73.45%. The results were lower than expected, likely due to over-processing the data. The original text sequences were short, and excessive preprocessing reduced the amount of useful data.