Quill: An Exercise in NLP

I created Quill as a venture into NLP and machine learning. The program takes a corpus of texts from predetermined authors, cleans and processes their work, and generates a new poem or short story from a model trained on that corpus.
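As a rough illustration of the cleaning step, here is a minimal sketch. It assumes the corpus is lowercased, stripped of punctuation, and split into word tokens, which matches the look of the sample output further down. The function name `clean_text` is hypothetical and not taken from this repository.

```python
import re


def clean_text(raw):
    """Lowercase the text, drop punctuation, and return a list of word tokens.

    Hypothetical helper for illustration; the project's actual cleaning
    code may differ.
    """
    text = raw.lower()
    # Remove apostrophes first so contractions stay together as one token.
    text = re.sub(r"['\u2019]", "", text)
    # Replace every remaining character that is not a letter or whitespace.
    text = re.sub(r"[^a-z\s]", " ", text)
    # str.split() with no arguments collapses runs of whitespace and newlines.
    return text.split()


if __name__ == "__main__":
    sample = "Soldiers never do die well;\nCrosses mark the places,"
    print(clean_text(sample))
```

The resulting token list is the kind of input that could then be mapped to integer indices and cut into fixed-length sequences for training.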

Prerequisites

You need a machine with Python 3.9 or later installed. Install the dependencies listed in requirements.txt with the following command:

pip install -r requirements.txt

To check your Python version and shell, use:

$ python3.9 -V
Python 3.9.7

$ echo $SHELL
/usr/bin/zsh

Development Environment

It is recommended to create and activate a virtual environment before contributing to this repository.

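Optionally install virtualenv (the commands below use Python's built-in venv module, so this step is not strictly required):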

$ pip install virtualenv

On Windows:

$ python -m venv <your-environment-name>
$ <your-environment-name>\Scripts\activate

On Mac/Linux:

$ python3 -m venv <your-environment-name>
$ source <your-environment-name>/bin/activate

To deactivate the virtual environment:

$ deactivate

Quill

I've implemented an LSTM model and trained it on a limited selection of Hemingway's works. As of the last commit to this README, the model had been trained for only 20 epochs. Here is an example of the output generated so far:

Test start text: 'soldiers never do die well'

soldiers never do die well crosses mark the places wooden crosses where they fell stuck above their faces soldiers pitch and cough and twitch all the world roars red and black soldiers smother in a ditch choking through the whole attack i like americans they are so unlike canadians they do not take their policemen

Test start text: 'the age demanded'

the age demanded that we sing and cut away our tongue the age demanded that we flow and hammered in the bung the age demanded that we dance and jammed us into iron pants and in the end the age was handed the sort of shit that it demanded a porcupine skin stiff

Very poetic, isn't it?

Raising the number of epochs could lead to overfitting, so an important next step is to implement early stopping, which halts training if the model's performance on a validation set stops improving for a certain number of epochs. Another technique to consider is regularization: methods such as L1/L2 regularization can help prevent the model from overfitting the training data.
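The two ideas above can be sketched in plain Python, independent of any deep learning framework. This is only an illustration under stated assumptions: the names `EarlyStopping`, `epochs_run`, and `l2_penalty`, and the `patience` and `min_delta` parameters, are hypothetical and not part of this repository's code.

```python
class EarlyStopping:
    """Signal a stop once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            # New best validation loss: remember it and reset the counter.
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def epochs_run(val_losses, patience=3):
    """Return how many epochs would run for a given sequence of validation losses."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.should_stop(loss):
            return epoch
    return len(val_losses)


def l2_penalty(weights, lam):
    """L2 regularization term, lam * sum(w^2), to be added to the training loss."""
    return lam * sum(w * w for w in weights)


if __name__ == "__main__":
    # Validation loss stops improving after the third epoch.
    losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.6]
    print(epochs_run(losses, patience=3))
    print(l2_penalty([1.0, -2.0], lam=0.1))
```

In a real training loop, `should_stop` would be called once per epoch with the loss measured on a held-out validation split, and the weights from the best epoch would typically be saved and restored. Most deep learning frameworks ship an equivalent mechanism, so a hand-rolled version like this is mainly useful for understanding the logic.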

Last update: 4:22 PM EDT // 7/7/2024
