Topic Modeling with Latent Dirichlet Allocation in Python

Installation/Prerequisites

To run this project, you will need to install the following Python packages:

numpy
pandas
spacy
gensim
scikit-learn (imported as sklearn)
nltk (for the stop-word list)
pyLDAvis (for the topic visualization)

You can install them using pip or conda commands. For example:

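pip install numpy
pip install -U spacy
python -m spacy download en_core_web_sm

(spaCy 3 removed the en shortcut; on spaCy 2 the equivalent command is python -m spacy download en.)

Then, from a Python session, download the NLTK stop-word list:

import nltk
nltk.download('stopwords')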

References

This project is based on the following resources:

A friendly introduction to LDA by Edwin Chen: https://blog.echen.me/2011/08/22/introduction-to-latent-dirichlet-allocation
A tutorial on LDA by David Blei: http://www.cs.columbia.edu/~blei/papers/Blei2012.pdf
A Python library for LDA by Radim Rehurek: https://radimrehurek.com/gensim/models/ldamodel.html
Video reference: https://youtu.be/T05t-SqKArY
Video reference: https://youtu.be/BaM1uiCpj_E

LDA Workflow

FAQ

Some frequently asked questions about this project are:

What is topic modeling? Topic modeling is a technique for extracting hidden topics from large volumes of text. It is an unsupervised machine learning approach that assumes each document is a mixture of topics and each topic is a probability distribution over words.
What is Latent Dirichlet Allocation? Latent Dirichlet Allocation (LDA) is one of the most popular methods for topic modeling. It uses a probabilistic model to capture the structure of a given collection of documents. It assigns each word in a document to a topic based on two factors: how prevalent that topic is in the document, and how prevalent the word is in that topic.
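
To make the two factors concrete, here is a minimal sketch of a single topic-assignment step. The topic names and all probabilities are hypothetical values chosen for illustration; in a real LDA run they are estimated from word counts and refined over many iterations.

```python
# Hypothetical values for one document and two topics (not taken from this project).
topic_given_doc = {"sports": 0.7, "politics": 0.3}   # p(topic | document)
word_given_topic = {                                  # p(word | topic)
    "sports": {"game": 0.20, "vote": 0.01},
    "politics": {"game": 0.02, "vote": 0.25},
}


def topic_posterior(word):
    """Return p(topic | word, document), which is proportional to
    p(topic | document) * p(word | topic)."""
    scores = {
        topic: topic_given_doc[topic] * word_given_topic[topic][word]
        for topic in topic_given_doc
    }
    total = sum(scores.values())
    return {topic: score / total for topic, score in scores.items()}


game = topic_posterior("game")
vote = topic_posterior("vote")
print(game)
print(vote)
```

Even though this document is mostly about sports, the word "vote" is still more likely to be assigned to the politics topic, because it is far more prevalent there.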

How to run this project?

To run this project, you will need to follow these steps:

Download or clone this repository.
Install the required packages.
Load and preprocess the data using spaCy.
Train the LDA model using gensim or sklearn.
Evaluate the model using perplexity and coherence scores.
Visualize the topics and their keywords using pyLDAvis.

Link to My Blog:

https://medium.com/@soulofmercara10/topic-modeling-with-lda-505151fdffec


Zaheer-10/Topic-Modleing-With-Latent-Dirichlet-Allocation
