TopicModelTuning

This repository has code that parallels the article Using Metrics to Determine The Right LDA Topic Model Size. Users can run the notebook and re-create, step by step, the procedures described in the article.

To run the code presented here, follow this outline (details in the cells below):

Download two csv files from the GitHub repository into a directory accessible to the notebook.
Download the text DB csv file from Kaggle.
Set the global DATA_DIR variable to the location of the above files.
Install the required packages.
Execute the imports.
Run the cells containing Python function definitions used in the notebook.
Generate the six models used in the evaluation. This should take about 15 minutes on a standard Google Colab account. You can save the models for later use if desired.
Run the evaluation code.
Download CSV Files

There are three csv files that are needed to run this notebook:

In the GitHub repository:

ExcludelistDF.csv
ModelRunMetrics.csv

On Kaggle:

NewsDF.csv
ExcludelistDF is a list of stop words which can be used when building models based on the sample text.
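A minimal sketch of how a stop-word list like this could be applied to tokenized text, assuming ExcludelistDF.csv has a header row and one column holding the words. The column name "word" and the helper names parse_stop_words, load_stop_words and remove_stop_words are hypothetical; check the file's actual header before using it.

```python
import csv


def parse_stop_words(lines, column="word"):
    """Collect the values of one CSV column as a set of lowercase stop words.

    `lines` is any iterable of CSV text lines (an open file works). The default
    column name "word" is a hypothetical placeholder for ExcludelistDF.csv.
    """
    stop_words = set()
    for row in csv.DictReader(lines):
        value = (row.get(column) or "").strip().lower()
        if value:
            stop_words.add(value)
    return stop_words


def load_stop_words(path, column="word"):
    """Open a stop-word CSV file and parse it with parse_stop_words."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_stop_words(f, column)


def remove_stop_words(tokens, stop_words):
    """Drop tokens whose lowercase form appears in the stop-word set."""
    return [t for t in tokens if t.lower() not in stop_words]
```

Using a set keeps each membership check constant-time, which matters when filtering tens of thousands of articles.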

ModelRunMetrics contains the metrics from 90 LDA runs and can be used to re-create and explore the data from the article.
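One way to explore run metrics like these is to average a metric over all runs that share the same number of topics. The sketch below assumes the CSV has one row per run with a topic-count column and a metric column; the names "num_topics" and "coherence", and the helper mean_metric_by_topics, are hypothetical placeholders rather than the file's confirmed headers.

```python
import csv
from collections import defaultdict
from statistics import mean


def mean_metric_by_topics(rows, topics_col="num_topics", metric_col="coherence"):
    """Average one metric over all runs that share the same number of topics.

    `rows` is any iterable of dicts, such as the output of load_rows. The
    column names are hypothetical placeholders for ModelRunMetrics.csv.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[int(row[topics_col])].append(float(row[metric_col]))
    # Sorted by topic count so the trend across model sizes reads in order.
    return {k: mean(v) for k, v in sorted(groups.items())}


def load_rows(path):
    """Read a CSV file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Averaging across repeated runs smooths out the run-to-run variation that LDA's random initialization introduces.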

NewsDF is a copy of the 30,000-article DB that contains both the original text and pre-processed versions of the articles. You will need this file if you want to run your own models or explore the text that the models are built on.
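Before an LDA model can be trained, pre-processed text is typically turned into a bag-of-words corpus. The sketch below assumes each pre-processed article is a whitespace-separated string of tokens (an assumption; the actual column format is not described here) and produces (token_id, count) pairs, similar in shape to the output of gensim's `Dictionary.doc2bow`. It is not a claim about which library or steps the notebook itself uses; build_vocabulary and to_bow are hypothetical helpers.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct token in the tokenized documents to an integer id."""
    vocab = {}
    for doc in docs:
        for token in doc:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def to_bow(doc, vocab):
    """Return (token_id, count) pairs sorted by id, skipping unknown tokens."""
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())


# Hypothetical pre-processed articles standing in for the NewsDF text column.
processed = ["oil price rise", "oil export oil"]
docs = [text.split() for text in processed]
vocab = build_vocabulary(docs)
corpus = [to_bow(doc, vocab) for doc in docs]
```

Here `corpus` holds one sorted list of (id, count) pairs per article, which is the input format most LDA implementations expect.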

It is recommended that you place all of these files in a directory accessible to the Colab notebook and set the DATA_DIR variable to that directory.
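A quick check that DATA_DIR points at the right place can save a failed run later. This is a minimal sketch; the path shown is only a placeholder (a typical Google Drive mount point in Colab), and missing_files is a hypothetical helper, not part of the notebook.

```python
from pathlib import Path

# Placeholder path: point this at the folder that holds the three CSV files.
DATA_DIR = "/content/drive/MyDrive/TopicModelTuning"

REQUIRED_FILES = ("ExcludelistDF.csv", "ModelRunMetrics.csv", "NewsDF.csv")


def missing_files(data_dir, names=REQUIRED_FILES):
    """Return the required file names that are not present in data_dir."""
    base = Path(data_dir)
    return [name for name in names if not (base / name).is_file()]
```

In a notebook cell, `missing_files(DATA_DIR)` returning an empty list means all three files are in place; otherwise it lists exactly which downloads are still needed.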

Repository: drob-xx/TopicModelTuning