CTM: Topic Count Impossibly Large #202

tau-241 · 2023-05-11T14:56:37Z

I used a correlated topic model on a 4,500-document corpus to learn the type and frequency of topics. The results were very good, but unfortunately one of the topics (#14) has an impossible count, more than double the number of documents:

This library is easy to use and very fast, and I feel lucky to have found it, but I can't use the results when a known-to-be-common topic has an impossible count.

I tried HDPModel and got a similar result, where one topic (#6) had a count of almost 4x the number of documents:

What caused the large counts? Did I make a mistake? Is there a way for me to get the topic distributions for each individual document?

Thank you!
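
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the reported count is the number of word tokens assigned to each topic, rather than the number of documents in which the topic appears, then a common topic can easily exceed the document count. The sketch below uses plain Python and made-up toy data (no topic-modeling library is called; `doc_token_topics`, `token_counts_by_topic`, and `doc_topic_distribution` are hypothetical names) to show how a token-level count differs from a per-document topic distribution.

```python
from collections import Counter

# Hypothetical toy data: each inner list holds the topic id assigned to
# every word token in one document. This is NOT output from a real model.
doc_token_topics = [
    [0, 0, 1, 0],
    [0, 2, 0, 0, 0],
    [1, 0, 0],
]

def token_counts_by_topic(docs):
    """Count word tokens per topic across the whole corpus."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts

def doc_topic_distribution(tokens, num_topics):
    """Normalize one document's token assignments into topic proportions."""
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        # An empty document has no topic mass to distribute.
        return [0.0] * num_topics
    return [counts[k] / total for k in range(num_topics)]

corpus_counts = token_counts_by_topic(doc_token_topics)
per_doc = [doc_topic_distribution(t, 3) for t in doc_token_topics]

# Topic 0 is assigned to more tokens than there are documents.
print(corpus_counts[0], "tokens for topic 0 across", len(doc_token_topics), "documents")
for i, dist in enumerate(per_doc):
    print(i, dist)
```

Under that assumption, the large number would be a token total, and the per-document view would come from whatever per-document topic proportions the library exposes on each document.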
