CTM: Topic Count Impossibly Large #202

Open
tau-241 opened this issue May 11, 2023 · 0 comments
tau-241 commented May 11, 2023

I used a correlated topic model on a 4,500-document corpus to learn the type and frequency of topics. The results were very good, but one of the topics (#14) has an impossible count, more than double the number of documents:
image

This library is easy to use and very fast/performant and I feel lucky to have found it, but I can't use the results when a known-to-be-common topic has an impossible count.

I tried HDPModel and got a similar result, where one topic (#6) had a count of almost 4x the number of documents:
image

What caused the large counts? Did I make a mistake? Is there a way for me to get the topic distributions for each individual document?

Thank you!
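
One possible reading, stated as an assumption and not a confirmed diagnosis: the reported per-topic counts may be tallies of word tokens assigned to each topic, not tallies of documents. If so, a count larger than the number of documents is expected, because a single document contributes many tokens. The sketch below uses plain Python and hypothetical toy data (`docs_topic_assignments` and all function names are invented for illustration, not taken from any library) to show how the two kinds of count differ and how a per-document topic distribution can be derived from token assignments.

```python
from collections import Counter

# Hypothetical toy data: each inner list holds the topic id assigned to
# each word token of one document (3 documents, 3 topics).
docs_topic_assignments = [
    [0, 0, 1, 0],
    [0, 2, 0],
    [1, 1, 0, 0, 0],
]
num_topics = 3


def token_counts_by_topic(docs, k):
    """Count word tokens per topic across the whole corpus."""
    counts = [0] * k
    for doc in docs:
        for topic in doc:
            counts[topic] += 1
    return counts


def topic_distribution(doc, k):
    """Empirical topic proportions for one document (no prior smoothing)."""
    if not doc:
        return [0.0] * k
    counts = Counter(doc)
    return [counts.get(topic, 0) / len(doc) for topic in range(k)]


def dominant_topic_doc_counts(docs, k):
    """Count documents per topic, crediting each document to its top topic."""
    counts = [0] * k
    for doc in docs:
        dist = topic_distribution(doc, k)
        counts[dist.index(max(dist))] += 1
    return counts


print(token_counts_by_topic(docs_topic_assignments, num_topics))
print(dominant_topic_doc_counts(docs_topic_assignments, num_topics))
print(topic_distribution(docs_topic_assignments[0], num_topics))
```

In this toy corpus topic 0 receives 8 token assignments even though there are only 3 documents, while the document-level tally can never exceed 3. A fitted model would normally report smoothed distributions that also reflect its priors, so the unsmoothed proportions here are only an approximation of what a real per-document topic distribution looks like.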
