
Coherence crashing for 50-topic LDA model / 40k+ long documents (~20M total tokens) #191

Open
Dijkie85 opened this issue Dec 6, 2022 · 0 comments
Labels: bug (Something isn't working)


Dijkie85 commented Dec 6, 2022

Computing c_v coherence for a 50-topic LDA model trained on 40k long documents (around 20M tokens in total) runs for about 15 minutes and then crashes the kernel. Computing the same score with gensim (via the great snippet provided in another issue) works fine and takes about 2.5 minutes.
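For context on why c_v is costly at this scale, here is a minimal pure-Python sketch of the c_v measure as it is commonly defined: boolean sliding windows (110 tokens, the usual c_v default), NPMI context vectors, and the cosine similarity of each top word's vector against the summed topic vector. This is neither tomotopy's implementation nor the gensim snippet mentioned above; `sliding_windows` and `cv_coherence` are hypothetical names. The counting pass visits roughly one window per token position, so a ~20M-token corpus produces on the order of millions of windows, each checked against a topic's top words.

```python
import math
from typing import Dict, FrozenSet, Iterable, Iterator, List, Sequence, Set


def sliding_windows(docs: Iterable[Sequence[str]], window_size: int) -> Iterator[Set[str]]:
    """Yield each boolean sliding window as a set of tokens.

    Documents shorter than the window count as a single window.
    """
    for doc in docs:
        if len(doc) <= window_size:
            yield set(doc)
        else:
            for start in range(len(doc) - window_size + 1):
                yield set(doc[start:start + window_size])


def cv_coherence(top_words: Sequence[str], docs: Iterable[Sequence[str]],
                 window_size: int = 110, eps: float = 1e-12) -> float:
    """Hypothetical c_v sketch (assumption: window 110, gamma = 1, one-set segmentation)."""
    words = list(dict.fromkeys(top_words))  # de-duplicate, keep order
    single: Dict[str, int] = {w: 0 for w in words}
    joint: Dict[FrozenSet[str], int] = {}
    n_windows = 0

    # Counting pass: one window per token position, the expensive part on a large corpus.
    for window in sliding_windows(docs, window_size):
        n_windows += 1
        present = [w for w in words if w in window]
        for i, a in enumerate(present):
            single[a] += 1
            for b in present[i + 1:]:
                key = frozenset((a, b))
                joint[key] = joint.get(key, 0) + 1

    if not words or n_windows == 0:
        return 0.0

    def npmi(a: str, b: str) -> float:
        p_a, p_b = single[a] / n_windows, single[b] / n_windows
        if p_a == 0 or p_b == 0:
            return 0.0  # word never seen in any window
        p_ab = p_a if a == b else joint.get(frozenset((a, b)), 0) / n_windows
        denom = -math.log(p_ab + eps)
        if denom <= 0:
            return 1.0  # convention: words present in every window are fully associated
        return math.log((p_ab + eps) / (p_a * p_b)) / denom

    # Context vector of each word: its NPMI with every top word.
    vectors = {w: [npmi(w, v) for v in words] for w in words}
    topic_vec = [sum(col) for col in zip(*vectors.values())]

    def cosine(x: List[float], y: List[float]) -> float:
        nx = math.sqrt(sum(a * a for a in x))
        ny = math.sqrt(sum(b * b for b in y))
        return 0.0 if nx == 0 or ny == 0 else sum(a * b for a, b in zip(x, y)) / (nx * ny)

    # c_v score: mean cosine similarity of each word's vector with the topic vector.
    return sum(cosine(vectors[w], topic_vec) for w in words) / len(words)
```

For simplicity the sketch recounts windows for every topic; a practical implementation would collect the top words of all 50 topics and count them in a single pass, which is where memory for the co-occurrence table starts to matter.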

I'm running the following code on tomotopy 0.12.3 / Python 3.10.8, adapted from the examples repo:

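```python
from tomotopy.coherence import Coherence

# lda_model_50k: the trained 50-topic tomotopy LDA model (defined earlier in the notebook)
coh_model = Coherence(lda_model_50k, coherence='c_v')
average_coherence = coh_model.get_score()  # average c_v over all topics
print(average_coherence)
```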

Any thoughts?

bab2min added the bug label on Jan 22, 2023