Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add_doc() method very slow #199

Open
erima2020 opened this issue Mar 10, 2023 · 0 comments
Open

add_doc() method very slow #199

erima2020 opened this issue Mar 10, 2023 · 0 comments

Comments

@erima2020
Copy link

erima2020 commented Mar 10, 2023

Hello,
Since I wrote the original message, I could find the issue (I had two columns in my data file, which slowed things down). Now I have the same issue of slowness, but less slow: in 5 minutes, I can import around 100,000 documents of several millions (currently trying with a subsample). My data file is simply one text per line. Can things occur faster?
(I'm on a i7 Mac here, 32GB Ram, 1TB SDD).

I also have a question regarding model evaluation with different k1 and k2 values with Pachinko TM: can this be achieved (e.g., like optimizing coherence with LDA)

Thank you in advance for your prompt answer!

Cheers,
Eric

Original message:
Hello,
I tried to use tomotopy with a few thousand tweets and it went quick on google colab. Now I am trying with several millions on a local machine, and the add_doc() method seems to be a bottleneck. In five minutes, it added around 50 documents. Is this a known issue? Do I need to configure something to get it run quicker?
Best wishes,
Eric

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant