
Loading a fairly large model takes a long time #180

Open
erip opened this issue Aug 1, 2022 · 3 comments
Labels
enhancement New feature or request question Further information is requested

Comments


erip commented Aug 1, 2022

I have a 6.5GB model with 100 topics, trained on 10M docs in the usual way. I'm trying to load the model and I'm finding that load times are incredibly high. For reference, I've been monitoring top, and my program has only loaded ~5.1GB of the 6.5GB model after 10 minutes.

I suspect this is because I used default save with full=True... Should I expect a model with full=False to load faster?

@erip erip changed the title Loading a fairly large model takes time Loading a fairly large model takes a long time Aug 1, 2022
@bab2min bab2min added the question Further information is requested label Aug 7, 2022
Owner

bab2min commented Aug 7, 2022

Hi @erip
Yes, a model saved with the full=True argument contains all the parameters related to training, so it can take a long time to reload. If you save the model with full=False, you can't continue training it, but it will load faster.

Author

erip commented Aug 7, 2022

Thanks very much, @bab2min! It seems like if the model is binarized it shouldn't take long to reload. I haven't looked at the details so sorry for the silly question, but does the model use numpy binarization under the hood? If so, it could be quick to deserialize even if full (though maybe I don't appreciate the complexity here).

Owner

bab2min commented Aug 8, 2022

@erip Actually, the package doesn't use numpy binarization for loading and saving; it uses custom serialization functions.
It's true that these functions carry a lot of backward-compatibility handling, so the process is somewhat inefficient.
I'll check whether it can be improved, or re-implement loading & saving, in the near future.
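For contrast, the near-instant deserialization erip alludes to comes from numpy's native .npy format, which supports memory-mapped loading. A minimal sketch (the array contents and file name are illustrative):

```python
import numpy as np

# Write a moderately sized parameter matrix in numpy's binary format.
arr = np.arange(1_000_000, dtype=np.float64).reshape(1000, 1000)
np.save("params.npy", arr)

# mmap_mode maps the file into memory lazily instead of reading it up front,
# so the call returns almost immediately regardless of file size; pages are
# faulted in only when the data is actually touched.
params = np.load("params.npy", mmap_mode="r")
print(params.shape)
```

This works because .npy stores a raw, fixed-layout array; a serializer that must handle versioned, heterogeneous structures (as a backward-compatible custom format does) can't map the file directly this way.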

@bab2min bab2min added the enhancement New feature or request label Aug 8, 2022