I have been using BERTopic for topic modelling and recently needed to update my existing BERTopic model with new documents. I want to push the updated model to the Hugging Face Hub, ensuring that it reflects the new number of documents and topics.
Despite following these steps, I still see the old number of training documents in the repository on the Hugging Face Hub. How can I ensure that the updated model reflects the new number of training documents and topics?
Any help or guidance on this would be greatly appreciated!
Reproduction
from bertopic import BERTopic

# Load your existing BERTopic model
topic_model = BERTopic.load("shantanudave/BERTopic_ArXiv", embedding_model="sentence-transformers/all-MiniLM-L6-v2")
new_topics, new_probs = topic_model.transform(lemmatized_docs, embeddings)
new_model_name = "BERTopic_v2"

# Save the updated model locally using safetensors
embedding_model = "sentence-transformers/all-MiniLM-L6-v2"
topic_model.save(new_model_name, serialization="safetensors", save_ctfidf=True, save_embedding_model=embedding_model)

from huggingface_hub import login

# Authenticate with Hugging Face
login(token="your_hugging_face_token")

# Push the updated model to Hugging Face Hub
topic_model.push_to_hf_hub(
    repo_id=f"shantanudave/{new_model_name}",
    serialization="safetensors",
    save_ctfidf=True,
    save_embedding_model=embedding_model
)
BERTopic Version
pip install -U bertopic
That's the thing, you didn't update the model. When you use .transform, you are merely predicting the topics of the documents that you passed to it. .transform, as in scikit-learn, is not meant to update the underlying model. Instead, if you want to update the model, I would advise using either online topic modeling or the .merge_models technique.
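For anyone landing here later, a minimal sketch of the .merge_models route might look like the following. It assumes new_docs is a list of strings large enough to fit a model on its own, that you have already authenticated with huggingface_hub.login, and that the default min_similarity of 0.7 suits your data. The function name update_and_push and its parameters are made up for illustration. save_ctfidf is left at its default because I am not certain the merged model carries a c-TF-IDF matrix, and you should check the resulting model card yourself to see which statistics it reports after merging.

```python
def update_and_push(base_model_id, new_docs, repo_id,
                    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
                    min_similarity=0.7):
    """Fit a model on new_docs, merge it into the base model, push the result.

    Hypothetical helper for illustration; not part of BERTopic itself.
    """
    # Imported here so the heavy dependency is only needed when the function runs.
    from bertopic import BERTopic

    # Load the existing model, as in the reproduction above.
    base_model = BERTopic.load(base_model_id, embedding_model=embedding_model)

    # Train a separate model on the new documents only.
    # Assumption: new_docs is a list of strings, big enough for clustering to work.
    new_model = BERTopic(embedding_model=embedding_model).fit(new_docs)

    # Topics from new_model that are not similar enough to any topic in
    # base_model are added to the merged model as new topics.
    merged_model = BERTopic.merge_models(
        [base_model, new_model],
        min_similarity=min_similarity,
    )

    # Push the merged model rather than the original topic_model.
    merged_model.push_to_hf_hub(
        repo_id=repo_id,
        serialization="safetensors",
        save_embedding_model=embedding_model,
    )
    return merged_model
```

You would call it with something like update_and_push("shantanudave/BERTopic_ArXiv", lemmatized_docs, "shantanudave/BERTopic_v2"). The key difference from the reproduction is that the object being pushed is the merged model, not the one that only ran .transform.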