Guided topic model with pre embedded seed_topic_list #2016

Open
1jamesthompson1 opened this issue May 28, 2024 · 1 comment

1jamesthompson1 commented May 28, 2024

This issue is about a problem similar to the one addressed in #2014. I can update and merge

I would like to run a guided topic model with an embedding model that is not supported by BERTopic. I would also like to test some hyperparameters without having to recompute the embeddings. To support both, I would like to be able to pass a pre-embedded seed_topic_list.

What I want to be able to do is something like this:

import numpy as np
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))["data"]

seed_topic_list = [["drug", "cancer", "drugs", "doctor"],
                   ["windows", "drive", "dos", "file"],
                   ["space", "launch", "orbit", "lunar"]]

# Placeholder for precomputed seed-topic embeddings: one 1024-dimensional vector per topic
embedded_seed_topic_list = np.random.rand(len(seed_topic_list), 1024)

topic_model = BERTopic(
    seed_topic_list=seed_topic_list,
    embedded_seed_topic_list=embedded_seed_topic_list,  # proposed argument (the feature requested here)
    verbose=True)

topics, probs = topic_model.fit_transform(docs)

As with #2014, I am happy to write up the simple change: adding another argument so that BERTopic can check whether the embeddings are already present before trying to embed the seed_topic_list.
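
To make the proposed check concrete, here is a minimal sketch of the "use the precomputed embeddings if present, otherwise embed" logic. It is not BERTopic code: `resolve_seed_embeddings`, `embed_fn`, and `fake_embed` are hypothetical names used only for illustration, and joining each topic's seed words into a single string before embedding is an assumption about how the fallback path might work.

```python
from typing import Callable, List, Optional, Sequence


def resolve_seed_embeddings(
    seed_topic_list: Sequence[Sequence[str]],
    embed_fn: Callable[[List[str]], Sequence[Sequence[float]]],
    embedded_seed_topic_list: Optional[Sequence[Sequence[float]]] = None,
):
    """Hypothetical helper: return one embedding per seed topic.

    Reuses precomputed embeddings when they are supplied and only calls
    the embedding function when they are not.
    """
    if embedded_seed_topic_list is not None:
        # Precomputed path: validate the count, never call the embedding model.
        if len(embedded_seed_topic_list) != len(seed_topic_list):
            raise ValueError(
                "Expected one embedding per seed topic: got "
                f"{len(embedded_seed_topic_list)} embeddings for "
                f"{len(seed_topic_list)} topics."
            )
        return embedded_seed_topic_list

    # Fallback path (assumption): embed each topic as one space-joined string.
    seed_docs = [" ".join(words) for words in seed_topic_list]
    return embed_fn(seed_docs)


# Stand-in embedding function that records what it was asked to embed.
calls = []


def fake_embed(texts):
    calls.append(list(texts))
    return [[0.0, 0.0] for _ in texts]


seed_topic_list = [["drug", "cancer"], ["windows", "drive"]]
precomputed = [[0.1, 0.2], [0.3, 0.4]]

reused = resolve_seed_embeddings(seed_topic_list, fake_embed, precomputed)
computed = resolve_seed_embeddings(seed_topic_list, fake_embed)
```

With the precomputed vectors supplied, the embedding function is never called, which is what would allow hyperparameter runs to skip the embedding step for the seed topics.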

1jamesthompson1 changed the title from "Guided topic model with pre embeded seed_topic_list" to "Guided topic model with pre embedded seed_topic_list" May 28, 2024
@MaartenGr
Owner

A good request for which I have the same answer as #2014 since for me they touch upon the same underlying issue. I'm okay with keeping this issue open for others and continuing the discussion in the other issue.
