how does topicTuner help the parameter setting process #20
The short version is that you use the search functions to test different parameters and then evaluate the resulting clustering. Typically I'm looking for a number of clusters that makes sense for the corpus I'm working with. From there you can tune further to find the fewest outliers. There will always be questions about whether reducing outliers has a material (negative) effect on cluster formation. However, it seems pretty clear that optimizing for the fewest outliers is the way to go. Once you have parameters that work for you, you can generate a BERTopic model.
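As a rough illustration of that workflow (not TopicTuner's actual code), the sketch below tries every combination of `min_cluster_size` and `min_samples`, counts clusters and outliers for each, and keeps the setting with the fewest outliers among those whose cluster count falls in a range you consider sensible. Everything here is hypothetical: `toy_cluster` is a simple 1-D stand-in for HDBSCAN, and `summarize`, `grid_search` and `pick_best` are names made up for this example.

```python
from itertools import product


def toy_cluster(points, min_cluster_size, min_samples, radius=1.0):
    """Toy 1-D stand-in for HDBSCAN (NOT the real algorithm).

    Returns one label per point; -1 marks an outlier.
    """
    order = sorted(range(len(points)), key=lambda i: points[i])
    # A point is "dense" if at least min_samples points (itself included)
    # lie within `radius` of it.
    dense = [
        i for i in order
        if sum(abs(points[i] - p) <= radius for p in points) >= min_samples
    ]
    # Chain neighbouring dense points into groups, splitting at large gaps.
    groups, current = [], []
    for i in dense:
        if current and points[i] - points[current[-1]] > radius:
            groups.append(current)
            current = []
        current.append(i)
    if current:
        groups.append(current)
    # Groups smaller than min_cluster_size stay labelled as outliers.
    labels = [-1] * len(points)
    next_label = 0
    for group in groups:
        if len(group) >= min_cluster_size:
            for i in group:
                labels[i] = next_label
            next_label += 1
    return labels


def summarize(labels):
    """Return (number of clusters, number of outliers) for a label list."""
    n_clusters = len(set(labels) - {-1})
    n_outliers = sum(1 for label in labels if label == -1)
    return n_clusters, n_outliers


def grid_search(points, cluster_sizes, sample_sizes):
    """Cluster once per parameter pair and record the summary of each run."""
    results = []
    for mcs, ms in product(cluster_sizes, sample_sizes):
        n_clusters, n_outliers = summarize(toy_cluster(points, mcs, ms))
        results.append({
            "min_cluster_size": mcs,
            "min_samples": ms,
            "n_clusters": n_clusters,
            "n_outliers": n_outliers,
        })
    return results


def pick_best(results, min_clusters, max_clusters):
    """Fewest outliers among runs with an acceptable number of clusters."""
    candidates = [
        r for r in results if min_clusters <= r["n_clusters"] <= max_clusters
    ]
    return min(candidates, key=lambda r: r["n_outliers"]) if candidates else None


points = [0.0, 0.2, 0.4, 0.6, 5.0, 5.1, 5.3, 5.5, 5.6, 9.0, 9.3, 20.0]
results = grid_search(points, cluster_sizes=[2, 5], sample_sizes=[1, 3])
for row in results:
    print(row)
print("best:", pick_best(results, min_clusters=2, max_clusters=3))
```

The acceptable cluster range is the judgment call: it depends on the corpus, so the sketch takes it as an input rather than trying to guess it.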
I see, so grid search is a tool for trying all the parameter settings, and you still need to evaluate which setting is best. Is the evaluation done through something like visualization, or by manual checking? (I am not sure.) I also have a question: how did you do the grid search according to your number of documents? (I didn't find a document count in these functions.) And how did you decide which parameter setting is the "best one"? Thank you.
There are different searches which balance off the "depth" and "width" of the search. I suggest you take a minute to go through the API documentation and run through the provided notebook to get a more thorough understanding of the tools and how I envisioned them being used.
What evaluation metrics did you use here?
@drob-xx I checked your code; very impressive work. I have a question. I think you used grid search to try different settings of min_cluster_size and min_samples and ran some experiments. I also checked BaseHDBSCANTuner and the gridSearch, pseudoGridSearch and randomSearch functions. But I still have questions about how this "grid search", or more exactly these functions, helps with setting the parameters.