
Memory issue with optimizer #122

Open
adrnmt95 opened this issue May 11, 2024 · 0 comments
  • OCTIS version: 1.13.1
  • Python version: 3.10.12 (Google Colab)

Description

I'm trying to optimize the hyperparameters of a ProdLDA model. The training, test, and validation sets together contain roughly 350k documents (1/3 of the full dataset), and the vocabulary size is 35k (stemmed unigrams and bigrams). When I run the optimizer, it uses roughly 25 GB of RAM during the first call. On the second call, usage spikes to more than 50 GB and the kernel crashes. Is there something I could do to solve this without reducing the number of samples or the vocabulary size?
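To confirm where the Python-heap growth happens between calls, here is a minimal sketch (not from the OCTIS API; `measure_peak` is a hypothetical helper) that wraps any callable with the standard-library `tracemalloc` tracer and reports the peak traced allocation:

```python
import tracemalloc

def measure_peak(fn, *args, **kwargs):
    """Run fn and return (result, peak Python-heap bytes allocated).

    Note: tracemalloc only sees allocations made through Python's
    allocator, so native buffers (e.g. PyTorch tensors) may not be
    fully counted -- treat the number as a lower bound.
    """
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

# Hypothetical usage around a single optimization run:
# results, peak = measure_peak(optimizer.optimize, model, dataset, npmi,
#                              search_space, number_of_call=1)
# print(f"peak heap: {peak / 1e9:.1f} GB")
```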

What I Did

from octis.dataset.dataset import Dataset
from octis.models.ProdLDA import ProdLDA
from octis.optimization.optimizer import Optimizer
from skopt.space.space import Real, Categorical, Integer
from octis.evaluation_metrics.diversity_metrics import TopicDiversity
from octis.evaluation_metrics.coherence_metrics import Coherence

dataset = Dataset()
dataset.load_custom_dataset_from_folder('topic_model_dataset')

# Define the ProdLDA model and the search space for optimization
model = ProdLDA()

search_space = {
    "num_topics": Integer(low=25, high=40),
    "dropout": Real(low=0.0, high=0.5),
    "learning_rate": Real(low=0.001, high=0.1),
    "num_epochs": Categorical([50, 100]),
    "num_layers": Categorical([1, 2, 3]),
    "num_neurons": Categorical([100, 200, 300])
}

# Set up the optimizer
optimizer = Optimizer()

# Define evaluation metrics
topic_diversity = TopicDiversity(topk=10)
npmi = Coherence(texts=dataset.get_corpus())

# Run the optimization
optimization_results = optimizer.optimize(
    model,
    dataset,
    npmi,
    search_space,
    model_runs=1,
    number_of_call=20,
    optimization_type='Maximize',
    save_models=False,
    early_stop=True,
    extra_metrics=[topic_diversity]
)
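One thing I may try as a workaround (this is a hypothesis, not something from the OCTIS docs: the assumption is that large intermediate objects from one call stay reachable into the next) is forcing a garbage-collection pass between calls. A minimal sketch with a hypothetical wrapper:

```python
import gc

def run_and_release(fn, *args, **kwargs):
    """Call fn, then force a garbage-collection pass.

    Hypothetical helper: if each optimization call leaves large
    intermediates (document-term matrices, model weights) alive in
    reference cycles, an explicit gc.collect() between calls can
    return that memory to the allocator sooner.
    """
    result = fn(*args, **kwargs)
    gc.collect()
    return result
```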
