
[RLlib] Evaluation duration doesn't match the reported number of evaluation episodes #46412

Open
n30111 opened this issue Jul 3, 2024 · 2 comments
Labels
bug: Something that is supposed to be working, but isn't
rllib: RLlib related issues
rllib-evaluation: Bug affecting policy evaluation with RLlib
rllib-logging: This problem is related to logging metrics
rllib-oldstack-cleanup: Issues related to cleaning up classes and utilities on the old API stack

Comments

n30111 commented Jul 3, 2024

What happened + What you expected to happen

When using evaluation_num_env_runners > 1 for RLlib evaluation, the reported results["evaluation"]["env_runners"]["num_episodes"] is not equal to the evaluation_duration set in the configuration.

Versions / Dependencies

Ray 2.31
Python 3.11
Linux

Reproduction script

from ray.tune.tuner import Tuner
from ray import tune, train

stopping_criteria = {"training_iteration": 2}
param_space = {
    "env": "LunarLanderContinuous-v2",
    "kl_coeff": 1.0,
    "num_workers": 0,
    "num_cpus": 0.5,  # number of CPUs to use per trial
    "num_gpus": 0,  # number of GPUs to use per trial
    "lambda": 0.95,
    "clip_param": 0.2,
    "lr": 1e-4,
    "evaluation_interval": 1,
    "evaluation_duration": 6,
    "evaluation_num_env_runners": 1,
}

tuner = Tuner(
    "PPO",
    tune_config=tune.TuneConfig(
        metric="env_runners/episode_reward_mean",
        mode="max",
        num_samples=1,
    ),
    param_space=param_space,
    run_config=train.RunConfig(stop=stopping_criteria),
)

result_grid = tuner.fit()
res = result_grid._experiment_analysis  # pylint: disable=protected-access
print(res.trials[0].last_result["evaluation"]["env_runners"]["num_episodes"])
assert param_space["evaluation_duration"] == res.trials[0].last_result["evaluation"]["env_runners"]["num_episodes"]

Issue Severity

Medium: It is a significant difficulty but I can work around it.
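One plausible explanation for the mismatch (an assumption, not confirmed anywhere in this thread): when evaluation_duration is spread across several evaluation runners, the per-runner episode count may be rounded up, so the reported total overshoots the configured value. A minimal sketch of that hypothetical arithmetic follows; expected_reported_episodes is an illustrative helper, not an RLlib API:

```python
import math

def expected_reported_episodes(evaluation_duration: int, num_runners: int) -> int:
    # Hypothetical model: each evaluation runner is asked to run
    # ceil(evaluation_duration / num_runners) episodes, so the reported
    # total is that per-runner count times the number of runners.
    per_runner = math.ceil(evaluation_duration / num_runners)
    return per_runner * num_runners

# 6 episodes spread over 4 runners: each runs 2, so 8 episodes get reported.
print(expected_reported_episodes(6, 4))  # → 8
# An even split needs no rounding, so the totals agree.
print(expected_reported_episodes(6, 3))  # → 6
```

Under this model the overshoot disappears whenever evaluation_duration is a multiple of the number of evaluation runners.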

simonsays1980 self-assigned this Jul 9, 2024
simonsays1980 (Collaborator) commented:
@n30111 Thanks for raising this issue. I could reproduce it; the problem is somewhere in the old stack. I can run the example without errors using the new API stack:

from ray.tune.tuner import Tuner
from ray import tune, train

stopping_criteria = {"training_iteration": 2}
param_space = {
    "env": "LunarLander-v2",
    "env_config": {"continuous": True},
    "enable_rl_module_and_learner": True,
    "enable_env_runner_and_connector_v2": True,
    "kl_coeff": 1.0,
    "num_workers": 0,
    "num_cpus": 0.5,  # number of CPUs to use per trial
    "num_gpus": 0,  # number of GPUs to use per trial
    "lambda": 0.95,
    "clip_param": 0.2,
    "lr": 1e-4,
    "evaluation_interval": 1,
    "evaluation_duration": 6,
    "evaluation_num_env_runners": 1,
}

tuner = Tuner(
    "PPO",
    tune_config=tune.TuneConfig(
        metric="env_runners/episode_return_mean",
        mode="max",
        num_samples=1,
    ),
    param_space=param_space,
    run_config=train.RunConfig(stop=stopping_criteria),
)

result_grid = tuner.fit()
res = result_grid._experiment_analysis  # pylint: disable=protected-access
print(res.trials[0].last_result["evaluation"]["env_runners"]["num_episodes"])
assert param_space["evaluation_duration"] == res.trials[0].last_result["evaluation"]["env_runners"]["num_episodes"]

Maybe this is an alternative for you?


n30111 commented Jul 10, 2024

We are dependent on the old stack.
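For setups pinned to the old stack, one possible workaround sketch, assuming the mismatch indeed comes from per-runner rounding (adjust_evaluation_duration is an illustrative helper, not part of RLlib): pick an evaluation_duration that is a multiple of evaluation_num_env_runners, so the per-runner split involves no rounding.

```python
def adjust_evaluation_duration(requested: int, num_runners: int) -> int:
    # Round the requested episode count up to the nearest multiple of the
    # number of evaluation runners, so each runner gets a whole, equal share.
    remainder = requested % num_runners
    return requested if remainder == 0 else requested + num_runners - remainder

print(adjust_evaluation_duration(6, 4))  # → 8
print(adjust_evaluation_duration(6, 3))  # → 6
```

The adjusted value could then be passed as "evaluation_duration" in param_space; whether this removes the discrepancy on the old stack would need to be verified experimentally.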
