
[RLlib] Evaluation duration doesn't match the reported number of evaluation episodes #46412

Open
n30111 opened this issue Jul 3, 2024 · 2 comments
Labels
bug: Something that is supposed to be working, but isn't
rllib: RLlib related issues
rllib-evaluation: Bug affecting policy evaluation with RLlib
rllib-logging: This problem is related to logging metrics
rllib-oldstack-cleanup: Issues related to cleaning up classes and utilities on the old API stack

Comments

n30111 commented Jul 3, 2024

What happened + What you expected to happen

When using evaluation_num_env_runners > 1 for RLlib evaluation, the reported results["evaluation"]["env_runners"]["num_episodes"] is not equal to the evaluation_duration set in the configuration.

Versions / Dependencies

Ray 2.31
Python 3.11
Linux

Reproduction script

from ray.tune.tuner import Tuner
from ray import tune, train

stopping_criteria = {"training_iteration": 2}
param_space = {
    "env": "LunarLanderContinuous-v2",
    "kl_coeff": 1.0,
    "num_workers": 0,
    "num_cpus": 0.5,  # number of CPUs to use per trial
    "num_gpus": 0,  # number of GPUs to use per trial
    "lambda": 0.95,
    "clip_param": 0.2,
    "lr": 1e-4,
    "evaluation_interval": 1,
    "evaluation_duration": 6,
    "evaluation_num_env_runners": 1,
}

tuner = Tuner(
    "PPO",
    tune_config=tune.TuneConfig(
        metric="env_runners/episode_reward_mean",
        mode="max",
        num_samples=1,
    ),
    param_space=param_space,
    run_config=train.RunConfig(stop=stopping_criteria),
)

result_grid = tuner.fit()
res = result_grid._experiment_analysis  # pylint: disable=protected-access
print(res.trials[0].last_result["evaluation"]["env_runners"]["num_episodes"])
assert param_space["evaluation_duration"] == res.trials[0].last_result["evaluation"]["env_runners"]["num_episodes"]

Issue Severity

Medium: It is a significant difficulty but I can work around it.
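One plausible explanation for the mismatch (an assumption, not confirmed anywhere in this thread): when evaluation_duration is spread across several evaluation runners, the per-runner episode count may be rounded up, so the reported total overshoots the configured value. A minimal sketch of that hypothetical arithmetic follows; expected_reported_episodes is an illustrative helper, not an RLlib API:

```python
import math

def expected_reported_episodes(evaluation_duration: int, num_runners: int) -> int:
    # Hypothetical model: each evaluation runner is asked to run
    # ceil(evaluation_duration / num_runners) episodes, so the reported
    # total is that per-runner count times the number of runners.
    per_runner = math.ceil(evaluation_duration / num_runners)
    return per_runner * num_runners

# 6 episodes spread over 4 runners: each runs 2, so 8 episodes get reported.
print(expected_reported_episodes(6, 4))  # → 8
# An even split needs no rounding, so the totals agree.
print(expected_reported_episodes(6, 3))  # → 6
```

Under this model the overshoot disappears whenever evaluation_duration is a multiple of the number of evaluation runners.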

simonsays1980 self-assigned this Jul 9, 2024
simonsays1980 (Collaborator) commented:
@n30111 Thanks for raising this issue. I could reproduce it; the problem is somewhere in the old stack. I can run the example without errors using the new API stack:

from ray.tune.tuner import Tuner
from ray import tune, train

stopping_criteria = {"training_iteration": 2}
param_space = {
    "env": "LunarLander-v2",
    "env_config": {"continuous": True},
    "enable_rl_module_and_learner": True,
    "enable_env_runner_and_connector_v2": True,
    "kl_coeff": 1.0,
    "num_workers": 0,
    "num_cpus": 0.5,  # number of CPUs to use per trial
    "num_gpus": 0,  # number of GPUs to use per trial
    "lambda": 0.95,
    "clip_param": 0.2,
    "lr": 1e-4,
    "evaluation_interval": 1,
    "evaluation_duration": 6,
    "evaluation_num_env_runners": 1,
}

tuner = Tuner(
    "PPO",
    tune_config=tune.TuneConfig(
        metric="env_runners/episode_return_mean",
        mode="max",
        num_samples=1,
    ),
    param_space=param_space,
    run_config=train.RunConfig(stop=stopping_criteria),
)

result_grid = tuner.fit()
res = result_grid._experiment_analysis  # pylint: disable=protected-access
print(res.trials[0].last_result["evaluation"]["env_runners"]["num_episodes"])
assert param_space["evaluation_duration"] == res.trials[0].last_result["evaluation"]["env_runners"]["num_episodes"]

Maybe this is an alternative for you?


n30111 commented Jul 10, 2024

We are dependent on the old stack.
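For setups pinned to the old stack, one possible workaround sketch, assuming the mismatch indeed comes from per-runner rounding (adjust_evaluation_duration is an illustrative helper, not part of RLlib): pick an evaluation_duration that is a multiple of evaluation_num_env_runners, so the per-runner split involves no rounding.

```python
def adjust_evaluation_duration(requested: int, num_runners: int) -> int:
    # Round the requested episode count up to the nearest multiple of the
    # number of evaluation runners, so each runner gets a whole, equal share.
    remainder = requested % num_runners
    return requested if remainder == 0 else requested + num_runners - remainder

print(adjust_evaluation_duration(6, 4))  # → 8
print(adjust_evaluation_duration(6, 3))  # → 6
```

The adjusted value could then be passed as "evaluation_duration" in param_space; whether this removes the discrepancy on the old stack would need to be verified experimentally.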
