
[RLlib]: PPO agent training error: Invalid NaN values in Normal distribution parameters #46442

Open
InigoGastesi opened this issue Jul 5, 2024 · 1 comment
Assignees: simonsays1980
Labels: bug (Something that is supposed to be working; but isn't), P3 (Issue moderate in impact or severity), rllib (RLlib related issues), rllib-oldstack-cleanup (Issues related to cleaning up classes, utilities on the old API stack)

Comments

@InigoGastesi

What happened + What you expected to happen

Hello,

I am encountering an error while training a PPO agent using RLlib. During training, I receive the following error message:
File "/opt/conda/envs/prueba3.11/lib/python3.11/site-packages/ray/rllib/algorithms/ppo/ppo_torch_policy.py", line 85, in loss curr_action_dist = dist_class(logits, model) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/prueba3.11/lib/python3.11/site-packages/ray/rllib/models/torch/torch_action_dist.py", line 250, in __init__ self.dist = torch.distributions.normal.Normal(mean, torch.exp(log_std)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/conda/envs/prueba3.11/lib/python3.11/site-packages/torch/distributions/normal.py", line 56, in __init__ super().__init__(batch_shape, validate_args=validate_args) File "/opt/conda/envs/prueba3.11/lib/python3.11/site-packages/torch/distributions/distribution.py", line 68, in __init__ raise ValueError( ValueError: Expected parameter loc (Tensor of shape (128, 2)) of distribution Normal(loc: torch.Size([128, 2]), scale: torch.Size([128, 2])) to satisfy the constraint Real(), but found invalid values: tensor([[nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan], [nan, nan]], grad_fn=<SplitBackward0>)
I have checked all observations to ensure there are no NaN values, but the error persists. Can you please help me identify the cause of this issue and how to resolve it?
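
(For illustration only, a minimal sketch of one way to assert that an environment never emits NaN observations or rewards; the NaNGuard wrapper name is hypothetical, it assumes a Gymnasium-style env with a Box observation space, and it is not necessarily the exact check used here:)

import numpy as np
import gymnasium as gym

class NaNGuard(gym.Wrapper):
    """Hypothetical wrapper: raise as soon as an observation or reward contains NaN."""

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        assert not np.any(np.isnan(obs)), "NaN in reset observation"
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        assert not np.any(np.isnan(obs)), "NaN in step observation"
        assert not np.isnan(reward), "NaN in reward"
        return obs, reward, terminated, truncated, info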

Thank you for your assistance.

Versions / Dependencies

ray: 2.31
Python: 3.11
torch: 2.3.1

Reproduction script

from ray.tune import Tuner
from ray.rllib.algorithms.ppo import PPO

tuner = Tuner(
    trainable=PPO,
    param_space=...,
    run_config=...,
)
tuner.fit()
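
(For completeness, a hypothetical sketch of what the elided param_space and run_config could look like; "Pendulum-v1", the stopping criterion, and all other values are placeholders, not the settings actually used:)

from ray import train, tune
from ray.rllib.algorithms.ppo import PPO, PPOConfig

# Placeholder config; the real environment and hyperparameters are not shown in the issue.
config = PPOConfig().environment("Pendulum-v1")

tuner = tune.Tuner(
    trainable=PPO,
    param_space=config,
    run_config=train.RunConfig(stop={"training_iteration": 100}),
)
tuner.fit()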

Issue Severity

Low: It annoys or frustrates me.

@InigoGastesi InigoGastesi added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jul 5, 2024
@anyscalesam anyscalesam added the rllib RLlib related issues label Jul 8, 2024
@simonsays1980
Collaborator

@InigoGastesi Thanks for raising this issue. I suspect this behavior is not an error but a consequence of the training process itself. Could you check whether your KL divergence is very high (see the sketch below for where the metric is reported)? My guess is that the logits turn NaN because of overly large gradients, which lead to numerical problems.
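
(A minimal sketch of where the KL metric shows up when training manually on the old API stack; the exact result keys can differ between Ray versions and API stacks, and "Pendulum-v1" is just a placeholder environment:)

from ray.rllib.algorithms.ppo import PPOConfig

# Hypothetical setup; replace the env and settings with your own.
algo = PPOConfig().environment("Pendulum-v1").build()

for i in range(10):
    result = algo.train()
    # On the old API stack, PPO's learner stats (incl. the mean KL) are reported here:
    stats = result["info"]["learner"]["default_policy"]["learner_stats"]
    print(i, "kl =", stats.get("kl"), "cur_kl_coeff =", stats.get("cur_kl_coeff"))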

You could try increasing the kl_coeff and decreasing the learning rate, as sketched below. If the behavior persists, we will need a reproducible example to analyse the code.
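
(A minimal sketch of those two changes, plus optional gradient clipping, assuming a PPOConfig-based setup; the concrete numbers are only illustrative starting points:)

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("Pendulum-v1")  # placeholder env
    .training(
        kl_coeff=1.0,    # stronger KL penalty than the 0.2 default
        lr=1e-5,         # lower learning rate than the 5e-5 default
        grad_clip=40.0,  # optionally clip gradients to avoid extreme updates
    )
)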

@simonsays1980 simonsays1980 self-assigned this Jul 9, 2024
@simonsays1980 simonsays1980 added P3 Issue moderate in impact or severity rllib-oldstack-cleanup Issues related to cleaning up classes, utilities on the old API stack and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jul 9, 2024