
Skip certain step during training #20045

Closed
real-junjiezhang opened this issue Jul 4, 2024 · 3 comments
Labels
question Further information is requested ver: 2.2.x

Comments

@real-junjiezhang

Bug description

I want to skip some batch steps during training. How can I write the code? Any suggestions would be appreciated. Thanks in advance.

ChatGPT suggested the answer below:

    def on_train_batch_start(self, batch, batch_idx):  # 2.x LightningModule hook takes no dataloader_idx
        # Define global steps to skip
        steps_to_skip = {199, 302, 493, 1283}
        if self.trainer.global_step in steps_to_skip:
            return -1  # Skip training this step

Is this correct?

Thanks!

What version are you seeing the problem on?

master

How to reproduce the bug

No response

Error messages and logs

# Error messages and logs here please

Environment

Current environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):

More info

No response

@real-junjiezhang real-junjiezhang added bug Something isn't working needs triage Waiting to be triaged by maintainers labels Jul 4, 2024
@awaelchli awaelchli added question Further information is requested and removed bug Something isn't working needs triage Waiting to be triaged by maintainers labels Jul 4, 2024
@awaelchli
Member

@real-junjiezhang It's possible to skip a step by returning None from the training step. Would this answer your question?
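A minimal sketch of that pattern, kept framework-free so it runs on its own; the Lightning-specific part is shown as a comment, and the step numbers are the hypothetical ones from the question:

```python
# Sketch: in a LightningModule, returning None from training_step tells
# Lightning to skip backward/optimizer for that batch (single-device only).

STEPS_TO_SKIP = {199, 302, 493, 1283}  # hypothetical problem steps


def should_skip(global_step: int) -> bool:
    """Decide whether the current global step should be skipped."""
    return global_step in STEPS_TO_SKIP


# Inside a LightningModule it would look roughly like:
#
#     def training_step(self, batch, batch_idx):
#         if should_skip(self.trainer.global_step):
#             return None  # skip this batch
#         return self.compute_loss(batch)

print(should_skip(199))  # True
print(should_skip(200))  # False
```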

@real-junjiezhang
Copy link
Author

real-junjiezhang commented Jul 5, 2024

@real-junjiezhang It's possible to skip a step by returning None from the training step. Would this answer your question?

Thanks for your quick reply! What if I am currently using DDP to train the model? Can I write it like this?

    def on_train_batch_start(self, batch, batch_idx):  # 2.x LightningModule hook takes no dataloader_idx
        # Define global steps to skip
        steps_to_skip = {199, 302, 493, 1283}
        if self.trainer.global_step in steps_to_skip:
            return -1  # Skip training this step

Thanks!

@awaelchli
Member

No, for DDP this is not supported, because the processes could fall out of sync (each rank would see different data and could decide to skip at different times).
In this case you need to think about how to rewrite your logic so that it doesn't have to be skipped. For example, if you have problematic data points, exclude them from your dataset beforehand.
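A sketch of the "exclude them beforehand" approach, with hypothetical bad indices; it is kept framework-free, but with torch the same idea is `torch.utils.data.Subset(dataset, keep)`:

```python
# Sketch: drop known-bad samples before training instead of skipping steps,
# so every DDP rank sees the same (clean) data and stays in sync.

BAD_INDICES = {2, 5}  # hypothetical indices of problematic samples


def keep_indices(n_samples: int, bad: set) -> list:
    """Indices of samples to keep after excluding the bad ones."""
    return [i for i in range(n_samples) if i not in bad]


data = list(range(8))  # stand-in for a dataset of 8 samples
clean = [data[i] for i in keep_indices(len(data), BAD_INDICES)]
print(clean)  # [0, 1, 3, 4, 6, 7]
```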
