Saving Early Stopping Patience Value in last.pt Checkpoint #13173

Open
1 task done
mabubakarsaleem opened this issue Jul 7, 2024 · 1 comment
Labels
question Further information is requested

Comments

@mabubakarsaleem

Search before asking

Question

Hello,

I have a question regarding the checkpointing mechanism in YOLOv5, specifically related to saving and resuming the training process.
When training a YOLOv5 model, the last.pt checkpoint saves the model's weights and optimizer state. However, it appears that training process parameters, such as the early stopping patience value, are not included in this checkpoint.
If my training is interrupted and I restart from the last.pt checkpoint, does the patience value reset to zero, or does it continue from the previously recorded value?

Additional

No response

@mabubakarsaleem mabubakarsaleem added the question Further information is requested label Jul 7, 2024
@glenn-jocher
Member

@mabubakarsaleem hello,

Thank you for your question and for thoroughly searching the issues and discussions beforehand!

Currently, the last.pt checkpoint in YOLOv5 saves the model's weights and optimizer state but does not include the early stopping state, that is, how many epochs have passed without improvement. Therefore, if your training is interrupted and you restart from the last.pt checkpoint, early stopping starts over from its initial state and the count of epochs without improvement does not continue from the previously recorded value.

To maintain the early stopping patience value across training sessions, you can manually track this parameter and adjust it when resuming training. Here's a simple way to do this:

  1. Save the Patience Value: Before interrupting the training, save the current patience value to a file.
  2. Load the Patience Value: When resuming training, read the saved patience value and set it accordingly.

Here's a code snippet to illustrate this:

# Save patience value before interrupting training
patience_value = early_stopping.patience
with open('patience_value.txt', 'w') as f:
    f.write(str(patience_value))

# Load patience value when resuming training
with open('patience_value.txt', 'r') as f:
    patience_value = int(f.read())
early_stopping.patience = patience_value
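
If what you want to carry over is the running count of epochs without improvement, and not only the configured patience, the same save-and-load idea can be applied to the whole early-stopping state. The following is a minimal sketch under stated assumptions: `EarlyStopper`, `save_stopper`, `load_stopper`, and the JSON file are hypothetical names used for illustration and are not part of YOLOv5, so you would need to adapt them to the early-stopping object your training script actually uses.

```python
import json
from pathlib import Path


class EarlyStopper:
    """Hypothetical stand-in for an early-stopping helper (not YOLOv5's own class)."""

    def __init__(self, patience=30):
        self.patience = patience  # epochs to wait without improvement before stopping
        self.best_fitness = 0.0   # best fitness value seen so far
        self.best_epoch = 0       # epoch at which best_fitness was observed

    def __call__(self, epoch, fitness):
        """Record this epoch's fitness and return True when training should stop."""
        if fitness >= self.best_fitness:
            self.best_fitness = fitness
            self.best_epoch = epoch
        # Stop once `patience` epochs have passed since the best epoch
        return (epoch - self.best_epoch) >= self.patience

    def state_dict(self):
        """Return everything needed to continue counting after a restart."""
        return {
            "patience": self.patience,
            "best_fitness": self.best_fitness,
            "best_epoch": self.best_epoch,
        }

    def load_state_dict(self, state):
        """Restore the values produced by state_dict()."""
        self.patience = state["patience"]
        self.best_fitness = state["best_fitness"]
        self.best_epoch = state["best_epoch"]


def save_stopper(stopper, path):
    """Write the stopper state to a JSON file (call this at the end of each epoch)."""
    Path(path).write_text(json.dumps(stopper.state_dict()))


def load_stopper(path, default_patience=30):
    """Build a stopper, restoring saved state if the file exists."""
    stopper = EarlyStopper(patience=default_patience)
    state_file = Path(path)
    if state_file.exists():
        stopper.load_state_dict(json.loads(state_file.read_text()))
    return stopper
```

With this sketch you would call `save_stopper(stopper, 'early_stopping.json')` after every epoch, next to the checkpoint, and `load_stopper('early_stopping.json')` when resuming. Saving per epoch, instead of only "before interrupting", means the state also survives an unexpected crash.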

Additionally, I encourage you to verify that you are using the latest versions of torch and the YOLOv5 repository to ensure you have the most up-to-date features and bug fixes. You can update YOLOv5 with the following commands:

git pull  # update YOLOv5
pip install -U torch  # update PyTorch

If you have any further questions or need additional assistance, feel free to ask. The YOLO community and the Ultralytics team are here to help!
