Saving Early Stopping Patience Value in last.pt Checkpoint #13173

mabubakarsaleem · 2024-07-07T13:31:17Z

Search before asking

I have searched the YOLOv5 issues and discussions and found no similar questions.

Question

Hello,

I have a question regarding the checkpointing mechanism in YOLOv5, specifically related to saving and resuming the training process.
When training a YOLOv5 model, the last.pt checkpoint saves the model's weights and optimizer state. However, it appears that training process parameters, such as the early stopping patience value, are not included in this checkpoint.
If my training is interrupted and I restart from the last.pt checkpoint, does the patience value reset to zero, or does it continue from the previously recorded value?

Additional

No response

glenn-jocher · 2024-07-08T12:47:43Z

@mabubakarsaleem hello,

Thank you for your question and for thoroughly searching the issues and discussions beforehand!

Currently, the last.pt checkpoint in YOLOv5 saves the model's weights and optimizer state but does not include training process parameters such as the early stopping patience value. As a result, if training is interrupted and you resume from last.pt, early stopping starts again from its initial state instead of continuing from the previously recorded value.
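If you want to confirm what your own last.pt contains, a minimal sketch like the following lists the top-level entries of a checkpoint. It assumes the file is a pickled Python dict, that a recent PyTorch version is installed (the `weights_only` argument is not available in older releases), and that the script runs from the YOLOv5 repository root so the pickled model classes can be imported. The helper names `summarize_checkpoint` and `load_checkpoint` are illustrative and not part of YOLOv5.

```python
def summarize_checkpoint(ckpt):
    """Return {top-level key: type name of its value} for a checkpoint dict."""
    return {key: type(value).__name__ for key, value in ckpt.items()}


def load_checkpoint(path):
    """Load a checkpoint file onto the CPU (requires PyTorch)."""
    import torch  # imported here so summarize_checkpoint works without PyTorch

    # weights_only=False because the file is assumed to hold pickled Python
    # objects and not just tensors; only do this for files you trust.
    return torch.load(path, map_location="cpu", weights_only=False)


# Example usage (the path is an assumption; point it at your own run):
#   ckpt = load_checkpoint("runs/train/exp/weights/last.pt")
#   for key, type_name in summarize_checkpoint(ckpt).items():
#       print(f"{key}: {type_name}")
```

If no early stopping entry shows up in the listing, that state is not being carried across a restart.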

To maintain the early stopping patience value across training sessions, you can manually track this parameter and adjust it when resuming training. Here's a simple way to do this:

1. Save the patience value: before interrupting the training, save the current patience value to a file.
2. Load the patience value: when resuming training, read the saved patience value and set it accordingly.

Here's a code snippet to illustrate this:

# `early_stopping` is assumed to be the early stopping object in your training script.

# Save the patience value before interrupting training
with open('patience_value.txt', 'w') as f:
    f.write(str(early_stopping.patience))

# Load the patience value when resuming training
with open('patience_value.txt') as f:
    early_stopping.patience = int(f.read().strip())
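Note that a `patience` attribute is typically the fixed threshold, not the running count of epochs without improvement. If the goal is for that count to survive a restart, a sketch along these lines may help. `SimpleStopper`, `save_stopper`, `load_stopper`, and the `stopper_state.json` file name are hypothetical and not part of YOLOv5; the class only assumes a stopper that remembers the best fitness and the epoch at which it occurred.

```python
import json
from pathlib import Path


class SimpleStopper:
    """Hypothetical stand-in for an early stopper; not YOLOv5's own class."""

    def __init__(self, patience=30):
        self.patience = patience   # epochs to wait without improvement
        self.best_fitness = 0.0    # best metric seen so far
        self.best_epoch = 0        # epoch at which the best metric was seen

    def __call__(self, epoch, fitness):
        """Record the result of one epoch and return True if training should stop."""
        if fitness >= self.best_fitness:
            self.best_epoch, self.best_fitness = epoch, fitness
        return (epoch - self.best_epoch) >= self.patience


def save_stopper(stopper, path):
    """Write the stopper's threshold and progress to a JSON file."""
    state = {
        "patience": stopper.patience,
        "best_fitness": stopper.best_fitness,
        "best_epoch": stopper.best_epoch,
    }
    Path(path).write_text(json.dumps(state))


def load_stopper(path):
    """Rebuild a stopper from a JSON file written by save_stopper."""
    state = json.loads(Path(path).read_text())
    stopper = SimpleStopper(patience=state["patience"])
    stopper.best_fitness = state["best_fitness"]
    stopper.best_epoch = state["best_epoch"]
    return stopper
```

Calling `save_stopper(stopper, "stopper_state.json")` at the end of each epoch and `load_stopper("stopper_state.json")` on resume would let the count continue from where it left off, whereas a freshly constructed stopper treats the first resumed epoch as a new best.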

Additionally, I encourage you to verify that you are using the latest versions of torch and the YOLOv5 repository to ensure you have the most up-to-date features and bug fixes. You can update YOLOv5 with the following commands:

git pull  # update YOLOv5
pip install -U torch  # update PyTorch

If you have any further questions or need additional assistance, feel free to ask. The YOLO community and the Ultralytics team are here to help!

mabubakarsaleem added the question label (Further information is requested) on Jul 7, 2024
