trainer.test() with given checkpoint logs last epoch instead of checkpoint epoch #20052

Open
markussteindl opened this issue Jul 5, 2024 · 1 comment
Labels
bug Something isn't working help wanted Open to be worked on repro needed The issue is missing a reproducible example

Comments

markussteindl commented Jul 5, 2024

Bug description

Calling trainer.test() with an explicit checkpoint path logs the epoch number of the last checkpoint instead of the epoch of the checkpoint specified:

trainer = Trainer(..., max_epochs=10)
lightning_module = MyLightningModule(...)
datamodule = MyDatamodule()

trainer.fit(lightning_module, datamodule=datamodule)

trainer.test(lightning_module, datamodule=datamodule, ckpt_path="last")  # <-- ok: logs correct epoch and step
ckpt_path = "/.../checkpoints/epoch=2-step=396.ckpt"
trainer.test(lightning_module, datamodule=datamodule, ckpt_path=ckpt_path)  # <-- incorrect: logs last epoch and step

The second test logs epoch 10 instead of epoch 2. Similarly, the step number of the second test is incorrect.
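
For a reproduction, it may help to state the expected values explicitly by parsing them from the checkpoint filename and comparing them with what the logger reports. This is a minimal sketch, assuming the `epoch=N-step=M.ckpt` naming seen in the path above; `expected_from_filename` is a hypothetical helper and not part of Lightning.

```python
import re
from pathlib import PurePosixPath

# Matches the "epoch=<int>-step=<int>" pattern used in the checkpoint name above.
_CKPT_NAME = re.compile(r"epoch=(\d+)-step=(\d+)")


def expected_from_filename(ckpt_path: str) -> tuple[int, int]:
    """Return the (epoch, step) pair encoded in a checkpoint filename."""
    name = PurePosixPath(ckpt_path).name
    match = _CKPT_NAME.search(name)
    if match is None:
        raise ValueError(f"no epoch/step found in {name!r}")
    return int(match.group(1)), int(match.group(2))


# The elided directory ("/.../") is kept as in the report; only the filename matters.
epoch, step = expected_from_filename("/.../checkpoints/epoch=2-step=396.ckpt")
print(epoch, step)  # 2 396
```

The values logged by the second `trainer.test()` call can then be compared against this pair rather than checked by eye.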

What version are you seeing the problem on?

v2.2.1

@markussteindl markussteindl added bug Something isn't working needs triage Waiting to be triaged by maintainers labels Jul 5, 2024

heth27 commented Jul 5, 2024

I guess this could have the same cause as #18060. The checkpoint callback is not the last callback called, and thus some loop counters are not updated. Have a look at the fields mentioned in #18060 (comment) and see if this explains the behavior you notice; it might also offer you a workaround.
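
One way to check whether the checkpoint itself carries the right counters is to compare its saved values with what was logged. This is a rough sketch, assuming the checkpoint has already been loaded into a dict (for example via `torch.load`) and that it contains the top-level `epoch` and `global_step` entries that Lightning writes. `find_mismatches` is a hypothetical helper, the dict below is a stand-in rather than real output, and the specific fields listed in #18060 are not reproduced here.

```python
from typing import Any, Mapping


def find_mismatches(
    checkpoint: Mapping[str, Any], logged: Mapping[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Map each differing counter to a (saved_in_checkpoint, logged) pair."""
    mismatches = {}
    for key in ("epoch", "global_step"):
        # Only compare counters that are present on both sides.
        if key in checkpoint and key in logged and checkpoint[key] != logged[key]:
            mismatches[key] = (checkpoint[key], logged[key])
    return mismatches


# Stand-in for a loaded checkpoint dict; values mirror the filename in the report.
stand_in_checkpoint = {"epoch": 2, "global_step": 396}

# Epoch 10 is the value the report says was logged by the second test run.
print(find_mismatches(stand_in_checkpoint, {"epoch": 10}))  # {'epoch': (2, 10)}
```

If the saved counters are correct and only the logged ones differ, that would point at the trainer state during testing rather than at the checkpoint contents.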

@awaelchli awaelchli added help wanted Open to be worked on repro needed The issue is missing a reproducible example and removed needs triage Waiting to be triaged by maintainers labels Jul 6, 2024