
Add MultiStepLR with Warmup Scheduler #31831

Open
penguinwang96825 opened this issue Jul 7, 2024 · 3 comments
Labels
Feature request Request for a new feature

Comments

@penguinwang96825

penguinwang96825 commented Jul 7, 2024

Feature request

I would like to propose the addition of a new learning rate scheduler that combines MultiStepLR with a warmup phase. Currently, the Transformers library does not include a scheduler that uses both MultiStepLR and warmup. This feature can be beneficial for training models where the learning rate needs to be adjusted at specific epochs with an initial warmup phase to stabilise training.

Motivation

In many training scenarios, it is beneficial to start with a warmup phase where the learning rate gradually increases, followed by a phase where the learning rate decreases at specific milestones (steps).

Contribution

I propose adding a new scheduler, get_multistep_schedule_with_warmup, which combines the functionality of MultiStepLR with a warmup phase. This scheduler would increase the learning rate linearly during the warmup phase and then follow the MultiStepLR schedule. I am more than happy to open a pull request (PR) implementing this feature. Please let me know if this sounds like a valuable addition, and I will proceed with the implementation.
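
A minimal sketch of what `get_multistep_schedule_with_warmup` could look like, following the `functools.partial` + `LambdaLR` pattern used by the existing schedulers in `transformers.optimization`. The `milestones` and `gamma` argument names are placeholders borrowed from `torch.optim.lr_scheduler.MultiStepLR`, and the milestones here are counted in optimizer steps rather than epochs:

```python
from functools import partial

from torch.optim import Optimizer
from torch.optim.lr_scheduler import LambdaLR


def _multistep_with_warmup_lambda(current_step: int, *, num_warmup_steps: int, milestones, gamma: float) -> float:
    # Linear warmup: scale the learning rate from 0 up to its initial value.
    if current_step < num_warmup_steps:
        return float(current_step) / float(max(1, num_warmup_steps))
    # After warmup, multiply by `gamma` once for every milestone already passed,
    # mirroring torch.optim.lr_scheduler.MultiStepLR (but in optimizer steps).
    num_decays = sum(1 for milestone in milestones if current_step >= milestone)
    return gamma ** num_decays


def get_multistep_schedule_with_warmup(
    optimizer: Optimizer,
    num_warmup_steps: int,
    milestones,
    gamma: float = 0.1,
    last_epoch: int = -1,
) -> LambdaLR:
    """Linear warmup followed by a MultiStepLR-style piecewise-constant decay."""
    lr_lambda = partial(
        _multistep_with_warmup_lambda,
        num_warmup_steps=num_warmup_steps,
        milestones=sorted(milestones),
        gamma=gamma,
    )
    return LambdaLR(optimizer, lr_lambda, last_epoch)


# Example: warm up for 500 steps, then decay the LR by 10x at steps 10_000 and 20_000.
# scheduler = get_multistep_schedule_with_warmup(optimizer, 500, milestones=[10_000, 20_000])
```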

@penguinwang96825 added the Feature request label Jul 7, 2024
@amyeroberts
Collaborator

cc @muellerzr @SunMarc

@muellerzr
Contributor

Hi! In general, we prefer that you provide tangible results showing the improvement, either from your own work or from a paper referencing it. Can you link any, please? Thanks!

@penguinwang96825
Author

@muellerzr Thanks for the prompt reply! Sure, I present it in detail below.

Popularity and Practical Use

The MultiStepLR scheduler is widely used and recognized for its effectiveness in practice, as evidenced by its popularity among PyTorch users. According to Defazio et al. (2023), it is one of the top three most popular schedulers. This piecewise approach of decreasing the learning rate when progress plateaus has proven effective, and many studies adopt it as a default choice for learning rate adjustment (Sohn, 2016; Wang et al., 2017; Gong et al., 2021).

| PyTorch Scheduler | GitHub Files (K) |
|---|---|
| ReduceLROnPlateau | 105.0 |
| StepLR | 101.0 |
| MultiStepLR | 37.9 |
| CosineAnnealingLR | 37.1 |
| ExponentialLR | 16.0 |
| OneCycleLR | 14.9 |
| CosineAnnealingWarmRestarts | 10.9 |
| CyclicLR | 9.1 |
| LinearLR | 5.9 |
| ConstantLR | 3.6 |
| MultiplicativeLR | 2.6 |
| PolynomialLR | 1.3 |

References

Defazio, Aaron, et al. "When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement." arXiv preprint arXiv:2310.07831 (2023).

Gong, Yuan, Yu-An Chung, and James Glass. "AST: Audio Spectrogram Transformer." Proc. Interspeech (2021).

Wang, Jian, et al. "Deep Metric Learning with Angular Loss." Proceedings of the IEEE International Conference on Computer Vision (2017).

Sohn, Kihyuk. "Improved Deep Metric Learning with Multi-class N-pair Loss Objective." Advances in Neural Information Processing Systems 29 (2016).
