
Add MultiStepLR with Warmup Scheduler #31831

Open
penguinwang96825 opened this issue Jul 7, 2024 · 3 comments
Labels
Feature request Request for a new feature

Comments

@penguinwang96825

penguinwang96825 commented Jul 7, 2024

Feature request

I would like to propose the addition of a new learning rate scheduler that combines MultiStepLR with a warmup phase. Currently, the Transformers library does not include a scheduler that uses both MultiStepLR and warmup. This feature can be beneficial for training models where the learning rate needs to be adjusted at specific epochs with an initial warmup phase to stabilise training.

Motivation

In many training scenarios, it is beneficial to start with a warmup phase where the learning rate gradually increases, followed by a phase where the learning rate decreases at specific milestones (steps).

Contribution

I propose adding a new scheduler, get_multistep_schedule_with_warmup, which combines the functionality of MultiStepLR with a warmup phase. This scheduler would increase the learning rate linearly during the warmup phase and then follow the MultiStepLR schedule. I am more than happy to open a pull request (PR) implementing this feature. Please let me know if this sounds like a valuable addition, and I will proceed with the implementation.
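
A minimal sketch of what `get_multistep_schedule_with_warmup` could look like, following the `functools.partial` + `LambdaLR` pattern used by the existing schedulers in `transformers.optimization`. The `milestones` and `gamma` argument names are placeholders borrowed from `torch.optim.lr_scheduler.MultiStepLR`, and the milestones here are counted in optimizer steps rather than epochs:

```python
from functools import partial

from torch.optim import Optimizer
from torch.optim.lr_scheduler import LambdaLR


def _multistep_with_warmup_lambda(current_step: int, *, num_warmup_steps: int, milestones, gamma: float) -> float:
    # Linear warmup: scale the learning rate from 0 up to its initial value.
    if current_step < num_warmup_steps:
        return float(current_step) / float(max(1, num_warmup_steps))
    # After warmup, multiply by `gamma` once for every milestone already passed,
    # mirroring torch.optim.lr_scheduler.MultiStepLR (but in optimizer steps).
    num_decays = sum(1 for milestone in milestones if current_step >= milestone)
    return gamma ** num_decays


def get_multistep_schedule_with_warmup(
    optimizer: Optimizer,
    num_warmup_steps: int,
    milestones,
    gamma: float = 0.1,
    last_epoch: int = -1,
) -> LambdaLR:
    """Linear warmup followed by a MultiStepLR-style piecewise-constant decay."""
    lr_lambda = partial(
        _multistep_with_warmup_lambda,
        num_warmup_steps=num_warmup_steps,
        milestones=sorted(milestones),
        gamma=gamma,
    )
    return LambdaLR(optimizer, lr_lambda, last_epoch)


# Example: warm up for 500 steps, then decay the LR by 10x at steps 10_000 and 20_000.
# scheduler = get_multistep_schedule_with_warmup(optimizer, 500, milestones=[10_000, 20_000])
```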

@penguinwang96825 added the Feature request label Jul 7, 2024
@amyeroberts
Collaborator

cc @muellerzr @SunMarc

@muellerzr
Contributor

Hi! In general, we prefer that you provide tangible results showing the improvement, either from your own work or from a paper referencing it. Can you link any, please? Thanks!

@penguinwang96825
Author

@muellerzr Thanks for the prompt reply! Sure, I present it in detail below.

Popularity and Practical Use

The MultiStepLR scheduler is widely used and recognized for its effectiveness in practice, as evidenced by its popularity among PyTorch users. According to Defazio et al. (2023), it is one of the top three most popular schedulers. This piecewise approach of decreasing the learning rate when progress plateaus has proven effective, and many studies adopt it as a default choice for learning rate adjustment (Sohn, 2016; Wang et al., 2017; Gong et al., 2021).

| PyTorch Scheduler | GitHub Files (K) |
|---|---|
| ReduceLROnPlateau | 105.0 |
| StepLR | 101.0 |
| MultiStepLR | 37.9 |
| CosineAnnealingLR | 37.1 |
| ExponentialLR | 16.0 |
| OneCycleLR | 14.9 |
| CosineAnnealingWarmRestarts | 10.9 |
| CyclicLR | 9.1 |
| LinearLR | 5.9 |
| ConstantLR | 3.6 |
| MultiplicativeLR | 2.6 |
| PolynomialLR | 1.3 |

References

Defazio, Aaron, et al. "When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement." arXiv preprint arXiv:2310.07831 (2023).

Gong, Yuan, Yu-An Chung, and James Glass. "AST: Audio Spectrogram Transformer." Proc. Interspeech (2021).

Wang, Jian, et al. "Deep Metric Learning with Angular Loss." Proceedings of the IEEE International Conference on Computer Vision (2017).

Sohn, Kihyuk. "Improved Deep Metric Learning with Multi-class N-pair Loss Objective." Advances in Neural Information Processing Systems 29 (2016).
