Introduction to the Finetuning Scheduler¶
The FinetuningScheduler callback accelerates and enhances foundational model experimentation with flexible finetuning schedules. Training with the FinetuningScheduler callback is simple and confers a host of benefits:

- it dramatically increases finetuning flexibility
- it expedites and facilitates exploration of model tuning dynamics
- it enables marginal performance improvements of finetuned models
Note

If you’re exploring using the FinetuningScheduler, this is a great place to start! You may also find the notebook-based tutorial useful, and for those using the LightningCLI, there is a CLI-based example at the bottom of this introduction.
Setup¶
Setup is straightforward: just install from PyPI!
pip install finetuning-scheduler
Additional installation options (from source, etc.) are discussed under “Additional installation options” in the README.
Motivation¶
Fundamentally, the FinetuningScheduler callback enables multi-phase, scheduled finetuning of foundational models. Gradual unfreezing (i.e. thawing) can help maximize foundational model knowledge retention while allowing (typically upper layers of) the model to optimally adapt to new tasks during transfer learning 1 2 3 .

FinetuningScheduler orchestrates the gradual unfreezing of models via a finetuning schedule that is either implicitly generated (the default) or explicitly provided by the user (more computationally efficient). Finetuning phase transitions are driven by FTSEarlyStopping criteria (a multi-phase extension of EarlyStopping), user-specified epoch transitions, or a composition of the two (the default mode). A FinetuningScheduler training session completes when the final phase of the schedule has its stopping criteria met. See Early Stopping for more details on that callback’s configuration.
Basic Usage¶
If no finetuning schedule is user-provided, FinetuningScheduler will generate a default schedule and proceed to finetune according to the generated schedule, using default FTSEarlyStopping and FTSCheckpoint callbacks with monitor=val_loss.
from pytorch_lightning import Trainer
from finetuning_scheduler import FinetuningScheduler
trainer = Trainer(callbacks=[FinetuningScheduler()])
The Default Finetuning Schedule¶
Schedule definition is facilitated via gen_ft_schedule(), which dumps a default finetuning schedule (by default using a naive, 2-parameters-per-level heuristic) that can be adjusted as desired by the user and/or subsequently passed to the callback. Using the default/implicitly generated schedule will often be less computationally efficient than a user-defined finetuning schedule, but it can often serve as a good baseline for subsequent explicit schedule refinement and will marginally outperform many explicit schedules.
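To build intuition for what an implicitly generated schedule looks like, here is a minimal, hypothetical sketch of a 2-parameters-per-phase grouping heuristic. It is not the library’s actual implementation (which inspects the model’s modules directly), but it reproduces the shape of the default schedule shown in the next section:

```python
def naive_schedule(param_names, params_per_phase=2):
    """Group parameter names into finetuning phases, thawing the
    last-registered (typically uppermost) parameters first."""
    # Reverse so the parameters registered last (usually the output
    # layers) are unfrozen in phase 0.
    ordered = list(reversed(param_names))
    return {
        phase: {"params": ordered[i : i + params_per_phase]}
        for phase, i in enumerate(range(0, len(ordered), params_per_phase))
    }


# Parameter names as they might be returned by model.named_parameters()
# for a simple 4-layer model (hypothetical):
names = [
    "layer.0.weight", "layer.0.bias",
    "layer.1.weight", "layer.1.bias",
    "layer.2.weight", "layer.2.bias",
    "layer.3.weight", "layer.3.bias",
]
schedule = naive_schedule(names)
# Phase 0 thaws layer.3, the last/uppermost layer:
print(schedule[0])  # {'params': ['layer.3.bias', 'layer.3.weight']}
```

Each phase thaws two additional parameters, working downward from the model’s head, which is why the generated default below has four phases for a four-layer model.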
Specifying a Finetuning Schedule¶
To specify a finetuning schedule, it’s convenient to first generate the default schedule and then alter the thawed/unfrozen parameter groups associated with each finetuning phase as desired. Finetuning phases are zero-indexed and executed in ascending order.
First, generate the default schedule to Trainer.log_dir. It will be named after your LightningModule subclass with the suffix _ft_schedule.yaml.
from pytorch_lightning import Trainer
from finetuning_scheduler import FinetuningScheduler
trainer = Trainer(callbacks=[FinetuningScheduler(gen_ft_sched_only=True)])
Alter the schedule as desired.
Changing the generated schedule for this boring model…
0:
  params:
  - layer.3.bias
  - layer.3.weight
1:
  params:
  - layer.2.bias
  - layer.2.weight
2:
  params:
  - layer.1.bias
  - layer.1.weight
3:
  params:
  - layer.0.bias
  - layer.0.weight
… to have three finetuning phases instead of four:
0:
  params:
  - layer.3.bias
  - layer.3.weight
1:
  params:
  - layer.2.*
  - layer.1.bias
  - layer.1.weight
2:
  params:
  - layer.0.*
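The wildcard entries above (e.g. layer.2.*) are resolved against the model’s full parameter names. A rough sketch of that expansion (expand_phase_params is a hypothetical helper, not the library’s internal API):

```python
import re


def expand_phase_params(patterns, all_param_names):
    """Expand regex-style schedule entries (e.g. 'layer.2.*') against the
    full list of parameter names, preserving per-pattern ordering."""
    resolved = []
    for pat in patterns:
        matcher = re.compile(pat)
        resolved.extend(n for n in all_param_names if matcher.fullmatch(n))
    return resolved


names = [
    "layer.0.weight", "layer.0.bias",
    "layer.1.weight", "layer.1.bias",
    "layer.2.weight", "layer.2.bias",
]
# Phase 1 of the three-phase schedule above mixes a wildcard with
# fully specified parameter names:
print(expand_phase_params(["layer.2.*", "layer.1.bias", "layer.1.weight"], names))
```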
Once the finetuning schedule has been altered as desired, pass it to FinetuningScheduler to commence scheduled training:
from pytorch_lightning import Trainer
from finetuning_scheduler import FinetuningScheduler
trainer = Trainer(callbacks=[FinetuningScheduler(ft_schedule="/path/to/my/schedule/my_schedule.yaml")])
EarlyStopping and Epoch-Driven Phase Transition Criteria¶
By default, FTSEarlyStopping and epoch-driven transition criteria are composed. If a max_transition_epoch is specified for a given phase, the next finetuning phase will begin at that epoch unless FTSEarlyStopping criteria are met first. If epoch_transitions_only is True, FTSEarlyStopping will not be used and transitions will be exclusively epoch-driven.
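The composed transition logic just described can be sketched as a simple decision function. This is a conceptual illustration of the documented behavior, not the callback’s actual code:

```python
def should_transition(epoch, early_stopping_met, max_transition_epoch=None,
                      epoch_transitions_only=False):
    """Decide whether to advance to the next finetuning phase.

    In the default composed mode, a phase ends when its FTSEarlyStopping
    criteria are met OR its max_transition_epoch (if any) is reached.
    With epoch_transitions_only, only the epoch criterion applies.
    """
    epoch_criterion = (max_transition_epoch is not None
                       and epoch >= max_transition_epoch)
    if epoch_transitions_only:
        return epoch_criterion
    return early_stopping_met or epoch_criterion


# Default composed mode: early stopping can fire before epoch 3
assert should_transition(epoch=1, early_stopping_met=True, max_transition_epoch=3)
# Epoch-only mode ignores early stopping criteria entirely
assert not should_transition(epoch=1, early_stopping_met=True,
                             max_transition_epoch=3, epoch_transitions_only=True)
```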
Tip
Use of regex expressions can be convenient for specifying more complex schedules. Also, a per-phase base_max_lr can be specified:
0:
  params: # the parameters for each phase definition can be fully specified
  - model.classifier.bias
  - model.classifier.weight
  max_transition_epoch: 3
1:
  params: # or specified via a regex
  - model.albert.pooler.*
2:
  params:
  - model.albert.encoder.*.ffn_output.*
  max_transition_epoch: 9
  lr: 1e-06 # per-phase maximum learning rates can be specified
3:
  params: # both approaches to parameter specification can be used in the same phase
  - model.albert.encoder.*.(ffn\.|attention|full*).*
  - model.albert.encoder.embedding_hidden_mapping_in.bias
  - model.albert.encoder.embedding_hidden_mapping_in.weight
  - model.albert.embeddings.*
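Since schedules are hand-edited YAML, a quick structural sanity check before launching training can catch editing mistakes early. The following is a hedged sketch operating on the parsed schedule as a plain dict; check_schedule is a hypothetical helper, and FinetuningScheduler performs its own, more thorough validation:

```python
def check_schedule(schedule):
    """Minimal structural checks: phases are zero-indexed and contiguous,
    and each phase defines a non-empty 'params' list with no entry
    thawed in more than one phase."""
    phases = sorted(schedule)
    assert phases == list(range(len(phases))), "phases must be numbered 0..N-1"
    seen = set()
    for phase in phases:
        params = schedule[phase].get("params")
        assert params, f"phase {phase} has no params"
        dupes = seen.intersection(params)
        assert not dupes, f"parameters thawed twice: {dupes}"
        seen.update(params)
    return True


# Parsed form of a three-phase schedule like the one shown earlier:
schedule = {
    0: {"params": ["layer.3.bias", "layer.3.weight"]},
    1: {"params": ["layer.2.*", "layer.1.bias", "layer.1.weight"]},
    2: {"params": ["layer.0.*"]},
}
print(check_schedule(schedule))  # True
```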
For a practical end-to-end example of using FinetuningScheduler in implicit versus explicit modes, see scheduled finetuning for SuperGLUE below or the notebook-based tutorial.
Resuming Scheduled Finetuning Training Sessions¶
Resumption of scheduled finetuning training is identical to the continuation of other training sessions with the caveat that the provided checkpoint must have been saved by a FinetuningScheduler session. FinetuningScheduler uses FTSCheckpoint (an extension of ModelCheckpoint) to maintain schedule state with special metadata.
from pytorch_lightning import Trainer
from finetuning_scheduler import FinetuningScheduler
trainer = Trainer(callbacks=[FinetuningScheduler()])
trainer.fit(model, ckpt_path="some/path/to/my_checkpoint.ckpt")  # model is your LightningModule
Training will resume at the depth/level of the provided checkpoint according to the specified schedule. Schedules can be altered between training sessions, but schedule compatibility is left to the user for maximal flexibility. If executing a user-defined schedule, typically the same schedule should be provided for the original and resumed training sessions.
Tip
By default (restore_best is True), FinetuningScheduler will attempt to restore the best available checkpoint before finetuning depth transitions.
trainer = Trainer(callbacks=[FinetuningScheduler()])
trainer.fit(model, ckpt_path="some/path/to/my_kth_best_checkpoint.ckpt")
Note that, similar to the behavior of ModelCheckpoint (specifically this PR), when resuming training with a different FTSCheckpoint dirpath from the provided checkpoint, the new training session’s checkpoint state will be re-initialized at the resumption depth, with the provided checkpoint being set as the best checkpoint.
Finetuning all the way down!¶
There are plenty of options for customizing FinetuningScheduler’s behavior; see scheduled finetuning for SuperGLUE below for examples of composing different configurations.
Example: Scheduled Finetuning For SuperGLUE¶
A demonstration of the scheduled finetuning callback FinetuningScheduler using the RTE and BoolQ tasks of the SuperGLUE benchmark and the LightningCLI is available under ./fts_examples/.

Since this CLI-based example requires a few additional packages (e.g. transformers, sentencepiece), you should install them using the [examples] extra:

pip install finetuning-scheduler['examples']
There are three different demo schedule configurations, composed with shared defaults (./config/fts_defaults.yaml), provided for the default ‘rte’ task. Note that DDP (with auto-selected GPUs) is the default configuration, so adjust the configuration files referenced below as desired for other setups.
Note there will likely be minor variations in training paths and performance as packages (e.g. transformers, datasets, finetuning-scheduler itself, etc.) evolve. The precise package versions and salient environmental configuration used in building this tutorial are available in the tensorboard summaries, logs, and checkpoints referenced below if you’re interested.
# Generate a baseline without scheduled finetuning enabled:
python fts_superglue.py fit --config config/nofts_baseline.yaml
# Train with the default finetuning schedule:
python fts_superglue.py fit --config config/fts_implicit.yaml
# Train with a non-default finetuning schedule:
python fts_superglue.py fit --config config/fts_explicit.yaml
All three training scenarios use identical configurations with the exception of the provided finetuning schedule. See the tensorboard experiment summaries and the table below for a characterization of the relative computational and performance tradeoffs associated with these FinetuningScheduler configurations.

FinetuningScheduler expands the space of possible finetuning schedules, and the composition of more sophisticated schedules can yield marginal finetuning performance gains. That stated, it should be emphasized that the primary utility of FinetuningScheduler is to grant greater finetuning flexibility for model exploration in research. For example, glancing at DeBERTa-v3’s implicit training run, a critical tuning transition point is immediately apparent:

Our val_loss begins a precipitous decline at step 3119, which corresponds to phase 17 in the schedule. Referring to our schedule, in phase 17 we begin tuning the attention parameters of our 10th encoder layer (of 11). Interesting! Though beyond the scope of this documentation, it might be worth investigating these dynamics further, and FinetuningScheduler allows one to do just that quite easily.
In addition to the tensorboard experiment summaries, full logs/schedules for all three scenarios are available as well as the checkpoints produced in the scenarios (caution, ~3.5GB).
| Example Scenario | nofts_baseline | fts_implicit | fts_explicit |
|---|---|---|---|
| Finetuning Schedule | None | Default | User-defined |
| RTE Accuracy | 0.81 | 0.84 | 0.85 |
Note that though this example is intended to capture a common usage scenario, substantial variation is expected among use cases and models. In summary, FinetuningScheduler provides increased finetuning flexibility that can be useful in a variety of contexts, from exploring model tuning behavior to maximizing performance.
Note

The FinetuningScheduler callback is currently in beta.
Footnotes¶
- 1 Howard, J., & Ruder, S. (2018). Fine-tuned Language Models for Text Classification. arXiv preprint arXiv:1801.06146.
- 2 Chronopoulou, A., Baziotis, C., & Potamianos, A. (2019). An embarrassingly simple approach for transfer learning from pretrained language models. arXiv preprint arXiv:1902.10547.
- 3 Peters, M. E., Ruder, S., & Smith, N. A. (2019). To tune or not to tune? Adapting pretrained representations to diverse tasks. arXiv preprint arXiv:1903.05987.