Introduction to the Finetuning Scheduler

The FinetuningScheduler callback accelerates and enhances foundational model experimentation with flexible finetuning schedules. Training with the FinetuningScheduler callback is simple and confers a host of benefits:

  • dramatically increases finetuning flexibility

  • expedites and facilitates exploration of model tuning dynamics

  • enables marginal performance improvements of finetuned models

Note

If you’re exploring using the FinetuningScheduler, this is a great place to start! You may also find the notebook-based tutorial useful (link provided here as soon as it is published on the pytorch lightning production documentation site) and for those using the LightningCLI, there is a CLI-based example at the bottom of this introduction.

Setup

Setup is straightforward: just install from PyPI!

pip install finetuning-scheduler

Additional installation options (from source, etc.) are discussed under “Additional installation options” in the README.

Motivation

Fundamentally, the FinetuningScheduler callback enables multi-phase, scheduled finetuning of foundational models. Gradual unfreezing (i.e. thawing) can help maximize foundational model knowledge retention while allowing (typically upper layers of) the model to optimally adapt to new tasks during transfer learning [1][2][3].

FinetuningScheduler orchestrates the gradual unfreezing of models via a finetuning schedule that is either implicitly generated (the default) or explicitly provided by the user (more computationally efficient). Finetuning phase transitions are driven by FTSEarlyStopping criteria (a multi-phase extension of EarlyStopping), user-specified epoch transitions or a composition of the two (the default mode). A FinetuningScheduler training session completes when the final phase of the schedule has its stopping criteria met. See Early Stopping for more details on that callback’s configuration.

Basic Usage

If no finetuning schedule is user-provided, FinetuningScheduler will generate a default schedule and proceed to finetune according to the generated schedule, using default FTSEarlyStopping and FTSCheckpoint callbacks with monitor=val_loss.

from pytorch_lightning import Trainer
from finetuning_scheduler import FinetuningScheduler

trainer = Trainer(callbacks=[FinetuningScheduler()])

The Default Finetuning Schedule

Schedule definition is facilitated via gen_ft_schedule() which dumps a default finetuning schedule (by default using a naive, 2-parameters per level heuristic) which can be adjusted as desired by the user and/or subsequently passed to the callback. Using the default/implicitly generated schedule will often be less computationally efficient than a user-defined finetuning schedule but can often serve as a good baseline for subsequent explicit schedule refinement and will marginally outperform many explicit schedules.
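
As a rough illustration of the "2-parameters per level" heuristic mentioned above, the sketch below groups parameter names two per phase, starting from the last-registered parameters (typically the output layers). The helper name sketch_default_schedule and its params_per_phase argument are hypothetical and are not part of the FinetuningScheduler API; real schedules should be generated with gen_ft_schedule().

```python
from typing import Dict, List


def sketch_default_schedule(
    param_names: List[str], params_per_phase: int = 2
) -> Dict[int, Dict[str, List[str]]]:
    """Group parameter names into phases, thawing the last-registered ones first.

    Hypothetical helper for illustration only; not the library's implementation.
    """
    reversed_names = list(reversed(param_names))
    schedule = {}
    for phase, start in enumerate(range(0, len(reversed_names), params_per_phase)):
        schedule[phase] = {"params": reversed_names[start : start + params_per_phase]}
    return schedule


# Parameter names in registration order for a small four-layer model (assumed)
names = [f"layer.{i}.{kind}" for i in range(4) for kind in ("weight", "bias")]
schedule = sketch_default_schedule(names)
print(schedule[0])  # {'params': ['layer.3.bias', 'layer.3.weight']}
```

With these assumed names, the sketch yields four phases, the last of which contains the layer.0 parameters.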

Specifying a Finetuning Schedule

To specify a finetuning schedule, it’s convenient to first generate the default schedule and then alter the thawed/unfrozen parameter groups associated with each finetuning phase as desired. Finetuning phases are zero-indexed and executed in ascending order.

  1. First, generate the default schedule to Trainer.log_dir. It will be named after your LightningModule subclass with the suffix _ft_schedule.yaml.

from pytorch_lightning import Trainer
from finetuning_scheduler import FinetuningScheduler

trainer = Trainer(callbacks=[FinetuningScheduler(gen_ft_sched_only=True)])

  2. Alter the schedule as desired.

Changing the generated schedule for this boring model…

0:
    params:
    - layer.3.bias
    - layer.3.weight
1:
    params:
    - layer.2.bias
    - layer.2.weight
2:
    params:
    - layer.1.bias
    - layer.1.weight
3:
    params:
    - layer.0.bias
    - layer.0.weight

… to have three finetuning phases instead of four:

0:
    params:
    - layer.3.bias
    - layer.3.weight
1:
    params:
    - layer.2.*
    - layer.1.bias
    - layer.1.weight
2:
    params:
    - layer.0.*

  3. Once the finetuning schedule has been altered as desired, pass it to FinetuningScheduler to commence scheduled training:

from pytorch_lightning import Trainer
from finetuning_scheduler import FinetuningScheduler

trainer = Trainer(callbacks=[FinetuningScheduler(ft_schedule="/path/to/my/schedule/my_schedule.yaml")])
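
Because phases are zero-indexed and executed in ascending order, the set of trainable parameters grows as training progresses. The sketch below computes that cumulative set for a schedule expressed as a plain dictionary. It assumes that parameters thawed in an earlier phase stay trainable in later phases (gradual unfreezing); the function name trainable_params_by_phase is hypothetical and not part of the library.

```python
from typing import Dict, List, Set


def trainable_params_by_phase(
    schedule: Dict[int, Dict[str, List[str]]]
) -> Dict[int, Set[str]]:
    """Return the cumulative set of thawed parameter names at each phase."""
    thawed: Set[str] = set()
    result = {}
    for phase in sorted(schedule):  # phases run in ascending order
        thawed |= set(schedule[phase]["params"])
        result[phase] = set(thawed)  # copy so later phases do not mutate earlier entries
    return result


example_schedule = {
    0: {"params": ["layer.3.bias", "layer.3.weight"]},
    1: {"params": ["layer.2.bias", "layer.2.weight"]},
}
by_phase = trainable_params_by_phase(example_schedule)
print(sorted(by_phase[1]))
# ['layer.2.bias', 'layer.2.weight', 'layer.3.bias', 'layer.3.weight']
```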

EarlyStopping and Epoch-Driven Phase Transition Criteria

By default, FTSEarlyStopping and epoch-driven transition criteria are composed. If a max_transition_epoch is specified for a given phase, the next finetuning phase will begin at that epoch unless FTSEarlyStopping criteria are met first. If epoch_transitions_only is True, FTSEarlyStopping will not be used and transitions will be exclusively epoch-driven.
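
The composition of the two criteria can be summarized with a small decision function. This is only a sketch of the rules stated above, not the callback's actual code: the function name should_transition, the early_stopping_met flag, and the use of >= for the epoch comparison are assumptions made for illustration.

```python
from typing import Optional


def should_transition(
    current_epoch: int,
    max_transition_epoch: Optional[int] = None,
    early_stopping_met: bool = False,
    epoch_transitions_only: bool = False,
) -> bool:
    """Decide whether the next finetuning phase should begin (illustrative only)."""
    epoch_reached = max_transition_epoch is not None and current_epoch >= max_transition_epoch
    if epoch_transitions_only:
        # early stopping criteria are ignored entirely in this mode
        return epoch_reached
    # default mode: whichever criterion is satisfied first triggers the transition
    return early_stopping_met or epoch_reached


print(should_transition(2, max_transition_epoch=3))  # False
print(should_transition(2, max_transition_epoch=3, early_stopping_met=True))  # True
print(should_transition(2, 3, early_stopping_met=True, epoch_transitions_only=True))  # False
```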

Tip

Regular expressions can be convenient for specifying more complex schedules. Also, a per-phase maximum learning rate (lr) can be specified:

0:
  params: # the parameters for each phase definition can be fully specified
  - model.classifier.bias
  - model.classifier.weight
  max_transition_epoch: 3
1:
  params: # or specified via a regex
  - model.albert.pooler.*
2:
  params:
  - model.albert.encoder.*.ffn_output.*
  max_transition_epoch: 9
  lr: 1e-06 # per-phase maximum learning rates can be specified
3:
  params: # both approaches to parameter specification can be used in the same phase
  - model.albert.encoder.*.(ffn\.|attention|full*).*
  - model.albert.encoder.embedding_hidden_mapping_in.bias
  - model.albert.encoder.embedding_hidden_mapping_in.weight
  - model.albert.embeddings.*
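
To preview which parameters a regex entry would select before training, patterns can be checked against a list of parameter names. The sketch below uses Python's re module and assumes each entry must match a full parameter name; the helper expand_patterns is hypothetical, and the library's own matching rules may differ. Note that an unescaped "." in a pattern matches any character.

```python
import re
from typing import List


def expand_patterns(patterns: List[str], param_names: List[str]) -> List[str]:
    """Return the parameter names fully matched by any pattern, in model order."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return [name for name in param_names if any(c.fullmatch(name) for c in compiled)]


# A few parameter names of the kind used in the schedule above (assumed subset)
param_names = [
    "model.classifier.weight",
    "model.classifier.bias",
    "model.albert.pooler.weight",
    "model.albert.pooler.bias",
]
print(expand_patterns(["model.albert.pooler.*"], param_names))
# ['model.albert.pooler.weight', 'model.albert.pooler.bias']
```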

For a practical end-to-end example of using FinetuningScheduler in implicit versus explicit modes, see scheduled finetuning for SuperGLUE below or the notebook-based tutorial (link will be added as soon as it is released on the PyTorch Lightning production documentation site).

Resuming Scheduled Finetuning Training Sessions

Resumption of scheduled finetuning training is identical to the continuation of other training sessions with the caveat that the provided checkpoint must have been saved by a FinetuningScheduler session. FinetuningScheduler uses FTSCheckpoint (an extension of ModelCheckpoint) to maintain schedule state with special metadata.

from pytorch_lightning import Trainer
from finetuning_scheduler import FinetuningScheduler

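trainer = Trainer(callbacks=[FinetuningScheduler()])
# ckpt_path is passed to Trainer.fit(); `model` is your LightningModule instance
trainer.fit(model, ckpt_path="some/path/to/my_checkpoint.ckpt")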

Training will resume at the depth/level of the provided checkpoint according to the specified schedule. Schedules can be altered between training sessions but schedule compatibility is left to the user for maximal flexibility. If executing a user-defined schedule, typically the same schedule should be provided for the original and resumed training sessions.

Tip

By default (restore_best is True), FinetuningScheduler will attempt to restore the best available checkpoint before finetuning depth transitions.

trainer = Trainer(callbacks=[FinetuningScheduler()])
# ckpt_path is passed to Trainer.fit(); `model` is your LightningModule instance
trainer.fit(model, ckpt_path="some/path/to/my_kth_best_checkpoint.ckpt")

Note that similar to the behavior of ModelCheckpoint, (specifically this PR), when resuming training with a different FTSCheckpoint dirpath from the provided checkpoint, the new training session’s checkpoint state will be re-initialized at the resumption depth with the provided checkpoint being set as the best checkpoint.

Finetuning all the way down!

There are plenty of options for customizing FinetuningScheduler’s behavior; see scheduled finetuning for SuperGLUE below for examples of composing different configurations.


Example: Scheduled Finetuning For SuperGLUE

A demonstration of the scheduled finetuning callback FinetuningScheduler using the RTE and BoolQ tasks of the SuperGLUE benchmark and the LightningCLI is available under ./fts_examples/.

Since this CLI-based example requires a few additional packages (e.g. transformers, sentencepiece), you should install them using the [examples] extra:

pip install "finetuning-scheduler[examples]"

There are three different demo schedule configurations composed with shared defaults (./config/fts_defaults.yaml) provided for the default ‘rte’ task. Note DDP (with auto-selected GPUs) is the default configuration so ensure you adjust the configuration files referenced below as desired for other configurations.

# Generate a baseline without scheduled finetuning enabled:
python fts_superglue.py fit --config config/nofts_baseline.yaml

# Train with the default finetuning schedule:
python fts_superglue.py fit --config config/fts_implicit.yaml

# Train with a non-default finetuning schedule:
python fts_superglue.py fit --config config/fts_explicit.yaml

All three training scenarios use identical configurations with the exception of the provided finetuning schedule. See the tensorboard experiment summaries and table below for a characterization of the relative computational and performance tradeoffs associated with these FinetuningScheduler configurations.

FinetuningScheduler expands the space of possible finetuning schedules and the composition of more sophisticated schedules can yield marginal finetuning performance gains. That stated, it should be emphasized the primary utility of FinetuningScheduler is to grant greater finetuning flexibility for model exploration in research. For example, glancing at DeBERTa-v3’s implicit training run, a critical tuning transition point is immediately apparent:

Our val_loss begins a precipitous decline at step 3119 which corresponds to phase 17 in the schedule. Referring to our schedule, in phase 17 we’re beginning tuning the attention parameters of our 10th encoder layer (of 11). Interesting! Though beyond the scope of this documentation, it might be worth investigating these dynamics further and FinetuningScheduler allows one to do just that quite easily.

In addition to the tensorboard experiment summaries, full logs/schedules for all three scenarios are available as well as the checkpoints produced in the scenarios (caution, ~3.5GB).

Example Scenario       nofts_baseline    fts_implicit    fts_explicit
Finetuning Schedule    None              Default         User-defined
RTE Accuracy           0.81              0.84            0.85

Note that though this example is intended to capture a common usage scenario, substantial variation is expected among use cases and models. In summary, FinetuningScheduler provides increased finetuning flexibility that can be useful in a variety of contexts from exploring model tuning behavior to maximizing performance.

FinetuningScheduler Explicit Loss Animation

Note

The FinetuningScheduler callback is currently in beta.

Footnotes

1

Howard, J., & Ruder, S. (2018). Fine-tuned Language Models for Text Classification. arXiv preprint arXiv:1801.06146.

2

Chronopoulou, A., Baziotis, C., & Potamianos, A. (2019). An embarrassingly simple approach for transfer learning from pretrained language models. arXiv preprint arXiv:1902.10547.

3

Peters, M. E., Ruder, S., & Smith, N. A. (2019). To tune or not to tune? adapting pretrained representations to diverse tasks. arXiv preprint arXiv:1903.05987.

Finetuning Scheduler API

fts

Finetuning Scheduler

fts_supporters

Finetuning Scheduler Supporters

Contributor Covenant Code of Conduct

Our Pledge

In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

Our Standards

Examples of behavior that contributes to creating a positive environment include:

  • Using welcoming and inclusive language

  • Being respectful of differing viewpoints and experiences

  • Gracefully accepting constructive criticism

  • Focusing on what is best for the community

  • Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

  • The use of sexualized language or imagery and unwelcome sexual attention or advances

  • Trolling, insulting/derogatory comments, and personal or political attacks

  • Public or private harassment

  • Publishing others’ private information, such as a physical or electronic address, without explicit permission

  • Other conduct which could reasonably be considered inappropriate in a professional setting

Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.

Scope

This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.

Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at waf2107@columbia.edu. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project’s leadership.

Attribution

This Code of Conduct is adapted from the Contributor Covenant, version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

For answers to common questions about this code of conduct, see https://www.contributor-covenant.org/faq

Contributing

Welcome to the community! Finetuning Scheduler extends the most advanced DL research platform on the planet (PyTorch Lightning) and strives to support the latest, best practices and integrations that the amazing PyTorch team and other research organizations roll out!

As Finetuning Scheduler is an extension of PyTorch Lightning, the remainder of the contribution guidelines conform to (and many are drawn from) the PyTorch Lightning contribution documentation.

A giant thank you to the PyTorch Lightning team for their tireless effort building the immensely useful PyTorch Lightning project and their thoughtful feedback on and review of this extension.

Main Core Value: One less thing to remember

Simplify the API as much as possible from the user perspective. Any additions or improvements should minimize the things the user needs to remember.

Design Principles

We encourage all sorts of contributions you’re interested in adding! When coding for Finetuning Scheduler, please follow these principles.

No PyTorch Interference

We don’t want to add any abstractions on top of pure PyTorch. This gives researchers all the control they need without having to learn yet another framework.

Simple Internal Code

It’s useful for users to look at the code and understand very quickly what’s happening. Many users won’t be engineers. Thus we need to value clear, simple code over condensed ninja moves. While that’s super cool, this isn’t the project for that :)

Simple External API

What makes sense to you may not make sense to others. When creating an issue with an API change suggestion, please validate that it makes sense for others. Treat code changes the way you treat a startup: validate that it’s a needed feature, then add if it makes sense for many people.

Backward-compatible API

We all hate updating our deep learning packages because we don’t want to refactor a bunch of stuff. With the Finetuning Scheduler, we make sure every change we make which could break an API is backward compatible with good deprecation warnings.

You shouldn’t be afraid to upgrade the Finetuning Scheduler :)

Gain User Trust

As a researcher, you can’t have any part of your code going wrong. So, make thorough tests to ensure that every implementation of a new trick or subtle change is correct.


Contribution Types

We are always open to contributions of new features or bug fixes.

A lot of good work has already been done in project mechanics (requirements.txt, setup.py, pep8, badges, ci, etc…) so we’re in a good state there thanks to all the early contributors (even pre-beta release)!

Bug Fixes:
  1. If you find a bug please submit a GitHub issue.

    • Make sure the title explains the issue.

    • Describe your setup, what you are trying to do, expected vs. actual behaviour. Please add configs and code samples.

    • Add details on how to reproduce the issue: a minimal test case is always best, and a Colab notebook is also great. Sample code should be minimal and, if data is needed, use publicly available data.

  2. Try to fix it or recommend a solution. We highly recommend using a test-driven approach:

    • Convert your minimal code example to a unit/integration test with assert on expected results.

    • Start by debugging the issue… You can run just this particular test in your IDE and draft a fix.

    • Verify that your test case fails on the main branch and only passes with the fix applied.

  3. Submit a PR!

Note, even if you do not find the solution, sending a PR with a test covering the issue is a valid contribution, and we can help you or finish it with you :]

New Features:
  1. Submit a GitHub issue describing the motivation for the feature (adding a use case or an example is helpful).

  2. Determine the feature scope with us.

  3. Submit a PR! We recommend a test-driven approach to adding new features as well:

    • Write a test for the functionality you want to add.

    • Write the functional code until the test passes.

  4. Add/update the relevant tests!

Test cases:

Want to keep Finetuning Scheduler healthy? Love seeing those green tests? So do we! How do we keep it that way? We write tests! We value test contributions even more than new features.


Guidelines

Development scripts

To build the documentation locally, simply execute the following commands from project root (only for Unix):

  • make clean cleans repo from temp/generated files

  • make docs builds documentation under docs/build/html

  • make test runs all project’s tests with coverage

Original code

All added or edited code must be the original work of the contributor. If you use a third-party implementation, all such blocks/functions/modules must be properly referenced and, if possible, approved by the code’s author. For example - This code is inspired from http://....

Coding Style
  1. Use f-strings for output formation

  2. You can use pre-commit to make sure your code style is correct.

Documentation

We use Sphinx with the Napoleon extension and follow the Google docstring style with type hints.

See the following short example of a sample function taking one positional integer and an optional float:

from typing import Optional


def my_func(param_a: int, param_b: Optional[float] = None) -> str:
    """Sample function.

    Args:
        param_a: first parameter
        param_b: second parameter

    Returns:
        the sum of both numbers, as a string

    Example::

        Sample doctest example...
        >>> my_func(1, 2)
        '3'

    Note:
        If you want to add something.
    """
    p = param_b if param_b else 0
    return str(param_a + p)

When updating the docs, make sure to build them locally first and visually inspect the HTML files (in the browser) for formatting errors. In certain cases, a missing blank line or a wrong indent can lead to a broken layout. Run these commands:

pip install -r requirements/docs.txt
make clean
cd docs
make html

and open docs/build/html/index.html in your browser.

Notes:

  • You need to have LaTeX installed for rendering math equations. You can for example install TeXLive by doing one of the following:

    • on Ubuntu (Linux) run apt-get install texlive or otherwise follow the instructions on the TeXLive website

    • use the RTD docker image

  • because PL uses class meta, you need to use Python 3.7 or higher

Testing

Local: Testing your work locally will help you speed up the process since it allows you to focus on particular (failing) test-cases. To setup a local development environment, install both local and test dependencies:

python -m pip install ".[all]"
python -m pip install pre-commit
pre-commit install

You can then run:

python -m pytest finetuning_scheduler tests fts_examples -v

Note: tests that require multiple GPUs or a TPU are skipped if your machine has neither.

GitHub Actions: For convenience, you can also use your own GitHub Actions builds, which are triggered with each commit. This is useful if you do not test against all required dependency versions.

Pull Request

We welcome any useful contribution! For your convenience here’s a recommended workflow:

  1. Think about what you want to do - fix a bug, repair docs, etc. If you want to implement a new feature or enhance an existing one:

    • Start by opening a GitHub issue to explain the feature and the motivation. In the case of features, ask yourself first - Is this NECESSARY for Finetuning Scheduler? There are some PRs that are just purely about adding engineering complexity which has no place in Finetuning Scheduler.

    • Core contributors will take a look (it might take some time - we are often overloaded with issues!) and discuss it.

    • Once agreement is reached, start coding.

  2. Start your work locally.

    • Create a branch and prepare your changes.

    • Tip: do not work on your main branch directly, it may become complicated when you need to rebase.

    • Tip: give your PR a good name! It will be useful later when you may work on multiple tasks/PRs.

  3. Test your code!

    • It is always good practice to start coding by creating a test case, verifying it breaks with current behavior, and passes with your new changes.

    • Make sure your new tests cover all different edge cases.

    • Make sure all exceptions raised are tested.

    • Make sure all warnings raised are tested.

  4. If your PR is not ready for reviews, but you want to run it on our CI, open a “Draft PR” to let us know you don’t need feedback yet.

  5. When you feel ready for integrating your work, mark your PR “Ready for review”.

    • Your code should be readable and follow the project’s design principles.

    • Make sure all tests are passing and any new code is tested for (coverage!).

    • Make sure you link the GitHub issue to your PR.

    • Make sure any docs for that piece of code are updated, or added.

    • The code should be elegant and simple. No over-engineering or hard-to-read code.

    Do your best but don’t sweat about perfection! We do code-review to find any missed items. If you need help, don’t hesitate to ping the core team on the PR.

  6. Use tags in PR name for the following cases:

    • [blocked by #] if your work is dependent on other PRs.

    • [wip] when you start to re-edit your work, mark it so no one will accidentally merge it in the meantime.

Question & Answer
How can I help/contribute?

All types of contributions are welcome - reporting bugs, fixing documentation, adding test cases, solving issues, and preparing bug fixes. To get started with code contributions, look for issues marked with the label good first issue or choose something close to your domain with the label help wanted. Before coding, make sure that the issue description is clear and comment on the issue so that we can assign it to you (or simply self-assign if you can).

Is there a recommendation for branch names?

We recommend you follow this convention <type>/<issue-id>_<short-name> where the types are: bugfix, feature, docs, or tests (but if you are using your own fork that’s optional).
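
A branch name can be checked against this convention with a short regular expression. The sketch below is one possible reading of the convention: it assumes the issue id is numeric and the short name consists of lowercase letters, digits, hyphens and underscores. Neither detail is specified above, and the helper name is_conventional_branch is hypothetical.

```python
import re

# <type>/<issue-id>_<short-name>, with the four types listed above
BRANCH_PATTERN = re.compile(r"^(bugfix|feature|docs|tests)/(\d+)_([a-z0-9][a-z0-9_-]*)$")


def is_conventional_branch(name: str) -> bool:
    """Return True if the branch name follows the recommended convention."""
    return BRANCH_PATTERN.match(name) is not None


print(is_conventional_branch("bugfix/123_fix-resume"))  # True
print(is_conventional_branch("my-branch"))  # False
```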

How to add new tests?

We are using pytest with Finetuning Scheduler.

Here is the process to create a new test

    1. Find a file in tests/ that matches what you want to test. If there is none, create one.

    2. Use this template to get started!

    3. Use BoringModel and derivatives to test out your code.

# TEST SHOULD BE IN YOUR FILE: tests/..../...py
# TEST CODE TEMPLATE
from pytorch_lightning import Trainer

# import BoringModel from the test helpers available in your checkout


# [OPTIONAL] pytest decorator
# @pytest.mark.skipif(not torch.cuda.is_available(), reason="test requires GPU machine")
def test_explain_what_is_being_tested(tmpdir):
    """
    Describe what is being tested and why
    """

    class ExtendedModel(BoringModel):
        ...

    model = ExtendedModel()

    # BoringModel is a functional model. You might want to set methods to None to test your behaviour
    # Example: model.training_step_end = None

    # everything is saved within the tmpdir generated for this test; add further Trainer arguments as needed
    trainer = Trainer(default_root_dir=tmpdir, max_epochs=1)
    trainer.fit(model)
    trainer.test()  # [OPTIONAL]

    # assert the behaviour is correct.
    assert ...

Run your test with:

python -m pytest tests/..../...py::test_explain_what_is_being_tested -v --capture=no

Finetuning Scheduler Governance

This document describes governance processes we follow in developing the Finetuning Scheduler.

Persons of Interest

BDFL

Role: All final decisions related to Finetuning Scheduler.

  • Dan Dale (speediedan) (Finetuning Scheduler author)

Releases

Release cadence TBD

Project Management and Decision Making

TBD

API Evolution

For API removal, renaming or other forms of backward-incompatible changes, the procedure is:

  1. A deprecation process is initiated at version X, producing warning messages at runtime and in the documentation.

  2. Calls to the deprecated API remain unchanged in their function during the deprecation phase.

  3. Two minor versions in the future at version X+2 the breaking change takes effect.

The “X+2” rule is a recommendation and not a strict requirement. Longer deprecation cycles may apply for some cases.
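
The deprecation procedure above could be supported by a small decorator that emits a runtime warning while leaving the wrapped function's behaviour unchanged. This is a minimal sketch using only the standard library; the names removal_version, deprecated, old_api and new_api, as well as the version numbers, are hypothetical and do not refer to any existing Finetuning Scheduler API.

```python
import functools
import warnings
from typing import Callable


def removal_version(deprecated_in: str, minor_versions: int = 2) -> str:
    """Apply the "X+2" rule to a 'major.minor' version string."""
    major, minor = (int(part) for part in deprecated_in.split(".")[:2])
    return f"{major}.{minor + minor_versions}"


def deprecated(since: str, replacement: str) -> Callable:
    """Warn at call time; the wrapped function itself keeps working unchanged."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"`{func.__name__}` was deprecated in v{since} and will be removed in "
                f"v{removal_version(since)}. Use `{replacement}` instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


def new_api(x: int) -> int:
    return x * 2


@deprecated(since="0.2", replacement="new_api")
def old_api(x: int) -> int:
    return new_api(x)


print(removal_version("0.2"))  # 0.4
```

Calling old_api still returns the same result as new_api during the deprecation phase, matching step 2 of the procedure.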

New API and features are declared as:

  • Experimental: Anything labelled as experimental or beta in the documentation is considered unstable and should not be used in production. The community is encouraged to test the feature and report issues directly on GitHub.

  • Stable: Everything not specifically labelled as experimental should be considered stable. Reported issues will be treated with priority.

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog.

[0.1.3] - 2022-05-04

[0.1.3] - Added
[0.1.3] - Changed
  • bumped latest tested PL patch version to 1.6.3

[0.1.3] - Fixed
[0.1.3] - Deprecated

[0.1.2] - 2022-04-27

[0.1.2] - Added
  • added multiple badges (docker, conda, zenodo)

  • added build status matrix to readme

[0.1.2] - Changed
  • bumped latest tested PL patch version to 1.6.2

  • updated citation cff configuration to include all version metadata

  • removed tag-based trigger for azure-pipelines multi-gpu job

[0.1.2] - Fixed
[0.1.2] - Deprecated

[0.1.1] - 2022-04-15

[0.1.1] - Added
  • added conda-forge package

  • added docker release and pypi workflows

  • additional badges for readme, testing enhancements for oldest/newest pl patch versions

[0.1.1] - Changed
  • bumped latest tested PL patch version to 1.6.1, CLI example depends on PL logger fix (#12609)

[0.1.1] - Deprecated
[0.1.1] - Fixed
  • Addressed version prefix issue with readme transformation for pypi

[0.1.0] - 2022-04-07

[0.1.0] - Added
  • None (initial release)

[0.1.0] - Changed
  • None (initial release)

[0.1.0] - Deprecated
  • None (initial release)

[0.1.0] - Fixed
  • None (initial release)

© Copyright (c) 2021-2022, Dan Dale. Revision e39dcb75.
