strategy_adapters¶
Fine-Tuning Scheduler Strategy Adapters¶
- class finetuning_scheduler.strategy_adapters.FSDPStrategyAdapter(awp_overrides=None, *args, **kwargs)[source]¶
Bases: finetuning_scheduler.strategy_adapters.base.StrategyAdapter
A StrategyAdapter that extends FinetuningScheduler (FTS) to support flexible, multi-phase, scheduled fine-tuning with the native Fully Sharded Data Parallel (FSDP) strategy (DDPFullyShardedNativeStrategy).
As with standard FSDP usage, FSDP wrapping of a LightningModule can be performed either by providing an auto_wrap_policy or (for maximal control) by overriding the configure_sharded_model method of LightningModule and manually wrapping the module.
In order to support multi-phase scheduled fine-tuning with FSDP, FTS’s key precondition is that the defined fine-tuning schedule phases have disjoint sets of FSDP-flattened parameters (i.e. FlatParameters, which are created when wrapping a set of modules in an FSDP instance/unit). This constraint derives from the fact that the requires_grad attribute currently must be the same for all parameters flattened into the same FlatParameter.
To facilitate module wrapping in alignment with fine-tuning schedule phases, FTS provides the awp_overrides feature, which allows users to provide module name-based complements to a given auto_wrap_policy; a configuration sketch follows below. See the Example: Multi-Phase Scheduled Fine-Tuning with FSDP tutorial for a concrete example and additional guidance.
FTS will attempt to validate that the module is wrapped in a manner that aligns with the defined fine-tuning schedule phases prior to the start of training and will provide detailed feedback to the user if a misalignment is discovered.
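For instance, a minimal configuration sketch (the module names model.classifier and model.pooler, the DebertaV2Layer wrap target, the schedule path, and passing awp_overrides via FinetuningScheduler’s strategy_adapter_cfg argument are illustrative assumptions here, not definitions from this page):

    import functools

    import pytorch_lightning as pl
    from pytorch_lightning.strategies import DDPFullyShardedNativeStrategy
    from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
    from transformers.models.deberta_v2.modeling_deberta_v2 import DebertaV2Layer

    from finetuning_scheduler import FinetuningScheduler

    # Auto-wrap each transformer layer in its own FSDP instance
    strategy = DDPFullyShardedNativeStrategy(
        auto_wrap_policy=functools.partial(
            transformer_auto_wrap_policy, transformer_layer_cls={DebertaV2Layer}
        ),
    )

    # awp_overrides complements the auto_wrap_policy with name-based wrapping so
    # that modules thawed in early phases (e.g. a classifier head) land in their
    # own FSDP instances, keeping schedule phases disjoint at FlatParameter level.
    fts = FinetuningScheduler(
        ft_schedule="ft_schedule.yaml",  # assumed user-defined schedule file
        strategy_adapter_cfg={"awp_overrides": ["model.classifier", "model.pooler"]},
    )

    trainer = pl.Trainer(strategy=strategy, accelerator="gpu", devices=2, callbacks=[fts])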
Warning
FSDPStrategyAdapter is in BETA and subject to change. The interface can bring breaking changes and new features with the next release of PyTorch.
Note
This version of FSDPStrategyAdapter supports stable PyTorch releases >= 1.13. Support for PyTorch 2.0 is expected upon its release.
Note
The no_decay attribute that FTS supports on LightningModule with the base StrategyAdapter is not currently supported in the context of FSDP fine-tuning.
Tip
Because of inter-module dependencies (among other reasons), wrapping every submodule in its own separate FSDP instance is often not a viable approach to ensuring fine-tuning schedule/module wrapping alignment. Starting with a provided auto_wrap_policy (e.g. transformer_auto_wrap_policy) and providing module name-based complements as needed using awp_overrides is often the most expedient approach to auto-wrapping in alignment with a fine-tuning schedule. As always, if needed, one can override configure_sharded_model and manually wrap a given LightningModule to align with a desired fine-tuning schedule.
The only user-facing configuration for FSDPStrategyAdapter is awp_overrides, an optional list of module names that should be wrapped in separate FSDP instances, complementing the modules that would be individually wrapped by the auto_wrap_policy provided in the DDPFullyShardedNativeStrategy strategy configuration.
- Parameters
awp_overrides¶ (Optional[List]) – A list of module names to wrap in separate FSDP instances (i.e., auto_wrap_policy overrides). Only applicable when complementing/overriding an auto_wrap_policy provided in the DDPFullyShardedNativeStrategy strategy configuration. Override lists will be ignored when manually wrapping modules via a configure_sharded_model method. If the named modules cannot be found, an exception will be thrown. Defaults to None.
- awp_overrides¶
A list of module names to wrap in separate FSDP instances.
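To make the disjoint-phase precondition concrete, a two-phase schedule aligned with the separately wrapped modules above might look like the following sketch (the dict form of the schedule, the wildcard syntax, and all module/parameter names are assumptions; FTS schedules are typically defined in YAML files):

    # Hypothetical two-phase schedule: phase 0 thaws only the separately wrapped
    # classifier head; phase 1 thaws the last transformer layer. Each phase maps
    # to distinct FSDP instances, so every FlatParameter has a uniform
    # requires_grad value within any given phase.
    ft_schedule = {
        0: {"params": ["model.classifier.weight", "model.classifier.bias"]},
        1: {"params": ["model.deberta.encoder.layer.11.*"]},
    }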
- fsdp_param_transform(orig_thaw_pl)[source]¶
The parameter transformation function currently used by fts_optim_view() to transform original parameter lists for optimizer operations.
- fts_optim_view(orig_pl)[source]¶
Because FSDP performs parameter transformations that cause the current Optimizer’s view of parameter names to diverge from the original parameter names, this parameter transformation is required for optimizer operations.
- load_optimizer_state_dict(checkpoint_connector)[source]¶
Override the default load_optimizer_state_dict method so that we can allow FSDP to manage the movement of restored optimizer states to the relevant devices.
- logical_param_translation(param_names)[source]¶
Effectively the reverse transformation of fts_optim_view().
- on_after_init_fts()[source]¶
To accommodate FSDP, we defer executing the first fine-tuning phase that would otherwise be executed in this hook, which fires in FinetuningScheduler setup immediately after init_fts().
- Return type
None
- on_before_fts_fit_start()[source]¶
In this hook, executed immediately before the FinetuningScheduler on_fit_start() hook begins, we ensure the provided fine-tuning schedule and FSDP-wrapped LightningModule are appropriately aligned and valid. If the fine-tuning schedule and wrapped module are detected to be incompatible, detailed feedback is provided to the user (which is why multiple checks are aggregated before returning any alignment exceptions).
- Raises
MisconfigurationException – If any FTS FSDP fine-tuning schedule/module wrapping alignment exceptions are thrown. The provided exceptions provide detailed feedback for the user to address the misalignment.
- Return type
None
- on_before_init_fts()[source]¶
In this hook, executed immediately before init_fts(), to accommodate FSDP we:
- Disable Lightning’s restoration of the optimizer to allow us to implement special handling
- Prune the no_decay specification since it is not currently supported in the context of FSDP fine-tuning
- Validate the awp_overrides configuration
- Configure FTS wrapping of the provided LightningModule to either use the provided LightningModule.configure_sharded_model method (if present) or a provided auto_wrap_policy.
- Return type
None
- on_before_restore_optimizers_and_lrs()[source]¶
Allow the FSDPStrategyAdapter to override the default load_optimizer_state_dict method.
This is necessary so we can allow FSDP to manage the movement of restored optimizer states to the relevant devices.
- Return type
None
- class finetuning_scheduler.strategy_adapters.StrategyAdapter[source]¶
Bases: object
Base class for all strategy adapters. Implements the default FinetuningScheduler hooks. Can be subclassed to extend FinetuningScheduler support for a complex or custom Strategy via an associated StrategyAdapter.
Warning
StrategyAdapter is in BETA and subject to change. The interface can bring breaking changes and new features with the next release of FTS.
Tip
If you want to extend FTS to use a custom, currently unsupported strategy or override current FTS behavior in the context of a given training strategy, subclassing StrategyAdapter is a way to do so. See FSDPStrategyAdapter for an example implementation (and the toy subclass sketch after logical_param_translation() below).
The default fine-tuning phase execution function is set on StrategyAdapter initialization. This can be overridden by StrategyAdapter subclasses to adapt fine-tuning phase execution to meet strategy-specific requirements.
- static base_ft_phase(module, thaw_pl, translation_func=None, init_thaw=False)[source]¶
Thaw/unfreeze the provided list of parameters in the provided Module.
- Parameters
- Returns
A Tuple of two lists:
- The list of newly thawed/unfrozen parameters thawed by this function
- A list of all currently thawed/unfrozen parameters in the target Module
- Return type
Tuple[List, List]
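A minimal usage sketch of base_ft_phase (assuming, consistent with the Tuple[List, List] return type above, that both returned lists contain parameter names):

    import torch

    from finetuning_scheduler.strategy_adapters import StrategyAdapter

    # A toy module, fully frozen to mimic the start of scheduled fine-tuning
    model = torch.nn.Sequential(
        torch.nn.Linear(4, 4), torch.nn.ReLU(), torch.nn.Linear(4, 2)
    )
    for p in model.parameters():
        p.requires_grad = False

    # Thaw only the final Linear layer's parameters by name
    newly_thawed, all_thawed = StrategyAdapter.base_ft_phase(
        model, thaw_pl=["2.weight", "2.bias"]
    )
    print(newly_thawed)  # expected: ['2.weight', '2.bias']
    print(all_thawed)    # all parameters currently requiring gradients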
- connect(fts_parent)[source]¶
Create a handle for the associated FinetuningScheduler instance.
- Parameters
fts_parent¶ (Callback) – The associated FinetuningScheduler instance
- Return type
None
- fts_optim_view(orig_pl)[source]¶
A method that can be overridden by a StrategyAdapter if a Strategy performs parameter transformations that cause the current Optimizer’s view of parameter names to diverge from the original parameter names. By default, no transformation of schedule parameter names is required for optimizer operations.
- logical_param_translation(param_names)[source]¶
Effectively the reverse transformation of fts_optim_view(). Can be overridden by a StrategyAdapter if a Strategy performs parameter transformations that cause the original user view of parameter names to diverge from the current Optimizer’s view. By default, no transformation of Optimizer parameter names is required.
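As a toy illustration of when these overrides matter, consider a hypothetical strategy that prefixes every parameter name; the adapter below (the class name and prefix are assumptions, not a real strategy) keeps the optimizer view and the user view consistent:

    from typing import List

    from finetuning_scheduler.strategy_adapters import StrategyAdapter


    class PrefixingStrategyAdapter(StrategyAdapter):
        """Toy adapter for a hypothetical Strategy that prefixes parameter names."""

        PREFIX = "_wrapped_module."  # assumed prefix introduced by the strategy

        def fts_optim_view(self, orig_pl: List) -> List:
            # Translate schedule (user-view) names into the optimizer's view
            return [f"{self.PREFIX}{n}" for n in orig_pl]

        def logical_param_translation(self, param_names: List) -> List:
            # Reverse transformation: optimizer-view names back to user-view names
            return [
                n[len(self.PREFIX):] if n.startswith(self.PREFIX) else n
                for n in param_names
            ]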
- on_after_init_fts()[source]¶
Hook executed in FinetuningScheduler setup immediately after init_fts().
- Return type
None
- on_before_fts_fit_start()[source]¶
Hook executed immediately before the FinetuningScheduler on_fit_start() hook begins.
- Return type
None
- on_before_init_fts()[source]¶
Hook executed in FinetuningScheduler setup immediately before init_fts().
- Return type
None
- on_before_restore_optimizers_and_lrs()[source]¶
Hook executed immediately before FinetuningScheduler restores optimizers and schedulers.
- Return type
None
- property pl_module: pytorch_lightning.core.module.LightningModule¶
Convenient access to the LightningModule being fine-tuned.
- Returns
The user’s LightningModule
- Return type
LightningModule
- property pls_handle: pytorch_lightning.strategies.strategy.Strategy¶
Convenient access to the current Strategy in use.
- Returns
The Strategy in use.
- Return type
Strategy
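For instance, a custom adapter can consult these properties from within its hooks; a brief illustrative sketch (the subclass and its logging are assumptions, not part of the API):

    from finetuning_scheduler.strategy_adapters import StrategyAdapter


    class VerboseStrategyAdapter(StrategyAdapter):
        def on_before_fts_fit_start(self) -> None:
            # pls_handle is the Strategy in use; pl_module is the
            # LightningModule being fine-tuned
            print(
                f"Fine-tuning {type(self.pl_module).__name__} "
                f"with {type(self.pls_handle).__name__}"
            )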