strategy_adapters¶
Fine-Tuning Scheduler Strategy Adapters¶
- class finetuning_scheduler.strategy_adapters.FSDPStrategyAdapter(awp_overrides=None, *args, **kwargs)[source]¶
Bases: finetuning_scheduler.strategy_adapters.base.StrategyAdapter
A StrategyAdapter that extends FinetuningScheduler (FTS) to support flexible, multi-phase, scheduled fine-tuning with the native Fully Sharded Data Parallel (FSDP) strategy (DDPFullyShardedNativeStrategy).
As with standard FSDP usage, FSDP wrapping of a LightningModule can be performed either by providing an auto_wrap_policy or (for maximal control) by overriding the configure_sharded_model method of LightningModule and manually wrapping the module.
In order to support multi-phase scheduled fine-tuning with FSDP, FTS’s key precondition is that the defined fine-tuning schedule phases have disjoint sets of FSDP-flattened parameters (i.e. FlatParameters, which are created when wrapping a set of modules in an FSDP instance/unit). This constraint derives from the fact that the requires_grad attribute currently must be the same for all parameters flattened into the same FlatParameter.
To facilitate module wrapping in alignment with fine-tuning schedule phases, FTS provides the awp_overrides feature, which allows users to provide module name-based complements to a given auto_wrap_policy; a configuration sketch follows below. See the Example: Multi-Phase Scheduled Fine-Tuning with FSDP tutorial for a concrete example and additional guidance.
FTS will attempt to validate that the module is wrapped in a manner that aligns with the defined fine-tuning schedule phases prior to the start of training and will provide detailed feedback to the user if a misalignment is discovered.
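For instance, a minimal configuration sketch (the module names model.classifier and model.pooler, the DebertaV2Layer wrap target, the schedule path, and passing awp_overrides via FinetuningScheduler’s strategy_adapter_cfg argument are illustrative assumptions here, not definitions from this page):

    import functools

    import pytorch_lightning as pl
    from pytorch_lightning.strategies import DDPFullyShardedNativeStrategy
    from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
    from transformers.models.deberta_v2.modeling_deberta_v2 import DebertaV2Layer

    from finetuning_scheduler import FinetuningScheduler

    # Auto-wrap each transformer layer in its own FSDP instance
    strategy = DDPFullyShardedNativeStrategy(
        auto_wrap_policy=functools.partial(
            transformer_auto_wrap_policy, transformer_layer_cls={DebertaV2Layer}
        ),
    )

    # awp_overrides complements the auto_wrap_policy with name-based wrapping so
    # that modules thawed in early phases (e.g. a classifier head) land in their
    # own FSDP instances, keeping schedule phases disjoint at FlatParameter level.
    fts = FinetuningScheduler(
        ft_schedule="ft_schedule.yaml",  # assumed user-defined schedule file
        strategy_adapter_cfg={"awp_overrides": ["model.classifier", "model.pooler"]},
    )

    trainer = pl.Trainer(strategy=strategy, accelerator="gpu", devices=2, callbacks=[fts])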
Warning
FSDPStrategyAdapter is in BETA and subject to change. The interface can bring breaking changes and new features with the next release of PyTorch.
Note
This version of FSDPStrategyAdapter supports stable PyTorch releases >= 1.13. Support for PyTorch 2.0 is expected upon its release.
Note
The no_decay attribute that FTS supports on LightningModule with the base StrategyAdapter is not currently supported in the context of FSDP fine-tuning.
Tip
Because of inter-module dependencies (among other reasons), wrapping every submodule in its own separate FSDP instance is often not a viable approach to ensuring fine-tuning schedule/module wrapping alignment. Starting with a provided auto_wrap_policy (e.g. transformer_auto_wrap_policy) and providing module name-based complements as needed using awp_overrides is often the most expedient approach to auto-wrapping in alignment with a fine-tuning schedule. As always, if needed, one can override configure_sharded_model and manually wrap a given LightningModule to align with a desired fine-tuning schedule.
The only user-facing configuration for FSDPStrategyAdapter is awp_overrides, an optional list of module names that should be wrapped in separate FSDP instances, complementing the modules that would be individually wrapped by the auto_wrap_policy provided in the DDPFullyShardedNativeStrategy strategy configuration.
- Parameters
awp_overrides¶ (Optional[List]) – A list of module names to wrap in separate FSDP instances (i.e., auto_wrap_policy overrides). Only applicable when complementing/overriding an auto_wrap_policy provided in the DDPFullyShardedNativeStrategy strategy configuration. Override lists will be ignored when manually wrapping modules via a configure_sharded_model method. If the named modules cannot be found, an exception will be thrown. Defaults to None.
- awp_overrides¶
A list of module names to wrap in separate FSDP instances.
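To make the disjoint-phase precondition concrete, a two-phase schedule aligned with the separately wrapped modules above might look like the following sketch (the dict form of the schedule, the wildcard syntax, and all module/parameter names are assumptions; FTS schedules are typically defined in YAML files):

    # Hypothetical two-phase schedule: phase 0 thaws only the separately wrapped
    # classifier head; phase 1 thaws the last transformer layer. Each phase maps
    # to distinct FSDP instances, so every FlatParameter has a uniform
    # requires_grad value within any given phase.
    ft_schedule = {
        0: {"params": ["model.classifier.weight", "model.classifier.bias"]},
        1: {"params": ["model.deberta.encoder.layer.11.*"]},
    }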
- fsdp_param_transform(orig_thaw_pl)[source]¶
The parameter transformation function currently used by fts_optim_view() to transform original parameter lists for optimizer operations.
- fts_optim_view(orig_pl)[source]¶
Because FSDP performs parameter transformations that cause the current Optimizer’s view of parameter names to diverge from the original parameter names, this parameter transformation is required for optimizer operations.
- load_optimizer_state_dict(checkpoint_connector)[source]¶
Override the default load_optimizer_state_dict method so that we can allow FSDP to manage the movement of restored optimizer states to the relevant devices.
- logical_param_translation(param_names)[source]¶
Effectively the reverse transformation of fts_optim_view().
- on_after_init_fts()[source]¶
To accommodate FSDP, we defer executing the first fine-tuning phase that would otherwise be executed in this hook, which fires in FinetuningScheduler setup immediately after init_fts().
- Return type
None
- on_before_fts_fit_start()[source]¶
In this hook, executed immediately before the FinetuningScheduler on_fit_start() hook begins, we ensure the provided fine-tuning schedule and FSDP-wrapped LightningModule are appropriately aligned and valid. If the fine-tuning schedule and wrapped module are detected to be incompatible, detailed feedback is provided to the user (which is why multiple checks are aggregated before returning any alignment exceptions).
- Raises
MisconfigurationException – If any FTS FSDP fine-tuning schedule/module wrapping alignment exceptions are thrown. The provided exceptions provide detailed feedback for the user to address the misalignment.
- Return type
None
- on_before_init_fts()[source]¶
In this hook, executed immediately before init_fts(), to accommodate FSDP we:
- Disable Lightning’s restoration of the optimizer to allow us to implement special handling
- Prune the no_decay specification since it is not currently supported in the context of FSDP fine-tuning
- Validate the awp_overrides configuration
- Configure FTS wrapping of the provided LightningModule to either use the provided LightningModule.configure_sharded_model method (if present) or a provided auto_wrap_policy.
- Return type
None
- on_before_restore_optimizers_and_lrs()[source]¶
Allow the FSDPStrategyAdapter to override the default load_optimizer_state_dict method.
This is necessary so we can allow FSDP to manage the movement of restored optimizer states to the relevant devices.
- Return type
None
- class finetuning_scheduler.strategy_adapters.StrategyAdapter[source]¶
Bases: object
Base class for all strategy adapters. Implements the default FinetuningScheduler hooks. Can be subclassed to extend FinetuningScheduler support for a complex or custom Strategy via an associated StrategyAdapter.
Warning
StrategyAdapter is in BETA and subject to change. The interface can bring breaking changes and new features with the next release of FTS.
Tip
If you want to extend FTS to use a custom, currently unsupported strategy or override current FTS behavior in the context of a given training strategy, subclassing StrategyAdapter is a way to do so. See FSDPStrategyAdapter for an example implementation (and the toy subclass sketch after logical_param_translation() below).
The default fine-tuning phase execution function is set on StrategyAdapter initialization. This can be overridden by StrategyAdapter subclasses to adapt fine-tuning phase execution to meet strategy-specific requirements.
- static base_ft_phase(module, thaw_pl, translation_func=None, init_thaw=False)[source]¶
Thaw/unfreeze the provided list of parameters in the provided Module.
- Parameters
- Returns
A Tuple of two lists:
- The list of newly thawed/unfrozen parameters thawed by this function
- A list of all currently thawed/unfrozen parameters in the target Module
- Return type
Tuple[List, List]
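A minimal usage sketch of base_ft_phase (assuming, consistent with the Tuple[List, List] return type above, that both returned lists contain parameter names):

    import torch

    from finetuning_scheduler.strategy_adapters import StrategyAdapter

    # A toy module, fully frozen to mimic the start of scheduled fine-tuning
    model = torch.nn.Sequential(
        torch.nn.Linear(4, 4), torch.nn.ReLU(), torch.nn.Linear(4, 2)
    )
    for p in model.parameters():
        p.requires_grad = False

    # Thaw only the final Linear layer's parameters by name
    newly_thawed, all_thawed = StrategyAdapter.base_ft_phase(
        model, thaw_pl=["2.weight", "2.bias"]
    )
    print(newly_thawed)  # expected: ['2.weight', '2.bias']
    print(all_thawed)    # all parameters currently requiring gradients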
- connect(fts_parent)[source]¶
Create a handle for the associated FinetuningScheduler instance.
- Parameters
fts_parent¶ (Callback) – The associated FinetuningScheduler instance
- Return type
None
- fts_optim_view(orig_pl)[source]¶
A method that can be overridden by a StrategyAdapter if a Strategy performs parameter transformations that cause the current Optimizer’s view of parameter names to diverge from the original parameter names. By default, no transformation of schedule parameter names is required for optimizer operations.
- logical_param_translation(param_names)[source]¶
Effectively the reverse transformation of fts_optim_view(). Can be overridden by a StrategyAdapter if a Strategy performs parameter transformations that cause the original user view of parameter names to diverge from the current Optimizer’s view. By default, no transformation of Optimizer parameter names is required.
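As a toy illustration of when these overrides matter, consider a hypothetical strategy that prefixes every parameter name; the adapter below (the class name and prefix are assumptions, not a real strategy) keeps the optimizer view and the user view consistent:

    from typing import List

    from finetuning_scheduler.strategy_adapters import StrategyAdapter


    class PrefixingStrategyAdapter(StrategyAdapter):
        """Toy adapter for a hypothetical Strategy that prefixes parameter names."""

        PREFIX = "_wrapped_module."  # assumed prefix introduced by the strategy

        def fts_optim_view(self, orig_pl: List) -> List:
            # Translate schedule (user-view) names into the optimizer's view
            return [f"{self.PREFIX}{n}" for n in orig_pl]

        def logical_param_translation(self, param_names: List) -> List:
            # Reverse transformation: optimizer-view names back to user-view names
            return [
                n[len(self.PREFIX):] if n.startswith(self.PREFIX) else n
                for n in param_names
            ]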
- on_after_init_fts()[source]¶
Hook executed in FinetuningScheduler setup immediately after init_fts().
- Return type
None
- on_before_fts_fit_start()[source]¶
Hook executed immediately before the FinetuningScheduler on_fit_start() hook begins.
- Return type
None
- on_before_init_fts()[source]¶
Hook executed in FinetuningScheduler setup immediately before init_fts().
- Return type
None
- on_before_restore_optimizers_and_lrs()[source]¶
Hook executed immediately before FinetuningScheduler restores optimizers and schedulers.
- Return type
None
- property pl_module: pytorch_lightning.core.module.LightningModule¶
Convenient access to the LightningModule being fine-tuned.
- Returns
The user’s LightningModule
- Return type
LightningModule
- property pls_handle: pytorch_lightning.strategies.strategy.Strategy¶
Convenient access to the current Strategy in use.
- Returns
The Strategy in use.
- Return type
Strategy
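For instance, a custom adapter can consult these properties from within its hooks; a brief illustrative sketch (the subclass and its logging are assumptions, not part of the API):

    from finetuning_scheduler.strategy_adapters import StrategyAdapter


    class VerboseStrategyAdapter(StrategyAdapter):
        def on_before_fts_fit_start(self) -> None:
            # pls_handle is the Strategy in use; pl_module is the
            # LightningModule being fine-tuned
            print(
                f"Fine-tuning {type(self.pl_module).__name__} "
                f"with {type(self.pls_handle).__name__}"
            )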