MultiSteps & state.steps & warmup #324
Hello,
Two questions regarding MultiSteps:
Replies: 1 comment 1 reply
Hello @agemagician!
Taking a look at:
https://github.com/deepmind/optax/blob/3fb68179604e349c3083ad12cd2e38ff8713f613/optax/_src/wrappers.py#L184
It looks like the inner optimizer is only called each time "final_step" is called. Since optax works by chaining together GradientTransformations, and usually the step count is used by GradientTransformations (such as the learning rate schedule), the answer to your question depends on how the GradientTransformations are chained together:
e.g. if the schedule is applied before the multi-step transformation, it will be applied once on every step (whether or not accumulation is done), but if it's chained after the multi-step transformation, it will be applied once on every accumulated step. Does this help?
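For concreteness, here is a minimal sketch of the two placements (the `k=4` and schedule values are purely illustrative). In the first, the schedule is chained before the MultiSteps transformation, so its step count advances on every micro-step; in the second, the schedule lives inside the optimizer that MultiSteps wraps, so it only runs when an accumulated update is actually applied:

```python
import optax

k = 4  # illustrative: accumulate gradients over 4 micro-steps
schedule = optax.linear_schedule(init_value=1e-3, end_value=0.0, transition_steps=10_000)

# Schedule chained before the multi-step transformation: its update runs on
# every micro-step, so its step count advances whether or not an accumulated
# update is applied on that step.
opt_before = optax.chain(
    optax.scale_by_schedule(schedule),
    optax.MultiSteps(
        optax.chain(optax.scale_by_adam(), optax.scale(-1.0)),
        every_k_schedule=k,
    ).gradient_transformation(),
)

# Schedule inside the optimizer that MultiSteps wraps: the inner update (and
# hence the schedule's step count) only runs when MultiSteps performs a real
# update, i.e. once every k micro-steps.
opt_inside = optax.MultiSteps(
    optax.chain(
        optax.scale_by_adam(),
        optax.scale_by_schedule(schedule),
        optax.scale(-1.0),
    ),
    every_k_schedule=k,
)
```

In both cases you use `init`/`update` as usual (either on the MultiSteps object or on the chained transformation); the only difference is which step counter the schedule reads.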