How to best access a specific gradient transformation state? #278

andsteing · 2022-01-12T08:09:12Z


Sometimes I need to access a specific gradient transformation state within opt_state.

Does Optax provide a utility for this? Is this an anti-pattern?

(In my case I have some utility creating a composed gradient transformation and I'd rather not access an inner gradient transformation state by a nested index because that feels too brittle.)

Currently I'm using

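import functools
import operator

def find_state(state, cls):
  """Returns all sub-states of type `cls` found in nested tuples."""
  if isinstance(state, cls):
    return [state]
  if isinstance(state, tuple):
    # Recurse into children (NamedTuple states are tuples too) and concatenate.
    return functools.reduce(
        operator.add, (find_state(child, cls) for child in state), [])
  return []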

Then I can look up a specific state with e.g. find_state(opt_state, optax.ScaleByAdamState).
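To illustrate why a type-based lookup is less brittle than a nested index, here is a minimal sketch using only the standard library. The classes `FakeAdamState` and `FakeScheduleState` are hypothetical stand-ins, not Optax types; with Optax one would pass a real state class such as `optax.ScaleByAdamState` instead.

```python
import functools
import operator
from typing import NamedTuple


# Hypothetical stand-ins for optimizer states (assumption: real states
# are NamedTuples nested inside plain tuples, as in the question).
class FakeAdamState(NamedTuple):
  count: int
  mu: float
  nu: float


class FakeScheduleState(NamedTuple):
  count: int


def find_state(state, cls):
  if isinstance(state, cls):
    return [state]
  if isinstance(state, tuple):
    return functools.reduce(
        operator.add, (find_state(child, cls) for child in state), [])
  return []


# The same pieces composed in two different nestings, as could happen
# when the chain is built from a config.
chain_a = (FakeScheduleState(count=0), FakeAdamState(count=0, mu=0.0, nu=0.0))
chain_b = (
    (FakeScheduleState(count=3),),
    (FakeAdamState(count=3, mu=0.1, nu=0.2), FakeScheduleState(count=3)),
)

# The lookup is identical for both, although the index paths differ
# (chain_a[1] versus chain_b[1][0]).
adam_a = find_state(chain_a, FakeAdamState)
adam_b = find_state(chain_b, FakeAdamState)
schedules_b = find_state(chain_b, FakeScheduleState)
```

The function returns a list, so the caller has to decide what to do when a chain contains zero or several states of the requested type.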

Answered by mkunesch

mkunesch · 2022-01-18T23:20:05Z

Maintainer

Hi! Thanks a lot for the question!

As far as I know there is no utility for this at the moment in optax, but I also wouldn't consider it an anti-pattern (e.g. I think it's the best way to log variables from the optimizer state #206).

For a simple optimizer state (and many are simple) I think it's fine to use the index. For more complicated chains I think it can be less readable to have e.g. state[2].

What would the nested index look like in your case? Would it only be known at runtime?

The reason I'm asking is that if it's a matter of readability, it might be possible to improve this without introducing a utility for searching within an optimizer state.

Thanks a lot for the question again!

3 replies

andsteing Jan 19, 2022
Author

In my case I have a utility that creates a complicated composed gradient transformation (parameter freezing, global gradient norm, masked weight decay, multiple learning rates, multiple learning rate schedules) from a config. So the exact order of the nested gradient transformations depends on the config.

mkunesch Jan 19, 2022
Maintainer

Okay, in this case a find_state function like you already have seems like the best solution to me.

I wonder whether it's possible to implement this with a jax.tree_util function, perhaps something like:

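import jax

def find_state(opt_state, cls):
  # tree_leaves also returns ordinary leaves (e.g. arrays), so filter by type.
  leaves = jax.tree_util.tree_leaves(
      opt_state, is_leaf=lambda node: isinstance(node, cls))
  return [leaf for leaf in leaves if isinstance(leaf, cls)]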

The reason why I'm saying this is that we recently changed OptState to allow more general tree structures here.
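To make the point about more general tree structures concrete: a recursion that only descends into tuples would miss states stored inside lists or dicts. Below is a rough, standard-library-only sketch of a lookup that also walks those containers. `FakeAdamState` and `find_state_general` are hypothetical names used for illustration, and the sketch does not handle custom pytree node types, which is where a `jax.tree_util`-based approach would be more general.

```python
from typing import Any, List, NamedTuple


# Hypothetical stand-in for a real optimizer state class.
class FakeAdamState(NamedTuple):
  count: int


def find_state_general(state: Any, cls: type) -> List[Any]:
  """Collects instances of `cls` nested in tuples, lists, and dicts."""
  if isinstance(state, cls):
    return [state]
  if isinstance(state, dict):
    children = state.values()
  elif isinstance(state, (tuple, list)):
    children = state
  else:
    # Anything else (numbers, arrays, None, ...) is treated as a leaf.
    return []
  found = []
  for child in children:
    found.extend(find_state_general(child, cls))
  return found


# A state tree that mixes dicts, tuples, and lists.
nested = {
    "encoder": (FakeAdamState(count=1),),
    "decoder": [FakeAdamState(count=2), 0.5],
}
adam_states = find_state_general(nested, FakeAdamState)
```

This sketch visits dict entries in insertion order, whereas `jax.tree_util` flattens plain dicts in sorted key order, so the order of the results can differ between the two approaches.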

andsteing Jan 19, 2022
Author

Ah right. Thanks for the heads-up, and for the smart idea of using jax.tree_util.tree_leaves() for this :)
