-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: RepeatingBasisFunction.inverse_transform
#687
base: main
Are you sure you want to change the base?
Conversation
…verse_transform() in the underlying _RepeatingBasisFunction, which can be accessed through self.pipeline_.named_transformers_['repeatingbasis'].inverse_transform() is needed (rare use case for sure).
…tting output format to pandas work, a second to make sure that the inverse_transform() truly returns the original column (as long as original are within the input_range).
Requesting review and approval for the tests, @koaning |
I am open to the inverse transform, but I think the implementation may change in the future in favor our scikit-learn pipelines. This transformer was added before the @FBruzzesi figured I'd check with you, do you have any comments/opinions on this one? I am open to adding it, just wondering what might be most pragmatic given the sklearn estimator that now exists. |
Thanks, I had missed the possibility to set SplineTransformer to periodic! The result is pretty similar indeed. Now I see that sklearn also has an open issue regarding inverse transform on Spline transformer: scikit-learn/scikit-learn#28551 |
Hey @Alex-Cremers , thanks for the PR, it is much appreciated! I can take a close look during the weekend, from a sneakpeak I would say that some linting is required. Every other technical comment will follow in the weekend 😂
I am generally in favor to keep what we have bug free but not expanding (and maintaining) new features if they are already in scikit-learn - which seems not to be the case for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for the PR. I think it is a nice feature to have in here!
As first comment, please run the following to format and lint the code:
python -m pip install ruff
make lint
I think I am missing something on why RepeatingBasisFunction
re-implements some logic in the inverse_transform
and does not just call
self.pipeline_.named_transformers_['repeatingbasis'].inverse_transform(X)
I feel like we are separating the logic while there is no real need for it - in fact they return different shape objects and I would not be expecting that.
def inverse_transform(self, X): | ||
"""Transform RBF features back to the input range. Outputs a numpy array. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add what you mention in the description in the docstring Notes of the method?
In particular I am referring to:
Note that the transformation is only invertible if the original values are in the input_range (upper bound excluded). Otherwise, the reconstructed values are only equal modulo the width of the range.
check_is_fitted(self, ["pipeline_"]) | ||
|
||
if isinstance(X,np.ndarray): | ||
Xarr = check_array(X[:,:self.n_periods], estimator=self, ensure_2d=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be more comfortable with something along the following lines:
If line 144 becomes
Xarr = check_array(X, estimator=self, ensure_2d=True, ensure_min_features=self.n_periods)[:, :self.n_periods]
which also convert to array (some) dataframe-like objects and maybe it's even possible to avoid checking for array instance.
Z = tf.fit(X, y).transform(X) | ||
assert np.allclose( | ||
X["a"], | ||
tf.pipeline_.named_transformers_['repeatingbasis'].inverse_transform(Z), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a bit too nested to me.
Why don't we
tf.pipeline_.named_transformers_['repeatingbasis'].inverse_transform(Z), | |
tf.inverse_transform(Z), |
?
They should suppose to return the same or am I missing something?
if isinstance(X,np.ndarray): | ||
Xarr = check_array(X[:,:self.n_periods], estimator=self, ensure_2d=True) | ||
new_x = self.pipeline_.named_transformers_['repeatingbasis'].inverse_transform(Xarr) | ||
Xarr = np.hstack((new_x.reshape(-1, 1),X[:,self.n_periods:])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we returning the input as well? kind of related to the comment in tests
@@ -208,7 +243,27 @@ def transform(self, X): | |||
|
|||
# apply rbf function to series for each basis | |||
return self._rbf(base_distances) | |||
|
|||
|
|||
def inverse_transform(self, X): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome ✨
Thanks for the thorough review! Sorry for the lint, I only ran "ruff check" and it passed, so I didn't dig deeper! Regarding the inverse_transform: I actually meant to remove it from RepeatingBasisFunction before submitting, precisely because it was a big confusing, but it turns out I only removed part of it. There are two reasons why it's more complicated than _RepeatingBasisFunction:
As far as I can see, the best would probably be to remove the inverse_transform from RepeatingBasisFunction entirely for now and wait for scikit-learn/scikit-learn#11463 (inverse_transform for ColumnTransformer) and the other issue mentioned above to be solved, as there would then be a proper way to implement the inverse transform without relying on funny tricks. If a user really needs the inverse transform in the meantime, they can recover the function as I did in the test. If you agree with this, I'll remove |
Description
This PR implements
get_feature_names_out()
for_RepeatingBasisFunction
, andRepeatingBasisFunction
(which inherits feature names from the former).This PR also implements an inverse_transform for
_RepeatingBasisFunction
in passing. I did not include a more general implementation forRepeatingBasisFunction
(as the one I use requires importing pandas), but the inverse_transform() can be accessed from the fitted transformer via.pipeline_.named_transformers_['repeatingbasis'].inverse_transform()
). It's a rare use case, but it shouldn't affect other uses in any way, so I figured I'd include it. Note that the transformation is only invertible if the original values are in the input_range (upper bound excluded). Otherwise, the reconstructed values are only equal modulo the width of the range.Two tests have been added to test_repeatingbasisfunction.py:
RepeatingBasisFunction
inverse_transform()
for_RepeatingBasisFunction
truly recovers the original values (as long as they fall within the input_range).Note that for feature names, ClassNamePrefixFeaturesOutMixin could also be used if
self.n_periods
was renamed toself._n_features_out
, but I didn't see a simple way of keeping the original column name as prefix, so I adopted a solution perhaps more idiosyncratic.It's a minor PR overall, so I didn't ping before submitting. I hope it's okay.
Progress on Issue #543
Type of change
Checklist:
If you feel your PR is ready for a review, ping @FBruzzesi or @koaning.