PassThrough Embeddings #139
Replies: 5 comments
-
Are you looking to extract the embeddings after a model is trained?
-
Yes, adding the embeddings. The problem is that it's computationally intensive to have embeddings over embeddings. Raw inclusion is what I'm looking for, but other methods could be used as well to make it less computationally intensive.
-
To clarify, you want to do something like this:

model = MambularRegressor()
model.fit(X_train, embeddings_train, y_train)

where embeddings_train are pre-computed embeddings of shape N x T x d, with N = length of X_train, T = sequence length of the pre-computed embeddings (could also be one if already pooled), and d = embedding dimension. Internally, the concatenated input would then have shape N x (T+J) x d, where J = number of features of X_train:

x = concat([self.embeddings(num_features, cat_features), emb_features], axis=1)
x.shape -> N, J+T, d
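For illustration, here is a minimal PyTorch sketch of that concatenation step; `feature_embedder`, `emb_features`, and the projection layer are placeholder names used only for this sketch, not part of the Mambular API:

```python
import torch
import torch.nn as nn

class PassthroughConcat(nn.Module):
    """Concatenate the model's own feature embeddings with pre-computed
    (pass-through) embeddings along the token/sequence dimension."""

    def __init__(self, feature_embedder: nn.Module, precomputed_dim: int, d_model: int):
        super().__init__()
        self.feature_embedder = feature_embedder          # maps tabular features to N x J x d_model
        self.proj = nn.Linear(precomputed_dim, d_model)   # align external embedding dim with d_model

    def forward(self, num_features, cat_features, emb_features):
        own = self.feature_embedder(num_features, cat_features)  # N x J x d_model
        ext = self.proj(emb_features)                            # N x T x d_model
        return torch.cat([own, ext], dim=1)                      # N x (J + T) x d_model
```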
-
Yes, exactly. Also, I probably missed something in the documentation, but is there any way to mark a specific feature as categorical?
-
I am not quite sure whether this would be a useful addition. It's a very specific use case where other model structures (a task-specific head on the embeddings and additional features, without subsequent Mamba/Attention) seem more sensible.
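As a rough illustration of that alternative structure (not Mambular code; all names here are hypothetical), such a head could mean-pool the pre-computed embeddings and feed them, together with the tabular features, through a small MLP with no Mamba/attention block in between:

```python
import torch
import torch.nn as nn

class EmbeddingHead(nn.Module):
    """Task-specific head on pooled pre-computed embeddings plus tabular
    features, with no sequence model (Mamba/attention) in between."""

    def __init__(self, emb_dim: int, n_tabular: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim + n_tabular, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # single regression output
        )

    def forward(self, emb_features, tabular_features):
        pooled = emb_features.mean(dim=1)                 # N x T x emb_dim -> N x emb_dim
        x = torch.cat([pooled, tabular_features], dim=1)  # N x (emb_dim + n_tabular)
        return self.mlp(x)
```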
-
In some tasks we need deep contextual embeddings from an LLM.
Ways to include them (sketched below):
-> np.linalg.norm(embd) and sum over it (in practice we can generate two features out of it, the unit vector and the magnitude)
-> x = concat([self.embeddings(num_features, cat_features), emb_features])
-> x = self.embeddings(num_features, cat_features) + emb_features
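A small numpy sketch of the three options above, assuming the pre-computed embeddings already share the model's embedding dimension d (own_emb stands in for self.embeddings(num_features, cat_features); sizes and values are made up for illustration):

```python
import numpy as np

N, J, T, d = 32, 5, 1, 16                    # illustrative sizes
own_emb = np.random.randn(N, J, d)           # stand-in for self.embeddings(num_features, cat_features)
emb_features = np.random.randn(N, T, d)      # pre-computed LLM embeddings

# Option 1: compress each LLM embedding into magnitude + unit-vector features.
magnitude = np.linalg.norm(emb_features, axis=-1)           # N x T
unit_vec = emb_features / (magnitude[..., None] + 1e-8)     # N x T x d

# Option 2: concatenate along the token/sequence dimension.
x_concat = np.concatenate([own_emb, emb_features], axis=1)  # N x (J + T) x d

# Option 3: add the pooled LLM embedding to every feature embedding
# (pooling over T so the shapes broadcast against N x J x d).
x_sum = own_emb + emb_features.mean(axis=1, keepdims=True)  # N x J x d
```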