Environment details
If you are already running SDV, please indicate the following details about the environment in
which you are running it:
SDV version: 1.17.3
Python version: 3.10.11
Operating System: Windows
Problem description
I'm trying to compare different synthesizers, specifically GaussianCopula, TVAE, CTGAN, and CopulaGAN. With the last three, I encountered the problem of the synthetic data containing floats instead of integers in the same columns as the original data. I was wondering if there is any hyperparameter in these synthesizers (I couldn't find it in the documentation) to round these values during training or sampling.
What I already tried
Paste the command(s) you ran and the output.
If there was a crash, please include the traceback here.
Hi @CarlangaUC, nice to meet you. I've seen this issue once before with a different user, but we never fully got to the bottom of it. It would be great if you're able to help us debug this!
My hunch is that it is related to how you are loading and storing your real data in Python (the data that is used for fitting). The problem appears to be unrelated to the actual ML modeling, which means that updating hyperparameters or other settings won't have any effect.
To get to the bottom of this, would you be able to share your code from loading the data into Python up until sampling from the synthesizer? In particular, I'm curious what format your original data is in. Are you updating or modifying your data in any way after loading it into Python?
We've explicitly tested the SDV with the case of reading a CSV file into Python and passing it directly (unmodified) into a synthesizer. In that case, the columns of the data table are stored with the dtypes that pandas assigns to the DataFrame (floats, ints, objects, etc.).
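In case it helps with debugging, here is a minimal sketch of how the dtypes could be checked and reported. It only uses pandas and does not call the SDV at all. The CSV contents, the column names, and the `dtype_mismatches` helper are made up for illustration, and the "synthetic" table is a stand-in rather than real synthesizer output. One thing worth ruling out (this is a guess, not a confirmed cause) is that pandas stores an integer column as `float64` as soon as it contains a missing value.

```python
import io

import pandas as pd

# Hypothetical CSV contents; the column names are made up for illustration.
csv_text = "age,visits\n34,2\n51,\n29,7\n"

# Load the data the same way a CSV file on disk would be loaded.
real_data = pd.read_csv(io.StringIO(csv_text))

# 'age' has no missing values, so pandas infers an integer dtype.
# 'visits' has one empty cell, so pandas falls back to float64,
# because NaN cannot be stored in a plain integer column.
print(real_data.dtypes)


def dtype_mismatches(real, synthetic):
    """Map each shared column whose dtype differs to (real dtype, synthetic dtype)."""
    return {
        col: (str(real[col].dtype), str(synthetic[col].dtype))
        for col in real.columns
        if col in synthetic.columns and real[col].dtype != synthetic[col].dtype
    }


# Stand-in for sampled data (NOT real synthesizer output):
# here the 'age' column came back as floats.
synthetic_data = real_data.astype({"age": "float64"})

mismatches = dtype_mismatches(real_data, synthetic_data)
print(mismatches)
```

Running the same comparison on your actual real and synthetic tables, and pasting the output of `real_data.dtypes` taken right before fitting, would show whether the affected columns were already floats when they reached the synthesizer.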