Instead of having a separate script for each dataset, write a parser for each dataset, register it, and use it to process the dataset.
Create a generic class that represents a parser:

    import os
    from abc import ABC, abstractmethod
    from typing import Tuple

    import pandas as pd


    class Parser(ABC):
        @abstractmethod
        def __init__(self, data_path: str, fold: int, val_split: float):
            self.data_path = data_path
            self.fold = fold
            self.val_split = val_split
            self.classes: dict = self._get_classes()

        @abstractmethod
        def _get_classes(self):
            """Get the class idx and class names.

            Returns:
                dict[str, int]: class name, class id
            """

        @abstractmethod
        def process(self):
            """Process the dataset, generating the train, val, test splits."""

        def split_train_val(
            self, train_data: pd.DataFrame
        ) -> Tuple[pd.DataFrame, pd.DataFrame]:
            # Shuffle the rows, then hold out the last `val_split` fraction for validation.
            total_len = len(train_data)
            val_len = int(total_len * self.val_split)
            train_len = total_len - val_len
            shuffled = train_data.sample(frac=1).reset_index(drop=True)
            return shuffled.iloc[:train_len], shuffled.iloc[train_len:]

        def save_csv(self, data: pd.DataFrame, file_name: str):
            # Write space-separated rows without an index column or header row.
            data.to_csv(
                os.path.join(self.data_path, file_name),
                sep=" ",
                index=False,
                header=False,
            )

Subclass this base class to create the parser for each dataset. Then create a main entry point that instantiates the correct parser based on the command-line arguments.
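
One possible shape for the registration and the main entry point is sketched below. Everything in it is an assumption made for illustration: the `PARSERS` dictionary, the `register_parser` decorator, the `ToyParser` class and the command-line flags are hypothetical names, and `BaseParser` is a trimmed stand-in for the base class above (with a concrete `__init__` and no pandas dependency) so that the snippet runs on its own.

```python
import argparse
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional, Type

# Hypothetical registry mapping a dataset name to its parser class.
PARSERS: Dict[str, Type["BaseParser"]] = {}


def register_parser(name: str) -> Callable[[Type["BaseParser"]], Type["BaseParser"]]:
    """Class decorator that stores a parser class under `name`."""

    def decorator(cls: Type["BaseParser"]) -> Type["BaseParser"]:
        if name in PARSERS:
            raise ValueError(f"parser {name!r} is already registered")
        PARSERS[name] = cls
        return cls

    return decorator


class BaseParser(ABC):
    """Trimmed stand-in for the generic parser class (assumption)."""

    def __init__(self, data_path: str, fold: int, val_split: float):
        self.data_path = data_path
        self.fold = fold
        self.val_split = val_split
        self.classes = self._get_classes()

    @abstractmethod
    def _get_classes(self) -> Dict[str, int]:
        """Return a mapping from class name to class id."""

    @abstractmethod
    def process(self) -> None:
        """Generate the train, val and test splits."""


@register_parser("toy")
class ToyParser(BaseParser):
    """Made-up dataset parser, used only to show the pattern."""

    def _get_classes(self) -> Dict[str, int]:
        return {"cat": 0, "dog": 1}

    def process(self) -> None:
        # A real parser would read the raw files here and save the splits.
        print(f"processing {self.data_path} (fold {self.fold})")


def main(argv: Optional[List[str]] = None) -> BaseParser:
    """Pick the registered parser named on the command line and run it."""
    arg_parser = argparse.ArgumentParser(description="Process a dataset")
    arg_parser.add_argument("--dataset", required=True, choices=sorted(PARSERS))
    arg_parser.add_argument("--data-path", required=True)
    arg_parser.add_argument("--fold", type=int, default=0)
    arg_parser.add_argument("--val-split", type=float, default=0.1)
    args = arg_parser.parse_args(argv)

    parser = PARSERS[args.dataset](args.data_path, args.fold, args.val_split)
    parser.process()
    return parser
```

Calling `main(["--dataset", "toy", "--data-path", "data/toy"])` would look up `ToyParser` in the registry and run it. With this layout, supporting a new dataset would only require adding one decorated subclass, without touching `main`.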