Rewrite the generation of csv files (splits) into a parser format #38

martinwholtmon · 2024-03-16T14:26:30Z

Instead of having a separate script for each dataset, make a parser for each dataset, register the parser, and use it to process the dataset.

Create a generic class that represents a parser:

import os
from abc import ABC, abstractmethod
from typing import Tuple

import pandas as pd


class Parser(ABC):
    @abstractmethod
    def __init__(self, data_path: str, fold: int, val_split: float):
        self.data_path = data_path
        self.fold = fold
        self.val_split = val_split
        self.classes: dict = self._get_classes()

    @abstractmethod
    def _get_classes(self):
        """Get the class idx and class names.
        
        Returns:
            dict[str, int]: class name, class id
        """

    @abstractmethod
    def process(self):
        """Process the dataset, generating the train,val,test splits"""

    def split_train_val(
        self, train_data: pd.DataFrame
    ) -> Tuple[pd.DataFrame, pd.DataFrame]:
        """Shuffle the rows, then hold out a `val_split` fraction as validation."""
        total_len = len(train_data)
        val_len = int(total_len * self.val_split)
        train_len = total_len - val_len

        # sample(frac=1) returns all rows in random order
        shuffled = train_data.sample(frac=1).reset_index(drop=True)
        return shuffled.iloc[:train_len], shuffled.iloc[train_len:]

    def save_csv(self, data: pd.DataFrame, file_name: str):
        """Write `data` to `data_path/file_name`, space-separated, no index or header."""
        data.to_csv(
            os.path.join(self.data_path, file_name),
            sep=" ",
            index=False,
            header=False,
        )

Subclass this base class to create the parser for each dataset. Then create a main entry point that instantiates the correct parser based on the arguments.
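
A minimal sketch of how the registration and the main entry point could look. Everything named here is an assumption made for illustration and is not taken from the repository: the `PARSERS` dictionary, the `register_parser` decorator, `build_parser`, the `ToyParser` class, and the command-line flags. The `Parser` base in the block is a trimmed stand-in for the class above (no pandas, no file I/O) so that the example runs on its own.

```python
import argparse
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional, Type


class Parser(ABC):
    """Trimmed stand-in for the base class above (no pandas, no file I/O)."""

    def __init__(self, data_path: str, fold: int, val_split: float):
        self.data_path = data_path
        self.fold = fold
        self.val_split = val_split
        self.classes: Dict[str, int] = self._get_classes()

    @abstractmethod
    def _get_classes(self) -> Dict[str, int]:
        """Return a mapping of class name to class id."""

    @abstractmethod
    def process(self) -> None:
        """Generate the train, val and test splits."""


# Hypothetical registry: maps a dataset name to its parser class.
PARSERS: Dict[str, Type[Parser]] = {}


def register_parser(name: str) -> Callable[[Type[Parser]], Type[Parser]]:
    """Class decorator that stores the decorated parser under `name`."""

    def decorator(cls: Type[Parser]) -> Type[Parser]:
        if name in PARSERS:
            raise ValueError(f"Parser {name!r} is already registered")
        PARSERS[name] = cls
        return cls

    return decorator


@register_parser("toy")
class ToyParser(Parser):
    """Placeholder parser; a real one would read files under data_path."""

    def _get_classes(self) -> Dict[str, int]:
        return {"cat": 0, "dog": 1}

    def process(self) -> None:
        # A real parser would build DataFrames here and call
        # split_train_val() and save_csv() from the full base class.
        self.processed = True


def build_parser(name: str, data_path: str, fold: int, val_split: float) -> Parser:
    """Look up the registered class for `name` and instantiate it."""
    try:
        parser_cls = PARSERS[name]
    except KeyError:
        known = sorted(PARSERS)
        raise ValueError(f"Unknown dataset {name!r}; choose from {known}") from None
    return parser_cls(data_path, fold, val_split)


def main(argv: Optional[List[str]] = None) -> Parser:
    """Parse the arguments, build the matching parser and run it."""
    cli = argparse.ArgumentParser(description="Generate dataset splits")
    cli.add_argument("dataset", choices=sorted(PARSERS))
    cli.add_argument("--data-path", default=".")
    cli.add_argument("--fold", type=int, default=0)
    cli.add_argument("--val-split", type=float, default=0.1)
    args = cli.parse_args(argv)

    parser = build_parser(args.dataset, args.data_path, args.fold, args.val_split)
    parser.process()
    return parser
```

Calling `main(["toy", "--val-split", "0.2"])` would build and run `ToyParser`; a script would typically call `main()` under an `if __name__ == "__main__":` guard so the arguments come from the command line. With a decorator-based registry, adding a dataset only requires defining and decorating a new subclass, and the entry point stays unchanged.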

martinwholtmon added the enhancement (New feature or request) and maybe labels Mar 16, 2024

martinwholtmon added the wontfix (This will not be worked on) label Jun 13, 2024

martinwholtmon closed this as completed Jun 13, 2024

martinwholtmon reopened this Jun 13, 2024
