ValueError when trying temporal hold out splitting #30

omgwenxx · 2025-01-24T10:01:44Z

Describe the bug
Splitting my dataset with the temporal_hold_out strategy raises a ValueError.

To Reproduce
Run the experiment with the following config file:

experiment:
  backend: pytorch
  data_config:
    strategy: dataset
    dataset_path: ../filtered_transactions_train_sample.tsv
  binarize: True
  splitting:
    save_on_disk: True
    save_folder: ../splits/
    test_splitting:
        strategy: temporal_hold_out
        test_ratio: 0.2
  dataset: hm
  top_k: 12
  evaluation:
    cutoffs: [ 12 ]
    simple_metrics: [ Recall, nDCG, MAP]
  gpu: 0
  external_models_path: ../external/models/__init__.py
  models:
    MostPop:
      meta:
        verbose: True
        save_recs: False

Sample of the dataset (I also tried without headers and with different capitalizations of the column names, e.g. userID, userId, userid):

userId	itemId	rating	timestamp
000058a12d5b43e67d225668fa1f8d618c13dc232df0cad8ffe7ad4a1091e318	663713001	5.0	2018-09-20
000058a12d5b43e67d225668fa1f8d618c13dc232df0cad8ffe7ad4a1091e318	541518023	5.0	2018-09-20
00007d2de826758b65a93dd24ce629ed66842531df6699338c5570910a014cc2	505221004	5.0	2018-09-20
00007d2de826758b65a93dd24ce629ed66842531df6699338c5570910a014cc2	685687003	5.0	2018-09-20
00007d2de826758b65a93dd24ce629ed66842531df6699338c5570910a014cc2	685687004	5.0	2018-09-20
00007d2de826758b65a93dd24ce629ed66842531df6699338c5570910a014cc2	685687001	5.0	2018-09-20
00007d2de826758b65a93dd24ce629ed66842531df6699338c5570910a014cc2	505221001	5.0	2018-09-20
00083cda041544b2fbb0e0d2905ad17da7cf1007526fb4c73235dccbbc132280	688873012	5.0	2018-09-20
00083cda041544b2fbb0e0d2905ad17da7cf1007526fb4c73235dccbbc132280	501323011	5.0	2018-09-20
00083cda041544b2fbb0e0d2905ad17da7cf1007526fb4c73235dccbbc132280	598859003	5.0	2018-09-20
00083cda041544b2fbb0e0d2905ad17da7cf1007526fb4c73235dccbbc132280	688873020	5.0	2018-09-20

Script used to run the experiment:

from elliot.run import run_experiment

run_experiment(f"config_files/split_hm.yml")

Error Stacktrace

Traceback (most recent call last):
  File "./Ducho-meets-Elliot/start_experiments.py", line 8, in <module>
    run_experiment(f"config_files/{args.config}.yml")
  File "./Ducho-meets-Elliot/elliot/run.py", line 59, in run_experiment
    dataloader = dataloader_class(config=base.base_namespace)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./Ducho-meets-Elliot/elliot/dataset/dataset.py", line 112, in __init__
    self.tuple_list = splitter.process_splitting()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".r/Ducho-meets-Elliot/elliot/splitter/base_splitter.py", line 90, in process_splitting
    tuple_list = self.handle_hierarchy(data, splitting_ns.test_splitting)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./Ducho-meets-Elliot/elliot/splitter/base_splitter.py", line 156, in handle_hierarchy
    tuple_list = self.splitting_temporal_holdout(data, float(valtest_splitting_ns.test_ratio))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./Ducho-meets-Elliot/elliot/splitter/base_splitter.py", line 228, in splitting_temporal_holdout
    data['rank_first'] = data.groupby(['userId'])['timestamp'].rank(method='first', ascending=True, axis=1)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./ducho_env/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 4769, in rank
    axis = self.obj._get_axis_number(axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./ducho_env/lib/python3.11/site-packages/pandas/core/generic.py", line 577, in _get_axis_number
    raise ValueError(f"No axis named {axis} for object type {cls.__name__}")
ValueError: No axis named 1 for object type Series

System details:

OS: Ubuntu 22.04.5 LTS
Python version: 3.11.2
Library versions: elliot v0.3.1


domenicodegioia · 2025-01-26T01:17:33Z

data['rank_first'] = data.groupby(['userId'])['timestamp'].rank(method='first', ascending=True, axis=1)

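Delete axis=1 from this call. The groupby selects the single timestamp column, so rank operates on a Series, which has no axis 1; that is what the ValueError reports.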
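As a rough check of the suggested change, here is a minimal sketch of the same rank call without axis=1, run on a small made-up DataFrame. The toy rows and the pd.to_datetime conversion are assumptions for illustration only; they are not taken from Elliot's code, which may handle timestamps differently.

```python
import pandas as pd

# Toy interactions using the same column names as the sample above.
# The values are invented for illustration.
data = pd.DataFrame({
    "userId": ["u1", "u1", "u1", "u2", "u2"],
    "itemId": [10, 11, 12, 20, 21],
    "rating": [5.0, 5.0, 5.0, 5.0, 5.0],
    "timestamp": ["2018-09-22", "2018-09-20", "2018-09-21",
                  "2018-09-21", "2018-09-20"],
})

# Assumption: parse the date strings so that ranking compares datetimes.
data["timestamp"] = pd.to_datetime(data["timestamp"])

# Same call as in the traceback, with axis=1 removed.
# Each interaction gets its chronological position within its user's history.
data["rank_first"] = (
    data.groupby(["userId"])["timestamp"]
        .rank(method="first", ascending=True)
)

print(data[["userId", "timestamp", "rank_first"]])
```

With method="first", ties on the same date are broken by row order, so every interaction of a user gets a distinct rank.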

Note that Ducho-meets-Elliot is maintained in a different repository.
