ValueError when trying temporal hold out splitting #30

Open
omgwenxx opened this issue Jan 24, 2025 · 1 comment

Describe the bug
Splitting my dataset with the temporal_hold_out strategy raises a ValueError.

To Reproduce
Steps to reproduce the behavior:
config file

experiment:
  backend: pytorch
  data_config:
    strategy: dataset
    dataset_path: ../filtered_transactions_train_sample.tsv
  binarize: True
  splitting:
    save_on_disk: True
    save_folder: ../splits/
    test_splitting:
        strategy: temporal_hold_out
        test_ratio: 0.2
  dataset: hm
  top_k: 12
  evaluation:
    cutoffs: [ 12 ]
    simple_metrics: [ Recall, nDCG, MAP]
  gpu: 0
  external_models_path: ../external/models/__init__.py
  models:
    MostPop:
      meta:
        verbose: True
        save_recs: False

Sample of the dataset (I also tried it without headers and with different capitalizations of the column names, e.g. userID, userId, userid)

userId	itemId	rating	timestamp
000058a12d5b43e67d225668fa1f8d618c13dc232df0cad8ffe7ad4a1091e318	663713001	5.0	2018-09-20
000058a12d5b43e67d225668fa1f8d618c13dc232df0cad8ffe7ad4a1091e318	541518023	5.0	2018-09-20
00007d2de826758b65a93dd24ce629ed66842531df6699338c5570910a014cc2	505221004	5.0	2018-09-20
00007d2de826758b65a93dd24ce629ed66842531df6699338c5570910a014cc2	685687003	5.0	2018-09-20
00007d2de826758b65a93dd24ce629ed66842531df6699338c5570910a014cc2	685687004	5.0	2018-09-20
00007d2de826758b65a93dd24ce629ed66842531df6699338c5570910a014cc2	685687001	5.0	2018-09-20
00007d2de826758b65a93dd24ce629ed66842531df6699338c5570910a014cc2	505221001	5.0	2018-09-20
00083cda041544b2fbb0e0d2905ad17da7cf1007526fb4c73235dccbbc132280	688873012	5.0	2018-09-20
00083cda041544b2fbb0e0d2905ad17da7cf1007526fb4c73235dccbbc132280	501323011	5.0	2018-09-20
00083cda041544b2fbb0e0d2905ad17da7cf1007526fb4c73235dccbbc132280	598859003	5.0	2018-09-20
00083cda041544b2fbb0e0d2905ad17da7cf1007526fb4c73235dccbbc132280	688873020	5.0	2018-09-20
Script used to run the experiment:
from elliot.run import run_experiment

run_experiment("config_files/split_hm.yml")

Error Stacktrace

Traceback (most recent call last):
  File "./Ducho-meets-Elliot/start_experiments.py", line 8, in <module>
    run_experiment(f"config_files/{args.config}.yml")
  File "./Ducho-meets-Elliot/elliot/run.py", line 59, in run_experiment
    dataloader = dataloader_class(config=base.base_namespace)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./Ducho-meets-Elliot/elliot/dataset/dataset.py", line 112, in __init__
    self.tuple_list = splitter.process_splitting()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".r/Ducho-meets-Elliot/elliot/splitter/base_splitter.py", line 90, in process_splitting
    tuple_list = self.handle_hierarchy(data, splitting_ns.test_splitting)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./Ducho-meets-Elliot/elliot/splitter/base_splitter.py", line 156, in handle_hierarchy
    tuple_list = self.splitting_temporal_holdout(data, float(valtest_splitting_ns.test_ratio))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./Ducho-meets-Elliot/elliot/splitter/base_splitter.py", line 228, in splitting_temporal_holdout
    data['rank_first'] = data.groupby(['userId'])['timestamp'].rank(method='first', ascending=True, axis=1)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./ducho_env/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 4769, in rank
    axis = self.obj._get_axis_number(axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./ducho_env/lib/python3.11/site-packages/pandas/core/generic.py", line 577, in _get_axis_number
    raise ValueError(f"No axis named {axis} for object type {cls.__name__}")
ValueError: No axis named 1 for object type Series

System details:

  • OS: Ubuntu 22.04.5 LTS
  • Python version: 3.11.2
  • Library versions: elliot v0.3.1

domenicodegioia commented Jan 26, 2025

data['rank_first'] = data.groupby(['userId'])['timestamp'].rank(method='first', ascending=True, axis=1)

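Delete axis=1 from this call. The groupby selects a single column, so rank runs on a Series, which has no axis 1.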

Note that Ducho-meets-Elliot has a different repo.
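
For reference, here is a minimal sketch of the corrected ranking call in a small per-user temporal hold-out. Only the rank line (without axis=1) comes from the traceback above. The helper name temporal_hold_out, the toy data, and the rule "a row goes to test when its rank exceeds count * (1 - test_ratio)" are assumptions made for illustration, not Elliot's actual implementation.

```python
import pandas as pd


def temporal_hold_out(data: pd.DataFrame, test_ratio: float):
    """Hypothetical helper: hold out each user's most recent interactions."""
    by_user = data.groupby("userId")["timestamp"]
    # Chronological rank within each user. The grouped object is a single
    # column, so no axis argument is passed.
    rank_first = by_user.rank(method="first", ascending=True)
    # Number of interactions per user, aligned with the original rows.
    count = by_user.transform("count")
    # Assumed rule: the last `test_ratio` share of each user's history is test.
    is_test = rank_first > count * (1 - test_ratio)
    return data[~is_test], data[is_test]


# Toy data using the column names from the sample above.
data = pd.DataFrame({
    "userId": ["u1", "u1", "u1", "u1", "u1", "u2", "u2"],
    "itemId": [10, 11, 12, 13, 14, 20, 21],
    "rating": [5.0] * 7,
    "timestamp": pd.to_datetime([
        "2018-09-20", "2018-09-24", "2018-09-21", "2018-09-22",
        "2018-09-23", "2018-09-21", "2018-09-20",
    ]),
})

train, test = temporal_hold_out(data, test_ratio=0.2)
print(test[["userId", "itemId", "timestamp"]])
```

With this toy data, the most recent interaction of each user ends up in the test split and the remaining rows stay in train.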
