very long run time for very large dataset #596

baptisteavot-ukdri · 2024-11-22T10:15:04Z

Hello!

I have an input matrix of roughly 240k cells x 6.5k genes.

pySCENIC works fine so my question is not about a bug, more about whether it is possible to speed up the process in a smart way.
On the HPC that I use there is a limitation on the amount of time a job can run (maximum 72 hours).
I ran the GRN command with 64 CPUs and 920 GB of RAM, and it did not finish within 72 hours. I know I could request more CPUs, but I'm having a hard time getting access to those nodes.

Would you have any recommendation on speeding the process up?

My command is the following:

log_dir=/rds/general/project/ukdrmultiomicsproject/live/MAP_analysis/TREM2_enriched_scflow/pySCENIC/Astro/log/

table_dir=/rds/general/project/ukdrmultiomicsproject/live/MAP_analysis/TREM2_enriched_scflow/pySCENIC/Astro/tables/

subcluster=Astro

resources_dir=/rds/general/project/ukdrmultiomicsproject/live/MAP_analysis/TREM2_enriched_scflow/pySCENIC/Astro/resources/

out_dir=/rds/general/project/ukdrmultiomicsproject/live/MAP_analysis/TREM2_enriched_scflow/pySCENIC/Astro/out/

START=$(date)
echo job started at $START

ulimit -S -n 4096

mkdir -p $log_dir

singularity run docker://aertslab/pyscenic:0.12.0 pyscenic grn \
    $table_dir/Astro.0.1.tsv \
    $resources_dir/allTFs_hg38.txt \
    --num_workers 64 \
    --transpose \
    -o $out_dir/$subcluster.adjacencies.tsv &> $log_dir/$subcluster.grn.out

withermatt · 2025-01-11T21:38:38Z

I second this. I also run with the singularity image on an HPC.

robertzeibich · 2025-01-15T01:45:29Z

Hi @baptisteavot-ukdri.

It's best to subset your data several times (max 80k cells per subset).
Run each of those subsets against all TFs. This takes ~4 hours with 44 CPUs.
Extract the TFs from all of those runs.
Then run all cells with the adjusted list of TFs.
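
For anyone who wants to try this, here is a rough shell sketch of the steps above. It is only one possible reading of them, not a tested pipeline. It assumes the expression matrix is a genes x cells TSV (the layout implied by --transpose) whose first row holds a label followed by the cell IDs, and that the adjacency files have a header row with the TF name in the first column. The helper names subset_cells and collect_tfs are made up for illustration, and taking the union of TFs across runs is just one way to interpret "extract the TFs".

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a random subset of cell columns from a
# genes-x-cells TSV (column 1 = gene names, row 1 = header with cell IDs).
# Usage: subset_cells <matrix.tsv> <n_cells> <seed> <out.tsv>
subset_cells() {
  local src=$1 n=$2 seed=$3 out=$4 ncol
  ncol=$(head -n 1 "$src" | awk -F'\t' '{ print NF }')

  # Draw n distinct column indices from 2..ncol (partial Fisher-Yates shuffle).
  awk -v ncol="$ncol" -v n="$n" -v seed="$seed" 'BEGIN {
    srand(seed)
    m = ncol - 1
    if (n > m) n = m
    for (i = 1; i <= m; i++) idx[i] = i + 1
    for (i = 1; i <= n; i++) {
      j = i + int(rand() * (m - i + 1))
      t = idx[i]; idx[i] = idx[j]; idx[j] = t
      print idx[i]
    }
  }' | sort -n > "$out.cols"

  # Keep the gene column plus the selected cell columns, for every row.
  awk -F'\t' -v OFS='\t' '
    NR == FNR { keep[++k] = $1; next }
    { line = $1; for (i = 1; i <= k; i++) line = line OFS $(keep[i]); print line }
  ' "$out.cols" "$src" > "$out"
}

# Hypothetical helper: union of the TFs (first column, header skipped)
# found in one or more adjacency files.
# Usage: collect_tfs <out.txt> <adj1.tsv> [adj2.tsv ...]
collect_tfs() {
  local out=$1
  shift
  awk -F'\t' 'FNR > 1 { print $1 }' "$@" | sort -u > "$out"
}

# Example workflow (commented out; paths reuse the variables from the
# original post, subset file names are placeholders):
#
# for i in 1 2 3; do
#   subset_cells "$table_dir/Astro.0.1.tsv" 80000 "$i" "$table_dir/Astro.sub$i.tsv"
#   singularity run docker://aertslab/pyscenic:0.12.0 pyscenic grn \
#       "$table_dir/Astro.sub$i.tsv" \
#       "$resources_dir/allTFs_hg38.txt" \
#       --num_workers 64 \
#       --transpose \
#       -o "$out_dir/Astro.sub$i.adjacencies.tsv"
# done
# collect_tfs "$resources_dir/TFs_from_subsets.txt" "$out_dir"/Astro.sub*.adjacencies.tsv
# # Final run: all cells, with TFs_from_subsets.txt in place of allTFs_hg38.txt.
```

Each subset run is independent, so the runs can be submitted as separate jobs that each fit inside the 72 hour limit. Whether the union of TFs is the right filter (rather than, say, an importance cutoff) is something to check against your own data.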

baptisteavot-ukdri added the "bug" label on Nov 22, 2024