very long run time for very large dataset #596

Open
baptisteavot-ukdri opened this issue Nov 22, 2024 · 2 comments
Labels
bug Something isn't working

Comments

baptisteavot-ukdri commented Nov 22, 2024

Hello!

I have an input matrix of roughly 240k cells x 6.5k genes.

pySCENIC works fine, so my question is not about a bug but about whether the process can be sped up in a smart way.
The HPC I use limits how long a job can run (72 hours maximum).
I ran the GRN command with 64 CPUs and 920 GB of RAM, and it did not finish within 72 hours. I know I can request more CPUs, but I'm having a hard time getting access to those nodes.

Would you have any recommendation on speeding the process up?

My command is the following:

log_dir=/rds/general/project/ukdrmultiomicsproject/live/MAP_analysis/TREM2_enriched_scflow/pySCENIC/Astro/log/

table_dir=/rds/general/project/ukdrmultiomicsproject/live/MAP_analysis/TREM2_enriched_scflow/pySCENIC/Astro/tables/

subcluster=Astro

resources_dir=/rds/general/project/ukdrmultiomicsproject/live/MAP_analysis/TREM2_enriched_scflow/pySCENIC/Astro/resources/

out_dir=/rds/general/project/ukdrmultiomicsproject/live/MAP_analysis/TREM2_enriched_scflow/pySCENIC/Astro/out/

START=$(date)
echo job started at $START

ulimit -S -n 4096

mkdir -p $log_dir

singularity run docker://aertslab/pyscenic:0.12.0 pyscenic grn \
    $table_dir/Astro.0.1.tsv \
    $resources_dir/allTFs_hg38.txt \
    --num_workers 64 \
    --transpose \
    -o $out_dir/$subcluster.adjacencies.tsv &> $log_dir/$subcluster.grn.out

baptisteavot-ukdri added the bug label Nov 22, 2024
@withermatt

I second this. I also run with the Singularity image on an HPC.

@robertzeibich

Hi @baptisteavot-ukdri.

It is best to subset your data multiple times (max 80k cells per subset).
Run each subset against all TFs. This takes ~4 hours with 44 CPUs.
Extract the TFs from all those runs.
Run all cells with the adjusted list of TFs.
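
The steps above could be scripted roughly as follows. This is only a sketch under several assumptions that are not stated in the thread: the expression matrix is a tab-separated file with genes as rows and cells as columns (as implied by `--transpose`), its header row has the same number of fields as the data rows, and each subset run writes a tab-separated adjacency file with a header line and the TF name in the first column. The helper names `split_cells` and `collect_tfs` are hypothetical, the chunks are contiguous blocks of columns rather than random subsets, and "extract the TFs" is read here as "take the union of TFs that appear in any subset's adjacencies", which may not be exactly what was meant.

```shell
# Hypothetical helper: split a genes x cells TSV into column chunks of at
# most MAX_CELLS cells each, keeping the first (gene name) column in every chunk.
# usage: split_cells MATRIX_TSV MAX_CELLS OUT_PREFIX
# writes OUT_PREFIX.0.tsv, OUT_PREFIX.1.tsv, ...
split_cells() (
    matrix=$1; max_cells=$2; prefix=$3
    awk -F'\t' -v OFS='\t' -v max="$max_cells" -v prefix="$prefix" '
    {
        ncells  = NF - 1                          # first column holds gene names
        nchunks = int((ncells + max - 1) / max)   # ceiling division
        for (c = 0; c < nchunks; c++) {
            first = 2 + c * max
            last  = first + max - 1
            if (last > NF) last = NF
            line = $1
            for (i = first; i <= last; i++) line = line OFS $i
            print line > (prefix "." c ".tsv")
        }
    }' "$matrix"
)

# Hypothetical helper: collect the unique TF names (first column, header
# skipped) from one or more adjacency files into a one-TF-per-line list.
# usage: collect_tfs OUT_FILE ADJ_FILE...
collect_tfs() (
    out=$1; shift
    awk -F'\t' 'FNR > 1 { print $1 }' "$@" | sort -u > "$out"
)

# Example wiring (commented out; paths and file names are placeholders):
# split_cells "$table_dir/Astro.0.1.tsv" 80000 "$table_dir/Astro.subset"
# for f in "$table_dir"/Astro.subset.*.tsv; do
#     singularity run docker://aertslab/pyscenic:0.12.0 pyscenic grn \
#         "$f" "$resources_dir/allTFs_hg38.txt" \
#         --num_workers 64 --transpose \
#         -o "$out_dir/$(basename "$f" .tsv).adjacencies.tsv"
# done
# collect_tfs "$resources_dir/adjusted_TFs.txt" "$out_dir"/Astro.subset.*.adjacencies.tsv
# singularity run docker://aertslab/pyscenic:0.12.0 pyscenic grn \
#     "$table_dir/Astro.0.1.tsv" "$resources_dir/adjusted_TFs.txt" \
#     --num_workers 64 --transpose \
#     -o "$out_dir/$subcluster.adjacencies.tsv"
```

Each subset run is an independent job, so on a scheduler with a wall-time limit the subsets can be submitted separately instead of in a loop. If the cells in the matrix are ordered by sample or condition, shuffling the columns before splitting would probably give more representative subsets.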


3 participants