[issue] running pySCENIC on large datasets #580

Open
li-xuyang28 opened this issue Sep 26, 2024 · 1 comment
Labels
question Further information is requested

Comments

@li-xuyang28

I am running pySCENIC using the Singularity container with Scanpy (aertslab-pyscenic-scanpy-0.12.1-1.9.1.sif) on a fairly large dataset on an HPC cluster, with 150 GB of memory and 40 cores allocated (salloc -J interact -N 1-1 -n 40 --mem=150G --time=2:00:00 -p parallel srun --pty bash). I was able to create metacells and run the pipeline, but I would still like to examine the results on the original single-cell data if possible. I ran into the following issue with the command shown below:

arboreto_with_multiprocessing.py \
    /home/xli324/data-kkinzle1/xli324/scRNAseq/Chetan/filtered.loom \
    /home/xli324/data-kkinzle1/xli324/resources/allTFs_hg38.txt  \
    --method grnboost2 \
    --output /home/xli324/data-kkinzle1/xli324/scRNAseq/Chetan/adj.tsv \
    --num_workers 40 \
    --seed 777
Loaded expression matrix of 230586 cells and 15431 genes in 117.41096949577332 seconds...
Loaded 1892 TFs...
starting grnboost2 using 40 processes...
  0%|                                                                                                                                                                             | 0/15431 [00:00<?, ?it/s]Process ForkPoolWorker-2:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/local/lib/python3.10/multiprocessing/queues.py", line 367, in get
    return _ForkingPickler.loads(res)
MemoryError
Killed

I was wondering if you have any suggestions? I have also tried downsampling to some extent, without much luck. Is there any chance that GPU support is something that has been considered? Thanks!

li-xuyang28 added the question (Further information is requested) label on Sep 26, 2024
@ghuls
Member

ghuls commented Oct 10, 2024

Run with fewer workers to reduce memory pressure (and to avoid your processes being killed).
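
For example, re-running the same command with a lower worker count; the value of 8 below is only a guess, and the count that actually fits in 150 GB will depend on the dataset and may need tuning:

arboreto_with_multiprocessing.py \
    /home/xli324/data-kkinzle1/xli324/scRNAseq/Chetan/filtered.loom \
    /home/xli324/data-kkinzle1/xli324/resources/allTFs_hg38.txt \
    --method grnboost2 \
    --output /home/xli324/data-kkinzle1/xli324/scRNAseq/Chetan/adj.tsv \
    --num_workers 8 \
    --seed 777

Each worker process receives its own copy of the task data (the MemoryError in your traceback happens while a worker unpickles its task), so fewer workers generally means lower peak memory, at the cost of a longer run time.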
