Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Following integration of multiple samples, SVD producing all NA cell embeddings #1786

Open
gherrgo opened this issue Sep 25, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@gherrgo
Copy link

gherrgo commented Sep 25, 2024

Hi guys, looking for some help on an issue I cannot seem to solve. I have been following the integration pipeline (https://stuartlab.org/signac/articles/merging) for a set of 19 samples, and am having some issues when it comes to processing data following a merge. My individual sample pipeline looks something like this:

Reading in peaks

H9A <- Read10X_h5("09A_counts/outs/filtered_peak_bc_matrix.h5", use.names = T)
peaks_H9A <- read.table(
file = "09A_counts/outs/peaks.bed",
col.names = c("chr", "start", "end"))
gr.H9A <- makeGRangesFromDataFrame(peaks_H9A)

(defining combined_peaks using all objects)

Loading metadata

md.H9A <- read.table(
file = "09A_counts/outs/singlecell.csv",
stringsAsFactors = FALSE,
sep = ",",
header = TRUE,
row.names = 1
)[-1, ] # remove the first row

Initial filtering of low-count cells

plot(md.H9A$passed_filters)
md.H9A <- md.H9A[md.H9A$passed_filters > 100, ]

create fragment objects

frags.H9A <- CreateFragmentObject(
path = "09A_counts/outs/fragments.tsv.gz",
cells = rownames(md.H9A)
)

Quantifying peaks

H9A.counts <- FeatureMatrix(
fragments = frags.H9A,
features = combined.peaks,
cells = rownames(md.H9A)
)

Creating objects

H9A_assay <- CreateChromatinAssay(H9A.counts, fragments = frags.H9A)
s.H9A <- CreateSeuratObject(H9A_assay, assay = "ATAC", meta.data=md.H9A, project="H9A")
ncol(s.H9A) #2119
Annotation(s.H9A) <- annotation

Quality control - per sample

grange <- StringToGRanges(rownames(s.H9A), sep = c(":", "-"))
grange.use <- seqnames(grange) %in% standardChromosomes(grange)
s.H9A <- s.H9A[as.vector(grange.use), ]
s.H9A <- NucleosomeSignal(object=s.H9A) #2119 cell barcodes
s.H9A <- TSSEnrichment(object=s.H9A, fast=FALSE)

s.H9A$pct_reads_in_peaks <- s.H9A@meta.data$peak_region_fragments / s.H9A@meta.data$passed_filters * 100
s.H9A$blacklist_ratio <- s.H9A@meta.data$blacklist_region_fragments / s.H9A@meta.data$peak_region_fragments
s.H9A$high.tss <- ifelse(s.H9A$TSS.enrichment > 1, 'High', 'Low')
s.H9A$nucleosome_group <- ifelse(s.H9A$nucleosome_signal > 2, 'NS > 2', 'NS < 2')
s.H9A <- subset(
x = s.H9A,
subset = peak_region_fragments < 250 &
pct_reads_in_peaks > 0.1 &
pct_reads_in_peaks < 2 &
nucleosome_signal < 1.5 &
nucleosome_signal > 0.1 &
TSS.enrichment < 5
)
ncol(s.H9A) #1638 cells

Then merging all samples into an object called total_ATAC and running:
total_ATAC <- RunTFIDF(total_ATAC)
total_ATAC <- FindTopFeatures(total_ATAC, min.cutoff = 20)
total_ATAC[['ATAC']]
total_ATAC <- RunSVD(total_ATAC, n = 50)
total_ATAC <- RunUMAP(total_ATAC, dims = 2:50, reduction = 'lsi')

However, when I run 'RunUMAP' I receive the error:
14:33:10 UMAP embedding parameters a = 0.9922 b = 1.112
Error in checkna(X) : Missing values found in 'X'

Which is due to the cell embeddings from LSI being all NA valued:

head(total_ATAC@reductions$lsi@cell.embeddings, 3)
LSI_1 LSI_2 LSI_3 LSI_4 LSI_5 LSI_6 LSI_7 LSI_8 LSI_9 LSI_10 LSI_11 LSI_12 LSI_13
H9A_AAACGGTTCCACTATG-1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
H9A_AAAGCGAAGCAAACAC-1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
H9A_AAAGGCAAGCTTGCTA-1 NaN NaN NaN NaN NaN NaN NaN NaN

This issue has only persisted across merged data, as I can run the SVD and UMAP on individual samples with no problem. I cannot seem to figure out how to troubleshoot this, tried many different things, all to no solution.
If you have any ideas, it would be much appreciated.
Thanks a bunch.

Here's my sessioninfo():
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0

Random number generation:
RNG: L'Ecuyer-CMRG
Normal: Inversion
Sample: Rejection

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.28.1 AnnotationFilter_1.28.0
[4] GenomicFeatures_1.56.0 SingleCellExperiment_1.26.0 rhdf5_2.48.0
[7] Rcpp_1.0.13 data.table_1.16.0 plyr_1.8.9
[10] magrittr_2.0.3 gtable_0.3.5 gtools_3.9.5
[13] gridExtra_2.3 ArchR_1.0.2 Signac_1.14.0
[16] SCP_0.5.6 stringr_1.5.1 reprex_2.1.1
[19] cowplot_1.1.3 SCINA_1.2.0 gplots_3.1.3.1
[22] MASS_7.3-61 infercnv_1.20.0 patchwork_1.3.0
[25] Seurat_5.1.0 SeuratObject_5.0.2 sp_2.1-4
[28] Matrix_1.6-5 gprofiler2_0.2.3 scales_1.3.0
[31] NMF_0.28 cluster_2.1.6 rngtools_1.5.2
[34] registry_0.5-1 DOSE_3.28.2 org.Hs.eg.db_3.18.0
[37] AnnotationDbi_1.66.0 clusterProfiler_4.10.1 readxl_1.4.3
[40] rtracklayer_1.64.0 ggpubr_0.6.0 reshape2_1.4.4
[43] tidyr_1.3.1 dplyr_1.1.4 DESeq2_1.44.0
[46] SummarizedExperiment_1.34.0 Biobase_2.64.0 MatrixGenerics_1.16.0
[49] matrixStats_1.4.1 GenomicRanges_1.56.1 GenomeInfoDb_1.40.1
[52] IRanges_2.38.1 S4Vectors_0.42.1 BiocGenerics_0.50.0
[55] tibble_3.2.1 DescTools_0.99.56 caret_6.0-94
[58] lattice_0.22-6 ggrepel_0.9.6 ggplot2_3.5.1
[61] pROC_1.18.5 doMC_1.3.8 iterators_1.0.14
[64] foreach_1.5.2

loaded via a namespace (and not attached):
[1] igraph_2.0.3 ica_1.0-3 plotly_4.10.4 zlibbioc_1.50.0
[5] tidyselect_1.2.1 bit_4.0.5 doParallel_1.0.17 clue_0.3-65
[9] rjson_0.2.23 blob_1.2.4 S4Arrays_1.4.1 png_0.1-8
[13] cli_3.6.3 ggplotify_0.1.2 ProtGenerics_1.36.0 goftest_1.2-3
[17] BiocIO_1.14.0 SCpubr_2.0.2 purrr_1.0.2 uwot_0.2.2
[21] shadowtext_0.1.4 curl_5.2.2 mime_0.12 tidytree_0.4.6
[25] leiden_0.4.3.1 coin_1.4-3 ComplexHeatmap_2.18.0 stringi_1.8.4
[29] backports_1.5.0 rjags_4-16 slingshot_2.10.0 parallelDist_0.2.6
[33] XML_3.99-0.17 Exact_3.3 lubridate_1.9.3 httpuv_1.6.15
[37] rappdirs_0.3.3 splines_4.4.1 RcppRoll_0.3.1 prodlim_2024.06.25
[41] ggraph_2.2.1 sctransform_0.4.1 rootSolve_1.8.2.4 DBI_1.2.3
[45] HDF5Array_1.32.1 withr_3.0.1 class_7.3-22 enrichplot_1.22.0
[49] lmtest_0.9-40 ggnewscale_0.5.0 tidygraph_1.3.1 formatR_1.14
[53] BiocManager_1.30.25 htmlwidgets_1.6.4 fs_1.6.4 biomaRt_2.58.2
[57] princurve_2.1.6 labeling_0.4.3 SparseArray_1.4.8 cellranger_1.1.0
[61] lmom_3.0 reticulate_1.39.0 zoo_1.8-12 XVector_0.44.0
[65] UCSC.utils_1.0.0 timechange_0.3.0 fansi_1.0.6 caTools_1.18.3
[69] timeDate_4032.109 ggtree_3.10.1 R.oo_1.26.0 RSpectra_0.16-2
[73] irlba_2.3.5.1 fastDummies_1.7.4 gridGraphics_0.5-1 lazyeval_0.2.2
[77] yaml_2.3.10 phyclust_0.1-34 survival_3.7-0 scattermore_1.2
[81] crayon_1.5.3 RcppAnnoy_0.0.22 RColorBrewer_1.1-3 progressr_0.14.0
[85] tweenr_2.0.3 later_1.3.2 ggridges_0.5.6 codetools_0.2-20
[89] GlobalOptions_0.1.2 KEGGREST_1.44.1 Rtsne_0.17 shape_1.4.6.1
[93] limma_3.60.4 Rsamtools_2.20.0 filelock_1.0.3 pkgconfig_2.0.3
[97] xml2_1.3.6 spatstat.univar_3.0-1 GenomicAlignments_1.40.0 aplot_0.2.3
[101] spatstat.sparse_3.1-0 ape_5.8 viridisLite_0.4.2 gridBase_0.4-7
[105] xtable_1.8-4 car_3.1-2 fastcluster_1.2.6 httr_1.4.7
[109] tools_4.4.1 globals_0.16.3 hardhat_1.4.0 broom_1.0.6
[113] nlme_3.1-166 futile.logger_1.4.3 lambda.r_1.2.4 HDO.db_0.99.1
[117] dbplyr_2.5.0 assertthat_0.2.1 digest_0.6.37 farver_2.1.2
[121] TrajectoryUtils_1.10.1 ModelMetrics_1.2.2.2 yulab.utils_0.1.7 viridis_0.6.5
[125] rpart_4.1.23 glue_1.7.0 cachem_1.1.0 BiocFileCache_2.10.2
[129] polyclip_1.10-7 proxyC_0.4.1 generics_0.1.3 Biostrings_2.72.1
[133] mvtnorm_1.3-1 parallelly_1.38.0 statmod_1.5.0 R.cache_0.16.0
[137] RcppHNSW_0.6.0 carData_3.0-5 pbapply_1.7-2 spam_2.10-0
[141] gson_0.1.0 utf8_1.2.4 gower_1.0.1 SeuratWrappers_0.3.0
[145] graphlayouts_1.1.1 ggsignif_0.6.4 shiny_1.9.1 lava_1.8.0
[149] GenomeInfoDbData_1.2.12 R.utils_2.12.3 rhdf5filters_1.16.0 RCurl_1.98-1.16
[153] memoise_2.0.1 R.methodsS3_1.8.2 gld_2.6.6 future_1.34.0
[157] RANN_2.6.2 Cairo_1.6-2 spatstat.data_3.1-2 rstudioapi_0.16.0
[161] spatstat.utils_3.1-0 hms_1.1.3 fitdistrplus_1.2-1 munsell_0.5.1
[165] colorspace_2.1-1 rlang_1.1.4 ipred_0.9-15 dotCall64_1.1-1
[169] ggforce_0.4.2 circlize_0.4.16 coda_0.19-4.1 e1071_1.7-16
[173] TH.data_1.1-2 remotes_2.5.0 recipes_1.1.0 modeltools_0.2-23
[177] abind_1.4-8 GOSemSim_2.28.1 libcoin_1.0-10 treeio_1.26.0
[181] Rhdf5lib_1.26.0 futile.options_1.0.1 bitops_1.0-8 promises_1.3.0
[185] scatterpie_0.2.4 RSQLite_2.3.7 qvalue_2.34.0 sandwich_3.1-1
[189] fgsea_1.28.0 DelayedArray_0.30.1 proxy_0.4-27 GO.db_3.18.0
[193] compiler_4.4.1 forcats_1.0.0 prettyunits_1.2.0 boot_1.3-31
[197] listenv_0.9.1 edgeR_4.2.1 tensor_1.5 progress_1.2.3
[201] BiocParallel_1.38.0 spatstat.random_3.3-2 R6_2.5.1 fastmap_1.2.0
[205] multcomp_1.4-26 fastmatch_1.1-4 rstatix_0.7.2 ROCR_1.0-11
[209] rsvd_1.0.5 nnet_7.3-19 KernSmooth_2.23-24 miniUI_0.1.1.1
[213] deldir_2.0-4 htmltools_0.5.8.1 RcppParallel_5.1.9 bit64_4.0.5
[217] spatstat.explore_3.3-2 lifecycle_1.0.4 restfulr_0.0.15 vctrs_0.6.5
[221] spatstat.geom_3.3-3 ggfun_0.1.6 future.apply_1.11.2 pillar_1.9.0
[225] locfit_1.5-9.10 jsonlite_1.8.8 expm_1.0-0 argparse_2.2.3
[229] GetoptLong_1.0.5

@gherrgo gherrgo added the bug Something isn't working label Sep 25, 2024
@gherrgo
Copy link
Author

gherrgo commented Sep 26, 2024

I should add, I attempted the example workflow for merging data and I think I may have isolated the issue. When running RunSVD(combined) I receive the error:
Warning in irlba(A = t(x = object), nv = n, work = irlba.work, tol = tol) :
did not converge--results might be invalid!; try increasing work or maxit
Scaling cell embeddings

And then receive all NA's for cell embeddings across all of the example cells:

combined@reductions$lsi@cell.embeddings
LSI_1 LSI_2 LSI_3 LSI_4 LSI_5 LSI_6 LSI_7 LSI_8 LSI_9 LSI_10 LSI_11 LSI_12 LSI_13
500_AAACTGCAGACTCGGA-1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
500_AAAGATGCACCTATTT-1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
500_AAAGATGCAGATACAA-1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

Could this be an error forced by irlba ? How should I circumvent?
Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant