Hi! This is the other error I was running into (the one that originally made me split the computation into individual genes, which led me to discover the first reported error).
This one is pretty straightforward: when the data is "too simple", the KDE estimator fails because the automatically selected bandwidth is zero. These genes are otherwise uninteresting (they get binarized to `NaN` anyway as `ZeroInf`; maybe `Discarded` would be even more appropriate here), but it's hard to filter them out beforehand. I originally used an "at least X non-zero samples" criterion to avoid this issue, but X should ideally depend on the number and variance of observations overall, and I couldn't figure out a formula reliable enough to always remove all problematic genes.
Hence, in this PR, I try to compute the bandwidth (using the same method) before using the KDE estimator itself. If it fails, I print a warning and return `NaN`; otherwise everything works as usual. Hopefully this should only impact very low-quality genes anyway. The main reason for fixing this is to avoid failing a large computation when only one or two genes are problematic (and having to search for these genes through other means).
(Also: I am branching off of PR #2, since I need the single-gene bug fixed in order for my test to work, but otherwise these should be two separate things.)
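For reviewers, here is a rough, self-contained sketch of the "check the bandwidth first" idea described above. It is not the code from this PR: the function names (`silverman_bandwidth`, `safe_kde`) and the `gene` argument are hypothetical, and Silverman's rule of thumb is only an assumed stand-in for whatever bandwidth selector the KDE estimator actually uses.

```python
import warnings

import numpy as np


def silverman_bandwidth(x):
    """Silverman's rule of thumb: 0.9 * min(std, IQR / 1.34) * n ** (-1/5).

    Assumed stand-in for the estimator's own bandwidth selection.
    """
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(np.std(x, ddof=1), (q75 - q25) / 1.34)
    return 0.9 * spread * x.size ** (-0.2)


def safe_kde(x, grid, gene="<unnamed>"):
    """Gaussian KDE evaluated on a 1-D grid, or all-NaN if the bandwidth is unusable."""
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)

    # Compute the bandwidth up front, before touching the estimator itself.
    bw = silverman_bandwidth(x)
    if not np.isfinite(bw) or bw <= 0:
        # Warn and return NaN instead of aborting the whole computation.
        warnings.warn(f"KDE bandwidth is {bw} for gene {gene}; returning NaN")
        return np.full(grid.shape, np.nan)

    # Plain Gaussian kernel density estimate with the checked bandwidth.
    z = (grid[:, None] - x[None, :]) / bw
    return np.exp(-0.5 * z**2).sum(axis=1) / (x.size * bw * np.sqrt(2 * np.pi))
```

With this shape, a gene whose samples are almost all zero (so the interquartile range, and hence the bandwidth, is zero) produces a warning and `NaN` values, while well-behaved genes go through the KDE unchanged.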