Which column of final_report.txt should be used for strain abundance? #23

Open
haihao999 opened this issue Sep 18, 2024 · 7 comments

@haihao999

Hi,
In this result, should I select Predicted_Depth?
Strain_ID Strain_Name Cluster_ID Relative_Abundance_Inside_Cluster Predicted_Depth Coverage Covered/Total_kmr

Why is there no Predicted_Depth (Ab*cls_depth) column in my final_report.txt result?

@liaoherui
Owner

liaoherui commented Sep 18, 2024

Hi, thanks for using StrainScan!

For the first question, yes. The Predicted_Depth and Coverage columns can be used to infer the abundance of the identified strains.

For the second question, the possible reason is that the tool only performed cluster-level identification, meaning all identified strains belong to clusters with a size of 1. In this case, the Predicted_Depth (Ab*cls_depth) column is not provided, as it is only calculated for identified strains from clusters with a size greater than 1.

@haihao999
Author

Thank you very much for your reply. How should I choose if I encounter the following situation?
Coverage Predicted_Depth
0.98 26.66
0.93 9.9
0.72 7.73

@liaoherui
Owner

You should choose "Predicted_Depth" if your goal is to estimate the abundance of identified strains. "Coverage" here roughly reflects the percentage of genomic regions covered by k-mers.

@haihao999
Author

If I use environmental metagenomic data from stations with different sequencing depths, does summing the predicted depths at each station make sense? Thank you.

@liaoherui
Owner

Apologies for the late reply.

The predicted depth reflects the depth of each strain in the dataset and is influenced by sequencing depth. If your goal is to examine the relative strain diversity within each sample, you can still obtain relative abundance by normalizing "Predicted_Depth" within that sample.

However, if you aim to compare the absolute abundance of a specific strain across different samples, the results may be biased.
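A minimal sketch of that per-sample normalization, assuming the "Predicted_Depth" values have already been read from final_report.txt into a dictionary. The function name `relative_abundance` and the strain labels `C1` to `C3` are hypothetical and only for illustration; the depth values are the ones posted earlier in this thread.

```python
def relative_abundance(depths):
    """Divide each strain's Predicted_Depth by the sample's total depth.

    `depths` maps a strain label to its Predicted_Depth within ONE sample.
    The result is only meaningful inside that sample, not across samples.
    """
    total = sum(depths.values())
    if total <= 0:
        raise ValueError("total Predicted_Depth must be positive")
    return {strain: depth / total for strain, depth in depths.items()}


# Hypothetical strain labels; Predicted_Depth values taken from the thread.
sample = {"C1": 26.66, "C2": 9.9, "C3": 7.73}
fractions = relative_abundance(sample)

for strain, fraction in fractions.items():
    print(f"{strain}\t{fraction:.3f}")
```

Because each sample is divided by its own total, the fractions are comparable as within-sample proportions even when sequencing depth differs between samples. They say nothing about absolute abundance.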

@ZhangDengwei

Hi,

Still a little confused.

Let's say the result is as follows:

Strains Coverage Predicted_Depth
C1 0.98 26.66
C2 0.93 9.9
C3 0.72 7.73

So the relative abundance of the three strains within this sample should be:

C1 = 26.66 / (26.66 + 9.9 + 7.73) = 0.602
C2 = 9.9 / (26.66 + 9.9 + 7.73) = 0.224
C3 = 7.73 / (26.66 + 9.9 + 7.73) = 0.175

Please correct if I am wrong.

@liaoherui
Owner

Hi Dengwei,

In this context, "Coverage" refers to the ratio of how many k-mers in the cluster are covered; it does not correspond to "relative abundance." Therefore, the calculation here is incorrect.
