Which column of final_report.txt should be used for strain abundance? #23
Comments
Hi, thanks for using StrainScan! For the first question, yes: the Predicted_Depth and Coverage columns can be used to infer the abundance of the identified strains. For the second question, the likely reason is that the tool only performed cluster-level identification, meaning all identified strains belong to clusters of size 1. In this case, the Predicted_Depth (Ab*cls_depth) column is not provided, as it is only calculated for identified strains from clusters with a size greater than 1.
Thank you very much for your reply. Which column should I choose if I encounter the following situation?
You should choose "Predicted_Depth" if your goal is to estimate the abundance of identified strains. "Coverage" here roughly reflects the percentage of genomic regions covered by k-mers.
If I use environmental metagenomic data sequenced at different depths, is it meaningful to sum the predicted depths at each station? Thank you
Apologies for the late reply. The predicted depth reflects the depth of each strain in the dataset and is influenced by the sequencing depth. If your goal is to examine the relative strain diversity within each sample, you can still compute relative abundance by normalizing "Predicted_Depth" within that sample. However, if you aim to compare the absolute abundance of a specific strain across different samples, the results may be biased by differences in sequencing depth.
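For anyone who wants to see the normalization spelled out, here is a minimal sketch. It assumes you have already pulled the "Predicted_Depth" values for one sample out of final_report.txt into a dictionary; the function name `relative_abundance`, the strain labels, and the depth values are hypothetical and only for illustration, not part of StrainScan or taken from a real report.

```python
def relative_abundance(predicted_depths):
    """Normalize per-strain Predicted_Depth values so they sum to 1 within one sample.

    predicted_depths: dict mapping a strain label to its Predicted_Depth (float).
    """
    total = sum(predicted_depths.values())
    if total <= 0:
        raise ValueError("total predicted depth must be positive")
    return {strain: depth / total for strain, depth in predicted_depths.items()}


# Hypothetical depths for a single sample (not from a real final_report.txt).
sample = {"strain_A": 30.0, "strain_B": 15.0, "strain_C": 5.0}
rel = relative_abundance(sample)

for strain, fraction in rel.items():
    print(f"{strain}\t{fraction:.3f}")
```

The resulting fractions are only comparable within a sample. They do not remove the sequencing-depth bias when comparing absolute depth of one strain across samples.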
Hi, I am still a little confused. Let's say the result is as follows:
So the relative abundance of three strains within this sample should be:
Please correct me if I am wrong.
Hi Dengwei, In this context, "Coverage" is the fraction of k-mers in the cluster that are covered; it does not correspond to "relative abundance." Therefore, the calculation here is incorrect.
Hi,
In this result, should I select Predicted_Depth?
Strain_ID Strain_Name Cluster_ID Relative_Abundance_Inside_Cluster Predicted_Depth Coverage Covered/Total_kmr
Why is there no Predicted_Depth (Ab*cls_depth) column in my final_report.txt result?