Which column of final_report.txt should be used for strain abundance? #23

haihao999 · 2024-09-18T14:02:22Z

Hi,
In this result, should I select Predicted_Depth?
Strain_ID Strain_Name Cluster_ID Relative_Abundance_Inside_Cluster Predicted_Depth Coverage Covered/Total_kmr

Why is there no Predicted_Depth (Ab*cls_depth) column in my final_report.txt result?

liaoherui · 2024-09-18T18:10:04Z

Hi, thanks for using StrainScan!

For the first question, yes. The Predicted_Depth and Coverage columns can be used to infer the abundance of the identified strains.

For the second question, the possible reason is that the tool only performed cluster-level identification, meaning all identified strains belong to clusters with a size of 1. In this case, the Predicted_Depth (Ab*cls_depth) column is not provided, as it is only calculated for identified strains from clusters with a size greater than 1.
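Because the depth column's name depends on the run, as described above, a script reading the report may want to look the column up rather than hard-code it. The following is a minimal sketch, not part of StrainScan: `find_depth_columns` is a hypothetical helper, and it assumes the header has already been split into field names (the file's delimiter is not stated in this thread).

```python
def find_depth_columns(header_fields):
    """Return every column name that starts with 'Predicted_Depth'.

    Hypothetical helper: matches both 'Predicted_Depth' and
    'Predicted_Depth (Ab*cls_depth)', the two names mentioned in this thread.
    """
    return [name for name in header_fields if name.startswith("Predicted_Depth")]


# Header fields as quoted in the first post of this thread.
header = [
    "Strain_ID",
    "Strain_Name",
    "Cluster_ID",
    "Relative_Abundance_Inside_Cluster",
    "Predicted_Depth",
    "Coverage",
    "Covered/Total_kmr",
]

depth_columns = find_depth_columns(header)
print(depth_columns)
```

Returning a list rather than a single name leaves it to the caller to decide what to do if a report contains more than one matching column.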

haihao999 · 2024-09-24T05:27:09Z

Thank you very much for your reply. How should I choose if I encounter the following situation?
Coverage Predicted_Depth
0.98 26.66
0.93 9.9
0.72 7.73

liaoherui · 2024-09-24T22:49:53Z

You should choose "Predicted_Depth" if your goal is to estimate the abundance of identified strains. "Coverage" here roughly reflects the percentage of genomic regions covered by k-mers.

haihao999 · 2024-11-19T02:31:52Z

If I use environmental metagenomic data whose samples have different sequencing depths, does it make sense to sum the predicted depths at each station? Thank you.

liaoherui · 2024-11-21T01:02:40Z

Apologies for the late reply.

The predicted depth reflects the depth of each strain in the dataset and is influenced by sequencing depths. If your goal is to examine the relative strain diversity within each sample, you can still use the relative abundance by normalizing the "Predicted_Depth."

However, if you aim to compare the absolute abundance of a specific strain across different samples, the results may be biased.
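The within-sample normalization mentioned above can be sketched as follows. This is an illustration under stated assumptions, not StrainScan code: `relative_abundance` is a hypothetical helper, the strain labels are placeholders, and the depth values are the three "Predicted_Depth" figures quoted earlier in this thread.

```python
def relative_abundance(depths):
    """Divide each strain's predicted depth by the sample's total depth.

    `depths` maps a strain label to its Predicted_Depth within ONE sample.
    The result sums to 1 and is only meaningful inside that sample.
    """
    total = sum(depths.values())
    if total <= 0:
        raise ValueError("total predicted depth must be positive")
    return {strain: depth / total for strain, depth in depths.items()}


# Placeholder strain labels; depth values taken from the example in this thread.
sample = {"strain_1": 26.66, "strain_2": 9.9, "strain_3": 7.73}
fractions = relative_abundance(sample)

for strain, fraction in fractions.items():
    print(f"{strain}\t{fraction:.3f}")
```

Normalizing per sample removes the effect of sequencing depth on the proportions within that sample; it does not make raw depths comparable across samples.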

ZhangDengwei · 2024-11-27T03:06:00Z

Hi,

Still a little confused.

Let's say the result is as follows:

Strains Coverage Predicted_Depth
C1 0.98 26.66
C2 0.93 9.9
C3 0.72 7.73

So the relative abundance of three strains within this sample should be:

C1 = 26.66 / (26.66 + 9.9 + 7.73) = 0.602
C2 = 9.9 / (26.66 + 9.9 + 7.73) = 0.224
C3 = 7.73 / (26.66 + 9.9 + 7.73) = 0.175

Please correct if I am wrong.

liaoherui · 2024-11-28T23:53:04Z

Hi Dengwei,

In this context, "Coverage" is the fraction of the cluster's k-mers that are covered; it does not correspond to "relative abundance." Therefore, the calculation here is incorrect.
