Releases: autosome-ru/ADASTRA-pipeline

release-Susan

13 Apr 14:25

ADASTRA release Susan (Apr 2021): README
Key changes and updates:

(1) Estimating the significance of individual ASBs: the weight parameter obtained by fitting the negative binomial mixture (applicable when scoring ASBs at BAD > 1) is now used as an informative prior. It is treated as the probability that the tested allele (the Reference allele for Ref-ASBs and the Alternative allele for Alt-ASBs) has a higher copy number than the other allele (whose read count is fixed), and thus has a higher ChIP-Seq read count independently of TF binding.

The posterior is calculated for each SNV and used for ASB scoring. The Bayes factor is calculated as the likelihood ratio of obtaining the observed ChIP-Seq read count at the tested allele under two hypotheses: the count agrees with the DNA copy number defined by BAD (the tested allele has the higher copy number) or contrasts with it (the tested allele has the lower copy number). The resulting posterior weight is used to compute the P-value and the effect size for individual SNVs.

This updated approach improves the statistical scoring of ASBs by reweighting the negative binomial mixture and emphasizing the component that is more likely to be the source of the observed read counts. This is especially important for cell type-ASBs, where the allele with the larger ChIP-Seq read count is commonly shared between experiments.
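The reweighting described above can be sketched in a few lines of Python. This is a simplified illustration, not the pipeline's actual code: it assumes an untruncated two-component negative binomial mixture in which the other allele's fixed read count `r` serves as the size parameter, and the names `nb_logpmf`, `component_probs`, `posterior_weight` and `tail_pvalue` are hypothetical. The prior `w` is the fitted mixture weight and must lie strictly between 0 and 1.

```python
import math


def nb_logpmf(k, r, p):
    """Log-pmf of a negative binomial: k 'failures' before r 'successes',
    with success probability p (mean = r * (1 - p) / p)."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def component_probs(bad):
    """Success probabilities of the two mixture components (assumed form).
    p_high: tested allele has the higher copy number (mean = r * bad).
    p_low:  tested allele has the lower copy number  (mean = r / bad)."""
    return 1.0 / (bad + 1.0), bad / (bad + 1.0)


def posterior_weight(x, r, bad, w):
    """Posterior probability that the tested allele has the higher copy
    number, given its read count x, the fixed count r and the prior w."""
    p_high, p_low = component_probs(bad)
    log_high = math.log(w) + nb_logpmf(x, r, p_high)
    log_low = math.log(1.0 - w) + nb_logpmf(x, r, p_low)
    m = max(log_high, log_low)  # subtract the maximum to avoid overflow
    e_high, e_low = math.exp(log_high - m), math.exp(log_low - m)
    return e_high / (e_high + e_low)


def tail_pvalue(x, r, bad, w):
    """Right-tail P(X >= x) under the mixture reweighted by the posterior."""
    p_high, p_low = component_probs(bad)
    w_post = posterior_weight(x, r, bad, w)

    def sf(p):
        below = sum(math.exp(nb_logpmf(k, r, p)) for k in range(x))
        return max(0.0, 1.0 - below)

    return w_post * sf(p_high) + (1.0 - w_post) * sf(p_low)
```

At BAD = 1 both components coincide, so the posterior equals the prior; at BAD > 1 a read count far above `r` pushes the posterior toward the higher-copy-number component, which is the emphasis the text describes.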

This improvement is the main difference from the published algorithm (doi:10.1101/2020.10.07.327643). The published algorithm had the disadvantage that different observations (experiments for the same SNV) sharing a common allele with the greater ChIP-Seq read count did not, in fact, comply with the 'global' fit of the negative binomial mixture model.

(2) BAD calling procedure changes: the penalty for generating additional segments in the BABACHI algorithm (https://github.com/autosome-ru/BABACHI) was changed to CAIC4 (CAIC with a multiplier of 4) instead of the multiplier of 9 used in Soos. This gives a minor but consistent improvement in the agreement of BAD maps with COSMIC.
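To give a feel for what the multiplier changes, here is a minimal sketch. It assumes the textbook CAIC penalty of (ln n + 1) per extra parameter, expressed in log-likelihood units (hence the factor 1/2), and simply scales it by the multiplier. The function name `caic_penalty` and its arguments are hypothetical, and the exact form used inside BABACHI may differ.

```python
import math


def caic_penalty(n_boundaries, n_snps, multiplier=4.0):
    """Penalty subtracted from the segmentation log-likelihood.
    Assumed form: multiplier * (1/2) * k * (ln n + 1), where k is the
    number of segment boundaries and n is the number of SNPs."""
    return multiplier * 0.5 * n_boundaries * (math.log(n_snps) + 1.0)


# Cost of one extra boundary on a chromosome with 10,000 SNPs (illustrative numbers)
cost_susan = caic_penalty(1, 10_000, multiplier=4.0)
cost_soos = caic_penalty(1, 10_000, multiplier=9.0)
```

Under this assumption, lowering the multiplier from 9 to 4 makes each extra boundary 2.25 times cheaper, so the segmentation can afford more, shorter segments.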

release-Soos

30 Aug 17:11
1f5708f

Pipeline used for ADASTRA data processing

release-0412

04 Dec 17:17
  1. The BAD estimation procedure was substantially reworked: SNPs with coverage below 8 are removed; the threshold for choosing CAIC/SQRT is 1.5*10^7 in total coverage (including the discarded SNPs); additional states: BAD = 1.33, 1.5, 2.5, 6

  2. sBAD scaling: 1->1, 1.33->1.5, 1.5->2, 2->3, 2.5->3, 3->4, 4->5, 5->6, 6->6

  3. Filter of >=25 on the maximum coverage during aggregation

  4. Three variants of p_value calculation: old, from binomial tests; cor, from censored tests; and bal, from censored and symmetrized tests, to rule out p_value = 1
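The numeric rules in items 1 to 3 can be written down as a small sketch. All function and constant names here are hypothetical. The direction of the CAIC/SQRT switch (CAIC below the threshold, SQRT at or above it) is an assumption made by analogy with release-0611, and the treatment of values exactly at a threshold is likewise assumed.

```python
MIN_SNP_COVERAGE = 8                   # SNPs with coverage below 8 are removed
PENALTY_SWITCH_COVERAGE = 1.5e7        # total coverage, discarded SNPs included
MIN_MAX_COVERAGE_FOR_AGGREGATION = 25  # filter on the maximum coverage

# sBAD scaling table, copied from item 2
SBAD_SCALE = {1: 1, 1.33: 1.5, 1.5: 2, 2: 3, 2.5: 3, 3: 4, 4: 5, 5: 6, 6: 6}


def filter_snps_for_bad(coverages):
    """Keep only SNPs with enough coverage for BAD estimation."""
    return [c for c in coverages if c >= MIN_SNP_COVERAGE]


def choose_penalty(coverages):
    """Pick the penalty type from total coverage over ALL SNPs,
    including those that filter_snps_for_bad would discard.
    Direction of the switch is an assumption (see the note above)."""
    return "CAIC" if sum(coverages) < PENALTY_SWITCH_COVERAGE else "SQRT"


def passes_aggregation_filter(coverages):
    """An SNP is aggregated only if its maximum coverage is at least 25."""
    return max(coverages) >= MIN_MAX_COVERAGE_FOR_AGGREGATION


def scale_bad(bad):
    """Map an estimated BAD to its scaled sBAD value."""
    return SBAD_SCALE[bad]
```

For example, `filter_snps_for_bad([5, 8, 30])` keeps `[8, 30]`, while `choose_penalty` still counts the discarded SNP with coverage 5 toward the total.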

release-1211

12 Nov 17:35

test release

  1. 'm_logpref', 'm_logpalt': a new p-value aggregation method; FDR is also computed from them

release-0611

06 Nov 11:59
  1. Peak annotation updated

  2. Repeat regions are now taken into account: each common SNP (with an rs ID) was assigned a repeat type (column repeat_type),
    non-rs SNPs in repeats were discarded

  3. New ploidy estimation procedure (calling of large deletions is separated; the penalty in deletions is 1 - 10**(-1/SQRT(N)),
    the boundary penalty is CAIC up to 30k SNPs and SQRT above that; states 1.5, 6)

  4. Separate effect sizes for ref and alt (columns m1_ref, m1_alt, m2_ref, m2_alt instead of m1, m2)

  5. m_callers no longer exists; there are total_callers and unique_callers instead
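The deletion penalty from item 3 is easy to evaluate directly. The sketch below only transcribes the formula 1 - 10**(-1/SQRT(N)); the notes do not define N, so it is treated here as a positive count, and the function names are hypothetical. Whether exactly 30,000 SNPs falls on the CAIC or the SQRT side is also an assumption.

```python
import math


def deletion_penalty(n):
    """Penalty in deletions, 1 - 10**(-1/sqrt(N)), for a positive N."""
    return 1.0 - 10.0 ** (-1.0 / math.sqrt(n))


def boundary_penalty_kind(n_snps, threshold=30_000):
    """CAIC up to 30k SNPs, SQRT above that (boundary case assumed CAIC)."""
    return "CAIC" if n_snps <= threshold else "SQRT"


# The penalty shrinks as N grows: 0.9 at N = 1, about 0.206 at N = 100
penalties = [deletion_penalty(n) for n in (1, 100, 10_000)]
```

The formula gives a value between 0 and 1 that decreases monotonically in N.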

New data are at globe:/home/abramov/RESULTS/release-0611/