LDA and ProdLDA #10

Open
pawel-czyz opened this issue Apr 11, 2023 · 0 comments


pawel-czyz commented Apr 11, 2023

Consider an admixture model in which each mutation $Y_{ng}\in \{0, 1\}$ is generated from a "topic" $Z_{ng}\in \{H, 1, \dots, K\}$, where $H$ is a "healthy" topic with $P(Y_{ng}=1\mid Z_{ng}=H) \ll 1$.

Then we can use an LDA-like model in which word positions are replaced by enumerated genes and the vocabulary at each position is $\{0, 1\}$, sampled from a Bernoulli distribution. Hence, the mixing matrix is again $\eta_{kg} = P(Y_{ng}=1\mid Z_{ng}=k)$, and it is interpretable (it can be made sparse using, e.g., a $\mathrm{Beta}(0.1, 0.1)$ prior, which concentrates mass near 0 and 1).
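
Written out, the generative process might look as follows. This is only a sketch: the per-sample topic proportions $\theta_n$ and the Dirichlet prior with concentration $\alpha$ are assumptions borrowed from standard LDA and are not specified above, and $n$ is assumed to index samples and $g$ genes.

$$
\begin{aligned}
\theta_n &\sim \mathrm{Dirichlet}(\alpha), \\
\eta_{kg} &\sim \mathrm{Beta}(0.1, 0.1), \\
Z_{ng} \mid \theta_n &\sim \mathrm{Categorical}(\theta_n), \\
Y_{ng} \mid Z_{ng} = k &\sim \mathrm{Bernoulli}(\eta_{kg}).
\end{aligned}
$$

Under these assumptions the discrete topic assignments can be summed out, which is usually convenient for gradient-based inference:

$$
P(Y_{ng} = 1 \mid \theta_n, \eta) = \sum_{k \in \{H, 1, \dots, K\}} \theta_{nk}\, \eta_{kg}.
$$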

Inference in LDA and the closely related ProdLDA can be implemented in, e.g., NumPyro.

This task should be split into several smaller tasks, for example:

  • Simulate data sets according to LDA and ProdLDA models.
  • Experiment with the implementation provided. Check whether the inferred parameters match those used in the simulations.
  • If the results are satisfactory, incorporate LDA and ProdLDA into the codebase.
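
The first task could start from something like the sketch below, which simulates the Bernoulli LDA-like model only (not ProdLDA) using the Python standard library. The function names `sample_dirichlet` and `simulate_admixture`, the symmetric Dirichlet prior on topic proportions, the fixed `healthy_rate`, and all default values are hypothetical choices made for illustration; none of them come from the codebase.

```python
import random


def sample_dirichlet(rng, alpha, size):
    """Draw from a symmetric Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_admixture(n_samples, n_genes, n_topics, alpha=1.0,
                       beta_a=0.1, beta_b=0.1, healthy_rate=0.01, seed=0):
    """Simulate binary mutations Y[n][g] from an LDA-like admixture model.

    Topic index 0 plays the role of the healthy topic H (assumption);
    indices 1..n_topics are the remaining topics.
    """
    rng = random.Random(seed)

    # Mixing matrix eta[k][g] = P(Y = 1 | Z = k) for gene g.
    # The healthy topic gets a small constant mutation probability.
    eta = [[healthy_rate] * n_genes]
    eta += [[rng.betavariate(beta_a, beta_b) for _ in range(n_genes)]
            for _ in range(n_topics)]

    theta, Z, Y = [], [], []
    for _ in range(n_samples):
        # Per-sample topic proportions over H and the n_topics other topics.
        theta_n = sample_dirichlet(rng, alpha, n_topics + 1)
        # One topic assignment per gene, then one Bernoulli draw per gene.
        z_n = rng.choices(range(n_topics + 1), weights=theta_n, k=n_genes)
        y_n = [int(rng.random() < eta[z][g]) for g, z in enumerate(z_n)]
        theta.append(theta_n)
        Z.append(z_n)
        Y.append(y_n)
    return eta, theta, Z, Y


if __name__ == "__main__":
    eta, theta, Z, Y = simulate_admixture(n_samples=4, n_genes=10, n_topics=2)
    for row in Y:
        print(row)
```

Returning `eta`, `theta` and `Z` alongside `Y` keeps the ground truth available, so inferred parameters can later be compared against it. A NumPy or JAX version would vectorize the loops; the standard library is used here only to keep the sketch dependency-free.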