Multinomial Logit and Variants From Scratch

Model files are standalone:

  • Standard Multinomial Logit: mlogit.py
  • Multinomial Logit With Latent Class: mlogit_latent_class.py

The utils.py file contains a numerically stable softmax and a numeric Hessian approximation via central finite differences. The Hessian may not be invertible, or its inverse may have non-positive diagonal entries, so standard errors are computed by taking the SVD of the Hessian and inverting the diagonal matrix.

Note: scipy.optimize.minimize changes beta between the gradient and Hessian evaluations, so caching the forward calculation is not worthwhile.
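
For reference, a minimal sketch of a central finite-difference Hessian approximation, assuming a scalar objective `f` (for example the negative log-likelihood) and a numpy parameter vector; the function name and signature are illustrative, not the exact API in utils.py:

```python
import numpy as np

def numeric_hessian(f, beta, eps=1e-5):
    """Approximate the Hessian of the scalar function f at beta
    with central finite differences (illustrative sketch only)."""
    m = beta.size
    hess = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            e_i = np.zeros(m); e_i[i] = eps
            e_j = np.zeros(m); e_j[j] = eps
            # central difference for the (i, j) second partial derivative
            hess[i, j] = (
                f(beta + e_i + e_j) - f(beta + e_i - e_j)
                - f(beta - e_i + e_j) + f(beta - e_i - e_j)
            ) / (4 * eps ** 2)
    return hess
```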

Standard Multinomial Logit

Assume each observation is IID.

Notation

N is the number of observations, T the number of choice situations per observation, K the number of choices, and M the number of features. X: [N,T,K,M], Y: [N,T], and $\beta$: [M,].

Utility for choice $k$ is

$$ U_{k} = X_{k}\beta $$

Let $U_{\max} = \max_k U_k$; subtracting it keeps the exponentials from overflowing without changing the probabilities. The probability of choosing $k$ is

$$ P_{k} = \frac{\exp(U_{k} - U_{\max})}{\sum_{j=1}^{K} \exp(U_j - U_{\max})} $$
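
A minimal numpy sketch of this max-shift trick (the stable softmax mentioned for utils.py); the function name is illustrative and the repo's implementation may differ:

```python
import numpy as np

def stable_softmax(utility):
    """Softmax over the last axis with the maximum subtracted first,
    so np.exp never overflows; the result is mathematically unchanged."""
    shifted = utility - utility.max(axis=-1, keepdims=True)
    exp_u = np.exp(shifted)
    return exp_u / exp_u.sum(axis=-1, keepdims=True)
```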

The log-likelihood for observation $i$ is

$$ \begin{align*} LL_i & = \sum_{k=1}^{K} Y_{ik} \log(P_{ik}) \\ & = \sum_{k=1}^{K} Y_{ik} (U_{ik} - U_{\max}) - \sum_{k=1}^{K} Y_{ik} \log\left( \sum_{j=1}^{K} \exp(U_{ij} - U_{\max}) \right) \end{align*} $$
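
A sketch of this log-likelihood as a minimization objective, assuming the time dimension is dropped so that X is [N, K, M] and Y is one-hot [N, K]; names and shapes are assumptions for illustration:

```python
import numpy as np

def neg_log_likelihood(beta, X, Y):
    """Negative sample log-likelihood of the standard MNL.
    Assumed shapes: X [N, K, M], Y one-hot [N, K], beta [M,]."""
    utility = X @ beta                                     # U_k, [N, K]
    shifted = utility - utility.max(axis=1, keepdims=True) # subtract U_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.sum(Y * log_prob)
```

It can be passed directly to scipy.optimize.minimize(neg_log_likelihood, beta0, args=(X, Y)).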

The Hessian is

$$ \begin{align*} H_{jm} & = \frac{\partial}{\partial \beta_j} \frac{\partial LL}{\partial \beta_m} \\ & = - \sum_{k=1}^{K} \frac{\partial P_{k}}{\partial \beta_j} X_{km} \\ & = - \sum_{k=1}^{K} P_k \left(X_{kj} - \sum_{l=1}^{K} P_l X_{lj} \right) X_{km} \end{align*} $$
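
A numpy sketch of this analytic Hessian, summed over observations, under the same assumed shapes as the log-likelihood sketch; the repo's own code may organize it differently:

```python
import numpy as np

def mnl_hessian(beta, X):
    """Analytic Hessian of the log-likelihood, summed over observations.
    Assumed shapes: X [N, K, M], beta [M,]; returns [M, M]."""
    utility = X @ beta                                        # [N, K]
    exp_u = np.exp(utility - utility.max(axis=1, keepdims=True))
    prob = exp_u / exp_u.sum(axis=1, keepdims=True)           # P_k, [N, K]
    x_bar = np.einsum('nk,nkm->nm', prob, X)                  # sum_l P_l X_l
    centered = X - x_bar[:, None, :]                          # X_k - sum_l P_l X_l
    return -np.einsum('nk,nkj,nkm->jm', prob, centered, X)
```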

The standard errors are

$$ se = \sqrt{diag\left((-H)^{-1}\right)} $$

or use SVD to decompose $-H = U D V^{T}$ and invert the diagonal matrix $D$ to form a pseudo-inverse, which sidesteps the invertibility problem mentioned above.
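
A sketch of the SVD route, assuming `hessian` is the Hessian of the log-likelihood (analytic or finite-difference); in practice very small singular values may still need to be clipped:

```python
import numpy as np

def standard_errors(hessian):
    """Standard errors via SVD of -H: invert the singular values
    instead of inverting -H directly, as described above."""
    u, d, vt = np.linalg.svd(-hessian)
    cov = vt.T @ np.diag(1.0 / d) @ u.T   # pseudo-inverse of -H
    return np.sqrt(np.diag(cov))
```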

Latent Class

Notation: H contains individual-level variables that determine the latent class probabilities, Q is the number of latent classes, and $P_{tkq}$ is the predicted probability of choice $k$ in latent class $q$ at time $t$. Data: Y: [N,T,K], X: [N,T,K,M], H: [N,T,L]. Parameters: $\beta$: [M,Q], $\gamma$: [L,Q]. Model:

$$ \begin{align*} P_{tkq} & = \frac{\exp(X_{tk} \beta_{q})}{\sum_{j=1}^{K} \exp(X_{tj} \beta_q)} \\ W_q & = \frac{\exp(\sum_t H_t \gamma_{q})}{\sum_{j=1}^{Q} \exp(\sum_t H_t \gamma_{j})} \\ L_{iq} & = \prod_t \prod_k P_{tkq}^{Y_{tk}} \\ L_i & = \sum_{q=1}^{Q} L_{iq} W_q \\ LL_i & = \log(L_i) \\ LL & = \sum_{i=1}^{N} LL_i \end{align*} $$
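
A numpy sketch of this likelihood using scipy's softmax, assuming one-hot Y of shape [N, T, K] and the other shapes listed above; it mirrors the equations directly (a log-sum-exp formulation would be numerically safer):

```python
import numpy as np
from scipy.special import softmax

def latent_class_log_likelihood(beta, gamma, X, Y, H):
    """Sample log-likelihood LL of the latent class model (sketch).
    Assumed shapes: X [N, T, K, M], Y one-hot [N, T, K], H [N, T, L],
    beta [M, Q], gamma [L, Q]."""
    prob = softmax(np.einsum('ntkm,mq->ntqk', X, beta), axis=-1)   # P_{tkq}
    log_l_iq = np.einsum('ntk,ntqk->nq', Y, np.log(prob))          # log L_{iq}
    weights = softmax(np.einsum('ntl,lq->nq', H, gamma), axis=-1)  # W_q
    l_i = np.sum(weights * np.exp(log_l_iq), axis=1)               # L_i
    return np.sum(np.log(l_i))
```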

Gradient

$$ \begin{align*} \frac{\partial LL}{\partial \beta_{q}} & = \sum_{i=1}^{N} \frac{1}{L_i} \frac{\partial}{\partial \beta_{q}} \sum_{q'=1}^{Q} W_{q'} L_{iq'} \\ & = \sum_{i=1}^{N} \frac{1}{L_i} W_q \frac{\partial L_{iq}}{\partial \beta_{q}} \end{align*} $$

where

$$ \begin{align*} \frac{\partial L_{iq}}{\partial \beta_{q}} & = L_{iq} \frac{\partial \log(L_{iq})}{\partial \beta_{q}} \\ & = L_{iq} \sum_t \sum_k Y_{tk} \frac{1}{P_{tkq}} \left[ P_{tkq} \left( X_{tk}^{T} - \sum_{j=1}^{K} P_{tjq} X_{tj}^{T} \right) \right] \\ & = L_{iq} \sum_t \sum_k Y_{tk} \left( X_{tk}^{T} - \sum_{j=1}^{K} P_{tjq} X_{tj}^{T} \right) \\ & = L_{iq} \sum_t \sum_k X_{tk}^{T} \left( Y_{tk}-P_{tkq} \right) \end{align*} $$

where the last step uses $\sum_k Y_{tk} = 1$.

Substituting back gives

$$ \frac{\partial LL}{\partial \beta_{q}} = \sum_{i=1}^{N} \frac{1}{L_i} W_q L_{iq} \sum_t \sum_k X_{tk}^{T} (Y_{tk}-P_{tkq}) $$
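
A numpy sketch of this gradient with the same assumed shapes as the likelihood sketch; it returns an [M, Q] array, one column per latent class:

```python
import numpy as np
from scipy.special import softmax

def latent_class_grad_beta(beta, gamma, X, Y, H):
    """Gradient of LL with respect to beta (the formula above); sketch only.
    Assumed shapes: X [N, T, K, M], Y one-hot [N, T, K], H [N, T, L]."""
    prob = softmax(np.einsum('ntkm,mq->ntqk', X, beta), axis=-1)   # P_{tkq}, [N, T, Q, K]
    l_iq = np.exp(np.einsum('ntk,ntqk->nq', Y, np.log(prob)))      # L_{iq},  [N, Q]
    weights = softmax(np.einsum('ntl,lq->nq', H, gamma), axis=-1)  # W_q,     [N, Q]
    l_i = np.sum(weights * l_iq, axis=1, keepdims=True)            # L_i,     [N, 1]
    resid = Y[:, :, None, :] - prob                                # Y_{tk} - P_{tkq}
    score = np.einsum('ntkm,ntqk->nqm', X, resid)                  # sum_t sum_k X^T (Y - P)
    return np.einsum('nq,nqm->mq', weights * l_iq / l_i, score)
```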
