IRLS for Multinomial Logistic Regression — 5 Classes

Softmax link, no intercept, K = 5 classes (class 5 is the reference). Parameters: (K−1)×p = 4×2 = 8, so the Fisher information is an 8×8 block matrix (a 4×4 grid of 2×2 blocks).
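The setup above can be sketched in NumPy as follows. This is a minimal illustration, not the page's own implementation: the simulated data, the "true" coefficients `B_true`, the sample size, the small `ridge` term, and the helper names `softmax_probs`, `irls_step`, and `fit_irls` are all assumptions made for the example. It uses the Newton/Fisher-scoring form of the update, which for the softmax link coincides with IRLS.

```python
import numpy as np

def softmax_probs(X, B):
    """P(Y=k|x) for all K classes; B is (K-1, p), reference class has zero coefficients."""
    eta = np.hstack([X @ B.T, np.zeros((X.shape[0], 1))])  # (n, K), last column = reference
    eta -= eta.max(axis=1, keepdims=True)                  # stabilise the exponentials
    P = np.exp(eta)
    return P / P.sum(axis=1, keepdims=True)

def irls_step(X, Y, B, ridge=1e-8):
    """One Fisher-scoring update. Y is (n, K-1) one-hot over the non-reference classes."""
    n, p = X.shape
    Km1 = B.shape[0]
    P = softmax_probs(X, B)[:, :Km1]
    # Score vector, stacked class by class: block k is sum_i (y_ik - p_ik) x_i.
    score = ((Y - P).T @ X).ravel()
    # Fisher information: block (k, l) is sum_i p_ik (delta_kl - p_il) x_i x_i^T.
    info = np.zeros((Km1 * p, Km1 * p))
    for k in range(Km1):
        for l in range(Km1):
            w = P[:, k] * (float(k == l) - P[:, l])
            info[k * p:(k + 1) * p, l * p:(l + 1) * p] = (X * w[:, None]).T @ X
    # Tiny ridge (an assumption of this sketch) guards against a near-singular solve.
    delta = np.linalg.solve(info + ridge * np.eye(Km1 * p), score)
    return B + delta.reshape(Km1, p), info

def fit_irls(X, Y, n_iter=25):
    """Run a fixed number of updates from B = 0; returns the estimate and the last information matrix."""
    B = np.zeros((Y.shape[1], X.shape[1]))
    info = None
    for _ in range(n_iter):
        B, info = irls_step(X, Y, B)
    return B, info

# Simulated data (hypothetical values): K = 5 classes, p = 2 features, no intercept.
rng = np.random.default_rng(0)
n, K, p = 500, 5, 2
X = rng.normal(size=(n, p))
B_true = np.array([[1.0, -0.5], [-1.0, 0.5], [0.5, 1.0], [-0.5, -1.0]])
P_true = softmax_probs(X, B_true)
u = rng.random(n)
y = np.minimum((u[:, None] > np.cumsum(P_true, axis=1)).sum(axis=1), K - 1)
Y = np.eye(K)[y][:, :K - 1]   # drop the reference-class column

B_hat, info = fit_irls(X, Y)
print("estimated coefficients (rows = classes 1..4):")
print(B_hat)
print("Fisher information shape:", info.shape)
```

The explicit double loop over blocks is kept for readability, since it mirrors the 4×4 grid of 2×2 blocks; a vectorised construction would be preferable for larger K or p.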
[Panel] Data & decision regions
[Panel] Parameter traces (βk over iterations): solid = βk1, dashed = βk2; thin lines = true values.
[Panel] Per-observation predicted probabilities: stacked bars = P(Y=k|x) at current β; black tick = true class.
[Panel] Convergence (log-log): IRLS vs. gradient descent (GD)
[Panel] IRLS computation at current iteration
[Panel] Convergence history