IRLS for Multinomial Logistic Regression — 5 Classes
Softmax link, no intercept, K = 5 classes (class 5 is the reference). Parameters: (K−1)×p = 4×2 = 8, so the Fisher information is an 8×8 block matrix (a 4×4 grid of 2×2 blocks).
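The setup above can be sketched in NumPy as follows. This is a minimal illustration, not the code behind the demo: the function names, the simulated data, the seed, and the values in `B_true` are all assumptions made for the example. It builds the 8×8 Fisher information block by block and applies the IRLS (Fisher scoring) update, which coincides with Newton's method for the softmax link.

```python
import numpy as np

def softmax_probs(X, B):
    """Return the (n, K) matrix of P(Y=k|x); class K is the reference (eta = 0)."""
    eta = np.hstack([X @ B.T, np.zeros((X.shape[0], 1))])
    eta = eta - eta.max(axis=1, keepdims=True)  # guard against overflow in exp
    P = np.exp(eta)
    return P / P.sum(axis=1, keepdims=True)

def score_and_information(X, Y, B):
    """Score vector (length (K-1)*p) and Fisher information ((K-1)*p square)."""
    n, p = X.shape
    K1 = B.shape[0]                          # number of non-reference classes
    P = softmax_probs(X, B)[:, :K1]          # drop the reference column
    score = ((Y[:, :K1] - P).T @ X).ravel()  # stacked class by class
    info = np.zeros((K1 * p, K1 * p))
    for k in range(K1):
        for l in range(K1):
            # Block (k, l) = sum_i p_ik (delta_kl - p_il) x_i x_i^T
            w = P[:, k] * (float(k == l) - P[:, l])
            info[k * p:(k + 1) * p, l * p:(l + 1) * p] = (X * w[:, None]).T @ X
    return score, info

def log_likelihood(X, Y, B):
    return float(np.sum(Y * np.log(softmax_probs(X, B))))

# Hypothetical simulated data: n = 200, p = 2 features, K = 5 classes.
rng = np.random.default_rng(0)
n, p, K = 200, 2, 5
X = rng.normal(size=(n, p))
B_true = np.array([[1.5, 0.0], [0.0, 1.5], [-1.5, 0.0], [0.0, -1.5]])  # assumed
P_true = softmax_probs(X, B_true)
labels = np.array([rng.choice(K, p=row) for row in P_true])
Y = np.eye(K)[labels]                        # one-hot responses, shape (n, K)

B_hat = np.zeros((K - 1, p))                 # start from beta = 0
ll_start = log_likelihood(X, Y, B_hat)
for _ in range(25):
    score, info = score_and_information(X, Y, B_hat)
    B_hat = B_hat + np.linalg.solve(info, score).reshape(K - 1, p)

score, info = score_and_information(X, Y, B_hat)
ll_end = log_likelihood(X, Y, B_hat)
print(info.shape, ll_start, ll_end)
```

The sketch takes full steps with no step-halving or line search, which is usually adequate when the classes overlap; a production implementation would normally add a safeguard.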
Controls: sample size n (30, 50, 100, 200); GD step size α = 0.020; steps (100, 300, 1000).
Data & decision regions
Parameter traces (β_k over iterations)
Solid = β_k1, dashed = β_k2. Thin lines = true values.
Per-observation predicted probabilities
Stacked bars = P(Y=k|x) at current β. Black tick = true class.
Convergence (log-log): IRLS vs. GD
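For comparison with IRLS, a gradient-descent baseline might look like the sketch below. The step size α = 0.020 and the 1000 steps are taken from the controls; everything else is an assumption for illustration, in particular the use of the mean (rather than summed) log-likelihood gradient, the simulated data, and the names `mean_gradient` and `grad_norms`. The page does not state which scaling it uses.

```python
import numpy as np

def softmax_probs(X, B):
    """Return the (n, K) matrix of P(Y=k|x); class K is the reference (eta = 0)."""
    eta = np.hstack([X @ B.T, np.zeros((X.shape[0], 1))])
    eta = eta - eta.max(axis=1, keepdims=True)  # guard against overflow in exp
    P = np.exp(eta)
    return P / P.sum(axis=1, keepdims=True)

def mean_gradient(X, Y, B):
    """Gradient of the mean log-likelihood with respect to B, shape (K-1, p)."""
    K1 = B.shape[0]
    P = softmax_probs(X, B)[:, :K1]
    return (Y[:, :K1] - P).T @ X / X.shape[0]

# Hypothetical simulated data: n = 200, p = 2 features, K = 5 classes.
rng = np.random.default_rng(1)
n, p, K = 200, 2, 5
X = rng.normal(size=(n, p))
B_true = np.array([[1.5, 0.0], [0.0, 1.5], [-1.5, 0.0], [0.0, -1.5]])  # assumed
labels = np.array([rng.choice(K, p=row) for row in softmax_probs(X, B_true)])
Y = np.eye(K)[labels]

alpha, steps = 0.020, 1000      # values shown in the controls
B = np.zeros((K - 1, p))
grad_norms = []                 # one entry per iteration, for a convergence plot
for t in range(steps):
    g = mean_gradient(X, Y, B)
    grad_norms.append(float(np.linalg.norm(g)))
    B = B + alpha * g           # ascent step on the log-likelihood

print(grad_norms[0], grad_norms[-1])
```

Plotting `grad_norms` against the iteration index on log-log axes gives a curve of the kind shown in the convergence panel. With a fixed small step the decrease is only linear in rate, whereas the IRLS update uses curvature information and typically needs far fewer iterations.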
IRLS computation at current iteration
Convergence history