A History of Asymptotic Theory – Hirofumi Shiba

Let \(T\) be a directed set of time parameters and \((\Omega,\mathcal{F},(\mathcal{F}_t)_{t\in T},\operatorname{P})\) a filtered sample space. On this space we consider a parametrized family of probability distributions \(\{P_\theta\}_{\theta\in\Theta}\subset\mathcal{P}(\Omega)\).

Let \((\Theta,d)\) be a compact metric space, and regard \(H:=C(\Theta;\mathbb{R}^p)\) as a Banach space with respect to the uniform norm.

1 Minimum Contrast Estimators

Definition (contrast function) (Pfanzagl, 1969)

A measurable function \(C:\Omega\to H\) is called a contrast function if it satisfies the following:¹ \[ E_\theta[C(\theta)]<E_\theta[C(\theta')],\qquad(\theta\ne\theta'). \]

Alternative name (Extremum Estimator)

Like the minimum contrast estimator, an estimator \(\widehat{\theta}_n\) obtained by minimizing a given sample-dependent function \(\widehat{Q}_n(\theta)\) is also called an extremum estimator (Manski, 1975), (Amemiya, 1985), (Newey and McFadden, 1994).

The consistency proof in this case is given in exactly the same way (Amemiya, 1973, Lemma 3), (Newey and McFadden, 1994, Theorem 2.1), so the two can be regarded as essentially the same framework.

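The term minimum contrast estimator was first introduced in (Pfanzagl, 1969). Since then it appears to have remained in wide use among European authors and, in terms of field, in statistical inference for stochastic processes.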

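By suitably designing a function \(C\) with this kind of identifiability (in a slightly broader sense than above), together with a sequence \(C_t\) converging to it, one can construct a sequence of estimators \(\widehat{\theta}_t\) that converges to a target value \(\theta^*\in\Theta\) without requiring \(\operatorname{P}\in\{P_\theta\}\):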

Theorem (consistency of asymptotic minimum contrast estimators)

Let \(C_t:\Omega\to H\) be a sequence of \(\mathcal{F}_t\)-measurable functions and \(\Theta'\subset\Theta\) a subset. Suppose that \(C_t\) converges in probability to a limit \(C\). If \(\widehat{\theta}_t:\Omega\to\Theta\) satisfies the following two conditions, then \(d(\widehat{\theta}_t,\Theta')\overset{\text{p}}{\to}0\):

Identifiability condition on \(C\): for every \(\eta>0\), \[ \inf_{d(\theta,\Theta')\ge\eta}C(\theta)-\inf C(\Theta')>0. \]
Convergence condition on \(C_t\): for every \(\epsilon>0\), \[ \operatorname{P}\left[C_t(\widehat{\theta}_t)-\inf C_t(\Theta)>\epsilon\right]\to0. \]
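The statement can be illustrated numerically. The following is a minimal sketch under assumptions that are not in the text: \(\Theta\) is replaced by a finite grid on \([-3,3]\), the contrast is the mean absolute deviation (whose limit is minimized at the median), and the data are i.i.d. \(N(1,1)\), so that \(\Theta'=\{1\}\). The function names are hypothetical.

```python
import random

def contrast(theta, xs):
    """Empirical contrast C_n(theta): mean absolute deviation from theta."""
    return sum(abs(x - theta) for x in xs) / len(xs)

def min_contrast_estimate(xs, grid):
    """Minimize the empirical contrast over a finite grid standing in for Theta."""
    return min(grid, key=lambda theta: contrast(theta, xs))

random.seed(0)
true_median = 1.0                                # Theta' = {1.0} for N(1, 1) data
grid = [-3.0 + 0.02 * k for k in range(301)]     # grid on [-3, 3] (assumed Theta)

errors = {}
for n in (50, 500, 5000):
    xs = [random.gauss(true_median, 1.0) for _ in range(n)]
    theta_hat = min_contrast_estimate(xs, grid)
    errors[n] = abs(theta_hat - true_median)     # d(theta_hat, Theta')
    print(n, round(theta_hat, 2), round(errors[n], 2))
```

Because the minimizer is taken over the grid itself, the convergence condition holds trivially here; the distance to \(\Theta'\) can only shrink down to the grid resolution.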

Example (log-likelihood and KL divergence)

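The negative log-likelihood ratio is a contrast function: \[ C(-;\theta)=-\log\frac{d P_\theta}{d P_{\theta^*}}(-). \] In this case, \[ E_{\theta^*}[C(\theta)]=\operatorname{KL}(P_{\theta^*},P_\theta). \] Even when the true data-generating distribution \(\operatorname{P}\) is not contained in \(\{P_\theta\}\), the theorem remains valid, and the estimator still has some limit set \(\Theta'\subset\Theta\).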

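For general contrast functions, (Golubev and Spokoiny, 2009) discuss that, even under misspecification, this set \(\Theta'\) consists of points minimizing a certain distance to the true model \(\operatorname{P}\), and that results resembling a large deviation principle hold.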
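The misspecified case can be made concrete with a small simulation. This is only a sketch under assumed choices that do not come from the text: the data follow Uniform(0, 1), the model is the exponential family with rate \(\theta\), and the contrast is the average negative log-likelihood \(-\log\theta+\theta\bar{x}\). For this pair, \(\operatorname{KL}(\operatorname{P},P_\theta)\) equals a constant plus \(-\log\theta+\theta/2\), which is minimized at \(\theta=2\).

```python
import math
import random

def neg_loglik(theta, xs_mean):
    """Average negative log-likelihood of Exponential(rate=theta).

    It depends on the data only through the sample mean.
    """
    return -math.log(theta) + theta * xs_mean

random.seed(1)
n = 20000
xs = [random.random() for _ in range(n)]       # true law: Uniform(0, 1), outside the model
xs_mean = sum(xs) / n

grid = [0.5 + 0.01 * k for k in range(451)]    # assumed parameter set [0.5, 5.0]
theta_hat = min(grid, key=lambda th: neg_loglik(th, xs_mean))

theta_kl = 2.0                                 # argmin of KL(P, P_theta) = 1 / E[X]
print(round(theta_hat, 2), theta_kl)
```

The estimator settles near the KL minimizer even though no exponential distribution generated the data.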

Example (generalized method of moments (Hansen, 1982))

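A "moment function" is a vector-valued function \(g\) satisfying \(\operatorname{E}[g(z,\theta_0)]=0\) at the true parameter value \(\theta_0\). Given such a \(g\), the procedure that drives the sample moment toward \(0\) through a positive semidefinite matrix \(\widehat{W}\), by minimizing \[ C_n(\theta)=G_n^\top\widehat{W}G_n,\qquad G_n:=\frac{1}{n}\sum_{i=1}^ng(z_i,\theta), \] is called GMM (Generalized Method of Moments).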

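The setting in which an objective function of this form arises most naturally is the instrumental variables method, which corresponds to taking \[ G_n:=\frac{1}{n}\sum_{i=1}^nz_i(y_i-x_i^\top\beta). \]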
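For a scalar regressor and a scalar instrument the moment condition \(G_n=0\) can be solved in closed form, and the weight \(\widehat{W}\) plays no role. The following sketch uses a hypothetical simulated design (not from the text) in which the regressor is correlated with the error, so that least squares is biased while the instrumental variables estimator is not.

```python
import random

def iv_estimate(zs, xs, ys):
    """Solve the scalar moment condition (1/n) sum z_i (y_i - x_i * beta) = 0 for beta."""
    return sum(z * y for z, y in zip(zs, ys)) / sum(z * x for z, x in zip(zs, xs))

def ols_estimate(xs, ys):
    """Least-squares slope without intercept, for comparison."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

random.seed(2)
beta_true, n = 2.0, 20000
zs, xs, ys = [], [], []
for _ in range(n):
    z = random.gauss(0.0, 1.0)    # instrument: independent of the error u
    u = random.gauss(0.0, 1.0)    # structural error
    x = z + u                     # regressor correlated with u (endogenous)
    zs.append(z)
    xs.append(x)
    ys.append(beta_true * x + u)

beta_iv = iv_estimate(zs, xs, ys)
beta_ols = ols_estimate(xs, ys)
print(round(beta_iv, 2), round(beta_ols, 2))
```

In this design the least-squares slope converges to \(\beta+\operatorname{E}[xu]/\operatorname{E}[x^2]=2.5\), whereas the instrumental variables estimate stays close to \(\beta=2\).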

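The optimal choice of the weight \(\widehat{W}\) in this framework is discussed in (Newey and McFadden, 1994, p. 2170 Section 5.4), among others. For example, least squares minimizes \[ \sum_{i=1}^n(y_i-h(x_i,\theta))^2, \] and its first-order condition is a GMM moment condition with \(g(z,\theta)=\partial_\theta h(x,\theta)(y-h(x,\theta))\). In this case the optimal choice is to weight each observation by the inverse of the conditional variance of the error, \(\mathrm{V}[\epsilon|X=x]=\operatorname{E}[\epsilon^2|X=x]\). The resulting estimator is also called GLS (Generalized Least Squares).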
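The effect of weighting by the inverse conditional variance can be checked by Monte Carlo. This is a sketch under assumptions chosen purely for illustration: a linear model \(y=\theta x+\epsilon\) without intercept, regressors drawn from Uniform(0.2, 3), and a conditional variance \(x^2\) that is treated as known.

```python
import random

def slope(xs, ys, ws):
    """Weighted least-squares slope for the model y = theta * x + eps (no intercept)."""
    num = sum(w * x * y for x, y, w in zip(xs, ys, ws))
    den = sum(w * x * x for x, w in zip(xs, ws))
    return num / den

def variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

random.seed(3)
theta_true, n, reps = 1.0, 100, 2000
ols, wls = [], []
for _ in range(reps):
    xs = [random.uniform(0.2, 3.0) for _ in range(n)]
    # Heteroskedastic errors: Var[eps | X = x] = x**2 (assumed known here)
    ys = [theta_true * x + random.gauss(0.0, x) for x in xs]
    ols.append(slope(xs, ys, [1.0] * n))                    # unweighted
    wls.append(slope(xs, ys, [1.0 / x ** 2 for x in xs]))   # inverse-variance weights

var_ols, var_wls = variance(ols), variance(wls)
print(var_ols > var_wls)
```

Both estimators are consistent for \(\theta\); the weighted version only reduces the sampling variance.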

1.1 Minimum Distance Estimators

1.2 \(M\)-Estimators

2 \(Z\)-Estimators

Theorem (asymptotic distribution of \(Z\)-estimators)

Let \(\varphi_t:\Omega\to H\) be a sequence of \(\mathcal{F}_t\)-measurable functions that are of class \(C^1\) in a neighborhood of \(\theta^*\in\Theta^\circ\), and let \((\widehat{\theta}_t)\) be an \((\mathcal{F}_t)\)-adapted process. If the following conditions hold, then \[ c_t^{-1}(\widehat{\theta}_t-\theta^*)=\Gamma^{-1}_{\theta^*}\Delta_t(\theta^*)+o_p(1). \]

Consistency: \(\widehat{\theta}_t\overset{\text{p}}{\to}\theta^*\).
Asymptotic zero: for some sequence of positive numbers \(b_t\) converging to \(0\), \[ \Delta_t(\widehat{\theta}_t):=b_t\varphi_t(\widehat{\theta}_t)\overset{\text{p}}{\to}0. \]
Central limit theorem: there exists a random variable \(\Delta_{\theta^*}\) such that \[ \Delta_t(\theta^*)\overset{\text{d}}{\to}\Delta_{\theta^*}. \]
Local uniform law of large numbers: for some sequence of positive numbers \(c_t\) converging to \(0\) and some random variable \(\Gamma_{\theta^*}\), \[ \sup_{\lvert\theta-\theta^*\rvert\le\eta}\lvert\Gamma_t(\theta)-\Gamma_{\theta^*}\rvert\overset{\text{p}}{\to}0\qquad\text{as }(\eta,t)\to(0,\infty), \] where \[ \Gamma_t:=-b_t(\partial_\theta\varphi_t)c_t. \]
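The expansion behind this theorem can be sketched as follows. This is an outline only, under two assumptions that are not stated above: \(\Gamma_{\theta^*}\) is almost surely invertible, and \(\theta'_t\) denotes an intermediate point between \(\widehat{\theta}_t\) and \(\theta^*\) supplied by the mean value theorem. Multiplying the first-order expansion of \(\varphi_t\) around \(\theta^*\) by \(b_t\) gives \[ \Delta_t(\widehat{\theta}_t)=\Delta_t(\theta^*)+b_t\,\partial_\theta\varphi_t(\theta'_t)\,c_t\cdot c_t^{-1}(\widehat{\theta}_t-\theta^*)=\Delta_t(\theta^*)-\Gamma_t(\theta'_t)\,c_t^{-1}(\widehat{\theta}_t-\theta^*). \] The left-hand side is \(o_p(1)\) by the asymptotic zero condition. Consistency together with the local uniform law of large numbers yields \(\Gamma_t(\theta'_t)\overset{\text{p}}{\to}\Gamma_{\theta^*}\), and the central limit theorem yields \(\Delta_t(\theta^*)=O_p(1)\). Solving for the normalized error then gives \[ c_t^{-1}(\widehat{\theta}_t-\theta^*)=\Gamma_t(\theta'_t)^{-1}\left(\Delta_t(\theta^*)-\Delta_t(\widehat{\theta}_t)\right)=\Gamma_{\theta^*}^{-1}\Delta_t(\theta^*)+o_p(1). \]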

Example (maximum likelihood estimator (Fisher, 1925))

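When the true value satisfies \(\theta^*\in\Theta^\circ\) and the log-likelihood is of class \(C^2\) (so that the score \(\psi\) is of class \(C^1\)), the maximum likelihood estimator \(\widehat{\theta}_n\) can be understood as a solution of the estimating equation \[ 0=\frac{1}{n}\sum_{i=1}^n\psi(z_i;\widehat{\theta}_n),\quad\psi(z;\theta):=\partial_\theta\log f(z|\theta). \] By the mean value theorem, there exists \(\theta'\) between \(\widehat{\theta}_n\) and \(\theta^*\) such that² \[ 0=\frac{1}{n}\sum_{i=1}^n\psi(z_i;\theta^*)+\left(\frac{1}{n}\sum_{i=1}^n\partial_\theta\psi(z_i;\theta')\right)(\widehat{\theta}_n-\theta^*), \] so the normalized error can be viewed as the product of a factor of law-of-large-numbers order and a factor of central-limit-theorem order: \[ \sqrt{n}(\widehat{\theta}_n-\theta^*)=-\left(\frac{1}{n}\sum_{i=1}^n\partial_\theta\psi(z_i;\theta')\right)^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n\psi(z_i;\theta^*). \tag{1}\] Note that the law of large numbers for the factor inside the inverse must hold uniformly over the values that \(\theta'\) can take sufficiently close to \(\theta^*\).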

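Finally, the empirical score factor has limiting distribution \(N(0,J)\), and the averaged-derivative factor converges to the Hessian \(H\) of the expected log-likelihood. By Slutsky's lemma, the limiting distribution is therefore \(N(0,H^{-1}JH^{-1})\), with a sandwich-form variance. Note that for maximum likelihood estimation under a correctly specified model, \(H=-J\) holds, so the variance reduces to \(J^{-1}\).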
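The gap between the sandwich variance and the model-based variance shows up as soon as the model is misspecified. The following sketch uses an assumed setting that is not from the text: data from Uniform(0, 1) fitted by the exponential model with rate \(\theta\), for which the score is \(\psi(x;\theta)=1/\theta-x\). Here `H_hat` and `J_hat` are hypothetical names for plug-in estimates of the Hessian \(H\) and the score variance \(J\) (this \(H\) is unrelated to the function space \(H\)).

```python
import random

random.seed(4)
n = 20000
xs = [random.random() for _ in range(n)]     # data from Uniform(0, 1), not exponential

theta_hat = n / sum(xs)                      # root of sum psi(x_i; theta) = 0

def psi(x, theta):
    """Score of the Exponential(rate=theta) model: d/dtheta log(theta * exp(-theta * x))."""
    return 1.0 / theta - x

def dpsi(x, theta):
    """Derivative of the score in theta (it does not depend on x for this model)."""
    return -1.0 / theta ** 2

J_hat = sum(psi(x, theta_hat) ** 2 for x in xs) / n   # plug-in variance of the score
H_hat = sum(dpsi(x, theta_hat) for x in xs) / n       # plug-in mean Hessian
sandwich = J_hat / H_hat ** 2                         # H^{-1} J H^{-1} in the scalar case
model_based = -1.0 / H_hat                            # valid only if H = -J

print(round(sandwich, 2), round(model_based, 2))
```

In this setting the delta method applied to \(\widehat{\theta}_n=1/\bar{x}\) gives the asymptotic variance \(\mathrm{V}[X]/\operatorname{E}[X]^4=4/3\), which the sandwich estimate tracks, while the model-based value stays near \(\theta^2=4\).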

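In general, the estimating function \(\psi(z;\theta)\) can be viewed as quantifying, in the sense of equation (1), the asymptotic influence of the data point \(z\) on the estimator. A function \(\psi\) that provides a first-order approximation of the form (1) is also called an influence function.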

Example (generalized estimating equations (Liang and Zeger, 1986))

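The method of generalized estimating equations refers to the \(Z\)-estimator defined, for a generalized linear model with mean \(\mu(\beta)\), by \[ U(\beta):=\sum_{i=1}^N\left(\frac{\partial \mu_i}{\partial \beta}\right)^\top V_i^{-1}(Y_i-\mu_i(\beta)), \] where \(V_i\) is a working covariance matrix for \(Y_i\).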

The condition \(U=0\) nearly coincides with the first-order optimality condition derived from the GLS objective function with \(V_i\) as the weighting.
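The correspondence can be made explicit with a short calculation. This is a sketch that assumes \(\partial\mu_i/\partial\beta\) denotes the Jacobian of \(\mu_i\) with respect to \(\beta\), and \(Q\) is a name introduced here for the GLS objective, \[ Q(\beta):=\sum_{i=1}^N(Y_i-\mu_i(\beta))^\top V_i^{-1}(Y_i-\mu_i(\beta)). \] If \(V_i\) is held fixed, that is, treated as not depending on \(\beta\), then \[ \frac{\partial Q}{\partial\beta}(\beta)=-2\sum_{i=1}^N\left(\frac{\partial\mu_i}{\partial\beta}\right)^\top V_i^{-1}(Y_i-\mu_i(\beta))=-2\,U(\beta), \] so the stationary points of \(Q\) are exactly the roots of \(U\). The agreement is only approximate in general because the working covariance \(V_i\) typically depends on \(\beta\) through the variance function, and the estimating equation ignores the derivative of \(V_i\).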

References

Amemiya, T. (1973). Regression analysis when the dependent variable is truncated normal. Econometrica, 41(6), 997–1016.

Amemiya, T. (1985). Advanced econometrics. Harvard University Press.

Fisher, R. A. (1925). Theory of statistical estimation. Mathematical Proceedings of the Cambridge Philosophical Society, 22(5), 700–725.

Golubev, Y., and Spokoiny, V. (2009). Exponential bounds for minimum contrast estimators. Electronic Journal of Statistics, 3, 712–746.

Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50(4), 1029–1054.

Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In L. M. Le Cam and J. Neyman, editors, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pages 221–233.

Liang, K.-Y., and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73(1), 13–22.

Manski, C. F. (1975). Maximum score estimation of the stochastic utility model of choice. Journal of Econometrics, 3(3), 205–228.

Newey, W. K., and McFadden, D. (1994). Chapter 36: Large sample estimation and hypothesis testing. In Handbook of Econometrics, Vol. 4, pages 2111–2245. Elsevier.

Pfanzagl, J. (1969). On the measurability and consistency of minimum contrast estimates. Metrika, 14(1), 249–272.

Footnotes

¹ The term was introduced in (Pfanzagl, 1969, p. 250, Definition 1.1) as an extension of (Huber, 1967).
² Care is needed regarding the measurability of \(\theta':\Omega\to\Theta\).