Variance estimation for transformations of estimated parameters
Part of LearnByCode — MATLAB toolboxes for econometric methods
You estimated a regression and obtained coefficient estimates $\hat\beta_1, \hat\beta_2, \ldots$ with their covariance matrix. Now someone asks: what is the standard error of $\hat\beta_1 / \hat\beta_2$?
You cannot simply divide the standard errors—the ratio of two normal random variables is not normal, and the covariance between numerator and denominator matters. The delta method solves this: it gives you the asymptotic variance of any smooth function of your estimates, using only quantities you already have.
When do you need it? Whenever you need the standard error of a quantity that is a function of estimated coefficients, for example:

- the ratio of two coefficients, as above;
- a nonlinear transformation such as an elasticity or a marginal effect evaluated at specific covariate values;
- a linear combination of coefficients, such as a sum or difference used to test a restriction.
All of these are functions $g(\hat\theta)$ of the coefficient vector. The delta method tells you: if you know the variance of $\hat\theta$, you can get the variance of $g(\hat\theta)$ by sandwiching with the Jacobian.
Suppose OLS gives $\hat\beta = (3.05,\; 0.98)'$ with covariance matrix
$$ \widehat{V} = \begin{pmatrix} 0.0042 & -0.0001 \\ -0.0001 & 0.0038 \end{pmatrix}. $$You want the standard error of the ratio $g(\beta) = \beta_1 / \beta_2$.
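Plugging in the numbers is short. Here is a self-contained sketch in plain MATLAB (not yet using the toolbox functions):

```matlab
beta     = [3.05; 0.98];
V        = [0.0042, -0.0001; -0.0001, 0.0038];
G        = [1/beta(2), -beta(1)/beta(2)^2];   % Jacobian of g(beta) = beta1/beta2
ratio    = beta(1) / beta(2);                 % point estimate, roughly 3.11
se_ratio = sqrt(G * V * G');                  % delta-method standard error, roughly 0.21
```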
That is all the delta method does—three lines of computation once you have the Jacobian.
Dimensions:
| Object | Size | Description |
|---|---|---|
| $\theta_0,\;\hat\theta$ | $k \times 1$ | Parameter vector and its estimator |
| $V$ | $k \times k$ | Asymptotic covariance of $\sqrt{n}(\hat\theta - \theta_0)$ |
| $g(\cdot)$ | $\mathbb{R}^k \to \mathbb{R}^q$ | Transformation of interest |
| $G = \partial g / \partial \theta'$ | $q \times k$ | Jacobian evaluated at $\theta_0$ |
| $G V G'$ | $q \times q$ | Asymptotic covariance of $\sqrt{n}(g(\hat\theta) - g(\theta_0))$ |
The delta method states: if $\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0, V)$ and $g$ is continuously differentiable at $\theta_0$, then $\sqrt{n}\bigl(g(\hat\theta) - g(\theta_0)\bigr) \xrightarrow{d} N(0,\, G V G')$. The proof is short. By the mean value theorem,
$$ g(\hat\theta) - g(\theta_0) = F\!\bigl(\tilde\theta\bigr)\,\bigl(\hat\theta - \theta_0\bigr), $$where $\tilde\theta$ lies between $\hat\theta$ and $\theta_0$, and $F(\theta) = \partial g(\theta)/\partial\theta'$ is the Jacobian. Since $\hat\theta$ is consistent, $\tilde\theta \xrightarrow{p} \theta_0$, and by continuity of $F$:
$$ F\!\bigl(\tilde\theta\bigr) \xrightarrow{p} F(\theta_0). $$By Slutsky's theorem,
$$ \sqrt{n}\bigl(g(\hat\theta) - g(\theta_0)\bigr) = F(\theta_0)\,\sqrt{n}\bigl(\hat\theta - \theta_0\bigr) + o_p(1). $$The result follows by an application of Cramér's theorem. $\square$
When $g(\theta) = R\theta - r$ with $R$ a $q \times k$ matrix and $r$ a $q \times 1$ vector, the Jacobian is simply $G = R$ (constant). Therefore:
$$ \operatorname{Var}(\hat\phi) = R\, V\, R'. $$This is exact — no first-order approximation is involved.
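As a concrete sketch (reusing the covariance matrix from the ratio example above purely for illustration), the standard error of the difference $\hat\beta_1 - \hat\beta_2$ is:

```matlab
R      = [1, -1];                             % phi = beta1 - beta2
V      = [0.0042, -0.0001; -0.0001, 0.0038];
se_phi = sqrt(R * V * R');                    % exact: sqrt(0.0082), about 0.091
```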
Common uses in econometrics:

- the difference between two coefficients, e.g. to test $\beta_1 = \beta_2$;
- the sum of coefficients, e.g. a cumulative effect across lags;
- Wald tests of several linear restrictions at once.
See Hansen (2021), Chapter 8.1 for the connection to constrained least squares.
The Jacobian $G = \partial g/\partial\theta'$ must be evaluated — analytically or numerically. The variance formula
$$ \operatorname{Var}(\hat\phi) \approx G\, V\, G' $$is a first-order approximation. Its quality depends on:

- how nonlinear $g$ is near $\theta_0$, i.e. the size of the neglected higher-order terms;
- how precisely $\hat\theta$ is estimated: the tighter the distribution of $\hat\theta$ around $\theta_0$, the better the linearization;
- whether $G(\theta_0)$ is close to zero, in which case the leading term vanishes and the approximation can break down.
The delta method theorem starts with "given that you know $V$…" But where does $V$ actually come from? In practice, most estimators are plug-in estimators: you replace population moments with sample moments and then apply some formula.
The smooth function model (Hansen 2021, Section 6.6) formalizes this. The parameter of interest is
$$ \theta = g(\mu), \qquad \mu = \mathbb{E}[h(Y)], $$where $\hat\mu = n^{-1}\sum_{i=1}^n h(Y_i)$ is the sample analogue. The plug-in estimator $\hat\theta = g(\hat\mu)$ just substitutes $\hat\mu$ for $\mu$.
Why this matters: the smooth function model tells you that the delta method applies automatically to any plug-in estimator. OLS is a special case: $\mu = \bigl(\mathbb{E}[X'X],\;\mathbb{E}[X'Y]\bigr)$, and $g$ extracts $\beta = \mathbb{E}[X'X]^{-1}\mathbb{E}[X'Y]$. So when you compute the OLS covariance matrix, you are already using the smooth function model—the delta method is the same idea applied one more time on top.
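A minimal sketch of a plug-in estimator outside of OLS (the data and variable names here are hypothetical, chosen just to make the point concrete): take $\theta = g(\mu) = \sqrt{\mu_2 - \mu_1^2}$ with $\mu = \bigl(\mathbb{E}[Y],\;\mathbb{E}[Y^2]\bigr)$, i.e. the population standard deviation. The delta method gives its standard error directly from the sample moments:

```matlab
rng(1);  Y = 3 + 2*randn(500, 1);     % simulated data, purely illustrative
h        = [Y, Y.^2];                 % h(Y_i), an n-by-2 matrix of moment terms
mu_hat   = mean(h)';                  % sample analogue of mu = E[h(Y)]
Sigma    = cov(h);                    % estimate of Var(h(Y))
theta    = sqrt(mu_hat(2) - mu_hat(1)^2);        % plug-in estimate g(mu_hat)
G        = [-mu_hat(1)/theta, 1/(2*theta)];      % Jacobian of g at mu_hat
se_theta = sqrt(G * (Sigma/numel(Y)) * G');      % delta-method standard error
```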
In OLS, collinearity means that your regressors are linearly dependent: $X'X$ is singular and you cannot invert it. The delta method has an exact analogue: if your parameters of interest are linearly dependent functions of the estimated coefficients, then $G\, V\, G'$ is singular and you cannot get independent standard errors for all of them.
Example. You estimated two coefficients $\hat\theta_1, \hat\theta_2$, but you report three quantities of interest:
$$ \phi_1 = \theta_1 + \theta_2, \qquad \phi_2 = \theta_1 - \theta_2, \qquad \phi_3 = 2\theta_1 + 3\theta_2. $$Here $\phi_3 = \frac{1}{2}(5\phi_1 - \phi_2)$—a linear combination of the other two. The $3 \times 3$ covariance matrix of $(\hat\phi_1, \hat\phi_2, \hat\phi_3)$ has rank at most 2, just as adding a collinear regressor to an OLS model makes $X'X$ singular. The remedy is the same: drop the redundant quantity, or recognize that you have only two independent parameters.
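To see it numerically (reusing a hypothetical $2 \times 2$ covariance matrix for $\hat\theta$):

```matlab
V    = [0.0042, -0.0001; -0.0001, 0.0038];   % covariance of (theta1_hat, theta2_hat), illustrative
R    = [1,  1;                               % phi1 = theta1 + theta2
        1, -1;                               % phi2 = theta1 - theta2
        2,  3];                              % phi3 = 2*theta1 + 3*theta2
Vphi = R * V * R';                           % 3-by-3 covariance of (phi1, phi2, phi3)
rank(Vphi)                                   % returns 2: one reported quantity is redundant
```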
The implementation in DeltaMethod.m performs the finite-sample analogue of the variance formula $\operatorname{Var}(\hat\phi) \approx G\, V\, G'$ above:
```matlab
para    = g(coef);               % point estimate: g(theta_hat)
F       = JacobianEst(g, coef);  % numerical Jacobian: G at theta_hat
varpara = F * varcoef * F';      % sandwich: G * V_hat * G'
```
JacobianEst.m uses central finite differences,
$$ \frac{\partial g_i}{\partial \theta_j} \approx \frac{g_i(\theta + h\, e_j) - g_i(\theta - h\, e_j)}{2h}, $$with step size $h = \varepsilon^{1/3} \cdot \max(|\theta_j|,\, 1)$, where $e_j$ is the $j$-th unit vector and $\varepsilon \approx 2.2 \times 10^{-16}$ is machine epsilon. This balances truncation error $O(h^2)$ against roundoff error $O(\varepsilon/h)$.
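For reference, here is a minimal sketch of such a routine (the name `jacobian_cd` and the details are hypothetical; the toolbox's JacobianEst.m may differ, e.g. in how it handles edge cases):

```matlab
function J = jacobian_cd(g, theta)
% Central-difference Jacobian of a vector-valued function g at theta.
% g must accept a k-by-1 vector and return a q-by-1 vector.
theta = theta(:);
k = numel(theta);
q = numel(g(theta));
J = zeros(q, k);
for j = 1:k
    h      = eps^(1/3) * max(abs(theta(j)), 1);   % step size as described above
    e      = zeros(k, 1);  e(j) = h;
    J(:,j) = (g(theta + e) - g(theta - e)) / (2*h);
end
end
```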
A good test: compare the numerical Jacobian against the analytical one.
For $g(\beta) = \beta_1/\beta_2$, the analytical Jacobian is
$G = (1/\beta_2,\; -\beta_1/\beta_2^2)$. The numerical and analytical standard
errors should agree to $\sim\!10$ digits. See MinimalExample.m
for a runnable verification.
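A sketch of that check, assuming the toolbox's JacobianEst.m is on the MATLAB path (MinimalExample.m itself may be organized differently):

```matlab
coef    = [3.05; 0.98];
varcoef = [0.0042, -0.0001; -0.0001, 0.0038];
g       = @(b) b(1) / b(2);

G_num = JacobianEst(g, coef);               % numerical Jacobian from the toolbox
G_ana = [1/coef(2), -coef(1)/coef(2)^2];    % analytical Jacobian

se_num = sqrt(G_num * varcoef * G_num');
se_ana = sqrt(G_ana * varcoef * G_ana');
fprintf('numerical: %.12f   analytical: %.12f\n', se_num, se_ana);
```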