
Indian Statistical Service Statistics Part 3, Question 4: Solution







4.(a) Consider the following two-variable linear model, with observations taken as deviations from the mean:

yt = βxt + ut ; t = 1, 2, ..., n

where ut follows the AR(1) process ut = ρu_(t-1) + εt with |ρ| < 1, and the εt are iid with mean 0 and variance σ_ε^2.

(i) Obtain the OLS and GLS estimators of β, say b and β̂ respectively.

To obtain the OLS estimator of β, we need to minimize the sum of squared errors:


SSE(β) = ∑(yt - βxt)^2


Taking the derivative of SSE(β) with respect to β and setting it to zero, we get:

∂SSE(β)/∂β = -2∑(yt - βxt)xt = 0

Solving for β, we get the OLS estimator:

b = ∑xtyt / ∑xt^2
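
As a quick numerical check, here is a minimal NumPy sketch. All the data below are simulated and purely hypothetical; the AR(1) simulation and variable names are illustrative assumptions, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta_true, rho = 200, 2.0, 0.6

# Simulate hypothetical data: x in deviation-from-mean form and AR(1) errors
x = rng.normal(size=n)
x -= x.mean()
eps = rng.normal(size=n)          # iid innovations with mean 0
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]
y = beta_true * x + u

# OLS estimator of beta in the no-intercept model: b = sum(x_t*y_t) / sum(x_t^2)
b_ols = np.sum(x * y) / np.sum(x * x)
print("OLS estimate b:", b_ols)
```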


To obtain the GLS estimator of β, we first need the variance-covariance matrix of ut:

Var(ut) = σ_ε^2 / (1 - ρ^2) ... (1)


Using (1), we can define the GLS estimator of β as follows:

β̂ = (X'V^(-1)X)^(-1) X'V^(-1)y

where X is the n × 1 vector of xt, y is the n × 1 vector of yt, and V is the n × n variance-covariance matrix of ut given by:

V = [Var(u1) Cov(u1,u2) ... Cov(u1,un)]

[Cov(u2,u1) Var(u2) ... Cov(u2,un)]

[... ... ... ... ]

[Cov(un,u1) Cov(un,u2) ... Var(un)]


Using the AR(1) structure of ut, we can derive the elements of V as follows:

Var(ui) = σ_ε^2 / (1 - ρ^2)

Cov(ui, uj) = ρ^|i-j| Var(ui) = σ_ε^2 ρ^|i-j| / (1 - ρ^2) for all i, j

so that V = [σ_ε^2 / (1 - ρ^2)] Ω, where Ω is the n × n matrix with (i, j)-th element ρ^|i-j|.


Substituting these expressions into the GLS formula gives β̂ = (X'V^(-1)X)^(-1) X'V^(-1)y. Since V^(-1) corresponds to the Prais-Winsten (quasi-differencing) transformation xt* = xt - ρx_(t-1), yt* = yt - ρy_(t-1), with the first observation scaled by √(1 - ρ^2), the GLS estimator is simply OLS applied to the transformed data:

β̂ = ∑(xt - ρx_(t-1))(yt - ρy_(t-1)) / ∑(xt - ρx_(t-1))^2

where the sums run over t = 2, ..., n, plus the first-observation terms (1 - ρ^2)x1y1 in the numerator and (1 - ρ^2)x1^2 in the denominator.


Therefore, the OLS estimator of β is b = ∑xtyt / ∑xt^2, while the GLS estimator is β̂ = ∑(xt - ρx_(t-1))(yt - ρy_(t-1)) / ∑(xt - ρx_(t-1))^2.
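
Assuming ρ is known, the same quasi-differencing idea can be written as a short function. This is only a sketch of the Prais-Winsten transformation described above; the function name and data are illustrative.

```python
import numpy as np

def gls_beta(x, y, rho):
    """GLS estimate of beta in y_t = beta*x_t + u_t with AR(1) errors,
    computed as OLS on the Prais-Winsten (quasi-differenced) data."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    x_star, y_star = np.empty_like(x), np.empty_like(y)
    c = np.sqrt(1.0 - rho ** 2)
    x_star[0], y_star[0] = c * x[0], c * y[0]        # scale the first observation
    x_star[1:] = x[1:] - rho * x[:-1]                # x_t - rho*x_{t-1}
    y_star[1:] = y[1:] - rho * y[:-1]                # y_t - rho*y_{t-1}
    return np.sum(x_star * y_star) / np.sum(x_star ** 2)

# Usage, with x, y and rho as in the simulation sketch above: gls_beta(x, y, rho)
```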

(ii) If r1 is the sample autocorrelation of order one of {xt, t = 1, ..., n} and terms of higher order in ρr1 are ignored, obtain Var(β̂) and Var(b), and comment on the relative efficiency of the GLS estimator over the OLS estimator.

(7+8)
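
The working for part (ii) is not reproduced on this page. As a hedged sketch only, assuming the intended approximation drops terms of higher order in ρr1 as well as end effects, and writing σ_u^2 = σ_ε^2 / (1 - ρ^2) for the error variance, the standard textbook expressions take the form:

```latex
% Sketch only: first-order approximations in rho*r_1, not a full derivation.
\operatorname{Var}(b) \approx \frac{\sigma_u^{2}}{\sum_t x_t^{2}}\,(1 + 2\rho r_1),
\qquad
\operatorname{Var}(\hat\beta) \approx \frac{\sigma_u^{2}\,(1-\rho^{2})}{\sum_t x_t^{2}\,(1+\rho^{2}-2\rho r_1)},
\qquad
\frac{\operatorname{Var}(\hat\beta)}{\operatorname{Var}(b)} \approx \frac{1-\rho^{2}}{(1+\rho^{2}-2\rho r_1)(1+2\rho r_1)} .
```

For example, if r1 = ρ the ratio reduces to 1/(1 + 2ρ^2), which is below one whenever ρ ≠ 0; more generally, by the Aitken/Gauss-Markov argument the GLS estimator β̂ is never less efficient than the OLS estimator b, and the efficiency gain grows with |ρ|.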


The model in (a) is known as the AR(1)-error model; substituting ut = ρu_(t-1) + εt into the regression, it can be written as:


yt = βxt + ρu_(t-1) + εt


where ut follows an AR(1) process with parameter ρ, and εt are independent and identically distributed (iid) random variables with mean 0 and variance σ_ε^2.


To estimate the parameters of this model, we can use the method of maximum likelihood. The likelihood function is given by:


L(β, ρ, σ_ε^2) = (2πσ_ε^2)^(-n/2) exp[-(1/(2σ_ε^2)) ∑(yt - βxt - ρu_(t-1))^2], where u_(t-1) = y_(t-1) - βx_(t-1) (treating the initial value as fixed).


Taking the log of the likelihood function and simplifying, we get:


log L = -(n/2) log(2π) - (n/2) log(σ_ε^2) - (1/(2σ_ε^2)) ∑(yt - βxt - ρu_(t-1))^2


To maximize the log-likelihood function, we need to take partial derivatives with respect to each parameter and set them equal to zero:


∂log L/∂β = 0

∂log L/∂ρ = 0

∂log L/∂σ_ε^2 = 0


Solving these equations (in practice, iteratively or numerically), we can obtain the maximum likelihood estimates of β, ρ, and σ_ε^2.
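
A minimal numerical sketch of this maximization, using SciPy's general-purpose optimizer on the conditional (given the first observation) log-likelihood, is shown below. The function, reparameterization, and starting values are illustrative assumptions, not part of the question.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, x, y):
    """Negative conditional log-likelihood of y_t = beta*x_t + u_t,
    with u_t = rho*u_{t-1} + eps_t and eps_t ~ iid N(0, sigma2)."""
    beta, rho, log_sigma2 = params
    sigma2 = np.exp(log_sigma2)      # reparameterize so the variance stays positive
    u = y - beta * x                 # implied AR(1) errors
    eps = u[1:] - rho * u[:-1]       # implied innovations for t = 2, ..., n
    m = eps.size
    return 0.5 * m * np.log(2.0 * np.pi * sigma2) + 0.5 * np.sum(eps ** 2) / sigma2

# With observed arrays x and y (deviations from means):
# res = minimize(neg_loglik, x0=np.zeros(3), args=(x, y), method="Nelder-Mead")
# beta_hat, rho_hat, sigma2_hat = res.x[0], res.x[1], np.exp(res.x[2])
```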


Note that the AR(1) process for ut implies that the errors are autocorrelated. This can be addressed by estimating the autocorrelation parameter ρ and incorporating it into the estimation, as above. Alternatively, we can use a model class that allows for autocorrelated errors directly, such as a regression with ARMA errors (an ARIMAX-type model).


4.(b) (i) Define finite and infinite distributed lag models.

Finite distributed lag models and infinite distributed lag models are two types of regression models that are commonly used in econometrics and other fields to study the relationship between a dependent variable and one or more explanatory variables over time.


A finite distributed lag model is a linear regression model that includes a set of lagged values of the explanatory variable(s) up to a finite number of periods. For example, a simple finite distributed lag model with one explanatory variable might be written as:


y_t = β_0 + β_1x_t + β_2x_{t-1} + β_3x_{t-2} + ε_t


where y_t is the dependent variable at time t, x_t is the value of the explanatory variable at time t, x_{t-1} is the value of the explanatory variable at the previous time period, and ε_t is the error term. In this model, the coefficients β_1, β_2, and β_3 represent the effects of the explanatory variable on the dependent variable in the current period and in the two previous periods.
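
As an illustration, the model above (lag length 2) can be estimated by ordinary least squares after building the lagged regressor columns. The data below are simulated and purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 200, 2                          # sample size and lag length
x = rng.normal(size=n)
# Hypothetical data-generating process: y_t = 1 + 0.8 x_t + 0.5 x_{t-1} + 0.2 x_{t-2} + e_t
y = 1.0 + 0.8 * x + 0.5 * np.roll(x, 1) + 0.2 * np.roll(x, 2) + rng.normal(scale=0.3, size=n)

# Regressor matrix [1, x_t, x_{t-1}, x_{t-2}]; the first q observations lack a full set of lags
X = np.column_stack([np.ones(n - q)] + [x[q - k: n - k] for k in range(q + 1)])
beta_hat, *_ = np.linalg.lstsq(X, y[q:], rcond=None)
print(beta_hat)   # estimates of (beta_0, beta_1, beta_2, beta_3)
```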


An infinite distributed lag model, on the other hand, includes all past values of the explanatory variable(s), going back to an infinite number of periods. For example, a simple infinite distributed lag model with one explanatory variable might be written as:


y_t = β_0 + β_1x_t + β_2x_{t-1} + β_3x_{t-2} + ... + ε_t


In this model, the coefficients β_1, β_2, β_3, etc. represent the effects of the explanatory variable on the dependent variable over time; for the model to be usable, these effects are typically assumed to die out as the lag increases, for example by decaying exponentially (geometrically).
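
One standard way to make such a model estimable is the Koyck (geometric) lag, which imposes exactly this kind of exponential decay. Assuming the lag coefficients satisfy β_(k+1) = βλ^k with 0 < λ < 1 (an assumption introduced here for illustration, not stated in the question), the infinite lag collapses to a model with only a few parameters:

```latex
% Koyck sketch: assume the lag weights decay geometrically, beta_{k+1} = beta * lambda^k.
y_t = \beta_0 + \beta \sum_{k=0}^{\infty} \lambda^{k} x_{t-k} + \varepsilon_t
\quad\Longrightarrow\quad
y_t = \beta_0 (1-\lambda) + \lambda\, y_{t-1} + \beta x_t + (\varepsilon_t - \lambda \varepsilon_{t-1}),
```

so the infinite distributed lag can be estimated from a regression of y_t on y_{t-1} and x_t, bearing in mind that the transformed error is a moving average correlated with y_{t-1} and so usually calls for instrumental-variable methods rather than plain OLS.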


In practice, an infinite distributed lag model cannot be estimated in its unrestricted form, since it contains infinitely many coefficients; it has to be combined with restrictions such as the geometric decay above, and even then the coefficients can be difficult to interpret. Finite distributed lag models are more commonly used because they allow researchers to study the short-term and medium-term effects of an explanatory variable on the dependent variable with a small number of parameters. The choice between a finite and an infinite distributed lag model ultimately depends on the specific research question and the available data.



(iii) How can we estimate the parameters and the lag length of finite distributed lag models?


The parameters and lag length of a finite distributed lag model can be estimated using various statistical methods. Here are a few common approaches:


Ordinary least squares (OLS): This is the most common method for estimating the parameters of a finite distributed lag model. OLS involves minimizing the sum of squared errors between the predicted values and the actual values of the dependent variable. The lag length of the model is typically chosen based on prior knowledge or by testing different lag lengths and selecting the one with the best performance in terms of model fit.


Two-stage least squares (2SLS): If the explanatory variable is endogenous (i.e., correlated with the error term), then OLS may produce biased and inconsistent estimates. In this case, 2SLS can be used to obtain consistent estimates. This method involves first regressing the endogenous variable on the instruments (i.e., variables that are correlated with the endogenous variable but not with the error term), and then using the predicted values from this first-stage regression as the explanatory variable in the second-stage regression.
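
A minimal sketch of this two-step procedure in plain NumPy follows. All data here, including the instrument z, are simulated and hypothetical; a real application would also compute proper 2SLS standard errors rather than just the point estimates.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
z = rng.normal(size=n)                      # instrument: correlated with x, not with the error
e = rng.normal(size=n)                      # structural error
x = 0.8 * z + 0.5 * e + rng.normal(size=n)  # endogenous regressor (correlated with e)
y = 1.0 + 2.0 * x + e

def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

Z = np.column_stack([np.ones(n), z])
X = np.column_stack([np.ones(n), x])

x_hat = Z @ ols(Z, x)                                        # stage 1: fitted values of x
beta_2sls = ols(np.column_stack([np.ones(n), x_hat]), y)     # stage 2: regress y on fitted x
beta_ols = ols(X, y)                                         # biased OLS benchmark
print("2SLS:", beta_2sls, "OLS:", beta_ols)
```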


Maximum likelihood estimation (MLE): MLE is another method for estimating the parameters of a finite distributed lag model. MLE involves finding the values of the parameters that maximize the likelihood of observing the data. This method can be particularly useful when the error term is not normally distributed or when the model includes other complex features.


To determine the appropriate lag length for the model, one common approach is to use the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC). These criteria evaluate the trade-off between the goodness of fit of the model and the complexity of the model (i.e., the number of parameters and lags included), and select the model that achieves the best balance between these factors.
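
A sketch of lag-length selection by AIC, reusing the kind of lagged regressor matrix shown earlier, might look like the function below. The data, the maximum lag, and the Gaussian AIC formula (up to an additive constant) are illustrative assumptions.

```python
import numpy as np

def fdl_aic(x, y, q, q_max):
    """AIC of a finite distributed lag model with lag length q, fitted by OLS.
    All candidate models are fitted on the same trimmed sample (observations
    q_max, ..., n-1) so that their AIC values are comparable."""
    n_eff = len(y) - q_max
    X = np.column_stack([np.ones(n_eff)] + [x[q_max - k: len(x) - k] for k in range(q + 1)])
    resid = y[q_max:] - X @ np.linalg.lstsq(X, y[q_max:], rcond=None)[0]
    k = X.shape[1]                                        # number of estimated coefficients
    return n_eff * np.log(np.mean(resid ** 2)) + 2 * k    # Gaussian AIC up to a constant

# Example: pick the lag length with the smallest AIC, up to a hypothetical maximum of 6
# best_q = min(range(7), key=lambda q: fdl_aic(x, y, q, q_max=6))
```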


Another approach for determining the lag length is to perform a lag length selection test, such as a t-test, an F-test, or a Lagrange multiplier test on the coefficients of the longest lags. These tests can be used to evaluate the significance of each lag and to determine the number of lags that should be retained, typically by starting from a generous maximum lag and dropping the longest lag until its coefficient is statistically significant.