# Indian Statistical Service Statistics Part 3, Question 4: Solved Solution

**4.(a) Consider the following two-variable linear model with observations taken as deviations from the mean:**

**yt = βxt + ut ; t = 1, 2, ..., n**

**where the ut follow the AR(1) process ut = ρu_(t-1) + εt, |ρ| < 1, and the εt are iid with mean 0 and variance σ_ε^2.**

**(i) Obtain the OLS and GLS estimators of β, say b and β̂ respectively.**

To obtain the OLS estimator of β, we minimize the sum of squared errors:

SSE(β) = ∑(yt - βxt)^2

Taking the derivative of SSE(β) with respect to β and setting it to zero, we get:

∂SSE(β)/∂β = -2∑(yt - βxt)xt = 0

Solving for β, we get the OLS estimator:

b = ∑xtyt / ∑xt^2

To obtain the GLS estimator of β, we need to first estimate the variance-covariance matrix of ut:

Var(ut) = σ_ε^2 / (1 - ρ^2) ... (1)

Using (1) together with the autocovariances implied by the AR(1) process, the GLS estimator of β is:

β̂ = (X'V^-1X)^-1(X'V^-1y)

where X is the n x 1 vector of xt, y is the n x 1 vector of yt, and V is the n x n variance-covariance matrix of ut given by:

V = [Var(u1) Cov(u1,u2) ... Cov(u1,un)]

[Cov(u2,u1) Var(u2) ... Cov(u2,un)]

[... ... ... ... ]

[Cov(un,u1) Cov(un,u2) ... Var(un)]

Using the AR(1) structure of ut, we can derive the elements of V as follows:

Var(ui) = σ_ε^2 / (1 - ρ^2)

Cov(ui, uj) = ρ^|i-j| σ_ε^2 / (1 - ρ^2) for i ≠ j

so that V = [σ_ε^2 / (1 - ρ^2)] Ω, where the (i, j)-th element of Ω is ρ^|i-j|.

Substituting this V into the GLS formula, V^-1 is tridiagonal, and the GLS estimator reduces to OLS applied to the quasi-differenced (Prais-Winsten) data x*_t = xt - ρx_(t-1) and y*_t = yt - ρy_(t-1) for t = 2, ..., n, with x*_1 = √(1 - ρ^2) x1 and y*_1 = √(1 - ρ^2) y1:

β̂ = ∑ x*_t y*_t / ∑ (x*_t)^2

Therefore, the OLS estimator of β is b = ∑xtyt / ∑xt^2, while the GLS estimator β̂ applies the same formula to the quasi-differenced data.
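The two estimators can be compared numerically. The sketch below simulates the model with illustrative values β = 2 and ρ = 0.7 (both assumptions of the simulation, not part of the question) and computes b by the OLS formula and the GLS estimate by Prais-Winsten quasi-differencing with ρ treated as known:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta, rho = 500, 2.0, 0.7

# Regressor in mean-deviation form; errors follow u_t = rho*u_{t-1} + eps_t.
x = rng.standard_normal(n)
x -= x.mean()
eps = rng.standard_normal(n)
u = np.empty(n)
u[0] = eps[0] / np.sqrt(1 - rho**2)      # stationary starting value
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]
y = beta * x + u

# OLS: b = sum(x_t * y_t) / sum(x_t^2)
b_ols = (x @ y) / (x @ x)

# GLS via Prais-Winsten quasi-differencing (rho assumed known here)
x_star = np.empty(n)
y_star = np.empty(n)
x_star[0] = np.sqrt(1 - rho**2) * x[0]
y_star[0] = np.sqrt(1 - rho**2) * y[0]
x_star[1:] = x[1:] - rho * x[:-1]
y_star[1:] = y[1:] - rho * y[:-1]
b_gls = (x_star @ y_star) / (x_star @ x_star)
```

Both estimates are consistent for β; the GLS estimate typically has the smaller sampling variance because it accounts for the error autocorrelation.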

**(ii) If r1 is the first-order sample autocorrelation of {xt, t = 1, ..., n} and higher-order terms in r1 are ignored, obtain Var(β̂) and Var(b), and comment on the relative efficiency of the GLS estimator over the OLS estimator. (7+8)**

This model is known as the AR(1)-error model. Substituting ut = ρu_(t-1) + εt, it can be written as:

yt = βxt + ρu_(t-1) + εt

where ut follows an AR(1) process with parameter ρ, and the εt are independent and identically distributed (iid) random variables with mean 0 and variance σ_ε^2.

To estimate the parameters of this model, we can use the method of maximum likelihood. The likelihood function is given by:

L(β,ρ,σ_ε^2) = (2πσ_ε^2)^(-n/2) exp[-(1/(2σ_ε^2))∑(yt - βxt - ρu_(t-1))^2]

Taking the log of the likelihood function and simplifying, we get:

log L = -(n/2) log(2π) - (n/2) log(σ_ε^2) - (1/(2σ_ε^2))∑(yt - βxt - ρu_(t-1))^2

To maximize the log-likelihood function, we need to take partial derivatives with respect to each parameter and set them equal to zero:

∂log L/∂β = 0

∂log L/∂ρ = 0

∂log L/∂σ_ε^2 = 0

Solving these equations, we can obtain the maximum likelihood estimates of β, ρ, and σ_ε^2.

Note that the AR(1) process for ut implies that the errors are autocorrelated. This can be addressed by estimating the autocorrelation parameter ρ from the residuals and incorporating it into the model, as in the Cochrane-Orcutt procedure. Alternatively, we can use a different type of model that allows for autocorrelated errors, such as the ARIMA model.
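When ρ is unknown, a feasible version of this idea is one Cochrane-Orcutt iteration: fit OLS, estimate ρ from the residuals, quasi-difference, and refit. A minimal sketch on simulated data (the true values β = 1.5 and ρ = 0.6 are assumptions of the simulation):

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta, rho = 400, 1.5, 0.6

x = rng.standard_normal(n)
x -= x.mean()
u = np.zeros(n)
eps = rng.standard_normal(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]
y = beta * x + u

# Step 1: OLS, ignoring the autocorrelation.
b = (x @ y) / (x @ x)
# Step 2: estimate rho from the OLS residuals.
res = y - b * x
rho_hat = (res[1:] @ res[:-1]) / (res[:-1] @ res[:-1])
# Step 3: quasi-difference with rho_hat and re-run OLS.
xs = x[1:] - rho_hat * x[:-1]
ys = y[1:] - rho_hat * y[:-1]
b_fgls = (xs @ ys) / (xs @ xs)
```

Iterating steps 2 and 3 until rho_hat stabilizes gives the full Cochrane-Orcutt procedure; a single pass is usually already close.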

**4.(b) (i) Define finite and infinite distributed lag models.**

Finite distributed lag models and infinite distributed lag models are two types of regression models that are commonly used in econometrics and other fields to study the relationship between a dependent variable and one or more explanatory variables over time.

A finite distributed lag model is a linear regression model that includes a set of lagged values of the explanatory variable(s) up to a finite number of periods. For example, a simple finite distributed lag model with one explanatory variable might be written as:

y_t = β_0 + β_1x_t + β_2x_{t-1} + β_3x_{t-2} + ε_t

where y_t is the dependent variable at time t, x_t is the value of the explanatory variable at time t, x_{t-1} is the value of the explanatory variable at the previous time period, and ε_t is the error term. In this model, the coefficients β_1, β_2, and β_3 represent the effects of the explanatory variable on the dependent variable in the current period and in the two previous periods.

An infinite distributed lag model, on the other hand, includes all past values of the explanatory variable(s), going back to an infinite number of periods. For example, a simple infinite distributed lag model with one explanatory variable might be written as:

y_t = β_0 + β_1x_t + β_2x_{t-1} + β_3x_{t-2} + ... + ε_t

In this model, the coefficients β_1, β_2, β_3, etc. represent the effects of the explanatory variable on the dependent variable at successive lags. Since infinitely many coefficients cannot be estimated freely from a finite sample, some structure must be imposed on them, for example requiring the effects to decay geometrically as in the Koyck model.

In practice, infinite distributed lag models are rarely used because they require a large amount of data to estimate accurately, and the coefficients can be difficult to interpret. Finite distributed lag models are more commonly used because they allow researchers to study the short-term and medium-term effects of an explanatory variable on the dependent variable. However, the choice between a finite and infinite distributed lag model depends on the specific research question and the available data.
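A finite distributed lag model like the one above is just an OLS regression on current and lagged values of x, dropping the first few observations where the lags are undefined. A small sketch with illustrative coefficients (the values in b_true are assumptions of the simulation):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
b_true = np.array([0.5, 1.0, 0.6, 0.3])   # [intercept, lag-0, lag-1, lag-2]

x = rng.standard_normal(n)
# Design matrix with columns [1, x_t, x_{t-1}, x_{t-2}];
# the first two periods are lost to the lags.
X = np.column_stack([np.ones(n - 2), x[2:], x[1:-1], x[:-2]])
y = X @ b_true + 0.1 * rng.standard_normal(n - 2)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With n much larger than the lag length, the OLS estimates recover the lag coefficients closely.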

**(ii) How can we estimate the parameters and lag length of finite distributed lag models?**

The parameters and lag length of a finite distributed lag model can be estimated using various statistical methods. Here are a few common approaches:

Ordinary least squares (OLS): This is the most common method for estimating the parameters of a finite distributed lag model. OLS involves minimizing the sum of squared errors between the predicted values and the actual values of the dependent variable. The lag length of the model is typically chosen based on prior knowledge or by testing different lag lengths and selecting the one with the best performance in terms of model fit.

Two-stage least squares (2SLS): If the explanatory variable is endogenous (i.e., correlated with the error term), then OLS may produce biased and inconsistent estimates. In this case, 2SLS can be used to obtain consistent estimates. This method involves first regressing the endogenous variable on the instruments (i.e., variables that are correlated with the endogenous variable but not with the error term), and then using the predicted values from this first-stage regression as the explanatory variable in the second-stage regression.

Maximum likelihood estimation (MLE): MLE is another method for estimating the parameters of a finite distributed lag model. MLE involves finding the values of the parameters that maximize the likelihood of observing the data. This method can be particularly useful when the error term is not normally distributed or when the model includes other complex features.

To determine the appropriate lag length for the model, one common approach is to use the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC). These criteria evaluate the trade-off between the goodness of fit of the model and the complexity of the model (i.e., the number of parameters and lags included), and select the model that achieves the best balance between these factors.

Another approach for determining the lag length is to perform a lag length selection test, such as the t-test, the F-test, or the Lagrange multiplier test. These tests can be used to evaluate the significance of each lag and to determine the optimal number of lags to include in the model.
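The AIC-based selection described above can be sketched as follows. The helper `dl_aic` is a name introduced here for illustration; the data are simulated with a true lag length of 2 (an assumption of the example):

```python
import numpy as np

def dl_aic(y, x, p):
    """OLS fit of y_t on [1, x_t, ..., x_{t-p}]; returns the AIC."""
    n = len(y) - p
    cols = [np.ones(n)] + [x[p - j : len(x) - j] for j in range(p + 1)]
    X = np.column_stack(cols)
    yy = y[p:]
    coef, *_ = np.linalg.lstsq(X, yy, rcond=None)
    rss = float(np.sum((yy - X @ coef) ** 2))
    return n * np.log(rss / n) + 2 * (p + 2)   # p+2 parameters incl. intercept

rng = np.random.default_rng(3)
x = rng.standard_normal(400)
# True model uses lags 0, 1 and 2 only.
y = 1.0 + 0.8 * x[2:] + 0.5 * x[1:-1] + 0.3 * x[:-2] + 0.2 * rng.standard_normal(398)
x_aligned = x[2:]                 # align x with y so index i means time t

aics = {p: dl_aic(y, x_aligned, p) for p in range(6)}
best_p = min(aics, key=aics.get)
```

The AIC drops sharply until the true lag length is reached and then flattens; replacing the penalty 2k with k·log(n) gives the BIC, which penalizes extra lags more heavily.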

**(iii) Consider geometric distributed lag model as a special case of infinite distributed lag model and obtain its mean lag and median lag. (3+4+8)**

A geometric distributed lag model is a special case of an infinite distributed lag model in which the lag weights decline as a geometric sequence. The model can be represented as:

Yt = α + β(Xt + γXt-1 + γ^2Xt-2 + ...) + εt, with 0 < γ < 1

where Yt is the dependent variable, Xt is the independent variable, β scales the overall effect, α is the intercept, γ is the common ratio of the geometric lag weights, θ = 1 + γ + γ^2 + ... is the sum of the lag weights, and εt is the error term.

The mean lag of a geometric distributed lag model is given by:

Mean Lag = γ/(1 - γ) = θ - 1

To derive this, first relate θ and γ. Since θ = 1 + γ + γ^2 + ..., multiplying by γ gives:

θγ = γ + γ^2 + γ^3 + ...

Subtracting the second equation from the first, we get:

θ - θγ = 1

Solving for γ, we get:

γ = (θ - 1)/θ

The mean lag is the weighted average of the lags, with weights proportional to γ^i:

Mean Lag = ∑ i γ^i / ∑ γ^i = [γ/(1 - γ)^2] / [1/(1 - γ)] = γ/(1 - γ)

Substituting γ = (θ - 1)/θ, so that 1 - γ = 1/θ, we get:

Mean Lag = θ - 1

The median lag of a geometric distributed lag model is given by:

Median Lag = log(0.5)/log(γ)

To derive this formula, note that the cumulative share of the total effect felt through lag k - 1 is 1 - γ^k. The median lag is the value of k for which this share equals 0.5, i.e., when γ^k = 0.5. Taking logarithms of both sides, we get:

k = log(0.5)/log(γ)

which is positive because both logarithms are negative. Substituting γ = (θ - 1)/θ, we get:

Median Lag = log(0.5)/log[(θ - 1)/θ]
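These summary lags are easy to verify numerically by truncating the infinite weight sequence. A quick check with the illustrative value γ = 0.6 (note that k = log 0.5 / log γ is positive, since both logarithms are negative):

```python
import numpy as np

gamma = 0.6
i = np.arange(200)                 # truncate the infinite sum; gamma**200 is negligible
w = gamma ** i
w /= w.sum()                       # normalized geometric lag weights

mean_lag = float((i * w).sum())            # equals gamma / (1 - gamma) = 1.5
median_lag = np.log(0.5) / np.log(gamma)   # about 1.36 lags for gamma = 0.6
```

The numerically computed mean lag matches γ/(1 − γ) to machine precision, confirming the closed-form expression.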

**4.(c) (i) Explain the method of principal component regression for handling the**

**multicollinearity problem.**

Multicollinearity is a common problem in regression analysis, where the independent variables are highly correlated with each other, making it difficult to estimate the individual effects of each variable on the dependent variable. One approach to dealing with multicollinearity is the method of principal component regression (PCR).

The basic idea of PCR is to use a smaller set of principal components (PCs) as independent variables in the regression analysis, instead of using all the original independent variables. The PCs are a linear combination of the original variables that capture most of the variation in the data, and are uncorrelated with each other by definition. Thus, using PCs as independent variables can reduce the multicollinearity problem.

The method of PCR involves the following steps:

Standardize the data: The original independent variables are standardized (i.e., centered and scaled) to have zero mean and unit variance.

Compute the principal components: The principal components are computed using the covariance matrix or the correlation matrix of the standardized variables. The first principal component (PC1) is the linear combination of the variables that explains the most variation in the data, and each subsequent PC explains the most remaining variation, subject to the constraint that it is uncorrelated with the previous PCs.

Select the number of components: The number of principal components to be used as independent variables in the regression analysis is chosen based on a criterion such as the percentage of variance explained by the PCs or a cross-validation procedure.

Regression analysis: The regression analysis is performed using the selected principal components as independent variables and the original dependent variable.

Interpretation: The regression coefficients of the principal components can be interpreted as the effect of each component on the dependent variable, and the loadings of the variables on each component can be used to interpret the underlying structure of the data.

PCR can be a useful approach to handling multicollinearity, especially when the number of variables is large and there is a high degree of correlation among them. However, it may also have some limitations, such as loss of interpretability of the individual variables and reduced precision of the estimated coefficients.
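The five steps above can be sketched in plain NumPy. The data are simulated so that two of the three predictors are nearly collinear (the coefficients and noise levels are assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
z = rng.standard_normal(n)
# x1 and x2 are nearly collinear copies of a common factor z; x3 is independent.
X = np.column_stack([
    z + 0.05 * rng.standard_normal(n),
    z + 0.05 * rng.standard_normal(n),
    rng.standard_normal(n),
])
y = X @ np.array([1.0, 1.0, 0.5]) + 0.1 * rng.standard_normal(n)

# 1. Standardize the predictors.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
# 2. Principal components from the correlation matrix.
evals, evecs = np.linalg.eigh(np.corrcoef(Xs.T))
order = np.argsort(evals)[::-1]          # eigh returns ascending order
evecs = evecs[:, order]
# 3. Retain k components; 4. regress centered y on them.
k = 2
Z = Xs @ evecs[:, :k]
coef, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
# 5. Implied coefficients on the standardized predictors.
beta_pcr = evecs[:, :k] @ coef

r2 = 1 - np.sum((y - y.mean() - Z @ coef) ** 2) / np.sum((y - y.mean()) ** 2)
```

Dropping the near-zero eigenvalue direction (the difference between the two collinear predictors) sacrifices almost no explanatory power while stabilizing the coefficient estimates.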

**(ii) How can we select the number of principal components to be omitted using the scree plot and variance explained criterion?**

The scree plot and variance explained criterion are commonly used methods for selecting the number of principal components to be omitted in a principal component analysis (PCA). Here's how you can use them:

Scree plot method: The scree plot is a graph that shows the eigenvalues of each principal component in descending order. The point at which the plot "levels off" is used as a guide for selecting the number of components to retain. The idea is that the initial components will have high eigenvalues, while the later ones will have relatively low eigenvalues. The point at which the plot levels off indicates that the additional components are not accounting for a significant amount of variance in the data.

To use the scree plot method to select the number of principal components, you can follow these steps:

Conduct a PCA on your data.

Plot the eigenvalues of each principal component in descending order.

Look for the "elbow" or point where the plot levels off.

Select the number of components to retain based on the location of the elbow.

Variance explained criterion: Another way to select the number of principal components to retain is to use the variance explained criterion. This method involves selecting the number of components that explain a desired amount of variance in the data. For example, you may decide that you want to retain enough components to explain 80% of the variance in the data.

To use the variance explained criterion to select the number of principal components, you can follow these steps:

Conduct a PCA on your data.

Calculate the proportion of variance explained by each component.

Calculate the cumulative proportion of variance explained by adding up the proportions of variance explained by each component.

Select the number of components that explain the desired amount of variance in the data.

Note that there is no hard and fast rule for selecting the number of principal components to retain, and different methods may result in slightly different results. It's also important to interpret the retained components in the context of your research question and to assess the quality of the PCA results using other diagnostic tools.
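The variance explained criterion in particular is straightforward to compute. A small sketch on simulated data with an effectively two-dimensional structure (the factor model and the 80% threshold are assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(5)
# Five observed variables driven by two latent factors plus small noise.
F = rng.standard_normal((300, 2))
load = rng.standard_normal((2, 5))
X = F @ load + 0.1 * rng.standard_normal((300, 5))

Xs = (X - X.mean(axis=0)) / X.std(axis=0)
# Eigenvalues of the correlation matrix, sorted in descending order
# (plotting these against component number gives the scree plot).
evals = np.sort(np.linalg.eigvalsh(np.corrcoef(Xs.T)))[::-1]

prop = evals / evals.sum()               # proportion of variance per component
cum = np.cumsum(prop)                    # cumulative proportion
k = int(np.searchsorted(cum, 0.8) + 1)   # smallest k explaining >= 80% of variance
```

Because the data have only two underlying factors, the cumulative proportion jumps past the threshold within the first two components, which is also where the scree plot would show its elbow.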
