Statistics: Indian Statistical Service: Part 3 Question-1 Solved Solution

SOURAV DAS
Mar 24, 2023

1. (a) Consider a population of size N. Let S1 be a simple random sample of size n1 drawn without replacement. Another simple random sample S2 of size n2 was also drawn without replacement from the remaining population.

(i) Find the probability of obtaining the combined sample S1 ∪ S2 from the population.

The combined sample S1 ∪ S2 contains all the elements of both S1 and S2. Because S2 is drawn from the units that remain after S1 has been removed, the two samples are disjoint, and the combined sample contains n1 + n2 distinct units.

Under simple random sampling without replacement, every possible sample of a given size is equally likely. The probability of obtaining a particular S1 is therefore:

P(S1) = 1 / (N choose n1)

where (a choose b) represents the number of ways to choose b items from a set of size a.

The probability of obtaining a particular S2 from the remaining N - n1 units is:

P(S2 | S1) = 1 / ((N - n1) choose n2)

The probability of obtaining a particular ordered pair (S1, S2) is then:

P(S1, S2) = P(S1) * P(S2 | S1) = 1 / [(N choose n1) * ((N - n1) choose n2)]

A given combined sample S of n1 + n2 units can arise from any split of S into a first sample of size n1 and a second sample of size n2. There are ((n1 + n2) choose n1) such splits, each with the same probability, so:

P(S1 ∪ S2 = S) = ((n1 + n2) choose n1) / [(N choose n1) * ((N - n1) choose n2)]

Writing the binomial coefficients as factorials, this simplifies to:

P(S1 ∪ S2 = S) = (n1 + n2)! (N - n1 - n2)! / N! = 1 / (N choose (n1 + n2))

Note that this requires n1 + n2 <= N, since it is not possible to draw samples of size n1 and n2 without replacement from a population of size N when n1 + n2 > N.

Overall, the combined sample S1 ∪ S2 has the same probability as a single simple random sample of size n1 + n2 drawn without replacement from the whole population. The probability depends only on N and n1 + n2, not on how the total is split between the two samples.
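The probability of a combined sample can be checked by brute force on a small population. The sketch below enumerates every ordered pair (S1, S2), treats each S1 and each S2 given S1 as equally likely, and sums the probabilities of the pairs whose union equals a chosen target set. The function name, the sizes N = 6, n1 = 2, n2 = 3, and the target set are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def combined_sample_probability(N, n1, n2, target):
    """Exact probability that S1 ∪ S2 equals `target` under two-stage SRSWOR."""
    population = range(N)
    target = frozenset(target)
    total = Fraction(0)
    for s1 in combinations(population, n1):
        p_s1 = Fraction(1, comb(N, n1))           # every S1 is equally likely
        remaining = [u for u in population if u not in s1]
        for s2 in combinations(remaining, n2):
            p_s2 = Fraction(1, comb(N - n1, n2))  # every S2 given S1 is equally likely
            if frozenset(s1) | frozenset(s2) == target:
                total += p_s1 * p_s2
    return total

# Hypothetical sizes, chosen small enough to enumerate.
N, n1, n2 = 6, 2, 3
p = combined_sample_probability(N, n1, n2, target={0, 1, 2, 3, 4})
print(p, Fraction(1, comb(N, n1 + n2)))  # prints: 1/6 1/6
```

For these sizes the enumerated probability agrees with 1 / (N choose (n1 + n2)), the probability of a single simple random sample of size n1 + n2.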

(ii) Define Ya = aY1 + (1 - a)Y2, 0 < a < 1. Show that Ya is an unbiased estimator for the population mean. Here Yi is the mean of sample Si, i = 1, 2. (7+8)

To show that Ya is an unbiased estimator for the population mean, we need to show that the expected value of Ya is equal to the population mean.

We have:

Ya = aY1 + (1 - a)Y2

where Y1 and Y2 are the sample means of samples S1 and S2, respectively.

The expected value of Ya can be computed as follows:

E(Ya) = E(aY1 + (1 - a)Y2)

= aE(Y1) + (1 - a)E(Y2)

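Since S1 is a simple random sample from the population, E(Y1) = μ, where μ is the population mean. For Y2, note that given S1, the sample S2 is a simple random sample from the remaining N - n1 units, whose mean is (Nμ - n1Y1)/(N - n1). Hence E(Y2 | S1) = (Nμ - n1Y1)/(N - n1), and taking the expectation over S1 gives E(Y2) = (Nμ - n1μ)/(N - n1) = μ. Therefore: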

E(Ya) = aμ + (1 - a)μ

= μ

This shows that the expected value of Ya is equal to the population mean for every 0 < a < 1, and hence Ya is an unbiased estimator for the population mean.
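The unbiasedness can also be checked exactly on a small population by averaging Ya over every equally likely pair (S1, S2). This is only a sketch: the population values, the sample sizes, and the function name are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def expected_combined_mean(y, n1, n2, a):
    """Exact E[a*Y1 + (1 - a)*Y2] over all equally likely (S1, S2) pairs."""
    N = len(y)
    units = range(N)
    n_pairs = comb(N, n1) * comb(N - n1, n2)
    total = Fraction(0)
    for s1 in combinations(units, n1):
        rest = [u for u in units if u not in s1]   # S2 comes from the remaining units
        y1_bar = Fraction(sum(y[u] for u in s1), n1)
        for s2 in combinations(rest, n2):
            y2_bar = Fraction(sum(y[u] for u in s2), n2)
            total += a * y1_bar + (1 - a) * y2_bar
    return total / n_pairs

y = [3, 7, 8, 12, 15]             # hypothetical population values
mu = Fraction(sum(y), len(y))     # population mean
for a in (Fraction(1, 4), Fraction(1, 2), Fraction(9, 10)):
    # Compare the exact expectation of Ya with the population mean.
    print(a, expected_combined_mean(y, n1=2, n2=2, a=a) == mu)
```

Exact fractions are used so that the comparison with μ is not affected by rounding.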

1.(b) Consider the multiple regression model with a set of linear equality restrictions binding the regression coefficients.

(i) Derive the restricted regression estimator by minimizing the residual sum of squares under the set of restrictions.

When there are linear equality restrictions binding the regression coefficients in a multiple regression model, the unrestricted least squares estimator may not satisfy these restrictions. Therefore, we need to derive a restricted regression estimator that satisfies these restrictions.

Let's consider a multiple regression model with p independent variables and n observations:

Y = β0 + β1X1 + β2X2 + ... + βpXp + ε

where Y is the dependent variable, X1, X2, ..., Xp are the independent variables, β0, β1, β2, ..., βp are the regression coefficients, and ε is the error term.

Suppose we have k linear equality restrictions on the regression coefficients, expressed as:

Rβ = r

where R is a k x (p+1) matrix of constants and r is a k x 1 vector of constants.

The restricted regression estimator, denoted as βr, is obtained by minimizing the residual sum of squares (RSS) subject to the linear equality restrictions.

The RSS is given by:

RSS = (Y - Xβ)'(Y - Xβ)

where X is the n x (p+1) matrix of independent variables, with the first column all ones to capture the intercept term.

To incorporate the linear equality restrictions, we need to use Lagrange multipliers. The Lagrangian function is given by:

L(β, λ) = (Y - Xβ)'(Y - Xβ) + λ'(Rβ - r)

where λ is a k x 1 vector of Lagrange multipliers.

To find the restricted regression estimator, we need to take the derivative of the Lagrangian function with respect to β and λ, and set them equal to zero:

∂L/∂β = -2X'(Y - Xβ) + R'λ = 0

∂L/∂λ = Rβ - r = 0

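Let b = (X'X)^-1 X'Y denote the unrestricted least squares estimator, and assume that X has full column rank and that R has full row rank k. The first condition gives:

β = b - (1/2)(X'X)^-1 R'λ

Substituting this into the second condition Rβ = r and solving for λ, we get:

λ = 2[R(X'X)^-1 R']^-1 (Rb - r)

Substituting λ back, we get:

βr = b - (X'X)^-1 R'[R(X'X)^-1 R']^-1 (Rb - r)

This is called the restricted least squares estimator. It corrects the unrestricted estimator b by an amount proportional to Rb - r, the extent to which b violates the restrictions. Premultiplying by R gives:

Rβr = Rb - (Rb - r) = r

which ensures that the restricted estimator satisfies the linear equality restrictions.

Note that if the number of restrictions k is equal to p + 1, then R is square and nonsingular, and the restrictions alone determine the estimator:

βr = R^-1 r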
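The two first-order conditions form a linear system in β and λ, namely 2X'Xβ + R'λ = 2X'Y and Rβ = r, which can be solved in one step. The sketch below does this with exact fractions and a small Gauss-Jordan solver. The helper names and the data are hypothetical, and the code assumes that X has full column rank and that R has full row rank.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

def restricted_least_squares(X, Y, R, r):
    """Solve the first-order conditions 2X'Xβ + R'λ = 2X'Y and Rβ = r."""
    m, k = len(X[0]), len(R)
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    XtY = [sum(row[i] * yi for row, yi in zip(X, Y)) for i in range(m)]
    # Block system [[2X'X, R'], [R, 0]] [β; λ] = [2X'Y; r]
    top = [[2 * XtX[i][j] for j in range(m)] + [R[q][i] for q in range(k)]
           for i in range(m)]
    bottom = [list(R[q]) + [0] * k for q in range(k)]
    solution = solve(top + bottom, [2 * v for v in XtY] + list(r))
    return solution[:m], solution[m:]   # (βr, λ)

# Hypothetical data: intercept plus two regressors, restriction β1 + β2 = 1.
X = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1], [1, 1, 3]]
Y = [2, 1, 3, 5, 4]
R = [[0, 1, 1]]
r = [1]
beta, lam = restricted_least_squares(X, Y, R, r)
print(beta, beta[1] + beta[2])   # the last value equals r[0]
```

Solving the block system directly avoids forming (X'X)^-1 explicitly. For real data a numerical linear algebra library would normally be used instead of this hand-written solver.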

(ii) Obtain the bias of the restricted regression estimator when the restrictions may not be true. Show that the estimator is unbiased and satisfies linear restrictions provided the restrictions are true.

If the linear equality restrictions are not true, the restricted regression estimator obtained by minimizing the residual sum of squares subject to the restrictions may be biased.

To derive the bias of the restricted estimator, let β denote the true regression coefficients in the model Y = Xβ + ε, where ε is the vector of errors with E(ε) = 0 and X is non-random with full column rank. At this stage we do not assume that Rβ = r.

The restricted estimator can be expressed as:

βr = b - (X'X)^-1 R'[R(X'X)^-1 R']^-1 (Rb - r)

where b = (X'X)^-1 X'Y is the unrestricted least squares estimator.

Substituting Y = Xβ + ε in b, we get:

b = β + (X'X)^-1 X'ε

Taking the expected value of both sides and using E(ε) = 0, we get:

E(b) = β

so the unrestricted estimator is unbiased, and E(Rb - r) = Rβ - r.

Taking the expected value of βr, we get:

E(βr) = β - (X'X)^-1 R'[R(X'X)^-1 R']^-1 (Rβ - r)

The second term on the right-hand side is the bias term:

Bias(βr) = E(βr) - β = -(X'X)^-1 R'[R(X'X)^-1 R']^-1 (Rβ - r)

The bias is a linear function of Rβ - r, the extent to which the true coefficients violate the restrictions. Since R has full row rank, the matrix (X'X)^-1 R'[R(X'X)^-1 R']^-1 has full column rank, so the bias is zero if and only if Rβ = r.

If the restrictions are true, we have:

Rβ = r

Substituting this in the bias term, we get:

Bias(βr) = -(X'X)^-1 R'[R(X'X)^-1 R']^-1 (r - r)

= 0

and hence E(βr) = β.

To show that the estimator satisfies the linear restrictions, premultiply βr by R:

Rβr = Rb - R(X'X)^-1 R'[R(X'X)^-1 R']^-1 (Rb - r)

= Rb - (Rb - r)

= r

This holds for every sample, whether or not the true coefficients satisfy the restrictions.

Therefore, we have shown that the restricted estimator is unbiased and satisfies linear restrictions when the restrictions are true.
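A small numerical illustration of the bias, under assumptions chosen to keep the algebra short: a hypothetical design with two orthogonal columns, so that X'X = diag(4, 4), and a single restriction β1 + β2 = c, so that R = [1, 1]. The restricted estimator then takes the standard closed form βr = b - (X'X)^-1 R'[R(X'X)^-1 R']^-1 (Rb - c), which here reduces to βr = b - (1/2)(b1 + b2 - c)[1, 1]'. Because βr is an affine function of Y and E(Y) = Xβ, its expectation is obtained by evaluating it at Xβ.

```python
from fractions import Fraction

# Hypothetical orthogonal design: X'X = diag(4, 4), so (X'X)^-1 = I/4.
X = [[1, 1], [1, -1], [1, 1], [1, -1]]

def ols(y):
    """Unrestricted estimator b = (X'X)^-1 X'y for the orthogonal design above."""
    return [Fraction(sum(row[j] * yi for row, yi in zip(X, y)), 4) for j in range(2)]

def restricted(y, c):
    """Restricted estimator under the single restriction β1 + β2 = c (R = [1, 1]).

    Here R(X'X)^-1 R' = 1/2, so βr = b - (1/2)(b1 + b2 - c)[1, 1]'.
    """
    b = ols(y)
    gap = b[0] + b[1] - c          # Rb - r
    return [bj - gap / 2 for bj in b]

def bias(beta, c):
    """E(βr) - β, using E(βr) = βr evaluated at E(Y) = Xβ."""
    mean_y = [row[0] * beta[0] + row[1] * beta[1] for row in X]
    return [est - bj for est, bj in zip(restricted(mean_y, c), beta)]

print(bias(beta=[2, 3], c=5))   # restriction true (2 + 3 = 5): zero bias
print(bias(beta=[2, 3], c=7))   # restriction false: bias of 1 in each coefficient
```

In the second case Rβ - r = -2, and the formula -(X'X)^-1 R'[R(X'X)^-1 R']^-1 (Rβ - r) gives -(1/2)(-2) = 1 for each coefficient, matching the computed bias.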

1.(c) (i) Explain the steps in constructing a consumer price index and discuss a method of its construction.

A consumer price index (CPI) is a measure of the average change over time in the prices paid by consumers for a market basket of consumer goods and services. The CPI is used as a gauge of inflation, and it is one of the most widely used economic indicators.

Here are the steps involved in constructing a CPI:

Select the CPI market basket: The first step is to select the market basket of goods and services that will be used to calculate the CPI. The market basket should be representative of the goods and services purchased by the population being measured.

Collect price data: The prices of the goods and services in the market basket are collected at regular intervals, such as monthly. The prices are typically collected from a sample of retail stores, service providers, and other outlets where the goods and services are sold.

Weight the prices: The prices are weighted according to the importance of the goods and services in the market basket. The weights are determined by the consumption patterns of the population being measured.

Calculate the cost of the market basket: The cost of the market basket is calculated by multiplying the prices by the weights and summing the results. This gives the total cost of the market basket in the current period.

Calculate the CPI: The CPI is calculated by dividing the cost of the market basket in the current period by the cost of the market basket in the base period and multiplying by 100, so that the index for the base period is 100. A CPI of 110, for example, means that the basket costs 10% more than it did in the base period.

Calculate the inflation rate: The inflation rate is calculated as the percentage change in the CPI over time.
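The last three steps can be sketched in a few lines. The basket, the quantities, and the prices below are hypothetical figures used only for illustration.

```python
def basket_cost(prices, quantities):
    """Cost of buying the fixed basket at the given prices."""
    return sum(prices[item] * quantities[item] for item in quantities)

def cpi(prices, base_prices, quantities):
    """Basket cost relative to the base period, scaled so the base period is 100."""
    return 100 * basket_cost(prices, quantities) / basket_cost(base_prices, quantities)

# Hypothetical basket (quantities) and prices per unit.
basket = {"rice": 10, "milk": 20, "bus fare": 30}
prices_base = {"rice": 40.0, "milk": 25.0, "bus fare": 10.0}
prices_year1 = {"rice": 44.0, "milk": 26.0, "bus fare": 11.0}
prices_year2 = {"rice": 46.0, "milk": 28.0, "bus fare": 12.0}

cpi_year1 = cpi(prices_year1, prices_base, basket)   # 1290 / 1200 * 100 = 107.5
cpi_year2 = cpi(prices_year2, prices_base, basket)   # 1380 / 1200 * 100 = 115.0

# Inflation rate between year 1 and year 2, in percent.
inflation = 100 * (cpi_year2 - cpi_year1) / cpi_year1
print(cpi_year1, cpi_year2, round(inflation, 2))
```

The inflation rate is computed from the two index values, not from the basket costs directly, which mirrors how published index numbers are normally used.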

There are several methods for constructing a CPI, but the most commonly used method is the Laspeyres index. The Laspeyres index uses the fixed weights of the base period to calculate the CPI for subsequent periods. The Laspeyres index is relatively easy to calculate and is widely used, but it may overestimate inflation because it does not take into account the substitution effect, which occurs when consumers switch to lower-priced goods and services when prices rise.

Another method for constructing a CPI is the Paasche index, which uses current period weights to calculate the CPI for each period. Because its weights reflect what consumers buy after they have responded to price changes, the Paasche index tends to understate inflation. It is also more difficult to calculate, because it requires new quantity or expenditure data in every period, and these data are costly to collect and may be subject to measurement error.
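The difference between the two formulas can be seen on a two-item example. The prices and quantities below are hypothetical. The Laspeyres index is computed as 100 times the sum of p1*q0 over the sum of p0*q0, and the Paasche index as 100 times the sum of p1*q1 over the sum of p0*q1, where 0 denotes the base period and 1 the current period.

```python
def laspeyres(p0, p1, q0):
    """100 * sum(p1*q0) / sum(p0*q0): base-period quantities as weights."""
    return 100 * sum(p1[i] * q0[i] for i in p0) / sum(p0[i] * q0[i] for i in p0)

def paasche(p0, p1, q1):
    """100 * sum(p1*q1) / sum(p0*q1): current-period quantities as weights."""
    return 100 * sum(p1[i] * q1[i] for i in p0) / sum(p0[i] * q1[i] for i in p0)

# Hypothetical data: tea becomes much dearer, and consumers shift toward coffee.
p0 = {"tea": 10.0, "coffee": 10.0}   # base-period prices
p1 = {"tea": 15.0, "coffee": 11.0}   # current-period prices
q0 = {"tea": 6, "coffee": 4}         # base-period quantities
q1 = {"tea": 3, "coffee": 7}         # current-period quantities

laspeyres_index = laspeyres(p0, p1, q0)   # (90 + 44) / (60 + 40) * 100 = 134.0
paasche_index = paasche(p0, p1, q1)       # (45 + 77) / (30 + 70) * 100 = 122.0
print(laspeyres_index, paasche_index)
```

With these figures the Laspeyres index exceeds the Paasche index, because the base-period weights ignore the shift away from the item whose price rose the most.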

In practice, a CPI may use a combination of these methods or other methods to address specific issues or limitations in the data. The Bureau of Labor Statistics in the United States, for example, uses a Laspeyres-type formula to combine item categories and a geometric mean formula within most categories, which allows for some substitution between similar items.

(ii) Indicate the precautions required while using the consumer price index numbers.

The consumer price index (CPI) is an important economic indicator that measures the average change over time in the prices paid by consumers for a basket of goods and services. While CPI is widely used as a measure of inflation, it is important to take certain precautions when interpreting and using CPI data. Here are some precautions to keep in mind:

Understand the CPI methodology: The CPI is calculated using a complex methodology that takes into account changes in the prices of thousands of goods and services. It is important to understand the details of the methodology to correctly interpret the CPI data.

Consider the basket of goods and services: The CPI is based on a fixed basket of goods and services that may not reflect the spending patterns of all consumers. Therefore, the CPI may not accurately reflect the inflation experienced by different demographic groups.

Take regional differences into account: The CPI may not reflect regional differences in prices, particularly for items such as housing and transportation, which can vary significantly depending on location.

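Be aware of substitution bias: A fixed-basket CPI assumes that consumers keep buying the same basket and do not substitute lower-priced goods and services for those that have become more expensive. When consumers do substitute, the CPI may overstate the rise in their cost of living. The bias is smaller for items such as housing or healthcare, where substitution may not be possible or desirable.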

Watch out for measurement errors: CPI data can be affected by measurement errors such as sampling errors, data collection errors, and quality adjustments. It is important to be aware of these errors and their potential impact on the accuracy of the CPI data.

By keeping these precautions in mind, you can make better use of CPI data to understand inflation and its impact on consumers.
