Chi-Square (χ2) Test

In the preceding chapter, the tests were based on the assumption that the samples were drawn from a normally or approximately normally distributed population. Such tests are called parametric tests. In some situations, however, it is not possible to assume a particular type of population distribution from which the samples are drawn, and we instead need to test the frequencies of objects falling into specified categories. For example, in socio-economic studies we count the number of families at different income levels; in production management studies, the number of defective products; in market research, the number of favored and disfavored products; and so on.

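Such tests are carried out with a non-parametric test known as the Chi-square test. It is denoted by the Greek letter χ2 and was developed by Karl Pearson in 1900. The statistic describes the magnitude of the discrepancy between the observed and expected frequencies:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$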

Where

            O = Observed frequency

            E = Expected frequency
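
As a minimal sketch, the statistic sums (O - E)^2 / E over all classes. The function name `chi_square_statistic` and the data below are hypothetical and serve only to illustrate the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 100 items sorted into four classes, 25 expected in each.
observed = [20, 30, 24, 26]
expected = [25, 25, 25, 25]
chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 2))  # (25 + 25 + 1 + 1) / 25 = 2.08
```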

Degrees of Freedom

The degrees of freedom are the number of frequencies, arranged in a row, a column, or a contingency table, that can be assigned independently once the constraints on the data are fixed. It is denoted by v. The degrees of freedom of χ2 are determined in two ways:

Case I:

If the observed frequencies are presented in a series (i.e., in a single row or column), the degrees of freedom are given by v = n - 1, where n is the number of classes (frequencies) in the row or column.

Case II:

If the observed frequencies are presented in a contingency table (i.e., in rows as well as columns), the degrees of freedom are given by v = (r - 1)(c - 1), where r and c are the numbers of rows and columns respectively.
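
The two cases can be sketched as small helper functions. The names `df_series` and `df_contingency` are hypothetical, chosen here only for illustration.

```python
def df_series(n_classes):
    """Case I: frequencies in a single row or column, v = n - 1."""
    if n_classes < 2:
        raise ValueError("need at least two classes")
    return n_classes - 1


def df_contingency(n_rows, n_cols):
    """Case II: frequencies in an r x c contingency table, v = (r - 1)(c - 1)."""
    if n_rows < 2 or n_cols < 2:
        raise ValueError("need at least two rows and two columns")
    return (n_rows - 1) * (n_cols - 1)


print(df_series(6))          # six classes in one row -> 5
print(df_contingency(3, 4))  # 3 x 4 table -> 6
```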

Note:

The general form of the degrees of freedom in the χ2 distribution is v = n - 1 - k1 - k2.

Here,

(i) 1 d.f. is lost due to the linear constraint ΣO = ΣE = N.

(ii) k1 d.f. are lost for the k1 parameters estimated from the sample when the parameters of the binomial or Poisson distribution are not given. If the parameters are given, we take k1 as zero.

(iii) k2 d.f. are lost due to the pooling of theoretical frequencies that are less than 5. If no frequency is less than 5, we take k2 as zero.
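
As an illustration with hypothetical numbers, suppose a Poisson distribution is fitted to n = 7 classes, its mean is estimated from the sample (so k1 = 1), and the last two classes are pooled into one because their expected frequencies are below 5 (so k2 = 1). Under these assumptions the degrees of freedom would be:

```latex
v = n - 1 - k_1 - k_2 = 7 - 1 - 1 - 1 = 4
```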

Before applying the χ2 test, the following precautions are necessary:

1.      The constraints on the cell frequencies should be linear, such as ΣO = ΣE = N.

2.      Sample observations must be drawn randomly from the population.

3.      The sample should contain at least 50 observations.

4.      The observations should be expressed in original units, rather than in percentage or ratio form. This precaution helps in comparing the attributes of interest.

5.      Each cell (a group of results) should contain at least 5 observations. If an expected frequency is less than 5, the value of χ2 is overestimated, which can lead to wrongly rejecting the null hypothesis. Hence the chi-square test cannot be applied directly when any theoretical frequency is less than 5. In that case we use the technique of pooling: frequencies that are less than 5 are added to the preceding or succeeding frequency (or frequencies) so that the resulting sum is at least 5, and the degrees of freedom are adjusted accordingly.

6.      All the individual observations in a sample should be independent.
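
The pooling described in precaution 5 can be sketched as follows. This is one possible scheme, not a prescribed one: it merges each small class with the classes that follow it, and any small leftover at the end with the class before it. The function name `pool_small_expected` and the frequencies are hypothetical.

```python
def pool_small_expected(observed, expected, minimum=5):
    """Merge adjacent classes until every pooled expected frequency >= minimum.

    Returns (pooled_observed, pooled_expected, classes_lost), where
    classes_lost is the number of classes removed by pooling.
    """
    pooled_obs, pooled_exp = [], []
    acc_o = acc_e = 0
    pending = 0  # classes accumulated but not yet emitted
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        pending += 1
        if acc_e >= minimum:
            pooled_obs.append(acc_o)
            pooled_exp.append(acc_e)
            acc_o = acc_e = 0
            pending = 0
    if pending:  # small tail: add it to the preceding pooled class
        if not pooled_exp:
            raise ValueError("total expected frequency is below the minimum")
        pooled_obs[-1] += acc_o
        pooled_exp[-1] += acc_e
    return pooled_obs, pooled_exp, len(expected) - len(pooled_exp)


# Hypothetical frequencies; the first and the last two expected values are below 5.
observed = [2, 6, 15, 20, 12, 4, 1]
expected = [3, 8, 14, 18, 11, 4, 2]
print(pool_small_expected(observed, expected))
# -> ([8, 15, 20, 12, 5], [11, 14, 18, 11, 6], 2)
```

Pooling leaves the totals unchanged (both lists still sum to 60 here), so the linear constraint in precaution 1 continues to hold.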

Properties of Chi-Square Distribution

1.      The χ2 distribution is a continuous probability distribution whose values range from 0 to ∞.

2.      Since χ2 is the sum of squares, the value cannot be negative.

3.      The value of χ2 is zero if each observed frequency equals the corresponding expected frequency.

4.      For different degrees of freedom, the shape of the curve will be different as shown in the following figure.

5.      The χ2 tests discussed here are one-tailed tests that use the right-hand tail of the χ2 curve.

6.      The χ2 distribution is always positively skewed [since d.f. ≥ 1].

7.      For a chi-square distribution with v d.f., we have mean = v, variance = 2v, and mode = v - 2 (for v ≥ 2).

8.      The median of the χ2 distribution divides the area under the curve into two equal parts.
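
Property 7 can be checked numerically. The sketch below assumes the standard chi-square density, x^(v/2 - 1) e^(-x/2) / (2^(v/2) Γ(v/2)), which the text above does not state, and approximates the integrals with a simple midpoint rule. The function names are hypothetical.

```python
import math


def chi2_pdf(x, v):
    """Density of the chi-square distribution with v degrees of freedom."""
    if x <= 0:
        return 0.0
    return x ** (v / 2 - 1) * math.exp(-x / 2) / (2 ** (v / 2) * math.gamma(v / 2))


def moments_and_mode(v, upper=200.0, steps=200_000):
    """Approximate mean, variance and mode by the midpoint rule on (0, upper)."""
    h = upper / steps
    mean = second = 0.0
    mode_x, mode_f = 0.0, 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        f = chi2_pdf(x, v)
        mean += x * f * h
        second += x * x * f * h
        if f > mode_f:  # track the grid point with the highest density
            mode_x, mode_f = x, f
    return mean, second - mean ** 2, mode_x


# For v = 6 the results should be close to mean 6, variance 12 and mode 4.
mean, variance, mode = moments_and_mode(6)
print(round(mean, 3), round(variance, 3), round(mode, 2))
```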

Application of Chi-Square Distribution

The basic applications of the χ2 test are as follows:

1.      Test of goodness of fit

2.      Test of independence of attributes

3.      Yates correction for continuity

4.      Test for the population variance

5.      Test for homogeneity

Here we will discuss only the test of goodness of fit, the test of independence of attributes, and Yates correction for continuity.

Test of Goodness of Fit

The χ2 test of goodness of fit lets a researcher judge whether an observed sample frequency distribution agrees with a theoretical frequency distribution. The observed frequencies come from the sample (the field), and the expected frequencies come from the hypothesized theoretical distribution. Goodness of fit is measured by the differences between the observed and expected frequencies. Small differences are assumed to result from sampling error. Large differences, on the other hand, cast doubt on the assumption that the hypothesized theoretical distribution is correct.

The test of goodness of fit is also used to test whether the observed frequencies differ significantly from the expected frequencies of a binomial, Poisson, normal, or other theoretical distribution.
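
In each of these cases the expected frequency of a class is the total frequency N multiplied by the theoretical probability of that class. The helpers below are a minimal sketch of that idea for equal proportions, given proportions, the binomial distribution, and the Poisson distribution. All function names and the coin-tossing data are hypothetical.

```python
import math


def expected_equal(observed):
    """E = (sum of O) / n for each of the n classes."""
    n = len(observed)
    return [sum(observed) / n] * n


def expected_from_proportions(total, proportions):
    """E = N x proportion; the proportions are assumed to sum to 1."""
    return [total * p for p in proportions]


def expected_binomial(total, n, p):
    """E_r = N x nCr x p^r x q^(n - r) for r = 0, 1, ..., n, with q = 1 - p."""
    q = 1 - p
    return [total * math.comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]


def expected_poisson(total, m, max_r):
    """E_r = N x e^(-m) x m^r / r! for r = 0, 1, ..., max_r."""
    return [total * math.exp(-m) * m ** r / math.factorial(r) for r in range(max_r + 1)]


# Hypothetical example: 160 tosses of 4 fair coins, classes = number of heads.
print(expected_binomial(160, 4, 0.5))  # [10.0, 40.0, 60.0, 40.0, 10.0]
```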

Basic steps for the goodness of fit are as follows:

Step 1: Null hypothesis: There is no significant difference between observed and expected frequency distributions.

Step 2: Alternative hypothesis: There is a significant difference between observed and expected frequency distributions.

Step 3: Test statistic: Under H0, the test statistic is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O = observed frequency (from the field) and, under H0, E = expected frequency (from the formula):

(i) $E = \sum O / n$ (for equal proportions)

(ii) $E = N \times \text{proportion}$ (for unequal proportions)

Here, N = total frequency = total observed data = $\sum O$.

(iii) $E = N \times {}^{n}C_{r}\, p^{r} q^{n-r}$ (for the binomial distribution)

(iv) $E = N \times e^{-m} m^{r} / r!$ (for the Poisson distribution)

Step 4: Level of significance: α

Step 5: Degree of freedom: (n-1)

Step 6: Critical value: Determine the tabulated value of χ2 at the α % level of significance for (n - 1) degrees of freedom from the χ2 table.

Step 7: Decision: If the calculated value of χ2 ≤ the tabulated value of χ2, we accept the null hypothesis; otherwise, we reject the null hypothesis.
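
The steps above can be sketched end to end as follows. The die-throwing data are hypothetical, the 5% level is an assumed choice of α, and the critical values are the usual tabulated ones for 1 to 5 degrees of freedom. The function name `goodness_of_fit` is also hypothetical.

```python
# Tabulated critical values of chi-square at the 5% level of significance.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def goodness_of_fit(observed, expected, critical_values=CRITICAL_5_PERCENT):
    """Run steps 3 to 7: statistic, d.f., critical value and decision."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # n - 1, no parameters estimated, no pooling
    critical = critical_values[df]
    decision = "accept H0" if chi2 <= critical else "reject H0"
    return chi2, df, critical, decision


# Hypothetical data: a die thrown 60 times; H0: the die is unbiased.
observed = [8, 12, 9, 11, 14, 6]
expected = [sum(observed) / len(observed)] * len(observed)  # equal proportions
chi2, df, critical, decision = goodness_of_fit(observed, expected)
print(round(chi2, 2), df, critical, decision)  # 4.2 5 11.07 accept H0
```

Since the calculated value 4.2 is below the tabulated value 11.070, the hypothetical data give no reason to doubt that the die is unbiased.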

