# Chi Square Distribution – Definition

### Chi-Square (χ²) Distribution Definition

In probability theory and statistics, the chi-squared distribution (also referred to as the chi-square or χ²-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables.
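The definition can be checked by simulation. The sketch below is illustrative only: the function name `simulate_chi_square`, the choice k = 3, the number of draws, and the seed are all assumptions made for this example. It relies on the standard facts that a chi-squared variable with k degrees of freedom has mean k and variance 2k.

```python
import random
import statistics

def simulate_chi_square(k, n_draws, seed=0):
    """Draw n_draws values of Z_1^2 + ... + Z_k^2, where each Z_i ~ N(0, 1)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        for _ in range(n_draws)
    ]

k = 3  # degrees of freedom (hypothetical choice)
draws = simulate_chi_square(k, n_draws=20_000)

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)

# The simulated moments should land close to the theoretical values k and 2k.
print(f"mean     ~ {sample_mean:.2f} (theory: {k})")
print(f"variance ~ {sample_var:.2f} (theory: {2 * k})")
```

Because the draws are random, the printed values only approximate k and 2k; increasing `n_draws` tightens the agreement.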

The chi-squared distribution is a special case of the gamma distribution and is one of the most widely used probability distributions in inferential statistics. It is commonly used in hypothesis testing and in the construction of confidence intervals.
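To make the gamma connection concrete, here is a short derivation. It assumes the shape/scale parameterization of the gamma distribution; the symbols α (shape) and θ (scale) are introduced here and do not appear in the original text. The gamma density for x > 0 is

$$f(x;\alpha,\theta) = \frac{x^{\alpha-1}\,e^{-x/\theta}}{\Gamma(\alpha)\,\theta^{\alpha}}$$

Setting α = k/2 and θ = 2 gives the density of the chi-squared distribution with k degrees of freedom:

$$f(x;k) = \frac{x^{k/2-1}\,e^{-x/2}}{2^{k/2}\,\Gamma(k/2)}$$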

### A Little More on What is Chi-Square (χ²) Distribution

The chi-squared distribution is used in the common chi-squared tests for goodness of fit of an observed distribution to a theoretical one, for the independence of two criteria of classification of qualitative data, and in confidence interval estimation for the population standard deviation of a normal distribution from a sample standard deviation. Many other statistical tests, such as Friedman's analysis of variance by ranks, also use the chi-squared distribution.

The chi-squared distribution is most commonly used in hypothesis testing. Unlike more widely known distributions such as the normal and exponential distributions, the chi-squared distribution is rarely used to model natural phenomena directly. It arises in the following hypothesis tests:

• Chi-squared test of independence in contingency tables
• Chi-squared test of goodness of fit of observed data to hypothetical distributions
• Likelihood-ratio test for nested models
• Log-rank test in survival analysis
• Cochran–Mantel–Haenszel test for stratified contingency tables
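As an illustration of the first test in the list, the sketch below computes Pearson's chi-squared statistic for a contingency table. The function names and the 2×2 table of counts are hypothetical, no continuity correction is applied, and the p-value helper only covers one degree of freedom, where the upper-tail probability can be written with the complementary error function.

```python
import math

def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if the row and column classifications are independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed_count - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

def p_value_one_dof(statistic):
    """Upper-tail probability for 1 degree of freedom: P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(statistic / 2.0))

# Hypothetical 2 x 2 table: rows are groups, columns are outcomes.
observed = [[30, 10],
            [20, 40]]

stat, dof = chi_square_independence(observed)
p = p_value_one_dof(stat)  # valid here because dof == 1
print(f"chi-square = {stat:.4f}, dof = {dof}, p = {p:.2e}")
```

For this table the expected counts are 20, 20, 30 and 30, so the statistic is 100/20 + 100/20 + 100/30 + 100/30 ≈ 16.67 on one degree of freedom. Tables with more degrees of freedom need a general chi-squared tail probability, which this sketch does not implement.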

Besides the above applications, the chi-squared distribution is part of the definition of the t-distribution and the F-distribution, which are used in t-tests, analysis of variance, and regression analysis.
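For reference, the standard constructions are sketched below. The symbols Z, V, U₁, U₂, d₁ and d₂ are introduced here for illustration. If Z is standard normal and V is chi-squared with k degrees of freedom, independent of Z, then

$$T = \frac{Z}{\sqrt{V/k}}$$

follows a t-distribution with k degrees of freedom. If U₁ and U₂ are independent chi-squared variables with d₁ and d₂ degrees of freedom, then

$$F = \frac{U_1/d_1}{U_2/d_2}$$

follows an F-distribution with (d₁, d₂) degrees of freedom.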

The main reason the chi-squared distribution is used so extensively in hypothesis testing is its relationship to the normal distribution. Many hypothesis tests use a test statistic, such as the t-statistic in a t-test. For these tests, as the sample size n increases, the sampling distribution of the test statistic approaches the normal distribution, in line with the central limit theorem.

Because the test statistic is asymptotically normally distributed, the distribution used for hypothesis testing may be approximated by a normal distribution, provided the sample size is large enough. Testing hypotheses using a normal distribution is well understood and relatively easy. The simplest chi-squared distribution is the square of a standard normal distribution, so wherever a normal distribution could be used for a hypothesis test, a chi-squared distribution could be used instead.
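The link between the two distributions can be seen in their critical values. The sketch below assumes a two-sided test at the 5% level (a choice made for this example): squaring the normal critical value gives the critical value of the chi-squared distribution with one degree of freedom.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Two-sided 5% test on Z: reject when |Z| exceeds the 97.5th percentile.
z_crit = standard_normal.inv_cdf(0.975)

# The same rejection region written in terms of Z^2.
chi2_crit_1dof = z_crit ** 2

# P(Z^2 <= c) = P(-sqrt(c) <= Z <= sqrt(c)) = 2 * Phi(sqrt(c)) - 1
coverage = 2 * standard_normal.cdf(chi2_crit_1dof ** 0.5) - 1

print(f"z critical value:          {z_crit:.4f}")
print(f"chi-square critical value: {chi2_crit_1dof:.4f}")
print(f"P(Z^2 <= critical value):  {coverage:.4f}")
```

Rejecting when |Z| > 1.96 is therefore the same decision as rejecting when Z² > 3.84.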

An additional reason the chi-squared distribution is widely used is that it arises as the large-sample distribution of likelihood ratio tests (LRTs). LRTs have several desirable properties; in particular, they commonly provide high power to reject the null hypothesis. However, the normal and chi-squared approximations are valid only asymptotically. For this reason, the t-distribution is preferred over the normal or chi-squared approximation when the sample size is small. Ramsey showed that the exact binomial test is more powerful than the normal approximation.

### The Chi-Square Statistic

Suppose we conduct the following statistical experiment. We select a random sample of size n from a normal population with standard deviation σ, and find that the sample standard deviation is s. Given these data, we can define a statistic, called chi-square, using the following equation:

$$\chi^2 = \frac{(n - 1)\,s^2}{\sigma^2}$$
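A small worked example of this statistic follows. The sample values and the population standard deviation of 0.2 are made up for illustration, and the function name `chi_square_statistic` is hypothetical.

```python
import statistics

def chi_square_statistic(sample, sigma):
    """Compute (n - 1) * s^2 / sigma^2 for a sample and a population std. dev. sigma."""
    n = len(sample)
    s_squared = statistics.variance(sample)  # sample variance, divisor n - 1
    return (n - 1) * s_squared / sigma ** 2

# Hypothetical measurements, assumed drawn from a normal population.
sample = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.4]
sigma = 0.2  # assumed population standard deviation

chi_sq = chi_square_statistic(sample, sigma)
dof = len(sample) - 1

print(f"chi-square = {chi_sq:.2f} with {dof} degrees of freedom")
```

Here n = 8 and the sample variance is 0.42 / 7 = 0.06, so the statistic is 7 × 0.06 / 0.04 = 10.5.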

The distribution of the chi-square statistic is referred to as the chi-square distribution. The chi-square distribution is given by the following probability density function:

$$Y = Y_0 \,(\chi^2)^{(v/2 - 1)}\, e^{-\chi^2/2}$$

Here Y₀ is a constant that depends on the number of degrees of freedom, χ² is the chi-square statistic, v = n – 1 is the number of degrees of freedom, and e is the base of the natural logarithm (approximately 2.71828). Y₀ is defined so that the area under the chi-square curve is equal to 1.
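The normalization can be checked numerically. The sketch below uses the standard closed form for the constant, 1 / (2^(v/2) · Γ(v/2)), which is not stated in the text above. The function names, the integration range, and the step count are choices made for this example. The simple midpoint rule is used only for v ≥ 2, because the density is unbounded near zero when v = 1.

```python
import math

def chi2_pdf(x, v):
    """Density Y = Y0 * x**(v/2 - 1) * exp(-x/2) for x > 0, with v degrees of freedom."""
    if x <= 0:
        return 0.0  # treated as zero here; a single point does not change the area
    y0 = 1.0 / (2 ** (v / 2) * math.gamma(v / 2))  # normalizing constant
    return y0 * x ** (v / 2 - 1) * math.exp(-x / 2)

def area_under_curve(v, upper=100.0, steps=100_000):
    """Midpoint-rule approximation of the integral of the density over [0, upper]."""
    h = upper / steps
    return sum(chi2_pdf((i + 0.5) * h, v) for i in range(steps)) * h

# Each area should be very close to 1 if Y0 is correct.
areas = {v: area_under_curve(v) for v in (2, 4, 7)}
for v, area in areas.items():
    print(f"v = {v}: area ~ {area:.6f}")
```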

### Academic Research for Chi Square (χ²) Distribution

• The relation of control charts to the analysis of variance and chisquare tests, Scheffe, H. (1947). Journal of the American Statistical Association, 42(239), 425-431. This paper shows, by simple and intuitive paths, some known connections among three statistical methods: Shewhart control charts, analysis of variance, and chi-square tests.
• Remarks on a multivariate transformation, Rosenblatt, M. (1952). The Annals of Mathematical Statistics, 23(3), 470-472. This paper points out and discusses a simple transformation of an absolutely continuous k-variate distribution into the uniform distribution on the k-dimensional hypercube. One can show that the random vector Z = TX is uniformly distributed on the k-dimensional hypercube.
• Chisquare test for continuous distributions with shift and scale parameters, Nikulin, M. S. (1974). Theory of Probability & Its Applications, 18(3), 559-568. This paper examines the problem of testing the null hypothesis that the distribution function of independent, identically distributed random variables belongs to a family of continuous distribution functions depending on a shift parameter and a scale parameter. Dividing the line into k intervals and grouping the observations over these intervals yields a frequency vector and a probability vector.
• Bounds on normal approximations to Student’s and the chisquare distributions, Wallace, D. L. (1959). The Annals of Mathematical Statistics, 30(4), 1121-1130. The paper addresses the conversion of upper-tail values of t or chi-square variates with n degrees of freedom to normal deviates. Its main purpose is to develop bounds on the deviation from the exact normal deviates, such that the absolute deviation is bounded by an-12cn -12 uniformly throughout the tail.
• Bivariate distributions of some ratios of independent noncentral chisquare random variables, Hawkins, D. L., & Han, C. P. (1986). Communications in Statistics-Theory and Methods, 15(1), 261-277. The paper examines the bivariate distributions of three pairs of ratios of independent non-central chi-squared random variables. These ratios arise in the problem of calculating the joint power of simultaneous F-tests in balanced ANOVA and ANCOVA.
• On the choice of the number of class intervals in the application of the chi-square test, Mann, H. B., & Wald, A. (1942). The Annals of Mathematical Statistics, 13(3), 306-317. The paper states that, to test whether a sample has been drawn from a population with a specified probability distribution, the range of the variable is divided into a number of class intervals and the statistic is calculated. Under the null hypothesis, the statistic has asymptotically the chi-square distribution with k-1 degrees of freedom when each expected class frequency is large. Whatever number of class intervals is chosen, it is generally possible to find an alternative hypothesis whose class probabilities match the class probabilities under the null hypothesis.
• Density functions of the bivariate chisquare distribution, Gunst, R. F., & Webster, J. T. (1973). Journal of Statistical Computation and Simulation, 2(3), 275-288. This paper aims to provide a practical approach to the simultaneous testing and estimation problems faced by experimenters. A form of the bivariate chi-square density that lends itself to direct computer programming is introduced. An approximation that reduces the general type of dependency to a specific form is proposed, supported by strong theoretical reasoning. A final application of the bivariate chi-square is calculating the density function of a linear combination of individual chi-square random variables.
• On the limiting power function of the frequency chisquare test, Mitra, S. K. (1958). The Annals of Mathematical Statistics, 29(4), 1221-1233. The paper notes that many authors have studied the power function of the frequency χ²-test by obtaining a large-sample expression for the simple goodness-of-fit χ²-test. Because the power function of the frequency χ²-test is difficult to obtain in general, the paper proposes deriving its Pitman limiting power and illustrates this for the simple goodness-of-fit case. The asymptotic power concept has been applied in various areas, including nonparametric inference, and has proved useful for comparing alternative consistent tests or alternative study designs.
• A family of transformed chisquare distributions, Rahman, M. S., & Gupta, R. P. (1992). Communications in Statistics-Theory and Methods, 22(1), 135-146. The paper defines a family of transformed chi-square distributions, a special class of the exponential family of distributions. Explicit expressions are given for the minimum variance unbiased estimators of a function of a parameter of this family. The power function for different hypothesis tests on the parameters of this family is also found.
• Monitoring process means and variability with one non-central chisquare chart, Costa, A. F. B., & Rahim, M. A. (2004). Journal of Applied Statistics, 31(10), 1171-1183. Conventionally, an X-chart is used to control the process mean and an R-chart to control the process variance. These charts are insensitive to small changes in the process parameters. A better alternative is the exponentially weighted moving average (EWMA) control chart for controlling the process mean and variability, which is highly effective in detecting small process changes. In addition, the EWMA control chart based on the non-central chi-square statistic is more effective in detecting changes in the mean and variability.
• On chisquare goodness-of-fit tests for continuous distributions, Watson, G. S. (1958). Journal of the Royal Statistical Society. Series B (Methodological), 44-72. The paper considers chi-squared goodness-of-fit tests in which the unknown parameters are estimated from the likelihood of the continuous observations before grouping, and studies the effect on the distribution of the test criterion.