Next: Verification and Validation of Up: Input Modeling Previous: Parameter Estimation

Goodness-of-Fit Tests

Earlier we introduced hypothesis tests to examine the quality of random-number generators. Now we apply these tests to hypotheses about the distributional forms of input data.
Goodness-of-fit tests provide helpful guidance for evaluating the suitability of a potential input model.
These tests depend heavily on the amount of data. If very little data are available, the test is unlikely to reject any candidate distribution (there is not enough evidence to reject it); if a lot of data are available, the test will likely reject all candidate distributions (because none fits perfectly).
Failure to reject a candidate should therefore be viewed as one piece of evidence in favor of that choice, and rejecting an input model as only one piece of evidence against it.
The chi-square test is valid for large sample sizes, for both discrete and continuous distributional assumptions, when parameters are estimated by maximum likelihood.
- Arrange the n observations into a set of k class intervals or cells.
- The test statistic is
  
  $\begin{displaymath}\chi_0^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2} {E_i} \end{displaymath}$
  
  where $O_i$ is the observed frequency in the $i$th class interval and $E_i = n p_i$ is the expected frequency in that interval, $p_i$ being the probability of the $i$th interval under the hypothesized distribution.
- $\chi_0^2$ approximately follows the chi-square distribution with $k-s-1$ degrees of freedom, where $s$ is the number of parameters of the hypothesized distribution estimated from the sample statistics. For example, the Poisson distribution has $s = 1$ and the normal distribution has $s = 2$.
- The hypotheses are
  
  ${\rm H_0}:$ the random variable $X$ conforms to the distributional assumption with the parameter(s) given by the parameter estimate(s)
  
  ${\rm H_1}:$ the random variable $X$ does not conform to the distribution
- The critical value $\chi^2_{\alpha, k-s-1}$ is found in Table A.6; ${\rm H_0}$ is rejected if $\chi_0^2 > \chi^2_{\alpha, k-s-1}$.
- For the choice of k, the number of class intervals, see Table 10.5 on page 377.
- Example 10.13 on page 377.
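
The procedure above can be sketched in Python. This is a minimal illustration rather than the textbook's own method: the function names are hypothetical, the cell probabilities $p_i$ are assumed to be already computed from the fitted distribution, and the critical value is passed in (looked up in Table A.6) instead of computed, so only the standard library is needed. The counts in the usage line are made up for illustration.

```python
def chi_square_statistic(observed, expected):
    """chi0^2 = sum over the k cells of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per cell")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_test(observed, probabilities, s, critical_value):
    """Chi-square goodness-of-fit decision for k = len(observed) cells.

    observed:       O_i, counts of the n observations in each cell
    probabilities:  p_i, hypothesized probability of each cell (sums to 1)
    s:              number of parameters estimated from the data
    critical_value: chi^2_{alpha, k-s-1}, looked up by hand (e.g. Table A.6)

    Returns (chi0^2, degrees of freedom, True if H0 is rejected).
    """
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in probabilities]  # E_i = n * p_i
    chi0_sq = chi_square_statistic(observed, expected)
    df = len(observed) - s - 1                 # k - s - 1
    return chi0_sq, df, chi0_sq > critical_value


# Hypothetical counts in 4 equal-probability cells, one estimated parameter.
# 5.991 is the upper 5% point of the chi-square distribution with 2 df.
chi0_sq, df, reject = chi_square_test([12, 8, 11, 9], [0.25] * 4, s=1,
                                      critical_value=5.991)
print(f"chi0^2 = {chi0_sq:.3f}, df = {df}, reject H0: {reject}")
# prints: chi0^2 = 1.000, df = 2, reject H0: False
```

Keeping the table lookup outside the function mirrors the hand procedure; with SciPy available, the critical value could instead be computed from the chi-square inverse cdf.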
Chi-square test with equal probabilities:
- If a continuous distributional assumption is being tested, class intervals that are equal in probability rather than equal in width of interval should be used.
- Example 10.14: Chi-square test for exponential distribution (page 379)
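  - test with intervals of equal probability (not necessarily equal width)
  - the number of intervals should be at most n/5, so that each expected frequency $E_i = n/k$ is at least 5
  - n = 50, so $k \le 10$; according to the recommendations in Table 10.5, 7 to 10 class intervals should be used.
  - Let k = 8; thus each interval has probability $p = 1/k = 0.125$.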
  - The end points for each interval are computed from the cdf for the exponential distribution
    
    $\begin{displaymath}F(a_i) = 1 - e^{-\lambda a_i} \end{displaymath}$
    
    where $a_i$ represents the end point of the $i$th interval.
  - Since $F(a_i)$ is the cumulative area from zero to $a_i$, and each of the first $i$ intervals has probability $p$, $F(a_i) = ip$; thus
    
    $\begin{displaymath}ip = 1 - e^{-\lambda a_i} \end{displaymath}$
    
    thus
    
    $\begin{displaymath}a_i = -\frac{1}{\lambda} \ln (1-ip) ~~ i = 0, 1, ..., k \end{displaymath}$
    
    Regardless of the value of $\lambda$, this gives $a_0 = 0$ and $a_k = \infty$.
  - With $\lambda = 0.084$ in this example and k = 8,
    
    $\begin{displaymath}a_1 = - \frac{1}{0.084} \ln (1-0.125) = 1.590 \end{displaymath}$
    
    Continuing with $i = 2, 3, \ldots, 7$ gives 3.425, 5.595, 8.252, 11.677, 16.503, and 24.755.
  - See pages 379 and 380 for the completion of the example.
- Example 10.15 (Chi-square test for Weibull distribution) on page 380
- Example 10.16 (Computing intervals for the normal distribution) on page 381
  - For the given data (originally from Example 10.3 on page 360), the suggested estimators in Table 10.3 on page 370 give
    
    $\begin{displaymath}\mu = \overline{x} = 11.90 \end{displaymath}$
    
    $\begin{displaymath}\sigma^2 = S^2 = \end{displaymath}$
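
The equal-probability end points of Examples 10.14 and 10.16 can be computed directly from the inverse cdf. This is a minimal sketch, assuming Python 3.8+ for `statistics.NormalDist.inv_cdf`; the function names are hypothetical, and because the value of $S^2$ for Example 10.16 is not given here, the standard deviation 2.0 in the normal call is only a placeholder.

```python
import math
from statistics import NormalDist


def exponential_endpoints(lam, k):
    """Equal-probability cell end points a_0..a_k for an exponential(lam) fit.

    Solves F(a_i) = 1 - exp(-lam * a_i) = i/k, i.e.
    a_i = -(1/lam) * ln(1 - i/k), with a_0 = 0 and a_k = infinity.
    """
    ends = [0.0]
    ends += [-math.log(1.0 - i / k) / lam for i in range(1, k)]
    ends.append(math.inf)  # ln(0) is undefined, so the last cell is open-ended
    return ends


def normal_endpoints(mu, sigma, k):
    """Equal-probability cell end points for a normal(mu, sigma^2) fit.

    Uses the inverse normal cdf: a_i = F^{-1}(i/k), with a_0 = -inf, a_k = +inf.
    """
    dist = NormalDist(mu, sigma)
    inner = [dist.inv_cdf(i / k) for i in range(1, k)]
    return [-math.inf] + inner + [math.inf]


# Example 10.14 values: lambda = 0.084, k = 8.
print([round(a, 3) for a in exponential_endpoints(0.084, 8)])

# mu = 11.90 from Example 10.16; sigma = 2.0 is a placeholder, not the book's S.
print([round(a, 3) for a in normal_endpoints(11.90, 2.0, 8)])
```

The exponential list agrees with the end points listed for Example 10.14 to within rounding in the third decimal. In the normal case the outer end points are $-\infty$ and $+\infty$, and with k even the middle end point equals the mean.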
Kolmogorov-Smirnov goodness-of-fit test:
- The chi-square test depends heavily on the choice of class intervals: for the same data, different groupings may lead to different conclusions (rejection or acceptance).
- The K-S goodness-of-fit test is designed to overcome this difficulty. It formalizes the idea behind examining a q-q plot, comparing the empirical cdf of the data with the cdf of the hypothesized distribution.
- The K-S test is particularly useful when sample sizes are small and when no parameters have been estimated from the data.
- Example 10.17 on page 383 uses the method described in Section 8.4.1 on page 299. A few notes:
  - If the interarrival times are exponentially distributed, then, given the number of arrivals in (0, T], the arrival times are uniformly distributed on (0, T].
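
The K-S statistic and the arrival-time check can be sketched as follows. Assumptions: the $D^+$/$D^-$ form below is the usual one for a sorted sample (as in the uniformity test of Section 8.4.1), the function names and arrival times are hypothetical, and the critical value $D_\alpha$ is supplied from a K-S table (0.624 is the commonly tabulated $\alpha = 0.05$ value for $N = 4$).

```python
def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov D = max(D+, D-) for `sample` against a hypothesized cdf.

    With the sorted sample x_(1) <= ... <= x_(N):
      D+ = max_i ( i/N - F(x_(i)) ),  D- = max_i ( F(x_(i)) - (i-1)/N ).
    """
    xs = sorted(sample)
    n = len(xs)
    # enumerate() is 0-based, so i + 1 plays the role of the 1-based index i.
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


def uniform01_cdf(u):
    """cdf of the uniform distribution on [0, 1]."""
    return min(max(u, 0.0), 1.0)


def arrival_times_ks(arrival_times, T, critical_value):
    """Check exponential interarrival times via uniform arrival times on (0, T].

    Scales each arrival time t to t/T on (0, 1] and compares it with U(0, 1).
    critical_value: D_alpha for N = len(arrival_times), read from a K-S table.
    Returns (D, True if the exponential hypothesis is rejected).
    """
    if any(not 0 < t <= T for t in arrival_times):
        raise ValueError("arrival times must lie in (0, T]")
    d = ks_statistic([t / T for t in arrival_times], uniform01_cdf)
    return d, d > critical_value


# Hypothetical arrival times (minutes) observed over T = 100 minutes.
d, reject = arrival_times_ks([5, 20, 35, 70], T=100, critical_value=0.624)
print(f"D = {d:.3f}, reject: {reject}")
# prints: D = 0.400, reject: False
```

Scaling by $T$ turns the question into a uniformity test with no estimated parameters, which is the situation where the K-S test is most useful.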

Meng Xiannong 2002-10-18