In this notebook I provide two data sets from the roll of a nominally standard 6-sided die. The question is: Is this a fair die, or not? The data is an array of integers chosen from the set {0,1,2,3,4,5}. (We could make this set be {1,2,3,4,5,6} to correspond to the numbers on a die, but with python it's easier to start at 0.)
Use the procedure outlined at top of p.112 in Hughes and Hase, 2and calculate the $\chi^2$ statistic:
$$ \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} $$where $O_i$ represents the number of occurrences of something in the sample data, and $E_i$ gives the expected number of occurrences based on the null hypothesis.
Marty Ligare, August 2020
import numpy as np
from scipy import stats
from scipy import optimize
import matplotlib as mpl
import matplotlib.pyplot as plt
# Following is an Ipython magic command that puts figures in the notebook.
%matplotlib notebook
# M.L. modifications of matplotlib defaults
# Changes can also be put in matplotlibrc file,
# or effected using mpl.rcParams[]
mpl.style.use('classic')
plt.rc('figure', figsize = (6, 4.5)) # Reduces overall size of figures
plt.rc('axes', labelsize=16, titlesize=14)
plt.rc('figure', autolayout = True) # Adjusts supblot parameters for new size
data = np.loadtxt('die_1000.dat', dtype=int)
data
e = len(data)*np.ones(6)/6.
print(e)
Here that just means counting occurrences.
o = np.bincount(data, minlength=6)
print(o)
plt.figure()
x = np.linspace(0,5,6)
plt.scatter(x, e, label='expected')
plt.scatter(x, o, color='red', label='observed')
plt.xlabel('value')
plt.ylabel('occurrences')
plt.title('1000 Rolls')
plt.legend(loc='upper left');
Calculate $\chi^2$ and find probability that random sampling would result in a value of $\chi^2$ greater than the observed value.
chi2_data = np.sum((o-e)**2/e)
print('chi2 = ',chi2_data)
print('reduced chi2 = ',chi2_data/6)
p = 1 - stats.chi2.cdf(chi2_data, 6)
print('probability of getting a value of chisq greater than ',chi2_data,'with 6 bins is ',p)
data = np.loadtxt('die_50000.dat', dtype=int)
data
e = len(data)*np.ones(6)/6.
print(e)
Here that just means counting occurrences.
o = np.bincount(data, minlength=6)
print(o)
plt.figure()
x = np.linspace(0,5,6)
plt.scatter(x, e, label='expected')
plt.scatter(x, o, color='red', label='observed')
plt.xlabel('value')
plt.ylabel('occurrences')
plt.title('1000 Rolls')
plt.legend(loc='upper left');
Calculate $\chi^2$ and find probability that random sampling would result in a value of $\chi^2$ greater than the observed value.
chi2_data = np.sum((o-e)**2/e)
print('chi2 = ',chi2_data)
print('reduced chi2 = ',chi2_data/6)
p = 1 - stats.chi2.cdf(chi2_data, 6)
print('probability of getting a value of chisq greater than ',chi2_data,'with 6 bins is ',p)
version_information
is from J.R. Johansson (jrjohansson at gmail.com).
See Introduction to scientific computing with Python
for more information and instructions for package installation.
version_information
has been installed system-wide on Bucknell linux networked computers.
%load_ext version_information
%version_information numpy, scipy, matplotlib