In the "Prelude to Chapter 8" notebooks introduced the notion of a $\chi^2$ probability distribution function associated with a data set and an assumed functional model underlying the model. As discussed in H&H Chapter 8.2, the pdf depends on the number of degrees of freedom, $\nu$, defined in H&H Eq. (8.1) as
$$ \nu = N−{\cal N}, $$where $N$ is the number of data points, and ${\cal N}$ is the number of ,i>constraints</i>. The number of constraints is typically the number of parameters that you are determining with the statistical analysis. In Prelude A, $N=5$ and ${\cal N}=2$, so $\nu_A =5−3=2$; in Prelude B, $N=21$, and ${\cal N}=3$, so $\nu_B= 21 − 3 = 18$.
In this notebook I will introduce python tools for working with the $\chi^2$ probability distribution function.
Marty Ligare, August 2020
import numpy as np
from scipy import stats
from scipy import optimize
import matplotlib as mpl
import matplotlib.pyplot as plt
# Following is an Ipython magic command that puts figures in notebook.
%matplotlib notebook
# M.L. modifications of matplotlib defaults
# Changes can also be put in matplotlibrc file,
# or effected using mpl.rcParams[]
mpl.style.use('classic')
plt.rc('figure', figsize = (6, 4.5)) # Reduces overall size of figures
plt.rc('axes', labelsize=16, titlesize=14)
plt.rc('figure', autolayout = True) # Adjusts supblot parameters for new size
As we have done with other probability distributions, will use statistics functions from the stats sub-module of scipy. Where before we called functions like stats.norm.pdf()
we will now call stats.chi2.pdf
. We can make graphs of the pdfs approriate for the data and model of Prelude A and Prelude B as follows:
df = 20
stats.chi2.stats(df)
mean, var, skew, kurt = stats.chi2.stats(df, moments='mvsk')
print(mean, var, skew, kurt)
stats.chi2.stats?
df = 3 # number of degrees of freedom
plt.figure()
ulim = 16 # upper limit for plot
n = 161 # number of points for plot
x = np.linspace(0,ulim,n)
y = stats.chi2.pdf(x, df)
plt.plot(x, y)
plt.xlabel('$\chi^2$')
plt.ylabel('$X(\chi^2;3)$')
plt.title('$\chi^2$ pdf for $\\nu=3$');
df = 18 # number of degrees of freedom
plt.figure()
ulim = 50 # upper limit for plot
n = 201 # number of points for plot
x = np.linspace(0,ulim,n)
y = stats.chi2.pdf(x, df)
plt.plot(x, y)
plt.xlabel('$\chi^2$')
plt.ylabel('$X(\chi^2;3)$')
plt.title('$\chi^2$ pdf for $\\nu=18$');
NOTE: H&H's nomenclature is not the standard one. For example, in Fig. 8.4, they call what's on the
vertical the "cumulative probability distribution function", and give it the symbol $P$. This is what most
other references, as well as the stats
sub-module of scipy
call this one minus the cumulative distribution
function, i.e.,
where $C_{DF}$ is what most people call the cumulative distribution function.
These are just names, but be careful.
df = 3 # number of degrees of freedom
plt.figure()
ulim = 16 # upper limit for plot
n = 161 # number of points for plot
x = np.linspace(0,ulim,n)
y1 = stats.chi2.cdf(x, df)
y2 = 1 - stats.chi2.cdf(x, df)
plt.plot(x, y1, label='$C_{DF}$')
plt.plot(x, y2, label='$1-C_{DF}$')
plt.xlabel('$\chi^2$')
plt.ylabel('probability')
plt.legend();
The curve_fit
function found a minimum value of $\chi^2 =3.70$.
The probability of obtaining a value of $\chi^2$ greater than this for a model with 3 degrees of freedom can be found using the $C_{DF}$:
print(1 - stats.chi2.cdf(3.70, 3))
This is consistent with the probabilities found in the simulations.
The `curve_fit`` function found a minimum value of $\chi^2 =21.5$.
The probability of obtaining a value of $\chi^2$ greater than this for a model with 18 degrees of freedom can be found using the $C_{DF}$:
print(1 - stats.chi2.cdf(21.5, 18))
This is consistent with the probabilities found in the simulations.
plt.figure()
x = np.linspace(0,30,301)
y1 = stats.chi2.pdf(x,1)
y4 = stats.chi2.pdf(x,4)
y7 = stats.chi2.pdf(x,7)
y10 = stats.chi2.pdf(x,10)
plt.plot(x, y1, label='$\\nu = 1$')
plt.plot(x, y4, label='$\\nu = 4$')
plt.plot(x, y7, label='$\\nu = 7$')
plt.plot(x, y10, label='$\\nu = 10$')
plt.legend()
plt.ylim(0,0.3)
plt.xlabel('$\chi^2$')
plt.ylabel('$X(\chi^2;\\nu)$');
plt.figure()
x = np.linspace(0,30,301)
y1 = 1 - stats.chi2.cdf(x,1)
y4 = 1 - stats.chi2.cdf(x,4)
y7 = 1 - stats.chi2.cdf(x,7)
y10 = 1 - stats.chi2.cdf(x,10)
plt.plot(x, y1, label='$\\nu = 1$')
plt.plot(x, y4, label='$\\nu = 4$')
plt.plot(x, y7, label='$\\nu = 7$')
plt.plot(x, y10, label='$\\nu = 10$')
plt.legend()
plt.ylim(0,1)
plt.xlabel('$\chi^2$')
plt.ylabel('$P(\chi^2;\\nu)$');
df = 20
mean, var = stats.chi2.stats(df, moments='mv')
mean
stats.chi2.cdf(mean + 3*np.sqrt(var), df), (mean + 3*np.sqrt(var))/df
dfs = np.array([5, 10, 20, 30, 50, 100, 500])
for i in range(len(dfs)):
df = dfs[i]
mean, var = stats.chi2.stats(df, moments='mv')
print(df, (mean + 3*np.sqrt(var))/df)
version_information
is from J.R. Johansson (jrjohansson at gmail.com); see Introduction to scientific computing with Python for more information and instructions for package installation.
version_information
is installed on the linux network at Bucknell
%load_ext version_information
version_information numpy, scipy, matplotlib