In this notebook we will simulate simple "experiments" in which we "roll" a fair six-sided die, and record the number that comes up. The probabilities for the 6 discrete outcomes for the rolling of a die (1, 2, 3, 4, 5, 6) are all $\frac{1}{6}$. We will look at
Finally, we will compare the results of the distribution of averages with the predictions of the Central Limit Theorem.
import numpy as np
from scipy import stats
import matplotlib as mpl # As of July 2017 Bucknell computers use v. 2.x
import matplotlib.pyplot as plt
# Following is an Ipython magic command that puts figures in the notebook.
# For figures in separate windows, comment out following line and uncomment
# the next line
# Must come before defaults are changed.
%matplotlib notebook
#%matplotlib
# As of Aug. 2017 reverting to 1.x defaults.
# In 2.x text.ustex requires dvipng, texlive-latex-extra, and texlive-fonts-recommended,
# which don't seem to be universal
# See https://stackoverflow.com/questions/38906356/error-running-matplotlib-in-latex-type1cm?
mpl.style.use('classic')
# M.L. modifications of matplotlib defaults using syntax of v.2.0
# More info at http://matplotlib.org/2.0.0/users/deflt_style_changes.html
# Changes can also be put in matplotlibrc file, or effected using mpl.rcParams[]
plt.rc('figure', figsize = (6, 4.5)) # Reduces overall size of figures
plt.rc('axes', labelsize=16, titlesize=14)
plt.rc('figure', autolayout = True) # Adjusts supblot parameters for new size
p_vals = np.array([1/6 for i in range(6)])
print(p_vals)
i_vals = np.array([i for i in range(1,7)])
print(i_vals)
mean_parent = np.sum(i_vals*p_vals)
print(mean_parent)
var_parent = np.sum((i_vals - mean_parent)**2)/6
print(var_parent)
# OR
var_parent = np.var(i_vals - mean_parent)
print(var_parent)
stats.randint.rvs(1, 7) # One roll
n_rolls = 100
data = stats.randint.rvs(1, 7, size=n_rolls) # many rolls
#print(data)
low = 0.5
high = 6.5
nbins = 6
plt.figure()
plt.xlabel("value")
plt.ylabel("occurences")
plt.title("Histogram of die-rolling results")
out = plt.hist(data, nbins, [low,high])
#out[0],out[1] # occurrences and bin boundaries
(Consistent with $\bar{x}_{\rm parent}$ and $\sigma^2_{\rm parent}$)
np.mean(data)
np.var(data)
n_expt = 10000
results = np.zeros(n_expt)
for i in range(n_expt):
data = stats.randint.rvs(1, 7, size=n_rolls)
results[i] = np.mean(data)
nbins = 40
low = 2.5
high = 4.5
plt.figure()
plt.xlabel("value")
plt.ylabel("occurences")
plt.title("Results for Average")
plt.xlim(2,5)
out = plt.hist(results, nbins, [low,high])
#out[0],out[1] # occurrences and bin boundaries
The distribution of averages in the histogram above looks like it's approaching a gaussian distribution. Let's check the prediction the Central Limit Theorem.
print("Mean of all of the averages =", np.mean(results))
print("Standard deviation of all of the averages =", np.std(results))
prediction = np.sqrt(var_parent)/np.sqrt(n_rolls)
print("prediction from Central Limit Theorem for sample standard deviation =", prediction)