The beginning of this notebook is a duplication of the notebook from the first M&M problem.
import numpy as np
from scipy import stats
import matplotlib as mpl
import matplotlib.pyplot as plt
# Following is an Ipython magic command that puts figures in notebook.
%matplotlib notebook
# M.L. modification of matplotlib defaults
# Changes can also be put in matplotlibrc file,
# or effected using mpl.rcParams[]
plt.style.use('classic')
plt.rc('figure', figsize = (6, 4.5)) # Reduces overall size of figures
plt.rc('axes', labelsize=16, titlesize=14)
plt.rc('figure', autolayout = True) # Adjusts supblot params for new size
To get started, sample one bag of M&Ms, and count the numberof brown M&Ms.
Do this by generating 60 random integers from the set 0, 1, 2, 3, 4, 5, and let's say that "brown" = 0.
bag = stats.randint.rvs(0,6,size = 60)
print(bag)
np.bincount(bag)
. The first element in the array is the number of occurences of 0 in "bag," the second element is the number of occurences of 1, etc.np.bincount(bag)
bincount
, or np.bincount(bag)[0]
.np.bincount(bag)[0]
# Long version of sampling many bags
nb = 24 # number of bags
data_section = np.zeros(nb) # array for data from a lab section
for i in range(nb):
bag = stats.randint.rvs(0,6,size=60)
data_section[i] = np.bincount(bag)[0]
data_section
# Concise version of sampling many bags
nb = 24 # number of bags
data_section = np.array([np.bincount(stats.randint.rvs(0,6,size=60))[0] for i in range(nb)])
data_section
np.mean(data_section), np.std(data_section), np.std(data_section)/np.sqrt(len(data_section)-1)
$\overline N = 9.6 \pm 0.6$
plt.figure()
nbins = 20
low = 0
high = 20
plt.hist(data_section,nbins,[low,high])
plt.xlim(0,20)
plt.title("Histogram of brown M&Ms per bag - Single Section")
plt.xlabel("Number of brown M&Ms")
plt.ylabel("Occurences");
nb = 24 # Number of bags in a section
ns = 2000 # Number of sections in class
data_class = np.zeros(ns) # array for results from each section
for j in range(ns):
data_section = np.zeros(nb) # array for section data
for i in range(nb):
bag = stats.randint.rvs(0,6,size=60)
data_section[i] = np.bincount(bag)[0]
data_class[j] = np.mean(data_section)
np.mean(data_class), np.std(data_class)
plt.figure()
nbins = 30
low = 5
high = 15
binwidth = (high-low)/nbins
plt.hist(data_class,nbins,[low,high],alpha=0.5)
plt.xlim(low,high)
plt.title("Histogram of section averages",fontsize=14)
plt.xlabel("Number of brown M&Ms")
plt.ylabel("Occurences");
The standard deviation of the number of brown M&Ms in a bag in an indvidual section (that was determined in Part 1 above) is $\sigma = 2.9$. The Central Limit Theorem predicts that the standard deviation of the section averages (i.e., standard deviation of the mean) should be
$$ \sigma_{\rm averages} = \frac{\sigma}{\sqrt{N}} = \frac{2.9}{\sqrt{24}} \simeq 0.59 $$The standard deviation of the 2000 means I simulated is 0.59, which is consistent with that predicted by Central Limit Theorem.
We can also check the shape of the distribution of the means by comparing the results to the histrogram predicted by the Central Limit Theorem.
nbins = 30
low = 5
high = 15
plt.figure()
plt.xlabel("# of brown M&Ms")
plt.ylabel("occurences")
h_out = plt.hist(data_class, nbins, [low,high], alpha = 0.5, label='MC sim.')
plt.xlim(low,high);
x = np.zeros(len(h_out[1])-1) # Array for mid-points of histogram bins
y = np.zeros(len(h_out[1])-1) # Array for expected occurences from normal dist.
for i in range(len(x)):
x[i] = (h_out[1][i+1] + h_out[1][i])/2
y[i] = ns*(stats.norm.cdf(h_out[1][i+1],10,2.8/np.sqrt(nb)) -stats.norm.cdf(h_out[1][i],10,2.8/np.sqrt(nb)))
plt.scatter(x, y, c = 'red', label='CLT')
plt.legend();
version_information
is from J.R. Johansson (jrjohansson at gmail.com); see Introduction to scientific computing with Python for more information and instructions for package installation.
version_information
is installed on the linux network at Bucknell
%load_ext version_information
version_information numpy, scipy, matplotlib