stats
sub-module of scipy
¶Question: I flip a fair coin 100 times. What is the probability that I get 60 or more "heads"?
One way to answer this question is to use the known properties of the binomial distribution (see the end of this notebook), but I want to get the answer performing a Monte Carlo simulation that uses tools from the stats
submodule of scipy
.
Marty Ligare, August 2020
import numpy as np
from scipy import stats
import matplotlib as mpl
import matplotlib.pyplot as plt
# Following is an Ipython magic command that puts figures in notebook.
%matplotlib notebook
# M.L. modification of matplotlib defaults
# Changes can also be put in matplotlibrc file,
# or effected using mpl.rcParams[]
plt.style.use('classic')
plt.rc('figure', figsize = (6, 4.5)) # Reduces overall size of figures
plt.rc('axes', labelsize=16, titlesize=14)
plt.rc('xtick', labelsize=12)
plt.rc('ytick', labelsize=12)
plt.rc('figure', autolayout = True) # Adjusts supblot params for new size
I am going to generate 100 random 0's and 1's, and let 1 represent a "heads" and 0 represent a "tails."
n_flips = 100 # Number of coin flips
data = stats.randint.rvs(0, 2, size=n_flips)
print(data)
I can count the "heads" by summing the array data
.
n_heads = np.sum(data)
print(n_heads)
n_expts = 10000 # Number of experiments to simulate
results = np.zeros(n_expts) # Create array for results of simulations
for i in range(n_expts):
results[i] = np.sum(stats.randint.rvs(0, 2, size=n_flips))
nbins = 101
low = -.5
high = 100.5
plt.figure()
plt.xlabel("# of heads")
plt.ylabel("occurences")
plt.hist(results, nbins, [low,high])
plt.xlim(30,70);
np.mean(results)
success = 0
for i in range(len(results)):
if results[i] > 59:
success += 1
print("number of trials with 60 or more heads =", success, ", probability =",success/n_expts)
sum(1 if i>59 else 0 for i in results)
The probablity of obtaining 60 or more heads is
$$1 - \mbox{probability of obtaining 59 or fewer heads} = 1 - C_{DF}(59)$$p = 0.5 # probability of getting a "head" in a single trial
n = 100 # number of trials
print("probability =", 1 - stats.binom.cdf(59, n, p))
This result is consistent with results of the Monte Carlo simulation.
%version_information is an IPython magic extension for showing version information for dependency modules in a notebook;
See https://github.com/jrjohansson/version_information
%version_information
is available on Bucknell computers on the linux network. You can easily install it on
any computer.
%load_ext version_information
version_information numpy, scipy, matplotlib