stats sub-module of scipy¶Question: I flip a fair coin 100 times. What is the probability that I get 60 or more "heads"?
One way to answer this question is to use the known properties of the binomial distribution (see the end of this notebook), but I want to get the answer performing a Monte Carlo simulation that uses tools from the stats submodule of scipy.
Marty Ligare, August 2020
import numpy as np
from scipy import stats
import matplotlib as mpl
import matplotlib.pyplot as plt
# Following is an Ipython magic command that puts figures in notebook.
%matplotlib notebook
# M.L. modification of matplotlib defaults
# Changes can also be put in matplotlibrc file,
# or effected using mpl.rcParams[]
plt.style.use('classic')
plt.rc('figure', figsize = (6, 4.5)) # Reduces overall size of figures
plt.rc('axes', labelsize=16, titlesize=14)
plt.rc('xtick', labelsize=12)
plt.rc('ytick', labelsize=12)
plt.rc('figure', autolayout = True) # Adjusts supblot params for new size
I am going to generate 100 random 0's and 1's, and let 1 represent a "heads" and 0 represent a "tails."
n_flips = 100 # Number of coin flips
data = stats.randint.rvs(0, 2, size=n_flips)
print(data)
I can count the "heads" by summing the array data.
n_heads = np.sum(data)
print(n_heads)
n_expts = 10000 # Number of experiments to simulate
results = np.zeros(n_expts) # Create array for results of simulations
for i in range(n_expts):
results[i] = np.sum(stats.randint.rvs(0, 2, size=n_flips))
nbins = 101
low = -.5
high = 100.5
plt.figure()
plt.xlabel("# of heads")
plt.ylabel("occurences")
plt.hist(results, nbins, [low,high])
plt.xlim(30,70);
np.mean(results)
success = 0
for i in range(len(results)):
if results[i] > 59:
success += 1
print("number of trials with 60 or more heads =", success, ", probability =",success/n_expts)
sum(1 if i>59 else 0 for i in results)
The probablity of obtaining 60 or more heads is
$$1 - \mbox{probability of obtaining 59 or fewer heads} = 1 - C_{DF}(59)$$p = 0.5 # probability of getting a "head" in a single trial
n = 100 # number of trials
print("probability =", 1 - stats.binom.cdf(59, n, p))
This result is consistent with results of the Monte Carlo simulation.
%version_information is an IPython magic extension for showing version information for dependency modules in a notebook;
See https://github.com/jrjohansson/version_information
%version_information is available on Bucknell computers on the linux network. You can easily install it on
any computer.
%load_ext version_information
version_information numpy, scipy, matplotlib