{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Coin-flip experiment and the Central Limit Theorem\n", "\n", "Question: I flip a fair coin 100 times. What is the probability distribution of the number of heads \n", "that I get." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from scipy import stats\n", "\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib notebook\n", "\n", "# M.L. modification of matplotlib defaults\n", "# Changes can also be put in matplotlibrc file, \n", "# or effected using mpl.rcParams[]\n", "mpl.style.use('classic')\n", "plt.rc('figure', figsize = (6, 4.5)) # Reduces overall size of figures\n", "plt.rc('axes', labelsize=16, titlesize=14)\n", "plt.rc('figure', autolayout = True) # Adjusts supblot parameters for new size" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Simulating spin flips\n", "\n", "I am going to generate 100 random 0's and 1's, and let 1 represent a \"heads\" and 0 represent a \"tails.\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n_flips = 100 # Number of coin flips\n", "data = stats.randint.rvs(0, 2, size=n_flips)\n", "print(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I can count the \"heads\" by summing the array `data`. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n_heads = np.sum(data)\n", "print(n_heads)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Simulating 200 trials of 100 coin flip experiments\n", "I store the number of heads in each trial in the array `results`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n_expts = 200 # umber of experiments to simulate\n", "results = np.zeros(n_expts) # Create array for results of \"experiments\"\n", "\n", "for i in range(n_expts):\n", " results[i] = np.sum(stats.randint.rvs(0, 2, size=n_flips)) #Note that each experiment consists of 100 coin flips" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Histogram of results\n", "Choose bin boundaries so that each bin includes a single integer, e.g.,\n", "$[47.5, 48.5]$, $[48.5,49.5]$, $[49.5, 50.5]$, etc." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "low = 35.5\n", "high = 64.5\n", "nbins = int(high-low) \n", "plt.figure()\n", "plt.xlabel(\"# of heads\")\n", "plt.ylabel(\"occurrences\")\n", "h_out = plt.hist(results, nbins, [low,high])\n", "plt.xlim(low,high);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Increasing the number of trials. Looks more Gaussian." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n_expts = 10000 \n", "results = np.zeros(n_expts) # Create array for results of \"experiments\"\n", "\n", "for i in range(n_expts):\n", " results[i] = np.sum(stats.randint.rvs(0, 2, size=n_flips))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "low = 35.5\n", "high = 64.5\n", "nbins = int(high - low)\n", "plt.figure()\n", "plt.xlabel(\"# of heads\")\n", "plt.ylabel(\"occurrences\")\n", "out = plt.hist(results, nbins, [low,high])\n", "plt.xlim(low,high);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Checking Central Limit Theorem\n", "Following Hughes and Hase statement of the Central Limit Theorem at the \n", "top of p.33, we should look at the distribution of the sample mean:\n", "\n", "$$\n", "\\langle x\\rangle = \\frac{1}{N}(x_1 + x_2 + \\cdots + x_N).\n", "$$\n", "\n", "It's the distribution of the sample mean that approaches the normal distribution.\n", "\n", "The individual values $x_i$ are sampled from a discrete distribution with two values:\n", "$x_i = 0$ with a probability of 0.5, and $x_i = 1$ with a probability 0.5. The \n", "mean of this parent distribution for the $x_i$'s is $\\langle x\\rangle = 0.5$, and the \n", "standard deviation of this parent distribution is $\\sigma = 0.5$.\n", "\n", "The histogram below is the same as the histogram immediately above, except that the \n", "results for the number of heads in each trial have been divided by $N$, the number of \n", "coin-flips in a trial (in this example 100).\n", "\n", "The Central Limit Theorem states that total number of heads divided by the number of \n", "flips (100) should tend toward a normal distribution with a mean of 0.5 and a standard deviation of $\\sigma_\\text{parent}/\\sqrt{100}$. Let's check." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "low = 35.5/n_flips\n", "high = 64.5/n_flips\n", "# Don't change number of bins from previous histogram\n", "plt.figure()\n", "plt.xlabel(\"fraction of heads\")\n", "plt.ylabel(\"occurrences\")\n", "h_out = plt.hist(results/n_flips, nbins, [low,high])\n", "plt.xlim(low,high);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Check the standard deviation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"sample mean =\", np.mean(results/n_flips), \", sample std =\", np.std(results/n_flips))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sigma_parent = 1/2\n", "print(\"predicted std =\", sigma_parent/np.sqrt(n_flips) )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### We can check more than just the numerical value of the standard deviation\n", "Use the CDF of the normal distribution and the bin boundaries to determine the expected occurrences for a normal distribution:\n", "\n", "\\begin{eqnarray*}\n", "P(x_1Note: Information about the occurrences and bin boundaries is part of the output of the `plt.hist()` function. (I gave this output the name `h_out` in the plotting cell above.) `h_out[0]` is an array with the counts in each bin, and `h_out[1]` is an array with the bin boundaries." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "h_out" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "low = 35.5/n_flips\n", "high = 64.5/n_flips\n", "# use nbins previously defined\n", "plt.figure()\n", "plt.xlabel(\"(# of heads)/100\")\n", "plt.ylabel(\"occurrences\")\n", "h_out = plt.hist(results/n_flips, nbins, [low,high], alpha = 0.5)\n", "plt.xlim(low,high);\n", "\n", "x = np.zeros(len(h_out[1])-1) # Create empty array for mid-points of histogram bins\n", "y = np.zeros(len(h_out[1])-1) # Create empty array for expected occurences from normal dist.\n", "for i in range(len(x)):\n", " x[i] = (h_out[1][i+1] + h_out[1][i])/2 # fill x with center-of-bin values\n", " y[i] = n_expts*(stats.norm.cdf(h_out[1][i+1],0.5,0.5/np.sqrt(n_flips))\\\n", " - stats.norm.cdf(h_out[1][i],0.5,0.5/np.sqrt(n_flips))) # fill y with probabilities\n", " # based on gaussian pdf\n", "plt.scatter(x, y, c = 'red');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Looks pretty Gaussian!!\n", "\n", "In a future class we will talk about how to make quantitative assessments of confidence \n", "about whether a sample of random variables comes from a given distribution." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Version information\n", "`version_information` is from J.R. Johansson (jrjohansson at gmail.com); see Introduction to scientific computing with Python for more information and instructions for package installation.\n", "\n", "`version_information` is installed on the linux network at Bucknell" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load_ext version_information" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "version_information numpy, scipy, matplotlib" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 2 }