{ "cells": [ { "cell_type": "markdown", "id": "574d6859-c223-4fd7-af60-66c2e0dcebd2", "metadata": {}, "source": [ "## Class 9, 2025\n", "\n", "### Goodness of Fit and Hypothesis Testing\n", "\n", "* How do we know if a model is a good fit?\n", "\n", "* How do we know if error bars assigned are reasonable?\n", "\n", "* We can use the measured value of chi-squared to provide a quantitative measure of the two questions above" ] }, { "cell_type": "code", "execution_count": null, "id": "61ca5691-5797-4785-846f-b94c7f8dd8ce", "metadata": {}, "outputs": [], "source": [ "import numpy as np;\n", "import matplotlib.pyplot as plt\n", "from scipy import stats\n", "from scipy.optimize import curve_fit\n", "import urllib\n", "\n", "\n", "plt.rc('figure', figsize = (4.5, 3)) # Reduces overall size of figures\n", "plt.rc('axes', labelsize=16, titlesize=14)\n", "plt.rc('figure', autolayout = True) # Adjusts supblot parameters for new size" ] }, { "cell_type": "markdown", "id": "4f3531a3-f036-41ab-8369-ed3073192b80", "metadata": {}, "source": [ "#### Functions\n", "Remember:\n", "$$\n", "\\chi^2 = \\sum_i\\frac{\\left(y(x_i) - y_i\\right)^2}{\\alpha_i^2}\n", "$$" ] }, { "cell_type": "code", "execution_count": null, "id": "4eb71e9f-bcf9-45bb-a7be-4de9e9a5eab6", "metadata": {}, "outputs": [], "source": [ "link = 'https://www.eg.bucknell.edu/~phys310/skills/data_analysis/Class8Data.dat'\n", "fh = urllib.request.urlopen(link)\n", "\n", "xdata,ydata,unc = np.loadtxt(fh)" ] }, { "cell_type": "code", "execution_count": null, "id": "cf4dea23-1e2d-4a60-b305-3b45684ba874", "metadata": {}, "outputs": [], "source": [ "## Make an errorbar plot of ydata vs xdata. The uncertaintainty unc applies to the ydata\n" ] }, { "cell_type": "markdown", "id": "53cb6f5e-55c8-4920-9ca1-6abfd8cf40b3", "metadata": {}, "source": [ "We are interested in the following Hypotheses which we can check individually:\n", "* (1.) The data described by a single resonance described by a Gaussian?\n", "* (2.) The data is derived from two resonances whose amplitudes are related by a factor of 2.\n", "\n", "Note that it is possible that neither hypothesis is true." ] }, { "cell_type": "code", "execution_count": null, "id": "02daabe6-9629-4b31-bd59-ba132c2b7a4f", "metadata": {}, "outputs": [], "source": [ "\n", "def f1G(x,A1,x1,sig, c):\n", " '''\n", " Single Gaussian.\n", " A1: amplitude, \n", " x1: resonance frequency\n", " sig: width parameter\n", " c: constant offset\n", " '''\n", " return A1*np.exp(-(x - x1)**2/sig**2) + c\n", "\n", "def f2G(x,A1,x1,x2,sig,c):\n", " '''\n", " Sum of Two Gaussians\n", " A1: amplitude of first resonance\n", " x1: resonant frequency of first peak\n", " x2: resonant frequency of second peak\n", " sig: width parameter\n", " c: constant offset\n", " '''\n", " \n", " return A1*np.exp(-(x - x1)**2/sig**2) + A1/2*np.exp(-(x - x2)**2/sig**2) + c\n", "\n", "def chi2(xdata,ydata,unc,fitFunc,fitParams):\n", " '''Chisquare as a function of data (xdata, ydata, unc), and model (fitFunc,fitParams)\n", " '''\n", " dof = len(ydata) - len(fitParams)\n", " chisquare = np.sum((ydata - fitFunc(xdata,*fitParams))**2/unc**2)\n", " \n", " print('dof = ',dof)\n", " print('Chi-Squared is ',np.round(chisquare,2))\n", " #p = 1 - stats.chi2.cdf(chisquare, dof)\n", " #print('probability of getting value of chisq greater than ',np.round(chisquare,2),'is ',np.round(p,2))\n", " \n", " return np.array([chisquare,dof])\n" ] }, { "cell_type": "code", "execution_count": null, "id": "bdc4c6b4-bef8-405b-b05a-205d1663d03d", "metadata": {}, "outputs": [], "source": [ "## Perform a least squares fit of your data with the exponential function above.\n", "# Plot the data with the fit\n", "# Use the chi2 function to determine the chi-squared" ] }, { "cell_type": "code", "execution_count": null, "id": "03715897-b2fa-4faa-af75-8f1139dd4f9b", "metadata": {}, "outputs": [], "source": [ "#f1G(x,A1,x1,sig, c)\n", "pguess1 = [10, 20,10,5]\n", "params1, covMat1 = curve_fit(f1G, xdata, ydata,p0 = pguess1, sigma = unc,absolute_sigma=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "76bbf6a5-1ea5-4fd6-8249-6a89addd4eb4", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "844e356e-c950-4a32-839e-03722c44be8c", "metadata": {}, "source": [ "### What we assume when we make a least squares fit\n", "\n", "* The data points are independent\n", "* The error bars represent one standard deviation from tbe best fit. \n", "* This implies that each y data is a random number drawn from a Normal distribution which is centered on yfit, with standard deviation unc\n", "* Normalized residuals should therefore be \n", " * described by a gaussian distribution\n", " * the mean of the distribution should be 1.\n", " \n", " \n", "#### What about the chi-Squared variable? How should it be distributed?" ] }, { "cell_type": "code", "execution_count": null, "id": "84727c7a-79d9-4bfe-9850-cd05d246d51c", "metadata": {}, "outputs": [], "source": [ "#Consider 10 normally distributed random variables. \n", "\n", "\n", "nres = stats.norm.rvs(loc = 0, scale = 1, size = 10)\n", "##Calculate the sum of squares of nres\n", "ssq = np.sum(nres**2)\n", "\n", "print(ssq)" ] }, { "cell_type": "code", "execution_count": null, "id": "e3be91d5-8415-4b67-b06b-4f293d2ce881", "metadata": {}, "outputs": [], "source": [ "# Repeat your measurement 1000 times and make a historgram of the ssq that you obtain\n", "ssq=[]\n", "\n", "ndata = 5;\n", "Nexpts = 1000;\n", "for j in range(Nexpts):\n", " \n", " nres = stats.norm.rvs(loc = 0, scale = 1, size = ndata)\n", " ##Calculate the sum of squares of nres\n", " ssq.append(np.sum(nres**2))\n", " \n" ] }, { "cell_type": "code", "execution_count": null, "id": "4a7faf1b-1cc6-45cb-ab0c-04e26d681aa5", "metadata": {}, "outputs": [], "source": [ "plt.hist(ssq,rwidth = 0.8);" ] }, { "cell_type": "markdown", "id": "c822ff90-b209-470c-9634-83b0ca3a3e9c", "metadata": {}, "source": [ "* Investigate the effect of varying the number of data points used in calculating the ssq. \n", "\n", "* Try values of ndata ranging from ndata = 4, to ndata = 50\n", "\n", "* What do you get for the variable mean(ssq)/ndata?\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "3ab47820-4981-4f87-9be5-e06dccb45810", "metadata": {}, "outputs": [], "source": [ "## The Chi-Squared distribution\n", "ssq=[]\n", "\n", "ndata = 5;\n", "Nexpts = 1000;\n", "for j in range(Nexpts):\n", " \n", " nres = stats.norm.rvs(loc = 0, scale = 1, size = ndata)\n", " ##Calculate the sum of squares of nres\n", " ssq.append(np.sum(nres**2))\n", " \n", " \n", "xx = np.linspace(0,1.5*max(ssq));\n", "yy = stats.chi2.pdf(xx,ndata);\n", "plt.hist(ssq,rwidth = 0.8,density = True);\n", "plt.plot(xx,yy)\n", "plt.xlabel('sum of squares (ssq)')\n", "plt.ylabel('PDF')" ] }, { "cell_type": "code", "execution_count": null, "id": "5e6fa2de-8057-4e5e-abed-501e148d5567", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "fcb26867-15ba-4964-a686-e484ce5193d0", "metadata": {}, "source": [ "** Discussions\n", "* What characterizes the mean and width of the chi-square distribtuion?\n", "* Consider the 1-CDF of the chi-Squared dstribution (Fig 8.4 in H&H)\n", "* See Page 106 in H & H regaring whether to question or reject null hypothesis\n" ] }, { "cell_type": "code", "execution_count": null, "id": "d468511f-6b30-4ba3-b07a-530e668026e9", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "02abe99f-ecf3-4549-bd4a-405c0d46d09c", "metadata": {}, "source": [ "Calculate the porbability of getting a value of $\\chi^2$ greater than the value determined from the data from fitting with the model f1G" ] }, { "cell_type": "code", "execution_count": null, "id": "58d0edda-90fd-4b6c-b981-8a07d1039cee", "metadata": {}, "outputs": [], "source": [ "p = 1 - stats.chi2.cdf(chisquare, dof)\n", "print('probability of getting value of chisq greater than ',chi2_data,'is ',p)" ] }, { "cell_type": "code", "execution_count": null, "id": "4cf1bc5c-caa2-4102-81a2-e2e2302d256d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "66d9fba3-84b5-41c9-ae53-1dd55142b72b", "metadata": {}, "source": [ "Now implement a fit with the function f2G. How does the Chi-Squared value compare?" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" } }, "nbformat": 4, "nbformat_minor": 5 }