{ "cells": [ { "cell_type": "markdown", "id": "3b8ddf8d-902e-4462-aba9-4c468600854f", "metadata": {}, "source": [ "## Data Analysis III\n", "\n", "### February 11, 2024\n", "\n", "Goals for Today\n", "* Loading data from text files\n", "* Computing weighted averages\n", "* Plotting with errorbars\n", "* Compute the Chi-Squared\n", "* Implement Linear and Nonliner least squares fits" ] }, { "cell_type": "code", "execution_count": null, "id": "11ebb3bb-a7da-4b5d-ba66-40705bcdb8c8", "metadata": {}, "outputs": [], "source": [ "#Importing numpy and matplotlib\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "#For Curve Fitting\n", "from scipy import optimize \n", "\n", "#For reading urls\n", "import urllib\n", "\n", "plt.rc('figure', figsize = (6, 3.5)) # Reduces overall size of figures\n" ] }, { "cell_type": "markdown", "id": "03c6ef7b-053c-43e5-836e-9672c71c199c", "metadata": {}, "source": [ "* Download the file 'hdata.dat' from the course website into your working direcory. Open it in a text editor to get a sense of what it contains\n", "\n", "Look at the syntax of `np.loadtxt`, and use it to open the file.\n", "\n", "Before you run any code, answer the following questions\n", "* What kind of delimiter is used? Is it space, comma, tab?\n", "* How many header lines are used? Headers often provide relevant information such as units, instrument settings used for the data included. \n", "* When running np.loadtxt, you have to specify optional arguments for delimiter, number of rows to skip, whether to unpack in columns or rows.\n", "\n" ] }, { "cell_type": "markdown", "id": "a92e452b-7754-4432-82ef-c214ae9ffc14", "metadata": {}, "source": [ "The command `data = np.loadtxt(...)` stores the contents of the file into an array called data\n", "\n", "Now, split up data into two arrays: `y_values` and `y_unc` for the values and uncertainty." ] }, { "cell_type": "code", "execution_count": null, "id": "11c3f3c0-86a3-4da0-8244-a6a0bb884023", "metadata": {}, "outputs": [], "source": [ "#Make a plot of Measurement number vs value with properly labeled axes\n", "#Calculate the mean of the values using np.mean() or some other method you like\n" ] }, { "cell_type": "markdown", "id": "c0098102-e3b8-434e-a02a-2c04bc25272c", "metadata": {}, "source": [ "### Recall from last week: \n", "Average: $\\bar{x} = \\frac{1}{N}\\Sigma_ix_i$\n", "\n", "Weighted average: $\\bar{x}_{weighted} = \\frac{\\Sigma_iw_ix_i}{\\Sigma_iw_i}$, where the weights are given by $w_i = \\frac{1}{\\alpha_i^2}$ \n", "\n", "In this notation, $\\alpha_i$ represents the uncertainty of data point $i$." ] }, { "cell_type": "markdown", "id": "b16ec743-e96c-44e0-b19a-cecab2b2f7cc", "metadata": {}, "source": [ "calculate the weighted mean of the measurements taken. (Hint: a dot product makes the numerator easy. you can take the dot product of two matrices a and b either with numpy.dot -- i.e., `np.dot(a,b)` -- or by using the @ sign -- i.e., a @ b.) \n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "bb6d5f21-291c-4ee0-9688-c98b27581be6", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "8730b24a-77da-47bc-92dc-6ce4c44b3aaf", "metadata": {}, "source": [ "Make a plot of Measurement number vs value with properly labeled axes\n", "\n", "Use ```np.hlines``` to draw horizontal lines for the mean, and the weighted means.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a10ce75a-e7bd-4369-9001-c141a21e91e2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "af7d19ad-2b56-4d76-bd36-62c0ae60c573", "metadata": {}, "source": [ "Look up the syntax of the function ```np.errorbar()```\n", "\n", "For example, if given three arrays, xval,yval,yerr, then the following code snippet plots errorbar plots with circular indicators\n", "` errorbarplot(xval,yval,yerr,fmt = 'o') `" ] }, { "cell_type": "code", "execution_count": null, "id": "ce313557-e670-48b4-81f4-562a307c6eca", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "f1e49038-e815-4f69-a079-27621726a0a2", "metadata": {}, "source": [ "Next, plot the residuals $x_i - \\bar{x}$" ] }, { "cell_type": "code", "execution_count": null, "id": "af054489-2b0f-4f83-88a3-beea137f2e5d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "f998c383-3ad9-470a-ba0a-ddf4b944cd13", "metadata": {}, "source": [ "And then plot the normalized residuals ($(x_i - \\bar{x})/\\alpha_i$)" ] }, { "cell_type": "code", "execution_count": null, "id": "4a910725-b422-450e-bb28-84feb22fc49b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "2e5445e9-676f-4093-909b-b551de631d4d", "metadata": {}, "source": [ "Now square the residuals and add them up. This gives the chi-squared." ] }, { "cell_type": "code", "execution_count": null, "id": "d475ba51-8918-420e-97c8-59a8c700477c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "26074307-6c53-47c4-ae15-e0c3cbd7ef09", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "7ac83e5a-923f-45af-90d9-b26251969e88", "metadata": {}, "source": [ "## Part - II: Linear Least Squares fitting" ] }, { "cell_type": "markdown", "id": "72464157-d533-4920-9a9c-1142afdb0bd4", "metadata": {}, "source": [ "A least squares fit is a procedure where a model is adjusted in order to best match data in such a way that the sum of squares of the separation between the data and the model is minimized. \n", "\n", "the `scipy.optimize.curve_fit()` function implements a least squares fit routine. We demonstrate it below. \n", "\n", "The `curve_fit()` function returns a tuple of two variables, the optimal parameters of the fit, and the covariance matrix which encodes the uncertainty." ] }, { "cell_type": "code", "execution_count": null, "id": "83724f04-221b-4af7-8d79-cc6e3834bf87", "metadata": {}, "outputs": [], "source": [ "## Load Data into arrays xdata,ydata and udata\n", "link = 'http://www.eg.bucknell.edu/~phys310/skills/data_analysis/a.dat'\n", "fh = urllib.request.urlopen(link)\n", "xdata, ydata, udata = np.loadtxt(fh, unpack=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "2553348f-4c70-4d6d-b76d-50ba17d3a6fa", "metadata": {}, "outputs": [], "source": [ "## Make an errorbar plot of the data you downloaded. " ] }, { "cell_type": "code", "execution_count": null, "id": "63628d1c-6528-4e4c-9b17-eb1d6c7d80d4", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "32926aec-376b-459d-b739-5345e04a819c", "metadata": {}, "outputs": [], "source": [ "#Define function for a line\n", "def f_lin(x,m,b):\n", " return m*x + b\n" ] }, { "cell_type": "code", "execution_count": null, "id": "e315c25b-7545-4a9a-b07c-f87eb1fedcca", "metadata": {}, "outputs": [], "source": [ "#The snippet of code below performs a least squares fit on the data\n", "popt, pcov = optimize.curve_fit(f_lin,xdata,ydata,sigma=udata, absolute_sigma=True)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "86feb798-d72d-4900-be30-a8b010832107", "metadata": {}, "outputs": [], "source": [ "#print optimal parameters:\n", "print(popt)" ] }, { "cell_type": "code", "execution_count": null, "id": "dbac6f3e-96fd-4e5a-b079-b11e06a6bdb0", "metadata": {}, "outputs": [], "source": [ "#print standard errors for the fit parameters\n", "print(np.sqrt(np.diag(pcov)))" ] }, { "cell_type": "code", "execution_count": null, "id": "01765d4f-1f12-4cc1-b90f-90365d1aba33", "metadata": {}, "outputs": [], "source": [ "## Plot data and line of best fit\n", "plt.figure()\n", "plt.errorbar(xdata,ydata,udata, fmt='ok',label = 'data')\n", "xc = np.linspace(0,4,100)\n", "plt.plot(xc, f_lin(xc,*popt), label = 'best fit')\n", "plt.xlabel('$x$')\n", "plt.ylabel('$y$')\n", "plt.legend()\n", "plt.title('data and linear best-fit line');" ] }, { "cell_type": "code", "execution_count": null, "id": "bd92625c-2c7a-45a4-a364-a68ca348aded", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "1e0c8784-ac31-4f03-aee8-ecfc45dbd0a1", "metadata": {}, "outputs": [], "source": [ "#compute residuals\n", "res = ydata - f_lin(xdata,*popt) # same as r = ydata - f_lin(xdata,popt[0],popt[1])\n", "\n", "plt.errorbar(xdata,res,udata, fmt='ok')\n", "plt.axhline(0)\n", "plt.xlabel('$x$')\n", "plt.ylabel('residuals ($y_i-f(x_i)$)')\n", "plt.title('residuals for linear model')\n", "plt.show()" ] }, { "cell_type": "raw", "id": "62397698-0a0e-4381-97b2-ad4546a1e9c6", "metadata": {}, "source": [ "# The block of code below combines the data, fit and residuals into an effective format using matplotlib functions.\n", "# Experiment with the parameters to customize to your taste\n", "\n", "fig, (ax0, ax1) = plt.subplots(2, 1, gridspec_kw={'height_ratios': [3, 1]}, sharex=True,figsize = [7,4])\n", "ax0.errorbar(xdata,ydata,udata, fmt='.k',label = 'data')\n", "ax0.plot(xc, f_lin(xc,*popt), label = 'best fit')\n", "ax0.legend()\n", "ax0.set_ylabel('y')\n", " \n", "ax1.errorbar(xdata,res,udata, fmt='.k')\n", "ax1.set_ylabel('resid')\n", "ax1.set_xlabel('x');\n", "ax1.axhline(0)\n" ] }, { "cell_type": "markdown", "id": "f0921cb1-afb8-4fa2-9db7-19004d453b53", "metadata": {}, "source": [ "## Exercise\n", "\n", "* Define a new function f_quad which describes a quadratic function in x, i.e. $y = a x^2 + m x + b$\n", "\n", "* Using optimize.curve_fit(), fit the data above to the function f_quad\n", "\n", "* Plot the data and residuals" ] }, { "cell_type": "code", "execution_count": null, "id": "c4ce6c57-e4b4-430c-b89e-e4e9ac33c65b", "metadata": {}, "outputs": [], "source": [ "#def f_quad(...):\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "7b585f67-25a8-4a16-8bf8-a41e6abe5541", "metadata": {}, "outputs": [], "source": [ "#Implement fit " ] }, { "cell_type": "code", "execution_count": null, "id": "e2854062-1e53-4219-984a-d45f6bb5c95e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "a44c540a-13e7-4ec3-b7c7-5717bf2981d6", "metadata": {}, "outputs": [], "source": [ "#Plot Data and residuals" ] }, { "cell_type": "code", "execution_count": null, "id": "1fe14f87-de7a-4b28-b3e2-01f127067ecc", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "e11bccf7-f7b2-4e2d-a083-18d77175dd45", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "f2c216dd-dac7-4c30-b8c5-e264bf2dc39b", "metadata": {}, "source": [ "## Part - III: Nonlinear Least Squares fitting" ] }, { "cell_type": "code", "execution_count": null, "id": "bf2096cc-1236-4ee3-ae94-8600f27bd052", "metadata": {}, "outputs": [], "source": [ "link = 'http://www.eg.bucknell.edu/~phys310/skills/data_analysis/b.dat'\n", "fh = urllib.request.urlopen(link)\n", "xdata_b, ydata_b, udata_b np.loadtxt(fh, unpack=True)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "f70c158f-24b8-469a-8f00-a7c903f718df", "metadata": {}, "outputs": [], "source": [ "plt.figure()\n", "plt.errorbar(xdata_b,ydata_b,udata_b, fmt='ok')\n", "plt.xlabel('$x$')\n", "plt.ylabel('$y$');" ] }, { "cell_type": "code", "execution_count": null, "id": "efe65a9e-417f-422d-b843-18e194df00f4", "metadata": {}, "outputs": [], "source": [ "def fosc(x,a,wl,ph):\n", " '''Cosine function with three fit parameters: a (amplitude), wl (wavelength),\n", " and ph (phase)'''\n", " return a*np.cos(2*np.pi*x/wl + ph)" ] }, { "cell_type": "code", "execution_count": null, "id": "a9e58c91-150c-40c2-a9d5-768afa6f1d7e", "metadata": {}, "outputs": [], "source": [ "#Using scipy.optimize, fit the data to fosc.\n", "\n", "#Nonlinear fits often need initial fit guess\n", "pinit = [1,1,1];\n", "\n", "plt.figure()\n", "plt.errorbar(xdata,ydata,udata, fmt='ok')\n", "xc = np.linspace(0,4,100)\n", "#plt.plot(xc, f(xc,1.2,1.7,0));\n", "plt.plot(xc, f(xc,*pinit),'--', label = 'init guess')\n", "\n", "\n", "## Adjust your values for p0 as you wish" ] }, { "cell_type": "code", "execution_count": null, "id": "e4649c84-7761-458a-ad2d-6176ecb28719", "metadata": {}, "outputs": [], "source": [ "#Implementing the nonlinear curve_fit using the initial fit parameters\n", "popt, pcov = optimize.curve_fit(fosc,xdata_b,ydata_b,p0=pinit, sigma=udata, absolute_sigma=True)\n" ] }, { "cell_type": "markdown", "id": "58f46dc9-116b-4924-adf2-608cfbaf1f47", "metadata": {}, "source": [ "## Exercise\n", "\n", "* Plot the data and residuals\n", "* Calculate the chi-squared for the fit" ] }, { "cell_type": "code", "execution_count": null, "id": "0b0b61cd-bd15-4d64-af91-d35f8f6d457f", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" } }, "nbformat": 4, "nbformat_minor": 5 }