{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3b8ddf8d-902e-4462-aba9-4c468600854f",
   "metadata": {},
   "source": [
    "## Data Analysis III\n",
    "\n",
    "### February 11, 2024\n",
    "\n",
    "Goals for Today\n",
    "* Loading data from text files\n",
    "* Computing weighted averages\n",
    "* Plotting with errorbars\n",
    "* Compute the Chi-Squared\n",
    "* Implement Linear and Nonliner least squares fits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11ebb3bb-a7da-4b5d-ba66-40705bcdb8c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Importing numpy and matplotlib\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "#For Curve Fitting\n",
    "from scipy import optimize  \n",
    "\n",
    "#For reading urls\n",
    "import urllib\n",
    "\n",
    "plt.rc('figure', figsize = (6, 3.5)) # Reduces overall size of figures\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "03c6ef7b-053c-43e5-836e-9672c71c199c",
   "metadata": {},
   "source": [
    "* Download the  file 'hdata.dat' from the course website into your working direcory. Open it in a text editor to get a sense of what it contains\n",
    "\n",
    "Look at the syntax of `np.loadtxt`, and use it to open the file.\n",
    "\n",
    "Before you run any code, answer the following questions\n",
    "* What kind of delimiter is used? Is it space, comma, tab?\n",
    "* How many header lines are used? Headers often provide relevant information such as units, instrument settings used for the data included. \n",
    "* When running np.loadtxt, you have to specify optional arguments for delimiter, number of rows to skip, whether to unpack in columns or rows.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a92e452b-7754-4432-82ef-c214ae9ffc14",
   "metadata": {},
   "source": [
    "The command `data = np.loadtxt(...)`  stores the contents of the file into an array called data\n",
    "\n",
    "Now, split up data into two arrays: `y_values` and `y_unc` for the values and uncertainty."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11c3f3c0-86a3-4da0-8244-a6a0bb884023",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Make a plot of Measurement number vs value with properly labeled axes\n",
    "#Calculate the mean of the values using np.mean() or some other method you like\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0098102-e3b8-434e-a02a-2c04bc25272c",
   "metadata": {},
   "source": [
    "### Recall from last week:  \n",
    "Average: $\\bar{x} = \\frac{1}{N}\\Sigma_ix_i$\n",
    "\n",
    "Weighted average: $\\bar{x}_{weighted} = \\frac{\\Sigma_iw_ix_i}{\\Sigma_iw_i}$,  where the weights are given by $w_i = \\frac{1}{\\alpha_i^2}$ \n",
    "\n",
    "In this notation, $\\alpha_i$ represents the uncertainty of data point $i$."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b16ec743-e96c-44e0-b19a-cecab2b2f7cc",
   "metadata": {},
   "source": [
    "calculate the weighted mean of the measurements taken. (Hint: a dot product makes the numerator easy. you can take the dot product of two matrices a and b either with numpy.dot -- i.e., `np.dot(a,b)` -- or by using the @ sign -- i.e., a @ b.) \n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bb6d5f21-291c-4ee0-9688-c98b27581be6",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "8730b24a-77da-47bc-92dc-6ce4c44b3aaf",
   "metadata": {},
   "source": [
    "Make a plot of Measurement number vs value with properly labeled axes\n",
    "\n",
    "Use ```np.hlines``` to draw horizontal lines for the mean, and the weighted means.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a10ce75a-e7bd-4369-9001-c141a21e91e2",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "af7d19ad-2b56-4d76-bd36-62c0ae60c573",
   "metadata": {},
   "source": [
    "Look up the syntax of the function ```np.errorbar()```\n",
    "\n",
    "For example, if given three arrays, xval,yval,yerr, then the following code snippet plots errorbar plots with circular indicators\n",
    "` errorbarplot(xval,yval,yerr,fmt = 'o') `"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ce313557-e670-48b4-81f4-562a307c6eca",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "f1e49038-e815-4f69-a079-27621726a0a2",
   "metadata": {},
   "source": [
    "Next, plot the residuals $x_i - \\bar{x}$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "af054489-2b0f-4f83-88a3-beea137f2e5d",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "f998c383-3ad9-470a-ba0a-ddf4b944cd13",
   "metadata": {},
   "source": [
    "And then plot the normalized residuals ($(x_i - \\bar{x})/\\alpha_i$)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4a910725-b422-450e-bb28-84feb22fc49b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "2e5445e9-676f-4093-909b-b551de631d4d",
   "metadata": {},
   "source": [
    "Now square the residuals and add them up. This gives the chi-squared."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d475ba51-8918-420e-97c8-59a8c700477c",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "26074307-6c53-47c4-ae15-e0c3cbd7ef09",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "7ac83e5a-923f-45af-90d9-b26251969e88",
   "metadata": {},
   "source": [
    "## Part - II: Linear Least Squares fitting"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72464157-d533-4920-9a9c-1142afdb0bd4",
   "metadata": {},
   "source": [
    "A least squares fit is a procedure where a model is adjusted in order to best match data in such a way that the sum of squares of the separation between the data and the model is minimized. \n",
    "\n",
    "the `scipy.optimize.curve_fit()` function implements a least squares fit routine. We demonstrate it below. \n",
    "\n",
    "The `curve_fit()` function returns a tuple of two variables, the optimal parameters of the fit, and the covariance matrix which encodes the uncertainty."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "83724f04-221b-4af7-8d79-cc6e3834bf87",
   "metadata": {},
   "outputs": [],
   "source": [
    "## Load Data into arrays xdata,ydata and udata\n",
    "link = 'http://www.eg.bucknell.edu/~phys310/skills/data_analysis/a.dat'\n",
    "fh = urllib.request.urlopen(link)\n",
    "xdata, ydata, udata = np.loadtxt(fh, unpack=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2553348f-4c70-4d6d-b76d-50ba17d3a6fa",
   "metadata": {},
   "outputs": [],
   "source": [
    "## Make an errorbar plot of the data you downloaded. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "63628d1c-6528-4e4c-9b17-eb1d6c7d80d4",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "32926aec-376b-459d-b739-5345e04a819c",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Define function for a line\n",
    "def f_lin(x,m,b):\n",
    "    return m*x + b\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e315c25b-7545-4a9a-b07c-f87eb1fedcca",
   "metadata": {},
   "outputs": [],
   "source": [
    "#The snippet of code below performs a least squares fit on the data\n",
    "popt, pcov = optimize.curve_fit(f_lin,xdata,ydata,sigma=udata, absolute_sigma=True)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "86feb798-d72d-4900-be30-a8b010832107",
   "metadata": {},
   "outputs": [],
   "source": [
    "#print optimal parameters:\n",
    "print(popt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dbac6f3e-96fd-4e5a-b079-b11e06a6bdb0",
   "metadata": {},
   "outputs": [],
   "source": [
    "#print standard errors for the fit parameters\n",
    "print(np.sqrt(np.diag(pcov)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "01765d4f-1f12-4cc1-b90f-90365d1aba33",
   "metadata": {},
   "outputs": [],
   "source": [
    "## Plot data and line of best fit\n",
    "plt.figure()\n",
    "plt.errorbar(xdata,ydata,udata, fmt='ok',label = 'data')\n",
    "xc = np.linspace(0,4,100)\n",
    "plt.plot(xc, f_lin(xc,*popt), label = 'best fit')\n",
    "plt.xlabel('$x$')\n",
    "plt.ylabel('$y$')\n",
    "plt.legend()\n",
    "plt.title('data and linear best-fit line');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bd92625c-2c7a-45a4-a364-a68ca348aded",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1e0c8784-ac31-4f03-aee8-ecfc45dbd0a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "#compute residuals\n",
    "res = ydata - f_lin(xdata,*popt)     # same as    r = ydata - f_lin(xdata,popt[0],popt[1])\n",
    "\n",
    "plt.errorbar(xdata,res,udata, fmt='ok')\n",
    "plt.axhline(0)\n",
    "plt.xlabel('$x$')\n",
    "plt.ylabel('residuals ($y_i-f(x_i)$)')\n",
    "plt.title('residuals for linear model')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "raw",
   "id": "62397698-0a0e-4381-97b2-ad4546a1e9c6",
   "metadata": {},
   "source": [
    "# The block of code below combines the data, fit and residuals into an effective format using matplotlib functions.\n",
    "# Experiment with the parameters to customize to your taste\n",
    "\n",
    "fig, (ax0, ax1) = plt.subplots(2, 1, gridspec_kw={'height_ratios': [3, 1]}, sharex=True,figsize = [7,4])\n",
    "ax0.errorbar(xdata,ydata,udata, fmt='.k',label = 'data')\n",
    "ax0.plot(xc, f_lin(xc,*popt), label = 'best fit')\n",
    "ax0.legend()\n",
    "ax0.set_ylabel('y')\n",
    "           \n",
    "ax1.errorbar(xdata,res,udata, fmt='.k')\n",
    "ax1.set_ylabel('resid')\n",
    "ax1.set_xlabel('x');\n",
    "ax1.axhline(0)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f0921cb1-afb8-4fa2-9db7-19004d453b53",
   "metadata": {},
   "source": [
    "## Exercise\n",
    "\n",
    "* Define a new function f_quad which describes a quadratic function in x, i.e. $y = a x^2 + m x + b$\n",
    "\n",
    "* Using optimize.curve_fit(), fit the data above to the function f_quad\n",
    "\n",
    "* Plot the data and residuals"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c4ce6c57-e4b4-430c-b89e-e4e9ac33c65b",
   "metadata": {},
   "outputs": [],
   "source": [
    "#def f_quad(...):\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7b585f67-25a8-4a16-8bf8-a41e6abe5541",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Implement fit "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e2854062-1e53-4219-984a-d45f6bb5c95e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a44c540a-13e7-4ec3-b7c7-5717bf2981d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Plot Data and residuals"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1fe14f87-de7a-4b28-b3e2-01f127067ecc",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e11bccf7-f7b2-4e2d-a083-18d77175dd45",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "f2c216dd-dac7-4c30-b8c5-e264bf2dc39b",
   "metadata": {},
   "source": [
    "## Part - III: Nonlinear Least Squares fitting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bf2096cc-1236-4ee3-ae94-8600f27bd052",
   "metadata": {},
   "outputs": [],
   "source": [
    "link = 'http://www.eg.bucknell.edu/~phys310/skills/data_analysis/b.dat'\n",
    "fh = urllib.request.urlopen(link)\n",
    "xdata_b, ydata_b, udata_b np.loadtxt(fh, unpack=True)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f70c158f-24b8-469a-8f00-a7c903f718df",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure()\n",
    "plt.errorbar(xdata_b,ydata_b,udata_b, fmt='ok')\n",
    "plt.xlabel('$x$')\n",
    "plt.ylabel('$y$');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "efe65a9e-417f-422d-b843-18e194df00f4",
   "metadata": {},
   "outputs": [],
   "source": [
    "def fosc(x,a,wl,ph):\n",
    "    '''Cosine function with three fit parameters: a (amplitude), wl (wavelength),\n",
    "    and ph (phase)'''\n",
    "    return a*np.cos(2*np.pi*x/wl + ph)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a9e58c91-150c-40c2-a9d5-768afa6f1d7e",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Using scipy.optimize, fit the data to fosc.\n",
    "\n",
    "#Nonlinear fits often need initial fit guess\n",
    "pinit = [1,1,1];\n",
    "\n",
    "plt.figure()\n",
    "plt.errorbar(xdata,ydata,udata, fmt='ok')\n",
    "xc = np.linspace(0,4,100)\n",
    "#plt.plot(xc, f(xc,1.2,1.7,0));\n",
    "plt.plot(xc, f(xc,*pinit),'--', label = 'init guess')\n",
    "\n",
    "\n",
    "## Adjust your values for p0 as you wish"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e4649c84-7761-458a-ad2d-6176ecb28719",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Implementing the nonlinear curve_fit using the initial fit parameters\n",
    "popt, pcov = optimize.curve_fit(fosc,xdata_b,ydata_b,p0=pinit, sigma=udata, absolute_sigma=True)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "58f46dc9-116b-4924-adf2-608cfbaf1f47",
   "metadata": {},
   "source": [
    "## Exercise\n",
    "\n",
    "* Plot the data and residuals\n",
    "* Calculate the chi-squared for the fit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0b0b61cd-bd15-4d64-af91-d35f8f6d457f",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}