The data for this problem are available in the file fit_2.dat, which can be downloaded from http://www.eg.bucknell.edu/~phys310/hw/hw4.html. One option is to download the data file to a local directory and then import it with np.loadtxt(). Another option is to read it directly into the Jupyter notebook from the URL. I use the second option here.
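For completeness, the first option might look like the sketch below. Here I first write a few made-up sample lines so the example is self-contained; with the real fit_2.dat saved locally, you would skip that step and load the file directly.

```python
import numpy as np

# Stand-in for fit_2.dat so this sketch runs on its own;
# with the real file, skip this step.
sample = "0.0 1.2 0.1\n1.0 3.1 0.2\n2.0 5.0 0.1\n"
with open("fit_2.dat", "w") as fh:
    fh.write(sample)

data = np.loadtxt("fit_2.dat")   # each row: x, y, uncertainty in y
print(data.shape)                # one row per data point, three columns
```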
import numpy as np
from scipy import optimize
import urllib.request  # for importing data from a URL
import matplotlib as mpl
import matplotlib.pyplot as plt
# The following is an IPython magic command that puts figures in the notebook.
# next line is compatible with html but does not include figures in pdf-file
%matplotlib notebook
# use next line, if you want to download pdf-file of this notebook
#%matplotlib inline
# M.L. modification of matplotlib defaults
# Changes can also be put in matplotlibrc file,
# or effected using mpl.rcParams[]
plt.style.use('classic')
plt.rc('figure', figsize = (6, 4.5)) # Reduces overall size of figures
plt.rc('axes', labelsize=16, titlesize=14)
plt.rc('figure', autolayout = True) # Adjusts subplot params for new size
def f(x, m, b):
    '''Linear model: slope m, intercept b.'''
    return m*x + b
The np.loadtxt function imports the contents of the data file into a NumPy array. The unpack=True option transposes the array so that each column ends up in a separate array; below I achieve the same thing by transposing with .T.
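A minimal sketch of the unpack=True option, using a small in-memory stand-in for the data file (the three columns mimic the x, y, u layout of fit_2.dat):

```python
import io
import numpy as np

# Stand-in data: three columns (x, y, u), one point per row.
text = io.StringIO("0.0 1.0 0.1\n1.0 3.0 0.2\n2.0 5.0 0.1\n")

# unpack=True returns each column as its own array,
# equivalent to np.loadtxt(text).T
x, y, u = np.loadtxt(text, unpack=True)
print(x)  # the first column
```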
g = urllib.request.urlopen('https://www.eg.bucknell.edu/~phys310/hw/assignments/fitting_2/fit_2.dat')
data = np.loadtxt(g)
Data is for an experiment in which there is a suspected linear relationship between measured values of $x$ and $y$. The data for each point is on a single line in the file. The first number on each line is the value of $x$, the second is the value of $y$, and the third is the uncertainty in $y$. Uncertainties in $x$ are assumed to be negligible.
x, y, u = data.T
x, y, u
popt, pcov = optimize.curve_fit(f, x, y, sigma=u, absolute_sigma=True)
print(popt) # print the best fit parameters
print(pcov) # print the covariance matrix
for i in range(len(pcov)):
    print(np.sqrt(pcov[i,i])) # print the uncertainties in the fit parameters
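The loop above prints the square roots of the diagonal elements of the covariance matrix one at a time; the same uncertainties can be extracted in a single line with np.diag. A sketch, using a made-up 2×2 covariance matrix in place of pcov:

```python
import numpy as np

# Hypothetical covariance matrix standing in for pcov from curve_fit
pcov = np.array([[0.04, 0.01],
                 [0.01, 0.09]])

# One-sigma uncertainties of the fit parameters: sqrt of the diagonal
perr = np.sqrt(np.diag(pcov))
print(perr)  # [0.2 0.3]
```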
plt.figure()
plt.errorbar(x,y,u,fmt='o')
xc = np.linspace(-1,21,10)
yc = f(xc,*popt)
plt.plot(xc,yc)
plt.xlim(-1,21)
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.axhline(0, color='k')
plt.axvline(0, color='k');
plt.figure()
r = y-f(x,*popt) # Calculate residuals
plt.errorbar(x,r,u, fmt = 'ko')
plt.axhline(0)
plt.xlabel('$x$')
plt.ylabel('residuals ($y_i - f(x_i)$)')
plt.xlim(0,21);
By inspection of the graph, I count 5 values for which 0 is not included within the error bars, and 15 for which it is.
I could make the computer do the counting:
# METHOD 1
cnt = 0
for j in range(len(r)):
    if np.abs(r[j]) < u[j]:
        cnt += 1
print(cnt, 'of the', len(r), 'residuals are within 1 alpha_i of 0')
# METHOD 2
r_norm = r/u # normalized residuals
plt.figure()
plt.errorbar(x,r_norm,1, fmt = 'ko')
plt.axhline(0)
plt.xlabel('$x$')
plt.ylabel('$(y_i - f(x_i))/\\alpha_i$')
plt.xlim(0,21);
cnt = np.sum(abs(r_norm) < 1)
print(cnt, 'of the', len(r_norm), 'residuals are within 1 alpha_i of 0')
chi2 = np.sum(r**2/u**2)
print(chi2, len(r))
The goodness-of-fit parameter is $\chi^2 = 16.7$, which is just under the number of data points. This indicates that the data are consistent with the linear model, so the best-fit parameters determined above are good values for the slope and intercept of the line. We can also conclude from Fig. 2 (residuals as a function of $x_i$) that the residuals fluctuate around $0$ with no apparent systematic discrepancy, which again supports the linear fit.
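The comparison of $\chi^2$ with the number of data points can be made more quantitative with the $\chi^2$ survival function from scipy.stats. A sketch, using $\chi^2 = 16.7$ from above and assuming 20 data points and 2 fit parameters (so 18 degrees of freedom):

```python
from scipy import stats

chi2_val = 16.7   # goodness-of-fit value from the fit above
ndof = 20 - 2     # 20 data points minus 2 fit parameters (m, b)

# Probability of obtaining a chi^2 this large or larger if the model is correct
p = stats.chi2.sf(chi2_val, ndof)
print(p)
```

A p-value of order 0.5 (neither very close to 0 nor very close to 1) is consistent with the conclusion that the linear model describes the data well.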
version_information is from J.R. Johansson (jrjohansson at gmail.com); see Introduction to scientific computing with Python for more information and instructions for package installation. version_information is installed on the Linux network at Bucknell.
%load_ext version_information
%version_information numpy, scipy, matplotlib