Fitting of linear models. This notebook uses curve_fit() from the optimize submodule of scipy. (This function can be used to fit linear and non-linear models; when used for linear models, no initial parameter estimates are required.)
NOTE: Linear models are not necessarily linear functions. In this notebook I fit two functions,
f(x) = mx + b and g(x) = mx + b + cx².
Both of these are models that are linear in the fit parameters, although g is not linear in the independent variable x.
Marty Ligare, August 2020
modified by Katharina Vollmayr-Lee February 2022
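An aside, not part of the original notebook: for a nonlinear model, curve_fit generally does need an initial guess for the parameters, supplied through the p0 argument. A minimal sketch, using a hypothetical exponential model and synthetic data:
import numpy as np
from scipy import optimize

# Hypothetical nonlinear model (illustration only, not one of the models fit below)
def h(x, a, tau):
    return a*np.exp(-x/tau)

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
xs = np.linspace(0, 5, 50)
ys = h(xs, 2.0, 1.5) + rng.normal(0, 0.05, xs.size)

# For a nonlinear model, an initial guess p0 helps curve_fit converge
popt_nl, pcov_nl = optimize.curve_fit(h, xs, ys, p0=[1.0, 1.0])
print(popt_nl)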
import numpy as np
from scipy import optimize
import urllib.request
import matplotlib as mpl
import matplotlib.pyplot as plt
# Following is an IPython magic command that puts figures in the notebook.
%matplotlib notebook
# M.L. modifications of matplotlib defaults
# Changes can also be put in matplotlibrc file,
# or effected using mpl.rcParams[]
mpl.style.use('classic')
plt.rc('figure', figsize = (6, 4.5)) # Reduces overall size of figures
plt.rc('axes', labelsize=16, titlesize=14)
plt.rc('figure', autolayout = True) # Adjusts subplot params for new size
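For example (an aside, not in the original notebook), the figure-size setting above could equivalently be made through mpl.rcParams:
# Equivalent to plt.rc('figure', figsize=(6, 4.5)) above (aside, not in the original)
mpl.rcParams['figure.figsize'] = (6, 4.5)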
Usually all user-defined functions go at the beginning of the program. Here, however, I include the definitions only as comments, so that you can see each function definition directly below, where we then do a fit with it.
#def f(x,m,b):
# return m*x + b
#
#def g(x,m,b,c):
# return m*x + b + c*x**2
Below you will practice downloading the data file to a local directory and importing it using np.loadtxt, but here we skip the intermediate step and load it directly from the URL into the notebook.
Each row in the file gives information on a single data point: x_i, y_i, α_i. The option unpack = True takes the transpose of the data-file array so that all values of x are in a single array, and similarly for y and u (the uncertainty). Notice that by default np.loadtxt() treats # as the comment character.
link = 'http://www.eg.bucknell.edu/~phys310/skills/data_analysis/a.dat'
fh = urllib.request.urlopen(link)
xdata, ydata, udata = np.loadtxt(fh, unpack=True)
#print(xdata,ydata,udata)
plt.figure()
plt.errorbar(xdata,ydata,udata, fmt='o')
plt.xlabel('$x$')
plt.ylabel('$y$')
#plt.scatter(xdata,ydata)
# KVL: if above %matplotlib notebook doesn't work uncomment next line
#plt.show()
Since for most labs you will have your data in a local file rather than at a URL, let us practice using np.loadtxt directly on a file. First download the file a.dat from the Phys 310 webpage into your local directory. Next we import it using np.loadtxt.
xdata, ydata, udata = np.loadtxt("a.dat", unpack=True)
#print(xdata,ydata,udata)
And here is another way to use loadtxt: first load the data into a single array, then transpose it and assign the x, y, and u data.
data = np.loadtxt("a.dat")   # each row is a data point x_i, y_i, u_i
xdata = data.T[0] # The .T gives transpose of array
ydata = data.T[1]
udata = data.T[2]
#print(udata)
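Equivalently (an aside, not in the original), all three columns can be unpacked in one line:
# Unpacking the transpose gives the same three arrays as the assignments above
xdata, ydata, udata = data.T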
plt.figure()
plt.errorbar(xdata,ydata,udata, fmt='ok')
plt.xlabel('$x$')
plt.ylabel('$y$')
#plt.scatter(xdata,ydata);
#plt.show()
def f(x, m, b):
    return m*x + b
Next we do the curve fit. Since this is a linear fit, we do not need to provide initial parameter estimates. curve_fit returns the best-fit parameters (→ popt) and the covariance matrix (→ pcov).
popt, pcov = optimize.curve_fit(f, xdata, ydata, sigma=udata, absolute_sigma=True)
print(popt)
print(pcov)
print ("m= ",popt[0]," +- ",np.sqrt(pcov[0,0]))
print ("b= ",popt[1]," +- ",np.sqrt(pcov[1,1]))
# fit for data without error bars
#popt, pcov = optimize.curve_fit(f,xdata,ydata)
#print ("m= ",popt[0]," +- ",np.sqrt(pcov[0,0]))
#print ("b= ",popt[1]," +- ",np.sqrt(pcov[1,1]))
plt.figure()
plt.errorbar(xdata,ydata,udata, fmt='ok')
xc = np.linspace(0,4,100)
plt.plot(xc, f(xc,*popt))
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.title('data and best-fit line');
#plt.show()
r = ydata - f(xdata,*popt) # same as r = ydata - f(xdata,popt[0],popt[1])
plt.figure()
plt.errorbar(xdata,r,udata, fmt='ok')
plt.axhline(0)
plt.xlabel('$x$')
plt.ylabel('residuals ($y_i-f(x_i)$)')
plt.title('residuals for linear model')
plt.show()
chisq = np.sum(r**2/udata**2)
print(chisq, len(r))
The value of χ2 is much larger than the number of data points.
That, coupled with the obvious non-random pattern in the residuals, indicates that the data is not consistent with the model.
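A quick quantitative version of this check (an aside, not in the original notebook) is to compare χ2 with the number of degrees of freedom, i.e., the number of data points minus the number of fit parameters:
# Degrees of freedom: number of data points minus number of fit parameters (aside)
dof = len(xdata) - len(popt)   # len(popt) = 2 for the linear model
print("chi-square =", chisq, ", degrees of freedom =", dof)
print("reduced chi-square =", chisq/dof)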
Now we fit the quadratic function g defined below. This is still a linear model, i.e., it is linear in the fit parameters m, b, and c.
def g(x, m, b, c):
    return m*x + b + c*x**2
popt, pcov = optimize.curve_fit(g,xdata,ydata,sigma=udata, absolute_sigma=True)
print(popt)
for i in range(len(popt)):
    print(np.sqrt(pcov[i,i]))
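An equivalent but more readable printout (an aside, not in the original notebook):
# Label each best-fit parameter with its uncertainty from the covariance diagonal (aside)
for name, val, err in zip(['m', 'b', 'c'], popt, np.sqrt(np.diag(pcov))):
    print(name, "=", val, "+-", err)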
plt.figure()
plt.errorbar(xdata,ydata,udata, fmt='ok')
xc = np.linspace(0,4,100)
plt.plot(xc, g(xc,*popt))
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.title('data and best-fit quadratic');
r = ydata - g(xdata,*popt)
plt.figure()
plt.errorbar(xdata,r,udata, fmt='ko')
plt.axhline(0)
plt.xlabel('$x$')
plt.ylabel('residuals ($y_i-f(x_i)$)')
plt.title('residuals for quadratic model');
chisq = np.sum(r**2/udata**2)
print(chisq, len(r))
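As for the linear fit, χ2 can be compared with the number of degrees of freedom (an aside, not in the original notebook):
# Degrees of freedom for the quadratic model: N data points minus 3 fit parameters (aside)
dof = len(xdata) - len(popt)
print("reduced chi-square =", chisq/dof)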
%version_information is an IPython magic extension for showing version information for dependency modules in a notebook; see https://github.com/jrjohansson/version_information
%version_information is available on Bucknell computers on the linux network. You can easily install it on any computer.
%load_ext version_information
%version_information numpy, scipy, matplotlib