Fitting of linear models. This notebook uses curve_fit() from the optimize submodule of scipy. (This function can be used to fit both linear and non-linear models; when it is used for linear models, no initial parameter estimates are required.)
NOTE: Linear models are not necessarily linear functions. In this notebook I fit two functions, $f(x) = mx + b$ and $g(x) = mx + b + cx^2$.
Both of these are models that are linear in the fit parameters, although $g$ is not linear in the independent variable $x$.
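(For contrast: a model that is nonlinear in its fit parameters typically needs a starting guess, supplied through the p0 argument of curve_fit. The commented-out lines below are a minimal sketch with a hypothetical exponential model h and made-up data; nothing in this notebook uses it.)
#def h(x, a, tau):                # hypothetical model, nonlinear in the parameter tau
#    return a*np.exp(-x/tau)
#
#xs = np.linspace(0, 5, 20)
#ys = h(xs, 2.0, 1.5) + 0.05*np.random.randn(len(xs))       # made-up data
#popt, pcov = optimize.curve_fit(h, xs, ys, p0=[1.0, 1.0])  # p0 = initial guess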
Marty Ligare, August 2020
modified by Katharina Vollmayr-Lee February 2022
import numpy as np
from scipy import optimize
import urllib.request
import matplotlib as mpl
import matplotlib.pyplot as plt
# The following is an IPython magic command that puts figures in the notebook.
%matplotlib notebook
# M.L. modifications of matplotlib defaults
# Changes can also be put in matplotlibrc file,
# or effected using mpl.rcParams[]
mpl.style.use('classic')
plt.rc('figure', figsize = (6, 4.5)) # Reduces overall size of figures
plt.rc('axes', labelsize=16, titlesize=14)
plt.rc('figure', autolayout = True) # Adjusts subplot params for new size
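# For illustration: the figure-size setting above is equivalent to
# mpl.rcParams['figure.figsize'] = (6, 4.5)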
Usually all self-defined functions go at the beginning of the program. Here, however, I put the definitions in comments, so that you can see each function definition below, right where we then do a fit with it.
#def f(x,m,b):
# return m*x + b
#
#def g(x,m,b,c):
# return m*x + b + c*x**2
Below you will practice downloading the data file to a local directory and importing it with np.loadtxt, but here we skip that intermediate step and load the data directly from the URL into the notebook.
Each row in the file gives the information for a single data point: $x_i$, $y_i$, $\alpha_i$. The option unpack = True takes the transpose of the data-file array so that all values of x are in a single array, and similarly for y and u (the uncertainty). Notice that by default np.loadtxt() treats # as the comment character.
link = 'http://www.eg.bucknell.edu/~phys310/skills/data_analysis/a.dat'
fh = urllib.request.urlopen(link)
xdata, ydata, udata = np.loadtxt(fh, unpack=True)
#print(xdata,ydata,udata)
plt.figure()
plt.errorbar(xdata,ydata,udata, fmt='o')
plt.xlabel('$x$')
plt.ylabel('$y$')
#plt.scatter(xdata,ydata)
# KVL: if above %matplotlib notebook doesn't work uncomment next line
#plt.show()
Since for most labs you will have your data in a file rather than at a URL, let us practice here how to use np.loadtxt directly on a file. Download the file a.dat from the Phys 310 webpage into your local directory. Next we import it using np.loadtxt.
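If you prefer to fetch the file from within Python rather than through your browser, one option is urllib.request.urlretrieve (a sketch, reusing the link variable and the urllib.request import from above):
# Optional: download a.dat into the current directory programmatically
#urllib.request.urlretrieve(link, 'a.dat')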
As before, each row in the file gives a single data point $x_i$, $y_i$, $\alpha_i$, and unpack = True returns the columns x, y, and u (the uncertainty) as separate arrays.
xdata, ydata, udata = np.loadtxt("a.dat", unpack=True)
#print(xdata,ydata,udata)
And here is another loadtxt variant, where you first load the data into a single array and then transpose it to assign the x, y, and u data.
data = np.loadtxt("a.dat") # each line i data point x_i,y_i,u_i
xdata = data.T[0] # The .T gives transpose of array
ydata = data.T[1]
udata = data.T[2]
#print(udata)
plt.figure()
plt.errorbar(xdata,ydata,udata, fmt='ok')
plt.xlabel('$x$')
plt.ylabel('$y$')
#plt.scatter(xdata,ydata);
#plt.show()
def f(x,m,b):
    return m*x + b
Next we do a curve fit. Since this is a linear fit, we do not need to provide initial parameter estimates.
curve_fit returns the best-fit parameters ($\rightarrow$ popt) and the covariance matrix ($\rightarrow$ pcov).
popt, pcov = optimize.curve_fit(f,xdata,ydata,sigma=udata, absolute_sigma=True)
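# sigma=udata weights each data point by its uncertainty; absolute_sigma=True
# means the uncertainties are treated as absolute values, so the diagonal of
# pcov directly gives the parameter variances (it is not rescaled by chi-square).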
print(popt)
print(pcov)
print ("m= ",popt[0]," +- ",np.sqrt(pcov[0,0]))
print ("b= ",popt[1]," +- ",np.sqrt(pcov[1,1]))
# fit for data without error bars
#popt, pcov = optimize.curve_fit(f,xdata,ydata)
#print ("m= ",popt[0]," +- ",np.sqrt(pcov[0,0]))
#print ("b= ",popt[1]," +- ",np.sqrt(pcov[1,1]))
plt.figure()
plt.errorbar(xdata,ydata,udata, fmt='ok')
xc = np.linspace(0,4,100)
plt.plot(xc, f(xc,*popt))
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.title('data and best-fit line');
#plt.show()
r = ydata - f(xdata,*popt) # same as r = ydata - f(xdata,popt[0],popt[1])
plt.figure()
plt.errorbar(xdata,r,udata, fmt='ok')
plt.axhline(0)
plt.xlabel('$x$')
plt.ylabel('residuals ($y_i-f(x_i)$)')
plt.title('residuals for linear model')
plt.show()
chisq = np.sum(r**2/udata**2)
print(chisq, len(r))
The value of $\chi^2$ is much larger than the number of data points.
That, coupled with the obvious non-random pattern in the residuals, indicates that the data is not consistent with the model.
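A quick way to quantify this (a sketch; it just restates the comparison above in terms of degrees of freedom) is to compare $\chi^2$ with the number of degrees of freedom, i.e., the number of data points minus the number of fit parameters:
dof = len(xdata) - len(popt)   # degrees of freedom: data points minus fit parameters
print("chi-square =", chisq, " dof =", dof, " reduced chi-square =", chisq/dof)
# For a model consistent with the data the reduced chi-square should be of order 1.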
This is still a linear model, i.e., it is linear in the fit parameters m, b, and c.
def g(x,m,b,c):
    return m*x + b + c*x**2
popt, pcov = optimize.curve_fit(g,xdata,ydata,sigma=udata, absolute_sigma=True)
print(popt)
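# 1-sigma uncertainties of the fit parameters (square roots of the diagonal of pcov)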
for i in range(len(popt)):
    print(np.sqrt(pcov[i,i]))
plt.figure()
plt.errorbar(xdata,ydata,udata, fmt='ok')
xc = np.linspace(0,4,100)
plt.plot(xc, g(xc,*popt))
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.title('data and best-fit quadratic');
r = ydata - g(xdata,*popt)
plt.figure()
plt.errorbar(xdata,r,udata, fmt='ko')
plt.axhline(0)
plt.xlabel('$x$')
plt.ylabel('residuals ($y_i-g(x_i)$)')
plt.title('residuals for quadratic model');
chisq = np.sum(r**2/udata**2)
print(chisq, len(r))
%version_information is an IPython magic extension for showing version information for the dependency modules used in a notebook; see https://github.com/jrjohansson/version_information. It is available on Bucknell computers on the Linux network, and you can easily install it on any computer.
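One way to install it on your own machine (a sketch, assuming pip is available and that the PyPI package name matches the extension name) is
pip install version_information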
%load_ext version_information
%version_information numpy, scipy, matplotlib