Consider a simple experiment in which the goal is to measure what is presumed to be a constant signal in the presence of a background. For this example, assume that there is Gaussian noise on both the signal and the background. In the first part of this notebook, assume that you make several measurements of the 'total' (signal + background), and then make separate measurements of the 'background' alone. The 'signal' is then the difference between the 'total' and the 'background.' For class on Tuesday, execute the cells in this notebook, and then determine the value of the signal, with an uncertainty. (There is nothing new here -- I just want you to become familiar with the 'experiment.')
Marty Ligare, September 2020
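To make the setup concrete, here is a minimal sketch of how data like this might be simulated. All parameter values (signal level, background level, noise widths) are made up for illustration; the actual measurements analyzed below are loaded from data files.
import numpy as np
rng = np.random.default_rng(0)          # seeded generator, for reproducibility
n = 100                                 # hypothetical number of trials
true_signal = 20.0                      # hypothetical constant signal
bg_level, bg_noise = 10.0, 2.0          # hypothetical background level and spread
sig_noise = 1.0                         # hypothetical Gaussian noise on the signal
bg = rng.normal(bg_level, bg_noise, n)  # background measurements
tot = true_signal + rng.normal(0.0, sig_noise, n) + bg   # 'total' = signal + background
Because the same bg array enters tot, this mimics the paired (simultaneous) measurements of the second part of the notebook; for the first part, the background entering tot would be an independent draw.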
import numpy as np
from scipy import stats
import urllib.request # For importing files from a URL
import matplotlib as mpl
import matplotlib.pyplot as plt
# Following is an IPython magic command that puts figures in the notebook.
%matplotlib notebook
# M.L. modifications of matplotlib defaults
# Changes can also be put in matplotlibrc file,
# or effected using mpl.rcParams[]
mpl.style.use('classic')
plt.rc('figure', figsize = (6, 4.5)) # Reduces overall size of figures
plt.rc('axes', labelsize=16, titlesize=14)
plt.rc('figure', autolayout = True) # Adjusts subplot parameters for new size
fn = urllib.request.urlopen('http://www.eg.bucknell.edu/~phys310/skills/data_analysis/correlation_1.dat')
values_tot = np.loadtxt(fn)
fn = urllib.request.urlopen('http://www.eg.bucknell.edu/~phys310/skills/data_analysis/correlation_2.dat')
values_bg = np.loadtxt(fn)
n_trials = len(values_tot)
plt.figure()
x = np.linspace(1,n_trials,n_trials)
plt.scatter(x, values_tot, color='r', label='signal + bg')
plt.scatter(x+n_trials, values_bg, label='background only')
plt.xlabel('trial')
plt.legend()
plt.ylim(0,40)
plt.grid();
# Means and standard deviations of the two data sets.
# np.std defaults to ddof=0 (the sigma_N of Hughes & Hase).
mean_tot = np.mean(values_tot)
std_tot = np.std(values_tot)
mean_bg = np.mean(values_bg)
std_bg = np.std(values_bg)
# Standard error of each mean: sigma_N/sqrt(N-1), equivalent to sigma_{N-1}/sqrt(N)
alpha_tot = std_tot/np.sqrt(n_trials - 1)
alpha_bg = std_bg/np.sqrt(n_trials - 1)
signal = mean_tot - mean_bg                       # best value: difference of means
alpha_sig = np.sqrt(alpha_tot**2 + alpha_bg**2)   # combine in quadrature (uncorrelated)
print('signal =',signal)
print('uncertainty in total = ', alpha_tot)
print('uncertainty in background =', alpha_bg)
print('uncertainty in signal =', alpha_sig)
$ \mbox{signal} = 19.9 \pm 0.3 $
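As a quick cross-check, the scipy.stats module imported above provides a built-in standard-error routine: stats.sem, with its default ddof=1, computes $\sigma_{N-1}/\sqrt{N}$, which is algebraically identical to the $\sigma_N/\sqrt{N-1}$ used above.
print('sem of total =', stats.sem(values_tot))       # should reproduce alpha_tot
print('sem of background =', stats.sem(values_bg))   # should reproduce alpha_bg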
Now suppose that each measurement of the 'total' is accompanied by a simultaneous measurement of the 'background.' This isn't too far-fetched an idea. The total could be something like the light from an astronomical object as seen through a telescope, and the background could be some measure of the ambient light, such as the detected intensity in a different part of the same image.
Each data point is now associated with a specific value of the background. That is, values_bg[0] is the background at the same time that the total recorded is values_tot[0], etc.
plt.figure()
x = np.linspace(1,n_trials,n_trials)
plt.scatter(x, values_tot, color='r', label='signal + bg')
plt.scatter(x, values_bg, label='background only')
plt.xlabel('trial')
plt.legend()
plt.ylim(0,50)
plt.grid();
values_sig = values_tot - values_bg         # point-by-point differences of paired data
mean_sig = np.mean(values_sig)
std_sig = np.std(values_sig)
alpha_sig = std_sig/np.sqrt(n_trials - 1)   # standard error of the mean of the differences
print('signal mean =', mean_sig)
print('uncertainty = ', alpha_sig)
In this scenario, I find that the signal is
$\mbox{signal} = 19.9 \pm 0.1$
Notice that the uncertainty is reduced by a factor of 3!
The data is obviously correlated. When the background is higher than average, we wouldn't be surprised to find the measured total to be higher than average too. But the correlation isn't perfect, because there is random noise on the signal as well as on the background.
Another way to see that the data is correlated is to make a scatter plot of the measured total vs. the measured background.
plt.figure()
plt.scatter(values_bg,values_tot)
plt.xlabel('background')
plt.ylabel('total (signal + background)');
This is an example of highly correlated data, with a correlation coefficient close to 1. (See discussion in H&H starting in the last paragraph of p. 94 and continuing onto p. 95.) The total increases with increasing background, indicating that the correlation coefficient is positive.
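NumPy can quantify this directly: np.corrcoef returns the matrix of correlation coefficients, and its off-diagonal element is the coefficient between the two data sets. This anticipates the hand computation later in the notebook.
print('correlation coefficient =', np.corrcoef(values_bg, values_tot)[0, 1])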
In this section we're going to reanalyze the simultaneously acquired data in a different way.
As in H&H 4.2, consider a derived quantity $Z$ that depends on measured values of variables $A$ and $B$:
$$ Z = f(A,B). $$
If we have determined that
$$ A = \bar{A} \pm \alpha_A\quad\quad\mbox{and}\quad\quad B = \bar{B} \pm \alpha_B, $$
we say that
$$ \bar Z = f(\bar A, \bar B), $$
but what is the value of $\alpha_Z$?
Up until now we have used the rule for combining uncertainties given in H&H Eq. (4.10) on p. 41 (and also given in the PHYS 211/212 lab manual):
\begin{eqnarray*}
\alpha_Z^2 &=& \left(\alpha_Z^A\right)^2 + \left(\alpha_Z^B\right)^2 \\
&=& \left(\frac{\partial Z}{\partial A}\right)^2\frac{\sigma_A^2}{N-1} + \left(\frac{\partial Z}{\partial B}\right)^2\frac{\sigma_B^2}{N-1}
\end{eqnarray*}
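For the difference studied in this notebook, $Z = A - B$, the partial derivatives are $+1$ and $-1$, so this uncorrelated rule reduces to the quadrature sum used in the first part:
$$ \alpha_Z^2 = \alpha_A^2 + \alpha_B^2. $$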
When there are correlations in the measured quantities, we should be using the results presented in H&H Section 7.3.1 (starting on p. 93). Equation (7.28) gives the variance of the distribution of $Z$'s:
$$ \sigma_Z^2 = \left(\frac{\partial Z}{\partial A}\right)^2\sigma_A^2 + \left(\frac{\partial Z}{\partial B}\right)^2\sigma_B^2 + \frac{2}{N-1}\left.\frac{\partial Z}{\partial A}\right|_{\bar A,\bar B}\, \left.\frac{\partial Z}{\partial B}\right|_{\bar A,\bar B} \sum_i (A_i - \bar A)(B_i - \bar B), $$
where the new term in this equation contains the factor defined as the sample covariance; the symbol used for the covariance in Hughes & Hase is $\sigma_{AB}$:
$$ \sigma_{AB} \equiv \frac{1}{N-1} \sum_i (A_i - \bar A)(B_i - \bar B). $$
NOTE: In the widely used text Data Reduction and Error Analysis for the Physical Sciences by Bevington and Robinson, the symbol for the covariance is $\sigma^2_{AB}$, which might lead to confusion, because the covariance can be negative. In many treatments the symbol for the covariance is $\mbox{cov}(A,B)$.
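As a cross-check on the hand computation below, NumPy provides a built-in covariance routine. By default np.cov normalizes by $N-1$, matching the definition above; the off-diagonal element of the returned matrix is $\sigma_{AB}$.
cov_matrix = np.cov(values_tot, values_bg)   # 2x2 covariance matrix, N-1 normalization
print('covariance =', cov_matrix[0, 1])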
Dividing both sides of the equation for $\sigma_Z^2$ above by $N-1$ converts the equation into a relationship between standard errors:
$$ \alpha_Z^2 = \left(\frac{\partial Z}{\partial A}\right)^2\alpha_A^2 + \left(\frac{\partial Z}{\partial B}\right)^2\alpha_B^2 + 2\, \frac{\partial Z}{\partial A}\, \frac{\partial Z}{\partial B}\frac{\sigma_{AB}}{N-1}. $$
In the example treated in this notebook, the signal (i.e., $Z$) is just the difference between the total and the background (i.e., $Z = A-B$), so
$$ \left.\frac{\partial Z}{\partial A}\right|_{\bar A}\, \left.\frac{\partial Z}{\partial B}\right|_{\bar B} = -1. $$
cov = np.sum((values_tot - mean_tot)*(values_bg - mean_bg))/(n_trials - 1)   # sample covariance, N-1 normalization
print('covariance = ', cov)
# Use ddof=1 so the standard deviations share the N-1 normalization of the
# covariance; with the default ddof=0, |cc| could exceed 1.
cc = cov/(np.std(values_tot, ddof=1)*np.std(values_bg, ddof=1))
print('correlation coefficient = ', cc)
std_sig = np.sqrt(std_tot**2 + std_bg**2 - 2*cov)   # H&H Eq. (7.28) with product of partials = -1
print('sample std dev in signal =',std_sig)
print('standard error in signal =',std_sig/np.sqrt(n_trials-1))
$ \mbox{signal} = 19.94 \pm 0.09 $
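One way to tie the two approaches together: the identity $\sigma_{A-B}^2 = \sigma_A^2 + \sigma_B^2 - 2\sigma_{AB}$ holds exactly when all three quantities use the same $N-1$ normalization, so the covariance route must reproduce the spread of the point-by-point differences from the second part.
# With ddof=1 used consistently, the two should agree to floating-point precision.
lhs = np.std(values_tot - values_bg, ddof=1)
rhs = np.sqrt(np.std(values_tot, ddof=1)**2 + np.std(values_bg, ddof=1)**2
              - 2*np.cov(values_tot, values_bg)[0, 1])
print('direct:', lhs, '  via covariance:', rhs)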
How do we know if data is correlated or uncorrelated?
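One quantitative answer uses the scipy.stats module imported at the top of the notebook: stats.pearsonr returns the Pearson correlation coefficient together with a p-value for the null hypothesis that the two data sets are uncorrelated.
r, p = stats.pearsonr(values_bg, values_tot)
print('r =', r)          # close to +1 or -1 indicates strong linear correlation
print('p-value =', p)    # small p: correlation unlikely to arise by chance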
version_information is from J.R. Johansson (jrjohansson at gmail.com); see Introduction to scientific computing with Python for more information and instructions for package installation. version_information is installed on the linux network at Bucknell.
%load_ext version_information
%version_information numpy, scipy, matplotlib