Correlation and uncertainty - Part 1

Introduction

Consider a simple experiment in which the goal is to measure what is presumed to be a constant signal in the presence of a constant background. For this example, assume that there is Gaussian noise on both the signal and the background. In the first part of this notebook, assume that you can measure the 'total' (signal + background), and that you can then turn off the signal and measure the 'background' alone. The 'signal' is then the difference between the 'total' and the 'background.' For class on Tuesday, execute the cells in this notebook and then determine the value of the signal, with an uncertainty. (There is nothing new here -- I just want you to become familiar with the 'experiment.') In the final cells of this notebook I propose a small variation on the experiment that I want you to ponder. This will serve as an introduction to Tuesday's class.

In [1]:
import numpy as np
from scipy import stats

import matplotlib as mpl       # As of July 2017 Bucknell computers use v. 2.x 
import matplotlib.pyplot as plt
In [2]:
# Following is an IPython magic command that puts figures in the notebook.
%matplotlib notebook

# As of Aug. 2017 reverting to 1.x defaults.
# In 2.x text.usetex requires dvipng, texlive-latex-extra, and texlive-fonts-recommended, 
# which don't seem to be universal
# See https://stackoverflow.com/questions/38906356/error-running-matplotlib-in-latex-type1cm?
mpl.style.use('classic')
        
# M.L. modifications of matplotlib defaults using syntax of v.2.0 
# More info at http://matplotlib.org/2.0.0/users/dflt_style_changes.html
# Changes can also be put in matplotlibrc file, or effected using mpl.rcParams[]
plt.rc('figure', figsize = (6, 4.5))            # Reduces overall size of figures
plt.rc('axes', labelsize=16, titlesize=14)
plt.rc('figure', autolayout = True)             # Adjusts subplot parameters for new size

Generate noisy background and noisy signal. Add them to get measured data.

In [3]:
n_trials = 100       # number of repeated measurements
s = 20               # true (constant) signal
alpha_s_gen = 1      # std dev of the Gaussian noise on the signal
b = 10               # true (constant) background
alpha_b_gen = 2      # std dev of the Gaussian noise on the background
In [ ]:
# Generate the simulated data: Gaussian noise on both signal and background
values_sig = stats.norm.rvs(s, alpha_s_gen, size=n_trials)
values_bg = stats.norm.rvs(b, alpha_b_gen, size=n_trials)
values_tot = values_sig + values_bg

# Save the data files that are loaded (from the course website) below
np.savetxt('correlation_1.dat', values_tot)
np.savetxt('correlation_2.dat', values_bg)

In [4]:
values_tot = np.loadtxt('http://www.eg.bucknell.edu/~phys310/skills/data_analysis/correlation_1.dat')
values_bg = np.loadtxt('http://www.eg.bucknell.edu/~phys310/skills/data_analysis/correlation_2.dat')
n_trials = len(values_tot)
In [9]:
plt.figure()
x = np.arange(1, n_trials + 1)    # trial number for the horizontal axis
plt.scatter(x, values_tot, color='r', label='signal + bg')
plt.scatter(x, values_bg, label='background only')
plt.xlabel('trial')
plt.legend()
plt.ylim(0,45)
plt.grid();

Determine signal (with associated uncertainty)

In [7]:
# means and standard deviations of the two data series
mean_tot = np.mean(values_tot)
std_tot = np.std(values_tot)
mean_bg = np.mean(values_bg)
std_bg = np.std(values_bg)

# standard errors of the means
alpha_tot = std_tot/np.sqrt(n_trials - 1)
alpha_bg = std_bg/np.sqrt(n_trials - 1)

# signal = total - background; uncorrelated uncertainties add in quadrature
signal = mean_tot - mean_bg
alpha_sig = np.sqrt(alpha_tot**2 + alpha_bg**2)
print('signal =', signal)
print('uncertainty in total = ', alpha_tot)
print('uncertainty in background =', alpha_bg)
print('uncertainty in signal =', alpha_sig)
signal = 19.92792973777987
uncertainty in total =  0.22237201029855114
uncertainty in background = 0.18490131044101177
uncertainty in signal = 0.2892020151503484

Result:

$ \mbox{signal} = 19.9 \pm 0.3 $
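
As a quick check, adding the two standard errors in quadrature reproduces the printed uncertainty:

$$ \alpha_{\rm sig} = \sqrt{\alpha_{\rm tot}^2 + \alpha_{\rm bg}^2} = \sqrt{(0.222)^2 + (0.185)^2} \approx 0.29 . $$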

Now assume that we are able to sample the background at (approximately) the same time that we sample the total.

Each data point is now associated with a specific value of the background.

How does this affect your approach to determining the value of the signal?

We will discuss this quantitatively tomorrow, but I want you to think about some of the issues involved before Tuesday's discussion.
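
Because each measured total contains the very background fluctuation that is measured alongside it, the two data series are correlated. A quick check makes this explicit (a minimal sketch, using the arrays loaded above):

In [ ]:
# Pearson correlation coefficient between the 'total' and 'background' series;
# a value well above zero shows the two series are not independent
np.corrcoef(values_tot, values_bg)[0, 1]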


We can now calculate 100 values for the signal, and find $\bar{s}$, $\sigma_s$, and $\alpha_s$:

In [10]:
# Pairwise differences: subtract each trial's background from the same
# trial's total, so the shared background fluctuation cancels
values_sig = values_tot - values_bg
mean_sig = np.mean(values_sig)
std_sig = np.std(values_sig)
alpha_sig = std_sig/np.sqrt(n_trials - 1)
print('signal mean =', mean_sig)
print('uncertainty = ', alpha_sig)
signal mean = 19.92792973777987
uncertainty =  0.09872660912478129

Results:

In this scenario, I find that the signal is

$\mbox{signal} = 19.9 \pm 0.1$

Notice that the uncertainty is reduced by roughly a factor of 3!
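
This factor is just what the generating parameters at the top of the notebook predict. In the unpaired analysis the total carries both noise sources ($\sigma_{\rm tot}^2 = 1^2 + 2^2$), so

$$ \alpha_{\rm unpaired} = \sqrt{\frac{\sigma_{\rm tot}^2 + \sigma_{\rm bg}^2}{N}} = \sqrt{\frac{5 + 4}{100}} = 0.3, $$

while in the paired analysis each trial's background fluctuation cancels in the difference, leaving only the noise on the signal itself:

$$ \alpha_{\rm paired} = \frac{\alpha_{s,{\rm gen}}}{\sqrt{N}} = \frac{1}{\sqrt{100}} = 0.1 . $$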

Beyond PHYS 211/212, PHYS 221, and Hughes & Hase Chapter 4

As in H&H 4.2, consider a derived quantity $Z$ that depends on measured values of variables $A$ and $B$:

$$ Z = f(A,B). $$

If we have determined that

$$ A = \bar{A} \pm \alpha_A\quad\quad\mbox{and}\quad\quad B = \bar{B} \pm \alpha_B, $$

we say that

$$ \bar Z = f(\bar A, \bar B), $$

but what is the value of $\alpha_Z$?

Up until now we have used the rule given in H&H Eq. (4.10) on p. 41 (and also in the PHYS 211/212 lab manual) that

$$ \alpha_Z^2 = \left(\alpha_Z^A\right)^2 + \left(\alpha_Z^B\right)^2, $$

where $\alpha_Z^A \equiv \left.\frac{\partial f}{\partial A}\right|_{\bar A}\,\alpha_A$ denotes the contribution of the uncertainty in $A$ to the uncertainty in $Z$ (and similarly for $\alpha_Z^B$).

When there are correlations in the measured quantities, we should be using

$$ \alpha_Z^2 = \left(\alpha_Z^A\right)^2 + \left(\alpha_Z^B\right)^2 + 2\left.\frac{\partial f}{\partial A}\right|_{\bar A}\, \left.\frac{\partial f}{\partial B}\right|_{\bar B} \left[\frac{1}{N-1}\sum_i (A_i - \bar A)(B_i - \bar B)\right] . $$

The new term in this equation can be written in terms of the sample covariance, $\sigma_{AB}^2$, where

$$ \left(\alpha_Z^{AB}\right)^2 = 2\left.\frac{\partial f}{\partial A}\right|_{\bar A}\, \left.\frac{\partial f}{\partial B}\right|_{\bar B} \sigma^2_{AB} . $$

Rewriting the expression for the standard error in $Z$ in terms of the covariance gives

$$ \alpha_Z^2 =\left(\alpha_Z^A\right)^2 + \left(\alpha_Z^B\right)^2 + \left(\alpha_Z^{AB}\right)^2. $$
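
A minimal sketch of this combination in code (the function and argument names here are mine, not from H&H; cov_AB stands for the bracketed sample covariance $\sigma_{AB}^2$):

In [ ]:
def alpha_Z_correlated(dfdA, dfdB, alpha_A, alpha_B, cov_AB):
    # partial derivatives evaluated at the means, times the individual errors
    alpha_ZA = dfdA*alpha_A
    alpha_ZB = dfdB*alpha_B
    cross_sq = 2*dfdA*dfdB*cov_AB     # the (alpha_Z^AB)^2 term
    return np.sqrt(alpha_ZA**2 + alpha_ZB**2 + cross_sq)

For the difference considered below, $Z = A - B$, the call would be alpha_Z_correlated(1, -1, alpha_A, alpha_B, cov).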

In the example in this notebook, the signal (i.e., $Z$) is just the difference between the total and the background (i.e., $Z = A-B$), so

$$ \left.\frac{\partial f}{\partial A}\right|_{\bar A}\, \left.\frac{\partial f}{\partial B}\right|_{\bar B} = -1. $$
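
so the covariance term enters with a minus sign. At the level of single-measurement standard deviations this gives

$$ \sigma_Z^2 = \sigma_{\rm tot}^2 + \sigma_{\rm bg}^2 - 2\sigma_{AB}^2, $$

and dividing $\sigma_Z$ by $\sqrt{N-1}$, as in the earlier cells, converts it to a standard error. The cell below evaluates exactly this.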

Calculating the sample covariance

In [16]:
# sample covariance of the two data series
cov = np.sum((values_tot - mean_tot)*(values_bg - mean_bg))/(n_trials - 1)
print(cov)
# std dev of the signal including the covariance term (Z = A - B)
std_sig = np.sqrt(std_tot**2 + std_bg**2 - 2*cov)
print(std_sig)
print(std_sig/np.sqrt(n_trials - 1))    # standard error of the signal
3.69454311088725
0.9439579065618935
0.09487133921076249
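
The same sample covariance is available directly from numpy; np.cov normalizes by $N-1$ by default, so the off-diagonal element of the covariance matrix should reproduce the value computed above:

In [ ]:
# off-diagonal element of the 2x2 covariance matrix = sample covariance
np.cov(values_tot, values_bg)[0, 1]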

Uncertainty in the signal

$\mbox{Uncertainty} = 0.1$, in agreement with the direct pairwise calculation above.
