Example: counting events and throwing dice

Jelle, October 2023

[1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import nafi

# Use float64 in jax for better precision
import jax
jax.config.update("jax_enable_x64", True)

Consider a counting experiment without background. We observe \(n\) events, and want to set an upper limit on the expected number of events \(\mu\). We use the test statistic:

\(\displaystyle q_\mu = -2 \, \Theta(\mu - \hat{\mu}) \, \ln \frac{L(\mu)}{L(\hat{\mu})} ,\)

where the best-fit hypothesis is \(\hat{\mu} = n\). The step function ensures we get upper limits:

  • Outcomes that are excesses (\(\hat{\mu} = n > \mu\)) are assigned \(q_\mu = 0\);

  • Nafi uses right-tailed p-values, and \(q_\mu \geq 0\), so excesses are assigned a p-value of 1;

  • Thus we always include all \(\mu\) below \(n\) in any interval we quote, i.e. we get upper limits.

In the notation of CCGV (Cowan, Cranmer, Gross and Vitells, arXiv:1007.1727), this is \(\tilde{q}_\mu\) or \(q_\mu\). (There is no background, so \(\hat{\mu} \geq 0\) and the two q’s are identical.)
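For this background-free Poisson case the statistic has a simple closed form, \(q_\mu = 2\,[\mu - n + n \ln(n/\mu)]\) for \(\mu > n\) (reducing to \(2\mu\) at \(n = 0\)). nafi computes it from the likelihood arrays below; the standalone helper here is only an illustrative sketch, not part of nafi's API:

import numpy as np

def q_mu_poisson(n, mu):
    # q = 2 [mu - n + n ln(n / mu)] if mu > mu_hat = n, else 0
    with np.errstate(divide='ignore', invalid='ignore'):
        q = 2 * (mu - n) + 2 * n * np.log(np.where(n > 0, n / np.asarray(mu, float), 1.0))
    return np.where(mu > n, q, 0.0)

q_mu_poisson(3, 5.0)   # ~0.94: a mild deficit disfavors mu = 5
q_mu_poisson(3, 1.0)   # 0.0: n > mu is an excess, assigned q = 0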

For computational convenience we consider only \(\mu \leq 15\) and \(n \leq n_\mathrm{max} = 25\).

[2]:
# Hypotheses: expected events
mu = np.arange(0, 15, .0025)

# Highest observed number of events to consider
# (15 wouldn't be enough to get .01 accuracy)
n_max = 25

# Compute likelihood (outcomes, hypotheses) array, weights (same shape), and outcomes array
lnl, weights, ns = nafi.likelihoods.counting.lnl_and_weights(
    mu, mu_bg=0, n_max=n_max, return_outcomes=True)

# Find test statistics and Neyman construction p-values, both (outcomes, hypotheses) arrays.
# Statistic is q to get upper limits.
qs, ps = nafi.ts_and_pvals(lnl, weights, statistic='q')

# Lower and upper limits, (outcomes,) arrays
ll_counting, ul_counting = nafi.intervals(ps, mu)
assert np.all(ll_counting == 0)

# Statistics on probability of different outcomes (dictionary)
poutcome_counting = nafi.outcome_probabilities(ll_counting, ul_counting, weights, mu)
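To get oriented, here is what the array shapes and a first result should look like with the inputs above (the upper limit for zero observed events also appears in the table later in this notebook):

print(lnl.shape)        # (n_max + 1, len(mu)) = (26, 6000): (outcomes, hypotheses)
print(ns[:3])           # outcomes are the observed event counts: 0, 1, 2, ...
print(ul_counting[0])   # ~2.30: the 90% upper limit after observing zero events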

Consider a new experiment that also throws an \(m_\mathrm{max}\)-sided die, for which high rolls give very slight evidence for high \(\mu\).

For setting upper limits, we must then give outcomes with high dice rolls \(m\) slightly lower \(q\) values, so they get higher p-values and will be picked more readily in the Neyman construction. We can implement this by changing the test statistic \(q_\mu \rightarrow q_\mu - \epsilon \, m\), with \(\epsilon\) a very small number.

More formally, we could define the likelihood of the dice roll \(m\) as \begin{equation} P_\mathrm{dice}(m|\mu) \propto 1/m_\mathrm{max} + \epsilon \mu m, \end{equation} which reduces to a fair die at \(\epsilon = 0\) and clearly makes high rolls slightly more likely for high \(\mu\). With some algebra (see Mathematica notebook) we can show that multiplying the Poisson likelihood for \(n\) by this term changes the test statistic to \begin{equation} q_\mu \rightarrow q_\mu - \epsilon m \cdot 2 m_\mathrm{max} (\mu - n) + \epsilon \cdot (\text{term that does not depend on $m$}) + O(\epsilon^2). \end{equation} For \(n < \mu\) (else \(q_\mu = 0\)), this leads to the same upper limits as subtracting a small number proportional to \(m\) from the test statistic.
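A quick numerical check of this expansion (the \(\epsilon\) and \(\mu\) values here are arbitrary illustrations):

m_max, eps, mu_test = 6, 1e-6, 5.0
rolls = np.arange(1, m_max + 1)
exact = np.log(1 / m_max + eps * mu_test * rolls)
approx = -np.log(m_max) + eps * m_max * mu_test * rolls
print(np.abs(exact - approx).max())   # O(eps^2), here ~1e-8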

[3]:
# Number of sides on the die
die_sides = 6
m = np.arange(die_sides)  # roll labels 0 .. die_sides - 1

# Create weights for the outcomes, first as a (n_max + 1, m_max, n_hypotheses) array.
# Each dice roll is equally likely to occur.
weights_dice = weights[:,None,:] * np.ones(die_sides)[None,:,None] / die_sides
# Reshape as a ((n_max + 1) * m_max, n_hypotheses) array.
# From now on n_outcomes = (n_max + 1) * m_max.
weights_dice = weights_dice.reshape(-1, mu.size)

# Create test statistics. We want higher dice rolls to make q lower,
# i.e. the p-value higher, so the outcome will have more preference
# in the Neyman construction.
qs_dice = (qs[:,None,:] - (1e-6 * m)[None,:,None]).reshape(-1, mu.size)

ps_dice = nafi.neyman_pvals(qs_dice, weights_dice)
ll_dice, ul_dice = nafi.intervals(ps_dice, mu)
poutcome_dice = nafi.outcome_probabilities(ll_dice, ul_dice, weights_dice, mu)

The dice roll allows the experiment to get closer to a 10% mistake rate, or 90% coverage:

[4]:
plt.figure(figsize=(5,4))
plt.plot(mu, 100 * poutcome_counting['mistake'], linewidth=1, label='Just do the experiment', zorder=5)
plt.plot(mu, 100 * poutcome_dice['mistake'], linewidth=1, label=f'... and roll a {die_sides}-sided die')
#plt.axhline(10, linewidth=1, label='... and roll many dice', c='C3')
plt.legend(loc='upper right', frameon=False)
plt.xlabel("True expected events")
plt.ylabel("P(wrong) [%]")
plt.ylim(0, 14)
plt.xlim(0, 10)
# plt.savefig('poisson_dice.png', dpi=200, bbox_inches='tight')
[4]:
(0.0, 10.0)
[Figure: P(wrong) [%] versus true expected events, for the counting-only experiment and the die-rolling variant]
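The plotted mistake rate can also be computed directly from the intervals and outcome weights. A minimal sketch, assuming 'mistake' in nafi.outcome_probabilities means the reported interval missing the true \(\mu\):

# Probability that the interval covers each true mu (column), summed
# over outcomes; the mistake rate is its complement.
covered = (ll_counting[:, None] <= mu[None, :]) & (mu[None, :] <= ul_counting[:, None])
mistake = 1 - (weights * covered).sum(axis=0)   # compare poutcome_counting['mistake']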

For the counting experiment, we reproduce the analytical upper limits with good accuracy (diminishing near high \(n\) due to our cutoffs in \(n\) and \(\mu\)):

[5]:
ni_max_plot = 6

df = pd.DataFrame(dict(
    numerical=ul_counting[:ni_max_plot],
    analytic=[nafi.poisson_ul(n) for n in range(ni_max_plot)],
)).T
df.columns = [f"saw {n_} events" for n_ in range(ni_max_plot)]
df
[5]:
           saw 0 events  saw 1 events  saw 2 events  saw 3 events  saw 4 events  saw 5 events
numerical      2.302585       3.88972       5.32232      6.680783      7.993591      9.274683
analytic       2.302585       3.88972       5.32232      6.680783      7.993590      9.274674
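As an independent cross-check, these analytic limits agree with the classical construction for a background-free Poisson process, which sets \(P(N \leq n \,|\, \mu) = 0.1\) and can be written as a chi-squared quantile (a sketch; not necessarily how nafi.poisson_ul is implemented):

from scipy import stats

def poisson_ul_classic(n, cl=0.9):
    # mu solving P(N <= n | mu) = 1 - cl, via the Poisson/chi2 relation
    return 0.5 * stats.chi2.ppf(cl, 2 * (n + 1))

[poisson_ul_classic(n) for n in range(3)]
# ~[2.302585, 3.88972, 5.32232], matching the 'analytic' row above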

The experiment that also throws a die obtains considerably more stringent limits, unless the highest roll appears.

[6]:
df = pd.DataFrame(ul_dice.reshape(-1, die_sides)[:ni_max_plot,:]).T.round(2)
df.index = [f"rolled a {m_ + 1}" for m_ in m]
df.columns = [f"saw {n_} events" for n_ in range(ni_max_plot)]
df
[6]:
            saw 0 events  saw 1 events  saw 2 events  saw 3 events  saw 4 events  saw 5 events
rolled a 1          0.51          2.67          4.20          5.61          6.95          8.25
rolled a 2          1.20          2.99          4.48          5.87          7.20          8.49
rolled a 3          1.61          3.27          4.73          6.10          7.42          8.71
rolled a 4          1.90          3.51          4.95          6.31          7.63          8.91
rolled a 5          2.12          3.71          5.15          6.51          7.82          9.10
rolled a 6          2.30          3.89          5.32          6.68          7.99          9.27

Clearly the sensitivity (the median upper limit when no signal is present) is much better than the 2.3 events of the pure counting experiment. For a die with many sides, this approaches \(\ln 5 \approx 1.61\) events, since the median \(n = 0\) outcome then has p-value \(e^{-\mu}/2\), which equals 0.1 at \(\mu = \ln 5\).

[7]:
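# Median upper limit for n = 0, taken over the six possible die rolls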
np.median(np.nan_to_num(ul_dice[:die_sides]))
[7]:
1.753279422264797