Documentation
Internal
Invariants

Invariants of Probability Distributions

Invariants to check with property tests.

This document right now is normative and aspirational, not a description of the testing that's currently done.

Algebraic combinations

The academic keyword to search for in relation to this document is "algebra of random variables (opens in a new tab)". Squiggle doesn't yet support getting the standard deviation, denoted by σ\sigma, but such support could yet be added.

Means and standard deviations

Sums

mean(f+g)=mean(f)+mean(g)mean(f+g) = mean(f) + mean(g)
σ(f+g)=σ(f)2+σ(g)2\sigma(f+g) = \sqrt{\sigma(f)^2 + \sigma(g)^2}

In the case of normal distributions,

mean(normal(a,b)+normal(c,d))=mean(normal(a+c,b2+d2))mean(normal(a,b) + normal(c,d)) = mean(normal(a+c, \sqrt{b^2 + d^2}))

Subtractions

mean(fg)=mean(f)mean(g)mean(f-g) = mean(f) - mean(g)
σ(fg)=σ(f)2+σ(g)2\sigma(f-g) = \sqrt{\sigma(f)^2 + \sigma(g)^2}

Multiplications

mean(fg)=mean(f)mean(g)mean(f \cdot g) = mean(f) \cdot mean(g)
σ(fg)=(σ(f)2+mean(f))(σ(g)2+mean(g))(mean(f)mean(g))2\sigma(f \cdot g) = \sqrt{ (\sigma(f)^2 + mean(f)) \cdot (\sigma(g)^2 + mean(g)) - (mean(f) \cdot mean(g))^2}

Divisions

Divisions are tricky, and in general we don't have good expressions to characterize properties of ratios. In particular, the ratio of two normals is a Cauchy distribution, which doesn't have to have a mean.

Probability density functions (pdfs)

Specifying the pdf of the sum/multiplication/... of distributions as a function of the pdfs of the individual arguments can still be done. But it requires integration. My sense is that this is still doable, and I (Nuño) provide some pseudocode to do this.

Sums

Let f,gf, g be two independently distributed functions. Then, the pdf of their sum, evaluated at a point zz, expressed as (f+g)(z)(f + g)(z), is given by:

(f+g)(z)=f(x)g(zx)dx(f + g)(z)= \int_{-\infty}^{\infty} f(x)\cdot g(z-x) \,dx

See a proof sketch here (opens in a new tab)

Here is some pseudocode to approximate this:

// pdf1 and pdf2 are pdfs,
// and cdf1 and cdf2 are their corresponding cdfs
 
let epsilonForBounds = 2 ** -16;
let getBounds = (cdf) => {
  let cdf_min = -1;
  let cdf_max = 1;
  let n = 0;
  while (
    (cdf(cdf_min) > epsilonForBounds || 1 - cdf(cdf_max) > epsilonForBounds) &&
    n < 10
  ) {
    if (cdf(cdf_min) > epsilonForBounds) {
      cdf_min = cdf_min * 2;
    }
    if (1 - cdf(cdf_max) > epsilonForBounds) {
      cdf_max = cdf_max * 2;
    }
  }
  return [cdf_min, cdf_max];
};
 
let epsilonForIntegrals = 2 ** -16;
let pdfOfSum = (pdf1, pdf2, cdf1, cdf2, z) => {
  let bounds1 = getBounds(cdf1);
  let bounds2 = getBounds(cdf2);
  let bounds = [
    Math.min(bounds1[0], bounds2[0]),
    Math.max(bounds1[1], bounds2[1]),
  ];
 
  let result = 0;
  for (let x = bounds[0]; (x = x + epsilonForIntegrals); x < bounds[1]) {
    let delta = pdf1(x) * pdf2(z - x);
    result = result + delta * epsilonForIntegrals;
  }
  return result;
};

pdf, cdf, and quantile

With dist,pdf:=xpdf(dist,x)cdf:=xcdf(dist,x)quantile:=pquantile(dist,p)\forall dist, pdf := x \mapsto \texttt{pdf}(dist, x) \land cdf := x \mapsto \texttt{cdf}(dist, x) \land quantile := p \mapsto \texttt{quantile}(dist, p),

cdf and quantile are inverses

x(0,1),cdf(quantile(x))=xxdom(cdf),x=quantile(cdf(x))\forall x \in (0,1), cdf(quantile(x)) = x \land \forall x \in \texttt{dom}(cdf), x = quantile(cdf(x))

The codomain of cdf equals the open interval (0,1) equals the codomain of pdf

cod(cdf)=(0,1)=cod(pdf)\texttt{cod}(cdf) = (0,1) = \texttt{cod}(pdf)

To do:

  • Write out invariants for CDFs and Inverse CDFs
  • Provide sources or derivations, useful as this document becomes more complicated
  • Provide definitions for the probability density function, exponential, inverse, log, etc.
  • Provide at least some tests for division
  • See if playing around with characteristic functions turns out anything useful