Bayesian Statistics

These notes come from the aforementioned handbook.

Basic basics

Definition: Bayesian Statistics is a statistical inference system (or philosophy, depending on how you look at it) that uses prior knowledge to predict some posterior knowledge.

  • This new posterior knowledge can then be used to make predictions and serve as the next prior

Bayes' Theorem: The foundation of Bayesian statistics, which states that for events $A$ and $B$

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

or in other words, the probability of $A$ occurring given $B$ has occurred is the fraction as stated.

Applied to data, the formula is

$$p(\theta \mid \text{data}) = \frac{p(\text{data} \mid \theta)\,p(\theta)}{p(\text{data})}$$

where $p(\theta)$ is the prior, $\theta$ is a variable whose probability is captured by the prior and posterior distributions, and $p(\text{data} \mid \theta)$ is the likelihood function, which describes the probability of observing the data given the specific model parameter(s).

Lastly, $p(\text{data})$ is called the marginal probability. It essentially represents the weighted average of the likelihoods of the data over all possible model parameters $\theta$. Mathematically, it is an integral w.r.t. $\theta$ where the probability of the data given $\theta$ is multiplied by the prior at $\theta$:

$$p(\text{data}) = \int p(\text{data} \mid \theta)\,p(\theta)\,d\theta$$

Note: the notation switches from $P$ to $p$ since we are interested in the probability densities of the components, not the distributions themselves.

Mathematics of Bayesian Updating

With these distributions in mind, updating the prior is basically just applying Bayes' Theorem after defining the initial prior (with something like a Beta distribution). The actual update is still a PDF, but when using software like JASP, the yielded output is the expectation of the posterior.

This isn't part of the book, but in general for Bernoulli trials (with $s$ successes in $n$ trials), you calculate the posterior as

$$p(\theta \mid \text{data}) = \frac{\theta^{s}(1-\theta)^{n-s}\,p(\theta)}{\int_0^1 \theta^{s}(1-\theta)^{n-s}\,p(\theta)\,d\theta}$$

with the numerator being the unnormalized posterior. It becomes normalized by dividing by the definite integral from 0 to 1.
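The normalization step can be checked numerically. A minimal sketch, assuming a uniform Beta(1, 1) prior and made-up data (7 successes in 10 trials); neither is from the notes:

```python
# Sketch of Bayesian updating for Bernoulli trials (assumption: Beta(1, 1)
# prior, i.e. uniform; s = 7 successes in n = 10 trials are made-up data).
import math

a, b = 1.0, 1.0          # Beta prior hyperparameters
s, n = 7, 10             # observed successes / trials

def unnormalized_posterior(theta):
    # likelihood * prior: theta^s (1-theta)^(n-s) * Beta(a, b) kernel
    prior = theta**(a - 1) * (1 - theta)**(b - 1)
    return theta**s * (1 - theta)**(n - s) * prior

# Normalize by the definite integral from 0 to 1 (simple midpoint rule).
N = 10_000
grid = [(i + 0.5) / N for i in range(N)]
normalizer = sum(unnormalized_posterior(t) for t in grid) / N

# With a Beta(a, b) prior this matches the conjugate result Beta(a+s, b+n-s),
# whose mean is (a+s) / (a+b+n) = 8/12 here.
posterior_mean = sum(t * unnormalized_posterior(t) for t in grid) / N / normalizer
print(round(posterior_mean, 3))
```

The grid integral is only there to mirror the formula above; with a Beta prior the normalized posterior is available in closed form.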

Bayesian Hypothesis Testing

Neatly, you can use Bayes' Theorem for hypothesis testing. The framework is essentially

  1. null: $\mathcal{H}_0: \theta = \theta_0$
  2. alternative: $\mathcal{H}_1: \theta \neq \theta_0$, or more properly, $\theta$ follows some prior distribution under $\mathcal{H}_1$
  3. Get the Bayes Factor by taking the ratio of marginal likelihoods, $BF_{01} = \frac{p(\text{data} \mid \mathcal{H}_0)}{p(\text{data} \mid \mathcal{H}_1)}$, where if the null is in the numerator, it tells us how many times more likely the null is than the alternative.

From this, we see that posterior odds $=$ Bayes Factor $\times$ prior odds:

$$\frac{p(\mathcal{H}_1 \mid \text{data})}{p(\mathcal{H}_0 \mid \text{data})} = BF_{10} \times \frac{p(\mathcal{H}_1)}{p(\mathcal{H}_0)}$$

where $\frac{p(\mathcal{H}_1)}{p(\mathcal{H}_0)}$ denotes the ratio of the probabilities of the alternative and null hypotheses.
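A toy version of this framework, assuming a point null $\theta = 0.5$ against a uniform prior under the alternative, with the same made-up Bernoulli counts as before (none of these choices are from the notes):

```python
# Sketch: Bayes factor for H0: theta = 0.5 vs H1: theta ~ Uniform(0, 1)
# on s = 7 successes in n = 10 Bernoulli trials (made-up data).
import math

s, n = 7, 10

# Marginal likelihood under H0 (point null): Binomial(n, 0.5) at s.
m0 = math.comb(n, s) * 0.5**n

# Marginal likelihood under H1: integral of Binomial(n, theta) over the
# uniform prior, which has the closed form C(n, s) * B(s+1, n-s+1).
beta = math.gamma(s + 1) * math.gamma(n - s + 1) / math.gamma(n + 2)
m1 = math.comb(n, s) * beta

bf01 = m0 / m1  # > 1 means the data favor the null
print(round(bf01, 2))
```

Note the binomial coefficient cancels in the ratio; with these counts the data are mildly inconclusive, which is a common outcome for small $n$.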

Bayesian Correlation

Recall the frequentist Pearson correlation coefficient $r$, where $r = \pm 1$ represents a perfect (either positive or negative) correlation. The neat thing is that because inference with the Pearson correlation coefficient is based on estimating the population correlation $\rho$, we can use Bayes' Theorem, substituting $\rho$ for $\theta$.

  • Note: The prior is called the stretched beta distribution, which is a beta distribution measured from $-1$ to $1$.

Also, we can apply the Bayesian hypothesis testing from the last section, where $\mathcal{H}_0: \rho = 0$ and $\mathcal{H}_1: \rho \neq 0$. Then we can calculate the same Bayes Factor, which tells us which hypothesis does a better job of predicting the observed data. The Bayes Factor also quantifies by what factor the better hypothesis performs compared to the other one.
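A sketch of the estimation side, assuming a uniform stretched beta prior (Beta(1, 1) stretched onto $(-1, 1)$) and a bivariate normal likelihood on standardized data; the data are made up for illustration:

```python
# Sketch: grid posterior for the correlation rho, under a stretched
# Beta(1, 1) prior (uniform on (-1, 1)) and a bivariate-normal likelihood
# with unit variances on standardized, made-up data.
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 4.5, 4.0, 6.5, 6.0]   # sample correlation is about 0.88

def standardize(v):
    m = sum(v) / len(v)
    sd = math.sqrt(sum((x - m)**2 for x in v) / (len(v) - 1))
    return [(x - m) / sd for x in v]

zx, zy = standardize(xs), standardize(ys)

def log_lik(rho):
    # bivariate normal log-likelihood (up to a constant), unit variances
    c = 1 - rho**2
    quad = sum(x*x - 2*rho*x*y + y*y for x, y in zip(zx, zy))
    return -len(zx) / 2 * math.log(c) - quad / (2 * c)

grid = [i / 1000 for i in range(-999, 1000)]
w = [math.exp(log_lik(r)) for r in grid]   # flat prior: weights = likelihood
post_mean = sum(r * wi for r, wi in zip(grid, w)) / sum(w)
```

With only five points the posterior stays wide, so the posterior mean is pulled noticeably below the sample correlation.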

Communicating Bayesian Results

Basically we want to include 1. Model Definitions, 2. Model Comparison and Parameter Estimation, 3. Sensitivity Analysis. These are loosely described as

  1. Our hypotheses, like in frequentist statistics, and the prior distribution under $\mathcal{H}_1$.
  2. Interpretation of the Bayes Factor and/or posterior model probability.
  3. Discussion of how the choice of prior affects the overall results.

Bayesian t-test

The Bayesian t-test revolves around effect size $\delta$ (also called Cohen's $d$), which is the standardized difference between two population means. Mathematically, it is defined as

$$\delta = \frac{\mu_1 - \mu_2}{\sigma}$$

where $\sigma$ is the standard deviation.
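The sample version of this quantity is easy to compute directly. A minimal sketch with made-up samples, using the pooled standard deviation for $\sigma$ (one common choice; the notes don't specify which):

```python
# Sketch: sample Cohen's d for two made-up groups, using the pooled
# standard deviation as the denominator (an assumption, not from the notes).
import math

a = [5.1, 4.8, 5.6, 5.0, 5.3]
b = [4.2, 4.5, 4.0, 4.4, 4.1]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    # unbiased sample variance
    m = mean(xs)
    return sum((x - m)**2 for x in xs) / (len(xs) - 1)

pooled_sd = math.sqrt(((len(a) - 1) * var(a) + (len(b) - 1) * var(b))
                      / (len(a) + len(b) - 2))
d = (mean(a) - mean(b)) / pooled_sd
print(round(d, 2))
```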

The concept of "one-tailed" or "two-tailed" tests is expressed in the form of hypotheses surrounding $\delta$, where a two-tailed test fixes $\mathcal{H}_1: \delta \neq 0$ and a one-tailed test fixes $\mathcal{H}_1: \delta > 0$ or $\mathcal{H}_1: \delta < 0$.

The prior distribution on $\delta$ is a scaled Cauchy distribution with a continuous scale parameter. The updating algorithm uses Bayes' Theorem while calculating the Bayes Factor before updating. This helps determine the credibility of the prior model before the update, which is valuable in communicating your results.
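The scaled Cauchy prior itself is simple to write down. A sketch, assuming a scale of 0.707 (a common software default, but an assumption here; the notes don't fix a value):

```python
# Sketch: density of a Cauchy prior on effect size delta with scale r.
# r = 0.707 is an assumed value, not one taken from the notes.
import math

def cauchy_pdf(delta, r=0.707):
    return 1.0 / (math.pi * r * (1 + (delta / r)**2))

# Midpoint-rule check that the density integrates to ~1 over a wide range
# (the Cauchy's heavy tails mean a little mass always lies outside).
N, lo, hi = 200_000, -1000.0, 1000.0
step = (hi - lo) / N
area = sum(cauchy_pdf(lo + (i + 0.5) * step) for i in range(N)) * step
```

The heavy tails are the point: the prior keeps large effect sizes plausible while still concentrating mass near zero.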

Bayesian Regression

Recall the frequentist form of linear regression, where coefficients $\beta$ are estimated to predict a target $y$ from input $x$. This is written as

$$y = \beta_0 + \beta_1 x + \epsilon$$

where $\epsilon$ is the error term, generally assumed to be normally distributed.

Predictably, we can apply Bayes' Theorem by using $\beta$ as the parameter of interest. As a result, we get a posterior distribution over the coefficients. Then, assuming the likelihood follows from the assumption that the errors are normal, we can apply Bayesian hypothesis testing where $\mathcal{H}_0: \beta_1 = 0$ (predictor has no effect) and $\mathcal{H}_1: \beta_1 \neq 0$. In this instance, the Bayes Factor tells us which hypothesis is more likely to be accurate in variable selection.

E.g. Predicting Test Scores from Study Hours

Suppose we want to predict test scores ($y$) from hours studied ($x$). We have data from 5 students:

| Hours ($x$) | Score ($y$) |
| --- | --- |
| 1 | 50 |
| 2 | 55 |
| 3 | 65 |
| 4 | 70 |
| 5 | 80 |

Step 1: Define the prior

We set a prior such as $\beta_1 \sim \mathcal{N}(0, \sigma_0^2)$ with a large $\sigma_0$, expressing weak prior belief that the slope is near zero but allowing for a wide range of values.

Step 2: Compute the likelihood

Using the regression model $y = \beta_0 + \beta_1 x + \epsilon$, we calculate how probable the observed data are for different values of $\beta_1$.

Step 3: Obtain the posterior

After applying Bayes' Theorem (often via MCMC sampling in software like JASP or R's brms), we get a posterior distribution for $\beta_1$. This tells us the posterior mean for the slope is approximately 7.5 points per hour studied, with a 95% credible interval that excludes zero.
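The three steps above can be sketched with a grid approximation over the slope. The residual standard deviation (5) and prior scale (10) are assumptions for illustration, not values from the notes; the intercept is handled by centering:

```python
# Sketch: grid-approximation posterior for the slope in the study-hours
# example. Assumptions (not from the notes): errors N(0, sigma^2) with
# sigma fixed at 5, slope prior N(0, 10^2), intercept removed by centering.
import math

hours = [1, 2, 3, 4, 5]
scores = [50, 55, 65, 70, 80]

xbar = sum(hours) / len(hours)
ybar = sum(scores) / len(scores)
xc = [x - xbar for x in hours]
yc = [y - ybar for y in scores]

sigma = 5.0        # assumed known residual sd
prior_sd = 10.0    # weak N(0, 10^2) prior on the slope

def log_post(b):
    # log-likelihood of the centered data plus the log-prior (up to constants)
    ll = sum(-(y - b * x)**2 / (2 * sigma**2) for x, y in zip(xc, yc))
    return ll - b**2 / (2 * prior_sd**2)

grid = [i / 100 for i in range(-500, 1500)]   # slopes -5.00 .. 14.99
w = [math.exp(log_post(b)) for b in grid]
post_mean = sum(b * wi for b, wi in zip(grid, w)) / sum(w)
print(round(post_mean, 2))  # near the least-squares slope of 7.5,
                            # shrunk slightly toward 0 by the prior
```

With a weak prior the posterior mean lands close to the ordinary least-squares estimate, which for this data is exactly 7.5.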

Step 4: Hypothesis testing

We can test $\mathcal{H}_0: \beta_1 = 0$ vs. $\mathcal{H}_1: \beta_1 \neq 0$. If $BF_{10} = 15$, this means the data are 15 times more likely under $\mathcal{H}_1$ than $\mathcal{H}_0$, providing strong evidence that study hours predict test scores.

  • Note: The credible interval not containing 0 aligns with the Bayes Factor favoring $\mathcal{H}_1$, but these are conceptually different: the interval describes parameter uncertainty while the Bayes Factor compares model fit.