Bayes’ Theorem – Definition

Cite this article as:"Bayes’ Theorem – Definition," in The Business Professor, updated July 27, 2019, last accessed October 20, 2020,


Bayes’ Theorem Definition

Bayes’ theorem is a mathematical formula for determining conditional probability. It is named after Thomas Bayes, an 18th-century British mathematician. The theorem provides a way to revise existing predictions or theories in light of new or additional evidence. In finance, Bayes’ theorem can be used to rate the risk of lending money to potential borrowers.

The formula goes thus:

P(A|B) = [P(A) x P(B|A)] / P(B)

Bayes’ theorem is also referred to as Bayes’ Law or Bayes’ Rule.

A Little More on Bayes’ Theorem

The theorem’s applications are extensive and not restricted to finance. For instance, Bayes’ theorem can be used to determine how reliable a medical test result is by considering both how likely any given individual is to have the disease and the test’s general accuracy.

Bayes’ theorem gives the probability of an event based on information that is, or might be, related to that event. The formula shows how the probability of an event changes when new information is assumed to be true. For instance, say one card is drawn from a full deck of 52 cards. Since the deck contains 4 kings, the probability that the card is a king is 4 divided by 52, which equals 1/13, or approximately 7.69%. Now assume it is revealed that the chosen card is a face card. The probability that the selected card is a king, given that it is a face card, is 4 divided by 12, or approximately 33.3%, since a deck has 12 face cards.
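The card example can be checked by counting. The sketch below is only an illustration: the helper name `probability` and the way the deck is represented are assumptions made here, not part of the original article.

```python
from fractions import Fraction

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

FACE_RANKS = {"J", "Q", "K"}


def probability(event, sample_space):
    """Share of equally likely outcomes in sample_space that satisfy event."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))


def is_king(card):
    return card[0] == "K"


def is_face(card):
    return card[0] in FACE_RANKS


# Unconditional probability: 4 kings out of 52 cards.
p_king = probability(is_king, deck)

# Conditional probability: restrict the sample space to the 12 face cards.
face_cards = [card for card in deck if is_face(card)]
p_king_given_face = probability(is_king, face_cards)

print(p_king)             # 1/13
print(p_king_given_face)  # 1/3
```

Restricting the sample space to the face cards is what "given it's a face card" means in practice: the new information removes the 40 outcomes that are no longer possible.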

Bayes’ theorem follows from the definition of conditional probability. Conditional probability is the probability of an event given that another event has occurred. For instance, a simple probability question might be “What’s the probability of Amazon.com, Inc. (AMZN) stock price falling?” Conditional probability takes this question a step further by asking “What’s the probability of Amazon’s stock price falling given that the Dow Jones Industrial Average (DJIA) index fell earlier?”

The conditional probability of A given that B has occurred can be expressed thus:

P(A|B) = P(A and B) / P(B) = P(A∩B) / P(B)

If A stands for “Amazon’s price falls” and B for “the DJIA is already down,” the conditional probability expression reads: “the probability that Amazon drops given a decline in the DJIA equals the probability that Amazon’s price declines and the DJIA also falls, divided by the probability of a DJIA decrease.”
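To make the ratio concrete, here is a minimal sketch that applies P(A|B) = P(A∩B) / P(B) to day counts. The counts are invented purely for illustration and are not market data; the function name `conditional_probability` is likewise an assumption.

```python
from fractions import Fraction


def conditional_probability(p_a_and_b, p_b):
    """Return P(A|B) = P(A and B) / P(B); undefined when P(B) is zero."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than zero")
    return p_a_and_b / p_b


# Hypothetical counts, made up for this example only.
total_days = 250       # trading days observed
days_djia_down = 100   # days on which the DJIA fell (event B)
days_both_down = 70    # days on which both the DJIA and Amazon fell (A and B)

p_b = Fraction(days_djia_down, total_days)
p_a_and_b = Fraction(days_both_down, total_days)

p_a_given_b = conditional_probability(p_a_and_b, p_b)
print(p_a_given_b)  # 7/10
```

With these made-up numbers, the total number of days cancels out: the result is simply the 70 joint-decline days divided by the 100 days on which the DJIA fell.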

The probability of both A and B occurring is P(A∩B). It equals the probability of A occurring multiplied by the probability that B occurs given that A has occurred, written as P(A) x P(B|A). By the same reasoning, P(A∩B) also equals the probability that B occurs multiplied by the probability that A occurs given that B has occurred, written as P(B) x P(A|B). Equating the two expressions yields Bayes’ theorem:

if P(A∩B) = P(A) x P(B|A) = P(B) x P(A|B)

then, P(A|B) = [P(A) x P(B|A)] / P(B).

Where P(A) and P(B) are the probabilities of A and B without regard to each other, and P(B) is not zero.

P(B|A) is the probability of B occurring given A is true.

Finally, P(A|B) is the conditional probability of A occurring given that B is true.

This formula expresses the relationship between the probability of the hypothesis before seeing the evidence, P(A), and the probability of the hypothesis after seeing the evidence, P(A|B), where A is the hypothesis and B is the evidence.
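As a quick check, the formula can be written as a small function and applied to the card example, taking A as “the card is a king” and B as “the card is a face card.” This is a sketch only; the function name `bayes` and its parameter names are assumptions introduced here.

```python
from fractions import Fraction


def bayes(p_a, p_b_given_a, p_b):
    """Return P(A|B) = [P(A) x P(B|A)] / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than zero")
    return p_a * p_b_given_a / p_b


p_king = Fraction(4, 52)         # P(A): prior probability of a king
p_face_given_king = Fraction(1)  # P(B|A): every king is a face card
p_face = Fraction(12, 52)        # P(B): 12 face cards in the deck

p_king_given_face = bayes(p_king, p_face_given_king, p_face)
print(p_king_given_face)  # 1/3
```

The result matches the 4 divided by 12 obtained by counting face cards directly, which is what the theorem predicts.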

As another example, imagine a drug test that is 98% accurate, meaning that its result is positive 98% of the time for someone taking the drug and negative 98% of the time for nonusers. Next, assume that 0.5% of people use the drug. If a randomly selected person tests positive, Bayes’ theorem gives the probability that the person is an actual user. Here P(B), the overall probability of a positive result, is the sum of the true positives among users and the false positives among nonusers:

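(0.98 x 0.005) / [(0.98 x 0.005) + ((1 – 0.98) x (1 – 0.005))] = 0.0049 / (0.0049 + 0.0199) ≈ 19.76%.

So even after a positive result, the probability that the person is a user is only about 20%, because nonusers vastly outnumber users and their false positives dominate the positive results.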
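The drug test calculation can be reproduced in a few lines. This is a minimal sketch; the function name `posterior_given_positive` and the parameter names `prevalence`, `sensitivity`, and `specificity` are labels chosen here for the article's 0.5% and 98% figures, not terms used in the original.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """Return P(user | positive test) using Bayes' theorem.

    prevalence  -- P(user), the share of people who use the drug
    sensitivity -- P(positive | user)
    specificity -- P(negative | nonuser)
    """
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    # P(positive) is the sum of the two ways a positive result can arise.
    p_positive = true_positives + false_positives
    return true_positives / p_positive


p_user = posterior_given_positive(
    prevalence=0.005, sensitivity=0.98, specificity=0.98
)
print(f"{p_user:.2%}")  # 19.76%
```

Raising `prevalence` while keeping the test accuracy fixed increases the result sharply, which shows how strongly the answer depends on the prior probability of drug use.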

