Probability Distributions — A summary of discrete ones

GM Fuster
Published in Nerd For Tech
Dec 7, 2022

I’m no expert here; these are my summary notes.

Note: I have another article you may want to check out first.

Just Some General Concepts

A probability distribution gives the probability of each possible outcome of an event. We can denote the actual (random) outcome of an event as Y, and one particular possible outcome as y.

P(Y = y) can also be indicated with P(y)

We can call P(y) the probability function. P(5) would be the probability of getting a 5.

Population Data: all the data

Sample Data: a subset of the population data

The closer the data is to the middle or mean (or expected value), the less dispersed it is.

There is a difference between the mean (arithmetic mean) and the median.

Let’s say we have outcomes 15,14,2,27,13.

For the median, we need to sort the elements and take the middle value.

Or, if the number of elements is even, the median is the average of the two middle values.
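For the outcomes above: the mean is (15 + 14 + 2 + 27 + 13) / 5 = 71 / 5 = 14.2, and sorting them gives 2, 13, 14, 15, 27, so the median is 14.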

The difference between each element and the arithmetic mean is a deviation from the mean. The sum of all the deviations will be zero, so we square them before adding them up; taking all the (squared) deviations into account tells us how spread out the data is. The formula for the variance is:
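s² = Σ (x - x̄)² / (n - 1)

where x̄ is the arithmetic mean, n is the number of elements, and the sum runs over every element x (this is the usual sample variance).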

Notice the denominator is (n-1), not n.

The variance doesn’t have the same units as the elements it is used on, because of the square. To get the deviation on the same units, we can use the standard deviation, which is just:
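standard deviation: s = √(s²) = √( Σ (x - x̄)² / (n - 1) )

using the same notation as in the variance formula above.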

For many distributions there is a direct relationship between the mean and the variance (an example with the formula comes later on).

Remember: the variance and the standard deviation are absolute measures. For comparisons between different data sets, it’s better to use relative measures, such as the coefficient of variation.
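coefficient of variation = standard deviation / mean = s / x̄

Dividing by the mean removes the units, so it can be compared across data sets measured on different scales.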

Discrete and Continuous Distributions

When we have a finite number of outcomes we have a discrete distribution. When we can have infinite outcomes, we have a continuous distribution. Sometimes whether we consider data discrete or continuous depends on how we are measuring it. Are we just going to round the values? Then we can treat the data as discrete (taking only integers, for instance). If not, as long as a value can have infinitely many decimals (for example), we have a continuous distribution.

Characteristics of Discrete Distributions:

  • finite number of outcomes
  • can be expressed in table or graph
  • The expected value may be unattainable (2.5 when we are only dealing with integers)
  • We usually use bar graphs to indicate the values.
  • P(Y ≤ y) = P(Y < y + 1), since the outcomes are whole numbers (example right below the list)
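For example, with one die: P(Y ≤ 4) = P(Y < 5) = 4/6, because there is nothing between 4 and 5 that the die can land on.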

Characteristics of Continuous Distributions:

  • Infinite number of values
  • Cannot just add up values to get results
  • Expressed with a graph or continuous function (not bars but a curve)

Types of Discrete Distributions:

When all outcomes are equally likely we have an Equiprobable Uniform Distribution or Uniform Distribution.

When we only have 2 possible outcomes and just 1 trial, we have a Bernoulli Distribution. If we take a Bernoulli distribution and repeat it multiple times (more than 1 trial), we have a Binomial Distribution.

If we want to test how UNUSUAL an event is, we need a Poisson Distribution.

Types of Continuous Distributions:

We express these graphically with a curve; the typical example is the Normal distribution. The values out at the tails of the curve are called outliers. When we only have a sample of the original data, we don’t use a normal distribution anymore, but a Student’s-T distribution. In this case there can be values that are very extreme (very low probability) in the normal distribution but that appear a few times in the sample, therefore being overrepresented.

When the values start at zero and can only be positive, we can have a Chi-Squared distribution. Not common. When the events change rapidly early on, we have an Exponential distribution. When we use them for forecasting, we have a Logistic distribution.

I’m just going to cover discrete distributions in this article.

Uniform Distribution (discrete)

We use U to denote this type of distribution.

X ~ U(3,7): X follows a uniform distribution for values between 3 and 7.

In a uniform distribution, all outcomes have the same probability. In this case, the mean and the variance can be calculated but they don’t really provide any predictability value. For instance:
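Take a fair die, where each of the 6 outcomes has probability 1/6. One way is the general expected-value formula, weighting each outcome by its probability:

E(X) = 1 * (1/6) + 2 * (1/6) + 3 * (1/6) + 4 * (1/6) + 5 * (1/6) + 6 * (1/6) = 3.5

The other way is the plain arithmetic mean of the outcomes:

(1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5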

The above shows how to calculate the mean (or expected value) of a discrete uniform distribution, where all values have the same probability. The two ways calculated above yield the same result because the distribution is UNIFORM; if all outcomes didn’t have the same probability, they would not match.

We have the following formula previously shown:
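variance = Σ (x - E(X))² * P(x)

that is, the probability-weighted average of the squared deviations from the mean (the same idea as the sample variance formula earlier, written for a distribution whose probabilities we know).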

For the dice example:
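variance = ((1 - 3.5)² + (2 - 3.5)² + (3 - 3.5)² + (4 - 3.5)² + (5 - 3.5)² + (6 - 3.5)²) * (1/6) = 17.5 / 6 ≈ 2.92

and the standard deviation is √2.92 ≈ 1.71.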

As previously stated, when all outcomes have the same probability the mean and the variance can be calculated, but they can’t really be used to predict anything.

Bernoulli Distribution (discrete)

We denote this one as Bern(p), instead of the U notation we used for the uniform distribution.

In this distribution we only have 2 possible outcomes and 1 trial. Since we only have 2 possible outcomes, one has probability of p and the other one of (1-p). We usually know the probabilities or can calculate them from previous data.

By convention in this type of distribution, we assign 0 to one outcome and 1 to the other one. Also by convention, we take p ≥ (1-p), and the outcome with probability p is assigned 1 while the outcome with probability (1-p) is assigned 0.

E(X) = 1 * p + 0 * (1-p) = p.
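Applying the variance formula in the same way:

variance = (1 - p)² * p + (0 - p)² * (1-p) = p * (1-p)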

The typical example is tossing a coin, just once.

Binomial Distribution (discrete)

Similar to the Bernoulli one but now instead of one trial we have multiple ones. Still only 2 possible outcomes in each trial.

We denote it by B(n,p) where n is the number of trials and p the probability to succeed in each trial.

The formula for the Binomial distribution is:
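P(Y = y) = C(n, y) * p^y * (1-p)^(n-y)

Here C(n, y) = n! / (y! * (n-y)!) counts the number of ways to choose which y of the n trials are the successes.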

In the above, n is the number of trials we are going to do, and y the number of success outcomes we want to get (and we assign p to the outcome we consider success).

For an example of this we need only 2 possible outcomes per trial. Let’s start with the famous coins, in this case p = 0.5 and (1-p) = 0.5 (like we saw in the Bernoulli section).

We are going to flip the coin 10 times, and we want to know the probability of getting 6 heads. From the above formula:
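P(Y = 6) = C(10, 6) * 0.5^6 * 0.5^4 = 210 * (1 / 1024) ≈ 0.205

so roughly a 20.5% chance of getting exactly 6 heads.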

From the E(X) formula we have been using to calculate the mean, after simplifications:

E(y) = n * p

variance = n * p * (1-p)
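For the 10-flip coin example: E(y) = 10 * 0.5 = 5 expected heads, and variance = 10 * 0.5 * 0.5 = 2.5.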

Another example. A person can like Mozart or not (2 possible outcomes). We know that 80% of people like Mozart (we don’t, I’m just saying it). If we select 9 people, what is the probability that exactly 6 of them like Mozart?
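Here n = 9, y = 6 and p = 0.8:

P(Y = 6) = C(9, 6) * 0.8^6 * 0.2^3 = 84 * 0.262144 * 0.008 ≈ 0.176

so about a 17.6% chance.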

Poisson Distribution (discrete)

We use a Poisson distribution when we want to know the likelihood of a certain number of events occurring over a given interval of time or distance (an interval, not a number of trials). We use it to find out how likely it is to get something different from what is usual.

For example, you have a store and you usually get 4 people every morning, but one morning you get 7 people. How likely was that?

These events start at zero and go up (no negative values).

The equations for this type of distribution are:
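P(Y = y) = (λ^y * e^(-λ)) / y!

where λ is the average number of occurrences over the interval. Both the mean and the variance of the distribution are equal to λ.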

From the store example above:
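λ = 4 (the usual number of morning customers) and y = 7:

P(Y = 7) = (4^7 * e^(-4)) / 7! = (16384 * 0.0183) / 5040 ≈ 0.06

so roughly a 6% chance of a 7-customer morning. If you want to double-check these numbers in Python (I’m assuming scipy is installed):

from scipy.stats import binom, poisson

print(binom.pmf(6, 10, 0.5))   # coin example: ~0.205
print(binom.pmf(6, 9, 0.8))    # Mozart example: ~0.176
print(poisson.pmf(7, 4))       # store example: ~0.06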

This is getting long so I’m going to leave the continuous distributions for a different article. I hope this is clear and correct :-).
