DSCI 200
Katie Burak, Gabriela V. Cohen Freue
Last modified – 14 January 2026
This material is based on content adapted from
Words like probability, chance, and odds are commonly used to express uncertainty:
‘Very high’ probability that Canada will see coronavirus, says Toronto respirologist (CBC News, Jan 2020)
CDC says flu activity probably has not peaked amid record-breaking season (CNN, Jan 2026)
The riskiest asteroid on record now has near-zero chance of hitting Earth (CNN, Feb 2025)
The probability of a recession has fallen to 40% (JP Morgan, May 2025)
2025-26 Stanley Cup odds: Avalanche, Hurricanes, Lightning lead favorites (ESPN, Jan 2026)
The odds of winning the Powerball lottery with a single ticket are about 1 in 300 million.
More stories in eBook by K. Ross
Randomness can come from:
Outcomes: possible results of the random phenomenon, not necessarily numeric (e.g., side of a coin, students selected for a survey)
Events: a collection of outcomes (e.g., winning all games of the season)
Probability of an event: a number between 0 and 1 that measures how likely the event is to occur (0 means it cannot happen, 1 means it is certain).
Note
In many situations, we won’t focus on defining the set of all possible outcomes (aka the sample space). However, it can help us understand what is plausible.
All possible outcomes: {H, T}
Event: {H}
Since the coin is fair, both outcomes are equally likely.
\[P(H) = P(\text{event}) = \frac{(\text{number of outcomes in the event})} {(\text{total number of possible outcomes})} = \frac{1}{2} = 0.5\]
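The counting rule above applies only when all outcomes are equally likely. As a minimal sketch of that rule (the function name `classical_probability` and the set encoding of outcomes are assumptions made for illustration, not part of the course material):

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Probability of an event when every outcome is equally likely:
    (number of outcomes in the event) / (total number of possible outcomes)."""
    favourable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favourable), len(sample_space))

# Fair coin: sample space {H, T}, event {H}
sample_space = {"H", "T"}
event = {"H"}
p_heads = classical_probability(event, sample_space)
print(p_heads, float(p_heads))  # 1/2 0.5
```

Using `Fraction` keeps the result exact, which makes the link to the counting formula easy to see.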
For example, the probability of a high temperature for a wildfire randomly selected from all fires in Alberta.
The exact arrival time is uncertain and depends on many unpredictable factors (traffic, delays, decisions).
All possible outcomes: any time of the day
Event: times in the interval [8:55, 9:05]
Photo by Dominic Kurniawan Suryaputra, Unsplash
We can’t compute the probability of the event using frequencies.
Random variables: a number assigned to each outcome, representing a quantity of interest (e.g., age of students selected, temperature on the day of a fire)
It can take on multiple possible values, some potentially more likely than others.
Before we observe it, the value is unknown. After observation, it is fixed.
Why is it called a random variable?
Technically, random variables are functions that assign numbers to outcomes. The name reflects two ideas: the value can change (it’s a variable), and it’s uncertain which value it will take (it’s random).
Example: tossing a coin
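The idea that a random variable is a function from outcomes to numbers can be sketched in a few lines. The encoding below (heads maps to 1, tails to 0), the function name `X`, and the seed are assumptions chosen for illustration:

```python
import random

def X(outcome):
    """A random variable for a coin toss: assign a number to each outcome.
    Hypothetical encoding: heads -> 1, tails -> 0."""
    return 1 if outcome == "H" else 0

rng = random.Random(200)          # fixed seed so the run is reproducible

outcome = rng.choice(["H", "T"])  # the outcome itself is not numeric
value = X(outcome)                # the realized value of the random variable

# Before the toss the value is unknown; once observed, it is fixed.
print(outcome, value)
```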
The distribution of a random variable is a function that describes the variability of the random variable.
We write this as: \[X \sim \text{Distribution}(\text{parameters})\]
Just as a finite population has parameters, a distribution takes parameters that describe the behavior of a random variable across all possible values.
Distributions can be used to calculate probabilities associated with random variables.
In some cases, but not always, the distribution is defined by a specific formula.
Some distributions have special names (e.g., Bernoulli, Normal).
In Statistics and Data Science:
rows of data can be thought of as realizations of outcomes
columns of data can be thought of as realizations of random variables
| Finite Population | Model-Based Approach |
|---|---|
| Finite population | Data generated from a probability model |
| Outcomes are a subset of actual units | Outcomes are generated from a distribution |
| Finite sample space | Infinite or finite sample space |
| Variability comes from the sampling | Variability comes from randomness in the model |
| Probability as frequencies | Model-based probability |
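The contrast in the table can be made concrete with a small sketch. The ages, the threshold of 21, and the seed are invented for illustration; the point is only where the variability comes from in each approach:

```python
import random

rng = random.Random(1)  # fixed seed so the run is reproducible

# Finite population: hypothetical ages of ten students (made-up values).
population_ages = [18, 19, 19, 20, 21, 21, 22, 23, 25, 30]

# Variability comes from the sampling: which units end up in the sample.
sample = rng.sample(population_ages, k=4)  # sampling without replacement

# Probability as a frequency: share of the population older than 21.
p_over_21 = sum(age > 21 for age in population_ages) / len(population_ages)

# Model-based: each value is generated from a probability model instead of
# being picked from a list of actual units. Here the assumed model is
# "older than 21 with probability 0.4".
model_draws = [1 if rng.random() < 0.4 else 0 for _ in range(4)]

print(sample, p_over_21, model_draws)
```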
Today, we’ll look at three types of discrete random variables:
A Bernoulli random variable assigns a numeric value to the outcome of a single trial with two possible results:
Notation: \[X \sim \text{Bernoulli}(p)\]
where \(p\) is the probability of success, fixed in advance (so \(1 - p\) is the probability of failure).
\[X = \left\{ \begin{array}{ll} 1 & \text{if outcome = success};\\ 0 & \text{if outcome = failure}\end{array} \right.\]
for example: did a user click on an ad? (yes = 1, no = 0)
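A Bernoulli trial such as the ad click can be simulated with a single uniform random number. This is a minimal sketch; the helper name `bernoulli`, the click probability of 0.1, and the seed are assumptions for illustration:

```python
import random

def bernoulli(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0

rng = random.Random(42)  # fixed seed so the run is reproducible
p_click = 0.1            # assumed probability that a user clicks the ad

# Simulate many independent users; each one either clicks (1) or not (0).
clicks = [bernoulli(p_click, rng) for _ in range(10_000)]

# The share of clicks should land close to p_click.
click_rate = sum(clicks) / len(clicks)
print(click_rate)
```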
A binomial random variable represents the number of successes in a fixed number of independent Bernoulli trials.
Notation:
\[X \sim \text{Binomial}(n, p)\]
where:
\(n\) - number of trials
\(p\) - probability of success on each trial fixed in advance
\(X\) can take values in \(\{0, 1, \ldots, n\}\)
for example: number of people who vote in favour of a proposal out of 100 surveyed
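For a binomial random variable, the probability of exactly \(k\) successes is \(P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}\). This formula is standard but is not stated in the slides, so treat the sketch below as an illustration under assumed values (\(n = 100\), \(p = 0.5\)) and assumed function names:

```python
import random
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def simulate_binomial(n, p, rng):
    """Count successes in n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

n, p = 100, 0.5  # assumed: 100 people surveyed, each in favour with prob. 0.5

# The probabilities over all possible values 0, 1, ..., n add up to 1.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(total)

rng = random.Random(5)  # fixed seed so the run is reproducible
votes_in_favour = simulate_binomial(n, p, rng)
print(votes_in_favour, binomial_pmf(votes_in_favour, n, p))
```

The simulation builds the count directly from Bernoulli trials, which mirrors the definition of the binomial as the number of successes in a fixed number of independent trials.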
A Poisson random variable counts the number of times an event occurs in a fixed interval of time or space.
Events happen independently of one another and occur at a constant average rate.
There’s no upper limit to the number of occurrences, but large counts become increasingly rare
Notation:
\[X \sim \text{Poisson}(\lambda)\]
where \(\lambda\) is the expected number of events in the interval.
for example: the number of customers arriving at a store in an hour
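The Poisson probabilities follow \(P(X = k) = e^{-\lambda} \lambda^k / k!\). This formula is standard but is not given in the slides, and the rate of 4 customers per hour below is an assumption for illustration:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

lam = 4.0  # assumed: 4 customers arrive per hour on average

# Any count k = 0, 1, 2, ... is possible, but large counts are very unlikely.
for k in (0, 4, 10, 20):
    print(k, poisson_pmf(k, lam))

# Summing over a long range of counts gets very close to 1.
total = sum(poisson_pmf(k, lam) for k in range(60))
print(total)
```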
| Distribution | Notation | Parameters | Scenario Example |
|---|---|---|---|
| Bernoulli | \(X \sim \text{Bernoulli}(p)\) | \(p\) = chance of success | Single yes/no outcome (e.g., clicked or not) |
| Binomial | \(X \sim \text{Binomial}(n, p)\) | \(n\) = trials, \(p\) = success rate | Count of successes in fixed # of trials |
| Poisson | \(X \sim \text{Poisson}(\lambda)\) | \(\lambda\) = rate | Count of events in time or space (e.g., arrivals) |
Note: There are many other types of discrete random variables you may encounter in future courses (e.g., STAT 302).
A continuous random variable can take on any value in a continuous range (not just whole numbers).
Today, we’ll look at three types of continuous random variables:
We’ll describe the type of data each models and how we use parameters to define their distributions.
All values in a given interval are equally likely
The graph of the distribution is a flat horizontal line
Notation: \[X \sim \text{Uniform}(a, b)\]
where \(a\) and \(b\) are the minimum and maximum values of the interval.
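Because the density of a uniform random variable is flat, the probability of landing in a sub-interval is just its share of the total length: \(P(c \le X \le d) = (d - c)/(b - a)\) for \(a \le c \le d \le b\). A minimal sketch, with the interval \([0, 10]\), the sub-interval \([2, 5]\), and the function name chosen only for illustration:

```python
import random

def uniform_interval_prob(c, d, a, b):
    """P(c <= X <= d) for X ~ Uniform(a, b), assuming a <= c <= d <= b."""
    return (d - c) / (b - a)

a, b = 0.0, 10.0  # assumed minimum and maximum
exact = uniform_interval_prob(2.0, 5.0, a, b)

# Check the formula against simulated draws.
rng = random.Random(7)  # fixed seed so the run is reproducible
draws = [rng.uniform(a, b) for _ in range(20_000)]
share = sum(2.0 <= x <= 5.0 for x in draws) / len(draws)

print(exact, share)  # the simulated share should be close to the exact value
```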
Models waiting times between independent events that happen at a constant average rate.
The waiting time is memoryless: how long you have already waited does not change how much longer you can expect to wait.
Notation:
\[X \sim \text{Exponential}(\lambda)\]
where \(\lambda\) is the average rate at which events occur.
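The memoryless property can be checked by simulation. For an exponential waiting time, \(P(X > t) = e^{-\lambda t}\) (a standard fact not stated in the slides), and memorylessness says \(P(X > s + t \mid X > s) = P(X > t)\). The rate and the times \(s\) and \(t\) below are assumptions for illustration:

```python
import random
from math import exp

rate = 2.0              # assumed: 2 events per unit of time on average
rng = random.Random(3)  # fixed seed so the run is reproducible
waits = [rng.expovariate(rate) for _ in range(100_000)]

mean_wait = sum(waits) / len(waits)  # should be close to 1 / rate

s, t = 0.5, 0.4

# Unconditional: share of waits longer than t.
p_t = sum(w > t for w in waits) / len(waits)

# Conditional: among waits that already exceeded s, share exceeding s + t.
already_waited = [w for w in waits if w > s]
p_cond = sum(w > s + t for w in already_waited) / len(already_waited)

print(mean_wait, p_t, p_cond, exp(-rate * t))
```

The last three printed numbers should be close to one another, which is what "the past doesn't affect the future" means for waiting times.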
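Models symmetric, bell-shaped data (e.g., heights, measurements).
Notation: \[X \sim \text{Normal}(\mu, \sigma)\]
where \(\mu\) is the mean and \(\sigma\) is the standard deviation (SD).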
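For a normal random variable with mean \(\mu\) and standard deviation \(\sigma\), values cluster symmetrically around the mean. A minimal sketch using Python's standard library, with \(\mu = 0\) and \(\sigma = 1\) as assumed values, checks the well-known fact that roughly 68% of values fall within one SD of the mean:

```python
import random
from statistics import NormalDist

mu, sigma = 0.0, 1.0  # assumed mean and standard deviation
dist = NormalDist(mu, sigma)

# Probability of falling within one SD of the mean, from the distribution.
within_one_sd = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)

# The same quantity estimated from simulated draws.
rng = random.Random(11)  # fixed seed so the run is reproducible
draws = [rng.gauss(mu, sigma) for _ in range(50_000)]
share = sum(abs(x - mu) <= sigma for x in draws) / len(draws)

print(within_one_sd, share)
```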
| Distribution | Notation | Parameters | Scenario Example |
|---|---|---|---|
| Uniform | \(X \sim \text{Uniform}(a, b)\) | \(a\), \(b\) = min/max | Value equally likely anywhere in a range |
| Exponential | \(X \sim \text{Exponential}(\lambda)\) | \(\lambda\) = rate | Time between independent events |
| Normal | \(X \sim \text{Normal}(\mu, \sigma)\) | \(\mu\) = mean, \(\sigma\) = SD | Symmetric, bell-shaped data (e.g., heights, measurements) |
You’re running an online A/B test where each website visitor is randomly assigned to either version A or version B. For each visitor, you record whether they click the “Sign Up” button (yes or no).
Which random variable best models the click outcome for a single visitor?
A. Binomial
B. Bernoulli
C. Poisson
D. Uniform
E. Normal
Your data science team is analyzing the number of times customers log into an app during a single day. Login events happen independently, and you want to model how many logins there are per user per day.
Which random variable would best represent the number of logins in a day?
A. Binomial
B. Bernoulli
C. Poisson
D. Uniform
E. Normal
Daily profits of a startup vary around zero, sometimes positive, sometimes negative, with most days near the average.
Which random variable best models this?
A. Binomial
B. Bernoulli
C. Poisson
D. Uniform
E. Normal
In a marketing survey, you contact 50 randomly selected customers to ask if they would recommend your product. Each customer independently responds “yes” or “no.” You want to model the total number of “yes” responses in your sample.
Which random variable is most appropriate for modeling the number of positive responses?
A. Binomial
B. Bernoulli
C. Poisson
D. Uniform
E. Normal
You are simulating the load times for a web app where the server response time is equally likely to be anywhere between 0.5 and 2.5 seconds.
Which random variable best models this server response time?
A. Binomial
B. Bernoulli
C. Poisson
D. Uniform
E. Normal
Outcomes of random phenomena are uncertain.
Randomness may be the result of physical events, selection, assignment, or complex processes, among others.
Outcomes or events may not be equally likely.
Sample space is the set of all possible outcomes; an event is a subset of that space.
A random variable assigns numbers to outcomes of a random process.
Random variables can be discrete (countable values) or continuous (range of values).
Different types of random variables are modeled by different distributions (e.g., Bernoulli, Binomial, Poisson, Uniform).
UBC DSCI 200