Interpreting Regression
1 Preamble
1.1 Caution
1.2 Preamble
1.3 A focus on Interpretation
Interpreting a Random Quantity
2 Probability: When an Outcome is Unknown
2.1 Probability Distributions
2.1.1 Probability
2.1.2 Probability Distributions
2.1.3 Examples of Probability Distributions
2.2 Continuous random variables (10 min)
2.3 Density Functions (20 min)
2.3.1 Example: “Low Purity Octane”
2.3.2 Example: Monthly Expenses
2.4 Summary and take-aways
3 Distribution Properties: Quantities we can Interpret
3.1 Probabilistic Quantities
3.2 Measures of central tendency and uncertainty
3.2.1 Mode and Entropy
3.2.2 Mean and Variance
3.3 What is the mean, anyway?
3.4 Quantiles
3.5 Continuous Distribution Properties
3.5.1 Mean, Variance, Mode, and Entropy (5 min)
3.5.2 Median (5 min)
3.5.3 Quantiles (5 min)
3.5.4 Prediction Intervals (5 min)
3.5.5 Skewness (5 min)
3.5.6 Examples
3.6 Heavy-Tailed Distributions
3.6.1 Sensitivity of the mean to extremes
3.6.2 Heavy-tailed Distributions
3.6.3 Heavy-tailed distribution families
3.6.4 Extreme Value Analysis
3.6.5 Multivariate Student’s t distributions
4 Explaining an uncertain outcome: interpretable quantities
4.0.1 Cumulative Density Functions (cdf’s) / Distribution Functions
4.0.2 Survival Function (2 min)
4.0.3 Quantile Function (5 min)
4.0.4 Other ways of depicting a distribution (Optional) (1 min)
5 Simulation: When calculations are difficult
5.0.1 Learning Objectives
5.0.2 Review Activity (15 min)
5.0.3 Random Samples: Terminology (5 min)
5.0.4 Seeds (5 min)
5.0.5 Generating Random Samples: Code
5.0.6 Running Simulations
5.0.7 Multi-Step Simulations (10 min)
5.0.8 Generating Continuous Data
6 Parametric Families of Distributions
6.1 Concepts
6.1.1 Binomial Distribution
6.1.2 Families vs. distributions
6.1.3 Parameters
6.1.4 Parameterization
6.1.5 Distribution Families in Practice
6.2 Common Parametric Families
6.2.1 Geometric
6.2.2 Negative Binomial
6.2.3 Poisson
6.2.4 Bernoulli
6.2.5 Uniform (3 min)
6.2.6 Gaussian / Normal (4 min)
6.2.7 Log-Normal Family
6.2.8 Exponential Family
6.2.9 Weibull Family
6.2.10 Beta Family
6.2.11 Activity
6.3 Relevant R functions (8 min)
6.4 Analyses under a Distributional Assumption
6.4.1 Maximum Likelihood Estimation
6.4.2 Usefulness in Practice
Prediction: harnessing the signal
7 Reducing uncertainty of the outcome: conditional distributions
7.1 Conditional Distributions
7.2 Joint Distributions
7.2.1 Example: Length of Stay vs. Gang Demand
7.2.2 Marginal Distributions
7.2.3 Calculating Marginals from the Joint
7.2.4 Conditioning on one Variable
7.2.5 Law of Total Probability/Expectation
7.2.6 Exercises
7.3 Multivariate Densities/pdf’s
7.3.1 Conditional Distributions, revisited
7.4 Dependence concepts
7.4.1 Independence
7.4.2 Measures of dependence
7.4.3 Dependence as separate from the marginals
7.4.4 Dependence as giving us more information
7.5 Harvesting Dependence
7.5.1 Example: River Flow
7.5.2 Direction of Dependence
7.6 Marginal Distributions
7.6.1 Marginal Distribution from Conditional
7.6.2 Marginal Mean from Conditional
7.6.3 Marginal Quantiles from Conditional
7.6.4 Activity
8 Estimating parametric model functions
8.1 Writing the sample mean as an optimization problem
8.2 Evaluating Model Goodness: Quantiles
8.3 Simple Linear Regression
8.3.1 Model Specification
8.4 Linear models in general
8.5 Reference-treatment parameterization
8.5.1 More than one category (Lab 2)
8.6 Concepts
9 Estimating assumption-free: the world of supervised learning techniques
9.1 What machine learning is
9.2 Types of Supervised Learning
9.3 Local Regression
9.3.1 kNN
9.3.2 loess
9.3.3 In-Class Exercises
9.3.4 Hyperparameters and the bias/variance tradeoff
9.3.5 Extensions to kNN and loess
9.3.6 Model assumptions and the bias/variance tradeoff
9.4 Splines and Loess Regression
9.4.1 Loess
Special cases
10 Regression when data are censored: survival analysis
10.1 Data
10.2 Univariate Estimation
10.2.1 Non-parametric Estimates with Kaplan-Meier
10.2.2 Parametric Estimation
10.3 Regression with Survival Data
10.3.1 Proportional Hazards Model
10.3.2 Prediction
10.4 Concept list
11 Regression when data are ordinal
11.1 Concept list
12 Regression when data are missing: multiple imputation
12.1 Mean Imputation
12.2 Multiple Imputation
12.2.1 Patterns
12.2.2 Multiple Imputation
12.2.3 Pooling
12.3 Step 0: What data are missing?
12.4 Step 1: Handling Missing Data
12.4.1 Any Ideas?
12.4.2 mice
12.5 Step 3: Pool results
12.6 Concepts
13 Regression on an entire distribution: Probabilistic Forecasting
13.1 Probabilistic Forecasting: What it is
13.2 Review: Univariate distribution estimates
13.2.1 Continuous response
13.2.2 Discrete Response
13.3 Probabilistic Forecasts: subset-based learning methods
13.3.1 The techniques
13.3.2 Exercise
13.3.3 Bias-variance tradeoff
13.3.4 Evaluating Model Goodness
13.4 Discussion Points
13.5 When are they not useful?