Intuition
Given a scatter plot of two variables, simple linear regression draws the single best straight line through the points. “Best” means the line that minimizes the total squared vertical distance from each point to the line. One predictor, one response, one line. It’s the most elementary form of regression and the natural starting point before moving to the multiple-predictor case in Regression Fundamentals.
Definition
The simple linear regression model expresses a response $Y$ as a linear function of a single predictor $X$:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \dots, n$$

- $\beta_0$ - the true intercept (value of $Y$ when $X = 0$)
- $\beta_1$ - the true slope (change in $Y$ per unit change in $X$)
- $\varepsilon_i$ - independent random errors with mean zero and variance $\sigma^2$

The goal is to estimate $\beta_0$ and $\beta_1$ from observed data $(x_1, y_1), \dots, (x_n, y_n)$.
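As a concrete sketch, the generative model above can be simulated in a few lines of NumPy; the parameter values here are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true parameters (not from the note's example).
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 100

x = rng.uniform(0, 10, size=n)       # single predictor
eps = rng.normal(0, sigma, size=n)   # independent errors, mean 0, variance sigma^2
y = beta0 + beta1 * x + eps          # response generated by the model
```

Fitting a line to `(x, y)` then amounts to recovering `beta0` and `beta1` from the noisy sample.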
Key Formulas
Estimated regression equation:

$$\hat{y} = b_0 + b_1 x$$

where $b_0$ and $b_1$ are the least-squares estimates that minimize the residual sum of squares:

$$\text{SSE} = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$

Slope estimate:

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Intercept estimate:

$$b_0 = \bar{y} - b_1 \bar{x}$$

Estimated error variance:

$$s^2 = \frac{\text{SSE}}{n - 2}$$

The denominator is $n - 2$ because two parameters ($\beta_0$, $\beta_1$) are estimated.
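These formulas translate directly into code. A minimal NumPy sketch (the function name `fit_simple_ols` is my own):

```python
import numpy as np

def fit_simple_ols(x, y):
    """Least-squares estimates for y = b0 + b1*x, plus the error variance s^2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xbar, ybar = x.mean(), y.mean()
    b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)  # slope
    b0 = ybar - b1 * xbar                                           # intercept
    resid = y - (b0 + b1 * x)
    s2 = np.sum(resid ** 2) / (len(x) - 2)                          # SSE / (n - 2)
    return b0, b1, s2

# Quick check on a noiseless line y = 1 + 2x: estimates are exact, s^2 is zero.
b0, b1, s2 = fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7])
```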
Note
For the full OLS derivation, multiple regression extension, residual diagnostics, and interpretation, see Regression Fundamentals.
Example
Predicting chemistry grades. A professor collects intelligence test scores ($x$) and chemistry grades ($y$) for 20 students and computes the sample means $\bar{x}$ and $\bar{y}$ along with the sums $\sum (x_i - \bar{x})(y_i - \bar{y})$ and $\sum (x_i - \bar{x})^2$.
Plugging these into the formulas above yields a fitted line $\hat{y} = b_0 + b_1 x$ with slope $b_1 = 0.30$. Interpretation: each additional point on the intelligence test is associated with a 0.30-point increase in chemistry grade, on average. For a student with test score $x_0$, the prediction is $\hat{y} = b_0 + 0.30\,x_0$; with this professor's estimates, the predicted chemistry grade comes to 78.
The same technique applies to predicting final animal weight from feed consumed - any scenario where one continuous variable drives another.
Why It Matters in CS
Simple linear regression is the “can a straight line explain this?” test. In ML, it’s the first model you fit before reaching for anything fancier, because if a line already gets you 90% of the way there, a neural net is probably overkill. It sets both a performance floor and an interpretability ceiling.
One underrated use: empirical complexity analysis. Plot execution time $T$ against input size $n$ and fit $T = b_0 + b_1 n$. If the fit is tight, your algorithm is linear. Fit $\log T = b_0 + b_1 \log n$ instead and $b_1$ estimates the polynomial exponent. This is often faster than deriving the complexity analytically, especially for messy real-world code.
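A minimal sketch of the log-log version, using a deterministic operation count in place of wall-clock time so the result is reproducible (the quadratic loop is a hypothetical stand-in for the code being measured):

```python
import numpy as np

def count_ops(n):
    """Stand-in for a timed run: inner-loop iterations of a quadratic algorithm."""
    ops = 0
    for i in range(n):
        for j in range(n):
            ops += 1
    return ops

sizes = np.array([50, 100, 200, 400])
counts = np.array([count_ops(n) for n in sizes])

# Fit log T = log c + b * log n; the slope b estimates the polynomial exponent.
b, log_c = np.polyfit(np.log(sizes), np.log(counts), 1)
# For this quadratic loop, b comes out at (essentially exactly) 2.
```

With real timings the slope will be noisier, but a slope near 1, 2, or 3 is usually unambiguous.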
Tip
When doing exploratory data analysis, fitting a simple regression for each feature individually is a quick way to see which predictors have any marginal relationship with the response. It’s not a substitute for multiple regression (confounding is real), but it’s a useful first pass.
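A sketch of such a per-feature screen on hypothetical synthetic data, recording each feature's marginal slope and R²; here only the first feature actually drives the response:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dataset: 3 features, only feature 0 drives the response.
n = 200
X = rng.normal(size=(n, 3))
y = 4.0 * X[:, 0] + rng.normal(scale=0.5, size=n)

# One simple regression per feature: marginal slope and R^2.
screen = []
for j in range(X.shape[1]):
    x = X[:, j]
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x
    resid = y - (y.mean() - b1 * x.mean() + b1 * x)
    r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    screen.append((j, b1, r2))
# Feature 0 shows a large R^2; the noise features sit near zero.
```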
Related Notes
- Regression Fundamentals - extends to multiple predictors, OLS in matrix form, residual diagnostics, and $R^2$
- Maximum Likelihood Estimation - under normality, MLE of regression coefficients equals OLS
- Normal Distribution - the error distribution assumption
- Hypothesis Testing - $t$-tests on $b_1$ to assess whether the slope differs from zero