In Chapter 11, we introduced logistic regression models for binary response variables. Here we will show some of the mathematics underlying these models, making use of the matrix notation for regression introduced in Appendix A.
## Matrix representation of logistic regression
Given a binary response variable $y_i$ and predictors $x_{i1}, \ldots, x_{ip}$, the logistic regression model is

$$
\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} \tag{B.1}
$$

where $\pi_i = P(y_i = 1)$.
Similar to linear regression, we can write a matrix representation of Equation B.1:

$$
\log\left(\frac{\boldsymbol{\pi}}{1 - \boldsymbol{\pi}}\right) = \mathbf{X}\boldsymbol{\beta} \tag{B.2}
$$
We have the following components in Equation B.2:
- $\boldsymbol{\pi}$ is the $n \times 1$ vector of probabilities, such that $\pi_i = P(y_i = 1)$ for $i = 1, \ldots, n$.
- $\mathbf{X}$ is the $n \times (p + 1)$ design matrix. Similar to linear regression, the first column is $\mathbf{1}$, a column of 1’s corresponding to the intercept.
- $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)^T$ is a $(p + 1) \times 1$ vector of model coefficients.
Though not directly in Equation B.1 or Equation B.2, the underlying data also includes $\mathbf{y}$, an $n \times 1$ vector of the binary response variables.
We are often interested in the probabilities computed from the logistic regression model. The probabilities computed from Equation B.2 are

$$
\boldsymbol{\pi} = \frac{\exp(\mathbf{X}\boldsymbol{\beta})}{1 + \exp(\mathbf{X}\boldsymbol{\beta})} \tag{B.3}
$$
See Section 11.2 for more detail about the relationship between the logit, odds, and probability.
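To make this computation concrete, here is a minimal numeric sketch of Equation B.3 in Python with NumPy; the design matrix and coefficient values below are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical design matrix: a column of 1's for the intercept plus one predictor
X = np.array([[1.0, -1.0],
              [1.0,  0.0],
              [1.0,  2.5]])

# Hypothetical coefficient vector (beta_0, beta_1)
beta = np.array([-0.5, 1.2])

# Probabilities from Equation B.3: pi = exp(X beta) / (1 + exp(X beta))
pi = np.exp(X @ beta) / (1 + np.exp(X @ beta))
print(pi)   # one probability per row of X, each between 0 and 1
```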
## Estimation
We want to find estimates $\hat{\boldsymbol{\beta}}$ that are the best fit for the data based on the model in Equation B.2. In Section 11.4, we outlined how we use maximum likelihood estimation to find $\hat{\boldsymbol{\beta}}$. Here we will show more of the mathematical details behind the model estimation.
Let $y$ be a random variable that takes values 0 or 1. Then $y$ follows a Bernoulli distribution such that

$$
P(y = k) = \pi^k (1 - \pi)^{1 - k}, \quad k \in \{0, 1\}
$$

where $\pi = P(y = 1)$. The mean and variance of $y$ are $E(y) = \pi$ and $Var(y) = \pi(1 - \pi)$.
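As a small numeric check of these quantities, assuming Python with `scipy.stats` and a hypothetical value of $\pi$:

```python
from scipy.stats import bernoulli

pi = 0.3                          # hypothetical probability of success
print(bernoulli.pmf(1, pi))       # P(y = 1) = pi = 0.3
print(bernoulli.pmf(0, pi))       # P(y = 0) = 1 - pi = 0.7
print(bernoulli.mean(pi))         # E(y) = pi = 0.3
print(bernoulli.var(pi))          # Var(y) = pi * (1 - pi) = 0.21
```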
The response variable $y_i$ follows a Bernoulli distribution, such that $\pi_i = P(y_i = 1)$. Let $\mathbf{x}_i^T$ be the $i^{th}$ row of the design matrix $\mathbf{X}$. Then, using Equation B.3, we have

$$
\pi_i = \frac{\exp(\mathbf{x}_i^T\boldsymbol{\beta})}{1 + \exp(\mathbf{x}_i^T\boldsymbol{\beta})}
$$
Recall that the likelihood function is a measure of how likely we are to observe the data given particular values of the model parameters $\boldsymbol{\beta}$.
Let $y_1, \ldots, y_n$ be $n$ independent Bernoulli random variables. The joint distribution of $y_1, \ldots, y_n$ (the probability of observing these values) is

$$
P(y_1, \ldots, y_n) = \prod_{i=1}^{n} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i} \tag{B.4}
$$

where $\pi_i = P(y_i = 1)$.
Using Equation B.4, the likelihood function for logistic regression is

$$
L(\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{X}) = \prod_{i=1}^{n} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}, \quad \text{where } \pi_i = \frac{\exp(\mathbf{x}_i^T\boldsymbol{\beta})}{1 + \exp(\mathbf{x}_i^T\boldsymbol{\beta})} \tag{B.5}
$$
To make the math more manageable, we will maximize the log-likelihood shown in Equation B.6. Maximizing Equation B.6 is equivalent to maximizing Equation B.5.

$$
\log L(\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{X}) = \sum_{i=1}^{n} \Big[ y_i \log(\pi_i) + (1 - y_i)\log(1 - \pi_i) \Big] \tag{B.6}
$$
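A minimal sketch of Equation B.6 in Python with NumPy, using hypothetical toy data (not from the text):

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Log-likelihood of Equation B.6 for a candidate coefficient vector beta."""
    pi = np.exp(X @ beta) / (1 + np.exp(X @ beta))   # pi_i from Equation B.3
    return np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi))

# Hypothetical toy data: 3 observations, intercept plus one predictor
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 2.5]])
y = np.array([0, 0, 1])
print(log_likelihood(np.array([-0.5, 1.2]), X, y))
```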
We take the first derivative of Equation B.6 with respect to $\boldsymbol{\beta}$. An outline of the steps is shown below. First, rewrite Equation B.6 in terms of $\boldsymbol{\beta}$, using $\log(\pi_i) = \mathbf{x}_i^T\boldsymbol{\beta} - \log\big(1 + \exp(\mathbf{x}_i^T\boldsymbol{\beta})\big)$ and $\log(1 - \pi_i) = -\log\big(1 + \exp(\mathbf{x}_i^T\boldsymbol{\beta})\big)$:

$$
\log L(\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{X}) = \sum_{i=1}^{n} \Big[ y_i\, \mathbf{x}_i^T\boldsymbol{\beta} - \log\big(1 + \exp(\mathbf{x}_i^T\boldsymbol{\beta})\big) \Big]
$$

Differentiating with respect to $\boldsymbol{\beta}$ gives

$$
\frac{\partial \log L(\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{X})}{\partial \boldsymbol{\beta}} = \sum_{i=1}^{n} \left[ y_i - \frac{\exp(\mathbf{x}_i^T\boldsymbol{\beta})}{1 + \exp(\mathbf{x}_i^T\boldsymbol{\beta})} \right]\mathbf{x}_i = \mathbf{X}^T(\mathbf{y} - \boldsymbol{\pi})
$$

The maximum likelihood estimator is the vector of coefficients $\hat{\boldsymbol{\beta}}$ such that $\hat{\boldsymbol{\beta}}$ is the solution to

$$
\mathbf{X}^T(\mathbf{y} - \boldsymbol{\pi}) = \mathbf{0} \tag{B.7}
$$
There is no closed-form solution for this, i.e., there is no neat formula for $\hat{\boldsymbol{\beta}}$ as we found in Section A.3.1 for linear regression. Therefore, numerical approximation methods are used to find the maximum likelihood estimator $\hat{\boldsymbol{\beta}}$. One popular method is Newton-Raphson, a “root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a real-valued function” (Wikipedia contributors 2025). Numerical approximation methods such as Newton-Raphson systematically search the space of possible values of $\boldsymbol{\beta}$ until they converge on the solution (the “root”) to Equation B.7.
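To make the idea concrete, here is a minimal sketch of a Newton-Raphson iteration for Equation B.7 in Python with NumPy; the simulated data, starting value, and convergence tolerance are hypothetical choices, not part of the text.

```python
import numpy as np

def logistic_mle(X, y, tol=1e-8, max_iter=100):
    """Newton-Raphson iteration for the logistic regression MLE.

    X : (n, p+1) design matrix whose first column is all 1's
    y : (n,) vector of 0/1 responses
    """
    beta = np.zeros(X.shape[1])                  # starting value beta = 0
    for _ in range(max_iter):
        eta = X @ beta                           # linear predictor X beta
        pi = np.exp(eta) / (1 + np.exp(eta))     # probabilities (Equation B.3)
        score = X.T @ (y - pi)                   # gradient of the log-likelihood (Equation B.7)
        V = np.diag(pi * (1 - pi))               # diagonal matrix of Bernoulli variances
        hessian = -X.T @ V @ X                   # second derivative of the log-likelihood
        step = np.linalg.solve(-hessian, score)  # Newton-Raphson update direction
        beta = beta + step
        if np.max(np.abs(step)) < tol:           # stop when the update is negligible
            break
    return beta

# Hypothetical simulated data: one predictor plus an intercept
rng = np.random.default_rng(1)
x = rng.normal(size=200)
X = np.column_stack([np.ones(200), x])
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 1.2 * x))))
print(logistic_mle(X, y))                        # estimates of (beta_0, beta_1)
```

In practice, statistical software carries out this kind of iteration for us when fitting a logistic regression model.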
## Inference for logistic regression
In Section 11.5, we introduced inference for a single coefficient $\beta_j$ in the logistic regression model. Because there is no closed-form solution for the maximum likelihood estimator $\hat{\boldsymbol{\beta}}$ found in Section B.2, there is no closed-form solution for the mean and variance of the distribution of $\hat{\boldsymbol{\beta}}$. We rely on theoretical results to know the distribution of $\hat{\boldsymbol{\beta}}$ as $n$ gets large (called asymptotic results).
Given $n$ is large,

$$
\hat{\boldsymbol{\beta}} \sim N\big(\boldsymbol{\beta},\; (\mathbf{X}^T\mathbf{V}\mathbf{X})^{-1}\big) \tag{B.8}
$$

where $\mathbf{V}$ is an $n \times n$ diagonal matrix, such that $V_{ii} = \hat{\pi}_i(1 - \hat{\pi}_i)$ is the estimated variance for the $i^{th}$ observation.
The standard error used for hypothesis testing and confidence intervals for a single coefficient $\beta_j$ is computed as the square root of the corresponding diagonal element of $(\mathbf{X}^T\mathbf{V}\mathbf{X})^{-1}$. This is why the hypothesis tests and confidence intervals in Section 11.5.2 are only reliable for large $n$: they depend on the asymptotic approximation in Equation B.8. We can use simulation-based methods if the data has a small sample size.
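Continuing the hypothetical sketch above, the standard errors could be computed from Equation B.8 as follows (again assuming NumPy and the `logistic_mle`, `X`, and `y` defined in the earlier sketch):

```python
import numpy as np

# Estimated coefficients from the Newton-Raphson sketch above
beta_hat = logistic_mle(X, y)

# Estimated probabilities and the diagonal variance matrix V from Equation B.8
pi_hat = np.exp(X @ beta_hat) / (1 + np.exp(X @ beta_hat))
V = np.diag(pi_hat * (1 - pi_hat))

# Asymptotic variance-covariance matrix of beta_hat and the standard errors
cov_beta = np.linalg.inv(X.T @ V @ X)
se = np.sqrt(np.diag(cov_beta))

# Approximate 95% confidence interval for each coefficient
print(beta_hat - 1.96 * se, beta_hat + 1.96 * se)
```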
Wikipedia contributors. 2025. “Newton’s Method.” https://en.wikipedia.org/wiki/Newton%27s_method.