TRANSFORMATIONS

We let y denote a response variable, such as the proportion of women in the applicant pool, annual salary, or the number of manuscripts published in a year, and use x to denote a vector of covariates that might include type of institution, discipline, proportion of women on the search committee, etc. If y can be assumed to be normally distributed with some mean $\mu$ and some variance $\sigma^2$, then we typically fit a linear regression model to y that establishes that $\mu = x\beta$, where $\beta$ is a vector of unknown regression coefficients.
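As a minimal sketch of the normal-theory case, assume a single covariate so that $\mu = \beta_0 + \beta_1 x$. The least-squares estimates then have a closed form, and under the normality assumption they are also the maximum likelihood estimates of the coefficients. The function name `ols_simple` and the toy data are illustrative and not taken from the original analysis.

```python
def ols_simple(xs, ys):
    """Least-squares estimates (b0, b1) for the model mu = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Cross-product and sum of squares about the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Illustrative data only: the points lie exactly on y = 1 + 2x.
b0, b1 = ols_simple([0, 1, 2, 3], [1, 3, 5, 7])
```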

When the response y is not normally distributed (for example, because y can only take on the values 0 and 1), we can define the linear predictor $\eta = x\beta$ and then choose a transformation g of $\mu$ such that

$$g(\mu) = \eta = x\beta.$$

For example, if the response variable is a proportion, the logit transformation

$$g(\mu) = \log\left(\frac{\mu}{1-\mu}\right)$$

is appropriate. When y is a count variable (as in the number of manuscripts published in a year), the usual transformation is the log transformation, $g(\mu) = \log \mu$.
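A short Python sketch of these two transformations and their inverses $g^{-1}$, which map the linear predictor back to the scale of the mean. The function names are hypothetical. The inverse logit branches on the sign of $\eta$ so that large negative values do not overflow `math.exp`.

```python
import math

# Logit link: for a mean in (0, 1), such as a proportion.
def logit(mu):
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    # Each branch only exponentiates a non-positive number, avoiding overflow.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# Log link: for a positive mean, such as an expected count.
def log_link(mu):
    return math.log(mu)

def inv_log_link(eta):
    return math.exp(eta)
```

Round-tripping a value through a link and its inverse, e.g. `inv_logit(logit(0.2))`, recovers the original mean up to floating-point error.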

One approach to obtaining estimates of $\beta$ is the method of maximum likelihood. Let $\hat{\beta}$ denote the maximum likelihood estimate (MLE) of $\beta$. A useful property of MLEs is invariance: in general, the MLE of a function $h(\beta)$ is equal to that function evaluated at the MLE of $\beta$, so that

$$\widehat{h(\beta)} = h(\hat{\beta}).$$
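To see invariance at work in the simplest case, consider Bernoulli responses with success probability p, whose MLE is the sample proportion $\hat{p}$. The sketch below, assuming hypothetical 0/1 data containing at least one success and one failure, maximizes the log-likelihood directly in the log-odds $\theta = \log[p/(1-p)]$ by Newton's method and compares the result with the logit of $\hat{p}$. The helper names are illustrative.

```python
import math

def inv_logit(t):
    return 1.0 / (1.0 + math.exp(-t))

def mle_log_odds(ys, iters=50):
    """Newton's method on loglik(theta) = s*theta - n*log(1 + e^theta)."""
    n, s = len(ys), sum(ys)
    theta = 0.0
    for _ in range(iters):
        p = inv_logit(theta)
        score = s - n * p          # first derivative of the log-likelihood
        info = n * p * (1.0 - p)   # minus the second derivative
        theta += score / info
    return theta

ys = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]   # illustrative data: 3 successes in 10
p_hat = sum(ys) / len(ys)
theta_hat = mle_log_odds(ys)

# Invariance: the MLE of logit(p) equals the logit of the MLE of p.
print(theta_hat, math.log(p_hat / (1 - p_hat)))
```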

In particular, if $\hat{\eta} = x\hat{\beta}$, then the MLE of $\mu$ is

$$\hat{\mu} = g^{-1}(\hat{\eta}).$$
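A minimal sketch of this chain for a binary response with one covariate, assuming the logit link: $\hat{\beta}$ is found by Newton-Raphson on the log-likelihood (for the logit link this coincides with Fisher scoring), and $\hat{\mu}$ at a chosen covariate value then follows from $g^{-1}(x\hat{\beta})$ with no further optimization. The data, the covariate value 1.5, and the function names are illustrative only.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(xs, ys, iters=25):
    """Newton-Raphson MLE of (b0, b1) in logit(mu) = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score vector (u0, u1) and 2 x 2 information matrix at the current beta.
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, y in zip(xs, ys):
            mu = inv_logit(b0 + b1 * x)
            w = mu * (1.0 - mu)
            u0 += y - mu
            u1 += (y - mu) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # beta <- beta + I^{-1} U, with the 2 x 2 inverse written out.
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (-i01 * u0 + i00 * u1) / det
    return b0, b1

# Illustrative data with overlapping classes, so the MLE exists.
xs = [0, 0, 1, 1, 2, 2, 3, 3]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0_hat, b1_hat = fit_logistic(xs, ys)

# By invariance, the MLE of mu at x = 1.5 is g^{-1}(eta_hat).
eta_hat = b0_hat + b1_hat * 1.5
mu_hat = inv_logit(eta_hat)
```

Because these toy data are symmetric about x = 1.5, the fitted $\hat{\mu}$ there is 0.5.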

The difficulty arises when we also wish to estimate the variance of $\hat{\mu}$, for example to obtain a confidence interval around the point estimate $\hat{\mu}$. Because $g^{-1}$ is generally non-linear, we typically need to resort to linearization techniques that approximate the variance of a non-linear function of the parameters. A method that can be used for this purpose is called the Delta method and is described below.
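A sketch of one common form of this approximation, assuming a logit link. Since $\hat{\eta} = x\hat{\beta}$ is linear in $\hat{\beta}$, $\mathrm{Var}(\hat{\eta}) = x\,\mathrm{Cov}(\hat{\beta})\,x^\top$ exactly, and a first-order Taylor expansion of $\hat{\mu} = g^{-1}(\hat{\eta})$ gives $\mathrm{Var}(\hat{\mu}) \approx [d\mu/d\eta]^2\,\mathrm{Var}(\hat{\eta})$, where $d\mu/d\eta = \mu(1-\mu)$ for the logit. The estimates and covariance matrix are assumed to come from a fitted model; the values in the usage line are hypothetical.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def delta_method_ci(x, beta_hat, cov_beta, z=1.96):
    """Approximate normal-theory CI for mu = inv_logit(x . beta).

    x, beta_hat: covariate row and estimated coefficients (same length).
    cov_beta: estimated covariance matrix of beta_hat, as a list of rows.
    """
    k = len(x)
    eta_hat = sum(x[j] * beta_hat[j] for j in range(k))
    mu_hat = inv_logit(eta_hat)
    # Var(eta_hat) = x Cov(beta_hat) x^T; exact because eta is linear in beta.
    var_eta = sum(x[i] * cov_beta[i][j] * x[j] for i in range(k) for j in range(k))
    # Linearization: d mu / d eta = mu (1 - mu) under the logit link.
    se_mu = mu_hat * (1.0 - mu_hat) * math.sqrt(var_eta)
    return mu_hat, se_mu, (mu_hat - z * se_mu, mu_hat + z * se_mu)

# Hypothetical inputs: row x = (1, 0), i.e. intercept plus covariate value 0.
mu_hat, se_mu, ci = delta_method_ci([1.0, 0.0], [0.0, 1.0], [[0.04, 0.0], [0.0, 0.01]])
```

An interval built this way can extend outside [0, 1]; a common alternative is to form the interval on the $\eta$ scale and transform its endpoints with $g^{-1}$.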

Updated: 12.11.2015 — 10:36