Although very useful in economics, a distributed-lag model whose lag length is not specified, that is, one that does not say how far back in time the effect of the explanatory variable extends (an infinite-lag model), poses serious estimation problems. The ad hoc approach estimates it sequentially: first regress $Y_t$ on $X_t$, then regress $Y_t$ on $X_t$ and $X_{t-1}$, then on $X_t$, $X_{t-1}$ and $X_{t-2}$, and so on.
At each step the significance of $\beta_i$, the coefficient of the newest lagged variable, is tested, and the search stops when that coefficient becomes statistically insignificant. This approach, however, suffers from several drawbacks: there is no a priori guide to the maximum length of the lag; as one estimates successive lags, fewer degrees of freedom remain; successive lags of $X$ tend to be highly correlated, which causes multicollinearity and imprecise estimates; and the sequential search for the lag length exposes the researcher to the charge of data mining.
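The sequential search can be sketched as follows. This is a minimal illustration, assuming numpy, plain OLS with conventional standard errors, and a rule-of-thumb critical value `t_crit = 2.0` (roughly the 5% two-sided value in large samples). The helper names `ols`, `lagged_design` and `ad_hoc_lag_search`, and the simulated lag weights 1.0, 0.6, 0.3, are hypothetical and chosen only for illustration.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients and conventional standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]  # degrees of freedom shrink as lags are added
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * XtX_inv))
    return beta, se


def lagged_design(x, p):
    """Columns: constant, X_t, X_{t-1}, ..., X_{t-p}; the first p observations are lost."""
    n = len(x) - p
    cols = [np.ones(n)] + [x[p - j: len(x) - j] for j in range(p + 1)]
    return np.column_stack(cols)


def ad_hoc_lag_search(y, x, max_lag=10, t_crit=2.0):
    """Regress Y_t on X_t, then on X_t and X_{t-1}, and so on; stop as soon as the
    newest lag's |t| falls below t_crit. Returns the last significant lag length,
    or None if even X_t is insignificant."""
    last_significant = None
    for p in range(max_lag + 1):
        beta, se = ols(y[p:], lagged_design(x, p))
        if abs(beta[-1] / se[-1]) < t_crit:
            break
        last_significant = p
    return last_significant


# Hypothetical data: true weights 1.0, 0.6, 0.3 on X_t, X_{t-1}, X_{t-2}.
rng = np.random.default_rng(0)
x_full = rng.normal(size=202)
y_full = 0.5 + 1.0 * x_full[2:] + 0.6 * x_full[1:-1] + 0.3 * x_full[:-2]
y_obs = y_full + 0.1 * rng.normal(size=200)
x_obs = x_full[2:]  # y_obs[i] and x_obs[i] refer to the same period

chosen = ad_hoc_lag_search(y_obs, x_obs, max_lag=8)
print("last significant lag:", chosen)
```

Because each step is a fresh test at the same nominal level, a truly zero lag still passes with roughly 5% probability, which is one concrete face of the data-mining problem mentioned above.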
Koyck proposed a method for estimating distributed-lag models. His starting point is the infinite distributed-lag model
$$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \mu_t$$
Assuming that the β's all have the same sign, Koyck assumes that they decline geometrically:
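$$\beta_k = \beta_0 \lambda^k, \qquad k = 0, 1, 2, \ldots, \qquad 0 < \lambda < 1$$
where $\lambda$ is the rate of decline (decay) of the distributed lag and $1 - \lambda$ is the speed of adjustment.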
This assumption implies that each successive coefficient $\beta_k$ is numerically smaller than the one before it, since $0 < \lambda < 1$: the further one goes back into the past, the smaller the effect of that lag on $Y$. The closer $\lambda$ is to 1, the slower the decline; the closer it is to 0, the faster the decline.
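To make the role of λ concrete, here is a small sketch that lists the first few weights $\beta_k = \beta_0 \lambda^k$ for a slow and a fast decline. The function name `koyck_weights` and the values $\beta_0 = 1$, $\lambda \in \{0.2, 0.8\}$ are illustrative assumptions, not values from the text.

```python
def koyck_weights(beta0, lam, n_lags):
    """Geometric lag weights beta_k = beta0 * lam**k for k = 0, ..., n_lags - 1."""
    if not 0 < lam < 1:
        raise ValueError("the Koyck scheme requires 0 < lambda < 1")
    return [beta0 * lam**k for k in range(n_lags)]


for lam in (0.2, 0.8):
    weights = koyck_weights(1.0, lam, 5)
    print(lam, [round(w, 4) for w in weights])
# 0.2 [1.0, 0.2, 0.04, 0.008, 0.0016]
# 0.8 [1.0, 0.8, 0.64, 0.512, 0.4096]
```

With λ = 0.8 the fourth lag still carries about 41% of the impact weight, while with λ = 0.2 it is already negligible. If $\beta_0$ is negative, every weight is negative and it is their absolute values that decline.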
As a result, the infinite lag model can be written as
$$Y_t = \alpha + \beta_0 X_t + \beta_0 \lambda X_{t-1} + \beta_0 \lambda^2 X_{t-2} + \cdots + \mu_t \tag{1}$$
The model cannot easily be estimated in this form, because it still contains infinitely many regressors and is nonlinear in the parameters $\beta_0$ and $\lambda$. Koyck therefore proceeds as follows:
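First, lag equation (1) by one period:
$$Y_{t-1} = \alpha + \beta_0 X_{t-1} + \beta_0 \lambda X_{t-2} + \beta_0 \lambda^2 X_{t-3} + \cdots + \mu_{t-1} \tag{2}$$
Next, multiply (2) by $\lambda$:
$$\lambda Y_{t-1} = \lambda \alpha + \beta_0 \lambda X_{t-1} + \beta_0 \lambda^2 X_{t-2} + \beta_0 \lambda^3 X_{t-3} + \cdots + \lambda \mu_{t-1} \tag{3}$$
Finally, subtract (3) from (1). All lagged $X$ terms cancel, leaving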
$$Y_t - \lambda Y_{t-1} = \alpha (1 - \lambda) + \beta_0 X_t + (\mu_t - \lambda \mu_{t-1})$$
Rearranging gives
$$Y_t = \alpha (1 - \lambda) + \beta_0 X_t + \lambda Y_{t-1} + v_t, \qquad v_t = \mu_t - \lambda \mu_{t-1}$$
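The transformation can be checked numerically. The sketch below builds $Y_t$ directly from the geometric-lag form (1) and confirms that $Y_t - \lambda Y_{t-1}$ equals $\alpha(1-\lambda) + \beta_0 X_t + (\mu_t - \lambda\mu_{t-1})$ up to round-off. It assumes, purely so that the infinite sum is finite, that $X$ and $\mu$ are zero before the first observation; the parameter values are hypothetical.

```python
import numpy as np

alpha, beta0, lam = 2.0, 1.5, 0.6  # hypothetical parameter values
T = 50
rng = np.random.default_rng(1)
x = rng.normal(size=T)
mu = rng.normal(scale=0.5, size=T)

# Equation (1), assuming X = 0 before t = 0 so the lag sum stops at k = t.
y = np.array([
    alpha + sum(beta0 * lam**k * x[t - k] for k in range(t + 1)) + mu[t]
    for t in range(T)
])

# Transformed form: Y_t - lam*Y_{t-1} = alpha(1 - lam) + beta0*X_t + (mu_t - lam*mu_{t-1})
lhs = y[1:] - lam * y[:-1]
rhs = alpha * (1 - lam) + beta0 * x[1:] + (mu[1:] - lam * mu[:-1])
print("max discrepancy:", np.max(np.abs(lhs - rhs)))  # floating-point round-off only
```

The identity holds for every $t \ge 1$ whatever values $X$ and $\mu$ take, which is exactly why the infinitely many lagged $X$ terms drop out.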
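We have thus converted a distributed-lag model into an autoregressive model. After the transformation only three unknowns remain ($\alpha$, $\beta_0$, $\lambda$), and because the single regressor $Y_{t-1}$ replaces the highly correlated lags $X_{t-1}, X_{t-2}, \ldots$, there is no reason to expect multicollinearity. Serial correlation may still be a problem, however: $v_t$ is a moving average of the original disturbances, so it is serially correlated even if $\mu_t$ is not.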
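As an illustration of how the three unknowns can be recovered, the sketch below regresses $Y_t$ on a constant, $X_t$ and $Y_{t-1}$ and maps the coefficients back: the slope on $Y_{t-1}$ estimates $\lambda$, the slope on $X_t$ estimates $\beta_0$, and $\alpha$ is the intercept divided by $1 - \lambda$. This is a naive OLS sketch under stated assumptions, not a recommended estimator: because $Y_{t-1}$ contains $\mu_{t-1}$, which also appears in $v_t$, OLS is in general biased and inconsistent here. The simulated disturbances are deliberately small so that this bias is negligible; the helper names and parameter values are hypothetical.

```python
import numpy as np


def simulate_geometric_lag(alpha, beta0, lam, x, mu):
    """Y_t = alpha + sum_k beta0 * lam**k * X_{t-k} + mu_t, with X = 0 before t = 0.
    The lag sum is built recursively: s_t = beta0 * X_t + lam * s_{t-1}."""
    s = np.zeros_like(x)
    for t in range(len(x)):
        s[t] = beta0 * x[t] + (lam * s[t - 1] if t > 0 else 0.0)
    return alpha + s + mu


def estimate_koyck(y, x):
    """OLS of Y_t on (1, X_t, Y_{t-1}); maps the coefficients back to (alpha, beta0, lam)."""
    X = np.column_stack([np.ones(len(y) - 1), x[1:], y[:-1]])
    (c, b0, lam_hat), *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    alpha_hat = c / (1 - lam_hat)  # the intercept estimates alpha * (1 - lam)
    return alpha_hat, b0, lam_hat


rng = np.random.default_rng(42)
x = rng.normal(size=2000)
mu = 0.1 * rng.normal(size=2000)  # small disturbances keep the OLS bias small
y = simulate_geometric_lag(2.0, 1.5, 0.6, x, mu)
alpha_hat, beta0_hat, lam_hat = estimate_koyck(y, x)
print(alpha_hat, beta0_hat, lam_hat)  # close to (2.0, 1.5, 0.6)
```

With larger disturbances the estimate of λ drifts away from its true value, which is why instrumental-variable or maximum-likelihood methods are usually preferred for this model in practice.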