Warm tip: This article is reproduced from serverfault.com, please click

Recovering Last Available Observation Per country Stata

发布于 2020-12-06 22:45:01

I am trying to only keep the last available observation for each variable, however, the issue is that different variables per country were measured in different years. My data currently looks like this:

iso3c year  Gini          variable1         variable2                variable3
AND   2000       .          1.279314         33                        22
AND   2001      22          2.571869         .                          .
AND   2002       .          3.492054         .                          .
AND   2003      44          3.89996          .                           

This is my code:

gsort + iso3c - year
drop if Gini==. & variable1==.      &     variable2==.       &       variable3==.
bysort iso3c: keep if _n==1 
drop year

I tried this with one variable as in below, and then ran the other lines and it worked well.

drop if Gini==. 

However, because I have different variables measured in different years per country, Stata ends up only keeping the following:

iso3c Gini     variable1            variable2            variable3

AND    44           3.89996          .                   .                               

However, I want something like this, where the last available observation for variables 2 and 3 are also kept from the year 2000 even though the variables were not measured in 2004.

iso3c Gini      variable1            variable2                variable3

AND    44           3.89996          33                        22                
Questioner
maldini425
Viewed
0
Nick Cox 2020-12-07 16:59:12

Note that collapse (lastnm) Gini variable*, by(iso3c) is a one-line solution to this.

Let's show as well how to get something similar from first principles.

The last non-missing value in each panel is accessible once you sort the non-missings to the end of the panel (temporarily). If no non-missing value is available, necessarily a missing value will be returned instead.

clear 
input str3 iso3c year Gini variable1 variable2 variable3
AND 2000 . 1.279314 33 22
AND 2001 22 2.571869 . .
AND 2002 . 3.492054 . .
AND 2003 44 3.89996 . . 
end 

gen OK = . 

foreach v in Gini variable1 variable2 variable3 { 
    replace OK = !missing(`v')
    bysort iso3c (OK year) : gen `v'_lnm = `v'[_N] 
}

sort iso3c year 
list iso3c year *lnm 

     +----------------------------------------------------------+
     | iso3c   year   Gini_lnm   va~1_lnm   va~2_lnm   va~3_lnm |
     |----------------------------------------------------------|
  1. |   AND   2000         44    3.89996         33         22 |
  2. |   AND   2001         44    3.89996         33         22 |
  3. |   AND   2002         44    3.89996         33         22 |
  4. |   AND   2003         44    3.89996         33         22 |
     +----------------------------------------------------------+