View Stata variable labels in Pandas

Question

Note: this question is reproduced from stackoverflow.com.

dataframe pandas python stata

Posted on 2020-05-15 13:29:15

Stata .dta files include a label (description) for each variable, which can be viewed in Stata with the describe command. For example, the adults and kids variables in this online dataset have the descriptions "number of adults in household" and "number of children in household", respectively:

clear
use http://www.principlesofeconometrics.com/stata/alcohol.dta

describe

Contains data from http://www.principlesofeconometrics.com/stata/alcohol.dta
  obs:         1,000                          
 vars:             4                          10 Nov 2007 11:33
 size:         5,000                          (_dta has notes)
-------------------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------------------------------------------------------------
adults          byte    %8.0g                 number of adults in household
kids            byte    %8.0g                 number of children in household
income          int     %8.0g                 weekly income
consume         byte    %8.0g                 =1 if consume alcohol, =0 otherwise
-------------------------------------------------------------------------------------------------------------------------------------
Sorted by:

Those descriptions do not show up in Pandas, for example with describe():

import pandas as pd

df = pd.read_stata('http://www.principlesofeconometrics.com/stata/alcohol.dta')
df

     adults  kids  income  consume
0         2     2     758        1
1         2     3    1785        1
2         3     0    1200        1
..      ...   ...     ...      ...
997       2     0    1383        1
998       2     2     816        0
999       2     2     387        0

df.describe()

            adults         kids       income      consume
count  1000.000000  1000.000000  1000.000000  1000.000000
mean      2.012000     0.722000   649.528000     0.766000
std       0.815181     1.078833   460.657826     0.423584
min       1.000000     0.000000    12.000000     0.000000
25%       2.000000     0.000000   295.000000     1.000000
50%       2.000000     0.000000   562.500000     1.000000
75%       2.000000     1.000000   887.500000     1.000000
max       6.000000     5.000000  3846.000000     1.000000

Is there a way to view this information after loading it to a Pandas DataFrame using read_stata()?

Questioner

Max Ghenis
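
One possible approach, offered as a sketch rather than a definitive answer: calling read_stata() with iterator=True returns a pandas StataReader instead of a DataFrame, and that reader has a variable_labels() method that returns a dict mapping column names to their Stata labels. The helper name read_stata_with_labels and the small demo file below are hypothetical, introduced only so the example runs without network access; the same helper should accept the alcohol.dta URL from the question.

```python
import os
import tempfile

import pandas as pd


def read_stata_with_labels(path):
    """Return (DataFrame, {column name: variable label}) for a .dta file.

    Hypothetical helper: it wraps pandas' StataReader, obtained by
    passing iterator=True to read_stata().
    """
    with pd.read_stata(path, iterator=True) as reader:
        labels = reader.variable_labels()  # dict of column -> label
        df = reader.read()                 # the full dataset as a DataFrame
    return df, labels


# Self-contained demo: write a tiny labelled .dta file, then read it back.
# The values and labels mirror the first rows of the question's dataset.
sample = pd.DataFrame({"adults": [2, 2, 3], "kids": [2, 3, 0]})
sample_labels = {
    "adults": "number of adults in household",
    "kids": "number of children in household",
}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.dta")
    sample.to_stata(demo_path, variable_labels=sample_labels, write_index=False)
    df, labels = read_stata_with_labels(demo_path)

# Print each column next to its Stata description.
for column, label in labels.items():
    print(f"{column}: {label}")
```

The labels come back as a plain dict, so they are not attached to the DataFrame itself and df.describe() is unchanged. They can be kept alongside the data or turned into a lookup table with pd.Series(labels).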
