python - Viewing Stata variable labels in Pandas

Posted 2020-05-19 17:10:00

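Stata .dta files contain a label (description) for each column, which can be viewed in Stata with the describe command. For example, the adults and kids variables in this online dataset have the descriptions "number of adults in household" and "number of children in household", respectively: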

clear
use http://www.principlesofeconometrics.com/stata/alcohol.dta

describe

Contains data from http://www.principlesofeconometrics.com/stata/alcohol.dta
  obs:         1,000                          
 vars:             4                          10 Nov 2007 11:33
 size:         5,000                          (_dta has notes)
-------------------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------------------------------------------------------------
adults          byte    %8.0g                 number of adults in household
kids            byte    %8.0g                 number of children in household
income          int     %8.0g                 weekly income
consume         byte    %8.0g                 =1 if consume alcohol, =0 otherwise
-------------------------------------------------------------------------------------------------------------------------------------
Sorted by:

These descriptions do not appear anywhere in Pandas, neither when displaying the DataFrame nor in describe():

import pandas as pd

df = pd.read_stata('http://www.principlesofeconometrics.com/stata/alcohol.dta')
df

     adults  kids  income  consume
0         2     2     758        1
1         2     3    1785        1
2         3     0    1200        1
..      ...   ...     ...      ...
997       2     0    1383        1
998       2     2     816        0
999       2     2     387        0

df.describe()

            adults         kids       income      consume
count  1000.000000  1000.000000  1000.000000  1000.000000
mean      2.012000     0.722000   649.528000     0.766000
std       0.815181     1.078833   460.657826     0.423584
min       1.000000     0.000000    12.000000     0.000000
25%       2.000000     0.000000   295.000000     1.000000
50%       2.000000     0.000000   562.500000     1.000000
75%       2.000000     1.000000   887.500000     1.000000
max       6.000000     5.000000  3846.000000     1.000000
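To reproduce this without downloading the dataset, here is a minimal sketch, assuming a reasonably recent pandas. The file name `labeled_sample.dta` and the three rows of data are made up for illustration; only the column names and labels are copied from the Stata output above. It writes a small .dta file with those labels via `DataFrame.to_stata(variable_labels=...)` and reads it back with `read_stata()`. The labels are stored in the file, but the returned DataFrame only carries the column names:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; only the names and labels mirror alcohol.dta.
sample = pd.DataFrame({"adults": [2, 2, 3], "kids": [2, 3, 0]}, dtype="int8")
labels = {
    "adults": "number of adults in household",
    "kids": "number of children in household",
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "labeled_sample.dta")
    # write_index=False keeps an extra "index" column out of the file.
    sample.to_stata(path, write_index=False, variable_labels=labels)

    # read_stata() returns a plain DataFrame; the labels stay in the file.
    df = pd.read_stata(path)

print(list(df.columns))  # ['adults', 'kids']
```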

After loading the data into a Pandas DataFrame with read_stata(), is there a way to view this information?

Asked by: Max Ghenis


Taking Stata's built-in auto dataset as an example:

sysuse auto, clear
describe

Contains data from auto.dta
  obs:            74                          1978 Automobile Data
 vars:            12                          13 Apr 2014 17:45
 size:         3,182                          (_dta has notes)
-------------------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------------------------------------------------------------
make            str18   %-18s                 Make and Model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair Record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g                 Turn Circle (ft.)
displacement    int     %8.0g                 Displacement (cu. in.)
gear_ratio      float   %6.2f                 Gear Ratio
foreign         byte    %8.0g      origin     Car type
-------------------------------------------------------------------------------------------------------------------------------------
Sorted by: foreign

In pandas, passing iterator=True to read_stata() returns a StataReader object instead of a DataFrame. Its variable_labels() method returns a dict mapping each column name to its variable label:

import pandas as pd

data = pd.read_stata('auto.dta', iterator=True)
labels = data.variable_labels()
labels

Out[5]:
{'make': 'Make and Model',
 'price': 'Price',
 'mpg': 'Mileage (mpg)',
 'rep78': 'Repair Record 1978',
 'headroom': 'Headroom (in.)',
 'trunk': 'Trunk space (cu. ft.)',
 'weight': 'Weight (lbs.)',
 'length': 'Length (in.)',
 'turn': 'Turn Circle (ft.) ',
 'displacement': 'Displacement (cu. in.)',
 'gear_ratio': 'Gear Ratio',
 'foreign': 'Car type'}
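A slightly more defensive variant of the same approach, as a sketch: recent pandas versions recommend using StataReader as a context manager (and may emit a ResourceWarning when it is left open), and the same reader can return both the labels and the data. To keep it runnable offline, the sketch first writes a tiny hypothetical file, `auto_sample.dta`, with made-up rows and two of the auto labels. Putting the labels under the column names of describe() output is just one display choice, not a pandas feature:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for a real labeled .dta file.
sample = pd.DataFrame({"price": [4099, 4749, 3799], "mpg": [22, 17, 22]}, dtype="int16")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "auto_sample.dta")
    sample.to_stata(
        path,
        write_index=False,
        variable_labels={"price": "Price", "mpg": "Mileage (mpg)"},
    )

    # The context manager closes the underlying file when the block exits.
    with pd.read_stata(path, iterator=True) as reader:
        labels = reader.variable_labels()  # {column name: variable label}
        df = reader.read()                 # the data itself, as a DataFrame

print(labels)  # {'price': 'Price', 'mpg': 'Mileage (mpg)'}

# Show each label as a second column level under its variable name.
summary = df.describe()
summary.columns = pd.MultiIndex.from_tuples(
    [(col, labels.get(col, "")) for col in summary.columns]
)
print(summary)
```

Using labels.get(col, "") keeps the sketch working if a column has no stored label.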
