Set index label in pandas to_stata

Posted on 2020-11-30 20:26:29

I am trying to save a pandas DataFrame to a file in Stata format. More specifically, the index of the DataFrame needs to be saved too, and the header of the column that holds the index must have a specific text. In other words, I need to set an index label.

Pandas' to_csv has an index_label option, but pandas' to_stata function does not have an index_label option.

How can I set an index label when I am saving in Stata format?

Questioner
raffaem
Álvaro A. Gutiérrez-Vargas 2020-12-03 19:36:45

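There is a subtle difference between a DataFrame whose index already has a name (Case 1) and one whose index name has not been set beforehand (Case 2). In both cases, to_stata writes the index as the first variable of the file and uses the index name as the variable name.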

Case 1: index name already in place (using .set_index)

import pandas as pd

# sample data
data = [['Eren Jaeger', 15, 'Soldier'], ['Mikasa Ackerman', 14, 'Soldier'], ['Armin Arlert', 14, 'Soldier'], ['Levi Ackerman', 30, 'Captain']]
# create the DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age', 'Rank'])
# use the existing 'Rank' column as the index; the index name becomes "Rank"
df = df.set_index('Rank', drop=True)
# write the .dta file; the index is saved under its name, "Rank" (no need for .rename_axis(index='my_index'))
df.to_stata('stata_df_1.dta')

df
##                     Name  Age
## Rank                         
## Soldier      Eren Jaeger   15
## Soldier  Mikasa Ackerman   14
## Soldier     Armin Arlert   14
## Captain    Levi Ackerman   30
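
One way to confirm what Case 1 wrote is to read the file back with pd.read_stata. The sketch below is only a check, not part of the original answer: the temporary directory and the name case1_roundtrip are assumptions made for this example. It shows that the index name "Rank" ends up as the first variable of the .dta file.

```python
import os
import tempfile

import pandas as pd

data = [['Eren Jaeger', 15, 'Soldier'], ['Mikasa Ackerman', 14, 'Soldier'],
        ['Armin Arlert', 14, 'Soldier'], ['Levi Ackerman', 30, 'Captain']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Rank']).set_index('Rank')

# Write to a temporary directory (an assumption of this sketch) and read it back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'stata_df_1.dta')
    df.to_stata(path)
    # read_stata returns the former index as an ordinary column
    case1_roundtrip = pd.read_stata(path)

print(list(case1_roundtrip.columns))  # ['Rank', 'Name', 'Age']
```

Note that read_stata does not restore "Rank" as the index; call .set_index('Rank') on the result if that is needed.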

Case 2: index not named (.rename_axis(index='my_index') is required)

Based on @QuangHoang's comments, here is a way to set the index's name when it has not been named beforehand.

data = [['Eren Jaeger', 15] , ['Mikasa Ackerman', 14], ['Armin Arlert', 14],['Levi Ackerman', 30]]  
df = pd.DataFrame(data, columns = ['Name', 'Age'])

df
##              Name  Age
## 0      Eren Jaeger   15
## 1  Mikasa Ackerman   14
## 2     Armin Arlert   14
## 3    Levi Ackerman   30

# the first variable in the file is called "index" (the default) and holds the values 0 to 3
df.to_stata('stata_df_no_name.dta')

# the first variable in the file is called "my_index" and holds the values 0 to 3
df.rename_axis(index='my_index').to_stata('stata_df_2.dta')
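
To get something close to to_csv's index_label, the rename_axis call can be wrapped in a small function. This is a minimal sketch: to_stata_with_index_label is a hypothetical helper, not a pandas function, and the temporary directory is only there so the example cleans up after itself. The label should be a valid Stata variable name (no spaces, at most 32 characters), since pandas may otherwise alter it on write.

```python
import os
import tempfile

import pandas as pd


def to_stata_with_index_label(df, path, index_label):
    """Hypothetical helper: write df to a .dta file with the index stored under index_label."""
    # rename_axis returns a renamed copy, so the caller's DataFrame is left unchanged
    df.rename_axis(index=index_label).to_stata(path)


data = [['Eren Jaeger', 15], ['Mikasa Ackerman', 14],
        ['Armin Arlert', 14], ['Levi Ackerman', 30]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'stata_df_2.dta')
    to_stata_with_index_label(df, path, 'my_index')
    # read the file back to check the variable name that was written
    case2_roundtrip = pd.read_stata(path)

print(list(case2_roundtrip.columns))        # ['my_index', 'Name', 'Age']
print(case2_roundtrip['my_index'].tolist())  # [0, 1, 2, 3]
```

Because the helper works on a renamed copy, df.index.name is still None afterwards, which matches how index_label behaves in to_csv.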