I have a data frame with one column and I'd like to split it into two columns, with one column header as 'fips' and the other 'row'.
My dataframe df looks like this:
row
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
I do not know how to use df.row.str[:] to achieve my goal of splitting the row cell. I can use df['fips'] = hello to add a new column and populate it with hello. Any ideas?
fips row
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
There might be a better way, but here's one approach:
In [34]: import pandas as pd
In [35]: df
Out[35]:
row
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
In [36]: df = pd.DataFrame(df.row.str.split(' ', n=1).tolist(),
                           columns = ['fips','row'])
In [37]: df
Out[37]:
fips row
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
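As an alternative to the .tolist() approach above (not the answerer's method), here is a minimal sketch that uses the expand=True option of str.split. It assumes the sample data from the question is rebuilt by hand, and the name parts is only illustrative.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({"row": [
    "00000 UNITED STATES",
    "01000 ALABAMA",
    "01001 Autauga County, AL",
    "01003 Baldwin County, AL",
    "01005 Barbour County, AL",
]})

# Split on the first space only; expand=True returns a DataFrame
# with one column per piece instead of a Series of lists.
parts = df["row"].str.split(" ", n=1, expand=True)
parts.columns = ["fips", "row"]

print(parts)
```

Because the split result is built from the original Series, it keeps the original index without any extra argument.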
Be aware that .tolist() will remove any index you had, so your new DataFrame will be reindexed from 0 (it doesn't matter in your specific case).
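A small sketch of the index behaviour described here, assuming a hypothetical two-row frame with string labels "a" and "b" (these labels are not from the question). It also shows that passing the original index to the DataFrame constructor keeps the labels.

```python
import pandas as pd

df = pd.DataFrame(
    {"row": ["01001 Autauga County, AL", "01003 Baldwin County, AL"]},
    index=["a", "b"],  # hypothetical non-default index labels
)

# A plain Python list of [fips, name] pairs; the index is gone here.
pieces = df["row"].str.split(" ", n=1).tolist()

# Without index=, the new frame gets a fresh 0..n-1 index.
reindexed = pd.DataFrame(pieces, columns=["fips", "row"])

# Passing the original index keeps the labels.
preserved = pd.DataFrame(pieces, columns=["fips", "row"], index=df.index)

print(reindexed.index.tolist())
print(preserved.index.tolist())
```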
@Crashthatch: then again, you can just add index = df.index and you are good.
What if one cell can't be split?
AttributeError: 'DataFrame' object has no attribute 'row'
@Nisba: If any cell can't be split (e.g. the string doesn't contain any space in this case), it will still work, but one part of the split will be empty. Other problems arise if the column has mixed types, with at least one cell containing a number type. Then the split method returns NaN for that cell, and the tolist method returns this value as is (NaN), which results in a ValueError (to overcome this issue, you can cast the column to string type before splitting). I recommend trying it on your own; it's the best way of learning :-)
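The two edge cases in this comment can be tried with a short sketch. It assumes a hypothetical column holding one normal string, one integer, and one string without a space; none of these values come from the question.

```python
import pandas as pd

# Hypothetical mixed-type column: a normal string, an int, a string with no space.
df = pd.DataFrame({"row": ["01000 ALABAMA", 12345, "NOSPACE"]})

# On the raw column, the non-string cell comes back as NaN from .str.split.
raw = df["row"].str.split(" ", n=1)
print(raw.tolist())

# Casting to str first means every cell is split into a list.
pieces = df["row"].astype(str).str.split(" ", n=1).tolist()

# Rows with only one piece are padded with a missing value in the second column.
out = pd.DataFrame(pieces, columns=["fips", "row"], index=df.index)
print(out)
```

In this sketch the cells that can't be split land in the fips column, and their row value is missing, so a follow-up check with isna() on the second column is a simple way to find them.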