Warm tip: This article is reproduced from stackoverflow.com, please click
dataframe lookup merge pandas python

Pandas pickup and merge data from n-dataframes into one dataframe

发布于 2020-04-21 11:08:49

I have already looked up another similar q&a on the subject but I can not figure out my issue so I appreciate any hint. I have three data frames stored in the dictonary and another dataframe when I want to gather data.

import numpy as np
import pandas as pd

df1= pd.DataFrame({'tenor':['1w', '1m', '3m', '2y'],
                   'rate':[2.40, 2.51, 2.66, 2.92],
                   'end_date':['14022020', '09022020', '07052020', '07022022']})

df2= pd.DataFrame({'tenor':['3x6', '6x9', '9x12'],
                   'rate':[2.95, 3.06, 3.98],
                   'end_date':['07082020', '09112020', '08022021']})

df3= pd.DataFrame({'tenor':['2y', '3y', '4y'],
                   'rate':[1.80, 1.81, 1.84],
                   'end_date':['08022022', '07022023', '07022024']})


rates = {'ois':df1, 'fra':df2, 'irs':df3}


dfA= pd.DataFrame({'label':['ois', 'ois', 'fra', 'fra', 'irs', 'irs', 'irs'],
                   'tenor':['1w', '1m', '3x6', '9x12', '2y', '3y', '4y']})

I would like to add another columne ['rates'] in dfA by picking up the values from the coresponding dataframes (via dict. mapping) that match ['tenor']. So the expected outcome would be like this:

Out[]: 
  label tenor  rate
0   ois    1w  2.40
1   ois    1m  2.51
2   fra   3x6  2.95
3   fra  9x12  3.98
4   irs    2y  1.80
5   irs    3y  1.81
6   irs    4y  1.84

I know I can get particular data in the dataframe with this line (for instance):

rates['ois'].loc[rates['ois']['tenor']=='1w', 'rate']
Out[]: 
0    2.4
Name: rate, dtype: float64

So I try to embed it into apply() function with the following code:

dfA['rate'] = dfA.apply(lambda x: rates[x['label']][rates[x['label']]['tenor']==x['tenor']]['rate'], axis=1)

But unfortunately it returns:

Out[]: 
  label tenor  rate
0   ois    1w  2.40
1   ois    1m   NaN
2   fra   3x6  2.95
3   fra  9x12   NaN
4   irs    2y  1.80
5   irs    3y   NaN
6   irs    4y   NaN

I do not understand why some rates are NaN. What am I missing here? Please help.

Questioner
halny
Viewed
27
Michel Guimarães 2020-02-06 04:23

I don't know if this is a plausive solution for you, but I'd do like below:

  1. Concat the dfs in another DataFrame:

    dfAux = pd.concat([df1, df2, df3])

  2. Do a merge, using dfA on left:

    dfA = pd.merge(dfA, dfAux, how = 'left', on = ['tenor']).drop(['end_date'], axis = 1)