Note: this article is reproduced from serverfault.com.

Doing value lookups between dataframes with pandas?

Posted on 2020-12-01 20:53:48

I have a dataframe with users and their two top features

df_a: [image: table with columns UID, top_feature, 2nd_feature]

I have a second dataframe with the actual values for these features

df_b: [image: table with a UID column and one column of values per feature]

I am trying to look up the actual value from df_b, using the given top features from df_a, to get something like this:

df_c: [image: df_a with the looked-up value added for each feature]

I am currently doing this lookup with a for loop, and it is quite slow. I'm hoping there's a more appropriate way. Thanks.

Questioner
L Xandor
Quang Hoang 2020-12-02 05:02:40

Something like this would work for you:

# Melt both frames to long format, join on (UID, feature name), then pivot back to wide
df_c = (
    df_a.melt('UID', value_name='variable', var_name='feat')
        .merge(df_b.melt('UID'), on=['UID', 'variable'])
        .pivot(index='UID', columns='feat')
)

Output:

        variable                   value            
feat 2nd_feature top_feature 2nd_feature top_feature
UID                                                 
123        feat2       feat3    0.720324    0.000114
124        feat3       feat1    0.092339    0.302333
125        feat2       feat1    0.345561    0.186260
126        feat2       feat3    0.419195    0.685220
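
To see what each step of the melt/merge/pivot chain does, here is a minimal sketch on small made-up data. The frames and values are hypothetical, since the originals were only posted as images. The final line, which flattens the two-level column index into single names such as top_feature_value, is an optional extra and not part of the answer above.

```python
import pandas as pd

# Hypothetical sample data (not the asker's actual frames)
df_a = pd.DataFrame({
    "UID": [1, 2],
    "top_feature": ["feat2", "feat1"],
    "2nd_feature": ["feat1", "feat2"],
})
df_b = pd.DataFrame({
    "UID": [1, 2],
    "feat1": [0.1, 0.3],
    "feat2": [0.2, 0.4],
})

# Long format: one row per (UID, rank), with the feature name in 'variable'
long_a = df_a.melt("UID", var_name="feat", value_name="variable")

# Long format: one row per (UID, feature name), with the number in 'value'
long_b = df_b.melt("UID")

# Join on both keys so each requested feature picks up its value
merged = long_a.merge(long_b, on=["UID", "variable"])

# Back to one row per UID; columns become a (kind, feat) MultiIndex
wide = merged.pivot(index="UID", columns="feat")

# Optional: flatten the MultiIndex into names like 'top_feature_value'
wide.columns = [f"{feat}_{kind}" for kind, feat in wide.columns]
print(wide)
```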

Or, a little more manually, with DataFrame.lookup (deprecated in pandas 1.2 and removed in 2.0, so this needs an older pandas):

df_b = df_b.set_index('UID')

for col in ['top_feature', '2nd_feature']:
    df_a[f'{col}_value'] = df_b.lookup(df_a['UID'], df_a[col])

This modifies df_a in place to:

   UID top_feature 2nd_feature  top_feature_value  2nd_feature_value
0  123       feat3       feat2           0.000114           0.720324
1  124       feat1       feat3           0.302333           0.092339
2  125       feat1       feat2           0.186260           0.345561
3  126       feat3       feat2           0.685220           0.419195
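
On pandas versions without DataFrame.lookup, the same row/column pairing can be done with NumPy integer indexing. This is a sketch on hypothetical sample data, not the answerer's code: the frames, the values, and the missing-key check are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data (not the asker's actual frames)
df_a = pd.DataFrame({
    "UID": [123, 124, 125],
    "top_feature": ["feat3", "feat1", "feat1"],
    "2nd_feature": ["feat2", "feat3", "feat2"],
})
df_b = pd.DataFrame({
    "UID": [123, 124, 125],
    "feat1": [0.11, 0.21, 0.31],
    "feat2": [0.12, 0.22, 0.32],
    "feat3": [0.13, 0.23, 0.33],
})

values = df_b.set_index("UID")
arr = values.to_numpy()

# Integer row position in `values` for every UID in df_a (-1 if missing)
row_pos = values.index.get_indexer(df_a["UID"])
if (row_pos < 0).any():
    raise KeyError("some UIDs in df_a are missing from df_b")

df_c = df_a.copy()
for col in ["top_feature", "2nd_feature"]:
    # Integer column position for every feature name in this column
    col_pos = values.columns.get_indexer(df_a[col])
    if (col_pos < 0).any():
        raise KeyError(f"unknown feature name in column {col!r}")
    # Pairwise pick: element i is arr[row_pos[i], col_pos[i]]
    df_c[f"{col}_value"] = arr[row_pos, col_pos]

print(df_c)
```

The explicit checks matter because get_indexer returns -1 for labels it cannot find, and NumPy would silently read -1 as "last row" or "last column".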