Create a column in a dataframe that is a string of characters summarizing data in other columns

Posted on 2020-03-27 10:22:24

I have a dataframe like this where the columns are the scores of some metrics:

I want to create a new column that summarizes which metrics in each row scored above a set threshold, using the column names as a string. So if the thresholds were A > 2, B > 3, C > 1, D > 3, I would want the new column to look like this:

A B C D NewCol  
4 3 3 1 AC  
2 5 2 2 BC  
3 5 2 4 ABCD

I tried using a series of np.where:

df['NewCol'] = np.where(df['A'] > 2, 'A', '')  
df['NewCol'] = np.where(df['B'] > 3, 'B', '')

etc.

but realized the result was being overwritten with the last metric any time the four metrics didn't all meet their conditions, like so:

A B C D NewCol  
4 3 3 1 C  
2 5 2 2 C  
3 5 2 4 ABCD

I am pretty sure there is an easier and correct way to do this.

Questioner

J.S.P.


import pandas as pd  

data = [[4, 3, 3, 1], [2, 5, 2, 2], [3, 5, 2, 4]]  
df = pd.DataFrame(data=data, columns=['A', 'B', 'C', 'D'])  
th = {'A': 2, 'B': 3, 'C': 1, 'D': 3}  
df['result'] = [''.join(k for k in df.columns if record[k] > th[k]) for record in df.to_dict('records')]  
print(df)
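An alternative that stays closer to the np.where attempt in the question is to append each metric's letter to the column instead of reassigning it. The following is only a sketch under a few assumptions: the sample data and the `th` threshold dictionary are taken from the question, the output column is named `NewCol` as in the question, and the intermediate names `new_col` and `flag` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data and thresholds from the question.
df = pd.DataFrame([[4, 3, 3, 1], [2, 5, 2, 2], [3, 5, 2, 4]],
                  columns=['A', 'B', 'C', 'D'])
th = {'A': 2, 'B': 3, 'C': 1, 'D': 3}

# Start from an empty string per row (hypothetical helper name).
new_col = pd.Series([''] * len(df), index=df.index, dtype=object)

for col, limit in th.items():
    # Column name where the score exceeds its threshold, '' otherwise.
    flag = pd.Series(np.where(df[col] > limit, col, ''),
                     index=df.index, dtype=object)
    # Concatenate rather than overwrite, so earlier letters are kept.
    new_col = new_col + flag

df['NewCol'] = new_col
print(df)
```

The only change from the original attempt is that each np.where result is added to the running string rather than assigned over it, so a row that fails a later condition keeps the letters it already earned.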
