Warm tip: This article is reproduced from stackoverflow.com, please click
count countif pandas python

How to count occurrences based on multiple criteria in a DataFrame

发布于 2020-04-13 09:39:05

I'm trying to figure out how to count a number of occurrences in the DataFrame using multiple criteria. In this particular example, I'd like to know the number of female passengers in Pclass 3.

    PassengerId Pclass  Sex Age SibSp   Parch   Ticket  Fare    Cabin   Embarked
0       892 3   male    34.5    0   0   330911  7.8292  NaN Q
1       893 3   female  47.0    1   0   363272  7.0000  NaN S
2       894 2   male    62.0    0   0   240276  9.6875  NaN Q
3       895 3   male    27.0    0   0   315154  8.6625  NaN S
4       896 3   female  22.0    1   1   3101298 12.2875 NaN S

Here's my few failed attempts:

    len(test[test["Sex"] == "female", test["Pclass"] == 3])
    sum(test.Pclass == 3 & test.Sex == "female")
    test.[test["Sex"] == "female", test["Pclass"] == 3].count()

None of them seem to be working. At the end I've created my own function, but there must be a simpler way to calculate that.

def countif(sex, pclass):
    x = 0
    for i in range(0,len(test)):
        s = test.iloc[i]['Sex']
        c = test.iloc[i]['Pclass']
        if s == sex and c == pclass:
                x = x + 1
    return x

Thank you in advance

Questioner
Krzysztof Bobkowski
Viewed
19
StupidWolf 2020-02-03 08:40

There are a few ways to do this:

test = pd.DataFrame({'PassengerId': {0: 892, 1: 893, 2: 894, 3: 895, 4: 896}, 
      'Pclass': {0: 3, 1: 3, 2: 2, 3: 3, 4: 3}, 
      'Sex': {0: 'male', 1: 'female', 2: 'male', 3: 'male', 4: 'female'}, 
      'Age': {0: 34.5, 1: 47.0, 2: 62.0, 3: 27.0, 4: 22.0}, 
      'SibSp': {0: 0, 1: 1, 2: 0, 3: 0, 4: 1}, 
      'Parch': {0: 0, 1: 0, 2: 0, 3: 0, 4: 1}, 
      'Ticket': {0: 330911, 1: 363272, 2: 240276, 3: 315154, 4: 3101298}, 
      'Fare': {0: 7.8292, 1: 7.0, 2: 9.6875, 3: 8.6625, 4: 12.2875}, 
      'Cabin': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan}, 
      'Embarked': {0: 'Q', 1: 'S', 2: 'Q', 3: 'S', 4: 'S'}})

You need to put the boolean in round brackets and join with an &

sum((test.Pclass == 3) & (test.Sex == "female"))
len(test[(test.Pclass == 3) & (test.Sex == "female")])
test[(test["Sex"] == "female") & (test["Pclass"] == 3)].shape[0]

Or you can do:

tab = pd.crosstab(df.Pclass,df.Sex)

Sex female  male
Pclass      
2   0   1
3   2   2

tab.iloc[tab.index==3]['female']