Warm tip: This article is reproduced from serverfault.com, please click
sas

Summarise and calculate the items specifically in the dataset using proc sql

发布于 2020-12-01 17:02:22

My dataset and attempt

data mydata;
input Category $ Item $;
datalines;
A 1
A 1 
A 2
B 3
B 1
;

proc sql;
create table mytable as 
    select *, count(Category) as Total_No_in_Category, count(Category)-count(item, "3") as No_of_not_3_in_the_same_category from mydata
    group by Category;
run;

Result

Category     No_of_not_3_in_the_same_category   Total_No_in_Category


  A                      3                          3
  A                      3                          3
  A                      3                          3
  B                      2                          2
  B                      2                          1

My expected result

 Category   No_of_not_3_in_the_same_ category   Total_No_in_Category
    A                          2                       3
    B                          1                       2

I wonder how to achieve the expected result using only proc SQL. Thank you so much.

Questioner
Alana
Viewed
0
Richard 2020-12-02 03:02:59

The two argument COUNT(item, "3") function call is not an summary function. That causes all rows from original table to be automatically remerged with the aggregate computation (those count()). The remerge is a proprietary feature of SAS Proc SQL and not part of the ANSI Standard for SQL.

You appear to want the number of unique non-3 item values, so you will need a

COUNT(DISTINCT ...expression...)

in the query. The ...expression... can be a case clause that transforms item="3" to a null value by not having an else part of the case clause.

Example:

create table want as
select
  category
  , count(*) as freq
  , count(distinct case when item ne "3" then item end) as n_unq_item_not_3
  from mydata
  group by category
  ;