Warm tip: This article is reproduced from stackoverflow.com, please click
sas

SAS how to use first. with NOTSORTED

发布于 2020-04-15 10:51:41

I am having this issue

My code is

data step10;
  set step9;
  by referenceid NOTSORTED;
  if first.referenceid then JOIN_KEY=1;
  ELSE JOIN_KEY+1;
run;

Then output showing

enter image description here

This last two rows should be 2, since "MBA1" AND "MBA2" are already exists before.

Except these two rows should be 1, since it is unique.

How should I change my code?

Questioner
supercool djkazu
Viewed
29
Richard 2020-02-05 01:14

A by-group is the sequence of rows that are adjacent, having the same by-var values.

NOTSORTED is for processing groups constructed from by values that are contiguous yet not sorted.

All your sample data has by-groups of size 1 because none of the id values are repeated looking down the column.

Here are two techniques you can try:

  • sort the data by referenceid and <some-other-sequencing-variable> and do normal by group processing.
  • maintain a hash of referenceid and hit-counts as you process the data set

Hash example (my sequenceId === your join_key):

data want;
  set have;
  if _n_ = 1 then do;
    declare hash ids();
    ids.defineKey('referenceid');
    ids.defineData('referenceid', 'sequenceId');
    ids.defineDone();
  end;

  if ids.find() ne 0 
    then sequenceId = 1;
    else sequenceId + 1;

  ids.replace();
run;