I am having this issue
My code is
data step10;
set step9;
by referenceid NOTSORTED;
if first.referenceid then JOIN_KEY=1;
ELSE JOIN_KEY+1;
run;
Then output showing
This last two rows should be 2, since "MBA1" AND "MBA2" are already exists before.
Except these two rows should be 1, since it is unique.
How should I change my code?
A by-group is the sequence of rows that are adjacent, having the same by-var values.
NOTSORTED
is for processing groups constructed from by values that are contiguous yet not sorted.
All your sample data has by-groups of size 1 because none of the id values are repeated looking down the column.
Here are two techniques you can try:
referenceid
and <some-other-sequencing-variable>
and do normal by group processing.referenceid
and hit-counts as you process the data setHash example (my sequenceId
=== your join_key
):
data want;
set have;
if _n_ = 1 then do;
declare hash ids();
ids.defineKey('referenceid');
ids.defineData('referenceid', 'sequenceId');
ids.defineDone();
end;
if ids.find() ne 0
then sequenceId = 1;
else sequenceId + 1;
ids.replace();
run;