Warm tip: This article is reproduced from stackoverflow.com, please click
group-by sql sql-server tsql

SQL-query for grouping continuous numbers

发布于 2020-03-27 10:17:55

Lets say i have the following table:

| ... | orderId | serialNumber | type          | ... |
|-----|---------|--------------|---------------|-----|
| ... | 1       | 01           | someType      | ... |
| ... | 1       | 02           | someType      | ... |
| ... | 1       | 03           | someOtherType | ... |
| ... | 1       | 04           | someOtherType | ... |
| ... | 1       | 05           | someType      | ... |
| ... | 1       | 06           | someType      | ... |
| ... | 2       | 07           | someType      | ... |
| ... | 2       | 08           | someType      | ... |

I want my query to generate the following result:

| orderId | count | min | max | type          |
|---------|-------|-----|-----|---------------|
| 1       | 2     |  01 | 02  | someType      |
| 1       | 2     |  03 | 04  | someOtherType |
| 1       | 2     |  05 | 06  | someType      |
| 2       | 4     |  07 | 08  | someType      |

I want groups by type and orderId. When the serialNumber is not ongoing, a new entry should be created.

This is my current query:

SELECT
 orderId,
 count(*) AS count,
 min(serialNumber) AS min,
 max(serialNumber) AS max,
 type
FROM tblMyTable
group by type, orderId
order by orderId

but the result it creates is wrong:

| orderId | count | min | max | type          |
|---------|-------|-----|-----|---------------|
| 1       | 4     |  01 | 06  | someType      |  <-- this should be 2 entries
| 1       | 2     |  03 | 04  | someOtherType |
| 2       | 2     |  07 | 08  | someType      |

As you can see it does not detect that serialNumber 03 and 04 belongs to another description. This leads to an incorrect count and max-value.

I don't know how to add a criteria that checks for a continuous serialNumber.

EDIT: There can also occur single records like:

 | orderId | count | min | max | type          |
 |---------|-------|-----|-----|---------------|
 | 1337    | 1     |  10 | 10  | someNewType   | 
Questioner
daly
Viewed
110
Gordon Linoff 2019-07-03 22:07

This is a gaps and islands problem, but I would use the difference of row numbers approach:

select orderid, count(*), min(serialNumber), max(serialNumber), type
from (select t.*,
             row_number() over (partition by orderid order by serialnumber) as seqnum,
             row_number() over (partition by orderid, type order by serialnumber) as seqnum_type
      from t
     ) t
group by orderid, type, (seqnum - seqnum_type)
order by orderid, min(serialNumber);

Your serialnumber looks like a string, but it has numeric values. If you can trust the sequencing of it, you don't even need both row number values:

select orderid, count(*), min(serialNumber), max(serialNumber), type
from (select t.*,
             row_number() over (partition by orderid, type order by serialnumber) as seqnum_type
      from t
     ) t
group by orderid, type, (serialnumber - seqnum_type)
order by orderid, min(serialNumber);

The reason this works is a little hard to explain, but quite obvious if you run the subqueries. You will see how the difference between the row numbers identifies the groups that you want to identify.