温馨提示:本文翻译自stackoverflow.com,查看原文请点击:其他 - Removing duplicates and selecting distinct values with struct array in Bigquery
google-bigquery

其他 - 在Bigquery中使用结构数组删除重复项并选择不同的值

发布于 2020-04-08 17:24:48

我正在开始使用BigQuery。我有一个看起来像这样的数据库, 可以生成为

    WITH T AS (
  SELECT 0 AS id, 'red' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(2, "dot"), (2, "dot"), (1, "string")] AS arr, DATE(2020,01,31) AS date UNION ALL
  SELECT 0 AS id, 'red' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(2, "dot"), (2, "dot"), (1, "string")] AS arr, DATE(2020,01,31) AS date UNION ALL
  SELECT 0 AS id, 'red' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(20, "dot"), (20, "dot"), (1, "string")] AS arr, DATE(2020,01,30) AS date UNION ALL
  SELECT 0 AS id, 'black' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(296, "dot"), (212, "plane"), (156, "cube")] AS arr, DATE(2020,01,31) AS date UNION ALL
  SELECT 0 AS id, 'black' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(296, "dot"), (212, "plane"), (156, "cube")] AS arr, DATE(2020,01,31) AS date UNION ALL
  SELECT 0 AS id, 'black' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(296, "dot"), (21, "plane"), (156, "cube")] AS arr, DATE(2020,01,30) AS date UNION ALL
  SELECT 0 AS id, 'black' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(296, "dot"), (2, "plane"), (156, "cube")] AS arr, DATE(2020,01,30) AS date UNION ALL
  SELECT 1 AS id, 'blue' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(4, "cube"), (4, "cube"), (4, "cube")], DATE(2020, 01, 31) AS date UNION ALL
  SELECT 2 AS id, 'orange' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(5, "string")], DATE(2020,01,31) AS date UNION ALL
  SELECT 2 AS id, 'orange' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(5, "string")], DATE(2020,01,30) AS date
)
SELECT *
FROM T;

我想选择每个不同的日期,并针对每个日期采用每种形状和每种ID和每种颜色的最大数量。例如,对于2020-01-31,红色0为2点1字符串,对于2020-01-30对于0黑色为296点21平面156立方体。数据中的行,日期和struct数组中可能会重复。

更准确地说,我希望查询结果看起来像这样 ,可以由

WITH T AS (
  SELECT DATE(2020,01,31) AS date, ARRAY<STRUCT<count INT64, shape STRING, id INT64, colour STRING>>[(2, "dot", 0, "red"), (1, "string", 0, "red"), (296, "dot", 0, "black"), (212, "plane", 0, "black"), (156, "cube", 0, "black"), (4, "cube", 1, "blue"), (5, "string", 2, "orange")] AS res UNION ALL
  SELECT DATE(2020,01,30) AS date, ARRAY<STRUCT<count INT64, shape STRING, id INT64, colour STRING>>[(20, "dot", 0, "red"), (1, "string", 0, "red"), (296, "dot", 0, "black"), (21, "plane", 0, "black"), (156, "cube", 0, "black"), (5, "string", 2, "orange")] AS res
)
SELECT *
FROM T;

我在两个问题上苦苦挣扎:删除重复项,并为数组的每一行选择ID和形状。例如查询

SELECT date, ARRAY_CONCAT_AGG(ARRAY((SELECT AS STRUCT MAX(count), shape FROM UNNEST(arr) GROUP BY shape)))
FROM T
GROUP BY date

返回我的副本。然后,我需要为每个嵌套行分配id和颜色。任何建议将不胜感激。

谢谢!

查看更多

提问者
John
被浏览
113
rtenha 2020-02-01 01:11
with T AS (),
unnested_and_unique as (
  select distinct id, colour, count, shape, date
  from T left join unnest(arr) x
)
select date,array_agg(struct(count,shape,id,colour)) as res
from unnested_and_unique
group by 1