Consider a simplified example:
SELECT
lengthy_expression AS a,
g(a) AS b,
h(a) AS c,
...
FROM
my_table
Here, lengthy_expression
represents a complex expression that uses several fields from my_table
and spans several lines. Its result is used to calculate two other fields b
and c
. However, the above is not allowed in standard SQL, because an expression in the WHERE
clause is not allowed to refer to the result of another expression in the WHERE
clause (why not is beyond me).
The naive option is to repeat the lengthy_expression
, but that's what I explicitly want to avoid.
One option is to use a subquery:
SELECT
a,
g(a) AS b,
h(a) AS c,
...
FROM (
SELECT
lengthy_expression AS a,
...
FROM
my_table
)
But as you can see, I now need to repeat the other fields ...
that the outer query needs.
Maybe a subquery with a join then?
SELECT
a,
g(a) AS b,
h(a) AS c,
...
FROM
my_table
INNER JOIN (
SELECT
lengthy_expression AS a
FROM
my_table
) USING id
It works, but now there's a (possibly expensive) join that serves no purpose except to keep the query readable. And it doesn't even serve that purpose very well, because the lengthy_expression
is tucked away below the point where it's used, and the human reader has to jump all over the place to find out what's going on.
An alternative is using a CTE:
WITH
my_table_with_a AS (
SELECT
*,
lengthy_expression AS a
FROM
my_table
)
SELECT
*,
g(a) AS b,
h(a) AS c,
...
FROM
my_table_with_a
At least now the reading order more or less matches the logical order in which operations happen, but it's pretty verbose, and finding a good name for my_table_with_a
is difficult. Especially because in practice, I'm repeating this pattern two or three times.
Is there a better way to do this?
Below is for BigQuery Standard SQL
#standardSQL
SELECT
a,
g(a) AS b,
h(a) AS c,
FROM `project.dataset.my_table`,
UNNEST([lengthy_expression]) a
Below is extremely simplified example of above approach
#standardSQL
WITH `project.dataset.my_table` AS (
SELECT 1 x, 2 y, 3 z UNION ALL
SELECT 4, 5, 6
)
SELECT
a,
a / 2 AS b,
2 * a AS c
FROM `project.dataset.my_table`,
UNNEST([x + y + z]) a
WHERE a > 10
with result
Row a b c
1 15 7.5 30
Note: in case if result of lengthy_expression
is an ARRAY by itself - you need to enclose it into struct as array of array is not supported in BigQuery
Hah, that's a very interesting hack, thank you! Any idea if this has better/worse performance than the subquery or CTE approach?
I don't expect any performance hit here - but obviously less verbose and easier to maintenance