I have the following Hive table. The cycle_month column holds values in YYYYMM format.
+--------------+-------------+-------------+
| column_value | metric_name | cycle_month |
+--------------+-------------+-------------+
| A37B         | Mean        | 202005      |
| ACCOUNT_ID   | Mean        | 202005      |
| ANB_200      | Mean        | 202005      |
| ANB_201      | Mean        | 202006      |
| AS82_RE      | Mean        | 202006      |
| ATTR001      | Mean        | 202007      |
| ATTR001_RE   | Mean        | 202007      |
| ATTR002      | Mean        | 202008      |
| ATTR002_RE   | Mean        | 202008      |
| ATTR003      | Mean        | 202009      |
| ATTR004      | Mean        | 202009      |
| ATTR005      | Mean        | 202009      |
| ATTR006      | Mean        | 202010      |
+--------------+-------------+-------------+
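I need to write a dynamic query that returns the rows whose cycle_month lies between the cycle_month value passed in by the user and that value minus 4 months.
Spark SQL query: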
select column_name, metric_name from table where cycle_month between add_months(to_date(202010,'YYYYMM'),-4) and 202010
It fails with this error:
[Error 10015]: Line 1:323 Arguments length mismatch 'YYYYMM': to_date() requires 1 argument, got 2 (state=21000,code=10015)
Expected output:
+--------------+-------------+-------------+
| column_value | metric_name | cycle_month |
+--------------+-------------+-------------+
| ANB_201      | Mean        | 202006      |
| AS82_RE      | Mean        | 202006      |
| ATTR001      | Mean        | 202007      |
| ATTR001_RE   | Mean        | 202007      |
| ATTR002      | Mean        | 202008      |
| ATTR002_RE   | Mean        | 202008      |
| ATTR003      | Mean        | 202009      |
| ATTR004      | Mean        | 202009      |
| ATTR005      | Mean        | 202009      |
| ATTR006      | Mean        | 202010      |
+--------------+-------------+-------------+
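Y is not the correct pattern letter for the year; it should be y, so use yyyyMM. For details, see https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html. Also note that the error message says to_date() accepts only one argument, which suggests the statement was compiled by Hive rather than Spark SQL (Hive's to_date has no format parameter); the query below assumes it runs in Spark SQL.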
SELECT
column_name, metric_name, cycle_month
FROM
table
WHERE
to_date(cycle_month, 'yyyyMM') BETWEEN
add_months(to_date(202010, 'yyyyMM'), -4)
AND
to_date(202010, 'yyyyMM')
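
If the statement is assembled in application code before it is submitted (an assumption; the question does not say how the user's value is passed in), the lower bound can also be computed up front with plain month arithmetic, so the query compares cycle_month against two YYYYMM literals and needs neither to_date nor add_months. The following is only a minimal sketch: month_window and build_query are hypothetical helper names, "table" and the column names are the placeholders used in the queries above, and it assumes cycle_month is stored as an integer or a six-digit string.

```python
from typing import Tuple


def month_window(cycle_month: int, months_back: int = 4) -> Tuple[int, int]:
    """Return (start, end) as YYYYMM integers; start is months_back months before end."""
    year, month = divmod(cycle_month, 100)
    if not 1 <= month <= 12:
        raise ValueError(f"not a valid YYYYMM value: {cycle_month}")
    # Count months from year 0 so the subtraction carries across year boundaries.
    total = year * 12 + (month - 1) - months_back
    start_year, start_month0 = divmod(total, 12)
    return start_year * 100 + start_month0 + 1, cycle_month


def build_query(cycle_month: int, table: str = "table") -> str:
    """Build the SELECT statement for the window ending at cycle_month."""
    start, end = month_window(cycle_month)
    # Both bounds are validated integers, so formatting them into the text is safe here.
    return (
        "SELECT column_name, metric_name, cycle_month "
        f"FROM {table} "
        f"WHERE cycle_month BETWEEN {start} AND {end}"
    )


print(build_query(202010))
```

For 202010 the window is 202006 to 202010, which matches the expected output in the question; for a value early in the year, such as 202002, the lower bound rolls back into the previous year (201910).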