I have the following Hive table. The cycle_month column
holds values in YYYYMM format.
+---------------+--------------+------------+
| column_value | metric_name |cycle_month |
+---------------+--------------+------------+
| A37B | Mean | 202005 |
| ACCOUNT_ID | Mean | 202005 |
| ANB_200 | Mean | 202005 |
| ANB_201 | Mean | 202006 |
| AS82_RE | Mean | 202006 |
| ATTR001 | Mean | 202007 |
| ATTR001_RE | Mean | 202007 |
| ATTR002 | Mean | 202008 |
| ATTR002_RE | Mean | 202008 |
| ATTR003 | Mean | 202009 |
| ATTR004 | Mean | 202009 |
| ATTR005 | Mean | 202009 |
| ATTR006 | Mean | 202010 |
+---------------+--------------+------------+
I need to write a dynamic query that returns the rows whose cycle_month falls between a user-supplied cycle_month and the month four months before it.
Spark SQL Query:
select column_name, metric_name from table where cycle_month between add_months(to_date(202010,'YYYYMM'),-4) and 202010
It fails with this error:
[Error 10015]: Line 1:323 Arguments length mismatch ''YYYYMM'': to_date() requires 1 argument, got 2 (state=21000,code=10015)
Expected output:
+---------------+--------------+------------+
| column_value | metric_name |cycle_month |
+---------------+--------------+------------+
| ANB_201 | Mean | 202006 |
| AS82_RE | Mean | 202006 |
| ATTR001 | Mean | 202007 |
| ATTR001_RE | Mean | 202007 |
| ATTR002 | Mean | 202008 |
| ATTR002_RE | Mean | 202008 |
| ATTR003 | Mean | 202009 |
| ATTR004 | Mean | 202009 |
| ATTR005 | Mean | 202009 |
| ATTR006 | Mean | 202010 |
+---------------+--------------+------------+
Y is not the correct pattern letter for the year; it should be y. You should use yyyyMM. See https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html for details.
SELECT
    column_name, metric_name, cycle_month
FROM
    table
WHERE
    to_date(cast(cycle_month AS string), 'yyyyMM') BETWEEN
        add_months(to_date('202010', 'yyyyMM'), -4)
    AND
        to_date('202010', 'yyyyMM')
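One caveat about the error in the question: "to_date() requires 1 argument, got 2" with code 10015 looks like a Hive message, since Hive's to_date accepts only one argument. If the query is actually executed by Hive rather than Spark, a variant along these lines might work in both engines. It is only a sketch, assuming cycle_month is stored as an integer or string in yyyyMM form, that date_format is available (Hive 1.2 or later), and that table and column_name are the placeholder names from the question. It compares yyyyMM strings, which sort in chronological order.

```sql
-- Sketch only: avoids the two-argument to_date(), which Hive does not accept.
-- '202010' stands in for the user-supplied month.
-- The lower bound parses yyyyMM, steps back 4 months, and formats the
-- result back to yyyyMM, so it evaluates to '202006' for this input.
SELECT
    column_name, metric_name, cycle_month
FROM
    table
WHERE
    cast(cycle_month AS string) BETWEEN
        date_format(
            add_months(from_unixtime(unix_timestamp('202010', 'yyyyMM')), -4),
            'yyyyMM')
    AND
        '202010'
```

Because both bounds are inclusive, this returns the months 202006 through 202010, which matches the expected output in the question.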