Error while applying to_date() and add_months function in Spark SQL

Posted on 2020-11-28 01:13:01

I have the following Hive table. The cycle_month column holds values in YYYYMM format.

+---------------+--------------+-------------+
| column_value  | metric_name  | cycle_month |
+---------------+--------------+-------------+
| A37B          | Mean         | 202005      |
| ACCOUNT_ID    | Mean         | 202005      |
| ANB_200       | Mean         | 202005      |
| ANB_201       | Mean         | 202006      |
| AS82_RE       | Mean         | 202006      |
| ATTR001       | Mean         | 202007      |
| ATTR001_RE    | Mean         | 202007      |
| ATTR002       | Mean         | 202008      |
| ATTR002_RE    | Mean         | 202008      |
| ATTR003       | Mean         | 202009      |
| ATTR004       | Mean         | 202009      |
| ATTR005       | Mean         | 202009      |
| ATTR006       | Mean         | 202010      |
+---------------+--------------+-------------+

I need to write a dynamic query that returns the rows whose cycle_month falls between a user-supplied cycle_month and the month four months before it.

Spark SQL Query:

select column_name, metric_name from table where cycle_month between add_months(to_date(202010,'YYYYMM'),-4) and 202010  

It fails with this error:

[Error 10015]: Line 1:323 Arguments length mismatch ''YYYYMM'': to_date() requires 1 argument, got 2 (state=21000,code=10015)

Expected output:

+---------------+--------------+-------------+
| column_value  | metric_name  | cycle_month |
+---------------+--------------+-------------+
| ANB_201       | Mean         | 202006      |
| AS82_RE       | Mean         | 202006      |
| ATTR001       | Mean         | 202007      |
| ATTR001_RE    | Mean         | 202007      |
| ATTR002       | Mean         | 202008      |
| ATTR002_RE    | Mean         | 202008      |
| ATTR003       | Mean         | 202009      |
| ATTR004       | Mean         | 202009      |
| ATTR005       | Mean         | 202009      |
| ATTR006       | Mean         | 202010      |
+---------------+--------------+-------------+

Asked by Arvinth

Answer by mck, posted 2020-11-28 15:22:03

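Y is not the correct pattern letter for year; it should be lowercase y, so the format to use is yyyyMM. See https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html for details. Also convert cycle_month itself with to_date, so that both sides of the BETWEEN comparison are dates: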

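-- Parse cycle_month and both bounds with the same pattern so dates are compared with dates.
-- column_value is the column name shown in the sample table.
SELECT 
    column_value, metric_name, cycle_month
FROM 
    table
WHERE 
    to_date(cycle_month, 'yyyyMM') BETWEEN 
        add_months(to_date('202010', 'yyyyMM'), -4)  -- lower bound: 2020-06-01
            AND 
        to_date('202010', 'yyyyMM')                  -- upper bound: 2020-10-01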
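
Since the query is built dynamically, another option is to compute the lower bound before the SQL text is generated and compare cycle_month as a plain number. The following is only a minimal sketch of that month arithmetic in Python, not part of the original answer. It assumes cycle_month is stored as an integer; the helper names shift_cycle_month and build_query and the table name metrics are hypothetical.

```python
def shift_cycle_month(cycle_month: int, months: int) -> int:
    """Shift a YYYYMM integer by a signed number of months."""
    year, month = divmod(cycle_month, 100)
    if not 1 <= month <= 12:
        raise ValueError(f"not a valid YYYYMM value: {cycle_month}")
    # Count months from year 0 so a year rollover is plain integer arithmetic.
    total = year * 12 + (month - 1) + months
    new_year, new_month0 = divmod(total, 12)
    return new_year * 100 + new_month0 + 1


def build_query(cycle_month: int, lookback: int = 4) -> str:
    """Build the query text for a hypothetical table named metrics."""
    lower = shift_cycle_month(cycle_month, -lookback)
    # Both bounds are validated integers, so formatting them into the text is safe.
    return (
        "SELECT column_value, metric_name, cycle_month "
        "FROM metrics "
        f"WHERE cycle_month BETWEEN {lower} AND {cycle_month}"
    )


if __name__ == "__main__":
    print(build_query(202010))
```

With this approach the filter stays on the raw cycle_month column and no date parsing happens per row. If the column is stored as a string instead, the two bounds would need to be quoted in the generated query.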