amazon-redshift amazon-web-services aws-glue pyspark

redshift glue job bigint issue

Posted on 2020-04-18 09:54:46

I have a Redshift database. In the database I created a table, and in the table I have a bigint column. I created a Glue job to insert data into Redshift, but the bigint field is not being inserted; there seems to be some issue with bigint. The job code is below. I am using Python 3 and Spark 2.2.

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['TempDir', 'JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read the source table from the Glue Data Catalog
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database="test", table_name="tbl_test", transformation_ctx="datasource0")

# Map source columns/types to target columns/types
applymapping1 = ApplyMapping.apply(
    frame=datasource0,
    mappings=[("testdata", "string", "testdata", "string"),
              ("selling", "bigint", "selling", "bigint")],
    transformation_ctx="applymapping1")

# Split any ambiguous (choice) types into separate columns
resolvechoice2 = ResolveChoice.apply(
    frame=applymapping1, choice="make_cols", transformation_ctx="resolvechoice2")

# Drop fields that are null across the frame
dropnullfields3 = DropNullFields.apply(
    frame=resolvechoice2, transformation_ctx="dropnullfields3")

# Write to Redshift through the catalog JDBC connection
datasink4 = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=dropnullfields3,
    catalog_connection="redshift_con",
    connection_options={"dbtable": "tbl_test", "database": "test"},
    redshift_tmp_dir=args["TempDir"],
    transformation_ctx="datasink4")
job.commit()
Questioner: user2768132
Viewed: 33
thehawkman 2020-01-09 05:47

Try using the Mapping: ("selling", "int", "selling", "long")

If this doesn't work, you should post what the "tbl_test" definition in the Glue Catalog looks like. The first type in your ApplyMapping should match the type listed in the Catalog's table definition.

I had a similar issue; it turned out the type on the Glue table created by the Glue Crawler in the console was 'int', not 'long', so the ApplyMapping in the Glue Job needed to be ("fieldName", "int", "fieldName", "long") for the Redshift type 'bigint'.
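As a sanity check outside the Glue runtime, the point above can be sketched in plain Python: each ApplyMapping tuple is (source_name, source_type, target_name, target_type), and the source type must match what the Glue Catalog lists, not the Redshift column type. The catalog_types dict below is an assumed stand-in for the Crawler-generated table definition, not something read from AWS.

    # Assumed Catalog types for tbl_test (as a Crawler might infer them):
    # "selling" is 'int' in the Catalog even though Redshift stores bigint.
    catalog_types = {"testdata": "string", "selling": "int"}

    # Corrected mappings: source type matches the Catalog ('int'),
    # target type matches the Redshift column ('long' -> bigint).
    mappings = [
        ("testdata", "string", "testdata", "string"),
        ("selling", "int", "selling", "long"),
    ]

    # If a source type disagrees with the Catalog, ApplyMapping can
    # silently fail to map the field; check each tuple up front.
    for src_name, src_type, _, _ in mappings:
        assert catalog_types[src_name] == src_type, src_name

In the job itself, this mappings list would be passed straight to ApplyMapping.apply(frame=datasource0, mappings=mappings, transformation_ctx="applymapping1").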

Interestingly, with the ApplyMapping set to ("field", "long", "field", "long"), the value remained in the Glue DynamicFrame and could even be printed to the logs immediately before writing, but the data would not be written to Redshift.

Hope this helps!