I have a Redshift database. In the database I created a table with a bigint column. I created a Glue job to insert data into Redshift, but the problem is with the bigint field: it is not being inserted. It seems there is an issue with bigint. The job code is below. I am using Python 3 and Spark 2.2.
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['TempDir', 'JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read the source table from the Glue Data Catalog
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database="test", table_name="tbl_test", transformation_ctx="datasource0")

# Map source columns/types to target columns/types
applymapping1 = ApplyMapping.apply(
    frame=datasource0,
    mappings=[("testdata", "string", "testdata", "string"),
              ("selling", "bigint", "selling", "bigint")],
    transformation_ctx="applymapping1")

resolvechoice2 = ResolveChoice.apply(
    frame=applymapping1, choice="make_cols", transformation_ctx="resolvechoice2")
dropnullfields3 = DropNullFields.apply(
    frame=resolvechoice2, transformation_ctx="dropnullfields3")

# Write to Redshift through the cataloged JDBC connection
datasink4 = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=dropnullfields3,
    catalog_connection="redshift_con",
    connection_options={"dbtable": "tbl_test", "database": "test"},
    redshift_tmp_dir=args["TempDir"],
    transformation_ctx="datasink4")
job.commit()
Try using the mapping: ("selling", "int", "selling", "long")

If that doesn't work, please post the "tbl_test" definition from the Glue catalog. The first type in ApplyMapping should match the type listed in the catalog table definition.
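Spelled out, the fix is only a change to the `mappings` list passed to `ApplyMapping.apply` (assuming the crawler cataloged the column as "int"; check your own catalog definition first):

```python
# Hypothetical corrected mapping: the source type (2nd element) must match
# the Glue catalog type, and the target type (4th element) must be "long",
# which is what Glue maps to Redshift's bigint.
mappings = [
    ("testdata", "string", "testdata", "string"),
    ("selling", "int", "selling", "long"),  # was ("selling", "bigint", "selling", "bigint")
]
```

In the job this list replaces the inline one, i.e. `ApplyMapping.apply(frame=datasource0, mappings=mappings, transformation_ctx="applymapping1")`.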
I had a similar issue, and it turned out that the type the Glue crawler put on the Glue table in the console was 'int', not 'long', so the ApplyMapping had to be ("fieldName", "int", "fieldName", "long") in the Glue job for the Redshift type 'bigint'.

Interestingly, when I used ApplyMapping as ("field", "long", "field", "long"), it let me keep the values in the Glue DynamicFrame and even print them to the logs immediately before the write, but it would not write the data to Redshift.

Hope this helps!
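The distinction behind "int" vs "long" here is bit width: Glue's "int" is a 32-bit signed integer, while "long" (the type Glue writes to a Redshift bigint) is 64-bit. A minimal illustrative sketch of the range difference (the type names mirror Glue's; the check itself is just for demonstration):

```python
# Signed ranges for Glue's 32-bit "int" and 64-bit "long" types.
RANGES = {
    "int":  (-2**31, 2**31 - 1),   # +/- ~2.1 billion
    "long": (-2**63, 2**63 - 1),   # matches Redshift bigint
}

def fits(type_name, value):
    """Return True if `value` is representable in the given type."""
    lo, hi = RANGES[type_name]
    return lo <= value <= hi

# A value that is legal in a Redshift bigint column but overflows "int":
big_value = 5_000_000_000
print(fits("int", big_value))   # False
print(fits("long", big_value))  # True
```

So if the catalog says the column is "int" but the mapping declares "bigint"/"long" as the source type, the types disagree and the column is dropped on the write rather than cast.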
Thanks for the reply. I will try this as well.
Thanks. It worked.