Reproduced from stackoverflow.com.
Tags: apache-spark, jdbc, oracle, scala, sql

java.sql.SQLException: Unrecognized SQL type -102 while connecting to Oracle Database from Apache Spark

Posted on 2020-03-27 10:16:34

I'm trying to load a remote Oracle Database table on to Apache Spark shell.

This is how I started the spark-shell.

./spark-shell --driver-class-path ../jars/ojdbc6.jar --jars ../jars/ojdbc6.jar --master local

And I get a Scala prompt, where I try to load an Oracle database table like below. (I use a custom JDBC URL)

val jdbcDF = spark.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=WHATEVER)(HOST=myDummyHost.com)(PORT=xxxx)))(CONNECT_DATA=(SERVICE_NAME=dummy)(INSTANCE_NAME=dummyKaMummy)(UR=A)(SERVER=DEDICATED)))")
  .option("dbtable", "THE_DUMMY_TABLE")
  .option("user", "DUMMY_USER")
  .option("password", "DUMMYPASSWORD")
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .load()

(Replaced employer data with dummy variables)

And then I get this error.

java.sql.SQLException: Unrecognized SQL type -102
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:246)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:315)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:63)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
  ... 49 elided

I tried to see if there is an issue with the quotes, but it's not that.

Can somebody save my life, please?

Questioner: Aryan Venkat
Viewed: 292

Answered by quasipolynomial on 2019-07-03 21:21

The problem is an incompatible column type in the database: Spark's JDBC reader throws this error when it meets a vendor-specific type code it cannot map (for Oracle, type code -102 is TIMESTAMP WITH LOCAL TIME ZONE). If you cannot modify the database but would still like to read it, the solution is to skip the incompatible columns (in my case it was a field of type geography). With the help of How to select specific columns through Spark JDBC?, here's a solution in PySpark (a Scala solution would be similar):

# Push down a subquery that selects only the compatible columns. Note that
# Oracle rejects AS in table aliases, so the alias follows the subquery directly.
df = spark.read.jdbc(url=connectionString, table="(select colName from Table) CompatibleTable", properties=properties)
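For completeness, here is a rough Scala equivalent of the same column-pruning workaround, matching the asker's spark-shell setup. This is a sketch, not the answerer's code: connectionString stands in for the full jdbc:oracle:thin:@... URL from the question, and ts_col / other_col are hypothetical column names.

val prunedDF = spark.read.format("jdbc")
  .option("url", connectionString) // the same jdbc:oracle:thin:@... URL as in the question
  // The subquery runs on the Oracle side, so the incompatible column never
  // reaches Spark's type mapping. Again, no AS before the table alias.
  .option("dbtable", "(SELECT other_col FROM THE_DUMMY_TABLE) CompatibleTable")
  .option("user", "DUMMY_USER")
  .option("password", "DUMMYPASSWORD")
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .load()

// To keep the offending column instead of dropping it, cast it inside the
// pushed-down query (AS is valid for column aliases, just not table aliases):
//   "(SELECT CAST(ts_col AS TIMESTAMP) AS ts_col, other_col FROM THE_DUMMY_TABLE) CompatibleTable"

An alternative that avoids rewriting queries is to teach Spark how to map the unknown type code by registering a custom JdbcDialect before the read. This is a hedged sketch of that approach, not part of the original answer; it hard-codes -102 (Oracle's TIMESTAMPLTZ) and maps it to TimestampType, which discards the local-time-zone semantics:

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, MetadataBuilder, TimestampType}

// Map Oracle's TIMESTAMP WITH LOCAL TIME ZONE (type code -102) to TimestampType.
object OracleLtzDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")
  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
    if (sqlType == -102) Some(TimestampType)
    else None // None falls through to the built-in Oracle dialect
}

JdbcDialects.registerDialect(OracleLtzDialect) // registered dialects take precedence

With the dialect registered, the original spark.read call from the question should resolve the table schema without the "Unrecognized SQL type -102" error.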