Tags: amazon-athena, amazon-web-services, aws-glue

Not able to read part files from Parquet

Posted on 2020-04-23 11:37:49

HIVE_CURSOR_ERROR: Can not read value at 0 in block 0 in file s3://xx/xxxx/part-xxxxxxxxxx.parquet.

I created the Parquet files using the AWS Glue dynamic frame write API. When I try to read them through AWS Athena via the Glue catalog table, I get this error.

When I read the same files through the Glue catalog via a dynamic frame, everything seems fine, but Athena gives me the above-mentioned error.

It worked with the Avro format; there seems to be no issue there.

CREATE EXTERNAL TABLE `table_name`(
`column_name_1` string, 
`column_name_2` string
 )
 ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
 STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
 OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
 LOCATION
  's3://xxxxxxxxxxx/xxxxx/xxx/'
TBLPROPERTIES (
  'CrawlerSchemaDeserializerVersion'='1.0', 
  'CrawlerSchemaSerializerVersion'='1.0', 
  'UPDATED_BY_CRAWLER'='xxxxxxxxxx', 
  'averageRecordSize'='xxxxx', 
  'classification'='parquet', 
  'compressionType'='none', 
  'objectCount'='xxxxx', 
  'recordCount'='xxx', 
  'sizeKey'='xxxx', 
  'typeOfData'='file') 
Asked by Rahul Berry
Answer by Rahul Berry, 2020-02-08 15:39

The issue is with the smallint column in Athena when it contains null values.

Athena cannot read that column as smallint (whatever the other columns' data types are), and that is why we get the above-mentioned error.

A solution is to convert the smallint column to string before writing the Parquet files to S3.
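In a Glue job, this kind of cast is typically expressed as ApplyMapping-style tuples of (source column, source type, target column, target type). The sketch below is hypothetical (the schema and column names are invented for illustration, not taken from the question); it builds those tuples in plain Python so the smallint-to-string rewrite is visible, with the actual Glue call left as a comment:

```python
def cast_smallints_to_string(schema):
    """Build ApplyMapping-style (src, src_type, dst, dst_type) tuples,
    rewriting any smallint column to string so Athena's Parquet SerDe
    can read the resulting files. `schema` is a list of (name, type)."""
    return [
        (name, typ, name, "string" if typ == "smallint" else typ)
        for name, typ in schema
    ]

# Hypothetical schema resembling the table above, plus a smallint column.
schema = [
    ("column_name_1", "string"),
    ("column_name_2", "string"),
    ("count_col", "smallint"),  # the column Athena chokes on
]

mappings = cast_smallints_to_string(schema)
# In a Glue job, these tuples would be passed to
#   ApplyMapping.apply(frame=dyf, mappings=mappings)
# before glueContext.write_dynamic_frame, so the written Parquet
# files store the value as a string that Athena can read.
print(mappings)
```

Casting to string sidesteps the SerDe issue at the cost of losing the numeric type in Athena; the column can still be cast back with CAST in queries if needed.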