How to control the number of Hadoop IPC retry attempts for a Spark job submission?

Posted on 2020-04-08 09:22:25

Suppose I attempt to submit a Spark (2.4.x) job to a Kerberized cluster without valid Kerberos credentials. In this case, the Spark launcher repeatedly tries to initiate a Hadoop IPC call and fails each time:

20/01/22 15:49:32 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "node-1.cluster/172.18.0.2"; destination host is: "node-1.cluster":8032; , while invoking ApplicationClientProtocolPBClientImpl.getClusterMetrics over null after 1 failover attempts. Trying to failover after sleeping for 35160ms.

This message repeats a number of times (30, in my case) until the launcher eventually gives up and the job submission is considered failed.

Several similar questions mention the following properties (these are actually YARN properties, prefixed with spark. as per the standard mechanism for passing them along with a Spark application):

spark.yarn.maxAppAttempts
spark.yarn.resourcemanager.am.max-attempts

However, neither of these properties affects the behavior I'm describing. How can I control the number of IPC retries in a Spark job submission?
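
For reference, the prefixing mechanism mentioned above can be sketched as follows. Spark documents the spark.hadoop. prefix as the way to forward arbitrary Hadoop properties into the Hadoop configuration it builds. The helper name build_submit_command and the file name my-app.jar are hypothetical, the values are illustrative, and whether any of these three client-side properties (which exist in yarn-default.xml and core-default.xml) actually governs the retry loop shown in the log above is an untested assumption.

```python
import shlex

# Prefix Spark uses to copy a property into the Hadoop Configuration.
SPARK_HADOOP_PREFIX = "spark.hadoop."


def build_submit_command(hadoop_props, app_resource):
    """Return a spark-submit argv that forwards Hadoop/YARN properties.

    Hypothetical helper for illustration; it only assembles arguments
    and does not launch anything.
    """
    argv = ["spark-submit", "--master", "yarn"]
    # Sort the keys so the generated command line is deterministic.
    for key, value in sorted(hadoop_props.items()):
        argv += ["--conf", f"{SPARK_HADOOP_PREFIX}{key}={value}"]
    argv.append(app_resource)
    return argv


# Candidate client-side retry settings. Assumption: these may be the
# relevant knobs; the values are illustrative, not recommendations.
candidate_props = {
    "yarn.client.failover-max-attempts": 3,
    "yarn.resourcemanager.connect.max-wait.ms": 10000,
    "ipc.client.connect.max.retries": 3,
}

command = build_submit_command(candidate_props, "my-app.jar")
print(" ".join(shlex.quote(arg) for arg in command))
```

Printing the command instead of running it makes it easy to check which properties are being forwarded before trying them against a real cluster.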

Questioner: Jeff Evans
