Urgent: Launching pyspark shell demanding cluster resources fails with connection errors


#1

pyspark --master yarn --conf spark.ui.port=17818 --num-executors 2 --executor-memory 1g --executor-cores 4

Unable to launch the pyspark shell.

I have made repeated attempts, but it fails every time while connecting to one node or another.

As per a suggestion on a blog, it recommends increasing this ratio:

yarn.nodemanager.vmem-pmem-ratio
2.1
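For reference, this property lives in yarn-site.xml on each NodeManager. The property name is real; 2.1 is the YARN default, and raising it is the blog's suggestion, not a verified fix for this error:

```xml
<!-- yarn-site.xml on each NodeManager; changing it requires a NodeManager restart -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <!-- 2.1 is the YARN default: virtual memory allowed per MB of
       physical memory before the NodeManager kills the container -->
  <value>2.1</value>
</property>
```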

Could you please suggest what needs to be done here?

    ... 1 more

Caused by: java.nio.channels.ClosedChannelException
18/06/28 00:04:30 ERROR TransportClient: Failed to send RPC 7339423828195556328 to wn03.itversity.com/172.16.1.104:37569: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
18/06/28 00:04:30 WARN NettyRpcEndpointRef: Error sending message [message = RequestExecutors(0,0,Map())] in 1 attempts
java.io.IOException: Failed to send RPC 7339423828195556328 to wn03.itversity.com/172.16.1.104:37569: java.nio.channels.ClosedChannelException
at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
18/06/28 00:04:33 ERROR TransportClient: Failed to send RPC 6490132699757942167 to wn03.itversity.com/172.16.1.104:37569: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
18/06/28 00:04:33 WARN NettyRpcEndpointRef: Error sending message [message = RequestExecutors(0,0,Map())] in 2 attempts
java.io.IOException: Failed to send RPC 6490132699757942167 to wn03.itversity.com/172.16.1.104:37569: java.nio.channels.ClosedChannelException
at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)


#2

This happens specifically when the --executor-cores parameter is set to a higher value.

Each time it has to use resources from other servers in the cluster, it fails! It only works with settings that fit within a single server's resources.
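One way to reason about whether the request can fit on a node (a sketch of the documented defaults, not a confirmed diagnosis of this failure): on YARN, each Spark executor container asks for the executor memory plus an overhead, which in Spark 1.x/2.x defaults to max(384 MB, 10% of executor memory), and the NodeManager enforces a virtual-memory cap of physical memory × yarn.nodemanager.vmem-pmem-ratio:

```python
def container_request_mb(executor_memory_mb, overhead_fraction=0.10, min_overhead_mb=384):
    """Approximate YARN container size for one Spark executor:
    executor memory plus spark.yarn.executor.memoryOverhead
    (default: max(384 MB, 10% of executor memory))."""
    overhead_mb = max(min_overhead_mb, int(executor_memory_mb * overhead_fraction))
    return executor_memory_mb + overhead_mb

def vmem_limit_mb(container_mb, vmem_pmem_ratio=2.1):
    """Virtual-memory cap the NodeManager enforces for that container."""
    return container_mb * vmem_pmem_ratio

# --executor-memory 1g from the command above
request = container_request_mb(1024)
print(request)                  # 1024 + 384 = 1408 MB per executor container
print(vmem_limit_mb(request))   # 1408 * 2.1 = 2956.8 MB virtual-memory cap
```

Note that --executor-cores does not change the memory request; it requires each NodeManager to have that many vcores free (bounded by yarn.nodemanager.resource.cpu-vcores), so asking for 4 cores per executor can make containers unplaceable on smaller nodes even when memory would fit.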