PySpark launch fails with winutils error

pyspark

#1

I am trying to launch PySpark through the Command Prompt and get the following error.

"
C:\WINDOWS\system32>pyspark
Python 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:19:30) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/11/22 09:50:13 INFO SparkContext: Running Spark version 1.6.3
17/11/22 09:50:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/22 09:50:14 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:363)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
at org.apache.hadoop.security.Groups.<init>(Groups.java:86)
at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:748)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:621)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2214)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2214)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2214)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:322)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Unknown Source)
17/11/22 09:50:14 INFO SecurityManager: Changing view acls to: karth
17/11/22 09:50:14 INFO SecurityManager: Changing modify acls to: karth
17/11/22 09:50:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(karth); users with modify permissions: Set(karth)
17/11/22 09:50:15 INFO Utils: Successfully started service 'sparkDriver' on port 64099.
17/11/22 09:50:15 INFO Slf4jLogger: Slf4jLogger started
17/11/22 09:50:15 INFO Remoting: Starting remoting
17/11/22 09:50:15 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.80.1:64112]
17/11/22 09:50:15 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 64112.
17/11/22 09:50:15 INFO SparkEnv: Registering MapOutputTracker
17/11/22 09:50:15 INFO SparkEnv: Registering BlockManagerMaster
17/11/22 09:50:15 INFO DiskBlockManager: Created local directory at C:\Users\karth\AppData\Local\Temp\blockmgr-84aa1fa8-8f51-4bba-8add-482a4e613a1a
17/11/22 09:50:15 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
17/11/22 09:50:15 INFO SparkEnv: Registering OutputCommitCoordinator
17/11/22 09:50:16 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/11/22 09:50:16 INFO SparkUI: Started SparkUI at http://192.168.80.1:4040
17/11/22 09:50:16 INFO Executor: Starting executor ID driver on host localhost
17/11/22 09:50:16 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 64131.
17/11/22 09:50:16 INFO NettyBlockTransferService: Server created on 64131
17/11/22 09:50:16 INFO BlockManagerMaster: Trying to register BlockManager
17/11/22 09:50:16 INFO BlockManagerMasterEndpoint: Registering block manager localhost:64131 with 511.1 MB RAM, BlockManagerId(driver, localhost, 64131)
17/11/22 09:50:16 INFO BlockManagerMaster: Registered BlockManager
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.3
      /_/

Using Python version 2.7.14 (v2.7.14:84471935ed, Sep 16 2017 20:19:30)
SparkContext available as sc, HiveContext available as sqlContext.

"

I had already set the environment variable for winutils.

I also tried running CMD with 'Run as Administrator', but winutils is still not recognized in the command prompt.
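From what I understand, Hadoop builds the path it probes as %HADOOP_HOME%\bin\winutils.exe (falling back to the hadoop.home.dir JVM property), so the "null" in "null\bin\winutils.exe" means the JVM never saw HADOOP_HOME. A quick sanity check from the same prompt (a minimal sketch; it only assumes winutils.exe is meant to live under %HADOOP_HOME%\bin):

import os

# Hadoop resolves winutils as <HADOOP_HOME>\bin\winutils.exe; "null" in the
# error means HADOOP_HOME was not visible to the JVM that pyspark started.
hadoop_home = os.environ.get("HADOOP_HOME")
print("HADOOP_HOME = %s" % hadoop_home)
if hadoop_home:
    winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
    print("%s exists: %s" % (winutils, os.path.exists(winutils)))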

Still, the error persists while launching pyspark. Thanks.


#2

Since it was unable to locate winutils.exe, I added it to the PATH.

Now I get a different error, as shown below:


D:\Python27\python.exe D:/PycharmProjects/sparkdemo/src/main/python/SparkDemo.py
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/11/23 08:32:15 INFO SparkContext: Running Spark version 1.6.3
17/11/23 08:32:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/23 08:32:15 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable C:\winutils\bin\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:363)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
at org.apache.hadoop.security.Groups.<init>(Groups.java:86)
at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:748)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:621)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2214)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2214)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2214)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:322)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Unknown Source)
17/11/23 08:32:15 INFO SecurityManager: Changing view acls to: karth
17/11/23 08:32:15 INFO SecurityManager: Changing modify acls to: karth

I had once set the above environment variable by mistake with an extra bin directory. Even though I changed it back, Windows was still looking in that old location.
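As far as I can tell, that matches how Hadoop composes the path: it appends \bin\winutils.exe to whatever Hadoop home it resolved (the hadoop.home.dir property or HADOOP_HOME), so the doubled bin\bin means the value the JVM actually saw still ended in \bin. A simplified sketch of that logic, not Hadoop's actual code:

import os

# Simplified idea behind Hadoop's winutils lookup: append bin\winutils.exe
# to the resolved Hadoop home directory.
def winutils_probe_path(hadoop_home):
    return os.path.join(hadoop_home, "bin", "winutils.exe")

print(winutils_probe_path(r"C:\winutils"))      # C:\winutils\bin\winutils.exe (intended)
print(winutils_probe_path(r"C:\winutils\bin"))  # C:\winutils\bin\bin\winutils.exe (the error above)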

After placing winutils.exe in 'C:\winutils\bin\bin', it works as expected.

Please note that all my environment variables point to "C:\winutils\bin\winutils.exe", but the Hadoop binaries are looking in a different directory.
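If a stale machine-level variable keeps winning, one workaround is to set HADOOP_HOME for the current process before the SparkContext (and its JVM) is created, since the launched JVM inherits this process's environment. A minimal sketch, assuming winutils.exe sits at C:\winutils\bin\winutils.exe (note that HADOOP_HOME should be the folder above bin, not bin itself and not the exe):

import os

# Must run before the SparkContext (and its JVM) is created; the child JVM
# inherits this process's environment.
os.environ["HADOOP_HOME"] = r"C:\winutils"  # assumed install location
os.environ["PATH"] = r"C:\winutils\bin;" + os.environ["PATH"]

from pyspark import SparkContext
sc = SparkContext("local[*]", "SparkDemo")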

Now this issue is fixed and I am able to execute Spark code through PyCharm and pyspark without any error.

Thanks
Karthick.


#3

Can you paste a screenshot of your new environment variables and PATH?