Error when running Spark to read content from a file (the first program to validate Spark with PyCharm)

The code is:

from pyspark import SparkConf, SparkContext
sc = SparkContext('local', 'Spark Demo')
print(sc.textFile("C:\deckofcards.txt").first())

When I run the file, I get this error:

C:\Users\aakdar\PycharmProjects\FirstProject\venv\Scripts\python.exe C:/Users/aakdar/PycharmProjects/FirstProject/Spark.py
Traceback (most recent call last):
  File "C:/Users/aakdar/PycharmProjects/FirstProject/Spark.py", line 2, in <module>
    sc = SparkContext('local', 'Spark Demo')
  File "C:\spark-1.6.3-bin-hadoop2.6\python\pyspark\context.py", line 112, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "C:\spark-1.6.3-bin-hadoop2.6\python\pyspark\context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "C:\spark-1.6.3-bin-hadoop2.6\python\pyspark\java_gateway.py", line 48, in launch_gateway
    SPARK_HOME = os.environ["SPARK_HOME"]
  File "C:\Users\aakdar\PycharmProjects\FirstProject\venv\lib\os.py", line 425, in __getitem__
    return self.data[key.upper()]
KeyError: 'SPARK_HOME'

Process finished with exit code 1

Please advise.

Please follow the blog below to develop a PySpark program using PyCharm on Windows 10.
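
If you prefer not to rely on a system-wide environment variable, one workaround (just a sketch, assuming Spark is unpacked at C:\spark-1.6.3-bin-hadoop2.6 as shown in your traceback) is to set SPARK_HOME from the script itself before creating the SparkContext:

import os

# Point SPARK_HOME at your Spark installation before PySpark launches its gateway.
# Adjust the path to wherever you unpacked Spark.
os.environ["SPARK_HOME"] = "C:\\spark-1.6.3-bin-hadoop2.6"

from pyspark import SparkContext

sc = SparkContext("local", "Spark Demo")
print(sc.textFile("C:\\deckofcards.txt").first())

Alternatively, set SPARK_HOME in the PyCharm run configuration (Run > Edit Configurations > Environment variables).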

I already followed all the steps from the Udemy course.
Same error here.
Any advice?

I set the environment variable inside PyCharm, and the error changed:
C:\Users\aakdar\PycharmProjects\FirstProject\venv\Scripts\python.exe C:/Users/aakdar/PycharmProjects/FirstProject/Spark.py
Traceback (most recent call last):
  File "C:/Users/aakdar/PycharmProjects/FirstProject/Spark.py", line 2, in <module>
    sc = SparkContext(master="local", appName="Spark Demo")
  File "C:\spark-1.6.3-bin-hadoop2.6\python\pyspark\context.py", line 112, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "C:\spark-1.6.3-bin-hadoop2.6\python\pyspark\context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "C:\spark-1.6.3-bin-hadoop2.6\python\pyspark\java_gateway.py", line 79, in launch_gateway
    proc = Popen(command, stdin=PIPE, env=env)
  File "C:\Python27\Lib\subprocess.py", line 390, in __init__
    errread, errwrite)
  File "C:\Python27\Lib\subprocess.py", line 640, in _execute_child
    startupinfo)
WindowsError: [Error 2] The system cannot find the file specified

Process finished with exit code 1

To read a local file you can do it as shown below; on Windows you have to use double backslashes:
rdd = sc.textFile("C:\\Users\\username\\Desktop\\sample.txt")
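
For illustration only (the path and username here are placeholders), a raw string or forward slashes also avoid the escaping problem:

# Equivalent ways to spell the same Windows path:
rdd = sc.textFile("C:\\Users\\username\\Desktop\\sample.txt")   # escaped backslashes
rdd = sc.textFile(r"C:\Users\username\Desktop\sample.txt")      # raw string
rdd = sc.textFile("C:/Users/username/Desktop/sample.txt")       # forward slashes work too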

The problem is resolved; I changed the JVM -Xms/-Xmx memory settings.
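
In case it helps others: driver memory has to be set before the driver JVM starts, so one common way from a plain Python script is the PYSPARK_SUBMIT_ARGS environment variable (just a sketch; the 2g value is an example, not necessarily what was used above):

import os

# Set the driver JVM memory before PySpark launches its gateway;
# the value must end with "pyspark-shell".
os.environ["PYSPARK_SUBMIT_ARGS"] = "--driver-memory 2g pyspark-shell"

from pyspark import SparkContext

sc = SparkContext("local", "Spark Demo")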

Hi,
I followed all the steps but installed Spark 2 with Python 3.6. Everything is fine and pyspark itself works, but when I try to read the file it gives this error: ModuleNotFoundError: No module named 'resource'.
Code:
sc.textFile("C:\\Users\\Parvez\\Documents\\SparkAndPython\\deckofcards.txt").first()

Error:

  File "C:\Users\Parvez\Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\Parvez\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Parvez\Documents\SparkAndPython\spark-2.4.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 25, in <module>
ModuleNotFoundError: No module named 'resource'
2019-03-06 01:27:12 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: Python worker failed to connect back.
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:170)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:97)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:108)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketTimeoutException: Accept timed out
at java.net.DualStackPlainSocketImpl.waitForNewConnection(Native Method)
at java.net.DualStackPlainSocketImpl.socketAccept(Unknown Source)
at java.net.AbstractPlainSocketImpl.accept(Unknown Source)
at java.net.PlainSocketImpl.accept(Unknown Source)
at java.net.ServerSocket.implAccept(Unknown Source)
at java.net.ServerSocket.accept(Unknown Source)
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:164)
… 14 more
2019-03-06 01:27:12 WARN TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.apache.spark.SparkException: Python worker failed to connect back.
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:170)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:97)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:108)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketTimeoutException: Accept timed out
at java.net.DualStackPlainSocketImpl.waitForNewConnection(Native Method)
at java.net.DualStackPlainSocketImpl.socketAccept(Unknown Source)
at java.net.AbstractPlainSocketImpl.accept(Unknown Source)
at java.net.PlainSocketImpl.accept(Unknown Source)
at java.net.ServerSocket.implAccept(Unknown Source)
at java.net.ServerSocket.accept(Unknown Source)
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:164)
… 14 more

@Mohammad_Parvez Can you install the previous version of Spark (2.3 instead of 2.4)? It is an issue with the latest version of PySpark.

@annapurna that is not correct. It should work with 2.4 as well. Please check whether the code works on your Windows PC with Spark 2.3 or not.

I think on Windows you need to write the path with \\ rather than \ between each directory.

We recommend using Ubuntu on your Windows 10 machine to practice PySpark. You can set it up using Windows Subsystem for Linux.

@Mohammad_Parvez It works fine in Spark 2.3. Please refer to the screenshot below.

Thanks, I installed Spark 2.3 and it's working fine.

I changed the path and tried again, but it was not working in Spark 2.4. It's working fine in Spark 2.3.

Is uninstalling the 2.4 version and installing 2.3 the only way to solve this issue? I tried printing: print(sc.textFile("C:\deckofcards.txt").first()) and got the same issue: ModuleNotFoundError: No module named 'resource'.

I tried with double backslashes as well; still the same problem.
print(sc.textFile("C:\\deckofcards.txt").first()) - ModuleNotFoundError: No module named 'resource'

I'm having this issue; can anyone help?

An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.lang.IllegalArgumentException: Unsupported class file major version 55

Have you tried the findspark module?
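
In case it helps, a minimal sketch of using findspark (assuming it is installed with pip install findspark, and that SPARK_HOME points at your Spark installation or you pass the path explicitly; the path below is only an example):

import findspark
findspark.init()   # or, for example: findspark.init("C:\\spark-2.3.0-bin-hadoop2.7")

from pyspark import SparkContext

sc = SparkContext("local", "Spark Demo")
print(sc.textFile("C:\\deckofcards.txt").first())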