Hello friends,
I am still confused about the Python/Scala scripts that will be given.
1) Are we going to execute the .py/.scala scripts as files, or execute the statements in the CLI one by one?
2) Even if they ask us to run it as a script, can we execute it in the CLI one by one?
I mean, I can run the script, but I am wondering: if something goes wrong, can I execute it line by line in the pyspark/scala shell? In the end the output (storing, aggregating, etc.) would be the same as it would have been from running the script (see the example after this list).
3) Do we need to create the SparkContext (sc) in the script ourselves, or will they provide all that configuration? I heard it will be a skeleton of code with sections to fill up (see the sketch after this list).
4) If it is a .py script, then spark-submit --master local testScript.py will execute it, but should we run it in local mode or yarn mode?
5) Or will there be two scripts: one main .sh script and a child .py/.scala script? In that case we would fill in the .py script and launch the .sh script (./script.sh), which would call the .py script and execute it without us doing all the configuration and setting up the SparkContext ourselves (my guess at that is also below).
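
To make questions 3 and 4 concrete, here is roughly the kind of skeleton I am imagining we would have to fill in. This is just my own sketch, not anything official: the file name testScript.py comes from my example above, and the input/output paths and the aggregation are completely made up by me.

# testScript.py -- my guess at what the exam skeleton might look like.
# I would expect to launch it with one of these, depending on the mode:
#   spark-submit --master local testScript.py
#   spark-submit --master yarn testScript.py
from pyspark import SparkConf, SparkContext

# question 3: do we write these two lines ourselves, or are they pre-filled?
conf = SparkConf().setAppName("testScript")
sc = SparkContext(conf=conf)

# the part we would presumably fill in: read, aggregate, store (paths are made up)
lines = sc.textFile("/user/exam/input/orders.txt")
counts = (lines.map(lambda line: (line.split(",")[1], 1))
               .reduceByKey(lambda a, b: a + b))
counts.saveAsTextFile("/user/exam/output/order_counts")

sc.stop()

Is that roughly what we should expect, and if so, which --master are we supposed to use?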
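
And on question 2, as far as I know the pyspark shell already creates sc for you, so I imagine I could type the middle part of that same sketch in line by line and get the same output (storing, aggregating, etc.), something like:

pyspark --master yarn
>>> lines = sc.textFile("/user/exam/input/orders.txt")   # sc already exists in the shell
>>> counts = lines.map(lambda line: (line.split(",")[1], 1)).reduceByKey(lambda a, b: a + b)
>>> counts.saveAsTextFile("/user/exam/output/order_counts")

Please correct me if that assumption is wrong.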
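
And for question 5, my guess (only a guess) is that the parent script.sh from my example would just contain the spark-submit line, something like:

spark-submit --master yarn testScript.py

so filling in the .py file and launching ./script.sh would end up doing the same thing as running spark-submit ourselves, with the configuration already decided for us.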
Please, I need help understanding what the script is going to look like and how we will be executing it.