How to develop, test, build, and deploy a Spark application to production using Docker + Kubernetes


Hi All,

I have been searching but still cannot find any complete tutorial, article, or book that covers developing a Spark application in Python or Scala, then testing it, building it, and moving it to a production environment using Docker containers.

Could someone also explain this in detail: if organization A has 20+ data scientists, all comfortably working on their own machines, how can I, as a Data Engineer, provide them a central, shared development environment that is easy for me to manage and productionize?
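To make the question concrete, this is roughly what I have in mind: a minimal sketch of a shared development image that every data scientist would use, so local environments match production. This assumes a PySpark-only team, and the base image tag and version pins below are just placeholders I picked:

```dockerfile
# Sketch of a shared dev image for the whole team.
FROM python:3.9-slim

# Spark needs a JVM; install a headless JRE.
RUN apt-get update && \
    apt-get install -y --no-install-recommends openjdk-17-jre-headless && \
    rm -rf /var/lib/apt/lists/*

# Pin the Spark version for the whole team in one place.
RUN pip install --no-cache-dir pyspark==3.5.1

WORKDIR /workspace

# Default to an interactive PySpark shell for development.
CMD ["pyspark"]
```

Is something like this the right direction, or do people mount their project code into such a container instead of baking it in?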

Also, since I am learning Docker for the first time, I still cannot grasp this:

If there is a Hadoop cluster with one master node and 10 data nodes, how does Docker work on it? If there is any explanation with an example or a visual presentation, please point me to it.
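For example, my current understanding from the Spark-on-Kubernetes documentation is that the containers do not run on the Hadoop data nodes at all; instead you build the application into an image and submit it to a Kubernetes cluster, where the driver and executors run as pods. Roughly like this, where the registry address, API-server URL, and file path are placeholders:

```shell
# Build the application image and push it somewhere the cluster can pull from.
docker build -t my-registry.example.com/spark-app:1.0 .
docker push my-registry.example.com/spark-app:1.0

# Submit to Kubernetes (supported since Spark 2.3 via the k8s:// master URL).
# The driver and executors are scheduled as pods by Kubernetes.
spark-submit \
  --master k8s://https://kubernetes-api.example.com:6443 \
  --deploy-mode cluster \
  --name spark-app \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=my-registry.example.com/spark-app:1.0 \
  local:///opt/spark/work-dir/main.py
```

Is that picture correct, and how does it fit together with an existing HDFS cluster for data access?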

It would be great if Durga Sir could arrange one or two sessions explaining the whole end-to-end process with hands-on examples… :slightly_smiling_face:

Thanks in advance!
