Data Ingest - real time, near real time and streaming analytics - Spark Streaming – Develop Word Count program

  • Streaming Word Count
    • Develop the code
    • Start netcat service
    • Run application using spark-submit

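from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

# Run on YARN in client mode
conf = SparkConf(). \
    setAppName("Streaming Department Count"). \
    setMaster("yarn-client")
sc = SparkContext(conf=conf)

# Process incoming data in 15-second batches
ssc = StreamingContext(sc, 15)

# Read lines from the netcat service; it must already be listening on port 19999
lines = ssc.socketTextStream("gw01.itversity.com", 19999)

# Split lines into words, pair each word with 1, then sum the counts per word
words = lines.flatMap(lambda line: line.split(" "))
wordTuples = words.map(lambda word: (word, 1))
wordCount = wordTuples.reduceByKey(lambda x, y: x + y)

# PySpark DStreams print with pprint(), not print()
wordCount.pprint()

ssc.start()
ssc.awaitTermination()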
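To check the counting logic without a cluster, the flatMap, map and reduceByKey steps can be mimicked in plain Python for a single batch. This is only a sketch of what each 15-second batch computes, not how Spark executes it. The function name count_words and the sample lines are hypothetical and are not part of the program above.

```python
def count_words(lines):
    """Plain-Python stand-in for flatMap -> map -> reduceByKey on one batch."""
    # flatMap: split every line on single spaces and flatten into one list
    words = [word for line in lines for word in line.split(" ")]
    # map: pair each word with an initial count of 1
    word_tuples = [(word, 1) for word in words]
    # reduceByKey: add up the counts that share the same word
    counts = {}
    for word, n in word_tuples:
        counts[word] = counts.get(word, 0) + n
    return counts


# Hypothetical lines, as they might be typed into the netcat session
batch = ["hello world", "hello spark"]
print(count_words(batch))  # {'hello': 2, 'world': 1, 'spark': 1}
```

Because the split is on a single space, consecutive spaces in a line produce empty strings that are counted as words. The streaming program behaves the same way.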

