Big Data Lab information


#1

I’m planning to subscribe to this lab. I have the following questions. Can someone clarify?

Is Kafka set up on this lab? If so, which version is available, and is it set up as a cluster?
Does the lab use the CDH distribution or Hortonworks?
Can I use ML tools like TensorFlow?
Can I use pyspark in this lab?
How much RAM comes with the subscription?
Is it allowed to copy data to the cluster? If so, what is the size limit?
Please clarify the above questions.


#2

@Sudarsan_Padmanaban

  1. Yes, Kafka is available in the labs. The version is 1.0.0.
  2. The lab uses the Hortonworks distribution.
  3. TensorFlow is not available in the labs.
  4. Yes, both pyspark and spark-shell are available.
  5. You will get access to a 64 GB gateway node, which is shared with other users.
  6. Yes, you are allowed to copy data from your local machine to the itversity labs cluster. Let us know how much data you want to copy.

You can check the details of the lab below:

What are the tools and versions that are available as part of your Big Data Developer Labs?
https://itversity.freshdesk.com/support/solutions/articles/35000046991-what-are-the-tools-and-versions-that-are-available-as-part-of-your-big-data-developer-labs-


#3

Thanks for your reply. Is it possible to copy around 8 GB of data for running Spark ML programs? Are there any restrictions on data size? If so, what is the maximum amount of data we can copy?

Do you also have MySQL? If I want to use Sqoop, which relational database can I use to test it?


#4

It is possible to copy large data sets, but it is not recommended because it will slow things down. It is better to work out your solution against the data set on your local machine, build a JAR file from it, and then copy the JAR to the cluster and run it with spark-submit.
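
As a rough sketch of the last step, the small Python helper below assembles a spark-submit command line for a pre-built JAR. The JAR name, main class, and HDFS paths are hypothetical placeholders, and `--master yarn` with client deploy mode is an assumption about how the lab cluster is configured, so adjust them to your setup.

```python
import shlex


def build_spark_submit(jar_path, main_class, app_args=(),
                       master="yarn", deploy_mode="client"):
    """Return the spark-submit argument list for a pre-built application JAR."""
    cmd = [
        "spark-submit",
        "--master", master,            # assumed: the cluster schedules jobs via YARN
        "--deploy-mode", deploy_mode,  # assumed: driver runs on the gateway node
        "--class", main_class,         # entry point inside the JAR
        jar_path,
    ]
    cmd.extend(app_args)               # arguments passed through to the application
    return cmd


# Hypothetical JAR, class, and paths, for illustration only.
cmd = build_spark_submit(
    "my-spark-app.jar",
    "com.example.MyApp",
    ["/user/me/input", "/user/me/output"],
)

# Print a shell-safe command you can paste on the gateway node.
print(shlex.join(cmd))
```

Building the command as a list and joining it with `shlex.join` keeps arguments with spaces or special characters correctly quoted.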

You can use MySQL and Hive.
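
For the Sqoop question, here is a minimal sketch of what an import from MySQL into Hive could look like, again built as a command line from Python. The JDBC host, database, user, and table names are hypothetical; only the fact that MySQL and Hive are available comes from this thread, so check the lab documentation for the real connection details.

```python
import shlex


def build_sqoop_import(jdbc_url, username, table, hive_table, num_mappers=1):
    """Return the argument list for a Sqoop import from a JDBC source into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,        # JDBC URL of the source MySQL database
        "--username", username,
        "-P",                         # prompt for the password instead of embedding it
        "--table", table,             # source table in MySQL
        "--hive-import",              # load the imported data into Hive
        "--hive-table", hive_table,   # target Hive table
        "--num-mappers", str(num_mappers),
    ]


# Hypothetical host, database, user, and tables, for illustration only.
sqoop_cmd = build_sqoop_import(
    "jdbc:mysql://db.example.com:3306/retail_db",
    "lab_user",
    "orders",
    "my_db.orders",
)

# Print a shell-safe command you can paste on the gateway node.
print(shlex.join(sqoop_cmd))
```

A single mapper is used by default here to keep the load on the shared gateway and database light; raise `num_mappers` only if the table is large and has a suitable split column.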