As part of this discussion, Chad from Amazon walks us through how to use Spot Instances when spinning up an EMR cluster to process data.
The following topics are covered in detail:
- Why Spot? How has Spot changed over time?
- What should be the criteria for choosing Spot Instances? What types of workloads are a good fit for Spot Instances?
- What makes Spot Instances a good fit for EMR?
- How much can I expect to save by using Spot Instances vs On-Demand Instances for EMR?
- Should I use Instance Fleets or Instance Groups when using Spot Instances?
- What are the most common ways to create EMR clusters with Spot Instances? An EMR cluster has master node(s), core nodes, and task nodes. For which node types can we consider Spot Instances?
- Demo - Create an EMR cluster with Spot Instances and run a Spark job
- Case Study
You can access the material for the demo using this link.
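
To make the node-type question above concrete, here is a minimal sketch of an Instance Fleets layout that keeps the master and core fleets On-Demand and puts only the task fleet on Spot. This is one possible arrangement and is not taken from the session or its demo material. The function name `build_instance_fleets`, the fleet names, the instance types, and the capacities are all assumptions chosen for illustration. The dictionary keys follow the `InstanceFleets` structure accepted by the EMR `RunJobFlow` API.

```python
from typing import Any, Dict, List


def build_instance_fleets(
    task_spot_units: int = 4,
    spot_timeout_minutes: int = 10,
) -> List[Dict[str, Any]]:
    """Build an InstanceFleets list: On-Demand master and core, Spot task.

    All names, instance types, and capacities are illustrative assumptions.
    """
    # Master fleet: a single On-Demand instance.
    master = {
        "Name": "master-fleet",
        "InstanceFleetType": "MASTER",
        "TargetOnDemandCapacity": 1,
        "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}],
    }
    # Core fleet: On-Demand, since core nodes store HDFS data.
    core = {
        "Name": "core-fleet",
        "InstanceFleetType": "CORE",
        "TargetOnDemandCapacity": 2,
        "InstanceTypeConfigs": [
            {"InstanceType": "m5.xlarge", "WeightedCapacity": 1},
        ],
    }
    # Task fleet: Spot only, with more than one instance type so EMR
    # has several capacity pools to draw from.
    task = {
        "Name": "task-fleet",
        "InstanceFleetType": "TASK",
        "TargetSpotCapacity": task_spot_units,
        "InstanceTypeConfigs": [
            {"InstanceType": "m5.xlarge", "WeightedCapacity": 1},
            {"InstanceType": "m5.2xlarge", "WeightedCapacity": 2},
        ],
        "LaunchSpecifications": {
            "SpotSpecification": {
                # If Spot capacity is not fulfilled within the timeout,
                # fall back to On-Demand instead of terminating.
                "TimeoutDurationMinutes": spot_timeout_minutes,
                "TimeoutAction": "SWITCH_TO_ON_DEMAND",
            }
        },
    }
    return [master, core, task]


if __name__ == "__main__":
    for fleet in build_instance_fleets():
        print(fleet["InstanceFleetType"], fleet["Name"])
```

The returned list could be passed as `Instances={"InstanceFleets": ...}` to boto3's `run_job_flow`, along with the other cluster settings that call requires. The sketch deliberately stops short of making the AWS call so that it runs without credentials.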