Are you a working professional with experience in one of the roles below, looking to transition to Data Engineer?
- Mainframes Developer
- ETL Developer using technologies like Informatica, Ab Initio, DataStage, SAS, etc.
- Data Warehouse Developer
- Database Developer
- Application Developers should have some familiarity with Data Engineering tools and technologies, though Data Engineering is not necessarily a better career choice for them.
Here is a plan to become a Data Engineer using the Big Data ecosystem:
- Good understanding of Linux commands and the ability to read as well as write shell scripts
- Expertise in writing high-quality, efficient SQL
- Good understanding of Data Modeling - both normalized data models and Dimensional Modeling
- Good core programming skills in any programming language - preferably Python, Scala or Java (Object-Oriented concepts are not that important)
- Expertise in Spark - DataFrame operations, Spark SQL. One should be able to develop Scala-, Python- or Java-based applications using Spark APIs
- SQL-based tools in Big Data - Spark SQL, Hive, Impala, Presto, etc.
- Ability to build batch data pipelines using a programming language and Spark, with scheduling tools such as Azkaban, Airflow or any other enterprise scheduler
- High-level understanding of NoSQL technologies such as HBase, Cassandra, MongoDB, etc., with expertise in one of them
- Real-time data ingestion using tools like Kafka, integrated with Spark Streaming to apply rules in real time and derive streaming insights
- Good knowledge of Amazon EMR and other AWS analytics services such as Kinesis, Athena, etc.
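To make the SQL and Dimensional Modeling items above a bit more concrete, here is a minimal sketch of a star schema (one fact table joined to one dimension table) with an aggregate query on top. It uses Python's built-in sqlite3 module purely as a stand-in so it runs anywhere; in a Big Data setup the same query shape would typically go through Spark SQL or Hive. The table names, column names and sample rows are all made up for illustration.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema and load sample rows."""
    conn.executescript(
        """
        -- Dimension table: descriptive attributes, one row per product.
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            category   TEXT NOT NULL
        );
        -- Fact table: one row per sale, pointing at the dimension.
        CREATE TABLE fact_sales (
            sale_id    INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
            amount     REAL NOT NULL
        );
        """
    )
    # Sample data (illustrative only).
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?)",
        [(1, "books"), (2, "games")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 1, 10.0), (2, 1, 5.0), (3, 2, 20.0)],
    )


def revenue_by_category(conn):
    """Join the fact table to its dimension and aggregate per category."""
    return conn.execute(
        """
        SELECT d.category, SUM(f.amount) AS revenue
        FROM fact_sales AS f
        JOIN dim_product AS d ON d.product_id = f.product_id
        GROUP BY d.category
        ORDER BY d.category
        """
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    for category, revenue in revenue_by_category(connection):
        print(category, revenue)
```

The fact/dimension split is what separates a dimensional model from a normalized one: the fact table stays narrow and numeric, and the descriptive attributes live in the dimension that reports group by.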
Please add any further information in your reply.