Course Outline

Introduction:

  • Apache Spark within the Hadoop Ecosystem
  • Brief overview of Python and Scala

Basics (Theory):

  • Architecture
  • RDDs (Resilient Distributed Datasets)
  • Transformations and Actions
  • Stages, Tasks, and Dependencies

Understanding the Basics through the Databricks Environment (Hands-on Workshop):

  • Exercises using the RDD API
      • Basic action and transformation functions
      • Pair RDDs
      • Joins
      • Caching strategies
  • Exercises using the DataFrame API
      • Spark SQL
      • DataFrame operations: select, filter, group, sort
      • UDFs (user-defined functions)
      • Exploring the DataFrame API
  • Streaming

Understanding Deployment in the AWS Environment (Hands-on Workshop):

  • Basics of AWS Glue
  • Key differences between Amazon EMR and AWS Glue
  • Example jobs in both environments
  • Advantages and disadvantages of each

Additional Topics:

  • Introduction to Apache Airflow orchestration

Requirements

  • Programming skills (preferably in Python and Scala)
  • Basic knowledge of SQL

Duration

21 hours
