Course Outline
Section 1: Introduction to Hadoop
- History and core concepts of Hadoop
- Overview of the Hadoop ecosystem
- Different distributions available
- High-level architecture
- Common misconceptions about Hadoop
- Challenges associated with Hadoop
- Hardware and software requirements
- Lab: First look at Hadoop
Section 2: HDFS
- Design and architecture
- Key concepts (horizontal scaling, replication, data locality, rack awareness)
- Daemons: Namenode, Secondary Namenode, DataNode
- Communication mechanisms and heartbeats
- Data integrity management
- Read and write paths
- Namenode High Availability (HA) and Federation
- Labs: Interacting with HDFS
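The read and write paths covered above can be sketched with the HDFS Java API. This is a minimal illustration, not a lab solution: it assumes a reachable cluster configured through `core-site.xml` on the classpath, and the file path is illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS and other settings from core-site.xml
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/tmp/hello.txt"); // illustrative path

        // Write path: the client asks the Namenode for target DataNodes,
        // then streams the data through a replication pipeline.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.writeUTF("hello hdfs");
        }

        // Read path: the client gets block locations from the Namenode,
        // then reads directly from the nearest DataNode (data locality).
        try (FSDataInputStream in = fs.open(path)) {
            System.out.println(in.readUTF());
        }
    }
}
```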
Section 3: MapReduce
- Core concepts and architecture
- Daemons (MRv1): JobTracker and TaskTracker
- Execution phases: driver, mapper, shuffle/sort, and reducer
- MapReduce Version 1 and Version 2 (YARN)
- Internals of MapReduce
- Introduction to writing MapReduce programs in Java
- Labs: Running a sample MapReduce program
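The execution phases listed above map directly onto the classic word-count job: the mapper emits (word, 1) pairs, the shuffle/sort groups them by word, and the reducer sums the counts. A minimal sketch, with input and output paths taken from the command line:

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: one call per input line; emits (word, 1) for each token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: called once per key after the shuffle/sort phase,
    // with all values for that key; sums the counts.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // Driver: configures and submits the job.
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```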
Section 4: Pig
- Pig versus Java MapReduce
- Workflow of a Pig job
- The Pig Latin programming language
- ETL processes with Pig
- Transformations and Joins
- User Defined Functions (UDFs)
- Labs: Writing Pig scripts to analyze data
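A Pig UDF is written in Java by extending `EvalFunc`. A minimal sketch (the class name is illustrative):

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Illustrative UDF: upper-cases its single string argument.
public class UpperText extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null; // Pig convention: null in, null out
        }
        return ((String) input.get(0)).toUpperCase();
    }
}
```

In a Pig script the compiled jar would be loaded with `REGISTER` and the function invoked like any built-in, e.g. `UpperText(name)` (jar and field names here are illustrative).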
Section 5: Hive
- Architecture and design principles
- Data types supported
- SQL capabilities within Hive
- Creating Hive tables and performing queries
- Partitioning data
- Performing joins
- Text processing techniques
- Labs: Various practical exercises on processing data with Hive
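Hive tables can be created and queried from Java over JDBC against HiveServer2. A sketch, assuming an endpoint at `localhost:10000`; the host, port, and the table and column names are all illustrative:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC URL; adjust host/port for the actual cluster
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {

            // Partitioned table: each dt value becomes its own HDFS directory,
            // so queries filtering on dt scan only the matching partitions.
            stmt.execute("CREATE TABLE IF NOT EXISTS logs (ts STRING, msg STRING) "
                       + "PARTITIONED BY (dt STRING)");

            try (ResultSet rs = stmt.executeQuery(
                    "SELECT dt, COUNT(*) FROM logs GROUP BY dt")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }
}
```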
Section 6: HBase
- Core concepts and architecture
- HBase versus RDBMS versus Cassandra
- HBase Java API
- Handling time-series data in HBase
- Schema design strategies
- Labs: Interacting with HBase via the shell; programming with the HBase Java API; schema design exercise
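A minimal put/get round trip with the HBase Java API, illustrating the row-key concern behind the schema design and time-series topics above. The table, column family, and row-key layout are illustrative assumptions, and a connection to a running cluster is assumed:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBasePutGet {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("sensors"))) {

            // Row-key design matters for time-series data: leading with a
            // device id (rather than a raw timestamp) spreads writes across
            // regions and avoids hot-spotting a single region server.
            byte[] rowKey = Bytes.toBytes("device42#20240101T0000");

            Put put = new Put(rowKey);
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("temp"),
                          Bytes.toBytes("21.5"));
            table.put(put);

            Result r = table.get(new Get(rowKey));
            System.out.println(Bytes.toString(
                r.getValue(Bytes.toBytes("d"), Bytes.toBytes("temp"))));
        }
    }
}
```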
Requirements
- Proficiency in the Java programming language (as most practical exercises are conducted in Java)
- Familiarity with the Linux environment (ability to navigate the Linux command line and edit files using vi or nano)
Lab environment
No Installation Required: Students do not need to install Hadoop software on their local machines. A fully operational Hadoop cluster will be provided for use.
Participants will need access to the following tools:
- An SSH client (Linux and Mac systems come with built-in SSH clients; for Windows, PuTTY is recommended)
- A web browser to access the cluster (Firefox is recommended)
28 Hours
Testimonials (1)
Hands on exercises. Class should have been 5 days, but the 3 days helped to clear up a lot of questions that I had from working with NiFi already