Big Data and Hadoop

Course Information

  • Course Price: $250
  • Total Students: 800+
  • Course Duration: 4 Weeks

Description

The amount of data produced on the internet rises every day, and enterprises need highly skilled Hadoop professionals to handle it. Hadoop has become one of the major forces in Big Data processing: the platform stores, manages, and retrieves huge volumes of data across diverse applications, and it also supports deep analytics. As more organizations adopt Hadoop, the demand for Hadoop Developers keeps rising.

Lay the Foundation for an Excellent Career with the Leading Big Data Hadoop Technology

Technology changes at a swift rate, and so do the demands of the job market. Understanding how Big Data Hadoop works keeps you up to date and in step with changing trends.

Benefits

  • Managing data is a challenging task, and companies need skilled people to tackle it. There is great demand for Big Data Hadoop experts in large companies
  • People who can grasp and master the components of the Hadoop ecosystem are in high demand. The sooner you learn the skill, the greater your chance of being placed in a top organization
  • Big Data Hadoop-trained professionals command high salaries; with six months to a year of experience, you can earn a lucrative package
  • The scope for growth and earnings is substantial. Learn Big Data Hadoop from LETFIX and step into your desired job

Syllabus

Session 1-Big Data Introduction
  • What is Big Data
  • Evolution of Big Data
  • Benefits of Big Data
  • Operational vs Analytical Big Data
  • Need for Big Data Analytics
  • Big Data Challenges
Session 2-Hadoop Cluster
  • Master Nodes
    • Name Node
    • Secondary Name Node
    • Job Tracker
  • Client Nodes
  • Slaves
  • Hadoop configuration
  • Setting up a Hadoop cluster
Session 3-HDFS
  • Introduction to HDFS
  • HDFS Features
  • HDFS Architecture
  • Blocks
  • Goals of HDFS
  • The Name Node & Data Node
  • Secondary Name Node
  • The Job Tracker
  • The Process of a File Read
  • The Process of a File Write
  • Data Replication
  • Rack Awareness
  • HDFS Federation
  • Configuring HDFS
  • HDFS Web Interface
  • Fault tolerance
  • Name Node failure management
  • Access HDFS from Java (see the sketch below)
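
The "Access HDFS from Java" topic above boils down to the Hadoop FileSystem API. Below is a minimal sketch of a file write followed by a file read; the Name Node address and the /user/demo path are placeholder assumptions, not course-specific values.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder address; point this at your own Name Node.
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/user/demo/hello.txt");

            // Write: the client asks the Name Node for target Data Nodes,
            // then streams the block data to them directly.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.writeUTF("Hello, HDFS!");
            }

            // Read: the Name Node returns block locations; the client
            // reads each block from the nearest Data Node.
            try (FSDataInputStream in = fs.open(file)) {
                System.out.println(in.readUTF());
            }
            fs.close();
        }
    }
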
Session 4-Yarn
  • Introduction to Yarn
  • Why Yarn
  • Classic MapReduce v/s Yarn
  • Advantages of Yarn
  • Yarn Architecture
    • Resource Manager
    • Node Manager
    • Application Master
  • Application submission in YARN
  • Node Manager containers
  • Resource Manager components
  • Yarn applications
  • Scheduling in Yarn
    • Fair Scheduler
    • Capacity Scheduler
  • Fault tolerance
Session 5-MapReduce
  • What is MapReduce
  • Why MapReduce
  • How MapReduce works
  • Difference between Hadoop 1 & Hadoop 2
  • Identity mapper & reducer
  • Data flow in MapReduce
  • Input Splits
  • Relation Between Input Splits and HDFS Blocks
  • Flow of Job Submission in MapReduce
  • Job submission & Monitoring
  • MapReduce algorithms
    • Sorting
    • Searching
    • Indexing
    • TF-IDF
Session 6-Hadoop Fundamentals
  • What is Hadoop
  • History of Hadoop
  • Hadoop Architecture
  • Hadoop Ecosystem Components
  • How does Hadoop work
  • Why Hadoop & Big Data
  • Hadoop Cluster introduction
  • Cluster Modes
    • Standalone
    • Pseudo-distributed
    • Fully-distributed
  • HDFS Overview
  • Introduction to MapReduce
  • Hadoop in demand
Session 7-HDFS Operations
  • Starting HDFS
  • Listing files in HDFS (see the sketch after this list)
  • Writing a file into HDFS
  • Reading data from HDFS
  • Shutting down HDFS
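
Each operation above has a shell equivalent (start-dfs.sh and stop-dfs.sh for starting and shutting down, and hdfs dfs -ls, -put, and -cat for listing, writing, and reading). The same steps can also be done in Java; a minimal sketch of the listing step, with /user/demo as an assumed example directory:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsList {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // List the contents of an example directory (placeholder path).
            for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
                System.out.printf("%s\t%d bytes%n", status.getPath(), status.getLen());
            }
        }
    }
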
Session 8-HDFS Command Reference
  • Listing contents of directory
  • Displaying and printing disk usage
  • Moving files & directories
  • Copying files and directories
  • Displaying file contents
Session 9-Java Overview for Hadoop
  • Object oriented concepts
  • Variables and Data types
  • The static keyword
  • Primitive data types
  • Objects & Classes
  • Java Operators
  • Method and its types
  • Constructors
  • Conditional statements
  • Looping in Java
  • Access Modifiers
  • Inheritance
  • Polymorphism
  • Method overloading & overriding
  • Interfaces
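
As a compact refresher tying several of these topics together (classes, constructors, access modifiers, inheritance, interfaces, overriding, and polymorphism), here is a small illustrative sketch; all class and method names are invented for the example.

    // A tiny demonstration of the OOP concepts listed above.
    interface Greeter {                       // interface
        String greet();                       // abstract method
    }

    class Person implements Greeter {         // class implementing an interface
        protected final String name;          // field with an access modifier

        Person(String name) {                 // constructor
            this.name = name;
        }

        @Override
        public String greet() {               // method overriding
            return "Hello, " + name;
        }
    }

    class Student extends Person {            // inheritance
        Student(String name) { super(name); }

        @Override
        public String greet() {
            return super.greet() + ", welcome to class";
        }
    }

    public class OopDemo {
        public static void main(String[] args) {
            Greeter g = new Student("Asha");  // polymorphic reference
            System.out.println(g.greet());
        }
    }
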
Session 10-MapReduce Programming
  • Hadoop data types
  • The Mapper Class
    • Map method
  • The Reducer Class
    • Shuffle Phase
    • Sort Phase
    • Secondary Sort
    • Reduce Phase
  • The Job classes
    • Job class constructor
  • Job Context interface
  • Combiner Class
    • How Combiner works
    • Record Reader
    • Map Phase
    • Combiner Phase
    • Reducer Phase
    • Record Writer
  • Partitioners
    • Input Data
    • Map Tasks
    • Partitioner Task
    • Reduce Task
    • Compilation & Execution
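
The classic word-count program shows how the pieces of this session fit together: the Mapper emits (word, 1) pairs, a Combiner pre-aggregates them on the map side, and the Reducer sums the final counts, all wired up through the Job class. A minimal sketch (input and output paths come from the command line):

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        // Map phase: emit (word, 1) for every token in the input line.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce phase: sum the counts per word (also reused as the combiner).
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) sum += val.get();
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);  // combiner phase
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
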

Hadoop Ecosystem

Session 11-Pig
  • What is Apache Pig?
  • Why Apache Pig?
  • Pig features
  • Where should Pig be used
  • Where not to use Pig
  • The Pig Architecture
  • Pig components
  • Pig v/s MapReduce
  • Pig v/s SQL
  • Pig v/s Hive
  • Pig Installation
  • Pig Execution Modes & Mechanisms
  • Grunt Shell Commands
  • Pig Latin – Data Model
  • Pig Latin Statements
  • Pig data types
  • Pig Latin operators
  • Case Sensitivity
  • Grouping & Co-Grouping in Pig Latin
  • Sorting & Filtering
  • Joins in Pig Latin
  • Built-in Functions
  • Writing UDFs
  • Macros in Pig
Session 12-HBase
  • What is HBase
  • History of HBase
  • The NoSQL Scenario
  • HBase & HDFS
  • Physical Storage
  • HBase v/s RDBMS
  • Features of HBase
  • HBase Data model
  • Master server
  • Region servers & Regions
  • HBase Shell
  • Create table and column family
  • The HBase Client API (see the sketch below)
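
A minimal sketch of the HBase client API, writing and then reading back a single cell. The 'users' table and its 'info' column family are assumed to exist already (e.g. created beforehand in the HBase shell):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseClientDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("users"))) {

                // Put: write one cell into the 'info' column family.
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Asha"));
                table.put(put);

                // Get: read the cell back by row key.
                Result result = table.get(new Get(Bytes.toBytes("row1")));
                byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
                System.out.println(Bytes.toString(value));
            }
        }
    }
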
Session 13-Spark
  • Introduction to Apache Spark
  • Features of Spark
  • Spark built on Hadoop
  • Components of Spark
  • Resilient Distributed Datasets
  • Data Sharing using Spark RDD
  • Iterative Operations on Spark RDD
  • Interactive Operations on Spark RDD
  • Spark shell
  • RDD transformations
  • Actions
  • Programming with RDD (see the sketch below)
    • Start Shell
    • Create RDD
    • Execute Transformations
    • Caching Transformations
    • Applying Action
    • Checking output
  • GraphX overview
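
The "Programming with RDD" steps above can be previewed with Spark's Java API (Scala, covered in Session 22, is equally common). A minimal sketch; the local[*] master and the sample numbers are assumptions for a standalone run:

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class RddDemo {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("RddDemo").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Create an RDD from an in-memory collection.
            JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

            // Transformations are lazy: nothing executes yet.
            JavaRDD<Integer> squares = numbers.map(x -> x * x).filter(x -> x > 4);

            // Cache the transformed RDD so both actions below reuse it.
            squares.cache();

            // Actions trigger execution.
            System.out.println("count = " + squares.count());
            System.out.println("sum   = " + squares.reduce(Integer::sum));

            sc.stop();
        }
    }
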
Session 14-Impala
  • Introducing Cloudera Impala
  • Impala Benefits
  • Features of Impala
  • Relational databases vs Impala
  • How Impala works
  • Architecture of Impala
  • Components of Impala
    • The Impala Daemon
    • The Impala Statestore
    • The Impala Catalog Service
  • Query Processing Interfaces
  • Impala Shell Command Reference
  • Impala Data Types
  • Creating & deleting databases and tables
  • Inserting & overwriting table data
  • Record Fetching and ordering
  • Grouping records
  • Using the Union clause
  • Working of Impala with Hive
  • Impala v/s Hive v/s HBase
Session 15-MongoDB Overview
  • Introduction to MongoDB
  • MongoDB v/s RDBMS
  • Why & Where to use MongoDB
  • Databases & Collections
  • Inserting & querying documents
  • Schema Design
  • CRUD Operations (see the sketch below)
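
A minimal sketch of the CRUD operations using the MongoDB Java driver; the connection string, the 'school' database, and the 'students' collection are placeholders for illustration:

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.Filters;
    import com.mongodb.client.model.Updates;
    import org.bson.Document;

    public class MongoCrudDemo {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> students =
                        client.getDatabase("school").getCollection("students");

                // Create: insert one document.
                students.insertOne(new Document("name", "Asha").append("course", "Hadoop"));

                // Read: query by field value.
                Document found = students.find(Filters.eq("name", "Asha")).first();
                System.out.println(found);

                // Update: change one field.
                students.updateOne(Filters.eq("name", "Asha"), Updates.set("course", "Spark"));

                // Delete: remove the document.
                students.deleteOne(Filters.eq("name", "Asha"));
            }
        }
    }
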
Session 16-Oozie & Hue Overview
  • Introduction to Apache Oozie
  • Oozie Workflow
  • Oozie Coordinators
  • Property File
  • Oozie Bundle system
  • CLI and extensions
  • Overview of Hue
Session 17-Hive
  • What is Hive?
  • Features of Hive
  • The Hive Architecture
  • Components of Hive
  • Installation & configuration
  • Primitive types
  • Complex types
  • Built in functions
  • Hive UDFs
  • Views & Indexes
  • Hive Data Models
  • Hive vs Pig
  • Co-groups
  • Importing data
  • Hive DDL statements
  • Hive Query Language
  • Data types & Operators
  • Type conversions
  • Joins
  • Sorting & controlling data flow
  • local vs MapReduce mode
  • Partitions
  • Buckets
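
HiveQL itself is covered hands-on in this session; from Java, the same statements can be issued through Hive's JDBC driver. A minimal sketch, assuming a HiveServer2 at a placeholder address and an example 'pageviews' table:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveJdbcDemo {
        public static void main(String[] args) throws Exception {
            // Register the Hive JDBC driver; the URL is a placeholder.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            String url = "jdbc:hive2://localhost:10000/default";

            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement()) {

                // DDL: create an example managed table.
                stmt.execute("CREATE TABLE IF NOT EXISTS pageviews (url STRING, hits INT)");

                // HiveQL query: executed as a distributed job on the cluster.
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT url, SUM(hits) FROM pageviews GROUP BY url")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                    }
                }
            }
        }
    }
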
Session 18-Sqoop
  • Introducing Sqoop
  • Sqoop installation
  • Working of Sqoop
  • Understanding connectors
  • Importing data from MySQL to Hadoop HDFS
  • Selective imports
  • Importing data to Hive
  • Importing to HBase
  • Exporting data to MySQL from Hadoop
  • Controlling import process
Session 19-Flume
  • What is Flume?
  • Applications of Flume
  • Advantages of Flume
  • Flume architecture
  • Data flow in Flume
  • Flume features
  • Flume Event
  • Flume Agent
    • Sources
    • Channels
    • Sinks
  • Log Data in Flume
Session 20-Zookeeper Overview
  • Zookeeper Introduction
  • Distributed Application
  • Benefits of Distributed Applications
  • Why use Zookeeper
  • Zookeeper Architecture
  • Hierarchical Namespace
  • Znodes
  • Stat structure of a Znode
  • Electing a leader
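
A minimal sketch of creating a znode and reading back its data and Stat structure with the ZooKeeper Java client; the ensemble address and the /demo path are placeholders:

    import java.nio.charset.StandardCharsets;
    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class ZkDemo {
        public static void main(String[] args) throws Exception {
            CountDownLatch connected = new CountDownLatch(1);
            ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> {
                if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            });
            connected.await();  // wait until the session is established

            // Create a persistent znode in the hierarchical namespace.
            zk.create("/demo", "hello".getBytes(StandardCharsets.UTF_8),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

            // Read it back along with its Stat structure.
            Stat stat = new Stat();
            byte[] data = zk.getData("/demo", false, stat);
            System.out.println(new String(data, StandardCharsets.UTF_8)
                    + " (version " + stat.getVersion() + ")");

            zk.close();
        }
    }
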
Session 21-Kafka Basics
  • Messaging Systems
    • Point-to-Point
    • Publish-Subscribe
  • What is Kafka
  • Kafka Benefits
  • Kafka Topics & Logs
  • Partitions in Kafka
  • Brokers
  • Producers & Consumers
  • What are Followers
  • Kafka Cluster Architecture
  • Kafka as a Pub-Sub Messaging System
  • Kafka as a Queue Messaging System
  • Role of Zookeeper
  • Basic Kafka Operations
    • Creating a Kafka Topic
    • Listing out topics
    • Starting Producer
    • Starting Consumer
    • Modifying a Topic
    • Deleting a Topic
  • Integration with Spark
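
A minimal producer sketch with Kafka's Java client; the broker address and the 'demo-topic' name are placeholders (the topic itself would be created first, as in the "Creating a Kafka Topic" item above):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ProducerDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Messages with the same key always land in the same partition.
                producer.send(new ProducerRecord<>("demo-topic", "key1", "hello, kafka"));
                producer.flush();
            }
        }
    }
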
Session 22-Scala Basics
  • Introduction to Scala
  • Spark & Scala interdependence
  • Objects & Classes
  • Class definition in Scala
  • Creating Objects
  • Scala Traits
  • Basic Data Types
  • Operators in Scala
  • Control structures
  • Fields in Scala
  • Functions in Scala
  • Collections in Scala
    • Mutable collection
    • Immutable collection
LETFIX Technologies