Big Data & Hadoop Administrator

In addition to multi-node local-LAN batch clusters, the course covers single- and multi-node clusters on a local machine through VM virtualization, along with data ETL through Talend Open Studio / Sqoop and orchestration through SaltStack.

What is Hadoop Administration?

When systems operate as a group, they need a manager; in computing terms, that manager is called the administrator. This administrator, or admin, is accountable for maintaining the systems in the cluster and is responsible for their performance and availability. The data stored on the cluster and the jobs that run on it are also the administrator's duty. He/she takes on tasks such as configuration, monitoring, backup, troubleshooting, upgrades, deployment and job management.


To begin with the fundamentals of Hadoop administration, participants are introduced to the Hadoop framework: a basic outline of its tools and functionality, its uses, its history and so on. Questions about why Hadoop is needed and what advantages it offers over earlier frameworks are addressed to build a strong foundation for the course, and Hadoop is compared with the traditional file systems that preceded it.


Opportunities in this area are abundant, and aspirants can take up the Online Big Data Hadoop Administrator Certification Training Course to become professionals in it. The role of a Hadoop admin is mainly allied with installing and monitoring Hadoop clusters. Hadoop admin job responsibilities may include some routine tasks, but each one is important for the efficient and continued operation of Hadoop clusters, to avert problems and to enhance overall performance. A Hadoop admin is the person accountable for keeping the firm's Hadoop clusters secure and running efficiently. The online Hadoop administration course is available for working professionals who want to make it big in this field.


Hadoop Admin Job Roles and Responsibilities


Managing big data and Hadoop clusters poses challenges that go well beyond running test data through a couple of machines. Formal Hadoop deployments often fail when administrators try to replicate processes and procedures tested on one or two machines across larger, more complex Hadoop clusters.


The typical responsibilities of a Hadoop admin include deploying a Hadoop cluster, maintaining it, adding and removing nodes using cluster monitoring tools such as Ganglia, Nagios or Cloudera Manager, configuring NameNode high availability and keeping track of all running Hadoop jobs.


A Hadoop administrator has to work closely with the database, network, BI and application teams to make sure that all big data applications are highly available and performing as expected. When working with the open source Apache distribution, Hadoop admins have to set up all the configuration files manually (core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml). When working with a popular Hadoop distribution such as Hortonworks, Cloudera or MapR, the configuration files are set up at install time and the Hadoop admin need not configure them manually. The Hadoop admin is also responsible for capacity planning and for estimating the requirements for lowering or increasing the capacity of the Hadoop cluster.
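
As an illustration of the manual route, the sketch below lists the handful of properties an Apache-distribution install typically starts from and verifies them from the command line. The host name, port and directory paths are placeholder values, and a real cluster needs many more settings; this is the work the commercial distributions automate.

    # Core properties an admin edits by hand under $HADOOP_CONF_DIR (placeholder values):
    #   core-site.xml : fs.defaultFS          = hdfs://namenode.example.com:8020
    #   hdfs-site.xml : dfs.replication       = 3
    #   hdfs-site.xml : dfs.namenode.name.dir = /data/hadoop/namenode
    #   hdfs-site.xml : dfs.datanode.data.dir = /data/hadoop/datanode
    # After editing, confirm that the client actually picks the values up:
    hdfs getconf -confKey fs.defaultFS
    hdfs getconf -confKey dfs.replication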


    Module 1: Introduction to Big Data and Ecosystem

    • Data Types - RDBMS, NoSQL, Time Series, Graph, Filesystem, Stream, Sensor, Spatial.
    • Distributed / Parallel Processing Concepts.
    • Hadoop TimeLine & History
    • Big Data Characteristics, Challenges with Traditional Systems.
    • Fundamentals, Core Components, Rack Awareness, Node & Cluster Concept.
    • Solution Types, Distributions & Specialties, Challenges & Complexity & Use Cases.
    • Linux, Filesystems & Other Terminology

      1) RedHat, CentOS, Ubuntu - VM, Server, AWS & Other Cloud Options.

      2) Ext3, Ext4, SAN, NAS, NFS, RAID, S3, ZFS, Alluxio, QuantcastFS, XtreemFS, BeeGFS, MooseFS, OrangeFS, LizardFS, Lucene.

      3) OpenLDAP, DNS, DHCP, NTP, Kerberos, CA, SSH, Putty, HAProxy, SaltStack (a pre-flight check sketch for these follows this module's outline).

    • Role Expectation, Job Description, Responsibilities & Growth Plan
    • Data Modelling, Designing, ETL (Development / Process) Management, Capacity Planning, Proposals, POCs & Deployment for new or expanded hardware and software environments, working with Systems Engineering, Infrastructure, Network, Database, Application, Data Delivery and Business Intelligence teams to ensure business applications are highly available and performing within agreed SLAs.
    • Installation, Implementation, Administration, Configuration, Connectivity, Scaling, Backup, Recovery, Updates, Upgrades, Security, (OS {Primarily Linux} / Memory / Network / Disk / File / User / Node / Volume) Management, Performance Monitoring, Tuning, Task Automation {Bash Scripting}, Maintenance, Support, CI Integration, Log Review (Data Exhaust), Quality Audit, (Develop / Document) Best Practices & Benchmarking for New, Ongoing & Existing Enterprise Clusters, based on a specific / generic Distro or Cloud Provider, and Apache Hadoop.
    • Primary Point of Contact for Vendor Selection, Management & Escalation.
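
    Many of the OS-level items above reduce to checks an admin runs on every node before installing anything. A minimal pre-flight sketch, assuming CentOS/RHEL-style hosts and passwordless SSH from an admin node; the hostnames are placeholders.

      #!/usr/bin/env bash
      # Pre-flight checks across the cluster nodes (hypothetical hostnames).
      for host in node01.example.com node02.example.com node03.example.com; do
        echo "== $host =="
        # Hadoop (especially with Kerberos) is sensitive to forward/reverse DNS agreement.
        ssh -o BatchMode=yes "$host" 'hostname -f; getent hosts "$(hostname -f)"'
        # Clock skew breaks Kerberos and confuses log correlation: check time sync.
        ssh -o BatchMode=yes "$host" 'timedatectl | grep -i synchronized'
        # Hadoop runs best with swapping effectively disabled.
        ssh -o BatchMode=yes "$host" 'cat /proc/sys/vm/swappiness'
      done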

    Module 2: HDFS, Hadoop Architecture & YARN

    • HDFS Components, Fault Tolerance, Horizontal Scaling, Block Size, Replication Factor, Daemons, HA, Federation, Quotas.
    • Anatomy of Read / Write & Failure / Recovery on HDFS.
    • YARN “The Hadoop OS” In Depth (Architecture, HA, RM, Scheduler, Queues, Node Labels); a short command sketch follows this list.
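
    A few stock commands make these concepts concrete. A minimal sketch, assuming a running cluster and the HDFS/YARN clients on the PATH; the paths and queue name are only examples.

      # Inspect the cluster the way this module describes it: capacity, DataNodes, replication.
      hdfs dfsadmin -report                         # live/dead DataNodes and configured capacity
      hdfs fsck /user -files -blocks -locations     # block placement and replication per file
      hdfs getconf -confKey dfs.blocksize           # effective block size for newly written files

      # YARN side: node health and scheduler queues.
      yarn node -list
      yarn queue -status default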

    Module 3: Environment

    • Stack Insight {On Premise Vs Cloud} (Cloudera, Hortonworks, MapR, AWS).
    • Capacity Planning, Hardware / Virtualization Options
    • Multi Node “Cloudera” Cluster “First Look”.
    • Architecture Discussion, Network Setup & Node Enlisting for the “Batch” Multi Node Cluster for classroom assignments and learning.
    • Automated Bash Scripts Creation / Understanding for speed deployment (a node-preparation sketch follows this module's outline).
    • OS Modifications, Java, MySQL & Other Required Installations.
    • Ensuring all lab and participant system prerequisites are fulfilled before proceeding further.
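
    A node-preparation script of the kind this module automates might look like the sketch below, assuming CentOS 7 nodes with yum access; the package names and tuning values are typical defaults, not the course's exact script.

      #!/usr/bin/env bash
      # Per-node preparation before installing a Hadoop distribution (CentOS 7 assumed).
      set -euo pipefail

      # Java plus a MySQL-compatible client for metastore / management databases.
      yum install -y java-1.8.0-openjdk-devel mariadb

      # Hadoop-recommended OS tweaks: minimise swapping, disable transparent huge pages.
      sysctl -w vm.swappiness=1
      echo never > /sys/kernel/mm/transparent_hugepage/enabled
      echo never > /sys/kernel/mm/transparent_hugepage/defrag

      # Keep the lab simple: stop the firewall and set SELinux to permissive.
      systemctl disable --now firewalld
      setenforce 0 || true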

    Module 4: Cloudera Multi Node “On Premise” Cluster (CentOS 7 + CDH 5.13)

    • Set up a local CDH repository. Install Cloudera Manager Server and agents. Install CDH using Cloudera Manager. Add a new node to an existing cluster. Add a service using Cloudera Manager.
    • Configure a service using Cloudera Manager. Create an HDFS user's home directory. Configure NameNode HA. Configure ResourceManager HA. Configure proxy for Hiveserver2/Impala.
    • Rebalance the cluster (bandwidth, balance). Set up alerting for excessive disk fill. Define and install a rack topology script. Install a new type of I/O compression library in the cluster. Revise YARN resource assignment based on user feedback. Commission/decommission a node.
    • Configure HDFS ACLs. Install and configure Sentry. Configure Hue user authorization and authentication. Enable/configure log and query redaction. Create encrypted zones in HDFS. LDAP Authentication on Gateway Machines.
    • Execute file system commands via HTTPFS. Efficiently copy data within / between clusters. Create/restore a snapshot of an HDFS directory. Get/set ACLs for a file or directory structure. Benchmark the cluster (I/O, CPU, and network). Several of these tasks appear in the command sketch after this module's outline.
    • Resolve errors/warnings in Cloudera Manager. Resolve performance problems/errors in cluster operation. Determine reason for application failure. Configure the Fair Scheduler to resolve application delays.
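
    Several of the tasks above map directly onto HDFS command-line work. A minimal sketch, assuming an on-premise CDH cluster with the hdfs superuser available; user names, paths and the remote NameNode address are placeholders.

      # Create a user's home directory in HDFS (run as the hdfs superuser).
      sudo -u hdfs hdfs dfs -mkdir -p /user/alice
      sudo -u hdfs hdfs dfs -chown alice:alice /user/alice

      # HDFS ACLs: grant an extra group read access to a project directory.
      hdfs dfs -setfacl -m group:analysts:r-x /data/project
      hdfs dfs -getfacl /data/project

      # Snapshots: allow, create, and restore by copying back if needed.
      sudo -u hdfs hdfs dfsadmin -allowSnapshot /data/project
      hdfs dfs -createSnapshot /data/project before-upgrade
      # hdfs dfs -cp /data/project/.snapshot/before-upgrade/part-00000 /data/project/

      # Copy data from another cluster, then rebalance this one
      # (balancer bandwidth is in bytes per second per DataNode).
      hadoop distcp hdfs://nn-a.example.com:8020/data /data
      hdfs dfsadmin -setBalancerBandwidth 104857600
      sudo -u hdfs hdfs balancer -threshold 10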

    Module 5: Hortonworks Multi Node “On Premise” Cluster (CentOS 7 + HDP 2.6)

    • Configure a local HDP repository. Install ambari-server and ambari-agent. Install HDP using the Ambari install wizard. Add a new node to an existing cluster. Decommission a node. Add an HDP service to a cluster using Ambari.
    • Define and deploy a rack topology script. Change the configuration of a service using Ambari. Configure the Capacity Scheduler. Create a home directory for a user and configure permissions. Configure the include and exclude DataNode files.
    • Restart an HDP service. View an application’s log file. Configure and manage alerts. Troubleshoot a failed job.
    • Configure NameNode HA. Configure ResourceManager HA. Copy data between two clusters using distcp. Create a snapshot of an HDFS directory. Recover a snapshot. Configure HiveServer2 HA.
    • Configure HDFS ACLs. Kerberos Implementation. LDAP Authentication on Gateway Machines. Benchmark the cluster (I/O, CPU, and network); see the benchmark sketch after this list.
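
    Benchmarking appears in both the Cloudera and Hortonworks modules and is usually run with the stock MapReduce test jars. A minimal sketch; the jar paths below are HDP-style and vary by distribution and version, so treat them as assumptions.

      # I/O throughput: write, then read, 10 x 128 MB files with TestDFSIO.
      hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar \
        TestDFSIO -write -nrFiles 10 -fileSize 128MB
      hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar \
        TestDFSIO -read -nrFiles 10 -fileSize 128MB

      # CPU, shuffle and network: generate and sort ten million 100-byte rows.
      hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
        teragen 10000000 /benchmarks/teragen
      hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
        terasort /benchmarks/teragen /benchmarks/terasort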

Key Features

32 hours of practical, certification-oriented training.

Trainers are industry experts and certified professionals with 15+ years of experience.

Training is mainly focused on creating multi-node clusters, data segregation and data modeling.

Training requires an 8 GB RAM machine for hands-on practice.

100% money-back guarantee* (refund in case of non-satisfaction on the first day of the class).

Batch size will not exceed 10 candidates.


Case studies will be discussed for all the major topics.

Training is held on weekends, which is convenient for working professionals.


    Who are the Instructors?

    • We believe in quality and follow a rigorous process in selecting our trainers. All our trainers are industry experts and professionals with experience in delivering training.
Date: Contact us
Time: 10:00 AM to 05:00 PM
Course Type: Classroom
Price: INR 23,000 + 18% GST

Please contact 9108460933 / 8951896669 for details.