
Big Data Course - Certified Big Data Engineer

One of the most comprehensive Big Data Analytics courses, covering SQL, NoSQL, Hadoop-Spark and Cloud Computing

Did you know that over 80% of enterprises have moved Big Data to the cloud? Cloud computing helps process and analyse large data sets and social media information at lightning speed. This enables timely insights for making the right business decisions and can improve business performance remarkably. No wonder the McKinsey Global Institute estimates a shortage of 1.7 million Big Data professionals over the next 3 years.

Considering the gap between demand and supply, this Big Data course can help IT/ITES professionals bag lucrative opportunities by gaining sought-after Big Data Analytics skills. If you are looking for a Big Data online course or Big Data training in Bangalore, Gurgaon or Noida, your search ends here with India's top-rated institute.

In this Big Data training, you will gain practical Data Engineering skills using SQL, NoSQL (MongoDB), the Hadoop ecosystem (including its most widely used components such as HDFS, Sqoop, Hive and Impala), Spark and Cloud Computing. Hence, this is one of the few Big Data Analytics courses that focuses on multiple dimensions. This breadth is important for leveraging structured and unstructured Big Data sources for advanced analytics, artificial intelligence and large-scale machine learning models for predictive analytics.

This is an extensive Big Data analytics training program with the flexibility of attending live online classes, learning through self-paced videos, or attending in classroom mode. At the end of the program, candidates are awarded a certification on successful completion of the assignments and projects provided as part of the training.

You may also like to know about our Advance Big Data Science program, which covers Big Data along with Machine Learning skills.

Big Data Hadoop course duration: 120 hours (at least 60 hours of live training plus practice and self-study, with ~8 hours of weekly self-study)


Delivery Formats:

1. Classroom Big Data training in Gurgaon, Noida & Bangalore

2. Fully interactive live online training (Global access)

3. Self-paced e-learning modules (Global access)



Who should take this course?

IT/ITES, Business Intelligence and Database professionals, and computer science (or other circuit branch) graduates who are not just looking for generic Hadoop training for a Data Engineering role, but want a Big Data Engineering certification based on practical Hadoop-Spark and Cloud Computing skills.


Course Duration: 120 hours
Classes: 20
Tools: Cloudera Hadoop VM, Spark, MongoDB, AWS/Azure/GCP
Learning Mode: Live/Video-based

What you will get

Access to 60 hours of instructor-led live classes (20 classes of 3 hours each), spread over 10 weekends

Video recordings of the class sessions for self study purpose

Weekly assignments, reference code and study material in PDF format

Module wise case studies/ projects

Specially curated study material and sample questions for Big Data Certification (Developer/Analyst)

Career guidance and career support after completion of selected assignments and case studies

Course Outline

  • What is Big Data & Data engineering?
  • Importance of Data engineering in the Big Data world
  • Role of RDBMS (SQL Server), Hadoop, Spark, NOSQL and Cloud computing in Data engineering
  • What is Big Data Analytics?
  • Key terminologies (Data Mart, Data Warehouse, Data Lake, Data Ocean, ETL, Data Model, Schema, Data Pipeline, etc.)

  • What are Databases & RDBMS
  • Create a data model (Schema – Metadata – ER Diagram) & database
  • Data Integrity Constraints & types of Relationships
  • Working with Tables
  • Introduction to SQL Server & SQL
  • SQL Management Studio & Utilizing the Object Explorer
  • Basic concepts – Queries, Data types & NULL Values, Operators, Comments in SQL, Joins, Indexes, Functions, Views, Sorting, filtering, subquerying, summarising, merging, appending, new variable creation, CASE WHEN statement usage etc.
  • Data manipulation – Reading & manipulating single and multiple tables
  • Database object creation (DDL commands) (tables, indexes, views etc.)
  • Optimizing your work
  • End-to-end data manipulation exercise

  • Motivation for Hadoop
  • Limitations and Solutions of existing Data Analytics Architecture
  • Comparison of traditional data management systems with Big Data
  • Evaluate key framework requirements for Big Data analytics
  • Hadoop Ecosystem & core components
  • The Hadoop Distributed File System - Concept of data storage
  • Explain different types of cluster setups (fully distributed, pseudo-distributed etc.)
  • Hadoop Cluster Overview & Architecture
  • A Typical enterprise cluster – Hadoop Cluster Modes
  • HDFS Overview & Data storage in HDFS
  • Get data into Hadoop from a local machine (data loading) and vice versa
  • Practice complete data loading and management using the command line (Hadoop commands) & HUE
  • Map Reduce Overview (Traditional way Vs. MapReduce way)

  • Integrating Hadoop into an Existing Enterprise
  • Loading Data from an RDBMS into HDFS, Hive, HBase Using Sqoop
  • Exporting Data to RDBMS from HDFS, Hive, HBase Using Sqoop

  • Apache Hive - Hive Vs. Pig - Hive Use Cases
  • Discuss the Hive data storage principle
  • Explain the file formats and record formats supported by the Hive environment
  • Perform operations with data in Hive
  • Hive QL: Joining Tables, Dynamic Partitioning, Custom Map/Reduce Scripts
  • Hive Script, Hive UDF
  • Join datasets using a variety of techniques, including Map-side joins and Sort-Merge-Bucket joins
  • Use advanced Hive features like windowing, views and ORC files
  • Hive Persistence formats
  • Loading data in Hive - Methods
  • Serialization & Deserialization
  • Integrating external BI tools with Hadoop Hive
  • Use the Hive analytics functions (rank, dense_rank, cume_dist, row_number)
  • Use Hive to compute ngrams on Avro-formatted files

  • Impala & Architecture
  • How Impala executes Queries and its importance

  • Introduction to Data Analysis Tools
  • Apache Pig - MapReduce Vs. Pig, Pig Use Cases
  • Pig's Data Model
  • Pig Streaming
  • Pig Latin Program & Execution
  • Pig Latin: Relational Operators, File Loaders, Group Operator, Joins and COGROUP, Union, Diagnostic Operators, Pig UDF
  • Pig Macros
  • Parameterization in Pig (Parameter Substitution)
  • Use Pig to automate the design and implementation of MapReduce applications
  • Use Pig to apply structure to unstructured Big Data

  • Introduction to Apache Spark
  • Streaming Data Vs. In Memory Data
  • Map Reduce Vs. Spark
  • Modes of Spark
  • Spark Installation Demo
  • Overview of Spark on a cluster
  • Spark Standalone Cluster

  • Invoking Spark Shell
  • Creating the Spark Context
  • Loading a File in Shell
  • Performing Some Basic Operations on Files in Spark Shell
  • Caching Overview
  • Distributed Persistence
  • Spark Streaming Overview

  • Basics of Scala that are required for programming Spark applications
  • Basic constructs of Scala such as variable types, control structures, collections, and more

  • Understanding & Loading data into RDD
  • Hadoop RDD, Filtered RDD, Joined RDD
  • Transformations, Actions and Shared Variables
  • Spark Operations on YARN
  • Sequence File Processing

  • Spark Structured Query Language
  • Linking with Spark SQL
  • Initializing Spark SQL and executing basic queries
  • Analyze Hive and Spark SQL Architecture

  • Spark Streaming, its Architecture and abstraction
  • Different Transformations in Spark Streaming such as Stateless and Stateful, Input Sources
  • 24/7 Operations and Streaming UI

  • Introduction to MLlib
  • Data Types and working with vectors
  • Examples of Spark MLlib usage

  • Limitations of RDBMS & Motivation for NoSQL
  • NoSQL Design goals & Advantages
  • Types of NoSQL databases (Categories) – Cassandra/MongoDB/HBase
  • CAP theorem
  • How data is stored in a NoSQL data store
  • NoSQL database queries and update languages
  • Indexing and searching in NoSQL Databases
  • Reducing data via reduce function
  • Clustering and scaling of NoSQL Database

  • Overview & Architecture of MongoDB
  • In-depth understanding of databases and collections
  • Documents and Key/Values etc.
  • Introduction to JSON and BSON Documents
  • Installing MongoDB on Linux
  • Usage of various MongoDB Tools available with MongoDB package
  • Introduction to MongoDB shell
  • MongoDB Data types
  • CRUD concepts & operations
  • Query behaviors in MongoDB

  • Data modeling concepts & approach
  • Analogy between RDBMS & MongoDB data modeling
  • Model relationship between documents (one-one, one-many)
  • Model tree structures with parent references and with child references
  • Challenges in modeling
  • Model data for Atomic operations and support search
  • Query building

  • APIs and drivers for MongoDB, HTTP and REST interface
  • Installing Node.js and dependencies
  • Node.js find & display data, Node.js saving and deleting data

  • Indexing concepts, Index types, Index properties, aggregation

  • MongoDB monitoring, health check, backups & Recovery options, Performance Tuning
  • Data Imports & Exports to & from MongoDB
  • Introduction to Scalability & Availability
  • MongoDB replication, Concepts around sharding, Types of sharding and Managing shards
  • Master – Slave Replication
  • Security concepts & Securing MongoDB

  • Creation of MongoDB app

  • What is Cloud Computing? Why does it matter?
  • Traditional IT Infrastructure vs. Cloud Infrastructure
  • Cloud Companies (Microsoft Azure, GCP, AWS) & their Cloud Services (compute, storage, networking, apps, cognitive etc.)
  • Use Cases of Cloud computing
  • Overview of Cloud Segments: IaaS, PaaS, SaaS
  • Overview of Cloud Deployment Models
  • Overview of Cloud Security
  • Introduction to AWS, Microsoft Azure Cloud and OpenStack. Similarities and differences between these Public / Private Cloud offerings

  • Creating a virtual machine
  • Overview of available Big Data products & Analytics Services in the Cloud
  • Storage services
  • Compute Services
  • Database Services
  • Analytics Services
  • Machine Learning Services
  • Managing the Hadoop ecosystem, Spark and NoSQL in cloud services
  • Creating Data pipelines
  • Scaling Data Analysis & Machine Learning Models

Case Studies

This case study aims to give practical experience in storing and managing different types of data (structured, semi-structured and unstructured), both compressed and uncompressed.

This case study aims to give practical experience in understanding and developing MapReduce programs in Java & R and running streaming jobs in the terminal & Eclipse.

This case study aims to give practical experience in extracting data from Oracle and loading it into HDFS (and vice versa), as well as extracting data from Twitter and storing it in HDFS.

This case study aims to give practical experience in complete data analysis using Pig, including creating and using user-defined functions (UDFs).

This case study aims to give practical experience in complete data analysis using Hive, including creating and using user-defined functions (UDFs).

This case study aims to give practical experience in data table/cluster creation using HBase.

The final project aims to give practical experience in how different modules (Pig, Hive, HBase) can be used to solve Big Data problems.


If you miss a class, don't worry: you will always get the class recording in your inbox. All our live classes are recorded for self-study and future reference, and the recordings can also be accessed through our Learning Management System (LMS). Watch the recording, then reach out to the faculty during their doubt-clearing time or ask your question at the beginning of the subsequent class.

You can also repeat any number of classes within one year after course completion; batch change policies will, however, apply in this case.

Recordings remain accessible for 1 year from your course commencement date.

If required for genuine reasons, recordings access can be extended by up to 1 more year after the initial one-year validity ends. Please note that, given the constant changes in the analytics industry, our courses are continually upgraded, so old course content may no longer be relevant. Hence, we do not promise lifetime access just for marketing purposes.

Recordings cannot be downloaded; however, you can access them through your LMS account or stream them online at any time.

Recordings are an integral part of AnalytixLabs' intellectual property, held suo jure. Downloading or distributing these recordings in any way is strictly prohibited and illegal, as they are protected under copyright law. If a student is found doing so, their services will be immediately and permanently suspended, access to all learning resources will be blocked, the course fee will be forfeited, and the institute reserves the right to take strict legal action against the individual.

Sharing LMS login credentials is unauthorized. As a security measure, if the LMS is accessed from multiple locations, the system will flag it and your LMS access may be terminated.

Yes. All our courses are certified. As part of the course, students get weekly assignments and module-wise case studies. Once all your submissions are received and evaluated, the certificate is awarded.

We follow a comprehensive, self-sustaining system to help our students with placements. This is a win-win situation for our candidates and corporate clients. As a prerequisite for learning validation, candidates are required to submit the case studies and project work provided as part of the course (with a flexible deadline). Our support is continuous and includes help with profile building and CV referrals (as and when applicable) through our ex-students, HR consultants and companies that reach out to us directly.

We will guide you on the right profiles for you based on your education and experience, help with interview preparation and conduct mock interviews if required. For us, the placement process doesn't end at a fixed time after your course completion; it is a long-term relationship that we would like to build.

No institute can guarantee placements unless it is doing so as a marketing gimmick! Our placement support is on a best-effort basis but is not time-bound; in some cases, students reach out to us even after 3 years for career support.

We have a classroom option for Delhi-NCR candidates. However, most of our students end up taking instructor-led live online classes, including those who join the classroom in the beginning. Based on student feedback, the learning experience is the same in the classroom and in the instructor-led, fully interactive live online mode.

We provide both options, and for instructor-led live online classes we use the gold-standard platform used by top universities across the globe. These video sessions are fully interactive: students can chat or even ask their questions verbally over VoIP in real time to get their doubts cleared.

To attend the online classes, all you need is a laptop/PC with a basic internet connection. Students have often shared good feedback about attending these live classes through their data card or even their mobile 3G connection, though we recommend a basic broadband connection.

For the best user experience, a headset with a mic is recommended to enhance voice quality, though the laptop's built-in mic works fine, and you can also ask your questions over chat.

Through the LMS, students can always connect with the trainer or even schedule one-to-one time over the phone or online. During the course, we also schedule periodic doubt-clearing classes, and students can also ask doubts about a class in the subsequent class.

The LMS also has a discussion forum where many of your doubts may be answered easily.

If you still have a problem, repeat the class and schedule one-to-one time with the trainer.

For all courses, we also provide recordings of each class for self-reference and revision in case you miss any concept in the class. If you still have doubts after revising through the recordings, you can take one-to-one time with the faculty outside class hours. Furthermore, if students want to break their course into different modules, they get one year to repeat any of the classes with other batches.

  • Instructor-led live online or classroom - within 7 days of the registration date and no later than 3 days before the batch start date
  • Video-based - 2 days

Instalment options are not available for this course; they are offered only for courses that are at least 3 months long.

It is recommended to have a 64-bit operating system with at least 8 GB of RAM so that the virtual lab can be installed easily.

My name is Kalyan G. I completed my B.Tech (Electronics) and chose to start my career in data science based on recommendations from my seniors and mentors. I chose AnalytixLabs for data science training based on a recommendation from a relative who completed the course at AnalytixLabs. I am so happy with my decision to choose AnalytixLabs. I went through the Data Science Boot Camp (Data Analytics & Visualization and Data Science using Python & R), and the way the course is structured is very good. They make you work on different analytics tools, and concepts are taught from scratch and explained so that a layman can understand them. We worked on 50+ projects & assignments related to all the topics covered in the course. They also provide reference material for interviews, helped us with resume & profile building, and forwarded our resumes to multiple companies, and I finally landed a data science job. It doesn't matter what your background is; you can achieve your goal of a career in data science through this course. I would recommend it to everyone, whatever their background. Thank you, AnalytixLabs.

Kalyan Guntupalli
(Decision Scientist)

Change the course of your career

Over 6000 learners, with hundreds more making the right choice every month!