
Advance Big Data Science

Learn all that is necessary to be a Data Scientist - Python, Hadoop, Spark with Machine Learning

Did you know that Apache Spark has risen to become the most active open source project in big data? No wonder the McKinsey Global Institute estimates a shortage of 1.7 million Data Science and Big Data professionals over the next 3 years.


Given this widening gap between demand and supply, this Advance Data Science training helps IT/ITES professionals bag lucrative opportunities and boost their careers by gaining sought-after Big Data Analytics skills.


This is our advanced Big Data training, where attendees gain a practical skill set covering Hadoop in detail as well as advanced analytics concepts through Python, Hadoop and Spark. For extensive hands-on practice, candidates get access to the virtual lab and several assignments and projects. At the end of the program, candidates are awarded the Advance Data Science Certification on successful completion of the projects provided as part of the training.


This is a completely industry-relevant Big Data Analytics training and a great blend of analytics and technology, well suited to aspirants who want to develop Big Data skills and get a head start in Big Data Analytics!


Course duration: 240 hours (at least 138 hours of live training + practice and self-study, with ~8 hrs of weekly self-study)

Who should do this course?

Students from an IT, Software or Data Warehouse background who want to get into the Big Data Analytics domain


Course Duration: 240 hours
Classes: 45
Tools: Python, Spark, Hadoop, Cloud Computing
Learning Mode: Live/Video Based

What will you get



  • Access to 105 hours of instructor-led live classes (35 sessions of 3 hours each), spread over 18 weekends
  • Video recordings of the class sessions for self-study
  • Weekly assignments, reference code and study material in PDF format
  • Module-wise case studies/projects
  • Career guidance and career support after completion of selected assignments and case studies

Course Outline

  • What is Data Science?
  • Why Python for data science?
  • Relevance in industry and need of the hour
  • How leading companies are harnessing the power of Data Science with Python?
  • Different phases of a typical Analytics/Data Science project and the role of Python
  • Anaconda vs. Python

  • Overview of Python- Starting with Python
  • Introduction to installation of Python
  • Introduction to Python Editors & IDEs (Canopy, PyCharm, Jupyter, Rodeo, IPython, etc.)
  • Understand Jupyter notebook & Customize Settings
  • Concept of Packages/Libraries - Important packages (NumPy, SciPy, scikit-learn, Pandas, Matplotlib, etc.)
  • Installing & loading Packages & Namespaces
  • Data Types & Data objects/structures (strings, Tuples, Lists, Dictionaries)
  • List and Dictionary Comprehensions
  • Variable & Value Labels –  Date & Time Values
  • Basic Operations - Mathematical - string - date
  • Reading and writing data
  • Simple plotting
  • Control flow & conditional statements
  • Debugging & Code profiling
  • How to create classes and modules and how to call them
  • Scientific distributions used in Python for Data Science - NumPy, SciPy, pandas, scikit-learn, statsmodels, NLTK, etc.
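
As a small illustration of the "List and Dictionary Comprehensions" topic listed above, here is a minimal sketch. The sample data and variable names are hypothetical and are not taken from the course material.

```python
# Hypothetical sample data, for illustration only
monthly_spend = {"Jan": 120.0, "Feb": 80.4, "Mar": 150.25}

# List comprehension: keep the months whose spend is above 100
high_months = [month for month, amount in monthly_spend.items() if amount > 100]

# Dictionary comprehension: round every amount to the nearest integer
rounded = {month: round(amount) for month, amount in monthly_spend.items()}

print(high_months)
print(rounded)
```

Both forms build a new collection in a single expression, which is often easier to read than an explicit loop with `append`.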

  • Importing Data from various sources (CSV, TXT, Excel, Access, etc.)
  • Database Input (Connecting to database)
  • Viewing Data objects - subsetting, methods
  • Exporting Data to various formats
  • Important Python modules: Pandas, BeautifulSoup

  • Cleansing Data with Python
  • Data Manipulation steps (sorting, filtering, duplicates, merging, appending, subsetting, derived variables, sampling, data type conversions, renaming, formatting, etc.)
  • Data manipulation tools (operators, functions, packages, control structures, loops, arrays, etc.)
  • Python Built-in Functions (Text, numeric, date, utility functions)
  • Python User Defined Functions
  • Stripping out extraneous information
  • Normalizing data
  • Formatting data
  • Important Python modules for data manipulation (Pandas, NumPy, re, math, string, datetime, etc.)

  • Introduction to exploratory data analysis
  • Descriptive statistics, Frequency Tables and summarization
  • Univariate Analysis (Distribution of data & Graphical Analysis)
  • Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)
  • Creating Graphs (bar/pie/line chart/histogram/boxplot/scatter/density, etc.)
  • Important Packages for Exploratory Analysis (NumPy arrays, Matplotlib, seaborn, Pandas, scipy.stats, etc.)

  • Basic Statistics - Measures of Central Tendencies and Variance
  • Building blocks - Probability Distributions - Normal distribution - Central Limit Theorem
  • Inferential Statistics - Sampling - Concept of Hypothesis Testing
  • Statistical Methods - Z/t-tests (One sample, independent, paired), ANOVA, Correlation and Chi-square
  • Important modules for statistical methods: NumPy, SciPy, Pandas

  • Introduction to Machine Learning & Predictive Modeling
  • Types of Business problems - Mapping of Techniques - Regression vs. classification vs. segmentation vs. Forecasting
  • Major Classes of Learning Algorithms - Supervised vs. Unsupervised Learning
  • Different Phases of Predictive Modeling (Data Pre-processing, Sampling, Model Building, Validation)
  • Overfitting (Bias-Variance Trade off) & Performance Metrics
  • Feature engineering & dimension reduction
  • Concept of optimization & cost function
  • Concept of gradient descent algorithm
  • Concept of Cross validation (Bootstrapping, K-Fold validation, etc.)
  • Model performance metrics (R-square, RMSE, MAPE, AUC, ROC curve, recall, precision, sensitivity, specificity, confusion matrix)
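
To make a few of the classification metrics named above concrete, the sketch below derives precision, recall (sensitivity), specificity and accuracy from the four cells of a confusion matrix. The function name and the counts are hypothetical, chosen only for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from confusion-matrix counts.

    tp/fp/fn/tn are the true-positive, false-positive,
    false-negative and true-negative counts.
    """
    return {
        "precision": tp / (tp + fp),            # of predicted positives, how many were right
        "recall": tp / (tp + fn),               # also called sensitivity
        "specificity": tn / (tn + fp),          # of actual negatives, how many were caught
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical counts for a binary classifier evaluated on 100 records
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In practice a library such as scikit-learn provides these metrics directly; the hand-written version is only meant to show how each one relates to the confusion matrix.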

  • Linear & Logistic Regression
  • Segmentation - Cluster Analysis (K-Means)
  • Decision Trees (CART/C5.0)
  • Ensemble Learning (Random Forest, Bagging & boosting)
  • Artificial Neural Networks (ANN)
  • Support Vector Machines (SVM)
  • Other Techniques (KNN, Naïve Bayes, PCA)
  • Introduction to Text Mining using NLTK
  • Introduction to Time Series Forecasting (Decomposition & ARIMA)
  • Important Python modules for Machine Learning (scikit-learn, statsmodels, SciPy, NLTK, etc.)
  • Fine-tuning the models using hyperparameters, grid search, pipelines, etc.

  • Applying different algorithms to solve business problems and benchmarking the results

  • Introduction and Relevance
  • Uses of Big Data analytics in various industries such as Telecom, E-commerce, Finance and Insurance
  • Problems with Traditional Large-Scale Systems

  • Motivation for Hadoop
  • Different types of projects by Apache
  • Role of projects in the Hadoop Ecosystem
  • Key technology foundations required for Big Data
  • Limitations and Solutions of existing Data Analytics Architecture
  • Comparison of traditional data management systems with Big Data management systems
  • Evaluate key framework requirements for Big Data analytics
  • Hadoop Ecosystem & Hadoop 2.x core components
  • Explain the relevance of real-time data
  • Explain how to use Big Data and real-time data as a Business planning tool

  • Hadoop Master-Slave Architecture
  • The Hadoop Distributed File System - Concept of data storage
  • Explain different types of cluster setups (fully distributed, pseudo-distributed, etc.)
  • Hadoop cluster set up - Installation
  • Hadoop 2.x Cluster Architecture
  • A Typical enterprise cluster – Hadoop Cluster Modes
  • Understanding cluster management tools like Cloudera Manager/Apache Ambari

  • HDFS Overview & Data storage in HDFS
  • Getting data into Hadoop from the local machine (data loading techniques) and vice versa
  • Map Reduce Overview (Traditional way Vs. MapReduce way)
  • Concept of Mapper & Reducer
  • Understanding MapReduce program Framework
  • Develop MapReduce Program using Java (Basic)
  • Develop MapReduce program with streaming API (Basic)
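
The mapper and reducer concept listed above can be sketched in plain Python, without a Hadoop cluster. This is a single-process simulation of the map, shuffle and reduce phases for a word count; it is not Hadoop code, and the function names are hypothetical.

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    """Reduce phase: sum the counts collected for one word."""
    return key, sum(values)

def word_count(lines):
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())

counts = word_count(["big data big insights", "data science"])
print(counts)
```

On a real cluster the framework runs many mappers and reducers in parallel and performs the shuffle across machines; the logic inside each function stays essentially the same.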

  • Integrating Hadoop into an Existing Enterprise
  • Loading Data from an RDBMS into HDFS by Using Sqoop
  • Managing Real-Time Data Using Flume
  • Accessing HDFS from Legacy Systems

  • Introduction to Data Analysis Tools
  • Apache PIG - MapReduce Vs Pig, Pig Use Cases
  • PIG’s Data Model
  • PIG Streaming
  • Pig Latin Program & Execution
  • Pig Latin : Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators, Pig UDF
  • Writing Java UDFs
  • Embedded Pig in Java
  • PIG Macros
  • Parameter Substitution
  • Use Pig to automate the design and implementation of MapReduce applications
  • Use Pig to apply structure to unstructured Big Data

  • Apache Hive - Hive Vs. PIG - Hive Use Cases
  • Discuss the Hive data storage principle
  • Explain the File formats and Records formats supported by the Hive environment
  • Perform operations with data in Hive
  • Hive QL: Joining Tables, Dynamic Partitioning, Custom Map/Reduce Scripts
  • Hive Script, Hive UDF
  • Hive Persistence formats
  • Loading data in Hive - Methods
  • Serialization & Deserialization
  • Handling Text data using Hive
  • Integrating external BI tools with Hadoop Hive

  • Impala & Architecture
  • How Impala executes Queries and its importance
  • Hive vs. PIG vs. Impala
  • Extending Impala with User Defined functions

  • NoSQL database - HBase
  • Introduction to Oozie

  • Introduction to Apache Spark
  • Streaming Data Vs. In Memory Data
  • Map Reduce Vs. Spark
  • Modes of Spark
  • Spark Installation Demo
  • Overview of Spark on a cluster
  • Spark Standalone Cluster

  • Invoking Spark Shell
  • Creating the Spark Context
  • Loading a File in Shell
  • Performing Some Basic Operations on Files in Spark Shell
  • Caching Overview
  • Distributed Persistence
  • Spark Streaming Overview (Example: Streaming Word Count)

  • Analyze Hive and Spark SQL Architecture
  • Analyze Spark SQL
  • Context in Spark SQL
  • Implement a sample example for Spark SQL
  • Integrating Hive and Spark SQL
  • Support for JSON and Parquet File Formats
  • Implement Data Visualization in Spark
  • Loading of Data
  • Hive Queries through Spark
  • Performance Tuning Tips in Spark
  • Shared Variables: Broadcast Variables & Accumulators

  • Extract and analyze data from Twitter using Spark Streaming
  • Comparison of Spark and Storm – Overview

  • Overview of the GraphX module in Spark
  • Creating graphs with GraphX

  • Understand the Machine Learning framework
  • Implement some of the ML algorithms using Spark MLlib

  • Consolidate all the learnings
  • Working on Big Data Project by integrating various key components

Case Studies

Objective: The objective of the case study is to provide end-to-end steps to build and validate a regression model that identifies the key drivers of customer spend using Python-Spark. Problem Statement: One of the leading banks would like to identify the key drivers of customer spending so that it can define a strategy to optimize product features.

Objective: The objective of the case study is to provide end-to-end steps to build and validate a classification model using Python-Spark. Problem Statement: One of the leading banks would like to predict bad customers (defaulters) based on the customer data provided in their applications.

Objective: The objective of the case study is to apply advanced algorithms such as factor and cluster analysis for data reduction and customer segmentation based on customer behavioural data. Problem Statement: Build an enriched customer segmentation and profile the segments using different KPIs for one of the leading telecom companies, to define its marketing strategy.

Objective: The objective of the case study is to give hands-on experience in applying different time series forecasting techniques (averages/smoothing, decomposition, ARIMA, etc.). Problem Statement: One of the leading travel companies would like to predict the number of air passengers travelling to Europe so that it can define its marketing strategy accordingly.

FAQs

Don’t worry. You will always get a recording of the class in your inbox. Have a look at that and reach out to the faculty in case of doubts. All our live classes are recorded for self-study and future reference, and the recordings can also be accessed through our Learning Management System (LMS). Hence, in case you miss a class, you can refer to the video recording and then reach out to the faculty during their doubt-clearing time or ask your question at the beginning of the subsequent class.

You can also repeat any class you want in the next one year after your course completion.

1 year from your course commencement. If needed, you can also repeat any number of classes in the one year after course completion. Batch change policies will, however, apply in this case.

If required for genuine reasons, access to the recordings can be extended for up to 1 more year after the one-year validity ends. Please note that given the constant changes in the Analytics industry, our courses continue to be upgraded and hence old courses might no longer hold relevance. Hence, we do not promise lifetime access just for marketing purposes.

No. However, you can access the recordings through your LMS account and stream them online at any time.

Recordings are an integral part of AnalytixLabs intellectual property by Suo Jure. Downloading or distributing these recordings in any way is strictly prohibited and illegal, as they are protected under the Copyright Act. If a student is found doing so, it will lead to immediate and permanent suspension of services: access to all learning resources will be blocked, the course fee will be forfeited, and the institute will have every right to take strict legal action against the individual.

Sharing LMS login credentials is unauthorized. As a security measure, if the LMS is accessed from multiple places, the system will flag it and your LMS access can be terminated.

Yes. All our courses are certified. As part of the course, students get weekly assignments and module-wise case studies. Once all your submissions are received and evaluated, the certificate will be awarded.

We follow a comprehensive and self-sustaining system to help our students with placements. This is a win-win situation for our candidates and corporate clients. As a prerequisite for learning validation, candidates are required to submit the case studies and project work provided as part of the course (flexible deadline). Support from our side is continuous and encompasses help in profile building and CV referrals (as and when applicable) through our ex-students, HR consultants and companies directly reaching out to us.

We will guide you on the right profiles for you based on your education and experience, and help with interview preparation and mock interviews, if required. For us, the placement process doesn’t end at a definite time after your course completion; it is a long relationship that we would like to build.

No institute can guarantee placements, unless they are doing so as a marketing gimmick! It is on a best effort basis.

In a professional environment, it is not feasible for any institute to do so, except as a marketing gimmick. For us, it is on a best-effort basis but not time-bound; in some cases students reach out to us even after 3 years for career support.

Yes, we have a classroom option for Delhi-NCR candidates. However, most of our students end up doing instructor-led live online classes, including those who join the classroom in the beginning. Based on student feedback, the learning experience is the same in both classroom and fully interactive instructor-led live online mode.

We provide both options. For instructor-led live online classes, we use the gold standard platform used by top universities across the globe. These video sessions are fully interactive, and students can chat or even ask their questions verbally over VoIP in real time to get their doubts cleared.

To attend the online classes, all you need is a laptop/PC with a basic internet connection. Students have often shared good feedback about attending these live classes through a data card or even a mobile 3G connection, though we recommend a basic broadband connection.

For the best user experience, a headset with a mic is recommended to enhance voice quality, though the laptop’s built-in mic works fine and you can ask your questions over chat as well.

Through the LMS, students can always connect with the trainer or even schedule one-to-one time over the phone or online. During the course we also schedule periodic doubt-clearing classes, though students can also raise doubts from one class in the subsequent class.

LMS also has a discussion forum where a lot of your doubts might get easily answered.

In case you are still having a problem, repeat the class and schedule one-to-one time with the trainer.

  • Instructor Led Live online or Classroom - Within 7 days of registration date and at the latest 3 days before batch start
  • Video-based - 2 days

Yes. While making the fee payment, most of the courses have the installment option.

For all the courses, we also provide recordings of each class for your self-reference as well as revision in case you miss any concept in the class. In case you still have doubts after revising through the recordings, you can also take one-to-one time with the faculty outside classes. Furthermore, if students want to break their courses into different modules, they get one year to repeat any of the classes with other batches.

A 64-bit operating system with a minimum of 8GB RAM is recommended so that the virtual lab can be installed easily.

"First of all i would like to thank Chandra sir, for guiding me to shine myself with a good spirit and ambition. It is one of the best decisions I made in my life to switch my career in analytics and AnalytixLabs provided me with the right platform and support. I was taught in a traditional education system myself for most of my life and this whole side of creative teaching techniques were new to me. I am so happy that I was part of this course which helped me to think different and be craetive. It has impacted my personality in a positive way and gave me a fresh perspective about analytics field.I am so excited to practice what I learned in this course. I am thankful for all the help my tutors and trainers Mr Satish, Mr Shantanu have provided me. I really appreciate their kindness, timely help and detailed feedback.Most of the case studies(retail,banking,customer segmentation and sales forecasting) what we do there are more related to present business scenarios and those are pretty much helpful you to gain good insights from the data and it helps you to solve lot more business cases. I feel more confident now than I was before. I am thankful to Analytix Labs for giving me this opportunity to make new friends and to groom myself for my future analyst career. Analytixlabs will always ready to provide job assistance in different organisations after successful completion of the course and they will help you in resume building."


Satya Narayana
(Jr. Data Analyst)

Change the course of your career

Over 6000 learners, and hundreds more making the right choice every month!