Applications Open now for May 2023 Batch | Applications Close: May 10, 2023 | Exam: July 16, 2023

Degree Level Course

Introduction to Big Data

This course will introduce students to practical aspects of analytics at a large scale, i.e. big data. The course will start with a basic introduction to big data and cloud concepts spanning hardware, systems and software, and then delve into the details of algorithm design and execution at large scale.

by Rangarajan Vasudevan

Course ID: BSCCS3006

Course Credits: 4

Course Type: Elective

Pre-requisites: None

What you’ll learn

Introduction to Cloud Concepts: Cloud-Native architecture, serverless computing, message queues, PaaS, SaaS, IaaS
Introduction to Big Data concepts: divide-and-conquer, parallel algorithms, distributed virtualized storage, distributed resource management, real-time processing.
Technology deep-dive on GCP as the vehicle for the experiments: Google Cloud Storage, Dataflow, Dataproc, Google Pub/Sub, Cloud Functions
Analytics at Large Scale: PySpark, BigQuery, Integration with TensorFlow/PyTorch
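
To give a flavour of the divide-and-conquer idea listed above, here is a minimal sketch in plain Python. It is not course material: the function names (`split_into_chunks`, `count_words`, `parallel_word_count`) and the sample data are hypothetical, and a thread pool on one machine stands in for the cluster of workers that a framework such as PySpark would coordinate.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(lines, num_chunks):
    """Divide: partition the input into roughly equal chunks."""
    size = max(1, -(-len(lines) // num_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def count_words(chunk):
    """Conquer: count words in one chunk, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, num_chunks=4):
    """Combine: merge the per-chunk counts into a single result."""
    chunks = split_into_chunks(lines, num_chunks)
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    return reduce(lambda a, b: a + b, partial_counts, Counter())


# Hypothetical sample input, for illustration only.
sample_lines = [
    "big data needs big clusters",
    "data moves to the cloud",
    "the cloud scales with data",
]
word_counts = parallel_word_count(sample_lines, num_chunks=2)
```

Because each chunk is counted without reference to any other, the per-chunk work could in principle run on separate machines; only the final merge needs to see all partial results.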

Course structure & Assessments

11 weeks of coursework, weekly online assignments, 2 in-person invigilated quizzes, 1 in-person invigilated end term exam. For details of standard course structure and assessments, visit the Academics page.

WEEK 1 Introduction: Big data concepts & GCP Platform Setup
WEEK 2 Cloud concepts: Cloud-Native architecture, serverless computing, message queues, PaaS, SaaS, IaaS
WEEK 3 Types of Data: Data formats, sources & their semantics, processing & storage options on Cloud. Use of serverless to get started (e.g. Google Cloud Functions)
WEEK 4 Intro to Big Data Engineering: Hadoop and PySpark
WEEK 5 ELT: ETL, processing patterns for large data, ETL vs ELT, role of a scheduler
WEEK 6 SQL & NoSQL: For most analysis tasks, SQL is sufficient. Tools like Spark SQL allow that familiarity to translate to big data solutions. Types of NoSQL, evolution, best-of-fit options.
WEEK 7 Streaming: Overview, Fundamental Concepts, Walkthrough of Google Pub/Sub & Google Dataflow as example technologies
WEEK 8 Streaming: Kafka as another example of message queue technology & Spark Streaming
WEEK 9 Big Data ML: Dataproc with ML, including Spark ML (Batch processing)
WEEK 10 Deep Learning with big data on the cloud.
WEEK 11 Prep week for final project, summarizing key concepts, and also for Q&A and clarifications
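
As a small illustration of the Week 6 point that SQL covers most analysis tasks, the sketch below runs a typical aggregation. It uses Python's built-in `sqlite3` module purely as a stand-in, not Spark SQL; the `events` table, its columns, and the rows are hypothetical. In PySpark, a query of this shape could likely be passed to `spark.sql(...)` after registering a DataFrame with `createOrReplaceTempView`, which is the sense in which SQL familiarity carries over.

```python
import sqlite3

# In-memory database with a hypothetical "events" table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("u1", 10.0), ("u2", 5.0), ("u1", 2.5), ("u3", 7.0), ("u2", 1.0)],
)

# A typical analysis question: activity and total spend per user.
query = """
    SELECT user_id, COUNT(*) AS num_events, SUM(amount) AS total_amount
    FROM events
    GROUP BY user_id
    ORDER BY total_amount DESC
"""
rows = conn.execute(query).fetchall()
conn.close()
```

Each element of `rows` is a `(user_id, num_events, total_amount)` tuple, sorted by total amount in descending order.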

About the Instructors

Rangarajan Vasudevan
Co-Founder & Chief Data Officer

Rangarajan Vasudevan is the Co-Founder & CDO of, India’s fastest growing lending cloud. He did “big data” & “data science” before it was fashionable, building data-native applications across industries and geographies over 15+ years.

Ranga joined Lentra by way of an acquisition in June 2022 of his company TheDataTeam, creators of customer intelligence platform. Prior to founding TheDataTeam, Ranga served as Director, Big Data with Teradata Corporation’s international business unit. Ranga joined Teradata via the acquisition of Aster Data Systems, where he was a founding engineer and co-invented a company-defining, patented, pattern recognition algorithm. He is a recipient of both the Distinguished Engineer (R&D) and Consulting Excellence awards while at Teradata.

Ranga has degrees in Computer Science from the University of Michigan and IIT Madras.