Tutorialspoint


Apache Druid : Complete Guide

Ganesh Dhareshwar

Rating - 4.4


Learn Druid architecture, Kafka ingestion, schema evolution, tuning, and Druid Hive integration with a Twitter example

Updated on Apr, 2024

Language - English


Category - Apache Hive, Big Data

Lectures - 21

Resources - 8

Duration - 2 hours


Course Description

What do you learn from this course?

In this course, we learn the salient features of Apache Druid end to end, along with its integration with Apache Hive. We start the course by building theoretical knowledge of Druid and its key features.

Next, we move to the practical part, where we install Druid locally and walk you through its user portal. We change the Druid metadata storage to MySQL and the deep storage to S3 to improve the Druid setup. After that, we write our own Twitter Producer app, which pulls tweets from Twitter in real time and pushes them to Apache Kafka.
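
The producer step described above can be sketched roughly as follows. This is only an illustration, not the application built in the course: the tweet fields (`id`, `text`, `lang`, `created_at`), the topic name `tweets`, and the `send` callable are all assumptions. In a real setup `send` would be backed by a Kafka client; here it is an in-memory stand-in so the sketch runs without a broker or Twitter credentials.

```python
import json
from typing import Callable, Iterable, List, Tuple

# Hypothetical topic name, chosen for illustration only.
TOPIC = "tweets"


def encode_tweet(tweet: dict) -> bytes:
    """Serialize one tweet as UTF-8 JSON, the payload a Kafka message would carry."""
    return json.dumps(tweet, sort_keys=True).encode("utf-8")


def publish(tweets: Iterable[dict], send: Callable[[str, bytes], object]) -> int:
    """Push each tweet to the topic through `send` and return how many were sent."""
    count = 0
    for tweet in tweets:
        send(TOPIC, encode_tweet(tweet))
        count += 1
    return count


# Stand-in for a real Kafka client: collects (topic, value) pairs in memory.
sent: List[Tuple[str, bytes]] = []

# Made-up sample tweets; a real producer would read these from Twitter.
sample = [
    {"id": 1, "text": "hello druid", "lang": "en", "created_at": "2024-04-01T10:00:00Z"},
    {"id": 2, "text": "hola kafka", "lang": "es", "created_at": "2024-04-01T10:00:05Z"},
]

published = publish(sample, lambda topic, value: sent.append((topic, value)))
print(published, "messages queued for topic", TOPIC)
```

Passing the sender in as a callable keeps the serialization logic separate from the Kafka client, so the same function can be exercised locally before it is pointed at a real broker.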

We create a Kafka ingestion task on Druid that pulls tweets from Kafka and stores them in Apache Druid. We also learn how to apply transformations, filters, and schema configuration during the Kafka ingestion process.
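
To give a feel for what such an ingestion task looks like, here is a minimal sketch that assembles a Kafka supervisor spec as a Python dictionary. The datasource and topic names, the column names, the English-only filter, and the upper-casing transform are assumptions made for illustration; the course's actual spec may differ, and field names should be checked against the documentation for the Druid version in use.

```python
import json


def build_supervisor_spec(topic: str, datasource: str, bootstrap: str) -> dict:
    """Assemble a Kafka supervisor spec as a plain dict (all values illustrative)."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Schema configuration: timestamp column and stored dimensions.
                "timestampSpec": {"column": "created_at", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["id", "text", "lang", "text_upper"]},
                "transformSpec": {
                    # Filter: keep only rows whose `lang` equals "en".
                    "filter": {"type": "selector", "dimension": "lang", "value": "en"},
                    # Transformation: derive an upper-cased copy of `text`.
                    "transforms": [
                        {
                            "type": "expression",
                            "name": "text_upper",
                            "expression": 'upper("text")',
                        }
                    ],
                },
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "NONE",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap},
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


# Hypothetical names and broker address for a local setup.
spec = build_supervisor_spec("tweets", "tweets", "localhost:9092")
payload = json.dumps(spec, indent=2)
print(payload)
```

A spec like this is typically submitted as JSON to Druid's supervisor endpoint (`POST /druid/indexer/v1/supervisor`) or pasted into the web console's ingestion view.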

With this practical knowledge in mind, we return to theory and dig deeper into Druid's internal working principles. We learn how data is distributed between the data nodes and retrieved in real time. Next, we tune our ingestion pipeline for better results. Lastly, we explore salient features such as accessing Druid through JDBC and schema evolution.

In the second module, we talk about Druid Hive integration. First, we learn what this integration is. Next, we provision a VM on AWS and install Apache Druid on it. After that, we acquire a Hive EMR cluster from AWS and configure it so that it can communicate with Druid easily. Lastly, we run the same Druid queries on Hive and learn how the computation is pushed down to Druid for better performance.

Overall, this course is a mix of theory and practical sessions. Throughout the course we use the latest Druid and Hive versions. By the end of this course, you will excel at Apache Druid.

Goals

What will you learn in this course:

  • In-depth knowledge of Druid components and architecture
  • Real-time data ingestion from Apache Kafka using a Twitter Producer application
  • Tuning Apache Druid for better throughput
  • Accessing Apache Druid tables through the Avatica JDBC driver
  • Learning schema evolution
  • Complete Druid Hive integration with hands-on experience

Prerequisites

What are the prerequisites for this course?

  • Basics of Apache Kafka and Apache Hive
  • Practical experience with MySQL and AWS

Curriculum

Check out the detailed breakdown of what’s inside the course

Introduction
1 Lecture
  • Introduction 03:03
Apache Druid
15 Lectures
Druid Hive Integration
5 Lectures

Instructor Details

Ganesh Dhareshwar


Data Engineer | Myntra | Udemy Instructor | Corporate Trainer
6+ years of working experience at fast-growing companies such as Myntra Designs Pvt. Ltd, Swiggy, Lendingkart Technologies Private Limited, and Sprinklr. Currently working as a Big Data Engineer at Myntra Designs Pvt. Ltd. Hands-on experience in developing large-scale web applications, writing Spark jobs, developing APIs, etc. Coding is my passion and I love exploring new programming methodologies. I am a self-motivated, highly enthusiastic, dedicated, and confident person. I keep myself occupied with learning and practising. I own a blog and publish wonderful articles on it. I have worked on multiple tech stacks, and I love to run experiments comparing them and publish the results on my blog. I excel in big data technologies such as Apache Airflow, Spark, Hadoop, Hive, Presto, Kafka, Spark Streaming, and Apache Druid. I have worked with the MySQL, MongoDB, MemSQL, PostgreSQL, and Redis databases.

Course Certificate

Use your certificate to make a career change or to advance in your current career.
