Hadoop Articles

Found 15 articles

Setting Up Hadoop Pre-requisites and Security Hardening

Ayush Singh
Updated on 17-Mar-2026 464 Views

Hadoop Pre-requisites and Security Hardening involves installing essential software dependencies, configuring network settings, creating secure user accounts, and implementing authentication mechanisms before deploying a Hadoop cluster. This process ensures that the distributed computing environment operates securely, with proper access controls and encrypted communications.

Methods Used
Manual Installation − Direct configuration on Linux systems using package managers and command-line tools.
Hadoop Distributions and Deployment Tools − Using pre-packaged solutions like Cloudera or Hortonworks with automated setup tools.

Manual Installation
Manual installation provides complete control over the Hadoop setup process. This approach involves directly installing software ...
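The excerpt above mentions implementing authentication mechanisms before deployment. As a minimal sketch of what that typically looks like, the following core-site.xml fragment switches Hadoop from its default "simple" authentication to Kerberos and turns on service-level authorization; the property names are standard Hadoop settings, but the surrounding Kerberos setup (KDC, principals, keytabs) is assumed and not shown here.

```xml
<!-- core-site.xml fragment: enable Kerberos authentication (sketch only;
     assumes a working KDC and per-service keytabs configured elsewhere) -->
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>
```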

Read More

Difference between Hadoop and MongoDB

Pradeep Kumar
Updated on 15-Mar-2026 947 Views

Hadoop is a Java-based distributed computing framework designed to store and analyze large volumes of data across multiple computer clusters. It processes enormous amounts of structured and unstructured data through its core components: HDFS (Hadoop Distributed File System) for storage and MapReduce for parallel data processing. MongoDB is an open-source NoSQL document database that stores data in BSON format rather than traditional tables, rows, and columns. It's designed to solve performance, availability, and scalability issues of SQL-based databases by offering a flexible, document-oriented approach to data storage.

What is Hadoop?
Apache Hadoop is a distributed computing platform ...
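The excerpt describes MapReduce as Hadoop's model for parallel data processing. As a minimal sketch of that model in plain Python (no Hadoop involved; the function names are illustrative, not part of any Hadoop API), a word count splits into a map phase that emits (word, 1) pairs and a reduce phase that groups by key and sums:

```python
from collections import defaultdict

def map_phase(lines):
    # Emit a (word, 1) pair for every word, as a Hadoop mapper would.
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    # Group pairs by key and sum the values, as a Hadoop reducer would.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data big", "data processing"]
print(reduce_phase(map_phase(lines)))  # {'big': 2, 'data': 2, 'processing': 1}
```

In real Hadoop, the map and reduce steps run in parallel across cluster nodes and the grouping ("shuffle") happens between them; this sketch only shows the programming model.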

Read More

Difference Between RDBMS and Hadoop

Shirjeel Yunus
Updated on 14-Mar-2026 4K+ Views

RDBMS stores structured data in tables with ACID compliance using SQL. Hadoop is an open-source framework for distributed storage and processing of large-scale structured and unstructured data using HDFS and MapReduce.

What is RDBMS?
RDBMS (Relational Database Management System) stores data in tables with rows and columns, following ACID properties (Atomicity, Consistency, Isolation, Durability). It is designed for fast storage and retrieval of structured data using SQL. Examples: Oracle, MySQL, PostgreSQL.

What is Hadoop?
Hadoop is an open-source framework for running distributed applications and storing large-scale data. It handles structured, semi-structured, and unstructured data with high ...
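To make the RDBMS side of the comparison concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for an RDBMS such as MySQL or PostgreSQL (the table and column names are invented for illustration): data lives in a fixed-schema table, writes happen inside a transaction, and retrieval uses SQL.

```python
import sqlite3

# In-memory SQLite database standing in for a full RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employees (name, dept) VALUES (?, ?)",
    [("Asha", "Data"), ("Ravi", "Ops")],
)
conn.commit()  # the transaction commits as a unit (Atomicity in ACID)

rows = conn.execute("SELECT name FROM employees WHERE dept = ?", ("Data",)).fetchall()
print(rows)  # [('Asha',)]
```

Hadoop, by contrast, has no fixed schema or SQL engine in its core: data is written as files to HDFS and interpreted at processing time.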

Read More

Difference between Apache Kafka and Flume

Mahesh Parahar
Updated on 14-Mar-2026 898 Views

Apache Kafka and Apache Flume are both used for real-time data processing and are developed by Apache. Kafka is a general-purpose publish-subscribe messaging system, while Flume is specifically designed for collecting and moving log data into the Hadoop ecosystem (HDFS).

Apache Kafka
Kafka is a distributed data store optimized for ingesting and processing streaming data in real time. It uses a publish-subscribe model where producers publish messages to topics and consumers pull messages at their own pace. Kafka is highly available, resilient to node failures, and supports automatic recovery.

Apache Flume
Flume is a distributed system ...
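The publish-subscribe model the excerpt attributes to Kafka can be sketched in a few lines of plain Python. This toy broker is not Kafka and uses no Kafka API; it only illustrates the two ideas named above: producers append messages to named topics, and each consumer keeps its own offset so it pulls at its own pace.

```python
from collections import defaultdict

class MiniBroker:
    """Toy publish-subscribe broker illustrating the Kafka-style model."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic -> append-only message log
        self.offsets = {}                 # (topic, consumer) -> next read position

    def publish(self, topic, message):
        # Producers append to the topic's log.
        self.topics[topic].append(message)

    def poll(self, topic, consumer, max_messages=10):
        # Each consumer advances its own offset independently.
        pos = self.offsets.get((topic, consumer), 0)
        batch = self.topics[topic][pos:pos + max_messages]
        self.offsets[(topic, consumer)] = pos + len(batch)
        return batch

broker = MiniBroker()
broker.publish("logs", "line 1")
broker.publish("logs", "line 2")
print(broker.poll("logs", "c1", max_messages=1))  # ['line 1']
print(broker.poll("logs", "c1"))                  # ['line 2']
print(broker.poll("logs", "c2"))                  # ['line 1', 'line 2']
```

Note how consumer "c2" still sees both messages: the log is retained and re-readable, which is the key difference from a push-based collector like Flume that moves each event onward once.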

Read More

Difference between Hadoop 1 and Hadoop 2

Mahesh Parahar
Updated on 14-Mar-2026 9K+ Views

Hadoop is an open-source framework from the Apache Software Foundation, built on Java, designed for storing and processing Big Data across distributed clusters. Apache released Hadoop 2 as a major upgrade over Hadoop 1, introducing YARN for resource management and support for multiple processing models beyond MapReduce.

Hadoop 1
Hadoop 1 uses a tightly coupled architecture where MapReduce handles both data processing and cluster resource management. It uses a single NameNode (single point of failure) and relies on fixed map/reduce task slots for resource allocation. Hadoop 1 only supports MapReduce as its processing model.

Hadoop 2 ...
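The YARN-based split the excerpt describes shows up directly in Hadoop 2's configuration. As a sketch (these are standard Hadoop 2 property names, though a real cluster needs more settings than shown), MapReduce is told to run as just one application on top of YARN rather than managing the cluster itself:

```xml
<!-- mapred-site.xml fragment: run MapReduce on YARN
     instead of the Hadoop 1 JobTracker -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

<!-- yarn-site.xml fragment: let NodeManagers provide the
     shuffle service that MapReduce jobs rely on -->
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```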

Read More

How to Install and Configure Hive with High Availability?

Satish Kumar
Updated on 12-May-2023 2K+ Views

Hive is an open-source data warehousing framework built on top of Apache Hadoop. It allows users to query large datasets stored in Hadoop using a SQL-like language called HiveQL. Hive provides an interface for data analysts and developers to work with Hadoop without having to write complex MapReduce jobs. In this article, we will discuss how to install and configure Hive with high availability. High availability (HA) is a critical requirement for any production system: HA ensures that the system remains available even in the event of hardware or software failures. In the context of Hive, HA means that the Hive server is ...
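One common way to achieve the HA the excerpt describes is to run multiple HiveServer2 instances registered in ZooKeeper, so clients discover a live server instead of depending on a single host. As a sketch, the hive-site.xml fragment below enables that dynamic service discovery; the property names are real HiveServer2 settings, but the ZooKeeper host names are placeholders for your own quorum.

```xml
<!-- hive-site.xml fragment: register HiveServer2 instances in ZooKeeper
     (zk1/zk2/zk3 are placeholder host names) -->
<configuration>
  <property>
    <name>hive.server2.support.dynamic.service.discovery</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.zookeeper.quorum</name>
    <value>zk1:2181,zk2:2181,zk3:2181</value>
  </property>
</configuration>
```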

Read More

How to Install and Configure Apache Hadoop on a Single Node in CentOS 8?

Satish Kumar
Updated on 12-May-2023 3K+ Views

Apache Hadoop is an open-source framework that allows for distributed processing of large data sets. It can be installed and configured on a single node, which is useful for development and testing purposes. In this article, we will discuss how to install and configure Apache Hadoop on a single node running CentOS 8.

Step 1: Install Java
Apache Hadoop requires Java to be installed on the system. To install Java, run the following command −
sudo dnf install java-11-openjdk-devel

Step 2: Install Apache Hadoop
Apache Hadoop can be downloaded from the official Apache website. The latest stable version at the time of writing ...
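After installation, a single-node ("pseudo-distributed") setup needs only a couple of configuration entries. As a sketch (fs.defaultFS and dfs.replication are standard Hadoop properties; the port 9000 is a conventional choice, not mandatory), HDFS is pointed at localhost and block replication is reduced to 1 because there is only one DataNode:

```xml
<!-- core-site.xml fragment: single-node HDFS endpoint -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml fragment: one node means one replica -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```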

Read More

Difference between Mahout and Hadoop

Premansh Sharma
Updated on 13-Apr-2023 678 Views

Introduction
In today's world, humans generate huge quantities of data from platforms such as social media and health care, and from this data we must extract information to grow businesses and develop our society. To handle this data and extract information from it, we use two important technologies named Hadoop and Mahout. Hadoop and Mahout are both important technologies in the field of big data analytics, but they have different functionalities and use cases. Hadoop is primarily used for batch processing, while Mahout is used for building machine-learning models. Ultimately, the choice depends on the user's needs. In ...

Read More

Big Data Servers Explained

Satish Kumar
Updated on 10-Apr-2023 939 Views

In the era of digitalization, data has become the most valuable asset for businesses. Organizations today generate an enormous amount of data on a daily basis. This data can be anything, from customer interactions to financial transactions, product information, and more. Managing and storing this massive amount of data requires a robust and efficient infrastructure, which is where big data servers come in. Big data servers are a type of server infrastructure designed to store, process, and manage large volumes of data. In this article, we will delve deeper into what big data servers are, how they work, and some popular examples. ...

Read More

Best Practices for Deploying Hadoop Server on CentOS/RHEL 8

Satish Kumar
Updated on 10-Apr-2023 759 Views

Hadoop is an open-source framework that is used for distributed storage and processing of large datasets. It provides a reliable, scalable, and efficient way to manage Big Data. CentOS/RHEL 8 is a popular Linux distribution that can be used to deploy a Hadoop server. However, deploying Hadoop on CentOS/RHEL 8 can be a complex process, and there are several best practices that should be followed to ensure a successful deployment. In this article, we will discuss best practices for deploying a Hadoop server on CentOS/RHEL 8. We will cover the following sub-headings − Pre-requisites for Deploying Hadoop on CentOS/RHEL 8 ...

Read More
Showing 1–10 of 15 articles