Best Practices for Deploying Hadoop Server on CentOS/RHEL 8


Hadoop is an open-source framework that is used for distributed storage and processing of large datasets. It provides a reliable, scalable, and efficient way to manage Big Data. CentOS/RHEL 8 is a popular Linux distribution that can be used to deploy a Hadoop server. However, deploying Hadoop on CentOS/RHEL 8 can be a complex process, and there are several best practices that should be followed to ensure a successful deployment.

In this article, we will discuss best practices for deploying a Hadoop server on CentOS/RHEL 8. We will cover the following topics:

  • Pre-requisites for Deploying Hadoop on CentOS/RHEL 8

  • Installing Java

  • Installing Hadoop

  • Configuring Hadoop

  • Starting Hadoop Services

  • Testing Hadoop

Pre-requisites for Deploying Hadoop on CentOS/RHEL 8

Before deploying Hadoop on CentOS/RHEL 8, you need to ensure that the following pre-requisites are met:

  • A CentOS/RHEL 8 server with a minimum of 4 GB of RAM and 2 CPU cores.

  • A user account with sudo privileges.

  • Network connectivity to the internet.

Installing Java

Hadoop requires Java to be installed on the server. CentOS/RHEL 8 ships OpenJDK in its repositories, which works with Hadoop, but this article installs the Oracle JDK, which some organizations prefer for its commercial support.

To install Oracle JDK, follow the steps below.

Download the Oracle JDK tarball from the Oracle website (an Oracle account is required for the download).

Extract the tarball using the following command:

tar -xvf jdk-8u281-linux-x64.tar.gz

Move the extracted directory to /opt using the following command:

sudo mv jdk1.8.0_281 /opt/

Set the JAVA_HOME environment variable by adding the following line to the /etc/environment file:

JAVA_HOME=/opt/jdk1.8.0_281

Because /etc/environment is only read at login, and sourcing it does not export the variable to child processes, also export it in your current session:

export JAVA_HOME=/opt/jdk1.8.0_281
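
You can verify the installation by checking the Java version; the path below assumes the JDK location used above:

$JAVA_HOME/bin/java -version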

Installing Hadoop

To install Hadoop on CentOS/RHEL 8, follow the steps below.

Download the Hadoop tarball from the Apache website.
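
For example, version 3.3.0 (used throughout this article) can be fetched from the Apache archive; verify the download against the checksums published alongside it:

wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz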

Extract the tarball using the following command:

tar -xvf hadoop-3.3.0.tar.gz

Move the extracted directory to /opt using the following command:

sudo mv hadoop-3.3.0 /opt/

Set the HADOOP_HOME environment variable by adding the following line to the /etc/environment file:

HADOOP_HOME=/opt/hadoop-3.3.0

As with JAVA_HOME, export the variable in your current session as well:

export HADOOP_HOME=/opt/hadoop-3.3.0
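
The commands used later in this article (hdfs, start-all.sh, and so on) live in $HADOOP_HOME/bin and $HADOOP_HOME/sbin, so add both to your PATH as well; a minimal sketch for the current session:

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin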

Configuring Hadoop

After installing Hadoop, you need to configure it to work with your cluster. The configuration files are located in the $HADOOP_HOME/etc/hadoop directory. The two main configuration files that you need to modify are core-site.xml and hdfs-site.xml.
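
Hadoop's startup scripts also read JAVA_HOME from hadoop-env.sh rather than from the login environment, so set it there before editing the site files; the path below assumes the JDK location used earlier:

echo "export JAVA_HOME=/opt/jdk1.8.0_281" | sudo tee -a $HADOOP_HOME/etc/hadoop/hadoop-env.sh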

Configuring core-site.xml

The core-site.xml file contains configuration properties for Hadoop's core services. To configure core-site.xml, follow the steps below.

Open the core-site.xml file in a text editor:

sudo vi $HADOOP_HOME/etc/hadoop/core-site.xml

Add the following configuration properties to the file:

<configuration>
   <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
   </property>
</configuration>

Save and close the file.

Configuring hdfs-site.xml

The hdfs-site.xml file contains configuration properties for Hadoop's distributed file system (HDFS). To configure hdfs-site.xml, follow the steps below.

Open the hdfs-site.xml file in a text editor:

sudo vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Add the following configuration properties to the file:

<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.namenode.name.dir</name>
      <value>/hadoop/data/namenode</value>
   </property>
   <property>
      <name>dfs.datanode.data.dir</name>
      <value>/hadoop/data/datanode</value>
   </property>
</configuration>

Save and close the file.
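
The directories referenced by dfs.namenode.name.dir and dfs.datanode.data.dir must exist and be writable by the user running Hadoop before the NameNode is formatted; a sketch assuming you run Hadoop as your current user:

sudo mkdir -p /hadoop/data/namenode /hadoop/data/datanode
sudo chown -R $USER:$USER /hadoop/data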

Starting Hadoop Services

After configuring Hadoop, you need to start the Hadoop services. To start them, follow the steps below.

Format the Hadoop file system (this only needs to be done once, before the first start) by running the following command:

hdfs namenode -format

Start the Hadoop daemons by running the following command:

start-all.sh
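
Two points worth noting here: start-all.sh is deprecated in Hadoop 3 in favor of the per-service scripts, and the start scripts connect to each node (including localhost on a single-node setup) over SSH, so passwordless SSH must be configured first. A minimal single-node sketch:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

start-dfs.sh
start-yarn.sh

You can then run jps to confirm that the NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager processes are running.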

Testing Hadoop

After starting the Hadoop services, you need to test the installation to ensure that it is working properly. To test Hadoop, follow the steps below.

Create a test file in HDFS by running the following command:

hdfs dfs -touchz /test.txt

Verify that the file was created by running the following command:

hdfs dfs -ls /

Remove the test file by running the following command:

hdfs dfs -rm /test.txt

If the above commands execute without any errors, Hadoop is working properly.
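
For a slightly deeper smoke test, you can run one of the MapReduce examples bundled with Hadoop; the jar path below assumes the Hadoop 3.3.0 layout used in this article:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar pi 2 10

This estimates pi with 2 map tasks of 10 samples each, exercising the full read, compute, and write path.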

Here are some additional best practices that can be helpful when deploying a Hadoop server on CentOS/RHEL 8:

Secure Hadoop Cluster: By default, Hadoop does not have any security measures in place. To secure your Hadoop cluster, you should enable authentication and authorization (typically with Kerberos), enable encryption, and configure firewalls.
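
As a starting point for the firewall part, CentOS/RHEL 8 ships firewalld; a sketch that opens the ports used by this article's configuration (9000 for the HDFS RPC endpoint set in fs.defaultFS, and 9870 for the Hadoop 3 NameNode web UI):

sudo firewall-cmd --permanent --add-port=9000/tcp
sudo firewall-cmd --permanent --add-port=9870/tcp
sudo firewall-cmd --reload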

Optimize Hadoop Performance: Hadoop performance can be improved by tuning parameters such as block size, replication factor, and memory allocation, and by using techniques like data compression and data partitioning.
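
For example, the HDFS block size is controlled by the dfs.blocksize property in hdfs-site.xml; a sketch raising it from the 128 MB default to 256 MB, which can help workloads dominated by large sequential files:

<property>
   <name>dfs.blocksize</name>
   <value>268435456</value>
</property>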

Backup and Restore Hadoop Data: Hadoop is designed to handle large datasets, which makes backing up and restoring data challenging. To back up and restore Hadoop data, you can use tools like DistCp and Hadoop Archive (HAR).
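
For instance, DistCp copies data between clusters (or between paths on the same cluster) as a MapReduce job; the hostnames below are placeholders for your own NameNodes:

hadoop distcp hdfs://namenode1:9000/data hdfs://backup-namenode:9000/data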

Monitor Hadoop Cluster: Monitoring your Hadoop cluster is important to ensure that it is running smoothly and efficiently. You can use monitoring tools such as Ganglia, Nagios, and Ambari to monitor your Hadoop cluster.

Upgrade Hadoop: As new versions of Hadoop are released, it is important to upgrade your Hadoop cluster to take advantage of new features and bug fixes. Before upgrading, you should back up your data and test the upgrade in a non-production environment.

By following these best practices, you can ensure that your Hadoop deployment on CentOS/RHEL 8 is secure, optimized, and efficient. Hadoop is a powerful tool for managing Big Data, and with the right deployment strategy, you can take advantage of its capabilities to extract insights from your data.

Conclusion

In conclusion, deploying Hadoop on CentOS/RHEL 8 can be a complex process, but following the best practices outlined in this article can make the process smoother and more efficient. By ensuring that the pre-requisites are met, installing Java and Hadoop correctly, configuring Hadoop properly, starting the Hadoop services, and testing the installation, you can deploy a Hadoop server on CentOS/RHEL 8 with confidence.
