Setting Up Hadoop Pre-requisites and Security Hardening


Before you can set up Hadoop, you must meet specific requirements and put security hardening in place. First, install the essential software prerequisites, such as the Java Development Kit (JDK) and Secure Shell (SSH). Before establishing the network settings, verify that DNS resolution and firewall rules are correct. Then secure access by creating user accounts for the Hadoop services and assigning them the proper permissions. Harden Hadoop's security by activating Kerberos-based authentication and authorization and by setting up SSL/TLS for encrypted communication. To further safeguard sensitive data housed in Hadoop clusters, apply security patches regularly and put stringent access controls in place.

Methods Used

  • Manual Installation.

  • Hadoop Distributions and Deployment Tools.

Manual Installation

In the context of configuring Hadoop prerequisites and implementing security hardening, manual installation entails carrying out the required steps directly on a Linux system. Using a package manager, install necessary software dependencies such as the JDK and SSH. Edit configuration files to adjust network settings, DNS resolution, and firewall rules. Set up SSH access and create user accounts with the proper permissions for the Hadoop services. Install and configure the required packages to enable authentication and authorization systems such as Kerberos. Secure communication by creating and installing SSL/TLS certificates. To safeguard critical data housed in Hadoop clusters, enforce stringent access rules and install security fixes regularly. Manual installation gives you more flexibility and control over the setup process.

Algorithm

  • Install Software Dependencies 

  • Install JDK and SSH as well as other necessary software dependencies using a package manager (such as apt or yum).

sudo apt update
sudo apt install openjdk-8-jdk ssh
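
A quick check confirms that both tools are installed (exact output varies by system) 

java -version
ssh -V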
  • Adjust Network Settings 

  • Edit the required network configuration files to adjust DNS resolution and firewall rules in line with your network environment.

sudo nano /etc/hosts
sudo nano /etc/resolv.conf
  • Establish User Accounts 

  • The "useradd" command can be used to establish user accounts for Hadoop services like HDFS and YARN.

  • Set the proper permissions for the user accounts to provide secure access and restrict privileges as needed.

sudo useradd -m -s /bin/bash hadoop_user
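
To restrict the account's privileges, you can also give it ownership of the Hadoop installation directory. A minimal sketch, assuming Hadoop is unpacked under /opt/hadoop (adjust the path to your layout) 

sudo chown -R hadoop_user:hadoop_user /opt/hadoop
sudo chmod -R 750 /opt/hadoop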
  • Establish SSH Access 

  • Generate SSH key pairs with the "ssh-keygen" command.

  • Copy the public key of each Hadoop service user to the authorized_keys file to provide secure SSH access.

ssh-keygen -t rsa -b 4096
  • Append the public key to the Hadoop user's authorized_keys file.

cat ~/.ssh/id_rsa.pub >> /home/hadoop_user/.ssh/authorized_keys
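
SSH is strict about file permissions, and key-based logins will fail silently if the .ssh directory or the authorized_keys file is too open, so tighten them as a safe default 

chmod 700 /home/hadoop_user/.ssh
chmod 600 /home/hadoop_user/.ssh/authorized_keys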
  • Enable Authentication and Authorization 

  • To enable secure user authentication, install and set up Kerberos or another authentication system.

  • Define access control policies to restrict user permissions and enforce authorization.

sudo apt install krb5-user

Configure Kerberos by editing the krb5.conf file 

sudo nano /etc/krb5.conf
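
The exact contents of krb5.conf depend on your environment. A minimal sketch, using a placeholder realm EXAMPLE.COM and a placeholder KDC host kdc.example.com 

[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM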
  • Establish SSL/TLS for Secure Communications 

  • For the Hadoop services, create SSL/TLS certificates using software like OpenSSL.

  • Install the generated certificates and configure the relevant Hadoop components to enable secure communication.

openssl req -newkey rsa:2048 -nodes -keyout key.pem -x509 -days 365 -out certificate.pem
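
Hadoop services read certificates from Java keystores rather than PEM files, so the generated key and certificate usually need to be converted. One way to do this, assuming the file names from the command above and a placeholder password "changeit" 

openssl pkcs12 -export -in certificate.pem -inkey key.pem -out keystore.p12 -name hadoop -passout pass:changeit
keytool -importkeystore -srckeystore keystore.p12 -srcstoretype PKCS12 -srcstorepass changeit -destkeystore keystore.jks -deststorepass changeit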

Example configuration for Hadoop core-site.xml 

<property>
   <name>hadoop.ssl.enabled</name>
   <value>true</value>
</property>
<property>
   <name>hadoop.ssl.keystores.factory.class</name>
   <value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
</property>
<property>
   <name>hadoop.ssl.server.conf</name>
   <value>ssl-server.xml</value>
</property>
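
The ssl-server.xml file referenced above is where the keystore details live. A minimal sketch, assuming the keystore built earlier was copied to /etc/hadoop and "changeit" is a placeholder password 

<configuration>
   <property>
      <name>ssl.server.keystore.location</name>
      <value>/etc/hadoop/keystore.jks</value>
   </property>
   <property>
      <name>ssl.server.keystore.password</name>
      <value>changeit</value>
   </property>
   <property>
      <name>ssl.server.truststore.location</name>
      <value>/etc/hadoop/truststore.jks</value>
   </property>
</configuration>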
  • Apply Security Patches 

  • Routinely check for updates and security fixes for installed software dependencies.

  • Apply the fixes promptly to eliminate potential security vulnerabilities.

sudo apt update && sudo apt upgrade
  • Put Strict Access Controls in Place 

  • Configure network access and firewall rules to limit who can access the Hadoop cluster.

  • Enforce strict password policies and make sure that only authorized users can access sensitive information.

sudo apt install ufw
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow ssh
sudo ufw enable
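
Beyond SSH, the Hadoop daemons listen on well-known ports, such as 9870 for the NameNode web UI and 8088 for the YARN ResourceManager in Hadoop 3.x. A sketch that opens them only to a trusted subnet, here the placeholder 10.0.0.0/24 

sudo ufw allow from 10.0.0.0/24 to any port 9870 proto tcp
sudo ufw allow from 10.0.0.0/24 to any port 8088 proto tcp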

Hadoop Distributions and Deployment Tools

Hadoop distributions and deployment tools simplify setting up Hadoop prerequisites and implementing security hardening. These solutions provide pre-packaged Hadoop installations with built-in software dependencies and security settings. By following the provided documentation and deployment guidance, users can easily configure network settings, manage user accounts, and enable security features. Additionally, deployment tools like Ambari simplify Hadoop cluster setup and management by automating many configuration procedures and offering a user-friendly web-based interface. Using Hadoop distributions and deployment tools speeds up setup, ensures consistency, and makes it easier to harden the Hadoop environment's security effectively.

Algorithm

Installing Requirements:

  • To install JDK, use a package manager (such as apt or yum) 

sudo apt install default-jdk
  • To enable remote access, install SSH 

sudo apt install openssh-server
  • Set up network settings

sudo nano /etc/hosts
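
Every node in the cluster should be able to resolve the others by name. A sketch of /etc/hosts entries, using placeholder addresses and hostnames 

192.168.1.10   namenode.example.com    namenode
192.168.1.11   datanode1.example.com   datanode1
192.168.1.12   datanode2.example.com   datanode2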
  • For Hadoop services, create user accounts 

sudo adduser hadoop_user
  • Set ownership and permissions for the Hadoop directories 

sudo chown -R hadoop_user:hadoop_group /hadoop_directory

Security Hardening

  • Enable Kerberos authentication 

sudo apt install krb5-user
sudo nano /etc/krb5.conf
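
Once the realm is configured and a principal exists for the Hadoop user, authentication can be verified from the command line; hadoop_user@EXAMPLE.COM below is a placeholder principal 

kinit hadoop_user@EXAMPLE.COM
klist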
  • Set up SSL/TLS 

sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/ssl/private/hadoop.key -out /etc/ssl/certs/hadoop.crt
sudo nano /etc/hadoop/hadoop-env.sh
  • Install security updates 

sudo apt update
sudo apt upgrade
  • Put access controls in place −

sudo chmod 700 /hadoop_directory
sudo ufw allow 22  # SSH access
sudo ufw enable

Optional − Automation with Configuration Management Tools

  • Automate installation and configuration tasks using Ansible.

  • Define Ansible playbooks with tasks for each phase, as sketched below.

  • Use the ansible-playbook command to run the playbooks on the target Linux computers.
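
A minimal playbook sketch covering the prerequisite steps, assuming an inventory group named hadoop_nodes (the apt and user modules are standard Ansible; the names and values are placeholders) 

- hosts: hadoop_nodes
  become: yes
  tasks:
    # Install the JDK and SSH packages via apt
    - name: Install JDK and SSH
      apt:
        name:
          - openjdk-8-jdk
          - ssh
        state: present
        update_cache: yes

    # Create the dedicated service account for Hadoop
    - name: Create the Hadoop service user
      user:
        name: hadoop_user
        shell: /bin/bash
        create_home: yes

Run it against the target machines with 

ansible-playbook -i inventory hadoop_setup.yml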

Optional − Hadoop Distributions and Deployment Tools

  • Pick a distribution of Hadoop like Cloudera or Hortonworks.

  • Follow the distribution's documentation and deployment instructions.

  • For streamlined Hadoop cluster setup and maintenance, use deployment tools like Ambari.

Conclusion

In conclusion, establishing the prerequisites for Hadoop and putting security hardening measures in place are essential for a safe and effective Hadoop environment. Installing the required software dependencies, setting up the network, creating user accounts, and enabling authentication technologies like Kerberos all increase the security of the Hadoop cluster. Establishing SSL/TLS for secure communication, installing security patches often, and enforcing stringent access controls further protect sensitive data stored in Hadoop clusters from unauthorized access. Optional approaches, such as automation with configuration management tools or the use of Hadoop distributions and deployment tools, add further ease and efficiency to the setup process. Together, these measures provide a strong and secure Hadoop infrastructure.
