How to Monitor System Usage, Detect Outages, and Troubleshoot Linux Servers


Linux servers power a wide range of applications and services. System administrators and DevOps professionals need to monitor server usage, detect outages, and troubleshoot issues as they arise to keep systems performant, reliable, and available.

In this blog post, we will explore practical techniques and tools for monitoring system usage, detecting outages, and troubleshooting Linux servers. By implementing these practices, you can proactively identify potential problems, mitigate risks, and maintain a healthy server infrastructure.

Monitoring System Usage

To ensure optimal performance and resource management, it is crucial to monitor system usage on Linux servers. Let's explore some essential tools and commands for monitoring different aspects of system usage:

Using "top" Command to Monitor System Resources

The "top" command provides a real-time overview of system resource usage, including CPU, memory, and processes. Simply run the following command in your terminal −

top

It displays a dynamic table with detailed information about each process, CPU usage, memory consumption, and more. Pressing 'q' will exit the "top" command.

Monitoring CPU Usage and Load Average

To check CPU usage, you can use the "mpstat" command. Run the following command −

mpstat

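It shows CPU usage statistics, including the percentage of time spent in user, system, I/O wait, and idle states. Run without arguments, it reports averages since boot. Another useful command is "uptime", which shows how long the system has been running and the load average over the last 1, 5, and 15 minutes: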

uptime

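The load average is the average number of processes that are running or waiting for CPU time. On Linux it also counts processes in uninterruptible sleep, which are typically waiting on disk I/O. Compare it with the number of CPU cores: a load average that stays above the core count suggests the system is saturated.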
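
To make the comparison with the core count concrete, here is a minimal sketch. The function name load_per_core is hypothetical; it assumes input in the format of /proc/loadavg, where the first field is the 1-minute load average.

```shell
# load_per_core CORES  (hypothetical helper, not a standard tool)
# Reads one line in /proc/loadavg format from stdin and prints the
# 1-minute load average divided by CORES, to two decimal places.
load_per_core() {
    awk -v cores="$1" '{ printf "%.2f\n", $1 / cores }'
}

# Typical use on a live system (nproc is part of GNU coreutils):
#   load_per_core "$(nproc)" < /proc/loadavg
```

A result that stays above 1.00 means more work is queued than the cores can handle.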

Monitoring Memory Usage and Swap Usage

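The "free" command reports total, used, free, and available memory, along with swap. Because Linux uses otherwise idle memory for caches, the "available" column is the better estimate of how much memory applications can still claim. Run the following command: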

free -h

To check swap usage, use the "swapon" command:

swapon --show

It lists each active swap device or file with its size and current usage.
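
For scripted checks, the same memory figures can be read from /proc/meminfo. The sketch below is illustrative: mem_used_percent is a hypothetical name, and it assumes the MemAvailable field is present (kernel 3.14 or later).

```shell
# mem_used_percent  (hypothetical helper)
# Reads /proc/meminfo-style text from stdin and prints the share of RAM
# in use, computed as (MemTotal - MemAvailable) / MemTotal, in percent.
mem_used_percent() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total > 0) printf "%.0f\n", (total - avail) * 100 / total
        }
    '
}

# Typical use: mem_used_percent < /proc/meminfo
```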

Monitoring Disk Usage and I/O operations

The "df" command displays disk space usage for mounted file systems:

df -h

To monitor disk I/O operations, you can use the "iotop" command, which usually requires root privileges:

sudo iotop

It provides real-time information about disk I/O usage by processes.
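
Disk space checks are easy to automate. The following sketch filters the POSIX-format output of "df -P" for file systems at or above a usage threshold. The name disks_over_threshold is hypothetical, and the parsing assumes mount points without spaces.

```shell
# disks_over_threshold PERCENT  (hypothetical helper)
# Reads "df -P" output from stdin and prints the mount point and usage
# of every file system whose usage is at or above PERCENT.
disks_over_threshold() {
    awk -v limit="$1" '
        NR > 1 {                      # skip the header line
            pct = $5
            sub(/%/, "", pct)         # "90%" becomes "90"
            if (pct + 0 >= limit + 0) print $6, $5
        }
    '
}

# Typical use: df -P | disks_over_threshold 80
```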

Monitoring Network Activity and Connections

The "iftop" command lets you monitor network bandwidth usage in real time. It usually requires root privileges:

sudo iftop

It displays a table showing network connections, data transfer rates, and more.

These are just a few examples of the tools and commands available for monitoring system usage on Linux servers. By regularly monitoring these metrics, you can gain insights into resource utilization and identify any potential bottlenecks or performance issues.

Next, we will explore how to detect and troubleshoot outages on Linux servers.

Detecting and Troubleshooting Outages

Detecting and resolving outages is crucial for maintaining the availability and reliability of your Linux servers. Let's explore some techniques and tools to help you detect and troubleshoot outages effectively −

Using the "ping" Command to Check Network Connectivity

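The "ping" command allows you to check the reachability and response time of a remote server or IP address. Run the following command:

ping -c 4 example.com

It sends four ICMP echo requests to the specified host and displays the round-trip time and packet loss information. Without the "-c" option, "ping" on Linux keeps running until you press Ctrl+C. This can help you determine if there are any network connectivity issues, but keep in mind that some hosts and firewalls drop ICMP traffic, so a failed ping does not by itself prove that a server is down.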
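
In a script, the packet loss figure is often the only part of the ping output that matters. A minimal sketch, assuming the summary line format printed by the common Linux (iputils) ping; packet_loss is a hypothetical helper name:

```shell
# packet_loss  (hypothetical helper)
# Reads ping output from stdin and prints the packet loss percentage
# from the summary line, without the percent sign.
packet_loss() {
    awk '/packet loss/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /%$/) { sub(/%/, "", $i); print $i; exit }
    }'
}

# Typical use: ping -c 4 example.com | packet_loss
```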

Checking DNS Resolution using "nslookup" or "dig"

To verify DNS resolution, you can use the "nslookup" or "dig" command. For example:

nslookup example.com

or

dig example.com

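These commands query DNS for the records of the specified domain and show which server answered. The "dig" output also includes the response status and query time, which helps you tell a DNS failure apart from a problem on the server itself.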
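
For scripted checks, "dig +short" prints only the answer data, which may include alias (CNAME) targets ahead of the addresses. The sketch below keeps only lines that look like IPv4 addresses; ipv4_answers is a hypothetical name, and the pattern does not validate octet ranges.

```shell
# ipv4_answers  (hypothetical helper)
# Reads "dig +short" output from stdin and prints only the lines that
# look like dotted-quad IPv4 addresses. Exits non-zero if there are none.
ipv4_answers() {
    grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}

# Typical use: dig +short example.com A | ipv4_answers
```

Because grep returns a non-zero status when nothing matches, the function can be used directly in an "if" statement to detect a name that did not resolve.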

Monitoring System Logs for Errors and Warnings

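System logs, such as the syslog or the systemd journal, contain valuable information about system events, errors, and warnings. You can use commands like "grep" or "tail" to filter and view specific log entries. The path below is the Debian and Ubuntu default; Red Hat-based systems write to /var/log/messages, and on systemd hosts "journalctl -p err -b" lists error-level entries from the current boot. The "-i" option makes the search case-insensitive, so it also matches "Error" and "ERROR":

grep -i "error" /var/log/syslog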

or

tail -n 50 /var/log/syslog

By monitoring system logs, you can identify any anomalies or issues that might be causing outages.
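
When a log contains many errors, grouping them by program shows where to look first. This is a sketch under one loud assumption: lines use the traditional syslog layout ("Mon DD HH:MM:SS host program[pid]: message"), so the program is the fifth field. Systems configured for ISO timestamps need a different field number. The name errors_by_program is hypothetical.

```shell
# errors_by_program  (hypothetical helper)
# Reads traditional-format syslog lines from stdin and prints, for each
# program, how many of its lines mention "error" (case-insensitive),
# most frequent first.
errors_by_program() {
    awk '
        tolower($0) ~ /error/ {
            prog = $5
            sub(/\[[0-9]+\]:$/, "", prog)   # strip a trailing "[pid]:"
            sub(/:$/, "", prog)             # strip a bare trailing ":"
            count[prog]++
        }
        END { for (p in count) print count[p], p }
    ' | sort -rn
}

# Typical use: errors_by_program < /var/log/syslog
```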

Analyzing Apache or Nginx Access Logs for Web Server Issues

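For web servers like Apache or Nginx, access logs can provide insights into potential issues or attacks. Use commands like "grep" or "tail" to analyze the logs. A bare search for 500 would also match that number inside URLs and byte counts, so the pattern below anchors on the closing quote of the request line, which precedes the status code in the default combined log format:

grep '" 500 ' /var/log/apache2/access.log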

or

tail -n 50 /var/log/nginx/access.log

This helps you identify any error responses or suspicious activity that might impact web server performance.
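
A count of responses per status code gives a quick picture of web server health. The sketch below assumes the common or combined log format, in which the status code is the ninth whitespace-separated field; custom log formats need a different field number. The name status_counts is hypothetical.

```shell
# status_counts  (hypothetical helper)
# Reads access log lines in common/combined format from stdin and
# prints each HTTP status code with the number of requests that got it.
status_counts() {
    awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' | sort
}

# Typical use: status_counts < /var/log/nginx/access.log
```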

Checking Service Status and Restarting Services if Needed

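Regularly checking the status of critical services helps confirm that they are running properly. Use "systemctl" to check a service and, if it has failed, restart it (restarting requires root privileges). The unit name "apache2" applies to Debian and Ubuntu; on Red Hat-based systems it is "httpd":

systemctl status apache2

or

sudo systemctl restart apache2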
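
The check and the restart can be combined in a small script. This is a sketch, not a recommendation to restart blindly: ensure_running is a hypothetical name, it assumes it is run as root, and it relies on "systemctl is-active --quiet", which returns success only when the unit is active.

```shell
# ensure_running UNIT  (hypothetical helper; run as root)
# Reports whether UNIT is active and restarts it if it is not.
ensure_running() {
    unit=$1
    if systemctl is-active --quiet "$unit"; then
        echo "$unit: active"
    else
        echo "$unit: inactive, restarting"
        systemctl restart "$unit"
    fi
}

# Typical use: ensure_running apache2
```

Before relying on automatic restarts, check the logs for the reason the service stopped; a restart can hide a recurring fault.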

Next, we will focus on troubleshooting performance issues that might impact server performance and responsiveness.

Troubleshooting Performance Issues

When it comes to Linux server management, troubleshooting performance issues is a critical skill. Let's explore some strategies and tools that can help you identify and resolve performance problems:

Using "top" and "htop" to Identify Resource-hungry Processes

The "top" and "htop" commands provide real-time information about CPU and memory usage, allowing you to identify processes that consume excessive resources. Run one of the following commands ("htop" may need to be installed separately):

top

or

htop

These commands display a list of running processes along with CPU and memory utilization. Look for processes with high CPU or memory usage that might be causing performance issues. In "top", press Shift+P to sort by CPU usage or Shift+M to sort by memory usage.
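
For a non-interactive snapshot, the same ranking can be built from "ps" output. In this sketch, top_cpu is a hypothetical name and the input is assumed to come from "ps -eo pid,pcpu,comm". Note that the CPU figure reported by "ps" is averaged over the lifetime of each process, so it can differ from the live figure in "top".

```shell
# top_cpu N  (hypothetical helper)
# Reads "ps -eo pid,pcpu,comm" output from stdin and prints the N
# processes with the highest CPU percentage as "CPU PID COMMAND".
top_cpu() {
    awk 'NR > 1 { print $2, $1, $3 }' | sort -rn | head -n "$1"
}

# Typical use: ps -eo pid,pcpu,comm | top_cpu 5
```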

Analyzing CPU Performance using "sar" or "mpstat"

The "sar" command, part of the sysstat package, collects and reports system resource utilization, including CPU statistics. Run the following command:

sar -u 1 5

This command reports CPU usage once per second, five times. You can also use the "mpstat" command to monitor CPU performance:

mpstat -P ALL 1 5

It provides detailed CPU statistics, including per-core utilization. The interval and count arguments work as they do for "sar"; without them, "mpstat" reports averages since boot.

Monitoring Disk I/O using "iotop" or "iostat"

To analyze disk I/O performance, you can use the "iotop" command, as mentioned earlier. Additionally, the "iostat" command provides detailed I/O statistics for devices and partitions:

iostat -d -x 1 5

This command reports extended per-device I/O statistics once per second, five times. The first report shows averages since boot; the later ones cover each interval. A device whose "%util" value stays close to 100 is a likely disk I/O bottleneck.
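
The "%util" check can be scripted. The sketch below assumes that "%util" is the last column of the "iostat -d -x" device table, which holds for the sysstat versions I am aware of but should be verified on your system. The name busy_devices is hypothetical.

```shell
# busy_devices PERCENT  (hypothetical helper)
# Reads "iostat -d -x" output from stdin and prints each device whose
# last column (assumed to be %util) is at or above PERCENT.
busy_devices() {
    awk -v limit="$1" '
        $1 ~ /^Device/ { in_table = 1; next }   # header row starts a table
        NF == 0        { in_table = 0; next }   # blank line ends it
        in_table && $NF + 0 >= limit + 0 { print $1, $NF }
    '
}

# Typical use: iostat -d -x 1 5 | busy_devices 80
```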

Investigating Memory Usage with "free" and "vmstat"

The "free" command, as mentioned earlier, provides information about memory usage. Additionally, the "vmstat" command offers insights into virtual memory statistics:

vmstat 1 5

This command reports memory, swap, I/O, and CPU activity once per second, five times; the first line shows averages since boot. Sustained nonzero values in the "si" and "so" columns, which show memory swapped in from disk and out to disk, indicate memory pressure.
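
Swap activity in "vmstat" output can be counted with a short filter. In this sketch, swapping_samples is a hypothetical name; it assumes the default column layout, where swap-in and swap-out are the seventh and eighth columns, and it skips the first data row because that row holds averages since boot.

```shell
# swapping_samples  (hypothetical helper)
# Reads "vmstat INTERVAL COUNT" output from stdin and prints how many
# interval samples show swap-in (column 7) or swap-out (column 8)
# activity. Lines 1-2 are headers; line 3 is the since-boot average.
swapping_samples() {
    awk 'NR > 3 && ($7 > 0 || $8 > 0) { n++ } END { print n + 0 }'
}

# Typical use: vmstat 1 5 | swapping_samples
```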

Profiling Application Performance with "strace" or "perf"

To delve into the performance of specific applications, you can use tools like "strace" or "perf". For example, the "strace" command traces system calls made by a process:

strace -p <pid>

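This allows you to analyze the interactions between the application and the operating system. Attaching usually requires root privileges or ownership of the process, and tracing slows the target noticeably, so use it sparingly in production. For CPU profiling with lower overhead, "perf top -p <pid>" shows which functions the process spends its time in.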
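
A full trace is often more detail than needed. The "-c" option of "strace" prints a per-system-call summary instead, sorted by time. The sketch below extracts the top entries from such a summary; top_syscalls is a hypothetical name, and the parsing assumes the usual summary layout with the percentage in the first column and the system call name in the last.

```shell
# top_syscalls N  (hypothetical helper)
# Reads an "strace -c" summary from stdin and prints the first N
# system calls with their share of traced time as "NAME PERCENT".
top_syscalls() {
    awk '$1 ~ /^[0-9]+\.[0-9]+$/ && $NF != "total" { print $NF, $1 }' |
        head -n "$1"
}

# Typical use (the output file name is only an example):
#   sudo timeout 10 strace -c -p <pid> -o strace-summary.txt
#   top_syscalls 5 < strace-summary.txt
```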

Conclusion

Monitoring and troubleshooting Linux servers effectively is essential for maintaining system reliability and performance. With the techniques and tools covered in this article, you can detect outages early, identify performance bottlenecks, and resolve issues promptly. Regularly checking system usage, reviewing logs, and using performance analysis tools helps you minimize downtime and keep your servers running smoothly.

Updated on: 09-Aug-2023
