How to overplot a line on a scatter plot in Python?
Overplotting a line on a scatter plot combines scattered data points with a trend line or reference line. This technique is useful for showing relationships, trends, or theoretical models alongside actual data points.
Basic Approach
Create the scatter plot first using scatter(), then add the line using plot() on the same axes:
import matplotlib.pyplot as plt
import numpy as np
# Generate sample data (seeded so the plot is reproducible)
np.random.seed(42)
x_data = np.linspace(0, 10, 20)
y_data = 2 * x_data + 1 + np.random.normal(0, 2, 20)  # Linear with noise
# Create scatter plot
plt.figure(figsize=(8, 6))
plt.scatter(x_data, y_data, color='blue', alpha=0.6, label='Data points')
# Overplot the noise-free line the data was generated from
x_line = np.linspace(0, 10, 100)
y_line = 2 * x_line + 1
plt.plot(x_line, y_line, color='red', linewidth=2, label='Theoretical line')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.legend()
plt.title('Scatter Plot with Overplotted Line')
plt.show()
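The same steps can also be written against an explicit Axes object, which makes "the same axes" literal. The following is a minimal sketch rather than the only way to do it; the names fig, ax, points and line are illustrative, and the seeded random generator is an assumption added for reproducibility.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data: a noisy linear relationship (seeded for reproducibility)
rng = np.random.default_rng(0)
x_data = np.linspace(0, 10, 20)
y_data = 2 * x_data + 1 + rng.normal(0, 2, x_data.size)

# One Figure with one Axes; both artists are added to that same Axes
fig, ax = plt.subplots(figsize=(8, 6))
points = ax.scatter(x_data, y_data, color='blue', alpha=0.6, label='Data points')
(line,) = ax.plot(x_data, 2 * x_data + 1, color='red', linewidth=2, label='Reference line')

ax.set_xlabel('X values')
ax.set_ylabel('Y values')
ax.set_title('Scatter Plot with Overplotted Line (Axes interface)')
ax.legend()
```

Call plt.show() afterwards to display the figure, or fig.savefig() to write it to a file. Working with ax directly is convenient when a figure has several subplots, because each scatter() and plot() call names the subplot it draws on.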
Multiple Lines on Scatter Plot
You can add multiple lines to show different relationships or models:
import matplotlib.pyplot as plt
import numpy as np
# Sample data
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 14.1, 15.9, 18.2, 20.1])
plt.figure(figsize=(10, 6))
# Scatter plot
plt.scatter(x, y, color='blue', s=50, alpha=0.7, label='Actual data')
# Linear trend line
linear_fit = np.polyfit(x, y, 1)
plt.plot(x, np.polyval(linear_fit, x), color='red', linewidth=2, label='Linear fit')
# Quadratic trend line
quad_fit = np.polyfit(x, y, 2)
x_smooth = np.linspace(1, 10, 100)
plt.plot(x_smooth, np.polyval(quad_fit, x_smooth), color='green', linewidth=2, label='Quadratic fit')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.legend()
plt.title('Scatter Plot with Multiple Trend Lines')
plt.grid(True, alpha=0.3)
plt.show()
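When several polynomial degrees are compared, the repeated polyfit/polyval/plot steps can be wrapped in a loop. The helper below, overplot_polyfits, is a hypothetical name introduced for this sketch and is not part of Matplotlib or NumPy; it assumes x and y are one-dimensional arrays of equal length.

```python
import matplotlib.pyplot as plt
import numpy as np

def overplot_polyfits(ax, x, y, degrees, n_points=100):
    """Draw one least-squares polynomial fit per degree on ax (hypothetical helper)."""
    x_smooth = np.linspace(np.min(x), np.max(x), n_points)
    fits = {}
    for degree in degrees:
        coeffs = np.polyfit(x, y, degree)  # highest power first
        ax.plot(x_smooth, np.polyval(coeffs, x_smooth),
                linewidth=2, label=f'Degree {degree} fit')
        fits[degree] = coeffs
    return fits

# Same sample data as in the example above
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 14.1, 15.9, 18.2, 20.1])

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(x, y, color='blue', s=50, alpha=0.7, label='Actual data')
fits = overplot_polyfits(ax, x, y, degrees=(1, 2))
ax.legend()
ax.grid(True, alpha=0.3)
```

Because no color is passed to plot(), each line takes the next color from Matplotlib's default cycle. The returned dictionary keeps the coefficients, so fits[1] holds the slope and intercept of the linear fit.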
Using Seaborn for Enhanced Plots
Seaborn's scatterplot() draws the points, and regplot() with scatter=False adds only the fitted regression line (with its confidence band) on the same axes:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# Create sample dataset
np.random.seed(42)
data = pd.DataFrame({
'x': np.linspace(1, 10, 30),
'y': 3 * np.linspace(1, 10, 30) + np.random.normal(0, 3, 30)
})
plt.figure(figsize=(8, 6))
# Scatter plot with seaborn
sns.scatterplot(data=data, x='x', y='y', color='blue', s=60, alpha=0.7)
# Add regression line
sns.regplot(data=data, x='x', y='y', scatter=False, color='red', line_kws={'linewidth': 2})
plt.title('Scatter Plot with Regression Line (Seaborn)')
plt.show()
Key Parameters
| Parameter | Function | Description |
|---|---|---|
| alpha | scatter() | Controls point transparency (0-1) |
| linewidth | plot() | Sets line thickness |
| label | Both | Adds legend labels |
| color | Both | Sets colors for points/lines |
Best Practices
- Use different colors for scatter points and lines for clarity
- Set appropriate alpha values to avoid overlapping points hiding the line
- Add legends to identify different elements
- Use grid(True, alpha=0.3) for better readability
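A related detail, not covered above, is draw order. Matplotlib stacks artists by their zorder value, and lines default to a higher zorder than scatter markers, so a line stays on top even if plot() is called before scatter(). The sketch below uses made-up dense data to combine the practices above and reads the two values back with get_zorder().

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative dense data where many points overlap
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 500)
y = 0.5 * x + rng.normal(0, 1, x.size)

fig, ax = plt.subplots(figsize=(8, 6))

# The line is added first on purpose, to show call order is not what stacks artists
x_line = np.linspace(0, 10, 100)
(line,) = ax.plot(x_line, 0.5 * x_line, color='red', linewidth=2, label='Reference line')

# A low alpha keeps overlapping markers readable
points = ax.scatter(x, y, color='blue', alpha=0.3, label='Data points')

# The artist with the higher zorder is drawn on top
points_z, line_z = points.get_zorder(), line.get_zorder()

ax.legend()
ax.grid(True, alpha=0.3)
```

If the points should cover the line instead, pass an explicit zorder to scatter() that is larger than the line's.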
Conclusion
Overplotting lines on scatter plots is achieved by calling scatter() followed by plot() on the same axes. Use different colors and add legends to distinguish between data points and trend lines for clear visualization.
