Rainfall Prediction using Machine Learning


Machine learning lets us predict rainfall with several algorithms, including Random Forest and XGBoost.

There is no single best algorithm for predicting rainfall; every algorithm has its advantages and disadvantages. As a rough rule of thumb, Random Forest often works well on small datasets, while XGBoost tends to scale efficiently to large ones.

In the same way, we can categorise other algorithms based on the needs of our projects.
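
One hedged way to compare candidate algorithms for a project is cross-validation. The sketch below runs on synthetic data, not the rainfall dataset, and uses scikit-learn's GradientBoostingRegressor as a stand-in for XGBoost so that it needs no extra dependency. The data, the model settings and the variable names are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption): 200 samples, 3 features
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 100, size=(200, 3))
y_demo = 0.5 * X_demo[:, 0] + 0.3 * X_demo[:, 2] + rng.normal(0, 5, size=200)

# Candidate models; GradientBoostingRegressor stands in for XGBoost here
models = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# 5-fold cross-validated mean absolute error for each candidate
cv_mae = {}
for name, model in models.items():
    scores = cross_val_score(model, X_demo, y_demo, cv=5,
                             scoring="neg_mean_absolute_error")
    cv_mae[name] = -scores.mean()  # scikit-learn reports negated MAE
    print(f"{name}: cross-validated MAE = {cv_mae[name]:.2f}")
```

Whichever model scores lower on data like yours is the more reasonable starting point; the ranking on this synthetic data says nothing about rainfall data.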

Our goal here is to build a predictive machine-learning model of rainfall based on Random Forests.

Algorithm

  • Import the required libraries, such as pandas, NumPy, scikit-learn, and Matplotlib.

  • Load the historical rainfall data into a pandas data frame.

  • Preprocess the data by dropping any unnecessary columns and handling missing values, if any.

  • Split the data into training and testing sets.

  • Choose a machine learning algorithm, such as Random Forest or XGBoost, to use for prediction. For this example, we chose the Random Forest algorithm since it best fits the dataset we selected.

  • Train the algorithm on the training set of data.

  • Use the trained model to predict the rainfall for the given month and year.

  • Evaluate the performance of the model.
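
The preprocessing step in the list above can be sketched as follows. This is a minimal illustration on a tiny made-up table, not the rainfall dataset. It contrasts filling missing values with 0 against filling them with each column's median, which is one common alternative when a missing reading does not mean "no rain". The frame, its values and the variable names are assumptions.

```python
import numpy as np
import pandas as pd

# Tiny made-up table standing in for the rainfall data (assumption)
df_demo = pd.DataFrame({
    "YEAR": [2001, 2002, 2003, 2004],
    "JAN": [10.0, np.nan, 14.0, 12.0],
    "FEB": [20.0, 22.0, np.nan, 18.0],
})
month_cols = ["JAN", "FEB"]

# Count the missing values in each month column
print(df_demo[month_cols].isna().sum())

# Option 1: treat a missing reading as zero rainfall
filled_zero = df_demo.fillna(0)

# Option 2: replace a missing reading with that month's median
filled_median = df_demo.copy()
filled_median[month_cols] = df_demo[month_cols].fillna(df_demo[month_cols].median())

print(filled_zero)
print(filled_median)
```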

Example

# Import required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.ensemble import RandomForestRegressor
import matplotlib.pyplot as plt

# Load the dataset

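df = pd.read_csv("Rainfall_dataset.csv")
df.head()

# Replace missing values with 0
df.fillna(value=0, inplace=True)

# Keep only the rows for the East Uttar Pradesh division
grouped = df.groupby("DIVISION")
UP = grouped.get_group("EAST UTTAR PRADESH")

UP.head()

# Plot a histogram of every numeric column
UP.hist(figsize=(12, 12))
plt.show()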

# Build features and target: FEB, MAR and APR rainfall are the inputs, MAY is the target
data = np.asarray(UP[['FEB', 'MAR', 'APR', 'MAY']])
print(np.shape(data))
X = data[:, 0:3]
y = data[:, 3]

# Generalise to the whole year: every run of three consecutive months is one
# input sample and the month that follows is its target.
# This replaces the X and y built above.
data = np.asarray(UP[['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']])
print(np.shape(data))

n_windows = data.shape[1] - 3
X = np.concatenate([data[:, i:i+3] for i in range(n_windows)], axis=0)
y = np.concatenate([data[:, i+3] for i in range(n_windows)], axis=0)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
np.shape(X_test)

# Train the model. Note that it is fitted on the full X and y, not on X_train.
rf = RandomForestRegressor(n_estimators=100, max_depth=10, n_jobs=1)
rf.fit(X, y)

# Predict on the same data the model was trained on
y_pred = rf.predict(X)

# Evaluate the model: this MAE is an in-sample (training) error
print(mean_absolute_error(y, y_pred))
print(y_pred)

The data is loaded from the Rainfall_dataset.csv file into a pandas DataFrame, and the missing values are filled with 0. The rows for East Uttar Pradesh are then selected. As a first step, the rainfall values for Feb, Mar and Apr are extracted as input features, while the rainfall value for May is stored separately as the target. This is then generalised to the whole year: every run of three consecutive months becomes one input sample, and the month that follows becomes its target. Finally, the dataset is split into training and testing sets.
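
The windowing step is easier to follow on a small array. The sketch below applies the same idea to two made-up rows of five values each; the helper name make_windows and the toy numbers are assumptions for illustration, not part of the example above.

```python
import numpy as np

def make_windows(monthly, window=3):
    """Use each run of `window` consecutive columns as features
    and the column that follows as the target."""
    features, targets = [], []
    for start in range(monthly.shape[1] - window):
        features.append(monthly[:, start:start + window])
        targets.append(monthly[:, start + window])
    return np.concatenate(features, axis=0), np.concatenate(targets, axis=0)

# Two made-up "years" with five "months" each (assumption)
toy = np.array([[1, 2, 3, 4, 5],
                [10, 20, 30, 40, 50]])

X_toy, y_toy = make_windows(toy)
print(X_toy)  # rows: [1 2 3], [10 20 30], [2 3 4], [20 30 40]
print(y_toy)  # [ 4 40  5 50]
```

With twelve monthly columns the same loop produces nine windows per row, so the number of samples is nine times the number of years.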

A random forest regression model is trained on the entire dataset and then used to make predictions on that same dataset; the training and testing sets created earlier are not used. The predicted values are stored in an array. The model is then evaluated with the mean_absolute_error() function, which computes the Mean Absolute Error (MAE) between the actual rainfall values loaded from the dataset and the predicted rainfall values. Because the model is scored on data it has already seen, this MAE is a training error and is likely to understate the error on unseen data.
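
A more conservative evaluation fits the model on the training set only and reports the error on the held-out test set. The sketch below shows that pattern on synthetic data, not the rainfall dataset, so its numbers are not comparable with the output shown in this article. The data and the names rf_demo, train_mae and test_mae are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): three "previous months" predict the next
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 100, size=(300, 3))
y_demo = X_demo.mean(axis=1) + rng.normal(0, 10, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Fit on the training portion only
rf_demo = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42)
rf_demo.fit(X_train, y_train)

# Compare the error on seen data with the error on unseen data
train_mae = mean_absolute_error(y_train, rf_demo.predict(X_train))
test_mae = mean_absolute_error(y_test, rf_demo.predict(X_test))
print(f"train MAE: {train_mae:.2f}")
print(f"test MAE:  {test_mae:.2f}")
```

The test MAE is the figure to report when judging how the model might behave on new data; a large gap between the two values suggests overfitting.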

Output

25.71495399881942
[18.15560485 28.51579025 18.42870772 ...  3.45343635  6.94081644
  8.22604943]

The first line is the mean absolute error (MAE) between the actual values y and the predicted values y_pred. The array that follows holds the predicted values stored in y_pred.

Note − In the above example, rainfall predictions are for East Uttar Pradesh; you may choose any state or region.
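
Switching to another region only requires a different value for the DIVISION column. The sketch below wraps that selection in a small helper with a clear error for misspelled names. The helper select_division, the toy table and the label "EXAMPLE DIVISION" are hypothetical; check the actual labels in your copy of the dataset.

```python
import pandas as pd

def select_division(df, name):
    """Return the rows whose DIVISION equals `name`, or raise if there are none."""
    subset = df[df["DIVISION"] == name]
    if subset.empty:
        available = sorted(df["DIVISION"].unique())
        raise ValueError(f"Unknown division {name!r}; available: {available}")
    return subset

# Made-up rows for illustration only (assumption)
toy_df = pd.DataFrame({
    "DIVISION": ["EAST UTTAR PRADESH", "EXAMPLE DIVISION", "EXAMPLE DIVISION"],
    "JAN": [12.0, 8.5, 9.1],
})

region = select_division(toy_df, "EXAMPLE DIVISION")
print(region)
```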

Make sure you download the dataset from the link mentioned above to get the output.

Conclusion

Machine learning algorithms can be used to build accurate rainfall prediction models that can help in effective water resource management and disaster management.

However, the accuracy of the model depends on the quality of the data, the selection of features, and the selection of the appropriate algorithm.

Therefore, it is important to carefully collect and preprocess the data, select the relevant features, and choose the appropriate machine learning algorithm for rainfall prediction.
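
One hedged way to check which features matter is the feature_importances_ attribute of a fitted random forest. The sketch below uses synthetic data in which only the most recent month drives the target, so the ranking is known in advance. The data and the feature names are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): only the last feature affects the target
rng = np.random.default_rng(1)
X_demo = rng.uniform(0, 100, size=(300, 3))
y_demo = 2.0 * X_demo[:, 2] + rng.normal(0, 5, size=300)

rf_demo = RandomForestRegressor(n_estimators=100, random_state=1)
rf_demo.fit(X_demo, y_demo)

# Hypothetical names for the three lagged-month features
feature_names = ["month_minus_3", "month_minus_2", "month_minus_1"]
importances = dict(zip(feature_names, rf_demo.feature_importances_))

# Print the features from most to least important
for name, score in sorted(importances.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

Features with importance close to zero are candidates for removal, although impurity-based importances can be misleading when features are strongly correlated.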

Updated on: 21-Jul-2023
