Data pre-processing refers to cleaning data: removing invalid entries and noise, replacing missing or incorrect values with relevant ones, and so on. The data need not be text; images and video can be pre-processed as well.
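As a minimal sketch of "replacing data with relevant values", the snippet below treats missing entries and readings outside an assumed plausible range as invalid and replaces them with the median of the valid readings. The readings, the `VALID_RANGE` bounds, and the helper names `is_valid` and `clean` are hypothetical and chosen only for illustration; median imputation is one common choice among many.

```python
from statistics import median

# Hypothetical raw readings: None and out-of-range values count as invalid
raw = [21.5, 22.0, None, 21.8, -999.0, 22.3, 85.0, 21.9]

VALID_RANGE = (-40.0, 60.0)  # assumed plausible range for these readings


def is_valid(x):
    """Return True if x is a number inside the assumed valid range."""
    return isinstance(x, (int, float)) and VALID_RANGE[0] <= x <= VALID_RANGE[1]


def clean(values):
    """Replace every invalid entry with the median of the valid entries."""
    valid = [v for v in values if is_valid(v)]
    fill = median(valid)
    return [v if is_valid(v) else fill for v in values]


cleaned = clean(raw)
print(cleaned)  # [21.5, 22.0, 21.9, 21.8, 21.9, 22.3, 21.9, 21.9]
```

The median is used instead of the mean because the invalid values (such as the `-999.0` sentinel) would otherwise distort the fill value if they slipped through.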
More broadly, data pre-processing means bringing all the collected data, whether it comes from a single source or from many, into a common format or a uniform dataset (depending on the type of data). Real-world data is rarely clean: it may contain missing cells, errors, outliers, inconsistencies between columns, and much more.
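The sketch below, which assumes pandas is available, shows these steps on two small hypothetical sources that store the same field under different column names and units. It renames and converts one source to match the other, stacks both into one uniform table, counts missing cells, and flags outliers with the 1.5 × IQR rule, which is one common heuristic rather than the only option.

```python
import pandas as pd

# Two hypothetical sources with different column names and units
source_a = pd.DataFrame({"name": ["Ann", "Bob"], "height_cm": [165.0, 180.0]})
source_b = pd.DataFrame({"Name": ["Cy", "Di", "Ed"], "height_m": [1.72, None, 2.95]})

# Bring source_b into the same schema and units as source_a
source_b = source_b.rename(columns={"Name": "name"})
source_b["height_cm"] = source_b.pop("height_m") * 100

# Stack both sources into one uniform dataset
data = pd.concat([source_a, source_b], ignore_index=True)

# Count missing cells per column
print(data.isna().sum())

# Flag outliers with the 1.5 * IQR rule; missing values are not flagged
q1, q3 = data["height_cm"].quantile([0.25, 0.75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
data["outlier"] = (data["height_cm"] < low) | (data["height_cm"] > high)
print(data)
```

Comparisons against `NaN` evaluate to `False`, so the row with a missing height is reported as missing rather than as an outlier.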
Images, likewise, may be misaligned, blurry, or very large. The goal of pre-processing is to remove these discrepancies and errors.
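For images, a minimal sketch with scikit-image might address each of these problems in turn. A random array stands in for a real photo, and the target size, the 5-degree tilt, and the sharpening parameters are assumptions chosen for illustration; in practice the tilt angle would have to be estimated first.

```python
import numpy as np
from skimage import color, filters, transform

# Synthetic stand-in for a large RGB photo, values in [0, 1]
rng = np.random.default_rng(0)
img = rng.random((800, 1200, 3))

# Drop colour information when the task does not need it
gray = color.rgb2gray(img)  # shape (800, 1200)

# Shrink the large image to a smaller, uniform size
small = transform.resize(gray, (200, 300), anti_aliasing=True)

# Undo an assumed 5-degree counter-clockwise tilt (negative angle = clockwise)
aligned = transform.rotate(small, angle=-5)

# Sharpen a blurry image with an unsharp mask
sharp = filters.unsharp_mask(aligned, radius=2, amount=1)

print(img.shape, "->", sharp.shape)
```

Resizing early keeps the later steps cheap, since rotation and filtering then run on a quarter of the original pixels in each dimension.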
As an example, let us read an image from disk and display it using the scikit-image library −
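from skimage import io
import matplotlib.pyplot as plt

path = "path to puppy.PNG"  # replace with the actual path to your image
img = io.imread(path)  # load the image as a NumPy array
print("Image being read")
# io.imshow is deprecated in recent scikit-image releases, so Matplotlib displays the image
plt.imshow(img)
plt.show()  # opens a window, or renders inline in a notebook
print("Image displayed")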
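Since `io.imread` returns a NumPy array, a natural next pre-processing step is to check its shape and dtype. PNG files often carry a fourth (alpha) channel, so the sketch below converts such an array to a three-channel float image in [0, 1]. The helper name `to_rgb_float` is hypothetical, and a small synthetic array replaces the real file so the example runs on its own.

```python
import numpy as np
from skimage import color, util


def to_rgb_float(img):
    """Return an RGB float image in [0, 1], dropping an alpha channel if present."""
    if img.ndim == 3 and img.shape[-1] == 4:  # RGBA, common for PNG files
        img = color.rgba2rgb(img)  # blends onto a white background by default
    return util.img_as_float(img)


# Synthetic stand-in for the array io.imread returns for an RGBA PNG
rgba = np.zeros((4, 6, 4), dtype=np.uint8)
rgba[..., 3] = 255  # fully opaque black pixels

rgb = to_rgb_float(rgba)
print(rgba.shape, rgba.dtype, "->", rgb.shape, rgb.dtype)
```

Converting to floats in [0, 1] gives later steps such as resizing or filtering a consistent value range regardless of the file's original bit depth.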