In simple language, big data is a collection of data that is larger and more complex than traditional data and that keeps growing exponentially with time. It is so large that traditional data management software and tools cannot store, manage, or process it efficiently. So, it needs to be processed step by step using different methodologies.
There are three main issues with big data: data quality, data volume, and data integration. They are as follows −
Low-quality or inaccurate data may lead to inaccurate results or predictions, which wastes the time and effort of the people working with it.
To solve problems, make predictions, or find new patterns in the data, the data must be accurate and of high quality.
Because of the large amount of data, no traditional data management tool or software can process it directly or easily. These data sets are usually terabytes in size, which makes them very hard to process in a single pass.
So we need to process the data in stages, such as removing unnecessary or low-quality data and partitioning the data by some defined factor.
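The staged approach described above can be sketched in a few lines of Python. This is only a minimal illustration, not a production pipeline: the record fields (`region`, `amount`), the quality rule in `is_usable`, and the tiny chunk size are all assumptions made for the example. It reads the data in small chunks, drops low-quality records, and partitions the rest by a chosen field.

```python
from collections import defaultdict
from itertools import islice


def read_in_chunks(records, chunk_size):
    """Yield lists of at most chunk_size records, so the whole data set
    never has to be held in memory at once."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def is_usable(record):
    """Hypothetical quality rule: a record needs a non-empty region
    and a numeric amount."""
    return bool(record.get("region")) and isinstance(
        record.get("amount"), (int, float)
    )


def clean_and_partition(records, chunk_size=2):
    """Stage 1: drop low-quality records. Stage 2: group by 'region'."""
    partitions = defaultdict(list)
    for chunk in read_in_chunks(records, chunk_size):
        for record in chunk:
            if is_usable(record):
                partitions[record["region"]].append(record)
    return dict(partitions)


# Made-up sample data standing in for a much larger source.
sample = [
    {"region": "north", "amount": 120.0},
    {"region": "", "amount": 75.5},        # missing region: dropped
    {"region": "south", "amount": None},   # missing amount: dropped
    {"region": "north", "amount": 40},
    {"region": "south", "amount": 310.25},
]

partitions = clean_and_partition(sample)
for region, rows in sorted(partitions.items()):
    print(region, len(rows))
```

With real big data, each partition would typically be written to separate storage or handed to a separate worker rather than kept in one dictionary; the in-memory version here only shows the order of the stages.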
Data comes from various types of sources like social media, different websites, captured images/videos, customer logs, reports created by individuals, newspapers, emails, etc.
Collecting and integrating data of such different types into one consistent form is a very challenging task.
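To make the integration problem concrete, here is a small Python sketch that converts records from three differently shaped sources into one common format. Everything in it is assumed for illustration: the sample inputs, the source formats (JSON for social media posts, CSV for customer logs, plain text for emails), and the target fields `source`, `author`, `body`, and `date`.

```python
import csv
import io
import json

# Hypothetical raw inputs standing in for three different sources.
SOCIAL_JSON = '[{"user": "ana", "text": "Great service!", "ts": "2024-01-05"}]'
CUSTOMER_LOG_CSV = "customer,message,date\nben,Order delayed,2024-01-06\n"
EMAIL_TEXT = "From: cara\nDate: 2024-01-07\n\nPlease update my address."


def from_social(raw):
    """Map JSON social media posts onto the common fields."""
    return [
        {"source": "social", "author": p["user"], "body": p["text"], "date": p["ts"]}
        for p in json.loads(raw)
    ]


def from_customer_log(raw):
    """Map CSV customer log rows onto the common fields."""
    reader = csv.DictReader(io.StringIO(raw))
    return [
        {"source": "log", "author": r["customer"], "body": r["message"], "date": r["date"]}
        for r in reader
    ]


def from_email(raw):
    """Split a simple email into headers and body, then map it
    onto the common fields."""
    header_text, _, body = raw.partition("\n\n")
    headers = dict(line.split(": ", 1) for line in header_text.splitlines())
    return [
        {"source": "email", "author": headers["From"], "body": body.strip(), "date": headers["Date"]}
    ]


def integrate():
    """Combine all sources into one list sorted by date."""
    records = (
        from_social(SOCIAL_JSON)
        + from_customer_log(CUSTOMER_LOG_CSV)
        + from_email(EMAIL_TEXT)
    )
    return sorted(records, key=lambda r: r["date"])


unified = integrate()
for record in unified:
    print(record["date"], record["source"], record["author"])
```

Each source gets its own small adapter function, so supporting a new source means adding one more adapter rather than changing the code that consumes the unified records. Sources such as images or videos would need a very different kind of adapter, which is part of why integration is hard.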