
Data Engineering - Data Storage
There are two types of digital information: input and output data. Users provide the input data, while computers generate the output data. The CPU requires user input to perform computations and produce outputs.
Originally, users had to enter data into computers manually, which was time-consuming and inefficient. Random access memory (RAM) offered a short-term solution, but it has limited storage capacity and retention. Read-only memory (ROM), whose contents can be read but not modified during normal operation, is used to control the computer's basic functions.
Despite advancements in computer memory such as dynamic RAM (DRAM) and synchronous DRAM (SDRAM), these technologies are still limited by cost, space, and memory retention. When a computer is powered off, RAM loses its data.
With data storage, users can save data on a device, ensuring it is retained even if the computer powers down. Instead of manually entering the data, users can instruct the computer to retrieve the information from storage devices. Computers can read data from various sources and save it to the same or different storage locations. Additionally, users can share data with others.
Relational Database
A relational database is a collection of information that organizes data into predefined relationships. Data is stored in one or more tables with rows and columns, making it easy to understand how different data structures relate to each other. Relationships are logical connections between tables, established based on their interactions.
Here's a simple example of two tables a small business might use to process orders for its products. The first table is a customer information table, where each record includes the customer's name, billing information, address, phone number, and other contact details. Each piece of information is in its own column, and the database assigns a unique ID to each row. In the second table, the customer order table, each record includes the ID of the customer who placed the order, the product ordered, the quantity, the selected size and color, and so on, but not the customer's name or contact information. The shared customer ID is what links an order to the customer who placed it.
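The two tables described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the table names, column names, and sample values are assumptions made for the example, and an in-memory SQLite database stands in for whatever system a real business would use.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: column names are assumptions, not taken from a real system.
conn.executescript("""
CREATE TABLE customers (
    customer_id  INTEGER PRIMARY KEY,   -- unique ID assigned to each row
    name         TEXT NOT NULL,
    billing_info TEXT,
    address      TEXT,
    phone        TEXT
);
CREATE TABLE orders (
    order_id     INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customers(customer_id),
    product      TEXT NOT NULL,
    quantity     INTEGER NOT NULL,
    size         TEXT,
    color        TEXT
);
""")

# Insert one made-up customer and one order that refers to it by ID only.
cur = conn.execute(
    "INSERT INTO customers (name, billing_info, address, phone) VALUES (?, ?, ?, ?)",
    ("Ada Example", "card on file", "1 Sample Street", "555-0100"),
)
customer_id = cur.lastrowid
conn.execute(
    "INSERT INTO orders (customer_id, product, quantity, size, color) VALUES (?, ?, ?, ?, ?)",
    (customer_id, "T-shirt", 2, "M", "blue"),
)

# The order row holds no name; a join on the shared ID recovers it.
rows = conn.execute("""
    SELECT c.name, o.product, o.quantity
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
""").fetchall()
print(rows)
```

The order table stores only the customer's ID, so contact details live in one place and are looked up through the join when needed.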
The relational model separates logical data structures from physical storage structures. This allows database administrators to manage physical storage without affecting data access.
This separation also applies to database operations. Logical operations let an application specify the content it needs, while physical operations determine how that data is accessed and retrieved.
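One way to see this separation is to change the physical layout and confirm that the logical query and its result stay the same. The sketch below uses SQLite through Python's `sqlite3` module; the `orders` table, the index name `idx_orders_customer`, and the generated data are assumptions for illustration. The exact wording of the plan text depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, product TEXT)"
)
# 100 made-up orders spread over 10 hypothetical customers.
conn.executemany(
    "INSERT INTO orders (customer_id, product) VALUES (?, ?)",
    [(i % 10, f"product-{i}") for i in range(100)],
)

# Logical operation: states WHAT is wanted, not how to find it.
query = "SELECT order_id, product FROM orders WHERE customer_id = ? ORDER BY order_id"

def plan(connection):
    # The last column of each EXPLAIN QUERY PLAN row describes the access strategy.
    return [row[-1] for row in connection.execute("EXPLAIN QUERY PLAN " + query, (3,))]

rows_before = conn.execute(query, (3,)).fetchall()
plan_before = plan(conn)

# Physical change: add an index. The query text is untouched.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

rows_after = conn.execute(query, (3,)).fetchall()
plan_after = plan(conn)

print("same result:", rows_before == rows_after)
print("plan before:", plan_before)
print("plan after: ", plan_after)
```

The application code that issues the query does not change; only the strategy the database engine chooses to satisfy it does.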
To maintain data accuracy and accessibility, relational databases follow integrity rules. For instance, one rule might prevent duplicate rows in a table to avoid inconsistent data.
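As a minimal sketch of such a rule, the example below declares a `UNIQUE` constraint in SQLite and shows the database rejecting a duplicate value. The `customers` table and its `email` column are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The PRIMARY KEY and UNIQUE constraints are integrity rules enforced by the database itself.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO customers (customer_id, email) VALUES (1, 'ada@example.com')")

try:
    # Same email under a different ID: violates the UNIQUE rule.
    conn.execute("INSERT INTO customers (customer_id, email) VALUES (2, 'ada@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print("rows stored:", row_count)
```

Because the rule lives in the schema, every application that writes to the table is held to it without having to re-implement the check.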
NoSQL Databases
NoSQL databases are designed to handle large volumes of unstructured and semi-structured data. Unlike traditional relational databases with predefined schemas, NoSQL databases use flexible data models that can adapt to changes and scale horizontally to manage increasing data volumes. They are classified into four main categories:
- Key-value stores
- Column-family stores
- Graph databases
- Document databases
NoSQL databases are often used in applications that handle high volumes of data, such as social media, gaming, and e-commerce, where data must be processed and analyzed in real time. They are also used in other applications, such as content management systems and document management.
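To make the four categories more concrete, the sketch below models each one with plain Python data structures. It is not the API of any real NoSQL product; every key, field, and value is invented for illustration, and the point is only to show the shape of the data each model stores.

```python
# Key-value store: an opaque value looked up by a single key.
kv_store = {"session:42": "user=ada;cart=3"}

# Column-family store: row key -> column family -> column -> value.
column_family = {
    "user:ada": {
        "profile": {"name": "Ada", "city": "London"},
        "activity": {"last_login": "2024-01-01"},
    }
}

# Document store: self-describing records; fields may differ between documents.
documents = [
    {"_id": 1, "name": "Ada", "orders": [{"product": "T-shirt", "quantity": 2}]},
    {"_id": 2, "name": "Grace", "tags": ["vip"]},
]

# Graph database: nodes plus relationships, here as an adjacency mapping.
graph = {
    "ada": {"grace"},
    "grace": {"ada", "alan"},
    "alan": {"grace"},
}

# Typical access pattern for each model.
session = kv_store["session:42"]                       # direct key lookup
city = column_family["user:ada"]["profile"]["city"]    # read one column of one row
with_orders = [d["name"] for d in documents if "orders" in d]  # filter on an optional field

# Friends of friends of "ada", excluding ada and her direct friends.
direct = graph["ada"]
friends_of_friends = set()
for friend in direct:
    friends_of_friends |= graph[friend]
friends_of_friends -= direct | {"ada"}

print(session, city, with_orders, friends_of_friends)
```

Note that the two documents carry different fields without any schema change, which is the flexibility the text contrasts with predefined relational schemas.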