Distributed Catalogue Management


A catalog is a database-system structure that stores information about the objects in the database itself; in a distributed database, the catalog holds the metadata of the entire system. Catalog management must be handled effectively because it affects site autonomy, view management, and data distribution and replication. A distributed catalog records how data is distributed across the sites: if a relation is fragmented or replicated, the catalog lets us uniquely locate every replica of every fragment.

A global relation name is the pair <local name, birth site>, and a global replica name is formed by adding a replica id to the global relation name. The site catalog describes the objects, such as fragments and replicas, stored at a site and keeps track of the replicas present there. The three management schemes for a distributed catalog are centralized catalogs, fully replicated catalogs, and partially replicated catalogs.
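The naming scheme above can be sketched in a few lines of Python. The class and field names (`GlobalRelationName`, `GlobalReplicaName`, the `@` and `#r` separators) are illustrative assumptions, not part of any standard; the point is only that a global relation name pairs a local name with its birth site, and a replica name adds a replica id on top.

```python
from dataclasses import dataclass

# Hypothetical sketch of the naming scheme: a global relation name is the
# pair <local name, birth site>; a global replica name adds a replica id.

@dataclass(frozen=True)
class GlobalRelationName:
    local_name: str
    birth_site: str

    def __str__(self):
        return f"{self.local_name}@{self.birth_site}"

@dataclass(frozen=True)
class GlobalReplicaName:
    relation: GlobalRelationName
    replica_id: int

    def __str__(self):
        return f"{self.relation}#r{self.replica_id}"

emp = GlobalRelationName("employee", "site_A")
replica = GlobalReplicaName(emp, 2)
```

Because the birth site is part of the name, any site can resolve a relation or replica name without consulting a global registry, which is what makes the partially replicated scheme described later workable.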

Centralized Catalogs

In the centralized scheme, the entire catalog is stored at a single site. This makes it simple to use and understand, but it hurts reliability, availability, site autonomy, and the distribution of processing load. For a read from a non-central site, the required catalog data is locked at the central site and then transmitted to the requesting site; on completion of the read, an acknowledgment is sent back to the central site, which unlocks the data. Every update must also be processed at the central site, so write-intensive applications suffer and the central site can quickly become a performance bottleneck.
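The lock-ship-acknowledge read protocol can be sketched as follows. The names (`CentralCatalog`, `read_entry`, `ack`) are assumptions made for illustration; a real system would add timeouts, shared read locks, and failure handling.

```python
# Sketch of the centralized read protocol: the central site locks the
# requested entry, ships it to the requesting site, and unlocks it only
# when the requester's acknowledgment arrives.

class CentralCatalog:
    def __init__(self, entries):
        self.entries = dict(entries)   # catalog data, held only here
        self.locked = set()            # entries currently locked for a read

    def read_entry(self, key):
        if key in self.locked:
            raise RuntimeError(f"{key} is locked by another reader")
        self.locked.add(key)           # lock before shipping the data
        return self.entries[key]       # transmitted to the remote site

    def ack(self, key):
        self.locked.discard(key)       # acknowledgment releases the lock

catalog = CentralCatalog({"employee": {"fragments": 3, "birth_site": "S1"}})
entry = catalog.read_entry("employee")
catalog.ack("employee")                # remote site confirms receipt
```

Even in this toy version the bottleneck is visible: every read and every write serializes through one object, which is exactly the performance constraint the paragraph above describes.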

Fully Replicated Catalogs

In this scheme, every site holds an identical copy of the whole catalog. Queries can therefore be answered locally, and reads are fast. All changes, however, must be propagated to every site. To guarantee catalog consistency, updates are handled as transactions using a centralized two-phase commit protocol. As with the centralized approach, write-intensive applications generate extra network traffic because every write must be broadcast.
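A minimal sketch of the broadcast-with-two-phase-commit update, under assumed names (`SiteCatalog`, `broadcast_update`): every replica first stages the update and votes in a prepare phase; only if all vote yes does the coordinator tell them to commit, keeping all copies identical.

```python
# Two-phase commit sketch for updating a fully replicated catalog.

class SiteCatalog:
    def __init__(self):
        self.entries = {}
        self.pending = None

    def prepare(self, key, value):
        self.pending = (key, value)    # stage the update, vote yes
        return True

    def commit(self):
        key, value = self.pending
        self.entries[key] = value      # make the staged update visible
        self.pending = None

def broadcast_update(sites, key, value):
    # Phase 1: every replica must agree to the update.
    if not all(site.prepare(key, value) for site in sites):
        return False
    # Phase 2: apply it everywhere, so all copies stay identical.
    for site in sites:
        site.commit()
    return True

sites = [SiteCatalog() for _ in range(3)]
ok = broadcast_update(sites, "employee", {"fragments": 3})
```

The per-write cost is apparent: each update needs two message rounds to every site, which is why the text warns about network traffic under write-intensive workloads.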

Partially Replicated Catalogs

The centralized and fully replicated schemes limit site autonomy because they must maintain a consistent global picture of the catalog. In the partially replicated scheme, each site keeps a complete catalog of the data stored locally at that site, and it may additionally cache catalog entries fetched from other sites. These cached copies are not guaranteed to be the most recent versions. The system keeps catalog entries both at the site where an object was created (its birth site) and at the sites where it is replicated, and any modification made to a copy is promptly propagated to the birth site; it can take some time before updated copies reach the sites holding outdated ones. Fragments of a relation are generally available at just one site. To provide data distribution transparency, users should be able to define synonyms for remote objects and use those synonyms in subsequent references.
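One common way to detect a stale cached entry is to version each entry at its birth site and compare versions on lookup; the sketch below uses that approach, which is an assumption for illustration (the names `Site`, `lookup`, and the version counter are hypothetical, and a real system would validate the cache with a message to the birth site rather than direct access).

```python
# Sketch of the partially replicated scheme: each site is authoritative
# for objects born there and caches (possibly stale) entries from others.

class Site:
    def __init__(self, name):
        self.name = name
        self.local = {}    # authoritative entries for objects born here
        self.cache = {}    # copies fetched from other sites, may be stale

    def create(self, key, value):
        self.local[key] = {"value": value, "version": 1}

    def update(self, key, value):
        entry = self.local[key]
        entry["value"] = value
        entry["version"] += 1          # bump the version at the birth site

    def lookup(self, key, birth_site):
        if key in self.local:
            return self.local[key]["value"]
        cached = self.cache.get(key)
        fresh = birth_site.local[key]
        # Refresh the cached copy if it is missing or out of date.
        if cached is None or cached["version"] < fresh["version"]:
            self.cache[key] = dict(fresh)
        return self.cache[key]["value"]

s1, s2 = Site("S1"), Site("S2")
s1.create("employee", "3 fragments")
first = s2.lookup("employee", s1)      # fetched from S1 and cached
s1.update("employee", "4 fragments")
second = s2.lookup("employee", s1)     # stale cache detected, refreshed
```

Note that between the update at S1 and the next lookup at S2, the cached copy at S2 is outdated, which mirrors the delay the paragraph above describes.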

Conclusion

This article covered distributed catalog management, where the catalog holds information about database objects and affects site autonomy, view management, and data distribution and replication. Three management schemes for distributed catalogs were described: a centralized catalog, which keeps all catalog data at a single site; a fully replicated catalog, which keeps a copy of the whole catalog at every site; and a partially replicated catalog, in which each site keeps the catalog for its locally stored data and caches entries from other sites.

Updated on: 14-Jul-2023
