Is it called replication when separate copies of data are stored at two or more sites?

Data replication is the process of making multiple copies of data and storing them at different locations for backup, fault tolerance, and improved accessibility across a network. Similar to data mirroring, data replication can be applied to both individual computers and servers. The replicas can be stored within the same system, on on-site or off-site hosts, or on cloud-based hosts.

Common database technologies today either have built-in replication capabilities or rely on third-party tools to accomplish data replication. While Oracle Database and Microsoft SQL Server actively support data replication, some traditional technologies may not include this feature out of the box.

Data replication can be either synchronous, meaning that every change is written to the replica at the same time as it is written to the original, or asynchronous, meaning that changes are copied to the replica only after they have been committed to the original database, so the replica can briefly lag behind.
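The difference between the two modes can be illustrated with a toy in-memory model. This is only a sketch: the `Primary` and `Replica` classes and the `flush` method are hypothetical names invented for illustration, and a real database ships changes over the network rather than through a method call.

```python
from collections import deque


class Replica:
    """Hypothetical in-memory replica: a dict of key -> value."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary:
    """Hypothetical primary that replicates synchronously or asynchronously."""

    def __init__(self, replica, synchronous):
        self.data = {}
        self.replica = replica
        self.synchronous = synchronous
        self.pending = deque()  # changes not yet shipped (async mode only)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Synchronous: the replica is updated before the write returns.
            self.replica.apply(key, value)
        else:
            # Asynchronous: the change is queued and shipped later.
            self.pending.append((key, value))

    def flush(self):
        """Ship queued changes to the replica (stands in for a background sender)."""
        while self.pending:
            self.replica.apply(*self.pending.popleft())


sync_replica = Replica()
Primary(sync_replica, synchronous=True).write("x", 1)

async_replica = Replica()
async_primary = Primary(async_replica, synchronous=False)
async_primary.write("x", 1)
lagging_view = dict(async_replica.data)  # the replica has not caught up yet
async_primary.flush()
```

In this model the synchronous replica is never behind the primary, while the asynchronous replica is empty until `flush` runs. That gap is the replication lag that asynchronous setups trade for faster writes.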

Benefits of data replication

Although data replication can be demanding in terms of cost, computational, and storage requirements, businesses widely use this database management technique to achieve one or more of the following goals:

Improve the availability of data

When a particular system experiences a technical glitch due to malware or a faulty hardware component, the data can still be accessed from a different site or node. Data replication enhances the resilience and reliability of systems by storing data at multiple nodes across the network.

Increase data access speed

In organizations where there are multiple branch offices spread across the globe, users may experience some latency while accessing data from one country to another. Placing replicas on local servers provides users with faster data access and query execution times.

Enhance server performance

Database replication reduces the load on the primary server by distributing it among other nodes in the distributed system, thereby improving overall system performance. By routing all read operations to a replica database, IT administrators can reserve the primary server for write operations, which demand more processing power.
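One way to picture this read/write split is a small router that sends reads to replicas in turn and everything else to the primary. The `Router` class and the server names below are assumptions made for illustration. Classifying a statement by its leading `SELECT` keyword is a deliberate simplification, not how production proxies decide.

```python
import itertools


class Router:
    """Hypothetical router: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, statement):
        # Treat anything that is not a SELECT as a write (a simplification).
        is_read = statement.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary


router = Router("primary", ["replica-1", "replica-2"])
targets = [
    router.route("SELECT * FROM orders"),
    router.route("INSERT INTO orders VALUES (1)"),
    router.route("select count(*) from orders"),
]
```

Here the two reads land on `replica-1` and `replica-2`, and the insert lands on `primary`.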

Accomplish disaster recovery

Businesses are often susceptible to data loss due to a data breach or hardware malfunction. During such a catastrophe, employees' valuable data, along with client information, can be compromised. By maintaining accurate backups at well-monitored locations, data replication facilitates the recovery of lost or corrupted data, thereby contributing to enhanced data protection.

How does data replication work?

Modern applications use a distributed database in the back end, where data is stored and processed by a cluster of systems instead of a single system.

Let us assume that a user of an application wishes to write a piece of data to the database. This data gets split into multiple fragments, with each fragment getting stored on a different node across the distributed system. The database technology is also responsible for gathering and consolidating the different fragments when a user wants to retrieve or read the data.

In such an arrangement, a single system failure can prevent the data from being retrieved in full. This is where data replication saves the day. Data replication technology can store multiple fragments at each node to streamline read and write operations across the network.

Data replication tools ensure that the complete data can still be consolidated from other nodes across the distributed system in the event of a system failure.
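The effect of storing each fragment on more than one node can be sketched as follows. The round-robin placement rule, the function names, and the fragment and node labels are all assumptions chosen for illustration. Real systems use their own placement strategies.

```python
def place_fragments(fragments, nodes, copies):
    """Assign each fragment to `copies` consecutive nodes (round-robin)."""
    placement = {node: set() for node in nodes}
    for i, fragment in enumerate(fragments):
        for offset in range(copies):
            placement[nodes[(i + offset) % len(nodes)]].add(fragment)
    return placement


def readable(fragments, placement, failed):
    """True if every fragment is still held by at least one surviving node."""
    surviving = set()
    for node, held in placement.items():
        if node not in failed:
            surviving |= held
    return surviving == set(fragments)


fragments = ["f1", "f2", "f3"]
nodes = ["n1", "n2", "n3"]

unreplicated = place_fragments(fragments, nodes, copies=1)
replicated = place_fragments(fragments, nodes, copies=2)
```

With one copy per fragment, losing `n2` makes `f2` unreachable, so `readable(fragments, unreplicated, failed={"n2"})` is false. With two copies the same failure is survivable, although losing two of the three nodes is not.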

Types of data replication

Depending on the data replication tools employed, businesses today practice multiple types of replication. Some of the popular replication modes are as follows:

Full table replication

Full table replication means that the entire table is replicated. Every row, whether new, updated, or unchanged, is copied from the source to the destination. This method of replication is generally associated with higher costs, since the processing power and network bandwidth requirements are high.

However, full table replication can be beneficial when it comes to the recovery of hard-deleted data, as well as data that does not possess replication keys (discussed further down in this article).

Transactional replication

In this method, the data replication software makes a full initial copy of the data from origin to destination, after which the subscriber database receives updates whenever data is modified. This is a more efficient mode of replication, since fewer rows are copied each time data is changed. Transactional replication is usually found in server-to-server environments.

Snapshot replication

In snapshot replication, data is replicated exactly as it appears at a given point in time. Unlike other methods, snapshot replication does not track the changes made to data. This mode of replication is used when changes to data tend to be infrequent, or, for example, when performing initial synchronizations between publishers and subscribers.

Merge replication

This type of replication is commonly found in server-to-client environments and allows both the publisher and the subscriber to make changes to data dynamically. In merge replication, data from two or more databases is combined into a single database, which adds to the complexity of this technique.

Key-based incremental replication

Also called key-based incremental data capture, this technique copies only the data changed since the last update. A replication key is a column, such as a timestamp or an incrementing ID, whose value is compared against the highest value seen in the previous run to identify new and updated rows. Since only a few rows are copied during each update, the costs are significantly lower.

However, the drawback is that this replication mode cannot be used to recover hard-deleted data, since the key value is deleted along with the record.
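A minimal sketch of key-based incremental replication, using two in-memory SQLite databases, shows both the saving and the hard-delete drawback. The `items` table, the `updated_at` replication key, and the `replicate_incremental` helper are hypothetical names chosen for this example and do not belong to any particular replication tool.

```python
import sqlite3

src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for db in (src, dst):
    db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)")

src.executemany("INSERT INTO items VALUES (?, ?, ?)", [(1, "a", 10), (2, "b", 20)])


def replicate_incremental(src, dst, last_key):
    """Copy rows whose replication key exceeds last_key.

    Returns (new bookmark, number of rows copied).
    """
    rows = src.execute(
        "SELECT id, name, updated_at FROM items WHERE updated_at > ?", (last_key,)
    ).fetchall()
    dst.executemany("INSERT OR REPLACE INTO items VALUES (?, ?, ?)", rows)
    new_key = max([last_key] + [row[2] for row in rows])
    return new_key, len(rows)


# First run: nothing copied yet, so every row is newer than the bookmark.
bookmark, first_copied = replicate_incremental(src, dst, last_key=0)

# One update and one hard delete happen on the source.
src.execute("UPDATE items SET name = 'b2', updated_at = 30 WHERE id = 2")
src.execute("DELETE FROM items WHERE id = 1")

# Second run: only the updated row has a key above the bookmark.
bookmark, second_copied = replicate_incremental(src, dst, bookmark)
dst_rows = dst.execute("SELECT id, name FROM items ORDER BY id").fetchall()
```

The second run copies a single row, but the destination still contains the row with `id = 1`. The delete left no key value behind for the query to find, which is the limitation described above.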

Data replication in DBMS

Data replication in a DBMS (distribution servers) can be carried out using a suitable replication scheme. The widely adopted replication schemes are as follows:

Full data replication
Partial data replication
No replication

Full data replication

Full replication means that the complete database is replicated at every site of the distributed system. This scheme maximizes data availability and redundancy across a wide area network.

For example, users in a cross-country network have access to the complete database from an Asia-based server if the European or North American server experiences a technical difficulty.

Full replication also contributes to faster execution of global queries, as the results can be obtained from any local server. The disadvantage of full replication is that the update process tends to be slower, which makes keeping up-to-date copies of data at every location quite challenging.

Partial data replication

Partial replication occurs when only certain fragments of the database are replicated based on the importance of data at each location. Here, the number of copies can range from one to the total number of nodes in the distributed system.

In an enterprise environment, this mode of replication can be useful for members of sales and marketing teams, whose personal computers store a partial database that is regularly synced with the main server.

No replication

In this scheme, each fragment is stored at only one site of the distributed system. While the absence of replicas makes data recovery simpler, it can have an adverse effect on the speed of query execution, since multiple users access the same server. Compared to the other replication schemes, no replication provides poor availability of data.
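The trade-off between the three schemes can be made concrete by counting stored copies and checking which single-site failures would make some fragment unavailable. The site names, fragment labels, and the `scheme_summary` helper are assumptions invented for this sketch.

```python
def scheme_summary(placement, fragments):
    """Return (total stored copies, sites whose single failure loses a fragment)."""
    copies = sum(len(held) for held in placement.values())
    critical = []
    for site in placement:
        remaining = set()
        for other, held in placement.items():
            if other != site:
                remaining |= held
        if remaining != set(fragments):
            critical.append(site)
    return copies, critical


fragments = ["A", "B", "C"]
full = {"asia": {"A", "B", "C"}, "europe": {"A", "B", "C"}, "america": {"A", "B", "C"}}
partial = {"asia": {"A", "B"}, "europe": {"B", "C"}, "america": {"C"}}
no_replication = {"asia": {"A"}, "europe": {"B"}, "america": {"C"}}

full_result = scheme_summary(full, fragments)
partial_result = scheme_summary(partial, fragments)
none_result = scheme_summary(no_replication, fragments)
```

In this made-up layout, full replication stores nine copies and tolerates the loss of any one site. Partial replication stores five copies and is exposed only where a fragment has a single copy. No replication stores three copies, and every site is a single point of failure.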

Prevent data loss with Device Control Plus

Device Control Plus is a security solution from ManageEngine that prevents removable devices, such as USB sticks or thumb drives, from gaining unauthorized access to nodes across a distributed system. Removable storage devices are an ever-present danger to the security of data in an organization, as well as the privacy of customer and employee personal information.

Additionally, critical systems across your production environment are subject to insider attacks for personal or professional gain. Whenever files are modified or copied to USB devices, Device Control Plus copies the original file to a password-protected network share that enables ease of recovery in the event of a data breach.

Device Control Plus comes with a built-in file shadowing feature that protects vital data across your network. Select endpoints to enable file replication, set file size and file extension limits, configure the remote share path, and you are all set to safeguard your business from the risk of data loss.

In the last chapter, we introduced different design alternatives. In this chapter, we will study the strategies that aid in adopting the designs. The strategies can be broadly divided into replication and fragmentation. However, in most cases, a combination of the two is used.

Data Replication

Data replication is the process of storing separate copies of the database at two or more sites. It is a popular fault tolerance technique of distributed databases.

Advantages of Data Replication

Reliability − If any site fails, the database system continues to work, since a copy is available at one or more other sites.
Reduction in Network Load − Since local copies of data are available, query processing can be done with reduced network usage, particularly during prime hours. Data updating can be done at non-prime hours.
Quicker Response − Availability of local copies of data ensures quick query processing and consequently quick response time.
Simpler Transactions − Transactions require fewer joins of tables located at different sites and minimal coordination across the network. Thus, they become simpler in nature.

Disadvantages of Data Replication

Increased Storage Requirements − Maintaining multiple copies of data is associated with increased storage costs. The storage space required is in multiples of the storage required for a centralized system.
Increased Cost and Complexity of Data Updating − Each time a data item is updated, the update needs to be reflected in all the copies of the data at the different sites. This requires complex synchronization techniques and protocols.
Undesirable Application – Database coupling − If complex update mechanisms are not used, removing data inconsistency requires complex co-ordination at application level. This results in undesirable application – database coupling.

Some commonly used replication techniques are −

Snapshot replication
Near-real-time replication
Pull replication

Fragmentation

Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the table are called fragments. Fragmentation can be of three types: horizontal, vertical, and hybrid (combination of horizontal and vertical). Horizontal fragmentation can further be classified into two techniques: primary horizontal fragmentation and derived horizontal fragmentation.

Fragmentation should be done in such a way that the original table can be reconstructed from the fragments whenever required. This requirement is called “reconstructiveness.”

Advantages of Fragmentation

Since data is stored close to the site of usage, efficiency of the database system is increased.
Local query optimization techniques are sufficient for most queries since data is locally available.
Since irrelevant data is not available at the sites, security and privacy of the database system can be maintained.

Disadvantages of Fragmentation

When data from different fragments are required, the access speeds may be very low.
In case of recursive fragmentations, the job of reconstruction will need expensive techniques.
Lack of back-up copies of data in different sites may render the database ineffective in case of failure of a site.

Vertical Fragmentation

In vertical fragmentation, the fields or columns of a table are grouped into fragments. In order to maintain reconstructiveness, each fragment should contain the primary key field(s) of the table. Vertical fragmentation can be used to enforce privacy of data.

For example, let us consider that a University database keeps records of all registered students in a Student table having the following schema.

STUDENT

Regd_No | Name | Course | Address | Semester | Fees | Marks

Now, the fees details are maintained in the accounts section. In this case, the designer will fragment the database as follows −

CREATE TABLE STD_FEES AS SELECT Regd_No, Fees FROM STUDENT;
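To check the reconstructiveness rule for this vertical fragment, the sketch below runs the statement in an in-memory SQLite database, adds a second fragment holding the remaining columns, and rebuilds the table with a join on the primary key. The `STD_INFO` fragment name, the column types, and the two sample rows are assumptions made up for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE STUDENT (Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT, "
    "Address TEXT, Semester INTEGER, Fees REAL, Marks REAL)"
)
# Made-up sample rows, for illustration only.
db.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Asha", "Computer Science", "Pune", 3, 500.0, 81.0),
        (2, "Ben", "Physics", "Leeds", 1, 450.0, 74.5),
    ],
)

# Fragment 1 is the statement from the text; fragment 2 holds the other columns.
# Both keep the primary key Regd_No so the table can be rebuilt.
db.execute("CREATE TABLE STD_FEES AS SELECT Regd_No, Fees FROM STUDENT")
db.execute(
    "CREATE TABLE STD_INFO AS "
    "SELECT Regd_No, Name, Course, Address, Semester, Marks FROM STUDENT"
)

original = db.execute("SELECT * FROM STUDENT ORDER BY Regd_No").fetchall()
rebuilt = db.execute(
    "SELECT i.Regd_No, i.Name, i.Course, i.Address, i.Semester, f.Fees, i.Marks "
    "FROM STD_INFO AS i JOIN STD_FEES AS f ON i.Regd_No = f.Regd_No "
    "ORDER BY i.Regd_No"
).fetchall()
```

Because both fragments carry `Regd_No`, the join returns exactly the rows of the original table. Without the key in each fragment there would be no reliable way to pair a fee with its student.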

Horizontal Fragmentation

Horizontal fragmentation groups the tuples of a table according to the values of one or more fields. Horizontal fragmentation should also conform to the rule of reconstructiveness. Each horizontal fragment must have all columns of the original base table.

For example, in the student schema, if the details of all students of the Computer Science course need to be maintained at the School of Computer Science, then the designer will horizontally fragment the database as follows −

CREATE TABLE COMP_STD AS SELECT * FROM STUDENT WHERE Course = 'Computer Science';
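Reconstruction of a horizontally fragmented table is a union of its fragments. The sketch below uses a trimmed-down `STUDENT` table in SQLite with made-up rows. The complementary `OTHER_STD` fragment is a hypothetical name added so that every row belongs to exactly one fragment.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE STUDENT (Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT)")
# Made-up sample rows, for illustration only.
db.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [(1, "Asha", "Computer Science"), (2, "Ben", "Physics"), (3, "Chen", "Computer Science")],
)

# Two horizontal fragments that together cover every row with a non-NULL Course.
db.execute("CREATE TABLE COMP_STD AS SELECT * FROM STUDENT WHERE Course = 'Computer Science'")
db.execute("CREATE TABLE OTHER_STD AS SELECT * FROM STUDENT WHERE Course <> 'Computer Science'")

original = db.execute("SELECT * FROM STUDENT ORDER BY Regd_No").fetchall()
rebuilt = db.execute(
    "SELECT * FROM COMP_STD UNION ALL SELECT * FROM OTHER_STD ORDER BY Regd_No"
).fetchall()
comp_count = db.execute("SELECT COUNT(*) FROM COMP_STD").fetchone()[0]
```

The union of the two fragments matches the original table. Rows with a NULL `Course` would fall into neither fragment, so a real design would need a predicate that covers them as well.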

Hybrid Fragmentation

In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques is used. This is the most flexible fragmentation technique, since it generates fragments with minimal extraneous information. However, reconstruction of the original table is often an expensive task.

Hybrid fragmentation can be done in two alternative ways −

First, generate a set of horizontal fragments; then generate vertical fragments from one or more of the horizontal fragments.
First, generate a set of vertical fragments; then generate horizontal fragments from one or more of the vertical fragments.
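The first of these two approaches can be sketched in SQLite as a horizontal split followed by a vertical split of the resulting fragment. The trimmed-down `STUDENT` table, the sample rows, and the fragment names `COMP_STD_FEES` and `COMP_STD_INFO` are assumptions made for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE STUDENT (Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT, Fees REAL)"
)
# Made-up sample rows, for illustration only.
db.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Computer Science", 500.0), (2, "Ben", "Physics", 450.0)],
)

# Step 1: horizontal fragment (rows for one course).
db.execute("CREATE TABLE COMP_STD AS SELECT * FROM STUDENT WHERE Course = 'Computer Science'")
# Step 2: vertical fragments of that horizontal fragment; both keep Regd_No.
db.execute("CREATE TABLE COMP_STD_FEES AS SELECT Regd_No, Fees FROM COMP_STD")
db.execute("CREATE TABLE COMP_STD_INFO AS SELECT Regd_No, Name, Course FROM COMP_STD")

fees_fragment = db.execute("SELECT * FROM COMP_STD_FEES").fetchall()
info_fragment = db.execute("SELECT * FROM COMP_STD_INFO").fetchall()
```

Each resulting fragment holds only the rows and columns one site needs. Rebuilding the full table would require a join within each horizontal fragment and then a union across them, which is why reconstruction tends to be expensive.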