Delphix architecture with PostgreSQL
Primer: PostgreSQL replication methodology
Delphix achieves virtualization for the PostgreSQL database through leveraging PostgreSQL's streaming replication protocol between a Primary (parent) and Secondary (child) database configuration. Delphix uses the secondary, hot-standby database as a source database to ingest from.
In a typical PostgreSQL replication workflow, PostgreSQL leverages a parent and child configuration to maintain high-availability with Write Ahead Logs (WAL - a log file where all the modifications to the database are written before they’re applied/written to data files) to maintain sync between both databases. This can be seen in the architectural diagram below.
Delphix leverages this feature by capturing the replication stream via file-based log shipping or Streaming WAL records depending on your PostgreSQL configuration. Below is the same PostgreSQL replication architecture diagram with Delphix use case overlayed:
Core concepts:
Write Ahead Logs - WAL:
These are log files that record all modifications to the database before they’re applied/written to data files.
File-Based Log Shipping:
A method of syncing parent and child databases by applying WAL log files.
One WAL log file can contain up to 16MB of data
The WAL file is shipped only after it reaches that threshold
Note: This will cause a delay in replication and also increase the chances of losing data if the parent crashes and logs are not archived.
Streaming WAL Records:
WAL record chunks are streamed by database servers to keep data in sync.
The standby server connects to the parent to receive the WAL chunks.
The WAL records are streamed as they are generated.
The streaming of WAL records need not wait for the WAL file to be filled.
This allows a standby server to stay more up-to-date than is possible with file-based log shipping.
By default, streaming replication is asynchronous even though it also supports synchronous replication.
Delphix supports both methods of ingestion (Log Shipping and Streaming WAL Records). It is important to note that both methods have pros/cons. For example, streaming replication doesn't have as much lag between parent and child, as records are sent as they are generated. However, streaming requires both parent and child to be online and able to communicate directly. It also requires the replica to keep up well enough that the parent still has on-disk copies of the WAL the replica needs, and generally requires you to spend extra pg_xlog space on retaining extra WAL for the replica.
PostgreSQL architecture for streaming replication
Transaction logs - WAL (Write Ahead Logs) - are replayed from source to the staging environment to maintain data sync between the two.
Steps involved in supporting this architectural setup:
The end-user application connecting the PostgreSQL source database may perform read/write queries on the database.
The database changes are recorded as WAL segments in the PostgreSQL database under the directory pg_xlog.
Set up the source PostgreSQL server as Standby Node.
Configure replication security by creating a replication user and specifying the authentication protocol.
Initiate a base backup on secondary.
Configure postgresql.conf file as per the source environment.
Start the standby server.
WAL receiver
The WAL receiver process at the secondary continuously listens for any incoming WAL segments from primary in its receive queue and applies the same on the source database.
WAL segments
WAL segments are archived when a segment size reaches 16 MB (default).
Delphix architecture with PostgreSQL
This topic describes the high-level process for adding PostgreSQL environments, linking PostgreSQL data sources to the Delphix Engine, and provisioning virtual databases from PostgreSQL data sources.
Given the preceding primer on Streaming Replication and WAL Logs, A high-level breakdown of the PostgreSQL/Delphix Architecture breaks down as follows:
To match Delphix Terminology the PostgreSQL-denoted parent Host will be referred to as the Source Host and the PostgreSQL-denoted child Host will be referred to as the Staging Host.
For greater detail - Delphix has three ingestion options that leverage a combination of the PostgreSQL Replication Syncing Method as well as backup ingestion initiation:
Delphix Initiated Backup Ingestion with PostgreSQL Streaming Replication.
Customer Initiated Backup Ingestion with PostgreSQL Streaming Replication.
Customer Initiated Backup Ingestion with WAL log Shipping.
Below is a brief breakdown of the pros/cons of the three ingestions options:
Ingestion method | Backup mechanism | Syncing mechanism | Delphix considerations |
---|---|---|---|
Delphix Backup + Streaming Replication | The initial backup is taken by Plugin through the | Streaming ensures that source and staging databases are closely synched | This method ensures that Delphix has the capability to ingest as frequently as WAL logs are generated by the Source. This leads to less snapshots than a customer-initiated backup, but those snapshots are aligned to policy. |
Customer Backup + Streaming Replication | The initial backup is provided by the customer. Backup is taken through the | Streaming ensures that source and staging databases are closely synched | This method ensures that the Delphix Engine dSource is always closely aligned to the Source Database |
Customer Backup + Log Shipping | The initial backup is provided by the customer. Backup is taken through the | The less bandwidth-intensive method which does not require source and staging databases to be directly connected at all times | This method does not require all resources to be concurrent, which leads to a less resource-intensive setup. |
Environment setup
Linking architecture between PostgreSQL and Delphix Engine
At a high level, the Delphix Engine maintains a logical representation of the source database files, from which one can provision virtual databases (VDBs). In order to link a data source and provision a VDB, the following types of environments are required:
PostgreSQL provisioning architecture - source environment - to - Delphix engine
A source environment is where the un-virtualized source database runs. The Delphix Engine uses the backup, restore, and replication features of the PostgreSQL DBMS to maintain its internal representation of the source database, to be used for provisioning VDBs. The Delphix Engine must be able to connect to the source environment in order to discover each running source database and to orchestrate the backup, restore, and replication functionality necessary to keep its representation synchronized with the source database. The Delphix Engine is designed to have a minimal impact on the performance of the source database and the source environment.
Target environments for PostgreSQL
A target environment is where virtualized databases run. PostgreSQL target environments serve two purposes:
Since PostgreSQL does not provide a native incremental backup API, a warm standby server (in other words, one in log-shipping mode) must be created where all database files are stored remotely on a Delphix Engine. We refer to the process of applying those WAL files to the staging database as validated sync. During validated sync, we retrieve shipped WAL logs from the source, roll the staging database forward, and create a snapshot of that data state.
Once a source database has been set up, you can provision virtual databases from any of the discrete snapshots along the Timeflow mentioned above to any compatible target environment (for more information, see Requirements for PostgreSQL Target Hosts and Databases). Database files are exported over the network to the target environment, where the virtual database instance runs.
Workflow for PostgreSQL environments
Linking environments
Prior to linking a data source, both the source environment and a compatible target environment (to be used for the source database mentioned above) must be added to the Delphix Engine. Prior to provisioning a virtual database, a target environment (with similar settings/versioning to the source environment) must be added to the Delphix Engine.
Target hosts for PostgreSQL
Container for VDBs
This topic describes the basic concepts involved with provisioning VDBs from Postgres dSources or even other Postgres VDBs.
Once an environment is added to the Delphix Engine, environment discovery takes place. Environment discovery is the process of enumerating PostgreSQL installations and configurations when a source or target environment is added to the Delphix Engine. The discovery process is repeated during environment refresh in order to detect new PostgreSQL installations and clusters.
A dSource is a copy of a physical database that is created when the Delphix Engine links to and loads the database. The Delphix Engine keeps the dSource in sync with the source database in order to facilitate the provisioning of Virtual Databases (VDBs) from the dSource's TimeFlow. While creating a dsource, the Delphix Engine initiates a full database backup of the source database by running pg_basebackup on the source host. The initial snapshot of the dSource is derived from this backup.
After obtaining the initial snapshot and linking the dSource, the Delphix Engine keeps the source and staging database in sync by monitoring the source database for new transaction logs on the source host and then applying those transaction logs on the source database. PostgreSQL streaming replication protocol is used to achieve data replication between the source database and the staging database.
When provisioning a VDB, Delphix creates the database with the default Postgres database options. If the database options have been altered on the source database and you wish for the VDB to reflect these same options, they would need to be altered via a Post-Script or by hook operations.
Dependencies
If there are dependencies on the snapshot, you will not be able to delete the snapshot free space; the dependencies rely on the data associated with the snapshot.