Selective data distribution overview
The selective data distribution technology permits the distribution of masked data between Delphix Engines without bringing over the unmasked parent source. Both engines must run identical versions of the Delphix Data as a Service Engine with the Delphix Masking Engine; otherwise, they can be asymmetric in terms of engine configuration. You can provision VDBs from distributed masked objects, which allows for the geographical distribution of data and remote provisioning.
You can run selective data distribution ad hoc, but it is typically run according to a predefined schedule. After the initial update, each subsequent update sends only the changes made since the previous update. Selective data distribution does not provide synchronous semantics: the data distributed to the target is only as current as the most recent update.
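The incremental behavior described above can be pictured with a minimal sketch. It is a toy model, not Delphix code: the Target class, the run_update function, and the use of integers as snapshot identifiers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """Hypothetical stand-in for a target engine; not a Delphix API."""
    received: list = field(default_factory=list)

    def last_received(self):
        # The newest snapshot the target already holds, if any.
        return self.received[-1] if self.received else None


def run_update(source_snapshots, target):
    """Send only the snapshots that are newer than the target's latest one."""
    last = target.last_received()
    pending = [s for s in source_snapshots if last is None or s > last]
    target.received.extend(pending)
    return pending


target = Target()
first = run_update([1, 2, 3], target)         # initial update sends everything
second = run_update([1, 2, 3, 4, 5], target)  # later update sends only the changes
third = run_update([1, 2, 3, 4, 5], target)   # nothing new, so nothing is sent
```

Between updates the target keeps whatever it last received, which is why the data there is only as current as the most recent update.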
Selective data distribution features
Because Delphix Engines are virtual appliances, you can back up, restore, replicate, and migrate data objects between them using features of VMware and the underlying storage infrastructure. In addition to the replication capabilities provided by this infrastructure, selective data distribution permits the distribution of masked data between Delphix Engines. The sources received on a target Delphix Engine do not include the original parent source, which makes the original source inaccessible from the target.
Selective data distribution is configured on the source Delphix Engine. It first copies a subset of masked VDBs to a target Delphix Engine, then sends incremental updates either manually or according to a schedule. As illustrated below, sensitive data from the Production Data Center is brought into Delphix as a dSource. The Masking Engine then masks the dSource data as a VDB. At the same time, DxFS redacts the sensitive data within that dSource before sending it across to the Non-Production Data Center using Delphix Replication. With Selective Data Distribution (SDD), sensitive data is never exposed because it is protected at both the dSource and VDB layers.
You can use replicated masked VDBs to provision new VDBs on the target Delphix Engine. The provisioned VDBs contain the data in their masked parent and are therefore also considered masked. You can refresh these VDBs to snapshots sent as part of an incremental replication update, as long as you do not destroy the parent object on the replication source. For more information, see Provisioning from a Replicated Data Sources or VDBs.
During replication, replicated masked VDBs are maintained in an alternate replica and are not active on the target side. The failover of a selective data distribution replica is not supported.
Selective data distribution details
When you select masked objects for selective data distribution, the engine automatically includes any dependencies, such as environments, associated with the VDB. The parent dSource and any parent VDBs are not included automatically. For disk space efficiency, only the parent data that the masked VDB needs is included; data in the parent dSource and parent VDBs that the masked VDB does not need is excluded.
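The selection rule above can be sketched as follows. This is an illustrative model only: the catalog layout, the field names, and the objects_to_send function are assumptions, not Delphix identifiers.

```python
# Hypothetical catalog of objects on a source engine. All names and
# fields are illustrative assumptions, not Delphix identifiers.
CATALOG = {
    "prod_dsource": {"kind": "dsource", "environment": "prod_env", "parent": None},
    "masked_vdb": {"kind": "masked_vdb", "environment": "mask_env", "parent": "prod_dsource"},
    "prod_env": {"kind": "environment"},
    "mask_env": {"kind": "environment"},
}


def objects_to_send(selected, catalog):
    """Return the selected masked VDBs plus their environments.

    The "parent" link is deliberately never followed, so parent
    dSources and parent VDBs are not pulled in as objects.
    """
    result = set()
    for name in selected:
        obj = catalog[name]
        if obj["kind"] != "masked_vdb":
            raise ValueError(f"{name} is not a masked VDB")
        result.add(name)
        result.add(obj["environment"])  # dependency included automatically
    return result


sent = objects_to_send(["masked_vdb"], CATALOG)
```

The sketch models only which objects are selected. It does not model the storage blocks of parent data that the masked VDB still needs.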
During replication, the Delphix Engine will negotiate an SSL connection with its server peer to use SSL_RSA_WITH_RC4_128_MD5 as the cipher suite, and TLSv1 as the protocol.
Only database objects and their dependencies are copied as part of a selective data distribution operation, including:
Masked VDBs
Environments
Environment configuration (users, database instances, and installations)
The following objects are not copied as part of a selective data distribution operation:
Parent dSources of masked VDBs - The storage blocks for non-sensitive dSource data are sent from the source to the target replication host. This storage is displayed under Held Space. For more information, see An Overview of Held Space.
Groups of the parent dSources
Users and roles
Policies
VDB (init.ora) configuration templates
Events and faults
Job history
System services settings, such as SMTP
Resumable selective data distribution
A single selective data distribution instance can fail for a number of environmental and internal reasons. However, with the Resume feature, an interrupted distribution restarts from an intermediate point, and no data is retransmitted. Selective data distribution is resumable across machine reboots, stack restarts, and network partitions. The resumable replication feature is fully automated and neither requires nor allows user intervention.
For example, a large, time-consuming initial distribution or incremental update can resume after it is interrupted. Suppose a selective data distribution profile has already been configured from a source to a target. A large full send that is expected to take weeks begins from the source. Halfway through, a power outage at the data center that houses the source causes the source machine to go down, and it comes back up only after a few hours. On startup, the source detects that a selective data distribution was ongoing, automatically re-contacts the target, and resumes the distribution where it left off. In the user interface (UI) on the source, the same selective data distribution send job appears as active and continues to update its progress. However, in the UI of the target, a new distribution receive job appears, although it tracks its progress as a percentage of the entire replication.
Selective data distribution will not resume after failures that leave the source and target connected. For example, if a storage failure on the target, such as an out-of-space error, causes a distribution to fail, then the source and target remain connected. As a result, the Engine discards the state data associated with the failed replication operation, and that operation cannot be resumed.
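The resume behavior described in this section can be sketched with a toy checkpointed transfer. The Transfer class, its method names, and the chunk model are assumptions made for illustration. They do not describe how the Delphix Engine stores its state.

```python
class Transfer:
    """Toy model of a resumable send; not Delphix code."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.received = []   # chunks the target has stored (the checkpoint)
        self.sent_count = 0  # total chunks ever put on the wire

    def run(self, fail_at=None):
        """Send from the checkpoint; stop early if interrupted at index fail_at."""
        for i in range(len(self.received), len(self.chunks)):
            if i == fail_at:
                return False  # interrupted, checkpoint is kept
            self.received.append(self.chunks[i])
            self.sent_count += 1
        return True

    def fail(self, still_connected):
        """Keep state only when the failure separated source and target."""
        if still_connected:
            self.received.clear()  # state is discarded, so no resume point remains


# Power outage on the source: the peers are disconnected, state is kept.
t = Transfer("abcdef")
t.run(fail_at=3)
t.fail(still_connected=False)
finished = t.run()  # resumes at chunk 3; chunks 0-2 are not sent again

# Out-of-space error on the target: the peers stay connected, state is discarded.
u = Transfer("abcdef")
u.run(fail_at=3)
u.fail(still_connected=True)
```

In the first case every chunk crosses the wire exactly once. In the second case the checkpoint is gone, so there is nothing to resume from.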
Selective data distribution restrictions
Only masked VDBs can be added to a selective data distribution spec. You cannot add dSources, groups, or the entire domain.
Only masked VDBs with a Snapshot Policy of None should be added to a selective data distribution spec.
Unmasked VDBs cannot be added to a selective data distribution spec.
VDBs that undergo selective data distribution and their children cannot be selectively redistributed to another target.
On the target engine, you cannot create a selective data distribution spec that includes VDBs that are present because of selective data distribution. However, you can replicate this data using a traditional replication spec.
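The restrictions above can be read as a checklist applied to each object proposed for a spec. The following sketch expresses them as a small validator. The Candidate class, its fields, and the violations function are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical description of an object proposed for a spec."""
    name: str
    kind: str                       # "vdb", "dsource", "group", or "domain"
    masked: bool = False
    snapshot_policy: str = "None"
    received_via_sdd: bool = False  # a distributed VDB or a child of one


def violations(c):
    """Return the restrictions from the list above that the candidate breaks."""
    if c.kind != "vdb":
        # dSources, groups, and the entire domain are never allowed.
        return ["only masked VDBs can be added"]
    problems = []
    if not c.masked:
        problems.append("unmasked VDBs cannot be added")
    if c.snapshot_policy != "None":
        problems.append("Snapshot Policy should be None")
    if c.received_via_sdd:
        problems.append("VDBs received through selective data distribution cannot be redistributed")
    return problems


ok = violations(Candidate("masked_hr", "vdb", masked=True))
bad = violations(Candidate("hr_copy", "vdb", masked=False, snapshot_policy="Daily"))
src = violations(Candidate("prod_src", "dsource"))
```

An empty list means that the candidate passes every check modeled here.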
Best practices for using Selective Data Distribution are described in the Selective Data Distribution Best Practices knowledge base article.
Selective data distribution supported platforms
Selective Data Distribution supports the following platforms:
Oracle
Microsoft SQL Server
SAP ASE (Sybase)
Db2