Datasets overview
This section outlines everything required to prepare and provision datasets (VDBs, files, etc.) once you have successfully deployed the Delphix Continuous Data Engine. This includes setting up environments, dSources, and virtual databases (VDBs). Before beginning, read through this section to better understand the overall ingestion strategy and the requirements for performing it.
The General architecture section describes the layout of the various components outside of Delphix, such as the source databases, environments, and other Continuous Data Engines. How you get data into the Continuous Data Engine determines where certain environments should be placed and how many are required. Based on the ingestion configuration, a corresponding environment, called the target environment, is required to provision to.
Learn how to create and manage environments in the Environment management section. That section also covers various technical requirements and how to ensure a reliable connection between your environments (or hosts) and the Continuous Data Engine. Once connected, the Continuous Data Engine performs scans to verify that dataset binaries and installations are available.
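As an illustration of this step only, the Python sketch below shows the general shape of registering a host environment over a REST-style API. The engine address, endpoint path, and payload fields here are hypothetical placeholders, not the Continuous Data Engine's documented API; refer to the API documentation for the actual calls.

```python
import requests

# Hypothetical engine address; the real API paths and payload fields are
# documented in the Continuous Data Engine API reference.
ENGINE = "https://continuous-data.example.com"

session = requests.Session()

payload = {
    "name": "linux-target-01",          # display name for the environment
    "address": "10.0.0.15",             # host the engine connects to
    "sshPort": 22,
    "toolkitPath": "/opt/delphix/toolkit",
    "credentials": {"username": "delphix_os", "password": "<os-password>"},
}

# Register the host so the engine can scan it for dataset binaries
# and installations (endpoint name is illustrative only).
resp = session.post(f"{ENGINE}/api/environments", json=payload, timeout=30)
resp.raise_for_status()
print("Environment registered:", resp.json())
```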
Next, the formal ingestion phase begins with the creation of dSources. dSources are central to how Delphix’s data virtualization produces ephemeral copies. Once a source is represented within the Continuous Data Engine as a dSource, its state is managed through Timeflow, Snapshots, and the provisioning of new database copies.
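To make the linking step concrete, here is a minimal sketch of creating a dSource and taking an on-demand snapshot. The endpoints and field names (for example, `/api/dsources` and `reference`) are invented for illustration and do not reflect the engine’s real API.

```python
import requests

ENGINE = "https://continuous-data.example.com"  # hypothetical engine address
session = requests.Session()

# Link a source database as a dSource (illustrative endpoint and fields).
link_payload = {
    "name": "orders-dsource",
    "sourceDatabase": "orders_prod",
    "environment": "source-host-01",
}
resp = session.post(f"{ENGINE}/api/dsources", json=link_payload, timeout=30)
resp.raise_for_status()
dsource_ref = resp.json()["reference"]  # assumed response field

# Take an on-demand snapshot, adding a new point on the dSource's
# Timeflow that VDBs can later be provisioned from.
snap = session.post(f"{ENGINE}/api/dsources/{dsource_ref}/snapshot", timeout=30)
snap.raise_for_status()
```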
The provisioning phase concludes with the creation of virtual databases (VDBs). Once created, they can be easily distributed to your application teams without worry of overhead or broken infrastructure. As with a dSource, users can snapshot a VDB’s state or provision new VDBs from it. If a VDB ends up in a broken state or an earlier point in time needs to be examined, the refresh capability can easily roll it back.
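The sketch below illustrates the provision-then-refresh flow under the same assumptions as the earlier examples: the engine address, endpoints, payload fields, and response shape are all hypothetical, and the real calls should be taken from the API documentation.

```python
import requests

ENGINE = "https://continuous-data.example.com"  # hypothetical engine address
session = requests.Session()

# Provision a VDB from the latest snapshot of a dSource onto a
# registered target environment (illustrative endpoint and fields).
provision_payload = {
    "name": "orders-dev-vdb",
    "source": "orders-dsource",
    "snapshot": "latest",
    "targetEnvironment": "linux-target-01",
}
resp = session.post(f"{ENGINE}/api/vdbs", json=provision_payload, timeout=60)
resp.raise_for_status()
vdb_ref = resp.json()["reference"]  # assumed response field

# Refresh the VDB back to a chosen point in time, discarding its
# current (possibly broken) state.
refresh_payload = {"timestamp": "2024-05-01T00:00:00Z"}
resp = session.post(
    f"{ENGINE}/api/vdbs/{vdb_ref}/refresh", json=refresh_payload, timeout=60
)
resp.raise_for_status()
```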
The Hook operations and Policies sections describe functionality that can be applied to all datasets and sources. Hooks and Policies help manage and optimize how you interact with datasets.
Hooks are scripts or sets of commands executed at specific events or stages of a dataset's lifecycle, ensuring that custom processes are integrated seamlessly.
Policies define the rules and behaviors that govern the handling and maintenance of datasets within the Continuous Data Engine.
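The following sketch illustrates both concepts: attaching a hook that runs after a refresh, and applying a snapshot policy with a retention rule. As before, the endpoints, event names, and schedule format are assumptions made for this example, not the engine’s actual API.

```python
import requests

ENGINE = "https://continuous-data.example.com"  # hypothetical engine address
session = requests.Session()

# Attach a post-refresh hook: a shell command the engine would run on the
# target host after each refresh (event name and path are illustrative).
hook_payload = {
    "event": "post-refresh",
    "command": "/opt/scripts/prepare_app_data.sh",
}
resp = session.post(
    f"{ENGINE}/api/vdbs/orders-dev-vdb/hooks", json=hook_payload, timeout=30
)
resp.raise_for_status()

# Apply a snapshot policy: snapshot daily at 02:00 and retain for 30 days
# (schedule format and field names are assumptions for this sketch).
policy_payload = {"schedule": "0 2 * * *", "retentionDays": 30}
resp = session.post(
    f"{ENGINE}/api/vdbs/orders-dev-vdb/policies/snapshot",
    json=policy_payload,
    timeout=30,
)
resp.raise_for_status()
```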
Because data source connectors differ, it is recommended to supplement each of these sections with connector-specific requirements. Further details can be found in each data source’s documentation.