Best practices for staging targets
This host is called a "staging target" because it has much in common with other targets, such as the remote storage mount to the Continuous Data Engine. Because it leverages remote storage over the network, the staging target only needs enough disk capacity for the OS, database application, and any relevant logs or tools. A staging target is a requirement on all platforms that Delphix supports except Oracle, but you can also use it for Oracle to replay logs leveraging validated sync.
Memory and CPU
32 GB RAM minimum
4 vCPU minimum
General guidance for staging servers (Multi-platform)
Delphix recommends dedicated Staging servers for role/architecture separation. However, any Target server can be used as Staging.
In cases where the same server is used as both Staging and Target, we strongly recommend a dedicated instance/install for staging to avoid confusion.
Delphix recommends at least one Staging server per Delphix Engine to avoid the possibility of a single point of failure across multiple engines.
If a staging server is shared among multiple Delphix Engines, please ensure that a dedicated SQL Server Instance is created for each Delphix Engine.
Configuration / performance factors:
Transaction log generation rate.
Number of VDBs.
Precise guidance on these items has not yet been defined. In general, if there is a heavy log generation rate and few VDBs, the ideal recommendation is to have at least 1 Staging Target per Delphix Engine.
Disk / local storage
The only local storage needed is for the OS and application with default databases.
Storage for a staging database is provided from the Delphix Engine, which is mounted over the network similar to any Target host (NFS/iSCSI).
If the customer has a standard DB server build, their standard storage sizing is probably fine.
If a recommendation is still needed, suggest 30 GB for the OS, the application, and any tools needed.
Network requirements
The Staging Target engages in network data transfers between Staging and the Source backup shared location as well as between Staging and the Delphix Engine.
The Staging Target is also a Target server, and as such should have < 1ms latency to the Delphix Engine (and low latency to the Source backup, when possible).
If the change rate on the Source database(s) is > 1 Gb/sec, the recommended network bandwidth to support network transfers is 10 Gb/sec.
In cases where only 1 Gb/sec network bandwidth is available, segregation of each network is recommended to reduce network contention.
For VMware-based clients, ensure that the virtual NIC uses the standard vmxnet3 adapter rather than an Intel adapter. Logical I/O errors have been reported when an Intel adapter is used instead of vmxnet3.
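The sub-millisecond latency target above can be spot-checked from the staging host. The following is a minimal sketch, not a Delphix tool: it times TCP connection setup, which only approximates one network round trip, and the host name and port in the usage comment are hypothetical placeholders.

```python
import socket
import statistics
import time


def tcp_connect_times_ms(host, port, samples=5, timeout=2.0):
    """Time several TCP connects to host:port and return each duration in ms."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open and immediately close a connection; setup takes about one round trip.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def meets_latency_target(timings_ms, target_ms=1.0):
    """Return True when the median sample is below the target (1 ms by default)."""
    return statistics.median(timings_ms) < target_ms


# Hypothetical usage; replace the host and port with values valid in your environment:
# timings = tcp_connect_times_ms("delphix-engine.example.com", 22)
# print(meets_latency_target(timings))
```

The median is used rather than the mean so that a single slow sample does not skew the result.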
Windows and MSSQL specific
An MSSQL Server Instance used for Staging should not be clustered.
Staging should not be hosted on Windows Server 2003, for which extended support ended on July 14, 2015. It is also the first Windows version with iSCSI support, which makes it a poor fit for staging.
The SQL Server Instances hosted on the Staging Target should have a Maximum Memory set. Also ensure that at all times, at least 10% of total memory is available for OS operations.
Only system databases (master, model, msdb, tempdb) are kept on local storage. All other data is read from and written to the Delphix Engine.
Windows iSCSI configuration and limits for v2p, target and staging hosts.
Ensure that Receive Side Scaling (RSS) is enabled on every network interface that Delphix will be connecting to.
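The maximum-memory guidance in the list above reduces to simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and splitting the remaining memory equally across instances is an assumption, not a Delphix recommendation. The result is the kind of value you would enter for SQL Server's "max server memory (MB)" setting.

```python
def max_server_memory_mb(total_memory_mb, instance_count=1, os_reserve_percent=10):
    """Return a per-instance memory cap that leaves a share of RAM for the OS.

    Assumption: instances share the non-reserved memory equally.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    # Round the OS reserve up so that at least the requested percentage stays free.
    reserve_mb = -(-total_memory_mb * os_reserve_percent // 100)
    usable_mb = total_memory_mb - reserve_mb
    return usable_mb // instance_count


# A 64 GB staging host with a single SQL Server instance.
print(max_server_memory_mb(64 * 1024))  # prints 58982
```

With two instances on the same 64 GB host, the same calculation gives 29491 MB per instance.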
The Delphix Engine limits the number of concurrent restore operations that validated sync workers can run on a staging environment. Restores for different staging databases are throttled so that five execute at a time, while the rest wait their turn on a first-come, first-served basis. This improves overall system performance by reducing resource contention, disk I/O, and network traffic. Note that this limit applies per Delphix Engine connected to the staging environment.
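To see how this throttle affects queueing, the following sketch simulates first-come, first-served scheduling with five slots. It is a simplified model, not the Delphix Engine's implementation: it assumes all restores are queued at time zero, and the durations are hypothetical.

```python
import heapq


def simulate_restores(durations, max_concurrent=5):
    """Return (start, finish) times for restores served in arrival order.

    Assumption: every restore is already waiting in the queue at time 0.
    """
    running = []   # min-heap holding the finish times of restores in flight
    schedule = []
    for duration in durations:
        if len(running) < max_concurrent:
            start = 0.0                      # a slot is free immediately
        else:
            start = heapq.heappop(running)   # wait for the earliest slot to free up
        finish = start + duration
        heapq.heappush(running, finish)
        schedule.append((start, finish))
    return schedule


# Twelve hypothetical restores of 60 seconds each.
schedule = simulate_restores([60] * 12)
print(schedule[5])                     # prints (60.0, 120.0)
print(max(end for _, end in schedule)) # prints 180.0
```

In this model the sixth restore cannot start until one of the first five finishes, so twelve equal restores complete in three waves.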
Following are the limiting factors which will come into play when looking at the performance of staging databases on a staging environment when a validated sync worker runs to keep up with the production databases:
Backup generation frequency: With higher backup frequency, increased restore time will be seen as the pre-provisioning worker will keep ingesting previous backups while new backups are being generated.
Staging database count: When multiple staging databases are hosted on the same server, the backup ingestion load on the staging host will increase. Additionally, if the frequency of backups is high, there will be a greater number of candidates (pre-provisioning workers) waiting in the queue.
Number of VDBs hosted on the server.
Multiple Delphix Engines connecting to the same staging environment will increase the number of parallel restore operations running on the staging environment and can degrade performance.
Below are the troubleshooting steps for improving performance:
Have dedicated Staging servers for role/architecture separation from VDB
Add CPU/Memory
Decrease backup frequency
Introduce dedicated networks
The following example shows the performance observed in one specific setup and is provided for reference only.
Note
The findings below are from a non-production setup.
Environment details
Staging Host: 64 GB Memory, 8 vCPUs, ESXi version: 7.0.3
Backup File Size: 200 MB
User for linking: Database user
Setup notes:
No other operations were executed on the Delphix Engine other than pre-provisioning worker running.
No virtual database existed on the staging host.
The source servers and the Delphix Engine are all on the same on-prem data center.
Only one Delphix Engine was connecting to the staging host.
Performance Scenario 1
For a staging host with the above configuration supporting 50 staging databases on a single database instance, with every dSource receiving a backup at 15-minute intervals, the time taken to restore these transaction logs stays under 13 minutes (< 750 seconds) on average, so the staging databases keep up with the backups.
Performance Scenario 2
The same setup could support more frequent backups (every 10 minutes), but only with fewer staging databases. For example, 40 staging databases on a single database instance could support backups every 10 minutes without causing any lag.
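The two scenarios are consistent with a simple back-of-the-envelope model. The sketch below assumes that restore time grows linearly with the number of staging databases; this is an assumption made for illustration, not a documented Delphix behavior, and the function name is hypothetical. It uses the 750-second upper bound from Scenario 1 as the observed restore time.

```python
def max_staging_databases(observed_dbs, observed_restore_seconds, backup_interval_minutes):
    """Estimate how many staging databases can keep up with a backup interval.

    Assumption: total restore time scales linearly with the database count.
    """
    seconds_per_db = observed_restore_seconds / observed_dbs
    interval_seconds = backup_interval_minutes * 60
    return int(interval_seconds // seconds_per_db)


# Scenario 1 observation (50 databases, under 750 seconds) applied to a 10-minute interval.
print(max_staging_databases(50, 750, 10))  # prints 40
```

The estimate matches the 40 databases reported in Scenario 2, but treat it as a rough planning aid only. Actual capacity depends on backup size, host resources, and network conditions.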