Best practices for network configuration
For more information about network configuration, refer to Network performance configuration options.
Delphix Engine <===> Target Host (Implement standard requirements for optimal NFS/iSCSI performance):
Optimal physical network topology:
Low latency: <1ms for 8K packets.
Network adjacency: minimize network hops, co-locate in the same blade enclosure, co-locate on the same physical host.
Eliminate all Layer 3+ devices like firewalls, IDS, packet filters (Deep Packet Inspection - DPI).
Multiple switches can add latency, and fragmentation and reordering issues will add significant latency.
Optimal throughput:
10GbE physical uplinks or higher.
Jumbo frames (typically MTU 9000) improve network efficiency by lowering CPU utilization and latency and by allowing greater throughput.
All devices end-to-end must be configured for the larger frame size including switches, routers, fabric interconnects, hypervisors, and servers.
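As a rough sketch of what end-to-end jumbo frame configuration can look like (an illustration only; the interface name eth0, vSwitch name vSwitch0, and VMkernel interface vmk1 are assumptions, and exact commands vary by platform and version):
$ sudo ip link set dev eth0 mtu 9000                                        # Linux target host NIC
$ esxcli network vswitch standard set --vswitch-name=vSwitch0 --mtu=9000    # ESXi standard vSwitch
$ esxcli network ip interface set --interface-name=vmk1 --mtu=9000          # ESXi VMkernel port carrying NFS/iSCSI
Verify the result with the ping checks in the "Jumbo frames check via ping" section below.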
Optimal logical flow:
Disable QoS throttles limiting network utilization below line rate (e.g. HP Virtual Connect FlexFabric).
Consider a dedicated VLAN (with jumbo frames) for NFS/iSCSI traffic.
NIC Teaming (at ESX layer) of multiple physical uplinks can provide additional throughput for higher workloads.
Examples: 4x1Gb NICs support up to 400 MB/s of I/O, and 2x10Gb NICs support up to 2 GB/s of I/O.
VMware KB-1004088 has NIC teaming recommendations, including the route-based-on-IP-hash policy (a configuration sketch follows this list).
VMware KB-1001938 has host requirements for physical link aggregation (LACP, EtherChannel).
VMware KB-1007371 and this popular blog post detail problems with NIC selection using dest-IP hash.
Fragmentation and dropped packets can result in excessive retransmissions of data, reducing throughput.
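As a sketch of the route-based-on-IP-hash teaming recommendation above (an illustration only; the vSwitch name vSwitch0 and uplinks vmnic0/vmnic1 are assumptions, and the physical switch must be configured for static link aggregation per VMware KB-1001938):
$ esxcli network vswitch standard policy failover set --vswitch-name=vSwitch0 --load-balancing=iphash --active-uplinks=vmnic0,vmnic1
$ esxcli network vswitch standard policy failover get --vswitch-name=vSwitch0    # confirm the load-balancing policy and active uplinks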
Jumbo frames check via ping:
Delphix Engine
$ ping -D -s [Target_IP] 8000
"ICMP Fragmentation needed and DF set from gateway" indicates MTU < 8028
Linux
$ ping -M do -s 8000 [Delphix_Engine_IP]
"Frag needed and DF set (mtu = xxxx)" indicates MTU < 8028
macOS
$ ping -D -s 8000 [Delphix_Engine_IP]
$ sudo sysctl -w net.inet.raw.maxdgram=16384
The sysctl above increases the maximum ICMP datagram size on macOS, allowing -s 9000 to be used.
Windows
ping -f -l 8000 [Delphix_Engine_IP]
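If the 8000-byte ping fails, stepping the payload size down helps locate the effective path MTU. A sketch for a Linux host ([Delphix_Engine_IP] is a placeholder; 8972 and 1472 correspond to MTU 9000 and 1500 minus 28 bytes of IP/ICMP headers):
for size in 8972 8000 4000 1472; do
  if ping -M do -c 1 -s $size [Delphix_Engine_IP] > /dev/null 2>&1; then
    echo "largest tested payload that passes unfragmented: $size bytes"
    break
  fi
done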
Measure network bandwidth and latency:
Latency in both directions should be < 1ms for an 8KB payload.
Network hops should be minimized: traceroute (Unix/Linux) / tracert (Windows).
Throughput in both directions: 50-100 MB/s on a 1 GbE physical link, 500-1000 MB/s on a 10 GbE physical link (a measurement sketch follows this list).
NICs should use auto-negotiation on Ethernet, with a minimum speed of 1000 Mbps.
Hard-setting speed/duplex will limit network throughput below the line rate.
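A sketch of these measurements from a Linux host (iperf3 must be installed on both ends; eth0, [Target_IP], and [Delphix_Engine_IP] are placeholders/assumptions):
$ ping -c 20 -s 8192 [Delphix_Engine_IP]          # round-trip latency with an 8KB payload; look for avg < 1 ms
$ traceroute -n [Delphix_Engine_IP]               # hop count; fewer hops is better
$ iperf3 -s                                       # run on one end (e.g. the target host)
$ iperf3 -c [Target_IP] -P 4 -t 30                # run on the other end: 4 parallel streams for 30 seconds
$ ethtool eth0 | grep -iE 'speed|duplex|auto'     # confirm auto-negotiation and the negotiated link speed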
Delphix <===> Staging Server (SQL Server, Sybase):
Measure latency and bandwidth for transaction log restore performance.
Source <===> Delphix:
Measure latency and bandwidth for SnapSync performance.
ESX host <===> ESX host (ESX Cluster):
Measure latency and bandwidth for cluster operations (e.g., vMotion).
The Delphix Engine uses the bulk of available RAM as a read cache for frequently accessed filesystem data. This cache is updated as I/O occurs on the Delphix Engine. Because of this, live vMotion of a Delphix Engine may take an exceedingly long time, or may fail, if the engine receives more I/O than the vMotion network can sustain.
This is because the entire memory footprint of the Delphix VM (more precisely, the entire address space of the ESX processes that comprise the VM) must be copied to the receiving ESX host, along with all changes to that address space as they happen.
Frequently asked questions
Why does Delphix request 10GE Ethernet?
As a matter of physics and standards, 10 gigabit (Gb) Ethernet can sustain approximately 1 gigabyte (GB) per second of throughput (10 Gb/s divided by 8 bits per byte is 1.25 GB/s raw, or roughly 1 GB/s after protocol overhead). With all our best practices applied, a Delphix Engine can achieve line speed or greater, allowing for optimal load and engine utilization. Lower network speeds may be acceptable for low loads, while in some environments NIC teaming (e.g. LACP) may be required for top speeds.
Why does Delphix require <1ms latency to TARGET servers and <50ms to SOURCE servers?
Delphix leverages NFS and iSCSI (depending on platform) for live TARGET DB mounting over the network, making it imperative that latency is as low as possible. Data coming from SOURCE servers is generally not as time-sensitive, so a latency of up to 50ms is acceptable and still ensures operational integrity.
Why does Delphix request Jumbo frames?
Jumbo frames increase the Ethernet maximum transmission unit (MTU) from the default 1500 bytes to 9000 bytes. This has several effects, such as decreasing CPU cycles by transferring fewer packets (at 1 GB/s, roughly 110,000 frames per second at MTU 9000 versus roughly 670,000 at MTU 1500) and increasing engine throughput. You will find jumbo frames have a 10-20% real-world impact and are required (along with all other best practices) to handle peak loads of 800-1000 MB/s on an 8 vCPU engine with a 10 Gb network.
How does Delphix avoid communication impact with non-jumbo frame hosts when Jumbo Frames are enabled on the Delphix Host?
Path MTU Discovery is the mechanism by which two hosts agree on the MTU leveraged for communication between them. This mechanism will ensure communication between both standard and Jumbo Frame enabled hosts works as expected.
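One way to observe the discovered path MTU from a Linux host is tracepath (a sketch; the IP is a placeholder):
$ tracepath -n [Delphix_Engine_IP]    # the reported pmtu value is the path MTU discovered toward the engine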
When does Delphix recommend NIC teaming?
The Delphix Engine is capable of high throughput, but not every enterprise has sufficient network bandwidth to support it. Teaming is a less expensive way of increasing the bandwidth when compared to new hardware.
Why does Delphix recommend logical and physical co-location?
The Delphix Engine leverages network connections extensively, so optimizing the latency whenever possible is very important and can sometimes be critical.