This is the fourth part of a series of blog posts about NSX technology, version 4.x.
You can find the other blog posts from the series here:
– Part 1 NSX Layers and Architecture
– Part 2 NSX Infrastructure Preparation
– Part 3 NSX Logical Switching
– Part 5 NSX Logical Bridging (tbd)
– Part 6 NSX Firewall (tbd)
– Part 7 Advanced Threat Prevention (tbd)
– Part 8 NSX Services (tbd)
– Part 9 NSX Users and Roles (tbd)
– Part 10 NSX Federation (tbd)
Let’s talk about Logical Routing in NSX, which enables network communication within NSX as well as with the “outside” physical world.
Logical Routing Introduction
In NSX, logical routing supports different types of solutions:
- Single or multi-tenant models
- Separation between tenants and networks
- Cloud environments with container workloads
- Simple and optimized routing paths in virtual networks
- Distributed routing and centralized services in the data center
- Extension of logical networks to the physical network
There are of course certain requirements that must be fulfilled before logical routing can work:
- NSX Manager cluster up and ready
- Transport Zones with N-VDS/VDS
- ESXi hosts prepared with the NSX transport node profile and attached to the right Transport Zone (TZ)
- NSX Edge node(s) deployed and configured in line with the infrastructure and the requirements
Going deeper, the NSX routing service covers different areas:
- North-South Routing (from NSX to Physical Network)
- East-West Routing (Inside NSX)
- Multitenancy with Tier 0 / Tier 1
- High Availability
- IPv6 and multicast
- Central services like Gateway Firewall, NAT, VPN and Load Balancing
NSX has two types of Gateways that you must understand from the beginning, Tier-0 and Tier-1, each with different characteristics.
| Tier-0 | Tier-1 |
| --- | --- |
| Managed by the infrastructure administrator / provider | Managed and configured by tenants |
| Static and dynamic routing | No dynamic routing |
| Supports ECMP forwarding to the physical routers | No ECMP |
| Forwards traffic between logical and physical networks (North-South) | Handles routing between different segments (East-West) and must be connected to a T0 Gateway to move packets outside NSX |
| An Edge Cluster is required | An Edge Cluster is needed only if centralized services are activated/configured |
NSX Single-Tier Topology
In this topology only the T0 Gateway is used and the segments are connected directly to it.
NSX Multi-Tier Topology
In this topology both T0 and T1 gateways are used: the T1 gateways are connected to the T0 gateways and the segments are connected to the T1 gateways.
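To make the two topologies a bit more concrete, here is a minimal sketch of how a multi-tier setup could be built with the NSX Policy API using Python and requests. The manager address, credentials, object names and the transport zone path are placeholders of mine, not values from a real environment:

```python
# Hedged sketch: minimal multi-tier topology via the NSX Policy API.
# All names, IPs and paths below are illustrative placeholders.
import requests

NSX = "https://nsx-manager.lab.local"
AUTH = ("admin", "VMware1!VMware1!")          # placeholder credentials

def patch(path, body):
    # PATCH is used so existing objects are updated and missing ones are created
    r = requests.patch(f"{NSX}/policy/api/v1{path}", json=body, auth=AUTH, verify=False)  # lab: self-signed cert
    r.raise_for_status()

# Tier-1 gateway linked to an already existing Tier-0
patch("/infra/tier-1s/T1-Tenant-A", {
    "display_name": "T1-Tenant-A",
    "tier0_path": "/infra/tier-0s/T0-Gateway",
})

# Overlay segment attached to the Tier-1 (the gateway IP lives on the downlink)
patch("/infra/segments/web-segment", {
    "display_name": "web-segment",
    "connectivity_path": "/infra/tier-1s/T1-Tenant-A",
    "subnets": [{"gateway_address": "172.16.10.1/24"}],
    "transport_zone_path": "/infra/sites/default/enforcement-points/default/transport-zones/OVERLAY-TZ",
})
```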
Edge Nodes and Clusters
The forwarding services need a place to run, especially the centralized ones (NAT, VPN and so on) that require compute resources. The Edge node is the transport node that hosts the Gateway and the centralized stateful services. Commonly it is a virtual machine, but it can also be a physical server.
In order to consume the resources offered by the Edge nodes, a logical object must be created: the Edge Cluster, where the Edge nodes are grouped. This object provides redundancy and scalability to gateway routing and stateful services.
One important part, of course, is how to connect and forward the traffic to the physical network, so the Edge node must have one or more uplinks dedicated to this role.
In the image above you can see that the Edge nodes need at least one interface to communicate with the physical network; you can of course use multiple uplink interfaces, configured per Edge node and dedicated to that Edge node.
On the left side the idea is to use a shared transit VLAN for all the Edge nodes, while on the right side the decision is to use a point-to-point VLAN for each Edge node interface. These objects are created on the T0 Gateway and their configuration is distributed to the Edge nodes.
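Whichever design you pick, the transit/uplink VLANs are typically realized as VLAN-backed segments in a VLAN transport zone. A hedged sketch with the Policy API; the segment name, VLAN ID and transport zone path are illustrative:

```python
# Hedged sketch: VLAN-backed uplink/transit segment for the Edge uplinks (Policy API).
import requests

NSX = "https://nsx-manager.lab.local"
AUTH = ("admin", "VMware1!VMware1!")          # placeholder credentials

body = {
    "display_name": "uplink-vlan-100",
    "vlan_ids": ["100"],   # shared transit VLAN; create one segment/VLAN per edge for the point-to-point design
    "transport_zone_path": "/infra/sites/default/enforcement-points/default/transport-zones/VLAN-TZ",
}
r = requests.patch(f"{NSX}/policy/api/v1/infra/segments/uplink-vlan-100",
                   json=body, auth=AUTH, verify=False)  # lab: self-signed cert
r.raise_for_status()
```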
Gateway SR and DR
Each T0 Gateway has a Distributed Router (DR) component, which is realized on every transport node. The T0 Gateway, unlike the other transport nodes, also has a Service Router (SR) component, which is in charge of the N/S routing and the centralized services mentioned a few paragraphs above.
| Distributed Router | Service Router |
| --- | --- |
| Basic forwarding functionality | North/South routing |
| Realized on all transport nodes (Edge and hosts) | Manages the centralized stateful services (VPN, NAT, etc.) |
| Runs as a kernel module in ESXi | Uplinks to the external network to forward traffic outside NSX |
| Provides distributed routing and first-hop routing for workloads | Realized only on the Edge transport nodes |
Conceptually, it can be summarized with the drawing below.
In a multi-tier configuration both T0 and T1 gateways can have SR and DR components.
As I mentioned before in this article, the DR component is realized on each transport node, while the SR component runs only on the Edge nodes. Differently from previous NSX versions, in this one DR and SR are automatically interconnected, which greatly helps the virtual/network administrator, who simply does not need to deal with route distribution between these two components. (We will see later which configuration the administrator can apply to distribute information about the network infrastructure from one component to the other.)
Single-Tier Topology View
In this topology the T0 has the segments directly attached, communicating with the T0 DR component. There are also two SRs (meaning two Edge nodes deployed) that are part of the T0 Gateway and handle the traffic in and out of NSX through dedicated uplink interfaces.
In the physical view you can see that the DR component spans all the transport nodes, because it must exist on every node in order to be called “distributed”.
The SR components, instead, exist only on the Edge nodes, instantiated on each of them to balance the load and avoid a single point of failure; their primary job is handling the centralized routing for N/S communication and, possibly, additional services.
Multi-Tier Topology View
In this topology we add some more components and complexity to the previous one. The segments are now attached to two different T1 Gateways, which are spread across all the transport nodes with their DR components. Each T1 is then connected to the T0 Gateway, whose DR component is also present on all transport nodes. The rest is similar to the previous topology, where the SRs run only on the Edge nodes. You can see that the DR components coming from each T1 and T0 gateway are present on every node.
As said before, if the administrator enables stateful services on a T1 Gateway, an SR component is instantiated for it as well and uses the compute resources provided by the selected Edge cluster.
Ideally, the logical and physical representation would look something like this.
Gateway Interfaces
The gateway has different types of interfaces (a small uplink configuration sketch follows the list):
- Uplink interface: connects the T0 GW to the physical devices (routers, ToR switches, etc.)
- Downlink interface: connects segments or logical switches to the GW
- RouterLink port: connects T0 and T1 GWs
- Intra-tier transit link: an internal link between the DR and SR of a GW
- Service interface: a special interface for VLAN-based services and partner service redirection
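As an example of the first interface type, here is a hedged sketch of creating an uplink (EXTERNAL) interface on a T0 with the Policy API; the locale-services ID, the edge node path format and the addressing are assumptions of mine for illustration only:

```python
# Hedged sketch: uplink (EXTERNAL) interface on a T0 Gateway (Policy API).
import requests

NSX = "https://nsx-manager.lab.local"
AUTH = ("admin", "VMware1!VMware1!")          # placeholder credentials

body = {
    "type": "EXTERNAL",
    "segment_path": "/infra/segments/uplink-vlan-100",                 # the VLAN uplink segment
    "subnets": [{"ip_addresses": ["192.168.100.2"], "prefix_len": 24}],
    # the interface is bound to one specific edge node of the edge cluster (path format is illustrative)
    "edge_path": "/infra/sites/default/enforcement-points/default/edge-clusters/EC-1/edge-nodes/0",
}
r = requests.patch(
    f"{NSX}/policy/api/v1/infra/tier-0s/T0-Gateway/locale-services/default/interfaces/uplink-edge1",
    json=body, auth=AUTH, verify=False)  # lab: self-signed cert
r.raise_for_status()
```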
NSX Edge and Edge Cluster
The Edge node provides resources for different types of services:
- connection to the external network / infrastructure
- hosting the SR components of T0 and T1 GWs
- hosting services like dynamic routing, DHCP, NAT or VPN
- establishing Geneve tunnels for the overlay networks
- enabling Service Insertion with third-party vendor services
The edge cluster concept allows the administrator to consume the resources of a group of edge nodes. This object has a clear scope:
- resources to scale
- high availability
- up to 10 edge nodes per cluster, for a maximum of 160 edge clusters
- failure domains are supported inside an edge cluster
Edge nodes are commonly virtual machines, but the administrator also has the option to deploy them in a physical form factor, mainly when higher performance is required.
Here is a small table with the Edge VM form factor and its different sizes:
| Size | vCPU | RAM | Disk | VM HW Version |
| --- | --- | --- | --- | --- |
| Small | 2 | 4 GB | 200 GB | at least VM version 11 or ESXi 6.0 |
| Medium | 4 | 8 GB | 200 GB | at least VM version 11 or ESXi 6.0 |
| Large | 8 | 32 GB | 200 GB | at least VM version 11 or ESXi 6.0 |
| Extra Large | 16 | 64 GB | 200 GB | at least VM version 11 or ESXi 6.0 |
The NSX Edge node VM can be deployed in different ways: from the NSX UI, with the .OVA, via ISO, and even using a PXE environment if you want. What you must know is that the Edge node VM can only be deployed on an ESXi hypervisor; nothing else is supported.
The VM is ready to go out of the box; what is strictly needed during the deployment phase is a hostname, an IP address, the required communication ports opened and the usual infrastructure services like NTP and DNS available.
An important prerequisite to correctly deploy a virtual Edge is the usage of its network interfaces (a maximum of 5) and their assignment. Some prerequisites must be satisfied:
- At least two ports/interfaces of the NSX Edge node connected to the VDS
- The first interface (eth0) is used for management traffic / access and uses one Edge vNIC
- The four remaining interfaces are used for the datapath: overlay and uplink (VLAN) connectivity
The choices on how to use these four interfaces can be divided in two:
- Create multiple N-VDS switches with different vNICs attached, in order to have a simple, predictable configuration, for example: one N-VDS for overlay, one N-VDS for Uplink1 and one N-VDS for Uplink2
- Create a single N-VDS with multiple vNICs attached that carries both VLAN and overlay traffic, with two TEPs to balance the overlay traffic
It is also possible to take advantage of network offloading for the NSX Edge traffic using the SmartNICs mentioned a few articles earlier in this series. In this case, what needs to be enabled is UPT (Uniform Passthrough) mode on the NSX Edge. More information about it here.
Let’s now talk about the bare metal form factor, that is, the possibility to install the NSX Edge directly on physical hardware.
| Form | CPU Cores | RAM | Disk | DPDK CPU Requirements |
| --- | --- | --- | --- | --- |
| Minimum requirements | 8 | 32 GB | 200 GB | AES-NI, 1 GB huge page support |
| Recommended requirements | 24 | 256 GB | 200 GB | AES-NI, 1 GB huge page support |
Of course, in this case the only supported ways to deploy the Edge are via ISO or PXE. The same requirements as for the virtual form factor must be satisfied: infrastructure services like NTP and DNS available, hostname, IP address and so on. If you are interested, you can find more information regarding specific hardware support in the official installation documentation.
Video: Deploy NSX Edge via the NSX UI
Official documentation for Installing Bare Metal Edge
Once you deploy the first Edge you can start doing some checks to verify that everything is working as expected (a small API sketch follows the list):
- Ping the management IP assigned to the Edge node
- If SSH is enabled, try to connect to the Edge via that protocol
- Check that the Edge node can reach its default GW
- Check that the Edge node can reach DNS and NTP
- Verify the connectivity between hosts and Edge nodes, and between managers and Edge nodes (remember to check https://ports.vmware.com)
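If you prefer to script part of these checks, a hedged sketch using the NSX Manager API (endpoint and field names as I recall them, so verify them against your version) could look like this:

```python
# Hedged sketch: print the realization state of every transport node (Edges included).
import requests

NSX = "https://nsx-manager.lab.local"
AUTH = ("admin", "VMware1!VMware1!")          # placeholder credentials

nodes = requests.get(f"{NSX}/api/v1/transport-nodes", auth=AUTH, verify=False).json()
for node in nodes.get("results", []):
    state = requests.get(f"{NSX}/api/v1/transport-nodes/{node['id']}/state",
                         auth=AUTH, verify=False).json()
    print(f"{node.get('display_name')}: {state.get('state')}")
```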
Tier-1 and Tier-0 Gateways
Video: creating a T1 GW and connecting segments to test east-west communication
Video: creating a T0 GW and VLAN uplink segments
Once the T0 is correctly in place, the administrator can configure static and dynamic routing to allow routed communication with the rest of the infrastructure outside NSX.
Depending on the configuration of the NSX infrastructure, if you have a multi-tier setup the connectivity between T1 and T0 is a must for proper routing inside and outside NSX, so the T1 and T0 have to be linked. Once you link a T1 to a T0, one important configuration step to remember is the route advertisement, used to propagate the subnets from the T1 to the T0 (segments, VPN, NAT IPs, DNS forwarders). On the T0, instead, what needs to be configured is the route redistribution, in order to propagate and publish the subnets to the upstream routers.
Video: linking T1 to T0 and route advertisement
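For reference, here is a hedged sketch of the same two steps, route advertisement on the T1 and route redistribution on the T0, done with the Policy API; the gateway IDs, the locale-services ID and the exact type lists are illustrative:

```python
# Hedged sketch: T1 route advertisement and T0 route redistribution (Policy API).
import requests

NSX = "https://nsx-manager.lab.local"
AUTH = ("admin", "VMware1!VMware1!")          # placeholder credentials

def patch(path, body):
    r = requests.patch(f"{NSX}/policy/api/v1{path}", json=body, auth=AUTH, verify=False)  # lab: self-signed cert
    r.raise_for_status()

# T1: advertise connected segments and NAT IPs towards the linked T0
patch("/infra/tier-1s/T1-Tenant-A", {
    "route_advertisement_types": ["TIER1_CONNECTED", "TIER1_NAT"],
})

# T0: redistribute connected/static and the T1 subnets towards the upstream routers
patch("/infra/tier-0s/T0-Gateway/locale-services/default", {
    "route_redistribution_config": {
        "redistribution_rules": [{
            "name": "redistribute-all",
            "route_redistribution_types": ["TIER0_CONNECTED", "TIER0_STATIC",
                                           "TIER1_CONNECTED", "TIER1_NAT"],
        }],
    },
})
```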
After this configuration, if we open the network topology diagram inside the NSX UI, we can see the T0 and T1 gateways and their networks.
The T0 also has a deeper view, called FABRIC view, where we can see the components it is using, such as the SRs, the Edge nodes and the uplink interfaces.
Configure Static or Dynamic Routing
| Static Routing | Dynamic Routing |
| --- | --- |
| Manual configuration | Dynamic routing allows gateways to exchange information about networks |
| Fine tuning of route selection | Routing protocols are used to obtain the routes to reach networks dynamically |
| Route changes cannot be made dynamically | Routers speak with neighbor gateways when a network change is needed |
| Limited scalability / administrative overhead | |
| Failover possible with route redundancy and failure scenarios | |
The T0 Gateway supports the following:
- Static routing with the upstream physical gateway
- BGP dynamic routing
  - eBGP (external) sessions with peers in a different AS
  - iBGP (internal) sessions with peers in the same AS
- OSPF
  - over point-to-point networks
  - over broadcast networks
The T0 also supports BGP route aggregation, a BGP feature that allows aggregating specific routes into a single one (a larger prefix covering the subnets). This is done to reduce the size of the routing tables and the number of advertised routes, and to accelerate the best path calculation.
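A hedged sketch of enabling BGP with one eBGP neighbor and a route aggregation on the T0 via the Policy API; AS numbers, addresses, the aggregated prefix and the locale-services ID are illustrative values of mine:

```python
# Hedged sketch: BGP process, eBGP neighbor and route aggregation on the T0 (Policy API).
import requests

NSX = "https://nsx-manager.lab.local"
AUTH = ("admin", "VMware1!VMware1!")          # placeholder credentials

def patch(path, body):
    r = requests.patch(f"{NSX}/policy/api/v1{path}", json=body, auth=AUTH, verify=False)  # lab: self-signed cert
    r.raise_for_status()

# BGP process of the T0: local AS, ECMP and one aggregated prefix
patch("/infra/tier-0s/T0-Gateway/locale-services/default/bgp", {
    "enabled": True,
    "local_as_num": "65001",
    "ecmp": True,
    "route_aggregations": [{"prefix": "172.16.0.0/16", "summary_only": True}],
})

# eBGP peering with the upstream physical router
patch("/infra/tier-0s/T0-Gateway/locale-services/default/bgp/neighbors/tor-a", {
    "neighbor_address": "192.168.100.1",
    "remote_as_num": "65000",
})
```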
OSPF is not a routing protocol enabled by default on the T0; it must be enabled before configuring the related parameters (graceful restart, Area ID, authentication and so on).
Also with OSPF there is the possibility to use route summarization in order to:
– Reduce LSAs (Link State Advertisements)
– Decrease the CPU and memory usage
– Facilitate troubleshooting
A simple example is in the image below.
In the image above, the T0 Gateway advertises the summarization of both /25 networks as a single /24 that covers the group of subnets coming from the different T1 Gateways.
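As a quick sanity check of the math, here is a tiny Python snippet (with example prefixes of mine, not the ones in the image) showing that two adjacent /25 networks collapse into a single /24:

```python
# Two adjacent /25 subnets coming from different T1 gateways summarize into one /24.
import ipaddress

t1_subnets = [ipaddress.ip_network("172.16.10.0/25"),
              ipaddress.ip_network("172.16.10.128/25")]
print(list(ipaddress.collapse_addresses(t1_subnets)))   # [IPv4Network('172.16.10.0/24')]
```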
In NSX this can be configured under Set Route Summarization in the T0 GW settings.
Image: Set OSPF route summarization
High Availability and ECMP
Equal-cost multipath (ECMP) routing allows a more robust routing configuration, adding some features and functions:
- ECMP increases the N/S communication bandwidth by combining multiple uplinks
- ECMP performs load balancing
- ECMP provides fault tolerance for failed paths
- A maximum of 8 paths is supported
- ECMP routing is available for T0-GW uplinks
High Availability can be configured for the Gateways to provide redundancy, with two main modes (a small configuration sketch follows the list):
- Active-Active: all Edge nodes are active and run the gateway services concurrently; the workload is distributed between the nodes in order to avoid overloading a single instance
- Active-Standby: one Edge node is active and the other one remains in standby, ready to take over in case the active one becomes unavailable
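In the Policy API the HA mode is simply a property of the T0 Gateway object; a minimal hedged sketch, where the gateway ID and manager details are placeholders:

```python
# Hedged sketch: setting the HA mode of the T0 Gateway (Policy API).
import requests

NSX = "https://nsx-manager.lab.local"
AUTH = ("admin", "VMware1!VMware1!")          # placeholder credentials

r = requests.patch(f"{NSX}/policy/api/v1/infra/tier-0s/T0-Gateway",
                   json={"ha_mode": "ACTIVE_ACTIVE"},   # or "ACTIVE_STANDBY"
                   auth=AUTH, verify=False)  # lab: self-signed cert
r.raise_for_status()
```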
Active/Active Mode
This is the default high availability mode for T0 gateways and provides the following benefits:
- ECMP to balance the traffic load across the Edge nodes
- Logical routing services available and running actively on multiple Edge instances at the same time
- Starting from NSX version 4.0.1, full support for stateful services (like NAT) in Active/Active HA mode
BGP can of course be used with Active/Active HA and will establish peerings between the physical routers and all the Edge SR instances. The DRs, on the lower part, will load-balance the traffic flows across all the available SR instances using the automatic internal transit segment.
Active/Standby Mode
As said before, when using this mode the gateway services run on only one Edge node at a time.
This is the mode to use when you need centralized stateful services that can only be provided within it, like the stateful Gateway Firewall and VPN.
The BGP topology, and how the traffic flows through the Edge nodes and SRs, is slightly different compared to the previous case.
The BGP peering is still established from both SR instances running on the different Edge nodes, but in this case the standby one performs AS path prepending in order to be the instance that does not forward traffic to the physical routers. The DR, of course, sends the traffic to the active SR only.
The same applies in the case of OSPF dynamic routing; in this case a high cost is used to influence the route selection on the physical routers.
Failover Detection
NSX has two main mechanisms used to detect connectivity issues:
- BFD (Bidirectional Forwarding Detection), used on the management and overlay networks
- BGP or OSPF dynamic routing protocols on the uplinks
BFD is used to detect path failures; it is a low-overhead, fast protocol that sends keepalives on both the management and tunnel interfaces.
Failover with BFD
BFD sends probes via the management and tunnel interfaces, and if these probes fail the failover process is started: the active Edge fails over its services to the standby one.
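Tunnel and management BFD between transport nodes runs without extra configuration; on the uplink side, BFD can also be enabled per BGP neighbor to speed up the detection of a dead routing peer. A hedged sketch, assuming the bfd sub-config of the BGP neighbor in the Policy API (the timers are illustrative):

```python
# Hedged sketch: enabling BFD on an existing BGP neighbor of the T0 (Policy API).
import requests

NSX = "https://nsx-manager.lab.local"
AUTH = ("admin", "VMware1!VMware1!")          # placeholder credentials

body = {"bfd": {"enabled": True, "interval": 500, "multiple": 3}}   # interval in ms / detection multiplier
r = requests.patch(
    f"{NSX}/policy/api/v1/infra/tier-0s/T0-Gateway/locale-services/default/bgp/neighbors/tor-a",
    json=body, auth=AUTH, verify=False)  # lab: self-signed cert
r.raise_for_status()
```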
Failover with Dynamic Routing
The routing sessions (BGP or OSPF) established with the northbound physical router peers are of course checked and monitored. If the active gateway loses all its routing neighbors and a standby gateway is available, the roles are switched: the standby is promoted to active and the active becomes passive. Of course, in the case below, if the active gateway loses only one of its two peers, the failover process is not triggered.
When Active-Standby mode is used, the administrator can select which failover mode to use for HA:
- Preemptive: when the preferred node recovers from a failure it always becomes the active node again, and the peer moves to the standby state
- Non-preemptive: when the preferred node recovers from a failure it first checks whether the peer is active; if it is, it remains in standby, so it recovers its active state only if the peer is not available
This setting is available in the Tier-0 Gateway configuration, in the Fail Over section.
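In the Policy API this corresponds to the failover_mode property of the T0 Gateway, only meaningful in Active-Standby mode; a minimal hedged sketch:

```python
# Hedged sketch: setting the failover behaviour of the T0 Gateway (Policy API).
import requests

NSX = "https://nsx-manager.lab.local"
AUTH = ("admin", "VMware1!VMware1!")          # placeholder credentials

r = requests.patch(f"{NSX}/policy/api/v1/infra/tier-0s/T0-Gateway",
                   json={"failover_mode": "NON_PREEMPTIVE"},   # or "PREEMPTIVE"
                   auth=AUTH, verify=False)  # lab: self-signed cert
r.raise_for_status()
```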
VRF Lite
VRF Lite makes it possible to use the same T0 instance and create virtual T0-VRF instances inside the same Edge node(s).
Before going through the whole VRF topic, let's quickly list the requirements:
– Tier-0 deployed
– Layer 3 connectivity to the peer
– Peer device support for the 802.1Q protocol (VLAN tagging)
VRF Lite does not support VPN and OSPF routing.
Why use VRF Lite?
With VRF Lite you can, for example:
– allow the same network addresses to coexist in different “routing domains” (if you don't know what a VRF is, check this nice article from CBT Nuggets – What is Cisco VRF (Virtual Routing and Forwarding))
– provide feature compatibility with existing network installations
– run multiple instances in the same Gateway, optimizing the usage of resources
In terms of topologies, VRF Lite fully supports both Single-Tier and Multi-Tier.
Interfaces on the VRF Lite Gateway
Similarly to the standard Gateway, in the VRF Lite flavor we also have the interfaces we know from the previous paragraphs, with some additions:
- Logical Router (LR) trunk port: connects the parent T0 Gateway to the physical devices
- VRF uplink interface: internally connected to the LR trunk port of the parent T0
- Intra-tier transit link: the internal link between the SR and DR of a VRF Gateway
- Downlink interfaces: connect the VRF Gateways to the segments
- RouterLink ports: connect the T0 VRF Gateways with T1 Gateways
As written before, VLAN tagging on the uplink segment is used to provide isolation for each VRF instance: the VLAN is the channel used for the data plane, while the BGP protocol in each VRF provides the control plane functionality.
In order to configure VRF Lite you need to do the following (a minimal API sketch follows the list):
- Deploy the default T0 Gateway
- Add uplink interfaces to the default T0 Gateway
- Configure the default T0 Gateway
- Create the uplink trunk segments
- Deploy the VRF Gateway
- Add uplink interfaces to the VRF Gateway
- Configure the VRF Gateway
- Create and connect T1 Gateways to the VRF Gateway
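A hedged sketch of the trunk segment and VRF gateway steps via the Policy API; names, VLAN IDs and paths are illustrative, and the parent (default) T0 is assumed to exist already:

```python
# Hedged sketch: uplink trunk segment and a VRF gateway linked to the parent T0 (Policy API).
import requests

NSX = "https://nsx-manager.lab.local"
AUTH = ("admin", "VMware1!VMware1!")          # placeholder credentials

def patch(path, body):
    r = requests.patch(f"{NSX}/policy/api/v1{path}", json=body, auth=AUTH, verify=False)  # lab: self-signed cert
    r.raise_for_status()

# Trunk segment carrying the per-VRF VLANs towards the physical peer
patch("/infra/segments/uplink-trunk", {
    "display_name": "uplink-trunk",
    "vlan_ids": ["100", "101"],   # one VLAN per VRF instance
    "transport_zone_path": "/infra/sites/default/enforcement-points/default/transport-zones/VLAN-TZ",
})

# VRF gateway: a Tier-0 object whose vrf_config points to the parent T0
patch("/infra/tier-0s/T0-VRF-TenantA", {
    "display_name": "T0-VRF-TenantA",
    "vrf_config": {"tier0_path": "/infra/tier-0s/T0-Gateway"},
})
```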
This concludes the Logical Routing part. If you made it this far, thank you, and see you in the next chapter of this series!