Ethernet and Overlay technologies over Ethernet Frames Formats

Tuesday, December 31, 2013


Ethernet and Overlay technologies over Ethernet (802.3, 802.1q, 802.1ad QinQ, MPLS, 802.1ah mac-in-mac, Cisco Fabricpath, TRILL, OTV, LISP, VxLAN, NVGRE, and STT) show up in nearly every presentation and implementation. So let's start from the beginning, 802.3 Ethernet, and work through almost all of the newer Ethernet and Overlay Ethernet frame encapsulation methods.

The main idea of this post is to put most of the Ethernet encapsulations side by side (MPLS is considered L2.5, and the new Ethernet standards are considered L2 routing) so that the frame fields and the total frame sizes "on the wire" can be compared. The most important part of this post is the tables below; the rest only defines the fields and gives some more details, not necessarily all of them.





The internal structure of an Ethernet frame – with IFG


IEEE 802.3 Ethernet frame - 1538 bytes/octets
Pre + SFD (8) | DMAC (6) | SMAC (6) | Type 0x0800 (2) | IP 20 bytes + TCP 20 bytes + Payload / Data 6-1460 (46-1500) | CRC / FCS (4) | IFG (12)

IEEE 802.1Q - tagged Ethernet frame - 1542 bytes/octets
Pre + SFD (8) | DMAC (6) | SMAC (6) | C-TAG 0x8100 (4) | Type 0x0800 (2) | IP 20 bytes + TCP 20 bytes + Payload / Data 6-1460 (46-1500) | CRC / FCS (4) | IFG (12)

IEEE 802.1AD - double tagged Ethernet frame - QinQ - 1546 bytes/octets
Pre + SFD (8) | DMAC (6) | SMAC (6) | P-TAG 0x88a8 (4) | C-TAG 0x8100 (4) | Type 0x0800 (2) | IP 20 bytes + TCP 20 bytes + Payload / Data 6-1460 (46-1500) | CRC / FCS (4) | IFG (12)

IEEE 802.3 frame with 3 MPLS Headers - 1568 bytes/octets
Pre + SFD (8) | O-DMAC (6) | O-SMAC (6) | Type 0x8847 (2) | LSP Label (4) | RSVP Label (4) | VPN Label (4) | I-DMAC (6) | I-SMAC (6) | C-TAG 0x8100 (4) | Type 0x0800 (2) | IP 20 bytes + TCP 20 bytes + Payload / Data 6-1460 (46-1500) | CRC / FCS (4) | IFG (12)

Cisco FabricPath Ethernet frame - 1558 bytes/octets
Pre + SFD (8) | ODA (6) | OSA (6) | FP TAG (4) | C-DMAC (6) | C-SMAC (6) | C-TAG 0x8100 (4) | Type 0x0800 (2) | IP 20 bytes + TCP 20 bytes + Payload / Data 6-1460 (46-1500) | CRC / FCS (4) | IFG (12)

IEEE 802.1AH - PBB Ethernet frame - MACinMAC - 1568 bytes/octets
Pre + SFD (8) | B-DMAC (6) | B-SMAC (6) | B-TAG 0x88a8 (4) | I-TAG 0x88e7 (6) | C-DMAC (6) | C-SMAC (6) | P-TAG 0x88a8 (4) | C-TAG 0x8100 (4) | Type 0x0800 (2) | IP 20 bytes + TCP 20 bytes + Payload / Data 6-1460 (46-1500) | CRC / FCS (4) | IFG (12)

TRILL Ethernet frame - 1566 bytes/octets
Pre + SFD (8) | O-DMAC (6) | O-SMAC (6) | O-TAG (4) | Type 0x22F3 (2) | TRILL HEADER (6) | I-DMAC (6) | I-SMAC (6) | I-TAG / C-TAG 0x8100 (4) | Type 0x0800 (2) | IP 20 bytes + TCP 20 bytes + Payload / Data 6-1460 (46-1500) | CRC / FCS (4) | IFG (12)

OTV Ethernet 802.1Q frame by IETF - 1596 bytes/octets
Pre + SFD (8) | O-DMAC (6) | O-SMAC (6) | O-TAG 0x8100 (4) | Type 0x0800 (2) | Outer IP header (20) | O-UDP Header (8) | OTV Header (8) | I-DMAC (6) | I-SMAC (6) | C-TAG 0x8100 (4) | Type 0x0800 (2) | IP 20 bytes + TCP 20 bytes + Payload / Data 6-1460 (46-1500) | CRC / FCS (4) | IFG (12)

OTV Ethernet 802.1Q frame by Cisco (?)  - 1596 bytes/octets
Pre + SFD (8) | O-DMAC (6) | O-SMAC (6) | O-TAG 0x8100 (4) | Type 0x0800 (2) | Outer IP header (20) | GRE Header (8) | MPLS label / OTV (8) | I-DMAC (6) | I-SMAC (6) | C-TAG 0x8100 (4) | Type 0x0800 (2) | IP 20 bytes + TCP 20 bytes + Payload / Data 6-1460 (46-1500) | CRC / FCS (4) | IFG (12)

LISP Ethernet 802.1Q frame  - 1596 bytes/octets
Pre + SFD (8) | O-DMAC (6) | O-SMAC (6) | O-TAG 0x8100 (4) | Type 0x0800 (2) | Outer IP header (20) | O-UDP Header (8) | LISP Header (8) | I-DMAC (6) | I-SMAC (6) | C-TAG 0x8100 (4) | Type 0x0800 (2) | IP 20 bytes + TCP 20 bytes + Payload / Data 6-1460 (46-1500) | CRC / FCS (4) | IFG (12)

VxLAN Ethernet 802.1Q frame  - 1596 bytes/octets
Pre + SFD (8) | O-DMAC (6) | O-SMAC (6) | O-TAG 0x8100 (4) | Type 0x0800 (2) | Outer IP header (20) | O-UDP Header (8) | VxLAN Header (8) | I-DMAC (6) | I-SMAC (6) | C-TAG 0x8100 (4) | Type 0x0800 (2) | IP 20 bytes + TCP 20 bytes + Payload / Data 6-1460 (46-1500) | CRC / FCS (4) | IFG (12)

NvGRE Ethernet 802.1Q frame  - 1588 bytes/octets
Pre + SFD (8) | O-DMAC (6) | O-SMAC (6) | O-TAG 0x8100 (4) | Type 0x0800 (2) | Outer IP header (20) | NvGRE 0x6558 (8) | I-DMAC (6) | I-SMAC (6) | C-TAG 0x8100 (4) | Type 0x0800 (2) | IP 20 bytes + TCP 20 bytes + Payload / Data 6-1460 (46-1500) | CRC / FCS (4) | IFG (12)

STT Ethernet 802.1Q frame  - 1608 bytes/octets
Pre + SFD (8) | O-DMAC (6) | O-SMAC (6) | O-TAG 0x8100 (4) | Type 0x0800 (2) | Outer IP Header (20) | Outer TCP Header (20) | STT Header (8) | I-DMAC (6) | I-SMAC (6) | C-TAG 0x8100 (4) | Type 0x0800 (2) | IP 20 bytes + TCP 20 bytes + Payload / Data 6-1460 (46-1500) | CRC / FCS (4) | IFG (12)
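To double-check the totals in the tables, the short Python sketch below adds up the per-field byte counts for a subset of the formats, with the payload at its 1500-byte maximum. The field lists are transcribed from the tables above; the names `HEADERS`, `frame_size` and `on_wire_size` are just illustrative.

```python
# Byte counts transcribed from the frame tables above.
PRE_SFD, FCS, IFG = 8, 4, 12
MAX_PAYLOAD = 1500  # IP 20 + TCP 20 + Payload/Data 1460

# Header fields between the SFD and the payload, in transmission order.
HEADERS = {
    "IEEE 802.3": [6, 6, 2],                           # DMAC, SMAC, Type
    "IEEE 802.1Q": [6, 6, 4, 2],                       # + C-TAG
    "IEEE 802.1ad QinQ": [6, 6, 4, 4, 2],              # + P-TAG
    "Cisco FabricPath": [6, 6, 4, 6, 6, 4, 2],         # ODA, OSA, FP TAG, C-DMAC, C-SMAC, C-TAG, Type
    "IEEE 802.1ah PBB": [6, 6, 4, 6, 6, 6, 4, 4, 2],   # B-DMAC, B-SMAC, B-TAG, I-TAG, C-DMAC, C-SMAC, P-TAG, C-TAG, Type
    "VxLAN": [6, 6, 4, 2, 20, 8, 8, 6, 6, 4, 2],       # outer Eth + tag, IP, UDP, VxLAN, inner Eth + tag
    "NvGRE": [6, 6, 4, 2, 20, 8, 6, 6, 4, 2],          # outer Eth + tag, IP, NvGRE, inner Eth + tag
}


def frame_size(header, payload=MAX_PAYLOAD):
    """Frame size as the standard counts it: DMAC through FCS."""
    return sum(header) + payload + FCS


def on_wire_size(header, payload=MAX_PAYLOAD):
    """Link time used by one frame: Preamble/SFD + frame + IFG."""
    return PRE_SFD + frame_size(header, payload) + IFG


for name, header in HEADERS.items():
    print(f"{name:18} frame {frame_size(header):5} B   on the wire {on_wire_size(header):5} B")
```

Adding another format is just a matter of appending its header field sizes to the dictionary.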


The minimum frame size allowed to be transmitted is 64 bytes = Header + Payload + CRC, so the minimum Payload size is 46 bytes for untagged frames and 42 bytes (46 - 4 bytes of tag) for tagged frames.

The maximum frame size allowed to be transmitted is 1518 bytes = Header + Payload + CRC (1522 bytes with an 802.1Q tag). The Preamble, SFD and IFG are not counted in the frame size; they matter only when calculating the maximum throughput of the link.
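The effect of the Preamble/SFD and IFG on throughput can be worked out with a few lines of arithmetic: each frame occupies its own size plus 20 extra bytes of link time. This is a sketch assuming a 1 Gbit/s link and back-to-back frames of a single size; the function names are made up for the example.

```python
PRE_SFD_AND_IFG = 8 + 12  # on the wire, but not counted in the frame size


def max_frame_rate(frame_bytes: int, link_bps: int) -> float:
    """Back-to-back frames per second for one frame size (DMAC..FCS)."""
    return link_bps / ((frame_bytes + PRE_SFD_AND_IFG) * 8)


def payload_share(frame_bytes: int, header_and_fcs: int = 18) -> float:
    """Fraction of link time carrying the Payload/Data field (untagged: 14 + 4 bytes)."""
    return (frame_bytes - header_and_fcs) / (frame_bytes + PRE_SFD_AND_IFG)


for size in (64, 1518):
    rate = max_frame_rate(size, 1_000_000_000)  # assumed 1 Gbit/s link
    print(f"{size:4} B frames: {rate:,.0f} frames/s, payload share {payload_share(size):.1%}")
```

Under these assumptions a 1 Gbit/s link carries about 1.49 million 64-byte frames per second but only about 81,000 maximum-size frames; the fixed 20 bytes per frame are why small frames leave so much of the link idle.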

Something else worth mentioning is that the frame is transmitted with the most-significant octet first (from the Preamble through the CRC; the IFG is just idle time), but within each octet the least-significant bit is transmitted first, so the bit order inside each octet is reversed on the wire.
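A small sketch of the bit-ordering rule: it prints the bits of some octets in the order they leave the interface, octet by octet, least-significant bit first. The sample values are assumptions for illustration: 0x55 is a Preamble octet, 0xD5 is the SFD, and 0x01 0x00 0x5E are the first octets of an IPv4 multicast MAC address.

```python
def wire_bits(octets: bytes) -> str:
    """Bits of `octets` in transmission order: octets in sequence,
    each octet least-significant bit (bit 0) first."""
    return " ".join(
        "".join(str((byte >> i) & 1) for i in range(8))
        for byte in octets
    )


print(wire_bits(bytes([0x55])))              # a Preamble octet
print(wire_bits(bytes([0xD5])))              # the SFD octet
print(wire_bits(bytes([0x01, 0x00, 0x5E])))  # start of an IPv4 multicast MAC
```

0x55 comes out as 10101010 and 0xD5 as 10101011, the pattern that ends the SFD; and the first bit sent for the multicast address is 1, which is exactly the group (I/G) bit.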

More details about each field in the frame formats above are given in the descriptions below (they were too long to include in the initial post).





Fields Description

Let's start with the description of each field:


1. Pre – Preamble – 7 bytes – The Preamble is an alternating pattern of ones and zeros that tells receiving stations that a frame is coming.

2. SFD/SOD – Start of Frame Delimiter – 1 byte – Consists of an alternating pattern of ones and zeros ending in 11 (10101011 on the wire), indicating that the next bit is the first (leftmost) bit of the Destination MAC address.

3. DMAC – Destination MAC – 6 bytes – Identifies which MAC address should receive the frame. The first 3 bytes, in transmission order, correspond to OUI - Organizationally Unique Identifier assigned by IEEE to each organization; the following 3 bytes are assigned by that organization. 
See more details on MAC address

4. SMAC – Source MAC – 6 bytes – Identifies the sending MAC. It is always an individual address, so the leftmost bit in this field (the first bit transmitted, the I/G bit) is always 0.
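The same bits can be read programmatically. This hypothetical helper, `mac_fields`, splits a MAC address written in the usual colon notation into its OUI and the two flag bits that sit in the first octet: the I/G bit (the least-significant bit, transmitted first) and the U/L bit next to it. The example addresses are arbitrary.

```python
def mac_fields(mac: str) -> dict:
    """Split a colon-separated MAC into its OUI and the I/G and U/L flag bits."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return {
        "oui": octets[:3].hex(":"),
        "group": bool(octets[0] & 0x01),                 # I/G: first bit on the wire
        "locally_administered": bool(octets[0] & 0x02),  # U/L: second bit on the wire
    }


print(mac_fields("00:1b:54:aa:bb:cc"))  # individual, universally administered
print(mac_fields("ff:ff:ff:ff:ff:ff"))  # broadcast
```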

5. Type / Length – 2 bytes – Indicates the Length, the number of bytes contained in the Data field, if the value is less than or equal to 1500 (0x05DC hex), or the EtherType if the value is greater than or equal to 1536 (0x0600 hex).

The most important Ethertypes are below:
0x0800  Internet Protocol version 4 (IPv4)
0x0806  Address Resolution Protocol (ARP)
0x22F3  IETF TRILL Protocol
0x8035  Reverse Address Resolution Protocol (RARP)
0x8100  Tagged IEEE 802.1Q, Shortest Path Bridging IEEE 802.1aq
0x86DD  Internet Protocol version 6 (IPv6)
0x8808  Ethernet flow control
0x8809  Slow Protocols (IEEE 802.3)
0x8847  MPLS unicast
0x8848  MPLS multicast
0x8863  PPPoE Discovery Stage
0x8864  PPPoE Session Stage
0x8870  Jumbo Frames
0x888E  EAP over LAN (IEEE 802.1X)
0x88A8  Provider Bridging IEEE 802.1ad, Shortest Path Bridging IEEE 802.1aq
0x88CC  Link Layer Discovery Protocol (LLDP)
0x88E5  MAC security (IEEE 802.1AE)
0x88F7  Precision Time Protocol (IEEE 1588)
0x8902  IEEE 802.1ag CFM Protocol / Y.1731 (OAM)
0x892F  High-availability Seamless Redundancy (HSR)
0x9100  Q-in-Q (old)

See more details on EtherType
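Because the same 2 bytes can mean either a length or an EtherType, a receiver has to check the value range first. A minimal sketch of that decision, using a few entries from the list above (the function name is illustrative):

```python
# A subset of the EtherType list above.
ETHERTYPES = {
    0x0800: "IPv4",
    0x0806: "ARP",
    0x22F3: "TRILL",
    0x8100: "IEEE 802.1Q tag",
    0x86DD: "IPv6",
    0x8847: "MPLS unicast",
    0x8848: "MPLS multicast",
    0x88A8: "IEEE 802.1ad tag",
}


def classify_type_length(value: int) -> str:
    """Interpret the 2-byte field that follows the Source MAC."""
    if value <= 1500:            # 0x05DC or less: length of the Data field
        return f"length {value}"
    if value >= 0x0600:          # 1536 or more: EtherType
        return ETHERTYPES.get(value, f"EtherType 0x{value:04X}")
    return "undefined"           # 1501-1535 is neither a valid length nor an EtherType


print(classify_type_length(0x0800), classify_type_length(46), classify_type_length(0x88CC))
```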

6. Payload / Data – 46-1500 bytes – The actual data contained in the frame. After physical-layer and link-layer processing is complete, this data is passed to an upper-layer protocol. The upper-layer protocol headers, such as IP, TCP or UDP, are encapsulated in this field.

7. CRC / FCS – Cyclic Redundancy Check / Frame Check Sequence – 4 bytes – Contains a CRC value computed by the sender (Source MAC) and recalculated by the receiver (Destination MAC) to detect damage that may have occurred to the frame in transit.
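Ethernet's FCS is the standard CRC-32, the same checksum that Python's `zlib.crc32` computes, so the sender/receiver check can be sketched in a few lines. The sketch assumes the CRC covers the bytes from the DMAC to the end of the payload and is appended least-significant byte first, which is how the FCS usually appears in captures that keep it; the frame contents are made up.

```python
import struct
import zlib


def append_fcs(frame: bytes) -> bytes:
    """Sender side: CRC-32 over DMAC..end of payload, appended least-significant byte first."""
    return frame + struct.pack("<I", zlib.crc32(frame))


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Receiver side: recompute the CRC over everything except the FCS and compare."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body)) == fcs


# Broadcast DMAC, example SMAC, ARP EtherType, 46 bytes of zero padding.
frame = bytes.fromhex("ffffffffffff" "001b54aabbcc" "0806") + bytes(46)
wire = append_fcs(frame)
damaged = bytearray(wire)
damaged[20] ^= 0x01  # flip one bit "in transit"
print(len(wire), fcs_ok(wire), fcs_ok(bytes(damaged)))
```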

8. IFG – Inter-frame Gap – 12 bytes – The minimum idle period between transmissions of Ethernet frames, a recovery time between frames that allows devices to prepare to receive the next frame. It is inserted by the physical layer.

See more details on IFG


9. Single-tagged Header - 802.1Q - C-TAG - Customer Tag - 4 bytes - Consists of the following fields; the C prefix on each field stands for Customer and is there to make the later encapsulations easier to follow:

C-TAG - 4 bytes
C-TPID 0x8100 (16 bits) | C-TCI: C-PCP / C-COS (3 bits), C-DEI (1 bit), C-VID (12 bits)







9.1 TPID – Tag Protocol Identifier – 2 bytes / 16 bits – Set to 0x8100 hex to identify the frame as an IEEE 802.1Q-tagged frame.
9.2 TCI – Tag Control Information – 2 bytes / 16 bits – Has the following 3 fields:
9.2.1 PCP – Priority Code Point, or COS (Class of Service) – 3 bits – Refers to the IEEE 802.1p priority, from 0 (best effort) to 7 (highest priority).
9.2.2 DEI – Drop Eligible Indicator – 1 bit – Can be used together with PCP to indicate that the frame is eligible to be dropped in the presence of congestion. It was formerly called CFI (Canonical Format Indicator), which was always set to 0 for Ethernet switches and to 1 for Token Ring networks.
9.2.3 VID – VLAN ID – 12 bits – Identifies the VLAN to which the frame belongs. The hex value 0x000 is reserved and means the frame carries no VLAN ID; in this case the 802.1Q tag specifies only the priority and is called a priority tag. The value 0xFFF is also reserved.

Inserting an 802.1Q tag forces the FCS/CRC to be recalculated.
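Building and parsing the 4-byte tag is a matter of shifting the three TCI fields into place. This is a sketch with illustrative function names; `insert_tag` assumes a frame without its FCS, since the FCS has to be recomputed after the tag is inserted.

```python
import struct

TPID_8021Q = 0x8100


def pack_vlan_tag(pcp: int, dei: int, vid: int, tpid: int = TPID_8021Q) -> bytes:
    """4-byte tag: TPID (16 bits), then TCI = PCP (3) | DEI (1) | VID (12)."""
    if not (0 <= pcp <= 7 and dei in (0, 1) and 0 <= vid <= 0xFFF):
        raise ValueError("PCP is 0-7, DEI is 0 or 1, VID is 0-4095")
    return struct.pack("!HH", tpid, (pcp << 13) | (dei << 12) | vid)


def unpack_vlan_tag(tag: bytes) -> dict:
    tpid, tci = struct.unpack("!HH", tag[:4])
    return {"tpid": tpid, "pcp": tci >> 13, "dei": (tci >> 12) & 1, "vid": tci & 0xFFF}


def insert_tag(frame_without_fcs: bytes, tag: bytes) -> bytes:
    """Insert the tag right after DMAC + SMAC; the FCS must be recomputed afterwards."""
    return frame_without_fcs[:12] + tag + frame_without_fcs[12:]


tag = pack_vlan_tag(pcp=5, dei=0, vid=100)
print(tag.hex(), unpack_vlan_tag(tag))
```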


10. Double-tagged Header - QinQ - 802.1AD Header - 8 bytes - consists of two 802.1Q headers (P-TAG and C-TAG)

QinQ header - 8 bytes
P-TAG - 4 bytes: P-TPID 0x88a8 (16 bits) | P-TCI: P-PCP / P-COS (3 bits), P-DEI (1 bit), P-VID (12 bits)
C-TAG - 4 bytes: C-TPID 0x8100 (16 bits) | C-TCI: C-PCP / C-COS (3 bits), C-DEI (1 bit), C-VID (12 bits)


10.1 P-TAG – Provider or Outer TAG – the newly inserted 802.1AD tag – The only difference from the original 802.1Q tag is the TPID value, which in this case is 0x88a8, the 802.1AD EtherType; the PCP/COS field carries the 802.1P priority for the Provider (Outer) tag.
10.2 C-TAG – Customer or Inner TAG – the original 802.1Q tag, with a TPID value of 0x8100 hex.

Of course, in this case the FCS/CRC must also be recalculated so that it covers the second tag.
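With two (or more) tags, a receiver keeps reading 4-byte tags as long as the 2 bytes after the MACs (or after the previous tag) contain a tag TPID; the first value that is not a TPID is the real EtherType. A sketch of that loop, assuming the tag TPIDs from the EtherType list (0x88a8, 0x8100 and the old 0x9100); the sample frame uses zeroed MAC addresses and arbitrary VLAN values.

```python
import struct

# TPID values that mark a VLAN tag.
TAG_TPIDS = {0x88A8: "P-TAG", 0x8100: "C-TAG", 0x9100: "Q-in-Q (old)"}


def walk_tags(frame: bytes):
    """Return ([(name, pcp, dei, vid), ...], ethertype), outermost tag first.
    `frame` starts at the DMAC."""
    offset, tags = 12, []                    # skip DMAC + SMAC
    while True:
        (tpid,) = struct.unpack_from("!H", frame, offset)
        if tpid not in TAG_TPIDS:
            return tags, tpid                # first non-tag value is the real EtherType
        (tci,) = struct.unpack_from("!H", frame, offset + 2)
        tags.append((TAG_TPIDS[tpid], tci >> 13, (tci >> 12) & 1, tci & 0xFFF))
        offset += 4


# Zeroed MACs, P-TAG (PCP 3, VID 200), C-TAG (PCP 5, VID 100), IPv4.
frame = bytes(12) + bytes.fromhex("88a8" "60c8" "8100" "a064" "0800")
print(walk_tags(frame))
```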


11.  MPLS Headers

MPLS Label - 4 bytes
Label (20 bits) | EXP / TC (3 bits) | S (1 bit) | TTL (8 bits)

11.1 Label – 20 bits – The label value; the range is 0 through 2^20 - 1 (1,048,575), but label values 0-15 are reserved: 4-15 are reserved for future use and 0-3 are defined below (RFC 3032 - MPLS Label Stack Encoding):
 0 – IPv4 Explicit NULL – Indicates that the label stack must be popped and the packet forwarded based on the IPv4 header.
 1 – Router Alert – If this value is on top of the label stack, the packet must be delivered to local software for processing. It can be seen as the equivalent of the "Router Alert Option" in IPv4 packets, for example ping with the record-route option.
 2 – IPv6 Explicit NULL – Indicates that the label stack must be popped and the packet forwarded based on the IPv6 header.
 3 – Implicit NULL – Advertised by an LSR to request that the upstream LSR pop the top label and forward the rest of the packet (labeled or unlabeled) through the outgoing interface; this value is signaled but never actually appears in the label stack.
11.2. EXP – TC Field – Experimental or Traffic Class Field – 3 bits – Traffic class for QoS priority and ECN - Explicit Congestion Notification.(RFC 5462 renamed the EXP bits to TC).
11.3. S – Bottom of the Stack flag – 1 bit – If this is set, it signifies that the current label is the last in the stack.
11.4. TTL – Time to Live – 8 bits – The TTL field in the label is used in the same way as the TTL in the IP header. When an IP packet enters the MPLS cloud (at the ingress LSR), the IP TTL value is decremented by 1 and copied to the TTL of the pushed label(s). At the egress LSR the label is removed and the IP header is exposed again; the IP TTL is then set from the TTL of the received top label after decrementing it by 1.
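A label stack entry packs the four fields above into 32 bits, and the S bit is what tells a receiver where the stack ends. The sketch below encodes and decodes a stack; the label values 16001 and 24005 are arbitrary (anything above the reserved 0-15 range) and TTL 63 assumes an IP TTL of 64 decremented at the ingress LSR.

```python
import struct


def encode_label_stack(entries):
    """entries: [(label, tc, ttl), ...] from top (outermost) to bottom.
    Each entry is Label (20) | TC (3) | S (1) | TTL (8); S is set only on the bottom one."""
    out = bytearray()
    for i, (label, tc, ttl) in enumerate(entries):
        if not (0 <= label < 2**20 and 0 <= tc < 8 and 0 <= ttl < 256):
            raise ValueError("Label is 20 bits, TC 3 bits, TTL 8 bits")
        s = 1 if i == len(entries) - 1 else 0
        out += struct.pack("!I", (label << 12) | (tc << 9) | (s << 8) | ttl)
    return bytes(out)


def decode_label_stack(data: bytes):
    """Read 4-byte entries until the S bit is found; return (entries, rest of packet)."""
    entries, offset = [], 0
    while True:
        (word,) = struct.unpack_from("!I", data, offset)
        label, tc, s, ttl = word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1, word & 0xFF
        entries.append((label, tc, s, ttl))
        offset += 4
        if s:
            return entries, data[offset:]


stack = encode_label_stack([(16001, 0, 63), (24005, 0, 63)])  # example label values
print(stack.hex(), decode_label_stack(stack + b"payload"))
```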

MPLS Frame Format with 2 and 3 Labels
There is no point in having MPLS without applications on top of it, so I will skip the single-label MPLS frame. When referring to the label stack, the leftmost label in the frame table is called the Top, Outer or Outermost label, and the last one (the rightmost) is called the Bottom, Inner or Innermost label.

The frame with 2 MPLS labels can be seen in the following scenarios:
1. VPN - both Layer 2 and Layer 3 VPN - in which the Top label represents the IGP path to the BGP PE router that originates the VPN route, and the Bottom label represents the VPN route.
2. VPN over TE (PE to PE TE) - both Layer 2 and Layer 3 VPN - in which the Top label represents the RSVP tunnel label to the peer TE PE (considering, of course, PE-to-PE TE, possibly with TE tunnels in both directions for inbound and outbound traffic) and the Bottom label represents the VPN route.

The frame with 3 MPLS labels represents VPN over TE (PE-P TE and PE-P-PE TE, with LDP over TE enabled), for both Layer 2 and Layer 3 VPN. The Top label is the TE label (between PE and P), the middle label is the LDP label, and the Bottom label is the VPN label.

See more information on MPLS labels.

The last things worth mentioning here are the FAT label, the Entropy label, GMPLS and MPLS-TP.
The FAT label is a flow label used to optimize load balancing over L2VPN and VPLS. The ingress PE maps all flows inside a PW or VPLS instance to the same label, which can cause inefficient balancing, with all of the traffic following the same path. This is why the ingress PE can insert an additional label, with a non-reserved value but with its TTL set to 0, so that it is discarded at the egress PE.
The Entropy label is a generalized extension of the FAT label: it can be applied to IP forwarding, L2VPN or L3VPN. The entropy label is not used for forwarding and is not signaled; its only purpose is to improve load balancing. The egress LSR signals the ELI (Entropy Label Indicator, reserved value 7), which indicates that the following label is the Entropy label. The ingress LSR includes the ELI in the label stack along with the Entropy label. The S bit of the ELI is set to 0, and its TTL is set to the same value as the TTL of the label above it (RFC 6790).
GMPLS (Generalized MPLS) labels can represent fibers, wavelengths, timeslots and so on. MPLS-TP (Transport Profile) uses the same frame format as MPLS but has a restricted forwarding plane, and its control plane can be either statically configured or dynamic using GMPLS.


12.  Cisco FabricPath Headers

The FabricPath encapsulation uses a MAC-in-MAC format. The original Ethernet frame, including its 802.1Q tag, is prepended with a 48-bit outer destination address (DA), a 48-bit outer source address (SA), and a 32-bit FabricPath tag. Although the outer DA and SA look like 48-bit MAC addresses, FabricPath switches receiving such frames on a FabricPath core port parse these fields according to the format shown below.

FabricPath header - 16 bytes: ODA (6 bytes) | OSA (6 bytes) | FP Tag (4 bytes)
ODA and OSA (same layout, 48 bits each): Endnode ID (6 bits) | U/L (1 bit) | I/G (1 bit) | Endnode ID (2 bits) | RSVD (1 bit) | OOO/DL (1 bit) | Switch ID (12 bits) | Sub-switch ID (8 bits) | Port ID (16 bits)
FP Tag (32 bits): Etype 0x8903 (16 bits) | Ftag (10 bits) | TTL (6 bits)

– ODA – O-DMAC – Outer Destination MAC – 48 bits / 6 bytes
– OSA – O-SMAC – Outer Source MAC – 48 bits / 6 bytes
– FP Tag – FabricPath Tag – 32 bits / 4 bytes

12.1 Endnode ID – 8 bits in total (split into 6 + 2 bits) – Reserved, not yet used. This field may in the future allow a FabricPath-enabled end station to uniquely identify itself, allowing FabricPath-based forwarding decisions down to the virtual or physical end-station level.
12.2 U/L bit – 1 bit – FabricPath switches set this bit in all unicast outer SA and DA fields, indicating that the MAC address is locally administered (as opposed to universally unique). This is required because the outer SA and DA fields are not really MAC addresses and do not uniquely identify a particular hardware component as a standard MAC address would.
12.3 I/G bit – 1 bit – The I/G bit serves the same function in FabricPath as in standard Ethernet: any multidestination address has this bit set.
12.4 OOO/DL bit – 1 bit – The function of the OOO (out-of-order) / DL (don't learn) bit depends on whether it is set in the outer DA (OOO) or in the outer SA (DL). Reserved, not yet used.
12.5 Switch ID – 12 bits – Every switch in the FabricPath domain is assigned a unique 12-bit Switch ID. In the outer SA, this field identifies the FabricPath switch that originated the frame (typically the ingress FabricPath edge switch). In the outer DA, it identifies the destination FabricPath switch.
12.6 Sub-Switch ID – 8 bits – The sub-switch ID (sSID) field identifies the source or destination VPC+ port-channel interface associated with a particular VPC+ switch pair. FabricPath switches running VPC+ use this field to identify the specific VPC+ port-channel on which traffic is to be forwarded. The sSID value is locally significant to each VPC+ switch pair. Note that, because this field is 8 bits, using the sSID to identify VPC+ port-channels imposes a limit of roughly 250 VPC+ port-channels per VPC+ switch pair (244 to be precise). In the absence of VPC+, this field is always set to 0.
12.7 Port ID – Local Identifier (LID) – 16 bits – Can be used to identify the specific physical or logical interface on which the frame was sourced or to which it is destined. When used, the value is locally significant to each switch. In the outer DA, this field allows the egress FabricPath switch to forward the frame to the appropriate edge interface without a MAC address table lookup. For frames sourced from or destined to a VPC+ port-channel, this field is set to a common value shared by both VPC+ peer switches, and the sSID is used by default to select the outgoing port instead.
12.8 Etype – EtherType – 16 bits – The EtherType value for FabricPath-encapsulated frames is 0x8903.
12.9 FTAG – 10 bits – The function of the forwarding tag, or FTAG, depends on whether a frame is unicast or multidestination. For unicast frames, the FTAG identifies the FabricPath topology the frame is traversing; the system selects a unique FTAG for each configured topology. For multidestination frames, the FTAG identifies which multidestination forwarding tree in a given topology the frame should traverse.
12.10 TTL – 6 bits – The Time to Live (TTL) field serves the same purpose in FabricPath as in traditional IP forwarding: each switch hop decrements the TTL by 1, and frames with an expired TTL are discarded. The TTL in FabricPath prevents Layer 2 bridged frames from looping endlessly if a transitory loop occurs (such as during a reconvergence event).
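To see how these fields fill the 48-bit outer addresses and the 32-bit FP tag, here is a packing sketch. It assumes the fields are laid out left to right, most-significant bit first, in the order of the table above, so that U/L and I/G land in the same bit positions they occupy in a normal MAC address; the function names and the Switch ID, LID and FTAG values are illustrative.

```python
def fabricpath_address(switch_id: int, subswitch_id: int = 0, lid: int = 0,
                       endnode_id: int = 0, ul: int = 1, ig: int = 0, ooo_dl: int = 0) -> bytes:
    """48-bit outer DA/SA, most significant field first:
    Endnode ID (6) | U/L | I/G | Endnode ID (2) | RSVD | OOO/DL | Switch ID (12) | sSID (8) | LID (16)."""
    if not (0 <= switch_id < 2**12 and 0 <= subswitch_id < 2**8 and 0 <= lid < 2**16):
        raise ValueError("Switch ID is 12 bits, sub-switch ID 8 bits, LID 16 bits")
    value = (endnode_id & 0x3F) << 42          # low 6 bits of the Endnode ID
    value |= (ul << 41) | (ig << 40)           # same positions as in a normal MAC's first octet
    value |= ((endnode_id >> 6) & 0x3) << 38   # high 2 bits of the Endnode ID
    value |= ooo_dl << 36                      # bit 37 (RSVD) stays 0
    value |= (switch_id << 24) | (subswitch_id << 16) | lid
    return value.to_bytes(6, "big")


def fabricpath_tag(ftag: int, ttl: int, ethertype: int = 0x8903) -> bytes:
    """32-bit FP tag: EtherType (16) | FTAG (10) | TTL (6)."""
    if not (0 <= ftag < 2**10 and 0 <= ttl < 2**6):
        raise ValueError("FTAG is 10 bits, TTL 6 bits")
    return ((ethertype << 16) | (ftag << 6) | ttl).to_bytes(4, "big")


oda = fabricpath_address(switch_id=0x123, lid=5)
print(":".join(f"{b:02x}" for b in oda), fabricpath_tag(ftag=1, ttl=32).hex())
```

With the U/L bit set and Switch ID 0x123, the outer address reads 02:01:23:00:00:05, so in a capture it looks like a locally administered MAC, as described in 12.2.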


13. PBB Ethernet Header - 802.1AH - MACinMAC - 42-byte header - consists of the following fields:

PBB header - 42 bytes
Backbone Components - 16 bytes: B-DMAC (48 bits) | B-SMAC (48 bits) | B-TAG - 4 bytes: B-Type 0x88a8 (16 bits), B-COS (3 bits), B-DEI (1 bit), B-VID (12 bits)
Service Components - 6 bytes: I-Type 0x88e7 (16 bits) | I-TAG (I-TCI/SID) - 4 bytes: I-COS (3 bits), I-DEI (1 bit), Reserved (4 bits), I-SID (24 bits)
C-DMAC (6 bytes) | C-SMAC (6 bytes)
QinQ Components - 8 bytes: P-TAG - 4 bytes: P-TPID 0x88a8 (16 bits), P-COS (3 bits), P-DEI (1 bit), P-VID (12 bits) | C-TAG - 4 bytes: C-TPID 0x8100 (16 bits), C-COS (3 bits), C-DEI (1 bit), C-VID (12 bits)

13.1 Backbone Components – 16 bytes – Have the following fields:
13.1.1 B-DMAC – Backbone Destination MAC – 6 bytes
13.1.2 B-SMAC – Backbone Source MAC – 6 bytes
13.1.3 B-TAG – Backbone 802.1Q TAG – 4 bytes
13.2 Service Components – 6 bytes – Have the following fields:
13.2.1 I-Type – 2 bytes – The service EtherType, 0x88e7 hex in this case
13.2.2 I-TAG – 4 bytes – A "modified 802.1Q TAG" with the following internal fields:
13.2.2.1 I-SID – 3 bytes – Service Identifier (Service Instance VLAN ID); it allows services to be distinguished within the same PBB domain
13.2.2.2 I-PCP / I-COS – 3 bits – Same as PCP in a normal 802.1Q header
13.2.2.3 I-DEI – 1 bit – Same as DEI in a normal 802.1Q header
13.2.2.4 Reserved – 4 bits
13.3 QinQ Components – 8 bytes – Same fields as in the 802.1AD header
See more details on PBB
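The 6-byte Service component can be sketched the same way as the 802.1Q tag. This sketch assumes the 802.1ah bit order inside the I-TAG, with I-PCP and I-DEI in the top bits and the 24-bit I-SID in the bottom bits, and leaves the 4 reserved bits at 0; the function names and the I-SID value are illustrative.

```python
import struct

I_TYPE = 0x88E7


def pack_i_tag(i_sid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """I-Type (16) followed by I-PCP (3) | I-DEI (1) | reserved (4) | I-SID (24)."""
    if not (0 <= i_sid < 2**24 and 0 <= pcp <= 7 and dei in (0, 1)):
        raise ValueError("I-SID is 24 bits, PCP 0-7, DEI 0/1")
    return struct.pack("!HI", I_TYPE, (pcp << 29) | (dei << 28) | i_sid)


def unpack_i_tag(data: bytes) -> dict:
    i_type, tci = struct.unpack("!HI", data[:6])
    return {"i_type": i_type, "pcp": tci >> 29, "dei": (tci >> 28) & 1,
            "i_sid": tci & 0xFFFFFF}


tag = pack_i_tag(i_sid=10010, pcp=3)
print(tag.hex(), unpack_i_tag(tag))
```

The 24-bit I-SID is what allows about 16 million service instances, compared with the 4094 usable VLAN IDs of a 12-bit VID.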


14.  TRILL (Transparent Interconnection of Lots of Links) Headers 

TRILL headers - 40 bytes
OUTER MAC HEADER - 16 bytes: O-DMAC (48 bits) | O-SMAC (48 bits) | O-TAG (32 bits)
Etype - 2 bytes (16 bits)
TRILL HEADER - 6 bytes: V (2 bits) | R (2 bits) | M (1 bit) | OL (5 bits) | HC (6 bits) | E-RBridge Nickname (16 bits) | I-RBridge Nickname (16 bits)
INNER MAC HEADER - 16 bytes: I-DMAC (48 bits) | I-SMAC (48 bits) | I-TAG (32 bits)

14.1 Outer MAC header 16 bytes – contains O-DMAC (Outer Destination MAC), O-SMAC (Outer Source MAC), O-TAG (Outer 802.1Q TAG) – Those contain same fields as described above in Ethernet frame and in 802.1Q Ethernet frame.
14.2 Etype – TRILL EtherType – 2 bytes – Set to 0x22F3, the IETF TRILL EtherType.
14.3 TRILL HEADER – 6 bytes – This contains the following fields:
14.3.1 V – Version – 2 bits – Represents TRILL protocol version.
14.3.2 R – Reserved – 2 bits – Reserved for future use in extensions to the TRILL version.
14.3.3 M – Multi-destination bit – 1 bit – Indicates that the frame is to be delivered to a class of destination end stations via distribution tree and that the egress Nickname field specifies this tree: 
 M=0  The egress RBridge Nickname contains a Nickname of the egress Rbridge for a known unicast MAC address
 M=1 The egress RBridge Nickname contains a Nickname that specifies a distribution tree (RBridge that is the root of the tree)
14.3.4 OL – Option Length – 5 bits – Specifies the length of the TRILL header options, in units of 4 bytes; options indicate that the frame uses an optional capability and carry the information needed for that capability. If it is 0, no options are present. If options are present, they follow immediately after the Ingress RBridge Nickname field.
14.3.5 HC – Hop Count – 6 bits – A Rbridge drops the frames received with a hop count of 0, otherwise it decrements the hop count.
14.3.6 E-RBridge Nickname – Egress RBridge Nickname – 16 bits – Both the Egress and the Ingress nicknames are dynamically assigned values that act as abbreviations for the RBridges' IS-IS IDs, giving a more compact encoding; they can also be used to specify potentially different trees with the same root.
For known unicast frames (M=0), the egress RBridge nickname field specifies the egress RBridge (the one that removes the TRILL encapsulation).
For multi-destination TRILL frames (M=1), the egress RBridge nickname field contains a nickname specifying the distribution tree selected to forward the frame.
14.3.7 I-RBridge Nickname – Ingress RBridge Nickname – 16 bits – Set to a nickname of the ingress RBridge for TRILL data frames, and to a nickname of the source RBridge for TRILL ESADI frames. If the RBridge setting the ingress nickname has multiple nicknames, it should use the same nickname in the ingress field whenever it encapsulates a frame with any particular Inner.MacSA and Inner.VLAN value.
14.4 Inner MAC header 16 bytes – Contains I-DMAC (Inner Destination MAC), I-SMAC (Inner Source MAC), I-TAG (Inner 802.1Q TAG) - Those contain same fields as described above in Ethernet frame and in 802.1Q Ethernet frame.

See more details on the RFC 6325
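The 16 bits in front of the two nicknames hold V, R, M, OL and HC, so the whole TRILL header packs into 6 bytes. The sketch below builds it and mimics a transit RBridge's hop-count handling as described in 14.3.5; the nickname values are arbitrary and the function names are illustrative.

```python
import struct

TRILL_ETHERTYPE = 0x22F3  # precedes the TRILL header in the outer frame


def pack_trill_header(egress: int, ingress: int, hop_count: int,
                      multi_dest: bool = False, op_length: int = 0, version: int = 0) -> bytes:
    """6 bytes: V (2) | R (2) | M (1) | OL (5) | HC (6), egress nickname (16), ingress nickname (16)."""
    if not (0 <= hop_count < 2**6 and 0 <= op_length < 2**5 and 0 <= version < 4):
        raise ValueError("HC is 6 bits, OL 5 bits, V 2 bits")
    first = (version << 14) | (int(multi_dest) << 11) | (op_length << 6) | hop_count
    return struct.pack("!HHH", first, egress, ingress)


def transit_rbridge(header: bytes):
    """Drop (return None) if the hop count is 0, otherwise decrement it."""
    first, egress, ingress = struct.unpack("!HHH", header[:6])
    if (first & 0x3F) == 0:
        return None
    return struct.pack("!HHH", first - 1, egress, ingress)


hdr = pack_trill_header(egress=0x0102, ingress=0x0304, hop_count=20)
print(hdr.hex(), transit_rbridge(hdr).hex())
```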


15.  Cisco OTV - Overlay Transport Virtualization

Cisco Overlay Transport Virtualization (OTV) is a Layer 2-over-Layer 3 encapsulation "MAC-in-IP" technology that is designed to extend the reach of Layer 2 domains across data center pods, domains, and sites. It uses stateless tunnels to encapsulate Layer 2 frames in the IP header and does not require the creation or maintenance of fixed stateful tunnels. OTV encapsulates the entire Ethernet frame in an IP and User Datagram Protocol (IP/UDP) header, so that the provider or core network is transparent to the services offered by OTV.

OTV uses Ethernet over Generic Router Encapsulation (GRE) and adds an OTV shim to the header to encode VLAN information. The OTV encapsulation is 42 bytes, which is less than virtual private LAN service (VPLS) over GRE.

Here is what I found on INE, "OTV Decoded – A Fancy GRE Tunnel", a very interesting way to explain OTV:

MPLS? GRE? Where did those come from? That’s right, OTV is in fact a fancy GRE tunnel. More specifically it is an Ethernet over MPLS over GRE tunnel. My poor little PINGs between R2 and R3 are in fact encapsulated as ICMP over IP over Ethernet over MPLS over GRE over IP over Ethernet (IoIoEoMPLSoGREoIP for short).

Because there are so many fields, I have grouped them by header type:

OTV Headers - 72 bytes/octets
Outer Ethernet 802.1Q (18 bytes) | Outer IP header (20 bytes) | Outer UDP Header (8 bytes) | OTV Shim Header (8 bytes) | Inner Ethernet 802.1Q (18 bytes)

15.1 Outer Ethernet 802.1Q Headers – 18 Bytes

Outer Ethernet 802.1Q - 18 bytes/octets
O-DMAC (48 bits) | O-SMAC (48 bits) | Type 0x8100 (16 bits) | O-COS (3 bits) | O-DEI (1 bit) | O-VID (12 bits) | Etype 0x0800 (16 bits)

Same as any 802.1Q frame, with the above fields and Ether-types.
15.2 Outer IP Header – 20 bytes

Outer IP header - 20 bytes/octets
V (4 bits) | IHL = 5 (4 bits) | TOS (8 bits) | Total Length (16 bits)
Identification (16 bits) | Flags, DF=1 (3 bits) | Fragment Offset (13 bits)
Time to Live (8 bits) | Protocol = 17 (8 bits) | Header Checksum (16 bits)
S-IP (32 bits)
D-IP (32 bits)

15.2.1 V – Version – 4 bits – Set to value 4 in decimal.
15.2.2 IHL – 4 bits – Set to 5 decimal, meaning no IP options are present in an OTV-encapsulated packet.
15.2.3 TOS – Type of Service – 8 bits – The 802.1P bits from the Ethernet frame are copied to this field.
15.2.4 Total Length – 16 bits – The total length of the IP datagram in bytes. This includes the IP header, the UDP header, the OTV header, and the L2 frame without the preamble and CRC fields.
15.2.5 Identification – 16 bits – Set randomly by the OTV Edge Device.
15.2.6 Flags – 3 bits – The DF bit should be set to 1.
15.2.7 TTL – Time to Live – 8 bits – Set by the OTV Edge Device; configurable.
15.2.8 Protocol – 8 bits – Since the packet is UDP-encapsulated, this field is set to 17 decimal.
15.2.9 Header Checksum – 16 bits – Must be computed by the OTV Edge Device over the IP header fields.
15.2.10 S-IP – Source Address – 32 bits – The IP address of the OTV Edge Device encapsulating the L2 frame.
15.2.11 D-IP – Destination Address – 32 bits – The unicast or multicast IP address set by the OTV Edge Device encapsulating the L2 frame. The Edge Device decides whether a unicast or a multicast address is used.
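The header checksum in 15.2.9 is the usual one's-complement sum over the header's 16-bit words. A sketch that builds the 20-byte outer header described above (V=4, IHL=5, DF=1, Protocol=17) and fills in the checksum; `payload_len` stands for the UDP header, the OTV header and the inner L2 frame, and the addresses and length are arbitrary, chosen so that the result reproduces a well-known worked checksum example (0xB861). The function names are illustrative.

```python
import ipaddress
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the 16-bit words, folded and inverted (RFC 1071)."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)  # add the carries back in
    return ~total & 0xFFFF


def outer_ipv4_header(src: str, dst: str, payload_len: int, ttl: int = 64,
                      tos: int = 0, ident: int = 0) -> bytes:
    """20-byte header: V=4, IHL=5, DF=1, Protocol=17 (UDP); checksum filled in last."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5, tos, 20 + payload_len, ident,
        0x4000,          # flags: DF=1, fragment offset 0
        ttl, 17, 0,      # checksum field is 0 while computing
        ipaddress.IPv4Address(src).packed, ipaddress.IPv4Address(dst).packed,
    )
    return header[:10] + struct.pack("!H", ipv4_checksum(header)) + header[12:]


hdr = outer_ipv4_header("192.168.0.1", "192.168.0.199", payload_len=95)
print(hdr.hex(), ipv4_checksum(hdr))  # a correct header re-sums to 0
```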
15.3 Outer UDP Header – 8 bytes

Outer UDP Header - 8 bytes/octets
S-Port (16 bits) | D-Port 8472 (16 bits)
UDP Length (16 bits) | UDP Checksum (16 bits)

15.3.1 S-Port – Source Port – 16 bits – Chosen by the OTV Edge Device encapsulating the L2 frame, based on a hash of the L2 frame. This allows packets to be load-split evenly over LAGs on the core routers responsible for delivering these IP-encapsulated packets.
15.3.2 D-Port – Destination Port – 16 bits – An IANA-assigned well-known user port number. Packets encapsulated by an OTV Edge Device carry the value 8472 in the destination port field.
15.3.3 UDP Length – 16 bits – The length in bytes of the UDP header, the OTV header, and the L2 frame without the preamble and CRC fields.
15.3.4 UDP Checksum – 16 bits – Set to 0 by the OTV Edge Device doing the encapsulation and ignored by the OTV Edge Device decapsulating at the destination site.
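The OTV text only says that the source port comes from a hash of the L2 frame, so the hash below is an assumption: the sketch hashes the inner DMAC and SMAC with CRC-32 and maps the result into 49152-65535, the range VXLAN (RFC 7348) recommends for the same purpose. The function names are hypothetical; the length and checksum follow 15.3.3 and 15.3.4.

```python
import struct
import zlib

OTV_UDP_PORT = 8472


def entropy_source_port(inner_frame: bytes, low: int = 49152, high: int = 65535) -> int:
    """Hash the inner DMAC + SMAC into a source port (hypothetical hash and key choice).
    The same inner flow always maps to the same port, so it stays on one LAG member."""
    return low + zlib.crc32(inner_frame[:12]) % (high - low + 1)


def outer_udp_header(inner_frame: bytes, shim_len: int = 8) -> bytes:
    """S-Port, D-Port 8472, UDP Length (UDP + OTV header + L2 frame), checksum 0."""
    length = 8 + shim_len + len(inner_frame)
    return struct.pack("!HHHH", entropy_source_port(inner_frame), OTV_UDP_PORT, length, 0)


inner = bytes.fromhex("001b54aabbcc" "001b54ddeeff" "0800") + bytes(46)  # no preamble, no CRC
print(struct.unpack("!HHHH", outer_udp_header(inner)))
```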
15.4 OTV Shim Header – 8 bytes

OTV Shim Header - 8 bytes/octets
R R R R I R R R (8 x 1 bit) | Overlay ID (24 bits)
Instance ID (24 bits) | Reserved (8 bits)

15.4.1 R - Reserved bits – 7 bits at the beginning of header and 8 bits at the end
15.4.2 I - Instance ID flag – 1 bit – When set to 1, it indicates that the Instance ID should be used in the forwarding lookup.
15.4.3 Overlay ID – 24 bits – Used only for control-plane packets, such as URP/MRP (IS-IS), to identify packets for a specific overlay.
15.4.4 Instance ID – 24 bits – Set by the OTV Edge Device doing the encapsulation to specify the logical table that the OTV Edge Device at the destination site should use for the lookup.
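The shim header fits in two 32-bit words: the flags octet plus the Overlay ID, then the Instance ID plus the reserved octet. A packing sketch with illustrative names and arbitrary ID values:

```python
import struct

I_FLAG = 0x08  # flags octet is R R R R I R R R, so I is bit 3


def pack_otv_shim(instance_id: int, overlay_id: int = 0, use_instance_id: bool = True) -> bytes:
    """8 bytes: flags (8) | Overlay ID (24) | Instance ID (24) | Reserved (8)."""
    if not (0 <= overlay_id < 2**24 and 0 <= instance_id < 2**24):
        raise ValueError("Overlay ID and Instance ID are 24 bits each")
    flags = I_FLAG if use_instance_id else 0
    return struct.pack("!II", (flags << 24) | overlay_id, instance_id << 8)


def unpack_otv_shim(data: bytes) -> dict:
    word1, word2 = struct.unpack("!II", data[:8])
    return {"i_flag": bool((word1 >> 24) & I_FLAG),
            "overlay_id": word1 & 0xFFFFFF,
            "instance_id": word2 >> 8}  # low 8 reserved bits ignored


shim = pack_otv_shim(instance_id=42, overlay_id=7)
print(shim.hex(), unpack_otv_shim(shim))
```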
15.5 Inner 802.1Q Ethernet Headers – 18 Bytes

Inner Ethernet 802.1Q - 18 bytes/octets
I-DMAC (48 bits) | I-SMAC (48 bits) | Type 0x8100 (16 bits) | I-COS (3 bits) | I-DEI (1 bit) | I-VID (12 bits) | Etype (16 bits)

Same as any 802.1Q frame, with the above fields and Ether-types.

See more details in draft-hasmit-otv-01.

The difference between OTV and VPLS.


16.  LISP - Location/Identifier Separation Protocol

The Cisco Location/Identifier Separation Protocol, or LISP, is designed to address the challenges of using a single address field for both device identification and topology location.  LISP addresses the problem by uniquely identifying two different number sets: routing locators (RLOCs), which describe the topology and location of attachment points and hence are used to forward traffic, and endpoint identifiers (EIDs), which are used to address end hosts separate from the topology of the network.

LISP [I-D.ietf-lisp] essentially provides an IP over IP overlay where the internal addresses are end station Identifiers and the outer IP addresses represent the location of the end station within the core IP network topology. The LISP overlay header uses a 24 bit Instance ID used to support overlapping inner IP addresses.
  
LISP Headers - 72 bytes/octets
Outer Ethernet 802.1Q (18 bytes) | Outer IP header (20 bytes) | Outer UDP Header (8 bytes) | LISP Shim Header (8 bytes) | Inner Ethernet 802.1Q (18 bytes)

16.1 Outer Ethernet 802.1Q Headers – 18 Bytes

Outer Ethernet 802.1Q - 18 bytes/octets
O-DMAC (48 bits) | O-SMAC (48 bits) | Type 0x8100 (16 bits) | O-COS (3 bits) | O-DEI (1 bit) | O-VID (12 bits) | Etype 0x0800 (16 bits)

Same as any 802.1Q frame, with the above fields and Ether-types.
16.2 Outer IP Header – 20 bytes

Outer IP header - 20 bytes/octets
V (4 bits) | IHL = 5 (4 bits) | TOS (8 bits) | Total Length (16 bits)
Identification (16 bits) | Flags, DF=1 (3 bits) | Fragment Offset (13 bits)
Time to Live (8 bits) | Protocol = 17 (8 bits) | Header Checksum (16 bits)
S-IP (32 bits)
D-IP (32 bits)
  
Same format as the OTV IP header (or any IPv4 header), except that the DF bit does not have to be set.

The LISP architecture and protocols introduce two new numbering spaces, Endpoint Identifiers (EIDs) and Routing Locators (RLOCs), which are intended to replace most uses of IP addresses on the Internet. To provide flexibility for current and future applications, these values can be encoded in LISP control messages using a general syntax that includes Address Family Identifier (AFI), length, and value fields.

See more details on the LISP Canonical Address Format

16.3 Outer UDP Header – 8 bytes

Outer UDP Header - 8 bytes/octets
S-Port (16 bits) | D-Port 4341 (16 bits)
UDP Length (16 bits) | UDP Checksum (16 bits)

Same as the OTV UDP header, except that the UDP destination port is 4341; when the headers are used to encapsulate L2 frames, the UDP destination port is set to 8472 (the same as OTV).
16.4 LISP Shim Header – 8 bytes

LISP Shim Header - 8 bytes/octets
N (1 bit) | L (1 bit) | E (1 bit) | V (1 bit) | I (1 bit) | Flags (3 bits) | Nonce / Map Version (24 bits)
Instance ID / Locator Status Bits (32 bits)

16.4.1 N – Nonce Present – 1 bit – When this bit is set to 1, the low-order 24 bits of the first 32 bits of the LISP header contain a Nonce. The N and V bits MUST NOT both be set in the same packet; if they are, a decapsulating ETR MUST treat the "Nonce/Map-Version" field as containing a Nonce.
16.4.2 L – Locator Status Bits – 1 bit – When this bit is set to 1, the Locator Status Bits in the second 32 bits of the LISP header are in use.
16.4.3 E – Echo-nonce-request – 1 bit – This bit MUST be ignored and has no meaning when the N bit is set to 0. When both the N bit and this bit are set to 1, an ITR is requesting that the nonce value in the Nonce field be echoed back in LISP-encapsulated packets when the ITR is also an ETR.
16.4.4 V – Map Version – 1 bit – When this bit is set to 1, the N bit MUST be 0. In this case the LISP header is encoded as below:

LISP Shim Header - 8 bytes / octets
N = 0 (1 bit) | L = x (1 bit) | E = 0 (1 bit) | V = 1 (1 bit) | I = x (1 bit) | Flags = x x x (3 bits) | Source Map Version (12 bits) | Destination Map Version (12 bits)
Instance ID / Locator Status Bits (32 bits)

16.4.5 I – Instance ID – 1 bit – When this bit is set to 1, the Locator Status Bits field is reduced to 8 bits and the high-order 24 bits are used as an Instance ID. If the L-bit is set to 0, then the low-order 8 bits are transmitted as zero and ignored on receipt. In this case the LISP header looks like this:

LISP Shim Header - 8 bytes / octets
N = x (1 bit) | L = x (1 bit) | E = x (1 bit) | V = x (1 bit) | I = 1 (1 bit) | Flags = x x x (3 bits) | Nonce / Map Version (24 bits)
Instance ID (24 bits) | LSBs (8 bits)

16.4.6 Flags – 3 bits – Reserved for future flag use. They MUST be set to 0 on transmit and MUST be ignored on receipt.
16.4.7 Nonce – 24 bits – The LISP nonce field is a 24-bit value that is randomly generated by an ITR when the N-bit is set to 1. Nonce generation algorithms are an implementation matter but are required to generate different nonces when sending to different destinations. However, the same nonce can be used for a period of time to the same destination. The nonce is also used when the E-bit is set to request the nonce value to be echoed by the other side when packets are returned. When the E-bit is clear but the N-bit is set, a remote ITR is either echoing a previously requested echo-nonce or providing a random nonce.
16.4.8 LSB – LISP Locator Status Bits – 32 or 8 bits – When the L-bit is also set, the Locator Status Bits field in the LISP header is set by an ITR to indicate to an ETR the up/down status of the Locators in the source site. Each RLOC in a Map-Reply is assigned an ordinal value from 0 to n-1 (when there are n RLOCs in a mapping entry). The Locator Status Bits are numbered from 0 to n-1, starting from the least significant bit of the field. The field is 32 bits when the I-bit is set to 0 and 8 bits when the I-bit is set to 1. When a Locator Status Bit is set to 1, the ITR is indicating to the ETR that the RLOC associated with that bit ordinal is up. When a site has multiple EID-prefixes which result in multiple mappings (where each could have a different locator-set), the Locator Status Bits setting in an encapsulated packet MUST reflect the mapping for the EID-prefix that the inner-header source EID address matches. If the LSB for an anycast locator is set to 1, then there is at least one RLOC with that address, and the ETR is considered 'up'.
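To make the bit positions concrete, here is a minimal Python sketch that packs the 8-byte LISP shim header from the fields described above. The function names and keyword arguments are hypothetical; the sketch assumes the reserved flags are always sent as zero and only covers the N, L, E, V and I combinations discussed in this section:

```python
import struct


def locator_status_bits(up_ordinals) -> int:
    """Set bit k, counted from the least significant bit, for every RLOC ordinal k that is up."""
    bits = 0
    for k in up_ordinals:
        bits |= 1 << k
    return bits


def lisp_shim_header(nonce=None, echo_request=False, map_versions=None,
                     instance_id=None, lsbs=None) -> bytes:
    """Pack the 8-byte LISP shim header (hypothetical helper; reserved flags sent as 0)."""
    n_bit = nonce is not None
    v_bit = map_versions is not None
    if n_bit and v_bit:
        raise ValueError("N and V bits MUST NOT both be set")
    e_bit = bool(echo_request) and n_bit  # E only has meaning when N = 1
    l_bit = lsbs is not None
    i_bit = instance_id is not None

    # Low-order 24 bits of word 1: a nonce, two 12-bit map versions, or zero
    if n_bit:
        low24 = nonce & 0xFFFFFF
    elif v_bit:
        src_ver, dst_ver = map_versions
        low24 = ((src_ver & 0xFFF) << 12) | (dst_ver & 0xFFF)
    else:
        low24 = 0
    word1 = (n_bit << 31) | (l_bit << 30) | (e_bit << 29) | (v_bit << 28) | (i_bit << 27) | low24

    # Word 2: 32 LSBs, or a 24-bit Instance ID followed by 8 LSBs
    status = lsbs if l_bit else 0  # LSBs are transmitted as zero when L = 0
    if i_bit:
        word2 = ((instance_id & 0xFFFFFF) << 8) | (status & 0xFF)
    else:
        word2 = status & 0xFFFFFFFF
    return struct.pack("!II", word1, word2)
```

For example, `lisp_shim_header(nonce=0xABCDEF, lsbs=locator_status_bits([0, 2]))` sets the N and L bits and marks the RLOCs with ordinals 0 and 2 as up.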
16.5 Inner 802.1Q Ethernet Headers – 18 Bytes

Inner Ethernet 802.1Q - 18 bytes / octets
I-DMAC (48 bits) | I-SMAC (48 bits) | Type = 0x8100 (16 bits) | I-COS (3 bits) | I-DEI (1 bit) | I-VID (12 bits) | Etype (16 bits)

Same as any 802.1Q frame, with the above fields and Ether-types.
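A minimal sketch of packing such an 18-byte 802.1Q header in Python, assuming the 18 bytes are DMAC, SMAC, the 0x8100 tag, the 16-bit tag control field (COS, DEI, VID) and the 2-byte EtherType that follows the tag. `dot1q_header` is a hypothetical helper name:

```python
import struct

TPID_8021Q = 0x8100


def dot1q_header(dmac: bytes, smac: bytes, pcp: int, dei: int, vid: int, ethertype: int) -> bytes:
    """Return DMAC | SMAC | 0x8100 | TCI | EtherType (18 bytes)."""
    if len(dmac) != 6 or len(smac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VID is a 12-bit field")
    # Tag control information: 3-bit COS, 1-bit DEI, 12-bit VLAN ID
    tci = ((pcp & 0x7) << 13) | ((dei & 0x1) << 12) | vid
    return dmac + smac + struct.pack("!HHH", TPID_8021Q, tci, ethertype)
```

The same helper covers the outer 802.1Q headers of VxLAN and NvGRE, with `ethertype=0x0800` for the outer IPv4 packet.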

See more details in draft-ietf-lisp-23, page 19.


17.  VxLAN - Virtual eXtensible LANs

Virtual Extensible LAN, or VXLAN, is a Layer 2 overlay scheme over a Layer 3 network. It uses an IP/UDP encapsulation so that the provider or core network does not need to be aware of any additional services that VXLAN is offering. A 24-bit VXLAN segment ID or VXLAN network identifier (VNI) is included in the encapsulation to provide up to 16 million VXLAN segments for traffic isolation and segmentation, in contrast to the 4000 segments achievable with VLANs. Each of these segments represents a unique Layer 2 broadcast domain and can be administered in such a way that it can uniquely identify a given tenant's address space or subnet.
Each overlay is termed a VXLAN segment, and only VMs within the same VXLAN segment can communicate with each other. Each VXLAN segment is scoped through a 24-bit segment ID, the VXLAN Network Identifier (VNI), which allows up to 16M VXLAN segments to coexist within the same administrative domain.

VxLAN Headers - 72 bytes / octets
Outer Ethernet 802.1Q (18 bytes) | Outer IP header (20 bytes) | Outer UDP Header (8 bytes) | VxLAN Shim Header (8 bytes) | Inner Ethernet 802.1Q (18 bytes)
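The header sizes in the table can be added up to see the encapsulation cost. The sketch below (plain Python; the dictionary keys and function names are hypothetical) reproduces the 72-byte total and, assuming an inner IP MTU of 1500 bytes, the underlay IP MTU needed to avoid fragmentation. The outer Ethernet header is left out of the MTU figure because an IP MTU does not include the L2 header; preamble, FCS and IFG are not counted at all:

```python
# Header sizes in bytes, as listed in the VxLAN table
VXLAN_HEADERS = {
    "outer_ethernet_8021q": 18,
    "outer_ipv4": 20,
    "outer_udp": 8,
    "vxlan": 8,
    "inner_ethernet_8021q": 18,
}


def total_header_bytes(headers: dict) -> int:
    """Sum of all encapsulation headers, outer L2 included."""
    return sum(headers.values())


def underlay_ip_mtu_needed(inner_ip_mtu: int, headers: dict) -> int:
    """IP MTU the underlay must support so an inner IP packet of inner_ip_mtu bytes fits unfragmented."""
    outer_l2 = headers["outer_ethernet_8021q"]  # not part of the underlay IP MTU
    return inner_ip_mtu + total_header_bytes(headers) - outer_l2


print(total_header_bytes(VXLAN_HEADERS))            # 72
print(underlay_ip_mtu_needed(1500, VXLAN_HEADERS))  # 1554
```

With an untagged inner frame (14-byte inner Ethernet header) the same arithmetic gives 1550 bytes.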

17.1 Outer Ethernet 802.1Q Headers – 18 Bytes

Outer Ethernet 802.1Q - 18 bytes / octets
O-DMAC (48 bits) | O-SMAC (48 bits) | Type = 0x8100 (16 bits) | O-COS (3 bits) | O-DEI (1 bit) | O-VID (12 bits) | Etype = 0x0800 (16 bits)

Same as any 802.1Q frame, with the above fields and Ether-types.
17.2 Outer IP Header – 20 bytes

Outer IP header - 20 bytes / octets
V (4 bits) | IHL = 5 (4 bits) | TOS (8 bits) | Total Length (16 bits)
Identification (16 bits) | Flags, DF = 1 (3 bits) | Fragment Offset (13 bits)
Time to Live (8 bits) | Protocol = 17 (8 bits) | Header Checksum (16 bits)
S-IP (32 bits)
D-IP (32 bits)

It is the same as the OTV IP header (or any IP header) format. The source IP address is the IP address of the VTEP over which the communicating VM (as represented by the inner source MAC address) is running. The destination IP address can be a unicast or multicast IP address. When it is a unicast IP address, it represents the IP address of the VTEP connecting the communicating VM as represented by the inner destination MAC address.
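For illustration, a Python sketch of building this 20-byte outer IPv4 header with DF=1 and Protocol 17, including the standard one's-complement header checksum (RFC 1071). The TTL of 64 and the zero Identification are assumptions, and the helper names are hypothetical; the same function would produce the NvGRE outer header with `protocol=47`:

```python
import ipaddress
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words (RFC 1071)."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def outer_ipv4_header(src: str, dst: str, payload_len: int, protocol: int = 17,
                      ttl: int = 64, identification: int = 0, tos: int = 0) -> bytes:
    version_ihl = (4 << 4) | 5  # IPv4, IHL = 5 (20 bytes, no options)
    flags_frag = 0x4000         # DF = 1, fragment offset 0
    total_length = 20 + payload_len
    header = struct.pack("!BBHHHBBH4s4s", version_ihl, tos, total_length, identification,
                         flags_frag, ttl, protocol, 0,  # checksum is 0 while computing it
                         ipaddress.IPv4Address(src).packed, ipaddress.IPv4Address(dst).packed)
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]
```

A receiver can validate the header by running `ipv4_checksum` over all 20 bytes: a correct header yields 0.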
17.3 Outer UDP Header – 8 bytes

Outer UDP Header - 8 bytes / octets
S-Port (16 bits) | D-Port = 4789 (16 bits) | UDP Length (16 bits) | UDP Checksum (16 bits)

It is the same as the OTV UDP header, except that the UDP destination port number is 4789. Some early implementations of VXLAN have used other values for the destination port; to enable interoperability with these implementations, the destination port SHOULD be configurable. It is recommended that the source port number be calculated using a hash of fields from the inner packet, one example being a hash of the inner Ethernet frame's headers. This provides a level of entropy for ECMP/load balancing of VM-to-VM traffic across the VXLAN overlay.
The UDP checksum field SHOULD be transmitted as zero. When a packet is received with a UDP checksum of zero, it MUST be accepted for decapsulation. Optionally, if the encapsulating endpoint includes a non-zero UDP checksum, it MUST be correctly calculated across the entire packet, including the IP header, UDP header, VXLAN header and encapsulated MAC frame. When a decapsulating endpoint receives a packet with a non-zero checksum, it MAY choose to verify the checksum value. If it chooses to perform such verification and the verification fails, the packet MUST be dropped. If the decapsulating destination chooses not to perform the verification, or performs it successfully, the packet MUST be accepted for decapsulation.
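One possible way to derive the source port, sketched in Python: hash the inner Ethernet header and map the result into the dynamic/private port range 49152-65535 (the range RFC 7348 recommends). Using CRC-32 from `zlib` as the hash and hashing only the first 18 bytes of the inner frame are assumptions for illustration; many implementations hash the inner L3/L4 5-tuple instead:

```python
import zlib

DYNAMIC_PORT_MIN, DYNAMIC_PORT_MAX = 49152, 65535
INNER_HEADER_BYTES = 18  # assumption: hash only the inner 802.1Q Ethernet header


def vxlan_source_port(inner_frame: bytes) -> int:
    """Map a hash of the inner Ethernet header onto the dynamic port range."""
    digest = zlib.crc32(inner_frame[:INNER_HEADER_BYTES])
    span = DYNAMIC_PORT_MAX - DYNAMIC_PORT_MIN + 1
    return DYNAMIC_PORT_MIN + digest % span
```

Frames that share the same inner Ethernet header always map to the same source port, so one flow stays on one ECMP path while different flows are spread across paths.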
17.4 VxLAN Shim Header – 8 bytes

VxLAN Shim Header - 8 bytes / octets
R | R | R | R | I | R | R | R (1 bit each) | Reserved (24 bits)
VXLAN Network Identifier (VNI) (24 bits) | Reserved (8 bits)

17.4.1 R – Reserved – 1 bit each – The 7 flag bits other than I are reserved; they MUST be set to 0 on transmit and ignored on receipt.
17.4.2 I – VNI flag – 1 bit – MUST be set to 1 for a valid VXLAN Network ID (VNI).
17.4.3 Reserved (24 bits and 8 bits) – MUST be set to zero on transmit and ignored on receipt.
17.4.4 VXLAN Network ID (VNI) – 24 bits – Designates the individual VXLAN overlay network on which the communicating VMs are situated. VMs in different VXLAN overlay networks cannot communicate with each other.
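A minimal Python sketch of encoding and decoding the 8-byte VXLAN header laid out above (flags byte 0x08 for the I bit, VNI in the upper 24 bits of the second 32-bit word). The function names are hypothetical:

```python
import struct

VXLAN_FLAG_I = 0x08  # the I bit; the other 7 flag bits (R) are zero


def vxlan_header(vni: int) -> bytes:
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI is a 24-bit field")
    # Word 1: flags (8 bits) + 24 reserved bits; word 2: VNI (24 bits) + 8 reserved bits
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def parse_vxlan_header(header: bytes) -> int:
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set: no valid VNI")
    return word2 >> 8  # the low 8 reserved bits are ignored on receipt
```

For example, `vxlan_header(5000)` gives the bytes `08 00 00 00 00 13 88 00`.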
17.5 Inner 802.1Q Ethernet Headers – 18 Bytes

Inner Ethernet 802.1Q - 18 bytes / octets
I-DMAC (48 bits) | I-SMAC (48 bits) | Type = 0x8100 (16 bits) | I-COS (3 bits) | I-DEI (1 bit) | I-VID (12 bits) | Etype (16 bits)

VXLAN is typically deployed in data centers on virtualized hosts, which may be spread across multiple racks. The individual racks may be part of different Layer 3 networks, or they could be in a single Layer 2 network. The VXLAN segments/overlay networks are overlaid on top of these Layer 2 or Layer 3 networks.



18.  NVGRE - Network Virtualization using Generic Routing Encapsulation
Network Virtualization Using Generic Routing Encapsulation, or NVGRE, allows the creation of virtual Layer 2 topologies on top of a physical Layer 3 network. This design is achieved by tunneling Ethernet frames inside an IP packet over a physical network. NVGRE supports a 24-bit segment ID or virtual subnet identifier (VSID), providing up to 16 million virtual segments that can uniquely identify a given tenant's segment or address space.
Network virtualization involves creating virtual Layer 2 and/or Layer 3 topologies on top of an arbitrary physical Layer 2/Layer 3 network. Connectivity in the virtual topology is provided by tunneling Ethernet frames in IP over the physical network. Virtual broadcast domains are realized as multicast distribution trees. The multicast distribution trees are analogous to the VLAN broadcast domains. A virtual Layer 2 network can span multiple physical subnets. Support for bi-directional IP unicast and multicast connectivity is the only requirement from the underlying physical network to support unicast communications within a virtual network. If the operator chooses to support broadcast and multicast traffic in the virtual topology, the physical topology must support IP multicast.

NvGRE Headers - 64 bytes / octets
Outer Ethernet 802.1Q (18 bytes) | Outer IP header (20 bytes) | NvGRE Header (8 bytes) | Inner Ethernet 802.1Q (18 bytes)

18.1 Outer Ethernet 802.1Q Headers – 18 Bytes

Outer Ethernet 802.1Q - 18 bytes / octets
O-DMAC (48 bits) | O-SMAC (48 bits) | Type = 0x8100 (16 bits) | O-COS (3 bits) | O-DEI (1 bit) | O-VID (12 bits) | Etype = 0x0800 (16 bits)

Same as any 802.1Q frame, with the above fields and Ether-types.
The source Ethernet address in the outer frame is set to the MAC address associated with the NVGRE endpoint. The destination Ethernet address is set to the MAC address of the next-hop IP address for the destination NVE. The destination endpoint may or may not be on the same physical subnet. The outer VLAN tag information is optional and can be used for traffic management and broadcast scalability.

18.2 Outer IP Header – 20 bytes
Outer IP header - 20 bytes / octets
V (4 bits) | IHL = 5 (4 bits) | TOS (8 bits) | Total Length (16 bits)
Identification (16 bits) | Flags, DF = 1 (3 bits) | Fragment Offset (13 bits)
Time to Live (8 bits) | Protocol = 47 (8 bits) | Header Checksum (16 bits)
S-IP (32 bits)
D-IP (32 bits)

It is the same as any IP header format, with the Protocol field set to 47 (GRE).
18.3 NvGRE Header – 8 bytes

NvGRE Header - 8 bytes / octets
C = 0 (1 bit) | Reserved (1 bit) | K = 1 (1 bit) | S = 0 (1 bit) | Reserved0 (9 bits) | V = 0 (3 bits) | Protocol = 0x6558 (16 bits)
Virtual Subnet ID (VSID) (24 bits) | Flow ID (8 bits)

18.3.1 C – Checksum Present – 1 bit – The value is set to zero, meaning that neither the Checksum nor the Reserved1 field is present.
18.3.2 K – Key Present – 1 bit – The bit is set to 1, meaning the Key field is present in the GRE header.
18.3.3 S – Sequence Number Present – 1 bit – The value is set to zero, meaning that the Sequence Number field is not present in the GRE header.
18.3.4 Reserved0 – 9 bits – A receiver MUST discard a packet where any of bits 1-5 are non-zero, unless that receiver implements RFC 1701. Bits 6-12 are reserved for future use. These bits MUST be sent as zero and MUST be ignored on receipt.
18.3.5 V – Version – 3 bits – The Version Number field MUST contain the value zero.
18.3.6 Protocol – 16 bits – The Protocol Type field contains the protocol type of the payload packet; in NVGRE it is set to 0x6558 (transparent Ethernet bridging).
18.3.7 VSID – Virtual Subnet ID – 24 bits – The first 24 bits of the Key field are used for the VSID. The VSID can be crafted in such a way that it uniquely identifies a specific tenant's subnet. The VSID is carried in an outer header, allowing unique identification of the tenant's virtual subnet to various devices in the network. NVGRE leverages the GRE header to carry VSID information in each packet. The VSID information in each packet can be used to build multi-tenant-aware tools for traffic analysis, traffic inspection, and monitoring.
18.3.8 Flow ID – 8 bits – The last 8 bits of the Key field are the (optional) FlowID, which can be used to add per-flow entropy within the same VSID; the entire 32-bit Key field MAY be used by switches or routers in the physical network infrastructure for ECMP (Equal-Cost Multi-Path) purposes. If a FlowID is not generated, the FlowID field MUST be set to all zeros.
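The same kind of sketch for the NvGRE header in Python: C=0, K=1, S=0, version 0, Protocol Type 0x6558, and the 32-bit Key split into a 24-bit VSID and an 8-bit FlowID. The function names are hypothetical:

```python
import struct

GRE_KEY_PRESENT = 0x2000  # K bit (bit 2 of the first 16-bit word); C, S and Ver stay 0
TRANSPARENT_ETHERNET_BRIDGING = 0x6558


def nvgre_header(vsid: int, flow_id: int = 0) -> bytes:
    if not 0 <= vsid < 1 << 24:
        raise ValueError("VSID is a 24-bit field")
    if not 0 <= flow_id < 1 << 8:
        raise ValueError("FlowID is an 8-bit field")
    key = (vsid << 8) | flow_id  # VSID in the first 24 bits of the Key, FlowID in the last 8
    return struct.pack("!HHI", GRE_KEY_PRESENT, TRANSPARENT_ETHERNET_BRIDGING, key)


def parse_nvgre_header(header: bytes) -> tuple[int, int]:
    flags_version, proto, key = struct.unpack("!HHI", header[:8])
    if proto != TRANSPARENT_ETHERNET_BRIDGING or not flags_version & GRE_KEY_PRESENT:
        raise ValueError("not an NVGRE header")
    return key >> 8, key & 0xFF  # (VSID, FlowID)
```

For example, `nvgre_header(0x123456, 0x7F)` yields the bytes `20 00 65 58 12 34 56 7F`.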
18.4 Inner 802.1Q Ethernet Headers – 18 Bytes

Inner Ethernet 802.1Q - 18 bytes / octets
I-DMAC (48 bits) | I-SMAC (48 bits) | Type = 0x8100 (16 bits) | I-COS (3 bits) | I-DEI (1 bit) | I-VID (12 bits) | Etype (16 bits)

Same as any 802.1Q frame, with the above fields and Ether-types.


19.  STT - Stateless Transport Tunneling
Stateless Transport Tunneling (STT) is an overlay encapsulation scheme over Layer 3 networks that uses a TCP-like header inside the IP packet. The use of TCP fields has been proposed to provide backward compatibility with existing NIC implementations and enable their offload logic, so STT is particularly useful for deployments that target end systems (such as virtual switches on physical servers). Note that, as the name implies, the TCP fields do not use any TCP connection state.

I don't intend to describe the STT frame format here; more information can be found in draft-davie-stt-02, page 13.

And finally, I couldn't find any frame encapsulation details for Juniper QFabric or Juniper MetaFabric.


Now, before going to sleep, I wonder what a frame would look like if it were IPsec over GRE over VPLS (protected with TE PE-P FRR) over MAC-in-MAC... I certainly wouldn't like to troubleshoot something like that, and I am very glad that MTU 9000 is supported in almost all core routers... by Mihaela Paraschivu, dreaming in the night...
