Path MTU Discovery

Sunday, April 6, 2014


Path MTU Discovery is a technique used to dynamically discover the path MTU from the source to the destination using the DF (Don’t Fragment) bit in the IP header. The path MTU is the smallest MTU of any link along the path, where the path is defined by the IP source, the IP destination and possibly the TOS of the packets. The basic idea of the mechanism is that the source initially assumes the path MTU to be equal to the known first-hop MTU and sends IP packets of that size with the DF bit set. If a router along the path has a next-hop link with a smaller MTU, it drops the IP datagram and sends back an ICMP Destination Unreachable message with the code “Fragmentation needed and DF set”. After receiving this ICMP message, the host reduces its path MTU estimate for that particular path.

There are several ways to implement Path MTU Discovery, as stated in RFC 1191; the big differences are between the router and the host implementations:
 The router must include the MTU of the next hop in the low-order 16 bits of the previously unused header field of the ICMP Destination Unreachable message with the code Fragmentation needed and DF set (“Datagram Too Big”, as the RFC says). This is the most used implementation, but it has its flaws.
 The host implementation, on the other hand, might elect to reduce the path MTU to the next-hop value received in the ICMP Datagram Too Big message, or to clear the DF bit in the IP header.
 When the ICMP Datagram Too Big message does not contain the MTU of the next hop, things get complicated and many possible algorithms might be implemented by the hosts.
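
The host-side loop described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the function name and the list of link MTUs are hypothetical, every router is assumed to report its next-hop MTU in the ICMP message, and no real packets are sent.

```python
def discover_path_mtu(link_mtus):
    """Simulate host-side Path MTU Discovery over a list of link MTUs.

    link_mtus[0] is the first-hop MTU known to the source; the other
    entries are the MTUs of the links further along the path
    (hypothetical values, not taken from a real network).
    Returns (path_mtu, number_of_probes_sent).
    """
    path_mtu = link_mtus[0]      # initial assumption: first-hop MTU
    probes = 0
    while True:
        probes += 1
        # A DF-set packet of size path_mtu stops at the first link
        # whose MTU is smaller than the packet.
        blocked_by = next((m for m in link_mtus if m < path_mtu), None)
        if blocked_by is None:
            return path_mtu, probes   # the packet reached the destination
        # The router drops the packet and reports its next-hop MTU in the
        # ICMP "Fragmentation needed and DF set" message; the host lowers
        # its estimate to that value and tries again.
        path_mtu = blocked_by


mtu, probes = discover_path_mtu([1500, 1500, 1400, 1476, 1300])
print(mtu, probes)   # prints "1300 3"
```

The estimate drops once per bottleneck met in path order (1500, then 1400, then 1300), which is why three probes are needed in this example.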

MPLS Fragmentation

Thursday, March 27, 2014


Besides IP fragmentation there is MPLS fragmentation for some of the MPLS applications, or so the RFCs say, and some form of MPLS fragmentation should exist in all vendors’ hardware and software. MPLS fragmentation should exist for L3VPN (based, of course, on IP fragmentation, as described in RFC 3032), it might exist on PW (described in RFC 4623), and it might also exist on VPLS if IP fragmentation is present on the CE (RFC 4665). So MPLS fragmentation exists, or at least it is described in the RFCs. And no, the labeled fragmented packets will not get lost, and no, they will not lose their source and destination because of the fragmentation.
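
For the L3VPN case, the handling of a labeled IP datagram that is too big can be sketched roughly as follows. This is a simplified model of the RFC 3032 procedure as I read it (strip the labels, fragment the IP datagram so that each fragment still fits once the labels are put back, or answer with ICMP if DF is set). The function name, parameters and return values are hypothetical, and IP options are ignored.

```python
LABEL_ENTRY_BYTES = 4   # one MPLS label stack entry is 4 bytes


def handle_labeled_datagram(ip_len, labels, max_payload, df_set, ip_header=20):
    """Decide what happens to a labeled IP datagram on an outgoing link.

    ip_len      -- total length of the IP datagram under the label stack
    labels      -- number of label stack entries
    max_payload -- largest labeled payload the outgoing link accepts
    Returns ("forward", [ip_len]), ("fragment", [fragment lengths]) or
    ("icmp", next_hop_mtu).
    """
    n = labels * LABEL_ENTRY_BYTES
    room = max_payload - n            # space left for the IP part
    if ip_len <= room:
        return ("forward", [ip_len])
    if df_set:
        # Dropped; the ICMP message advertises the limit minus the labels.
        return ("icmp", room)
    # Fragment data must be a multiple of 8 bytes, except in the last one.
    chunk = (room - ip_header) // 8 * 8
    data = ip_len - ip_header
    sizes = []
    while data > 0:
        take = min(chunk, data)
        sizes.append(take + ip_header)   # each fragment gets its own IP header
        data -= take
    return ("fragment", sizes)           # the same labels go back on each one


print(handle_labeled_datagram(1500, 2, 1500, df_set=False))
print(handle_labeled_datagram(1500, 2, 1500, df_set=True))
```

With two labels on a link that accepts 1500 bytes, a 1500-byte IP datagram no longer fits: it is either split into IP fragments of 1492 and 28 bytes, or dropped with a reported next-hop MTU of 1492.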

Again, the best scenario is to avoid fragmentation and to increase the MTU size as much as possible in order to increase the link throughput. It would also be best to have the same MTU size along all the paths in the network; if that is not possible, at least keep the same MTU size within each layer (core, aggregation and distribution), keeping in mind that the distribution layer may contain some not so smart devices.

IP Fragmentation

Sunday, March 16, 2014


In the beginning there was TCP with no IP at all. TCP was split in two, TCP and IP, at version 3, and it became operational and widely used at version 4 (that’s why IP is at version 4). The split of TCP was decided for the following reasons: the need to have different layers for Network and Transport; and the overkill of making gateways (the former name of routers) deal with an end-to-end protocol for both routing packets between devices and reliable communication between end hosts. The IPv6 name was chosen to avoid any confusion with the unused IPv5 protocol (Internet Stream Protocol).

Just to be clear from the beginning: fragmentation can be done only at the IP layer and at the MPLS layer. On a layer 2 link (like QinQ), frames larger than the layer 2 MTU are dropped (depending on the implementation; for more details you can check Almighty MTU), because at this moment there is no layer 2 fragmentation definition or RFC.
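
For contrast with the layer 2 behaviour, the IP-layer arithmetic can be sketched as below. It is a minimal sketch that assumes a 20-byte IP header without options; the function name and the example sizes are made up for illustration.

```python
def fragment(total_len, mtu, header=20):
    """Split an IP datagram into fragments that fit the given MTU.

    Returns a list of (offset_in_8_byte_units, data_bytes, more_fragments).
    """
    data = total_len - header
    # Fragment offsets are counted in 8-byte units, so every fragment
    # except the last must carry a multiple of 8 data bytes.
    chunk = (mtu - header) // 8 * 8
    frags, sent = [], 0
    while sent < data:
        size = min(chunk, data - sent)
        more = sent + size < data        # MF flag: set on all but the last
        frags.append((sent // 8, size, more))
        sent += size
    return frags


for frag in fragment(4000, 1500):
    print(frag)
# (0, 1480, True)
# (185, 1480, True)
# (370, 1020, False)
```

A 4000-byte datagram crossing a 1500-byte MTU becomes three fragments, each with its own 20-byte header on top of the data sizes shown.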

IP MTU, TCP MSS and TCP windows sizes defaults

Sunday, February 16, 2014


The differences between IP MTU, TCP window size and MSS are sometimes not clear enough, but most of the time correctly chosen values influence the throughput (more details here) of the link. Again, these values are vendor, hardware and operating system dependent.

The IP MTU is considered to be the maximum IP packet size which can be transmitted over the interface without the need of IP fragmentation.

The TCP window size is the amount of data, in bytes, that can be transmitted without waiting for a TCP acknowledgment; I think of it as a burst of unacknowledged TCP segments, each carrying up to one MSS of data.

The MSS is the Maximum Segment Size of one TCP segment; it is actually the maximum amount of Data which can be sent in 1 TCP segment, not including the Ethernet, IP and TCP headers.
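
The relationship between the three values can be worked through with a few lines of arithmetic. This is a sketch that assumes 20-byte IP and TCP headers without options and a 65535-byte window; the function names are hypothetical and real defaults depend on the vendor and operating system.

```python
def tcp_mss(ip_mtu, ip_header=20, tcp_header=20):
    """MSS that fits in one unfragmented IP packet of size ip_mtu."""
    return ip_mtu - ip_header - tcp_header


def full_segments_in_window(window_bytes, mss):
    """How many full-sized segments can be in flight unacknowledged."""
    return window_bytes // mss


mss = tcp_mss(1500)
print(mss)                                    # prints 1460
print(full_segments_in_window(65535, mss))    # prints 44
```

With a 1500-byte IP MTU the MSS is 1460 bytes, and a 65535-byte window allows a burst of 44 full segments before an acknowledgment is needed.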

The difference between PING DF and MTU size of the link

Saturday, February 1, 2014


The ping command is implemented differently depending on the operating system and on the networking vendor’s devices and software. Most of the time, when the MTU must be tested, the ping command is used with the DF (Don’t Fragment) bit set. On top of the ping payload come the IP header (20 bytes without options) and the ICMP header (8 bytes); in some cases these values must be subtracted from the link MTU to get the ping size, in some cases even the Ethernet frame header (14 bytes: DMAC, SMAC, Type) and sometimes even the Ethernet CRC (4 bytes).
Every operating system has its own way of implementing the ping command. After all, each piece of networking equipment runs an operating system, and most of the time that operating system is based on Linux/Unix; you will find the fine print hiding somewhere in the code or on the vendor’s pages, if you search for it.
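
The size arithmetic can be sketched as follows. The convention names are labels invented for this example, not options of any real ping implementation; which convention a given device follows has to be checked in its documentation.

```python
IP_HEADER = 20     # bytes, without options
ICMP_HEADER = 8
ETH_HEADER = 14    # DMAC (6) + SMAC (6) + Type (2)
ETH_CRC = 4


def ping_size(ip_mtu, convention):
    """Largest ping size that passes with DF set, for a given IP MTU.

    convention says what the implementation counts as the "size"
    (hypothetical names used only in this sketch).
    """
    if convention == "icmp_payload":        # size = ICMP data only
        return ip_mtu - IP_HEADER - ICMP_HEADER
    if convention == "ip_packet":           # size = whole IP packet
        return ip_mtu
    if convention == "ethernet_frame":      # size includes the L2 header
        return ip_mtu + ETH_HEADER
    if convention == "ethernet_frame_crc":  # size includes header and CRC
        return ip_mtu + ETH_HEADER + ETH_CRC
    raise ValueError(f"unknown convention: {convention}")


for name in ("icmp_payload", "ip_packet", "ethernet_frame", "ethernet_frame_crc"):
    print(name, ping_size(1500, name))
```

For a 1500-byte IP MTU the four conventions give 1472, 1500, 1514 and 1518, which is why the same link can appear to have different MTUs when tested from different devices.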

Almighty MTU – Ethernet MTU, MPLS MTU and IP MTU

Tuesday, January 21, 2014

The MTU implementations differ between vendors, between pieces of equipment, between boards and even between software versions on the same hardware platform. The best known and most used MTUs are the following: Physical MTU, IP MTU, GRE IP MTU, IPSEC MTU, MPLS MTU, MPLS TE MTU, and LDP over MPLS TE MTU. The packet/frame verification and fragmentation can be done either on ingress or on egress, depending of course on the implementation on the specific device and vendor.

Theoretically we have the following MTUs for an Ethernet frame, with the following relationship between them:

ETHERNET MTU > MPLS MTU > IP MTU

ETHERNET MTU = ETHERNET ENCAPSULATION + MPLS LABEL STACK + IP MTU
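
As a worked example of this relationship, here is a minimal sketch. It assumes that the MPLS MTU is the IP MTU plus the label stack (4 bytes per label) and that the Ethernet MTU counts the 14-byte Ethernet header and any VLAN tags but not the CRC. These conventions and the function name are assumptions for illustration; real devices count these bytes differently.

```python
ETH_HEADER = 14    # DMAC + SMAC + Type
VLAN_TAG = 4       # one 802.1Q tag
MPLS_LABEL = 4     # one label stack entry


def mtu_stack(ip_mtu, labels=2, vlan_tags=0):
    """Return (mpls_mtu, ethernet_mtu) needed to carry ip_mtu unfragmented."""
    mpls_mtu = ip_mtu + MPLS_LABEL * labels
    ethernet_mtu = ETH_HEADER + VLAN_TAG * vlan_tags + mpls_mtu
    return mpls_mtu, ethernet_mtu


print(mtu_stack(1500))                # prints (1508, 1522)
print(mtu_stack(1500, vlan_tags=1))   # prints (1508, 1526)
```

Under these assumptions, carrying a 1500-byte IP packet with two labels needs an MPLS MTU of 1508 and an Ethernet MTU of 1522 (1526 with one VLAN tag).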


Ethernet, Carrier and Datacenter Ethernet Protocol Overhead and Throughput

Thursday, January 9, 2014


Let’s start calculating the Protocol Overhead, Protocol Efficiency and Throughput for Ethernet, Carrier and Datacenter Ethernet protocols like 802.3, 802.1q, 802.1ad QinQ, 802.1ah mac-in-mac, MPLS and TRILL. More information about the Ethernet frame formats and field descriptions, the Protocol Overhead and the table summarizing them all can be found in my previous post Ethernet and Overlay technologies over Ethernet.
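
A minimal sketch of this kind of calculation, covering only a few of the listed encapsulations. It assumes that overhead means every byte on the wire that is not layer 3 payload (preamble and SFD, Ethernet header, CRC, inter-frame gap, plus tags or labels); the names and the chosen set of encapsulations are mine, not taken from the post’s table.

```python
# Per-frame bytes that occupy the wire but carry no layer 3 payload:
# preamble + SFD (8), Ethernet header (14), CRC (4), inter-frame gap (12).
BASE_8023 = 8 + 14 + 4 + 12

EXTRA = {
    "802.3": 0,
    "802.1q": 4,        # one VLAN tag
    "802.1ad": 8,       # two VLAN tags (QinQ)
    "802.3+mpls": 4,    # one MPLS label on an untagged frame
}


def overhead(encap):
    """Overhead bytes per frame for the given encapsulation."""
    return BASE_8023 + EXTRA[encap]


def efficiency(payload, encap):
    """Fraction of the wire time spent on payload bytes."""
    return payload / (payload + overhead(encap))


def throughput_mbps(link_mbps, payload, encap):
    """Payload throughput on a link of the given nominal rate."""
    return link_mbps * efficiency(payload, encap)


for encap in EXTRA:
    print(encap, overhead(encap), round(100 * efficiency(1500, encap), 2))
```

For a 1500-byte payload on plain 802.3 this gives 1500 / 1538, roughly 97.5% efficiency; every extra tag or label lowers it slightly, and small payloads lower it much more.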