MPLS Fragmentation

Thursday, March 27, 2014

Besides IP fragmentation, some MPLS applications also have MPLS fragmentation. At least the RFCs say so, and some form of MPLS fragmentation should exist in every vendor's hardware and software. MPLS fragmentation should exist for L3VPN (based, of course, on IP fragmentation, as described in RFC 3032). It might exist for PWs (described in RFC 4623). It might also exist for VPLS if IP fragmentation is performed on the CE (RFC 4665). So MPLS fragmentation exists, at least as described in the RFCs. Fragmented labeled packets are not lost, and fragmentation does not make them lose their source and destination.

Again, the best approach is to avoid fragmentation and to increase the MTU as much as possible, which also increases link throughput. Ideally, the MTU should be the same along every path in the network. If that is not possible, it should at least be the same within each layer (core, aggregation and distribution). Keep in mind that the distribution layer may contain some less capable devices.

RFC 3032 - MPLS Label Stack Encoding 

MPLS packets can be “too big” for a link for one of the following reasons:
- IP packets entering the MPLS cloud are already too big.
- IP packets entering the MPLS cloud become too big once the labels are pushed.
- Inside the MPLS cloud, some MPLS packets get additional labels pushed and become too big.
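All three cases come down to simple arithmetic: every pushed label adds a 4-byte label stack entry to the packet. The sketch below is a minimal illustration. It assumes the MTU is compared against the labeled packet (label stack plus IP datagram, layer-2 header excluded). The function names are hypothetical.

```python
LABEL_ENTRY_BYTES = 4  # one MPLS label stack entry is 4 bytes (RFC 3032)


def labeled_size(ip_len: int, num_labels: int) -> int:
    """Size of an IP datagram after num_labels label stack entries are pushed."""
    return ip_len + LABEL_ENTRY_BYTES * num_labels


def is_too_big(ip_len: int, num_labels: int, mtu: int) -> bool:
    """True if the labeled packet does not fit into the link MTU."""
    return labeled_size(ip_len, num_labels) > mtu


# Reason 1: the IP packet is already bigger than the MTU.
print(is_too_big(5000, 0, 1500))                          # True
# Reason 2: a 1500-byte IP packet gets 2 labels at the ingress PE.
print(labeled_size(1500, 2), is_too_big(1500, 2, 1500))   # 1508 True
# Reason 3: a packet that fit with 2 labels gets a third one in the core.
print(labeled_size(1492, 2), is_too_big(1492, 2, 1500))   # 1500 False
print(labeled_size(1492, 3), is_too_big(1492, 3, 1500))   # 1504 True
```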

If a labeled packet is too big and DF is not set, the LSR may simply discard it silently. If MPLS fragmentation is implemented and the LSR does not discard the packet, it should do the following:
1. POP all the labels from the label stack to obtain the IP datagram.
2. Let N be the number of bytes in the label stack (4 bytes per label).
3. Check the DF bit of the IP datagram. If it is not set, fragment the datagram according to the IP fragmentation rules.
4. Each IP fragment must be at most MTU – N bytes long, where N is the value from step 2 (the number of bytes in the label stack).
5. PUSH onto each IP fragment the same label stack that the original, non-fragmented packet had.
6. Forward the fragments.
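The six steps can be sketched in Python as follows. This is a simplified model, not a vendor implementation. It assumes IPv4 headers without options (20 bytes). It represents packets with hypothetical `IPDatagram` and `LabeledPacket` classes. When DF is set, it just returns no fragments. The label values are made up.

```python
from dataclasses import dataclass
from typing import List

IP_HEADER_LEN = 20      # assumption: IPv4 header without options
LABEL_ENTRY_BYTES = 4   # one MPLS label stack entry


@dataclass
class IPDatagram:
    ident: int           # Identification, copied to every fragment
    df: bool             # Don't Fragment
    mf: bool             # More Fragments
    offset: int          # Fragment Offset, in 8-byte units
    payload: bytes

    @property
    def total_length(self) -> int:
        return IP_HEADER_LEN + len(self.payload)


@dataclass
class LabeledPacket:
    labels: List[int]    # label stack, top label first
    ip: IPDatagram

    @property
    def size(self) -> int:
        return LABEL_ENTRY_BYTES * len(self.labels) + self.ip.total_length


def fragment_ip(dgram: IPDatagram, max_len: int) -> List[IPDatagram]:
    """Split a datagram so every fragment's total length is <= max_len."""
    chunk = (max_len - IP_HEADER_LEN) // 8 * 8  # data must be a multiple of 8 bytes
    frags = []
    for pos in range(0, len(dgram.payload), chunk):
        is_last = pos + chunk >= len(dgram.payload)
        frags.append(IPDatagram(
            ident=dgram.ident,
            df=False,
            mf=dgram.mf if is_last else True,  # last piece keeps the original MF
            offset=dgram.offset + pos // 8,
            payload=dgram.payload[pos:pos + chunk],
        ))
    return frags


def mpls_fragment(pkt: LabeledPacket, mtu: int) -> List[LabeledPacket]:
    """Steps 1-6 for a labeled packet that may be too big for the MTU."""
    if pkt.size <= mtu:
        return [pkt]                          # not too big, forward unchanged
    labels = list(pkt.labels)                 # 1. pop the label stack
    n = LABEL_ENTRY_BYTES * len(labels)       # 2. N = bytes in the label stack
    if pkt.ip.df:                             # 3. DF set: do not fragment
        return []                             #    (discard; ICMP not modeled here)
    fragments = fragment_ip(pkt.ip, mtu - n)  # 3-4. fragments of at most MTU - N
    # 5. push the same label stack onto every fragment; 6. return them for forwarding
    return [LabeledPacket(labels=list(labels), ip=f) for f in fragments]


# Hypothetical label values; 5000-byte datagram = 20-byte header + 4980 bytes of data.
original = LabeledPacket(labels=[16001, 24005],
                         ip=IPDatagram(ident=0x1234, df=False, mf=False,
                                       offset=0, payload=bytes(4980)))
for frag in mpls_fragment(original, mtu=1500):
    print(frag.labels, frag.ip.total_length, frag.ip.offset, frag.ip.mf)
```

The loop prints four fragments, all carrying the same two labels. Their IP lengths are 1492, 1492, 1492 and 584, and their offsets are 0, 184, 368 and 552.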

If the IP datagram obtained after stripping the labels (step 3) has the DF bit set, it must not be fragmented or forwarded. Instead, it should be discarded. If possible, an ICMP Destination Unreachable message (type 3, code 4, "Fragmentation Required and DF Set") should be generated and sent to the source. Its Next-Hop MTU field should be set to MTU – N.
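The DF decision and the resulting ICMP fields can be sketched as below. `decide` and `IcmpFragNeeded` are hypothetical names. The model always takes the fragment path when DF is clear, although an LSR may also silently discard such packets. It also ignores ICMP rate limiting.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LABEL_ENTRY_BYTES = 4
ICMP_DEST_UNREACHABLE = 3       # ICMP type 3: Destination Unreachable
CODE_FRAG_NEEDED_DF_SET = 4     # code 4: Fragmentation Required and DF Set


@dataclass
class IcmpFragNeeded:
    icmp_type: int
    code: int
    next_hop_mtu: int


def decide(ip_len: int, num_labels: int, df: bool,
           mtu: int) -> Tuple[str, Optional[IcmpFragNeeded]]:
    """Return the action for a labeled IPv4 datagram and the ICMP error, if any."""
    n = LABEL_ENTRY_BYTES * num_labels
    if ip_len + n <= mtu:
        return "forward", None
    if not df:
        return "fragment", None
    # DF set: discard and report MTU - N as the next-hop MTU to the source
    return "discard", IcmpFragNeeded(ICMP_DEST_UNREACHABLE,
                                     CODE_FRAG_NEEDED_DF_SET, mtu - n)


print(decide(5000, 2, df=True, mtu=1500))
# ('discard', IcmpFragNeeded(icmp_type=3, code=4, next_hop_mtu=1492))
```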

Consider a 5000-byte IP packet from an L3VPN CE entering an MPLS domain with a 1500-byte MTU. First, 2 MPLS labels are pushed onto the packet:


Second, the big MPLS packet is fragmented according to the IP fragmentation rules (for more details you can check IP-Fragmentation), and the label stack is copied to each fragment, as you can see below:


From the above table the following can be noted:
- The label values of the initial packet are copied to all fragments.
- The Length of every IP fragment except the last one is 1492 bytes = 1500 bytes (MTU) – 8 bytes (2 labels).
- The Fragment Offset of the IP fragments (which indicates where in the original datagram each fragment belongs) is calculated based on the 1492-byte value. Without IP options, 1492 bytes minus the 20-byte IP header leaves 1472 bytes of data per fragment, i.e. 184 units of 8 bytes.
- The Identification, MF and DF values are set the same way as in IP fragmentation.
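The lengths and offsets can be checked by hand. The calculation assumes a 20-byte IPv4 header with no options. The Fragment Offset counts 8-byte units of data.

```latex
\begin{aligned}
L_{\max} &= \text{MTU} - N = 1500 - 2 \cdot 4 = 1492 \\
D &= \left\lfloor \frac{L_{\max} - 20}{8} \right\rfloor \cdot 8 = \left\lfloor \frac{1472}{8} \right\rfloor \cdot 8 = 1472 \\
\text{data} &= 5000 - 20 = 4980 = 3 \cdot 1472 + 564 \\
\text{offsets} &= 0,\ \tfrac{1472}{8} = 184,\ 368,\ 552 \quad (\text{8-byte units}) \\
\text{lengths} &= 1492,\ 1492,\ 1492,\ 564 + 20 = 584
\end{aligned}
```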

To draw an analogy between IP and MPLS fragmentation, you can check the table below. It shows the same 5000-byte IP packet fragmented in an IP cloud (1500-byte IP MTU) and when entering an MPLS cloud (1500-byte MPLS MTU):


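The IP versus MPLS comparison can be reproduced with the hypothetical helper below. It computes the IP total length, Fragment Offset and MF flag of each fragment. It again assumes a 20-byte IPv4 header without options.

```python
IP_HEADER_LEN = 20       # assumption: IPv4 header without options
LABEL_ENTRY_BYTES = 4    # one MPLS label stack entry


def fragment_plan(total_len: int, mtu: int, num_labels: int = 0):
    """List (IP total length, Fragment Offset in 8-byte units, MF) per fragment."""
    max_ip_len = mtu - LABEL_ENTRY_BYTES * num_labels   # room left for the IP fragment
    chunk = (max_ip_len - IP_HEADER_LEN) // 8 * 8       # data per fragment, multiple of 8
    payload = total_len - IP_HEADER_LEN
    plan = []
    pos = 0
    while pos < payload:
        data = min(chunk, payload - pos)
        plan.append((IP_HEADER_LEN + data, pos // 8, pos + data < payload))
        pos += data
    return plan


print("IP cloud:  ", fragment_plan(5000, 1500))
print("MPLS cloud:", fragment_plan(5000, 1500, num_labels=2))
# IP cloud:   [(1500, 0, True), (1500, 185, True), (1500, 370, True), (560, 555, False)]
# MPLS cloud: [(1492, 0, True), (1492, 184, True), (1492, 368, True), (584, 552, False)]
```

Both cases produce four fragments here. In the MPLS case each IP fragment is 8 bytes shorter, so the labeled fragment still fits in 1500 bytes.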
RFC 4623 - Pseudowire Emulation Edge-to-Edge (PWE3) Fragmentation and Reassembly

As the RFC itself states, fragmentation should be avoided as much as possible because of the processing overhead. When fragmentation is needed, the following fragmentation and reassembly options are defined:
- The first method is, again, to let the CE do the IP fragmentation and send the fragments over the PW.
- Fragmentation is done in the transmitting PE immediately before the PW encapsulation.
- Reassembly is done in the receiving PE immediately after the PW decapsulation.

PW fragmentation

Because there is no IP-style Fragment Offset, fragmented packets must use the Sequence Number field. The fragmentation bits and the sequence number are carried in the PW Control Word. The reassembly capability is signaled in the VC label advertisement with a Pseudowire Interface Parameter Sub-TLV, the Fragmentation Indicator (parameter ID 0x09, length 2). The presence of this parameter indicates that the receiver is able to reassemble, not that the transmitter will use fragmentation. Its absence tells the sender not to use fragmentation.
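Without an offset field, the receiving PE can only rebuild a payload by relying on consecutive sequence numbers and on the first/last markers. The sketch below models that idea with a hypothetical `PwFragment` record. It uses symbolic positions instead of the real control-word bits and ignores sequence-number wraparound.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PwFragment:
    seq: int        # PW sequence number
    position: str   # "single", "first", "middle" or "last" (hypothetical labels)
    data: bytes


def pw_fragment(payload: bytes, max_chunk: int, first_seq: int) -> List[PwFragment]:
    """Transmitting PE: split a payload into consecutively numbered fragments."""
    chunks = [payload[i:i + max_chunk] for i in range(0, len(payload), max_chunk)]
    if len(chunks) <= 1:
        return [PwFragment(first_seq, "single", payload)]
    frags = []
    for i, chunk in enumerate(chunks):
        position = "first" if i == 0 else "last" if i == len(chunks) - 1 else "middle"
        frags.append(PwFragment(first_seq + i, position, chunk))  # wraparound ignored
    return frags


def pw_reassemble(frags: List[PwFragment]) -> Optional[bytes]:
    """Receiving PE: rebuild the payload, or return None if a fragment is missing."""
    if not frags:
        return None
    if len(frags) == 1:
        return frags[0].data if frags[0].position == "single" else None
    if frags[0].position != "first" or frags[-1].position != "last":
        return None
    if any(f.position != "middle" for f in frags[1:-1]):
        return None
    # Without an offset field, only consecutive sequence numbers show nothing was lost.
    if any(b.seq != a.seq + 1 for a, b in zip(frags, frags[1:])):
        return None
    return b"".join(f.data for f in frags)


packet = bytes(3000)
sent = pw_fragment(packet, max_chunk=1400, first_seq=10)
print([(f.seq, f.position, len(f.data)) for f in sent])
# [(10, 'first', 1400), (11, 'middle', 1400), (12, 'last', 200)]
print(pw_reassemble(sent) == packet)            # True
print(pw_reassemble([sent[0], sent[2]]))        # None: fragment 11 was lost
```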

The fragmentation bits are bits 8 and 9 of the control word and have the following meaning:

PW Control Word

- 00: the entire (unfragmented) payload is carried in a single packet
- 01: the packet carries the first fragment
- 10: the packet carries the last fragment
- 11: the packet carries an intermediate fragment
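This sketch assumes the generic PW control word layout of RFC 4385: 4 zero bits, 4 flag bits, the 2 FRG bits, a 6-bit length and a 16-bit sequence number. Under that assumption, the two fragmentation bits can be packed and read back as shown below. The helper names are hypothetical.

```python
# FRG values of the PW control word, as listed above
FRG_SINGLE = 0b00
FRG_FIRST = 0b01
FRG_LAST = 0b10
FRG_MIDDLE = 0b11

# Bits are numbered from the most significant bit (bit 0) of the 32-bit word:
# bits 0-3 = 0000, 4-7 = flags, 8-9 = FRG, 10-15 = length, 16-31 = sequence number.
FRG_SHIFT = 22  # bits 8-9 sit 22 positions above the least significant bit


def build_control_word(frg: int, seq: int, flags: int = 0, length: int = 0) -> bytes:
    """Pack the control word fields into 4 bytes in network byte order."""
    word = (((flags & 0xF) << 24) | ((frg & 0x3) << FRG_SHIFT)
            | ((length & 0x3F) << 16) | (seq & 0xFFFF))
    return word.to_bytes(4, "big")


def parse_control_word(raw: bytes):
    """Return (FRG bits, sequence number) from a 4-byte control word."""
    word = int.from_bytes(raw, "big")
    return (word >> FRG_SHIFT) & 0x3, word & 0xFFFF


cw = build_control_word(FRG_FIRST, seq=10)
print(cw.hex())                  # 0040000a
print(parse_control_word(cw))    # (1, 10)
```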

RFC 4665 - Service Requirements for Layer 2 Provider-Provisioned Virtual Private Networks

No fragmentation method has been defined for VPLS services (so far). In a VPLS domain, IP fragmentation can only be done by the IP CEs.

So, in the end, fragmentation should be avoided as much as possible. Besides the extra processing overhead, the various delays and the possible packet drops, it also consumes more bandwidth, because every fragment carries its own headers... and if it cannot be avoided, good luck with fragmenting the Moby-Dick.
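The header overhead can be quantified for the 5000-byte example. The sketch below counts the bytes sent, covering IP plus two labels. It ignores layer-2 framing and assumes 20-byte IPv4 headers. The function name is hypothetical.

```python
IP_HEADER_LEN = 20      # assumption: IPv4 header without options
LABEL_STACK_LEN = 8     # two 4-byte MPLS labels


def wire_bytes(ip_total: int, mtu: int, label_stack_len: int = LABEL_STACK_LEN) -> int:
    """Bytes sent for one datagram (labels + IP, layer-2 framing ignored)."""
    if ip_total + label_stack_len <= mtu:
        return ip_total + label_stack_len           # fits, no fragmentation
    chunk = (mtu - label_stack_len - IP_HEADER_LEN) // 8 * 8
    payload = ip_total - IP_HEADER_LEN
    fragments = -(-payload // chunk)                # ceiling division
    # every fragment carries its own IP header and its own copy of the label stack
    return payload + fragments * (IP_HEADER_LEN + label_stack_len)


print(wire_bytes(5000, mtu=9000))   # 5008
print(wire_bytes(5000, mtu=1500))   # 5092
```

With a large enough MTU the packet costs 5008 bytes. Fragmented at 1500 it costs 5092 bytes, 84 bytes more for three extra IP headers and three extra label stacks. The extra layer-2 headers come on top of that.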

By Mihaela Paraschivu
