[mpls] Comments on draft-decraene-mpls-ldp-interarea
In San Diego I expressed some concerns about
draft-decraene-mpls-ldp-interarea, and Loa asked me to write them up for
this list.
I certainly can't say at this time that the draft just won't work, but I'm
not currently convinced that the complexity is fully understood, or that the
proposed procedures will really achieve all their goals.
A summary of my concerns is the following:
- The draft purports to be making a simple change to the LDP procedures, but
I believe it's actually a fairly involved change to the implementation.
In particular the interaction between routing, LDP, and the maintenance of
the forwarding tables is made more complicated, but the details are not
fully specified.
- The draft has LDP put prefixes into the forwarding table that would not
otherwise be there. Currently, LDP doesn't do this; it just puts in
labels for prefixes that have been installed by routing. This means that
LDP would have functionality that was formerly in the domain of routing.
The interactions of LDP with the various routing algorithms under a
variety of circumstances must therefore be carefully considered. However,
the draft has very little about this.
- The draft focuses on the use of the new procedures in a particular sort of
deployment scenario, and doesn't say much about the effects on the network
if the procedures were to be used in other deployment scenarios, or if the
procedures were misconfigured, etc.
- I think that the major purpose of the proposal is to facilitate the use of
LDP-based node protection for area border routers. However, I am not
convinced that the proposal really achieves this goal.
Let me start by summarizing my understanding of the draft. The problem
addressed by the draft is the following. There are a number of situations
in which an ingress PE needs to send a packet on an LSP tunnel whose "LSP
egress" is a PE router (the "egress PE"). This requires the "tunnel label"
to be bound to a /32 prefix for the egress, which in turn requires that the
ingress PE's forwarding table contain a /32 prefix for each potential egress
PE.
The authors describe a deployment scenario in which the service provider's
backbone is a multi-area IS-IS or OSPF network, containing tens of thousands
of PE routers distributed across the areas. The usual way of getting all
the /32s in each ingress PE's forwarding table is to have all IGP area
border routers leak all the /32s into each area. Then the IGP ensures that
each ingress PE's forwarding table is populated with all the /32s.
The problem is that the intra-area IGP protocols don't scale up well as the
number of prefixes increases into the tens of thousands. Therefore it would
be desirable to have some other way of getting all the /32s into the
forwarding tables of the ingress PEs.
The proposal is to use LDP, rather than the IGP, to distribute the /32s and
to ensure that the ingress PE's forwarding tables are properly populated
with the /32s. Note that this does not really reduce the resources in the
core needed to support the LSPs to the PEs, it simply changes the protocol
which is used to allocate some of those resources. Resource consumption in
the P routers of a given area is still proportional to the total number of
PEs in all the areas.
It might seem that a better solution would be to use hierarchy, so that the
P routers do not have to maintain an LSP for each egress PE. For instance,
BGP between the border routers could carry the /32s, without any need for
the P routers to maintain the /32s at all. However, I think that the
authors actually want the P routers to maintain an LSP for each egress PE,
because they believe this to be a necessary condition of applying node
protection when a border router fails. So they frame the problem not as one
of reducing the resource usage in the core, but as one of finding a better
protocol to allocate those resources in the core.
One could of course put IBGP in all the P routers and use IBGP to distribute
the /32s. However, the authors do not like the operational implications of
putting BGP in the core. I'm not sure they've considered a restricted use
of BGP for distributing only the egress PE addresses; often when we talk of
a "BGP free core", what is really important is that the core be free of
external routes, rather than that it be free of the BGP protocol. But in any
event they propose to use LDP.
I think the proposed operational model is the following:
- A PE, say PE1, is configured to bind a label to its own address A (as a
/32 prefix), and to use LDP to distribute this label binding to its
neighbors.
- If P1 is a neighbor of PE1, and PE1 is P1's next hop for A, and P1
receives a label binding for A from PE1, then P1 must put an entry in its
FIB for A, with the label distributed by PE1.
- P1 must then assign a label of its own to A, and use LDP to advertise the
binding between A and that label to P1's neighbors.
- Using LDP ordered mode, this process proceeds out to the edges of the
network. However, it does not go beyond the edges.
This operational model is not completely clear to me. I surmise that each
PE needs to be configured to begin this process. I'm less clear about how
other nodes know whether or not they are on the "edge" of the network. I
think the authors have in mind a deployment scenario in which the "edges"
are inter-AS interfaces and VRF interfaces, but inter-area interfaces are
not considered to be edges. But I'm not sure what the model is supposed to
be in general. Are VRF interfaces always edges, even if Carrier's Carrier
is being used? Are inter-AS interfaces always edges, even if a single SP
has broken his network into several ASes? Is it true that inter-area links
are never edges, or is this meant to be configurable? If the latter, are
there deployment requirements for all area border routers to be configured
the same?
In any event, the authors note that the use of LDP in this manner is
explicitly prohibited by the LDP specification, which requires that LDP not
install a label for any prefix if that exact prefix has not already been
installed in the forwarding table by routing. That is, LDP cannot bind a
label to a prefix unless that prefix is an exact match to one installed by
routing.
This prohibition didn't get there by accident. It was put there
deliberately to ensure that LDP cannot "second guess" routing by creating
paths which differ from the paths created by routing. After all, LDP does
not have its own procedures for deciding which path an LSP should follow;
it depends upon routing for this.
The authors propose to remove this prohibition, and instead impose a more
sophisticated restriction, which they express as follows:
"This procedure is similar to the one defined in [LDP] but performs a
longest match when searching the FEC element in the RIB ...
With this new longest match label mapping procedure, a LSR receiving
a Label Mapping message from a neighbor LSR for a Prefix Address FEC
Element SHOULD use the label for MPLS forwarding if its routing table
contains an entry that matches the FEC Element and the advertising
LSR is a next hop to reach the FEC. If so, it SHOULD advertise the
FEC Element and a label to its LDP peers.
By 'matching FEC Element', one should understand an IP longest match."
The intention is to maintain the requirement that LDP not alter the routing,
while allowing LDP to install prefixes that are not installed by routing.
They propose to allow LDP to install a prefix if and only if there is a
corresponding less specific prefix that has been installed by routing.
Further, the next hop of the LDP-installed prefix is constrained to be the
same as the next hop of the less specific prefix, as determined by routing.
This should ensure that labeled packets always follow the path that has been
determined by routing to be the best path.
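To make the difference between the two rules concrete, here is a minimal
Python sketch. It is only an illustration under stated assumptions: the
routing table is modeled as a dict from prefix to next hop, IPv4 only, and
the function names, addresses, and router names are hypothetical and do not
come from the draft or the LDP specification.

```python
import ipaddress

def net(prefix):
    """Parse a prefix string into an ip_network object."""
    return ipaddress.ip_network(prefix)

def longest_match(rib, fec):
    """Return the most specific routed prefix that covers fec, or None."""
    candidates = [p for p in rib if fec.subnet_of(p)]
    return max(candidates, key=lambda p: p.prefixlen, default=None)

def accept_exact(rib, fec, advertiser):
    """Existing rule: the exact prefix must be routed via the advertiser."""
    return rib.get(fec) == advertiser

def accept_longest_match(rib, fec, advertiser):
    """Proposed rule: the longest-matching routed prefix must have the
    advertiser as its next hop."""
    match = longest_match(rib, fec)
    return match is not None and rib[match] == advertiser

# Hypothetical RIB: routing has installed only a /16, with next hop P2.
rib = {net("10.1.0.0/16"): "P2"}
fec = net("10.1.0.7/32")   # /32 advertised by LDP only
```

With this RIB, a mapping for the /32 from P2 is rejected by the exact-match
rule but accepted by the longest-match rule, while the same mapping from P3
is rejected by both, since P3 is not the next hop for the covering /16.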
It must be understood that this procedure adds considerable complexity to
the maintenance of the forwarding tables.
In existing LDP deployments, all LDP does with regard to the forwarding
table is to supply the label for a given <prefix, next hop> pair which
routing has installed in the forwarding table. In the proposal, LDP has a
number of additional jobs to do with regard to the forwarding table. Let's
define some terminology:
- Let RP be a prefix distributed by routing.
- Let NH(RP) be the next hop for RP.
- Let LP(i) be a prefix distributed by neighbor i using LDP.
- Let RP(i) be the set of LDP-distributed prefixes from neighbor i for which
RP is the longest match.
If NH(RP) changes from i to j, things aren't too hard, as long as RP(i) is
identical to RP(j). Then you just have to change the next hop and outgoing
label for each prefix in RP(i).
It's a bit more complicated if RP(j) is different from RP(i). Now some
prefixes need to get their next hop and label changed, but others need to either
be put into or pulled out of the forwarding table entirely.
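The bookkeeping for a next hop change from i to j can be sketched as plain
set arithmetic. This is a hedged illustration only: rp_i and rp_j stand for
RP(i) and RP(j), modeled as dicts from prefix to the label advertised by
that neighbor, and the function name, prefixes, and label values are
hypothetical.

```python
def fib_delta(rp_i, rp_j):
    """Compute the forwarding-table changes when NH(RP) moves from i to j.

    rp_i, rp_j: dicts mapping an LDP-distributed prefix to the label
    advertised by neighbor i (resp. j), restricted to prefixes whose
    longest match is RP.
    """
    # Prefixes known from both neighbors: swap next hop and outgoing label.
    relabel = {p: rp_j[p] for p in rp_i.keys() & rp_j.keys()}
    # Prefixes only i advertised: must be pulled out of the table.
    remove = rp_i.keys() - rp_j.keys()
    # Prefixes only j advertised: must be newly installed.
    install = {p: rp_j[p] for p in rp_j.keys() - rp_i.keys()}
    return relabel, remove, install

# Hypothetical label bindings from the two neighbors.
rp_i = {"10.1.0.1/32": 100, "10.1.0.2/32": 101}
rp_j = {"10.1.0.2/32": 200, "10.1.0.3/32": 201}
relabel, remove, install = fib_delta(rp_i, rp_j)
```

When RP(i) and RP(j) are identical, remove and install are empty and only
the relabel step is left, which is the easy case.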
It's even more complicated if routing installs a new prefix, RP2, which is
more specific than an already installed prefix RP1, but less specific than
some members of RP1(i). (And possibly more specific than some other members
of RP1(i)). Now one has to figure out which members of RP1(i) remain in
RP1(i) and which are now in RP2(i). In fact, one has to figure this out for
each of the neighbors.
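The regrouping that a newly installed RP2 forces can be sketched as follows.
Again this is only an illustration under assumptions: IPv4 only, one
neighbor, and a hypothetical partition() helper with made-up prefixes. A
real implementation would have to repeat this per neighbor and then work
out the resulting forwarding-table changes.

```python
import ipaddress

def partition(routed, ldp_prefixes):
    """Group a neighbor's LDP-distributed prefixes by the routed prefix
    that is their longest match. Prefixes with no match are left out."""
    routed = [ipaddress.ip_network(r) for r in routed]
    groups = {r: set() for r in routed}
    for p in ldp_prefixes:
        fec = ipaddress.ip_network(p)
        covering = [r for r in routed if fec.subnet_of(r)]
        if covering:
            best = max(covering, key=lambda r: r.prefixlen)
            groups[best].add(fec)
    return groups

RP1 = ipaddress.ip_network("10.0.0.0/8")
RP2 = ipaddress.ip_network("10.1.0.0/16")
# Hypothetical prefixes learned via LDP from neighbor i.
ldp_from_i = ["10.1.1.1/32", "10.2.2.2/32", "10.0.0.0/12"]

before = partition(["10.0.0.0/8"], ldp_from_i)
after = partition(["10.0.0.0/8", "10.1.0.0/16"], ldp_from_i)
```

Before RP2 is installed, all three prefixes are in RP1(i). Afterwards
10.1.1.1/32 moves to RP2(i), while 10.2.2.2/32 and 10.0.0.0/12 (which is
less specific than RP2) stay in RP1(i).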
Certainly this could all be made to work, but it's not exactly the simple
change that the draft suggests it is. It would be nice to see section 4
replaced with a much more comprehensive treatment of how the interaction
between routing events and LDP events affects the forwarding tables. I am
not at all sure that I yet understand all the implications.
The draft also needs a lot more consideration of "corner cases".
For instance, what if BGP and LDP each distribute the same /32 prefix, while
IGP distributes a less specific prefix? It's not immediately obvious
whether the LDP-distributed prefix is supposed to match the IGP-distributed
less specific prefix, thereby replacing the BGP-distributed prefix, or
whether the LDP-distributed prefix is supposed to match the BGP-distributed
prefix. It's also not clear just what you'd have to do if the
LDP-distributed prefix matched a BGP-distributed prefix, because the notions
of "next hop" are different. To add a further complication, suppose that
the BGP-distributed prefix is really a labeled-IPv4 prefix. What is the
relation between the BGP-distributed label and the LDP-distributed label?
Are they both used, or not? Questions like these need to be addressed, and
the implications of answering them one way or another need to be considered.
Here's another case worth considering. Consider an intermediate P router,
P1, with neighbors P2 and P3. Suppose:
- IGP distributes a route to prefix X.
- Using LDP, P2 distributes a label for prefix Y, where X is the "longest
match" for Y.
- P3 does not use LDP to distribute a label for Y.
- At time t0, the next hop at P1 for X is P2.
- At time t1, the next hop at P1 for X is P3.
I think this means that at time t1, any packets which P1 receives that have
been labeled for Y will be dropped by P1. Why? The more specific prefix Y
will have to be removed from the forwarding table at t1, as the conditions
for having it in the forwarding table no longer seem to hold, given that P3
has not distributed a label for the longer prefix. Thus the labels that
correspond to Y will also be removed, and packets which have already been
labeled for Y will find that there is no outgoing label corresponding to the
incoming label. This means they must be dropped.
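The scenario can be replayed in a few lines of Python. This is a sketch
under my reading of the draft's rule, not a statement of what any
implementation does: the prefixes, the label value 17, and the lfib_entry()
helper are hypothetical, and X is assumed to be the only routed prefix
covering Y.

```python
import ipaddress

X = ipaddress.ip_network("10.1.0.0/16")   # distributed by the IGP
Y = ipaddress.ip_network("10.1.0.7/32")   # distributed only by LDP

def lfib_entry(next_hop_for_x, ldp_bindings):
    """Return (next hop, outgoing label) that P1 may use for Y, or None
    if the conditions for keeping Y in the forwarding table do not hold."""
    label = ldp_bindings.get((next_hop_for_x, Y))
    if Y.subnet_of(X) and label is not None:
        return (next_hop_for_x, label)
    return None

# P2 has advertised a label for Y; P3 has not.
bindings = {("P2", Y): 17}

at_t0 = lfib_entry("P2", bindings)   # next hop for X is P2
at_t1 = lfib_entry("P3", bindings)   # next hop for X has moved to P3
```

At t0 the helper returns an entry via P2. At t1 it returns None: P1 has no
outgoing label for Y, so packets arriving with P1's label for Y have nowhere
to go.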
One could dismiss this as a transient, but if the whole reason for doing any
of this is to make node protection work, i.e., to avoid the bad effects of
routing transients, then one presumably is not willing to just dismiss
transient effects.
This scenario could occur, e.g., if a given egress PE can be reached via two
different ABRs, where one of them is implementing the draft but the other is
not (or where one is configured to use the draft procedures but the other is
not). One could dismiss this as an operator error, but it can also happen
if a new ABR comes up and the distribution of routing info across the
network proceeds faster than the distribution of LDP info. Note that on a
single link, we can implement procedures which keep the link from coming up
to routing until LDP has distributed labels over the link, in order to keep
things like this from happening. That's fine as long as LDP operations are
per-link. When we start to provide LDP operations that have edge-to-edge
significance, such as using LDP to build a path for X across the network, we
don't have any similar way to synchronize the LDP and IGP info.
There may be other corner cases as well.
If a decision is made to pursue this draft, I think it needs to have a
fairly comprehensive treatment of (a) all the possible interactions between
LDP and routing, and (b) the behavior of the scheme when deployed in other than
the intended manner. Also, if the main purpose of the draft is to
facilitate an LDP-based node protection scheme for border routers, it should
provide a better assurance that it doesn't leave a number of transient
conditions unprotected.
Without this additional work, I think it is difficult to tell whether this
is ultimately a direction worth pursuing.