The MPLS WG Archive





[mpls] Comments on draft-decraene-mpls-ldp-interarea

  • From: Eric Rosen <erosen@cisco.com>
  • Date: Thu, 14 Dec 2006 11:06:29 -0500

In San Diego I expressed some concerns about
draft-decraene-mpls-ldp-interarea, and Loa asked me to write them up for
this list.

I certainly can't say  at this time that the draft just  won't work, but I'm
not currently convinced that the complexity is fully understood, or that the
proposed procedures will really achieve all their goals.

A summary of my concerns is the following:

- The draft purports to be making a simple change to the LDP procedures, but
  I believe  it's actually a  fairly involved change to  the implementation.
  In particular the interaction between routing, LDP, and the maintenance of
  the forwarding  tables is made more  complicated, but the  details are not
  fully specified. 

- The draft  has LDP put prefixes  into the forwarding table  that would not
  otherwise  be there.   Currently, LDP  doesn't do  this; it  just  puts in
  labels for prefixes that have  been installed by routing.  This means that
  LDP would have  functionality that was formerly in  the domain of routing.
  The  interactions of  LDP  with  the various  routing  algorithms under  a
  variety of circumstances must therefore be carefully considered.  However,
  the draft has very little about this. 

- The draft focuses on the use of the new procedures in a particular sort of
  deployment scenario, and doesn't say much about the effects on the network
  if the procedures were to be used in other deployment scenarios, or if the
  procedures were misconfigured, etc. 

- I think that the major purpose of the proposal is to facilitate the use of
  LDP-based  node protection  for area  border routers.   However, I  am not
  convinced that the proposal really achieves this goal. 

Let  me start by  summarizing my  understanding of  the draft.   The problem
addressed by the  draft is the following.  There are  a number of situations
in which an  ingress PE needs to send  a packet on an LSP  tunnel whose "LSP
egress" is a PE router (the  "egress PE").  This requires the "tunnel label"
to be bound to a /32 prefix  for the egress, which in turn requires that the
ingress PE's forwarding table contain a /32 prefix for each potential egress
PE.  

The authors describe  a deployment scenario in which  the service provider's
backbone is a multi-area IS-IS or OSPF network, containing tens of thousands
of PE  routers distributed across the  areas.  The usual way  of getting all
the  /32s in each  ingress PE's  forwarding table  is to  have all  IGP area
border routers leak all the /32s  into each area.  Then the IGP ensures that
each ingress PE's forwarding table is populated with all the /32s.

The problem is that the intra-area  IGP protocols don't scale up well as the
number of prefixes increases into the tens of thousands.  Therefore it would
be  desirable to  have  some other  way of  getting  all the  /32s into  the
forwarding tables of the ingress PEs. 

The proposal is to use LDP, rather  than the IGP, to distribute the /32s and
to ensure  that the  ingress PE's forwarding  tables are  properly populated
with the /32s.   Note that this does not really reduce  the resources in the
core needed to  support the LSPs to the PEs, it  simply changes the protocol
which is used to allocate  some of those resources.  Resource consumption in
the P routers of  a given area is still proportional to  the total number of
PEs in all the areas. 

It might seem that a better solution  would be to use hierarchy, so that the
P routers do not  have to maintain an LSP for each  egress PE. For instance,
BGP between  the border routers could  carry the /32s, without  any need for
the  P routers to  maintain the  /32s at  all.  However,  I think  that the
authors actually want the P routers to maintain an LSP for each egress PE,
because  they believe  this to  be a  necessary condition  of  applying node
protection when a border router fails.  So they frame the problem not as one
of reducing the resource  usage in the core, but as one  of finding a better
protocol to allocate those resources in the core.

One could of course put IBGP in all the P routers and use IBGP to distribute
the /32s.  However, the authors  do not like the operational implications of
putting BGP in  the core.  I'm not sure they've  considered a restricted use
of BGP for distributing only the  egress PE addresses; often when we talk of
a "BGP  free core", what  is really  important is that  the core be  free of
external routes, rather than that it be free of the BGP protocol.  But in any
event they propose to use LDP. 

I think the proposed operational model is the following: 

- A PE, say  PE1, is configured to bind  a label to its own address  A (as a
  /32 prefix),  and to  use LDP  to distribute  this label  binding to  its
  neighbors.

- If  P1 is  a neighbor  of PE1,  and PE1  is P1's  next hop  for A,  and P1
  receives a label binding for A from  PE1, then P1 must put an entry in its
  FIB for A, with the label distributed by PE1.

- P1 must then assign a label of its  own to A, and use LDP to advertise the
  binding of A to that label to P1's neighbors. 

- Using LDP  ordered mode,  this process  proceeds out to  the edges  of the
  network.  However, it does not go beyond the edges.
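
As I read it, the hop-by-hop process in the list above could be sketched
roughly as follows.  This is only my illustration, not anything from the
draft: the topology, the node names, the single label counter and the
propagate() helper are all assumptions, and the question of where the "edge"
lies is deliberately left out.

```python
from collections import deque

def propagate(egress, neighbors, next_hop):
    """Return {node: (local label, next hop, outgoing label)} for one FEC.

    neighbors: node -> list of LDP neighbors (hypothetical topology).
    next_hop:  node -> routing's next hop toward the egress.
    Labels are arbitrary integers drawn from one counter for simplicity.
    """
    state = {egress: (16, None, None)}   # egress binds a label to its own /32
    next_label = 17
    queue = deque([egress])
    while queue:
        sender = queue.popleft()
        for node in neighbors[sender]:
            # Use the binding only if the sender is this node's next hop
            # and the node has not already installed an entry for the FEC.
            if node in state or next_hop.get(node) != sender:
                continue
            state[node] = (next_label, sender, state[sender][0])
            next_label += 1
            queue.append(node)           # node now advertises its own binding
    return state

neighbors = {
    "PE1": ["P1"],
    "P1": ["PE1", "ABR"],
    "ABR": ["P1", "PE2"],
    "PE2": ["ABR"],
}
next_hop = {"P1": "PE1", "ABR": "P1", "PE2": "ABR"}

state = propagate("PE1", neighbors, next_hop)
for node, entry in state.items():
    print(node, entry)
```

Each node installs an entry only after its next hop has advertised a
binding, which is the ordered-mode behavior described above.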

This operational model  is not completely clear to me.   I surmise that each
PE needs to  be configured to begin this process.  I'm  less clear about how
other nodes  know whether or not  they are on  the "edge" of the  network. I
think the  authors have in mind  a deployment scenario in  which the "edges"
are inter-AS  interfaces and VRF  interfaces, but inter-area  interfaces are
not considered to be edges.  But I'm  not sure what the model is supposed to
be in general.   Are VRF interfaces always edges,  even if Carrier's Carrier
is being  used?  Are inter-AS interfaces  always edges, even if  a single SP
has broken his network into several  ASes?  Is it true that inter-area links
are never  edges, or is this meant  to be configurable?  If  the latter, are
there deployment requirements  for all area border routers  to be configured
the same?  

In  any event,  the authors  note that  the  use of  LDP in  this manner  is
explicitly prohibited by the LDP  specification, which requires that LDP not
install a  label for any  prefix if that  exact prefix has not  already been
installed in  the forwarding table by  routing.  That is, LDP  cannot bind a
label to a prefix  unless that prefix is an exact match  to one installed by
routing. 

This  prohibition  didn't   get  there  by  accident.   It   was  put  there
deliberately to  ensure that LDP  cannot "second guess" routing  by creating
paths which differ  from the paths created by routing.   After all, LDP does
not have  its own procedures for  deciding which path an  LSP should follow;
it depends upon routing for this.

The authors  propose to remove this  prohibition, and instead  impose a more
sophisticated restriction, which they express as follows:

   "This procedure  is similar to  the one defined  in [LDP] but  performs a
   longest match when searching the FEC element in the RIB ...

   With this new longest match label mapping procedure, a LSR receiving 
   a Label Mapping message from a neighbor LSR for a Prefix Address FEC 
   Element SHOULD use the label for MPLS forwarding if its routing table 
   contains an entry that matches the FEC Element and the advertising 
   LSR is a next hop to reach the FEC. If so, it SHOULD advertise the 
   FEC Element and a label to its LDP peers. 
    
   By 'matching FEC Element', one should understand an IP longest match." 

The intention is to maintain the requirement that LDP not alter the routing,
while allowing  LDP to install prefixes  that are not  installed by routing.
They propose  to allow LDP  to install a  prefix if and  only if there  is a
corresponding  less specific  prefix  that has  been  installed by  routing.
Further, the next  hop of the LDP-installed prefix is  constrained to be the
same as the next hop of  the less specific prefix, as determined by routing.
This should ensure that labeled packets always follow the path that has been
determined by routing to be the best path. 
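
To make the difference between the two rules concrete, here is a minimal
sketch of the acceptance check, assuming a toy RIB that maps each routed
prefix to a single next hop.  The function names, the exact_only flag and
the addresses are mine, chosen for illustration; the draft does not specify
any of them.

```python
import ipaddress

def longest_match(rib, fec):
    """Return the most specific RIB prefix covering fec, or None."""
    fec = ipaddress.ip_network(fec)
    covering = [p for p in rib if fec.subnet_of(p)]
    return max(covering, key=lambda p: p.prefixlen, default=None)

def accept_mapping(rib, fec, advertiser, exact_only=False):
    """Decide whether a label mapping for fec from advertiser may be used.

    rib maps ip_network -> next-hop name (a simplified, hypothetical RIB).
    exact_only=True models the existing LDP rule; False models the
    longest-match rule quoted above.
    """
    fec_net = ipaddress.ip_network(fec)
    if exact_only:
        match = fec_net if fec_net in rib else None
    else:
        match = longest_match(rib, fec)
    # The advertising LSR must also be the next hop for the matched route.
    return match is not None and rib[match] == advertiser

rib = {ipaddress.ip_network("10.1.0.0/16"): "P2"}

print(accept_mapping(rib, "10.1.2.3/32", "P2", exact_only=True))  # False
print(accept_mapping(rib, "10.1.2.3/32", "P2"))                   # True
print(accept_mapping(rib, "10.1.2.3/32", "P3"))                   # False
```

Under the existing rule the /32 is rejected because routing has installed
only the /16; under the longest-match rule it is accepted, but only from the
neighbor that is the next hop for the /16.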

It must  be understood that  this procedure adds considerable  complexity to
the maintenance of the forwarding tables.

In  existing LDP deployments,  all LDP  does with  regard to  the forwarding
table  is to supply  the label  for a  given <prefix,  next hop>  pair which
routing has installed  in the forwarding table.  In the  proposal, LDP has a
number of additional jobs to do  with regard to the forwarding table.  Let's
define some terminology: 

- Let RP be a prefix distributed by routing. 
- Let NH(RP) be the next hop for RP.
- Let LP(i) be a prefix distributed by neighbor i using LDP. 
- Let RP(i) be the set of LDP-distributed prefixes from neighbor i for which
  RP is the longest match. 

If NH(RP) changes from  i to j, things aren't too hard,  as long as RP(i) is
identical to RP(j).  Then you just  have to change the next hop and outgoing
label for each prefix in RP(i). 

It's a  bit more  complicated if  RP(j) is different  from RP(i).   Now some
prefixes need to  get their nh and label changed, but  others need to either
be put into or pulled out of the forwarding table entirely. 

It's even more  complicated if routing installs a new  prefix, RP2, which is
more specific than  an already installed prefix RP1,  but less specific than
some members of RP1(i).  (And possibly more specific than some other members
of RP1(i)).   Now one has  to figure out  which members of RP1(i)  remain in
RP1(i) and which are now in RP2(i).  In fact, one has to figure this out for
each of the neighbors.
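
The repartitioning described above can be sketched as follows, for a single
neighbor i.  The prefixes and the partition() helper are assumptions made up
for the example; a real implementation would have to do this incrementally
and per neighbor rather than recomputing everything.

```python
import ipaddress

def partition(routed, ldp_prefixes):
    """Map each routed prefix RP to RP(i): the LDP-distributed prefixes
    from one neighbor whose longest match among the routed prefixes is RP."""
    sets = {rp: set() for rp in routed}
    for lp in ldp_prefixes:
        covering = [rp for rp in routed if lp.subnet_of(rp)]
        if covering:
            sets[max(covering, key=lambda rp: rp.prefixlen)].add(lp)
    return sets

net = ipaddress.ip_network
rp1, rp2 = net("10.0.0.0/8"), net("10.1.0.0/16")

# Prefixes received via LDP from neighbor i.  The /12 is less specific
# than RP2, so it must stay with RP1 even after RP2 is installed.
from_i = [net("10.1.2.3/32"), net("10.2.0.1/32"), net("10.0.0.0/12")]

before = partition([rp1], from_i)        # only RP1 installed by routing
after = partition([rp1, rp2], from_i)    # routing then installs RP2

moved = before[rp1] - after[rp1]
print(moved)          # members that left RP1(i)
print(after[rp2])     # members now in RP2(i)
```

Every member that moves may need a new next hop and outgoing label, since
NH(RP2) need not equal NH(RP1).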

Certainly this  could all be made to  work, but it's not  exactly the simple
change that  the draft suggests it  is.  It would  be nice to see  section 4
replaced with  a much  more comprehensive treatment  of how  the interaction
between routing events  and LDP events affects the  forwarding tables.  I am
not at all sure that I yet understand all the implications.

The draft also needs a lot more consideration of "corner cases".

For instance, what if BGP and LDP each distribute the same /32 prefix, while
IGP  distributes  a less  specific  prefix?   It's  not immediately  obvious
whether the LDP-distributed prefix  is supposed to match the IGP-distributed
less  specific  prefix, thereby  replacing  the  BGP-distributed prefix,  or
whether the LDP-distributed prefix  is supposed to match the BGP-distributed
prefix.   It's  also   not  clear  just  what  you'd  have   to  do  if  the
LDP-distributed prefix matched a BGP-distributed prefix, because the notions
of "next  hop" are different.  To  add a further  complication, suppose that
the BGP-distributed  prefix is  really a labeled-IPv4  prefix.  What  is the
relation between  the BGP-distributed  label and the  LDP-distributed label?
Are they both used, or not?   Questions like these need to be addressed, and
the implications of answering them one way or another need to be considered.

Here's another  case worth considering.  Consider an  intermediate P router,
P1, with neighbors P2 and P3.  Suppose:

- IGP distributes a route to prefix X. 
- Using LDP,  P2 distributes a label for  prefix Y, where X  is the "longest
  match" for Y. 
- P3 does not use LDP to distribute a label for Y.
- At time t0, the next hop at P1 for X is P2. 
- At time t1, the next hop at P1 for X is P3. 

I think this means that at time  t1, any packets which P1 receives that have
been labeled for Y will be dropped  by P1.  Why?  The more specific prefix Y
will have to  be removed from the forwarding table at  t1, as the conditions
for having it in the forwarding table no longer seem to hold, given that P3
has not  distributed a label  for the longer  prefix.  Thus the  labels that
correspond to  Y will also be  removed, and packets which  have already been
labeled for Y will find that there is no outgoing label corresponding to the
incoming label.  This means they must be dropped.
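
The scenario can be played out in a few lines, assuming a toy model of P1's
tables.  The label values, the addresses chosen for X and Y, and the
build_lfib() and forward() helpers are hypothetical; the point is only to
show the entry for Y disappearing when the next hop for X moves to P3.

```python
import ipaddress

net = ipaddress.ip_network
X, Y = net("10.1.0.0/16"), net("10.1.2.3/32")

def build_lfib(rib, ldp_bindings, local_labels):
    """Build P1's label table: incoming label -> (next hop, outgoing label).

    rib:          routed prefix -> next hop
    ldp_bindings: (neighbor, FEC) -> label received from that neighbor
    local_labels: FEC -> label that P1 advertised upstream
    All three are simplified, hypothetical structures.
    """
    lfib = {}
    for fec, in_label in local_labels.items():
        covering = [rp for rp in rib if fec.subnet_of(rp)]
        if not covering:
            continue
        nh = rib[max(covering, key=lambda rp: rp.prefixlen)]
        out_label = ldp_bindings.get((nh, fec))
        if out_label is not None:   # the next hop must have advertised the FEC
            lfib[in_label] = (nh, out_label)
    return lfib

def forward(lfib, in_label):
    """Return the forwarding action for a labeled packet, or 'drop'."""
    return lfib.get(in_label, "drop")

ldp_bindings = {("P2", Y): 200}   # P2 advertised a label for Y; P3 did not
local_labels = {Y: 100}           # P1 advertised label 100 for Y upstream

t0 = build_lfib({X: "P2"}, ldp_bindings, local_labels)
t1 = build_lfib({X: "P3"}, ldp_bindings, local_labels)

print(forward(t0, 100))   # ('P2', 200)
print(forward(t1, 100))   # drop
```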

One could dismiss this as a transient, but if the whole reason for doing any
of this is to  make node protection work, i.e., to avoid  the bad effects of
routing  transients, then  one presumably  is  not willing  to just  dismiss
transient effects.

This scenario could occur, e.g., if a given egress PE can be reached via two
different ABRs, where one of them is implementing the draft but the other is
not (or where one is configured to use the draft procedures but the other is
not).  One could  dismiss this as an operator error, but  it can also happen
if  a new  ABR comes  up and  the distribution  of routing  info  across the
network proceeds faster  than the distribution of LDP info.   Note that on a
single link, we can implement procedures  which keep the link from coming up
to routing until LDP has distributed  labels over the link, in order to keep
things like this from happening.  That's  fine as long as LDP operations are
per-link.  When  we start to  provide LDP operations that  have edge-to-edge
significance, such as using LDP to build a path for X across the network, we
don't have any similar way to synchronize the LDP and IGP info.

There may be other corner cases as well.  

If a  decision is  made to pursue  this draft,  I think it  needs to  have a
fairly comprehensive treatment of (a) all the possible interactions between
LDP and routing, and (b) the behavior of the scheme when deployed in other than
the  intended  manner.   Also, if  the  main  purpose  of  the draft  is  to
facilitate an LDP-based node protection scheme for border routers, it should
provide  a better  assurance that  it doesn't  leave a  number  of transient
conditions unprotected.

Without this additional  work, I think it is difficult  to tell whether this
is ultimately a direction worth pursuing.






  

_______________________________________________
mpls mailing list
mpls@lists.ietf.org
https://www1.ietf.org/mailman/listinfo/mpls