The MPLS WG Archive


Crankback [Was: Running out of RSVP-TE bits?]

  • From: Curtis Villamizar <curtis@laptoy770.fictitious.org>
  • Date: Wed, 23 Jul 2003 10:04:06 -0400
  • cc: curtis@fictitious.org, "'mpls@uu.net'" <mpls@UU.NET>, ccamp@ops.ietf.org


Adrian,

For brevity I'll use the word "broken" to mean a flaw in the
implementation.

An isolated setup failure can only occur 1) if correct information has
not been made available through flooding or 2) if the ingress
implementation is broken.  Correct information will not be available
if a) an implementation in the path serving as midpoint is broken and
hasn't flooded correct information, b) more than one router is broken
and didn't flood the information, or c) the setup isn't really isolated
and the reflooding has yet to occur or has not completely propagated.

Careful reading of the above paragraph yields the following
conclusion: either something is broken, or the setup is not truly
isolated.  The one case where there is no implementation flaw and a
setup fails is when the setup follows a previous setup that took
resources that have not yet been reported by IGP flooding.  The
minimum case therefore involves two LSP setups, one that got the
resources and one that didn't.

In the minimum case, without crankback, reflooding is needed to
indicate that the resource is no longer available, and an error must
be returned to the ingress.  With crankback, the IGP reflooding becomes
redundant for that LSP setup.

Rarely is the minimal case of much interest compared to cases where a
feature is exercised heavily.  This is no exception.

Links occasionally fail, sometimes even routers fail.  ISPs have
reported (publicly I think, but at least privately) having more than
1,000 RSVP/TE LSPs through a single node, with close to 1,000 on a
single link.  The ingresses for the LSPs affected by such a failure
would be many tens of routers in either direction.  Topologies are
meshed but not densely meshed.  This means that these many tens of
ingresses are all going to go for the same resource, which is now the
best after the failure.  Setups occur very fast.  Often a large
percentage of the LSPs will fail admission.
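
As a rough illustration of the scenario above, the sketch below replays
setups arriving at one alternate link in arrival order.  All names and
figures are hypothetical (1,000 displaced LSPs of 10 Mb/s each, 2,500
Mb/s still advertised as reservable); it assumes every ingress acts on
the same stale advertisement before any reflooding reaches it.

```python
def replay_setups(requests, capacity):
    """Admit LSP setups in arrival order against a single link.

    requests: bandwidth asked for by each setup, in arrival order.
    capacity: reservable bandwidth actually left on the link.
    Returns (admitted, failed) counts.
    """
    remaining = capacity
    admitted = failed = 0
    for bw in requests:
        if bw <= remaining:
            remaining -= bw      # admission control accepts the LSP
            admitted += 1
        else:
            failed += 1          # admission failure, error goes upstream
    return admitted, failed


# Hypothetical numbers, not taken from any real network.
admitted, failed = replay_setups([10] * 1000, 2500)
print(admitted, failed)  # prints "250 750"
```

With these assumed figures, three quarters of the setups fail admission
on the link that every ingress independently chose as the best
alternate.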

This is an interesting case.  It is not merely hypothetical; it is a
common real world case.

The argument that only the case of an isolated failure is of interest
does not hold up.  I've already 

In message <00a501c35096$06001110$0e9c9ed9@Puppy>, "Adrian Farrel" writes:
> Curtis,
> 
> I think you have possibly missed the chief use of crankback and focused
> on a side use.
>
> The principal use is to handle LSP setup failures. When these happen
> they tend to happen one at a time (that is not a whole bunch of impacted
> LSPs that were using a link that has failed). Crankback allows nodes on
> the LSP to attempt to re-direct the LSP setup attempt without failing it
> back to the ingress, and allows the collection of more detailed failure
> information to aid the re-direction process.
>
> In a nutshell, crankback targets rapid re-direction of LSP setup
> attempts without waiting for IGP convergence. That crankback can also be
> used for handling failed LSPs is an 'interesting' additional feature.
>
> Note that in no case does crankback increase the error reporting message
> flow. It simply adds a little information to error messages already used
> in RSVP-TE or GMPLS. In fact, if crankback is in use and intermediate
> nodes are enabled as crankback repair points, crankback can reduce the
> error reporting message flows since they don't have to go all the way to
> the ingress nodes.
> 
> > > Perhaps you could summarize some of the problems crankback will cause.
> > > It would be good to get comments on these problems from the ITU-T
> > > members who have made this an ASON requirement.
> >
> > After a link fails zero to a large number of admission failures may
> > occur on a few links that are considered desirable alternates (by
> > multiple LSP ingress, each acting independently).  Crankback will
> > provide individual signals to each ingress, probably for each LSP that
> > failed.
> 
> Well now: the number of signals is not a function of crankback. It is an
> existing function of RSVP-TE error reporting.
> Crankback simply adds information to the error reporting so that the
> error may be repaired. It also recognizes that non-ingress nodes may
> wish to repair errors.
> 
> > Just using the IGP flooding is far more efficient and
> > proactively tells ingress that haven't sent LSP paths over the now
> > overloaded link.
> 
> Efficiency is in the eye of the beholder on this issue. The debate over
> flooding or signaling errors will probably go on for a while (see
> Richard Rabbat's work in ccamp on the need for a flooding mechanism for
> fail-over to a protection LSP that may be carrying extra traffic). The
> efficiency surely depends on the number of LSPs and the number of
> nodes/links in the network.
>
> To repeat: the crankback draft doesn't make any assumptions about the
> efficiency, it simply uses existing mechanisms. In fact, there is a
> significant issue with the use of a flooding mechanism. Viz. the ingress
> may not be able to tell from a description of the faulted resource which
> LSPs were impacted - suppose a component link fails, or even an
> individual laser. (You might as well say that soft pre-emption should be
> reported through flooding!)

Soft preempt applies to a single LSP, not to the state of the link.
There is only one ingress to that single LSP and that is the only
router that needs to know that that particular LSP has been preempted.
All other routers just need to know that reservable bandwidth has just
hit zero (or a small number) and that would be flooded.

> > Of course if you write specs before trying things you often get things
> > wrong, particularly where the dynamics of a protocol are concerned.
> > That's where the running code thing comes in.
> 
> I agree with you whole-heartedly. It is so important to have running code.
> Not sure why you bring it up in this context. Are you trying to imply
> something?

Only that some experience with crankback should be required for it to
move forward and if possible its effectiveness in more than the
minimal case should be considered before moving it forward.

> > I think testing in an
> > environment where lots of LSP from many ingress traverse a link that
> > fails and all route to one or two links that become overloaded should
> > be a prerequisite for crankback, with improvement demonstrated.
> 
> As in my preamble, this is not what crankback is about.

This is part of what successfully running an MPLS network is all
about, so maybe crankback is not about successfully running an MPLS
network.  :-)

> > Curtis
> >
> > ps - who cares about ASON?  or ITU for that matter?  :-)
> 
> Cheers,
> Adrian

I think the reason that we differ in opinion on what is more efficient
is that we are looking at two related but different problems.

Crankback may make sense in a call routing situation, where ingresses
routinely set up short lived connections.  In such a case the
reservable bandwidth is in a constant state of flux and it is not at
all practical to immediately flood any change.  When a resource
contention occurs, if the number of affected calls is low, crankback
may be efficient.

I am looking at MPLS as a means to set up very long lived connections
for the purpose of traffic engineering an ISP network.  These
connections are intended to never go down if that were possible.  When
the network is quiescent there are few if any setups.  Any setups
during quiescent periods would be due to residual optimizations as a
result of prior change to the network or change to reservable
bandwidth of tunnels that routinely occur (often weekly) as a result
of evaluation of prior traffic patterns.  These occasional setups
during quiescent periods are spaced far enough apart that it is very
rare not to have exact information.  It is only when a major event,
such as a link down, occurs that resource contention occurs and then
quite massive resource contention occurs over a brief period.  In this
case flooding the fact that the resource is exhausted makes good
sense.  Again, flooding every change immediately is not at all
practical, but amount-of-change and time-based triggers, combined with
an immediate trigger on resource allocation failure, make sense.
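
The combination of triggers described above can be sketched as follows.
This is only a minimal sketch: the class name, the 10% change threshold
and the 180 second interval are all assumptions made for illustration,
not values from any implementation or specification.

```python
from dataclasses import dataclass


@dataclass
class FloodTrigger:
    """Decides when a link's reservable bandwidth should be reflooded."""
    capacity: float                  # total reservable bandwidth on the link
    change_threshold: float = 0.10   # hypothetical: reflood on a 10% change
    max_interval: float = 180.0      # hypothetical: seconds between floods
    advertised: float = 0.0          # value carried in the last flood
    last_flood: float = 0.0          # time of the last flood

    def should_flood(self, available, now, admission_failed=False):
        # Immediate trigger: a resource allocation failure just occurred.
        if admission_failed:
            return True
        # Amount-of-change trigger: advertised value has drifted too far.
        drift = abs(available - self.advertised)
        if drift >= self.change_threshold * self.capacity:
            return True
        # Time-based trigger: too long since the last advertisement.
        return now - self.last_flood >= self.max_interval

    def record_flood(self, available, now):
        self.advertised = available
        self.last_flood = now
```

Under these assumptions a small change during a quiescent period is not
flooded until the timer expires, while an admission failure is flooded
at once regardless of how little the advertised value has changed.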

I personally believe that MPLS using RSVP/TE was not intended for call
routing of short lived connections and I have very strong basis in
past IETF meetings and mailing list correspondence to support that.

I also think that useless or not, we may end up implementing crankback
simply because people either think we're solving call routing of short
connections, or people mistakenly think crankback to be a good
solution for the ISP situation.  Hopefully no implementor will think
that crankback is a viable substitute for reflooding when a resource
allocation fails.

Curtis