The MPLS WG Archive

LSP failure detection

  • From: neil.2.harrison@bt.com
  • Date: Fri, 10 Nov 2000 08:22:05 -0000
  • Cc: david.charlap@marconi.com, mpls@UU.NET

Hi Kireeti,
	<snipped>
> Two related points: first, if an LSP (for whatever reason) runs over
> unprotected links (say GigE), you would want it to recover as fast as
> possible, because L1 just cannot recover.
	NH=> If one has access to the L1 section/media coding and can detect
failure at this level (eg laser dead) then that's fine.  To me, however,
this is restoration at the L1 level (at whatever the trail at this level
is).  If, on the other hand, one is using the section/media trail
information to trigger a restoration in a higher level client (which will
generally not be congruent with the lower server trail, ie 'server layer
trails = client layer links') then I would be very wary of the general
applicability of such a technique.
	The most important observation is that this model cannot be
generalised (ie it needs intimate knowledge of all client/server trail
relationships down to the duct network layer), so once you step outside the
domain of 'total control' (ie across an inter-operator NNI) one will not
know the lower server layer trail composition.  This limitation applies to
all situations where (i) an operator has to lease lower layer services from
another operator (eg to provide a global VPN solution for a given customer
say) or (ii) looking at the SME->mass market end of the spectrum, the
customers who wish to communicate are served by access points of different
operator domains.  I would not regard these cases as unusual; in fact I
consider them the norm, and they are a factor of global
interoperability/applicability, so they cannot be ignored.
	A 2nd point is that using the characteristic information (of the
trail) in one layer network (be it server or client) to effect actions in
another layer network constitutes an architectural violation.  I agree it
can be made to work, but the penalty one pays is that the 2 layer networks
have now become intrinsically bound together and so cannot evolve (with the
ultimate case of being removed/replaced over time) without impacting the
other layer network.  So one can create backwards compatibility problems by
violating this architectural principle.

> Second, a unified control plane and a common view of the topology is
> needed to make these kinds of decisions.
	NH=> I totally agree.  Hence the observations above, ie the model
does not apply outside of an NNI or where one is leasing a server layer from
another operator.  But these are the facts of life for operators and so
cannot be ignored.  This is indeed why, when we were discussing the
peer/overlay models for the OTN, BT was very clear as to why we need the
overlay model and wanted to point out the limited applicability of the peer
model.
> Say the above LSP runs over
> 4 segments, two of which are SONET rings, one of which has link local
> protection (a la LMP) and the last is unprotected.  The ideal solution
> is to protect the last segment with MPLS fast reroute, and to leave
> recovery on the other segments to L1 mechanisms.  But to get such fine
> grain protection schemes, and to *not* get the "unnecessary hits" from
> double protection (as well as the network inefficiency) requires that
> MPLS knows what L1 is doing.
	NH=> I am not sure this model is a correct interpretation of the
situation, Kireeti.  In the case given above there *should*, in my view, be
2 LSPs:
	-	one is the trail that goes end-end (ie between its trail
termination points across *all* the 4 lower server layer trails....these are
*not* segments BTW, these are layer network entities in their own right);
	-	one is a server layer (LSP) trail on the *last* partition of
the client end-end LSP.
	It is this last server layer LSP that needs to provide the prot-sw
service you are describing......it should not be a function of the partition
of the client LSP.
	Not sure how well I described that, but I hope what I am trying to
say is clear.
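
The two-LSP arrangement described above can be sketched in Python. This is
only an illustration under stated assumptions: the Trail class, the
max_protection_depth function and all trail names are hypothetical and do
not come from the thread or from any MPLS implementation. Each trail lists
the server layer trails that act as its links, and the function counts how
many protection-switching layers are stacked on the worst-case partition.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass
class Trail:
    """A trail in some layer network; its links are server layer trails."""
    name: str
    layer: str
    protected: bool = False  # True if this trail performs its own prot-sw
    links: List["Trail"] = field(default_factory=list)


def max_protection_depth(trail: Trail) -> int:
    """Largest number of protecting trails stacked on any one partition."""
    own = 1 if trail.protected else 0
    below = max((max_protection_depth(s) for s in trail.links), default=0)
    return own + below


# Hypothetical topology: the end-end client LSP rides on four server trails.
# Only the last partition gets a protected server layer LSP over plain GigE.
gige = Trail("gige-span", "L1")
server_lsp = Trail("server-lsp", "MPLS", protected=True, links=[gige])
client_lsp = Trail("client-lsp", "MPLS", links=[
    Trail("sonet-ring-1", "L1", protected=True),
    Trail("sonet-ring-2", "L1", protected=True),
    Trail("lmp-link", "L1", protected=True),
    server_lsp,
])

# One protecting layer per partition when the client LSP is left unprotected.
depth_as_described = max_protection_depth(client_lsp)

# Protecting the client LSP as well stacks a second layer on every partition.
depth_double = max_protection_depth(replace(client_lsp, protected=True))
```

In this sketch the arrangement with a separate server layer LSP gives a
depth of 1 on every partition, whereas also protecting the end-end client
LSP gives a depth of 2, which is the double protection case raised in the
quoted text.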

> I agree with the high order bit, though: having fast protection at
> multiple layers is asking for trouble.
	NH=>Every time we have studied prot-sw in BT we come to the same
conclusions......minimise the layers at which prot-sw is deployed....ideally
just 2:  L1 and fast, applications interfacing and (relatively) slow.  And
as I said in a few prior mails on this topic.....don't go too fast at the
application interfacing level.  It is not needed, it will cause problems and
increase costs (since every lower layer where prot-sw is deployed has to go
increasingly faster), you will get prot-sw for events that self-heal, and if
you use bumping at high utilisations then you will be in big trouble (N hits
for 1), etc.
	<snipped>
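
The cost argument above (each lower protecting layer has to be faster than
the one above it) can be illustrated with a small Python sketch. The
function name layer_budgets, the guard_factor parameter and the 4000 ms
figure are assumptions chosen purely for illustration; the thread does not
specify any ratio between layers.

```python
def layer_budgets(top_budget_ms, layers, guard_factor=2.0):
    """Return (layer, restoration budget in ms) pairs, top layer first.

    Assumption: each server layer must finish restoring guard_factor times
    faster than its client, so the client can wait before acting itself.
    """
    if guard_factor <= 1.0:
        raise ValueError("guard_factor must be greater than 1")
    budgets = []
    budget = float(top_budget_ms)
    for name in layers:
        budgets.append((name, budget))
        budget /= guard_factor  # the next layer down gets a tighter budget
    return budgets


# Two protecting layers, as recommended in the reply above.
two_layer = layer_budgets(4000, ["application interfacing", "L1"])

# Four protecting layers with the same top-level budget.
four_layer = layer_budgets(
    4000, ["application interfacing", "MPLS", "SONET/SDH", "optical"]
)
```

Under these assumed numbers the bottom layer is left with 2000 ms when two
layers protect and only 500 ms when four layers protect, which shows how
each added protecting layer tightens the requirement on the layers below.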

> > NH=> Voice is one application that certainly does not need a sub-1s
> > restoration time.
> 
> Don't know about this.
	NH=> I do......but it's common sense anyway.  For voice the critical
observation is how long will someone hold onto a call that has died before
giving up?  I would suggest it is probably of the order of 4-8s for maybe
90% of the population (in practice I think the range will extend further at
the higher end).
>   I do know that we (and other vendors) have
> a lot of customers asking us for sub-second (even 50ms) MPLS fast
> reroute.
	NH=> But do you know why?  I would suggest it's historical: someone
set a precedent (due to the particular nature of the L1 technology where it
originated) and this now sets the benchmark for other technologies to hit.
Well that's OK if you go into it with your eyes open and recognise you may
be making a big mistake, incurring totally unnecessary costs and making poor
restoration decisions that incur higher performance penalties in practice.
> Even if (big if) voice doesn't need sub-second recovery,
> there will be applications that do
	NH=> Can someone please tell me what these are?  I think we should
start by listing them.  If we can prove these exist, and are the majority,
then let's go for these very fast restorations as the norm.  If, on the other
hand, these either don't exist or are not the norm then (i) don't do it or
(ii) just target those that do (there are several ways one can do this, eg
dual pre-calc routes) respectively
> , and many find the option to use
> MPLS fast reroute over GigE or unprotected SONET/SDH gear appealing.
	NH=> I don't want to stop anyone going down this road if they want.
That is their decision.  All I can do is point out some logic/experience
that you may want to consider before departing.