The MPLS WG Archive


comments on nagarajan-ccamp-mpls-packet-protection-00.txt

  • From: neil.2.harrison@bt.com
  • Date: Thu, 25 Apr 2002 12:15:15 +0100
  • Cc: mpls@UU.NET

Hi Curtis.....nice to hear from you again.  You make some important
points....please see responses in-line.

regards, Neil

Curtis Villamizar wrote 24 April 2002 19:23
> In message
> <B9571FDEBD3DD21181E500606DD5EE0514BABC19@mbddmknt01.hc.bt.com>,
> neil.2.harrison@bt.com writes:
> > 
> > 1  I noted a criticism in the mtg notes that it uses 2x the BW of one
> > LSP....well yes it is 1+1 so I'd say that's obvious.  However, I think we
> > have to place any such criticism of BW efficiency in context.  For example,
> > if I compare it with LDP as a server layer to (say) rfc2547 VPNs, then
> > because there is no relationship between a pkt's up-state QoS forwarding
> > treatment and a pkt's survivability requirements (vis-à-vis same or
> > different DS-coded pkts of *any* VPN), operators are forced to over-engineer
> > such networks and *hope* (because there is no assurance) that traffic
> > survives under failures....a factor of 2x over-engineering on some DS
> > classes is not uncommon.
> > 
> > Hence, the point I want to make here is that using BW wisely to
> > reduce/remove a complexity/problem is one thing (which I support), but
> > asking an operator to throw BW at a problem that should not really exist
> > (because the application/problem in question has only been partially
> > defined, eg just the connectivity bit of VPNs) is something else.  So any
> > criticism of BW efficiency only makes sense against the context of the
> > application/problem it is addressing IMO.
> 
> 
> 
> Point of failure protection schemes (local-protect and fast-reroute)
> provide 50msec recovery.  I'll just say FRR for short.
NH=> Yes I am aware this is their claim/target.  Totally unnecessary, and IMO
shows a lack of clear thinking/understanding of the total problem space
here......no application needs these speeds and you are going to get
prot-sw/restoration events for error bursts that would have self-cleared.
You make a rod for your back (chasing L1 SDH/Sonet ring restoration times)
by going too fast as one moves into the upper layer networks and closer to
end applications.  Lots of reasons why, many I have pointed out in previous
mails when this topic has surfaced before.  But if people are silly enough
to do this then that's up to them I guess.

> This technique
> doesn't require sending double the bandwidth.  Some implementations
> might not make the 50msec but some do.
> 
> Fully disjoint paths from the ingress yield from 1/4 to 1-2 seconds
> recovery (depending again on whose implementation) and these require
> less signaling than point of failure protection.  In our
> implementation these are called "standby" LSPs.
NH=> Now that sounds like far more sensible prot-sw/restoration times at
these network layers....you can do a great deal with a topology database
*before* a failure occurs ;-)
You still need knowledge of the duct layer however to make it work....once
you 'lease' the lower layers this info goes......and that's why the
so-called peer-model in GMPLS can never scale, ie commercial non-starter
even if it was technically feasible.
> 
> Reservations on the backups (using either FRR or standby) at a
> numerically higher setup/hold priority can help avoid congestion (or
> completely avoid it) during the failure without impacting the traffic
> engineering of the primary path (on the numerically lower priority).
NH=> I have said it before but I will do so again......please make sure any
pre-emption/bumping scheme can be disabled.  It's largely a waste of time:
At low loads it's a non-issue, and at high loads (which is strictly the only
time you really need it) it rapidly becomes unpredictable.  Should not be a
surprise, this is typical network behaviour (any type) at the 'loading
knee'....same considerations apply to DS.  I am also not saying this from
some theoretical viewpoint but from bitter practical experience of such
schemes.
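For readers less familiar with the priority scheme Curtis describes, a minimal sketch follows. It assumes the RSVP-TE (RFC 3209) semantics: setup and holding priorities run from 0 (best) to 7 (worst), and a new LSP may preempt an existing one only if its setup priority is numerically lower than the existing LSP's holding priority. Everything else (the `Lsp` record, the `admit` helper, the lowest-priority-first victim order and the `preemption_enabled` switch, which mirrors Neil's request that bumping can be disabled) is hypothetical illustration, not any vendor's implementation.

```python
from dataclasses import dataclass


@dataclass
class Lsp:
    name: str
    setup_priority: int   # 0 = best, 7 = worst (RFC 3209 range)
    hold_priority: int    # 0 = best, 7 = worst
    bandwidth: float      # hypothetical units, e.g. Mb/s


def can_preempt(new: Lsp, existing: Lsp) -> bool:
    # Preemption is allowed only when the new LSP's setup priority is
    # numerically lower (better) than the existing LSP's holding priority.
    return new.setup_priority < existing.hold_priority


def admit(new: Lsp, link_capacity: float, reserved: list,
          preemption_enabled: bool = True):
    """Hypothetical admission check on one link.

    Returns the list of LSPs to bump (empty if the new LSP fits as-is),
    or None if the new LSP cannot be admitted.
    """
    free = link_capacity - sum(l.bandwidth for l in reserved)
    if new.bandwidth <= free:
        return []
    if not preemption_enabled:
        return None
    # Consider only preemptable LSPs, worst holding priority first.
    candidates = sorted((l for l in reserved if can_preempt(new, l)),
                        key=lambda l: l.hold_priority, reverse=True)
    victims = []
    for l in candidates:
        if new.bandwidth <= free:
            break
        victims.append(l)
        free += l.bandwidth
    return victims if new.bandwidth <= free else None
```

With primaries at 0/0 and backups at 7/7 (assumed values), a primary that needs bandwidth held by a backup bumps the backup, while a backup can never displace a primary; turning `preemption_enabled` off simply rejects anything that does not fit.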
> 
> The ISPs that I've talked to about the topic of restoration indicate
> that although fast restoration is highly desirable for some services,
NH=> But I'll bet no one can *justify* 50ms!

> a strong requirement is efficient use of available bandwidth.
NH=> Yipee!  We have been saying that those who think BW comes for free are
nuts.....welcome to the real operator world.  Those who have not heeded this
have already gone, or will go, bust.

>  The
> goal is to allow the most cost effective delivery of service and keep
> the financial bottom line in the black.  This may have something to do
> with a competitive business environment but whatever the reason, the
> "throw more bandwidth at it" approach does not seem to be in favor.
NH=> It's not news to me this.....and it's about time others grasped this now
the 'silly' margins on BW have all but gone.  Anyone for GMPLS/SVC-like L1
BoD?....try getting a business case to fly on that one!
> 
> So far signaling extensions to share the backup bandwidth have been
> proposed but gone nowhere (perhaps implementation will help).  This
> would yield even better use of resources (or better avoidance of
> congestion if overbooking backup bandwidth was eliminated).  Of
> course, if the payload is non-IP, overbooking backups may not be an
> option so this then would just yield better use of resources.
> 
> Of course, FRR requires that the point of failure be able to detect
> its own failure which may not be the case for optical devices.
> Doubling the bandwidth is a steep price to pay to satisfy the
> restoration needs.
NH=> I agree Curtis.  That is exactly why I said for *critical applications*
(eg control/management channels) the proposals could be attractive.....and
it's also simple/elegant, which usually implies 'sensible'.

>  If the cost of WDM port plus optical switch port
> is half that of a router port, that might be fine.  It still might
> make a lot more economic sense to use FRR to provide restoration at
> the PSC capable boxes outside the optical domain which will generally
> be plenty fast enough.
> 
> IMHO - there is no need for this draft.  In the end the ISPs/carriers
> vote with their dollars and I haven't seen any "yes" votes of that
> sort so I won't be coding this quite yet.
> 
> As far as VPN is concerned, some providers are using (at least in
> trial if not production) LDP over MPLS/TE and using the MPLS/TE
> traffic engineering and fast recovery capabilities.
NH=> This is sort-of what I have been thinking for some time too.  The key
problem is LDP as the bottom/server layer of LSPs *if* one aggregates over
all VPNs as I pointed out;  in particular, it does not allow one to
differentiate a pkt's QoS and survivability attributes.....these are quite
different.   In the case of VPNs I would argue strongly that it is the trail
object (ie the LSP) that has to carry the survivability semantics.....but
it can't do this under like-DS-class aggregation in LDP......so we have to
throw BW at the problem.  I don't mind throwing BW at problems *wisely*, to
get rid of the problem or reduce its complexity *if* there is no alternative
technology solution......but I don't like throwing BW at the problem when
the basis of the problem is not addressable in technology X, but there is
technology Y that can sort it.....and perhaps gives other benefits too.

> This scales well
> if the network is divided into a core and regions keeping the number
> of LSPs manageable and can work well if the nodes (routers) at the
> borders of the core and regions are very reliable.
NH=> Remember this.....it's one thing targeting a smallish niche market and
it's a very different thing scaling to massive public network volumes.....the
network solutions are usually not the same, and the NMS/OSS requirements
are radically different.  'Enterprise sourced' solutions usually can't
cut-it for large operators when looking for the massive volumes...and we
also need automatic defect detection/handling as you know I have been
advocating for some time, as this too is a 'sign' of an ability/desire to
address the large operator case.
> 
> Curtis
>