The MPLS WG Archive: comments on nagarajan-ccamp-mpls-packet-protection-00.txt
Hi Curtis.....nice to hear from you again. You make some important points....please see responses in-line.

regards, Neil

Curtis Villamizar wrote 24 April 2002 19:23
> In message <B9571FDEBD3DD21181E500606DD5EE0514BABC19@mbddmknt01.hc.bt.com>,
> neil.2.harrison@bt.com writes:
> >
> > 1 I noted a criticism in the mtg notes that it uses 2x the BW of one
> > LSP....well yes it is 1+1 so I'd say that's obvious. However, I think we
> > have to place any such criticism of BW efficiency in context. For example,
> > if I compare it with LDP as a server layer to (say) rfc2547 VPNs, then
> > because there is no relationship between a pkt's up-state QoS forwarding
> > treatment and a pkt's survivability requirements (vis-à-vis same or
> > different DS-coded pkts of *any* VPN), operators are forced to over-engineer
> > such networks and *hope* (because there is no assurance) that traffic
> > survives under failures....a factor of 2x over-engineering on some DS
> > classes is not uncommon.
> >
> > Hence, the point I want to make here is that using BW wisely to
> > reduce/remove a complexity/problem is one thing (which I support), but
> > asking an operator to throw BW at a problem that should not really exist
> > (because the application/problem in question has only been partially
> > defined, eg just the connectivity bit of VPNs) is something else. So any
> > criticism of BW efficiency only makes sense against the context of the
> > application/problem it is addressing IMO.
> >
>
> Point of failure protection schemes (local-protect and fast-reroute)
> provide 50msec recovery. I'll just say FRR for short.

NH=> Yes I am aware this is their claim/target. Totally unnecessary, and IMO shows a lack of clear thinking/understanding of the total problem space here......no application needs these speeds and you are going to get prot-sw/restoration events for error bursts that would have self-cleared.
You make a rod for your own back (chasing L1 SDH/Sonet ring restoration times) by going too fast as one moves into the upper layer networks and closer to end applications. Lots of reasons why, many of which I have pointed out in previous mails when this topic has surfaced before. But if people are silly enough to do this then that's up to them I guess.

> This technique doesn't require sending double the bandwidth. Some
> implementations might not make the 50msec but some do.
>
> Fully disjoint paths from the ingress yield from 1/4 to 1-2 seconds
> recovery (depending again on whose implementation) and these require
> less signaling than point of failure protection. In our
> implementation these are called "standby" LSPs.

NH=> Now that sounds like far more sensible prot-sw/restoration times at these network layers....you can do a great deal with a topology database *before* a failure occurs ;-) You still need knowledge down to the duct layer however to make it work....once you 'lease' the lower layers this info goes......and that's why the so-called peer-model in GMPLS can never scale, ie commercial non-starter even if it was technically feasible.

>
> Reservations on the backups (using either FRR or standby) at a
> numerically higher setup/hold priority can help avoid congestion (or
> completely avoid it) during the failure without impacting the traffic
> engineering of the primary path (on the numerically lower priority).

NH=> I have said it before but I will do so again......please make sure any pre-emption/bumping scheme can be disabled. It's largely a waste of time: at low loads it's a non-issue, and at high loads (which is strictly the only time you really need it) it rapidly becomes unpredictable. Should not be a surprise, this is typical network behaviour (any type) at the 'loading knee'....same considerations apply to DS. I am also not saying this from some theoretical viewpoint but from bitter practical experience of such schemes.
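For readers following the setup/hold priority exchange above, here is a minimal sketch of how a pre-emption ("bumping") check on a single link might look, including the disable switch asked for. It assumes the RSVP-TE convention of priorities 0..7 where numerically lower is better; the `Lsp` class, the `admit` function and the `preemption_enabled` flag are hypothetical names for illustration only, not anyone's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Lsp:
    name: str
    setup_priority: int  # assumed range 0 (best) .. 7 (worst)
    hold_priority: int   # assumed range 0 (best) .. 7 (worst)
    bandwidth: float     # reserved bandwidth, arbitrary units


def admit(capacity: float, existing: List[Lsp], new: Lsp,
          preemption_enabled: bool = True) -> Tuple[bool, List[Lsp]]:
    """Return (admitted, bumped LSPs) for a reservation on one link."""
    free = capacity - sum(lsp.bandwidth for lsp in existing)
    if new.bandwidth <= free:
        return True, []   # fits without touching anyone
    if not preemption_enabled:
        return False, []  # operator has switched bumping off

    # Only LSPs whose hold priority is numerically higher (worse) than
    # the new LSP's setup priority may be bumped; worst ones go first.
    candidates = sorted(
        (lsp for lsp in existing if new.setup_priority < lsp.hold_priority),
        key=lambda lsp: lsp.hold_priority,
        reverse=True,
    )
    victims: List[Lsp] = []
    for lsp in candidates:
        victims.append(lsp)
        free += lsp.bandwidth
        if new.bandwidth <= free:
            return True, victims
    return False, []      # still does not fit, so nobody is bumped


# Primary at the numerically lower priority, backup at the higher one.
link = [Lsp("primary", 3, 3, 60.0), Lsp("backup", 6, 6, 30.0)]
print(admit(100.0, link, Lsp("primary2", 3, 3, 30.0)))
print(admit(100.0, link, Lsp("primary2", 3, 3, 30.0), preemption_enabled=False))
```

With the flag off the link simply refuses the new reservation, which is the predictable behaviour being asked for at high load.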
>
> The ISPs that I've talked to about the topic of restoration indicate
> that although fast restoration is highly desirable for some services,

NH=> But I'll bet no one can *justify* 50ms!

> a strong requirement is efficient use of available bandwidth.

NH=> Yippee! We have been saying that those who think BW comes for free are nuts.....welcome to the real operator world. Those who have not heeded this have already gone, or will go, bust.

> The goal is to allow the most cost effective delivery of service and keep
> the financial bottom line in the black. This may have something to do
> with a competitive business environment but whatever the reason, the
> "throw more bandwidth at it" approach does not seem to be in favor.

NH=> This is not news to me.....and it's about time others grasped this now that the 'silly' margins on BW have all but gone. Anyone for GMPLS/SVC-like L1 BoD?....try getting a business case to fly on that one!

>
> So far signaling extensions to share the backup bandwidth have been
> proposed but gone nowhere (perhaps implementation will help). This
> would yield even better use of resources (or better avoidance of
> congestion if overbooking backup bandwidth was eliminated). Of
> course, if the payload is non-IP overbooking backups may not be an
> option so this then would just yield better use of resources.
>
> Of course, FRR requires that the point of failure be able to detect
> its own failure which may not be the case for optical devices.
> Doubling the bandwidth is a steep price to pay to satisfy the
> restoration needs.

NH=> I agree Curtis. That is exactly why I said for *critical applications* (eg control/management channels) the proposals could be attractive.....and it's also simple/elegant, which usually implies 'sensible'.

> If the cost of WDM port plus optical switch port
> is half that of a router port, that might be fine.
> It still might make a lot more economic sense to use FRR to provide
> restoration at the PSC capable boxes outside the optical domain which
> will generally be plenty fast enough.
>
> IMHO - there is no need for this draft. In the end the ISPs/carriers
> vote with their dollars and I haven't seen any "yes" votes of that
> sort so I won't be coding this quite yet.
>
> As far as VPN is concerned, some providers are using (at least in
> trial if not production) LDP over MPLS/TE and using the MPLS/TE
> traffic engineering and fast recovery capabilities.

NH=> This is sort-of what I have been thinking for some time too. The key problem is LDP as the bottom/server layer of LSPs *if* one aggregates over all VPNs as I pointed out; in particular, it does not allow one to differentiate a pkt's QoS and survivability attributes.....these are quite different. In the case of VPNs I would argue strongly that it is the trail object (ie the LSP) that has to carry the survivability semantics.....but it can't do this under like-DS-class aggregation in LDP......so we have to throw BW at the problem. I don't mind throwing BW at problems *wisely*, to get rid of the problem or reduce its complexity *if* there is no alternative technology solution......but I don't like throwing BW at the problem when the basis of the problem is not addressable in technology X, but there is technology Y that can sort it.....and perhaps gives other benefits too.

> This scales well
> if the network is divided into a core and regions keeping the number
> of LSPs manageable and can work well if the nodes (routers) at the
> borders of the core and regions are very reliable.

NH=> Remember this.....it's one thing targeting a smallish niche market and it's a very different thing scaling to massive public network volumes.....the network solutions are usually not the same, and the NMS/OSS requirements are radically different.
'Enterprise sourced' solutions usually can't cut it for large operators when looking for the massive volumes...and we also need automatic defect detection/handling, as you know I have been advocating for some time, as this too is a 'sign' of an ability/desire to address the large operator case.

>
> Curtis
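A rough illustration of the bandwidth argument running through this thread: the sketch below compares per-link backup reservations when every backup is dedicated (as with 1+1, where the full bandwidth is committed on the second path) against backups that share capacity. It assumes single-link failures only, so each backup link is sized for the worst single primary-link failure; the link names, the dictionary layout and both function names are made up for the example and do not come from any proposed signaling extension.

```python
from collections import defaultdict
from typing import Dict, List

# Each LSP is a dict with hypothetical keys:
#   "bw": bandwidth, "primary": primary-path links, "backup": backup-path links
LspSpec = Dict[str, object]


def dedicated_backup(lsps: List[LspSpec]) -> Dict[str, float]:
    """Every backup reserves its full bandwidth on every backup link."""
    need: Dict[str, float] = defaultdict(float)
    for lsp in lsps:
        for link in lsp["backup"]:
            need[link] += lsp["bw"]
    return dict(need)


def shared_backup(lsps: List[LspSpec]) -> Dict[str, float]:
    """Size each backup link for the worst single primary-link failure."""
    # per_failure[backup_link][failed_link] = bandwidth that would be
    # rerouted onto backup_link if failed_link went down
    per_failure = defaultdict(lambda: defaultdict(float))
    for lsp in lsps:
        for failed in lsp["primary"]:
            for link in lsp["backup"]:
                per_failure[link][failed] += lsp["bw"]
    return {link: max(by_failure.values())
            for link, by_failure in per_failure.items()}


# Two LSPs with link-disjoint primaries whose backups both cross "c-d".
lsps = [
    {"bw": 10.0, "primary": ["a-b"], "backup": ["a-c", "c-d", "d-b"]},
    {"bw": 10.0, "primary": ["e-f"], "backup": ["e-c", "c-d", "d-f"]},
]
print(dedicated_backup(lsps)["c-d"], shared_backup(lsps)["c-d"])
```

Because the two primaries cannot fail together under the single-failure assumption, the shared scheme needs only one LSP's worth of capacity on the common link, where dedicated backups need two.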