The MPLS WG Archive
LSP failure detection
Hi Kireeti,

<snipped>

> Two related points: first, if an LSP (for whatever reason) runs over
> unprotected links (say GigE), you would want it to recover as fast as
> possible, because L1 just cannot recover.

NH=> If one has access to the L1 section/media coding and can detect failure at this level (eg laser dead) then that's fine. To me, however, this is restoration at the L1 level (at whatever the trail at this level is). If, on the other hand, one is using the section/media trail information to trigger a restoration in a higher level client (which will generally not be congruent with the lower server trail, ie 'server layer trails = client layer links') then I would be very wary of the general applicability of such a technique.

The most important observation is that this model cannot be generalised (ie it needs intimate knowledge of all client/server trail relationships down to the duct network layer), so once you step outside the domain of 'total control' (ie across an inter-operator NNI) one will not know the lower server layer trail composition. So this applies to all situations where (i) an operator has to lease lower layer services from another operator (eg to provide a global VPN solution for a given customer) or (ii) looking at the SME->mass market end of the spectrum, the customers who wish to communicate are served by access points in different operator domains. I would not regard these cases as unusual; in fact I consider them the norm. They are a factor of global interoperability/applicability and so cannot be ignored.

A 2nd point is that using the characteristic information (of the trail) in one layer network (be it server or client) to effect actions in another layer network constitutes an architectural violation.
I agree it can be made to work, but the penalty one pays is that the 2 layer networks have now become intrinsically bound together and so neither can evolve (with the ultimate case of one being removed/replaced over time) without impacting the other layer network. So one can create backwards compatibility problems by violating this architectural principle.

> Second, a unified control plane and a common view of the topology is
> needed to make these kinds of decisions.

NH=> I totally agree. Hence the observations above, ie the model does not apply across an NNI or where one is leasing a server layer from another operator. But these are the facts of life for operators and so cannot be ignored. This is indeed why, when we were discussing the peer/overlay models for the OTN, BT was very clear as to why we need the overlay model and wanted to point out the limited applicability of the peer model.

> Say the above LSP runs over
> 4 segments, two of which are SONET rings, one of which has link local
> protection (a la LMP) and the last is unprotected. The ideal solution
> is to protect the last segment with MPLS fast reroute, and to leave
> recovery on the other segments to L1 mechanisms. But to get such fine
> grain protection schemes, and to *not* get the "unnecessary hits" from
> double protection (as well as the network inefficiency) requires that
> MPLS knows what L1 is doing.

NH=> I am not sure this model is a correct interpretation of the situation, Kireeti. In the case given above there *should*, in my view, be 2 LSPs required:
- one is the trail that goes end-end (ie between its trail termination points across *all* the 4 lower server layer trails....these are *not* segments BTW, these are layer network entities in their own right);
- one is a server layer (LSP) trail on the *last* partition of the client end-end LSP.
It is this last server layer LSP that needs to provide the prot-sw service you are describing......it should not be a function of the partition of the client LSP. Not sure how well I described that, but I hope what I am trying to say is clear.

> I agree with the high order bit, though: having fast protection at
> multiple layers is asking for trouble.

NH=> Every time we have studied prot-sw in BT we come to the same conclusions......minimise the layers at which prot-sw is deployed....ideally just 2: L1 and fast, applications interfacing and (relatively) slow. And as I said in a few prior mails on this topic.....don't go too fast at the application interfacing level. It is not needed, it will cause problems and increase costs (since every lower layer where prot-sw is deployed has to go increasingly faster), you will get prot-sw for events that self-heal, if you use bumping at high utilisations then you will be in big trouble (N hits for 1), etc.

<snipped>

> > NH=> Voice is one application that certainly does not need a sub-1s
> > restoration time.
>
> Don't know about this.

NH=> I do......but it's common sense anyway. For voice the critical observation is: how long will someone hold onto a call that has died before giving up? I would suggest it is probably of the order of 4-8s for maybe 90% of the population (in practice I think the range will extend further at the higher end).

> I do know that we (and other vendors) have
> a lot of customers asking us for sub-second (even 50ms) MPLS fast
> reroute.

NH=> But do you know why? I would suggest it's historical: someone set a precedent (due to the particular nature of the L1 technology where it originated) and this now sets the benchmark for other technologies to hit. Well that's OK if you go into it with your eyes open and recognise you may be making a big mistake, incurring totally unnecessary costs and making poor restoration decisions that incur higher performance penalties in practice.
> Even if (big if) voice doesn't need sub-second recovery,
> there will be applications that do

NH=> Can someone please tell me what these are? I think we should start by listing them. If we can prove these exist, and are the majority, then let's go for these very fast restorations as the norm. If, on the other hand, these either don't exist or are not the norm then (i) don't do it or (ii) just target those that do (there are several ways one can do this, eg dual pre-calc routes) respectively.

> , and many find the option to use
> MPLS fast reroute over GigE or unprotected SONET/SDH gear appealing.

NH=> I don't want to stop anyone going down this road if they want. That is their decision. All I can do is point out some logic/experience that you may want to consider before departing.
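
For reference, the 4-segment example quoted above (two SONET rings, one link-local protected segment, one unprotected) can be written down as a minimal sketch. The names `Segment`, `l1_protection` and `segments_needing_mpls_frr` are hypothetical and used only for illustration. The sketch assumes the L1 protection status of every segment is known to whoever computes the answer, which is exactly the single-domain 'total control' assumption questioned in the replies above; across an inter-operator NNI that field would simply be unknown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """One hop of the example LSP (hypothetical model, not a real API)."""
    name: str
    # Lower-layer protection mechanism, or None if the segment is unprotected.
    l1_protection: Optional[str]


def segments_needing_mpls_frr(path: List[Segment]) -> List[str]:
    """Return the names of segments with no lower-layer protection.

    Only these segments would get MPLS fast reroute; the others are left
    to L1 recovery, so that no segment is protected at two layers at once.
    """
    return [seg.name for seg in path if seg.l1_protection is None]


# The example from the quoted text: 4 segments, only the last unprotected.
path = [
    Segment("seg1", "SONET ring"),
    Segment("seg2", "SONET ring"),
    Segment("seg3", "link-local protection"),
    Segment("seg4", None),
]

print(segments_needing_mpls_frr(path))  # ['seg4']
```

In the 2-LSP view argued for above, the one segment returned is where a separate server layer LSP would provide the prot-sw service, rather than it being a property of a partition of the client LSP.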