The MPLS WG Archive

Crankback [Was: Running out of RSVP-TE bits?]
Adrian,

For brevity I'll use the word "broken" to mean a flaw in the implementation.

An isolated setup failure can only occur 1) if correct information has not been made available through flooding or 2) if the ingress implementation is broken. Correct information will not be available if 1) an implementation in the path serving as midpoint is broken and hasn't flooded correct information, 2) more than one router is broken and didn't flood the information, or 3) the setup isn't really isolated and the reflooding has yet to occur or has not completely propagated.

Careful reading of the above paragraph yields the following conclusion: either something is broken, or the setup is not truly isolated. The one case where there is no implementation flaw and a setup fails is when the setup follows a previous setup that took resources that have not yet been reported by IGP flooding. The minimum case therefore involves two LSP setups, one that got the resources and one that didn't. In the minimum case without crankback, reflooding is needed to indicate that the resource is no longer available, plus an error must be returned to the ingress. With crankback, the IGP reflooding becomes redundant for that LSP setup.

Rarely is the minimal case of much interest compared to cases where a feature is exercised heavily. This is no exception. Links occasionally fail; sometimes even routers fail. ISPs have reported (publicly I think, but at least privately) having more than 1,000 RSVP/TE LSPs through a single node, with close to 1,000 on a single link. The ingresses for the LSPs affected by such a failure would number many 10s of routers in either direction. Topologies are meshed but not densely meshed. This means that these many 10s of ingresses are all going to go for the same resource, which is now the best after the failure. Setups occur very fast. Often a large percentage of the LSPs will fail admission. This is an interesting case. It is not merely hypothetical; it is a common real-world case.
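The post-failure contention described above can be illustrated with a toy model. This is a minimal sketch, not taken from any implementation: it assumes every displaced LSP requests the same bandwidth, that all ingresses computed their paths from the same stale view advertising the alternate link's full capacity, and that the midpoint does simple first-come admission control. The names `admit_burst` and `BurstResult` and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BurstResult:
    admitted: int
    rejected: int


def admit_burst(link_capacity: float, lsp_bandwidth: float, num_lsps: int) -> BurstResult:
    """Admit a burst of setups arriving at one alternate link.

    Every ingress computed its path from the same stale TE database, which
    still advertises the link's full capacity, so every setup is attempted.
    The midpoint admits requests in arrival order until the remaining
    reservable bandwidth cannot cover one more LSP.
    """
    remaining = link_capacity
    admitted = 0
    for _ in range(num_lsps):
        if lsp_bandwidth <= remaining:
            remaining -= lsp_bandwidth
            admitted += 1
    return BurstResult(admitted=admitted, rejected=num_lsps - admitted)


# Hypothetical numbers: 600 LSPs of 10 units each displaced onto a link
# with 2,500 units of reservable bandwidth.
print(admit_burst(link_capacity=2500.0, lsp_bandwidth=10.0, num_lsps=600))
# BurstResult(admitted=250, rejected=350)
```

With these made-up numbers, 350 setups fail admission at the same link in one short burst. Each failure is reported individually toward an ingress (or a crankback repair point), whereas a single reflood of the link's reservable bandwidth would tell every ingress at once.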
The argument that only the case of an isolated failure is of interest does not hold up. I've already

In message <00a501c35096$06001110$0e9c9ed9@Puppy>, "Adrian Farrel" writes:
> Curtis,
>
> I think you have possibly missed the chief use of crankback and focused on a
> side use.
>
> The principal use is to handle LSP setup failures. When these happen they tend to happen
> one at a time (that is not a whole bunch of impacted LSPs that were using a link that has
> failed). Crankback allows nodes on the LSP to attempt to re-direct the LSP setup attempt
> without failing it back to the ingress, and allows the collection of more detailed failure
> information to aid the re-direction process.
>
> In a nutshell, crankback targets rapid re-direction of LSP setup attempts without waiting
> for IGP convergence. That crankback can also be used for handling failed LSPs is an
> 'interesting' additional feature.
>
> Note that in no case does crankback increase the error reporting message flow. It simply
> adds a little information to error messages already used in RSVP-TE or GMPLS. In fact, if
> crankback is in use and intermediate nodes are enabled as crankback repair points,
> crankback can reduce the error reporting message flows since they don't have to go all the
> way to the ingress nodes.
>
> > > Perhaps you could summarize some of the problems crankback will cause.
> > > It would be good to get comments on these problems from the ITU-T members who
> > > have made this an ASON requirement.
> >
> > After a link fails zero to a large number of admission failures may
> > occur on a few links that are considered desirable alternates (by
> > multiple LSP ingress, each acting independently). Crankback will
> > provide individual signals to each ingress, probably for each LSP that
> > failed.
>
> Well now: the number of signals is not a function of crankback. It is an existing function
> of RSVP-TE error reporting.
> Crankback simply adds information to the error reporting so that the error may be
> repaired. It also recognizes that non-ingress nodes may wish to repair errors.
>
> > Just using the IGP flooding is far more efficient and
> > proactively tells ingress that haven't sent LSP paths over the now
> > overloaded link.
>
> Efficiency is in the eye of the beholder on this issue. The debate over flooding or
> signaling errors will probably go on for a while (see Richard Rabbat's work in ccamp on
> the need for a flooding mechanism for fail-over to a protection LSP that may be carrying
> extra traffic). The efficiency surely depends on the number of LSPs and the number of
> nodes/links in the network.
>
> To repeat: the crankback draft doesn't make any assumptions about the efficiency, it
> simply uses existing mechanisms. In fact, there is a significant issue with the use of a
> flooding mechanism. Viz. the ingress may not be able to tell from a description of the
> faulted resource which LSPs were impacted - suppose a component link fails, or even an
> individual laser. (You might as well say that soft pre-emption should be reported through
> flooding!)

Soft preempt applies to a single LSP, not to the state of the link. There is only one ingress to that single LSP, and that is the only router that needs to know that that particular LSP has been preempted. All other routers just need to know that reservable bandwidth has just hit zero (or a small number), and that would be flooded.

> > Of course if you write specs before trying things you often get things
> > wrong, particularly where the dynamics of a protocol are concerned.
> > That's where the running code thing comes in.
>
> I agree with you whole-heartedly. It is so important to have running code.
> Not sure why you bring it up in this context. Are you trying to imply something?
Only that some experience with crankback should be required for it to move forward, and if possible its effectiveness in more than the minimal case should be considered before moving it forward.

> > I think testing in an
> > environment where lots of LSP from many ingress traverse a link that
> > fails and all route to one or two links that become overloaded should
> > be a prerequisite for crankback, with improvement demonstrated.
>
> As in my preamble, this is not what crankback is about.

This is part of what successfully running an MPLS network is all about, so maybe crankback is not about successfully running an MPLS network. :-)

> > Curtis
> >
> > ps - who cares about ASON? or ITU for that matter? :-)
>
> Cheers,
> Adrian

I think the reason that we differ in opinion on what is more efficient is that we are looking at two related but different problems.

Crankback may make sense in a call routing situation, where ingresses routinely set up short-lived connections. In such a case the reservable bandwidth is in a constant state of flux and it is not at all practical to flood every change immediately. When a resource contention occurs, if the number of affected calls is low, crankback may be efficient.

I am looking at MPLS as a means to set up very long-lived connections for the purpose of traffic engineering an ISP network. These connections are intended never to go down, if that were possible. When the network is quiescent there are few if any setups. Any setups during quiescent periods would be due to residual optimizations resulting from prior changes to the network, or to changes in the reservable bandwidth of tunnels that routinely occur (often weekly) as a result of evaluating prior traffic patterns. These occasional setups during quiescent periods are spaced far enough apart that it is very rare not to have exact information.
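Adrian's description of crankback repair points (an intermediate node re-directing a failed setup instead of failing it back to the ingress) can be sketched as a constrained shortest-path computation that skips the link reported as blocked. This is a hypothetical illustration only: the TE database layout, the `cspf` and `crankback_repair` functions, and the topology are assumptions, not anything defined by the crankback draft or by RSVP-TE.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical TE database: node -> [(neighbor, cost, unreserved bandwidth)].
Graph = Dict[str, List[Tuple[str, int, float]]]
Link = Tuple[str, str]


def cspf(graph: Graph, src: str, dst: str, bandwidth: float,
         excluded: Set[Link]) -> Optional[List[str]]:
    """Constrained shortest path: Dijkstra over links that have enough
    unreserved bandwidth and are not in the excluded set."""
    dist: Dict[str, float] = {src: 0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            # Walk predecessors back to the source to rebuild the path.
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost, unreserved in graph.get(node, []):
            if (node, nbr) in excluded or unreserved < bandwidth:
                continue  # pruned: reported blocked, or too little bandwidth
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return None


def crankback_repair(graph: Graph, repair_node: str, egress: str,
                     bandwidth: float, blocked: Set[Link]) -> Optional[List[str]]:
    """At a repair point, try to route the rest of the LSP around the links
    reported as blocked. None means the error must go further upstream."""
    return cspf(graph, repair_node, egress, bandwidth, blocked)


ted: Graph = {
    "A": [("B", 1, 100.0), ("C", 5, 100.0)],
    "B": [("D", 1, 100.0), ("C", 1, 100.0)],
    "C": [("D", 1, 100.0)],
    "D": [],
}
# Link B-D refused a 40-unit reservation; B repairs locally instead of
# failing the setup back to ingress A.
print(crankback_repair(ted, "B", "D", 40.0, {("B", "D")}))  # ['B', 'C', 'D']
```

In this toy topology B reroutes via C. If no such path exists, the function returns None and the error has to continue upstream toward the ingress.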
It is only when a major event, such as a link going down, occurs that resource contention occurs, and then quite massive resource contention occurs over a brief period. In this case flooding the fact that the resource is exhausted makes good sense. Again, flooding every change immediately is not at all practical, but triggers based on the amount of change and on time, combined with an immediate trigger on resource allocation failure, make sense.

I personally believe that MPLS using RSVP/TE was not intended for call routing of short-lived connections, and I have a very strong basis in past IETF meetings and mailing list correspondence to support that. I also think that, useless or not, we may end up implementing crankback simply because people either think we're solving call routing of short connections, or mistakenly think crankback to be a good solution for the ISP situation. Hopefully no implementor will think that crankback is a viable substitute for reflooding when a resource allocation fails.

Curtis
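The reflooding policy suggested above (amount-of-change and time-based triggers, plus an immediate trigger on resource allocation failure) can be sketched per TE link as follows. The class name, the 10% threshold and the 30-second interval are hypothetical values chosen for illustration, and reading "time based" as a periodic refresh is likewise an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FloodTrigger:
    """Decides when a router should reflood one TE link's unreserved bandwidth."""

    max_bandwidth: float
    change_fraction: float = 0.1     # hypothetical: reflood on a change of >= 10% of max
    refresh_interval: float = 30.0   # hypothetical: seconds between time-based refloods
    last_flooded_bw: float = field(init=False, default=0.0)
    last_flood_time: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Assume the link was last advertised fully unreserved at time 0.
        self.last_flooded_bw = self.max_bandwidth

    def should_flood(self, now: float, unreserved_bw: float,
                     admission_failed: bool = False) -> bool:
        # Amount-of-change trigger, relative to what was last advertised.
        significant = (abs(unreserved_bw - self.last_flooded_bw)
                       >= self.change_fraction * self.max_bandwidth)
        # Time-based trigger: too long since the last advertisement.
        stale = now - self.last_flood_time >= self.refresh_interval
        # Immediate trigger: a setup was just refused admission on this link.
        if admission_failed or significant or stale:
            self.last_flooded_bw = unreserved_bw
            self.last_flood_time = now
            return True
        return False


trigger = FloodTrigger(max_bandwidth=1000.0)
print(trigger.should_flood(1.0, 950.0))                         # False: small change
print(trigger.should_flood(2.0, 880.0))                         # True: change >= 10%
print(trigger.should_flood(3.0, 870.0, admission_failed=True))  # True: admission failure
```

The admission-failure trigger is what addresses the burst case: the first refused setup causes an immediate reflood, so later ingresses can see that the link is full before they signal.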