Comments solicited on the draft-makam-mpls-recovery-frmwrk-0. txt
Hi Srinivas,

You wrote (19 Apr 00, 22:16):

<extract>
We solicit comments from the MPLS WG mailing list on the above proposal as well as on the items in the draft.

I have no specific comments on your proposals for atomic and composite terms, except to suggest that there should be benefit from looking at some of the functional modelling that has been done for other technologies such as SDH, ATM and which will also be used to describe the optical layer. Many vendors/operators have found these very useful, and I guess it would be worthwhile trying to harmonise concepts/terminology. (See ITU Rec.s G.803 for SDH and I.372 for ATM). I think this will become increasingly important when the optical layer is considered in conjunction with IP/MPLS.

As regards more general issues that the framework should address, I have these observations:

1 Protection switching needs some form of trigger mechanism. The trigger may be as a result of a pre-meditated human intervention (ie 'planned-works' or TE actions by an operator) or an automatic intervention by the network itself in response to network failure/degradation. Taking as-read that facilities will be provided for the operator intervention case, we need to be sure that we have adequately identified/specified the defects which are needed to create the restoration triggers for the failure case. These will range from simple connectivity failures [either from a server layer (identified by a client/server adaptation of an AIS-type Forward Defect Indicator) or from within the MPLS layer itself], to various misbranching conditions wholly within the MPLS layer, eg swapped LSPs, unintended merging of different LSPs, unintended self-merging of the same LSP (aka looping). These latter misbranching defects are possible due to the use of relative addressing, ie non-globally unique labels.
Hence, before we can progress restoration studies with any conviction we must have identified and specified (in terms of entry/exit criteria and consequent actions) all the defects that can occur in the MPLS user-plane. It is not apparent to me that this has been done yet.

2 Since important uses of LSPs will be outside the IGP (eg TE and possibly VPN construction) one cannot rely on the routing-protocol behaviour to act as the 'detection tool'. But this is obvious from other considerations such as the required speed of restoration action. Further, one should not assume that other aspects of the control-plane (eg signalling network-topology/protocol) will be congruent with the topology of, or suffer the same defect behaviour as, the user-plane. We therefore need trigger mechanisms based on defects relevant to, and detected within, the user-plane.

3 In order that the SLA objectives (which operators will need to define) for the various QoS metrics have statistical meaning, it is important that their aggregation is suspended during periods of persistent outage, ie under the condition that full resource restoration was not possible, or that it was not achieved within a given period. One therefore must specify an availability metric (which is independent of the QoS metrics). This requires that the entry/exit criteria for the availability state of an LSP be defined. We therefore need to think about the definition of such an availability metric for LSPs, and I am not aware that any work has been done on this.

Note:

(i) In established WAN server layer technologies such as SDH and ATM, availability is defined around a 10 s integration period over which the defect has to persist (entry) or be absent (exit). This is historical in nature (and I think was originally based on how long someone could realistically be expected to hang on to a 'dead' telephony connection before hanging up).
However, I mention this since this is the value that current WAN server layers tend to operate against in terms of their SLAs (to their client networks or peer-layer applications).

(ii) I can foresee some interesting issues arising wrt measuring availability and QoS SLAs for operators when using MPLS. For example, in a single-class BE only environment the distinction between availability and QoS is blurred, since failures (assuming no, or partial, resource restoration) get translated into a perceived QoS impairment due to sharing a lesser resource over the same traffic load [at some stage one cannot continue to share a dwindling resource amongst a fixed traffic load and still appeal to a 'QoS translation', since the network would, at some resource point, flip modes and become unavailable for all affected users].

With DiffServ and same-class aggregation, the effect is similar to the previous case (ie availability and QoS distinctions become blurred), except that now (as a generalisation) the translation of the failure into a QoS impairment is skewed by class priority rather than being uniform.

These observations are largely related to the IGP behaviour being the key determinant of network survivability. But when LSP routings are created outside the IGP the above observations no longer hold, and we have to consider the (for want of a better word) 'survivability index' of that particular LSP against its peers and the restoration contention with the previously described traffic. The above arguments can be further extended, and further complicated, by considering these effects over VPNs and public application/service offerings.

Since operators will want to be able to offer varying availability SLAs independent of QoS SLAs (esp for VPNs), all the issues raised above (from 1) will be important considerations of the MPLS layer restoration specifications, and as such should be addressed within the framework document (or if not, then somewhere else).

Regards, Neil
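
To make the entry/exit criteria of point 3 and note (i) concrete, the following is a minimal Python sketch of a per-LSP availability state with a 10-second integration period, where QoS samples are aggregated only while the LSP is in the available state. It is an illustration under stated assumptions, not a proposed definition: the class name `LspAvailabilityTracker`, its fields, the one-observation-per-second model and the use of delay as the QoS sample are all hypothetical. As a simplification, a state change takes effect at the end of the window rather than being applied retroactively to its start.

```python
from dataclasses import dataclass, field


@dataclass
class LspAvailabilityTracker:
    """Hypothetical per-LSP availability state with entry/exit hysteresis."""

    window_s: int = 10      # integration period in seconds (assumed, per note (i))
    available: bool = True  # current availability state of the LSP
    run_length: int = 0     # consecutive seconds contradicting the current state
    unavailable_s: int = 0  # seconds spent in the unavailable state
    qos_samples: list = field(default_factory=list)  # aggregated only while available

    def observe(self, defect: bool, delay_ms: float) -> None:
        """Feed one 1-second user-plane observation."""
        # A defect contradicts "available"; a clean second contradicts "unavailable".
        contradicts = defect if self.available else not defect
        self.run_length = self.run_length + 1 if contradicts else 0

        # Entry/exit: flip state once the contradiction has persisted for the window.
        if self.run_length >= self.window_s:
            self.available = not self.available
            self.run_length = 0

        if self.available:
            self.qos_samples.append(delay_ms)  # QoS aggregation continues
        else:
            self.unavailable_s += 1            # QoS aggregation is suspended


tracker = LspAvailabilityTracker()
for _ in range(5):
    tracker.observe(defect=False, delay_ms=20.0)   # normal operation
for _ in range(30):
    tracker.observe(defect=True, delay_ms=0.0)     # persistent outage
for _ in range(10):
    tracker.observe(defect=False, delay_ms=20.0)   # defect cleared
print(tracker.available, tracker.unavailable_s, len(tracker.qos_samples))
```

A defect shorter than the window never changes the state, so it shows up as a QoS impairment rather than as unavailability, which is the separation between the two metrics that point 3 asks for.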