The MPLS WG Archive


Comments solicited on the draft-makam-mpls-recovery-frmwrk-0. txt

  • From: neil.2.harrison@bt.com
  • Date: Thu, 27 Apr 2000 10:40:16 +0100

Hi Srinivas,  You wrote (19 Apr 00, 22:16):
	<extract>
	We solicit comments from the MPLS WG mailing list on the above
	proposal as well as on the items in the draft.

I have no specific comments on your proposals for atomic and composite
terms, except to suggest that there should be benefit from looking at some
of the functional modelling that has been done for other technologies such
as SDH and ATM, and which will also be used to describe the optical layer.
Many vendors/operators have found these models very useful, and I guess it
would be worthwhile trying to harmonise concepts/terminology.  (See ITU Recs.
G.803 for SDH and I.372 for ATM.)  I think this will become increasingly important
when the optical layer is considered in conjunction with IP/MPLS.

As regards more general issues that the framework should address, I have
these observations:

1	Protection switching needs some form of trigger mechanism.  The
trigger may be as a result of a pre-meditated human intervention (ie
'planned-works' or TE actions by an operator) or an automatic intervention
by the network itself in response to network failure/degradation.  Taking
as-read that facilities will be provided for the operator intervention case,
we need to be sure that we have adequately identified/specified the defects
which are needed to create the restoration triggers for the failure case.
These will range from simple connectivity failures [either from a server
layer (identified by a client/server adaptation of an AIS-type Forward
Defect Indicator) or from within the MPLS layer itself], to various
misbranching conditions wholly within the MPLS layer, eg swapped LSPs,
unintended merging of different LSPs, unintended self-merging of the same
LSP (aka looping).  These latter misbranching defects are possible due to
the use of relative addressing, ie non-globally unique labels.

Hence, before we can progress restoration studies with any conviction we
must have identified and specified (in terms of entry/exit criteria and
consequent actions) all the defects that can occur in the MPLS user-plane.
It is not apparent to me that this has been done yet.

2	Since important uses of LSPs will be outside the IGP (eg TE and
possibly VPN construction) one cannot rely on the routing-protocol behaviour
to act as the 'detection tool'.  But this is obvious from other
considerations such as speed of required restoration action.  Further, one
should not assume that other aspects of the control-plane (eg signalling
network-topology/protocol) will be congruent with the topology of, or suffer
the same defect behaviour as, the user-plane.  We therefore need trigger
mechanisms based on defects relevant to, and detected within, the
user-plane.

3	In order that the SLA objectives (which operators will need to
define) for the various QoS metrics have statistical meaning, it is
important that their aggregation is suspended during periods of persistent
outage, ie under the condition that full resource restoration was not
possible, or that it was not achieved within a given period.  One therefore
must specify an availability metric (which is independent of the QoS
metrics).  This requires that the entry/exit criteria for the availability
state of an LSP be defined.  We therefore need to think about the
definition of such an availability metric for LSPs, and I am not aware that
any work has been done on this.

Note:
(i)	In established WAN server layer technologies such as SDH and ATM,
availability is defined around a 10s integration period over which the
defect has to persist (entry) or be absent (exit).  This is historical in
nature (and I think was originally based on how long someone could
realistically be expected to hang-on to a 'dead' telephony connection before
hanging-up).  However, I mention this since this is the value that current
WAN server layers tend to operate against in terms of their SLAs (to their
client networks or peer-layer applications).

(ii)	I can foresee some interesting issues arising wrt measuring
availability and QoS SLAs for operators when using MPLS.  For example, in a
single-class BE only environment the distinction between availability and
QoS is blurred since failures (assuming no, or partial, resource
restoration) get translated into a perceived QoS impairment due to sharing a
lesser resource over the same traffic load [at some stage one cannot
continue to share a dwindling resource amongst a fixed traffic load and
still appeal to a 'QoS translation', since the network would, at some
resource point, flip modes and become unavailable for all affected users].
With DiffServ and same-class aggregation, the effect is similar to the
previous case (ie availability and QoS distinctions become blurred), except
that now (as a generalisation) the translation of the failure into a QoS
impairment is skewed by class priority rather than being uniform.  These
observations are largely related to the IGP behaviour being the key
determinant of network survivability.  But when LSP routings are created
outside the IGP then the above observations no longer hold and we have to
consider the (for want of a better word) 'survivability index' of that
particular LSP against its peers and the restoration contention with the
previously described traffic.  The above arguments can be further extended,
and further complicated, by considering these effects over VPNs and public
application/service offerings.  Since operators will want to be able to
offer varying availability SLAs independent of QoS SLAs (esp for VPNs) all
the issues raised above (from 1) will be important considerations for the
MPLS layer restoration specifications, and as such should be addressed
within the framework document (or if not, then somewhere else).
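
To make point 3 and Note (i) concrete, here is a rough sketch of how an
availability state with entry/exit criteria might be tracked for one LSP, with
QoS aggregation suspended while the LSP is unavailable.  It is illustrative
only: the class name, the per-second tick, and the field names are all
hypothetical and come from no MPLS specification.  The 10 s window is simply
the SDH/ATM figure quoted in Note (i), and the sketch flips state at the end
of the window without backdating the change to the start of it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AvailabilityTracker:
    """Hypothetical per-LSP availability tracker, fed once per second."""

    window: int = 10          # assumed persistence period in seconds (Note (i))
    available: bool = True    # current availability state of the LSP
    run: int = 0              # consecutive seconds contradicting the state
    unavailable_seconds: int = 0
    qos_samples: List[float] = field(default_factory=list)

    def tick(self, defect: bool, qos_sample: Optional[float] = None) -> None:
        # A second contradicts the current state if a defect is present
        # while available, or absent while unavailable.
        contradicts = defect if self.available else not defect
        self.run = self.run + 1 if contradicts else 0

        # Entry/exit criterion: the contradiction must persist for the
        # whole window before the state changes.
        if self.run >= self.window:
            self.available = not self.available
            self.run = 0

        if self.available:
            # QoS metrics are only aggregated during available time.
            if qos_sample is not None:
                self.qos_samples.append(qos_sample)
        else:
            # Outage time goes to the availability metric instead.
            self.unavailable_seconds += 1


if __name__ == "__main__":
    lsp = AvailabilityTracker()
    for second in range(30):
        # Assumed scenario: a defect persists from second 5 to second 19.
        lsp.tick(defect=5 <= second < 20, qos_sample=1.0)
    print(lsp.available, lsp.unavailable_seconds, len(lsp.qos_samples))
```

A short defect burst (fewer than `window` consecutive seconds) leaves the LSP
in the available state, so it would show up as a QoS impairment rather than
as unavailability.  Whether that is the right split for MPLS is exactly the
open question raised above.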

Regards, Neil