The MPLS WG Archive

draft-nadeau-ip-basedtool-requirements-01

  • From: "David Allan" <dallan@nortelnetworks.com>
  • Date: Tue, 12 Nov 2002 13:42:21 -0500
  • Cc: mpls@UU.NET

Tom/Monique/George:

Interesting document, a "few" comments though ;-)

Section 2:
"Any error condition that prevents an LSR...": do you mean LSP here?

Section 3:

'a' This munges together detection and diagnosis. Shouldn't path liveliness
and path tracing be split out as separate requirements? (They seem to be
synonymous in this document.)

'b' Are you inheriting all the requirements of the GTTP requirements
document by reference? There is a lot of stuff in there that is more generic
and may not be applicable. Is there a specific difference between tunnel
trace and path trace in this document?

'c' The phrase "automation of path tracing...": are you really discussing
misbranching/mismerging detection? My read of this is that I am required to
periodically perform traceroute on LSPs. I doubt that is the actual intent.

'd' [LSPPING] for CE to CE verification. Besides violating the
"non-prescriptive" claim in the introduction, I didn't think MPLS went CE to
CE ;-) The wording seems to confuse path tracing as a response to failure
detection with the mechanism for failure detection as outlined in 'b'. Also,
some requirements have to do with the tools, and others with box behaviors
(e.g. suggesting a nodal response to failure detection). It might be
cleaner simply to stick to what the tools should do, and let folks roll
up the system behaviors into applications themselves.

'e' If I understood 'b' correctly, is this not the same thing?

'f' Requirements 'f', 'l' and 'm' seem to be largely overlapping, although I
cannot actually parse 'l', and I may be interested in latency for reasons
other than SLA.

'g' What is a "device" in this context? Also, that it only "may" take
corrective action runs counter to the first sentence, which suggests the
network MUST self-heal. The general requirement is rather high level. I've
preferred wording such as "the network detects faults, and may facilitate
automated response to restore service (e.g. via fault notification or
whatever)".

'h' Agree

'i' What OAM functions are common to how both P2P and MP2P are instrumented?
Presumably one solution that meets the requirement is to overlay P2P on
MP2P....

'j' In general agree, with the proviso that some synchronization of tool
usage/frequency is required for availability metrics, i.e. the network needs
to function as a system and the OAM functions need to be harmonized across
the system.

'k' The wording seems to undermine requirements 'h' and 'r'. IMHO this seems
to highlight one problem with instrumenting load balancing (as per ECMP
etc.) in general. The approach is that all paths need to be tested; the
specific implication is that they all need to be testable from any point.
That, I think, is not sustainable. If OAM probes are to follow the
same path as the user traffic, then ECMP should function independently of
both the OAM probes and the user traffic. Otherwise we're into the "monkeys
and typewriters" verification model as the OAM probes try various
permutations to impersonate the user traffic's ECMP characteristics (which by
definition are proprietary and therefore unknowable to the probe origin).

'l' See 'f'.

'm' This one is not clear to me, and sort of suggests using bursts of OAM
packets to characterize LSP performance. A clarification, please?

'n' The actual example is rather trivial as all the LSPs discussed terminate
in a single box. What is required is alarm suppression in all clients
regardless of where the client LSP, PW, VCC, VPC etc. terminates.

'o' This seems orthogonal to the other requirements, and seems relatively
hard not to satisfy. In other words, why is it here?

'p' Seems prescriptive. IMHO the requirement is detection of DoS attacks and
limitation of the damage/disruption they do. "How" is for another day.

'q' OAM interworking for fault notification is only a part of the problem.
The client may be monitored; therefore the timeliness of this stuff matters,
as it either needs to harmonize with the client OAM, or the client OAM needs
to be configured to "hold off" before reacting. That also implies a
requirement for bounded detection times, which is one nit I have with current
proposals for instrumentation of ECMP and other load sharing mechanisms (one
offshoot being the "guidelines for load balancing" draft to be presented
later on the WG agenda).

'r' This is a rather long-winded way to say that "liveliness must test the
actual forwarding path (proxy verification of what systems 'think' is
happening is insufficient)".

And finally, nothing in the document leaps out at me as justifying the title
(except perhaps the use of SNMP ;-)

cheers
Dave