Hi Ben,
> 1) Why is it restricted to P2MP TE LSPs? mLDP signalled
> LSPs will also require OAM and it's not clear to me why
> they have been precluded by this draft.
Well, this draft does *not* preclude mLDP.
There are some factors to consider.
a. When this draft was started, mLDP did not exist.
b. When this draft was developed there were two competing
   mLDP solutions drafts and no clear definition of mLDP FECs.
c. The existing MPLS-TE P2MP techniques are extensible
   (by use of a FEC) but it is not clear that the problem
   space is the same with regard to scalability for mLDP.
   In particular, the ingress of an mLDP tree is (perhaps)
   not aware of the identities of the egresses and so it is
   hard to ping a single egress along the tree.
d. What would be wrong about having two drafts for this?
Actually, at the last couple of IETFs (and on
the list, if I recall) I have asked the question about whether to roll this
all into one I-D or to keep it separate, and I have not had anything in the
way of an answer.
At the moment, we are inclined to keep
the two pieces of work separate for cleanliness. But doing so does not imply
that anyone thinks that mLDP should be without OAM.
BPN=> OK. TBH I don't care if there is
one piece of work or two but there is a clear requirement for OAM
mechanisms for mLDP and I haven't seen any drafts that address this
requirement, which is what prompted my question about mLDP being
precluded.
> 2) Section 4.1 of draft-ietf-mpls-p2mp-oam-reqs-01 states
> "The ability to detect defects in a P2MP LSP SHOULD not
> require manual, hop-by-hop troubleshooting of each LSR
> used to switch traffic for that LSP, and SHOULD rely on
> proactive OAM procedures (such as continuous path
> connectivity and SLA measurement mechanisms)." whereas
> the comments in draft-ietf-mpls-p2mp-lsp-ping-01 imply
> to me that proactive/continuous OAM may not be possible
> with the proposed solution due to scaling concerns
> resulting from the possibility of flooding the ingress
> with responses.
Hmmm.
Maybe there are some semantics around "proactive" and "continuous".
I think that these terms could be taken to mean
that you should be able to initiate a "probe" at the ingress and rely
on the probe mechanism to determine the existence of and location of the
defect. This should happen without the operator having to initiate a probe
hop-by-hop through the network along the path of the LSP.
BPN=> I interpret 'proactive' and 'continuous' to mean that
probe/ping/CC/CV (or whatever you want to call them) packets are
(automatically) sent at regular intervals to detect faults, cf. CV packets
in Y.1711.
But we should also take into account that
the proposed P2MP LSP Ping is not intended to be a solution for all
requirements in the P2MP OAM Requirements I-D. Indeed, section 4.1 of the P2MP
OAM Requirements I-D is quite clear about the issues of using LSP Ping for
defect detection (compared with section 4.2 that describes defect
diagnosis).
BPN=> OK, in which case it would be good to see some drafts describing
other proposals/solutions.
Whether some other mechanism needs to be added
to the toolkit (compare with BFD for P2P LSPs) for error detection is up for
grabs, and I am sure that the working group would welcome
contributions.
> As well as the ability to delay/jitter
> the echo responses has consideration been given to other
> potential ways to improve scalability for example using a
> Nack model for responses, somehow aggregating responses
> etc.?
We did discuss a Nack model. I am not
sure if it is what you are talking about.
BPN=> I was just thinking out loud really...
In brief we considered...
- OAM is enabled (see later)
- Egress expects to receive "ping" every time unit
- In the absence of a "ping" the egress sends an alert
We felt that OAM should not be permanently active on what
may be a low bandwidth service for fear of being the main consumer of traffic.
Therefore, OAM must be enabled by an explicit act (and not implicitly enabled
by the establishment of the LSP).
Thus, we can say that OAM is enabled by the first ping. We
would, of course, need a positive ack to that event. After that, we can resort
to Nack processing. But I think there is some handshaking needed to make sure
that the positive ack has been received.
Disabling is more of a worry. If the disable instruction
does not reach an egress, it will continue to generate Nacks. Thus we would
need some form of back-off on Nacks, dropping quite quickly to the assumption
that OAM is disabled.
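
To make the egress side of this concrete, here is a rough sketch of the
behaviour described above: the first ping enables OAM and draws a positive
ack, a missed ping draws a Nack, and repeated Nacks back off until the egress
assumes OAM has been disabled. This is only an illustration and not anything
specified in the draft. The class name, the doubling back-off, and the
max_nacks limit are all assumptions, and the handshake that confirms the
positive ack was received is left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class EgressNackMonitor:
    """Hypothetical egress-side state for the Nack model (illustrative only)."""
    interval: float          # expected spacing between pings (the "time unit")
    max_nacks: int = 3       # assumed limit before presuming OAM is disabled
    enabled: bool = False    # OAM is off until the first ping arrives
    deadline: float = 0.0    # time by which the next ping or Nack is due
    nacks_sent: int = 0      # consecutive Nacks since the last ping
    sent: List[Tuple[float, str]] = field(default_factory=list)

    def on_ping(self, now: float) -> None:
        # The first ping enables OAM and is answered with a positive ack.
        if not self.enabled:
            self.enabled = True
            self.sent.append((now, "ACK"))
        # Any ping resets the back-off and re-arms the timer.
        self.nacks_sent = 0
        self.deadline = now + self.interval

    def on_disable(self, now: float) -> None:
        # Explicit disable instruction; it may never arrive, hence the back-off.
        self.enabled = False

    def on_tick(self, now: float) -> None:
        """Timer check; sends at most one Nack per call."""
        if not self.enabled or now < self.deadline:
            return
        self.sent.append((now, "NACK"))
        self.nacks_sent += 1
        if self.nacks_sent >= self.max_nacks:
            # Drop quickly to the assumption that OAM has been disabled.
            self.enabled = False
            return
        # Assumed back-off: double the wait after each consecutive Nack.
        self.deadline = now + self.interval * (2 ** self.nacks_sent)
```

With interval=1.0 and max_nacks=3, an egress that sees pings at t=0 and t=1
and then nothing more would send Nacks at t=2, t=4 and t=8 before going quiet.
Even this much state per egress hints at the extra complexity compared with
plain LSP ping.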
All of this is achievable, and could be built on top of LSP
ping, but what is the cost/benefit? It is certainly a more complicated
process, and I don't hear anyone suggesting that they plan to implement it.
But if someone is working on such a solution, I'm sure we'd be happy to fold
it in to our draft.
Aggregation of responses was also briefly discussed. We have
two problems with this.
1. How does the transit node know how long to hold on to a
response in the hope that another will come along? In fact, won't any delay
for aggregation simply serve to destroy the benefits of jitter?
2. What makes us assume that responses will follow paths
that make aggregation practical?
Cheers,
Adrian