The MPLS WG Archive


[mpls] draft-yasukawa-mpls-p2mp-lsp-ping-02.txt for working document?

  • From: "Adrian Farrel" <adrian@olddog.co.uk>
  • Date: Tue, 30 Aug 2005 14:37:09 +0100
  • Cc: Bill Fenner <fenner@research.att.com>, mpls@ietf.org, Zafar Ali <zali@cisco.com>

Hi JL,
 
>> I would like to pursue the scaling issue with you in more
>> detail since it is something that has worried us as authors.
>>
>> The problem is essentially the stress that is placed on the
>> root (and the network near the root) if one pings the entire
>> tree and receives responses from all of the leaves.
>>
>> The first point to note is, of course, that this only becomes
>> a problem when the tree is over a certain size. But, yes, we
>> need to be able to handle trees of all sizes.
>
> It depends on what you mean by "a certain size", to my mind
> without Jitter mechanism we may face issues even with just
> a few tens of leaves, particularly if leaves are at the same
> distance from the root (i.e. simultaneous replies)...
 
Yes.
"A certain size" means the size that we can handle without jitter :-)
But anyway, we now have jitter in place.
 
>> The jitter mechanism is designed to increase the size of tree
>> that can be handled. And is it fair to say that if the root
>> knows its own limitations and knows the size of the tree, it
>> can set the jitter suitably large according to the size of
>> the tree to mean that it is not swamped.
>
> Yes, and we must be aware that this will have an impact on
> the delay to diagnose that there is a failure. Note that
> the jitter may rapidly become quite important:
> Let's take
> R = the number of echo reply messages that can be handled
>     on the root per second
> L = the number of leaves
> J = the jitter
> We should have J >= L/R
>
> Let's take L = 1000 and R = 100, we have J >= 10s
 
Agreed.
 
I might be surprised to hear that R=100. But perhaps routers don't all have very good CPUs, and (of course) we must share processing with other control plane tasks.
 
L=1000 seems a realistic long-term number for P2MP MPLS-TE. To quote from the requirements doc...
     It is important to classify the issues of scaling within the
     context of Traffic Engineering. It is anticipated that the initial
     deployments of P2MP TE LSPs will be limited to a maximum of around
     a hundred recipients, but that medium term deployments may increase
     this to several hundred, and that future deployments may require
     significantly larger numbers.
For multicast LDP we will need to do some similar thinking.
 
 
Anyway, J >= 10 seconds is a good number to work with.
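
As a quick check of the arithmetic, here is a minimal Python sketch of the bound J >= L/R. The function name and the input validation are illustrative assumptions; nothing here comes from the draft itself.

```python
def min_jitter_seconds(leaves: int, replies_per_second: float) -> float:
    """Smallest jitter window J (in seconds) that satisfies J >= L / R.

    leaves             -- L, the number of leaves expected to reply
    replies_per_second -- R, echo replies the root can handle per second
    """
    if leaves < 0:
        raise ValueError("leaves must be non-negative")
    if replies_per_second <= 0:
        raise ValueError("replies_per_second must be positive")
    return leaves / replies_per_second


if __name__ == "__main__":
    # Tree sizes loosely following the requirements text: around a hundred,
    # several hundred, and significantly larger.
    for leaves in (100, 300, 1000):
        print(leaves, "leaves ->", min_jitter_seconds(leaves, 100), "s")
```

With L = 1000 and R = 100 this gives the 10 seconds discussed above.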
 
[SNIP]
 
As you say...
 
> I agree with you that this would not be the best way
> to perform fast failure detection.
> By the way note that the polling period would have to
> be higher than the Jitter which may lead to a poor
> detection delay (e.g. more than 10s)...
 
Yes. I think that using LSP Ping as a fast failure detection method has been ruled out by this analysis. But, in any case, I don't think that LSP Ping is recommended as a fast failure detection mechanism even for P2P LSPs.
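
To put a rough number on the detection delay, here is a small Python sketch. The model is an assumption made for illustration and is not part of the draft: a leaf fails just after answering one ping, the next ping goes out one polling period later, and the root waits a full jitter window before declaring the reply missing. Propagation and processing delays are ignored.

```python
def worst_case_detection_delay(polling_period: float, jitter: float) -> float:
    """Rough worst-case time (seconds) to notice a failed leaf.

    Assumed model: the leaf replies at the very start of the jitter window
    and fails immediately afterwards. The next ping is sent one polling
    period after the previous one, and the root only concludes that the
    reply is missing once that ping's jitter window has closed.
    """
    if jitter < 0:
        raise ValueError("jitter must be non-negative")
    if polling_period < jitter:
        # The thread's constraint: the polling period cannot be shorter
        # than the jitter window.
        raise ValueError("polling period must be at least the jitter")
    return polling_period + jitter


if __name__ == "__main__":
    # With J = 10 s and the shortest permitted polling period.
    print(worst_case_detection_delay(10.0, 10.0), "s")
```

Under these assumptions, J = 10 seconds means a failure can go unnoticed for up to about 20 seconds, consistent with "more than 10s".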

[SNIP]
 
>> b. Develop some mechanism whereby ping responses are
>>    "aggregated" at branch points and sent back to the
>>    root as single report of all of the downstream tree.
>>    This is similar to the SRRO mechanism in P2MP RSVP-TE.
>>    This is certainly achievable, but the reason for not
>>    even beginning to think about this is that it is a
>>    completely new protocol that would not be based
>>    on LSP Ping. Recall that the return path in LSP Ping
>>    is in the control plane and therefore might not (will
>>    not!) flow upstream through the branch nodes.
>
> Yes I also had this kind of procedure in mind.
> In my view the main problem is not to flow upstream through
> the branch node. You may easily enforce a return path along
> the reverse tree:
> The echo reply could be piggybacked in a Resv message :-(
 
Ha, ha! :-)

> Or you could check RSVP state info when propagating the
> reply message upstream...
 
Hmmm. I don't think the control plane and data plane states should be linked in this way if we can help it. It would certainly open up some interesting issues for the misconnection scenarios, but perhaps it would help to find more problems.
 
 
We should do some maths. Consider a mesh where multiple ingresses are pinging P2MP trees. This idea will reduce the load on the ingresses, but how will it increase the load on the core nodes?
 
 
> IMHO the main issue would be related to the states to be
> maintained on branch nodes. Branch nodes would have to wait
> for all downstream replies before forwarding upstream, and
> we would fall into the classical timer expiration issue
> (timer value would have to depend on the distance from the
> leaves :-(

Right.
And, we still want to send a reply upstream when the timer expires (to say that some of the leaves are present).
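
To make the branch-node state concrete, here is a hypothetical Python sketch of the aggregation behaviour discussed above. It is not part of LSP Ping or of the draft; the class name, the report format, and the idea that a branch node knows its expected downstream leaves in advance are all assumptions for illustration.

```python
from typing import Optional


class BranchAggregator:
    """Hypothetical per-ping state held at a branch node.

    Collects replies from downstream leaves and produces one report for
    the upstream direction, either when every expected leaf has replied
    or when the timer expires (a partial report).
    """

    def __init__(self, expected_leaves, deadline: float):
        self.expected = frozenset(expected_leaves)
        self.deadline = deadline      # absolute time at which to stop waiting
        self.replied = set()
        self.reported = False

    def _report(self) -> dict:
        self.reported = True
        return {
            "present": sorted(self.replied),
            "missing": sorted(self.expected - self.replied),
        }

    def on_reply(self, leaf: str) -> Optional[dict]:
        """Record a downstream reply; return a report once all have replied."""
        if self.reported or leaf not in self.expected:
            return None               # late or unexpected replies are ignored
        self.replied.add(leaf)
        if self.replied == self.expected:
            return self._report()
        return None

    def on_timer(self, now: float) -> Optional[dict]:
        """Return a partial report once the deadline has passed."""
        if self.reported or now < self.deadline:
            return None
        return self._report()


if __name__ == "__main__":
    agg = BranchAggregator({"leaf-a", "leaf-b", "leaf-c"}, deadline=5.0)
    agg.on_reply("leaf-a")
    print(agg.on_timer(now=5.0))      # partial report: one present, two missing
```

The sketch also shows where the timer problem bites: the deadline has to be chosen per branch node, and a value that is too short turns slow leaves into "missing" ones.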
 
 
>> Now, as mLDP comes along, some of these assumptions may need
>> to change a bit. That is, the SRRO will not be available and
>> the root must rely on the traceroute option to know the
>> topology of the tree. Further, the root cannot know the
>> addresses of the leaves or even know how many leaves there
>> are before it starts to ping. This makes for a *very*
>> different situation and ping-all and traceroute-all become
>> much more significant. It is our hope that jitter will be
>> enough to resolve the scaling issues even in this case.
>
> Yes, for instance the Jitter may be computed based on the
> total number of PEs, that is an upper bound of the number
> of leaves...
 
As I understand the way mLDP is going at the moment, the ingress will not know the number of leaves or the number of PEs involved in the P2MP flow. I guess that the ingress will know an upper bound for both, but this will depend on the network and so will be the same for all mLDP LSPs.
 
So determining how to set the jitter period may be much harder. One could run traceroute and increase the jitter as the tree is built. From then on, ping can use the jitter that has been determined for the deepest traceroute. And as leaves come and go, the jitter can be updated for future pings.
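
A minimal Python sketch of that idea follows. It assumes the root simply counts the leaves it has learned about through traceroute and reuses the J >= L/R bound from earlier in this thread; the class and method names are hypothetical, and how leaves are actually discovered or withdrawn is left out.

```python
class JitterEstimator:
    """Hypothetical root-side tracker for the jitter to use in future pings."""

    def __init__(self, replies_per_second: float):
        if replies_per_second <= 0:
            raise ValueError("replies_per_second must be positive")
        self.replies_per_second = replies_per_second  # R
        self.known_leaves = set()

    def leaf_discovered(self, leaf: str) -> None:
        """Call when traceroute (or a ping reply) reveals a leaf."""
        self.known_leaves.add(leaf)

    def leaf_removed(self, leaf: str) -> None:
        """Call when a leaf is known to have left the tree."""
        self.known_leaves.discard(leaf)

    def jitter_seconds(self) -> float:
        """Jitter for the next ping: J = L / R for the leaves known so far."""
        return len(self.known_leaves) / self.replies_per_second


if __name__ == "__main__":
    est = JitterEstimator(replies_per_second=100)
    for i in range(250):
        est.leaf_discovered(f"leaf-{i}")
    print(est.jitter_seconds(), "s")  # jitter grows as the tree is discovered
```

The jitter here only ever reflects leaves already seen, so the first traceroute over an unknown tree still needs some conservative starting value.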
 
Thanks,
Adrian