The MPLS WG Archive

Last call on LSP Ping - ECMP thread (continued)

  • From: Curtis Villamizar <curtis@fictitious.org>
  • Date: Mon, 09 Dec 2002 16:50:13 -0500
  • cc: "'curtis@fictitious.org'" <curtis@fictitious.org>, "'erosen@cisco.com'" <erosen@cisco.com>, mpls@UU.NET


In message <1117F7D44159934FB116E36F4ABF221B0267ED5F@celox-ma1-ems1.celoxnetworks.com>, "Gray, Eric" writes:
> Curtis,
> 
> 	For the specific case where the destination IP
> address is chosen at random from a large space and is
> actually independent of the route followed by any path,
> a destination IP address based hash algorithm results
> in a fairly random distribution across all paths.  As
> you probably know, this is not the case for destination
> IP addresses in general - i.e. - the DIPA is not at all
> independent of the routes followed by all equal cost 
> paths.  Consequently, even assuming the implementation
> does the deep dive and looks at the IP header for its
> ECMP hashing algorithm, it will most likely not rely
> on the DIPA exclusively for deriving the ECMP hash.  If
> it did, there is likely to be insufficient entropy in
> the derived hash values - resulting in an undesirable 
> distribution among the equal cost paths.  

Actually src/dst addresses are used and the hash is over those 64
bits.  Some implementations use (or used) the port numbers as well.
The idea was to split up the traffic but make sure no IP microflow was
reordered.
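
A minimal sketch of the path selection described above, assuming
CRC-32 as a stand-in hash (real routers use their own, usually
undocumented, functions).  The name ecmp_next_hop and the next-hop
labels are hypothetical.  The only property it is meant to show is
that the 64-bit src/dst key fully determines the choice, so all
packets of one microflow follow the same path and are not reordered.

```python
import ipaddress
import zlib
from typing import Sequence


def ecmp_next_hop(src: str, dst: str, next_hops: Sequence[str]) -> str:
    """Pick one equal-cost next hop from the 64-bit src/dst key.

    CRC-32 is an assumption made for illustration only; it is not
    the hash of any particular implementation.
    """
    # 4 bytes of source address + 4 bytes of destination = 64-bit key.
    key = (ipaddress.IPv4Address(src).packed
           + ipaddress.IPv4Address(dst).packed)
    # Same key always gives the same index, hence the same path.
    return next_hops[zlib.crc32(key) % len(next_hops)]


if __name__ == "__main__":
    hops = ["if0", "if1", "if2"]  # hypothetical next-hop labels
    for last in range(1, 5):
        dst = "127.0.0.%d" % last
        print(dst, ecmp_next_hop("10.0.0.1", dst, hops))
```

An implementation that also hashes the port numbers would simply
append them to the key.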

> 	This proposal will indirectly have the effect of 
> requiring that any ECMP hash algorithm used in those 
> implementations that use the IP header must include a 
> part of the least significant bytes of the destination 
> IP address in the key.  This in spite of the fact that
> this is not actually helpful to the hash algorithm in
> general.  Moreover, this may be interpreted by users as
> justification for pressuring implementers to use the IP
> header in their ECMP hash algorithm even though this is
> clearly not going to scale in some LSR implementations. 

The normal hash is over a 64-bit quantity.  If the MPLS-ping has a
fixed source address and indicates that it can use 127/8, then the
midpoint has to return, if possible, an address from 127/8 for the
source to use.

I mentioned that two branch points present a problem because after
the first branch point only a subset of addresses is available, of
which only one is known.

> 	My point is that the usefulness of this address
> range depends on an assumption that ECMP hash algorithms 
> will include some part of the least significant bytes of 
> the destination IP address.  This is a grave restriction 
> on the use of ECMP algorithms and should not be encouraged.
> If you cannot make this assumption, then the existence of
> this address range would not be useful and would in fact
> be misleading.  

This is a safe assumption, but if not, then rather than a 127/8
address, the midpoint must return some indication that no address in
the range will allow the diagnostic.  The indication could be 0.0.0.0.
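
A rough sketch of that midpoint behaviour, under stated assumptions:
the midpoint knows its own hash (CRC-32 stands in for it here), tries
a bounded number of 127/8 candidates, and falls back to 0.0.0.0 when
none selects the wanted path.  The names path_index and
address_for_path and the search limit are hypothetical and not part
of any draft.

```python
import ipaddress
import zlib

NO_ADDRESS = "0.0.0.0"  # "no address in the range will work"


def path_index(src, dst, n_paths):
    """Stand-in ECMP hash over the 64-bit src/dst key (assumption)."""
    key = (ipaddress.IPv4Address(src).packed
           + ipaddress.IPv4Address(dst).packed)
    return zlib.crc32(key) % n_paths


def address_for_path(src, wanted, n_paths,
                     index_fn=path_index, limit=4096):
    """Return a 127/8 destination that hashes to path `wanted`.

    Tries 127.0.0.1 upward for at most `limit` candidates and
    returns NO_ADDRESS if none of them selects the wanted path.
    """
    base = int(ipaddress.IPv4Address("127.0.0.1"))
    for offset in range(limit):
        candidate = str(ipaddress.IPv4Address(base + offset))
        if index_fn(src, candidate, n_paths) == wanted:
            return candidate
    return NO_ADDRESS


if __name__ == "__main__":
    print(address_for_path("10.0.0.1", 1, 2))
    # A hash that ignores the destination can never reach path 1.
    print(address_for_path("10.0.0.1", 1, 2,
                           index_fn=lambda s, d, n: 0))
```

With a hash that ignores the destination address, the second call
returns 0.0.0.0, which is the case the indication is meant to cover.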

> 	If you cannot make this assumption, and the use of 
> this address range is not useful, then I fail to see why 
> it is necessary to propose something equally useless to 
> replace this suggestion.  :-)

If there is a router that does ECMP and uses an IP hash that doesn't
include the lower 24 bits, I'd be really surprised.  I don't think
such a thing exists or that there is any good reason to build such a
thing and therefore your objection has no merit.

Multiple ECMP branch points remain a problem that nothing proposed so
far addresses adequately (unless you count spraying over the whole
127/8 range as addressing this).

> 	However, if you proposed a more straight-forward
> approach - requiring an ECMP origination point to (some
> day) support the ability to recognize a Ping packet and
> (presumably on its own initiative) produce its own Ping
> packets for all equal cost paths - that approach does 
> seem useful and we should think about it some more.

I thought I just did propose something (not in the attached message
but in a prior message) that, unlike what you state above, requires no
change to forwarding.  It requires only a change in MPLS TTL expired
processing if the payload is MPLS-ping (which is already required).
It is really spread over three messages.  Do I need to resend it in
one message?

> Eric W. Gray
> Systems Architect
> Celox Networks, Inc.
> egray@celoxnetworks.com
> 508 305 7214

Curtis


> > -----Original Message-----
> > From: Curtis Villamizar [mailto:curtis@fictitious.org]
> > Sent: Monday, December 09, 2002 2:13 PM
> > To: Gray, Eric
> > Cc: 'curtis@fictitious.org'; 'erosen@cisco.com'; mpls@UU.NET
> > Subject: Re: Last call on LSP Ping
> > 
> > 
> > In message <1117F7D44159934FB116E36F4ABF221B0267ED5B@celox-ma1-ems1.celoxnetworks.com>, "Gray, Eric" writes:
> > > Curtis,
> > >
> > > 	I am suggesting that attempting to use any address range
> > > as a mechanism for testing ECMP routes is a mistake.  I am also
> > > suggesting that a single address is sufficient to satisfy the
> > > need Eric Rosen points out.  Finally, if you were suggesting
> > > that 127.0.0.1 should be specified (as opposed to 127.0.0.0/8)
> > > for the same purpose, then I agree with you.
> > >
> > > 	Any address based approach to trying to test ECMP load
> > > sharing paths implies prior knowledge of all ECMP algorithms
> > > in use and is therefore broken.
> > >
> > > Eric W. Gray
> > > Systems Architect
> > > Celox Networks, Inc.
> > > egray@celoxnetworks.com
> > > 508 305 7214
> > 
> > 
> > If you have ideas to address the prior message I sent on ECMP and in
> > particular the problem of multiple branch points, I'd like to hear
> > your solution.
> > 
> > One thing that came to mind was the midpoint enumerating the possible
> > paths with indices and the ingress providing the index in subsequent
> > messages.  To do this, the ingress may have to bundle the better part
> > of an MPLS-ping message inside another so the top POP exposes an IP
> > packet and ECMP index to use and the packet inside is injected into
> > the stream.  If we go with this, then 127.0.0.1 would be fine.  This
> > has the disadvantage of not actually exercising the forwarding, just
> > the control plane.
> > 
> > The 127.0.0.1 address has a small advantage in that if the egress
> > comes out in some unexpected place, then it is delivered to that
> > router which can respond more intelligently than sending an ICMP TTL
> > expired.
> > 
> > Curtis