The MPLS WG Archive


QoS in Shared Media

  • From: Fred Baker <fred@cisco.com>
  • Date: Mon, 30 Jul 2001 10:25:06 +0100
  • Cc: "Jay Wang" <jwang@opixnetworks.com>, "HANSEN CHAN" <hansen.chan@alcatel.com>, "Naidu, Venkata" <Venkata.Naidu@Marconi.com>, "'mpls@uu.net'" <mpls@UU.NET>, <te-wg@ops.ietf.org>

At 04:53 PM 7/30/2001, Juha Heinanen wrote:
>i'm reading old emails and found your dpt/rpr/diffserv comment below to
>which i have not had time to reply.  unfortunately i have to disagree
>with you regarding dpt/rpr supporting diffserv.  the problem is their
>node based fairness concept.

Juha: you and I have discussed this in private mail, and I at least 
partially disagree. Let me lay out the argument, and the assembled masses 
can comment.

It comes down to this: I will grant you that providing something as simple 
as two levels of priority does not provide all of the facilities of the 
diffserv architecture - you cannot control the rate of each data stream in 
the system if you are not given the knobs. This is not so very different 
from Ethernet, which even with 802.1q labelling doesn't give you rate 
controls on traffic streams within the switched part of the network.

However, apart from failure events, the SRP architecture provides a 
nominal-latency and loss-free domain, and allows for access control on 
injection of traffic into that domain. Thus, injection of traffic to the 
ring from any one system does not exceed the rates configured on those 
systems, and there is technology to manage babblers. I will argue that this 
is not significantly different from typical ISP backbones, which rely on 
bandwidth on their internal interfaces and provide QoS services only on 
their ingress/egress links if at all.

Are you telling me that in order to be diffserv compliant, not only must a 
network provide the technology at its service interfaces, but also on each 
and every internal interface? If so, I don't predict early adoption...

Or are you telling me that to be compliant with the diffserv architecture, 
the device must specifically support the AF PHB on every interface? I don't 
find anything in the diffserv architectural documents that makes this 
stipulation. And even in the AF PHB (http://www.ietf.org/rfc/rfc2597.txt), 
while I see a stipulation that interfaces which might lose data are to drop 
randomly and prefer to drop from traffic marked "excess", I don't see any 
requirement that all interfaces drop.

I believe that you are essentially, in the final paragraph of your reply, 
asking that all switching be done at the IP layer, and that link layer 
networks of any sort be abandoned. I understand the viewpoint, but have to 
say that it is not completely practical; LAN technologies exist because 
they are inexpensive, useful, and widely used, and I don't know of *any* 
that support the diffserv architecture in the sense that you seem to want.

Following is the email I sent you on 5/10/2001, and your reply. For the 
remainder of the folks on the list, please excuse a few discussions of 
specific implementations which came up in this private conversation.

>To: Juha Heinanen <jh@lohi.eng.telia.fi>
>From: Fred Baker <fred@cisco.com>
>Subject: [Fwd: QoS in Shared Media]
>Cc: Mike Takefman <tak@cisco.com>
>
>At 10:19 AM 5/9/2001, Juha Heinanen wrote:
>>the idea behind diffserv is that each traffic class is allocated a certain
>>amount of aggregate resources (buffer space and bandwidth) at each
>>diffserv (l2 or l3) node.
>
>Mike Takefman, who manages the SRP hardware group, contacted me to be sure 
>he understood your issues and to see if we could sort them out. I think we 
>in fact have the support you need, but some of it is in the interface card 
>as opposed to being in the "Wide Area LAN" itself. Let me walk through your 
>issues, the architecture they have proposed, and how I see this all 
>working, and then you can tell me where the disconnect is.
>
>If I think about assuring voice/video a predictable latency through EF, 
>and providing a set of AF classes for elastic traffic at various rates, I 
>wind up implementing a multi-level scheduler like so:
>
>         voice/video =====>||==================>priority scheduler
>                                             =>
>         AF class 1  =====>||====>\         /
>         AF class 2  =====>||====> \WRR/WFQ/
>         AF class 3  =====>||====> /Scheduler
>         AF class 4  =====>||====>/
>
>In short, I want to assure the voice/video traffic that it will not be 
>delayed by the elastic traffic, but will instead get predictable latency 
>at the interface, and I control rate using policing. For the elastic 
>traffic streams, I want to ensure each one a specific minimum rate, and 
>ensure that when arrival rate for a class exceeds departure rate, I signal 
>to a TCP or similar transport which can materially adjust its rate to 
>compensate using RED or WRED.
>
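
The two-level scheduler described above can be sketched in Python roughly as
follows. This is a minimal illustration under stated assumptions, not any
vendor's implementation: the class name TwoLevelScheduler, the class labels,
and the weights are hypothetical, the round robin counts packets rather than
bytes, and policing and RED/WRED are left out.

```python
from collections import deque


class TwoLevelScheduler:
    """Strict priority for one EF queue, weighted round robin over AF queues."""

    def __init__(self, af_weights):
        # af_weights: {class name: integer weight}; names and weights are
        # hypothetical, chosen only for illustration.
        self.ef = deque()
        self.af = {name: deque() for name in af_weights}
        # One WRR round lists each AF class 'weight' times.
        self._round = [n for n, w in af_weights.items() for _ in range(w)]
        self._pos = 0

    def enqueue(self, cls, pkt):
        """Queue a packet under 'EF' or one of the configured AF classes."""
        (self.ef if cls == "EF" else self.af[cls]).append(pkt)

    def dequeue(self):
        """Return the next packet to send, or None if every queue is empty."""
        # EF always goes first, so elastic traffic never delays it.
        if self.ef:
            return self.ef.popleft()
        # Otherwise walk the WRR round, skipping AF classes with empty queues.
        for _ in range(len(self._round)):
            name = self._round[self._pos]
            self._pos = (self._pos + 1) % len(self._round)
            if self.af[name]:
                return self.af[name].popleft()
        return None
```

With weights such as {"AF1": 2, "AF2": 1}, a backlogged AF1 gets two
transmission slots for every one given to AF2, and an EF packet arriving at
any time is sent ahead of both.
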
>What SRP provides is a collection of point to point duplex lines with 
>internal queuing which together comprise a ring. Each interface sees:
>
>                                            +-----+
>      high priority transit traffic in ->||-|     |
>      local high priority in ----------->||-|     |
>                                            |     |--->traffic out
>      low priority transit traffic in -->||-|     |
>      local low priority in ------------>||-|     |
>                                            +-----+
>
>Clearly, the other end of "traffic out" is "traffic in" at the next node, 
>which classifies traffic as high or low priority according to a three-bit 
>field copied from the IP Precedence.
>
>The decision function favors high priority over low priority to ensure 
>minimal per-hop additional latency and jitter. The choice between local 
>and transit traffic is a bit more complicated. Each node on the ring is 
>guaranteed a certain lower bound bandwidth (at the moment, 1/N of the 
>ring, but Mike tells me that they are considering making this 
>specifiable), and if it is not using its fair share the decision function 
>will take local low priority traffic over low priority transit traffic, 
>and will take local high priority traffic over high priority transit 
>traffic. If the transit queues are too deep, however, it will take transit 
>traffic to clear the buffer. In the latter case, a longer term rule steps 
>in - the nodes communicate among themselves, and a system which fails to 
>get its share during one period of ten milliseconds will ask its neighbors 
>to back off for the next ten milliseconds. Hence, in any case, the low 
>priority has a predictable lower bound at each node.
>
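
The per-node decision function described above might be sketched like this in
Python. It is only a reading of the prose, not the SRP specification: the
function name pick_next, the queue keys, and the transit_depth_limit
parameter are hypothetical, the "too deep" test is applied within each
priority level as an assumption, and the ten-millisecond back-off messages
between neighbors are not modeled.

```python
from collections import deque


def pick_next(queues, under_fair_share, transit_depth_limit):
    """Choose the next packet for 'traffic out', or None if nothing is eligible.

    queues maps (priority, origin) to a deque, with priority in
    {"high", "low"} and origin in {"transit", "local"}.
    """
    for prio in ("high", "low"):          # high priority is always served first
        transit = queues[(prio, "transit")]
        local = queues[(prio, "local")]
        # A transit buffer that is too deep is drained before anything else
        # at this priority level (assumed threshold, see lead-in).
        if len(transit) > transit_depth_limit:
            return transit.popleft()
        # A node that has not used its fair share prefers its own traffic.
        if under_fair_share and local:
            return local.popleft()
        # Otherwise forward whatever transit traffic is waiting.
        if transit:
            return transit.popleft()
    return None
```

In this sketch a node that has already used its share sends local traffic
only after a later call finds under_fair_share true again, which stands in
for the longer-term fairness rule.
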
>The systems which implement SRP also have extra chips which implement some 
>number of classes that use a WFQ/WRR type of scheduler, WRED, metering 
>with policing or marking, etc. The chip also allows a single queue to be 
>given absolute priority, which is appropriate for voice/video. These work 
>with any interface device, whether SRP, ATM, HSSI, POS, or whatever, which 
>is why the SRP guys don't think about them too much. Fundamentally, in 
>SRP's case, they divide the low priority capacity proportionally among a 
>set of traffic classes.
>
>I think you will agree that SRP with the external priority queue limits 
>edge to edge jitter and latency around the ring for the EF traffic. I want 
>to convince you that each AF class also gets a predictable lower bound 
>rate, and that RED drops apply when incoming traffic overruns available 
>capacity.
>
>I submit that if each node in the ring gets a predictable lower bound, and 
>subdivides that among its several AF classes proportionally, then each of 
>those sub-classes also has a predictable lower bound. This is for traffic 
>entering the ring from each node; how it leaves the ring depends on the 
>traffic itself, and in the worst case stays on the ring all the way 
>around. Thus, the proportions on the ring itself may vary, but that is not 
>necessarily an SLA objective - the SLA objective is to give each edge 
>device its due, and that is done.
>
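
The lower-bound argument above is simple arithmetic, and a small Python sketch
may make it concrete. The function name af_lower_bounds, the 600 Mbit/s
low-priority capacity, the four-node ring, and the 4:3:2:1 class weights are
all hypothetical figures chosen for illustration; only the 1/N node share and
the proportional subdivision come from the description.

```python
def af_lower_bounds(low_prio_capacity, n_nodes, class_weights):
    """Per-class lower bounds at one node, in the units of low_prio_capacity."""
    # Each node is guaranteed 1/N of the ring's low priority capacity.
    node_share = low_prio_capacity / n_nodes
    # The node divides its share among its AF classes in proportion to weight.
    total_weight = sum(class_weights.values())
    return {cls: node_share * w / total_weight
            for cls, w in class_weights.items()}


# Hypothetical example: 600 Mbit/s of low priority capacity, 4 nodes.
bounds = af_lower_bounds(600.0, 4, {"AF1": 4, "AF2": 3, "AF3": 2, "AF4": 1})
```

Under these assumed numbers each node is guaranteed 150 Mbit/s, which it
splits into lower bounds of 60, 45, 30 and 15 Mbit/s for its four AF classes.
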
>In discussing this with the SRP team, I proposed a scenario in which I 
>wondered if traffic from one node could be reduced or locked out by the 
>preponderance of traffic from other nodes. That's where the "I'm not 
>getting my share" message comes in; even in that case, the other nodes 
>slowing down should result in the restricted node getting its bandwidth 
>needs met, by forcing competing traffic to back up into the WRR/WFQ queues 
>on the competing nodes. Any place that traffic backs up into a WRR/WFQ 
>queue, RED/WRED can be applied to talk with those TCP senders, so RED 
>happens and it happens to the right set of senders.
>
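
For readers who want the RED step spelled out, here is a minimal Python sketch
of the basic drop decision as it would apply to one backed-up WRR/WFQ queue.
It assumes the classic form of RED with a linear ramp between two thresholds;
the function names and all parameter values are hypothetical, and the
refinement that spaces drops by counting packets since the last drop is
omitted.

```python
def update_average(avg_queue, current_queue, weight):
    """Exponentially weighted moving average of the queue depth."""
    return (1.0 - weight) * avg_queue + weight * current_queue


def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Drop probability as a function of the average queue depth."""
    if avg_queue < min_th:
        return 0.0                      # queue is short: never drop
    if avg_queue >= max_th:
        return 1.0                      # queue is too long: always drop
    # Between the thresholds the probability rises linearly up to max_p.
    return max_p * (avg_queue - min_th) / (max_th - min_th)
```

WRED would keep a separate (min_th, max_th, max_p) triple per drop
precedence, so that traffic marked "excess" reaches a given drop probability
at a shallower average depth than in-profile traffic.
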
>It seems to me that we have all the bases covered. They are not covered 
>within the SRP component on the interface card, but they are covered in 
>the queuing components that sit in front of the SRP interface on the line 
>card. In other words, the following happens on the line card:
>
>         voice/video =====>||==================>priority scheduler
>                                             =>
>         AF class 1  =====>||====>\         /
>         AF class 2  =====>||====> \WRR/WFQ/
>         AF class 3  =====>||====> /Scheduler
>         AF class 4  =====>||====>/
>
>and on the SRP ring, a simple priority scheduler preserves the 
>latency/jitter characteristics needed by EF, and each elastic traffic 
>class (from each node and from the aggregate of the nodes) is guaranteed a 
>predictable lower rate bound.
>
>Am I missing something?

>From: Juha Heinanen <jh@lohi.eng.telia.fi>
>To: Fred Baker <fred@cisco.com>
>Cc: Mike Takefman <tak@cisco.com>, jh@rautu.eng.telia.fi
>Subject: [Fwd: QoS in Shared Media]
>Date: Thu, 10 May 2001 15:46:30 +0300
>
>fred,
>
>thanks for your thorough explanation.  my comments follow.
>
>queuing at two levels (layer 3 and 2) is always problematic.  we have
>learned this from atm, where l3 diffserv queueing and scheduling is
>messed up when the properly scheduled packets enter the atm layer's per
>vc queues. i'm afraid that similar problems exist with rpr if rpr
>doesn't do the same kind of queueing/scheduling using the user priority
>field as is done at l3 using the precedence field.
>
>based on your description i understand that in rpr the low priority
>capacity is allocated fairly to each node, i.e., each node has a lower
>bound on the amount of low priority traffic and can then distribute it
>to the elastic traffic classes according to its policy.  this doesn't,
>however, mean that a given traffic class would IN THE RING ITSELF get a
>given proportion of bandwidth, because one node could, for example, send
>only best effort packets and they would get as much ring bandwidth as
>business class packets from another node.
>
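
The objection above can be put into numbers with a short Python sketch. It
assumes that every node saturates an equal share of the same ring span; the
function names, node labels, class names, and the 3:1 class weights are
hypothetical and serve only to contrast node-based sharing with class-based
sharing.

```python
def ring_shares_node_based(node_mixes):
    """Class shares on the ring when each node gets an equal share.

    node_mixes maps node -> {class: fraction of that node's own share}.
    """
    n = len(node_mixes)
    shares = {}
    for mix in node_mixes.values():
        for cls, frac in mix.items():
            shares[cls] = shares.get(cls, 0.0) + frac / n
    return shares


def ring_shares_class_based(class_weights):
    """Class shares on the ring when scheduling looks only at the class."""
    total = sum(class_weights.values())
    return {cls: w / total for cls, w in class_weights.items()}


# Node A sends only best effort, node B sends only business class.
node_based = ring_shares_node_based(
    {"A": {"best_effort": 1.0}, "B": {"business": 1.0}})
class_based = ring_shares_class_based({"business": 3, "best_effort": 1})
```

Here node_based gives best effort and business class half of the ring each,
while class_based, with the assumed 3:1 weights, gives business class three
quarters. That gap is the mismatch being described.
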
>in other words, it seems to me that diffserv and node based fairness and
>separation of local and transit traffic simply do not match. if we
>forget node based fairness and local/transit traffic it could be
>possible to end up with a proper diffserv implementation where l2
>packets entering the ring (first time or transit) would be all
>queued/scheduled/dropped by the rpr interface based solely on the
>traffic class and drop precedence coded in the user priority field.
>additional complications may result from the ring wraps, but i don't
>remember the details.
>
>-- juha