The MPLS WG Archive





[mpls] working group last call on draft-ietf-mpls-oam-frmwk-01.txt

  • From: "Busschbach, Peter B (Peter)" <busschbach@lucent.com>
  • Date: Mon, 3 Jan 2005 16:17:24 -0500
  • Cc: mpls@ietf.org

Thanks, Neil.

A few comments in-line.

Peter


neil.2.harrison@bt.com wrote:
> Hi Peter.....
> 
>> 
>> Neil,
>> 
>> Two questions:
>> 
>> 1) You say that PM can not be defined without defining entry
>> and exit criteria for availability. However, in Y.1711 you
>> specify that PM SLA recording be turned off at the entry of
>> the defect state. Doesn't this prove that PM is dependent on
>> an unambiguous definition of defect states, but independent
>> of the definition of availability?
> NH=> In my opinion, and from a practical point of view, 'yes' is the
> trite answer.....but only with how I have specified defects here, so
> let me explain this a bit further.  Note - if you look at the ITU
> Recs that deal with this they all define that a 10s period is to be
> used for unavailability entry/exit and for turning PM off/on (see
> G.826, for example).
> 
> My main objectives were simplicity whilst concentrating on what is
> operationally important based on many years of studying real transport
> networks.  Now in a bit more detail wrt your question:
> 
> True PM purists won't like my proposals, such as (i) not including PM
> in the entry and exit criteria of availability or (ii) the fact I
> suggested stopping the PM counters at the end of the 3s defect evaluation
> period.  We don't have to do either of these things of course...we
> can go for full 'by the book' processing with the added complexity
> and loss of information this might result in.  Let me try and explain
> what I mean by this.
> 
> Nearly all (close to 100%) of SES events I ever saw were accompanied
> by defect conditions in transport networks.  But SES came in 2
> flavours (you may recall the reasons for this that I explained in a
> long mail to you some time ago which I won't repeat here):  (i) few
> ms events at an effective error density 0.5 (ii) few seconds events
> at an effective error density 0.5.  Most <1s events self healed, so
> did a few 1-2s events, but few >3s events ever did (ergo why 3s in
> Y.1711 for defects).....and there was invariably a (server layer)
> defect present for all this time, meaning there would be FDI inserted
> into whatever the clients were (assuming they had an FDI).  In other
> words just looking for defects was sufficient to measure SES and thus
> was the simple way to measure unavailability in transport networks.
> 
> The few seconds SES events (which we called short-breaks) were however
> rather important.  These were 3 orders of magnitude bigger than the
> few ms SES and they knocked over applications.  This would not have
> been such a problem if they were rare compared to the few ms SES
> events.  But they were not....I was very surprised to see a roughly
> 50/50 split of these.  But we had no method of measuring these...and
> if they just got dumped in the same SES bucket with singleton PM SES
> events we lost sight of them.

You observed this behavior in an SDH network, right? It would be interesting to find out whether the same behavior occurs in networks that rely more on packet switching than on SDH (e.g. Ethernet-based metro networks).
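
Neil notes above that the ITU Recs (e.g. G.826) use a 10s period both for unavailability entry/exit and for turning PM off/on. A minimal Python sketch of that rule, assuming a hypothetical per-second trace of SES flags as input: unavailable time begins at the onset of ten consecutive SES (those ten seconds count as unavailable) and ends at the onset of ten consecutive non-SES seconds (those count as available), and SES are counted for PM only during available time. The function names are illustrative, and a trailing window shorter than ten seconds simply keeps the current state, which a real monitor would have to resolve later.

```python
WINDOW_S = 10  # consecutive seconds needed to enter or leave unavailability


def availability_states(ses):
    """Return one bool per second: True if that second is in available time.

    ses: sequence of bools, True where the second was a severely errored
    second (SES). Hypothetical input format, for illustration only.
    """
    ses = list(ses)
    n = len(ses)
    available = [True] * n
    in_available_state = True
    i = 0
    while i < n:
        window = ses[i:i + WINDOW_S]
        full = len(window) == WINDOW_S
        if in_available_state and full and all(window):
            # Onset of 10 consecutive SES: these 10 s are unavailable time.
            in_available_state = False
            available[i:i + WINDOW_S] = [False] * WINDOW_S
            i += WINDOW_S
        elif not in_available_state and full and not any(window):
            # Onset of 10 consecutive non-SES: these 10 s are available time.
            in_available_state = True
            available[i:i + WINDOW_S] = [True] * WINDOW_S
            i += WINDOW_S
        else:
            # No state change starts here; the second keeps the current state.
            available[i] = in_available_state
            i += 1
    return available


def pm_ses_count(ses):
    """Count SES for PM purposes, i.e. only those in available time."""
    ses = list(ses)
    return sum(1 for s, a in zip(ses, availability_states(ses)) if s and a)


if __name__ == "__main__":
    # Three isolated SES, then a 12 s outage, then recovery.
    trace = [False] * 5 + [True] * 3 + [False] * 5 + [True] * 12 + [False] * 15
    print("unavailable seconds:", availability_states(trace).count(False))
    print("SES counted by PM:", pm_ses_count(trace))
```

In this trace the three isolated SES stay in the PM count while the 12 s outage is excluded from it entirely, which is the "turn PM off/on" behaviour described above.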


> 
> It was so we could also distinguish these, and not lose sight of them,
> that I have suggested these be measured as a parameter in their own
> right (albeit harmonised with the manner in which defect entry events
> were defined).  In other words a defect lasting 3-9s would register a
> short-break event....else it will 'get lost' in the PM measurements
> (and it obviously will not factor as an unavailability event). 
> However, letting these short-break events get captured by the PM
> process will distort the metric objectives PM is intended to measure.
> And it is this latter point that is the practical reason for saying
> there is little point in continuing to measure PM metrics after a
> period of about 3s. 
> 
> However, you imply stopping PM measurements on defect entry (any?). 
> But this depends critically on how one defines the defect entry criteria.
> If this means the 3s as noted above then I'd be happy with that, but
> if you were to imply defects detected in <<1s periods (as you can
> easily do on transport networks) I would not, as I believe this
> should still be counted in the PM category.

Good point. I was thinking of defects that would be detected through MPLS CV or BFD.

How do you deal with server-layer defects? Do they have any impact on MPLS PM, or would the LSP end-point wait for Loss of CV to occur?
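
The duration thresholds Neil proposes above (defects shorter than about 3s stay in the PM figures, a defect lasting 3-9s registers as a short-break, and 10s or more is unavailability) can be sketched as a simple classifier. The thresholds come from this thread; treating exactly 3.0s as the start of a short-break and exactly 10.0s as the start of unavailability is an assumption about where the boundaries fall, and the function names are hypothetical.

```python
from collections import Counter

# Thresholds taken from the discussion above; exact boundary handling is
# an assumption made for this sketch.
DEFECT_ENTRY_S = 3.0    # below this, the event stays in the PM (SES) bucket
UNAVAILABLE_S = 10.0    # at or above this, the event is unavailability


def classify_defect(duration_s):
    """Map a single defect duration (seconds) to a reporting bucket."""
    if duration_s < DEFECT_ENTRY_S:
        return "pm"             # counted as ordinary PM events
    if duration_s < UNAVAILABLE_S:
        return "short_break"    # reported separately, not as unavailability
    return "unavailable"


def summarise(durations):
    """Count how many defects fall into each bucket."""
    return Counter(classify_defect(d) for d in durations)


if __name__ == "__main__":
    # Mix of few-ms events, few-second short-breaks and longer outages.
    print(summarise([0.004, 0.02, 1.5, 3.0, 7.2, 9.9, 12.0, 45.0]))
```

Keeping short-breaks in their own bucket is what stops them from disappearing into the singleton SES counts or being mistaken for unavailability.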


> 
> Note also that we should be a bit careful not to act too fast on <<1s
> events as we can trigger prot-sw on (PM) error events that would
> otherwise go away.
> 
> Finally here, please note we don't have to make the simplifications I
> suggested.  The 'by the book' approach means one has to let all events
> lasting <9s get registered as PM....though as I note this will distort
> PM and we will lose sight of short-break events....and simply use the
> 10s unavailable state entry/exit criteria to turn on/off PMs.  I'd
> settle for this, though I don't think it's the smartest approach.


I am not in favor of this 'by the book' approach. My concern is that I expect many service providers to run MPLS connectivity verification at rates (much) slower than once per second. In that case, a ten-second availability definition is useless. Therefore, I would prefer it if we could take availability out of the equation and relate PM to defects only.
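
The concern about slow CV rates comes down to simple arithmetic. Assuming loss of CV is declared after three consecutive expected CV packets are missed (which, at one CV packet per second, matches the 3s figure discussed above), the time to detect a failure is roughly three CV intervals, so at slower CV rates detection alone already exceeds the 10s unavailability window. An illustrative sketch; the miss count is a parameter and the names are hypothetical.

```python
MISSED_CV_FOR_LOCV = 3          # assumed consecutive misses before loss of CV
UNAVAILABILITY_WINDOW_S = 10.0  # 10 s entry criterion discussed above


def locv_detection_time(cv_interval_s, missed=MISSED_CV_FOR_LOCV):
    """Approximate time (s) to declare loss of CV after a failure."""
    return cv_interval_s * missed


def detected_within_window(cv_interval_s):
    """True if loss of CV can be declared inside the 10 s window."""
    return locv_detection_time(cv_interval_s) <= UNAVAILABILITY_WINDOW_S


if __name__ == "__main__":
    for interval in (1.0, 3.0, 10.0, 60.0):
        t = locv_detection_time(interval)
        print(f"CV every {interval:.0f} s -> loss of CV after ~{t:.0f} s, "
              f"within 10 s window: {detected_within_window(interval)}")
```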


> 
>> 
>> 2) Isn't availability only relevant with respect to SLAs? In
>> most networks, services are carried over multiple networks
>> layers. Eg. an ethernet service can be carried over an
>> MPLS-over-SONET-over-WDM core network. Defect conditions need
>> to be defined for each network layer, but availability is
>> only relevant for the end-to-end Ethernet service. Therefore,
>> it seems to me that the SLA itself can provide the definition
>> of availability (including a description of the verification
>> process) and that there is not necessarily a need for a tight
>> coupling with the defect definitions in the underlying network
>> layers. 
> 
> NH=> If I have SDHoverMPLS, which is the layer you would apply this
> reasoning to?  And what about MPLSoverSDH?...does this change things?
> 
> Please also note that the SLA will apply to all layer networks.

I disagree. A Service Level Agreement applies to a service (as the name implies) not to network layers. 

Assume for example the following service guarantee for an IP or Ethernet VPN service:

- every 15 minutes, generate a Ping request from each CE router to each peer CE router in the same VPN

- the number of failed ping attempts will not be more than X per Y hours

Such an SLA is completely independent of the network layers over which the VPN service is carried.


>  If you were leasing capacity off some SP in some layer network X I'd
> assume you'd want an SLA for that, and not just for the services you were
> then running over that. And if there was no alignment at all between
> how the leasing SP specified his SLA and you specified your SLA then
> what would you do about that?

SLAs need to be aligned, but that does not mean that there must be a consistent definition of availability.

Obviously, it would be unwise for a service provider to offer his customers more stringent SLAs than he himself gets from a backbone provider. However, there is no problem if the customer SLAs are less stringent than the backbone provider's SLA. Therefore, I don't see why Ethernet services would need to inherit the availability definition used in SDH networks.

> 
> My simple approach (because there are currently no rules for XoverY or
> how many XoverY can be allowed in some global HRX) is to aim for as
> much harmonisation as possible....some folks might call it 'backwards
> compatibility', I'm just trying to be pragmatic.
> 
> Now let me ask you a question if I may.  I have made a fair stab at
> defining quite a lot of stuff as simply as I can in Y.1711 (which
> would apply to p2p and p2mp).  So can you please explain to me how
> defects, availability and all this PM stuff is going to work on mp2p
> constructs?

As far as I can tell, most of the fault detection mechanisms work as well in mp2p constructs as in p2p constructs. As long as Connectivity Verification messages carry identifiers that uniquely identify a source-destination pair, all possible errors (loss of CV, mismerge, misbranching) can be detected. Also, ping, traceroute and loopback will still work.
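
The mp2p argument above can be illustrated with a sketch of the checks an LSP sink might run when every CV packet carries a (source, destination) identifier. This is not the Y.1711 or BFD encoding; the class and method names are hypothetical, loss of CV is assumed to be declared per source after three missed CV intervals, and any unexpected identifier is simply reported as a mismerge (no attempt is made to tell mismerge from misbranching).

```python
class Mp2pCvSink:
    """Per-source CV checking at the egress of an mp2p LSP (illustrative)."""

    def __init__(self, local_id, expected_sources, missed_limit=3):
        self.local_id = local_id
        # Each expected CV identifier is a (source, destination) pair.
        self.expected = {(src, local_id) for src in expected_sources}
        self.missed_limit = missed_limit
        self.missed = {pair: 0 for pair in self.expected}

    def end_interval(self, received):
        """Process the (source, destination) IDs seen in one CV interval.

        Returns a dict listing unexpected IDs ("mismerge") and expected
        sources currently in loss of CV ("locv").
        """
        received = set(received)
        report = {"mismerge": sorted(received - self.expected), "locv": []}
        for pair in sorted(self.expected):
            # Reset the miss counter on receipt, otherwise count one more miss.
            self.missed[pair] = 0 if pair in received else self.missed[pair] + 1
            if self.missed[pair] >= self.missed_limit:
                report["locv"].append(pair[0])
        return report


if __name__ == "__main__":
    sink = Mp2pCvSink("PE3", ["PE1", "PE2"])
    for _ in range(3):  # PE2 stops sending CV for three intervals
        report = sink.end_interval([("PE1", "PE3")])
    print(report)
    print(sink.end_interval([("PE1", "PE3"), ("PE2", "PE3"), ("PE9", "PE4")]))
```

Because each source is tracked separately, a silent PE2 is detected even while PE1 keeps sending CV over the same merged LSP.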

The only mechanism that does not work well in mp2p constructs is AIS insertion due to server layer defects.

However, Service Providers can avoid mp2p constructs if they set up LSPs with RSVP-TE. Since services with stringent quality requirements are likely to be carried over traffic-engineered LSPs, whereas services with less stringent requirements are carried over autonomously created LSPs (through LDP), I don't see the problem with mp2p.


> 
> regards, Neil
>> 
>> Peter
>> 
[snipped till the end]

_______________________________________________
mpls mailing list
mpls@lists.ietf.org
https://www1.ietf.org/mailman/listinfo/mpls