
Subject: Transport protocols and bulk file transfer

  • From: "Lawrence D. Dunn" <>
  • To: "Injong Rhee" <>, <>
  • Subject: [transport] Re: some comments on the design document
  • Date: Fri, 10 Dec 2004 17:41:18 -0600

Folks,
Here's a sample of what the high-speed WAN routers do for packet memory.
This is info publicly available on Cisco web pages.
I'm pretty sure Juniper numbers are similar (but I haven't checked lately...)
This is w.r.t. packet-buffer memory (and excludes other types of memory
in the system, like route-table, OS-memory, etc.)

For an OC-48-speed linecard on a 12xxx router
(whether it's 1xOC48, 4xOC-12, etc- the aggregate "speed" of the
1- or several- interfaces on the card is OC-48):
The packet memory is organized as 256MB Rx, 256MB Tx.
So a conservative analysis could just look at the Tx memory,
associated with queues for the outgoing traffic
(but a more thorough analysis would somehow
have to model the contribution of the distributed Rx memory
across the several ingress ports that might be sending traffic
to that egress. Let's keep it simple for now, just look at egress mem...)

So with 2.5Gbits/sec, or *roughly* 300MBytes/sec,
a 100msec-worth of buffer memory would be ~30MBytes.
(if you wanted to use 2xone-way or 2x100msec as RTT,
then 200msec would be about 60MB).
So 256MBytes of egress memory is overkill, if anything,
for OC-48 (something like 800msec).

I guess it depends on your target for "long" RTT,
but simple ping from California to Switzerland is ~170msec RTT.
So I think it's not unusual to use ~100msec as typical 1-way
"long", but admittedly not "worst" case; "worst" can
be anything you dream up, w.r.t tunnels, looped paths, etc.

I think the OC-192 cards also use 256MB Tx memory,
so that's only about 200msec, but that still
covers bandwidth*delay for most situations.
(If it's multiple ports, then the BW*delay
drops accordingly, and the total linecard speed
vs. the total packet-buffer shared across LC ports
tends to be "what counts". There are various schemes
for sharing memory across ports, preventing
starvation, etc.)
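The arithmetic above is easy to sanity-check. Here's a short Python sketch; the 256 MB Tx figure is the one from this message, the line rates are the standard SONET rates, and the rest is plain bandwidth*delay arithmetic:

```python
# Sanity-check of the buffer arithmetic above. The 256 MB Tx memory
# figure is from this message; line rates are standard SONET rates.

OC48_BPS = 2.488e9     # OC-48 line rate, bits/s (~2.5 Gbit/s)
OC192_BPS = 9.953e9    # OC-192 line rate, bits/s (~10 Gbit/s)
TX_MEM_BYTES = 256e6   # egress packet memory per linecard

def buffer_for_rtt(rate_bps, rtt_s):
    """Bytes needed to hold rtt_s worth of traffic at full line rate."""
    return rate_bps / 8 * rtt_s

def seconds_of_buffering(rate_bps, mem_bytes):
    """How long mem_bytes lasts at full line rate, in seconds."""
    return mem_bytes * 8 / rate_bps

print(buffer_for_rtt(OC48_BPS, 0.100))                # ~31 MB for 100 msec at OC-48
print(seconds_of_buffering(OC48_BPS, TX_MEM_BYTES))   # ~0.82 s: the "~800 msec" figure
print(seconds_of_buffering(OC192_BPS, TX_MEM_BYTES))  # ~0.21 s: the "~200 msec" figure
```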

I haven't checked the OC-768 products.

Note that if you look at the board layouts of these
cards, there's a lot of space being taken up w/ various memory.
So folks are interested in the Stanford work/claims
w.r.t. 1/sqrt(N) flows and potentially reducing required
memory; they're also interested in the outcome of various
alternate TCP proposals that might "require" less bottleneck
memory.

The designers would love to cut down on space/power/cost,
but for the most part are sticking pretty close to BW*RTT_delay,
roughly, for high-end gear.

There are lots of devices that are "meant" to be used
in LAN environments, which are more cost-sensitive.
There, designers often keep the cost/space lower for
"LAN" interfaces by reducing memory.
Of course, if that device/port happens to be the bottleneck,
that's a problem...

Hope that helps a bit,

Larry
--

At 11:54 AM -0500 12/10/04, Injong Rhee wrote:

-----Original Message-----
From: stanislav shalunov [mailto:]
To:
Subject: [transport] Re: some comments on the design document

"Injong Rhee" <> writes:

> I have some comments on the document. Since I am coming from loss-based
> protocols, my comments could be a little biased. So please take it with
> some grains of salt. Here is a quick brain-dump.
>
> First, the document seems to weigh more in on delay-based protocols. I
> feel we need more discussion on which way to go before we finally settle
> on the goals.

Yes. It would be great if you could help me and others understand the
advantages of pure loss-based approaches better.

> In Section 2.1 it is implied that loss-based protocols always double
> the delay and the currently deployed routers have RTT worth of
> buffers. In fact, the router buffers (from what I gather) are not
> as large as RTT. Typically, I hear that it is around 20% of the
> long RTT. The STCP paper mentions the lack of buffers, and there
> were several references that comment on small buffers. Also router
> buffers are shared memory among several interfaces and ports. So per
> port memory could be limited. We need more data supporting your
> claims.

I agree for some of the paths: the claim is almost certainly wrong for
Ethernet switches, where 64- and 128-kB buffers aren't uncommon, even
for major manufacturers. I believe the provisioning claim to be right
for major router vendors.

I guess Larry can enlighten us on this, but my own investigation with router
makers indicates that buffer space (especially on high-speed, long-distance
networks) will be limited, and that is not likely to change in the near
future. I suppose it is hard to predict network delays (even base RTT
delays) at the time of router deployment, since many other Internet paths
might connect through a high-speed router, so the base RTT will likely
increase in the future. So it is a little difficult to imagine that each
router (at the time of deployment) has its buffer space set to the
worst-case delay (and what is the worst case, anyway?). Thus, I don't see
the buffer space being as large as the bandwidth-delay product.


I draw a blank on ATM switches, but these
are largely extant at this point.

Inserted the following text as a footnote after ``the user-observable
delay would need to double before losses would occur'':

The buffer will, of course, depend on the type of the network
interface and the device in front of the bottleneck link. The
claim of provisioning of buffer space sufficient to hold
round-trip time worth of data applies primarily to backbone
router interfaces; it also holds for most host operating
systems (should the first link be the bottleneck). The claim
does not necessarily hold for Ethernet switches, which
traditionally come with smaller buffers. However, the
presence of large buffers at least in front of some of the
bottleneck links makes the claim relevant.

Larry, can you comment on the correctness of the provisioning claim in
the case of Cisco devices?

> At the end of 2.1, you mention tolerance to losses. Are
> you referring to short-term throughput or long-term throughput? As
> you refer to response function, I assume that it is to long-term
> throughput. If non-congested losses are rare, it is not necessarily
> a characteristic of loss-based protocols that their tolerance to
> loss is very low. Consider TFRC. TFRC responds to the average loss
> rates - instead of instantaneous loss rates. Depending on the
> design, we can make congestion control respond less sensitively
> to "rare" non-congestive losses.

Yes, the text is talking about long-term throughput. It's a design
goal of TFRC to have a response function of 1/sqrt(p) (where p
includes both congestive and non-congestive packet loss), isn't it?
TFRC's rate responds more smoothly to losses than by jerking the rate to
half, but the long-term effect of any loss event (congestive or
not---the controller can't tell) on throughput, by design, should be
the same as it would be in the case of Reno, shouldn't it? Am I
missing something?

Yes, you're right. So are you referring to the "tolerance" of loss-based
protocols being close to 1/sqrt(p)?

The fact that TFRC is designed to be TCP-compatible does not, of
course, mean that any loss-based protocol will exhibit the same
behavior. I still need to understand BIC, for example.

Not all loss-based protocols have 1/sqrt(p) response functions. For instance,
STCP has 1/p.
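To make the difference concrete, here is an illustrative Python sketch (not from the discussion itself: the 1/sqrt(p) constant is the textbook simplified-Reno value, and the 1/p constant is arbitrary; only the scaling with p matters):

```python
import math

def reno_like_rate(p, mss=1460, rtt=0.1):
    """1/sqrt(p) response function (simplified Reno/TFRC model), bytes/s."""
    return (mss / rtt) * math.sqrt(1.5 / p)

def stcp_like_rate(p, c=1.0):
    """1/p response function (STCP-style); c is an arbitrary constant."""
    return c / p

# Halving the loss rate raises a 1/sqrt(p) flow's rate by ~41%
# but doubles a 1/p flow's rate -- the 1/p flow grows far more
# aggressive as loss becomes rare.
for p in (1e-2, 5e-3):
    print(p, reno_like_rate(p), stcp_like_rate(p))
```

This is why a 1/p protocol ends up taking much more bandwidth than a 1/sqrt(p) one at low loss rates, even though both "respond to loss."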


> In Section 2.3, there are references to SABUL and SOBAS. Could the
> authors of these protocols comment on their TCP friendliness?
> Since we have the authors of SABUL on board, we can hear from the
> experts; but our experience with SABUL (an old version) is that the
> protocol is not so stable and was not designed for TCP-friendliness.
> But we could be wrong, and the protocol could also have changed
> over time.

Constantinos is now also on the list.

> In Section 6, I think we need a clearer definition of "under
> congestion".

The phrase comes straight from RFC 2914:

Again borrowing from RFC 2309, we use the term "TCP-compatible" for a
flow that behaves under congestion like a flow produced by a
conformant TCP. A TCP-compatible flow is responsive to congestion
notification, and in steady-state uses no more bandwidth than a
conformant TCP running under comparable conditions (drop rate, RTT,
MTU, etc.)

> Does this mean when you get congestion indication (i.e., queuing
> delays or losses)?

I wish I knew. Do you agree with the current text where it says ``the
usual understanding of this definition is that the response function
is the same as that of conventional TCP''?

> Also I guess that the wording on loss-based protocols behaving
> exactly as TCP under congestion is not necessarily true.

Where is that? If the text says that *all* loss-based protocols
behave like TCP, that's, of course, wrong (clearly, there are response
functions different from 1/sqrt(p)) and needs to be fixed. Maybe I
fixed it in -03? I can't find that language.

> Also in general TCP-compatibility means that it can "co-exist" with
> TCP (not necessarily TCP-friendly).

I simply use the term as it is used in RFC 2914. Are there more
authoritative definitions? In any case, we'll have to talk to the
IETF crowd some time later on, so maybe we should use the language
that works there.

> More precise definition of TCP friendliness can be found in
> Vojnovic's paper (SIGCOMM 2003), where the competing flows always use
> no more bandwidth than TCP flows.

Right, the terms TCP-compatibility and TCP-friendliness are both very
much overloaded (sometimes even those two are used interchangeably).
Prompted by Lisong's message
<>,
where he
says ``[i]f we consider `reno-friendly' as `not hurting the throughput
of Reno' instead of `having the same throughput of Reno', then maybe
we can design a tool which is Reno friendly and also has a better
performance,'' I wanted to introduce some name for that property. Is
there a better name than TCP-friendliness?

--Stas

--
Stanislav Shalunov http://www.internet2.edu/~shalunov/

This message is designed to be viewed at room temperature.


