  • From: stanislav shalunov <>
  • To:
  • Subject: Re: [transport] Comments on tradeoffs-design-space-03.pdf (part2 of 2)
  • Date: 09 Dec 2004 22:19:03 -0500

"Lawrence D. Dunn"
<>
writes:

> (BTW- I'm OK if we let folks read it and comment via
> email for a week, rather than taking time on the
> call-the-day-after-i-send it. Your call, Stas...)

Let's see if there are points that would benefit from low-latency
telephone discussion...

> 4.1 point "3"
> "failing closed" is not a universally understood term
> (and a dynamite detonator that "fails closed" may be
> more robust to its eventual purpose, but is not safe...)

Inserted explanation:

* it appears more amenable to being made more robust, in the sense of
failing closed (it is a desirable property for the protocol to back
off if both data and information about the state of congestion stop
coming);

> 4.2 timestamps
> I'm ignorant as to whether the inclusion of timestamps
> could impact the CPU/efficiency of the process
> enough to alter the achievable rate.
> Voices of experience?

Not a real voice of experience, but see the measurement of the time to
get the time from the kernel elsewhere in the document (4.3 us on my
machine; horrifically bad design somewhere). Still, at 1 Gb/s with a 1500-B MTU
(pessimistic estimate again), that's ~35% of a 2-GHz CPU -- or ~6% for
jumbo frames.
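
Spelling that arithmetic out (my numbers, assuming one time-of-day call
per packet at the measured 4.3 us):

  $10^9 / (1500 \cdot 8) \approx 83{,}000$ packets/s;
  $83{,}000 \cdot 4.3\,\mu s \approx 0.36$ s of CPU time per second, i.e., the ~35% above;
  with a 9000-B jumbo MTU, $\approx 14{,}000 \cdot 4.3\,\mu s \approx 6\%$.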

If that's too expensive (it seems like it is), using TSC makes the
performance implications a non-issue (mostly). Depending on the
format, putting a timestamp might take (rough estimate) 100 extra
cycles or so. That's 50ns at 2GHz. It's about as expensive,
cycles-wise, as a (fixed) nonce.

Text addition at end of section:

Note that with the use of the TSC register for timestamping, the
performance should not suffer, as long as the time representation is
not overly complex.
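
For concreteness, a minimal sketch of the TSC approach (assuming x86
and gcc/clang; the helper name is mine, not the document's):

#include <stdint.h>
#include <x86intrin.h>          /* __rdtsc() on gcc/clang, x86 only */

/* Reading the time-stamp counter costs on the order of tens of cycles,
 * versus microseconds for a gettimeofday()/clock_gettime() round trip. */
static inline uint64_t cycle_timestamp(void)
{
    return __rdtsc();
}

/* One possible (simple) representation: put the low 32 bits of the TSC
 * in the packet; the peer only ever looks at differences, so wraparound
 * is tolerable as long as the representation stays this simple. */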

> 4.6 "Selective Acknowledgements"
> Parts of section 4 read as tutorial;
> in 4.6, we lead off with the term "Selective Acknowledgements",
> but never mention SACK in the body. May as well include
> a reference to SACK, particularly given the tutorial-feel
> of the rest of the paragraph.

After ``unnecessary retransmissions,'' inserted:

Modern TCP uses Selective Acknowledgments (SACK)
\cite{rfc2018,rfc2883,rfc3517} to avoid the problem.

> I'm not quite sure what the last sentence means: "...mechanism.. same
> as for regular transmission"

The point I was trying to make is that, say, in TCP, SACK is an add-on
that requires a change at both ends to work. If the need for
selective retransmissions is realized from the start, it could be done
more elegantly (e.g., if the receiver sends [implicit or explicit]
requests for blocks of data, it's a trivial matter to use the same
format of request for retransmission).

How can we improve the wording?
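
For instance (a hypothetical wire format, not something from the
document): if the receiver's ordinary way of asking for data is a block
request, a retransmission request is just another instance of the same
message:

#include <stdint.h>

/* Hypothetical request message: the receiver asks for `length' bytes
 * starting at `offset'.  First transmission and retransmission use the
 * same format; a "retransmission request" is simply a request for a
 * range that was asked for before but never arrived. */
struct block_request {
    uint64_t offset;    /* starting byte offset in the transfer */
    uint32_t length;    /* number of bytes requested */
};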

> 4. general
> The section is called "Protocol format and framing"
> but a couple of the paragraphs delve into mechanism;
> maybe "4" needs some small re-titling,
> like "Protocol features and framing" or similar

Excellent. Done.

> (I also think the difference between "format"
> and "framing" may be too subtle for many readers?).

Mea culpa.

> 4.7. Request compression ratio
> I'm not sure it's quite true/relevant that
> the *packet* ratio needs to be <<1 (<1?)
> to "use the network resources efficiently".

I think we would all agree that a ratio of 1 is wasteful. One could
transmit essentially the same information with a smaller ratio.

> Also, "considerably smaller" is vague.

That's on purpose, as the difference between, say, 0.1 and 0.01 is
quite negligible compared to the difference between 1 and 0.1.

> I agree it's a worthy goal to be able to control
> the ratio, but it might suffice that the byte-ratio
> is <<1.

Well, if that's not true for a unidirectional transfer, the protocol
is really in some deep animal exhaust.

> Also, if the reverse-path is uncongested,
> it's not clear that reverse-direction efficiency is even
> important

But what if it is congested?

> (and if that's an information channel
> w.r.t. controlling send-rate, then we may want to reduce
> pkt- or byte- efficiency to gain information rate).

I agree. Reducing the backflow to the point where information
transfer suffers is not desirable.

> So I'm not arguing against a control method, just think
> that the logic of "why" and the metrics (pkt-based)
> might warrant further thought.

New language:

To use the network resources efficiently, it is desirable to make this
ratio sufficiently small and, perhaps, to be able to control it; in
particular, values considerably smaller than $1$ are desirable.
Naturally, reducing the backflow of packets (and bytes) is a goal
subservient to conveying sufficient information about the state of the
network.
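
(To put my own numbers on it: acknowledging every $k$-th data packet
gives a request-to-data ratio of $1/k$, so $k = 10$ already yields
$0.1$; pushing on toward $0.01$ saves little further capacity but risks
starving the sender of information about the state of the network.)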

> 5. Window- vs. rate- based
> Need to be careful about the term
> "kernel TCP patch" as I think sometimes
> folks consider wholesale replacement
> of the OS-provided algorithm as a "kernel patch".

I include that meaning. I think it's now conventional, with people
talking about Web100 and FAST TCP as ``patches.''

> Of course, some kernels now come with several
> algorithms included. So I think this paragraph
> is too strong. How do we know which algo. is
> being "patched"? embedded Reno? Vegas? embedded FAST?

I believe all TCP flavors of this sort implemented to date use
window-based control.

> Then again, I think I've been in conversations where someone
> refers to FAST as a "kernel TCP patch", which of
> course takes us from window- to delay-based via a "patch".

It takes one from loss- to delay-(and-loss)-based. The protocol stays
window-based (unless there's a new development I'm not aware of).
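
If it helps to spell the distinction out (my sketch, not text from the
document): window-based control gates sending on how much data is
outstanding, rate-based control gates it on a clock, and whether the
controller reacts to loss or to delay is a separate, orthogonal
question -- which is how FAST can be delay-based and still
window-based:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative state only; field names are mine. */
struct sender {
    uint64_t bytes_in_flight;   /* sent but not yet acknowledged */
    uint64_t cwnd_bytes;        /* congestion window (window-based) */
    double   next_send_time;    /* pacing clock (rate-based) */
};

/* Window-based: may I send?  Depends on outstanding data.  How
 * cwnd_bytes is adjusted (on loss, on delay, ...) is the separate
 * question -- Reno and FAST differ there, not here. */
static bool window_ok_to_send(const struct sender *s, uint64_t pkt_bytes)
{
    return s->bytes_in_flight + pkt_bytes <= s->cwnd_bytes;
}

/* Rate-based: may I send?  Depends only on the clock; next_send_time
 * would be advanced by pkt_bytes * 8 / target_rate after each send. */
static bool rate_ok_to_send(const struct sender *s, double now)
{
    return now >= s->next_send_time;
}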

> So depending on the reader, the "hardly relevant" is
> a little strong, and open to confusion.

I changed ``hardly'' to ``less.'' How's that now?

> I'm also confused by telling the reader that it is
> "hardly relevant", and then "has not been studied"
> in the same paragraph (in which case, how do we
> know it is hardly relevant or not?)

We want to use existing research, not to set up a whole new research
agenda.

> 5. "...can have fewer bursts or they can be smaller" ->
> "...can have fewer bursts, or the bursts can be smaller".

Done.

> 6. "compatibility vs. friendliness..."
> "...becoming obsessed with necessarily being TCP-compatible..."
> is inflammatory and pejorative.
> Perhaps "requiring strict adherence to TCP-compatibility..."?

Done.

> Also- it's not clear that requiring TCP-compatibility leads to
> "same performance as conventional TCP".
> (Lossless environment being obvious counterexample).

I don't think I understand what ``lossless environments'' are.
Environments in which no loss, congestive or non-congestive, can ever
occur?

> Perhaps it would help the reader to characterize
> HS-TCP, FAST, BIC, etc as to whether you view them,
> with the proposed definition of terms, as "compatible"
> or "friendly".

HS-TCP is compatible (and friendly) at low rates (I forget the limit,
but it's a small number of tens of megabits per second for realistic
WAN paths). Not compatible and not friendly above that.

FAST is not TCP-compatible (doesn't halve rate on loss), but friendly
at least in some regimes (and, in fact, better than friendly).

I don't understand BIC well enough to say.

> Under the definition and conclusion that
> "compatible --> *same* performance", is *any*
> algo. compatible with Reno except Reno?

Perhaps. Something like TFRC.

> I can't tell if this section is a complaint about 2914's relevance
> or choice of wording, or lack of common usage of terminology,
> or something else.

It's a complaint about the rigidity of RFC 2914 and an attempt to find
a way out of that rigidity without losing the intent of 2914 (``don't
hurt normal TCP'').

> 7. state location
> No real comments. Though it might be nice to provide a
> word or two on how receiver misbehaves to get unfair-share,
> instead of making the reader chase the reference.

Inserted ``with ACK-flooding'' after ``unfair share of network
capacity.'' Is that enough of a hint?

> 8. UI/API
> UI- though not exactly a "UI", might it be sensible to provide
> hooks/code for web100 to provide some display?
> I'm not sure how (non-)trivial it is, but being able to
> view/tweak state through gutil, and compare w/ other
> protocols through that vehicle, might be worthwhile.

Not quite sure I understand what you mean here. Web100 lives in the
kernel. How do we get it to talk to us without changing it (and thus
changing the kernel)? Could be doable inside the kernel, but then the
tool becomes a Web100 extension; not exactly the ubiquitously
deployable, easy-to-install tool we want, is it?

How about we talk about it on the phone?

> (appendix) "A"
> Should probably prepend the word "Appendix"?

I let the LaTeX article style figure that part out. Not sure why it
thinks there should not be ``Appendix'' in front of ``A'' or on a
separate line. Looks like a typographic bug to me. Let's battle
style issues later.

The LaTeX text is now:

\appendix
\section{Formalization of ...}

> OK, one example: I'm unconvinced by the assertions about
> data rate and convergence time; they seem predicated on
> linear-bit-density, integer representation?

Representation is not discussed or used. Only the number of
information-theoretic bits. These are too easy to confuse with 0s
and 1s on the wire. How can we make this part of the argument more
digestible?

> But the controller, knowing its bit-rate transfer limitations,
> could choose an encoding that traded off precision for range,
> to speed (approximate) convergence.

Yes, by giving up precision, one can gain speed. That's why it says
``or'' between ``precision'' and ``speed.''

I'm aware of the trade-off (and, in fact, considered using it:
http://www.internet2.edu/~shalunov/tcpar/tcpar.pdf describes a scheme
that attempts to give 1/p response with fairness).
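
A back-of-the-envelope version of the trade-off (my own illustration,
not the appendix's argument): if the feedback conveys $b$
information-theoretic bits per RTT and the controller must distinguish
among $N$ candidate rates, it needs on the order of $\log_2 N / b$ RTTs
of feedback; accepting a coarser answer shrinks $N$ and hence the
convergence time, while sending more feedback raises $b$ at the cost of
more backflow.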

> It's not clear that "convergence" to a specific value
> is a requirement of this application (<=upper_bound
> is probably more useful, closer being better, perhaps...)

That's a valid point. What do we make of it? TCP Reno does the <=
upper bound part well enough (not counting the increase in delay
part). But it says nothing about closeness.

> I'm not saying it's the "right" way to do it, just that
> it makes me uncertain of the validity/applicability of the (overly?) tight
> set of assumptions that lead to the nice math.

Designing network protocols is always an exercise in hunches and
empirical rules, is it not? Having some facts to hold on to doesn't
hurt. I might have succumbed to the temptation to use the observation
in question as a hammer going around looking for nails, of course. If
I did, show me where so that we can fix that.


