rcache fsync() avoidance

rcache fsync() avoidance

Nico Williams

One of the things that makes the traditional MIT rcache design painfully
slow is the use of fsync() in each operation that doesn't detect a
replay (the common case).

We all know that RFC4120 recommends a 10 minute skew window: 5 minutes
into the past, 5 into the future.  But the skew window can be different,
and implementations tend to make it configurable.

fsync() avoidance is simply about allowing part of the skew window to be
dynamically determined for a few minutes after boot time.

The fsync() avoidance rules are:

 - fsync() when writing rcache entries for Authenticators with
   timestamps "far enough" into the future, but not otherwise;

 - at boot time pick a time T_0 such that any Authenticators with a
   timestamp after T_0 would have triggered an fsync() if played
   before the boot event;

 - reject any Authenticators whose timestamps are before T_0.

T_0 must be between T_crash and T_ready, where T_crash is the time of
crash or shutdown, and T_ready is the time at which the system is ready
to service clients.
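
A minimal sketch of the first rule (hypothetical code, not what MIT
ships; the constant is an assumption, roughly the crash-to-ready
estimate discussed below):

  #include <stdbool.h>
  #include <time.h>

  /*
   * Sketch only.  "Far enough" into the future is taken here to mean:
   * past any T_0 that a reboot starting right now could pick, i.e.
   * more than about one crash-to-ready interval ahead of the local
   * clock.  Entries that fail this test need no fsync(); after a
   * crash they would fall before T_0 and be rejected outright.
   */
  #define CRASH_TO_READY_ESTIMATE 3     /* seconds; assumed, see below */

  static bool
  entry_needs_fsync(time_t now, time_t auth_ts)
  {
      return auth_ts > now + CRASH_TO_READY_ESTIMATE;
  }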

T_crash is often impossible to determine, so it's best to estimate it as
T_boot.  T_ready can be estimated as T_boot + .5 * average time to boot.
For many systems a decent guesstimate can be something like

  T_ready = T_boot + 3s  /* guesstimate */
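
As a sketch of picking T_0 at startup (assuming Linux, where
CLOCK_BOOTTIME reports time since boot; other systems would need some
other way to recover T_boot):

  #include <time.h>

  /* Sketch, Linux-specific: T_0 = T_ready = T_boot + 3s, per the
   * guesstimate above.  CLOCK_BOOTTIME gives seconds since boot. */
  static time_t
  pick_t0(void)
  {
      struct timespec up;
      time_t now = time(NULL);

      if (clock_gettime(CLOCK_BOOTTIME, &up) != 0)
          return now;              /* conservative fallback: T_0 = now */
      return (now - up.tv_sec) + 3;
  }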

The key concept here is that the skew window need not be static, much
less +/-5m.  Here it starts out close to [now, now + 5m] at boot time
and grows to [now - 5m, now + 5m] at the rate of 1s/s, so that at T_boot
+ 5m the system is back to normal.
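
Putting the accept/reject side together, a sketch with SKEW standing
in for the configured clock-skew limit and t0 for the value picked at
boot:

  #include <stdbool.h>
  #include <time.h>

  #define SKEW 300                 /* configured clock skew: 5 minutes */

  /*
   * Effective window is [max(now - SKEW, t0), now + SKEW]: right after
   * boot the lower edge is pinned at T_0, and once now - SKEW passes
   * T_0 (at most SKEW seconds after boot) it has grown back to the
   * usual [now - SKEW, now + SKEW].
   */
  static bool
  timestamp_acceptable(time_t now, time_t t0, time_t auth_ts)
  {
      time_t lower = (now - SKEW > t0) ? now - SKEW : t0;

      return auth_ts >= lower && auth_ts <= now + SKEW;
  }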

This is inspired by thinking about placing the rcache in tmpfs (where
fsync() is a no-op): the obvious thing to do at boot time would be to
reject any Authenticator timestamps less than T_boot + 5m, which is like
saying that at T_boot the skew window is [now + 5m, now + 5m], and grows
second by second to the normal [now - 5m, now + 5m].  Obviously that's
not acceptable, operationally, which leads to placing the rcache on
stable storage.

Some clients will be rejected that wouldn't have been but for the
reboot, but many, many fewer than in the rcache-on-tmpfs case.

Nico
--

Re: rcache fsync() avoidance

Greg Hudson
On 09/01/2014 05:59 PM, Nico Williams wrote:
> One of the things that makes the traditional MIT rcache design painfully
> slow is the use of fsync() in each operation that doesn't detect a
> replay (the common case).

I think at this point we are prepared to just get rid of the fsync()
calls in the MIT krb5 rcache implementation, and call that an
implementation limitation.  My reasoning is:

* For most situations where replay caches help, they provide limited
protection against active attacks anyway.  (Basically: if the protocol
needs replay protection because it uses Kerberos for authentication
only, an active attacker could modify the data stream or suppress the
legitimate authentication to bypass the replay cache.  Replay caches
only provide complete protection when the data stream is protected by
the Kerberos authentication context, but without an acceptor subkey,
such that an attacker could replay a complete session to cause an action
to be executed twice.)

* The design you outline degrades into bad performance if either (1) the
server has negative clock drift beyond the boot time estimate, or (2) a
non-trivial fraction of clients have positive clock drift beyond the
boot time estimate.  It can also cause spurious authentication failures
shortly after boot, for clients with negative clock drift.

* The probability of bad performance behavior increases as the boot time
estimate approaches zero.  At some point in the future we might start to
see VMs with sub-second reboot times, at which point even a 1s positive
client clock drift would force an fsync() and even a 1s negative client
clock drift could cause a spurious authentication failure shortly after
a reboot.

Re: rcache fsync() avoidance

Nico Williams
On Tue, Sep 02, 2014 at 11:56:20AM -0400, Greg Hudson wrote:

> I think at this point we are prepared to just get rid of the fsync()
> calls in the MIT krb5 rcache implementation, and call that an
> implementation limitation.  My reasoning is:
>
> * For most situations where replay caches help, they provide limited
> protection against active attacks anyway.  (Basically: if the protocol
> needs replay protection because it uses Kerberos for authentication
> only, an active attacker could modify the data stream or suppress the
> legitimate authentication to bypass the replay cache.  Replay caches
> only provide complete protection when the data stream is protected by
> the Kerberos authentication context, but without an acceptor subkey,
> such that an attacker could replay a complete session to cause an action
> to be executed twice.)

I would just... declare such protocols dangerous and not supported.
Full stop.

No more rsh/rlogin/telnet with authentication only.  Preferably no more
rsh/rlogin/telnet full stop.

Application protocols that could benefit from rcaches:

 - UDP loggers (non-mutual auth AP-REQ + KRB-SAFE / MIC)
 - UDP / SCTP apps generally

We could even deprecate non-mutual auth for Kerberos and use an rcache
only for PROT_READY tokens, or even document that PROT_READY token
replays are not detected until the first non-PROT_READY per-msg token is
processed by the server.  (PROT_READY token semantics are close enough
to that anyways.)

Then we can get rid of the rcache completely.

It's probably a bit too soon to go that far.  But we could discuss that
on the KITTEN WG list and see what happens.

> * The design you outline degrades into bad performance if either (1) the
> server has negative clock drift beyond the boot time estimate, or (2) a
> non-trivial fraction of clients have positive clock drift beyond the
> boot time estimate.  It can also cause spurious authentication failures
> shortly after boot, for clients with negative clock drift.

If you're not using NTP or the like then it's fair to expect problems!

In any case, we really need a multi-round-trip extension anyways, which
should be the longer term answer to this concern.

> * The probability of bad performance behavior increases as the boot time
> estimate approaches zero.  At some point in the future we might start to
> see VMs with sub-second reboot times, at which point even a 1s positive
> client clock drift would force an fsync() and even a 1s negative client
> clock drift could cause a spurious authentication failure shortly after
> a reboot.

This is quite true.  If the estimate is 3s but the real time to boot is
.5s you have a window of vulnerability, but that's hardly worse than
just never doing fsync()!  :)

Yes, a window of vulnerability a few seconds long would be enormous to
the right attacker, but in practice these attacks never happen.  That's a
good reason to stop fsync()ing altogether.  But fsync()s might be more
relevant to other protocols, so at least documenting (done; this thread
can be it) fsync() avoidance might help someone else.

Nico
--