Determining the number of clients per KDC

Sergei Gerasenko
Hi,

I’m planning an MIT KDC installation for a Hadoop cluster consisting of X clients with Y kerberized services each. The KDCs are rather powerful machines with 64 cores and 125G of RAM. I want to get the most out of this hardware and use the minimum number of KDCs required. Is there a rule of thumb for situations like this?

For example, imagining X=300 and Y=10, can/should I run X*Y (3000) workers to accommodate the worst-case scenario when they all want to get their tickets? Or can I assume that X*Y/2 can handle that?

I would appreciate any insights.

Thanks!
  Sergei
________________________________________________
Kerberos mailing list           [hidden email]
https://mailman.mit.edu/mailman/listinfo/kerberos

Re: Determining the number of clients per KDC

Russ Allbery-2
Sergei Gerasenko <[hidden email]> writes:

> I’m planning an MIT KDC installation for a Hadoop cluster consisting of
> X clients with Y kerberized services each. The KDCs are rather powerful
> machines with 64 cores and 125G of RAM. I want to get the most out of
> this hardware and use the minimum number of KDCs required. Is there a
> rule of thumb for situations like this?

> For example, imagining X=300 and Y=10, can/should I run X*Y (3000)
> workers to accommodate the worst-case scenario when they all want to get
> their tickets? Or can I assume that X*Y/2 can handle that?

For 3000 workers, you could probably run the KDC on a Raspberry Pi.

Redundancy for outage tolerance is almost certainly going to be the
limiting factor for number of KDCs in this situation unless you have way,
way more clients getting tickets than that, or you're using really short
ticket lifetimes, or you have some other unusual situation.

--
Russ Allbery ([hidden email])              <http://www.eyrie.org/~eagle/>

Re: Determining the number of clients per KDC

Sergei Gerasenko
Thanks for the quick response, Russ. Let’s say I run 1 worker process. How many clients can that sustain in the worst case scenario of all the clients trying to get a ticket? I need some way to quantify this. As for failover, I am planning to deploy a standby node.

Re: Determining the number of clients per KDC

Russ Allbery-2
Sergei Gerasenko <[hidden email]> writes:

> Thanks for the quick response, Russ. Let’s say I run 1 worker
> process. How many clients can that sustain in the worst case scenario of
> all the clients trying to get a ticket? I need some way to quantify
> this. As for failover, I am planning to deploy a standby node.

It's unfortunately been long enough since I've tested this on a system
running flat out that I don't remember what qps a KDC can do on modern
hardware, but I would expect it to at least be in the range of 100 qps.
It's probably constrained by the KDC being single-threaded.  Clients
aren't going to generally all try to get a ticket at the same time, due to
ticket caching, so that scales to a lot of clients.
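
To put a rough number on this, the sketch below estimates how long a single KDC process would need to work through the worst-case burst from the original question (X=300 clients, Y=10 services, one request each). It treats the 100 qps guess above as a constant service rate and ignores client retransmissions and timeouts; both are simplifying assumptions made purely for illustration, and the function name is hypothetical.

```python
def burst_drain_seconds(clients: int, services_per_client: int, kdc_qps: float) -> float:
    """Seconds for one KDC process to answer one request per (client, service) pair."""
    if kdc_qps <= 0:
        raise ValueError("kdc_qps must be positive")
    total_requests = clients * services_per_client
    return total_requests / kdc_qps


# Worst case from the original question, at the conservative 100 qps guess.
print(burst_drain_seconds(300, 10, 100.0))  # prints 30.0
```

Under those assumptions even the pessimistic rate clears the whole burst in about half a minute, and ticket caching means such a burst should be rare.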

General rule of thumb for KDCs is that you want at least a master and a
replica, and there's no reason not to have the replica serve most of the
traffic (in other words, I wouldn't go with a standby design).  Usually I
run at least three KDCs, although the number three is mostly because I
started with kaserver that needed three KDCs for stable Ubik quorum, which
of course isn't a thing with regular KDCs, so I don't know that three is
really better than two.

--
Russ Allbery ([hidden email])              <http://www.eyrie.org/~eagle/>

Re: Determining the number of clients per KDC

Sergei Gerasenko
Hi Russ,

Since I don’t know too much about the KDC architecture, sorry for the dilettante questions.

> It's unfortunately been long enough since I've tested this on a system
> running flat out that I don't remember what qps a KDC can do on modern
> hardware, but I would expect it to at least be in the range of 100 qps.

Is that per worker?

Speaking of workers, does MIT Kerberos spawn workers as needed (sort of like Apache) or is it capped by the `-w` argument? What’s a good number of workers to start with? 70? 500? 1000?

> General rule of thumb for KDCs is that you want at least a master and a
> replica, and there's no reason not to have the replica serve most of the
> traffic (in other words, I wouldn't go with a standby design).  

Ok, to clarify what you mean by the replica serving requests as well, do you mean:

1. Using a VIP that round-robins the requests to the primary and secondary KDC?
2. Or do you mean that half the clients use the master and the other half the slave?
3. Or do you mean that the client itself round-robins between them?

I can only see 2 as a real option because *I think* once a TGT is requested, all TGS requests would need to go to the server that gave the TGT? But I’m rather new to Kerberos, so I might be mistaken.

Thanks!!
  Sergei

Re: Determining the number of clients per KDC

Russ Allbery-2
Sergei Gerasenko <[hidden email]> writes:

> Since I don’t know too much about the KDC architecture, sorry for the
> dilettante questions.

Oh, no problem -- just be aware that they're being answered by someone who
hasn't run large-scale KDCs in about four years, so some of my information
is stale.  :)

>> It's unfortunately been long enough since I've tested this on a system
>> running flat out that I don't remember what qps a KDC can do on modern
>> hardware, but I would expect it to at least be in the range of 100 qps.

> Is that per worker?

Oh, workers are new to me.  So yes, that would be per-worker.

> Speaking of workers, does MIT Kerberos spawn workers as needed (sort of
> like apache) or is it capped by the `-w` argument? What’s a good number
> of workers to start with? 70? 500? 1000?

If you're doing default Kerberos, the networking is UDP, so it's not going
to spend a lot of time waiting for the network.  I would expect that to be
CPU-bound, and therefore would tend towards one worker per core.  If
you're doing a lot of TCP, that might increase the chances that you'll
wait for networking, and may benefit from more workers.
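
If the one-worker-per-core guess is taken as a starting point, the worker count could be derived from the machine itself rather than hard-coded. This is only a sketch: the helper names are hypothetical, and the only KDC option it relies on is the `-w` argument already mentioned in this thread.

```python
import os


def suggested_kdc_workers(cpu_cores=None):
    """Return one worker per core, the CPU-bound rule of thumb guessed at above."""
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return max(1, cores)


def krb5kdc_command(workers):
    """Build the argv for starting krb5kdc with a fixed number of workers."""
    return ["krb5kdc", "-w", str(workers)]


# For the 64-core machines described in the original question:
print(" ".join(krb5kdc_command(suggested_kdc_workers(64))))  # prints krb5kdc -w 64
```

A TCP-heavy deployment might justify a larger multiplier, but that would need measuring rather than guessing.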

This is all just a wild-ass guess, though.

Given your setup, though, it would really surprise me if you saw any
performance issues.

> Ok, to clarify what you mean by the replica serving requests as well, do
> you mean:

> 1. Using a VIP that round-robins the requests to the primary and
> secondary KDC?
> 2. Or do you mean that half the clients use the master and the other
> half the slave?
> 3. Or do you mean that the client itself round-robins between them?

You can use SRV records and get 3 by just listing both KDCs as equal
weight.  All Kerberos clients these days should support SRV records.
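
As an illustration of equal-weight SRV records, the sketch below prints zone-file lines for two KDCs. The realm `EXAMPLE.COM` and the host names are placeholders, not values from this thread, and the helper name is hypothetical; port 88 is the standard Kerberos port.

```python
def kerberos_srv_records(realm, kdc_hosts, port=88, priority=0, weight=0):
    """Return zone-file SRV lines that give every KDC the same priority and weight."""
    records = []
    for proto in ("udp", "tcp"):
        for host in kdc_hosts:
            records.append(
                f"_kerberos._{proto}.{realm}. IN SRV {priority} {weight} {port} {host}."
            )
    return records


# Placeholder realm and hosts, for illustration only.
for line in kerberos_srv_records("EXAMPLE.COM", ["kdc1.example.com", "kdc2.example.com"]):
    print(line)
```

Because both records carry the same priority and weight, neither KDC is preferred over the other.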

If you do have Kerberos clients that don't do SRV records for some reason,
it's pretty easy to do 2 by just randomizing the order of the KDCs in
/etc/krb5.conf.
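
One way to randomize the order per client, sketched under the assumption that a configuration management tool renders `/etc/krb5.conf` for each host: seed the shuffle with the client's host name so that the order differs between hosts but stays stable across runs. The realm, host names and function name below are placeholders.

```python
import random


def realm_stanza(realm, kdc_hosts, seed):
    """Render a [realms] stanza with the kdc entries in a seed-dependent order."""
    hosts = list(kdc_hosts)
    # A string seed gives the same order on every run for the same client.
    random.Random(seed).shuffle(hosts)
    lines = ["[realms]", f"    {realm} = {{"]
    lines += [f"        kdc = {host}" for host in hosts]
    lines.append("    }")
    return "\n".join(lines)


# Placeholder realm, KDC hosts and client name, for illustration only.
print(realm_stanza("EXAMPLE.COM",
                   ["kdc1.example.com", "kdc2.example.com"],
                   seed="client01.example.com"))
```

Across many clients roughly half should end up trying each KDC first, and each falls back to the other if its first choice is down.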

Kerberos clients are very good about falling back to a second server.
You'll just see a slight delay that you might not even notice.

> I can only see 2 as a real option because *I think* once a TGT is
> requested, all TGS requests would need to go to the server that gave the
> TGT?

Nope, all KDCs share the same database and can answer all requests.  From
a client perspective, all KDC traffic is completely interchangeable.  The
only time it matters is when there's a write, since there's a propagation
delay and the replica will serve stale information for a short period of
time.  For keytabs, this almost doesn't matter; for user credentials with
passwords, there's a way to configure the client to automatically retry
the master if an authentication fails against the replica, which can be
useful for authentication immediately after a password change if you're
not using incremental replication.

--
Russ Allbery ([hidden email])              <http://www.eyrie.org/~eagle/>

Re: Determining the number of clients per KDC

Sergei Gerasenko
> Oh, no problem -- just be aware that they're being answered by someone who
> hasn't run large-scale KDCs in about four years, so some of my information
> is stale.  :)

Still very valuable since I haven’t been able to find answers to any of these questions elsewhere.

> If you're doing default Kerberos, the networking is UDP, so it's not going
> to spend a lot of time waiting for the network.  I would expect that to be

Cool, something to verify, but not a bad guess I think.

> This is all just a wild-ass guess, though.
>
> Given your setup, though, it would really surprise me if you saw any
> performance issues.

Will keeping an access log slow me down much, do you know? For that matter, is there a benchmarking tool for KDCs?

> You can use SRV records and get 3 by just listing both KDCs as equal
> weight.  All Kerberos clients these days should support SRV records.

That sounds like a good idea. I can use Puppet to list the KDCs in krb5.conf as well.

>> I can only see 2 as a real option because *I think* once a TGT is
>> requested, all TGS requests would need to go to the server that gave the
>> TGT?
>
> Nope, all KDCs share the same database and can answer all requests.  

Ok, it’s just that I see everywhere (e.g. https://en.wikipedia.org/wiki/Kerberos_(protocol)) that the initial TGT response includes a session key that the host and the service server will share. So that’s what got me thinking that once a TGT is retrieved, the client should request a service ticket using the same KDC. But like I said, I’m a total newb.

The AS checks to see if the client is in its database. If it is, the AS generates the secret key by hashing the password of the user found at the database (e.g., Active Directory in Windows Server) and sends back the following two messages to the client:
Message A: Client/TGS Session Key encrypted using the secret key of the client/user.
Message B: Ticket-Granting-Ticket (TGT, which includes the client ID, client network address, ticket validity period, and the client/TGS session key) encrypted using the secret key of the TGS.

Re: Determining the number of clients per KDC

Russ Allbery-2
Sergei Gerasenko <[hidden email]> writes:

> Will keeping an access log slow me down much, do you know?

Yes, you may want to tune syslog or whatever you're using for your KDC
logging, although MIT is a lot better than Heimdal in that regard (Heimdal
is very verbose).  I generally disabled sync to disk on the syslog log
file that the KDC logging was routed to.

> For that matter, is there a benchmarking tool for KDCs?

Not that I'm aware of.  I usually just rolled my own by calling kinit with
a keytab and then kvno to get service tickets.
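
A home-grown benchmark along those lines might look like the sketch below. It assumes `kinit` and `kvno` are on the PATH and that a keytab exists for the test principal; the function names, the cache path and the principal names in the usage note are hypothetical. Each iteration re-runs `kinit`, which replaces the credential cache, so the following `kvno` has to fetch a fresh service ticket instead of reusing a cached one.

```python
import os
import subprocess
import tempfile
import time


def ticket_commands(keytab, principal, service):
    """One AS request (kinit with a keytab) followed by one TGS request (kvno)."""
    return [
        ["kinit", "-k", "-t", keytab, principal],
        ["kvno", service],
    ]


def run_with_cache(argv, ccache):
    """Run one command against a dedicated credential cache."""
    env = dict(os.environ, KRB5CCNAME=f"FILE:{ccache}")
    subprocess.run(argv, env=env, check=True, stdout=subprocess.DEVNULL)


def requests_per_second(commands, iterations, run=run_with_cache, ccache=None):
    """Run the command sequence repeatedly and return the observed request rate."""
    if ccache is None:
        ccache = os.path.join(tempfile.gettempdir(), f"krb5cc_bench_{os.getpid()}")
    start = time.perf_counter()
    for _ in range(iterations):
        for argv in commands:
            run(argv, ccache)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return iterations * len(commands) / elapsed
```

It could be driven with something like `requests_per_second(ticket_commands("/path/to/bench.keytab", "bench@EXAMPLE.COM", "host/server.example.com"), 100)`. Starting two processes per iteration is expensive, so a loop like this probably measures the client more than the KDC; several copies would have to run in parallel to put real load on the server.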

> Ok, it’s just that I see everywhere
> (e.g. https://en.wikipedia.org/wiki/Kerberos_(protocol)) that the initial
> TGT response includes a session key that the host and the service server
> will share. So that’s what got me thinking that once a TGT is retrieved,
> the client should request a service ticket using the same KDC. But like
> I said, I’m a total newb.

The AS reply contains the session key twice: one copy encrypted in the
client's key, and one copy inside the TGT, which is encrypted in the
KDC's own secret key (the krbtgt key).  That key is shared between all of
the KDCs as part of the normal database, and the client always presents
the TGT, with its encrypted copy of the session key, in subsequent
protocol exchanges.
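
The reason any KDC can serve the follow-up request can be illustrated with a toy model. This is emphatically not real Kerberos cryptography: the `seal`/`unseal` helpers are a made-up stand-in cipher built from the standard library, and `ToyKDC` and all names are hypothetical. The only point shown is that a replica holding the same KDC key can recover the session key from a TGT that the primary issued.

```python
import hashlib
import hmac
import json
import os


def _keystream(key, nonce, length):
    """Toy keystream from SHA-256 in counter mode (NOT a Kerberos enctype)."""
    blocks = [
        hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def seal(key, payload):
    """Encrypt and authenticate a dict under a shared secret key (toy scheme)."""
    plaintext = json.dumps(payload).encode()
    nonce = os.urandom(16)
    body = bytes(p ^ k for p, k in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + tag + body


def unseal(key, blob):
    """Reverse seal(); raises ValueError if the key does not match."""
    nonce, tag, body = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ticket was not sealed with this key")
    plaintext = bytes(c ^ k for c, k in zip(body, _keystream(key, nonce, len(body))))
    return json.loads(plaintext)


class ToyKDC:
    """A KDC reduced to the one secret it shares with its replicas."""

    def __init__(self, krbtgt_key):
        self.krbtgt_key = krbtgt_key

    def issue_tgt(self, client, client_key):
        """Return (reply readable by the client, TGT readable only by KDCs)."""
        session_key = os.urandom(16).hex()
        reply = seal(client_key, {"session_key": session_key})
        tgt = seal(self.krbtgt_key, {"client": client, "session_key": session_key})
        return reply, tgt

    def session_key_from_tgt(self, tgt):
        """What a KDC does when the client presents its TGT in a TGS request."""
        return unseal(self.krbtgt_key, tgt)["session_key"]


shared_krbtgt_key = os.urandom(32)  # replicated to every KDC with the database
primary, replica = ToyKDC(shared_krbtgt_key), ToyKDC(shared_krbtgt_key)
client_key = os.urandom(32)

reply, tgt = primary.issue_tgt("alice@EXAMPLE.COM", client_key)
# The replica never saw the AS exchange, yet it recovers the same session key.
assert replica.session_key_from_tgt(tgt) == unseal(client_key, reply)["session_key"]
```

Nothing about the exchange is remembered on the issuing server; everything a KDC needs travels inside the TGT.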

--
Russ Allbery ([hidden email])              <http://www.eyrie.org/~eagle/>

Re: Determining the number of clients per KDC

Andrew Cobaugh
On Mon, Apr 16, 2018 at 5:41 PM, Russ Allbery <[hidden email]> wrote:

> Sergei Gerasenko <[hidden email]> writes:
>
> > Will keeping an access log slow me down much, do you know?
>
> Yes, you may want to tune syslog or whatever you're using for your KDC
> logging, although MIT is a lot better than Heimdal in that regard (Heimdal
> is very verbose).  I generally disabled sync to disk on the syslog log
> file that the KDC logging was routed to.
>

Agree with disabling sync logging to local disk. The problem I've run into
is TCP syslog where the remote system can't keep up. Ask me how I know...

Always better to write to local log file asynchronously, then have an agent
(filebeat, splunk) follow that file and forward on, as it will still be
more reliable than any flavor of remote syslog.


>
> > For that matter, is there a benchmarking tool for KDCs?
>
> Not that I'm aware of.  I usually just rolled my own by calling kinit with
> a keytab and then kvno to get service tickets.
>

I wrote this a while back to help track down a TCP syslog bottleneck, which
later turned out to be very useful for isolating other performance issues
and general capacity planning. Also currently using it to demonstrate how
much faster MIT Kerberos is compared to AD, even when not using workers (on
modern-ish CPUs, without workers enabled krb5kdc can do ~4000 rps. I can
share more details if folks are interested).

   https://github.com/acobaugh/krb5perf

It is worth noting that when load testing a single KDC, you pretty much
have to take DNS out of the equation somehow. Initially I was testing the
performance of my local unbound caching nameserver...

--
andy

Re: Determining the number of clients per KDC

Russ Allbery-2
Andrew Cobaugh <[hidden email]> writes:

> Also currently using it to demonstrate how much faster MIT Kerberos is
> compared to AD, even when not using workers (on modern-ish CPUs, without
> workers enabled krb5kdc can do ~4000 rps. I can share more details if
> folks are interested).

Ah, good, I'm glad my 100 qps number was off in the direction that I
thought it would be.  I didn't want to overpromise, but KDCs are *really
fast*.

--
Russ Allbery ([hidden email])              <http://www.eyrie.org/~eagle/>

Re: Determining the number of clients per KDC

Mark Pröhl
In reply to this post by Russ Allbery-2
On 04/16/2018 05:51 PM, Russ Allbery wrote:
> ... Clients
> aren't going to generally all try to get a ticket at the same time, due to
> ticket caching, so that scales to a lot of clients.
>

In my experience, Java/JAAS clients cache only the TGT and not the service
tickets. Especially in Hadoop environments this leads to much more TGS
traffic than in "classical" Kerberos environments. 1000 rps is not unusual.

- Mark

Re: Determining the number of clients per KDC

Sergei Gerasenko
In reply to this post by Russ Allbery-2
Thank you so much for confirming that the KDCs are fast. This saved me a ton of time writing my own tests, etc. Andrew, as far as workers, is it one worker per core in general as Russ theorized?

Otherwise, I think I’m all set for now.

Thanks!!

Re: Determining the number of clients per KDC

Russ Allbery-2
In reply to this post by Mark Pröhl
Mark Pröhl <[hidden email]> writes:
> On 04/16/2018 05:51 PM, Russ Allbery wrote:

>> ... Clients aren't going to generally all try to get a ticket at the
>> same time, due to ticket caching, so that scales to a lot of clients.

> I have only seen JAVA/JAAS clients caching the TGT and not the service
> tickets. Especially in Hadoop environments this leads to much more TGS
> traffic than in "classical" Kerberos environments. 1000 rps are not
> unusual.

Ah, interesting!  (Also incredibly broken behavior....)

--
Russ Allbery ([hidden email])              <http://www.eyrie.org/~eagle/>

Re: Determining the number of clients per KDC

Sergei Gerasenko

> On Apr 17, 2018, at 5:20 PM, Russ Allbery <[hidden email]> wrote:
>
> Mark Pröhl <[hidden email]> writes:
>> On 04/16/2018 05:51 PM, Russ Allbery wrote:
>
>>> ... Clients aren't going to generally all try to get a ticket at the
>>> same time, due to ticket caching, so that scales to a lot of clients.
>
>> I have only seen JAVA/JAAS clients caching the TGT and not the service
>> tickets. Especially in Hadoop environments this leads to much more TGS
>> traffic than in "classical" Kerberos environments. 1000 rps are not
>> unusual.

Thank you for pointing that out! I have no idea what to expect from the version of Java that is going to be used. I hope I’ll find a work-around if that’s the case.

Re: Determining the number of clients per KDC

Andrew Cobaugh
In reply to this post by Sergei Gerasenko
On Tue, Apr 17, 2018 at 9:32 AM, Sergei Gerasenko <[hidden email]> wrote:

> Thank you so much for confirming that the KDCs are fast. This saved me a
> ton of time writing my own tests, etc. Andrew, as far as workers, is it one
> worker per core in general as Russ theorized?
>

I haven't played with the workers option very much. We (major public
university) don't run with workers in production, as we only see peaks
around 500 rps, mostly from wireless authentication (plus occasional
bursts lasting single-digit seconds when there are disruptions in the
network core and wireless auths get backed up on the controllers). Our
KDCs are Dell PowerEdge R330s with decently fast disk and 8 cores. With
workers set to 8, I was getting ~20k rps iirc, but I think my load test
client was maxed out at that point. That test was sufficient to know that
we could enable "turbo boost" if we needed to.
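
To relate the figures quoted in this thread to each other, the sketch below computes how much of the measured capacity each reported peak would consume. The numbers are the approximate ones given in the messages above (the 20k figure was limited by the load-test client, so it is a lower bound); the dictionary labels and function name are hypothetical.

```python
def utilization(peak_rps, capacity_rps):
    """Fraction of a KDC's measured capacity consumed by a given peak load."""
    return peak_rps / capacity_rps


# Approximate figures reported in this thread.
MEASURED_CAPACITY_RPS = {
    "krb5kdc without workers": 4000,
    "krb5kdc with 8 workers": 20000,
}
REPORTED_PEAK_RPS = {
    "university wireless auth": 500,
    "Hadoop TGS traffic": 1000,
}

for load, peak in REPORTED_PEAK_RPS.items():
    for setup, capacity in MEASURED_CAPACITY_RPS.items():
        print(f"{load} on {setup}: {utilization(peak, capacity):.1%}")
```

On these numbers even the Hadoop-style load would use about a quarter of a single worker-less KDC process.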

--
andy