krb5.conf implementation question

krb5.conf implementation question

O'Loughlin, Kieran
Hi all,

I'm not developing an application using the MIT Kerberos libraries, but am implementing a third-party application that uses the libraries.  So, I hope it's ok to be emailing this list.  I'm hitting intermittent authentication failures and looking for guidance on how they might be avoided.

The environment is using Active Directory 2012 R2 as the KDC.  The krb5.conf file is configured with a list of individual KDC machines in the realms section.  It appears that the libraries use the first KDC in the list as long as it is responding.  The problem happens when the first KDC in the list is rebooted.  There is a short window where the KDC will respond with KDC_ERR_C_PRINCIPAL_UNKNOWN or KDC_ERR_S_PRINCIPAL_UNKNOWN (probably depending on type of request) as it is shutting down.  Microsoft has an article about this type of behavior for 2008 SP2 https://support.microsoft.com/en-us/help/982801/a-domain-controller-returns-the-no-such-user-0xc0000064-status-code-or, but we're seeing the same symptoms with 2008 R2, 2012 R2 and 2019.
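
For reference, a realms section that lists KDCs individually might look like the sketch below. The host name win2012-01.EXAMPLE.COM is taken from the trace later in this message. The second host name is made up for illustration, and the actual krb5.conf was not posted, so treat every other detail as an assumption.

```ini
# Illustrative only: two KDCs listed individually for one realm.
# win2012-02.EXAMPLE.COM is a hypothetical name for the second KDC.
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = win2012-01.EXAMPLE.COM
        kdc = win2012-02.EXAMPLE.COM
    }
```

With a layout like this, the behavior described above would mean win2012-01 receives every request for as long as it keeps answering.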

When we enable KRB5_TRACE we see up to 4 request attempts being made.  I don't know if the multiple tries are initiated by the MIT code or by the application code.  In many cases all 4 tries happen within the window in which the AD server is responding with the error.  Usually all 4 attempts are sent to the same KDC within about half a second, which is short enough that we can get 4 errors from the KDC as it shuts down.

Here are the questions I had around this:

  *   If the MIT code is making the 4 request attempts, is there any way (krb5.conf configuration, env variables, etc.) that we could force each retry to use a different KDC entry in the krb5.conf?  This would move the retries away from the server that is rebooting, and the first of those should get a good response.
  *   As mentioned above, we are listing out the individual KDC machines.  Would it be better to set up the krb5.conf in a different way, perhaps using DNS to find the KDCs?
  *   I saw a mention in one email about setting master_kdc that suggested a subsequent request might be sent to the master_kdc if there is an error.  However, the documentation says this only happens on an invalid password.  Is that the case, or is it worth setting master_kdc?  We don't do that currently.

Thanks for any help with this.  I'm pasting the KRB5_TRACE output from the application client, in case that helps demonstrate the problem.  In this case there were two KDCs and the first one in the krb5.conf was rebooted.  We never switched to the second one and the authentication failed after sending 4 requests within a second to the rebooting KDC.

Thanks,

Kieran.


[81339] 1574879653.119817: Getting credentials [hidden email] -> *** SPN ***@EXAMPLE.COM using ccache FILE:/tmp/user.ccache
[81339] 1574879653.120309: Retrieving [hidden email] -> *** SPN ***@EXAMPLE.COM from FILE:/tmp/user.ccache with result: 0/Success
[81339] 1574879653.120799: Retrieving [hidden email] -> krbtgt/[hidden email] from FILE:/tmp/user.ccache with result: 0/Success
[81339] 1574879653.120825: Get cred via TGT krbtgt/[hidden email] after requesting krbtgt/[hidden email] (canonicalize off)
[81339] 1574879653.120886: Generated subkey for TGS request: aes256-cts/2A2B
[81339] 1574879653.120983: etypes requested in TGS request: aes256-cts
[81339] 1574879653.121286: Encoding request body and padata into FAST request
[81339] 1574879653.121423: Sending request (1831 bytes) to EXAMPLE.COM
[81339] 1574879653.121515: Resolving hostname win2012-01.EXAMPLE.COM
[81339] 1574879653.122060: Initiating TCP connection to stream 192.168.240.27:88
[81339] 1574879653.131869: Sending TCP request to stream 192.168.240.27:88
[81339] 1574879653.135706: Received answer (357 bytes) from stream 192.168.240.27:88
[81339] 1574879653.135729: Terminating TCP connection to stream 192.168.240.27:88
[81339] 1574879653.135843: Response was not from master KDC
[81339] 1574879653.135872: Decoding FAST response
[81339] 1574879653.136000: Decoding FAST response
[81339] 1574879653.136060: Got cred; -1765328378/Client not found in Kerberos database
[81339] 1574879653.136125: Get cred via TGT krbtgt/[hidden email] after requesting krbtgt/[hidden email] (canonicalize off)
[81339] 1574879653.136150: Generated subkey for TGS request: aes256-cts/5D5B
[81339] 1574879653.136187: etypes requested in TGS request: aes256-cts
[81339] 1574879653.136253: Encoding request body and padata into FAST request
[81339] 1574879653.136313: Sending request (1831 bytes) to EXAMPLE.COM
[81339] 1574879653.136324: Resolving hostname win2012-01.EXAMPLE.COM
[81339] 1574879653.136909: Initiating TCP connection to stream 192.168.240.27:88
[81339] 1574879653.139206: Sending TCP request to stream 192.168.240.27:88
[81339] 1574879653.142052: Received answer (357 bytes) from stream 192.168.240.27:88
[81339] 1574879653.142068: Terminating TCP connection to stream 192.168.240.27:88
[81339] 1574879653.142299: Response was not from master KDC
[81339] 1574879653.142367: Decoding FAST response
[81339] 1574879653.142772: Decoding FAST response
[81339] 1574879653.142869: Got cred; -1765328378/Client not found in Kerberos database
[81339] 1574879653.143135: Creating authenticator for [hidden email] -> *** SPN ***@EXAMPLE.COM, seqnum 826984930, subkey aes256-cts/2647, session key aes256-cts/A880
[81339] 1574879653.269412: Read AP-REP, time 1574879653.143176, subkey aes256-cts/71A7, seqnum 749727685
[81339] 1574879653.455353: Getting credentials [hidden email] -> *** SPN ***@EXAMPLE.COM using ccache FILE:/tmp/user.ccache
[81339] 1574879653.455624: Retrieving [hidden email] -> *** SPN ***@EXAMPLE.COM from FILE:/tmp/user.ccache with result: 0/Success
[81339] 1574879653.456055: Retrieving [hidden email] -> krbtgt/[hidden email] from FILE:/tmp/user.ccache with result: 0/Success
[81339] 1574879653.456079: Get cred via TGT krbtgt/[hidden email] after requesting krbtgt/[hidden email] (canonicalize off)
[81339] 1574879653.456119: Generated subkey for TGS request: aes256-cts/9C28
[81339] 1574879653.456154: etypes requested in TGS request: aes256-cts
[81339] 1574879653.456225: Encoding request body and padata into FAST request
[81339] 1574879653.456279: Sending request (1831 bytes) to EXAMPLE.COM
[81339] 1574879653.456301: Resolving hostname win2012-01.EXAMPLE.COM
[81339] 1574879653.456560: Initiating TCP connection to stream 192.168.240.27:88
[81339] 1574879653.457220: Sending TCP request to stream 192.168.240.27:88
[81339] 1574879653.458621: Received answer (357 bytes) from stream 192.168.240.27:88
[81339] 1574879653.458641: Terminating TCP connection to stream 192.168.240.27:88
[81339] 1574879653.458764: Response was not from master KDC
[81339] 1574879653.459100: Decoding FAST response
[81339] 1574879653.459205: Decoding FAST response
[81339] 1574879653.459243: Got cred; -1765328378/Client not found in Kerberos database
[81339] 1574879653.459252: Get cred via TGT krbtgt/[hidden email] after requesting krbtgt/[hidden email] (canonicalize off)
[81339] 1574879653.459266: Generated subkey for TGS request: aes256-cts/65FB
[81339] 1574879653.459292: etypes requested in TGS request: aes256-cts
[81339] 1574879653.459378: Encoding request body and padata into FAST request
[81339] 1574879653.459456: Sending request (1831 bytes) to EXAMPLE.COM
[81339] 1574879653.459468: Resolving hostname win2012-01.EXAMPLE.COM
[81339] 1574879653.459726: Initiating TCP connection to stream 192.168.240.27:88
[81339] 1574879653.460139: Sending TCP request to stream 192.168.240.27:88
[81339] 1574879653.461270: Received answer (357 bytes) from stream 192.168.240.27:88
[81339] 1574879653.461290: Terminating TCP connection to stream 192.168.240.27:88
[81339] 1574879653.461410: Response was not from master KDC
[81339] 1574879653.461426: Decoding FAST response
[81339] 1574879653.461461: Decoding FAST response
[81339] 1574879653.461487: Got cred; -1765328378/Client not found in Kerberos database
[81339] 1574879653.461607: Creating authenticator for [hidden email] -> *** SPN ***@EXAMPLE.COM, seqnum 1054877856, subkey aes256-cts/863F, session key aes256-cts/A880
[81339] 1574879653.535150: Read AP-REP, time 1574879653.461618, subkey aes256-cts/8C75, seqnum 337205966
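
A trace like the one above can be summarized with a short script. The following is a minimal sketch in Python, assuming the line layout shown in this message ("[pid] timestamp: text"). The function name summarize_trace is made up for illustration, and other krb5 versions may word their trace messages differently.

```python
import re
from collections import Counter

# Matches trace lines such as:
#   [81339] 1574879653.121423: Sending request (1831 bytes) to EXAMPLE.COM
LINE_RE = re.compile(r"^\[(\d+)\] (\d+\.\d+): (.*)$")


def summarize_trace(lines):
    """Count requests, target addresses, and results in KRB5_TRACE output."""
    requests = 0
    targets = Counter()
    results = Counter()
    timestamps = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a trace line
        stamp, text = float(m.group(2)), m.group(3)
        if text.startswith("Sending request"):
            requests += 1
            timestamps.append(stamp)
        elif text.startswith("Sending TCP request to stream "):
            # The last word is the address:port the request went to.
            targets[text.rsplit(" ", 1)[1]] += 1
        elif text.startswith("Got cred; "):
            results[text[len("Got cred; "):]] += 1
    # Time between the first and the last "Sending request" line.
    span = timestamps[-1] - timestamps[0] if timestamps else 0.0
    return {
        "requests": requests,
        "targets": dict(targets),
        "results": dict(results),
        "span_seconds": span,
    }
```

Feeding it the trace above (for example via open(path).readlines()) gives the number of requests, the addresses they went to, and how often each result code was returned.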

_______________________________________________
krbdev mailing list             [hidden email]
https://mailman.mit.edu/mailman/listinfo/krbdev

Re: krb5.conf implementation question

Greg Hudson
On 4/10/20 5:40 PM, O'Loughlin, Kieran wrote:
> I'm not developing an application using the MIT Kerberos libraries, but am implementing a third-party application that uses the libraries.  So, I hope it's ok to be emailing this list.

[hidden email] would be more appropriate, but it's not a big deal.

> The problem happens when the first KDC in the list is rebooted.  There is a short window where the KDC will respond with KDC_ERR_C_PRINCIPAL_UNKNOWN or KDC_ERR_S_PRINCIPAL_UNKNOWN (probably depending on type of request) as it is shutting down.  Microsoft has an article about this type of behavior for 2008 SP2 https://support.microsoft.com/en-us/help/982801/a-domain-controller-returns-the-no-such-user-0xc0000064-status-code-or, but we're seeing the same symptoms with 2008 R2, 2012 R2 and 2019.

That's interesting; I haven't heard of this particular issue with
Microsoft's KDCs before, and it sounds like Microsoft thinks they fixed
the problem ten years ago.  Unfortunately I don't think you'll be able
to resolve the issue on the MIT client side without code changes.

> When we enable KRB5_TRACE we see up to 4 request attempts being made.  I don't know if the multiple tries are initiated by the MIT code or by the application code.

Based on the trace logs, I think the second try is initiated by the MIT
code, and the other pair is the result of a second attempt to get
credentials by the application.

The second try in the MIT code is a fallback in case the KDC doesn't
support referrals.

>   *   If the MIT code is making the 4 request attempts, is there any way (krb5.conf configuration, env variables, etc.) that we could force each retry to use a different KDC entry in the krb5.conf.  This would move the retries away from the server that is rebooting and the first of those should get a good response.

We only walk down the KDC list when the first KDC fails to respond.
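
The rule above can be modeled in a few lines. This is a simplified sketch in Python, not the MIT implementation: the names send_to_realm, NoResponse, and the send callable are all hypothetical. It only shows why an error reply from a KDC that is shutting down ends the search.

```python
class NoResponse(Exception):
    """Raised by a transport when a KDC gives no reply at all."""


def send_to_realm(kdcs, send, request):
    """Try each KDC in order; advance only when one fails to respond.

    `send(kdc, request)` is a hypothetical transport callable that returns
    the KDC's reply or raises NoResponse.
    """
    for kdc in kdcs:
        try:
            # Any reply, including an error reply, ends the search.
            return kdc, send(kdc, request)
        except NoResponse:
            continue  # silence or connection failure: try the next entry
    raise NoResponse("no KDC in the list responded")


def shutting_down_first(kdc, request):
    """Fake transport: the first KDC still answers, but with an error."""
    if kdc == "kdc1":
        return "KDC_ERR_C_PRINCIPAL_UNKNOWN"
    return "TGS-REP"


# The error reply from kdc1 is accepted and kdc2 is never contacted.
answered_by, reply = send_to_realm(["kdc1", "kdc2"], shutting_down_first, b"req")
```

In this model, the second KDC is reached only once the first stops answering altogether.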

>   *   As mentioned above we are listing out the individual KDC machines, would it be better to set up the krb5.conf in a different way, perhaps using DNS to find the KDCs?

I don't think that would help reliably.  Randomization of DNS response
order could make the right thing happen some of the time, but not all of
the time.

>   *   I saw a mention in one email about setting master_kdc, that suggested if there is an error a subsequent request might be sent to the master_kdc.  However the documentation says this only happens on an invalid password.  Is that the case or is it worth setting master_kdc?  We don't do that currently.

master_kdc currently only applies to AS requests, and this is a TGS
request, so that wouldn't help.  Also, it obviously wouldn't help if the
KDC listed in master_kdc was the one shutting down.

RE: krb5.conf implementation question

O'Loughlin, Kieran
Hi Greg,

That's helpful, thanks for taking the time to get back to me.

> Based on the trace logs, I think the second try is initiated by the MIT code, and the other pair is the result of a second attempt to get credentials by the application.

This is really interesting.  I have a support ticket open with the software vendor.  Knowing that they are retrying on their side is helpful; I can encourage them to add a delay before the retry and/or to add more retries.  We've seen AD respond like this for a couple of seconds.  If they delay their retry, that would give the AD server that is shutting down time to complete its work and shut down, and the retry would then hit the next KDC in the list.  Of course, this wouldn't be foolproof, but if we could reduce the frequency of failures that would be great.  Currently we get failures each time the KDC at the top of our list gets rebooted.
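
The kind of change I would ask the vendor for could look roughly like the sketch below. It is written in Python for illustration only. get_credentials stands in for whatever call the application makes, and the attempt count and 2-second delay are assumptions based on the "couple of seconds" window mentioned above.

```python
import time


def get_with_retry(get_credentials, attempts=4, delay=2.0, sleep=time.sleep):
    """Call get_credentials() up to `attempts` times, pausing between tries.

    `get_credentials` is a hypothetical zero-argument callable that raises
    an exception on failure. `sleep` is injectable so tests need not wait.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return get_credentials()
        except Exception as exc:  # a real client would catch a narrower type
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay)  # give the rebooting KDC time to go silent
    raise last_error
```

The delay matters more than the number of tries: retries that all land inside the error window fail in the same way as the first request.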

> I don't think that would help reliably.  Randomization of DNS response order could make the right thing happen some of the time, but not all of the time.

I understand what you're saying.  Again, even if it's not a complete solution, anything that reduces the failure frequency is helpful to us.  I was thinking we could keep our existing krb5.conf KDC configuration, but add the realm name at the top as the first KDC entry.  That would cause the requests to go to different AD servers each time.  The client trace indicates that the client resolves the machine name before each of the 4 requests.  In the customer's DNS configuration, resolving their Kerberos realm name via DNS returns the IP addresses of 75 AD servers.  From what I can tell, each time a request is made to the DNS server for the Kerberos realm name, it returns the list of 75 IP addresses with the top address removed and added to the bottom, probably as a load-balancing mechanism.  If 4 DNS requests are made for our initial TGS-REQ and the 3 retries, then the odds are very good that most of the time those 4 requests will be made to 4 different KDCs.
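
The rotation I'm describing can be simulated as follows. This is only a sketch in Python with made-up addresses. It assumes the client uses the first address of each answer, that nothing caches or reorders the answer, and that no other client queries the DNS server in between, none of which I have confirmed.

```python
from collections import deque


def rotating_resolver(addresses):
    """Return a lookup function that rotates the answer list on every call."""
    ring = deque(addresses)

    def lookup():
        answer = list(ring)
        ring.rotate(-1)  # move the first address to the end for next time
        return answer

    return lookup


# 75 made-up addresses standing in for the customer's AD servers.
servers = [f"10.0.0.{i}" for i in range(1, 76)]
lookup = rotating_resolver(servers)

# First address of each answer, for the initial request and the 3 retries.
first_choices = [lookup()[0] for _ in range(4)]
```

Under those assumptions, each of the 4 lookups puts a different server at the top of the list.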

Any feedback you might have on these ideas would be greatly appreciated.

Thanks,

Kieran.
