long running kadm5 program running into errors

long running kadm5 program running into errors

Chris Hecker

I have a long-running daemon that reads a kadm5 admin key from a file
keytab into a memory keytab before dropping privs, and then when the
kadm5 connection drops (using checks for KADM5_RPC_ERROR), it loops and
tries to reconnect with kadm5_init_with_skey.  This has worked fine: with
1.9 it would stay up for months like this, and with 1.15.1 it was up for
weeks.  Or, at least, it worked fine until an hour ago...
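
A minimal sketch of the reconnect loop described above, written in Python for brevity rather than against the real C API. Everything named here is an assumption for illustration: `connect` is a hypothetical callable standing in for whatever wraps kadm5_init_with_skey, and the choice to treat EBUSY as transient is a policy guess, not something libkadm5 documents.

```python
import errno
import time


def connect_with_retry(connect, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep,
                       transient=(errno.EBUSY,)):
    """Call connect() until it succeeds, backing off between attempts.

    connect   -- hypothetical callable; returns a handle or raises OSError
    transient -- errno values assumed to be worth retrying
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError as exc:
            # Give up on non-transient errors or once attempts run out.
            if exc.errno not in transient or attempt == max_attempts:
                raise
            sleep(delay)
            # Double the wait each time, capped at max_delay.
            delay = min(delay * 2, max_delay)
```

Capping the number of attempts keeps a persistent failure (such as a misconfigured address) visible instead of hiding it behind an endless loop.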

This time, when it got the KADM5_RPC_ERROR, it reconnected and got an
EBUSY.  If I look at kadmind.log, I see this:

Aug 16 16:17:12 domain.com kadmind[31118](Error): check_rpcsec_auth: failed inquire_context, stat=786432
Aug 16 16:17:12 domain.com kadmind[31118](Notice): Authentication attempt failed: x.x.x.x, GSS-API error strings are:
Aug 16 16:17:12 domain.com kadmind[31118](Notice):     The referenced context has expired
Aug 16 16:17:12 domain.com kadmind[31118](Notice):     Success
Aug 16 16:17:12 domain.com kadmind[31118](Notice):    GSS-API error strings complete.
Aug 16 16:17:12 domain.com kadmind[31118](Error): Authentication attempt failed: x.x.x.x, RPC authentication flavor 6
Aug 16 16:17:12 domain.com kadmind[31118](Error): check_rpcsec_auth: failed inquire_context, stat=786432
Aug 16 16:17:12 domain.com kadmind[31118](Notice): Authentication attempt failed: x.x.x.x, GSS-API error strings are:
Aug 16 16:17:12 domain.com kadmind[31118](Notice):     The referenced context has expired
Aug 16 16:17:12 domain.com kadmind[31118](Notice):     Success
Aug 16 16:17:12 domain.com kadmind[31118](Notice):    GSS-API error strings complete.
Aug 16 16:17:12 domain.com kadmind[31118](Error): Authentication attempt failed: x.x.x.x, RPC authentication flavor 6
Aug 16 16:17:12 domain.com kadmind[31118](Error): check_rpcsec_auth: failed inquire_context, stat=786432
Aug 16 16:17:12 domain.com kadmind[31118](Notice): Authentication attempt failed: x.x.x.x, GSS-API error strings are:
Aug 16 16:17:12 domain.com kadmind[31118](Notice):     The referenced context has expired
Aug 16 16:17:12 domain.com kadmind[31118](Notice):     Success
Aug 16 16:17:12 domain.com kadmind[31118](Notice):    GSS-API error strings complete.
Aug 16 16:17:12 domain.com kadmind[31118](Error): Authentication attempt failed: x.x.x.x, RPC authentication flavor 6
Aug 16 16:17:12 domain.com kadmind[31118](info): closing down fd 22
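
For readers puzzling over stat=786432 in the log above: it is a GSS-API major status, and it can be unpacked using the field layout from the GSS-API C bindings (RFC 2744). The sketch below is in Python purely for illustration; the helper name `decode_major_status` is made up here, and the name table lists only a few routine errors.

```python
# Field layout of a GSS-API major status word, per RFC 2744.
GSS_C_CALLING_ERROR_OFFSET = 24
GSS_C_ROUTINE_ERROR_OFFSET = 16
GSS_C_SUPPLEMENTARY_OFFSET = 0
GSS_C_CALLING_ERROR_MASK = 0o377
GSS_C_ROUTINE_ERROR_MASK = 0o377
GSS_C_SUPPLEMENTARY_MASK = 0o177777

# Partial table of routine error numbers (RFC 2744); not exhaustive.
ROUTINE_ERRORS = {
    8: "GSS_S_NO_CONTEXT",
    11: "GSS_S_CREDENTIALS_EXPIRED",
    12: "GSS_S_CONTEXT_EXPIRED",
    13: "GSS_S_FAILURE",
}


def decode_major_status(stat):
    """Split a major status into (calling, routine, supplementary) fields."""
    calling = (stat >> GSS_C_CALLING_ERROR_OFFSET) & GSS_C_CALLING_ERROR_MASK
    routine = (stat >> GSS_C_ROUTINE_ERROR_OFFSET) & GSS_C_ROUTINE_ERROR_MASK
    supplementary = (stat >> GSS_C_SUPPLEMENTARY_OFFSET) & GSS_C_SUPPLEMENTARY_MASK
    return calling, routine, supplementary


calling, routine, supplementary = decode_major_status(786432)
print(calling, routine, supplementary, ROUTINE_ERRORS.get(routine))
```

786432 is 12 << 16, so only the routine-error field is set, and routine error 12 is GSS_S_CONTEXT_EXPIRED. That agrees with the "The referenced context has expired" string kadmind logs alongside it.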

I didn't retry in this case, so I assume these retries were inside libkadm5?

The krb5kdc.log was fine this whole time, and my uptime probes that got
tickets were fine.  The kdc and this libkadm5 app are on the same machine.

Is this just computers being weird and I should handle this case and
loop and keep trying?  There's no new timeout or anything added between
1.9 and 1.15 I should know about?

Thanks,
Chris


________________________________________________
Kerberos mailing list           [hidden email]
https://mailman.mit.edu/mailman/listinfo/kerberos

Re: long running kadm5 program running into errors

Chris Hecker

I just got this again today; this time, instead of EBUSY, the libkadm5
client got "Cannot resolve network address for admin server in requested
realm (43787576)".  Same "The referenced context has expired" stuff in
kadmind.log.  I'm confused about the kadmind.log lines...I can see the
client having a problem due to timeouts or paging or whatever, but why
would the kadmind print that stuff in this case?
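
The numeric code 43787576 in that message can be taken apart using the com_err convention, where the low 8 bits are an offset into an error table and the remaining bits identify the table. The sketch below assumes that convention as implemented in MIT's com_err library (6 bits per table-name character); the helper names are made up for illustration, and it only handles positive codes.

```python
import string

ERRCODE_RANGE = 8    # low bits holding the offset within a table
BITS_PER_CHAR = 6    # bits per character of the encoded table name
CHAR_SET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "_"


def split_com_err(code):
    """Return (table_base, offset) for a positive com_err code."""
    offset = code & ((1 << ERRCODE_RANGE) - 1)
    return code - offset, offset


def table_name(base):
    """Decode the short table name packed into a com_err table base."""
    num = base >> ERRCODE_RANGE
    chars = []
    for i in range(3, -1, -1):
        ch = (num >> (BITS_PER_CHAR * i)) & ((1 << BITS_PER_CHAR) - 1)
        if ch:
            # Character values are 1-based; 0 means "no character".
            chars.append(CHAR_SET[ch - 1])
    return "".join(chars)


base, offset = split_com_err(43787576)
print(base, offset, table_name(base))
```

This gives base 43787520, offset 56, and table name "ovk", which is the table MIT uses for kadm5 errors. So the code comes from the kadm5 error table rather than the krb5 one; which symbol sits at offset 56 is best confirmed against kadm_err.h on the installed version.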

Chris


On 2018-08-16 17:47, Chris Hecker wrote:

> This time, when it got the KADM5_RPC_ERROR, it reconnected and got an
> EBUSY.  If I look at kadmind.log, I see this:
> [...]


Re: long running kadm5 program running into errors

Greg Hudson
On 08/18/2018 01:53 AM, Chris Hecker wrote:
> I just got this again today, this time instead of EBUSY, the libkadm5
> client got "Cannot resolve network address for admin server in requested
> realm (43787576)".  Same "The referenced context has expired" stuff in
> kadmind.log.  I'm confused about the kadmind.log lines...I can see the
> client having a problem due to timeouts or paging or whatever, but why
> would the kadmind print that stuff in this case?

>> Aug 16 16:17:12 domain.com kadmind[31118](Error): check_rpcsec_auth:
>> failed inquire_context, stat=786432
>> Aug 16 16:17:12 domain.com kadmind[31118](Notice): Authentication
>> attempt failed: x.x.x.x, GSS-API error strings are:
>> Aug 16 16:17:12 domain.com kadmind[31118](Notice):     The referenced
>> context has expired

These errors look like they might be associated with the previous
connection expiring, rather than the new connection failing.

I don't have much insight into why you're getting EBUSY or
KRB5_REALM_CANT_RESOLVE when connecting to kadmind on the local host.

Re: long running kadm5 program running into errors

Chris Hecker
I think this turned out to be caused by an IP address attached to eth0 that
had actually been moved.  Those kadmind.log errors were correlated with the
connection problems though (they were always present when a failure like
this occurred), if that is useful info.
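
A stale or moved address like this can be checked from outside libkadm5 by resolving the admin server name and attempting a plain TCP connection to each address it maps to. The sketch below is only an illustration of that check: `probe` is a hypothetical helper, and it says nothing about Kerberos authentication itself.

```python
import socket


def probe(host, port, timeout=3.0):
    """Resolve host and try a TCP connect to every resulting address.

    Returns a list of (address, error) pairs; error is None on success.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return [(None, "resolve failed: %s" % exc)]

    results = []
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            results.append((sockaddr[0], None))
        except OSError as exc:
            # Record the failure and keep probing the other addresses.
            results.append((sockaddr[0], str(exc)))
    return results
```

Running something like `probe("kdc.example.com", 749)` (a placeholder host name; 749 is the usual kadmind port) next to the daemon would show whether the name still resolves to an address that accepts connections.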

Chris


On Wed, Aug 22, 2018 at 08:09 Greg Hudson <[hidden email]> wrote:

> These errors look like they might be associated with the previous
> connection expiring, rather than the new connection failing.
>
> I don't have much insight into why you're getting EBUSY or
> KRB5_REALM_CANT_RESOLVE when connecting to kadmind on the local host.