GSSAPI security context integrity check

GSSAPI security context integrity check

Alexandr Nedvedicky
Hello,

I'm not sure if this is the right place to ask questions related to GSSAPI;
I'll be glad for any useful pointers.

The issue described here is about libgss, which comes with MIT Kerberos 1.16,
and SAP NetWeaver Application Server release 749. The customer runs SAP with
privacy and integrity protection. Everything used to work fine with the old
Kerberos 1.8.4. The customer switched to Solaris 11.4, which comes with
Kerberos 1.16. The clients are no longer able to connect to the SAP server on
Solaris as long as integrity protection is enabled. The log on the server
shows the error below:

    N Mon Mar 16 15:58:10:381 2020
    N  *** ERROR => SncPFrameIn()==SNCERR_GSSAPI  [/bas/749_REL/sr 4454]
    N        GSS-API(maj): An expected per-message token was not received
    N        GSS-API(min): Unknown code 0

The error message matches the GSS_S_GAP_TOKEN error. Grepping through the
source code, the error can come from g_seqstate_check(), found in
lib/gssapi/generic/util_seqstate.c. This is the only function I could find
that returns GSS_S_GAP_TOKEN. This was promising, as Kerberos 1.8.4
comes with a different algorithm for sequence number checking.

To analyze the issue further, I sent the customer a library with a patch (see
attachment). The patched binary allowed me to gather detailed tracing
from g_seqstate_check(). The results I got from the customer are somewhat
surprising, because they just confirm that both algorithms (current and
1.8.4) behave the same. However, the data collected by the customer shows
some odd behavior, which rather points to an interoperability issue between
the MIT GSSAPI implementation and GSSAPI on the clients (Windows 10?).

The attached patch tracks all sequence numbers from the moment a security
context is created until it is released. Once the patched Kerberos was
installed, the customer retried the login. There was a failure, and the
directory with the captured data contained two files:

    lumpy$ ls -l |grep -v '0 Apr'
    total 8
    -rw-r-----  1 sashan  wheel  131 Apr  8 01:33 ctx-10.c523660
    -rw-r-----  1 sashan  wheel  134 Apr  8 01:33 ctx-11.c523660

Two security contexts attempted to use integrity protection. The file
names are worth noting. They are constructed in a simple/naive fashion:

    /* %p expects a void *, hence the cast */
    snprintf(path, sizeof (path), "%s/ctx-%u.%p", dbg_dir,
                    counter++, (void *)ctx);
    fd = open(path, O_RDWR|O_CREAT|O_APPEND,
                    S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH);

The counter is a static variable; ctx is a pointer to the g_seqnum_state
instance allocated in g_seqstate_init(). Let's take a look at
the content of the files. The first file (ctx-10.c523660) looks OK;
it contains a single line:

    [ 04-07 16:33:31 ] in:  1509845287 base:        1509845287 \
            mask:        ffffffffffffffff relative:      0 \
            map:  0 offset:       0 next: 1 OK (GSS_S_COMPLETE)

The second file, ctx-11.c523660, matches the error we've seen in the log from SAP:

    [ 04-07 16:33:32 ] in:  1509845288 base:        1509845287 \
            mask:        ffffffffffffffff relative:      1 \
            map:  0 offset:       1 next: 2 FAIL (GSS_S_GAP_TOKEN)

Note that contexts 11 and 10 share the same base (initial) sequence number.
If I understand GSSAPI right, the base sequence number comes from the peer;
the acceptor (kg_accept_krb5()) retrieves it here:

    /* decode the message */

    if ((code = krb5_auth_con_init(context, &auth_context))) {
        major_status = GSS_S_FAILURE;
        save_error_info((OM_uint32)code, context);
        goto fail;
    }
    if (cred->rcache) {
        cred_rcache = 1;
    ...
    {
        krb5_int32 seq_temp;
        krb5_auth_con_getremoteseqnumber(context, auth_context, &seq_temp);
        ctx->seq_recv = seq_temp;
    }

    if ((code = krb5_timeofday(context, &now))) {
        major_status = GSS_S_FAILURE;
        goto fail;
    }

    code = g_seqstate_init(&ctx->seqstate, ctx->seq_recv,
                           (ctx->gss_flags & GSS_C_REPLAY_FLAG) != 0,
                           (ctx->gss_flags & GSS_C_SEQUENCE_FLAG) != 0,
                           ctx->proto);

It seems to me there is some disagreement between the acceptor (SAP server)
and the client about whether there is one security context (the way the
Windows initiator sees the session) or a second, new security context, as
seen by the acceptor.

I'll be glad for any further debugging tips, such as how to trace GSSAPI
further to hunt down the root cause of the problem. I'll also be happy to
discuss the matter off-list and share the results.

thanks for any hints or pointers to get this moving forward.

regards
sasha

_______________________________________________
krbdev mailing list             [hidden email]
https://mailman.mit.edu/mailman/listinfo/krbdev

Attachment: seq.diff (6K)

Re: GSSAPI security context integrity check

Greg Hudson
On 5/6/20 1:18 PM, Alexandr Nedvedicky wrote:
> not sure if it is the right place to ask questions related to GSSAPI, will be
> glad for any useful pointers.

This is the right place, since it relates to the MIT krb5 GSS
implementation.

> Customer switched to Solaris 11.4, which comes with kerberos
> 1.16.

Are there Solaris-specific modifications to this code, or is it
unmodified 1.16?

> two security contexts attempted to use integrity protection.

The two filenames had the same suffix (c523660).  If I understand
correctly, that is the pointer value of the krb5 GSS context object--so
both g_seqstate_init() calls were for the same context (which is
consistent with the initial sequence numbers being the same).  It would
be very interesting to know the stack traces of the two
g_seqstate_init() calls, although that might be difficult to collect
remotely.  Normally there should only be one g_seqstate_init() call for
a context, from kg_accept_krb5().
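
The comparison described above can be made mechanical. The sketch below parses
trace file names of the form produced by the patch (ctx-<counter>.<pointer>)
and reports whether two names carry the same pointer value. The helper names
parse_trace_name and same_trace_pointer are hypothetical, and the parsing
assumes the pointer was printed in hexadecimal, as in the directory listing
earlier in the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Split "ctx-<counter>.<hex pointer>" into its two parts.
 * Returns 1 on success, 0 if the name does not match the pattern. */
int parse_trace_name(const char *name, unsigned *counter, unsigned long *ptr)
{
    assert(name != NULL && counter != NULL && ptr != NULL);
    return sscanf(name, "ctx-%u.%lx", counter, ptr) == 2;
}

/* Returns 1 if both names parse and carry the same pointer value. */
int same_trace_pointer(const char *a, const char *b)
{
    unsigned ca, cb;
    unsigned long pa, pb;

    if (!parse_trace_name(a, &ca, &pa) || !parse_trace_name(b, &cb, &pb))
        return 0;
    return pa == pb;
}
```

For the two files in the listing, same_trace_pointer("ctx-10.c523660",
"ctx-11.c523660") returns 1: the counters differ but the pointer part is
identical.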

Re: GSSAPI security context integrity check

Alexandr Nedvedicky
On Wed, May 06, 2020 at 08:26:57PM -0400, Greg Hudson wrote:
</snip>
> > Customer switched to Solaris 11.4, which comes with kerberos
> > 1.16.
>
> Are there Solaris-specific modifications to this code, or is it
> unmodified 1.16?

    Sorry, I forgot to mention that in my first email. There are
    several modifications to GSSAPI. Although those changes do
    touch GSSAPI, they should not matter (according to my
    understanding of the code).

    The first change improves interoperability with SMB [1]. According
    to the records, it dates back to the Windows XP era.

    There are also Solaris-specific modifications that allow MIT
    Kerberos to share a security context with the Kerberos mechanism
    implemented in the Solaris kernel [2]. If I understand things right,
    they allow kerberized NFS to work.

    Then there is a change that deals with a specific way of acquiring
    credentials for the root user [3].

    The remaining changes deal with source code compatibility
    for legacy parts of Solaris.

>
> > two security contexts attempted to use integrity protection.
>
> The two filenames had the same suffix (c523660).  If I understand
> correctly, that is the pointer value of the krb5 GSS context object--so
> both g_seqstate_init() calls were for the same context (which is
> consistent with the initial sequence numbers being the same).  It would
> be very interesting to know the stack traces of the two
> g_seqstate_init() calls, although that might be difficult to collect
> remotely.  Normally there should only be one g_seqstate_init() call for
> a context, from kg_accept_krb5().

    I'll ask the customer to collect the stacks with dtrace. After spending
    a couple of days browsing through the source code, I found that
    g_seqstate_init() only gets called when a new security context is born
    (import, accept or init).

    Assuming I interpret the SAP log right, we run as the GSS acceptor,
    so g_seqstate_init() is being called from kg_accept_krb5() in
    lib/gssapi/krb5/accept_sec_context.c.

Yes, collecting a stack trace with dtrace is a good idea.

thanks and
regards
sasha

[1] https://github.com/Sashan/krb5/commit/fa78fb2c9f6e92e087b27c86c1fa79326fde73b5

[2] https://github.com/Sashan/krb5/commit/43ce4ff37598e91f7ae35630115bfcec5bbe6eb7

[3] https://github.com/Sashan/krb5/commit/5373d32f0d2f323e42dabec02707f591a3ae10fc

Re: GSSAPI security context integrity check

Alexandr Nedvedicky
Hello,

just to let you know about the progress: at the moment it looks like
the issue is specific to Solaris.

</snip>

> > The two filenames had the same suffix (c523660).  If I understand
> > correctly, that is the pointer value of the krb5 GSS context object--so
> > both g_seqstate_init() calls were for the same context (which is
> > consistent with the initial sequence numbers being the same).  It would
> > be very interesting to know the stack traces of the two
> > g_seqstate_init() calls, although that might be difficult to collect
> > remotely.  Normally there should only be one g_seqstate_init() call for
> > a context, from kg_accept_krb5().
>
>     I'll ask customer to collect the stack with dtrace. After spending couple
>     days browsing through source code the g_seqstate_init() gets only called
>     when new security context is born (import, accept or init).
>
>     Assuming I interpret the log from SAPware right, we run as GSS acceptor,
>     so g_seqstate_init() is being called from kg_accept_krb5() in
>     lib/gssapi/krb5/accept_sec_context.c.
>
> yes, that's a good idea to try to collect stack trace using dtrace.
>

    We've got dtrace output, which shows that security context export/import
    is involved. I suspect there are multiple SAP processes: one of them
    accepts the security context, exports it and passes it to a worker
    process, which imports it. I've noticed there is a Solaris-specific diff
    that deals with context serialization. Therefore I think the issue might
    be specific to Solaris. I'll send a pull request in case the
    investigation reveals a bug in MIT Kerberos.


thanks and
regards
sashan


Re: GSSAPI security context integrity check

Alexandr Nedvedicky
Hello,

I owe an answer here. Just hit delete if you don't care. To keep the story
short: the bug was hiding in Solaris-specific code that deals with GSS
security context export/import. It had been sitting there for 4 years or so,
waiting to be discovered.

It looks like the SAP software has one dedicated server process that performs
authentication. As soon as a client is authenticated, the security
context is exported and passed to another (worker) process, which
imports it.

There are two GSS Kerberos mechanisms in Solaris:
    - the MIT one (native)

    - the kernel one (Solaris). Think of it as MIT from 12 years ago,
      bent enough to work in the Solaris kernel.

There is a Solaris-specific change to native Kerberos that deals with an
incompatibility between Solaris and MIT when it comes to sequence number
checking. The Solaris kernel mechanism still uses sequence number tracking as
implemented in src/lib/gssapi/generic/util_ordering.c, which MIT dropped
in 1.14 in favor of src/lib/gssapi/generic/util_seqstate.c.

Once the issue was understood, the fix was straightforward. The export
process must serialize the security context so that it is compatible with the
kernel mechanism (turn seqstate into order), and vice versa, the import
process must turn order into seqstate. End of story.
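
As an illustration of the kind of translation described above, the sketch
below converts between a bitmap-style window (in the spirit of
util_seqstate.c) and a short queue of recently seen sequence numbers (in the
spirit of util_ordering.c). Both structs are simplified, hypothetical
stand-ins, not the MIT or Solaris definitions, and this is not the actual
Solaris fix. Wraparound and 32-bit masks are ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QLEN 20  /* hypothetical queue capacity */

/* Bitmap form: bit (i-1) of recvmap set means relative number next-i was seen. */
struct win_state {
    uint64_t base;
    uint64_t next;
    uint64_t recvmap;
};

/* Queue form: up to QLEN most recently seen relative numbers, ascending. */
struct seq_queue {
    uint64_t firstnum;
    size_t length;
    uint64_t elem[QLEN];
};

static int seen(const struct win_state *w, unsigned i)
{
    return i <= w->next && ((w->recvmap >> (i - 1)) & 1);
}

/* Bitmap -> queue. Lossy if more than QLEN numbers are marked as seen:
 * only the newest QLEN survive. */
void win_to_queue(const struct win_state *w, struct seq_queue *q)
{
    unsigned i, total = 0, skip;

    assert(w != NULL && q != NULL);
    q->firstnum = w->base;
    q->length = 0;
    for (i = 1; i <= 64; i++)
        total += seen(w, i) ? 1 : 0;
    skip = total > QLEN ? total - QLEN : 0;
    /* Walk from the oldest slot (next-64) to the newest (next-1). */
    for (i = 64; i >= 1; i--) {
        if (!seen(w, i))
            continue;
        if (skip > 0) {
            skip--;
            continue;
        }
        q->elem[q->length++] = w->next - i;
    }
}

/* Queue -> bitmap. The expected next number is one past the newest entry. */
void queue_to_win(const struct seq_queue *q, struct win_state *w)
{
    size_t k;

    assert(q != NULL && w != NULL);
    w->base = q->firstnum;
    w->next = 0;
    w->recvmap = 0;
    if (q->length == 0)
        return;
    w->next = q->elem[q->length - 1] + 1;
    for (k = 0; k < q->length; k++) {
        uint64_t dist = w->next - q->elem[k];  /* 1 for the newest entry */
        if (dist <= 64)
            w->recvmap |= (uint64_t)1 << (dist - 1);
    }
}
```

The point of the round trip is that the expected next number survives the
export and import, so the importing process does not start again from a fresh
window and misread the second message as a gap.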

thanks and
regards
sasha


Re: GSSAPI security context integrity check

Greg Hudson
On 6/26/20 4:10 AM, Alexandr Nedvedicky wrote:
> Once issue was understood the fix is straightforward.  The export/import
> process must serialize security context such it will be compatible with kernel
> mechanism (turn seqstate to order). And vice-versa import process must turn
> order to seqstate. End of story.

Thanks for the update; it provides a lot of useful context.

We have periodically talked about radically changing how gss-krb5
security contexts are exported and imported, most likely accompanied by
a written specification.  That would let us rip out the libkrb5
serialization code (which isn't up to current standards), perhaps share
a token format with Heimdal, and most likely reduce the token size
significantly.

It sounds like if we did this work, it would create a significant amount
of work for Oracle, which would have to either translate the new format
to the kernel format, or adapt the import code to the kernel.  On the
other hand, if the new format is stable and/or versioned, it might help
to prevent subtle bugs like this one--which was caused by a change to
the export token format without any accompanying versioning.
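
To make the versioning point concrete, here is a minimal sketch of an exported
sequence-state blob that carries a magic tag and a version number, so that an
importer can reject a layout it does not understand instead of misreading it.
The layout, the constants SEQ_TOKEN_MAGIC and SEQ_TOKEN_VERSION, and the
functions seq_export and seq_import are all hypothetical; this is not the
gss-krb5 export token format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEQ_TOKEN_MAGIC   0x53455131u  /* hypothetical tag, ASCII "SEQ1" */
#define SEQ_TOKEN_VERSION 2u           /* hypothetical current layout version */
#define SEQ_TOKEN_SIZE    32u          /* 4 + 4 + 8 + 8 + 8 bytes */

enum import_status {
    IMPORT_OK,
    IMPORT_TRUNCATED,
    IMPORT_BAD_MAGIC,
    IMPORT_BAD_VERSION
};

/* The fields a receiver needs to continue sequence checking. */
struct seq_fields {
    uint64_t base;
    uint64_t next;
    uint64_t recvmap;
};

/* Store the low n bytes of v in big-endian order. */
static void put_be(unsigned char *p, uint64_t v, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        p[i] = (unsigned char)(v >> (8 * (n - 1 - i)));
}

/* Load n big-endian bytes. */
static uint64_t get_be(const unsigned char *p, size_t n)
{
    uint64_t v = 0;
    size_t i;

    for (i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Returns the number of bytes written, or 0 if the buffer is too small. */
size_t seq_export(const struct seq_fields *f, unsigned char *buf, size_t len)
{
    assert(f != NULL && buf != NULL);
    if (len < SEQ_TOKEN_SIZE)
        return 0;
    put_be(buf, SEQ_TOKEN_MAGIC, 4);
    put_be(buf + 4, SEQ_TOKEN_VERSION, 4);
    put_be(buf + 8, f->base, 8);
    put_be(buf + 16, f->next, 8);
    put_be(buf + 24, f->recvmap, 8);
    return SEQ_TOKEN_SIZE;
}

/* Refuses anything that is not exactly the layout this code knows. */
enum import_status seq_import(const unsigned char *buf, size_t len,
                              struct seq_fields *f)
{
    assert(buf != NULL && f != NULL);
    if (len < SEQ_TOKEN_SIZE)
        return IMPORT_TRUNCATED;
    if (get_be(buf, 4) != SEQ_TOKEN_MAGIC)
        return IMPORT_BAD_MAGIC;
    if (get_be(buf + 4, 4) != SEQ_TOKEN_VERSION)
        return IMPORT_BAD_VERSION;
    f->base = get_be(buf + 8, 8);
    f->next = get_be(buf + 16, 8);
    f->recvmap = get_be(buf + 24, 8);
    return IMPORT_OK;
}
```

With a check like this, a change to the layout that bumps the version number
makes an old importer fail loudly at import time, rather than accept the blob
and fail later during per-message checks.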

Re: GSSAPI security context integrity check

Alexandr Nedvedicky
On Fri, Jun 26, 2020 at 11:38:10AM -0400, Greg Hudson wrote:

> On 6/26/20 4:10 AM, Alexandr Nedvedicky wrote:
> > Once issue was understood the fix is straightforward.  The export/import
> > process must serialize security context such it will be compatible with kernel
> > mechanism (turn seqstate to order). And vice-versa import process must turn
> > order to seqstate. End of story.
>
> Thanks for the update; it provides a lot of useful context.
>
> We have periodically talked about radically changing how gss-krb5
> security contexts are exported and imported, most likely accompanied by
> a written specification.  That would let us rip out the libkrb5
> serialization code (which isn't up to current standards), perhaps share
> a token format with Heimdal, and most likely reduce the token size
> significantly.
>
> It sounds like if we did this work, it would create a significant amount
> of work for Oracle, which would have to either translate the new format
> to the kernel format, or adapt the import code to the kernel.  On the
> other hand, if the new format is stable and/or versioned, it might help
> to prevent subtle bugs like this one--which was caused by a change to
> the export token format without any accompanying versioning.

    To be honest, I have had no time to figure out all the details of the
    krb5 kernel mechanism in Solaris. I was thinking about updating the
    kernel mechanism with newer bits from upstream, but I feel kind of
    scared to do it. Only the NFS test suite provides coverage,
    and I'm afraid it might not be enough. Resources are a bit
    stretched these days. There used to be 4-5 developers taking care of
    one component such as Kerberos. The ratio is kind of inverted after all
    the layoffs in our org.

    I think 'significant work for Oracle' should not matter if the API
    provides some extra safety belts. This particular bug was sitting there,
    waiting to bite us, for almost 5 years without being noticed.

thanks and
regards
sasha