Re: krb5 commit: Modernize UTF-8/UCS-2 conversion code

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: krb5 commit: Modernize UTF-8/UCS-2 conversion code

Stefan (metze) Metzmacher
Hi Robbie,

is there a reson why you still limit this to USC-2 and not use full UTF16
support?

We had a lot of trouble with the limited unicode support of krb5 libraries
in Samba recently, see https://bugzilla.samba.org/show_bug.cgi?id=12262

For now we worked arround it in Samba by limiting random machine passwords
to unicode codepoints up to 0xffff.

metze

> https://github.com/krb5/krb5/commit/c4e8d444632140ecb47f31df133c0657f07f9be0
> commit c4e8d444632140ecb47f31df133c0657f07f9be0
> Author: Robbie Harwood <[hidden email]>
> Date:   Thu Apr 6 12:15:39 2017 -0400
>
>     Modernize UTF-8/UCS-2 conversion code
>    
>     Remove unused entry points as we only need to convert between
>     little-endian UCS-2 byte buffers and UTF-8.  Rename and simplify the
>     remaining two function contracts.  Avoid pointer alignment and
>     endianness issues by operating on byte buffers and using store_16_le()
>     and load_16_le().  Avoid two-pass operation using k5buf.
>    
>     [[hidden email]: simplified code using k5buf; simplified function
>     names and contracts; rewrote commit message]

_______________________________________________
krbdev mailing list             [hidden email]
https://mailman.mit.edu/mailman/listinfo/krbdev

signature.asc (853 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: krb5 commit: Modernize UTF-8/UCS-2 conversion code

Robbie Harwood
Stefan Metzmacher <[hidden email]> writes:

> is there a reson why you still limit this to USC-2 and not use full UTF16
> support?

I'm not sure what you're asking.  The commit in question fixes several
code hygene problems; it does not add any new interfaces, and the only
fixes are related to memory safety.

> We had a lot of trouble with the limited unicode support of krb5 libraries
> in Samba recently, see https://bugzilla.samba.org/show_bug.cgi?id=12262
>
> For now we worked arround it in Samba by limiting random machine passwords
> to unicode codepoints up to 0xffff.

I guess the answer is that I didn't know there was a bug?  Have you
filed something in the krb5 bugtracker?  I don't see any linked from
your bugzilla, just the workaround.


_______________________________________________
krbdev mailing list             [hidden email]
https://mailman.mit.edu/mailman/listinfo/krbdev

signature.asc (847 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: krb5 commit: Modernize UTF-8/UCS-2 conversion code

Greg Hudson
In reply to this post by Stefan (metze) Metzmacher
On 04/17/2017 03:24 PM, Stefan Metzmacher wrote:
> is there a reson why you still limit this to USC-2 and not use full UTF16
> support?

This commit was just trying to eliminate warnings, not change the
functionality.

This code originally came from OpenLDAP's utf-8-conv.c, and was
incorporated with the goal of Windows compatibility (for PACs and RC4
string-to-key).  If Windows now supports UTF-16 then we could consider
extending it, but I wouldn't want to do so without confirmation of that.

> We had a lot of trouble with the limited unicode support of krb5 libraries
> in Samba recently, see https://bugzilla.samba.org/show_bug.cgi?id=12262

Thanks for the pointer.
_______________________________________________
krbdev mailing list             [hidden email]
https://mailman.mit.edu/mailman/listinfo/krbdev
Reply | Threaded
Open this post in threaded view
|

Re: krb5 commit: Modernize UTF-8/UCS-2 conversion code

Andrew Bartlett
On Mon, 2017-04-17 at 15:53 -0400, Greg Hudson wrote:

> On 04/17/2017 03:24 PM, Stefan Metzmacher wrote:
> > is there a reson why you still limit this to USC-2 and not use full
> > UTF16
> > support?
>
> This commit was just trying to eliminate warnings, not change the
> functionality.
>
> This code originally came from OpenLDAP's utf-8-conv.c, and was
> incorporated with the goal of Windows compatibility (for PACs and RC4
> string-to-key).  If Windows now supports UTF-16 then we could
> consider
> extending it, but I wouldn't want to do so without confirmation of
> that.

Windows has been UTF-16 for quite a long time now.

Andrew Bartlett
--
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba

_______________________________________________
krbdev mailing list             [hidden email]
https://mailman.mit.edu/mailman/listinfo/krbdev
Reply | Threaded
Open this post in threaded view
|

Re: krb5 commit: Modernize UTF-8/UCS-2 conversion code

Stefan (metze) Metzmacher
In reply to this post by Greg Hudson
Am 17.04.2017 um 21:53 schrieb Greg Hudson:
> On 04/17/2017 03:24 PM, Stefan Metzmacher wrote:
>> is there a reson why you still limit this to USC-2 and not use full UTF16
>> support?
>
> This commit was just trying to eliminate warnings, not change the
> functionality.

Ok, "Modernize UTF-8/UCS-2 conversion code" indicated a more or less
rewrite of the code to me. As it's not you may just ignore this topic
for now.

> This code originally came from OpenLDAP's utf-8-conv.c, and was
> incorporated with the goal of Windows compatibility (for PACs and RC4
> string-to-key).  If Windows now supports UTF-16 then we could consider
> extending it, but I wouldn't want to do so without confirmation of that.

I think Windows always supported UTF-16 instead of USC-2, at least it
did for the last 10 years (at least for passwords).

>> We had a lot of trouble with the limited unicode support of krb5 libraries
>> in Samba recently, see https://bugzilla.samba.org/show_bug.cgi?id=12262

The core of the problem is that Windows uses an random buffer as machine
password,
and interprets it as UTF16 (which is not always valid UTF-16).
The MD4 of the raw buffer is the NTHASH/RC4-key.
In order to calculate the AES-keys the buffer is converted to (valid)
UTF-8 first.
The following rules are applied to the raw buffer first:
1) any 0x0000 characters are mapped to 0x0001
2) convert any instance of 0xD800 - 0xDBFF (high surrogate)
   without an immediately following 0xDC00 - 0x0xDFFF (low surrogate) to
   U+FFFD (OBJECT REPLACEMENT CHARACTER).
3) the same for any low surrogate that was not preceded by a high surrogate.

Windows stores the raw buffer as master copy of the password,
while Samba currently stores the UTF-8 password.

If we pass the UTF-8 to the krb5 libraries we may calculate a different
NTHASH,
because the transition from the random (UTF-16-like) buffer to valid UTF-8
and back to valid UTF-16 looses information.

So our current solution is limiting the machine password to valid utf-8,
with codepoints up to 0xffff.

In future we may try to do pass a keytab instead of password to the krb5
libraries
and calculate the keys ourself in order to avoid these problems.

So there's currently no immediate action required.

metze


_______________________________________________
krbdev mailing list             [hidden email]
https://mailman.mit.edu/mailman/listinfo/krbdev

signature.asc (853 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: krb5 commit: Modernize UTF-8/UCS-2 conversion code

Greg Hudson
On 04/18/2017 04:32 AM, Stefan Metzmacher wrote:
> Ok, "Modernize UTF-8/UCS-2 conversion code" indicated a more or less
> rewrite of the code to me. As it's not you may just ignore this topic
> for now.

Yeah, in this case "Modernize" referred to coding standards, not
functionality.  We're not always consistent about that (e.g. a recent
commit "Modernize UTF-8 conversions" was about the maximum Unicode code
point), but we can't always be perfectly clear in the summary line as we
need to keep it short.

I have written up UTF-16 support for these conversion functions, but
we'll need test cases in order to be confident that they work.  Do you
or anyone else know of a body of test cases that I might crib from?

> I think Windows always supported UTF-16 instead of USC-2, at least it
> did for the last 10 years (at least for passwords).

Wikipedia thinks it added UTF-16 support in Windows 2000, so yeah, quite
some time.

> The core of the problem [...]

Thanks for that explanation.
_______________________________________________
krbdev mailing list             [hidden email]
https://mailman.mit.edu/mailman/listinfo/krbdev
Reply | Threaded
Open this post in threaded view
|

Re: krb5 commit: Modernize UTF-8/UCS-2 conversion code

Simo Sorce-3
In reply to this post by Stefan (metze) Metzmacher
On Tue, 2017-04-18 at 10:32 +0200, Stefan Metzmacher wrote:

> Am 17.04.2017 um 21:53 schrieb Greg Hudson:
> > On 04/17/2017 03:24 PM, Stefan Metzmacher wrote:
> >> is there a reson why you still limit this to USC-2 and not use full UTF16
> >> support?
> >
> > This commit was just trying to eliminate warnings, not change the
> > functionality.
>
> Ok, "Modernize UTF-8/UCS-2 conversion code" indicated a more or less
> rewrite of the code to me. As it's not you may just ignore this topic
> for now.
>
> > This code originally came from OpenLDAP's utf-8-conv.c, and was
> > incorporated with the goal of Windows compatibility (for PACs and RC4
> > string-to-key).  If Windows now supports UTF-16 then we could consider
> > extending it, but I wouldn't want to do so without confirmation of that.
>
> I think Windows always supported UTF-16 instead of USC-2, at least it
> did for the last 10 years (at least for passwords).
>
> >> We had a lot of trouble with the limited unicode support of krb5 libraries
> >> in Samba recently, see https://bugzilla.samba.org/show_bug.cgi?id=12262
>
> The core of the problem is that Windows uses an random buffer as machine
> password,
> and interprets it as UTF16 (which is not always valid UTF-16).
> The MD4 of the raw buffer is the NTHASH/RC4-key.

Would it make sense to provide an alternative interface to calculate the
RC4-key that takes a random buffer instead of utf-8, so we do not have
to deal with awkward conversions or restricting how we generate this
buffer ?

Simo.

> In order to calculate the AES-keys the buffer is converted to (valid)
> UTF-8 first.
> The following rules are applied to the raw buffer first:
> 1) any 0x0000 characters are mapped to 0x0001
> 2) convert any instance of 0xD800 - 0xDBFF (high surrogate)
>    without an immediately following 0xDC00 - 0x0xDFFF (low surrogate) to
>    U+FFFD (OBJECT REPLACEMENT CHARACTER).
> 3) the same for any low surrogate that was not preceded by a high surrogate.
>
> Windows stores the raw buffer as master copy of the password,
> while Samba currently stores the UTF-8 password.
>
> If we pass the UTF-8 to the krb5 libraries we may calculate a different
> NTHASH,
> because the transition from the random (UTF-16-like) buffer to valid UTF-8
> and back to valid UTF-16 looses information.
>
> So our current solution is limiting the machine password to valid utf-8,
> with codepoints up to 0xffff.
>
> In future we may try to do pass a keytab instead of password to the krb5
> libraries
> and calculate the keys ourself in order to avoid these problems.
>
> So there's currently no immediate action required.
>
> metze
>
> _______________________________________________
> krbdev mailing list             [hidden email]
> https://mailman.mit.edu/mailman/listinfo/krbdev


--
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc


_______________________________________________
krbdev mailing list             [hidden email]
https://mailman.mit.edu/mailman/listinfo/krbdev
Reply | Threaded
Open this post in threaded view
|

Re: krb5 commit: Modernize UTF-8/UCS-2 conversion code

Stefan (metze) Metzmacher
Am 25.04.2017 um 19:31 schrieb Simo Sorce:

> On Tue, 2017-04-18 at 10:32 +0200, Stefan Metzmacher wrote:
>> Am 17.04.2017 um 21:53 schrieb Greg Hudson:
>>> On 04/17/2017 03:24 PM, Stefan Metzmacher wrote:
>>>> is there a reson why you still limit this to USC-2 and not use full UTF16
>>>> support?
>>>
>>> This commit was just trying to eliminate warnings, not change the
>>> functionality.
>>
>> Ok, "Modernize UTF-8/UCS-2 conversion code" indicated a more or less
>> rewrite of the code to me. As it's not you may just ignore this topic
>> for now.
>>
>>> This code originally came from OpenLDAP's utf-8-conv.c, and was
>>> incorporated with the goal of Windows compatibility (for PACs and RC4
>>> string-to-key).  If Windows now supports UTF-16 then we could consider
>>> extending it, but I wouldn't want to do so without confirmation of that.
>>
>> I think Windows always supported UTF-16 instead of USC-2, at least it
>> did for the last 10 years (at least for passwords).
>>
>>>> We had a lot of trouble with the limited unicode support of krb5 libraries
>>>> in Samba recently, see https://bugzilla.samba.org/show_bug.cgi?id=12262
>>
>> The core of the problem is that Windows uses an random buffer as machine
>> password,
>> and interprets it as UTF16 (which is not always valid UTF-16).
>> The MD4 of the raw buffer is the NTHASH/RC4-key.
>
> Would it make sense to provide an alternative interface to calculate the
> RC4-key that takes a random buffer instead of utf-8, so we do not have
> to deal with awkward conversions or restricting how we generate this
> buffer ?
I think using a client side keytab is the way to avoid the conversation.

metze


_______________________________________________
krbdev mailing list             [hidden email]
https://mailman.mit.edu/mailman/listinfo/krbdev

signature.asc (853 bytes) Download Attachment