proflib: krb5.conf lexer proposal

proflib: krb5.conf lexer proposal

Joseph Calzaretta
Now that America's extended harvest holiday has ended, I'd like to pick up
this thread.  Here is a summary of where we are at this point, in my opinion.

New Lexer for krb5.conf Files

The existing profile library has issues which warrant rewriting it.  This
is an opportunity to improve the parser to handle input in a more intuitive
and general way (e.g., allowing comments within curly braces, allowing the
specification of arbitrary character sequences as section names, relation
tags, and relation values, etc.) while still maintaining backwards
compatibility with existing krb5.conf files.

My lexer proposal of Nov 21 (
http://mailman.mit.edu/pipermail/krbdev/2005-November/003892.html ) may be
a component of such an improved parser.  Maybe it's not.  Any change made
to the parser's behavior may result in -someone's- krb5.conf file suddenly
being interpreted differently, and therefore incorrectly (from the point of
view of that person).  The proposed lexer might be acceptable, and only
cause problems with pathologically misformatted files that we don't intend
to support.  Or it might be too ambitious and break or risk breaking too
many existing systems.  My main purpose in presenting that lexer is to find
out your opinions on whether or not this lexer is acceptable or too ambitious.

In my opinion, the main issue centers around what possible characters
appear, unquoted, as relation values (text tokens after the '=') in
supported krb5.conf files.  For example, if the string "foo[bar]" appears
as an unquoted relation value, (baz = foo[bar] as opposed to baz =
"foo[bar]") then the lexer must allow for the characters '[' and ']' to
appear unquoted in text tokens, at least in certain circumstances.  From my
research, it seems that most relation values consist of alphanumeric
characters plus dashes, underscores, and single spaces.  However, the
auth_to_local relation values seem to contain '[', ']', and ';' characters,
as well as some other punctuation marks I'm not too concerned about because
I treat them as text anyway.

The proposed lexer assumes that text tokens do not contain (unquoted):
   '=' (equal signs)
   '{' (open curly braces)
   '}' (close curly braces)
   '[' (open square brackets) preceded by whitespace.  (So "foo[bar" can be
a text token but "foo [bar" is not)
   ']' (close square brackets) followed by whitespace.  (So "foo]bar" can be
a text token but "foo] bar" is not)
   '#' (hash/pound signs) preceded by whitespace.  (So "foo#bar" can be a
text token but "foo #bar" is not)
   ';' (semicolons) preceded by whitespace. (So "foo;bar" can be a text
token but "foo ;bar" is not)
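
As a rough illustration (not part of the original proposal), the assumptions listed above can be written as a small predicate. The names is_special_at and can_be_text_token are hypothetical, "whitespace" is taken to mean anything Python's str.isspace() accepts, and the start and end of the string are assumed not to count as whitespace; a real lexer would have to decide those edge cases and would also need to handle quoted strings, which this sketch ignores.

```python
# Sketch of the character assumptions listed above (first version of the
# proposal, where '=' is always special). All names here are hypothetical.

ALWAYS_SPECIAL = "={}"       # never allowed unquoted in a text token
SPECIAL_AFTER_WS = "[#;"     # special only when preceded by whitespace
SPECIAL_BEFORE_WS = "]"      # special only when followed by whitespace


def is_special_at(text: str, i: int) -> bool:
    """Return True if text[i] may not appear unquoted in a text token."""
    ch = text[i]
    if ch in ALWAYS_SPECIAL:
        return True
    if ch in SPECIAL_AFTER_WS:
        # Assumption: the start of the string does not count as whitespace.
        return i > 0 and text[i - 1].isspace()
    if ch in SPECIAL_BEFORE_WS:
        # Assumption: the end of the string does not count as whitespace.
        return i + 1 < len(text) and text[i + 1].isspace()
    return False


def can_be_text_token(text: str) -> bool:
    """True if no character in text is special under the rules above."""
    return not any(is_special_at(text, i) for i in range(len(text)))


if __name__ == "__main__":
    samples = ["foo[bar", "foo [bar", "foo]bar", "foo] bar",
               "foo#bar", "foo #bar", "foo;bar", "foo ;bar"]
    for s in samples:
        print(repr(s), can_be_text_token(s))
```

Run on the examples from the list, the predicate accepts "foo[bar", "foo]bar", "foo#bar", and "foo;bar" and rejects their whitespace-separated counterparts.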

If these assumptions are invalid, or if there are other concerns about how
the lexer tokenizes krb5.conf files, please let me know.  Thanks!

Yours,

Joe Calzaretta
Software Development & Integration Team
MIT Information Services & Technology

_______________________________________________
krbdev mailing list             [hidden email]
https://mailman.mit.edu/mailman/listinfo/krbdev

Re: proflib: krb5.conf lexer proposal

Joseph Calzaretta
Hello,

I've made a minor change to this lexer (
http://mailman.mit.edu/pipermail/krbdev/2005-November/003892.html ), due to
the existence of "FILE=" and "DEVICE=" relation values under [logging] in
krb5.conf files.  This means that an unquoted equal sign is not always
treated as a special character (even though I wouldn't be comfortable using
an equal sign in a text token with today's parser).  So, at least for now:

The new lexer does this:
>The proposed lexer assumes that text tokens do not contain (unquoted):
>   '=' (equal signs) preceded or followed by whitespace.  (So "foo=bar"
> can be a text token but neither "foo= bar" nor "foo =bar" is a text token)
and specifically:

>Tokens of type TT_TEXT follow a more complicated rule:
>    When outside a quoted string, they terminate in any of the following
> cases:
>    (1)  when an LB character is encountered.  The token is terminated
> just before the LB.
>    (2)  when an '{' or  '}' character is encountered.  The token is
> terminated just before the '{' or '}'.
>    (3)  when an ']', '#', ';', or '=' is encountered, followed by an LW
> character.  The token is terminated just before the ']', '#', ';', or '='.
>    (4)  when a WS character is encountered, followed by an '[' or
> '='.  The token is terminated just before the '[' or '='.
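
A minimal sketch of how rules (1) through (4) above might be applied when scanning for the end of an unquoted TT_TEXT token. It follows the four rules literally as quoted. The character classes are defined in the Nov 21 proposal, not in this message, so the definitions used here are assumptions: LB is taken to be CR or LF, WS to be space or tab, and LW to be any WS or LB character. The function name text_token_end is hypothetical, and quoted strings are not handled.

```python
# Sketch of TT_TEXT termination, rules (1)-(4), outside quoted strings.
# The character classes below are assumptions, not taken from the proposal.

LB = "\r\n"      # assumed: line-break characters
WS = " \t"       # assumed: whitespace characters
LW = WS + LB     # assumed: any whitespace or line-break character


def text_token_end(s: str, start: int = 0) -> int:
    """Return the index at which the text token beginning at start ends."""
    n = len(s)
    i = start
    while i < n:
        ch = s[i]
        if ch in LB:                                        # rule (1)
            return i
        if ch in "{}":                                      # rule (2)
            return i
        if ch in "]#;=" and i + 1 < n and s[i + 1] in LW:   # rule (3)
            return i
        if ch in WS and i + 1 < n and s[i + 1] in "[=":     # rule (4)
            return i + 1      # just before the '[' or '='
        i += 1
    return n                  # end of input also ends the token


if __name__ == "__main__":
    line = "kdc = FILE=/var/log/kdc.log"
    tag_end = text_token_end(line)
    print(repr(line[:tag_end]))                 # tag (with trailing space)
    value_start = 6                             # position after "kdc = "
    print(repr(line[value_start:text_token_end(line, value_start)]))
```

Under these assumptions the '=' inside "FILE=/var/log/kdc.log" does not end the token, because it has no whitespace on either side, while the '=' in "kdc =" does. Rule (4) ends the token just before the '=', so the tag text keeps its trailing space; a caller would presumably trim it.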

Earlier I said:
>The (old) proposed lexer assumes that text tokens do not contain
>(unquoted) '=' (equal signs)
and:
>Tokens of type TT_TEXT follow a more complicated rule:
>    When outside a quoted string, they terminate in any of the following
> cases:
>    (2)  when an '{', '}', or '=' character is encountered.  The token is
> terminated just before the '{', '}', or '='.

--Joe

Joe Calzaretta
Software Development & Integration Team
MIT Information Services & Technology



Re: proflib: krb5.conf lexer proposal

Nicolas Williams
On Thu, Dec 01, 2005 at 03:44:10PM -0500, Joseph Calzaretta wrote:
> Hello,
>
> I've made a minor change to this lexer (
> http://mailman.mit.edu/pipermail/krbdev/2005-November/003892.html ), due to
> the existence of "FILE=" and "DEVICE=" relation values under [logging] in
> krb5.conf files.  This means that an unquoted equal sign is not always
> treated as a special character (even though I wouldn't be comfortable using
> an equal sign in a text token with today's parser).  So, at least for now:

Can you treat '=' without whitespace on either side as not a special
token?

Re: proflib: krb5.conf lexer proposal

Joseph Calzaretta
Yes, exactly.  That's what I thought I was saying.  :)

--Joe

At 03:44 PM 12/1/2005, Joseph Calzaretta wrote:
>The new lexer assumes that text tokens do not contain (unquoted) '='
>(equal signs) preceded or followed by whitespace.
>(So "foo=bar" can be a text token but neither "foo= bar" nor "foo =bar"
>are text tokens)

At 04:12 PM 12/1/2005, Nicolas Williams wrote:
>Can you treat '=' without whitespace on either side as not a special token?

Joe Calzaretta
Software Development & Integration Team
MIT Information Services & Technology


Re: proflib: krb5.conf lexer proposal

Nicolas Williams
On Thu, Dec 01, 2005 at 04:17:00PM -0500, Joseph Calzaretta wrote:
> Yes, exactly.  That's what I thought I was saying.  :)

I didn't read the quoted new text -- the leading '>' made me think it was
old text.  Sorry.