LMDB KDB module design notes

LMDB KDB module design notes

Greg Hudson
I have been considering how MIT krb5 might implement an LMDB KDB
module.

LMDB operations take place within read or write transactions.  Read
transactions do not block write transactions; instead, read transactions
delay the reclamation of pages obsoleted by write transactions.  This is
attractive for a KDB, as it means "kdb5_util dump" can take a snapshot
of the database without blocking password changes or administrative
operations.  (The DB2 module allows this with the "unlockiter" DB
option, but that option carries a noticeable performance penalty, causes
kdb5_util dump to write something which isn't exactly a snapshot, and is
probably open to rare edge cases where an admin deletes a principal
entry right as it's being iterated through.)

"kdb5_util load" is our one transactional write operation.  It calls
krb5_db_create() with the "temporary" DB option, puts principal and
policy entries, and then calls krb5_db_promote() to make the new KDB
visible.  The DB2 module handles this by creating side databases and
lockfiles with a "~" extension, and then renaming them into place.  For
this to work, each kdb_db2 operation needs to close and reopen the
database.

The three lockout fields of principal entries (last_success,
last_failed, and fail_auth_count) add additional complexity.  These
fields are updated by the KDC by default, and are not replicated in an
iprop setup.  iprop loads include the "merge_nra" DB option when
creating the side database, indicating that existing principal entries
should retain their current lockout attribute values.

Here is my general design framework, taking the above into
consideration:

* We use two MDB environments, setting the MDB_NOSUBDIR flag so that
  each environment is a pair of files instead of a subdirectory:

  - A primary environment (suffix ".mdb") containing a "policy" database
    holding policy entries and a "principal" database holding principal
    entries minus lockout fields.

  - A secondary environment (suffix ".lockout.mdb") containing a
    "lockout" database holding principal lockout fields.

  The KDC only needs to write to the lockout environment, and can open
  the primary environment read-only.

  The lockout environment is never emptied, never iterated over, and
  uses only short-lived transactions, so the KDC is never blocked more
  than briefly.  (Opening these environments is sketched after this
  list.)

* For creations with the "temporary" DB option, instead of creating a
  side database, we open or create the usual environment files, begin a
  write transaction on the primary environment for the lifetime of the
  database context, and open and drop the principal and policy databases
  within that transaction.  put_principal and put_policy operations use
  the database context write transaction instead of creating short-lived
  ones.  When the database is promoted, we commit the write transaction
  and the load becomes visible.  (Sketched after this list.)

  To maintain the low-contention nature of the lockout environment, we
  compromise on the transactionality of load operations for the lockout
  fields.  We do not empty the lockout database on a load and we write
  entries to it as put_principal operations occur during the load.
  Therefore:

  - updates to the lockout fields become visible immediately (for
    existing principal entries), instead of at the end of the load.

  - updates to the lockout fields remain visible (for existing principal
    entries) if the load operation is aborted.

  - since we don't empty the lockout database, we leave garbage entries
    behind for old principals which have disappeared from the dump file
    we loaded.

  I don't anticipate any of those behaviors being noticeable in
  practice.  We could provide a tool to remove the garbage entries in
  the lockout database if it becomes an issue for anyone.

* For iprop loads, we set a context flag if we see the "merge_nra" DB
  option at creation time.  If the context flag is set, put_principal
  operations check for existing entries in the lockout database before
  writing, and do nothing if an entry is already there.

* To iterate over principals or policies, we create a read transaction
  in the primary MDB environment for the lifetime of the cursor.  By
  default, LMDB only allows one transaction per environment per thread.
  This would break "kdb5_util update_princ_encryption", which does
  put_principal operations during iteration.  Therefore, we must specify
  the MDB_NOTLS flag in the primary environment.

  The MDB_NOTLS flag carries a performance penalty for the creation of
  read transactions.  To mitigate this penalty, we can save a read
  transaction handle in the DB context for get operations, using
  mdb_txn_reset() and mdb_txn_renew() between operations (sketched after
  this list).

* The existing in-tree KDB modules allow simultaneous access to the same
  DB context by multiple threads, even though the KDC and kadmind are
  single-threaded and we don't allow krb5_context objects to be used by
  multiple threads simultaneously.  For the LMDB module, we will need to
  either synchronize the use of transaction handles, or document that it
  isn't thread-safe and will need mutexes added if it needs to be
  thread-safe in the future.

* LMDB files are capped at the memory map size, which is 10MB by
  default.  Heimdal exposes this as a configuration option and we should
  probably do the same; we might also want a larger default like 128MB.
  We will have to consider how to apply any default map size to the
  lockout environment as well as the primary environment.

* LMDB also has a configurable maximum number of readers.  The default
  of 126 is probably adequate for most deployments, but we again
  probably want a configuration option in case it needs to be raised.

* By default LMDB calls fsync() or fdatasync() for each committed write
  transaction.  This probably overshadows the performance benefits of
  LMDB versus DB2, in exchange for improved durability.  I think we will
  want to always set the MDB_NOSYNC flag for the lockout environment,
  and might need to add an option to set it for the primary environment.
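
As a rough illustration of the two-environment layout described above,
the environments might be opened roughly as follows.  This is only a
sketch: the helper, paths, and map size handling are hypothetical, while
MDB_NOSUBDIR, MDB_NOTLS, MDB_NOSYNC, and the mdb_env_* calls are the
real LMDB API.

    #include <lmdb.h>

    /* Hypothetical helper: open one environment as a pair of files
     * (MDB_NOSUBDIR).  extra_flags would be MDB_NOTLS for the primary
     * environment and MDB_NOSYNC for the lockout environment. */
    static int
    open_env(const char *path, size_t mapsize, unsigned int maxreaders,
             unsigned int extra_flags, MDB_env **env_out)
    {
        MDB_env *env;
        int ret;

        *env_out = NULL;
        ret = mdb_env_create(&env);
        if (ret)
            return ret;
        ret = mdb_env_set_mapsize(env, mapsize);
        if (!ret)
            ret = mdb_env_set_maxreaders(env, maxreaders);
        if (!ret)
            ret = mdb_env_set_maxdbs(env, 2);  /* "principal" and "policy",
                                                * or just "lockout" */
        if (!ret)
            ret = mdb_env_open(env, path, MDB_NOSUBDIR | extra_flags, 0600);
        if (ret) {
            mdb_env_close(env);
            return ret;
        }
        *env_out = env;
        return 0;
    }

    /* Illustrative usage; paths and sizes are not the module's actual
     * defaults:
     *   open_env("/var/krb5kdc/principal.mdb", 128 << 20, 126,
     *            MDB_NOTLS, &primary_env);
     *   open_env("/var/krb5kdc/principal.lockout.mdb", 128 << 20, 126,
     *            MDB_NOSYNC, &lockout_env);
     */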
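
For the "temporary" creation path, here is a hedged sketch of setting up
the long-lived write transaction.  The function shape is invented, but
mdb_dbi_open() and mdb_drop() (with del=0, meaning empty the database
but keep it) are the real calls.

    #include <lmdb.h>

    /* Sketch: begin the load's write transaction and empty the principal
     * and policy databases within it.  The transaction is kept in the DB
     * context and committed when krb5_db_promote() is called. */
    static int
    begin_load_txn(MDB_env *primary_env, MDB_txn **txn_out,
                   MDB_dbi *princ_dbi, MDB_dbi *policy_dbi)
    {
        MDB_txn *txn;
        int ret;

        ret = mdb_txn_begin(primary_env, NULL, 0, &txn);
        if (ret)
            return ret;
        ret = mdb_dbi_open(txn, "principal", MDB_CREATE, princ_dbi);
        if (!ret)
            ret = mdb_dbi_open(txn, "policy", MDB_CREATE, policy_dbi);
        if (!ret)
            ret = mdb_drop(txn, *princ_dbi, 0);   /* empty, don't delete */
        if (!ret)
            ret = mdb_drop(txn, *policy_dbi, 0);
        if (ret) {
            mdb_txn_abort(txn);
            return ret;
        }
        *txn_out = txn;
        return 0;
    }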
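
And a sketch of the cached read transaction mentioned under the
MDB_NOTLS item; the context structure is invented for illustration,
while mdb_txn_begin(), mdb_txn_renew(), mdb_txn_reset(), and mdb_get()
are the real calls.  Note that the value must be copied out before the
transaction is reset, since LMDB returns pointers into the memory map.

    #include <errno.h>
    #include <stdlib.h>
    #include <string.h>
    #include <lmdb.h>

    struct lmdb_ctx {               /* hypothetical module DB context */
        MDB_env *primary_env;
        MDB_txn *read_txn;          /* cached; reset between operations */
    };

    static int
    cached_get(struct lmdb_ctx *ctx, MDB_dbi dbi, MDB_val *key,
               void **data_out, size_t *len_out)
    {
        MDB_val val;
        void *copy = NULL;
        int ret;

        /* Renew the cached read transaction, or create it on first use. */
        if (ctx->read_txn == NULL)
            ret = mdb_txn_begin(ctx->primary_env, NULL, MDB_RDONLY,
                                &ctx->read_txn);
        else
            ret = mdb_txn_renew(ctx->read_txn);
        if (ret)
            return ret;

        ret = mdb_get(ctx->read_txn, dbi, key, &val);
        if (!ret) {
            /* Copy out before resetting; the mapped data is only valid
             * while the snapshot is live. */
            copy = malloc(val.mv_size);
            if (copy == NULL)
                ret = ENOMEM;
            else
                memcpy(copy, val.mv_data, val.mv_size);
        }

        /* Release the snapshot but keep the handle for the next call. */
        mdb_txn_reset(ctx->read_txn);

        if (ret)
            return ret;
        *data_out = copy;
        *len_out = val.mv_size;
        return 0;
    }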

Re: LMDB KDB module design notes

Nathaniel McCallum
This seems reasonable. I'm glad to see MIT considering LMDB (my
experiences with it are positive).

Re: LMDB KDB module design notes

Robbie Harwood
In reply to this post by Greg Hudson
Greg Hudson <[hidden email]> writes:

> Here is my general design framework, taking the above into
> consideration:
>
> * We use two MDB environments, setting the MDB_NOSUBDIR flag so that
> each environment is a pair of files instead of a subdirectory:
>
>   - A primary environment (suffix ".mdb") containing a "policy"
>   database holding policy entries and a "principal" database holding
>   principal entries minus lockout fields.
>
>   - A secondary environment (suffix ".lockout.mdb") containing a
>   "lockout" database holding principal lockout fields.
>
>   The KDC only needs to write to the lockout environment, and can open
>   the primary environment read-only.
>
>   The lockout environment is never emptied, never iterated over, and
>   uses only short-lived transactions, so the KDC is never blocked more
>   than briefly.

Overall this design seems good to me.

It's hard to tell from the docs - is there a disadvantage to
MDB_NOSUBDIR?  It seems weird to have it as an option but not the
default.

> * For creations with the "temporary" DB option, instead of creating a
> side database, we open or create the usual environment files, begin a
> write transaction on the primary environment for the lifetime of the
> database context, and open and drop the principal and policy databases
> within that transaction.  put_principal and put_policy operations use
> the database context write transaction instead of creating short-lived
> ones.  When the database is promoted, we commit the write transaction
> and the load becomes visible.
>
>   To maintain the low-contention nature of the lockout environment, we
>   compromise on the transactionality of load operations for the
>   lockout fields.  We do not empty the lockout database on a load and
>   we write entries to it as put_principal operations occur during the
>   load.  Therefore:
>
>   - updates to the lockout fields become visible immediately (for
>   existing principal entries), instead of at the end of the load.
>
>   - updates to the lockout fields remain visible (for existing
>   principal entries) if the load operation is aborted.
>
>   - since we don't empty the lockout database, we leave garbage
>   entries behind for old principals which have disappeared from the
>   dump file we loaded.
>
>   I don't anticipate any of those behaviors being noticeable in
>   practice.  We could provide a tool to remove the garbage entries in
>   the lockout database if it becomes an issue for anyone.

The size is capped at the number of principals that have ever existed,
right?  I'm also not worried about it then.

> * The existing in-tree KDB modules allow simultaneous access to the same
>   DB context by multiple threads, even though the KDC and kadmind are
>   single-threaded and we don't allow krb5_context objects to be used by
>   multiple threads simultaneously.  For the LMDB module, we will need to
>   either synchronize the use of transaction handles, or document that it
>   isn't thread-safe and will need mutexes added if it needs to be
>   thread-safe in the future.



> * LMDB files are capped at the memory map size, which is 10MB by
> default.  Heimdal exposes this as a configuration option and we should
> probably do the same; we might also want a larger default like 128MB.
> We will have to consider how to apply any default map size to the
> lockout environment as well as the primary environment.

What will the failure modes look like on this?  Does LMDB return useful
information around the caps?

> * LMDB also has a configurable maximum number of readers.  The default
> of 126 is probably adequate for most deployments, but we again
> probably want a configuration option in case it needs to be raised.

Agreed.

> * By default LMDB calls fsync() or fdatasync() for each committed
> write transaction.  This probably overshadows the performance benefits
> of LMDB versus DB2, in exchange for improved durability.  I think we
> will want to always set the MDB_NOSYNC flag for the lockout
> environment, and might need to add an option to set it for the primary
> environment.

Agreed.  Primary will be needed, even if only for testing.

Thanks,
--Robbie

Re: LMDB KDB module design notes

Simo Sorce
In reply to this post by Greg Hudson
On Mon, 2018-04-09 at 10:45 -0400, Greg Hudson wrote:

> * We use two MDB environments, setting the MDB_NOSUBDIR flag so that
>   each environment is a pair of files instead of a subdirectory:
>
>   - A primary environment (suffix ".mdb") containing a "policy" database
>     holding policy entries and a "principal" database holding principal
>     entries minus lockout fields.
>
>   - A secondary environment (suffix ".lockout.mdb") containing a
>     "lockout" database holding principal lockout fields.
>
>   The KDC only needs to write to the lockout environment, and can open
>   the primary environment read-only.
>
>   The lockout environment is never emptied, never iterated over, and
>   uses only short-lived transactions, so the KDC is never blocked more
>   than briefly.

I am not a fan of setups that use multiple files for databases, especially when
transactions need to span multiple ones.
What is the underlying reason to do this in the new design instead of using a
single database file with all the data?

--
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc

Re: LMDB KDB module design notes

Greg Hudson
On 04/12/2018 08:03 AM, Simo Sorce wrote:
>>   The lockout environment is never emptied, never iterated over, and
>>   uses only short-lived transactions, so the KDC is never blocked more
>>   than briefly.
>
> I am not a fan of setups that use multiple files for databases, especially when
> transactions need to span multiple ones.
> What is the underlying reason to do this in the new design instead of using a
> single database file with all the data?

Transactions are per-environment, so if we use one database file and a
write transaction for loads, loads would block the KDC.  That's worse
than what we have with DB2.

Alternatively we could load into a temporary database file and rename it
into place like we do with DB2.  But we would then have to close and
reopen the database between operations like we do with DB2, or somehow
signal processes that have the database open to reopen it after a load
completes.

Re: LMDB KDB module design notes

Simo Sorce
On Thu, 2018-04-12 at 10:59 -0400, Greg Hudson wrote:

> On 04/12/2018 08:03 AM, Simo Sorce wrote:
> > >   The lockout environment is never emptied, never iterated over, and
> > >   uses only short-lived transactions, so the KDC is never blocked more
> > >   than briefly.
> >
> > I am not a fan of setups that use multiple files for databases, especially when
> > transactions need to span multiple ones.
> > What is the underlying reason to do this in the new design instead of using a
> > single database file with all the data?
>
> Transactions are per-environment, so if we use one database file and a
> write transaction for loads, loads would block the KDC.  That's worse
> than what we have with DB2.
>
> Alternatively we could load into a temporary database file and rename it
> into place like we do with DB2.  But we would then have to close and
> reopen the database between operations like we do with DB2, or somehow
> signal processes that have the database open to reopen it after a load
> completes.

How common are loads?
As far as I know LMDB will let you keep reading during a transaction, so
the KDC would block only if there are write operations, but won't block
in general, right?

The only write operations are on AS requests, when lockout is enabled
and when that triggers a change in the lockout fields. How common is that?
Would that be something that can be mitigated by deferring those writes
during transactions?

Simo.

--
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc

Re: LMDB KDB module design notes

Greg Hudson
On 04/12/2018 11:56 AM, Simo Sorce wrote:

>> Transactions are per-environment, so if we use one database file and a
>> write transaction for loads, loads would block the KDC.  That's worse
>> than what we have with DB2.
>>
>> Alternatively we could load into a temporary database file and rename it
>> into place like we do with DB2.  But we would then have to close and
>> reopen the database between operations like we do with DB2, or somehow
>> signal processes that have the database open to reopen it after a load
>> completes.
>
> How common are loads?

That's hard to predict, but for a large database, having the KDC block
for the lifetime of a load operation seems like a pretty noticeable problem.

> As far as I know LMDB will let you keep reading during a transaction, so
> the KDC would block only if there are write operations, but won't block
> in general, right?

Yes.

> The only write operations are on AS requests, when lockout is enabled
> and when that triggers a change in the lockout fields. How common is that?

By default, every successful AS request on a principal requiring preauth
updates the last_success timestamp.  If disable_last_success is set (but
disable_lockout is not), only failed AS requests would cause a KDC write.

> Would that be something that can be mitigated by deferring those writes
> during transactions?

I don't see a way in LMDB to check for a write transaction, or begin a
write transaction without blocking.  Queueing those writes to be
performed later would also add a lot of complexity.

Re: LMDB KDB module design notes

Andrew Bartlett
In reply to this post by Simo Sorce
On Thu, 2018-04-12 at 08:03 -0400, Simo Sorce wrote:
> I am not a fan of setups that use multiple files for databases, especially when
> transactions need to span multiple ones.
> What is the underlying reason to do this in the new design instead of using a
> single database file with all the data?

I just lurk here, but I have to agree with Simo here from Samba
experience.  Be very careful about lock ordering between multiple
databases.  

Andrew Bartlett

--
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba

Re: LMDB KDB module design notes

Greg Hudson
In reply to this post by Robbie Harwood
I have prototype code for this design which passes the test suite
(temporarily modified to create LMDB KDBs for Python and tests/dejagnu,
and with a few BDB-specific tests skipped).  I'm working on polishing it
and adding documentation and proper tests.

On 04/10/2018 12:47 PM, Robbie Harwood wrote:
> It's hard to tell from the docs - is there a disadvantage to
> MDB_NOSUBDIR?  It seems weird to have it as an option but not the
> default.

LMDB uses two files per database.  By default, they have the suffixes
"/data.mdb" and "/lock.mdb"; with MDB_NOSUBDIR, they have the suffixes
"" and "-lock".  The default has the advantage that it uses exactly the
directory entry given to it and no others, though it is up to the
consumer to create the directory.

From our perspective, the main drawback of MDB_NOSUBDIR is that our
destroy method needs to just know about the MDB_NOSUBDIR suffixes in
order to clean up the files.  If we used the default, we could nuke the
directory (annoyingly hard to do in C) with no special knowledge.
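
A minimal sketch of that cleanup, assuming the environment path is the
".mdb" file itself; the "-lock" suffix is what LMDB appends under
MDB_NOSUBDIR, and error handling is omitted:

    #include <stdio.h>
    #include <unistd.h>

    /* Sketch: remove the two files MDB_NOSUBDIR creates for one
     * environment. */
    static void
    destroy_env_files(const char *path)
    {
        char lockpath[4096];

        snprintf(lockpath, sizeof(lockpath), "%s-lock", path);
        (void)unlink(path);
        (void)unlink(lockpath);
    }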

>> * LMDB files are capped at the memory map size, which is 10MB by
>> default.  Heimdal exposes this as a configuration option and we should
>> probably do the same; we might also want a larger default like 128MB.
>> We will have to consider how to apply any default map size to the
>> lockout environment as well as the primary environment.
>
> What will the failure modes look like on this?  Does LMDB return useful
> information around the caps?

With my prototype code, an admin would see something like:

add_principal: LMDB write failure (path: /me/krb5/build/testdir/db.mdb):
MDB_MAP_FULL: Environment mapsize limit reached while creating
"[hidden email]".

where the "MDB_MAP_FULL...reached" part comes from mdb_strerror().  We
could intercept MDB_MAP_FULL and say something else there.

I measured that each principal entry takes about 430 bytes in the main
environment (with the default of AES-128 and AES-256 keys, and a name
length of about 22 bytes) and about 100 bytes in the lockout
environment.  With these lengths, a 128MB map size for the main
environment would accommodate around 300K principal entries.  The LMDB
default of 10MB would accommodate around 25K entries.

>> * By default LMDB calls fsync() or fdatasync() for each committed
>> write transaction.  This probably overshadows the performance benefits
>> of LMDB versus DB2, in exchange for improved durability.  I think we
>> will want to always set the MDB_NOSYNC flag for the lockout
>> environment, and might need to add an option to set it for the primary
>> environment.
>
> Agreed.  Primary will be needed, even if only for testing.

I haven't added a nosync option to my prototype code yet, and the test
suite didn't seem painfully slow using LMDB.  But I will likely add it
and use it for testing anyway.

Without adding another message to the thread, I will address Andrew
Bartlett's concern about locking here:

> I just lurk here, but I have to agree with Simo here from Samba
> experience.  Be very careful about lock ordering between multiple
> databases.  

In this design, transactions on the lockout environment are all
ephemeral, consisting of at most one get and one put.  There is no
iteration over it and no need to consult the primary environment during
a lockout transaction.  So I don't think deadlock is a concern.
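
For illustration, an ephemeral lockout write of that shape might look
roughly like this; the key/value encoding and the skip_if_present flag
(standing in for the merge_nra behavior described earlier) are
hypothetical, while the LMDB calls are real:

    #include <lmdb.h>

    /* Sketch: update one principal's lockout record in a short-lived
     * write transaction: at most one get and one put, then commit. */
    static int
    put_lockout_record(MDB_env *lockout_env, MDB_dbi dbi, MDB_val *key,
                       MDB_val *val, int skip_if_present)
    {
        MDB_txn *txn;
        MDB_val existing;
        int ret;

        ret = mdb_txn_begin(lockout_env, NULL, 0, &txn);
        if (ret)
            return ret;

        /* For merge_nra-style loads, leave any existing record alone. */
        if (skip_if_present && mdb_get(txn, dbi, key, &existing) == 0) {
            mdb_txn_abort(txn);
            return 0;
        }

        ret = mdb_put(txn, dbi, key, val, 0);
        if (ret) {
            mdb_txn_abort(txn);
            return ret;
        }
        return mdb_txn_commit(txn);
    }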

If we ever supply a tool to collect garbage entries in the lockout
database, that tool would hold open a read transaction to iterate over
the lockout DB and do gets (to test existence) on the primary
environment as it went.  But since read transactions don't block other
transactions in LMDB, there is still no deadlock risk.

Re: LMDB KDB module design notes

Nathaniel McCallum
In reply to this post by Greg Hudson
On Thu, Apr 12, 2018 at 12:34 PM, Greg Hudson <[hidden email]> wrote:
> On 04/12/2018 11:56 AM, Simo Sorce wrote:
>>> Transactions are per-environment, so if we use one database file and a
>>> write transaction for loads, loads would block the KDC.  That's worse
>>> than what we have with DB2.

Could loads be segmented into chunks to avoid blocking the KDC for the
entire operation?

>>> Alternatively we could load into a temporary database file and rename it
>>> into place like we do with DB2.  But we would then have to close and
>>> reopen the database between operations like we do with DB2, or somehow
>>> signal processes that have the database open to reopen it after a load
>>> completes.
>>
>> How common are loads?
>
> That's hard to predict, but for a large database, having the KDC block
> for the lifetime of a load operation seems like a pretty noticeable problem.
>
>> As far as I know LMDB will let you keep reading during a transaction, so
>> the KDC would block only if there are write operations, but won't block
>> in general, right?
>
> Yes.
>
>> The only write operations are on AS requests, when lockout is enabled
>> and when that triggers a change in the lockout fields. How common is that?
>
> By default, every successful AS request on a principal requiring preauth
> updates the last_success timestamp.  If disable_last_success is set (but
> disable_lockout is not), only failed AS requests would cause a KDC write.

Could you create an opportunistic write queue for these? This would
unblock the KDC during loads. The cost would be that the
aforementioned writes would not occur until the ends of the loads.
Slightly stale data is probably not a big deal for (at least)
last_success.

>> Would that be something that can be mitigated by deferring those writes
>> during transactions?
>
> I don't see a way in LMDB to check for a write transaction, or begin a
> write transaction without blocking.  Queueing those writes to be
> performed later would also add a lot of complexity.

Re: LMDB KDB module design notes

Greg Hudson
On 04/15/2018 04:51 PM, Nathaniel McCallum wrote:
> On Thu, Apr 12, 2018 at 12:34 PM, Greg Hudson <[hidden email]> wrote:
>> On 04/12/2018 11:56 AM, Simo Sorce wrote:
>>>> Transactions are per-environment, so if we use one database file and a
>>>> write transaction for loads, loads would block the KDC.  That's worse
>>>> than what we have with DB2.
>
> Could loads be segmented into chunks to avoid blocking the KDC for the
> entire operation?

I don't believe so.  If loads were purely additive, sure, but they also
delete the entries not present in the file.

> Could you create an opportunistic write queue for these? This would
> unblock the KDC during loads. The cost would be that the
> aforementioned writes would not occur until the ends of the loads.
> Slightly stale data is probably not a big deal for (at least)
> last_success.

I would like to see an objection more substantial than "I don't like
having multiple DB files" before thinking about adding something as
complicated as a KDC database write queue.  We've been living with
multiple DB files in the DB2 back end (for principals and policies) for
the lifetime of the project.