Berkeley DB error PANIC

Jacques Henry
Hello,

This morning this is what I've encountered on the KDC using kadmin:

/usr/sbin/kadmin -l add --random-password --use-defaults myuser
/var/lib/heimdal-kdc/heimdal.db page 81 is on free list with type 5
PANIC: Invalid argument
PANIC: fatal region error detected; run recovery
PANIC: fatal region error detected; run recovery
PANIC: fatal region error detected; run recovery
kadmin: kadm5_create_principal: No such file or directory
kadmin: adding myuser: No such file or directory


Listing the principals (list -s *) only showed me about 10% of what I had until then.
However, PKINIT still worked, even for users who didn't show up in the principal listing.

Fortunately I have a slave server. I was able to copy the heimdal.db file (I use the same master key) to the master and everything went back to normal... (well I think)

However, that's something that should never have happened... Why?

My setup is the following:
Debian 7.8 and heimdal-kdc 1.6~git20120403+dfsg1-2

Is there a way to prevent this? Or at least, can I monitor the database (using Nagios, a script, ...)? Unfortunately this error didn't show up in /var/log/heimdal-kdc.log...
I've made a copy of the "corrupted" heimdal.db file; can I "repair" it?

Thanks in advance.


Re: Berkeley DB error PANIC

Thomas M. Payerle-3
On Wed, 27 May 2015, Jacques Henry wrote:

> Hello,
>
> This morning this is what I've encountered on the KDC using kadmin:
>
> /usr/sbin/kadmin -l add --random-password --use-defaults myuser
> /var/lib/heimdal-kdc/heimdal.db page 81 is on free list with type 5
> PANIC: Invalid argument
> PANIC: fatal region error detected; run recovery
> PANIC: fatal region error detected; run recovery
> PANIC: fatal region error detected; run recovery
> kadmin: kadm5_create_principal: No such file or directory
> kadmin: adding myuser: No such file or directory
>
> Listing the principals (list -s *) only showed me about 10% of what I had until then.
> However, PKINIT still worked, even for users who didn't show up in the principal listing.
>
> Fortunately I have a slave server. I was able to copy the heimdal.db file (I use the same master key) to the master and everything went back to normal... (well I think)
>
> However, that's something that should never have happened... Why?
>
> My setup is the following:
> Debian 7.8 and heimdal-kdc 1.6~git20120403+dfsg1-2
>
> Is there a way to prevent this? Or at least, can I monitor the database (using Nagios, a script, ...)? Unfortunately this error didn't show up in /var/log/heimdal-kdc.log...
> I've made a copy of the "corrupted" heimdal.db file; can I "repair" it?

Not sure if/how to prevent this; presumably the root cause is a bug in some code
(heimdal or BerkeleyDB libs; are you at the latest BerkeleyDB version?).

We do the following to back up and monitor the integrity of our heimdal databases:
every hour on the master KDC, a cron job runs kadmin -l dump to write
a dumpfile.

Every day, in the middle of the night, we stop the heimdal service on the
master KDC and copy the database and database log files.  We then restart
Kerberos and initiate a backup.  We also run the BerkeleyDB db_verify utility
on the copy of the heimdal.db file, and send out an alert if it detects any issues.
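
A minimal sketch of the verification step described above, in Python. It assumes the Debian binary name db5.1_verify that appears later in this thread, and relies only on the documented behaviour that db_verify exits non-zero when verification fails. The function name and the scratch-copy approach are hypothetical, not taken from our actual scripts.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def verify_copy(db_path, verify_cmd=("db5.1_verify",)):
    """Copy db_path to a scratch directory and run the verifier on the copy.

    Returns (ok, output). The copy should be taken while the KDC is stopped,
    as described above; this function does not stop any service itself.
    """
    with tempfile.TemporaryDirectory() as scratch:
        copy = Path(scratch) / Path(db_path).name
        shutil.copy2(db_path, copy)
        # Assumption: the verifier takes the database file as its last argument
        # and signals failure through a non-zero exit status.
        result = subprocess.run(
            [*verify_cmd, str(copy)],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
    return result.returncode == 0, result.stdout
```

A nightly job could call verify_copy("/var/lib/heimdal-kdc/heimdal.db") and send the returned output in the alert whenever ok is False.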

We have also recently started (using kadmin -l) to collect a list of all principals,
kvnos, and last modification times on ALL the KDCs (master + slaves) in the middle
of the night, and to compare them, reporting any discrepancies.  (We recently discovered
some differences between the master and slave versions of the heimdal.db,
discussed in a thread a few weeks back.  They seem to be due to locking issues with the
heimdal DB (we are running a 1.5.2 release), and a poorly written local account activation
script that is tickling some of those.)
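
The comparison step can be sketched roughly as follows, in Python. This assumes the per-KDC listings have already been parsed into dictionaries mapping each principal to a (kvno, modification time) pair; how that data is extracted from kadmin -l is left out, and the function name is hypothetical.

```python
def compare_principals(master, slave):
    """Compare two {principal: (kvno, mtime)} mappings.

    Returns a list of human-readable discrepancy strings, ordered by
    principal name. An empty list means the two KDCs agree.
    """
    problems = []
    for name in sorted(master.keys() | slave.keys()):
        if name not in slave:
            problems.append(f"{name}: missing on slave")
        elif name not in master:
            problems.append(f"{name}: missing on master")
        elif master[name] != slave[name]:
            problems.append(
                f"{name}: master={master[name]} slave={slave[name]}"
            )
    return problems
```

With several slaves, the same function would simply be called once per slave against the master's mapping.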

You might be able to repair the "corrupted" heimdal.db file with standard Berkeley
DB tools (e.g. db_dump and db_load), but it may lose records.  Your copy from the slave
is probably more reliable, and of course any updates since the incident will only be in the latter.

In the past when db_verify noticed issues with the heimdal.db, I have had success
with doing a kadmin -l dump followed by a kadmin -l load; but in those cases we
were able to see all the principals (comparing to the principal list on the slaves,
etc).  As far as I can tell (from kadmin -l dumps afterwards), the result was the
same as a db_dump and db_load.


>
> Thanks in advance.
>
>
>

Tom Payerle
IT-ETI-EUS [hidden email]
University of Maryland (301) 405-6135
College Park, MD 20742-4111

Re: Berkeley DB error PANIC

Nico Williams
In reply to this post by Jacques Henry
On Wed, May 27, 2015 at 05:52:05PM +0200, Jacques Henry wrote:
> However, that's something that should never have happened... Why?

I don't know, but do run db_verify and db_recover -- what do they say?

Nico
--

Re: Berkeley DB error PANIC

Russ Allbery-2
In reply to this post by Jacques Henry
Jacques Henry <[hidden email]> writes:

> However, that's something that should never have happened... Why?

I think it happens because the version of Berkeley DB in use here is
ancient and not horribly well-written code.  It's very old, so most of the
bugs have been hammered out, but I've had Berkeley DB corrupt KDC
databases in the past.  It doesn't particularly surprise me when it
happens.

My past experience was that it was most likely to happen when making lots
of changes with kadmin -l very quickly.

> Is there a way to prevent this? Or at least, can I monitor the database
> (using Nagios, a script, ...)? Unfortunately this error didn't show up
> in /var/log/heimdal-kdc.log...

Run kadmin -l list \* and make sure it completes successfully and doesn't
either error out or go into an infinite loop.  It's a good Nagios probe.
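
A minimal sketch of such a probe, in Python, using the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name, the timeout, and the min_principals threshold are assumptions added for illustration; the threshold is there because the original report describes a listing that completed but returned only about 10% of the principals. It also assumes the listing prints one principal per line and that kadmin exits non-zero on failure, neither of which has been checked here.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_principal_list(cmd=("kadmin", "-l", "list", "*"),
                         min_principals=1, timeout=60):
    """Run the listing command and return (nagios_status, message)."""
    try:
        # No shell is involved, so "*" reaches kadmin unexpanded.
        result = subprocess.run(
            list(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Covers the "infinite loop" symptom: the listing never finishes.
        return CRITICAL, f"listing did not finish within {timeout}s"
    except OSError as exc:
        return UNKNOWN, f"could not run {cmd[0]}: {exc}"
    if result.returncode != 0:
        return CRITICAL, f"listing exited with status {result.returncode}"
    # Count non-empty output lines as principals.
    count = sum(1 for line in result.stdout.splitlines() if line.strip())
    if count < min_principals:
        return CRITICAL, (
            f"only {count} principals listed (expected >= {min_principals})"
        )
    return OK, f"{count} principals listed"
```

A plugin wrapper would print the message and exit with the returned status; min_principals could be set somewhat below the count seen on a known-good day.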

--
Russ Allbery ([hidden email])              <http://www.eyrie.org/~eagle/>

Re: Berkeley DB error PANIC

Nico Williams
On Wed, May 27, 2015 at 10:31:59PM -0700, Russ Allbery wrote:
> Jacques Henry <[hidden email]> writes:
>
> > However, that's something that should never have happened... Why?
>
> I think it happens because the version of Berkeley DB in use here is
> ancient and not horribly well-written code.  It's very old, so most of the
> bugs have been hammered out, but I've had Berkeley DB corrupt KDC
> databases in the past.  It doesn't particularly surprise me when it
> happens.

What version of Berkeley DB are you (Jacques) using?

Heimdal (in master anyways; haven't checked 1.5) supports multiple
versions of BDB, as well as SQLite3.

> My past experience was that it was most likely to happen when making lots
> of changes with kadmin -l very quickly.

Looking at libhdb and callers, I don't see a bug there causing Heimdal
to lose a lock and keep writing.

The SQLite3 backend does its own locking, so it should be safe even if
Heimdal did have a locking problem.

kadmin -l (in master) normally opens and closes the HDB around every
update, unless you lock the HDB, in which case it holds the HDB open and
locked for the duration.

FYI, I've got patches to fix the iprop log/HDB update races (which
cannot cause HDB corruption).  I just need to squash commits and get a
code review, which I'll ask for after one more round of testing with
valgrind.

Nico
--

Re: Berkeley DB error PANIC

Russ Allbery-2
Nico Williams <[hidden email]> writes:
> On Wed, May 27, 2015 at 10:31:59PM -0700, Russ Allbery wrote:

>> I think it happens because the version of Berkeley DB in use here is
>> ancient and not horribly well-written code.  It's very old, so most of
>> the bugs have been hammered out, but I've had Berkeley DB corrupt KDC
>> databases in the past.  It doesn't particularly surprise me when it
>> happens.

> What version of Berkeley DB are you (Jacques) using?

> Heimdal (in master anyways; haven't checked 1.5) supports multiple
> versions of BDB, as well as SQLite3.

Whatever you get by default if you just stand up a server using the Debian
packages, here.  Oh, huh.  For some reason, I thought that Heimdal shipped
its own old copy of the Berkeley DB 1.x code, but I must have confused
that with MIT?  So it would have been Berkeley DB 5.x in its old
compatibility mode.  I think?

--
Russ Allbery ([hidden email])              <http://www.eyrie.org/~eagle/>

Re: Berkeley DB error PANIC

Henry B Hotz
Yes, MIT does that.

On May 28, 2015, at 10:39 AM, Russ Allbery <[hidden email]> wrote:

> For some reason, I thought that Heimdal shipped
> its own old copy of the Berkeley DB 1.x code, but I must have confused
> that with MIT?

Personal email.  [hidden email]




Re: Berkeley DB error PANIC

Nico Williams
In reply to this post by Russ Allbery-2
On Thu, May 28, 2015 at 10:39:37AM -0700, Russ Allbery wrote:

> Nico Williams <[hidden email]> writes:
> > On Wed, May 27, 2015 at 10:31:59PM -0700, Russ Allbery wrote:
>
> >> I think it happens because the version of Berkeley DB in use here is
> >> ancient and not horribly well-written code.  It's very old, so most of
> >> the bugs have been hammered out, but I've had Berkeley DB corrupt KDC
> >> databases in the past.  It doesn't particularly surprise me when it
> >> happens.
>
> > What version of Berkeley DB are you (Jacques) using?
>
> > Heimdal (in master anyways; haven't checked 1.5) supports multiple
> > versions of BDB, as well as SQLite3.
>
> Whatever you get by default if you just stand up a server using the Debian
> packages, here.  Oh, huh.  For some reason, I thought that Heimdal shipped
> its own old copy of the Berkeley DB 1.x code, but I must have confused
> that with MIT?  So it would have been Berkeley DB 5.x in its old
> compatibility mode.  I think?

When I build from master I get the lib/hdb/db3.c backend using
libdb-5.1 (using just the db_create() entry point).  There's no "old
compatibility mode" that I can see.  It's db5 with the db3 source
compatible API.

MIT ships its own private, hacked-up copy of db 1.85, which they
call db2 (IIRC), though it isn't.

Nico
--

Re: Berkeley DB error PANIC

Quanah Gibson-Mount-3
--On Thursday, May 28, 2015 5:01 PM -0500 Nico Williams
<[hidden email]> wrote:

> When I build from master I get the lib/hdb/db3.c backend using
> libdb-5.1 (using just the db_create() entry point).  There's no "old
> compatibility mode" that I can see.  It's db5 with the db3 source
> compatible API.
>
> MIT ships its own private, hacked-up copy of db 1.85, which they
> call db2 (IIRC), though it isn't.

An alternative would be to use lmdb instead of bdb, since bdb's essentially
dead at this point for OSS projects.

--Quanah


--

Quanah Gibson-Mount
Platform Architect
Zimbra, Inc.
--------------------
Zimbra ::  the leader in open source messaging and collaboration

Re: Berkeley DB error PANIC

Russ Allbery-2
In reply to this post by Nico Williams
Nico Williams <[hidden email]> writes:

> When I build from master I get the lib/hdb/db3.c backend using libdb-5.1
> (using just the db_create() entry point).  There's no "old compatibility
> mode" that I can see.  It's db5 with the db3 source compatible API.

Ah, interesting.  Well, in that case, I have definitely had BerkeleyDB 5.x
corrupt my database.  Symptom was that kadmin -l would go into an infinite
loop reading the same entries over and over again.

--
Russ Allbery ([hidden email])              <http://www.eyrie.org/~eagle/>

Fwd: Berkeley DB error PANIC

Jacques Henry

Hello everyone,

First, many thanks to all of you who took the time to answer.

Regarding all your questions:

- Yes, I am using the Debian packages, which means libdb5.1 (5.1.29-5) in Wheezy. Would it make a difference to move to Jessie and its libdb5.3?
- @Russ, I think that may be what happened: I use kadmin -l a lot. I will add a short delay between calls; it won't harm the process ;)

Running db_verify gives me this:

db5.1_verify heimdal-kdc/heimdal.db
db5.1_verify: Page 17: incorrect next_pgno 104 found in leaf chain (should be 82)
db5.1_verify: Page 10: incorrect next_pgno 81 found in leaf chain (should be 104)
db5.1_verify: Page 104: incorrect prev_pgno 17 found in leaf chain (should be 10)
db5.1_verify: Page 0: non-invalid page 81 on free list
db5.1_verify: heimdal-kdc/heimdal.db: DB_VERIFY_BAD: Database verification failed
Verification of heimdal-kdc/heimdal.db failed.
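
For alerting, the per-page complaints in output like the above can be pulled out with a small parser. This is only a sketch in Python: the line format is inferred from this one sample of db5.1_verify output, and the names PAGE_LINE and page_findings are hypothetical.

```python
import re

# Matches lines such as "db5.1_verify: Page 17: incorrect next_pgno 104 ..."
PAGE_LINE = re.compile(r"^\S+: Page (\d+): (.+)$")


def page_findings(verify_output):
    """Return [(page_number, message), ...] for per-page complaints."""
    findings = []
    for line in verify_output.splitlines():
        match = PAGE_LINE.match(line.strip())
        if match:
            findings.append((int(match.group(1)), match.group(2)))
    return findings
```

Summary lines such as the final DB_VERIFY_BAD message are deliberately ignored; the exit status of db_verify remains the authoritative pass/fail signal.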


However, I'm not sure how to use db_recover. I tried this:

db5.1_recover -v -h heimdal-kdc
No log files found


Using strace, I see it wants to open heimdal-kdc/DB_CONFIG, heimdal-kdc/__db.rep.init and heimdal-kdc/__db.001.
I have that structure on my OpenLDAP server, but not here...

I will follow Thomas's and Russ's advice on regularly checking and backing up the database. Many thanks for sharing your experience.

Cheers


Re: Berkeley DB error PANIC

Jacques Henry
In reply to this post by Russ Allbery-2


> Run kadmin -l list \* and make sure it completes successfully and doesn't
> either error out or go into an infinite loop.  It's a good Nagios probe.


I was able to run kadmin -l list \* without it going into an infinite loop. It just showed about 10% of the principals, as if the others had been deleted.


Re: Berkeley DB error PANIC

Jacques Henry
In reply to this post by Nico Williams


> Heimdal (in master anyways; haven't checked 1.5) supports multiple
> versions of BDB, as well as SQLite3.


Indeed, according to the Heimdal blog (http://blog.h5l.org/), an SQLite3-based credential cache has been supported since 1.3.0.

On Debian libsqlite3-0 is a dependency of heimdal-kdc.

However, I didn't find any documentation/instructions on how to make Heimdal use SQLite3 instead of BDB...



Re: Berkeley DB error PANIC

Harald Barth-2

> However I didn't find any documentation/instructions on how to make Heimdal
> use SQLite3 instead of BDB....

Me neither, but my guess would be to try something in the config file
that looks like this and see if that results in what you want.

[kdc]

database = {
    dbname =  sqlite:/var/heimdal/db.sqlite3
}

Harald.




Re: Berkeley DB error PANIC

Anton Lundin
On 29 May, 2015 - Harald Barth wrote:

>
> > However I didn't find any documentation/instructions on how to make Heimdal
> > use SQLite3 instead of BDB....
>
> Me neither, but my guess would be to try something in the config file
> that looks like this and see if that results in what you want.
>
> [kdc]
>
> database = {
>     dbname =  sqlite:/var/heimdal/db.sqlite3
> }
>

A couple of years back, when I reworked a KDC setup, as far as I
remember, hprop or iprop didn't work with sqlite3 databases.

The whole sqlite3 database support felt kinda unfinished, but if it
worked I'd rather use that than BDB.


//Anton


--
Anton Lundin +46702-161604

Re: Berkeley DB error PANIC

Harald Barth-2

> A couple of years back when i reworked a kdc setup, as far as i
> remembered hprop or iprop didn't work with sqlite3 databases.

That would be rather "not useful".

Harald.

Re: Berkeley DB error PANIC

Jeffrey Hutzelman
On Mon, 2015-06-01 at 11:31 +0200, Harald Barth wrote:
> > A couple of years back when i reworked a kdc setup, as far as i
> > remembered hprop or iprop didn't work with sqlite3 databases.
>
> That would be rather "not useful".

iprop happens at a layer above hdb, and should work with any database
backend.


Re: Berkeley DB error PANIC

Nico Williams
On Mon, Jun 01, 2015 at 04:26:53PM -0400, Jeffrey Hutzelman wrote:
> On Mon, 2015-06-01 at 11:31 +0200, Harald Barth wrote:
> > > A couple of years back when i reworked a kdc setup, as far as i
> > > remembered hprop or iprop didn't work with sqlite3 databases.
> >
> > That would be rather "not useful".
>
> iprop happens at a layer above hdb, and should work with any database
> backend.

Right.  I'm quite curious about iprop or hprop not working with the
SQLite3 backend.  Right now I don't see how they wouldn't.

Nico
--

Re: Berkeley DB error PANIC

Nico Williams
In reply to this post by Jacques Henry
FYI, these are the kinds of warnings I see from valgrind about Berkeley
DB (5):

==28229== Conditional jump or move depends on uninitialised value(s)
==28229==    at 0x6F082E4: __bam_stkrel (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6EF9636: ??? (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6F8E9F3: __dbc_iput (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6F8BC0F: __db_put (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6F9E1EA: __db_put_pp (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x5469DF5: DB__put (db3.c:230)
==28229==    by 0x5469663: _hdb_store (common.c:351)
...

and

==28229== Syscall param pwrite64(buf) points to uninitialised byte(s)
==28229==    at 0x6211D03: __pwrite_nocancel (syscall-template.S:81)
==28229==    by 0x6FEFCD8: __os_io (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6FDD365: ??? (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6FDD650: __memp_bhwrite (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6FEC500: __memp_sync_int (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6F8C4DE: __db_sync (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6F8A117: __db_refresh (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6F8A3A5: __db_close (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6F99AD7: __db_close_pp (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x54697C7: DB_close (db3.c:72)
==28229==    by 0x504D4C4: kadm5_s_create_principal (create_s.c:224)
==28229==    by 0x4E3BF0A: kadm5_create_principal (common_glue.c:100)
==28229==    by 0x404550: add_one_principal (ank.c:150)
==28229==    by 0x404AAD: add_new_key (ank.c:259)
==28229==    by 0x40EB79: add_wrap (kadmin-commands.c:222)
==28229==    by 0x525DA65: sl_command (sl.c:212)
==28229==    by 0x409A6B: main (kadmin.c:279)
==28229==  Address 0x863f0ba is 122 bytes inside a block of size 4,184 alloc'd
==28229==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28229==    by 0x6FED4E4: __os_malloc (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6FBB96E: __env_alloc (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6FDBF6D: __memp_alloc (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6FDEF6F: __memp_fget (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6FA079C: __db_new (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6F0CFBF: __bam_split (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6EF9564: ??? (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6F8E9F3: __dbc_iput (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6F8BC0F: __db_put (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x6F9E1EA: __db_put_pp (in /usr/lib/x86_64-linux-gnu/libdb-5.1.so)
==28229==    by 0x5469DF5: DB__put (db3.c:230)
==28229==    by 0x5469663: _hdb_store (common.c:351)
...

among others.

I've not investigated the Berkeley DB source to see if these are
serious, and I don't think I will either.  Once the LMDB support is
fixed up I think we should switch to LMDB as the preferred backend and
kiss Berkeley DB goodbye.

Nico
--

Re: Berkeley DB error PANIC

Anton Lundin
In reply to this post by Nico Williams
On 01 June, 2015 - Nico Williams wrote:

> On Mon, Jun 01, 2015 at 04:26:53PM -0400, Jeffrey Hutzelman wrote:
> > On Mon, 2015-06-01 at 11:31 +0200, Harald Barth wrote:
> > > > A couple of years back when i reworked a kdc setup, as far as i
> > > > remembered hprop or iprop didn't work with sqlite3 databases.
> > >
> > > That would be rather "not useful".
> >
> > iprop happens at a layer above hdb, and should work with any database
> > backend.
>
> Right.  I'm quite curious about iprop or hprop not working with the
> SQLite3 backend.  Right now I don't see how they wouldn't.
>

Try setting it up and bits fly everywhere. I just re-tested, trying to
get check-iprop to actually run between two sqlite DBs, and as I
remembered, it didn't work:

error message: Matching credential (krbtgt/[hidden email]) not
found: -1765328243

and similar errors.

Try it and see for yourself.


//Anton


--
Anton Lundin +46702-161604