Alexander Talos
alexander.talos@univie.ac.at
Wed, 18 Dec 2002 17:51:51 +0100
Hej!
Vernon Schryver wrote:
> > 11/26/02 17:03:57.480496 2 4261 delayed eb6c8
> > Body 2 edd45fd7 29b59de1 7d28604f 86d74b55 23f59f6e0
> > 71450
> That's a seriously broken hash table link.
I wish to share my findings in this matter with you - it's fun.
I managed to figure out where this magic number 23f59f6e0 comes from. I
must add that almost the same broken address was found on all machines
where I did my tests; only the last two bytes vary somewhat from host
to host and from one compiler (or compiler variant) to another.
Consider the following program (derived from the code of db_link_rcd()
in db.c, where rcd_ck->prev is initialised to NULL):
#include <sys/types.h>
#include <stdio.h>

typedef u_int64_t DB_PTR;
typedef u_int32_t DB_PTR_C;
#define DB_PTR_MULT ((DB_PTR)12)
#define DB_PTR_CP(v) ((u_int32_t)((v) / DB_PTR_MULT))
#define DB_PTR_NULL 0

main()
{
    DB_PTR_C prev;
    u_int64_t prevex;

    /* If you wonder about the 1=1 - it's here to make sure I got the
       %lx vs. %llx right */
    printf("DB_PTR_NULL=%lx 1=%d\n", DB_PTR_NULL, 1);
    printf("DB_PTR_CP(DB_PTR_NULL)=%lx 1=%d\n", DB_PTR_CP(DB_PTR_NULL), 1);
    prev=DB_PTR_CP(DB_PTR_NULL);
    printf("prev=%lx 1=%d\n", prev, 1);
    prevex=DB_PTR_CP(DB_PTR_NULL);
    printf("prevex=%llx 1=%d\n", prevex, 1);
    prevex=((u_int64_t)0)/((u_int64_t)12);
    printf("((u_int64_t)0)/((u_int64_t)12)=%llx 1=%d\n", prevex, 1);
    prevex=0/12;
    printf("0/12=%llx 1=%d\n", prevex, 1);
}
We would expect a result of 0 every time. Actually:
mailbox$ xlc div.c -o div
mailbox$ ./div
DB_PTR_NULL=0 1=1
DB_PTR_CP(DB_PTR_NULL)=2ff22ab8 1=1
prev=2ff22ab8 1=1
prevex=2ff22ab8 1=1
((u_int64_t)0)/((u_int64_t)12)=82ff22ab8 1=1
0/12=0 1=1
mailbox$
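One way to take the format specifiers out of the picture entirely is to use the fixed-width types and PRIx64 macro from <inttypes.h>. The following is only a sketch and assumes a C99 compiler, which an old xlc may well not be. The helper names div_constant, div_runtime and report are made up for illustration, and the volatile operands are an added assumption meant to keep the second division from being folded at compile time, so the two cases can be compared.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

#define DB_PTR_MULT ((uint64_t)12)  /* same factor as in the program above */

/* Constant operands: the compiler is free to fold this at compile time. */
static uint64_t div_constant(void)
{
    return ((uint64_t)0) / DB_PTR_MULT;
}

/* volatile operands (an assumption of this sketch) force the 64-bit
   division to be carried out at run time. */
static uint64_t div_runtime(void)
{
    volatile uint64_t num = 0;
    volatile uint64_t den = 12;
    return num / den;
}

/* Print one labelled value with a format macro that matches uint64_t. */
static void report(const char *label, uint64_t v)
{
    printf("%s=%" PRIx64 "\n", label, v);
}
```

Calling report("constant", div_constant()) and report("runtime", div_runtime()) from main() should print 0 for both on a compiler that gets 64-bit division right.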
In my last test, the broken address was 23f59fcb0, its
DB_PTR_CP-compressed equivalent being 2ff22a64, which is only 0x54 below
the 2ff22ab8 printed above - so I think I have located the source of my
trouble.
I must confess that our xlc is not the most recent one (actually, I have
yet to find someone who can tell me *how* old it is); compiled with gcc,
the program above gives the correct results.
I will check again, but I'm pretty sure that I have already tested a
dccd compiled with gcc, and that it had the same corrupted
database.
When I have more details, especially regarding trouble or nontrouble
with gcc and how I managed to get dccd working on our machines (at least
I hope I will), I'll post again.
Best regards,
Alexander