After reading the responses, I decided the best way to repair the wtmpx
file was to write a (simple) Perl script to read in the records, cut out
the garbage (only 164 bytes of crap were screwing everything up), and
close the space back up (I should be a surgeon).  This worked, and I also
found out lots of interesting stuff about the files in question.  Thank
you to these people, who gave me sample scripts and a good history of the
[uw]tmpx? files:
BILLY <billy@student.adelaide.edu.au>
Jean-Philippe.LEROY@st.com
Jim Harmon <jharmon@telecnnct.com>
Chris_Marble@hmc.edu
jsdy@cais.com
Aleksandar Milivojevic <alex@srce.hr>
"Karl E. Vogel" <vogelke@c17mis.region2.wpafb.af.mil>
Original question:
====================
In short, can anyone tell me all they know about the utmp, wtmp, utmpx,
and wtmpx files?
I have read the man pages for [uw]tmpx? and fwtmp, know how to truncate
them and know how to rotate them, and realize that the "x" files are
extended versions of the non-"x" files, but I wonder:
why are all four files necessary, i.e. why is all accounting info not
kept in one huge file?  Is this so older programs can still read the old
format of the utmp and wtmp?
why is there both a U-tmp(x) and a W-tmp(x) (emphasis on first letter)?
Again, I am curious why four files are needed.
what function each serves (sure, they help commands like who and write,
but more specifically, which commands rely on which files, and why)
what is the history behind the files (since the x files are "extensions",
I assume that at some point there were only the utmp and wtmp)
is there an equivalent to fwtmp that can read the wtmpx file and write
it out in ASCII so I can try to repair my wtmpx file?  This is the real
reason for this message: my wtmpx file is messed up somehow, because a
"last" command only lists people up to Dec 6.  It looks as though no one
has logged in since then.
I could just truncate the file and get on with life, but I need to keep
the information intact (I do analysis on the connections to this
machine).  The file is still growing, so the new logins are still getting
written, but there must be a bad spot in the file that "last"
chokes on.  I might eventually write a C program to do what I want, but
I wanted to understand the history, structure, and uses of these files
first (also, maybe there is already a program out there).  I will check if
there are any good backups of the wtmpx file from after Dec 6 (maybe it
just recently got hosed), but I would still like to know this stuff just
to be more educated.
Thank you.
====================
RESPONSES:
----------
=> why is there both a U-tmp(x) and a W-tmp(x)(emphasis on first letter)?
=> what function each serves(sure they help commands like who and write,
=> but more specifically what commands rely on which files and why)
utmp(x) contains the current state, and is used by things like finger(1),
write(1) and who(1)
wtmp(x) contains the login history, and is used by things like last(1)
=> is there an equivalent to fwtmp that can read the wtmpx file and write
=> it out in ascii so I can try to repair my wtmpx file?
not that i know of... but [uw]tmp(x) manipulators are easy to write... in
perl, you'd want something like this to read [uw]tmpx:
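open(UTMPX, "/var/adm/utmpx")             # or whatever
    or die "can't open utmpx: $!";
binmode(UTMPX);                            # the file is binary
# 372 is the record size this template assumes; compare utmpx.h
while (read(UTMPX, $utmpx, 372) == 372) {  # stop at a short final record
    ($user, $id, $line, $pid, $type, $exit_1, $exit_2, $tv_1, $tv_2,
     $session, $pad_1, $pad_2, $pad_3, $pad_4, $pad_5, $syslen, $host)
        = unpack('A32 A4 A32 l s ss xx ll l lllll s A257', $utmpx);
    # do stuff here
}
close(UTMPX);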
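For comparison, here is a minimal Python sketch of the same decoding
step.  It assumes the layout implied by the Perl template above: a
372-byte, big-endian (SPARC) record with 32-bit fields.  The names UTMPX,
cstr, parse_record, and read_records are made up for illustration; check
utmpx.h on your own system before trusting the offsets.

```python
import struct

# Assumed layout: 32-bit big-endian struct utmpx, 372 bytes per record.
# Mirrors the Perl template 'A32 A4 A32 l s ss xx ll l lllll s A257',
# plus one trailing pad byte to reach 372.
UTMPX = struct.Struct(">32s4s32sihhh2xiii5ih257sx")

def cstr(raw):
    """Return the text before the first NUL byte."""
    return raw.split(b"\0", 1)[0].decode("latin-1")

def parse_record(buf):
    """Decode one raw record into a dict of the interesting fields."""
    f = UTMPX.unpack(buf)
    return {
        "user": cstr(f[0]), "id": cstr(f[1]), "line": cstr(f[2]),
        "pid": f[3], "type": f[4],
        "exit": (f[5], f[6]),   # the two shorts of the exit status
        "time": f[7],           # seconds; f[8] holds the microseconds
        "session": f[9],        # f[10:15] is the pad array
        "syslen": f[15],
        "host": cstr(f[16]),
    }

def read_records(path):
    """Yield decoded records until a short (partial) read."""
    with open(path, "rb") as fh:
        while True:
            buf = fh.read(UTMPX.size)
            if len(buf) < UTMPX.size:
                break
            yield parse_record(buf)
```

Something like "for rec in read_records('/var/adm/wtmpx'): print(rec)"
would then dump the file in readable form, under the same assumptions.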
have a peek through /usr/include/utmp.h and utmpx.h to get an idea of the
structures and functions available... if it helps, i can send you a perl
hack i wrote (from which i pulled the code above) that basically
duplicates "finger|sort"...
----------
From a Unix administration book (accounting chapter):
"First of all, utmp is created by the init daemon when it runs for the
first time. wtmp must be created by the administrator. Each record is
written in utmp by a terminal: for example, login writes the user name and
remote node (if any) and the connection time. When the connection ends,
the init process will clean this information. So the file size is more or
less stable and proportional to the number of terminals. The records are
similar in wtmp, but it will contain two records per session: one for the
beginning and one for the end date. This file needs to be cleaned
periodically based on the number of connections (number of terminals and
users)..."
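The two-records-per-session layout described in the quote is what lets a
tool reconstruct login sessions from wtmp.  A rough Python sketch of that
pairing follows.  It assumes the records have already been decoded into
(ut_type, line, user, seconds) tuples; the two ut_type constants are the
values defined in utmp.h, and the function name sessions is hypothetical.
A real tool would also have to handle reboots and time changes, which
this sketch ignores.

```python
USER_PROCESS = 7   # ut_type values from <utmp.h>
DEAD_PROCESS = 8

def sessions(records):
    """Pair each login with the next logout on the same line.

    records: iterable of (ut_type, line, user, seconds) in file order.
    Returns a list of (user, line, login_time, logout_time) tuples;
    logout_time is None for sessions still open at the end of the file.
    """
    open_by_line = {}
    done = []
    for ut_type, line, user, when in records:
        if ut_type == USER_PROCESS:
            # A new login replaces any stale entry for this line.
            open_by_line[line] = (user, when)
        elif ut_type == DEAD_PROCESS and line in open_by_line:
            who, start = open_by_line.pop(line)
            done.append((who, line, start, when))
    # Whatever is left never saw a logout record.
    done.extend((u, l, s, None) for l, (u, s) in open_by_line.items())
    return done
```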
To clean wtmp you just need to "cp /dev/null /var/adm/wtmp".
----------
There's an administrative command called "wtmpfix" that will probably do
what you're looking for.
Look for it in the (1M) section of the AnswerBook man pages.
It should be in the /usr/lib/acct directory.
----------
We wrote a program here to read in and trim the files as desired.
We didn't want to simply truncate but retain the last login date
for each user, no matter how old.  Our program's written in Perl
and should be readable and modifiable.  Hope it helps.
[http://www3.hmc.edu/docs/coolstuff/wtmpx]
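The script at that URL is not reproduced here.  As a sketch of the idea
only (trim old records but keep each user's last login), something like
the following Python would do.  It assumes records already decoded into
(ut_type, user, seconds) tuples in time order; trim and cutoff are
hypothetical names, and 7 is the USER_PROCESS value from utmp.h.

```python
USER_PROCESS = 7   # ut_type for a login record, from <utmp.h>

def trim(records, cutoff):
    """Keep records at or after cutoff, plus each user's latest login.

    records: list of (ut_type, user, seconds) tuples in file order.
    """
    last_login = {}  # user -> index of that user's most recent login
    for i, (ut_type, user, when) in enumerate(records):
        if ut_type == USER_PROCESS:
            last_login[user] = i
    keep = set(last_login.values())
    return [rec for i, rec in enumerate(records)
            if rec[2] >= cutoff or i in keep]
```

The output preserves file order, so it can be written straight back out
as a smaller file.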
----------
As you clearly have deduced, "tradition" accounts for a lot of this.
In the beginning, there were just the utmp and wtmp files, in /etc/.
The utmp file, as now, contains structures for those who are currently
logged in.  With the introduction of System V, certain other processes
logged themselves into the utmp file [notably 'init'], and the locations
of terminal lines became fixed in the file - no longer would a new login
just insert itself in the first empty slot.  This meant, too, that many
programs began to depend on the format of utmp and wtmp.
Meanwhile, again as from the beginning, "wtmp" was just the
concatenation of 'utmp' structures to indicate when users had logged in
and out and when other system events (notably time changes and reboots)
had happened.  No attempt was made to verify whether the file was intact
before appending another 'wtmp' record.  This is of especial importance
to you, as we will see.
But along came networked logins, X-windows sessions, and other things
that needed to be logged along with a 'utmp'/'wtmp' entry.  Different
groups have reacted to this in different ways.  Sun decided to add the
utmpx and wtmpx files.  Some of the information is mirrored; but the
string lengths are notably longer.  Other information is added, and
other information is omitted.
So, now, when a Sun program needs to get all of the information for a
given current login, it looks in both the utmp and the utmpx files.  For
historical information, it looks in both the wtmp and wtmpx files.
I've had the problem you describe, when a 'wtmp' structure was partially
written to the "wtmp" file just when the machine went down.  You need to
re-synch the file, by reading as many good records as you can, skipping
over the bad record, and repeating.  You have a particular problem with
the Sun solution, in that you might want to maintain consistency between
"wtmp" and "wtmpx".  I did this by doing a 'who wtmp', 'dd'ing the
appropriate number of records, using 'dd' again to skip over the mangled
record, etc.  This may or may not be more onerous when synchronizing
with the 'utmpx' structures in "wtmpx".
----------
Sometimes after a crash you'll get a messed-up wtmpx file (because it was
not cleanly closed).  If you look at wtmpx, you'll see that there is some
garbage (usually lots of zeros) that confuses commands like last.
Since wtmpx is a binary file, it will be hard to repair it by hand.
But you can write a small program (similar to last) that will read the
file and ignore errors in it.
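A minimal Python sketch of such a program follows.  It is not the script
anyone on the list actually used.  It assumes 372-byte big-endian
records with ut_type at offset 72 and the timestamp seconds at offset 80
(verify against utmpx.h), and it treats a record as good when ut_type is
between 1 and 9 and the timestamp falls inside a window you supply.  The
names plausible and repair are hypothetical, and the check is only a
heuristic: a tight time window makes false matches unlikely, not
impossible.

```python
import struct

RECLEN = 372                 # assumed size of one utmpx record
TYPE_OFF, TIME_OFF = 72, 80  # assumed offsets of ut_type and tv_sec

def plausible(rec, lo, hi):
    """Heuristic: ut_type in 1..9 and a timestamp inside [lo, hi]."""
    (ut_type,) = struct.unpack_from(">h", rec, TYPE_OFF)
    (secs,) = struct.unpack_from(">i", rec, TIME_OFF)
    return 1 <= ut_type <= 9 and lo <= secs <= hi

def repair(data, lo, hi):
    """Return (clean_bytes, dropped_count).

    Good records are copied through whole.  When a record looks bad,
    slide forward one byte at a time until a plausible record starts.
    """
    out = bytearray()
    pos = dropped = 0
    while pos + RECLEN <= len(data):
        rec = data[pos:pos + RECLEN]
        if plausible(rec, lo, hi):
            out += rec
            pos += RECLEN
        else:
            pos += 1
            dropped += 1
    dropped += len(data) - pos   # trailing partial record, if any
    return bytes(out), dropped
```

Work on a copy of the file, and compare the dropped count with what you
see in a hex dump before replacing the original.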
----------
Is /usr/lib/utmpd running?  That should be started in
/etc/rc2.d/S88utmpd.
It's supposed to correct distortions in the utmp and utmpx files, but
it can misbehave.  The current version of utmpd seems to work quite
well as long as the defaults are set properly in /etc/default/utmpd:
        SCAN_PERIOD=300
        MAX_FDS = 0
These values come from
        http://remus.rutgers.edu/~adrian/solaris/problems.html
We use the values
        SCAN_PERIOD=30
        MAX_FDS = 3
If none of this helps, try modifying the S88utmpd script to remove the
utmp and utmpx files from /var/adm, and then create new ones before
starting utmpd:
        rm /var/adm/utmp /var/adm/utmpx
        cp /dev/null /var/adm/utmp
        cp /dev/null /var/adm/utmpx
        chown root /var/adm/utmp*
        chgrp bin /var/adm/utmp*
        chmod 644 /var/adm/utmp*
----------