SUMMARY: Which file this block belongs to

From: Alberto Ferrari (FERRARIA@mz.astra.com.ar)
Date: Thu Jul 31 1997 - 15:40:01 CDT


Many Thanks to:
1) Brad Young <bbyoung@amoco.com>
2) Wes Pfarner - 5736 <wrpfarn@csua35.sandia.gov>
3) Jim Harmon <jharmon@telecnnct.com>
4) "Rick von Richter" <rickv@mwh.com>
5) "Karl E. Vogel" <vogelke@c17.wpafb.af.mil>
6) Bismark Espinoza <bismark@alta.Jpl.Nasa.Gov>

Original Question:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
(SPARC20, Solaris 2.5.1; I already know the disk's partition structure.)
Given the problem I submitted in a former message (I'm getting warnings
about read errors from a disk), how can I know which file/device is
damaged/corrupted?
I mean, given a certain block identified by its number, how can I know
which filesystem entity it belongs to?

Answers 3), 4), 5), 6):
Thank you.
The methods you all describe lead me, at most, to the partition that
contains the bad block, but not to the *file* it belongs to (that is what
I meant by "filesystem entity": a file, or else a link or directory).

I think Wes Pfarner (2) has it right: there is no way to find this out in
Solaris, although in SunOS we can, with icheck (see Brad Young (1))!
I'm currently trying to get a machine running SunOS so that I can connect
the disk and run icheck.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Brad Young <bbyoung@amoco.com>
1) Errors such as ...

Notice that the errors are all coming from a single block. This is
indicative of, duh, a single bad block. If the errors were from many
blocks, this repair may not be worth trying.

Apr 4 12:29:51 workstation vmunix: sd8h: Vendor 'SEAGATE' error code: 0x16
Apr 4 12:29:51 workstation vmunix: sd8h: Error for command 'read(10)'
Apr 4 12:29:51 workstation vmunix: sd8h: Error Level: Fatal
Apr 4 12:29:51 workstation vmunix: sd8h: Block 467338, Absolute Block: 3307252
Apr 4 12:29:51 workstation vmunix: sd8h: Sense Key: Media Error
Apr 4 12:29:51 workstation vmunix: sd8h: Vendor 'SEAGATE' error code: 0x11
Apr 4 12:29:51 workstation vmunix: sd8h: Error for command 'read(10)'
Apr 4 12:29:51 workstation vmunix: sd8h: Error Level: Fatal
Apr 4 12:29:51 workstation vmunix: sd8h: Block 467339, Absolute Block: 3307253
Apr 4 12:29:51 workstation vmunix: sd8h: Sense Key: Media Error
Apr 4 12:29:51 workstation vmunix: sd8h: Vendor 'SEAGATE' error code: 0x11
Apr 4 12:29:52 workstation vmunix: sd8h: Error for command 'read(10)'
Apr 4 12:29:52 workstation vmunix: sd8h: Error Level: Fatal
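When the log is long, it can help to pull out the distinct failing blocks mechanically rather than by eye. The following is only a sketch in Python (not something from the original replies); the regular expression and the function name failing_blocks are my own assumptions, based on the message layout shown above.

```python
import re

# Sample lines copied from the log above, with wrapped lines rejoined.
LOG = """\
Apr 4 12:29:51 workstation vmunix: sd8h: Block 467338, Absolute Block: 3307252
Apr 4 12:29:51 workstation vmunix: sd8h: Sense Key: Media Error
Apr 4 12:29:51 workstation vmunix: sd8h: Block 467339, Absolute Block: 3307253
"""

# Assumed message layout: "<device>: Block <n>, Absolute Block: <m>".
PATTERN = re.compile(r"(\w+): Block (\d+), Absolute Block: (\d+)")


def failing_blocks(log_text):
    """Return a sorted list of distinct (device, block, absolute_block) tuples."""
    found = set()
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            device, block, absolute = match.groups()
            found.add((device, int(block), int(absolute)))
    return sorted(found)


blocks = failing_blocks(LOG)
for device, block, absolute in blocks:
    # The difference is presumably the partition's starting sector on the disk.
    print(device, block, absolute, absolute - block)
```

In this log both entries give the same difference between the absolute and the partition-relative block number, which is what one would expect if both errors fall in the same partition.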

2) Find the inode...
   workstation# icheck -b 467328 /dev/rsd8h
   /dev/rsd8h:
   467328 arg; frag 0 of 8, inode=109312, class=inodes 109312-109376
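The log reports blocks 467338 and 467339, while icheck is given 467328. One plausible reading (an assumption on my part, not stated in the original reply) is that 467328 is the first 512-byte sector of the 8192-byte filesystem block containing them: the fsck output further down lists sixteen sectors, 467328 through 467343, and ncheck reports "wanted 8192". Under that assumption the rounding can be sketched in Python as follows; the function names are hypothetical.

```python
SECTOR_SIZE = 512      # bytes per disk sector (assumed)
FS_BLOCK_SIZE = 8192   # bytes per filesystem block (assumed from "wanted 8192")


def fs_block_start(sector, sector_size=SECTOR_SIZE, fs_block_size=FS_BLOCK_SIZE):
    """Round a sector number down to the first sector of its filesystem block."""
    sectors_per_block = fs_block_size // sector_size
    return (sector // sectors_per_block) * sectors_per_block


def fs_block_sectors(sector, sector_size=SECTOR_SIZE, fs_block_size=FS_BLOCK_SIZE):
    """List every sector in the filesystem block that contains `sector`."""
    sectors_per_block = fs_block_size // sector_size
    start = fs_block_start(sector, sector_size, fs_block_size)
    return list(range(start, start + sectors_per_block))


print(fs_block_start(467338))
print(fs_block_sectors(467339))
```

If the filesystem was built with a different block size, the two constants need to be changed accordingly.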

3) Find the associated file... and have those backup tapes ready!!
   You _do_ have backup tapes...??!!

   Note: in this case, the inode points to a directory..

   workstation# mount /dev/sd8h /mnt
   workstation# cd /mnt
   workstation# find . -inum 109312 -print
   find: cannot stat ./userdata/application/subdir/nov13: I/O error

   or..

   workstation# ncheck -i 109312 /dev/rsd8h
   /dev/rsd8h:
   ncheck: read error 467328 (wanted 8192 got -1)
   ncheck: I/O error
        ncheck: I/O error

   ncheck didn't work for me, but others seem to prefer it to the find
   method.

4) Repair..

  Warning... not for the faint of heart..

format> repair
Enter block number of defect: 467328
Ready to repair defect, continue? y
Repairing block 467328 (224/16/48)...done

workstation# fsck /dev/rsd8h
** /dev/rsd8h
** Last Mounted on /mnt
** Phase 1 - Check Blocks and Sizes
 
CANNOT READ: BLK 467328
CONTINUE? y
 
THE FOLLOWING SECTORS COULD NOT BE READ:
467328, 467329, 467330, 467331, 467332, 467333, 467334,
467335, 467336, 467337, 467338, 467339, 467340, 467341,
467342, 467343,

PARTIALLY ALLOCATED INODE I=109312
CLEAR? y
 
 
CANNOT READ: BLK 467328
CONTINUE? y
 
THE FOLLOWING SECTORS COULD NOT BE READ:
467328, 467329, 467330, 467331, 467332, 467333, 467334,
467335, 467336, 467337, 467338, 467339, 467340, 467341,
467342, 467343,
UNKNOWN FILE TYPE I=109313
CLEAR? y
 
UNKNOWN FILE TYPE I=109314
CLEAR? y
 
PARTIALLY ALLOCATED INODE I=109315
CLEAR? y
 
UNKNOWN FILE TYPE I=109324
CLEAR? y
...
 
** Phase 2 - Check Pathnames
UNALLOCATED I=109312 OWNER=root MODE=0
SIZE=0 MTIME=Dec 31 18:00 1969
WRITING ZERO'ED BLOCK 467328 TO DISK
NAME=/userdata/application/subdir/nov13
REMOVE? y
 
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
LINK COUNT DIR I=93696 OWNER=zbaf05 MODE=40777
SIZE=1024 MTIME=Jun 3 10:04 1996 COUNT 5 SHOULD BE 4
ADJUST? y
 
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? y
 
1070 files, 1180544 used, 155122 free (242 frags, 19360 blocks, 0.0% fragmentation)
 
***** FILE SYSTEM WAS MODIFIED *****
workstation# !!
fsck /dev/rsd8h
** /dev/rsd8h
** Last Mounted on /mnt
** Phase 1 - Check Blocks and Sizes
UNKNOWN FILE TYPE I=109344
CLEAR? y
 
UNKNOWN FILE TYPE I=109345
CLEAR? y
...

** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
UNREF FILE I=109358 OWNER=root MODE=23400
SIZE=998244782 MTIME=Apr 17 21:09 1957
CLEAR? y
 
** Phase 5 - Check Cyl groups
1070 files, 1180544 used, 155122 free (242 frags, 19360 blocks, 0.0% fragmentation)
 
***** FILE SYSTEM WAS MODIFIED *****
workstation# !!
fsck /dev/rsd8h
** /dev/rsd8h
** Last Mounted on /mnt
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
1070 files, 1180544 used, 155122 free (242 frags, 19360 blocks, 0.0% fragmentation)

 Now, restore the missing files....

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2) Wes Pfarner - 5736 <wrpfarn@csua35.sandia.gov>
I had this same problem a few months ago which resulted in bug #1264036
being created. They may be working on it, but I haven't heard anything
further from the engineer involved in this bug.

It's a real dilemma not to be able to determine to which file a particular
disk block belongs. We had this functionality in SunOS 4.x.x, but lost
it when Sun switched to Jersey UNIX. If anyone at Sun tries to give you
the former company line about "fsdb" being of equal functionality, tell
them to look at this bug report.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
3) Jim Harmon <jharmon@telecnnct.com>
Since I don't know what the error message said, I can't give you an
exact answer.

If you want an ESTIMATED answer, the message should tell you what TARGET
had the failing block.

If you know the TARGET information, you know the DEVICE information.

With the DEVICE information, you can use FORMAT to see what PARTITION
the block is/should be part of, and to CORRECT the bad block directly.

Example:

        /dev/dsk/c0t1d0 block 300098 (Abs. Block 57890) has encountered
        a write error...

#> Format
format> 2.
format> repair
        block? 57890 <cr>
format> repaired block 57890, successfully
format>quit

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
4) "Rick von Richter" <rickv@mwh.com>
- Run 'format' and select the offending drive.
- In format, type verify (or just ve) to show the drive parameters.
- Write down the numbers for ncyl, nhead, and nsect.
- For this example I'll use ncyl=2036, nhead=14, and nsect=72.
- The total number of blocks on the drive is found using this equation:
   (ncyl)x(nhead)x(nsect) = (2036)x(14)x(72) = 2052288 blocks
- So let's say you're having a problem on block number 1073520. Take this
  number and divide it by (nhead)x(nsect):
   (1073520)/((14)x(72)) = 1065
So the problem is occurring on or near cylinder 1065. Now go back to the
format command and find out which filesystem cylinder 1065 resides on.
Here's an example from 'format' after the verify command:

format> verify

Primary label contents:

Volume name = < >
ascii name = <SUN1.05 cyl 2036 alt 2 hd 14 sec 72>
pcyl = 2038
ncyl = 2036
acyl = 2
nhead = 14
nsect = 72
Part        Tag  Flag    Cylinders         Size           Blocks
  0         var   wm      0 -  203     100.41MB   (204/0/0)     205632
  1        swap   wm    204 -  407     100.41MB   (204/0/0)     205632
  2  unassigned   wm      0 - 2035    1002.09MB   (2036/0/0)   2052288
  3         usr   wm    408 - 1627     600.47MB   (1220/0/0)   1229760
  4  unassigned   wm   1628 - 2035     200.81MB   (408/0/0)     411264
  5  unassigned   wm      0               0       (0/0/0)            0
  6  unassigned   wm      0               0       (0/0/0)            0
  7  unassigned   wm      0               0       (0/0/0)            0

So using the above example, I can see that cylinder 1065 is part of the
/usr filesystem, because /usr uses cylinders 408-1627.
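The arithmetic above is easy to script. What follows is a minimal Python sketch using the geometry and cylinder ranges from the example label; the function names are hypothetical, and slice 2 is left out of the table on purpose because it covers the whole disk and would match every cylinder.

```python
# Geometry from the example 'verify' output above.
NCYL, NHEAD, NSECT = 2036, 14, 72

# (slice, tag, first_cylinder, last_cylinder); slice 2 (whole disk) omitted.
PARTITIONS = [
    (0, "var", 0, 203),
    (1, "swap", 204, 407),
    (3, "usr", 408, 1627),
    (4, "unassigned", 1628, 2035),
]


def block_to_cylinder(block, nhead=NHEAD, nsect=NSECT):
    """Cylinder holding an absolute block: block / (nhead * nsect), rounded down."""
    return block // (nhead * nsect)


def partition_for_cylinder(cylinder, partitions=PARTITIONS):
    """Return (slice, tag) for the first partition containing the cylinder, or None."""
    for slice_no, tag, first, last in partitions:
        if first <= cylinder <= last:
            return slice_no, tag
    return None


total_blocks = NCYL * NHEAD * NSECT
cylinder = block_to_cylinder(1073520)
print(total_blocks, cylinder, partition_for_cylinder(cylinder))
```

Note that the block number fed in has to be relative to the start of the disk, not to the start of a partition, for the cylinder comparison to make sense.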

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
5) "Karl E. Vogel" <vogelke@c17.wpafb.af.mil>
/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@2,0 (sd2):

   The last field (sd2) is the one you want. Use something like sysinfo to
   associate that with a mounted drive.

         ftp://usc.edu/pub/sysinfo/sysinfo-3.2.2.tar.gz
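Pulling the instance name out of a message prefix like the one above can be sketched in Python as follows. This is only an illustration: the regular expression assumes the name always appears in parentheses at the end of the prefix, as it does in this one example, and instance_name is a hypothetical helper.

```python
import re


def instance_name(message_prefix):
    """Return the parenthesised name at the end of the prefix, or None."""
    match = re.search(r"\((\w+)\)\s*:?\s*$", message_prefix.strip())
    return match.group(1) if match else None


PREFIX = ("/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/"
          "esp@f,800000/sd@2,0 (sd2):")

print(instance_name(PREFIX))
```

The result still has to be matched against a mounted drive by other means, for example with sysinfo as suggested.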

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
6) Bismark Espinoza <bismark@alta.Jpl.Nasa.Gov>
Compare the block number to the blocks assigned in the disk's partition
table obtained with "format".
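The same comparison can be done directly on block ranges, without converting to cylinders first. Here is a rough Python sketch that borrows the label from answer 4) as sample data; the starting blocks are derived as first cylinder x nhead x nsect, which is an assumption about how the label lays out the slices, and slice_for_block is a hypothetical name.

```python
BLOCKS_PER_CYL = 14 * 72  # nhead * nsect from the example label in answer 4)

# (slice, tag, first_cylinder, block_count) taken from that label;
# slice 2 is omitted because it spans the whole disk.
LABEL = [
    (0, "var", 0, 205632),
    (1, "swap", 204, 205632),
    (3, "usr", 408, 1229760),
    (4, "unassigned", 1628, 411264),
]


def slice_for_block(absolute_block, label=LABEL, blocks_per_cyl=BLOCKS_PER_CYL):
    """Return (slice, tag, offset_within_slice) for an absolute block, or None."""
    for slice_no, tag, first_cyl, count in label:
        start = first_cyl * blocks_per_cyl
        if start <= absolute_block < start + count:
            return slice_no, tag, absolute_block - start
    return None


print(slice_for_block(1073520))
```

The third value returned is the block's offset inside the slice, which is the kind of partition-relative number the error messages also report.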


