SUMMARY: Disk blocksize - common ground?

Date: Thu Jun 10 1999 - 08:10:49 CDT

I was reassured to find that there are other folks out there who are also
confused about this. The consensus seems to be that 512 bytes is a standard
physical blocksize on a disk. This can be changed by jumpering or formatting,
but generally is not changed. 8K is the default block size of ufs filesystems
created with newfs, and is the standard size of data chunks that are read to/from the
disk. This too is configurable.
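As a quick arithmetic sanity check (a sketch, not Solaris code), the 512-byte reporting/sector unit and the 8K default UFS block relate as follows; the constants are simply the values quoted in the replies below.

```python
# Sketch: relating the 512-byte disk/reporting unit to the 8K UFS block.
# Values are the ones quoted in this summary.
SECTOR = 512          # typical physical disk block (and default du/df unit)
UFS_BLOCK = 8192      # default UFS filesystem block from newfs

def du_blocks_to_bytes(blocks: int) -> int:
    """Convert a count of 512-byte blocks (default du/df output) to bytes."""
    return blocks * SECTOR

sectors_per_fs_block = UFS_BLOCK // SECTOR
print(sectors_per_fs_block)        # 16 sectors make up one FS block
print(du_blocks_to_bytes(16))      # 8192 bytes
```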

Thanks very much to everyone who replied!!!!
J. Reed -
============================================================================ said:
The 512-byte blocksize is a hangover from older System V source code.
In Solaris, that really applies only to the default form of the "du" and
"df" commands, which report their output in 512-byte blocks. The 8K
blocksize is the default blocksize when using "newfs" to make new UFS
filesystems. I would expect the Oracle people to be interested in the actual
blocksize used for the filesystem(s) that their database files will use.
If those filesystem(s) were created with standard UFS defaults, then the
block size they care about is 8K.

============================================================================ said:
Disk sector sizing (while it could be set at just about anything) is
usually set at 512 bytes per sector for most versions of Unix. This
accommodates a reasonable amount of data with most disks within a track
geometry, and if you lose a little at the end of the track it's less
than 512 bytes, so who cares? (BTW, disk geometry can be hard- or soft-
sector formatted, i.e. jumpered or controlled by an "init device"
command of some type, depending on the controller you use.)

Memory, however, is usually allocated in 1024 or 2048 32-bit words
(depending on the platform). Therefore most buffered I/O ends up in
8KB buffers by default, since it happens to fall gracefully on a
common allocation unit.
As for how to manipulate them, many programs allow you to specify a
blocksize. (This usually means buffer size, so they can look for some
header that is added. tar, for instance, allows a -b option.)

============================================================================ said:
     This 512B vs. 8K block issue is "disk block" vs. "filesystem block"
     to be accurate.
     This is not a Solaris specific issue. The idea is back to the old
     days of 4.2BSD, where BFFS (Berkeley Fast File System) was developed,
     and since BFFS is the base of the UFS design (SVR4's standard filesystem
     type), all things applying to BFFS also apply to UFS and therefore to
     Solaris filesystems.

     "Disk block" is a physical term, at the hardware level, and means
     the smallest possible transfer unit of a disk, which is usually
     512 bytes.
     "Filesystem block", on the other hand, is the logical unit that
     is read or written by the kernel filesystem code. A bigger FS block
     (often) results in better performance, since a disk is a very slow device
     in comparison with the CPU and RAM, and it is far more expensive to position
     the head, say, 16 times and transfer 16 512B blocks than to position
     the head once and transfer one big 8K block.
     The default FS block size is 8K in UFS, but you can change it
     if you want.
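The head-positioning argument above can be illustrated with a toy cost model. The seek time and transfer rate below are assumed round numbers for illustration, not measurements:

```python
# Toy model (assumed numbers): total time for N transfers when each
# transfer pays a fixed head-positioning cost plus transfer time.
SEEK_MS = 8.0               # assumed average positioning cost per request
RATE_BYTES_PER_MS = 40_000  # assumed sustained transfer rate (~40 MB/s)

def io_time_ms(requests: int, bytes_each: int) -> float:
    return requests * (SEEK_MS + bytes_each / RATE_BYTES_PER_MS)

sixteen_small = io_time_ms(16, 512)   # sixteen separate 512B transfers
one_big = io_time_ms(1, 8192)         # one 8K transfer, same total data
print(sixteen_small, one_big)         # roughly 128 ms vs. roughly 8 ms
assert sixteen_small > one_big
```

Under these assumptions the positioning cost dominates, which is the whole point of the bigger filesystem block.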
     If you would like to know about the design of BFFS (which I highly
     recommend, both as a sample of a very intelligent design and as a
     real-life technology that you are involved with as a sysadmin), read
     Marshall McKusick's original paper on the implementation of BFFS.

============================================================================ said:
The disks are formatted with 512-byte blocks. The operating system groups these
blocks into 8k clusters. These clusters improve disk management, and thus
(especially in older systems) performance.

Although a cluster can hold blocks allocated to multiple files, this does not
happen unless the disk is nearly full and there are no free clusters.

============================================================================ said:
        The logical block size of a Solaris filesystem is 8192 bytes (the standard default).
This can be changed with newfs (newfs -b).

You would change the block size when creating a new filesystem; to tune application
performance, you should match the block size to the application's preferred
read/write size.

============================================================================ said:
There are at least three different "block sizes" being talked about.
1. SCSI disk blocksize = 512B
2. UFS filesystem blocksize, set when fs is newfs'd = default 8K
3. Kernel PAGE size. Can be determined with the command
        /usr/bin/pagesize (on older Solaris systems it may be
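For item 3, the same value can also be queried from a script; this sketch uses Python's standard library rather than the Solaris /usr/bin/pagesize command:

```python
# Sketch: querying the kernel page size without /usr/bin/pagesize.
import mmap
import os

page = mmap.PAGESIZE                 # kernel page size in bytes
same = os.sysconf("SC_PAGE_SIZE")    # the sysconf route to the same value

print(page)                          # commonly 4096 or 8192
assert page == same
assert page & (page - 1) == 0        # page size is a power of two
```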

As I understand it, SCSI blocksize is unimportant. UFS blocksize
is used by Oracle when its datafile resides on an fs.

I *believe* that the PAGE size is the unit of I/O for the kernel, and that
it is the blocksize used for kernel I/O.

============================================================================ said:
512 is the typical physical block size on a disk.
Many operating systems, and software packages that avoid using the
operating system's higher level read/write functions, read groups of
blocks in at a time in an attempt to speed up I/O. They typically
define a "logical" block size, which is the number of bytes a file is
extended by when it grows (i.e. file size is a multiple of this
number). So on a system with a logical block size of 8192, a one byte
file can occupy 8192 bytes! Some operating systems (Unix included) try
to get around this with various schemes (Unix uses "frags" which are
logical blocks set aside to store the end parts of files that are not a
multiple of the block size; this is the number referred to by the fsck

The more logical blocks you have on the disk, the more space taken up
for housekeeping associated with tracking which blocks are used and
which are free. There is an inverse relationship between the logical
blocksize and the amount of space used for housekeeping. And there is a
direct relationship between logical blocksize and wasted space due to
files not being a multiple of the logical blocksize.
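Those two relationships can be sketched numerically; the disk size and per-block bookkeeping cost below are invented for illustration:

```python
# Sketch (invented numbers): housekeeping shrinks as the logical block
# grows, while expected per-file tail waste grows with it.
DISK_BYTES = 1024**3        # a hypothetical 1 GB disk
TRACK_BYTES_PER_BLOCK = 4   # assumed bookkeeping cost per logical block

def housekeeping_bytes(block_size: int) -> int:
    """Space spent tracking which logical blocks are used or free."""
    return (DISK_BYTES // block_size) * TRACK_BYTES_PER_BLOCK

def expected_waste_per_file(block_size: int) -> int:
    """On average, half a block is wasted in each file's last block."""
    return block_size // 2

for bs in (1024, 8192, 65536):
    print(bs, housekeeping_bytes(bs), expected_waste_per_file(bs))
```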
Typically, the default values are best.

============================================================================ said:
The UFS filesystem that Solaris uses by default uses 8K filesystem blocks.
But each filesystem block can also be split into eight 1K fragments, so that if
you have a lot of small files your disk space isn't wasted.
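The frag mechanism can be sketched with the defaults just quoted (8K blocks, 1K fragments); the allocation rule below is a simplification of what UFS actually does:

```python
# Simplified sketch of UFS frags: a file's partial last block is stored
# in just enough 1K fragments rather than a whole 8K block.
import math

BLOCK = 8192   # default UFS filesystem block
FRAG = 1024    # default fragment size (block / 8)

def space_used(file_size: int) -> int:
    full_blocks, tail = divmod(file_size, BLOCK)
    return full_blocks * BLOCK + math.ceil(tail / FRAG) * FRAG

print(space_used(1))      # 1024: one frag, not a full 8K block
print(space_used(8193))   # 9216: one full block plus one frag
```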

Oracle is usually more interested in the size of a stripe if RAID is being
used on the system. For example, if the database block size is 16K and the
stripe size is 16K, there is less overhead to read in a database block than
if your stripe size is only 8K, because you would have to read in two stripes
from the disk to get one DB block. Usually you make the stripe size a
multiple of the DB block size; around 16KB is usually nice.
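The stripe arithmetic above can be sketched as follows (aligned reads assumed; the ceiling covers the general case):

```python
# Sketch: stripes touched when reading one aligned database block.
import math

def stripes_per_db_block(db_block: int, stripe: int) -> int:
    return math.ceil(db_block / stripe)

print(stripes_per_db_block(16 * 1024, 16 * 1024))  # 1: sizes match
print(stripes_per_db_block(16 * 1024, 8 * 1024))   # 2: two reads per DB block
```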

This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:13:21 CDT