As hard disks keep getting larger, one factor that is consistently
overlooked is cluster size.

A cluster is the smallest allocation unit on a disk.  Every file that
is created on a disk drive takes up at least one cluster.  MS-DOS and
Windows 95 use a file system called FAT (file allocation table).  One
of FAT's limitations is that it can address only a limited number of
clusters on a hard disk (roughly 65,000 with the 16-bit FAT used on
hard disks).  So, as hard disk partitions get bigger, cluster sizes
must get bigger too.  IBM's OS/2 operating system can be set up to
use an alternative file system, called HPFS (high performance file
system), that does not have this limitation.

This chart shows the relationship between hard disk partition size
and cluster size.  It applies to all versions of DOS from 4.0 on up.

     +----------------------------------------+
     | Partition size    |  Cluster Size      |
     |-------------------+--------------------|
     |    0MB - 128MB    |   2K (2048 bytes)  |
     |  129MB - 256MB    |   4K (4096 bytes)  |
     |  257MB - 512MB    |   8K (8192 bytes)  |
     |  513MB - 1.02GB   |  16K (16384 bytes) |
     | 1.02GB - 2.04GB   |  32K (32768 bytes) |
     +----------------------------------------+
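
The chart can also be written as a small lookup.  The Python sketch
below is only an illustration and is not taken from CLUSTERS (which is
written in PowerBASIC).  The name default_cluster_size is made up for
this example, and the boundaries assume that the chart's 1.02GB and
2.04GB mean 1024MB and 2048MB.

```python
# Hypothetical helper, not part of CLUSTERS: maps a FAT partition size
# to the default cluster size listed in the chart above.
# Each entry is (largest partition size in MB, cluster size in bytes).
FAT_CLUSTER_CHART = [
    (128, 2048),
    (256, 4096),
    (512, 8192),
    (1024, 16384),   # assumption: the chart's "1.02GB" is 1024MB
    (2048, 32768),   # assumption: the chart's "2.04GB" is 2048MB
]


def default_cluster_size(partition_mb):
    """Return the default cluster size in bytes for a partition of
    partition_mb megabytes, following the chart."""
    if partition_mb <= 0:
        raise ValueError("partition size must be positive")
    for limit_mb, cluster_bytes in FAT_CLUSTER_CHART:
        if partition_mb <= limit_mb:
            return cluster_bytes
    raise ValueError("partition is larger than the chart covers")
```

For example, default_cluster_size(300) falls in the 257MB - 512MB row
and returns 8192.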



As you might guess, a smaller cluster size will waste less of your
hard disk.  A batch file with 400 bytes in it will consume one whole
cluster, whether it's a 2K cluster or a 32K cluster, so it wastes
1,648 bytes in the first case and 32,368 bytes in the second.
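
The arithmetic for a single file can be sketched in a few lines of
Python.  This is an illustration only, not the code CLUSTERS uses.
The function names are made up here, and the sketch follows the rule
stated earlier that every file takes at least one cluster (how the
real program treats zero-length files is not documented).

```python
def clusters_needed(file_bytes, cluster_bytes):
    """Whole clusters needed to hold file_bytes.

    Rounds up, and assumes every file occupies at least one cluster.
    """
    return max(1, -(-file_bytes // cluster_bytes))  # ceiling division


def wasted_bytes(file_bytes, cluster_bytes):
    """Bytes allocated to the file but not used by it."""
    allocated = clusters_needed(file_bytes, cluster_bytes) * cluster_bytes
    return allocated - file_bytes


# The 400-byte batch file on the smallest and largest cluster sizes.
for cluster_bytes in (2048, 32768):
    print(cluster_bytes, wasted_bytes(400, cluster_bytes))
# prints:
# 2048 1648
# 32768 32368
```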


Why did I write CLUSTERS?  Well, I was contemplating the purchase of a
one gigabyte hard drive, and I was wondering how much of it would go to
waste when I re-installed my files.  I figured a more generic utility
might be useful to the public.

CLUSTERS will examine the size of each individual file on a specified
hard drive.  It will display a chart showing the cluster allocation for
all five possible cluster sizes.  The current cluster size of the hard
drive being processed is highlighted.

Simply type CLUSTERS at the DOS prompt.  It will ask which drive you
would like to examine, and will ask what your "pain threshold" is.
The pain threshold is simply the highest percentage of waste that you
think is tolerable.  If the waste for a cluster size would exceed this
number, the screen report will flash that figure.  CLUSTERS will first
create a list of all the directories on your hard drive.  Don't be
alarmed if
the listing pauses -- some directories contain a lot of files to
process.



Next, CLUSTERS will check the size of every file on your hard drive,
and will display running computations for all five default cluster
sizes used by MS-DOS.  The screen will look like this:

-------------------------------------------------------------------------------

 Disk usage prediction program                            (c) Nathan C. Durland
 Processing  283  directories. Pain threshold is  30.00%
 Current drive cluster size is  8192
 Directory D:\WORKSHOP\RESTEST\
 File:     RESTEST.TXT

    5,739 Files have been checked.     283 directories have been checked


 Clust Size  Clust Alloc     Bytes Alloc     Bytes used         Wasted   %Slack
 ~~~~~~~~~~  ~~~~~~~~~~~     ~~~~~~~~~~~    ~~~~~~~~~~~        ~~~~~~~  ~~~~~~~
   2,048        132,155      270,653,440    253,032,550     17,620,890     6.51
   4,096         70,602      289,185,792    253,032,550     36,153,242    12.50
   8,192         40,047      328,065,024    253,032,550     75,032,474    22.87
  16,384         24,919      408,272,896    253,032,550    155,240,346    38.02
  32,768         17,531      574,455,808    253,032,550    321,423,258    55.95

 DOS partition size/cluster size relationship:
     0MB - 128MB  =  2K (2048 byte) Clusters
   128MB - 256MB  =  4K (4096 byte) Clusters
   256MB - 512MB  =  8K (8192 byte) Clusters
   512MB - 1.02GB = 16K (16384 byte) Clusters
  1.02GB - 2.04GB = 32K (32768 byte) Clusters

------------------------------------------------------------------------------
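
The figures in the chart can be reproduced from a list of file sizes.
The Python sketch below is a rough approximation of the report, not
the author's PowerBASIC code.  The names summarize and
file_sizes_under are hypothetical, only files are counted (not the
space taken by the directories themselves), and %Slack is taken to be
wasted bytes divided by allocated bytes, which is what the sample
figures work out to.

```python
import os

# The five default cluster sizes shown in the report.
CLUSTER_SIZES = (2048, 4096, 8192, 16384, 32768)


def file_sizes_under(root):
    """Yield the size in bytes of every file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                yield os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that cannot be read


def summarize(file_sizes, cluster_sizes=CLUSTER_SIZES, pain_threshold=30.0):
    """Return one report row (a dict) per cluster size."""
    file_sizes = list(file_sizes)
    used = sum(file_sizes)
    rows = []
    for cluster_bytes in cluster_sizes:
        # Round each file up to whole clusters; assume at least one each.
        clusters = sum(max(1, -(-size // cluster_bytes))
                       for size in file_sizes)
        allocated = clusters * cluster_bytes
        wasted = allocated - used
        slack = 100.0 * wasted / allocated if allocated else 0.0
        rows.append({
            "cluster_size": cluster_bytes,
            "clusters": clusters,
            "allocated": allocated,
            "used": used,
            "wasted": wasted,
            "slack_percent": slack,
            # CLUSTERS flashes the figure; this sketch only flags it.
            "over_threshold": slack > pain_threshold,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(file_sizes_under(".")):
        flag = " *" if row["over_threshold"] else ""
        print("{cluster_size:>7,} {clusters:>11,} {allocated:>15,} "
              "{used:>15,} {wasted:>15,} {slack_percent:>7.2f}"
              .format(**row) + flag)
```

As a spot check on the formula, the 2,048-byte row of the sample
screen gives 17,620,890 / 270,653,440, or about 6.51%, which matches
the %Slack column.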

Personal observation  --  Every menu item on the Windows 95 "start"
menu, and many of your "normal" icons, represent what Microsoft calls
a "shortcut".  Shortcuts are small (200-400 byte) files that point to
the "real" file.  Each one still takes up a whole cluster, which is
very wasteful of space on big disks.

CLUSTERS, its source code, and the documentation are placed in the
public domain; however, I retain all copyright to the program and
source code.  You cannot charge for, sell, or lease this program.

If you think CLUSTERS is kinda cool, let me know.  E-mail me at
OFFSYS@CENCOM.NET on the Internet, NDURLAND on America Online, or
76467,3355 on CompuServe.

If you are feeling philanthropic, send a $5.00 donation to:

    Nathan Durland
    47 Spring St.
    Keeseville,  NY 12944

CLUSTERS was written using PowerBASIC, from PowerBASIC, Inc.  If you
need a DOS compiler that is easier to use than C or C++, and generally
creates faster executables, it's the only way to go.
