            Find Duplicates - discover duplicated files

Find Duplicates was written to allow you to control your disk space usage
by discovering files that are duplicated and, should you so wish, deleting
one or more of these duplicates.  There are many ways in which duplicate
files can be deposited on your hard disk.  For example, some programs don't
check to see if you have a particular DLL installed and install their own
private copy in any case, while others install a DLL in your \Windows
folder when it is already in \Windows\System32.  You can also use Find
Duplicates to see if any files on a floppy are already present on your
hard disk.


How does Find Duplicates work?

Find Duplicates scans one or more disks on your system for duplicate
files, in a two-phase process.  First it scans all the folders and sorts
all the files it finds into size order, since files have to be the same
size to be identical.  You can limit the scan to one folder tree, if you
wish.  It then compares files of the same size to see if the contents are
actually identical, and lists identical files in size order.  You can then
double-click on any file to examine its properties, and optionally delete
it.
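
The two phases can be sketched in a few lines of Python.  This is only an
illustration of the idea, not the program's Delphi source: the function
names group_by_size and find_duplicates are made up for this example, and
the standard filecmp module stands in for the program's own comparison
routine.

```python
import filecmp
import os
from collections import defaultdict


def group_by_size(root):
    """Phase 1: walk the folder tree and bucket file paths by size."""
    by_size = defaultdict(list)
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
    return by_size


def find_duplicates(root):
    """Phase 2: within each size bucket, group files with equal contents."""
    results = []
    by_size = group_by_size(root)
    for size in sorted(by_size):
        groups = []  # each group is a list of paths with identical bytes
        for path in by_size[size]:
            for group in groups:
                # shallow=False forces a comparison of the file contents
                if filecmp.cmp(group[0], path, shallow=False):
                    group.append(path)
                    break
            else:
                groups.append([path])
        # Report only real duplicates, never single files.
        results.extend((size, sorted(g)) for g in groups if len(g) > 1)
    return results
```

Calling find_duplicates on a folder returns a list of (size, paths) pairs
in size order, one pair for each set of identical files.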

The byte-by-byte comparison can take some time, so Find Duplicates first
performs one of two preliminary checks to see whether two files might be
identical without having to examine the whole of each file.  By default,
it checks the modification date and time of the files, and only compares
the files byte-by-byte if the timestamps are the same.  But it is possible
for two files to have the same contents without having the same timestamp,
so you can enable an option whereby the first 512 bytes of each file are
checksummed instead.  This improves the recognition of identical files,
but it is slower, and since it involves a file access, the file's last
access date will be altered.  In either case, the filename is ignored, so
simply renaming a file will not hide the fact that it is a duplicate.  The
timestamp of zero-size files is ignored.  You may be rather surprised to
discover what duplicates by content actually exist in some popular office
suites!
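
As a rough illustration of the two preliminary checks, here is a Python
sketch.  The names prefix_checksum and may_be_identical are invented for
this example, and the checksum shown (a plain sum of the bytes) is an
assumption: this README does not say which checksum the program uses, only
that it covers the first 512 bytes and (from the release notes) that it
returns a 31-bit value.

```python
import os

PREFIX_BYTES = 512  # length of the checksummed prefix, from the text above


def prefix_checksum(path, length=PREFIX_BYTES):
    """Checksum the first `length` bytes (a plain byte sum: an assumption)."""
    with open(path, "rb") as f:
        data = f.read(length)
    return sum(data) & 0x7FFFFFFF  # clamp the result to 31 bits


def may_be_identical(path_a, path_b, use_checksum=False):
    """Return True if the pair still needs a full byte-by-byte compare."""
    stat_a, stat_b = os.stat(path_a), os.stat(path_b)
    if stat_a.st_size != stat_b.st_size:
        return False  # different sizes can never be identical
    if use_checksum:
        return prefix_checksum(path_a) == prefix_checksum(path_b)
    if stat_a.st_size == 0:
        return True  # timestamps of zero-size files are ignored
    return stat_a.st_mtime_ns == stat_b.st_mtime_ns
```

Either check can only rule a pair out.  A True result still has to be
confirmed by comparing the complete files.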


When should I disable timestamp checking?

You can turn off timestamp checking in favour of the slower checksum
method should you so wish.  For example, if the initial search finds many
files with identical sizes and timestamps, using just timestamps might
miss some duplicate files, since the duplicates may not be adjacent in the
name-ordered list produced by the folder scan.  The program was not
designed for this sort of duplicate search, but will perform adequately
with timestamp checking turned off.  You might also wish to disable
timestamp checking if you suspect that different products have installed
identical support DLLs.


Usage:

Extract FindDupl.exe from the zip file to a convenient location, and run 
it!  Only the FindDupl.exe file is required from the archive.  You will be 
presented with a dialog box showing your disk drives, with your local hard
disk drives selected.  You can optionally enter a file spec such as *.EXE 
and a folder specification such as \windows to limit the search.  Note 
that if you enter a folder specification, only that folder will be 
searched on each drive (e.g. c:\windows, d:\windows and so on).  Press the 
Start Search button to find duplicate files.
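
To make the effect of the two search limits concrete, here is a small
Python sketch.  The helper names search_roots and matches_spec are
hypothetical and are not taken from the program.  The sketch assumes that
matching ignores case and that *.* matches every file, which is the usual
Windows behaviour.

```python
import fnmatch
import ntpath


def search_roots(drives, folder_spec=""):
    """Return the folder that would be searched on each selected drive."""
    return [
        ntpath.join(drive + "\\", folder_spec.lstrip("\\"))
        for drive in drives
    ]


def matches_spec(filename, file_spec="*.*"):
    """Wildcard match that ignores case; *.* is taken to mean every file."""
    pattern = "*" if file_spec == "*.*" else file_spec
    return fnmatch.fnmatchcase(filename.lower(), pattern.lower())
```

For example, search_roots(["c:", "d:"], "\\windows") gives c:\windows and
d:\windows, and matches_spec("SETUP.exe", "*.EXE") is true.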

A status bar keeps you informed of the progress of both the folder scan
phase and the file comparison phase.  Once the main list box has filled up
with file names, you can double-click on a file name to get a pseudo
Properties dialog box (actually written in Delphi, not derived from the
system right-click -> Properties box).  This dialog has a delete button
which allows you to delete the file.

If a floppy disk (specifically drive A:) is included in the selected 
drives to scan, the program will normally assume that you wish to find 
files in common between the floppy and the other disk drives, so that 
during the folder scan phase Find Duplicates will only record files on the 
other drives that are the same size as files found on the floppy.  This 
makes the scanning faster and allows you to ask the question "Do I already 
have any files on my hard disk that are on this floppy?"  You can treat 
floppies just as ordinary disks by unchecking the "Treat floppy as master" 
check box.  You may notice a slightly different message in the status bar 
during the folder scan phase in this case.
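
The "Treat floppy as master" behaviour amounts to a size filter applied
during the folder scan.  The Python sketch below shows the idea under the
assumption that each file is represented as a (path, size) pair.  The name
record_candidates is made up for this example.

```python
from collections import defaultdict


def record_candidates(master_files, other_files, treat_as_master=True):
    """Bucket (path, size) pairs by size, as in the folder scan phase."""
    by_size = defaultdict(list)
    for path, size in master_files:
        by_size[size].append(path)
    master_sizes = set(by_size)  # sizes seen on the floppy
    for path, size in other_files:
        if treat_as_master and size not in master_sizes:
            continue  # no floppy file has this size, so do not record it
        by_size[size].append(path)
    return dict(by_size)
```

With treat_as_master set to False, every file on every drive is recorded,
which corresponds to unchecking the check box.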

Windows 95 has a special hidden folder called SYSBCKUP where backup copies 
of critical system files are stored.  Find Duplicates will recognise a 
folder with \SYSBCKUP\ in the path name, and ignore any files in that 
folder.  To disable this safety feature, uncheck the "Skip SYSBCKUP 
folder" check box.  The status bar will indicate that the folder is being 
skipped, but you'll have to be quick to see that message!  Other hidden 
folders are scanned normally.

Find Duplicates will ignore files that have zero length, because the data
in such files does not occupy disk space, and they are often simply marker
files (e.g. hidden files to show that a folder was created by installing
an application and not by a user).  If you prefer to find these files,
uncheck the "Skip zero-length files" check box.  Be aware that each such
file still takes up at least 32 bytes of directory space.  However, since
a folder always occupies at least one cluster (e.g. 4096 or 8192 bytes),
there will typically be very little overhead for a zero-length file.
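
The arithmetic behind that last point can be checked with a short Python
sketch.  It assumes exactly 32 bytes of directory space per file, the
minimum quoted above, and ignores any other entries a folder may hold.
The function name folder_bytes is made up for this example.

```python
import math

DIR_ENTRY_BYTES = 32  # minimum directory space per file, as quoted above


def folder_bytes(file_count, cluster_bytes=4096, entries_per_file=1):
    """Space a folder's directory occupies, rounded up to whole clusters."""
    needed = file_count * entries_per_file * DIR_ENTRY_BYTES
    # A folder always occupies at least one cluster, even when empty.
    return max(1, math.ceil(needed / cluster_bytes)) * cluster_bytes
```

With 4096-byte clusters, one cluster has room for 128 such entries, so
adding a zero-length file only costs extra disk space when it pushes the
folder over a cluster boundary.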

Upon exiting, Find Duplicates will try to save the list of duplicates in a 
file named FindDupl.lis in the same folder as the FindDupl.exe program 
file.  If this file is present on starting the program, Find Duplicates 
will ask if you would like to reload the list.  This allows you to split
the task of deleting duplicate files into short sessions without having to
run the time-consuming scan and compare phases every time.

For safety, Find Duplicates will not actually delete files, but instead 
will move them to the Recycle Bin.  This means that the disk space will 
not actually be returned until the Recycle Bin is emptied.  Right-click 
on the Recycle Bin to access the Empty Recycle Bin function.


+------------------------------ WARNING ---------------------------------+
|                                                                        |
|  You take sole responsibility if you choose to delete a file.  Find    |
|  Duplicates makes no attempt to check if the file is in use or key to  |
|  the functioning of your computer. Take backups before making changes. |
|                                                                        |
+------------------------------ WARNING ---------------------------------+


Notes:

The program is written with Borland's Delphi 3.0, and most of the source 
code is included.  You do not need access to Delphi to run Find Duplicates.  
You will need other Delphi units (not included in the .ZIP file) in order 
to recompile Find Duplicates.  The program requires Windows 95 or NT 4.0.

The folder scan phase can consume a large amount of virtual memory if a
wildcard *.* is specified.  At present, the program does not detect when 
its memory allocations fail, and may hang in these circumstances with an 
out-of-memory error.  Increase the space available for the Windows 
swapfile or avoid specifying wildcards if this happens to you.


Release information:

1997 Jan 29  V1.0.4  First released version
1997 Feb 03  V1.0.6  Treat floppy drive as master
1997 Feb 12  V1.0.8  Decode date of "0" as "unknown" on Properties dialog
1997 Apr 02  V1.1.0  Make file list box hint the filename (for long paths!)
                     Save and optionally restore duplicate file list
                     By default, ignore files in Win 95 SYSBCKUP folder
                     Replace ListBox with ListView (both Drives and Results)
                     Correct: missing FindClose in do_checksum routine
                     Correct: remove deleted file from the duplicates list
1997 Apr 07  V1.1.2  Use ShellAPI function to move file to recycle bin
1997 May 13  V1.1.4  Use my own TFileList component
                     Don't show properties/delete box for non-existent files
                     Put source files in sub-folder
                     Force checksum routine to return 31-bit value 
1997 May 18  V1.2.0  Move to Delphi 3.0
                     Don't leave singletons in the duplicates list
                     Correct property display for sequential compressed files
                     Don't allow ColumnClick on the FileListView - set False
1997 Oct 08  V1.2.2  Move to Delphi 3.01
                     Handle large font displays better
                     Use TreeScanner with FindHiddenXX options
                     Don't build against run-time VCL30.DPL

Contacting the author:

This program is freeware, and remains copyright of David J Taylor, 
Edinburgh, 1997.  This program is provided "as is", without any support.  
Whilst I cannot answer queries relating to the use of this program, I'd 
welcome any comments or suggestions for improvements you may have, and 
such feedback has helped mould the present version of the program.


david.taylor@gecm.com
1997 October 08
