                              SPECTROGRAM (2.3)

1.  PRINCIPLES OF OPERATION

   Most ordinary sounds are complex combinations of individual frequency
components or harmonics which cover a wide frequency range and vary in
intensity over time.  A spectrogram is simply a plot of the frequency
content of such an audio signal as a function of time.  In this program,
digital audio recordings (PCM format) are analyzed to produce a plot of
frequency versus time, with harmonic intensity represented by a variable
color scale.  These spectrograms reveal the fascinating hidden frequency
structure of audio signals and can be used for identifying or classifying
particular sounds.
	
   Spectrogram uses a Fast Fourier Transform (FFT) to perform the frequency
analysis.  FFTs are usually specified by the number of input data points
used in each calculation.  For a sampling rate of F (Hz), an N point FFT
produces a frequency analysis over the range from 0 Hz to F/2, with signal
amplitude calculated at N/2 frequency increments in this range.  For
example, for a digital signal sampled at 10000 Hz, a 512 point FFT
calculates signal amplitude at 256 frequency increments from 0 Hz to
5000 Hz, about 19.5 Hz apart.  This will become clear as you calculate and
observe different spectrograms.
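
   The arithmetic above can be sketched in a few lines of Python.  This is
only an illustration and not part of the Spectrogram program; the function
name fft_bins is made up for this example.

```python
def fft_bins(sample_rate_hz, fft_points):
    """Return (number of increments, spacing in Hz, top frequency in Hz)."""
    bins = fft_points // 2        # N/2 frequency increments
    top = sample_rate_hz / 2      # the analysis covers 0 Hz to F/2
    spacing = top / bins          # equivalent to F/N
    return bins, spacing, top

bins, spacing, top = fft_bins(10000, 512)
print(bins, round(spacing, 2), top)   # 256 19.53 5000.0
```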

   Contrary to popular opinion, higher sampling rates are not always
necessary for high fidelity recording.  The choice of sampling rate depends
entirely on the highest frequencies in the audio signal.  The rule of thumb
is to use a sampling rate that is at least twice the highest frequency in
the audio signal.  That is, if you expect to have no frequency components
above 11 kHz, then a sampling rate of 22 kHz is adequate.  If you examine a
spectrogram and
see that all of the signal is concentrated in lower frequency components at
the bottom of the display, then it is a good bet that the recording was
sampled at too high a rate, wasting a significant amount of memory.  This
program produces the highest quality spectrograms of digital recordings
which have been sampled at the appropriate rate. 
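
   As a rough sketch of the rule of thumb, the Python below picks the
lowest rate from a list that is still at least twice the highest expected
frequency.  The list of rates and the function name are assumptions made
for this example; check which rates your own sound card supports.

```python
# Assumed list of typical sound card rates, for illustration only.
COMMON_RATES_HZ = [8000, 11025, 22050, 44100]

def choose_sample_rate(highest_freq_hz, rates=COMMON_RATES_HZ):
    """Pick the lowest listed rate that is at least twice the highest frequency."""
    needed = 2 * highest_freq_hz
    for rate in sorted(rates):
        if rate >= needed:
            return rate
    return None   # no listed rate is high enough

print(choose_sample_rate(11000))   # 22050
print(choose_sample_rate(3400))    # 8000
```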

2.  SYSTEM REQUIREMENTS

   Spectrogram will run on any Windows 3.1 equipped machine. However, the
intensive calculations required to develop the frequency spectrum demand the
fastest processor available.  In addition, large sound files will require
much memory for analysis and display, so the more memory the better. 
Spectrogram will process any 8 or 16 bit audio data in PCM format, including
".wav" files and raw data files.  Spectrogram cannot process the compressed
audio data occasionally found in large .wav files.

   In order to record and play back sound samples, you will also need a
Windows compatible sound card installed.  However, a sound card is not
necessary in order to analyze and display audio spectrograms. 

3.  COMPUTING AND DISPLAYING A SPECTROGRAM

   Choose "Open" from the "File" menu to load a digital sound sample file. 
Once a file has been selected, Spectrogram will present the "Analysis
Options" dialog box where you will specify the parameters of the frequency
analysis.  To select the default values, just press the space bar.  To
tailor the calculations to your own preferences, see below. 

a.  Sample Characteristics

   You may enter any sample rate from 8000 Hz to 44100 Hz.  If you
have selected a .wav file, the sample rate displayed will be the rate used
in the original recording.  If you have selected a raw data file, a sample
rate of 11025 Hz will be initially assumed, and you should enter the correct
value if necessary.

   You may also select the beginning and ending location in the selected
file (in bytes) to be analyzed.  Initially, the starting and ending location
of the entire file will be displayed. If you make no change here, the entire
file will be analyzed.
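
   Because the locations are given in bytes, it can help to convert a byte
range into a time span.  The sketch below is an illustration only; it
assumes uncompressed single channel (mono) data unless told otherwise,
ignores any file header, and uses a made-up function name.

```python
def byte_range_to_ms(start_byte, end_byte, sample_rate_hz,
                     bits_per_sample, channels=1):
    """Convert a byte range of raw PCM data into a duration in milliseconds."""
    bytes_per_frame = (bits_per_sample // 8) * channels
    frames = (end_byte - start_byte) // bytes_per_frame
    return 1000.0 * frames / sample_rate_hz

print(byte_range_to_ms(0, 22050, 11025, 16))   # 1000.0
print(byte_range_to_ms(0, 11025, 11025, 8))    # 1000.0
```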

   You also have a choice of 8 bit or 16 bit data resolution.  Pick the
value which you know corresponds to the data file you are analyzing. If you
are loading a .wav file, the correct value will already be shown.  If this
is a raw data file, 16 bit data will be assumed, but it is up to you to
specify the correct value. 
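
   If you are unsure of the sample rate or resolution of a .wav file, you
can check them outside Spectrogram.  The sketch below uses the standard
Python "wave" module, which reads uncompressed PCM files; the function name
describe_wav is made up for this example, and the test file is built in
memory so that the sketch runs without a real recording.

```python
import io
import wave

def describe_wav(source):
    """Return (sample rate in Hz, bits per sample, channels, frame count)."""
    with wave.open(source, "rb") as wav:
        return (wav.getframerate(), wav.getsampwidth() * 8,
                wav.getnchannels(), wav.getnframes())

# Build a tiny 16 bit mono file in memory.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)          # 2 bytes per sample = 16 bits
    out.setframerate(11025)
    out.writeframes(b"\x00\x00" * 100)
buffer.seek(0)
print(describe_wav(buffer))      # (11025, 16, 1, 100)
```

   To inspect a file on disk, pass its path instead of the buffer, for
example describe_wav("sample.wav") with a file name of your own.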
    
b.  FFT Selection

   You have a choice of 512, 1024, or 2048 point FFTs for the frequency
analysis.  Use 512 points routinely.  Use 1024 or 2048 point FFTs for high
resolution analysis.  The higher resolution FFTs require more time to
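compute the spectrogram.  For this reason, when increased frequency
resolution is needed, it is sometimes preferable to decrease the sampling
rate when recording the audio data rather than to use a higher resolution
FFT.  The spacing between frequency increments is the sampling rate divided
by the number of FFT points, so halving the sampling rate improves the
resolution as much as doubling the FFT size.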
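
   The trade-off can be checked with a short calculation.  The sketch below
is an illustration only, with a made-up function name; it computes the
spacing between frequency increments as the sampling rate divided by the
number of FFT points.

```python
def frequency_resolution_hz(sample_rate_hz, fft_points):
    """Spacing between adjacent frequency increments, in Hz."""
    return sample_rate_hz / fft_points

print(frequency_resolution_hz(22050, 512))    # 43.06640625
print(frequency_resolution_hz(22050, 1024))   # 21.533203125
print(frequency_resolution_hz(11025, 512))    # 21.533203125
```

   Here a 512 point FFT at 11025 Hz gives the same spacing as a 1024 point
FFT at 22050 Hz, at the cost of a lower maximum frequency.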
    
c.  FFT Window

   Spectrogram provides both narrowband (NB) and broadband (BB) processing
options which are selected by specifying an FFT data input window size in
milliseconds.  For narrowband processing the width of the FFT window
automatically corresponds to its maximum value, which is the time required
to fill the FFT input with data samples (either 512, 1024, or 2048 data
points depending on FFT selection).  Narrowband processing produces a
display of high frequency resolution which resolves the individual harmonics
of the audio sample.

   For broadband processing the FFT window width is reduced from the
maximum value by filling a portion of the input data buffer with zeroes. 
This technique broadens the frequency response of the FFT and produces a
display which smooths over the individual harmonics to show broad areas of
intensity.  The smaller the FFT window width, the greater the output
smoothing.  The default broadband FFT window width is 8 ms; however, you may
choose the value which gives the best results for a particular combination
of sampling rate and FFT selected.  This type of display is useful mainly
for analysis of speech formants.    
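
   The two window types can be sketched as follows.  This is a simplified
illustration and not the program's own code: the function names are made
up, and the way the window width is rounded to a whole number of samples is
an assumption.

```python
def narrowband_window_ms(sample_rate_hz, fft_points):
    """Maximum window width: the time needed to fill the FFT input."""
    return 1000.0 * fft_points / sample_rate_hz

def broadband_buffer(samples, fft_points, sample_rate_hz, window_ms=8.0):
    """Keep window_ms worth of samples and fill the rest of the input with zeroes."""
    keep = min(fft_points, round(sample_rate_hz * window_ms / 1000.0))
    kept = list(samples[:keep])
    return kept + [0] * (fft_points - len(kept))

print(round(narrowband_window_ms(11025, 512), 1))    # 46.4
buf = broadband_buffer([1] * 512, 512, 11025)
print(len(buf), sum(buf))                            # 512 88
```

   At 11025 Hz an 8 ms window holds about 88 samples, so the remaining 424
inputs of a 512 point FFT are zeroes.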

d.  Horizontal Scale Selection

   You may also select a horizontal scale in milliseconds, which
corresponds to the time interval between the calculation of each FFT.  Each
vertical line in the spectrogram display represents the output of one FFT
calculation.  The FFT data input window is stepped sequentially through the
data, performing an FFT calculation at each step.  The horizontal scale
selected determines the length of the step between each FFT and thus the
total number of FFTs required. The horizontal scale can be assigned any
value between 1 ms and 500 ms.  Experiment with these values to pick the
horizontal scale you prefer. 
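
   To get a feel for how the horizontal scale affects the amount of
computation, the sketch below estimates the step length in samples and the
number of FFTs.  It is an illustration only; the function name is made up
and the count is approximate, since the handling of the final partial step
is an assumption.

```python
def fft_steps(total_samples, sample_rate_hz, step_ms):
    """Return (step length in samples, approximate number of FFTs)."""
    if not 1 <= step_ms <= 500:
        raise ValueError("horizontal scale must be between 1 ms and 500 ms")
    step = max(1, round(sample_rate_hz * step_ms / 1000.0))
    return step, total_samples // step

print(fft_steps(11025, 11025, 10))   # (110, 100)
print(fft_steps(11025, 11025, 50))   # (551, 20)
```

   One second of audio at 11025 Hz therefore needs about 100 FFTs at a
10 ms scale, but only about 20 at a 50 ms scale.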
    
e.  Display Threshold Selection

   You are also given a choice of display threshold in order to reduce
clutter in noisy digital recordings.  A threshold of -3 dB or -6 dB reduces
the input signal level to eliminate background clutter. Use a threshold of 0
dB regularly, and select signal reduction only if necessary to reduce
clutter. 
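
   For reference, a level change in dB can be converted to a scale factor
as sketched below.  This assumes the usual amplitude convention of 20 times
the base 10 logarithm; how the program applies its threshold internally is
not described here, and the function name is made up.

```python
def db_to_amplitude_factor(threshold_db):
    """Convert a level change in dB to an amplitude scale factor."""
    return 10 ** (threshold_db / 20.0)

for db in (0, -3, -6):
    print(db, round(db_to_amplitude_factor(db), 3))
# 0 1.0
# -3 0.708
# -6 0.501
```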
    
f.  Color Palette Selection

   Finally, you have a choice of color or grayscale display.  For a color
display, red represents the highest signal levels and dark blue the lowest.
For a grayscale display, the darker the display, the higher the signal
level. 	 
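
   One simple way to build such palettes is sketched below.  This is a
hypothetical mapping for illustration only: the actual colors used by the
program between dark blue and red are not described here, and both function
names and the exact endpoint colors are assumptions.

```python
def grayscale(level):
    """Map a level in 0.0..1.0 to (r, g, b); higher levels are darker."""
    level = min(1.0, max(0.0, level))
    v = round(255 * (1.0 - level))
    return (v, v, v)

def color(level):
    """Blend from an assumed dark blue (lowest) to red (highest)."""
    level = min(1.0, max(0.0, level))
    return (round(255 * level), 0, round(128 * (1.0 - level)))

print(grayscale(0.0), grayscale(1.0))   # (255, 255, 255) (0, 0, 0)
print(color(0.0), color(1.0))           # (0, 0, 128) (255, 0, 0)
```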

   Once you are satisfied with the Analysis Options, click "OK" to begin
processing and display of a spectrogram of the audio data file.  The program
will step sequentially through the audio file, calculate an FFT at each
step, and display the results in the Spectrogram window.  You can stop the
process at any time by clicking the "Stop" button. 
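
   The overall procedure can be sketched in Python as shown below.  This is
a minimal illustration and not the program's own method: it uses a simple
floating point recursive FFT rather than the fast integer FFT credited
later in this document, applies no shaping to the input window, and all
function names are made up.

```python
import cmath
import math

def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def spectrogram(samples, fft_points, step):
    """Return one list of fft_points // 2 magnitudes per step through samples."""
    columns = []
    for start in range(0, len(samples), step):
        frame = list(samples[start:start + fft_points])
        frame += [0.0] * (fft_points - len(frame))   # zero-fill a short final frame
        spectrum = fft(frame)
        columns.append([abs(c) for c in spectrum[:fft_points // 2]])
    return columns

# A 1000 Hz tone sampled at 8000 Hz, analyzed with a 64 point FFT.
rate, n = 8000, 64
tone = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(256)]
cols = spectrogram(tone, n, step=64)
peak_bin = max(range(n // 2), key=lambda k: cols[0][k])
print(len(cols), peak_bin * rate / n)   # 4 1000.0
```

   Each entry of cols corresponds to one vertical line of the display, and
the strongest value in each line falls at the frequency of the tone.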

4.  The Spectrogram Display

   The spectrogram display reveals the digital signal as a frequency versus
time plot with signal amplitude at each frequency represented by intensity
(or color).  A continuous readout of cursor position in frequency (Hz) and
time (milliseconds) is displayed at the bottom left of the window.  A
coordinate grid can also be added by clicking the "Toggle Grid" button.

   The width of the spectrogram display is limited only by the display
screen.  Maximizing the spectrogram window will expand the display
horizontally to fill the screen.  If the spectrogram width is greater than
screen width, you can use the horizontal scroll bar at the bottom of the
display to position the spectrogram side-to-side.

   The height of the spectrogram display is limited by the size of the FFT
chosen for analysis. Only 256 vertical display points are needed for a 512
point FFT.  The 1024 and 2048 point FFTs require 512 and 1024 points
respectively.  Maximizing the spectrogram window will expand the display
vertically to the size required by the FFT if not limited by the screen
height.  If the spectrogram height is greater than the screen height, use
the vertical scroll bar at right of the window to position the spectrogram
top-to-bottom.  
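
   The relation between FFT size, display height, and the frequency shown
at a given height can be sketched as follows.  The function names are made
up for this example, and counting rows upward from the bottom of the
display is an assumption.

```python
def display_rows(fft_points):
    """Number of vertical display points needed for an FFT size."""
    return fft_points // 2

def row_to_frequency_hz(row_from_bottom, sample_rate_hz, fft_points):
    """Frequency represented by a display row, counted up from 0 Hz."""
    return row_from_bottom * sample_rate_hz / fft_points

print([display_rows(n) for n in (512, 1024, 2048)])   # [256, 512, 1024]
print(row_to_frequency_hz(128, 11025, 512))           # 2756.25
```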

5.  Modifying Spectrograms

   Once you have computed a spectrogram, you may want to make changes to
its length, vertical or horizontal scale, threshold or color to improve the
frequency analysis.  The menu bar across the top of the display gives
options for FFT size, display threshold, and color palette.  Choosing any of
these options will cause the spectrogram to be recomputed with the new value
you have chosen. If you want to change more than one parameter before
recomputing the spectrogram, choose "Modify" from the File Menu to bring up
another Analysis Options dialog box to make your selections.

   Frequently you will want to select a portion of the spectrogram for
recomputation rather than recompute the entire length.  You can drag-select
this section from the spectrogram display.  Position the mouse
pointer at the desired starting point, press the left mouse button and drag
the mouse to the desired ending point and then release the mouse button. 
The Analysis Options dialog box will then appear with the starting and
ending locations filled according to your selection. 
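
   The conversion from a selection on the display to file locations can be
sketched as below.  This is a hypothetical illustration, not the program's
own code: it assumes one display column per FFT step, single channel data,
and no file header, and the function name is made up.

```python
def selection_to_bytes(start_col, end_col, step_ms, sample_rate_hz,
                       bits_per_sample):
    """Convert two display columns into (start, end) byte locations."""
    bytes_per_sample = bits_per_sample // 8

    def col_to_byte(col):
        sample = int(col * step_ms * sample_rate_hz / 1000.0)
        return sample * bytes_per_sample

    first, last = sorted((start_col, end_col))   # allow dragging right to left
    return col_to_byte(first), col_to_byte(last)

print(selection_to_bytes(100, 300, 10, 8000, 16))   # (16000, 48000)
```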

6.  Direct Recording and Analysis

   If you have a Windows compatible sound card installed, you will be able
to directly record and analyze an audio sample through a microphone attached
to your sound card.  Choose "Record New" from the File menu to initiate
recording.  You will again be presented with the Analysis Options dialog
box to select the parameters of the frequency analysis.  When recording is
complete, computation of the spectrogram will begin.     

7.  Spectrogram Playback

   If you have a Windows compatible sound card installed, you will also be
able to play back the spectrogram by clicking the 'Play' or 'Play Wdw'
buttons.  The Play button plays back the entire length of the .wav file,
while the Play Wdw button plays back only that portion of the spectrogram
which is visible in the Spectrogram Window. 

8.  Saving Audio and Bitmap Files

   You can save a .wav file of the digital audio of your spectrogram by
choosing "Save Wave" from the File Menu.  You can also save a bitmap of the
visible portion of the Spectrogram Window by choosing "Save Bitmap" from the
File Menu. 

9.  Problem Reporting

   Programs can only be improved if users provide feedback to the author.
I encourage anyone with a bug report, question, comment, or suggestion to
contact me at rshorne@delphi.com.
			
10.  DISTRIBUTION

   Spectrogram is Copyright 1994, 1995 by R.S. Horne and may be distributed
as freeware. 

11.  CREDITS

   So many interested Internet users have given good comments and
suggestions that I can't list everyone.  However, the contribution of Philip
VanBaren who provided the fast integer FFT code has been vital to the
improved performance of this update.  Thierry Dutoit provided the
inspiration and the technique for broadband processing.  Greg Walker and
Henrik Clausen provided invaluable suggestions and debugging help.

	
