
          ACB, DMC, PPM, LZ - CLASSIFICATION, IDEA, COMPARISON



                                                         George Buyanovsky


                   1.  Introduction

 This article provides a brief overview of the existing technologies for
compression of information and describes the place taken by ACB compression.
The overview discusses only the technologies for building compression
models: LZ, PPM, DMC, ACB; coding is not discussed.

LZ has two basic modifications, LZ77 and LZ78, along with a large number of
   variations. At present LZ models are widely used (zip, rar, arj, lzh, lzw...).
   The theoretical approach for LZ models was suggested in 1977 by
   A. Lempel and J. Ziv (LZ), with software implementations following in the
   early eighties.

PPM, or context modelling, is based on theoretical work conducted in 1986-95
    (T.C.Bell, J.G.Cleary, I.H.Witten, W.J.Teahan, A.Moffat ....).
    Software implementations appeared in the early nineties (HA, X1, ....).

DMC - Several DMC (Dynamic Markov Coding) algorithms are known, in
      particular Ross Williams' DHPC algorithm, as well as the DMC algorithm
      of Cormack & Horspool, which uses many heuristic methods (as does PPM).
      I am not aware of industrial archivers using DMC technology.


                   2.  LZ, PPM, ACB<-->DMC


  When two people speak with one another, they have to formulate their
ideas in detail at the beginning of the conversation, but the longer
the conversation lasts, the less detail is required. Why does this happen?
Many ideas have already been expressed and are understood by both parties
without further detail.

                   2.1  LZ-compression

LZ-compression substitutes the initial text with references to a dictionary;
it seems that this scheme resembles the methodology used by two interlocutors.

However, as the dictionary grows, the number of bits needed to encode a
reference grows in proportion to the binary logarithm of the dictionary
size. At the beginning, the length of the phrases in the dictionary (in the
simplest case, a binary tree) grows considerably faster than the length of
the references, but after some saturation the growth of the phrase length
asymptotically tends to a logarithmic dependence.
The optimal size of the dictionary varies for different types of data:
the more variable the data, the smaller the optimal dictionary size.
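
The logarithmic growth of the reference length mentioned above can be
illustrated with a small sketch. It assumes fixed-length references, which
is a simplification (real LZ coders usually encode references with
variable-length codes); the function name reference_bits is hypothetical.

```python
def reference_bits(dictionary_size: int) -> int:
    """Bits needed for a fixed-length reference into a dictionary
    holding `dictionary_size` entries (assumption: fixed-length codes)."""
    if dictionary_size < 1:
        raise ValueError("dictionary_size must be positive")
    # (size - 1).bit_length() equals ceil(log2(size)) for size >= 2.
    return max(1, (dictionary_size - 1).bit_length())


# Each 16-fold growth of the dictionary adds only 4 bits to a reference.
for size in (256, 4096, 65536, 1048576):
    print(size, reference_bits(size))
```

The reference becomes longer with every doubling of the dictionary, so a
larger dictionary pays off only while the phrases it supplies keep growing
faster than the references.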

                   2.2 PPM - algorithms (context modelling)

The idea of context modelling is based on the fact that the distribution of
probabilities over the alphabet depends on the nearest context; in other
words, letters are more likely to appear in particular patterns, that
is, next to or near other particular letters. For this technology, too,
there are fundamental limits on enlarging the sliding frame, that is, the
moving window of data for which a context model is built. A small sliding
frame with a correspondingly meagre context model is optimal on variable
data, for which the zero-frequency problem is especially acute. As the
sliding frame grows, more and more varied information is processed (e.g.,
a text in French, then a text in German and then a text in Russian enters
the frame), with the result that the probability distribution widens and
the effectiveness of context modelling (with short contexts) decreases
quickly. As the context length increases, costs increase exponentially.
The transition to context-mixing models has improved the situation to some
extent; however, the absence of theoretically substantiated schemes for
mixing probabilities is compensated by a large number of heuristics,
which takes us back to the times of alchemy.
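
As a minimal sketch of the context-modelling idea described above, the
following code counts which symbol follows each context of a fixed length
and turns the counts into probabilities. It is not PPM itself (there is no
escape mechanism and no mixing of orders); the function names and the
sample string are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_context_model(text: str, order: int) -> dict:
    """For every context of `order` symbols, count the symbols that follow."""
    model = defaultdict(Counter)
    for i in range(order, len(text)):
        model[text[i - order:i]][text[i]] += 1
    return model


def probability(model: dict, context: str, symbol: str) -> float:
    """Relative frequency of `symbol` after `context`; 0.0 if never seen."""
    counts = model.get(context)
    if not counts:
        return 0.0  # zero-frequency problem: the context itself is unseen
    return counts[symbol] / sum(counts.values())


sample = "the theory then thaws the thing"
model = build_context_model(sample, order=2)
print(probability(model, "th", "e"))   # 'e' is the usual successor of "th"
print(probability(model, "th", "z"))   # never seen after "th"

# The number of possible contexts grows exponentially with the order.
alphabet_size = 256
print([alphabet_size ** k for k in (1, 2, 3)])
```

The last line shows why long contexts are costly: with a 256-symbol
alphabet there are 256**k possible contexts of length k.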

                   2.3  ACB-compression

However, the brains of the interlocutors can, in some mysterious fashion,
overcome this barrier and use all the accumulated information through a
mechanism called associative memory.

The general principle of ACB-compression is very simple: the ACB-algorithm
puts on glasses with an associative filter and, looking at the past, sees
not the whole frame but only those fragments whose context is close to the
current (nearest) context. The fragments, or phrases, of the resulting
picture differ greatly in quality: phrases close by context stand out
distinctly from the other phrases, while less close phrases are less
distinct or not distinct at all. In this case the ACB-algorithm has ideal
conditions for work: it works more quickly because it considers larger
units and, moreover, it can weigh the value of phrases by considering the
probability of their appearance or use. Even with an unlimited increase
of the frame, ACB-compression always ensures an optimal size of the
context dictionary.
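
The notion of fragments that are "close by context" can be illustrated
with a toy sketch. It is not the ACB algorithm, whose details this article
does not give; it only ranks earlier positions in the frame by how long a
suffix of the text before them coincides with the text before the current
position. All names and the max_context limit are assumptions.

```python
def common_suffix_length(a: bytes, b: bytes) -> int:
    """Length of the longest common suffix of two byte strings."""
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def closest_contexts(history: bytes, top: int = 3, max_context: int = 16):
    """Rank earlier positions by similarity of their preceding context
    to the current context (the end of `history`)."""
    current = history[-max_context:]
    ranked = []
    for pos in range(1, len(history)):
        earlier = history[max(0, pos - max_context):pos]
        ranked.append((common_suffix_length(earlier, current), pos))
    # Best score first; among equal scores prefer the most recent position.
    ranked.sort(key=lambda item: (-item[0], -item[1]))
    # Return (position, score, the phrase that followed that context).
    return [(pos, score, history[pos:pos + 8]) for score, pos in ranked[:top]]


frame = b"abracadabra abracad"
for pos, score, phrase in closest_contexts(frame):
    print(pos, score, phrase)
```

The best-ranked position is the one whose preceding text most resembles
the current context, so the phrase that followed it is the most plausible
continuation; poorly matching positions fall to the bottom of the list.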

For some details about ACB-compression, see APPENDIX_A and APPENDIX_B.

                   2.4  DMC - Dynamic Markov Coding

Probabilistic models with a finite number of states can be described by a
finite automaton. A set of states S(i) together with a set of probabilities
of transition P(i,j) from state i into state j is called a Markov model.
Dynamic Markov Coding (DMC) is of practical interest in that it works
adaptively, starting from a simple initial model and adding new states
when necessary. PPM technology is a particular case of the DMC
approach. DMC allows the construction of context models not only for
single symbols (as with PPM, which considers letters of the alphabet),
but also for phrases or strings (Ross Williams' DHPC algorithm). In this
sense, ACB-compression can be classified as a variety of the DMC approach.
I am not aware of software implementations of DMC algorithms; I would
appreciate receiving any information on this issue (especially software
implementations for the IBM_PC).
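
The definition of a Markov model given above (states S(i) and transition
probabilities P(i,j)) can be sketched as follows. The code only estimates
P(i,j) from an observed sequence of states; it does not implement the
adaptive addition of states that distinguishes DMC. The function name and
the sample bit string are assumptions.

```python
from collections import Counter, defaultdict


def transition_probabilities(states):
    """Estimate P(i,j) from consecutive pairs in a sequence of states."""
    counts = defaultdict(Counter)
    for i, j in zip(states, states[1:]):
        counts[i][j] += 1
    return {
        i: {j: c / sum(row.values()) for j, c in row.items()}
        for i, row in counts.items()
    }


# Two states, "0" and "1", observed as a bit stream.
P = transition_probabilities("0010110100")
for i in sorted(P):
    for j in sorted(P[i]):
        print(f"P({i},{j}) = {P[i][j]:.2f}")
```

Each row of P sums to 1, and a coder driven by such a model would spend
fewer bits on the more probable transitions.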

                   3. Comparison: ACB, PPM, LZ

ACB.EXE v1.26a works only in "solid" mode, in which all files being
packed are joined into one solid stream; this allows the most effective use
of the advantages of a large frame. The least favourable data for solid mode
are test data: as a rule, varied information is gathered into one directory
to conduct tests. However, the high adaptivity of ACB-compression reduces
the possible loss to a minimum, and on real data solid mode gives a gain in
compression in 99% of cases. Below is a comparative table (as test
files, I have selected those transmitted by modem):

Compared: ACB v1.26a, PPMZ_8.1, RAR v2.0(DOS), ZIP v2.04g
DATA -	Sources of PPMZ_8.1, *.obj (PPMZ_8_1), PPMZ_8_1.EXE
Note: PPMZ_8.1 - Author is Charles Bloom <cbloom@mail.utexas.edu>.

 Directory of D:\GOR\TSTACB\ACB126A\PPMZ

[.]           	[..]          	ARITHC.OBJ  	VERSION.C   	PPMDET.C    
PPMZ.C      	MAIN.C      	PPMZHEAD.C  	PPMZ_CFG.C  	PPMARRAY.C  
PPMCODER.C  	PPMCODER.H  	PPMZ.H      	PPMARRAY.H  	PPMZHEAD.H  
PPMDET.H    	PPMZ_CFG.H  	README.TXT  	TODO.TXT    	LRU.TXT     
NOTES.TXT   	V81REP.TXT  	PPMZGNU.MAK 	PPMZ.MAK    	PPZ.BAT     
UNPPZ.BAT   	MAKEFILE    	PPMZ_NT.LNK 	BBITIO.C    	FILEUTIL.C  
ARITHC.C    	CRC32.C     	CINDCATR.C  	INTMATH.C   	CRC32.H     
FILEUTIL.H  	ARITHC.H    	BBITIO.H    	CINDCATR.H  	INTMATH.H   
O0CODER.C   	MEMPOOL.C   	ORDER0.C    	MEMUTIL.C   	RUNTRANS.C  
O0CODER.H   	MEMUTIL.H   	ORDER0.H    	MEMPOOL.H   	RUNTRANS.H  
TIMER.C     	STRUTIL.C   	STRUTIL.H   	TIMER.H     	INC.H       
CONTEXT.H   	CONTEXT.C   	BBITIO.OBJ  	CINDCATR.OBJ	CONTEXT.OBJ 
CRC32.OBJ   	FILEUTIL.OBJ	INTMATH.OBJ 	MAIN.OBJ    	MEMPOOL.OBJ 
MEMUTIL.OBJ 	O0CODER.OBJ 	ORDER0.OBJ  	PPMARRAY.OBJ	PPMCODER.OBJ
PPMDET.OBJ  	PPMZ.OBJ    	PPMZ_CFG.OBJ	PPMZHEAD.OBJ	RUNTRANS.OBJ
STRUTIL.OBJ 	TIMER.OBJ   	VERSION.OBJ 	PPMZ_8_1.EXE
        77 file(s)        374 462 bytes

As PPMZ_8.1 compresses only one file at a time, the directory PPMZ was
first packed into an archive without compression using RAR_2.0. PPMZ_8.1
requires too much RAM, so in order to conduct the test on a computer with
16 Mb of RAM, the test data were split into volumes of 128 Kb.

PPMZ     RAR       128 000  01.10.96  14:20
PPMZ     R00       128 000  01.10.96  14:20
PPMZ     R01       122 218  01.10.96  14:20
         3 file(s)        378 218 bytes

                    Mode (Options):
ACB  ACB_1.26a    - FAST/NORMAL/MAX. ( "TAUGHT CHANNEL"-mode is not used)
PPMs PPMZ_8.1     - select mode with max compression (-b) & 0_mode (-c0)
LZ   RAR_2.0(DOS) - all Solid, max compression
LZ   ZIP_2.04g    - max compression

P120/16/WIN95_DOS-SESSION
PPMZ_8_1.EXE compiled with BORLANDC_4.5 (Console_WIN32/speed)

ACB(PPMZ\*.*)  ACB(ppmz.r??) PPMZ            RAR(ppmz.r??)  PKZIP(ppmz.r??)

   Size/sec     Size/sec     Size/sec         Size/sec        Size/sec

B 83255/8      84155/8      93926/36 (-c0)   99116/3        110215/2  
b 82592/11     83441/11     93653/670 (-b)
u 82391/15     83153/16           ^^^
                                  !!!

The advantage of ACB-compression can be increased if ACB is tuned to some type
of data using the "TAUGHT CHANNEL" mode, with the context created in advance
from related information. Such a possibility is only a side effect of
this mode. Something similar occurs when options are chosen in order to obtain
the maximum compression coefficient (external information about the data type).
This approach is good if one wants to set world records, but is of little
practical use.

See APPENDIX_B for the results of ACB_1.26a on the "Calgary Corpus".

See APPENDIX_C for a comparative table of:
ACB_1.14a, ACB_1.23c, ACB_1.26a, RAR_2.0(DOS), PKZIP_2.04g.


                   4. "TAUGHT CHANNEL"

The idea: in compressing information transmitted over a communication
          channel, ALL!!! of the information previously transmitted over
          this channel is taken into account, so that any repetition of
          data transmitted in the past through the channel can be used by
          the compression algorithm.

Realization: The ACB-compression algorithm is implemented in the
             archiver ACB.EXE for DOS/WIN95 (ACB_1.26a is the latest version);
             it is written entirely in assembler (32-bit protected mode) and
             optimized for the Pentium processor.
             It supports the "TAUGHT CHANNEL" mode.

Innovations: ACB-compression is the only compression algorithm whose
             compression coefficient increases asymptotically with the
             growth of the sliding frame. Other algorithms can track no
             more than 8-65 Kb of the channel's history, depending on the
             type of data, as compared with 822 Kb in ACB_1.14a and
             1671 Kb in ACB_1.26a.
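
The taught-channel idea can be tried out with a stand-in technique: the
preset dictionary of deflate, available in Python's zlib module through the
zdict parameter. This is not ACB, and deflate can look back over at most
32 Kb of the supplied history, which is exactly the kind of limit mentioned
above. The helper names and the sample messages are assumptions.

```python
import zlib


def compress_with_history(message: bytes, history: bytes) -> bytes:
    """Compress `message`, priming the coder with earlier channel traffic.
    Stand-in for a taught channel: deflate with a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=history)
    return comp.compress(message) + comp.flush()


def decompress_with_history(packet: bytes, history: bytes) -> bytes:
    """The receiver must hold the same history to restore the message."""
    decomp = zlib.decompressobj(zdict=history)
    return decomp.decompress(packet) + decomp.flush()


# Hypothetical traffic: the June report was sent earlier, July is sent now.
history = b"ARCHIVE COMPARISON TEST, June issue: results table follows. "
message = b"ARCHIVE COMPARISON TEST, July issue: results table follows. "

taught = compress_with_history(message, history)
untaught = zlib.compress(message, 9)

assert decompress_with_history(taught, history) == message
print(len(message), len(untaught), len(taught))
```

Both ends of the channel keep the same history, so nothing extra has to be
transmitted; the second message costs only what distinguishes it from the
first.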

                   4.1 Testing with "taught channel" - mode

The stream of data consists of two types of data:
- ARCHIVE COMPARISON TEST (A.C.T.), author: JEFF GILCHRIST,
  issues for June, July, August, September, October ......;
- *.exe files, which are different versions of ACB.EXE.

       Mode (Options):
 ACB_1.26a - "TAUGHT CHANNEL"-mode
 ZIP_2.04g - max compression

Stream: TXT --> EXE --> TXT --> EXE .... TXT --> EXE

Stream       *.ACB     *.ZIP
of data       size      size

A.C.T_6.95   10662     12812
ACB_1.13b    35500     37884
A.C.T_7.95    4134     13485
ACB_1.14a     9042     36976
A.C.T_8.95    2123     12849
ACB_1.15b     8446     37125
A.C.T_9.95    1942     13057
ACB_1.17a     7731     37247
A.C.T_10.95   1819     13591
ACB_1.20a     7349     37751
A.C.T_11.95   1658     14060
ACB_1.23a     7222     37811
A.C.T_2.96    1503     14133
ACB_1.23b     7417     37987
A.C.T_3.96    2864     15971
ACB_1.23c     5764     38002

      Total   115284   410741


                   5.  Conclusion

The Pentium-100 encourages the creation of new algorithms, one of which is
ACB-compression.

If you are working on the problem of data compression for communication
channels, ACB-compression is the most effective algorithm for compressing
a stream of varied information.

---------------------------------------------------------------------------
APPENDIX_A:
                      Complexity of ACB-compression

Time:   T(n) = n/k * ( S + Log2( n/(2*k) ) ) + o(1)
Memory: M(n) = n + n/8 + n*(Log2(n)+1)/4 + 11000

n - sliding frame size (for ACB v1.14 n=822000 bytes & v1.26a n=1671000 bytes),
S - context dictionary size
for ACB_1.14 : S=53, for ACB_1.26a : S=100 (MAX.), S=55 (NORMAL), S=31 (FAST)
k = I/C (compression ratio), where I is the data size and C is the code size,
T(n)code=T(n)decode,
M(n)code=M(n)decode.

Note:
- memory requirements can be decreased in proportion to the compression
  ratio k. However, this possibility is not used in ACB.EXE, in order
  to simplify the memory manager.
  M'(n) = n + n/8 + n*(Log2(n)+1)/(4*k) + 11000

- If S=1, the compression rate will be approximately the same as that of
  LZ-schemes on small files (32-64 Kb). However, as the file size increases,
  the advantage of ACB-compression grows (even with S=1), and all its valuable
  properties (on_the_fly, unrestricted growth of the sliding frame ...) remain.

- Log2(n)+1 is the size of the pointer (bits), necessary for addressing
  each byte of the sliding frame 

For n=4096 bytes   M(n)=28920 bytes
    n=32768        M(n)=178936
    n=65536        M(n)=363256
    n=1048576      M(n)=6695672
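
The values in the table above follow directly from the formula for M(n),
as this short check shows. The function name is hypothetical, and for frame
sizes that are not powers of two the pointer size is taken as the bit
length of n, which is an assumption.

```python
def acb_memory_bytes(n: int) -> int:
    """M(n) = n + n/8 + n*(Log2(n)+1)/4 + 11000, in bytes."""
    # For a power of two, n.bit_length() equals Log2(n) + 1.
    pointer_bits = n.bit_length()
    return n + n // 8 + n * pointer_bits // 4 + 11000


for n in (4096, 32768, 65536, 1048576):
    print(f"n={n:<8} M(n)={acb_memory_bytes(n)}")
```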

Property:
       _____        _____
Data  |     | Code |     |  Data
----->| ACB |----->| ACB |-------->
Time1 |_____|      |_____| Time2

Time1=Time2

The first compressed byte generates the code sufficient for
restoration of this byte (on_the_fly).

On a Pentium_133 the coding speed is 28/64 Kbits/sec; if the transmission
speed of the communication line is <= 28/64 Kbits/sec, then the running
time of ACB does not affect the total transmission time.
---------------------------------------------------------------------------
APPENDIX_B:

------ ACB_1.26a -----

Computer: (Intel - P120/RAM 16M/MS-DOS)

            ---FAST--    --NORM---    ---MAX---
            bpc / time   bpc / time   bpc / time
                   sec.         sec.         sec.
bib.acb     2.0111   3   1.9687   4   1.9504   7
book1.acb   2.4074  43   2.3560  63   2.3309  94
book2.acb   2.0078  25   1.9719  37   1.9590  55
geo.acb     4.6904   6   4.6830   8   4.6815  11
news.acb    2.3976  14   2.3570  21   2.3365  31
obj1.acb    3.5440   1   3.5350   1   3.5346   1
obj2.acb    2.2919   7   2.2596  11   2.2432  14
paper1.acb  2.4049   2   2.3755   2   2.3715   3
paper2.acb  2.4037   3   2.3730   4   2.3634   6
pic.acb     0.7898   8   0.7698  11   0.7563  15
progc.acb   2.3807   1   2.3554   1   2.3497   2
progl.acb   1.5520   1   1.5345   2   1.5277   2
progp.acb   1.5341   1   1.5188   1   1.5154   1
trans.acb   1.3316   2   1.3148   2   1.3064   2
------------------------------------------------
AVR.        2.2676       2.2409       2.2304
------------------------------------------------

APPENDIX_C:

Here are the results comparing: ACB_1.26a,ACB_1.23c,ACB v1.14a,RAR_2.0(DOS),
PKZIP_2.04g.

P120/16 RAM/256 Kb cache/DOS

--------------/ *.DOC /---------------

Unformatted text: 10 files containing parts of works of fiction (WINWORD_6.0):

AHERON   DOC       118 784 15.02.95   11:08
DREM1    DOC       478 208 16.11.95   16:22
EV1S1    DOC       153 600 23.03.95   11:59
FKLP1    DOC       151 040 19.13.95   12:07
KING     DOC        48 128 10.05.94   18:55
LINE1    DOC       624 128 16.11.95   16:32
MAMA1    DOC       336 896 17.11.95   12:29
PRINCC1  DOC       258 048 17.11.95   12:50
RUSHA    DOC       343 040 29.11.94   11:29
DV1_1    DOC       156 160 17.03.96   10:18
       10 file(s)      2 668 032 bytes

                        FAST_MODE  |  NORMAL_MODE |  MAX._MODE
                      SIZE    TIME | SIZE    TIME | SIZE    TIME
                      bytes   sec. | bytes   sec. | bytes   sec.
                                   |              |
ACB_1.14  .........................| 820644  268  |
ACB_1.23c ..........  844396  133  | 797307  241  |  790186  361
ACB_1.26a ..........  812506  173  | 796485  238  |  788363  336
RAR_2.0(DOS) -m5 -s ..............................| 1055743   47
PKZIP_2.04g  -ex .................................| 1096831   23


--------------/ EXE /---------------

Package SYMANTEC C++ v.6.1 (only for DOSX)  - 3712545 bytes

                        FAST_MODE  |  NORMAL_MODE  |  MAX._MODE
                      SIZE    TIME | SIZE    TIME  | SIZE    TIME
                      bytes   sec. | bytes   sec.  | bytes   sec.
                                   |               |
ACB_1.14  .........................| 1561486  321  |
ACB_1.23c .......... 1457590  151  | 1418865  272  | 1411415  405
ACB_1.26a .......... 1430522  197  | 1417541  273  | 1411213  381
RAR_2.0(DOS) -m5 -s ...............................| 1727004   89
PKZIP_2.04g  -ex ..................................| 1856873   31

=========================

George Buyanovsky
Internet E-mail:  new < acb@online.ru > , old < george@acb.alma-ata.su >

P.S. The Registered Version of ACB.EXE Ver.1.26a can be purchased and received
     immediately on the Internet at Albert's Ambry.  Registration at Albert's
     also eliminates shipping and handling costs.  Please go to:

           http://www.alberts.com

       Search on: acb_126a.zip

       Click on the "Buy It" Hotlink to register this software.

       Thank you for registering this program.
