From : Ed Beroset                                           1:3641/1.250
Subj : SQUARE ROOT CODE


FWIW, I timed a few different square root implementations, using a simple
driver that calls each of them with every integer from 0 to 1000000.  Here are
the results I got on a 486DX4-100 with 8M running in a DOS box under Win95:

Version    avg time (sec)    Source
-------    --------------    ------
slagel      n/a              John A. Slagel (snippets file sqrt1.asm)
leach       67.00            Steve Leach (a few days ago in this echo)
magic       61.51            "Assembly Language Magic" (snippets sqrt2.asm) 
sqrt86      4.89             Ed Beroset (8086 version posted today)
sqrt        4.23             Ed Beroset (80386 version a few days ago)
coffin      4.12             based on Jerry Coffin (snippets file sqrt3.asm)
fastsqrt    3.46             synergistic combination of sqrt86 and coffin
fsqrt       1.54             coprocessor 

Notes:
slagel -- 8086+.  16 bit only version.  It seems to be based on basically the
same algorithm as implemented in coffin, but not as well done, so I didn't
convert this one.

leach -- 386+.  there was a typo in the posted version -- the mov eax,ecx just
after the label root22_ret should have read mov eax,ebx.

magic -- 8086+.  no problems encountered.

sqrt86 -- 8086+.  could be optimized.  posted version does redundant
multiplications when passed a perfect square.

sqrt -- 80386+.  no problems encountered, but trashes a lot of registers.

coffin -- 8086+.  As posted in the snippets, it's a 16-bit only version, but I
wrote an implementation using the same algorithm which accepts 32-bit numbers.

fastsqrt -- 8086+.  Based on Jerry's algorithm combined with a better first
guess routine from my previously posted versions.  Since it was the fastest
version which didn't use an NPU, I'll append it to the end of this message.

fsqrt -- 8087+.  no problems encountered.  has unique advantages of accuracy and
the fact that it uses no CPU registers.  Unfortunately, it requires an NPU.

I found it interesting that the NPU version was over twice as fast as the
nearest competitor.  FWIW, since each routine was assembled and linked as a
separate program and then timed using a batch file, these times reflect not only
the time required for the algorithm but also include the overhead required for
loading the program, so the actual speed advantage is probably even greater. 
Also note that the accuracy of each of these versions was not checked or
compared, although I do know that the following routine is more accurate than
the previous ones that I posted.  If I get ambitious, I'll test and post
accuracy comparisons as well.

;/***************************************************************************
;
;   Name:
;       sqrt
;
;   History:
;       adapted on Fri  09-22-1995  by Ed Beroset
;         from a public domain 16-bit version written by Jerry Coffin
;         with a better first guess approximation for speed.
;
;       released to the public domain by the author
;
;   Purpose:
;       Calculate the integral portion of the square root of a
;       non-negative 32-bit integer.
;
;   Algorithm:
;
;       First, use a series of 32-bit shifts to scan for the bit number
;       of the most significant set bit.  The result of this operation
;       is the integer part of the log (base 2) of our original.  We use
;       this information to generate a good first guess for our
;       algorithm, since 2**((log[2](X))/2) = sqrt(X). That is, if we
;       halve the log of a number and raise the base of our log to that
;       power, we get the square root of the original.  To assure that
;       we don't get a divide error, all less significant bits of the
;       guess are also set.  This is done by decrementing the single-bit
;       guess (this has the effect of setting all bits of lesser
;       significance) and then ORing the single bit back in.
;
;       Once we have that number, we do a series of successive
;       approximations by dividing our target by our guess.  This
;       division gives the other factor which is then averaged with our
;       guess to converge quickly on the correct answer.
;
;       We decide if we are "correct" or not by checking the error term,
;       which is the difference of the current guess and next guess.  The
;       loop stops once that difference is 2 or less.
;
;   Entry:
;
;     DX:AX = the number whose square root we're seeking; DX must not
;             be 0FFFFh, or the routine bails out without a result
;
;   Register usage within the routine:
;
;        AX = scratch register (for divides)
;        BX = current guess
;        CX = error term
;        DX = scratch register (for divides)
;     DI:SI = copy of original number
;
;   Exit:
;
;        BX = calculated square root
;
;   Trashed:
;       CX, SI, DI
;
;***************************************************************************/
        p8086
proc sqrt
        mov     di,dx                   ; save DX:AX in DI:SI first, so the
        mov     si,ax                   ;   error exit restores valid values
        cmp     dx,0ffffh               ; check for domain error
        jz      error                   ; bail if it's too big
        mov     cx,31                   ; max bit count
FindSetBit:
        shl     ax,1                    ; do 32-bit shift left
        rcl     dx,1                    ;   of DX:AX
        jc      GotSetBit               ; if we found MSB, we're done
        loop    FindSetBit              ; keep going
GotSetBit:
        mov     bx,1                    ; set up our initial guess
        shr     cx,1                    ; log2(x)/2, x = original number
        shl     bx,cl                   ; now shift bit into position
        mov     cx,bx                   ; save it
        dec     bx                      ; set all lesser bits
        or      bx,cx                   ; and or the two together
        mov     cx,2                    ; our error term
top:
        mov     dx,di                   ; reload DX:AX
        mov     ax,si                   ;
        div     bx                      ; ax = dx:ax / bx
        xchg    ax,bx                   ; save current guess in ax
        add     bx,ax                   ; add two factors
        rcr     bx,1                    ; and divide by two for next guess
        sub     ax,bx                   ; error term = current - next
        cmp     ax,cx                   ; at or under threshold (<= 2)?
        ja      top                     ; if not, keep going
error:
        mov     dx,di                   ; reload DX:AX
        mov     ax,si                   ;
        ret                             ;
endp sqrt
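
For anyone who wants to check accuracy without an assembler handy, here is a
rough C model of the first-guess and averaging steps described in the header
comment.  It is only a sketch: the names msb_index, first_guess and
isqrt_model are made up for illustration and are not part of the posted code.
It assumes the error term wraps at 16 bits, as SUB does, and it does not model
the divide fault that DIV raises on quotient overflow.

```c
#include <stdint.h>

/* Bit number of the most significant set bit; 0 for x == 0 or x == 1.
   Intended to match what the FindSetBit loop leaves in CX. */
unsigned msb_index(uint32_t x)
{
    unsigned k = 0;
    while (x >>= 1)
        k++;
    return k;
}

/* First guess: a single bit at position log2(x)/2, with every
   less significant bit set as well (dec, then or). */
uint16_t first_guess(uint32_t x)
{
    uint16_t bit = (uint16_t)(1u << (msb_index(x) / 2));
    return (uint16_t)((bit - 1u) | bit);
}

/* Hypothetical model of the routine.  Returns 1 and stores the result
   in *root, or returns 0 on the domain error (high word == 0xFFFF). */
int isqrt_model(uint32_t x, uint16_t *root)
{
    uint16_t guess, diff;

    if ((x >> 16) == 0xFFFFu)
        return 0;

    guess = first_guess(x);
    do {
        uint16_t q = (uint16_t)(x / guess);               /* div bx       */
        uint16_t next =
            (uint16_t)(((uint32_t)q + guess) >> 1);       /* add, rcr     */
        diff = (uint16_t)(guess - next);                  /* wraps like SUB */
        guess = next;
    } while (diff > 2);                                   /* ja top       */

    *root = guess;
    return 1;
}
```

Under this model, isqrt_model(1000000, &r) gives r = 1000, while an input of
24 gives 5 rather than 4, because the loop stops as soon as two successive
guesses differ by 2 or less.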

--- Squish v1.01
 * Origin: = Psychotronic BBS // 919-286-4542 // Durham, NC = (1:3641/1.250)

