Mathcenter Forum

Mathcenter Forum (https://www.mathcenter.net/forum/index.php)
-   Junior high school mathematics problems (https://www.mathcenter.net/forum/forumdisplay.php?f=31)
-   -   Heron Square root (https://www.mathcenter.net/forum/showthread.php?t=4627)

seidon 31 May 2008 12:57

Heron Square root
 
How does extracting a square root with Heron's method work? And does anyone happen to know of an explanation in English?

Both for numbers that are decimals and for positive real numbers.

คusักคณิm 31 May 2008 14:34

Ancient Square Roots

The ancient Babylonians had a nice method of computing square roots
that can be applied using only simple arithmetic operations. To find
a rational approximation for the square root of an integer N, let k
be any number such that k^2 is less than N. Then k is slightly less
than the square root of N, and so N/k is slightly greater than the
square root of N. It follows that the average of these two numbers
gives an even closer estimate

k_new = (k + N/k) / 2

Iterating this formula leads to k values that converge very rapidly
on the square root of N. This formula is sometimes attributed to
Heron of Alexandria, because he described it in his "Metrica", but
it was evidently known to the Babylonians much earlier.
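
For concreteness, here is a minimal Python sketch of this iteration
(the function name heron_sqrt and the fixed iteration count are just
illustrative choices):

    def heron_sqrt(N, k, iterations=6):
        # repeatedly replace k by the average of k and N/k
        for _ in range(iterations):
            k = (k + N / k) / 2.0
        return k

    # example: sqrt(7), starting from the rough guess k = 2
    print(heron_sqrt(7, 2))    # ~ 2.6457513110..., and sqrt(7) = 2.6457513...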

It's interesting to note that this formula can be seen as a special
case of a more general formula for the nth roots of N, which arises
from the ancient Greek "ladder arithmetic", what we would now call
linear recurring sequences. The basic "ladder rule" for generating
a sequence of integers to yield the square root of a number N is the
recurrence formula

s[j] = 2k s[j-1] + (N-k^2) s[j-2]

where k is the largest integer such that k^2 is less than N. The
ratio s[j+1]/s[j] approaches sqrt(N) + k as j goes to infinity. The
sequence s[j] is closely related to the continued fraction for
sqrt(N). This method does not converge as rapidly as the Babylonian
formula, but it's often more convenient for finding the "best"
rational approximations for denominators of a given size.

As an example, to find r = sqrt(7) we have N=7 and k=2, so the
recurrence formula is simply s[j] = 4s[j-1] + 3s[j-2]. If we choose
the initial values s[0]=0 and s[1]=1 we have the following sequence

0, 1, 4, 19, 88, 409, 1900, 8827, 41008, 190513, 885076, ...

On this basis, the successive approximations and their squares are

  r                      r^2
  -------------------    -----------------
  (4/1) - 2              4.0
  (19/4) - 2             7.5625
  (88/19) - 2            6.9252...
  (409/88) - 2           7.0104...
  (1900/409) - 2         6.99854...
  (8827/1900) - 2        7.000201...
  (41008/8827) - 2       6.9999719...
  (190513/41008) - 2     7.00000390...
  (885076/190513) - 2    6.999999457...
  etc.
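
For readers who want to experiment, here is a short Python sketch of
this ladder computation (the function name ladder_sqrt is just an
illustrative choice):

    from fractions import Fraction

    def ladder_sqrt(N, k, terms=10):
        # ladder sequence s[j] = 2k*s[j-1] + (N - k^2)*s[j-2]; the ratios
        # s[j+1]/s[j] - k converge to sqrt(N)
        s = [0, 1]
        while len(s) < terms:
            s.append(2 * k * s[-1] + (N - k * k) * s[-2])
        approximations = [Fraction(s[j + 1], s[j]) - k for j in range(1, len(s) - 1)]
        return s, approximations

    s, approx = ladder_sqrt(7, 2)
    print(s)                      # [0, 1, 4, 19, 88, 409, 1900, 8827, 41008, 190513]
    for r in approx:
        print(r, float(r * r))    # r and r^2, with r^2 approaching 7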

Incidentally, applying this method to the square root of 3 gives the
upper and lower bounds used by Archimedes in his calculation of pi.
(See Archimedes and Sqrt(3).)

The reasoning by which the ancient Greeks arrived at this "ladder
arithmetic" is not known, but from a modern perspective it follows
directly from the elementary theory of difference equations and
linear recurrences. If we let q denote the ratio of successive
terms q = s[j+1]/s[j], then the proposition is that q approaches
sqrt(N) + k. In other words, we are trying to find the number q
such that
(q - k)^2 = N

Expanding this and re-arranging terms gives

q^2 - 2k q - (N-k^2) = 0

It follows that the sequence of numbers s[0]=1, s[1]=q, s[2]=q^2,..,
s[n]=q^n,... satisfies the recurrence

s[n] = 2k s[n-1] + (N-k^2) s[n-2]

as can be seen by dividing through by s[n-2] and noting that q equals
s[n-1]/s[n-2] and q^2 = s[n]/s[n-2]. Not surprisingly, then, if we
apply this recurrence to arbitrary initial values s[0], s[1], the
ratio of consecutive terms of the resulting sequence approaches q.
(This is shown more rigorously below.)

It's interesting that if we assume an EXACT rational (non-integer)
value for sqrt(N) and exercise the algorithm in reverse, we generate
an infinite sequence of positive integers whose magnitudes are
strictly decreasing, which of course is impossible. From this
the Greeks might have inferred that there can not be an exact
rational expression for any non-integer square root, and thereby
discovered irrational numbers.

A similar method can be used to give the 3rd, 4th, or any higher root
of any integer using only simple integer operations. In fact, this
same method can be applied to find the largest real root of arbitrary
polynomials. To illustrate, consider the general polynomial

x^k - c1 x^(k-1) - c2 x^(k-2) - ... - ck = 0        (1)

The corresponding linear recurrence is

s[n] = c1 s[n-1] + c2 s[n-2] + ... + ck s[n-k] (2)

If we let r1,r2,...,rk denote the k distinct roots of this polynomial,
then s[n] is given by

s[n] = A1 r1^n + A2 r2^n + ... + Ak rk^n

where the coefficients A1,A2,..,Ak are constants which may be
determined by k initial values of the sequence. If r1 is the
greatest of the roots (in terms of the norms) and is a positive
real number then, as n increases, the first term of s[n] becomes
dominant, and so the ratio s[n+1]/s[n] approaches

[A1 r1^(n+1)] / [A1 r1^n] = r1

Hence we can use the recurrence relation (2) to generate arbitrarily
precise rational approximations of the largest root of (1). This is
why the ratio of successive Fibonacci numbers 1,1,2,3,5,8,13,21,...
goes to the dominant root of x^2 - x - 1, which is [1+sqrt(5)]/2.
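
As a rough Python sketch of this idea (the function name dominant_root
is just an illustrative choice), the dominant root can be estimated
directly from the recurrence (2):

    def dominant_root(coeffs, terms=60):
        # estimate the dominant root of x^k - c1*x^(k-1) - ... - ck = 0 by
        # iterating s[n] = c1*s[n-1] + ... + ck*s[n-k] and taking s[n]/s[n-1]
        k = len(coeffs)
        s = [0] * (k - 1) + [1]    # arbitrary initial values
        for _ in range(terms):
            s.append(sum(c * s[-i - 1] for i, c in enumerate(coeffs)))
        return s[-1] / s[-2]

    # Fibonacci recurrence s[n] = s[n-1] + s[n-2]: dominant root of x^2 - x - 1
    print(dominant_root([1, 1]))    # ~ 1.6180339887..., the golden ratio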

To illustrate, consider the "Tribonacci sequence", whose recurrence
formula is s[n]=s[n-1]+s[n-2]+s[n-3], and whose characteristic
equation is the cubic

x^3 - x^2 - x - 1 = 0

The roots of this polynomial are

r1 = 1.839286755214161..
r2 = -0.419643377607081.. + i 0.606290729207199..
r3 = -0.419643377607081.. - i 0.606290729207199..

If we arbitrarily select the initial values s[0]=1, s[1]=2, and s[2]=4,
we have the coefficients

A1 = 1.137451572282629..
A2 = -0.068725786121314.. + i 0.12352180763419..
A3 = -0.068725786121314.. - i 0.12352180763419..

All these numbers can be expressed as nested radicals given by the
solution of the cubic, but I'm too lazy to type them. Anyway, the
nth term of the recurrence is

s[n] = A1 r1^n + A2 r2^n + A3 r3^n

Notice that the magnitudes of r2 and r3 are less than 1, so as we
take higher powers these terms become progressively smaller, and
soon become negligible. On the other hand, r1 is greater than 1,
so as n increases the first term becomes larger, and we have
s[n] -> A1 r1^n as n goes to infinity, and so s[n]/s[n-1] approaches
the root r1.
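
Here is a quick numerical check of this example in Python, using the
same initial values 1, 2, 4:

    # Tribonacci-type sequence; the ratio of consecutive terms settles
    # down to r1 = 1.839286755...
    s = [1, 2, 4]
    for _ in range(40):
        s.append(s[-1] + s[-2] + s[-3])
    print(s[-1] / s[-2])    # ~ 1.839286755214161...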

There are many fascinating things about linear recurrences like this.
For example, they can be used to test numbers for primality. (See the
article on "Symmetric Pseudoprimes".) By the way, to avoid complex
numbers, we can also express s[n] for the Tribonacci sequence as

s[n] = A1 r1^n + 2B^n [u cos(nq) - v sin(nq)]

where
B = 0.737352705760327...
q = 2.176233545491871...
u = -0.068725786121314..
v = 0.12352180763419..

Naturally we can also use this method to generate rational
approximations for the jth root of any integer M, simply by
considering the polynomial x^j - M = 0. The only difficulty is that
the magnitudes of all j roots of this polynomial (the complex jth
roots of M) are all equal, so the ratio of successive terms doesn't
converge. This is clear from the recurrence s[n] = M s[n-j], which
just gives j disjoint geometric sequences. However, we can solve this
problem by a simple change of variables. If we define y such that
x = y - k, where k^j is the greatest jth power less than M, we can
make this substitution to give the polynomial (y-k)^j - M = 0. For
example,
with j=2 (for square roots) we have

y^2 - 2k y - (M - k^2) = 0

The corresponding recurrence is

s[n] = 2k s[n-1] + (M-k^2) s[n-2]

which is the same as the "ladder arithmetic" square root method given
earlier. Notice that in order to give the roots of the characteristic
polynomial very unequal magnitudes, so that the positive real square
root is dominant and convergence is rapid, we make the substitution
x = y - k, which simply shifts the roots in the positive real
direction, so that the negative square root lies closer to zero and
the positive square root lies further from zero.

The same simple method can be applied to higher-order roots, noting
that the nth roots of the number N are arranged in a circle on the
complex plane, centered on the origin. By simply shifting this circle
of roots in the positive real direction we can make the magnitude of
the positive real root greater than the magnitude of any of the other
roots (because the radius of the original circle of roots is smaller
than the new positive real root). To illustrate, the cube root of 2
can be found using this method by means of the recurrence

s[n] = 3s[n-1] - 3s[n-2] + 3s[n-3]

With the initial values 0,0,1 we have the sequence

0, 0, 1, 3, 6, 12, 27, 63, 144, 324, 729, 1647, 3726,...

After every three steps the three most recent terms share a common
factor of 3, which we can divide out to keep the terms from growing
so fast. Restarting from the reduced terms each time gives

0, 0, 1, 3, 6, 12, ...
1, 2, 4, 9, 21, 48, ...
3, 7, 16, 36, 81, 183, ...
12, 27, 61, 138, 312, 705, ...
46, 104, 235, 531, 1200, 2712, ...

and so on. In this way we can find the rational approximations (the
indices below refer to the original, unreduced sequence)

s[6]/s[5] - 1 = 5/4
s[13]/s[12] - 1 = 29/23
s[17]/s[16] - 1 = 63/50
s[25]/s[24] - 1 = 635/504

The cube of 635/504 is 1.999998024...
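
The following Python sketch runs the same shifted recurrence for the
cube root of 2 without the factor-of-3 reductions, using exact
fractions:

    from fractions import Fraction

    # shifted recurrence for the cube root of 2 (k = 1), as above:
    # s[n] = 3*s[n-1] - 3*s[n-2] + 3*s[n-3], and s[n]/s[n-1] - 1
    # approaches 2^(1/3)
    s = [0, 0, 1]
    for _ in range(30):
        s.append(3 * s[-1] - 3 * s[-2] + 3 * s[-3])

    r = Fraction(s[-1], s[-2]) - 1
    print(r, float(r) ** 3)    # the cube of the approximation is very close to 2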

This approach gives a simple linear recurrence, but when determining
higher order roots it is clearly not very efficient, because the nth
roots of the number N are uniformly arranged in a circle on the
complex plane, centered on the origin. It's easy to see that we
can't make the magnitude of the positive real root much greater
than the magnitudes of the neighboring roots by just shifting in
the positive real direction.

To improve the convergence, let's approach the problem systematically.
The polynomial x^n - N = 0 has n complex roots x1,x2,..,xn, all of
equal magnitude. Let x1 denote the positive real root, and suppose
X is a positive real estimate of x1. We can then define a polynomial
with the roots w1,w2,..,wn given by

wj = (xj + X) / (xj - X)

Since X is a positive real number, it is closer to x1 than to any of
the other x roots, and so x1 - X is the smallest denominator, and
x1 + X is the largest numerator. Consequently, w1 is the w value
with the largest magnitude. Solving the above expression for xj
gives
xj = X (wj + 1) / (wj - 1)

Substituting xj into the original polynomial x^n - N = 0 gives
an equation with the roots wj

X^n [(w + 1) / (w - 1)]^n - N = 0

Expanding this and clearing the denominator gives the polynomial

(X^n - N) w^n + n(X^n + N) w^(n-1) + ... = 0

Since w1 is the largest root (probably by a large margin), and it's
one of at most two real roots, and the other real root is the smallest,
a rough estimate of w1 is just the sum of the w roots, which equals
the negative of the coefficient of w^(n-1) of the monic form of the
polynomial (i.e., the above polynomial divided by its leading
coefficient). Letting W denote this estimate, we have

W = n (X^n + N) / (N - X^n)

An improved estimate for x1 is therefore X(W+1)/(W-1), which gives

X_new = X [(n-1)X^n + (n+1)N] / [(n+1)X^n + (n-1)N]
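
A minimal Python sketch of this improved iteration (the function name
nth_root_step is just an illustrative choice):

    def nth_root_step(X, N, n):
        # one step of X_new = X*((n-1)*X^n + (n+1)*N) / ((n+1)*X^n + (n-1)*N)
        Xn = X ** n
        return X * ((n - 1) * Xn + (n + 1) * N) / ((n + 1) * Xn + (n - 1) * N)

    # cube root of 2, starting from the rough estimate X = 1
    X = 1.0
    for _ in range(4):
        X = nth_root_step(X, 2, 3)
    print(X)    # ~ 1.2599210498..., the cube root of 2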


Although this looks quite different, it is actually a generalization
of the ancient Babylonian (and Heron's) method for square roots. To
see this, note that a slightly less efficient application of this
method would be to replace w with 2w+1, which leads to the
substitution

x = X (w + 1) / w

If we substitute this expression for x into the original equation
x^n - N = 0 and expand, we get a polynomial in w such that the real
positive x root maps to the largest w root, and the remaining w roots
are relatively small (although they are not as small as with the
previous substitution, which is why this formula isn't quite as
efficient). For square roots we have n=2, and taking the negative of
the coefficient of w in the monic form of this polynomial as a good
estimate W of the largest root, the improved estimate X(W+1)/W works
out to

X_new = X (X^2 + N) / (2X^2 + 0) = (X + N/X) / 2

which is the ancient Babylonian/Heron formula. In contrast, using
the more nearly optimum substitution with the same approach leads to
the formula for square roots

X_new = X (X^2 + 3N) / (3X^2 + N)

which converges slightly faster than the ancient formula.
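
A small Python comparison, using the same arbitrary starting value
X = 3 for both formulas, illustrates the difference for sqrt(2):

    from decimal import Decimal, getcontext
    getcontext().prec = 50

    N = Decimal(2)
    root = N.sqrt()
    X_old = X_new = Decimal(3)    # same starting guess for both formulas
    for step in range(1, 6):
        X_old = (X_old + N / X_old) / 2                               # (X + N/X)/2
        X_new = X_new * (X_new ** 2 + 3 * N) / (3 * X_new ** 2 + N)   # X(X^2+3N)/(3X^2+N)
        # the error in the right-hand column shrinks noticeably faster
        print(step, abs(X_old - root), abs(X_new - root))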

It's interesting to notice the appearance of the quantity (w+1)/(w-1)
in this context, considering the relation between root extractions
and logarithms (if x^n = N then ln(x) = ln(N)/n), because we also
have ln[(w+1)/(w-1)] = 2[1/w + 1/(3w^3) + 1/(5w^5) + ...]. Is it
true that the substitution (w+1)/(w-1) is optimal for extracting
the nth root?

[FC]_Inuyasha 03 June 2008 21:07

It's so long, I can't manage to translate it.

Maths_Master 21 June 2008 14:39

.....................?

pro_math_bie_hong 22 June 2008 09:56

I'm very confused.

Anonymous314 22 June 2008 21:41

Where did you copy it from, คusักคณิm? That way I can find it.

คusักคณิm 23 June 2008 07:34

I don't remember anymore; I copied it a long time ago. It's been sitting on my computer.

mathematiiez 27 June 2008 21:02

Oh, thank you very much. I got to practice reading English too, haha.
P.S. I wore myself out flipping through the dictionary, lol.

The TaNgz 30 August 2008 14:31

Whoa,
that is really long.
Is there a translated version?
^^+

