Information theory has enabled us to quantify the degree to which we
can specify the complexity of physical systems. Just as length and
weight are measured in terms of, say, inches and pounds, so too
complexity is measured in units called "bits" of information. The
complexity of a physical system is characterized by the number of
permutations and combinations that its internal states are free to
assume within the constraints imposed by its boundaries.
But what may be surprising is that this complexity is quantified by
the number of states that deny the system an existence. This is why
biological structures possess vast information content. Unlike
inorganic systems, the *sequence* of an organism's DNA's nucleotide
bases is critical to its survival. Stated differently, all other
permutations and combinations of the DNA's building blocks deny the
organism its existence. The reason is that an organism *is* its DNA
sequence; e.g., elephants and ants differ *only* in the sequence of
nucleotide bases along their DNA.
MEASURING COMPLEXITY
--------------------
This sequence is but one of a myriad of permutations and combinations
of system states that are measured in terms of "bits." But the
measurement of complexity is deceptive. The reason is that the number
of bits grows only logarithmically with the number of states available
to the system. This means that a large change in complexity is
measured by a comparatively small number of bits.
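To make the logarithmic relationship concrete, here is a short Python
sketch (my addition, not part of the original text): the bits needed to
single out one state among N equally likely states is log2(N), so even a
million-fold increase in states adds only about 20 bits.

```python
import math

def bits_for_states(n_states: float) -> float:
    """Bits needed to specify one state among n_states possibilities."""
    return math.log2(n_states)

# A million states take ~20 bits; a trillion states take only ~40.
print(bits_for_states(10**6))   # ~19.9 bits
print(bits_for_states(10**12))  # ~39.9 bits
```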
Marcel Golay (Anal. Chem., June 1961, 33:23A) calculated the
information content of the universe. His conclusion was that the
universe is fully described by 220 bits of information. My calculations
are given in Appendix 1, and extend from 235 to 293 bits, depending on
assumptions. They can also be found in Origins & Destiny, 1986,
Appendix 6, Word Publishing, London. James Trefil (Space, Time,
Infinity, 1985, Smithsonian Books, Washington, DC) gives the
informational content of our universe as 300 bits.
Conversely, R. Setlow and E. Pollard estimate that the information
content of protein and bacteria ranges from a low of 10,000 bits to a
high of over 10**12 bits (Molecular Biophysics, 1962, Ch. 3(10):71,
Addison-Wesley, Reading, MA). Moreover, they are not alone in these
estimates (H. Yockey, J. Theor. Biol., 1977, 67:365; 1974, 46:369).
COMPARING COMPLEXITY
--------------------
Is it preposterous that systems within the universe could contain more
information than the universe in its entirety? How, for example, can a
bacterium contain several million bits of information when the universe
taken as a whole has under 300?
The answer lies in the knowledge that whereas the existence of each
living biological structure depends on the integrity of its DNA
blueprint, a vast number of permutations and combinations of the DNA's
nucleotide building blocks exists that will deny the organism its
existence. The universe is described by far less information because
many permutations and combinations of its internal states cannot be
distinguished from one another; far fewer configurations remain to
deny it an existence. Stated differently, the universe can possess far
less information than the life it contains because, as a system, its
existence (and therefore its description) depends on the configuration
of its internal states in ways radically different from the way life's
existence depends on its DNA sequence.
To illustrate the point, imagine a simple room constructed from a
modest blueprint, but that contains a complex nuclear device created
from a very sophisticated blueprint. Less information is needed to
describe the plan characterizing the room, than is required to describe
the nuclear plan. Is there a contradiction? No. This example presents
no conceptual difficulty because we have *not* assumed that the nuclear
device originated from, say, the walls vibrating or, for that matter,
any activity in the walls or ceiling.
MATERIALISM AND THEISM
----------------------
A conceptual problem occurs in understanding how a bacterium with
several million bits can be found in a universe possessing 270 bits.
Why does this problem arise? It stems from the presupposition that
natural laws operating within the universe produced the bacterium.
This idea
results from materialism - a philosophy that asserts: All that exists
is matter and its motion. If one accepts this philosophy, then s/he
must believe that in one way or another, the bacterium is the outcome
of activity of physical matter within the universe. One is then left
with the conundrum that a system capable of spawning structures of
complexity under 300 bits somehow created systems possessing several
million bits.
However, if this presupposition is untrue, then the conceptual problem
disappears. Just as someone can design a simple room and a complex
device that is later placed within it, so too a Supreme Intelligence
can design a less complex universe, and then the more complicated
system called life. Thus, in principle, a Supreme Intelligence is free
to construct two systems of differing complexity using two sets of
informational specifications that He creates. The reason two
specifications can exploit the identical properties of physical matter
is that the physical matter is created by the same Intelligence that
produces the informational specifications. Therefore the reason "the
universe can contain systems with more information than the universe
considered as a whole" is that the information it contains is a
specification separate and distinct from the one that brought it into
being.
MEASURING INFORMATION
---------------------
Information can be broken down into units commonly referred to as
"bits." These should not be confused with "computer bits," whose
context is memory and logic, whose states are dynamic, and whose
nomenclature includes a radix. Unlike the fixed number of computer bits
within an application, informational bits increase logarithmically with
the magnitude of the information. This means that large changes of
information are measured by comparatively small numbers of bits.
The number of bits of information corresponding to various library
sizes is calculated in Appendix 2, and listed below. The average book
size is assumed to be 200 pages. The letters K, M, B respectively
denote thousand, million, billion. The 40 million books listed for the
Library of Congress is a weighted mean, and corresponds to 24 million
books, 19 million pamphlets, 3 million maps and 34 million
miscellaneous items.

No. Books    Library                No. Bits
---------    -------                --------
1            N/A                          32
1K           Tiny                         42
20K          Research Lab                 46
100K         Public Library               48
4M           Large University             54
40M          Library of Congress          57
50B          All Human Knowledge          70
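Most rows of this table can be reproduced by a simple rule: 32 bits for
one book (the Appendix 2 estimate) plus the bits needed to index the
number of books. The Python sketch below is my reading of that rule,
not the author's stated formula; rounding conventions differ slightly
in the last row.

```python
import math

BITS_PER_BOOK = 32  # per-book information estimate from Appendix 2

def library_bits(n_books: int) -> float:
    """One book's content plus the bits needed to index n_books."""
    return BITS_PER_BOOK + math.log2(n_books)

for n, label in [(1, "N/A"), (10**3, "Tiny"), (2 * 10**4, "Research Lab"),
                 (10**5, "Public Library"), (4 * 10**6, "Large University"),
                 (4 * 10**7, "Library of Congress")]:
    print(f"{label:20s} {library_bits(n):5.1f} bits")
```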
The information content of physical systems is given below. It can be
estimated several ways, as shown in Appendix 1.
System               No. Bits
------               --------
Earth                     160
Solar System              170
Universe                  235
Protein                  1500
Simple Bacterium           7M
Human Cell                20B
______________________________________________________________________
APPENDIX 1
==========
The basic particle count of the universe is estimated to be of the
order of 10**80. A 3D blueprint specifying a distribution on a scale of
this magnitude is described by about 270 bits of information.
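The figure follows directly from the particle count; a one-line Python
check (my addition):

```python
import math

particles = 10**80                    # estimated particle count of the universe
blueprint_bits = math.log2(particles)
print(blueprint_bits)                 # ~265.8, i.e., "about 270 bits"
```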
Another way to estimate the informational specification for the
universe is to approximate the information content of a blueprint for
planet earth, and then multiply by the appropriate number of planets.
In terms of percentages of the whole by weight, the earth consists of:
46.5 O, 28.0 Si, 8.1 Al, 5.1 Fe, 3.5 Ca, 2.8 Na, 2.5 K, 2.0 Mg, 0.58
Ti, 0.20 C, 0.20 H, 0.19 Cl, 0.11 P, 0.10 S, and 0.12 percent trace
elements (R. Cotterill, Cambridge Guide to the Material World, 1985,
Ch. 7:100, Cambridge University Press). In total, the weighted mean of
the atomic weight of all earth's elements is 24.3.
Since the earth's mass is about 2 x 10**26 tons, and since the weighted
mean of the atomic weight of all its elements is 24.3, then if we
assume that the average molecule is composed of at least three atoms,
the maximum number of molecules that comprise the earth is of the order
of 10**54. This gives a blueprint containing about 180 bits.
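As a sanity check on this arithmetic, the Python sketch below (my
addition) rederives the molecule count, assuming metric tons and
triatomic molecules of mean atomic weight 24.3:

```python
import math

AVOGADRO = 6.022e23          # molecules per mole
earth_mass_g = 2e26 * 1e6    # 2 x 10**26 tons, taken here as metric tons
mean_mol_weight = 3 * 24.3   # triatomic molecules, mean atomic weight 24.3

molecules = earth_mass_g / mean_mol_weight * AVOGADRO
print(molecules)             # ~1.65e54, the "1.6 x 10**54" used below
print(math.log2(molecules))  # ~180 bits
```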
However, when employing this perspective, each molecule is envisioned
within a space equal to one-half the quotient of earth's volume (2.6 x
10**11 cubic miles) and the number of mean weighted molecules (1.6 x
10**54), i.e., a cube of the order of 2.7 x 10**-10 inches on a side.
This is over two orders of magnitude smaller than the minimum distance
allowed by interatomic bond considerations (about 3.5 x 10**-8 inches
for triatomic molecules). If we use this more realistic constraint, the
information content is lowered to about 160 bits. Golay independently
calculated the information content of the earth from biological
considerations and obtained a maximum of 150 bits (1961, op. cit.).
As regards the universe, the average number of stars in a galaxy is
about 3 x 10**11. Therefore allowing 2 x 10**10 galaxies to exist out
to the visible horizon (recent data may imply 5 x 10**10), and further
granting 10 planets per star (an extremely generous assumption),
estimates of the information content of the universe range from a low
of 220 bits (Golay's number) to a high of 235 bits (earth's information
extended through a 30 billion light-year diameter space).
The apparent discrepancy between these numbers and the earlier 270 bit
estimate occurs because the 235 bit figure ignores the information
content of the stars. As a practical matter, stars are primarily
composed of hydrogen, and their blueprint is informationally sterile
when compared to earth. This, of course, includes our sun. However, if
we choose to ignore this fact and artificially infuse the sun with
substantive information, then its blueprint can be estimated as
follows: The sun's mass is about 7 x 10**28 tons, giving a maximum of
about 10**58 hydrogen molecules and a blueprint of about 195 bits.
Although unrealistically high, extending this to 300 thousand million
stars per galaxy and 20 billion galaxies yields two bits short of 270
for the universe. If one wants to force the information content of the
universe to a maximum, then a consideration of the number of photons
(10**88) based upon the 3 degree Kelvin background temperature within
its space yields 293 bits.
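These three star-based figures can be checked the same way; the sketch
below (my addition, again assuming metric tons and H2 at ~2 g/mol)
lands within a couple of bits of the text's rounded values:

```python
import math

# Sun: 7 x 10**28 tons of hydrogen molecules (H2, ~2 g/mol).
sun_molecules = 7e28 * 1e6 / 2 * 6.022e23
print(math.log2(sun_molecules))   # ~193.7 bits ("about 195")

# All stars: 3 x 10**11 per galaxy times 2 x 10**10 galaxies.
total = math.log2(sun_molecules) + math.log2(3e11 * 2e10)
print(total)                      # ~266 bits (the text, starting from a
                                  # rounded 195, gets "two bits short of 270")

# Photon-count maximum from the 3 K background.
print(math.log2(10**88))          # ~292.3 bits (cited as 293)
```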
The reason that biological structures have considerably higher
information content lies in the fact that unlike inorganic systems, the
sequence of the building blocks is critical to the survival of the
structure. Estimates of the information content of protein and bacteria
are given in the literature (Setlow and Pollard, 1962, op. cit.).
Typical estimates for bacteria range from a low of 10**4 bits to a high
of over 10**12.
The spread is large because the calculations are based on widely
differing interpretations of the biological structure. However, a
reasonable estimate can be made as follows: In the simplest of cells,
such as a bacterium, a minimum of one hundred metabolic reactions must
be performed by at least that many enzymes. In addition there must be
ribosomes to synthesize these enzymes, accompanied by RNA and
regulatory molecules, as well as a long DNA double helix. The DNA of
E. coli has been well studied, and its one millimeter genome is known
to have about 2 million base pairs. Since triplets of base pairs
specify amino acids in accord with a uniform genetic code, each triplet
defines an amino acid residue.
Although the information content at each DNA site depends upon the
number of synonymous residues as well as which ones are present, we can
reasonably estimate the order of magnitude of what we seek by
approximating the information at three bits per residue. This yields an
information content for the genome of about 7 x 10**6 bits.
Interestingly, a reasonable estimate for the information content of the
human body is possible by observing that each of its cells has a total
of about 6 x 10**9 base pairs.
Since the genetic code is essentially universal (evidence exists that
it may not be), we estimate the information content of a human cell to
be about 2 x 10**10 bits. Some believe that all of the DNA may not be
useful, and that certain segments may contain no information. This, of
course, means that we are presently unable to understand the function
it performs. But even if the information existed along as little as 1
percent of the DNA, its magnitude would still be so vast that the
conclusions remain unaltered. The actual estimate, however, is about 60
percent.
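Both genome totals are reproducible if the three-bit estimate is
applied to every base pair; that is my assumption in the Python sketch
below (the text's wording says "per residue," but the arithmetic that
matches its totals is per base pair):

```python
bits_per_site = 3                # the text's ~3-bit estimate per DNA site

bacterium = 2e6 * bits_per_site  # E. coli genome, ~2 million base pairs
human_cell = 6e9 * bits_per_site # human cell, ~6 x 10**9 base pairs

print(bacterium)    # 6e6, the "about 7 x 10**6 bits" order of magnitude
print(human_cell)   # 1.8e10, the "about 2 x 10**10 bits" order of magnitude
```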
______________________________________________________________________
APPENDIX 2
==========
INTRODUCTION
------------
Human knowledge is conveyed in terms of "information." However, the
term has a meaning beyond just "informing" or "telling." It is an
entity that scientists use to quantify the complexity of systems. This
can be a physical system with properties defined by the organization of
component parts, or it may be a semantic system with meaning created by
the organization of letters and words. In what follows, we calculate
the informational metric of words, books and all human knowledge.
NUMBER OF WORDS
---------------
A typical dictionary, e.g., Webster's 9th New Collegiate, has
approximately 40,000 words. Let's assume the worst and increase this
to, say, 64,000 words.
ENGLISH SYMBOL STATES
---------------------
An average word length is about 8 letters, i.e., eight symbols, each
capable of 26 total states. Thus, whereas the total number of possible
combinations is (26)**8 = 2 x 10**11 = 2**37.6, or about 38 bits, the
actual number in the English language is equivalent to a reduced
alphabet defined by X**8 = 64,000 = 2**16, or X = 4, i.e., to a reduced
language symbol of approximately 4 states, and an information
equivalent of log2(4**8) = 16 bits. We therefore conclude that all
English word labels can be described by a computer word capable of
2**16 states, and that an English word label has, on average, 16 bits
of information.
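These figures are easy to verify; a short Python check (my addition):

```python
import math

full_alphabet_bits = 8 * math.log2(26)  # 8 letters, 26 states per letter
print(full_alphabet_bits)               # ~37.6 bits, i.e., 26**8 = 2 x 10**11

reduced_states = 64_000 ** (1 / 8)      # solving X**8 = 64,000 gives X = ~4
word_bits = math.log2(64_000)           # bits per English word label
print(reduced_states, word_bits)        # ~4.0 states, ~16 bits
```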
The length of an average sentence is about 14 words, each subdividable
into binary groups. One example might be: (FIVE THOUSAND) (YEARS AGO)
(PEOPLE IN EGYPT) (FOOLISHLY THOUGHT) (THAT LIFE ORIGINATED) (IN GRAIN)
Many of the binary groups are interchangeable with little or no loss in
meaning, e.g.,
FIVE THOUSAND YEARS AGO LIFE CAME FROM GRAIN PEOPLE THOUGHT
PEOPLE THOUGHT FIVE THOUSAND YEARS AGO LIFE CAME FROM GRAIN
LIFE CAME FROM GRAIN PEOPLE THOUGHT FIVE THOUSAND YEARS AGO
FIVE THOUSAND YEARS AGO PEOPLE THOUGHT LIFE CAME FROM GRAIN
PEOPLE THOUGHT LIFE CAME FROM GRAIN FIVE THOUSAND YEARS AGO
LIFE CAME FROM GRAIN FIVE THOUSAND YEARS AGO PEOPLE THOUGHT
These and other combinations preserve the message and essentially have
the same information content. A more careful analysis shows that about
half the word groups are interchangeable with little if any loss in
meaning. The total number of unique, message preserving transformations
is therefore about twice those permitted for the average binary group.
BINARY INFORMATION EQUIVALENT
-----------------------------
Although the average English word label requires 16 bits of
information, the information equivalent for each binary group is
considerably less than twice the 16 bits for each of the binary labels,
i.e., less than the 32 bits needed to code the total number of unique,
message preserving transformations that are possible among all the
binary groups. The reason is that the vast majority of the 64,000
English words do not meaningfully modify one another. Out of the total,
the number of understandable adjectives that, on average, can
meaningfully precede each of the 64,000 words is estimated to be the
square of the natural log of the total, i.e., about [Ln(64,000)]**2 =
[Ln(2**16)]**2 = (16 Ln 2)**2 = 123. This says that, on average, we
estimate that 123 meaningful adjectives exist for each of the 64,000
words in the English language (a generous assumption).
On average, the total number of meaningful message preserving sentence
states is, therefore, of the order of (2) * (2**7) * (2**16) = 2**24
(the 123 adjectives rounding to 2**7 = 128), or about 24 bits.
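Both steps of this estimate check out numerically; a Python sketch (my
addition):

```python
import math

# "Square of the natural log of the total": (Ln 64,000)**2 = (16 Ln 2)**2
adjectives = (16 * math.log(2)) ** 2
print(adjectives)                       # ~123 meaningful adjectives per word

# Sentence states: 2 * 2**7 * 2**16, with 123 rounded up to 2**7 = 128
sentence_bits = math.log2(2 * 2**7 * 2**16)
print(sentence_bits)                    # 24.0 bits
```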
EQUIVALENT SENTENCE SEQUENCES
-----------------------------
We estimate an average paragraph to be about 12 or so sentences.
However, the meaning is surprisingly preserved when large numbers of
sentences are interchanged. For example, consider the opening 4
sentences taken from "Origins and Destiny" (page 67 - The History of
Life):
1. Five thousand years ago people thought that life came from grain.
2. They would see rats running from piles of wheat and they thought
that the wheat had given rise to life.
3. They also thought that water was the source of life.
4. For example, when they saw toads hop out of ponds, they sincerely
believed it was the spontaneous generation of life.
The meaning is clear in each of the following sequences of sentences:
1234, 2134, 1243, 2143, 3412, 3421, 4312, 4321, etc.
Also, the second sentence is equivalent to:
2a. They would see rats running from piles of wheat.
2b. They thought that the wheat had given rise to life.
These, in turn, can be interchanged without loss of meaning. In
general, any long sentence can be shortened into smaller parts that can
be interchanged many ways with no loss in meaning.
Meaning is preserved when sentences are interchanged because, in
general, the meaning of each sentence more or less follows from the
preceding one, and leads into the next. Thus the importance of the
sequence in which sentences appear increases with the distance down the
paragraph. We estimate that for, say, 12 sentences, a minimal
requirement is to distinguish between the initial 6 sentences and the
latter 6 or, to be more conservative, to distinguish among three
groups, each composed of 4 sentences. This means that in a paragraph
composed of 12 sentences, message preservation requires preservation of
the sequence of groups of 4 sentences each.
SENTENCE GROUP INFORMATION EQUIVALENCE
--------------------------------------
These considerations show that to code the unique sentence groups found
in the average English paragraph requires an additional informational
capacity of 3! = 6 orderings, corresponding to an additional 2 to 3
bits. If we further allow for the (likely) possibility that some of the
4 sentence sequences will be disallowed, then the estimate of the total
needed information increases to something of the order of 5 bits.
BOOK INFORMATION CONTENT
------------------------
On average, many book chapters can be interchanged. When this is not
so, the necessary additional information proves minimal in comparison
to the 24 + 5 = 29 total bits thus far needed. Typical chapters have 20
to 40 paragraphs - also interchangeable, but to a lesser extent. We
thus require about 3 additional bits, 1 for the chapter sequence and 2
for the paragraph sequence, raising the total to 32 bits per book.
Since a plurality of books consists of separate groups of chapters, the
information (like different crystals) is additive.
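The running total behind the 32-bit-per-book figure can be tallied
explicitly (my addition, using the figures derived above):

```python
# Per-book estimate, step by step, using Appendix 2's figures.
sentence_bits = 24      # meaningful message-preserving sentence states
paragraph_bits = 5      # sentence-group sequencing within a paragraph
chapter_bits = 1        # chapter sequence
paragraph_seq_bits = 2  # paragraph sequence within chapters

book_bits = sentence_bits + paragraph_bits + chapter_bits + paragraph_seq_bits
print(book_bits)        # 32 bits per book
```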
NUMBER OF U.S. LIBRARIES
------------------------
The Statistical Abstract of the United States (1986), U.S. Department
of Commerce, Bureau of the Census (106th Edition) reports the total
number of public, academic and miscellaneous (medical, law, government,
religious, etc.) libraries in the United States in 1984 to be 29,465,
4,989, and 3,348 respectively (a total of 37,802 libraries). The
American Library Directory (1986), NY, R.R. Bowker & Co. (36th Edition)
lists, on average, 14 libraries in the United States on each of 1,930
pages, for a total of about 27,000 U.S. libraries. Thus the near 38,000
total is reasonable.
The World Almanac (1986), NY, Newspaper Enterprise Association, shows
that the 56 largest U.S. libraries hold, on average, 1.63 million
volumes. The average across all libraries is considerably lower than
this, being of the order of, at most, several hundred thousand or so.
If we assume that each of the 38,000 U.S. libraries holds, on average,
one million volumes (the true number is closer to one tenth of this),
then the maximum total number of bound volumes in the United States
approaches 40 billion.
NUMBER OF BOUND VOLUMES OUTSIDE U.S.
------------------------------------
The World Guide to Libraries / Internationales Bibliotheks-Handbuch
(1986), London, K.G. Saur (7th Edition / 7. Ausgabe) lists about 252
libraries worldwide on each of 153 pages, for a total of 38,556.
Scanning the pages shows that the total number of bound volumes in
these libraries is typically tens of thousands. However, if we again
increase our estimate by about ten (to be conservative) and allow, on
average, 250,000 volumes per library, then the total number of bound
volumes outside the United States is estimated to be of the order of
10 billion.
TOTAL BOUND VOLUME INFORMATION
------------------------------
The total number of volumes worldwide, counting all volumes, is thus
estimated to be about 50 billion. This corresponds to 2**X = 50 x
10**9, so that X Ln(2) = Ln(50) + 9 Ln(10) = 3.91 + 20.72 = 24.63.
Therefore X = 24.63 / Ln(2) = 24.63 / 0.693 = 35.5 or, about, 36 bits.
Thus the total information content of all human knowledge is of the
order of 36 bits plus 32 bits, or 68 bits total. This is an extremely
generous estimate, not only in terms of the average volumes per library
being unrealistically high by an order of magnitude, but also in terms
of the countless millions of duplicate books that exist within these
libraries.
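The final total follows in two lines of Python (my addition):

```python
import math

volumes = 50e9                   # ~50 billion bound volumes worldwide
index_bits = math.log2(volumes)  # bits to index every volume
print(index_bits)                # ~35.5, i.e., "about 36 bits"
print(32 + round(index_bits))    # 32 bits per book + 36 = 68 bits total
```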
INFORMATION CONTENT OF HUMAN KNOWLEDGE
--------------------------------------
We therefore conclude that all human knowledge has an information
equivalent of something of the order of 68 bits. However, and for the
reasons cited earlier, the actual number is likely to be significantly
less than this, i.e., lower by at least 12 percent or so. This shows
that at best, the maximum information that can be rationally attributed
to human knowledge is 68 bits or so. We sometimes find 2**60 published
in the literature (e.g., Golay), but this is in error; 60 bits is the
number intended. Such carelessness is not unusual among older men of
science, particularly when the subject matter is intended for a
technical audience who, by training, tend to know when such liberties
are being taken.