Origin of Life – Theories and Genetics

T.C. Goldsmith May, 2001
Revised May 2002

For more extensive and recent information on Genetics, Theories of Aging, and Evolution Theories see: http://www.azinet.com/aging/

New Site dedicated to Info on Human Aging: http://www.programmed-aging.org

Recent rapid advances in the study of genetics and molecular biology have produced additional insight into fundamental questions such as the origin of life on Earth. This paper provides an overview of Genetics from an engineering perspective as well as discussions of these questions.

As a computer engineer, I became interested in genetics because of the striking similarities between genetic processes and computer science. There is a digital genetic code. There is digital logic with “and” functions and logic matrices. There is even “error correction” in the digital copying of genetic code.

Genetics

All living organisms have a genetic code generally represented by the sequence of nucleotides in their DNA. Since there are four possible bases used in construction of the code (denoted A, G, C, and T), each letter in the code carries two bits of information. The DNA is normally in the form of a double strand (the famous “double helix”) where the second strand is complementary to the first strand. That is, in the second strand a sequence such as “AGCTTT” is replaced by “TCGAAA” which carries the same information. So the terminology “base pair” refers to one letter of genetic code represented by the base and its complement and equivalent to two bits of information in computer parlance.
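The complement rule and the two-bits-per-letter count can be sketched in a few lines of Python. This is only an illustration: the particular bit pattern assigned to each base is an arbitrary assumption, and the sketch follows the base-by-base convention used above, ignoring the direction in which each strand is read.

```python
# Pairing rule: A pairs with T, and G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Arbitrary two-bit code for the four bases (an assumption for illustration).
BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def to_bits(strand):
    """Encode a strand as a string of bits, two bits per base."""
    return "".join(BITS[base] for base in strand)


first = "AGCTTT"
second = complement(first)
print(second)                       # TCGAAA
print(complement(second) == first)  # True: the second strand carries the same information
print(len(to_bits(first)))          # 12 bits for 6 bases
```

Because taking the complement twice returns the original sequence, either strand is enough to reconstruct the other.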

Humans have a genetic code (the “genome”) of about 3.3 billion base pairs (6.6 gigabits or 825 megabytes). Yes, the human genome would easily fit on a typical laptop hard drive. Each human has two copies of the genome in virtually every cell of his or her body because one set is inherited from each parent. Actually males have about 2 percent less code in one of their genomes because they only have one “X” chromosome. During growth and normal life processes cells endlessly read and interpret the genetic code, copy various snippets of the code, and use the copies as templates in the manufacture of proteins.
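The storage figures above follow directly from the two-bits-per-base-pair rule. A minimal check, assuming decimal units (one gigabit is 10^9 bits and one megabyte is 10^6 bytes):

```python
BASE_PAIRS = 3_300_000_000   # approximate human genome size quoted in the text
BITS_PER_BASE_PAIR = 2       # four possible bases -> two bits each


def genome_size(base_pairs):
    """Return (gigabits, megabytes) needed to store one copy of the genome."""
    bits = base_pairs * BITS_PER_BASE_PAIR
    gigabits = bits / 1e9
    megabytes = bits / 8 / 1e6
    return gigabits, megabytes


gigabits, megabytes = genome_size(BASE_PAIRS)
print(f"{gigabits:.1f} gigabits, {megabytes:.0f} megabytes")  # 6.6 gigabits, 825 megabytes
```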

Early naturalists thought that genetic traits were inherited in more or less “analog” fashion in which offspring had an average of their parents’ characteristics. Gregor Mendel was the first to realize, through extensive experiments with the breeding of peas, that at the lowest level inheritance is discrete, and that there is a minimum unit of inheritance now known as a “gene”. Mendel found that some traits are “recessive” and can appear in progeny even if neither parent shows the trait. He also found that inheritance of a trait is independent of inheritance of other traits.

Genes are now known to be implemented as sequences of genetic code that direct specific cells to produce a particular protein at a particular time. An essentially infinite number of possible different protein molecules can be produced depending on the particular order of amino acid molecules used in their construction. The code for protein production has been “broken” so that we now know that a three-letter sequence (a codon) is used to specify a particular amino acid (there are 20 amino acids). For instance, the sequence GGC specifies that the amino acid glycine is to be added to a protein molecule. Start and stop codons mark the beginning and end of a protein coding sequence in a manner startlingly like modern data communications schemes. There are 64 possible codons and only 20 amino acids, so most amino acids are specified by more than one codon, which provides redundancy and some tolerance of copying errors. The regulatory code sequences in genes that specify in which parts of the body and/or at which times a protein will be produced are much more complex and less well understood.
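The codon scheme can be illustrated with a short Python sketch. The table below is deliberately partial (the full table has 64 entries), the function name and test sequences are invented for illustration, and real cells read the code through an RNA copy rather than directly from DNA letters.

```python
from itertools import product

BASES = "ACGT"
# Every possible three-letter codon: 4 * 4 * 4 = 64.
ALL_CODONS = ["".join(letters) for letters in product(BASES, repeat=3)]

# Partial codon table, written in DNA letters (illustrative subset only).
PARTIAL_CODON_TABLE = {
    "ATG": "Met",                                            # also the usual start codon
    "GGA": "Gly", "GGC": "Gly", "GGG": "Gly", "GGT": "Gly",  # four codons, one amino acid
    "TTC": "Phe", "TTT": "Phe",
    "TAA": "STOP", "TAG": "STOP", "TGA": "STOP",
}


def translate(dna):
    """Translate from the first start codon up to the first stop codon."""
    start = dna.find("ATG")
    if start == -1:
        return []
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = PARTIAL_CODON_TABLE.get(dna[i:i + 3], "???")  # "???" = not in this subset
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


print(len(ALL_CODONS))               # 64
print(translate("ATGGGCTTTGGATAA"))  # ['Met', 'Gly', 'Phe', 'Gly']
print(translate("ATGGGTTTTGGATAA"))  # same protein: GGC -> GGT still means glycine
```

The last line shows the redundancy at work: a change in the third letter of the glycine codon leaves the protein unchanged.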

For humans, the genes are contained in 23 separate strands of DNA known as chromosomes (46 if both sets of code are counted). Early estimates put the number of human genes at approximately 45,000; later counts of protein-coding genes are closer to 20,000. The number of chromosomes is not indicative of complexity. Dogs have 78; horses have 64. Ferns have 512.

The international Human Genome Project (HGP) has completed a preliminary “draft” sequencing of the entire human genome. Sequences of a small number of other organisms such as the mouse, fruit fly, and E. coli are completed or in progress. Having the sequence is very different from understanding what it means.

Mendelian Genetics is somewhat like Newtonian Physics: a very good first approximation that later observations refined. Eventually scientists noticed subtle deviations from the inheritance model predicted by Mendel. Specifically, inheritance of certain traits was not completely independent of other traits. We now know that inheritance of traits will be fully independent only if they are carried by different chromosomes, and that for traits carried by the same chromosome the probability of their being inherited separately increases with the physical distance between the two genes on the chromosome. Extensive inheritance studies have resulted in maps of the genome showing the approximate location of some trait genes and human genetic disease genes on specific chromosomes. This information can eventually be combined with the detailed sequence data to disclose the genes which (when incorrect) are responsible for genetic diseases. There are an estimated 3000 different human genetic diseases.

Gene Logic

The genetic code has been compared to a blueprint specifying the design of an organism. In fact the genetic code specifies not only the design of the organism but also the mechanisms needed to “read” the code and manufacture the components of the organism, as well as the procedures needed for the life processes of the finished organism. Simple organisms are completely defined genetically. Each tiny adult nematode worm of the species C. elegans (in its common hermaphrodite form) has exactly 959 body cells. Humans, on the other hand, have trillions of cells and fewer than 100,000 genes so the genetic code is more of a general plan. For example, major blood vessels are genetically specified. Everybody has an aorta. But minor blood vessels grow where needed according to genetically defined rules.

Although all the somatic cells in an organism contain the complete genetic code, in any given cell only relatively few genes are active. The difference in the genes that are active determines the difference between, say, liver and brain cells. A complex gene logic determines when and where a particular gene will be “turned on”. The gene logic can accommodate varying amounts of positional detail. The eye, which has a complex structure in which adjacent cells can be very different, presumably requires many genes to implement a relatively small structure. The femur is much larger but much less complex and requires less genetic information. The gene logic also controls when various activities will take place. Cells divide rapidly in growing organisms but do not divide in adults unless needed to replace dead or discarded cells. (Cancer involves a major breakdown in the gene logic in which cells grow at both an inappropriate position and an inappropriate time. Cancer is thought to require multiple mutations, some of which can be inherited.)

The gene logic is implemented using signaling proteins. That is, genes can control the production of proteins which are the building blocks of muscle and other structural components in growing cells, but can also control the production of other proteins which act as logic signals. These logic signals can then be received by other genes and determine whether those genes are activated. Some logic proteins are long range in that they can travel through essentially the entire organism (think insulin). Other shorter range signals appear only near their point of origin, possibly only immediately around the cell in which they are generated. Since genes can both generate and detect signaling proteins, many genes together can implement a very complex logic. The positional logic framework which governs where in the body specific types of cells are found is an example of a “bootstrap” problem. The logic framework itself has to be constructed as the organism grows from a one-cell fertilized egg to an adult.
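To a computer engineer, the arrangement described above resembles a network of logic gates whose outputs feed back into their inputs. The following toy model is only a sketch of that idea: the three genes, their activation rules, and the signal names are all hypothetical, and real gene regulation is graded and far more complicated than on/off logic.

```python
# Each hypothetical gene has an activation rule (a function of the signals
# currently present) and a product that it makes when active.
GENES = [
    {"name": "gene_a", "active_if": lambda s: True,
     "produces": "signal_1"},
    {"name": "gene_b", "active_if": lambda s: "signal_1" in s,
     "produces": "signal_2"},
    # An "and" function: needs both signals before it switches on.
    {"name": "gene_c", "active_if": lambda s: "signal_1" in s and "signal_2" in s,
     "produces": "structural_protein"},
]


def step(signals):
    """Return the set of products made when the genes see the given signals."""
    return frozenset(g["produces"] for g in GENES if g["active_if"](signals))


def run(initial=frozenset(), max_steps=10):
    """Iterate the network until the set of products stops changing."""
    history = [frozenset(initial)]
    for _ in range(max_steps):
        nxt = step(history[-1])
        if nxt == history[-1]:
            break
        history.append(nxt)
    return history


for generation, products in enumerate(run()):
    print(generation, sorted(products))
```

Starting with no signals at all, the network builds up its own signaling framework one step at a time before the structural protein appears, which is a small-scale picture of the bootstrap problem.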

A mutation occurs when the genetic code in a cell is altered such that descendant cells formed by division of the altered cell also have the altered DNA. If the mutation occurs in the chain of cell division between the original fertilized egg and reproductive (sperm or egg) cells (the germ line), then the mutation can be passed to progeny. Many other mutations presumably have no effect because they occur in genes that are never activated in the descendants of the affected cells. (A mutation in a gene that was only active in the brain would have no effect if it occurred in the line of cells that were to form a leg, etc.)

Evolution takes place by means of mutations which affect the germ line. Often a mutation results in loss of some essential function and is therefore fatal to progeny and not passed on to living descendants. Sometimes the mutation results in an evolutionary advantage and therefore may eventually become universal in descendants. Sometimes a mutation results in characteristics which are different (such as a red eye color in a species that previously had only brown eyes) but confers no particular advantage or disadvantage and so becomes common but not universal in descendants. Higher organisms also have extensive non-functional sections in their genetic code. Mutations in the non-functional parts of the code would have no observable effect on the organism and therefore would be passed to descendants. Since the rate at which mutations occur should be relatively constant, differences in non-functional code can be used to estimate the time since two individuals shared a common ancestor.
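The dating idea in the last sentence can be sketched as simple arithmetic. If mutations accumulate at a constant rate in each of two lineages, the fraction of sites that differ is roughly twice the rate times the elapsed time, so the time is the difference divided by twice the rate. The sequences and the mutation rate below are hypothetical numbers chosen only to show the calculation, and the sketch ignores complications such as repeated mutations at the same site.

```python
def fraction_different(seq_a, seq_b):
    """Fraction of positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def years_since_common_ancestor(diff_fraction, rate_per_site_per_year):
    """Both lineages mutate independently, hence the factor of two."""
    return diff_fraction / (2 * rate_per_site_per_year)


# Toy aligned sequences: 1 difference in 8 positions.
print(fraction_different("ATATGCGC", "ATATGCGT"))  # 0.125

# Hypothetical inputs: 1.2 percent difference, 1e-9 mutations per site per year.
years = years_since_common_ancestor(0.012, 1e-9)
print(f"about {years:,.0f} years")
```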

The HGP indicates that the human genome contains about 50 percent apparently non-functional code, much of which consists of many repetitions of simple sequences such as ...ATATATATATAT... which clearly have little or no information content. Some of the repeat sequences are known to be necessary synchronization patterns such as the sequences at the beginnings and ends of chromosomes. The purpose, if any, of other repeat sequences is unknown.

Origin of Life on Earth

So, what does all this have to do with the origin of life?

The genetic code represents an historical record of the development of the organism with an extraordinary amount of detail (825 megabytes is a lot of detail!). An organism which shares significant code sequences with another organism very likely has a common ancestor. By looking at changes in non-functional DNA we can estimate the time since that ancestor lived. By comparing genomes we can construct a “family tree” of life on Earth.

Based on data from the HGP and other sources we can say things like the following:

All humans are descended from a single individual who lived about 270,000 years ago.

Humans and chimpanzees share an ancestor which lived about 7 million years ago.

Humans and mice share a common ancestor which lived roughly 75 to 90 million years ago.

All life on Earth is thought to be descended from an original primordial single-celled organism which lived about 3.5 billion years ago.

The Earth was formed about 4.5 billion years ago but was probably incompatible with life until perhaps 3.8 billion years ago so life apparently appeared relatively quickly.

As more genetic code data becomes available on various other organisms, and as analysis of differences and similarities of codes progresses, the entire family tree of life on Earth will eventually be developed and more will be known about the characteristics of the primordial organism.

So, where did that original primordial organism come from?

There seem to be several schools of thought.

Life Appeared Spontaneously

Some scientists believe that life arose spontaneously from available materials present on the early Earth. In fact, experiments have been conducted in which air, water, carbon dioxide, methane, and common minerals were “cooked” in the presence of energy sources such as heat, sunlight, and simulated lightning to see if life or precursors of life would appear. Indeed, organic “building blocks” such as amino acids did appear.

But it is a very, very long way from amino acids to a life form. The genetic work indicates that the complexity of genetic codes doesn’t track that well with the apparent complexity of the organism and that even very simple organisms have quite complex genomes. The simplest known living thing is the microbe Mycoplasma genitalium, which causes human non-gonococcal urethritis. This microbe has a genetic code of about 580,000 base pairs. Viruses are simpler but aren’t really “alive” in the sense that they cannot reproduce or grow without using the mechanisms in a living cell to do so. The bacterium E. coli has a genetic code of about 4.6 million base pairs.

But E. coli and Mycoplasma can’t live in the absence of other more complex organisms (E. coli lives in the animal gut, Mycoplasma lives in … well you get the idea). In fact the primordial organism must have been at the bottom of the food chain, capable of synthesizing its own food from non-living material, and living without assistance from any other living organism. It could possibly have been something on the order of blue-green algae, one species of which has 3.6 million base pairs in its genetic code, and which as a group are thought to be about 3.5 billion years old. Mycoplasma, bacteria, and viruses all must have “devolved” from more complex organisms in response to the availability of more complex forms to act as hosts or links in the food chain.

The original organism had mechanisms (ability to grow, reproduce, and evolve) which led to the evolution of the diverse life forms which now exist on Earth, and as indicated above this evolution is documented in the genetic codes of organisms now alive as well as in fossil evidence. But under this scenario the original organism would have had to appear by chance aggregation of materials. This is somewhat like believing that because while digging you found a rock that looked like a brick, if you dug long and hard enough you would eventually find something that looked like the Sistine Chapel complete with Michelangelo’s Creation on the ceiling.

Life Evolved from Simpler Organisms

Some scientists feel that life originated spontaneously as a much simpler organism than any now found. A difficulty with this idea is the absence of any current examples of the simpler organism. In general, appearance of more complex organisms has not resulted in disappearance of simpler forms. We still have cockroaches. We still have fruit flies. Indeed, there is evidence of devolution. Eventually, backtracking through the genetic codes of many living organisms should give some insight into how plausible this theory is.

It’s Unknowable

Some scientists take the view that the origin of the primordial organism is “unknowable”, meaning that not only do we not know but we are unlikely ever to know, and that the subject is therefore more appropriate for philosophy or religion than science. In this view the origin of the primordial organism is the biological equivalent of the “Big Bang Theory” in Astrophysics, in which astrophysicists think the entire universe was once the size of a golf ball which then exploded to create the observed universe. They can trace observed cosmic phenomena such as galaxies, red shift, and background radiation back to the golf ball but they admit that it is “unknowable” as to how the golf ball got there.

It Came from Outer Space

Some believe that life originated elsewhere in the universe and was then somehow distributed. This doesn’t have to mean biological contamination of the early Earth by space travelers flushing their ballast tanks. It could be that life was distributed via simple frozen or spore-forming organisms carried by fragments of a destroyed planet or ejecta blasted into space by meteorite impact. Comets are known to contain water ice, and NASA scientists have reported what may be evidence of fossilized bacteria in a meteorite. DNA has been recovered from material 20 million years old.

The possibility that life originated somewhere else in the universe (it is a very large universe) and then came here seems to many more likely than the idea that life originated on Earth. The space theory is also less egocentric. Keep in mind that all previous “Earth is the center of the universe” theories have been disproved.

A consequence of the space theory is that life might be widely distributed. Life might appear relatively rapidly on any planet that has appropriate conditions, at least in regions which were in a position to be seeded from the source – a sort of “universe as Petri dish” concept. In other words, if there is life on Earth, then there is likely to be life in any nearby system that has planets with appropriate conditions.

Noted astronomer Fred Hoyle was a prominent supporter of the space theory.

Copyright July 2001 T. C. Goldsmith

tgoldsmith@aol.com
