MACROMOLECULES

This text is divided into five major sections:

Chemistry of the bonds in biological macromolecules
Helicity in macromolecules
Macromolecular folding
Macromolecular interactions
Denaturation

Introduction

There are three major types of biological macromolecules in mammalian systems.
  1. Carbohydrates
  2. Nucleic acids
  3. Proteins

Often they are treated separately in different segments of a course. In fact, the principles governing the organization of three-dimensional structure are common to all of them, so we will consider them together.

We will begin with the monomer units.

  1. monosaccharide -- for carbohydrate
  2. nucleotide -- for nucleic acids
  3. amino acid -- for proteins

We will describe the features of representative monomers, and see how the monomers join to form a polymer.

We will then look at the monomers in each major type of macromolecule to see what specific structural contributions come from each.

The three-dimensional structure of each type of macromolecule will then be considered at several levels of organization.

We will investigate macromolecular interactions and how structural complementarity plays a role in them.

The stories for proteins, carbohydrates and nucleic acids are just variations on the same theme. So you'll need to learn only one pattern, then apply that pattern to the other systems.

We will conclude this section of the course with a consideration of denaturation and renaturation -- the forces involved in loss of a macromolecule's native structure (that is, its normal 3-dimensional structure), and how that structure, once lost, can be regained.

Let's begin.

Biological macromolecules are polar

The main point of the first segment of this material is this: THE MONOMER UNITS OF BIOLOGICAL MACROMOLECULES HAVE HEADS AND TAILS. WHEN THEY POLYMERIZE IN A HEAD-TO-TAIL FASHION, THE RESULTING POLYMERS ALSO HAVE HEADS AND TAILS.

These macromolecules are polar [polar: having different ends] because they are formed by head to tail condensation of polar monomers. Let's look at the three major classes of macromolecules to see how this works, and let's begin with carbohydrates.

Monosaccharides polymerize to yield polysaccharides.

Glucose is a typical monosaccharide. It has two important types of functional group:

  1. a carbonyl group (an aldehyde in glucose; some other sugars have a ketone group instead)
  2. hydroxyl groups on the other carbons

This is what you need to know about glucose, not its detailed structure.

Glucose exists mostly in ring structures. (The 5-OH adds across the carbon-oxygen double bond of the carbonyl group.) This is a so-called internal hemiacetal. The ring can close in either of two ways, giving rise to anomeric forms: -OH down (the alpha-form) and -OH up (the beta-form).

The anomeric carbon (the carbon to which this -OH is attached) differs significantly from the other carbons. (Note: it's easy to pick out because it is the only carbon with TWO oxygens -- ring and hydroxyl -- attached.)

Free anomeric carbons have the chemical reactivity of carbonyl carbons because they spend part of their time in the open chain form. They can reduce alkaline solutions of cupric salts. Sugars with free anomeric carbons are therefore called reducing sugars. The rest of the carbohydrate consists of ordinary carbons and ordinary -OH groups. The point is, a monosaccharide can therefore be thought of as having polarity, with one end consisting of the anomeric carbon, and the other end consisting of the rest of the molecule.

Monosaccharides can polymerize by elimination of the elements of water between the anomeric hydroxyl of one sugar and a hydroxyl of another sugar. The link formed this way is called a glycosidic bond.

If two anomeric hydroxyl groups react (head to head condensation) the product has no reducing end (no free anomeric carbon). This is the case with sucrose.

If the anomeric hydroxyl reacts with a non-anomeric hydroxyl of another sugar, the product has ends with different properties.

This is the case with maltose.
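
The difference between head-to-head and head-to-tail condensation can be sketched in a few lines of Python. This is only a bookkeeping illustration, not a chemical model: the function name is invented here, and each sugar is reduced to a single flag saying whether its anomeric hydroxyl is used up in the glycosidic bond.

```python
def is_reducing_disaccharide(first_uses_anomeric_oh, second_uses_anomeric_oh):
    """Return True if the disaccharide still has a free anomeric carbon."""
    used = [first_uses_anomeric_oh, second_uses_anomeric_oh]
    # A glycosidic bond always involves at least one anomeric hydroxyl.
    if not any(used):
        raise ValueError("a glycosidic bond needs at least one anomeric hydroxyl")
    # Each monosaccharide has exactly one anomeric carbon; the product is
    # reducing if at least one of them is left free.
    return not all(used)


# Sucrose: both anomeric hydroxyls react (head to head).
sucrose_is_reducing = is_reducing_disaccharide(True, True)
# Maltose: an anomeric hydroxyl reacts with a non-anomeric hydroxyl.
maltose_is_reducing = is_reducing_disaccharide(True, False)

print(sucrose_is_reducing)  # False
print(maltose_is_reducing)  # True
```

The same bookkeeping extends to longer chains: a head-to-tail polymer keeps exactly one free anomeric carbon, at its reducing end.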

Since most monosaccharides have more than one hydroxyl, branches are possible, and are common. Branches result in a more compact molecule. If the branch ends are the reactive sites, more branches provide more reactive sites per molecule.

Let's now turn to nucleotides and nucleic acids.

Nucleotides polymerize to yield nucleic acids.

Nucleotides consist of three parts.
  1. Phosphate.
  2. Monosaccharide. The presence or absence of the 2' -OH has structural significance that will be discussed later.
  3. A base.

There are four dominant bases; here are three of them:

  1. adenine (purine)
  2. cytosine (pyrimidine)
  3. guanine (purine)

The fourth base is uracil (in RNA) or thymine (in DNA); both are pyrimidines.

Be aware that uracil and thymine are very similar; they differ only by a methyl group.

You need to know which are purines and which are pyrimidines, and whether it is the purines or the pyrimidines that have one ring. The reasons for knowing these points relate to the way purines and pyrimidines interact in nucleic acids, which we'll cover shortly.

Nucleotides polymerize by eliminating the elements of water to form esters between the 5'-phosphate of one nucleotide and the 3' -OH of another nucleotide.

A 3'->5' phosphodiester bond is thereby formed. The product has ends with different properties.

Let's look at the conventions for writing sequences of nucleotides in nucleic acids. Bases are abbreviated by their initials: A, C, G and U or T. U is normally found only in RNA, and T is normally found only in DNA. So the presence of U vs. T distinguishes between RNA and DNA in a written sequence.

Sequences are written with the 5' end to the left and the 3' end to the right unless specifically designated otherwise.

Phosphate groups are usually not shown unless the writer wants to draw attention to them. The following representations are all equivalent.

  uracil  adenine  cytosine  guanine
    |        |        |        | 
P-ribose-P-ribose-P-ribose-P-ribose-OH
 5'    3' 5'    3' 5'    3' 5'    3'

pUpApCpG
UACG
3' GCAU 5' 

(Note that in the last line the sequence is written in reverse order, but the ends are appropriately designated.)
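
These writing conventions are mechanical enough to express as code. The sketch below is illustrative only; the function names are invented for this example, and sequences are assumed to be plain strings written 5' to 3' unless labeled otherwise.

```python
def nucleic_acid_type(sequence):
    """Guess RNA vs. DNA from a written sequence: U marks RNA, T marks DNA."""
    bases = set(sequence.upper())
    if not bases <= set("ACGTU"):
        raise ValueError("unexpected base symbol")
    if "U" in bases and "T" in bases:
        raise ValueError("sequence mixes U and T")
    if "U" in bases:
        return "RNA"
    if "T" in bases:
        return "DNA"
    return "ambiguous"  # neither U nor T is present


def with_phosphates(sequence_5_to_3):
    """Show the phosphate on the 5' side of each residue, as in pUpApCpG."""
    return "".join("p" + base for base in sequence_5_to_3)


def written_3_to_5(sequence_5_to_3):
    """Write the same molecule in reverse order, with the ends labeled."""
    return "3' " + sequence_5_to_3[::-1] + " 5'"


print(nucleic_acid_type("UACG"))  # RNA
print(with_phosphates("UACG"))    # pUpApCpG
print(written_3_to_5("UACG"))     # 3' GCAU 5'
```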

Branches are possible in RNA but not in DNA. RNA has a 2' -OH, at which branching could occur, while DNA does not. Branching is very unusual; it is known to occur only during RNA modification [the "lariat"], but not in any finished RNA species.

Amino acids polymerize to form polypeptides or proteins.

Amino acids contain a carboxylic acid (-COOH) group and an amino (-NH2) group. The amino groups are usually attached to the carbons which are alpha to the carboxyl carbons, so they are called alpha-amino acids.

The naturally occurring amino acids are optically active because they have four different groups attached to the alpha carbon, and they have the L-configuration. (Glycine is an exception: its alpha carbon carries two hydrogens, so it is not optically active.)

The R-groups of the amino acids provide a basis for classifying amino acids. There are many ways of classifying amino acids, but one very useful way is on the basis of how well or poorly the R-group interacts with water:

  1. The first class is the hydrophobic R-groups which can be aliphatic (such as the methyl group of alanine) or aromatic (such as the phenyl group of phenylalanine).
  2. The second class is the hydrophilic R-groups which can contain neutral polar (such as the -OH of serine) or ionizable (such as the -COOH of aspartate) functional groups.

Amino acids polymerize by eliminating the elements of water to form an amide between the amino group of one amino acid and the carboxyl group of another. The amide link thereby formed between amino acids is called a peptide bond.

The product has ends with different properties.

Conventions for writing sequences of amino acids.

Abbreviations for the amino acids are usually used; most of the three letter abbreviations are self-evident, such as gly for glycine, asp for aspartate, etc.

There is also a one-letter abbreviation system; it is becoming more common. Many of the one-letter abbreviations are straightforward, for example:

G = glycine
L = leucine
H = histidine

Others require a little imagination to justify:

F = phenylalanine ("ph" sounds like "F").
Y = tyrosine (T was used for threonine, so we settle for the second letter in the name).
D = aspartate (D is the fourth letter in the alphabet, and aspartate has four carbons).

Still others are rather difficult to justify:

W = tryptophan (The bottom half of the two aromatic rings looks sort of like a "W").
K = lysine (if you can think of a good one for this, let us know!)

Question: What do you suppose "Q" represents?

You should be aware that the one-letter system is becoming more and more commonly used, and you should have the mindset of picking it up as you are exposed to it, rather than resisting.

Sequences are written with the N-terminal to the left and the C-terminal to the right.
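
As a small illustration of the two abbreviation systems, here is a lookup table restricted to the amino acids named above. Only gly and asp are spelled out in the text; the other three-letter forms are the standard ones, and the function name is invented for this sketch.

```python
# One-letter codes for the amino acids used as examples in the text.
THREE_TO_ONE = {
    "gly": "G",  # glycine
    "leu": "L",  # leucine
    "his": "H",  # histidine
    "phe": "F",  # phenylalanine
    "tyr": "Y",  # tyrosine
    "thr": "T",  # threonine
    "asp": "D",  # aspartate
    "trp": "W",  # tryptophan
    "lys": "K",  # lysine
}


def to_one_letter(residues):
    """Convert three-letter abbreviations (N-terminal first) to one-letter code."""
    return "".join(THREE_TO_ONE[name.lower()] for name in residues)


# The N-terminal residue is written first, so it ends up on the left.
print(to_one_letter(["Gly", "Phe", "Lys", "Asp"]))  # GFKD
```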

Although R-groups of some amino acids contain amino and carboxyl groups, branched polypeptides or proteins do not occur.

The sequence of monomer units in a macromolecule is called the PRIMARY STRUCTURE of that macromolecule. Each specific macromolecule has a unique primary structure.

This concludes our consideration of the relationship between the structures of biological polymers and their monomer subunits. Biosynthesis of these macromolecules will be covered in subsequent lectures. Let's now begin to investigate the three-dimensional shapes of these macromolecules in solution and the forces responsible for these shapes. It turns out that

THE REGULAR REPEAT OF MONOMER UNITS HAVING THE SAME SIZE AND THE SAME BOND ANGLES LEADS TO HELICAL (SPIRAL) POLYMERS.

IF THESE HELICES CAN BE STABILIZED BY SUITABLE INTRA- OR INTERMOLECULAR INTERACTIONS, THEY WILL PERSIST IN SOLUTION, AND WILL BE AVAILABLE AS ELEMENTS OF MORE COMPLICATED MACROMOLECULAR STRUCTURES.

Biopolymers consisting of regularly repeating units tend to form helices.

The fundamental reason for this is that the bond angles of the constituent atoms are never 180 degrees, so linear molecules are not likely; rather, a gentle curve should be expected along the length of the macromolecule.

Just what is a helix? A helical structure consists of repeating units that lie on the wall of a cylinder such that the structure is superimposable upon itself if moved along the cylinder axis.

A helix looks like a spiral or a screw. A zig-zag is a degenerate helix.
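
The definition can be made concrete by computing where the repeating units sit on the cylinder wall. The sketch below is a geometric illustration only; the function name, the unit radius and the unit rise are arbitrary choices made here, not properties of any particular macromolecule.

```python
import math


def helix_points(n_units, units_per_turn, rise_per_unit,
                 radius=1.0, right_handed=True):
    """Place n_units on the wall of a cylinder whose axis is the z axis."""
    sign = 1.0 if right_handed else -1.0
    step = sign * 2.0 * math.pi / units_per_turn  # rotation per unit (radians)
    return [
        (radius * math.cos(i * step),
         radius * math.sin(i * step),
         i * rise_per_unit)
        for i in range(n_units)
    ]


# Ten units per turn: sliding the structure one full turn along the axis
# puts unit i exactly where unit i + 10 was (same x and y, z shifted).
coil = helix_points(20, units_per_turn=10, rise_per_unit=1.0)

# Two units per turn: every unit lies in one plane, alternating from side
# to side. This is the zig-zag, the degenerate helix.
zigzag = helix_points(6, units_per_turn=2, rise_per_unit=1.0)
for x, y, z in zigzag:
    print(round(x, 6), round(y, 6), z)
```

Passing right_handed=False reverses the sense of rotation, which gives the mirror-image (left-handed) helix.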

Helices can be right-handed or left-handed. The difference between the two is that:

Right-handed helices or screws advance (move away) if turned clockwise.
Examples: standard screw, bolt, jar lid.

Left-handed helices or screws advance (move away) if turned counterclockwise.
Example: some automobile lug nuts.

Helical organization is an example of secondary structure. These helical conformations of macromolecules persist in solution only if they are stabilized. What might carry out this stabilization?

Stable biological helices are usually maintained by hydrogen bonds.

Let's now look at

Helices in carbohydrates.

Carbohydrates with long sequences of alpha (1 -> 4) links have a weak tendency to form helices.

Starch (amylose) exemplifies this structure.

The starch helix is not very stable in the absence of other interactions (iodine, which forms a purple complex with starch, stabilizes the starch helix), and it commonly adopts a random coil conformation in solution.

In contrast, beta (1 -> 4) sequences favor linear structures. Cellulose exemplifies this structure.

Cellulose is a degenerate helix consisting of glucose units in alternating orientation stabilized by intrachain hydrogen bonds. Cellulose chains lying side by side can form sheets stabilized by interchain hydrogen bonds.

Helices in nucleic acids.

Single chains of nucleic acids tend to form helices stabilized by base stacking.

The purine and pyrimidine bases of the nucleic acids are aromatic rings. These rings tend to stack like pancakes, but slightly offset so as to follow the helix. The stacks of bases are in turn stabilized by hydrophobic interactions and by van der Waals forces between the pi-clouds of electrons above and below the aromatic rings.

In these helices the bases are oriented inward, toward the helix axis, and the sugar phosphates are oriented outward, away from the helix axis.

Two lengths of nucleic acid chain can form a double helix stabilized by base stacking and by hydrogen bonds between the bases.

Purines and pyrimidines can form specifically hydrogen bonded base pairs. Let's look at how these hydrogen bonds form.
Guanine and cytosine can form a base pair that measures 1.08 nm across, and that contains three hydrogen bonds.
Adenine and thymine (or uracil) can form a base pair that measures 1.08 nm across, and that contains two hydrogen bonds.

Base pairs of this size fit perfectly into a double helix.

This is the so-called Watson-Crick base pairing pattern.

Double helices rich in GC pairs are more stable than those rich in AT (or AU) pairs because GC pairs have more hydrogen bonds.
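
The hydrogen-bond argument can be checked by simple counting. The sketch below assumes a perfectly paired duplex and counts only Watson-Crick hydrogen bonds (three per GC pair, two per AT or AU pair); it ignores every other contribution to stability, and the function name is invented for this example.

```python
# Hydrogen bonds contributed by the base pair that each base takes part in.
H_BONDS_PER_PAIR = {"A": 2, "T": 2, "U": 2, "G": 3, "C": 3}


def watson_crick_h_bonds(strand):
    """Count hydrogen bonds in a duplex, given one strand paired to its complement."""
    return sum(H_BONDS_PER_PAIR[base] for base in strand.upper())


print(watson_crick_h_bonds("ATATATAT"))  # 16
print(watson_crick_h_bonds("GCGCGCGC"))  # 24
```

Two duplexes of the same length can therefore differ by half again as many hydrogen bonds, depending on base composition.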

Now, specific AT (or AU) and GC base pairing can occur only if the lengths of nucleic acid in the double helix consist of complementary sequences of bases. A must always be opposite T (or U). G must always be opposite C. Here's a sample of two complementary sequences.

...ATCCGAGTG...

...TAGGCTCAC...

Most DNA and some sequences of RNA have this complementarity, and form the double helix. It is important to note, though, that the complementary sequences forming a double helix have opposite polarity. The two chains run in opposite directions:

5' ...ATCCGAGTG... 3'

3' ...TAGGCTCAC... 5'

This is described as an antiparallel arrangement. This arrangement allows the two chains to fit together better than if they ran in the same direction (parallel arrangement).
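
The pairing rules and the antiparallel arrangement can be expressed as a short sketch, using the sample DNA sequence from the text. The function names are invented for this illustration.

```python
# Watson-Crick partners in DNA.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def paired_strand_3_to_5(strand_5_to_3):
    """Partner strand written base-for-base under the original, so it reads 3'->5'."""
    return "".join(PAIR[base] for base in strand_5_to_3)


def reverse_complement(strand_5_to_3):
    """The same partner strand, rewritten in the usual 5'->3' direction."""
    return paired_strand_3_to_5(strand_5_to_3)[::-1]


top = "ATCCGAGTG"
print("5' " + top + " 3'")
print("3' " + paired_strand_3_to_5(top) + " 5'")  # 3' TAGGCTCAC 5'
print(reverse_complement(top))                    # CACTCGGAT
```

Because the strands are antiparallel, the partner of a sequence written 5' to 3' must be reversed before it, too, can be written in the standard 5' to 3' direction.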

Consequences of complementarity.

In any double helical structure the amount of A equals the amount of T (or U), and the amount of G equals the amount of C. Count the A's, T's, G's and C's in this or any arbitrary paired sequence to prove this to yourself.

Because DNA is usually double stranded, while RNA is not, in DNA A=T and G=C, while in RNA A does not equal U and G does not equal C.
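
The counting exercise can be done mechanically. This sketch uses the sample pair of complementary sequences from the text; the variable names are chosen here for illustration.

```python
from collections import Counter

top = "ATCCGAGTG"
bottom = "TAGGCTCAC"  # complementary strand

# Count bases over both strands of the duplex.
duplex = Counter(top + bottom)
print(duplex["A"], duplex["T"])  # 4 4
print(duplex["G"], duplex["C"])  # 5 5

# A single strand is under no such constraint: here G and C differ.
single = Counter(top)
print(single["G"], single["C"])  # 3 2
```

In this particular single strand A and T happen to be equal (two each), which shows that equality in one strand can occur by chance; it is only guaranteed in the duplex.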

Three major types of double helix occur in nucleic acids. These three structures are strikingly and obviously different in appearance. You could see the differences even if the picture were out of focus, and you could feel the differences in the dark. This is critically important, because SO CAN AN ENZYME, such as the enzymes that control the expression of genetic information!

DNA usually exists in the form of a B-helix. Its characteristics:

Right-handed and has 10 nucleotide residues per turn.
The plane of the bases is nearly perpendicular to the helix axis.
There is a prominent major groove and minor groove.
The B-helix may be stabilized by bound water that fits perfectly into the minor groove.

Double-stranded RNA and DNA-RNA hybrids (also DNA in low humidity) exist in the form of an A-helix. Its characteristics:

Right-handed and has 11 nucleotide residues per turn.
The plane of the bases is tilted relative to the helix axis.
The minor groove is larger than in B-DNA.

RNA is incompatible with a B-helix because the 2' -OH of RNA would be sterically hindered. (There is no 2' -OH in DNA.) This is a stabilizing factor you should know.

DNA segments consisting of alternating pairs of purine and pyrimidine (PuPy)n can form a Z-helix. Its characteristics:

Left-handed (this surprised the discoverers) and has 12 residues (6 PuPy dimers) per turn.
Only one groove.
The phosphate groups lie on a zig-zag line, which gives rise to the name, Z-DNA.

The link between the deoxyribose and the purine has a different conformation in Z-DNA as compared to A-DNA or B-DNA. Z-DNA is stabilized if it contains modified (methylated) cytosine residues. These occur naturally.
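
The sequence requirement for Z-DNA, alternating purine and pyrimidine, is easy to test for. The sketch below checks only that requirement on one strand; it does not predict whether a given stretch actually adopts the Z-helix, and the function name is invented for this example.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def alternates_purine_pyrimidine(sequence):
    """True if purines and pyrimidines strictly alternate along the strand."""
    seq = sequence.upper()
    if len(seq) < 2 or not set(seq) <= PURINES | PYRIMIDINES:
        return False
    is_purine = [base in PURINES for base in seq]
    # Every neighboring pair must consist of one purine and one pyrimidine.
    return all(a != b for a, b in zip(is_purine, is_purine[1:]))


print(alternates_purine_pyrimidine("GCGCGCGCGCGC"))  # True
print(alternates_purine_pyrimidine("ATCCGAGTG"))     # False
```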

The detailed shape of the helix determines the interactions in which it can engage. The geometry of the grooves is important in allowing or preventing access to the bases. The surface topography of the helix forms attachment sites for various enzymes sensitive to the differences among the helix types. We'll see some detailed examples of this later.
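
One way to compare the three helices numerically is to convert the residues-per-turn figures quoted above into an average rotation per residue. The sign convention (positive for right-handed, negative for left-handed) and the names below are assumptions made for this sketch; for Z-DNA, whose repeating unit is the PuPy dimer, the result is only an average over the two residues.

```python
# Residues per turn and handedness, as given in the text.
RESIDUES_PER_TURN = {"B": 10, "A": 11, "Z": 12}
HANDEDNESS = {"B": +1, "A": +1, "Z": -1}  # +1 right-handed, -1 left-handed


def twist_per_residue_degrees(helix):
    """Average rotation about the helix axis per residue, in degrees."""
    return HANDEDNESS[helix] * 360.0 / RESIDUES_PER_TURN[helix]


for name in ("B", "A", "Z"):
    print(name, round(twist_per_residue_degrees(name), 1))
```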

The DNA triplex (triple helix):

Start by imagining a B-DNA helix. It is possible under certain circumstances to add a third strand, fitting it into the major groove.

A triplex can form ONLY if one strand of the original B-helix is all purines (A and G) [why you need to know purines from pyrimidines] and the corresponding region of the other strand is all pyrimidines. Regions of DNA with these characteristics are found in control regions for genes, and triplex formation PREVENTS EXPRESSION OF THE GENE.
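
The sequence condition for triplex formation can be scanned for directly. In the sketch below, the function name and the minimum tract length are arbitrary assumptions made for illustration, and the test sequence is invented. Only one strand needs to be examined, since an all-purine stretch is necessarily paired with an all-pyrimidine stretch on the other strand.

```python
import re


def homopurine_tracts(strand, min_length=10):
    """Return (start, tract) for each run of purines (A or G) of at least min_length."""
    pattern = re.compile("[AG]{%d,}" % min_length)
    return [(m.start(), m.group()) for m in pattern.finditer(strand.upper())]


# Invented example: an all-purine run flanked by pyrimidines.
print(homopurine_tracts("CTTAGGAGGAAGAGCT", min_length=8))
# [(3, 'AGGAGGAAGAG')]

# The sample sequence used earlier has no long purine run.
print(homopurine_tracts("ATCCGAGTG", min_length=8))  # []
```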

The triplex is stabilized by H-bonds in the unusual Hoogsteen base-pairing pattern shown in the slide (along with standard Watson-Crick base pairing).

The existence of this structure was known for 20 years, but no one knew what to make of it. Now, recognizing that it occurs naturally in gene control regions, it is getting a great deal of attention in the research literature.

Currently artificial oligonucleotide drugs are being synthesized that form triplexes with specific natural DNA sequences. Other drugs are being developed that stabilize naturally occurring or artificial triplexes. These are showing promise as antitumor and antibacterial agents, as well as potential agents to modify enzyme activity by controlling enzyme synthesis. It's too new to be in even the most modern text, but you will be seeing more and more of this in the near future. Be aware of this structure, know where it is found in the gene (at control regions) and its effect on gene expression, and that it is the subject of promising clinical investigations.

Helices in proteins.

Properties of the peptide bond dominate the structures of proteins. The first of these properties is that the peptide bond has partial double-bond character. Partial double-bond character is conferred by the electronegative carbonyl oxygen, which draws the unshared electron pair of the amide nitrogen toward the carbonyl carbon.

As a result of having double bond character the peptide bond is

  1. planar
  2. not free to rotate
  3. more stable in the trans configuration than in the cis

These characteristics restrict the three-dimensional shapes of proteins because they must be accommodated by any stable structure.

The second major property of the peptide bond is that the atoms of the peptide bond can form hydrogen bonds.

Now let's look at some of the structures that accommodate the restrictions imposed by the peptide bond. The first is the alpha-helix. The alpha-helix is a major structural component of proteins.

Stabilizing factors include:
All possible hydrogen bonds between peptide C=O and N-H groups in the backbone are formed. The hydrogen bonds are all intrachain, between different parts of the same chain. Although a single hydrogen bond is weak, cooperation of many hydrogen bonds can be strongly stabilizing.
Alpha-helices must have a minimum length to be stable (so there will be enough hydrogen bonds).
All peptide bonds are trans and planar. So, if the amino acid R-groups do not repel one another, helix formation is favored.
The net electric charge should be zero or low (charges of the same sign repel).
Adjacent R-groups should be small, to avoid steric repulsion.

Destabilizing factors include:
R-groups that repel one another favor extended conformations instead of the helix. Examples include large net electric charge and adjacent bulky R-groups.
Proline is incompatible with the alpha-helix. The ring formed by the R-group restricts rotation of a bond that would otherwise be free to rotate. The restricted rotation prevents the polypeptide chain from coiling into an alpha-helix. Occurrence of proline necessarily terminates or kinks alpha-helical regions in proteins.

Occurrence of the alpha-helix.

A component of typical globular proteins.
A component of some fibrous proteins, like alpha-keratin.

Alpha-keratin has high tensile strength, as first observed by Rapunzel. It is found in hair, feathers, horn; the physical strength and elasticity of hair make it useful in ballistas, onagers, etc.

The beta-pleated sheet is a second major structural component of proteins.

The beta-pleated sheet resembles cellulose in that both consist of extended chains -- degenerate helices -- lying side by side and hydrogen bonded to one another.

The polypeptide chains of a beta-pleated sheet can be arranged in two ways: parallel (running in the same direction) or antiparallel (running in opposite directions). An edge-on view shows the pleats.

Stabilizing factors for the pleated sheet resemble those for the alpha-helix.
All possible hydrogen bonds between peptide C=O and N-H groups in the backbone are formed. The hydrogen bonds here are all interchain, unlike those of the alpha-helix.
All peptide bonds are trans and planar.
Small R-groups prevent steric destabilization.
Large R-groups destabilize due to crowding.

Sheets can stack one upon the other, with interdigitating R-groups of the amino acids.

Occurrence of the beta-pleated sheet.

A component of 80% of all globular proteins.
In some fibrous proteins.
Egg stalks of certain moths.
Some silk fibroins.

Collagen has an unusual structure. It consists of three polypeptide chains in a triple helix. This is the structure:

Three extended helices of a type called polyproline II helices (because polyproline can take this form)
hydrogen bonded to one another (interchain); no intrachain hydrogen bonds form because each helix is too extended, and hydrogen bonds cannot reach from one level of the helix up or down to the next level
placed at the corners of a triangle.

The entire assembly is twisted into a superhelix.

The stability of the collagen triple helix is due to its unusual amino acid composition and sequence. One third of the amino acid residues are glycine, and the glycyl residues are evenly spaced: the amino acid sequence of collagen is (Gly X Y)n, where X and Y are other amino acids. This places a glycyl residue at each position where the chain is in the interior of the triple helix. There would be no room for a bulky R-group in this position (glycine's R-group is H). The high glycine content (with its small R-group) would otherwise permit too much conformational freedom and favor a random coil.
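
The (Gly X Y)n pattern can be checked with a few lines of code. This sketch assumes the sequence is given in standard one-letter code (G for glycine, P for proline) and that the repeat starts with glycine at the first residue; the function names and the example peptides are invented for illustration.

```python
def has_gly_x_y_repeat(sequence, min_repeats=3):
    """True if glycine (G) occupies every third position, starting at the first."""
    seq = sequence.upper()
    n_triplets = len(seq) // 3
    if n_triplets < min_repeats:
        return False
    return all(seq[i] == "G" for i in range(0, n_triplets * 3, 3))


def glycine_fraction(sequence):
    """Fraction of residues that are glycine."""
    return sequence.upper().count("G") / len(sequence)


collagen_like = "GPPGPAGPK"  # three Gly-X-Y triplets
print(has_gly_x_y_repeat(collagen_like))  # True
print(glycine_fraction(collagen_like))    # one third of the residues
print(has_gly_x_y_repeat("GFKDGLHWY"))    # False
```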

Proline and hydroxyproline together comprise about one third of the total amino acid residues, and Gly Pro Hypro is a common sequence. The relative inflexibility of the prolyl and hydroxyprolyl residues stiffens the chains. The high (proline & hydroxyproline) content prevents formation of an alpha-helix.

Collagen occurs in tough, inelastic tissues, like tendon. The collagen helix is already fully extended. Unlike the alpha-helix, it cannot stretch; tendon ought not to stretch under heavy load.

Collagen is the single most abundant protein in the body; fortunately collagen defects are rare.

The next level of macromolecular organization is

Tertiary structure

CONCEPT: NUCLEIC ACIDS AND PROTEINS ARE LARGE MOLECULES WITH COMPLICATED THREE-DIMENSIONAL STRUCTURES. THESE STRUCTURES ARE FORMED FROM SIMPLER ELEMENTS, SUITABLY ARRANGED. ALTHOUGH STRUCTURAL DETAILS VARY FROM MACROMOLECULE TO MACROMOLECULE, A FEW GENERAL PATTERNS DESCRIBE THE OVERALL ORGANIZATION OF MOST MACROMOLECULES.

Tertiary structure is the three dimensional arrangement of helical and nonhelical regions of macromolecules. Let's look first at the

Tertiary structure of nucleic acids.

And let's begin with DNA -- many naturally occurring DNA molecules are circular double helices. Most circular double-stranded DNA is partly unwound before the ends are sealed to make the circle.
Partial unwinding is called negative superhelicity.
Overwinding before sealing would be called positive superhelicity.

Superhelicity introduces strain into the molecule. (Think of holding a coil spring by the two ends and twisting it to unwind it; it takes effort to introduce this strain.) The strain of superhelicity can be relieved by forming a supercoil. The identical phenomenon occurs in retractable telephone headset cords when they get twisted. The twisted circular DNA is said to be supercoiled. The supercoil is more compact. It is poised to be unwound, a necessary step in DNA and RNA synthesis.

RNA -- most RNA is single stranded, but contains regions of self-complementarity.

This is exemplified by yeast tRNA. There are four regions in which the strand is complementary to another sequence within itself. These regions are antiparallel, fulfilling the conditions for stable double helix formation. X-ray crystallography shows that the three dimensional structure of tRNA contains the expected double helical regions.

Large RNA molecules have extensive regions of self-complementarity, and are presumed to form complex three-dimensional structures spontaneously.

Tertiary structure of proteins.

The formation of compact, globular structures is governed by the constituent amino acid residues. Folding of a polypeptide chain is strongly influenced by the solubility of the amino acid R-groups in water.

Hydrophobic R-groups, as in leucine and phenylalanine, normally orient inwardly, away from water or polar solutes.

Polar or ionized R-groups, as in glutamine or arginine, orient outwardly to contact the aqueous environment.

Some amino acids, such as glycine, can be accommodated by aqueous or nonaqueous environments.

The rules of solubility and the tendency for secondary structure formation determine how the chain spontaneously folds into its final structure.

Forces stabilizing protein tertiary structure.
Hydrophobic interactions -- the tendency of nonpolar groups to cluster together to exclude water.
Hydrogen bonding, as part of any secondary structure, as well as other hydrogen bonds.
Ionic interactions -- attraction between unlike electric charges of ionized R-groups.
Disulfide bridges between cysteinyl residues. The R-group of cysteine is -CH2-SH. -SH (sulfhydryl) groups can oxidize spontaneously to form disulfides (-S-S-).

R-CH2-SH + R'-CH2-SH + O2 = R-CH2-S-S-CH2-R' + H2O2

(Under reducing conditions a disulfide bridge can be cleaved to regenerate the -SH groups.)

The disulfide bridge is a covalent bond. It strongly links regions of the polypeptide chain that could be distant in the primary sequence. It forms after tertiary folding has occurred, so it stabilizes, but does not determine tertiary structure.

Globular proteins are typically organized into one or more compact patterns called domains.

This concept of domains is important. In general it refers to a region of a protein. But it turns out that in looking at protein after protein, certain structural themes repeat themselves, often, but not always, in proteins that have similar biological functions. This phenomenon of repeating structures is consistent with the notion that the proteins are genetically related, and that they arose from one another or from a common ancestor. In looking at the amino acid sequences, sometimes there are obvious homologies, and you could predict that the 3-dimensional structures would be similar. But sometimes virtually identical 3-dimensional structures have no sequence similarities at all!

The four-helix bundle domain is a common pattern in globular proteins. Helices lying side by side can interact favorably if the properties of the contact points are complementary. Hydrophobic amino acids (like leucine) at the contact points and oppositely charged amino acids along the edges will favor interaction. If the helix axes are inclined slightly (18 degrees), the R-groups will interdigitate perfectly along 6 turns of the helix. Sets of four helices yield stable structures with symmetrical, equivalent interactions. Interestingly, four-helix bundles diverge at one end, providing a cavity in which ions may bind.

All-beta structures comprise domains in many globular proteins. Beta-pleated sheets fold back on themselves to form barrel-like structures. Part of the immunoglobulin molecule exemplifies this. The interiors of beta-barrels serve in some proteins as binding sites for hydrophobic molecules such as retinol, a vitamin A derivative. What keeps these proteins from forming infinitely large beta-sheets is not clear.

Now let's look at combined alpha/beta structures. Beta/alpha8 domains are found in a variety of proteins which have no obvious functional relationship. They consist of a beta-barrel surrounded by a wheel of alpha-helices.

Examples
Triose phosphate isomerase.
Domain 1 of pyruvate kinase.

Beta-sheets surrounded by alpha-helices also occur. This is a variation on the theme of beta-structure inside and alpha-helix outside.

Examples
Lactate dehydrogenase domain 1
Phosphoglycerate kinase domain 2

Now that we are familiar with the structures of single chain macromolecules, we are in a position to look at some of the interactions of macromolecules with other macromolecules and with smaller molecules.

Macromolecular Interactions

CONCEPT: MACROMOLECULES INTERACT WITH EACH OTHER AND WITH SMALL MOLECULES. ALL THESE INTERACTIONS REFLECT COMPLEMENTARITY BETWEEN THE INTERACTING SPECIES. SOMETIMES THE COMPLEMENTARITY IS GENERAL, AS IN THE ASSOCIATION OF HYDROPHOBIC GROUPS, BUT MORE OFTEN AN EXACT FIT OF SIZE, SHAPE AND CHEMICAL AFFINITY IS INVOLVED.

Macromolecular interactions involving proteins.

Quaternary structure refers to proteins formed by association of polypeptide subunits. Individual globular polypeptide subunits may associate to form biologically active oligomers. The association is specific.
  1. A limited number of subunits is involved.
    • Oligo = several; mer = part, or subunit.
    • 2 (dimer) and 4 (tetramer) are most common, but other aggregates occur, such as trimers, pentamers, etc.
  2. The subunits may be identical or they may be different.
  3. Subunit interaction is entirely noncovalent between complementary regions on the subunit surface.
    • Hydrophobic regions can interact.
    • Hydrogen bonding may occur.
    • Electrostatic (ionic) attraction may be involved.

If covalent links exist (such as disulfide bridges) then the structure is not considered quaternary. In proteins with quaternary structure the deaggregated subunits alone are generally biologically inactive.

Here are some examples of quaternary structure.

Quaternary structure in proteins is the most intricate degree of organization considered to be a single molecule. Higher levels of organization are multimolecular complexes.

Incorporation of nonprotein components into proteins

The resulting species are called conjugated proteins. If we establish a classification of proteins by composition we can identify two categories.
  1. Simple proteins consist of polypeptide only.
  2. Conjugated proteins also contain a nonprotein moiety which frequently plays a role in biological function.

Many different kinds of compound are found in conjugated proteins. A few examples are:

Heme
Lipid
Carbohydrate
Metal ions
Phosphate

Nomenclature: the word "conjugated" is from the Latin, cum = with and jugum = yoke. The protein and nonprotein moieties are yoked with one another (like oxen) to work together.

The apoprotein = the protein without its nonprotein component.
The prosthetic group = the nonprotein portion alone.
The conjugated protein = the apoprotein + prosthetic group.

Metalloproteins

Metals found as prosthetic groups of proteins include Mg, Ca, V, Cr, Mn, Fe, Co, Cu, Zn and Mo. (Also W in some archaeobacteria.) These metals can form coordination complexes. They accept electron pairs from atoms with unshared electron pairs. The electron pairs fill vacant orbitals of the metal ion, such as sp3d2 orbitals. Some of these metals can easily undergo oxidation-reduction, e.g.
Fe(II) = Fe(III) + e-
All are relatively small; no heavy metals (e.g., Pb, Hg) are included. The roles of metals in proteins are related to these properties.
Roles involving simple binding include:
Complexing several groups of the protein simultaneously, thereby stabilizing the three-dimensional structure of the protein. The protein acts as a polydentate ligand. Example: thermolysin loses its structure if Ca(II) is removed. [Adv. Enzymol. 56:378]
Binding the protein and some other molecule together (e.g., an enzyme and its substrate are ligands of the metal ion simultaneously).
Participation in the protein's function, such as activation of a substrate. When a metal accepts an electron pair from a bound substrate, the resulting electron deficiency may make the substrate more reactive.
Metals frequently participate in oxidation-reduction. Sometimes bound metals participate directly in biological oxidation-reduction reactions by accepting or donating an electron (changing oxidation state).

Sometimes other organic or inorganic compounds share metals with proteins.

Sulfide ions participate in formation of the iron-sulfur centers of redoxins.

Heme -- here the iron is part of a large organic complex. It is bound by coordination links to the organic moiety. Binding to the protein is
  1. partly through one of the remaining coordination links.
  2. partly through the organic moiety.

Lipoproteins

Protein associates with lipid through hydrophobic interactions involving the protein's hydrophobic R-groups. Lipoproteins are pseudomicellar structures. Micelles are orderly arrays of molecules having polar heads and hydrophobic tails. The arrays are of molecular dimensions (e.g., two molecules across). In water, the polar heads orient outward, and the hydrophobic tails cluster in the center of the micelle.

Lipoproteins resemble micelles in some respects. The structure of lipoproteins typically includes the following features. Their outer surface is coated with polar lipids, with protein intermingled. Their interior is a region of randomly oriented neutral lipid.

Lipoproteins are usually much larger than two molecules across. The role of the polar lipid and protein on the surface is to solubilize the neutral lipid interior. Protein interacts with the lipid of lipoproteins through amphipathic helices. Alpha-helical regions of apolipoproteins have polar amino acids on one surface, and nonpolar ones on the opposite surface. The helix lies on the surface of the structure, with the polar groups oriented outward toward the water, and the nonpolar groups buried in the lipid. (Recall the four-helix bundle domains of proteins, in which contacts between helices involved hydrophobic residues at the contact points.)
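To make "polar on one surface, nonpolar on the opposite surface" concrete, here is a minimal sketch that places each residue around the helix axis at 100 degrees per residue (3.6 residues per turn) and asks how tightly the nonpolar residues cluster on one face. The two-class residue split, the function name and both test sequences are assumptions made for illustration; they are not taken from any particular apolipoprotein.

```python
import math

# Assumption: a crude two-class split of residues. Real analyses use a
# numeric hydrophobicity scale instead of a yes/no set.
NONPOLAR = set("AVLIMFWC")

DEGREES_PER_RESIDUE = 100.0  # 360 degrees / 3.6 residues per alpha-helical turn


def nonpolar_face_score(sequence):
    """Return a value from 0 to 1: how tightly nonpolar residues share one face."""
    x = y = 0.0
    count = 0
    for i, residue in enumerate(sequence.upper()):
        if residue in NONPOLAR:
            angle = math.radians(i * DEGREES_PER_RESIDUE)
            x += math.cos(angle)   # each nonpolar residue adds a unit vector
            y += math.sin(angle)
            count += 1
    if count == 0:
        return 0.0
    return math.hypot(x, y) / count  # length of the mean vector


amphipathic = "LKKLLKKLLKLLKKLLKK"   # hypothetical idealized amphipathic helix
alternating = "LKLKLKLKLKLKLKLKLK"   # same composition, different order
print(round(nonpolar_face_score(amphipathic), 2))
print(round(nonpolar_face_score(alternating), 2))
```

The first sequence scores about 0.64 because all of its leucines fall on one half of the helical wheel. The second has the same composition, but its leucines are spread evenly around the axis, so the score is essentially zero.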

Consequence of charged surface: (not unlike many proteins) a tendency to stick to things.

  1. Lipoprotein concentrations go up in infection. This has a protective effect. (NEJM 6/30/92)
  2. Copper, transferrin and other proteins bind to HDL, making it more effective in preventing oxidation of LDL, thereby protecting against atherosclerosis. (PNAS 1992, p. 6993)

Membrane proteins are lipoprotein-like in that they have nonpolar amino acids in strategic locations to permit interaction with the membrane lipid. Proteins of the membrane surface may be structured like the apoproteins of lipoproteins, with amphipathic helices.

Some membrane proteins traverse the membrane. The region of the protein that is completely immersed in membrane should consist entirely of hydrophobic amino acids. A common structural motif to accomplish this is an alpha-helix consisting of at least 22 hydrophobic amino acyl groups. This makes an alpha-helix long enough to span a membrane. In arrays of membrane-spanning helices, helices in the interior of the array could be shorter.
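The arithmetic behind "long enough to span a membrane" can be sketched as follows. The rise of 1.5 Angstroms per residue is the standard alpha-helix value; the figure of roughly 30 Angstroms for the hydrophobic core of a bilayer is an assumption used here only for illustration, and the function names are made up for this example.

```python
import math

RISE_PER_RESIDUE_ANGSTROM = 1.5  # axial rise per residue in an alpha-helix


def helix_length_angstrom(n_residues):
    """Length along the helix axis for n_residues in an alpha-helix."""
    return n_residues * RISE_PER_RESIDUE_ANGSTROM


def residues_to_span(thickness_angstrom):
    """Smallest whole number of residues whose helix is at least this long."""
    return math.ceil(thickness_angstrom / RISE_PER_RESIDUE_ANGSTROM)


print(helix_length_angstrom(22))   # 33.0
print(residues_to_span(30.0))      # 20 (30 Angstroms is an assumed core thickness)
```

A 22-residue helix is therefore about 33 Angstroms long, a little more than the assumed thickness of the hydrophobic core.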

The problem of proline in transmembrane "helices:" Mostly you find hydrophobic residues in transmembrane helices, and their length is about right, around 24 residues. You also find PROLINE. This is very common. Does it violate the prohibition against proline in the helix? Probably not. The current opinion of qualified protein chemists is that when we eventually determine the exact structures of these molecules, we will find the expected kink in the helix at each P residue, and that it will prove to be important in the biological function of the protein.

Glycoproteins are proteins with carbohydrate prosthetic groups.

Typical structure -- one or more chains of monosaccharide units, 1 to 30 units long. It may be straight or branched, and it is usually covalently linked to the apoprotein in one of three major ways.
  1. It may be N-linked (Type I): N-acetylglucosamine (a sugar with an acetylated amino group in place of a hydroxyl group) at the reducing end of a carbohydrate chain is linked to the amide nitrogen of an asparagine residue. The asparagine residue must be in the sequence Asn-X-Thr (or Ser), where X is any amino acid residue. This specific sequence is called a sequon. No other asparagine will do.
  2. It may be O-linked (Type II): Here the reducing end of a carbohydrate chain (usually an N-acetylgalactosamine residue) is linked to the hydroxyl of a seryl or threonyl residue.
  3. It may be O-linked (Type III): In this case the reducing end of a carbohydrate chain (usually galactose) is linked to the hydroxyl of a hydroxylysyl residue in collagen. (Hydroxylysine is made from lysine in collagen after the collagen has been synthesized.)
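Because the N-linked sequon is a fixed pattern, candidate asparagines can be found by a simple scan of the one-letter sequence. The sketch below is illustrative only: the function name and the peptide are hypothetical. The optional `exclude_proline` switch is an addition that does not come from the text above; it reflects the commonly cited refinement that proline in the X position blocks glycosylation.

```python
import re


def find_sequons(sequence, exclude_proline=False):
    """Return 0-based positions of Asn residues in an Asn-X-Ser/Thr sequon."""
    x = "[^P]" if exclude_proline else "."
    # The lookahead keeps matches from consuming X and Ser/Thr, so
    # overlapping sequons (e.g. N-N-T-S) are all reported.
    pattern = re.compile("N(?=" + x + "[ST])")
    return [m.start() for m in pattern.finditer(sequence.upper())]


peptide = "MNGSANPTQNNTS"  # hypothetical sequence, for illustration only
print(find_sequons(peptide))                        # [1, 5, 9, 10]
print(find_sequons(peptide, exclude_proline=True))  # [1, 9, 10]
```

A match only marks a possible site; whether a given sequon is actually glycosylated in the cell is not something the sequence alone can tell us.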

Glycoproteins have two major types of functions.

The first is recognition: carbohydrate prosthetic groups serve as antigenic sites (e.g., blood group substances are carbohydrate prosthetic groups), intracellular sorting signals (mannose 6-phosphate bound to a newly synthesized protein sends it to the lysosomes), etc.

Or they may be structural components of the organism: e.g., the proteoglycans of cartilage. The central core is a polysaccharide called hyaluronic acid. Many glycoprotein branches are attached to the hyaluronic acid noncovalently. Each branch is a glycoprotein (core protein) with many carbohydrate chains (chondroitin sulfate -- alternating N-acetylgalactosamine and glucuronic acid -- and keratan sulfate -- alternating N-acetylglucosamine and galactose) attached covalently (xylose beta -> O-Ser). The attachment of the core protein to the hyaluronic acid is mediated by a protein called link protein.

We've now seen interactions between protein and metal ions, lipid and carbohydrate. Let's now turn to

Interactions between proteins and nucleic acids.

We will see that these are based on structural complementarity. There are three patterns (motifs) that I want to present.

The zinc finger motif

A small Zn-stabilized structural domain found in proteins that interact with nucleic acids. The zinc finger is a loop of about 25 amino acyl residues stabilized by a Zn atom.

Zn complexed to His and/or Cys maintains the structure of the domain.

Unlike a -S-S- bridge, the Zn complex will not be broken by reducing conditions within the cell.
Unlike Cu or Fe, Zn does not participate in oxidation-reduction reactions that could generate free radicals which might damage nucleic acids.

Other amino acyl residues in the loop are involved in binding to specific nucleotides of the nucleic acid or helping to maintain the folded structure of the domain.

Zinc fingers occur in proteins in tandem arrays. They are joined to nearby zinc fingers by short linking regions of peptide. They are spaced to fit into the major groove of DNA, with the bases of the alpha-helices down in the grooves, and the beta-loops touching the double helix.

The leucine zipper

A pair of amphipathic alpha-helices joining two subunits of a dimeric protein that binds to DNA. Some sites in DNA important to biological control have twofold symmetry: read 5' to 3', the base sequence is the same (or very nearly the same) on both strands.
Example:
5' ...TGACTCA... 3'
3' ...ACTGAGT... 5'
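The symmetry of this site can be checked by writing out the opposite strand in its own 5' to 3' direction and comparing. This is a minimal sketch; the function names are made up for the example.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Sequence of the opposite strand, also read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def mismatches(strand):
    """Positions where a strand differs from its own reverse complement."""
    other = reverse_complement(strand)
    return [i for i, (a, b) in enumerate(zip(strand.upper(), other)) if a != b]


site = "TGACTCA"
print(reverse_complement(site))  # TGAGTCA
print(mismatches(site))          # [3]
```

The two strands read the same except at the central base pair. A site with an odd number of base pairs can never match itself at the center, so the twofold symmetry here lies in the flanking half-sites, TGA and TCA.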

A protein designed to bind at such a site might also be symmetric; this could be accomplished if the protein were a head-to-head dimer.

A class of DNA binding proteins appears to form such dimers through alpha-helices having regularly spaced leucyl residues along one edge. Interaction between the protein monomer units is thought to be through leucyl residues along the edges of the amphipathic helices, sort of like the 4-helix bundle, but with just two helices.

Originally it was thought that the leucyl residues interdigitated (hence the name, "leucine zipper"), but it is now believed that they face each other (reality in the form of x-ray crystallography strikes again). In any case, the symmetric dimer binds to the symmetric region of the DNA through special binding domains.

The helix-turn-helix motif

Two short adjacent alpha-helices that cross one another. One alpha-helix fits into the major groove of DNA, and interacts with specific bases; this is called the recognition helix. A short segment of protein links the recognition helix to a second helix; this is the turn, and is so named because it contains a so-called beta-turn, a well recognized structural element of proteins. The second helix lies across the major groove of DNA, and binds nonspecifically.

A dimeric protein can have a helix-turn-helix motif in each subunit, and if the monomer units are identical it can thereby recognize and bind to symmetric DNA structures.

Denaturation

CONCEPT: DESTRUCTION OF A MACROMOLECULE'S THREE-DIMENSIONAL STRUCTURE REQUIRES DISRUPTION OF THE FORCES RESPONSIBLE FOR ITS STABILITY. THE ABILITY OF AGENTS TO ACCOMPLISH THIS DISRUPTION -- DENATURATION -- CAN BE PREDICTED ON THE BASIS OF WHAT IS KNOWN ABOUT MACROMOLECULAR STABILIZING FORCES. DENATURED MACROMOLECULES WILL USUALLY RENATURE SPONTANEOUSLY (UNDER SUITABLE CONDITIONS), SHOWING THAT THE MACROMOLECULE ITSELF CONTAINS THE INFORMATION NEEDED TO ESTABLISH ITS OWN THREE-DIMENSIONAL STRUCTURE.

Denaturation is the loss of a protein's or DNA's three-dimensional structure. The "normal" three-dimensional structure is called the native state.

Denaturation is physiological -- structures ought not to be too stable.

  1. Double stranded DNA must come apart to replicate and for RNA synthesis.
  2. Proteins must be degraded under certain circumstances.

Loss of native structure must involve disruption of factors responsible for its stabilization. These factors are:

  1. Hydrogen bonding
  2. Hydrophobic interaction
  3. Electrostatic interaction
  4. Disulfide bridging (in proteins)

Note that no break in the polymer chain (disruption of primary structure) is involved in denaturation.

Denaturing agents disrupt stabilizing factors.

  1. Agents that disrupt hydrogen bonding:

    Heat -- thermal agitation (vibration, etc.) -- will denature proteins or nucleic acids. Heat denaturation of DNA is called melting because the transition from native to denatured state occurs over a narrow temperature range. As the purine and pyrimidine bases become unstacked during denaturation they absorb light of 260 nanometers wavelength more strongly. The abnormally low absorption in the stacked state is called the hypochromic effect.

    Urea and guanidinium chloride -- work by competition. These compounds contain functional groups that can accept or donate hydrogen atoms in hydrogen bonding. [picture of structures] At high concentration (8 to 10 M for urea, and 6 to 8 M for guanidinium chloride) they compete favorably for the hydrogen bonds of the native structure. Hydrogen bonds of the alpha-helix will be replaced by hydrogen bonds to urea, for example, and the helix will unwind.

  2. Agents that disrupt hydrophobic interaction.

    Organic solvents, such as acetone or ethanol -- dissolve nonpolar groups.

    Detergents -- dissolve nonpolar groups.

    Cold -- increases solubility of nonpolar groups in water. When a hydrophobic group contacts water, the water dipoles must solvate it by forming an orderly array around it. The array is called an "iceberg," because it is an ordered water structure, but not true ice. The ordering of water in an "iceberg" decreases the randomness (entropy) of the system, and is energetically unfavorable. If hydrophobic groups cluster together, contact with water is minimized, and less water must become ordered. This is the driving force behind hydrophobic interaction. (The clustering together of hydrophobic groups is also entropically unfavorable, but not as much so as "iceberg" formation.) At low temperatures, solvation of hydrophobic groups by water dipoles is more favorable. The water molecules have less thermal energy. They can "sit still" to form a solvation "iceberg" more easily. The significance of cold denaturation is that cold is not a stabilizing factor for all proteins. Cold denaturation is important in proteins that are highly dependent on hydrophobic interaction to maintain their native structure.

  3. Agents that disrupt electrostatic interaction.

    pH extremes -- Most macromolecules are electrically charged. Ionizable groups of the macromolecule contribute to its net charge (sum of positive and negative charges). Bound ions also contribute to its net charge. Electric charges of the same sign repel one another. If the net charge of a macromolecule is zero or near zero, electrostatic repulsion will be minimized. The substance will be minimally soluble, because intermolecular repulsion will be minimal. A compact three-dimensional structure will be favored, because repulsion between parts of the same molecule will be minimal. The pH at which the net charge of a molecule is zero is called the isoelectric pH (or isoelectric point).

    pH extremes result in large net charges on most macromolecules. Most macromolecules contain many weakly acidic groups. At low pH all the acidic groups will be in the associated state (with a zero or positive charge). So the net charge on the protein will be positive. At high pH all the acidic groups will be dissociated (with a zero or negative charge). So the net charge on the protein will be negative. Intramolecular electrostatic repulsion from a large net charge will favor an extended conformation rather than a compact one.
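The way net charge swings from positive to negative as pH rises can be sketched with the Henderson-Hasselbalch relationship applied to each class of ionizable group. Everything specific below is an assumption for illustration: the pKa values are approximate textbook figures (real values shift with the environment of each group), the group counts describe a hypothetical protein, and the function names are invented for this example.

```python
# Assumed approximate pKa values. Groups in ACIDIC_PKA go from 0 to -1 on
# losing a proton; groups in BASIC_PKA go from +1 to 0.
ACIDIC_PKA = {"C-terminus": 3.1, "Asp": 3.9, "Glu": 4.1, "Cys": 8.3, "Tyr": 10.1}
BASIC_PKA = {"His": 6.0, "N-terminus": 8.0, "Lys": 10.5, "Arg": 12.5}


def net_charge(counts, ph):
    """Sum of the average charges on all ionizable groups at a given pH."""
    charge = 0.0
    for group, n in counts.items():
        if group in BASIC_PKA:
            # fraction still protonated (charge +1)
            charge += n / (1.0 + 10.0 ** (ph - BASIC_PKA[group]))
        else:
            # fraction dissociated (charge -1)
            charge -= n / (1.0 + 10.0 ** (ACIDIC_PKA[group] - ph))
    return charge


def isoelectric_ph(counts, low=0.0, high=14.0):
    """Find the pH of zero net charge by bisection (charge falls as pH rises)."""
    for _ in range(60):
        mid = (low + high) / 2.0
        if net_charge(counts, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Hypothetical protein: counts of each ionizable group.
protein = {"N-terminus": 1, "C-terminus": 1, "Asp": 5, "Glu": 6,
           "His": 2, "Lys": 8, "Arg": 4}
for ph in (1.0, 7.0, 13.0):
    print(ph, round(net_charge(protein, ph), 2))
print("isoelectric pH:", round(isoelectric_ph(protein), 2))
```

At pH 1 nearly every group is protonated and the net charge is close to +15; at pH 13 nearly every group has lost its proton and the net charge is close to -11. Somewhere in between the two contributions cancel, and that pH is the isoelectric point.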

  4. Agents that disrupt disulfide bridges -- destabilize some proteins.

    Agents with free sulfhydryl groups will reduce (and thereby cleave) disulfide bridges.

    Example:

    2 HO-CH2-CH2-SH + R1-S-S-R2 = R1-SH + HS-R2 + HO-CH2-CH2-S-S-CH2-CH2-OH

Some proteins are stabilized by numerous disulfide bridges; cleaving them renders these proteins more susceptible to denaturation by other forces.

Renaturation is the regeneration of the native structure of a protein or nucleic acid. Renaturation requires removal of the denaturing conditions and restoration of conditions favorable to the native structure. This includes

Usually considerable skill and art are required to accomplish renaturation. The fact that renaturation is feasible demonstrates that the information necessary for forming the correct three-dimensional structure of a protein or nucleic acid is encoded in its primary structure, the sequence of monomer units. But...

This folding may be slow; what happens in the cell during protein synthesis? Guidance may be needed for it to occur correctly and rapidly.

Molecular chaperones are intracellular proteins which guide the folding of proteins, preventing incorrect molecular interactions. They do NOT appear as components of the final structures. Chaperones are widespread, and chaperone defects are believed to be the etiology of some diseases. Medical applications of chaperones may be expected to include things such as


jb

Last modified 1/5/95