Tag Archives: DNA

PCR, Pyrosequencing, and all that Jazz

So for the past few months I’ve been frantically studying for not the GRE, but for my preliminary exams for my PhD. I will therefore be adding blogs that answer potential questions from my prelims in addition to GRE questions. This should be fun for all!

This is my brain.
This is my brain on Prelims.

My advisors have been giving me “hints” (i.e. “know everything in the world”) about the type of questions they are going to ask me, and one such hint made it perfectly clear that I had better have a very clear understanding of the techniques I am using in my research. I won’t go into detail, but my research involves flies and bacteria, and I’m identifying those bacteria with pyrosequencing. I’ll then figure out the concentration of bacteria using qPCR or rtPCR, whichever our lab can afford. What’s the best way to understand these techniques? Explain them to you, good readers! Here we go.

First, a little bit of background. Both PCR and pyrosequencing involve DNA, and therefore you have to know a little bit about how DNA replicates. Long story short: DNA is made of long chains of 4 nucleotides: A, T, G, and C. The sequence of these nucleotides (called bases) determines which amino acids are produced, which in turn determines which proteins are produced, which then make up all that we see as life. Neat! DNA is packaged into cells in double strands…each strand is complementary to the other. When these complementary strands unzip, they can pair up with free nucleotides and make copies of themselves. This is what all these techniques are based on.
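If it helps to see the pairing rule written down, here is a tiny Python sketch of it. The function name and the example sequence are made up for illustration; the only thing it encodes is that A pairs with T and G pairs with C.

```python
# Base-pairing rules: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())

template = "ATGGCA"  # made-up example sequence
print(complement(template))  # TACCGT
```

Taking the complement of the complement hands you back the original strand, which is the whole reason an unzipped strand can be used to rebuild its partner.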

DNA replicating

Ok, let’s get into a little bit more detail about how this replication works exactly, shall we? Nucleotides are made up of a nucleobase (adenine, guanine, cytosine, or thymine: A, G, C, or T), a five carbon sugar, and some phosphate groups. Don’t let those terms “5 carbon sugar” and “phosphate group” scare you–a 5 carbon sugar is just what it sounds like…a sugar with 5 carbon atoms built around a ring:

See the numbers? You count the carbons clockwise, from 1' to 5'.

Hello there 5 carbons! Ok, we’ve got one of our bases attached to this ring of carbons, plus a phosphate group:

Isn't it cute? Just a little phosphorus surrounded by oxygen.

See? Phosphate groups are simple. I always get a little freaked out when scientists start changing the endings of words–all those “-ates” and “-ites” throw me off. However, they are really just word parts that tell me a bit about the compound. I won’t go into them here, but you can read about them on Wikipedia. Thanks Wiki!

Ok, so now we’ve got our complex compound: A,T,C, or G attached to that 5 carbon ring, which has some phosphate groups hanging on it.

Look at this whole thing put together! Ain't it cute?

Now, phosphate groups are reactive–they like attaching to what we call hydroxyl groups. Hydroxyl groups are simple things with confusing names (like most things in chemistry, I think). It is simply an oxygen bonded to a hydrogen. How simple is that?

See? In this case we've got our hydroxyl group bonded to a carbon. How cozy they look!

So phosphate groups and hydroxyl groups totally love each other, and they want to bond ALL THE TIME. It’s actually kinda cute. And a little gross. Anyhow, nucleotides have all these groups in particular places on their carbon rings. Let’s look at that picture of the five carbon ring again:

Notice the phosphate group up in the top left.

Ok, see the numbers? We pronounce those numbers as “five prime” or “three prime.” On nucleotides, the phosphate group is attached to the five prime carbon. See it? A hydroxyl group is attached to the three prime carbon. When two nucleotides are lined up next to each other, the 5′ (five prime) phosphate group bonds with the 3′ (three prime) hydroxyl group, and they totally make out. And make long chains of nucleotides, which become DNA. Whatever.

The make out sessions are circled in blue.

Therefore, in order for a nucleotide to attach to the end of a chain of nucleotides, the 3′ end has to be exposed at the end of the chain. Don’t worry, you’ll understand why I’ve explained all this in a second.

Now that we have the basics of how DNA replicates and how nucleotides stick together to facilitate that replication, let’s move on to some of the procedures I promised I’d explain forever ago.


PCR is short for Polymerase Chain Reaction, and is a method we can use to clone sequences of DNA. We often want to clone these sequences a whole bunch (on the order of a billion copies of a single sequence!), so technology is obviously involved. This is how it works:

DNA is collected from somewhere. It can be anything, really, and we only technically need a single copy (although more DNA makes this much easier). We take that DNA and break it apart. Knowing what we know about the structure of DNA makes this process simple. The two complementary strands of DNA are attached via hydrogen bonds. Heat can break those hydrogen bonds (this is one of the reasons living things can only tolerate so much heat–DNA actually breaks apart). The temperature at which the two strands of DNA dissociate is called the DNA’s melting temperature, and varies with the DNA sequence.

You see, base pairs are held together by slightly different numbers of hydrogen bonds: A attaches to T via two hydrogen bonds, while G attaches to C via three. The more bonds present, the more heat it takes to break them. Therefore, GC pairs take more heat to separate than AT pairs.

See those two Hs in the middle? Those are the hydrogen bonds. There are two of them.
See how there are three Hs here? Those are three hydrogen bonds.

If a strand of DNA has a bunch of GC pairs, then it’s going to take more heat to cause the complementary strands to dissociate. More heat means a higher melting temperature. But I digress.
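To put numbers on that, here is a small Python sketch that counts the hydrogen bonds holding a stretch of double stranded DNA together, using 2 per AT pair and 3 per GC pair. The function names and the two example sequences are made up for illustration.

```python
# Hydrogen bonds per base pair: AT pairs have 2, GC pairs have 3.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

def gc_fraction(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def hydrogen_bonds(seq: str) -> int:
    """Total hydrogen bonds between this strand and its complement."""
    return sum(H_BONDS[base] for base in seq.upper())

for seq in ("ATATATAT", "GCGCGCGC"):  # made-up sequences of equal length
    print(seq, gc_fraction(seq), hydrogen_bonds(seq))
```

Two strands of the same length can differ a lot in how many bonds hold them together, and the GC-rich one is the one that needs more heat.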

The DNA is heated until all the hydrogen bonds are broken, and then we can focus in on the particular part of the DNA that we want to copy (called “amplify”). I suppose we could do the entire genome, but that would take FOREVER and use up a lot of reagents. We don’t want that. Let’s focus on a single gene, or section, or tiny little part instead, shall we?

So, what part do we amplify? Well, that depends on the question you’re asking. Most of the time you do PCR so you can identify a particular species, or look to see if two people are related, or identify a person, or something like that, and the regions you choose to amplify vary for each of these questions. My research involves using PCR to identify a species, so I’m going to look at a particular section of RNA called the 16S rRNA region. Let me explain (because you know you want me to).

All organisms, be they eukaryotic (having membrane bound organelles) or prokaryotic (no membrane bound organelles), have ribosomes. Ribosomes are those parts of a cell that take amino acids and knit them together into proteins. Without ribosomes, proteins would never be made, and life as we know it wouldn’t exist. Thanks ribosomes!

Anyhow, ribosomes are able to do what they do because they are made up of RNA (RNA is that complement to DNA that takes information all over the place). The RNA in ribosomes is broken up into two subunits: a large subunit and a small subunit, with messenger RNA (mRNA) smashed between the two subunits.

The amount of RNA in the ribosome depends on if the organism is a prokaryote or a eukaryote. Eukaryotes have larger chunks of RNA, and therefore the subunits are larger. The size of RNA is measured a little strangely–it doesn’t have to do with length or weight or mass or some simple measuring tool like that. Things this small are hard to measure with a ruler anyhow. No, RNA is measured by where it floats in a liquid while spinning in a centrifuge: the bigger it is, the lower it will sink when spun around. The smaller it is, the higher it will float.

Think of it this way: you know those spinning roller coasters at amusement parks where you stand against a wall and then the floor drops out and you stick? The ones where they say “if you are going to vomit, cover your mouth and raise your hand!” because if someone pukes EVERYONE is gonna have a bad day?

OMG, nobody throw up! Also, I always wanted to hang upside down on these, but I was never brave enough.

You ever been on one of these? They’re super fun. Did you ever look around while it was spinning, though? If you had, you would have noticed that larger people tended to slide down the wall (sometimes coming to rest on the floor ), while smaller people could stay really high up on the wall. This is a good way to separate the really big people from the really small people–the smaller a person is, the higher on the wall he’ll sit while the ride is spinning.

You can use this same principle to separate different sizes of RNA in the ribosome. You put some RNA in a liquid, spin it around, and then find out how high up on the wall it stuck. A Swedish chemist named Theodor Svedberg figured this out sometime in the 20th century. (I wonder if he went on one of these rides before going into the lab one day? I kinda hope he did). He spun RNA around and then numbered the places that it stuck to the wall. The lower the number, the higher up on the wall it stuck, and therefore the smaller the size. Naturally, he named these number units after himself, so now we have the strangely named “Svedberg unit” (hee!) to measure RNA. We abbreviate the Svedberg as S (because spelling “Svedberg” is hard).

Therefore, when you see a number followed by “S,” it means that you can tell the size of that RNA. For example, eukaryotic RNA is broken up into a large subunit, which is 60S, and a small subunit, which is 40S. The 60S means that the larger subunit sank down to the 60 mark in the spinning tube, while the smaller subunit only sank down to the 40 mark. (Strictly speaking, the number describes how fast something settles in the centrifuge rather than a literal mark on the tube, but the picture works.) Make sense?

Now, of course we can break up the subunits of RNA into smaller and smaller bits. So we do. In bacteria (prokaryotes), the RNA is made of two subunits: the 50S and the 30S. We break up the 30S subunit into tiny, bite sized pieces because it’s easier to deal with that way. A long time ago some super smart scientist realized that a small portion of the 30S subunit was highly conserved, and could be easily used to tell species apart. This is called the 16S rRNA in prokaryotes, and the 18S rRNA in eukaryotes. It’s used all of the time, and there has been a lot of study on these regions, so most scientific studies use this in some way.

So I am, too. Since I want to be able to tell species of bacteria apart, I chose to use the 16S rRNA region to amplify and look at for my study. This is really nice, because there are primers out there that will amplify this region very easily. Ah, the perks of looking at a well-studied bit of RNA!

But I need to have enough RNA so I can look at it, and RNA is tiny…especially when I’m talking about just the 16S region. What’s a girl to do? Amplify!

PCR is the amplification of regions of DNA or RNA. (Am I repeating myself? Probably). Knowing the properties of DNA/RNA allows us to target specific regions (like the 16S region) and selectively amplify that region alone. Step one: break up double strands. Remember how to do that? Yep, heat (go over those hydrogen bonds above if you forget). As luck would have it, we know the melting temperature of DNA and RNA (due to calculations of GC content), and so if we heat up our sample to around 94 C, those bonds will rupture and we’ll be left with single stranded DNA. (For simplicity, I’m going to talk about DNA from here on out, but the same process holds true for RNA).

Once we have our single strand, we need to focus in on just the region we want (like the 16S region). To do that, we need to tell the DNA what to replicate, and then give it the means to do so. We do this by using enzymes and primers.

Enzymes are proteins which speed up reactions without being consumed themselves. The most important enzyme in a PCR reaction is called Taq polymerase (you know it’s an enzyme when you see the -ase suffix at the end of the word). A polymerase is an enzyme that attaches molecules together (and we just so happen to want to have many nucleotides attached together, so it works out for us).

Every cell that has DNA (so, pretty much every cell ever) has its own polymerases that take care of replicating DNA and of transcribing bits of DNA to do work in the cell. PCR uses a polymerase from a species of bacteria, Thermus aquaticus, which normally lives in hot springs.

So steamy! Who knew there would be life in such unfavorable conditions?!?

Have you been to any hot springs? They are ridiculous. I heard a story once about someone who jumped in one at Yellowstone national park. The meat fell off his bones before he was able to resurface. That’s stupid hot. Anyhow, bacteria are able to survive in these conditions, and do quite well, thankyouverymuch. Why am I telling you this? Well, cells that live happily at lower temperatures have enzymes that work perfectly at lower temperatures. If the temps get too high, the enzymes denature and no longer work. When we run PCR, we first start out with that melting step where we raise the temperature to break apart double stranded DNA. If we use enzymes in the PCR reaction that are denatured during that step, we either can’t continue, or we have to add more enzyme after we cool the reaction down. This is EXACTLY how PCR used to work–some poor grad student (because you KNOW professors weren’t in the lab doing this for hours on end) would have to add enzyme every 3-4 minutes during a reaction, all day. Talk about a crappy job!

Imagine doing this every 2-3 minutes, for several hundred little tubes, 30-40 times a day. Worst. Job. Ever.

So after a few years of having to manually add more and more enzyme every PCR cycle, someone thought “you know, there’s gotta be a better way!” Necessity is the mother of invention and all that, and some brilliant soul thought that there must be a DNA polymerase that is stable at high temperatures. Sure enough, our friendly, heat-loving bacterium saved the day, and gave us a polymerase that doesn’t denature at 95 C. Because it came from the bacterium Thermus aquaticus, we now call it Taq polymerase.

Alright, so now we have our sample of DNA, heat to break those double strands apart, and an enzyme that is stable during hot spells that will facilitate copying of regions of DNA. Now it’s time to tell the polymerase which region we want to copy!

We do this by using what are called primers. Primers are short bits of DNA that selectively attach to certain regions. Scientists design primers to attach to the parts of the DNA on either side of the region we want to amplify:

See? Primers on either side.

These primers are simply short bits of DNA that attach to the 3′ end of the single stranded DNA. These primers attach to the regions we’re after, and form stable hydrogen bonds. Of course, we can’t do this at the high temperature we used to break apart the DNA, so we have to cool the reaction down to 45-55 C for the primers to attach. We call this temperature the annealing temperature, and this step the annealing step. The exact temperature needed depends on how big your primers are, and how many Cs and Gs are involved.

The longer your primer is, the less likely it is to accidentally attach to random regions of the DNA, but the more likely it is to miss the region you actually want to amplify (long primers take a lot of time to attach, and we may not give them enough time). If you want a primer that is very specific, you design one that is really long (since it won’t attach to any other region of the DNA just by chance). If you want a primer that is sensitive, however, you design one that is shorter (since it will definitely get the region you’re after, even if you don’t give it a lot of time). Therefore, when scientists design primers, they have to think about how specific and sensitive they want their primers, and how many mistakes they’re willing and able to put up with (called “noise”).
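Here is a rough back-of-the-envelope sketch in Python of the specificity half of that trade-off. It assumes the DNA is a random string with all four bases equally likely (real genomes are not), and the genome size and function name are made up for illustration. It says nothing about the sensitivity side, which depends on how primers behave in the tube.

```python
def expected_chance_matches(primer_length: int, genome_length: int) -> float:
    """Expected number of exact matches to one primer in random sequence.

    Assumes every base is equally likely and independent of its neighbors.
    """
    positions = max(genome_length - primer_length + 1, 0)
    return positions / 4 ** primer_length

GENOME_LENGTH = 5_000_000  # made-up genome size, in bases

for length in (8, 12, 16, 20):
    print(length, expected_chance_matches(length, GENOME_LENGTH))
```

Under these assumptions a short primer is expected to land in dozens of places purely by chance, while a primer of 20 bases is expected to land essentially nowhere it wasn’t designed for.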

They also have to consider the GC content, due to those pesky extra hydrogen bonds. The higher the GC content in a primer, the higher your annealing temperature. As a general rule of thumb, you want to have an annealing temperature about 5 C below the melting temperature of your primers.
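For a quick estimate, one common shortcut for short primers is the Wallace rule: 2 C for every A or T, plus 4 C for every G or C. The Python sketch below uses that rule (my choice, not something primer companies necessarily use) together with the 5 C rule of thumb above. The primer sequence is made up.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature (C) of a short primer via the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

def annealing_temp(primer: str, offset_c: int = 5) -> int:
    """Annealing temperature set a few degrees below the melting temperature."""
    return wallace_tm(primer) - offset_c

primer = "ATGCGTACCTGAAGCTGA"  # made-up 18-base primer
print(wallace_tm(primer), annealing_temp(primer))  # 54 49
```

This is only an approximation that works best for primers somewhere under about 20 bases; proper primer design software uses more detailed models.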

Primer design is considered a little bit of science, and a little bit of art. Designers use published DNA sequences to choose good primer sites, then send off to specialized companies that make the primers for them. This is why you often see people using well-studied areas of DNA or RNA for research–the primers are already in existence, you can probably buy them in bulk, and there are certain regions that are found in all living things so you can use what are called “universal primers” to amplify DNA even in species you haven’t identified. Another plus for using the 16S region in my work!

So, we’ve broken apart the double stranded DNA, gotten our heat-resistant polymerase ready to make more DNA, and found our primers to tell that polymerase where to do its work. Now what?

Well, now we let nature take its course. We supply the Taq polymerase with all the tools it needs to do its job: the perfect environment (PCR buffer that puts everything at the optimal pH and the perfect temperature), a DNA template (our sample DNA which we broke apart), a bunch of nucleotides (in lab manuals this is called dNTP, and is really just a bunch of As, Ts, Cs, and Gs), and enough time to get the job done. We provide this in the extension or elongation step, where we raise the temperature to around 72 C (which is the optimal temperature for Taq polymerase) and let it do its work. The enzyme takes all those free-floating nucleotides and lines them up all nice and neat on the template DNA.

Look at it go!

Depending on how long our target site is, we give the polymerase 1-3 minutes to do its job (the longer the site, the longer it’s gonna take to copy it, naturally). We then repeat the process 30-40 times. After we’ve repeated it that many times, we do a final extension step at the very end, just to give the polymerase some extra time to copy all the remaining single stranded DNA (usually 7 minutes does the job nicely), and then we cool the whole reaction down to refrigerator-type temperatures to hold the DNA until we’re ready to use it.

Notice how I keep saying we change the temperature of this reaction to do the different steps? We have to have precise control over the temperature to make sure everything happens in the correct sequence. (After all, what happens if we try to copy regions of the DNA before the primers are attached? Or before the DNA goes from double stranded to single stranded? Anarchy, I tell you! Actually, the reaction just wouldn’t work. Whatever).  We control the temperature by doing this entire reaction in a piece of lab equipment called the thermocycler.

...and here it is! Looks kinda boring, doesn't it?

We put all of our PCR reaction stuff in tiny PCR tubes…

Who knew everything in microbiology was so...micro?

…which we then place in the thermocycler and press the “go” button. Ah, automation at its best!

So here are the steps of PCR in a nut shell:

1. Put all the ingredients in a PCR tube: DNA, Taq polymerase, nucleotides (dNTP), buffer, primers

2. Place the tubes in the thermocycler and press “go”

3. Denaturation step: the thermocycler raises the temperature to 94 C  for 20-30 seconds to melt the hydrogen bonds between the double strands of DNA and create single strands of DNA, ready for copying.

4. Annealing step: the thermocycler lowers the temperature to 45-55 C for 20-40 seconds to allow the primers to find the area on the DNA we want to amplify and attach.

5. Elongation/Extension step: the thermocycler raises the temperature to 72 C for 1-3 minutes (1 minute if the target sequence is under 500 bp long, 3 minutes if it’s over 500 bp long) and Taq polymerase goes to work copying the target sequence.

6. Repeat steps 3-5: the thermocycler then starts from the beginning again, raising the temperature to break apart the newly formed DNA, lowers the temp to anneal the primers, and raises the temp to elongate the DNA. Each time it goes through steps 3-5 it’s called a “cycle,” and the thermocycler is programmed to run 30-50 cycles.

7. Final elongation step: after 30 or so cycles, the thermocycler raises the temperature to 72 C for 7 minutes just to make sure all the left over single strands of DNA have time to be copied.

8. Final hold step: the thermocycler lowers the temperature to 4-15 C (4 C is about what your refrigerator is at to keep your milk cold) to keep the DNA fresh until you’re ready to use it.
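The steps above can be written down as a little program, which is more or less what you type into the thermocycler. This Python sketch picks one time from within each range listed above (my picks, not a recommended protocol) and adds up how long the run takes, ignoring the time the machine spends ramping between temperatures.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    temp_c: float
    seconds: int

# Values chosen from within the ranges in the steps above; adjust for your own protocol.
CYCLE = [
    Step("denaturation", 94, 30),
    Step("annealing", 50, 30),
    Step("extension", 72, 60),
]
FINAL_EXTENSION = Step("final extension", 72, 7 * 60)
CYCLES = 30

def total_seconds(cycle, cycles, final_step):
    """Time spent holding at temperature, ignoring ramping between steps."""
    return cycles * sum(step.seconds for step in cycle) + final_step.seconds

runtime = total_seconds(CYCLE, CYCLES, FINAL_EXTENSION)
print(f"{runtime / 60:.0f} minutes before the final hold")
```

With these numbers the run comes out to a bit over an hour, and longer extension times or more cycles push it up from there.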

Here is a good video that goes through the whole process: PCR on YouTube

How neat is that? And so simple! Of course, I’m saying that after writing 3700 words to explain exactly how it works, but whatever. It’s simple.

By the time PCR is finished, we’re left with 1 billion identical copies of our DNA. That’s enough to do whatever we want! And you know what we want to do with all those copies of the 16S region? Pyrosequencing!
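Where does a billion come from? If every cycle doubled every copy perfectly, then 30 cycles starting from one copy gives 2 multiplied by itself 30 times. The Python sketch below shows that arithmetic; the efficiency parameter is my own addition to show what happens when doubling isn’t perfect, and the function name is made up.

```python
def copies_after(cycles: int, starting_copies: int = 1, efficiency: float = 1.0) -> float:
    """Copies after a number of cycles; efficiency 1.0 means perfect doubling."""
    return starting_copies * (1 + efficiency) ** cycles

print(f"{copies_after(30):,.0f}")                  # perfect doubling
print(f"{copies_after(30, efficiency=0.9):,.0f}")  # 90% of strands copied per cycle
```

Perfect doubling over 30 cycles gives 1,073,741,824 copies from a single starting molecule. Real reactions fall short of that, but starting with more than one copy of the template makes up the difference quickly.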


Ok, now we know a bit about DNA, replication, and PCR. The next bit of information we want about all of this is what is the exact genetic code for various regions of DNA. This information can tell us a lot about the organism from which it came, their relationship to other organisms in the world, and even the presence of mutations within particular genes. In short, knowing the actual sequence of a strand of DNA opens up a whole world of possibilities for scientists.

I want to know the particular genetic code so I can identify the bacteria I’m working with. Sure, I could use various other techniques to identify my bacteria, but conventional laboratory methods take a lot of time (as in weeks), and I have other things to do. Instead, I could extract my DNA, take a couple of hours and amplify the 16S region, and then load it all into a pyrosequencing machine and have my identifications by the end of the day. The other up side to this method is my ids are positive beyond a shadow of a doubt. No one questions identifications by genetic methods. This is why DNA is so powerful as evidence.

So, how does pyrosequencing work? I must say, this technique is brilliant…I was super excited the first time I learned about it (also? I’m a bit of a nerd). It’s a method based on sequencing by synthesis, and takes advantage of some byproducts of DNA replication.  It then uses enzymes from various organisms to show us (well, technically a computer) our sequence.

Remember how I talked about how DNA polymerase facilitates the addition of nucleotides to a DNA strand? (No? Look a few hundred words above this and you’ll find out). Well, what I didn’t mention at that time was that when it does this, that reaction has a byproduct: pyrophosphate.

Look! Phosphorus and oxygen all bonded together.

This is a molecule of phosphorus and oxygen that can be used to make ATP (energy).  So every time a single nucleotide, no matter which one, is added to a strand of DNA, this molecule is produced and sent out into the environment.

Now, if I happened to mix pyrophosphate (abbreviated PPi so I don’t have to type that long, hard-to-spell word again) with adenosine phosphosulfate (APS), I can get ATP. Isn’t that neat? So all I need to do is make that reaction happen–can you guess what I need to do that? Yep, an enzyme.

That enzyme is ATP sulfurylase, and converts PPi from nucleotide incorporation to ATP. That newly formed ATP goes floating off into the environment, all primed and ready to do some work.

We wouldn’t want to disappoint ATP, now would we? Nope, so we give it some work to do. We have provided this newly formed energy with a reaction to run: the conversion of luciferin into oxyluciferin. Why is that important work to do, you ask? Because this reaction causes a pretty glowing light–like in fireflies!

It glows with the power of luciferase!

Notice the word “luciferin” conjures up images of fire and brimstone. That’s on purpose–it’s supposed to remind you of something fiery and burning…that’s how you remember that it’s a substance that glows. Many animals in the world use this reaction all the time…fireflies are just one type. They take a substance we call luciferin, add some energy and the enzyme luciferase and cause a beautiful glow on summer evenings.

Well, some brilliant scientist thought this was neat, and decided to bring the luciferin/luciferase combo into the lab. So during the pyrosequencing reaction we take newly formed ATP and give it to the luciferase enzyme, which turns luciferin into oxyluciferin, causing it to emit light.

I don't even know what to put here. Look! Reactions!

Why do we want it to emit light? I’ll tell you in a minute. Stay tuned.

Once all the reactions are finished, we want to clean up our solution so we can run the next cycle and continue on our path to sequencing DNA, so we put in a clean up enzyme in the form of apyrase, which degrades any excess nucleotides that are floating around. This leaves us with a nice, clean slate (or, more specifically, a nice clean solution).

Alright, so in order to run a pyrosequencing reaction we need some DNA, several enzymes, and some luciferin to make things glow. Let’s see how those things work together to give us some information, shall we?

Step 1: Put the DNA you want to sequence in a tube (this DNA usually consists of a bunch of PCR product, so you have over a billion copies of the target sequence–the more copies, the easier it is to sequence your DNA) along with primers for your sequence (so you can get synthesis started), DNA polymerase, ATP sulfurylase, APS, luciferin, luciferase, and apyrase. This gets put into the pyrosequencing machine so the computer can take over.

Another boring looking machine that does something quite neato.

Step 2: Press “go” on the machine. The computer adds one of the four nucleotides (A,T,C, or G) at a time–let’s say it starts with A. It floods the tube with the A nucleotide, and the enzymes take over.

Step 3: Your DNA strand is primed and ready, so if the first nucleotide on the template strand is a T, then the A that just flooded the solution will be added by DNA polymerase on the new complementary strand. (If you forget how DNA replication works, check out my other blogs, or watch this video).

Step 4: The incorporation of that A into the new strand causes PPi to be released. That PPi is taken by the ATP sulfurylase and converted into ATP.

Step 5: The ATP is used as an energy source to allow luciferase to turn luciferin into oxyluciferin, which emits light.

Step 6: The light given off by the reaction is recorded by a camera attached to the computer, and logged as a peak on a chart called a pyrogram.

Step 7: Once DNA polymerase has used up all the A nucleotides it needs, apyrase degrades all the extra nucleotides floating around and gets the solution ready for the next one.

Step 8: The computer floods the solution with another nucleotide (let’s say G), and the process starts again.

The general sequence of steps

If there is more than one identical nucleotide in sequence (say GGG or TT), then more PPi is released (if there are two nucleotides in a row, then twice as much PPi is released; if there are three then 3x as much is released, etc.). More PPi means more ATP. More ATP means more luciferase action. More luciferase action means a brighter light. A brighter light is recorded as a higher peak by the computer.

The above steps are repeated until the entire sequence of DNA has been replicated. The computer then looks at the pyrogram and translates the peaks into a DNA sequence.
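The whole loop described above can be mimicked in a few lines of Python. This is a toy sketch, not how the machine’s software works: it assumes perfect chemistry, a fixed dispensing order of A, C, G, T (my assumption), and peak heights that exactly equal the number of nucleotides added. The function names and the template are made up.

```python
# Base-pairing rules: the nucleotide added is the complement of the template base.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def simulate_pyrogram(template: str, dispensation_order: str = "ACGT", max_rounds: int = 100):
    """Return a list of (nucleotide, peak_height) pairs, one per dispensation."""
    peaks = []
    pos = 0  # next template base waiting for a partner
    for _ in range(max_rounds):
        if pos == len(template):
            break
        for nucleotide in dispensation_order:
            run = 0
            # Keep adding this nucleotide while it pairs with the next template base.
            while pos < len(template) and PAIRS[template[pos]] == nucleotide:
                run += 1
                pos += 1
            peaks.append((nucleotide, run))  # height 0 means no light this time
            if pos == len(template):
                break
    return peaks

def read_sequence(peaks) -> str:
    """Translate peaks back into the newly made strand."""
    return "".join(nucleotide * height for nucleotide, height in peaks)

peaks = simulate_pyrogram("TCCCAG")  # made-up template
print(peaks)
print(read_sequence(peaks))  # AGGGTC
```

The three Cs in a row on the template show up as a single G peak of height 3, which is the “brighter light, higher peak” idea from above.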

Hey, you know what would make this easier? A Video!! Watch and enjoy.

Wasn’t that neat? So pyrosequencing can kick back the sequence of up to 20,000 different DNA strands in 6 hours. How super awesome is that?!? I know!!

Well, at nearly 5000 words, that’s PCR and pyrosequencing in a nut shell. I hope you learned something!


Amino Acids and DNA

Let’s do an easy one, shall we? Ok! Here’s the question:

The cDNA fragment that includes the ricin gene is 5.7 kilobases. If the entire fragment codes for the ricin polypeptide, the approximate number of amino acids in the polypeptide would be: (enter some weird numbers with lots of zeros here).

Well, once again the GRE just loves trying to confuse people with scary names and things. In this case, it throws in that whole ricin thing to throw you off. You can really just take that out of this question, so it reads something like “The cDNA fragment is 5.7 kilobases. How many amino acids does this code for?”

Alright, this is another one of those you-have-to-know-it questions. How much DNA does it take to code for a single amino acid? First, some very basic background. Amino acids are the building blocks of protein, and really what DNA codes for. Remember when we talked about DNA? DNA strands are studded with genes. Genes are simply lengths of DNA that code for certain proteins. Since the lengths of DNA make proteins, parts of the genes must code for the building blocks of proteins, or amino acids.

The next logical question is how much DNA codes for each amino acid? Ok, I’ll just tell you: 3 base pairs. Yep, that’s it. 3. Once you know how many base pairs are in a gene, then you just divide by three and that gives you the number of amino acids the gene codes for. How many base pairs are in the gene the question is asking about? 5.7 kilobases. Once again, don’t be afraid of words here. “Kilo” simply means 1000, while “bases” means, well, bases. So 5.7 kilobases is 5700 bases or base pairs. Divide that by three, and you get the nice round number of 1900. There you go!
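Here is the same arithmetic as a couple of lines of Python, in case you want to plug in other fragment sizes. The function name is made up, and like the question itself it treats the answer as approximate (it ignores details like a stop codon).

```python
BASES_PER_AMINO_ACID = 3

def amino_acid_count(kilobases: float) -> int:
    """Approximate number of amino acids coded for by a fragment of this size."""
    bases = round(kilobases * 1000)  # "kilo" means 1000
    return bases // BASES_PER_AMINO_ACID

print(amino_acid_count(5.7))  # 1900
```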

How DNA moves through an electrophoretic gel

Well, I’m sitting at a conference at the moment, and have decided that it has been too long since I have indulged in the joy of biological teaching. Seriously! Stop laughing. Here’s the question I randomly chose for today:

The rate at which a DNA fragment moves in an electrophoretic gel is primarily a function of the fragment’s….

Isn’t it lucky that I totally by accident chose a question that can be answered pretty quickly? I know! Lucky! Anyhow, let me tell you a little about electrophoresis. This process is a step used in laboratories to study DNA, and is often taught in every single lab class in college simply because it’s rather simple and rather impressive. (Seriously–try this the next time you’re having dinner with your family “So I was studying deoxyribonucleic acid the other day, and needed to separate the fragments after I broke the bonds at known gene sites, so I simply ran an electrophoretic gel.” This is good for at least an extra helping of dessert and hours of proud bragging by your mom at the next knitting circle).

Well, how exactly does this work? DNA, as you might imagine, is huge. Think about the amazing amount of information stored in the genetic code–all that information just sitting there waiting to be expressed. When we study DNA, we usually want to study a particular section, or a particular gene. We do this by cutting the big string of DNA into fragments using enzymes. We then copy the DNA (lots and lots and lots through PCR which I’ll explain in a later post) and then somehow have to pick out the genes we want to focus upon.

This is where electrophoresis comes in. An electrophoretic gel is basically really stiff Jell-O. The gel is melted and poured into a rectangular mold, and 8 (or so) wells are formed in one end of the solidified gel. These wells give us a place to put the DNA. Now, DNA has a charge. Due to its chemical make up and all that jazz, it has an overall negative charge. At this point, we want to separate the DNA into its different fragments, so some smarty somewhere decided to use that overall negative charge to do just this. The gel (with its wells filled to the brim with DNA in a liquid medium) is subjected to an electrical current. The DNA fragments are pulled through the pores of the gel as they are attracted to the positively charged electrode at the far end of the gel.

Now, the DNA separates depending upon its size. The bigger the DNA fragment, the harder it is to force it through those tiny, tiny pores in the solid gel. Therefore, the bigger (or longer) the DNA fragment, the more slowly it moves through the gel. After a predetermined amount of time, the electrical current is removed, and the gel is stained with some horrible substance that causes DNA to glow under a black light. You then take a picture of the gel and look at the bands (see the picture above) and the ones that are furthest away from the wells are the shortest, while the ones closest to the wells are the longest.
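If it helps to see the size sorting spelled out, here is a minimal Python sketch. The fragment lengths are made up for illustration, and the function name `band_order` is my own invention. It only ranks fragments by length, on the assumption that length is the only thing that matters; it does not model real migration distances.

```python
def band_order(fragment_lengths_bp):
    """Order fragments from farthest-from-the-wells to closest-to-the-wells.

    Assumption: migration depends only on length, so the shortest
    fragment travels farthest and the longest stays nearest the well.
    """
    return sorted(fragment_lengths_bp)


# Hypothetical fragment lengths, in base pairs
fragments = [1500, 200, 800, 5000]

# First entry is the band farthest from the wells, last entry is the closest
print(band_order(fragments))
```

Read the printed list from left to right as "far end of the gel" to "right next to the wells."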

So, back to the question:

The rate at which a DNA fragment moves in an electrophoretic gel is primarily a function of the fragment’s:
A) Length
B) double helical structure
C) Radioactivity
D) Degree of methylation
E) Adenine content

Can you pick out the correct answer now? Movement through an electrophoretic gel is primarily due to size (that is, the fragment’s length), therefore the answer is “A.”

DNA Replication (i.e. Base Pair Porn!)

Could I come up with a more boring title? I don’t think so! But how in the world do you write something interesting about how DNA copies itself? Maybe “base pair porn!” That would totally work! I’m putting that now…hee for me! Anyhow, on to today’s question!

When DNA replicates semi conservatively, which of the following is true of each daughter DNA molecule?

A) Both strands are newly synthesized
B) One strand is newly synthesized, whereas the other is a strand from the parent DNA molecule
C) Both strands are the original strands of the parent molecule
D) One strand has more AT-rich regions than the other strand has
E) The newly synthesized strands are more susceptible to melting and renaturation than the parental DNA strands are

Ok, the big question in this question is “What is semi conservative replication?”

Remember that blog I did about complementary base pairing? Yeah, me too! That was a good one. Sigh. Well, this is sort of a continuation of that last post. When DNA needs to copy itself, it undergoes replication. There are three methods the books talk about when discussing DNA replication: conservative, dispersive, and semi conservative.

Conservative DNA replication is when an entirely new double helix of DNA is replicated for the new (or daughter) cell. This works just like a copy machine–it’s based on the mother cell’s DNA, and an exact copy is made. The two new strands are what are sent on to the daughter cell, while the strands they were copied from are left in the mother cell. This method of DNA replication has not been found to be biologically significant, so most people don’t really care about it. And neither do we!

Dispersive replication is when bits and pieces of the mother strands are mixed up with new sections and all put together into a new double helix. The two daughter cells end up with a strange mix-and-match version of the DNA made up of both mother and daughter sections. Just like the last one, no one thinks this is a biologically significant method of replication.

Finally, the big one: semiconservative replication. This is the main way DNA is totally replicated during cell division. During this type of replication, the entire DNA double helix unzips. A new strand is made to match up with each original strand using complementary base pairing. The result is two double helices where only one was before. Each double helix is made up of an old strand of DNA (the mother strand) and a new strand of DNA (the daughter strand). Each new daughter cell gets a double helix of DNA–one strand from the mother cell and one brand-spankin’-new strand. This is the only replication method of the three that is considered biologically significant (meaning, this is what we care about!)
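For the programmers in the audience, the unzip-and-match idea can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the names `complement` and `replicate_semiconservatively` are hypothetical, each strand is tagged "old" or "new" so we can see where it came from, and strand direction is ignored entirely.

```python
# A pairs with T, G pairs with C
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(sequence):
    """Build the partner strand, base by base (strand direction ignored)."""
    return "".join(PAIRING[base] for base in sequence)


def replicate_semiconservatively(helix):
    """Unzip a two-strand helix and build a new partner for each old strand.

    `helix` is a pair of (sequence, label) strands. Each returned daughter
    helix keeps one original strand ("old") and gains one fresh strand ("new").
    """
    daughters = []
    for sequence, _label in helix:
        template = (sequence, "old")
        new_strand = (complement(sequence), "new")
        daughters.append((template, new_strand))
    return daughters


# Toy parent helix; the sequence is made up for illustration
parent = (("ATTTCGGA", "old"), ("TAAAGCCT", "old"))
for daughter in replicate_semiconservatively(parent):
    print(daughter)
```

Each printed daughter helix contains exactly one "old" strand and one "new" strand, which is the whole point of the "semi" in semiconservative.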

Ok, back to the question:

When DNA replicates semi conservatively, which of the following is true of each daughter DNA molecule?

A) Both strands are newly synthesized
B) One strand is newly synthesized, whereas the other is a strand from the parent DNA molecule
C) Both strands are the original strands of the parent molecule
D) One strand has more AT-rich regions than the other strand has
E) The newly synthesized strands are more susceptible to melting and renaturation than the parental DNA strands are

Let’s go through the answers. “A” is obviously incorrect, since we just learned that when both strands of a double helix are newly synthesized, that is called conservative replication. “C” is also wrong, because if both strands were of the parent molecule, no replication would have happened at all….the DNA would have just moved from one cell to another. “D” just doesn’t make much sense. We know from complementary base pairing that every A on one strand sits across from a T on the other, so an AT-rich region on one strand is automatically an AT-rich region on its partner; one strand can’t have more AT-rich regions than the other. “E” tries to throw you off by mentioning melting and renaturation, but we don’t care about that. That leaves “B.” This answer is the definition of semiconservative replication–one strand is newly synthesized, whereas the other is a strand from the parent DNA molecule.

There you go! Yay us!

DNA and RNA Base Pairing

Today’s subject involves the basics of DNA and RNA. Here’s the question from the GRE practice test I’ll be answering:

“The complementary RNA sequence for GATCAA is….” (and then there is a list of answers).

This is actually a simple question, provided you know two key bits of information–1) What is RNA? 2) What the hell are all those letters? I’ll tell you!

I’m sure by this time in your life, no matter what level of education you currently have stuffed into your pretty little brain, you have heard of DNA. DNA is the handy short-hand for deoxyribonucleic acid, and is a double-stranded helical structure found in the nucleus of eukaryotic cells. The double helix resembles a ladder, with two parallel sides and pairs of bases that match up and form the rungs.

The building blocks of this ladder are called nucleotides, and each is made up of a sugar (in the case of DNA, that sugar is deoxyribose), one phosphate group, and a nitrogenous base. The sugars and phosphates form the sides of the ladder, and the nitrogenous bases form the rungs. That nitrogenous base is what we are interested in today. Don’t let the phrase “nitrogenous base” scare you–this is just a way for biologists to sound smart when talking about something relatively simple. In this case, a nitrogenous base is simply a compound that contains nitrogen and happens to be basic. Easy, yes? Ok, so the rungs of the double helix are made of a pair of nitrogenous bases–two of these nitrogen-containing bases that pair up.

There are four of these bases involved with DNA: Adenine (A), Thymine (T), Guanine (G), and Cytosine (C), and these bases follow a concept called complementary base pairing. This fancy-sounding process simply means that each base only pairs up with the one that it likes the best, or the one that complements it: adenine pairs with thymine, and guanine pairs with cytosine. Biologists hate writing out the full name of things, so each of these bases is abbreviated down to the first letter of its name: A pairs with T, and G pairs with C–AT, GC.

When DNA replicates, the double helix unzips, and free-floating bases pair up with their partners to form new strands. If we know the sequence of bases on one strand, we can predict what the complementary strand will look like using complementary base pairing:

ATTTCGGA will pair up with the strand TAAAGCCT. See how that works? The bases pair up with their favorite, and form a new strand in the process. Those are the basics!

Now, DNA doesn’t just make copies of itself. On the contrary, most of the time it codes for proteins that build things or activate things or deactivate things, or do any number of jobs in the cell. In order to code for these proteins, the DNA needs to get its message to the rest of the cell. It does this via RNA.

RNA stands for ribonucleic acid–it looks a heck of a lot like DNA, except it is made up of the sugar ribose instead of deoxyribose. RNA is the messenger unit of the cell. Its job is to take memos from DNA, and give that information to the rest of the cell. RNA gets its memos from DNA via complementary base pairing. Who knew?! When DNA wants the cell to make a protein, it unzips that little portion of the double helix that codes for that protein, RNA zips in and makes a copy of the information using complementary base pairing, and zips out again to take the information to the rest of the cell.

So how can we tell the difference between RNA and DNA? Well, other than the fact that RNA is made of ribose while DNA is made of deoxyribose, they also use slightly different nitrogenous bases. While DNA uses the bases adenine, thymine, guanine, and cytosine, RNA uses adenine, URACIL, guanine, and cytosine. In DNA, adenine pairs up with thymine (AT). In RNA, adenine pairs up with uracil (AU). Just think of it as if RNA can’t seem to produce a T, so it has to produce something else to match up with A. So, if I were to ask you, say, what is the complementary RNA sequence for GATCAA, you would say CUAGUU. See how that works? Every time you see a “G” you match it up with “C.” When you see a “T” you match it up with “A,” and when you see an “A” you match it up with “U.”
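If you’d rather let a computer do the matching, the pairing rules boil down to a lookup table. Here is a minimal Python sketch; the names `DNA_TO_DNA`, `DNA_TO_RNA`, and `complementary_strand` are my own, and the function simply swaps each base for its partner, one letter at a time, without worrying about strand direction.

```python
# Partner of each DNA base when the new strand is DNA
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Partner of each DNA base when the new strand is RNA (U stands in for T)
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def complementary_strand(dna, pairing):
    """Swap every base in `dna` for its partner according to `pairing`."""
    return "".join(pairing[base] for base in dna)


print(complementary_strand("ATTTCGGA", DNA_TO_DNA))  # TAAAGCCT
print(complementary_strand("GATCAA", DNA_TO_RNA))    # CUAGUU
```

The only difference between the two tables is the partner for “A,” which is exactly the difference between DNA and RNA pairing described above.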

Back to the question:

The complementary RNA sequence for GATCAA is:


In this case, you can immediately knock out three of the answers. Since we know that RNA doesn’t produce any T’s, we can get rid of A, C, and E. That leaves B and D to choose from. We are also familiar with the concept of complementary base pairing, so we know that G always pairs with C, and A with T/U. This is one of those questions that I suggest answering before you look at the answers, then just scanning the answers for the one that matches what you came up with. In this case, the answer is “B.”

Incidentally, as I was scanning through the GRE, I noticed another question along these same lines:

When DNA is extracted from cells of E. coli and analyzed for base composition, it is found that 38 percent of the bases are cytosine. What percentage of the bases are adenine?

Because we know about complementary base pairing now, we can figure out this question pretty easily. I’ve noticed that the GRE likes trying to scare test-takers by saying things like “DNA is extracted from E. coli.” Don’t let them! DNA is DNA, and it doesn’t matter what species it’s extracted from, it is still made up of those same 4 bases. (Isn’t that amazing, by the way? This is why I love biology!).

The question tells us that 38% of the bases were cytosine. We know cytosine pairs up with guanine, so another 38% must be guanine. (Think about this for a second–remember that both strands of the double helix were being analyzed here, so every instance of cytosine was counted. You don’t find cytosine in DNA without its best friend guanine, so if 38% were cytosine, then 38% had to be guanine). Ok, 38 + 38 = 76% of the DNA accounted for. What does that leave? 24% of the bases must be adenine and thymine. Since these guys are paired up equally, then half of that 24% must be adenine, and the other half thymine, therefore 12% of the bases are adenine and 12% thymine. Here’s the question again:

When DNA is extracted from cells of E. coli and analyzed for base composition, it is found that 38 percent of the bases are cytosine. What percentage of the bases are adenine?

A) 12%
B) 24%
C) 38%
D) 62%
E) 76%

Do you see how annoying the answer writers of this test can be? They put in all the possible numbers you could come up with when figuring out this answer: 12% (the percentage of adenine in the DNA), 24% (the percentage of adenine and thymine together in the DNA), 38% (the percentage of cytosine or guanine) and 76% (the percentage of guanine and cytosine together). However, because you know the basics of complementary base pairing, you are able to figure out that the correct answer is “A.” Good for you!
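The arithmetic for this kind of question is the same every time, so here it is as a small Python sketch. The function name `base_percentages_from_cytosine` is my own, and it assumes ordinary double-stranded DNA where C always pairs with G and A always pairs with T.

```python
def base_percentages_from_cytosine(cytosine_pct):
    """Work out all four base percentages given only the cytosine percentage.

    Assumes double-stranded DNA, so %G equals %C and %A equals %T.
    """
    guanine_pct = cytosine_pct                     # every C is paired with a G
    at_total = 100 - (cytosine_pct + guanine_pct)  # whatever is left is A plus T
    adenine_pct = at_total / 2                     # A and T split the rest evenly
    thymine_pct = at_total / 2
    return {
        "C": cytosine_pct,
        "G": guanine_pct,
        "A": adenine_pct,
        "T": thymine_pct,
    }


result = base_percentages_from_cytosine(38)
print(result["A"])  # percentage of adenine when 38% of the bases are cytosine
```

Plug in a different cytosine percentage and the same three steps (match G to C, subtract from 100, split in half) give you the adenine answer.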