So for the past few months I’ve been frantically studying, not for the GRE, but for my PhD preliminary exams. I will therefore be adding blog posts that answer potential questions from my prelims in addition to GRE questions. This should be fun for all!
My advisors have been giving me “hints” (i.e., “know everything in the world”) about the type of questions they are going to ask me, and one such hint made it perfectly clear that I’d better have a very clear understanding of the techniques I am using in my research. I won’t go into detail, but my research involves flies and bacteria, and I’m identifying those bacteria with pyrosequencing. I’ll then figure out the concentration of bacteria using qPCR or rtPCR, whichever our lab can afford. What’s the best way to understand these techniques? Explain them to you, good readers! Here we go.
First, a little bit of background. Both PCR and pyrosequencing involve DNA, so you have to know a little bit about how DNA replicates. Long story short: DNA is made of long chains of 4 nucleotides: A, T, G, and C. The sequence of these nucleotides (called bases) determines the order in which amino acids are strung together, which in turn determines which proteins are produced, which then make up all that we see as life. Neat! DNA is packaged into cells as a double strand, and each strand is complementary to the other (A always pairs with T, and G always pairs with C). When these complementary strands unzip, each one can pair up with free nucleotides and make a copy of its partner. This is what all these techniques are based on.
Ok, let’s get into a little more detail about how this replication works exactly, shall we? Nucleotides are made up of a nucleobase (adenine, guanine, cytosine, or thymine; A, G, C, or T for short), a five-carbon sugar, and some phosphate groups. Don’t let the terms “5 carbon sugar” and “phosphate group” scare you. A 5 carbon sugar is just what it sounds like: a sugar with 5 carbon atoms, most of them arranged in a ring:
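If you like thinking in code, the pairing rule can be sketched in a few lines of Python. This is just an illustration I put together, not something a real lab pipeline uses; the function name `complement` and the example strand are made up.

```python
# Base-pairing rules: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the partner strand, base by base, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())


template = "ATGCGTTA"  # hypothetical example strand
copy = complement(template)
print(copy)  # TACGCAAT

# Complementing the copy gives back the original strand.
print(complement(copy) == template)  # True
```

That last line is the whole trick behind replication: each strand carries enough information to rebuild its partner.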
Hello there 5 carbons! Ok, we’ve got one of our bases attached to this ring of carbons, plus a phosphate group:
See? Phosphate groups are simple. I always get a little freaked out when scientists start changing the endings of words–all those “-ates” and “-ites” throw me off. However, they are really just word parts that tell me a bit about the compound. I won’t go into them here, but you can read about them on Wikipedia. Thanks Wiki!
Ok, so now we’ve got our complex compound: A,T,C, or G attached to that 5 carbon ring, which has some phosphate groups hanging on it.
Now, phosphate groups are reactive: they like attaching to what we call hydroxyl groups. Hydroxyl groups are simple things with confusing names (like most things in chemistry, I think). A hydroxyl group is simply an oxygen bonded to a hydrogen. How simple is that?
So phosphate groups and hydroxyl groups totally love each other, and they want to bond ALL THE TIME. It’s actually kinda cute. And a little gross. Anyhow, nucleotides have all these groups in particular places on their carbon rings. Let’s look at that picture of the five carbon ring again:
Ok, see the numbers? We pronounce those numbers as “five prime” or “three prime.” On nucleotides, the phosphate group is attached to the five prime carbon. See it? A hydroxyl group is attached to the three prime carbon. When two nucleotides are lined up next to each other, the 5′ (five prime) phosphate group of one bonds with the 3′ (three prime) hydroxyl group of the other, and they totally make out. And make long chains of nucleotides, which become DNA. Whatever.
Therefore, in order for a nucleotide to attach to the end of a chain of nucleotides, the 3′ end has to be exposed at the end of the chain. Don’t worry, you’ll understand why I’ve explained all this in a second.
Now that we have the basics of how DNA replicates and how nucleotides stick together to facilitate that replication, let’s move on to some of the procedures I promised I’d explain forever ago.
PCR is short for Polymerase Chain Reaction, and is a method we can use to clone sequences of DNA. We often want to clone these sequences a whole bunch (on the order of a billion copies of a single sequence!), so technology is obviously involved. This is how it works:
DNA is collected from somewhere. It can be anything, really, and we technically need only a single copy (although more DNA makes this much easier). We take that DNA and break it apart. Knowing what we know about the structure of DNA makes this process simple. The two complementary strands of DNA are attached via hydrogen bonds. Heat can break those hydrogen bonds (this is one of the reasons living things can only tolerate so much heat: DNA actually breaks apart). The temperature at which the two strands of DNA dissociate is called the DNA’s melting temperature, and it varies with the DNA sequence.
You see, base pairs are held together by different numbers of hydrogen bonds: A pairs with T via two hydrogen bonds, while G pairs with C via three. The more bonds present, the more heat it takes to break them. Therefore, GC pairs take more heat to pull apart than AT pairs.
If a stretch of DNA has a bunch of GC pairs, then it’s going to take more heat to make the complementary strands dissociate. More heat means a higher melting temperature. But I digress.
The DNA is heated until all the hydrogen bonds are broken, and then we can focus in on the particular part of the DNA that we want to copy (copying it over and over is called “amplifying”). I suppose we could do the entire genome, but that would take FOREVER and use up a lot of reagents. We don’t want that. Let’s focus on a single gene, or section, or tiny little part instead, shall we?
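Here’s a rough sketch in Python of that GC idea. Counting hydrogen bonds is only a stand-in for “how hard is this to melt”; real melting temperatures also depend on things like strand length and salt, so treat the numbers as a comparison, not a prediction. The function names and sequences are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def hydrogen_bonds(seq: str) -> int:
    """Hydrogen bonds holding this stretch to its partner: 3 per G-C pair, 2 per A-T pair."""
    return sum(3 if base in "GC" else 2 for base in seq.upper())


at_rich = "ATTATAAT"  # hypothetical AT-rich stretch
gc_rich = "GCGGCCGC"  # hypothetical GC-rich stretch, same length

for name, seq in (("AT-rich", at_rich), ("GC-rich", gc_rich)):
    print(name, gc_content(seq), hydrogen_bonds(seq))
# AT-rich 0.0 16
# GC-rich 1.0 24
```

Same length, but the GC-rich stretch has half again as many bonds to break, which is why it melts at a higher temperature.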
So, what part do we amplify? Well, that depends on the question you’re asking. Most of the time you do PCR so you can identify a particular species, or look to see if two people are related, or identify a person, or something like that, and the regions you choose to amplify vary for each of these questions. My research involves using PCR to identify a species, so I’m going to look at a particular section of RNA called the 16S rRNA region. Let me explain (because you know you want me to).
All organisms, be they eukaryotic (having membrane-bound organelles) or prokaryotic (no membrane-bound organelles), have ribosomes. Ribosomes are those parts of a cell that take amino acids and knit them together into proteins. Without ribosomes, proteins would never be made, and life as we know it wouldn’t exist. Thanks ribosomes!
Anyhow, ribosomes are able to do what they do because they are made up largely of RNA, plus some proteins (RNA is that complement to DNA that carries information all over the place). A ribosome is built from two subunits, a large subunit and a small subunit, with messenger RNA (mRNA) smashed between the two.
The amount of RNA in the ribosome depends on whether the organism is a prokaryote or a eukaryote. Eukaryotes have larger chunks of RNA, and therefore the subunits are larger. The size of RNA is measured a little strangely: it doesn’t have to do with length or weight or mass or some simple measuring tool like that. Things this small are hard to measure with a ruler anyhow. No, RNA is measured by how fast it sinks through a liquid while spinning in a centrifuge: the bigger it is, the faster it sinks toward the bottom. The smaller it is, the higher it stays.
Think of it this way: you know those spinning roller coasters at amusement parks where you stand against a wall and then the floor drops out and you stick? The ones where they say “if you are going to vomit, cover your mouth and raise your hand!” because if someone pukes EVERYONE is gonna have a bad day?
You ever been on one of these? They’re super fun. Did you ever look around while it was spinning, though? If you had, you would have noticed that larger people tended to slide down the wall (sometimes coming to rest on the floor), while smaller people could stay really high up on the wall. This is a good way to separate the really big people from the really small people: the smaller a person is, the higher on the wall they’ll sit while the ride is spinning.
You can use this same principle to separate different sizes of RNA in the ribosome. You put some RNA in a liquid, spin it around really fast, and then see how quickly it sinks. A Swedish chemist named Theodor Svedberg figured this out in the 1920s. (I wonder if he went on one of these rides before going into the lab one day? I kinda hope he did.) He spun molecules around in a centrifuge and gave each one a number based on how fast it sank. The lower the number, the more slowly it sank (the higher up on the wall it stuck), and therefore the smaller it was. Naturally, these number units are named after him, so now we have the strangely named “Svedberg unit” (hee!) to measure RNA. We abbreviate the Svedberg as S (because spelling “Svedberg” is hard).
Therefore, when you see a number followed by “S,” it tells you about the size of that RNA. For example, the eukaryotic ribosome is broken up into a large subunit, which is 60S, and a small subunit, which is 40S. The 60S means that the larger subunit sinks faster in the spinning tube than the 40S subunit does. Make sense?
Now, of course we can break up the subunits into smaller and smaller bits. So we do. In bacteria (prokaryotes), the ribosome is made of two subunits: the 50S and the 30S. We break up the 30S subunit into tiny, bite-sized pieces because it’s easier to deal with that way. A long time ago some super smart scientist realized that one small piece of the 30S subunit was highly conserved (it barely changes from species to species), yet had just enough variation to be used to tell species apart. This piece is called the 16S rRNA in prokaryotes, and its counterpart in eukaryotes is the 18S rRNA. It’s used all the time, and these regions have been studied a lot, so most studies that identify microbes use them in some way.
So I am, too. Since I want to be able to tell species of bacteria apart, I chose to use the 16S rRNA region to amplify and look at for my study. This is really nice, because there are primers out there that will amplify this region very easily. Ah, the perks of looking at a well-studied bit of RNA!
But I need to have enough RNA so I can look at it, and RNA is tiny…especially when I’m talking about just the 16S region. What’s a girl to do? Amplify!
PCR is the amplification of regions of DNA or RNA. (Am I repeating myself? Probably.) Knowing the properties of DNA and RNA allows us to target specific regions (like the 16S region) and selectively amplify that region alone. Step one: break up the double strands. Remember how to do that? Yep, heat (go over those hydrogen bonds above if you forget). As luck would have it, we know the melting temperature of DNA well enough (it depends on the GC content), so if we heat up our sample to around 94 C, those bonds will rupture and we’ll be left with single-stranded DNA. (For simplicity, I’m going to talk about DNA from here on out. RNA works too, but it first has to be copied into DNA by an enzyme called reverse transcriptase before PCR can take over; in practice, 16S studies like mine usually amplify the 16S rRNA gene straight from the bacteria’s DNA.)
Once we have our single strand, we need to focus in on just the region we want (like the 16S region). To do that, we need to tell the DNA what to replicate, and then give it the means to do so. We do this by using enzymes and primers.
Enzymes are proteins that speed up reactions without being used up themselves. The most important enzyme in a PCR reaction is called Taq polymerase (you know it’s an enzyme when you see the -ase suffix at the end of the word). A polymerase is an enzyme that links small building blocks together into long chains (and we just so happen to want many nucleotides attached together, so it works out for us).
Every cell that has DNA (so, pretty much every cell ever) has its own polymerases that take care of copying DNA and of transcribing bits of DNA into RNA to do work in the cell. PCR uses a polymerase from a species of bacteria, Thermus aquaticus, which normally lives in hot springs.
Have you been to any hot springs? They are ridiculous. I heard a story once about someone who jumped in one at Yellowstone National Park. The meat fell off his bones before he was able to resurface. That’s stupid hot. Anyhow, bacteria are able to survive in these conditions, and do quite well, thankyouverymuch. Why am I telling you this? Well, cells that live happily at lower temperatures have enzymes that work perfectly at lower temperatures. If the temps get too high, the enzymes denature and no longer work. When we run PCR, we first start out with that melting step where we raise the temperature to break apart double-stranded DNA. If we use enzymes in the PCR reaction that are denatured during that step, we either can’t continue, or we have to add more enzyme after we cool the reaction down. This is EXACTLY how PCR used to work: some poor grad student (because you KNOW professors weren’t in the lab doing this for hours on end) would have to add enzyme every 3-4 minutes during a reaction, all day. Talk about a crappy job!
So after a few years of having to manually add more and more enzyme every PCR cycle, someone thought “you know, there’s gotta be a better way!” Necessity is the mother of invention and all that, and some brilliant soul figured there must be a DNA polymerase that is stable at high temperatures. Sure enough, our friendly, heat-loving bacterium saved the day, and gave us a polymerase that doesn’t denature at 95 C. Because it came from the bacterium Thermus aquaticus, we now call it Taq polymerase.
Alright, so now we have our sample of DNA, heat to break those double strands apart, and a heat-stable enzyme that will copy regions of DNA. Now it’s time to tell the polymerase which region we want to copy!
We do this by using what are called primers. Primers are short bits of DNA that selectively attach to certain regions. Scientists design primers to attach to the parts of the DNA on either side of the region we want to amplify:
Each primer attaches to its single-stranded template at one end of the region we’re after, with its own 3′ end pointing into that region, so there’s an exposed 3′ end ready for new nucleotides (remember why that matters?). The primers form stable hydrogen bonds with their matching sequences. Of course, we can’t do this at the high temperature we used to break apart the DNA, so we have to cool the reaction down to 45-55 C for the primers to attach. We call this the annealing temperature, and this part of the cycle the annealing step. The exact temperature needed depends on how long your primers are, and how many Cs and Gs are involved.
The longer your primer is, the less likely it is to accidentally attach to random regions of the DNA, but the more likely it is to miss the region you actually want to amplify (long primers take a lot of time to attach, and we may not give them enough time). If you want a primer that is very specific, you design one that is really long (since it won’t attach to any other region of the DNA just by chance). If you want a primer that is sensitive, however, you design one that is shorter (since it will definitely get the region you’re after, even if you don’t give it a lot of time). Therefore, when scientists design primers, they have to think about how specific and sensitive they want their primers to be, and how many mistakes they’re willing and able to put up with (called “noise”).
They also have to consider the GC content, due to those extra hydrogen bonds in GC pairs. The higher the GC content in a primer, the higher its melting temperature, and so the higher your annealing temperature. As a general rule of thumb, you want an annealing temperature about 5 C below the melting temperature of your primers.
Primer design is considered a little bit of science, and a little bit of art. Designers use published DNA sequences to choose good primer sites, then send the sequences off to specialized companies that make the primers for them. This is why you often see people using well-studied areas of DNA or RNA for research: the primers already exist, you can probably buy them in bulk, and there are certain regions that are found in all living things, so you can use what are called “universal primers” to amplify DNA even in species you haven’t identified. Another plus for using the 16S region in my work!
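To make the “primers on either side” idea concrete, here’s a toy Python sketch that finds the stretch of a template two primers would bracket. It assumes exact matches (real primers bind by base pairing and can tolerate a mismatch or two) and the common convention of writing both primers 5′ to 3′, so the reverse primer shows up in the top strand as its reverse complement. The sequences and the `amplicon` function are made up for illustration.

```python
from typing import Optional

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Partner strand, written in the opposite direction (so it also reads 5' to 3')."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


def amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Return the stretch of `template` bracketed by the two primers, or None.

    `fwd` is assumed to match the template as written; `rev` is written 5' to 3'
    on the other strand, so its binding site appears here as its reverse complement.
    Only exact matches count in this toy version.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


# Hypothetical template with a target region between two primer sites.
template = "CCCCATGCAGTTTGGGCTAGGAAAAA"
print(amplicon(template, "ATGCAG", "TCCTAG"))  # ATGCAGTTTGGGCTAGGA
```

Everything outside the two primer sites (the Cs and As on the ends) never gets copied, which is exactly why we only need primers for the region we care about.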
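Here’s a back-of-the-envelope way to see why longer primers are more specific. Assuming every base in a genome is equally likely to be A, T, G, or C (real genomes aren’t that tidy), the chance that a random stretch matches a primer of length n is (1/4)^n, so the expected number of accidental matches drops off fast as primers get longer. The 5,000,000 bp genome below is hypothetical, and this only covers the specificity side of the trade-off, not how long primers take to attach.

```python
def expected_chance_matches(genome_length: int, primer_length: int) -> float:
    """Expected exact matches to a primer at random, on one strand.

    Assumes each base is A, T, G, or C with equal probability (1/4 each),
    which real genomes only roughly follow.
    """
    positions = genome_length - primer_length + 1
    return positions * 0.25 ** primer_length


GENOME_LENGTH = 5_000_000  # hypothetical bacterial-sized genome

for n in (8, 12, 16, 20):
    print(f"{n}-base primer: ~{expected_chance_matches(GENOME_LENGTH, n):.2g} chance matches")
```

An 8-base primer would be expected to land in dozens of random spots, while a 20-base primer almost never matches anywhere by accident.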
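One classic rule of thumb for short primers is the Wallace rule: the melting temperature is roughly 2 degrees for every A or T plus 4 degrees for every G or C. Here’s a minimal Python sketch that uses it together with the “5 C below” guideline. The primer sequences are hypothetical, the rule is really meant for short oligos, and real primer-design software uses more detailed models, so think of this as a sanity check, not a recipe.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C: 2 per A or T, 4 per G or C.

    The Wallace rule is only a quick estimate for short oligonucleotides.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def annealing_temp(fwd: str, rev: str, offset: int = 5) -> int:
    """Aim about `offset` degrees below the lower of the two primer Tms."""
    return min(wallace_tm(fwd), wallace_tm(rev)) - offset


fwd = "ATGCAGTTTGGGCTAGCACT"  # hypothetical 20-base primer: 10 A/T, 10 G/C
rev = "AATTGACCTGATCGAAGTCA"  # hypothetical partner primer: 12 A/T, 8 G/C
print(wallace_tm(fwd))           # 60
print(wallace_tm(rev))           # 56
print(annealing_temp(fwd, rev))  # 51
```

Going off the lower of the two melting temperatures makes sure both primers can stick at the annealing step, not just the GC-rich one.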
So, we’ve broken apart the double stranded DNA, gotten our heat-resistant polymerase ready to make more DNA, and found our primers to tell that polymerase where to do its work. Now what?
Well, now we let nature take its course. We supply the Taq polymerase with all the tools it needs to do its job: the perfect environment (a PCR buffer that keeps everything at the optimal pH, plus the right temperature), a DNA template (our sample DNA, which we broke apart), a bunch of nucleotides (in lab manuals these are called dNTPs, and they’re really just a bunch of As, Ts, Cs, and Gs), and enough time to get the job done. We provide this in the extension or elongation step, where we raise the temperature to around 72 C (the optimal temperature for Taq polymerase) and let it do its work. The enzyme takes all those free-floating nucleotides and lines them up all nice and neat on the template DNA, starting from the primers.
Depending on how long our target site is, we give the polymerase 1-3 minutes to do its job (the longer the site, the longer it’s gonna take to copy it, naturally). We then repeat the process 30-40 times. After we’ve repeated it that many times, we do a final extension step at the very end, just to give the polymerase some extra time to copy all the remaining single stranded DNA (usually 7 minutes does the job nicely), and then we cool the whole reaction down to refrigerator-type temperatures to hold the DNA until we’re ready to use it.
Notice how I keep saying we change the temperature of this reaction to do the different steps? We have to have precise control over the temperature to make sure everything happens in the correct sequence. (After all, what happens if we try to copy regions of the DNA before the primers are attached? Or before the DNA goes from double stranded to single stranded? Anarchy, I tell you! Actually, the reaction just wouldn’t work. Whatever). We control the temperature by doing this entire reaction in a piece of lab equipment called the thermocycler.
We put all of our PCR reaction stuff in tiny PCR tubes…
…which we then place in the thermocycler and press the “go” button. Ah, automation at its best!
So here are the steps of PCR in a nutshell:
1. Put all the ingredients in a PCR tube: DNA, Taq polymerase, nucleotides (dNTP), buffer, primers
2. Place the tubes in the thermocycler and press “go”
3. Denaturation step: the thermocycler raises the temperature to 94 C for 20-30 seconds to melt the hydrogen bonds between the double strands of DNA and create single strands of DNA, ready for copying.
4. Annealing step: the thermocycler lowers the temperature to 45-55 C for 20-40 seconds to allow the primers to find the area on the DNA we want to amplify and attach.
5. Elongation/Extension step: the thermocycler raises the temperature to 72 C for 1-3 minutes (1 minute if the target sequence is under 500 base pairs, or bp, long; 3 minutes if it’s over 500 bp) and Taq polymerase goes to work copying the target sequence.
6. Repeat steps 3-5: the thermocycler then starts from the beginning again, raising the temperature to break apart the newly formed DNA, lowering the temp to anneal the primers, and raising the temp to elongate the DNA. Each pass through steps 3-5 is called a “cycle,” and the thermocycler is programmed to run 30-40 cycles.
7. Final elongation step: after 30 or so cycles, the thermocycler raises the temperature to 72 C for 7 minutes just to make sure all the left over single strands of DNA have time to be copied.
8. Final hold step: the thermocycler lowers the temperature to 4-15 C (4 C is about what your refrigerator is at to keep your milk cold) to keep the DNA fresh until you’re ready to use it.
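If it helps to see the program the way a thermocycler “sees” it, here’s a small Python sketch that lays out the steps above as data and adds up the run time. The specific values (30 seconds each for denaturing and annealing, 50 C for annealing, 30 cycles, a 1,500 bp target) are my own picks from within the ranges above, not a validated protocol, and the time the machine spends heating and cooling between steps is ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    seconds: int


def build_program(target_bp: int, cycles: int = 30) -> List[Step]:
    """Lay out a PCR run as a list of steps (values are illustrative picks)."""
    extend_s = 60 if target_bp < 500 else 180  # the 1-minute / 3-minute rule above
    one_cycle = [
        Step("denaturation", 94, 30),
        Step("annealing", 50, 30),
        Step("extension", 72, extend_s),
    ]
    # The final hold at 4 C lasts until someone takes the tubes out, so it isn't timed here.
    return one_cycle * cycles + [Step("final extension", 72, 7 * 60)]


def run_minutes(program: List[Step]) -> float:
    """Total programmed time, ignoring how long the block takes to heat and cool."""
    return sum(step.seconds for step in program) / 60


program = build_program(target_bp=1500)  # hypothetical 1,500 bp target
print(len(program), "steps,", run_minutes(program), "minutes")  # 91 steps, 127.0 minutes
```

Writing the cycle once and repeating it is basically what you do on the thermocycler’s screen, too: you enter the three steps and tell it how many times to loop.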
Here is a good video that goes through the whole process: PCR on YouTube
How neat is that? And so simple! Of course, I’m saying that after writing 3700 words to explain exactly how it works, but whatever. It’s simple.
By the time PCR is finished, we’re left with around a billion identical copies of our target DNA. That’s enough to do whatever we want! And you know what we want to do with all those copies of the 16S region? Pyrosequencing!
Ok, now we know a bit about DNA, replication, and PCR. The next bit of information we want is the exact sequence of bases in various regions of DNA. This information can tell us a lot about the organism from which it came, their relationship to other organisms in the world, and even the presence of mutations within particular genes. In short, knowing the actual sequence of a strand of DNA opens up a whole world of possibilities for scientists.
I want to know the exact sequence so I can identify the bacteria I’m working with. Sure, I could use various other techniques to identify my bacteria, but conventional laboratory methods take a lot of time (as in weeks), and I have other things to do. Instead, I can extract my DNA, take a couple of hours to amplify the 16S region, and then load it all into a pyrosequencing machine and have my identifications by the end of the day. The other upside to this method is that my IDs are very hard to argue with, since they’re based on the organism’s actual genetic sequence. This is why DNA is so powerful as evidence.
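Where does “a billion” come from? In an ideal reaction every cycle doubles the number of copies, so after n cycles you have 2^n times what you started with. Here’s a quick Python sketch; the `efficiency` parameter is my own addition (1.0 means perfect doubling) to show that real reactions, which don’t quite double every cycle, end up with fewer copies.

```python
def copies_after(cycles: int, starting_copies: int = 1, efficiency: float = 1.0) -> float:
    """Copies of the target after `cycles` rounds of PCR.

    efficiency = 1.0 means perfect doubling every cycle (an idealization).
    """
    return starting_copies * (1 + efficiency) ** cycles


print(copies_after(30))                  # 1073741824.0, about a billion
print(copies_after(30, efficiency=0.9))  # fewer, because copying isn't perfect
```

So 30 cycles starting from a single copy gets you to roughly a billion, at least on paper.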
So, how does pyrosequencing work? I must say, this technique is brilliant…I was super excited the first time I learned about it (also? I’m a bit of a nerd). It’s a method based on sequencing by synthesis, and takes advantage of some byproducts of DNA replication. It then uses enzymes from various organisms to show us (well, technically a computer) our sequence.
Remember how I talked about how DNA polymerase facilitates the addition of nucleotides to a DNA strand? (No? Look a few hundred words above this and you’ll find out). Well, what I didn’t mention at that time was that when it does this, that reaction has a byproduct: pyrophosphate.
This is a molecule made of two phosphate groups linked together (so, phosphorus and oxygen) that can be used to make ATP (energy). So every time a single nucleotide, no matter which one, is added to a strand of DNA, one of these molecules is released into the solution.
Now, if I happened to mix pyrophosphate (abbreviated PPi so I don’t have to type that long, hard-to-spell word again) with adenosine phosphosulfate (APS), I can get ATP. Isn’t that neat? So all I need to do is make that reaction happen. Can you guess what I need to do that? Yep, an enzyme.
That enzyme is ATP sulfurylase, which converts PPi from nucleotide incorporation into ATP. That newly formed ATP goes floating off into the solution, all primed and ready to do some work.
We wouldn’t want to disappoint ATP, now would we? Nope, so we give it some work to do. We have provided this newly formed energy with a reaction to run: the conversion of luciferin into oxyluciferin. Why is that important work to do, you ask? Because this reaction causes a pretty glowing light–like in fireflies!
Notice how the word “luciferin” conjures up images of fire and brimstone? It actually comes from the Latin lucifer, meaning “light-bringer,” which is a handy way to remember that it’s a substance that glows. Many animals in the world use this reaction all the time, and fireflies are just one type. They take a substance we call luciferin, add some energy and the enzyme luciferase, and cause a beautiful glow on summer evenings.
Well, some brilliant scientist thought this was neat, and decided to bring the luciferin/luciferase combo into the lab. So during the pyrosequencing reaction we take newly formed ATP and give it to the luciferase enzyme, which turns luciferin into oxyluciferin, causing it to emit light.
Why do we want it to emit light? I’ll tell you in a minute. Stay tuned.
Once all the reactions are finished, we want to clean up our solution so we can run the next cycle and continue on our path to sequencing DNA, so we put in a clean-up enzyme in the form of apyrase, which degrades any excess nucleotides (and leftover ATP) floating around. This leaves us with a nice, clean slate (or, more specifically, a nice clean solution).
Alright, so in order to run a pyrosequencing reaction we need some DNA, several enzymes, and some luciferin to make glow. Let’s see how those things work together to give us some information, shall we?
Step 1: Put the DNA you want to sequence in a tube (this DNA usually consists of a bunch of PCR product, so you have over a billion copies of the target sequence; the more copies, the easier it is to sequence your DNA) along with primers for your sequence (so you can get synthesis started), DNA polymerase, ATP sulfurylase, adenosine phosphosulfate (APS), luciferin, luciferase, and apyrase. This gets put into the pyrosequencing machine so the computer can take over.
Step 2: Press “go” on the machine. The computer adds one of the four nucleotides (A,T,C, or G) at a time–let’s say it starts with A. It floods the tube with the A nucleotide, and the enzymes take over.
Step 3: Your DNA strand is primed and ready, so if the first nucleotide on the template strand is a T, then the A that just flooded the solution will be added by DNA polymerase to the new complementary strand. (If you forget how DNA replication works, check out my other blogs, or watch this video.)
Step 4: The incorporation of that A into the new strand causes PPi to be released. That PPi is taken by the ATP sulfurylase and converted into ATP.
Step 5: The ATP is used as an energy source to allow luciferase to turn luciferin into oxyluciferin, which emits light.
Step 6: The light given off by the reaction is recorded by a camera attached to the computer, and logged as a peak on a graph called a pyrogram.
Step 7: Once DNA polymerase has used up all the A nucleotides it needs, apyrase degrades all the extra nucleotides floating around and gets the solution ready for the next one.
Step 8: The computer floods the solution with another nucleotide (let’s say G), and the process starts again.
If there is more than one identical nucleotide in sequence (say GGG or TT), then more PPi is released (if there are two nucleotides in a row, then twice as much PPi is released; if there are three then 3x as much is released, etc.). More PPi means more ATP. More ATP means more luciferase action. More luciferase action means a brighter light. A brighter light is recorded as a higher peak by the computer.
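Here’s a toy Python simulation of the steps above, which shows why runs like GGG give taller peaks. It assumes a fixed, repeating order of nucleotide additions (A, C, G, T here, which is my choice for the example) and perfectly clean peaks where one incorporated nucleotide means one unit of light. The template sequence is made up.

```python
from typing import List, Tuple

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def simulate_pyrogram(template: str, order: str = "ACGT", flows: int = 8) -> List[Tuple[str, int]]:
    """Return (nucleotide added, peak height) for each flow.

    `template` is the single strand being copied, read in the direction the
    polymerase moves (starting just past the primer). Each incorporated
    nucleotide releases one PPi -> one ATP -> one unit of light, so the peak
    height is simply how many nucleotides got added during that flow.
    """
    needed = [PAIRS[base] for base in template.upper()]  # bases the new strand needs
    pos = 0
    peaks = []
    for i in range(flows):
        nucleotide = order[i % len(order)]
        height = 0
        # Polymerase keeps adding as long as the next needed base matches, so a
        # run like GGG on the new strand gives one peak three units tall.
        while pos < len(needed) and needed[pos] == nucleotide:
            height += 1
            pos += 1
        peaks.append((nucleotide, height))
    return peaks


print(simulate_pyrogram("TTGCA"))
# [('A', 2), ('C', 1), ('G', 1), ('T', 1), ('A', 0), ('C', 0), ('G', 0), ('T', 0)]
```

The two Ts at the start of the template both get filled in during the very first A flow, so that flow produces a peak twice as tall as the others.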
The above steps are repeated until the entire sequence of DNA has been read. The computer then looks at the pyrogram and translates the peaks into a DNA sequence.
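And here’s the reverse direction as a Python sketch: turning peak heights back into a sequence. Rounding each height to the nearest whole number is a simplifying assumption; real software has to deal with noisy peaks, and long runs of the same base are harder to call accurately in practice. The peak values below are invented.

```python
from typing import List, Tuple

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def decode_pyrogram(peaks: List[Tuple[str, float]]) -> str:
    """Turn (nucleotide, peak height) pairs into the newly synthesized strand.

    Each height is rounded to the nearest whole number of nucleotides,
    a simplifying assumption; negative noise is treated as zero.
    """
    return "".join(nt * max(0, round(height)) for nt, height in peaks)


def template_from(new_strand: str) -> str:
    """The template is the base-by-base complement of what was synthesized."""
    return "".join(PAIRS[base] for base in new_strand)


# Invented, slightly noisy peak heights for flows of A, C, G, T.
peaks = [("A", 2.1), ("C", 0.9), ("G", 1.05), ("T", 0.1)]
new_strand = decode_pyrogram(peaks)
print(new_strand)                 # AACG
print(template_from(new_strand))  # TTGC
```

The tiny T peak rounds down to nothing, which is the computer’s way of saying “no T belonged there.”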
Hey, you know what would make this easier? A Video!! Watch and enjoy.
Wasn’t that neat? So pyrosequencing can kick back the sequence of up to 20,000 different DNA strands in 6 hours. How super awesome is that?!? I know!!
Well, at nearly 5000 words, that’s PCR and pyrosequencing in a nutshell. I hope you learned something!