Connecting DNA to Disease Using BLAST

Author: Catherine Quist

Grade Level: 10th

Lesson Overview: Students are given a DNA sequence for a gene that was found to be expressed in a diseased tissue. Students transcribe and translate the DNA sequence and perform a BLAST search against a database of known proteins to determine which protein their sequence encodes. They are then asked to make inferences about the disease associated with the protein. The goal is to show students that genes encode proteins, which in turn can cause disease if they are mutated or function improperly. Students are also introduced to BLAST, an authentic bioinformatics tool, and use it to identify a gene and its associated disease.

Learning Outcomes:
  • Learning Performances: At the end of this lesson, students will understand the connection between DNA sequence, protein, protein function, and disease. They will also know how to use the program BLAST to identify a gene and its protein sequence, and then use this identification as the basis for inferring that a patient is suffering from a particular disease.
  • At the end of the activity, students will be able to:
    1. Transcribe and translate a DNA sequence.
    2. Use the internet and BLAST, a specialized database search engine, to identify the protein encoded by a given DNA sequence.
    3. Use Google to research a protein and the disease that can result from an abnormality in that protein.
  • Links to Michigan Standards:
    • B4.2D Predict the consequences that changes in the DNA composition of particular genes may have on an organism (e.g., sickle cell anemia, other).
    • B4.2f Demonstrate how the genetic information in DNA molecules provides instructions for assembling protein molecules and that this is virtually the same mechanism for all life forms.
    • B4.2g Describe the processes of replication, transcription, and translation and how they relate to each other in molecular biology.

Students' Prior Knowledge or Experience:
  • Students should understand the structure of DNA and how it is abbreviated.
  • Students should understand how DNA is transcribed to yield mRNA.
  • Students should understand how mRNA is translated to yield protein.
  • Students should understand how changes in the DNA composition affect its resulting protein sequence.
  • Students should understand how changes in a protein sequence may affect an organism.
  • Students should be familiar with using the internet, including how to use Google to find information and how to submit information to a server using a form.

Establishing Purpose: Many students fail to connect DNA sequence, proteins, and protein function to sustaining life. The goal of this activity is to allow students to discover that DNA sequences, while the A’s, T’s, G’s, and C’s may look like nonsense, do in fact encode very important proteins that help sustain life. They will also discover that some of these proteins are involved in causing diseases. In addition, they will have an authentic biomedical research experience. Using DNA chips, researchers can infer that a gene sequence is over- or under-expressed in a tissue. To make inferences about the role of the gene in a disease, the researcher uses the program BLAST to identify the protein associated with the gene. Next, the researcher performs a Google search to find the disease associated with that protein. In this activity, the students will have an opportunity to engage in this process using the same BLAST server that researchers use. Finally, this activity gives students an opportunity to practice transcribing and translating a DNA sequence to yield its protein sequence.

Materials Needed:
1 DNA sequence per group
1 computer with an internet connection per group

Time required: 2 hours

Instructional Sequence:
  • Introducing the Lesson:
    • Activating Prior Knowledge with Questions:
    1. What are the four nucleotides that make up a DNA code? What are their common abbreviations?
    2. What does DNA code for?
    3. What is a gene? Where/how do we get genes?
    4. Where are genes located?
    5. Explain how DNA determines the traits of an organism. Use the words nucleotides, gene and protein in your answer.
    6. What do you think can cause a genetic disease?
    7. What will happen to an organism's homeostasis if a gene for an important protein becomes defective?
    8. What would happen if an organism inherited a gene that coded for defective proteins?

    • Establish links to the purpose of the lesson:
      • Give mini-lecture: Over the past decade, large-scale genome and proteome sequencing efforts, such as the Human Genome Project, have filled nucleotide and protein sequence databases with publicly available sequence data. Efforts to catalogue these sequences and associate them with functions and diseases have kept pace with the sequencing. As a result, when a biomedical researcher today discovers a DNA or protein sequence in a tissue, he or she can run the program BLAST to compare the sequence to the publicly stored data and identify it by matching it to a sequence in the database. Using the database, the researcher can then retrieve its function and the associated disease. Today, we will use BLAST and Google to identify a gene sequence and its associated disease.

  • Body of the Lesson:
    • The teacher presents the following scenario to the students: Using DNA chips, you have inferred that a gene sequence is differentially expressed in a patient’s tissue. In order to figure out which disease is affecting the patient, you must first figure out which disease is associated with the gene. To do this, you will first use the program BLAST to identify the protein associated with the gene. Next, you will perform a Google search to find the disease associated with that protein.
    • The teacher forms the students into seven groups and gives each group a DNA sequence to identify and to associate with a disease. Each group works through the following steps:
      • Transcribing their DNA sequence to yield an mRNA sequence
      • Translating the resulting mRNA sequence to yield a protein sequence
      • Submitting their sequence to the BLAST server to yield a protein identification
      • Using Google to find the disease associated with the protein
      • Running BLAST and using Google to identify the disease associated with a DNA sequence directly
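
Teachers who want to check each group's hand-worked transcription and translation can use a short script like the one below. It is a minimal sketch, not part of the original worksheet. It assumes the sequence handed out is the coding (sense) strand written 5' to 3', so transcription amounts to replacing T with U. The example sequence is illustrative only and is not one of the seven lesson sequences.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Illustrative sequence only (an assumption, not taken from the lesson).
dna = "ATGGTGCATCTGACTCCTGAGGAGAAGTCTTAA"
mrna = transcribe(dna)
print(mrna)             # mRNA sequence
print(translate(mrna))  # one-letter amino acid sequence
```

The one-letter protein string printed at the end is the form that can be pasted into a protein BLAST search form.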

  • Concluding the Lesson:
    • Reporting Findings: Each group will report the name of their protein, along with the name of its associated disease. The group will also explain the relationship between the protein and the disease.
    • Discussion: The teacher asks the students what they learned during the lab that they didn't already know, using the question to initiate a brief discussion.

Challenges in Carrying-out the Investigation:
  • Students may have difficulty in using internet forms due to lack of experience and may need extra help.
  • Students may have difficulty using Google to do searches due to lack of experience and may need extra help.
  • Little instruction is provided on how to BLAST the DNA sequence. As a result, students may have difficulty figuring out how to run a nucleotide-only BLAST search. They may also have trouble finding the gene location or the associated disease, since the methods are not clearly laid out.

Assessing Student Understanding:
The worksheet for the activity includes a series of questions, which should be assigned as homework, designed to assess the students' understanding of the connection between DNA sequence changes and disease, as well as their understanding of BLAST. In addition, the questions assess the students' understanding of the variety of contexts in which researchers can use BLAST:

This question asks the students to recall what they've learned about the relationship between genetic mutations and disease. To answer this question, students need to describe how single nucleotide substitutions lead to single amino acid substitutions, which can alter protein function, leading to disease. They also need to describe how single nucleotide insertions and deletions can shift the reading frame and create new stop codons, resulting in truncated protein products, leading to disease.
    • 1. Explain how changes in DNA coding sequence can lead to disease.
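
The two cases in the expected answer can be demonstrated with a short script. This is a sketch under stated assumptions: the sequence is illustrative (modeled on the start of a globin-like coding sequence, not taken from the lesson materials), and it is treated as the coding strand. A single substitution changes one amino acid (GAG to GTG, glutamate to valine, the type of change seen in sickle cell anemia), while a single deletion shifts the reading frame and exposes an early stop codon.

```python
from itertools import product

# Standard genetic code for DNA codons, in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def translate_dna(dna):
    """Translate coding-strand DNA, stopping at the first stop codon."""
    protein = ""
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODONS[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein += amino_acid
    return protein


# Illustrative sequence (an assumption, not one of the lesson sequences).
normal = "ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCC"

# Substitution: A -> T in the middle of the 7th codon (GAG -> GTG).
substituted = normal[:19] + "T" + normal[20:]

# Deletion: removing one base after ATG shifts every later codon.
deleted = normal[:3] + normal[4:]

print(translate_dna(normal))       # full-length protein
print(translate_dna(substituted))  # one amino acid differs
print(translate_dna(deleted))      # truncated by an early stop codon
```

Students can compare the three printed proteins to see why a frameshift is usually more disruptive than a single substitution.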

This question asks the students to explore the social implications of insurance companies knowing DNA sequence information. To answer this question students need to first assert that the knowledge of the DNA sequence can predict certain diseases, which may lead to insurance companies not providing health insurance or life insurance for people with certain DNA sequences.
    • 2. What do you think the ramifications are for insurance companies knowing DNA sequences of individuals?

This question asks students to connect the relationship between DNA and disease to an important issue in the news today, the ethics of gene patenting. The answer to this question is that pharmaceutical companies are patenting gene sequences, such as BRCA1, because they can be used to diagnose or treat diseases, such as breast cancer.
    • 3. Why do you think pharmaceutical companies are patenting gene sequences?

This question asks students to extend their knowledge of the BLAST procedure to a new situation. The answer is that you'd BLAST the gene against the human nucleotide database.
    • 4. If you were a scientist working with mice and discovered a gene that had something to do with obesity in mice, describe how you might find out if there is a similar gene that is known to exist in humans?

This question asks students to contemplate the relationship between BLAST specificity and the length of a sequence. The answer is that a longer sequence would find more specific matches. To demonstrate this, you could generate results for varying sequence lengths.
    • 5. If you had more nucleotides in your sequence to enter into BLAST (say 1000 instead of 100), do you think it would find more specific or less specific matches? Explain your answer. How would you conduct an experiment using the BLAST server to provide evidence for your answer?
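
The reasoning behind the expected answer can be made concrete with a rough calculation. The sketch below assumes a simplified model in which the database is random DNA with all four bases equally likely. Under that assumption, the expected number of exact chance matches for a query of length n in a database of N bases is about N / 4^n. This is not how BLAST computes its E-values; it only illustrates why longer queries are more specific. The database size is an assumed round number, and `truncated_queries` is a hypothetical helper for preparing the different-length queries students would paste into the BLAST form.

```python
def expected_chance_matches(query_length, database_size):
    """Expected exact chance matches of a query in uniformly random DNA."""
    positions = max(database_size - query_length + 1, 0)
    return positions * 0.25 ** query_length


def truncated_queries(sequence, lengths):
    """Return prefixes of the sequence to submit as separate BLAST queries."""
    return {n: sequence[:n] for n in lengths if n <= len(sequence)}


# Assumed database size, roughly the scale of one human genome.
DATABASE_SIZE = 3_000_000_000

for n in (10, 15, 20, 30):
    print(n, expected_chance_matches(n, DATABASE_SIZE))

# Prefixes of an illustrative sequence for the classroom experiment.
print(truncated_queries("ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCC", [10, 20, 30]))
```

In the experiment, students would submit each prefix separately and record how the number and quality of hits change as the query gets longer.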

This question asks students to generalize their knowledge of the applicability of BLAST. The answer is that the scientist should BLAST the sequence against the database of all known nucleotide sequences.
    • 6. How would scientists all over the world check to see what a newly sequenced region of DNA is similar to? What do you think they do with the new DNA sequence if it is unknown? Explain.

This question asks the student to understand that BLAST does not look for perfect matches. BLAST can still be used in the event of single-letter mutations; in fact, this is its real utility. To demonstrate this, you'd mutate a sequence before BLASTing it and show that it gets the same hits. This is important because scientists need to be able to find the identity of mutated sequences.
    • 7. Describe how mutations affect BLAST results. How would you conduct an experiment using the sequences you’ve been given and the BLAST server to answer this question? Why is this important? Explain.
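
For the experiment described above, a mutated copy of a sequence can be prepared by hand or with a small script. The sketch below is one possible approach, not part of the original worksheet: `mutate` is a hypothetical helper that substitutes a chosen number of bases at random positions, and the seed is fixed so every group gets the same result. The example sequence is illustrative only.

```python
import random


def mutate(sequence, n_mutations, seed=0):
    """Return a copy of the sequence with n_mutations single-base substitutions."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    bases = list(sequence.upper())
    # Pick distinct positions so each mutation changes a different base.
    for position in rng.sample(range(len(bases)), n_mutations):
        alternatives = [b for b in "ACGT" if b != bases[position]]
        bases[position] = rng.choice(alternatives)
    return "".join(bases)


# Illustrative sequence (an assumption, not one of the lesson sequences).
original = "ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCC"
mutated = mutate(original, 3)
differences = sum(a != b for a, b in zip(original, mutated))

print(original)
print(mutated)
print("positions changed:", differences)
```

Students would then submit both the original and the mutated sequence to BLAST and compare the top hits.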

This question pushes the student to understand how BLAST can be used to study evolution. The answer is that a scientist can compare the number of matches between the genes from both organisms. More matches means the organisms are more closely related.
    • 8. How could a scientist use BLAST to get a rough estimate of how closely related two organisms are?
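
One simple way to quantify "number of matches" is percent identity, the share of aligned positions where two sequences carry the same letter. The sketch below assumes the two sequences have already been aligned to the same length, with "-" marking gaps. The organism names and sequences are hypothetical and chosen only to illustrate the comparison.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions where both sequences have the same base."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    # A gap ("-") never counts as a match.
    matches = sum(a == b and a != "-" for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical aligned fragments of the same gene from three organisms.
organism_a = "ATGGTGCATCTG"
organism_b = "ATGGTGCACCTG"
organism_c = "ATGGCGCACTTG"

print(percent_identity(organism_a, organism_b))
print(percent_identity(organism_a, organism_c))
```

Under this rough measure, the pair with the higher percent identity would be inferred to be more closely related.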

This question asks the student to infer whether it is better to use BLAST with protein or DNA sequences. The answer is that protein sequences are more informative and are less likely to yield random matches.
    • 9. Does running BLAST using nucleotides or amino acids yield more specific matches? Explain.

Student Resources:

Cautions: Students should be reminded not to use the internet to access sites that are not relevant to the activity and that accessing inappropriate sites will be grounds for immediate referral.

Comments on Connecting DNA to Disease Using BLAST

Rationale for Connecting DNA to Disease Using BLAST