Biotech Report Assignment

Assignment instructions

Background

In this exercise, you will look into different gene-finding approaches, learn various things about a contig from a genome assembly, and annotate the genes it encodes (see the sequence assembly lecture if you need a reminder of the vocabulary).

There are two main methods for automatic gene prediction: ab initio methods and comparative methods. Ab initio methods use the DNA sequence as the only input and are referred to as intrinsic methods. There are several features that can be identified in a genomic sequence and used to identify genes computationally. Such features are related either to the signals that regulate the biological mechanisms of gene expression (signal sensors), or to biases in sequence composition in DNA regions that are translated into proteins (content sensors). Signal sensors are typically splice-sites (donor: GTRAGT, acceptor: YAG, branch-site: CTRAY), the start of translation (codon ATG), and the end of translation (codons TGA, TAA, and TAG).
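As an illustration of what a signal sensor looks for, the following minimal Python sketch scans a sequence for the consensus motifs listed above. It is not how FGENESH or AUGUSTUS work internally: real gene finders score signals with trained probabilistic models, whereas this sketch only performs exact matching of the IUPAC consensus (R = A or G, Y = C or T). The function names and the toy sequence are hypothetical, chosen for illustration only.

```python
import re

# IUPAC codes needed for the motifs quoted in the text (R = A/G, Y = C/T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}

# Consensus signals as given in the background section.
SIGNALS = {
    "donor": "GTRAGT",
    "acceptor": "YAG",
    "branch": "CTRAY",
    "start": "ATG",
}
STOP_CODONS = ("TGA", "TAA", "TAG")


def motif_to_regex(motif):
    """Expand an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in motif)


def find_signal(sequence, motif):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A lookahead lets the regex engine report overlapping occurrences.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


def find_stops(sequence):
    """Return 0-based positions of every stop codon, in any reading frame."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] in STOP_CODONS]


if __name__ == "__main__":
    toy = "CCATGGTAAGTCCCTAACTTTTCAGGATTGA"  # made-up sequence, not real data
    for name, motif in SIGNALS.items():
        print(name, find_signal(toy, motif))
    print("stops", find_stops(toy))
```

Exact consensus matching of this kind produces many false positives on real genomic DNA (short motifs such as YAG occur by chance very frequently), which is why signal sensors are combined with content sensors.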

The content sensor most commonly used is bias in codon usage: regions of DNA coding for a protein use some codons more frequently than others. Both signal sensors and content sensors must be trained, i.e., we must start from a set of observations (such as known genes) from which we build a sensor model. Predicting a gene therefore involves looking for new features in the genomic sequence that resemble our model. The resemblance can be established in terms of probabilities.
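The idea of training a content sensor and then scoring resemblance as a probability can be sketched as follows. This is a deliberately simplified, assumed model: codon frequencies are estimated from a set of known coding sequences and compared against a uniform background in which each of the 64 codons has probability 1/64. Real gene finders use richer models, so treat the function names, the pseudocount, and the toy training sequences as illustrative assumptions.

```python
import math
from collections import Counter
from itertools import product

# All 64 possible codons.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def split_codons(seq):
    """Split a sequence into consecutive codons, dropping any trailing bases."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def train_codon_model(coding_sequences, pseudocount=1.0):
    """Estimate codon probabilities from known in-frame coding sequences.

    The pseudocount (an assumption of this sketch) keeps unseen codons
    from receiving a probability of zero.
    """
    counts = Counter()
    for cds in coding_sequences:
        counts.update(c for c in split_codons(cds) if c in CODON_SET)
    total = sum(counts.values()) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def coding_score(seq, model):
    """Log-odds that seq is coding, relative to a uniform 1/64 background.

    Positive scores mean the window resembles the training genes more
    than it resembles random codon usage.
    """
    return sum(math.log(model[c] * len(CODONS))
               for c in split_codons(seq) if c in model)


if __name__ == "__main__":
    # Toy training set; real training would use many known genes.
    known_genes = ["ATGGCTGCTGCTTAA", "ATGGCTGGTGCTTGA"]
    model = train_codon_model(known_genes)
    print(coding_score("GCTGCTGCT", model))  # codons common in training
    print(coding_score("CCCCCCCCC", model))  # codon absent from training
```

In practice the score would be computed in a sliding window and in each reading frame, so that regions with gene-like codon usage stand out from the surrounding sequence.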

Comparative methods are called extrinsic methods. They include two strategies: those that use homology with sequences from other genes, also called homology-based, and those that make comparisons with genomic sequence from other genomes, also called comparative-genomics-based. Homology-based methods predict a gene from the alignment of a protein sequence, or an RNA sequence in the form of a full-length mRNA, cDNA or EST (expressed sequence tag), with the genome sequence that we want to annotate. The known sequence (also called evidence) guides the prediction. There are several ways of applying homology-based methods. The simplest is to accept the alignment of the known sequence to the genome as the gene prediction. More advanced methods use the known sequence as a guide and try to complete the evidence to yield a complete gene structure. The efficacy of the latter method depends on the number of known gene sequences; hence it is limited by the completeness of biological databases. Comparative-genomics-based methods hypothesize that any sequences conserved between two relatively closely-related genomes are functional and likely to code for a gene.
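Aligning a known protein to a genomic sequence presupposes that the DNA can be read as protein, and because a gene may lie on either strand and in any frame, the DNA has to be translated in all six reading frames. The sketch below shows that translation step only, using the standard genetic code; it performs no alignment, and the function names are hypothetical.

```python
# Standard genetic code, written in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate from the first base; unknown codons become 'X', stops '*'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Return translations for frames +1..+3 and -1..-3."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCTTAA").items():
        print(frame, protein)
```

Note that in eukaryotic genomic DNA a single protein may be spread over several exons in different frames, so a protein-to-genome alignment typically appears as a series of separate matches rather than one continuous hit.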

The annotation of a genome involves a combination of several gene prediction methods, and perhaps the prediction of other biological signals (such as transcription start sites, promoter regions, etc.). In this practical we will use different ab initio (FGENESH and AUGUSTUS) and homology-based (BLAST) approaches to annotate a DNA sequence.

You are required to write a report of more than 1,250 words on your analysis. Refer to the general assignment preparation guidelines provided to you previously.

Introduction: Provide appropriate background to the aims of your analysis. Include the difference between ab initio and homology-based approaches and explain under what circumstances one approach would be more suitable than the other.

Methods: Make sure in your methods section that you state which kind of BLAST you used for each task and explain why.

Results: Based on the tasks carried out in the prac on your allocated 20-kb DNA sequence (see attachment), present the findings in this section of the report, and make sure you note which sequence you worked on. Specifically, we are looking for:

1. A summary diagram of the gene models from the program you believe represents the best, most accurate ab initio annotation. Provide a summary diagram showing the entire 20 kb of DNA annotated with all predicted genes. Your diagram should indicate the base-pair positions of all gene features. You must show exons, introns, start and stop sites and the gene direction. You should also provide the polypeptide sequence encoded by each gene. Include in the text the annotation method you chose to depict in your diagram, and why you chose it (i.e. strengths and weaknesses compared to other methods). Explain the support you obtained for these predictions. Use the appropriate BLAST program to determine the type of protein produced from each gene predicted by your chosen ab initio method. Present the evidence and comment on your confidence in each BLAST result.
2. Report on the homology-based method to give an independent prediction of the gene models in your 20-kb sequence. Provide a summary diagram showing the entire 20 kb of DNA, annotated with all predicted genes. Your diagram should indicate the base-pair positions of all gene features. You must show exons, introns, start and stop sites and the gene direction. You should also provide the polypeptide sequence encoded by each gene. Determine the type of protein produced by each gene predicted by the homology-based search. Present the evidence and comment on how confident you are in each BLASTx result.
3. Find and identify the one type of gene that is found on all DNA fragments for the class. Either download a few more sequences to identify the gene type in common, or check with your colleagues which gene types they found.

Tip for visualization purposes: Geneious Prime is very convenient for showing annotations along a sequence. You can download a free trial from here: https://www.geneious.com/prime-features. Alternatively, visualize the results in PowerPoint or similar software.

Discussion: Include the following discussion points:

Compare the ab initio predictions and the BLASTx-based homology predictions for your sequence and comment on which approach worked best for annotating your sequence.
Comment on why BLASTx was used for the homology-based prediction. How is BLASTx different to the other kinds of BLAST?
Discuss the predicted biological and/or biochemical function of the gene found in Results #3 above. Include the nature of the conserved functional motif(s) and the homology to characterized genes. You will need to search the scientific literature in order to determine the biological and/or biochemical function of the chosen gene.

Visualization results:

Last Updated on April 26, 2021