Description
Genomic and proteomic sequences, which are in the form of deoxyribonucleic acid (DNA) and amino acids respectively, play a vital role in the structure, function and diversity of every living cell. As a result, various genomic and proteomic sequence processing methods have been proposed from diverse disciplines, including biology, chemistry, physics, computer science and electrical engineering. In particular, signal processing techniques were applied to the problems of sequence querying and alignment, that compare and classify regions of similarity in the sequences based on their composition. However, although current approaches obtain results that can be attributed to key biological properties, they require pre-processing and lack robustness to sequence repetitions. In addition, these approaches do not provide much support for efficiently querying sub-sequences, a process that is essential for tracking localized database matches. In this work, a query-based alignment method for biological sequences that maps sequences to time-domain waveforms before processing the waveforms for alignment in the time-frequency plane is first proposed. The mapping uses waveforms, such as time-domain Gaussian functions, with unique sequence representations in the time-frequency plane. The proposed alignment method employs a robust querying algorithm that utilizes a time-frequency signal expansion whose basis function is matched to the basic waveform in the mapped sequences. The resulting WAVEQuery approach is demonstrated for both DNA and protein sequences using the matching pursuit decomposition as the signal basis expansion. The alignment localization of WAVEQuery is specifically evaluated over repetitive database segments, and operable in real-time without pre-processing. It is demonstrated that WAVEQuery significantly outperforms the biological sequence alignment method BLAST for queries with repetitive segments for DNA sequences. A generalized version of the WAVEQuery approach with the metaplectic transform is also described for protein sequence structure prediction. For protein alignment, it is often necessary to not only compare the one-dimensional (1-D) primary sequence structure but also the secondary and tertiary three-dimensional (3-D) space structures. This is done after considering the conformations in the 3-D space due to the degrees of freedom of these structures. As a result, a novel directionality based 3-D waveform mapping for the 3-D protein structures is also proposed and it is used to compare protein structures using a matched filter approach. By incorporating a 3-D time axis, a highly-localized Gaussian-windowed chirp waveform is defined, and the amino acid information is mapped to the chirp parameters that are then directly used to obtain directionality in the 3-D space. This mapping is unique in that additional characteristic protein information such as hydrophobicity, that relates the sequence with the structure, can be added as another representation parameter. The additional parameter helps tracking similarities over local segments of the structure, this enabling classification of distantly related proteins which have partial structural similarities. This approach is successfully tested for pairwise alignments over full length structures, alignments over multiple structures to form a phylogenetic trees, and also alignments over local segments. Also, basic classification over protein structural classes using directional descriptors for the protein structure is performed.
Download count: 2
Details
Title
- Waveform mapping and time-frequency processing of biological sequences and structures
Contributors
- Ravichandran, Lakshminarayan (Author)
- Papandreou-Suppappola, Antonia (Thesis advisor)
- Spanias, Andreas S (Thesis advisor)
- Chakrabarti, Chaitali (Committee member)
- Tepedelenlioğlu, Cihan (Committee member)
- Lacroix, Zoé (Committee member)
- Arizona State University (Publisher)
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
2011
Subjects
Resource Type
Collections this item is in
Note
- thesisPartial requirement for: Ph. D., Arizona State University, 2011
- bibliographyIncludes bibliographical references (p. 108-121)
- Field of study: Electrical engineering
Citation and reuse
Statement of Responsibility
by Lakshminarayan Ravichandran