Faceted search and browsing of Indonesian text collection using shallow parsing techniques

Description
Text search is a very useful way of retrieving document information from a particular website. The public generally prefers internet search engines to local enterprise search engines, because enterprise content is not cross-linked and does not follow a page-rank algorithm. An enterprise search engine, on the other hand, uses metadata, which allows the user to specify conditions that any retrieved document must meet. Searching on metadata is therefore also very useful. My thesis aims to develop an enterprise search engine that uses metadata to provide advanced features such as faceted navigation. The search engine data was extracted from various Indonesian web sources. To provide faceted search, each document must be tagged with metadata such as person, organization, location, and sentiment-analysis keyword entities. A shallow parsing technique, named entity recognition, is used for this purpose. More than 1500 entities were tagged in this process. The documents were successfully converted into XML format and indexed with Apache Solr, an open source enterprise search engine with full-text search and faceted search capabilities. The entities help users specify conditions and search faster through the large collection of documents. Clicking on a metadata condition is assured to return results. Because the sentiment-analysis keywords are tagged with positive and negative values, social scientists can use these results to check for overlapping or conflicting organizations and ideologies. In addition, this tool is the first of its kind for the Indonesian language. The results are fetched much faster and with better accuracy.
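As a rough illustration of the indexing and faceting steps described above, the following Python sketch builds a Solr XML update message for one tagged document and a faceted query string. It is not the thesis's code: the field names (`id`, `text`, `person`, `organization`, `location`), the sample values, and both helper functions are hypothetical. Only the `<add><doc><field name="...">` update format and the `facet`, `facet.field`, and `fq` query parameters are standard Solr conventions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_solr_add_xml(doc_id, text, entities):
    """Build a Solr <add><doc> update message.

    `entities` maps a (hypothetical) facet field name to the list of
    entity strings the named entity recognizer tagged in the document.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    ET.SubElement(doc, "field", name="text").text = text
    # One <field> element per tagged entity, so each field is multi-valued.
    for field_name, values in entities.items():
        for value in values:
            ET.SubElement(doc, "field", name=field_name).text = value
    return ET.tostring(add, encoding="unicode")


def build_facet_query(keywords, facet_fields, filters=None):
    """Build the query string for a faceted Solr search.

    `filters` maps a facet field to the value the user clicked on;
    each one becomes an `fq` (filter query) parameter.
    """
    params = [("q", keywords), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return urlencode(params)


if __name__ == "__main__":
    # Sample values are made up for illustration only.
    print(build_solr_add_xml(
        "doc-1",
        "Contoh teks berita.",
        {"person": ["Budi"], "location": ["Jakarta"]},
    ))
    print(build_facet_query("berita", ["person", "location"],
                            {"location": "Jakarta"}))
```

Because every facet value offered to the user comes from entities that were actually tagged in indexed documents, a filter built from a clicked facet matches at least one document, which is one way to read the claim that the user is assured results.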
Date Created
2010

Computational modeling of peptide-protein binding

Description
Peptides offer great promise as targeted affinity ligands, but the space of possible peptide sequences is vast, making experimental identification of lead candidates expensive, difficult, and uncertain. Computational modeling can narrow the search by estimating the affinity and specificity of a given peptide in relation to a predetermined protein target. The predictive performance of computational models of interactions of intermediate-length peptides with proteins can be improved by taking into account the stochastic nature of the encounter and binding dynamics. A theoretical case is made for the hypothesis that, because of the flexibility of the peptide and the structural complexity of the target protein, interactions are best characterized by an ensemble of possible bound configurations rather than a single “lock and key” fit. A model incorporating these factors is proposed and evaluated. A comprehensive dataset of 3,924 peptide-protein interface structures was extracted from the Protein Data Bank (PDB) and descriptors were computed characterizing the geometry and energetics of each interface. The characteristics of these interfaces are shown to be generally consistent with the proposed model, and heuristics for design and selection of peptide ligands are derived. The curated and energy-minimized interface structure dataset and a relational database containing the detailed results of analysis and energy modeling are made publicly available via a web repository. A novel analytical technique based on the proposed theoretical model, Virtual Scanning Probe Mapping (VSPM), is implemented in software to analyze the interaction between a target protein of known structure and a peptide of specified sequence, producing a spatial map indicating the most likely peptide binding regions on the protein target. The resulting predictions are shown to be superior to those of two other published methods, and support the validity of the stochastic binding model.
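To make the contrast between an ensemble of bound configurations and a single "lock and key" fit more concrete, here is a small Python sketch of textbook Boltzmann weighting over a set of configuration energies. It is a generic statistical-mechanics illustration under stated assumptions, not the model proposed in the thesis and not VSPM: the energy values, the function names, and the choice of roughly 0.593 kcal/mol for kT (about room temperature) are all assumptions made for this example.

```python
import math

# Assumed thermal energy kT near room temperature, in kcal/mol.
KT_KCAL_PER_MOL = 0.593


def boltzmann_weights(energies, kt=KT_KCAL_PER_MOL):
    """Return the Boltzmann probability of each configuration."""
    e_min = min(energies)
    # Shift by the minimum energy so the exponentials cannot overflow.
    factors = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def ensemble_free_energy(energies, kt=KT_KCAL_PER_MOL):
    """Return -kT * ln(sum_i exp(-E_i / kT)) for the whole ensemble."""
    e_min = min(energies)
    z = sum(math.exp(-(e - e_min) / kt) for e in energies)
    return e_min - kt * math.log(z)


if __name__ == "__main__":
    # Hypothetical binding energies (kcal/mol) of four bound configurations.
    energies = [-6.0, -5.8, -5.5, -3.0]
    print(boltzmann_weights(energies))
    print(ensemble_free_energy(energies))
```

In this picture, several configurations with similar energies each carry appreciable weight, and the ensemble free energy is lower than the energy of the single best configuration, so scoring only the best pose would understate binding.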
Date Created
2010