Augmenting Academic Research Search And Reading With Richer Context

161564-Thumbnail Image.png
Description
The volume of scientific research is growing at an exponential rate over the past100 years. With the advent of the internet and ubiquitous access to the web, academic research search engines such as Google Scholar, Microsoft Academic, etc., have become

The volume of scientific research is growing at an exponential rate over the past100 years. With the advent of the internet and ubiquitous access to the web, academic research search engines such as Google Scholar, Microsoft Academic, etc., have become the go-to platforms for systemic reviews and search. Although many academic search engines host lots of content, they provide minimal context about where the search terms matched. Many of these search engines also fail to provide additional tools which can help enhance a researcher’s understanding of research content outside their respective websites. An example of such a tool can be a browser extension/plugin that surfaces context-relevant information about a research article when the user reads a research article. This dissertation discusses a solution developed to bring more intrinsic characteristics of research documents such as the structure of the research document, tables in the document, the keywords associated with the document to improve search capabilities and augment the information a researcher may read. The prototype solution named Sci-Genie(https://sci-genie.com/) is a search engine over scientific articles from Computer Science ArXiv. Sci-Genie parses research papers and indexes research documents’ structure to provide context-relevant information about the matched search fragments. The same search engine also powers a browser extension to augment the information about a research article the user may be reading. The browser extension augments the user’s interface with information about tables from the cited papers, other papers by the same authors, and even the citations to and from the current article. The browser extension is further powered with access endpoints that leverage a machine learning model to filter tables comparing various entities. The dissertation further discusses these machine learning models and some baselines that help classify whether a table is comparing various entities or not. The dissertation finally concludes by discussing the current shortcomings of Sci-Genie and possible future research scope based on learnings after building Sci-Genie.
Date Created
2021
Agent