Description
Code Generation is a task that has gained rapid progress in Natural Language Processing (NLP) research. This thesis focuses on the text-to-Structured Query Language (SQL) task, where the input is a question about a specific database and the output is the SQL that when executed will return the desired answer. The data creation process bottlenecks current text-to-SQL datasets. The technical knowledge required to understand and create SQL makes crowd-sourcing a dataset expensive and time-consuming. Thus, existing datasets do not provide a robust enough training set for state-of-the-art semantic parsing models. This thesis outlines my technique for generating a text-to-SQL dataset using GPT3 and prompt engineering techniques. My approach entails providing the Generative Pretrained Transformer 3 model (GPT-3) with particular instructions to build a rigorous text-to-SQL dataset. In this paper, I show that the created pairs have excellent quality and diversity, and when utilized as training data, they can enhance the accuracy of SQL generation models. I expect that my method will be of interest to academics in the disciplines of NLP because it can considerably reduce the time, effort, and cost necessary to produce large, high-quality text-to-SQL datasets. Furthermore, my approach can be extended to other tasks and domains to alleviate the burden of curating human-annotated data.
Download count: 29
Details
Title
- Using Language Models to Generate Text-to-SQL Training Data An Approach to Improve Performance of a Text-to-SQL Parser
Contributors
- Kuznia, Kirby Charles (Author)
- Baral, Chitta (Thesis advisor)
- Blanco, Eduardo (Committee member)
- Gopalan, Nakul (Committee member)
- Arizona State University (Publisher)
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
2023
Subjects
Resource Type
Collections this item is in
Note
- Partial requirement for: M.S., Arizona State University, 2023
- Field of study: Computer Science