Full metadata
Title
Modeling and Exploiting the Structure of Data via Meta-Features for Robust and Efficient Machine Learning
Description
In the standard pipeline for machine learning model development, several design decisions are made largely based on trial and error. Take the classification problem as an example. The starting point for classifier design is a dataset with samples from the classes of interest. From this, the algorithm developer must decide which features to extract, which hypothesis class to condition on, which hyperparameters to select, and how to train the model. The design process is iterative with the developer trying different classifiers, feature sets, and hyper-parameters and using cross-validation to pick the model with the lowest error. As there are no guidelines for when to stop searching, developers can continue "optimizing" the model to the point where they begin to "fit to the dataset". These problems are amplified in the active learning setting, where the initial dataset may be unlabeled and label acquisition is costly. The aim in this dissertation is to develop algorithms that provide ML developers with additional information about the complexity of the underlying problem to guide downstream model development. I introduce the concept of "meta-features" - features extracted from a dataset that characterize the complexity of the underlying data generating process. In the context of classification, the complexity of the problem can be characterized by understanding two complementary meta-features: (a) the amount of overlap between classes, and (b) the geometry/topology of the decision boundary. Across three complementary works, I present a series of estimators for the meta-features that characterize overlap and geometry/topology of the decision boundary, and demonstrate how they can be used in algorithm development.
Date Created
2022
Contributors
- Li, Weizhi (Author)
- Berisha, Visar (Thesis advisor)
- Dasarathy, Gautam (Thesis advisor)
- Natesan Ramamurthy, Karthikeyan (Committee member)
- Turaga, Pavan (Committee member)
- Arizona State University (Publisher)
Topical Subject
Resource Type
Extent
162 pages
Language
eng
Copyright Statement
In Copyright
Primary Member of
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.2.N.171940
Level of coding
minimal
Cataloging Standards
Note
Partial requirement for: Ph.D., Arizona State University, 2022
Field of study: Computer Engineering
System Created
- 2022-12-20 06:19:18
System Modified
- 2022-12-20 06:19:18
- 1 year 10 months ago
Additional Formats