Mining Data with Feature Interactions
Specifically, I first propose to directly solve the non-convex formulation of the weak hierarchical Lasso which imposes weak hierarchy on individual features and interactions but can only be approximately solved by a convex relaxation in existing studies. I further propose to use the non-convex weak hierarchical Lasso formulation for hypothesis testing on the interaction features with hierarchical assumptions. Secondly, I propose a type of bi-linear models that take advantage of interactions of features for drug discovery problems where specific drug-drug pairs or drug-disease pairs are of interest. These models are learned by maximizing the number of positive data pairs that rank above the average score of unlabeled data pairs. Then I generalize the method to the case of using the top-ranked unlabeled data pairs for representative construction and derive an efficient algorithm for the extended formulation. Last but not least, motivated by a special form of bi-linear models, I propose a framework that enables simultaneously subgrouping data points and building specific models on the subgroups for learning on massive and heterogeneous datasets. Experiments on synthetic and real datasets are conducted to demonstrate the effectiveness or efficiency of the proposed methods.
- Author (aut): Liu, Yashu
- Thesis advisor (ths): Ye, Jieping
- Thesis advisor (ths): Xue, Guoliang
- Committee member: Liu, Huan
- Committee member: Mittelmann, Hans D
- Publisher (pbl): Arizona State University