Full metadata

Title

On feature selection stability: a data perspective

Description

The rapid growth of high-throughput technologies over the last few decades has made manual processing of the generated data impracticable. Even worse, machine learning and data mining techniques have seemed paralyzed in the face of these massive datasets. High dimensionality is one of the most common challenges for machine learning and data mining tasks. Feature selection aims to reduce dimensionality by selecting a small subset of features that performs at least as well as the full feature set. Generally, learning performance (e.g., classification accuracy) and algorithm complexity are used to measure the quality of a feature selection algorithm. Recently, the stability of feature selection algorithms has gained increasing attention as a new indicator, because an algorithm should select similar subsets of features each time it is run on the same dataset, even in the presence of a small amount of perturbation. To cure the selection stability issue, we should first understand the cause of instability. In this dissertation, we investigate the causes of instability in high-dimensional datasets using well-known feature selection algorithms. We found that stability is mostly data-dependent. Based on these findings, we propose a framework that improves selection stability by addressing these main causes. In particular, we found that data noise greatly impacts both stability and learning performance, so we propose reducing it to improve both. However, current noise reduction approaches cannot distinguish between data noise and variation among samples from different classes. We overcome this limitation with supervised noise reduction via low-rank matrix approximation (SLRMA). The proposed framework has proved successful on different types of high-dimensional datasets, such as microarray and image datasets. However, this framework cannot handle unlabeled data; hence, we propose Local SVD to overcome this limitation.
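
As an illustration of the notion of selection stability described above, the following is a minimal sketch, not the dissertation's method. It assumes stability is scored as the mean pairwise Jaccard similarity between the feature subsets selected on random subsamples of the same dataset. The variance-based selector and the function names `top_k_by_variance`, `jaccard`, and `selection_stability` are hypothetical stand-ins chosen for this example.

```python
import numpy as np


def top_k_by_variance(X, k):
    """Hypothetical stand-in selector: indices of the k highest-variance features."""
    return set(np.argsort(X.var(axis=0))[-k:].tolist())


def jaccard(a, b):
    """Jaccard similarity between two non-empty sets of feature indices."""
    return len(a & b) / len(a | b)


def selection_stability(X, k, n_runs=20, frac=0.9, seed=0):
    """Mean pairwise Jaccard similarity of subsets selected on perturbed data.

    Each run perturbs the dataset by drawing a random subsample of the rows
    (assumed perturbation model), then applies the stand-in selector.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    m = max(2, int(frac * n_samples))
    subsets = []
    for _ in range(n_runs):
        rows = rng.choice(n_samples, size=m, replace=False)
        subsets.append(top_k_by_variance(X[rows], k))
    scores = [
        jaccard(subsets[i], subsets[j])
        for i in range(n_runs)
        for j in range(i + 1, n_runs)
    ]
    return float(np.mean(scores))
```

A score near 1 means the selector returns nearly the same subset under every perturbation; a score near 0 means the selected subsets barely overlap.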

Date Created

2013

Contributors

Alelyani, Salem (Author)
Liu, Huan (Thesis advisor)
Xue, Guoliang (Committee member)
Ye, Jieping (Committee member)
Zhao, Zheng (Committee member)
Arizona State University (Publisher)

Resource Type

Text

Genre

Doctoral Dissertation

Academic theses

Extent

xii, 124 p. : ill. (some col.)

Language

eng

Copyright Statement

In Copyright

Primary Member of

ASU Electronic Theses and Dissertations

Peer-reviewed

No

Open Access

No

Handle

https://hdl.handle.net/2286/R.I.17762

Statement of Responsibility

by Salem Alelyani

Description Source

Viewed on Nov. 6, 2013

Level of coding

full

Note

thesis

Partial requirement for: Ph.D., Arizona State University, 2013

bibliography

Includes bibliographical references (p. 117-124)

Field of study: Computer science

System Created

2013-07-12 06:14:45

System Modified

2021-08-30 01:42:41