Full metadata
Title
On feature selection stability: a data perspective
Description
The rapid growth of high-throughput technologies over the last few decades has made manual processing of the generated data impracticable. Even worse, machine learning and data mining techniques have seemed paralyzed by these massive datasets. High dimensionality is one of the most common challenges for machine learning and data mining tasks. Feature selection aims to reduce dimensionality by selecting a small subset of features that performs at least as well as the full feature set. Generally, learning performance (e.g., classification accuracy) and algorithm complexity are used to measure the quality of a feature selection algorithm. Recently, the stability of feature selection algorithms has gained increasing attention as a new indicator, because an algorithm should select similar subsets of features each time it is run on the same dataset, even in the presence of a small amount of perturbation. To cure the selection stability issue, we should first understand the cause of instability. In this dissertation, we investigate the causes of instability in high-dimensional datasets using well-known feature selection algorithms. We found that stability is mostly data-dependent. Based on these findings, we propose a framework that improves selection stability by addressing these main causes. In particular, we found that data noise greatly impacts both stability and learning performance, so we propose reducing it in order to improve both. However, current noise reduction approaches cannot distinguish between data noise and variation in samples from different classes. We overcome this limitation by using Supervised noise reduction via Low Rank Matrix Approximation (SLRMA). The proposed framework proved successful on different types of high-dimensional datasets, such as microarray and image datasets.
However, this framework cannot handle unlabeled data; hence, we propose Local SVD to overcome this limitation.
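The notion of selection stability described above (similar feature subsets across runs on slightly perturbed data) can be illustrated with a small sketch. It assumes the average pairwise Jaccard index as the similarity measure, which is one common choice and not necessarily the measure used in the dissertation; the function names and the example subsets are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature subsets (1.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def selection_stability(subsets):
    """Average Jaccard similarity over all pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature indices selected on three perturbed copies of one dataset.
runs = [{0, 3, 7, 9}, {0, 3, 7, 8}, {0, 3, 5, 9}]
stability = selection_stability(runs)
```

A value near 1 would indicate that the selector returns nearly the same subset on every run, while a value near 0 would indicate that the selections barely overlap.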
Date Created
2013
Contributors
- Alelyani, Salem (Author)
- Liu, Huan (Thesis advisor)
- Xue, Guoliang (Committee member)
- Ye, Jieping (Committee member)
- Zhao, Zheng (Committee member)
- Arizona State University (Publisher)
Extent
xii, 124 p. : ill. (some col.)
Language
eng
Copyright Statement
In Copyright
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.I.17762
Statement of Responsibility
by Salem Alelyani
Description Source
Viewed on Nov. 6, 2013
Level of coding
full
Note
thesis
Partial requirement for: Ph.D., Arizona State University, 2013
bibliography
Includes bibliographical references (p. 117-124)
Field of study: Computer science
System Created
- 2013-07-12 06:14:45
System Modified
- 2021-08-30 01:42:41