Abstract:
In oncogenetics, the intricate relationship between gene expression and cancer development is still poorly understood. Identifying the genetic drivers of cancer is essential for developing effective treatments. Microarray technology holds immense potential for uncovering the epigenetic nature of cancer-causing genes. However, the high dimensionality of microarray gene expression data makes these genes difficult to identify. In this study, we address this challenge by proposing a novel framework that integrates metaheuristics, machine learning, and enrichment analysis to find the genes most relevant to different cancer types within the vast amount of expression data.
Our methodology uses data transformation and ranking, specifically the Yeo-Johnson transformation and Pearson ranking, as a preprocessing step. Subsequently, we combine eight metaheuristic algorithms and five machine-learning algorithms within a wrapper framework. A weighted ranking mechanism is used to identify the most significant genes, those that yield the highest classification accuracy. Finally, enrichment analysis is performed on these top genes to validate their association with the biology of the particular cancer.
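As an illustration of the preprocessing step only, the following minimal sketch applies a Yeo-Johnson transformation to each gene and then ranks genes by Pearson correlation. It rests on several assumptions not stated in the abstract: the function names (`yeo_johnson`, `pearson`, `rank_genes`) are hypothetical, the transformation parameter `lam` is fixed by hand rather than estimated by maximum likelihood as is usual, and "Pearson ranking" is taken to mean ordering genes by the absolute Pearson correlation between transformed expression and a numeric class label.

```python
import math
from typing import List, Sequence


def yeo_johnson(x: float, lam: float) -> float:
    """Yeo-Johnson power transform of one value for a fixed lambda."""
    if x >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-x)
    return -(((1.0 - x) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is (near) constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a <= 1e-12 or var_b <= 1e-12:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def rank_genes(
    expr: Sequence[Sequence[float]],  # rows = samples, columns = genes
    labels: Sequence[float],          # numeric class label per sample
    lam: float = 0.5,                 # assumed fixed; normally estimated per gene
    top_k: int = 10,
) -> List[int]:
    """Return indices of the top_k genes by |Pearson r| with the labels."""
    n_genes = len(expr[0])
    scores = []
    for g in range(n_genes):
        column = [yeo_johnson(row[g], lam) for row in expr]
        scores.append(abs(pearson(column, labels)))
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order[:top_k]
```

In such a pipeline, the reduced gene list returned by `rank_genes` would be the input to the wrapper stage, where each metaheuristic searches for gene subsets and each classifier scores them.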
We evaluated the effectiveness of our method on three microarray cancer gene expression datasets. Through a rigorous enrichment analysis, we validated the biological significance of the genes identified by our approach. Compared to state-of-the-art methods, our approach achieved comparable classification accuracy and was significantly better at identifying biologically relevant genes.