| dc.description.abstract |
Data transformation (DT) plays a vital role in the data preprocessing phase of machine learning (ML) model training. Several DT methods are used in ML applications. However, not every DT method is suitable for every ML application. This issue has been ignored in the research community. In this thesis, we investigate and analyze the effectiveness, suitability, and applicability of DT methods in ML applications covering multiple domains along different dimensions. To address this issue, we propose a novel DT approach that not only improves ML prediction performance in various application domains but also preserves data privacy. In the process, this research spanned several ML applications.
First, we developed customer churn prediction models for the telecommunication industry (TCI) and investigated the impact of several DT methods. Our findings revealed that DT methods, particularly Weight-of-Evidence (WOE), significantly improved churn prediction accuracy. However, despite its effectiveness, we identified certain limitations of WOE. To address these limitations, we developed a modified version of WOE and introduced it as adaptive Weight-of-Evidence (aWOE). The proposed method was evaluated across multiple application domains, demonstrating improvements in prediction performance, data privacy preservation, and model interpretability. These findings were validated using three publicly available datasets from three different domains and seven classification algorithms.
Since aWOE is able to boost prediction performance in various domains, we employed it, along with other DT methods, to improve prediction performance in Loan Eligibility Prediction (LEP). Extensive experiments were conducted on seven publicly available datasets using eleven different classifiers. The experimental results indicate that the aWOE-based LEP models achieve improved prediction performance while preserving data privacy. Furthermore, SHAP analysis revealed that aWOE prioritizes features that are more closely aligned with practical loan eligibility criteria.
Following the success of the proposed aWOE method in the aforementioned prediction tasks, this thesis turned its attention to data privacy. We propose a privacy-preserving customer churn prediction (PPCCP) framework in the cloud environment for the telecommunications industry (TCI). The proposed approach combines Generative Adversarial Networks (GANs) with adaptive Weight-of-Evidence (aWOE). Synthetic data is generated by GANs, and aWOE is applied to the synthetic data before it is fed to the classification algorithms. Our experiments were carried out using eight different ML classifiers on three publicly accessible datasets. The experimental results, supported by statistical tests and comparisons with previous studies, demonstrate that the proposed GANs-aWOE framework enhances prediction performance while effectively preserving data privacy.
Next, we shift our focus to the healthcare sector. We propose a distributed patient similarity computation (DPSC) approach for clinical decision support, leveraging aWOE in conjunction with static and time series data. Dynamic Time Warping (DTW) is employed for time series similarity, while Spark-based distributed processing is utilized to meet real-time computational demands. SHAP analysis further reveals that, when using the aWOE method, patient medical records contribute more significantly to prediction performance than demographic attributes.
Overall, this thesis has developed and validated a generic framework for multiple prediction problems across several domains. The proposed methodologies, techniques, results, observations, and insightful discussions are believed to have advanced the knowledge base and the current state of the art. |
en_US |