DSpace Repository

Effective use of data transformation Methods for machine learning applications

dc.contributor.advisor Saifur Rahman, Dr. Mohammad
dc.contributor.author Sana, Joydeb Kumar
dc.date.accessioned 2026-04-08T06:40:51Z
dc.date.available 2026-04-08T06:40:51Z
dc.date.issued 2025-05-13
dc.identifier.uri http://lib.buet.ac.bd:8080/xmlui/handle/123456789/7311
dc.description.abstract Data transformation (DT) plays a vital role in the data preprocessing phase of machine learning (ML) model training. Several DT methods are used in ML applications. However, not all DT methods are suitable for the same ML application. This issue has been ignored in the research community. In this thesis, we have investigated and analyzed the effectiveness, suitability, and applicability of DT methods in ML applications covering multiple domains along different dimensions. Focusing on this issue, we have come up with a novel DT approach which not only improves ML prediction performance in various application domains but also preserves data privacy. In the process, this research spanned several ML applications. First, we developed customer churn prediction models in the telecommunication industry (TCI), where we investigated the impact of several DT methods. Our findings revealed that DT methods, particularly Weight-of-Evidence (WOE), significantly improved churn prediction accuracy. However, despite its effectiveness, we identified certain limitations of WOE. To address these issues, we developed a modified version of WOE and introduced it as adaptive Weight-of-Evidence (aWOE). The proposed method was evaluated across multiple application domains, demonstrating improvements in prediction performance, data privacy preservation, and model interpretability. These findings were validated using three publicly available datasets from three different domains and seven classification algorithms. Since aWOE is able to boost prediction performance in various domains, we employed it, along with other DT methods, to improve prediction performance in Loan Eligibility Prediction (LEP). Extensive experiments were conducted on seven publicly available datasets using eleven different classifiers. The experimental results indicate that the aWOE-based LEP models achieve improved prediction performance while preserving data privacy. 
Furthermore, SHAP analysis revealed that aWOE prioritizes features that are more closely aligned with practical loan eligibility criteria. Following the success of the proposed aWOE method in the aforementioned prediction tasks, this thesis turned its attention to data privacy. We propose a privacy-preserving customer churn prediction (PPCCP) framework in the cloud environment for the telecommunications industry (TCI). The proposed approach is a combination of Generative Adversarial Networks (GANs) and adaptive Weight-of-Evidence (aWOE). Synthetic data is generated by GANs, and aWOE is applied to the synthetic data before it is fed to the classification algorithms. Our experiments were carried out using eight different ML classifiers on three publicly accessible datasets. The experimental results, supported by statistical tests and comparisons with previous studies, demonstrate that the proposed GANs-aWOE framework enhances prediction performance while effectively preserving data privacy. Next, we shift our focus to the healthcare sector. We propose a distributed patient similarity computation (DPSC) for clinical decision support, leveraging aWOE in conjunction with static and time series data. Dynamic Time Warping (DTW) is employed for time series similarity, while Spark-based distributed processing is utilized to meet real-time computational demands. SHAP analysis further reveals that, when using the aWOE method, patient medical records contribute more significantly to prediction performance than demographic attributes. Overall, this thesis has developed and validated a generic framework for multiple prediction problems across several domains. The proposed methodologies, techniques, results, observations, and insightful discussions are believed to have advanced the knowledge base and the current state-of-the-art. en_US
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), BUET en_US
dc.subject Machine learning en_US
dc.title Effective use of data transformation Methods for machine learning applications en_US
dc.type Thesis-PhD en_US
dc.contributor.id 1018054003 en_US
dc.identifier.accessionNumber 120741
dc.contributor.callno 006.31/SAN/2025 en_US


Files in this item

There are no files associated with this item.
