Effective use of data transformation Methods for machine learning applications


dc.contributor.advisor	Saifur Rahman, Dr. Mohammad
dc.contributor.author	Sana, Joydeb Kumar
dc.date.accessioned	2026-04-08T06:40:51Z
dc.date.available	2026-04-08T06:40:51Z
dc.date.issued	2025-05-13
dc.identifier.uri	http://lib.buet.ac.bd:8080/xmlui/handle/123456789/7311
dc.description.abstract	Data transformation (DT) plays a vital role in the data preprocessing phase of machine learning (ML) model training. Several DT methods are used in ML applications. However, not all DT methods are suitable for the same ML application. This issue has been ignored in the research community. In this thesis, we have investigated and analyzed the effectiveness, suitability, and applicability of DT methods in ML applications covering multiple domains along different dimensions. Focusing on this issue, we have come up with a novel DT approach which not only improves ML prediction performance in various application domains but also preserves data privacy. In the process, this research spanned several ML applications. First, we developed customer churn prediction models for the telecommunication industry (TCI), where we investigated the impact of several DT methods. Our findings revealed that DT methods, particularly Weight-of-Evidence (WOE), significantly improved churn prediction accuracy. However, despite its effectiveness, we identified certain limitations of WOE. To address these issues, we developed a modified version of WOE and introduced it as adaptive Weight-of-Evidence (aWOE). The proposed method was evaluated across multiple application domains, demonstrating improvements in prediction performance, data privacy preservation, and model interpretability. These findings were validated using three publicly available datasets from three different domains and seven classification algorithms. Since aWOE is able to boost prediction performance in various domains, we employed it, along with other DT methods, to improve prediction performance in Loan Eligibility Prediction (LEP). Extensive experiments were conducted on seven publicly available datasets using eleven different classifiers. The experimental results indicate that the aWOE-based LEP models achieve improved prediction performance while preserving data privacy. 
Furthermore, SHAP analysis revealed that aWOE prioritizes features that are more closely aligned with practical loan eligibility criteria. Following the success of the proposed aWOE method in the aforementioned prediction tasks, this thesis turned its attention to data privacy. We propose a privacy-preserving customer churn prediction (PPCCP) framework in the cloud environment for the telecommunications industry (TCI). The proposed approach is a combination of Generative Adversarial Networks (GANs) and adaptive Weight-of-Evidence (aWOE). Synthetic data is generated by GANs, and aWOE is applied to the synthetic data before it is fed to the classification algorithms. Our experiments were carried out using eight different ML classifiers on three publicly accessible datasets. The experimental results, supported by statistical tests and comparisons with previous studies, demonstrate that the proposed GANs-aWOE framework enhances prediction performance while effectively preserving data privacy. Next, we shift our focus to the healthcare sector. We propose a distributed patient similarity computation (DPSC) approach for clinical decision support, leveraging aWOE in conjunction with static and time series data. Dynamic Time Warping (DTW) is employed for time series similarity, while Spark-based distributed processing is utilized to meet real-time computational demands. SHAP analysis further reveals that, when using the aWOE method, patient medical records contribute more significantly to prediction performance than demographic attributes. Overall, this thesis has developed and validated a generic framework for multiple prediction problems across several domains. The proposed methodologies, techniques, results, observations, and insightful discussions are believed to have advanced the knowledge base and the current state of the art.	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Computer Science and Engineering (CSE), BUET	en_US
dc.subject	Machine learning	en_US
dc.title	Effective use of data transformation Methods for machine learning applications	en_US
dc.type	Thesis-PhD	en_US
dc.contributor.id	1018054003	en_US
dc.identifier.accessionNumber	120741
dc.contributor.callno	006.31/SAN/2025	en_US
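
The abstract above names Weight-of-Evidence (WOE) as the transformation that the thesis modifies into aWOE. As a rough illustration only, the sketch below shows one common convention for classic WOE on a single categorical feature: each category is replaced by ln(share of non-events / share of events). It does not implement aWOE, whose modifications are not described in this record. The function names `woe_table` and `woe_encode` and the `smoothing` parameter are hypothetical choices made for this example, not taken from the thesis.

```python
import math
from collections import Counter


def woe_table(categories, labels, smoothing=0.5):
    """Return {category: WOE} for one categorical feature.

    labels are 0 (non-event) or 1 (event, e.g. churn).
    WOE here is ln(share of non-events / share of events); some texts
    use the inverse ratio. `smoothing` is added to every per-category
    count to avoid log(0) and is an illustrative assumption.
    """
    events = Counter()
    non_events = Counter()
    for cat, y in zip(categories, labels):
        if y == 1:
            events[cat] += 1
        else:
            non_events[cat] += 1

    cats = set(events) | set(non_events)
    # Totals include the smoothing mass so the shares still sum to 1.
    total_events = sum(events.values()) + smoothing * len(cats)
    total_non_events = sum(non_events.values()) + smoothing * len(cats)

    table = {}
    for cat in cats:
        p_event = (events[cat] + smoothing) / total_events
        p_non_event = (non_events[cat] + smoothing) / total_non_events
        table[cat] = math.log(p_non_event / p_event)
    return table


def woe_encode(categories, table):
    """Replace each raw category with its WOE value."""
    return [table[c] for c in categories]


if __name__ == "__main__":
    plans = ["basic", "basic", "premium", "premium", "basic"]
    churned = [1, 1, 0, 0, 0]
    table = woe_table(plans, churned)
    print(table)
    print(woe_encode(plans, table))
```

With this convention, a positive value marks a category dominated by non-events and a negative value marks one dominated by events; the encoded column is then passed to a classifier in place of the raw categories.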

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following Collection(s)

Dissertations/Theses - Department of Computer Science and Engineering
Post graduate dissertations (Theses) of Computer Science Engineering (CSE)
