Dissertations/Theses - Department of Computer Science and Engineering

http://lib.buet.ac.bd;localhosthttp://:8080/xmlui/handle/123456789/60
Postgraduate dissertations (theses) of Computer Science and Engineering (CSE)
2026-07-23T00:20:03Z

Effective use of data transformation methods for machine learning applications
http://lib.buet.ac.bd;localhosthttp://:8080/xmlui/handle/123456789/7311
Saifur Rahman, Dr. Mohammad; Sana, Joydeb Kumar; 1018054003; 006.31/SAN/2025
Data transformation (DT) plays a vital role in the data preprocessing phase of machine learning (ML) model training. Several DT methods are used in ML applications. However, not all DT methods are suitable for the same ML application, an issue that has been ignored in the research community. In this thesis, we investigate and analyze the effectiveness, suitability, and applicability of DT methods in ML applications covering multiple domains along different dimensions. Focusing on this issue, we propose a novel DT approach that not only improves ML prediction performance in various application domains but also preserves data privacy. In the process, this research spanned several ML applications. First, we developed customer churn prediction models for the telecommunication industry (TCI), where we investigated the impact of several DT methods. Our findings revealed that DT methods, particularly Weight-of-Evidence (WOE), significantly improved churn prediction accuracy. However, despite its effectiveness, we identified certain limitations of WOE. To address these limitations, we developed a modified version of WOE, called adaptive Weight-of-Evidence (aWOE). The proposed method was evaluated across multiple application domains, demonstrating improvements in prediction performance, data privacy preservation, and model interpretability.
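For readers unfamiliar with WOE, the following is a minimal sketch of the standard Weight-of-Evidence encoding for one categorical feature. It is not the aWOE method of the thesis, whose details the abstract does not give. The sign convention (log of non-event share over event share), the additive smoothing constant, and all names (`woe_table`, `woe_transform`, the toy `plans` data) are assumptions made for illustration.

```python
import math
from collections import Counter

def woe_table(values, labels, smoothing=0.5):
    """Map each category to ln(share of non-events / share of events).

    `smoothing` is an assumed additive constant that avoids log(0)
    when a category contains only events or only non-events.
    """
    events = Counter(v for v, y in zip(values, labels) if y == 1)
    non_events = Counter(v for v, y in zip(values, labels) if y == 0)
    categories = set(events) | set(non_events)
    total_events = sum(events.values()) + smoothing * len(categories)
    total_non_events = sum(non_events.values()) + smoothing * len(categories)
    table = {}
    for c in categories:
        p_event = (events[c] + smoothing) / total_events
        p_non_event = (non_events[c] + smoothing) / total_non_events
        table[c] = math.log(p_non_event / p_event)
    return table

def woe_transform(values, table, default=0.0):
    """Replace each category with its WOE value; unseen categories get `default`."""
    return [table.get(v, default) for v in values]

# Toy churn data (hypothetical): 1 = churned, 0 = stayed.
plans = ["prepaid", "prepaid", "prepaid", "postpaid", "postpaid", "postpaid"]
churned = [1, 1, 0, 0, 0, 1]

table = woe_table(plans, churned)
encoded = woe_transform(plans, table)
print(table)
print(encoded)
```

Because the classifier sees only the encoded numbers, the original category labels are not passed on, which is one intuition behind the privacy claim, though the abstract does not spell out the mechanism.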
These findings were validated using three publicly available datasets from three different domains and seven classification algorithms. Since aWOE is able to boost prediction performance in various domains, we employed it to improve the prediction performance in Loan Eligibility Prediction (LEP), along with other DT methods. Extensive experiments were conducted on seven publicly available datasets using eleven different classifiers. The experimental results indicate that the aWOE-based LEP models achieve improved prediction performance while preserving data privacy. Furthermore, SHAP analysis revealed that aWOE prioritizes features that are more closely aligned with practical loan eligibility criteria. Following the impressive success of the proposed aWOE method in the aforementioned prediction tasks, this thesis turned its attention to data privacy. We propose a privacy-preserving customer churn prediction (PPCCP) framework in the cloud environment for the telecommunications industry (TCI). The proposed approach is a combination of Generative Adversarial Networks (GANs) and adaptive Weight-of-Evidence (aWOE). Synthetic data is generated from GANs, and aWOE is applied on the synthetic data before feeding the data to the classification algorithms. Our experiments were carried out using eight different ML classifiers on three publicly accessible datasets. The experimental results, supported by statistical tests and comparisons with previous studies, demonstrate that the proposed GANs-aWOE framework enhances prediction performance while effectively preserving data privacy. Next, we shift our focus to the healthcare sector. We propose a distributed patient similarity computation (DPSC) for clinical decision support, leveraging aWOE in conjunction with static and time series data. Dynamic Time Warping (DTW) is employed for time series similarity, while Spark-based distributed processing is utilized to meet real-time computational demands.
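The DTW step mentioned above can be illustrated with the classic dynamic-programming recurrence. This is a minimal single-machine sketch, assuming univariate series and absolute difference as the local cost. The thesis's Spark-based distribution and its combination with aWOE are not shown, and the function name and toy heart-rate series are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the
    first i points of `a` with the first j points of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])  # local cost (assumed: absolute difference)
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] aligned with an already-used point of b
                cost[i][j - 1],      # b[j-1] aligned with an already-used point of a
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]

# Hypothetical heart-rate readings for three patients.
heart_rate_a = [70, 72, 75, 80, 78]
heart_rate_b = [70, 70, 72, 75, 80, 78]   # same shape as a, one repeated reading
heart_rate_c = [90, 95, 99, 102, 100]

print(dtw_distance(heart_rate_a, heart_rate_b))
print(dtw_distance(heart_rate_a, heart_rate_c))
```

Unlike a point-by-point Euclidean distance, DTW tolerates series of different lengths and local shifts in time, which is why the first pair above aligns perfectly despite the extra reading.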
SHAP analysis further reveals that, when using the aWOE method, patient medical records contribute more significantly to prediction performance than demographic attributes. Overall, this thesis has developed and validated a generic framework for multiple prediction problems across several domains. The proposed methodologies, techniques, results, observations, and insightful discussions are believed to have advanced the knowledge base and the current state of the art.
2025-05-13T00:00:00Z

Quantifying pathological progression from single-cell transcriptomics data
http://lib.buet.ac.bd;localhosthttp://:8080/xmlui/handle/123456789/7309
Dr. Mohammad Saifur Rahman; Samin Rahman Khan; 0422052003; 006.31/SAM/2025
The surge in single-cell datasets and reference atlases has enabled the comparison of cell states across conditions, yet a gap persists in quantifying pathological shifts from healthy cell states. To address this gap, we introduce single-cell Pathological Shift Scoring (scPSS), which provides a statistical measure of how far a “query” cell from a diseased sample has shifted away from a reference group of healthy cells. In scPSS, the distance from a cell to its k-th nearest reference cell is taken as its pathological shift score. Distances between cells are Euclidean distances in the space of the top n principal components of the gene expression data. The distribution of shift scores of the reference cells forms a null model. This allows a p-value to be assigned to each query cell’s shift score, quantifying the statistical significance of its shift from the reference cell group. This makes our method both simple and statistically rigorous. The key strength of scPSS is its applicability in a “semi-supervised” setting, where only healthy reference cells are known and diseased-labeled data are not provided for model training.
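The scoring idea described above can be sketched in a few lines. This is an illustration under stated assumptions, not the scPSS implementation (which is in the authors' repository): cells are assumed to be already projected into principal-component space, each reference cell is excluded from its own neighbour search when building the null distribution, and the p-value is an empirical one with a +1 correction. All function names and the toy coordinates are hypothetical.

```python
import math

def kth_nearest_distance(point, reference, k, skip_index=None):
    """Euclidean distance from `point` to its k-th nearest reference cell.

    `skip_index` excludes one reference cell, so that a reference cell
    is not counted as its own neighbour (an assumption of this sketch).
    """
    distances = sorted(
        math.dist(point, ref)
        for i, ref in enumerate(reference)
        if i != skip_index
    )
    return distances[k - 1]

def shift_scores_and_p_values(reference, queries, k=2):
    """Return (shift score, empirical p-value) for each query cell."""
    # Null model: shift scores of the reference cells themselves.
    null_scores = [
        kth_nearest_distance(ref, reference, k, skip_index=i)
        for i, ref in enumerate(reference)
    ]
    results = []
    for query in queries:
        score = kth_nearest_distance(query, reference, k)
        # Fraction of null scores at least as large as the query score.
        at_least_as_large = sum(1 for s in null_scores if s >= score)
        p_value = (at_least_as_large + 1) / (len(null_scores) + 1)
        results.append((score, p_value))
    return results

# Toy 2-D "principal component" coordinates (hypothetical).
reference = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
queries = [(0.5, 0.4), (5.0, 5.0)]   # one healthy-looking cell, one shifted cell

results = shift_scores_and_p_values(reference, queries, k=2)
for score, p_value in results:
    print(round(score, 3), round(p_value, 3))
```

A small p-value means few reference cells sit as far from their neighbours as the query cell does, so the query is unlikely to belong to the healthy group. No diseased labels are needed at any point, which matches the semi-supervised setting.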
As existing methods do not support cell-level pathological progression measurement in this setting, we adapt state-of-the-art supervised pathological prediction and contrastive models for benchmarking. Comparative evaluations against these adapted models demonstrate our method’s superiority in accuracy and efficiency. We also show that aggregating the cell-level pathological scores from scPSS can be used to predict health conditions at the individual level. The code for scPSS is available at https://github.com/SaminRK/scPSS.
2025-06-21T00:00:00Z

Multi-agent code generation approach for competitive problem solving
http://lib.buet.ac.bd;localhosthttp://:8080/xmlui/handle/123456789/7185
Ali, Dr. Mohammed Eunus; Ashraful Islam, Md.; 0422052007; 005.453/ASH/2025
Code synthesis, which requires a deep understanding of complex natural language (NL) problem descriptions, generation of code instructions for complex algorithms and data structures, and the successful execution of comprehensive unit tests, presents a significant challenge. Thus, while large language models (LLMs) demonstrate impressive proficiency in natural language processing (NLP), their performance in code generation tasks remains limited. In this thesis, we introduce a new approach to code generation tasks that leverages multi-agent prompting to replicate the full cycle of program synthesis as observed in human developers. Our framework, MapCoder, consists of four LLM agents specifically designed to emulate the stages of this cycle: recalling relevant examples, planning, code generation, and debugging.
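The four stages named above can be pictured as a simple chain of prompts. The sketch below is only a schematic of such a recall, plan, code, debug loop and is not the MapCoder implementation, whose prompts and control flow are in the authors' repository. The `solve` function, the prompt wording, the `fake_llm` stand-in for a real model, and the `passes_tests` checker are all hypothetical.

```python
from typing import Callable

def solve(problem: str,
          llm: Callable[[str], str],
          passes_tests: Callable[[str], bool],
          max_debug_rounds: int = 3) -> str:
    """Chain four prompting stages: recall, plan, code, debug."""
    examples = llm(f"Recall similar problems and their solutions for:\n{problem}")
    plan = llm(f"Problem:\n{problem}\nExamples:\n{examples}\nWrite a step-by-step plan.")
    code = llm(f"Problem:\n{problem}\nPlan:\n{plan}\nWrite the code.")
    for _ in range(max_debug_rounds):
        if passes_tests(code):
            break
        code = llm(
            f"Problem:\n{problem}\nPlan:\n{plan}\n"
            f"This code fails its tests:\n{code}\nFix it."
        )
    return code

def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real LLM call, used only to exercise the loop."""
    if "Fix it." in prompt:
        return "def add(a, b):\n    return a + b"
    if "Write the code." in prompt:
        return "def add(a, b):\n    return a - b"  # deliberately wrong first draft
    return "(canned text)"

def passes_tests(code: str) -> bool:
    """Run the candidate code and check one sample test case."""
    namespace = {}
    exec(code, namespace)
    return namespace["add"](2, 3) == 5

final_code = solve("Add two integers.", fake_llm, passes_tests)
print(final_code)
```

In this toy run the first draft fails the sample test, so the debugging stage is invoked once and returns the corrected function.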
In thorough experiments with multiple LLMs, including ablations and analyses, across eight challenging competitive problem-solving and program synthesis benchmarks, MapCoder showcases remarkable code generation capabilities, achieving new state-of-the-art pass@1 results (HumanEval 93.9%, MBPP 83.1%, APPS 22.0%, CodeContests 28.5%, and xCodeEval 45.3%). Moreover, our method consistently delivers superior performance across various programming languages and varying problem difficulties. We open-source our framework at https://github.com/Md-Ashraful-Pramanik/MapCoder.
2025-01-19T00:00:00Z

Stroke prediction using ensemble learning with clinical and image features
http://lib.buet.ac.bd;localhosthttp://:8080/xmlui/handle/123456789/7184
Shahriyar, Dr. Rifat; Jannatul Ferdous, Most.; 0417052080; 006.31/JAN/2024
A stroke is a life-threatening brain attack that disrupts blood flow to the brain. As a result, brain cells start to die due to a lack of oxygen and nutrients. After a stroke, every minute matters: approximately 1.9 million brain cells die per minute. Early diagnosis of stroke can save the life of a stroke patient or reduce permanent damage to the brain. For early stroke detection, an initial investigation uses the patient’s clinical information. Then, doctors recommend computed tomography (CT) images of the brain. If doctors delay diagnosis or make erroneous diagnoses, this can be life-threatening. For that reason, an automatic diagnosis of stroke, first from clinical data and then from a brain CT scan image, will be beneficial for stroke patients. For the clinical data, we have applied different machine learning models, such as Logistic Regression, Decision Tree, K-Nearest Neighbour, AdaBoost, XGBoost, and others.
In the case of clinical data, three balancing techniques (Random Oversampling, SMOTE, and ADASYN) are employed, and the performance of each individual model is recorded. For the brain CT image data, we have modified three pre-trained CNN models, Inceptionv3, MobileNetv2, and Xception, by updating the top layer of those models using the transfer learning technique. A new ensemble convolutional neural network model named ENSNET is proposed for automatic brain stroke prediction from brain CT scan images. ENSNET is the average of two improved CNN models, Inceptionv3 and Xception. We have used accuracy, precision, recall, F1-score, confusion matrix, accuracy vs. epoch, loss vs. epoch, and ROC curve as performance evaluation metrics. The accuracy of the modified Inceptionv3 is 97.48%, that of the modified MobileNetv2 is 83.29%, and that of the modified Xception is 96.11%. However, when it comes to diagnosing stroke from brain CT scans, the proposed ensemble model ENSNET outperforms the other models, offering 98.86% accuracy, 97.71% precision, 98.46% recall, 98.08% F1-score, and 98.74% AUC. The proposed ensemble model (ENSNET) is validated using two additional datasets. The proposed ENSNET model will therefore be beneficial for the health sector in detecting stroke from computed tomography images of the brain more successfully than other models.
2024-11-24T00:00:00Z
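As an illustration of the averaging ensemble and the reported metrics, the sketch below averages the predicted stroke probabilities of two classifiers and computes precision, recall, and F1-score from the thresholded result. It assumes that "average" means averaging output probabilities and that 0.5 is the decision threshold; neither is stated in the abstract. The probabilities are invented toy values, not outputs of the thesis's models, and all names are hypothetical.

```python
def average_ensemble(prob_a, prob_b):
    """Average two models' predicted probabilities, sample by sample."""
    return [(a + b) / 2 for a, b in zip(prob_a, prob_b)]

def to_labels(probabilities, threshold=0.5):
    """Convert probabilities to 0/1 predictions (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probabilities]

def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1-score for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy data: 1 = stroke, 0 = no stroke; probabilities are made up.
y_true = [1, 0, 1, 1, 0, 0]
model_a = [0.9, 0.6, 0.4, 0.8, 0.2, 0.1]   # wrong on samples 2 and 3
model_b = [0.7, 0.2, 0.7, 0.9, 0.4, 0.3]

single = precision_recall_f1(y_true, to_labels(model_a))
ensemble = precision_recall_f1(
    y_true, to_labels(average_ensemble(model_a, model_b))
)
print(single)
print(ensemble)
```

In this toy case the second model's confident outputs outweigh the first model's two mistakes, so the averaged prediction scores higher than the first model alone.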