DSpace Repository

Diagnosis of heart disease using machine learning


dc.contributor.advisor Md. Rubaiyat Hossain Mondal, Dr.
dc.contributor.author Istiaq Habib Khan, Md.
dc.date.accessioned 2021-10-18T09:25:54Z
dc.date.available 2021-10-18T09:25:54Z
dc.date.issued 2020-10-10
dc.identifier.uri http://lib.buet.ac.bd:8080/xmlui/handle/123456789/5871
dc.description.abstract Early detection of heart disease can help prevent disease progression. Different risk factors are associated with heart disease prediction. This project focuses on multiple datasets in order to find the most valuable attributes and risk factors associated with heart disease. One dataset, containing 14 attributes (including the target attribute) and 303 instances, is collected from the UCI Machine Learning Repository. The second, containing 10 attributes and 462 instances, is collected from the Kaggle repository. The third contains 12 attributes and 70,000 instances, and is also available from the Kaggle repository. Seven different machine learning algorithms are applied to these three individual datasets to study the most influential attributes for heart disease prediction. One hybrid dataset is also generated using only the common attributes of two individual datasets. The scikit-learn library for the Python programming language is used for data analysis. A univariate feature selection algorithm is applied in order to find the most valuable attributes associated with heart disease. Heart disease is predicted using several machine learning algorithms: support vector machine (SVM), decision tree, k-nearest neighbors (kNN), logistic regression, naïve Bayes, random forest, and majority voting. The training and testing portions of each dataset are separated using holdout and cross-validation methods. The parameters of each algorithm are varied to find the configuration that gives the highest accuracy. To evaluate the performance of the different algorithms, the classification report and confusion matrix are also calculated.
It is shown here that majority voting, as a combination of logistic regression, SVM, and naïve Bayes, exhibits the best accuracy of 88.89% when applied to the first dataset. It is also shown that, for the hybrid dataset, the classification accuracy is lower than that of the individual datasets. Finally, the best result obtained from this project work is compared with the results of existing similar research approaches. en_US
dc.language.iso en en_US
dc.publisher Institute of Information and Communication Technology (IICT), BUET en_US
dc.subject Diagnosis-Heart diseases en_US
dc.title Diagnosis of heart disease using machine learning en_US
dc.type Thesis - Post Graduate Diploma en_US
dc.contributor.id 0417311006 en_US
dc.identifier.accessionNumber 117626
dc.contributor.callno 616.12 en_US
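
The abstract reports that majority voting over logistic regression, SVM, and naïve Bayes gave the best accuracy, and that results were evaluated with a confusion matrix. The thesis code is not part of this record, so the following is only a minimal sketch of hard majority voting and the two evaluation measures, written in plain Python. The label lists are made-up toy values, not data or results from the thesis, and the function names are hypothetical. In scikit-learn, which the abstract says was used, the corresponding tools would be `VotingClassifier(voting="hard")`, `accuracy_score`, and `confusion_matrix`.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label list by hard voting."""
    combined = []
    for votes in zip(*predictions_per_model):
        # Pick the label that most models predicted for this instance.
        # With three voters and binary labels a tie cannot occur.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 means disease."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Toy labels for illustration only (assumed, not taken from the thesis).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
logreg = [1, 0, 1, 0, 0, 1, 1, 0]  # hypothetical logistic regression output
svm = [1, 0, 0, 1, 0, 0, 1, 1]     # hypothetical SVM output
nb = [0, 0, 1, 1, 1, 0, 1, 0]      # hypothetical naive Bayes output

voted = majority_vote([logreg, svm, nb])

for name, pred in [("logreg", logreg), ("svm", svm), ("nb", nb), ("voting", voted)]:
    print(name, accuracy(y_true, pred), confusion_matrix(y_true, pred))
```

In this toy case each model errs on different instances, so the vote corrects every individual mistake. That is the mechanism by which an ensemble can outperform its members; real classifiers often make correlated errors, and the gain is then smaller.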

