Diagnosis of heart disease using machine learning


dc.contributor.advisor	Md. Rubaiyat Hossain Mondal, Dr.
dc.contributor.author	Istiaq Habib Khan, Md.
dc.date.accessioned	2021-10-18T09:25:54Z
dc.date.available	2021-10-18T09:25:54Z
dc.date.issued	2020-10-10
dc.identifier.uri	http://lib.buet.ac.bd:8080/xmlui/handle/123456789/5871
dc.description.abstract	Early detection of heart disease can help prevent the disease from progressing. Several risk factors are associated with heart disease prediction. This project uses multiple datasets to find the most valuable attributes and risk factors associated with heart disease. The first dataset, containing 14 attributes (including the target attribute) and 303 instances, is collected from the UCI machine learning repository. The second, containing 10 attributes and 462 instances, is collected from the Kaggle repository. The third contains 12 attributes and 70,000 instances and is also available from the Kaggle repository. Seven different machine learning algorithms are applied to these three individual datasets to study the most influential attributes for heart disease prediction. One hybrid dataset is also generated using only the common attributes of two individual datasets. The Scikit-learn library of the Python programming language is used for data analysis. A univariate feature selection algorithm is applied to find the most valuable attributes associated with heart disease. Heart disease is predicted using several machine learning algorithms: support vector machine (SVM), decision tree, k-nearest neighbors (kNN), logistic regression, naïve Bayes, random forest, and majority voting. The training and testing portions of each dataset are separated using holdout and cross-validation methods. The parameters of each algorithm are varied to find the configuration that gives the highest accuracy. To evaluate the performance of the algorithms, classification reports and confusion matrices are also computed.
It is shown that majority voting as a combination of logistic regression, SVM, and naïve Bayes exhibits the best accuracy of 88.89% when applied to the first dataset. It is also shown that for the hybrid dataset, the classification accuracy is lower than that of the individual datasets. Finally, the best result obtained from this project is compared with the results of existing similar research approaches.	en_US
dc.language.iso	en	en_US
dc.publisher	Institute of Information and Communication Technology (IICT), BUET	en_US
dc.subject	Diagnosis-Heart diseases	en_US
dc.title	Diagnosis of heart disease using machine learning	en_US
dc.type	Thesis - Post Graduate Diploma	en_US
dc.contributor.id	0417311006	en_US
dc.identifier.accessionNumber	117626
dc.contributor.callno	616.12	en_US
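
The abstract names majority voting over logistic regression, SVM, and naïve Bayes as the best-performing approach, evaluated with accuracy and a confusion matrix. The following is a minimal sketch of hard majority voting in plain Python, not the thesis's Scikit-learn implementation. The function names and the label lists are hypothetical toy values chosen only to show the mechanics; they are not data or results from the thesis.

```python
from collections import Counter

def majority_vote(*model_predictions):
    """Hard majority voting: for each sample, keep the most frequent label.

    With three voters and binary labels a tie cannot occur.
    """
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*model_predictions)]

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def confusion_matrix(y_true, y_pred):
    """Binary confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

# Hypothetical toy labels (1 = disease, 0 = no disease); NOT thesis data.
y_true   = [1, 0, 1, 1, 0, 0, 1, 0]
pred_lr  = [1, 0, 1, 0, 0, 1, 1, 1]  # stand-in for logistic regression output
pred_svm = [1, 0, 0, 1, 0, 0, 1, 1]  # stand-in for SVM output
pred_nb  = [1, 1, 1, 1, 0, 0, 0, 0]  # stand-in for naive Bayes output

y_vote = majority_vote(pred_lr, pred_svm, pred_nb)

print("voted labels:", y_vote)
print("accuracy:", accuracy(y_true, y_vote))
print("confusion matrix [[TN, FP], [FN, TP]]:", confusion_matrix(y_true, y_vote))
```

In this toy case the vote corrects samples where only one model is wrong, but it still errs where two of the three models agree on the wrong label.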