Abstract:
Breast carcinoma is one of the most feared and most frequently occurring cancers among women. It affects roughly ten percent of women worldwide at some point in their lives. Although treatment for this cancer is available, it is not effective enough unless the disease is identified at an early stage. Early detection of disease has become a crucial problem in medical research in recent times because of rapid population growth. With this growth, the risk of death from breast cancer is rising exponentially. Breast cancer is the second most severe of all known cancers. An automatic disease detection system aids medical staff in diagnosis, offers a reliable, effective, and rapid response, and decreases the risk of death. Breast cancer is generally identified with contemporary medical tests such as roentgenogram, breast ultrasound, and biopsy. As an alternative, researchers are exploring machine learning techniques for classifying tumours, e.g., as benign or malignant. Classification and data processing strategies can be an effective mechanism for predicting cancer; in the medical field in particular, these methods have been used for prediction and decision making. In this project, we analyse six classification models (Decision Tree, K Nearest Neighbours, Random Forest, Logistic Regression, Extra Trees, and Support Vector Machine) on three different datasets from the UCI repository. The performance of each algorithm is measured and compared in terms of accuracy, precision, sensitivity, specificity, and false positive rate. The techniques are implemented in Python and executed in Spyder, the Scientific Python Development Environment. Experimental results show that Random Forest obtained the best accuracy, recall, cross-validation (CV) score, and F1 score among the six classification techniques on all three datasets.
Compared with the alternative schemes applied to the three datasets, Random Forest outperformed the other five machine learning techniques in predicting breast cancer, with a best accuracy of 99.57%, a precision of 96.3%, and a recall of 100%.
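
The comparison described above can be illustrated with a minimal sketch. It is not the authors' configuration. It assumes scikit-learn, uses the Wisconsin Diagnostic dataset bundled with scikit-learn (a copy of a UCI dataset) as a stand-in for one of the three datasets, and relies on default hyperparameters, a 75/25 stratified split, and feature scaling for the distance-based and linear models. The names `binary_metrics` and `compare_models` are hypothetical, and no particular scores are implied.

```python
"""Sketch: compare six scikit-learn classifiers on one breast cancer dataset."""
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def binary_metrics(y_true, y_pred):
    """Return the five metrics named in the abstract, with malignant = 1."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": float((tp + tn) / (tp + tn + fp + fn)),
        "precision": float(tp / (tp + fp)) if (tp + fp) else 0.0,
        "sensitivity": float(tp / (tp + fn)) if (tp + fn) else 0.0,  # recall
        "specificity": float(tn / (tn + fp)) if (tn + fp) else 0.0,
        "false_positive_rate": float(fp / (fp + tn)) if (fp + tn) else 0.0,
    }


def compare_models(test_size=0.25, seed=0):
    """Fit the six classifiers on one split and score each on the held-out part."""
    data = load_breast_cancer()
    # scikit-learn encodes malignant as 0; flip so malignant is the positive class.
    y = (data.target == 0).astype(int)
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, y, test_size=test_size, stratify=y, random_state=seed
    )
    # Assumed settings: library defaults, with scaling where the model needs it.
    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "K Nearest Neighbours": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Extra Trees": ExtraTreesClassifier(n_estimators=100, random_state=seed),
        "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = binary_metrics(y_test, model.predict(X_test))
    return results


if __name__ == "__main__":
    for name, metrics in compare_models().items():
        line = "  ".join(f"{key}={value:.3f}" for key, value in metrics.items())
        print(f"{name:24s} {line}")
```

A single split is used here for brevity; a cross-validation score of the kind reported above could be obtained with `sklearn.model_selection.cross_val_score` on the same estimators.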