Performance Evaluation of Machine Learning Models for Breast Cancer Prediction
Keywords:
Cross-Dataset Evaluation, Machine Learning Models, Breast Cancer Detection, Breast Cancer Prediction, Performance Comparison, Predictive Modeling
Abstract
Breast cancer is one of the leading causes of cancer-related deaths worldwide. A major reason is that the disease is often detected only at a late stage, well after its onset, when it is difficult to treat. Breast cancer can also recur after treatment. Early prediction of both its occurrence and its recurrence is therefore the most effective way to decrease the death rate, and this can be achieved with machine learning based predictive models. This study aims to predict breast cancer outcomes using six machine learning classifiers: Gaussian Naïve Bayes (GNB), Logistic Regression (LR), K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Decision Trees (DT), and Random Forests (RF). The generalization ability and robustness of these classifiers are evaluated on the Breast Cancer Wisconsin (Diagnostic) datasets from the UCI repository. We analyze cross-dataset performance in terms of accuracy, precision, F1 score, and ROC to identify the most reliable models for breast cancer prediction and to highlight potential dataset-specific biases. The results indicate significant variations in algorithm performance across datasets. This comparative study not only provides insight into the relative strengths and weaknesses of each machine learning approach but also emphasizes the importance of evaluating predictive models across datasets to ensure their effectiveness in practical scenarios. Our findings contribute to the development of more trustworthy and generalizable breast cancer prediction tools, supporting earlier detection and better treatment strategies.
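The comparison described above can be sketched in a few lines. This is a minimal illustration, not the study's actual pipeline. It assumes scikit-learn and its bundled copy of the Breast Cancer Wisconsin (Diagnostic) data (`load_breast_cancer`), default hyperparameters, standard scaling for the distance- and margin-based models, and 5-fold stratified cross-validation, with ROC summarized as the area under the curve. None of these choices is stated in the abstract, and the helper names `build_models` and `evaluate` are hypothetical. It covers a single dataset only, so it does not reproduce the cross-dataset part of the evaluation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    """Return the six classifiers named in the abstract (default settings, assumed)."""
    return {
        "GNB": GaussianNB(),
        # LR, KNN and SVM are scale-sensitive, so they are wrapped with a scaler.
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "SVM": make_pipeline(StandardScaler(), SVC(random_state=seed)),
        "DT": DecisionTreeClassifier(random_state=seed),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
    }


def evaluate(seed=0, folds=5):
    """Cross-validate every model and return mean accuracy, precision, F1, ROC AUC."""
    # In scikit-learn's copy, label 1 is benign and label 0 is malignant, so
    # precision and F1 below treat "benign" as the positive class.
    X, y = load_breast_cancer(return_X_y=True)
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    metrics = ["accuracy", "precision", "f1", "roc_auc"]
    results = {}
    for name, model in build_models(seed).items():
        scores = cross_validate(model, X, y, cv=cv, scoring=metrics)
        results[name] = {m: scores[f"test_{m}"].mean() for m in metrics}
    return results


if __name__ == "__main__":
    for name, row in evaluate().items():
        print(name, {m: round(v, 3) for m, v in row.items()})
```

Fitting the scaler inside each pipeline keeps the preprocessing within the training folds, which avoids leaking test-fold statistics into the reported scores. Extending the sketch to a cross-dataset comparison would require training on one dataset and scoring on another with a matching feature set.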