Predicting Colorectal Cancer Using Machine Learning and Worldwide Dietary Data
DOI:
https://doi.org/10.66108/mna.v4i1.64
Keywords:
Colorectal Cancer, Machine Learning, Cancer Screening, Early Cancer Detection
Abstract
Colorectal cancer (CRC) is a serious disease and the third most commonly reported type of cancer worldwide. Proactive screening of patients for CRC has found that it is prominently diagnosed in younger adults. However, most recently published papers have focused on applying statistical machine learning algorithms to CRC diagnosis in older adults using small-scale datasets, which do not deliver acceptable performance in practice for large populations. It is therefore crucial to assess machine learning algorithms on large datasets from varied regions and sociodemographic groups, including both younger and older persons. The Centers for Disease Control and Prevention acquired a dataset of 109,343 individuals from colorectal cancer research in South Korea, India, Canada, Mexico, Italy, Sweden, and the US. This worldwide dietary database was supplemented with publicly available information from several sources. In this study, we evaluated the performance of nine supervised and unsupervised machine learning methods on the aggregated dataset. Both types of tested models (i.e., supervised and unsupervised) accurately predicted CRC and non-CRC traits. Among the nine tested models, the artificial neural network (ANN) achieved the best performance, with misclassification rates of 1% and 3% for CRC and non-CRC, respectively. The ANN model performed strongly across diverse datasets, which makes it a suitable choice for CRC diagnosis in both young and elderly persons. Using optimal algorithms and ensuring high screening compliance can significantly enhance early cancer detection and increase the success rate of prompt treatment.
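The abstract does not define how the per-class misclassification rates were computed. As a minimal sketch, assuming the rate for a class is the fraction of that class's true samples assigned to any other class, the calculation could look as follows in Python. The function name `per_class_misclassification` and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def per_class_misclassification(y_true, y_pred):
    """Map each true label to the fraction of its samples predicted as another label."""
    totals = Counter(y_true)
    # Count errors by the true label of each wrongly predicted sample.
    errors = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return {label: errors[label] / totals[label] for label in totals}


# Hypothetical toy labels for illustration only, not data from the study.
y_true = ["CRC"] * 4 + ["non-CRC"] * 4
y_pred = ["CRC", "CRC", "CRC", "non-CRC",
          "non-CRC", "non-CRC", "non-CRC", "non-CRC"]

rates = per_class_misclassification(y_true, y_pred)
print(rates)
```

Reporting the rate per class rather than a single overall error keeps a missed CRC case from being masked by the typically larger non-CRC group.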
License
© This work is published by Machines and Algorithms and licensed under the terms of Creative Commons Attribution 4.0 International License (CC BY 4.0).
