Abstract
Oncologists today face large volumes of heterogeneous diagnostic data. Errors in determining the nature and extent of tumor spread inevitably reduce the effectiveness of treatment and add unnecessary costs. To reduce the burden on clinicians, various computer-aided solutions based on machine learning algorithms are being developed. We evaluated the effectiveness of thirteen machine learning algorithms in classifying pathological thoracic tissue samples by gene expression levels. For this preliminary study we used an open data set on the molecular genetic composition of lung adenocarcinoma and pleural mesothelioma. The effectiveness of the algorithms was evaluated by the Matthews correlation coefficient and the area under the ROC curve. The best results were shown by two methods: Bayesian logistic regression and the Discriminative Multinomial Naive Bayes classifier. Nevertheless, all of the methods were effective at automatically discriminating between the two types of cancer. This indicates that machine learning algorithms are applicable to lung cancer classification. Future studies will carry out a similar analysis of the diagnostic value of these methods for other malignancies with more complex differential morphological diagnosis. Similar methods can be applied to other diagnostic studies, including computed tomography image analysis in the differential diagnosis of lung nodules.
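The two evaluation measures named above can be illustrated with a short sketch. The Python code below is not the authors' implementation, and the labels and scores are invented toy values, not data from the study. It assumes a binary task in which label 1 stands for one tumor type and label 0 for the other, and it computes the Matthews correlation coefficient from the confusion matrix and the area under the ROC curve as the fraction of positive-negative pairs ranked correctly.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive sample receives
    a higher score than a randomly chosen negative one; ties count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption): 3 samples of class 1, 5 of class 0.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]  # classifier scores for class 1
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # hard labels at threshold 0.5

print(f"MCC = {matthews_corrcoef(y_true, y_pred):.3f}")  # MCC = 0.467
print(f"AUC = {roc_auc(y_true, scores):.3f}")            # AUC = 0.933
```

The Matthews correlation coefficient depends on the chosen decision threshold, whereas the area under the ROC curve summarizes ranking quality across all thresholds, which is one reason to report both.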
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
© АННМО «Вопросы онкологии», 2017