Diagnosis of Diseases from Medical Check-up Test Reports Using OCR Technology with BoW and AdaBoost algorithms

Abdulaziz, Wisam and M.Ameen, Musa (2020) Diagnosis of Diseases from Medical Check-up Test Reports Using OCR Technology with BoW and AdaBoost algorithms. 2019 International Engineering Conference (IEC). pp. 205-210.

Official URL: https://conferences.tiu.edu.iq/iec/

Abstract

This research introduces an approach to diagnosing diseases from medical check-up test reports. The proposed approach combines Optical Character Recognition (OCR) technology to convert hard-copy test reports into editable text, the Bag of Words (BoW) model as the feature selection algorithm, Naïve Bayes as the classification algorithm, and the AdaBoost technique to enhance the performance of the Naïve Bayes classifier. The approach is trained on dedicated training partitions of multiple medical datasets and then tested on the testing partitions of the same datasets. It is compared with the Support Vector Machine (SVM), Naïve Bayes (NB), Decision Table (DT), and k-Nearest Neighbors (k-NN) classifiers, with all algorithms tested on the same datasets, and it showed higher accuracy than the other four classifiers. The combination of BoW with AdaBoost is therefore used to predict the name of the disease from a medical check-up test report. An example image of the disease is then presented, together with the disease name, to the physician and the patient. The image is important for patients, who may not be familiar with medical terms and disease names. Given its good performance and the validated results obtained in testing, the proposed approach can be used in the medical field.
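
As an illustration only, the pipeline described in the abstract (BoW features, a Naïve Bayes base classifier, and AdaBoost on top) can be sketched in plain Python. This is not the authors' implementation: the abstract does not state which AdaBoost variant, smoothing, or tokenization was used, so the SAMME weighting rule, add-one smoothing, and whitespace tokenization below are assumptions. The OCR step is assumed to have run already, and the four report snippets and their labels are invented toy data, not taken from the paper's datasets.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Turn report text into token counts (the BoW representation)."""
    return Counter(text.lower().split())


class WeightedNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing and per-sample weights."""

    def fit(self, bags, labels, weights):
        scale = len(bags) / sum(weights)  # uniform weights then count as 1 each
        self.classes = sorted(set(labels))
        self.vocab = {tok for bag in bags for tok in bag}
        self.class_mass = {c: 0.0 for c in self.classes}
        self.token_mass = {c: defaultdict(float) for c in self.classes}
        for bag, label, w in zip(bags, labels, weights):
            self.class_mass[label] += w * scale
            for tok, count in bag.items():
                self.token_mass[label][tok] += w * scale * count
        return self

    def predict(self, bag):
        total = sum(self.class_mass.values())
        scores = {}
        for c in self.classes:
            denom = sum(self.token_mass[c].values()) + len(self.vocab)
            score = math.log(self.class_mass[c] / total)
            for tok, count in bag.items():
                if tok in self.vocab:  # ignore words never seen in training
                    likelihood = (self.token_mass[c].get(tok, 0.0) + 1.0) / denom
                    score += count * math.log(likelihood)
            scores[c] = score
        return max(scores, key=scores.get)


def adaboost_fit(bags, labels, rounds=10):
    """Boost WeightedNaiveBayes with the SAMME rule (an assumed variant)."""
    n, k = len(bags), len(set(labels))
    if k < 2:
        raise ValueError("need at least two classes")
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        model = WeightedNaiveBayes().fit(bags, labels, weights)
        preds = [model.predict(b) for b in bags]
        err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
        if err >= 1.0 - 1.0 / k:  # no better than chance: stop boosting
            if not ensemble:
                ensemble.append((1.0, model))
            break
        perfect = err == 0
        err = max(err, 1e-10)  # avoid division by zero
        alpha = math.log((1.0 - err) / err) + math.log(k - 1)
        ensemble.append((alpha, model))
        if perfect:  # nothing left to reweight
            break
        # Increase the weight of misclassified reports, then renormalize.
        weights = [w * math.exp(alpha) if p != y else w
                   for w, p, y in zip(weights, preds, labels)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return ensemble


def adaboost_predict(ensemble, text):
    """Weighted vote of the boosted classifiers on one OCR'd report."""
    bag = bag_of_words(text)
    votes = defaultdict(float)
    for alpha, model in ensemble:
        votes[model.predict(bag)] += alpha
    return max(votes, key=votes.get)


# Hypothetical toy reports standing in for OCR output (not the paper's data).
reports = [
    ("glucose high hba1c high", "diabetes"),
    ("fasting glucose elevated hba1c elevated", "diabetes"),
    ("hemoglobin low ferritin low", "anemia"),
    ("hemoglobin low iron low", "anemia"),
]
bags = [bag_of_words(text) for text, _ in reports]
labels = [label for _, label in reports]
ensemble = adaboost_fit(bags, labels)
print(adaboost_predict(ensemble, "hba1c high glucose high"))
```

In a real setting the input strings would come from an OCR engine run on scanned reports, and the predicted label could then be used to look up an example image of the disease, as the abstract describes.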

Item Type: Article
Uncontrolled Keywords: Natural Language Processing, Optical Character Recognition, Bag-of-Words, AdaBoost, Support Vector Machine, Decision Table, Naïve Bayes, k-Nearest Neighbors
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Depositing User: ePrints deposit
Date Deposited: 04 Apr 2021 08:49
Last Modified: 04 Apr 2021 08:49
URI: http://eprints.tiu.edu.iq/id/eprint/452
