Theses (Master In Health Informatics & Analytics)
Browsing Theses (Master In Health Informatics & Analytics) by Subject "Artificial Intelligence"
- Thesis (Restricted): APPLICATION OF TRANSFORMERS TRANSFER LEARNING IN AUTOCODIFICATION OF INTERNATIONAL CLASSIFICATION OF DISEASES 10TH REVISION (ICD-10) CODING FOR MEDICAL DIAGNOSIS IN MALAYSIA (International Medical University, 2023) by MUHAMMAD NAUFAL BIN NORDIN

The process of converting unstructured medical diagnoses into structured data using International Classification of Diseases 10th Revision (ICD-10) codes presents a significant challenge to healthcare facilities in Malaysia. Reliance on manual codification leads to potential errors, backlogs, and delays in data availability for analysis and decision-making, which can negatively affect healthcare planning and resource allocation. To address these challenges, this study proposes the use of Artificial Intelligence (AI), specifically Transfer Learning and Natural Language Processing (NLP), to auto-codify free-text medical diagnoses into standardized ICD-10 codes. The primary aim is to demonstrate that a fine-tuned machine learning model can achieve over 85% prediction accuracy. The research objectives include identifying the best pre-trained model, determining the optimal model parameters, and investigating the impact of different training dataset sizes on prediction accuracy. Through these targeted strategies, the study seeks to provide a viable AI solution that enhances the accuracy, efficiency, and timeliness of medical data codification. The study identified the fine-tuned Generative Pretrained Transformers 2 (GPT2) Large model as the most accurate model for the ICD-10 classification task, with an optimal configuration that achieved a prediction F1 score of 86.27%, exceeding the initial target of 85%. However, the Bidirectional Encoder Representations from Transformers (BERT) variant 'BioClinicalBERT', which was pre-trained on healthcare domain-specific data, trained markedly more efficiently with fewer parameters than the GPT2 Large model. This finding underscores the potential of balancing domain-specific pre-training, parameter-based selection of the pre-trained model, and training dataset size when building efficient models for complex healthcare tasks such as ICD-10 coding, suggesting an alternative route for future model development and improvement.
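The abstract describes fine-tuning pretrained transformer models (GPT2 Large and BioClinicalBERT) to map free-text diagnoses to ICD-10 codes and evaluating them by F1 score. As a rough illustration only, the sketch below shows how such a fine-tuning setup is commonly assembled with the Hugging Face Transformers library; the checkpoint name, toy data, label mapping, and hyperparameters are assumptions for illustration and are not taken from the thesis.

```python
# Hypothetical sketch (not the author's actual code): fine-tuning a pretrained
# transformer such as Bio_ClinicalBERT to classify free-text diagnoses into
# ICD-10 codes, in the spirit of the approach described in the abstract.
from datasets import Dataset
from sklearn.metrics import f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"  # assumed checkpoint

# Toy examples; a real dataset would map each diagnosis to one of many ICD-10 codes.
records = {
    "text": ["acute myocardial infarction", "type 2 diabetes mellitus"],
    "label": [0, 1],  # integer ids standing in for ICD-10 codes, e.g. I21.9 -> 0
}
num_labels = 2

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=num_labels)

def tokenize(batch):
    # Convert diagnosis text into fixed-length token ids for the model.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

dataset = Dataset.from_dict(records).map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Report a weighted F1 score over the predicted ICD-10 labels.
    logits, labels = eval_pred
    preds = logits.argmax(axis=-1)
    return {"f1": f1_score(labels, preds, average="weighted")}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="icd10-model", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
    eval_dataset=dataset,
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```

In practice, swapping MODEL_NAME for a larger checkpoint such as GPT2 Large (with a padding token set) and varying the training set size would reproduce the kind of model and data-size comparison the study reports, under whatever configuration the thesis actually used.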