Theses (Master In Health Informatics & Analytics)
- Thesis (Restricted): APPLICATION OF SUPERVISED MACHINE LEARNING ON PREDICTION OF DEATH AMONG SEVERE DENGUE CASES (International Medical University, 2023), DURATUL’AIN BINTI MOHAMAD NAZRI
Although various predictive models exist for dengue, only four models exist for predicting mortality in severe dengue, and none of them considers the timing of events during model development. To address this gap, a multicentre retrospective cohort study was conducted to develop a predictive model for death in severe dengue. The study aimed to provide a more comprehensive and effective approach to predicting mortality in such cases. It included patients diagnosed with severe dengue according to the WHO 2009 classification and collected 18 predictor variables comprising demographic, clinical and laboratory data. We used LASSO for variable selection and model building, and ten-fold cross-validation for internal validation. The cohort comprised 786 severe dengue cases, of which 35 (4%) were fatal. Furthermore, 575 of the 786 patients (73.2%) were diagnosed with severe dengue during the febrile phase. The LASSO model identified body mass index (BMI), vomiting, third-space fluid accumulation, respiratory rate, white cell count, platelet count, AST, serum albumin, bicarbonate and lactate at diagnosis of severe dengue as predictor variables. The model's accuracy was 0.96 (95% CI: 0.91 to 0.98), with a low sensitivity of 0.29 (95% CI: 0.15 to 0.46) and a high specificity of 1.00 (95% CI: 0.99 to 1.00). We have developed a high-performance dengue mortality prediction model based on the timing of the event and on clinical and laboratory data. We will deploy an open-access web tool for local validation and for stratifying severe dengue patients.
However, further investigation is required, because more than two-thirds of the study population (73.2%) were diagnosed with severe dengue during the febrile phase, which contradicts the current guideline.
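Only 35 of the 786 cases were deaths, so the survivor class dominates accuracy. The short Python sketch below is not the thesis code. It computes the three reported metrics from confusion-matrix counts and then applies them to a hypothetical rule that predicts "survived" for every patient, using only the cohort counts stated above. The result is a reference point for reading accuracy at this prevalence. It is not a like-for-like comparison with the cross-validated LASSO results.

```python
def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity and specificity from confusion-matrix counts (death = positive)."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity: share of actual deaths that the model flags as deaths.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of actual survivors that the model flags as survivors.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }

# Hypothetical "everyone survives" rule on the full cohort:
# 35 deaths (all missed) and 751 survivors (all correct).
baseline = confusion_metrics(tp=0, fn=35, tn=751, fp=0)
print({k: round(v, 3) for k, v in baseline.items()})
# prints {'accuracy': 0.955, 'sensitivity': 0.0, 'specificity': 1.0}
```

For this reason, sensitivity is the more informative figure for a mortality model in such an imbalanced cohort.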
- Thesis (Restricted): APPLICATION OF TRANSFORMERS TRANSFER LEARNING IN AUTOCODIFICATION OF INTERNATIONAL CLASSIFICATION OF DISEASES 10TH REVISION (ICD-10) CODING FOR MEDICAL DIAGNOSIS IN MALAYSIA (International Medical University, 2023), MUHAMMAD NAUFAL BIN NORDIN
Converting unstructured medical diagnoses into structured data using International Classification of Diseases 10th Revision (ICD-10) codes is a significant challenge for healthcare facilities in Malaysia. Reliance on manual coding leads to potential errors, backlogs and delays in data availability for analysis and decision-making, which can harm healthcare planning and resource allocation. To address these challenges, this study proposes using artificial intelligence (AI), specifically transfer learning and natural language processing (NLP), to auto-code free-text medical diagnoses into standardized ICD-10 codes. The primary aim is to demonstrate that a fine-tuned machine learning model can achieve a prediction accuracy above 85%. The research objectives are to identify the best pre-trained model, determine the optimal model parameters, and investigate how training dataset size affects prediction accuracy. Through these strategies, the study seeks to provide a viable AI solution that improves the accuracy, efficiency and timeliness of medical data coding. The study identified a fine-tuned Generative Pre-trained Transformer 2 (GPT-2) Large model as the most accurate model for the ICD-10 classification task. Its optimal configuration achieved an F1 score of 86.27%, exceeding the initial target of 85%.
However, the Bidirectional Encoder Representations from Transformers (BERT) variant BioClinicalBERT, which was pre-trained on healthcare domain-specific data, showed significant training efficiency with fewer parameters than the GPT-2 Large model. This finding underscores the value of balancing three factors when building efficient models for complex healthcare tasks such as ICD-10 coding: domain-specific pre-training, the parameter count of the chosen pre-trained model, and training dataset size. It also suggests an alternative route for future model development and improvement.
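The abstract reports an F1 score but does not say how per-code scores were averaged (macro, micro or weighted). The minimal sketch below assumes macro averaging, one common choice for multi-class ICD-10 coding. Each code's F1 counts equally, so rare codes matter as much as frequent ones. The example codes and predictions are hypothetical.

```python
from collections import Counter

def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-code F1 over every code seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per code
    pred_counts = Counter(y_pred)   # how often each code was predicted
    true_counts = Counter(y_true)   # how often each code actually occurs
    scores = []
    for label in labels:
        precision = tp[label] / pred_counts[label] if pred_counts[label] else 0.0
        recall = tp[label] / true_counts[label] if true_counts[label] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical diagnoses already mapped to ICD-10 codes.
y_true = ["E11.9", "I10", "I10", "J18.9"]
y_pred = ["E11.9", "I10", "E11.9", "J18.9"]
print(round(macro_f1(y_true, y_pred), 4))  # prints 0.7778
```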
- Thesis (Restricted): DASHBOARD ON KEY FACTORS INFLUENCING WORK EXPERIENCE AMONG DOCTORS IN PUBLIC HOSPITALS IN KLANG VALLEY (IMU University, 2024), YEE LEE ENG
Introduction: Advances in healthcare information technology and data visualisation should be leveraged to address high doctor turnover and, ultimately, to improve healthcare services for the public. The Malaysian government is currently grappling with brain drain, as highlighted by the Prime Minister at the Malaysia Madani economic development event in September 2023. Methodology: This research has three primary objectives: (1) to identify the key factors influencing doctors' work experience in public hospitals, (2) to design a dashboard for monitoring these factors, and (3) to validate the dashboard with relevant experts. The study used primary data from 301 respondents at Hospital Kuala Lumpur and Hospital Tunku Azizah, collected through a cross-sectional design with convenience purposive sampling. It focused on sociodemographic, socioeconomic, work, psychosocial and job satisfaction factors, measured with a structured 80-item questionnaire. Results: Simple logistic regression identified six key factors (p < 0.05): age group, total length of service, job position, total income, job satisfaction and burnout. Of these, job position and job satisfaction were selected as predictors through hierarchical multiple logistic regression. The customised dashboard, built in Tableau version 2023.2.1, provides interactive and actionable insights. Experts validated it for accuracy, usability and functionality. Conclusion: Identifying key factors and designing an interactive dashboard are crucial for understanding and improving doctors' work experiences in public hospitals. These insights can guide government policy-making to retain doctors and strengthen the national healthcare system.
Keywords: doctor, work experience, job satisfaction, burnout, dashboard
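When simple logistic regression has a single binary predictor, the exponentiated coefficient equals the odds ratio from the 2x2 table of that predictor against the outcome. The sketch below computes that odds ratio with a Wald 95% confidence interval on the log scale. The counts and the predictor/outcome pairing (for example burnout versus a poor work-experience outcome) are hypothetical and are not taken from the thesis.

```python
import math

def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR); this matches the Wald SE of the slope
    # in a logistic regression with one binary predictor.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper

# Hypothetical counts: 30/50 exposed and 40/100 unexposed doctors report the outcome.
print(odds_ratio_wald(30, 20, 40, 60))
```

The hierarchical multiple model then adjusts each such association for the other predictors. For that step a dedicated package is the usual choice rather than hand-written code.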
- Thesis (Restricted): DATA ANALYTICS ON TREND OF C-REACTIVE PROTEIN (CRP) LEVEL AND ITS CORRELATION WITH OTHER CLINICAL PARAMETERS IN DESCRIBING DISEASE PROGRESSION OF MILD COVID-19 PATIENTS (International Medical University, 2022), NURAINI MUHAMMAD NAIM
Introduction: CRP is a well-known inflammatory marker and has been used as one of the haematological parameters to monitor disease progression in COVID-19 patients. Several publications have reported a correlation between high CRP levels and the severity of COVID-19 infection. They have also reported the possibility of using CRP as a predictive marker of severe disease or an adverse outcome. However, a time-series analysis of CRP level in relation to the day of illness has not been done. This study therefore aims to assess the trend of CRP level across days of illness among patients with mild COVID-19, and to examine its correlation with other clinical parameters in describing disease progression. It also explores whether CRP level could serve as an indicator that a patient is on the road to recovery. CRP is known to increase with age and with various chronic diseases. The study therefore focuses only on a population presumed to be healthy, to minimise the effect of these potential confounders. Methodology: A random sample of 100 patients was drawn from those admitted to the MAEPS Integrated Low Risk COVID-19 Treatment and Quarantine Centre 2.0, Serdang, in August 2021. Each patient's medical records were accessed to retrieve laboratory results, which were mapped by day of illness. Information on other clinical parameters was extracted using a standard data collection form. A descriptive statistical analysis was carried out to identify the trend of CRP levels in patients with mild COVID-19.
Results: Most patients were admitted to PKRC MAEPS 2.0 within 4 days of testing positive for COVID-19. Of the 100 patients, 37 were fully vaccinated, 28 partially vaccinated and 35 unvaccinated. A different CRP trend emerged for each of these three groups, suggesting an effect of vaccination on the inflammatory response. However, the large variation in CRP levels and the limited sample size meant that no clear trend of CRP level by day of illness could be established to describe disease progression. On the other hand, high CRP levels were associated with cough, anosmia and ageusia, whereas fever was more commonly reported by patients with lower CRP values. Conclusion: CRP level is elevated in COVID-19 patients, but its actual value does not predict the outcome of patients with mild infection. The value is also affected by the patient's vaccination status. CRP may therefore not be useful for monitoring an infectious disease while an active vaccination programme is under way. Keywords: COVID-19, C-reactive protein, CRP, trend, data analytics
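The descriptive trend analysis amounts to grouping CRP results by vaccination status and day of illness and summarising each group. The sketch below does this under a few assumptions that do not come from the thesis: records are `(status, day_of_illness, crp)` tuples, CRP is in mg/L, and the median is used because it is less sensitive than the mean to the large variation the abstract reports. The function name and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import median

def crp_trend(records):
    """Median CRP per (vaccination status, day of illness).

    records: iterable of (status, day_of_illness, crp_mg_per_l) tuples (hypothetical layout).
    """
    groups = defaultdict(list)
    for status, day, crp in records:
        groups[(status, day)].append(crp)
    # Sorting keys orders the output by status, then by day of illness.
    return {key: median(values) for key, values in sorted(groups.items())}

sample = [
    ("unvaccinated", 3, 12.0),
    ("unvaccinated", 3, 20.0),
    ("fully vaccinated", 3, 4.0),
]
print(crp_trend(sample))
```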
- Thesis (Restricted): Exploring Potential Benefits and Ethical Implications of Internet-Based Information Gathering for Active Ageing: A Data Analytics and Visualization Approach (International Medical University, 2023), Tajul Asni Bin Ahamad
The world is moving rapidly towards an ageing population, with older adults making up a significant and growing proportion of the population. As people age, their health and well-being become increasingly important. Older adults' use of the internet to gather health information is a topic of growing interest, and concern, in research and policy discussions. This study was motivated by the need to understand the patterns and behaviours of older adults who use the internet to seek health information. Internet use has been widespread since 2000, and older adults have not been left behind in using this resource. A growing body of research covers older adults' use of technology, including the internet. However, research on their specific use of the internet to gather health information remains limited. The aim of the study is to explore the potential benefits and ethical implications of internet-based information gathering in promoting active ageing, using data analytics and visualization. The rationale is that analysing how older adults use the internet to seek health information can help policymakers plan effective policies, so that older adults can make better use of the internet for their health needs in the future. The methodology took a quantitative, analytical approach to visualizing data on internet use for health information seeking among older adults.
Data from the 2017 survey by the National Institutes of Health (NIH), Ministry of Health Malaysia, served as the baseline data on older adults' use of the internet to search for health information. The study applies data analytics and visualization to explore the potential benefits and ethical implications of internet-based information gathering for active ageing among older adults. Its findings provide insights into the patterns and behaviours of older adults who use the internet to seek health information. First, there are positive indications that older adults who use the internet for health information trust its sources and find it useful. There is also evidence of active participation in social activities. Respondents also show positive behaviour in acting on health information from the internet, such as making lifestyle changes or seeking medical advice. The potential benefits of internet-based information gathering for active ageing are significant. Older adults are motivated to seek health-related information online for a variety of reasons: being self-reliant, staying active and productive, making better treatment choices, achieving a healthier life, valuing a healthy lifestyle, relieving stress, and lacking adequate information on health issues. They seek health information to maintain their independence, to participate in decisions about their care plan, and to stay healthy and active despite physical limitations. However, this also exposes them to potential risks, and the ethical implications of internet-based information gathering for active ageing must be carefully considered. For example, there are concerns about the quality and reliability of health information available online, which are discussed further in this study.
As another example, privacy concerns arise because older adults may unknowingly share personal and sensitive information while seeking health information online. The study also highlights the importance of addressing the ethical implications and potential risks of internet-based information gathering for active ageing. In conclusion, exploring the potential benefits and ethical implications of internet-based information gathering in promoting active ageing is crucial for developing effective policies and programs that support the health and well-being of older adults.
- Thesis (Restricted): FEATURE INTERACTION MODELLING FOR IMPROVED DENTAL IMPLANT FAILURE PREDICTION WITH MACHINE LEARNING (IMU University, 2024), Simranjeet Kaur
Dental implant success poses a significant challenge in oral rehabilitation and is influenced by various patient-specific and surgical factors. The objective of this study is to use supervised machine learning (ML) algorithms to predict dental implant failure, with an emphasis on feature interaction analysis and its impact on model accuracy and reliability. A dataset of 747 implant records was used, with features related to the patient, the surgeon and the surgery. Features were selected using statistical tests and backward elimination. Greenwell's method and Friedman's H-statistic were used to identify interacting feature pairs. Random Forest, K-Nearest Neighbours and Extreme Gradient Boosting (XGBoost) models were then built, with hyperparameters tuned via Standardized Search CV and Grid Search CV. Some interacting pairs improved model accuracy marginally, but not all interactions led to improvement. With the 'ridge augmentation and age' interacting pair, XGBoost achieved the highest interaction-based accuracy of 89.04%. However, it could not surpass the 90.4% accuracy that Random Forest achieved through feature selection and hyperparameter tuning without interactions. The study demonstrated the potential of feature interaction modelling in predicting dental implant failure but emphasized that interactions must be selected carefully. Future research can refine the understanding of feature interactions and their impact on predictive models, leading to better clinical decision-making in dental implantology.
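Friedman's H-statistic measures how much of the joint partial dependence of two features cannot be explained by their separate partial dependences. Its value is 0 for a purely additive effect and grows as the interaction strengthens. The brute-force sketch below works for any prediction function and uses mean-centred partial dependence. The toy models and grid are hypothetical stand-ins for a fitted classifier and the implant features (for example ridge augmentation and age), and this is not the thesis implementation. It evaluates the model O(n²) times, so it only suits small data.

```python
from statistics import mean

def partial_dependence(model, data, features, point):
    """Average prediction with `features` fixed to `point` values in every row."""
    preds = []
    for row in data:
        modified = list(row)
        for f, v in zip(features, point):
            modified[f] = v
        preds.append(model(modified))
    return mean(preds)

def centered(values):
    """Subtract the mean so each partial dependence function is centred at zero."""
    m = mean(values)
    return [v - m for v in values]

def friedman_h(model, data, j, k):
    """Friedman's H-statistic for the two-way interaction between feature indices j and k."""
    pd_jk = centered([partial_dependence(model, data, (j, k), (r[j], r[k])) for r in data])
    pd_j = centered([partial_dependence(model, data, (j,), (r[j],)) for r in data])
    pd_k = centered([partial_dependence(model, data, (k,), (r[k],)) for r in data])
    # Share of joint partial-dependence variance not explained by the two main effects.
    numerator = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    denominator = sum(a ** 2 for a in pd_jk)
    return (numerator / denominator) ** 0.5 if denominator else 0.0

grid = [[a, b, c] for a in range(4) for b in range(4) for c in range(2)]
additive = lambda r: r[0] + 2 * r[1] + r[2]   # no interaction between features 0 and 1
product = lambda r: r[0] * r[1]               # pure interaction between features 0 and 1
print(round(friedman_h(additive, grid, 0, 1), 6), round(friedman_h(product, grid, 0, 1), 3))
```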
- Thesis (Restricted): IMAGE RECOGNITION USING MACHINE LEARNING TO DETERMINE CALORIE AND SUGAR CONTENT IN FRUITS AND VEGETABLES FOR INDIVIDUALS WITH DIABETES (International Medical University, 2023), NARMATHAA A/P KISHORE KUMAR
Classification and identification of fruits and vegetables through image recognition technology has grown significantly in recent years. This growth is driven by the importance of nutritional profiling in dietary recommendations, particularly for people managing diabetes. Various studies have used image recognition to identify foods in images and generate nutritional information for the scanned images. The nutritional information emphasized in these studies was mainly calorie value, and some also covered sodium, potassium, iron, calcium, vitamins, protein, fats and carbohydrates. However, the existing literature does not comprehensively integrate image recognition of fruits and vegetables with their sugar content and its effect on blood glucose levels in individuals with diabetes. This project therefore aims to bridge this gap with a novel solution based on image recognition. The solution educates individuals managing diabetes on the calorie value and sugar content of fruits and vegetables and on how consuming them affects blood glucose levels. The project builds a workflow that demonstrates image classification and recognition, together with a web application in Jupyter Notebook built with Python libraries. A Kaggle dataset of 3825 images of fruits and vegetables was used. Three pre-trained deep learning models, AlexNet, MobileNet-v2 and YOLOv4, were chosen for image classification because they achieved the highest accuracy for fruit and vegetable image classification in previous literature.
After training on the 3825-image dataset, AlexNet achieved an accuracy of 48%, whereas MobileNet-v2 achieved 94%. Because of hardware and labelling limitations, YOLOv4 was not trained on the dataset for image classification.
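The workflow has two stages: classify the image, then attach calorie and sugar information to the predicted class. The sketch below illustrates only the second stage. It assumes the classifier (for example MobileNet-v2) returns one raw score per class, applies a softmax, and looks up the winning class in a nutrition table. The function names, the confidence threshold and the table layout are hypothetical. A real table must come from a verified food composition source, and no nutrient values are given here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nutrition_for_image(logits, class_names, nutrition_table, min_confidence=0.5):
    """Return (label, probability, nutrition entry), or None if the prediction is not confident.

    nutrition_table: hypothetical dict mapping class name -> nutrient dict
    (e.g. calories and sugar per serving); missing classes yield an entry of None.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return None  # too uncertain to show nutrition advice
    label = class_names[best]
    return label, probs[best], nutrition_table.get(label)
```

Returning `None` below the threshold is a design choice for a diabetes-facing tool. Showing no advice is safer than showing the sugar content of the wrong fruit.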
- Thesis (Restricted): IMAGE SEGMENTATION OF CARBOHYDRATES ON PLATES OF COOKED MEALS (IMU University, 2024), YEE LI XIEN
Accurate assessment of dietary intake is crucial for promoting health and managing diet-related conditions such as diabetes and obesity. Traditional methods of dietary assessment are often inaccurate and time-consuming. This study evaluates the effectiveness of three deep learning models for segmenting carbohydrate regions in food images: U-Net with 16 filters, U-Net with 64 filters, and the Segment Anything Model (SAM). The models were assessed using accuracy, Intersection over Union (IoU) and Dice score. SAM outperformed both U-Net models, achieving an overall accuracy of 99.24%, an IoU of 90.59% and a Dice score of 94.21%. The 16-filter U-Net performed better than the 64-filter model, with an accuracy of 97.86% and an IoU of 81.15%. These results highlight SAM's advanced capabilities in promptable segmentation and zero-shot transfer, making it the most effective model for this task. Future research should focus on expanding the dataset, integrating texture-based segmentation methods, and exploring data augmentation techniques to further improve model robustness.
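The three reported metrics can all be computed from pixel-level counts of a predicted binary mask against a ground-truth mask. The minimal sketch below assumes masks are equally sized lists of 0/1 rows, with 1 marking carbohydrate pixels. It scores a single image. How the thesis averaged scores across images is not stated in the abstract.

```python
def segmentation_scores(truth, pred):
    """Pixel accuracy, IoU and Dice for two equally sized binary masks (lists of 0/1 rows)."""
    tp = fp = fn = tn = 0
    for t_row, p_row in zip(truth, pred):
        for t, p in zip(t_row, p_row):
            if t and p:
                tp += 1      # carbohydrate pixel correctly segmented
            elif p:
                fp += 1      # background wrongly labelled as carbohydrate
            elif t:
                fn += 1      # carbohydrate pixel missed
            else:
                tn += 1      # background correctly left out
    total = tp + fp + fn + tn
    union = tp + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "iou": tp / union if union else 1.0,             # both masks empty: perfect match
        "dice": 2 * tp / (2 * tp + fp + fn) if union else 1.0,
    }

truth = [[1, 1, 0], [0, 0, 0]]
pred = [[1, 0, 0], [0, 1, 0]]
print(segmentation_scores(truth, pred))
```

Background pixels count towards accuracy but not towards IoU or Dice. This explains why accuracy is usually the highest of the three figures when carbohydrates cover only part of the plate.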
- Thesis (Restricted): LOGISTIC CLASSIFICATION IN DIAGNOSING ACUTE FUNCTIONAL IMPAIRMENT IN MEN WITH MAJOR DEPRESSION (International Medical University, 2023), LOH MING HUI
Although men are diagnosed with depression half as often as women and are less likely to attempt suicide, they are three to four times more likely to die by suicide. This highlights that men rarely seek help, yet engage in harmful behaviours at a higher prevalence than women. A crucial step towards improving men's mental health is therefore to increase their willingness to seek help for depression and associated functional impairment. Accordingly, the present study has two research objectives. The first is to determine the direction and strength of association between sociodemographic characteristics and Major Depressive Episode with Severe Impairment (MDESI) in men, using a nomogram. The second is to develop a logistic regression predictive model that classifies men diagnosed with MDESI into categories with and without severe functional impairment. Data on men aged 18 years and above who participated in the National Survey on Drug Use and Health (NSDUH) from 2020 to 2021 were pooled and analysed. The nomogram revealed that Native American men are at the highest risk of experiencing MDESI compared with men of other ethnicities. Several further characteristics also increase the risk of MDESI in men:
- being aged 50 to 64 years;
- having a family income below US$20,000;
- being gay;
- strongly disagreeing that it is important for friends to share one's religious beliefs;
- strongly agreeing that personal religious beliefs are important;
- agreeing that religious beliefs influence personal decisions;
- living in a non-metropolitan area.
On the training dataset, the logistic regression predictive model produced AUC = 0.733, accuracy = 0.638, recall = 0.638 and precision = 0.697.
On the test dataset, all scores increased slightly (AUC = 0.746, accuracy = 0.678, recall = 0.678, precision = 0.729). However, the results indicate that the current logistic model performs inadequately as a classifier, and further work is required to improve it. Keywords: men’s mental health, major depression, severe impairment, National Survey on Drug Use and Health, machine learning, logistic regression classifier
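In both the training and test scores, accuracy and recall are identical (0.638 and 0.678). The abstract does not state the averaging method. If recall was averaged across the two classes with support weights, the two numbers must coincide, because Σ_c (n_c/n)(hits_c/n_c) = Σ_c hits_c / n, which is exactly accuracy. The small sketch below checks this identity on hypothetical labels.

```python
from collections import Counter

def accuracy_and_weighted_recall(y_true, y_pred):
    """Return (accuracy, support-weighted recall) for a list of class labels."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    support = Counter(y_true)  # number of true cases per class
    weighted_recall = 0.0
    for label, count in support.items():
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        # Per-class recall weighted by the class's share of the data.
        weighted_recall += (count / n) * (hits / count)
    return accuracy, weighted_recall

# Hypothetical labels: 1 = severe impairment, 0 = no severe impairment.
print(accuracy_and_weighted_recall([1, 1, 1, 0, 0], [1, 0, 1, 0, 1]))
```

Under that assumption, per-class recall (especially for the severe-impairment class) would say more about the classifier than the weighted figure.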
- Thesis (Restricted): MACHINE LEARNING-BASED PREDICTION FOR HOSPITAL ADMISSION AT EMERGENCY DEPARTMENT BY USING MEDICAL INFORMATION MART FOR INTENSIVE CARE (International Medical University, 2022), HANIS BINTI HASRI
Predicting hospital admission in the Emergency Department (ED) from essential triage information is vital for reducing the steps in current work processes and the chance of missing values. We aimed to predict hospital admission by applying supervised machine learning methods to triage information from the Medical Information Mart for Intensive Care IV ED (MIMIC-IV-ED). We analysed the MIMIC-IV-ED database and selected basic demographics and vital signs. Two datasets were created by replacing string values with numerical values, one by one-hot encoding and one by label encoding. Feature selection techniques were then applied to identify the essential variables. The cleaned data were fitted with three machine learning algorithms: (i) Logistic Regression, (ii) Decision Tree and (iii) Random Forest. Performance was compared using the area under the receiver operating characteristic curve (AUROC). Of 420,848 ED records, 26,864 (6%) were excluded. The hospital admission risk was 39%. Logistic Regression and Random Forest without feature selection performed best on both the one-hot-encoded dataset (AUROC 0.763 and 0.770, respectively) and the label-encoded dataset (AUROC 0.761 and 0.771, respectively). Although Decision Tree performed better with feature selection, its best result was still inferior to Logistic Regression and Random Forest. We developed a prediction model using essential triage information routinely collected at triage. Although the result is promising, additional variables such as age, pain score and chief complaint would likely yield better results. Integrating such a minimal-information model into the Electronic Health Record (EHR) could add value by alerting healthcare providers to upcoming overcrowding.
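The two datasets differ only in how string categories become numbers. The minimal sketch below contrasts the two encodings. The category values (`"walk-in"`, `"ambulance"`, `"helicopter"`) are hypothetical examples of a triage field and are not taken from the thesis pipeline.

```python
def label_encode(values):
    """Map each category to an integer code in order of first appearance."""
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return [codes[v] for v in values], codes

def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector, one column per category."""
    categories = sorted(set(values))
    return [[int(v == c) for c in categories] for v in values], categories

arrivals = ["walk-in", "ambulance", "walk-in", "helicopter"]
print(label_encode(arrivals))    # integer codes plus the category -> code mapping
print(one_hot_encode(arrivals))  # indicator rows plus the column order
```

Label encoding implies an order (here helicopter > ambulance > walk-in) that a linear model such as Logistic Regression treats as a numeric trend. Tree-based models can split on the codes without relying on that order. One-hot encoding avoids the artificial order at the cost of extra columns.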
- Thesis (Restricted): MOLECULAR DYNAMICS SIMULATION OF HYPERPHOSPHORYLATED TAU PROTEIN WITH DEOXYCORTICOSTERONE ACETATE, NITRAZEPAM, TESTOSTERONE CYPIONATE IN ALZHEIMER’S DISEASE (International Medical University, 2023), NISHALINI KURUP GOPALAKRISHNAN
Alzheimer’s Disease (AD) is a fatal neurodegenerative disorder marked by cognitive decline. Current treatments offer limited relief but do not alter the progression of the disease. This study uses Molecular Dynamics (MD) simulation to investigate drug repurposing against hyperphosphorylated tau (HP Tau) protein. HP Tau is an important factor in AD because it damages neurons, produces tangles and spreads, resulting in cognitive decline. Three further top-ranked repurposed drugs were evaluated: Deoxycorticosterone Acetate (DOCA), Nitrazepam (NTZ) and Testosterone Cypionate (TPCC). These had been selected from among 7 other repurposed drugs in previous docking studies. Their binding affinities and interactions with HP Tau were assessed through MD simulation, covering thermodynamic properties and structural properties: Root Mean Square Deviation (RMSD), Root Mean Square Fluctuation (RMSF), Radius of Gyration (RoG) and Hydrogen Bond (HB) analysis. The simulations showed sustained drug-protein interactions, suggesting that these drugs could inhibit HP Tau aggregation. DOCA provided moderate stability with a large number of structural interactions. NTZ showed multiple binding sites owing to its larger number of hydrogen bonds, which is consistent with its modulatory effects on cognitive function and on the hypothalamic-pituitary-adrenal axis associated with AD. This study highlights drug repurposing, facilitated by MD simulation, as a cost-effective strategy for identifying new AD treatments. By targeting HP Tau, these drugs offer promising avenues for halting the progression of AD and offer hope in the fight against this incurable disease.
Keywords: Alzheimer’s Disease, Ligands, Molecular Docking, Molecular Dynamics Simulation, Drug repurposing, Root Mean Square Deviation, Hydrogen Bonding, Radius of Gyration, Root Mean Square Fluctuation
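RMSD and radius of gyration are both simple geometric summaries of each trajectory frame. The plain-Python sketch below shows what they compute. It assumes coordinates are `(x, y, z)` tuples in matching atom order, and that each frame has already been superimposed on the reference, because no fitting step is included. In practice these analyses are run with MD tools such as AMBER's cpptraj or MDAnalysis rather than by hand.

```python
import math

Coord = tuple[float, float, float]

def rmsd(frame: list[Coord], reference: list[Coord]) -> float:
    """RMSD between a frame and a reference that are already superimposed (same atom order)."""
    if len(frame) != len(reference):
        raise ValueError("frame and reference must have the same number of atoms")
    sq = sum((x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
             for (x, y, z), (rx, ry, rz) in zip(frame, reference))
    return math.sqrt(sq / len(frame))

def radius_of_gyration(frame: list[Coord], masses: list[float]) -> float:
    """Mass-weighted radius of gyration of one conformation (a measure of compactness)."""
    total = sum(masses)
    # Mass-weighted centre of the structure.
    cx = sum(m * x for m, (x, _, _) in zip(masses, frame)) / total
    cy = sum(m * y for m, (_, y, _) in zip(masses, frame)) / total
    cz = sum(m * z for m, (_, _, z) in zip(masses, frame)) / total
    spread = sum(m * ((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2)
                 for m, (x, y, z) in zip(masses, frame))
    return math.sqrt(spread / total)
```

A plateauing RMSD over time suggests that the protein-ligand complex has settled. A stable RoG suggests that the protein is neither unfolding nor collapsing.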
- Thesis (Restricted): MOLECULAR DYNAMICS SIMULATION OF HYPERPHOSPHORYLATED TAU PROTEIN WITH POTENTIAL LIGANDS FROM DRUG REPURPOSING (International Medical University, 2022), KEE KYE VERN
Alzheimer disease (AD) is a progressive degenerative brain disorder that results in the loss of higher cognitive function and is considered the most common form of dementia. AD is characterised by a triad of pathological changes in the brain, and many approaches to treating it have been proposed and researched. The two hallmark substrates causing cognitive decline in AD are amyloid beta (Aβ) plaque deposition and neurofibrillary tangles of hyperphosphorylated (HP) tau. In recent years, research has focused on the Aβ hypothesis. However, failed clinical trials of drugs targeting Aβ suggest that tau-related therapies may be a more viable approach to AD treatment. This study aims to analyse the binding affinity of Teniposide and Testosterone Enanthate as potential repurposed drug candidates. These would act as aggregation inhibitors of HP tau protein, preventing the formation of neurofibrillary tangles (NFT) and thereby possibly stopping the progression of AD. The binding interactions between the two proposed drugs and the HP tau protein were analysed by running a 20 ns molecular dynamics (MD) simulation. Thermodynamic properties, root mean squared deviation (RMSD), root mean squared fluctuation (RMSF), radius of gyration (RoG) and hydrogen bond (HB) analyses were computed on the MD trajectories. The findings suggest that Teniposide is the better candidate for inhibiting HP tau aggregation. It should be analysed further with a longer simulation, MMGB/PBSA calculations and 2D/3D interaction images to ensure higher reliability.
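RMSD summarises each frame as a whole. RMSF instead summarises each atom or residue across the whole trajectory, showing which regions of tau stay rigid or flexible when a ligand binds. The plain-Python sketch below shows the calculation. It assumes the trajectory is a list of frames of `(x, y, z)` tuples, already superimposed on a common reference, and it is not the thesis workflow.

```python
import math

def rmsf(trajectory):
    """Per-atom root mean square fluctuation around each atom's time-averaged position.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples in the same atom order.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mx = sum(frame[i][0] for frame in trajectory) / n_frames
        my = sum(frame[i][1] for frame in trajectory) / n_frames
        mz = sum(frame[i][2] for frame in trajectory) / n_frames
        msd = sum((f[i][0] - mx) ** 2 + (f[i][1] - my) ** 2 + (f[i][2] - mz) ** 2
                  for f in trajectory) / n_frames
        result.append(math.sqrt(msd))
    return result

# Two hypothetical frames: atom 0 moves 2 units along x, atom 1 stays still.
print(rmsf([[(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)], [(2.0, 0.0, 0.0), (5.0, 5.0, 5.0)]]))
```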
- Thesis (Restricted): RESEARCH PROJECT: A DATA-DRIVEN APPROACH TO PREDICT AND VISUALIZE THE CARDIOVASCULAR DISEASE WITH DATA ENGINEERING PROCESS AND MACHINE LEARNING (International Medical University, 2022), LEE ZI SHENG
Cardiovascular disease (CVD) is a general term for a group of disorders of the heart and blood vessels, and it is one of the deadliest diseases in the world. The vast amount of patient data collected by health management can serve as input to machine learning (ML) models that make predictions. Such predictions help health professionals intervene earlier to reduce CVD mortality. In this study, effective CVD prediction models are developed using ML. Feature selection and oversampling are applied to reduce redundant data and improve model performance. The ML models are applied to 5 open-source CVD datasets. The results are presented in a dashboard visualization that makes them easy for the audience to understand. The results show that the predictive models can identify CVD risk factors and can identify the ML model with the highest accuracy. Keywords: Cardiovascular disease, machine learning, feature selection, dashboard visualization.
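The abstract does not name the oversampling method. The sketch below uses plain random oversampling as a simple stand-in. Techniques such as SMOTE instead synthesise new minority rows rather than duplicating existing ones. Randomly chosen rows from each smaller class are duplicated until every class matches the largest one. The function name and toy data are hypothetical.

```python
import random

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class matches the largest class."""
    rng = random.Random(seed)  # fixed seed so the resampling is reproducible
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Hypothetical toy data: 4 patients without CVD (0), 2 with CVD (1).
rows = [[i] for i in range(6)]
labels = [0, 0, 0, 0, 1, 1]
new_rows, new_labels = random_oversample(rows, labels)
print(new_labels.count(0), new_labels.count(1))  # prints 4 4
```

Oversampling should be applied only to the training split after the train/test split. Otherwise duplicated patients can appear in both splits and inflate the reported accuracy.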