JUCS - Journal of Universal Computer Science 29(4): 349-373, doi: 10.3897/jucs.89923

Automatic assignment of diagnosis codes to free-form text medical note

Stefan Strydom^‡, Andrei Michael Dreyer^‡, Brink van der Merwe^‡

‡ Stellenbosch University, Stellenbosch, South Africa

Corresponding author: Andrei Michael Dreyer ( andrei.dreyer1997@gmail.com )

This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY-ND 4.0). This license allows reusers to copy and distribute the material in any medium or format in unadapted form only, and only so long as attribution is given to the creator. The license allows for commercial use.

Citation: Strydom S, Dreyer AM, van der Merwe B (2023) Automatic assignment of diagnosis codes to free-form text medical note. JUCS - Journal of Universal Computer Science 29(4): 349-373. https://doi.org/10.3897/jucs.89923

Abstract

International Classification of Diseases (ICD) coding plays a significant role in classifying morbidity and mortality rates. Currently, ICD codes are assigned to a patient’s medical record manually by medical practitioners or specialist clinical coders. This practice is prone to errors, and training skilled clinical coders requires time and human resources. Automatic prediction of ICD codes can help alleviate this burden. In this paper, we propose a transformer-based architecture with label-wise attention for predicting ICD codes on a medical dataset. The transformer model is first pre-trained from scratch on a medical dataset. The pre-trained model is then used to generate representations of the tokens in the clinical documents, which are fed into the label-wise attention layer. Finally, the outputs of the label-wise attention layer are fed into a feed-forward neural network that predicts the appropriate ICD codes for the input document. We evaluate our model on hospital discharge summaries and their corresponding ICD-9 codes from the MIMIC-III dataset. Our experimental results show that our transformer model outperforms all previous models in terms of micro-F1 on the full label set of the MIMIC-III dataset. This is also the first successful application of a pre-trained transformer architecture to the auto-coding problem on the full MIMIC-III dataset.
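
To make the pipeline described above concrete, the following is a minimal NumPy sketch of a label-wise attention layer followed by a per-label output layer. It is an illustration under stated assumptions, not the architecture evaluated in the paper: the token representations H stand in for the output of the pre-trained transformer, the names label_wise_attention, U, W and b are hypothetical, all weights are random, and a single linear layer with a sigmoid is assumed as the simplest possible form of the feed-forward network.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def label_wise_attention(H, U, W, b):
    """Hypothetical label-wise attention head.

    H: (T, d) token representations (assumed to come from the transformer)
    U: (L, d) one attention query vector per label
    W: (L, d) one output weight vector per label (assumed linear output layer)
    b: (L,)   one bias per label
    Returns per-label probabilities (L,) and attention weights (L, T).
    """
    scores = U @ H.T                   # (L, T): relevance of each token to each label
    attn = softmax(scores, axis=1)     # each label gets its own distribution over tokens
    V = attn @ H                       # (L, d): one document vector per label
    logits = (W * V).sum(axis=1) + b   # per-label score from that label's own vector
    probs = 1.0 / (1.0 + np.exp(-logits))  # independent sigmoid per label (multi-label)
    return probs, attn


# Toy sizes chosen for illustration only: 6 tokens, 8 dimensions, 4 labels.
rng = np.random.default_rng(0)
T, d, L = 6, 8, 4
H = rng.normal(size=(T, d))
U = rng.normal(size=(L, d))
W = rng.normal(size=(L, d))
b = np.zeros(L)

probs, attn = label_wise_attention(H, U, W, b)
predicted = np.flatnonzero(probs >= 0.5)  # indices of labels assigned to the document
```

Because each label attends over the tokens separately, the rows of attn can be inspected to see which parts of a document contributed to a given code. The 0.5 decision threshold is likewise an assumption made for the sketch.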

Keywords

Prediction of ICD codes; transformer architecture