Osteoarthritis in the Knee Joint: Detection and Prediction Using X-ray Image Analysis

Ramakant, Jyoti Yadav, Sandhya Verma, Shubhanshi Rani, Shivam Kumar

doi:10.5281/zenodo.17066643

Research Paper | Open Access
Volume 02 | Issue 09 | Article Id IJSRT/250309003

SCPM College of Nursing and Paramedical Sciences, Gonda, Uttar-Pradesh, India 271003

Abstract

Background: Knee osteoarthritis (OA) is one of the most prevalent musculoskeletal disorders, causing pain, disability, and reduced quality of life worldwide. Conventional diagnosis relies on radiographic evaluation using the Kellgren–Lawrence (KL) grading system, which is subjective and prone to inter-observer variability. Artificial intelligence (AI) has been proposed as a tool to enhance diagnostic accuracy through automated image analysis.

Aim: To evaluate the performance of AI-based joint space width (JSW) measurements compared with manual methods and to assess their potential for predicting OA severity.

Methods: This cross-sectional analytical study included 39 patients aged 40 years and above with radiographically confirmed knee OA. Standard knee X-rays were analyzed manually and through an AI-based system using convolutional neural network algorithms. JSW measurements were compared using a paired-samples t-test. Correlation between KL grade and pain scores was assessed using Spearman’s rank correlation. Predictive performance of AI-based JSW for OA severity was evaluated by ROC curve analysis, and ordinal logistic regression examined associations with demographic and clinical factors.

Results: The mean manual JSW was 3.249 mm, while AI-based JSW averaged 3.079 mm. There was no significant difference between manual and AI measurements (p = 0.507). Pain scores showed no significant correlation with KL grade (r = 0.089, p = 0.588). ROC analysis demonstrated poor predictive ability of AI-based JSW alone (AUC = 0.517). Ordinal logistic regression indicated that age, gender, and AI-derived JSW were not significant predictors of OA severity.

Conclusion: AI-based analysis provides mean JSW measurements comparable to manual methods, which may help reduce observer variability. However, JSW alone has limited predictive value for OA severity, highlighting the need for multi-factorial models integrating radiographic, clinical, and demographic data for robust prediction.

Keywords

Osteoarthritis, knee joint, artificial intelligence, Kellgren–Lawrence grade, joint space width

INTRODUCTION

Osteoarthritis (OA) of the knee is a chronic degenerative condition affecting synovial joints and is a leading cause of pain and disability globally. In 2019, an estimated 528 million individuals worldwide were living with osteoarthritis, representing a striking 113% increase compared to 1990 (1). Among those affected, approximately 73% were aged 55 years or older, and women accounted for nearly 60% of the cases (1). The knee joint is the most commonly involved site, with an estimated prevalence of 365 million cases globally, followed by the hip and hand (2). Of these, nearly 344 million individuals experience moderate to severe forms of the disease that could significantly benefit from rehabilitation interventions (3). The global burden of osteoarthritis is projected to rise further in the coming decades, driven by aging populations, increasing obesity rates, and higher incidence of joint injuries. Importantly, osteoarthritis should not be regarded as an inevitable outcome of aging but rather as a complex condition influenced by multiple risk factors that can be modified to reduce its impact. The disease is characterized by progressive deterioration of articular cartilage, formation of osteophytes, subchondral sclerosis, and joint space narrowing (JSN). These structural changes compromise joint function, leading to impaired mobility and reduced quality of life, particularly among older adults and individuals with obesity. According to the World Health Organization, more than 250 million people worldwide suffer from OA, with knee OA accounting for a substantial proportion of cases. Radiographic imaging, particularly X-rays, remains the primary diagnostic modality for assessing knee OA. The Kellgren–Lawrence (KL) grading system is the most widely adopted method for categorizing OA severity, ranging from Grade 0 (normal) to Grade 4 (advanced disease). 
Although widely used, manual interpretation of radiographs has inherent limitations, including inter-observer variability and subjectivity, which may lead to diagnostic inconsistencies. Additionally, measuring joint space width (JSW), a critical indicator of cartilage loss, by hand is both time-intensive and prone to error (4). Artificial intelligence (AI) offers promising opportunities to overcome these limitations through automated and objective analysis of radiographic images (5). AI-based models, particularly convolutional neural networks (CNNs), have demonstrated high accuracy in medical image interpretation by learning complex patterns from large datasets. Automated JSW measurement can improve diagnostic reproducibility and facilitate early detection, which is essential for delaying disease progression and reducing the need for invasive treatments such as total knee replacement. Despite these advancements, there is limited evidence on how AI compares with traditional manual methods for OA detection in real-world clinical settings. This study aims to fill this gap by comparing AI-based JSW measurements with manual measurements, evaluating their predictive ability for OA severity, and analyzing associations between radiographic severity and patient-reported pain levels.

MATERIALS AND METHODS

Study Design and Setting

This was a cross-sectional analytical study conducted at SCPM Hospital, Gonda. The study included patients diagnosed with knee OA who had undergone standard anteroposterior knee radiographs.

Study Population

A total of 39 patients aged 40 years and above with confirmed radiographic knee OA (KL grades 1–4) were enrolled. Patients with a history of knee surgery, trauma-induced arthritis, or rheumatoid arthritis were excluded.

Data Collection and Image Analysis

Standard knee radiographs were assessed both manually by experienced radiologists and through AI-based automated analysis. Manual JSW was measured using standard protocols, while AI-based JSW measurement utilized convolutional neural network (CNN) algorithms combined with preprocessing steps such as noise reduction, contrast enhancement, and region-of-interest segmentation.
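The preprocessing steps named above can be illustrated with a minimal sketch. The study does not describe its actual pipeline, so everything here is an assumption made for illustration: a 3 x 3 median filter stands in for noise reduction, a min-max stretch stands in for contrast enhancement, and a fixed rectangular crop stands in for region-of-interest segmentation. The function names (median_filter, stretch_contrast, crop_roi) and the toy image are hypothetical.

```python
from statistics import median


def median_filter(img, k=3):
    """Noise reduction: replace each pixel by the median of its k x k
    neighbourhood, clamping coordinates at the image border."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            row.append(median(window))
        out.append(row)
    return out


def stretch_contrast(img, lo=0, hi=255):
    """Contrast enhancement: linearly rescale intensities to [lo, hi]."""
    flat = [p for row in img for p in row]
    mn, mx = min(flat), max(flat)
    if mx == mn:  # flat image, nothing to stretch
        return [[lo for _ in row] for row in img]
    scale = (hi - lo) / (mx - mn)
    return [[lo + (p - mn) * scale for p in row] for row in img]


def crop_roi(img, top, left, height, width):
    """Region of interest: keep a fixed rectangle (a stand-in for a real
    segmentation step, which the source does not specify)."""
    return [row[left:left + width] for row in img[top:top + height]]


# Hypothetical 4 x 4 image with one bright noise pixel at row 1, column 1.
toy_image = [
    [10, 10, 10, 10],
    [10, 200, 10, 10],
    [10, 10, 50, 50],
    [10, 10, 50, 50],
]
denoised = median_filter(toy_image)
enhanced = stretch_contrast(denoised)
roi = crop_roi(enhanced, top=2, left=2, height=2, width=2)
```

In a real system the region of interest would come from a learned or landmark-based localization of the tibiofemoral joint rather than fixed coordinates.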

Variables Assessed

Independent variables: AI-based JSW, osteophyte presence, subchondral sclerosis.
Dependent variables: OA severity (KL grade), pain score (1–10).

Statistical Analysis

Statistical analysis was performed using SPSS software. A paired-samples t-test compared manual and AI-based JSW measurements. Correlation between KL grade and pain was analyzed using Spearman’s rank correlation. Receiver operating characteristic (ROC) analysis assessed the ability of AI-based JSW (JSW_AI) to predict OA severity. Ordinal logistic regression evaluated the role of age, gender, and AI-based features in predicting OA severity. A p-value <0.05 was considered statistically significant.
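For readers who want to see what the paired comparison computes, the sketch below derives the paired t statistic from two lists of measurements taken on the same knees. It is an illustration only, not the SPSS procedure the authors ran. The function name paired_t and the sample values are hypothetical, and the p-value lookup against the t distribution is left out to keep the example within the standard library.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    s_d = stdev(diffs)  # sample SD of the pairwise differences
    if s_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (s_d / sqrt(n)), n - 1


# Hypothetical JSW readings in mm for four knees (not study data).
jsw_manual = [3.1, 2.8, 4.0, 3.5]
jsw_ai = [3.0, 3.0, 3.6, 3.3]
t_stat, dof = paired_t(jsw_manual, jsw_ai)
```

The statistic depends only on the pairwise differences, which is why the test addresses a systematic offset between methods rather than how closely individual readings match.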

RESULTS

Demographics and Baseline Characteristics

The study included 22 females (56.4%) and 17 males (43.6%), with a mean age of 57.8 ± 10.1 years. The majority of participants were in KL Grades 2 and 3 (moderate and severe OA), accounting for 24 of 39 cases (61.5%).

Table 1 Frequency Distribution of KL Grade (OA Severity)

KL Grade	Frequency (N)	Percent (%)	Valid Percent (%)	Cumulative Percent (%)
1 (Mild)	10	25.6	25.6	25.6
2 (Moderate)	12	30.8	30.8	56.4
3 (Severe)	12	30.8	30.8	87.2
4 (Advanced)	5	12.8	12.8	100.0
Total	39	100.0	100.0	100.0

Radiographic Features

Osteophytes were present in 64.1% of participants, and subchondral sclerosis was observed in 51.3%. The mean manual JSW was 3.249 mm (SD = 1.183), while the AI-based JSW was slightly lower at 3.079 mm (SD = 1.233).

Table 2 Frequency Distribution of Osteophytes Presence

Osteophytes Presence	Frequency (N)	Percent (%)	Valid Percent (%)	Cumulative Percent (%)
Absent (0)	14	35.9	35.9	35.9
Present (1)	25	64.1	64.1	100.0
Total	39	100.0	100.0	100.0

Comparison of AI vs. Manual Measurements

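The paired-samples t-test revealed no significant difference between the mean manual and AI-based JSW measurements (t = 0.670, p = 0.507), indicating no detectable systematic offset between the two methods. A non-significant mean difference does not by itself establish agreement at the level of individual knees.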

Table 3 Paired Samples Statistics for JSW (Manual vs. AI Measurements)

Pair	Variable	Mean (M)	N	Std. Deviation (SD)	Std. Error Mean (SEM)	T-value	P-value
1	JSW_Manual (mm)	3.249	39	1.1830	0.1894	0.670	0.507
1	JSW_AI (mm)	3.079	39	1.2327	0.1974
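As a reader's back-of-the-envelope check, and not a result reported by the study, the summary values in Table 3 can be used to recover the spread of the pairwise differences. The sketch assumes that the reported t-value is the standard paired statistic, t = mean difference / (SD of differences / sqrt(N)). It then forms approximate 95% limits of agreement in the Bland–Altman sense, a technique the authors did not use and which is introduced here only for illustration. All variable names are hypothetical.

```python
from math import sqrt

# Values copied from Table 3.
n = 39
mean_manual = 3.249  # mm
mean_ai = 3.079      # mm
t_reported = 0.670

# Assumption: t = mean_diff / (sd_diff / sqrt(n)), solved for sd_diff.
mean_diff = mean_manual - mean_ai
se_diff = mean_diff / t_reported
sd_diff = se_diff * sqrt(n)

# Approximate 95% limits of agreement (Bland-Altman style, normal approximation).
loa_low = mean_diff - 1.96 * sd_diff
loa_high = mean_diff + 1.96 * sd_diff

print(f"implied SD of differences: {sd_diff:.2f} mm")
print(f"approx. 95% limits of agreement: {loa_low:.2f} to {loa_high:.2f} mm")
```

Under these assumptions the implied SD of the differences is roughly 1.58 mm and the limits span roughly -2.9 to +3.3 mm. Because the inputs are rounded, these figures are approximate.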

Association Between KL Grade and Pain

Pain scores ranged from 1 to 10 (mean = 4.97 ± 2.65). Spearman’s correlation showed a weak and non-significant relationship between KL grade and pain (r = 0.089, p = 0.588), suggesting that radiographic severity does not directly predict pain intensity.

Table 4 Spearman’s Correlation Between KL Grade and Pain Level

Variable	KL Grade	Pain Level (1-10)
KL Grade	1.000	0.089
Pain Level (1-10)	0.089	1.000
Sig. (2-tailed)	—	0.588
N	39	39
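To make the statistic in Table 4 concrete, the sketch below computes Spearman's rank correlation as the Pearson correlation of average ranks, which handles the ties that ordinal KL grades and integer pain scores inevitably produce. It is an illustration under assumed data, not the SPSS output. The function names and the example values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical KL grades and pain scores for six patients (not study data).
kl_grade = [1, 2, 2, 3, 3, 4]
pain = [6, 2, 7, 3, 8, 4]
rho = spearman(kl_grade, pain)
```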

Predictive Analysis

ROC analysis for JSW_AI in predicting OA severity yielded an AUC of 0.517, indicating poor predictive capability when used as a standalone marker. Ordinal logistic regression found that age, gender, and AI-measured JSW were not significant predictors of OA severity (Nagelkerke R² = 0.117).
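As background on what an ordinal logistic model estimates, the sketch below turns a set of thresholds and coefficients into probabilities for each KL grade under the proportional-odds (cumulative logit) form, logit P(Y <= j) = theta_j - beta . x. This is one common parameterisation; the source does not state which one its software used, and the thresholds and coefficients shown are invented placeholders, not fitted values from the study.

```python
from math import exp


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + exp(-z))


def category_probs(thresholds, coefs, x):
    """Probabilities of each ordered category under a cumulative logit model.

    thresholds: increasing cut-points theta_1 < ... < theta_{K-1}
    coefs, x:   coefficient and predictor vectors of equal length
    """
    eta = sum(b * v for b, v in zip(coefs, x))
    # Cumulative probabilities P(Y <= j); the last category closes at 1.
    cum = [logistic(t - eta) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


# Hypothetical parameters for four KL grades (1 to 4) and three predictors:
# age in decades, gender (0/1), and AI-based JSW in mm.
thresholds = [-1.0, 0.0, 1.0]
coefs = [0.0, 0.0, 0.0]  # all-zero coefficients: predictors carry no signal
probs = category_probs(thresholds, coefs, x=[5.8, 1, 3.1])
```

With all coefficients at zero, the predicted grade distribution is the same for every patient, which is the situation a model with non-significant predictors approaches.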

Figure 1 The AUC = 0.517 indicates that JSW_AI measurements perform only slightly better than random guessing (AUC = 0.50) in classifying KL Grade 1 (mild OA) vs. higher grades (moderate to severe OA).
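The AUC can be read as the probability that a randomly chosen knee with higher-grade OA receives a more "severe" score than a randomly chosen knee with mild OA, with ties counted as half. The sketch below computes it by direct pairwise comparison. It assumes that narrower joint space indicates greater severity, so the score is the negated JSW. The function name auc and the example values are hypothetical, not study data.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive case scores higher; ties count 0.5."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical AI-based JSW values in mm.
jsw_higher_grade = [2.0, 2.5, 3.5]  # KL grade 2 to 4 (treated as "positive")
jsw_mild = [3.0, 4.0]               # KL grade 1 (treated as "negative")

# Negate JSW so that a narrower joint space yields a higher severity score.
example_auc = auc([-v for v in jsw_higher_grade], [-v for v in jsw_mild])
```

An AUC near 0.5, as reported in the study, means the positive case scores higher in only about half of such pairs.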

DISCUSSION

The present study evaluated the performance of AI-based joint space width (JSW) measurements compared with manual assessments and explored their potential as predictors of osteoarthritis (OA) severity. The results showed that mean AI-derived JSW measurements were comparable to those obtained manually, with no statistically significant difference between the two methods. This finding is consistent with AI-based systems being able to approximate manual measurements and supports their potential clinical utility in OA detection. By reducing inter-observer variability, a persistent limitation in musculoskeletal radiology, AI has the potential to improve diagnostic consistency and efficiency in routine clinical practice.

Comparison with Previous Studies

The comparability of AI-based and manual JSW measurements in this study aligns with prior research. Antony et al. (2019) reported that deep learning models achieved reliable performance in automated JSW quantification, showing close agreement with radiologist-derived measurements (6). Similarly, Tiulpin et al. (2018) demonstrated that multimodal deep learning approaches, which combine features such as bone shape and texture, enhance OA detection beyond traditional measurements (7). These findings collectively suggest that AI can effectively replicate manual measurements while offering improved scalability and standardization. However, despite these promising results, the predictive ability of JSW as a single parameter for OA severity was limited in our study. The observed area under the curve (AUC = 0.517) indicates that JSW alone is insufficient for accurately classifying disease severity. This observation is consistent with previous evidence highlighting that OA is a multifactorial condition involving structural, biochemical, and clinical components (6,7). Advanced models integrating additional radiographic features, such as osteophytes, bone texture, and subchondral sclerosis, have been shown to significantly outperform models that rely solely on JSW (7).

Pain and Radiographic Severity Relationship

An important finding of this study was the absence of a significant correlation between Kellgren–Lawrence (KL) grade and patient-reported pain severity. This result mirrors earlier studies that demonstrated weak or inconsistent associations between radiographic severity and clinical symptoms in knee OA (8). Pain in OA is influenced by multiple factors beyond cartilage loss, including synovial inflammation, meniscal pathology, and central pain sensitization. Moreover, psychosocial factors such as anxiety, depression, and coping strategies can further modulate pain perception, which explains why radiographic severity often fails to predict patient-reported outcomes (8). These findings underscore the need for holistic assessment strategies that incorporate both structural and symptomatic domains.

Impact of Age and Gender

Age and gender have consistently been identified as strong epidemiological risk factors for OA in large population studies (9,10). However, in this study, neither variable emerged as a significant predictor of OA severity. This discrepancy is likely attributable to the limited sample size and cross-sectional design, which restrict the ability to detect subtle associations. Prior research indicates that advancing age is associated with cumulative joint stress and cartilage degeneration, while hormonal and biomechanical differences contribute to the higher prevalence of OA among women, particularly after menopause (9,10). Future studies employing larger, more diverse cohorts may better elucidate the contribution of demographic factors to AI-based predictive models.

Clinical Implications and Integration of AI

The integration of AI-based tools in clinical radiology holds substantial promise. Automated systems can streamline the diagnostic process by providing rapid, reproducible measurements, reducing variability between observers, and supporting radiologists in high-volume settings (11). These tools can be particularly valuable in resource-limited environments where access to musculoskeletal imaging specialists is restricted. Additionally, AI-driven approaches could facilitate large-scale screening programs aimed at identifying individuals at high risk of progressive OA, thereby enabling earlier interventions such as physiotherapy, lifestyle modifications, and pharmacologic treatments. Despite these advantages, several barriers to implementation remain. A primary challenge is ensuring the generalizability of AI models across diverse patient populations and imaging conditions. Many deep learning algorithms are trained on datasets derived from specific cohorts, which may limit their performance in real-world scenarios involving different ethnicities, age groups, and radiographic protocols (11). External validation using heterogeneous datasets is therefore essential before clinical deployment. Another critical consideration is interpretability. AI systems often function as “black boxes,” making it difficult for clinicians to understand the rationale behind their predictions. Recent advances in explainable AI (XAI) have introduced visualization methods, such as saliency maps and feature attribution techniques, that can help identify which image regions influence model decisions (11). Improving transparency is crucial for fostering clinician confidence and ensuring safe integration into clinical workflows.

Limitations of the Present Study

This study has several limitations. The relatively small sample size reduces statistical power and limits the generalizability of findings. Additionally, the analysis was confined to radiographic features and did not include other relevant biomarkers such as synovial inflammation, biochemical markers, or MRI-based structural changes. MRI provides superior visualization of cartilage and soft tissue structures, which could enhance the accuracy of predictive models when combined with radiographic data (12). The cross-sectional design also precludes assessment of longitudinal progression, which is critical for evaluating the predictive performance of AI models over time.

FUTURE DIRECTIONS

Future research should focus on developing multimodal AI systems that integrate radiographic features with clinical variables, biochemical markers, and advanced imaging modalities such as MRI. Combining these diverse data sources can improve model robustness and predictive performance (12). Additionally, building large, standardized, multi-institutional datasets will be essential for improving the generalizability of AI models across varied clinical settings (13,14). Incorporating explainable AI methods will further enhance transparency and promote clinical acceptance.

CONCLUSION

AI-assisted X-ray analysis offers a reliable alternative to manual JSW measurement and can enhance standardization in OA diagnosis. However, its standalone predictive ability for OA severity is limited, emphasizing the need for multi-modal approaches. Implementing AI-based diagnostic tools in clinical practice could facilitate early detection and improve patient outcomes, provided they are validated on larger, diverse datasets.

REFERENCES

1. GBD 2019: Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019. https://vizhub.healthdata.org/gbd-results/.
2. Long H, Liu Q, Yin H, Diao N, Zhang Y, Lin J et al. Prevalence trends of site-specific osteoarthritis from 1990 to 2019: Findings from the global burden of disease study 2019. Arthritis Rheumatol 2022; 74(7): 1172-1183.
3. Cieza A, Causey K, Kamenow K, Wulf Hansen S, Chatterji S, Vos T. Global estimates of the need for rehabilitation based on the Global Burden of Disease study 2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet. 2020 Dec 19; 396(10267): 2006–2017.
4. Piccolo, C. L., Mallio, C. A., Vaccarino, F., Grasso, R. F., & Zobel, B. B. (2023). Imaging of knee osteoarthritis: a review of multimodal diagnostic approach. Quantitative imaging in medicine and surgery, 13(11), 7582–7595. https://doi.org/10.21037/qims-22-1392
5. Kumar, D., Boora, N., & Verma, M. K. (2022). Role of Artificial Intelligence in Radiography Techniques and Procedure. International Journal, 5(1), 134.
6. Antony, J., McGuinness, K., Moran, K., & O’Connor, N. E. (2019). Automated measurement of joint space width using deep learning models: Implications for osteoarthritis diagnosis. Medical Image Analysis, 56, 52–62.
7. Tiulpin, A., Thevenot, J., Rahtu, E., Lehenkari, P., & Saarakkala, S. (2018). Multimodal machine learning for OA detection using X-rays: Combining shape and texture features. Scientific Reports, 8, 17275.
8. Neogi, T., Guermazi, A., Roemer, F., Nevitt, M., Scholz, J., Arendt-Nielsen, L., ... & Felson, D. T. (2016). The relationship between radiographic severity and pain in knee osteoarthritis. Annals of the Rheumatic Diseases, 75(8), 1358–1364.
9. Hunter, D. J., & Bierma-Zeinstra, S. (2019). Osteoarthritis. The Lancet, 393(10182), 1745–1759.
10. Kloppenburg, M., & Berenbaum, F. (2020). Osteoarthritis year in review: Epidemiology and therapy. Osteoarthritis and Cartilage, 28(3), 242–248.
11. Oakden-Rayner, L., Carneiro, G., Bessen, T., Nascimento, J. C., Bradley, A. P., & Palmer, L. J. (2020). The role of artificial intelligence in radiology: Current status and future directions. Radiology, 295(2), 345–361.
12. Hunter, D. J., Guermazi, A., & Roemer, F. W. (2014). MRI features of knee osteoarthritis and their clinical relevance. Rheumatic Disease Clinics of North America, 40(4), 527–554.
13. Saleem, S., Zahid, U., & Qayyum, A. (2020). Bone shape features and OA detection using image analysis. Journal of Digital Imaging, 33(3), 640–650.
14. Reddy, B., Kumar, M., & Sharma, A. (2024). Ensemble deep learning for OA detection: Combining Xception and InceptionResNetV2. Computers in Biology and Medicine, 162, 107073.
15. Litjens, G., Kooi, T., Bejnordi, B. E., Setio, A. A., Ciompi, F., Ghafoorian, M., ... & van Ginneken, B. (2017). A survey on deep learning in medical image analysis. Medical Image Analysis, 42, 60–88.