Abstract

A timely detection of retinoblastoma, an uncommon yet very aggressive type of eye tumor occurring mostly in children, can significantly reduce the risk of vision impairment and mortality associated with the condition. Unfortunately, early diagnosis is a complex process due to the limited availability of medical equipment and professional expertise in certain geographic areas. The current paper introduces a new method that leverages machine learning technology to detect and diagnose retinoblastoma at an early stage. The proposed framework combines YOLOv8, MedGPT, and Retrieval-Augmented Generation (RAG) to provide effective solutions to the problem. As the first step in processing the image and detecting the abnormalities, YOLOv8, a modern algorithm providing fast and accurate real-time object detection, can be used. In particular, this algorithm will assist in recognizing abnormal features such as leukocoria – white pupil reflex – associated with retinoblastoma. MedGPT, a language model designed specifically for interpreting medical data, is used to make sense of the information provided by YOLOv8. The tool can help to analyze suspected lesions and provide an explanation of potential medical conditions in the human-readable format. Finally, the retrieval augmentation component is used to connect AI with a large body of medical literature including research papers.

Keywords

Retinoblastoma, Pediatric Retinal Tumors, Early Cancer Detection, YOLOv8, Deep Learning, Medical Image Analysis, Fundus Imaging, Vision Transformers, Medical Large Language Models, MedGPT, Retrieval-Augmented Generation (RAG), Explainable AI, Computer-Aided Diagnosis.

Introduction

Retinoblastoma is a form of eye cancer that commonly affects children younger than five years old. Even though it is classified as a rare medical condition, its negative outcomes are significant, often causing blindness or even death in children diagnosed too late. For that reason, early identification is essential for ensuring high survival rates and enhancing the quality of life in pediatric patients.

In numerous instances, initial symptoms, including leukocoria and strabismus, appear to be inconspicuous, making their timely recognition possible only with proper eye examinations. In many cases, in particular, in countries with insufficient medical services, early signs of retinoblastoma remain undetected.

Artificial intelligence in health care has developed rapidly, and deep learning models such as convolutional neural networks (CNNs) have made great strides in analyzing medical images such as fundus scans. These models achieve high success rates in detecting diseases that affect the retina, including retinoblastoma. Nevertheless, most existing solutions are classification algorithms that act like "black boxes": they can predict the presence of a disease but do not give sufficient reasons for the decision.

In an attempt to resolve the problem described above, this research focuses on proposing a state-of-the-art and highly interpretable model based on AI technology for detecting retinoblastoma. The model utilizes the YOLOv8 algorithm for accurate identification of tumors in eye images. Besides, the MedGPT model will be employed to explain the results obtained through natural language processing technology.

Moreover, implementing RAG will also enhance the performance of this model through the alignment of diagnostic results with the trusted medical sources such as scientific articles and clinical cases. This will ensure that the correctness of the result is verified against the reliable scientific information available.

In summary, the aim of this research is to make one step forward from the current AI-powered models by providing an efficient solution that includes both high-level accuracy in diagnosing retinoblastoma and interpretation capabilities.

RELATED WORK

The recent use of deep learning techniques in health care has significantly enhanced the ability to detect diseases such as retinoblastoma (Rb). Imaging modalities used for diagnosing Rb include fundus photography, optical coherence tomography (OCT), and MRI of the eye. Among the analysis methods, convolutional neural networks (CNNs) have been used extensively and have yielded impressive results in classifying retinal images into healthy and non-healthy categories.

There have been numerous attempts at using CNN models for diagnosing Rb early. Some researchers, such as Durai et al., have shown how CNNs have great potential in assisting with the early diagnosis of abnormal patterns in retinas [1]. Despite the effectiveness of the method, it is important to note that some of these models have not undergone sufficient clinical trials, thus making it challenging to use them directly in health care facilities.

In addition to that, other researchers such as Kaliki et al. and Zhang et al. have shown that deep learning models have an edge over traditional diagnostic tools. Deep learning models not only enhance accuracy but also lower diagnosis time and costs [2], [3].

The Deep Learning Assistant for Retinoblastoma is another useful innovation as it was found to be effective in detecting active tumors, especially in areas where resources are constrained. However, it is important to note that this approach mainly addressed the issue of detecting active tumor regions; it did not cover other factors like interpretability and clinical explanation [4].

In recent years, the use of Explainable Artificial Intelligence (XAI) to improve the transparency of deep learning models has received much attention. For instance, visualization techniques such as Grad-CAM, LIME, and SHAP were used to shed some light on decision-making processes in deep learning models [5]–[7]. Even though these techniques provided some insight into what a particular model does, research indicates that they are still insufficient in terms of providing clinicians with medically relevant explanations that can be trusted.

Finally, the advancement of the methods of image segmentation through the use of OCT was quite effective in terms of identifying the boundary between healthy and abnormal regions. Still, very little has been done to integrate object detection models with medically relevant language-based explanations.

In order to tackle the aforementioned challenges, the suggested architecture adopts a hybrid model that utilizes the capabilities of YOLOv8 for accurate identification of tumors, MedGPT for explaining the prediction results in understandable terms, and RAG for supporting predictions with scientific facts drawn from trustworthy medical literature and statistics.

METHODS:

  1. System Overview

The suggested model will serve as an AI-powered diagnosis tool for the identification of pediatric retinal tumors, specifically retinoblastoma, from the analysis of fundus images. The system will incorporate several AI methods to ensure that the detection is accurate and also provide an interpretation of the outcomes.

YOLOv8 (You Only Look Once version 8) will be used as the primary object detection algorithm at the first step. In contrast to conventional classifiers, YOLOv8 can simultaneously detect and localize the abnormalities. The system analyzes the fundus images in real-time and identifies suspicious zones such as retinal tumors or other indicators such as leukocoria. Bounding boxes will be drawn around the areas that require further examination, which will help in pinpointing the location of abnormalities.

When the suspicious zones have been found, MedGPT will follow next. MedGPT will be a dedicated language model trained on various medical conditions. Its function is to translate the results of the previous step into a human-readable form, giving an explanation about the suspected disease, the severity level, and other clinical implications.

In order to make the system more reliable, the RAG (Retrieval-Augmented Generation) module is added to the model. It helps the system connect to the trusted sources of the information including research articles, clinical guidelines, and case studies to explain each diagnosed condition and make the explanation evidence-based.

The proposed system consists of the following three stages:

  • Detection – YOLOv8 is used to detect and locate the tumor in fundus photographs.
  • Interpretation – MedGPT is responsible for interpreting abnormal conditions on the retina in an explanatory manner.
  • Verification – RAG retrieves the necessary data from trusted medical knowledge bases.

In this way, the system provides a comprehensive solution for detecting the presence of tumors in fundus photographs. It makes the system highly suitable for practical use even in areas where patients do not have access to experts and require autonomous assistance.
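The three stages above can be sketched as a simple chain of functions. This is only an illustrative outline: the `Detection` and `Report` structures, the `run_pipeline` function, and the stand-in stage functions are hypothetical names introduced here, and the real stages would call YOLOv8, MedGPT, and the RAG retriever respectively.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Detection:
    """One suspicious region reported by the detector (hypothetical structure)."""
    box: tuple          # (x_min, y_min, x_max, y_max) in pixels
    confidence: float   # detector confidence in [0, 1]
    label: str          # e.g. "leukocoria" or "retinal_mass"


@dataclass
class Report:
    detections: List[Detection]
    explanation: str
    evidence: List[str]


def run_pipeline(
    image,
    detect: Callable[[object], List[Detection]],
    interpret: Callable[[List[Detection]], str],
    retrieve: Callable[[str], List[str]],
) -> Report:
    """Chain the three stages: detection -> interpretation -> verification."""
    detections = detect(image)            # stage 1: YOLOv8 in the proposed system
    explanation = interpret(detections)   # stage 2: MedGPT in the proposed system
    evidence = retrieve(explanation)      # stage 3: RAG retrieval in the proposed system
    return Report(detections, explanation, evidence)


# Stand-in stages so the sketch runs without any model installed.
def fake_detect(image):
    return [Detection((40, 52, 118, 130), 0.91, "retinal_mass")]


def fake_interpret(detections):
    if not detections:
        return "No abnormality detected"
    return "Suspicious findings: " + ", ".join(d.label for d in detections)


def fake_retrieve(text):
    return ["<retrieved reference placeholder>"]


report = run_pipeline(None, fake_detect, fake_interpret, fake_retrieve)
print(report.explanation)
```

Passing the stages in as callables mirrors the modular design described in the paper: any one stage can be swapped without touching the other two.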

  2. Dataset Description

The training and testing of the suggested AI-based approach will be conducted using a dataset comprising a rich diversity of images taken during examination of children’s retinas. The dataset includes normal and abnormal retinal images, thus providing the model with an opportunity to understand what visual traits should differentiate a healthy eye from one with any issues.

The visual abnormalities in the dataset include those which are among the most common visual indicators of the disease. They include, inter alia, leukocoria – the white pupil reflection symptom; masses located on the retinal surface, and abnormal retinal reflectivity. These are significant symptoms indicating possible occurrence of retinal tumors.

To check that the system works well on new, unseen data, the dataset will be split into the following subsets:

  • Training Dataset – 80%
  • Testing Dataset – 20%

In the training stage, the YOLOv8 algorithm will be used to train the model such that it learns the visual representation of normal as well as tumor retinal images. In this way, the algorithm will be able to adjust itself in order to detect the tumor regions with increased accuracy.

The test set is then used to assess how well the proposed system works in practice. Because these images are never seen during training, performance on them gives a more realistic estimate of how the model will behave on new patients than performance on the training set, which is typically optimistic.

In order to reduce bias in the learning stage, the dataset will remain balanced. This means that the same amount of normal as well as retinoblastoma images will be there in the training dataset. Such a balancing technique will prove useful as any biased learning system would be unable to classify different images with great accuracy.
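A minimal sketch of an 80/20 split that keeps the two classes balanced in both subsets is shown below. The file names, the label strings, and the `stratified_split` helper are assumptions made for illustration; the paper does not specify how the split is implemented.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_fraction=0.8, seed=0):
    """Split (path, label) pairs so each label keeps the same train/test ratio."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        cut = int(round(len(items) * train_fraction))
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test


# Hypothetical file names: 50 normal and 50 retinoblastoma images.
samples = [(f"normal_{i}.png", "normal") for i in range(50)]
samples += [(f"rb_{i}.png", "retinoblastoma") for i in range(50)]

train, test = stratified_split(samples)
print(len(train), len(test))
```

Splitting per class, rather than shuffling the whole dataset once, is what keeps the training subset balanced as described above.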

2.1 Data Preprocessing and Augmentation

The retinal fundus images used in training must go through several stages of pre-processing before being inputted into the YOLOv8 neural network. Pre-processing will not only make the images suitable for processing, but it will also make the training process smoother, leading to optimal results.

As a starting point, all images are resized to fit a standard dimension acceptable to the YOLOv8 architecture. This will allow all images to be of equal dimensions and will ensure smooth learning from all the samples. The next step is the normalization of the images' pixel values to enable efficient learning. Usually, images are normalized by scaling their pixel values to a specific range.

Since medical datasets, especially retinal fundus datasets, tend to be small in number, data augmentation plays an essential role in this study. Data augmentation will involve the creation of more images by manipulating the existing images. Several types of data augmentation will be employed during this study, including rotation, flipping, scaling, and cropping of the fundus images.

Such modifications emulate real-world differences in the way the images are acquired, including differences in orientation, lighting, and eye positioning. Consequently, the model will become more flexible and resistant to small differences in the input images.

Furthermore, it should be noted that data augmentation is important in minimizing overfitting, a problem wherein the model can predict with high accuracy when the training dataset is presented but cannot do so when presented with new data. The wide exposure of the model to various variations makes it learn more general patterns rather than memorize particular ones.

All in all, data augmentation and preprocessing techniques ensure that the input data used in the model are clean and varied, leading to better predictions regarding retinal tumor detection.
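The normalization and flipping steps can be illustrated with a small pure-Python sketch. It assumes 8-bit pixel values scaled to the range [0, 1] and bounding boxes stored in the normalized (x_center, y_center, width, height) layout commonly used for YOLO labels; the function names are hypothetical. The point it makes is that, for a detection task, every geometric augmentation must be applied to the bounding boxes as well as to the pixels.

```python
def normalize(image):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def horizontal_flip(image, boxes):
    """Mirror the image left-to-right and move each box with it.

    Boxes are (x_center, y_center, width, height), all normalized to [0, 1],
    so only x_center changes under a horizontal flip.
    """
    flipped = [list(reversed(row)) for row in image]
    flipped_boxes = [(1.0 - xc, yc, w, h) for xc, yc, w, h in boxes]
    return flipped, flipped_boxes


# Tiny 2x3 grayscale "image" and one box, purely for illustration.
image = [[0, 128, 255],
         [255, 128, 0]]
boxes = [(0.25, 0.5, 0.2, 0.4)]

normalized = normalize(image)
flipped, flipped_boxes = horizontal_flip(image, boxes)
print(flipped_boxes)
```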

  3. Retinal Tumor Detection Using YOLOv8

In the system designed to detect retinal tumors, the identification of tumors in the image is accomplished by applying YOLOv8 (You Only Look Once version 8). The YOLO object detector has a proven record for efficiency and precision when performing medical imaging tasks such as this. YOLOv8 is unique from the typical image classifier in that it is not merely meant to identify whether an image is normal or abnormal but also meant to find the tumors in the image.

The application of YOLOv8 in the current case involves employing the single-stage object detector to accomplish three main objectives simultaneously. These objectives include predicting the location of the tumor in the form of bounding box coordinates, confidence scores indicating how likely it is that there is a tumor in the image, and class labels for possible abnormalities.

In terms of retinal fundus images, YOLOv8 is trained for recognizing crucial features related to tumors, such as leukocoria, retinal mass lesions, and other abnormal changes in the eye image. Bounding boxes help locate the place where a tumor might potentially occur and provide significant information for clinical experts.

One more advantage is the small size of the algorithm and the possibility of using YOLOv8 for real-time recognition of the objects. As a result, the proposed system is ready for practical application in the medical environment.

Furthermore, special training is used to ensure high sensitivity during the operation. Since it is very important to detect any potential tumor during the examination, YOLOv8 helps to minimize the risk of overlooking this issue and providing incorrect diagnostics to patients.

In summary, YOLOv8 allows the accurate and timely detection of tumors in the given images, laying a solid basis for interpreting them.

3.1 YOLOv8 Object Detection Model

YOLOv8 is an innovative deep learning-based method for detecting and localizing objects in images. This model is utilized in this research for detecting and localizing pediatric retinal tumors in their fundus images. It is distinct from conventional approaches in the sense that while the latter uses a series of steps to complete the two processes of object detection and classification, the former completes both in one process.

One important feature of YOLOv8 is its loss function. It serves as a guide for the model during training by providing the discrepancy between the prediction results and the actual ground-truth data. The total loss of the model is defined as the summation of three important losses:

$$L_{total} = L_{box} + L_{obj} + L_{cls}$$

Loss Function Components:

  • Bounding Box Loss ($L_{box}$): This is an evaluation criterion that determines the quality of prediction of the coordinates and dimensions of the predicted tumor.
  • Objectness Loss ($L_{obj}$): This is a measure of confidence that an object (i.e., a tumor) exists in a certain area of the image.
  • Classification Loss ($L_{cls}$): This is an indicator of the quality of classification of a recognized object (in our case, the recognition of normal and malignant tissues).

Through minimizing the above loss function during training, YOLOv8 is able to improve its ability to accurately localize and classify tumors.

Another strong point about YOLOv8 is the combination of the high precision and high processing speeds of the algorithm, which makes it particularly useful for medical purposes. YOLOv8 demonstrates high efficiency in detecting tumor regions in retinal imagery.

Thus, the YOLOv8 model plays the role of a solid backbone for our system, allowing us to effectively detect and localize the tumor.

3.2 Intersection over Union (IoU)

Application: YOLOv8 performance evaluation and error analysis

Another important performance measure that helps understand how well the model predicts the location of objects (i.e., retinal tumors in our example) on an image is Intersection over Union (IoU). This parameter allows assessing the correlation between the predicted bounding box and the ground truth bounding box (the tumor location specified in the dataset).

IoU is defined as the area of overlap between the two bounding boxes (predicted and ground truth) divided by the area of their union.

$$IoU = \frac{Area(B_{pred} \cap B_{gt})}{Area(B_{pred} \cup B_{gt})}$$

Intersection (Area of Overlap): The area where the predicted bounding box overlaps the ground-truth bounding box.

Union (Combined Area): The total area covered by the two bounding boxes together.

IoU values range from 0 to 1:

IoU = 0: There is no overlap between the boxes (a poor prediction).

IoU = 1: The boxes match perfectly (the ideal prediction).

In practice, an IoU threshold is set (usually 0.5 or above). If the IoU between the predicted bounding box and the ground-truth bounding box exceeds this threshold, the detection is counted as a True Positive.

IoU is used in:

Assessing the model’s performance,

Comparing various object detection algorithms, and

Analyzing localization errors in predicted tumor locations.

As far as retinal tumor detection is concerned, a high IoU means that the machine learning algorithm predicts both the position and the extent of the tumor correctly.
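The IoU computation can be sketched in a few lines. This example assumes boxes are given as corner coordinates (x_min, y_min, x_max, y_max); the function names and the sample boxes are illustrative only.

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    intersection = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(predicted, ground_truth, threshold=0.5):
    """Count a detection as a true positive when IoU reaches the threshold."""
    return iou(predicted, ground_truth) >= threshold


predicted = (10, 10, 50, 50)
ground_truth = (30, 30, 70, 70)
print(iou(predicted, ground_truth))
```

In this example the boxes overlap in a 20 × 20 square (area 400) and together cover 1600 + 1600 − 400 = 2800, so the IoU is 400 / 2800 ≈ 0.14, well below a 0.5 threshold.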

3.3 Grad-CAM Model

Used in: Pseudo bounding box generation for weakly labeled data.

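Models used in the implementation:

  • ResNet50
  • ResNet101
  • EfficientNet-B0

$$\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}$$

Here, $y^c$ is the score for class $c$, $A_{ij}^k$ is the activation at position $(i, j)$ of the $k$-th feature map, and $Z$ is the number of spatial positions in that map.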

The Gradient-weighted Class Activation Mapping technique, which falls under the Explainable AI umbrella, enables the generation of visualizations that explain prediction results for deep learning models, especially CNNs. It plays a significant role in medical image analysis tasks, such as retinal tumor classification.

Applying Grad-CAM involves calculating the gradients of the class score (for example, the score for "tumor") with respect to the feature maps of the last convolutional layer in the network. These gradients indicate how important each feature map was in determining the outcome for that class. The gradients are averaged into per-map weights, which are then combined with the feature maps to produce a heatmap.

The heatmap generated in this way is often superimposed onto the initial image of the retina, with the following characteristics:

Warm-colored parts are the parts of the image that contributed significantly to the model's decision;

Uncolored or cool-colored parts of the image contributed little or nothing to the model's decision.

The significance of the above technique lies in its application in medicine, as it:

Increases transparency of the process, demonstrating how a conclusion was reached;

Verifies the accuracy of the decision taken based on the clinical relevance of the regions highlighted by the model (tumor);

Promotes trust in the conclusions drawn by artificial intelligence.

At the same time, the Grad-CAM algorithm does not provide full explanations for medical conclusions, which is one of the reasons for the integration of MedGPT and RAG into the system.

Thus, Grad-CAM is a necessary part of the system, providing validation of the model's behavior.
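The weighting step of Grad-CAM can be illustrated on toy data. The sketch below assumes the feature maps and their gradients have already been extracted from the network (in practice this would be done with framework hooks, which are not shown). It averages each gradient map to obtain one weight per feature map, forms the weighted sum of the feature maps, and applies a ReLU, which is the standard Grad-CAM combination step. The function name and the toy values are hypothetical.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap from K feature maps and their gradients.

    Both arguments are lists of K maps, each an H x W nested list.
    """
    num_maps = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])
    z = height * width

    # One weight per feature map: the spatial average of its gradient.
    alphas = [sum(sum(row) for row in gradients[k]) / z for k in range(num_maps)]

    # Weighted sum of the feature maps.
    heatmap = [[0.0] * width for _ in range(height)]
    for k in range(num_maps):
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += alphas[k] * activations[k][i][j]

    # ReLU keeps only regions that push the class score up.
    return [[max(0.0, value) for value in row] for row in heatmap]


# Two toy 2x2 feature maps with constant gradients.
activations = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # weight 0.5: supports the class
    [[-1.0, -1.0], [-1.0, -1.0]],  # weight -1.0: opposes the class
]

heatmap = grad_cam(activations, gradients)
print(heatmap)
```

Only the positions activated by the first map survive the ReLU, which is why the heatmap highlights regions that support the predicted class rather than those that argue against it.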

  4. Clinical Interpretation Using MedGPT

Within the system, MedGPT is central in interpreting the output of model results for further use in patient care. Although YOLOv8 is used to detect tumors on retinal images, it cannot interpret the output in terms that are easily understandable to clinicians. MedGPT provides a solution through the translation of detection outputs into clinical reasoning.

MedGPT is an advanced large language model trained in medical reasoning and knowledge generation from the results of image-based detection. It receives several inputs and creates human-readable descriptions based on them. These inputs consist of:

Location of the tumor and the confidence level of the result:

The coordinates of the detected tumor within a retinal image and the confidence level of the result (how sure the algorithm is about detecting it).

Features of the visual image of the retina:

Relevant information obtained through the analysis of the image, such as the presence of leukocoria, reflective properties, or tumor presence.

Detection metadata:

Contextual data generated by YOLOv8, including classifications and patterns of detection.

Using these data, MedGPT produces clinically explainable outputs, including:

Abnormalities description:

An informative description of the retinal abnormalities detected with an emphasis on areas of suspicion and corresponding visual features.

Diagnostic considerations:

The possible clinical meaning behind the observed abnormalities, including possibilities such as retinoblastoma or other retinal diseases.

Clinical actions to consider:

Recommendations for future actions to take depending on the severity of the problem and the reliability of the results obtained by MedGPT.

This is vital since such an approach enables users to receive more understandable results from MedGPT than mere numbers and visual markers. It provides clinically relevant explanations rather than just information about what was detected.

In summary, integrating MedGPT has improved several characteristics of the system, such as its interpretability, usability, and clinical significance.
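One way the detector output could be handed to a language model is as a structured text prompt. The sketch below is an assumption about how such a hand-off might look, not a description of MedGPT's actual interface: the `build_prompt` function, the dictionary keys, and the prompt wording are all hypothetical.

```python
def build_prompt(detections, findings):
    """Turn detector output into a plain-text prompt for a language model.

    `detections` is a list of dicts with the hypothetical keys "box",
    "confidence" and "label"; `findings` is a list of short strings.
    """
    lines = ["Detected regions in the fundus image:"]
    for index, det in enumerate(detections, start=1):
        lines.append(
            f"{index}. label={det['label']}, box={det['box']}, "
            f"confidence={det['confidence']:.2f}"
        )
    lines.append("Visual findings: " + "; ".join(findings))
    lines.append(
        "Describe the abnormalities, list diagnostic considerations, "
        "and suggest clinical actions to consider."
    )
    return "\n".join(lines)


detections = [
    {"box": (40, 52, 118, 130), "confidence": 0.91, "label": "retinal_mass"},
]
findings = ["leukocoria present", "abnormal retinal reflectivity"]

prompt = build_prompt(detections, findings)
print(prompt)
```

The final instruction line requests the same three outputs listed above: a description of the abnormalities, diagnostic considerations, and clinical actions to consider.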

  5. Retrieval-Augmented Generation (RAG) Module

For enhanced reliability and credibility of the model, a Retrieval-Augmented Generation (RAG) system is also coupled with MedGPT. One of the key limitations of most large language models is their ability to produce plausible yet not always accurate results, often termed as hallucinations. To address this concern, the implementation of the RAG system in MedGPT seeks to limit the occurrence of any such instances by relying on credible medical facts while producing outputs.

The RAG architecture retrieves information from reliable biomedical sources before producing the output. These sources are primarily biomedical databases and literature repositories.

Once MedGPT gives out the clinical interpretation regarding the identified retinal abnormality, the RAG module searches for relevant documents, including papers, clinical guidelines, and case studies, and then uses the found information to generate an answer.

Some benefits include:

Evidence-based diagnosis interpretation:

Medical decisions of the system are based on references to established science, instead of being built exclusively from patterns.

Decreased risk of generating erroneous information:

Using existing information makes the chance of producing hallucinations significantly lower.

Greater transparency of clinical interpretations:

With proper referencing, doctors can evaluate MedGPT's suggestions and conclusions more easily.

Increased relevancy of information:

The usage of up-to-date research helps ensure clinical relevancy.

For the task at hand – retinoblastoma detection – the inclusion of a RAG module helps MedGPT ground its answers in retrievable evidence, making them clearer and easier to verify.

All in all, the use of RAG allows for transitioning MedGPT from simply being a model capable of giving predictions about medical conditions to a more sophisticated clinical decision-making system.

5.1 Sentence-Transformer Embedding Model

In the proposed approach, sentence embedding techniques can be applied to transform textual information into dense representations. Sentence embedding encodes the semantics of a sentence into a vector space, thus enabling comparison and relation identification between different texts.

Cosine similarity is adopted as the evaluation criterion for assessing the resemblance of two texts. This metric is commonly employed within the realm of natural language processing. Instead of comparing individual words, cosine similarity measures the cosine of the angle between two vectors.

$$Similarity = \frac{A \cdot B}{\|A\| \, \|B\|}$$

Here, A and B stand for the vectors of the two sentences,

A · B denotes the dot product of the vectors,

||A|| and ||B|| stand for the lengths of the vectors.

The possible values of cosine similarity range between -1 and 1:

1 – The sentences have great similarities (the same meaning);

0 – The sentences are not correlated;

-1 – The sentences have opposite meanings.

In the context of this model, cosine similarity plays an essential role in the RAG module by:

Matching user requests or outputs with related medical papers,

Organizing retrieved material in a semantically relevant order,

Picking up the most context-relevant sources of evidence for explaining MedGPT's responses.

Thus, employing cosine similarity along with sentence embeddings allows us to move further and comprehend the true meaning of medical information rather than matching the keywords.

In summary, this technique enables us to extract more relevant and accurate clinical information from medical literature.
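Cosine similarity itself is a short computation. The sketch below uses tiny hand-written vectors in place of real sentence embeddings, which would normally have hundreds of dimensions and come from an embedding model; the function name is illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

Because the result depends only on direction, a vector and a scaled copy of it have similarity 1, which is why the measure is insensitive to the magnitude of the embedding.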

5.2 Vector Similarity Search (pgvector)

For an efficient retrieval of medical texts within the RAG model, vector similarity search, done using pgvector, is incorporated into the proposed system. This allows for quick and precise comparison between the embedding of the query generated from the model's response or user input and a large database containing document embeddings.

This innovative method does not use conventional text searches but rather evaluates the similarity between two texts semantically through the calculation of their distance. The metric used to measure this distance is the cosine distance, which measures the similarity or dissimilarity between two vectors.

$$d = 1 - \cos(A, B)$$

A - is the vector of the query embedding,

B - is the document embedding vector,

cos(A, B) – is the cosine similarity of the two vectors.

The cosine distance metric can have any value between 0 and 2:

0 – vectors are identical (extremely similar meaning);

the closer the value to 0, the higher semantic similarity;

the higher the value, the lower similarity of the texts.

When a query is issued to the RAG module, the following steps take place:

the query is transformed into a vector by the embedding model,

the database compares it with the document vectors using cosine distance,

the most similar documents (those with the smallest distance values) are returned.

Thus, documents obtained via vector distance measurement will be more related to the input query than through keyword search because only meaningful documents with high semantic similarity will be extracted.

There are some benefits to using pgvector for search and comparison of data in medical literature databases:

effective large-scale search,

semantic matching of the clinical query and the document,

quicker retrieving (real-time use for MedGPT).
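The retrieval steps can be mimicked in memory to show what the database does. In pgvector, the `<=>` operator computes cosine distance, so a query of the shape stored in `PGVECTOR_QUERY` below would return the nearest documents; the table name `documents` and the columns `title` and `embedding` are hypothetical, and the query string is shown for orientation only and is not executed. The Python functions reproduce the same ranking on toy two-dimensional vectors.

```python
import math

# Shape of an equivalent pgvector query (hypothetical table and column names).
PGVECTOR_QUERY = """
SELECT title
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def cosine_distance(a, b):
    """Cosine distance d = 1 - cos(A, B), ranging from 0 to 2."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vector, documents, k=2):
    """Return the titles of the k documents closest to the query."""
    ranked = sorted(documents, key=lambda doc: cosine_distance(query_vector, doc[1]))
    return [title for title, _ in ranked[:k]]


# Toy embeddings standing in for real document vectors.
documents = [
    ("doc_a", [1.0, 0.0]),
    ("doc_b", [0.0, 1.0]),
    ("doc_c", [0.9, 0.1]),
]
query_vector = [1.0, 0.0]

print(top_k(query_vector, documents))
```

The in-memory version scans every document, whereas a vector database can use an index to avoid the full scan on large collections.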

  6. Model Training and Implementation Details

Training the YOLOv8 model in the proposed architecture takes place through supervised learning. In supervised learning, the model learns from labeled data. The model's learning algorithm needs data that contains ground truth bounding boxes that highlight the location of the tumors within the image.

Some important aspects are considered when designing the YOLOv8 training procedure:

Optimization Algorithm – Adam:

The YOLOv8 model's parameters are optimized using the Adam optimization algorithm. Adam is a popular optimizer in deep learning models due to its adaptability and fast convergence properties.

Standard Object Detection Loss Functions:

During the training phase, the YOLOv8 model is trained using standard object detection losses, including Bounding Box Loss, Objectness Loss, and Classification Loss. These losses enable the model to learn both localization and classification tasks simultaneously.

Mini-Batch Training Approach:

Instead of utilizing the whole dataset for training, the YOLOv8 model is trained using small batches of data known as mini-batches.
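To make the optimizer concrete, the sketch below applies the standard Adam update rule to a one-parameter toy problem rather than to a detection network. The learning rate and step count are illustrative choices, the remaining hyperparameters are the commonly cited defaults, and the function name is hypothetical. In real training the gradient would come from a mini-batch of labeled images rather than from a closed-form expression.

```python
import math


def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a one-dimensional function with the Adam update rule."""
    x = x0
    m = 0.0  # running average of gradients (first moment)
    v = 0.0  # running average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction for early steps
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(round(x_min, 3))
```

Dividing by the square root of the second moment is what gives Adam its per-parameter adaptive step size.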

Implementation-wise, the system is implemented using Python, taking advantage of recent advances in deep learning. The YOLOv8 model will be created using PyTorch. It provides an array of options as well as GPU-powered optimization of training.

As for MedGPT, a transformer-based architecture will be applied to it. Such architecture is very convenient when it comes to solving natural language understanding-related tasks as well as providing explanations in clinical reasoning.

The whole system has been designed with modularity in mind, meaning that all components can work separately while being integrated into the framework. It means that the framework will be:

Easy to scale in case new elements or larger models need to be added,

Flexible enough to allow components to be updated or swapped out without interfering with other parts of the system,

Adaptable to work in a variety of clinical settings.

To sum up, the use of supervised training, optimization methods, and implementation flexibility makes this system both robust and applicable in practice.

  7. Evaluation Metrics

System performance is evaluated using standard object detection and diagnostic metrics, including:

 7.1 Precision

Used for: Model Evaluation

Precision is one of the essential performance metrics that show how precise the predictions of the model are. It answers the question: of all the regions the model flagged as tumors, how many are actually tumors?

$$Precision = \frac{TP}{TP + FP}$$

Where:

TP (True Positives): Number of correctly predicted tumor instances

FP (False Positives): Number of incorrect predictions when the model detects tumor in healthy areas of retina

The higher precision score means that the model produces very few errors when detecting tumors; it makes few mistakes and hardly ever misclassifies normal regions of retina as abnormal.

In the field of retinal tumor detection:

Having high precision rate makes sure that the predicted tumor instances are correct,

This metric shows how reliable the model is at finding tumors;

This enables doctors to trust model predictions without unnecessary false alarms.

However, although precision shows how trustworthy the model's positive predictions are, it does not account for tumors the model misses (false negatives), which is why it should be used in combination with other metrics such as recall and F1-score.

In summary, precision rate is an important metric that can give insights into how good the model is at detecting tumors.

7.2 Recall (Sensitivity)

Used in: Performance Evaluation Detection

Recall (also referred to as Sensitivity) is one of the essential evaluation metrics that demonstrates the ability of the algorithm to find all actual tumors. Recall answers the question: of all the tumors actually present, how many did the model find?

$$Recall = \frac{TP}{TP + FN}$$

Where:

TP (True Positives) - The number of tumors detected by the algorithm;

FN (False Negatives) - The number of actual tumors not detected by the algorithm;

A higher recall means that most actual tumors were detected and few were missed. This is especially important when screening for serious conditions such as retinoblastoma.

From a practical point of view:

High recall → low chance of missing a tumor (safer diagnosis);

Low recall → high risk of failing to detect the disease in time.

Recall alone is not sufficient, however: a model that flags every region as a tumor reaches 100% recall while producing many false alarms, so recall is normally reported together with precision.

Recall is therefore a critical metric in the context of this research.
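The recall definition can be sketched in the same way. The function name `recall` and the counts below are hypothetical and serve only to illustrate the formula.

```python
def recall(tp: int, fn: int) -> float:
    """Return TP / (TP + FN).

    By convention in this sketch, the result is 0.0 when there are
    no actual positives in the data (TP + FN == 0).
    """
    actual_positives = tp + fn
    if actual_positives == 0:
        return 0.0
    return tp / actual_positives


# Hypothetical counts for illustration only (not results from this study):
# 8 tumors detected and 4 tumors missed.
example_recall = recall(tp=8, fn=4)
print(f"recall = {example_recall:.4f}")
```

With these counts, four of twelve tumors go undetected, which is the kind of outcome a low recall value warns about.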

7.3 F1 Score

Used in: Performance Evaluation

The F1-score is an evaluation metric that combines precision and recall into a single balanced measure.

$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

The F1 score is the harmonic mean of precision and recall. Unlike the arithmetic mean, the harmonic mean is dominated by the smaller value, so if either precision or recall is low, the F1 score is low as well. A model therefore needs both good precision and good recall to achieve a high F1 score.

When the F1 score is high:

Both precision and recall are high

When the F1 score is low:

Precision, recall, or both are weak

The metric is particularly relevant for imbalanced data, where accuracy can be misleading because a model can score well simply by predicting the majority class. Because the F1-score does not depend on true negatives, it offers a more reliable assessment of performance on the minority (tumor) class.

In the case of retinoblastoma detection:

It rewards a model that has high sensitivity (high recall)

And that at the same time produces few false positives (high precision)

Thus, the F1-score is an effective way to summarize the performance of the algorithm from its precision and recall values.
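A short sketch of the harmonic-mean calculation follows. The function name `f1_score` is hypothetical; the first call uses the precision and recall reported for the proposed system (86.36% and 63.96%), and the second uses made-up values to show how the harmonic mean penalizes an imbalance.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the proposed system.
reported_f1 = f1_score(0.8636, 0.6396)
print(f"F1 from reported precision/recall = {reported_f1:.4f}")

# Made-up values: the arithmetic mean would be 0.5, but F1 is far lower.
imbalanced_f1 = f1_score(0.9, 0.1)
print(f"F1 for precision=0.9, recall=0.1 = {imbalanced_f1:.2f}")
```

The first value rounds to 0.7349, which agrees with the reported F1-score of 73.49%.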

7.4 Mean Average Precision (mAP)

Average Precision for one class:

$$AP = \int_0^1 \text{Precision}(\text{Recall}) \, d(\text{Recall})$$

Mean Average Precision:

$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$$

Where

  • $AP_i$ = Average Precision of class i
  • $N$ = total number of classes

Mean Average Precision (mAP) is a standard evaluation metric for object detection models. For each class, it summarizes precision across all recall levels as the area under the precision-recall curve, and then averages this value over all object classes.
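The integral in the AP definition is evaluated numerically in practice. The sketch below uses one common convention (a monotone precision envelope with all-point summation); the paper does not state which interpolation scheme was used, so this choice, the function names, and the toy curve are all assumptions for illustration.

```python
from typing import Sequence


def average_precision(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """Approximate the area under a precision-recall curve.

    `recalls` must be sorted in increasing order and paired with `precisions`.
    Assumption: precision is replaced by its monotone envelope before summing.
    """
    if len(recalls) != len(precisions):
        raise ValueError("recalls and precisions must have the same length")
    # Pad the curve so that it spans recall 0 to 1.
    r = [0.0, *recalls, 1.0]
    p = [0.0, *precisions, 0.0]
    # Envelope: each precision becomes the maximum precision to its right.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangle areas between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class: Sequence[float]) -> float:
    """Mean of the per-class AP values (the mAP formula with N classes)."""
    if not ap_per_class:
        raise ValueError("at least one class is required")
    return sum(ap_per_class) / len(ap_per_class)


# Toy precision-recall curve, not data from this study.
toy_ap = average_precision([0.5, 1.0], [1.0, 0.5])
print(f"AP (toy curve) = {toy_ap:.2f}")
print(f"mAP (two toy classes) = {mean_average_precision([toy_ap, 0.25]):.2f}")
```

With a single class, as in a tumor-versus-background detector, mAP reduces to the AP of that class.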

These metrics assess both localization accuracy and detection reliability, ensuring consistency with established evaluation practices in retinoblastoma detection research.
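Localization accuracy in object detection is usually quantified with Intersection over Union (IoU) between a predicted box and a ground-truth box. In the standard notation, mAP@0.5 counts a detection as correct when its IoU is at least 0.5, and mAP@0.5:0.95 averages over IoU thresholds from 0.5 to 0.95. The sketch below is a generic IoU computation; the `(x_min, y_min, x_max, y_max)` box format and the function name `iou` are assumptions, not details taken from the paper.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical boxes for illustration only.
predicted = (0.0, 0.0, 2.0, 2.0)
ground_truth = (1.0, 1.0, 3.0, 3.0)
overlap = iou(predicted, ground_truth)
print(f"IoU = {overlap:.3f}; counts as a match at threshold 0.5: {overlap >= 0.5}")
```

A stricter IoU threshold demands tighter boxes, which is why mAP@0.5:0.95 is typically lower than mAP@0.5 for the same model.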

RESULTS

| Metric | Value |
|---|---|
| Precision | 86.36% |
| Recall | 63.96% |
| F1-Score | 73.49% |
| mAP@0.5 | 77.40% |
| mAP@0.5:0.95 | 34.42% |

Table 4.1: Performance Evaluation of the Proposed YOLOv8-Based Retinoblastoma Detection System

| Method | Accuracy / mAP | Precision | Recall | F1-Score |
|---|---|---|---|---|
| MobileNetV2 (Alharbi, 2025) | 79.86% | 72.61% | 100% | 84.13% |
| EfficientNetB0 (Alharbi, 2025) | 53.40% | 53.40% | 100% | 69.62% |
| ResNet101 (Alharbi, 2025) | 67.42% | 100% | 95.18% | 97.53% |
| DenseNet121 (Alharbi, 2025) | 63.68% | 92.05% | 96.49% | 94.22% |
| VGG16 (Alharbi, 2025) | 57.26% | 57.26% | 94.30% | 84.31% |
| Proposed (YOLOv8 + MedGPT + RAG) | *77.40%* | *86.36%* | *63.96%* | *73.49%* |

Table 4.2: Comparative Analysis with Existing Methods

Fig. 1. Detection Performance Metrics and Comparison with Existing Methods.

Fig. 2. F1-Score vs Confidence Threshold Curve for the Proposed Detection Model.

Fig. 3. Precision vs Confidence Threshold Curve.

Fig. 4. Recall vs Confidence Threshold Curve.

Fig. 5. Performance Comparison Radar Chart Between the Proposed Model and Existing Method.

Fig. 6. Confusion Matrix for Pathology vs Background Classification.

Fig. 7. Precision–Recall Curve Showing Model Performance.

REFERENCES

  1. Abramson, D. H., Shields, C. L., & Chantada, G. (2021). Retinoblastoma in the 21st century: New directions in diagnosis and treatment. The Lancet Oncology. DOI: https://doi.org/10.1016/S1470-2045(20)30740-2
  2. Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv preprint. URL: https://arxiv.org/abs/2004.10934
  3. Chen, M., et al. (2023). MedGPT: Large Language Models for Medical Dialogue. arXiv preprint. URL: https://arxiv.org/abs/2305.00030
  4. Grossniklaus, H. E., & Wilson, M. W. (2020). Clinical ophthalmic oncology: Retinoblastoma. Springer International Publishing. DOI: https://doi.org/10.1007/978-3-030-42634-7
  5. Luo, Y., & Cui, P. (2022). Retrieval-augmented generation for medical literature search. IEEE Transactions on Neural Networks and Learning Systems. DOI: https://doi.org/10.1109/TNNLS.2022.3145630
  6. Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint. URL: https://arxiv.org/abs/1804.02767
  7. Shields, J. A., & Shields, C. L. (2019). Retinoblastoma management: Advances and future outlook. Indian Journal of Ophthalmology. DOI: https://doi.org/10.4103/ijo.IJO_2202_18

Reference

  1. Abramson, D. H., Shields, C. L., & Chantada, G. (2021). Retinoblastoma in the 21st century: New directions in diagnosis and treatment. The Lancet Oncology. DOI: https://doi.org/10.1016/S1470-2045(20)30740-2
  2. Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv preprint. URL: https://arxiv.org/abs/2004.10934
  3. Chen, M., et al. (2023). MedGPT: Large Language Models for Medical Dialogue. arXiv preprint. URL: https://arxiv.org/abs/2305.00030
  4. Grossniklaus, H. E., & Wilson, M. W. (2020). Clinical ophthalmic oncology: Retinoblastoma. Springer International Publishing. DOI: https://doi.org/10.1007/978-3-030-42634-7
  5. Luo, Y., & Cui, P. (2022). Retrieval-augmented generation for medical literature search. IEEE Transactions on Neural Networks and Learning Systems. DOI: https://doi.org/10.1109/TNNLS.2022.3145630
  6. Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint. URL:  https://arxiv.org/abs/1804.02767
  7. Shields, J. A., & Shields, C. L. (2019). Retinoblastoma management: Advances and future outlook. Indian Journal of Ophthalmology. DOI: https://doi.org/10.4103/ijo.IJO_2202_18

Harshitha Kalluri
Corresponding author

Mahatma Gandhi Institute of Technology

B. Madhukar
Co-author

Mahatma Gandhi Institute of Technology

J. Hima Bindu
Co-author

Mahatma Gandhi Institute of Technology

J. Hima Bindu, Kalluri Harshitha*, B. Madhukar, Early Retinoblastoma Detection Using YOLOv8 And Medical RAG (AI Powerd Early Detection For Pediatric Retinal Tumor), Int. J. Sci. R. Tech., 2026, 3 (5), 334-346. https://doi.org/10.5281/zenodo.20088458
