Semantic segmentation is a fundamental computer vision task that involves classifying each pixel in an image into object categories such as "person", "car", or "building". Unlike image classification, which assigns a single label to a whole image, semantic segmentation provides a comprehensive understanding of the scene by dividing it into meaningful parts based on object categories. In computer vision, semantic segmentation [2] plays an important role in different disciplines such as autonomous driving and medical imaging. It aids object detection and recognition systems, and the accurate delineation of object boundaries improves localization and identification accuracy. It also applies to tasks involving scene understanding, such as autonomous driving, robotics, surveillance systems, and scene parsing, since it helps machines efficiently understand spatial layouts and semantic contexts. In medical imaging, semantic segmentation enhances diagnosis, treatment planning, and medical research through the analysis of MRI and CT images and histopathology slides. In the field of semantic segmentation, various innovative strategies have evolved to improve accuracy and efficiency. Fully Convolutional Networks (FCNs) [24] replace fully connected layers with convolutional layers to make spatially dense predictions. U-Net, well known for its use in medical imaging, employs an encoder-decoder design with skip connections to preserve localization and contextual information. SegNet has a similar design but upsamples using max-pooling indices from the encoder stage. DeepLab [26] employs atrous convolutions to gather multi-scale contextual information and Atrous Spatial Pyramid Pooling (ASPP) [25] to improve feature extraction. PSPNet [1] includes a pyramid pooling module that gathers contextual information at multiple scales, improving scene interpretation.
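The key property of the atrous (dilated) convolutions mentioned above is that spacing the kernel taps apart enlarges the receptive field without adding parameters or downsampling. A minimal 1-D NumPy sketch (the `dilated_conv1d` helper is illustrative, not from any of the cited architectures) makes this concrete:

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation=1):
    """1-D convolution whose taps are spaced `dilation` samples apart.

    With dilation d and kernel size k, the receptive field grows to
    d * (k - 1) + 1 samples while the number of weights stays at k.
    """
    k = len(kernel)
    span = dilation * (k - 1) + 1            # effective receptive field
    out_len = len(signal) - span + 1
    out = np.zeros(out_len)
    for i in range(out_len):
        for j in range(k):
            out[i] += signal[i + j * dilation] * kernel[j]
    return out

x = np.arange(10, dtype=float)
w = np.array([1.0, 1.0, 1.0])

dense = dilated_conv1d(x, w, dilation=1)   # receptive field = 3 samples
atrous = dilated_conv1d(x, w, dilation=2)  # receptive field = 5, same 3 weights
```

The same three weights cover a wider span at dilation 2, which is exactly why ASPP stacks several such convolutions with different rates in parallel.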
Semantic segmentation faces several challenges, including handling complex scenes, maintaining fine-grained details, achieving computational efficiency, and dealing with varied object scales. Combining PSPNet with atrous convolution and a spatial attention module effectively addresses several of these challenges. Atrous convolution, used in the atrous spatial pyramid pooling (ASPP) module, collects multi-scale information while maintaining resolution, resulting in detailed and high-quality segmentation. The ASPP module consists of parallel atrous convolutions with varying dilation rates, which help capture features at different scales and incorporate multi-level contextual information. This paper introduces a novel semantic segmentation technique that augments the PSPNet [1] architecture with atrous convolution and a spatial attention module. PSPNet [3] is a well-known architecture designed for semantic segmentation; its salient feature is its pyramid pooling module, which captures contextual information at multiple scales and thereby enables improved scene understanding. By dividing the input feature map into sub-regions and performing pooling operations with varying kernel sizes, PSPNet obtains context from a variety of receptive fields. Adding a spatial attention module extends the capabilities of the PSPNet network and improves its overall performance. The effectiveness of the proposed approach in improving segmentation accuracy and performance has been extensively evaluated on the Stanford Background Dataset [6], [11].
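To illustrate the general shape of a spatial attention module, the NumPy sketch below follows the common CBAM-style recipe: channel-wise average and max pooling produce two spatial descriptors, which are mixed into a sigmoid-gated (H, W) mask that reweights every spatial location. The fixed mixing weights stand in for a learned convolution, and the exact module used in the paper may differ from this generic formulation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(feat):
    """Generic spatial attention over a (C, H, W) feature map.

    Channel-wise average and max pooling give two (H, W) descriptors.
    A learned conv over their stack is replaced here by fixed mixing
    weights (an assumption for the sketch); sigmoid yields a mask in
    (0, 1) that is broadcast across channels.
    """
    avg = feat.mean(axis=0)            # (H, W) average over channels
    mx = feat.max(axis=0)              # (H, W) max over channels
    mask = sigmoid(0.5 * avg + 0.5 * mx)
    return feat * mask                 # emphasize informative locations

feat = np.random.default_rng(0).normal(size=(4, 8, 8))
out = spatial_attention(feat)          # same shape, spatially reweighted
```

Because the mask lies strictly between 0 and 1, the module can only attenuate less informative locations relative to salient ones; in a trained network the mixing weights are learned so that the mask highlights regions useful for segmentation.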
Devika K. P.* 1
Reshmi S. Bhooshan 2
10.5281/zenodo.17682633