We use cookies to ensure our website works properly and to personalise your experience. Cookies policy
1Department of Electronics and Communication, Vidya Academy of Science and Technology, Technical Campus, Kilimanoor, India.
2Department of Electronics and Communication, College of Engineering Trivandrum, Trivandrum, India
Semantic segmentation is a challenging problem in computer vision. In recent years, the performance of semantic segmentation has been considerably enhanced by employing cutting edge technique.This paper presents an advanced semantic segmentation methodology that uses the PSPNet (Pyramid Scene Parsing Net-work) architecture augmented with atrous convolution networks and a spatial attention module . The primary objective is to improve segmentation accuracy by integrating spatial attention mechanisms with the PSPNet framework, in association with atrous convolution networks. The spatial attention module selec-tively highlights pertinent spatial regions within feature maps, enhancing the ability of the model to capture intricate details crucial for precise segmentation. Experimental evaluations are carried out in two datasets: Stanford Background dataset and the Aerial Semantic Segmentation Drone dataset.This improvement underscores the efficacy of integrating spatial attention mechanisms and atrous convolution networks within the PSPNet architecture for semantic segmentation tasks, propelling advancements in the state-of-the-art performance within this domain.
Semantic segmentation is a basic computer vision challenge that involves classifying each pixel in an image into various object such as” person,”” car,”” building”, etc. Semantic segmentation presents a comprehensive knowledge of the scene by dividing it into meaningful parts based on item categories, in contrast to image classification, which gives a single label to a whole image. In computer vision, semantic segmentation [2] plays an important role in diffrent disciplines like autonomous driving and medical imaging. Systems for object detection and recognition are aided by the use of semantic segmentation, and the accuracy of localization and identification is improved by the accurate drawing of object boundaries. It also applies in involving scene understanding, such as autonomous driving, robotics, surveillance systems, and scene parsing, since it helps robots to efficiently understand spatial layouts and semantic contexts. Semantic segmentation enhances diagnosis, treatment planning, and medical research in medical imaging by analysing MRI, CT images, and histopathology slides. In the field of semantic segmentation, various innovative strategies have evolved to improve accuracy and efficiency. Fully Convolutional Networks (FCNs) [24] replace fully connected layers with convolutional layers to make spatially dense predictions. U-Net, well known for its usage in medical imaging, employs an encoder-decoder design with skip links to preserve localization and contextual information. SegNet has a similar design, but it up samples utilising max-pooling indices from the encoder step. Deep Lab [26] employs atrous convolutions to gather multi-scale contextual information and Atrous Spatial Pyramid Pooling (ASPP) [25] to improve feature extraction. The PSPNet [1] includes a pyramid pooling module for gathering contextual information at multiple sizes, which improves scene interpretation. Semantic segmentation faces several challenges, including handling complex scenes, maintaining fine-grained details, computational efficiency, and dealing with varied object scales. Using PSPNet with atrous convolution and a spatial attention module effectively addresses several key challenges in semantic segmentation. Atrous convolution, used in the atrous spatial pyramid pooling (ASPP) module, collects multi-scale information while maintaining resolution, resulting in detailed and high-quality segmentation. The ASPP module is made up of parallel atrous convolutions with varying dilation rates, which assist capture features at different scales and incorporate multi-level contextual information. The paper introduces a spatial attention module to the PSPNet [1] design and Atrous convolution, improving it through the introduction of a unique semantic segmentation technique. The PSPNet, [3] is a well-known architecture designed for semantic segmentation applications. Its salient feature is its capacity to employ pyramid pooling modules to record contextual information at various sizes, thereby enabling an improved understanding of situations. By including a spatial attention module, this method enhances the capabilities of the PSPNet network and improves its overall performance. The effectiveness of this suggested strategy in improving seg-mentation accuracy and performance has been extensively tested on the Stanford Background Dataset [6], [11] . The PSPNet obtains context from a variety of receptive fields by dividing the input feature map into sub-regions and carrying out pooling operations with varying kernel sizes.
Nuradeen Abdullahi Yusuf, Danlami Dauda*, Saudat Bello Adamu, Effect of Entrepreneurship Education on Entrepreneurial Motivation Among Students of Federal Polytechnics in North-West Nigeria, Int. J. Sci. R. Tech., 2025, 2 (11), 647-656. https://doi.org/10.5281/zenodo.17682633
10.5281/zenodo.17682633