
  • Semantic Segmentation Using PSP Network with Attention Mechanism

  • 1Department of Electronics and Communication, Vidya Academy of Science and Technology, Technical Campus, Kilimanoor, India.
    2Department of Electronics and Communication, College of Engineering Trivandrum, Trivandrum, India
     

Abstract

Semantic segmentation is a challenging problem in computer vision. In recent years, its performance has been considerably enhanced by cutting-edge techniques. This paper presents a semantic segmentation methodology that uses the PSPNet (Pyramid Scene Parsing Network) architecture augmented with atrous convolution and a spatial attention module. The primary objective is to improve segmentation accuracy by integrating spatial attention mechanisms and atrous convolution into the PSPNet framework. The spatial attention module selectively highlights pertinent spatial regions within feature maps, enhancing the ability of the model to capture the intricate details needed for precise segmentation. Experimental evaluations are carried out on two datasets: the Stanford Background dataset and the Aerial Semantic Segmentation Drone dataset. The improvement observed on these datasets underscores the efficacy of integrating spatial attention mechanisms and atrous convolution within the PSPNet architecture for semantic segmentation tasks, and helps advance the state of the art in this domain.

Keywords

Semantic segmentation, PSP Network, Spatial attention module

Introduction

Semantic segmentation is a fundamental computer vision task that involves classifying each pixel in an image into an object category such as "person," "car," or "building." In contrast to image classification, which assigns a single label to a whole image, semantic segmentation provides a comprehensive understanding of the scene by dividing it into meaningful parts based on object categories. Semantic segmentation [2] plays an important role in disciplines such as autonomous driving and medical imaging. It aids object detection and recognition systems, where accurate delineation of object boundaries improves localization and identification. It also applies to tasks involving scene understanding, such as autonomous driving, robotics, surveillance systems, and scene parsing, since it helps robots understand spatial layouts and semantic contexts efficiently. In medical imaging, semantic segmentation supports diagnosis, treatment planning, and medical research through the analysis of MRI scans, CT images, and histopathology slides.
In the field of semantic segmentation, various strategies have evolved to improve accuracy and efficiency. Fully Convolutional Networks (FCNs) [24] replace fully connected layers with convolutional layers to make spatially dense predictions. U-Net, well known for its use in medical imaging, employs an encoder-decoder design with skip connections to preserve localization and contextual information. SegNet has a similar design, but it upsamples using the max-pooling indices from the encoder stage. DeepLab [26] employs atrous convolutions to gather multi-scale contextual information and Atrous Spatial Pyramid Pooling (ASPP) [25] to improve feature extraction. PSPNet [1] includes a pyramid pooling module that gathers contextual information at multiple scales, which improves scene interpretation.
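To make the role of atrous convolution concrete, the following minimal Python sketch applies a three-tap filter to a one-dimensional signal at two dilation rates. It is an illustration only, not the implementation used in DeepLab or in this work: the function names atrous_conv1d and receptive_field, the zero padding, and the toy signal are assumptions introduced here. With the same three weights, a rate of 2 widens the receptive field from 3 to 5 samples while the output keeps the length of the input.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """1D atrous (dilated) convolution with zero padding.

    Hypothetical helper for illustration. Kernel taps are placed `rate`
    samples apart and centred on each output position, so the output has
    the same length as the input. Assumes an odd-length kernel.
    """
    half = (len(kernel) - 1) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate  # tap position in the input
            if 0 <= j < n:             # positions outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def receptive_field(kernel_size, rate):
    """Number of input samples spanned by one output of a dilated kernel."""
    return (kernel_size - 1) * rate + 1


signal = [float(i) for i in range(10)]  # toy input: 0.0, 1.0, ..., 9.0
kernel = [1.0, 1.0, 1.0]                # three-tap summing filter

dense = atrous_conv1d(signal, kernel, rate=1)   # taps at i-1, i, i+1
atrous = atrous_conv1d(signal, kernel, rate=2)  # taps at i-2, i, i+2

print(len(signal), len(dense), len(atrous))
print(receptive_field(3, 1), receptive_field(3, 2))
```

An ASPP-style module would run several such filters in parallel with different rates and combine their outputs; that combination step is omitted here.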
Semantic segmentation faces several challenges, including handling complex scenes, preserving fine-grained details, computational efficiency, and dealing with varied object scales. Combining PSPNet with atrous convolution and a spatial attention module addresses several of these challenges. Atrous convolution, used in the atrous spatial pyramid pooling (ASPP) module, collects multi-scale information while maintaining resolution, resulting in detailed, high-quality segmentation. The ASPP module is made up of parallel atrous convolutions with different dilation rates, which help capture features at different scales and incorporate multi-level contextual information. This paper adds atrous convolution and a spatial attention module to the PSPNet [1] design, yielding a new semantic segmentation technique. PSPNet [3] is a well-known architecture designed for semantic segmentation. Its salient feature is the use of pyramid pooling modules to capture contextual information at various scales, enabling a better understanding of scenes. PSPNet obtains context from a variety of receptive fields by dividing the input feature map into sub-regions and carrying out pooling operations with varying kernel sizes. Including a spatial attention module extends the capabilities of the network and improves its overall performance. The effectiveness of the proposed approach in improving segmentation accuracy has been tested on the Stanford Background Dataset [6], [11].
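The pyramid pooling step described in the paragraph above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the network used in this work: it operates on a single-channel feature map stored as a list of rows, the bin sizes (1, 2, 3, 6) are taken from the original PSPNet design and are assumed here, nearest-neighbour upsampling stands in for bilinear interpolation, and the per-branch convolutions are omitted. The helper names adaptive_avg_pool, upsample_nearest, and pyramid_pool are hypothetical.

```python
def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D map (list of rows) into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for by in range(bins):
        # Sub-region rows: floor(by * h / bins) up to ceil((by + 1) * h / bins)
        y0, y1 = (by * h) // bins, -((-(by + 1) * h) // bins)
        row = []
        for bx in range(bins):
            x0, x1 = (bx * w) // bins, -((-(bx + 1) * w) // bins)
            cells = [fmap[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(grid, h, w):
    """Resize a square bins x bins grid back to h x w by nearest neighbour."""
    bins = len(grid)
    return [[grid[y * bins // h][x * bins // w] for x in range(w)]
            for y in range(h)]


def pyramid_pool(fmap, bin_sizes=(1, 2, 3, 6)):
    """Return the input map plus one upsampled context map per pyramid level."""
    h, w = len(fmap), len(fmap[0])
    branches = [fmap]
    for bins in bin_sizes:
        branches.append(upsample_nearest(adaptive_avg_pool(fmap, bins), h, w))
    return branches


# Toy 6 x 6 feature map with values 0.0 .. 35.0
fmap = [[float(y * 6 + x) for x in range(6)] for y in range(6)]
branches = pyramid_pool(fmap)

print(len(branches))       # input map plus four pyramid levels
print(branches[1][0][0])   # 1 x 1 level: global average of the map
print(branches[2][0][0])   # 2 x 2 level: average of the top-left quadrant
```

In a full network the branches would be concatenated along the channel dimension, so that every pixel carries both its local feature and context summarised over progressively larger sub-regions.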

References

  1. Zhao, Qi, et al. "Semantic segmentation with attention mechanism for remote sensing images." IEEE Transactions on Geoscience and Remote Sensing 60 (2021): 1-13.
  2. Pham, T. "Semantic Road Segmentation using Deep Learning." 2020 Applying New Technology in Green Buildings (ATiGB), Da Nang, Vietnam, 2021, pp. 45-48. doi: 10.1109/ATiGB50996.2021.9423307.
  3. Zhao, Hengshuang, et al. "Pyramid scene parsing network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
  4. Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
  5. Shaaban, Aya M., Nancy M. Salem, and Walid I. Al-atabany. "A semantic-based scene segmentation using convolutional neural networks." AEU-International Journal of Electronics and Communications 125 (2020): 153364.
  6. Luc, Pauline, et al. "Semantic segmentation using adversarial networks." arXiv preprint arXiv:1611.08408 (2016).
  7. Kim, Dong Seop, et al. "ESSN: Enhanced semantic segmentation network by residual concatenation of feature maps." IEEE Access 8 (2020): 21363-21379.
  8. Niu, Zhaoyang, Guoqiang Zhong, and Hui Yu. "A review on the attention mechanism of deep learning." Neurocomputing 452 (2021): 48-62.
  9. Zhao, Qi, et al. "Semantic segmentation with attention mechanism for remote sensing images." IEEE Transactions on Geoscience and Remote Sensing 60 (2021): 1-13.
  10. Li, Haifeng, et al. "SCAttNet: Semantic segmentation network with spatial and channel attention mechanism for high-resolution remote sensing images." IEEE Geoscience and Remote Sensing Letters 18.5 (2020): 905-909.
  11. Gould, Stephen, Richard Fulton, and Daphne Koller. "Decomposing a scene into geometric and semantically consistent regions." 2009 IEEE 12th international conference on computer vision. IEEE, 2009.
  12. Zhao, Guangzhe, et al. "Bilateral U-Net semantic segmentation with spatial attention mechanism." CAAI Transactions on Intelligence Technology 8.2 (2023): 297-307.
  13. Zhao, Qi, et al. "Semantic segmentation with attention mechanism for remote sensing images." IEEE Transactions on Geoscience and Remote Sensing 60 (2021): 1-13.
  14. Bai, Wei. "An ENet Semantic Segmentation Method Combined with Attention Mechanism." Computational Intelligence and Neuroscience 2023 (2023).
  15. Li, Hanchao, et al. "Pyramid attention network for semantic segmentation." arXiv preprint arXiv:1805.10180 (2018).
  16. Sun, Le, et al. "SPANet: Successive pooling attention network for semantic segmentation of remote sensing images." IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15 (2022): 4045-4057.
  17. Jiao, Yijie, et al. "Image semantic segmentation fusion of edge detection and AFF attention mechanism." Applied Sciences 12.21 (2022): 11248.
  18. Yu, Yang, et al. "Multi-scale spatial pyramid attention mechanism for image recognition: An effective approach." Engineering Applications of Artificial Intelligence 133 (2024): 108261.
  19. Sang, Haiwei, Qiuhao Zhou, and Yong Zhao. "PCANet: Pyramid convolutional attention network for semantic segmentation." Image and Vision Computing 103 (2020): 103997.
  20. Kim, Dong Seop, Yu Hwan Kim, and Kang Ryoung Park. "Semantic segmentation by multi-scale feature extraction based on grouped dilated convolution module." Mathematics 9.9 (2021): 947.
  21. Chen, Boyu, et al. "Spatiotemporal convolutional neural network with convolutional block attention module for micro-expression recognition." Information 11.8 (2020): 380.
  22. Minaee, Shervin, et al. "Image segmentation using deep learning: A survey." IEEE transactions on pattern analysis and machine intelligence 44.7 (2021): 3523-3542.
  23. Nedevschi, Sergiu. "A Critical Evaluation of Aerial Datasets for Semantic Segmentation." 2020 IEEE 16th International Conference on Intelligent Computer Communication and Processing (ICCP). IEEE, 2020.
  24. Ozturk, Ozan, Batuhan Sarıtürk, and Dursun Zafer Seker. "Comparison of fully convolutional networks (FCN) and U-Net for road segmentation from high resolution imageries." International journal of environment and geoinformatics 7.3 (2020): 272-279.
  25. Chen, Jin, Chuanya Wang, and Ying Tong. "AtICNet: semantic segmentation with atrous spatial pyramid pooling in image cascade network." EURASIP Journal on Wireless Communications and Networking 2019 (2019): 1-7.
  26. Wang, Zuoshuai, et al. "Multi-scale dense and attention mechanism for image semantic segmentation based on improved DeepLabv3+." Journal of Electronic Imaging 31.5 (2022): 053006-053006.
  27. Jamali-Rad, H., and A. Szabo. "Lookahead adversarial learning for near real-time semantic segmentation." Computer Vision and Image Understanding 212 (2021): 103271.
  28. Zhao, Hengshuang, et al. "Psanet: Point-wise spatial attention network for scene parsing." Proceedings of the European conference on computer vision (ECCV). 2018.
  29. Lin, Guosheng, et al. "Refinenet: Multi-path refinement networks for high-resolution semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
  30. Chen, Liang-Chieh, et al. "Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs." IEEE transactions on pattern analysis and machine intelligence 40.4 (2017): 834-848.

Devika K. P.
Corresponding author

Department of Electronics and Communication, Vidya Academy of Science and Technology, Technical Campus, Kilimanoor, India.

Reshmi S. Bhooshan
Co-author

Department of Electronics and Communication, College of Engineering Trivandrum, Trivandrum, India

