Enhanced Multi-Scale Single-Image Super-Resolution Using Transformer-Integrated Residual Generative Adversarial Networks

Nitin Varshney, Harsh Mathur,

doi:10.5281/zenodo.18899050

Single-image super-resolution (SISR) is one of the foundations of computer vision, which al- lows one to reconstruct high-resolution (HR) image form low-resolution (LR) images, and it is used in applications of surveillance, medical imaging, remote sensing, and multimedia enhancement. Conventional methods, such as residual generative adversarial networks (Res- GANs), have been very successful with respect to perceptual quality, yet fail in dealing with multi-scale upsampling, real-world degradations (e.g. blur, noise, and compression effects), and long-range interactions. To fill these gaps, we introduce TransResGAN-SR, a new frame- work, which applies Transformer modules into residual GAN framework in multi-scale SISR (2 x, 4 x, and 8 x). The generator uses a hybrid residual-Transformer backbone that includes self-attention to learn global contexts and a degradation-aware module that learns adaptive ker- nels on real-world inputs. An advanced instance of the loss of perception that includes LPIPS and diffusion-based priors improves the texture fidelity. As the experiments on different sets of data (Div2K, Set5, Set14, BSD100, and RealSR) reveal, the new approach TransResGAN-SR provides significantly better PSNR gains (up to 1.2 dB) compared to the established approaches such as ESRGAN and Real-ESRGAN, as well as enhanced SSIM, MOS, and perceptual ratings. The work presents SISR in the direction of practical application in a wide range of degradation conditions, which may lead to edge computing integrations.

High-quality images are the main requirement in the age of high-definition digital media and vision systems that are controlled by AI. The problem of low-resolution imagery, which can be caused by the restrictions of the sensor or transmission, or even by the environmental conditions obstructs such tasks as object detection, facial recognition, and semantic segmentation. Single-image super-resolution (SISR) becomes a significant ill-posed inverse problem, which intends to provide missing high-frequency information on a single LR image to generate an approximate HR image. Traditional algorithms such as bicubic interpolation are simple to use when it comes to upscaling, but they present artifacts, such as blurring and aliasing, especially at scaling factors of 4 or higher. Deep learning has transformed SISR, with convolutional neural networks (CNNs) end-to-end trained on pixel-wise losses (e.g., MSE) to produce high PSNR, but visually dull results. One way to address this issue is by using generative adversarial networks (GANs) that incorporate discriminators that impose natural image statistics, as in SRGAN [1] and ESRGAN [2], which do not focus on distortion measures. Nevertheless, the current GAN-based models, such as the ResGAN-SR [3] described by us, have the following

limitations: (i) fixed-scale training does not provide the ability to adapt to multi-scale image scenarios; (ii) CNN-based architecture does not take into account the long-range pixel correlations that are essential to complex textures; (iii) the assumption of ideal bicubic degradation cannot be applicable in reality with composite blurs, noise, and JPEG artifacts; and (iv) similar to adversarial models, training is unstable and generates To overcome such shortcomings, we propose TransResGAN-SR, an improve framework to combine Global attention of Transformer with residual learning in a GAN framework. Key innovations include:

A dual-network generator structure that combines the rest of the block with Swin Trans- former blocks [4] that are effective in multi-scale feature extraction. An adaptive degradation module that employed dynamic kernels that had been learned through meta-networks to address real-world inputs with no corresponding data. A perceptual loss augmented with VGG characteristics [5], adversarial language and diffusion priors [6] to achieve better texture synthesis. Advanced multi-scale training plan of arbitrary upscaling variables. Benchmark dataset quantitative and qualitative testing confirm the effectiveness of TransReGAN-SR, which is superior to state-of-the-art (SOTA) algorithms in terms of distortion (PSNR, SSIM) and perceptual (MOS, LPIPS) measures. This paper does not only resolve the weak- nesses of the previous ResGAN-SR but also applies SISR in resource limited context.

2. Related Work

2.1. Classical and Learning-Based SISR

Humanized Text: The initial SISR was based on interpolation (e.g., bilinear, Lanczos) and reconstruction priors (e.g., sparsity [7]). Learned mappings of LR-HR patches [8] were learned but had a hard time on large scales because of limited expressiveness. The CNNs represented a paradigm shift: SRCNN [9] was the first to introduce an end-to-end learning, and more complex models such as VDSR [10] with global residuals were introduced. EDSR [11] used no batch normalization to be more stable, and obtained SOTA PSNR. Enhanced feature reuse was achieved through dense connections in RDN [12].

2.2. GAN-Driven Advances

The result of MSE optimization produces smooth images; the perceptual losses based on VGG features are consistent with human perception [13]. SRGAN [1] proposed GANs to be realistic, and ESRGAN [2] is improved through relativistic discriminators and perceptual priors. Real- ESRGAN [14] addressed real losses that were not paired and high order losses. The latest GAN variants are DAF-GAN [15] that uses lightweight fusion and DS-GAN [16] with IGMRF priors that are smooth. In the case of remote sensing, FBD-KAN [17] incorporates Kolmogorov- Arnold networks.

2.3. Transformer Integration in SISR

Transformers [18] are good at understanding dependencies through self-attention. Transformer in SR was introduced by IPT [19], and shifted windows were employed by SwinIR [4]. CNNs are hybridized with attention [20]. In GANs, SRTransGAN [21] uses Transformers in the generators and T-GAN [22] is used on medical pictures. SR multi-attention is fused in MAFT [23], and GANs are combined with diffusion in SRDDGAN [24].

2.4. Real-World and Multi-Scale Challenges

RealSR [25] datasets bring out the mismatches of degradation. BSRGAN [26] learns blind kernels. The adaptations of GAN are few, and multi-scale techniques such as MDSR [27] train shared networks. Our model is based on ResGAN-SR [3], which adds Transformers [4,18], degradation modeling [26], and diffusion priors [6,24] to achieve a single multi-scale real- world SISR model.

3. Proposed Method

3.1. Methodological Overview

Single-image super-resolution (SISR) is an ill-posed inverse problem in which one needs lo- cal features modeling and global contextual knowledge to restore high-frequency detail. The traditional convolutional neural networks are mainly based on the local receptive fields that curtail their ability to learn long-range spatial constraints. Deep optimization is made stable by the residual learning, in which reconstruction is reformulated as residual prediction, and global feature interaction is introduced by the Transformer-based attention, consisting of self-attention mechanisms. It is inspired by these values, and the proposed TransResGAN-SR incorporates residual learning, Transformer attention, and adversarial optimization to provide a strong multi- scale super-resolution in the degradations of the real world.

3.2 Problem Formulation

Given a low-resolution (LR) image ILR degraded by an operator D (e.g., blur k, downsampling ↓s, and noise η), the degradation process is modeled as:

ILR = (D(IHR) ↓s) + η (1)

TransResGAN-SR learns a generator G such that:

ISR = G(ILR, s) ≈ IHR (2)

for scaling factors s ∈ {2, 4, 8}, optimizing perceptual fidelity under real degradation D.

3.3. Generator Architecture

The generator is a hybrid residual–Transformer network: Feature Extraction: An initial con- volution layer maps ILR into a 64-channel feature space. Hybrid Trunk: Residual blocks (Conv–ReLU–Conv with skip connections) alternate with Swin Transformer blocks [4]. Shifted- window self-attention reduces computational complexity to O (HW). Each Swin block is de- fined as:

xˆ = MSA(LN(x)) + x (3)

x′ = MLP(LN(xˆ)) + xˆ (4)

where MSA denotes multi-head self-attention. Degradation Module: A meta-network pre- dicts adaptive kernels k from ILR and applies dynamic convolution:

f′ = k ∗ f (5)

Nitin Varshney

Corresponding author

Computer Science Engineering, Ravindranath Tagore University, Bhopal, India

Harsh Mathur

Co-author

Nitin Varshney*, Harsh Mathur, Enhanced Multi-Scale Single-Image Super-Resolution Using Transformer-Integrated Residual Generative Adversarial Networks, Int. J. Sci. R. Tech., 2026, 3 (3), 117-122. https://doi.org/10.5281/zenodo.18899050

View Article

Enhanced Multi-Scale Single-Image Super-Resolution Using Transformer-Integrated Residual Generative Adversarial Networks

Abstract

Keywords

Introduction

Reference

Nitin Varshney

Harsh Mathur

More related articles

Development and Evaluation of Sprayable Nanoemulsi...

Innovative Nanoformulations for Intranasal Therapy...

Semantic Segmentation Using PSP Network with Atten...

View more

Vending App – An Integrated Vending Experience Using POS...

Development And Implementation Of A Custom Port Scanner And Network Mapping Tool...

Research on Sentimental Analysis on Mental Health Using Social Media...

View more

Related Articles

Interpretable Transfer Learning For Multi-Class Skin Disease Classification Usin...

Formulation and Evaluation of Herbal Lip Scrub by Using Beetroot...

Comparative Study of Different Brands of Amoxicillin Using UV Spectroscopy...

Edge Detection Using Fuzzy C-Means: A Comparative Study...

Development and Evaluation of Sprayable Nanoemulsion For Skin Cancer Using 5- Fl...

More related articles

Development and Evaluation of Sprayable Nanoemulsion For Skin Cancer Using 5- Fl...

Innovative Nanoformulations for Intranasal Therapy in Neurodegenerative Disease ...

Semantic Segmentation Using PSP Network with Attention Mechanism...

View more

Development and Evaluation of Sprayable Nanoemulsion For Skin Cancer Using 5- Fl...

Innovative Nanoformulations for Intranasal Therapy in Neurodegenerative Disease ...

Semantic Segmentation Using PSP Network with Attention Mechanism...

View more