Selected Publications
Real-Time Glottis Detection Framework via Spatial-decoupled Feature Learning for Nasal Transnasal Intubation
arXiv, 2026
Jinyu Liu, Gaoyang Zhang, Yang Zhou, Ruoyi Hao, Yang Zhang*, Hongliang Ren*
Nasotracheal intubation (NTI) is a vital procedure in emergency airway management, where rapid and accurate glottis detection is
essential to ensure patient safety. We propose Mobile GlottisNet, a lightweight and efficient glottis detection
framework designed for real-time inference on embedded and edge devices. The model incorporates structural awareness and spatial
alignment mechanisms, enabling robust glottis localization under complex anatomical and visual conditions. We implement a hierarchical
dynamic thresholding strategy to enhance sample assignment, and introduce an adaptive feature decoupling module based on deformable
convolution to support dynamic spatial reconstruction. A cross-layer dynamic weighting scheme further facilitates the fusion of
semantic and detail features across multiple scales.
Efficient RGB-D Scene Understanding via Multi-task Adaptive Learning and Cross-dimensional Feature Guidance
Knowledge-Based Systems, 2025
Guodong Sun, Junjie Liu, Gaoyang Zhang, Bo Wu, Yang Zhang*
Scene understanding plays a critical role in enabling intelligence and autonomy in robotic systems. This paper presents an efficient RGB-D scene understanding
model that performs a range of tasks, including semantic segmentation, instance segmentation, orientation estimation, panoptic segmentation, and scene classification.
For semantic segmentation, we introduce normalized focus channel layers and a context feature interaction layer, designed to mitigate issues such
as shallow feature misguidance and insufficient local-global feature representation. The instance segmentation task benefits from a non-bottleneck
1D structure, which achieves superior contour representation with fewer parameters. Additionally, we propose a multi-task adaptive loss function that
dynamically adjusts the learning strategy for different tasks based on scene variations.
Spatial-wise Dynamic Distillation for MLP-like Efficient Visual Fault Detection of Freight Trains
IEEE Transactions on Industrial Electronics, 2024
Yang Zhang, Huilin Pan, Mingying Li, An Wang, Yang Zhou, Hongliang Ren
A spatial-wise dynamic distillation framework based on a multi-layer perceptron (MLP) is designed for visual fault detection of freight trains.
We first present the axial shift strategy, which allows the MLP-like architecture to overcome the challenge of spatial invariance and
effectively incorporate both local and global cues. We propose a dynamic distillation method without a pre-trained teacher, including a
dynamic teacher mechanism that can effectively eliminate the semantic discrepancy with the student model. This approach mines richer
details from lower-level feature appearances and higher-level label semantics as an extra supervision signal, and uses efficient instance
embedding to model the global spatial and semantic information.
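The axial shift idea mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the following assumes the common variant in which channels are split into contiguous groups and each group is shifted by a different offset along one spatial axis, with zero padding. The function names and the `shift_size` parameter are hypothetical.

```python
def shift_row(row, offset):
    """Shift a 1-D list by `offset` positions, filling vacated slots with 0."""
    n = len(row)
    out = [0] * n
    for i, value in enumerate(row):
        j = i + offset
        if 0 <= j < n:
            out[j] = value
    return out


def axial_shift(feature, shift_size=3, axis="w"):
    """Shift channel groups of `feature` along one axis.

    feature: list of channels, each an H x W list of lists.
    shift_size: number of distinct offsets (assumed odd), e.g. 3 -> -1, 0, +1.
    axis: "w" shifts along the width, anything else shifts along the height.
    """
    offsets = list(range(-(shift_size // 2), shift_size // 2 + 1))
    num_channels = len(feature)
    shifted = []
    for c, channel in enumerate(feature):
        # Assumption: contiguous channel groups, one offset per group.
        offset = offsets[c * len(offsets) // num_channels]
        if axis == "w":
            shifted.append([shift_row(row, offset) for row in channel])
        else:
            # Transpose, shift each column as a row, transpose back.
            cols = [shift_row(list(col), offset) for col in zip(*channel)]
            shifted.append([list(r) for r in zip(*cols)])
    return shifted


feature = [[[1, 2, 3]], [[1, 2, 3]], [[1, 2, 3]]]  # 3 channels, H=1, W=3
print(axial_shift(feature, shift_size=3, axis="w"))
```

After the shift, a channel-mixing layer applied at a single position sees values that originally sat at neighbouring positions, which is how a shift of this kind can bring local context into an MLP-like block.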
Efficient Visual Fault Detection for Freight Train via Neural Architecture Search with Data Volume Robustness
IEEE Transactions on Industrial Informatics, 2024
Yang Zhang, Mingying Li, Huilin Pan, Moyun Liu, Yang Zhou
An efficient NAS-based framework for visual fault detection of freight trains is proposed to search for the task-specific detection head
with capacities of multi-scale representation. First, we design a scale-aware search space for discovering an effective receptive
field in the head. Second, we explore the robustness of data volume to reduce search costs based on the specifically designed
search space, and a novel sharing strategy is proposed to reduce memory and further improve search efficiency.
An Efficient MLP-based Point-guided Segmentation Network for Ore Images with Ambiguous Boundary
IEEE Transactions on Industrial Informatics, 2024
Guodong Sun, Yuting Peng, Le Cheng, Mengya Xu, An Wang, Bo Wu, Hongliang Ren, Yang Zhang*
The homogeneous appearance of ores leads to low contrast and unclear boundaries, which makes accurate segmentation and recognition
challenging. This paper proposes a lightweight framework based on Multi-Layer Perceptron (MLP), which focuses
on solving the problem of edge blurring. Specifically, we introduce a lightweight backbone better suited for efficiently extracting
low-level features. Besides, we design a feature pyramid network consisting of two MLP structures that balance local and global
information, thus enhancing detection accuracy. Furthermore, we propose a novel loss function that guides the prediction points to
match the instance edge points to achieve clear object boundaries.
Efficient segmentation with texture in ore images based on box-supervised approach
Engineering Applications of Artificial Intelligence, 2024
Guodong Sun, Delong Huang, Yuting Peng, Le Cheng, Bo Wu, Yang Zhang*
An effective box-supervised technique with texture features is provided for ore image segmentation that can identify complete
and independent ores. Firstly, a ghost feature pyramid network (Ghost-FPN) is proposed to process the features obtained from the backbone to reduce
redundant semantic information and computation generated by complex networks. Then, an optimized detection head is proposed to obtain the feature
to maintain accuracy. Finally, Lab color space (Lab) and local binary patterns (LBP) texture features are combined to form a
fusion feature similarity-based loss function to improve accuracy while incurring no loss.
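For readers unfamiliar with local binary patterns, the texture descriptor named above can be sketched as follows. This is the basic 3x3 LBP operator only, not the paper's fused Lab and LBP similarity loss; the function names and the clockwise neighbour ordering are illustrative assumptions.

```python
# Neighbour offsets, clockwise from the top-left pixel (an assumed ordering;
# any fixed ordering yields a valid LBP code).
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """Return the 8-bit LBP code of the interior pixel at (r, c)."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBORS):
        # A neighbour at least as bright as the centre sets its bit.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_image(image):
    """Compute LBP codes for every interior pixel of a grayscale image."""
    h, w = len(image), len(image[0])
    return [[lbp_code(image, r, c) for c in range(1, w - 1)] for r in range(1, h - 1)]


patch = [
    [9, 1, 1],
    [1, 5, 1],
    [1, 1, 1],
]
print(lbp_image(patch))  # only the top-left neighbour exceeds the centre
```

Because each code depends only on the sign of intensity differences, it is unaffected by monotonic brightness changes, which is one reason LBP is a common choice for texture cues.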
Efficient Visual Fault Detection for Freight Train Braking System via Heterogeneous Self Distillation in the Wild
Advanced Engineering Informatics, 2023
Yang Zhang, Huilin Pan, Yang Zhou, Mingying Li, and Guodong Sun
This paper proposes a heterogeneous self-distillation framework to ensure detection accuracy and speed while satisfying low resource requirements. The
privileged information in the output feature knowledge can be transferred from the teacher to the student model through distillation to boost performance.
We first adopt a lightweight backbone to extract features and generate a new heterogeneous knowledge neck. This neck models positional information and
long-range dependencies among channels through parallel encoding to optimize feature extraction capabilities. Then, we utilize the general distribution
to obtain more credible and accurate bounding box estimates. Finally, we employ a novel loss function that makes the network easily concentrate on values
near the label to improve learning efficiency.
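The last two steps above can be illustrated with a small sketch. It assumes a formulation in the style of Generalized Focal Loss detectors, where each box side is predicted as a discrete distribution over integer bins and the loss pushes probability mass toward the two bins nearest the label. The paper's exact formulation may differ, and all names here are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_offset(logits):
    """Box-side estimate as the expectation of the discrete distribution."""
    probs = softmax(logits)
    return sum(i * p for i, p in enumerate(probs))


def distribution_focal_loss(logits, target):
    """Cross-entropy on the two bins surrounding `target`.

    Assumes 0 <= target <= len(logits) - 1.
    """
    probs = softmax(logits)
    left = min(int(math.floor(target)), len(logits) - 2)
    right = left + 1
    w_left = right - target   # weight of the lower bin
    w_right = target - left   # weight of the upper bin
    return -(w_left * math.log(probs[left]) + w_right * math.log(probs[right]))


flat = [0.0, 0.0, 0.0, 0.0]     # uninformative prediction
peaked = [0.0, 10.0, 10.0, 0.0]  # mass concentrated near the label 1.5
print(expected_offset(flat), distribution_focal_loss(flat, 1.5))
print(expected_offset(peaked), distribution_focal_loss(peaked, 1.5))
```

The peaked prediction yields a lower loss than the flat one for the same label, which matches the stated goal of concentrating on values near the label.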
Spatial-information Guided Adaptive Context-aware Network for Efficient RGB-D Semantic Segmentation
IEEE Sensors Journal, 2023
Yang Zhang, Chenyun Xiong, Junjie Liu, Xuhui Ye, Guodong Sun
This paper proposes an efficient lightweight encoder-decoder network that reduces the computational parameters and
guarantees the robustness of the algorithm. Working with channel and spatial fusion attention modules, our
network effectively captures multi-level RGB-D features. A globally guided local affinity context module is
proposed to obtain sufficient high-level context information. The decoder utilizes a lightweight residual unit
that combines short- and long-distance information with few redundant computations. Experimental results on
NYUv2, SUN RGB-D, and Cityscapes datasets show that our method achieves a better trade-off among segmentation
accuracy, inference time, and parameters than the state-of-the-art methods.
Faster OreFSDet: A Lightweight and Effective Few-shot Object Detector for Ore Images
Pattern Recognition, 2023
Yang Zhang, Le Cheng, Yuting Peng, Chengming Xu, Yanwei Fu, Bo Wu, and Guodong Sun
A lightweight and effective few-shot detector is proposed to achieve competitive performance with general object detection with only a few samples
for ore images. First, the proposed support feature mining block characterizes the importance of location information in
support features. Next, the relationship guidance block makes full use of support features to guide the generation of accurate
candidate proposals. Finally, the dual-scale semantic aggregation module retrieves detailed features at different resolutions
to contribute to the prediction process. Our method achieves the smallest model size of 19 MB and a competitive detection speed of 50 FPS
compared with general object detectors.
Adaptive Fusion Affinity Graph with Noise-free Online Low-rank Representation for Natural Image Segmentation
Pattern Recognition, 2023
Yang Zhang, Moyun Liu, Huiming Zhang, Guodong Sun, and Jingwu He
We propose an adaptive fusion affinity graph (AFA-graph) with noise-free low-rank representation in an online manner for natural image segmentation.
An input image is first over-segmented into superpixels at different scales and then filtered by the proposed improved kernel density
estimation method. Moreover, we select global nodes of these superpixels on the basis of their subspace-preserving representation, which
reveals the feature distribution of superpixels exactly. To reduce time complexity while improving performance, a sparse representation
of global nodes based on noise-free online low-rank representation is used to obtain a global graph at each scale. The global graph is
finally used to update a local graph which is built upon all superpixels at each scale.
Visual Fault Detection of Multi-scale Key Components in Freight Trains
IEEE Transactions on Industrial Informatics, 2022
Yang Zhang, Yang Zhou, Huilin Pan, Bo Wu, Guodong Sun
Although deep learning-based methods are frequently employed, these fault detectors rely heavily on hardware resources and are complex to implement.
In addition, no train fault detectors consider the drop in accuracy induced by scale variation of fault parts. This paper proposes a lightweight
anchor-free framework to solve the above problems. Specifically, to reduce the amount of computation and model size, we introduce a lightweight
backbone and adopt an anchor-free method for localization and regression. To improve detection accuracy for multi-scale parts, we design a feature
pyramid network to generate rectangular layers of different sizes to map parts with similar aspect ratios.
A Lightweight NMS-free Framework for Real-time Visual Fault Detection System of Freight Trains
IEEE Transactions on Instrumentation and Measurement, 2022, 71: 1-11
Guodong Sun, Yang Zhou, Huilin Pan, Bo Wu, Ye Hu, Yang Zhang*
A lightweight NMS-free framework is proposed to achieve real-time detection and high accuracy simultaneously. We use a lightweight backbone for
feature extraction and design a fault detection pyramid to process features. This fault detection pyramid includes three novel individual
modules using attention mechanism, bottleneck, and dilated convolution for feature enhancement and computation reduction. Instead of using
NMS, we calculate different loss functions, including classification and location costs in the head to reduce computation.