Guest Editorial: Special Issue on "Intelligent-based Image and Vision Computing for Future Video Surveillance"

Xiaoxian Yang, Jung Yoon Kim, Walayat Hussain


Video surveillance applications typically require a massive number of devices with real-time computation capability for monitoring, recognition, and data analysis. These requirements make it challenging to develop intelligent solutions that handle large-scale, continuous video data. For this special issue, we selected five papers from the submissions. A summary of these papers is given below.

In the paper entitled “Edge Computing Offloading at Middle-sea Scenario for Maritime Video Surveillances” by Ziyang Gong et al., the authors combine mobile edge computing with abundant node resources and network connectivity delay characteristics to handle massive maritime video surveillance information. They establish the SSU (single-user single-hop unicast), MSUS1, and MSUS2 (multi-user single-hop unicast) models based on the middle-sea scenario and divide the optimization problem into two sub-problems. A binary search method is proposed to optimize the transmission power allocation. Three algorithms, OAAS (offloading algorithm based on alternating selection), OAMOAS (multi-objective alternating selection), and OANR (node redistribution offloading algorithm), are proposed to optimize the offloading decision allocation. Experiments show that their method is superior in reducing delay.

In the paper entitled “Segmentation-Based Decision Networks for Steel Surface Defect Detection” by Zhongqin Bi et al., the authors explore the use of different numbers of labels with various accuracies during training to achieve the maximum detection accuracy at the lowest cost. The proposed method comprises improved segmentation and decision networks: an attention mechanism is integrated into the segmentation subnetwork, atrous convolutions are used in both the segmentation and decision subnetworks, and the original loss function is improved. Experiments on the Severstal Steel Defect dataset show that the detection accuracy improves by 1% to 2%.

In the paper entitled “A Multi-modal Feature Fusion-based Approach for Mobile Application Classification and Recommendation” by Buqing Cao et al., the authors propose a mobile application classification and recommendation method based on multimodal feature fusion. First, the method extracts the image and description features of the mobile application using the TRedBert model, which integrates an involution residual network with a pre-trained language representation model. Second, these features are fused using the attention mechanism of the transformer model. Third, the mobile applications are classified using Softmax. Finally, the method extracts the high-order and low-order embedding features of the mobile app with the FiBiNET (bilinear feature interaction) model to update the mobile app representation and complete the recommendation task. The experimental results demonstrate that the proposed approach outperforms other methods in terms of F1, accuracy, AUC, and log-loss.

In the paper entitled “IAMPDNet: Instance-aware and Multi-part Decoupled Network for Joint Detection and Embedding” by Pan Yang et al., the authors point out that intelligent video surveillance methods have been widely investigated to handle large-scale video data. Among them, multi-object tracking (MOT), which aims to track every object appearing in the video for monitoring, is the most popular. To accelerate inference, joint detection and embedding (JDE) has become a new paradigm for MOT. The authors therefore propose an instance-aware and multi-part decoupled network (IAMPDNet), which can perceive all instances in the environment and extract multi-part features from them. IAMPDNet consists of three key modules: a complementary attention module that perceives all instances in the environment, a feature extraction module that decouples multi-part features from the instances, and an adaptive aggregation module that fuses multi-level features of the instances. Experiments on MOT benchmarks demonstrate that IAMPDNet achieves higher tracking accuracy and fewer identity switches than recent MOT methods.

In the paper entitled “Time-based calibration: A way to ensure that stitched images are captured simultaneously” by Ziwei Song et al., the authors propose a calibration technique for multi-source video frames, based on an external information source, to solve the problems of ghosting and cutting that arise when stitching video from different sources. The proposed method calibrates the video stitching by introducing an information source and calculating the time difference between different devices. Experiments show that the error of the calibrated video stitching is less than 33 ms, which guarantees the quality of the stitched video.
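The idea behind the time-based calibration summarized above (estimating the time difference between devices from a shared external source, then aligning frames) can be illustrated with a minimal sketch. This is not the implementation of Ziwei Song et al.; the function names, the sample timestamps, and the nearest-frame pairing strategy are all assumptions made for illustration. The 33 ms tolerance is taken from the error bound quoted in the summary.

```python
from bisect import bisect_left

def clock_offset(device_ts_ms, reference_ts_ms):
    """Offset of a device clock relative to the shared reference (hypothetical helper)."""
    return device_ts_ms - reference_ts_ms

def to_reference_time(frame_ts_ms, offset_ms):
    """Map device-local frame timestamps onto the reference timeline."""
    return [t - offset_ms for t in frame_ts_ms]

def pair_frames(times_a, times_b, max_error_ms=33):
    """Pair each frame of A with the nearest frame of B (times_b must be sorted).

    Pairs whose residual time difference exceeds max_error_ms are dropped.
    """
    pairs = []
    for i, t in enumerate(times_a):
        j = bisect_left(times_b, t)
        # The nearest neighbour is either just before or at the insertion point.
        candidates = [k for k in (j - 1, j) if 0 <= k < len(times_b)]
        if not candidates:
            continue
        best = min(candidates, key=lambda k: abs(times_b[k] - t))
        if abs(times_b[best] - t) <= max_error_ms:
            pairs.append((i, best))
    return pairs

# Assumed example: both cameras observe the reference source showing 10000 ms.
offset_a = clock_offset(10_250, 10_000)   # camera A runs 250 ms ahead
offset_b = clock_offset(9_900, 10_000)    # camera B runs 100 ms behind

frames_a = to_reference_time([10_250, 10_283, 10_316], offset_a)
frames_b = to_reference_time([9_910, 9_943, 9_976], offset_b)

pairs = pair_frames(frames_a, frames_b)
print(pairs)
```

In this sketch, frames are stitched only if their corrected timestamps differ by no more than the tolerance; unmatched frames are simply skipped rather than interpolated.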



Citation Format:
Xiaoxian Yang, Jung Yoon Kim, Walayat Hussain, "Guest Editorial: Special Issue on 'Intelligent-based Image and Vision Computing for Future Video Surveillance'," Journal of Internet Technology, vol. 23, no. 6, pp. 1389-1390, Nov. 2022.


Published by Executive Committee, Taiwan Academic Network, Ministry of Education, Taipei, Taiwan, R.O.C
JIT Editorial Office, Office of Library and Information Services, National Dong Hwa University
No. 1, Sec. 2, Da Hsueh Rd., Shoufeng, Hualien 974301, Taiwan, R.O.C.
Tel: +886-3-931-7314