IEEE Transactions on Image Processing

The IEEE Transactions on Image Processing covers novel theory, algorithms, and architectures for the formation, capture, processing, communication, analysis, and display of images, video, and multidimensional signals in a wide variety of applications. Topics of interest include, but are not limited to, the mathematical, statistical, and perceptual modeling, representation, formation, coding, filtering, enhancement, restoration, rendering, halftoning, search, and analysis of images, video, and multidimensional signals. Applications of interest include image and video communications, electronic imaging, biomedical imaging, image and video systems, and remote sensing.

Reproducible research

The Transactions encourages authors to make their publications reproducible by making all information needed to reproduce the presented results available online. This typically means publishing the code and data used to produce the publication's figures and tables on a website; see the supplemental materials section of the Information for Authors. This gives other researchers easier access to the work and facilitates fair comparisons.
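The workflow above can be sketched as a script that regenerates a paper's tables from a fixed seed, so anyone with the published code and data obtains identical numbers. A minimal illustration, where the experiment and file names are placeholders rather than anything prescribed by the Transactions:

```python
import csv
import numpy as np

def regenerate_table(seed: int = 0, path: str = "table1.csv"):
    """Recompute a (toy) results table deterministically and write it to CSV."""
    rng = np.random.default_rng(seed)      # fixed seed -> identical runs
    psnr = 30 + rng.random(3)              # stand-in for a real experiment
    rows = [[round(float(v), 4)] for v in psnr]
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows

# Two independent runs reproduce exactly the same numbers:
assert regenerate_table(path="run1.csv") == regenerate_table(path="run2.csv")
```

The same idea extends to figures: pin random seeds, record library versions and data checksums, and publish the script alongside the paper.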

Multimedia content

It is now possible to submit for review, and publish in Xplore, supporting multimedia material such as speech samples, images, movies, and MATLAB code. A multimedia graphical abstract can also be displayed along with the traditional text. More information is available under Multimedia Materials at the IEEE Author Center.


TIP Volume 33 | 2024

A Study of Subjective and Objective Quality Assessment of HDR Videos

Compared to standard dynamic range (SDR) videos, high dynamic range (HDR) content can represent and display much wider and more accurate ranges of brightness and color, leading to more engaging and enjoyable visual experiences. HDR also increases data volume, further challenging existing limits on bandwidth consumption and on the quality of delivered content.
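Objective quality assessment of the kind this study considers starts from full-reference metrics. As a minimal (and deliberately HDR-naive) example, PSNR computed against the larger 10-bit code range; the arrays below are synthetic stand-ins for real frames:

```python
import numpy as np

def psnr(ref, dist, peak):
    """Peak signal-to-noise ratio in dB; `peak` is the maximum code value
    (255 for 8-bit SDR, 1023 for 10-bit HDR)."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(dist, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Synthetic 10-bit "HDR" frame and a lightly distorted copy:
rng = np.random.default_rng(0)
ref = rng.integers(0, 1024, size=(64, 64)).astype(float)
dist = np.clip(ref + rng.normal(0.0, 4.0, size=ref.shape), 0, 1023)
quality = psnr(ref, dist, peak=1023.0)   # higher is better
```

Perceptual HDR metrics additionally model the display's luminance response, which plain PSNR ignores.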

Robust Remote Photoplethysmography Estimation With Environmental Noise Disentanglement

Remote photoplethysmography (rPPG) has been attracting increasing attention due to its potential in a wide range of application scenarios, such as physical training, clinical monitoring, and face anti-spoofing. Beyond conventional solutions, deep-learning approaches have begun to dominate rPPG estimation and achieve top-level performance.
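A conventional (non-deep) rPPG baseline of the kind the abstract alludes to can be sketched in a few lines: average the green channel over a skin region per frame, then pick the dominant spectral peak inside the physiological band. The trace below is synthetic; real signals need detrending and motion handling:

```python
import numpy as np

def estimate_pulse_rate(green_trace, fps):
    """Estimate pulse rate (beats/min) from a mean green-channel trace by
    picking the strongest spectral peak in the 0.7-4 Hz physiological band."""
    x = green_trace - green_trace.mean()           # remove DC / skin-tone offset
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)         # 42-240 bpm
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_hz

# Synthetic 10 s trace with a 1.2 Hz (72 bpm) pulse plus noise:
fps = 30.0
t = np.arange(int(fps * 10)) / fps
trace = 0.05 * np.sin(2 * np.pi * 1.2 * t) \
    + 0.01 * np.random.default_rng(0).normal(size=t.size)
bpm = estimate_pulse_rate(trace, fps)
```

The environmental-noise disentanglement in the paper targets exactly the illumination and motion components this naive baseline cannot separate.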

A Discrete-Mapping-Based Cross-Component Prediction Paradigm for Screen Content Coding

Cross-component prediction is an important intra-prediction tool in modern video coders. Existing methods for exploiting cross-component correlation include the cross-component linear model and its multi-model extension. These models are designed for camera-captured content. For screen content coding, where videos exhibit different signal characteristics, a cross-component prediction model tailored to those characteristics is desirable.
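The cross-component linear model mentioned above predicts a chroma block from its co-located reconstructed luma via chroma ≈ α·luma + β, with parameters derived from neighbouring reconstructed samples. A least-squares sketch (real coders derive α and β from a few min/max neighbour pairs rather than a full least-squares fit, and a discrete mapping for screen content would instead store one chroma value per distinct luma value):

```python
import numpy as np

def fit_cclm(luma_nb, chroma_nb):
    """Least-squares fit of chroma ~= alpha * luma + beta from
    already-reconstructed neighbouring samples."""
    A = np.stack([luma_nb, np.ones_like(luma_nb)], axis=1)
    (alpha, beta), *_ = np.linalg.lstsq(A, chroma_nb, rcond=None)
    return alpha, beta

def predict_chroma(luma_block, alpha, beta):
    return alpha * luma_block + beta

# Neighbouring samples where chroma really is 0.5 * luma + 10:
luma_nb = np.array([100.0, 120.0, 140.0, 160.0])
chroma_nb = 0.5 * luma_nb + 10.0
alpha, beta = fit_cclm(luma_nb, chroma_nb)
```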

Dynamic Dense Graph Convolutional Network for Skeleton-Based Human Motion Prediction

Graph convolutional networks (GCNs), which typically follow a neural message-passing framework to model dependencies among skeletal joints, have achieved great success in skeleton-based human motion prediction. Nevertheless, how to construct a graph from a skeleton sequence and how to perform message passing on that graph remain open problems, and both strongly affect GCN performance.
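The message-passing step that GCN-based predictors build on can be written in a few lines of NumPy: each joint's features are mixed with those of its graph neighbours through a normalised adjacency, then linearly transformed. A sketch with an arbitrary 3-joint chain (the skeleton and dimensions are illustrative, not the paper's):

```python
import numpy as np

def gcn_layer(X, A, W):
    """One message-passing step: aggregate neighbour (and self) features
    through the symmetrically normalised adjacency, then a linear map + ReLU."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = d_inv_sqrt @ A_hat @ d_inv_sqrt       # symmetric normalisation
    return np.maximum(A_norm @ X @ W, 0.0)         # ReLU

# 3 joints in a chain (e.g. hip-knee-ankle), 4-dim features, 2-dim output:
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
X, W = rng.normal(size=(3, 4)), rng.normal(size=(4, 2))
out = gcn_layer(X, A, W)
```

The open problems the abstract names correspond to the choice of `A` (graph construction) and of the aggregation in `gcn_layer` (message passing).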



Search results for author "IEEE": 70 papers, 12 with code.

Variational Neuron Shifting for Few-Shot Image Classification Across Domains

no code implementations • journal 2024 • Liyun Zuo , Baoyan Wang , Lei Zhang , Jun Xu , Member , IEEE , and Xiantong Zhen

Existing meta-learning models learn how to acquire good representations or model parameters in order to adapt to new tasks from a few training samples.


Event-Triggered Tracking Control for Nonlinear Systems With Prescribed Performance

no code implementations • IEEE TRANSACTIONS ON SYSTEMS 2024 • Ruihang Ji , Shuzhi Sam Ge , Kai Zhao , Member , and Haizhou Li , Fellow , IEEE

This article addresses the entry capture problem (ECP) of uncertain nonlinear systems under asymmetric performance constraints.

Instance Paradigm Contrastive Learning for Domain Generalization

no code implementations • IEEE Transactions on Circuits and Systems for Video Technology 2024 • Zining Chen , Weiqiu Wang , Zhicheng Zhao , Fei Su , Member , IEEE , Aidong Men , and Yuan Dong

In this paper, we propose an instance paradigm contrastive learning framework, introducing contrast between original features and novel paradigms to alleviate domain-specific distractions.


An Ultralightweight Hybrid CNN Based on Redundancy Removal for Hyperspectral Image Classification

no code implementations • IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024 • Xiaohu Ma , Wuli Wang , Member , IEEE

Simultaneously, for PW-Conv, we design a spectral convolution with redundancy removal (R2Spectral-Conv).


Meta Reinforcement Learning for Multi-Task Offloading in Vehicular Edge Computing

no code implementations • TMC 2024 • Penglin Dai , Yaorong Huang , Kaiwen Hu , Xiao Wu , Huanlai Xing , and Zhaofei Yu , Member , IEEE

The objective is to design a unified solution to minimize task execution time under different MTO scenarios.


Ultra-Robust Real-Time Estimation of Gait Phase

no code implementations • IEEE Transactions on Neural Systems and Rehabilitation Engineering 2023 • Mohammad Shushtari , Hannah Dinovitzer , Jiacheng Weng , and Arash Arami , Member , IEEE

The estimator is finally tested on a participant walking with an active exoskeleton, demonstrating the robustness of D67 in interaction with an exoskeleton without being trained on any data from the test subject with or without an exoskeleton.

Interaction-Aware Planning With Deep Inverse Reinforcement Learning for Human-Like Autonomous Driving in Merge Scenarios

1 code implementation • journal 2023 • Jiangfeng Nan , Weiwen Deng , Member , IEEE , Ruzheng Zhang , Ying Wang , Rui Zhao , Juan Ding

To consider the interaction factor, the reward function for planning is utilized to evaluate the joint trajectories of the autonomous driving vehicle (ADV) and traffic vehicles.


Seismic Random Noise Attenuation Based on Non-IID Pixel-Wise Gaussian Noise Modeling

1 code implementation • IEEE Transactions on Geoscience and Remote Sensing 2023 • Chuangji Meng , Jinghuai Gao , Member , IEEE , Yajun Tian , Zhiqiang Wang

Thus, our proposed framework called VI-Non-IID inclines to have better noise characterization and generalization capabilities, which brings better performance on seismic field NA.


Spoof Trace Disentanglement for Generic Face Anti-Spoofing

no code implementations • journal 2023 • Yaojie Liu and Xiaoming Liu , Member , IEEE

Yet, it is a challenging task due to the diversity of spoof attacks and the lack of ground truth for spoof traces.


Bio-Inspired Feature Selection in Brain Disease Detection via an Improved Sparrow Search Algorithm

no code implementations • IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2022 • Wenyu Yu , Hui Kang , Geng Sun , Member , Shuang Liang , and Jiahui Li , Student Member , IEEE

Finally, the proposed ISSA is utilized to solve the objective function.


VCI-LSTM: Vector Choquet Integral-based Long Short-Term Memory

no code implementations • IEEE 2022 • Mikel Ferrero-Jaurrieta , Zdenko Takáč , Javier Fernández , Member , IEEE , Ľubomíra Horanská , Graçaliz Pereira Dimuro , Susana Montes , Irene Díaz , and Humberto Bustince , Fellow , IEEE

The Choquet integral is a widely used aggregation operator on one-dimensional and interval-valued information, since it is able to take into account possible interactions among data.


Lightweight Deep Neural Network for Joint Learning of Underwater Object Detection and Color Conversion

no code implementations • journal 2022 • Chia-Hung Yeh , Chu-Han Lin , Li-Wei Kang , Member , Chih-Hsiang Huang , Min-Hui Lin , Chuan-Yu Chang , and Chua-Chin Wang , Senior Member , IEEE

Li-Wei Kang is with the Department of Electrical Engineering, National Taiwan Normal University, Taipei 106, Taiwan (e-mail: lwkang@ntnu.edu.tw).

Topology Change Aware Data-Driven Probabilistic Distribution State Estimation Based on Gaussian Process

no code implementations • IEEE Transactions on Smart Grid 2022 • Di Cao , Member , Junbo Zhao , Weihao Hu , Senior Member , Qishu Liao , Qi Huang , Zhe Chen , Fellow , IEEE

This paper addresses distribution system state estimation (DSSE) with unknown topology change.


STMGCN: Mobile Edge Computing-Empowered Vessel Trajectory Prediction Using Spatio-Temporal Multigraph Convolutional Network

no code implementations • IEEE Transactions on Industrial Informatics 2022 • Ryan Wen Liu , Maohan Liang , Jiangtian Nie , Yanli Yuan , Zehui Xiong , Member , IEEE , Han Yu

The revolutionary advances in machine learning and data mining have contributed greatly to the rapid development of the maritime Internet of Things (IoT).

Coverage Control Algorithm for DSNs Based on Improved Gravitational Search

no code implementations • IEEE Sensors Journal 2022 • Yindi Yao , Huanmin Liao , Xiong Li , Student Member , IEEE , Feng Zhao , Xuan Yang , and Shanshan Hu

In directional sensor networks (DSNs), coverage control is an important way to ensure efficient communication and reliable data transmission.

High-order Correlation Preserved Incomplete Multi-view Subspace Clustering

3 code implementations • IEEE Transactions on Image Processing 2022 • Zhenglai Li , Chang Tang , Xiao Zheng , Xinwang Liu , Senior Member , Wei zhang , Member , IEEE , and En Zhu

Specifically, multiple affinity matrices constructed from the incomplete multi-view data are treated as a third-order low-rank tensor with a tensor factorization regularization, which preserves the high-order view correlation and sample correlation.


A GAN-Based Short-Term Link Traffic Prediction Approach for Urban Road Networks Under a Parallel Learning Framework

no code implementations • IEEE Transactions on Intelligent Transportation Systems 2022 • Junchen Jin , Member , IEEE , Dingding Rong , Tong Zhang , Qingyuan Ji , Haifeng Guo , Yisheng Lv , Xiaoliang Ma , and Fei-Yue Wang

This paper proposes a short-term traffic speed prediction approach, called PL-WGAN, for urban road networks, which is considered an important part of a novel parallel learning framework for traffic control and operation.


Shallow Network Based on Depthwise Over-Parameterized Convolution for Hyperspectral Image Classification

no code implementations • 1 Dec 2021 • Hongmin Gao , Member , Zhonghao Chen , Student Member , IEEE , Chenming Li

Therefore, this letter proposes a shallow model for HSIC, which is called depthwise over-parameterized convolutional neural network (DOCNN).

Distributed Differential Evolution Based on Adaptive Mergence and Split for Large-Scale Optimization

1 code implementation • IEEE Transactions on Evolutionary Computation 2021 • Yinglan Feng , Liang Feng , Senior Member , Sam Kwong , and Kay Chen Tan , Fellow , IEEE

In this way, the number of subpopulations is adaptively adjusted and better performing subpopulations obtain more individuals.

Double Deep Q-learning Based Real-Time Optimization Strategy for Microgrids

no code implementations • 27 Jul 2021 • Hang Shuai , Xiaomeng Ai , Jiakun Fang , Wei Yao , Senior Member , Jinyu Wen , Member , IEEE

It is challenging to solve this kind of stochastic nonlinear optimization problem.


A Novel Deep Learning Method for Thermal to Annotated Thermal-Optical Fused Images

no code implementations • 13 Jul 2021 • Suranjan Goswami , IEEE Student Member , Satish Kumar Singh , Senior Member , Bidyut B. Chaudhuri , Life Fellow , IEEE

As a part of this work, we also present a new and unique database for obtaining the region of interest in thermal images based on an existing thermal visual paired database, containing the Region of Interest on 5 different classes of data.

Deep Learning Based Autonomous Vehicle Super Resolution DOA Estimation for Safety Driving

no code implementations • IEEE Transactions on Intelligent Transportation Systems 2021 • Liangtian Wan , Yuchen Sun , Lu Sun , Member , Zhaolong Ning , Senior Member , and Joel J. P. C. Rodrigues , Fellow , IEEE

In this paper, a novel system architecture including a massive multi-input multi-output (MIMO) or a reconfigurable intelligent surface (RIS) and multiple autonomous vehicles is considered in vehicle location systems.


Content-Preserving Image Stitching with Piecewise Rectangular Boundary Constraints

no code implementations • IEEE Transactions on Visualization and Computer Graphics 2021 • Yun Zhang , Yu-Kun Lai , and Fang-Lue Zhang , Member , IEEE

By analyzing the irregular boundary, we construct a piecewise rectangular boundary.


Deep Reinforcement Learning Based Optimization for IRS Based UAV-NOMA Downlink Networks

no code implementations • 17 Jun 2021 • Shiyu Jiao , Ximing Xie , Zhiguo Ding , Fellow , IEEE

This paper investigates the application of deep deterministic policy gradient (DDPG) to intelligent reflecting surface (IRS) based unmanned aerial vehicles (UAV) assisted non-orthogonal multiple access (NOMA) downlink networks.

Detailed Primary and Secondary Distribution System Model Enhancement Using AMI Data

no code implementations • 29 May 2021 • Karen Montano-Martinez , Sushrut Thakar , Shanshan Ma , Zahra Soltani , Student Member , Vijay Vittal , Life Fellow , Mojdeh Khorsand , Raja Ayyanar , Senior Member , Cynthia Rojas , Member , IEEE

Reliable and accurate distribution system modeling, including the secondary network, is essential in examining distribution system performance with high penetration of distributed energy resources (DERs).


Context-aware taxi dispatching at city-scale using deep reinforcement learning

no code implementations • IEEE Transactions on Intelligent Transportation Systems 2021 • Zhidan Liu , Jiangzhou Li , and Kaishun Wu , Member , IEEE

Proactive taxi dispatching is of great importance for balancing taxi demand-supply gaps across different locations in a city.

Low-Complexity Symbol Detection and Interference Cancellation for OTFS System

no code implementations • journal 2021 • Huiyang Qu , Guanghui Liu , Lei Zhang , Shan Wen , Graduate Student Member , and Muhammad Ali Imran , Senior Member , IEEE

Orthogonal time frequency space (OTFS) is a two-dimensional modulation scheme realized in the delay- Doppler domain, which targets the robust wireless transmissions in high-mobility environments.
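The delay-Doppler placement that OTFS relies on can be illustrated with the (inverse) symplectic finite Fourier transform, which maps a delay-Doppler symbol grid to the time-frequency grid. Normalisation and axis conventions differ across the OTFS literature, so this is one common choice rather than the paper's exact formulation:

```python
import numpy as np

def isfft(x_dd):
    """Map a delay-Doppler grid to the time-frequency grid
    (IFFT along the Doppler axis, FFT along the delay axis)."""
    return np.fft.fft(np.fft.ifft(x_dd, axis=0), axis=1)

def sfft(x_tf):
    """Inverse of the mapping above."""
    return np.fft.fft(np.fft.ifft(x_tf, axis=1), axis=0)

# QPSK symbols placed on a small delay-Doppler grid:
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=(8, 8, 2))
x_dd = ((2 * bits[..., 0] - 1) + 1j * (2 * bits[..., 1] - 1)) / np.sqrt(2)
x_tf = isfft(x_dd)          # what is actually transmitted over time-frequency
```

In an ideal (noiseless, channel-free) loop, `sfft(isfft(x_dd))` recovers the delay-Doppler symbols exactly; the paper's contribution concerns detection once a doubly dispersive channel sits between the two transforms.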

Multi-Scale and Multi-Direction GAN for CNN-Based Single Palm-Vein Identification

no code implementations • IEEE Transactions on Information Forensics and Security 2021 • Huafeng Qin , Mounim A. El-Yacoubi , Yantao Li , Member , IEEE , and Chongwen Liu

Despite recent advances of deep neural networks in hand vein identification, the existing solutions assume the availability of a large and rich set of training image samples.

Joint Trajectory and Power Allocation Design for Secure Artificial Noise Aided UAV Communications

no code implementations • journals 2021 • Milad Tatar Mamaghani , Graduate Student Member , and Yi Hong , Senior Member , IEEE

This paper investigates an average secrecy rate (ASR) maximization problem for an unmanned aerial vehicle (UAV) enabled wireless communication system, wherein a UAV is employed to deliver confidential information to a ground destination in the presence of a terrestrial passive eavesdropper.

A 510-nW Wake-Up Keyword-Spotting Chip Using Serial-FFT-Based MFCC and Binarized Depthwise Separable CNN in 28-nm CMOS

no code implementations • journal 2021 • Weiwei Shan , Minhao Yang , Tao Wang , Yicheng Lu , Hao Cai , Lixuan Zhu , Jiaming Xu , Chengjun Wu , Longxing Shi , Senior Member , and Jun Yang , Member , IEEE

We propose a sub-µW always-ON keyword spotting (µKWS) chip for audio wake-up systems.


Data-Driven Assisted Chance-Constrained Energy and Reserve Scheduling with Wind Curtailment

no code implementations • 2 Nov 2020 • Xingyu Lei , Student Member , Zhifang Yang , Member , Junbo Zhao , Juan Yu , Senior Member , IEEE

Case studies performed on the PJM 5-bus and IEEE 118-bus systems demonstrate that the proposed method is capable of accurately accounting for the influence of wind curtailment dispatch in CCO.

Systems and Control

CRPN-SFNet: A High-Performance Object Detector on Large-Scale Remote Sensing Images

no code implementations • 28 Oct 2020 • QiFeng Lin , Jianhui Zhao , Gang Fu , and Zhiyong Yuan , Member , IEEE

Extensive experiments on the public Dataset for Object deTection in Aerial images indicate that our CRPN helps the detector process larger images faster with limited GPU memory; meanwhile, the SFNet helps achieve more accurate detection of geospatial objects across a wide range of scales.

Frame-wise Cross-modal Matching for Video Moment Retrieval

1 code implementation • 22 Sep 2020 • Haoyu Tang , Jihua Zhu , Meng Liu , Member , IEEE , Zan Gao , Zhiyong Cheng

Another contribution is that we propose an additional predictor to utilize the internal frames in the model training to improve the localization accuracy.


Attention Transfer Network for Nature Image Matting

1 code implementation • IEEE Transactions on Circuits and Systems for Video Technology 2020 • Fenfen Zhou , Yingjie Tian , Member , IEEE , and Zhiquan Qi

Then, we introduce a scale transfer block to magnify the feature maps without adding extra information.


A New Multiple Source Domain Adaptation Fault Diagnosis Method between Different Rotating Machines

no code implementations • Transactions on Industrial Informatics 2020 • un Zhu , Nan Chen , Member , IEEE , and Changqing Shen

To solve this issue, transfer learning is proposed, leveraging knowledge learned from the source domain in the target domain.


Learning Person Re-identification Models from Videos with Weak Supervision

no code implementations • 21 Jul 2020 • Xueping Wang , Sujoy Paul , Dripta S. Raychaudhuri , Min Liu , Yaonan Wang , Amit K. Roy-Chowdhury , Fellow , IEEE

In order to cope with this issue, we introduce the problem of learning person re-identification models from videos with weak supervision.

Obstacle Avoidance and Tracking Control of Redundant Robotic Manipulator: An RNN-Based Metaheuristic Approach

no code implementations • IEEE Transactions on Industrial Informatics 2020 • Ameer Hamza Khan , Student Member , Shuai Li , and Xin Luo , Senior Member , IEEE

In this article, we present a metaheuristic-based control framework, called beetle antennae olfactory recurrent neural network, for simultaneous tracking control and obstacle avoidance of a redundant manipulator.

Edge server deployment scheme of blockchain in IoVs

no code implementations • 16 Jun 2020 • Liya Xu , Mingzhu Ge , Weili Wu , Member , IEEE

In fact, the application of blockchain in IoVs can be implemented by employing edge computing.

Service Provisioning Framework for RAN Slicing: User Admissibility, Slice Association and Bandwidth Allocation

no code implementations • IEEE Transactions on Mobile Computing 2020 • Yao Sun , Shuang Qin , Member , Gang Feng , Lei Zhang , and Muhammad Ali Imran , Senior Member , IEEE

Network slicing (NS) has been identified as one of the most promising architectural technologies for future mobile network systems to meet the extremely diversified service requirements of users.

A Simplified 2D-3D CNN Architecture for Hyperspectral Image Classification Based on Spatial–Spectral Fusion

no code implementations • 5 Jun 2020 • Chunyan Yu , Rui Han , Meiping Song , Caiyu Liu , and Chein-I Chang , Life Fellow , IEEE

Convolutional neural networks (CNNs) have led to a successful breakthrough in hyperspectral image classification (HSIC).


Decision Fusion in Space-Time Spreading aided Distributed MIMO WSNs

no code implementations • 16 May 2020 • I. Dey , H. Joshi , Member , N. Marchetti , Senior Member , IEEE

In this letter, we propose space-time spreading (STS) of local sensor decisions before reporting them over a wireless multiple access channel (MAC), in order to achieve flexible balance between diversity and multiplexing gain as well as eliminate any chance of intrinsic interference inherent in MAC scenarios.

Energy-Efficient Over-the-Air Computation Scheme for Densely Deployed IoT Networks

no code implementations • IEEE 2020 • Semiha Tedik Basaran , Student Member , Gunes Karabulut Kurt , and Periklis Chatzimisios , Senior Member , IEEE

The proposed MMSE estimator provides a significant mean-squared-error improvement while reducing energy consumption compared to the conventional estimator.

A Lightweight and Privacy-Preserving Authentication Protocol for Mobile Edge Computing

no code implementations • 27 Feb 2020 • Kuljeet Kaur∗ , Sahil Garg∗ , Georges Kaddoum∗ , Member , Mohsen Guizani† , Fellow , IEEE , and Dushantha Nalin K. Jayakody‡ , Senior Member , IEEE.

With the advent of the Internet-of-Things (IoT), vehicular networks, and cyber-physical systems, the need for real-time data processing and analysis has emerged as an essential prerequisite for customers' satisfaction.

Reconfigurable Intelligent Surface Assisted Multiuser MISO Systems Exploiting Deep Reinforcement Learning

1 code implementation • 24 Feb 2020 • Chongwen Huang , Member , IEEE , Ronghong Mo , Chau Yuen , Senior Member

In this paper, we investigate the joint design of transmit beamforming matrix at the base station and the phase shift matrix at the RIS, by leveraging recent advances in deep reinforcement learning (DRL).

Reinforcement Learning Tracking Control for Robotic Manipulator With Kernel-Based Dynamic Model

no code implementations • TRANSACTION 2020 • Yazhou Hu , Wenxue Wang , Hao Liu , and Lianqing Liu , Member , IEEE

In this algorithm, a reward function is defined according to the features of tracking control in order to speed up the learning process, and then an RL tracking controller with a kernel-based transition dynamic model is proposed.

Broad Learning System Based on Maximum Correntropy Criterion

no code implementations • 24 Dec 2019 • Yunfei Zheng , Badong Chen , Shiyuan Wang , Senior Member , Weiqun Wang , Member , IEEE

As an effective and efficient discriminative learning method, Broad Learning System (BLS) has received increasing attention due to its outstanding performance in various regression and classification problems.

Localization and Clustering Based on Swarm Intelligence in UAV Networks for Emergency Communications

no code implementations • IEEE Internet of Things Journal 2019 • Muhammad Yeasir Arafat , Sangman Moh , Member , IEEE

Second, we propose an energy-efficient swarm-intelligence-based clustering (SIC) algorithm based on PSO, in which the particle fitness function is exploited for inter-cluster distance, intra-cluster distance, residual energy, and geographic location.

VSSA-NET: Vertical Spatial Sequence Attention Network for Traffic Sign Detection

no code implementations • 5 May 2019 • Yuan Yuan , Zhitong Xiong , Student Member , Qi Wang , Senior Member , IEEE

Our contributions are as follows: 1) We propose a multi-resolution feature fusion network architecture which exploits densely connected deconvolution layers with skip connections, and can learn more effective features for the small size object; 2) We frame the traffic sign detection as a spatial sequence classification and regression task, and propose a vertical spatial sequence attention (VSSA) module to gain more context information for better detection performance.

GETNET: A General End-to-end Two-dimensional CNN Framework for Hyperspectral Image Change Detection

1 code implementation • 5 May 2019 • Qi Wang , Senior Member , Zhenghang Yuan , Qian Du , Xuelong Li , Fellow , IEEE

In order to better handle high dimension problem and explore abundance information, this paper presents a General End-to-end Two-dimensional CNN (GETNET) framework for hyperspectral image change detection (HSI-CD).

Discrete-Time Impulsive Adaptive Dynamic Programming

no code implementations • IEEE Transactions on Cybernetics 2019 • Qinglai Wei , Ruizhuo Song , Member , IEEE , Zehua Liao , Benkai Li , and Frank L. Lewis

Abstract—In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal impulsive control problems for infinite horizon discrete-time nonlinear systems.

Generalization of the Dark Channel Prior for Single Image Restoration

no code implementations • IEEE Transactions on Image Processing 2019 • Yan-Tsung Peng , Keming Cao , and Pamela C. Cosman , Fellow , IEEE

Abstract— Images degraded by light scattering and absorption, such as hazy, sandstorm, and underwater images, often suffer color distortion and low contrast because of light traveling through turbid media.

A MIP Model for Risk Constrained Switch Placement in Distribution Networks

no code implementations • IEEE 2019 • Milad Izadi , Student Member , IEEE and Amir Safdarian , Member , IEEE

The model is applied to the RBTS-Bus4 and a real distribution network.

PEA265: Perceptual Assessment of Video Compression Artifacts

no code implementations • 1 Mar 2019 • Liqun Lin , Shiqi Yu , Tiesong Zhao , Member , Zhou Wang , Fellow , IEEE

To monitor and improve visual QoE, it is crucial to develop subjective and objective measures that can identify and quantify various types of PEAs.

A Provably Secure and Efficient Identity-Based Anonymous Authentication Scheme for Mobile Edge Computing

no code implementations • 22 Feb 2019 • Xiaoying Jia , Debiao He , Neeraj Kumar , and Kim-Kwang Raymond Choo , Senior Member , IEEE

Mobile edge computing (MEC) allows one to overcome a number of limitations inherent in cloud computing, although achieving the broad range of security requirements in MEC settings remains challenging.

Location-Centered House Price Prediction: A Multi-Task Learning Approach

no code implementations • 7 Jan 2019 • Guangliang Gao , Zhifeng Bao , Jie Cao , A. K. Qin , Timos Sellis , Fellow , IEEE , Zhiang Wu

Regarding the choice of prediction model, we observe that a variety of approaches either consider the entire house data for modeling, or split the entire data and model each partition independently.

DATS: Dispersive Stable Task Scheduling in Heterogeneous Fog Networks

no code implementations • Conference 2018 • Zening Liu , Xiumei Yang , Yang Yang , Kunlun Wang , and Guoqiang Mao , Fellow , IEEE

Abstract—Fog computing has risen as a promising architecture for future Internet of Things (IoT), 5G and embedded artificial intelligence (AI) applications with stringent service delay requirements along the cloud to things continuum.

Blockchain for Secure and Efficient Data Sharing in Vehicular Edge Computing and Networks

no code implementations • IEEE INTERNET OF THINGS JOURNAL, VOL. 6, NO. 3 2018 • Jiawen Kang , Rong Yu , Xumin Huang , Maoqiang Wu , Sabita Maharjan , Member , Shengli Xie , and Yan Zhang , Senior Member , IEEE

Due to limited resources on vehicles, vehicular edge computing and networks (VECONs), i.e., the integration of mobile edge computing and vehicular networks, can provide powerful computing and massive storage resources.

Optimal Training for Residual Self-Interference for Full-Duplex One-Way Relays

no code implementations • 13 Aug 2018 • Xiaofeng Li , Cihan Tepedelenlioğlu , and Habib Şenol , Member , IEEE

For the former, we propose a training scheme to estimate the overall channel, and for the latter the CRB and the optimal number of relays are derived when the distance between the source and the destination is fixed.

Medical Image Synthesis with Deep Convolutional Adversarial Networks

1 code implementation • IEEE Transactions on Biomedical Engineering 2018 • Dong Nie , Roger Trullo , Jun Lian , Li Wang , Caroline Petitjean , Su Ruan , Qian Wang , and Dinggang Shen , Fellow , IEEE

To better model a nonlinear mapping from source to target and to produce more realistic target images, we propose to use the adversarial learning strategy to better model the FCN.

Single Image Dehazing Using Color Ellipsoid Prior

1 code implementation • IEEE Transactions on Image Processing 2018 • Trung Minh Bui , Student Member , and Wonha Kim , Senior Member , IEEE

The proposed method constructs color ellipsoids that are statistically fitted to haze pixel clusters in RGB space and then calculates the transmission values through color ellipsoid geometry.

Significantly Fast and Robust Fuzzy C-Means Clustering Algorithm Based on Morphological Reconstruction and Membership Filtering

no code implementations • IEEE 2018 • Tao Lei , Xiaohong Jia , Yanning Zhang , Lifeng He , Hongying Meng , Senior Member , and Asoke K. Nandi , Fellow , IEEE

However, the introduction of local spatial information often leads to a high computational complexity, arising out of an iterative calculation of the distance between pixels within local spatial neighbors and clustering centers.

An Integrated Platform for Live 3D Human Reconstruction and Motion Capturing

no code implementations • 8 Dec 2017 • Dimitrios S. Alexiadis , Anargyros Chatzitofis , Nikolaos Zioulis , Olga Zoidi , Georgios Louizis , Dimitrios Zarpalas , Petros Daras , Senior Member , IEEE

The latest developments in 3D capturing, processing, and rendering provide means to unlock novel 3D application pathways.

Robust Single Image Super-Resolution via Deep Networks With Sparse Prior

1 code implementation • journals 2016 • Ding Liu , Zhaowen Wang , Bihan Wen , Student Member , Jianchao Yang , Member , Wei Han , and Thomas S. Huang , Fellow , IEEE

We demonstrate that a sparse coding model particularly designed for SR can be incarnated as a neural network with the merit of end-to-end optimization over training data.

A Decentralized Cooperative Control Scheme With Obstacle Avoidance for a Team of Mobile Robots

no code implementations • journal 2013 • Hamed Rezaee , Student Member , and Farzaneh Abdollahi , Member , IEEE

The problem of formation control of a team of mobile robots based on the virtual and behavioral structures is considered in this paper.

A Grid-Based Evolutionary Algorithm for Many-Objective Optimization

1 code implementation • IEEE Transactions on Evolutionary Computation 2013 • Shengxiang Yang , Member , IEEE , Miqing Li , Xiaohui Liu , and Jinhua Zheng

Balancing convergence and diversity plays a key role in evolutionary multiobjective optimization (EMO).

Physiological Parameter Monitoring from Optical Recordings with a Mobile Phone

no code implementations • 29 Jul 2011 • Christopher G. Scully , Student Member , Jinseok Lee , Joseph Meyer , Alexander M. Gorbach , Domhnull Granquist-Fraser , Yitzhak Mendelson , Member , and Ki H. Chon , Senior Member , IEEE

We show that a mobile phone can serve as an accurate monitor for several physiological variables, based on its ability to record and analyze the varying color signals of a fingertip placed in contact with its optical sensor.

Performance Analysis of Two Hop Amplify-and-Forward Systems with Interference at the Relay

no code implementations • journal 2010 • Himal A. Suraweera , Member , Hari K. Garg , and A. Nallanathan , Senior Member , IEEE

Abstract—We analyze the performance of a two hop channel state information (CSI)-assisted amplify-and-forward system, with co-channel interference at the relay.

Efficiently Indexing Large Sparse Graphs for Similarity Search

no code implementations • 18 Feb 2010 • Guoren Wang , Bin Wang , Xiaochun Yang , IEEE Computer Society , and Ge Yu , Member , IEEE

Abstract—The graph structure is a very important means to model schemaless data with complicated structures, such as protein- protein interaction networks, chemical compounds, knowledge query inferring systems, and road networks.

Analysis of Calibrated Sea Clutter and Boat Reflectivity Data at C- and X-Band in South African Coastal Waters

no code implementations • IEEE 2007 • Ron Rubinstein , Member , Tomer Peleg , Student Member , and Michael Elad , Fellow , IEEE

Abstract—The synthesis-based sparse representation model for signals has drawn considerable interest in the past decade.

Parameter-free Geometric Document Layout Analysis

no code implementations • IEEE Transactions on Pattern Analysis and Machine Intelligence 2001 • Seong-Whan Lee , Senior Member , IEEE , and Dae-Seok Ryu

Based on the proposed periodicity measure, multiscale analysis, and confirmation procedure, we could develop a robust method for geometric document layout analysis independent of character font sizes, text line spacing, and document layout structures.

2024 IEEE International Conference on Image Processing

Datasets and Benchmarks Track at ICIP 2024

We are thrilled to announce the ICIP Datasets and Benchmarks track. High-quality, publicly available image and video datasets are critical for advancing the field of image processing, and we seek to provide researchers with a diverse collection of datasets that can be routinely used to test, benchmark, and improve the overall performance of image processing methods and algorithms. We encourage researchers from all fields to submit their datasets and be part of this exciting track. This track serves as a venue for high-quality publications on highly valuable image and video datasets and benchmarks, as well as a forum for discussions on how to improve dataset development.

Submissions to the track will be part of the main ICIP conference, presented alongside the main conference papers. Accepted papers will be officially published in the ICIP proceedings and follow the same deadlines as regular papers. Make sure you choose the “Submit to Datasets and Benchmarks Track” button on the paper submission site.

We are aiming for a review process equally as stringent as that of the main conference, yet better suited to datasets and benchmarks. Submissions to this track will be reviewed according to a set of criteria and best practices specifically designed for datasets and benchmarks. A key criterion is accessibility: datasets should be available and accessible, i.e. the data can be found and obtained without a personal request to the PI, and any required code should be open source. Next to the scientific paper, authors should also submit supplementary materials such as details on how the data was collected and organized, what kind of information it contains, how it should be used ethically and responsibly, and how it will be made available and maintained.

The factors that will be considered when evaluating papers include:

  • Utility and quality of the submission: impact, originality, novelty, and relevance to the ICIP community will all be considered.
  • Reproducibility: all submissions should be accompanied by sufficient information to reproduce the results described, i.e. all necessary datasets, code, and evaluation procedures must be accessible and documented. We encourage the use of a reproducibility framework such as the Research Reproducibility standards to guarantee that all results can be easily reproduced. Benchmark submissions in particular should take care to provide sufficient details to ensure reproducibility. If submissions include code, please refer to the IEEE guidelines.
  • Ethics: any ethical implications of the work should be addressed. Authors should rely on the IEEE Digital Ethics and Privacy Technology Guidelines.
  • Completeness of the relevant documentation: datasets must be accompanied by documentation communicating the details of the dataset as part of their submissions. Sufficient detail must be provided on how the data was collected and organized, what kind of information it contains, how it should be used ethically and responsibly, and how it will be made available and maintained.
  • Licensing and access: authors should provide licenses for any datasets released. These licenses should consider the intended use and limitations of the dataset, and terms of use should be developed to prevent misuse or inappropriate use.
  • Consent and privacy: datasets should minimize the exposure of any personally identifiable information, unless informed consent from those individuals is provided. Any paper that chooses to create a dataset with real data of real people should ask for the explicit consent of participants, or explain why it was unable to do so.
  • Ethics and responsible use: any ethical implications of new datasets should be addressed, and guidelines for responsible use should be provided where appropriate. Note that if your submission includes publicly available datasets (e.g. as part of a larger benchmark), you should also check these datasets for ethical issues. You remain responsible for the ethical implications of including existing datasets or other data sources in your work.
  • Legal compliance: for datasets, authors should ensure awareness of and compliance with regional legal requirements.

This track welcomes all work on data-centric image processing research, covering image, video, and 3D datasets and benchmarks, as well as algorithms, tools, methods, and analyses for working with visual data. This includes but is not limited to:

  • New datasets, or carefully and thoughtfully designed (collections of) datasets based on previously available data.
  • Data generators.
  • Data-centric image processing methods and tools.
  • Advanced practices in data collection and curation that are of general interest even if the data itself cannot be shared.
  • Frameworks for responsible dataset development, audits of existing datasets, and identification of significant problems with existing datasets and their use.
  • Benchmarks on new or existing datasets, as well as benchmarking tools.
  • In-depth analyses of image processing challenges and competitions (by organizers and/or participants) that yield important new insight.
  • Systematic analyses of existing systems on novel datasets yielding important new insight.

Submission: 

There will be one deadline for all papers, including those submitted to this track. Submitted papers follow the same format and page limit as a regular paper. Supplementary materials are strongly encouraged. Submissions introducing new datasets must include the following in the supplementary materials:

  • Dataset documentation and intended uses, ideally following a recommended documentation framework.
  • URL to website/platform where the dataset/benchmark can be viewed and downloaded by the reviewers.
  • Author statement that they bear all responsibility in case of violation of rights, etc., and confirmation of the data license.
  • Hosting, licensing, and maintenance plan. The choice of hosting platform is yours, as long as you ensure access to the data (possibly through a curated interface) and will provide the necessary maintenance.
  • Links to access the dataset and its metadata. This can be hidden upon submission if the dataset is not yet publicly available but must be added in the camera-ready version. Reviewers must have access to the data. Simulation environments should link to open source code repositories.
  • The dataset itself should ideally use an open and widely used data format. Provide a detailed explanation on how the dataset can be read. For simulation environments, use existing frameworks or explain how they can be used.
  • Long-term preservation: It must be clear that the dataset will be available for a long time, by uploading to a data repository.
  • Explicit license: Authors must choose a license, ideally a CC license for datasets, or an open source license for code. An overview of licenses can be found here:  https://paperswithcode.com/datasets/license
  • Add structured metadata to a dataset’s meta-data page using Web standards. This allows it to be discovered and organized by anyone. A guide can be found here:  https://developers.google.com/search/docs/data-types/dataset . If you use an existing data repository, this is often done automatically.
  • Highly recommended: a persistent dereferenceable identifier (e.g. a  DOI minted by a data repository or a prefix on  identifiers.org ) for datasets, or a code repository (e.g. GitHub, GitLab,…) for code. If this is not possible or useful, please explain why.
  • For benchmarks, the supplementary materials must ensure that all results are easily reproducible. Where possible, use a reproducibility framework such as the IEEE  Research Reproducibility standards, or otherwise guarantee that all results can be easily reproduced, i.e. all necessary datasets, code, and evaluation procedures must be accessible and documented.
  • For papers introducing evaluation and new perspectives of existing datasets, the above supplementary materials are required.
  • For papers introducing best practices in creating or curating datasets and benchmarks, the above supplementary materials are not required.

Deep learning models for digital image processing: a review

  • Published: 07 January 2024
  • Volume 57, article number 11 (2024)

  • R. Archana
  • P. S. Eliahim Jeevaraj

Within the domain of image processing, a wide array of methodologies is dedicated to tasks including denoising, enhancement, segmentation, feature extraction, and classification. These techniques collectively address the challenges and opportunities posed by different aspects of image analysis and manipulation, enabling applications across various fields. Each of these methodologies contributes to refining our understanding of images, extracting essential information, and making informed decisions based on visual data. Traditional image processing methods and Deep Learning (DL) models represent two distinct approaches to tackling image analysis tasks. Traditional methods often rely on handcrafted algorithms and heuristics, involving a series of predefined steps to process images. DL models learn feature representations directly from data, allowing them to automatically extract intricate features that traditional methods might miss. In denoising, techniques like Self2Self NN, Denoising CNNs, DFT-Net, and MPR-CNN stand out, offering reduced noise while grappling with challenges of data augmentation and parameter tuning. Image enhancement, facilitated by approaches such as R2R and LE-net, showcases potential for refining visual quality, though complexities in real-world scenes and authenticity persist. Segmentation techniques, including PSPNet and Mask-RCNN, exhibit precision in object isolation, while handling complexities like overlapping objects and robustness concerns. For feature extraction, methods like CNN and HLF-DIP showcase the role of automated recognition in uncovering image attributes, with trade-offs in interpretability and complexity. Classification techniques span from Residual Networks to CNN-LSTM, spotlighting their potential in precise categorization despite challenges in computational demands and interpretability. 
This review offers a comprehensive understanding of the strengths and limitations across methodologies, paving the way for informed decisions in practical applications. As the field evolves, addressing challenges like computational resources and robustness remains pivotal in maximizing the potential of image processing techniques.

1 Introduction

Image Processing (IP) stands as a multifaceted field encompassing a range of methodologies dedicated to gleaning valuable insights from images. Concurrently, the landscape of Artificial Intelligence (AI) has burgeoned into an expansive realm of exploration, serving as the conduit through which intelligent machines strive to replicate human cognitive capacities. Within the expansive domain of AI, Machine Learning (ML) emerges as a pivotal subset, empowering models to autonomously extrapolate outcomes from structured datasets, effectively diminishing the need for explicit human intervention in the decision-making process. At the heart of ML lies Deep Learning (DL), a subset that transcends conventional techniques, particularly in handling unstructured data. DL boasts an unparalleled potential for achieving remarkable accuracy, at times even exceeding human-level performance. This prowess, however, hinges on the availability of copious data to train intricate neural network architectures, characterized by their multilayered composition. Unlike their traditional counterparts, DL models exhibit an innate aptitude for feature extraction, a task that historically posed challenges. This proficiency can be attributed to the architecture's capacity to inherently discern pertinent features, bypassing the need for explicit feature engineering. Rooted in the aspiration to emulate cognitive processes, DL strives to engineer learning algorithms that faithfully mirror the intricacies of the human brain. In this paper, a diverse range of deep learning methodologies, contributed by various researchers, is elucidated within the context of Image Processing (IP) techniques.

This comprehensive compendium delves into the diverse and intricate landscape of Image Processing (IP) techniques, encapsulating the domains of image restoration, enhancement, segmentation, feature extraction, and classification. Each domain serves as a cornerstone in the realm of visual data manipulation, contributing to the refinement, understanding, and utilization of images across a plethora of applications.

Image restoration techniques constitute a critical first step in rectifying image degradation and distortion. These methods, encompassing denoising, deblurring, and inpainting, work tirelessly to reverse the effects of blurring, noise, and other forms of corruption. By restoring clarity and accuracy, these techniques lay the groundwork for subsequent analyses and interpretations, essential in fields like medical imaging, surveillance, and more.

The purview extends to image enhancement, where the focus shifts to elevating image quality through an assortment of adjustments. Techniques that manipulate contrast, brightness, sharpness, and other attributes enhance visual interpretability. This enhancement process, applied across diverse domains, empowers professionals to glean finer details, facilitating informed decision-making and improved analysis.

The exploration further extends to image segmentation, a pivotal process for breaking down images into meaningful regions. Techniques such as clustering and semantic segmentation aid in the discernment of distinct entities within images. The significance of image segmentation is particularly pronounced in applications like object detection, tracking, and scene understanding, where it serves as the backbone of accurate identification and analysis.

Feature extraction emerges as a fundamental aspect of image analysis, entailing the identification of crucial attributes that pave the way for subsequent investigations. While traditional methods often struggle to encapsulate intricate attributes, deep learning techniques excel in autonomously recognizing complex features, contributing to a deeper understanding of images and enhancing subsequent analysis.

Image classification, a quintessential task in the realm of visual data analysis, holds prominence. This process involves assigning labels to images based on their content, playing a pivotal role in areas such as object recognition and medical diagnosis. Both machine learning and deep learning techniques are harnessed to automate the accurate categorization of images, enabling efficient and effective decision-making.

Section 1 introduces image processing operations. Section 2 provides a comprehensive overview of the evaluation metrics employed for various image processing operations. Section 3 explores the range of Deep Learning (DL) models tailored for image preprocessing tasks. Section 4 outlines the DL methods harnessed for image segmentation, along with their techniques and applications.

Section 5 examines DL strategies for feature extraction, elucidating their significance and effectiveness. Section 6 covers DL models designed for image classification, delving into their architectures and performance characteristics. The significance of each model is discussed in Section 7. Section 8 concludes with the synthesized findings and key takeaways of the study.

The array of papers discussed in this paper collectively present a panorama of DL methodologies spanning various application domains. Notably, these domains encompass medical imagery, satellite imagery, botanical studies involving flower images, as well as fruit images, and even real-time image scenarios. Each domain's unique challenges and intricacies are met with tailored DL approaches, underscoring the adaptability and potency of these methods across diverse real-world contexts.

2 Metrics for image processing operations

Evaluation metrics serve as pivotal tools in the assessment of the efficacy and impact of diverse image processing techniques. These metrics serve the essential purpose of furnishing quantitative measurements that empower researchers and practitioners to undertake an unbiased analysis and facilitate meaningful comparisons among the outcomes yielded by distinct methods. By employing these metrics, the intricate and often subjective realm of image processing can be rendered more objective, leading to informed decisions and advancements in the field.

2.1 Metrics for image preprocessing

2.1.1 Mean squared error (MSE)

The average of the squared differences between predicted and actual values. It penalizes larger errors more heavily.

\(MSE=\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}{\left({Original}_{(i,j)}-{Denoised}_{(i,j)}\right)}^{2}\)

where M and N are the dimensions of the image, and \({Original}_{(i,j)}\) and \({Denoised}_{(i,j)}\) are the pixel values at position (i, j) in the original and denoised images, respectively.
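As an illustration, MSE can be computed directly from this definition. The sketch below uses plain Python lists as a stand-in for image arrays; the helper name is ours, not from the paper.

```python
def mse(original, denoised):
    """Mean squared error between two equally sized grayscale images,
    given as 2D lists of pixel values (M rows x N columns)."""
    m, n = len(original), len(original[0])
    total = 0.0
    for i in range(m):
        for j in range(n):
            diff = original[i][j] - denoised[i][j]
            total += diff * diff
    return total / (m * n)

# Identical images give zero error; a uniform offset of 2 gives MSE = 4.
print(mse([[10, 20], [30, 40]], [[10, 20], [30, 40]]))  # 0.0
print(mse([[10, 20], [30, 40]], [[12, 22], [32, 42]]))  # 4.0
```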

2.1.2 Peak signal-to-noise ratio (PSNR)

PSNR is commonly used to measure the quality of restored images. It compares the original and restored images by considering the mean squared error between their pixel values.

\(PSNR=10\,{\log }_{10}\left(\frac{{MAX}^{2}}{MSE}\right)\)

where MAX is the maximum possible pixel value (255 for 8-bit images) and MSE is the mean squared error between the original and denoised images.
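Given an MSE value, PSNR follows in a few lines. This is a minimal sketch; the default of 255 assumes 8-bit images.

```python
import math

def psnr(mse_value, max_val=255):
    """Peak signal-to-noise ratio in dB; higher means better restoration."""
    if mse_value == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / mse_value)

print(round(psnr(4.0), 2))  # about 42.11 dB for an MSE of 4 on 8-bit images
```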

2.1.3 Structural similarity index (SSIM)

SSIM is applicable to image restoration as well. It assesses the similarity between the original and restored images in terms of luminance, contrast, and structure. Higher SSIM values indicate better restoration quality.

\({SSIM}_{\left(x,y\right)}=\frac{\left(2{\mu }_{x}{\mu }_{y}+{c}_{1}\right)\left(2{\sigma }_{xy}+{c}_{2}\right)}{\left({\mu }_{x}^{2}+{\mu }_{y}^{2}+{c}_{1}\right)\left({\sigma }_{x}^{2}+{\sigma }_{y}^{2}+{c}_{2}\right)}\)

where \({\mu }_{x}\) and \({\mu }_{y}\) are the mean values of the original and denoised images, \({\sigma }_{x}^{2}\) and \({\sigma }_{y}^{2}\) are their variances, \({\sigma }_{xy}\) is the covariance between them, and \({c}_{1}\) and \({c}_{2}\) are constants to avoid division by zero.
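The formula can be sketched in pure Python. Note that this computes a single global SSIM over whole images, a simplification: practical implementations average the statistic over sliding local windows. The constants follow the common choice \({c}_{1}={(0.01L)}^{2}\) and \({c}_{2}={(0.03L)}^{2}\) with L = 255.

```python
def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Simplified single-window SSIM between two flattened images
    (lists of pixel values). Real implementations slide a local window."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

img = [10.0, 20.0, 30.0, 40.0]
print(ssim_global(img, img))  # 1.0: identical images match perfectly
```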

2.1.4 Mean structural similarity index (MSSIM)

MSSIM extends SSIM to multiple patches of the image and calculates the mean SSIM value over those patches.

\(MSSIM\left(X,Y\right)=\frac{1}{M}\sum_{i=1}^{M}SSIM\left({x}_{i},{y}_{i}\right)\)

where \({x}_{i}\) and \({y}_{i}\) are the i-th patches of the original and enhanced images, and M is the number of patches.

2.1.5 Mean absolute error (MAE)

The average of the absolute differences between predicted and actual values. It provides a more robust measure against outliers.

\(MAE=\frac{1}{n}\sum_{i=1}^{n}\left|{y}_{i}-{\widehat{y}}_{i}\right|\)

where n is the number of samples, and \({y}_{i}\) and \({\widehat{y}}_{i}\) are the actual and predicted values.
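The claim about outlier robustness is easy to see on a toy example (the sample values below are made up for illustration): a single gross error shifts MAE linearly, whereas MSE would grow with the square of that error.

```python
def mae(actual, predicted):
    """Mean absolute error over n samples."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# A single outlier inflates MAE far less than it would inflate MSE:
actual    = [10, 10, 10, 10]
predicted = [11, 11, 11, 50]   # one gross error of 40
# mae = (1 + 1 + 1 + 40) / 4 = 10.75
# (MSE on the same data would be (1 + 1 + 1 + 1600) / 4 = 400.75)
```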

2.1.6 NIQE (Naturalness image quality evaluator)

NIQE quantifies the naturalness of an image by measuring how far its local luminance and contrast statistics deviate from a statistical model learned from natural images. Unlike PSNR or SSIM, it is a no-reference metric and requires no ground-truth image.

2.1.7 FID (Fréchet inception distance)

FID measures the distance between two distributions (real and generated images) using the Fréchet distance between their feature representations calculated by a pre-trained neural network.
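Computing true FID requires a pre-trained Inception network to extract feature statistics. The underlying Fréchet distance, however, has a simple closed form for one-dimensional Gaussians, sketched here purely for intuition; the multivariate form used in practice replaces the variance terms with covariance matrices and a matrix square root.

```python
def frechet_1d(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two 1-D Gaussians N(mu1, sigma1^2) and
    N(mu2, sigma2^2): (mu1 - mu2)^2 + sigma1^2 + sigma2^2 - 2*sigma1*sigma2."""
    return (mu1 - mu2) ** 2 + sigma1 ** 2 + sigma2 ** 2 - 2 * sigma1 * sigma2

# Identical distributions give 0; the distance grows with
# both the mean gap and the spread mismatch.
```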

2.2 Metrics for image segmentation

2.2.1 Intersection over union (IoU)

IoU measures the overlap between the predicted region (a segmentation mask or bounding box) and the ground truth. It is commonly used to evaluate object detection and segmentation models.
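A minimal sketch of box-level IoU in plain Python, assuming boxes are given as (x1, y1, x2, y2) corner tuples; for segmentation masks the same idea applies with pixel-wise intersection and union counts.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (width/height clamp to 0 if no overlap).
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

# Two 2x2 boxes overlapping in a 1x2 strip: IoU = 2 / (4 + 4 - 2) = 1/3.
```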

2.2.2 Average precision (AP)

AP measures the precision at different recall levels and computes the area under the precision-recall curve. Used to assess object detection and instance segmentation models.

2.2.3 Dice similarity coefficient

The Dice similarity coefficient (DSC), also known as the Sørensen–Dice coefficient, is another measure of similarity between two sets. In the context of image segmentation, it quantifies the overlap between the predicted segmentation and the ground truth, taking into account both false positives and false negatives. DSC ranges from 0 to 1, where higher values indicate better overlap between the predicted and ground-truth segmentations; a DSC of 1 corresponds to a perfect match.
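For binary masks, DSC reduces to a few lines; a plain-Python sketch using flat 0/1 lists:

```python
def dice(pred, truth):
    """Dice coefficient for two binary masks (flat 0/1 lists):
    2 * |A intersect B| / (|A| + |B|)."""
    inter = sum(p * t for p, t in zip(pred, truth))
    return 2 * inter / (sum(pred) + sum(truth))

pred  = [1, 1, 1, 0, 0]
truth = [0, 1, 1, 1, 0]
# intersection = 2, |pred| = 3, |truth| = 3  ->  Dice = 4/6 ~ 0.667
```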

2.2.4 Average accuracy (AA)

Average Accuracy measures the overall accuracy of the segmentation by calculating the percentage of correctly classified pixels across all classes.

where N is the number of classes, \({True\,Positives}_{i}\) and \({True\,Negatives}_{i}\) are the true positives and true negatives for class i, and \({Total\,Pixels}_{i}\) is the total number of pixels in class i.
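Since the exact formula did not survive extraction here, the sketch below uses one common formulation of average accuracy: the mean of per-class accuracies, i.e., for each class, the fraction of its ground-truth pixels that are labelled correctly, averaged over all classes.

```python
def average_accuracy(pred, truth, classes):
    """Mean of per-class pixel accuracies over flat label lists
    (one common reading of AA; assumes every class occurs in `truth`)."""
    accs = []
    for c in classes:
        idx = [i for i, t in enumerate(truth) if t == c]
        correct = sum(1 for i in idx if pred[i] == c)
        accs.append(correct / len(idx))
    return sum(accs) / len(classes)

truth = [0, 0, 0, 1, 1]
pred  = [0, 0, 1, 1, 0]
# class 0: 2/3 correct, class 1: 1/2 correct  ->  AA = (2/3 + 1/2)/2 ~ 0.583
```

Unlike overall pixel accuracy, this average weights every class equally, so a rare class cannot be drowned out by a dominant background class.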

2.3 Metrics for feature extraction and classification

2.3.1 Accuracy

The ratio of correctly predicted instances to the total number of instances. It's commonly used for balanced datasets but can be misleading for imbalanced datasets.

2.3.2 Precision

The ratio of true positive predictions to the total number of positive predictions. It measures the model’s ability to avoid false positives.

2.3.3 Recall (Sensitivity or true positive rate)

The ratio of true positive predictions to the total number of actual positive instances. It measures the model’s ability to correctly identify positive instances.

2.3.4 F1-Score

The harmonic mean of precision and recall. It provides a balanced measure between precision and recall.
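Precision, recall, and F1 all follow directly from confusion-matrix counts, as this minimal Python sketch shows:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts
    (assumes at least one predicted and one actual positive)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean: dominated by the weaker of the two rates.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# 8 true positives, 2 false positives, 4 false negatives:
# precision = 0.8, recall = 8/12 ~ 0.667, F1 = 8/11 ~ 0.727
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are high, which is exactly the balance the text describes.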

2.3.5 Specificity (True negative rate)

The ratio of true negative predictions to the total number of actual negative instances.

2.3.6 ROC curve (Receiver operating characteristic curve)

A graphical representation of the trade-off between the true positive rate and the false positive rate as the classification threshold varies. The ROC curve plots this trade-off, and the area under the curve (AUC) summarizes it in a single number; both are commonly used in binary classification.
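AUC can be computed without explicitly drawing the curve: it equals the probability that a randomly chosen positive instance is scored above a randomly chosen negative one (ties counting half). A plain-Python sketch of this rank-based formulation:

```python
def roc_auc(scores, labels):
    """AUC as P(random positive scores above random negative), with ties
    counting half -- equivalent to the area under the ROC curve."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.4, 0.3]
labels = [1,   0,   1,   0]
# pairs: (0.9 vs 0.8) win, (0.9 vs 0.3) win, (0.4 vs 0.8) loss, (0.4 vs 0.3) win
# AUC = 3/4 = 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a classifier whose positives always outscore its negatives.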

3 Image preprocessing

Image preprocessing is a fundamental step in the field of image processing that involves a series of operations aimed at preparing raw or unprocessed images for further analysis, interpretation, or manipulation. This crucial phase helps enhance the quality of images, mitigate noise, correct anomalies, and extract relevant information, ultimately leading to more accurate and reliable results in subsequent tasks such as image analysis, recognition, and classification.

Image preprocessing is broadly categorized into image restoration, which removes noise and blurring from images, and image enhancement, which improves their contrast, brightness, and detail.

3.1 Image restoration

Image restoration serves as a pivotal process aimed at reclaiming the integrity and visual quality of images that have undergone degradation or distortion. Its objective is to transform a degraded image into a cleaner, more accurate representation, thereby revealing concealed details that may have been obscured. This process is particularly vital in scenarios where images have been compromised due to factors like digital image acquisition issues or post-processing procedures such as compression and transmission. By rectifying these issues, image restoration contributes to enhancing the interpretability and utility of visual data.

A notable adversary in the pursuit of pristine images is noise, an unintended variation in pixel values that introduces unwanted artifacts and can lead to the loss of important information. Different types of noise, such as Gaussian noise characterized by its random distribution, salt and pepper noise causing sporadic bright and dark pixels, and speckle noise resulting from interference, can mar the quality of images. These disturbances often originate from the acquisition process or subsequent manipulations of the image data.

Historically, traditional image restoration techniques have included an array of methods to mitigate the effects of degradation and noise. These techniques encompass constrained least squares filters, blind deconvolution methods that aim to reverse blurring effects, Wiener and inverse filters for enhancing signal-to-noise ratios, as well as adaptive mean, order-statistic, and alpha-trimmed mean filters that tailor filtering strategies to the local pixel distribution. Additionally, algorithms dedicated to deblurring counteract motion- or optics-induced blurriness, restoring sharpness. Denoising techniques (Tian et al. 2018 ; Peng et al. March 2020 ; Tian and Fei 2020 ) such as Total Variation Denoising (TVD) and Non-Local Means (NLM) further contribute by effectively reducing random noise while preserving essential image details, collectively advancing the field's capacity to improve image integrity and visual clarity. In Table 1 , a summary of deep learning models for image restoration is provided, including their respective advantages and disadvantages.

Recent advancements in deep learning, particularly through Convolutional Neural Networks (CNN), have revolutionized the field of image restoration. CNNs are adept at learning and extracting complex features from images, allowing them to recognize patterns and nuances that may be challenging for traditional methods to discern. Through extensive training on large datasets, these networks can significantly enhance the quality of restored images, often surpassing the capabilities of conventional techniques. This leap in performance is attributed to the network's ability to implicitly understand the underlying structures of images and infer optimal restoration strategies.

Chunwei Tian et al. (Tian and Fei 2020 ) provided an overview of deep network utilization in denoising images to eliminate Gaussian noise. They explored deep learning techniques for various denoising tasks, including additive white noisy images, blind denoising, and real noisy images. Through benchmark dataset analysis, they assessed the denoising outcomes, efficiency, and visual effects of distinct networks, followed by cross-comparisons of different image denoising methods against diverse types of noise. They concluded by addressing the challenges encountered by deep learning in image denoising.

Quan et al. ( 2020 ) introduced a self-supervised deep learning method named Self2Self for image denoising. Their study demonstrated that the denoising neural network trained with the Self2Self scheme outperformed non-learning-based denoisers and single-image-learning denoisers.

Yan et al. ( 2020 ) proposed a novel technique for removing speckle noise in digital holographic speckle pattern interferometry (DHSPI) wrapped phase. Their method employed improved denoising convolutional neural networks (DnCNNs) and evaluated noise reduction using Mean Squared Error (MSE) comparisons between noisy and denoised data.

Sori et al. ( 2021 ) presented lung cancer detection from denoised Computed Tomography images using a two-path convolutional neural network (CNN). They employed the denoised image by DR-Net as input for lung cancer detection, achieving superior results in accuracy, sensitivity, and specificity compared to recent approaches.

Pang et al. ( 2021 ) implemented an unsupervised deep learning method for denoising using unmatched noisy images, with a loss function analogous to supervised training. Their model, based on the Additive White Gaussian Noise model, attained competitive outcomes against unsupervised methods.

Hasti and Shin ( 2022 ) proposed a deep learning approach to denoise fuel spray images derived from Mie scattering and droplet center detection. A comprehensive comparison of diverse algorithms—standard CNN, modified ResNet, and modified U-Net—revealed the superior performance of the modified U-Net architecture in terms of Mean Squared Error (MSE) and Peak Signal-to-Noise Ratio (PSNR).

Niresi and Chi ( 2022 ) employed an unsupervised HSI denoising algorithm under the DIP framework, which minimized the Half-Quadratic Lagrange Function (HLF) without regularizers, effectively removing mixed noise types such as Gaussian and sparse noise while preserving edges.

Zhou et al. ( 2022 ) introduced a novel bearing fault diagnosis model called deep network-based sparse denoising (DNSD). They addressed the challenges faced by traditional sparse theory algorithms, demonstrating that DNSD overcomes issues related to generalization, parameter adjustment, and data-driven complexity.

Tawfik et al. ( 2022 ) conducted a comprehensive evaluation of image denoising techniques, categorizing them as traditional (user-based) non-learnable denoising filters and DL-based methods. They introduced semi-supervised denoising models and employed qualitative and quantitative assessments to compare denoising performance.

Meng and Zhang ( 2022 ) proposed a gray-image denoising method utilizing a constructed symmetric and dilated convolutional residual network. Their technique not only effectively removed noise in high-noise settings but also achieved higher SSIM, PSNR, FOM, and improved visual effects, offering valuable data for subsequent applications such as target detection, recognition, and tracking.

In essence, image restoration encapsulates a continuous endeavor to salvage and improve the visual fidelity of images marred by degradation and noise. As technology advances, the integration of deep learning methodologies promises to propel this field forward, ushering in new standards of image quality and accuracy.

3.2 Image enhancement

Image enhancement refers to the process of manipulating an image to improve its visual quality and interpretability for human perception. This technique involves various adjustments that aim to reveal hidden details, enhance contrast, and sharpen edges, ultimately resulting in an image that is clearer and more suitable for analysis or presentation. The goal of image enhancement is to make the features within an image more prominent and recognizable, often by adjusting brightness, contrast, color balance, and other visual attributes.

Standard image enhancement methods encompass a range of techniques, including histogram matching to adjust the pixel intensity distribution, contrast-limited adaptive histogram equalization (CLAHE) to enhance local contrast, and filters like the Wiener filter and median filter to reduce noise. Linear contrast adjustment and unsharp mask filtering are also commonly employed to boost image clarity and sharpness.

In recent years, deep learning methods have emerged as a powerful approach for image enhancement. These techniques leverage large datasets and complex neural network architectures to learn patterns and features within images, enabling them to restore and enhance images with impressive results. Researchers have explored various deep learning models for image enhancement, each with its strengths and limitations. These insights are summarized in Table 2 .

The study encompasses an array of innovative techniques, including the integration of Retinex theory and deep image priors in the Novel RetinexDIP method, robustness-enhancing Fuzzy operation to mitigate overfitting, and the fusion of established techniques like Unsharp Masking, High-Frequency Emphasis Filtering, and CLAHE with EfficientNet-B4, ResNet-50, and ResNet-18 architectures to bolster generalization and robustness. Among these, FCNN Mean Filter exhibits computational efficiency, while CV-CNN leverages the capabilities of complex-valued convolutional networks. Additionally, the versatile pix2pixHD framework and the swift convergence of LE-net (Light Enhancement Net) contribute to the discourse. Deep Convolutional Neural Networks demonstrate robust enhancements, yet require meticulous hyperparameter tuning. Finally, MSSNet-WS (Multi-Scale-Stage Network) efficiently converges and addresses overfitting. This analysis systematically highlights their merits, encompassing improved convergence rates, overfitting mitigation, robustness, and computational efficiency.

Gao et al. ( 2022 ) proposed an inventive approach for enhancing low-light images by leveraging Retinex decomposition after initial denoising. In their method, the Retinex decomposition technique was applied to restore brightness and contrast, resulting in images that are clearer and more visually interpretable. Notably, their method underwent rigorous comparison with several other techniques, including LIME, NPE, SRIE, KinD, Zero-DCE, and RetinexDIP, showcasing its superior ability to enhance image quality while preserving image resolution and minimizing memory usage (Tables  1 , 2 , 3 , 4 and 5 ).

Liu et al. ( 2019 ) explored the application of deep learning in iris recognition, utilizing Fuzzy-CNN (F-CNN) and F-Capsule models. What sets their approach apart is the integration of Gaussian and triangular fuzzy filters, a novel enhancement step that contributes to improving the clarity of iris images. The significance lies in the method’s practicality, as it smoothly integrates with existing networks, offering a seamless upgrade to the recognition process.

Munadi et al. ( 2020 ) combined deep learning techniques with image enhancement methodologies to tackle tuberculosis (TB) image classification. Their innovative approach involved utilizing Unsharp Masking (UM) and High-Frequency Emphasis Filtering (HEF) in conjunction with EfficientNet-B4, ResNet-50, and ResNet-18 models. By evaluating the performance of three image enhancement algorithms, their work demonstrated remarkable accuracy and Area Under Curve (AUC) scores, revealing the potential of their method for accurate TB image diagnosis.

Lu et al. ( 2021 ) introduced a novel application of deep learning, particularly the use of a fully connected neural network (FCNN), to address impulse noise in degraded images with varying noise densities. What is noteworthy about their approach is the development of an FCNN mean filter that outperformed traditional mean/median filters, especially in low-noise-density environments. Their study thus highlights the promising capabilities of deep learning in noise reduction scenarios.

Quan et al. ( 2020 ) presented a non-blind image deblurring technique employing a complex-valued CNN (CV-CNN). The uniqueness of their approach lies in incorporating Gabor-domain denoising as a prior step in the deconvolution model. By evaluating their model using quantitative metrics such as Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM), their work showcased effective deblurring outcomes, reaffirming the potential of complex-valued CNNs in image restoration.

Jin et al. ( 2021 ) harnessed the power of deep learning, specifically the pix2pixHD model, to enhance multidetector computed tomography (MDCT) images. Their focus was on accurately measuring vertebral bone structure. By utilizing MDCT images, their approach demonstrated the potential of deep learning techniques in precisely enhancing complex medical images, which can play a pivotal role in accurate clinical assessments.

Li et al. ( 2021a ) introduced a CNN-based LE-net tailored for image recovery in low-light conditions, catering to applications like driver assistance systems and connected autonomous vehicles (CAV). Their work highlighted the significance of their model in outperforming traditional approaches and even other deep learning models. The research underscores the importance of tailored solutions for specific real-world scenarios.

Mehranian et al. ( 2022 ) ventured into the realm of Time-of-Flight (ToF) enhancement in positron emission tomography (PET) images using deep convolutional neural networks. Their innovative use of the block-sequential-regularized-expectation–maximization (BSREM) algorithm for PET data reconstruction in combination with DL-ToF(M) demonstrated superior diagnostic performance, measured through metrics like SSIM and Fréchet Inception Distance (FID).

Kim et al. ( 2022 ) introduced the Multi-Scale-Stage Network (MSSNet), a pioneering deep learning-based approach for single image deblurring. What sets their work apart is their meticulous analysis of previous deep learning-based coarse-to-fine approaches, leading to the creation of a network that achieves state-of-the-art performance in terms of image quality, network size, and computation time.

In the core, image enhancement plays a crucial role in improving the visual quality of images, whether for human perception or subsequent analytical tasks. The combination of traditional methods and cutting-edge deep learning techniques continues to advance our ability to reveal and amplify important information within images. Each of these studies contributes to the expanding landscape of image enhancement and restoration, showcasing the immense potential of deep learning techniques in various domains, from medical imaging to low-light scenarios, while addressing specific challenges and advancing the state-of-the-art in their respective fields.

However, the study recognizes inherent limitations, including constrained adaptability, potential loss of intricate details, and challenges posed by complex scenes or real-world images. Through a meticulous exploration of these advantages and disadvantages, the study endeavors to offer a nuanced perspective on the diverse applicability of these methodologies across various image enhancement scenarios.

4 Image segmentation

Image segmentation is a pivotal process that involves breaking down an image into distinct segments based on certain discernible characteristics such as intensity, color, texture, or spatial proximity. This technique is classified into two primary categories: Semantic segmentation and Instance segmentation. Semantic segmentation assigns each pixel to a specific class within the input image, enabling the identification of distinct object regions. On the other hand, instance segmentation takes a step further by not only categorizing pixels into classes but also differentiating individual instances of those classes within the image.

Traditional segmentation methodologies entail the partitioning of data, such as images, into well-defined segments governed by predetermined criteria. This approach predates the era of deep learning and relies on techniques rooted in expert-designed features or domain-specific knowledge. Common techniques encompass thresholding, which categorizes pixels into object and background regions using specific intensity thresholds; region-based segmentation, which clusters pixels with similar attributes into coherent regions; and edge detection, which identifies significant intensity transitions that might signify potential boundaries.

Nonetheless, traditional segmentation techniques grapple with inherent complexities when it comes to handling intricate shapes, dynamic backgrounds, and noise within the data. Moreover, the manual craftsmanship of features for various scenarios can be laborious and might not extend well to different contexts. In contrast, deep learning has ushered in a paradigm shift in segmentation by introducing automated feature learning. Deep neural networks can extract intricate features directly from raw data, negating the necessity for manual feature engineering. This empowers them to capture nuanced spatial relationships and adapt to variations, effectively addressing the limitations inherent in traditional methods. This transformation, especially pronounced in image segmentation tasks, has opened doors to unprecedented possibilities in computer vision and image analysis. Table 3 encapsulates the strengths and limitations of various explored deep learning models.

Ahmed et al. ( 2020 ) conducted a comprehensive exploration of deep learning-based semantic segmentation models for the challenging task of top-view multiple person segmentation. They assessed the performance of key models, including Fully Convolutional Neural Network (FCN), U-Net, and DeepLabV3. This investigation is particularly important as accurate segmentation of multiple individuals in top-view images holds significance in various applications like surveillance, crowd monitoring, and human–computer interaction. The researchers found that DeepLabV3 and U-Net outperformed FCN in terms of accuracy. These models achieved impressive accuracy and mean Intersection over Union (mIoU) scores, indicating the precision of segmentation, with DeepLabV3 and U-Net leading the way. The results underscore the value of utilizing advanced deep learning models for complex segmentation tasks involving multiple subjects.

Wang et al. ( 2020 ) proposed an adaptive segmentation algorithm employing the UNet structure, which is adept at segmenting both shallow and deep features. Their study addressed the challenge of segmenting complex boundaries within images, a crucial task in numerous medical imaging and computer vision applications. They validated their model's effectiveness on natural scene images and liver cancer CT images, highlighting its advantages over existing segmentation methods. This research contributes to the field by showcasing the potential of adaptive segmentation algorithms, emphasizing their superiority in handling intricate boundaries in diverse image datasets.

Ahammad et al. ( 2020 ) introduced a novel deep learning framework based on Convolutional Neural Networks (CNNs) for diagnosing Spinal Cord Injury (SCI) features through segmentation. This study's significance lies in its application to medical imaging, specifically spinal cord disease prediction. Their model’s high computational efficiency and remarkable accuracy underscore its potential clinical utility. The CNN-based framework leveraged sensor SCI image data, demonstrating the capacity of deep learning to contribute to accurate diagnosis and prediction in medical scenarios, enhancing patient care.

Lorenzoni et al. ( 2020 ) employed Deep Learning techniques based on Convolutional Neural Networks (CNNs) to automate the segmentation of microCT images of distinct cement-based composites. This research is essential in materials science and civil engineering, where automated segmentation can aid in understanding material properties. Their study emphasizes the adaptability of Deep Learning models, showcasing the transferability of network parameters optimized on high-strength materials to other related contexts. This work demonstrates the potential of CNN-based methodologies for advancing materials characterization and analysis.

Mahajan et al. ( 2021 ) introduced a clustering-based profound iterating Deep Learning model (CPIDM) for hyperspectral image segmentation. This research addresses the challenge of segmenting hyperspectral images, which are prevalent in fields like remote sensing and environmental monitoring. The proposed approach's superiority over state-of-the-art methods indicates its potential for enhancing the accuracy of hyperspectral image analysis. The study contributes to the field by providing an innovative methodology to tackle the unique challenges posed by hyperspectral data.

Jalali et al. ( 2021 ) designed a novel deep learning-based approach for segmenting lung regions from CT images using Bi-directional ConvLSTM U-Net with densely connected convolutions (BCDU-Net). This research is critical for medical image analysis, specifically lung-related diagnoses. Their model's impressive accuracy on a large dataset indicates its potential for aiding radiologists in identifying lung regions accurately. The application of advanced deep learning architectures to medical imaging tasks underscores the transformative potential of such technologies in healthcare.

Bouteldja et al. ( 2020 ) developed a CNN-based approach for accurate multiclass segmentation of stained kidney images from various species and renal disease models. This research’s significance lies in its potential contribution to histopathological analysis and disease diagnosis. The model's high performance across diverse species and disease models highlights its robustness and utility for aiding pathologists in accurate image-based diagnosis.

Liu et al. ( 2021 ) proposed a novel convolutional neural network architecture incorporating cross-connected layers and multi-scale feature aggregation for image segmentation. The research addresses the need for advanced segmentation techniques that can capture intricate features and relationships within images. Their model's impressive performance metrics underscore its potential for enhancing segmentation accuracy, which is pivotal in diverse fields, including medical imaging, robotics, and autonomous systems.

Saood and Hatem ( 2021 ) introduced deep learning networks, SegNet and U-Net, for segmenting COVID-19-infected areas in CT scan images. This research's timeliness is evident, as it contributes to the fight against the global pandemic. Their comparison of network performance provides insights into the effectiveness of different deep learning architectures for accurately identifying infected regions in lung images. This work showcases the agility of deep learning in addressing real-world challenges.

In Nurmain et al. ( 2020 ), a novel approach employing Mask-RCNN is introduced for accurate fetal septal defect detection. Addressing limitations in previous methods, the model demonstrates multiclass heart chamber detection with high accuracy: right atrium (97.59%), left atrium (99.67%), left ventricle (86.17%), right ventricle (98.83%), and aorta (99.97%). Competitive results are shown for defect detection in atria and ventricles, with MRCNN achieving around 99.48% mAP compared to 82% for FRCNN. The study concludes that the proposed MRCNN model holds promise for aiding cardiologists in early fetal congenital heart disease screening.

Park et al. ( 2021a ) propose a method for intelligently segmenting food in images using deep neural networks. To address labor-intensive data collection, they generate synthetic data with the 3D graphics software Blender and train Mask R-CNN for instance segmentation. The model achieves 52.2% on real-world food instances when trained only on synthetic data, with a further 6.4-percentage-point improvement after fine-tuning compared to training from scratch. Their approach shows promise for healthcare robot systems such as meal assistance robots.

The study by Pérez-Borrero et al. ( 2020 ) underscores the significance of fruit instance segmentation, specifically within autonomous fruit-picking systems. It highlights the adoption of deep learning techniques, particularly Mask R-CNN, as a benchmark. The review justifies the proposed methodology's alterations to address limitations, emphasizing its efficiency gains. Additionally, the introduction of the Instance Intersection Over Union (I2oU) metric and the creation of the StrawDI_Db1 dataset are positioned as contributions with real-world implementation potential.

These studies collectively highlight the transformative impact of deep learning in various segmentation tasks, ranging from medical imaging to materials science and computer vision. By leveraging advanced neural network architectures and training methodologies, researchers are pushing the boundaries of what is achievable in image segmentation, ultimately contributing to advancements in diverse fields and applications.

5 Feature extraction

Feature extraction is a fundamental process in image processing and computer vision that involves transforming raw pixel data into a more compact and informative representation, often referred to as features. These features capture important characteristics of the image, making it easier for algorithms to understand and analyze images for various tasks like object recognition, image classification, and segmentation. Traditional methods of feature extraction were prevalent before the rise of deep learning and involved techniques that analyzed pixel-level information.

Some traditional methods are explained here. Principal Component Analysis (PCA) is a statistical technique that reduces the dimensionality of the data while retaining as much of the original variance as possible; it identifies the orthogonal axes (principal components) along which the data varies the most. Independent Component Analysis (ICA) aims to find a linear transformation of the data into statistically independent components; it is often used for separating mixed sources in images, such as separating different image sources from a single mixed image. Locally Linear Embedding (LLE) is a nonlinear dimensionality reduction technique that aims to preserve the local structure of data points; it finds a low-dimensional representation of the data while maintaining the neighborhood relationships.

These traditional methods of feature extraction have been widely used and have provided valuable insights and representations for various image analysis tasks. However, they often rely on handcrafted features designed by experts or domain knowledge, which can be labor-intensive and may not generalize well across different types of images or tasks.

Conventional methods of feature extraction encompass the conversion of raw data into a more concise and insightful representation by pinpointing specific attributes or characteristics. These selected features are chosen to encapsulate vital insights and patterns inherent in the data. This procedure often involves a manual approach guided by domain expertise or specific insights. For example, within image processing, methods like Histogram of Oriented Gradients (HOG) might extract insights about gradient distributions, while in text analysis, features such as word frequencies could be selected.

Despite the effectiveness of traditional feature extraction for particular tasks and its ability to provide data insights, it comes with inherent limitations. Conventional techniques frequently necessitate expert intervention to craft features, which can be a time-intensive process and might overlook intricate relationships or patterns within the data. Moreover, traditional methods might encounter challenges when dealing with data of high dimensionality or scenarios where features are not easily definable.

In contrast, the ascent of deep learning approaches has revolutionized feature extraction by automating the process. Deep neural networks autonomously learn to extract meaningful features directly from raw data, eliminating the need for manual feature engineering. This facilitates the capture of intricate relationships, patterns, and multifaceted interactions that traditional methods might overlook. Consequently, deep learning has showcased exceptional achievements across various domains, particularly in tasks involving intricate data, such as image and speech recognition. Table 4 succinctly outlines the metrics, strengths and limitations of diverse deep learning models explored for feature enhancement.

Magsi et al. (2020) embarked on a significant endeavor in the realm of disease identification within date palm trees by harnessing the power of deep learning techniques. Their study centered around texture and color extraction methods from images of various date palm diseases. Through the application of Convolutional Neural Networks (CNNs), they effectively created a system that could discern diseases based on specific visual patterns. The achieved accuracy of 89.4% signifies the model's proficiency in accurately diagnosing diseases within this context. This approach not only showcases the potential of deep learning in addressing agricultural challenges but also emphasizes the importance of automated disease detection for crop management and security.

Sharma et al. (2020) delved into the domain of medical imaging with a focus on chest X-ray images. They introduced a comprehensive investigation involving different deep Convolutional Neural Network (CNN) architectures to facilitate the extraction of features from these images. Notably, the study evaluated the impact of dataset size on CNN performance, highlighting the scalability of their approach. By incorporating augmentation and dropout techniques, the model achieved a high accuracy of 0.9068, suggesting its ability to accurately classify and diagnose chest X-ray images. This work underscores the potential of deep learning in aiding medical professionals in diagnosing diseases and conditions through image analysis.

Zhang et al. (2020) offered a novel solution to the challenge of distinguishing between genuine and counterfeit facial images generated using deep learning methods. Their approach relied on a Counterfeit Feature Extraction Method that employed a Convolutional Neural Network (CNN) model. This model demonstrated remarkable accuracy, achieving a rate of 97.6%. Beyond the impressive accuracy, the study also addressed a crucial aspect of computational efficiency, highlighting the potential for reducing the computational demands associated with counterfeit image detection. This research is particularly relevant in today's digital landscape where ensuring the authenticity of images has become increasingly vital.

Simon et al. (2020) explored the fusion of deep learning and feature extraction in the context of image classification and texture analysis. Their study employed Convolutional Neural Network (CNN) architectures including AlexNet, VGG19, Inception, InceptionResNetV3, ResNet, and DenseNet201 to extract meaningful features from images, which were then fed into a Support Vector Machine (SVM) for texture classification. The results were promising, with accuracy ranging from 85% to 95% across the different pretrained models and datasets. This approach showcases the ability of deep learning to contribute to image analysis tasks, particularly when combined with traditional machine learning techniques.
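The hybrid pattern these studies follow — freeze a feature extractor, then train a shallow classifier on its outputs — can be sketched as below. To keep the example dependency-free, two assumptions are made plainly: a tiny filter bank stands in for the pretrained CNN backbone, and a nearest-centroid rule stands in for the SVM; the data are synthetic:

```python
import numpy as np

def extract_features(image, filters):
    """Stand-in for a frozen backbone: pooled filter responses form a
    fixed-length feature vector. In the surveyed work, activations of
    pretrained CNNs such as VGG19 or ResNet play this role."""
    feats = []
    for k in filters:
        resp = np.abs(np.convolve(image.ravel(), k, mode="same"))
        feats.extend([resp.mean(), resp.max()])
    return np.array(feats)

class NearestCentroid:
    """Shallow classifier trained on frozen features (an SVM in the
    surveyed pipeline; nearest-centroid here for simplicity)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

# Two toy texture classes: flat patches vs. high-frequency checkerboards.
filters = [np.array([-1.0, 1.0]), np.array([1.0, 1.0])]
smooth = [np.full((8, 8), v) for v in (0.2, 0.4, 0.5, 0.6, 0.8)]
rough = [a * (np.indices((8, 8)).sum(0) % 2) for a in (0.6, 0.7, 0.8, 0.9, 1.0)]
X = np.array([extract_features(im, filters) for im in smooth + rough])
y = np.array([0] * 5 + [1] * 5)
clf = NearestCentroid().fit(X, y)
```

The design point is that only the shallow classifier is trained on the task at hand; the feature extractor is reused as-is, which is what makes the approach practical on small datasets.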

Sungheetha and Sharma (2021) addressed the critical challenge of detecting diabetic conditions through the identification of specific signs within blood vessels of the eye. Their approach relied on a deep feature Convolutional Neural Network (CNN) designed to spot these indicators. With an impressive accuracy of 97%, the model demonstrated its efficacy in accurately identifying diabetic conditions. This work not only showcases the potential of deep learning in medical diagnostics but also highlights its ability to capture intricate visual patterns that are indicative of specific health conditions.

Devulapalli et al. (2021) proposed a hybrid feature extraction method that combined Gabor transform-based texture features with automated high-level features from the GoogLeNet architecture. By utilizing pre-trained models such as AlexNet, VGG16, and GoogLeNet, the study achieved exceptional accuracy levels. Interestingly, the hybrid feature extraction method outperformed the existing pre-trained models, underscoring the potential of combining different feature extraction techniques to achieve superior performance in image analysis tasks.

Shankar et al. (2022) embarked on the critical task of COVID-19 diagnosis using chest X-ray images. Their approach involved a multi-step process encompassing preprocessing through Wiener filtering, fusion-based feature extraction using GLCM, GLRM, and LBP, and finally classification through an Artificial Neural Network (ANN). By carefully selecting optimal feature subsets, the model exhibited the potential for robust classification between infected and healthy patients. This study showcases the versatility of deep learning in medical diagnostics, particularly in addressing urgent global health challenges.
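Of the handcrafted texture descriptors fused in such pipelines, LBP is the simplest to sketch. Below is a minimal 8-neighbour variant in NumPy; it omits the radius and rotation-invariance options that library implementations (e.g. scikit-image's `local_binary_pattern`) provide:

```python
import numpy as np

def lbp_histogram(image):
    """Minimal 8-neighbour Local Binary Pattern: each interior pixel gets
    an 8-bit code marking which neighbours are >= the centre; the
    normalized histogram of codes is the texture descriptor."""
    c = image[1:-1, 1:-1]
    neighbours = [image[:-2, :-2], image[:-2, 1:-1], image[:-2, 2:],
                  image[1:-1, 2:], image[2:, 2:], image[2:, 1:-1],
                  image[2:, :-2], image[1:-1, :-2]]
    codes = np.zeros_like(c, dtype=int)
    for bit, n in enumerate(neighbours):
        codes += (n >= c).astype(int) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

# A flat patch maps every pixel to code 255 (all neighbours >= centre);
# a checkerboard splits its mass between two codes.
flat = np.ones((6, 6))
h = lbp_histogram(flat)
```

In the fusion approaches above, such a histogram would be concatenated with GLCM/GLRM statistics before classification.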

Ahmad et al. (2022) made significant strides in breast cancer detection by introducing a hybrid deep learning model, AlexNet-GRU, capable of autonomously extracting features from the PatchCamelyon benchmark dataset. The model demonstrated its prowess in accurately identifying metastatic cancer in breast tissue. With superior performance compared to state-of-the-art methods, this research emphasizes the potential of deep learning in medical imaging, specifically for cancer detection and classification.

Sharif et al. (2019) ventured into the complex field of detecting gastrointestinal tract (GIT) infections using wireless capsule endoscopy (WCE) images. Their innovative approach combined deep convolutional neural network (CNN) features and geometric features to address the intricate challenges posed by lesion attributes. The fusion of contrast-enhanced color features and geometric characteristics led to exceptional classification accuracy and precision, showcasing the synergy between deep learning and traditional geometric features. This approach is particularly promising for enhancing medical diagnostics through the integration of multiple information sources.

Aarthi and Rishma (2023) responded to the pressing challenges of waste management by introducing a real-time automated waste detection and segregation system using deep learning. Leveraging the Mask R-CNN architecture, their model demonstrated the capability to identify and classify waste objects in real time. Additionally, the study explored the extraction of geometric features for more effective object manipulation by robotic arms. This innovative approach not only addresses environmental concerns related to waste but also showcases the potential of deep learning in practical applications beyond traditional image analysis, with the aim of enhancing efficiency and reducing pollution risks.

These studies showcase the efficacy of methods like CNNs, hybrid approaches, and novel architectures in achieving high accuracies and improved performance metrics in applications such as disease identification, image analysis, counterfeit detection, and more. While these methods automate the extraction of meaningful features, they also encounter challenges like computational complexity, dataset quality, and real-world variability, which should be carefully considered in their practical implementation.

6 Image classification

Image classification is a fundamental task in computer vision that involves categorizing images into predefined classes or labels. The goal is to enable machines to recognize and differentiate objects, scenes, or patterns within images.

Traditional classification is a fundamental data analysis technique that involves categorizing data points into specific classes based on predetermined rules and established features. Before the advent of deep learning, several conventional methods were widely used for this purpose, including Decision Trees, Support Vector Machines (SVM), Naive Bayes, and k-Nearest Neighbors (k-NN). In traditional classification, experts carefully design and select features that encapsulate relevant information from the data. These features are typically chosen based on domain knowledge, aiming to capture distinguishing characteristics that discriminate between classes, and they act as inputs for classification algorithms that assign data points to specific classes according to predefined criteria. While effective in many scenarios, traditional classification methods require manual feature engineering, which can be time-consuming and may not fully capture the intricate patterns and relationships present in complex datasets. Table 5 provides a compact overview of strengths and limitations in the realm of image classification by examining various deep learning models.
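As a minimal illustration of this pipeline, the sketch below applies k-Nearest Neighbors — one of the conventional methods listed above — to precomputed feature vectors. The 2-D features and their interpretation (e.g. mean intensity and edge density) are invented for illustration, standing in for whatever an expert would handcraft:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """k-Nearest Neighbours on handcrafted feature vectors: each query is
    assigned the majority label among its k closest training points."""
    preds = []
    for x in X_test:
        dist = np.linalg.norm(X_train - x, axis=1)      # Euclidean distance
        nearest = y_train[np.argsort(dist)[:k]]          # k closest labels
        preds.append(np.bincount(nearest).argmax())      # majority vote
    return np.array(preds)

# Toy training set: two classes described by 2-D handcrafted features
# (hypothetically, mean intensity and edge density per image).
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.15],
              [0.9, 0.8], [0.8, 0.9], [0.85, 0.85]])
y = np.array([0, 0, 0, 1, 1, 1])
pred = knn_predict(X, y, np.array([[0.2, 0.2], [0.8, 0.8]]))
```

Note that all of the modelling effort here sits in choosing the features; the classifier itself applies a fixed, predefined rule.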

In the realm of medical image analysis, Ismael et al. (2020) introduced an advanced approach that harnesses the power of Residual Networks (ResNets) for brain tumor classification. Their study involved a comprehensive evaluation on a benchmark dataset comprising 3064 MRI images of three distinct brain tumor types. Impressively, their model achieved a remarkable accuracy of 99%, surpassing previous works in the same domain. Shifting focus to remote sensing, Xiaowei et al. (2020) embarked on a deep learning journey for remote sensing image classification. Their methodology combined Recurrent Neural Networks (RNN) with Random Forest, aiming to optimize cross-validation on the UC Merced dataset. Through rigorous experimentation and comparison with various deep learning techniques, their approach achieved a commendable accuracy of 87%.

Texture analysis and classification hold significant implications, as highlighted by Aggarwal and Kuma (2020). Their study introduced a novel deep learning-based model centered on Convolutional Neural Networks (CNN), specifically composed of two sub-models. The outcomes were noteworthy, with model-1 achieving an accuracy of 92.42%, while model-2 further improved the accuracy to an impressive 96.36%.

Abdar et al. (2021) unveiled a pioneering hybrid dynamic Bayesian Deep Learning (BDL) model that leveraged the Three-Way Decision (TWD) theory for skin cancer diagnosis. By incorporating different uncertainty quantification (UQ) methods and deep neural networks within distinct classification phases, they attained substantial accuracy and F1-score percentages on two skin cancer datasets.

The landscape of medical diagnostics saw another stride forward with Ibrahim et al. (2021), who explored a deep learning approach based on a pretrained AlexNet model for classifying COVID-19, pneumonia, and healthy CXR scans. Their model exhibited notable performance in both three-way and four-way classifications, achieving high accuracy, sensitivity, and specificity percentages.

In the realm of image classification under resource constraints, Ma et al. (2022) introduced a novel deep CNN classification method with knowledge transfer. This method showcased superior performance compared to traditional histogram-based techniques, achieving an impressive classification accuracy of 93.4%.

Diving into agricultural applications, Gill et al. (2022) devised a hybrid CNN-RNN approach for fruit classification. Their model demonstrated remarkable efficiency and accuracy in classifying fruits, showcasing its potential for aiding in quality assessment and sorting.

Abu-Jamie et al. (2022) turned their attention to fruit classification as well, utilizing a deep learning-based approach. By employing the CNN model VGG16, they achieved a remarkable 100% accuracy, underscoring the potential of such methodologies in real-world applications.

Medical imaging remained a prominent field of exploration, as Sharma et al. (2022) explored breast cancer diagnosis through Convolutional Neural Networks (CNN) with transfer learning. Their study showcased a promising accuracy of 98.4%, reinforcing the potential of deep learning in augmenting medical diagnostics.

Beyond the realm of medical imagery, Yang et al. (2022) applied diverse CNN models to an urban wetland identification framework, with DenseNet121 emerging as the top-performing model. The achieved high Kappa and OA values underscore the significance of deep learning in land cover classification.

Hussain et al. (2020) delved into Alzheimer's disease detection using a 12-layer CNN model. Their approach showcased a remarkable accuracy of 97.75%, surpassing existing CNN models on the OASIS dataset. Their study also provided a head-to-head comparison with pre-trained CNNs, solidifying the efficacy of their proposed approach in enhancing Alzheimer's disease detection.

In the textile industry, Gao et al. (2019) addressed fabric defect detection using deep learning. Their novel approach, involving a convolutional neural network with multi-convolution and max-pooling layers, showcased promising results with an overall detection accuracy of 96.52%, offering potential implications for real-world practical applications.

Expanding the horizon to neurological disorders, Khullar et al. (2021) pioneered ADHD classification from resting-state functional MRI (rs-fMRI) data. Employing a hybrid 2D CNN-LSTM model, the study achieved remarkable improvements in accuracy, specificity, sensitivity, F1-score, and AUC compared to existing methods. The integration of deep learning with rs-fMRI holds the promise of a robust model for effective ADHD diagnosis and differentiation from healthy controls.

The work of Skouta et al. (2021) focused on retinal image classification. By harnessing the capabilities of convolutional neural networks (CNNs), their approach achieved an impressive classification accuracy of 95.5% for distinguishing between normal and proliferative diabetic retinas. The inclusion of an expanded dataset contributed to capturing intricate features and ensuring accurate classification outcomes. These studies collectively illuminate the transformative influence of deep learning techniques across diverse classification tasks, spanning medical diagnoses, texture analysis, image categorization, and neurological disorder identification.

While traditional methods have their merits, they rely heavily on domain expertise for feature selection and algorithm tuning, and they encounter clear limitations. They may struggle with complex, high-dimensional data, where identifying important features becomes intricate, and they demand substantial manual effort in feature engineering, making them less adaptable to evolving data distributions or novel data types. The emergence of deep learning has revolutionized classification by automating feature extraction: deep neural networks learn hierarchical representations directly from raw data, eliminating the need for manually crafted features and enabling them to capture intricate patterns and relationships that traditional methods might miss. Notably, Convolutional Neural Networks (CNNs) have excelled in image classification tasks, while Recurrent Neural Networks (RNNs) demonstrate proficiency in handling sequential data. These deep learning models often surpass traditional methods in tackling complex tasks across various domains.
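The contrast with sequential models can be made concrete with a minimal Elman-style RNN forward pass. The weights below are random stand-ins for learned parameters, and the dimensions are arbitrary choices for illustration:

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h):
    """Minimal Elman RNN forward pass: a hidden state is updated at every
    time step, letting the network accumulate information across a
    sequence (rows of an image, video frames, text tokens, ...)."""
    h = np.zeros(W_hh.shape[0])
    for x in x_seq:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    return h  # final hidden state summarizes the whole sequence

rng = np.random.default_rng(42)
seq = rng.random((10, 4))                    # 10 time steps, 4 features each
W_xh = rng.standard_normal((8, 4)) * 0.1     # input-to-hidden weights
W_hh = rng.standard_normal((8, 8)) * 0.1     # hidden-to-hidden (recurrent) weights
h = rnn_forward(seq, W_xh, W_hh, np.zeros(8))
```

Because the same weights are reused at every step, the model handles sequences of any length — the property that makes RNNs (and their LSTM variants, as in the CNN-LSTM hybrids above) suited to sequential data where a plain CNN is suited to spatial grids.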

7 Discussion

Among the deep learning models for image denoising, Self2Self NN reduces cost but depends on data augmentation; denoising CNNs enhance accuracy but face resource challenges; and DFT-Net manages image label imbalance while risking detail loss. MPR-CNN is characterized by robustness and hyperparameter tuning, while R2R noise reduction balances results against computational demands. CNN architectures prevent overfitting in denoising, and HLF-DIP achieves high metric values despite its complexity. Noise2Noise models exhibit trade-offs between efficiency and generalization, and ConvNet enlarges receptive fields while grappling with interpretability. Together, these studies offer insight into the evolving landscape of image denoising techniques.

This compilation of studies showcases a variety of image enhancement techniques. Ming Liu et al. employ Fuzzy-CNN and F-Capsule for iris recognition, ensuring robustness and avoiding overfitting. Khairul Munadi combines various methods with EfficientNet and ResNets for tuberculosis image enhancement, enhancing generalization while facing time and memory challenges. Ching Ta Lu employs FCNN mean filters for noise reduction, addressing noise while considering potential detail loss. Yuhui Quan implements CV-CNN for image deblurring, providing an efficient model with overfitting prevention. Dan Jin employs pix2pixHD for high-quality MDCT image enhancement, achieving quality improvement with possible overfitting concerns. Guofa Li introduces LE-net for low-light image recovery, emphasizing generalization and robustness with real-world limitations. Xianjie Gao introduces RetinexDIP for image enhancement, offering faster convergence and reduced runtime, despite challenges in complex scenes. Kiyeon Kim unveils MSSNet-WS for single image deblurring, prioritizing computational efficiency in real-world scenarios.

This compilation of research papers presents a comprehensive exploration of deep learning methodologies applied to two prominent types of image segmentation: semantic segmentation and instance segmentation. In the realm of semantic segmentation, studies utilize architectures like FCN, U-Net, and DeepLabV3 for tasks such as efficient detection of multiple persons and robust object recognition in varying lighting and background conditions. These approaches achieve notable performance metrics, with IoU and mIoU ranging from 80 to 86%. Meanwhile, in the context of instance segmentation, methods like Mask-RCNN and AFD-UNet are employed to precisely delineate individual object instances within an image, contributing to efficient real-time waste collection, accurate medical image interpretation, and more. The papers highlight the benefits of these techniques, including enhanced boundary delineation, reduced manual intervention, and substantial time savings, while acknowledging challenges such as computational complexity, model customization, and hardware limitations. This compilation provides a comprehensive understanding of the strengths and challenges of deep learning-based semantic and instance segmentation techniques across diverse application domains.

This review explores deep learning methodologies tailored to different types of image feature extraction across varied application domains. Texture/color-based approaches encompass studies like Aurangzeb Magsi et al.'s disease classification achieving 89.4% accuracy, and Weiguo Zhang's counterfeit detection at 97.6% accuracy. Pattern-based analysis includes Akey Sungheetha's 97% classification score for retinal images, K. Shankar et al.'s 95.1–95.7% accuracy using FM-ANN, GLCM, GLRM, and LBP for chest X-rays, and Shahab Ahmad's 99.5% accuracy with AlexNet-GRU for PCam images. Geometric feature extraction is demonstrated by Muhammad Sharif et al. with 99.4% accuracy in capsule endoscopy images and Aarthi R et al. achieving 97% accuracy in real-time waste image analysis using Mask R-CNN. This comprehensive review showcases deep learning's adaptability in extracting diverse image features for various applications.

This compilation of research endeavors showcases diverse deep learning models applied to distinct types of image classification tasks. For multiclass classification, studies like Sarah Ali et al.'s employment of Residual Networks attains 99% accuracy in MRI image classification, while Akarsh Aggarwal et al.'s CNN approach achieves 92.42% accuracy in Kylberg Texture datasets. Abdullahi Umar Ibrahim's utilization of an AlexNet model records a 94% accuracy rate for lung conditions. In multiclass scenarios, Harmandeep Singh Gill's hybrid CNN-RNN attains impressive results in fruit classification, and Tanseem N et al. achieve 100% accuracy with VGG16 on fruit datasets. For binary classification, Emtiaz Hussain et al.'s CNN achieves 97.75% accuracy in OASIS MRI data, while Can Gao et al. achieve 96.52% accuracy in defect detection for fabric images. Vikas Khullar et al.'s CNN-LSTM hybrid records 95.32% accuracy for ADHD diagnosis, and Ayoub Skouta's CNN demonstrates 95.5% accuracy in diabetic retinopathy detection. These studies collectively illustrate the efficacy and adaptability of deep learning techniques across various types of classification tasks while acknowledging challenges such as dataset biases, computational intensity, and interpretability.

8 Conclusions

This comprehensive review paper embarks on an extensive exploration across the diverse domains of image denoising, enhancement, segmentation, feature extraction, and classification. By meticulously analyzing and comparing these methodologies, it offers a panoramic view of the contemporary landscape of image processing. In addition to highlighting the unique strengths of each technique, the review shines a spotlight on the challenges that come hand in hand with their implementation.

In the realm of image denoising, the efficacy of methods like Self2Self NN, DnCNNs, and DFT-Net is evident in noise reduction, although challenges such as detail loss and hyperparameter optimization persist. Transitioning to image enhancement, strategies like RetinexDIP, Unsharp Masking, and LE-net excel in enhancing visual quality but face complexities in handling intricate scenes and maintaining image authenticity.

Segmentation techniques span the gamut from foundational models to advanced ones, providing precise object isolation. Yet, challenges arise in scenarios with overlapping objects and the need for robustness. Feature extraction methodologies encompass a range from CNNs to LSTM-augmented CNNs, unveiling crucial image characteristics while requiring careful consideration of factors like efficiency and adaptability.

Within classification, architectures ranging from Residual Networks to CNN-LSTM hybrids showcase potential for accurate categorization. However, data dependency, computational complexity, and model interpretability remain challenges. The review's contributions extend to the broader image processing field, providing a nuanced understanding of each methodology's traits and limitations. By offering such insights, it empowers researchers to make informed decisions regarding technique selection for specific applications. As the field evolves, addressing challenges like computational demands and interpretability will be pivotal to fully realizing the potential of these methodologies.

The scope of papers discussed in this review offers a panorama of DL methodologies that traverse diverse application domains. These domains encompass medical and satellite imagery, botanical studies featuring flower and fruit images, as well as real-time scenarios. The tailored DL approaches for each domain underscore the adaptability and efficacy of these methods across multifaceted real-world contexts.

Aarthi R, Rishma G (2023) A Vision based approach to localize waste objects and geometric features exaction for robotic manipulation. Int Conf Mach Learn Data Eng Procedia Comput Sci 218:1342–1352. https://doi.org/10.1016/j.procs.2023.01.113


Abdar M, Samami M, Mahmoodabad SD, Doan T, Mazoure B, Hashemifesharaki R, Liu L, Khosravi A, Acharya UR, Makarenkov V, Nahavandi S (2021) Uncertainty quantification in skin cancer classification using three-way decision-based Bayesian deep learning. Comput Biol Med 135:104418. https://doi.org/10.1016/j.compbiomed.2021.104418

Aggarwal A, Kuma M (2020) Image surface texture analysis and classification using deep learning. Multimed Tools Appl 80(1):1289–1309. https://doi.org/10.1007/s11042-020-09520-2

Ahammad SH, Rajesh V, Rahman MZU, Lay-Ekuakille A (2020) A hybrid CNN-based segmentation and boosting classifier for real time sensor spinal cord injury data. IEEE Sens J 20(17):10092–10101. https://doi.org/10.1109/jsen.2020.2992879

Ahmad S, Ullah T, Ahmad I, Al-Sharabi A, Ullah K, Khan RA, Rasheed S, Ullah I, Uddin MN, Ali MS (2022) A novel hybrid deep learning model for metastatic cancer detection. Comput Intell Neurosci 2022:14. https://doi.org/10.1155/2022/8141530

Ahmed I, Ahmad M, Khan FA, Asif M (2020) Comparison of deep-learning-based segmentation models: using top view person images. IEEE Access 8:136361–136373. https://doi.org/10.1109/access.2020.3011406

Aish MA, Abu-Naser SS, Abu-Jamie TN (2022) Classification of pepper using deep learning. Int J Acad Eng Res (IJAER) 6(1):24–31.


Ashraf H, Waris A, Ghafoor MF et al (2022) Melanoma segmentation using deep learning with test-time augmentations and conditional random fields. Sci Rep 12:3948. https://doi.org/10.1038/s41598-022-07885-y

Bouteldja N, Klinkhammer BM, Bülow RD et al (2020) Deep learning based segmentation and quantification in experimental kidney histopathology. J Am Soc Nephrol. https://doi.org/10.1681/ASN.2020050597

Cheng G, Xie X, Han J, Guo L, Xia G-S (2020) Remote sensing image scene classification meets deep learning: challenges, methods, benchmarks, and opportunities. IEEE J Select Topics Appl Earth Observ Remote Sens 13:3735–3756. https://doi.org/10.1109/JSTARS.2020.3005403

Devulapalli S, Potti A, Rajakumar Krishnan M, Khan S (2021) Experimental evaluation of unsupervised image retrieval application using hybrid feature extraction by integrating deep learning and handcrafted techniques. Mater Today: Proceed 81:983–988. https://doi.org/10.1016/j.matpr.2021.04.326

Dey S, Bhattacharya R, Malakar S, Schwenker F, Sarkar R (2022) CovidConvLSTM: a fuzzy ensemble model for COVID-19 detection from chest X-rays. Exp Syst Appl 206:117812. https://doi.org/10.1016/j.eswa.2022.117812

Gao C, Zhou J, Wong WK, Gao T (2019) Woven fabric defect detection based on convolutional neural network for binary classification. In: Wong W (ed) Artificial Intelligence on Fashion and Textiles (AITA 2018), Advances in Intelligent Systems and Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-99695-0_37


Gao X, Zhang M, Luo J (2022) Low-light image enhancement via retinex-style decomposition of denoised deep image prior. Sensors 22:5593. https://doi.org/10.3390/s22155593

Gill HS, Murugesan G, Mehbodniya A, Sajja GS, Gupta G, Bhatt A (2023) Fruit type classification using deep learning and feature fusion. Comput Electron Agric 211:107990. https://doi.org/10.1016/j.compag.2023.107990

Gite S, Mishra A, Kotecha K (2022) Enhanced lung image segmentation using deep learning. Neural Comput and Appl. https://doi.org/10.1007/s00521-021-06719-8

Hasti VR, Shin D (2022) Denoising and fuel spray droplet detection from light-scattered images using deep learning. Energy and AI 7:100130. https://doi.org/10.1016/j.egyai.2021.100130

Hedayati R, Khedmati M, Taghipour-Gorjikolaie M (2021) Deep feature extraction method based on ensemble of convolutional auto encoders: Application to Alzheimer’s disease diagnosis. Biomed Signal Process Control 66:102397. https://doi.org/10.1016/j.bspc.2020.102397

Hussain E, Hasan M, Hassan SZ, Azmi TH, Rahman MA, Parvez MZ (2020) Deep learning based binary classification for Alzheimer's disease detection using brain MRI images. 2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA), Kristiansand, Norway, pp 1115–1120. https://doi.org/10.1109/iciea48937.2020.9248213

Ibrahim AU, Ozsoz M, Serte S, Al-Turjman F, Yakoi PS (2021) Pneumonia classification using deep learning from chest X-ray images during COVID-19. Cognitive Computation. Springer, Berlin. https://doi.org/10.1007/s12559-020-09787-5

Ismael SAA, Mohammed A, Hefny H (2020) An enhanced deep learning approach for brain cancer MRI images classification using residual networks. Artif Intell Med 102:101779. https://doi.org/10.1016/j.artmed.2019.101779

Jalali Y, Fateh M, Rezvani M, Abolghasemi V, Anisi MH (2021) ResBCDU-Net: a deep learning framework for lung CT image segmentation. Sensors. https://doi.org/10.3390/s21010268

Jiang X, Zhu Y, Zheng B et al (2021) Images denoising for COVID-19 chest X-ray based on multi-resolution parallel residual CNN. Mach Vis Appl 32(4). https://doi.org/10.1007/s00138-021-01224-3

Jin D, Zheng H, Zhao Q, Wang C, Zhang M, Yuan H (2021) Generation of vertebra micro-CT-like image from MDCT: a deep-learning-based image enhancement approach. Tomography 7:767–782. https://doi.org/10.3390/tomography7040064

Kasongo SM, Sun Y (2020) A deep learning method with wrapper based feature extraction for wireless intrusion detection system. Comput Secur 92:101752. https://doi.org/10.1016/j.cose.2020.101752

Khullar V, Salgotra K, Singh HP, Sharma DP (2021) Deep learning-based binary classification of ADHD using resting state MR images. Augment Hum Res. https://doi.org/10.1007/s41133-020-00042-y

Kim K, Lee S, Cho S (2023) MSSNet: Multi-Scale-Stage Network for Single Image Deblurring. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13802. Springer, Cham. https://doi.org/10.1007/978-3-031-25063-7_32

Kim B, Ye JC (2019) Mumford-Shah Loss functional for image segmentation with deep learning. IEEE Trans Image Process. https://doi.org/10.1109/TIP.2019.2941265

Kong Y, Ma X, Wen C (2022) A new method of deep convolutional neural network image classification based on knowledge transfer in small label sample environment. Sensors 22:898. https://doi.org/10.3390/s22030898

Li G, Yang Y, Xingda Q, Cao D, Li K (2021a) A deep learning based image enhancement approach for autonomous driving at night. Knowl-Based Syst 213:106617. https://doi.org/10.1016/j.knosys.2020.106617

Li W, Raj ANJ, Tjahjadi T, Zhuang Z (2021b) Digital hair removal by deep learning for skin lesion segmentation. Pattern Recog 117:107994. https://doi.org/10.1016/j.patcog.2021.107994

Liu M, Zhou Z, Shang P, Xu D (2019) Fuzzified image enhancement for deep learning in iris recognition. IEEE Trans Fuzzy Syst 2019:2912576. https://doi.org/10.1109/TFUZZ.2019.2912576

Liu D, Wen B, Jiao J, Liu X, Wang Z, Huang TS (2020) Connecting image denoising and high-level vision tasks via deep learning. IEEE Trans Image Process 29:3695–3706. https://doi.org/10.1109/TIP.2020.2964518

Liu L, Tsui YY, Mandal M (2021) Skin lesion segmentation using deep learning with auxiliary task. J Imag 7:67. https://doi.org/10.3390/jimaging7040067

Lorenzoni R, Curosu I, Paciornik S, Mechtcherine V, Oppermann M, Silva F (2020) Semantic segmentation of the micro-structure of strain-hardening cement-based composites (SHCC) by applying deep learning on micro-computed tomography scans. Cement Concrete Compos 108:103551. https://doi.org/10.1016/j.cemconcomp.2020.103551

Lu CT, Wang LL, Shen JH et al (2021) Image enhancement using deep-learning fully connected neural network mean filter. J Supercomput 77:3144–3164. https://doi.org/10.1007/s11227-020-03389-6

Ma S, Li L, Zhang C (2022) Adaptive image denoising method based on diffusion equation and deep learning. Internet of Robotic Things-Enabled Edge Intelligence Cognition for Humanoid Robots, Article ID 7115551. https://doi.org/10.1155/2022/7115551

Magsi A, Mahar JA, Razzaq MA, Gill SH (2020) Date Palm Disease Identification Using Features Extraction and Deep Learning Approach. 2020 IEEE 23rd International Multitopic Conference (INMIC). https://doi.org/10.1109/INMIC50486.2020.9318158

Mahajan K, Garg U, Shabaz M (2021) CPIDM: a clustering-based profound iterating deep learning model for HSI segmentation. Wireless Commun Mobile Comput 2021:12. https://doi.org/10.1155/2021/7279260

Mahmoudi O, Wahab A, Chong KT (2020) iMethyl-deep: N6 methyladenosine identification of yeast genome with automatic feature extraction technique by using deep learning algorithm. Genes 11(5):529. https://doi.org/10.3390/genes11050529

Mehranian A, Wollenweber SD, Walker MD et al (2022) Deep learning–based time-of-flight (ToF) image enhancement of non-ToF PET scans. Eur J Nucl Med Mol Imag 49:3740–3749. https://doi.org/10.1007/s00259-022-05824-7

Meng Y, Zhang J (2022) A novel gray image denoising method using convolutional neural network”. IEEE Access 10:49657–49676 https://doi.org/10.1007/s00259-022-05824-7

Munadi K, Muchtar K, Maulina N (2020) And Biswajeet Pradhan”, image enhancement for tuberculosis detection using deep learning. IEEE Access 8:217897. https://doi.org/10.1109/ACCESS.2020.3041867

Niresi FK, Chi C-Y (2022) Unsupervised hyperspectral denoising based on deep image prior and least favorable distribution”. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing vol. 15, pp. 5967-5983, 2022. https://doi.org/10.1109/JSTARS.2022.3187722

Nurmaini S, Rachmatullah MN, Sapitri AI, Darmawahyuni A, Jovandy A, Firdaus F, Tutuko B, Passarella R (2020) Accurate detection of septal defects with fetal ultrasonography images using deep learning-based multiclass instance segmentation. IEEE Access 8:196160–196174. https://doi.org/10.1109/ACCESS.2020.3034367

Pang T, Zheng H, Quan Y, Ji H (2021) Recorrupted-to-Recorrupted: Unsupervised Deep Learning for Image Denoising” IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/CVPR46437.2021.00208

Park KH, Batbaatar E, Piao Y, Theera-Umpon N, Ryu KH (2021b) Deep learning feature extraction approach for hematopoietic cancer subtype classification. Int J Environ Res Public Health 18:2197. https://doi.org/10.3390/ijerph18042197

Park D, Lee J, Lee J, Lee K (2021) Deep Learning based Food Instance Segmentation using Synthetic Data, IEEE, 18th International Conference on Ubiquitous Robots (UR). https://doi.org/10.1109/UR52253.2021.9494704

Peng Z, Peng S, Lidan Fu, Binchun Lu, Tanga J, Wang Ke, Wenyuan Li, (2020) A novel deep learning ensemble model with data denoising for short-term wind speed forecasting”. Energy Convers Manag 207:112524. https://doi.org/10.1016/j.enconman.2020.112524

Pérez-Borrero I, Marín-Santos D, Gegúndez-Arias ME, Cortés-Ancos E (2020) A fast and accurate deep learning method for strawberry instance segmentation. Comput Electron Agric 178:105736. https://doi.org/10.1016/j.compag.2020.105736

Picon A, San-Emeterio MG, Bereciartua-Perez A, Klukas C, Eggers T, Navarra-Mestre R (2022) Deep learning-based segmentation of multiple species of weeds and corn crop using synthetic and real image datasets. Comput Electron Agric 194:10671. https://doi.org/10.1016/j.compag.2022.106719

Quan Y, Lin P, Yong X, Nan Y, Ji H (2021) Nonblind image deblurring via deep learning in complex field. IEEE Trans Neural Netw Learn Syst 33(10):5387–5400. https://doi.org/10.1109/TNNLS.2021.3070596

Quan, Y., Chen, M., Pang, T. and Ji, H., 2020 “Self2Self With Dropout: Learning Self-Supervised Denoising From Single Image”, IEEE 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) - Seattle, WA, 2020, pp. 1887–1895. https://doi.org/10.1109/CVPR42600.2020.00196

Robiul Islam Md, Nahiduzzaman Md (2022) Complex features extraction with deep learning model for the detection of COVID19 from CT scan images using ensemble based machine learning approach. Exp Syst Appl 195:116554. https://doi.org/10.1016/j.eswa.2022.116554

Saood A, Hatem I (2021) COVID-19 lung CT image segmentation using deep learning methods: U-Net versus SegNet”. BMC Med Imaging 21:19. https://doi.org/10.1186/s12880-020-00529-5

Sarki R, Ahmed K, Wang H et al (2020) Automated detection of mild and multi-class diabetic eye diseases using deep learning. Health Inf Sci Syst 8:32. https://doi.org/10.1007/s13755-020-00125-5

Shankar K, Perumal E, Tiwari P et al (2022) Deep learning and evolutionary intelligence with fusion-based feature extraction for detection of COVID-19 from chest X-ray images. Multimedia Syst 28:1175–1187. https://doi.org/10.1007/s00530-021-00800-x

Sharif M, Attique Khan M, Rashid M, Yasmin M, Afza F, Tanik UJ (2019) Deep CNN and geometric features-based gastrointestinal tract diseases detection and classification from wireless capsule endoscopy images. J Exp Theor Artif Intell 33:1–23. https://doi.org/10.1080/0952813X.2019.1572657

Sharma A, Mishra PK (2022) Image enhancement techniques on deep learning approaches for automated diagnosis of COVID-19 features using CXR images. Multimed Tools Appl 81:42649–42690. https://doi.org/10.1007/s11042-022-13486-8

Sharma T, Nair R, Gomathi S (2022) Breast cancer image classification using transfer learning and convolutional neural network. Int J Modern Res 2(1):8–16

Sharma, Harsh, Jain, Jai Sethia, Bansal, Priti, Gupta, Sumit (2020). [IEEE 2020 10th International Conference on Cloud Computing, Data Science and Engineering (Confluence) - Noida, India (2020.1.29–2020.1.31)] 2020 10th International Conference on Cloud Computing, Data Science and Engineering (Confluence) - Feature Extraction and Classification of Chest X-Ray Images Using CNN to Detect Pneumonia. pp. 227–231. https://doi.org/10.1109/Confluence47617.2020.9057809

Simon P, Uma V (2020) Deep learning based feature extraction for texture classification. Procedia Comput Sci 171:1680–1687. https://doi.org/10.1016/j.procs.2020.04.180

Skouta A, Elmoufidi A, Jai-Andaloussi S, Ochetto O (2021) Automated Binary Classification of Diabetic Retinopathy by Convolutional Neural Networks. In: Saeed F, Al-Hadhrami T, Mohammed F, Mohammed E (eds) Advances on Smart and Soft Computing, Advances in Intelligent Systems and Computing. Springer, Singapore. https://doi.org/10.1007/978-981-15-6048-4_16

Sori WJ, Feng J, Godana AW et al (2021) DFD-Net: lung cancer detection from denoised CT scan image using deep learning. Front Comput Sci 15:152701. https://doi.org/10.1007/s11704-020-9050-z

Sungheetha A, Rajesh Sharma R (2021) Design an early detection and classification for diabetic retinopathy by deep feature extraction based convolution neural network. J Trends Comput Sci Smart Technol (TCSST) 3(2):81–94. https://doi.org/10.36548/jtcsst.2021.2.002

Tang H, Zhu H, Fei L, Wang T, Cao Y, Xie C (2023) Low-Illumination image enhancement based on deep learning techniques: a brief review. Photonics 10(2):198. https://doi.org/10.3390/photonics10020198

Tanseem N. Abu-Jamie, Samy S. Abu-Naser, Mohammed A. Alkahlout, Mohammed A. Aish,“Six Fruits Classification Using Deep Learning”, International Journal of Academic Information Systems Research (IJAISR) ISSN: 2643–9026. 6(1):1–8

Tawfik MS, Adishesha AS, Hsi Y, Purswani P, Johns RT, Shokouhi P, Huang X, Karpyn ZT (2022) Comparative study of traditional and deep-learning denoising approaches for image-based petrophysical characterization of porous media. Front Water 3:800369 https://doi.org/10.3389/frwa.2021.800369

Tian C, Xu Y, Fei L, Yan K (2019) Deep Learning for Image Denoising: A Survey. In: Pan JS, Lin JW, Sui B, Tseng SP (eds) Genetic and Evolutionary Computing. ICGEC 2018. Advances in Intelligent Systems and Computing. Springer, Singapore. https://doi.org/10.48550/arXiv.1810.05052

Tian C, Fei L, Zheng W, Xu Y, Zuof W, Lin CW (2020) Deep Learning on Image Denoising: An Overview. Neural Networks 131:251-275 https://doi.org/10.1016/j.neunet.2020.07.025

Wang D, Su J, Yu H (2020) Feature Extraction and analysis of natural language processing for deep learning english language. IEEE Access 8:46335–46345. https://doi.org/10.1109/ACCESS.2020.2974101

Wang EK, Chen CM, Hassan MM, Almogren A (2020) A deep learning based medical image segmentation technique in Internet-of-Medical-Things domain. Future Gen Comput Syst 108:135–144. https://doi.org/10.1016/j.future.2020.02.054

Xiaowei Xu, Chen Y, Junfeng Zhang Y, Chen PA, Manickam A (2020) A novel approach for scene classification from remote sensing images using deep learning methods. Eur J Remote Sens 54:383–395. https://doi.org/10.1080/22797254.2020.1790995

Yan K, Chang L, Andrianakis M, Tornari V, Yu Y (2020) Deep learning-based wrapped phase denoising method for application in digital holographic speckle pattern interferometry. Appl Sci 10:4044. https://doi.org/10.3390/app10114044

Yang R, Luo F, Ren F, Huang W, Li Q, Du K, Yuan D (2022) Identifying urban wetlands through remote sensing scene classification using deep learning: a case study of Shenzhen. China ISPRS Int J Geo-Inf 11:131. https://doi.org/10.3390/ijgi11020131

Yoshimura N, Kuzuno H, Shiraishi Y, Morii M (2022) DOC-IDS: a deep learning-based method for feature extraction and anomaly detection in network traffic. Sensors 22:4405. https://doi.org/10.3390/s22124405

Zhang W, Zhao C, Li Y (2020) A novel counterfeit feature extraction technique for exposing face-swap images based on deep learning and error level analysis. Entropy 22(2):249. https://doi.org/10.3390/e22020249

Article   MathSciNet   Google Scholar  

Zhou Y, Zhang C, Han X, Lin Y (2021) Monitoring combustion instabilities of stratified swirl flames by feature extractions of time-averaged flame images using deep learning method. Aerospace Sci Technol 109:106443. https://doi.org/10.1016/j.ast.2020.106443

Zhou X, Zhou H, Wen G, Huang X, Le Z, Zhang Z, Chen X (2022) A hybrid denoising model using deep learning and sparse representation with application in bearing weak fault diagnosis. Measurement 189:110633. https://doi.org/10.1016/j.measurement.2021.110633

Download references

Author information

Authors and Affiliations

Department of Computer Science, Bishop Heber College (Affiliated to Bharathidasan University), Tiruchirappalli, Tamil Nadu, India

R. Archana & P. S. Eliahim Jeevaraj


Contributions

All authors reviewed the manuscript.

Corresponding author

Correspondence to P. S. Eliahim Jeevaraj.

Ethics declarations

Conflict of interest.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Archana, R., Jeevaraj, P.S.E. Deep learning models for digital image processing: a review. Artif Intell Rev 57, 11 (2024). https://doi.org/10.1007/s10462-023-10631-z


Accepted: 17 December 2023

Published: 07 January 2024

DOI: https://doi.org/10.1007/s10462-023-10631-z


Keywords

  • Image processing
  • Deep learning models
  • Convolutional neural networks (CNN)

DHS Informatics

IEEE 2024-2025 : IMAGE PROCESSING Projects


For outstation students, we conduct online project classes, covering both technical concepts and coding, over net-meeting software.

For details, call: 9886692401/9845166723.

DHS Informatics provides the latest 2024-2025 IEEE projects on Image Processing for final-year engineering students. DHS Informatics trains all students to develop their projects with a clear idea of what they need to submit in college to get good marks. DHS Informatics also offers placement training in Bangalore through its OJT (On Job Training) program; job seekers as well as final-year college students can join this program for job opportunities in their dream IT companies. We have been providing IEEE projects for BE/B.Tech, M.Tech, MCA, BCA, and Diploma students for more than two decades.

Python  Final year CSE projects in Bangalore

  • Python 2024 – 2025 IEEE PYTHON PROJECTS CSE | ECE | ISE
  • Python 2024 – 2025 IEEE PYTHON MACHINE LEARNING PROJECTS
  • Python 2024 – 2025 IEEE PYTHON IMAGE PROCESSING PROJECTS
  • Python 2024 – 2025 IEEE IOT PYTHON RASPBERRY PI PROJECTS

Image processing

Efficient Quantum Information Hiding for Remote Medical Image Sharing

Abstract: Information hiding aims to embed secret data into multimedia such as images, audio, video, and text. In this paper, two new quantum information hiding approaches are put forward. A quantum steganography approach is proposed to hide a quantum secret image inside a quantum cover image. The quantum secret image is first encrypted using a controlled-NOT gate to ensure the security of the embedded data. The encrypted secret image is embedded into the quantum cover image using the two most and least significant qubits. In addition, a quantum image watermarking approach is presented to hide a quantum gray-scale watermark image inside a quantum carrier image. The quantum watermark image, scrambled using Arnold's cat map, is then embedded into the quantum carrier image using the two least and most significant qubits. Only the watermarked image and the key are needed to extract the embedded quantum watermark image. The proposed approaches are illustrated with a scenario of sharing medical imagery between two remote hospitals. Simulation and analysis demonstrate that the two newly proposed approaches offer excellent visual quality as well as high embedding capacity and security.
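The pipeline described above (encrypt the secret, then write it into the low-order qubits of the cover) has a direct classical analogue. The sketch below is a simplified classical illustration, not the paper's quantum method: it XOR-encrypts a secret bit string, the classical counterpart of the CNOT encryption step, and hides two bits per pixel in the least significant bits of a cover image. All names and sizes are illustrative.

```python
import numpy as np

def embed_lsb2(cover, secret_bits, key_bits):
    """XOR-encrypt the secret bits (classical stand-in for the CNOT
    encryption step) and hide two bits in the LSBs of each used pixel."""
    enc = secret_bits ^ key_bits                # encryption step
    flat = cover.flatten()                      # flatten() returns a copy
    n = len(enc) // 2
    pairs = enc[: 2 * n].reshape(n, 2)
    vals = pairs[:, 0] * 2 + pairs[:, 1]        # pack two bits per pixel
    flat[:n] = (flat[:n] & np.uint8(0xFC)) | vals.astype(np.uint8)
    return flat.reshape(cover.shape)

def extract_lsb2(stego, n_bits, key_bits):
    """Read the two LSBs back out and XOR with the key to decrypt."""
    flat = stego.flatten()
    n = n_bits // 2
    vals = flat[:n] & np.uint8(3)
    bits = np.stack([(vals >> 1) & 1, vals & 1], axis=1).reshape(-1)
    return (bits ^ key_bits).astype(np.uint8)

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
secret = rng.integers(0, 2, size=64, dtype=np.uint8)
key = rng.integers(0, 2, size=64, dtype=np.uint8)

stego = embed_lsb2(cover, secret, key)
recovered = extract_lsb2(stego, secret.size, key)
```

Because only the two low-order bits of each used pixel change, every stego pixel differs from the cover by at most 3 grey levels, which is what keeps the visual quality high.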

An Efficient MSB Prediction-Based Method for High-Capacity Reversible Data Hiding in Encrypted Images

Abstract: Reversible data hiding in encrypted images (RDHEI) is an effective technique for embedding data in the encrypted domain. An original image is encrypted with a secret key, and during or after its transmission it is possible to embed additional information in the encrypted image without knowing either the encryption key or the original content of the image. During the decoding process, the secret message can be extracted and the original image reconstructed. In the last few years, RDHEI has started to draw research interest: with the development of cloud computing, data privacy has become a real issue. However, none of the existing methods allows a large amount of information to be hidden in a reversible manner. In this paper, we propose a new reversible method based on MSB (most significant bit) prediction with a very high capacity. We present two approaches: a high-capacity reversible data hiding approach with correction of prediction errors (CPE-HCRDH) and a high-capacity reversible data hiding approach with embedded prediction errors (EPE-HCRDH). Regardless of the approach used, our results are better than those obtained with current state-of-the-art methods, both in terms of reconstructed image quality and embedding capacity.
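The core MSB-prediction idea can be sketched classically in a few lines. Assuming every pixel differs from its left neighbour by fewer than 64 grey levels (the embeddable case the paper's error handling guarantees), a message bit may overwrite the MSB, and the decoder restores the original pixel by picking whichever MSB candidate lies closer to its predictor. This hypothetical sketch omits the encryption step and the CPE/EPE prediction-error machinery.

```python
import numpy as np

def embed_msb(row, bits):
    """Overwrite the MSB of each embeddable pixel with one message bit;
    row[0] is kept intact as the predictor seed."""
    marked = row.copy()
    for i, b in enumerate(bits, start=1):
        marked[i] = (row[i] & 0x7F) | (int(b) << 7)
    return marked

def extract_and_restore(marked):
    """Read the message back from the MSBs, then rebuild each original
    pixel by choosing the MSB candidate closer to its predictor."""
    bits = np.array([(int(p) >> 7) & 1 for p in marked[1:]], dtype=np.uint8)
    restored = marked.copy()
    for i in range(1, len(marked)):
        pred = int(restored[i - 1])          # prediction = previous restored pixel
        c0 = int(marked[i]) & 0x7F           # candidate with MSB = 0
        c1 = c0 | 0x80                       # candidate with MSB = 1
        restored[i] = c0 if abs(c0 - pred) < abs(c1 - pred) else c1
    return bits, restored

# neighbouring pixels differ by < 64, so every pixel is embeddable
row = np.array([100, 90, 110, 95, 120, 105], dtype=np.uint8)
msg = np.array([1, 0, 1, 1, 0], dtype=np.uint8)
marked = embed_msb(row, msg)
bits, restored = extract_and_restore(marked)
```

Note the near-one-bit-per-pixel capacity: every pixel except the seed carries a message bit, which is why MSB substitution reaches much higher payloads than LSB-based reversible schemes.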

Visual Secret Sharing Schemes Encrypting Multiple Images

Abstract: The aim of this paper is to maximize the range of the access control of visual secret sharing (VSS) schemes encrypting multiple images. First, the formulation of access structures for a single secret is generalized to that for multiple secrets. This generalization is maximal in the sense that the generalized formulation makes no restrictions on access structures; in particular, it includes the existing ones as special cases. Next, a sufficient condition to be satisfied by the encryption of VSS schemes realizing an access structure for multiple secrets of the most general form is introduced, and two constructions of VSS schemes with encryption satisfying this condition are provided. Each of the two constructions has its advantage over the other: one is more general and can generate VSS schemes with strictly better contrast and pixel expansion, while the other has a straightforward implementation. Moreover, for threshold access structures, the pixel expansions of VSS schemes generated by the latter construction are estimated and turn out to be the same as those of the existing schemes called threshold multiple-secret visual cryptographic schemes. Finally, the optimality of the former construction is examined, showing that there exist access structures for which it generates no optimal VSS schemes.
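A minimal concrete instance of visual secret sharing is the classic (2, 2) scheme with pixel expansion 2: each secret pixel is split into two subpixels per share, identical for white pixels and complementary for black ones, so stacking the transparencies (a logical OR) reveals the secret while each share on its own looks random. This sketch illustrates the basic mechanism only; it is not taken from the paper's multi-secret constructions.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_shares(secret):
    """(2,2) visual secret sharing with pixel expansion 2: each secret
    pixel becomes two subpixels per share (1 = black, 0 = white)."""
    h, w = secret.shape
    s1 = np.zeros((h, 2 * w), dtype=np.uint8)
    s2 = np.zeros((h, 2 * w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            a = [1, 0] if rng.integers(0, 2) else [0, 1]   # random subpixel pattern
            s1[y, 2 * x:2 * x + 2] = a
            # white pixel: identical patterns; black pixel: complementary
            s2[y, 2 * x:2 * x + 2] = a if secret[y, x] == 0 else [1 - a[0], 1 - a[1]]
    return s1, s2

secret = (rng.random((4, 4)) < 0.5).astype(np.uint8)
s1, s2 = make_shares(secret)
stacked = s1 | s2                    # stacking the transparencies = logical OR

# a black secret pixel yields two black subpixels, a white one only one
blackness = stacked.reshape(secret.shape[0], secret.shape[1], 2).sum(axis=2)
```

The contrast here is 1/2: white regions of the stack are half black, black regions fully black, which is exactly the pixel-expansion/contrast trade-off the paper's constructions try to optimize.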

Computer Assisted Segmentation of Palmprint Images for Biometric Research

Abstract: The palm print is becoming one of the more popular biometric modalities. This modality is rich in information such as minutiae, ridges, wrinkles, and creases. Our research team is interested in investigating the creases for biometric identification. The palm print images in this research were captured using a commercially available consumer scanner. For each palm print image, two square regions are extracted for biometric identification: one from the hypothenar region, the other from the interdigital region. Due to misalignment of the hand, extracting these regions is tedious and time-consuming. Therefore, in this paper, a computer-aided method is proposed to simplify the extraction process. The user only needs to mark two points on the palm print image; based on these points, the palm print image is aligned and the two regions are extracted automatically.
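The two-point alignment step can be sketched as follows. Assuming the two user-marked points define the palm's reference axis, we rotate the sampling grid by the axis angle and cut a square region just below the midpoint with nearest-neighbour sampling. The helper below is hypothetical: the paper does not specify the exact offsets or region sizes used for the hypothenar and interdigital regions.

```python
import numpy as np

def extract_aligned_square(img, p1, p2, size):
    """Align the palm print using two user-marked (x, y) landmarks and
    cut a size-by-size square just below their midpoint, sampling the
    original image with nearest-neighbour interpolation."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    dx, dy = p2 - p1
    theta = np.arctan2(dy, dx)               # angle of the landmark axis
    c, s = np.cos(theta), np.sin(theta)
    mid = (p1 + p2) / 2.0
    ys, xs = np.mgrid[0:size, 0:size]
    u = xs - size / 2.0                      # local coords: centred horizontally,
    v = ys + 1.0                             # starting one pixel below the axis
    gx = np.clip(np.rint(mid[0] + c * u - s * v), 0, img.shape[1] - 1).astype(int)
    gy = np.clip(np.rint(mid[1] + s * u + c * v), 0, img.shape[0] - 1).astype(int)
    return img[gy, gx]

img = np.arange(100 * 100, dtype=np.float64).reshape(100, 100)
# with a horizontal landmark axis the ROI reduces to a plain crop
roi = extract_aligned_square(img, (30, 40), (70, 40), 10)
```

When the marked points are not horizontal, the same code samples along the rotated grid, which is precisely what removes the hand-misalignment problem described above.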

Image Classification using Manifold Learning Based Non-Linear Dimensionality Reduction

Abstract: This paper presents fast categorization, or classification, of images in an animal data set using different classification algorithms in combination with manifold learning algorithms. The paper focuses on comparing the effects of different non-linear dimensionality reduction algorithms on the speed and accuracy of different classification algorithms. It examines how manifold learning algorithms can improve classification speed by reducing the number of features in the vector representation of images while keeping classification accuracy high.
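The reduce-then-classify pipeline the abstract describes can be illustrated with a toy example. For self-containment this sketch uses a linear PCA projection in plain numpy as a stand-in for the paper's non-linear manifold learners (such as Isomap or LLE, available in scikit-learn), followed by a 1-nearest-neighbour classifier; the data are synthetic feature vectors, not the animal images used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy stand-in for image feature vectors: two well-separated classes
n, d = 100, 64
X = np.vstack([rng.normal(0.0, 1.0, (n, d)), rng.normal(2.5, 1.0, (n, d))])
y = np.array([0] * n + [1] * n)

def pca_reduce(X, k):
    """Project onto the top-k principal components, a linear stand-in
    for non-linear manifold learners such as Isomap or LLE."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def nn_classify(train, labels, test):
    """1-nearest-neighbour classifier in the reduced space."""
    d2 = ((test[:, None, :] - train[None, :, :]) ** 2).sum(axis=2)
    return labels[d2.argmin(axis=1)]

Z = pca_reduce(X, 2)                         # 64 -> 2 dimensions
pred = nn_classify(Z[::2], y[::2], Z[1::2])  # hold out every other sample
accuracy = (pred == y[1::2]).mean()
```

The speed gain comes from the distance computation in the classifier, whose cost scales with the feature dimension: 2 reduced features instead of 64 means roughly a 32-fold cheaper nearest-neighbour search.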

IEEE IMAGE PROCESSING PROJECTS (2024-2025)

1. IEEE: Eye Recognition with Mixed Convolutional and Residual Network (MiCoRe-Net)
2. IEEE: Latent Fingerprint Value Prediction: Crowd-based Learning
3. IEEE: Developing LSB Method Using Mask in Colored Images
4. IEEE: Efficient Quantum Information Hiding for Remote Medical Image Sharing
5. IEEE: An Efficient MSB Prediction-Based Method for High-Capacity Reversible Data Hiding in Encrypted Images
6. IEEE: Visual Secret Sharing Schemes Encrypting Multiple Images
7. IEEE: Human Identification from Freestyle Walks using Posture-Based Gait Feature
8. IEEE: Computer Assisted Segmentation of Palmprint Images for Biometric Research
9. IEEE: Deep Convolutional Neural Networks for Human Action Recognition Using Depth Maps and Postures
10. IEEE: Image Classification using Manifold Learning Based Non-Linear Dimensionality Reduction
11. IEEE: Conceptual view of the IRIS recognition systems in the biometric world using image processing techniques
12. IEEE: Animal classification using facial images with score-level fusion
13. IEEE: Smile Detection in the Wild Based on Transfer Learning
14. IEEE: Design of biometric recognition software based on image processing
15. IEEE: Effective and Efficient Global Context Verification for Image Copy Detection
16. IEEE: Face Recognition Using Sparse Fingerprint Classification Algorithm
17. IEEE: One-time Password for Biometric Systems: Disposable Feature Templates
18. IEEE: Enhanced Password Processing Scheme Based on Visual Cryptography and OCR
19. IEEE: Semi-Supervised Image-to-Video Adaptation for Video Action Recognition
20. IEEE: My Privacy My Decision: Control of Photo Sharing on Online Social Networks
21. IEEE: MR Image classification using AdaBoost for brain tumor type
22. IEEE: Lung lesion extraction using a toboggan based growing automatic segmentation approach
23. IEEE: PassBYOP: Bring Your Own Picture for Securing Graphical Passwords
24. IEEE: Accurate Detection and Recognition of Dirty Vehicle Plate Numbers for High-Speed Applications

DHS Informatics believes in students' satisfaction. We first brief students on the technologies and the types of Image Processing projects and other domain projects. After a complete explanation of the concepts behind the IEEE Image Processing projects, students may choose more than one IEEE Image Processing project for functionality details. Students can even pick one project topic from Image Processing and another two from other domains such as data mining, information forensics, big data, data science, blockchain, etc. DHS Informatics is a pioneering institute in Bangalore/Bengaluru, and we support project work for other institutes all over India. We are the leading final-year project centre in Bangalore/Bengaluru, with offices in five main locations: Jayanagar, Yelahanka, Vijayanagar, RT Nagar, and Indiranagar.

We allow ECE, CSE, and ISE final-year students to use the lab and assist them in their project development work; we even encourage students to develop their own ideas into final-year projects for their college submission.

DHS Informatics first trains students on project-related topics, and then students move on to practical sessions. We have a well-equipped lab, experienced faculty who work on our client projects, and friendly student coordinators to assist students with their college project work.

Students appreciate our latest IEEE projects and concepts for final-year Image Processing projects in the ECE, CSE, and ISE departments.

The latest IEEE 2024-2025 projects on Image Processing feature real-time concepts implemented using Java, MATLAB, and NS2 with innovative ideas. Final-year students of computer engineering, computer science, information science, and electronics and communication can contact our corporate office, located at Jayanagar, Bangalore, for Image Processing project details.

IMAGE PROCESSING

Image processing is the processing of images using mathematical operations, applying some form of signal processing for which the input is an image or a series of images, such as photographs; the output may be either an image or a set of characteristics or parameters related to the input image. Most image-processing techniques involve isolating the individual color planes of an image, treating each plane as a two-dimensional signal, and applying standard signal-processing techniques to it. Images can also be processed as three-dimensional signals, with the third dimension being time or the z-axis.

Image processing usually refers to digital image processing, but optical and analog image processing also are possible. This article is about general techniques that apply to all of them. The acquisition of images (producing the input image in the first place) is referred to as imaging.
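The plane-by-plane treatment described above can be shown with a minimal example: split an RGB image into its three colour planes, filter each plane as a two-dimensional signal (here with a 3x3 mean filter), and restack the results. A hedged sketch using numpy only; the filter and image are illustrative.

```python
import numpy as np

def box_blur_plane(plane):
    """3x3 mean filter applied to a single colour plane treated as a
    2-D signal; border pixels are left unchanged for simplicity."""
    out = plane.astype(np.float32)
    acc = sum(
        plane[1 + dy:plane.shape[0] - 1 + dy,
              1 + dx:plane.shape[1] - 1 + dx].astype(np.float32)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
    )
    out[1:-1, 1:-1] = acc / 9.0
    return out

rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)

# isolate each colour plane, filter it independently, then restack
blurred = np.stack([box_blur_plane(rgb[:, :, c]) for c in range(3)], axis=2)
```

Any standard 2-D signal-processing operation (sharpening, edge detection, frequency-domain filtering) can replace the mean filter in `box_blur_plane` without changing the per-plane structure of the pipeline.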

Java Final year CSE projects in Bangalore

  • Java Information Forensics / Block Chain B.E Projects
  • Java Cloud Computing B.E Projects
  • Java Big Data with Hadoop B.E Projects
  • Java Networking & Network Security B.E Projects
  • Java Data Mining / Web Mining / Cyber Security B.E Projects
  • Java Data Science / Machine Learning B.E Projects
  • Java Artificial Intelligence B.E Projects
  • Java Wireless Sensor Network B.E Projects
  • Java Distributed & Parallel Networking B.E Projects
  • Java Mobile Computing B.E Projects

Android Final year CSE projects in Bangalore

  • Android GPS, GSM, Bluetooth & GPRS B.E Projects
  • Android Embedded System Application Projects for B.E
  • Android Database Applications Projects for B.E Students
  • Android Cloud Computing Projects for Final Year B.E Students
  • Android Surveillance Applications B.E Projects
  • Android Medical Applications Projects for B.E

Embedded  Final year CSE projects in Bangalore

  • Embedded Robotics Projects for M.Tech Final Year Students
  • Embedded IEEE Internet of Things Projects for B.E Students
  • Embedded Raspberry Pi Projects for B.E Final Year Students
  • Embedded Automotive Projects for Final Year B.E Students
  • Embedded Biomedical Projects for B.E Final Year Students
  • Embedded Biometric Projects for B.E Final Year Students
  • Embedded Security Projects for B.E Final Year

MatLab  Final year CSE projects in Bangalore

  • MatLab Image Processing Projects for B.E Students
  • MatLab Wireless Communication B.E Projects
  • MatLab Communication Systems B.E Projects
  • MatLab Power Electronics Projects for B.E Students
  • MatLab Signal Processing Projects for B.E
  • MatLab Geo Science & Remote Sensors B.E Projects
  • MatLab Biomedical Projects for B.E Students



Information

  • Author Services

Initiatives

You are accessing a machine-readable page. In order to be human-readable, please install an RSS reader.

All articles published by MDPI are made immediately available worldwide under an open access license. No special permission is required to reuse all or part of the article published by MDPI, including figures and tables. For articles published under an open access Creative Common CC BY license, any part of the article may be reused without permission provided that the original article is clearly cited. For more information, please refer to https://www.mdpi.com/openaccess .

Feature papers represent the most advanced research with significant potential for high impact in the field. A Feature Paper should be a substantial original Article that involves several techniques or approaches, provides an outlook for future research directions and describes possible research applications.

Feature papers are submitted upon individual invitation or recommendation by the scientific editors and must receive positive feedback from the reviewers.

Editor’s Choice articles are based on recommendations by the scientific editors of MDPI journals from around the world. Editors select a small number of articles recently published in the journal that they believe will be particularly interesting to readers, or important in the respective research area. The aim is to provide a snapshot of some of the most exciting work published in the various research areas of the journal.

Original Submission Date Received: .

  • Active Journals
  • Find a Journal
  • Proceedings Series
  • For Authors
  • For Reviewers
  • For Editors
  • For Librarians
  • For Publishers
  • For Societies
  • For Conference Organizers
  • Open Access Policy
  • Institutional Open Access Program
  • Special Issues Guidelines
  • Editorial Process
  • Research and Publication Ethics
  • Article Processing Charges
  • Testimonials
  • Preprints.org
  • SciProfiles
  • Encyclopedia

electronics-logo

Article Menu

ieee research papers on image processing

  • Subscribe SciFeed
  • Recommended Articles
  • Google Scholar
  • on Google Scholar
  • Table of Contents

Find support for a specific problem in the support section of our website.

Please let us know what you think of our products and services.

Visit our dedicated information section to learn more about MDPI.

JSmol Viewer

A hybrid parallel computing architecture based on cnn and transformer for music genre classification.

ieee research papers on image processing

Graphical Abstract

1. Introduction

2. related work, 3. preliminary knowledge, 3.2. vision transformer, 3.3. mel spectrogram, 4. proposed hybrid architecture, 5. experiments and analyses on gtzan dataset, 5.1. gtzan dataset description, 5.2. evaluation metrics, 5.3. experimental settings, 5.4. comparison of results and analysis on gtzan dataset, 6. experiments and analyses on free music archive dataset, 6.1. fma dataset description, 6.2. comparison of results and analysis on fma dataset, 6.3. the ablation study and analysis on cnn-te, 7. conclusions, author contributions, data availability statement, conflicts of interest.

  • Cheng, Y.H.; Chang, P.C.; Kuo, C.N. Convolutional Neural Networks Approach for Music Genre Classification. In Proceedings of the 2020 International Symposium on Computer, Consumer and Control (IS3C), Taichung City, Taiwan, 13–16 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 399–403.
  • Liu, J.; Wang, C.; Zha, L. A middle-level learning feature interaction method with deep learning for multi-feature music genre classification. Electronics 2021, 10, 2206.
  • Wen, Z.; Chen, A.; Zhou, G.; Yi, J.; Peng, W. Parallel attention of representation global time–frequency correlation for music genre classification. Multimed. Tools Appl. 2024, 83, 10211–10231.
  • Deepak, S.; Prasad, B. Music Classification based on Genre using LSTM. In Proceedings of the 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 15–17 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 985–991.
  • Zheng, Z. The Classification of Music and Art Genres under the Visual Threshold of Deep Learning. Comput. Intell. Neurosci. 2022, 2022, 4439738.
  • Narkhede, N.; Mathur, S.; Bhaskar, A.; Kalla, M. Music genre classification and recognition using convolutional neural network. Multimed. Tools Appl. 2024, 1–16.
  • Pelchat, N.; Gelowitz, C.M. Neural network music genre classification. Can. J. Electr. Comput. Eng. 2020, 43, 170–173.
  • Cheng, Y.H.; Kuo, C.N. Machine Learning for Music Genre Classification Using Visual Mel Spectrum. Mathematics 2022, 10, 4427.
  • Prabhakar, S.K.; Lee, S.W. Holistic Approaches to Music Genre Classification using Efficient Transfer and Deep Learning Techniques. Expert Syst. Appl. 2023, 211, 118636.
  • Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
  • Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
  • Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110.
  • Liu, Z.; Hu, H.; Lin, Y.; Yao, Z.; Xie, Z.; Wei, Y.; Ning, J.; Cao, Y.; Zhang, Z.; Dong, L.; et al. Swin transformer v2: Scaling up capacity and resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12009–12019.
  • Srinivas, A.; Lin, T.Y.; Parmar, N.; Shlens, J.; Abbeel, P.; Vaswani, A. Bottleneck transformers for visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 16519–16529.
  • Fu, Z.; Lu, G.; Ting, K.M.; Zhang, D. A survey of audio-based music classification and annotation. IEEE Trans. Multimed. 2010, 13, 303–319.
  • Rosner, A.; Kostek, B. Automatic music genre classification based on musical instrument track separation. J. Intell. Inf. Syst. 2018, 50, 363–384.
  • Shi, B.; Bai, X.; Yao, C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 2298–2304.
  • Wu, W.; Han, F.; Song, G.; Wang, Z. Music genre classification using independent recurrent neural network. In Proceedings of the 2018 Chinese Automation Congress (CAC), Calgary, AB, Canada, 15–20 April 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 192–195.
  • Kim, T.; Lee, J.; Nam, J. Comparison and analysis of SampleCNN architectures for audio classification. IEEE J. Sel. Top. Signal Process. 2019, 13, 285–297.
  • Hongdan, W.; SalmiJamali, S.; Zhengping, C.; Qiaojuan, S.; Le, R. An intelligent music genre analysis using feature extraction and classification using deep learning techniques. Comput. Electr. Eng. 2022, 100, 107978.
  • Choi, K.; Fazekas, G.; Sandler, M.; Cho, K. Convolutional recurrent neural networks for music classification. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 2392–2396.
  • Wang, Z.; Muknahallipatna, S.; Fan, M.; Okray, A.; Lan, C. Music classification using an improved CRNN with multi-directional spatial dependencies in both time and frequency dimensions. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–8.
  • Zhao, H.; Zhang, C.; Zhu, B.; Ma, Z.; Zhang, K. S3T: Self-supervised pre-training with Swin transformer for music classification. In Proceedings of the ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 22–27 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 606–610.
  • Jena, K.K.; Bhoi, S.K.; Mohapatra, S.; Bakshi, S. A hybrid deep learning approach for classification of music genres using wavelet and spectrogram analysis. Neural Comput. Appl. 2023, 35, 11223–11248.
  • Xiao, T.; Singh, M.; Mintun, E.; Darrell, T.; Dollár, P.; Girshick, R. Early convolutions help transformers see better. arXiv 2021, arXiv:2106.14881.
  • Zaman, K.; Sah, M.; Direkoglu, C.; Unoki, M. A survey of audio classification using deep learning. IEEE Access 2023, 11, 106620–106649.
  • Gupta, C.; Li, H.; Goto, M. Deep learning approaches in topics of singing information processing. IEEE/ACM Trans. Audio Speech Lang. Process. 2022, 30, 2422–2451.
  • Serrano, S.; Patanè, L.; Serghini, O.; Scarpa, M. Detection and Classification of Obstructive Sleep Apnea Using Audio Spectrogram Analysis. Electronics 2024, 13, 2567.
  • Tzanetakis, G.; Cook, P. Musical Genre Classification of Audio Signals. IEEE Trans. Speech Audio Process. 2002, 10, 293–302.
  • Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
  • Zoph, B.; Le, Q.V. Neural architecture search with reinforcement learning. arXiv 2016, arXiv:1611.01578.
  • Mehta, S.; Rastegari, M. MobileViT: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv 2021, arXiv:2110.02178.
  • Defferrard, M.; Benzi, K.; Vandergheynst, P.; Bresson, X. FMA: A dataset for music analysis. arXiv 2016, arXiv:1612.01840.


Comparison of results on the GTZAN dataset (Section 5.4):

| Model | OA (%) | Precision (%) | Recall (%) | F1 (%) | Parameters (M) | Time (ms) |
|---|---|---|---|---|---|---|
| MobileNet V2 | 80.37 | 81.47 | 80.50 | 80.16 | 3.52 | 3.26 |
| MobileNet V3 small | 79.88 | 79.68 | 80.00 | 79.43 | 2.55 | 3.24 |
| MobileNet V3 large | 78.91 | 78.64 | 78.70 | 79.70 | 5.49 | 3.76 |
| GhostNet 100 | 78.12 | 78.02 | 77.90 | 77.26 | 3.91 | 3.13 |
| EfficientNet B0 | 78.42 | 77.94 | 78.20 | 77.34 | 4.02 | 3.39 |
| EfficientNet V2 B0 | 80.18 | 79.80 | 80.00 | 79.36 | 5.87 | 3.42 |
| DenseNet121 | 86.31 | 86.88 | 86.40 | 85.92 | 6.96 | 4.88 |
| ResNet18 | 85.51 | 85.69 | 85.40 | 85.14 | 11.18 | 3.27 |
| ResNet34 | 84.55 | 84.71 | 84.30 | 83.80 | 21.29 | 3.48 |
| Inception V3 | 83.30 | 83.69 | 82.90 | 82.45 | 21.81 | 3.99 |
| Inception V4 | 82.03 | 82.48 | 81.90 | 81.73 | 41.16 | 5.59 |
| Xception | 84.42 | 84.50 | 84.40 | 84.28 | 20.83 | 6.80 |
| ViT | 71.03 | 71.62 | 71.10 | 70.46 | | |
| MobileViT small | 73.02 | 74.04 | 73.00 | 72.97 | 5.58 | 4.76 |
| Swin Transformer | 80.06 | 81.1 | 80.94 | 80.55 | 8.17 | 4.88 |
| Jena et al. [ ] | 80.4 | 81 | 80.1 | 80.2 | 2.37 | 3.96 |
| CNN-TE (ours) | | | | | 1.46 | 3.02 |
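The result tables report overall accuracy (OA) together with precision, recall, and F1, the evaluation metrics of Section 5.2. A self-contained sketch of how these are typically computed from a confusion matrix, assuming macro averaging over genres (the paper's exact averaging convention is an assumption here):

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Overall accuracy plus macro-averaged precision, recall, and F1."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                     # rows: true class, cols: predicted
    tp = np.diag(cm).astype(float)
    col, row = cm.sum(axis=0), cm.sum(axis=1)
    precision = np.divide(tp, col, out=np.zeros_like(tp), where=col > 0)
    recall = np.divide(tp, row, out=np.zeros_like(tp), where=row > 0)
    f1 = np.divide(2 * precision * recall, precision + recall,
                   out=np.zeros_like(tp), where=(precision + recall) > 0)
    oa = tp.sum() / cm.sum()
    return oa, precision.mean(), recall.mean(), f1.mean()

# Toy 3-genre example.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
oa, p, r, f1 = classification_metrics(y_true, y_pred, 3)
```

Here OA is 4/6, while the macro scores average the per-genre values so that small genres weigh as much as large ones.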
Comparison of results on the FMA-small dataset (Section 6.2):

| Model | OA (%) | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|---|
| MobileNet V2 | 81.04 | 80.96 | 81.23 | 81.09 |
| MobileNet V3 small | 81.29 | 81.93 | 81.32 | 81.62 |
| MobileNet V3 large | 82.24 | 83.61 | 82.64 | 83.12 |
| GhostNet 100 | 80.3 | 80.29 | 80.15 | 80.22 |
| EfficientNet B0 | 80.79 | 80.64 | 80.43 | 80.54 |
| EfficientNet V2 B0 | 82.06 | 82.03 | 81.98 | 82 |
| DenseNet121 | 88.23 | 88.61 | 88.17 | 88.39 |
| ResNet18 | 86.87 | 87.08 | 86.45 | 86.76 |
| ResNet34 | 87.67 | 87.94 | 87.54 | 87.74 |
| Inception V3 | 84.36 | 84.79 | 84.19 | 84.49 |
| Inception V4 | 85.42 | 85.74 | 85.23 | 85.48 |
| Xception | 86.65 | 86.74 | 86.68 | 86.71 |
| ViT | 73.23 | 73.26 | 73.38 | 73.32 |
| MobileViT small | 75.11 | 75.03 | 74.85 | 74.94 |
| CNN-TE (ours) | | | | |
Ablation study of CNN-TE on the GTZAN and FMA-small datasets (Section 6.3):

| Method | GTZAN OA (%) | GTZAN Precision (%) | GTZAN Recall (%) | GTZAN F1 (%) | FMA OA (%) | FMA Precision (%) | FMA Recall (%) | FMA F1 (%) |
|---|---|---|---|---|---|---|---|---|
| Without CNN module | 78.21 | 79.35 | 78.06 | 79.11 | 81.07 | 81.65 | 80.9 | 81.03 |
| Without Transformer encoder | 82.33 | 83.04 | 81.96 | 82.71 | 84.55 | 84.92 | 84.03 | 83.97 |
| CNN-TE | | | | | | | | |
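The ablation removes one branch at a time, which implies the two parallel branches are fused before classification. A toy NumPy sketch of feature fusion by concatenation; the branch outputs, dimensions, and linear classifier below are illustrative stand-ins, not the paper's actual CNN-TE implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Stand-ins for the two branch outputs on a batch of 4 spectrograms:
# a CNN branch yielding local-feature embeddings and a Transformer-encoder
# branch yielding global-context embeddings (widths chosen arbitrarily).
cnn_feat = rng.standard_normal((4, 128))
trans_feat = rng.standard_normal((4, 128))

# Parallel fusion: concatenate both branches, then classify over 10 genres.
fused = np.concatenate([cnn_feat, trans_feat], axis=1)   # (4, 256)
W = rng.standard_normal((256, 10)) * 0.01                # toy linear head
probs = softmax(fused @ W)                               # (4, 10)
```

Dropping either branch, as in the ablation rows above, simply halves the fused feature width feeding the classifier.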

Share and Cite

Chen, J.; Ma, X.; Li, S.; Ma, S.; Zhang, Z.; Ma, X. A Hybrid Parallel Computing Architecture Based on CNN and Transformer for Music Genre Classification. Electronics 2024, 13, 3313. https://doi.org/10.3390/electronics13163313

