Grad Coach

Research Topics & Ideas: Data Science

50 Topic Ideas To Kickstart Your Research Project

Research topics and ideas about data science and big data analytics

If you’re just starting out exploring data science-related topics for your dissertation, thesis or research project, you’ve come to the right place. In this post, we’ll help kickstart your research by providing a hearty list of data science and analytics-related research ideas , including examples from recent studies.

PS – This is just the start…

We know it’s exciting to run through a list of research topics, but please keep in mind that this list is just a starting point . These topic ideas provided here are intentionally broad and generic , so keep in mind that you will need to develop them further. Nevertheless, they should inspire some ideas for your project.

To develop a suitable research topic, you’ll need to identify a clear and convincing research gap , and a viable plan to fill that gap. If this sounds foreign to you, check out our free research topic webinar that explores how to find and refine a high-quality research topic, from scratch. Alternatively, consider our 1-on-1 coaching service .

Research topic idea mega list

Data Science-Related Research Topics

  • Developing machine learning models for real-time fraud detection in online transactions.
  • The use of big data analytics in predicting and managing urban traffic flow.
  • Investigating the effectiveness of data mining techniques in identifying early signs of mental health issues from social media usage.
  • The application of predictive analytics in personalizing cancer treatment plans.
  • Analyzing consumer behavior through big data to enhance retail marketing strategies.
  • The role of data science in optimizing renewable energy generation from wind farms.
  • Developing natural language processing algorithms for real-time news aggregation and summarization.
  • The application of big data in monitoring and predicting epidemic outbreaks.
  • Investigating the use of machine learning in automating credit scoring for microfinance.
  • The role of data analytics in improving patient care in telemedicine.
  • Developing AI-driven models for predictive maintenance in the manufacturing industry.
  • The use of big data analytics in enhancing cybersecurity threat intelligence.
  • Investigating the impact of sentiment analysis on brand reputation management.
  • The application of data science in optimizing logistics and supply chain operations.
  • Developing deep learning techniques for image recognition in medical diagnostics.
  • The role of big data in analyzing climate change impacts on agricultural productivity.
  • Investigating the use of data analytics in optimizing energy consumption in smart buildings.
  • The application of machine learning in detecting plagiarism in academic works.
  • Analyzing social media data for trends in political opinion and electoral predictions.
  • The role of big data in enhancing sports performance analytics.
  • Developing data-driven strategies for effective water resource management.
  • The use of big data in improving customer experience in the banking sector.
  • Investigating the application of data science in fraud detection in insurance claims.
  • The role of predictive analytics in financial market risk assessment.
  • Developing AI models for early detection of network vulnerabilities.

Research topic evaluator

Data Science Research Ideas (Continued)

  • The application of big data in public transportation systems for route optimization.
  • Investigating the impact of big data analytics on e-commerce recommendation systems.
  • The use of data mining techniques in understanding consumer preferences in the entertainment industry.
  • Developing predictive models for real estate pricing and market trends.
  • The role of big data in tracking and managing environmental pollution.
  • Investigating the use of data analytics in improving airline operational efficiency.
  • The application of machine learning in optimizing pharmaceutical drug discovery.
  • Analyzing online customer reviews to inform product development in the tech industry.
  • The role of data science in crime prediction and prevention strategies.
  • Developing models for analyzing financial time series data for investment strategies.
  • The use of big data in assessing the impact of educational policies on student performance.
  • Investigating the effectiveness of data visualization techniques in business reporting.
  • The application of data analytics in human resource management and talent acquisition.
  • Developing algorithms for anomaly detection in network traffic data.
  • The role of machine learning in enhancing personalized online learning experiences.
  • Investigating the use of big data in urban planning and smart city development.
  • The application of predictive analytics in weather forecasting and disaster management.
  • Analyzing consumer data to drive innovations in the automotive industry.
  • The role of data science in optimizing content delivery networks for streaming services.
  • Developing machine learning models for automated text classification in legal documents.
  • The use of big data in tracking global supply chain disruptions.
  • Investigating the application of data analytics in personalized nutrition and fitness.
  • The role of big data in enhancing the accuracy of geological surveying for natural resource exploration.
  • Developing predictive models for customer churn in the telecommunications industry.
  • The application of data science in optimizing advertisement placement and reach.

Recent Data Science-Related Studies

While the ideas we’ve presented above are a decent starting point for finding a research topic, they are fairly generic and non-specific. So, it helps to look at actual studies in the data science and analytics space to see how this all comes together in practice.

Below, we’ve included a selection of recent studies to help refine your thinking. These are actual studies,  so they can provide some useful insight as to what a research topic looks like in practice.

  • Data Science in Healthcare: COVID-19 and Beyond (Hulsen, 2022)
  • Auto-ML Web-application for Automated Machine Learning Algorithm Training and evaluation (Mukherjee & Rao, 2022)
  • Survey on Statistics and ML in Data Science and Effect in Businesses (Reddy et al., 2022)
  • Visualization in Data Science VDS @ KDD 2022 (Plant et al., 2022)
  • An Essay on How Data Science Can Strengthen Business (Santos, 2023)
  • A Deep study of Data science related problems, application and machine learning algorithms utilized in Data science (Ranjani et al., 2022)
  • You Teach WHAT in Your Data Science Course?!? (Posner & Kerby-Helm, 2022)
  • Statistical Analysis for the Traffic Police Activity: Nashville, Tennessee, USA (Tufail & Gul, 2022)
  • Data Management and Visual Information Processing in Financial Organization using Machine Learning (Balamurugan et al., 2022)
  • A Proposal of an Interactive Web Application Tool QuickViz: To Automate Exploratory Data Analysis (Pitroda, 2022)
  • Applications of Data Science in Respective Engineering Domains (Rasool & Chaudhary, 2022)
  • Jupyter Notebooks for Introducing Data Science to Novice Users (Fruchart et al., 2022)
  • Towards a Systematic Review of Data Science Programs: Themes, Courses, and Ethics (Nellore & Zimmer, 2022)
  • Application of data science and bioinformatics in healthcare technologies (Veeranki & Varshney, 2022)
  • TAPS Responsibility Matrix: A tool for responsible data science by design (Urovi et al., 2023)
  • Data Detectives: A Data Science Program for Middle Grade Learners (Thompson & Irgens, 2022)
  • MACHINE LEARNING FOR NON-MAJORS: A WHITE BOX APPROACH (Mike & Hazzan, 2022)
  • COMPONENTS OF DATA SCIENCE AND ITS APPLICATIONS (Paul et al., 2022)
  • Analysis on the Application of Data Science in Business Analytics (Wang, 2022)

As you can see, these research topics are a lot more focused than the generic topic ideas we presented earlier. So, for you to develop a high-quality research topic, you’ll need to get specific and laser-focused on a specific context with specific variables of interest.  In the video below, we explore some other important things you’ll need to consider when crafting your research topic.

Get 1-On-1 Help

If you’re still unsure about how to find a quality research topic, check out our Research Topic Kickstarter service, which is the perfect starting point for developing a unique, well-justified research topic.

Research Topic Kickstarter - Need Help Finding A Research Topic?

You Might Also Like:

IT & Computer Science Research Topics

Submit a Comment Cancel reply

Your email address will not be published. Required fields are marked *

Save my name, email, and website in this browser for the next time I comment.

  • Print Friendly

eml header

37 Research Topics In Data Science To Stay On Top Of

Stewart Kaplan

  • February 22, 2024

As a data scientist, staying on top of the latest research in your field is essential.

The data science landscape changes rapidly, and new techniques and tools are constantly being developed.

To keep up with the competition, you need to be aware of the latest trends and topics in data science research.

In this article, we will provide an overview of 37 hot research topics in data science.

We will discuss each topic in detail, including its significance and potential applications.

These topics could be an idea for a thesis or simply topics you can research independently.

Stay tuned – this is one blog post you don’t want to miss!

37 Research Topics in Data Science

1.) predictive modeling.

Predictive modeling is a significant portion of data science and a topic you must be aware of.

Simply put, it is the process of using historical data to build models that can predict future outcomes.

Predictive modeling has many applications, from marketing and sales to financial forecasting and risk management.

As businesses increasingly rely on data to make decisions, predictive modeling is becoming more and more important.

While it can be complex, predictive modeling is a powerful tool that gives businesses a competitive advantage.

predictive modeling

2.) Big Data Analytics

These days, it seems like everyone is talking about big data.

And with good reason – organizations of all sizes are sitting on mountains of data, and they’re increasingly turning to data scientists to help them make sense of it all.

But what exactly is big data? And what does it mean for data science?

Simply put, big data is a term used to describe datasets that are too large and complex for traditional data processing techniques.

Big data typically refers to datasets of a few terabytes or more.

But size isn’t the only defining characteristic – big data is also characterized by its high Velocity (the speed at which data is generated), Variety (the different types of data), and Volume (the amount of the information).

Given the enormity of big data, it’s not surprising that organizations are struggling to make sense of it all.

That’s where data science comes in.

Data scientists use various methods to wrangle big data, including distributed computing and other decentralized technologies.

With the help of data science, organizations are beginning to unlock the hidden value in their big data.

By harnessing the power of big data analytics, they can improve their decision-making, better understand their customers, and develop new products and services.

3.) Auto Machine Learning

Auto machine learning is a research topic in data science concerned with developing algorithms that can automatically learn from data without intervention.

This area of research is vital because it allows data scientists to automate the process of writing code for every dataset.

This allows us to focus on other tasks, such as model selection and validation.

Auto machine learning algorithms can learn from data in a hands-off way for the data scientist – while still providing incredible insights.

This makes them a valuable tool for data scientists who either don’t have the skills to do their own analysis or are struggling.

Auto Machine Learning

4.) Text Mining

Text mining is a research topic in data science that deals with text data extraction.

This area of research is important because it allows us to get as much information as possible from the vast amount of text data available today.

Text mining techniques can extract information from text data, such as keywords, sentiments, and relationships.

This information can be used for various purposes, such as model building and predictive analytics.

5.) Natural Language Processing

Natural language processing is a data science research topic that analyzes human language data.

This area of research is important because it allows us to understand and make sense of the vast amount of text data available today.

Natural language processing techniques can build predictive and interactive models from any language data.

Natural Language processing is pretty broad, and recent advances like GPT-3 have pushed this topic to the forefront.

natural language processing

6.) Recommender Systems

Recommender systems are an exciting topic in data science because they allow us to make better products, services, and content recommendations.

Businesses can better understand their customers and their needs by using recommender systems.

This, in turn, allows them to develop better products and services that meet the needs of their customers.

Recommender systems are also used to recommend content to users.

This can be done on an individual level or at a group level.

Think about Netflix, for example, always knowing what you want to watch!

Recommender systems are a valuable tool for businesses and users alike.

7.) Deep Learning

Deep learning is a research topic in data science that deals with artificial neural networks.

These networks are composed of multiple layers, and each layer is formed from various nodes.

Deep learning networks can learn from data similarly to how humans learn, irrespective of the data distribution.

This makes them a valuable tool for data scientists looking to build models that can learn from data independently.

The deep learning network has become very popular in recent years because of its ability to achieve state-of-the-art results on various tasks.

There seems to be a new SOTA deep learning algorithm research paper on  https://arxiv.org/  every single day!

deep learning

8.) Reinforcement Learning

Reinforcement learning is a research topic in data science that deals with algorithms that can learn on multiple levels from interactions with their environment.

This area of research is essential because it allows us to develop algorithms that can learn non-greedy approaches to decision-making, allowing businesses and companies to win in the long term compared to the short.

9.) Data Visualization

Data visualization is an excellent research topic in data science because it allows us to see our data in a way that is easy to understand.

Data visualization techniques can be used to create charts, graphs, and other visual representations of data.

This allows us to see the patterns and trends hidden in our data.

Data visualization is also used to communicate results to others.

This allows us to share our findings with others in a way that is easy to understand.

There are many ways to contribute to and learn about data visualization.

Some ways include attending conferences, reading papers, and contributing to open-source projects.

data visualization

10.) Predictive Maintenance

Predictive maintenance is a hot topic in data science because it allows us to prevent failures before they happen.

This is done using data analytics to predict when a failure will occur.

This allows us to take corrective action before the failure actually happens.

While this sounds simple, avoiding false positives while keeping recall is challenging and an area wide open for advancement.

11.) Financial Analysis

Financial analysis is an older topic that has been around for a while but is still a great field where contributions can be felt.

Current researchers are focused on analyzing macroeconomic data to make better financial decisions.

This is done by analyzing the data to identify trends and patterns.

Financial analysts can use this information to make informed decisions about where to invest their money.

Financial analysis is also used to predict future economic trends.

This allows businesses and individuals to prepare for potential financial hardships and enable companies to be cash-heavy during good economic conditions.

Overall, financial analysis is a valuable tool for anyone looking to make better financial decisions.

Financial Analysis

12.) Image Recognition

Image recognition is one of the hottest topics in data science because it allows us to identify objects in images.

This is done using artificial intelligence algorithms that can learn from data and understand what objects you’re looking for.

This allows us to build models that can accurately recognize objects in images and video.

This is a valuable tool for businesses and individuals who want to be able to identify objects in images.

Think about security, identification, routing, traffic, etc.

Image Recognition has gained a ton of momentum recently – for a good reason.

13.) Fraud Detection

Fraud detection is a great topic in data science because it allows us to identify fraudulent activity before it happens.

This is done by analyzing data to look for patterns and trends that may be associated with the fraud.

Once our machine learning model recognizes some of these patterns in real time, it immediately detects fraud.

This allows us to take corrective action before the fraud actually happens.

Fraud detection is a valuable tool for anyone who wants to protect themselves from potential fraudulent activity.

fraud detection

14.) Web Scraping

Web scraping is a controversial topic in data science because it allows us to collect data from the web, which is usually data you do not own.

This is done by extracting data from websites using scraping tools that are usually custom-programmed.

This allows us to collect data that would otherwise be inaccessible.

For obvious reasons, web scraping is a unique tool – giving you data your competitors would have no chance of getting.

I think there is an excellent opportunity to create new and innovative ways to make scraping accessible for everyone, not just those who understand Selenium and Beautiful Soup.

15.) Social Media Analysis

Social media analysis is not new; many people have already created exciting and innovative algorithms to study this.

However, it is still a great data science research topic because it allows us to understand how people interact on social media.

This is done by analyzing data from social media platforms to look for insights, bots, and recent societal trends.

Once we understand these practices, we can use this information to improve our marketing efforts.

For example, if we know that a particular demographic prefers a specific type of content, we can create more content that appeals to them.

Social media analysis is also used to understand how people interact with brands on social media.

This allows businesses to understand better what their customers want and need.

Overall, social media analysis is valuable for anyone who wants to improve their marketing efforts or understand how customers interact with brands.

social media

16.) GPU Computing

GPU computing is a fun new research topic in data science because it allows us to process data much faster than traditional CPUs .

Due to how GPUs are made, they’re incredibly proficient at intense matrix operations, outperforming traditional CPUs by very high margins.

While the computation is fast, the coding is still tricky.

There is an excellent research opportunity to bring these innovations to non-traditional modules, allowing data science to take advantage of GPU computing outside of deep learning.

17.) Quantum Computing

Quantum computing is a new research topic in data science and physics because it allows us to process data much faster than traditional computers.

It also opens the door to new types of data.

There are just some problems that can’t be solved utilizing outside of the classical computer.

For example, if you wanted to understand how a single atom moved around, a classical computer couldn’t handle this problem.

You’ll need to utilize a quantum computer to handle quantum mechanics problems.

This may be the “hottest” research topic on the planet right now, with some of the top researchers in computer science and physics worldwide working on it.

You could be too.

quantum computing

18.) Genomics

Genomics may be the only research topic that can compete with quantum computing regarding the “number of top researchers working on it.”

Genomics is a fantastic intersection of data science because it allows us to understand how genes work.

This is done by sequencing the DNA of different organisms to look for insights into our and other species.

Once we understand these patterns, we can use this information to improve our understanding of diseases and create new and innovative treatments for them.

Genomics is also used to study the evolution of different species.

Genomics is the future and a field begging for new and exciting research professionals to take it to the next step.

19.) Location-based services

Location-based services are an old and time-tested research topic in data science.

Since GPS and 4g cell phone reception became a thing, we’ve been trying to stay informed about how humans interact with their environment.

This is done by analyzing data from GPS tracking devices, cell phone towers, and Wi-Fi routers to look for insights into how humans interact.

Once we understand these practices, we can use this information to improve our geotargeting efforts, improve maps, find faster routes, and improve cohesion throughout a community.

Location-based services are used to understand the user, something every business could always use a little bit more of.

While a seemingly “stale” field, location-based services have seen a revival period with self-driving cars.

GPS

20.) Smart City Applications

Smart city applications are all the rage in data science research right now.

By harnessing the power of data, cities can become more efficient and sustainable.

But what exactly are smart city applications?

In short, they are systems that use data to improve city infrastructure and services.

This can include anything from traffic management and energy use to waste management and public safety.

Data is collected from various sources, including sensors, cameras, and social media.

It is then analyzed to identify tendencies and habits.

This information can make predictions about future needs and optimize city resources.

As more and more cities strive to become “smart,” the demand for data scientists with expertise in smart city applications is only growing.

21.) Internet Of Things (IoT)

The Internet of Things, or IoT, is exciting and new data science and sustainability research topic.

IoT is a network of physical objects embedded with sensors and connected to the internet.

These objects can include everything from alarm clocks to refrigerators; they’re all connected to the internet.

That means that they can share data with computers.

And that’s where data science comes in.

Data scientists are using IoT data to learn everything from how people use energy to how traffic flows through a city.

They’re also using IoT data to predict when an appliance will break down or when a road will be congested.

Really, the possibilities are endless.

With such a wide-open field, it’s easy to see why IoT is being researched by some of the top professionals in the world.

internet of things

22.) Cybersecurity

Cybersecurity is a relatively new research topic in data science and in general, but it’s already garnering a lot of attention from businesses and organizations.

After all, with the increasing number of cyber attacks in recent years, it’s clear that we need to find better ways to protect our data.

While most of cybersecurity focuses on infrastructure, data scientists can leverage historical events to find potential exploits to protect their companies.

Sometimes, looking at a problem from a different angle helps, and that’s what data science brings to cybersecurity.

Also, data science can help to develop new security technologies and protocols.

As a result, cybersecurity is a crucial data science research area and one that will only become more important in the years to come.

23.) Blockchain

Blockchain is an incredible new research topic in data science for several reasons.

First, it is a distributed database technology that enables secure, transparent, and tamper-proof transactions.

Did someone say transmitting data?

This makes it an ideal platform for tracking data and transactions in various industries.

Second, blockchain is powered by cryptography, which not only makes it highly secure – but is a familiar foe for data scientists.

Finally, blockchain is still in its early stages of development, so there is much room for research and innovation.

As a result, blockchain is a great new research topic in data science that vows to revolutionize how we store, transmit and manage data.

blockchain

24.) Sustainability

Sustainability is a relatively new research topic in data science, but it is gaining traction quickly.

To keep up with this demand, The Wharton School of the University of Pennsylvania has  started to offer an MBA in Sustainability .

This demand isn’t shocking, and some of the reasons include the following:

Sustainability is an important issue that is relevant to everyone.

Datasets on sustainability are constantly growing and changing, making it an exciting challenge for data scientists.

There hasn’t been a “set way” to approach sustainability from a data perspective, making it an excellent opportunity for interdisciplinary research.

As data science grows, sustainability will likely become an increasingly important research topic.

25.) Educational Data

Education has always been a great topic for research, and with the advent of big data, educational data has become an even richer source of information.

By studying educational data, researchers can gain insights into how students learn, what motivates them, and what barriers these students may face.

Besides, data science can be used to develop educational interventions tailored to individual students’ needs.

Imagine being the researcher that helps that high schooler pass mathematics; what an incredible feeling.

With the increasing availability of educational data, data science has enormous potential to improve the quality of education.

online education

26.) Politics

As data science continues to evolve, so does the scope of its applications.

Originally used primarily for business intelligence and marketing, data science is now applied to various fields, including politics.

By analyzing large data sets, political scientists (data scientists with a cooler name) can gain valuable insights into voting patterns, campaign strategies, and more.

Further, data science can be used to forecast election results and understand the effects of political events on public opinion.

With the wealth of data available, there is no shortage of research opportunities in this field.

As data science evolves, so does our understanding of politics and its role in our world.

27.) Cloud Technologies

Cloud technologies are a great research topic.

It allows for the outsourcing and sharing of computer resources and applications all over the internet.

This lets organizations save money on hardware and maintenance costs while providing employees access to the latest and greatest software and applications.

I believe there is an argument that AWS could be the greatest and most technologically advanced business ever built (Yes, I know it’s only part of the company).

Besides, cloud technologies can help improve team members’ collaboration by allowing them to share files and work on projects together in real-time.

As more businesses adopt cloud technologies, data scientists must stay up-to-date on the latest trends in this area.

By researching cloud technologies, data scientists can help organizations to make the most of this new and exciting technology.

cloud technologies

28.) Robotics

Robotics has recently become a household name, and it’s for a good reason.

First, robotics deals with controlling and planning physical systems, an inherently complex problem.

Second, robotics requires various sensors and actuators to interact with the world, making it an ideal application for machine learning techniques.

Finally, robotics is an interdisciplinary field that draws on various disciplines, such as computer science, mechanical engineering, and electrical engineering.

As a result, robotics is a rich source of research problems for data scientists.

29.) HealthCare

Healthcare is an industry that is ripe for data-driven innovation.

Hospitals, clinics, and health insurance companies generate a tremendous amount of data daily.

This data can be used to improve the quality of care and outcomes for patients.

This is perfect timing, as the healthcare industry is undergoing a significant shift towards value-based care, which means there is a greater need than ever for data-driven decision-making.

As a result, healthcare is an exciting new research topic for data scientists.

There are many different ways in which data can be used to improve healthcare, and there is a ton of room for newcomers to make discoveries.

healthcare

30.) Remote Work

There’s no doubt that remote work is on the rise.

In today’s global economy, more and more businesses are allowing their employees to work from home or anywhere else they can get a stable internet connection.

But what does this mean for data science? Well, for one thing, it opens up a whole new field of research.

For example, how does remote work impact employee productivity?

What are the best ways to manage and collaborate on data science projects when team members are spread across the globe?

And what are the cybersecurity risks associated with working remotely?

These are just a few of the questions that data scientists will be able to answer with further research.

So if you’re looking for a new topic to sink your teeth into, remote work in data science is a great option.

31.) Data-Driven Journalism

Data-driven journalism is an exciting new field of research that combines the best of both worlds: the rigor of data science with the creativity of journalism.

By applying data analytics to large datasets, journalists can uncover stories that would otherwise be hidden.

And telling these stories compellingly can help people better understand the world around them.

Data-driven journalism is still in its infancy, but it has already had a major impact on how news is reported.

In the future, it will only become more important as data becomes increasingly fluid among journalists.

It is an exciting new topic and research field for data scientists to explore.

journalism

32.) Data Engineering

Data engineering is a staple in data science, focusing on efficiently managing data.

Data engineers are responsible for developing and maintaining the systems that collect, process, and store data.

In recent years, there has been an increasing demand for data engineers as the volume of data generated by businesses and organizations has grown exponentially.

Data engineers must be able to design and implement efficient data-processing pipelines and have the skills to optimize and troubleshoot existing systems.

If you are looking for a challenging research topic that would immediately impact you worldwide, then improving or innovating a new approach in data engineering would be a good start.

33.) Data Curation

Data curation has been a hot topic in the data science community for some time now.

Curating data involves organizing, managing, and preserving data so researchers can use it.

Data curation can help to ensure that data is accurate, reliable, and accessible.

It can also help to prevent research duplication and to facilitate the sharing of data between researchers.

Data curation is a vital part of data science. In recent years, there has been an increasing focus on data curation, as it has become clear that it is essential for ensuring data quality.

As a result, data curation is now a major research topic in data science.

There are numerous books and articles on the subject, and many universities offer courses on data curation.

Data curation is an integral part of data science and will only become more important in the future.

businessman

34.) Meta-Learning

Meta-learning is gaining a ton of steam in data science. It’s learning how to learn.

So, if you can learn how to learn, you can learn anything much faster.

Meta-learning is mainly used in deep learning, as applications outside of this are generally pretty hard.

In deep learning, many parameters need to be tuned for a good model, and there’s usually a lot of data.

You can save time and effort if you can automatically and quickly do this tuning.

In machine learning, meta-learning can improve models’ performance by sharing knowledge between different models.

For example, if you have a bunch of different models that all solve the same problem, then you can use meta-learning to share the knowledge between them to improve the cluster (groups) overall performance.

I don’t know how anyone looking for a research topic could stay away from this field; it’s what the  Terminator  warned us about!

35.) Data Warehousing

A data warehouse is a system used for data analysis and reporting.

It is a central data repository created by combining data from multiple sources.

Data warehouses are often used to store historical data, such as sales data, financial data, and customer data.

This data type can be used to create reports and perform statistical analysis.

Data warehouses also store data that the organization is not currently using.

This type of data can be used for future research projects.

Data warehousing is an incredible research topic in data science because it offers a variety of benefits.

Data warehouses help organizations to save time and money by reducing the need for manual data entry.

They also help to improve the accuracy of reports and provide a complete picture of the organization’s performance.

Data warehousing feels like one of the weakest parts of the Data Science Technology Stack; if you want a research topic that could have a monumental impact – data warehousing is an excellent place to look.

data warehousing

36.) Business Intelligence

Business intelligence aims to collect, process, and analyze data to help businesses make better decisions.

Business intelligence can improve marketing, sales, customer service, and operations.

It can also be used to identify new business opportunities and track competition.

BI is business and another tool in your company’s toolbox to continue dominating your area.

Data science is the perfect tool for business intelligence because it combines statistics, computer science, and machine learning.

Data scientists can use business intelligence to answer questions like, “What are our customers buying?” or “What are our competitors doing?” or “How can we increase sales?”

Business intelligence is a great way to improve your business’s bottom line and an excellent opportunity to dive deep into a well-respected research topic.

37.) Crowdsourcing

One of the newest areas of research in data science is crowdsourcing.

Crowdsourcing is a process of sourcing tasks or projects to a large group of people, typically via the internet.

This can be done for various purposes, such as gathering data, developing new algorithms, or even just for fun (think: online quizzes and surveys).

But what makes crowdsourcing so powerful is that it allows businesses and organizations to tap into a vast pool of talent and resources they wouldn’t otherwise have access to.

And with the rise of social media, it’s easier than ever to connect with potential crowdsource workers worldwide.

Imagine if you could effect that, finding innovative ways to improve how people work together.

That would have a huge effect.

crowd sourcing

Final Thoughts, Are These Research Topics In Data Science For You?

Thirty-seven different research topics in data science are a lot to take in, but we hope you found a research topic that interests you.

If not, don’t worry – there are plenty of other great topics to explore.

The important thing is to get started with your research and find ways to apply what you learn to real-world problems.

We wish you the best of luck as you begin your data science journey!

Other Data Science Articles

We love talking about data science; here are a couple of our favorite articles:

  • Why Are You Interested In Data Science?
  • Recent Posts

Stewart Kaplan

  • Mastering Pricing Analysis in Data Science [Boost Your Revenue Now] - June 25, 2024
  • How much does Orbis Investment pay software engineers? [Revealed Inside] - June 25, 2024
  • Understanding ML vs DL in Data Science: Key Differences Explained [Must-Read Insights] - June 25, 2024
  • Latest News

Logo

  • Cryptocurrencies
  • White Papers

10 Best Research and Thesis Topic Ideas for Data Science in 2022

10 Best Research and Thesis Topic Ideas for Data Science in 2022

These research and thesis topics for data science will ensure more knowledge and skills for both students and scholars

As businesses seek to employ data to boost digital and industrial transformation, companies across the globe are looking for skilled and talented data professionals who can leverage the meaningful insights extracted from the data to enhance business productivity and help reach company objectives successfully. Recently, data science has turned into a lucrative career option. Nowadays, universities and institutes are offering various data science and big data courses to prepare students to achieve success in the tech industry. The best course of action to amplify the robustness of a resume is to participate or take up different data science projects. In this article, we have listed 10 such research and thesis topic ideas to take up as data science projects in 2022.

  • Handling practical video analytics in a distributed cloud:  With increased dependency on the internet, sharing videos has become a mode of data and information exchange. The role of the implementation of the Internet of Things (IoT), telecom infrastructure, and operators is huge in generating insights from video analytics. In this perspective, several questions need to be answered, like the efficiency of the existing analytics systems, the changes about to take place if real-time analytics are integrated, and others.
  • Smart healthcare systems using big data analytics: Big data analytics plays a significant role in making healthcare more efficient, accessible, and cost-effective. Big data analytics enhances the operational efficiency of smart healthcare providers by providing real-time analytics. It enhances the capabilities of the intelligent systems by using short-span data-driven insights, but there are still distinct challenges that are yet to be addressed in this field.
  • Identifying fake news using real-time analytics:  The circulation of fake news has become a pressing issue in the modern era. The data gathered from social media networks might seem legit, but sometimes they are not. The sources that provide the data are unauthenticated most of the time, which makes it a crucial issue to be addressed.
  • TOP 10 DATA SCIENCE JOB SKILLS THAT WILL BE ON HIGH DEMAND IN 2022
  • TOP 10 DATA SCIENCE UNDERGRADUATE COURSES IN INDIA FOR 2022
  • TOP DATA SCIENCE PROJECTS TO DO DURING YOUR OMICRON QUARANTINE
  • Secure federated learning with real-world applications : Federated learning is a technique that trains an algorithm across multiple decentralized edge devices and servers. This technique can be adopted to build models locally, but if this technique can be deployed at scale or not, across multiple platforms with high-level security is still obscure.
  • Big data analytics and its impact on marketing strategy : The advent of data science and big data analytics has entirely redefined the marketing industry. It has helped enterprises by offering valuable insights into their existing and future customers. But several issues like the existence of surplus data, integrating complex data into customers' journeys, and complete data privacy are some of the branches that are still untrodden and need immediate attention.
  • Impact of big data on business decision-making: Present studies signify that big data has transformed the way managers and business leaders make critical decisions concerning the growth and development of the business. It allows them to access objective data and analyse the market environments, enabling companies to adapt rapidly and make decisions faster. Working on this topic will help students understand the present market and business conditions and help them analyse new solutions.
  • Implementing big data to understand consumer behaviour : In understanding consumer behaviour, big data is used to analyse the data points depicting a consumer's journey after buying a product. Data gives a clearer picture in understanding specific scenarios. This topic will help understand the problems that businesses face in utilizing the insights and develop new strategies in the future to generate more ROI.
  • Applications of big data to predict future demand and forecasting : Predictive analytics in data science has emerged as an integral part of decision-making and demand forecasting. Working on this topic will enable the students to determine the significance of the high-quality historical data analysis and the factors that drive higher demand in consumers.
  • The importance of data exploration over data analysis : Exploration enables a deeper understanding of the dataset, making it easier to navigate and use the data later. Intelligent analysts must understand and explore the differences between data exploration and analysis and use them according to specific needs to fulfill organizational requirements.
  • Data science and software engineering : Software engineering and development are a major part of data science. Skilled data professionals should learn and explore the possibilities of the various technical and software skills for performing critical AI and big data tasks.

Disclaimer: Analytics Insight does not provide financial advice or guidance. Also note that the cryptocurrencies mentioned/listed on the website could potentially be scams, i.e. designed to induce you to invest financial resources that may be lost forever and not be recoverable once investments are made. You are responsible for conducting your own research (DYOR) before making any investments. Read more here.

Related Stories

logo

StatAnalytica

99+ Interesting Data Science Research Topics For Students In 2024

Data Science Research Topics

In today’s information-driven world, data science research stands as a pivotal domain shaping our understanding and application of vast data sets. It amalgamates statistics, computer science, and domain knowledge to extract valuable insights from data. Understanding ‘What Is Data Science?’ is fundamental—a field exploring patterns, trends, and solutions embedded within data.

However, the significance of data science research papers in a student’s life cannot be overstated. They foster critical thinking, analytical skills, and a deeper comprehension of the subject matter. To aid students in navigating this realm effectively, this blog dives into essential elements integral to a data science research paper, while also offering a goldmine of 99+ engaging and timely data science research topics for 2024.

Unraveling tips for crafting an impactful research paper and insights on choosing the right topic, this blog is a compass for students exploring data science research topics. Stay tuned to unearth more about ‘data science research topics’ and refine your academic journey.

What Is Data Science?

Table of Contents

Data Science is like a detective for information! It’s all about uncovering secrets and finding valuable stuff in heaps of data. Imagine you have a giant puzzle with tons of pieces scattered around. Data Science helps in sorting these pieces and figuring out the picture they create. It uses tools and skills from math, computer science, and knowledge about different fields to solve real-world problems.

In simpler terms, Data Science is like a chef in a kitchen, blending ingredients to create a perfect dish. Instead of food, it combines data—numbers, words, pictures—to cook up solutions. It helps in understanding patterns, making predictions, and answering tricky questions by exploring data from various sources. In essence, Data Science is the magic that turns data chaos into meaningful insights that can guide decisions and make life better.

Importance Of Data Science Research Paper In Student’s Life

Data Science research papers are like treasure maps for students! They’re super important because they teach students how to explore and understand the world of data. Writing these papers helps students develop problem-solving skills, think critically, and become better at analyzing information. It’s like a fun adventure where they learn how to dig into data and uncover valuable insights that can solve real-world problems.

  • Enhances critical thinking: Research papers challenge students to analyze and interpret data critically, honing their thinking skills.
  • Fosters analytical abilities: Students learn to sift through vast amounts of data, extracting meaningful patterns and information.
  • Encourages exploration: Engaging in research encourages students to explore diverse data sources, broadening their knowledge horizon.
  • Develops communication skills: Writing research papers hones students’ ability to articulate complex findings and ideas clearly.
  • Prepares for real-world challenges: Through research, students learn to apply theoretical knowledge to practical problems, preparing them for future endeavors.

Elements That Must Be Present In Data Science Research Paper

Here are some elements that must be present in data science research paper:

1. Clear Objective

A data science research paper should start with a clear goal, stating what the study aims to investigate or achieve. This objective guides the entire paper, helping readers understand the purpose and direction of the research.

2. Detailed Methodology

Explaining how the research was conducted is crucial. The paper should outline the tools, techniques, and steps used to collect, analyze, and interpret data. This section allows others to replicate the study and validate its findings.

3. Accurate Data Presentation

Presenting data in an organized and understandable manner is key. Graphs, charts, and tables should be used to illustrate findings clearly, aiding readers’ comprehension of the results.

4. Thorough Analysis and Interpretation

Simply presenting data isn’t enough; the paper should delve into a deep analysis, explaining the meaning behind the numbers. Interpretation helps draw conclusions and insights from the data.

5. Conclusive Findings and Recommendations

A strong conclusion summarizes the key findings of the research. It should also offer suggestions or recommendations based on the study’s outcomes, indicating potential avenues for future exploration.

Here are some interesting data science research topics for students in 2024:

Natural Language Processing (NLP)

  • Multi-modal Contextual Understanding: Integrating text, images, and audio to enhance NLP models’ comprehension abilities.
  • Cross-lingual Transfer Learning: Investigating methods to transfer knowledge from one language to another for improved translation and understanding.
  • Emotion Detection in Text: Developing models to accurately detect and interpret emotions conveyed in textual content.
  • Sarcasm Detection in Social Media: Building algorithms that can identify and understand sarcastic remarks in online conversations.
  • Language Generation for Code: Generating code snippets and scripts from natural language descriptions using NLP techniques.
  • Bias Mitigation in Language Models: Developing strategies to mitigate biases present in large language models and ensure fairness in generated content.
  • Dialogue Systems for Personalized Assistance: Creating intelligent conversational agents that provide personalized assistance based on user preferences and history.
  • Summarization of Legal Documents: Developing NLP models capable of summarizing lengthy legal documents for quick understanding and analysis.
  • Understanding Contextual Nuances in Sentiment Analysis: Enhancing sentiment analysis models to better comprehend contextual nuances and sarcasm in text.
  • Hate Speech Detection and Moderation: Building systems to detect and moderate hate speech and offensive language in online content.

Computer Vision

  • Weakly Supervised Object Detection: Exploring methods to train object detection models with limited annotated data.
  • Video Action Recognition in Uncontrolled Environments: Developing models that can recognize human actions in videos captured in uncontrolled settings.
  • Image Generation and Translation: Investigating techniques to generate realistic images from textual descriptions and translate images across different domains.
  • Scene Understanding in Autonomous Systems: Enhancing computer vision algorithms for better scene understanding in autonomous vehicles and robotics.
  • Fine-grained Visual Classification: Improving models to classify objects at a more granular level, distinguishing subtle differences within similar categories.
  • Visual Question Answering (VQA): Creating systems capable of answering questions based on visual input, requiring reasoning and comprehension abilities.
  • Medical Image Analysis for Disease Diagnosis: Developing computer vision models for accurate and early diagnosis of diseases from medical images.
  • Action Localization in Videos: Building models to precisely localize and recognize specific actions within video sequences.
  • Image Captioning with Contextual Understanding: Generating captions for images considering the context and relationships between objects.
  • Human Pose Estimation in Real-time: Improving algorithms for real-time estimation of human poses in videos for applications like motion analysis and gaming.

Machine Learning Algorithms

  • Self-supervised Learning Techniques: Exploring novel methods for training machine learning models without explicit supervision.
  • Continual Learning in Dynamic Environments: Investigating algorithms that can continuously learn and adapt to changing data distributions and tasks.
  • Explainable AI for Model Interpretability: Developing techniques to explain the decisions and predictions made by complex machine learning models.
  • Transfer Learning for Small Datasets: Techniques to effectively transfer knowledge from large datasets to small or domain-specific datasets.
  • Adaptive Learning Rate Optimization: Enhancing optimization algorithms to dynamically adjust learning rates based on data characteristics.
  • Robustness to Adversarial Attacks: Building models resistant to adversarial attacks, ensuring stability and security in machine learning applications.
  • Active Learning Strategies: Investigating methods to select and label the most informative data points for model training to minimize labeling efforts.
  • Privacy-preserving Machine Learning: Developing algorithms that can train models on sensitive data while preserving individual privacy.
  • Fairness-aware Machine Learning: Techniques to ensure fairness and mitigate biases in machine learning models across different demographics.
  • Multi-task Learning for Jointly Learning Tasks: Exploring approaches to jointly train models on multiple related tasks to improve overall performance.

Deep Learning

  • Graph Neural Networks for Representation Learning: Using deep learning techniques to learn representations from graph-structured data.
  • Transformer Models for Image Processing: Adapting transformer architectures for image-related tasks, such as image classification and generation.
  • Few-shot Learning Strategies: Investigating methods to enable deep learning models to learn from a few examples in new categories.
  • Memory-Augmented Neural Networks: Enhancing neural networks with external memory for improved learning and reasoning capabilities.
  • Neural Architecture Search (NAS): Automating the design of neural network architectures for specific tasks or constraints.
  • Meta-learning for Fast Adaptation: Developing models capable of quickly adapting to new tasks or domains with minimal data.
  • Deep Reinforcement Learning for Robotics: Utilizing deep RL techniques for training robots to perform complex tasks in real-world environments.
  • Generative Adversarial Networks (GANs) for Data Augmentation: Using GANs to generate synthetic data for enhancing training datasets.
  • Variational Autoencoders for Unsupervised Learning: Exploring VAEs for learning latent representations of data without explicit supervision.
  • Lifelong Learning in Deep Networks: Strategies to enable deep networks to continually learn from new data while retaining past knowledge.

Big Data Analytics

  • Streaming Data Analysis for Real-time Insights: Techniques to analyze and derive insights from continuous streams of data in real-time.
  • Scalable Algorithms for Massive Graphs: Developing algorithms that can efficiently process and analyze large-scale graph-structured data.
  • Anomaly Detection in High-dimensional Data: Detecting anomalies and outliers in high-dimensional datasets using advanced statistical methods and machine learning.
  • Personalization and Recommendation Systems: Enhancing recommendation algorithms for providing personalized and relevant suggestions to users.
  • Data Quality Assessment and Improvement: Methods to assess, clean, and enhance the quality of big data to improve analysis and decision-making.
  • Time-to-Event Prediction in Time-series Data: Predicting future events or occurrences based on historical time-series data.
  • Geospatial Data Analysis and Visualization: Analyzing and visualizing large-scale geospatial data for various applications such as urban planning, disaster management, etc.
  • Privacy-preserving Big Data Analytics: Ensuring data privacy while performing analytics on large-scale datasets in distributed environments.
  • Graph-based Deep Learning for Network Analysis: Leveraging deep learning techniques for network analysis and community detection in large-scale networks.
  • Dynamic Data Compression Techniques: Developing methods to compress and store large volumes of data efficiently without losing critical information.

Healthcare Analytics

  • Predictive Modeling for Patient Outcomes: Using machine learning to predict patient outcomes and personalize treatments based on individual health data.
  • Clinical Natural Language Processing for Electronic Health Records (EHR): Extracting valuable information from unstructured EHR data to improve healthcare delivery.
  • Wearable Devices and Health Monitoring: Analyzing data from wearable devices to monitor and predict health conditions in real-time.
  • Drug Discovery and Development using AI: Utilizing machine learning and AI for efficient drug discovery and development processes.
  • Predictive Maintenance in Healthcare Equipment: Developing models to predict and prevent equipment failures in healthcare settings.
  • Disease Clustering and Stratification: Grouping diseases based on similarities in symptoms, genetic markers, and response to treatments.
  • Telemedicine Analytics: Analyzing data from telemedicine platforms to improve remote healthcare delivery and patient outcomes.
  • AI-driven Radiomics for Medical Imaging: Using AI to extract quantitative features from medical images for improved diagnosis and treatment planning.
  • Healthcare Resource Optimization: Optimizing resource allocation in healthcare facilities using predictive analytics and operational research techniques.
  • Patient Journey Analysis and Personalized Care Pathways: Analyzing patient trajectories to create personalized care pathways and improve healthcare outcomes.

Time Series Analysis

  • Forecasting Volatility in Financial Markets: Predicting and modeling volatility in stock prices and financial markets using time series analysis.
  • Dynamic Time Warping for Similarity Analysis: Utilizing DTW to measure similarities between time series data, especially in scenarios with temporal distortions.
  • Seasonal Pattern Detection and Analysis: Identifying and modeling seasonal patterns in time series data for better forecasting.
  • Time Series Anomaly Detection in Industrial IoT: Detecting anomalies in industrial sensor data streams to prevent equipment failures and improve maintenance.
  • Multivariate Time Series Forecasting: Developing models to forecast multiple related time series simultaneously, considering interdependencies.
  • Non-linear Time Series Analysis Techniques: Exploring non-linear models and methods for analyzing complex time series data.
  • Time Series Data Compression for Efficient Storage: Techniques to compress and store time series data efficiently without losing crucial information.
  • Event Detection and Classification in Time Series: Identifying and categorizing specific events or patterns within time series data.
  • Time Series Forecasting with Uncertainty Estimation: Incorporating uncertainty estimation into time series forecasting models for better decision-making.
  • Dynamic Time Series Graphs for Network Analysis: Representing and analyzing dynamic relationships between entities over time using time series graphs.

Reinforcement Learning

  • Multi-agent Reinforcement Learning for Collaboration: Developing strategies for multiple agents to collaborate and solve complex tasks together.
  • Hierarchical Reinforcement Learning: Utilizing hierarchical structures in RL for solving tasks with varying levels of abstraction and complexity.
  • Model-based Reinforcement Learning for Sample Efficiency: Incorporating learned models into RL for efficient exploration and planning.
  • Robotic Manipulation with Reinforcement Learning: Training robots to perform dexterous manipulation tasks using RL algorithms.
  • Safe Reinforcement Learning: Ensuring that RL agents operate safely and ethically in real-world environments, minimizing risks.
  • Transfer Learning in Reinforcement Learning: Transferring knowledge from previously learned tasks to expedite learning in new but related tasks.
  • Curriculum Learning Strategies in RL: Designing learning curricula to gradually expose RL agents to increasingly complex tasks.
  • Continuous Control in Reinforcement Learning: Exploring techniques for continuous control tasks that require precise actions in a continuous action space.
  • Reinforcement Learning for Adaptive Personalization: Utilizing RL to personalize experiences or recommendations for individuals in dynamic environments.
  • Reinforcement Learning in Healthcare Decision-making: Using RL to optimize treatment strategies and decision-making in healthcare settings.

Data Mining

  • Graph Mining for Social Network Analysis: Extracting valuable insights from social network data using graph mining techniques.
  • Sequential Pattern Mining for Market Basket Analysis: Discovering sequential patterns in customer purchase behaviors for market basket analysis.
  • Clustering Algorithms for High-dimensional Data: Developing clustering techniques suitable for high-dimensional datasets.
  • Frequent Pattern Mining in Healthcare Datasets: Identifying frequent patterns in healthcare data for actionable insights and decision support.
  • Outlier Detection and Fraud Analysis: Detecting anomalies and fraudulent activities in various domains using data mining approaches.
  • Opinion Mining and Sentiment Analysis in Reviews: Analyzing opinions and sentiments expressed in product or service reviews to derive insights.
  • Data Mining for Personalized Learning: Mining educational data to personalize learning experiences and adapt teaching methods.
  • Association Rule Mining in Internet of Things (IoT) Data: Discovering meaningful associations and patterns in IoT-generated data streams.
  • Multi-modal Data Fusion for Comprehensive Analysis: Integrating information from multiple data modalities for a holistic understanding and analysis.
  • Data Mining for Energy Consumption Patterns: Analyzing energy usage data to identify patterns and optimize energy consumption in various sectors.

Ethical AI and Bias Mitigation

  • Fairness Metrics and Evaluation in AI Systems: Developing metrics and evaluation frameworks to assess the fairness of AI models.
  • Bias Detection and Mitigation in Facial Recognition: Addressing biases present in facial recognition systems to ensure equitable performance across demographics.
  • Algorithmic Transparency and Explainability: Designing methods to make AI algorithms more transparent and understandable to stakeholders.
  • Fair Representation Learning in Unbalanced Datasets: Learning fair representations from imbalanced data to reduce biases in downstream tasks.
  • Fairness-aware Recommender Systems: Ensuring fairness and reducing biases in recommendation algorithms across diverse user groups.
  • Ethical Considerations in AI for Criminal Justice: Investigating the ethical implications of AI-based decision-making in criminal justice systems.
  • Debiasing Techniques in Natural Language Processing: Developing methods to mitigate biases in language models and text generation.
  • Diversity and Fairness in Hiring Algorithms: Ensuring diversity and fairness in AI-based hiring systems to minimize biases in candidate selection.
  • Ethical AI Governance and Policy: Examining the role of governance and policy frameworks in regulating the development and deployment of AI systems.
  • AI Accountability and Responsibility: Addressing ethical dilemmas and defining responsibilities concerning AI system behaviors and decision-making processes.

Tips For Writing An Effective Data Science Research Paper

Here are some tips for writing an effective data science research paper:

Tip 1: Thorough Planning and Organization

Begin by planning your research paper carefully. Outline the sections and information you’ll include, ensuring a logical flow from introduction to conclusion. This organized approach makes writing easier and helps maintain coherence in your paper.

Tip 2: Clarity in Writing Style

Use clear and simple language to communicate your ideas. Avoid jargon or complex terms that might confuse readers. Write in a way that is easy to understand, ensuring your message is effectively conveyed.

Tip 3: Precise and Relevant Information

Include only information directly related to your research topic. Ensure the data, explanations, and examples you use are precise and contribute directly to supporting your arguments or findings.

Tip 4: Effective Data Visualization

Utilize graphs, charts, and tables to present your data visually. Visual aids make complex information easier to comprehend and can enhance the overall presentation of your research findings.

Tip 5: Review and Revise

Before submitting your paper, review it thoroughly. Check for any errors in grammar, spelling, or formatting. Revise sections if necessary to ensure clarity and coherence in your writing. Asking someone else to review it can also provide valuable feedback.

  • Hospitality Management Research Topics

Things To Remember While Choosing The Data Science Research Topic

When selecting a data science research topic, consider your interests and its relevance to the field. Ensure the topic is neither too broad nor too narrow, striking a balance that allows for in-depth exploration while staying manageable.

  • Relevance and Significance: Choose a topic that aligns with current trends or addresses a significant issue in the field of data science.
  • Feasibility : Ensure the topic is researchable within the resources and time available. It should be practical and manageable for the scope of your study.
  • Your Interest and Passion: Select a topic that genuinely interests you. Your enthusiasm will drive your motivation and engagement throughout the research process.
  • Availability of Data: Check if there’s sufficient data available for analysis related to your chosen topic. Accessible and reliable data sources are vital for thorough research.
  • Potential Contribution: Consider how your chosen topic can contribute to existing knowledge or fill a gap in the field. Aim for a topic that adds value and insights to the data science domain.

In wrapping up our exploration of data science research topics, we’ve uncovered a world of importance and guidance for students. From defining data science to understanding its impact on student life, identifying essential elements in research papers, offering a multitude of intriguing topics for 2024, to providing tips for crafting effective papers—the journey has been insightful. 

Remembering the significance of topic selection and the key components of a well-structured paper, this voyage emphasizes how data science opens doors to endless opportunities. It’s not just a subject; it’s the compass guiding tomorrow’s discoveries and innovations in our digital landscape.

Related Posts

best way to finance car

Step by Step Guide on The Best Way to Finance Car

how to get fund for business

The Best Way on How to Get Fund For Business to Grow it Efficiently

Data Science

Research Areas

Main navigation.

The world is being transformed by data and data-driven analysis is rapidly becoming an integral part of science and society. Stanford Data Science is a collaborative effort across many departments in all seven schools. We strive to unite existing data science research initiatives and create interdisciplinary collaborations, connecting the data science and related methodologists with disciplines that are being transformed by data science and computation.

Our work supports research in a variety of fields where incredible advances are being made through the facilitation of meaningful collaborations between domain researchers, with deep expertise in societal and fundamental research challenges, and methods researchers that are developing next-generation computational tools and techniques, including:

Data Science for Wildland Fire Research

In recent years, wildfire has gone from an infrequent and distant news item to a centerstage isssue spanning many consecutive weeks for urban and suburban communities. Frequent wildfires are changing everyday lives for California in numerous ways -- from public safety power shutoffs to hazardous air quality -- that seemed inconceivable as recently as 2015. Moreover, elevated wildfire risk in the western United States (and similar climates globally) is here to stay into the foreseeable future. There is a plethora of problems that need solutions in the wildland fire arena; many of them are well suited to a data-driven approach.

Seminar Series

Data Science for Physics

Astrophysicists and particle physicists at Stanford and at the SLAC National Accelerator Laboratory are deeply engaged in studying the Universe at both the largest and smallest scales, with state-of-the-art instrumentation at telescopes and accelerator facilities

Data Science for Economics

Many of the most pressing questions in empirical economics concern causal questions, such as the impact, both short and long run, of educational choices on labor market outcomes, and of economic policies on distributions of outcomes. This makes them conceptually quite different from the predictive type of questions that many of the recently developed methods in machine learning are primarily designed for.

Data Science for Education

Educational data spans K-12 school and district records, digital archives of instructional materials and gradebooks, as well as student responses on course surveys. Data science of actual classroom interaction is also of increasing interest and reality.

Data Science for Human Health

It is clear that data science will be a driving force in transitioning the world’s healthcare systems from reactive “sick-based” care to proactive, preventive care.

Data Science for Humanity

Our modern era is characterized by massive amounts of data documenting the behaviors of individuals, groups, organizations, cultures, and indeed entire societies. This wealth of data on modern humanity is accompanied by massive digitization of historical data, both textual and numeric, in the form of historic newspapers, literary and linguistic corpora, economic data, censuses, and other government data, gathered and preserved over centuries, and newly digitized, acquired, and provisioned by libraries, scholars, and commercial entities.

Data Science for Linguistics

The impact of data science on linguistics has been profound. All areas of the field depend on having a rich picture of the true range of variation, within dialects, across dialects, and among different languages. The subfield of corpus linguistics is arguably as old as the field itself and, with the advent of computers, gave rise to many core techniques in data science.

Data Science for Nature and Sustainability

Many key sustainability issues translate into decision and optimization problems and could greatly benefit from data-driven decision making tools. In fact, the impact of modern information technology has been highly uneven, mainly benefiting large firms in profitable sectors, with little or no benefit in terms of the environment. Our vision is that data-driven methods can — and should — play a key role in increasing the efficiency and effectiveness of the way we manage and allocate our natural resources.

Ethics and Data Science

With the emergence of new techniques of machine learning, and the possibility of using algorithms to perform tasks previously done by human beings, as well as to generate new knowledge, we again face a set of new ethical questions.

The Science of Data Science

The practice of data analysis has changed enormously. Data science needs to find new inferential paradigms that allow data exploration prior to the formulation of hypotheses.

U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings

Preview improvements coming to the PMC website in October 2024. Learn More or Try it out now .

  • Advanced Search
  • Journal List
  • Springer Nature - PMC COVID-19 Collection

Logo of phenaturepg

Data Science and Analytics: An Overview from Data-Driven Smart Computing, Decision-Making and Applications Perspective

Iqbal h. sarker.

1 Swinburne University of Technology, Melbourne, VIC 3122 Australia

2 Department of Computer Science and Engineering, Chittagong University of Engineering & Technology, Chittagong, 4349 Bangladesh

The digital world has a wealth of data, such as internet of things (IoT) data, business data, health data, mobile data, urban data, security data, and many more, in the current age of the Fourth Industrial Revolution (Industry 4.0 or 4IR). Extracting knowledge or useful insights from these data can be used for smart decision-making in various applications domains. In the area of data science, advanced analytics methods including machine learning modeling can provide actionable insights or deeper knowledge about data, which makes the computing process automatic and smart. In this paper, we present a comprehensive view on “Data Science” including various types of advanced analytics methods that can be applied to enhance the intelligence and capabilities of an application through smart decision-making in different scenarios. We also discuss and summarize ten potential real-world application domains including business, healthcare, cybersecurity, urban and rural data science, and so on by taking into account data-driven smart computing and decision making. Based on this, we finally highlight the challenges and potential research directions within the scope of our study. Overall, this paper aims to serve as a reference point on data science and advanced analytics to the researchers and decision-makers as well as application developers, particularly from the data-driven solution point of view for real-world problems.

Introduction

We are living in the age of “data science and advanced analytics”, where almost everything in our daily lives is digitally recorded as data [ 17 ]. Thus the current electronic world is a wealth of various kinds of data, such as business data, financial data, healthcare data, multimedia data, internet of things (IoT) data, cybersecurity data, social media data, etc [ 112 ]. The data can be structured, semi-structured, or unstructured, which increases day by day [ 105 ]. Data science is typically a “concept to unify statistics, data analysis, and their related methods” to understand and analyze the actual phenomena with data. According to Cao et al. [ 17 ] “data science is the science of data” or “data science is the study of data”, where a data product is a data deliverable, or data-enabled or guided, which can be a discovery, prediction, service, suggestion, insight into decision-making, thought, model, paradigm, tool, or system. The popularity of “Data science” is increasing day-by-day, which is shown in Fig. ​ Fig.1 1 according to Google Trends data over the last 5 years [ 36 ]. In addition to data science, we have also shown the popularity trends of the relevant areas such as “Data analytics”, “Data mining”, “Big data”, “Machine learning” in the figure. According to Fig. ​ Fig.1, 1 , the popularity indication values for these data-driven domains, particularly “Data science”, and “Machine learning” are increasing day-by-day. This statistical information and the applicability of the data-driven smart decision-making in various real-world application areas, motivate us to study briefly on “Data science” and machine-learning-based “Advanced analytics” in this paper.

An external file that holds a picture, illustration, etc.
Object name is 42979_2021_765_Fig1_HTML.jpg

The worldwide popularity score of data science comparing with relevant  areas in a range of 0 (min) to 100 (max) over time where x -axis represents the timestamp information and y -axis represents the corresponding score

Usually, data science is the field of applying advanced analytics methods and scientific concepts to derive useful business information from data. The emphasis of advanced analytics is more on anticipating the use of data to detect patterns to determine what is likely to occur in the future. Basic analytics offer a description of data in general, while advanced analytics is a step forward in offering a deeper understanding of data and helping to analyze granular data, which we are interested in. In the field of data science, several types of analytics are popular, such as "Descriptive analytics" which answers the question of what happened; "Diagnostic analytics" which answers the question of why did it happen; "Predictive analytics" which predicts what will happen in the future; and "Prescriptive analytics" which prescribes what action should be taken, discussed briefly in “ Advanced analytics methods and smart computing ”. Such advanced analytics and decision-making based on machine learning techniques [ 105 ], a major part of artificial intelligence (AI) [ 102 ] can also play a significant role in the Fourth Industrial Revolution (Industry 4.0) due to its learning capability for smart computing as well as automation [ 121 ].

Although the area of “data science” is huge, we mainly focus on deriving useful insights through advanced analytics, where the results are used to make smart decisions in various real-world application areas. For this, various advanced analytics methods such as machine learning modeling, natural language processing, sentiment analysis, neural network, or deep learning analysis can provide deeper knowledge about data, and thus can be used to develop data-driven intelligent applications. More specifically, regression analysis, classification, clustering analysis, association rules, time-series analysis, sentiment analysis, behavioral patterns, anomaly detection, factor analysis, log analysis, and deep learning which is originated from the artificial neural network, are taken into account in our study. These machine learning-based advanced analytics methods are discussed briefly in “ Advanced analytics methods and smart computing ”. Thus, it’s important to understand the principles of various advanced analytics methods mentioned above and their applicability to apply in various real-world application areas. For instance, in our earlier paper Sarker et al. [ 114 ], we have discussed how data science and machine learning modeling can play a significant role in the domain of cybersecurity for making smart decisions and to provide data-driven intelligent security services. In this paper, we broadly take into account the data science application areas and real-world problems in ten potential domains including the area of business data science, health data science, IoT data science, behavioral data science, urban data science, and so on, discussed briefly in “ Real-world application domains ”.

Based on the importance of machine learning modeling to extract the useful insights from the data mentioned above and data-driven smart decision-making, in this paper, we present a comprehensive view on “Data Science” including various types of advanced analytics methods that can be applied to enhance the intelligence and the capabilities of an application. The key contribution of this study is thus understanding data science modeling, explaining different analytic methods for solution perspective and their applicability in various real-world data-driven applications areas mentioned earlier. Overall, the purpose of this paper is, therefore, to provide a basic guide or reference for those academia and industry people who want to study, research, and develop automated and intelligent applications or systems based on smart computing and decision making within the area of data science.

The main contributions of this paper are summarized as follows:

  • To define the scope of our study towards data-driven smart computing and decision-making in our real-world life. We also make a brief discussion on the concept of data science modeling from business problems to data product and automation, to understand its applicability and provide intelligent services in real-world scenarios.
  • To provide a comprehensive view on data science including advanced analytics methods that can be applied to enhance the intelligence and the capabilities of an application.
  • To discuss the applicability and significance of machine learning-based analytics methods in various real-world application areas. We also summarize ten potential real-world application areas, from business to personalized applications in our daily life, where advanced analytics with machine learning modeling can be used to achieve the expected outcome.
  • To highlight and summarize the challenges and potential research directions within the scope of our study.

The rest of the paper is organized as follows. The next section provides the background and related work and defines the scope of our study. The following section presents the concepts of data science modeling for building a data-driven application. After that, briefly discuss and explain different advanced analytics methods and smart computing. Various real-world application areas are discussed and summarized in the next section. We then highlight and summarize several research issues and potential future directions, and finally, the last section concludes this paper.

Background and Related Work

In this section, we first discuss various data terms and works related to data science and highlight the scope of our study.

Data Terms and Definitions

There is a range of key terms in the field, such as data analysis, data mining, data analytics, big data, data science, advanced analytics, machine learning, and deep learning, which are highly related and easily confusing. In the following, we define these terms and differentiate them with the term “Data Science” according to our goal.

The term “Data analysis” refers to the processing of data by conventional (e.g., classic statistical, empirical, or logical) theories, technologies, and tools for extracting useful information and for practical purposes [ 17 ]. The term “Data analytics”, on the other hand, refers to the theories, technologies, instruments, and processes that allow for an in-depth understanding and exploration of actionable data insight [ 17 ]. Statistical and mathematical analysis of the data is the major concern in this process. “Data mining” is another popular term over the last decade, which has a similar meaning with several other terms such as knowledge mining from data, knowledge extraction, knowledge discovery from data (KDD), data/pattern analysis, data archaeology, and data dredging. According to Han et al. [ 38 ], it should have been more appropriately named “knowledge mining from data”. Overall, data mining is defined as the process of discovering interesting patterns and knowledge from large amounts of data [ 38 ]. Data sources may include databases, data centers, the Internet or Web, other repositories of data, or data dynamically streamed through the system. “Big data” is another popular term nowadays, which may change the statistical and data analysis approaches as it has the unique features of “massive, high dimensional, heterogeneous, complex, unstructured, incomplete, noisy, and erroneous” [ 74 ]. Big data can be generated by mobile devices, social networks, the Internet of Things, multimedia, and many other new applications [ 129 ]. Several unique features including volume, velocity, variety, veracity, value (5Vs), and complexity are used to understand and describe big data [ 69 ].

In terms of analytics, basic analytics provides a summary of data whereas the term “Advanced Analytics” takes a step forward in offering a deeper understanding of data and helps to analyze granular data. Advanced analytics is characterized or defined as autonomous or semi-autonomous data or content analysis using advanced techniques and methods to discover deeper insights, predict or generate recommendations, typically beyond traditional business intelligence or analytics. “Machine learning”, a branch of artificial intelligence (AI), is one of the major techniques used in advanced analytics which can automate analytical model building [ 112 ]. This is focused on the premise that systems can learn from data, recognize trends, and make decisions, with minimal human involvement [ 38 , 115 ]. “Deep Learning” is a subfield of machine learning that discusses algorithms inspired by the human brain’s structure and the function called artificial neural networks [ 38 , 139 ].

Unlike the above data-related terms, “Data science” is an umbrella term that encompasses advanced data analytics, data mining, machine, and deep learning modeling, and several other related disciplines like statistics, to extract insights or useful knowledge from the datasets and transform them into actionable business strategies. In [ 17 ], Cao et al. defined data science from the disciplinary perspective as “data science is a new interdisciplinary field that synthesizes and builds on statistics, informatics, computing, communication, management, and sociology to study data and its environments (including domains and other contextual aspects, such as organizational and social aspects) to transform data to insights and decisions by following a data-to-knowledge-to-wisdom thinking and methodology”. In “ Understanding data science modeling ”, we briefly discuss the data science modeling from a practical perspective starting from business problems to data products that can assist the data scientists to think and work in a particular real-world problem domain within the area of data science and analytics.

Related Work

In the area, several papers have been reviewed by the researchers based on data science and its significance. For example, the authors in [ 19 ] identify the evolving field of data science and its importance in the broader knowledge environment and some issues that differentiate data science and informatics issues from conventional approaches in information sciences. Donoho et al. [ 27 ] present 50 years of data science including recent commentary on data science in mass media, and on how/whether data science varies from statistics. The authors formally conceptualize the theory-guided data science (TGDS) model in [ 53 ] and present a taxonomy of research themes in TGDS. Cao et al. include a detailed survey and tutorial on the fundamental aspects of data science in [ 17 ], which considers the transition from data analysis to data science, the principles of data science, as well as the discipline and competence of data education.

Besides, the authors include a data science analysis in [ 20 ], which aims to provide a realistic overview of the use of statistical features and related data science methods in bioimage informatics. The authors in [ 61 ] study the key streams of data science algorithm use at central banks and show how their popularity has risen over time. This research contributes to the creation of a research vector on the role of data science in central banking. In [ 62 ], the authors provide an overview and tutorial on the data-driven design of intelligent wireless networks. The authors in [ 87 ] provide a thorough understanding of computational optimal transport with application to data science. In [ 97 ], the authors present data science as theoretical contributions in information systems via text analytics.

Unlike the above recent studies, in this paper, we concentrate on the knowledge of data science including advanced analytics methods, machine learning modeling, real-world application domains, and potential research directions within the scope of our study. The advanced analytics methods based on machine learning techniques discussed in this paper can be applied to enhance the capabilities of an application in terms of data-driven intelligent decision making and automation in the final data product or systems.

Understanding Data Science Modeling

In this section, we briefly discuss how data science can play a significant role in the real-world business process. For this, we first categorize various types of data and then discuss the major steps of data science modeling starting from business problems to data product and automation.

Types of Real-World Data

Typically, to build a data-driven real-world system in a particular domain, the availability of data is the key [ 17 , 112 , 114 ]. The data can be in different types such as (i) Structured—that has a well-defined data structure and follows a standard order, examples are names, dates, addresses, credit card numbers, stock information, geolocation, etc.; (ii) Unstructured—has no pre-defined format or organization, examples are sensor data, emails, blog entries, wikis, and word processing documents, PDF files, audio files, videos, images, presentations, web pages, etc.; (iii) Semi-structured—has elements of both the structured and unstructured data containing certain organizational properties, examples are HTML, XML, JSON documents, NoSQL databases, etc.; and (iv) Metadata—that represents data about the data, examples are author, file type, file size, creation date and time, last modification date and time, etc. [ 38 , 105 ].

In the area of data science, researchers use various widely-used datasets for different purposes. These are, for example, cybersecurity datasets such as NSL-KDD [ 127 ], UNSW-NB15 [ 79 ], Bot-IoT [ 59 ], ISCX’12 [ 15 ], CIC-DDoS2019 [ 22 ], etc., smartphone datasets such as phone call logs [ 88 , 110 ], mobile application usages logs [ 124 , 149 ], SMS Log [ 28 ], mobile phone notification logs [ 77 ] etc., IoT data [ 56 , 11 , 64 ], health data such as heart disease [ 99 ], diabetes mellitus [ 86 , 147 ], COVID-19 [ 41 , 78 ], etc., agriculture and e-commerce data [ 128 , 150 ], and many more in various application domains. In “ Real-world application domains ”, we discuss ten potential real-world application domains of data science and analytics by taking into account data-driven smart computing and decision making, which can help the data scientists and application developers to explore more in various real-world issues.

Overall, the data used in data-driven applications can be any of the types mentioned above, and they can differ from one application to another in the real world. Data science modeling, which is briefly discussed below, can be used to analyze such data in a specific problem domain and derive insights or useful information from the data to build a data-driven model or data product.

Steps of Data Science Modeling

Data science is typically an umbrella term that encompasses advanced data analytics, data mining, machine, and deep learning modeling, and several other related disciplines like statistics, to extract insights or useful knowledge from the datasets and transform them into actionable business strategies, mentioned earlier in “ Background and related work ”. In this section, we briefly discuss how data science can play a significant role in the real-world business process. Figure ​ Figure2 2 shows an example of data science modeling starting from real-world data to data-driven product and automation. In the following, we briefly discuss each module of the data science process.

  • Understanding business problems: This involves getting a clear understanding of the problem that is needed to solve, how it impacts the relevant organization or individuals, the ultimate goals for addressing it, and the relevant project plan. Thus to understand and identify the business problems, the data scientists formulate relevant questions while working with the end-users and other stakeholders. For instance, how much/many, which category/group, is the behavior unrealistic/abnormal, which option should be taken, what action, etc. could be relevant questions depending on the nature of the problems. This helps to get a better idea of what business needs and what we should be extracted from data. Such business knowledge can enable organizations to enhance their decision-making process, is known as “Business Intelligence” [ 65 ]. Identifying the relevant data sources that can help to answer the formulated questions and what kinds of actions should be taken from the trends that the data shows, is another important task associated with this stage. Once the business problem has been clearly stated, the data scientist can define the analytic approach to solve the problem.
  • Understanding data: As we know that data science is largely driven by the availability of data [ 114 ]. Thus a sound understanding of the data is needed towards a data-driven model or system. The reason is that real-world data sets are often noisy, missing values, have inconsistencies, or other data issues, which are needed to handle effectively [ 101 ]. To gain actionable insights, the appropriate data or the quality of the data must be sourced and cleansed, which is fundamental to any data science engagement. For this, data assessment that evaluates what data is available and how it aligns to the business problem could be the first step in data understanding. Several aspects such as data type/format, the quantity of data whether it is sufficient or not to extract the useful knowledge, data relevance, authorized access to data, feature or attribute importance, combining multiple data sources, important metrics to report the data, etc. are needed to take into account to clearly understand the data for a particular business problem. Overall, the data understanding module involves figuring out what data would be best needed and the best ways to acquire it.
  • Data pre-processing and exploration: Exploratory data analysis is defined in data science as an approach to analyzing datasets to summarize their key characteristics, often with visual methods [ 135 ]. This examines a broad data collection to discover initial trends, attributes, points of interest, etc. in an unstructured manner to construct meaningful summaries of the data. Thus data exploration is typically used to figure out the gist of data and to develop a first step assessment of its quality, quantity, and characteristics. A statistical model can be used or not, but primarily it offers tools for creating hypotheses by generally visualizing and interpreting the data through graphical representation such as a chart, plot, histogram, etc [ 72 , 91 ]. Before the data is ready for modeling, it’s necessary to use data summarization and visualization to audit the quality of the data and provide the information needed to process it. To ensure the quality of the data, the data  pre-processing technique, which is typically the process of cleaning and transforming raw data [ 107 ] before processing and analysis is important. It also involves reformatting information, making data corrections, and merging data sets to enrich data. Thus, several aspects such as expected data, data cleaning, formatting or transforming data, dealing with missing values, handling data imbalance and bias issues, data distribution, search for outliers or anomalies in data and dealing with them, ensuring data quality, etc. could be the key considerations in this step.
  • Machine learning modeling and evaluation: Once the data is prepared for building the model, data scientists design a model, algorithm, or set of models, to address the business problem. Model building is dependent on what type of analytics, e.g., predictive analytics, is needed to solve the particular problem, which is discussed briefly in “ Advanced analytics methods and smart computing ”. To best fits the data according to the type of analytics, different types of data-driven or machine learning models that have been summarized in our earlier paper Sarker et al. [ 105 ], can be built to achieve the goal. Data scientists typically separate training and test subsets of the given dataset usually dividing in the ratio of 80:20 or data considering the most popular k -folds data splitting method [ 38 ]. This is to observe whether the model performs well or not on the data, to maximize the model performance. Various model validation and assessment metrics, such as error rate, accuracy, true positive, false positive, true negative, false negative, precision, recall, f-score, ROC (receiver operating characteristic curve) analysis, applicability analysis, etc. [ 38 , 115 ] are used to measure the model performance, which can guide the data scientists to choose or design the learning method or model. Besides, machine learning experts or data scientists can take into account several advanced analytics such as feature engineering, feature selection or extraction methods, algorithm tuning, ensemble methods, modifying existing algorithms, or designing new algorithms, etc. to improve the ultimate data-driven model to solve a particular business problem through smart decision making.
  • Data product and automation: A data product is typically the output of any data science activity [ 17 ]. A data product, in general terms, is a data deliverable, or data-enabled or guide, which can be a discovery, prediction, service, suggestion, insight into decision-making, thought, model, paradigm, tool, application, or system that process data and generate results. Businesses can use the results of such data analysis to obtain useful information like churn (a measure of how many customers stop using a product) prediction and customer segmentation, and use these results to make smarter business decisions and automation. Thus to make better decisions in various business problems, various machine learning pipelines and data products can be developed. To highlight this, we summarize several potential real-world data science application areas in “ Real-world application domains ”, where various data products can play a significant role in relevant business problems to make them smart and automate.

Overall, we can conclude that data science modeling can be used to help drive changes and improvements in business practices. The interesting part of the data science process indicates having a deeper understanding of the business problem to solve. Without that, it would be much harder to gather the right data and extract the most useful information from the data for making decisions to solve the problem. In terms of role, “Data Scientists” typically interpret and manage data to uncover the answers to major questions that help organizations to make objective decisions and solve complex problems. In a summary, a data scientist proactively gathers and analyzes information from multiple sources to better understand how the business performs, and  designs machine learning or data-driven tools/methods, or algorithms, focused on advanced analytics, which can make today’s computing process smarter and intelligent, discussed briefly in the following section.

An external file that holds a picture, illustration, etc.
Object name is 42979_2021_765_Fig2_HTML.jpg

An example of data science modeling from real-world data to data-driven system and decision making

Advanced Analytics Methods and Smart Computing

As mentioned earlier in “ Background and related work ”, basic analytics provides a summary of data whereas advanced analytics takes a step forward in offering a deeper understanding of data and helps in granular data analysis. For instance, the predictive capabilities of advanced analytics can be used to forecast trends, events, and behaviors. Thus, “advanced analytics” can be defined as the autonomous or semi-autonomous analysis of data or content using advanced techniques and methods to discover deeper insights, make predictions, or produce recommendations, where machine learning-based analytical modeling is considered as the key technologies in the area. In the following section, we first summarize various types of analytics and outcome that are needed to solve the associated business problems, and then we briefly discuss machine learning-based analytical modeling.

Types of Analytics and Outcome

In the real-world business process, several key questions such as “What happened?”, “Why did it happen?”, “What will happen in the future?”, “What action should be taken?” are common and important. Based on these questions, in this paper, we categorize and highlight the analytics into four types such as descriptive, diagnostic, predictive, and prescriptive, which are discussed below.

  • Descriptive analytics: It is the interpretation of historical data to better understand the changes that have occurred in a business. Thus descriptive analytics answers the question, “what happened in the past?” by summarizing past data such as statistics on sales and operations or marketing strategies, use of social media, and engagement with Twitter, Linkedin or Facebook, etc. For instance, using descriptive analytics through analyzing trends, patterns, and anomalies, etc., customers’ historical shopping data can be used to predict the probability of a customer purchasing a product. Thus, descriptive analytics can play a significant role to provide an accurate picture of what has occurred in a business and how it relates to previous times utilizing a broad range of relevant business data. As a result, managers and decision-makers can pinpoint areas of strength and weakness in their business, and eventually can take more effective management strategies and business decisions.
  • Diagnostic analytics: It is a form of advanced analytics that examines data or content to answer the question, “why did it happen?” The goal of diagnostic analytics is to help to find the root cause of the problem. For example, the human resource management department of a business organization may use these diagnostic analytics to find the best applicant for a position, select them, and compare them to other similar positions to see how well they perform. In a healthcare example, it might help to figure out whether the patients’ symptoms such as high fever, dry cough, headache, fatigue, etc. are all caused by the same infectious agent. Overall, diagnostic analytics enables one to extract value from the data by posing the right questions and conducting in-depth investigations into the answers. It is characterized by techniques such as drill-down, data discovery, data mining, and correlations.
  • Predictive analytics: Predictive analytics is an important analytical technique used by many organizations for various purposes such as to assess business risks, anticipate potential market patterns, and decide when maintenance is needed, to enhance their business. It is a form of advanced analytics that examines data or content to answer the question, “what will happen in the future?” Thus, the primary goal of predictive analytics is to identify and typically answer this question with a high degree of probability. Data scientists can use historical data as a source to extract insights for building predictive models using various regression analyses and machine learning techniques, which can be used in various application domains for a better outcome. Companies, for example, can use predictive analytics to minimize costs by better anticipating future demand and changing output and inventory, banks and other financial institutions to reduce fraud and risks by predicting suspicious activity, medical specialists to make effective decisions through predicting patients who are at risk of diseases, retailers to increase sales and customer satisfaction through understanding and predicting customer preferences, manufacturers to optimize production capacity through predicting maintenance requirements, and many more. Thus predictive analytics can be considered as the core analytical method within the area of data science.
  • Prescriptive analytics: Prescriptive analytics focuses on recommending the best way forward with actionable information to maximize overall returns and profitability, which typically answer the question, “what action should be taken?” In business analytics, prescriptive analytics is considered the final step. For its models, prescriptive analytics collects data from several descriptive and predictive sources and applies it to the decision-making process. Thus, we can say that it is related to both descriptive analytics and predictive analytics, but it emphasizes actionable insights instead of data monitoring. In other words, it can be considered as the opposite of descriptive analytics, which examines decisions and outcomes after the fact. By integrating big data, machine learning, and business rules, prescriptive analytics helps organizations to make more informed decisions to produce results that drive the most successful business decisions.

In summary, to clarify what happened and why it happened, both descriptive analytics and diagnostic analytics look at the past. Historical data is used by predictive analytics and prescriptive analytics to forecast what will happen in the future and what steps should be taken to impact those effects. In Table ​ Table1, 1 , we have summarized these analytics methods with examples. Forward-thinking organizations in the real world can jointly use these analytical methods to make smart decisions that help drive changes in business processes and improvements. In the following, we discuss how machine learning techniques can play a big role in these analytical methods through their learning capabilities from the data.

Various types of analytical methods with examples

Analytical methodsData-driven model buildingExamples
Descriptive analyticsAnswer the question, “what happened in the past”?Summarising past events, e.g., sales, business data, social media usage, reporting general trends, etc.
Diagnostic analyticsAnswer the question, “why did it happen?”Identify anomalies and determine casual relationships, to find out business loss, identifying the influence of medications, etc.
Predictive analyticsAnswer the question, “what will happen in the future?”Predicting customer preferences, recommending products, identifying possible security breaches, predicting staff and resource needs, etc.
Prescriptive analyticsAnswer the question, “what action should be taken?” Improving business management, maintenance, improving patient care and healthcare administration, determining optimal marketing strategies, etc.

Machine Learning Based Analytical Modeling

In this section, we briefly discuss various advanced analytics methods based on machine learning modeling, which can make the computing process smart through intelligent decision-making in a business process. Figure ​ Figure3 3 shows a general structure of a machine learning-based predictive modeling considering both the training and testing phase. In the following, we discuss a wide range of methods such as regression and classification analysis, association rule analysis, time-series analysis, behavioral analysis, log analysis, and so on within the scope of our study.

An external file that holds a picture, illustration, etc.
Object name is 42979_2021_765_Fig3_HTML.jpg

A general structure of a machine learning based predictive model considering both the training and testing phase

Regression Analysis

In data science, one of the most common statistical approaches used for predictive modeling and data mining tasks is regression techniques [ 38 ]. Regression analysis is a form of supervised machine learning that examines the relationship between a dependent variable (target) and independent variables (predictor) to predict continuous-valued output [ 105 , 117 ]. The following equations Eqs. 1 , 2 , and 3 [ 85 , 105 ] represent the simple, multiple or multivariate, and polynomial regressions respectively, where x represents independent variable and y is the predicted/target output mentioned above:

Regression analysis is typically conducted for one of two purposes: to predict the value of the dependent variable in the case of individuals for whom some knowledge relating to the explanatory variables is available, or to estimate the effect of some explanatory variable on the dependent variable, i.e., finding the relationship of causal influence between the variables. Linear regression cannot be used to fit non-linear data and may cause an underfitting problem. In that case, polynomial regression performs better, however, increases the model complexity. The regularization techniques such as Ridge, Lasso, Elastic-Net, etc. [ 85 , 105 ] can be used to optimize the linear regression model. Besides, support vector regression, decision tree regression, random forest regression techniques [ 85 , 105 ] can be used for building effective regression models depending on the problem type, e.g., non-linear tasks. Financial forecasting or prediction, cost estimation, trend analysis, marketing, time-series estimation, drug response modeling, etc. are some examples where the regression models can be used to solve real-world problems in the domain of data science and analytics.

Classification Analysis

Classification is one of the most widely used and best-known data science processes. This is a form of supervised machine learning approach that also refers to a predictive modeling problem in which a class label is predicted for a given example [ 38 ]. Spam identification, such as ‘spam’ and ‘not spam’ in email service providers, can be an example of a classification problem. There are several forms of classification analysis available in the area such as binary classification—which refers to the prediction of one of two classes; multi-class classification—which involves the prediction of one of more than two classes; multi-label classification—a generalization of multiclass classification in which the problem’s classes are organized hierarchically [ 105 ].

Several popular classification techniques, such as k-nearest neighbors [ 5 ], support vector machines [ 55 ], navies Bayes [ 49 ], adaptive boosting [ 32 ], extreme gradient boosting [ 85 ], logistic regression [ 66 ], decision trees ID3 [ 92 ], C4.5 [ 93 ], and random forests [ 13 ] exist to solve classification problems. The tree-based classification technique, e.g., random forest considering multiple decision trees, performs better than others to solve real-world problems in many cases as due to its capability of producing logic rules [ 103 , 115 ]. Figure ​ Figure4 4 shows an example of a random forest structure considering multiple decision trees. In addition, BehavDT recently proposed by Sarker et al. [ 109 ], and IntrudTree [ 106 ] can be used for building effective classification or prediction models in the relevant tasks within the domain of data science and analytics.

An external file that holds a picture, illustration, etc.
Object name is 42979_2021_765_Fig4_HTML.jpg

An example of a random forest structure considering multiple decision trees

Cluster Analysis

Clustering is a form of unsupervised machine learning technique and is well-known in many data science application areas for statistical data analysis [ 38 ]. Usually, clustering techniques search for the structures inside a dataset and, if the classification is not previously identified, classify homogeneous groups of cases. This means that data points are identical to each other within a cluster, and different from data points in another cluster. Overall, the purpose of cluster analysis is to sort various data points into groups (or clusters) that are homogeneous internally and heterogeneous externally [ 105 ]. To gain insight into how data is distributed in a given dataset or as a preprocessing phase for other algorithms, clustering is often used. Data clustering, for example, assists with customer shopping behavior, sales campaigns, and retention of consumers for retail businesses, anomaly detection, etc.

Many clustering algorithms with the ability to group data have been proposed in machine learning and data science literature [ 98 , 138 , 141 ]. In our earlier paper Sarker et al. [ 105 ], we have summarized this based on several perspectives, such as partitioning methods, density-based methods, hierarchical-based methods, model-based methods, etc. In the literature, the popular K-means [ 75 ], K-Mediods [ 84 ], CLARA [ 54 ] etc. are known as partitioning methods; DBSCAN [ 30 ], OPTICS [ 8 ] etc. are known as density-based methods; single linkage [ 122 ], complete linkage [ 123 ], etc. are known as hierarchical methods. In addition, grid-based clustering methods, such as STING [ 134 ], CLIQUE [ 2 ], etc.; model-based clustering such as neural network learning [ 141 ], GMM [ 94 ], SOM [ 18 , 104 ], etc.; constrained-based methods such as COP K-means [ 131 ], CMWK-Means [ 25 ], etc. are used in the area. Recently, Sarker et al. [ 111 ] proposed a hierarchical clustering method, BOTS [ 111 ] based on bottom-up agglomerative technique for capturing user’s similar behavioral characteristics over time. The key benefit of agglomerative hierarchical clustering is that the tree-structure hierarchy created by agglomerative clustering is more informative than an unstructured set of flat clusters, which can assist in better decision-making in relevant application areas in data science.

Association Rule Analysis

Association rule learning is known as a rule-based machine learning system, an unsupervised learning method is typically used to establish a relationship among variables. This is a descriptive technique often used to analyze large datasets for discovering interesting relationships or patterns. The association learning technique’s main strength is its comprehensiveness, as it produces all associations that meet user-specified constraints including minimum support and confidence value [ 138 ].

Association rules allow a data scientist to identify trends, associations, and co-occurrences between data sets inside large data collections. In a supermarket, for example, associations infer knowledge about the buying behavior of consumers for different items, which helps to change the marketing and sales plan. In healthcare, to better diagnose patients, physicians may use association guidelines. Doctors can assess the conditional likelihood of a given illness by comparing symptom associations in the data from previous cases using association rules and machine learning-based data analysis. Similarly, association rules are useful for consumer behavior analysis and prediction, customer market analysis, bioinformatics, weblog mining, recommendation systems, etc.

Several types of association rules have been proposed in the area, such as frequent pattern based [ 4 , 47 , 73 ], logic-based [ 31 ], tree-based [ 39 ], fuzzy-rules [ 126 ], belief rule [ 148 ] etc. The rule learning techniques such as AIS [ 3 ], Apriori [ 4 ], Apriori-TID and Apriori-Hybrid [ 4 ], FP-Tree [ 39 ], Eclat [ 144 ], RARM [ 24 ] exist to solve the relevant business problems. Apriori [ 4 ] is the most commonly used algorithm for discovering association rules from a given dataset among the association rule learning techniques [ 145 ]. The recent association rule-learning technique ABC-RuleMiner proposed in our earlier paper by Sarker et al. [ 113 ] could give significant results in terms of generating non-redundant rules that can be used for smart decision making according to human preferences, within the area of data science applications.

Time-Series Analysis and Forecasting

A time series is typically a series of data points indexed in time order particularly, by date, or timestamp [ 111 ]. Depending on the frequency, the time-series can be different types such as annually, e.g., annual budget, quarterly, e.g., expenditure, monthly, e.g., air traffic, weekly, e.g., sales quantity, daily, e.g., weather, hourly, e.g., stock price, minute-wise, e.g., inbound calls in a call center, and even second-wise, e.g., web traffic, and so on in relevant domains.

A mathematical method dealing with such time-series data, or the procedure of fitting a time series to a proper model is termed time-series analysis. Many different time series forecasting algorithms and analysis methods can be applied to extract the relevant information. For instance, to do time-series forecasting for future patterns, the autoregressive (AR) model [ 130 ] learns the behavioral trends or patterns of past data. Moving average (MA) [ 40 ] is another simple and common form of smoothing used in time series analysis and forecasting that uses past forecasted errors in a regression-like model to elaborate an averaged trend across the data. The autoregressive moving average (ARMA) [ 12 , 120 ] combines these two approaches, where autoregressive extracts the momentum and pattern of the trend and moving average capture the noise effects. The most popular and frequently used time-series model is the autoregressive integrated moving average (ARIMA) model [ 12 , 120 ]. ARIMA model, a generalization of an ARMA model, is more flexible than other statistical models such as exponential smoothing or simple linear regression. In terms of data, the ARMA model can only be used for stationary time-series data, while the ARIMA model includes the case of non-stationarity as well. Similarly, seasonal autoregressive integrated moving average (SARIMA), autoregressive fractionally integrated moving average (ARFIMA), autoregressive moving average model with exogenous inputs model (ARMAX model) are also used in time-series models [ 120 ].

In addition to the stochastic methods for time-series modeling and forecasting, machine and deep learning-based approach can be used for effective time-series analysis and forecasting. For instance, in our earlier paper, Sarker et al. [ 111 ] present a bottom-up clustering-based time-series analysis to capture the mobile usage behavioral patterns of the users. Figure ​ Figure5 5 shows an example of producing aggregate time segments Seg_i from initial time slices TS_i based on similar behavioral characteristics that are used in our bottom-up clustering approach, where D represents the dominant behavior BH_i of the users, mentioned above [ 111 ]. The authors in [ 118 ], used a long short-term memory (LSTM) model, a kind of recurrent neural network (RNN) deep learning model, in forecasting time-series that outperform traditional approaches such as the ARIMA model. Time-series analysis is commonly used these days in various fields such as financial, manufacturing, business, social media, event data (e.g., clickstreams and system events), IoT and smartphone data, and generally in any applied science and engineering temporal measurement domain. Thus, it covers a wide range of application areas in data science.

An external file that holds a picture, illustration, etc.
Object name is 42979_2021_765_Fig5_HTML.jpg

An example of producing aggregate time segments from initial time slices based on similar behavioral characteristics

Opinion Mining and Sentiment Analysis

Sentiment analysis or opinion mining is the computational study of the opinions, thoughts, emotions, assessments, and attitudes of people towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes [ 71 ]. There are three kinds of sentiments: positive, negative, and neutral, along with more extreme feelings such as angry, happy and sad, or interested or not interested, etc. More refined sentiments to evaluate the feelings of individuals in various situations can also be found according to the problem domain.

Although the task of opinion mining and sentiment analysis is very challenging from a technical point of view, it’s very useful in real-world practice. For instance, a business always aims to obtain an opinion from the public or customers about its products and services to refine the business policy as well as a better business decision. It can thus benefit a business to understand the social opinion of their brand, product, or service. Besides, potential customers want to know what consumers believe they have when they use a service or purchase a product. Document-level, sentence level, aspect level, and concept level, are the possible levels of opinion mining in the area [ 45 ].

Several popular techniques such as lexicon-based including dictionary-based and corpus-based methods, machine learning including supervised and unsupervised learning, deep learning, and hybrid methods are used in sentiment analysis-related tasks [ 70 ]. To systematically define, extract, measure, and analyze affective states and subjective knowledge, it incorporates the use of statistics, natural language processing (NLP), machine learning as well as deep learning methods. Sentiment analysis is widely used in many applications, such as reviews and survey data, web and social media, and healthcare content, ranging from marketing and customer support to clinical practice. Thus sentiment analysis has a big influence in many data science applications, where public sentiment is involved in various real-world issues.

Behavioral Data and Cohort Analysis

Behavioral analytics is a recent trend that typically reveals new insights into e-commerce sites, online gaming, mobile and smartphone applications, IoT user behavior, and many more [ 112 ]. The behavioral analysis aims to understand how and why the consumers or users behave, allowing accurate predictions of how they are likely to behave in the future. For instance, it allows advertisers to make the best offers with the right client segments at the right time. Behavioral analytics, including traffic data such as navigation paths, clicks, social media interactions, purchase decisions, and marketing responsiveness, use the large quantities of raw user event information gathered during sessions in which people use apps, games, or websites. In our earlier papers Sarker et al. [ 101 , 111 , 113 ] we have discussed how to extract users phone usage behavioral patterns utilizing real-life phone log data for various purposes.

In the real-world scenario, behavioral analytics is often used in e-commerce, social media, call centers, billing systems, IoT systems, political campaigns, and other applications, to find opportunities for optimization to achieve particular outcomes. Cohort analysis is a branch of behavioral analytics that involves studying groups of people over time to see how their behavior changes. For instance, it takes data from a given data set (e.g., an e-commerce website, web application, or online game) and separates it into related groups for analysis. Various machine learning techniques such as behavioral data clustering [ 111 ], behavioral decision tree classification [ 109 ], behavioral association rules [ 113 ], etc. can be used in the area according to the goal. Besides, the concept of RecencyMiner, proposed in our earlier paper Sarker et al. [ 108 ] that takes into account recent behavioral patterns could be effective while analyzing behavioral data as it may not be static in the real-world changes over time.

Anomaly Detection or Outlier Analysis

Anomaly detection, also known as Outlier analysis is a data mining step that detects data points, events, and/or findings that deviate from the regularities or normal behavior of a dataset. Anomalies are usually referred to as outliers, abnormalities, novelties, noise, inconsistency, irregularities, and exceptions [ 63 , 114 ]. Techniques of anomaly detection may discover new situations or cases as deviant based on historical data through analyzing the data patterns. For instance, identifying fraud or irregular transactions in finance is an example of anomaly detection.

It is often used in preprocessing tasks for the deletion of anomalous or inconsistency in the real-world data collected from various data sources including user logs, devices, networks, and servers. For anomaly detection, several machine learning techniques can be used, such as k-nearest neighbors, isolation forests, cluster analysis, etc [ 105 ]. The exclusion of anomalous data from the dataset also results in a statistically significant improvement in accuracy during supervised learning [ 101 ]. However, extracting appropriate features, identifying normal behaviors, managing imbalanced data distribution, addressing variations in abnormal behavior or irregularities, the sparse occurrence of abnormal events, environmental variations, etc. could be challenging in the process of anomaly detection. Detection of anomalies can be applicable in a variety of domains such as cybersecurity analytics, intrusion detections, fraud detection, fault detection, health analytics, identifying irregularities, detecting ecosystem disturbances, and many more. This anomaly detection can be considered a significant task for building effective systems with higher accuracy within the area of data science.

Factor Analysis

Factor analysis is a collection of techniques for describing the relationships or correlations between variables in terms of more fundamental entities known as factors [ 23 ]. It’s usually used to organize variables into a small number of clusters based on their common variance, where mathematical or statistical procedures are used. The goals of factor analysis are to determine the number of fundamental influences underlying a set of variables, calculate the degree to which each variable is associated with the factors, and learn more about the existence of the factors by examining which factors contribute to output on which variables. The broad purpose of factor analysis is to summarize data so that relationships and patterns can be easily interpreted and understood [ 143 ].

Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) are the two most popular factor analysis techniques. EFA seeks to discover complex trends by analyzing the dataset and testing predictions, while CFA tries to validate hypotheses and uses path analysis diagrams to represent variables and factors [ 143 ]. Factor analysis is one of the algorithms for unsupervised machine learning that is used for minimizing dimensionality. The most common methods for factor analytics are principal components analysis (PCA), principal axis factoring (PAF), and maximum likelihood (ML) [ 48 ]. Methods of correlation analysis such as Pearson correlation, canonical correlation, etc. may also be useful in the field as they can quantify the statistical relationship between two continuous variables, or association. Factor analysis is commonly used in finance, marketing, advertising, product management, psychology, and operations research, and thus can be considered as another significant analytical method within the area of data science.

Log Analysis

Logs are commonly used in system management as logs are often the only data available that record detailed system runtime activities or behaviors in production [ 44 ]. Log analysis is thus can be considered as the method of analyzing, interpreting, and capable of understanding computer-generated records or messages, also known as logs. This can be device log, server log, system log, network log, event log, audit trail, audit record, etc. The process of creating such records is called data logging.

Logs are generated by a wide variety of programmable technologies, including networking devices, operating systems, software, and more. Phone call logs [ 88 , 110 ], SMS Logs [ 28 ], mobile apps usages logs [ 124 , 149 ], notification logs [ 77 ], game Logs [ 82 ], context logs [ 16 , 149 ], web logs [ 37 ], smartphone life logs [ 95 ], etc. are some examples of log data for smartphone devices. The main characteristics of these log data is that it contains users’ actual behavioral activities with their devices. Similar other log data can be search logs [ 50 , 133 ], application logs [ 26 ], server logs [ 33 ], network logs [ 57 ], event logs [ 83 ], network and security logs [ 142 ] etc.

Several techniques such as classification and tagging, correlation analysis, pattern recognition methods, anomaly detection methods, machine learning modeling, etc. [ 105 ] can be used for effective log analysis. Log analysis can assist in compliance with security policies and industry regulations, as well as provide a better user experience by encouraging the troubleshooting of technical problems and identifying areas where efficiency can be improved. For instance, web servers use log files to record data about website visitors. Windows event log analysis can help an investigator draw a timeline based on the logging information and the discovered artifacts. Overall, advanced analytics methods by taking into account machine learning modeling can play a significant role to extract insightful patterns from these log data, which can be used for building automated and smart applications, and thus can be considered as a key working area in data science.

Neural Networks and Deep Learning Analysis

Deep learning is a form of machine learning that uses artificial neural networks to create a computational architecture that learns from data by combining multiple processing layers, such as the input, hidden, and output layers [ 38 ]. The key benefit of deep learning over conventional machine learning methods is that it performs better in a variety of situations, particularly when learning from large datasets [ 114 , 140 ].

The most common deep learning algorithms are: multi-layer perceptron (MLP) [ 85 ], convolutional neural network (CNN or ConvNet) [ 67 ], long short term memory recurrent neural network (LSTM-RNN) [ 34 ]. Figure ​ Figure6 6 shows a structure of an artificial neural network modeling with multiple processing layers. The Backpropagation technique [ 38 ] is used to adjust the weight values internally while building the model. Convolutional neural networks (CNNs) [ 67 ] improve on the design of traditional artificial neural networks (ANNs), which include convolutional layers, pooling layers, and fully connected layers. It is commonly used in a variety of fields, including natural language processing, speech recognition, image processing, and other autocorrelated data since it takes advantage of the two-dimensional (2D) structure of the input data. AlexNet [ 60 ], Xception [ 21 ], Inception [ 125 ], Visual Geometry Group (VGG) [ 42 ], ResNet [ 43 ], etc., and other advanced deep learning models based on CNN are also used in the field.

An external file that holds a picture, illustration, etc.
Object name is 42979_2021_765_Fig6_HTML.jpg

A structure of an artificial neural network modeling with multiple processing layers

In addition to CNN, recurrent neural network (RNN) architecture is another popular method used in deep learning. Long short-term memory (LSTM) is a popular type of recurrent neural network architecture used broadly in the area of deep learning. Unlike traditional feed-forward neural networks, LSTM has feedback connections. Thus, LSTM networks are well-suited for analyzing and learning sequential data, such as classifying, sorting, and predicting data based on time-series data. Therefore, when the data is in a sequential format, such as time, sentence, etc., LSTM can be used, and it is widely used in the areas of time-series analysis, natural language processing, speech recognition, and so on.

In addition to the most popular deep learning methods mentioned above, several other deep learning approaches [ 104 ] exist in the field for various purposes. The self-organizing map (SOM) [ 58 ], for example, uses unsupervised learning to represent high-dimensional data as a 2D grid map, reducing dimensionality. Another learning technique that is commonly used for dimensionality reduction and feature extraction in unsupervised learning tasks is the autoencoder (AE) [ 10 ]. Restricted Boltzmann machines (RBM) can be used for dimensionality reduction, classification, regression, collaborative filtering, feature learning, and topic modeling, according to [ 46 ]. A deep belief network (DBN) is usually made up of a backpropagation neural network and unsupervised networks like restricted Boltzmann machines (RBMs) or autoencoders (BPNN) [ 136 ]. A generative adversarial network (GAN) [ 35 ] is a deep learning network that can produce data with characteristics that are similar to the input data. Transfer learning is common worldwide presently because it can train deep neural networks with a small amount of data, which is usually the re-use of a pre-trained model on a new problem [ 137 ]. These deep learning methods can perform  well, particularly, when learning from large-scale datasets [ 105 , 140 ]. In our previous article Sarker et al. [ 104 ], we have summarized a brief discussion of various artificial neural networks (ANN) and deep learning (DL) models mentioned above, which can be used in a variety of data science and analytics tasks.

Real-World Application Domains

Almost every industry or organization is impacted by data, and thus “Data Science” including advanced analytics with machine learning modeling can be used in business, marketing, finance, IoT systems, cybersecurity, urban management, health care, government policies, and every possible industries, where data gets generated. In the following, we discuss ten most popular application areas based on data science and analytics.

  • Business or financial data science: In general, business data science can be considered as the study of business or e-commerce data to obtain insights about a business that can typically lead to smart decision-making as well as taking high-quality actions [ 90 ]. Data scientists can develop algorithms or data-driven models predicting customer behavior, identifying patterns and trends based on historical business data, which can help companies to reduce costs, improve service delivery, and generate recommendations for better decision-making. Eventually, business automation, intelligence, and efficiency can be achieved through the data science process discussed earlier, where various advanced analytics methods and machine learning modeling based on the collected data are the keys. Many online retailers, such as Amazon [ 76 ], can improve inventory management, avoid out-of-stock situations, and optimize logistics and warehousing using predictive modeling based on machine learning techniques [ 105 ]. In terms of finance, the historical data is related to financial institutions to make high-stakes business decisions, which is mostly used for risk management, fraud prevention, credit allocation, customer analytics, personalized services, algorithmic trading, etc. Overall, data science methodologies can play a key role in the future generation business or finance industry, particularly in terms of business automation, intelligence, and smart decision-making and systems.
  • Manufacturing or industrial data science: To compete in global production capability, quality, and cost, manufacturing industries have gone through many industrial revolutions [ 14 ]. The latest fourth industrial revolution, also known as Industry 4.0, is the emerging trend of automation and data exchange in manufacturing technology. Thus industrial data science, which is the study of industrial data to obtain insights that can typically lead to optimizing industrial applications, can play a vital role in such revolution. Manufacturing industries generate a large amount of data from various sources such as sensors, devices, networks, systems, and applications [ 6 , 68 ]. The main categories of industrial data include large-scale data devices, life-cycle production data, enterprise operation data, manufacturing value chain sources, and collaboration data from external sources [ 132 ]. The data needs to be processed, analyzed, and secured to help improve the system’s efficiency, safety, and scalability. Data science modeling thus can be used to maximize production, reduce costs and raise profits in manufacturing industries.
  • Medical or health data science: Healthcare is one of the most notable fields where data science is making major improvements. Health data science involves the extrapolation of actionable insights from sets of patient data, typically collected from electronic health records. To help organizations, improve the quality of treatment, lower the cost of care, and improve the patient experience, data can be obtained from several sources, e.g., the electronic health record, billing claims, cost estimates, and patient satisfaction surveys, etc., to analyze. In reality, healthcare analytics using machine learning modeling can minimize medical costs, predict infectious outbreaks, prevent preventable diseases, and generally improve the quality of life [ 81 , 119 ]. Across the global population, the average human lifespan is growing, presenting new challenges to today’s methods of delivery of care. Thus health data science modeling can play a role in analyzing current and historical data to predict trends, improve services, and even better monitor the spread of diseases. Eventually, it may lead to new approaches to improve patient care, clinical expertise, diagnosis, and management.
  • IoT data science: Internet of things (IoT) [ 9 ] is a revolutionary technical field that turns every electronic system into a smarter one and is therefore considered to be the big frontier that can enhance almost all activities in our lives. Machine learning has become a key technology for IoT applications because it uses expertise to identify patterns and generate models that help predict future behavior and events [ 112 ]. One of the IoT’s main fields of application is a smart city, which uses technology to improve city services and citizens’ living experiences. For example, using the relevant data, data science methods can be used for traffic prediction in smart cities, to estimate the total usage of energy of the citizens for a particular period. Deep learning-based models in data science can be built based on a large scale of IoT datasets [ 7 , 104 ]. Overall, data science and analytics approaches can aid modeling in a variety of IoT and smart city services, including smart governance, smart homes, education, connectivity, transportation, business, agriculture, health care, and industry, and many others.
  • Cybersecurity data science: Cybersecurity, or the practice of defending networks, systems, hardware, and data from digital attacks, is one of the most important fields of Industry 4.0 [ 114 , 121 ]. Data science techniques, particularly machine learning, have become a crucial cybersecurity technology that continually learns to identify trends by analyzing data, better detecting malware in encrypted traffic, finding insider threats, predicting where bad neighborhoods are online, keeping people safe while surfing, or protecting information in the cloud by uncovering suspicious user activity [ 114 ]. For instance, machine learning and deep learning-based security modeling can be used to effectively detect various types of cyberattacks or anomalies [ 103 , 106 ]. To generate security policy rules, association rule learning can play a significant role to build rule-based systems [ 102 ]. Deep learning-based security models can perform better when utilizing the large scale of security datasets [ 140 ]. Thus data science modeling can enable professionals in cybersecurity to be more proactive in preventing threats and reacting in real-time to active attacks, through extracting actionable insights from the security datasets.
  • Behavioral data science: Behavioral data is information produced as a result of activities, most commonly commercial behavior, performed on a variety of Internet-connected devices, such as a PC, tablet, or smartphones [ 112 ]. Websites, mobile applications, marketing automation systems, call centers, help desks, and billing systems, etc. are all common sources of behavioral data. Behavioral data is much more than just data, which is not static data [ 108 ]. Advanced analytics of these data including machine learning modeling can facilitate in several areas such as predicting future sales trends and product recommendations in e-commerce and retail; predicting usage trends, load, and user preferences in future releases in online gaming; determining how users use an application to predict future usage and preferences in application development; breaking users down into similar groups to gain a more focused understanding of their behavior in cohort analysis; detecting compromised credentials and insider threats by locating anomalous behavior, or making suggestions, etc. Overall, behavioral data science modeling typically enables to make the right offers to the right consumers at the right time on various common platforms such as e-commerce platforms, online games, web and mobile applications, and IoT. In social context, analyzing the behavioral data of human being using advanced analytics methods and the extracted insights from social data can be used for data-driven intelligent social services, which can be considered as social data science.
  • Mobile data science: Today’s smart mobile phones are considered as “next-generation, multi-functional cell phones that facilitate data processing, as well as enhanced wireless connectivity” [ 146 ]. In our earlier paper [ 112 ], we have shown that users’ interest in “Mobile Phones” is more and more than other platforms like “Desktop Computer”, “Laptop Computer” or “Tablet Computer” in recent years. People use smartphones for a variety of activities, including e-mailing, instant messaging, online shopping, Internet surfing, entertainment, social media such as Facebook, Linkedin, and Twitter, and various IoT services such as smart cities, health, and transportation services, and many others. Intelligent apps are based on the extracted insight from the relevant datasets depending on apps characteristics, such as action-oriented, adaptive in nature, suggestive and decision-oriented, data-driven, context-awareness, and cross-platform operation [ 112 ]. As a result, mobile data science, which involves gathering a large amount of mobile data from various sources and analyzing it using machine learning techniques to discover useful insights or data-driven trends, can play an important role in the development of intelligent smartphone applications.
  • Multimedia data science: Over the last few years, a big data revolution in multimedia management systems has resulted from the rapid and widespread use of multimedia data, such as image, audio, video, and text, as well as the ease of access and availability of multimedia sources. Currently, multimedia sharing websites, such as Yahoo Flickr, iCloud, and YouTube, and social networks such as Facebook, Instagram, and Twitter, are considered as valuable sources of multimedia big data [ 89 ]. People, particularly younger generations, spend a lot of time on the Internet and social networks to connect with others, exchange information, and create multimedia data, thanks to the advent of new technology and the advanced capabilities of smartphones and tablets. Multimedia analytics deals with the problem of effectively and efficiently manipulating, handling, mining, interpreting, and visualizing various forms of data to solve real-world problems. Text analysis, image or video processing, computer vision, audio or speech processing, and database management are among the solutions available for a range of applications including healthcare, education, entertainment, and mobile devices.
  • Smart cities or urban data science: Today, more than half of the world’s population live in urban areas or cities [ 80 ] and considered as drivers or hubs of economic growth, wealth creation, well-being, and social activity [ 96 , 116 ]. In addition to cities, “Urban area” can refer to the surrounding areas such as towns, conurbations, or suburbs. Thus, a large amount of data documenting daily events, perceptions, thoughts, and emotions of citizens or people are recorded, that are loosely categorized into personal data, e.g., household, education, employment, health, immigration, crime, etc., proprietary data, e.g., banking, retail, online platforms data, etc., government data, e.g., citywide crime statistics, or government institutions, etc., Open and public data, e.g., data.gov, ordnance survey, and organic and crowdsourced data, e.g., user-generated web data, social media, Wikipedia, etc. [ 29 ]. The field of urban data science typically focuses on providing more effective solutions from a data-driven perspective, through extracting knowledge and actionable insights from such urban data. Advanced analytics of these data using machine learning techniques [ 105 ] can facilitate the efficient management of urban areas including real-time management, e.g., traffic flow management, evidence-based planning decisions which pertain to the longer-term strategic role of forecasting for urban planning, e.g., crime prevention, public safety, and security, or framing the future, e.g., political decision-making [ 29 ]. Overall, it can contribute to government and public planning, as well as relevant sectors including retail, financial services, mobility, health, policing, and utilities within a data-rich urban environment through data-driven smart decision-making and policies, which lead to smart cities and improve the quality of human life.
  • Smart villages or rural data science: Rural areas or countryside are the opposite of urban areas, that include villages, hamlets, or agricultural areas. The field of rural data science typically focuses on making better decisions and providing more effective solutions that include protecting public safety, providing critical health services, agriculture, and fostering economic development from a data-driven perspective, through extracting knowledge and actionable insights from the collected rural data. Advanced analytics of rural data including machine learning [ 105 ] modeling can facilitate providing new opportunities for them to build insights and capacity to meet current needs and prepare for their futures. For instance, machine learning modeling [ 105 ] can help farmers to enhance their decisions to adopt sustainable agriculture utilizing the increasing amount of data captured by emerging technologies, e.g., the internet of things (IoT), mobile technologies and devices, etc. [ 1 , 51 , 52 ]. Thus, rural data science can play a very important role in the economic and social development of rural areas, through agriculture, business, self-employment, construction, banking, healthcare, governance, or other services, etc. that lead to smarter villages.

Overall, we can conclude that data science modeling can be used to help drive changes and improvements in almost every sector in our real-world life, where the relevant data is available to analyze. To gather the right data and extract useful knowledge or actionable insights from the data for making smart decisions is the key to data science modeling in any application domain. Based on our discussion on the above ten potential real-world application domains by taking into account data-driven smart computing and decision making, we can say that the prospects of data science and the role of data scientists are huge for the future world. The “Data Scientists” typically analyze information from multiple sources to better understand the data and business problems, and develop machine learning-based analytical modeling or algorithms, or data-driven tools, or solutions, focused on advanced analytics, which can make today’s computing process smarter, automated, and intelligent.

Challenges and Research Directions

Our study on data science and analytics, particularly data science modeling in “ Understanding data science modeling ”, advanced analytics methods and smart computing in “ Advanced analytics methods and smart computing ”, and real-world application areas in “ Real-world application domains ” open several research issues in the area of data-driven business solutions and eventual data products. Thus, in this section, we summarize and discuss the challenges faced and the potential research opportunities and future directions to build data-driven products.

  • Understanding the real-world business problems and associated data including nature, e.g., what forms, type, size, labels, etc., is the first challenge in the data science modeling, discussed briefly in “ Understanding data science modeling ”. This is actually to identify, specify, represent and quantify the domain-specific business problems and data according to the requirements. For a data-driven effective business solution, there must be a well-defined workflow before beginning the actual data analysis work. Furthermore, gathering business data is difficult because data sources can be numerous and dynamic. As a result, gathering different forms of real-world data, such as structured, or unstructured, related to a specific business issue with legal access, which varies from application to application, is challenging. Moreover, data annotation, which is typically the process of categorization, tagging, or labeling of raw data, for the purpose of building data-driven models, is another challenging issue. Thus, the primary task is to conduct a more in-depth analysis of data collection and dynamic annotation methods. Therefore, understanding the business problem, as well as integrating and managing the raw data gathered for efficient data analysis, may be one of the most challenging aspects of working in the field of data science and analytics.
  • The next challenge is the extraction of the relevant and accurate information from the collected data mentioned above. The main focus of data scientists is typically to disclose, describe, represent, and capture data-driven intelligence for actionable insights from data. However, the real-world data may contain many ambiguous values, missing values, outliers, and meaningless data [ 101 ]. The advanced analytics methods including machine and deep learning modeling, discussed in “ Advanced analytics methods and smart computing ”, highly impact the quality, and availability of the data. Thus understanding real-world business scenario and associated data, to whether, how, and why they are insufficient, missing, or problematic, then extend or redevelop the existing methods, such as large-scale hypothesis testing, learning inconsistency, and uncertainty, etc. to address the complexities in data and business problems is important. Therefore, developing new techniques to effectively pre-process the diverse data collected from multiple sources, according to their nature and characteristics could be another challenging task.
  • Understanding and selecting the appropriate analytical methods to extract the useful insights for smart decision-making for a particular business problem is the main issue in the area of data science. The emphasis of advanced analytics is more on anticipating the use of data to detect patterns to determine what is likely to occur in the future. Basic analytics offer a description of data in general, while advanced analytics is a step forward in offering a deeper understanding of data and helping to granular data analysis. Thus, understanding the advanced analytics methods, especially machine and deep learning-based modeling is the key. The traditional learning techniques mentioned in “ Advanced analytics methods and smart computing ” may not be directly applicable for the expected outcome in many cases. For instance, in a rule-based system, the traditional association rule learning technique [ 4 ] may  produce redundant rules from the data that makes the decision-making process complex and ineffective [ 113 ]. Thus, a scientific understanding of the learning algorithms, mathematical properties, how the techniques are robust or fragile to input data, is needed to understand. Therefore, a deeper understanding of the strengths and drawbacks of the existing machine and deep learning methods [ 38 , 105 ] to solve a particular business problem is needed, consequently to improve or optimize the learning algorithms according to the data characteristics, or to propose the new algorithm/techniques with higher accuracy becomes a significant challenging issue for the future generation data scientists.
  • The traditional data-driven models or systems typically use a large amount of business data to generate data-driven decisions. In several application fields, however, the new trends are more likely to be interesting and useful for modeling and predicting the future than older ones. For example, smartphone user behavior modeling, IoT services, stock market forecasting, health or transport service, job market analysis, and other related areas where time-series and actual human interests or preferences are involved over time. Thus, rather than considering the traditional data analysis, the concept of RecencyMiner, i.e., recent pattern-based extracted insight or knowledge proposed in our earlier paper Sarker et al. [ 108 ] might be effective. Therefore, to propose the new techniques by taking into account the recent data patterns, and consequently to build a recency-based data-driven model for solving real-world problems, is another significant challenging issue in the area.
  • The most crucial task for a data-driven smart system is to create a framework that supports data science modeling discussed in “ Understanding data science modeling ”. As a result, advanced analytical methods based on machine learning or deep learning techniques can be considered in such a system to make the framework capable of resolving the issues. Besides, incorporating contextual information such as temporal context, spatial context, social context, environmental context, etc. [ 100 ] can be used for building an adaptive, context-aware, and dynamic model or framework, depending on the problem domain. As a result, a well-designed data-driven framework, as well as experimental evaluation, is a very important direction to effectively solve a business problem in a particular domain, as well as a big challenge for the data scientists.
  • In several important application areas such as autonomous cars, criminal justice, health care, recruitment, housing, management of the human resource, public safety, where decisions made by models, or AI agents, have a direct effect on human lives. As a result, there is growing concerned about whether these decisions can be trusted, to be right, reasonable, ethical, personalized, accurate, robust, and secure, particularly in the context of adversarial attacks [ 104 ]. If we can explain the result in a meaningful way, then the model can be better trusted by the end-user. For machine-learned models, new trust properties yield new trade-offs, such as privacy versus accuracy; robustness versus efficiency; fairness versus robustness. Therefore, incorporating trustworthy AI particularly, data-driven or machine learning modeling could be another challenging issue in the area.

In the above, we have summarized and discussed several challenges and the potential research opportunities and directions, within the scope of our study in the area of data science and advanced analytics. The data scientists in academia/industry and the researchers in the relevant area have the opportunity to contribute to each issue identified above and build effective data-driven models or systems, to make smart decisions in the corresponding business domains.

In this paper, we have presented a comprehensive view on data science including various types of advanced analytical methods that can be applied to enhance the intelligence and the capabilities of an application. We have also visualized the current popularity of data science and machine learning-based advanced analytical modeling and also differentiate these from the relevant terms used in the area, to make the position of this paper. A thorough study on the data science modeling with its various processing modules that are needed to extract the actionable insights from the data for a particular business problem and the eventual data product. Thus, according to our goal, we have briefly discussed how different data modules can play a significant role in a data-driven business solution through the data science process. For this, we have also summarized various types of advanced analytical methods and outcomes as well as machine learning modeling that are needed to solve the associated business problems. Thus, this study’s key contribution has been identified as the explanation of different advanced analytical methods and their applicability in various real-world data-driven applications areas including business, healthcare, cybersecurity, urban and rural data science, and so on by taking into account data-driven smart computing and decision making.

Finally, within the scope of our study, we have outlined and discussed the challenges we faced, as well as possible research opportunities and future directions. As a result, the challenges identified provide promising research opportunities in the field that can be explored with effective solutions to improve the data-driven model and systems. Overall, we conclude that our study of advanced analytical solutions based on data science and machine learning methods, leads in a positive direction and can be used as a reference guide for future research and applications in the field of data science and its real-world applications by both academia and industry professionals.

Declarations

The author declares no conflict of interest.

This article is part of the topical collection “Advances in Computational Approaches for Artificial Intelligence, Image Processing, IoT and Cloud Applications” guest edited by Bhanu Prakash K N and M. Shivakumar.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Ten Research Challenge Areas in Data Science

Jeannette Wing

Although data science builds on knowledge from computer science, mathematics, statistics, and other disciplines, data science is a unique field with many mysteries to unlock: challenging scientific questions and pressing questions of societal importance.

Is data science a discipline?

Data science is a field of study: one can get a degree in data science, get a job as a data scientist, and get funded to do data science research.  But is data science a discipline, or will it evolve to be one, distinct from other disciplines?  Here are a few meta-questions about data science as a discipline.

  • What is/are the driving deep question(s) of data science?   Each scientific discipline (usually) has one or more “deep” questions that drive its research agenda: What is the origin of the universe (astrophysics)?  What is the origin of life (biology)?  What is computable (computer science)?  Does data science inherit its deep questions from all its constituency disciplines or does it have its own unique ones?
  • What is the role of the domain in the field of data science?   People (including this author) (Wing, J.M., Janeia, V.P., Kloefkorn, T., & Erickson, L.C. (2018)) have argued that data science is unique in that it is not just about methods, but about the use of those methods in the context of a domain—the domain of the data being collected and analyzed; the domain for which a question to be answered comes from collecting and analyzing the data.  Is the inclusion of a domain inherent in defining the field of data science?  If so, is the way it is included unique to data science?
  • What makes data science data science?   Is there a problem unique to data science that one can convincingly argue would not be addressed or asked by any of its constituent disciplines, e.g., computer science and statistics?

Ten research areas

While answering the above meta-questions is still under lively debate, including within the pages of this  journal, we can ask an easier question, one that also underlies any field of study: What are the research challenge areas that drive the study of data science?  Here is a list of ten.  They are not in any priority order, and some of them are related to each other.  They are phrased as challenge areas, not challenge questions.  They are not necessarily the “top ten” but they are a good ten to start the community discussing what a broad research agenda for data science might look like. 1

  • Scientific understanding of learning, especially deep learning algorithms.    As much as we admire the astonishing successes of deep learning, we still lack a scientific understanding of why deep learning works so well.  We do not understand the mathematical properties of deep learning models.  We do not know how to explain why a deep learning model produces one result and not another.  We do not understand how robust or fragile they are to perturbations to input data distributions.  We do not understand how to verify that deep learning will perform the intended task well on new input data.  Deep learning is an example of where experimentation in a field is far ahead of any kind of theoretical understanding.
  • Causal reasoning.   Machine learning is a powerful tool to find patterns and examine correlations, particularly in large data sets. While the adoption of machine learning has opened many fruitful areas of research in economics, social science, and medicine, these fields require methods that move beyond correlational analyses and can tackle causal questions. A rich and growing area of current study is revisiting causal inference in the presence of large amounts of data.  Economists are already revisiting causal reasoning by devising new methods at the intersection of economics and machine learning that make causal inference estimation more efficient and flexible (Athey, 2016), (Taddy, 2019).  Data scientists are just beginning to explore multiple causal inference, not just to overcome some of the strong assumptions of univariate causal inference, but because most real-world observations are due to multiple factors that interact with each other (Wang & Blei, 2018).
  • Precious data.    Data can be precious for one of three reasons: the dataset is expensive to collect; the dataset contains a rare event (low signal-to-noise ratio );  or the dataset is artisanal—small and task-specific.   A good example of expensive data comes from large, one-of, expensive scientific instruments, e.g., the Large Synoptic Survey Telescope, the Large Hadron Collider, the IceCube Neutrino Detector at the South Pole.  A good example of rare event data is data from sensors on physical infrastructure, such as bridges and tunnels; sensors produce a lot of raw data, but the disastrous event they are used to predict is (thankfully) rare.   Rare data can also be expensive to collect.  A good example of artisanal data is the tens of millions of court judgments that China has released online to the public since 2014 (Liebman, Roberts, Stern, & Wang, 2017) or the 2+ million US government declassified documents collected by Columbia’s  History Lab  (Connelly, Madigan, Jervis, Spirling, & Hicks, 2019).   For each of these different kinds of precious data, we need new data science methods and algorithms, taking into consideration the domain and intended uses of the data.
  • Multiple, heterogeneous data sources.   For some problems, we can collect lots of data from different data sources to improve our models.  For example, to predict the effectiveness of a specific cancer treatment for a human, we might build a model based on 2-D cell lines from mice, more expensive 3-D cell lines from mice, and the costly DNA sequence of the cancer cells extracted from the human. State-of-the-art data science methods cannot as yet handle combining multiple, heterogeneous sources of data to build a single, accurate model.  Since many of these data sources might be precious data, this challenge is related to the third challenge.  Focused research in combining multiple sources of data will provide extraordinary impact.
  • Inferring from noisy and/or incomplete data.   The real world is messy and we often do not have complete information about every data point.  Yet, data scientists want to build models from such data to do prediction and inference.  A great example of a novel formulation of this problem is the planned use of differential privacy for Census 2020 data (Garfinkel, 2019), where noise is deliberately added to a query result, to maintain the privacy of individuals participating in the census. Handling “deliberate” noise is particularly important for researchers working with small geographic areas such as census blocks, since the added noise can make the data uninformative at those levels of aggregation. How then can social scientists, who for decades have been drawing inferences from census data, make inferences on this “noisy” data and how do they combine their past inferences with these new ones? Machine learning’s ability to better separate noise from signal can improve the efficiency and accuracy of those inferences.
  • Trustworthy AI.   We have seen rapid deployment of systems using artificial intelligence (AI) and machine learning in critical domains such as autonomous vehicles, criminal justice, healthcare, hiring, housing, human resource management, law enforcement, and public safety, where decisions taken by AI agents directly impact human lives. Consequently, there is an increasing concern if these decisions can be trusted to be correct, reliable, robust, safe, secure, and fair, especially under adversarial attacks. One approach to building trust is through providing explanations of the outcomes of a machine learned model.  If we can interpret the outcome in a meaningful way, then the end user can better trust the model.  Another approach is through formal methods, where one strives to prove once and for all a model satisfies a certain property.  New trust properties yield new tradeoffs for machine learned models, e.g., privacy versus accuracy; robustness versus efficiency. There are actually multiple audiences for trustworthy models: the model developer, the model user, and the model customer.  Ultimately, for widespread adoption of the technology, it is the public who must trust these automated decision systems.
  • Computing systems for data-intensive applications.    Traditional designs of computing systems have focused on computational speed and power: the more cycles, the faster the application can run.  Today, the primary focus of applications, especially in the sciences (e.g., astronomy, biology, climate science, materials science), is data.  Also, novel special-purpose processors, e.g., GPUs, FPGAs, TPUs, are now commonly found in large data centers. Even with all these data and all this fast and flexible computational power, it can still take weeks to build accurate predictive models; however, applications, whether from science or industry, want  real-time  predictions.  Also, data-hungry and compute-hungry algorithms, e.g., deep learning, are energy hogs (Strubell, Ganesh, & McCallum, 2019).   We should consider not only space and time, but also energy consumption, in our performance metrics.  In short, we need to rethink computer systems design from first principles, with data (not compute) the focus.  New computing systems designs need to consider: heterogeneous processing; efficient layout of massive amounts of data for fast access; the target domain, application, or even task; and energy efficiency.
  • Automating front-end stages of the data life cycle.   While the excitement in data science is due largely to the successes of machine learning, and more specifically deep learning, before we get to use machine learning methods, we need to prepare the data for analysis.  The early stages in the data life cycle (Wing, 2019) are still labor intensive and tedious.  Data scientists, drawing on both computational and statistical methods, need to devise automated methods that address data cleaning and data wrangling, without losing other desired properties, e.g., accuracy, precision, and robustness, of the end model.One example of emerging work in this area is the Data Analysis Baseline Library (Mueller, 2019), which provides a framework to simplify and automate data cleaning, visualization, model building, and model interpretation.  The Snorkel project addresses the tedious task of data labeling (Ratner et al., 2018).
  • Privacy.   Today, the more data we have, the better the model we can build.  One way to get more data is to share data, e.g., multiple parties pool their individual datasets to build collectively a better model than any one party can build.  However, in many cases, due to regulation or privacy concerns, we need to preserve the confidentiality of each party’s dataset.  An example of this scenario is in building a model to predict whether someone has a disease or not. If multiple hospitals could share their patient records, we could build a better predictive model; but due to Health Insurance Portability and Accountability Act (HIPAA) privacy regulations, hospitals cannot share these records. We are only now exploring practical and scalable ways, using cryptographic and statistical methods, for multiple parties to share data and/or share models to preserve the privacy of each party’s dataset.  Industry and government are exploring and exploiting methods and concepts, such as secure multi-party computation, homomorphic encryption, zero-knowledge proofs, and differential privacy, as part of a point solution to a point problem.
  • Ethics.   Data science raises new ethical issues. They can be framed along three axes: (1) the ethics of data: how data are generated, recorded, and shared; (2) the ethics of algorithms: how artificial intelligence, machine learning, and robots interpret data; and (3) the ethics of practices: devising responsible innovation and professional codes to guide this emerging science (Floridi & Taddeo, 2016) and for defining Institutional Review Board (IRB) criteria and processes specific for data (Wing, Janeia, Kloefkorn, & Erickson 2018). Example ethical questions include how to detect and eliminate racial, gender, socio-economic, or other biases in machine learning models.

Closing remarks

As many universities and colleges are creating new data science schools, institutes, centers, etc. (Wing, Janeia, Kloefkorn, & Erickson 2018), it is worth reflecting on data science as a field.  Will data science as an area of research and education evolve into being its own discipline or be a field that cuts across all other disciplines?  One could argue that computer science, mathematics, and statistics share this commonality: they are each their own discipline, but they each can be applied to (almost) every other discipline. What will data science be in 10 or 50 years?

Acknowledgements

I would like to thank Cliff Stein, Gerad Torats-Espinosa, Max Topaz, and Richard Witten for their feedback on earlier renditions of this article.  Many thanks to all Columbia Data Science faculty who have helped me formulate and discuss these ten (and other) challenges during our Fall 2019 retreat.

Athey, S. (2016). “Susan Athey on how economists can use machine learning to improve policy,”  Retrieved from  https://siepr.stanford.edu/news/susan-athey-how-economists-can-use-machine-learning-improve-policy

Berger, J., He, X., Madigan, C., Murphy, S., Yu, B., & Wellner, J. (2019), Statistics at a Crossroad: Who is for the Challenge? NSF workshop report.  Retrieved from  https://hub.ki/groups/statscrossroad

Connelly, M., Madigan, D., Jervis, R., Spirling, A., & Hicks, R. (2019). The History Lab.  Retrieved from   http://history-lab.org/

Floridi , L. &  Taddeo , M. (2016). What is Data Ethics?  Philosophical Transactions of the Royal Society A , vol. 374, issue 2083, December 2016.

Garfinkel, S. (2019). Deploying Differential Privacy for the 2020 Census of Population and Housing. Privacy Enhancing Technologies Symposium, Stockholm, Sweden.  Retrieved from  http://simson.net/ref/2019/2019-07-16%20Deploying%20Differential%20Privacy%20for%20the%202020%20Census.pdf

Liebman, B.L., Roberts, M., Stern, R.E., & Wang, A. (2017).  Mass Digitization of Chinese Court Decisions: How to Use Text as Data in the Field of Chinese Law. UC  San Diego School of Global Policy and Strategy, 21 st  Century China Center Research Paper No. 2017-01; Columbia Public Law Research Paper No. 14-551. Retrieved from  https://scholarship.law.columbia.edu/faculty_scholarship/2039

Mueller, A. (2019). Data Analysis Baseline Library. Retrieved from  https://libraries.io/github/amueller/dabl

Ratner, A., Bach, S., Ehrenberg, H., Fries, J., Wu, S, & Ré, C. (2018).  Snorkel: Rapid Training Data Creation with Weak Supervision . Proceedings of the 44 th  International Conference on Very Large Data Bases.

Strubell E., Ganesh, A., & McCallum, A. (2019),”Energy and Policy Considerations for Deep Learning in NLP.  Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL).

Taddy, M. (2019).   Business Data Science: Combining Machine Learning and Economics to Optimize, Automate, and Accelerate Business Decisions , Mc-Graw Hill.

Wang, Y. & Blei, D.M. (2018). The Blessings of Multiple Causes, Retrieved from  https://arxiv.org/abs/1805.06826

Wing, J.M. (2019), The Data Life Cycle,  Harvard Data Science Review , vol. 1, no. 1. 

Wing, J.M., Janeia, V.P., Kloefkorn, T., & Erickson, L.C. (2018). Data Science Leadership Summit, Workshop Report, National Science Foundation.  Retrieved from  https://dl.acm.org/citation.cfm?id=3293458

J.M. Wing, “ Ten Research Challenge Areas in Data Science ,” Voices, Data Science Institute, Columbia University, January 2, 2020.  arXiv:2002.05658 .

Jeannette M. Wing is Avanessians Director of the Data Science Institute and professor of computer science at Columbia University.

Emerging trends and global scope of big data analytics: a scientometric analysis

  • Published: 27 October 2020
  • Volume 55 , pages 1371–1396, ( 2021 )

Cite this article

research topic on data analytics

  • Keshav Singh Rawat   ORCID: orcid.org/0000-0002-1497-985X 1 &
  • Sandeep Kumar Sood 1  

1172 Accesses

24 Citations

Explore all metrics

The primary sources of big data nowadays are from cloud computing, social networks, and the internet of things, and henceforth the data analytics has gained popularity these days, with the increasing demand for these technologies. This study presents scientometric analysis to identify overall growth, emerging trends, and global scope of data analytics research during 2010–2019. This study uses a bibliometric database retrieved from the Scopus in CSV files that contain bibliographic information. This study provides a detailed look of bibliometric features of Scopus indexed documents and analyses bibliometric networks to identify the hidden information from the downloaded dataset. This study focuses on the research publication growth, subject categories, geographical distribution, citation, and productivity parameters of bibliometric data. Furthermore, it identifies significant major contributors, highly cited publications, prominent journals, influential institutes, and research collaborations. This study also reveals research frontiers,and hotspots of data analytics research by analyzing keyword co-occurrence using VOSviewer. The outcomes of this study present the applications, emerging trends, and global research landscape over the last decade that help to understand fundamental research and the directions of future research in this field.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price includes VAT (Russian Federation)

Instant access to the full article PDF.

Rent this article via DeepDyve

Institutional subscriptions

research topic on data analytics

Similar content being viewed by others

research topic on data analytics

A decade of big data literature: analysis of trends in light of bibliometrics

research topic on data analytics

The evolution of data science and big data research: A bibliometric analysis

A bibliometric approach to tracking big data research trends.

Ahmed, E., Yaqoob, I., Hashem, I., Khan, I., Ahmed, A., Imran, M., Vasilakos, A.: The role of big data analytics in internet of things. Comput. Netw. 129 (2), 459–471 (2017). https://doi.org/10.1016/j.comnet.2017.06.013

Article   Google Scholar  

Al-Fuqaha, A., guizani, m, Mohammadi, M., Aledhari, M., Ayyash, M.: Internet of things: a survey on enabling technologies, protocols, and applications. IEEE Commun. Surv. Tutor. 17 (4), 2347–2376 (2015). https://doi.org/10.1109/COMST.2015.2444095

Aldowah, H., Al-Samarraie, H., Fauzy, W.M.: Educational data mining and learning analytics for 21st century higher education: A review and synthesis. Telematics Inform. 37 , 13–49 (2019). https://doi.org/10.1016/j.tele.2019.01.007

Alexandrov, A., Bergmann, R., Ewen, S., Freytag, J.C., Hueske, F., Heise, A., Kao, O., Leich, M., Leser, U., Markl, V., Naumann, F., Peters, M., Rheinlander, A., Sax, M.J., Schelter, S., Hoger, M., Tzoumas, K., Warneke, D.: The stratosphere platform for big data analytics. VLDB J. 23 (6), 939–964 (2014). https://doi.org/10.1007/s00778-014-0357-y

Attaran, M., Stark, J., Stotler, D.: Opportunities and challenges for big data analytics in us higher education: A conceptual model for implementation. Ind. Higher Educ. 32 (3), 169–182 (2018). https://doi.org/10.1177/0950422218770937

Baker, R.S., Inventado, P.S.: Educational Data Mining and Learning Analytics, pp. 61–75. Springer New York (2014). https://doi.org/10.1007/978-1-4614-3305-7_4

Banerjee, A., Chakraborty, C., Kumar, A., Biswas, D.: Emerging trends in iot and big data analytics for biomedical and health care technologies. In: Handbook of Data Science Approaches for Biomedical Engineering, chap. 5, pp. 121 – 152. Academic Press (2020). https://doi.org/10.1016/B978-0-12-818318-2.00005-2

Batrinca, B., Treleaven, P.C.: Social media analytics: a survey of techniques, tools and platforms. AI Soc. 30 , 89–116 (2015). https://doi.org/10.1007/s00146-014-0549-4

Biljecki, F.: A scientometric analysis of selected giscience journals. Int. J. Geogr. Inf. Sci. 30 (7), 1302–1335 (2016). https://doi.org/10.1080/13658816.2015.1130831

Botta, A., de Donato, W., Persico, V., Pescape, A.: Integration of cloud computing and internet of things: a survey. Future Gener. Comput. Syst. 56 , 684–700 (2016). https://doi.org/10.1016/j.future.2015.09.021

Boyd, D., Crawford, K.: Critical questions for big data. Inf. Commun. Soc. 15 (5), 662–679 (2012). https://doi.org/10.1080/1369118X.2012.678878

Chadegani, A.A., Salehi, H., Yunus, M.M., Farhadi, H., Fooladi, M., Farhadi, M., Ebrahim, N.A.: A comparison between two main academic literature collections: Web of science and scopus databases. Asian Soc. Sci. 9 , 18–26 (2013). https://doi.org/10.5539/ass.v9n5p18

Chen, H., Chiang, R.H.L., Storey, V.C.: Business intelligence and analytics: From big data to big impact. MIS Q. 36 (4), 1165–1188 (2012). https://doi.org/10.2307/41703503

Cobo, M., Lopez-Herrera, A.G., Herrera-Viedma, E., Herrera, F.: Scimat: A new science mapping analysis software tool. J. Am. Soc. Inform. Sci. Technol. 63 , 1609–1630 (2012). https://doi.org/10.1002/asi.22688

Dash, S., Shakyawar, S., Sharma, M., Kaushik, S.: Big data in healthcare: management, analysis and future prospects. J. Big Data. (2019). https://doi.org/10.1186/s40537-019-0217-0

Erevelles, S., Fukawa, N., Swayne, L.: Big data consumer analytics and the transformation of marketing. J. Bus. Res. 69 (2), 897–904 (2016). https://doi.org/10.1016/j.jbusres.2015.07.001

Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery in databases. AI Magaz. 17 (3), 37–54 (1996). https://doi.org/10.1609/aimag.v17i3.1230

Galetsi, P., Katsaliaki, K.: A review of the literature on big data analytics in healthcare. J. Oper. Res. Soc. (2019). https://doi.org/10.1080/01605682.2019.1630328

Gandomi, A., Haider, M.: Beyond the hype: Big data concepts, methods, and analytics. Int. J. Inf. Manage. 35 (2), 137–144 (2015). https://doi.org/10.1016/j.ijinfomgt.2014.10.007

Ge, M., Bangui, H., Buhnova, B.: Big data for internet of things: a survey. Future Gener. Comput. Syst. 87 , 601–614 (2018). https://doi.org/10.1016/j.future.2018.04.053

Ge, Z., Song, Z., Ding, S.X., Huang, B.: Data mining and analytics in the process industry: The role of machine learning. IEEE Access 5 , 20590–20616 (2017). https://doi.org/10.1109/ACCESS.2017.2756872

Ghani, N., Hamid, S., Hashem, I., Ahmed, E.: Social media big data analytics: a survey. Comput. Human Behav. (2019). https://doi.org/10.1016/j.chb.2018.08.039

Hariri, R.H., Fredericks, E.M., Bowers, K.M.: Uncertainty in big data analytics: survey, opportunities, and challenges. J. Big Data (2019). https://doi.org/10.1186/s40537-019-0206-3

Hashem, I.A.T., Chang, V., Anuar, N.B., Adewole, K., Yaqoob, I., Gani, A., Ahmed, E., Chiroma, H.: The role of big data in smart city. Int. J. Inf. Manage. 36 (5), 748–758 (2016). https://doi.org/10.1016/j.ijinfomgt.2016.05.002

Hassan, M.D., Castanha, R.C.G., Wolfram, D.: Scientometric analysis of global trypanosomiasis research: 1988–2017. J. Infect. Public Health 13 (4), 514–520 (2020). https://doi.org/10.1016/j.jiph.2019.10.006

Hassan, S.U., Haddawy, P., Zhu, J.: A bibliometric study of the world’s research activity in sustainable development and its sub-areas using scientific literature. Scientometrics 99 , 549–579 (2014). https://doi.org/10.1007/s11192-013-1193-3

Heilig, L., Voss, S.: A scientometric analysis of cloud computing literature. IEEE Trans. Cloud Comput. 2 (3), 266–278 (2014). https://doi.org/10.1109/TCC.2014.2321168

Herodotou, H., Babu, S.: Pro ling, what-if analysis, and cost-based optimization of mapreduce programs. Proc. VLDB Endow. 4 (11), 1111–1122 (2011). https://doi.org/10.14778/3402707.3402746

Hirsch, J.E.: An index to quantify an individual’s scientific research output. Proc. Nat. Acad. Sci. USA 102 (46), 16569–16572 (2005). https://doi.org/10.1073/pnas.0507655102

Hu, H., Wen, Y., Chua, T.S., Li, X.: Toward scalable systems for big data analytics: A technology tutorial. IEEE Access 2 , 652–687 (2014). https://doi.org/10.1109/ACCESS.2014.2332453

Huang, L., Chen, K., Zhou, M.: Climate change and carbon sink: a bibliometric analysis. Environ. Sci. Pollut. Res. 27 , 8740–8758 (2020). https://doi.org/10.1007/s11356-019-07489-6

Imran, A., Zoha, A., Abu-Dayya, A.: Challenges in 5g: how to empower son with big data for enabling 5g. IEEE Netw. 28 (6), 27–33 (2014). https://doi.org/10.1109/MNET.2014.6963801

Jacomy, M., Venturini, T., Heymann, S., Bastian, M.: Forceatlas2, a continuous graph layout algorithm for handy network visualization designed for the gephi software. PLoS ONE 9 (6), 1–12 (2014). https://doi.org/10.1371/journal.pone.0098679

Jalali, S.M.J., Mahdizadeh, E., Mahmoudi, M., Moro, S.: Analytical assessment process of e-learning domain research between 1980 and 2014. Int. J. Manage. Educ. 12 , 43 (2018). https://doi.org/10.1504/IJMIE.2018.10008710

Jalali, S.M.J., Park, H.W.: State of the art in business analytics: themes and collaborations. Qual. Quant. 52 , 627–633 (2018). https://doi.org/10.1007/s11135-017-0522-7

Jalali, S.M.J., Park, H.W., Raeesi Vanani, I., Kim Hung, P.: Research trends on big data domain using text mining algorithms. Digital Scholarship in the Humanities (2020). https://doi.org/10.1093/llc/fqaa012

Ji, Q., Pang, X., Zhao, X.: A bibliometric analysis of research on antarctica during 1993–2012. Scientometrics 101 , 1925–1939 (2014). https://doi.org/10.1007/s11192-014-1332-5

Kambatla, K., Kollias, G., Kumar, V., Grama, A.: Trends in big data analytics. J. Parallel Distrib. Comput. 74 (7), 2561–2573 (2014). https://doi.org/10.1016/j.jpdc.2014.01.003

Kaur, A., Sood, S.K.: Ten years of disaster management and use of ict: a scientometric analysis. Earth Sci. Inf. 13 , 1–27 (2020). https://doi.org/10.1007/s12145-019-00408-w

Khan, M., Saqib, S., Alyas, T., Rehman, A., Saeed, Y., Zeb, A., Zareei, M., Mohamed, E.: Effective demand forecasting model using business intelligence empowered with machine learning. IEEE Access 8 , 116013–116023 (2020). https://doi.org/10.1109/ACCESS.2020.3003790

Kitchin, R.: The real-time city? Big data and smart urbanism. GeoJournal 79 , 1–14 (2014a). https://doi.org/10.1007/s10708-013-9516-8

Kitchin, R.: Big data, new epistemologies and paradigm shifts. Big Data Soc. (2014b). https://doi.org/10.1177/2053951714528481

Klievink, B., Romijn, B.J., Cunningham, S., Bruijn, H.: Big data in the public sector: uncertainties and readiness. Inf/ Syst. Front. 19 , 267–283 (2017). https://doi.org/10.1007/s10796-016-9686-2

Kwon, O., Lee, N., Shin, B.: Data quality management, data usage experience and acquisition intention of big data analytics. Int. J. Inf. Manage. 34 (3), 387–394 (2014). https://doi.org/10.1016/j.ijinfomgt.2014.02.002

Liang, T.P., Liu, Y.H.: Research landscape of business intelligence and big data analytics: a bibliometrics study. Exp. Syst. Appl. 111 , 2–10 (2018). https://doi.org/10.1016/j.eswa.2018.05.018 . Big Data Analytics for Business Intelligence

Liu, J., Tian, J., Kong, X., Lee, I., Xia, F.: Two decades of information systems: a bibliometric review. Scientometrics 118 , 617–643 (2019). https://doi.org/10.1007/s11192-018-2974-5

Mehta, N., Pandit, A.: Concurrence of big data analytics and healthcare: a systematic review. Int. J. Med. Inf. 114 , 57–65 (2018). https://doi.org/10.1016/j.ijmedinf.2018.03.013

Najafabadi, M.M., Villanustre, F., Seliya, T.M.K.N., Wald, R., Muharemagic, E.: Deep learning applications and challenges in big data analytics. J Big Data (2015). https://doi.org/10.1186/s40537-014-0007-7

Nguyen, A., Gardner, L., Sheridan, D.: Data analytics in higher education: an integrated view. J. Inf. Syst. Educ. 31 , 61–71 (2020). https://aisel.aisnet.org/jise/vol31/iss1/5

Nuaimi, E.A., Neyadi, H.A., Mohamed, N., Al-Jaroodi, J.: Applications of big data to smart cities. J. Internet Serv. Appl. (2015). https://doi.org/10.1186/s13174-015-0041-5

Ozturk, G.B.: Interoperability in building information modeling for aeco/fm industry. Autom. Constr. 113 , 103122 (2020). https://doi.org/10.1016/j.autcon.2020.103122

Prinsloo, P., Slade, S.: Big Data, Higher Education and Learning Analytics: Beyond Justice, Towards an Ethics of Care, pp. 109–124. Springer International Publishing, Cham (2017). https://doi.org/10.1007/978-3-319-06520-5_8

Rathore, M.M., Ahmad, A., Paul, A., Rho, S.: Urban planning and building smart cities based on the internet of things using big data analytics. Comput. Netw. 101 , 63–80 (2016). https://doi.org/10.1016/j.comnet.2015.12.023

Ravi, D., Wong, C., Deligianni, F., Berthelot, M., Andreu-Perez, J., Lo, B., Yang, G.Z.: Deep learning for health informatics. IEEE J. Biomed. Health Inf. 21 (1), 4–21 (2017). https://doi.org/10.1109/JBHI.2016.2636665

Sahil, Sood, S.K.: Bibliometric monitoring of research performance in ict-based disaster management literature. Qual. Quant. (2020). https://doi.org/10.1007/s11135-020-00991-x

Santhakumar, R., Kaliyaperumal, K.: A scientometric analysis of mobile technology publications. Scientometrics 105 , 921–939 (2015). https://doi.org/10.1007/s11192-015-1710-7

Shayaa, S., Jaafar, N.I., Bahri, S., Sulaiman, A., Phoong, S.W., Chung, Y., Piprani, A., al garadi, M.: Sentiment analysis of big data: Methods, applications, and open challenges. IEEE Access 6 , 37807–37827 (2018). https://doi.org/10.1109/ACCESS.2018.2851311

Sidiropoulos, N.D., Lathauwer, L.D., Fu, X., Huang, K., Papalexakis, E.E., Faloutsos, C.: Tensor decomposition for signal processing and machine learning. IEEE Trans. Signal Process. 65 (13), 3551–3582 (2017). https://doi.org/10.1109/TSP.2017.2690524

Siemens, G.: Learning analytics: the emergence of a discipline. Am. Behav. Sci. 57 (10), 1380–1400 (2013). https://doi.org/10.1177/0002764213498851

Sivarajah, U., Kamal, M.M., Irani, Z., Weerakkody, V.: Critical analysis of big data challenges and analytical methods. J. Bus. Res. 70 , 263–286 (2017). https://doi.org/10.1016/j.jbusres.2016.08.001

Soleimani-Roozbahani, F., Ghatari, A.R., Radfar, R.: Knowledge discovery from a more than a decade studies on healthcare big data systems: a scientometrics study. J. Big Data (2019). https://doi.org/10.1186/s40537-018-0167-y

Sun, Y., Song, H., Jara, A.J., Bie, R.: Internet of things and big data analytics for smart and connected communities. IEEE Access 4 , 766–773 (2016). https://doi.org/10.1109/ACCESS.2016.2529723

Tsai, C.W., Lai, C.F., Chao, H.C., Vasilakos, A.V.: Big data analytics: a survey. J. Big Data 2 (1), 1–32 (2015). https://doi.org/10.1186/s40537-015-0030-3

van Eck, N.J., Waltman, L.: Software survey: Vosviewer, a computer program for bibliometric mapping. Scientometrics 84 , 523–538 (2010). https://doi.org/10.1007/s11192-009-0146-3

van Eck, N.J., Waltman, L.: Visualizing Bibliometric Networks, pp. 285–320. Springer International Publishing, Cham (2014). https://doi.org/10.1007/978-3-319-10377-8_13

Vanani, I.R., Jalali, S.M.J.: A comparative analysis of emerging scientific themes in business analytics. Int. J. Bus. Inf. Syst. 29 (2), 183–206 (2018). https://doi.org/10.1504/IJBIS.2018.10009115

Waheed, H., Hassan, S.U., Aljohani, N.R., Wasif, M.: A bibliometric perspective of learning analytics research landscape. Behav. Inf. Technol. 37 (10–11), 941–957 (2018). https://doi.org/10.1080/0144929X.2018.1467967

Wamba, S.F., Gunasekaran, A., Akter, S., fan Ren, S.J., Dubey, R., Childe, S.J.: Big data analytics and rm performance: E ects of dynamic capabilities. J. Bus. Res. 70 , 356–365 (2017). https://doi.org/10.1016/j.jbusres.2016.08.009

Wang, S., Wan, J., Li, D., Zhang, C.: Implementing smart factory of industrie 4.0: An outlook. Int. J. Distrib. Sens. Netw. 12 (1), 3159805 (2016). https://doi.org/10.1155/2016/3159805

Wang, W., Lu, C.: Visualization analysis of big data research based on citespace. Soft. Comput. 24 , 8173–8186 (2020). https://doi.org/10.1007/s00500-019-04384-7

Wu, Q., Ding, G., Xu, Y., Feng, S., Du, Z., Wang, J., Long, K.: Cognitive internet of things: A new paradigm beyond connection. IEEE Internet Things J. 1 (2), 129–143 (2014). https://doi.org/10.1109/JIOT.2014.2311513

Wu, Y., Duan, Z.: Social network analysis of international scientific collaboration on psychiatry research. Int. J. Ment. Health Syst. 9 , 2 (2015). https://doi.org/10.1186/1752-4458-9-2

Xiang, Z., Schwartz, Z., Gerdes, J.H., Uysal, M.: What can big data and text analytics tell us about hotel guest experience and satisfaction? Int. J. Hospitality Manage. 44 , 120–130 (2015). https://doi.org/10.1016/j.ijhm.2014.10.013

Xu, Z., Yu, D.: A bibliometrics analysis on big data research (2009–2018). J. Data Inf. Manage. 1 , 3–15 (2019). https://doi.org/10.1007/s42488-019-00001-2

Zeng, L., Li, Z., Wu, T., Yang, L.: Mapping knowledge domain research in big data: From 2006 to 2016. In: Data Mining and Big Data, pp. 234–246. Springer International Publishing, Cham (2017). https://doi.org/10.1007/978-3-319-61845-6_24

Zhang, Y., Qiu, M., Tsai, C., Hassan, M.M., Alamri, A.: Health-cps: Healthcare cyber-physical system assisted by cloud and big data. IEEE Syst. J. 11 (1), 88–95 (2017). https://doi.org/10.1109/JSYST.2015.2460747

Zheng, Y., Capra, L., Wolfson, O., Yang, H.: Urban computing: Concepts, methodologies, and applications. ACM Trans. Intell. Syst. Technol. (2014). https://doi.org/10.1145/2629592

Zhong, R.Y., Xu, X., Klotz, E., Newman, S.T. (2017) Intelligent manufacturing in the context of industry 4.0: a review. Engineering 3 (5):616–630 (2017). https://doi.org/10.1016/J.ENG.2017.05.015

Download references

Author information

Authors and affiliations.

Department of Computer Science and Informatics, Central University of Himachal Pradesh, Dharamshala, India

Keshav Singh Rawat & Sandeep Kumar Sood

You can also search for this author in PubMed   Google Scholar

Corresponding author

Correspondence to Keshav Singh Rawat .

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Rawat, K.S., Sood, S.K. Emerging trends and global scope of big data analytics: a scientometric analysis. Qual Quant 55 , 1371–1396 (2021). https://doi.org/10.1007/s11135-020-01061-y

Download citation

Accepted : 14 October 2020

Published : 27 October 2020

Issue Date : August 2021

DOI : https://doi.org/10.1007/s11135-020-01061-y

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Data analytics
  • Scientometric
  • Citation analysis
  • Emerging trends
  • Find a journal
  • Publish with us
  • Track your research

University Libraries

Advanced data analytics.

  • Advanced Data Analytics Course Libguide
  • Data sources
  • Learning resources
  • Analytic tools
  • Research and data services at UNT
  • Research on the topic of data analytics

Finding research on data analytics

Data analytics is a very new area of work and not really a discipline of its own. This makes finding scholarly work  about  data analytics kind of tricky.

Much of the work is done by computer scientists, so a good bit of the literature can be found in the these databases:

  • IEEE Xplore EL (IEEE Xplore) Full text access to IEEE Journals, Transactions, Magazines, Letters, Conference Proceedings, Standards (current only), and IEE Journals and Conferences. Dates of Coverage: 1988 to current, with some journals and conference titles available from 1950.
  • ACM digital library ACM Digital Library bibliographic citations, abstracts, reviews, and full text for articles published in ACM proceedings and periodicals, covering computer science and engineering, as well as more general computing topics. Dates of Coverage: 1954 - Present

Otherwise, you'll want to search a database related to the domain in which you want to apply data analytics: see one of the UNT Libraries' subject guides to find relevant database.

Or you can explore the literature across fields, such as by using the Books & More (for ebooks) and Online Article Search  (for journal articles) features, both on the homepage of the UNT Libraries . (On a mobile device, click the magnifying glass icon and then choose Library Catalog  for ebooks or Full Text Online for articles.)

  • << Previous: Research and data services at UNT
  • Last Updated: Mar 18, 2024 12:32 PM
  • URL: https://guides.library.unt.edu/data-analytics

Additional Links

UNT: Apply now UNT: Schedule a tour UNT: Get more info about the University of North Texas

UNT: Disclaimer | UNT: AA/EOE/ADA | UNT: Privacy | UNT: Electronic Accessibility | UNT: Required Links | UNT: UNT Home

  • How It Works
  • PhD thesis writing
  • Master thesis writing
  • Bachelor thesis writing
  • Dissertation writing service
  • Dissertation abstract writing
  • Thesis proposal writing
  • Thesis editing service
  • Thesis proofreading service
  • Thesis formatting service
  • Coursework writing service
  • Research paper writing service
  • Architecture thesis writing
  • Computer science thesis writing
  • Engineering thesis writing
  • History thesis writing
  • MBA thesis writing
  • Nursing dissertation writing
  • Psychology dissertation writing
  • Sociology thesis writing
  • Statistics dissertation writing
  • Buy dissertation online
  • Write my dissertation
  • Cheap thesis
  • Cheap dissertation
  • Custom dissertation
  • Dissertation help
  • Pay for thesis
  • Pay for dissertation
  • Senior thesis
  • Write my thesis

214 Best Big Data Research Topics for Your Thesis Paper

big data research topics

Finding an ideal big data research topic can take you a long time. Big data, IoT, and robotics have evolved. The future generations will be immersed in major technologies that will make work easier. Work that was done by 10 people will now be done by one person or a machine. This is amazing because, in as much as there will be job loss, more jobs will be created. It is a win-win for everyone.

Big data is a major topic that is being embraced globally. Data science and analytics are helping institutions, governments, and the private sector. We will share with you the best big data research topics.

On top of that, we can offer you the best writing tips to ensure you prosper well in your academics. As students in the university, you need to do proper research to get top grades. Hence, you can consult us if in need of research paper writing services.

Big Data Analytics Research Topics for your Research Project

Are you looking for an ideal big data analytics research topic? Once you choose a topic, consult your professor to evaluate whether it is a great topic. This will help you to get good grades.

  • Which are the best tools and software for big data processing?
  • Evaluate the security issues that face big data.
  • An analysis of large-scale data for social networks globally.
  • The influence of big data storage systems.
  • The best platforms for big data computing.
  • The relation between business intelligence and big data analytics.
  • The importance of semantics and visualization of big data.
  • Analysis of big data technologies for businesses.
  • The common methods used for machine learning in big data.
  • The difference between self-turning and symmetrical spectral clustering.
  • The importance of information-based clustering.
  • Evaluate the hierarchical clustering and density-based clustering application.
  • How is data mining used to analyze transaction data?
  • The major importance of dependency modeling.
  • The influence of probabilistic classification in data mining.

Interesting Big Data Analytics Topics

Who said big data had to be boring? Here are some interesting big data analytics topics that you can try. They are based on how some phenomena are done to make the world a better place.

  • Discuss the privacy issues in big data.
  • Evaluate the storage systems of scalable in big data.
  • The best big data processing software and tools.
  • Data mining tools and techniques are popularly used.
  • Evaluate the scalable architectures for parallel data processing.
  • The major natural language processing methods.
  • Which are the best big data tools and deployment platforms?
  • The best algorithms for data visualization.
  • Analyze the anomaly detection in cloud servers
  • The scrutiny normally done for the recruitment of big data job profiles.
  • The malicious user detection in big data collection.
  • Learning long-term dependencies via the Fourier recurrent units.
  • Nomadic computing for big data analytics.
  • The elementary estimators for graphical models.
  • The memory-efficient kernel approximation.

Big Data Latest Research Topics

Do you know the latest research topics at the moment? These 15 topics will help you to dive into interesting research. You may even build on research done by other scholars.

  • Evaluate the data mining process.
  • The influence of the various dimension reduction methods and techniques.
  • The best data classification methods.
  • The simple linear regression modeling methods.
  • Evaluate the logistic regression modeling.
  • What are the commonly used theorems?
  • The influence of cluster analysis methods in big data.
  • The importance of smoothing methods analysis in big data.
  • How is fraud detection done through AI?
  • Analyze the use of GIS and spatial data.
  • How important is artificial intelligence in the modern world?
  • What is agile data science?
  • Analyze the behavioral analytics process.
  • Semantic analytics distribution.
  • How is domain knowledge important in data analysis?

Big Data Debate Topics

If you want to prosper in the field of big data, you need to try even hard topics. These big data debate topics are interesting and will help you to get a better understanding.

  • The difference between big data analytics and traditional data analytics methods.
  • Why do you think the organization should think beyond the Hadoop hype?
  • Does the size of the data matter more than how recent the data is?
  • Is it true that bigger data are not always better?
  • The debate of privacy and personalization in maintaining ethics in big data.
  • The relation between data science and privacy.
  • Do you think data science is a rebranding of statistics?
  • Who delivers better results between data scientists and domain experts?
  • According to your view, is data science dead?
  • Do you think analytics teams need to be centralized or decentralized?
  • The best methods to resource an analytics team.
  • The best business case for investing in analytics.
  • The societal implications of the use of predictive analytics within Education.
  • Is there a need for greater control to prevent experimentation on social media users without their consent?
  • How is the government using big data; for the improvement of public statistics or to control the population?

University Dissertation Topics on Big Data

Are you doing your Masters or Ph.D. and wondering the best dissertation topic or thesis to do? Why not try any of these? They are interesting and based on various phenomena. While doing the research ensure you relate the phenomenon with the current modern society.

  • The machine learning algorithms are used for fall recognition.
  • The divergence and convergence of the internet of things.
  • The reliable data movements using bandwidth provision strategies.
  • How is big data analytics using artificial neural networks in cloud gaming?
  • How is Twitter accounts classification done using network-based features?
  • How is online anomaly detection done in the cloud collaborative environment?
  • Evaluate the public transportation insights provided by big data.
  • Evaluate the paradigm for cancer patients using the nursing EHR to predict the outcome.
  • Discuss the current data lossless compression in the smart grid.
  • How does online advertising traffic prediction helps in boosting businesses?
  • How is the hyperspectral classification done using the multiple kernel learning paradigm?
  • The analysis of large data sets downloaded from websites.
  • How does social media data help advertising companies globally?
  • Which are the systems recognizing and enforcing ownership of data records?
  • The alternate possibilities emerging for edge computing.

The Best Big Data Analysis Research Topics and Essays

There are a lot of issues that are associated with big data. Here are some of the research topics that you can use in your essays. These topics are ideal whether in high school or college.

  • The various errors and uncertainty in making data decisions.
  • The application of big data on tourism.
  • The automation innovation with big data or related technology
  • The business models of big data ecosystems.
  • Privacy awareness in the era of big data and machine learning.
  • The data privacy for big automotive data.
  • How is traffic managed in defined data center networks?
  • Big data analytics for fault detection.
  • The need for machine learning with big data.
  • The innovative big data processing used in health care institutions.
  • The money normalization and extraction from texts.
  • How is text categorization done in AI?
  • The opportunistic development of data-driven interactive applications.
  • The use of data science and big data towards personalized medicine.
  • The programming and optimization of big data applications.

The Latest Big Data Research Topics for your Research Proposal

Doing a research proposal can be hard at first unless you choose an ideal topic. If you are just diving into the big data field, you can use any of these topics to get a deeper understanding.

  • The data-centric network of things.
  • Big data management using artificial intelligence supply chain.
  • The big data analytics for maintenance.
  • The high confidence network predictions for big biological data.
  • The performance optimization techniques and tools for data-intensive computation platforms.
  • The predictive modeling in the legal context.
  • Analysis of large data sets in life sciences.
  • How to understand the mobility and transport modal disparities sing emerging data sources?
  • How do you think data analytics can support asset management decisions?
  • An analysis of travel patterns for cellular network data.
  • The data-driven strategic planning for citywide building retrofitting.
  • How is money normalization done in data analytics?
  • Major techniques used in data mining.
  • The big data adaptation and analytics of cloud computing.
  • The predictive data maintenance for fault diagnosis.

Interesting Research Topics on A/B Testing In Big Data

A/B testing topics are different from the normal big data topics. However, you use an almost similar methodology to find the reasons behind the issues. These topics are interesting and will help you to get a deeper understanding.

  • How is ultra-targeted marketing done?
  • The transition of A/B testing from digital to offline.
  • How can big data and A/B testing be done to win an election?
  • Evaluate the use of A/B testing on big data
  • Evaluate A/B testing as a randomized control experiment.
  • How does A/B testing work?
  • The mistakes to avoid while conducting the A/B testing.
  • The most ideal time to use A/B testing.
  • The best way to interpret results for an A/B test.
  • The major principles of A/B tests.
  • Evaluate the cluster randomization in big data
  • The best way to analyze A/B test results and the statistical significance.
  • How is A/B testing used in boosting businesses?
  • The importance of data analysis in conversion research
  • The importance of A/B testing in data science.

Amazing Research Topics on Big Data and Local Governments

Governments are now using big data to make the lives of the citizens better. This is in the government and the various institutions. They are based on real-life experiences and making the world better.

  • Assess the benefits and barriers of big data in the public sector.
  • The best approach to smart city data ecosystems.
  • The big analytics used for policymaking.
  • Evaluate the smart technology and emergence algorithm bureaucracy.
  • Evaluate the use of citizen scoring in public services.
  • An analysis of the government administrative data globally.
  • The public values are found in the era of big data.
  • Public engagement on local government data use.
  • Data analytics use in policymaking.
  • How are algorithms used in public sector decision-making?
  • The democratic governance in the big data era.
  • The best business model innovation to be used in sustainable organizations.
  • How does the government use the collected data from various sources?
  • The role of big data for smart cities.
  • How does big data play a role in policymaking?

Easy Research Topics on Big Data

Who said big data topics had to be hard? Here are some of the easiest research topics. They are based on data management, research, and data retention. Pick one and try it!

  • Who uses big data analytics?
  • Evaluate structure machine learning.
  • Explain the whole deep learning process.
  • Which are the best ways to manage platforms for enterprise analytics?
  • Which are the new technologies used in data management?
  • What is the importance of data retention?
  • The best way to work with images is when doing research.
  • The best way to promote research outreach is through data management.
  • The best way to source and manage external data.
  • Does machine learning improve the quality of data?
  • Describe the security technologies that can be used in data protection.
  • Evaluate token-based authentication and its importance.
  • How can poor data security lead to the loss of information?
  • How to determine secure data.
  • What is the importance of centralized key management?

Unique IoT and Big Data Research Topics

Internet of Things has evolved and many devices are now using it. There are smart devices, smart cities, smart locks, and much more. Things can now be controlled by the touch of a button.

  • Evaluate the 5G networks and IoT.
  • Analyze the use of Artificial intelligence in the modern world.
  • How do ultra-power IoT technologies work?
  • Evaluate the adaptive systems and models at runtime.
  • How have smart cities and smart environments improved the living space?
  • The importance of the IoT-based supply chains.
  • How does smart agriculture influence water management?
  • The internet applications naming and identifiers.
  • How does the smart grid influence energy management?
  • Which are the best design principles for IoT application development?
  • The best human-device interactions for the Internet of Things.
  • The relation between urban dynamics and crowdsourcing services.
  • The best wireless sensor network for IoT security.
  • The best intrusion detection in IoT.
  • The importance of big data on the Internet of Things.

Big Data Database Research Topics You Should Try

Big data is broad and interesting. These big data database research topics will put you in a better place in your research. You also get to evaluate the roles of various phenomena.

  • The best cloud computing platforms for big data analytics.
  • The parallel programming techniques for big data processing.
  • The importance of big data models and algorithms in research.
  • Evaluate the role of big data analytics for smart healthcare.
  • How is big data analytics used in business intelligence?
  • The best machine learning methods for big data.
  • Evaluate the Hadoop programming in big data analytics.
  • What is privacy-preserving to big data analytics?
  • The best tools for massive big data processing
  • IoT deployment in Governments and Internet service providers.
  • How will IoT be used for future internet architectures?
  • How does big data close the gap between research and implementation?
  • What are the cross-layer attacks in IoT?
  • The influence of big data and smart city planning in society.
  • Why do you think user access control is important?

Big Data Scala Research Topics

Scala is a programming language that is used in data management. It is closely related to other data programming languages. Here are some of the best scala questions that you can research.

  • Which are the most used languages in big data?
  • How is scala used in big data research?
  • Is scala better than Java in big data?
  • How is scala a concise programming language?
  • How does the scala language stream process in real-time?
  • Which are the various libraries for data science and data analysis?
  • How does scala allow imperative programming in data collection?
  • Evaluate how scala includes a useful REPL for interaction.
  • Evaluate scala’s IDE support.
  • The data catalog reference model.
  • Evaluate the basics of data management and its influence on research.
  • Discuss the behavioral analytics process.
  • What can you term as the experience economy?
  • The difference between agile data science and scala language.
  • Explain the graph analytics process.

Independent Research Topics for Big Data

These independent research topics for big data are based on the various technologies and how they are related. Big data will greatly be important for modern society.

  • The biggest investment is in big data analysis.
  • How are multi-cloud and hybrid settings deep roots?
  • Why do you think machine learning will be in focus for a long while?
  • Discuss in-memory computing.
  • What is the difference between edge computing and in-memory computing?
  • The relation between the Internet of things and big data.
  • How will digital transformation make the world a better place?
  • How does data analysis help in social network optimization?
  • How will complex big data be essential for future enterprises?
  • Compare the various big data frameworks.
  • The best way to gather and monitor traffic information using the CCTV images
  • Evaluate the hierarchical structure of groups and clusters in the decision tree.
  • Which are the 3D mapping techniques for live streaming data.
  • How does machine learning help to improve data analysis?
  • Evaluate DataStream management in task allocation.
  • How is big data provisioned through edge computing?
  • The model-based clustering of texts.
  • The best ways to manage big data.
  • The use of machine learning in big data.

Is Your Big Data Thesis Giving You Problems?

These are some of the best topics that you can use to prosper in your studies. Not only are they easy to research but also reflect on real-time issues. Whether in University or college, you need to put enough effort into your studies to prosper. However, if you have time constraints, we can provide professional writing help. Are you looking for online expert writers? Look no further, we will provide quality work at a cheap price.

genetics research topics

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Comment * Error message

Name * Error message

Email * Error message

Save my name, email, and website in this browser for the next time I comment.

As Putin continues killing civilians, bombing kindergartens, and threatening WWIII, Ukraine fights for the world's peaceful future.

Ukraine Live Updates

Shapiro Library

Data Analytics

Data analytics research page.

The following Shapiro Library databases offer articles and reports from scholarly journals, trade publications, magazines, and other sources. Use keyword searching techniques to find information based on your research interest.

This resource contains full-text articles and reports from journals and magazines.

  • Engineering Source - EBSCO This link opens in a new window Engineering Source offers more than 3000 full text titles in engineering, computer science and related areas.

This resource contains newspaper articles.

Suggested e-Books: Data Analytics

research topic on data analytics

Suggested e-books: Data Analytics - Research Methods

research topic on data analytics

  • << Previous: Identifying Data Needs
  • Next: Sources for Data Sets >>

Cart

  • SUGGESTED TOPICS
  • The Magazine
  • Newsletters
  • Managing Yourself
  • Managing Teams
  • Work-life Balance
  • The Big Idea
  • Data & Visuals
  • Reading Lists
  • Case Selections
  • HBR Learning
  • Topic Feeds
  • Account Settings
  • Email Preferences

The High Cost of Misaligned Business and Analytics Goals

  • Preethika Sainam,
  • Seigyoung Auh,
  • Richard Ettenson,
  • Bulent Menguc

research topic on data analytics

Findings from research on more than 300 companies undergoing data and analytics transformations.

How and where do companies’ investments in new and improved data and analytic capabilities contribute to tangible business benefits like profitability and growth? Should they invest in talent? Technology? Culture? According to new research, the degree of alignment between business goals and analytics capabilities is among the most important factors. While companies that are early in their analytics journey will see value creation even with significant internal misalignment, at higher levels of data maturity aligned companies find that analytics capabilities create significantly more value across growth, financial, and customer KPIs.

Business leaders are feeling acute pressure to ramp up their company’s data and analytics capabilities — and fast — or risk falling behind more data-savvy competitors. If only the path to success were that straightforward! In our previous research, we found that capitalizing on data and analytics requires creating a data culture, obtaining senior leadership commitment, acquiring data and analytics skills and competencies, as well as empowering employees. And each of these dimensions is necessary just to start the analytics journey.

research topic on data analytics

  • PS Preethika Sainam is an Assistant Professor of Global Marketing at Thunderbird School of Global Management, Arizona State University.
  • SA Seigyoung Auh is Professor of Global Marketing at Thunderbird School of Global Management, Arizona State University, and Research Faculty at the Center for Services Leadership at the WP Carey School of Business, Arizona State University.
  • RE Richard Ettenson is Professor and Keickhefer Fellow in Global Marketing and Brand Strategy, The Thunderbird School of Global Management, Arizona State University .
  • BM Bulent Menguc is a Professor of Marketing at the Leeds University Business School in the U.K.

Partner Center

  • News & Media
  • Chemical Biology
  • Computational Biology
  • Ecosystem Science
  • Cancer Biology
  • Exposure Science & Pathogen Biology
  • Metabolic Inflammatory Diseases
  • Advanced Metabolomics
  • Mass Spectrometry-Based Measurement Technologies
  • Spatial and Single-Cell Proteomics
  • Structural Biology
  • Biofuels & Bioproducts
  • Human Microbiome
  • Soil Microbiome
  • Synthetic Biology
  • Computational Chemistry
  • Chemical Separations
  • Chemical Physics
  • Atmospheric Aerosols
  • Human-Earth System Interactions
  • Modeling Earth Systems
  • Coastal Science
  • Plant Science
  • Subsurface Science
  • Terrestrial Aquatics
  • Materials in Extreme Environments
  • Precision Materials by Design
  • Science of Interfaces
  • Friction Stir Welding & Processing
  • Dark Matter
  • Flavor Physics
  • Fusion Energy Science
  • Neutrino Physics
  • Quantum Information Sciences
  • Emergency Response
  • AGM Program
  • Tools and Capabilities
  • Grid Architecture
  • Grid Cybersecurity
  • Grid Energy Storage
  • Earth System Modeling
  • Energy System Modeling
  • Transmission
  • Distribution
  • Appliance and Equipment Standards
  • Building Energy Codes
  • Advanced Building Controls
  • Advanced Lighting
  • Building-Grid Integration
  • Building and Grid Modeling
  • Commercial Buildings
  • Federal Performance Optimization
  • Resilience and Security
  • Grid Resilience and Decarbonization
  • Building America Solution Center
  • Energy Efficient Technology Integration
  • Home Energy Score
  • Electrochemical Energy Storage
  • Flexible Loads and Generation
  • Grid Integration, Controls, and Architecture
  • Regulation, Policy, and Valuation
  • Science Supporting Energy Storage
  • Chemical Energy Storage
  • Waste Processing
  • Radiation Measurement
  • Environmental Remediation
  • Subsurface Energy Systems
  • Carbon Capture
  • Carbon Storage
  • Carbon Utilization
  • Advanced Hydrocarbon Conversion
  • Fuel Cycle Research
  • Advanced Reactors
  • Reactor Operations
  • Reactor Licensing
  • Solar Energy
  • Wind Resource Characterization
  • Wildlife and Wind
  • Community Values and Ocean Co-Use
  • Wind Systems Integration
  • Wind Data Management
  • Distributed Wind
  • Energy Equity & Health
  • Environmental Monitoring for Marine Energy
  • Marine Biofouling and Corrosion
  • Marine Energy Resource Characterization
  • Testing for Marine Energy
  • The Blue Economy
  • Environmental Performance of Hydropower
  • Hydropower Cybersecurity and Digitalization
  • Hydropower and the Electric Grid
  • Materials Science for Hydropower
  • Pumped Storage Hydropower
  • Water + Hydropower Planning
  • Grid Integration of Renewable Energy
  • Geothermal Energy
  • Algal Biofuels
  • Aviation Biofuels
  • Waste-to-Energy and Products
  • Hydrogen & Fuel Cells
  • Emission Control
  • Energy-Efficient Mobility Systems
  • Lightweight Materials
  • Vehicle Electrification
  • Vehicle Grid Integration
  • Contraband Detection
  • Pathogen Science & Detection
  • Explosives Detection
  • Threat-Agnostic Biodefense
  • Discovery and Insight
  • Proactive Defense
  • Trusted Systems
  • Nuclear Material Science
  • Radiological & Nuclear Detection
  • Nuclear Forensics
  • Ultra-Sensitive Nuclear Measurements
  • Nuclear Explosion Monitoring
  • Global Nuclear & Radiological Security
  • Disaster Recovery
  • Global Collaborations
  • Legislative and Regulatory Analysis
  • Technical Training
  • Additive Manufacturing
  • Deployed Technologies
  • Rapid Prototyping
  • Systems Engineering
  • 5G Security
  • RF Signal Detection & Exploitation
  • Climate Security
  • Internet of Things
  • Maritime Security
  • Artificial Intelligence
  • Graph and Data Analytics
  • Software Engineering
  • Computational Mathematics & Statistics
  • High-Performance Computing
  • Adaptive Autonomous Systems
  • Visual Analytics
  • Lab Objectives
  • Publications & Reports
  • Featured Research
  • Diversity, Equity, Inclusion & Accessibility
  • Lab Leadership
  • Lab Fellows
  • Staff Accomplishments
  • Undergraduate Students
  • Graduate Students
  • Post-graduate Students
  • University Faculty
  • University Partnerships
  • K-12 Educators and Students
  • STEM Workforce Development
  • STEM Outreach
  • Meet the Team
  • Internships
  • Regional Impact
  • Philanthropy
  • Volunteering
  • Available Technologies
  • Industry Partnerships
  • Licensing & Technology Transfer
  • Entrepreneurial Leave
  • Atmospheric Radiation Measurement User Facility
  • Electricity Infrastructure Operations Center
  • Energy Sciences Center
  • Environmental Molecular Sciences Laboratory
  • Grid Storage Launchpad
  • Institute for Integrated Catalysis
  • Interdiction Technology and Integration Laboratory
  • PNNL Portland Research Center
  • PNNL Seattle Research Center
  • PNNL-Sequim (Marine and Coastal Research)
  • Radiochemical Processing Laboratory
  • Shallow Underground Laboratory

New Software Tool Generates Data Analysis for Complex ToF-SIMS

Software creates reports automatically to explain chemical differences among samples 

A blue blurred stripes depict information being sent through circuits before being turned into arrows depicting organized and information understandable by a non-computing researcher

A new and flexible software tool based on the Python computing language was developed by Pacific Northwest National Laboratory researchers where artificial intelligence–machine learning functions are incorporated to help users chemically understand complex spectra generated from Time-of-flight secondary ion mass spectrometry.

(Image: Freepik )

The Science  

Time-of-flight secondary ion mass spectrometry (ToF-SIMS) is a specialized mass spectrometry technique. It can be used to identify the composition of solid surfaces, providing elemental, isotopic, and molecular information with part per million (ppm) sensitivity. ToF-SIMS spectra, however, may contain numerous individual ion signals, which can be hard to distinguish without dedicated statistical tools. Statistical techniques, such as PCA, have been widely used to visualize differences of ToF-SIMS spectra and have shown great success. But chemically understanding of these differences, especially for less experienced users, has long been challenging. To solve this issue, a new and flexible software tool based on Python was developed where artificial intelligence (AI)–machine learning functions are incorporated to help users chemically understand complex spectra generated from ToF-SIMS. In this work, AI–machine leaning functions were incorporated to a traditional PCA analysis tool, showing great progress in resolving this issue. 

The Impact 

Compared to other software packages available, the advantage of the new Python-based package is that a detailed Microsoft Word format data analysis report can be automatically generated, greatly facilitating less experienced users in understanding chemical differences among samples. This saves users a lot of time in data report processing. The new package is free, open-source software that comes with a detailed manual, making it user-friendly and straightforward to understand. The new package is flexible, powerful, and extensible. For example, most exported parameters can be customized. Additionally, some basic AI functions have been integrated into the software package, and many other AI/machine learning functions in the Python system are available for exploring more powerful capabilities. In principle, the software package can be used to treat many types of spectra data, such as Fourier-transform ion cyclotron resonance mass spectrometry (FTICR-MS), matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS), infrared spectroscopy (IR), and X-ray photoelectron spectroscopy (XPS). 

So far, several software tools have been developed for PCA of ToF-SIMS spectra; however, none of them are freely available. Such a situation leads to some difficulties in extending applications of PCA to various research fields. More importantly, it has long been challenging for common researchers to understand PCA plots and extract chemical differences among samples. In this work, a team of researchers from Pacific Northwest National Laboratory and the Environmental Molecular Sciences Laboratory, a Department of Energy Office of Science user facility, developed a new and flexible software tool called “Advanced Spectra PCA Toolbox.” The software is based on Python for PCA of complex ToF-SIMS spectra and features an easy-to-read manual. It can generate data analysis reports automatically to explain chemical differences among samples, helping less experienced researchers easily understand tricky PCA results. Moreover, it is expandable and compatible with AI/machine learning functions. Pure goethite and different lignin-adsorbed goethite samples were used as a model system to demonstrate the software tool, showing that it can be readily used in complex spectra data processing. Our software tool is open-source, flexible, and expandable. The research team expects this open-source tool will benefit not only the ToF-SIMS community, but also other spectrometry techniques (such as FTICR-MS, MALDI-MS, IR, and XPS). 

PNNL Contact

Zihua Zhu, Pacific Northwest National Laboratory and Environmental Molecular Sciences Laboratory,  [email protected]  

This research was supported by the Laboratory Directed Research and Development program in the Earth and Biological Sciences Directorate at Pacific Northwest National Laboratory and was performed on a project award from the Environmental Molecular Sciences Laboratory, a Department of Energy Office of Science user facility sponsored by the Biological and Environmental Research program. 

Published: June 20, 2024

Y. Zhou, et al. “Novel principal component analysis tool based on Python for analysis of complex spectra of time-of-flight secondary ion mass spectrometry.” Journal of Vacuum Science and Technology A (2024), Vol. 42, Issue 2, 023204. https://doi.org/10.1116/6.0003355

Research topics

Lab-level communications priority topics.

Center for Big Data Analytics

Research topics.

  • Alternating Minimization
  • Bioinformatics
  • Bregman Divergence
  • Co-Clustering
  • Compressed Sensing
  • Computer Vision
  • Coordinate Descent
  • Covariance Estimation
  • Crowd Computing
  • Data Clustering
  • Deep Learning
  • Divide-and-Conquer Methods
  • Eigenvalue Decomposition
  • Empirical Risk Minimization
  • Gene-Disease Prediction
  • Graph Clustering
  • Graphical Models
  • Greedy Method
  • High-Dimensional Statistics
  • Information-Theoretic Analysis
  • Kernel Methods
  • Learning Theory
  • Link Prediction
  • Matrix Approximation
  • Matrix Completion
  • Metric Learning
  • Multi-label Learning
  • Newton Methods
  • Nonnegative Matrix Factorization
  • Numerical Linear Algebra
  • Online Learning
  • Recommender Systems
  • Robust Learning
  • Signed Networks
  • Social Network Analysis
  • Spectral Clustering
  • Stochastic Gradient Methods
  • Support Vector Machines
  • Tight Frames
  • Topic Models

Information

  • Author Services

Initiatives

You are accessing a machine-readable page. In order to be human-readable, please install an RSS reader.

All articles published by MDPI are made immediately available worldwide under an open access license. No special permission is required to reuse all or part of the article published by MDPI, including figures and tables. For articles published under an open access Creative Common CC BY license, any part of the article may be reused without permission provided that the original article is clearly cited. For more information, please refer to https://www.mdpi.com/openaccess .

Feature papers represent the most advanced research with significant potential for high impact in the field. A Feature Paper should be a substantial original Article that involves several techniques or approaches, provides an outlook for future research directions and describes possible research applications.

Feature papers are submitted upon individual invitation or recommendation by the scientific editors and must receive positive feedback from the reviewers.

Editor’s Choice articles are based on recommendations by the scientific editors of MDPI journals from around the world. Editors select a small number of articles recently published in the journal that they believe will be particularly interesting to readers, or important in the respective research area. The aim is to provide a snapshot of some of the most exciting work published in the various research areas of the journal.

Original Submission Date Received: .

  • Active Journals
  • Find a Journal
  • Proceedings Series
  • For Authors
  • For Reviewers
  • For Editors
  • For Librarians
  • For Publishers
  • For Societies
  • For Conference Organizers
  • Open Access Policy
  • Institutional Open Access Program
  • Special Issues Guidelines
  • Editorial Process
  • Research and Publication Ethics
  • Article Processing Charges
  • Testimonials
  • Preprints.org
  • SciProfiles
  • Encyclopedia

remotesensing-logo

Article Menu

research topic on data analytics

  • Subscribe SciFeed
  • Recommended Articles
  • Google Scholar
  • on Google Scholar
  • Table of Contents

Find support for a specific problem in the support section of our website.

Please let us know what you think of our products and services.

Visit our dedicated information section to learn more about MDPI.

JSmol Viewer

A systematic literature review and bibliometric analysis of semantic segmentation models in land cover mapping.

research topic on data analytics

1. Introduction

2. materials and methods, 2.1. research questions (rqs).

  • RQ1. What are the emerging patterns in land cover mapping?
  • RQ2. What are the domain studies of semantic segmentation models in land cover mapping?
  • RQ3. What are the data used in semantic segmentation models for land cover mapping?
  • RQ4. What are the architecture and performances of semantic segmentation methodologies used in land cover mapping?

2.2. Search Strategy

2.3. study selection criteria, 2.4. eligibility and data analysis, 2.5. data synthesis, 3. results and discussion, 3.1. rq1. what are the emerging patterns in land cover mapping.

  • Annual distribution of research studies
  • Leading Journals
  • Geographic distribution of studies
  • Leading Themes and Timelines

3.2. RQ2. What Are Domain Studies of Semantic Segmentation Models in Land Cover Mapping?

  • Land Cover Studies
  • Precision Agriculture
  • Environment
  • Coastal Areas

3.3. RQ3. What Are the Data Used in Semantic Segmentation Models for Land Cover Mapping?

  • Study Locations
  • Data Sources
  • Benchmark datasets

3.4. RQ4. What Are the Architecture and Performances of Semantic Segmentation Methodologies Used in Land Cover Mapping?

  • Encoder-Decoder based structure
  • Transformer-based structure
  • Hybrid-based structure
  • Performance analysis of semantic segmentation model structures on ISPRS 2-D labelling Potsdam and Vaihingen datasets
  • Common experimental training settings

4. Challenges, Future Insights and Directions

4.1. land cover mapping.

  • Extracting boundary information
  • Generating Precise Land Cover Maps

4.2. Semantic Segmentation Methodologies

  • Enhancing deep learning model performance
  • Analysis of RS images
  • Unlabeled and Imbalance RS data

5. Conclusions

Author contributions, data availability statement, acknowledgments, conflicts of interest, abbreviations.

BANetBilateral Awareness Network
CNNConvolutional Neural Networks
DCNN Deep Convolutional Neural Network
DEANETDual Encoder with Attention Network
DGFNETDual-Gate Fusion Network
DL Deep Learning
DSMDigital Surface Model
FCNFully Convolutional Networks
GF-2GaoFen-2
GF-3GaoFen-3
GIDGaoFen Image Data
HFENetHierarchical Feature Extraction Network
HMRTHybrid Multi-resolution and Transformer semantic extraction Network
IEEEInstitute of Electrical and Electronics Engineers
IoUMean Intersection over Union
ISPRSInternational Society for Photogrammetry and Remote Sensing
LC Land Cover
LiDARLight Detection and Ranging data
LoveDALand-cOVEr Domain Adaptive
LULCLand Use and Land Cover
MAREMulti-Attention REsu-Net
MDPIMultidisciplinary Digital Publishing Institute
MIoUMean Intersection over Union
NLPNatural Language Processing
OAOverall Accuracy
PolSARPolarimetric Synthetic Aperture Radar
RAANETResidual ASPP with Attention Net
RQResearch Question
RS Remote Sensing
RSIRemote Sensing Imaginary
SARSynthetic Aperture Radar
SBANetSemantic Boundary Awareness Network
SEG-ESRGANSegmentation Enhanced Super-Resolution Generative Adversarial Network
SOCNNSuperpixel-Optimized convolutional neural network
SOTA State-Of-The-Art
UASUnmanned Aircraft System
UAVUnmanned Aerial Vehicle
VEDAIVEhicle Detection in Aerial Imagery
WHDLDWuhan Dense Labeling Dataset
  • Vali, A.; Comai, S.; Matteucci, M. Deep Learning for Land Use and Land Cover Classification Based on Hyperspectral and Multispectral Earth Observation Data: A Review. Remote Sens. 2020 , 12 , 2495. [ Google Scholar ] [ CrossRef ]
  • Ma, J.; Wu, L.; Tang, X.; Liu, F.; Zhang, X.; Jiao, L. Building Extraction of Aerial Images by a Global and Multi-Scale Encoder-Decoder Network. Remote Sens. 2020 , 12 , 2350. [ Google Scholar ] [ CrossRef ]
  • Pourmohammadi, P.; Adjeroh, D.A.; Strager, M.P.; Farid, Y.Z. Predicting Developed Land Expansion Using Deep Convolutional Neural Networks. Environ. Model. Softw. 2020 , 134 , 104751. [ Google Scholar ] [ CrossRef ]
  • Di Pilato, A.; Taggio, N.; Pompili, A.; Iacobellis, M.; Di Florio, A.; Passarelli, D.; Samarelli, S. Deep Learning Approaches to Earth Observation Change Detection. Remote Sens. 2021 , 13 , 4083. [ Google Scholar ] [ CrossRef ]
  • Wei, P.; Chai, D.; Lin, T.; Tang, C.; Du, M.; Huang, J. Large-Scale Rice Mapping under Different Years Based on Time-Series Sentinel-1 Images Using Deep Semantic Segmentation Model. ISPRS J. Photogramm. Remote Sens. 2021 , 174 , 198–214. [ Google Scholar ] [ CrossRef ]
  • Dal Molin Jr., R.; Rizzoli, P. Potential of Convolutional Neural Networks for Forest Mapping Using Sentinel-1 Interferometric Short Time Series. Remote Sens. 2022 , 14 , 1381. [ Google Scholar ] [ CrossRef ]
  • Sun, Y.; Zhang, X.; Huang, J.; Wang, H.; Xin, Q. Fine-Grained Building Change Detection from Very High-Spatial-Resolution Remote Sensing Images Based on Deep Multitask Learning. IEEE Geosci. Remote Sens. Lett. 2022 , 19 , 8000605. [ Google Scholar ] [ CrossRef ]
  • Trenčanová, B.; Proença, V.; Bernardino, A. Development of Semantic Maps of Vegetation Cover from UAV Images to Support Planning and Management in Fine-Grained Fire-Prone Landscapes. Remote Sens. 2022 , 14 , 1262. [ Google Scholar ] [ CrossRef ]
  • Zhang, X.; Wang, Z.; Zhang, J.; Wei, A. MSANet: An Improved Semantic Segmentation Method Using Multi-Scale Attention for Remote Sensing Images. Remote Sens. Lett. 2022 , 13 , 1249–1259. [ Google Scholar ] [ CrossRef ]
  • Scepanovic, S.; Antropov, O.; Laurila, P.; Rauste, Y.; Ignatenko, V.; Praks, J. Wide-Area Land Cover Mapping with Sentinel-1 Imagery Using Deep Learning Semantic Segmentation Models. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021 , 14 , 10357–10374. [ Google Scholar ] [ CrossRef ]
  • Guo, Y.; Liu, Y.; Oerlemans, A.; Lao, S.; Wu, S.; Lew, M.S. Deep Learning for Visual Understanding: A Review. Neurocomputing 2016 , 187 , 27–48. [ Google Scholar ] [ CrossRef ]
  • Huang, J.; Weng, L.; Chen, B.; Xia, M. DFFAN: Dual Function Feature Aggregation Network for Semantic Segmentation of Land Cover. ISPRS Int. J. Geoinf. 2021 , 10 , 125. [ Google Scholar ] [ CrossRef ]
  • Chen, S.; Wu, C.; Mukherjee, M.; Zheng, Y. Ha-Mppnet: Height Aware-Multi Path Parallel Network for High Spatial Resolution Remote Sensing Image Semantic Seg-Mentation. ISPRS Int. J. Geoinf. 2021 , 10 , 672. [ Google Scholar ] [ CrossRef ]
  • Hao, S.; Zhou, Y.; Guo, Y. A Brief Survey on Semantic Segmentation with Deep Learning. Neurocomputing 2020 , 406 , 302–321. [ Google Scholar ] [ CrossRef ]
  • Chen, G.; Zhang, X.; Wang, Q.; Dai, F.; Gong, Y.; Zhu, K. Symmetrical Dense-Shortcut Deep Fully Convolutional Networks for Semantic Segmentation of Very-High-Resolution Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018 , 11 , 1633–1644. [ Google Scholar ] [ CrossRef ]
  • Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Munich, Germany, 5–9 October 2015; Volume 9351. [ Google Scholar ]
  • Chen, B.; Xia, M.; Huang, J. Mfanet: A Multi-Level Feature Aggregation Network for Semantic Segmentation of Land Cover. Remote Sens. 2021 , 13 , 731. [ Google Scholar ] [ CrossRef ]
  • Weng, L.; Pang, K.; Xia, M.; Lin, H.; Qian, M.; Zhu, C. Sgformer: A Local and Global Features Coupling Network for Semantic Segmentation of Land Cover. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023 , 16 , 6812–6824. [ Google Scholar ] [ CrossRef ]
  • Wang, L.; Li, R.; Zhang, C.; Fang, S.; Duan, C.; Meng, X.; Atkinson, P.M. UNetFormer: A UNet-like Transformer for Efficient Semantic Segmentation of Remote Sensing Urban Scene Imagery. ISPRS J. Photogramm. Remote Sens. 2022 , 190 , 196–214. [ Google Scholar ] [ CrossRef ]
  • Xiao, D.; Kang, Z.; Fu, Y.; Li, Z.; Ran, M. Csswin-Unet: A Swin-Unet Network for Semantic Segmentation of Remote Sensing Images by Aggregating Contextual Information and Extracting Spatial Information. Int. J. Remote Sens. 2023 , 44 , 7598–7625. [ Google Scholar ] [ CrossRef ]
  • Garcia-Garcia, A.; Orts-Escolano, S.; Oprea, S.; Villena-Martinez, V.; Martinez-Gonzalez, P.; Garcia-Rodriguez, J. A Survey on Deep Learning Techniques for Image and Video Semantic Segmentation. Appl. Soft Comput. J. 2018 , 70 , 41–65. [ Google Scholar ] [ CrossRef ]
  • Lateef, F.; Ruichek, Y. Survey on Semantic Segmentation Using Deep Learning Techniques. Neurocomputing 2019 , 338 , 321–348. [ Google Scholar ] [ CrossRef ]
  • Yuan, X.; Shi, J.; Gu, L. A Review of Deep Learning Methods for Semantic Segmentation of Remote Sensing Imagery. Expert. Syst. Appl. 2021 , 169 , 114417. [ Google Scholar ] [ CrossRef ]
  • Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews. BMJ 2021 , 372 , n71. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Manley, K.; Nyelele, C.; Egoh, B.N. A Review of Machine Learning and Big Data Applications in Addressing Ecosystem Service Research Gaps. Ecosyst. Serv. 2022 , 57 , 101478. [ Google Scholar ] [ CrossRef ]
  • Tian, T.; Chu, Z.; Hu, Q.; Ma, L. Class-Wise Fully Convolutional Network for Semantic Segmentation of Remote Sensing Images. Remote Sens. 2021 , 13 , 3211. [ Google Scholar ] [ CrossRef ]
  • Wan, L.; Tian, Y.; Kang, W.; Ma, L. D-TNet: Category-Awareness Based Difference-Threshold Alternative Learning Network for Remote Sensing Image Change Detection. IEEE Trans. Geosci. Remote Sens. 2022 , 60 , 5633316. [ Google Scholar ] [ CrossRef ]
  • Picon, A.; Bereciartua-Perez, A.; Eguskiza, I.; Romero-Rodriguez, J.; Jimenez-Ruiz, C.J.; Eggers, T.; Klukas, C.; Navarra-Mestre, R. Deep Convolutional Neural Network for Damaged Vegetation Segmentation from RGB Images Based on Virtual NIR-Channel Estimation. Artif. Intell. Agric. 2022 , 6 , 199–210. [ Google Scholar ] [ CrossRef ]
  • Zhang, Z.; Huang, X.; Li, J. DWin-HRFormer: A High-Resolution Transformer Model With Directional Windows for Semantic Segmentation of Urban Construction Land. IEEE Trans. Geosci. Remote Sens. 2023 , 61 , 5400714. [ Google Scholar ] [ CrossRef ]
  • Wang, L.; Li, R.; Wang, D.; Duan, C.; Wang, T.; Meng, X. Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images. Remote Sens. 2021 , 13 , 3065. [ Google Scholar ] [ CrossRef ]
  • Akcay, O.; Kinaci, A.C.; Avsar, E.O.; Aydar, U. Semantic Segmentation of High-Resolution Airborne Images with Dual-Stream DeepLabV3+. ISPRS Int. J. Geoinf. 2022 , 11 , 23. [ Google Scholar ] [ CrossRef ]
  • Sun, Z.; Zhou, W.; Ding, C.; Xia, M. Multi-Resolution Transformer Network for Building and Road Segmentation of Remote Sensing Image. ISPRS Int. J. Geoinf. 2022 , 11 , 165. [ Google Scholar ] [ CrossRef ]
  • Chen, T.-H.K.; Qiu, C.; Schmitt, M.; Zhu, X.X.; Sabel, C.E.; Prishchepov, A.V. Mapping Horizontal and Vertical Urban Densification in Denmark with Landsat Time-Series from 1985 to 2018: A Semantic Segmentation Solution. Remote Sens. Environ. 2020 , 251 , 112096. [ Google Scholar ] [ CrossRef ]
  • Wu, F.; Wang, C.; Zhang, H.; Li, J.; Li, L.; Chen, W.; Zhang, B. Built-up Area Mapping in China from GF-3 SAR Imagery Based on the Framework of Deep Learning. Remote Sens. Environ. 2021 , 262 , 112515. [ Google Scholar ] [ CrossRef ]
  • Xu, S.; Zhang, S.; Zeng, J.; Li, T.; Guo, Q.; Jin, S. A Framework for Land Use Scenes Classification Based on Landscape Photos. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020 , 13 , 6124–6141. [ Google Scholar ] [ CrossRef ]
  • Xu, L.; Shi, S.; Liu, Y.; Zhang, H.; Wang, D.; Zhang, L.; Liang, W.; Chen, H. A Large-Scale Remote Sensing Scene Dataset Construction for Semantic Segmentation. Int. J. Image Data Fusion 2023 , 14 , 299–323. [ Google Scholar ] [ CrossRef ]
  • Sirous, A.; Satari, M.; Shahraki, M.M.; Pashayi, M. A Conditional Generative Adversarial Network for Urban Area Classification Using Multi-Source Data. Earth Sci. Inf. 2023 , 16 , 2529–2543. [ Google Scholar ] [ CrossRef ]
  • Vasavi, S.; Sri Somagani, H.; Sai, Y. Classification of Buildings from VHR Satellite Images Using Ensemble of U-Net and ResNet. Egypt. J. Remote Sens. Space Sci. 2023 , 26 , 937–953. [ Google Scholar ] [ CrossRef ]
  • Kang, J.; Fernandez-Beltran, R.; Sun, X.; Ni, J.; Plaza, A. Deep Learning-Based Building Footprint Extraction with Missing Annotations. IEEE Geosci. Remote Sens. Lett. 2022 , 19 , 3002805. [ Google Scholar ] [ CrossRef ]
  • Yu, J.; Zeng, P.; Yu, Y.; Yu, H.; Huang, L.; Zhou, D. A Combined Convolutional Neural Network for Urban Land-Use Classification with GIS Data. Remote Sens. 2022 , 14 , 1128. [ Google Scholar ] [ CrossRef ]
  • Wei, P.; Chai, D.; Huang, R.; Peng, D.; Lin, T.; Sha, J.; Sun, W.; Huang, J. Rice Mapping Based on Sentinel-1 Images Using the Coupling of Prior Knowledge and Deep Semantic Segmentation Network: A Case Study in Northeast China from 2019 to 2021. Int. J. Appl. Earth Obs. Geoinf. 2022 , 112 , 102948. [ Google Scholar ] [ CrossRef ]
  • Liu, S.; Peng, D.; Zhang, B.; Chen, Z.; Yu, L.; Chen, J.; Pan, Y.; Zheng, S.; Hu, J.; Lou, Z.; et al. The Accuracy of Winter Wheat Identification at Different Growth Stages Using Remote Sensing. Remote Sens. 2022 , 14 , 893. [ Google Scholar ] [ CrossRef ]
  • Bem, P.P.D.; de Carvalho Júnior, O.A.; Carvalho, O.L.F.D.; Gomes, R.A.T.; Guimarāes, R.F.; Pimentel, C.M.M. Irrigated Rice Crop Identification in Southern Brazil Using Convolutional Neural Networks and Sentinel-1 Time Series. Remote Sens. Appl. 2021 , 24 , 100627. [ Google Scholar ] [ CrossRef ]
  • Niu, B.; Feng, Q.; Su, S.; Yang, Z.; Zhang, S.; Liu, S.; Wang, J.; Yang, J.; Gong, J. Semantic Segmentation for Plastic-Covered Greenhouses and Plastic-Mulched Farmlands from VHR Imagery. Int. J. Digit. Earth 2023 , 16 , 4553–4572. [ Google Scholar ] [ CrossRef ]
  • Sykas, D.; Sdraka, M.; Zografakis, D.; Papoutsis, I. A Sentinel-2 Multiyear, Multicountry Benchmark Dataset for Crop Classification and Segmentation With Deep Learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022 , 15 , 3323–3339. [ Google Scholar ] [ CrossRef ]
  • Descals, A.; Wich, S.; Meijaard, E.; Gaveau, D.L.A.; Peedell, S.; Szantoi, Z. High-Resolution Global Map of Smallholder and Industrial Closed-Canopy Oil Palm Plantations. Earth Syst. Sci. Data 2021 , 13 , 1211–1231. [ Google Scholar ] [ CrossRef ]
  • He, J.; Lyu, D.; He, L.; Zhang, Y.; Xu, X.; Yi, H.; Tian, Q.; Liu, B.; Zhang, X. Combining Object-Oriented and Deep Learning Methods to Estimate Photosynthetic and Non-Photosynthetic Vegetation Cover in the Desert from Unmanned Aerial Vehicle Images with Consideration of Shadows. Remote Sens. 2023 , 15 , 105. [ Google Scholar ] [ CrossRef ]
  • Wan, L.; Li, S.; Chen, Y.; He, Z.; Shi, Y. Application of Deep Learning in Land Use Classification for Soil Erosion Using Remote Sensing. Front. Earth Sci. 2022 , 10 , 849531. [ Google Scholar ] [ CrossRef ]
  • Cho, A.Y.; Park, S.-E.; Kim, D.-J.; Kim, J.; Li, C.; Song, J. Burned Area Mapping Using Unitemporal PlanetScope Imagery With a Deep Learning Based Approach. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023 , 16 , 242–253. [ Google Scholar ] [ CrossRef ]
  • Bergado, J.R.; Persello, C.; Reinke, K.; Stein, A. Predicting Wildfire Burns from Big Geodata Using Deep Learning. Saf. Sci. 2021 , 140 , 105276. [ Google Scholar ] [ CrossRef ]
  • Wang, Z.; Yang, P.; Liang, H.; Zheng, C.; Yin, J.; Tian, Y.; Cui, W. Semantic Segmentation and Analysis on Sensitive Parameters of Forest Fire Smoke Using Smoke-Unet and Landsat-8 Imagery. Remote Sens. 2022 , 14 , 45. [ Google Scholar ] [ CrossRef ]
  • Liu, C.-C.; Zhang, Y.-C.; Chen, P.-Y.; Lai, C.-C.; Chen, Y.-H.; Cheng, J.-H.; Ko, M.-H. Clouds Classification from Sentinel-2 Imagery with Deep Residual Learning and Semantic Image Segmentation. Remote Sens. 2019 , 11 , 119. [ Google Scholar ] [ CrossRef ]
  • Ji, W.; Chen, Y.; Li, K.; Dai, X. Multicascaded Feature Fusion-Based Deep Learning Network for Local Climate Zone Classification Based on the So2Sat LCZ42 Benchmark Dataset. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023 , 16 , 449–467. [ Google Scholar ] [ CrossRef ]
  • Ayhan, B.; Kwan, C. Tree, Shrub, and Grass Classification Using Only RGB Images. Remote Sens. 2020 , 12 , 1333. [ Google Scholar ] [ CrossRef ]
  • Maxwell, A.E.; Bester, M.S.; Guillen, L.A.; Ramezan, C.A.; Carpinello, D.J.; Fan, Y.; Hartley, F.M.; Maynard, S.M.; Pyron, J.L. Semantic Segmentation Deep Learning for Extracting Surface Mine Extents from Historic Topographic Maps. Remote Sens. 2020 , 12 , 4145. [ Google Scholar ] [ CrossRef ]
  • Zhou, G.; Xu, J.; Chen, W.; Li, X.; Li, J.; Wang, L. Deep Feature Enhancement Method for Land Cover With Irregular and Sparse Spatial Distribution Features: A Case Study on Open-Pit Mining. IEEE Trans. Geosci. Remote Sens. 2023 , 61 , 4401220. [ Google Scholar ] [ CrossRef ]
  • Lee, S.-H.; Han, K.-J.; Lee, K.; Lee, K.-J.; Oh, K.-Y.; Lee, M.-J. Classification of Landscape Affected by Deforestation Using High-resolution Remote Sensing Data and Deep-learning Techniques. Remote Sens. 2020 , 12 , 3372. [ Google Scholar ] [ CrossRef ]
  • Yu, T.; Wu, W.; Gong, C.; Li, X. Residual Multi-Attention Classification Network for a Forest Dominated Tropical Landscape Using High-Resolution Remote Sensing Imagery. ISPRS Int. J. Geoinf. 2021 , 10 , 22. [ Google Scholar ] [ CrossRef ]
  • Pashaei, M.; Kamangir, H.; Starek, M.J.; Tissot, P. Review and Evaluation of Deep Learning Architectures for Efficient Land Cover Mapping with UAS Hyper-Spatial Imagery: A Case Study over a Wetland. Remote Sens. 2020 , 12 , 959. [ Google Scholar ] [ CrossRef ]
  • Fang, B.; Chen, G.; Chen, J.; Ouyang, G.; Kou, R.; Wang, L. Cct: Conditional Co-Training for Truly Unsupervised Remote Sensing Image Segmentation in Coastal Areas. Remote Sens. 2021 , 13 , 3521. [ Google Scholar ] [ CrossRef ]
  • Buchsteiner, C.; Baur, P.A.; Glatzel, S. Spatial Analysis of Intra-Annual Reed Ecosystem Dynamics at Lake Neusiedl Using RGB Drone Imagery and Deep Learning. Remote Sens. 2023 , 15 , 3961. [ Google Scholar ] [ CrossRef ]
  • Wang, Z.; Mahmoudian, N. Aerial Fluvial Image Dataset for Deep Semantic Segmentation Neural Networks and Its Benchmarks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023 , 16 , 4755–4766. [ Google Scholar ] [ CrossRef ]
  • Chen, J.; Chen, G.; Wang, L.; Fang, B.; Zhou, P.; Zhu, M. Coastal Land Cover Classification of High-Resolution Remote Sensing Images Using Attention-Driven Context Encoding Network. Sensors 2020 , 20 , 7032. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Li, Y.; Zhou, Y.; Zhang, Y.; Zhong, L.; Wang, J.; Chen, J. DKDFN: Domain Knowledge-Guided Deep Collaborative Fusion Network for Multimodal Unitemporal Remote Sensing Land Cover Classification. ISPRS J. Photogramm. Remote Sens. 2022 , 186 , 170–189. [ Google Scholar ] [ CrossRef ]
  • Tzepkenlis, A.; Marthoglou, K.; Grammalidis, N. Efficient Deep Semantic Segmentation for Land Cover Classification Using Sentinel Imagery. Remote Sens. 2023 , 15 , 2027. [ Google Scholar ] [ CrossRef ]
  • Billson, J.; Islam, M.D.S.; Sun, X.; Cheng, I. Water Body Extraction from Sentinel-2 Imagery with Deep Convolutional Networks and Pixelwise Category Transplantation. Remote Sens. 2023 , 15 , 1253. [ Google Scholar ] [ CrossRef ]
  • Bergamasco, L.; Bovolo, F.; Bruzzone, L. A Dual-Branch Deep Learning Architecture for Multisensor and Multitemporal Remote Sensing Semantic Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023 , 16 , 2147–2162. [ Google Scholar ] [ CrossRef ]
  • Yang, X.; Zhang, B.; Chen, Z.; Bai, Y.; Chen, P. A Multi-Temporal Network for Improving Semantic Segmentation of Large-Scale Landsat Imagery. Remote Sens. 2022 , 14 , 5062. [ Google Scholar ] [ CrossRef ]
  • Yang, X.; Chen, Z.; Zhang, B.; Li, B.; Bai, Y.; Chen, P. A Block Shuffle Network with Superpixel Optimization for Landsat Image Semantic Segmentation. Remote Sens. 2022 , 14 , 1432. [ Google Scholar ] [ CrossRef ]
  • Boonpook, W.; Tan, Y.; Nardkulpat, A.; Torsri, K.; Torteeka, P.; Kamsing, P.; Sawangwit, U.; Pena, J.; Jainaen, M. Deep Learning Semantic Segmentation for Land Use and Land Cover Types Using Landsat 8 Imagery. ISPRS Int. J. Geoinf. 2023 , 12 , 14. [ Google Scholar ] [ CrossRef ]
  • Bergado, J.R.; Persello, C.; Stein, A. Recurrent Multiresolution Convolutional Networks for VHR Image Classification. IEEE Trans. Geosci. Remote Sens. 2018 , 56 , 6361–6374. [ Google Scholar ] [ CrossRef ]
  • Karila, K.; Matikainen, L.; Karjalainen, M.; Puttonen, E.; Chen, Y.; Hyyppä, J. Automatic Labelling for Semantic Segmentation of VHR Satellite Images: Application of Airborne Laser Scanner Data and Object-Based Image Analysis. ISPRS Open J. Photogramm. Remote Sens. 2023 , 9 , 100046. [ Google Scholar ] [ CrossRef ]
  • Zhang, X.; Du, L.; Tan, S.; Wu, F.; Zhu, L.; Zeng, Y.; Wu, B. Land Use and Land Cover Mapping Using Rapideye Imagery Based on a Novel Band Attention Deep Learning Method in the Three Gorges Reservoir Area. Remote Sens. 2021 , 13 , 1225. [ Google Scholar ] [ CrossRef ]
  • Zhu, Y.; Geis, C.; So, E.; Jin, Y. Multitemporal Relearning with Convolutional LSTM Models for Land Use Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021 , 14 , 3251–3265. [ Google Scholar ] [ CrossRef ]
  • Fan, Z.; Zhan, T.; Gao, Z.; Li, R.; Liu, Y.; Zhang, L.; Jin, Z.; Xu, S. Land Cover Classification of Resources Survey Remote Sensing Images Based on Segmentation Model. IEEE Access 2022 , 10 , 56267–56281. [ Google Scholar ] [ CrossRef ]
  • Clark, A.; Phinn, S.; Scarth, P. Pre-Processing Training Data Improves Accuracy and Generalisability of Convolutional Neural Network Based Landscape Semantic Segmentation. Land 2023 , 12 , 1268. [ Google Scholar ] [ CrossRef ]
  • Mohammadimanesh, F.; Salehi, B.; Mahdianpari, M.; Gill, E.; Molinier, M. A New Fully Convolutional Neural Network for Semantic Segmentation of Polarimetric SAR Imagery in Complex Land Cover Ecosystem. ISPRS J. Photogramm. Remote Sens. 2019 , 151 , 223–236. [ Google Scholar ] [ CrossRef ]
  • Wenger, R.; Puissant, A.; Weber, J.; Idoumghar, L.; Forestier, G. Multimodal and Multitemporal Land Use/Land Cover Semantic Segmentation on Sentinel-1 and Sentinel-2 Imagery: An Application on a MultiSenGE Dataset. Remote Sens. 2023 , 15 , 151. [ Google Scholar ] [ CrossRef ]
  • Xia, J.; Yokoya, N.; Adriano, B.; Zhang, L.; Li, G.; Wang, Z. A Benchmark High-Resolution GaoFen-3 SAR Dataset for Building Semantic Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021 , 14 , 5950–5963. [ Google Scholar ] [ CrossRef ]
  • Kotru, R.; Turkar, V.; Simu, S.; De, S.; Shaikh, M.; Banerjee, S.; Singh, G.; Das, A. Development of a Generalized Model to Classify Various Land Covers for ALOS-2 L-Band Images Using Semantic Segmentation. Adv. Space Res. 2022 , 70 , 3811–3821. [ Google Scholar ] [ CrossRef ]
  • Mehra, A.; Jain, N.; Srivastava, H.S. A Novel Approach to Use Semantic Segmentation Based Deep Learning Networks to Classify Multi-Temporal SAR Data. Geocarto Int. 2022 , 37 , 163–178. [ Google Scholar ] [ CrossRef ]
  • Pešek, O.; Segal-Rozenhaimer, M.; Karnieli, A. Using Convolutional Neural Networks for Cloud Detection on VENμS Images over Multiple Land-Cover Types. Remote Sens. 2022 , 14 , 5210. [ Google Scholar ] [ CrossRef ]
  • Jing, H.; Wang, Z.; Sun, X.; Xiao, D.; Fu, K. PSRN: Polarimetric Space Reconstruction Network for PolSAR Image Semantic Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021 , 14 , 10716–10732. [ Google Scholar ] [ CrossRef ]
  • Zhang, R.; Chen, J.; Feng, L.; Li, S.; Yang, W.; Guo, D. A Refined Pyramid Scene Parsing Network for Polarimetric SAR Image Semantic Segmentation in Agricultural Areas. IEEE Geosci. Remote Sens. Lett. 2022 , 19 , 4014805. [ Google Scholar ] [ CrossRef ]
  • Garg, R.; Kumar, A.; Bansal, N.; Prateek, M.; Kumar, S. Semantic Segmentation of PolSAR Image Data Using Advanced Deep Learning Model. Sci. Rep. 2021 , 11 , 15365. [ Google Scholar ] [ CrossRef ]
  • Zheng, N.-R.; Yang, Z.-A.; Shi, X.-Z.; Zhou, R.-Y.; Wang, F. Land Cover Classification of Synthetic Aperture Radar Images Based on Encoder—Decoder Network with an Attention Mechanism. J. Appl. Remote Sens. 2022 , 16 , 014520. [ Google Scholar ] [ CrossRef ]
  • Shi, X.; Fu, S.; Chen, J.; Wang, F.; Xu, F. Object-Level Semantic Segmentation on the High-Resolution Gaofen-3 FUSAR-Map Dataset. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021 , 14 , 3107–3119. [ Google Scholar ] [ CrossRef ]
  • Yoshida, K.; Pan, S.; Taniguchi, J.; Nishiyama, S.; Kojima, T.; Islam, M.T. Airborne LiDAR-Assisted Deep Learning Methodology for Riparian Land Cover Classification Using Aerial Photographs and Its Application for Flood Modelling. J. Hydroinformatics 2022 , 24 , 179–201. [ Google Scholar ] [ CrossRef ]
  • Arief, H.A.; Strand, G.-H.; Tveite, H.; Indahl, U.G. Land Cover Segmentation of Airborne LiDAR Data Using Stochastic Atrous Network. Remote Sens. 2018 , 10 , 973. [ Google Scholar ] [ CrossRef ]
  • Xu, Z.; Su, C.; Zhang, X. A Semantic Segmentation Method with Category Boundary for Land Use and Land Cover (LULC) Mapping of Very-High Resolution (VHR) Remote Sensing Image. Int. J. Remote Sens. 2021 , 42 , 3146–3165. [ Google Scholar ] [ CrossRef ]
  • Liu, M.; Zhang, P.; Shi, Q.; Liu, M. An Adversarial Domain Adaptation Framework with KL-Constraint for Remote Sensing Land Cover Classification. IEEE Geosci. Remote Sens. Lett. 2022 , 19 , 3002305. [ Google Scholar ] [ CrossRef ]
  • Lee, D.G.; Shin, Y.H.; Lee, D.C. Land Cover Classification Using SegNet with Slope, Aspect, and Multidirectional Shaded Relief Images Derived from Digital Surface Model. J. Sens. 2020 , 2020 , 8825509. [ Google Scholar ] [ CrossRef ]
  • Wang, Y.; Shi, H.; Zhuang, Y.; Sang, Q.; Chen, L. Bidirectional Grid Fusion Network for Accurate Land Cover Classification of High-Resolution Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020 , 13 , 5508–5517. [ Google Scholar ] [ CrossRef ]
  • Shi, H.; Fan, J.; Wang, Y.; Chen, L. Dual Attention Feature Fusion and Adaptive Context for Accurate Segmentation of Very High-Resolution Remote Sensing Images. Remote Sens. 2021 , 13 , 3715. [ Google Scholar ] [ CrossRef ]
  • He, S.; Lu, X.; Gu, J.; Tang, H.; Yu, Q.; Liu, K.; Ding, H.; Chang, C.; Wang, N. RSI-Net: Two-Stream Deep Neural Network for Remote Sensing Images-Based Semantic Segmentation. IEEE Access 2022 , 10 , 34858–34871. [ Google Scholar ] [ CrossRef ]
  • Yang, N.; Tang, H. Semantic Segmentation of Satellite Images: A Deep Learning Approach Integrated with Geospatial Hash Codes. Remote Sens. 2021 , 13 , 2723. [ Google Scholar ] [ CrossRef ]
  • Boguszewski, A.; Batorski, D.; Ziemba-Jankowska, N.; Dziedzic, T.; Zambrzycka, A. LandCover.Ai: Dataset for Automatic Mapping of Buildings, Woodlands, Water and Roads from Aerial Imagery. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Nashville, TN, USA, 20–25 June 2021. [ Google Scholar ]
  • Gao, J.; Weng, L.; Xia, M.; Lin, H. MLNet: Multichannel Feature Fusion Lozenge Network for Land Segmentation. J. Appl. Remote Sens. 2022 , 16 , 016513. [ Google Scholar ] [ CrossRef ]
  • Demir, I.; Koperski, K.; Lindenbaum, D.; Pang, G.; Huang, J.; Basu, S.; Hughes, F.; Tuia, D.; Raska, R. DeepGlobe 2018: A Challenge to Parse the Earth through Satellite Images. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018; Volume 2018. [ Google Scholar ]
  • Wei, H.; Xu, X.; Ou, N.; Zhang, X.; Dai, Y. Deanet: Dual Encoder with Attention Network for Semantic Segmentation of Remote Sensing Imagery. Remote Sens. 2021 , 13 , 3900. [ Google Scholar ] [ CrossRef ]
  • Maggiori, E.; Tarabalka, Y.; Charpiat, G.; Alliez, P. Can Semantic Labeling Methods Generalize to Any City? The Inria Aerial Image Labeling Benchmark. In Proceedings of the International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017; Volume 2017. [ Google Scholar ]
  • Li, W.; He, C.; Fang, J.; Zheng, J.; Fu, H.; Yu, L. Semantic Segmentation-Based Building Footprint Extraction Using Very High-Resolution Satellite Images and Multi-Source GIS Data. Remote Sens. 2019 , 11 , 403. [ Google Scholar ] [ CrossRef ]
  • Ji, S.; Wang, D.; Luo, M. Generative Adversarial Network-Based Full-Space Domain Adaptation for Land Cover Classification from Multiple-Source Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2021 , 59 , 3816–3828. [ Google Scholar ] [ CrossRef ]
  • Chen, Q.; Wang, L.; Wu, Y.; Wu, G.; Guo, Z.; Waslander, S.L. Aerial Imagery for Roof Segmentation: A Large-Scale Dataset towards Automatic Mapping of Buildings. ISPRS J. Photogramm. Remote Sens. 2019 , 147 , 42–55. [ Google Scholar ] [ CrossRef ]
  • Audebert, N.; Le Saux, B.; Lefèvre, S. Segment-before-Detect: Vehicle Detection and Classification through Semantic Segmentation of Aerial Images. Remote Sens. 2017 , 9 , 368. [ Google Scholar ] [ CrossRef ]
  • Abdollahi, A.; Pradhan, B.; Shukla, N.; Chakraborty, S.; Alamri, A. Multi-Object Segmentation in Complex Urban Scenes from High-Resolution Remote Sensing Data. Remote Sens. 2021 , 13 , 3710. [ Google Scholar ] [ CrossRef ]
  • Khan, S.D.; Alarabi, L.; Basalamah, S. Deep Hybrid Network for Land Cover Semantic Segmentation in High-Spatial Resolution Satellite Images. Information 2021 , 12 , 230. [ Google Scholar ] [ CrossRef ]
  • Liu, R.; Tao, F.; Liu, X.; Na, J.; Leng, H.; Wu, J.; Zhou, T. RAANet: A Residual ASPP with Attention Framework for Semantic Segmentation of High-Resolution Remote Sensing Images. Remote Sens. 2022 , 14 , 3109. [ Google Scholar ] [ CrossRef ]
  • Sang, Q.; Zhuang, Y.; Dong, S.; Wang, G.; Chen, H. FRF-Net: Land Cover Classification from Large-Scale VHR Optical Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2020 , 17 , 1057–1061. [ Google Scholar ] [ CrossRef ]
  • Guo, Y.; Wang, F.; Xiang, Y.; You, H. Article Dgfnet: Dual Gate Fusion Network for Land Cover Classification in Very High-Resolution Images. Remote Sens. 2021 , 13 , 3755. [ Google Scholar ] [ CrossRef ]
  • Niu, X.; Zeng, Q.; Luo, X.; Chen, L. FCAU-Net for the Semantic Segmentation of Fine-Resolution Remotely Sensed Images. Remote Sens. 2022 , 14 , 215. [ Google Scholar ] [ CrossRef ]
  • Wu, M.; Zhang, C.; Liu, J.; Zhou, L.; Li, X. Towards Accurate High Resolution Satellite Image Semantic Segmentation. IEEE Access 2019 , 7 , 55609–55619. [ Google Scholar ] [ CrossRef ]
  • Li, J.; Wang, H.; Zhang, A.; Liu, Y. Semantic Segmentation of Hyperspectral Remote Sensing Images Based on PSE-UNet Model. Sensors 2022 , 22 , 9678. [ Google Scholar ] [ CrossRef ]
  • Salgueiro, L.; Marcello, J.; Vilaplana, V. SEG-ESRGAN: A Multi-Task Network for Super-Resolution and Semantic Segmentation of Remote Sensing Images. Remote Sens. 2022 , 14 , 5862. [ Google Scholar ] [ CrossRef ]
  • Marsocci, V.; Scardapane, S.; Komodakis, N. MARE: Self-Supervised Multi-Attention REsu-Net for Semantic Segmentation in Remote Sensing. Remote Sens. 2021 , 13 , 3275. [ Google Scholar ] [ CrossRef ]
  • Wang, S.; Mu, X.; Yang, D.; He, H.; Zhao, P. Attention Guided Encoder-Decoder Network with Multi-Scale Context Aggregation for Land Cover Segmentation. IEEE Access 2020 , 8 , 215299–215309. [ Google Scholar ] [ CrossRef ]
  • Feng, D.; Zhang, Z.; Yan, K. A Semantic Segmentation Method for Remote Sensing Images Based on the Swin Transformer Fusion Gabor Filter. IEEE Access 2022 , 10 , 77432–77451. [ Google Scholar ] [ CrossRef ]
  • Bai, J.; Wen, Z.; Xiao, Z.; Ye, F.; Zhu, Y.; Alazab, M.; Jiao, L. Hyperspectral Image Classification Based on Multibranch Attention Transformer Networks. IEEE Trans. Geosci. Remote Sens. 2022 , 60 , 3196661. [ Google Scholar ] [ CrossRef ]
  • Meng, X.; Yang, Y.; Wang, L.; Wang, T.; Li, R.; Zhang, C. Class-Guided Swin Transformer for Semantic Segmentation of Remote Sensing Imagery. IEEE Geosci. Remote Sens. Lett. 2022 , 19 , 6517505. [ Google Scholar ] [ CrossRef ]
  • Wang, D.; Yang, R.; Zhang, Z.; Liu, H.; Tan, J.; Li, S.; Yang, X.; Wang, X.; Tang, K.; Qiao, Y.; et al. P-Swin: Parallel Swin Transformer Multi-Scale Semantic Segmentation Network for Land Cover Classification. Comput. Geosci. 2023 , 175 , 105340. [ Google Scholar ] [ CrossRef ]
  • Dong, R.; Fang, W.; Fu, H.; Gan, L.; Wang, J.; Gong, P. High-Resolution Land Cover Mapping through Learning with Noise Correction. IEEE Trans. Geosci. Remote Sens. 2022 , 60 , 4402013. [ Google Scholar ] [ CrossRef ]
  • Shen, X.; Weng, L.; Xia, M.; Lin, H. Multi-Scale Feature Aggregation Network for Semantic Segmentation of Land Cover. Remote Sens. 2022 , 14 , 6156. [ Google Scholar ] [ CrossRef ]
  • Luo, Y.; Wang, J.; Yang, X.; Yu, Z.; Tan, Z. Pixel Representation Augmented through Cross-Attention for High-Resolution Remote Sensing Imagery Segmentation. Remote Sens. 2022 , 14 , 5415. [ Google Scholar ] [ CrossRef ]
  • Yuan, X.; Chen, Z.; Chen, N.; Gong, J. Land Cover Classification Based on the PSPNet and Superpixel Segmentation Methods with High Spatial Resolution Multispectral Remote Sensing Imagery. J. Appl. Remote Sens. 2021 , 15 , 034511. [ Google Scholar ] [ CrossRef ]
  • Zhang, C.; Yue, P.; Tapete, D.; Shangguan, B.; Wang, M.; Wu, Z. A Multi-Level Context-Guided Classification Method with Object-Based Convolutional Neural Network for Land Cover Classification Using Very High Resolution Remote Sensing Images. Int. J. Appl. Earth Obs. Geoinf. 2020 , 88 , 102086. [ Google Scholar ] [ CrossRef ]
  • Van den Broeck, W.A.J.; Goedemé, T.; Loopmans, M. Multiclass Land Cover Mapping from Historical Orthophotos Using Domain Adaptation and Spatio-Temporal Transfer Learning. Remote Sens. 2022 , 14 , 5911. [ Google Scholar ] [ CrossRef ]
  • Zhang, B.; Chen, T.; Wang, B. Curriculum-Style Local-to-Global Adaptation for Cross-Domain Remote Sensing Image Segmentation. IEEE Trans. Geosci. Remote Sens. 2022 , 60 , 5611412. [ Google Scholar ] [ CrossRef ]
  • Li, A.; Jiao, L.; Zhu, H.; Li, L.; Liu, F. Multitask Semantic Boundary Awareness Network for Remote Sensing Image Segmentation. IEEE Trans. Geosci. Remote Sens. 2022 , 60 , 5400314. [ Google Scholar ] [ CrossRef ]
  • Shan, L.; Wang, W. DenseNet-Based Land Cover Classification Network with Deep Fusion. IEEE Geosci. Remote Sens. Lett. 2022 , 19 . [ Google Scholar ] [ CrossRef ]
  • Safarov, F.; Temurbek, K.; Jamoljon, D.; Temur, O.; Chedjou, J.C.; Abdusalomov, A.B.; Cho, Y.-I. Improved Agricultural Field Segmentation in Satellite Imagery Using TL-ResUNet Architecture. Sensors 2022 , 22 , 9784. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Liu, Z.-Q.; Tang, P.; Zhang, W.; Zhang, Z. CNN-Enhanced Heterogeneous Graph Convolutional Network: Inferring Land Use from Land Cover with a Case Study of Park Segmentation. Remote Sens. 2022 , 14 , 5027. [ Google Scholar ] [ CrossRef ]
  • Wang, D.; Yang, R.; Liu, H.; He, H.; Tan, J.; Li, S.; Qiao, Y.; Tang, K.; Wang, X. HFENet: Hierarchical Feature Extraction Network for Accurate Landcover Classification. Remote Sens. 2022 , 14 , 4244. [ Google Scholar ] [ CrossRef ]
  • Zhang, H.; Liu, M.; Wang, Y.; Shang, J.; Liu, X.; Li, B.; Song, A.; Li, Q. Automated Delineation of Agricultural Field Boundaries from Sentinel-2 Images Using Recurrent Residual U-Net. Int. J. Appl. Earth Obs. Geoinf. 2021 , 105 , 102557. [ Google Scholar ] [ CrossRef ]
  • Maggiolo, L.; Marcos, D.; Moser, G.; Serpico, S.B.; Tuia, D. A Semisupervised CRF Model for CNN-Based Semantic Segmentation with Sparse Ground Truth. IEEE Trans. Geosci. Remote Sens. 2022 , 60 , 5606315. [ Google Scholar ] [ CrossRef ]
  • Barthakur, M.; Sarma, K.K.; Mastorakis, N. Modified Semi-Supervised Adversarial Deep Network and Classifier Combination for Segmentation of Satellite Images. IEEE Access 2020 , 8 , 117972–117985. [ Google Scholar ] [ CrossRef ]
  • Wang, Q.; Luo, X.; Feng, J.; Li, S.; Yin, J. CCENet: Cascade Class-Aware Enhanced Network for High-Resolution Aerial Imagery Semantic Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022 , 15 , 6943–6956. [ Google Scholar ] [ CrossRef ]
  • Zhang, C.; Zhang, L.; Zhang, B.Y.J.; Sun, J.; Dong, S.; Wang, X.; Li, Y.; Xu, J.; Chu, W.; Dong, Y.; et al. Land Cover Classification in a Mixed Forest-Grassland Ecosystem Using LResU-Net and UAV Imagery. J. Res. 2022 , 33 , 923–936. [ Google Scholar ] [ CrossRef ]
  • Xu, Y.; Wu, L.; Xie, Z.; Chen, Z. Building Extraction in Very High Resolution Remote Sensing Imagery Using Deep Learning and Guided Filters. Remote Sens. 2018 , 10 , 144. [ Google Scholar ] [ CrossRef ]
  • Li, L.; Yao, J.; Liu, Y.; Yuan, W.; Shi, S.; Yuan, S. Optimal Seamline Detection for Orthoimage Mosaicking by Combining Deep Convolutional Neural Network and Graph Cuts. Remote Sens. 2017 , 9 , 701. [ Google Scholar ] [ CrossRef ]
  • Cecili, G.; De Fioravante, P.; Congedo, L.; Marchetti, M.; Munafò, M. Land Consumption Mapping with Convolutional Neural Network: Case Study in Italy. Land 2022 , 11 , 1919. [ Google Scholar ] [ CrossRef ]
  • Abadal, S.; Salgueiro, L.; Marcello, J.; Vilaplana, V. A Dual Network for Super-Resolution and Semantic Segmentation of Sentinel-2 Imagery. Remote Sens. 2021 , 13 , 4547. [ Google Scholar ] [ CrossRef ]
  • Henry, C.J.; Storie, C.D.; Palaniappan, M.; Alhassan, V.; Swamy, M.; Aleshinloye, D.; Curtis, A.; Kim, D. Automated LULC Map Production Using Deep Neural Networks. Int. J. Remote Sens. 2019 , 40 , 4416–4440. [ Google Scholar ] [ CrossRef ]
  • Shojaei, H.; Nadi, S.; Shafizadeh-Moghadam, H.; Tayyebi, A.; Van Genderen, J. An Efficient Built-up Land Expansion Model Using a Modified U-Net. Int. J. Digit. Earth 2022 , 15 , 148–163. [ Google Scholar ] [ CrossRef ]
  • Dong, X.; Zhang, C.; Fang, L.; Yan, Y. A Deep Learning Based Framework for Remote Sensing Image Ground Object Segmentation. Appl. Soft Comput. 2022 , 130 , 109695. [ Google Scholar ] [ CrossRef ]
  • Guo, X.; Chen, Z.; Wang, C. Fully Convolutional Densenet with Adversarial Training for Semantic Segmentation of High-Resolution Remote Sensing Images. J. Appl. Remote Sens. 2021 , 15 , 016520. [ Google Scholar ] [ CrossRef ]
  • Zhang, B.; Wan, Y.; Zhang, Y.; Li, Y. JSH-Net: Joint Semantic Segmentation and Height Estimation Using Deep Convolutional Networks from Single High-Resolution Remote Sensing Imagery. Int. J. Remote Sens. 2022 , 43 , 6307–6332. [ Google Scholar ] [ CrossRef ]
  • Li, W.; Chen, K.; Shi, Z. Geographical Supervision Correction for Remote Sensing Representation Learning. IEEE Trans. Geosci. Remote Sens. 2022 , 60 , 5411520. [ Google Scholar ] [ CrossRef ]
  • Shi, W.; Qin, W.; Chen, A. Towards Robust Semantic Segmentation of Land Covers in Foggy Conditions. Remote Sens. 2022 , 14 , 4551. [ Google Scholar ] [ CrossRef ]
  • Zhang, W.; Tang, P.; Zhao, L. Fast and Accurate Land Cover Classification on Medium Resolution Remote Sensing Images Using Segmentation Models. Int. J. Remote Sens. 2021 , 42 , 3277–3301. [ Google Scholar ] [ CrossRef ]
  • Dechesne, C.; Mallet, C.; Le Bris, A.; Gouet-Brunet, V. Semantic Segmentation of Forest Stands of Pure Species Combining Airborne Lidar Data and Very High Resolution Multispectral Imagery. ISPRS J. Photogramm. Remote Sens. 2017 , 126 , 129–145. [ Google Scholar ] [ CrossRef ]
  • Zhang, Z.; Lu, W.; Cao, J.; Xie, G. MKANet: An Efficient Network with Sobel Boundary Loss for Land-Cover Classification of Satellite Remote Sensing Imagery. Remote Sens. 2022 , 14 , 4514. [ Google Scholar ] [ CrossRef ]
  • Li, W.; Chen, K.; Chen, H.; Shi, Z. Geographical Knowledge-Driven Representation Learning for Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2022 , 60 , 5405516. [ Google Scholar ] [ CrossRef ]
  • Liu, W.; Liu, J.; Luo, Z.; Zhang, H.; Gao, K.; Li, J. Weakly Supervised High Spatial Resolution Land Cover Mapping Based on Self-Training with Weighted Pseudo-Labels. Int. J. Appl. Earth Obs. Geoinf. 2022 , 112 , 102931. [ Google Scholar ] [ CrossRef ]
  • Liu, Q.; Kampffmeyer, M.; Jenssen, R.; Salberg, A.-B. Dense Dilated Convolutions Merging Network for Land Cover Classification. IEEE Trans. Geosci. Remote Sens. 2020 , 58 , 6309–6320. [ Google Scholar ] [ CrossRef ]
  • Li, Z.; Zhang, H.; Lu, F.; Xue, R.; Yang, G.; Zhang, L. Breaking the Resolution Barrier: A Low-to-High Network for Large-Scale High-Resolution Land-Cover Mapping Using Low-Resolution Labels. ISPRS J. Photogramm. Remote Sens. 2022 , 192 , 244–267. [ Google Scholar ] [ CrossRef ]
  • Yuan, Q.; Mohd Shafri, H.Z. Multi-Modal Feature Fusion Network with Adaptive Center Point Detector for Building Instance Extraction. Remote Sens. 2022 , 14 , 4920. [ Google Scholar ] [ CrossRef ]
  • Mboga, N.; D’aronco, S.; Grippa, T.; Pelletier, C.; Georganos, S.; Vanhuysse, S.; Wolff, E.; Smets, B.; Dewitte, O.; Lennert, M.; et al. Domain Adaptation for Semantic Segmentation of Historical Panchromatic Orthomosaics in Central Africa. ISPRS Int. J. Geoinf. 2021 , 10 , 523. [ Google Scholar ] [ CrossRef ]
  • Zhang, Z.; Doi, K.; Iwasaki, A.; Xu, G. Unsupervised Domain Adaptation of High-Resolution Aerial Images via Correlation Alignment and Self Training. IEEE Geosci. Remote Sens. Lett. 2021 , 18 , 746–750. [ Google Scholar ] [ CrossRef ]
  • Simms, D.M. Fully Convolutional Neural Nets In-the-Wild. Remote Sens. Lett. 2020 , 11 , 1080–1089. [ Google Scholar ] [ CrossRef ]
  • Liu, C.; Du, S.; Lu, H.; Li, D.; Cao, Z. Multispectral Semantic Land Cover Segmentation from Aerial Imagery with Deep Encoder-Decoder Network. IEEE Geosci. Remote Sens. Lett. 2022 , 19 , 5000105. [ Google Scholar ] [ CrossRef ]
  • Sun, P.; Lu, Y.; Zhai, J. Mapping Land Cover Using a Developed U-Net Model with Weighted Cross Entropy. Geocarto Int. 2022 , 37 , 9355–9368. [ Google Scholar ] [ CrossRef ]
  • Chen, J.; Sun, B.; Wang, L.; Fang, B.; Chang, Y.; Li, Y.; Zhang, J.; Lyu, X.; Chen, G. Semi-Supervised Semantic Segmentation Framework with Pseudo Supervisions for Land-Use/Land-Cover Mapping in Coastal Areas. Int. J. Appl. Earth Obs. Geoinf. 2022 , 112 , 102881. [ Google Scholar ] [ CrossRef ]

Click here to enlarge figure

Data Sources Number of Articles References
RS Satellites
Sentinel-27[ , , , , ]
Landsat5[ , , , ]
Worldview-032[ , ]
Rapid eye1[ ]
Worldview-021[ ]
Quickbird1[ ]
ZY-31[ ]
PlanetScope1[ ]
GF-22[ , ]
Aerial images
Phantom m multi-rotor AUS1[ ]
Quadcopter drone1[ ]
Vexcel Ultracam Eagle Camera1[ ]
DJI-Phantom 4 pro UAV1[ ]
SAR SAT
RADARSAT-21[ ]
Sentinel-16[ , , , , , ]
GF-31[ ]
ALOS-21[ ]
Others
Earth digitalglobe2[ , ]
Mobile phone1[ ]
Lidar Sources1 [ ]
ModelsDatasetsPerformance MetricsLimitation/Future Work
RAANet [ ] LoveDA,
ISPRS Vaihingen
MIoU = 77.28,
MIoU = 73.47
Accuracy can be improved with optimization.
PSE-UNet Model [ ] Salinas DatasetMIoU = 88.50Inaccurate segmentation of land cover features with low frequencies, superfluous parameter redundancy, and unvalidated generalization capabilities.
SEG-ESRGAN [ ]Sentinel-2 and WorldView-2 image pairs.MIoU = 62.78 The assessment of utilizing medium-resolution images has not been tested
Class-wise FCN [ ]Vaihingen, PotsdamMIoU = 72.35,
MIoU = 76.88
Enhancements in performance can be achieved through class-wise considerations for multiple classes, along with improved and more efficient implementations.
MARE [ ]VaihingenMIoU = 81.76Improve performance through parameter optimization and extend approach incorporating other self-supervised algorithms.
Feature fusion with dual attention and flexible contextual adaptation [ ]Vaihingen,
GaoFen-2
MIoU = 70.51,
MIoU = 56.98
Computational complexity issue.
Deanet [ ]LandCover.ai,
DSTL dataset,
DeepGlobe
MIoU = 90.28,
MIoU = 52.70,
MIoU = 71.80
Suboptimal performance. Future efforts involve incorporating the spatial attention module into a single unified backbone network.
An encoder-decoder framework featuring attention-guided multi-scale context integration [ ]GF-2 imagesMIoU = 62.3%Reduced accuracy on imbalance data.
ModelsDataPerformanceLimitation
Swin-S-GF [ ],GIDOA = 89.15
MIoU = 80.14
Computational complexity issue and
slow convergence speed.
CG-Swin [ ]Vaihingen,
Potsdam
OA = 91.68
MIoU = 83.39,
OA = 91.93
MIoU = 87.61
Extending CG-Swin to accommodate multi-modal data sources for more comprehensive and robust classification.
BANet [ ]Vaihingen,
Potsdam,
UAVid dataset
MIoU = 81.35,
MIoU = 86.25,
MIoU = 64.6
Combine convolution and Transformer as a hybrid structure to improve performance.
Spectral spatial transformer [ ]Indian datasetOA = 0.94Computational complexity issue
Sgformer [ ]Landcover datasetMIOU = 0.85Computational complexity issue and
slow convergence speed.
Parallel Swin Transformer [ ]Postdam,
GID
WHDLD
OA = 89.44,
OA = 84.67,
OA = 84.86
Performance can be improved.
ModelsDatasetsPerformance MetricsLimitation
RSI-Net [ ]Vaihingen,
Potsdam,
GID
OA = 91.83,
OA = 93.31,
OA = 93.67
Limitation in segmentation of pixel-wise semantics. Enhanced feature map fusion decoders can lead to performance improvements.
HMRT [ ] PotsdamOA = 85.99
MIoU = 74.14
Parameter complexity issue, decrease in segmentation accuracy due to a lot of noise. Optimization is required.
UNetFormer [ ]UAVid,
Vaihingen,
Potsdam,
LoveDA
MIoU = 67.8,
OA = 91.0
MIoU = 82.7,
OA = 91.3
MIoU = 86.8,
MIoU = 52.4
Investigate the Transformer’s potential and practicality in addressing geospatial vision tasks is open for research.
(TL-ResUNet) model [ ]DeepGlobeIoU = 0.81Improve classification performance is open for research, and developing real time and automated solution for land use land cover.
CNN-enhanced heterogeneous GCN [ ]Beijing dataset,
Shenzhen dataset.
MIoU = 70.48,
MIoU = 62.45
Future endeavor is to optimize the utilization of pretrained deep CNN features and GCN features across various segmentation scales.
HFENet [ ]MZData,
LandCover Dataset,
WHU Building Dataset
MIoU = 87.19,
MIoU = 89.69,
MIoU = 92.12
Time and space complexity issues. Future work can be to automatically fine-tune the parameters to attain the optimal performance of the model.
Model’s Structures Batch SizeEpochsLearning RateData AugmentationBackbonePopular OptimizerParametersEvaluation Metrics
Encoder/decoder-based 4, 8, 16, 64100–5000.01YesResNetSGDLow–HighMIoU, OA, F1
Transformer-based 6, 8100–2000.0006YesResNet/SwintinyAdamHighMIoU, OA, F1
Hybrid models8, 1640–1000.0006YesResNetAdamLow–HighMIoU, OA, F1
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Ajibola, S.; Cabral, P. A Systematic Literature Review and Bibliometric Analysis of Semantic Segmentation Models in Land Cover Mapping. Remote Sens. 2024 , 16 , 2222. https://doi.org/10.3390/rs16122222

Ajibola S, Cabral P. A Systematic Literature Review and Bibliometric Analysis of Semantic Segmentation Models in Land Cover Mapping. Remote Sensing . 2024; 16(12):2222. https://doi.org/10.3390/rs16122222

Ajibola, Segun, and Pedro Cabral. 2024. "A Systematic Literature Review and Bibliometric Analysis of Semantic Segmentation Models in Land Cover Mapping" Remote Sensing 16, no. 12: 2222. https://doi.org/10.3390/rs16122222

Article Metrics

Article access statistics, further information, mdpi initiatives, follow mdpi.

MDPI

Subscribe to receive issue release notifications and newsletters from MDPI journals

share this!

June 24, 2024 report

This article has been reviewed according to Science X's editorial process and policies . Editors have highlighted the following attributes while ensuring the content's credibility:

fact-checked

peer-reviewed publication

trusted source

Analysis of data suggests homosexual behavior in other animals is far more common than previously thought

by Bob Yirka , Phys.org

Meta-analysis of prior research data suggests homosexual behavior in other animals far more common than thought

A team of anthropologists and biologists from Canada, Poland, and the U.S., working with researchers at the American Museum of Natural History, in New York, has found via meta-analysis of data from prior research efforts that homosexual behavior is far more common in other animals than previously thought. The paper is published in PLOS ONE .

For many years, the biology community has accepted the notion that homosexuality is less common in animals than in humans, despite a lack of research on the topic. In this new effort, the researchers sought to find out if such assumptions are true.

The work involved study of 65 studies into the behavior of multiple species of animals, mostly mammals, such as elephants, squirrels, monkeys, rats and racoons.

The researchers found that 76% of the studies mentioned observations of homosexual behavior, though they also noted that only 46% had collected data surrounding such behavior—and only 18.5% of those who had mentioned such behavior in their papers had focused their efforts on it to the extent of publishing work with homosexuality as it core topic.

They noted that homosexual behavior observed in other species included mounting, intromission and oral contact—and that researchers who identified as LGBTQ+ were no more or less likely to study the topic than other researchers.

The researchers point to a hesitancy in the biological community to study homosexuality in other species , and thus, little research has been conducted. They further suggest that some of the reluctance has been due to the belief that such behavior is too rare to warrant further study.

The research team suggests that homosexuality is far more common in the animal kingdom than has been reported—they further suggest more work is required regarding homosexual behaviors in other animals to dispel the myth of rarity.

Journal information: PLoS ONE

© 2024 Science X Network

Explore further

Feedback to editors

research topic on data analytics

Researchers develop high-performance anion exchange membranes for sustainability applications

research topic on data analytics

Half of world's lakes are less resilient to disturbance than they used to be

research topic on data analytics

Modeling software reveals patterns in continuous seismic waveforms during series of stick-slip, magnitude-5 earthquakes

research topic on data analytics

Discovery of vast sex differences in cellular activity has major implications for disease treatment

research topic on data analytics

Researchers discover new flat electronic bands, paving way for advanced quantum materials

research topic on data analytics

Not all calcite crystals perfect; synthesis methods can alter internal structure, affect chemical reactivity

research topic on data analytics

Boosting 'natural killer' cell activity could improve cancer therapy

5 hours ago

research topic on data analytics

AI predicts upper secondary education dropout as early as the end of primary school

research topic on data analytics

Study reveals how one enzyme hitches a ride on another to recognize tRNA

research topic on data analytics

1,500-year-old reliquary discovered

Relevant physicsforums posts, color recognition: what we see vs animals with a larger color range.

8 hours ago

Innovative ideas and technologies to help folks with disabilities

20 hours ago

Is meat broth really nutritious?

Jun 24, 2024

COVID Virus Lives Longer with Higher CO2 In the Air

Jun 22, 2024

Periodical Cicada Life Cycle

Jun 21, 2024

A DNA Animation

Jun 15, 2024

More from Biology and Medical

Related Stories

research topic on data analytics

How, and why, did homosexual behavior evolve in humans and other animals?

Oct 12, 2023

research topic on data analytics

Male rhesus macaques often have sex with each other, a trait they have inherited in part from their parents

Jul 15, 2023

research topic on data analytics

Same-gender sexual behavior found to be widespread across mammal species and to have multiple origins

Oct 4, 2023

research topic on data analytics

Stop calling it a choice: Biological factors drive homosexuality

Sep 4, 2019

Clinicians' personal religious beliefs may impact treatment provided to patients who are homosexual

Oct 23, 2017

research topic on data analytics

Study shows same-sex sexual behavior is widespread and heritable in macaque monkeys

Jul 10, 2023

Recommended for you

research topic on data analytics

In a world-first, researchers map a 4,200 km transatlantic flight of the painted lady butterfly

10 hours ago

research topic on data analytics

Hidden mechanisms behind hermaphroditic plant self-incompatibility revealed

research topic on data analytics

Australia's giant lizards help save sheep from being eaten alive

12 hours ago

research topic on data analytics

Scientists identify safe havens we must preserve to prevent 'the sixth great extinction of life on Earth'

research topic on data analytics

Bats use four key tactics for accurate target tracking

6 hours ago

research topic on data analytics

Bacteria found to produce proteins that act like antifreeze, helping marine worms survive in polar waters

Let us know if there is a problem with our content.

Use this form if you have come across a typo, inaccuracy or would like to send an edit request for the content on this page. For general inquiries, please use our contact form . For general feedback, use the public comments section below (please adhere to guidelines ).

Please select the most appropriate category to facilitate processing of your request

Thank you for taking time to provide your feedback to the editors.

Your feedback is important to us. However, we do not guarantee individual replies due to the high volume of messages.

E-mail the story

Your email address is used only to let the recipient know who sent the email. Neither your address nor the recipient's address will be used for any other purpose. The information you enter will appear in your e-mail message and is not retained by Phys.org in any form.

Newsletter sign up

Get weekly and/or daily updates delivered to your inbox. You can unsubscribe at any time and we'll never share your details to third parties.

More information Privacy policy

Donate and enjoy an ad-free experience

We keep our content available to everyone. Consider supporting Science X's mission by getting a premium account.

E-mail newsletter

large-circle

May 5-8, 2024

  • Past Conferences

Social and Clinical Factors Influencing ED Utilization Behavior Amongst Blue Cross Blue Shield of Louisiana Members: A Cluster Analysis

Mousavian M 1 , Kippers J 1 , Zhang H 1 , Lanata N 1 , Lanata N 1 , Palanki S 2 , Zhang Y 1 , Vicidomina B 1 1 Blue Cross Blue Shield of Louisiana, Baton Rouge, LA, USA, 2 Blue Cross Blue Shield of Louisiana, Morgantown, WV, USA

Presentation Documents

  • ISPOR Poster MA ED utilization_Cluster Analysis_BCBSLA139869.pdf

METHODS: BCBSLA developed a model-based clustering methodology using data from 2022 that employed Partitioning Around Medoids and Gower distance matrix. Researchers descriptively analyzed healthcare utilization, SVI, and health conditions within each cluster. Independent predictors of respective cluster membership used a Logistic Regression model, which assesses the importance of drivers within clusters. Researchers spatially analyzed clusters for estimating access to health services such as a primary care provider, urgent care, and emergency department.

RESULTS: Using MA and commercial data, the clustering methodology identified 16 distinct clusters within each race/ethnicity that had common characteristics. Among the notable clusters: Cluster 1 was a cohort of White, commercial members with socioeconomic challenges such as no high school diploma, and disabilities. Cluster 2 was a cohort of Black/African American, female, commercial members with hypertension who had high housing burden and were below 150% of poverty level. Cluster 3 consisted of White, male, MA members with diabetes and high housing cost burden and limited knowledge of English. Cluster 4 consisted of Black/African American, male, MA members with diabetes who had socioeconomic challenges like single-parent household and housing burden.

Economic Evaluation, Methodological & Statistical Research

Topic Subcategory

Cost-comparison, Effectiveness, Utility, Benefit Analysis

No Additional Disease & Conditions/Specialized Treatment Areas

COMMENTS

  1. Data Science & Analytics Research Topics (Includes Free Webinar)

    Data Science-Related Research Topics. Developing machine learning models for real-time fraud detection in online transactions. The use of big data analytics in predicting and managing urban traffic flow. Investigating the effectiveness of data mining techniques in identifying early signs of mental health issues from social media usage.

  2. 37 Research Topics In Data Science To Stay On Top Of » EML

    By harnessing the power of big data analytics, they can improve their decision-making, better understand their customers, and develop new products and services. 3.) Auto Machine Learning. Auto machine learning is a research topic in data science concerned with developing algorithms that can automatically learn from data without intervention.

  3. 99+ Data Science Research Topics: A Path to Innovation

    99+ Data Science Research Topics: A Path to Innovation. In today's rapidly advancing digital age, data science research plays a pivotal role in driving innovation, solving complex problems, and shaping the future of technology. Choosing the right data science research topics is paramount to making a meaningful impact in this field.

  4. 10 Best Research and Thesis Topic Ideas for Data Science in 2022

    In this article, we have listed 10 such research and thesis topic ideas to take up as data science projects in 2022. Handling practical video analytics in a distributed cloud: With increased dependency on the internet, sharing videos has become a mode of data and information exchange. The role of the implementation of the Internet of Things ...

  5. 99+ Interesting Data Science Research Topics For Students

    A data science research paper should start with a clear goal, stating what the study aims to investigate or achieve. This objective guides the entire paper, helping readers understand the purpose and direction of the research. 2. Detailed Methodology. Explaining how the research was conducted is crucial.

  6. Ten Research Challenge Areas in Data Science

    Abstract. To drive progress in the field of data science, we propose 10 challenge areas for the research community to pursue. Since data science is broad, with methods drawing from computer science, statistics, and other disciplines, and with applications appearing in all sectors, these challenge areas speak to the breadth of issues spanning science, technology, and society.

  7. Top 10 Essential Data Science Topics to Real-World Application From the

    The top level of analytics in Figure 1, prescriptive analytics for decision making, tends to be under-focused in statistics and data science programs. Recall the marketing 4P's in Table 1, if we can answer these questions causally , we can use the results to optimize , as listed in the last column of the table, linking predictive and ...

  8. Analytics and data science

    Analytics and data science Magazine Article. Dominique M. Hanssens. Daniel Thorpe. Carl Finkbeiner. Wachovia has a long-term goal to build customer equity and short-term decisions to make on how ...

  9. Top 20 Latest Research Problems in Big Data and Data Science

    General big data research topics [3] are in the lines of: Scalability — Scalable Architectures for parallel data processing; Real-time big data analytics — Stream data processing of text, image, and video; Cloud Computing Platforms for Big Data Adoption and Analytics — Reducing the cost of complex analytics in the cloud; Security and ...

  10. Research Areas

    The world is being transformed by data and data-driven analysis is rapidly becoming an integral part of science and society. Stanford Data Science is a collaborative effort across many departments in all seven schools. We strive to unite existing data science research initiatives and create interdisciplinary collaborations, connecting the data ...

  11. Data Analytics: Definition, Uses, Examples, and More

    Data analytics jobs. Typically, data analytics professionals make higher-than-average salaries and are in high demand within the labor market. The US Bureau of Labor Statistics (BLS) projects that careers in data analytics fields will grow by 23 percent between 2022 and 2032—much faster than average—and are estimated to pay a higher-than-average annual income of $85,720 [].

  12. 5 Top Topics of Interest for Research in Data Science and Analytics

    The field of data science is constantly evolving, and new research topics are emerging all the time. Here are some of the current topics of interest for research in data science and analytics ...

  13. Frontiers in Big Data

    Machine Learning and Cutting-Edge Tools for Prediction and Treatment Strategies of Dementia and Associated Diseases. This innovative journal focuses on the power of big data - its role in machine learning, AI, and data mining, and its practical application from cybersecurity to climate science and public health.

  14. Data Science and Analytics: An Overview from Data-Driven Smart

    Introduction. We are living in the age of "data science and advanced analytics", where almost everything in our daily lives is digitally recorded as data [].Thus the current electronic world is a wealth of various kinds of data, such as business data, financial data, healthcare data, multimedia data, internet of things (IoT) data, cybersecurity data, social media data, etc [].

  15. Ten Research Challenge Areas in Data Science

    J.M. Wing, " Ten Research Challenge Areas in Data Science ," Voices, Data Science Institute, Columbia University, January 2, 2020. arXiv:2002.05658. Jeannette M. Wing is Avanessians Director of the Data Science Institute and professor of computer science at Columbia University. December 30, 2019.

  16. Emerging trends and global scope of big data analytics: a scientometric

    Recent topics of research related to data analytics were published from 2018 to 2019. In this period, the publications related to the research area have gradually focused on the learning methods, issues, and data analytics challenges. Some recent topics that rapidly developed in this period are machine learning, IoT, health care, deep learning ...

  17. Research on the topic of data analytics

    Research on the topic of data analytics; Finding research on data analytics. Data analytics is a very new area of work and not really a discipline of its own. This makes finding scholarly work about data analytics kind of tricky. Much of the work is done by computer scientists, so a good bit of the literature can be found in the these databases

  18. 214 Big Data Research Topics: Interesting Ideas To Try

    These 15 topics will help you to dive into interesting research. You may even build on research done by other scholars. Evaluate the data mining process. The influence of the various dimension reduction methods and techniques. The best data classification methods. The simple linear regression modeling methods.

  19. Hot Topics in Research Methods: Big Data Analysis

    Use this interactive dataset—a subset of data from the 2012 Cooperative Congressional National Election Study—to learn the method of logistic regression with the software program R. Logistic regression, or logit, is a widely used method of analysis in social science and is the foundation for more complex methods in big data analytics.

  20. The role of data science and data analytics for innovation: a

    1. Introduction. Methods of data analytics, data science, business analytics, and big data - which we collectively refer to as Data Science/Data Analytics (DS/DA) - are increasingly important topics in business practice as well as for academia.

  21. Data and Business Analytics

    Data and Business Analytics Forging the Future of Finance For over a century, Columbia Business School has stood at the forefront of new ideas, research, and best practices in finance and the capital markets.

  22. What Is Data and Analytics: Everything You Need to Know

    The role of data and analytics is to equip businesses, their employees and leaders to make better decisions and improve decision outcomes. This applies to all types of decisions, including macro, micro, real-time, cyclical, strategic, tactical and operational. At the same time, D&A can unearth new questions, as well as innovative solutions and ...

  23. Research a Topic

    Suggested e-books: Data Analytics - Research Methods. Data Analysis by Michael S. Lewis-Beck. Call Number: Available Online. ISBN: 9780803957725. Publication Date: 1995-01-17. Rigorous Data Analysis by Wolff-Michael Roth. Call Number: Available Online. ISBN: 9789462099982. Publication Date: 2015-02-28.

  24. The High Cost of Misaligned Business and Analytics Goals

    In our previous research, we found that capitalizing on data and analytics requires creating a data culture, obtaining senior leadership commitment, acquiring data and analytics skills and ...

  25. New Software Tool Generates Data Analysis for Complex ToF-SIMS

    Compared to other software packages available, the advantage of the new Python-based package is that a detailed Microsoft Word format data analysis report can be automatically generated, greatly facilitating less experienced users in understanding chemical differences among samples. This saves users a lot of time in data report processing.

  26. Research Topics

    Coordinate Descent. Covariance Estimation. Crowd Computing. Data Clustering. Deep Learning. Divide-and-Conquer Methods. Eigenvalue Decomposition. Empirical Risk Minimization. Gene-Disease Prediction.

  27. A Systematic Literature Review and Bibliometric Analysis of Semantic

    Recent advancements in deep learning have spurred the development of numerous novel semantic segmentation models for land cover mapping, showcasing exceptional performance in delineating precise boundaries and producing highly accurate land cover maps. However, to date, no systematic literature review has comprehensively examined semantic segmentation models in the context of land cover ...

  28. Analysis of data suggests homosexual behavior in other animals is far

    A team of anthropologists and biologists from Canada, Poland, and the U.S., working with researchers at the American Museum of Natural History, in New York, has found via meta-analysis of data ...

  29. ISPOR

    OBJECTIVES : Blue Cross and Blue Shield of Louisiana (BCBSLA) analyzed Emergency Department (ED) utilization among race/ethnicity groups to better understand current practices across the state. Researchers observed higher ED utilization among Black/African American members, which led to an investigation of drivers such as Social Vulnerability Index (SVI), medication adherence, health ...