Research Guides

Multiple Case Studies

Nadia Alqahtani and Pengtong Qu

Description

The case study approach is popular across disciplines, including education, anthropology, sociology, psychology, medicine, law, and political science (Creswell, 2013). It is both a research method and a strategy (Creswell, 2013; Yin, 2017). In this type of research design, a case can be an individual, an event, or an entity, as determined by the research questions. There are two variants of the case study: the single-case study and the multiple-case study. The former can be used to study and understand an unusual case, a critical case, a longitudinal case, or a revelatory case. A multiple-case study, by contrast, includes two or more cases, or replications across cases, to investigate the same phenomenon (Lewis-Beck, Bryman & Liao, 2003; Yin, 2017).

Single- and multiple-case studies differ in research design but sit within the same methodological framework (Yin, 2017). Multiple cases are selected so that “individual case studies either (a) predict similar results (a literal replication) or (b) predict contrasting results but for anticipatable reasons (a theoretical replication)” (Yin, 2017, p. 55). When the purpose of the study is to compare and replicate findings, a multiple-case study produces more compelling evidence and is therefore considered more robust than a single-case study (Yin, 2017).

To write a multiple-case study, a summary of individual cases should be reported, and researchers need to draw cross-case conclusions and form a cross-case report (Yin, 2017). With evidence from multiple cases, researchers may have generalizable findings and develop theories (Lewis-Beck, Bryman & Liao, 2003).

Creswell, J. W. (2013). Qualitative inquiry and research design: Choosing among five approaches (3rd ed.). Los Angeles, CA: Sage.

Lewis-Beck, M., Bryman, A. E., & Liao, T. F. (2003). The Sage encyclopedia of social science research methods. Los Angeles, CA: Sage.

Yin, R. K. (2017). Case study research and applications: Design and methods. Los Angeles, CA: Sage.

Key Research Books and Articles on Multiple Case Study Methodology

Yin discusses how to decide if a case study should be used in research. Novice researchers can learn about research design, data collection, and data analysis of different types of case studies, as well as writing a case study report.

Chapter 2 introduces four major types of research design in case studies: holistic single-case design, embedded single-case design, holistic multiple-case design, and embedded multiple-case design. Novice researchers will learn about the definitions and characteristics of different designs. This chapter also teaches researchers how to examine and discuss the reliability and validity of the designs.

Creswell, J. W., & Poth, C. N. (2017). Qualitative inquiry and research design: Choosing among five approaches. Los Angeles, CA: Sage.

This book compares five qualitative research designs: narrative research, phenomenology, grounded theory, ethnography, and case study. Using text and tables, it compares the characteristics, data collection, data analysis and representation, validity, and writing-up procedures of the five inquiry approaches. For each approach, the authors introduce the definition, features, types, and procedures and contextualize these components in a study conducted with that approach. Each chapter ends with a list of relevant readings on the inquiry approach.

This book invites readers to compare these five qualitative methods and see the value of each approach. Readers can consider which approach would best serve their research contexts and questions, as well as how to design their research and conduct the data analysis based on their choice of research method.

Günes, E., & Bahçivan, E. (2016). A multiple case study of preservice science teachers’ TPACK: Embedded in a comprehensive belief system. International Journal of Environmental and Science Education, 11(15), 8040-8054.

In this article, the researchers show the importance of using technological opportunities to improve the education process and how these opportunities enhance students’ learning in science education. The study examined the connection between Technological Pedagogical Content Knowledge (TPACK) and the belief system in a science teaching context. The researchers used a multiple-case study to explore how preservice science teachers’ (PSTs’) beliefs relate to their TPACK level. The participants were three preservice teachers with low, medium, and high levels of TPACK confidence. The data, collected through individual semi-structured interviews with the participants about their lesson plans, were analyzed using content analysis. The study first discussed each case, then compared features and relations across cases. The researchers found a positive relationship between PSTs’ TPACK confidence and TPACK level: participants with higher TPACK confidence had a more competent TPACK level, and vice versa.

Recent Dissertations Using Multiple Case Study Methodology

Milholland, E. S. (2015). A multiple case study of instructors utilizing Classroom Response Systems (CRS) to achieve pedagogical goals. Retrieved from ProQuest Dissertations & Theses Global. (Order Number 3706380)

The researcher of this study critiques the use of Classroom Response Systems (CRS) by five instructors who had begun using the system in their classrooms five years earlier. Using a multiple-case study methodology, the researcher interviewed each instructor about their initial pedagogical goals, the changes in pedagogy during teaching, and the teaching techniques they used with the CRS, and then categorized themes. He found that all instructors changed their goals while using CRS; they decided to reduce lecturing time and to spend more time engaging students in interactive activities. This study also demonstrated that CRS was useful for helping the instructors achieve multiple learning goals; all the instructors provided examples of the positive aspects of implementing CRS in their classrooms.

Li, C. L. (2010). The emergence of fairy tale literacy: A multiple case study on promoting critical literacy of children through a juxtaposed reading of classic fairy tales and their contemporary disruptive variants. Retrieved from ProQuest Dissertations & Theses Global. (Order Number 3572104)

To explore how children’s development of critical literacy can be affected by their reactions to fairy tales, the author conducted a multiple-case study with four cases, in which each child was a unit of analysis. Two Chinese immigrant children (a boy and a girl) and two American children (a boy and a girl) in the second or third grade were recruited for the study. The data were collected through interviews, discussions of fairy tales, and drawings. The analysis was conducted both within individual cases and across cases. Across the four cases, the researcher found that the young children’s knowledge of traditional fairy tales was built upon mass-media adaptations. The children believed that the mass-media representations were the original stories, even though fairy tales are included in the elementary school curriculum. The author also found that introducing classic versions of fairy tales increased children’s knowledge of the genre’s origin, which would benefit their understanding of the genre. She argued that introducing fairy tales can be the first step in promoting children’s development of critical literacy.

Asher, K. C. (2014). Mediating occupational socialization and occupational individuation in teacher education: A multiple case study of five elementary pre-service student teachers. Retrieved from ProQuest Dissertations & Theses Global. (Order Number 3671989)

This study portrayed five pre-service teachers’ experiences during their student teaching phase and explored how pre-service teachers mediate their occupational socialization with occupational individuation. The study used a multiple-case study design and recruited five pre-service teachers from a Midwestern university as the five cases. Qualitative data were collected through interviews, classroom observations, and field notes. The author conducted a case study analysis and found five strategies that the participants used to mediate occupational socialization with occupational individuation. These strategies were: 1) hindering from practicing their beliefs, 2) mimicking the styles of supervising teachers, 3) teaching in ways aligned with the school’s existing practice, 4) enacting their own ideas, and 5) integrating and balancing occupational socialization and occupational individuation. The study also provided recommendations and implications for policymakers and educators in teacher education so that pre-service teachers can be better supported.

Multiple Case Studies Copyright © 2019 by Nadia Alqahtani and Pengtong Qu is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.

Multiple Case Study Data Analysis for Doctoral Researchers in Management and Leadership

12 Pages Posted: 25 Apr 2023

Daphne Halkias

EIM European Institute of Management

Michael Neubert

Nick Harkiolakis

New England College

Date Written: April 19, 2023

Multiple cases may be conducted for several reasons: they extend emergent theory, fill theoretical categories, provide examples of polar types, or replicate previously selected cases to discover new theoretical directions. However, qualitative data analysis for multiple case studies is a multi-step process that can be challenging for doctoral researchers. This article thus outlines the qualitative data analysis process for a doctoral-level multiple case study in management and leadership, including conducting descriptive coding and cross-case synthesis and ensuring the trustworthiness of multiple case study results. The descriptive coding strategy assigns meaning to segments of the raw data, allowing words and phrases to emerge for further categorization and thematic analysis. Cross-case synthesis involves comparing and contrasting cases rather than just analyzing the content of individual cases. Combined with within-case analysis, it offers a more consistent platform for generating theoretical propositions and extending theory. Finally, four quality criteria of trustworthiness recommended by seminal qualitative methodologists Lincoln and Guba (1986) are reviewed, along with the preferred methods and the authors’ recommendations to strengthen the trustworthiness of a multiple case study. Case study researchers will continue to play a pivotal role in offering a voice as to how people, places, and events continually shape and reshape today’s business and technology transactions across nations’ regional and local communities.

Keywords: Qualitative, data analysis, multiple case study, doctoral researcher, management, leadership, descriptive coding, cross-case synthesis, trustworthiness, methodology
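The descriptive coding and cross-case synthesis steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the case identifiers, code labels, and the helper names `build_case_code_matrix` and `cross_case_synthesis` are all hypothetical, and the sketch assumes coding has already been done by hand so that each data segment carries one descriptive code.

```python
from collections import defaultdict

# Hypothetical coded segments: (case_id, descriptive_code).
# In a real study these would come from manual coding of interview transcripts.
coded_segments = [
    ("case_A", "mentoring"),
    ("case_A", "resource_constraints"),
    ("case_A", "mentoring"),
    ("case_B", "mentoring"),
    ("case_B", "digital_tools"),
    ("case_C", "mentoring"),
    ("case_C", "resource_constraints"),
]


def build_case_code_matrix(segments):
    """Within-case step: count how often each code appears in each case."""
    matrix = defaultdict(lambda: defaultdict(int))
    for case_id, code in segments:
        matrix[case_id][code] += 1
    return {case: dict(codes) for case, codes in matrix.items()}


def cross_case_synthesis(matrix):
    """Cross-case step: separate codes found in every case from codes found in only some."""
    cases = list(matrix)
    all_codes = set().union(*(matrix[c].keys() for c in cases))
    shared = {code for code in all_codes if all(code in matrix[c] for c in cases)}
    partial = {
        code: sorted(c for c in cases if code in matrix[c])
        for code in all_codes - shared
    }
    return sorted(shared), partial


matrix = build_case_code_matrix(coded_segments)
shared_codes, partial_codes = cross_case_synthesis(matrix)
print("Codes present in every case:", shared_codes)
print("Codes present in some cases:", partial_codes)
```

Codes that recur in every case are candidates for cross-case themes, while codes confined to some cases point to contrasts worth explaining. The counts only organize the evidence; interpreting the similarities and differences remains the researcher's task.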

Multiple Case Research Design

  • First Online: 10 November 2021

  • Stefan Hunziker
  • Michael Blankenagel

This chapter addresses the peculiarities, characteristics, and major fallacies of multiple case research designs. The major advantage of multiple case research lies in cross-case analysis. A multiple case research design shifts the focus from understanding a single case to the differences and similarities between cases. Thus, it is not just a matter of conducting more (second, third, etc.) case studies. Rather, it is the next step in developing a theory about the factors driving differences and similarities. Researchers will also find relevant information on how to write a multiple case research design paper and learn about typical methodologies used for this research design. The chapter closes by referring to overlapping and adjacent research designs.

Author information

Authors and affiliations.

Wirtschaft/IFZ – Campus Zug-Rotkreuz, Hochschule Luzern, Zug-Rotkreuz, Zug, Switzerland

Stefan Hunziker & Michael Blankenagel

Corresponding author

Correspondence to Stefan Hunziker.

© 2021 The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature

About this chapter

Hunziker, S., Blankenagel, M. (2021). Multiple Case Research Design. In: Research Design in Business and Management. Springer Gabler, Wiesbaden. https://doi.org/10.1007/978-3-658-34357-6_9

Publisher Name: Springer Gabler, Wiesbaden

Print ISBN: 978-3-658-34356-9

Online ISBN: 978-3-658-34357-6

eBook Packages: Business and Economics (German Language)

Qualitative case study data analysis: an example from practice

Affiliation: School of Nursing and Midwifery, National University of Ireland, Galway, Republic of Ireland.

  • PMID: 25976531
  • DOI: 10.7748/nr.22.5.8.e1307

Aim: To illustrate an approach to data analysis in qualitative case study methodology.

Background: There is often little detail in case study research about how data were analysed. However, it is important that comprehensive analysis procedures are used because there are often large sets of data from multiple sources of evidence. Furthermore, the ability to describe in detail how the analysis was conducted ensures rigour in reporting qualitative research.

Data sources: The research example used is a multiple case study that explored the role of the clinical skills laboratory in preparing students for the real world of practice. Data analysis was conducted using a framework guided by the four stages of analysis outlined by Morse (1994): comprehending, synthesising, theorising and recontextualising. The specific strategies for analysis in these stages centred on the work of Miles and Huberman (1994), which has been successfully used in case study research. The data were managed using NVivo software.

Review methods: Literature examining qualitative data analysis was reviewed and strategies illustrated by the case study example provided.

Discussion: Each stage of the analysis framework is described with illustration from the research example for the purpose of highlighting the benefits of a systematic approach to handling large data sets from multiple sources.

Conclusion: By providing an example of how each stage of the analysis was conducted, it is hoped that researchers will be able to consider the benefits of such an approach to their own case study analysis.

Implications for research/practice: This paper illustrates specific strategies that can be employed when conducting data analysis in case study research and other qualitative research designs.

Keywords: Case study data analysis; case study research methodology; clinical skills research; qualitative case study methodology; qualitative data analysis; qualitative research.

Case Study – Methods, Examples and Guide

Case Study Research

A case study is a research method that involves an in-depth examination and analysis of a particular phenomenon or case, such as an individual, organization, community, event, or situation.

It is a qualitative research approach that aims to provide a detailed and comprehensive understanding of the case being studied. Case studies typically involve multiple sources of data, including interviews, observations, documents, and artifacts, which are analyzed using various techniques, such as content analysis, thematic analysis, and grounded theory. The findings of a case study are often used to develop theories, inform policy or practice, or generate new research questions.

Types of Case Study

The main types of case study are as follows:

Single-Case Study

A single-case study is an in-depth analysis of a single case. This type of case study is useful when the researcher wants to understand a specific phenomenon in detail.

For example, a researcher might conduct a single-case study on a particular individual to understand their experiences with a particular health condition, or on a specific organization to explore its management practices. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as content analysis or thematic analysis. The findings of a single-case study are often used to generate new research questions, develop theories, or inform policy or practice.

Multiple-Case Study

A multiple-case study involves the analysis of several cases that are similar in nature. This type of case study is useful when the researcher wants to identify similarities and differences between the cases.

For example, a researcher might conduct a multiple-case study on several companies to explore the factors that contribute to their success or failure. The researcher collects data from each case, analyzes the data using techniques such as comparative analysis or pattern-matching, and compares and contrasts the findings across cases. The findings of a multiple-case study can be used to develop theories, inform policy or practice, or generate new research questions.
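As a rough illustration of pattern-matching across cases, the sketch below compares a theoretically predicted pattern with what was observed in each company. Everything in it is an assumption made for illustration: the company names, the factor names, the true/false coding of each factor, and the helper `match_pattern` are hypothetical, and a numeric match score is a simplification of what is, in practice, an interpretive judgment.

```python
# Hypothetical predicted pattern: what the theory expects of a successful company.
predicted_pattern = {
    "strong_leadership": True,
    "early_market_entry": True,
    "high_debt": False,
    "staff_training": True,
}

# Hypothetical observations, coded as present (True) or absent (False) for each case.
observed_cases = {
    "company_A": {
        "strong_leadership": True,
        "early_market_entry": True,
        "high_debt": False,
        "staff_training": True,
    },
    "company_B": {
        "strong_leadership": True,
        "early_market_entry": False,
        "high_debt": False,
        "staff_training": True,
    },
}


def match_pattern(predicted, observed):
    """Return the share of predicted factors the case matches, plus the mismatched factors."""
    mismatches = sorted(
        factor for factor, expected in predicted.items()
        if observed.get(factor) != expected
    )
    matched = len(predicted) - len(mismatches)
    return matched / len(predicted), mismatches


results = {
    case: match_pattern(predicted_pattern, observations)
    for case, observations in observed_cases.items()
}
for case, (score, mismatches) in results.items():
    print(case, score, mismatches)
```

A case that matches the predicted pattern supports the theory, while a mismatch, such as the hypothetical `early_market_entry` factor for `company_B`, marks a difference between cases that the researcher would need to explain.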

Exploratory Case Study

An exploratory case study is used to explore a new or understudied phenomenon. This type of case study is useful when the researcher wants to generate hypotheses or theories about the phenomenon.

For example, a researcher might conduct an exploratory case study on a new technology to understand its potential impact on society. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as grounded theory or content analysis. The findings of an exploratory case study can be used to generate new research questions, develop theories, or inform policy or practice.

Descriptive Case Study

A descriptive case study is used to describe a particular phenomenon in detail. This type of case study is useful when the researcher wants to provide a comprehensive account of the phenomenon.

For example, a researcher might conduct a descriptive case study on a particular community to understand its social and economic characteristics. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as content analysis or thematic analysis. The findings of a descriptive case study can be used to inform policy or practice or generate new research questions.

Instrumental Case Study

An instrumental case study is used to understand a particular phenomenon that is instrumental in achieving a particular goal. This type of case study is useful when the researcher wants to understand the role of the phenomenon in achieving the goal.

For example, a researcher might conduct an instrumental case study on a particular policy to understand its impact on achieving a particular goal, such as reducing poverty. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as content analysis or thematic analysis. The findings of an instrumental case study can be used to inform policy or practice or generate new research questions.

Case Study Data Collection Methods

Here are some common data collection methods for case studies:

Interviews

Interviews involve asking questions to individuals who have knowledge or experience relevant to the case study. Interviews can be structured (where the same questions are asked to all participants) or unstructured (where the interviewer follows up on the responses with further questions). Interviews can be conducted in person, over the phone, or through video conferencing.

Observations

Observations involve watching and recording the behavior and activities of individuals or groups relevant to the case study. Observations can be participant (where the researcher actively participates in the activities) or non-participant (where the researcher observes from a distance). Observations can be recorded using notes, audio or video recordings, or photographs.

Documents

Documents can be used as a source of information for case studies. Documents can include reports, memos, emails, letters, and other written materials related to the case study. Documents can be collected from the case study participants or from public sources.

Surveys

Surveys involve asking a set of questions to a sample of individuals relevant to the case study. Surveys can be administered in person, over the phone, through mail or email, or online. Surveys can be used to gather information on attitudes, opinions, or behaviors related to the case study.

Artifacts

Artifacts are physical objects relevant to the case study. Artifacts can include tools, equipment, products, or other objects that provide insights into the case study phenomenon.

How to conduct Case Study Research

Conducting case study research involves several steps that need to be followed to ensure the quality and rigor of the study. Here are the steps to conduct case study research:

  • Define the research questions: The first step in conducting case study research is to define the research questions. The research questions should be specific, measurable, and relevant to the case study phenomenon under investigation.
  • Select the case: The next step is to select the case or cases to be studied. The case should be relevant to the research questions and should provide rich and diverse data that can be used to answer the research questions.
  • Collect data: Data can be collected using various methods, such as interviews, observations, documents, surveys, and artifacts. The data collection method should be selected based on the research questions and the nature of the case study phenomenon.
  • Analyze the data: The data collected from the case study should be analyzed using various techniques, such as content analysis, thematic analysis, or grounded theory. The analysis should be guided by the research questions and should aim to provide insights and conclusions relevant to the research questions.
  • Draw conclusions: The conclusions drawn from the case study should be based on the data analysis and should be relevant to the research questions. The conclusions should be supported by evidence and should be clearly stated.
  • Validate the findings: The findings of the case study should be validated by reviewing the data and the analysis with participants or other experts in the field. This helps to ensure the validity and reliability of the findings.
  • Write the report: The final step is to write the report of the case study research. The report should provide a clear description of the case study phenomenon, the research questions, the data collection methods, the data analysis, the findings, and the conclusions. The report should be written in a clear and concise manner and should follow the guidelines for academic writing.

Examples of Case Study

Here are some examples of case study research:

  • The Hawthorne Studies: Conducted between 1924 and 1932, the Hawthorne Studies were a series of studies associated with Elton Mayo and his colleagues that examined the impact of the work environment on employee productivity. The studies took place at the Hawthorne Works plant of the Western Electric Company in Cicero, Illinois, near Chicago, and included interviews, observations, and experiments.
  • The Stanford Prison Experiment: Conducted in 1971 by Philip Zimbardo, the Stanford Prison Experiment is often discussed as a case study of the psychological effects of power and authority. The study simulated a prison environment and assigned participants to the role of guard or prisoner. It was controversial because of the ethical issues it raised.
  • The Challenger Disaster: The explosion of the Space Shuttle Challenger in 1986 has been the subject of case studies examining its causes. Such studies have drawn on interviews, observations, and analysis of data to identify the technical, organizational, and cultural factors that contributed to the disaster.
  • The Enron Scandal: The bankruptcy of the Enron Corporation in 2001 has been the subject of case studies examining its causes. Such studies have drawn on interviews, analysis of financial data, and review of documents to identify the accounting practices, corporate culture, and ethical issues that led to the company’s downfall.
  • The Fukushima Nuclear Disaster: The nuclear accident at the Fukushima Daiichi Nuclear Power Plant in Japan in 2011 has been the subject of case studies examining its causes. Such studies have drawn on interviews, analysis of data, and review of documents to identify the technical, organizational, and cultural factors that contributed to the disaster.

Application of Case Study

Case studies have a wide range of applications across various fields and industries. Here are some examples:

Business and Management

Case studies are widely used in business and management to examine real-life situations and develop problem-solving skills. Case studies can help students and professionals to develop a deep understanding of business concepts, theories, and best practices.

Healthcare

Case studies are used in healthcare to examine patient care, treatment options, and outcomes. Case studies can help healthcare professionals to develop critical thinking skills, diagnose complex medical conditions, and develop effective treatment plans.

Education

Case studies are used in education to examine teaching and learning practices. Case studies can help educators to develop effective teaching strategies, evaluate student progress, and identify areas for improvement.

Social Sciences

Case studies are widely used in social sciences to examine human behavior, social phenomena, and cultural practices. Case studies can help researchers to develop theories, test hypotheses, and gain insights into complex social issues.

Law and Ethics

Case studies are used in law and ethics to examine legal and ethical dilemmas. Case studies can help lawyers, policymakers, and ethical professionals to develop critical thinking skills, analyze complex cases, and make informed decisions.

Purpose of Case Study

The purpose of a case study is to provide a detailed analysis of a specific phenomenon, issue, or problem in its real-life context. A case study is a qualitative research method that involves the in-depth exploration and analysis of a particular case, which can be an individual, group, organization, event, or community.

The primary purpose of a case study is to generate a comprehensive and nuanced understanding of the case, including its history, context, and dynamics. Case studies can help researchers to identify and examine the underlying factors, processes, and mechanisms that contribute to the case and its outcomes. This can help to develop a more accurate and detailed understanding of the case, which can inform future research, practice, or policy.

Case studies can also serve other purposes, including:

  • Illustrating a theory or concept: Case studies can be used to illustrate and explain theoretical concepts and frameworks, providing concrete examples of how they can be applied in real-life situations.
  • Developing hypotheses: Case studies can help to generate hypotheses about the causal relationships between different factors and outcomes, which can be tested through further research.
  • Providing insight into complex issues: Case studies can provide insights into complex and multifaceted issues, which may be difficult to understand through other research methods.
  • Informing practice or policy: Case studies can be used to inform practice or policy by identifying best practices, lessons learned, or areas for improvement.

Advantages of Case Study Research

There are several advantages of case study research, including:

  • In-depth exploration: Case study research allows for a detailed exploration and analysis of a specific phenomenon, issue, or problem in its real-life context. This can provide a comprehensive understanding of the case and its dynamics, which may not be possible through other research methods.
  • Rich data: Case study research can generate rich and detailed data, including qualitative data such as interviews, observations, and documents. This can provide a nuanced understanding of the case and its complexity.
  • Holistic perspective: Case study research allows for a holistic perspective of the case, taking into account the various factors, processes, and mechanisms that contribute to the case and its outcomes. This can help to develop a more accurate and comprehensive understanding of the case.
  • Theory development: Case study research can help to develop and refine theories and concepts by providing empirical evidence and concrete examples of how they can be applied in real-life situations.
  • Practical application: Case study research can inform practice or policy by identifying best practices, lessons learned, or areas for improvement.
  • Contextualization: Case study research takes into account the specific context in which the case is situated, which can help to understand how the case is influenced by the social, cultural, and historical factors of its environment.

Limitations of Case Study Research

There are several limitations of case study research, including:

  • Limited generalizability: Case studies are typically focused on a single case or a small number of cases, which limits the generalizability of the findings. The unique characteristics of the case may not be applicable to other contexts or populations, which may limit the external validity of the research.
  • Biased sampling: Case studies may rely on purposive or convenience sampling, which can introduce bias into the sample selection process. This may limit the representativeness of the sample and the generalizability of the findings.
  • Subjectivity: Case studies rely on the interpretation of the researcher, which can introduce subjectivity into the analysis. The researcher’s own biases, assumptions, and perspectives may influence the findings, which may limit the objectivity of the research.
  • Limited control: Case studies are typically conducted in naturalistic settings, which limits the control that the researcher has over the environment and the variables being studied. This may limit the ability to establish causal relationships between variables.
  • Time-consuming: Case studies can be time-consuming to conduct, as they typically involve a detailed exploration and analysis of a specific case. This may limit the feasibility of conducting multiple case studies or conducting case studies in a timely manner.
  • Resource-intensive: Case studies may require significant resources, including time, funding, and expertise. This may limit the ability of researchers to conduct case studies in resource-constrained settings.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer

1st Edition

The Multiple Case Study Design Methodology and Application for Management Education

Description

Most organizations today operate in volatile economic and social environments and qualitative research plays an essential role in investigating leadership and management problems. This unique volume offers novice and experienced researchers a brief, student-centric research methods text specifically devoted to the multiple case study design. The multiple case study design is a valuable qualitative research tool in studying the links between the personal, social, behavioral, psychological, organizational, cultural, and environmental factors that guide organizational and leadership development. Case study research is essential for the in-depth study of participants' perspectives on the phenomenon within its natural context. Rigorously designed management and leadership case studies in the extant literature have a central focus on individual managers' and leaders' stories and their perceptions of the broader forces operating within and outside their organizations. This is a comprehensive methodology book exploring the multiple case study design with step-by-step and easily accessible guidelines on the topic, making it especially valuable to researchers, academics, and students in the areas of business, management, and leadership.

Daphne Halkias is Professor and Distinguished Research Fellow at École des Ponts Business School in Paris, France. Michael Neubert is Associate Professor in Business and Management Studies and a Member of the Academic Council at UIBS in Zurich, Switzerland. Paul W. Thurman is Professor of Management and Analytics at Columbia University's Mailman School of Public Health, New York, USA. Nicholas Harkiolakis is on the Faculty of the School of Technology at Northcentral University, San Diego, California, USA.

10 Real World Data Science Case Studies Projects with Example

Top 10 Data Science Case Studies Projects with Examples and Solutions in Python to inspire your data science learning in 2023.

Data science has been a trending buzzword in recent times. With wide applications in sectors like healthcare, education, retail, transportation, media, and banking, data science is at the core of nearly every industry. The possibilities are endless, from fraud analysis in the finance sector to personalized recommendations in eCommerce. We have developed ten data science case studies to explain how data science is leveraged across various industries to make smarter decisions and develop innovative personalized products tailored to specific customers.

Table of Contents

  • Data science case studies in retail
  • Data science case study examples in entertainment industry
  • Data analytics case study examples in travel industry
  • Case studies for data analytics in social media
  • Real world data science projects in healthcare
  • Data analytics case studies in oil and gas
  • What is a case study in data science?
  • How do you prepare a data science case study?
  • 10 most interesting data science case studies with examples

So, without much ado, let's get started with data science business case studies!

1) Walmart

With humble beginnings as a simple discount retailer, Walmart today operates 10,500 stores and clubs in 24 countries, along with eCommerce websites, and employs around 2.2 million people around the globe. For the fiscal year ended January 31, 2021, Walmart's total revenue was $559 billion, a growth of $35 billion driven by the expansion of the eCommerce sector. Walmart is a data-driven company that works on the principle of 'Everyday low cost' for its consumers. To achieve this goal, it depends heavily on its data science and analytics research and development arm, known as Walmart Labs. Walmart is home to the world's largest private cloud, which can manage 2.5 petabytes of data every hour. To analyze this enormous amount of data, Walmart has created 'Data Café,' a state-of-the-art analytics hub located within its Bentonville, Arkansas headquarters. The Walmart Labs team invests heavily in building and managing technologies like cloud, data, DevOps, infrastructure, and security.

Walmart is experiencing massive digital growth as the world's largest retailer. Walmart has been leveraging big data and advances in data science to build solutions that enhance, optimize, and customize the shopping experience and serve its customers better. At Walmart Labs, data scientists are focused on creating data-driven solutions that power the efficiency and effectiveness of complex supply chain management processes. Here are some of the applications of data science at Walmart:

i) Personalized Customer Shopping Experience

Walmart analyzes customer preferences and shopping patterns to optimize the stocking and displaying of merchandise in its stores. Analysis of big data also helps the company understand new item sales, decide which products to discontinue, and track the performance of brands.

ii) Order Sourcing and On-Time Delivery Promise

Millions of customers view items on Walmart.com, and Walmart provides each customer a real-time estimated delivery date for the items purchased. Walmart runs a backend algorithm that estimates this based on the distance between the customer and the fulfillment center, inventory levels, and shipping methods available. The supply chain management system determines the optimum fulfillment center based on distance and inventory levels for every order. It also has to decide on the shipping method to minimize transportation costs while meeting the promised delivery date.
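The sourcing decision described above can be sketched in a few lines. This is a minimal illustration, not Walmart's algorithm: it assumes straight-line distance as the only cost, a fixed delivery speed, and hypothetical center names, coordinates, and stock levels.

```python
from dataclasses import dataclass
from math import asin, ceil, cos, radians, sin, sqrt


@dataclass
class Center:
    name: str
    lat: float
    lon: float
    stock: int  # units of the ordered item on hand


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def pick_center(centers, cust_lat, cust_lon, quantity):
    """Choose the nearest center that has enough stock, or None if none does."""
    eligible = [c for c in centers if c.stock >= quantity]
    if not eligible:
        return None
    return min(eligible, key=lambda c: haversine_km(c.lat, c.lon, cust_lat, cust_lon))


def estimate_days(distance_km, km_per_day=500.0, handling_days=1):
    """Assumed rule of thumb: one handling day plus whole days in transit."""
    return handling_days + ceil(distance_km / km_per_day)


# Hypothetical data: the closest center (Dallas) is out of stock.
centers = [
    Center("Dallas", 32.78, -96.80, stock=0),
    Center("Atlanta", 33.75, -84.39, stock=5),
    Center("Reno", 39.53, -119.81, stock=10),
]
chosen = pick_center(centers, 29.76, -95.37, quantity=2)  # customer in Houston
```

A production system would also weigh shipping method and transportation cost, as the text notes; those factors are left out here to keep the sketch short.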

iii) Packing Optimization 

Packing optimization, also known as box recommendation, is a daily occurrence in the shipping of items in retail and eCommerce businesses. Whenever the items of an order, or of multiple orders placed by the same customer, are picked from the shelf and are ready for packing, Walmart's box recommendation system determines, within a fixed amount of time, the best-sized box to hold all the ordered items with a minimum of in-box space wasted. This is the Bin Packing Problem, a classic NP-Hard problem familiar to data scientists.
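Because bin packing is NP-Hard, practical systems typically rely on heuristics. The sketch below shows two simple ones under a strong simplifying assumption: items and boxes are compared by volume only, ignoring shape. The function names and volumes are hypothetical, and this is not Walmart's recommender.

```python
def recommend_box(item_volumes, box_volumes):
    """Pick the smallest box whose volume holds all items, or None if none fits."""
    needed = sum(item_volumes)
    fitting = [b for b in box_volumes if b >= needed]
    return min(fitting) if fitting else None


def first_fit_decreasing(item_volumes, box_capacity):
    """Classic bin-packing heuristic: place the largest items first, each into
    the first box that still has room, opening a new box when none does."""
    boxes = []
    for volume in sorted(item_volumes, reverse=True):
        if volume > box_capacity:
            raise ValueError("item larger than the box capacity")
        for box in boxes:
            if sum(box) + volume <= box_capacity:
                box.append(volume)
                break
        else:
            boxes.append([volume])
    return boxes


single = recommend_box([4, 8, 1], [10, 15, 30])
packed = first_fit_decreasing([4, 8, 1, 4, 2, 1], box_capacity=10)
```

First-fit decreasing runs quickly and usually comes close to the optimal number of boxes, which is why it is a common baseline when an answer is needed within a fixed time budget.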

Here is a link to a sales prediction data science case study to help you understand the applications of data science in the real world. The Walmart Sales Forecasting Project uses historical sales data for 45 Walmart stores located in different regions. Each store contains many departments, and you must build a predictive model to project the sales for each department in each store. You can also try your hand at the Inventory Demand Forecasting Data Science Project to develop a machine learning model that forecasts inventory demand accurately based on historical sales data.

2) Amazon

Amazon is an American multinational technology company based in Seattle, USA. It started as an online bookseller, but today it focuses on eCommerce, cloud computing, digital streaming, and artificial intelligence. It hosts an estimated 1,000,000,000 gigabytes of data across more than 1,400,000 servers. Through its constant innovation in data science and big data, Amazon stays ahead in understanding its customers. Here are a few data analytics case study examples at Amazon:

i) Recommendation Systems

Data science models help Amazon understand customers' needs and recommend products before the customer searches for them; these models use collaborative filtering. Amazon uses data from 152 million customer purchases to help users decide which products to buy. The company generates 35% of its annual sales through recommendation-based systems (RBS).

Here is a Recommender System Project to help you build a recommendation system using collaborative filtering. 
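As a rough illustration of collaborative filtering, the sketch below scores items a user has not bought by their cosine similarity to items the user already owns, where similarity is computed from which users bought each item. The purchase data and function names are hypothetical; this is a toy item-based variant, not Amazon's system.

```python
from math import sqrt

# Hypothetical purchase history: user -> set of purchased items.
purchases = {
    "ana": {"tent", "stove", "lantern"},
    "ben": {"tent", "stove"},
    "cy": {"stove", "lantern"},
    "dee": {"novel"},
}


def item_users(purchases):
    """Invert the history into item -> set of users who bought it."""
    index = {}
    for user, items in purchases.items():
        for item in items:
            index.setdefault(item, set()).add(user)
    return index


def cosine(users_a, users_b):
    """Cosine similarity between two items represented as sets of buyers."""
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / sqrt(len(users_a) * len(users_b))


def recommend(user, purchases, top_n=2):
    """Rank unseen items by summed similarity to the user's purchases."""
    index = item_users(purchases)
    owned = purchases[user]
    scores = {}
    for candidate, buyers in index.items():
        if candidate in owned:
            continue
        score = sum(cosine(buyers, index[o]) for o in owned)
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


suggestions = recommend("ben", purchases)
```

Here "ben" is offered the lantern because other buyers of tents and stoves also bought one, while the unrelated novel scores zero and is dropped.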

ii) Retail Price Optimization

Amazon product prices are optimized using a predictive model that determines the best price so that users do not refuse to buy a product because of its price. The model determines the optimal price by considering the customer's likelihood of purchasing the product and how the price will affect the customer's future buying patterns. The price of a product is determined according to your activity on the website, competitors' pricing, product availability, item preferences, order history, expected profit margin, and other factors.

Check Out this Retail Price Optimization Project to build a Dynamic Pricing Model.
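The trade-off between price and purchase likelihood can be sketched as a small search over candidate prices. Everything here is an assumption made for illustration: the logistic demand curve, its reference price and sensitivity, and the function names. A real pricing model would be fitted to data and include many more factors.

```python
from math import exp


def purchase_probability(price, reference_price=20.0, sensitivity=0.3):
    """Assumed logistic demand curve: probability falls as price rises
    above the reference price and rises as it drops below."""
    return 1.0 / (1.0 + exp(sensitivity * (price - reference_price)))


def expected_profit(price, unit_cost):
    """Margin per sale weighted by the chance that the sale happens."""
    return (price - unit_cost) * purchase_probability(price)


def best_price(candidates, unit_cost):
    """Grid search: return the candidate price with the highest expected profit."""
    return max(candidates, key=lambda p: expected_profit(p, unit_cost))


optimal = best_price([12, 16, 20, 24, 28], unit_cost=10.0)
```

Low prices sell often but earn little per sale, and high prices earn more per sale but rarely sell; the search picks the point where the product of the two is largest.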

iii) Fraud Detection

Being a significant eCommerce business, Amazon remains at high risk of retail fraud. As a preemptive measure, the company collects historical and real-time data for every order. It uses Machine learning algorithms to find transactions with a higher probability of being fraudulent. This proactive measure has helped the company restrict clients with an excessive number of returns of products.

You can look at this Credit Card Fraud Detection Project to implement a fraud detection model to classify fraudulent credit card transactions.
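Before training a classifier, a simple statistical screen is often used as a baseline for spotting suspicious orders. The sketch below swaps in a robust outlier rule (modified z-score based on the median) in place of a machine learning model; the threshold, amounts, and function name are hypothetical.

```python
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds the threshold.

    The score uses the median and the median absolute deviation (MAD),
    which are less distorted by extreme values than mean and standard deviation.
    """
    med = median(amounts)
    mad = median([abs(a - med) for a in amounts])
    if mad == 0:
        return []  # all values (nearly) identical, so nothing stands out
    return [
        i for i, a in enumerate(amounts)
        if 0.6745 * abs(a - med) / mad > threshold
    ]


# Hypothetical order amounts; the last one is far outside the usual range.
order_amounts = [25, 30, 27, 31, 29, 26, 980]
suspicious = flag_outliers(order_amounts)
```

Flagged orders would then be reviewed or passed to a trained model; an outlier is a reason to look closer, not proof of fraud.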

Let us explore data analytics case study examples in the entertainment industry.

3) Netflix

Netflix started as a DVD rental service in 1997 and has since expanded into the streaming business. Headquartered in Los Gatos, California, Netflix is the largest content streaming company in the world. Netflix currently has over 208 million paid subscribers worldwide and supports streaming on thousands of smart devices, with around 3 billion hours watched every month. The secret to this massive growth and popularity is Netflix's advanced use of data analytics and recommendation systems to provide personalized and relevant content recommendations to its users. Netflix collects data from over 100 billion events every day. Here are a few examples of data analysis case studies applied at Netflix:

i) Personalized Recommendation System

Netflix uses over 1300 recommendation clusters based on consumer viewing preferences to provide a personalized experience. The data that Netflix collects from its users includes viewing time, platform searches for keywords, and metadata related to content abandonment, such as pause time, rewinds, and rewatches. Using this data, Netflix can predict what a viewer is likely to watch and give the user a personalized watchlist. Some of the algorithms used by the Netflix recommendation system are the Personalized Video Ranker, the Trending Now ranker, and the Continue Watching ranker.

ii) Content Development using Data Analytics

Netflix uses data science to analyze the behavior and patterns of its users and recognize the themes and categories that the masses prefer to watch. This data is used to produce shows like The Umbrella Academy, Orange Is the New Black, and The Queen's Gambit. These shows may seem like a huge risk, but they are based in large part on data analytics, which assured Netflix that they would succeed with its audience. Data analytics is helping Netflix come up with content that its viewers want to watch even before they know they want to watch it.

iii) Marketing Analytics for Campaigns

Netflix uses data analytics to find the right time to launch shows and ad campaigns for maximum impact on the target audience. Marketing analytics helps the company come up with different trailers and thumbnails for different groups of viewers. For example, the House of Cards Season 5 trailer with a giant American flag was launched during the American presidential elections, as it would resonate well with the audience.

Here is a Customer Segmentation Project using association rule mining to understand the primary grouping of customers based on various parameters.
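Association rule mining rests on two quantities: support (how often items occur together) and confidence (how often the right-hand item appears given the left-hand item). The sketch below computes both for item pairs. The baskets, thresholds, and function names are hypothetical, and full algorithms such as Apriori extend the same idea to larger itemsets.

```python
from itertools import combinations

# Hypothetical transactions, one set of items per basket.
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for pair rules above both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                found.append((lhs, rhs, pair_support, confidence))
    return found


rules = pair_rules(transactions)
```

A rule such as jam → bread says that baskets containing jam usually contain bread as well; rules like this can be used to group customers by the combinations they tend to buy.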

4) Spotify

In a world where purchasing music is a thing of the past and streaming music is the current trend, Spotify has emerged as one of the most popular streaming platforms. With 320 million monthly users, around 4 billion playlists, and approximately 2 million podcasts, Spotify leads the pack among well-known streaming platforms like Apple Music, Wynk, Songza, and Amazon Music. The success of Spotify has depended mainly on data analytics. By analyzing massive volumes of listener data, Spotify provides real-time and personalized services to its listeners. Most of Spotify's revenue comes from paid premium subscriptions. Here are some examples of data analytics case studies at Spotify that provide enhanced services to its listeners:

i) Personalization of Content using Recommendation Systems

Spotify uses BaRT (Bandits for Recommendations as Treatments) to generate music recommendations for its listeners in real time. BaRT ignores any song a user listens to for less than 30 seconds. The model is retrained every day to provide updated recommendations. A new patent granted to Spotify covers an AI application that identifies a user's musical tastes based on audio signals, gender, age, and accent to make better music recommendations.
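The 30-second rule can be illustrated with a toy feedback loop. The sketch below is a generic epsilon-greedy recommender, swapped in for illustration and not Spotify's model. It assumes that a play shorter than 30 seconds simply earns no positive credit; the class name, track names, and parameters are hypothetical.

```python
import random

MIN_LISTEN_SECONDS = 30  # plays shorter than this earn no positive signal


class EpsilonGreedyRecommender:
    """Mostly recommend the track with the best success rate so far,
    but explore a random track with probability `epsilon`."""

    def __init__(self, tracks, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.plays = {t: 0 for t in tracks}
        self.successes = {t: 0 for t in tracks}

    def success_rate(self, track):
        plays = self.plays[track]
        return self.successes[track] / plays if plays else 0.0

    def recommend(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.plays))  # explore
        return max(self.plays, key=self.success_rate)  # exploit

    def record(self, track, seconds_listened):
        """Update statistics after the listener plays (or skips) a track."""
        self.plays[track] += 1
        if seconds_listened >= MIN_LISTEN_SECONDS:
            self.successes[track] += 1


recommender = EpsilonGreedyRecommender(["track_a", "track_b"], epsilon=0.0)
recommender.record("track_a", seconds_listened=10)  # skipped early
recommender.record("track_b", seconds_listened=45)  # counted as a listen
choice = recommender.recommend()
```

With `epsilon` above zero, the recommender occasionally tries tracks with little history, which is how new songs get a chance to prove themselves.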

Spotify creates daily playlists for its listeners, called 'Daily Mixes,' based on their taste profiles. These contain songs the user has added to their playlists or songs by artists the user has included in their playlists. They also include new artists and songs that the user might be unfamiliar with but that might improve the playlist. Similar to these are the weekly 'Release Radar' playlists, which contain newly released songs by artists the listener follows or has liked before.

ii) Targeted Marketing through Customer Segmentation

Beyond enhancing personalized song recommendations, Spotify uses its massive user dataset for targeted ad campaigns and personalized service recommendations. Spotify uses ML models to analyze listeners' behavior and group them based on music preferences, age, gender, ethnicity, etc. These insights help the company create ad campaigns for a specific target audience. One of its well-known ad campaigns was the meme-inspired ads for potential target customers, which was a huge success globally.

iii) CNNs for Classification of Songs and Audio Tracks

Spotify builds audio models to evaluate songs and tracks, which helps develop better playlists and recommendations for its users. These models allow Spotify to filter new tracks based on their lyrics and rhythms and recommend them to users who like similar tracks (collaborative filtering). Spotify also uses NLP (natural language processing) to scan articles and blogs and analyze the words used to describe songs and artists. These analytical insights can help group and identify similar artists and songs and leverage them to build playlists.

Here is a Music Recommender System Project for you to start learning. We have listed another music recommendations dataset for you to use for your projects: Dataset1. You can use this dataset of Spotify metadata to classify songs based on artists, mood, and liveliness. Plot histograms and heatmaps to get a better understanding of the dataset. Use classification algorithms like logistic regression and SVM, along with principal component analysis for dimensionality reduction, to generate valuable insights from the dataset.

Below you will find case studies for data analytics in the travel and tourism industry.

5) Airbnb

Airbnb was born in 2007 in San Francisco and has since grown to 4 million hosts and 5.6 million listings worldwide, which have welcomed more than 1 billion guest arrivals in almost every country across the globe. Airbnb is active in every country on the planet except for Iran, Sudan, Syria, and North Korea. That is around 97.95% of the world. Treating data as the voice of its customers, Airbnb uses the large volume of customer reviews and host inputs to understand trends across communities and rate user experiences, and it uses these analytics to make informed decisions and build a better business model. The data scientists at Airbnb are developing new solutions to boost the business and find the best match between guests and hosts. Airbnb data servers serve approximately 10 million requests a day and process around one million search queries.

i) Recommendation Systems and Search Ranking Algorithms

Airbnb helps people find 'local experiences' in a place with the help of search algorithms that make searches and listings precise. Airbnb uses a 'listing quality score' to find homes based on the proximity to the searched location and uses previous guest reviews. Airbnb uses deep neural networks to build models that take the guest's earlier stays into account and area information to find a perfect match. The search algorithms are optimized based on guest and host preferences, rankings, pricing, and availability to understand users’ needs and provide the best match possible.
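A score that combines proximity and guest reviews might look like the sketch below. The weights, the distance cutoff, the rating prior, and all names are hypothetical choices made for illustration; Airbnb's actual listing quality score is not public in this form.

```python
def listing_score(distance_km, avg_rating, review_count, max_distance_km=10.0):
    """Blend proximity to the searched location with review quality (0 to 1)."""
    # Proximity falls linearly from 1 at the search point to 0 at the cutoff.
    proximity = max(0.0, 1.0 - distance_km / max_distance_km)
    # Pull ratings toward a neutral prior when a listing has few reviews,
    # so two perfect reviews do not outrank hundreds of very good ones.
    prior_rating, prior_weight = 4.0, 5
    adjusted = (avg_rating * review_count + prior_rating * prior_weight) / (
        review_count + prior_weight
    )
    return 0.5 * proximity + 0.5 * (adjusted / 5.0)


def rank_listings(listings):
    """Sort (name, distance_km, avg_rating, review_count) tuples, best first."""
    return sorted(listings, key=lambda l: listing_score(l[1], l[2], l[3]), reverse=True)


# Hypothetical search results.
listings = [
    ("cabin", 9.0, 5.0, 2),
    ("loft", 1.0, 4.8, 120),
    ("studio", 3.0, 3.5, 40),
]
ranked = [name for name, *_ in rank_listings(listings)]
```

The nearby, well-reviewed loft ranks first, while the distant cabin with only two reviews falls to the bottom despite its perfect average.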

ii) Natural Language Processing for Review Analysis

Airbnb characterizes data as the voice of its customers. Customer and host reviews give direct insight into the experience, which star ratings alone cannot fully capture. Hence Airbnb uses natural language processing to understand reviews and the sentiments behind them. The NLP models are developed using convolutional neural networks.

Practice this Sentiment Analysis Project for analyzing product reviews to understand the basic concepts of natural language processing.
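To get a feel for sentiment analysis before moving to neural models, a lexicon-based baseline is a common first step. The sketch below is that simpler technique, not a convolutional network; the word lists and the one-word negation rule are hypothetical and deliberately tiny.

```python
import re

# Hypothetical mini-lexicons; real lexicons contain thousands of entries.
POSITIVE = {"clean", "great", "friendly", "comfortable", "lovely"}
NEGATIVE = {"dirty", "noisy", "rude", "broken", "late"}
NEGATIONS = {"not", "never", "no"}


def sentiment(review):
    """Label a review by counting polarity words, flipping a word's polarity
    when it directly follows a negation such as 'not'."""
    tokens = re.findall(r"[a-z']+", review.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity and i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label = sentiment("The flat was clean and the host was friendly")
```

A baseline like this misses sarcasm and context, which is one reason learned models are preferred for production review analysis.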

iii) Smart Pricing using Predictive Analytics

Many Airbnb hosts use the service as a source of supplementary income. Vacation homes and guest houses rented to customers raise local community earnings, as Airbnb guests stay 2.4 times longer and spend approximately 2.3 times as much money as hotel guests. These earnings have a significant positive impact on the local neighborhood community. Airbnb uses predictive analytics to predict the prices of listings and help hosts set a competitive and optimal price. The overall profitability of an Airbnb host depends on factors like the time invested by the host and responsiveness to changing demand across seasons. The factors that affect real-time smart pricing are the location of the listing, proximity to transport options, season, and amenities available in the neighborhood of the listing.

Here is a Price Prediction Project to help you understand the concept of predictive analysis, which is common in case studies for data analytics.

6) Uber

Uber is the biggest global taxi service provider. As of December 2018, Uber had 91 million monthly active consumers and 3.8 million drivers, and it completed 14 million trips each day. Uber uses data analytics and big data-driven technologies to optimize its business processes and provide enhanced customer service. The data science team at Uber constantly explores futuristic technologies to provide better service. Machine learning and data analytics help Uber make data-driven decisions that enable benefits like ride-sharing, dynamic price surges, better customer support, and demand forecasting. Here are some of the real world data science projects used by Uber:

i) Dynamic Pricing for Price Surges and Demand Forecasting

Uber prices change at peak hours based on demand. Uber uses surge pricing to encourage more drivers to become available and meet passenger demand. When prices increase, both the driver and the passenger are informed about the surge in price. Uber uses a patented predictive model for price surging called 'Geosurge'. It is based on the demand for rides and the location.
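The idea of pricing by demand and location can be sketched with a simple ratio rule. This is an illustrative assumption, not the Geosurge model: the multiplier is the ratio of open ride requests to available drivers in a zone, never below 1.0 and capped at a maximum. Zone names, counts, and function names are hypothetical.

```python
def surge_multiplier(open_requests, available_drivers, cap=3.0):
    """Return a price multiplier between 1.0 and `cap` for one zone."""
    if available_drivers <= 0:
        return cap  # no supply at all: charge the maximum allowed
    ratio = open_requests / available_drivers
    return round(min(cap, max(1.0, ratio)), 2)


def zone_multipliers(demand, supply):
    """Compute a multiplier per zone from request and driver counts."""
    return {
        zone: surge_multiplier(requests, supply.get(zone, 0))
        for zone, requests in demand.items()
    }


# Hypothetical snapshot of two zones.
multipliers = zone_multipliers(
    demand={"downtown": 30, "airport": 8},
    supply={"downtown": 20, "airport": 16},
)
```

A forecasting model would replace the current counts with predicted demand and supply, so prices can adjust before a shortage develops.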

ii) One-Click Chat

Uber has developed a machine learning and natural language processing solution called One-Click Chat (OCC) for coordination between drivers and riders. This feature anticipates responses to commonly asked questions, making it easy for drivers to respond to customer messages. Drivers can reply with the click of just one button. One-Click Chat is built on Uber's machine learning platform, Michelangelo, to perform NLP on rider chat messages and generate appropriate responses to them.
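The flow of suggesting replies can be illustrated with a keyword-based stand-in for the intent model. The real system uses learned NLP models; the intents, keywords, and canned replies below are hypothetical and exist only to show the message-to-intent-to-reply pipeline.

```python
import re

# Hypothetical intents: name -> (trigger keywords, canned reply).
INTENTS = {
    "location": ({"where", "location", "find"}, "I'm at the pickup point."),
    "eta": ({"long", "when", "minutes", "eta"}, "I'll be there in a few minutes."),
    "confirm": ({"coming", "way", "still"}, "Yes, I'm on my way."),
}


def suggest_replies(message, max_replies=2):
    """Return canned replies for the intents that best match the message."""
    tokens = set(re.findall(r"[a-z']+", message.lower()))
    scored = []
    for name, (keywords, reply) in INTENTS.items():
        overlap = len(tokens & keywords)
        if overlap:
            # Negative overlap so that sorting puts the best match first.
            scored.append((-overlap, name, reply))
    return [reply for _, _, reply in sorted(scored)[:max_replies]]


replies = suggest_replies("Are you still on your way? How long?")
```

The driver would see these suggestions as buttons and send one with a single tap; messages that match no intent produce no suggestions.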

iii) Customer Retention

Failure to meet customer demand for cabs could lead users to opt for other services. Uber uses machine learning models to bridge this demand-supply gap. By using prediction models to forecast demand in any location, Uber retains its customers. Uber also uses a tier-based reward system, which segments customers into different levels based on usage. The higher the level a user achieves, the better the perks. Uber also provides personalized destination suggestions based on the user's history and frequently traveled destinations.

You can take a look at this Python Chatbot Project and build a simple chatbot application to better understand the techniques used for natural language processing. You can also practice building a demand forecasting model with this project using time series analysis. Another project uses time series forecasting and clustering on a dataset containing geospatial data to forecast customer demand for Ola rides.

7) LinkedIn 

LinkedIn is the largest professional social networking site with nearly 800 million members in more than 200 countries worldwide. Almost 40% of the users access LinkedIn daily, clocking around 1 billion interactions per month. The data science team at LinkedIn works with this massive pool of data to generate insights to build strategies, apply algorithms and statistical inferences to optimize engineering solutions, and help the company achieve its goals. Here are some of the real world data science projects at LinkedIn:

i) LinkedIn Recruiter: Search Algorithms and Recommendation Systems

LinkedIn Recruiter helps recruiters build and manage a talent pool to optimize the chances of hiring candidates successfully. This sophisticated product works on search and recommendation engines. LinkedIn Recruiter handles complex queries and filters on a large, constantly growing dataset, and the results delivered have to be relevant and specific. The initial search model was based on linear regression but was eventually upgraded to gradient boosted decision trees to capture non-linear relationships in the dataset. In addition to these models, LinkedIn Recruiter also uses a Generalized Linear Mixed (GLMix) model to improve prediction results and deliver personalized results.

ii) Recommendation Systems Personalized for News Feed

The LinkedIn news feed is the heart and soul of the professional community. A member's newsfeed is a place to discover conversations among connections, career news, posts, suggestions, photos, and videos. Every time a member visits LinkedIn, machine learning algorithms identify the best exchanges to be displayed on the feed by sorting through posts and ranking the most relevant results on top. The algorithms help LinkedIn understand member preferences and help provide personalized news feeds. The algorithms used include logistic regression, gradient boosted decision trees and neural networks for recommendation systems.

iii) CNNs to Detect Inappropriate Content

Providing a safe community where people can trust one another and express themselves professionally has been a critical goal at LinkedIn. LinkedIn has invested heavily in building solutions to detect fake accounts and abusive behavior on its platform. Any form of spam, harassment, or inappropriate content is immediately flagged and taken down. These can range from profanity to advertisements for illegal services. LinkedIn uses a machine learning model based on convolutional neural networks. This classifier trains on a dataset containing accounts labeled as either "inappropriate" or "appropriate." The inappropriate list consists of accounts with content containing "blocklisted" phrases or words and a small portion of manually reviewed accounts reported by the user community.
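The blocklist-based labeling step described above can be sketched roughly as follows. This only illustrates how training labels might be derived from blocklisted phrases; it is not LinkedIn's implementation, and the function name, phrases, and profile texts are hypothetical.

```python
import re


def label_by_blocklist(texts, blocklist):
    """Label a text 'inappropriate' if any blocklisted phrase occurs as whole words."""
    patterns = [
        re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        for phrase in blocklist
    ]
    labels = []
    for text in texts:
        hit = any(pattern.search(text) for pattern in patterns)
        labels.append("inappropriate" if hit else "appropriate")
    return labels


# Hypothetical blocklist and profile snippets, for illustration only.
blocklist = ["free money", "click here"]
profiles = [
    "Senior data engineer at a logistics firm",
    "FREE MONEY for everyone, click here now",
    "I clicked here once",
]
labels = label_by_blocklist(profiles, blocklist)
print(labels)
```

Labels produced this way are noisy, which is presumably why the source also mentions a manually reviewed portion; a classifier trained on them can then generalize beyond exact phrase matches.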

Here is a Text Classification Project to help you understand NLP basics for text classification. You can find a news recommendation system dataset to help you build a personalized news recommender system. You can also use this dataset to build a classifier using logistic regression, Naive Bayes, or Neural networks to classify toxic comments.


8) Pfizer

Pfizer is a multinational pharmaceutical company headquartered in New York, USA. It is one of the largest pharmaceutical companies globally, known for developing a wide range of medicines and vaccines in disciplines like immunology, oncology, cardiology, and neurology. Pfizer became a household name in 2020 when its COVID-19 vaccine was the first to receive emergency use authorization from the FDA. In early November 2021, the CDC recommended the Pfizer vaccine for children aged 5 to 11. Pfizer has been using machine learning and artificial intelligence to develop drugs and streamline trials, which played a massive role in developing and deploying the COVID-19 vaccine. Here are a few data analytics case studies by Pfizer:

i) Identifying Patients for Clinical Trials

Artificial intelligence and machine learning are used to streamline and optimize clinical trials to increase their efficiency. Natural language processing and exploratory data analysis of patient records can help identify suitable patients for clinical trials, including patients with distinct symptoms. They can also help examine interactions of potential trial members' specific biomarkers and predict drug interactions and side effects, which can help avoid complications. Pfizer's AI implementation helped rapidly identify signals within the noise of millions of data points across their 44,000-candidate COVID-19 clinical trial.

ii) Supply Chain and Manufacturing

Data science and machine learning techniques help pharmaceutical companies better forecast demand for vaccines and drugs and distribute them efficiently. Machine learning models can help identify efficient supply systems by automating and optimizing the production steps. These will help supply drugs customized to small pools of patients in specific gene pools. Pfizer uses machine learning to predict the maintenance cost of the equipment it uses. Predictive maintenance using AI is the next big step for pharmaceutical companies to reduce costs.

iii) Drug Development

Computer simulations of proteins, tests of their interactions, and yield analysis help researchers develop and test drugs more efficiently. In 2016, Watson Health and Pfizer announced a collaboration to utilize IBM Watson for Drug Discovery to help accelerate Pfizer's research in immuno-oncology, an approach to cancer treatment that uses the body's immune system to help fight cancer. Deep learning models have recently been used for bioactivity and synthesis prediction for drugs and vaccines, in addition to molecular design. Deep learning has been a revolutionary technique for drug discovery, as it factors in everything from new applications of medications to possible toxic reactions, which can save millions in drug trials.

You can create a Machine learning model to predict molecular activity to help design medicine using this dataset . You may build a CNN or a Deep neural network for this data analyst case study project.


9) Shell Data Analyst Case Study Project

Shell is a global group of energy and petrochemical companies with over 80,000 employees in around 70 countries. Shell uses advanced technologies and innovations to help build a sustainable energy future. Shell is going through a significant transition, as the world needs more and cleaner energy solutions, and aims to be a clean energy company by 2050. This requires substantial changes in the way energy is used. Digital technologies, including AI and machine learning, play an essential role in this transformation. Their applications include efficient exploration and energy production, more reliable manufacturing, more nimble trading, and a personalized customer experience. Using AI in various phases of the organization will help Shell achieve this goal and stay competitive in the market. Here are a few data analytics case studies in the petrochemical industry:

i) Precision Drilling

Shell is involved in the entire oil and gas supply chain, ranging from mining hydrocarbons to refining the fuel to retailing it to customers. Recently, Shell has adopted reinforcement learning to control the drilling equipment used in mining. Reinforcement learning works on a reward-based system based on the outcome of the AI model. The algorithm is designed to guide the drills as they move through the subsurface, based on the historical data from drilling records. This data includes information such as the size of drill bits, temperatures, pressures, and knowledge of the seismic activity. The model helps the human operator understand the environment better, leading to better and faster results with minimal damage to the machinery used.
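To make the reward-based idea concrete, here is a toy sketch of one of the simplest reinforcement learning setups, an epsilon-greedy bandit. It is not Shell's drilling controller: the three speed settings, the reward numbers, and the function names are hypothetical, and a real system would learn from far richer state such as pressures and temperatures.

```python
import random


def run_bandit(reward_fn, actions, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: estimate the mean reward of each action from experience."""
    rng = random.Random(seed)
    value = {action: 0.0 for action in actions}
    count = {action: 0 for action in actions}
    for step in range(steps):
        if step < len(actions):
            action = actions[step]                # try every action once first
        elif rng.random() < epsilon:
            action = rng.choice(actions)          # explore occasionally
        else:
            action = max(actions, key=value.get)  # otherwise exploit the best estimate
        reward = reward_fn(action)
        count[action] += 1
        # Incremental running mean of the observed rewards for this action.
        value[action] += (reward - value[action]) / count[action]
    return max(actions, key=value.get), value


def drilling_reward(setting):
    """Hypothetical reward: drilling progress minus a penalty for equipment wear."""
    progress = {"slow": 1.0, "medium": 2.0, "fast": 3.0}[setting]
    wear = {"slow": 0.1, "medium": 0.4, "fast": 2.5}[setting]
    return progress - wear


best, values = run_bandit(drilling_reward, ["slow", "medium", "fast"])
print(best, values)
```

In this toy reward the fastest setting makes the most progress but is penalized for wear, so the learner settles on the middle setting, which mirrors the trade-off between speed and machinery damage described above.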

ii) Efficient Charging Terminals

Due to climate changes, governments have encouraged people to switch to electric vehicles to reduce carbon dioxide emissions. However, the lack of public charging terminals has deterred people from switching to electric cars. Shell uses AI to monitor and predict the demand for terminals to provide efficient supply. Multiple vehicles charging from a single terminal may create a considerable grid load, and predictions on demand can help make this process more efficient.

iii) Monitoring Service and Charging Stations

Another Shell initiative, trialed in Thailand and Singapore, is the use of computer vision cameras that watch for potentially hazardous activities, such as lighting cigarettes in the vicinity of the pumps while refueling. The model is built to process the content of the captured images and to label and classify it. The algorithm can then alert the staff and hence reduce the risk of fires. The model can be further trained to detect rash driving or thefts in the future.

Here is a project to help you understand multiclass image classification. You can use the Hourly Energy Consumption Dataset to build an energy consumption prediction model. You can use time series with XGBoost to develop your model.

10) Zomato Case Study on Data Analytics

Zomato was founded in 2010 and is currently one of the most well-known food tech companies. Zomato offers services like restaurant discovery, home delivery, online table reservation, online payments for dining, etc. Zomato partners with restaurants to provide tools to acquire more customers while also providing delivery services and easy procurement of ingredients and kitchen supplies. Currently, Zomato has over 2 lakh restaurant partners and around 1 lakh delivery partners. Zomato has closed over ten crore delivery orders to date. Zomato uses ML and AI to boost its business growth, drawing on the massive amount of data collected over the years from food orders and user consumption patterns. Here are a few examples of data analyst case study projects developed by the data scientists at Zomato:

i) Personalized Recommendation System for Homepage

Zomato uses data analytics to create personalized homepages for its users. Zomato uses data science to provide order personalization, like giving recommendations to the customers for specific cuisines, locations, prices, brands, etc. Restaurant recommendations are made based on a customer's past purchases, browsing history, and what other similar customers in the vicinity are ordering. This personalized recommendation system has led to a 15% improvement in order conversions and click-through rates for Zomato. 

You can use the Restaurant Recommendation Dataset to build a restaurant recommendation system to predict what restaurants customers are most likely to order from, given the customer location, restaurant information, and customer order history.
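One simple way to act on "what other similar customers are ordering" is user-based collaborative filtering. The sketch below is an illustrative assumption, not Zomato's system: the customer names, restaurant names, and order counts are made up, and location and browsing history are ignored for brevity.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse order-count dictionaries."""
    shared = set(a) & set(b)
    dot = sum(a[key] * b[key] for key in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(target, orders, top_n=2):
    """Rank restaurants the target has not tried by similarity-weighted order counts."""
    scores = {}
    for other, history in orders.items():
        if other == target:
            continue
        similarity = cosine(orders[target], history)
        for restaurant, count in history.items():
            if restaurant not in orders[target]:
                scores[restaurant] = scores.get(restaurant, 0.0) + similarity * count
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, score in ranked[:top_n] if score > 0]


# Hypothetical order histories: customer -> {restaurant: number of orders}.
orders = {
    "asha": {"Dosa Hub": 3, "Pizza Point": 1},
    "ben": {"Dosa Hub": 2, "Biryani House": 4},
    "chen": {"Pizza Point": 2, "Burger Barn": 1},
}
suggestions = recommend("asha", orders)
print(suggestions)
```

Here "ben" shares more of the target's ordering pattern than "chen", so the restaurant only "ben" orders from is ranked first.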

ii) Analyzing Customer Sentiment

Zomato uses natural language processing and machine learning to understand customer sentiment from social media posts and customer reviews. These help the company gauge the inclination of its customer base towards the brand. Deep learning models analyze the sentiment of various brand mentions on social networking sites like Twitter, Instagram, LinkedIn, and Facebook. These analytics give the company insights that help it build the brand and understand the target audience.

iii) Predicting Food Preparation Time (FPT)

Food preparation time is an essential variable in the estimated delivery time of an order placed by a customer using Zomato. The food preparation time depends on numerous factors like the number of dishes ordered, time of the day, footfall in the restaurant, day of the week, etc. Accurate prediction of the food preparation time enables a better prediction of the estimated delivery time, which makes delivery partners less likely to breach it. Zomato uses a bidirectional LSTM-based deep learning model that considers all these features and predicts the food preparation time for each order in real time.

Data scientists are companies' secret weapons when analyzing customer sentiment and behavior and leveraging it to drive conversion, loyalty, and profits. These 10 data science case study projects with examples and solutions show you how various organizations use data science technologies to succeed and be at the top of their field. To summarize, data science has not only accelerated the performance of companies but has also made it possible to manage and sustain that performance with ease.

FAQs on Data Analysis Case Studies

A case study in data science is an in-depth analysis of a real-world problem using data-driven approaches. It involves collecting, cleaning, and analyzing data to extract insights and solve challenges, offering practical insights into how data science techniques can address complex issues across various industries.

To create a data science case study, identify a relevant problem, define objectives, and gather suitable data. Clean and preprocess data, perform exploratory data analysis, and apply appropriate algorithms for analysis. Summarize findings, visualize results, and provide actionable recommendations, showcasing the problem-solving potential of data science techniques.


About the Author


ProjectPro is the only online platform designed to help professionals gain practical, hands-on experience in big data, data engineering, data science, and machine learning related technologies. Having over 270+ reusable project templates in data science and big data with step-by-step walkthroughs,


© 2024 Iconiq Inc.


The Multiple Case Study Design


Most organizations today operate in volatile economic and social environments and qualitative research plays an essential role in investigating leadership and management problems. This unique volume offers novice and experienced researchers a brief, student-centric research methods text specifically devoted to the multiple case study design.

The multiple case study design is a valuable qualitative research tool in studying the links between the personal, social, behavioral, psychological, organizational, cultural, and environmental factors that guide organizational and leadership development. Case study research is essential for the in-depth study of participants' perspectives on the phenomenon within its natural context. Rigorously designed management and leadership case studies in the extant literature have a central focus on individual managers' and leaders' stories and their perceptions of the broader forces operating within and outside their organizations.

This is a comprehensive methodology book exploring the multiple case study design with step-by-step and easily accessible guidelines on the topic, making it especially valuable to researchers, academics, and students in the areas of business, management, and leadership.

TABLE OF CONTENTS

  • Chapter 1 (6 pages): A refresher on the philosophical foundations of academic research
  • Chapter 2 (6 pages): Research methodologies
  • Chapter 3 (3 pages): The role of theory in qualitative research
  • Chapter 4 (6 pages): How does the novice researcher design a multiple case study
  • Chapter 5 (5 pages): The advantage of the multiple case study design for management researchers
  • Chapter 6 (6 pages): Applying data collection methods in multiple case study research
  • Chapter 7 (9 pages): The data analysis process for multiple case study research
  • Chapter 8 (3 pages): Extending theory with multiple case study design
  • Chapter 9 (7 pages): Incorporating multiple case design and methodologies into teaching and professional practice
  • Chapter 10 (9 pages): Writing and publishing multiple case study research
  • Chapter 11 (2 pages): Concluding thoughts


American Journal of Neuroradiology


Does Long-term Surveillance Imaging Improve Survival in Patients Treated for Head and Neck Squamous Cell Carcinoma? A Systematic Review of the Current Evidence


BACKGROUND: Long-term post-treatment surveillance imaging algorithms for head and neck squamous cell carcinoma are not standardized due to debates over optimal surveillance strategy and efficacy. Consequently, current guidelines do not provide long-term surveillance imaging recommendations beyond 6 months.

PURPOSE: We performed a systematic review to evaluate the impact of long-term imaging surveillance (i.e., imaging beyond 6 months following treatment completion) on survival in patients treated definitively for head and neck squamous cell carcinoma.

DATA SOURCES: A search was conducted on PubMed, Embase, Scopus, the Cochrane Central Register of Controlled Trials, and Web of Science for English literature published between 2003 and 2024 evaluating the impact of long-term surveillance imaging on survival in patients with head and neck squamous cell carcinoma.

STUDY SELECTION: A total of 718 abstracts were screened and 95 underwent full-text review, with 2 articles meeting inclusion criteria. The Risk of Bias in Non-randomized Studies of Interventions assessment tool was used.

DATA ANALYSIS: A qualitative assessment without a pooled analysis was performed for the two studies meeting inclusion criteria.

DATA SYNTHESIS: No randomized prospective controlled trials were identified. Two retrospective two-arm studies comparing long-term surveillance imaging with clinical surveillance were included, and each was rated as having moderate risk of bias. Each study included heterogeneous populations with variable risk profiles and imaging surveillance protocols. Both studies investigated the impact of long-term surveillance imaging on overall survival and reached different conclusions: one reported a survival benefit for long-term surveillance imaging with FDG PET/CT in patients with stage III or IV disease or an oropharyngeal primary tumor, and the other demonstrated no survival benefit.

LIMITATIONS: Limited heterogeneous retrospective data available precludes definitive conclusions on the impact of long-term surveillance imaging in head and neck squamous cell carcinoma.

CONCLUSIONS: There is insufficient quality evidence regarding the impact of long-term surveillance imaging on survival in patients treated definitively for head and neck squamous cell carcinoma. There is a lack of standardized definition of long-term surveillance, variable surveillance protocols, and inconsistencies in results reporting, underscoring the need for a prospective multi-center registry assessing outcomes.

ABBREVIATIONS: HNSCC = head and neck squamous cell carcinoma; RT = radiotherapy; NCCN = National Comprehensive Cancer Network; MPC = metachronous primary cancer; CR = complete response; OS = overall survival; CRT = chemoradiotherapy; HPV = human papillomavirus; PFS = progression-free survival; CFU = clinical follow up; NI-RADS = Neck Imaging Reporting and Data System.

The authors declare no conflicts of interest related to the content of this article.

  • © 2024 by American Journal of Neuroradiology


Advancing drug-response prediction using multi-modal and -omics machine learning integration (MOMLIN): a case study on breast cancer clinical data


Md Mamunur Rashid, Kumar Selvarajoo, Advancing drug-response prediction using multi-modal and -omics machine learning integration (MOMLIN): a case study on breast cancer clinical data, Briefings in Bioinformatics , Volume 25, Issue 4, July 2024, bbae300, https://doi.org/10.1093/bib/bbae300


The inherent heterogeneity of cancer contributes to highly variable responses to any anticancer treatments. This underscores the need to first identify precise biomarkers through the complex multi-omics datasets that are now available. Although much research has focused on this aspect, identifying biomarkers associated with distinct drug responders remains a major challenge. Here, we develop MOMLIN, a multi-modal and -omics machine learning integration framework, to enhance drug-response prediction. MOMLIN jointly utilizes sparse correlation algorithms and class-specific feature selection algorithms, which identify multi-modal and -omics-associated interpretable components. MOMLIN was applied to 147 patients' breast cancer datasets (clinical, mutation, gene expression, tumor microenvironment cells and molecular pathways) to analyze drug-response class predictions for non-responders and variable responders. Notably, MOMLIN achieves an average AUC of 0.989, which is at least 10% greater than current state-of-the-art methods (data integration analysis for biomarker discovery using latent components, multi-omics factor analysis, sparse canonical correlation analysis). Moreover, MOMLIN not only detects known individual biomarkers such as genes at the mutation/expression level but, most importantly, also correlates multi-modal and -omics network biomarkers for each response class. For example, an interaction between ER-negative-HMCN1-COL5A1 mutations-FBXO2-CSF3R expression-CD8 emerges as a multimodal biomarker for responders, potentially affecting antimicrobial peptides and FLT3 signaling pathways. In contrast, for resistance cases, a distinct combination of lymph node-TP53 mutation-PON3-ENSG00000261116 lncRNA expression-HLA-E-T-cell exclusions emerged as multimodal biomarkers, possibly impacting the neurotransmitter release cycle pathway. MOMLIN is therefore expected to advance precision medicine by detecting context-specific multi-omics network biomarkers and better predicting drug-response classifications.

The advent of high-throughput sequencing technologies has revolutionized our ability to collect various ‘omics’ data types, such as deoxyribonucleic acid (DNA) methylations, ribonucleic acid (RNA) expressions, proteomics, metabolomics and bioimaging datasets, from the same samples or patients with unprecedented details [ 1 ]. By far, most studies have performed single omics analytics, which capture only a fraction of biological complexity. The integration of these multiple omics datasets offers a more comprehensive understanding of the underlying complex biological processes than single-omic analyses, particularly in human diseases like cancer and cardiovascular disease, where it significantly enhances prediction of clinical outcomes [ 2 , 3 ].

Cancer is a highly complex and deadly disease if left unchecked, and its heterogeneity poses significant challenges for treatment [ 4 ]. Standard treatments, including chemotherapy with or without targeted therapies, aim to reduce tumor burden and improve patient outcomes such as survival rate and quality of life [ 5–7 ]. However, even for the most advanced therapies, such as immunotherapies, treatment effectiveness varies widely across cancer types and even between patients with same diagnosis [ 8 ]. This heterogeneity is believed to be due to tumor microenvironment heterogeneity and their effects on the resultant complex and myriad molecular interactions within cells and tissues [ 9 , 10 ]. This variability underscores the urgent need to identify precise biomarkers to predict individual patient responses and potential adverse reactions to a particular therapy [ 11 ]. This can be made possible through multi-omics data integration analyses at the individual patient scale [ 12 ].

To assess treatment response, such as pathologic complete response (pCR) and residual cancer burden (RCB), current clinical practice relies on clinical parameters (e.g. tumor size/volume and hormone receptor status), along with genetic biomarkers (e.g. TP53 mutations) [ 13–15 ]. However, these approaches do not fully capture the complex intracellular regulatory dynamics [ 16 , 17 ] or the tumor-immune microenvironment (TiME) interactions that influence outcomes [ 18 , 19 ]. Thus, to enhance personalized cancer treatments, we need novel methodologies that can handle large, complex molecular (omics) and clinical datasets. Machine learning (ML) methods integrating multi-omics data offer a promising avenue to improve prediction accuracy and uncover robust biomarkers across drug-response classes [ 20 ], which may be overlooked by single-omics analytics. This approach can predict patients benefiting from standard treatments and those requiring alternative plans like combination therapies or clinical trials.

Current drug-response prediction methods can be broadly categorized into ML-based and network-based approaches. ML methods often analyze each data type (e.g. mutations and gene expression) independently using univariable selection [ 21 , 22 ] or dimension reduction methods [ 23 ]. These results are then integrated using various classifiers or regressors [e.g. support vector machine, elastic-net regressor, logistic regression (LR) and random forest (RF)] [ 24–26 ] and ensemble classifiers to make predictions [ 9 ]. However, these methods often overlook the crucial interactions among different data modalities. Deep learning methods, while gaining popularity, are limited by the need for large clinical sample sizes to achieve sufficient accuracy [ 27 ]. Recent ML advancements have focused on integrating multimodal omics features with patient phenotypes to improve predictive performance [ 28 , 29 ]. To discover multimodal biomarkers, techniques such as multi-omics factor analysis (MOFA) and sparse canonical correlation analysis (SCCA), including its variant multiset SCCA (SMCCA), offer realistic strategies for integrating diverse data modalities [ 30–32 ]. However, although these methods are suitable for classification tasks, they are unsupervised and do not directly incorporate phenotypic information (e.g. disease status) to integrate diverse data types. As a result, they are limited in their ability to identify phenotype-specific biomarkers.

Recently, advanced supervised approaches like data integration analysis for biomarker discovery using latent components (DIABLO) by Singh et al. (2019) have emerged to overcome these limitations [ 28 ]. DIABLO, an extension of generalized SCCA (GSCCA), considers cross-modality relationships and extracts a set of common factors associated with different response categories. Network-based methods, like unsupervised network fusion or random walk with restart approaches, construct drug-target interaction and sample similarity networks that are effective for patient stratification [ 20 , 33 ]. However, these methods lack a specific feature selection design, limiting their utility for identifying biomarkers for patient classification. Moreover, none of these ML methods are rigorous in terms of task/class-specific biomarker discovery and interpretability, and both SMCCA and GSCCA struggle with the gradient dominance problem due to naive data fusion strategies [ 34 ]. Therefore, it is essential to develop novel interpretable methods for identifying robust multimodal network biomarkers across diverse data types to advance our understanding of the complex factors that influence drug responses.

In this study, we introduce MOMLIN, a multi-modal and -omics ML integration framework to enhance the prediction of anticancer drug responses. MOMLIN integrates weighted multi-class SCCA (WMSCCA), which identifies interpretable components and enables effective feature selection across multi-modal and -omics datasets. Our method contributes in three key ways: (i) it introduces a class-specific feature selection strategy with SCCA methods for associating multimodal biomarkers, (ii) it includes an adaptive weighting scheme in multiple pairwise SCCA models to balance the influence of different data modalities, preventing dominance during the training process and (iii) it ensures robust feature selection by employing a combined constraint mechanism that integrates lasso and GraphNet constraints to select both individual features and subsets of co-expressed features, thereby preventing overfitting to high-dimensional data.

We applied MOMLIN to a multimodal breast cancer (BC) dataset of 147 patients comprising clinical features, DNA mutation, RNA expression, tumor microenvironment and molecular pathway data [ 9 ], to predict drug-response classes, specifically distinguishing responders and non-responders. Our results demonstrate that MOMLIN outperforms state-of-the-art methods and offers better interpretability of the underlying biological mechanisms driving these distinct response classes.

Overview of our proposed method for treatment response prediction

The workflow of our proposed method MOMLIN for identifying class- or task-specific biomarkers from multimodal data is shown in Fig. 1 . The core of this pipeline involves three stages: (i) identification of response-specific sparse components, in terms of input features and patients, (ii) development of drug-response predictor using latent components of patients and (iii) interpretation of sparse components and multi-modal and -omics biomarker discovery.

Schematic representation of the proposed framework. In stage 1, multimodal datasets from cancer patients (e.g. BC) were sourced from a published study [9]. This dataset comprises clinical features, DNA mutations, and gene expression from pre-treatment tumors, alongside post-treatment response classes (pCR, RCB-I to III). TiME and pathway activity were derived from transcriptomic data using statistical algorithms. For identifying class-specific correlated biomarkers, class binarization and oversampling were used to balance between classes. WMSCCA models the multimodal associations across different biomarkers and identifies response-specific sparse components on diverse input features and patients. In stage 2, a binary LR classifier then utilizes these patient latent components for predicting response to therapies, evaluated by AUROC. Next in stage 3, class–specific sparse components are shown in a heatmap, highlighting key signatures (non-zero loading) in colors. Finally, the identified multi-modal and -omics signatures then formed a correlation network, revealing pathways associations with multi-modal and -omics biomarkers for each response class. Nodes with colors in the network indicate multimodal features.
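The class binarization and oversampling mentioned in the schematic can be sketched as follows. The source does not say which oversampling method was used, so random oversampling with replacement is an assumption here, as are the function names; the class counts are the ones reported for the 147-patient cohort.

```python
import random
from collections import Counter


def binarize(labels, positive_class):
    """One-vs-rest binarization: 1 for the class of interest, 0 for all others."""
    return [1 if label == positive_class else 0 for label in labels]


def random_oversample(samples, binary_labels, seed=0):
    """Resample the minority class with replacement until both classes are equal in size."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for sample, label in zip(samples, binary_labels):
        by_class[label].append(sample)
    if not by_class[0] or not by_class[1]:
        raise ValueError("both classes must be present")
    target = max(len(by_class[0]), len(by_class[1]))
    out_samples, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_samples.extend(members + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Response classes with the cohort sizes reported in the study.
responses = ["pCR"] * 38 + ["RCB-I"] * 23 + ["RCB-II"] * 61 + ["RCB-III"] * 25
patient_ids = list(range(len(responses)))  # hypothetical stand-ins for feature rows

pcr_vs_rest = binarize(responses, "pCR")
balanced_ids, balanced_labels = random_oversample(patient_ids, pcr_vs_rest)
print(Counter(pcr_vs_rest), Counter(balanced_labels))
```

Repeating the binarization once per response class yields one balanced two-class problem for each of pCR and RCB-I to III.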


The rationale underpinning this approach is that effective biomarkers are: (i) response-related multimodal features, including genes, cell types and pathways, and (ii) features that demonstrate predictive capability on unseen patients. The first stage, a ‘feature selection step’, selects multimodal features from the generated sparse components based on their relevance to drug-response categories (pCR and RCB-I to III). Features with high loadings are considered potential biomarker candidates. The second stage, a ‘classification step’, validates these biomarkers by assessing their predictive power in distinguishing responders from non-responders to anticancer therapy; any predictions indicating chemo-resistant tumors should be considered for enrolment in clinical trials for novel therapies. The third stage, an ‘interpretation step’, analyzes the candidate biomarkers in a multi-modal and -omics network associated with relevant biological pathways. This step aims to elucidate the underlying biological processes that differentiate drug-response phenotypes.

Stage 1. Identification of response-associated sparse components in terms of input features and patients

Multi-modal and -omics data overview and preparation.

This study used clinical attributes, DNA mutation and gene expression (transcriptome) data from 147 matched samples of early and locally advanced BC patients (categorized as pCR, n = 38; RCB-I, n = 23; RCB-II, n = 61; or RCB-III, n = 25), obtained from the TransNEO cohort at Cambridge University Hospitals NHS Foundation Trust [9]. The dataset includes clinical attributes (8 features; summary attributes are given in Supplementary Table S1, available online at http://bib.oxfordjournals.org/), genomic features (31 DNA mutation genes, applying a strict criterion of genes mutated in at least 10 patients) and RNA-sequencing (RNA-Seq) features (18 393 genes), covering the major BC subtypes: normal-like, basal-like, HER2, luminal A and luminal B. Although DNA mutation genes typically represent binary data, we used mutation frequencies to construct a mutation count matrix. Initial data pre-processing involved filtering out less informative RNA-Seq features at the 25th percentile (in terms of mean and standard deviation) using the interquartile range, followed by a log2 transformation. For integrative modeling, we used the top 40% most variable genes (3748 genes, based on median absolute deviation ranking) from the RNA-Seq datasets. Finally, each feature was normalized by dividing by its Frobenius norm, adjusting the offset between high and low intensities across different data modalities.
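The last three preprocessing steps (log2 transformation, ranking by median absolute deviation and per-feature norm scaling) can be illustrated with a minimal sketch. The toy matrix, the pseudocount of 1 and all function names are assumptions made for illustration; the 25th-percentile filter is omitted, and this is not the study's own code.

```python
import math
from statistics import median


def mad(values):
    """Median absolute deviation of one feature across samples."""
    m = median(values)
    return median(abs(v - m) for v in values)


def top_variable_features(columns, fraction=0.4):
    """Return the names of the top `fraction` of features ranked by MAD."""
    ranked = sorted(columns, key=lambda name: mad(columns[name]), reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def unit_norm(values):
    """Divide a feature vector by its L2 norm (the Frobenius norm of a vector)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


# Hypothetical toy expression matrix: feature name -> values across 4 samples.
raw = {
    "geneA": [1.0, 3.0, 7.0, 15.0],
    "geneB": [3.0, 3.0, 3.0, 3.0],
    "geneC": [0.0, 1.0, 0.0, 1.0],
    "geneD": [7.0, 7.0, 15.0, 15.0],
    "geneE": [1.0, 1.0, 1.0, 3.0],
}

# log2 transform with a pseudocount of 1 (the pseudocount is an assumption).
logged = {g: [math.log2(v + 1.0) for v in vals] for g, vals in raw.items()}

# Keep the top 40% most variable features, then scale each to unit norm.
selected = top_variable_features(logged, fraction=0.4)
normalized = {g: unit_norm(logged[g]) for g in selected}
```

Scaling every feature to unit norm puts modalities with very different numeric ranges (counts, scores, binary flags) on a comparable footing before they enter a joint model.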

To characterize TiME and pathway markers, we applied various statistical algorithms to the RNA-Seq data. The GSVA algorithm [35] was used to calculate (i) GGI gene set scores [36] and (ii) STAT1 immune signature scores [37]. For immune cell enrichment, three methods were used: (i) MCPcounter [37] with voom-normalized RNA-Seq counts; (ii) enrichment over 14 cell types using 60 gene markers, employing the log2-transformed geometric mean of transcripts per million (TPM) expression [38]; and (iii) z-score scaling of cancer immunity parameters [39] to classify four immune processes (major histocompatibility complex molecules, immunomodulators, effector cells and suppressor cells). Additionally, the TIDE algorithm [40] computed T-cell dysfunction and exclusion metrics for each tumor sample from a log2-transformed TPM matrix of counts; these metrics can serve as a surrogate biomarker to predict the response to immune checkpoint blockade. Pathway activity scores for each tumor sample were computed using the GSVA algorithm with input gene sets from the Reactome [41], PID [42] and BioCarta databases within the MSigDB C2 pathway database [43].

Sparse multiset canonical correlation analysis

In this study, lowercase letters denote vectors and uppercase letters denote matrices. The term $\|\cdot\|_{1,1}$ denotes the matrix $l_1$-norm, and $\|\cdot\|_{gn}$ denotes the GraphNet regularization. The sparse multiset canonical correlation analysis (SMCCA) is an extension of dual-view SCCA, proposed to model associations among multiple types of datasets [31]. Given the multiple types of datasets, let $X\in\mathcal{R}^{n\times p}$ represent gene expression data with $p$ features, and $Y_k\in\mathcal{R}^{n\times q_k}$ represent the $k$-th data modality (e.g. clinical, DNA mutation and tumor microenvironment) with $q_k$ features. Both $X$ and $Y_k$ have $n$ samples, and $k=1,\dots,K$, where $K$ denotes the number of different data modalities. The objective function of SMCCA is defined as follows:
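One standard way to write such an objective, consistent with the symbols defined in this section, is sketched here. The squared-difference loss and the exact form of the unit-length constraints are assumptions of this sketch:

$$
\min_{u,\,v_k}\ \sum_{k=1}^{K}\left\|Xu-Y_k v_k\right\|_2^2+\lambda_u\left\|u\right\|_1+\sum_{k=1}^{K}\lambda_{vk}\left\|v_k\right\|_1
\quad\text{s.t.}\quad \left\|u\right\|_2^2=1,\ \left\|v_k\right\|_2^2=1,\ k=1,\dots,K
$$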

where $u$ and $v_k$ are the canonical weight vectors corresponding to $X$ and $Y_k$, indicating the importance of each respective biomarker. The term $\|\cdot\|_1$ represents the $l_1$ regularization, used to detect a small subset of discriminative biomarkers and prevent model overfitting. $\lambda_u$ and $\lambda_{vk}$ are non-negative tuning parameters that balance the loss function against the regularization terms. The term $\|\cdot\|_2^2$ denotes the squared Euclidean norm, used to constrain the weight vectors $u$ and $v_k$ to unit length.

However, SMCCA has limitations: (i) it is inherently unsupervised, meaning that it cannot leverage phenotypic information (e.g. disease status and drug-response classes); (ii) pairwise associations among multiple data types can vary significantly, which can lead to gradient dominance issues during optimization; and (iii) SMCCA mines a common subset of biomarkers for classifying different tasks, which diminishes its relevance, as each task might require a distinct feature set.

Weighted multi-class sparse canonical correlation analysis

To address the above limitations, we propose weighted multi-class SCCA (WMSCCA), a formal model for class/task-specific feature selection that differs from conventional SMCCA. Throughout this study, we use the terms tasks, classes and drug-response classes interchangeably. WMSCCA includes phenotypic information as an additional data type, employs a weighting scheme to resolve the gradient dominance issue and builds class-specific feature selection into its core objective function through a one-versus-all strategy. The underlying motivation is that WMSCCA can jointly identify drug-response class-specific multimodal biomarkers to improve drug-response prediction. For ease of presentation, we consider $n$ patients with data matrices $X_c\in\mathcal{R}^{n\times p}$, $Y_{ck}\in\mathcal{R}^{n\times q_k}$ and $Z\in\mathcal{R}^{n\times C}$ from $C$ different drug-response classes. Here, $X_c$ denotes $p$ features from gene expression datasets, $Y_{ck}$ denotes $q_k$ features from the $k$-th data modality (e.g. mutation, clinical features, TiME and pathway activity), $Z_c$ denotes the $c$-th response class, $k=1,\dots,K$, and $K$ denotes the number of data modalities. The WMSCCA optimization problem can be formulated as follows:

where $U\in\mathcal{R}^{p\times C}$ and $V_k\in\mathcal{R}^{q_k\times C}$ are the canonical loading matrices corresponding to $X$ and $Y_k$, respectively, representing the importance of candidate biomarkers for each class $c$. In this equation, the first term models associations among the $X$ and $Y_k$ datasets; the second and third terms correlate the class labels $Z_c$ with the $X$ and $Y_k$ data modalities for the $c$-th class, aiming to identify class-specific features and their relationships; $\psi(U)$ and $\psi(V_k)$ represent sparsity constraints on $U$ and $V_k$, used to select a subset of discriminative features. As mentioned in Equation (1), to address gradient dominance, the adjusting weight parameters $\sigma_{xy}$, $\sigma_{xz}$ and $\sigma_{yz}$ can be defined as:

where $k=1,\dots,K$ and $K$ denotes the number of data modalities. $\sigma_{\cdot\cdot}$ assigns a larger weight when the non-squared loss (the denominator term) between two datasets is small, and vice versa.

Given high-dimensional datasets, the model in Equation (2) encounters an overfitting problem, so a sparsity constraint is appropriate. We hypothesized that gene expression biomarkers can be either single genes or co-expressed sets; thus, a combined penalty is designed for the $X$ dataset, and $\psi(U)$ for $X$ takes the following form:

where $\alpha_u$ and $\beta$ are non-negative tuning parameters, and $\beta$ balances the effect of co-expressed versus individual feature selection. The first sparsity constraint is the matrix $l_{1,1}$-norm, which is defined as follows:

This penalty promotes class-specific features on $U$. The second sparsity constraint is the GraphNet regularization, defined as follows:

where $L_c$ represents the Laplacian matrices of the connectivity in the $X$ matrices. The Laplacian matrix is defined as $L=D-A$, where $D$ is the degree matrix of the connectivity matrix $A$ (e.g. a gene co-expression or correlation network). This penalty term promotes a subset of connected features to discriminate each response on $U$.
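To make the two penalties concrete, the sketch below computes them on a toy loading matrix. It assumes the usual definitions: the $l_{1,1}$-norm as the sum of absolute entries of $U$, and the GraphNet term as the sum over classes of $u_c^\top L_c u_c$ with $L=D-A$. The exact forms used in this study may differ, and the matrices and function names are hypothetical.

```python
def l11_norm(U):
    """Sum of absolute values of all entries of U (rows: features, columns: classes)."""
    return sum(abs(x) for row in U for x in row)


def laplacian(A):
    """Graph Laplacian L = D - A for a symmetric connectivity matrix A with zero diagonal."""
    n = len(A)
    return [
        [(sum(A[i]) if i == j else 0.0) - A[i][j] for j in range(n)]
        for i in range(n)
    ]


def quad_form(u, L):
    """u^T L u; for an unweighted graph this is the sum over edges of (u_i - u_j)^2."""
    n = len(u)
    return sum(u[i] * L[i][j] * u[j] for i in range(n) for j in range(n))


def graphnet_penalty(U, laplacians):
    """Assumed GraphNet form: sum over classes c of u_c^T L_c u_c."""
    n_classes = len(U[0])
    return sum(
        quad_form([row[c] for row in U], laplacians[c]) for c in range(n_classes)
    )


# Hypothetical connectivity among 3 features (a path graph: 0 - 1 - 2).
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
# Hypothetical loadings for 3 features (rows) and 2 classes (columns).
U = [[0.5, 0.0], [0.5, -0.2], [0.0, 0.3]]

L = laplacian(A)
sparsity = l11_norm(U)
smoothness = graphnet_penalty(U, [L, L])
```

The quadratic form is small when connected features carry similar loadings, which is why this kind of penalty favors selecting co-expressed genes together rather than in isolation.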

Moreover, not every mutation marker or clinical, TiME or pathway feature is involved in predicting response classes; therefore, the $l_{1,1}$-norm is used on the $Y_k$ datasets to select individual markers, i.e. $\psi(V_k)$ for the $Y_k$ data modalities takes the following form:

where $\alpha_{vk}$ is a non-negative tuning parameter.

Finally, we obtained $C$ pairs of canonical weight matrices $(U_c, V_{ck})$ $(c=1,\dots,C;\ k=1,\dots,K)$ using an iterative alternating algorithm to solve Equation (2) [44, 45]. Features with non-zero weights in each class in the weight vectors were extracted as correlated sets.

The WMSCCA method involves the parameters $\alpha_u$, $\beta$ and $\alpha_{vk}$ $(k=1,\dots,K)$. Given the limited number of samples, we applied a nested cross-validation (CV) strategy on the training sets and evaluated the maximum correlation on the test datasets. Optimal values for the regularization parameters were determined within each training set via internal five-fold CV.
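The nested scheme can be sketched generically as follows: parameters are chosen using inner folds of the training data only, and each outer test fold is scored once with the chosen values. The `fit_and_score` callback, the toy scoring function and the grid are hypothetical stand-ins for fitting WMSCCA and measuring held-out correlation.

```python
import random


def k_folds(indices, k, rng):
    """Shuffle the indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv(n_samples, param_grid, fit_and_score, outer_k=5, inner_k=5, seed=0):
    """Select parameters on inner folds, then score once on each outer test fold."""
    rng = random.Random(seed)
    outer = k_folds(range(n_samples), outer_k, rng)
    results = []
    for i, test_idx in enumerate(outer):
        train_idx = [j for f, fold in enumerate(outer) if f != i for j in fold]
        inner = k_folds(train_idx, inner_k, rng)

        def inner_score(params):
            # Mean validation score over the inner folds for one parameter setting.
            total = 0.0
            for v, val_idx in enumerate(inner):
                fit_idx = [j for f, fold in enumerate(inner) if f != v for j in fold]
                total += fit_and_score(fit_idx, val_idx, params)
            return total / inner_k

        best = max(param_grid, key=inner_score)
        results.append((best, fit_and_score(train_idx, test_idx, best)))
    return results


def toy_fit_and_score(train_idx, test_idx, params):
    """Hypothetical scorer: pretends that alpha_u = 0.1 gives the best correlation."""
    return -abs(params["alpha_u"] - 0.1)


grid = [{"alpha_u": a} for a in (0.01, 0.1, 1.0)]
results = nested_cv(20, grid, toy_fit_and_score)
```

Because the outer test fold never influences the choice of parameters, the outer scores are not inflated by the tuning step.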

Stage 2. Drug-response prediction using latent components of patients

To predict drug-response categories, we trained an LR classifier using the latent components of patients (or raw multimodal features) generated by MOMLIN (Fig. 1: stages 1 and 2). We used a binary classification scheme, distinguishing pCR versus non-pCR, RCB-I versus non-RCB-I, RCB-II versus non-RCB-II and RCB-III versus non-RCB-III, to evaluate model performance. In addition, we performed analyses with existing multi-omics methods, including SMCCA+LR, MOFA+LR, DIABLO and latent principal component analysis (PCA) features with LR classifiers. To assess prediction performance for the response to treatment in an unbiased manner, we used five-fold cross-validated performance and repeated the process over 100 runs. The partitioning of data was kept consistent across all models for fair comparison. The accuracy of response prediction was evaluated using the area under the receiver operating characteristic curve (AUROC).
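As an illustration of the one-versus-rest evaluation, the sketch below binarizes multi-class labels and computes AUROC from its rank interpretation (the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half). The labels and scores are made-up values, not results from this study.

```python
def binarize(labels, positive):
    """One-versus-rest labels: 1 for the positive class, 0 for every other class."""
    return [1 if lab == positive else 0 for lab in labels]


def auroc(y_true, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical held-out labels and classifier scores for the pCR-versus-rest task.
labels = ["pCR", "RCB-I", "RCB-II", "pCR", "RCB-III", "RCB-II"]
scores = [0.9, 0.4, 0.35, 0.6, 0.7, 0.1]

y = binarize(labels, "pCR")
result = auroc(y, scores)  # 7 of the 8 positive/negative pairs are ordered correctly
```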

Stage 3. Interpretation of sparse components and multi-omics biomarker discovery and their networks

After learning sparse latent components of features across different data modalities using MOMLIN, we identify the most relevant features based on the loading weights of genes, TiME and pathways, which reveal underlying interactions for discriminating response classes. The larger the loading weight, the more important the pair of features in discriminating response categories. We then use these selected features to construct a sample correlation network, or a relationship matrix based on their canonical weights [46]. In this network, nodes represent selected features, and the edge weights between two interconnected features indicate correlation or relatedness. The generated network is visualized using the ggraph package in R (https://cran.r-project.org). Finally, we prioritize multi-omics biomarkers based on their degree centrality within the interconnected correlation network.
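The final prioritization step can be sketched as follows, assuming the network is given as a list of weighted edges and that degree centrality is normalized by the number of other nodes. The node names and weights are placeholders rather than features identified in this study.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly connected to."""
    neighbors = defaultdict(set)
    for a, b, _weight in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    denom = max(len(neighbors) - 1, 1)
    return {node: len(nbrs) / denom for node, nbrs in neighbors.items()}


# Hypothetical signed correlation edges between selected multimodal features.
edges = [
    ("gene_1", "cell_1", 0.6),
    ("gene_1", "pathway_1", -0.4),
    ("gene_1", "mutation_1", 0.3),
    ("cell_1", "pathway_1", 0.5),
]

centrality = degree_centrality(edges)
# Rank features from most to least connected; ties are broken by name.
ranked = sorted(centrality.items(), key=lambda item: (-item[1], item[0]))
```

Degree centrality counts connections regardless of edge sign or weight, so a feature with many weak links can outrank one with a few strong links; a weighted variant would be a different design choice.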

Derivation of response-associated latent components from BC data with MOMLIN

We applied MOMLIN to analyze a breast cancer (BC) dataset to predict treatment response and gain molecular insights. The dataset comprised 147 BC patients with early and locally advanced pretherapy tumors [9], categorized as follows: pCR with 38 patients, RCB-I (good response) with 23 patients, RCB-II (moderate response) with 61 patients and RCB-III (resistance) with 25 patients. After preprocessing and filtering out the least informative features, the final dataset comprised 3748 RNA genes (top 40% out of 9371 genes), 31 mutation genes, 8 clinical attributes, 64 TiME features and 178 pathway activities (Fig. 1: stage 1). Supplementary Table S1, available online at http://bib.oxfordjournals.org/, summarizes overall clinical characteristics by patients’ response classes.

While our proposed framework offers general applicability for identifying context-specific multi-omics biomarkers, this study specifically focused on discovering drug-response-specific biomarkers to enhance the prediction of pCR and RCB resistance. MOMLIN decomposed the input multimodal data into response-associated sparse latent components of input features and patients. These sparse components reveal patterns of how various features (e.g. genes and mutations) and clinical attributes relate to treatment outcomes (Fig. 1: stages 1–3), and their effectiveness was evaluated by measuring prediction performance. We assessed the predictive ability of MOMLIN through five-fold CV repeated 100 times. In each iteration, the dataset is divided into five folds, with one random fold assigned as the held-out test set and the remaining folds used as the training set. MOMLIN was trained on the training set, including detection of predictive marker candidates, and its performance was evaluated on the ‘unseen’ test set. This process was repeated for all five folds to ensure robust evaluation of MOMLIN’s generalizability. Performance was measured by the AUROC metric (Fig. 1: stage 2).

Performance comparison with existing methods for drug-response prediction

To evaluate the prediction capability of MOMLIN, we modeled each response category as a binary classification problem and compared its prediction accuracy to existing multi-omics integration algorithms. For comparison, we randomly split the dataset into a training set (70%) and a test set (30% unseen data), with balanced inclusion of response classes. We employed LR as the classifier to assess the predictive performance of multimodal biomarkers. We compared MOMLIN with four other classification algorithms for omics data: (i) SMCCA, which integrates multi-omics data by projecting it onto latent components for discriminant analysis; (ii) MOFA, which decomposes multi-omics data into common factors for discriminant analysis; (iii) sparse PCA; and (iv) DIABLO, a supervised integrative analysis method that represents the state of the art in classification. All methods were trained on the same preprocessed data.
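A class-balanced 70/30 split of this kind can be sketched by splitting each response class separately, so that both partitions keep roughly the original class proportions. The class counts below are those reported for the BC cohort; the function name, the seed and the rounding rule are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split sample indices into train/test sets while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    train, test = [], []
    for lab in sorted(by_class):
        members = by_class[lab]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Class sizes as reported for the 147-patient cohort.
labels = ["pCR"] * 38 + ["RCB-I"] * 23 + ["RCB-II"] * 61 + ["RCB-III"] * 25
train_idx, test_idx = stratified_split(labels)
```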

The classification results showed that MOMLIN outperformed the compared multi-omics integration methods in most classification tasks on unseen test samples (Fig. 2A). Notably, DIABLO, the next best performer, was 10 to 15% less effective than MOMLIN. Additionally, we compared the performance of component-based LR models against raw feature-based LR models in predicting RCB response classes. Although raw feature-based models improved on the existing methods, their performance was notably lower than that of component-based models (Fig. 2B). This indicates the superior adaptability and effectiveness of component-based models in leveraging multi-omics data for predictive purposes.

Performance comparison with existing methods and detection of informative data combination. All results in the plots depict test AUROC over five-fold CV obtained from 100 runs. (A) Box plots comparing response prediction performance of MOMLIN against existing state-of-the-art multi-omics methods. (B) Performance comparison between predictors based on latent components and those utilizing a selected subset of multimodal features. (C) Comparing AUROCs for the models with different data subset combinations (clinical, clinical + DNA, clinical + RNA and clinical + DNA + RNA) using MOMLIN.

Moreover, to test and demonstrate the generalizability of this framework, we applied MOMLIN to a preprocessed multi-omics dataset of colorectal adenocarcinoma (COAD) with 256 patients [47]. This dataset included gene expression, copy number variation and micro-RNA expression data, which we used to classify COAD subtypes: chromosomal instability (CIN, n = 174), genomically stable (GS, n = 34) and microsatellite instability (MSI, n = 48). The performance results, shown in Supplementary Table S2 and Supplementary Figure S1 (both available online at http://bib.oxfordjournals.org/), indicate that MOMLIN outperformed all state-of-the-art methods tested in classifying COAD subtypes. Moreover, when comparing raw feature-based accuracies with sparse component-based accuracies (features derived from MOMLIN), we found that the raw feature-based classifier was superior to existing methods (Figure S1A and B), but inferior to the component-based classifier. This observation is consistent with our findings on BC drug-response prediction.

Importance of different omics data for treatment response prediction

To assess the added value of integrating multimodal data for predicting treatment response, we trained four prediction models with different feature combinations: (i) clinical features only, and clinical features plus (ii) DNA, (iii) RNA and (iv) both DNA and RNA. We found that adding data modalities improved prediction performance across all response classes (Fig. 2C). Notably, the models that combined clinical data with either RNA or both DNA and RNA demonstrated superior and comparable performance, with an average AUROC of 0.978. In contrast, the model based on clinical features alone had a much lower AUROC, ranging from 0.51 to 0.82. These results suggest that the RNA transcriptome is the most informative data modality in this dataset. Thus, integrating gene expression with clinical features could significantly improve our ability to predict treatment outcomes in BC.

Interpretation of response-associated sparse components identified by MOMLIN

To understand the molecular landscape of treatment response in BC, we used MOMLIN to model response-specific bi-multivariate associations across multiple data modalities. We observed stronger correlations between RNA gene expression and both TiME (r = 0.701) and pathway activity (r = 0.868), indicating greater overlap or explained information between them. Conversely, moderate correlations were found between RNA gene expression and DNA mutations (r = 0.526) or clinical features (r = 0.488), indicating partially overlapping or independent information. These results suggest that multimodal biological features provide complementary information in a combinatorial manner.

When investigating the importance of each feature for predicting response classes, MOMLIN identified four distinct loading vectors corresponding to the pCR and RCB response classes, highlighting distinct weight patterns for pCR versus non-pCR and RCB versus non-RCB classes (Fig. 3). For example, in the pCR (complete response) components, the top five molecular features from each modality revealed distinct molecular patterns. Specifically, gene expression analysis showed that downregulation of FBXO2 and RPS28P7 inhibits tumor cell proliferation and may enhance treatment efficacy, while upregulation of the C2CD4D-AS1, CSF3R and SMPDL3B genes may promote immune response, increasing tumor cell vulnerability and therapeutic effect (Fig. 3A). Mutational analysis revealed negative associations for the marker genes HMCN1 and GATA3, but a positive association for COL5A1 (Fig. 3C). Additionally, tumor mutation burden (TMB) and homologous recombination deficiency (HRD)-Telomeric AI signatures were higher in pCR patients, suggesting higher genomic instability than in RCB patients [9]. TiME analysis showed reduced immunosuppressive mast cells and extracellular matrix (ECM), along with increased infiltration of neutrophils, TIM-3 and CD8+ T-cells (Fig. 3D). The pathway analysis further revealed potential downregulation of the PDGFRB pathway, which is involved in stromal cell activity and associated with improved patient response [49], together with upregulation of pathways for antimicrobial peptides, FLT3 signaling, ephrin B reverse signaling and potential therapeutics for SARS (Fig. 3E), suggesting enhanced immune surveillance and interaction with tumor cells. In summary, MOMLIN reveals a distinct genomic landscape with higher immune activity and genomic instability in pCR that characterizes its favorable treatment response.

Heatmaps illustrate feature importance on the response-associated components identified by MOMLIN. Each row in the heatmap represents a drug-response class (pCR, RCB-I, RCB-II and RCB-III), with columns representing features across different data modalities. The color gradient indicates feature loading or importance, representing the strength of association with response classes. The sign (negative or positive) of the gradient denotes the direction of association with response classes. All results in the heatmaps depict an average over 100 runs of five-fold CV. (A–E) represent the response-associated candidate biomarkers detected in latent components in (A) gene expression data (highlighting DE genes), (B) clinical features, (C) DNA mutations (highlighting mutated genes), (D) TiME cells and (E) functional pathway profiles (highlighting altered pathways).

Similarly, in the RCB-I (good response) components, RNA expression analysis revealed that lower expression of the genes GPX1P1 and HBB is linked to less aggressive tumors [48], while thiosulfate sulfurtransferase (TST), NPIPA5 and GSDMB were overexpressed, which is linked to enhanced immune response and therapeutic effectiveness [49, 50]. Mutational analysis showed a positive association for the therapeutic target signatures TP53, MUC16 and RYR2 [51, 52], but a negative association for NEB and CIN scores. TiME analysis demonstrated increased infiltration of Tregs, cancer-associated fibroblasts (CAFs), monocytic lineage and natural killer (NK) cells, indicating a more active immune environment [9], with reduced TEM CD4 cells. Pathway analysis further identified downregulation of NOD1/2 signaling, EPHA-mediated growth cone collapse and toll-like receptor (TLR1, TLR2) pathways, which are involved in inflammation and immune response, along with upregulation of the allograft rejection and G0 and early G1 pathways. In summary, tumors that achieve RCB-I are marked by distinct genomic markers, an active immune response and lower CIN.

In the RCB-II (moderate response) components, RNA expression analysis revealed overexpression of the RPLP0P9, FTH1P20 and RNF5P1 pseudogenes, followed by the overexpressed ERVMER34-1 and PON3 genes, which play an oncogenic role in BC [53]. Mutation analysis revealed a positive association for HRD-LOH, RYR1 and MT-ND4, but a negative association for MACF1 and neoantigen loads, in line with previous reports [54, 55]. Analysis of TiME features demonstrated increased IDO1 and TAP2, with reduced CTLA-4, NK cells and PD-L2, indicating a less suppressive immune environment. Pathway analysis further revealed downregulation of the G1/S DNA damage checkpoint and TP53 regulation pathways, highlighting DNA repair issues, along with upregulation of the PDGFRB pathway, E2F targets and signaling by Hedgehog, which are associated with cell proliferation. In summary, RCB-II patients display distinct genomic markers including pseudogenes, a lack of a suppressive immune environment and active proliferation.

In the RCB-III (resistant) components, RNA gene expression analysis revealed lower expression of the therapeutic targets PON3 and FGFR4 [56], followed by lower expression of the lncRNAs ENSG00000225489 and ENSG00000261116 and of RNF5P1. Mutation signature analysis identified a positive association for MT-ND1, but a negative association for the therapeutic targets TP53 and MT-ND4 [7, 52]. Neoantigen loads were higher while TMB was lower, indicating reduced tumor suppressor activity. TiME analysis revealed reduced T-cell exclusion and HLA-E activity, with increased ECM, HLA-DPA1 and LAG3, suggesting an immune-suppressive tumor environment. Pathway analysis revealed upregulation of pathways involved in neurotransmitter release, cell-cycle progression (RB-1) and immune system diseases, suggesting active cell signaling and proliferation, along with downregulation of the EPHB FWD pathway and nucleotide catabolism. In summary, patients who attained RCB-III were characterized by a low mutational burden and an immune-suppressive environment, leading to treatment resistance.

Linking biology to treatment response through biomarker network analysis

To further extract multimodal network biomarkers and understand the complex biological interactions in patients with pCR and RCB, we performed cross-interaction network analysis using the candidate signatures identified by MOMLIN across different modalities. This analysis included clinical features, DNA mutations, gene expression, TiME cells and enriched pathways, aiming to elucidate the underlying biology associated with specific treatment responses. Figure 4 shows the interaction networks of selected multimodal features for each RCB class. To identify potential biomarkers associated with pCR and RCB response, we focused on the top ten multimodal features based on network edge connections. For example, in tumors that attained pCR, the network analysis revealed co-enrichment of mutations in the HMCN1 and COL5A1 genes, particularly in estrogen receptor (ER)-negative patients. HMCN1 and COL5A1, which are therapeutic target-like molecules, encode proteins for ECM structure, and mutations in these genes regulate tumor architecture and cell adhesion, potentially facilitating immune cell infiltration [52]. We also observed elevated expression of the FBXO2, CSF3R, C2CD4D-AS1 and RPS28P7 genes, alongside increased infiltration of CD8+ T-cells [9, 57]. FBXO2 is a component of the ubiquitin-proteasome system, which regulates protein degradation and influences cell cycle and apoptosis [58], while CSF3R plays a vital role in granulocyte production and immune response [59]. These gene expression patterns, coupled with increased CD8+ T-cell infiltration, suggest a robust anti-tumor immune response. Furthermore, these molecular perturbations may be linked to antimicrobial peptide pathways and FLT3 signaling, potentially contributing to the favorable outcome of achieving pCR [60, 61]. Future work could specifically search for these complex interactions across different molecules to gain more clinically relevant insights into pCR tumors.
Supplementary Table S3, available online at http://bib.oxfordjournals.org/, presents a more detailed list (top 30) of the multi-modal and -omics biomarkers identified using the MOMLIN pipeline.

Multimodal network biomarkers explain drug-response classes. The multimodal networks detail the candidate biomarkers and their interactions for each response class, (A) the pCR patients (B) the RCB-I patients (good response), (C) the RCB-II patients (moderate response) and (D) the RCB-III resistance patients. Nodes in the network represent candidate biomarkers derived from clinical features, DNA mutations, gene expression, enriched cell-types and pathways, each indicated in different colors in the figure legend. Negative edges are light green; positive edges are in light magenta. Edge width reflects the strength of the interaction between features. Node size corresponds to the number of connections (degree), and the font size of node labels scales with degree centrality, highlighting the most interconnected biomarkers.

Similarly, RCB-I tumors exhibited co-enriched mutations in MUC16 and TP53, particularly in HER2+ cases [14]. MUC16 (CA125) is a therapeutic molecule associated with immune evasion and tumor growth [51], while TP53 mutations can lead to loss of cell cycle control and genomic instability [62]. We also observed elevated expression of TST, which is involved in detoxification processes, and GPX1P1 [long non-coding RNA (lncRNA)], which is involved in the oxidative stress response. The immune landscape of these tumors showed increased infiltration of TEM CD4 cells (adaptive immunity), monocytic lineage cells (phagocytosis and antigen presentation) and NK cells (innate immunity), as well as CAFs. This immune landscape, coupled with potential perturbations in the allograft rejection pathway, suggests an active but potentially incomplete immune response against the tumor, resulting in minimal residual disease.

RCB-II tumors had lower neoantigen loads than pCR tumors, in both ER-negative and HER2+ patients. This reduced neoantigen load might contribute to a weaker immune response. Gene expression analysis showed elevated levels of specific lncRNAs, including FTH1P20 (associated with iron metabolism), RNF5P1 (potentially affecting protein degradation) and RPLP0P9 (involved in protein synthesis), along with ERVMER34-1, which can influence gene expression and immune response in BC patients. Numerous studies have underscored the key regulatory roles of lncRNAs in tumors and the immune system. Notably, increased expression of the immune checkpoint protein IDO1 negatively regulates the expression of CTLA-4; both are known to modulate antitumor immune responses [63]. The combined effect of these molecular alterations suggests potential tumor survival mechanisms, including immune evasion and dysregulation of the G1/S DNA damage checkpoint [64], contributing to moderate residual disease.

In RCB-III tumors, we observed a reduced prevalence of TP53 and MT-ND4 mutations, which are typically associated with genomic instability and aggressive tumor behavior [ 51 ], coupled with a higher neoantigen load, suggesting that alternative mechanisms (pathways) drive tumor progression. Despite the higher neoantigen loads, increased expression of HLA-E immune checkpoints and T-cell exclusion in the tumor microenvironment hindered effective anti-tumor immune responses. Additionally, the low-expressed genes PON3, ENSG00000261116 (lncRNA) and RNF5P1, which are involved in detoxification, gene regulation and protein degradation, respectively, represent an adaptive response to cellular stress in these tumors. Clinical markers indicating lymph node involvement suggest a more advanced disease state [ 9 ]. These findings, along with potential perturbations in the neurotransmitter release cycle pathway, collectively portray RCB-III tumors as genetically unstable, yet effectively evading immune surveillance, contributing to their significant treatment resistance. Overall, further investigation of these interactive molecular networks, comprising both positive and negative interactions, offers a more in-depth understanding of these potential candidate biomarkers for distinguishing treatment-sensitive pCR and resistant RCB tumors.

The advent of multi-omics technologies has revolutionized our understanding of cancer biology, offering unprecedented insights into the complex molecular interactions that shape tumor behavior and treatment response. In this study, we presented MOMLIN (multi-modal and -omics ML integration), a novel method to enhance cancer drug-response prediction by integrating multi-omics data. MOMLIN specifically utilizes class-specific feature learning and sparse correlation algorithms to model multi-omics associations, enabling the detection of class-specific multimodal biomarkers from different omics datasets. Applied to a BC multimodal dataset of 147 patients (comprising RNA expression, DNA mutation, tumor microenvironment, clinical features and pathway functional profiles), MOMLIN was highly predictive of responses to anticancer therapies and identified cohesive multi-modal and -omics network biomarkers associated with responder (pCR) and various levels of RCB (RCB-I: good response, RCB-II: moderate response and RCB-III: resistance).
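To make the idea of a sparse correlation algorithm concrete, the sketch below shows a simplified rank-1 sparse canonical correlation analysis in the spirit of penalized matrix decomposition: weight vectors for two data blocks are updated in turn and soft-thresholded so that weakly associated features receive a weight of exactly zero. This is not MOMLIN's algorithm, which the text does not specify in detail; the function names, the fixed thresholds `lam_u` and `lam_v`, the identity-covariance simplification and the toy data are all assumptions made for illustration.

```python
import math


def soft_threshold(vec, lam):
    """Shrink each entry toward zero by lam; entries below lam become zero."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in vec]


def normalize(vec):
    """Scale to unit Euclidean norm (an all-zero vector is returned as is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def cross_product(X, Y):
    """Return the p x q matrix X^T Y for column-centered X (n x p), Y (n x q)."""
    n, p, q = len(X), len(X[0]), len(Y[0])
    return [[sum(X[i][a] * Y[i][b] for i in range(n)) for b in range(q)]
            for a in range(p)]


def sparse_cca_rank1(X, Y, lam_u, lam_v, n_iter=50):
    """Alternate soft-thresholded updates of the weight vectors u and v."""
    C = cross_product(X, Y)
    p, q = len(C), len(C[0])
    v = normalize([1.0] * q)
    u = [0.0] * p
    for _ in range(n_iter):
        u = normalize(soft_threshold(
            [sum(C[a][b] * v[b] for b in range(q)) for a in range(p)], lam_u))
        v = normalize(soft_threshold(
            [sum(C[a][b] * u[a] for a in range(p)) for b in range(q)], lam_v))
    return u, v


# Toy centered data: only X feature 0 and Y feature 0 are strongly associated.
X = [[1, 0.1, 0.2], [-1, -0.1, 0.2], [1, -0.1, -0.2], [-1, 0.1, -0.2]]
Y = [[2, 0.1], [-2, 0.1], [2, -0.1], [-2, -0.1]]
u, v = sparse_cca_rank1(X, Y, lam_u=0.5, lam_v=0.5)
```

With these thresholds, only the strongly associated pair of features keeps a nonzero weight, which is the behavior that makes the selected features readable as candidate biomarkers.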

Using MOMLIN, we identified that pCR is determined by an interactive set of multimodal network biomarkers driven by distinct genetic alterations, such as HMCN1 and COL5A1, particularly in ER-negative tumors [ 9 , 65 ]. Gene expression signatures, including FBXO2 and CSF3R, were associated with immune cell infiltration (CD8+ T-cells), which has been previously reported as a key determinant of response [ 57 ]. The association of these biomarkers with antimicrobial peptide and FLT3 signaling pathways suggests a robust immune response [ 61 ] as a critical driver of complete response. Additionally, C2CD4D-AS1, an lncRNA, was identified; its exact role within these complex molecular interactions in BC remains to be elucidated. Future work could specifically search for these complex interactions across different molecules to gain more clinically relevant insights into pCR tumors.

RCB-I tumors, despite responding well to treatment, were associated with a distinct multimodal molecular signature. These tumors were enriched for mutations in the therapeutic target MUC16 (CA125), known for its role in immune evasion [ 51 ], and the tumor suppressor gene TP53, particularly in HER2+ cases [ 14 ]. Elevated expression of TST and GPX1P1 (lncRNA involved in oxidative stress response) was associated with increased infiltration of diverse immune cells, including Tem CD4+ cells, monocytes and NK cells [ 10 ]. This active immune landscape, together with the intricate interactions of these signatures with potential perturbations in the allograft rejection pathway, suggests a robust yet potentially incomplete anti-tumor immune response, contributing to the minimal residual disease observed in this subtype.

RCB-II tumors showed lower neoantigen loads compared to pCR, which could contribute to a weaker immune response, particularly in ER-negative and HER2+ subtypes. Increased expression of lncRNAs, such as FTH1P20, RNF5P1, RPLP0P9 and ERVMER34-1, was associated with the immune checkpoint protein IDO1, which negatively regulates CTLA-4 protein expression; together these suggest immune evasion and alterations in tumor cell metabolism and proliferation. The altered interactions among these molecules implicate dysregulation of G1/S DNA damage as a possible mechanism for moderate treatment response [ 64 ].

RCB-III tumors, classified as resistant, were associated with a distinct multimodal molecular landscape driven by reduced TP53 and MT-ND4 mutations [ 52 ], accompanied by higher neoantigen loads compared to other response groups. This suggests an alternative mechanism driving tumor progression and immune evasion. Despite the high neoantigen load, which could potentially trigger an immune response, these tumors exhibited immune evasion through increased HLA-E immune checkpoints and T-cell exclusion [ 40 , 55 ]. Also, the downregulation of genes like PON3 and the lncRNA ENSG00000261116, along with lymph node involvement, pointed to advanced disease and cellular stress adaptation [ 9 ]. The presence of these complex interactions, including potential perturbations in the neurotransmitter release cycle pathway, could contribute to treatment resistance in RCB-III tumors. Future studies targeting these immunosuppressive mechanisms and exploring novel pathways could offer promising avenues to overcome resistance in this aggressive subtype.

These findings emphasize the potential of MOMLIN to enable a deeper understanding of the complex biological mechanisms corresponding to each response class, ultimately paving the way for personalized treatment strategies in cancer. MOMLIN also demonstrated the best prediction performance for unseen patients by utilizing these identified sets of network biomarkers. By identifying response-associated biomarkers, researchers can stratify patients based on their likelihood of achieving pCR or experiencing RCB after anticancer treatment, facilitating more informed treatment decisions and potentially improving patient outcomes. Moreover, the identified biomarkers could serve as valuable targets for the development of novel therapeutic interventions and for generating new biological hypotheses. However, the clinical translation of multimodal biomarkers necessitates addressing the potential economic burden associated with multi-omics testing. Developing targeted biomarker panels and prioritizing key hub molecules from the large-scale candidate multimodal network biomarkers identified by MOMLIN could be a viable strategy for reducing costs while maintaining predictive accuracy. Furthermore, ongoing advancements in sequencing and diagnostic technologies are expected to make multi-omics testing more accessible and affordable over time.
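One way to picture the hub-prioritization strategy mentioned above is to rank candidate biomarkers by their total interaction strength and keep only the top few for a targeted panel. The sketch below assumes the network is available as a list of weighted edges; the function name, the ranking criterion (sum of absolute edge weights) and the example weights are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def top_hubs(edges, k):
    """Return the k nodes with the largest summed absolute edge weight.

    edges: iterable of (u, v, weight). Ties are broken alphabetically so
    the ranking is deterministic.
    """
    strength = defaultdict(float)
    for u, v, weight in edges:
        strength[u] += abs(weight)
        strength[v] += abs(weight)
    ranked = sorted(strength.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical weights between molecules named in the text (illustration only).
edges = [
    ("IDO1", "FTH1P20", 0.5),
    ("IDO1", "RNF5P1", 0.25),
    ("IDO1", "CTLA4", -0.75),
    ("RNF5P1", "RPLP0P9", 0.25),
]
panel = top_hubs(edges, k=2)
```

Whether a panel reduced in this way keeps its predictive accuracy would still need to be checked on held-out patients.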

In conclusion, our study demonstrates MOMLIN’s capacity to uncover nuanced molecular signatures associated with different drug-response classes in BC. By integrating multi-modal and -omics datasets, we have highlighted the complex interplay between genetic alterations, gene expression, immune infiltration and cellular pathways that contribute to treatment response and resistance. Future research in this direction holds promise for refining risk stratification, optimizing treatment selection and ultimately improving patient outcomes.

While MOMLIN demonstrates promising results, a key limitation lies in its reliance on correlation-based algorithms for multi-omics data integration. These algorithms are effective at identifying associations, but they can fall short when it comes to inferring causality between different omics layers. This is a challenge faced by most current state-of-the-art methods [ 28 , 30 ]. In future iterations of MOMLIN, we aim to incorporate causal inference methodologies alongside sparse correlation algorithms to better understand the complex causal relationships within multi-omics datasets.

We proposed MOMLIN, a novel framework designed to integrate multimodal data and identify response-associated network biomarkers, to understand biological mechanisms and regulatory roles.

MOMLIN employs adaptive weighting for different data modalities and an innovative regularization constraint to ensure robust feature selection when analyzing high-dimensional omics data.

MOMLIN demonstrates significantly improved performance compared to current state-of-the-art methods.

MOMLIN identifies interpretable and phenotype-specific components, providing insights into the molecular mechanisms driving treatment response and resistance.

We thank Dr Yoshihiro Yamnishi and Mr Chen Yuzhou for their technical help.

This work was supported by the core research budget of Bioinformatics Institute, ASTAR.

Supplemental information and software are available at the Bib website. Our algorithm’s software is available for free download at https://github.com/mamun41/MOMLIN_softwar/tree/main

Hasin Y, Seldin M, Lusis A. Multi-omics approaches to disease. Genome Biol 2017;18:83.

Rashid MM, Hamano M, Iida M, et al. Network-based identification of diagnosis-specific trans-omic biomarkers via integration of multiple omics data. Biosystems 2024;236:105122. https://doi.org/10.1016/j.biosystems.2024.105122

Zhu B, Song N, Shen R, et al. Integrating clinical and multiple omics data for prognostic assessment across human cancers. Sci Rep 2017;7:16954. https://doi.org/10.1038/s41598-017-17031-8

Aly HA. Cancer therapy and vaccination. J Immunol Methods 2012;382:1–23.

Debela DT, et al. New approaches and procedures for cancer treatment: current perspectives. SAGE Open Med 2021;9:20503121211034366.

Rauf A, Abu-Izneid T, Khalil AA, et al. Berberine as a potential anticancer agent: a comprehensive review. Molecules 2021;26:7368. https://doi.org/10.3390/molecules26237368

Islam MR, Islam F, Nafady MH, et al. Natural small molecules in breast cancer treatment: understandings from a therapeutic viewpoint. Molecules 2022;27:2165. https://doi.org/10.3390/molecules27072165

Emran TB, Shahriar A, Mahmud AR, et al. Multidrug resistance in cancer: understanding molecular mechanisms. Front Oncol 2022;12:891652. https://doi.org/10.3389/fonc.2022.891652

Sammut SJ, Crispin-Ortuzar M, Chin SF, et al. Multi-omic machine learning predictor of breast cancer therapy response. Nature 2022;601:623–9. https://doi.org/10.1038/s41586-021-04278-5

Zhang A, Miao K, Sun H, et al. Tumor heterogeneity reshapes the tumor microenvironment to influence drug resistance. Int J Biol Sci 2022;18:3019–33. https://doi.org/10.7150/ijbs.72534

Karczewski KJ, Snyder MP. Integrative omics for health and disease. Nat Rev Genet 2018;19:299–310.

In GK, et al. Multi-omic profiling reveals discrepant immunogenic properties and a unique tumor microenvironment among melanoma brain metastases. NPJ Precis Oncol 2023;7:120. https://doi.org/10.1038/s41698-023-00471-z

Denkert C, Untch M, Benz S, et al. Reconstructing tumor history in breast cancer: signatures of mutational processes and response to neoadjuvant chemotherapy. Ann Oncol 2021;32:500–11. https://doi.org/10.1016/j.annonc.2020.12.016

Lesurf R, Griffith OL, Griffith M, et al. Genomic characterization of HER2-positive breast cancer and response to neoadjuvant trastuzumab and chemotherapy-results from the ACOSOG Z1041 (alliance) trial. Ann Oncol 2017;28:1070–7. https://doi.org/10.1093/annonc/mdx048

Choi JH, Yu J, Jung M, et al. Prognostic significance of TP53 and PIK3CA mutations analyzed by next-generation sequencing in breast cancer. Medicine (Baltimore) 2023;102:e35267. https://doi.org/10.1097/MD.0000000000035267

Simeoni O, Piras V, Tomita M, et al. Tracking global gene expression responses in T cell differentiation. Gene 2015;569:259–66. https://doi.org/10.1016/j.gene.2015.05.061

Piras V, Hayashi K, Tomita M, et al. Enhancing apoptosis in TRAIL-resistant cancer cells using fundamental response rules. Sci Rep 2011;1:144. https://doi.org/10.1038/srep00144

Misetic H, Keddar MR, Jeannon JP, et al. Mechanistic insights into the interactions between cancer drivers and the tumour immune microenvironment. Genome Med 2023;15:40. https://doi.org/10.1186/s13073-023-01197-0

Son B, Lee S, Youn HS, et al. The role of tumor microenvironment in therapeutic resistance. Oncotarget 2017;8:3933–45. https://doi.org/10.18632/oncotarget.13907

Wang C, Lye X, Kaalia R, et al. Deep learning and multi-omics approach to predict drug responses in cancer. BMC Bioinformatics 2021;22:632. https://doi.org/10.1186/s12859-022-04964-9

Li F, Yin J, Lu M, et al. ConSIG: consistent discovery of molecular signature from OMIC data. Brief Bioinform 2022;23:bbac253. https://doi.org/10.1093/bib/bbac253

Yang Q, Li B, Tang J, et al. Consistent gene signature of schizophrenia identified by a novel feature selection strategy from comprehensive sets of transcriptomic data. Brief Bioinform 2020;21:1058–68. https://doi.org/10.1093/bib/bbz049

Picard M, Scott-Boyer MP, Bodein A, et al. Integration strategies of multi-omics data for machine learning analysis. Comput Struct Biotechnol J 2021;19:3735–46. https://doi.org/10.1016/j.csbj.2021.06.030

Dong Z, Zhang N, Li C, et al. Anticancer drug sensitivity prediction in cell lines from baseline gene expression through recursive feature selection. BMC Cancer 2015;15:489. https://doi.org/10.1186/s12885-015-1492-6

Menden MP, Iorio F, Garnett M, et al. Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PloS One 2013;8:e61318. https://doi.org/10.1371/journal.pone.0061318

Basu A, Bodycombe NE, Cheah JH, et al. An interactive resource to identify cancer genetic and lineage dependencies targeted by small molecules. Cell 2013;154:1151–61. https://doi.org/10.1016/j.cell.2013.08.003

Adam G, Rampášek L, Safikhani Z, et al. Machine learning approaches to drug response prediction: challenges and recent progress. NPJ Precis Oncol 2020;4:19. https://doi.org/10.1038/s41698-020-0122-1

Singh A, Shannon CP, Gautier B, et al. DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays. Bioinformatics 2019;35:3055–62. https://doi.org/10.1093/bioinformatics/bty1054

Wang T, Shao W, Huang Z, et al. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat Commun 2021;12:3445. https://doi.org/10.1038/s41467-021-23774-w

Argelaguet R, Arnol D, Bredikhin D, et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol 2020;21:111. https://doi.org/10.1186/s13059-020-02015-1

Rodosthenous T, Shahrezaei V, Evangelou M. Integrating multi-OMICS data through sparse canonical correlation analysis for the prediction of complex traits: a comparison study. Bioinformatics 2020;36:4616–25. https://doi.org/10.1093/bioinformatics/btaa530

Witten DM, Tibshirani RJ. Extensions of sparse canonical correlation analysis with applications to genomic data. Stat Appl Genet Mol Biol 2009;8:Article28. https://doi.org/10.2202/1544-6115.1470

Jeong D, Koo B, Oh M, et al. GOAT: gene-level biomarker discovery from multi-omics data using graph ATtention neural network for eosinophilic asthma subtype. Bioinformatics 2023;39:btad582. https://doi.org/10.1093/bioinformatics/btad582

Hu W, Lin D, Cao S, et al. Adaptive sparse multiple canonical correlation analysis with application to imaging (epi)genomics study of schizophrenia. IEEE Trans Biomed Eng 2018;65:390–9. https://doi.org/10.1109/TBME.2017.2771483

Hanzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 2013;14:7.

Sotiriou C, Wirapati P, Loi S, et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst 2006;98:262–72. https://doi.org/10.1093/jnci/djj052

Desmedt C, Haibe-Kains B, Wirapati P, et al. Biological processes associated with breast cancer clinical outcome depend on the molecular subtypes. Clin Cancer Res 2008;14:5158–65. https://doi.org/10.1158/1078-0432.CCR-07-4756

Danaher P, Warren S, Dennis L, et al. Gene expression markers of tumor infiltrating leukocytes. J Immunother Cancer 2017;5:18. https://doi.org/10.1186/s40425-017-0215-8

Charoentong P, Finotello F, Angelova M, et al. Pan-cancer immunogenomic analyses reveal genotype-immunophenotype relationships and predictors of response to checkpoint blockade. Cell Rep 2017;18:248–62. https://doi.org/10.1016/j.celrep.2016.12.019

Jiang P, Gu S, Pan D, et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat Med 2018;24:1550–8. https://doi.org/10.1038/s41591-018-0136-1

D’Eustachio P. Reactome knowledgebase of human biological pathways and processes. Methods Mol Biol 2011;694:49–61.

Schaefer CF, Anthony K, Krupa S, et al. PID: the pathway interaction database. Nucleic Acids Res 2009;37:D674–9. https://doi.org/10.1093/nar/gkn653

Liberzon A, Subramanian A, Pinchback R, et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 2011;27:1739–40. https://doi.org/10.1093/bioinformatics/btr260

Du L, et al. Identifying diagnosis-specific genotype–phenotype associations via joint multitask sparse canonical correlation analysis and classification. Bioinformatics 2020;36:i371–9. https://doi.org/10.1093/bioinformatics/btaa434

Hao X, Li C, du L, et al. Mining outcome-relevant brain imaging genetic associations via three-way sparse canonical correlation analysis in Alzheimer’s disease. Sci Rep 2017;7:44272. https://doi.org/10.1038/srep44272

Shi WJ, Zhuang Y, Russell PH, et al. Unsupervised discovery of phenotype-specific multi-omics networks. Bioinformatics 2019;35:4336–43. https://doi.org/10.1093/bioinformatics/btz226

Duan R, Gao L, Gao Y, et al. Evaluation and comparison of multi-omics data integration methods for cancer subtyping. PLoS Comput Biol 2021;17:e1009224. https://doi.org/10.1371/journal.pcbi.1009224

Ponzetti M, Capulli M, Angelucci A, et al. Non-conventional role of haemoglobin beta in breast malignancy. Br J Cancer 2017;117:994–1006. https://doi.org/10.1038/bjc.2017.247

Yang X, Tang Z. Role of gasdermin family proteins in cancers (review). Int J Oncol 2023;63:100. https://doi.org/10.3892/ijo.2023.5548

Chen Z, Yao N, Zhang S, et al. Identification of critical radioresistance genes in esophageal squamous cell carcinoma by whole-exome sequencing. Ann Transl Med 2020;8:998. https://doi.org/10.21037/atm-20-5196

Zhou Y, Zhang Y, Zhao D, et al. TTD: therapeutic target database describing target druggability information. Nucleic Acids Res 2024;52:D1465–77. https://doi.org/10.1093/nar/gkad751

Li F, Yin J, Lu M, et al. DrugMAP: molecular atlas and pharma-information of all drugs. Nucleic Acids Res 2023;51:D1288–99. https://doi.org/10.1093/nar/gkac813

Záveský L, Jandáková E, Weinberger V, et al. Human endogenous retroviruses (HERVs) in breast cancer: altered expression pattern implicates divergent roles in carcinogenesis. Oncology 2024;102:1–10. https://doi.org/10.1159/000538021

van der Wiel AMA, Schuitmaker L, Cong Y, et al. Homologous recombination deficiency scar: mutations and beyond-implications for precision oncology. Cancers (Basel) 2022;14:4157. https://doi.org/10.3390/cancers14174157

Morisaki T, Kubo M, Umebayashi M, et al. Neoantigens elicit T cell responses in breast cancer. Sci Rep 2021;11:13590. https://doi.org/10.1038/s41598-021-91358-1

Levine KM, Ding K, Chen L, et al. FGFR4: a promising therapeutic target for breast cancer and other solid tumors. Pharmacol Ther 2020;214:107590. https://doi.org/10.1016/j.pharmthera.2020.107590

Ali H, Provenzano E, Dawson SJ, et al. Association between CD8+ T-cell infiltration and breast cancer survival in 12 439 patients. Ann Oncol 2014;25:1536–43. https://doi.org/10.1093/annonc/mdu191

Liu Y, Pan B, Qu W, et al. Systematic analysis of the expression and prognosis relevance of FBXO family reveals the significance of FBXO1 in human breast cancer. Cancer Cell Int 2021;21:130. https://doi.org/10.1186/s12935-021-01833-y

Park SD, Saunders AS, Reidy MA, et al. A review of granulocyte colony-stimulating factor receptor signaling and regulation with implications for cancer. Front Oncol 2022;12:932608. https://doi.org/10.3389/fonc.2022.932608

Aghamiri S, Zandsalimi F, Raee P, et al. Antimicrobial peptides as potential therapeutics for breast cancer. Pharmacol Res 2021;171:105777. https://doi.org/10.1016/j.phrs.2021.105777

Chen R, Wang X, Fu J, et al. High FLT3 expression indicates favorable prognosis and correlates with clinicopathological parameters and immune infiltration in breast cancer. Front Genet 2022;13:956869. https://doi.org/10.3389/fgene.2022.956869

Chen X, Zhang T, Su W, et al. Mutant p53 in cancer: from molecular mechanism to therapeutic modulation. Cell Death Dis 2022;13:974. https://doi.org/10.1038/s41419-022-05408-1

Azimnasab-Sorkhabi P, Soltani-As M, Yoshinaga TT, et al. IDO blockade negatively regulates the CTLA-4 signaling in breast cancer cells. Immunol Res 2023;71:679–86. https://doi.org/10.1007/s12026-023-09378-0

Sideris N, Dama P, Bayraktar S, et al. LncRNAs in breast cancer: a link to future approaches. Cancer Gene Ther 2022;29:1866–77. https://doi.org/10.1038/s41417-022-00487-w

Burstein MD, et al. Comprehensive genomic analysis identifies novel subtypes and targets of triple-negative breast cancer. Clin Cancer Res 2015;21:1688–98.

  • Online ISSN 1477-4054
  • Copyright © 2024 Oxford University Press

Bayesian statistical inference for factor analysis models with clustered data.

Chen, B.; He, N.; Li, X. Bayesian Statistical Inference for Factor Analysis Models with Clustered Data. Mathematics 2024 , 12 , 1949. https://doi.org/10.3390/math12131949

COMMENTS

  1. Multiple Case Research Design

    The major advantage of multiple case research lies in cross-case analysis. A multiple case research design shifts the focus from understanding a single case to the differences and similarities between cases. Thus, it is not just conducting more (second, third, etc.) case studies. Rather, it is the next step in developing a theory about factors ...

  2. Multiple Case Studies

    Case study research and applications: Design and methods. Los Angeles, CA: Sage. Yin discusses how to decide if a case study should be used in research. Novice researchers can learn about research design, data collection, and data analysis of different types of case studies, as well as writing a case study report.

  3. Case Study Methodology of Qualitative Research: Key Attributes and

    In a case study research, multiple methods of data collection are used, as it involves an in-depth study of a phenomenon. It must be noted, as highlighted by Yin ... (2001, p. 220) posits that the 'unit of analysis' in a case study research can be an individual, a family, a household, a community, an organisation, an event or even a decision.

  4. Multiple Case Study Data Analysis for Doctoral Researchers in ...

    However, the qualitative data analysis process for multiple case studies is a multi-step process that can be challenging for doctoral researchers. This article thus outlines the qualitative data analysis process for a doctoral-level multiple case study in management and leadership, including conducting descriptive coding and cross-case ...

  5. Toward Developing a Framework for Conducting Case Study Research

    Yin (1989, 1994) suggests three principles of data collection for case studies: use multiple sources of data, create a case study database, and maintain a chain of evidence. Stake (1995) states that essential parts of a data-gathering plan are definition of the case, list of research questions, identification of helpers, data sources ...

  6. Multiple Case Research Design

    Both single and multiple-case designs can be holistic (one unit of analysis per case) or embedded (multiple units of analysis per case) (Yin, 2014). Opportunistic and convenient sampling A general challenge in empirical studies is access to a sufficiently large number of interview partners who reserve time for an interview (or other data ...

  7. PDF A (VERY) BRIEF REFRESHER ON THE CASE STUDY METHOD

    ve as a brief refresher to the case study method. As a refresher, the chapter does not fully cover all the options or nuances that you might encounter when customizing your own case study (refer to Yin, 2009a, to obtain a full rendition of the entire method). Besides discussing case study design, data collection, and analysis, the refresher addr.

  8. The Data Analysis Process for Multiple Case Study Research

    Yin's recommendation is for the case study researcher to follow a ground-up strategy to analyze the data that allows critical concepts to emerge by carefully examining the data. To illustrate the cross-case synthesis method, let us review the data analysis procedures used by Shepherd in a multiple case study with the purpose of in-depth ...

  9. (PDF) Using a Multiple-Case Studies Design to Investigate the

    A multiple case study approach was deemed as the most appropriate approach for this inquiry as it allows the researcher to explore a phenomenon using a replication strategy that is tantamount to ...

  10. PDF 9 Multiple Case Research Design

    A multiple case research design shifts the focus from understanding a single case to the differences and similarities between cases. Thus, it is not just conducting another (second, third, etc.) case study. Rather, it is the next step in developing a theory about factors driving differences and similarities.

  11. Qualitative case study data analysis: an example from practice

    Data sources: The research example used is a multiple case study that explored the role of the clinical skills laboratory in preparing students for the real world of practice. Data analysis was conducted using a framework guided by the four stages of analysis outlined by Morse ( 1994 ): comprehending, synthesising, theorising and recontextualising.

  12. Case Study

    The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as content analysis or thematic analysis. The findings of a single-case study are often used to generate new research questions, develop theories, or inform policy or practice. Multiple-Case ...

  13. Case Study Methods and Examples

    The purpose of case study research is twofold: (1) to provide descriptive information and (2) to suggest theoretical relevance. Rich description enables an in-depth or sharpened understanding of the case. It is unique given one characteristic: case studies draw from more than one data source. Case studies are inherently multimodal or mixed ...

  14. Four Steps to Analyse Data from a Case Study Method

    propose an approach to the analysis of case study data by logically linking the data to a series of propositions and then interpreting the subsequent information. Like the Yin (1994) strategy, the Miles and Huberman (1994) process of analysis of case study data, although quite detailed, may still be insufficient to guide the novice researcher.

  15. The Multiple Case Study Design Methodology and Application for

    The multiple case study design is a valuable qualitative research tool in studying the links between the personal, social, behavioral, psychological, organizational, cultural, and environmental factors that guide organizational and leadership development. ... Applying Data Collection Methods in Multiple Case Study Research 7. The Data Analysis ...

  16. 10 Real World Data Science Case Studies Projects with Example

    A case study in data science is an in-depth analysis of a real-world problem using data-driven approaches. It involves collecting, cleaning, and analyzing data to extract insights and solve challenges, offering practical insights into how data science techniques can address complex issues across various industries.

  17. Case Study Method: A Step-by-Step Guide for Business Researchers

    Second, an analysis of authors' multiple case studies is presented in order to provide an application of step-by-step guideline. ... The authors interpreted the raw data for case studies with the help of a four-step interpretation process (PESI). Raw empirical material, in the form of texts from interviews, field notes of meetings, and ...

  18. The Multiple Case Study Design

    This unique volume offers novice and experienced researchers a brief, student-centric research methods text specifically devoted to the multiple case study design. The multiple case study design is a valuable qualitative research tool in studying the links between the personal, social, behavioral, psychological, organizational, cultural, and ...

  19. PDF A multiple case design for the investigation of information management

    The multiple-case design was the best research design for this study, as it allowed the researcher to use best practices from the two international universities in order to develop a conceptual framework for the University of Johannesburg. The major benefit of using a multiple-case design was that multiple perspectives of the individuals

  20. How to analyse multiple case studies.

    You analyse each case separately, and then explore patterns of similarity or difference. Another method is the most-similar and most-different approaches; and also QCA, all of ...

  21. Using Multiple Case Study Design and Thematic Analysis in NVivo to

    In this case study, readers will learn how to overcome the challenges related to data collection and analysis in a multiple case study design. The practical recommendations for choosing the participants will be discussed. Moreover, the application of reflexive thematic analysis in NVivo will be explored.

  22. How can I analysis multiple case studies? Using primary data and

    Concerning strategies, and given that in case studies we deal with huge amounts of data, I use mind maps to incorporate and analyze data from different sources, as you can see in the image: https ...

  23. Full article: Evaluation of lean construction practices for improving

    This research employed a mixed-methods approach, combining surveys, interviews and document analysis to collect data. By utilizing these methods, the study aimed to evaluate how lean construction practices influence project delivery in the district (Bashir et al., 2011). Case studies of real construction projects were conducted to ...

  24. Assessing the impact of missing data in youth overweight and obesity

    Youth overweight and obesity (OWOB) surveillance often uses body mass index (BMI) derived from self-reported height and weight, but these measures can suffer from high proportions of missing data. Complete case analysis (CCA) is the most common approach to handle missing data, but this approach can introduce bias if missing data are not missing ...

  25. Proteomic prediction of diverse incident diseases: a machine learning

    We designed multiple case-cohorts nested in the EPIC-Norfolk prospective study, from participants with available serum samples and genome-wide genotype data, with more than 32 974 person-years of follow-up. Participants were middle-aged individuals (aged 40-79 years at baseline) of European ancestry who were recruited from the general population of Norfolk, England, between March, 1993 and ...

  26. American Journal of Neuroradiology

    STUDY SELECTION: 718 abstracts were screened and 95 underwent full-text review, with 2 articles meeting inclusion criteria. The Risk of Bias in Non-randomized Studies of Interventions assessment tool was used. DATA ANALYSIS: A qualitative assessment without a pooled analysis was performed for the two studies meeting inclusion criteria.

  27. Advancing drug-response prediction using multi-modal and -omics machine

    Introduction. The advent of high-throughput sequencing technologies has revolutionized our ability to collect various 'omics' data types, such as deoxyribonucleic acid (DNA) methylations, ribonucleic acid (RNA) expressions, proteomics, metabolomics and bioimaging datasets, from the same samples or patients with unprecedented details. By far, most studies have performed single omics ...

  28. Understanding "patient deterioration" in psychotherapy from depressed

    Objective: This study scrutinizes the meaning of deterioration in psychotherapy beyond the widely used statistical definition of reliable symptom increase pre-to-post treatment. Method: An explanatory sequential mixed-methods multiple case study was conducted, combining quantitative pre-post outcome evaluation of self-reported depression symptoms and qualitative analysis of patients' interviews.

  29. Bayesian Statistical Inference for Factor Analysis Models with ...

    Clustered data are a complex and frequently used type of data. Traditional factor analysis methods are effective for non-clustered data, but they do not adequately capture correlations between multiple observed individuals or variables in clustered data. This paper proposes a Bayesian approach utilizing MCMC and Gibbs sampling algorithms to accurately estimate parameters of interest within the ...

  30. Can we do thematic analysis for multiple case study design?

    Thematic analysis is an analytical approach that is utilized by academics in a variety of subjects and disciplines. It is an analytic technique and synthesis technique utilized as part of ...