
How do you determine the quality of a journal article?

Published on October 17, 2014 by Bas Swaen. Revised on March 4, 2019.

In the theoretical framework of your thesis, you support the research you want to perform with a literature review, in which you look for earlier research on your subject. These studies are often published in the form of scientific articles in journals (scientific publications).

Why is good quality important? Check the following points.

The better the quality of the articles you use in the literature review, the stronger your own research will be. If you use articles that are not well respected, you run the risk that the conclusions you draw are unfounded. Your supervisor will always check the sources behind the conclusions you draw.

We will use an example to explain how you can judge the quality of a scientific article. We will use the following article as our example:

Example article

Perrett, D. I., Burt, D. M., Penton-Voak, I. S., Lee, K. J., Rowland, D. A., & Edwards, R. (1999). Symmetry and human facial attractiveness. Evolution and Human Behavior, 20, 295-307. Retrieved from http://www.grajfoner.com/Clanki/Perrett%201999%20Symetry%20Attractiveness.pdf

This article is about the possible link between facial symmetry and the attractiveness of a human face.


1. Where is the article published?

The journal (academic publication) where the article is published says something about the quality of the article. Journals are ranked in the Journal Quality List (JQL). If the journal you used is ranked at the top of your professional field in the JQL, then you can assume that the quality of the article is high.

The article from the example is published in the journal “Evolution and Human Behavior”. The journal is not on the Journal Quality List, but after googling it, multiple sources indicate that it is nevertheless among the top journals in the field of psychology (see Journal Ranking at http://www.ehbonline.org/). The quality of the source is thus high enough to use it.

So if a journal is not listed in the Journal Quality List, it is worthwhile to google it. You will then find out more about the quality of the journal.

2. Who is the author?

The next step is to look at who the author of the article is:

  • What do you know about the person who wrote the paper?
  • Has the author done much research in this field?
  • What do others say about the author?
  • What is the author’s background?
  • At which university does the author work? Does this university have a good reputation?

The lead author of the article (Perrett) has already done much work in this research field, including prior studies of predictors of attractiveness. Penton-Voak, one of the other authors, also collaborated on these studies. In 1999, Perrett and Penton-Voak were both professors at the University of St Andrews in the United Kingdom, a university that ranks among the top 100 in the world. Less information is available about the other authors; they may have been students who assisted the professors.

3. What is the date of publication?

In which year was the article published? The more recent the research, the better. If the research is a bit older, it’s smart to check whether any follow-up research has taken place. Maybe the author continued the research, and more useful results have been published.

Tip! If you’re searching for an article in Google Scholar, click ‘Since 2014’ in the left-hand column. If you can’t find anything (more) there, select ‘Since 2013’. If you work backward in this manner, you will find the most recent studies.
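If you want to script this year-by-year search, the same filter is exposed through Google Scholar’s as_ylo (“year low”) URL parameter, which the ‘Since <year>’ links set in the browser. A minimal sketch in Python; note that Google Scholar has no official API and may block automated requests:

```python
from urllib.parse import urlencode

def scholar_search_url(query: str, since_year: int) -> str:
    """Build a Google Scholar URL restricted to results from since_year onward."""
    # as_ylo is the parameter the 'Since <year>' sidebar links set.
    params = {"q": query, "as_ylo": since_year}
    return "https://scholar.google.com/scholar?" + urlencode(params)

# Work backward year by year, as the tip suggests.
for year in range(2014, 2010, -1):
    print(scholar_search_url("facial symmetry attractiveness", year))
```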

The article from the example was published in 1999. This is not extremely old, but quite a bit of follow-up research has probably been done in the past 15 years. Indeed, via Google Scholar I quickly found an article from 2013 that investigated the influence of symmetry on facial attractiveness in children. The example article from 1999 can serve as a good foundation for reading up on the subject, but it is advisable to find out how research into the influence of symmetry on facial attractiveness has developed since.

4. What do other researchers say about the paper?

Find out who the experts are in this field of research. Do they support the research, or are they critical of it?

By searching in Google Scholar, I see that the article has been cited at least 325 times, which means it is referenced in at least 325 other articles. Looking at the authors of those articles, I see that they are experts in the research field, and that they cite the article as support rather than to criticize it.

5. Determine the quality

Now look back: how did the article score on the points mentioned above? Based on that, you can determine its quality.

The example article scored ‘reasonable’ to ‘good’ on all points, so we can consider it to be of good quality, and therefore useful in, for example, a literature review. Because the article is somewhat dated, however, it is wise to also search for more recent research.
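If you assess many sources, it can help to record the checks in a consistent structure and aggregate them the same way every time. A minimal sketch, where the 0–2 scale, the equal weighting, and the thresholds are illustrative assumptions rather than anything the article prescribes:

```python
from dataclasses import dataclass

@dataclass
class ArticleAssessment:
    """Scores for checks 1-4 on an illustrative 0-2 scale
    (0 = poor, 1 = reasonable, 2 = good)."""
    journal_ranking: int    # 1. Where is the article published?
    author_reputation: int  # 2. Who is the author?
    recency: int            # 3. What is the date of publication?
    reception: int          # 4. What do other researchers say?

    def verdict(self) -> str:
        # Step 5: look back at the points and determine quality.
        avg = (self.journal_ranking + self.author_reputation
               + self.recency + self.reception) / 4
        if avg >= 1.5:
            return "good"
        if avg >= 1.0:
            return "reasonable"
        return "weak; look for better sources"

# The example article: strong journal, authors, and reception; somewhat dated.
perrett_1999 = ArticleAssessment(journal_ranking=2, author_reputation=2,
                                 recency=1, reception=2)
print(perrett_1999.verdict())  # -> good
```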



What Is Quality in Research? Building a Framework of Design, Process and Impact Attributes and Evaluation Perspectives

1. Introduction

2. Background

3. Research Process and Method

4. Attributes of Quality in Scientific Research

  • Research Design: All that relates to the conceptualization of research, its aims and specific goals, the strategy or approach adopted by the researcher, the initial assumptions and ultimate focus of research. This can be termed an ex ante approach to quality.
  • Research Process: The execution of research activities, the research method and tools applied, the conduct of the researcher and the formalization and reporting of results. While not a very common focus in most evaluations, this interim approach is the closest to the everyday practice of the researchers.
  • Research Impact: The sharing of results, the influence on scholars and practitioners, the adoption or utilization of findings and the ultimate effects on the society. Ex post approaches to quality differ in whether they are interested mainly in the impact within the research system or outside of it.

5. An Integrative Process Description

  • Ideation, i.e., the identification of a rationale or motivation (triggering factor) for conducting research and its high-level goals;
  • Preparation, i.e., the definition of strategy, detailed objectives and a plan to conduct research activities;
  • Execution, i.e., the actual undertaking of research activities;
  • Reporting, i.e., the writing and formalization of a story about what we performed and what results we achieved;
  • Publication, i.e., the sharing of the outcomes of our research within the academic/practitioner community;
  • Diffusion, i.e., the post-publication dissemination of results and the resulting effects.
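One way to read the framework is as a mapping from these six process phases onto the three evaluation perspectives of Section 4 (design, process, impact). A minimal sketch in Python; the phase-to-perspective assignment below is our illustrative reading of the paper, not a table taken from it:

```python
from enum import Enum

class Perspective(Enum):
    DESIGN = "ex ante: conceptualization, aims, strategy"
    PROCESS = "interim: execution, method, reporting"
    IMPACT = "ex post: sharing, adoption, societal effects"

class Phase(Enum):
    IDEATION = "rationale and high-level goals"
    PREPARATION = "strategy, objectives, plan"
    EXECUTION = "undertaking of research activities"
    REPORTING = "writing and formalization of results"
    PUBLICATION = "sharing outcomes with the community"
    DIFFUSION = "post-publication dissemination and effects"

# Illustrative assignment of each phase to the perspective that
# would most naturally evaluate it.
PHASE_TO_PERSPECTIVE = {
    Phase.IDEATION: Perspective.DESIGN,
    Phase.PREPARATION: Perspective.DESIGN,
    Phase.EXECUTION: Perspective.PROCESS,
    Phase.REPORTING: Perspective.PROCESS,
    Phase.PUBLICATION: Perspective.IMPACT,
    Phase.DIFFUSION: Perspective.IMPACT,
}

for phase, perspective in PHASE_TO_PERSPECTIVE.items():
    print(f"{phase.name:<12} -> {perspective.name}")
```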

6. Discussion and Conclusions

6.1. Main Contribution and Comparison with Literature

6.2. Policy Implications and Practical Use of the Framework

6.3. Future Research and Concluding Remarks

Appendix A. Page for Expert Feedback Collection

Defining and Assessing Quality Dimensions in Scientific Research

  • Accessibility
  • Adherence to standards/rules
  • Clinical significance
  • Collaboration distance
  • Communicability
  • Community impact
  • Completeness
  • Conformance to ethics
  • Consistency
  • Consumability
  • Contextualization
  • Contributory
  • Cost effectiveness
  • Craftsmanship
  • Credibility
  • Cross-field connections
  • Disinterestedness
  • Dissemination potential
  • Documented sources
  • Economic impact/significance
  • Educational impact
  • Evidence disclosure
  • Expert feedback
  • Feasibility and success probability
  • Generalizability
  • Impartiality
  • Innovation degree
  • Intellectual significance
  • Interdisciplinarity
  • Internal validity
  • International scope
  • Investigator’s expertise
  • Language clarity
  • Methodological soundness
  • Objectivity
  • Operationalization
  • Organized skepticism
  • Originality
  • Outcomes anticipation
  • Overall research approach
  • Plausibility
  • Political and social significance
  • Practitioner impact
  • Reliability
  • Reproducibility
  • Research process evaluation
  • Resource attraction
  • Resource management
  • Rigorousness
  • Scholarly exchange and diffusion
  • Scientific significance and value
  • Scope tailoring and focus
  • Searchability
  • Social impact/significance
  • Societal relevance and value
  • Stakeholder involvement
  • Stringent argumentation
  • Structure clarity
  • Sustainability
  • Systems view
  • Technological significance
  • Testability
  • Theory impact
  • Thoroughness
  • Transparency
  • Trendsetting and future outline
  • Truth and veridicity
  • Unconventionality
  • Universalism
  • Quality attributes/dimensions related to Research Vision (i.e., aims, rationale);
  • Quality attributes/dimensions related to Research Process (i.e., execution, method, conduct);
  • Quality attributes/dimensions related to Research Description (i.e., formalization, reporting);
  • Quality attributes/dimensions related to Research Diffusion (i.e., sharing, adoption);
  • Quality attributes/dimensions related to Research Impact (i.e., effects, influence).
  • Whitley, R.; Glaser, J. The Changing Governance of the Sciences: The Advent of Research Evaluation Systems ; Springer: Dordrecht, The Netherlands, 2008. [ Google Scholar ]
  • Flink, T.; Peter, T. Excellence and frontier research as travelling concepts in science policymaking. Minerva 2018 , 56 , 431–452. [ Google Scholar ] [ CrossRef ]
  • Sayed, O.H. Critical Treatise on University Ranking Systems. Open J. Soc. Sci. 2019 , 7 , 39–51. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • van Vught, F.A.; Ziegele, F. Design and Testing the Feasibility of a Multidimensional Global University Ranking ; Final Report; European Community; CHERPA-Network; Consortium for Higher Education and Research Performance Assessment: Virtual Network, 2011. [ Google Scholar ]
  • Cole, S. Citations and the evaluation of individual scientists. Trends Biochem. Sci. 1989 , 4 , 9–13. [ Google Scholar ] [ CrossRef ]
  • Gisvold, S.E. Citation analysis and journal impact factors—Is the tail wagging the dog? Acta Anaesthesiol. Scand. 1999 , 43 , 971–973. [ Google Scholar ] [ CrossRef ]
  • Romano, N.C., Jr. Journal self-citation v: Coercive journal self-citation—Manipulations to increase impact factors may do more harm than good in the long run. Commun. Assoc. Inf. Syst. 2009 , 25 , 41–56. [ Google Scholar ] [ CrossRef ]
  • Seglen, P.O. Citation frequency and journal impact: Valid indicators of scientific quality? J. Intern. Med. 1991 , 229 , 109–111. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Seglen, P.O. Citations and journal impact factors: Questionable Indicators of research quality. Allergy 1997 , 52 , 1050–1056. [ Google Scholar ] [ CrossRef ]
  • Donovan, C. The qualitative future of research evaluation. Sci. Public Policy 2007 , 34 , 585–597. [ Google Scholar ] [ CrossRef ]
  • Hicks, D.; Wouters, P.; Waltman, L.; de Rijcke, S.; Rafols, I. The Leiden Manifesto for research metrics. Nature 2015 , 520 , 429–431. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Wilsdon, J.; Belfiore, E.; Campbell, P.; Curry, S.; Hill, S.; Jones, R.; Kain, R.; Kerridge, S.R.; Thelwall, M.; Tinkler, J.; et al. The Metric Tide: Report of the Independent Review of the Role of Metrics in Research Assessment and Management (HEFCE, 2015). Available online: https://www.researchgate.net/publication/279402178_The_Metric_Tide_Report_of_the_Independent_Review_of_the_Role_of_Metrics_in_Research_Assessment_and_Management (accessed on 9 February 2022).
  • Leydesdorff, L.; Wouters, F.; Bornman, L. Professional and citizen bibliometrics: Complementarities and ambivalences in the development and use of indicators. Scientometrics 2016 , 109 , 2129–2150. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Mårtensson, P.; Fors, U.; Wallin, S.-B.; Zander, U.; Nilsson, G.H. Evaluating research: A multidisciplinary approach to assessing research practice and quality. Res. Policy 2016 , 45 , 593–603. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Benedictus, R.; Miedema, F.; Ferguson, M.W. Fewer numbers, better science. Nature 2016 , 538 , 453. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Lamont, M.; Fournier, M.; Guetzkow, J.; Mallard, G.; Bernier, R. Evaluating Creative Minds: The Assessment of Originality in Peer Review. In Knowledge, Communication and Creativity ; Sales, A., Fournier, M., Eds.; SAGE Publications: London, UK, 2007; pp. 166–181. [ Google Scholar ]
  • Moore, S.; Neylon, C.; Paul Eve, M.; O’Donnell, D.P.; Pattinson, D. Excellence R Us: University research and the fetishisation of excellence. Palgrave Commun. 2017 , 3 , 16105. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Garfield, E. Citation indexes for science: A new dimension in documentation through association of ideas. Science 1955 , 122 , 108–111. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Garfield, E. Citation analysis as a tool in journal evaluation. Science 1972 , 178 , 471–479. [ Google Scholar ] [ CrossRef ]
  • Garfield, E. Journal impact factor: A brief review. CMAJ 1999 , 161 , 977–980. [ Google Scholar ]
  • Hirsch, J.E. An index to quantify an individual’s scientific research output. Proc. Natl. Acad. Sci. USA 2005 , 102 , 16569–16572. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Rousseau, R. New developments related to the Hirsch index. Sci. Focus 2006 , 1 , 23–25. [ Google Scholar ]
  • Egghe, L. Theory and practice of the g-index. Scientometrics 2006 , 69 , 131–152. [ Google Scholar ] [ CrossRef ]
  • Alonso, S.; Cabrerizo, F.; Herrera-Viedma, E.; Herrera, F. Hg-index: A new index to characterize the scientific output of researchers based on h- and g-indices. Scientometrics 2010 , 82 , 391–400. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Wildgaard, L.; Schneider, J.W.; Larsen, B. A review of the characteristics of 108 author-level bibliometric indicators. Scientometrics 2014 , 101 , 125–158. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Waltman, L. A review of the literature on citation impact indicators. J. Informetr. 2016 , 10 , 365–391. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Hammarfelt, B.; Rushforth, A.D. Indicators as judgment devices: An empirical study of citizen bibliometrics in research evaluation. Res. Eval. 2017 , 26 , 169–180. [ Google Scholar ] [ CrossRef ]
  • Karpik, L. Valuing the Unique the Economics of Singularities ; Princeton University Press: Princeton, NJ, USA, 2010. [ Google Scholar ]
  • Butler, L. Assessing university research: A plea for a balanced approach. Sci. Public Policy 2007 , 34 , 565–574. [ Google Scholar ] [ CrossRef ]
  • Moed, H.F. The future of research evaluation rests with an intelligent combination of advanced metrics and transparent peer review. Sci. Public Policy 2007 , 34 , 575–583. [ Google Scholar ] [ CrossRef ]
  • Holbrook, J.B.; Barr, K.R.; Brown, K.W. Research Impact: We need negative metrics too. Nature 2013 , 497 , 439. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Gibbons, M.; Limoges, C.; Nowotny, H.; Schwartzman, S.; Scott, P.; Trow, M. The New Production of Knowledge: The Dynamics of Science and Research in Contemporary Societies ; Sage: London, UK, 1994. [ Google Scholar ]
  • Gulbrandsen, J.M.; Langfeldt, L. In search of mode 2: The nature of knowledge production in Norway. Minerva 2004 , 42 , 237–250. [ Google Scholar ] [ CrossRef ]
  • Albert, M.; Laberge, S.; McGuire, W. Criteria for assessing quality in academic research: The views of biomedical scientists, clinical scientists and social scientists. High. Educ. 2012 , 64 , 661–676. [ Google Scholar ] [ CrossRef ]
  • Horbach, S.P.J.M.; Halffman, W. Journal peer review and editorial evaluation: Cautious innovator or sleepy giant? Minerva 2020 , 58 , 139–161. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Cronin, B.; Sugimoto, C.R. Beyond Bibliometrics. Harnessing Multidimensional Indicators of Scholarly Impact ; MIT Press: Cambridge, MA, USA, 2014. [ Google Scholar ]
  • De Bellis, N. Bibliometrics and Citation Analysis: From the Science Citation Index to Cybermetrics ; Scarecrow Press: Lanham, MD, USA, 2009. [ Google Scholar ]
  • Prins, A.A.M.; Costas, R.; van Leeuwen, T.N.; Wouters, P.F. Using Google Scholar in research evaluation of humanities and social science programs: A comparison with web of science data. Res. Eval. 2016 , 25 , 264–270. [ Google Scholar ] [ CrossRef ]
  • Bornmann, L. Do altmetrics point to the broader impact of research? An overview of benefits and disadvantages of altmetrics. J. Informetr. 2014 , 8 , 895–903. [ Google Scholar ]
  • Penfield, T.; Baker, M.J.; Scoble, R.; Wykes, M.C. Assessment, evaluations, and definitions of research impact: A review. Res. Eval. 2014 , 23 , 21–32. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Reale, E.; Avramov, D.; Canhial, K.; Donovan, C.; Flecha, R.; Holm, P.; Larkin, C.; Lepori, B.; Mosoni-Fried, J.; Oliver, E.; et al. A review of literature on evaluating the scientific, social and political impact of social sciences and humanities research. Res. Eval. Open Access 2018 , 27 , 298–308. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Bornmann, L. Measuring the societal impact of research. EMBO Rep. 2012 , 13 , 673–676. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Bornmann, L. What is societal Impact of research and how can it be assessed? A literature survey. J. Am. Soc. Inf. Sci. Technol. 2013 , 64 , 217–233. [ Google Scholar ] [ CrossRef ]
  • Molas-Gallart, J.; Salter, A.; Patel, P.; Scott, A.; Duran, X. Measuring Third Stream Activities. Final Report to the Russell Group of Universities ; Science and Technology Policy Research Unit (SPRU): Brighton, UK, 2002. [ Google Scholar ]
  • Van der Meulen, B.; Rip, A. Evaluation of societal quality of public sector research in the Netherlands. Res. Eval. 2000 , 9 , 11–25. [ Google Scholar ] [ CrossRef ]
  • DEST–Department of Education Science and Training. Research Quality Framework: Assessing the Quality and Impact of Research in Australia ; Commonwealth of Australia: Canberra, Australia, 2005. [ Google Scholar ]
  • Hemlin, S. Utility evaluation of academic research: Six basic propositions. Res. Eval. 1998 , 7 , 159–165. [ Google Scholar ] [ CrossRef ]
  • McNie, E.C.; Parris, A.; Sarewitz, D. Improving the public value of science: A typology to inform discussion, design and implementation of research. Res. Policy 2016 , 45 , 884–895. [ Google Scholar ] [ CrossRef ]
  • Bozeman, B.; Sarewitz, D. Public value mapping and science policy evaluation. Minerva 2011 , 49 , 1–23. [ Google Scholar ] [ CrossRef ]
  • ERiC. Evaluating the Societal Relevance of Academic Research: A Guide ; Rathenau Institute: The Hague, The Netherlands, 2010. [ Google Scholar ]
  • Holbrook, J.B.; Frodeman, R. Peer review and the ex-ante assessment of societal impacts. Res. Eval. 2011 , 20 , 239–246. [ Google Scholar ] [ CrossRef ]
  • Morton, S. Progressing research impact assessment: A contributions’ approach. Res. Eval. 2015 , 24 , 405–419. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Holbrook, J.B.; Hrotic, S. Blue skies, impacts, and peer review. Roars Trans. J. Res. Policy Eval. (RT) 2013 , 1 , 1–24. [ Google Scholar ]
  • ERC—European Research Council. Qualitative Evaluation of Completed Projects Funded by the European Research Council ; European Commission: Brussels, Belgium, 2016. [ Google Scholar ]
  • Samuel, G.N.; Derrick, G. Societal Impact evaluation: Exploring evaluator perceptions of the characterization of impact under the ref2014. Res. Eval. 2015 , 24 , 229–241. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Lamont, M. How Professors Think: Inside the Curious World of Academic Judgment ; Harvard University Press: Cambridge, MA, USA, 2009. [ Google Scholar ]
  • Bazeley, P. Conceptualising research performance. Stud. High. Educ. 2010 , 35 , 889–903. [ Google Scholar ] [ CrossRef ]
  • Llonk, J.; Casablancas-Segura, C.; Alarcón-del-Amo, M.C. Stakeholder orientation in public universities: A conceptual discussion and a scale development. Span. J. Mark. ESIC 2016 , 20 , 41–57. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Bras-Amorós, M.; Domingo-Ferrer, J.; Torra, V. A bibliometric index based on the collaboration distance between cited and citing authors. J. Informetr. 2011 , 5 , 248–264. [ Google Scholar ] [ CrossRef ]
  • Lawani, S.M. Some bibliometric correlates of quality in scientific research. Scientometrics 1984 , 9 , 13–25. [ Google Scholar ] [ CrossRef ]
  • Hemlin, S. Quality in Science. Researchers’ Conceptions and Judgements ; University of Gothenburg, Department of Psychology: Gothenburg, Sweden, 1991. [ Google Scholar ]
  • Gulbrandsen, J.M. Research Quality and Organisational Factors: An Investigation of the Relationship ; Norwegian University of Science and Technology: Trondheim, Norway, 2000. [ Google Scholar ]
  • Keen, P.G.W. Relevance and rigor in information systems research: Improving quality, confidence, cohesion and impact. In Information Systems Research: Contemporary Approaches and Emergent Traditions ; Klein, H.K., Nissen, H.-E., Hirschheim, R., Eds.; IFIP Elsevier Science: Philadelphia, PA, USA, 1991; pp. 27–49. [ Google Scholar ]
  • Maxwell, J.A. Qualitative Research Design: An Interactive Approach ; Sage Publications: Thousand Oaks, CA, USA, 1996. [ Google Scholar ]
  • Amin, A.; Roberts, J. Knowing in action: Beyond communities of practice. Res. Policy 2008 , 37 , 353–369. [ Google Scholar ] [ CrossRef ]
  • Langfeldt, L.; Nedeva, M.; Sörlin, S.; Thomas, D.A. Co-existing notions of research quality: A framework to study context-specific understandings of good research. Minerva 2020 , 58 , 115–137. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Baldi, S. Normative versus social constructivist processes in the allocation of citations: A network-analytic model. Am. Sociol. Rev. 1998 , 63 , 829–846. [ Google Scholar ] [ CrossRef ]
  • Silverman, D. Interpreting Qualitative Data: Methods for Analysing Talk, Text and Interaction ; Sage Publications: London, UK, 1993. [ Google Scholar ]
  • Alborz, A.; McNally, R. Developing methods for systematic reviewing in health services delivery and organisation: An example from a review of access to health care for people with learning disabilities. Part Evaluation of the literature—A practical guide. Health Inf. Lib. J. 2004 , 21 , 227–236. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Lee, C.J. Commensuration bias in peer review. Philos. Sci. 2015 , 82 , 1272–1283. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Polanyi, M. The republic of science: Its political and economic theory. Minerva 1962 , I (1), 54–73; reprinted in Minerva 2000 , 38 , 1–32. [ Google Scholar ] [ CrossRef ]
  • Buchholz, K. Criteria for the analysis of scientific quality. Scientometrics 1995 , 32 , 195–218. [ Google Scholar ] [ CrossRef ]
  • Hug, S.E.; Ochsner, M.; Daniel, H.-D. Criteria for assessing research quality in the humanities: A Delphi study among scholars of English literature, German literature and art history. Res. Eval. 2013 , 22 , 369–383. [ Google Scholar ] [ CrossRef ]
  • Lahtinen, E.; Koskinen-Ollonqvist, P.; Rouvinen-Wilenius, P.; Tuominen, P.; Mittelmark, M.B. The development of quality criteria for research: A Finnish approach. Health Promot. Int. 2005 , 20 , 306–315. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • External Research Assessment (ERA). External Research Assessment ; Karolinska Institutet: Stockholm, Sweden, 2010. [ Google Scholar ]
  • Hemlin, S.; Montgomery, H. Scientists’ conceptions of scientific quality: An interview study. Sci. Stud. 1990 , 3 , 73–81. [ Google Scholar ] [ CrossRef ]
  • Hemlin, S.; Niemenmaa, P.; Montgomery, H. Quality criteria in evaluations: Peer reviews of grant applications in psychology. Sci. Stud. 1995 , 8 , 44–52. [ Google Scholar ] [ CrossRef ]
  • Weinberg, A.M. Criteria for scientific choice: Minerva, I (2), (1962), 158–171. Minerva 2000 , 38 , 255–266. [ Google Scholar ] [ CrossRef ]
  • Shipman, M. The Limitations of Social Research ; Longman: London, UK, 1982. [ Google Scholar ]
  • Gummesson, E. Qualitative Methods in Management Research ; Sage Publications: Newbury Park, CA, USA, 1991. [ Google Scholar ]
  • Ochsner, M.; Hug, S.E.; Daniel, H.-D. Four types of research in the humanities: Setting the stage for research quality criteria in the humanities. Res. Eval. 2012 , 22 , 79–92. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Tranøy, K.E. Science–Social Power and Way of Life ; Universitetsforlaget: Oslo, Norway, 1986. [ Google Scholar ]
  • Guthrie, S.; Wamae, W.; Diepeveen, S.; Wooding, S.; Grant, J. Measuring Research: A Guide to Research Evaluation Frameworks and Tools ; Rand: Santa Monica, CA, USA, 2013. [ Google Scholar ]
  • De Rijcke, S.; Wouters, P.F.; Rushforth, A.D.; Franssen, T.P.; Hammarfelt, B. Evaluation practices and effects of indicator use: A literature review. Res. Eval. 2016 , 25 , 161–169. [ Google Scholar ] [ CrossRef ]
  • Waltman, L. Responsible Metrics: One Size Doesn’t Fit All, CWTS—Centre for Science and Technology Studies, CWTS Blog article. Available online: https://www.cwts.nl (accessed on 31 August 2017).
  • Kaplan, S.; Norton, D.P. The Balanced Scorecard ; HBS Press: Boston, MA, USA, 1992. [ Google Scholar ]


Attributes | Authors

Clarity, rigor, methodological soundness and craftsmanship | [ , ]
Coherence | [ ]
Collaboration distance between citing/cited authors | [ ]
Communicability (consumability, accessibility and searchability) | [ ]
Communication and collaboration | [ ]
Openness, universalism, disinterestedness and organized skepticism | [ , , ]
Conforming (ethics, alignment with rules and sustainability) | [ ]
Contextualization | [ , , ]
Contributory (originality, relevance and generalizability) | [ ]
Credibility (rigorousness, consistency, coherence and transparency) | [ ]
F(ield)-type and S(pace)-type quality | [ ]
Intellectual influence | [ ]
Internal validity, reliability and rigor | [ , ]
Journal impact factor, citations and H-index | Various
Methods, intellectual and political/social significance and originality | [ ]
Novelty, methodological soundness and significance | [ ]
Originality/novelty | [ , , , , , , ]
Plausibility/reliability | [ , , , ]
Scholarly exchange, connecting to other research, impact on research community and future research, innovation, originality, productivity, rigor, fostering cultural memories, recognition, reflection, criticism, continuity, continuation, openness, variety, self-management, independence, scholarship/erudition, connection between research and teaching | [ ]
Scientific quality, defined scope, anticipated outcomes, operationalization, feasibility, process evaluation, documentation and dissemination | [ ]
Scientific value and societal relevance value | [ ]
Scientific, technological, clinical and socio-economic significance | [ ]
Significance, approach, innovation, investigators and environment |
Societal quality, usefulness | [ , ]
Socio-economic impact, resource attraction and resource management |
Stringent argumentation, presentation of relevant evidence, clear language and structure, reflection of method and adherence to standards of scientific honesty | [ ]
Stringency, intra- and extra-scientific effects and breadth | [ , ]
Technological and social merit | [ ]
Transparency | [ , ]
Continuity, innovation, originality, rigor, reflection, criticism, scientific exchange, inspiration, connection to society, diversity, variety, topicality, openness, integration, autonomy, productivity, intrinsic motivation, scholarship and connection with teaching | [ ]
Truth/probability, testability, coherence, simplicity/completeness, honesty, openness and impartiality/objectivity, originality and relevance/fruitfulness/value and verisimilitude | [ , , , ]
Value/usefulness | [ ]
1. Authors’ distance
2. Investigator’s expertise
3. Craftsmanship
4. Interdisciplinarity
5. Credibility
6. Resource attraction
7. Disinterestedness
8. International scope
9. Objectivity/Impartiality
10. Stringent argumentation
11. Systems view
12. Honesty
13. Organized skepticism
14. Clarity
15. Originality of method
16. Coherence
17. Cost effectiveness
18. Communicability
19. Conformance to ethics
20. Reliability
21. Reproducibility
22. Rigorousness
23. Evidence disclosure
24. Methodological soundness
25. Structure clarity
26. Stakeholder involvement
27. Documented sources
28. Accessibility and intelligibility
29. Thoroughness
30. Adherence to standards/rules
31. Internal validity
32. Unconventionality
33. Operationalization
34. Testability
35. Universalism
36. Completeness
37. Consistency
38. Replication feasibility
39. Transparency
40. Expert feedback
41. Plausibility
42. Scope tailoring and focus
43. Research process evaluation
44. Truth and veridicity
45. Political and social significance
46. Scholarly exchange/diffusion
47. Practitioner impact
48. Scientific significance and value
49. Social impact/significance
50. Economic impact/significance
51. Societal relevance and value
52. Educational impact
53. Community impact
54. Intellectual significance
55. Technological significance
56. Theory impact
57. Usefulness
58. Trendsetting and future outline
59. Novelty
60. Sustainability of outcomes
61. Generalizability
62. Relevance
63. Contextualization
64. Searchability
65. Openness
66. Dissemination potential
Focus of Quality | Illustrative Information or Evidence

Research Design (Motivation, Concentration, Approach) | keywords, research classification, purpose, research problems/questions, literature/practitioner gap, researcher profile, assumptions, research team, research strategy

Research Process (Data, Method, Reporting) | information sources, data processing methods and tools, partnerships, research context, reports, research funding, research project, case study protocol, full-time equivalent, products/services

Research Impact (Personal, Academic, Social) | citations, H-index, journal impact factor, consulting activities, social initiatives, career advancements, community acknowledgment, web presence, positions, start-ups, new products/services, patents, policy documents, industry reports, training outcomes

Share and Cite

Margherita, A.; Elia, G.; Petti, C. What Is Quality in Research? Building a Framework of Design, Process and Impact Attributes and Evaluation Perspectives. Sustainability 2022, 14, 3034. https://doi.org/10.3390/su14053034

Defining and Assessing Research Quality in a Transdisciplinary Context

Brian M. Belcher, Katherine E. Rasmussen, Matthew R. Kemshaw, Deborah A. Zornes, Defining and assessing research quality in a transdisciplinary context, Research Evaluation, Volume 25, Issue 1, January 2016, Pages 1–17, https://doi.org/10.1093/reseval/rvv025


Research increasingly seeks both to generate knowledge and to contribute to real-world solutions, with strong emphasis on context and social engagement. As boundaries between disciplines are crossed, and as research engages more with stakeholders in complex systems, traditional academic definitions and criteria of research quality are no longer sufficient; there is a need for a parallel evolution of principles and criteria to define and evaluate research quality in a transdisciplinary research (TDR) context. We conducted a systematic review to help answer the question: What are appropriate principles and criteria for defining and assessing TDR quality? Articles were selected and reviewed seeking: arguments for or against expanding definitions of research quality, purposes for research quality evaluation, proposed principles of research quality, proposed criteria for research quality assessment, proposed indicators and measures of research quality, and proposed processes for evaluating TDR. We used the information from the review and our own experience in two research organizations that employ TDR approaches to develop a prototype TDR quality assessment framework, organized as an evaluation rubric. We provide an overview of the relevant literature and summarize the main aspects of TDR quality identified there. Four main principles emerge: relevance, including social significance and applicability; credibility, including criteria of integration and reflexivity, added to traditional criteria of scientific rigor; legitimacy, including criteria of inclusion and fair representation of stakeholder interests; and effectiveness, with criteria that assess actual or potential contributions to problem solving and social change.

1. Introduction

Contemporary research in the social and environmental realms places strong emphasis on achieving ‘impact’. Research programs and projects aim to generate new knowledge but also to promote and facilitate the use of that knowledge to enable change, solve problems, and support innovation (Clark and Dickson 2003). Reductionist and purely disciplinary approaches are being augmented or replaced with holistic approaches that recognize the complex nature of problems and that actively engage within complex systems to contribute to change ‘on the ground’ (Gibbons et al. 1994; Nowotny, Scott and Gibbons 2001, 2003; Klein 2006; Hemlin and Rasmussen 2006; Chataway, Smith and Wield 2007; Erno-Kjolhede and Hansson 2011). Emerging fields such as sustainability science have developed out of a need to address complex and urgent real-world problems (Komiyama and Takeuchi 2006). These approaches are inherently applied and transdisciplinary, with explicit goals to contribute to real-world solutions and strong emphasis on context and social engagement (Kates 2000).

While there is an ongoing conceptual and theoretical debate about the nature of the relationship between science and society (e.g. Hessels 2008 ), we take a more practical starting point based on the authors’ experience in two research organizations. The first author has been involved with the Center for International Forestry Research (CIFOR) for almost 20 years. CIFOR, as part of the Consultative Group on International Agricultural Research (CGIAR), began a major transformation in 2010 that shifted the emphasis from a primary focus on delivering high-quality science to a focus on ‘…producing, assembling and delivering, in collaboration with research and development partners, research outputs that are international public goods which will contribute to the solution of significant development problems that have been identified and prioritized with the collaboration of developing countries.’ ( CGIAR 2011 ). It was always intended that CGIAR research would be relevant to priority development and conservation issues, with emphasis on high-quality scientific outputs. The new approach puts much stronger emphasis on welfare and environmental results; research centers, programs, and individual scientists now assume shared responsibility for achieving development outcomes. This requires new ways of working, with more and different kinds of partnerships and more deliberate and strategic engagement in social systems.

Royal Roads University (RRU), the home institute of all four authors, is a relatively new (created in 1995) public university in Canada. It is deliberately interdisciplinary by design, with just two faculties (Faculty of Social and Applied Science; Faculty of Management) and strong emphasis on problem-oriented research. Faculty and student research is typically ‘applied’ in the Organization for Economic Co-operation and Development (2012) sense of ‘original investigation undertaken in order to acquire new knowledge … directed primarily towards a specific practical aim or objective’.

An increasing amount of the research done within both of these organizations can be classified as transdisciplinary research (TDR). TDR crosses disciplinary and institutional boundaries, is context specific, and is problem oriented (Klein 2006; Carew and Wickson 2010). It combines and blends methodologies from different theoretical paradigms, includes a diversity of both academic and lay actors, and is conducted with a range of research goals, organizational forms, and outputs (Klein 2006; Boix-Mansilla 2006a; Erno-Kjolhede and Hansson 2011). The problem-oriented nature of TDR and the importance placed on societal relevance and engagement are broadly accepted as defining characteristics of TDR (Carew and Wickson 2010).

The experience developing and using TDR approaches at CIFOR and RRU highlights the need for a parallel evolution of principles and criteria for evaluating research quality in a TDR context. Scientists appreciate and often welcome the need and the opportunity to expand the reach of their research, to contribute more effectively to change processes. At the same time, they feel the pressure of added expectations and are looking for guidance.

In any activity, we need principles, guidelines, criteria, or benchmarks that can be used to design the activity, assess its potential, and evaluate its progress and accomplishments. Effective research quality criteria are necessary to guide the funding, management, ongoing development, and advancement of research methods, projects, and programs. The lack of quality criteria to guide and assess research design and performance is seen as hindering the development of transdisciplinary approaches ( Bergmann et al. 2005 ; Feller 2006 ; Chataway, Smith and Wield 2007 ; Ozga 2008 ; Carew and Wickson 2010 ; Jahn and Keil 2015 ). Appropriate quality evaluation is essential to ensure that research receives support and funding, and to guide and train researchers and managers to realize high-quality research ( Boix-Mansilla 2006a ; Klein 2008 ; Aagaard-Hansen and Svedin 2009 ; Carew and Wickson 2010 ).

Traditional disciplinary research is built on well-established methodological and epistemological principles and practices. Within disciplinary research, quality has been defined narrowly, with the primary criteria being scientific excellence and scientific relevance ( Feller 2006 ; Chataway, Smith and Wield 2007 ; Erno-Kjolhede and Hansson 2011 ). Disciplines have well-established (often implicit) criteria and processes for the evaluation of quality in research design ( Erno-Kjolhede and Hansson 2011 ). TDR that is highly context specific, problem oriented, and includes nonacademic societal actors in the research process is challenging to evaluate ( Wickson, Carew and Russell 2006 ; Aagaard-Hansen and Svedin 2009 ; Andrén 2010 ; Carew and Wickson 2010 ; Huutoniemi 2010 ). There is no one definition or understanding of what constitutes quality, nor a set guide for how to do TDR ( Lincoln 1995 ; Morrow 2005 ; Oberg 2008 ; Andrén 2010 ; Huutoniemi 2010 ). When epistemologies and methods from more than one discipline are used, disciplinary criteria may be insufficient and criteria from more than one discipline may be contradictory; cultural conflicts can arise as a range of actors use different terminology for the same concepts or the same terminology for different concepts ( Chataway, Smith and Wield 2007 ; Oberg 2008 ).

Current research evaluation approaches as applied to individual researchers, programs, and research units are still based primarily on measures of academic outputs (publications and the prestige of the publishing journal), citations, and peer assessment (Boix-Mansilla 2006a; Feller 2006; Erno-Kjolhede and Hansson 2011). While these indicators of research quality remain relevant, additional criteria are needed to address the innovative approaches and the diversity of actors, outputs, outcomes, and long-term social impacts of TDR. It can be difficult to find appropriate outlets for TDR publications simply because the research does not meet the expectations of traditional discipline-oriented journals. Moreover, a wider range of inputs and of outputs means that TDR may result in fewer academic outputs. This has negative implications for transdisciplinary researchers, whose performance appraisals and long-term career progression are largely governed by traditional publication and citation-based metrics of evaluation. Research managers, peer reviewers, academic committees, and granting agencies all struggle with how to evaluate and how to compare TDR projects (ex ante or ex post) in the absence of appropriate criteria to address epistemological and methodological variability. The extent of engagement of stakeholders in the research process will vary by project, from information sharing through to active collaboration (Brandt et al. 2013), but at any level, the involvement of stakeholders adds complexity to the conceptualization of quality. We need to know what ‘good research’ is in a transdisciplinary context.

As Tijssen ( 2003 : 93) put it: ‘Clearly, in view of its strategic and policy relevance, developing and producing generally acceptable measures of “research excellence” is one of the chief evaluation challenges of the years to come’. Clear criteria are needed for research quality evaluation to foster excellence while supporting innovation: ‘A principal barrier to a broader uptake of TD research is a lack of clarity on what good quality TD research looks like’ ( Carew and Wickson 2010 : 1154). In the absence of alternatives, many evaluators, including funding bodies, rely on conventional, discipline-specific measures of quality which do not address important aspects of TDR.

There is an emerging literature that reviews, synthesizes, or empirically evaluates knowledge and best practice in research evaluation in a TDR context and that proposes criteria and evaluation approaches ( Defila and Di Giulio 1999 ; Bergmann et al. 2005 ; Wickson, Carew and Russell 2006 ; Klein 2008 ; Carew and Wickson 2010 ; ERIC 2010; de Jong et al. 2011 ; Spaapen and Van Drooge 2011 ). Much of it comes from a few fields, including health care, education, and evaluation; little comes from the natural resource management and sustainability science realms, despite these areas needing guidance. National-scale reviews have begun to recognize the need for broader research evaluation criteria but have had difficulty dealing with it and have made little progress in addressing it ( Donovan 2008 ; KNAW 2009 ; REF 2011 ; ARC 2012 ; TEC 2012 ). A summary of the national reviews that we reviewed in the development of this research is provided in Supplementary Appendix 1 . While there are some published evaluation schemes for TDR and interdisciplinary research (IDR), there is ‘substantial variation in the balance different authors achieve between comprehensiveness and over-prescription’ ( Wickson and Carew 2014 : 256) and still a need to develop standardized quality criteria that are ‘uniquely flexible to provide valid, reliable means to evaluate and compare projects, while not stifling the evolution and responsiveness of the approach’ ( Wickson and Carew 2014 : 256).

There is a need and an opportunity to synthesize current ideas about how to define and assess quality in TDR. To address this, we conducted a systematic review of the literature that discusses the definitions of research quality as well as the suggested principles and criteria for assessing TDR quality. The aim is to identify appropriate principles and criteria for defining and measuring research quality in a transdisciplinary context and to organize those principles and criteria as an evaluation framework.

The review question was: What are appropriate principles, criteria, and indicators for defining and assessing research quality in TDR?

This article presents the method used for the systematic review and our synthesis, followed by key findings. Theoretical concepts about why new principles and criteria are needed for TDR, along with associated discussions about evaluation process are presented. A framework, derived from our synthesis of the literature, of principles and criteria for TDR quality evaluation is presented along with guidance on its application. Finally, recommendations for next steps in this research and needs for future research are discussed.

2.1 Systematic review

Systematic review is a rigorous, transparent, and replicable methodology that has become widely used to inform evidence-based policy, management, and decision making ( Pullin and Stewart 2006 ; CEE 2010). Systematic reviews follow a detailed protocol with explicit inclusion and exclusion criteria to ensure a repeatable and comprehensive review of the target literature. Review protocols are shared and often published as peer reviewed articles before undertaking the review to invite critique and suggestions. Systematic reviews are most commonly used to synthesize knowledge on an empirical question by collating data and analyses from a series of comparable studies, though methods used in systematic reviews are continually evolving and are increasingly being developed to explore a wider diversity of questions ( Chandler 2014 ). The current study question is theoretical and methodological, not empirical. Nevertheless, with a diverse and diffuse literature on the quality of TDR, a systematic review approach provides a method for a thorough and rigorous review. The protocol is published and available at http://www.cifor.org/online-library/browse/view-publication/publication/4382.html . A schematic diagram of the systematic review process is presented in Fig. 1 .

Figure 1. Search process.

2.2 Search terms

Search terms were designed to identify publications that discuss the evaluation or assessment of the quality or excellence of research done in a TDR context. Search terms are listed online in Supplementary Appendices 2 and 3. The search strategy favored sensitivity over specificity to ensure that we captured the relevant information.
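A sensitivity-oriented strategy typically OR-joins many near-synonyms for each concept and then ANDs the concept groups together, accepting more noise in exchange for fewer missed papers. A hypothetical sketch of how such a string can be assembled (the actual terms are those listed in the appendices, not the placeholders below):

```python
# Placeholder concept groups; the real terms are in Appendices 2 and 3.
quality_terms = ["quality", "excellence", "rigour", "rigor"]
evaluation_terms = ["evaluat*", "assess*", "apprais*", "criteri*"]
tdr_terms = ["transdisciplinar*", "interdisciplinar*",
             "cross-disciplinar*", '"mode 2"']

def or_group(terms: list[str]) -> str:
    """OR-join near-synonyms; broad groups favor sensitivity."""
    return "(" + " OR ".join(terms) + ")"

# AND the concept groups to form a WoK/Scopus-style query string.
query = " AND ".join(or_group(g) for g in
                     (quality_terms, evaluation_terms, tdr_terms))
print(query)
```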

2.3 Databases searched

ISI Web of Knowledge (WoK) and Scopus were searched between 26 June 2013 and 6 August 2013. The combined searches yielded 15,613 unique citations. Additional searches to update the first searches were carried out in June 2014 and March 2015, for a total of 19,402 titles scanned. Google Scholar (GS) was searched separately by two reviewers during each search period. The first reviewer’s search was done on 2 September 2013 (Search 1) and 3 September 2013 (Search 2), yielding 739 and 745 titles, respectively. The second reviewer’s search was done on 19 November 2013 (Search 1) and 25 November 2013 (Search 2), yielding 769 and 774 titles, respectively. A third search done on 17 March 2015 by one reviewer yielded 98 new titles. Reviewers found high redundancy between the WoK/Scopus searches and the GS searches.

2.4 Targeted journal searches

Highly relevant journals, including Research Evaluation, Evaluation and Program Planning, Scientometrics, Research Policy, Futures, American Journal of Evaluation, Evaluation Review, and Evaluation, were comprehensively searched using broader, more inclusive search strings that would have been unmanageable for the main database search.

2.5 Supplementary searches

References in included articles were reviewed to identify additional relevant literature. td-net’s ‘Tour d’Horizon of Literature’ lists important inter- and transdisciplinary publications collected through an invitation to experts in the field to submit publications (td-net 2014). Six additional articles were identified via these supplementary searches.

2.6 Limitations of coverage

The review was limited to English-language published articles and material available through internet searches. There was no systematic way to search the gray (unpublished) literature, but relevant material identified through supplementary searches was included.

2.7 Inclusion of articles

This study sought articles that review, critique, discuss, and/or propose principles, criteria, indicators, and/or measures for the evaluation of quality relevant to TDR. As noted, this yielded a large number of titles. We then selected only those articles with an explicit focus on the meaning of IDR and/or TDR quality and how to achieve, measure or evaluate it. Inclusion and exclusion criteria were developed through an iterative process of trial article screening and discussion within the research team. Through this process, inter-reviewer agreement was tested and strengthened. Inclusion criteria are listed in Tables 1 and 2 .

Table 1. Inclusion criteria for title and abstract screening

Topic coverage:
Document type:
Geographic: No geographic barriers
Date: No temporal barriers
Discipline/field: Discussion must be relevant to environment, natural resources management, sustainability, livelihoods, or related areas of human–environmental interactions. The discussion need not explicitly reference any of the above subject areas.

Table 2. Inclusion criteria for abstract and full article screening

Theme | Inclusion criteria
Relevance to review objectives (all articles must meet this criterion) | Intention of the article, or part of the article, is to discuss the meaning of research quality and how to measure/evaluate it
Theoretical discussion |
Quality definitions and criteria | Offers an explicit definition or criteria of inter- and/or transdisciplinary research quality
Evaluation process | Suggests approaches to evaluate inter- and/or transdisciplinary research quality (included only if there is relevant discussion of research quality criteria and/or measurement)
Research ‘impact’ | Discusses research outcomes (diffusion, uptake, utilization, impact) as an indicator or consequence of research quality

Article screening was done in parallel by two reviewers in three rounds: (1) title, (2) abstract, and (3) full article. In cases of uncertainty, papers were carried forward to the next round. Final decisions on the inclusion of contested papers were made by consensus among the four team members.

2.8 Critical appraisal

In typical systematic reviews, individual articles are appraised to ensure that they are adequate for answering the research question and to assess the methods of each study for susceptibility to bias that could influence the outcome of the review (Petticrew and Roberts 2006). Most papers included in this review are theoretical and methodological papers, not empirical studies. Most do not have explicit methods that can be appraised with existing quality assessment frameworks. Our critical appraisal considered four criteria adapted from Spencer et al. (2003): (1) relevance to the review question, (2) clarity and logic of how information in the paper was generated, (3) significance of the contribution (are new ideas offered?), and (4) generalizability (is the context specified; do the ideas apply in other contexts?). Disagreements were discussed to reach consensus.

2.9 Data extraction and management

The review sought information on: arguments for or against expanding definitions of research quality, purposes for research quality evaluation, principles of research quality, criteria for research quality assessment, indicators and measures of research quality, and processes for evaluating TDR. Four reviewers independently extracted data from selected articles using the parameters listed in Supplementary Appendix 4 .

2.10 Data synthesis and TDR framework design

Our aim was to synthesize ideas, definitions, and recommendations for TDR quality criteria into a comprehensive and generalizable framework for the evaluation of quality in TDR. Key ideas were extracted from each article and summarized in an Excel database. We classified these ideas into themes and ultimately into overarching principles and associated criteria of TDR quality organized as a rubric ( Wickson and Carew 2014 ). Definitions of each principle and criterion were developed and rubric statements formulated based on the literature and our experience. These criteria (adjusted appropriately to be applied ex ante or ex post ) are intended to be used to assess a TDR project. The reviewer should consider whether the project fully satisfies, partially satisfies, or fails to satisfy each criterion. More information on application is provided in Section 4.3 below.

We tested the framework on a set of completed RRU graduate theses that used transdisciplinary approaches, with an explicit problem orientation and intent to contribute to social or environmental change. Three rounds of testing were done, with revisions after each round to refine and improve the framework.

3.1 Overview of the selected articles

Thirty-eight papers satisfied the inclusion criteria. A wide range of terms is used in the selected papers, including: cross-disciplinary; interdisciplinary; transdisciplinary; methodological pluralism; mode 2; triple helix; and supradisciplinary. Eight included papers specifically focused on sustainability science or TDR in natural resource management, or identified sustainability research as a growing TDR field that needs new forms of evaluation (Cash et al. 2002; Bergmann et al. 2005; Chataway, Smith and Wield 2007; Spaapen, Dijstelbloem and Wamelink 2007; Andrén 2010; Carew and Wickson 2010; Lang et al. 2012; Gaziulusoy and Boyle 2013). Wickson and Carew (2014) build on this experience in the TDR realm to propose criteria and indicators of quality for 'responsible research and innovation'.

The selected articles are written from three main perspectives. One set is primarily interested in advancing TDR approaches. These papers recognize the need for new quality measures to encourage and promote high-quality research and to overcome perceived biases against TDR approaches in research funding and publishing. A second set of papers is written from an evaluation perspective, with a focus on improving evaluation of TDR. The third set is written from the perspective of qualitative research characterized by methodological pluralism, with many characteristics and issues relevant to TDR approaches.

The majority of the articles focus on the project scale, some on the organization level, and some do not specify. Some articles explicitly focus on ex ante evaluation (e.g. proposal evaluation), others on ex post evaluation, and many are not explicit about the project stage they are concerned with. The methods used in the reviewed articles include authors' reflection and opinion, literature review, expert consultation, document analysis, and case study. Summaries of article characteristics are available online (Supplementary Appendices 5–8). Eight articles provide comprehensive evaluation frameworks and quality criteria specifically for TDR and research-in-context. The rest of the articles discuss aspects of quality related to TDR and recommend quality definitions, criteria, and/or evaluation processes.

3.2 The need for quality criteria and evaluation methods for TDR

Many of the selected articles highlight the lack of widely agreed principles and criteria of TDR quality. They note that, in the absence of TDR quality frameworks, disciplinary criteria are used ( Morrow 2005 ; Boix-Mansilla 2006a , b ; Feller 2006 ; Klein 2006 , 2008 ; Wickson, Carew and Russell 2006 ; Scott 2007 ; Spaapen, Dijstelbloem and Wamelink 2007 ; Oberg 2008 ; Erno-Kjolhede and Hansson 2011 ), and evaluations are often carried out by reviewers who lack cross-disciplinary experience and do not have a shared understanding of quality ( Aagaard-Hansen and Svedin 2009 ). Quality is discussed by many as a relative concept, developed within disciplines, and therefore defined and understood differently in each field ( Morrow 2005 ; Klein 2006 ; Oberg 2008 ; Mitchell and Willets 2009 ; Huutoniemi 2010 ; Hellstrom 2011 ). Jahn and Keil (2015) point out the difficulty of creating a common set of quality criteria for TDR in the absence of a standard agreed-upon definition of TDR. Many of the selected papers argue the need to move beyond narrowly defined ideas of ‘scientific excellence’ to incorporate a broader assessment of quality which includes societal relevance ( Hemlin and Rasmussen 2006 ; Chataway, Smith and Wield 2007 ; Ozga 2007 ; Spaapen, Dijstelbloem and Wamelink 2007 ). This shift includes greater focus on research organization, research process, and continuous learning, rather than primarily on research outputs ( Hemlin and Rasmussen 2006 ; de Jong et al. 2011 ; Wickson and Carew 2014 ; Jahn and Keil 2015 ). This responds to and reflects societal expectations that research should be accountable and have demonstrated utility ( Cloete 1997 ; Defila and Di Giulio 1999 ; Wickson, Carew and Russell 2006 ; Spaapen, Dijstelbloem and Wamelink 2007 ; Stige 2009 ).

A central aim of TDR is to achieve socially relevant outcomes, and TDR quality criteria should demonstrate accountability to society ( Cloete 1997 ; Hemlin and Rasmussen 2006 ; Chataway, Smith and Wield 2007 ; Ozga 2007 ; Spaapen, Dijstelbloem and Wamelink 2007 ; de Jong et al. 2011 ). Integration and mutual learning are a core element of TDR; it is not enough to transcend boundaries and incorporate societal knowledge but, as Carew and Wickson ( 2010 : 1147) summarize: ‘…the TD researcher needs to put effort into integrating these potentially disparate knowledges with a view to creating useable knowledge. That is, knowledge that can be applied in a given problem context and has some prospect of producing desired change in that context’. The inclusion of societal actors in the research process, the unique and often dispersed organization of research teams, and the deliberate integration of different traditions of knowledge production all fall outside of conventional assessment criteria ( Feller 2006 ).

Not only does the range of criteria need to be updated, expanded, and agreed upon, with assumptions made explicit (Boix-Mansilla 2006a; Klein 2006; Scott 2007) but, given the specific problem orientation of TDR, reviewers beyond disciplinary academic peers need to be included in the assessment of quality (Cloete 1997; Scott 2007; Spaapen, Dijstelbloem and Wamelink 2007; Klein 2008). Several authors discuss the lack of reviewers with strong cross-disciplinary experience (Aagaard-Hansen and Svedin 2009) and the lack of common criteria, philosophical foundations, and language for use by peer reviewers (Klein 2008; Aagaard-Hansen and Svedin 2009). Peer review of TDR could be improved with explicit TDR quality criteria and appropriate processes to ensure clear dialog among reviewers.

Finally, there is the need for increased emphasis on evaluation as part of the research process ( Bergmann et al. 2005 ; Hemlin and Rasmussen 2006 ; Meyrick 2006 ; Chataway, Smith and Wield 2007 ; Stige, Malterud and Midtgarden 2009 ; Hellstrom 2011 ; Lang et al. 2012 ; Wickson and Carew 2014 ). This is particularly true in large, complex, problem-oriented research projects. Ongoing monitoring of the research organization and process contributes to learning and adaptive management while research is underway and so helps improve quality. As stated by Wickson and Carew ( 2014 : 262): ‘We believe that in any process of interpreting, rearranging and/or applying these criteria, open negotiation on their meaning and application would only positively foster transformative learning, which is a valued outcome of good TD processes’.

3.3 TDR quality criteria and assessment approaches

Many of the papers provide quality criteria and/or describe constituent parts of quality. Aagaard-Hansen and Svedin (2009) define three key aspects of quality: societal relevance, impact, and integration. Meyrick (2006) states that quality research is transparent and systematic. Boaz and Ashby (2003) describe quality in four dimensions: methodological quality, quality of reporting, appropriateness of methods, and relevance to policy and practice. Although each article deconstructs quality in different ways and with different foci and perspectives, there is significant overlap, and themes recur across the papers reviewed. There is a broadly shared perspective that TDR quality is a multidimensional concept shaped by the specific context within which research is done (Spaapen, Dijstelbloem and Wamelink 2007; Klein 2008), making a universal definition of TDR quality difficult or impossible (Huutoniemi 2010).

Huutoniemi (2010) identifies three main approaches to conceptualizing quality in IDR and TDR: (1) using existing disciplinary standards adapted as necessary for IDR; (2) building on the quality standards of disciplines while fundamentally incorporating ways to deal with epistemological integration, problem focus, context, stakeholders, and process; and (3) radical departure from any disciplinary orientation in favor of external, emergent, context-dependent quality criteria that are defined and enacted collaboratively by a community of users.

The first approach is prominent in current research funding and evaluation protocols. Conservative approaches of this kind are criticized for privileging disciplinary research and for failing to provide guidance and quality control for transdisciplinary projects. The third approach would ‘undermine the prevailing status of disciplinary standards in the pursuit of a non-disciplinary, integrated knowledge system’ ( Huutoniemi 2010 : 313). No predetermined quality criteria are offered, only contextually embedded criteria that need to be developed within a specific research project. To some extent, this is the approach taken by Spaapen, Dijstelbloem and Wamelink (2007) and de Jong et al. (2011) . Such a sui generis approach cannot be used to compare across projects. Most of the reviewed papers take the second approach, and recommend TDR quality criteria that build on a disciplinary base.

Eight articles present comprehensive frameworks for quality evaluation, each with a unique approach, perspective, and goal. Two of these build comprehensive lists of criteria with associated questions to be chosen based on the needs of the particular research project (Defila and Di Giulio 1999; Bergmann et al. 2005). Carew and Wickson (2010) develop a reflective heuristic tool with questions to guide researchers through ongoing self-evaluation; they also list criteria for external evaluation and for comparison between projects. Spaapen, Dijstelbloem and Wamelink (2007) design an approach to evaluate a research project against its own goals; it is not meant to compare between projects. Wickson and Carew (2014) develop a comprehensive rubric for the evaluation of responsible research and innovation that builds on their extensive previous work in TDR. Finally, Lang et al. (2012), Mitchell and Willets (2009), and Jahn and Keil (2015) develop criteria checklists that can be applied across transdisciplinary projects.

Bergmann et al. (2005) and Carew and Wickson (2010) organize their frameworks around managerial elements of the research project, concerning problem context, participation, management, and outcomes. Lang et al. (2012) and Defila and Di Giulio (1999) focus on the chronological stages of the research process and identify criteria at each stage. Mitchell and Willets (2009), with a focus on doctoral studies, adapt standard dissertation evaluation criteria to accommodate broader, pluralistic, and more complex studies. Spaapen, Dijstelbloem and Wamelink (2007) focus on evaluating 'research-in-context'. Wickson and Carew (2014) create a rubric based on criteria that span the research process and its stages and encompass all actors involved. Jahn and Keil (2015) organize their quality criteria into three categories: quality of the research problems, quality of the research process, and quality of the research results.

The remaining papers highlight key themes that must be considered in TDR evaluation. Dominant themes include: engagement with problem context, collaboration and inclusion of stakeholders, heightened need for explicit communication and reflection, integration of epistemologies, recognition of diverse outputs, the focus on having an impact, and reflexivity and adaptation throughout the process. The focus on societal problems in context and the increased engagement of stakeholders in the research process introduces higher levels of complexity that cannot be accommodated by disciplinary standards ( Defila and Di Giulio 1999 ; Bergmann et al. 2005 ; Wickson, Carew and Russell 2006 ; Spaapen, Dijstelbloem and Wamelink 2007 ; Klein 2008 ).

Finally, authors discuss process (Defila and Di Giulio 1999; Bergmann et al. 2005; Boix-Mansilla 2006b; Spaapen, Dijstelbloem and Wamelink 2007) and utilitarian values (Hemlin 2006; Erno-Kjolhede and Hansson 2011; Bornmann 2013) as essential aspects of quality in TDR. Common themes include: (1) the importance of formative and process-oriented evaluation (Bergmann et al. 2005; Hemlin 2006; Stige 2009); (2) emphasis on the evaluation process itself (not just criteria or outcomes) and on reflexive dialog for learning (Bergmann et al. 2005; Boix-Mansilla 2006b; Klein 2008; Oberg 2008; Aagaard-Hansen and Svedin 2009; Stige, Malterud and Midtgarden 2009; Carew and Wickson 2010; Huutoniemi 2010); (3) the need for peers who are experienced and knowledgeable about TDR for fair peer review (Boix-Mansilla 2006a, b; Hemlin 2006; Klein 2006; Scott 2007; Aagaard-Hansen and Svedin 2009); (4) the inclusion of stakeholders in the evaluation process (Bergmann et al. 2005; Scott 2007; Andrén 2010); and (5) the importance of evaluations that are built in-context (Defila and Di Giulio 1999; Feller 2006; Spaapen, Dijstelbloem and Wamelink 2007; de Jong et al. 2011).

While each reviewed approach offers helpful insights, none adequately fulfills the need for a broad and adaptable framework for assessing TDR quality. Wickson and Carew ( 2014 : 257) highlight the need for quality criteria that achieve balance between ‘comprehensiveness and over-prescription’: ‘any emerging quality criteria need to be concrete enough to provide real guidance but flexible enough to adapt to the specificities of varying contexts’. Based on our experience, such a framework should be:

Comprehensive: It should accommodate the main aspects of TDR, as identified in the review.

Time/phase adaptable: It should be applicable across the project cycle.

Scalable: It should be useful for projects of different scales.

Versatile: It should be useful to researchers and collaborators as a guide to research design and management, and to internal and external reviews and assessors.

Comparable: It should allow comparison of quality between and across projects/programs.

Reflexive: It should encourage and facilitate self-reflection and adaptation based on ongoing learning.

In this section, we synthesize the key principles and criteria of quality in TDR that were identified in the reviewed literature. Principles are the essential elements of high-quality TDR. Criteria are the conditions that need to be met in order to achieve a principle. We conclude by providing a framework for the evaluation of quality in TDR ( Table 3 ) and guidance for its application.

Table 3. Transdisciplinary research quality assessment framework

Criteria | Definition | Rubric scale
Clearly defined socio-ecological context | The context is well defined, described, and analyzed sufficiently to identify research entry points. | The context is well defined, described, and analyzed sufficiently to identify research entry points.
Socially relevant research problem | Research problem is relevant to the problem context. | The research problem is defined and framed in a way that clearly shows its relevance to the context and that demonstrates that consideration has been given to the practical application of research activities and outputs.
Engagement with problem context | Researchers demonstrate appropriate breadth and depth of understanding of and sufficient interaction with the problem context. | The documentation demonstrates that the researcher/team has interacted appropriately and sufficiently with the problem context to understand it and to have potential to influence it (e.g. through site visits, meeting participation, discussion with stakeholders, document review) in planning and implementing the research.
Explicit theory of change | The research explicitly identifies its main intended outcomes and how they are intended/expected to be realized and to contribute to longer-term outcomes and/or impacts. | The research explicitly identifies its main intended outcomes and how they are intended/expected to be realized and to contribute to longer-term outcomes and/or impacts.
Relevant research objectives and design | The research objectives and design are relevant, timely, and appropriate to the problem context, including attention to stakeholder needs and values. | The documentation clearly demonstrates, through sufficient analysis of key factors, needs, and complexity within the context, that the research objectives and design are relevant and appropriate.
Appropriate project implementation | Research execution is suitable to the problem context and the socially relevant research objectives. | The documentation reflects effective project implementation that is appropriate to the context, with reflection and adaptation as needed.
Effective communication | Communication during and after the research process is appropriate to the context and accessible to stakeholders, users, and other intended audiences. | The documentation indicates that the research project planned and achieved appropriate communications with all necessary actors during the research process.
Broad preparation | The research is based on a strong integrated theoretical and empirical foundation that is relevant to the context. | The documentation demonstrates critical understanding of an appropriate breadth and depth of literature and theory from across disciplines relevant to the context, and of the context itself.
Clear research problem definition | The research problem is clearly defined, researchable, grounded in the academic literature, and relevant to the context. | The research problem is clearly stated and defined, researchable, and grounded in the academic literature and the problem context.
Objectives stated and met | Research objectives are clearly stated. | The research objectives are clearly stated, logically and appropriately related to the context and the research problem, and achieved, with any necessary adaptation explained.
Feasible research project | The research design and resources are appropriate and sufficient to meet the objectives as stated, and sufficiently resilient to adapt to unexpected opportunities and challenges throughout the research process. | The research design and resources are appropriate and sufficient to meet the objectives as stated, and sufficiently resilient to adapt to unexpected opportunities and challenges throughout the research process.
Adequate competencies | The skills and competencies of the researcher/team/collaboration (including academic and societal actors) are sufficient and in appropriate balance (without unnecessary complexity) to succeed. | The documentation recognizes the limitations and biases of individuals' knowledge, identifies the knowledge, skills, and expertise needed to carry out the research, and provides evidence that these are represented in the research team in the appropriate measure to address the problem.
Research approach fits purpose | Disciplines, perspectives, epistemologies, approaches, and theories are combined appropriately to create an approach that is appropriate to the research problem and the objectives. | The documentation explicitly states the rationale for the inclusion and integration of different epistemologies, disciplines, and methodologies, justifies the approach taken in reference to the context, and discusses the process of integration, including how paradoxes and conflicts were managed.
Appropriate methods | Methods are fit to purpose and well suited to answering the research questions and achieving the objectives. | Methods are clearly described, and documentation demonstrates that the methods are fit to purpose, systematic yet adaptable, and transparent. Novel (unproven) methods or adaptations are justified and explained, including why they were used and how they maintain scientific rigor.
Clearly presented argument | The movement from analysis through interpretation to conclusions is transparently and logically described. Sufficient evidence is provided to clearly demonstrate the relationship between evidence and conclusions. | Results are clearly presented. Analyses and interpretations are adequately explained, with clearly described terminology and full exposition of the logic leading to conclusions, including exploration of possible alternate explanations.
Transferability/generalizability of research findings | Appropriate and rigorous methods ensure the study's findings are externally valid (generalizable). In some cases, findings may be too context-specific to be generalizable, in which case research would be judged on its ability to act as a model for future research. | The document clearly explains how the research findings are transferable to other contexts or, in cases that are too context-specific to be generalizable, discusses aspects of the research process or findings that may be transferable to other contexts and/or used as learning cases.
Limitations stated | Researchers engage in ongoing individual and collective reflection in order to explicitly acknowledge and address limitations. | Limitations are clearly stated and adequately accounted for on an ongoing basis throughout the research project.
Ongoing monitoring and reflexivity | Researchers engage in ongoing reflection and adaptation of the research process, making changes as new obstacles, opportunities, circumstances, and/or knowledge surface. | Processes of reflection, individually and as a research team, are clearly documented throughout the research process, along with clear descriptions and justifications for any changes to the research process made as a result of reflection.
Disclosure of perspective | Actual, perceived, and potential bias is clearly stated and accounted for. This includes aspects of: researchers' position, sources of support, financing, collaborations, partnerships, research mandate, assumptions, goals, and bounds placed on commissioned research. | The documentation identifies potential or actual bias, including aspects of researchers' positions, sources of support, financing, collaborations, partnerships, research mandate, assumptions, goals, and bounds placed on commissioned research.
Effective collaboration | Appropriate processes are in place to ensure effective collaboration (e.g. clear and explicit roles and responsibilities agreed upon, transparent and appropriate decision-making structures). | The documentation explicitly discusses the collaboration process, with adequate demonstration that the opportunities and process for collaboration are appropriate to the context and the actors involved (e.g. clear and explicit roles and responsibilities agreed upon, transparent and appropriate decision-making structures).
Genuine and explicit inclusion | Inclusion of diverse actors in the research process is clearly defined. Representation of actors' perspectives, values, and unique contexts is ensured through adequate planning, explicit agreements, communal reflection, and reflexivity. | The documentation explains the range of participants and perspectives/cultural backgrounds involved, clearly describes what steps were taken to ensure the respectful inclusion of diverse actors/views, and explains the roles and contributions of all participants in the research process.
Research is ethical | Research adheres to standards of ethical conduct. | The documentation describes the ethical review process followed and, considering the full range of stakeholders, explicitly identifies any ethical challenges and how they were resolved.
Research builds social capacity | Change takes place in individuals, groups, and at the institutional level through shared learning. This can manifest as a change in knowledge, understanding, and/or perspective of participants in the research project. | There is evidence of observed changes in knowledge, behavior, understanding, and/or perspectives of research participants and/or stakeholders as a result of the research process and/or findings.
Contribution to knowledge | Research contributes to knowledge and understanding in academic and social realms in a timely, relevant, and significant way. | There is evidence that knowledge created through the project is being or has been used by intended audiences and end-users.
Practical application | Research has a practical application. The findings, process, and/or products of research are used. | There is evidence that innovations developed through the research and/or the research process have been (or will be) applied in the real world.
Significant outcome | Research contributes to the solution of the targeted problem or provides unexpected solutions to other problems. This can include a variety of outcomes: building societal capacity, learning, use of research products, and/or changes in behaviors. | There is evidence that the research has contributed to positive change in the problem context and/or innovations that have positive social or environmental impacts.

a Research problems are the particular topic, area of concern, question to be addressed, challenge, opportunity, or focus of the research activity. Research problems are related to the societal problem but take on a specific focus, or framing, within a societal problem.

b Problem context refers to the social and environmental setting(s) that gives rise to the research problem, including aspects of: location; culture; scale in time and space; social, political, economic, and ecological/environmental conditions; resources and societal capacity available; uncertainty, complexity, and novelty associated with the societal problem; and the extent of agency that is held by stakeholders ( Carew and Wickson 2010 ).

c Words such as ‘appropriate’, ‘suitable’, and ‘adequate’ are used deliberately to allow for quality criteria to be flexible and specific enough to the needs of individual research projects ( Oberg 2008 ).

d Research process refers to the series of decisions made and actions taken throughout the entire duration of the research project and encompassing all aspects of the research project.

e Reflexivity refers to an iterative process of formative, critical reflection on the important interactions and relationships between a research project’s process, context, and product(s).

f In an ex ante evaluation, ‘evidence of’ would be replaced with ‘potential for’.

4.1 Principles of TDR quality

There is a strong trend in the reviewed articles to recognize the need for appropriate measures of scientific quality (usually adapted from disciplinary antecedents), but also to consider broader sets of criteria regarding the societal significance and applicability of research, and the need for engagement and representation of stakeholder values and knowledge. Cash et al. (2002) conceptualize three key aspects of effective sustainability research: salience (or relevance), credibility, and legitimacy. These are presented as necessary attributes for research to successfully produce transferable, useful information that can cross boundaries between disciplines, across scales, and between science and society. Many of the papers also refer to the principle that high-quality TDR should be effective in terms of contributing to the solution of problems. These four principles are discussed in the following sections.

4.1.1 Relevance

Relevance is the importance, significance, and usefulness of the research project's objectives, process, and findings to the problem context and to society. This includes the appropriateness of the timing of the research, the questions being asked, the outputs, and the scale of the research in relation to the societal problem being addressed. Good-quality TDR addresses important social/environmental problems and produces knowledge that is useful for decision making and problem solving ( Cash et al. 2002 ; Klein 2006 ). As Erno-Kjolhede and Hansson ( 2011 : 140) explain, quality ‘is first and foremost about creating results that are applicable and relevant for the users of the research’. Researchers must demonstrate an in-depth knowledge of and ongoing engagement with the problem context in which their research takes place ( Wickson, Carew and Russell 2006 ; Stige, Malterud and Midtgarden 2009 ; Mitchell and Willets 2009 ). From the early steps of problem formulation and research design through to the appropriate and effective communication of research findings, the applicability and relevance of the research to the societal problem must be explicitly stated and incorporated.

4.1.2 Credibility

Credibility refers to whether or not the research findings are robust and the knowledge produced is scientifically trustworthy. This includes clear demonstration that the data are adequate, with well-presented methods and logical interpretations of findings. High-quality research is authoritative, transparent, defensible, believable, and rigorous. This is the traditional purview of science, and traditional disciplinary criteria can be applied in TDR evaluation to an extent. Additional and modified criteria are needed to address the integration of epistemologies and methodologies and the development of novel methods through collaboration, the broad preparation and competencies required to carry out the research, and the need for reflection and adaptation when operating in complex systems. Having researchers actively engaged in the problem context and including extra-scientific actors as part of the research process helps to achieve relevance and legitimacy of the research; it also adds complexity and heightened requirements of transparency, reflection, and reflexivity to ensure objective, credible research is carried out.

Active reflexivity is a criterion of credibility of TDR that may seem to contradict more rigid disciplinary methodological traditions (Carew and Wickson 2010). Practitioners of TDR recognize that credible work in these problem-oriented fields requires active reflexivity, epitomized by ongoing learning, flexibility, and adaptation to ensure the research approach and objectives remain relevant and fit to purpose (Lincoln 1995; Bergmann et al. 2005; Wickson, Carew and Russell 2006; Mitchell and Willets 2009; Andrén 2010; Carew and Wickson 2010; Wickson and Carew 2014). Changes made during the research process must be justified and reported transparently and explicitly to maintain credibility.

The need for critical reflection on potential bias and limitations becomes more important to maintain credibility of research-in-context ( Lincoln 1995 ; Bergmann et al. 2005 ; Mitchell and Willets 2009 ; Stige, Malterud and Midtgarden 2009 ). Transdisciplinary researchers must ensure they maintain a high level of objectivity and transparency while actively engaging in the problem context. This point demonstrates the fine balance between different aspects of quality, in this case relevance and credibility, and the need to be aware of tensions and to seek complementarities ( Cash et al. 2002 ).

4.1.3 Legitimacy

Legitimacy refers to whether the research process is perceived as fair and ethical by end-users. In other words, is it acceptable and trustworthy in the eyes of those who will use it? This requires the appropriate inclusion and consideration of diverse values and interests and the ethical and fair representation of all involved. Legitimacy may be achieved in part through the genuine inclusion of stakeholders in the research process. Whereas credibility refers to technical aspects of sound research, legitimacy deals with sociopolitical aspects of the knowledge production process and the products of research. Do stakeholders trust the researchers and the research process, including funding sources and other sources of potential bias? Do they feel represented? Legitimate TDR 'considers appropriate values, concerns, and perspectives of different actors' (Cash et al. 2002: 2) and incorporates these perspectives into the research process through collaboration and mutual learning (Bergmann et al. 2005; Chataway, Smith and Wield 2007; Andrén 2010; Huutoniemi 2010). A fair and ethical process is important to uphold standards of quality in all research. However, there are additional considerations that are unique to TDR.

Because TDR happens in-context and often in collaboration with societal actors, the disclosure of researcher perspective and a transparent statement of all partnerships, financing, and collaboration is vital to ensure an unbiased research process ( Lincoln 1995 ; Defila and Di Giulio 1999 ; Boaz and Ashby 2003 ; Barker and Pistrang 2005 ; Bergmann et al. 2005 ). The disclosure of perspective has both internal and external aspects, on one hand ensuring the researchers themselves explicitly reflect on and account for their own position, potential sources of bias, and limitations throughout the process, and on the other hand making the process transparent to those external to the research group who can then judge the legitimacy based on their perspective of fairness ( Cash et al. 2002 ).

TDR includes the engagement of societal actors along a continuum of participation from consultation to co-creation of knowledge (Brandt et al. 2013). Regardless of the depth of participation, all processes that engage societal actors must ensure that inclusion/engagement is genuine, roles are explicit, and processes for effective and fair collaboration are present (Bergmann et al. 2005; Wickson, Carew and Russell 2006; Spaapen, Dijstelbloem and Wamelink 2007; Hellstrom 2012). Important considerations include: the accurate representation of those involved; explicit and agreed-upon roles and contributions of actors; and adequate planning and procedures to ensure all values, perspectives, and contexts are adequately and appropriately incorporated. Mitchell and Willets (2009) consider cultural competence a key criterion that can support researchers in navigating diverse epistemological perspectives. This is similar to what Morrow (2005) terms 'social validity', a criterion that asks researchers to be responsive to and critically aware of the diversity of perspectives and cultures influenced by their research. Several authors highlight that, in order to develop this critical awareness of the diversity of cultural paradigms that operate within a problem situation, researchers should practice responsive, critical, and/or communal reflection (Bergmann et al. 2005; Wickson, Carew and Russell 2006; Mitchell and Willets 2009; Carew and Wickson 2010). Reflection and adaptation are important quality criteria that cut across multiple principles and facilitate learning throughout the process, a key foundation of TD inquiry.

4.1.4 Effectiveness

We define effective research as research that contributes to positive change in the social, economic, and/or environmental problem context. Transdisciplinary inquiry is rooted in the objective of solving real-world problems (Klein 2008; Carew and Wickson 2010) and must have the potential to (ex ante) or actually (ex post) make a difference if it is to be considered of high quality (Erno-Kjolhede and Hansson 2011). Potential research effectiveness can be indicated and assessed at the proposal stage and during the research process through: a clear and stated intention to address and contribute to a societal problem; the establishment of the research process and objectives in relation to the problem context; and continuous reflection on the usefulness of the research findings and products to the problem (Bergmann et al. 2005; Lahtinen et al. 2005; de Jong et al. 2011).

Assessing research effectiveness ex post remains a major challenge, especially in complex transdisciplinary approaches. Conventional and widely used measures of 'scientific impact' count outputs such as journal articles and other publications and citations of those outputs (e.g. the h-index and i10-index). While these are useful indicators of scholarly influence, they are insufficient and inappropriate measures of research effectiveness where research aims to contribute to social learning and change. We need also (or alternatively) to focus on other kinds of research and scholarship outputs and outcomes and on the social, economic, and environmental impacts that may result.

For many authors, contributing to learning and building of societal capacity are central goals of TDR ( Defila and Di Giulio 1999 ; Spaapen, Dijstelbloem and Wamelink 2007 ; Carew and Wickson 2010 ; Erno-Kjolhede and Hansson 2011 ; Hellstrom 2011 ), and so are considered part of TDR effectiveness. Learning can be characterized as changes in knowledge, attitudes, or skills and can be assessed directly, or through observed behavioral changes and network and relationship development. Some evaluation methodologies (e.g. Outcome Mapping ( Earl, Carden and Smutylo 2001 )) specifically measure these kinds of changes. Other evaluation methodologies consider the role of research within complex systems and assess effectiveness in terms of contributions to changes in policy and practice and resulting social, economic, and environmental benefits ( ODI 2004 , 2012 ; White and Phillips 2012 ; Mayne et al. 2013 ).

4.2 TDR quality criteria

TDR quality criteria and their definitions (explicit or implicit) were extracted from each article and summarized in an Excel database. These criteria were classified into themes corresponding to the four principles identified above, sorted and refined to develop sets of criteria that are comprehensive, mutually exclusive, and representative of the ideas presented in the reviewed articles. Within each principle, the criteria are organized roughly in the sequence of a typical project cycle (e.g. with research design following problem identification and preceding implementation). Definitions of each criterion were developed to reflect the concepts found in the literature, tested and refined iteratively to improve clarity. Rubric statements were formulated based on the literature and our own experience.

The complete set of principles, criteria, and definitions is presented as the TDR Quality Assessment Framework ( Table 3 ).
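
Because the framework is essentially a structured checklist (principles containing criteria, each with a definition and a rubric statement), it can be represented directly as data. The sketch below is our illustration rather than part of the published framework; the criterion names are abbreviated, only a few entries are shown, and their grouping under principles follows the discussion in Section 4.1.

```python
# One possible machine-readable form of the Table 3 framework (our sketch,
# not from the paper). Principles map to criteria; each criterion carries
# the definition and rubric statement a project is rated against.

FRAMEWORK = {
    "relevance": {
        "socially_relevant_research_problem": {
            "definition": "Research problem is relevant to the problem context.",
            "rubric": ("The research problem is defined and framed in a way "
                       "that clearly shows its relevance to the context..."),
        },
    },
    "credibility": {
        "appropriate_methods": {
            "definition": ("Methods are fit to purpose and well suited to "
                           "answering the research questions."),
            "rubric": ("Methods are clearly described, fit to purpose, "
                       "systematic yet adaptable, and transparent..."),
        },
    },
    "legitimacy": {},     # e.g. effective collaboration, genuine inclusion
    "effectiveness": {},  # e.g. contribution to knowledge, significant outcome
}

# Iterating over the structure yields a checklist to score (see Section 4.3).
for principle, criteria in FRAMEWORK.items():
    for name, spec in criteria.items():
        print(principle, name, spec["rubric"][:40], sep=" | ")
```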

4.3 Guidance on the application of the framework

4.3.1 Timing

Most criteria can be applied at each stage of the research process, ex ante , mid term, and ex post , using appropriate interpretations at each stage. Ex ante (i.e. proposal) assessment should focus on a project’s explicitly stated intentions and approaches to address the criteria. Mid-term indicators will focus on the research process and whether or not it is being implemented in a way that will satisfy the criteria. Ex post assessment should consider whether the research has been done appropriately for the purpose and that the desired results have been achieved.

4.3.2 New meanings for familiar terms

Many of the terms used in the framework are extensions of disciplinary criteria and share the same or similar names, with similar but nuanced meanings. The principles and criteria used here extend beyond their disciplinary antecedents and include new concepts and understandings that encapsulate the unique characteristics and needs of TDR and allow for the definition and evaluation of quality in TDR. This is especially true of the criteria related to credibility. These criteria are analogous to traditional disciplinary criteria, but with much stronger emphasis on grounding in both the scientific and the social/environmental contexts. We urge readers to pay close attention to the definitions provided in Table 3 as well as the detailed descriptions of the principles in Section 4.1.

4.3.3 Using the framework

The TDR quality framework (Table 3) is designed to be used to assess TDR according to a project's purpose; i.e. the criteria must be interpreted with respect to the context and goals of an individual research activity. The framework lists the main criteria synthesized from the literature and our experience, organized within the principles of relevance, credibility, legitimacy, and effectiveness. The table presents the criteria within each principle, ordered to approximate a typical process of identifying a research problem and designing and implementing research. We recognize that the actual process in any given project will be iterative and will not necessarily follow this sequence, but it provides a logical flow. A concise definition is provided in the second column to explain each criterion. We then provide a rubric statement in the third column, phrased to be applied when the research has been completed. In most cases, the same statement can be used at the proposal stage with a simple tense change or other minor grammatical revision, except for the criteria relating to effectiveness. As discussed above, assessing effectiveness in terms of outcomes and/or impact requires evaluation research. At the proposal stage, it is only possible to assess potential effectiveness.
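
For the tense change mentioned above, footnote f of Table 3 gives one concrete substitution for ex ante use, sketched below. The function name is ours, and the single string replacement is a simplification; in practice many statements will need manual rewording by the assessor.

```python
def to_ex_ante(rubric_statement: str) -> str:
    """Rephrase an ex post rubric statement for proposal-stage use.

    Per footnote f of Table 3, in an ex ante evaluation 'evidence of'
    is replaced with 'potential for'. Anything beyond this simple
    substitution is left to the assessor.
    """
    return rubric_statement.replace("evidence of", "potential for")

print(to_ex_ante("There is evidence of observed changes in knowledge."))
# -> "There is potential for observed changes in knowledge."
```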

Many rubrics offer a set of statements for each criterion that represent progressively higher levels of achievement; the evaluator is asked to select the best match. In practice, this often results in vague and relative statements of merit that are difficult to apply. We have opted to present a single rubric statement in absolute terms for each criterion. The assessor can then rate how well a project satisfies each criterion on a simple three-point Likert scale. If a project fully satisfies a criterion, that is, if there is evidence that the criterion has been addressed in a way that is coherent, explicit, sufficient, and convincing, it should be scored 2 for that criterion. A score of 2 means that the evaluator is persuaded that the project addressed that criterion in an intentional, appropriate, explicit, and thorough way. A score of 1 is given when there is some evidence that the criterion was considered, but the treatment is incomplete, unintentional, or otherwise unsatisfactory; for example, when a criterion is explicitly discussed but poorly addressed, or when there is some indication that the criterion has been considered and partially addressed but not explicitly, thoroughly, or adequately. A score of 0 indicates that there is no evidence that the criterion was addressed or that it was addressed in a way that was misguided or inappropriate.
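
As a worked illustration of this scoring scheme, the sketch below rates criteria on the 0–2 scale and summarizes the share of available points per principle. The criterion names and ratings are invented, and the per-principle aggregation is one possible summary, not a procedure prescribed by the framework.

```python
from typing import Dict

Scores = Dict[str, Dict[str, int]]  # principle -> criterion -> rating in {0, 1, 2}

def summarize(scores: Scores) -> Dict[str, float]:
    """Return the fraction of available points achieved per principle.

    2 = fully satisfies, 1 = partially satisfies, 0 = no (or misguided)
    evidence, as described in the text above.
    """
    summary = {}
    for principle, criteria in scores.items():
        for criterion, rating in criteria.items():
            if rating not in (0, 1, 2):
                raise ValueError(f"{criterion}: rating must be 0, 1, or 2")
        summary[principle] = sum(criteria.values()) / (2 * len(criteria))
    return summary

example = {
    "relevance": {"socially_relevant_problem": 2, "explicit_theory_of_change": 1},
    "credibility": {"appropriate_methods": 2, "limitations_stated": 0},
}
print(summarize(example))  # {'relevance': 0.75, 'credibility': 0.5}
```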

It is critical that the evaluation be done in context, keeping in mind the purpose, objectives, and resources of the project, as well as other contextual information, such as the intended purpose of grant funding or relevant partnerships. Each project will be unique in its complexities; what is sufficient or adequate in one criterion for one research project may be insufficient or inappropriate for another. Words such as 'appropriate', 'suitable', and 'adequate' are used deliberately to encourage application of criteria to suit the needs of individual research projects (Oberg 2008). Evaluators must consider the objectives of the research project and the problem context within which it is carried out as the benchmark for evaluation. For example, we tested the framework with RRU master's theses. These are typically small projects with limited scope, carried out by a single researcher. Expectations for 'effective communication' or 'competencies' or 'effective collaboration' are much different in these kinds of projects than in a multi-year, multi-partner CIFOR project. All criteria should be evaluated through the lens of the stated research objectives, research goals, and context.

The systematic review identified relevant articles from a diverse literature that nonetheless share a strong central focus. Collectively, they highlight the complexity of contemporary social and environmental problems and emphasize that addressing such issues requires combinations of new knowledge and innovation, action, and engagement. Traditional disciplinary research has often failed to provide solutions because it cannot adequately cope with complexity. New forms of research are proliferating, crossing disciplinary and academic boundaries, integrating methodologies, and engaging a broader range of research participants, as a way to make research more relevant and effective. Theoretically, such approaches appear to offer great potential to contribute to transformative change. However, because these approaches are new and because they are multidimensional, complex, and often unique, it has been difficult to know what works, how, and why. In the absence of the kinds of methodological and quality standards that guide disciplinary research, there are no generally agreed criteria for evaluating such research.

Criteria are needed to guide and help ensure that TDR is of high quality, to inform the teaching and learning of new researchers, and to encourage and support the further development of transdisciplinary approaches. The lack of a standard and broadly applicable framework for the evaluation of quality in TDR is perceived to cause an implicit or explicit devaluation of high-quality TDR, or may prevent quality TDR from being done. There is a demonstrated need for an operationalized understanding of quality that addresses the characteristics, contributions, and challenges of TDR. The reviewed articles approach the topic from different perspectives and fields of study, using different terminology for similar concepts, or the same terminology for different concepts, and with unique ways of organizing and categorizing the dimensions and criteria of quality. We have synthesized and organized these concepts as key TDR principles and criteria in a TDR Quality Framework, presented as an evaluation rubric. We have tested the framework on a set of master's theses and found it to be broadly applicable, usable, and useful for analyzing individual projects and for comparing projects within the set. We anticipate that further testing with a wider range of projects will help refine and improve the definitions and rubric statements. We found that the three-point Likert scale (0–2) offered sufficient variability for our purposes, and rating is less subjective than with relative rubric statements. It may be possible to increase rating precision with more points on the scale, for example to increase sensitivity when comparing proposals within a particular grant competition.

Many of the articles we reviewed emphasize the importance of the evaluation process itself. The formative, developmental role of evaluation in TDR is seen as essential to the goals of mutual learning as well as to ensure that research remains responsive and adaptive to the problem context. In order to adequately evaluate quality in TDR, the process, including who carries out the evaluations, when, and in what manner, must be revised to be suitable to the unique characteristics and objectives of TDR. We offer this review and synthesis, along with a proposed TDR quality evaluation framework, as a contribution to an important conversation. We hope that it will be useful to researchers and research managers to help guide research design, implementation and reporting, and to the community of research organizations, funders, and society at large. As underscored in the literature review, there is a need for an adapted research evaluation process that will help advance problem-oriented research in complex systems, ultimately to improve research effectiveness.

This work was supported by funding from the Canada Research Chairs program. Funding support from the Canadian Social Sciences and Humanities Research Council (SSHRC) and technical support from the Evidence Based Forestry Initiative of the Centre for International Forestry Research (CIFOR), funded by UK DfID are also gratefully acknowledged.

Supplementary data are available online.

The authors thank Barbara Livoreil and Stephen Dovers for valuable comments and suggestions on the protocol and Gillian Petrokofsky for her review of the protocol and a draft version of the manuscript. Two anonymous reviewers and the editor provided insightful critique and suggestions in two rounds that have helped to substantially improve the article.

Conflict of interest statement. None declared.

1. ‘Stakeholders’ refers to individuals and groups of societal actors who have an interest in the issue or problem that the research seeks to address.

2. The terms ‘quality’ and ‘excellence’ are often used in the literature with similar meaning. Technically, ‘excellence’ is a relative concept, referring to the superiority of a thing compared to other things of its kind. Quality is an attribute or a set of attributes of a thing. We are interested in what these attributes are or should be in high-quality research. Therefore, the term ‘quality’ is used in this discussion.

3. The terms ‘science’ and ‘research’ are not always clearly distinguished in the literature. We take the position that ‘science’ is a more restrictive term that is properly applied to systematic investigations using the scientific method. ‘Research’ is a broader term for systematic investigations using a range of methods, including but not restricted to the scientific method. We use the term ‘research’ in this broad sense.

Aagaard-Hansen J. and Svedin U. (2009) 'Quality Issues in Cross-disciplinary Research: Towards a Two-pronged Approach to Evaluation', Social Epistemology, 23/2: 165–76. DOI: 10.1080/02691720902992323

Andrén S. (2010) 'A Transdisciplinary, Participatory and Action-Oriented Research Approach: Sounds Nice but What do you Mean?' [unpublished working paper]. Human Ecology Division: Lund University, 1–21. <https://lup.lub.lu.se/search/publication/1744256>

Australian Research Council (ARC) (2012) ERA 2012 Evaluation Handbook: Excellence in Research for Australia. Australia: ARC. <http://www.arc.gov.au/pdf/era12/ERA%202012%20Evaluation%20Handbook_final%20for%20web_protected.pdf>

Balsiger P. W. (2004) 'Supradisciplinary Research Practices: History, Objectives and Rationale', Futures, 36/4: 407–21.

Bantilan M. C. et al. (2004) 'Dealing with Diversity in Scientific Outputs: Implications for International Research Evaluation', Research Evaluation, 13/2: 87–93.

Barker C. and Pistrang N. (2005) 'Quality Criteria under Methodological Pluralism: Implications for Conducting and Evaluating Research', American Journal of Community Psychology, 35/3–4: 201–12.

Bergmann M. et al. (2005) Quality Criteria of Transdisciplinary Research: A Guide for the Formative Evaluation of Research Projects. Central report of Evalunet – Evaluation Network for Transdisciplinary Research. Frankfurt am Main, Germany: Institute for Social-Ecological Research. <http://www.isoe.de/ftp/evalunet_guide.pdf>

Boaz A. and Ashby D. (2003) Fit for Purpose? Assessing Research Quality for Evidence Based Policy and Practice.

Boix-Mansilla V. (2006a) 'Symptoms of Quality: Assessing Expert Interdisciplinary Work at the Frontier: An Empirical Exploration', Research Evaluation, 15/1: 17–29.

Boix-Mansilla V. (2006b) 'Conference Report: Quality Assessment in Interdisciplinary Research and Education', Research Evaluation, 15/1: 69–74.

Bornmann L. (2013) 'What is Societal Impact of Research and How can it be Assessed? A Literature Survey', Journal of the American Society for Information Science and Technology, 64/2: 217–33.

Brandt P. et al. (2013) 'A Review of Transdisciplinary Research in Sustainability Science', Ecological Economics, 92: 1–15.

Cash D., Clark W. C., Alcock F., Dickson N. M., Eckley N., and Jäger J. (2002) Salience, Credibility, Legitimacy and Boundaries: Linking Research, Assessment and Decision Making. KSG Working Papers Series RWP02-046. Available at SSRN: <http://ssrn.com/abstract=372280>

Carew A. L. and Wickson F. (2010) 'The TD Wheel: A Heuristic to Shape, Support and Evaluate Transdisciplinary Research', Futures, 42/10: 1146–55.

Collaboration for Environmental Evidence (CEE) (2013) Guidelines for Systematic Review and Evidence Synthesis in Environmental Management. Version 4.2. Environmental Evidence. <www.environmentalevidence.org/Documents/Guidelines/Guidelines4.2.pdf>

Chandler J. (2014) Methods Research and Review Development Framework: Policy, Structure, and Process. <http://methods.cochrane.org/projects-developments/research>

Chataway J., Smith J., and Wield D. (2007) 'Shaping Scientific Excellence in Agricultural Research', International Journal of Biotechnology, 9/2: 172–87.

Clark W. C. and Dickson N. (2003) 'Sustainability Science: The Emerging Research Program', PNAS, 100/14: 8059–61.

Consultative Group on International Agricultural Research (CGIAR) (2011) A Strategy and Results Framework for the CGIAR. <http://library.cgiar.org/bitstream/handle/10947/2608/Strategy_and_Results_Framework.pdf?sequence=4>

Cloete N. (1997) 'Quality: Conceptions, Contestations and Comments', African Regional Consultation Preparatory to the World Conference on Higher Education, Dakar, Senegal, 1–4 April 1997.

Defila R. and DiGiulio A. (1999) 'Evaluating Transdisciplinary Research', Panorama: Swiss National Science Foundation Newsletter, 1: 4–27. <www.ikaoe.unibe.ch/forschung/ip/Specialissue.Pano.1.99.pdf>

Donovan C. (2008) 'The Australian Research Quality Framework: A Live Experiment in Capturing the Social, Economic, Environmental, and Cultural Returns of Publicly Funded Research. Reforming the Evaluation of Research', New Directions for Evaluation, 118: 47–60.

Earl S., Carden F., and Smutylo T. (2001) Outcome Mapping: Building Learning and Reflection into Development Programs. Ottawa, ON: International Development Research Center.

Ernø-Kjølhede E. and Hansson F. (2011) 'Measuring Research Performance during a Changing Relationship between Science and Society', Research Evaluation, 20/2: 130–42.

Feller I. (2006) 'Assessing Quality: Multiple Actors, Multiple Settings, Multiple Criteria: Issues in Assessing Interdisciplinary Research', Research Evaluation, 15/1: 5–15.

Gaziulusoy A. İ. and Boyle C. (2013) 'Proposing a Heuristic Reflective Tool for Reviewing Literature in Transdisciplinary Research for Sustainability', Journal of Cleaner Production, 48: 139–47.

Gibbons M. et al. (1994) The New Production of Knowledge: The Dynamics of Science and Research in Contemporary Societies. London: Sage Publications.

Hellstrom T. (2011) 'Homing in on Excellence: Dimensions of Appraisal in Center of Excellence Program Evaluations', Evaluation, 17/2: 117–31.

Hellstrom T. (2012) 'Epistemic Capacity in Research Environments: A Framework for Process Evaluation', Prometheus, 30/4: 395–409.

Hemlin S. and Rasmussen S. B. (2006) 'The Shift in Academic Quality Control', Science, Technology & Human Values, 31/2: 173–98.

Hessels L. K. and Van Lente H. (2008) 'Re-thinking New Knowledge Production: A Literature Review and a Research Agenda', Research Policy, 37/4: 740–60.

Huutoniemi K. (2010) 'Evaluating Interdisciplinary Research', in Frodeman R., Klein J. T., and Mitcham C. (eds) The Oxford Handbook of Interdisciplinarity, pp. 309–20. Oxford: Oxford University Press.

de Jong S. P. L. et al. (2011) 'Evaluation of Research in Context: An Approach and Two Cases', Research Evaluation, 20/1: 61–72.

Jahn T. and Keil F. (2015) 'An Actor-Specific Guideline for Quality Assurance in Transdisciplinary Research', Futures, 65: 195–208.

Kates R. (2000) 'Sustainability Science', World Academies Conference on Transition to Sustainability in the 21st Century, 18 May 2000, Tokyo, Japan.

Klein J. T. (2006) 'Afterword: The Emergent Literature on Interdisciplinary and Transdisciplinary Research Evaluation', Research Evaluation, 15/1: 75–80.

Klein J. T. (2008) 'Evaluation of Interdisciplinary and Transdisciplinary Research: A Literature Review', American Journal of Preventive Medicine, 35/2 (Suppl.): S116–23. DOI: 10.1016/j.amepre.2008.05.010

Royal Netherlands Academy of Arts and Sciences, Association of Universities in the Netherlands, and Netherlands Organization for Scientific Research (KNAW) (2009) Standard Evaluation Protocol 2009–2015: Protocol for Research Assessment in the Netherlands. Netherlands: KNAW. <www.knaw.nl/sep>

Komiyama H. and Takeuchi K. (2006) 'Sustainability Science: Building a New Discipline', Sustainability Science, 1: 1–6.

Lahtinen E. et al. (2005) 'The Development of Quality Criteria for Research: A Finnish Approach', Health Promotion International, 20/3: 306–15.

Lang D. J. et al. (2012) 'Transdisciplinary Research in Sustainability Science: Practice, Principles, and Challenges', Sustainability Science, 7/S1: 25–43.

Lincoln Y. S. (1995) 'Emerging Criteria for Quality in Qualitative and Interpretive Research', Qualitative Inquiry, 1/3: 275–89.

Mayne J. and Stern E. (2013) Impact Evaluation of Natural Resource Management Research Programs: A Broader View. Canberra: Australian Centre for International Agricultural Research.

Meyrick J. (2006) 'What is Good Qualitative Research? A First Step Towards a Comprehensive Approach to Judging Rigour/Quality', Journal of Health Psychology, 11/5: 799–808.

Mitchell C. A. and Willetts J. R. (2009) 'Quality Criteria for Inter- and Trans-disciplinary Doctoral Research Outcomes'. Prepared for ALTC Fellowship: Zen and the Art of Transdisciplinary Postgraduate Studies. Sydney: Institute for Sustainable Futures, University of Technology.

Morrow S. L. (2005) 'Quality and Trustworthiness in Qualitative Research in Counseling Psychology', Journal of Counseling Psychology, 52/2: 250–60.

Nowotny H., Scott P., and Gibbons M. (2001) Re-Thinking Science. Cambridge: Polity.

Nowotny H., Scott P., and Gibbons M. (2003) '"Mode 2" Revisited: The New Production of Knowledge', Minerva, 41: 179–94.

Öberg G. (2008) 'Facilitating Interdisciplinary Work: Using Quality Assessment to Create Common Ground', Higher Education, 57/4: 405–15.

Ozga J. (2007) 'Co-production of Quality in the Applied Education Research Scheme', Research Papers in Education, 22/2: 169–81.

Ozga J. (2008) 'Governing Knowledge: Research Steering and Research Quality', European Educational Research Journal, 7/3: 261–72.

OECD (2012) Frascati Manual, 6th edn. <http://www.oecd.org/innovation/inno/frascatimanualproposedstandardpracticeforsurveysonresearchandexperimentaldevelopment6thedition>

Overseas Development Institute (ODI) (2004) 'Bridging Research and Policy in International Development: An Analytical and Practical Framework', ODI Briefing Paper. <http://www.odi.org/sites/odi.org.uk/files/odi-assets/publications-opinion-files/198.pdf>

Overseas Development Institute (ODI) (2012) RAPID Outcome Assessment Guide. <http://www.odi.org/sites/odi.org.uk/files/odi-assets/publications-opinion-files/7815.pdf>

Pullin A. S. and Stewart G. B. (2006) 'Guidelines for Systematic Review in Conservation and Environmental Management', Conservation Biology, 20/6: 1647–56.

Research Excellence Framework (REF) (2011) Research Excellence Framework 2014: Assessment Framework and Guidance on Submissions. Reference REF 02.2011. UK: REF. <http://www.ref.ac.uk/pubs/2011-02/>

Scott A. (2007) 'Peer Review and the Relevance of Science', Futures, 39/7: 827–45.

Spaapen J., Dijstelbloem H., and Wamelink F. (2007) Evaluating Research in Context: A Method for Comprehensive Assessment. Netherlands: Consultative Committee of Sector Councils for Research and Development. <http://www.qs.univie.ac.at/fileadmin/user_upload/qualitaetssicherung/PDF/Weitere_Aktivit%C3%A4ten/Eric.pdf>

Spaapen J. and Van Drooge L. (2011) 'Introducing "Productive Interactions" in Social Impact Assessment', Research Evaluation, 20: 211–18.

Stige B., Malterud K., and Midtgarden T. (2009) 'Toward an Agenda for Evaluation of Qualitative Research', Qualitative Health Research, 19/10: 1504–16.

td-net (2014) td-net bibliography. <www.transdisciplinarity.ch/e/Bibliography/new.php>

Tertiary Education Commission (TEC) (2012) Performance-based Research Fund: Quality Evaluation Guidelines 2012. New Zealand: TEC. <http://www.tec.govt.nz/Documents/Publications/PBRF-Quality-Evaluation-Guidelines-2012.pdf>

Tijssen R. J. W. (2003) 'Quality Assurance: Scoreboards of Research Excellence', Research Evaluation, 12: 91–103.

White H. and Phillips D. (2012) 'Addressing Attribution of Cause and Effect in Small n Impact Evaluations: Towards an Integrated Framework'. Working Paper 15. New Delhi: International Initiative for Impact Evaluation.

Wickson F. and Carew A. (2014) 'Quality Criteria and Indicators for Responsible Research and Innovation: Learning from Transdisciplinarity', Journal of Responsible Innovation, 1/3: 254–73.

Wickson F., Carew A., and Russell A. W. (2006) 'Transdisciplinary Research: Characteristics, Quandaries and Quality', Futures, 38/9: 1046–59.




A Step-To-Step Guide to Write a Quality Research Article

  • Conference paper
  • First Online: 01 June 2023

Amit Kumar Tyagi (ORCID: orcid.org/0000-0003-2657-8700), Rohit Bansal, Anshu, and Sathian Dananjayan (ORCID: orcid.org/0000-0002-6103-7267)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 717)

Included in the conference series: International Conference on Intelligent Systems Design and Applications

Publishing articles has become an expectation at almost every university around the world. Millions of research articles are published in thousands of journals annually across many streams/sectors, such as medicine, engineering, and science. Yet few researchers follow the proper, fundamental criteria for writing a quality research article. Many published articles amount to irrelevant or duplicated information, which wastes available resources, because many authors/researchers do not know, or do not follow, a correct approach to writing a valid and influential paper. Keeping such issues in mind for both new and existing researchers across many sectors, we feel motivated to present a systematic approach that can help researchers produce a quality research article and publish their work in international conferences such as CVPR, ICML, and NeurIPS, in high-impact international journals, or as a white paper. Publishing good articles improves the profile of researchers around the world, and future researchers can cite the work as a reference to carry the respective research forward. Hence, this article provides sufficient information for researchers to write a simple, effective/impressive, and high-quality research article in their area of interest.



Acknowledgement

We thank the anonymous reviewers and our colleagues who helped us to complete this work.

Author information

Authors and Affiliations

Department of Fashion Technology, National Institute of Fashion Technology, New Delhi, India

Amit Kumar Tyagi

Department of Management Studies, Vaish College of Engineering, Rohtak, India

Rohit Bansal

Faculty of Management and Commerce (FOMC), Baba Mastnath University, Asthal Bohar, Rohtak, India

School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamilnadu, 600127, India

Sathian Dananjayan


Contributions

Amit Kumar Tyagi & Sathian Dananjayan have drafted and approved this manuscript for final publication.

Corresponding author

Correspondence to Amit Kumar Tyagi .

Editor information

Editors and Affiliations

Faculty of Computing and Data Science, FLAME University, Pune, Maharashtra, India

Ajith Abraham

Center for Smart Computing Continuum, Burgenland, Austria

Sabri Pllana

University of Bari, Bari, Italy

Gabriella Casalino

University of Jinan, Jinan, Shandong, China

Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab, India

Ethics declarations

Conflict of Interest

The authors declare that no conflict of interest exists regarding the publication of this paper.

Scope of the Work

As the authors belong to the computer science stream, they have tried to make this article useful for all disciplines, but most of the examples (situations, languages, datasets, etc.) relate to computer science. This work can be used as a reference for writing good-quality papers for international conferences and journals.

Disclaimer. The links and papers cited in this work are provided only as examples; any omission of a citation or link is unintentional.

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tyagi, A.K., Bansal, R., Anshu, Dananjayan, S. (2023). A Step-To-Step Guide to Write a Quality Research Article. In: Abraham, A., Pllana, S., Casalino, G., Ma, K., Bajaj, A. (eds) Intelligent Systems Design and Applications. ISDA 2022. Lecture Notes in Networks and Systems, vol 717. Springer, Cham. https://doi.org/10.1007/978-3-031-35510-3_36

Download citation

DOI : https://doi.org/10.1007/978-3-031-35510-3_36

Published : 01 June 2023

Publisher Name : Springer, Cham

Print ISBN : 978-3-031-35509-7

Online ISBN : 978-3-031-35510-3

eBook Packages: Intelligent Technologies and Robotics (R0)


BMJ. 2004 Jan 3; 328(7430)

Assessing the quality of research

Paul Glasziou

1 Department of Primary Health Care, University of Oxford, Oxford OX3 7LF

Jan Vandenbroucke

2 Leiden University Medical School, Leiden 9600 RC, Netherlands

Iain Chalmers

3 James Lind Initiative, Oxford OX2 7LG

Short abstract

Inflexible use of evidence hierarchies confuses practitioners and irritates researchers. So how can we improve the way we assess research?

The widespread use of hierarchies of evidence that grade research studies according to their quality has helped to raise awareness that some forms of evidence are more trustworthy than others. This is clearly desirable. However, the simplifications involved in creating and applying hierarchies have also led to misconceptions and abuses. In particular, criteria designed to guide inferences about the main effects of treatment have been uncritically applied to questions about aetiology, diagnosis, prognosis, or adverse effects. So should we assess evidence the way Michelin guides assess hotels and restaurants? We believe five issues should be considered in any revision or alternative approach to helping practitioners to find reliable answers to important clinical questions.

Different types of question require different types of evidence

Ever since two American social scientists introduced the concept in the early 1960s, 1 hierarchies have been used almost exclusively to determine the effects of interventions. This initial focus was appropriate but has also engendered confusion. Although interventions are central to clinical decision making, practice relies on answers to a wide variety of types of clinical questions, not just the effect of interventions. 2 Other hierarchies might be necessary to answer questions about aetiology, diagnosis, disease frequency, prognosis, and adverse effects. 3 Thus, although a systematic review of randomised trials would be appropriate for answering questions about the main effects of a treatment, it would be ludicrous to attempt to use it to ascertain the relative accuracy of computerised versus human reading of cervical smears, the natural course of prion diseases in humans, the effect of carriership of a mutation on the risk of venous thrombosis, or the rate of vaginal adenocarcinoma in the daughters of pregnant women given diethylstilboestrol. 4


To answer their everyday questions, practitioners need to understand the “indications and contraindications” for different types of research evidence. 5 Randomised trials can give good estimates of treatment effects but poor estimates of overall prognosis; comprehensive non-randomised inception cohort studies with prolonged follow up, however, might provide the reverse.

Systematic reviews of research are always preferred

With rare exceptions, no study, whatever the type, should be interpreted in isolation. Systematic reviews are required of the best available type of study for answering the clinical question posed. 6 A systematic review does not necessarily involve quantitative pooling in a meta-analysis.

Although case reports are a less than perfect source of evidence, they are important in alerting us to potential rare harms or benefits of an effective treatment. 7 Standardised reporting is certainly needed, 8 but too few people know about a study showing that more than half of suspected adverse drug reactions were confirmed by subsequent, more detailed research. 9 For reliable evidence on rare harms, therefore, we need a systematic review of case reports rather than a haphazard selection of them. 10 Qualitative studies can also be incorporated in reviews—for example, the systematic compilation of the reasons for non-compliance with hip protectors derived from qualitative research. 11

Level alone should not be used to grade evidence

The first substantial use of a hierarchy of evidence to grade health research was by the Canadian Task Force on the Periodic Health Examination. 12 Although such systems are preferable to ignoring research evidence or failing to provide justification for selecting particular research reports to support recommendations, they have three big disadvantages. Firstly, the definitions of the levels vary within hierarchies so that level 2 will mean different things to different readers. Secondly, novel or hybrid research designs are not accommodated in these hierarchies—for example, reanalysis of individual data from several studies or case crossover studies within cohorts. Thirdly, and perhaps most importantly, hierarchies can lead to anomalous rankings. For example, a statement about one intervention may be graded level 1 on the basis of a systematic review of a few, small, poor quality randomised trials, whereas a statement about an alternative intervention may be graded level 2 on the basis of one large, well conducted, multicentre, randomised trial.

This ranking problem arises because of the objective of collapsing the multiple dimensions of quality (design, conduct, size, relevance, etc) into a single grade. For example, randomisation is a key methodological feature in research into interventions, 13 but reducing the quality of evidence to a single level reflecting proper randomisation ignores other important dimensions of randomised clinical trials. These might include:

  • Other design elements, such as the validity of measurements and blinding of outcome assessments
  • Quality of the conduct of the study, such as loss to follow up and success of blinding
  • Absolute and relative size of any effects seen
  • Confidence intervals around the point estimates of effects.

None of the current hierarchies of evidence includes all these dimensions, and recent methodological research suggests that it may be difficult for them to do so. 14 Moreover, some dimensions are more important for some clinical problems and outcomes than for others, which necessitates a tailored approach to appraising evidence. 15 Thus, for important recommendations, it may be preferable to present a brief summary of the central evidence (such as “double-blind randomised controlled trials with a high degree of follow up over three years showed that...”), coupled with a brief appraisal of why particular quality dimensions are important. This broader approach to the assessment of evidence applies not only to randomised trials but also to observational studies. In the final recommendations, there will also be a role for other types of scientific evidence—for example, on aetiological and pathophysiological mechanisms—because concordance between theoretical models and the results of empirical investigations will increase confidence in the causal inferences. 16 , 17
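To illustrate what reporting several quality dimensions, rather than a single level, could look like in practice, here is a minimal sketch of a structured evidence summary; the field names and example values are our own, chosen to mirror the dimensions listed above, not a standardized schema.

```python
# Sketch of a multi-dimensional evidence summary. Field names mirror the
# dimensions discussed above and are illustrative, not a standard schema.
from dataclasses import dataclass

@dataclass
class EvidenceSummary:
    design: str                 # e.g. "double-blind randomised controlled trial"
    conduct: str                # e.g. "high degree of follow-up over three years"
    relative_effect: float      # point estimate of the effect (e.g. risk ratio)
    ci_95: tuple[float, float]  # 95% confidence interval around the estimate

    def describe(self) -> str:
        lo, hi = self.ci_95
        return (f"{self.design}; {self.conduct}; "
                f"effect {self.relative_effect} (95% CI {lo} to {hi})")

trial = EvidenceSummary(
    design="double-blind randomised controlled trial",
    conduct="high degree of follow-up over three years",
    relative_effect=0.80,
    ci_95=(0.70, 0.92),
)
print(trial.describe())
```

Keeping the dimensions as separate fields, rather than collapsing them into one level, preserves exactly the information that a single-grade hierarchy discards.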

What to do when systematic reviews are not available

Although hierarchies can be misleading as a grading system, they can help practitioners find the best relevant evidence among a plethora of studies of diverse quality. For example, to answer a therapeutic question, the hierarchy would suggest first looking for a systematic review of randomised controlled trials. However, only a fraction of the hundreds of thousands of reports of randomised trials have been considered for possible inclusion in systematic reviews. 18 So when there is no existing review, a busy clinician might next try to identify the best of several randomised trials. If the search fails to identify any randomised trials, non-randomised cohort studies might be informative. For non-therapeutic questions, however, search strategies should accommodate the need for observational designs that answer questions about aetiology, prognosis, or adverse effects. 19 Whatever evidence is found, this should be clearly described rather than simply assigned to a level. Such considerations have led the authors of the BMJ's Clinical Evidence to use a hierarchy for finding evidence but to forgo grading evidence into levels. Instead, they make explicit the type of evidence on which their conclusions are based.
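The search order described in this section amounts to a simple fallback procedure, sketched below; the find_* functions are hypothetical stand-ins, since a real implementation would query bibliographic databases.

```python
# Hypothetical sketch of the fallback search order for a therapeutic
# question. The find_* functions are stand-ins for real bibliographic
# database queries.

def find_systematic_reviews(question: str) -> list[str]:
    return []  # pretend none were found

def find_randomised_trials(question: str) -> list[str]:
    return []  # pretend none were found

def find_cohort_studies(question: str) -> list[str]:
    return ["non-randomised inception cohort study"]

SEARCH_ORDER = [
    ("systematic review of randomised trials", find_systematic_reviews),
    ("best available randomised trial", find_randomised_trials),
    ("non-randomised cohort study", find_cohort_studies),
]

def best_available_evidence(question: str) -> str:
    """Walk down the hierarchy until some evidence is found."""
    for label, search in SEARCH_ORDER:
        results = search(question)
        if results:
            return f"{label}: {results[0]}"
    return "no study evidence found; appraise from first principles"

print(best_available_evidence("Does drug X reduce stroke risk?"))
```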

Balanced assessments should draw on a variety of types of research

For interventions, the best available evidence for each outcome of potential importance to patients is needed. 20 Often this will require systematic reviews of several different types of study. As an example, consider a woman interested in oral contraceptives. Evidence is available from controlled trials showing their contraceptive effectiveness. Although contraception is the main intended beneficial effect, some women will also be interested in the effects of oral contraceptives on acne or dysmenorrhoea. These may have been assessed in short term randomised controlled trials comparing different contraceptives. Any beneficial intended effect needs to be weighed against possible harms, such as increases in thromboembolism and breast cancer. The best evidence for such potential harms is likely to come from non-randomised cohort studies or case-control studies. For example, fears about negative consequences on fertility after long term use of oral contraceptives were allayed by such non-randomised studies. The figure gives an example of how all this information might be amalgamated into a balance sheet. 21 , 22

[Figure: Example of possible evidence table for short and long term effects of oral contraceptives. Absolute effects will vary with age and other risk factors such as smoking and blood pressure. RCT = randomised controlled trial.]
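The balance sheet in the figure can be thought of as a small table pairing each patient-relevant outcome with the best available type of evidence; the sketch below rebuilds that idea with placeholder values, not the numbers from the original figure.

```python
# Sketch of an evidence balance sheet for oral contraceptives. Each row
# pairs an outcome with the best available study type. The effect values
# are placeholders, not numbers from the original figure.

balance_sheet = [
    # (outcome, direction, best available evidence, illustrative effect)
    ("contraception",   "benefit", "randomised controlled trials",  "effective"),
    ("acne",            "benefit", "short-term randomised trials",  "improved"),
    ("thromboembolism", "harm",    "non-randomised cohort studies", "increased risk"),
    ("breast cancer",   "harm",    "case-control studies",          "increased risk"),
]

for outcome, direction, evidence, effect in balance_sheet:
    print(f"{outcome:<15} {direction:<8} {evidence:<30} {effect}")
```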

Sometimes, rare, dramatic adverse effects detected with case reports or case control studies prompt further investigation and follow up of existing randomised cohorts to detect related but less severe adverse effects. For example, the case reports and case-control studies showing that intrauterine exposure to diethylstilboestrol could cause vaginal adenocarcinoma led to further investigation and follow up of the mothers and children (male as well as female) who had participated in the relevant randomised trials. These investigations showed several less serious but more frequent adverse effects of diethylstilboestrol that would have otherwise been difficult to detect. 4

Conclusions

Given the flaws in evidence hierarchies that we have described, how should we proceed? We suggest that there are two broad options: firstly, to extend, improve, and standardise current evidence hierarchies 22 ; and, secondly, to abolish the notion of evidence hierarchies and levels of evidence, and concentrate instead on teaching practitioners general principles of research so that they can use these principles to appraise the quality and relevance of particular studies. 5

We have been unable to reach a consensus on which of these approaches is likely to serve the current needs of practitioners more effectively. Practitioners who seek immediate answers cannot embark on a systematic review every time a new question arises in their practice. Clinical guidelines are increasingly prepared professionally—for example, by organisations of general practitioners and of specialist physicians or the NHS National Institute for Clinical Excellence—and this work draws on the results of systematic reviews of research evidence. Such organisations might find it useful to reconsider their approach to evidence and broaden the type of problems that they examine, especially when they need to balance risks and benefits. Most importantly, however, the practitioners who use their products should understand the approach used and be able to judge easily whether a review or a guideline has been prepared reliably.

Evidence hierarchies with the randomised trial at the apex have been pivotal in the ascendancy of numerical reasoning in medicine over the past quarter century. 17 Now that this principle is widely appreciated, however, we believe that it is time to broaden the scope by which evidence is assessed, so that the principles of other types of research, addressing questions on aetiology, diagnosis, prognosis, and unexpected effects of treatment, will become equally widely understood. Indeed, maybe we do have something to learn from Michelin guides: they have separate grading systems for hotels and restaurants, provide the details of the several quality dimensions behind each star rating, and add a qualitative commentary (www.viamichelin.com).

Summary points

Different types of research are needed to answer different types of clinical questions

Irrespective of the type of research, systematic reviews are necessary

Adequate grading of quality of evidence goes beyond the categorisation of research design

Risk-benefit assessments should draw on a variety of types of research

Clinicians need efficient search strategies for identifying reliable clinical research


We thank Andy Oxman and Mike Rawlins for helpful suggestions.

Contributors and sources: As a general practitioner, PG uses his own and others' evidence assessments, and as a teacher of evidence based medicine helps others find and appraise research. JV is an internist and epidemiologist by training; he has extensively collaborated in clinical research, which made him strongly aware of the diverse types of evidence that clinicians use and need. IC's interest in these issues arose from witnessing the harm done to patients from eminence based medicine.

Competing interests: None declared.


What is quality research? A guide to identifying the key features and achieving success


Every researcher worth their salt strives for quality. But in research, what does quality mean?

Simply put, quality research is thorough, accurate, original and relevant. And to achieve this, you need to follow specific standards. You need to make sure your findings are reliable and valid. And when you know they're quality assured, you can share them with absolute confidence.

You’ll be able to draw accurate conclusions from your investigations and contribute to the wider body of knowledge in your field.

Importance of quality research

Quality research helps us better understand complex problems. It enables us to make decisions based on facts and evidence. And it empowers us to solve real-world issues. Without quality research, we can't advance knowledge or identify trends and patterns. We also can’t develop new theories and approaches to solving problems.

With rigorous and transparent research methods, you’ll produce reliable findings that other researchers can replicate. This leads to the development of new theories and interventions. On the other hand, low-quality research can hinder progress by producing unreliable findings that can’t be replicated, wasting resources and impeding advancements in the field.

In all cases, quality control is critical. It ensures that decisions are based on evidence rather than gut feeling or bias.

Standards for quality research

Over the years, researchers, scientists and authors have come to a consensus about the standards used to check the quality of research. Determined through empirical observation, theoretical underpinnings and philosophy of science, these include:

1. Having a well-defined research topic and a clear hypothesis

This is essential to verify that the research is focused and the results are relevant and meaningful. The research topic should be well-scoped and the hypothesis should be clearly stated and falsifiable.

For example, in a quantitative study about the effects of social media on behavior, a well-defined research topic could be, "Does the use of TikTok reduce attention span in American adolescents?"

This is good because:

  • The research topic focuses on a particular social media platform (TikTok) and on a specific group of people (American adolescents).
  • The research question is clear and straightforward, making it easier to design the study and collect relevant data.
  • You can test the hypothesis, and a research team can evaluate it easily, through various research methods such as survey research, experiments or observational studies (a minimal analysis sketch follows this list).
  • The hypothesis focuses on a specific outcome (attention span), which can be measured and compared against control groups or previous studies.
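As promised above, here is a minimal sketch of how such a hypothesis might be analyzed once attention-span scores have been collected for a TikTok-using group and a control group; the data below are simulated purely for illustration.

```python
# Simulated illustration: compare attention-span scores between a
# TikTok-using group and a control group with a two-sample t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
tiktok_users = rng.normal(loc=48, scale=10, size=100)  # simulated scores
controls = rng.normal(loc=52, scale=10, size=100)      # simulated scores

t_stat, p_value = stats.ttest_ind(tiktok_users, controls)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 5% level.")
else:
    print("No statistically significant difference detected.")
```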

2. Ensuring transparency

Transparency is crucial when conducting research. You need to be upfront about the methods you used, such as:

  • Describing how you recruited the participants.
  • How you communicated with them.
  • How they were incentivized.

You also need to explain how you analyzed the data, so other researchers can replicate your results if necessary. Pre-registering your study is a great way to be as transparent in your research as possible. This involves publicly documenting your study design, methods and analysis plan before conducting the research, which reduces the risk of selective reporting and increases the credibility of your findings.

3. Using appropriate research methods

Depending on the topic, some research methods are better suited than others for collecting data. To use our TikTok example, a quantitative research approach, such as a behavioral test that measures the participants' ability to focus on tasks, might be the most appropriate.

On the other hand, for topics that require a more in-depth understanding of individuals' experiences or perspectives, a qualitative research approach, such as interviews or focus groups, might be more suitable. These methods can provide rich and detailed information that you can’t capture through quantitative data alone.

4. Assessing limitations and the possible impact of systematic bias

When you present your research, it's important to consider how the limitations of your study could affect the results, such as systematic bias in the sampling procedure or data analysis. Let's say you only study a small sample of participants from one school district. This would limit the generalizability (external validity) of your findings.

5. Conducting accurate reporting

This is an essential aspect of any research project. You need to be able to clearly communicate the findings and implications of your study, and provide citations for any claims made in your report. When you present your work, it's vital that you describe the variables involved in your study accurately, along with how you measured them.

Curious to learn more? Read our Data Quality eBook.

How to identify credible research findings

To determine whether a published study is trustworthy, consider the following:

  • Peer review: If a study has been peer-reviewed by recognized experts, rest assured that it’s a reliable source of information. Peer review means that other scholars have read and verified the study before publication.
  • Researcher's qualifications: If they're an expert in the field, that’s a good sign that you can trust their findings. However, if they aren't, it doesn’t necessarily mean that the study's information is unreliable. It simply means that you should be extra cautious about accepting its conclusions as fact.
  • Study design: The design of a study can make or break its reliability. Consider factors like sample size and methodology.
  • Funding source: Studies funded by organizations with a vested interest in a particular outcome may be less credible than those funded by independent sources.
  • Statistical significance: You've heard the phrase "numbers don't lie," right? That's what statistical significance is all about. It refers to the likelihood that the results of a study occurred by chance. Results that are statistically significant are more credible.

Achieve quality research with Prolific

Want to ensure your research is high-quality? Prolific can help.

Our platform gives you access to a carefully vetted pool of participants. We make sure they're attentive, honest, and ready to provide rich and detailed answers where needed. This helps to ensure that the data you collect through Prolific is of the highest quality.

Streamline your research process and feel confident in the results you receive. Our minimum pay threshold and commitment to fair compensation motivate participants to provide valuable responses and give their best effort. This ensures the quality of your research and helps you get the results you need. 


  • Open access
  • Published: 12 September 2024

An open-source framework for end-to-end analysis of electronic health record data

  • Lukas Heumos 1 , 2 , 3 ,
  • Philipp Ehmele 1 ,
  • Tim Treis 1 , 3 ,
  • Julius Upmeier zu Belzen   ORCID: orcid.org/0000-0002-0966-4458 4 ,
  • Eljas Roellin 1 , 5 ,
  • Lilly May 1 , 5 ,
  • Altana Namsaraeva 1 , 6 ,
  • Nastassya Horlava 1 , 3 ,
  • Vladimir A. Shitov   ORCID: orcid.org/0000-0002-1960-8812 1 , 3 ,
  • Xinyue Zhang   ORCID: orcid.org/0000-0003-4806-4049 1 ,
  • Luke Zappia   ORCID: orcid.org/0000-0001-7744-8565 1 , 5 ,
  • Rainer Knoll 7 ,
  • Niklas J. Lang 2 ,
  • Leon Hetzel 1 , 5 ,
  • Isaac Virshup 1 ,
  • Lisa Sikkema   ORCID: orcid.org/0000-0001-9686-6295 1 , 3 ,
  • Fabiola Curion 1 , 5 ,
  • Roland Eils 4 , 8 ,
  • Herbert B. Schiller 2 , 9 ,
  • Anne Hilgendorff 2 , 10 &
  • Fabian J. Theis   ORCID: orcid.org/0000-0002-2419-1943 1 , 3 , 5  

Nature Medicine (2024)

  • Epidemiology
  • Translational research

With progressive digitalization of healthcare systems worldwide, large-scale collection of electronic health records (EHRs) has become commonplace. However, an extensible framework for comprehensive exploratory analysis that accounts for data heterogeneity is missing. Here we introduce ehrapy, a modular open-source Python framework designed for exploratory analysis of heterogeneous epidemiology and EHR data. ehrapy incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations. Complemented by rich statistical modules, ehrapy facilitates associating patients with disease states, differential comparison between patient clusters, survival analysis, trajectory inference, causal inference and more. Leveraging ontologies, ehrapy further enables data sharing and training EHR deep learning models, paving the way for foundational models in biomedical research. We demonstrate ehrapy’s features in six distinct examples. We applied ehrapy to stratify patients affected by unspecified pneumonia into finer-grained phenotypes. Furthermore, we reveal biomarkers for significant differences in survival among these groups. Additionally, we quantify medication-class effects of pneumonia medications on length of stay. We further leveraged ehrapy to analyze cardiovascular risks across different data modalities. We reconstructed disease state trajectories in patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) based on imaging data. Finally, we conducted a case study to demonstrate how ehrapy can detect and mitigate biases in EHR data. ehrapy, thus, provides a framework that we envision will standardize analysis pipelines on EHR data and serve as a cornerstone for the community.


Electronic health records (EHRs) are becoming increasingly common due to standardized data collection 1 and digitalization in healthcare institutions. EHRs collected at medical care sites serve as efficient storage and sharing units of health information 2 , enabling the informed treatment of individuals using the patient’s complete history 3 . Routinely collected EHR data are approaching genomic-scale size and complexity 4 , posing challenges in extracting information without quantitative analysis methods. The application of such approaches to EHR databases 1 , 5 , 6 , 7 , 8 , 9 has enabled the prediction and classification of diseases 10 , 11 , study of population health 12 , determination of optimal treatment policies 13 , 14 , simulation of clinical trials 15 and stratification of patients 16 .

However, current EHR datasets suffer from serious limitations, such as data collection issues, inconsistencies and lack of data diversity. EHR data collection and sharing problems often arise due to non-standardized formats, with disparate systems using exchange protocols, such as Health Level Seven International (HL7) and Fast Healthcare Interoperability Resources (FHIR) 17 . In addition, EHR data are stored in various on-disk formats, including, but not limited to, relational databases and CSV, XML and JSON formats. These variations pose challenges with respect to data retrieval, scalability, interoperability and data sharing.

Beyond format variability, inherent biases of the collected data can compromise the validity of findings. Selection bias stemming from non-representative sample composition can lead to skewed inferences about disease prevalence or treatment efficacy 18 , 19 . Filtering bias arises through inconsistent criteria for data inclusion, obscuring true variable relationships 20 . Surveillance bias exaggerates associations between exposure and outcomes due to differential monitoring frequencies 21 . EHR data are further prone to missing data 22 , 23 , which can be broadly classified into three categories: missing completely at random (MCAR), where missingness is unrelated to the data; missing at random (MAR), where missingness depends on observed data; and missing not at random (MNAR), where missingness depends on unobserved data 22 , 23 . Information and coding biases, related to inaccuracies in data recording or coding inconsistencies, respectively, can lead to misclassification and unreliable research conclusions 24 , 25 . Data may even contradict themselves, such as when measurements are reported for deceased patients 26 , 27 . Technical variation and differing data collection standards lead to distribution differences and inconsistencies in representation and semantics across EHR datasets 28 , 29 . Attrition and confounding biases, resulting from differential patient dropout rates or unaccounted external variable effects, can significantly skew study outcomes 30 , 31 , 32 . The diversity of EHR data, which comprise demographics, laboratory results, vital signs, diagnoses, medications, x-rays, written notes and even omics measurements, amplifies all the aforementioned issues.

Addressing these challenges requires rigorous study design, careful data pre-processing and continuous bias evaluation through exploratory data analysis. Several EHR data pre-processing and analysis workflows were previously developed 4 , 33 , 34 , 35 , 36 , 37 , but none of them enables the analysis of heterogeneous data, provides in-depth documentation, is available as a software package or allows for exploratory visual analysis. Current EHR analysis pipelines, therefore, differ considerably in their approaches and are often commercial, vendor-specific solutions 38 . This is in contrast to strategies using community standards for the analysis of omics data, such as Bioconductor 39 or scverse 40 . As a result, EHR data frequently remain underexplored and are commonly investigated only for a particular research question 41 . Even in such cases, EHR data are then frequently input into machine learning models with serious data quality issues that greatly impact prediction performance and generalizability 42 .

To address this lack of analysis tooling, we developed the EHR Analysis in Python framework, ehrapy, which enables exploratory analysis of diverse EHR datasets. The ehrapy package is purpose-built to organize, analyze, visualize and statistically compare complex EHR data. ehrapy can be applied to datasets of different data types, sizes, diseases and origins. To demonstrate this versatility, we applied ehrapy to datasets obtained from EHR and population-based studies. Using the Pediatric Intensive Care (PIC) EHR database 43 , we stratified patients diagnosed with ‘unspecified pneumonia’ into distinct clinically relevant groups, extracted clinical indicators of pneumonia through statistical analysis and quantified medication-class effects on length of stay (LOS) with causal inference. Using the UK Biobank 44 (UKB), a population-scale cohort comprising over 500,000 participants from the United Kingdom, we employed ehrapy to explore cardiovascular risk factors using clinical predictors, metabolomics, genomics and retinal imaging-derived features. Additionally, we performed image analysis to project disease progression through fate mapping in patients affected by coronavirus disease 2019 (COVID-19) using chest x-rays. Finally, we demonstrate how exploratory analysis with ehrapy unveils and mitigates biases in over 100,000 visits by patients with diabetes across 130 US hospitals. We provide online links to additional use cases that demonstrate ehrapy’s usage with further datasets, including MIMIC-II (ref. 45 ), and for various medical conditions, such as patients subject to indwelling arterial catheter usage. ehrapy is compatible with any EHR dataset that can be transformed into vectors and is accessible as a user-friendly open-source software package hosted at https://github.com/theislab/ehrapy and installable from PyPI. It comes with comprehensive documentation, tutorials and further examples, all available at https://ehrapy.readthedocs.io .

ehrapy: a framework for exploratory EHR data analysis

The foundation of ehrapy is a robust and scalable data storage backend that is combined with a series of pre-processing and analysis modules. In ehrapy, EHR data are organized as a data matrix where observations are individual patient visits (or patients, in the absence of follow-up visits), and variables represent all measured quantities ( Methods ). These data matrices are stored together with metadata of observations and variables. By leveraging the AnnData (annotated data) data structure that implements this design, ehrapy builds upon established standards and is compatible with analysis and visualization functions provided by the omics scverse 40 ecosystem. Readers are also available in R, Julia and JavaScript 46 . We additionally provide a dataset module with more than 20 public loadable EHR datasets in AnnData format to kickstart analysis and development with ehrapy.
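For illustration, a minimal sketch of this layout constructed with the AnnData library directly (all values invented):

    import anndata as ad
    import numpy as np
    import pandas as pd

    # Toy data matrix: three patient visits (rows) x two measured variables (columns)
    X = np.array([[72.0, 37.2],
                  [95.0, 39.1],
                  [60.0, 36.8]])

    adata = ad.AnnData(
        X=X,
        obs=pd.DataFrame({"age": [4, 7, 2], "sex": ["m", "f", "f"]},
                         index=["visit_1", "visit_2", "visit_3"]),
        var=pd.DataFrame(index=["heart_rate", "temperature"]),
    )
    print(adata)  # AnnData object with n_obs x n_vars = 3 x 2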

For standardized analysis of EHR data, it is crucial that these data are encoded and stored in consistent, reusable formats. Thus, ehrapy requires that input data are organized in structured vectors. Readers for common formats, such as CSV, OMOP 47 or SQL databases, are available in ehrapy. Data loaded into AnnData objects can be mapped against several hierarchical ontologies 48 , 49 , 50 , 51 ( Methods ). Clinical keywords of free text notes can be automatically extracted ( Methods ).
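A hedged sketch of the reading step (the reader name is assumed from ehrapy's input/output module, and the file path is hypothetical; exact signatures may differ by version):

    import ehrapy as ep

    # Read a structured table into an AnnData object; ehrapy also provides
    # readers for OMOP and SQL sources
    adata = ep.io.read_csv("patients.csv")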

Powered by scanpy, which scales to millions of observations 52 ( Methods and Supplementary Table 1 ) and the machine learning library scikit-learn 53 , ehrapy provides more than 100 composable analysis functions organized in modules from which custom analysis pipelines can be built. Each function directly interacts with the AnnData object and adds all intermediate results for simple access and reuse of information to it. To facilitate setting up these pipelines, ehrapy guides analysts through a general analysis pipeline (Fig. 1 ). At any step of an analysis pipeline, community software packages can be integrated without any vendor lock-in. Because ehrapy is built on open standards, it can be purposefully extended to solve new challenges, such as the development of foundational models ( Methods ).

Fig. 1:

a , Heterogeneous health data are first loaded into memory as an AnnData object with patient visits as observational rows and variables as columns. Next, the data can be mapped against ontologies, and key terms are extracted from free text notes. b , The EHR data are subject to quality control where low-quality or spurious measurements are removed or imputed. Subsequently, numerical data are normalized, and categorical data are encoded. Data from different sources with data distribution shifts are integrated, embedded, clustered and annotated in a patient landscape. c , Further downstream analyses depend on the question of interest and can include the inference of causal effects and trajectories, survival analysis or patient stratification.

In the ehrapy analysis pipeline, EHR data are initially inspected for quality issues by analyzing feature distributions that may skew results and by detecting visits and features with high missing rates that ehrapy can then impute ( Methods ). ehrapy records all filtering steps while keeping track of population dynamics to highlight potential selection and filtering biases ( Methods ). Subsequently, ehrapy’s normalization and encoding functions ( Methods ) are applied to achieve a uniform numerical representation that facilitates data integration and corrects for dataset shift effects ( Methods ). Calculated lower-dimensional representations can subsequently be visualized, clustered and annotated to obtain a patient landscape ( Methods ). Such annotated groups of patients can be used for statistical comparisons to find differences in features among them to ultimately learn markers of patient states.
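A compact sketch of such a pipeline; function and loader names are assumed from ehrapy's scanpy-style modules and may differ between versions:

    import ehrapy as ep

    adata = ep.dt.mimic_2(encoded=True)   # bundled demo dataset (loader name assumed)

    ep.pp.knn_impute(adata)               # impute missing values
    ep.pp.scale_norm(adata)               # zero mean, unit variance per feature

    ep.pp.neighbors(adata)                # k-nearest neighbor graph over visits
    ep.tl.umap(adata)                     # low-dimensional patient landscape
    ep.tl.leiden(adata, key_added="patient_group")
    ep.pl.umap(adata, color="patient_group")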

As analysis goals can differ between users and datasets, the ehrapy analysis pipeline is customizable during the final knowledge inference step. ehrapy provides statistical methods for group comparison and extensive support for survival analysis ( Methods ), enabling the discovery of biomarkers. Furthermore, ehrapy offers functions for causal inference to go from statistically determined associations to causal relations ( Methods ). Moreover, patient visits in aggregated EHR data can be regarded as snapshots where individual measurements taken at specific timepoints might not adequately reflect the underlying progression of disease and may result from unrelated variation due to, for example, day-to-day differences 54 , 55 , 56 . Therefore, disease progression models should rely on analysis of the underlying clinical data, as disease progression in an individual patient may not be monotonic in time. ehrapy allows for the use of advanced trajectory inference methods to overcome sparse measurements 57 , 58 , 59 . We show that this approach can order snapshots to calculate a pseudotime that can adequately reflect the progression of the underlying clinical process. Given a sufficient number of snapshots, ehrapy increases the potential to understand disease progression, which is likely not robustly captured within a single EHR but, rather, across several.

ehrapy enables patient stratification in pneumonia cases

To demonstrate ehrapy’s capability to analyze heterogeneous datasets from a broad patient set across multiple care units, we applied our exploratory strategy to the PIC 43 database. The PIC database is a single-center database hosting information on children admitted to critical care units at the Children’s Hospital of Zhejiang University School of Medicine in China. It contains 13,499 distinct hospital admissions of 12,881 individual pediatric patients admitted between 2010 and 2018 for whom demographics, diagnoses, doctors’ notes, vital signs, laboratory and microbiology tests, medications, fluid balances and more were collected (Extended Data Figs. 1 and 2a and Methods ). After missing data imputation and subsequent pre-processing (Extended Data Figs. 2b,c and 3 and Methods ), we generated a uniform manifold approximation and projection (UMAP) embedding to visualize variation across all patients using ehrapy (Fig. 2a ). This visualization of the low-dimensional patient manifold shows the heterogeneity of the collected data in the PIC database, with malformations, perinatal and respiratory being the most abundant International Classification of Diseases (ICD) chapters (Fig. 2b ). The most common respiratory disease categories (Fig. 2c ) were labeled pneumonia and influenza ( n  = 984). We focused on pneumonia to apply ehrapy to a challenging, broad-spectrum disease that affects all age groups. Pneumonia is a prevalent respiratory infection that poses a substantial burden on public health 60 and is characterized by inflammation of the alveoli and distal airways 60 . Individuals with pre-existing chronic conditions are particularly vulnerable, as are children under the age of 5 (ref. 61 ). Pneumonia can be caused by a range of microorganisms, encompassing bacteria, respiratory viruses and fungi.

Fig. 2:

a , UMAP of all patient visits in the ICU with primary discharge diagnosis grouped by ICD chapter. b , The prevalence of respiratory diseases prompted us to investigate them further. c , Respiratory categories show the abundance of influenza and pneumonia diagnoses that we investigated more closely. d , We observed the ‘unspecified pneumonia’ subgroup, which led us to investigate and annotate it in more detail. e , The previously ‘unspecified pneumonia’-labeled patients were annotated using several clinical features (Extended Data Fig. 5 ), of which the most important ones are shown in the heatmap ( f ). g , Example disease progression of an individual child with pneumonia illustrating pharmacotherapy over time until positive A. baumannii swab.

We selected the age group ‘youths’ (13 months to 18 years of age) for further analysis, addressing a total of 265 patients who dominated the pneumonia cases and were diagnosed with ‘unspecified pneumonia’ (Fig. 2d and Extended Data Fig. 4 ). Neonates (0–28 d old) and infants (29 d to 12 months old) were excluded from the analysis as the disease context is significantly different in these age groups due to distinct anatomical and physical conditions. Patients were 61% male, had a total of 277 admissions, had a mean age at admission of 54 months (median, 38 months) and had an average LOS of 15 d (median, 7 d). Of these, 152 patients were admitted to the pediatric intensive care unit (PICU), 118 to the general ICU (GICU), four to the surgical ICU (SICU) and three to the cardiac ICU (CICU). Laboratory measurements typically had 12–14% missing data, except for serum procalcitonin (PCT), a marker for bacterial infections, with 24.5% missing, and C-reactive protein (CRP), a marker of inflammation, with 16.8% missing. Measurements assigned as ‘vital signs’ contained between 44% and 54% missing values. Stratifying patients with unspecified pneumonia further enables a more nuanced understanding of the disease, potentially facilitating tailored approaches to treatment.

To deepen clinical phenotyping for the disease group ‘unspecified pneumonia’, we calculated a k -nearest neighbor graph to cluster patients into groups and visualize these in UMAP space ( Methods ). Leiden clustering 62 identified four patient groupings with distinct clinical features that we annotated (Fig. 2e ). To identify the laboratory values, medications and pathogens that were most characteristic for these four groups (Fig. 2f ), we applied t -tests for numerical data and g -tests for categorical data between the identified groups using ehrapy (Extended Data Fig. 5 and Methods ). Based on this analysis, we identified patient groups with ‘sepsis-like’, ‘severe pneumonia with co-infection’, ‘viral pneumonia’ and ‘mild pneumonia’ phenotypes. The ‘sepsis-like’ group of patients ( n  = 28) was characterized by rapid disease progression as exemplified by an increased number of deaths (adjusted P  ≤ 5.04 × 10 −3 , 43% ( n  = 28), 95% confidence interval (CI): 23%, 62%); indication of multiple organ failure, such as elevated creatinine (adjusted P  ≤ 0.01, 52.74 ± 23.71 μmol L −1 ) or reduced albumin levels (adjusted P  ≤ 2.89 × 10 −4 , 33.40 ± 6.78 g L −1 ); and increased expression levels and peaks of inflammation markers, including PCT (adjusted P  ≤ 3.01 × 10 −2 , 1.42 ± 2.03 ng ml −1 ), whole blood cell count, neutrophils, lymphocytes, monocytes and lower platelet counts (adjusted P  ≤ 6.3 × 10 −2 , 159.30 ± 142.00 × 10 9 per liter) and changes in electrolyte levels—that is, lower potassium levels (adjusted P  ≤ 0.09 × 10 −2 , 3.14 ± 0.54 mmol L −1 ). Patients whom we associated with the term ‘severe pneumonia with co-infection’ ( n  = 74) were characterized by prolonged ICU stays (adjusted P  ≤ 3.59 × 10 −4 , 15.01 ± 29.24 d); organ affection, such as higher levels of creatinine (adjusted P  ≤ 1.10 × 10 −4 , 52.74 ± 23.71 μmol L −1 ) and lower platelet count (adjusted P  ≤ 5.40 × 10 −23 , 159.30 ± 142.00 × 10 9 per liter); increased inflammation markers, such as peaks of PCT (adjusted P  ≤ 5.06 × 10 −5 , 1.42 ± 2.03 ng ml −1 ), CRP (adjusted P  ≤ 1.40 × 10 −6 , 50.60 ± 37.58 mg L −1 ) and neutrophils (adjusted P  ≤ 8.51 × 10 −6 , 13.01 ± 6.98 × 10 9 per liter); detection of bacteria in combination with additional fungal pathogens in sputum samples (adjusted P  ≤ 1.67 × 10 −2 , 26% ( n  = 74), 95% CI: 16%, 36%); and increased application of medication, including antifungals (adjusted P  ≤ 1.30 × 10 −4 , 15% ( n  = 74), 95% CI: 7%, 23%) and catecholamines (adjusted P  ≤ 2.0 × 10 −2 , 45% ( n  = 74), 95% CI: 33%, 56%). Patients in the ‘mild pneumonia’ group were characterized by positive sputum cultures in the presence of relatively lower inflammation markers, such as PCT (adjusted P  ≤ 1.63 × 10 −3 , 1.42 ± 2.03 ng ml −1 ) and CRP (adjusted P  ≤ 0.03 × 10 −1 , 50.60 ± 37.58 mg L −1 ), while receiving antibiotics more frequently (adjusted P  ≤ 1.00 × 10 −5 , 80% ( n  = 78), 95% CI: 70%, 89%) and additional medications (electrolytes, blood thinners and circulation-supporting medications) (adjusted P  ≤ 1.00 × 10 −5 , 82% ( n  = 78), 95% CI: 73%, 91%).
Finally, patients in the ‘viral pneumonia’ group were characterized by shorter LOSs (adjusted P  ≤ 8.00 × 10 −6 , 15.01 ± 29.24 d), a lack of non-viral pathogen detection in combination with higher lymphocyte counts (adjusted P  ≤ 0.01, 4.11 ± 2.49 × 10 9 per liter), lower levels of PCT (adjusted P  ≤ 0.03 × 10 −2 , 1.42 ± 2.03 ng ml −1 ) and reduced application of catecholamines (adjusted P  ≤ 5.96 × 10 −7 , 15% (n = 97), 95% CI: 8%, 23%), antibiotics (adjusted P  ≤ 8.53 × 10 −6 , 41% ( n  = 97), 95% CI: 31%, 51%) and antifungals (adjusted P  ≤ 5.96 × 10 −7 , 0% ( n  = 97), 95% CI: 0%, 0%).

To demonstrate the ability of ehrapy to examine EHR data from different levels of resolution, we additionally reconstructed a case from the ‘severe pneumonia with co-infection’ group (Fig. 2g ). In this case, the analysis revealed that CRP levels remained elevated despite broad-spectrum antibiotic treatment until a positive Acinetobacter baumannii result led to a change in medication and a subsequent decrease in CRP and monocyte levels.

ehrapy facilitates extraction of pneumonia indicators

ehrapy’s survival analysis module allowed us to identify clinical indicators of disease stages that could be used as biomarkers through Kaplan–Meier analysis. We found strong variance in overall aspartate aminotransferase (AST), alanine aminotransferase (ALT), gamma-glutamyl transferase (GGT) and bilirubin levels (Fig. 3a ), including changes over time (Extended Data Fig. 6a,b ), in all four ‘unspecified pneumonia’ groups. These enzymes are routinely used to assess liver function, and studies provide evidence that AST, ALT and GGT levels are elevated during respiratory infections 63 , including severe pneumonia 64 , and can guide diagnosis and management of pneumonia in children 63 . We confirmed reduced survival in more severely affected children (‘sepsis-like pneumonia’ and ‘severe pneumonia with co-infection’) using Kaplan–Meier curves and a multivariate log-rank test (Fig. 3b ; P  ≤ 1.09 × 10 −18 ) through ehrapy. To verify the association of this trajectory with altered AST, ALT and GGT expression levels, we further grouped all patients based on liver enzyme reference ranges ( Methods and Supplementary Table 2 ). By Kaplan–Meier survival analysis, cases with peaks of GGT ( P  ≤ 1.4 × 10 −2 , 58.01 ± 2.03 U L −1 ), ALT ( P  ≤ 2.9 × 10 −2 , 43.59 ± 38.02 U L −1 ) and AST ( P  ≤ 4.8 × 10 −4 , 78.69 ± 60.03 U L −1 ) in the ‘outside the norm’ group were found to correlate with lower survival in all groups (Fig. 3c and Extended Data Fig. 6 ), in line with previous studies 63 , 65 . Bilirubin was not found to significantly affect survival ( P  ≤ 2.1 × 10 −1 , 12.57 ± 21.22 mg dl −1 ).

Fig. 3:

a , Line plots of major hepatic system laboratory measurements per group show variance in the measurements per pneumonia group. b , Kaplan–Meier survival curves demonstrate lower survival for ‘sepsis-like’ and ‘severe pneumonia with co-infection’ groups. c , Kaplan–Meier survival curves for children with GGT measurements outside the norm range display lower survival.
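Because ehrapy's survival module builds on the lifelines package (see Methods), an equivalent analysis can be sketched with lifelines directly; all follow-up times and group labels below are invented for illustration:

    import pandas as pd
    from lifelines import KaplanMeierFitter
    from lifelines.statistics import logrank_test

    # Invented follow-up times (days) and death indicators for two groups
    df = pd.DataFrame({
        "time":  [5, 12, 30, 45, 8, 60, 22, 90],
        "dead":  [1, 1, 0, 0, 1, 0, 1, 0],
        "group": ["severe"] * 4 + ["mild"] * 4,
    })

    kmf = KaplanMeierFitter()
    for name, sub in df.groupby("group"):
        kmf.fit(sub["time"], event_observed=sub["dead"], label=name)
        kmf.plot_survival_function()        # one survival curve per group

    severe, mild = df[df.group == "severe"], df[df.group == "mild"]
    result = logrank_test(severe["time"], mild["time"],
                          severe["dead"], mild["dead"])
    print(result.p_value)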

ehrapy quantifies medication class effect on LOS

Pneumonia requires case-specific medications due to its diverse causes. To demonstrate the potential of ehrapy’s causal inference module, we quantified the effect of medication on ICU LOS to evaluate case-specific administration of medication. In contrast to causal discovery, which attempts to find a causal graph reflecting the causal relationships, causal inference is a statistical process used to investigate possible effects of altering a provided system, as represented by a causal graph and observational data (Fig. 4a ) 66 . This approach makes it possible to identify and quantify the impact of specific interventions or treatments on outcome measures, thereby providing insight for evidence-based decision-making in healthcare. Causal inference relies on datasets incorporating interventions to accurately quantify effects.

Fig. 4:

a , ehrapy’s causal module is based on the strategy of the tool ‘dowhy’. Here, EHR data containing treatment, outcome and measurements and a causal graph serve as input for causal effect quantification. The process includes the identification of the target estimand based on the causal graph, the estimation of causal effects using various models and, finally, refutation where sensitivity analyses and refutation tests are performed to assess the robustness of the results and assumptions. b , Curated causal graph using age, liver damage and inflammation markers as disease progression proxies together with medications as interventions to assess the causal effect on length of ICU stay. c , Determined causal effect strength on LOS in days of administered medication categories.

We manually constructed a minimal causal graph with ehrapy (Fig. 4b ) on records of treatment with corticosteroids, carbapenems, penicillins, cephalosporins and antifungal and antiviral medications as interventions (Extended Data Fig. 7 and Methods ). We assumed that the medications affect disease progression proxies, such as inflammation markers and markers of organ function. The selection of ‘interventions’ is consistent with current treatment standards for bacterial pneumonia and respiratory distress 67 , 68 . Based on the approach of the tool ‘dowhy’ 69 (Fig. 4a ), ehrapy’s causal module identified the application of corticosteroids, antivirals and carbapenems to be associated with shorter LOSs, in line with current evidence 61 , 70 , 71 , 72 . In contrast, penicillins and cephalosporins were associated with longer LOSs, whereas antifungal medication did not strongly influence LOS (Fig. 4c ).
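The four dowhy steps can be sketched on synthetic data (all numbers are invented; the graph format accepted may vary by dowhy version):

    import numpy as np
    import pandas as pd
    from dowhy import CausalModel

    # Synthetic example: medication (treatment) vs. LOS (outcome),
    # confounded by disease severity
    rng = np.random.default_rng(0)
    severity = rng.normal(size=500)
    treated = (severity + rng.normal(size=500) > 0).astype(int)
    los = 10 + 2 * severity - 1.5 * treated + rng.normal(size=500)
    df = pd.DataFrame({"treated": treated, "los": los, "severity": severity})

    model = CausalModel(
        data=df, treatment="treated", outcome="los",
        graph="digraph { severity -> treated; severity -> los; treated -> los; }",
    )
    estimand = model.identify_effect()
    estimate = model.estimate_effect(estimand,
                                     method_name="backdoor.linear_regression")
    # Refutation: a placebo treatment should yield an effect near zero
    refutation = model.refute_estimate(estimand, estimate,
                                       method_name="placebo_treatment_refuter")
    print(estimate.value, refutation)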

ehrapy enables deriving population-scale risk factors

To illustrate the advantages of using a unified data management and quality control framework, such as ehrapy, we modeled myocardial infarction risk using Cox proportional hazards models on UKB 44 data. Large population cohort studies, such as the UKB, enable the investigation of common diseases across a wide range of modalities, including genomics, metabolomics, proteomics, imaging data and common clinical variables (Fig. 5a,b ). From these, we used a publicly available polygenic risk score for coronary heart disease 73 comprising 6.6 million variants, 80 nuclear magnetic resonance (NMR) spectroscopy-based metabolomics 74 features, 81 features derived from retinal optical coherence tomography 75 , 76 and the Framingham Risk Score 77 feature set, which includes known clinical predictors, such as age, sex, body mass index, blood pressure, smoking behavior and cholesterol levels. We excluded features with more than 10% missingness and imputed the remaining missing values ( Methods ). Furthermore, individuals with events up to 1 year after the sampling time were excluded from the analyses, ultimately selecting 29,216 individuals for whom all mentioned data types were available (Extended Data Figs. 8 and 9 and Methods ). Myocardial infarction, as defined by our mapping to the phecode nomenclature 51 , served as the endpoint (Fig. 5c ). We modeled the risk for myocardial infarction 1 year after either the metabolomic sample was obtained or imaging was performed.

Fig. 5:

a , The UKB includes 502,359 participants from 22 assessment centers. Most participants have genetic data (97%) and physical measurement data (93%), but fewer have data for complex measures, such as metabolomics, retinal imaging or proteomics. b , We found a distinct cluster of individuals (bottom right) from the Birmingham assessment center in the retinal imaging data, which is an artifact of the image acquisition process and was, thus, excluded. c , Myocardial infarctions are recorded for 15% of the male and 7% of the female study population. Kaplan–Meier estimators with 95% CIs are shown. d , For every modality combination, a linear Cox proportional hazards model was fit to determine the prognostic potential of these for myocardial infarction. Cardiovascular risk factors show expected positive log hazard ratios (log (HRs)) for increased blood pressure or total cholesterol and negative ones for sampling age and systolic blood pressure (BP). log (HRs) with 95% CIs are shown. e , Combining all features yields a C-index of 0.81. c – e , Error bars indicate 95% CIs ( n  = 29,216).

Predictive performance for each modality was assessed by fitting Cox proportional hazards (Fig. 5c ) models on each of the feature sets using ehrapy (Fig. 5d ). The age of the first occurrence served as the time to event; alternatively, date of death or date of the last record in the EHR served as censoring times. Models were evaluated using the concordance index (C-index) ( Methods ). The combination of multiple modalities successfully improved the predictive performance for coronary heart disease by increasing the C-index from 0.63 (genetics) to 0.76 (genetics, age and sex), 0.77 (clinical predictors) and 0.81 (imaging and clinical predictors) (Fig. 5e ). Our finding is in line with previous observations of complementary effects between different modalities, where a broader ‘major adverse cardiac event’ phenotype was modeled in the UKB achieving a C-index of 0.72 (ref. 78 ). Adding genetic data improves predictive potential, as it is independent of sampling age and has limited prediction of other modalities 79 . The addition of metabolomic data did not improve predictive power (Fig. 5e ).
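Since ehrapy's Cox model wraps lifelines (see Methods), the modeling step can be sketched with lifelines directly; the feature values below are toy numbers, not UKB data:

    import pandas as pd
    from lifelines import CoxPHFitter

    # Invented predictors of time (years) to myocardial infarction
    df = pd.DataFrame({
        "time":        [2.1, 5.4, 7.8, 3.3, 9.9, 6.2, 4.4, 8.1],
        "event":       [1, 0, 1, 1, 0, 0, 1, 0],
        "age":         [63, 55, 71, 68, 49, 58, 74, 52],
        "systolic_bp": [145, 130, 160, 150, 120, 135, 155, 125],
        "prs":         [0.8, -0.2, 1.1, 0.5, -0.9, 0.1, 1.4, -0.5],
    })

    cph = CoxPHFitter()
    cph.fit(df, duration_col="time", event_col="event")
    cph.print_summary()            # log hazard ratios with 95% CIs
    print(cph.concordance_index_)  # C-index on the training data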

Imaging-based disease severity projection via fate mapping

To demonstrate ehrapy’s ability to handle diverse image data and recover disease stages, we embedded pulmonary imaging data obtained from patients with COVID-19 into a lower-dimensional space and computationally inferred disease progression trajectories using pseudotemporal ordering. This describes a continuous trajectory or ordering of individual points based on feature similarity 80 . Continuous trajectories enable mapping the fate of new patients onto precise states to potentially predict their future condition.

In COVID-19, a highly contagious respiratory illness caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the clinical presentation ranges from mild flu-like symptoms to severe respiratory distress. Chest x-rays typically show opacities (bilateral patchy, ground glass) associated with disease severity 81 .

We used COVID-19 chest x-ray images from the BrixIA 82 dataset consisting of 192 images (Fig. 6a ) with expert annotations of disease severity. We used the BrixIA database scores, which are based on six regions annotated by radiologists, to classify disease severity ( Methods ). We embedded raw image features using a pre-trained DenseNet model ( Methods ) and further processed this embedding into a nearest-neighbors-based UMAP space using ehrapy (Fig. 6b and Methods ). Fate mapping based on imaging information ( Methods ) determined a severity ordering from mild to critical cases (Fig. 6b–d ). Images labeled as ‘normal’ are projected to stay within the healthy group, illustrating the robustness of our approach. Images of diseased patients were ordered by disease severity, highlighting clear trajectories from ‘normal’ to ‘critical’ states despite the heterogeneity of the x-ray images stemming from, for example, different zoom levels (Fig. 6a ).

Fig. 6:

a , Randomly selected chest x-ray images from the BrixIA dataset demonstrate its variance. b , UMAP visualization of the BrixIA dataset embedding shows a separation of disease severity classes. c , Calculated pseudotime for all images increases with distance to the ‘normal’ images. d , Stream projection of fate mapping in UMAP space showcases disease severity trajectory of the COVID-19 chest x-ray images.
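A sketch of this approach using scanpy's diffusion pseudotime, which ehrapy exposes for trajectory inference (see Methods); random numbers stand in for the DenseNet embedding, and the severity labels are invented:

    import numpy as np
    import anndata as ad
    import scanpy as sc

    # Stand-in for DenseNet image features: 100 x-rays x 64 features
    rng = np.random.default_rng(0)
    adata = ad.AnnData(rng.normal(size=(100, 64)).astype(np.float32))
    adata.obs["severity"] = (["normal"] * 25 + ["mild"] * 25 +
                             ["severe"] * 25 + ["critical"] * 25)

    sc.pp.neighbors(adata, use_rep="X")
    sc.tl.diffmap(adata)
    adata.uns["iroot"] = 0          # root the trajectory at a 'normal' image
    sc.tl.dpt(adata)                # writes adata.obs['dpt_pseudotime']
    sc.tl.umap(adata)
    sc.pl.umap(adata, color=["severity", "dpt_pseudotime"])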

Detecting and mitigating biases in EHR data with ehrapy

To showcase how exploratory analysis using ehrapy can reveal and mitigate biases, we analyzed the Fairlearn 83 version of the Diabetes 130-US Hospitals 84 dataset. The dataset covers 10 years (1999–2008) of clinical records from 130 US hospitals, detailing 47 features covering diabetes diagnoses, laboratory tests, medications and additional data from up to 14 d of inpatient care across 101,766 patient visits ( Methods ). It was originally collected to explore the link between the measurement of hemoglobin A1c (HbA1c) and early readmission.

The cohort primarily consists of White and African American individuals, with only a minority of cases from Asian or Hispanic backgrounds (Extended Data Fig. 10a ). ehrapy’s cohort tracker unveiled selection and surveillance biases when filtering for Medicare recipients for further analysis, resulting in a shift of age distribution toward an age of over 60 years in addition to an increasing ratio of White participants. Using ehrapy’s visualization modules, our analysis showed that HbA1c was measured in only 18.4% of inpatients, with a higher frequency in emergency admissions compared to referral cases (Extended Data Fig. 10b ). Normalization biases can skew data relationships when standardization techniques ignore subgroup variability or assume incorrect distributions. The choice of normalization strategy must be carefully considered to avoid obscuring important factors. When normalizing the number of applied medications individually, differences in distributions between age groups remained. However, when normalizing both distributions jointly with age group as an additional group variable, differences between age groups were masked (Extended Data Fig. 10c ). To investigate missing data and imputation biases, we introduced missingness for the number of applied medications according to an MCAR mechanism, which we verified using ehrapy’s Little’s test ( P  ≤ 0.01 × 10 −2 ), and an MAR mechanism ( Methods ). Whereas imputing the mean in the MCAR case did not affect the overall location of the distribution, it led to an underestimation of the variance, with the standard deviation dropping from 8.1 in the original data to 6.8 in the imputed data (Extended Data Fig. 10d ). Mean imputation in the MAR case skewed both location and variance, shifting the mean from 16.02 to 14.66 with a standard deviation of only 5.72 (Extended Data Fig. 10d ). Using ehrapy’s MissForest-based multiple imputation 85 on the MAR data resulted in a mean of 16.04 and a standard deviation of 6.45. To predict patient readmission in fewer than 30 d, we merged the three smallest race groups, ‘Asian’, ‘Hispanic’ and ‘Other’. Furthermore, we dropped the gender group ‘Unknown/Invalid’ owing to the small sample size making meaningful assessment impossible, and we performed balanced random undersampling, resulting in 5,677 cases from each condition. We observed an overall balanced accuracy of 0.59 using a logistic regression model. However, the false-negative rate was highest for the races ‘Other’ and ‘Unknown’, whereas their selection rate was lowest, and this model was, therefore, biased (Extended Data Fig. 10e ). Using ehrapy’s compatibility with existing machine learning packages, we used Fairlearn’s ThresholdOptimizer ( Methods ), which improved the selection rates for ‘Other’ from 0.32 to 0.38 and for ‘Unknown’ from 0.23 to 0.42 and the false-negative rates for ‘Other’ from 0.48 to 0.42 and for ‘Unknown’ from 0.61 to 0.45 (Extended Data Fig. 10e ).
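The post-processing step can be sketched with Fairlearn directly; the data below are invented stand-ins for the diabetes cohort, and the constraint shown (false-negative rate parity) is one plausible choice, not necessarily the exact configuration used above:

    import numpy as np
    import pandas as pd
    from fairlearn.postprocessing import ThresholdOptimizer
    from sklearn.linear_model import LogisticRegression

    # Invented toy cohort
    rng = np.random.default_rng(0)
    X = pd.DataFrame({"n_medications": rng.poisson(16, 1000),
                      "n_lab_tests": rng.poisson(40, 1000)})
    y = (rng.random(1000) < 0.3).astype(int)           # readmitted < 30 d
    race = rng.choice(["White", "Other"], size=1000)   # sensitive feature

    base = LogisticRegression(max_iter=1000).fit(X, y)
    postprocessed = ThresholdOptimizer(
        estimator=base,
        constraints="false_negative_rate_parity",  # equalize FNRs across groups
        objective="balanced_accuracy_score",
        prefit=True,
    )
    postprocessed.fit(X, y, sensitive_features=race)
    y_fair = postprocessed.predict(X, sensitive_features=race)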

Discussion

Clustering offers a hypothesis-free alternative to supervised classification when clear hypotheses or labels are missing. It has enabled the identification of heart failure subtypes 86 and progression pathways 87 and COVID-19 severity states 88 . This concept, which is central to ehrapy, further allowed us to identify fine-grained groups of ‘unspecified pneumonia’ cases in the PIC dataset while discovering biomarkers and quantifying effects of medications on LOS. Such retroactive characterization showcases ehrapy’s ability to put complex evidence into context. This approach supports feedback loops to improve diagnostic and therapeutic strategies, leading to more efficiently allocated resources in healthcare.

ehrapy’s flexible data structures enabled us to integrate the heterogeneous UKB data and assess predictive performance for myocardial infarction. The different data types and distributions posed a challenge for predictive models that was overcome with ehrapy’s pre-processing modules. Our analysis underscores the potential of combining phenotypic and health data at population scale through ehrapy to enhance risk prediction.

By adapting pseudotime approaches that are commonly used in other omics domains, we successfully recovered disease trajectories from raw imaging data with ehrapy. The determined pseudotime, however, only orders data but does not necessarily provide a future projection per patient. Understanding the driver features for fate mapping in image-based datasets is challenging. The incorporation of image segmentation approaches could mitigate this issue and provide a deeper insight into the spatial and temporal dynamics of disease-related processes.

Limitations of our analyses include the lack of control for informative missingness where the absence of information represents information in itself 89 . Translation from Chinese to English in the PIC database can cause information loss and inaccuracies because the Chinese ICD-10 codes are seven characters long compared to the five-character English codes. Incompleteness of databases, such as the lack of radiology images in the PIC database, low sample sizes, underrepresentation of non-White ancestries and participant self-selection, cannot be accounted for and limit generalizability. This restricts deeper phenotyping of, for example, all ‘unspecified pneumonia’ cases with respect to their survival, which could be overcome by the use of multiple databases. Our causal inference use case is limited by unrecorded variables, such as Sequential Organ Failure Assessment (SOFA) scores, and pneumonia-related pathogens that are missing in the causal graph due to dataset constraints, such as high sparsity and substantial missing data, which risk overfitting and can lead to overinterpretation. We counterbalanced this by employing several refutation methods that statistically reject the causal hypothesis, such as a placebo treatment, a random common cause or an unobserved common cause. The longer hospital stays associated with penicillins and cephalosporins may be dataset specific and stem from higher antibiotic resistance, their use as first-line treatments, more severe initial cases, comorbidities and hospital-specific protocols.

Most analysis steps can introduce algorithmic biases where results are misleading or unfavorably affect specific groups. This is particularly relevant in the context of missing data 22 where determining the type of missing data is necessary to handle it correctly. ehrapy includes an implementation of Little’s test 90 , which tests whether data are distributed MCAR to discern missing data types. For MCAR data, single-imputation approaches, such as mean, median or mode imputation, can suffice, but these methods are known to reduce variability 91 , 92 . Multiple imputation strategies, such as Multiple Imputation by Chained Equations (MICE) 93 and MissForest 85 , as implemented in ehrapy, are effective for both MCAR and MAR data 22 , 94 , 95 . MNAR data require pattern-mixture or shared-parameter models that explicitly incorporate the mechanism by which data are missing 96 . Because MNAR involves unobserved data, the assumptions about the missingness mechanism cannot be directly verified, making sensitivity analysis crucial 21 . ehrapy’s wide range of normalization functions and grouping functionality makes it possible to account for intrinsic variability within subgroups, and its compatibility with Fairlearn 83 can potentially mitigate predictor biases. Generally, we recommend assessing all pre-processing steps in an iterative manner with respect to downstream applications, such as patient stratification. Moreover, sensitivity analysis can help verify the robustness of all inferred knowledge 97 .

These diverse use cases illustrate ehrapy’s potential to sufficiently address the need for a computationally efficient, extendable, reproducible and easy-to-use framework. ehrapy is compatible with major standards, such as the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) 47 , HL7, FHIR or openEHR, with flexible support for common tabular data formats. Once loaded into an AnnData object, subsequent sharing of analysis results is made easy because AnnData objects can be stored and read platform independently. ehrapy’s rich documentation of the application programming interface (API) and extensive hands-on tutorials make EHR analysis accessible to both novices and experienced analysts.

As ehrapy remains under active development, users can expect ehrapy to continuously evolve. We are improving support for the joint analysis of EHR, genetics and molecular data where ehrapy serves as a bridge between the EHR and the omics communities. We further anticipate the generation of EHR-specific reference datasets, so-called atlases 98 , to enable query-to-reference mapping where new datasets get contextualized by transferring annotations from the reference to the new dataset. To promote the sharing and collective analysis of EHR data, we envision adapted versions of interactive single-cell data explorers, such as CELLxGENE 99 or the UCSC Cell Browser 100 , for EHR data. Such web interfaces would also include disparity dashboards 20 to unveil trends of preferential outcomes for distinct patient groups. Additional modules specifically for high-frequency time-series data, natural language processing and other data types are currently under development. With the widespread availability of code-generating large language models, frameworks such as ehrapy are becoming accessible to medical professionals without coding expertise who can leverage its analytical power directly. Therefore, ehrapy, together with a lively ecosystem of packages, has the potential to enhance the scientific discovery pipeline to shape the era of EHR analysis.

All datasets that were used during the development of ehrapy and the use cases were used according to their terms of use as indicated by each provider.

Methods

Design and implementation of ehrapy

A unified pipeline as provided by our ehrapy framework streamlines the analysis of EHR data by providing an efficient, standardized approach, which reduces the complexity and variability in data pre-processing and analysis. This consistency ensures reproducibility of results and facilitates collaboration and sharing within the research community. Additionally, the modular structure allows for easy extension and customization, enabling researchers to adapt the pipeline to their specific needs while building on a solid foundational framework.

ehrapy was designed from the ground up as an open-source effort with community support. The package, as well as all associated tutorials and dataset preparation scripts, are open source. Development takes place publicly on GitHub where the developers discuss feature requests and issues directly with users. This tight interaction between both groups ensures that we implement the most pressing needs to cater to the most important use cases and can guide users when difficulties arise. The open-source nature, extensive documentation and modular structure of ehrapy are designed for other developers to build upon and extend ehrapy’s functionality where necessary. This allows us to focus ehrapy on the most important features to keep the number of dependencies to a minimum.

ehrapy was implemented in the Python programming language and builds upon numerous existing numerical and scientific open-source libraries, specifically matplotlib 101 , seaborn 102 , NumPy 103 , numba 104 , Scipy 105 , scikit-learn 53 and Pandas 106 . Although taking considerable advantage of all packages implemented, ehrapy also shares the limitations of these libraries, such as a lack of GPU support or small performance losses due to the translation layer cost for operations between the Python interpreter and the lower-level C language for matrix operations. However, by building on very widely used open-source software, we ensure seamless integration and compatibility with a broad range of tools and platforms to promote community contributions. Additionally, by doing so, we enhance security by allowing a larger pool of developers to identify and address vulnerabilities 107 . All functions are grouped into task-specific modules whose implementation is complemented with additional dependencies.

Data preparation

Dataloaders.

ehrapy is compatible with any type of vectorized data, where vectorized refers to the data being stored in structured tables in either on-disk or database form. The input and output module of ehrapy provides readers for common formats, such as OMOP, CSV tables or SQL databases through Pandas. When reading in such datasets, the data are stored in the appropriate slots in a new AnnData 46 object. ehrapy’s data module provides access to more than 20 public EHR datasets that feature diseases, including, but not limited to, Parkinson’s disease, breast cancer, chronic kidney disease and more. All dataloaders return AnnData objects to allow for immediate analysis.

AnnData for EHR data

Our framework required a versatile data structure capable of handling various matrix formats, including Numpy 103 for general use cases and interoperability, Scipy 105 sparse matrices for efficient storage, Dask 108 matrices for larger-than-memory analysis and Awkward array 109 for irregular time-series data. We needed a single data structure that not only stores data but also includes comprehensive annotations for thorough contextual analysis. It was essential for this structure to be widely used and supported, which ensures robustness and continual updates. Interoperability with other analytical packages was a key criterion to facilitate seamless integration within existing tools and workflows. Finally, the data structure had to support both in-memory operations and on-disk storage using formats such as HDF5 (ref. 110 ) and Zarr 111 , ensuring efficient handling and accessibility of large datasets and the ability to easily share them with collaborators.

All of these requirements are fulfilled by the AnnData format, which is a popular data structure in single-cell genomics. At its core, an AnnData object encapsulates diverse components, providing a holistic representation of data and metadata that are always aligned in dimensions and easily accessible. A data matrix (commonly referred to as ‘ X ’) stands as the foundational element, embodying the measured data. This matrix can be dense (as Numpy array), sparse (as Scipy sparse matrix) or ragged (as Awkward array) where dimensions do not align within the data matrix. The AnnData object can feature several such data matrices stored in ‘layers’. Examples of such layers can be unnormalized or unencoded data. These data matrices are complemented by an observations (commonly referred to as ‘obs’) segment where annotations on the level of patients or visits are stored. Patients’ age or sex, for instance, are often used as such annotations. The variables (commonly referred to as ‘var’) section complements the observations, offering supplementary details about the features in the dataset, such as missing data rates. The observation-specific matrices (commonly referred to as ‘obsm’) section extends the capabilities of the AnnData structure by allowing the incorporation of observation-specific matrices. These matrices can represent various types of information at the level of individual observations, such as principal component analysis (PCA) results, t-distributed stochastic neighbor embedding (t-SNE) coordinates or other dimensionality reduction outputs. Analogously, AnnData features a variable-specific matrices (commonly referred to as ‘varm’) component. The observation-specific pairwise relationships (commonly referred to as ‘obsp’) segment complements the ‘obsm’ section by accommodating observation-specific pairwise relationships. This can include connectivity matrices, indicating relationships between patients. The inclusion of an unstructured annotations (commonly referred to as ‘uns’) component further enhances flexibility. This segment accommodates unstructured annotations or arbitrary data that might not conform to the structured observations or variables categories. Any AnnData object can be stored on disk in h5ad or Zarr format to facilitate data exchange.
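A brief sketch of these components and of on-disk exchange, with toy values:

    import numpy as np
    import pandas as pd
    import anndata as ad

    adata = ad.AnnData(
        X=np.random.rand(3, 2),
        obs=pd.DataFrame(index=["visit_1", "visit_2", "visit_3"]),
        var=pd.DataFrame(index=["heart_rate", "temperature"]),
    )
    adata.layers["raw"] = adata.X.copy()               # unnormalized copy
    adata.obs["icu_type"] = ["PICU", "GICU", "PICU"]   # per-visit annotation
    adata.var["missing_rate"] = [0.12, 0.44]           # per-feature annotation
    adata.obsm["X_pca"] = np.random.rand(3, 2)         # per-visit embedding
    adata.uns["source"] = "toy example"                # unstructured metadata

    adata.write_h5ad("cohort.h5ad")                    # store on disk
    adata_again = ad.read_h5ad("cohort.h5ad")          # platform-independent read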

ehrapy natively interfaces with the scientific Python ecosystem via Pandas 112 and Numpy 103 . The development of deep learning models for EHR data 113 is further accelerated through compatibility with pathml 114 , a unified framework for whole-slide image analysis in pathology, and scvi-tools 115 , which provides data loaders for loading tensors from AnnData objects into PyTorch 116 or Jax arrays 117 to facilitate the development of generalizing foundational models for medical artificial intelligence 118 .

Feature annotation

After AnnData creation, any metadata can be mapped against ontologies using Bionty ( https://github.com/laminlabs/bionty-base ). Bionty provides access to the Human Phenotype, Phecodes, Phenotype and Trait, Drug, Mondo and Human Disease ontologies.

Key medical terms stored in an AnnData object in free text can be extracted using the Medical Concept Annotation Toolkit (MedCAT) 119 .

Data processing

Cohort tracking.

ehrapy provides a CohortTracker tool that traces all filtering steps applied to an associated AnnData object. To calculate cohort summary statistics, the implementation makes use of tableone 120 and can subsequently be plotted as bar charts together with flow diagrams 121 that visualize the order and reasoning of filtering operations.
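A hedged usage sketch; the class, method and loader names are assumed from ehrapy's documentation, and the filtering column is taken from the diabetes dataset used above:

    import ehrapy as ep

    adata = ep.dt.diabetes_130_fairlearn()    # dataset loader (name assumed)
    ct = ep.tl.CohortTracker(adata)
    ct(adata, label="Initial cohort")

    # Example filtering step: restrict to Medicare recipients
    adata = adata[adata.obs["payer_code"] == "MC"].copy()
    ct(adata, label="Medicare recipients")

    ct.plot_cohort_barplot()                  # summary statistics per step
    ct.plot_flowchart()                       # flow diagram of filtering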

Basic pre-processing and quality control

ehrapy encompasses a suite of functionalities for fundamental data processing that are adopted from scanpy 52 but adapted to EHR data:

Regress out: To address unwanted sources of variation, a regression procedure is integrated, enhancing the dataset’s robustness.

Subsample: Selects a specified fraction of observations.

Balanced sample: Balances groups in the dataset by random oversampling or undersampling.

Highly variable features: The identification and annotation of highly variable features following the ‘highly variable genes’ function of scanpy is seamlessly incorporated, providing users with insights into pivotal elements influencing the dataset.

To identify and minimize quality issues, ehrapy provides several quality control functions:

Basic quality control: Determines the relative and absolute number of missing values per feature and per patient.

Winsorization: For data refinement, ehrapy implements a winsorization process, creating a version of the input array less susceptible to extreme values.

Feature clipping: Imposes limits on features to enhance dataset reliability.

Detect biases: Computes pairwise correlations between features, standardized mean differences for numeric features between groups of sensitive features, categorical feature value count differences between groups of sensitive features and feature importances when predicting a target variable.

Little’s MCAR test: Applies Little’s MCAR test whose null hypothesis is that data are MCAR. Rejecting the null hypothesis may not always mean that data are not MCAR, nor is accepting the null hypothesis a guarantee that data are MCAR. For more details, see Schouten et al. 122 .

Summarize features: Calculates statistical indicators per feature, including minimum, maximum and average values. This can be especially useful to reduce complex data with multiple measurements per feature per patient into sets of columns with single values.

Imputation is crucial in data analysis to address missing values, ensuring the completeness of datasets that can be required for specific algorithms. The ‘ehrapy’ pre-processing module offers a range of imputation techniques, illustrated in the sketch after this list:

Explicit Impute: Replaces missing values, in either all columns or a user-specified subset, with a designated replacement value.

Simple Impute: Imputes missing values in numerical data using mean, median or the most frequent value, contributing to a more complete dataset.

KNN Impute: Uses k -nearest neighbor imputation to fill in missing values in the input AnnData object, preserving local data patterns.

MissForest Impute: Implements the MissForest strategy for imputing missing data, providing a robust approach for handling complex datasets.

MICE Impute: Applies the MICE algorithm for imputing data. This implementation is based on the miceforest ( https://github.com/AnotherSamWilson/miceforest ) package.
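As referenced above, a short sketch of the imputation step (loader and function names assumed from ehrapy's documentation; in practice, one strategy is chosen based on the missingness mechanism):

    import ehrapy as ep

    adata = ep.dt.mimic_2()                        # demo dataset (name assumed)

    ep.pp.simple_impute(adata, strategy="median")  # quick baseline for MCAR data
    # Alternatives, depending on the data:
    # ep.pp.knn_impute(adata)                      # preserves local patterns
    # ep.pp.miss_forest_impute(adata)              # robust for MCAR and MAR data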

Data encoding is required if categorical values are part of the dataset because most algorithms in ehrapy are compatible only with numerical values. ehrapy offers two encoding algorithms based on scikit-learn 53 , illustrated in the sketch after this list:

One-Hot Encoding: Transforms categorical variables into binary vectors, creating a binary feature for each category and capturing the presence or absence of each category in a concise representation.

Label Encoding: Assigns a unique numerical label to each category, facilitating the representation of categorical data as ordinal values and supporting algorithms that require numerical input.
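A hedged sketch of the encoding step; the 'encode' signature is an assumption, and with autodetection ehrapy encodes the detected categorical columns:

    import ehrapy as ep

    adata = ep.dt.mimic_2()                       # unencoded demo dataset (name assumed)
    adata = ep.pp.encode(adata, autodetect=True)  # encode detected categoricals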

To ensure that the distributions of the heterogeneous data are aligned, ehrapy offers several normalization procedures, illustrated in the sketch after this list:

Log Normalization: Applies the natural logarithm function to the data, useful for handling skewed distributions and reducing the impact of outliers.

Max-Abs Normalization: Scales each feature by its maximum absolute value, ensuring that the maximum absolute value for each feature is 1.

Min-Max Normalization: Transforms the data to a specific range (commonly (0, 1)) by scaling each feature based on its minimum and maximum values.

Power Transformation Normalization: Applies a power transformation to make the data more Gaussian like, often useful for stabilizing variance and improving the performance of models sensitive to distributional assumptions.

Quantile Normalization: Aligns the distributions of multiple variables, ensuring that their quantiles match, which can be beneficial for comparing datasets or removing batch effects.

Robust Scaling Normalization: Scales data using the interquartile range, making it robust to outliers and suitable for datasets with extreme values.

Scaling Normalization: Standardizes data by subtracting the mean and dividing by the standard deviation, creating a distribution with a mean of 0 and a standard deviation of 1.

Offset to Positive Values: Shifts all values by a constant offset to make all values non-negative, with the lowest negative value becoming 0.
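As referenced above, a short sketch of the normalization step (function names assumed; the variable passed to 'vars' is hypothetical):

    import ehrapy as ep

    adata = ep.dt.mimic_2(encoded=True)       # demo dataset (loader name assumed)

    # Log-normalize a skewed feature, then standardize all features
    ep.pp.log_norm(adata, vars=["platelet_first"], offset=1)
    ep.pp.scale_norm(adata)                   # zero mean, unit variance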

Dataset shifts can be corrected using the scanpy implementation of the ComBat 123 algorithm, which employs a parametric and non-parametric empirical Bayes framework for adjusting data for batch effects that is robust to outliers.

Finally, a neighbors graph can be efficiently computed using scanpy’s implementation.

To obtain meaningful lower-dimensional embeddings that can subsequently be visualized and reused for downstream algorithms, ehrapy provides the following algorithms based on scanpy’s implementation:

t-SNE: Uses a probabilistic approach to embed high-dimensional data into a lower-dimensional space, emphasizing the preservation of local similarities and revealing clusters in the data.

UMAP: Embeds data points by modeling their local neighborhood relationships, offering an efficient and scalable technique that captures both global and local structures in high-dimensional data.

Force-Directed Graph Drawing: Uses a physical simulation to position nodes in a graph, with edges representing pairwise relationships, creating a visually meaningful representation that emphasizes connectedness and clustering in the data.

Diffusion Maps: Applies spectral methods to capture the intrinsic geometry of high-dimensional data by modeling diffusion processes, providing a way to uncover underlying structures and patterns.

Density Calculation in Embedding: Quantifies the density of observations within an embedding, considering conditions or groups, offering insights into the concentration of data points in different regions and aiding in the identification of densely populated areas.

ehrapy further provides algorithms for clustering and trajectory inference based on scanpy; a combined sketch for embeddings and clustering follows this list:

Leiden Clustering: Uses the Leiden algorithm to cluster observations into groups, revealing distinct communities within the dataset with an emphasis on intra-cluster cohesion.

Hierarchical Clustering Dendrogram: Constructs a dendrogram through hierarchical clustering based on specified group by categories, illustrating the hierarchical relationships among observations and facilitating the exploration of structured patterns.
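As noted above, a combined sketch of the embedding and clustering functions (names assumed to mirror scanpy's):

    import ehrapy as ep

    adata = ep.dt.mimic_2(encoded=True)       # demo dataset (loader name assumed)
    ep.pp.knn_impute(adata)
    ep.pp.scale_norm(adata)
    ep.pp.neighbors(adata)                    # prerequisite neighbors graph

    ep.tl.tsne(adata)                         # alternative embedding
    ep.tl.diffmap(adata)                      # spectral embedding
    ep.tl.leiden(adata, key_added="cluster")  # community detection
    ep.tl.dendrogram(adata, groupby="cluster")
    ep.pl.dendrogram(adata, groupby="cluster")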

Feature ranking

ehrapy provides two ways of ranking feature contributions to clusters and target variables, illustrated in the sketch after this list:

Statistical tests: To compare any obtained clusters to obtain marker features that are significantly different between the groups, ehrapy extends scanpy’s ‘rank genes groups’. The original implementation, which features a t -test for numerical data, is complemented by a g -test for categorical data.

Feature importance: Calculates feature rankings for a target variable using linear regression, support vector machine or random forest models from scikit-learn. ehrapy evaluates the relative importance of each predictor by fitting the model and extracting model-specific metrics, such as coefficients or feature importances.
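A sketch of the statistical ranking between clusters (function names assumed; the pipeline up to clustering mirrors the sketches above):

    import ehrapy as ep

    adata = ep.dt.mimic_2(encoded=True)
    ep.pp.knn_impute(adata)
    ep.pp.scale_norm(adata)
    ep.pp.neighbors(adata)
    ep.tl.leiden(adata, key_added="cluster")

    # t-test for numerical and g-test for categorical features per cluster
    ep.tl.rank_features_groups(adata, groupby="cluster")
    ep.pl.rank_features_groups(adata)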

Dataset integration

Based on scanpy’s ‘ingest’ function, ehrapy facilitates the integration of labels and embeddings from a well-annotated reference dataset into a new dataset, enabling the mapping of cluster annotations and spatial relationships for consistent comparative analysis. This process ensures harmonized clinical interpretations across datasets, especially useful when dealing with multiple diseases or batches.

Knowledge inference

Survival analysis.

ehrapy’s implementation of survival analysis algorithms is based on lifelines 124 (a brief usage sketch follows the list):

Ordinary Least Squares (OLS) Model: Creates a linear regression model using OLS from a specified formula and an AnnData object, allowing for the analysis of relationships between variables and observations.

Generalized Linear Model (GLM): Constructs a GLM from a given formula, distribution and AnnData, providing a versatile framework for modeling relationships with nonlinear data structures.

Kaplan–Meier: Fits the Kaplan–Meier curve to generate survival curves, offering a visual representation of the probability of survival over time in a dataset.

Cox Hazard Model: Constructs a Cox proportional hazards model using a specified formula and an AnnData object, enabling the analysis of survival data by modeling the hazard rates and their relationship to predictor variables.

Log-Rank Test: Calculates the P value for the log-rank test, comparing the survival functions of two groups, providing statistical significance for differences in survival distributions.

GLM Comparison: Given two fitted GLMs, where the larger encompasses the parameter space of the smaller, this function returns the P value indicating whether the larger model adds explanatory power beyond the smaller model.
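
A rough sketch of the Kaplan–Meier workflow; kmf and test_kmf_logrank are the helper names used later in this section, but their exact signatures are assumptions:

```python
import ehrapy as ep

# Kaplan-Meier fits for two patient groups (durations in days, event flag 0/1);
# signatures are assumed from the function names used in this paper
kmf_a = ep.tl.kmf(durations=group_a.obs["los_days"], event_observed=group_a.obs["deceased"])
kmf_b = ep.tl.kmf(durations=group_b.obs["los_days"], event_observed=group_b.obs["deceased"])

# Log-rank test comparing the two survival functions
result = ep.tl.test_kmf_logrank(kmf_a, kmf_b)
print(result.p_value)
```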

Trajectory inference

Trajectory inference is a computational approach that reconstructs and models the developmental paths and transitions within heterogeneous clinical data, providing insights into the temporal progression underlying complex systems. ehrapy offers several inbuilt algorithms for trajectory inference based on scanpy:

Diffusion Pseudotime: Infers the progression of observations by measuring geodesic distance along the graph, providing a pseudotime metric that represents the developmental trajectory within the dataset.

Partition-based Graph Abstraction (PAGA): Maps out the coarse-grained connectivity structures of complex manifolds using a partition-based approach, offering a comprehensive visualization of relationships in high-dimensional data and aiding in the identification of macroscopic connectivity patterns.

Because ehrapy is compatible with scverse, further trajectory inference-based algorithms, such as CellRank, can be seamlessly applied.
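
For example, a diffusion pseudotime and PAGA run on a prepared AnnData object might look as follows (a sketch assuming scanpy-style names; the root index and group key are illustrative):

```python
import ehrapy as ep

# Diffusion maps as the basis for diffusion pseudotime
ep.tl.diffmap(adata)

# Choose a root observation and infer pseudotime along the graph
adata.uns["iroot"] = 0
ep.tl.dpt(adata)

# Coarse-grained connectivity with PAGA over precomputed groups
ep.tl.paga(adata, groups="leiden_0_6")
```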

Causal inference

ehrapy’s causal inference module is based on ‘dowhy’ 69 and implements its four key steps, sketched in code after the following list:

Graphical Model Specification: Define a causal graphical model representing relationships between variables and potential causal effects.

Causal Effect Identification: Automatically identify whether a causal effect can be inferred from the given data, addressing confounding and selection bias.

Causal Effect Estimation: Employ automated tools to estimate causal effects, using methods such as matching, instrumental variables or regression.

Sensitivity Analysis and Testing: Perform sensitivity analysis to assess the robustness of causal inferences and conduct statistical testing to determine the significance of the estimated causal effects.
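
A hedged sketch of these four steps using ‘dowhy’ directly on a per-patient DataFrame; treatment, outcome and confounder names are illustrative:

```python
from dowhy import CausalModel

# df: one row per patient, assembled, for example, from adata.obs
# (treatment, outcome and confounder names are illustrative)
model = CausalModel(
    data=df,
    treatment="corticosteroids",          # 1 if at least one drug of the class was given
    outcome="icu_los",                    # length of stay as outcome
    common_causes=["age", "crp", "alt"],  # confounders; an expert-curated graph
                                          # can be passed via the graph argument
)

estimand = model.identify_effect()                       # 2. identification
estimate = model.estimate_effect(                        # 3. estimation
    estimand, method_name="backdoor.linear_regression"
)
refutation = model.refute_estimate(                      # 4. sensitivity analysis
    estimand, estimate, method_name="placebo_treatment_refuter"
)
```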

Patient stratification

ehrapy’s complete pipeline from pre-processing to the generation of lower-dimensional embeddings, clustering, statistical comparison between determined groups and more facilitates the stratification of patients.

Visualization

ehrapy features an extensive visualization pipeline that is customizable and yet offers reasonable defaults. Almost every analysis function is matched with at least one visualization function that often shares the name but is available through the plotting module. For example, after importing ehrapy as ‘ep’, ‘ep.tl.umap(adata)’ runs the UMAP algorithm on an AnnData object, and ‘ep.pl.umap(adata)’ would then plot a scatter plot of the UMAP embedding.
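
In code, this matched pair looks as follows (assuming a pre-processed AnnData object):

```python
import ehrapy as ep

ep.tl.umap(adata)  # compute the UMAP embedding (tools module)
ep.pl.umap(adata)  # plot the embedding as a scatter plot (plotting module)
```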

ehrapy further offers a suite of more generally usable and modifiable plots:

Scatter Plot: Visualizes data points along observation or variable axes, offering insights into the distribution and relationships between individual data points.

Heatmap: Represents feature values in a grid, providing a comprehensive overview of the data’s structure and patterns.

Dot Plot: Displays count values of specified variables as dots, offering a clear depiction of the distribution of counts for each variable.

Filled Line Plot: Illustrates trends in data with filled lines, emphasizing variations in values over a specified axis.

Violin Plot: Presents the distribution of data through mirrored density plots, offering a concise view of the data’s spread.

Stacked Violin Plot: Combines multiple violin plots, stacked to allow for visual comparison of distributions across categories.

Group Mean Heatmap: Creates a heatmap displaying the mean count per group for each specified variable, providing insights into group-wise trends.

Hierarchically Clustered Heatmap: Uses hierarchical clustering to arrange data in a heatmap, revealing relationships and patterns among variables and observations.

Rankings Plot: Visualizes rankings within the data, offering a clear representation of the order and magnitude of values.

Dendrogram Plot: Plots a dendrogram of categories defined in a group by operation, illustrating hierarchical relationships within the dataset.

Benchmarking ehrapy

We generated a subset of the UKB data by selecting 261 features and 488,170 patient visits. We removed all features with missingness rates greater than 70%. To demonstrate speed and memory consumption for various scenarios, we subsampled the data to 20%, 30% and 50%. We ran a minimal ehrapy analysis pipeline on each of those subsets and on the full data, including the calculation of quality control metrics, filtering of variables by a missingness threshold, nearest neighbor imputation, normalization, dimensionality reduction and clustering (Supplementary Table 1 ). We conducted our benchmark on a single CPU with eight threads and 60 GB of maximum memory.

ehrapy further provides out-of-core implementations using Dask 108 for many of its algorithms, such as the normalization functions and the PCA implementation. Out-of-core computation refers to techniques that process data that do not fit entirely in memory, using disk storage to manage data overflow. This approach is crucial for handling large datasets without being constrained by system memory limits. Because the principal components are reused by other computationally expensive algorithms, such as the neighbors graph calculation, this effectively enables the analysis of very large datasets. We are currently working on supporting out-of-core computation for all computationally expensive algorithms in ehrapy.

We demonstrate the memory benefits in a hosted tutorial where the in-memory pipeline for 50,000 patients with 1,000 features required about 2 GB of memory, and the corresponding out-of-core implementation required less than 200 MB of memory.

The code for benchmarking is available at https://github.com/theislab/ehrapy-reproducibility . The implementation of ehrapy is accessible at https://github.com/theislab/ehrapy together with extensive API documentation and tutorials at https://ehrapy.readthedocs.io .

PIC database analysis

Study design.

We collected clinical data from the PIC 43 version 1.1.0 database. PIC is a single-center, bilingual (English and Chinese) database hosting information of children admitted to critical care units at the Children’s Hospital of Zhejiang University School of Medicine in China. The requirement for individual patient consent was waived because the study did not impact clinical care, and all protected health information was de-identified. The database contains 13,499 distinct hospital admissions of 12,881 distinct pediatric patients. These patients were admitted to five ICU units with 119 total critical care beds—GICU, PICU, SICU, CICU and NICU—between 2010 and 2018. The mean age of the patients was 2.5 years, of whom 42.5% were female. The in-hospital mortality was 7.1%; the mean hospital stay was 17.6 d; the mean ICU stay was 9.3 d; and 468 (3.6%) patients were admitted multiple times. Demographics, diagnoses, doctors’ notes, laboratory and microbiology tests, prescriptions, fluid balances, vital signs and radiographic reports were collected from all patients. For more details, see the original publication of Zeng et al. 43 .

Study participants

Individuals older than 18 years were excluded from the study. We grouped the data into three distinct groups: ‘neonates’ (0–28 d of age; 2,968 patients), ‘infants’ (1–12 months of age; 4,876 patients) and ‘youths’ (13 months to 18 years of age; 6,097 patients). We primarily analyzed the ‘youths’ group with the discharge diagnosis ‘unspecified pneumonia’ (277 patients).

Data collection

The collected clinical data included demographics, laboratory and vital sign measurements, diagnoses, microbiology and medication information and mortality outcomes. Diagnoses were coded with five-character English ICD-10 codes, whose values are derived from the seven-character Chinese ICD-10 codes.

Dataset extraction and analysis

We downloaded version 1.1.0 of the PIC database from Physionet 1 to obtain 17 CSV tables. Using Pandas, we selected all information with a coverage rate of more than 50%, including demographics and laboratory and vital sign measurements (Fig. 2 ). To reduce the amount of noise, we calculated and added only the minimum, maximum and average of all measurements that had multiple values per patient. Examination reports were removed because they describe only diagnostics and not detailed findings. All further diagnoses and microbiology and medication information were included in the observations (‘obs’) slot to ensure that the data were not used for the calculation of embeddings but were still available for the analysis. This ensured that any calculated embedding would not be divided into treated and untreated groups but, rather, based solely on phenotypic features. We imputed all missing data through k -nearest neighbors imputation ( k  = 20) using the knn_impute function of ehrapy. Next, we log normalized the data with ehrapy using the log_norm function. Afterwards, we winsorized the data using ehrapy’s winsorize function, obtaining 277 ICU visits ( n  = 265 patients) with 572 features. Of those 572 features, 254 were stored in the matrix X and the remaining 318 in the ‘obs’ slot of the AnnData object. For clustering and visualization purposes, we calculated 50 principal components using ehrapy’s pca function. The obtained principal component representation was then used to calculate a nearest neighbors graph using the neighbors function of ehrapy. The nearest neighbors graph then served as the basis for a UMAP embedding calculation using ehrapy’s umap function.
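
Condensed into the ehrapy calls named above, this pipeline reads as follows (module paths and keyword names are assumptions):

```python
import ehrapy as ep

ep.pp.knn_impute(adata, n_neighbours=20)  # k-nearest neighbors imputation (k = 20)
ep.pp.log_norm(adata)                     # log normalization
ep.pp.winsorize(adata)                    # clip extreme values
ep.pp.pca(adata, n_comps=50)              # 50 principal components
ep.pp.neighbors(adata)                    # nearest neighbors graph on the PCA representation
ep.tl.umap(adata)                         # UMAP embedding for visualization
```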

We applied the community detection algorithm Leiden with resolution 0.6 on the nearest neighbor graph using ehrapy’s leiden function. The four obtained clusters served as input for two-sided t -tests for all numerical values and two-sided g -tests for all categorical values, comparing each of the four clusters against the union of the other three clusters. This was conducted using ehrapy’s rank_feature_groups function, which also corrects P values for multiple testing with the Benjamini–Hochberg method 125 . We presented the four groups and the statistically significantly different features between the groups to two pediatricians, who annotated the groups with labels.

Our determined groups can be confidently labeled owing to their distinct clinical profiles. Nevertheless, we could only take into account clinical features that were measured. Insightful features, such as lung function tests, are missing. Moreover, the feature representation of the time-series data is simplified, which can hide some nuances between the groups. Generally, deciding on a clustering resolution is difficult. However, more fine-grained clusters obtained via higher clustering resolutions may become too specific and not generalize well enough.

Kaplan–Meier survival analysis

We selected patients with up to 360 h of total stay for Kaplan–Meier survival analysis to ensure a sufficiently high number of participants. We proceeded with the AnnData object prepared as described in the ‘Patient stratification’ subsection to conduct Kaplan–Meier analysis among all four determined pneumonia groups using ehrapy’s kmf function. Significance was tested through ehrapy’s test_kmf_logrank function, which tests whether two Kaplan–Meier series are statistically significant, employing a chi-squared test statistic under the null hypothesis. Let $h_i(t)$ be the hazard function of group $i$ at time $t$ and $c$ a constant representing a proportional change in hazard between two groups $i$ and $j$; the test then evaluates

$$H_0: h_i(t) = h_j(t) \ \text{for all } t \quad \text{against} \quad H_1: h_i(t) = c \cdot h_j(t),\ c \neq 1.$$

This implicitly uses the log-rank weights. An additional Kaplan–Meier analysis was conducted for all children jointly concerning the liver markers AST, ALT and GGT. To determine whether measurements were inside or outside the norm range, we used reference ranges (Supplementary Table 2 ). P values less than 0.05 were labeled significant.

Our Kaplan–Meier curve analysis depends on the groups being well defined and shares the same limitations as the patient stratification. Additionally, the analysis is sensitive to the reference table: we selected limits that generalize well across the age ranges, but, because children of different ages were examined, they may not be perfectly accurate for every child.

Causal effect of mechanism of action on LOS

Although the dataset was not initially intended for investigating causal effects of interventions, we adapted it for this purpose by focusing on the LOS in the ICU, measured in months, as the outcome variable. This choice aligns with the clinical aim of stabilizing patients sufficiently for ICU discharge. We constructed a causal graph to explore how different drug administrations could potentially reduce the LOS. Based on consultations with clinicians, we included several biomarkers of liver damage (AST, ALT and GGT) and inflammation (CRP and PCT) in our model. Patient age was also considered a relevant variable.

Because several different medications act by the same mechanisms, we grouped specific medications by their drug classes. This grouping was achieved by cross-referencing the drugs listed in the dataset with DrugBank release 5.1 (ref. 126 ), using Levenshtein distances for partial string matching. After manual verification, we extracted the corresponding DrugBank categories, counted the number of features per category and compiled a list of commonly prescribed medications, as advised by clinicians. This approach facilitated the modeling of the causal graph depicted in Fig. 4 , where an intervention is defined as the administration of at least one drug from a specified category.

Causal inference was then conducted with ehrapy’s ‘dowhy’ 69 -based causal inference module using the expert-curated causal graph. Medication groups were designated as causal interventions, and the LOS was the outcome of interest. Linear regression served as the estimation method for analyzing these causal effects. We excluded four patients from the analysis owing to their notably long hospital stays exceeding 90 d, which were deemed outliers. To validate the robustness of our causal estimates, we incorporated several refutation methods:

Placebo Treatment Refuter: This method involved replacing the treatment assignment with a placebo to test the effect of the treatment variable being null.

Random Common Cause: A randomly generated variable was added to the data to assess the sensitivity of the causal estimate to the inclusion of potential unmeasured confounders.

Data Subset Refuter: The stability of the causal estimate was tested across various random subsets of the data to ensure that the observed effects were not dependent on a specific subset.

Add Unobserved Common Cause: This approach tested the effect of an omitted variable by adding a theoretically relevant unobserved confounder to the model, evaluating how much an unmeasured variable could influence the causal relationship.

Dummy Outcome: Replaces the true outcome variable with a random variable. If the causal effect nullifies, it supports the validity of the original causal relationship, indicating that the outcome is not driven by random factors.

Bootstrap Validation: Employs bootstrapping to generate multiple samples from the dataset, testing the consistency of the causal effect across these samples.

The selection of these refuters addresses a broad spectrum of potential biases and model sensitivities, including unobserved confounders and data dependencies. This comprehensive approach ensures robust verification of the causal analysis. Each refuter provides an orthogonal perspective, targeting specific vulnerabilities in causal analysis, which strengthens the overall credibility of the findings.

UKB analysis

Study population.

We used information from the UKB cohort, which includes 502,164 study participants from the general UK population without enrichment for specific diseases. The study involved the enrollment of individuals between 2006 and 2010 across 22 different assessment centers throughout the United Kingdom. The tracking of participants is still ongoing. Within the UKB dataset, metabolomics, proteomics and retinal optical coherence tomography data are available for a subset of individuals without any enrichment for specific diseases. Additionally, EHRs, questionnaire responses and other physical measures are available for almost everyone in the study. Furthermore, a variety of genotype information is available for nearly the entire cohort, including whole-genome sequencing, whole-exome sequencing, genotyping array data as well as imputed genotypes from the genotyping array 44 . Because only the latter two are available for download, and are sufficient for polygenic risk score calculation as performed here, we used the imputed genotypes in the present study. Participants visited the assessment center up to four times for additional and repeat measurements and completed additional online follow-up questionnaires.

In the present study, we restricted the analyses to data obtained from the initial assessment, including the blood draw, for obtaining the metabolomics data and the retinal imaging as well as physical measures. This restricts the study population to 33,521 individuals for whom all of these modalities are available. We have a clear study start point for each individual with the date of their initial assessment center visit. The study population has a mean age of 57 years, is 54% female and is censored at age 69 years on average; 4.7% experienced an incident myocardial infarction; and 8.1% have prevalent type 2 diabetes. The study population comes from six of the 22 assessment centers due to the retinal imaging being performed only at those.

For the myocardial infarction endpoint definition, we relied on the first occurrence data available in the UKB, which compiles the first date that each diagnosis was recorded for a participant in a hospital in ICD-10 nomenclature. Subsequently, we mapped these data to phecodes and focused on phecode 404.1 for myocardial infarction.

The Framingham Risk Score was developed on data from 8,491 participants in the Framingham Heart Study to assess general cardiovascular risk 77 . It includes easily obtainable predictors and is, therefore, readily applicable in clinical practice, although newer and more specific risk scores exist and might be used more frequently. It includes age, sex, smoking behavior, blood pressure, total and low-density lipoprotein cholesterol as well as information on insulin, antihypertensive and cholesterol-lowering medications, all of which are routinely collected in the UKB and used in this study as the Framingham feature set.

The metabolomics data used in this study were obtained using proton NMR spectroscopy, a low-cost method with relatively low batch effects. It covers established clinical predictors, such as albumin and cholesterol, as well as a range of lipids, amino acids and carbohydrate-related metabolites.

The retinal optical coherence tomography–derived features were returned by researchers to the UKB 75 , 76 . They used the available scans and determined the macular volume, macular thickness, retinal pigment epithelium thickness, disc diameter, cup-to-disc ratio across different regions as well as the thickness between the inner nuclear layer and external limiting membrane, inner and outer photoreceptor segments and the retinal pigment epithelium across different regions. Furthermore, they determined a wide range of quality metrics for each scan, including the image quality score, minimum motion correlation and inner limiting membrane (ILM) indicator.

Data analysis

After exporting the data from the UKB, all timepoints were transformed into participant age entries. Only participants without prevalent myocardial infarction (relative to the first assessment center visit at which all data were collected) were included.

The data were pre-processed for the retinal imaging and metabolomics subsets separately, to enable a clear analysis of missing data and to allow for the k -nearest neighbors–based imputation ( k  = 20) of missing values when less than 10% were missing for a given participant. Otherwise, participants were dropped from the analyses. The imputed genotype and Framingham feature sets were available for almost every participant and were, therefore, not imputed; individuals without them were instead dropped from the analyses. Because genetic risk modeling poses entirely different methodological and computational challenges, we applied a published polygenic risk score for coronary heart disease using 6.6 million variants 73 . This was computed using the plink2 score option on the imputed genotypes available in the UKB.

UMAP embeddings were computed using default parameters on the full feature sets with ehrapy’s umap function. For all analyses, the same time-to-event and event-indicator columns were used. The event indicator is a Boolean variable indicating whether a myocardial infarction was observed for a study participant. For participants with an observed event, the time to event is defined as the timespan between the start of the study, in this case the date of the first assessment center visit, and the event. Otherwise, it is the timespan from the start of the study to the start of censoring; in this case, this is set to the last date for which EHRs were available, unless a participant died, in which case the date of death is the start of censoring. Kaplan–Meier curves and Cox proportional hazards models were fit using ehrapy’s survival analysis module and the lifelines 124 package’s CoxPHFitter function with default parameters. For Cox proportional hazards models with multiple feature sets, individually imputed and quality-controlled feature sets were concatenated, and the model was fit on the resulting matrix. Models were evaluated using the C-index 127 as a metric. It can be seen as an extension of the common area under the receiver operating characteristic curve score to time-to-event datasets, in which events are not observed for every sample, and it ranges from 0.0 (entirely false) over 0.5 (random) to 1.0 (entirely correct). CIs for the C-index were computed based on bootstrapping by sampling 1,000 times with replacement from all computed partial hazards and computing the C-index over each of these samples. The percentiles at 2.5% and 97.5% then give the lower and upper bounds of the 95% CIs.
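
A minimal sketch of this bootstrap procedure, assuming NumPy arrays of durations, partial hazards and event indicators:

```python
import numpy as np
from lifelines.utils import concordance_index

def bootstrap_cindex_ci(durations, partial_hazards, events, n_boot=1000, seed=0):
    """95% CI for the C-index via bootstrapping over computed partial hazards."""
    rng = np.random.default_rng(seed)
    n = len(durations)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample with replacement
        # Higher partial hazard means higher risk (shorter survival), hence the sign flip
        scores.append(concordance_index(durations[idx], -partial_hazards[idx], events[idx]))
    lower, upper = np.percentile(scores, [2.5, 97.5])
    return lower, upper
```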

In all UKB analyses, the unit of study for a statistical test or predictive model is always an individual study participant.

The generalizability of the analysis is limited, as the UK Biobank cohort may not represent the general population, with potential selection biases and underrepresentation of certain demographic groups. Additionally, by restricting the analysis to initial assessment data and censoring based on the last available EHR or date of death, our analysis does not account for longitudinal changes and can introduce follow-up bias, especially if participants lost to follow-up have different risk profiles.

In-depth quality control of retina-derived features

A UMAP plot of the retina-derived features indicating the assessment centers shows a cluster of samples that lies somewhat outside the general population and mostly comprises individuals who attended the Birmingham assessment center (Fig. 5b ). To further investigate this, we performed Leiden clustering of resolution 0.3 (Extended Data Fig. 9a ) and isolated this group in cluster 5. When comparing cluster 5 to the rest of the population in the retina-derived feature space, we noticed that many individuals in cluster 5 showed overall retinal pigment epithelium (RPE) thickness measures substantially elevated over the rest of the population in both eyes (Extended Data Fig. 9b ), which is mostly a feature of this cluster (Extended Data Fig. 9c ). To investigate potential confounding, we computed ratios between cluster 5 and the rest of the population over the ‘obs’ DataFrame containing the Framingham features, diabetes-related phecodes and genetic principal components. Of the top and bottom five ratios observed, six are in genetic principal components, which are commonly used to represent genetic ancestry in a continuous space (Extended Data Fig. 9d ). Additionally, diagnoses for type 1 and type 2 diabetes and antihypertensive use are enriched in cluster 5. Further investigating the ancestry, we computed log ratios for self-reported ancestries and absolute counts, which showed no robust enrichment or depletion effects.

A closer look at three quality control measures of the imaging pipeline revealed that cluster 5 was an outlier in terms of image quality (Extended Data Fig. 9e ), minimum motion correlation (Extended Data Fig. 9f ) and the ILM indicator (Extended Data Fig. 9g ), all of which can be indicative of artifacts in image acquisition and downstream processing 128 . Subsequently, we excluded the 301 individuals in cluster 5 from all analyses.

COVID-19 chest-x-ray fate determination

Dataset overview.

We used the public BrixIA COVID-19 dataset, which contains 192 chest x-ray images annotated with BrixIA scores 82 . Six regions per image were annotated with a disease severity score ranging from 0 to 3 by a senior radiologist with more than 20 years of experience and a junior radiologist. A global score (S-Global) was determined as the sum over all six regions and, therefore, ranges from 0 to 18. S-Global scores of 0 were classified as normal. Images with severity values of at most 1 in all six regions were classified as mild. Images with severity values greater than or equal to 2 but an S-Global score of less than 7 were classified as moderate. Images containing at least one 3 in any of the six regions and an S-Global score between 7 and 10 were classified as severe, and all remaining images with S-Global scores greater than 10 and at least one 3 were labeled critical. The dataset and instructions to download the images can be found at https://github.com/ieee8023/covid-chestxray-dataset .

We first resized all images to 224 × 224. Afterwards, the images underwent a random affine transformation that involved rotation, translation and scaling. The rotation angle was randomly selected from a range of −45° to 45°. The images were also subject to horizontal and vertical translation, with the maximum translation being 15% of the image size in either direction. Additionally, the images were scaled by a factor ranging from 0.85 to 1.15. The purpose of applying these transformations was to enhance the dataset and introduce variations, ultimately improving the robustness and generalization of the model.
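
Expressed with torchvision, the described augmentation might look as follows (a sketch; the original implementation may differ in details):

```python
import torchvision.transforms as T

augment = T.Compose([
    T.Resize((224, 224)),           # resize all images to 224 x 224
    T.RandomAffine(
        degrees=45,                 # rotation angle sampled from -45 to 45 degrees
        translate=(0.15, 0.15),     # up to 15% horizontal and vertical shift
        scale=(0.85, 1.15),         # scaling factor between 0.85 and 1.15
    ),
])
```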

To generate embeddings, we used a pre-trained DenseNet model with the densenet121-res224-all weights of TorchXRayVision 129 . A DenseNet is a convolutional neural network that makes use of dense connections between layers (Dense Blocks), where all layers with matching feature map sizes directly connect with each other. To maintain a feed-forward nature, every layer in the DenseNet architecture receives supplementary inputs from all preceding layers and transmits its own feature maps to all subsequent layers. The model was trained on the nih-pc-chex-mimic_ch-google-openi-rsna dataset 130 .
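
A sketch of the embedding step; the weight identifier is taken from the text, whereas the tensor shape and the features call follow TorchXRayVision's documented usage and should be treated as assumptions:

```python
import torch
import torchxrayvision as xrv

model = xrv.models.DenseNet(weights="densenet121-res224-all")
model.eval()

with torch.no_grad():
    # images: tensor of shape (N, 1, 224, 224), normalized as expected by the library
    embeddings = model.features(images)
```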

Next, we calculated 50 principal components on the feature representation of the DenseNet model of all images using ehrapy’s pca function. The principal component representation served as input for a nearest neighbors graph calculation using ehrapy’s neighbors function. This graph served as the basis for the calculation of a UMAP embedding with three components that was finally visualized using ehrapy.

We randomly picked a root in the group of images labeled ‘Normal’. First, we calculated so-called pseudotime by fitting a trajectory through the calculated UMAP space using diffusion maps as implemented in ehrapy’s dpt function 57 . Each image’s pseudotime value represents its estimated position along this trajectory, serving as a proxy for its severity stage relative to others in the dataset. To determine fates, we employed CellRank 58 , 59 with the PseudotimeKernel. This kernel computes transition probabilities for patient visits based on the connectivity of the k -nearest neighbors graph and the pseudotime values of patient visits, which reflect their progression through a process. Directionality is infused into the nearest neighbors graph in this process, where the kernel either removes or downweights edges that contradict the directional flow of increasing pseudotime, thereby refining the graph to better reflect the developmental trajectory. We computed the transition matrix with a soft threshold scheme (a parameter of the PseudotimeKernel), which downweights edges that point against the direction of increasing pseudotime. Finally, we calculated a projection on top of the UMAP embedding with CellRank using the plot_projection function of the PseudotimeKernel, which we subsequently plotted.
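
A sketch of this CellRank step, assuming the pseudotime values are stored under 'dpt_pseudotime':

```python
import cellrank as cr

# Transition probabilities from the k-NN graph and the pseudotime values
pk = cr.kernels.PseudotimeKernel(adata, time_key="dpt_pseudotime")
pk.compute_transition_matrix(threshold_scheme="soft")  # downweight edges against increasing pseudotime

# Project the directed transitions onto the UMAP embedding
pk.plot_projection(basis="umap")
```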

This analysis is limited by the small dataset of 192 chest x-ray images, which may affect the model’s generalizability and robustness. Annotation subjectivity from radiologists can further introduce variability in severity scores. Additionally, the random selection of a root from ‘Normal’ images can introduce bias in pseudotime calculations and subsequent analyses.

Diabetes 130-US hospitals analysis

We used data from the Diabetes 130-US hospitals dataset that were collected between 1999 and 2008. It contains clinical care information at 130 hospitals and integrated delivery networks. The extracted database information pertains to hospital admissions specifically for patients diagnosed with diabetes. These encounters required a hospital stay ranging from 1 d to 14 d, during which both laboratory tests and medications were administered. The selection criteria focused exclusively on inpatient encounters with these defined characteristics. More specifically, we used a version that was curated by the Fairlearn team where the target variable ‘readmitted’ was binarized and a few features renamed or binned ( https://fairlearn.org/main/user_guide/datasets/diabetes_hospital_data.html ). The dataset contains 101,877 patient visits and 25 features. The dataset predominantly consists of White patients (74.8%), followed by African Americans (18.9%), with other racial groups, such as Hispanic, Asian and Unknown categories, comprising smaller percentages. Females make up a slight majority in the data at 53.8%, with males accounting for 46.2% and a negligible number of entries listed as unknown or invalid. A substantial majority of the patients are over 60 years of age (67.4%), whereas those aged 30–60 years represent 30.2%, and those 30 years or younger constitute just 2.5%.

All of the following descriptions start by loading the Fairlearn version of the Diabetes 130-US hospitals dataset using ehrapy’s dataloader as an AnnData object.

Selection and filtering bias

An overview of sensitive variables was generated using tableone. Subsequently, ehrapy’s CohortTracker was used to track the age, gender and race variables. The cohort was filtered for all Medicare recipients and subsequently plotted.

Surveillance bias

We plotted the HbA1c measurement ratios using ehrapy’s catplot .

Missing data and imputation bias

MCAR-type missing data for the number of medications variable (‘num_medications’) were introduced by randomly setting 30% of the values to missing using NumPy’s choice function. We tested that the data are MCAR by applying ehrapy’s implementation of Little’s MCAR test, which returned a non-significant P value of 0.71. MAR data for the number of medications variable (‘num_medications’) were introduced by scaling the ‘time_in_hospital’ variable to have a mean of 0 and a standard deviation of 1, adjusting these values by multiplying by 1.2 and subtracting 0.6 to influence the overall missingness rate, and then using these values to generate MAR data in the ‘num_medications’ variable via a logistic transformation and binomial sampling (see the sketch below). We verified that the newly introduced missing values are not MCAR with respect to the ‘time_in_hospital’ variable by applying ehrapy’s implementation of Little’s test, which was significant (P = 0.01 × 10⁻²). The missing data were imputed using ehrapy’s mean imputation and MissForest implementation.
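
A sketch of the described MAR mechanism with NumPy (extraction of the two columns from the AnnData object is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

# Standardize the driving variable, then shift/scale to tune the missingness rate
z = (time_in_hospital - time_in_hospital.mean()) / time_in_hospital.std()
logits = 1.2 * z - 0.6

# Logistic transformation to missingness probabilities, then binomial sampling
p_missing = 1.0 / (1.0 + np.exp(-logits))
mask = rng.binomial(1, p_missing).astype(bool)

num_medications = num_medications.astype(float)
num_medications[mask] = np.nan  # now missing at random given time_in_hospital
```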

Algorithmic bias

Variables ‘race’, ‘gender’, ‘age’, ‘readmitted’, ‘readmit_binary’ and ‘discharge_disposition_id’ were moved to the ‘obs’ slot of the AnnData object to ensure that they were not used for model training. We built a binary label ‘readmit_30_days’ indicating whether a patient had been readmitted in fewer than 30 d. Next, we combined the ‘Asian’ and ‘Hispanic’ categories into a single ‘Other’ category within the ‘race’ column of our AnnData object and then filtered out and discarded any samples labeled as ‘Unknown/Invalid’ under the ‘gender’ column and subsequently moved the ‘gender’ data to the variable matrix X of the AnnData object. All categorical variables were encoded. The data were split into train and test groups with a test size of 50%. The data were scaled, and a logistic regression model was trained using scikit-learn, which was also used to determine the balanced accuracy score. Fairlearn’s MetricFrame function was used to inspect the model performance with respect to the sensitive variable ‘race’. We subsequently fit Fairlearn’s ThresholdOptimizer using the logistic regression estimator with balanced_accuracy_score as the target objective. The algorithmic demonstration of Fairlearn’s abilities on this dataset is shown here: https://github.com/fairlearn/talks/tree/main/2021_scipy_tutorial .
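
A condensed sketch of the inspection and mitigation steps with Fairlearn; the constraint choice ('equalized_odds') and the variable names are assumptions, as the text does not specify them:

```python
from fairlearn.metrics import MetricFrame
from fairlearn.postprocessing import ThresholdOptimizer
from sklearn.metrics import balanced_accuracy_score

# Inspect the fitted model's performance per sensitive group
mf = MetricFrame(
    metrics=balanced_accuracy_score,
    y_true=y_test,
    y_pred=logreg.predict(X_test),
    sensitive_features=race_test,
)
print(mf.by_group)

# Post-process the already-fitted logistic regression to mitigate group-wise disparities
to = ThresholdOptimizer(
    estimator=logreg, prefit=True,
    constraints="equalized_odds", objective="balanced_accuracy_score",
)
to.fit(X_train, y_train, sensitive_features=race_train)
y_fair = to.predict(X_test, sensitive_features=race_test)
```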

Normalization bias

We one-hot encoded all categorical variables with ehrapy using the encode function. We applied ehrapy’s implementation of scaling normalization with and without the ‘Age group’ variable as group key, scaling the data jointly and per group with ehrapy’s scale_norm function.
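
In code, the comparison might look like the following sketch; the group_key argument is assumed from the description above:

```python
import ehrapy as ep

adata = ep.pp.encode(adata, autodetect=True)  # one-hot encode categorical variables

adata_joint = adata.copy()
ep.pp.scale_norm(adata_joint)                           # scale jointly across all patients

adata_grouped = adata.copy()
ep.pp.scale_norm(adata_grouped, group_key="Age group")  # scale within each age group
```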

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Physionet provides access to the PIC database 43 at https://physionet.org/content/picdb/1.1.0 for credentialed users. The BrixIA images 82 are available at https://github.com/BrixIA/Brixia-score-COVID-19 . The data used in this study were obtained from the UK Biobank 44 ( https://www.ukbiobank.ac.uk/ ). Access to the UK Biobank resource was granted under application number 49966. The data are available to researchers upon application to the UK Biobank in accordance with their data access policies and procedures. The Diabetes 130-US Hospitals dataset is available at https://archive.ics.uci.edu/dataset/296/diabetes+130-us+hospitals+for+years+1999-2008 .

Code availability

The ehrapy source code is available at https://github.com/theislab/ehrapy under an Apache 2.0 license. Further documentation, tutorials and examples are available at https://ehrapy.readthedocs.io . We are actively developing the software and invite contributions from the community.

Jupyter notebooks to reproduce our analysis and figures, including Conda environments that specify all versions, are available at https://github.com/theislab/ehrapy-reproducibility .

Goldberger, A. L. et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101 , E215–E220 (2000).


Atasoy, H., Greenwood, B. N. & McCullough, J. S. The digitization of patient care: a review of the effects of electronic health records on health care quality and utilization. Annu. Rev. Public Health 40 , 487–500 (2019).


Jamoom, E. W., Patel, V., Furukawa, M. F. & King, J. EHR adopters vs. non-adopters: impacts of, barriers to, and federal initiatives for EHR adoption. Health (Amst.) 2 , 33–39 (2014).


Rajkomar, A. et al. Scalable and accurate deep learning with electronic health records. NPJ Digit. Med. 1 , 18 (2018).


Wolf, A. et al. Data resource profile: Clinical Practice Research Datalink (CPRD) Aurum. Int. J. Epidemiol. 48 , 1740–1740g (2019).

Sudlow, C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12 , e1001779 (2015).

Pollard, T. J. et al. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci. Data 5 , 180178 (2018).

Johnson, A. E. W. et al. MIMIC-III, a freely accessible critical care database. Sci. Data 3 , 160035 (2016).


Hyland, S. L. et al. Early prediction of circulatory failure in the intensive care unit using machine learning. Nat. Med. 26 , 364–373 (2020).

Rasmy, L. et al. Recurrent neural network models (CovRNN) for predicting outcomes of patients with COVID-19 on admission to hospital: model development and validation using electronic health record data. Lancet Digit. Health 4 , e415–e425 (2022).

Marcus, J. L. et al. Use of electronic health record data and machine learning to identify candidates for HIV pre-exposure prophylaxis: a modelling study. Lancet HIV 6 , e688–e695 (2019).

Kruse, C. S., Stein, A., Thomas, H. & Kaur, H. The use of electronic health records to support population health: a systematic review of the literature. J. Med. Syst. 42 , 214 (2018).

Sheikh, A., Jha, A., Cresswell, K., Greaves, F. & Bates, D. W. Adoption of electronic health records in UK hospitals: lessons from the USA. Lancet 384 , 8–9 (2014).

Sheikh, A. et al. Health information technology and digital innovation for national learning health and care systems. Lancet Digit. Health 3 , e383–e396 (2021).

Mc Cord, K. A. & Hemkens, L. G. Using electronic health records for clinical trials: where do we stand and where can we go? Can. Med. Assoc. J. 191 , E128–E133 (2019).


Landi, I. et al. Deep representation learning of electronic health records to unlock patient stratification at scale. NPJ Digit. Med. 3 , 96 (2020).

Ayaz, M., Pasha, M. F., Alzahrani, M. Y., Budiarto, R. & Stiawan, D. The Fast Health Interoperability Resources (FHIR) standard: systematic literature review of implementations, applications, challenges and opportunities. JMIR Med. Inform. 9 , e21929 (2021).

Peskoe, S. B. et al. Adjusting for selection bias due to missing data in electronic health records-based research. Stat. Methods Med. Res. 30 , 2221–2238 (2021).

Haneuse, S. & Daniels, M. A general framework for considering selection bias in EHR-based studies: what data are observed and why? EGEMS (Wash. DC) 4 , 1203 (2016).


Gallifant, J. et al. Disparity dashboards: an evaluation of the literature and framework for health equity improvement. Lancet Digit. Health 5 , e831–e839 (2023).

Sauer, C. M. et al. Leveraging electronic health records for data science: common pitfalls and how to avoid them. Lancet Digit. Health 4 , e893–e898 (2022).

Li, J. et al. Imputation of missing values for electronic health record laboratory data. NPJ Digit. Med. 4 , 147 (2021).

Rubin, D. B. Inference and missing data. Biometrika 63 , 581 (1976).

Scheid, L. M., Brown, L. S., Clark, C. & Rosenfeld, C. R. Data electronically extracted from the electronic health record require validation. J. Perinatol. 39 , 468–474 (2019).

Phelan, M., Bhavsar, N. A. & Goldstein, B. A. Illustrating informed presence bias in electronic health records data: how patient interactions with a health system can impact inference. EGEMS (Wash. DC). 5 , 22 (2017).


Secondary Analysis of Electronic Health Records (ed MIT Critical Data) (Springer, 2016).

Jetley, G. & Zhang, H. Electronic health records in IS research: quality issues, essential thresholds and remedial actions. Decis. Support Syst. 126 , 113137 (2019).

McCormack, J. P. & Holmes, D. T. Your results may vary: the imprecision of medical measurements. BMJ 368 , m149 (2020).

Hobbs, F. D. et al. Is the international normalised ratio (INR) reliable? A trial of comparative measurements in hospital laboratory and primary care settings. J. Clin. Pathol. 52 , 494–497 (1999).

Huguet, N. et al. Using electronic health records in longitudinal studies: estimating patient attrition. Med. Care 58 Suppl 6 Suppl 1 , S46–S52 (2020).

Zeng, J., Gensheimer, M. F., Rubin, D. L., Athey, S. & Shachter, R. D. Uncovering interpretable potential confounders in electronic medical records. Nat. Commun. 13 , 1014 (2022).

Getzen, E., Ungar, L., Mowery, D., Jiang, X. & Long, Q. Mining for equitable health: assessing the impact of missing data in electronic health records. J. Biomed. Inform. 139 , 104269 (2023).

Tang, S. et al. Democratizing EHR analyses with FIDDLE: a flexible data-driven preprocessing pipeline for structured clinical data. J. Am. Med. Inform. Assoc. 27 , 1921–1934 (2020).

Dagliati, A. et al. A process mining pipeline to characterize COVID-19 patients’ trajectories and identify relevant temporal phenotypes from EHR data. Front. Public Health 10 , 815674 (2022).

Sun, Y. & Zhou, Y.-H. A machine learning pipeline for mortality prediction in the ICU. Int. J. Digit. Health 2 , 3 (2022).


Mandyam, A., Yoo, E. C., Soules, J., Laudanski, K. & Engelhardt, B. E. COP-E-CAT: cleaning and organization pipeline for EHR computational and analytic tasks. In Proc. of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics. https://doi.org/10.1145/3459930.3469536 (Association for Computing Machinery, 2021).

Gao, C. A. et al. A machine learning approach identifies unresolving secondary pneumonia as a contributor to mortality in patients with severe pneumonia, including COVID-19. J. Clin. Invest. 133 , e170682 (2023).

Makam, A. N. et al. The good, the bad and the early adopters: providers’ attitudes about a common, commercial EHR. J. Eval. Clin. Pract. 20 , 36–42 (2014).

Amezquita, R. A. et al. Orchestrating single-cell analysis with Bioconductor. Nat. Methods 17 , 137–145 (2020).

Virshup, I. et al. The scverse project provides a computational ecosystem for single-cell omics data analysis. Nat. Biotechnol. 41 , 604–606 (2023).

Zou, Q. et al. Predicting diabetes mellitus with machine learning techniques. Front. Genet. 9 , 515 (2018).

Cios, K. J. & William Moore, G. Uniqueness of medical data mining. Artif. Intell. Med. 26 , 1–24 (2002).

Zeng, X. et al. PIC, a paediatric-specific intensive care database. Sci. Data 7 , 14 (2020).

Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562 , 203–209 (2018).

Lee, J. et al. Open-access MIMIC-II database for intensive care research. Annu. Int. Conf. Proc. IEEE Eng. Med. Biol. Soc. 2011 , 8315–8318 (2011).

Virshup, I., Rybakov, S., Theis, F. J., Angerer, P. & Wolf, F. A. anndata: annotated data. Preprint at bioRxiv https://doi.org/10.1101/2021.12.16.473007 (2021).

Voss, E. A. et al. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases. J. Am. Med. Inform. Assoc. 22 , 553–564 (2015).

Vasilevsky, N. A. et al. Mondo: unifying diseases for the world, by the world. Preprint at medRxiv https://doi.org/10.1101/2022.04.13.22273750 (2022).

Harrison, J. E., Weber, S., Jakob, R. & Chute, C. G. ICD-11: an international classification of diseases for the twenty-first century. BMC Med. Inform. Decis. Mak. 21 , 206 (2021).

Köhler, S. et al. Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Nucleic Acids Res. 47 , D1018–D1027 (2019).

Wu, P. et al. Mapping ICD-10 and ICD-10-CM codes to phecodes: workflow development and initial evaluation. JMIR Med. Inform. 7 , e14325 (2019).

Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19 , 15 (2018).

Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res . 12 , 2825–2830 (2011).

de Haan-Rietdijk, S., de Haan-Rietdijk, S., Kuppens, P. & Hamaker, E. L. What’s in a day? A guide to decomposing the variance in intensive longitudinal data. Front. Psychol. 7 , 891 (2016).

Pedersen, E. S. L., Danquah, I. H., Petersen, C. B. & Tolstrup, J. S. Intra-individual variability in day-to-day and month-to-month measurements of physical activity and sedentary behaviour at work and in leisure-time among Danish adults. BMC Public Health 16 , 1222 (2016).

Roffey, D. M., Byrne, N. M. & Hills, A. P. Day-to-day variance in measurement of resting metabolic rate using ventilated-hood and mouthpiece & nose-clip indirect calorimetry systems. JPEN J. Parenter. Enter. Nutr. 30 , 426–432 (2006).

Haghverdi, L., Büttner, M., Wolf, F. A., Buettner, F. & Theis, F. J. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13 , 845–848 (2016).

Lange, M. et al. CellRank for directed single-cell fate mapping. Nat. Methods 19 , 159–170 (2022).

Weiler, P., Lange, M., Klein, M., Pe'er, D. & Theis, F. CellRank 2: unified fate mapping in multiview single-cell data. Nat. Methods 21 , 1196–1205 (2024).

Zhang, S. et al. Cost of management of severe pneumonia in young children: systematic analysis. J. Glob. Health 6 , 010408 (2016).

Torres, A. et al. Pneumonia. Nat. Rev. Dis. Prim. 7 , 25 (2021).

Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9 , 5233 (2019).

Kamin, W. et al. Liver involvement in acute respiratory infections in children and adolescents—results of a non-interventional study. Front. Pediatr. 10 , 840008 (2022).

Shi, T. et al. Risk factors for mortality from severe community-acquired pneumonia in hospitalized children transferred to the pediatric intensive care unit. Pediatr. Neonatol. 61 , 577–583 (2020).

Dudnyk, V. & Pasik, V. Liver dysfunction in children with community-acquired pneumonia: the role of infectious and inflammatory markers. J. Educ. Health Sport 11 , 169–181 (2021).

Charpignon, M.-L. et al. Causal inference in medical records and complementary systems pharmacology for metformin drug repurposing towards dementia. Nat. Commun. 13 , 7652 (2022).

Grief, S. N. & Loza, J. K. Guidelines for the evaluation and treatment of pneumonia. Prim. Care 45 , 485–503 (2018).

Paul, M. Corticosteroids for pneumonia. Cochrane Database Syst. Rev. 12 , CD007720 (2017).

Sharma, A. & Kiciman, E. DoWhy: an end-to-end library for causal inference. Preprint at arXiv https://doi.org/10.48550/ARXIV.2011.04216 (2020).

Khilnani, G. C. et al. Guidelines for antibiotic prescription in intensive care unit. Indian J. Crit. Care Med. 23 , S1–S63 (2019).

Harris, L. K. & Crannage, A. J. Corticosteroids in community-acquired pneumonia: a review of current literature. J. Pharm. Technol. 37 , 152–160 (2021).

Dou, L. et al. Decreased hospital length of stay with early administration of oseltamivir in patients hospitalized with influenza. Mayo Clin. Proc. Innov. Qual. Outcomes 4 , 176–182 (2020).

Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50 , 1219–1224 (2018).

Julkunen, H. et al. Atlas of plasma NMR biomarkers for health and disease in 118,461 individuals from the UK Biobank. Nat. Commun. 14 , 604 (2023).

Ko, F. et al. Associations with retinal pigment epithelium thickness measures in a large cohort: results from the UK Biobank. Ophthalmology 124 , 105–117 (2017).

Patel, P. J. et al. Spectral-domain optical coherence tomography imaging in 67 321 adults: associations with macular thickness in the UK Biobank study. Ophthalmology 123 , 829–840 (2016).

D’Agostino Sr, R. B. et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation 117 , 743–753 (2008).

Buergel, T. et al. Metabolomic profiles predict individual multidisease outcomes. Nat. Med. 28 , 2309–2320 (2022).

Xu, Y. et al. An atlas of genetic scores to predict multi-omic traits. Nature 616 , 123–131 (2023).

Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37 , 547–554 (2019).

Rousan, L. A., Elobeid, E., Karrar, M. & Khader, Y. Chest x-ray findings and temporal lung changes in patients with COVID-19 pneumonia. BMC Pulm. Med. 20 , 245 (2020).

Signoroni, A. et al. BS-Net: learning COVID-19 pneumonia severity on a large chest X-ray dataset. Med. Image Anal. 71 , 102046 (2021).

Bird, S. et al. Fairlearn: a toolkit for assessing and improving fairness in AI. https://www.microsoft.com/en-us/research/publication/fairlearn-a-toolkit-for-assessing-and-improving-fairness-in-ai/ (2020).

Strack, B. et al. Impact of HbA1c measurement on hospital readmission rates: analysis of 70,000 clinical database patient records. BioMed. Res. Int. 2014 , 781670 (2014).

Stekhoven, D. J. & Bühlmann, P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics 28 , 112–118 (2012).

Banerjee, A. et al. Identifying subtypes of heart failure from three electronic health record sources with machine learning: an external, prognostic, and genetic validation study. Lancet Digit. Health 5 , e370–e379 (2023).

Nagamine, T. et al. Data-driven identification of heart failure disease states and progression pathways using electronic health records. Sci. Rep. 12 , 17871 (2022).

Da Silva Filho, J. et al. Disease trajectories in hospitalized COVID-19 patients are predicted by clinical and peripheral blood signatures representing distinct lung pathologies. Preprint at bioRxiv https://doi.org/10.1101/2023.09.08.23295024 (2023).

Haneuse, S., Arterburn, D. & Daniels, M. J. Assessing missing data assumptions in EHR-based studies: a complex and underappreciated task. JAMA Netw. Open 4 , e210184 (2021).

Little, R. J. A. A test of missing completely at random for multivariate data with missing values. J. Am. Stat. Assoc. 83 , 1198–1202 (1988).

Jakobsen, J. C., Gluud, C., Wetterslev, J. & Winkel, P. When and how should multiple imputation be used for handling missing data in randomised clinical trials—a practical guide with flowcharts. BMC Med. Res. Methodol. 17 , 162 (2017).

Dziura, J. D., Post, L. A., Zhao, Q., Fu, Z. & Peduzzi, P. Strategies for dealing with missing data in clinical trials: from design to analysis. Yale J. Biol. Med. 86 , 343–358 (2013).

White, I. R., Royston, P. & Wood, A. M. Multiple imputation using chained equations: issues and guidance for practice. Stat. Med. 30 , 377–399 (2011).

Jäger, S., Allhorn, A. & Bießmann, F. A benchmark for data imputation methods. Front. Big Data 4 , 693674 (2021).

Waljee, A. K. et al. Comparison of imputation methods for missing laboratory data in medicine. BMJ Open 3 , e002847 (2013).

Ibrahim, J. G. & Molenberghs, G. Missing data methods in longitudinal studies: a review. Test (Madr.) 18 , 1–43 (2009).

Li, C., Alsheikh, A. M., Robinson, K. A. & Lehmann, H. P. Use of recommended real-world methods for electronic health record data analysis has not improved over 10 years. Preprint at bioRxiv https://doi.org/10.1101/2023.06.21.23291706 (2023).

Regev, A. et al. The Human Cell Atlas. eLife 6 , e27041 (2017).

Megill, C. et al. cellxgene: a performant, scalable exploration platform for high dimensional sparse matrices. Preprint at bioRxiv https://doi.org/10.1101/2021.04.05.438318 (2021).

Speir, M. L. et al. UCSC Cell Browser: visualize your single-cell data. Bioinformatics 37 , 4578–4580 (2021).

Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9 , 90–95 (2007).

Waskom, M. seaborn: statistical data visualization. J. Open Source Softw. 6 , 3021 (2021).

Harris, C. R. et al. Array programming with NumPy. Nature 585 , 357–362 (2020).


Acknowledgements

We thank M. Ansari who designed the ehrapy logo. The authors thank F. A. Wolf, M. Lücken, J. Steinfeldt, B. Wild, G. Rätsch and D. Shung for feedback on the project. We further thank L. Halle, Y. Ji, M. Lücken and R. K. Rubens for constructive comments on the paper. We thank F. Hashemi for her help in implementing the survival analysis module. This research was conducted using data from the UK Biobank, a major biomedical database ( https://www.ukbiobank.ac.uk ), under application number 49966. This work was supported by the German Center for Lung Research (DZL), the Helmholtz Association and the CRC/TRR 359 Perinatal Development of Immune Cell Topology (PILOT). N.H. and F.J.T. acknowledge support from the German Federal Ministry of Education and Research (BMBF) (LODE, 031L0210A), co-funded by the European Union (ERC, DeepCell, 101054957). A.N. is supported by the Konrad Zuse School of Excellence in Learning and Intelligent Systems (ELIZA) through the DAAD program Konrad Zuse Schools of Excellence in Artificial Intelligence, sponsored by the Federal Ministry of Education and Research. This work was also supported by the Chan Zuckerberg Initiative (CZIF2022-007488; Human Cell Atlas Data Ecosystem).

Open access funding provided by Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH).

Author information

Authors and Affiliations

Institute of Computational Biology, Helmholtz Munich, Munich, Germany

Lukas Heumos, Philipp Ehmele, Tim Treis, Eljas Roellin, Lilly May, Altana Namsaraeva, Nastassya Horlava, Vladimir A. Shitov, Xinyue Zhang, Luke Zappia, Leon Hetzel, Isaac Virshup, Lisa Sikkema, Fabiola Curion & Fabian J. Theis

Institute of Lung Health and Immunity and Comprehensive Pneumology Center with the CPC-M bioArchive; Helmholtz Zentrum Munich; member of the German Center for Lung Research (DZL), Munich, Germany

Lukas Heumos, Niklas J. Lang, Herbert B. Schiller & Anne Hilgendorff

TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany

Lukas Heumos, Tim Treis, Nastassya Horlava, Vladimir A. Shitov, Lisa Sikkema & Fabian J. Theis

Health Data Science Unit, Heidelberg University and BioQuant, Heidelberg, Germany

Julius Upmeier zu Belzen & Roland Eils

Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany

Eljas Roellin, Lilly May, Luke Zappia, Leon Hetzel, Fabiola Curion & Fabian J. Theis

Konrad Zuse School of Excellence in Learning and Intelligent Systems (ELIZA), Darmstadt, Germany

Altana Namsaraeva

Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE), Bonn, Germany

Rainer Knoll

Center for Digital Health, Berlin Institute of Health (BIH) at Charité – Universitätsmedizin Berlin, Berlin, Germany

Roland Eils

Research Unit, Precision Regenerative Medicine (PRM), Helmholtz Munich, Munich, Germany

Herbert B. Schiller

Center for Comprehensive Developmental Care (CDeCLMU) at the Social Pediatric Center, Dr. von Hauner Children’s Hospital, LMU Hospital, Ludwig Maximilian University, Munich, Germany

Anne Hilgendorff


Contributions

L. Heumos and F.J.T. conceived the study. L. Heumos, P.E., X.Z., E.R., L.M., A.N., L.Z., V.S., T.T., L. Hetzel, N.H., R.K. and I.V. implemented ehrapy. L. Heumos, P.E., N.L., L.S., T.T. and A.H. analyzed the PIC database. J.U.z.B. and L. Heumos analyzed the UK Biobank database. X.Z. and L. Heumos analyzed the COVID-19 chest x-ray dataset. L. Heumos, P.E. and J.U.z.B. wrote the paper. F.J.T., A.H., H.B.S. and R.E. supervised the work. All authors read, corrected and approved the final paper.

Corresponding author

Correspondence to Fabian J. Theis.

Ethics declarations

Competing interests

L. Heumos is an employee of LaminLabs. F.J.T. consults for Immunai Inc., Singularity Bio B.V., CytoReason Ltd. and Omniscope Ltd. and has ownership interest in Dermagnostix GmbH and Cellarity. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks Leo Anthony Celi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary handling editor: Lorenzo Righetto, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Overview of the Paediatric Intensive Care Database (PIC).

The database consists of several tables corresponding to several data modalities and measurement types. All tables colored in green were selected for analysis and all tables in blue were discarded based on coverage rate. Despite the high coverage rate, we discarded the ‘OR_EXAM_REPORTS’ table because of the lack of detail in the exam reports.

Extended Data Fig. 2 Preprocessing of the Paediatric Intensive Care (PIC) dataset with ehrapy.

(a) Heterogeneous data of the PIC database was stored in ‘data’ (matrix that is used for computations) and ‘observations’ (metadata per patient visit). During quality control, further annotations are added to the ‘variables’ (metadata per feature) slot. (b) Preprocessing steps of the PIC dataset. (c) Example of the function calls in the data analysis pipeline that resembles the preprocessing steps in (b) using ehrapy.
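ehrapy keeps these three slots in an AnnData object. As a rough illustration of the layout described in (a), here is a minimal sketch built directly with the anndata package; every feature name and value below is invented:

```python
import anndata as ad
import numpy as np
import pandas as pd

# 'data': one row per patient visit, one column per feature
X = np.array([[38.5, 120.0], [36.9, 95.0], [39.2, 130.0]])

# 'observations': metadata per patient visit
obs = pd.DataFrame({"age_group": ["youth", "youth", "infant"]},
                   index=["visit_1", "visit_2", "visit_3"])

# 'variables': metadata per feature; quality control adds columns here
var = pd.DataFrame(index=["temperature_c", "heart_rate"])

adata = ad.AnnData(X=X, obs=obs, var=var)
# Example QC annotation added to the 'variables' slot
adata.var["missing_pct"] = np.isnan(adata.X).mean(axis=0) * 100
print(adata)
```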

Extended Data Fig. 3 Missing data distribution for the ‘youths’ group of the PIC dataset.

The x-axis represents the percentage of missing values in each feature. The y-axis reflects the number of features in each bin, with text labels giving the names of the individual features.
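A histogram of this kind takes only a few lines of pandas; the CSV path below is a hypothetical stand-in for the PIC feature matrix:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("pic_youths.csv")  # hypothetical export of the feature matrix

missing_pct = df.isna().mean() * 100                  # % missing per feature
binned = pd.cut(missing_pct, bins=range(0, 101, 10))  # 10%-wide bins

binned.value_counts().sort_index().plot(kind="bar")
plt.xlabel("Missing values per feature (%)")
plt.ylabel("Number of features")
plt.show()
```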

Extended Data Fig. 4 Patient selection during analysis of the PIC dataset.

Filtering for the pneumonia cohort within the ‘youths’ group removes all care units except the general intensive care unit and the pediatric intensive care unit.
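On the AnnData representation, cohort selection like this reduces to boolean masking over the visit metadata; a sketch with hypothetical column and category names:

```python
# Keep only the two intensive care units, then the pneumonia cohort
keep_units = ["General ICU", "Pediatric ICU"]          # hypothetical unit labels
adata = adata[adata.obs["care_unit"].isin(keep_units)].copy()
adata = adata[adata.obs["diagnosis"] == "pneumonia"].copy()
print(adata.n_obs, "visits remain")
```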

Extended Data Fig. 5 Feature rankings of stratified patient groups.

Scores reflect the z-score underlying the p-value per measurement for each group. Higher scores (above 0) reflect overrepresentation of the measurement compared to all other groups and vice versa. (a) By clinical chemistry. (b) By liver markers. (c) By medication type. (d) By infection markers.
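The caption does not spell out the computation, but a rank-based group z-score of this kind can be sketched with scipy: compare one group against all others with a Mann-Whitney test and recover the z-score behind its p-value (normal approximation, no tie correction):

```python
import numpy as np
from scipy import stats

def group_zscore(values: np.ndarray, groups: np.ndarray, target: str) -> float:
    """Z-score of one measurement for `target` vs. all other groups."""
    in_group = values[groups == target]
    rest = values[groups != target]
    u, _ = stats.mannwhitneyu(in_group, rest, alternative="two-sided")
    # Normal approximation underlying the Mann-Whitney p-value
    mu = len(in_group) * len(rest) / 2
    sigma = np.sqrt(len(in_group) * len(rest) * (len(in_group) + len(rest) + 1) / 12)
    return (u - mu) / sigma
```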

Extended Data Fig. 6 Liver marker value progression for the ‘youths’ group and Kaplan-Meier curves.

(a) Viral and severe pneumonia with co-infection groups display enriched gamma-glutamyl transferase levels in blood serum. (b) Aspartate transferase (AST) and alanine transaminase (ALT) levels are enriched for severe pneumonia with co-infection during early ICU stay. (c) and (d) Kaplan-Meier curves for ALT and AST demonstrate lower survivability for children with measurements outside the norm.
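Kaplan-Meier curves like those in panels (c) and (d) can be produced with the lifelines package; the dataframe and column names below are hypothetical:

```python
import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

df = pd.read_csv("pic_pneumonia_cohort.csv")   # hypothetical: one row per patient
outside = df["alt_outside_norm"].astype(bool)  # hypothetical flag for ALT values

ax = plt.subplot(111)
for label, mask in [("ALT in norm", ~outside), ("ALT outside norm", outside)]:
    kmf = KaplanMeierFitter()
    kmf.fit(df.loc[mask, "los_days"], event_observed=df.loc[mask, "died"], label=label)
    kmf.plot_survival_function(ax=ax)          # plots the curve with its CI band
ax.set_xlabel("Days since ICU admission")
ax.set_ylabel("Survival probability")
plt.show()
```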

Extended Data Fig. 7 Overview of medication categories used for causal inference.

(a) Feature engineering process to group administered medications into medication categories using DrugBank. (b) Number of medications per medication category. (c) Number of patients that received (dark blue) or did not receive (light blue) specific medication categories.
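The grouping step in (a) boils down to a lookup from drug name to category; a toy sketch, with a hand-written stand-in for the DrugBank-derived mapping:

```python
import pandas as pd

# Hypothetical extract of a DrugBank-derived lookup: drug name -> category
drug_categories = {
    "vancomycin": "antibiotic",
    "meropenem": "antibiotic",
    "furosemide": "diuretic",
    "midazolam": "sedative",
}

meds = pd.read_csv("administered_medications.csv")  # hypothetical: patient_id, drug_name
meds["category"] = meds["drug_name"].str.lower().map(drug_categories)

# Number of patients per medication category (compare panel c)
per_category = meds.dropna(subset=["category"]).groupby("category")["patient_id"].nunique()
print(per_category)
```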

Extended Data Fig. 8 UK-Biobank data overview and quality control across modalities.

(a) UMAP plot of the metabolomics data demonstrating a clear gradient with respect to age at sampling, and (b) type 2 diabetes prevalence. (c) Analogously, the features derived from retinal imaging show a less pronounced age gradient, and (d) type 2 diabetes prevalence gradient. (e) Stratifying myocardial infarction risk by the type 2 diabetes comorbidity confirms vastly increased risk with a prior type 2 diabetes (T2D) diagnosis. Kaplan-Meier estimators with 95% confidence intervals are shown. (f) Similarly, the polygenic risk score for coronary heart disease used in this work substantially enriches myocardial infarction risk in its top 5th percentile. Kaplan-Meier estimators with 95% confidence intervals are shown. (g) UMAP visualization of the metabolomics features colored by the assessment center shows no discernible biases. (a–g) n = 29,216.
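ehrapy builds on the scanpy ecosystem, so an embedding like the ones in panels (a–d) can be sketched with scanpy itself; the obs column names are hypothetical:

```python
import scanpy as sc

# `adata` holds the preprocessed metabolomics features
sc.pp.neighbors(adata, n_neighbors=15)  # k-nearest-neighbor graph
sc.tl.umap(adata)                       # 2D embedding
sc.pl.umap(adata, color=["age_at_sampling", "t2d_prevalent"])  # hypothetical columns
```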

Extended Data Fig. 9 UK-Biobank retina derived feature quality control.

(a) Leiden clustering of the retina-derived feature space. (b) Comparison of ‘overall retinal pigment epithelium (RPE) thickness’ values between cluster 5 (n = 301) and the rest of the population (n = 28,915). (c) Outliers in right-eye RPE thickness on the UMAP largely correspond to cluster 5. (d) Log ratio of top and bottom 5 fields in the obs dataframe between cluster 5 and the rest of the population. (e) Image quality of the optical coherence tomography scan as reported in the UKB. (f) Minimum motion correlation quality control indicator. (g) Inner limiting membrane (ILM) quality control indicator. (d–g) Data are shown for the right eye only; comparable results for the left eye are omitted. (a–g) n = 29,216.

Extended Data Fig. 10 Bias detection and mitigation study on the Diabetes 130-US hospitals dataset (n = 101,766 hospital visits, one patient can have multiple visits).

(a) Filtering to the visits of Medicare recipients results in an increase of Caucasians. (b) Proportion of visits where HbA1c measurements are recorded, stratified by admission type. Adjusted P values were calculated with chi-squared tests and Bonferroni correction (adjusted P values: Emergency vs Referral 3.3E-131, Emergency vs Other 1.4E-101, Referral vs Other 1.6E-4). (c) Normalizing feature distributions jointly vs separately can mask distribution differences. (d) Imputing the number of medications for visits. Onto the complete data (blue), MCAR (30% missing data) and MAR (38% missing data) mechanisms were introduced (orange), with the MAR mechanism depending on the time in hospital. Mean imputation (green) can reduce the variance of the distribution under MCAR and MAR mechanisms, and bias the center of the distribution under an MAR mechanism. Model-based imputation such as MissForest can impute meaningfully even in MAR cases when it has access to the variables involved in the MAR mechanism. Each boxplot represents the IQR of the data, with the horizontal line inside the box indicating the median value. The left and right bounds of the box represent the first and third quartiles, respectively. The whiskers extend to the minimum and maximum values within 1.5 times the IQR from the lower and upper quartiles, respectively. (e) Predicting early readmission within 30 days after release on a per-stay level. Balanced accuracy can mask differences in selection and false negative rate between sensitive groups.
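The point of panel (d) can be reproduced in a few lines with scikit-learn, using IterativeImputer with a random-forest estimator as a rough stand-in for MissForest; the data-generating process below is invented purely to create a MAR pattern:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
time_in_hospital = rng.normal(size=1000)
noise = rng.normal(size=1000)
n_medications = 5 + 2 * time_in_hospital + 0.5 * noise
X = np.column_stack([n_medications, time_in_hospital, noise])

# MAR: medication counts go missing more often for long stays
X_mar = X.copy()
X_mar[time_in_hospital > 0.5, 0] = np.nan

mean_imputed = SimpleImputer(strategy="mean").fit_transform(X_mar)
forest_imputed = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=30), random_state=0
).fit_transform(X_mar)

print("true mean:        ", X[:, 0].mean())
print("mean imputation:  ", mean_imputed[:, 0].mean())    # biased low under MAR
print("forest imputation:", forest_imputed[:, 0].mean())  # closer to the truth
```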

Supplementary information

Supplementary Tables 1 and 2

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Heumos, L., Ehmele, P., Treis, T. et al. An open-source framework for end-to-end analysis of electronic health record data. Nat Med (2024). https://doi.org/10.1038/s41591-024-03214-0


Received: 11 December 2023

Accepted: 25 July 2024

Published: 12 September 2024

DOI: https://doi.org/10.1038/s41591-024-03214-0


Anesthesia Alert: How Does Wildfire Smoke Impact Anesthesia?

By: Joe Paone | Senior Editor

Published: 9/10/2024

New research ignites discussion about protocols for high-risk patients.

Should anesthesia providers in areas where wildfires are burning change their practices to account for the possible effects of extremely poor air quality on their patients?

That question and many others are being asked in the wake of an article published last month in the Online First edition of Anesthesiology, the peer-reviewed journal of the American Society of Anesthesiologists (ASA).

Senior Author Vijay Krishnamoorthy, MD, MPH, PhD, chief of the Critical Care Medicine Division and associate professor of anesthesiology and population health sciences at Duke University School of Medicine in Durham, N.C., says exposure to wildfire smoke can damage the heart, lungs and other organs, and lead to potentially worsening outcomes for patients undergoing anesthesia and surgery.

Spreading like wildfire

According to ASA, nearly 100 wildfires were raging in August, burning more than two million acres across the U.S., and recent data suggests the number of wildfires will grow by 30% by 2050.

Dr. Krishnamoorthy and his colleagues recognize an increasingly urgent need to determine exactly how exposure to wildfire smoke negatively affects patients undergoing anesthesia and surgery and their ultimate outcomes. Just as importantly, they believe guidance should be developed on how providers can best manage such patients perioperatively.

They report that wildfire smoke exposure can exacerbate some patients’ existing comorbidities and chronic diseases. It poses significant risks to those with preexisting heart and lung disease, obese patients, infants and young children, the elderly and other vulnerable groups. It is known to cause inflammation, worsen heart and lung disease, and impact pregnancy outcomes.

Anyone familiar with the harm surgical smoke exposure can cause will immediately recognize the toxic nature of wildfire smoke, which contains a mix of fine particles and chemicals that produce inflammation and oxidative stress. Researchers say exposure is proven to impact patients with cardiovascular disease, possibly contributing to heart attack, heart rhythm abnormalities, heart failure or stroke.

The ‘known knowns’ of wildfire smoke exposure

“What’s currently known is that the increase in wildfires is slowly but surely occurring across the world,” says Dr. Krishnamoorthy. “We also know for sure that certain patients are vulnerable to worse outcomes during periods of high wildfire smoke — the very elderly, the very young, the pregnant population and patients with a significant chronic disease, particularly cardiopulmonary disease. We know that, among those patients, and in these settings, we have certainly observed greater exacerbations of stroke, cardiac disease, pulmonary disease and pregnancy outcomes.”

What’s unknown right now is how anesthesia providers and the surgical centers where they work should handle high-risk patients in the midst of, or in the wake of, a wildfire in their area. What clinical steps should providers take with their patients and protocols to avoid adverse outcomes? The answer, says Dr. Krishnamoorthy, is that right now we just don’t know; little is known about the relationship between wildfire smoke exposure, anesthesia and surgery.

“More research is needed to guide what specific operational changes need to take place,” says Krishnamoorthy. “Not only to increase awareness about the impacts of wildfire smoke on acute illness, but also to hopefully contribute to setting a research agenda for this topic that I hope will inform more robust guidelines to truly guide patient care.”

Dr. Krishnamoorthy’s research group is doing its part to fill those gaps through a mathematical model they are developing that aims to link geographic smoke exposure data to databases that include surgical outcomes. The team hopes that this and other lines of research will help providers develop guidelines for assessing and managing patient risks.

“While I think the perioperative community definitely appreciates the impact of climate change on society, I think understanding the specific impacts on perioperative care is less appreciated,” says Dr. Krishnamoorthy. “That was one of the big goals of the paper, just to put this on the radar.” For now, Dr. Krishnamoorthy and others in the industry are making their best educated guesses on how to better ensure patient safety during times of wildfire. “At this point, we want to make sure the patient is not having an exacerbation of any of their chronic disease during a period of high wildfire smoke,” he says. “Additionally, we would potentially want to consider closer monitoring of these patients in the perioperative setting.”

Dr. Krishnamoorthy believes that most patients should be able to safely undergo most elective surgeries during times of wildfire. “I think the groups we should be thinking about are the very vulnerable patient population groups, especially in areas of very high wildfire exposure,” he says.

In practice

Michael A. MacKinnon, DNP, FNP-C, CRNA, owner of MacKinnon Anesthesia in Show Low, Ariz., says he has seen wildfire smoke exposure cause major respiratory issues in already compromised patients. Once, he and his colleagues evacuated patients from a hospital due to a wildfire in the area. Another time, his facility was placed on “get set” status as it began “snowing” ash outside.

“Currently, we do not have any specific protocols in place solely for addressing the impacts of wildfire smoke inhalation in the perioperative setting,” he says. “However, during wildfire season when air quality is particularly poor, we elevate our clinical awareness regarding patients with certain respiratory conditions that are especially vulnerable to exacerbation from smoke inhalation.”

Dr. MacKinnon says that while chronic obstructive pulmonary disease and asthma are the more commonly recognized conditions to consider, diseases such as interstitial lung disease, bronchiectasis, pulmonary fibrosis and certain forms of restrictive lung disease can also be significantly impacted by wildfire smoke exposure. Patients with cardiovascular comorbidities such as congestive heart failure or ischemic heart disease can experience worsened symptoms due to the systemic inflammatory response triggered by inhaled particulate matter, he adds. “Though there isn’t a formal policy, we take a more cautious approach in evaluating patients with these conditions during fire season,” he says.

That cautious approach entails heightened vigilance during pre-op assessments, with careful attention to recent exacerbations, changes in medication usage or new symptoms that could indicate impaired respiratory function.

“For patients with these at-risk conditions, we might consider additional pulmonary function tests or chest imaging depending on the severity of their symptoms and overall risk profile,” says Dr. MacKinnon. “Intraoperatively, we remain mindful of the potential for increased airway reactivity and the need for meticulous management of ventilation.”

Dr. MacKinnon fully agrees with Dr. Krishnamoorthy that further research is needed. “It could lead to better-defined protocols, evidence-based guidelines and, ultimately, improved patient outcomes. The clinical experiences of those of us on the front lines certainly suggest that further investigation is needed,” he says. OSM


The Big City Is Vibrant. Birds There Might Be Getting Less So.

Recent studies show that certain feather pigments can help neutralize toxic pollution. It means darker, duller birds could have a survival advantage.

[Photo: two small birds held gently in a human hand; one has a very pale yellow breast, the other a bright yellow breast.]

By Marta Zaraska

Some popular city dwellers appear to be losing their colorful allure, and not just the dirty birds.

According to a study published this summer in the journal Landscape and Urban Planning that looked at 547 bird species in China, birds that live in cities are duller and darker on average than their rural counterparts. A similar conclusion emerged from an analysis of 59 studies published in March in Biological Reviews: urban feathers are not as bright, with yellow, orange and red feathers affected most.

Often, city birds are covered in grime. But even if you could give them all a good bird bath, chances are their brightness still wouldn’t match that of their country cousins. That’s because of the way pollution, and heavy metals in particular, can interact with melanin, a pigment that makes feathers black, brown and gray.

Studies show that melanin can bind to heavy metals like lead. That means toxic chemicals may be more likely to be stored in plumage in darker and duller birds. And that, in turn, can confer a survival advantage.

“The more melanin you accumulate, the better able you are to sequester these harmful compounds in feathers,” said Kevin McGraw, a biologist at Michigan State University who studies the colors of animals to understand the costs, benefits and evolution of visual signals.

Urban pollution affects avian colors in other ways, too. Research shows that, compared with rural plants, city trees store fewer natural pigments called carotenoids. And pollution is the likely reason. Carotenoids are produced by plants, algae and fungi. They’re what makes red peppers red and carrots orange.

When leaves are low on these pigments, the effects go up the food chain: Leaf-munching caterpillars become deficient in carotenoids, and so do caterpillar-munching birds.



Building better bureaucracy

Research Highlights Article

September 13, 2024

The effects of the Pendleton Act on the quality of public services in the United States

Tyler Smith


Source: Library of Congress, Washington, D.C. (file no. LC-DIG-ppmsca-15781)

A competent civil service is widely seen by political economists and policymakers as crucial for good government and long-run economic development. But there is little empirical evidence that pinpoints the reforms needed to create effective state institutions.

In a paper in the American Economic Review, authors Abhay Aneja and Guo Xu investigate historic changes to the federal civil service in the United States and show that taking bureaucratic staffing decisions out of the hands of elected politicians can have a significant positive impact on the quality of public service delivery.

For much of the 19th century, staffing the US federal government was based on a so-called “spoils system.” In return for financial and political support, politicians who won the presidency rewarded their party loyalists with bureaucratic jobs. The system was ripe for abuse and helped powerful political machines dominate politics during the Gilded Age.

In 1881, the assassination of President James Garfield helped galvanize pro-reform sentiment that had been growing since the end of the Civil War. Garfield’s assassin, Charles Guiteau, was motivated in part by a sense that Garfield owed him an office appointment, a fact which reformers used to push for change. President Chester Arthur signed the Pendleton Civil Service Reform Act into law in 1883. The bill called for the open selection of government employees based on merit.

Initially, the Pendleton Act  shielded only a modest fraction of federal workers from political interference. But it grew to cover nearly all federal jobs as subsequent administrations built on the reforms. 

To quantify the initial impact of the Pendleton Act, the authors analyzed arguably the most important federal institution of the 19th century: the US Postal Service. In terms of personnel, the postal service was, and remains today, the largest civilian agency within the federal government.

The authors combined information on delivery errors, the number of workers, and personnel characteristics to measure the performance and productivity of local post offices before and after the reforms were rolled out. In particular, the reforms came in two major waves, one in 1883 and a second in 1893, which allowed the researchers to examine how performance measures changed in post offices across hundreds of cities from 1879 to 1901.
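The staggered 1883/1893 rollout is the textbook setting for a two-way fixed-effects (difference-in-differences) comparison. A minimal sketch with statsmodels, assuming a hypothetical city-year panel of delivery-error rates with a 0/1 reform indicator:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: one row per city and year, 1879-1901,
# with columns city, year, error_rate, reformed (0/1)
panel = pd.read_csv("post_office_panel.csv")

# City and year dummies absorb level differences; the coefficient
# on 'reformed' estimates the post-reform change in delivery errors
model = smf.ols("error_rate ~ reformed + C(city) + C(year)", data=panel)
result = model.fit(cov_type="cluster", cov_kwds={"groups": panel["city"]})
print(result.params["reformed"])
```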


When they compared reformed cities to non-reformed cities, they found substantial improvements in quality. The number of delivery errors decreased by roughly 20 percent, which suggests a significant increase in reliability. The reforms also led to increases in the overall volume per mail carrier and cost per volume, indicating large productivity gains.

While the researchers initially thought these gains might be driven by the rule-based, open selection of employees based on merit, that’s not what they found.

“It turns out that the first-order effect that seems to be driving this performance improvement is a reduction in turnover,” Xu said. “We're talking about clerks and carriers in a local post office getting fired and replaced with every election. It's exactly when those turnover events happen during elections that we see the performance gains show up.”


Overall, the findings indicate that by insulating federal employees from politics, the Pendleton Act made the federal government a more stable organization capable of preserving institutional knowledge. Today, extremely few positions in the federal government, apart from cabinet-level appointments, are filled by the president.

“On the one hand, it seems crazy that the president wouldn't be able to control people through firing and hiring; you should be able to control your workers to implement your plan. But it's a double-edged sword,” Xu told the AEA in an interview. “If you give the politician all that discretion, they can also abuse it for private political gains.”

“Strengthening State Capacity: Civil Service Reform and Public Sector Performance during the Gilded Age” appears in the August 2024 issue of the American Economic Review.

