Ethics of AI: A systematic literature review of principles and challenges

Ethics in AI has become a global topic of interest for both policymakers and academic researchers. In the last few years, various research organizations, lawyers, think tanks and regulatory bodies have become involved in developing AI ethics guidelines and principles. However, there is still debate about the implications of these principles. We conducted a systematic literature review (SLR) study to investigate the agreement on the significance of AI principles and to identify the challenging factors that could negatively impact the adoption of AI ethics principles. The results reveal a global convergence set of 21 ethical principles and 15 challenges. Transparency, privacy, accountability and fairness are identified as the most common AI ethics principles. Similarly, lack of ethical knowledge and vague principles are reported as the most significant challenges to considering ethics in AI. The findings of this study are the preliminary inputs for proposing a maturity model that assesses the ethical capabilities of AI systems and provides best practices for further improvements.

1 Introduction

Artificial intelligence (AI) technologies are considered important across a vast array of industries, including health, manufacturing, banking and retail [1]. However, the promises of AI systems, such as improved productivity, reduced costs, and increased safety, are now weighed against the worry that these complex systems might bring more ethical harm than economic good [1].

Artificial intelligence (AI) and autonomous systems have a significant effect on the development of humanity [2]. The autonomous decision-making nature of these systems raises fundamental questions: what are the potential risks involved in these systems, how should they perform, how can they be controlled, and what should be done with AI-based systems? [2]. Autonomous systems go beyond automation in that they are characterised by decision-making capabilities. The development of autonomous system components, such as intelligent awareness and self-decision making, is based on AI concepts.

There is a political and ethical discussion about developing policies for different technologies, including nuclear power and manufacturing, to control the ethical damage they could bring. The same potential for ethical harm also exists in AI systems; more specifically, they might end human control [2]. Real-world failure and misuse incidents of AI systems have created demand for, and discussion of, AI ethics [3]. Ethical studies of AI technologies have revealed that AI and autonomous systems should not be considered solely a technological effort. There is a broad discussion that the design and use of AI-based systems are culturally and ethically embedded [4]. Developing AI-based systems requires not only technical effort but also attention to economic, political, societal, intellectual and legal aspects [4]. These systems significantly impact the cultural norms and values of people [4]. The AI industry, and specifically its practitioners, should have a deep understanding of ethics in this domain. Recently, AI ethics has received press coverage and public attention, which supports significant related research [3]. However, the topic is still not sufficiently investigated, either academically or in real-world environments [4]. Very few academic studies have been conducted on this topic, and it remains largely unknown to AI practitioners. The Ethically Aligned Design (EAD) guidelines of IEEE [6] mention that ethics in AI is still far from mature in industrial settings [5]. The AI industry's limited knowledge of ethics creates a gap, which indicates the need for further academic and practice-oriented research.

The aim of this study is to conduct a systematic literature review (SLR) of the available literature to identify AI ethics principles. Moreover, the SLR uncovers the key challenging factors that demotivate the consideration of ethics in AI. The following research questions were developed to achieve these core objectives:

RQ1: What are the key principles of AI ethics?

RQ2: What are the challenges of adopting ethics in AI?

The remaining content of the paper is structured as follows: Section 2 presents the background of the study, and the research methodology is reported in Section 3. The SLR data are provided in Section 4, and the results and analysis are discussed in Section 5. Finally, Section 6 provides an overview of threats to the validity of the study, and Section 7 concludes the findings with future directions.

2 Background

The implementation of AI or machine intelligence concepts has brought a technological revolution that is changing both science and society. The transfer of power from humans to machines has sparked an important societal debate about the principles and policies that should guide the use and deployment of AI systems [7]. Various organizations have formed ad hoc committees to draft policy documents for AI ethics [7]. In 2018, technology corporations such as SAP and Google publicly introduced guidelines and policies for AI-based systems [7]. Similarly, Amnesty International, the Association for Computing Machinery (ACM) and Access Now have come up with principles and recommendations for AI technologies. The European Commission's guidelines for Trustworthy AI were developed with the aim of promoting lawful, ethically sound and robust AI systems [8]. The report "Preparing for the Future of Artificial Intelligence", prepared by the Obama administration, presents a thorough survey of current AI research, its applications and its impact on society [9]. The report further presents recommendations for future AI-related actions. The "Beijing AI Principles" [10] propose various principles for AI research, development, use and governance. These principles present a framework that focuses on AI ethics.

The world's largest technical professional organization, IEEE, launched the Ethically Aligned Design (EAD) guidelines [6], which provide a framework for addressing the ethical and technical values of AI systems through a set of principles and recommendations. The EAD framework consists of the following eight general principles to guide the development and implementation of AI-based systems: human rights, well-being, data agency, effectiveness, transparency, accountability, awareness of misuse and competence. Organizations such as ISO and IEC have also embarked on developing standards for AI [11]. ISO/IEC JTC 1/SC 42 is a joint ISO/IEC international standards committee that focuses on the entire AI ecosystem, including ethical and social concerns, standardization, AI governance, AI computational approaches and trustworthiness [11]. The efforts of different organizations to shape AI ethics indicate not only the need for guidelines, tools and techniques, but also the interest of these organizations in managing ethics in a way that meets their respective priorities.

However, recently published studies report that the existing guidelines developed for the ethics of AI are neither effective nor adopted in practice [12]. This is evident from the empirical study conducted by McNamara et al. [13], which tested the influence of the ACM code of ethics on decision-making in software development. The results of the study revealed that the ACM code of ethics has no observable impact on ethical decision making. The lack of effective techniques makes it challenging to successfully scale the available guidelines into practice [12]. Vakkuri et al. [12] used the accountability, responsibility and transparency (ART) framework [14] to develop a conceptual model for exploring ethical consideration in AI environments. The conceptual model was empirically validated through multiple case studies. The empirical results conclude that AI ethics principles are still not in practice, although some related concepts, such as documentation, are considered. Moreover, the study findings revealed that practitioners do consider the social impact of AI systems [12].

There are no tools, methods or frameworks that bridge the gap between AI principles and their implementation in practice. Further studies should be conducted in this area that explicitly discuss AI ethics principles and challenges and provide evaluation standards or models that guide the AI industry in considering ethics in practice.

3 Research Method

The systematic literature review (SLR) approach is used to explore the available primary studies. SLR is a widely adopted literature survey method in the evidence-based software engineering domain. An SLR is "a means of evaluating and interpreting all available research relevant to a particular research question, topic area, or phenomenon of interest" [15]. The Kitchenham and Charters [15] SLR guidelines are used to conduct this study and systematically address the research questions. The SLR process plan is provided in Fig. 1 and thoroughly discussed in the following sections.

[Fig. 1: SLR process plan]

3.1 Research questions (RQs)

Research question development is the most significant phase of an SLR study [15]. Developing research questions requires a deep understanding of the research area in general and the research problem in particular. We primarily studied relevant articles [3-9] to better understand the problem and develop the questions of interest. The questions were finalised based on the research concepts discussed in these sources [3-9]. The details of the research questions are provided in Section 1.

3.2 Data sources

The authors held a series of team discussions to identify the list of digital data sources. The selected digital repositories were explored to extract the relevant data needed to address the given research questions (see Section 1). The following digital libraries were selected based on the authors' SLR experience, the team discussions and the guidelines provided by Chen et al. [16]: Springer Link, Science Direct, IEEE Xplore, Wiley Online Library and ACM Digital Library. These are world-leading digital data sources that collect a large number of original information and communication technology studies [16].

3.3 Search strategy

The research questions were analysed by the second and third authors to extract the terms and keywords used for the search process. All the authors participated in a group discussion to finalise the search terms used to retrieve the relevant data from the selected repositories. Pilot search terms and strings were tried, which finally contributed to the following agreed search string:

("artificial intelligence ethics" OR "AI ethics" OR "machine learning ethics" OR "software ethics") AND (“resistance” OR “barriers” OR “limitations” OR “challenges”)

The terms "principles" and "guidelines" were excluded from the final search string because they return irrelevant data from various other domains. During the pilot attempts, the given search string was specifically tested for its ability to retrieve data related to AI "principles" and "guidelines", and we observed that, even without those terms, it precisely returned the desired results for RQ1.

The search terms were concatenated using the "AND" and "OR" operators to develop the search strings. Each selected digital repository has a customised search mechanism, so the search strings were executed using the personalised search facilities of the electronic data sources.
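For illustration only (not part of the original review protocol), the agreed search string can be assembled from its two keyword groups before being adapted to each repository's search syntax; the term lists below are copied from the string above.

```python
# Illustrative sketch: assembling the agreed search string from its two
# keyword groups; repositories then receive syntax-adapted variants.

TOPIC_TERMS = ["artificial intelligence ethics", "AI ethics",
               "machine learning ethics", "software ethics"]
CHALLENGE_TERMS = ["resistance", "barriers", "limitations", "challenges"]

def or_group(terms):
    """Quote each term and join the group with OR, wrapped in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

search_string = or_group(TOPIC_TERMS) + " AND " + or_group(CHALLENGE_TERMS)
print(search_string)
```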

3.4 Inclusion/Exclusion criteria

The inclusion/exclusion criteria were developed to filter the search results and remove irrelevant, inaccessible, redundant and low-quality studies. The criteria were drafted by the first and fifth authors and finalised by all the authors in a regular consensus meeting (see Table 1).

No. Inclusion criteria
In1 Consider articles that specifically focus on AI ethics.
In2 Consider primary studies published in conferences, research workshops, book chapters, journals and magazines.
In3 Consider studies that are peer-reviewed and available in full text.
In4 Consider studies written in the English language.
No. Exclusion criteria
Ex1 If two studies are published from the same project, exclude the one with the smaller contribution.
Ex2 Exclude grey literature material.
Ex3 Remove duplicate studies.
Ex4 Exclude studies that discuss ethics in domains other than AI.

3.5 Study selection

The search string discussed in Section 3.3 was used to explore the selected digital repositories. The search process started on 23rd December 2020 and ended on 5th February 2021. The search string retrieved a total of 811 studies in the first phase, which were filtered based on study title, abstract and keywords, yielding 60 studies (see Fig. 2). In the second phase of the selection process, inclusion/exclusion of these 60 studies was performed based on a full-text review. This SLR process shortlisted 24 primary studies. Moreover, backward snowballing [17] was performed to search the references of the selected 24 studies. Backward snowballing was previously used by Tingting et al. [18] to explore text analysis techniques in software architecture, and we used it to examine the reference lists of the selected primary studies for relevant studies missed during the SLR process. This step identified 5 additional studies, which were further filtered using the inclusion/exclusion criteria (see Fig. 2). Eventually, only 3 of these studies fulfilled the selection criteria, and the final data set consists of a total of 27 primary studies (24 SLR + 3 backward snowballing). The final set of selected studies is provided in Appendix A, where each study is labelled [Sn] to differentiate it from the general list of references.

[Fig. 2: Study selection process]
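As a minimal sketch, the selection funnel of Fig. 2 can be summarised in a few lines of code; the counts are taken from the text above, while the phase labels are ours.

```python
# Selection funnel reported above; counts from the text, labels ours.
phases = [
    ("Search string results", 811),
    ("After title/abstract/keyword filtering", 60),
    ("After full-text review (SLR set)", 24),
    ("Backward snowballing candidates", 5),
    ("Snowballing studies passing the criteria", 3),
]
for label, count in phases:
    print(f"{label}: {count}")

final_set = 24 + 3  # SLR set + snowballing set
assert final_set == 27
```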

3.6 Quality assessment (QA)

Assessment criteria were developed to evaluate the quality of the selected primary studies and reduce research bias. The quality assessment phase interprets the significance and completeness of each selected primary study [15]. The QA criteria checklist provided by Kitchenham and Charters [15] was analysed to design the QA questions provided in Table 2. Each selected primary study was evaluated against the quality assessment questions (QA1-QA6). A score of 1 was assigned if the study comprehensively addresses a quality assessment question (see Table 2). Similarly, 0.5 points were assigned to studies that partially address a QA question, and studies with no evidence of addressing a QA question were assigned 0 points. A scoring sketch follows Table 2.

No. Assessment Questions Score
QA1 Does the adopted research method address the research problem? 1/0.5/0
QA2 Does the study have clear research objectives? 1/0.5/0
QA3 Does the study explicitly discuss the proposed research approach? 1/0.5/0
QA4 Does the study clearly report the experimental setting? 1/0.5/0
QA5 Are the study results and findings systematically discussed? 1/0.5/0
QA6 Does the study present the real-world implications of the research? 1/0.5/0
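To make the scoring scheme concrete, the sketch below applies the 1/0.5/0 scale over QA1-QA6; the example scores are hypothetical and are not taken from Appendix A.

```python
# Illustrative QA scoring helper; the sample scores below are made up.
ALLOWED_SCORES = {1.0, 0.5, 0.0}  # fully / partially / not addressed

def qa_total(scores):
    """Validate six per-question scores (QA1-QA6) and return their sum."""
    if len(scores) != 6:
        raise ValueError("expected one score per question QA1-QA6")
    if any(s not in ALLOWED_SCORES for s in scores):
        raise ValueError("each score must be 1, 0.5, or 0")
    return sum(scores)

print(qa_total([1, 1, 0.5, 1, 0, 1]))  # -> 4.5 for a hypothetical study
```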

3.7 Data extraction

The relevant data to address the RQs were collected by thoroughly reading the selected primary studies and extracting the AI ethics principles (RQ1) and challenges (RQ2). The extracted data were recorded in Excel sheets. Most of the data were collected by the second and third authors, who also assessed the quality of the primary studies based on the criteria discussed in Section 3.6. Moreover, the first, fourth and fifth authors participated in a review meeting to finalize the QA score of each study (Appendix A).

4 Reporting the review

The data collected from the 27 selected primary studies are analyzed and discussed in the following sections.

4.1 Temporal distribution

The year-wise distribution of the primary studies is shown in Fig. 3. Of the 27 studies, 2, 19, 4 and 2 were published in 2021 (up to 5th February), 2020, 2019 and 2018, respectively. The first relevant study was found in 2018 and, since then, there has been a gradual increase in the number of research publications. The SLR string was last executed on 5th February 2021; therefore, the given results cover only the first two months of 2021. The increasing number of publications reveals that AI ethics is a significant, state-of-the-art research direction, and there is still a need for substantial research to explore ethics in AI.

[Fig. 3: Year-wise distribution of the selected primary studies]

4.2 Publication type

The selected primary studies are classified into four major types: journal, conference (including workshop), book chapter and magazine. Fig. 4 shows that 19 (70%) studies were published in journals, 3 (11%) in conferences, 4 (15%) as book chapters and 1 (4%) as a magazine article. We noticed that journals are the most active venues for publishing relevant studies.

[Fig. 4: Publication types of the selected primary studies]

5 Detailed results and analysis

The detailed results addressing RQ1 and RQ2 are discussed in the following sections.

5.1 RQ1 (AI Ethics Principles)

The final set of primary studies consists of 27 articles, from which a total of 21 AI ethics principles were extracted. The identified principles, along with their respective references, are provided in Table 3; a counting sketch follows the table. Moreover, a word cloud was generated to graphically represent the significance of the reported principles (see Fig. 5). Of the 21 principles, transparency (n=17) is the most frequently mentioned, followed by privacy (n=16). The third and fourth most common principles are accountability (n=15) and fairness (n=14), respectively.

[Fig. 5: Word cloud of the identified AI ethics principles]

Principle-Id Principles Reference Frequency
P-01 Transparency [S1][S2][S3][S4][S5][S6][S7][S8][S9][S10] [S11][S12][S13][S14][S15][S16][S17][S25] 17
P-02 Privacy [S1][S2][S18][S3][S5][S7][S8][S21][S22] [S9][S11][S13][S14][S15][S16][S17] 16
P-03 Accountability [S1][S2][S3][S4][S5][S6][S7][S9][S10][S12] [S13][S14][S15][S17][S25] 15
P-04 Fairness [S1][S18][S4][S5][S6][S7][S8][S21] [S10][S11][S13][S14][S15][S17] 14
P-05 Autonomy [S1][S2][S18][S8][S21][S22][S19][S16][S20] [S25] 10
P-06 Explainability [S6][S19][S9][S11][S13][S14][S16][S20] 8
P-07 Justice [S1][S18][S3][S6][S8][S19][S20] 7
P-08 Non-maleficence [S1][S18][S5][S6][S19][S9][S20] 7
P-09 Human Dignity [S1][S5][S21][S22][S10][S16] 6
P-10 Beneficence [S1][S18][S21][S22][S19][S20] 6
P-11 Responsibility [S1][S2][S5][S9][S12] 5
P-12 Safety [S2][S6][S7][S10][S25] 5
P-13 Data Security [S2][S9][S16][S17] 4
P-14 Sustainability [S1][S6] 2
P-15 Freedom [S1] 1
P-16 Solidarity [S1] 1
P-17 Prosperity [S2] 1
P-18 Effectiveness [S2] 1
P-19 Accuracy [S2] 1
P-20 Predictability [S5] 1
P-21 Interpretability [S6] 1
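The frequencies in Table 3 amount to counting, for each principle, the primary studies that report it. The sketch below shows one way to derive such counts from the extraction sheet; the study-to-principles mapping is abridged and illustrative, not the actual extracted data.

```python
from collections import Counter

# Abridged, illustrative extraction data: one entry per primary study.
principles_per_study = {
    "S1": ["Transparency", "Privacy", "Accountability", "Fairness"],
    "S2": ["Transparency", "Privacy", "Accountability"],
    # ... entries for S3-S27 would follow from the extraction sheet
}

frequency = Counter(p for plist in principles_per_study.values()
                    for p in plist)
for principle, count in frequency.most_common(4):
    print(principle, count)
```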

5.1.1 Transparency

Transparency of operations is a major concern in AI/autonomous systems [S5]. It answers how and why a specific decision was made by the system, and it underpins related constructs such as interpretability and explainability. Transparency should not only be considered for AI system operations but must also be part of the technical process [S5], so that decision-making actions become more transparent and trustworthy. Both operational and technical transparency could be achieved by developing standards and models that measure and verify levels of transparency. Such standards could assist AI system development organizations in assessing their level of transparency and provide best practices for further improvement. Moreover, transparency should be considered for a wide range of system stakeholders; however, the appropriate level of transparency may vary among them [S4].

5.1.2 Privacy

AI/autonomous systems must assure user and data privacy throughout the system lifecycle. Privacy can broadly be defined as "the right to control information about oneself" [S22]. Regulatory institutions are consistently involved in establishing legislation for data privacy and protection [S7]. However, privacy becomes more challenging in data-driven AI environments, where the system subsequently processes user data through cleaning, merging and interpretation [S7]. Data access in self-governing AI systems raises the primary concern of data privacy, which is commonly related to security and transparency [S21]. It is worth noting that AI technologies bring complex challenges associated with data privacy and integrity, which demand more future research [S22].

5.1.3 Accountability

Accountability is the third most frequently reported principle and specifically focuses on liability issues [S5]. It refers to safeguarding justice by assigning responsibility and preventing harm [S3]. Stakeholders must be accountable for system decisions and actions in order to minimize culpability problems [S4, S5]. Both technical and social accountability must be ensured before and after system development, implementation and operation [S5]. Accountability is closely linked with transparency because a system must be understood before liability decisions can be made [S5].

5.1.4 Fairness

Fairness is considered a significant principle of AI ethics. Discrimination between individuals or groups by decision-making systems leads to ethical fairness problems, which impact public values including dignity and justice [S11]. Avoiding unfair biases in AI systems could foster social fairness. AI and autonomous systems should not deceive people by impairing their autonomy [S4]. This could be achieved by explicitly making the decision-making process more transparent and identifying the accountable entities.

Analysis. Based on the SLR findings, we identified that the above principles received significant attention; they are compatible with the widely adopted accountability, responsibility and transparency (ART) framework [14] for ethics in AI. Responsibility is not a highly cited principle in the selected primary studies, perhaps because it is considered closely associated with accountability [8]. Moreover, Vakkuri et al. [S5] developed a relational framework based on the key ART constructs with an additional fairness principle. The framework was empirically evaluated to capture the opinions and perceptions of practitioners [S5]. However, the findings of their study are based only on these major principles and do not consider the other significant principles reported in Table 3.

5.2 RQ2 (Challenges)

The systematic review of the 27 primary studies returned a total of 15 challenging factors (see Table 4). The frequencies of the identified challenging factors are provided in Fig. 6; moreover, a word cloud was generated to demonstrate the significance of the reported factors (see Fig. 7 and the generation sketch after Fig. 6). The details of the most frequently cited challenges are as follows:

[Fig. 6: Frequencies of the identified challenging factors]
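Word clouds such as those in Figs. 5 and 7 can be generated directly from the frequency data; the sketch below assumes the third-party Python wordcloud package and abridges the frequencies from Table 4.

```python
# Sketch assuming the third-party "wordcloud" package (pip install wordcloud);
# frequencies abridged from Table 4.
from wordcloud import WordCloud

challenge_freq = {
    "lack of ethical knowledge": 5,
    "vague principles": 3,
    "highly general": 2,
    "conflict in practice": 2,
    # ... the remaining factors from Table 4
}

cloud = WordCloud(width=800, height=400, background_color="white")
cloud.generate_from_frequencies(challenge_freq)
cloud.to_file("challenges_wordcloud.png")
```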

5.2.1 Lack of ethical knowledge

Lack of ethical knowledge is one of the main reasons that AI ethics in practice is still far from mature [S14]. AI system development organizations believe that government institutions are not in a position to provide experts in this emerging area, while some opine that establishing ethics in AI is not possible without a political approach [S15]. Similarly, management and technical staff are not aware of the moral and ethical complexity of AI systems. AI ethics is in its infancy, and not enough ethical standards and frameworks are available that provide detailed guidelines to the AI industry.

5.2.2 Vague principles

There are various AI ethics principles, as discussed in Section 5.1. In practice, however, the majority of organizations are reluctant to adopt these principles because they are highly vague in their definitions [S23]. For example, it is not clear how specifically to consider "fairness" and "human dignity" in AI ethics [S17]. It is very challenging to consider AI ethics in real-world settings using such vaguely formulated principles [S3].

[Fig. 7: Word cloud of the identified challenging factors]

5.2.3 Highly general

The available principles are highly general and too broad in concept to be applied specifically in the AI industry [S18]. They are subjective terms that are also used in various domains other than AI. Policymakers involved in drafting AI ethics principles might not have a strong technical understanding of AI system development processes, which makes the principles more general and ambiguous.

5.2.4 Conflict in practice

Organizations, committees and groups involved in developing AI ethics guidelines and principles have conflicting opinions regarding the real-world implementation of AI ethics [S13, S16]. For example, the UK House of Lords suggested that robots should not operate on their own but should be guided by human beings [S10]; on the other hand, in various hospitals, robots make autonomous decisions in diagnostic and surgical endeavours. This shows a conflict of interpretation and understanding of AI ethics in practice.

5.2.5 Interpret principles differently

AI ethics principles are widely considered ambiguous and general by the majority of organizations [S20]. It has been found that tech firms involved in the development of AI and autonomous systems follow ethical guidelines based on their own understandings [S27]. There are no universally agreed ethical principles that can bring all institutions onto the same page.

5.2.6 Lack of technical understanding

Policymakers' lack of technical knowledge makes putting AI ethics into practice a challenging effort [S10, S13]. They are not aware of the technical aspects of AI systems, the advancement of AI technologies or their limitations. This lack of technical understanding creates a gap between system design and ethical thinking [S10]. Ethicists must be capable of grasping technical knowledge through their ethical framework [S10].

Analysis. The above-reported challenges provide an overview of the most common and frequently cited factors that could be potential barriers to scaling ethics in AI. Lack of ethical knowledge is identified as the most common challenge of AI ethics. Major ethical mistakes are made because of a lack of moral awareness of the specific problem [S14]. Practitioners consider software development activities their main responsibility and have limited interest in considering ethical aspects [S5]. The ethical uncertainty in AI systems can only be diminished by acquiring ethical knowledge. Continuous awareness of ethical policies, codes and regulations helps to properly manage the ethical values in AI and autonomous systems.

We noticed that very few studies have been published in which the barriers to AI ethics are directly or indirectly mentioned. This is evident from the frequency distribution of the challenging factors given in Table 4. This finding reveals that the study of AI ethics challenges is a very young field, requiring considerable research effort from diverse disciplines to mature. The significance of AI technologies in various sectors calls for urgent research to uncover the challenges that hinder the process of considering ethics in AI.

Moreover, the challenging factors with low frequencies are not discussed in detail because of page limitations. However, the complete list of identified factors is provided in Table 4.

Challenge-Id Challenges Reference
Ch-01 Lack of ethical knowledge [S14], [S15], [S18], [S21], [S23]
Ch-02 Vague principles [S3], [S17], [S24]
Ch-03 Highly general [S19], [S20]
Ch-04 Conflict in practice [S19], [S16]
Ch-05 Interpret principles differently [S19], [S20]
Ch-06 Lack of technical understanding [S10], [S13]
Ch-07 Extra constraints [S11], [S24]
Ch-08 Lacking monitoring bodies [S20]
Ch-09 No Legal frameworks [S20]
Ch-10 Business interest [S25]
Ch-11 Pluralism of ethical methods [S14]
Ch-12 Cases of ethical dilemmas [S14]
Ch-13 Machine distortion [S14]
Ch-14 Lack of Guidance and Adoption [S26]
Ch-15 Lack of Cross-cultural Cooperation [S27]

The long-term plan of this research is to propose a maturity model that can be used to evaluate the ethical capabilities of organizations involved in developing AI systems. The findings of this systematic review are the initial inputs for the development of the proposed model. Figure 8 shows the preliminary structure of the model and demonstrates how the findings of this review contribute to the development of its principles and challenges components. The identified principles and challenges will be classified across capability and maturity levels. Moreover, best practices will be provided to tackle the identified challenges and implement the AI ethics principles. The model is a proposed idea that will be systematically developed based on industrial empirical studies and the concepts of the widely adopted CMMI process model [19]; an illustrative structural sketch follows Fig. 8. A case study approach has been selected to evaluate the real-world significance of the model.

[Fig. 8: Preliminary structure of the proposed maturity model]
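Since the maturity model itself is future work, the following is a purely illustrative sketch of how its principles, challenges and best-practices components might be organised across levels; the level names are borrowed from CMMI [19] and the mappings are placeholders, not the proposed model.

```python
from dataclasses import dataclass, field
from typing import List

# Placeholder structure only; level names borrowed from CMMI, mappings invented.
@dataclass
class MaturityLevel:
    level: int
    name: str
    principles: List[str] = field(default_factory=list)      # from Table 3
    challenges: List[str] = field(default_factory=list)      # from Table 4
    best_practices: List[str] = field(default_factory=list)  # future work

model_skeleton = [
    MaturityLevel(1, "Initial"),
    MaturityLevel(2, "Managed", principles=["Transparency", "Privacy"]),
    # ... higher levels would progressively add principles and practices
]
```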

6 Threats to validity

6.1 Construct validity

The primary study selection process might affect the quality of the data collected for synthesis. However, we defined a formal search strategy and constantly revised it during regular consensus meetings. Moreover, the given search string might not cover all the relevant articles and might miss quality studies. We tried to mitigate this threat by conducting a pilot search using multiple strings; the final string was developed based on the results returned by the pilot strings. Finally, backward snowballing was performed to identify any additional primary studies that were missed during the SLR process.

6.2 Internal validity

In SLR studies, internal validity refers to the rigour of the review process, including the development of research questions, data sources, search strategy, study selection and string development. This study was conducted following the formal SLR process guidelines proposed by Kitchenham and Charters [15]. The step-by-step flow of the SLR phases is methodically discussed in Section 3.

6.3 External validity

External validity relates to the generalizability of the study findings. The results are summarized based on 27 primary studies; because of the novelty of the research topic, very few studies have been published in this domain. The sample size (n=27) might not be strong enough to generalize the study findings; however, we plan to extend this study in the future by conducting an industrial study to evaluate the SLR findings and capture the perceptions of practitioners.

7 Conclusions and future directions

Ethics in AI has received significant attention in the last couple of years, and there is a need for a systematic literature study that discusses the principles and uncovers the key challenges of AI ethics. This study was conducted to fill that research gap by following the SLR approach. We identified a total of 27 relevant primary studies, and the systematic review of the selected studies returned 21 principles and 15 challenging factors. We noticed that most of the studies focus on four major principles, i.e., transparency, privacy, accountability and fairness, which should be considered by AI system designers. Moreover, decision-making systems should also be aware of the ethical principles so as to know the implications of their actions.

The challenges of ethics in AI were identified to provide an understanding of the factors that hinder the implementation of ethical principles. The most frequently reported challenging factors are lack of ethical knowledge and vague principles. Knowledge and understanding of ethics are important for both management and technical teams, and further help to remove the vagueness in AI principles. Lack of ethical knowledge could undermine the significance of decision-making systems.

We plan to extend this study by conducting an industrial survey to investigate the understanding of AI ethics in practice and to identify best practices for tackling the given challenging factors and managing the reported principles. Moreover, industrial case studies will be conducted in the AI industry to assess the effectiveness of the proposed maturity model in practice.

8 References

  • (1) Christina Pazzanese. 2020. Ethical concerns mount as AI takes bigger decision-making role in more industries. Retrieved January 15, 2021 from https://news.harvard.edu/gazette/story/2020/10/ethical-concerns-mount-as-ai-takes-bigger-decision-making-role/
  • (2) Vincent C. Müller. 2020. Ethics of Artificial Intelligence and Robotics. The Stanford Encyclopedia of Philosophy. Retrieved January 15, 2021 from https://plato.stanford.edu/archives/win2020/entries/ethics-ai/
  • (3) Ville Vakkuri, Kai-Kristian Kemell and Pekka Abrahamsson. 2019. Implementing Ethics in AI: Initial Results of an Industrial Multiple Case Study. In International Conference on Product-Focused Software Process Improvement, Lecture Notes in Computer Science, vol 11915. Springer, Cham, 331-338. https://doi.org/10.1007/978-3-030-35333-9_24
  • (4) Jaana Leikas, Raija Koivisto, and Nadezhda Gotcheva. 2019. Ethical framework for designing autonomous intelligent systems. Journal of Open Innovation: Technology, Market, and Complexity 5, 1 (2019), 18. https://doi.org/10.3390/joitmc5010018
  • (5) Ville Vakkuri, Kai-Kristian Kemell and Pekka Abrahamsson. 2019. AI ethics in industry: a research framework. arXiv preprint arXiv:1910.12695.
  • (6) IEEE. (2019). Ethically aligned design: A vision for prioritizing human well-being with autonomous and intelligent systems, first edition. Retrieved January 17, 2021 from https://tinyurl.com/yah4jzb6
  • (7) Anna Jobin, Marcello Ienca, and Effy Vayena. 2019. The global landscape of AI ethics guidelines. Nature Machine Intelligence 1, no. 9 (2019), 389-399. https://doi.org/10.1038/s42256-019-0088-2
  • (8) Pekka Ala-Pietilä, Wilhelm Bauer, Urs Bergmann, Mária Bieliková, Cecilia Bonefeld-Dahl, Yann Bonnet, Loubna Bouarfa et al. (2018). The European Commission’s high-level expert group on artificial intelligence: Ethics guidelines for trustworthy AI. Working Document for stakeholders’ consultation. Retrieved January 17, 2021 from https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai
  • (9) Executive Office of the President, National Science and Technology Council Committee on Technology. 2016. Preparing for the future of artificial intelligence. Washington, D.C., USA. Retrieved January 23, 2021 from https://cra.org/ccc/wp-content/uploads/sites/2/2016/11/NSTC_preparing_for_the_future_of_ai.pdf
  • (10) Beijing Academy of Artificial Intelligence. 2019. Beijing AI principles. Retrieved January 23, 2021 from https://www.baai.ac.cn/news/beijing-ai-principles-en.html
  • (11) ISO/IEC. ISO/IEC JTC 1/SC 42 Artificial intelligence. Retrieved January 25, 2021 from https://www.iso.org/committee/6794475.html
  • (12) Ville Vakkuri, Kai-Kristian Kemell, Marianna Jantunen, and Pekka Abrahamsson. 2020. “This is Just a Prototype”: How Ethics Are Ignored in Software Startup-Like Environments. In International Conference on Agile Software Development, Lecture Notes in Business Information Processing, vol 383. Springer, Cham. 195-210. https://doi.org/10.1007/978-3-030-49392-9_13
  • (13) Andrew McNamara, Justin Smith, and Emerson Murphy-Hill. 2018. Does ACM’s code of ethics change ethical decision making in software development? In Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018). Association for Computing Machinery, New York, NY, USA, 729–733. DOI:https://doi.org/10.1145/3236024.3264833
  • (14) Virginia Dignum. 2017. Responsible autonomy. arXiv preprint arXiv:1706.02513.
  • (15) Barbara Kitchenham and Stuart Charters. 2007. Guidelines for performing systematic literature reviews in software engineering. Technical report, Ver. 2.3 EBSE Technical Report. School of Computer Science and Mathematics, Keele University, UK.
  • (16) Lianping Chen, Muhammad Ali Babar, and He Zhang. 2010. Towards an evidence-based understanding of electronic data sources. In Proceedings of the 14th International Conference on Evaluation and Assessment in Software Engineering (EASE'10). BCS Learning & Development Ltd., Swindon, GBR, 135–138.
  • (17) Claes Wohlin. 2014. Guidelines for snowballing in systematic literature studies and a replication in software engineering. In Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering (EASE ’14). Association for Computing Machinery, New York, USA, 1–10. DOI:https://doi.org/10.1145/2601248.2601268
  • (18) Tingting Bi, Peng Liang, Antony Tang, and Chen Yang. 2018. A systematic mapping study on text analysis techniques in software architecture. Journal of Systems and Software 144, (2018), 533-558. https://doi.org/10.1016/j.jss.2018.07.055
  • (19) CMMI Product Team. 2002. Capability maturity model® integration (CMMI SM), version 1.1. CMMI for systems engineering, software engineering, integrated product and process development, and supplier sourcing (CMMI-SE/SW/IPPD/SS, V1. 1) 2 (2002).

9 Appendices

Appendix A: Selected primary studies

S. No. Selected Primary Studies Q1 Q2 Q3 Q4 Total
S1 Anna Jobin, Marcello Ienca, and Effy Vayena. 2019. The global landscape of AI ethics guidelines. Nature Machine Intelligence 1, no. 9 (2019), 389-399. https://doi.org/10.1038/s42256-019-0088-2 1 1 1 1 4
S2 Keng Siau and Weiyu Wang. 2020. Artificial intelligence (AI) ethics: ethics of AI and ethical AI. Journal of Database Management (JDM) 31, 2 (2020), 74-87. DOI: 10.4018/JDM.2020040105 1 1 1 1 4
S3 Cansu Canca. 2020. Operationalizing AI ethics principles. Communications of the ACM 63, 12 (2020), 18-21. DOI: 10.1145/3430368 1 1 1 1 4
S4 Anna Lauren Hoffmann, Sarah T. Roberts, Christine T. Wolf, and Stacy Wood. 2018. Beyond fairness, accountability, and transparency in the ethics of algorithms: Contributions and perspectives from LIS. Proceedings of the Association for Information Science and Technology 55, 1 (2018), 694-696. https://doi.org/10.1002/pra2.2018.14505501084 1 1 0 0.5 2.5
S5 Ville Vakkuri, Kai-Kristian Kemell, Joni Kultanen and Pekka Abrahamsson. 2020. The Current State of Industrial Practice in Artificial Intelligence Ethics. IEEE Software 37, 4 (2020), 50-57. DOI: 10.1109/MS.2020.2985621 1 1 1 1 4
S6 Charles D. Raab. 2020. Information privacy, impact assessment, and the place of ethics. Computer Law & Security Review 37 (2020), 105404. https://doi.org/10.1016/j.clsr.2020.105404 1 1 1 1 4
S7 Wenjun Wu, Tiejun Huang and Ke Gong. 2020. Ethical principles and governance technology development of AI in China. Engineering 6, 3 (2020), 302-309. https://doi.org/10.1016/j.eng.2019.12.015 1 1 1 1 4
S8 Sara Gerke, Timo Minssen, and Glenn Cohen. 2020. Ethical and legal challenges of artificial intelligence-driven healthcare. In Artificial intelligence in healthcare, (2020), 295-336. https://doi.org/10.1016/B978-0-12-818438-7.00012-5 1 0.5 1 0.5 3
S9 Richard Benjamins. 2021. A choices framework for the responsible use of AI. AI and Ethics 1, 1 (2021), 49-53. https://doi.org/10.1007/s43681-020-00012-5 1 1 1 1 4
S10 Thilo Hagendorff. 2020. The ethics of AI ethics: An evaluation of guidelines. Minds and Machines 30, no. 1 (2020), 99-120. https://doi.org/10.1007/s11023-020-09517-8 1 1 1 1 4
S11 Nagadivya Balasubramaniam, Marjo Kauppinen, Sari Kujala and Kari Hiekkanen. 2020. Ethical Guidelines for Solving Ethical Issues and Developing AI Systems. In International Conference on Product-Focused Software Process Improvement, Lecture Notes in Computer Science, vol 12562, Springer, Cham, 331-346. https://doi.org/10.1007/978-3-030-64148-1_21 1 1 1 1 4
S12 Ville Vakkuri and Kai-Kristian Kemell. 2019. Implementing AI Ethics in Practice: An Empirical Evaluation of the RESOLVEDD Strategy. In International Conference on Software Business. Lecture Notes in Business Information Processing, vol 370. Springer, Cham, 260-275. https://doi.org/10.1007/978-3-030-33742-1_21 1 1 1 1 4
S13 Ray Eitel-Porter. 2021. Beyond the promise: implementing ethical AI. AI and Ethics 1, 1 (2021), 73-80. https://doi.org/10.1007/s43681-020-00011-6 1 1 1 1 4
S14 Tomas Hauer. 2020. Machine Ethics, Allostery and Philosophical Anti-Dualism: Will AI Ever Make Ethically Autonomous Decisions? Society 57, 4 (2020), 425-433. https://doi.org/10.1007/s12115-020-00506-2 1 1 1 1 4
S15 Josef Baker-Brunnbauer. 2020. Management perspective of ethics in artificial intelligence. AI and Ethics (2020), 1-9. https://doi.org/10.1007/s43681-020-00022-3 1 1 1 1 4
S16 Banu Buruk, Perihan Elif Ekmekci and Berna Arda. 2020. A critical perspective on guidelines for responsible and trustworthy artificial intelligence. Medicine, Health Care and Philosophy 23, 3 (2020), 387-399. https://doi.org/10.1007/s11019-020-09948-1 1 1 1 1 4
S17 Brent Mittelstadt. 2019. Principles alone cannot guarantee ethical AI. Nature Machine Intelligence 1, 11 (2019), 501-507. https://doi.org/10.1038/s42256-019-0114-4 1 1 1 1 4
S18 Jess Whittlestone, Rune Nyrup, Anna Alexandrova, and Stephen Cave. 2019. The Role and Limits of Principles in AI Ethics: Towards a Focus on Tensions. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’19). Association for Computing Machinery, New York, NY, USA, 195–200. DOI:https://doi.org/10.1145/3306618.3314289 1 1 0 1 3
S19 Luciano Floridi, Josh Cowls, Monica Beltrametti, Raja Chatila, Patrice Chazerand, Virginia Dignum, Christoph Luetge, Robert Madelin, Ugo Pagallo, Francesca Rossi, Burkhard Schafer, Peggy Valcke and Effy Vayena. 2018. AI4People—an ethical framework for a good AI society: opportunities, risks, principles, and recommendations. Minds and Machines 28, 4 (2018), 689-707. https://doi.org/10.1007/s11023-018-9482-5 1 1 1 1 4
S20 Jessica Morley, Luciano Floridi, Libby Kinsey, and Anat Elhalal. 2020. From what to how: an initial review of publicly available AI ethics tools, methods and research to translate principles into practices. Science and engineering ethics 26, 4 (2020), 2141-2168. https://doi.org/10.1007/s11948-019-00165-5 1 1 1 1 4
S21 Ángel G. de Ágreda. 2020. Ethics of autonomous weapons systems and its applicability to any AI systems. Telecommunications Policy 44, 6 (2020), 101953. https://doi.org/10.1016/j.telpol.2020.101953 1 1 1 1 4
S22 Shuili Du, and Chunyan Xie. 2020. Paradoxes of artificial intelligence in consumer markets: Ethical challenges and opportunities. Journal of Business Research (2020). https://doi.org/10.1016/j.jbusres.2020.08.024. 1 1 1 1 4
S23 Mark Coeckelbergh. 2020. Challenges for Policymakers. In AI Ethics, MIT Press, 167-181. DOI: 10.7551/mitpress/12549.001.0001 0.5 0.5 1 1 3
S24 E. L. Sidorenko, Z. I. Khisamova and U. E. Monastyrsky. 2021. The Main Ethical Risks of Using Artificial Intelligence in Business. In Current Achievements, Challenges and Digital Chances of Knowledge Based Economy, Springer, Cham, 423-429. https://doi.org/10.1007/978-3-030-47458-4_51 0.5 1 1 0.5 3
S25 Chris Rees. 2020. The Ethics of Artificial Intelligence. In Unimagined Futures–ICT Opportunities and Challenges, Springer, Cham, 55-69. https://doi.org/10.1007/978-3-030-64246-4_5 1 1 1 1 4
S26 Jan Jöhnk, Malte Weißert and Katrin Wyrtki. 2021. Ready or Not, AI Comes—An Interview Study of Organizational AI Readiness Factors. Business & Information Systems Engineering 63, 1 (2021), 5-20. https://doi.org/10.1007/s12599-020-00676-7 1 1 1 1 4
S27 Seán S. ÓhÉigeartaigh, Jess Whittlestone, Yang Liu, Yi Zeng, and Zhe Liu. 2020. Overcoming barriers to cross-cultural cooperation in AI ethics and governance. Philosophy & Technology 33, 4 (2020), 571-593. https://doi.org/10.1007/s13347-020-00402-x 1 1 1 1 4

ar5iv homepage

  • Research article
  • Open access
  • Published: 15 February 2021

Artificial intelligence for good health: a scoping review of the ethics literature

  • Kathleen Murphy 1 ,
  • Erica Di Ruggiero 2 ,
  • Ross Upshur 3 , 4 ,
  • Donald J. Willison 5 ,
  • Neha Malhotra 1 ,
  • Jia Ce Cai 1 ,
  • Nakul Malhotra 1 ,
  • Vincci Lui 6 &
  • Jennifer Gibson 1  

BMC Medical Ethics volume  22 , Article number:  14 ( 2021 ) Cite this article

44k Accesses

148 Citations

22 Altmetric

Metrics details

Artificial intelligence (AI) has been described as the “fourth industrial revolution” with transformative and global implications, including in healthcare, public health, and global health. AI approaches hold promise for improving health systems worldwide, as well as individual and population health outcomes. While AI may have potential for advancing health equity within and between countries, we must consider the ethical implications of its deployment in order to mitigate its potential harms, particularly for the most vulnerable. This scoping review addresses the following question: What ethical issues have been identified in relation to AI in the field of health, including from a global health perspective?

Eight electronic databases were searched for peer reviewed and grey literature published before April 2018 using the concepts of health, ethics, and AI, and their related terms. Records were independently screened by two reviewers and were included if they reported on AI in relation to health and ethics and were written in the English language. Data was charted on a piloted data charting form, and a descriptive and thematic analysis was performed.

Upon reviewing 12,722 articles, 103 met the predetermined inclusion criteria. The literature was primarily focused on the ethics of AI in health care, particularly on carer robots, diagnostics, and precision medicine, but was largely silent on ethics of AI in public and population health. The literature highlighted a number of common ethical concerns related to privacy, trust, accountability and responsibility, and bias. Largely missing from the literature was the ethics of AI in global health, particularly in the context of low- and middle-income countries (LMICs).

Conclusions

The ethical issues surrounding AI in the field of health are both vast and complex. While AI holds the potential to improve health and health systems, our analysis suggests that its introduction should be approached with cautious optimism. The dearth of literature on the ethics of AI within LMICs, as well as in public health, also points to a critical need for further research into the ethical implications of AI within both global and public health, to ensure that its development and implementation is ethical for everyone, everywhere.

Peer Review reports

Introduction

Artificial intelligence (AI) has been described as the “fourth industrial revolution” with transformative and global implications [ 1 ]. AI can be generally understood as “a field of study that combines computer science, engineering and related disciplines to build machines capable of behaviour that would be said to require intelligence were it to be observed in humans” [ 2 ]. Some such behaviours include the ability to visually perceive images, recognize speech, translate language, and learn from and adapt to new information [ 2 ]. To do so, AI as a field of study can employ a number of techniques. Machine learning, for instance, allows algorithms to make predictions and solve problems based on large amounts of data, without being explicitly programmed [ 2 ]. Deep learning is a subset of machine learning, and goes further to use multiple layers of artificial neural networks to solve complex problems from unstructured data, much like the human brain [ 2 , 3 , 4 ]. Many countries have developed or are in the process of developing national AI strategies and policies to promote research, development, and adoption of AI methods and technologies [ 5 ]. Amongst them, Canada was the first country to release a $125 million Pan-Canadian Artificial Intelligence Strategy to advance new public and private sector collaborations to stimulate research in AI [ 6 ]. Investments in AI are rapidly increasing with the potential for economic gains, projected at a $15.7 trillion contribution to the global economy by 2030 [ 7 ].

Amidst the nascence of AI, ethics has been identified as a priority concern in the development and deployment of AI across sectors [ 8 , 9 , 10 ]. In efforts to address this concern, there has been a proliferation of initiatives, including the establishment of organizations and principles documents [ 11 ] to provide guidance to those working within the AI space. Some such initiatives include the Partnership on AI [ 12 ], OpenAI [ 13 ], the Foundation for Responsible Robotics [ 14 ], the Ethics and Governance of Artificial Intelligence Initiative [ 15 ], the Montréal Declaration for Responsible Development of Artificial Intelligence [ 16 ], and the Principles for Accountable Algorithms [ 17 , 18 ]. While there is increasing support from funding bodies for research on the social and ethical implications of AI [ 19 , 20 , 21 , 22 ], to date there has been limited attention by the academic bioethics community on AI within the field of health, particularly within the context of a globalized world. The health sector, however, is a growing area of AI research, development and deployment, with AI holding promise for the promotion of healthy behaviours; the detection and early intervention of infectious illnesses and environmental health threats; and the prevention, diagnosis, and treatment of disease [ 23 , 24 , 25 ].

The World Health Organization (WHO), for example, has established the “triple billion” target whereby it aims to have 1 billion more people benefit from universal health coverage, be better protected from health emergencies, and experience better health and wellbeing, and it believes that AI can help it achieve those objectives [ 26 ]. The WHO has been advancing the discussion of AI within health through its various Collaborating Centres, the AI for Global Good Summit, the development of the WHO Guideline Recommendations on Digital Interventions for Health System Strengthening [ 27 ], and its commitment to supporting countries in realizing the benefits of AI for health. Indeed, AI has been described by former WHO Director General Dr. Margaret Chan as the new frontier for health with transformative implications [ 28 ]. Yet amidst its promise, the introduction of AI in all corners of the world is accompanied by ethical questions that need to be uncovered from a global health perspective in order to be adequately addressed.

Global health has been defined as “an area for study, research, and practice that places a priority on improving health and achieving equity in health for all people worldwide” (p.1995), placing particular emphasis on the prevention and treatment of transnational population- and individual-level health issues through interdisciplinary and international collaboration [ 29 ]. To the extent that public health concerns the health of populations, global health concerns the health of populations on a global scale that transcends national boundaries and that underpins the interdependencies and interconnectivity of all people within a broader geopolitical, economic, and environmental context [ 29 ]. While both are critically important, AI, with its potential impact on research and development, trade, warfare, food systems, education, climate change, and more [ 30 , 31 ], all of which either directly or indirectly impact the health of individuals, is inherently global.

In 2015, the 17 Sustainable Development Goals (SDGs) were unanimously adopted by all United Nations’ Member States. Goal 3 aims to achieve “good health and well-being” [ 32 ] and Goal 10 targets the reduction of inequalities [ 33 ]. While the SDGs are founded on the values of equity, inclusion, global solidarity, and a pledge to leave no one behind [ 34 ], the advent of AI could further exacerbate existing patterns of health inequities if the benefits of AI primarily support populations in high-income countries (HICs), or privilege the wealthiest within countries. Vinuesa and colleagues [ 35 ] assessed the role of AI in achieving all 17 SDGs (and their 169 targets), and found that while AI may serve predominantly as an enabler for achieving all targets in SDG 3, for SDG 10, it can be almost equally inhibiting as it is enabling. Considering, for instance, that many low- and middle-income countries (LMICs) still face significant challenges in digitizing their health records [ 36 ], data from which AI relies, there remains a substantial technological gap to overcome in order for LMICs to harness the potential benefits offered by AI. With increasing scale and diffusion of AI technologies in health worldwide, it is therefore imperative to identify and address the ethical issues systematically in order to realize the potential benefits of AI, and mitigate its potential harms, especially for the most vulnerable.

With this pursuit in mind, the purpose of this scoping review was to scope the academic and grey literatures in this emerging field, to better understand the discourse around the ethics of AI in health, and identify where gaps in the literature exist. Our research question was as follows: What ethical issues have been identified in relation to AI in the field of health, including from a global health perspective? Results from this scoping review of the academic and grey literatures include: (a) the selection of sources of evidence, (b) a descriptive analysis of the literature reviewed, (c) common ethical issues related to AI technologies in health, (d) ethical issues identified for specific AI applications in health, and (e) gaps in the literature pertaining to health, AI, and ethics.

Our approach to scoping the literature was informed by the methods outlined by Levac, Colquhoun, and O’Brien [ 37 ], and the reporting guidelines established by Tricco, Lillie, Zarin, O'Brien, Colquhoun, Levac, et al. [ 38 ]. The core search concepts for the scoping review were AI, health, and ethics. Given the evolving nature of the AI field, both academic and grey literatures were included in the search. To enhance the rigour of our grey literature search specifically, the grey literature search was informed by search methods outlined by Godin, Stapleton, Kirkpatrick, Hanning, and Leatherdale [ 39 ].

Eligibility criteria

In keeping with a scoping review methodological approach [ 37 ], the inclusion and exclusion criteria were defined a priori and were refined as necessary throughout the iterative screening process involving the full project team at the beginning, middle, and end of the screening process to ensure consistency. Articles were selected during title and abstract screening if they met the following inclusion criteria: [1] records reported on all three core search concepts (AI, ethics, and health), and [2] records were written in the English language. The criterion for articles written in the English language was included because it is the language spoken by the majority of the research team, and thus allowed us to engage in a collaborative analysis process and enhance the rigour of our review. With regard to exclusion criteria, we excluded articles that did not include each of the concepts of AI, ethics and health, as well as those not written in the English language. Although ‘big data’ is a critical input to AI systems, articles that focused only on ethics and big data without explicit mention of AI methods or applications were excluded. Non-peer-reviewed academic literature was also excluded (e.g. letters, and non-peer reviewed conference proceedings), as were books and book chapters, each of which are categorized as ‘irrelevant record type’ in Fig.  1 . Finally, invalid records (e.g. those that only included a string of code, or a date and no other information) and additional duplicates identified through the title/abstract screening process were excluded as well. No date or study design limits were applied, in order to obtain as extensive a literature base as possible. For the grey literature specifically, media articles, blog posts, and magazine entries were excluded, as we were more interested in documents that were both expert-driven, and which required a degree of methodological rigour (e.g. organization/institution reports). During full-text screening, records were excluded if any of the core search concepts were not engaged in a substantive way (e.g. if a concept was mentioned in passing or treated superficially); if there was an insufficient link made between health, ethics, and AI; if the ethics of AI was not discussed in relation to human health; if the article was not written in the English language; and if it was an irrelevant record type (e.g. a book, news article, etc.).

figure 1

Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) flow diagram. This PRISMA flow diagram depicts the number of records identified at each state of the scoping review literature selection process

Information sources

Searches of the peer-reviewed literature were executed in eight electronic databases: OVID MEDLINE (1946-present,includinge-pubaheadofprintandin-processandotherunindexedcitations), OVID Embase, (1947-present), OVID PsycINFO (1806-present), EBSCO CINAHL Plus with Full Text (1937-present), ProQuest Sociological Abstracts (1952-present), ProQuest Philosopher’s Index (1940-present), ProQuest Advanced Technologies & Aerospace (1962-present) and Wiley Cochrane Library. The search strategy was translated into each database using combinations of each database platform's command language, controlled vocabulary, and appropriate search fields, using MeSH terms, EMTREE terms, APA’s Thesaurus of Psychological Index Terms, CINAHL headings, Sociological Thesaurus, Philosopher’s Index subject headings, and Advanced Technologies & Aerospace subject headings in conjunction with keywords. Limits imposed were for English language-only articles; a filter excluding animal studies was applied to searches in MEDLINE, Embase, and PsycINFO, as we were interested in the ethics of AI as it applies to humans; and a filter for health or medicine-related studies was applied to the Advanced Technologies & Aerospace database, to reduce the high volume of solely technical studies. Final searches of the peer-reviewed literature were completed on April 23, 2018.

Grey literature was retrieved between April 25th and September 12th, 2018, from (a) searches of grey literature databases, including OAIster, Google Scholar, the Canadian Electronic Library, and the Canadian Institute for Health Information; (b) a Google search and customized Google search engines, which included documents from think tanks, the Canadian government, and non-governmental organizations; (c) 28 targeted website searches of known organizations and institutions; and (d) the results from a prior environmental scan conducted by a member of the project team (J.G.). The targeted website searches were undertaken to identify any grey literature that was not captured in the grey literature databases and customized Google searches. The 28 websites searched were chosen based on the existing knowledge of members of the research team, in addition to input from stakeholders who attended an AI and health symposium in June 2018. For the purposes of feasibility and relevance, only reports from the year 2015 and beyond were retrieved.

The search strategy for the academic literature was developed by an academic health science librarian (V.L.) based on recommendations from the project leads (J.G., E.DiR., R.U.), and was peer-reviewed by a second librarian. The full electronic search of the peer-reviewed literature can be found in Additional file 1, with an example search from OVID MEDLINE (1946–present, including e-pub ahead of print and in-process and other unindexed citations). The search strategy and results for the grey literature are similarly outlined in Additional file 2.

Selection of sources of evidence

All identified records from the academic and grey literature searches were imported into the reference management software EndNote. After duplicate records were removed, screening was conducted in two steps. First, the titles and abstracts of academic records were independently screened by two reviewers against the inclusion and exclusion criteria established a priori. Reviewers consulted academic record keywords if the title and abstract lacked clarity in relation to the core concepts. Given that the majority of the grey literature did not include abstracts, grey literature records were initially screened on title alone. So as not to overlook relevant grey literature (given that some grey literature discussed ethical issues of AI more generally, including those pertaining to health), records proceeded to full-text screening even if the title alluded to only two of our three search concepts. A third reviewer assessed any records for which there was uncertainty among the reviewers about fit with the inclusion/exclusion criteria, or discrepancy in reviewer assessments, and a final decision was made upon consensus with the research team. All records that passed the first level of screening were pulled for full-text review by the two independent reviewers, with the same independent review and iterative team consensus process applied. The resulting sample was retained for data charting and analysis.
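To make the dual-screening workflow above concrete, the following is a minimal sketch in Python of how two reviewers’ independent decisions might be reconciled and discrepancies routed to a third reviewer. The record identifiers, field names, and decision labels are illustrative assumptions, not artifacts of the actual review.

# A minimal sketch of reconciling two reviewers' independent screening
# decisions. Record IDs, field names, and decision labels are illustrative
# assumptions, not the study's actual data.
from dataclasses import dataclass

@dataclass
class ScreeningDecision:
    record_id: str
    reviewer_1: str  # "include", "exclude", or "unsure"
    reviewer_2: str

def reconcile(decisions):
    """Partition records into agreed inclusions, agreed exclusions, and
    discrepancies to be adjudicated by a third reviewer and team consensus."""
    include, exclude, third_reviewer = [], [], []
    for d in decisions:
        votes = {d.reviewer_1, d.reviewer_2}
        if votes == {"include"}:
            include.append(d.record_id)
        elif votes == {"exclude"}:
            exclude.append(d.record_id)
        else:  # disagreement, or at least one reviewer was unsure
            third_reviewer.append(d.record_id)
    return include, exclude, third_reviewer

decisions = [
    ScreeningDecision("rec-001", "include", "include"),
    ScreeningDecision("rec-002", "include", "exclude"),
    ScreeningDecision("rec-003", "unsure", "exclude"),
]
included, excluded, to_adjudicate = reconcile(decisions)
print("Included:", included)
print("Excluded:", excluded)
print("To third reviewer:", to_adjudicate)

In this sketch, only records in the third list would go to the third reviewer, with the final decision made upon team consensus, mirroring the process described above.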

Data charting process

Draft data charting forms for recording extracted data from the screened articles were created in Microsoft Excel (Version 16.18 (181014)) based on the scoping review research question. As per the recommendations of Levac et al. [37], the data charting forms were piloted by having two project team members independently chart the first 10 academic and 10 grey literature records (20 in total), with any arising discrepancies or uncertainties brought to the larger project team for an agreed-upon resolution. The forms were further refined based on discussions with the project team and finalized upon consensus prior to completing the data charting process. For the remaining articles, each record was charted by one member of the research team, and weekly check-in meetings with the research team were held to ensure consistency in data charting and to verify accuracy.

We extracted data on the objective of each paper; the institutional affiliations of authors; the publication year; the country of the first and corresponding authors; whether a conflict of interest was stated; the health context of interest; the AI applications or technologies discussed; the ethical concepts, issues or implications raised; any reference to global health; and recommendations for future research, policy, or practice. Data were copied and pasted directly into the data charting form with the corresponding page number, so that no information was lost to paraphrasing. A template of the data charting form can be found in Additional file 3.
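As a rough illustration, such a charting form can be thought of as a spreadsheet with one column per extracted field. The column names below paraphrase the fields listed above and are an assumption for illustration only; the team’s actual template is provided in Additional file 3.

# Illustrative column layout for a data charting spreadsheet. Column names
# paraphrase the fields listed above; they are assumptions, not the team's
# actual template (see Additional file 3).
import csv

CHARTING_COLUMNS = [
    "record_id", "objective", "author_affiliations", "publication_year",
    "first_author_country", "corresponding_author_country",
    "conflict_of_interest_stated", "health_context",
    "ai_applications_discussed", "ethical_issues_raised",
    "reference_to_global_health", "recommendations",
    "source_page_number",  # retained so no information is lost to paraphrasing
]

with open("charting_form.csv", "w", newline="") as f:
    csv.writer(f).writerow(CHARTING_COLUMNS)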

Synthesis of results

The analysis comprised two components: descriptive and thematic. The descriptive analysis captured information about the global location of primary authorship, dates of publication, and the AI application(s) discussed. Primary authorship was determined by the institutional location of the first author. The academic and grey literatures were compared to identify any notable differences in scope and emphasis. The thematic analysis [40] was conducted inductively. First, open descriptive codes were generated from a random sample of 10 academic records and 10 grey literature records from which data had been extracted in the data charting form. Once project team members reached consensus on the appropriate codes, after several rounds of refinement, the codes were applied to meaningful data points throughout the entirety of the grey and academic records in the respective data charting forms, with new codes added as necessary. These codes were then reorganized into themes and compared with one another to identify commonalities and gaps in the literature, including convergences and divergences between the grey and academic literatures in relation to the original research question. Results are presented below in a narrative format, with complementary tables and figures to provide visual representation of key findings.
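The descriptive component of such an analysis is straightforward to reproduce computationally. The following is a minimal sketch, assuming a charting spreadsheet laid out like the illustrative one above; the file and column names are assumptions, and records discussing several AI applications are assumed to list them in a single semicolon-delimited field.

# A minimal sketch of the descriptive analysis: tallying charted records by
# first-author country, publication year, and AI application. File and
# column names follow the illustrative charting form above (assumptions).
import pandas as pd

df = pd.read_csv("charting_form.csv")

by_country = df["first_author_country"].value_counts()
by_year = df["publication_year"].value_counts().sort_index()

# A record may discuss several applications; split the semicolon-delimited
# field so each mentioned application is tallied once per record.
by_application = (
    df["ai_applications_discussed"]
    .str.split(";")
    .explode()
    .str.strip()
    .value_counts()
)

print(by_country.head(10))
print(by_year)
print(by_application.head(10))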

Results

Selection of sources of evidence

Of the 12,722 records identified after de-duplication, 81 peer-reviewed articles and 22 grey literature records met the inclusion criteria, for a total of 103 records in the scoping review sample (Fig. 1).

Descriptive analytics

The vast majority of publications had primary authors in the United States (n = 42) or the United Kingdom (n = 17) (Fig. 2), and while our literature search yielded publications between 1989 and 2018, most were published between 2014 and 2018 (Fig. 3). The academic and grey literatures addressed numerous AI-enabled health applications, including, in particular, care robots (see Footnote 1; n = 48), followed by diagnostics (n = 36) and precision medicine (n = 16) (Fig. 4).

Fig. 2 Number of publications by country, based on first author affiliation. *Note that two records were published by international organizations, and the geographic origin of one record is unknown. These three records are not represented in the above figure. This map was created using mapchart.net

Fig. 3 Number of publications reviewed, categorized by year of publication. *The graph begins in the year 2013, after which the majority of articles were published

Fig. 4 Publications reviewed according to the most frequently reported AI health applications

There were notable differences between the academic and grey literature sources in terms of authorship, AI health applications addressed, and treatment of ethical implications. The academic literature was written by persons primarily affiliated with academic institutions, whereas the grey literature was written by researchers, industry leaders, and government officials, often collaboratively, with authors frequently affiliated with multiple institutions. The grey literature tended to cover a broader range of AI health applications, issues, and trends, and their associated ethical implications, and was oriented more towards broader health and social policy issues, whereas the academic papers typically centred their discussion on one or at most a few topics or applications, focusing on a particular dimension of AI in health. Compared to the grey literature, robotics, particularly care robotics, were highly represented in the peer-reviewed literature (48% of the peer-reviewed literature, n = 39; 18% of the grey literature, n = 4). The academic literature on care robots was most concerned with the ethics of using care robots in health settings (e.g. “How much control, or autonomy, should an elderly person be allowed?” … “Are the safety and health gains great enough to justify the resulting restriction of the individual’s liberty?” [41, pp. 31, 33]), whereas the grey literature tended to emphasize the operational and societal implications of using robots in health settings, such as the potential displacement of human jobs [42].

Common ethical themes

Four ethical themes were common across the health applications of AI addressed in the literature: data privacy and security, trust in AI, accountability and responsibility, and bias. These issues, while in many ways interconnected, were identified based on how distinctly they were discussed in the literature.

Privacy and security

Issues of privacy and data security were raised about the collection and use of patient data for AI-driven applications, given that these systems must be trained with a sizeable amount of personal health information [ 43 , 44 ]. Highlighted concerns about the collection and use of patient data were that they may be used in ways unbeknownst to the individual from whom the information was collected [ 45 ], and that there is a potential for information collected by and for AI systems to be hacked [ 45 ]. One illustrative example of this challenge was that of the diagnostic laboratory database in Mumbai that was hacked in 2016, during which 35,000 patient medical records were leaked, inclusive of patient HIV status, with many patients never informed of the incident [ 45 ]. Further noted was that patients may believe that their data are being used for one purpose, yet it can be difficult to predict what the subsequent use may be [ 46 , 47 ]. For example, ubiquitous surveillance for use by AI systems through personal devices, smart cities, or robotics, introduces the concern that granular data can be re-identified [ 48 , 49 ], and personal health information can be hacked and shared for profit [ 49 ]. Of further concern was that these smart devices are often powered by software that is proprietary, and consequently less subject to scrutiny [ 48 ]. The stated implications of these privacy and security concerns were vast, with particular attention given to if ever personal data was leaked to employers and insurance companies [ 46 , 50 , 51 , 52 , 53 , 54 ]. A prevailing concern was how population sub-groups may then be discriminated against based on their social, economic, and health statuses by those making employment and insurance decisions [ 49 , 50 , 51 , 53 ].

Trust in AI applications

The issues of privacy, security, and patient and healthcare professional (HCP) trust in AI were frequently and closely linked in the literature. Attention was given, for instance, to how individuals must be able to trust that their data are used safely, securely, and appropriately if AI technology is to be deployed ethically and effectively [2, 46, 55, 56, 57]. Asserted in the literature was that patients must be well enough informed of the use of their data in order to trust the technology and be able to consent to or reject its use [52, 56]. One example that highlights these concerns is the data sharing partnership between Google DeepMind, an AI research company, and the Royal Free London NHS Foundation Trust [49, 58]. Identifiable data from 1.6 million patients were shared with DeepMind with the stated intention of improving the management of acute kidney injuries with a clinical alert app [58]. However, there was a question of whether the quantity and content of the data shared were proportionate to what was necessary to test the app, and why it was necessary for DeepMind to retain the data indefinitely [49, 58]. Furthermore, this arrangement has come under question for being made in the absence of adequate patient consent, consultation with relevant regulatory bodies, or research approval, threatening patient privacy and, consequently, public trust [49, 58].

HCPs have similarly demonstrated a mistrust of AI, resulting in a hesitancy to use the technology [59, 60]. This was exhibited, for instance, by physicians in various countries halting the uptake of IBM’s Watson Oncology, an AI-powered diagnostic support system [61]. These physicians stated that Watson’s recommendations were too narrowly focused on American studies and physician expertise, and failed to account for international knowledge and contexts [61]. Distrust amongst HCPs was also raised with regard to machine learning programs being difficult to both understand and explain [62, 63]. In contrast, a fear exists that some HCPs may place too much faith in the outputs of machine learning processes, even if the resulting reports, such as brain mapping results from AI systems, are inconclusive [57]. One suggestion to improve HCP trust in AI technology was to deploy training and education initiatives so that HCPs have a greater understanding of how AI operates [43]. A further suggestion was to promote the inclusion of end users in the design of the technology, so that not only will end users develop a better understanding of how it functions [64], but user trust will also increase through a more transparent development process [47].

Accountability and responsibility for use of AI technology

Frequently mentioned was the question of who ought to assume responsibility for errors in the application of AI technology to clinical and at-home care delivery [41, 45, 58, 59, 60, 65, 66, 67]. The question often arose in response to the fact that AI processes are often too complex for many individuals to understand and explain, which hinders their ability to scrutinize the output of AI systems [2, 61, 66]. Similarly, grounds for seeking redress for harm experienced as a result of AI’s use were noted to be obstructed by the proprietary nature of the technology, for under the ownership of private companies, it is less publicly accessible for inspection [2, 48, 51, 68]. Further to these questions, a debate remains as to whether or not HCPs ought to be held responsible for the errors of AI in the healthcare setting, particularly with regard to errors in diagnostic and treatment decisions [41, 45, 57, 65]. Several records put forward the view that, because HCPs are legally and professionally responsible for making decisions in their patients’ health interests, they bear responsibility for the consequences of decisions aided by AI technology [46, 47, 50, 59, 67, 69, 70]. However, records also underlined the responsibility of manufacturers for ensuring the quality of AI systems, including their safety and effectiveness [47, 59, 71, 72], and for being responsive to the needs and characteristics of specific patient populations [72].

Beyond the clinical environment, issues of accountability arose in the context of using care robots. Related questions revolved around the burden of responsibility if, for example, a care receiver is harmed by an AI-enabled robotic care provider [2, 73]. Is the burden of responsibility for such harm on the robot manufacturer who wrote the learning algorithm [73]? Similarly, the question arose of who is to be held accountable if a care receiver takes their own life, or the life of another, under the watch of a care robot [46]. If a care robot is considered an autonomous agent, should this incident then be the responsibility of the robot [46]? While proposed solutions to accountability challenges were few, one suggestion was to build a machine learning accountability mechanism into AI algorithms that could itself perform black-box audits to ensure they are privacy neutral [45, p. 18]. Also suggested were appropriate training of engineers and developers on issues of accountability, privacy, and ethics, and the introduction of national regulatory bodies to ensure AI systems have appropriate transparency and accountability mechanisms [45].

Where the above findings on accountability relate more to ‘answerability’ for AI’s potentially adverse impacts, responsibility in relation to AI design and governance was also present in the literature, albeit far less so. To promote responsible AI, governments were described as holding responsibility for developing policy to address ethical, social, and legal issues, including in the research and development of AI technologies, and for regulatory oversight [60, 74, 75]. Records also suggested that policymakers seek to understand public perceptions of the use of AI in health [75] and ensure that AI technologies are distributed equally [74]. One article drew attention to the risk of exacerbating health inequities as a result of the unequal distribution of AI, particularly where AI applications are increasingly being used by patients for the self-management of their health [76]. While there was little mention of corporate responsibility, a small number of articles alluded to commercial strategies for responsible innovation [54, 55, 77]. Such strategies included identifying where bias manifests, and how and by whom it is managed; and being transparent about how the algorithm has been used (e.g. with a training dataset or in a real-world setting) and what type of learning the model is built for (e.g. supervised or unsupervised learning) [55]. Other suggestions included having AI manufacturing companies monitor the use of their systems in various contexts after deployment [77], and having AI research and development involve ‘human participation’ to ensure its conscientious development [54, p. 10].

Adverse consequences of bias

Bias was yet another transcending ethical theme within the literature, notably the potential bias embedded within algorithms [43, 54, 59, 64, 68, 71, 77, 78, 79], and within the data used to train algorithms [43, 45, 49, 51, 55, 59, 60, 61, 63, 64, 68, 73, 77, 78, 80, 81, 82, 83, 84]. The prevailing concern with algorithms was that they are developed by humans, who are by nature fallible and swayed by their own values and implicit biases [68, 79]. These values have been noted to often reflect those that are societally endemic, and, if carried into the design of AI algorithms, could consequently produce outputs that advantage certain population groups over others [43, 51, 54, 59, 63, 68, 71, 77, 81]. Bias was indicated to similarly manifest in the data relied upon to train AI algorithms, by way of inaccurate and incomplete datasets [48, 51, 63, 81, 84] or unrepresentative datasets [43, 82, 83], thus rendering AI outputs ungeneralizable to the population to which they are applied [51, 68, 81].

Not only have biased datasets been noted to potentially perpetuate systemic inequities based on race, gender identity, and other demographic characteristics [48, 51, 59, 63, 68, 78], they may also limit the performance of AI as a diagnostic and treatment tool, owing to the lack of generalizability highlighted above [43, 48, 83]. In contrast, some noted the potential for AI to mitigate existing bias within healthcare systems. Examples of this potential include reducing human error [50]; mitigating the cognitive biases of HCPs in determining treatment decisions, such as recency, anchoring, or availability biases [45, 51]; and reducing biases that may be present within healthcare research and public health databases [48]. Suggestions to address the issue of bias included building AI systems to reflect current ethical healthcare standards [78], and ensuring a multidisciplinary and participatory approach to AI design and deployment [79].

Specific ethical themes by AI application in health

Three health applications were emphasized in the reviewed literature: care robots, diagnostics, and precision medicine. Each health application raised unique ethical issues and considerations.

Care robotics

A notable concern regarding the use of care robots was the social isolation of care recipients, with care robots potentially replacing the provision of human care [41, 61, 72, 85, 86, 87, 88, 89]. Some asserted that the introduction of care robots would reduce the amount of human contact care recipients receive from family, friends, and human care providers [41, 61, 72, 85, 87, 88, 89]. Implications of this included increased stress, a higher likelihood of dementia, and other such impacts on the well-being of care recipients [41]. Others, in contrast, viewed robots as an opportunity to increase the “social” interaction available to already isolated individuals [41, 85, 90, 91]. Care robots could, for example, offer opportunities for care recipients to maintain interactive skills [91], and increase the amount of time human care providers spend having meaningful interactions with those they are caring for [85], as opposed to being preoccupied with routine tasks. Yet despite these opportunities, of note was the idea that care robots risk deceiving care recipients into believing that the robots are ‘real’ care providers and companions [41, 46, 72, 85, 87, 88, 92, 93, 94], which could undermine the preservation and promotion of human dignity [41, 92].

The issue of deception was often linked to the question of ‘good care’: what the criteria for good care are, and whether robots are capable of providing it. In the context of deceit, some considered it justified as long as the care robot allows recipients to achieve and enhance their human capabilities [93, 95]. Also challenged was the assumption that good care is contingent upon humans providing it [46, 93, 96], for while robots may not be able to provide reciprocal emotional support [93], humans may similarly fail to do so [96]. A further illustrated aspect of good care was the preservation and advancement of human dignity [93], support for which can be offered by robots insofar as they promote individual autonomy [41, 61, 73, 85, 87, 88]. Some, however, contested this, arguing that care robots may in fact reduce a person’s autonomy if the technology is too difficult to use [87]; if the robot supersedes one’s right to make decisions based on calculations of what it thinks is best [61]; and because the implementation of robots may lead to the infantilization of care recipients, making them feel as though they are being treated like children [88]. The promotion of autonomy also appeared controversial, acknowledged at times as the pre-eminent value that robots ought to promote [73, 91], while at other times autonomy was seen to be in tension with the safety of the care recipient [41, 91]. For example, with the introduction of care robots, care recipients might choose to engage in unsafe behaviours in pursuit of, and as a result of, their new independence [41, 91]. A comparable tension existed in the literature between the safety of care recipients, which some believe care robots protect, and the infringement on recipients’ physical and informational privacy [41, 46, 88, 91, 97, 98].

Diagnostics

Diagnostics was an area that also garnered significant attention with regard to ethics. Of note was the ‘black box’ nature of machine learning processes [36, 45, 51, 63, 74, 80, 99, 100], frequently mentioned alongside an HCP’s inability to scrutinize the output [44, 51, 63, 74]. It was acknowledged that the more advanced the AI system, the more difficult it is to discern its functioning [99], and there was also a concern that, because of the difficulty in understanding how and why a machine learning program produces an output, there is a risk of encountering biased outputs [80]. Thus, despite the challenge of navigating these opaque AI systems, there was a call for such systems to be explainable in order to ensure responsible AI [45, 80]. Another pervasive theme was the replacement and augmentation of the health workforce, particularly physicians, as a result of AI’s role in diagnostics [44, 59, 63, 100, 101]. While few feared the full replacement of physicians in diagnostics [2, 63, 100], some expected AI’s presence to actually enhance the effectiveness and efficiency of their work [63, 100]. There were expressed concerns, however, about how the roles and interactions of physicians may change with its introduction, such as the ethical dilemma encountered if a machine learning algorithm is inconsistent with the HCP’s recommendation, if it contradicts a patient’s account of their own condition, or if it fails to consider patients’ non-verbal communication and social context [59].

Precision medicine

Issues of bias persisted in discussions of precision medicine, with the recognition that biased datasets, such as those that exclude certain patient populations, can produce inaccurate predictions that in turn can have unfair consequences for patients [81]. While precision medicine was a less prominent theme than the aforementioned AI applications, questions arose about the accuracy of predictive health information derived from the intersection of AI and genomics, as did uncertainty about where and by whom those data may then be used [102]. In the case of AI-assisted gene editing, deep learning holds potential for directing experts to where in the human genome to use gene editing technologies such as CRISPR, to reduce an individual’s risk of contracting a genetic disease or disorder [25]. However, deep learning models cannot discern the moral difference between gene editing for health optimization and gene editing for human enhancement more generally, which may blur ethical lines [25]. A further tension existed in how the technology is deployed to support human choices; for example, if a person seeks gene editing not only to reduce their risk of inheriting a particular genetic disease, but also to increase their muscle mass, obtain a particular personality trait, or enhance their musical ability [25]. Also illuminated were the implications of AI-enabled precision medicine in the global north versus the global south [103]. First was the possibility that this technology, given its high associated costs and greater accessibility in the developed world, might leave low- and middle-income countries (LMICs) behind [103]. Second was the awareness that the introduction of genetic testing may undermine low-cost, scalable, and effective public health measures, which should remain central to global health [103].

Gaps in the literature

Healthcare was the predominant focus in the ethics literature on AI applications in health, with the ethics of AI in public health largely absent from the literature reviewed. One article that did illuminate ethical considerations for AI in public health highlighted the use of AI in environmental monitoring, motor vehicle crash prediction, fall detection, spatial profiling, and infectious disease outbreak detection, among other purposes, with the dominant ethical themes relating to data privacy, bias, and ‘black box’ machine learning models [82]. Other articles that mentioned public health similarly pointed to infectious disease outbreak prediction and monitoring [61, 84, 104], tracking communicable diseases [104], mental health research [105], and health behaviour promotion and management [59, 104]. However, these applications were only briefly mentioned in the broader context of primary healthcare, and few spoke to the ethics of these applications [59, 105, 106].

In the literature reviewed, there were also evident gaps in the area of global health, with few considerations of the unique ethical challenges AI poses for LMICs. Though there was mention of utilizing AI for screening in rural India [45]; genomics research in China [25]; facial recognition to detect malnutrition in Kenya [80]; and precision medicine in LMICs more broadly [103], among others, there was a significant gap in the literature commenting on the ethics of these practices in the global south. Furthermore, there was little discussion of health equity, including how the use of AI may perpetuate or exacerbate current gaps in health outcomes between and within countries. Instead, references to “global” health were often limited to global investments in AI research and development (R&D), and a number of innovations currently underway in high-income countries (HICs) [25, 41, 49, 59, 73, 90, 107, 108, 109]. The lack of focus on global health was further reflected in the primary authorship of the literature, with a mere 5.8% (n = 6) of the reviewed literature authored by individuals from LMICs. Furthermore, 33% (n = 34) of articles had primary authorship from non-English-speaking countries, which indicates that while the discourse on AI is indeed global in scope, it may only be reaching an English-speaking, or at the very least an educated, readership.

Discussion

Summary of evidence

Cross-cutting themes and asymmetries

In this scoping review we identified 103 records (81 academic articles and 22 grey literature documents) that addressed the ethics of AI within health, up to April 2018. Illustrated in the literature reviewed were overarching ethical concerns about privacy, trust, accountability, and bias, all of which were both interdependent and mutually reinforcing. Accountability, for instance, was a noted concern when considering who ought to bear responsibility for AI errors in patient diagnoses [63, 65, 66], while also a recognized issue in protecting patient privacy within data sharing partnerships [59]. The security of confidential patient data, in turn, was identified as critical for eliciting patient trust in the use of AI technology for health [2]. One suggestion offered to combat the threat to citizen trust in AI was an inclusive development process [64], a process which has also been proposed to mitigate bias integrated into algorithm development [79]. It is therefore clear from our review that the aforementioned ethical themes cannot be considered in isolation, but rather must be viewed in relation to one another when considering the ethics of AI in health.

These broad ethical themes of privacy and security, accountability and responsibility, bias, and trust have also been revealed in other reviews. In a mapping review by Morley et al. [110] on AI in healthcare, for instance, concerns of trust, ‘traceability’ (aligning with what we have labelled ‘accountability’), and bias emerged. While privacy and security were explicitly excluded from their review [110], these very issues were a significant finding in a systematic review by Stahl et al. [111], both with regard to data privacy and personal (or physical) privacy. Issues of the autonomy and agency of AI machines, the challenge of trusting algorithms (linked with their lack of transparency), as well as others more closely associated with non-AI computing technologies, were also discussed [111]. While the precise labels of ethical themes differed across these reviews based on the authors’ analytic approach, the general challenges were common across them, and indeed, intimately interconnected. It is clear also that these broad ethical themes are not unique to health, but rather transcend multiple sectors, including policing, transportation, military operations, media, and journalism [112, 113].

An asymmetry in the literature was the predominant focus on the ethics of AI in healthcare, with less attention granted to public health, including its core functions of health promotion, disease prevention, public health surveillance, and health system planning from a population health perspective. Yet in the age of ubiquitous computing, data privacy for use in public health surveillance and interventions will be all the more critical to secure, as will ensuring that individuals and communities without access to the latest technologies are not absent from these initiatives. In a recent article, Blasimme and Vayena [114] touched upon issues of consent when employing AI-driven social media analysis for digital epidemiology; the ethics of ‘nudging’ people towards healthier behaviours using AI technology; and the development of paternalistic interventions tailored to marginalized populations. These public health issues and others merit further exploration within the ethics literature, particularly given how powerful such AI applications can be when applied at a population level. From an alternative perspective, the increasing presence of AI within healthcare may in some respects pose a risk to public health, with an expressed concern that the ‘hype’ around AI in healthcare may redirect attention and resources away from proven public health interventions [103, 115]. Similarly absent from the literature was a public health lens on the issues presented, a lens which rests on a foundation of social justice to “enable all people to lead fulfilling lives” [116]. With respect to jobs, for example, the pervasive discourse around care robots in the literature suggests that there may be a wave of robots soon to replace human caregivers of the sick, elderly, and disabled. Despite this recognition, however, the focus was solely on the impact on patients, and little attention was given to those caregivers whose jobs may soon be threatened. This is true also for other low-wage workers within health systems at large, despite the fact that unemployment is frequently accompanied by adverse health effects.

A second asymmetry in the literature was the focus on HICs, and a notable gap in discourse at the intersection of ethics, AI, and health within LMICs. Some articles mentioned the challenges of implementing the technology in low-resource settings [25, 45, 80, 102, 103, 106], and questioned whether its introduction will further widen the development gaps between HICs and LMICs [102]; however, absent in most was the integration of ethics and/or health. Yet AI is increasingly being deployed in the global south: to predict dengue fever hotspots in Malaysia [59], to predict birth asphyxia in LMICs at large [36], and to increase access to primary screening in remote communities in India [45], to name a few examples. Despite these advancements, in LMIC contexts there are challenges around collecting data from individuals without financial or geographic access to health services, data upon which AI systems rely [36, 80], and a further challenge of storing data electronically [80]. The United States Agency for International Development (USAID) and the Rockefeller Foundation [117] have recently illuminated some additional considerations for the deployment of AI in LMICs, one in particular being the hesitancy of governments and health practitioners to share digital health data for fear that it could be used against them, as digitizing health data is often quite politicized for actors on the ground. Given the infancy of these discussions, however, there is far more work to be done in order to critically and collaboratively examine the ethical implications of AI for health in all corners of the world, to ensure that AI contributes to improving, rather than exacerbating, health and social inequities.

Towards ethical AI for health: what is needed?

Inclusive and participatory discourse around, and development of, ethical AI for health was commonly recommended in the literature to mitigate bias [79], ensure the benefits of AI are shared widely [59, 74, 79, 80], and increase citizens’ understanding of, and trust in, the technology [47, 59, 64]. However, those leading the discussion on the ethics of AI in health seldom mentioned engagement with the end users and beneficiaries whose voices they were representing. While much attention was given to the impacts of AI health applications on underserved populations, only a handful of records actually included primary accounts from the people for whom they were raising concerns [2, 59, 75, 94, 118, 119]. Yet without better understanding the perspectives of end users, we risk confining the ethics discourse to the hypothetical, devoid of the realities of everyday life. This was illustrated, for instance, when participants in aged care challenged the framing of care robots as deceptive by stating that, despite these concerns, they preferred a care robot over a human caregiver [94]. We therefore cannot rely on our predictions of the ethical challenges around AI in health without hearing from a broader mosaic of voices. Echoing recommendations from the literature, there is an evident need to gain greater clarity on public perceptions of AI applications for health, on what ethical concerns end users and beneficiaries have, and on how best these can be addressed with the input of these individuals and communities. This recommendation is well aligned with the current discourse on the responsible innovation of AI, an important dimension of which involves the inclusion of new voices in discussions of the processes and outcomes of AI [120].

In addition to taking a participatory approach to AI development, there is a responsibility for all parties to ensure its ethical deployment. For instance, it should be the responsibility of the producers of AI technology to advise end users, such as HCPs, of the limits of its generalizability, just as should be done with any other diagnostic or similar technology. There is a similar responsibility for the end user to apply discretion with regard to the ethical and social implications of the technology they are using. This viewpoint is shared by Bonderman [121], who asserts that when physicians deploy AI during patient diagnoses, for instance, it is important that they remain in control and retain the authority to override algorithms when they are certain the algorithm outputs are incorrect [121]. Ahuja [122] complements this assertion by noting that, since machine learning and deep learning require large quantities of data, such systems can underperform when presented with novel cases, such as atypical side effects or resistance to treatment. Simply stated, we must be critical and discretionary with regard to the application of AI in scenarios where human health and wellbeing are concerned, and we must not simply defer to AI outputs.

Also in need of critical reflection, as it remains unresolved in the literature, is how to appropriately and responsibly govern this technology [25, 45, 49, 52, 57, 102]. While there were hints in the literature regarding how to promote responsible AI, such as equal distribution of the technology, corporate transparency, and participatory development, there was little on how these recommendations could be optimally secured through regulatory mechanisms and infrastructure. The infusion of AI into health systems appears inevitable, and as such, we need to reconsider our existing regulatory frameworks for disruptive health technologies, and perhaps deliberate on something entirely new. The issue of governance is particularly salient given the challenge that many have termed the ‘black box’: on the one hand, AI processes operate at a level of complexity beyond the comprehension of many end users, and on the other, neural networks are by nature opaque. Never before has the world encountered technology that can learn from the information it is exposed to and, in theory, become entirely autonomous. Even the concept of AI is somewhat nebulous [2, 59, 123, 124], which threatens to cloud our ability to govern its use. These challenges are compounded by those of jurisdictional boundaries for AI governance, an ever-increasing issue given the global ‘race’ towards international leadership in AI development [125]. Thirty-eight national and international governing bodies have established or are developing AI strategies, with no two the same [125, 126]. Given that the pursuit of AI for development is a global endeavour, this calls for governance mechanisms that are global in scope. However, such mechanisms require careful consideration if countries are to comply, especially considering differences in the national data frameworks that predate AI [49]. These types of jurisdictional differences will impact the ethical development of AI for health, and it is thus important that academic researchers contribute to the discussion of how a global governance mechanism can address ethical, legal, cultural, and regulatory discrepancies between countries involved in the AI race.

Limitations

One potential limitation of this study is that, given the field of AI is evolving at an unprecedented rate [1], new records in the academic and grey literatures may have been published after the conclusion of our search and prior to publication. Some recent examples of related articles have very much been in line with our findings, drawing attention to many of the pertinent ethical issues of AI in healthcare discussed in the literature reviewed [18, 127, 128, 129, 130, 131, 132]. Few, however, appear to have discussed the ethical application of AI in LMICs [117, 133] or public health [117, 130], so despite any new literature that may have arisen, there is still further work to be done in these areas.

Furthermore, given that our search strategy was limited to the English language, we may have missed valuable insights from publications written in other languages. The potential impact on our results is that we underrepresented authorship from LMICs, and underreported the amount of literature on the ethics of AI within the context of LMICs. Furthermore, by not engaging with literature in other languages, we risk contradicting recommendations for an inclusive approach to the ethics discourse. Indeed, we may be missing important perspectives from a number of country and cultural contexts that could improve the ethical development and application of AI in health globally. To address this limitation, future researchers could collaborate with global partner organizations, such as WHO regional offices, in order to gain access to literatures which would otherwise be inaccessible to research teams.

An additional limitation lies in our grey literature search. As part of a systematic search strategy, we pursued targeted website searches in order to identify any literature that did not emerge from our grey literature database and customized Google searches. These websites were chosen based on the expert knowledge of the research team, as well as of stakeholders operating within the AI space; however, there is a chance that additional relevant websites, and thus reports, proceedings, and other documents, exist beyond what was included in this review. Nevertheless, this scoping review offers a comprehensive overview of the current literature on the ethics of AI in health from a global health perspective, and provides a valuable direction for further research at this intersection.

Conclusions

The ethical issues surrounding the introduction of AI into health and health systems are both vast and complex. Issues of privacy and security, trust, bias, and accountability and responsibility have dominated the ethical discourse to date with regard to AI and health, and as this technology is increasingly taken to scale, more will undoubtedly arise. This holds particularly true with the introduction of AI in public health and within LMICs, given that these areas of study have been largely omitted from the ethics literature. AI is being developed and implemented worldwide, and without considering what it means for populations at large, and particularly those who are hardest to reach, we risk leaving behind those who are already the most underserved. Thus, the dearth of literature on the ethics of AI within public health and LMICs points to a critical need to devote further research to these areas. Indeed, a greater concentration of ethics research into AI and health is required for all of its many applications. AI has the potential to help actualize universal health coverage, reduce health, social, and economic inequities, and improve health outcomes on a global scale. However, the burgeoning field of AI is outpacing our ability to adequately understand its implications, much less to regulate its responsible design, development, and use for health. Given the relatively uncharted territory of AI in health, we must be diligent in both considering and responding to the ethical implications of its implementation, and in asking whether, in every case, it is indeed ethical at all. Amidst the tremendous potential that AI carries, it is important to approach its introduction with a degree of cautious optimism, informed by an extensive body of ethics research, to ensure its development and implementation are ethical for everyone, everywhere.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author upon reasonable request.

Footnote 1. Robots for the care of the sick, elderly, or disabled bore a number of different labels in the literature; however, they are herein described as ‘care robots’ in an effort to broadly discuss the associated ethical challenges. ‘Care robots’ as used in this context are exclusive of surgical robots. Only those care robots that relied on AI are discussed, such as those that can understand commands, locate and pick up objects, relocate a patient, and perform other tasks that require machine intelligence.

Abbreviations

  • AI: Artificial intelligence
  • CIHR: Canadian Institutes of Health Research
  • SDG(s): Sustainable Development Goal(s)
  • LMIC(s): Low- and middle-income country/ies
  • HIC(s): High-income country/ies
  • HCP(s): Health care professional(s)
  • R&D: Research and development
  • CRISPR: Clustered regularly interspaced short palindromic repeats

References

1. Schwab K. The Fourth Industrial Revolution: what it means and how to respond. World Economic Forum. 2016 [cited 2020 Sep 23]. https://www.weforum.org/agenda/2016/01/the-fourth-industrial-revolution-what-it-means-and-how-to-respond/.
2. AI in the UK: ready, willing and able? United Kingdom: Authority of the House of Lords; 2018. (Select Committee on Artificial Intelligence, editor). https://publications.parliament.uk/pa/ld201719/ldselect/ldai/100/100.pdf.
3. Ravi D, Wong C, Deligianni F, Berthelot M, Andreu-Perez J, Lo B, et al. Deep learning for health informatics. IEEE J Biomed Health Inform. 2017;21(1):4–21.
4. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44.
5. Future of Life Institute. National and international AI strategies. n.d. https://futureoflife.org/national-international-ai-strategies/.
6. Canadian Institute for Advanced Research. CIFAR Pan-Canadian Artificial Intelligence Strategy. 2020. https://www.cifar.ca/ai/pan-canadian-artificial-intelligence-strategy.
7. PricewaterhouseCoopers. Sizing the prize: what’s the real value of AI for your business and how can you capitalise? 2017. https://www.pwc.com/gx/en/issues/data-and-analytics/publications/artificial-intelligence-study.html.
8. Bossmann J. Top 9 ethical issues in artificial intelligence. World Economic Forum. https://www.weforum.org/agenda/2016/10/top-10-ethical-issues-in-artificial-intelligence/.
9. Gibney E. The battle for ethical AI at the world’s biggest machine-learning conference. Nature. 2020;577(7792):609.
10. Ouchchy L, Coin A, Dubljević V. AI in the headlines: the portrayal of the ethical issues of artificial intelligence in the media. AI Soc. 2020. https://doi.org/10.1007/s00146-020-00965-5.
11. Floridi L. Soft ethics: its application to the general data protection regulation and its dual advantage. Philos Technol. 2018;31(2):163–7. https://doi.org/10.1007/s13347-018-0315-5.
12. Partnership on AI. Partnership on AI. 2020. https://www.partnershiponai.org/.
13. OpenAI. OpenAI. https://openai.com/.
14. Responsible Robotics. Responsible robotics: accountable innovation for the humans behind the robots. https://responsiblerobotics.org/.
15. AI Ethics Initiative. The ethics and governance of artificial intelligence initiative. n.d. https://aiethicsinitiative.org.
16. Université de Montréal. Montréal declaration for a responsible development of artificial intelligence. p. 4–12. https://5dcfa4bd-f73a-4de5-94d8-c010ee777609.filesusr.com/ugd/ebc3a3_506ea08298cd4f8196635545a16b071d.pdf.
17. Fairness, Accountability, and Transparency in Machine Learning. Principles for accountable algorithms and a social impact statement for algorithms. n.d. https://www.fatml.org/resources/principles-for-accountable-algorithms.
18. Jobin A, Ienca M, Vayena E. The global landscape of AI ethics guidelines. Nat Mach Intell. 2019;1(9):389–99.
19. Social Sciences and Humanities Research Council. Canada-UK Artificial Intelligence Initiative. 2020. https://www.sshrc-crsh.gc.ca/funding-financement/programs-programmes/canada-uk_ai/index-eng.aspx.
20. Canadian Institute for Advanced Research. Canada, France, UK launch research workshops exploring societal implications of artificial intelligence. 2019. https://www.cifar.ca/cifarnews/2019/04/15/canada-france-uk-launch-research-workshops-exploring-societal-implications-of-artificial-intelligence.
21. Canadian Institute for Advanced Research. AI & Society. CIFAR. https://www.cifar.ca/ai/ai-society.
22. Wellcome Trust. The ethical, social and political challenges of using artificial intelligence in healthcare. Wellcome. https://wellcome.org/grant-funding/ethical-social-and-political-challenges-using-artificial-intelligence-healthcare.
23. Bhagyashree SIR, Nagaraj K, Prince M, Fall CHD, Krishna M. Diagnosis of dementia by machine learning methods in epidemiological studies: a pilot exploratory study from south India. Soc Psychiatry Psychiatr Epidemiol. 2018;53(1):77–86. https://doi.org/10.1007/s00127-017-1410-0.
24. Zhang X, Pérez-Stable EJ, Bourne PE, Peprah E, Duru OK, Breen N, et al. Big data science: opportunities and challenges to address minority health and health disparities in the 21st century. Ethn Dis. 2017;27(2):95.
25. Johnson W, Pauwels E. How to optimize human biology. Wilson Center. p. 27. https://www.wilsoncenter.org/sites/default/files/media/documents/publication/how_to_optimize_human_biology.pdf.
26. Ghebreyesus T. Artificial intelligence for good global summit. World Health Organization. http://www.who.int/dg/speeches/2018/artificial-intelligence-summit/en/.
27. World Health Organization. WHO guideline recommendations on digital interventions for health systems strengthening. 2019. 123 p. http://www.ncbi.nlm.nih.gov/books/NBK541902/.
28. Chan M. Opening remarks at the artificial intelligence for good global summit. World Health Organization. 2017. https://www.who.int/dg/speeches/2017/artificial-intelligence-summit/en/.
29. Koplan JP, Bond TC, Merson MH, Reddy KS, Rodriguez MH, Sewankambo NK, et al. Towards a common definition of global health. The Lancet. 2009;373(9679):1993–5.
30. Ghosh S, Mitra I, Nandy P, Bhattacharya U, Dutta U, Kakar D. Artificial intelligence in India—hype or reality: impact of artificial intelligence across industries and user groups. PricewaterhouseCoopers India; 2018. p. 1–32. https://www.pwc.in/assets/pdfs/consulting/technology/data-and-analytics/artificial-intelligence-in-india-hype-or-reality/artificial-intelligence-in-india-hype-or-reality.pdf.
31. A blueprint for the future of AI: 2018–2019. Brookings Institution. 2020. https://www.brookings.edu/series/a-blueprint-for-the-future-of-ai/.
32. United Nations. Sustainable Development Goal 3: ensure healthy lives and promote well-being for all at all ages. n.d. https://sdgs.un.org/goals/goal3.
33. United Nations. Sustainable Development Goal 10: reduce inequality within and among countries. n.d. https://sdgs.un.org/goals/goal10.
34. United Nations Committee for Development Policy. Leaving no one behind. 2018. p. 1–4. https://sustainabledevelopment.un.org/content/documents/2754713_July_PM_2._Leaving_no_one_behind_Summary_from_UN_Committee_for_Development_Policy.pdf.
35. Vinuesa R, Azizpour H, Leite I, Balaam M, Dignum V, Domisch S, et al. The role of artificial intelligence in achieving the sustainable development goals. Nat Commun. 2020;11(1):233.
36. Wahl B, Cossy-Gantner A, Germann S, Schwalbe NR. Artificial intelligence (AI) and global health: how can AI contribute to health in resource-poor settings? BMJ Glob Health. 2018;3(4):e000798. https://doi.org/10.1136/bmjgh-2018-000798.
37. Levac D, Colquhoun H, O’Brien KK. Scoping studies: advancing the methodology. Implement Sci. 2010;5(1):69. https://doi.org/10.1186/1748-5908-5-69.
38. Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018;169(7):467. https://doi.org/10.7326/M18-0850.
39. Godin K, Stapleton J, Kirkpatrick SI, Hanning RM, Leatherdale ST. Applying systematic review search methods to the grey literature: a case study examining guidelines for school-based breakfast programs in Canada. Syst Rev. 2015;4(1):138.
40. Thomas J, Harden A. Methods for the thematic synthesis of qualitative research in systematic reviews. BMC Med Res Methodol. 2008;8(1):45.
41. Sharkey A, Sharkey N. Granny and the robots: ethical issues in robot care for the elderly. Ethics Inf Technol. 2012;14(1):27–40. https://doi.org/10.1007/s10676-010-9234-6.
42. West DM. What happens if robots take the jobs? The impact of emerging technologies on employment and public policy. Center for Technology Innovation at Brookings. 2015. https://www.brookings.edu/wp-content/uploads/2016/06/robotwork.pdf.
43. Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, et al. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface. 2018;15(141):20170387. https://doi.org/10.1098/rsif.2017.0387.
44. Infosys Limited. AI for healthcare: balancing efficiency and ethics. 2018. 14 p. https://www.infosys.com/smart-automation/docpdf/ai-healthcare.pdf.
45. Paul Y, Hickok E, Sinha A, Tiwari U, Bidare PM. Artificial intelligence in the healthcare industry in India. 2018. 45 p. https://cis-india.org/internet-governance/files/ai-and-healtchare-report.
46. Luxton DD. Recommendations for the ethical use and design of artificial intelligent care providers. Artif Intell Med. 2014;62(1):1–10.
47. Suominen H, Lehtikunnas T, Back B, Karsten H, Salakoski T, Salanterä S. Applying language technology to nursing documents: pros and cons with a focus on ethics. Int J Med Inf. 2007;76:S293–301.
48. Crawford K, Whittaker M. The AI now report: the social and economic implications of artificial intelligence technologies in the near-term. White House and New York University’s Information Law Institute; 2016. (AI Now Public Symposium 2016).
49. Denton S, Pauwels E, He Y, Johnson W. There’s nowhere to hide: artificial intelligence and privacy in the fourth industrial revolution. Wilson Center, Synenergene, and the Institute for Philosophy & Public Policy. 2018. https://iapp.org/media/pdf/resource_center/ai_and_privacy.pdf.
50. Markowetz A, Błaszkiewicz K, Montag C, Switala C, Schlaepfer TE. Psycho-informatics: big data shaping modern psychometrics. Med Hypotheses. 2014;82(4):405–11.
51. AI Now 2016 Symposium: the social implications of artificial intelligence technologies in the near-term. New York; 2016. (AI Now 2016 Primers). https://ainowinstitute.org/AI_Now_2016_Primers.pdf.
52. Bollier D. The promise and challenge of integrating AI into cars, healthcare and journalism: a report on the inaugural Aspen Institute Roundtable on Artificial Intelligence. United States: The Aspen Institute; 2017. https://assets.aspeninstitute.org/content/uploads/2017/01/2017-Artificial-Intelligence-REP-FINAL.pdf.
53. Kantarjian H, Yu PP. Artificial intelligence, big data, and cancer. JAMA Oncol. 2015;1(5):573. https://doi.org/10.1001/jamaoncol.2015.1203.
54. Bowser A, Sloan M, Michelucci P, Pauwels E. Artificial intelligence: a policy-oriented introduction. Wilson Center. 2017. https://www.wilsoncenter.org/sites/default/files/media/documents/publication/wilson_center_policy_brief_artificial_intelligence.pdf.
55. UK Government. Guidance: initial code of conduct for data-driven health and care technology. United Kingdom: Department of Health and Social Care; 2018. https://www.gov.uk/government/publications/code-of-conduct-for-data-driven-health-and-care-technology/initial-code-of-conduct-for-data-driven-health-and-care-technology.
56. Hengstler M, Enkel E, Duelli S. Applied artificial intelligence and trust—the case of autonomous vehicles and medical assistance devices. Technol Forecast Soc Change. 2016;105:105–20.
57. Mohandas S, Ranganathan R. AI and healthcare in India: looking forward roundtable report. India: The Centre for Internet and Society; 2017. https://cis-india.org/internet-governance/files/ai-and-healthcare-report.
58. Powles J, Hodson H. Google DeepMind and healthcare in an age of algorithms. Health Technol. 2017;7(4):351–67. https://doi.org/10.1007/s12553-017-0179-1.
59. Fenech M, Strukelj N, Buston O. Ethical, social, and political challenges of artificial intelligence in health. Future Advocacy & Wellcome Trust. https://wellcome.ac.uk/sites/default/files/ai-in-health-ethical-social-political-challenges.pdf.
60. Kenneth K, Eggleton A. Challenge ahead: integrating robotics, artificial intelligence and 3D printing technologies into Canada’s healthcare systems. 2017. p. 1–44. (Standing Senate Committee on Social Affairs, Science and Technology, editor). https://sencanada.ca/content/sen/committee/421/SOCI/reports/RoboticsAI3DFinal_Web_e.pdf.
61. Nuffield Council on Bioethics. Artificial intelligence (AI) in healthcare and research. 2018. https://www.nuffieldbioethics.org/publications/ai-in-healthcare-and-research.
62. Boissoneault J, Sevel L, Letzen J, Robinson M, Staud R. Biomarkers for musculoskeletal pain conditions: use of brain imaging and machine learning. Curr Rheumatol Rep. 2017;19(1):5. https://doi.org/10.1007/s11926-017-0629-9.
63. Verghese A, Shah NH, Harrington RA. What this computer needs is a physician: humanism and artificial intelligence. JAMA. 2018;319(1):19. https://doi.org/10.1001/jama.2017.19198.
64. Stone P, Brooks R, Brynjolfsson E, Calo R, Etzioni O, Hager G, et al. Artificial intelligence and life in 2030. One hundred year study on artificial intelligence: report of the 2015–2016 study panel. https://ai100.stanford.edu/sites/g/files/sbiybj9861/f/ai_100_report_0831fnl.pdf.
65. McBee MP, Awan OA, Colucci AT, Ghobadi CW, Kadom N, Kansagra AP, et al. Deep learning in radiology. Acad Radiol. 2018;25(11):1472–80.
66. Mesko B. The role of artificial intelligence in precision medicine. Expert Rev Precis Med Drug Dev. 2017;2(5):239–41. https://doi.org/10.1080/23808993.2017.1380516.
67. Siqueira-Batista R, Souza CR, Maia PM, Siqueira SL. Robotic surgery: bioethical aspects. ABCD Arq Bras Cir Dig São Paulo. 2016;29(4):287–90.
68. Monteith S, Glenn T. Automated decision-making and big data: concerns for people with mental illness. Curr Psychiatry Rep. 2016;18(12):112. https://doi.org/10.1007/s11920-016-0746-6.
69. Hope M. Computer-aided medicine: present and future issues of liability. Comput Law J. 1989;9(2):177–203.
70. Balthazar P, Harri P, Prater A, Safdar NM. Protecting your patients’ interests in the era of big data, artificial intelligence, and predictive analytics. J Am Coll Radiol. 2018;15(3):580–6.
71. Yuste R, Goering S, Arcas BAY, Bi G, Carmena JM, Carter A, et al. Four ethical priorities for neurotechnologies and AI. Nature. 2017;551(7679):159–63.
72. O’Brolcháin F. Robots and people with dementia: unintended consequences and moral hazard. Nurs Ethics. 2019;26(4):962–72. https://doi.org/10.1177/0969733017742960.
73. Decker M. Caregiving robots and ethical reflection: the perspective of interdisciplinary technology assessment. AI Soc. 2008;22(3):315–30. https://doi.org/10.1007/s00146-007-0151-0.
74. Russell S. Ethics of artificial intelligence. Nature. 2015;521:415–8. https://www.nature.com/articles/521415a.pdf?origin=ppub.

Coeckelbergh M, Pop C, Simut R, Peca A, Pintea S, David D, et al. A survey of expectations about the role of robots in robot-assisted therapy for children with ASD: ethical acceptability, trust, sociability, appearance, and attachment. Sci Eng Ethics. 2016;22(1):47–65. https://doi.org/10.1007/s11948-015-9649-x .

Corbett J, d’Angelo C, Gangitano L, Freeman J. Future of health: findings from a survey of stakeholders on the future of health and healthcare in England. RAND Corporation. 2017. p. 1–90. https://www.rand.org/pubs/research_reports/RR2147.html .

Campolo A, Sanfilippo M, Whittaker M, Crawford K. AI now 2017 report. New York University. 2017. (AI Now 2017 Symposium and Workshop). https://ainowinstitute.org/AI_Now_2017_Report.pdf .

Char DS, Shah NH, Magnus D. Implementing machine learning in health care—addressing ethical challenges. N Engl J Med. 2018;378(11):981–3. https://doi.org/10.1056/NEJMp1714229 .

Howard A, Borenstein J. The ugly truth about ourselves and our robot creations: the problem of bias and social inequity. Sci Eng Ethics. 2018;24(5):1521–36. https://doi.org/10.1007/s11948-017-9975-2 .

International Telecommunications Union, XPrize. AI for good global summit report. AI for good global summit. Geneva, Switzerland; 2017. https://www.itu.int/en/ITU-T/AI/Documents/Report/AI_for_Good_Global_Summit_Report_2017.pdf .

Williams AM, Liu Y, Regner KR, Jotterand F, Liu P, Liang M. Artificial intelligence, physiological genomics, and precision medicine. Phys Genomics. 2018;50(4):237–43. https://doi.org/10.1152/physiolgenomics.00119.2017 .

Mooney SJ, Pejaver V. Big data in public health: terminology, machine learning, and privacy. Annu Rev Public Health. 2018;39(1):95–112. https://doi.org/10.1146/annurev-publhealth-040617-014208 .

Senders JT, Zaki MM, Karhade AV, Chang B, Gormley WB, Broekman ML, et al. An introduction and overview of machine learning in neurosurgical care. Acta Neurochir (Wien). 2018;160(1):29–38. https://doi.org/10.1007/s00701-017-3385-8 .

Lee CH, Yoon H-J. Medical big data: promise and challenges. Kidney Res Clin Pract. 2017;36(1):3–11. https://doi.org/10.23876/j.krcp.2017.36.1.3 .

Borenstein J, Pearson Y. Robot caregivers: harbingers of expanded freedom for all? Ethics Inf Technol. 2010;12(3):277–88. https://doi.org/10.1007/s10676-010-9236-4 .

Sharkey A. Robots and human dignity: a consideration of the effects of robot care on the dignity of older people. Ethics Inf Technol. 2014;16(1):63–75. https://doi.org/10.1007/s10676-014-9338-5 .

Laitinen A, Niemelä M, Pirhonen J. Social robotics, elderly care, and human dignity: a recognition-theoretical approach. In: What social robots can and should do. IOS Press. 2016. https://doi.org/10.3233/978-1-61499-708-5-155

Vandemeulebroucke T, Dierckx de casterlé B, Gastmans C. The use of care robots in aged care: a systematic review of argument-based ethics literature. Arch Gerontol Geriatr. 2018;74:15–25.

Coeckelbergh M. Artificial agents, good care, and modernity. Theor Med Bioeth. 2015;36(4):265–77. https://doi.org/10.1007/s11017-015-9331-y .

Sharkey N, Sharkey A. The eldercare factory. Gerontology. 2012;58(3):282–8.

Sorell T, Draper H. Robot carers, ethics, and older people. Ethics Inf Technol. 2014;16(3):183–95. https://doi.org/10.1007/s10676-014-9344-7 .

Sparrow R, Sparrow L. In the hands of machines? The future of aged care. Minds Mach. 2006;16(2):141–61. https://doi.org/10.1007/s11023-006-9030-6 .

Coeckelbergh M. Health care, capabilities, and AI assistive technologies. Ethical Theory Moral Pract. 2010;13(2):181–90. https://doi.org/10.1007/s10677-009-9186-2 .

Wachsmuth I. Robots like me: challenges and ethical issues in aged care. Front Psychol. 2018;9:432. https://doi.org/10.3389/fpsyg.2018.00432/full .

Coeckelbergh M. Care robots and the future of ICT-mediated elderly care: a response to doom scenarios. AI Soc. 2016;31(4):455–62. https://doi.org/10.1007/s00146-015-0626-3 .

Gallagher A, Nåden D, Karterud D. Robots in elder care: some ethical questions. Nurs Ethics. 2016;23(4):369–71. https://doi.org/10.1177/0969733016647297 .

Vandemeulebroucke T, de Casterlé BD, Gastmans C. How do older adults experience and perceive socially assistive robots in aged care: a systematic review of qualitative evidence. Aging Ment Health. 2018;22(2):149–67. https://doi.org/10.1080/13607863.2017.1286455 .

Dahl T, Boulos M. Robots in health and social care: a complementary technology to home care and telehealthcare? Robotics. 2013;3(1):1–21.

Kohli M, Prevedello LM, Filice RW, Geis JR. Implementing machine learning in radiology practice and research. Am J Roentgenol. 2017;208(4):754–60. https://doi.org/10.2214/AJR.16.17224 .

Kruskal JB, Berkowitz S, Geis JR, Kim W, Nagy P, Dreyer K. Big data and machine learning—strategies for driving this bus: a summary of the 2016 intersociety summer conference. J Am Coll Radiol. 2017;14(6):811–7.

Vogel L. What, “learning” machines will mean for medicine. Can Med Assoc J. 2017;189(16):E615–6. https://doi.org/10.1503/cmaj.1095413 .

Ethically aligned design: a vision for prioritizing human well-being with autonomous and intelligent systems. The IEEE global initiative on ethics of autonomous and intelligent systems. n.d. https://standards.ieee.org/content/dam/ieee-standards/standards/web/documents/other/ead1e.pdf .

Mentis A-FA, Pantelidi K, Dardiotis E, Hadjigeorgiou GM, Petinaki E. Precision medicine and global health: the good, the bad, and the ugly. Front Med. 2018;5:67. https://doi.org/10.3389/fmed.2018.00067/full .

Albrecht S, Bouchard B, Brownstein JS, Buckeridge DL, Caragea C, Carter KM, et al. Reports of the 2016 AAAI workshop program. AI Mag. 2016;37(3):99.

Conway M, O’Connor D. Social media, big data, and mental health: current advances and ethical implications. Curr Opin Psychol. 2016;9:77–82.

Flahault A, Geissbuhler A, Guessous I, Guerin PJ, Bolon I, Marcel S, et al. Precision global health in the digital age. Swiss Med Wkly. 2017;147(1314).

Huschilt J, Clune L. The use of socially assistive robots for dementia care. J Gerontol Nurs. 2012;38(10):15–9. https://doi.org/10.3928/00989134-20120911-02 .

Sharts-Hopko NC. The coming revolution in personal care robotics: what does it mean for nurses? Nurs Adm Q. 2014;38(1):5–12.

van Wynsberghe A. Designing robots for care: care centered value-sensitive design. Sci Eng Ethics. 2013;19(2):407–33. https://doi.org/10.1007/s11948-011-9343-6 .

Morley J, Machado CCV, Burr C, Cowls J, Joshi I, Taddeo M, et al. The ethics of AI in health care: a mapping review. Soc Sci Med. 2020;260:113172.

Stahl BC, Timmermans J, Mittelstadt BD. The ethics of computing: a survey of the computing-oriented literature. ACM Comput Surv. 2016;48(4):1–38. https://doi.org/10.1145/2871196 .

Asaro PM. AI ethics in predictive policing: from models of threat to an ethics of care. IEEE Technol Soc Mag. 2019;38(2):40–53.

Insights Team. Forbes insights: 4 industries that feel the urgency of AI ethics. Forbes. 2019 [cited 2020 Sep 23]. https://www.forbes.com/sites/insights-intelai/2019/03/27/4-industries-that-feel-the-urgency-of-ai-ethics/#7ec15d7372be .

Blasimme A, Vayena E. The ethics of AI in biomedical research, patient care and public health. SSRN Electron J. 2019. https://www.ssrn.com/abstract=3368756 .

Panch T, Pearson-Stuttard J, Greaves F, Atun R. Artificial intelligence: opportunities and risks for public health. Lanc Dig Health. 2019;1(1):e13–4.

Canadian Public Health Association. Public health: a conceptual framework. 2017. https://www.cpha.ca/sites/default/files/uploads/policy/ph-framework/phcf_e.pdf .

US Agency for International Development 104. Artificial intelligence in global health: defining a collective path forward. 2019. p. 1–42. https://www.usaid.gov/sites/default/files/documents/1864/AI-in-Global-Health_webFinal_508.pdf .

Bedaf S, Marti P, De Witte L. What are the preferred characteristics of a service robot for the elderly? A multi-country focus group study with older adults and caregivers. Assist Technol. 2019 May 27;31(3):147–57. https://doi.org/10.1080/10400435.2017.1402390 .

Draper H, Sorell T. Ethical values and social care robots for older people: an international qualitative study. Ethics Inf Technol. 2017;19(1):49–68. https://doi.org/10.1007/s10676-016-9413-1 .

Brundage M. Artificial intelligence and responsible innovation. https://www.milesbrundage.com/uploads/2/1/6/8/21681226/ai_ri_slides.pdf .

Bonderman D. Artificial intelligence in cardiology. Wien Klin Wochenschr. 2017;129(23–24):866–8. https://doi.org/10.1007/s00508-017-1275-y .

Ahuja AS. The impact of artificial intelligence in medicine on the future role of the physician. PeerJ. 2019;7:e7702.

Smith C, McGuire B, Huang T, Yang G. The history of artificial intelligence. https://courses.cs.washington.edu/courses/csep590/06au/projects/history-ai.pdf .

Robitzski D. You have no idea what artificial intelligence really does: the world of AI is full of hype and deception. Futurism. https://futurism.com/artificial-intelligence-hype .

Dutton T. An overview of national AI strategies. Medium. 2018. https://medium.com/politics-ai/an-overview-of-national-ai-strategies-2a70ec6edfd .

Dutton T, Barron B, Boskovic G. Building an AI world: report on national and regional AI strategies. Canadian Institute for Advanced Research. 2018. https://www.cifar.ca/docs/default-source/ai-society/buildinganaiworld_eng.pdf?sfvrsn=fb18d129_4 .

Sun TQ, Medaglia R. Mapping the challenges of artificial intelligence in the public sector: evidence from public healthcare. Gov Inf Q. 2019;36(2):368–83.

Guan J. Artificial intelligence in healthcare and medicine: promises, ethical challenges, and governance. Chin Med Sci J. 2019;99.

Wiens J, Saria S, Sendak M, Ghassemi M, Liu VX, Doshi-Velez F, et al. Do no harm: a roadmap for responsible machine learning for health care. Nat Med. 2019;25(9):1337–40.

Matheny M, Israni S, Ahmed M, Whicher D. Artificial Intelligence in health care: the hope, the hype, the promise, the peril. Washington, DC: National Academy of Medicine. 2019 p. 1–245. (NAM Special Publication.). https://nam.edu/wp-content/uploads/2019/12/AI-in-Health-Care-PREPUB-FINAL.pdf .

Powell J. Trust Me, I’m a chatbot: how artificial intelligence in health care fails the turing test. J Med Internet Res. 2019;21(10):16222.

Morley J, Machado CCV, Burr C, Cowls J, Joshi I, Mariarosaria T, et al. The debate on the ethics of AI in health care: a reconstruction and critical review. 2019. https://doi.org/10.13140/RG.2.2.27135.76960

Davies SE. Artificial intelligence in global health. Ethics Int Aff. 2019;33(02):181–92.

Download references

Acknowledgements

We would like to thank participants at the Ethics and AI for Good Health Symposium, whose thoughtful comments informed our thinking throughout the analysis of the literature. We would also like to acknowledge the kind contributions of Mikaela Grey at the Gerstein Science Information Centre, for peer reviewing our search strategy.

This study was supported by funding from the Joint Centre for Bioethics (JCB) Jus Innovation Fund. The JCB Jus Innovation Fund provided salary support for trainees (Murphy, Cai, Malhotra, Malhotra) working on the project.

Author information

Authors and Affiliations

Joint Centre for Bioethics, Dalla Lana School of Public Health, University of Toronto, 155 College Street, Suite 754, Toronto, ON, M5T 1P8, Canada

Kathleen Murphy, Neha Malhotra, Jia Ce Cai, Nakul Malhotra & Jennifer Gibson

Office of Global Health Education and Training, Dalla Lana School of Public Health, University of Toronto, 155 College Street, Room 408, Toronto, ON, M5T 3M7, Canada

Erica Di Ruggiero

Division of Clinical Public Health, Dalla Lana School of Public Health, 155 College Street, Toronto, ON, M5T 3M7, Canada

Ross Upshur

Bridgepoint Collaboratory for Research and Innovation, Lunenfeld Tanenbaum Research Institute, Sinai Health System, 1 Bridgepoint Drive, Toronto, ON, M4M 2B5, Canada

Institute of Health Policy, Management and Evaluation, Dalla Lana School of Public Health, Health Sciences Building, University of Toronto, 155 College Street, Suite 425, Toronto, ON, M5T 3M6, Canada

Donald J. Willison

Gerstein Science Information Centre, University of Toronto, 9 King’s College Circle, Toronto, ON, M7A 1A5, Canada


Contributions

KM assisted in developing the search strategies, managing the data, as well as screening, charting, and analyzing the data. She was the primary contributor in writing the manuscript. NM assisted with the article screening, charting, and analysis, and was a significant contributor to the writing of the manuscript. JC and NM assisted with the article screening, data charting and analysis, and developed the graphical representations of the findings. They also supported the writing of the manuscript. VL helped to develop the search strategy to appropriately address our research question, and to write said search strategy in the methods section of the manuscript. JG, ED, RU, and DW, conceived the idea for this study, and devised its approach. They provided oversight and strategic direction, and participated in the article screening process, oversaw the data analysis, and contributed valuable feedback throughout each step of the manuscript-writing process. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jennifer Gibson.

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Search Strategy for the Academic Literature.

Additional file 2.

Search Strategy and Results of Grey Literature Search.

Additional file 3.

Data Charting Form Template.

Additional file 4.

Bibliography of the 103 records included in analysis.


About this article

Cite this article.

Murphy, K., Di Ruggiero, E., Upshur, R. et al. Artificial intelligence for good health: a scoping review of the ethics literature. BMC Med Ethics 22, 14 (2021). https://doi.org/10.1186/s12910-021-00577-8


Received: 30 April 2020

Accepted: 20 January 2021

Published: 15 February 2021

DOI: https://doi.org/10.1186/s12910-021-00577-8


Keywords:

  • Health care
  • Public and population health
  • Global health


Objective metrics for ethical AI: a systematic literature review

  • Open access
  • Published: 13 April 2024


Guilherme Palumbo, Davide Carneiro & Victor Alves


The field of AI Ethics has recently gained considerable attention, yet much of the existing academic research lacks practical and objective contributions for the development of ethical AI systems. This systematic literature review aims to identify and map objective metrics documented in the literature between January 2018 and June 2023, focusing specifically on the ethical principles outlined in the Ethics Guidelines for Trustworthy AI. The review was based on 66 articles retrieved from the Scopus and Web of Science databases. The articles were categorized based on their alignment with seven ethical principles: Human Agency and Oversight; Technical Robustness and Safety; Privacy and Data Governance; Transparency; Diversity, Non-Discrimination and Fairness; Societal and Environmental Well-being; and Accountability. Of the identified articles, only a minority presented objective metrics to assess AI ethics, with the majority being purely theoretical works. Moreover, existing metrics concentrate primarily on Diversity, Non-Discrimination and Fairness, with a clear under-representation of the remaining principles. This lack of practical contributions makes it difficult for Data Scientists to devise systems that can be deemed ethical, or to monitor the alignment of existing systems with current guidelines and legislation. With this work, we lay out the current panorama concerning objective metrics to quantify AI Ethics in Data Science and highlight the areas in which future developments are needed to align Data Science projects with the human values widely posited in the literature.


1 Introduction

Artificial Intelligence (AI) is widely recognized as a significant disruptive force across all domains it touches. The widespread adoption of AI has experienced significant growth, resulting in a substantial impact on society, potentially with both positive and negative consequences. While AI is expected to be immensely beneficial for humanity, namely in areas such as medicine, law, education, or industry, its negative impacts may outweigh the positive ones if it is not developed in a way that has human values at its core.

Early examples of the potential risks abound: AI systems have systematically discriminated against black patients by miscategorizing them in heart failure risk scores or kidney donor risk indexes, marking black individuals as less suitable donors [ 1 ]; AI recruiting tools have shown bias against women [ 2 ]; and a chatbot was found coaching a purported 13-year-old girl on losing her virginity [ 3 ].

In order to avoid such cases of harmful AI, legal and ethical regulation is paramount. One challenge, however, is that legislation will always move more slowly than technological development. While the European Commission's risk-based approach to AI legislation is a step in the right direction, in the sense that it is based not on specific technologies or systems but on their level of potential risk, it might not be enough.

We argue that, in order to allow AI developers and Data Scientists to actually comply with legislation and transparency requirements, and to be aware of potential issues in the applications they develop, there is a need for more practical tools [ 4 , 5 ]: tools that go beyond checklists or toolkits and that can pinpoint sources of potential ethical issues at any stage of the Data Science lifecycle, preferably before they have an actual impact. Without such an objective approach, we regard it as very difficult, if not impossible, to argue transparently about the level of ethical compliance or alignment of a given system, or to point out ethical issues so that they can be promptly addressed.

A first challenge thus concerns the choice of ethical principles to be considered in order to create AI systems that are ethically aligned and cause no harm to society. This has been addressed by the High-Level Expert Group on AI, which in 2019 defined the Ethics Guidelines for Trustworthy Artificial Intelligence [ 6 ], presenting several principles an AI must adhere to in order to be deemed trustworthy and ethical. These guidelines were developed in order to make AI lawful (respecting all applicable laws and regulations), ethical (respecting ethical principles and values) and robust (both from a technical perspective and with regard to its social environment).

In this paper, when referring to ethical principles, we adhere to the work of the High-Level Expert Group on AI. According to its Ethics Guidelines, there is a set of seven key requirements that AI systems should meet in order to be deemed trustworthy:

Human agency and oversight: AI should empower humans to make informed decisions and uphold their fundamental rights. To ensure sufficient oversight, human-in-the-loop, human-on-the-loop and human-in-command approaches can be used.

Technical robustness and safety: AI must be secure and resilient. It must be accurate, dependable, reproducible and safe, with a fallback plan; this is the only way to minimize and prevent unintentional harm.

Privacy and data governance: In addition to privacy and data protection, data governance procedures must consider data quality and integrity and legitimize data access.

Transparency: Data, systems and AI business models should be transparent; traceability mechanisms help achieve this. Additionally, stakeholders should be informed about AI systems and their decisions, since AI needs to be explainable. People must know when they are interacting with an AI system, as well as its capabilities and limitations.

Diversity, non-discrimination and fairness: AI must avoid unfair bias, which can marginalize vulnerable populations and worsen prejudice and discrimination. AI systems should be accessible to all, regardless of disability, and should involve relevant stakeholders throughout their entire life cycle to promote diversity.

Societal and environmental well-being: AI should benefit all human beings, including future generations. AI systems must therefore be sustainable and environmentally friendly, and they should take into account the environment, including other living beings, as well as their social and societal impact.

Accountability: Responsibility and accountability for AI systems and their outcomes should be established. Auditability, which enables the evaluation of algorithms, data and design processes, is crucial, particularly in critical applications. Additionally, adequate and accessible redress should be ensured.

A second challenge is, once the relevant principles are established, how to measure an AI system's level of compliance or alignment with these principles. In our view, this must go beyond mere words or statements of commitment by organizations; it must include automated, objective and standardized assessment means, together with the communication of the results of these internal analyses to the public. Such an approach should guide the development of AI in light of generally accepted human-centric values and principles.

However, principles by themselves are high-level and abstract by nature. They are, per se, of limited usefulness to Data Scientists or Computer Scientists, who work at a technical level and may struggle to find effective ways to implement them. Moreover, these principles are very much subject to interpretation: different organizations or individuals may interpret and implement them differently, eventually leading to a lack of standardization and to unfair competition. Ultimately, this may generate confusion and an information barrier for the end-user.

In this paper, we argue that abstract and high-level concepts and guidelines, while an essential starting point, are not enough. We believe that only with the creation of objective and observable ethics metrics that can be integrated into a generic Data Science pipeline will we be able to: 1) have a real-time perception of the level of alignment of a given system; 2) pinpoint the root causes of any misalignment (e.g., data, model, processes); and 3) take the appropriate counter-measures to, first, avoid a negative impact on users and, second, address the root cause.

Especially important is the notion that this is not a one-shot intervention: a system that is deemed ethically aligned in a given moment in time may not be deemed so at a later stage. This can happen, for instance, due to changes in the parameters or hyper-parameters of a Machine Learning (ML) model (that drove it to overfit some particular data or become biased), or to changes in the underlying data (that became itself biased or of poor quality). So, monitoring ethical compliance through objective metrics or indicators must be seen as a continuous effort throughout all the stages of any Data Science pipeline.

Figure 1: Artificial Intelligence data pipeline with ethics observability solution

Our contribution to address this challenge lies in what we consider the first necessary step: to carry out a survey and characterization of existing objective metrics that can be used to quantify, to some extent or in some dimension, the level of ethical alignment of a Data Science pipeline. To do so, this paper presents the results of a systematic literature review on metrics-based ethical assessment and compliance in AI systems. The goal is to carry out a survey of existing objective ethics metrics and to organize them by ethical principle in order to create a map of how each principle is currently measurable and observable in a typical Data Science pipeline, according to the literature.

In the context of this paper, when using the term metrics, we refer to any measure in which an objective quantitative or qualitative value is assessed in relation to a standard or scale. These metrics yield specific numerical values, percentages, frequencies, or even functions that measure the distance between two outcomes within a defined space, and are directly linked to the evaluation of the ethical principles of AI. A metric can also be a qualitative measure that shows non-quantitative conformance to a specific criterion, likewise directly related to the evaluation of those principles.

This SLR was designed with the goal of finding practical and objective metrics that could be seamlessly integrated into any typical data pipeline, such as the one represented in Fig. 1. This pipeline, depicted here in very general terms, also conveys another important criterion: the metrics identified should be technology-agnostic and reusable in any data pipeline, and they should be relevant and applicable to any set of data, be it streaming or batch. We want the identified metrics to be useful for the development of monitoring systems that analyze, in real time, the ethical compliance of a data pipeline under the principle of observability. This integration is crucial for practical implementation, enabling Data Scientists and AI developers to incorporate ethical considerations into their models and algorithms from the outset. By focusing only on quantifiable and automatable metrics, the aim is to provide concrete, actionable insights in real time that align with the data-driven nature of AI and Data Science systems.
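To make this notion of ethics observability more concrete, the following is a minimal, illustrative sketch in Python (names such as EthicsMonitor are hypothetical and not taken from any reviewed paper) of how technology-agnostic metrics could be registered once and then evaluated on every batch or streaming window that flows through a pipeline:

```python
from typing import Callable, Dict

import numpy as np

# An "ethics metric" is modeled as any callable mapping a batch of model
# outputs plus a sensitive attribute to a single number. This keeps the
# interface technology-agnostic: it works for streaming windows or batches.
EthicsMetric = Callable[[np.ndarray, np.ndarray], float]


def demographic_parity_difference(y_pred: np.ndarray, group: np.ndarray) -> float:
    """P(y_pred = 1 | group = 0) - P(y_pred = 1 | group = 1)."""
    return float(y_pred[group == 0].mean() - y_pred[group == 1].mean())


class EthicsMonitor:
    """Evaluates registered metrics on each batch and flags threshold breaches."""

    def __init__(self, thresholds: Dict[str, float]):
        self.metrics: Dict[str, EthicsMetric] = {}
        self.thresholds = thresholds  # illustrative per-metric alert thresholds

    def register(self, name: str, metric: EthicsMetric) -> None:
        self.metrics[name] = metric

    def observe(self, y_pred: np.ndarray, group: np.ndarray) -> Dict[str, float]:
        report = {}
        for name, metric in self.metrics.items():
            value = metric(y_pred, group)
            report[name] = value
            if abs(value) > self.thresholds.get(name, float("inf")):
                print(f"[ethics-alert] {name} = {value:.3f} exceeds threshold")
        return report


# Called once per batch/window inside the pipeline:
monitor = EthicsMonitor(thresholds={"demographic_parity": 0.1})
monitor.register("demographic_parity", demographic_parity_difference)
rng = np.random.default_rng(0)
print(monitor.observe(rng.integers(0, 2, 1000), rng.integers(0, 2, 1000)))
```

Because the monitor is defined over generic arrays rather than over any particular model or framework, the same hook can be attached at several pipeline stages, which is what makes continuous, per-batch observability feasible.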

For this reason, any tool, metric, framework, checklist, toolkit, or theoretical concept that either requires human intervention (and may thus be prone to bias or individual choices) or cannot be implemented in a fully automated way was deliberately left out, regardless of its value in the ethical AI landscape.

Indeed, tools or frameworks that need human intervention can vary significantly in their application across different contexts and domains, making it challenging to establish a standardized set of metrics. Moreover, relying on manual metrics, or on metrics that cannot explain how ethical an AI system is, introduces subjectivity and inconsistency, which may undermine the reliability of the findings. Furthermore, by eschewing assessment frameworks or manual metrics that require human intervention, we avoid the risk of bias or preconceived notions influencing the selection of metrics. Instead, we adopted a more agnostic approach, focusing solely on quantifiable metrics of ethical AI that can be implemented and automated within a standardized AI or Data Science pipeline. While this approach may limit the scope of the study, we believe it provides a solid foundation for future research and practical implementation.

It is also not our goal to identify solutions or mitigation approaches to ethical problems, such as tools that do automatic resampling of data or bias mitigation tools [ 7 ]. We do, however, identify metrics or indicators that can identify and quantify such problems.

The rest of the paper is organized as follows: Sect. 2 presents the research goals and methodology of the systematic literature review (SLR); Sect. 3 presents the results of the SLR; Sect. 4 addresses the main findings derived from the SLR; and Sect. 5 presents the conclusions.

2 Research goals and methodology

The main goal of this SLR is to identify objective ethics metrics currently present in academic literature. Moreover, we also aim to create a map of how the identified metrics cover the relevant ethical principles proposed in the literature in order to determine whether additional developments are necessary and, if so, in which areas.

Systematic reviews are characterized by a methodical and replicable methodology and presentation. They involve a comprehensive search to locate all relevant published work on a subject, followed by a systematic integration and synthesis of the search results and a critique of the extent, nature and quality of the evidence in relation to a particular research question, highlighting gaps between what is known and what needs to be researched or developed [ 8 ].

This SLR was conducted following the PRISMA method (Preferred Reporting Items for Systematic Reviews and Meta-Analyses). PRISMA defines a set of evidence-based minimum items for reporting systematic reviews and meta-analyses, especially those evaluating interventions [ 9 ]. It primarily focuses on the reporting of reviews evaluating the effects of interventions, but it can also be used as a basis for reporting systematic reviews with other objectives.

The first stage of this SLR consisted of identifying the research issues. In this instance, and after a first comprehensive search, any paper related to the metrics-based assessment of the ethics of AI was considered eligible. Thus, any study, article, or paper whose title, abstract, or keywords included terms related to ethics, Artificial Intelligence and metrics was retrieved. In summary, the search keywords were “Ethics” AND “Artificial Intelligence” AND “Metrics,” and the query used to extract related papers was: (TITLE-ABS-KEY (metrics) AND TITLE-ABS-KEY (artificial AND intelligence) AND TITLE-ABS-KEY (ethics)) AND PUBYEAR > 2017.
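For reproducibility, a query of this form can also be executed programmatically. The sketch below is only one possible way to do so: it assumes the third-party pybliometrics package and a configured Scopus API key, and the exact initialization step varies across package versions.

```python
# Hedged sketch: assumes the third-party `pybliometrics` package and a
# configured Scopus API key; initialization details vary by version.
from pybliometrics.scopus import ScopusSearch

QUERY = (
    "(TITLE-ABS-KEY(metrics) AND TITLE-ABS-KEY(artificial AND intelligence) "
    "AND TITLE-ABS-KEY(ethics)) AND PUBYEAR > 2017"
)

search = ScopusSearch(QUERY)
print(f"Records retrieved: {search.get_results_size()}")
for record in search.results or []:
    # Each result is a namedtuple carrying bibliographic fields.
    print(record.title)
```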

This SLR was conducted in June 2023; consequently, any additional papers introduced after that date have not been taken into account.

Hence, the available literature related to the metrics-based assessment of the ethical compliance of AI was reviewed for a comprehensive understanding of which solutions already exist and what research has already been conducted.

The flowchart depicted in Fig. 2 summarizes the literature retrieval process. In total, the literature search yielded 66 papers. Of these, 45 remained after the removal of duplicates (20) and of 1 paper that could not be accessed. This paper could not be accessed because of its cost; the analysis of its abstract, however, revealed that it would not be relevant given the goals and scope of this literature review.

Figure 2: Flow chart summarizing the process of paper selection for inclusion in the study

The second stage consisted of identifying relevant studies based on the preceding findings and selecting the studies that were relevant to the subject. Although the relevance of ethics metrics and principles varies according to the domain (e.g., healthcare, law, education), we approach this work from a domain-agnostic perspective, as the implementation of each metric is the same regardless of the domain. For this reason, we did not filter papers based on their domain.

The search was carried out on two major databases, Scopus and Web of Science, and was intended to acquire all related studies, articles and academic papers from the last five years, starting in January 2018 and ending in June 2023. This date interval was chosen for several reasons. First and foremost, we wanted to identify recent relevant contributions since the surge in AI research that started around 2018. Second, while we acknowledge that there is relevant literature on AI Ethics before 2018, that work is largely theoretical or fundamental and thus not aligned with the goals and scope of this literature review.

Furthermore, we also did not restrict the search to a specific geography. We acknowledge that the European space is, arguably, at the forefront of AI Ethics and regulation, and the work developed is aligned with the principles proposed by the European Commission. However, any research anywhere can make a valuable contribution.

For a similar reason, we did not include in this search documents of a non-scientific nature, such as reports generated or requested by political entities, legislation, or guidelines. By relying only on the outputs of scientific research, we believe the results are more likely to be impartial and unbiased.

Duplicated papers were removed at this stage. Based on the topics covered by their titles and abstracts, 38 articles were obtained as full-text and assessed for eligibility [ 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 , 42 , 43 , 44 , 45 , 46 , 47 ].

The selection criteria were established so as to eliminate articles that, after an analysis of their abstracts, were in no way related to the topic in question. Therefore, any article that did not address ethical AI and/or metrics in some way, even implicitly, was excluded.

Consequently, following a full-text screening of each remaining paper, every paper that discussed the need for metrics, evaluation, or analysis of any ethical principle, even if only theoretically or implicitly, was selected. This allowed us to quantify the frequency with which the evaluation of each principle is addressed in the literature and thus to estimate its level of maturity. Then, an in-depth full-text analysis of the previously selected pool of papers was carried out. Any article that did not present at least a single practical and objective metric to assess an AI system's level of compliance with any of the stated principles was removed from the analysis.

In the third and final stage, data from the selected papers were collected, systematized and summarized, and the results were reported. Initially, the relevant information was extracted and summarized on a per-article basis: a detailed summary was elaborated for each selected paper based on a full-text analysis, focusing on the domain of application, the ethical principle addressed, the solution provided, the metric proposed (if any), and the specific outcomes, among other aspects. In this stage, papers were also grouped according to the ethical principle(s) addressed (e.g., bias, robustness, explainability), in order to ascertain the level of coverage of each principle in the literature.

Out of the 38 papers eligible for full-text assessment, 8 did not fit any subject of the literature review and were considered irrelevant [ 35 , 36 , 38 , 40 , 44 , 46 , 47 ]. These papers were excluded on the grounds that they bore no relationship to the topic of study, since they addressed neither ethical AI nor any kind of metrics for the ethical assessment of AI.

Another group of 6 papers [ 34 , 37 , 39 , 41 , 42 , 45 ] in the full-text assessment pool discussed ethical AI but did not address the need for evaluation or for metrics for the ethical assessment of AI. For instance, Edwards [ 34 ] addresses the IEEE P7010 standard and how it can help organizations and individuals who develop or use AI to consider and integrate ethical considerations for human well-being in product development. The article discusses how IEEE P7010 presents several well-being metrics related to growth, income and productivity, but it does not present any quantifiable, measurable metric that evaluates how ethical an AI system is; it merely notes that such metrics exist.

Similarly, Germann et al. [ 47 ] clinically validate a Deep Convolutional Neural Network (DCNN) for the detection of surgically proven Anterior Cruciate Ligament (ACL) tears in a large patient cohort and analyze the effect of magnetic resonance examinations from different institutions with varying protocols and field strengths. The paper validated the DCNN and concluded that it showed sensitivities and specificities well above 90% for the entire population, irrespective of the magnetic field strength of the MRI. This paper is not relevant here, since it neither addresses the ethical analysis of the model nor presents any relevant ethics metric.

Finally, 12 papers [ 15 , 17 , 21 , 24 , 25 , 27 , 28 , 29 , 30 , 31 , 32 , 33 ] explored the necessity of metrics and evaluation methods for assessing the ethical implications of AI. However, these studies either did not provide specific metrics or proposed only theoretical solutions relying on checklists or toolkits.

For instance, Saetra et al. [ 24 ] present an approach to the evaluation of the positive and negative impacts of AI on sustainable development, relating ESG (Environmental, Social, and Governance) reporting to the United Nations Sustainable Development Goals (SDGs). It focuses on the examination of micro-, meso- and macro-level impacts, considering both negative and positive impacts, and accounting for ripple effects and inter-linkages between the different impacts. Although relevant and well-intentioned, the article presents neither a metrics-based solution nor a way to evaluate the ethical state of an AI system, placing it outside the scope of this work.

As previously mentioned, the 38 papers were grouped according to the ethical principles mentioned in each one, with the resulting clusters allowing us to quantify the frequency of each principle in the literature (independently of whether actual metrics were proposed). Furthermore, a total of 12 papers [ 10 , 11 , 12 , 13 , 14 , 16 , 18 , 19 , 20 , 22 , 23 , 26 ] fulfilled all the inclusion criteria, including presenting at least one ethics metric, and were selected for a more in-depth analysis. The results of this analysis are detailed in Sect. 3.

3 Results

3.1 Literature summarization

Raji and Buolamwini [ 26 ] investigate the commercial impact of algorithmic audits (Gender Shades) on increasing algorithmic fairness and transparency. The Gender Shades study audits commercial facial recognition products to assess their ability to correctly identify gender and skin types. The auditors (an independent group) follow a procedure similar to coordinated vulnerability disclosures in information security, which involves documenting the vulnerabilities they find and giving the companies a chance to respond before publicly releasing the results. The goal is to expose performance vulnerabilities in commercial facial recognition. The study targets IBM, Microsoft, Megvii, Amazon, Kairos, and others. Based on the measurement of error differences across the identified subgroups, they conclude that all targets reduced accuracy disparities between males and females and between darker- and lighter-skinned subgroups, with the most significant update occurring for the darker-skinned female subgroup, which underwent a 17.7–30.4% reduction in error between audit periods. A series of ethics metrics is presented in this study.
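To illustrate the kind of measurement underlying such an audit, the following sketch (with hypothetical field names; this is not the auditors' code) computes per-subgroup error rates at the intersection of gender and skin type, from which disparities such as those reported above can be read off:

```python
import numpy as np

def subgroup_error_rates(y_true, y_pred, gender, skin_type):
    """Error rate for each (gender, skin type) subgroup, audit-style."""
    report = {}
    for g in np.unique(gender):
        for s in np.unique(skin_type):
            idx = (gender == g) & (skin_type == s)
            if idx.any():
                report[(g, s)] = float((y_true[idx] != y_pred[idx]).mean())
    return report

# The audit's headline quantity is then the spread between the best- and
# worst-served subgroups:
# disparity = max(report.values()) - min(report.values())
```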

Kaul and Soofastaei [ 10 ] address the current state of AI ethical principles in the mining industry and present a series of guidelines and recommendations on how to use AI ethically across the project lifecycle (CRISP-DM). The goal is also to help organizations understand, evaluate and mitigate bias using the appropriate fairness metrics. They present several fairness metrics and bias mitigation algorithms to help remove bias (e.g., equal opportunity, demographic parity, disparate impact, and the Theil index, among others).
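As an illustration of three of the measures named above, the sketch below follows formulations commonly used in the fairness literature; in particular, the per-individual "benefit" used for the Theil index, b_i = y_pred_i - y_true_i + 1, is one common convention rather than the only possible one:

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Difference in positive-prediction rates between the two groups."""
    return float(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def disparate_impact(y_pred, group):
    """Ratio of positive-prediction rates; values far below 1 signal adverse impact."""
    return float(y_pred[group == 0].mean() / y_pred[group == 1].mean())

def theil_index(y_true, y_pred):
    """Theil index over per-individual benefits b_i = y_pred_i - y_true_i + 1,
    so a false positive yields benefit 2, a correct decision 1, a false negative 0."""
    b = y_pred - y_true + 1.0
    ratio = b / b.mean()
    terms = np.zeros_like(ratio)
    positive = ratio > 0
    terms[positive] = ratio[positive] * np.log(ratio[positive])
    return float(terms.mean())
```

Demographic parity and disparate impact look only at predictions, whereas the Theil index, as a generalized entropy measure, captures inequality in how prediction errors (benefits) are distributed across individuals.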

Kasirzadeh [ 11 ] draws on feminist political theory (the work of Iris Marion Young) and argues that a primarily distributive approach to the local allocation of material and computational goods is too narrow to adequately address concerns about social injustice vis-à-vis algorithmic fairness. The paper argues that algorithmic fairness is morally salient and requires philosophical attention, and that algorithmic ecosystems are socio-technical entities and are therefore susceptible to different sources of social injustice. It argues that existing metrics are concerned only with local matters of distributive justice; however, not all sources of social injustice are distributive, as some are structural. The paper also argues for six positive corollaries of adopting socially responsible algorithmic fairness as the conceptual basis for research into the infrastructural fairness of algorithmic ecosystems and their direct effects. In summary, the paper connects some dimensions of Young's philosophical work to algorithmic fairness, discussing counterfactual comparison, demographic parity, equal opportunity, and related notions in the process.

Zafar et al. [ 12 ] focus on an algorithm for diabetic retinopathy screening and risk stratification. According to the study, additional performance metrics that extend beyond the assessment of technical accuracy are needed in order to comprehensively understand the influence of AI algorithms on patient outcomes. There is a need for real-world evaluation of safety, efficacy and equity (bias), of the impact on patient outcomes, for ethical evaluation (using federated learning to test privately against the same algorithm rather than using pooled data), and for logistical and regulatory evaluation. In summary, the article argues that there is a need for real-world validation and testing. It also refers to equal opportunity and equalized odds when addressing bias, arguing that there are two well-defined types of AI bias, the (in)equality of opportunity (equal precision) and the (in)equality of odds (equal false positive and false negative rates), which can occur at all stages of the development of AI algorithms.
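The sketch below illustrates these two notions in their most common formulation, equal opportunity as an equal true positive rate and equalized odds as equal true positive and false positive rates across groups (Zafar et al., as summarized above, phrase the former in terms of precision); names are illustrative, and each group is assumed to contain both classes:

```python
import numpy as np

def group_rates(y_true, y_pred):
    """True positive rate and false positive rate for one group."""
    tpr = float(y_pred[y_true == 1].mean())  # sensitivity on actual positives
    fpr = float(y_pred[y_true == 0].mean())  # false alarms on actual negatives
    return tpr, fpr

def equalized_odds_gaps(y_true, y_pred, group):
    """Equal opportunity asks for a zero TPR gap between groups;
    equalized odds asks for both the TPR and FPR gaps to be (near) zero."""
    tpr0, fpr0 = group_rates(y_true[group == 0], y_pred[group == 0])
    tpr1, fpr1 = group_rates(y_true[group == 1], y_pred[group == 1])
    return {"tpr_gap": tpr0 - tpr1, "fpr_gap": fpr0 - fpr1}
```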

Bae and Xu [ 13 ] evaluate two state-of-the-art pedestrian trajectory prediction models for age and gender bias across three different datasets (JAAD, PIE, and TITAN). The goal is to design and utilize novel evaluation metrics for comparing model performance (mean MSE, the Mann–Whitney U test, the Wasserstein distance). Both models (BiTraP, SGNet) perform worse on children and the elderly than on adults. The paper also identifies potential sources of bias and some metrics to detect them (demographic parity/statistical parity), and discusses several limitations of the study. It concludes that there is no clear difference between genders.

Kasirzadeh and Clifford [ 14 ] lay the foundation for a more detailed analysis of the fairness principle by first exploring the potential for fairness metrics to operationalize the principle in order to respond more adequately to the potential for unfair outcomes. The paper argues that there are significant limitations to the use of fairness metrics for processing personal data fairly. It discusses popular metrics for the assessment of algorithmic fairness, such as statistical parity and equality of opportunity, as well as one way to provide a more concrete analysis of the notion of individual fairness using counterfactuals. It concludes that the technical challenges have an impact on the usefulness of Data Protection Impact Assessments, irrespective of a controller's willingness to actively engage in the process; that fairness metrics are context-dependent, with different metrics embodying varying interpretations of fairness; and that data controllers play a key role in determining what is fair.

Cortés et al. [ 16 ] introduce the notion of locality and define structural interventions. They compare the effect of structural interventions on a system with that of local, structure-preserving interventions on technical objects. The paper proposes a methodology (RISE) to account for elements of algorithmic discrimination based on social origin. It places the algorithm in the social context in which it is deployed instead of considering the algorithm in isolation. The framework allows for the identification of bias outside the algorithm stage and proposes joint interventions on social dynamics and algorithm design. To evaluate this proposal, the authors use the demographic parity, equal opportunity and equalized odds metrics. The paper closes by demonstrating several structural interventions in a model for financial lending, showing that structural interventions, unlike purely algorithmic ones, can in fact help the system become more equal.

Zhang et al. [ 18 ] address the existing conflicts and inconsistencies among accuracy and fairness metrics, and consider how to optimize accuracy and multiple fairness metrics simultaneously and more effectively. The paper presents 16 fairness metrics and, based on the obtained correlations, concludes that 8 fairness metrics can represent all 16. It also frames the mitigation of unfairness as a multi-objective learning problem, using a multi-objective evolutionary learning framework to optimize the metrics simultaneously; ensembles are then constructed from the resulting models in order to automatically balance the different metrics. The authors evaluate the approach on eight different datasets and conclude that the framework can improve fairness according to a broad range of fairness metrics, even ones not used in the multi-objective learning algorithm. They show that the model performs well both for the eight fairness metrics used in training (average odds difference, error difference, discovery ratio, predictive equality, false omission rate difference, false omission rate ratio, false negative rate difference, false negative rate ratio) and for the eight fairness metrics not used in training (error ratio, discovery difference, false positive rate ratio, disparate impact, statistical parity, equal opportunity, equalized odds, predictive parity).

Schedl et al. [ 19 ] address the necessity of engaging different stakeholders when investigating biases and fairness in the value chain of recommendation systems. They also point out that there are discrepancies between computational metrics of bias and fairness (disparate impact, generalized entropy index, statistical parity) and their actual individual and societal perception, and note that bias cannot be measured in a fully objective way, pushing instead for a more holistic perspective on human perception in relation to psychological, sociological and cultural backgrounds. The paper finally discusses metrics of bias and fairness, as well as technical debiasing solutions, in the context of ethical considerations and legal regulations.

Goethals et al. [ 20 ] address the use of counterfactual explanations to assess the fairness of a model. The paper first presents some fairness metrics (demographic parity, disparate impact, equal opportunity, equalized odds, statistical parity) and counterfactual fairness. It then argues that counterfactual explanations can detect not only explicit bias (when a sensitive attribute is used) but also implicit bias (when the sensitive attribute is not used). It presents the PreCoF metric (Predictive Counterfactual Fairness), which is successfully used to detect implicit bias in the models in the presented use cases.

Fleisher [ 22 ] argues that the method of individual fairness does not serve as a definition of fairness, should not be the only fairness method applied, and should not be given priority over other fairness methods. In the process, the paper addresses equalizing odds, measuring statistical distance, and achieving parity. The author presents four in-principle problems for individual fairness: the insufficiency of similar treatment (similar treatment is insufficient to guarantee fairness), systematic bias and arbiters (the method is at risk of encoding human implicit bias), prior moral judgments (individual fairness requires prior judgment, limiting its usefulness as a guide for fairness), and the incommensurability of relevant moral values (two values are incommensurable if there is no common measure that can be applied to both), which makes similarity metrics impossible for many tasks.

Wachter et al. [ 23 ] address the critical gap between legal, technical and organizational notions of algorithmic fairness. By analyzing EU non-discrimination law and the jurisprudence of the European Court of Justice (ECJ) and national courts, it identifies a critical incompatibility between European notions of discrimination and existing work on algorithmic and automated fairness. There is a clear gap between statistical measures of fairness (e.g., demographic parity, equalized odds) embedded in various fairness toolkits and governance mechanisms and the context-sensitive, often intuitive, and ambiguous discrimination metrics and evidential requirements used by the ECJ. The article makes three contributions. First, the authors review the evidential requirements to bring a claim under EU non-discrimination law. Due to the disparate nature of algorithmic and human discrimination, they conclude that the EU’s current requirements are too contextual, reliant on intuition and open to judicial interpretation to be automated. Secondly, they show how the legal protection offered by non-discrimination law is challenged when AI, not humans, discriminates. Compared to traditional forms of discrimination, automated discrimination is more abstract and unintuitive, subtle, intangible and difficult to detect. Thirdly, they examine how existing work on fairness in machine learning lines up with procedures for assessing cases under EU non-discrimination law. The paper proposes ‘conditional demographic disparity’ (CDD) as a standard baseline statistical measurement that aligns with the Court’s ‘gold standard’.
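A hedged sketch of how conditional demographic disparity is commonly operationalized follows: demographic disparity (the protected group's share among rejections minus its share among acceptances) is computed within strata of a "legitimate" conditioning attribute and combined as a stratum-size-weighted average. Variable names are illustrative, and this is not the authors' reference implementation:

```python
import numpy as np

def demographic_disparity(rejected, protected):
    """Protected group's share among rejections minus its share among
    acceptances; positive values indicate disadvantage."""
    return float(protected[rejected].mean() - protected[~rejected].mean())

def conditional_demographic_disparity(y_pred, protected, strata):
    """Stratum-size-weighted average of demographic disparity, conditioned
    on a legitimate attribute (e.g., the department applied to)."""
    rejected = (y_pred == 0)
    n, cdd = len(y_pred), 0.0
    for s in np.unique(strata):
        idx = (strata == s)
        # Strata where everyone is accepted or everyone is rejected are
        # skipped here; a production implementation would need a policy.
        if rejected[idx].any() and (~rejected[idx]).any():
            cdd += idx.sum() / n * demographic_disparity(rejected[idx], protected[idx])
    return cdd
```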

This concludes the summary of each paper's contribution. The following sub-sections describe and define the ethical principles considered in this work, as well as the identified metrics.

3.2 Ethical principles

Over the past years, as a result of the need for more human-aligned AI development, many principles have been put forward and many subjects have become widely discussed. Table 1 systematizes the key subjects identified in the 38 papers analyzed in the context of this SLR, as well as in the Ethics Guidelines for Trustworthy AI [ 6 ].

Based on the literature review, the most frequently addressed subject is Bias, mentioned in 22 papers (58%), closely followed by Fairness, mentioned in 20 studies (53%). Other frequent subjects include Transparency, referred to in 12 studies (32%), and Privacy, mentioned in 11 studies (29%).

Precisely defining and distinguishing these subjects is also often challenging, as some of them overlap significantly according to the accepted definitions in the literature. This happens because they have been proposed freely by researchers, using terms that are sometimes synonyms or very closely related. Since having a large number of (potentially overlapping) key subjects might dilute the findings of the work to be carried out, a decision was taken to cluster every subject within a specific ethical principle based on the Ethics Guidelines for Trustworthy AI [ 6 ].

Thus, Human agency was attributed to the ethical principle of Human Agency and Oversight; Robustness and Safety were clustered into Technical Robustness and Safety; Privacy and Governance were assigned to Privacy and Data Governance; Transparency and Explainability to Transparency; Bias and Fairness were clustered into Diversity, Non-Discrimination and Fairness; Well-being and Non-maleficence were assigned to Societal and Environmental Well-being; and Accountability was kept as Accountability.

This clustering allows a better understanding of the attention that the literature has devoted to each ethical principle, as detailed in Table 2, which presents the frequency of each ethical principle across the 38 articles. Table 3, in turn, organizes the 38 papers according to the ethical principles they address. The results are comparable to the previous ones: the principle devoted the greatest attention was Diversity, Non-Discrimination and Fairness, mentioned in 24 different papers (63%), followed by Transparency, addressed in 14 different studies (37%), and Accountability, with 7 presences (18%).

Clearly, this analysis shows an imbalance between principles (dominated by Diversity, Non-Discrimination and Fairness). However, in line with the goals of this work, we deepen this analysis by examining which papers provide tangible outcomes and practical metrics for assessing the ethical condition of AI, rather than merely mentioning a given principle.

As detailed in Table 4, only 12 of the 24 articles related to Diversity, Non-Discrimination and Fairness in fact presented at least one actual practical metric. Moreover, these articles correspond to 100% of all articles that presented metrics and to 32% of all full-text analyzed articles.

Thus, several key findings can already be drawn:

Only a small share of the articles (32%) provide an actual objective and measurable metric;

All the articles that mention objective metrics (12) relate to the principle of “Diversity, Non-Discrimination and Fairness”;

There is a significant need to investigate and propose novel metrics that can be used to quantify the level of compliance of an AI system with the remaining ethical principles.

Table 5 identifies the 12 articles that propose actual metrics, by principle. The following section discusses in detail each of the identified metrics in these 12 articles.

3.3 Ethics metrics

After the analysis of the 12 studies that present actual objective metrics, it is important to understand which specific concept each metric measures and how this is accomplished, in order to assess their relevance.

Similar to the ethical principles, it was found that many metrics are similar or even overlap in the concept they represent, despite having slightly different names. Thus, a similar exercise was conducted, in which groups of equivalent metrics were clustered into a single one, so as to prevent repetitions and better focus the analysis. Table 6 shows the result of this clustering. The left column shows all the metrics found in the literature. These metrics have then been clustered according to their meaning, as depicted in the right column.

In total, after their clustering, 15 objective metrics have been identified in the articles. Independently of their names in the original papers, they will be, from this point onward, referred to by the name on the right column of Table 6 .

As is clear, the majority of the metrics used to assess the level of ethical compliance of AI are fairness metrics. Fairness in machine learning refers to the various attempts at correcting algorithmic bias in automated decision processes based on machine learning models and at identifying historical systematic disadvantages.

According to the literature, there is a lack of consensus in the community about what is considered biased or fair, and a lack of consensus among the different measures. Since fairness in different contexts can be translated into different quantitative definitions that emphasize different perspectives, no single measure has been accepted as a universal notion of fairness quantification [18, 19].

There are several fairness metrics that, depending on the context, are relevant and useful to identify and mitigate bias. Fairness metrics can be divided into two conflicting but complementary categories: group fairness and individual fairness.

Group fairness is the idea that the average classifier behavior should be the same across groups defined by protected attributes, while individual fairness is the idea that all individuals with the same feature values should receive the same predicted label and that individuals with similar features should receive similar predicted labels. Individual fairness includes the special case of two individuals who are the same in every respect except for the value of one protected attribute [ 22 , 48 ].

Typically, protected (also called sensitive) attributes are traits considered to be discriminative by law, such as gender, race, age, among others [ 18 ].

As detailed in Table 6, a total of 33 differently named metrics were identified; however, not all of them are unique: some are identical and were merely designated by different names in different papers. In the remainder of this section, every unique metric, identified by its clustered name, will be described in detail.

Fig. 3 Diagnostic testing diagram. Retrieved from Wikipedia: https://en.wikipedia.org/wiki/Precision_and_recall

Given that some of these concepts are rather abstract, the example scenario of a bank loan approval will be used in some cases to make them more concrete.

Firstly, some general notation, which will be used in the remainder of the section for the metrics and the bank loan scenario:

\(X \in R^d\) : quantified features of the applicant (e.g., education, work experience, college GPA, income, etc.). These are the observed characteristics of an individual (variables);

A : sensitive attribute/protected feature (e.g., sex, race, ethnicity);

\(C = c(X,A) \in \{0,1\}\): binary predictor (classifier) (e.g., approved/rejected), which constitutes a decision based on a score \(R := r(X,A) \in [0,1]\); in this case c(X, A) decides if the person should be given (1) or denied (0) a loan;

\(Y \in \{0,1\}\): target variable (binary outcome variable) (e.g., repaid/defaulted);

\(\hat{Y} \in \{0,1\}\): predicted decision of the target variable;

Assume X ,  A ,  Y are generated from an underlying distribution D i.e., \((X, A, Y) \sim D\) ;

Denote \(P_a[Y]:= P[Y|A=a]\) .

We also briefly describe some concepts that are fundamental to understand the metrics (Fig.  3 ):

True positive (TP) is a granted loan to someone who can pay it back;

False positive (FP) is a granted loan that goes on to default (failed to pay the loan);

False negative (FN) is a loan denied to someone who can pay it back;

True negative (TN) is a loan denied to someone who would default;

True positive rate (TPR) is the proportion of people who could pay back loans that were actually granted loans;

False positive rate (FPR) is the proportion of people who would default that were granted loans;

False negative rate (FNR) is the proportion of people who could pay back loans that were actually denied loans;

True negative rate (TNR) is the proportion of people who would default that were actually denied loans;

In this situation, both false positives and false negatives are detrimental to both parties. A false positive costs the lender, who loses the amount lent when the loan goes on to default.

A false negative results in financial loss for the bank as it prevents the collection of interest on a repayable loan, while it also imposes a cost on the borrower by denying them access to credit that is rightfully theirs.
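To make these rates concrete, the following minimal Python sketch (our own illustration, not code from the reviewed papers) computes them for a single group of a dataset; the fairness metrics below compare these rates across groups:

```python
import numpy as np

def group_rates(y_true, y_pred, mask):
    """TPR, FPR, FNR and TNR for the rows selected by a boolean group mask.

    y_true: actual outcomes (1 = would repay); y_pred: decisions (1 = granted).
    Assumes the group contains both positive and negative actual outcomes.
    """
    yt, yp = np.asarray(y_true)[mask], np.asarray(y_pred)[mask]
    tp = np.sum((yt == 1) & (yp == 1))  # granted, would repay
    fp = np.sum((yt == 0) & (yp == 1))  # granted, would default
    fn = np.sum((yt == 1) & (yp == 0))  # denied, would repay
    tn = np.sum((yt == 0) & (yp == 0))  # denied, would default
    return {"TPR": tp / (tp + fn), "FPR": fp / (fp + tn),
            "FNR": fn / (tp + fn), "TNR": tn / (fp + tn)}
```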

Next, we detail each of the metrics identified in the literature.

3.3.1 Demographic parity (DP)

Demographic parity and statistical parity are the same concept according to the literature [49]. Statistical distance, in this case, refers to the difference in the probability of a positive prediction between two different groups, which is exactly what demographic and statistical parity measure. This metric is also known as independence in some cases.

The goal of demographic parity is to make the selection probabilities of the different segments of a population as equal as possible, indicating that the model’s predictions are not biased with respect to demographic attributes. Demographic parity is the property that the demographics of those receiving positive (or negative) outcomes are identical to the demographics of the population as a whole. Demographic parity speaks to group fairness rather than individual fairness and appears desirable, as it equalizes outcomes across protected and non-protected groups [10, 50].

The equation for demographic parity is relatively straightforward and is typically expressed as follows:

For all \(a,b \in A\) :
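\(P(\hat{Y}=1 \mid A=a) = P(\hat{Y}=1 \mid A=b)\)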

In this equation:

\(P(\hat{Y}=1|A=a)\) represents the probability that the predicted outcome ( \(\hat{Y}\) ) is positive (e.g., approval for a loan) given that the individual belongs to demographic group a (e.g., a protected group, like race = black).

\(P(\hat{Y}=1|A=b)\) represents the probability that the predicted outcome is positive given that the individual belongs to demographic group b (e.g., a non-protected group, like race = white).
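As an illustrative sketch (our own, assuming a binary sensitive attribute encoded as 1 for group a and 0 for group b), the demographic parity difference can be estimated directly from the model’s predictions:

```python
import numpy as np

def demographic_parity_difference(y_pred, sensitive):
    """Selection-rate difference P(Yhat=1 | A=a) - P(Yhat=1 | A=b).

    y_pred: binary predictions; sensitive: 1 for group a, 0 for group b.
    A value of 0 indicates demographic parity.
    """
    y_pred, sensitive = np.asarray(y_pred), np.asarray(sensitive)
    rate_a = y_pred[sensitive == 1].mean()  # selection rate of group a
    rate_b = y_pred[sensitive == 0].mean()  # selection rate of group b
    return float(rate_a - rate_b)
```

Dividing the two rates instead of subtracting them yields the disparate impact ratio discussed in Sect. 3.3.6.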

3.3.2 Equalized odds (EqO)

Equalized odds combines equal opportunity and predictive equality: a classifier satisfies equalized odds when the protected and unprotected groups achieve equal TPR (equal opportunity) and equal FPR (predictive equality) [51].

Equalized odds considers conditional expectations with respect to both positive and negative labels, i.e., \(Y=0\) and \(Y=1\). In order to meet this criterion, the outcomes for the subsets of records belonging to the positive and the negative classes must be equal across groups. In certain articles, it is also denoted as false positive rate parity [51].
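In the notation above, and for all \(a,b \in A\), this can be written as:

\(P(\hat{Y}=1 \mid Y=y, A=a) = P(\hat{Y}=1 \mid Y=y, A=b), \quad y \in \{0,1\}\)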

As previously stated, true positive parity is occasionally referred to as equality of opportunity because it mandates that the entire population, irrespective of the dominant group, is afforded the chance to benefit from the decision ( \(Y=1\) ) (See Eq. ( 5 )).

Likewise, false positive rate parity, sometimes referred to as predictive equality, is described in Eq. ( 14 ).

Mathematically, it is equivalent to the conjunction of conditions for false positive error rate balance and false negative error rate balance definitions given above. In this instance, this implies that the probability of an applicant with an actual good credit score being correctly assigned a good predicted credit score and the probability of an applicant with an actual bad credit score being incorrectly assigned a good predicted credit score should both be the same for male and female applicants, for example [ 52 ].

The inclusion of false positive rates takes into consideration the varying costs associated with misclassification across different populations. In instances where a model is utilized to forecast an unfavorable result, such as the likelihood of recidivism, and this outcome disproportionately impacts individuals from minority communities, the presence of false positive predictions can be seen as a manifestation of pre-existing disparities in outcomes between minority and majority populations. The concept of equalized odds aims to ensure that the predictive accuracy of models is uniformly high across all demographic groups. This approach penalizes models that exhibit strong performance solely on the majority groups.

A weaker variant of equalized odds is referred to as accuracy parity. Accuracy parity is achieved when the accuracies within subgroups, obtained by dividing the number of correctly classified examples by the overall number of examples, exhibit little disparity. One drawback of this weaker notion is the potential trade-off between the false positive rate of one group and the false negative rate of another group.

Accuracy parity requires that the classifier guarantees the same accuracy in different sensitive attribute groups.

Accuracy (ACC) is defined as [ 10 ]:
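\({\text {ACC}} = \frac{{\text {TP}} + {\text {TN}}}{{\text {TP}} + {\text {TN}} + {\text {FP}} + {\text {FN}}}\)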

So, accuracy parity can be represented by:
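For all \(a,b \in A\):

\(P(\hat{Y}=Y \mid A=a) = P(\hat{Y}=Y \mid A=b)\)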

3.3.3 Equal opportunity (EO)

Equal opportunity is a state of fairness characterized by a fair treatment of individuals, where they are afforded identical opportunities and rights. The principle asserts that the representation of any subgroup within a larger population (such as gender or ethnicity) should be determined based on the relative size of that subgroup in the overall population. This determination should be made without any hindrance from artificial obstacles, prejudices, or preferences, unless specific justifications can be provided for certain distinctions [ 10 , 18 ].

Equal opportunity is a relaxed version of equalized odds that only considers conditional expectations with respect to positive labels (i.e., \(Y=1\) ). This metric requires equal outcomes only within the subset of records belonging to the positive class and is defined in some cases as equal true positive rates (true positive parity or balance).

In the loaning example, equal opportunity requires that the individuals in group a who are qualified to be given a loan are just as likely to be chosen as the individuals in group b who are qualified to be given a loan. However, by not considering whether false positive rates are equivalent across groups, equal opportunity does not capture the costs of misclassification disparities [16].

EO calculates the ratio of true positives to positive examples in the dataset \({\text {TPR}} = {\text {TP}}/P\) , conditioned on a protected attribute.

Likewise, false negative error rate balance requires that the probability of a subject in the positive class receiving a negative prediction be the same across groups. Thus, equal opportunity, false negative rate parity, false negative rate difference and false negative rate balance are similar.

A common denominator that emerged in the systematic literature review is that different articles presented the same metrics under different names. A conclusion that was drawn is that every time an article addressed a difference, a parity or a balance, it was describing the same metric and trying to reach the same conclusions based on slightly different equations/calculations. So, in this case, false negative rate parity, difference and balance are exactly the same metric, justifying their clustering into a single ethics metric, namely equal opportunity [16, 52, 53].

FNR difference measures the equality (or lack thereof) of false negative rates between groups. In practice, this metric (false negative rate parity) is implemented as a difference between the metric value for group a and group b [ 52 ].

This condition is satisfied when the false negative rates (FNRs) of the protected and unprotected categories are equal. The FNR indicates the probability that an individual in the positive class receives a negative prediction; in this context, the likelihood that an individual who has the ability to pay back a loan is denied one.

In this example, this implies that the probability of an applicant with an actual good credit score being incorrectly assigned a bad predicted credit score should be the same for both male and female applicants, for instance (or for both white and black, etc.): \(P(\hat{Y}=0|Y=1,A=a) = P(\hat{Y}=0|Y=1,A=b).\) Mathematically, a classifier with equal FNRs will also have equal TPR: \(P(\hat{Y}=1|Y=1,A=a) = P(\hat{Y}=1|Y=1,A=b)\) [ 18 , 52 ].

For that matter, when evaluating AI based on equal opportunity, either TPR balance or FNR balance is acceptable.

3.3.4 Predictive parity (PrP)

Predictive parity, calibration, or false omission rate difference (or parity) can also be described as Sufficiency [ 54 ] in some instances.

The predictive value parity is satisfied when both positive predictive value parity (PPV-parity) and negative predictive value parity (NPV-parity) are satisfied, for both protected and unprotected groups.

PPV is the probability that a subject with a positive prediction truly belongs to the positive class. In the bank loan scenario used as an example, the PPV is the proportion of granted loans that were paid back, and the NPV is the proportion of rejected loans that were denied to someone who could not pay the loan back [52].

PPV-parity equalizes the chance of success given a positive prediction. In the bank’s example, PPV-parity requires repayment rates to be equal across the groups of approved applicants.

So, for all \(a,b \in A\):
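\(P(Y=y \mid \hat{Y}=y, A=a) = P(Y=y \mid \hat{Y}=y, A=b), \quad y \in \{0,1\}\)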

This is equivalent to satisfying both:
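\(P(Y=1 \mid \hat{Y}=1, A=a) = P(Y=1 \mid \hat{Y}=1, A=b)\) (PPV-parity);

\(P(Y=0 \mid \hat{Y}=0, A=a) = P(Y=0 \mid \hat{Y}=0, A=b)\) (NPV-parity).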

In this example, this implies that the score returned by a prediction (used to determine the individual’s eligibility for the loan) should reflect the person’s real capability of paying back the loan.

In other words, for example, for both male and female applicants, the probability that an applicant with a good predicted credit score will actually have a good credit score should be the same [ 52 ].

Furthermore, mathematically, a classifier with equal PPVs will also have equal false discovery rates (FDRs):
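\(P(Y=0 \mid \hat{Y}=1, A=a) = P(Y=0 \mid \hat{Y}=1, A=b)\)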

Additionally, false omission rate parity constitutes the other side of predictive parity:
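\(P(Y=1 \mid \hat{Y}=0, A=a) = P(Y=1 \mid \hat{Y}=0, A=b)\)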

We conclude that, when measuring predictive parity, positive predictive value parity, negative predictive value parity, false omission rate parity/difference and false discovery rate parity are all implicitly or explicitly obtained and measured.

3.3.5 Counterfactuals (CF)

Kusner et al. [ 55 ] introduced the notion of Counterfactual fairness, which is a form of fairness developed from Pearl’s causal model. In this framework, the fairness of a prediction made by the model for an individual is determined by whether it remains consistent when the individual is hypothetically assigned to a different demographic group [ 20 ].

The authors propose to employ counterfactuals and define a decision-making process as counterfactually fair if, for any individual, the outcome does not change in the counterfactual scenario where the sensitive attributes are changed.

To measure this, they make explicit assumptions about the causal relationships in the data. One way for a predictor to be counterfactually fair is if it is a function of only non-descendants of the sensitive attribute, so this will be different depending on the chosen causal model [ 20 , 55 ].

In other words, the predictor \(\hat{Y}\) is counterfactually fair if the following holds under any context \(X=x\) and \(A=a\):
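\(P(\hat{Y}_{A \leftarrow a}=y \mid X=x, A=a) = P(\hat{Y}_{A \leftarrow b}=y \mid X=x, A=a), \quad \text {for all } y \text { and all } b \in A\)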

That is, taking a random individual with sensitive attribute \(A=a\) and other features \(X=x\), and the same individual had she had \(A=b\), both should have the same chance of being accepted. The symbol \(\hat{Y}_{A \leftarrow a}\) represents the counterfactual random variable \(\hat{Y}\) in the scenario where the sensitive attribute A is fixed to \(A=a\). Conditioning on \(A=a\), \(X=x\) means that this requirement is at the individual level, in that it conditions on all the variables identifying a single observation [55].

3.3.6 Disparate impact (DI)

Disparate impact, also known as proportional parity or adverse impact, commonly refers to the measurement of unintentional discriminatory practice. It is a quantitative measure of the adverse treatment of protected classes that compares the positive outcomes of one group with those of another [56].

For some authors, disparate impact can also be seen as part of demographic parity [10, 20].

DI is thus not a difference or a parity, but the ratio between the predictions of a favorable outcome in a binary classification task for members of the unprivileged group a and those for members of the privileged group b [10]. For all \(a,b \in A\):
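\({\text {DI}} = \frac{P(\hat{Y}=1 \mid A=a)}{P(\hat{Y}=1 \mid A=b)}\)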

In the bank loan scenario, it can be the ratio between the predicted approval rates of female and male applicants, for instance. Disparate impact is a one-sided reformulation of the demographic parity condition, where 80% disparity is an agreed-upon tolerance decided in the legal arena. For example, if the model’s predictions grant loans to 60% of men (group b) and 50% of women (group a), then DI = 0.5/0.6 ≈ 0.83, which indicates a positive bias toward group b and an adverse impact on the group represented by a.

Values less than 1 indicate that group b has a higher proportion of predicted positive outcomes than group a . This is referred to as positive bias.

A value of 1 indicates demographic parity.

Values greater than 1 indicate that group a has a higher proportion of predicted positive outcomes than group b . This is referred to as negative bias.

3.3.7 Predictive equality (PrE)

Predictive equality, also referred to as false positive error rate balance, is defined as the situation in which the accuracy of decisions is equal across two groups, as measured by the false positive rate (FPR) [53].

A classifier satisfies this condition if subjects in the protected and unprotected groups have equal FPR, as indicated by the fulfillment of the following equation. For all \(a,b \in A\) :
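\(P(\hat{Y}=1 \mid Y=0, A=a) = P(\hat{Y}=1 \mid Y=0, A=b)\)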

Mathematically, if a classifier has equal FPR for both groups, it will also have equal TNR, satisfying the equation:
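\(P(\hat{Y}=0 \mid Y=0, A=a) = P(\hat{Y}=0 \mid Y=0, A=b)\)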

3.3.8 Generalized entropy index (GE)

Generalized entropy index is proposed as a unified measure of individual and group fairness by Speicher et al. [ 57 ]. It is a measure of inequality at a group or individual level with respect to the fairness of the algorithmic outcome [ 58 ].

This class of inequality indexes is based on the concept of entropy . In thermodynamics, entropy is a measure of disorder. When applied to income distributions, entropy (disorder) has the meaning of deviations from perfect equality [ 59 ].

Generalized Entropy Index measures the inequality between all users with respect to how fair they are treated by the algorithm. Entropy-based metrics such as Generalized Entropy Index are a family of inequality indices that can be used to measure fairness at both group level and individual level.

The Theil index is the most commonly used flavor of GEI. It can be considered a measure of the inequality between all individuals with respect to how fair the outcome of the algorithm is [ 57 , 58 , 59 ].

The GE index is defined as follows [ 57 , 58 , 59 , 60 ]:
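\(GE(\alpha ) = \frac{1}{n\,\alpha (\alpha - 1)}\sum _{i=1}^{n}\left[ \left( \frac{b_i}{\mu }\right) ^{\alpha } - 1\right]\)

where n is the number of individuals, \(b_i\) is the benefit assigned to individual i by the algorithmic outcome (in [57], \(b_i = \hat{y}_i - y_i + 1\)) and \(\mu \) is the mean benefit.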

The values of the \(GE(\alpha )\) index vary between 0 and \(\infty \), with 0 representing an equal income distribution and higher values representing higher levels of income inequality. The \(GE(\alpha )\) index, as shown in Eq. (16), defines a class of indices because it assumes different forms depending on the value assigned to the parameter \(\alpha \), which weights inequalities in different parts of the income distribution [60]. The larger the parameter \(\alpha \), the more sensitive the index is to inequalities in the upper tail of the income distribution; the smaller the \(\alpha \), the more sensitive it is to inequalities at the bottom [59, 60].

In principle, the parameter \(\alpha \) can take any real value from \(-\infty \) to \(\infty \). However, from a practical point of view, \(\alpha \) is normally chosen to be positive, because for \(\alpha < 0\) this class of indices is undefined if there are zero incomes. GE(0) is referred to as the mean logarithmic deviation, which is defined as follows [60]:
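\(GE(0) = \frac{1}{n}\sum _{i=1}^{n}\ln \frac{\mu }{b_i}\)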

GE(1) is known as the Theil inequality index, named after the author who devised it in 1967 [60]. It is calculated as the generalized entropy of benefits for all individuals in the dataset, with \(\alpha = 1\); thus, it measures the inequality in benefit allocation across individuals [10].

The Theil index is defined as follows:
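\(GE(1) = \frac{1}{n}\sum _{i=1}^{n}\frac{b_i}{\mu }\ln \frac{b_i}{\mu }\)

To make the family concrete, the following minimal Python sketch (our own illustration, not code from the reviewed papers; the benefit definition \(b_i = \hat{y}_i - y_i + 1\) follows [57]) computes \(GE(\alpha )\), including the GE(0) and Theil special cases:

```python
import numpy as np

def generalized_entropy_index(y_true, y_pred, alpha=1.0):
    """GE(alpha) over benefits b_i = y_pred_i - y_true_i + 1 (per Speicher et al.).

    alpha=0 gives the mean logarithmic deviation, alpha=1 the Theil index.
    With binary labels, every b_i is in {0, 1, 2}; note that GE(0) and GE(1)
    are undefined when some b_i = 0 (i.e., when false negatives exist).
    """
    b = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float) + 1.0
    r = b / b.mean()                              # benefit relative to the mean
    if alpha == 0:
        return float(np.mean(np.log(1.0 / r)))    # mean logarithmic deviation
    if alpha == 1:
        return float(np.mean(r * np.log(r)))      # Theil index
    return float(np.mean(r ** alpha - 1.0) / (alpha * (alpha - 1.0)))
```

A value of 0 indicates that all individuals receive the same benefit; larger values indicate a more unequal allocation.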

3.3.9 Average odds difference (AOD)

Average odds difference is a fairness metric used to assess the difference in predictive performance between two groups or populations (between unprivileged and privileged groups) in terms of both false positive rates and true positive rates. It focuses on measuring the balance of the prediction outcomes among different groups [ 10 , 61 ].

So, the average odds denote the average difference in FPR and TPR for groups a and b [ 18 ], defined by the following equations.
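\({\text {AOD}} = \frac{1}{2}\left[ \left( {\text {FPR}}_{A=a} - {\text {FPR}}_{A=b}\right) + \left( {\text {TPR}}_{A=a} - {\text {TPR}}_{A=b}\right) \right]\)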

This can be translated to:
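\({\text {AOD}} = \frac{1}{2}\big [\big (P(\hat{Y}=1 \mid Y=0, A=a) - P(\hat{Y}=1 \mid Y=0, A=b)\big ) + \big (P(\hat{Y}=1 \mid Y=1, A=a) - P(\hat{Y}=1 \mid Y=1, A=b)\big )\big ]\)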

The ideal value of this metric is 0. A negative value (value < 0) implies that the model produces higher false positive rates and true positive rates for the privileged group ( b ), and a positive value (value > 0) indicates a higher benefit for the unprivileged group ( a ), both suggesting potential disparities in predictive performance [ 10 ].

3.3.10 Error difference (ErrD)

Error difference is a metric intended to measure the difference between the combined false positive and false negative rates of the unprivileged group ( a ) and those of the privileged group ( b ).

The goal is to understand the proportion of missed predictions with respect to the real values. The metric is given by the following equation:

For \(a,b \in A\) :
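\({\text {ErrD}} = \left( {\text {FPR}}_{A=a} + {\text {FNR}}_{A=a}\right) - \left( {\text {FPR}}_{A=b} + {\text {FNR}}_{A=b}\right)\)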

This is equivalent to:
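\({\text {ErrD}} = \big (P(\hat{Y}=1 \mid Y=0, A=a) + P(\hat{Y}=0 \mid Y=1, A=a)\big ) - \big (P(\hat{Y}=1 \mid Y=0, A=b) + P(\hat{Y}=0 \mid Y=1, A=b)\big )\)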

The ideal value is 0, meaning that the combined false positive and false negative rates of the privileged and unprivileged groups are balanced. A negative value indicates a discrepancy between the groups’ predictions and potential bias toward the privileged group, since group b presents higher false positive and false negative rates than group a. Conversely, a positive value indicates potential bias toward the unprivileged group a.

3.3.11 Error ratio (ErrR)

Likewise, error ratio is a metric with the same intention as the error difference, but it compares the error rates of the two groups as a ratio rather than a difference, expressing how the error rates of the two distinct groups relate.

The metric is defined by the following equation:
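\({\text {ErrR}} = \frac{{\text {FPR}}_{A=a} + {\text {FNR}}_{A=a}}{{\text {FPR}}_{A=b} + {\text {FNR}}_{A=b}}\)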

The ideal value is 1, meaning that the combined false positive and false negative rates of the privileged and unprivileged groups are balanced. A value close to zero indicates a large discrepancy between groups, with disparities and potential bias toward the privileged group, since group b presents higher false positive and false negative rates than group a. Conversely, a value greater than 1 indicates potential bias toward the unprivileged group a.

3.3.12 False discovery rate ratio (FDRR)

Similar to false discovery rate parity, the false discovery rate ratio builds on the false discovery rate, a concept originally used to quantify the rate of type I errors when conducting multiple comparisons in null hypothesis testing, and compares it between two different groups.

This metric is used to assess the proportion of false positive predictions among all positive predictions made by a classification model between two different sensitive attributes. It quantifies the ratio of incorrect positive predictions relative to the total number of positive predictions of sensitive attribute a and attribute b and then compares them.

The formula for false discovery rate ratio is relatively straightforward and is typically expressed as follows:
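\({\text {FDRR}} = \frac{{\text {FDR}}_{A=a}}{{\text {FDR}}_{A=b}} = \frac{P(Y=0 \mid \hat{Y}=1, A=a)}{P(Y=0 \mid \hat{Y}=1, A=b)}\)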

3.3.13 False negative rate ratio (FNRR)

Similar to false negative rate parity, the false negative rate ratio is used to assess the equality (or lack thereof) of the false negative rates across groups. It quantifies the ratio of incorrect negative predictions relative to the total number of positive cases for the sensitive group a and the group b and then compares them.

The equation for false negative rate ratio is typically expressed as follows:
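\({\text {FNRR}} = \frac{{\text {FNR}}_{A=a}}{{\text {FNR}}_{A=b}} = \frac{P(\hat{Y}=0 \mid Y=1, A=a)}{P(\hat{Y}=0 \mid Y=1, A=b)}\)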

3.3.14 False omission rate ratio (FORR)

Similar to false omission rate parity, the false omission rate ratio is used to assess the equality (or lack thereof) between groups of the rate of inaccurate “negative” predictions by the model.

The formula for the false omission rate ratio is typically expressed as follows:
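\({\text {FORR}} = \frac{{\text {FOR}}_{A=a}}{{\text {FOR}}_{A=b}} = \frac{P(Y=1 \mid \hat{Y}=0, A=a)}{P(Y=1 \mid \hat{Y}=0, A=b)}\)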

3.3.15 False positive rate ratio (FPRR)

Similar to false positive rate parity, this metric is used to assess the equality (or lack thereof) of the false positive rates across groups. It quantifies the ratio of incorrect positive predictions relative to the total number of negative cases, for both unprivileged group a and privileged group b , comparing them.

The formula for false positive rate ratio is typically expressed as follows:
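\({\text {FPRR}} = \frac{{\text {FPR}}_{A=a}}{{\text {FPR}}_{A=b}} = \frac{P(\hat{Y}=1 \mid Y=0, A=a)}{P(\hat{Y}=1 \mid Y=0, A=b)}\)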

3.4 Additional information

Some of the articles analyzed mentioned performance metrics such as accuracy (12), precision (5), AUC—Area Under the ROC Curve (3), recall (3), RMSE—Root Mean Squared Error (1), MSE—Mean Squared Error (1), F1-score (1), and the Mann–Whitney U test (1), among others. These are, however, in our view, a different kind of metric, one that quantifies the overall predictive quality of a ML model. While these metrics might be relevant to assess the level of ethical compliance of a system, be relevant for ethical requirements such as “Technical Robustness and Safety”, and would surely be useful to create ethics metrics, they cannot be exclusively considered ethics metrics per se. For instance, accuracy can evaluate the predictive quality of a model but not whether the model is ethical; a model can have good overall/average accuracy but poor accuracy for specific groups (discrimination). Moreover, the performance metrics found through the full-text assessment were mostly referenced to evaluate models in terms of performance, not in terms of ethical compliance. Additionally, although these performance metrics are relevant and should not be completely overlooked and discarded, they are not discussed like the previous ones, since they have already received plenty of attention in non-ethical contexts.

For this reason, we consider their future use in building novel ethics metrics, but we did not consider them in this literature review with the same level of detail as the previous ones.

Another relevant disclaimer about this systematic literature review is that we found several papers that, when addressing the ethical compliance of AI, presented solutions/tools or methods for ethical analysis, bias mitigation, or model performance enhancement. Such tools and methods identified in the literature (e.g., SHAP, LIME, conformity assessment, BLEU score, average popularity gap, transparency score, reporting rate, etc.) are undoubtedly relevant in AI Ethics. However, while they can, to some extent, facilitate the implementation of efforts toward Ethical AI, they are not objective metrics per se, as discussed previously, and are therefore outside the scope of this literature review. While mentioned here, they are not given the same level of detail as the identified metrics.

For instance, Ruf and Detyniecki [25] present two tools that can help operationalize AI fairness. The Fairness Compass helps choose what fairness means for each specific situation through a fairness tree, while the Fairness Library helps choose the best way to make AI fair by providing a series of algorithms for bias mitigation. This paper is an example of one that presents a relevant tool but does not put forward any specific metric.

The following sections discuss the main findings and limitations of this work and the key conclusions.

4 Discussion

This work started with the goal and motivation of finding a wide range of objective ethics metrics, catering, if possible, to all of the widely accepted principles. This would give Data Scientists a list of observable metrics that they could easily implement and integrate into any Data Science pipeline in order to assess its alignment with ethical principles and guidelines.

However, this literature review shows that the subject of ethics in the area of AI is still relatively immature. While it has gained significant theoretical attention as one of the prominent subjects of discussion, practical solutions in this area are still underdeveloped. More interdisciplinary efforts that include, aside from ethics experts, specialists such as Computer Scientists, Data Scientists, ML Engineers, or AI Architects could be beneficial.

In the face of this lack of practical solutions, the prevailing discourse predominantly revolves around the future trajectory of AI and the imperative of ethical considerations, yet only a minority of researchers actively engage in developing viable, measurable and practical solutions to address this challenge.

While several of the articles identified analyzed the challenges of AI, the ethical needs for AI, or how AI is impacting society, these were mostly theoretical works. Thus, it was challenging to draw specific conclusions or identify actual metrics.

Furthermore, it is generally acknowledged that all ethical principles are relevant when evaluating a Data Science pipeline, although some variations may occur between fields. For instance, while privacy is always important, it might be more important in healthcare or education than in certain industrial settings. However, the lack of ethics metrics for 6 out of 7 ethical principles can be seen as potentially hindering the progress toward a safe, fair and all-encompassing ethical AI. In our perception, this imbalance can be explained by the relative ease of assessing fairness through a distributive and comparative approach, generally by analyzing how data (e.g., raw data, processed data, model outputs) are distributed across sensitive features.

We must thus be aware that the field of AI Ethics is nowadays very much biased toward Fairness, with a significant disregard for other principles. This is, in our opinion, both a current handicap of the field and a major research opportunity.

Interestingly, a superficial analysis of the literature might appear to indicate that there are many more metrics than those identified in this literature review. However, this happens as many of the existing metrics overlap significantly or represent the same concept, albeit with a different denomination. A relevant contribution of this work was to reduce such groups of equivalent metrics into a single one, contributing to a clarification of the terminology and of the metrics available. Specifically, we reduced the original set of 33 metrics to 15, which still represent the same concepts being measured.

Aside from this, this SLR, which is, to the best of our knowledge, the first one with these goals, also systematized the existing knowledge regarding objective metrics that can be used to quantify the ethical alignment of a Data Science pipeline. As such, Data Scientists can use it to select the most appropriate metrics to monitor their systems in a transparent and standardized way. Unfortunately, the metrics existing in the literature only allow assessing a system in terms of fairness.

We plan on bridging this gap in future work by proposing and implementing, in the form of a software library, a group of metrics for each of the identified ethical principles. This library will connect to different batch and streaming data sources and allow for the monitoring of the associated ethical principles off the shelf. These metrics will be validated in different Data Science pipelines across different domains in an attempt to evaluate their truthfulness and usefulness. While we expected this implementation and development work to follow directly from this work, the current state of affairs demands that we first devise and implement the missing metrics.

4.1 Main findings

This systematic review of the literature on the topic of objective metrics to assess AI ethics identified 38 papers, which were considered for full-text analysis. These were organized according to seven main ethical principles (defined as per the ethics guidelines for trustworthy AI) based on the subjects addressed.

According to the results of the analysis, 24 articles addressed Diversity, Non-Discrimination and Fairness, 14 articles discussed Transparency, 7 articles focused on Accountability, 6 addressed Privacy and Data Governance, 6 were associated with Societal and Environmental Well-being, 3 articles related to Technical Robustness and Safety, and 2 addressed Human Agency and Oversight.

Out of these 38 studies, only 12 presented actual objective metrics, all of which related to Diversity, Non-Discrimination and Fairness, which can be assigned to the well-known category of Fairness Metrics.

Out of these metrics, the ones that were most frequently mentioned were demographic parity, equal opportunity, predictive parity and equalized odds.

This research exposes several key aspects:

There is a significant gap between theoretical research and practical solutions. While the need for metrics related to the different principles is often mentioned, proposals of actual metrics that can be operationalized are scarce;

The research devoted to each principle is highly unbalanced. Some principles are much more prevalent than others (whether in theory or practice). Moreover, all the metrics found are related to the principle of Fairness, which means that the adherence to the remaining principles cannot be objectively evaluated in current AI systems;

The apparent wealth of metrics mentioned in the literature is not actually so rich, as many of the proposed metrics are slight variations of one another (sometimes with only a change in denomination) or measure the exact same concept in a slightly different way;

Similarly, many ethical principles have been proposed in the literature. However, many of them overlap or are closely related. Although some authors argue that the seven AI-HLEG principles around which this literature review is organized are not maximally representative [ 5 , 62 , 63 , 64 ], we find that all the metrics identified and the related principles can, in fact, be grouped into those seven main principles.

5 Conclusion

In this systematic literature review, a total of 38 papers, from a pool of 66 candidates, were examined. These papers were categorized according to seven ethical principles. There were two articles that focused on the topic of Human Agency and Oversight, three articles that addressed Technical Robustness and Safety, six articles that explored Privacy and Data Governance, fourteen articles that examined Transparency, twenty-four articles that addressed Diversity, Non-Discrimination and Fairness, six articles that discussed Societal and Environmental Well-being and seven articles that were related to Accountability.

The bulk of studies consisted of academic publications that articulated the necessity of addressing ethical concerns in the field of AI and proposed potential theoretical frameworks for achieving this objective. However, most studies did not offer concrete, practical answers or demonstrate real-world applications.

Among the articles subjected to comprehensive textual analysis, only 12 articles were found to incorporate at least one discernible and practical ethics metric. Notably, these metrics were predominantly centered around fairness and pertained to the ethical principle of Diversity, Non-Discrimination and Fairness.

The most frequent metrics were Demographic Parity (15 times), Equalized Odds (11 times), Equal Opportunity (10 times), Predictive Parity (9 times), Counterfactuals (5 times), Disparate Impact (5 times) and Predictive Equality (3 times).

The key conclusion drawn from this study is that there is a severe lack of practical solutions for the ethical assessment of Data Science pipelines in six out of the seven ethical principles considered. Thus, there is both the need and the scientific opportunity to develop such embracing metrics. Nonetheless, the existing metrics were systematized, and the generated knowledge can be used by Data Scientists to integrate standardized, transparent, and objective ethics metrics into their pipelines, ensuring that their applications are observable in terms of their alignment with ethical standards and guidelines.

Data Availability

Not applicable.

Code Availability

Not applicable.

References

1. Gichoya, J.W., Banerjee, I., Bhimireddy, A.R., Burns, J.L., Celi, L.A., Chen, L.-C., Correa, R., Dullerud, N., Ghassemi, M., Huang, S.-C., et al.: AI recognition of patient race in medical imaging: a modelling study. Lancet Digit. Health 4(6), 406–414 (2022)
2. Hunkenschroer, A.L., Luetge, C.: Ethics of AI-enabled recruiting and selection: a review and research agenda. J. Bus. Ethics 178, 977–1007 (2022). https://doi.org/10.1007/s10551-022-05049-6
3. Barrett, C., Boyd, B., Bursztein, E., Carlini, N., Chen, B., Choi, J., Chowdhury, A.R., Christodorescu, M., Datta, A., Feizi, S., Fisher, K., Hashimoto, T., Hendrycks, D., Jha, S., Kang, D., Kerschbaum, F., Mitchell, E., Mitchell, J., Ramzan, Z., Shams, K., Song, D., Taly, A., Yang, D.: Identifying and mitigating the security risks of generative AI. Found. Trends Privacy Secur. 6(1), 1–52 (2023). https://doi.org/10.1561/3300000041
4. CDEI: The roadmap to an effective AI assurance ecosystem - GOV.UK (2021)
5. BridgeAI, I.U.: Report on the core principles and opportunities for responsible and trustworthy AI (2023)
6. European Commission: Ethics guidelines for trustworthy AI | Shaping Europe's digital future (2019). https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai
7. Bellamy, R.K.E., Dey, K., Hind, M., Hoffman, S.C., Houde, S., Kannan, K., Lohia, P., Martino, J., Mehta, S., Mojsilovic, A., Nagar, S., Ramamurthy, K.N., Richards, J., Saha, D., Sattigeri, P., Singh, M., Varshney, K.R., Zhang, Y.: AI Fairness 360: an extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias (2018). https://arxiv.org/abs/1810.01943
8. Siddaway, A.P., Wood, A.M., Hedges, L.V.: How to do a systematic review: a best practice guide for conducting and reporting narrative reviews, meta-analyses, and meta-syntheses. Annu. Rev. Psychol. 70, 747–770 (2019). https://doi.org/10.1146/annurev-psych-010418
9. PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses. http://prisma-statement.org/
10. Kaul, A., Soofastaei, A.: Advanced analytics for ethical considerations in mining industry. In: Advanced Analytics in Mining Engineering: Leverage Advanced Analytics in Mining Industry to Make Better Business Decisions, pp. 55–80 (2022). https://doi.org/10.1007/978-3-030-91589-6_3/COVER
11. Kasirzadeh, A.: Algorithmic fairness and structural injustice: insights from feminist political philosophy. In: AIES 2022 - Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, pp. 349–356 (2022). https://doi.org/10.1145/3514094.3534188
12. Zafar, S., Mahjoub, H., Mehta, N., Domalpally, A., Channa, R.: Artificial intelligence algorithms in diabetic retinopathy screening. Curr. Diabet. Rep. 22, 267–274 (2022). https://doi.org/10.1007/S11892-022-01467-Y/METRICS
13. Bae, A., Xu, S.: Discovering and understanding algorithmic biases in autonomous pedestrian trajectory predictions. In: SenSys 2022 - Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems, pp. 1155–1161 (2022). https://doi.org/10.1145/3560905.3568433
14. Kasirzadeh, A., Clifford, D.: Fairness and data protection impact assessments. In: AIES 2021 - Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pp. 146–153 (2021). https://doi.org/10.1145/3461702.3462528
15. Marshall, R., Pardo, A., Smith, D., Watson, T.: Implementing next generation privacy and ethics research in education technology. Br. J. Educ. Technol. 53, 737–755 (2022). https://doi.org/10.1111/BJET.13224
16. Cortés, E.C., Rajtmajer, S., Ghosh, D.: Locality of technical objects and the role of structural interventions for systemic change. In: ACM International Conference Proceeding Series, pp. 2327–2341 (2022). https://doi.org/10.1145/3531146.3534646
17. Abedin, B.: Managing the tension between opposing effects of explainability of artificial intelligence: a contingency theory perspective. Internet Res. 32(2), 425–453 (2021). https://doi.org/10.1145/3479645.3479709
18. Zhang, Q., Liu, J., Zhang, Z., Wen, J., Mao, B., Yao, X.: Mitigating unfairness via evolutionary multiobjective ensemble learning. IEEE Trans. Evolut. Comput. 27, 848–862 (2023). https://doi.org/10.1109/TEVC.2022.3209544
19. Schedl, M., Rekabsaz, N., Lex, E., Grosz, T., Greif, E.: Multiperspective and multidisciplinary treatment of fairness in recommender systems research. In: UMAP2022 - Adjunct Proceedings of the 30th ACM Conference on User Modeling, Adaptation and Personalization, pp. 90–94 (2022). https://doi.org/10.1145/3511047.3536400
20. Goethals, S., Martens, D., Calders, T.: PreCoF: counterfactual explanations for fairness. Mach. Learn. (2023). https://doi.org/10.1007/S10994-023-06319-8/FIGURES/10
21. Tomalin, M., Byrne, B., Concannon, S., Saunders, D., Ullmann, S.: The practical ethics of bias reduction in machine translation: why domain adaptation is better than data debiasing. Ethics Inform. Technol. 23, 419–433 (2021). https://doi.org/10.1007/S10676-021-09583-1/TABLES/7
22. Fleisher, W.: What's fair about individual fairness? In: AIES 2021 - Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pp. 480–490 (2021). https://doi.org/10.1145/3461702.3462621
23. Wachter, S., Mittelstadt, B., Russell, C.: Why fairness cannot be automated: bridging the gap between EU non-discrimination law and AI. SSRN Electron. J. (2020). https://doi.org/10.2139/SSRN.3547922
24. Saetra, H.S., Wynsberghe, Bolte, L., Nachid, J.: A framework for evaluating and disclosing the ESG related impacts of AI with the SDGs. Sustainability 13, 8503 (2021). https://doi.org/10.3390/SU13158503
25. Ruf, B., Detyniecki, M.: A tool bundle for AI fairness in practice. In: Conference on Human Factors in Computing Systems - Proceedings (2022). https://doi.org/10.1145/3491101.3519878
26. Raji, I.D., Buolamwini, J.: Actionable auditing: investigating the impact of publicly naming biased performance results of commercial AI products. In: AIES 2019 - Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pp. 429–435 (2019). https://doi.org/10.1145/3306618.3314244
27. Krijger, J.: Enter the metrics: critical theory and organizational operationalization of AI ethics. AI Soc. 37, 1427–1437 (2022). https://doi.org/10.1007/S00146-021-01256-3/METRICS
28. Wylde, V., Prakash, E., Hewage, C., Platts, J.: Ethical challenges in the use of digital technologies: AI and big data. In: Advanced Sciences and Technologies for Security Applications, pp. 33–58 (2023). https://doi.org/10.1007/978-3-031-09691-4_3/COVER
29. Sahu, S., Singh, S.K.: Ethics in AI: collaborative filtering based approach to alleviate strong user biases and prejudices. In: 2019 12th International Conference on Contemporary Computing, IC3 2019 (2019). https://doi.org/10.1109/IC3.2019.8844875
30. Keleko, A.T., Kamsu-Foguem, B., Ngouna, R.H., Tongne, A.: Health condition monitoring of a complex hydraulic system using deep neural network and DeepSHAP explainable XAI. Adv. Eng. Softw. 175, 103339 (2023). https://doi.org/10.1016/J.ADVENGSOFT.2022.103339
31. McCradden, M.D., Joshi, S., Anderson, J.A., Mazwi, M., Goldenberg, A., Shaul, R.Z.: Patient safety and quality improvement: ethical principles for a regulatory approach to bias in healthcare machine learning. J. Am. Med. Inform. Assoc. JAMIA 27, 2024–2027 (2020). https://doi.org/10.1093/JAMIA/OCAA085
32. Lee, W.W.: Tools adapted to ethical analysis of data bias. HKIE Trans. Hong Kong Inst. Eng. 29, 200–209 (2022). https://doi.org/10.33430/V29N3THIE-2022-0037
33. Minkkinen, M., Niukkanen, A., Mäntymäki, M.: What about investors? ESG analyses as tools for ethics-based AI auditing. AI Soc. 1, 1–15 (2022). https://doi.org/10.1007/S00146-022-01415-0/TABLES/5
34. Edwards, A.: IEEE P7010-2020 Standard: use cases in ethical impact on human wellbeing studies (2020). https://doi.org/10.13140/RG.2.2.21769.88168
35. Fasterholdt, I., Naghavi-Behzad, M., Rasmussen, B.S.B., Kjølhede, T., Skjøth, M.M., Hildebrandt, M.G., Kidholm, K.: Value assessment of artificial intelligence in medical imaging: a scoping review. BMC Med. Imag. 22, 1–11 (2022). https://doi.org/10.1186/S12880-022-00918-Y/FIGURES/2
36. Etienne, H.: Solving moral dilemmas with AI: how it helps us address the social implications of the COVID-19 crisis and enhance human responsibility to tackle meta-dilemmas. Law Innov. Technol. 14, 305–324 (2022). https://doi.org/10.1080/17579961.2022.2113669
37. Carlson, K.W.: Safe artificial general intelligence via distributed ledger technology. Big Data Cogn. Comput. 3, 40 (2019). https://doi.org/10.3390/BDCC3030040
38. Steele, R.W.: Pediatric quality measures: the leap from process to outcomes. Curr. Probl. Pediatr. Adolesc. Health Care 51, 101065 (2021). https://doi.org/10.1016/J.CPPEDS.2021.101065
39. Avelar, P.H.C., Audibert, R.B., Lamb, L.C.: Measuring ethics in AI with AI: a methodology and dataset construction. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13653 LNAI, pp. 370–384 (2021). https://doi.org/10.1007/978-3-031-21686-2_26
40. Riley, P.C., Deshpande, S.V., Ince, B.S., Dereje, R., Davidson, C.E., O'Donnell, K.P., Hauck, B.C.: Interpreting chemical detection alarms with live analysis of ML algorithms. In: Defense + Commercial Sensing, vol. 23 (2022). https://doi.org/10.1117/12.2619166
41. Claure, H., Chang, M.L., Kim, S., Omeiza, D., Brandao, M., Lee, M.K., Jung, M.: Fairness and transparency in human-robot interaction. In: ACM/IEEE International Conference on Human-Robot Interaction 2022-March, pp. 1244–1246 (2022). https://doi.org/10.1109/HRI53351.2022.9889421
42. Zou, J., Schiebinger, L.: Ensuring that biomedical AI benefits diverse populations. EBioMedicine 67, 103358 (2021). https://doi.org/10.1016/j.ebiom.2021.103358
43. Zhao, K., Ma, S., Sun, Z., Liu, X., Zhu, Y., Xu, Y., Wang, X.: Effect of AI-assisted software on inter- and intra-observer variability for the X-ray bone age assessment of preschool children. BMC Pediatrics 22, 644 (2022). https://doi.org/10.1186/S12887-022-03727-Y
44. Young, A.T., Xiong, M., Pfau, J., Keiser, M.J., Wei, M.L.: Artificial intelligence in dermatology: a primer. J. Invest. Dermatol. 140, 1504–1512 (2020). https://doi.org/10.1016/j.jid.2020.02.026
45. Lawlor, B.: An overview of the 2022 NISO Plus conference: global conversations/global connections. Inf. Serv. Use 42, 327–376 (2022). https://doi.org/10.3233/ISU-220178
46. Antikainen, J., Agbese, M., Alanen, H.-K., Halme, E., Isomäki, H., Jantunen, M., Kemell, K.-K., Rousi, R., Vainio-Pekka, H., Vakkuri, V.: A deployment model to extend ethically aligned AI implementation method ECCOLA. In: Proceedings of the IEEE International Conference on Requirements Engineering 2021-September, pp. 230–235 (2021). https://doi.org/10.1109/REW53955.2021.00043
47. Germann, C., Marbach, G., Civardi, F., Fucentese, S.F., Fritz, J., Sutter, R., Pfirrmann, C.W.A., Fritz, B.: Deep convolutional neural network-based diagnosis of anterior cruciate ligament tears: performance comparison of homogenous versus heterogeneous knee MRI cohorts with different pulse sequence protocols and 1.5-T and 3-T magnetic field strengths. Invest. Radiol. 55, 499–506 (2020). https://doi.org/10.1097/RLI.0000000000000664
48. Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., Galstyan, A.: A survey on bias and fairness in machine learning. ACM Comput. Surv. 54(6), 1–35 (2021). https://doi.org/10.1145/3457607
49. Räz, T.: Group fairness: independence revisited. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (2021). https://doi.org/10.1145/3442188.3445876
50. Dwork, C., Hardt, M., Pitassi, T., Reingold, O., Zemel, R.: Fairness through awareness (2011). arXiv:1104.3913
51. Tang, Z., Zhang, K.: Attainability and optimality: the equalized odds fairness revisited. In: Schölkopf, B., Uhler, C., Zhang, K. (eds.) Proceedings of the First Conference on Causal Learning and Reasoning. Proceedings of Machine Learning Research, vol. 177, pp. 754–786 (2022). https://proceedings.mlr.press/v177/tang22a.html
52. Verma, S., Rubin, J.: Fairness definitions explained. In: IEEE/ACM International Workshop on Software Fairness, vol. 18 (2018). https://doi.org/10.1145/3194770.3194776
53. Verma, S., Rubin, J.: Fairness definitions explained. In: Proceedings - International Conference on Software Engineering, pp. 1–7 (2018). https://doi.org/10.1145/3194770.3194776
54. Castelnovo, A., Crupi, R., Greco, G., Regoli, D., Penco, I.G., Cosentini, A.C.: A clarification of the nuances in the fairness metrics landscape. Sci. Rep. 12, 1–21 (2022). https://doi.org/10.1038/s41598-022-07939-1
55. Kusner, M., Loftus, J., Russell, C., Silva, R.: Counterfactual fairness. In: Conference on Neural Information Processing Systems (2017). arXiv:1703.06856
56. Zafar, M.B., Valera, I., Gomez Rodriguez, M., Gummadi, K.P.: Fairness beyond disparate treatment & disparate impact: learning classification without disparate mistreatment. In: Proceedings of the 26th International Conference on World Wide Web. WWW '17, pp. 1171–1180. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE (2017). https://doi.org/10.1145/3038912.3052660
57. Speicher, T., Heidari, H., Grgic-Hlaca, N., Gummadi, K.P., Singla, A., Weller, A., Zafar, M.B.: A unified approach to quantifying algorithmic unfairness: measuring individual and group unfairness via inequality indices. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018). https://doi.org/10.1145/3219819.3220046
58. Ashokan, A., Haas, C.: Fairness metrics and bias mitigation strategies for rating predictions. Inf. Process. Manag. 58, 102646 (2021). https://doi.org/10.1016/j.ipm.2021.102646
59. Bellù, L.G., Liberati, P.: Describing income inequality: Theil index and entropy class indexes (2006)
60. Sitthiyot, T., Holasut, K.: A simple method for measuring inequality (2020). https://doi.org/10.1057/s41599-020-0484-6
61. Zhang, Y., Bellamy, R.K.E., Varshney, K.R.: Joint optimization of AI fairness and utility: a human-centered approach. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (2020). https://doi.org/10.1145/3375627.3375862
62. Jobin, A., Ienca, M., Vayena, E.: Artificial intelligence: the global landscape of ethics guidelines (2019)
63. Floridi, L., Cowls, J.: A unified framework of five principles for AI in society. Harvard Data Science Review 1(1) (2019). https://hdsr.mitpress.mit.edu/pub/l0jsh9d1
64. Hagendorff, T.: The ethics of AI ethics: an evaluation of guidelines. Minds Mach. 30, 99–120 (2020). https://doi.org/10.1007/s11023-020-09517-8


Acknowledgements

This work has been supported by the European Union under the Next Generation EU, through a grant of the Portuguese Republic’s Recovery and Resilience Plan (PRR) Partnership Agreement, within the scope of the project PRODUTECH R3 —“Agenda Mobilizadora da Fileira das Tecnologias de Produção para a Reindustrialização,” Total project investment: 166.988.013,71 Euros; Total Grant: 97.111.730,27 Euros. The work of Guilherme Palumbo was funded by a research grant from FCT—Fundação para a Ciência e Tecnologia under Grant Agreement No. [UIDP/04728/2020].

Open access funding provided by FCT|FCCN (b-on).

Author information

Guilherme Palumbo, Davide Carneiro and Victor Alves have contributed equally to this work.

Authors and Affiliations

CIICESI, ESTG, Politécnico do Porto, Felgueiras, Portugal

Guilherme Palumbo

INESC TEC, Rua Dr. Roberto Frias, 712, 4200-465, Porto, Portugal

Davide Carneiro

ESTG, Politécnico do Porto, Felgueiras, Portugal

ALGORITMI Research Centre / LASI, University of Minho, Braga, Portugal

Victor Alves


Contributions

GP did the acquisition, analysis and interpretation of results. All authors wrote the main manuscript text. All authors reviewed the manuscript. All authors approved the version to be published.

Corresponding authors

Correspondence to Guilherme Palumbo or Davide Carneiro.

Ethics declarations

Conflict of interest.

Author Guilherme Palumbo has received research support from FCT—Fundação para a Ciência e Tecnologia under Grant Agreement No. [UIDP/04728/2020]. Authors Davide Carneiro and Victor Alves declare they have no financial interests.

Ethical approval

Consent to participate.

All authors consent to participate.

Consent for publication

All authors consent for publication.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Palumbo, G., Carneiro, D. & Alves, V. Objective metrics for ethical AI: a systematic literature review. Int J Data Sci Anal (2024). https://doi.org/10.1007/s41060-024-00541-w

Received: 19 January 2024

Accepted: 20 March 2024

Published: 13 April 2024

DOI: https://doi.org/10.1007/s41060-024-00541-w


Ethics of AI: A Systematic Literature Review of Principles and Challenges


Cited by:

  • Intelligent Systems and Learning Data Analytics in Online Education, Chapter 6: A Literature Review on Artificial Intelligence and Ethics in Online Learning
  • Harmonizing Financial Returns with Ethical Standards: A Strategy for Investing in Responsible Artificial Intelligence
  • Artificial Intelligence-Supported Teacher Tools to Increase Participation in Online Courses
  • AI Algorithms and ChatGPT for Student Engagement in Online Learning
  • Ethical Considerations for Artificial Intelligence in Educational Assessments
  • The Use of Artificial Intelligence (AI) in Online Learning and Distance Education Processes: A Systematic Review of Empirical Studies
  • Overcoming the Gap of Social Presence in Online Learning Communities at University


AI in Research


Use of AI is fraught with complications involving accuracy, bias, academic integrity, and intellectual property, and may not be appropriate in all academic settings. This guide is intended primarily for academic researchers looking to use AI tools in their research.

Students are strongly advised to consult with their instructor before using AI-generated content in their research or coursework. For information on generative AI, see the dedicated guide.

Ethical Dilemmas of AI

Ethical issues related to artificial intelligence are a complex and evolving field of concern. As AI technology continues to advance, it raises various ethical dilemmas and challenges. Here are some of the key ethical issues associated with AI:

  • Bias and Fairness: AI systems can inherit and even amplify biases present in their training data. This can result in unfair or discriminatory outcomes, particularly in hiring, lending, and law enforcement applications. Addressing bias and ensuring fairness in AI algorithms is a critical ethical concern (a minimal sketch of how such gaps can be quantified follows this list).
  • Privacy: AI systems often require access to large amounts of data, including sensitive personal information. The ethical challenge lies in collecting, using, and protecting this data to prevent privacy violations.
  • Transparency and Accountability: Many AI algorithms, particularly deep learning models, are often considered “black boxes” because they are difficult to understand or interpret. Ensuring transparency and accountability in AI decision-making is crucial for user trust and ethical use of AI.
  • Autonomy and Control: As AI systems become more autonomous, concerns about the potential loss of human control exist. This is especially relevant in applications like autonomous vehicles and military drones, where AI systems make critical decisions.
  • Job Displacement: Automation through AI can lead to job displacement and economic inequality. Ensuring a just transition for workers and addressing the societal impact of automation is an ethical issue.
  • Security and Misuse: AI can be used for malicious purposes, such as cyberattacks, deepfake creation, and surveillance. Ensuring the security of AI systems and preventing their misuse is an ongoing challenge.
  • Accountability and Liability: Determining who is responsible when an AI system makes a mistake or causes harm can be difficult. Establishing clear lines of accountability and liability is essential for addressing AI-related issues.
  • Ethical AI in Healthcare: The use of AI in healthcare, such as diagnostic tools and treatment recommendations, raises ethical concerns related to patient privacy, data security, and the potential for AI to replace human expertise.
  • AI in Criminal Justice: The use of AI for predictive policing, risk assessment, and sentencing decisions can perpetuate biases and raise questions about due process and fairness.
  • Environmental Impact: The computational resources required to train and run AI models can have a significant environmental impact. Ethical considerations include minimizing AI’s carbon footprint and promoting sustainable AI development.
  • AI in Warfare: The development and use of autonomous weapons raise ethical concerns about the potential for AI to make life-and-death decisions in armed conflicts.
  • Bias in Content Recommendation: AI-driven content recommendation systems can reinforce existing biases and filter bubbles, influencing people’s views and opinions.
  • AI in Education: The use of AI in education, such as automated grading and personalized learning, raises concerns about data privacy, the quality of education, and the role of human educators.

From: https://annenberg.usc.edu/research/center-public-relations/usc-annenberg-relevance-report/ethical-dilemmas-ai
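The "Bias and Fairness" item above is often made concrete with simple group-fairness metrics. The following is a minimal, illustrative Python sketch (the data and variable names are invented for illustration, not taken from any of the guides cited here) that computes the demographic parity difference, i.e., the gap in positive-prediction rates between two groups:

```python
import numpy as np

# Hypothetical outputs of a binary classifier (1 = favorable outcome, e.g., loan
# approved) and a binary protected attribute (0 = group A, 1 = group B).
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
group = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Demographic parity difference: |P(y_pred = 1 | group A) - P(y_pred = 1 | group B)|.
rate_a = y_pred[group == 0].mean()
rate_b = y_pred[group == 1].mean()
gap = abs(rate_a - rate_b)
print(f"approval rate A = {rate_a:.2f}, B = {rate_b:.2f}, parity gap = {gap:.2f}")
```

A gap near zero indicates parity on this one metric only; other fairness definitions (equalized odds, for instance) can disagree with it, which is part of why bias remains an open ethical concern rather than a solved measurement problem.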

Ethics and AI

  • Artificial Intelligence (AI) Ethics: Ethics of AI and Ethical AI (PDF)
  • Artificial Intelligence and Life in 2030: The One Hundred Year Study on Artificial Intelligence
  • Ethics of Artificial Intelligence and Robotics  Stanford Encyclopedia of Philosophy
  • Understanding artificial intelligence ethics and safety (PDF) The Alan Turing Institute
  • The Ethical Framework for AI in Education (PDF) The Institute for Ethical AI in Education
From: https://guides.lib.uci.edu/research_ai

AI Can (Mostly) Outperform Human CEOs

By Hamza Mudassir, Kamal Munir, Shaz Ansari, and Amal Zahra

When researchers at the University of Cambridge pitted human competitors against a leading LLM, the chatbot beat the top participants on almost every metric. It was also fired more quickly.

Generative AI has demonstrated the potential to significantly outperform human CEOs in strategic decision-making by excelling in data-driven tasks like product design and market optimization. In an experiment simulating the automotive industry, AI models outpaced human participants in market share and profitability but faltered in handling unpredictable disruptions, leading to faster dismissals by virtual boards. While AI’s ability to analyze complex data sets and iterate rapidly could revolutionize corporate strategy, it lacks the intuition and foresight required to navigate black swan events. Rather than fully replacing human CEOs, AI is poised to augment leadership by enhancing data analysis and operational efficiency, leaving humans to focus on long-term vision, ethics, and adaptability in dynamic markets. The future of leadership will likely be a hybrid model where AI complements human decision-making.

Could generative AI step into the C-suite and even replace the CEO?

  • Hamza Mudassir is the founder of Strategize.inc and a lecturer in strategy at Judge Business School at the University of Cambridge.
  • Kamal Munir is the pro-vice-chancellor and a professor of strategy at the University of Cambridge.
  • Shaz Ansari is a professor of strategy and innovation at Judge Business School at the University of Cambridge.
  • Amal Zahra is an artificial intelligence researcher at Strategize.inc.


Ethics of AI: A Systematic Literature Review of Principles and Challenges


Cited by:

  • Gupta, B., Gaurav, A., Chui, K., Psannis, K. (2024). The Future of Ethical AI in Large Language Models. In: Challenges in Large Language Model Development and AI Ethics, pp. 410–435. https://doi.org/10.4018/979-8-3693-3860-5.ch013
  • Asim, Z., Vasudevan, A., Khan, S., Mahmood, M., Yosof, Y., Muniyanayaka, D. (2024). Beyond the Screen. In: Multidisciplinary Applications of Extended Reality for Human Experience, pp. 15–43. https://doi.org/10.4018/979-8-3693-2432-5.ch002
  • Doungtap, S., Wang, J., Phanichraksaphong, V. (2024). Comparative Analysis of SAAS Model and NPC Integration for Enhancing VR Shopping Experiences. Applied Sciences 14(15), 6573. https://doi.org/10.3390/app14156573



Article metrics: 33 total citations; 1,878 total downloads (1,110 in the last 12 months; 117 in the last 6 weeks).

Cited by (continued):

  • Dhiman, R., Miteff, S., Wang, Y., Ma, S., Amirikas, R., Fabian, B. (2024). Artificial Intelligence and Sustainability: A Review. Analytics 3(1), 140–164. https://doi.org/10.3390/analytics3010008
  • Zaharia, G., Apostol, I., Savin, P., Tanase, I. (2024). Digital Frontiers: Assessing the Influence and Ethical Challenges of AI in Online Marketing. Proceedings of the International Conference on Business Excellence 18(1), 3699–3710. https://doi.org/10.2478/picbe-2024-0300
  • Cohen, M., Khavkin, M., Movsowitz Davidow, D., Toch, E. (2024). ChatGPT in the public eye: Ethical principles and generative concerns in social media discussions. New Media & Society. https://doi.org/10.1177/14614448241279034
  • Gornet, M., Delarue, S., Boritchev, M., Viard, T. (2024). Mapping AI ethics: a meso-scale analysis of its charters and manifestos. In: Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, pp. 127–140. https://doi.org/10.1145/3630106.3658545
  • Giarmoleo, F., Ferrero, I., Rocchi, M., Pellegrini, M. (2024). What ethics can say on artificial intelligence: Insights from a systematic literature review. Business and Society Review 129(2), 258–292. https://doi.org/10.1111/basr.12336
  • Grodzinsky, F., Wolf, M., Miller, K. (2024). Ethical Issues From Emerging AI Applications: Harms Are Happening. Computer 57(2), 44–52. https://doi.org/10.1109/MC.2023.3332850
  • Homayouni, L., Hejazi, Y., Zarifsanaiey, N. (2024). A Review of Ethical Considerations in Using Artificial Intelligence in E-Learning. In: 2024 11th International and the 17th National Conference on E-Learning and E-Teaching (ICeLeT), pp. 1–6. https://doi.org/10.1109/ICeLeT62507.2024.10493100



Literature Review of Explainable Tabular Data Analysis


1. Introduction

The objectives of this survey:

  • To analyze the various techniques, inputs, and methods used to build XAI models since 2021, aiming to identify any superior models for tabular data created since Sahakyan et al.'s paper.
  • To identify and expand upon Sahakyan et al.'s description of the needs, challenges, gaps, and opportunities in XAI for tabular data.
  • To explore the evaluation methods and metrics used to assess the effectiveness of XAI models for tabular data, and to determine whether any new metrics have been developed.

2. Background

Aspects of Transparency:

  • Transparency: transparency does not ensure that a user will fully understand the system, but it does provide access to all relevant information regarding the training data, data preprocessing, system performance, and more.
  • Algorithmic transparency: the user's capacity to comprehend the process the model uses to generate a specific output from its input data. The main limitation is that algorithmically transparent models must be entirely accessible for exploration through mathematical analysis and techniques.
  • Decomposability: the capacity to explain each component of a model, including its inputs, parameters, and calculations, which enhances the understanding and interpretation of the model's behavior. As with algorithmic transparency, not all models can achieve this: each input must be easily interpretable (complex features may hinder this), and every part of the model must be comprehensible to a human without external tools (see the sketch after this list).
  • Simulatability: a model's capacity to be understood and conceptualized by a human, with complexity being a main factor. Simple models such as single-perceptron neural networks fit this criterion; more complex rule-based systems with excessive rules do not. An interpretable model should be easily explained through text and visualizations, and must be sufficiently self-contained for a person to consider and reason about it as a whole.
  • Interaction transparency: the clarity and openness of the interactions between users and AI systems. It involves giving users understandable feedback about the system's actions, decisions, and processes, allowing them to understand how their inputs influence outcomes. This fosters trust and enables users to engage more effectively with the technology.
  • Social transparency: the openness and clarity of an AI system's impact on social dynamics and user interactions. It involves making the system's intentions, decision-making processes, and effects on individuals and communities clear to users and stakeholders, helping users understand how AI influences relationships, societal norms, and behaviors.
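To make the decomposability and algorithmic-transparency entries above concrete, consider a small linear model: every input, parameter, and calculation can be inspected individually. This is a minimal illustrative sketch with invented feature names, not an example taken from the surveyed papers:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# A small linear regression is decomposable: each weight can be read off
# and explained on its own, with no external interpretation tooling.
X, y = make_regression(n_samples=200, n_features=3, noise=0.1, random_state=0)
model = LinearRegression().fit(X, y)

for name, coef in zip(["age", "income", "tenure"], model.coef_):  # hypothetical names
    print(f"{name}: weight {coef:+.2f}")
print(f"intercept: {model.intercept_:+.2f}")
```

By contrast, a deep network fit to the same data would need the post-hoc techniques of Section 3 to recover comparable per-feature statements.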

3. Existing Techniques for Explainable Tabular Data Analysis

4. Challenges and Gaps in Explainable Tabular Data Analysis
4.1. Challenges of Tabular Data
4.2. Bias, Incomplete and Inaccurate Data
4.3. Explanation Quality
4.4. Scalability of Techniques
4.5. Neural Networks
4.6. XAI Methods
4.7. Benchmark Datasets
4.8. Scalability
4.9. Data Structure
4.10. Model Evaluation and Benchmarks
4.11. Review

5. Applications of Explainable Tabular Data Analysis
5.1. Financial Sector
5.2. Healthcare Sector
5.4. Retail Sector
5.5. Manufacturing Sector
5.6. Utility Sector
5.7. Education
5.8. Summary

6. Future Directions and Emerging Trends

7. Conclusions

Author Contributions
Data Availability Statement
Conflicts of Interest

  • Ali, S.; Abuhmed, T.; El-Sappagh, S.; Muhammad, K.; Alonso-Moral, J.M.; Confalonieri, R.; Guidotti, R.; Del Ser, J.; Díaz-Rodríguez, N.; Herrera, F. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence. Inf. Fusion 2023, 99, 101805.
  • Burkart, N.; Huber, M.F. A Survey on the Explainability of Supervised Machine Learning. J. Artif. Intell. Res. 2021, 70, 245–317.
  • Weber, L.; Lapuschkin, S.; Binder, A.; Samek, W. Beyond explaining: Opportunities and challenges of XAI-based model improvement. Inf. Fusion 2023, 92, 154–176.
  • Marcinkevičs, R.; Vogt, J.E. Interpretable and explainable machine learning: A methods-centric overview with concrete examples. WIREs Data Min. Knowl. Discov. 2023, 13, e1493.
  • Sahakyan, M.; Aung, Z.; Rahwan, T. Explainable Artificial Intelligence for Tabular Data: A Survey. IEEE Access 2021, 9, 135392–135422.
  • Alicioglu, G.; Sun, B. A survey of visual analytics for Explainable Artificial Intelligence methods. Comput. Graph. 2021, 102, 502–520.
  • Cambria, E.; Malandri, L.; Mercorio, F.; Mezzanzanica, M.; Nobani, N. A survey on XAI and natural language explanations. Inf. Process. Manag. 2023, 60, 103111.
  • Chinu, U.; Bansal, U. Explainable AI: To Reveal the Logic of Black-Box Models. New Gener. Comput. 2023, 42, 53–87.
  • Schwalbe, G.; Finzel, B. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Min. Knowl. Discov. 2023, 38, 3043–3101.
  • Yang, W.; Wei, Y.; Wei, H.; Chen, Y.; Huang, G.; Li, X.; Li, R.; Yao, N.; Wang, X.; Gu, X.; et al. Survey on Explainable AI: From Approaches, Limitations and Applications Aspects. Hum.-Centric Intell. Syst. 2023, 3, 161–188.
  • Hamm, P.; Klesel, M.; Coberger, P.; Wittmann, H.F. Explanation matters: An experimental study on explainable AI. Electron. Mark. 2023, 33, 17.
  • Lance, E. Ways That the GDPR Encompasses Stipulations for Explainable AI or XAI; SSRN, Stanford Center for Legal Informatics: Stanford, CA, USA, 2022; pp. 1–7. Available online: https://ssrn.com/abstract=4085089 (accessed on 30 August 2023).
  • Gunning, D.; Vorm, E.; Wang, J.Y.; Turek, M. DARPA’s explainable AI (XAI) program: A retrospective. Appl. AI Lett. 2021, 2, e61.
  • Allgaier, J.; Mulansky, L.; Draelos, R.L.; Pryss, R. How does the model make predictions? A systematic literature review on the explainability power of machine learning in healthcare. Artif. Intell. Med. 2023, 143, 102616.
  • Graziani, M.; Dutkiewicz, L.; Calvaresi, D.; Amorim, J.P.; Yordanova, K.; Vered, M.; Nair, R.; Abreu, P.H.; Blanke, T.; Pulignano, V.; et al. A global taxonomy of interpretable AI: Unifying the terminology for the technical and social sciences. Artif. Intell. Rev. 2022, 56, 3473–3504.
  • Bellucci, M.; Delestre, N.; Malandain, N.; Zanni-Merk, C. Towards a terminology for a fully contextualized XAI. Procedia Comput. Sci. 2021, 192, 241–250.
  • Barbiero, P.; Fioravanti, S.; Giannini, F.; Tonda, A.; Lio, P.; Di Lavore, E. Categorical Foundations of Explainable AI: A Unifying Formalism of Structures and Semantics. In Explainable Artificial Intelligence. xAI, Proceedings of the Communications in Computer and Information Science, Delhi, India, 21–24 May 2024; Springer: Cham, Switzerland, 2024; Volume 2155, pp. 185–206.
  • Vilone, G.; Longo, L. Notions of explainability and evaluation approaches for explainable artificial intelligence. Inf. Fusion 2021, 76, 89–106.
  • Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215.
  • Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115.
  • Haresamudram, K.; Larsson, S.; Heintz, F. Three Levels of AI Transparency. Computer 2023, 56, 93–100.
  • Wadden, J.J. Defining the undefinable: The black box problem in healthcare artificial intelligence. J. Med. Ethics 2021, 48, 764–768.
  • Burrell, J. How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Big Data Soc. 2016, 3, 1–12.
  • Markus, A.F.; Kors, J.A.; Rijnbeek, P.R. The role of explainability in creating trustworthy artificial intelligence for health care: A comprehensive survey of the terminology, design choices, and evaluation strategies. J. Biomed. Inform. 2021, 113, 103655.
  • Brożek, B.; Furman, M.; Jakubiec, M.; Kucharzyk, B. The black box problem revisited. Real and imaginary challenges for automated legal decision making. Artif. Intell. Law 2023, 32, 427–440.
  • Li, D.; Liu, Y.; Huang, J.; Wang, Z. A Trustworthy View on Explainable Artificial Intelligence Method Evaluation. Computer 2023, 56, 50–60.
  • Nauta, M.; Trienes, J.; Pathak, S.; Nguyen, E.; Peters, M.; Schmitt, Y.; Schlötterer, J.; van Keulen, M.; Seifert, C. From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI. ACM Comput. Surv. 2023, 55, 295.
  • Lopes, P.; Silva, E.; Braga, C.; Oliveira, T.; Rosado, L. XAI Systems Evaluation: A Review of Human and Computer-Centred Methods. Appl. Sci. 2022, 12, 9423.
  • Baptista, M.L.; Goebel, K.; Henriques, E.M. Relation between prognostics predictor evaluation metrics and local interpretability SHAP values. Artif. Intell. 2022, 306, 103667.
  • Fouladgar, N.; Alirezaie, M.; Framling, K. Metrics and Evaluations of Time Series Explanations: An Application in Affect Computing. IEEE Access 2022, 10, 23995–24009.
  • Oblizanov, A.; Shevskaya, N.; Kazak, A.; Rudenko, M.; Dorofeeva, A. Evaluation Metrics Research for Explainable Artificial Intelligence Global Methods Using Synthetic Data. Appl. Syst. Innov. 2023, 6, 26.
  • Speith, T. A Review of Taxonomies of Explainable Artificial Intelligence (XAI) Methods. In Proceedings of the FAccT ‘22: 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, 21–24 June 2022; pp. 2239–2250.
  • Kurdziolek, M. Explaining the Unexplainable: Explainable AI (XAI) for UX. User Experience Magazine, 2022. Available online: https://uxpamagazine.org/explaining-the-unexplainable-explainable-ai-xai-for-ux/ (accessed on 20 August 2023).
  • Kim, B.; Wattenberg, M.; Gilmer, J.; Cai, C.; Wexler, J.; Viegas, F.; Sayres, R. Interpretability beyond feature attribution: Quantitative Testing with Concept Activation Vectors (TCAV). In Proceedings of the 35th International Conference on Machine Learning, ICML, Stockholm, Sweden, 10–15 July 2018; Volume 6, pp. 4186–4195. Available online: https://proceedings.mlr.press/v80/kim18d/kim18d.pdf (accessed on 30 July 2024).
  • Kenny, E.M.; Keane, M.T. Explaining Deep Learning using examples: Optimal feature weighting methods for twin systems using post-hoc, explanation-by-example in XAI. Knowl. Based Syst. 2021, 233, 107530.
  • Alfeo, A.L.; Zippo, A.G.; Catrambone, V.; Cimino, M.G.; Toschi, N.; Valenza, G. From local counterfactuals to global feature importance: Efficient, robust, and model-agnostic explanations for brain connectivity networks. Comput. Methods Programs Biomed. 2023, 236, 107550.
  • An, J.; Zhang, Y.; Joe, I. Specific-Input LIME Explanations for Tabular Data Based on Deep Learning Models. Appl. Sci. 2023, 13, 8782.
  • Bharati, S.; Mondal, M.R.H.; Podder, P. A Review on Explainable Artificial Intelligence for Healthcare: Why, How, and When? IEEE Trans. Artif. Intell. 2023, 5, 1429–1442.
  • Chaddad, A.; Peng, J.; Xu, J.; Bouridane, A. Survey of Explainable AI Techniques in Healthcare. Sensors 2023, 23, 634.
  • Chamola, V.; Hassija, V.; Sulthana, A.R.; Ghosh, D.; Dhingra, D.; Sikdar, B. A Review of Trustworthy and Explainable Artificial Intelligence (XAI). IEEE Access 2023, 11, 78994–79015.
  • Chen, X.-Q.; Ma, C.-Q.; Ren, Y.-S.; Lei, Y.-T.; Huynh, N.Q.A.; Narayan, S. Explainable artificial intelligence in finance: A bibliometric review. Financ. Res. Lett. 2023, 56, 104145.
  • Di Martino, F.; Delmastro, F. Explainable AI for clinical and remote health applications: A survey on tabular and time series data. Artif. Intell. Rev. 2022, 56, 5261–5315.
  • Kök, I.; Okay, F.Y.; Muyanlı, O.; Özdemir, S. Explainable Artificial Intelligence (XAI) for Internet of Things: A Survey. IEEE Internet Things J. 2023, 10, 14764–14779.
  • Haque, A.B.; Islam, A.N.; Mikalef, P. Explainable Artificial Intelligence (XAI) from a user perspective: A synthesis of prior literature and problematizing avenues for future research. Technol. Forecast. Soc. Chang. 2023, 186, 122120.
  • Sahoh, B.; Choksuriwong, A. The role of explainable Artificial Intelligence in high-stakes decision-making systems: A systematic review. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 7827–7843.
  • Saranya, A.; Subhashini, R. A systematic review of Explainable Artificial Intelligence models and applications: Recent developments and future trends. Decis. Anal. J. 2023, 7, 100230.
  • Sosa-Espadas, C.E.; Orozco-Del-Castillo, M.G.; Cuevas-Cuevas, N.; Recio-Garcia, J.A. IREX: Iterative Refinement and Explanation of classification models for tabular datasets. SoftwareX 2023, 23, 101420.
  • Meding, K.; Hagendorff, T. Fairness Hacking: The Malicious Practice of Shrouding Unfairness in Algorithms. Philos. Technol. 2024, 37, 4.
  • Batko, K.; Ślęzak, A. The use of Big Data Analytics in healthcare. J. Big Data 2022, 9, 3.
  • Borisov, V.; Leemann, T.; Seßler, K.; Haug, J.; Pawelczyk, M.; Kasneci, G. Deep Neural Networks and Tabular Data: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 7499–7519.
  • Mbanaso, M.U.; Abrahams, L.; Okafor, K.C. Data Collection, Presentation and Analysis. In Research Techniques for Computer Science, Information Systems and Cybersecurity; Springer: Cham, Switzerland, 2023; pp. 115–138.
  • Tjoa, E.; Guan, C. A Survey on Explainable Artificial Intelligence (XAI): Toward Medical XAI. IEEE Trans. Neural Networks Learn. Syst. 2021, 32, 4793–4813.
  • Gajcin, J.; Dusparic, I. Redefining Counterfactual Explanations for Reinforcement Learning: Overview, Challenges and Opportunities. ACM Comput. Surv. 2024, 56, 219.
  • Hassija, V.; Chamola, V.; Mahapatra, A.; Singal, A.; Goel, D.; Huang, K.; Scardapane, S.; Spinelli, I.; Mahmud, M.; Hussain, A. Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence. Cogn. Comput. 2023, 16, 45–74.
  • Lötsch, J.; Kringel, D.; Ultsch, A. Explainable Artificial Intelligence (XAI) in Biomedicine: Making AI Decisions Trustworthy for Physicians and Patients. BioMedInformatics 2021, 2, 1–17.
  • Hossain, I.; Zamzmi, G.; Mouton, P.R.; Salekin, S.; Sun, Y.; Goldgof, D. Explainable AI for Medical Data: Current Methods, Limitations, and Future Directions. ACM Comput. Surv. 2023.
  • Rudin, C.; Chen, C.; Chen, Z.; Huang, H.; Semenova, L.; Zhong, C. Interpretable machine learning: Fundamental principles and 10 grand challenges. Stat. Surv. 2022, 16, 1–85.
  • Zhong, X.; Gallagher, B.; Liu, S.; Kailkhura, B.; Hiszpanski, A.; Han, T.Y.-J. Explainable machine learning in materials science. NPJ Comput. Mater. 2022, 8, 204.
  • Ekanayake, I.; Meddage, D.; Rathnayake, U. A novel approach to explain the black-box nature of machine learning in compressive strength predictions of concrete using Shapley additive explanations (SHAP). Case Stud. Constr. Mater. 2022, 16, e01059.
  • Černevičienė, J.; Kabašinskas, A. Explainable artificial intelligence (XAI) in finance: A systematic literature review. Artif. Intell. Rev. 2024, 57, 216.
  • Weber, P.; Carl, K.V.; Hinz, O. Applications of Explainable Artificial Intelligence in Finance – A systematic review of Finance, Information Systems, and Computer Science literature. Manag. Rev. Q. 2024, 74, 867–907.
  • Leijnen, S.; Kuiper, O.; van der Berg, M. Impact Your Future: XAI in the Financial Sector, a Conceptual Framework for Explainable AI (XAI). Hogeschool Utrecht, Lectoraat Artificial Intelligence, Whitepaper, Version 1, pp. 1–24, 2020. Available online: https://www.hu.nl/onderzoek/projecten/uitlegbare-ai-in-de-financiele-sector (accessed on 2 August 2024).
  • Dastile, X.; Celik, T. Counterfactual Explanations with Multiple Properties in Credit Scoring. IEEE Access 2024, 12, 110713–110728.
  • Martins, T.; de Almeida, A.M.; Cardoso, E.; Nunes, L. Explainable Artificial Intelligence (XAI): A Systematic Literature Review on Taxonomies and Applications in Finance. IEEE Access 2023, 12, 618–629.
  • Kalra, A.; Mittal, R. Explainable AI for Improved Financial Decision Support in Trading. In Proceedings of the 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 14–15 March 2024; pp. 1–6.
  • Wani, N.A.; Kumar, R.; Mamta; Bedi, J.; Rida, I. Explainable AI-driven IoMT fusion: Unravelling techniques, opportunities, and challenges with Explainable AI in healthcare. Inf. Fusion 2024, 110, 102472.
  • Li, Y.; Song, X.; Wei, T.; Zhu, B. Counterfactual learning in customer churn prediction under class imbalance. In Proceedings of the 2023 6th International Conference on Big Data Technologies (ICBDT ‘23), Qingdao, China, 22–24 September 2023; pp. 96–102.
  • Zhang, L.; Zhu, Y.; Ni, Q.; Zheng, X.; Gao, Z.; Zhao, Q. Local/Global explainability empowered expert-involved frameworks for essential tremor action recognition. Biomed. Signal Process. Control 2024, 95, 106457.
  • Sadeghi, Z.; Alizadehsani, R.; Cifci, M.A.; Kausar, S.; Rehman, R.; Mahanta, P.; Bora, P.K.; Almasri, A.; Alkhawaldeh, R.S.; Hussain, S.; et al. A review of Explainable Artificial Intelligence in healthcare. Comput. Electr. Eng. 2024, 118, 109370.
  • Alizadehsani, R.; Oyelere, S.S.; Hussain, S.; Jagatheesaperumal, S.K.; Calixto, R.R.; Rahouti, M.; Roshanzamir, M.; De Albuquerque, V.H.C. Explainable Artificial Intelligence for Drug Discovery and Development: A Comprehensive Survey. IEEE Access 2024, 12, 35796–35812.
  • Murindanyi, S.; Mugalu, B.W.; Nakatumba-Nabende, J.; Marvin, G. Interpretable Machine Learning for Predicting Customer Churn in Retail Banking. In Proceedings of the 2023 7th International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 11–13 April 2023; pp. 967–974.
  • Mill, E.; Garn, W.; Ryman-Tubb, N.; Turner, C. Opportunities in Real Time Fraud Detection: An Explainable Artificial Intelligence (XAI) Research Agenda. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 1172–1186.
  • Dutta, J.; Puthal, D.; Yeun, C.Y. Next Generation Healthcare with Explainable AI: IoMT-Edge-Cloud Based Advanced eHealth. In Proceedings of the IEEE Global Communications Conference, GLOBECOM, Kuala Lumpur, Malaysia, 4–8 December 2023; pp. 7327–7332.
  • Njoku, J.N.; Nwakanma, C.I.; Lee, J.-M.; Kim, D.-S. Evaluating regression techniques for service advisor performance analysis in automotive dealerships. J. Retail. Consum. Serv. 2024, 80, 103933.
  • Agostinho, C.; Dikopoulou, Z.; Lavasa, E.; Perakis, K.; Pitsios, S.; Branco, R.; Reji, S.; Hetterich, J.; Biliri, E.; Lampathaki, F.; et al. Explainability as the key ingredient for AI adoption in Industry 5.0 settings. Front. Artif. Intell. 2023, 6, 1264372.
  • Finzel, B.; Tafler, D.E.; Thaler, A.M.; Schmid, U. Multimodal Explanations for User-centric Medical Decision Support Systems. CEUR Workshop Proc. 2021, 3068, 1–6.
  • Brochado, F.; Rocha, E.M.; Addo, E.; Silva, S. Performance Evaluation and Explainability of Last-Mile Delivery. Procedia Comput. Sci. 2024, 232, 2478–2487.
  • Kostopoulos, G.; Davrazos, G.; Kotsiantis, S. Explainable Artificial Intelligence-Based Decision Support Systems: A Recent Review. Electronics 2024, 13, 2842.
  • Nyrup, R.; Robinson, D. Explanatory pragmatism: A context-sensitive framework for explainable medical AI. Ethics Inf. Technol. 2022, 24, 13.
  • Talaat, F.M.; Aljadani, A.; Alharthi, B.; Farsi, M.A.; Badawy, M.; Elhosseini, M. A Mathematical Model for Customer Segmentation Leveraging Deep Learning, Explainable AI, and RFM Analysis in Targeted Marketing. Mathematics 2023, 11, 3930.
  • Kulkarni, S.; Rodd, S.F. Context Aware Recommendation Systems: A review of the state of the art techniques. Comput. Sci. Rev. 2020, 37, 100255.
  • Sarker, A.A.; Shanmugam, B.; Azam, S.; Thennadil, S. Enhancing smart grid load forecasting: An attention-based deep learning model integrated with federated learning and XAI for security and interpretability. Intell. Syst. Appl. 2024, 23, 200422.
  • Nnadi, L.C.; Watanobe, Y.; Rahman, M.; John-Otumu, A.M. Prediction of Students’ Adaptability Using Explainable AI in Educational Machine Learning Models. Appl. Sci. 2024, 14, 5141.
  • Vellido, A.; Martín-Guerrero, J.D.; Lisboa, P.J.G. Making machine learning models interpretable. In Proceedings of the 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, 25–27 April 2012; pp. 163–172. Available online: https://www.esann.org/sites/default/files/proceedings/legacy/es2012-7.pdf (accessed on 16 August 2024).
  • Alkhatib, A.; Ennadir, S.; Boström, H.; Vazirgiannis, M. Interpretable Graph Neural Networks for Tabular Data. In Proceedings of the ICLR 2024 Data-Centric Machine Learning Research (DMLR) Workshop, Vienna, Austria, 26–27 July 2024; pp. 1–35. Available online: https://openreview.net/pdf/60ce21fd5bcf7b6442b1c9138d40e45251d03791.pdf (accessed on 23 August 2024).
  • Saeed, W.; Omlin, C. Explainable AI (XAI): A systematic meta-survey of current challenges and future opportunities. Knowl. Based Syst. 2023, 263, 110273.
  • de Oliveira, R.M.B.; Martens, D. A Framework and Benchmarking Study for Counterfactual Generating Methods on Tabular Data. Appl. Sci. 2021, 11, 7274.
  • Bienefeld, N.; Boss, J.M.; Lüthy, R.; Brodbeck, D.; Azzati, J.; Blaser, M.; Willms, J.; Keller, E. Solving the explainable AI conundrum by bridging clinicians’ needs and developers’ goals. NPJ Digit. Med. 2023, 6, 94.
  • Molnar, C.; Casalicchio, G.; Bischl, B. Interpretable Machine Learning – A Brief History, State-of-the-Art and Challenges. In ECML PKDD 2020 Workshops, Proceedings of the ECML PKDD 2020, Ghent, Belgium, 14–18 September 2020; Koprinska, I., Kamp, M., Appice, A., Loglisci, C., Antonie, L., Zimmermann, A., Guidotti, R., Özgöbek, Ö., Ribeiro, R.P., Gavaldà, R., et al., Eds.; Springer: Cham, Switzerland, 2021; Volume 1323, pp. 417–431.
  • Pawlicki, M.; Pawlicka, A.; Kozik, R.; Choraś, M. Advanced insights through systematic analysis: Mapping future research directions and opportunities for xAI in deep learning and artificial intelligence used in cybersecurity. Neurocomputing 2024, 590, 127759.
  • Hartog, P.B.R.; Krüger, F.; Genheden, S.; Tetko, I.V. Using test-time augmentation to investigate explainable AI: Inconsistencies between method, model and human intuition. J. Cheminform. 2024, 16, 39.
  • Srinivasu, P.N.; Sandhya, N.; Jhaveri, R.H.; Raut, R. From Blackbox to Explainable AI in Healthcare: Existing Tools and Case Studies. Mob. Inf. Syst. 2022, 2022, 167821.
  • Rong, Y.; Leemann, T.; Nguyen, T.-T.; Fiedler, L.; Qian, P.; Unhelkar, V.; Seidel, T.; Kasneci, G.; Kasneci, E. Towards Human-Centered Explainable AI: A Survey of User Studies for Model Explanations. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 2104–2122.
  • Baniecki, H.; Biecek, P. Adversarial attacks and defenses in explainable artificial intelligence: A survey. Inf. Fusion 2024, 107, 102303.
  • Panigutti, C.; Hamon, R.; Hupont, I.; Llorca, D.F.; Yela, D.F.; Junklewitz, H.; Scalzo, S.; Mazzini, G.; Sanchez, I.; Garrido, J.S.; et al. The role of explainable AI in the context of the AI Act. In Proceedings of the FAccT ‘23: 2023 ACM Conference on Fairness, Accountability, and Transparency, Chicago, IL, USA, 12–15 June 2023; pp. 1139–1150.
  • Madiega, T.; Chahri, S. EU Legislation in Progress: Artificial Intelligence Act, pp. 1–12, 2024. Available online: https://www.europarl.europa.eu/RegData/etudes/BRIE/2021/698792/EPRS_BRI(2021)698792_EN.pdf (accessed on 16 August 2024).


Examples of Applications of Explainable Tabular Data, by Domain:

  • Financial Sector: identity verification in client onboarding; transaction data analysis; fraud detection in claims management; anti-money laundering monitoring; financial trading; risk management; processing of loan applications; bankruptcy prediction
  • Insurance Industry: insurance premium calculation
  • Healthcare Sector: patient diagnosis; drug efficacy; personalized healthcare
  • Fraud: identification of fraudulent transactions
  • Retail Sector: customer churn prediction; improved product suggestions to customers; customer segmentation
  • Human Resources: employee churn prediction; employee performance evaluation
  • Manufacturing Sector: logistics and supply chain management; order fulfilment; quality control; process control; planning and scheduling; predictive maintenance
  • Utility Sector: smart grid load balancing; forecasting energy consumption for customers
  • Education: predicting student adaptability; predicting student exam grades; course recommendations
Databases Searched and Reasons for Inclusion:

  • Google Scholar: comprehensive coverage (accesses a wide range of disciplines and sources, including articles, theses, books, and conference papers, providing a broad view of available literature); user-friendly interface (easy to use, making it accessible); citation tracking (shows how often articles have been cited and helps to gauge their influence and relevance).
  • IEEE Xplore: specialised focus on electrical engineering, computer science, and electronics; high-quality publications (peer-reviewed journals and conference proceedings from reputable organizations); cutting-edge research (access to the latest research published in technology and engineering).
  • ACM Digital Library: focus on computing and information technology (resources specifically related to computing, software engineering, and information systems); peer-reviewed content (high academic quality through rigorous peer review); conference proceedings (important conferences in computing, giving the latest research developments).
  • PubMed: biomedical focus (a vast collection of literature in medicine, life sciences, and health, often featuring innovative computing solutions); free access (many articles are available for free); high-quality research (peer-reviewed journals; a trusted source for medical and clinical research).
  • Scopus: extensive database covering a wide range of disciplines; citation analysis tools (provides metrics for authors and journals); quality control (peer-reviewed literature, ensuring the reliability of sources).
  • ScienceDirect: multidisciplinary coverage (a vast collection of scientific and technical research); quality journals (high-impact journals); full-text access (access to a large number of full-text articles, facilitating in-depth research).
Search Terms and Number of Papers Returned:

  • XAI AND explainable artificial intelligence: 128
  • XAI AND explainable artificial intelligence AND 2021: 28
  • XAI AND explainable artificial intelligence AND 2022: 43
  • XAI AND explainable artificial intelligence AND 2023: 57
  • 2021 AND tabular: 2
  • 2022 AND tabular: 5
  • 2023 AND tabular: 5
  • 2021 AND survey (in title): 5
  • 2022 AND survey (in title): 1
  • 2023 AND survey (in title): 8
  • 2021 AND survey AND tabular: 1
  • 2022 AND survey AND tabular: 6
  • 2023 AND survey AND tabular: 11
  • 2021 AND survey AND tabular AND Sahakyan (Sahakyan’s article): 1
  • 2022 AND survey AND tabular AND Sahakyan: 0
  • 2023 AND survey AND tabular AND Sahakyan: 2
Aspects of Comprehensibility:

  • Comprehensibility: the clarity of the language employed by a method for providing explanations.
  • Comprehensible systems: understandable systems produce symbols, allowing users to generate explanations for how a conclusion is derived.
  • Degree of comprehensibility: a subjective evaluation, as the potential for understanding relies on the viewer’s background knowledge. The more specialized the AI application, the greater the reliance on domain knowledge for the comprehensibility of the XAI system.
  • Comprehensibility of individual explanations: the length of explanations and how readable they are.
Summary of XAI Types

Counterfactual explanations
  • Description: illustrate how minimal changes in input features can change the model’s prediction, e.g., “If income increases by £5000, the loan is approved”.
  • Examples: DiCE, WatcherCF, GrowingSpheresCF.
  • Pros: causal insight (understand the causal relationship between input features and predictions); personalized explanations (tailored, individualized insights); decision support (aids decision-making with actionable, outcome-focused changes).
  • Cons: complexity (generating counterfactuals is computationally intensive, particularly for complex models and high-dimensional data); model specificity (effectiveness is influenced by the underlying model’s characteristics); interpretation (conveying implications can necessitate domain expertise).
  • Evaluation: alignment with the predicted outcome (generated counterfactual instances should closely reflect the intended predicted outcome); proximity to the original instance (maintain similarity to the original instance whilst altering the fewest features possible); diverse outputs (capable of producing multiple diverse counterfactual explanations); feasible feature values (counterfactual features should be practical and adhere to the data distribution).

Feature importance
  • Description: assess how much each feature contributes to the model’s predictions (see the sketch following this summary).
  • Examples: Permutation Importance, Gain Importance, SHAP, LIME.
  • Pros: helps in feature selection and model interpretability; provides insight into the most influential features driving the model’s decisions.
  • Cons: may not capture complex feature interactions; can be sensitive to data noise and model assumptions.
  • Evaluation: relative importance (rank features based on their contribution to the model’s prediction); stability (ensure consistency of feature importance over different subsets of the data or re-trainings of the model); model impact (assess the influence of individual features on the model’s predictive performance).

Feature interactions
  • Description: analyze how the combined effect of multiple input features influences the model’s predictions.
  • Examples: Partial Dependence plots, Accumulated Local Effects plots, Individual Conditional Expectation plots, Interaction Values.
  • Pros: reveals intricate and synergistic connections among features; enhances insight into the model’s decision-making mechanism.
  • Cons: visualizing and interpreting interactions can be difficult, especially with high-dimensional data; the computational complexity grows as the number of interacting features increases.
  • Evaluation: non-linear relationships (uncover and visualize complex, non-linear interactions among features); holistic insight (a comprehensive understanding of how features collectively impact the model’s predictions); predictive power (evaluate the combined effects of interacting features on the model’s performance).

Decision rules
  • Description: provide clear, human-readable guidelines derived from the model, such as “If age > 30 and income > 50k, then approve loan”.
  • Examples: Decision Trees, Rule-Based Models, Anchors.
  • Pros: clear and intuitive insights into the model’s predictions; easily understood by non-technical stakeholders.
  • Cons: might struggle to capture complex relationships in the data, leading to oversimplification; can be prone to overfitting, reducing generalization performance.
  • Evaluation: transparency (clear and interpretable explanations of the conditions and criteria used for decision-making); understandability (ease of understanding by non-technical stakeholders and experts alike); model adherence (decision rules should capture the model’s decision logic accurately without oversimplification).

Simplified models
  • Description: interpretable machine learning models that approximate the behavior of a more complex black-box model.
  • Examples: Generalized Additive Models, Interpretable Tree Ensembles.
  • Pros: a balance between model interpretability and model complexity; global insights into the model’s decision-making process.
  • Cons: might not capture the total complexity of the underlying data-generating process; needs careful model choice and tuning to maintain a good trade-off between interpretability and accuracy.
  • Evaluation: balance of complexity (an optimal compromise between model simplicity and predictive performance); interpretable representation (the surrogate offers transparent and intuitive insights into the original complex model’s behavior); fidelity to the original model (the extent to which the simplified model captures the key characteristics and patterns of the original complex model).
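To ground the feature-importance and decision-rule rows above, here is a minimal scikit-learn sketch on synthetic data (the dataset and feature names are invented; this illustrates the generic techniques, not code from the reviewed papers):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=4, random_state=0)

# Feature importance: shuffle one feature at a time and measure the score drop.
forest = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(forest, X, y, n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature_{i}: mean importance {imp:.3f}")

# Decision rules: a shallow tree yields human-readable if/then conditions.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=[f"feature_{i}" for i in range(4)]))
```

Permutation importance is model-agnostic but, as the summary notes, will not surface interaction effects on its own; the tree rules are readable precisely because the depth is capped, which is the oversimplification trade-off listed under cons.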
Possible research areas and suggestions:

  • Hybrid Explanations: combining multiple XAI techniques to provide more comprehensive and robust explanations for tabular data models; integrating global and local interpretability methods to offer both high-level and instance-specific insights.
  • Counterfactual Explanations: generating counterfactual examples that show how the model's predictions would change if certain feature values were altered; helping users understand the sensitivity of the model to different feature inputs and how to achieve desired outcomes (a toy search of this kind is sketched after this list).
  • Causal Inference: incorporating causal reasoning into XAI methods to better understand the underlying relationships and dependencies in tabular data; identifying causal features that drive the model's predictions, beyond just correlational relationships.
  • Interactive Visualizations: developing interactive visualization tools that allow users to explore and interpret the model's behavior on tabular data; enabling users to interactively adjust feature values and observe the corresponding changes in model outputs.
  • Scalable XAI Techniques: designing XAI methods that can handle the growing volume and complexity of tabular datasets across various domains; improving the computational efficiency and scalability of XAI techniques to support real-world applications.
  • Domain-specific XAI: tailoring XAI approaches to the specific needs and requirements of different industries and applications that rely on tabular data, such as finance, healthcare, and manufacturing; incorporating domain knowledge and constraints to enhance the relevance and interpretability of explanations.
  • Automated Explanation Generation: developing AI-powered systems that can automatically generate natural-language explanations for the model's decisions on tabular data; bridging the gap between the technical aspects of the model and the end user's understanding.
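To make the counterfactual idea tangible, here is a deliberately naive search that nudges a single feature until the model's prediction flips. It is a toy sketch under assumed synthetic data, not one of the published counterfactual algorithms this research direction envisions.

```python
# Toy counterfactual search: move one feature until the prediction flips.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=4, random_state=1)
model = LogisticRegression().fit(X, y)

def counterfactual(x, feature, step=0.1, max_steps=200):
    """Return a modified copy of x whose predicted class differs, or None."""
    base = model.predict([x])[0]
    for direction in (+1, -1):          # try increasing, then decreasing
        candidate = x.copy()
        for _ in range(max_steps):
            candidate[feature] += direction * step
            if model.predict([candidate])[0] != base:
                return candidate
    return None

x0 = X[0].copy()
cf = counterfactual(x0, feature=0)
print("feature 0 changed from", round(X[0][0], 3),
      "to", None if cf is None else round(cf[0], 3))
```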
O'Brien Quinn, H.; Sedky, M.; Francis, J.; Streeton, M. Literature Review of Explainable Tabular Data Analysis. Electronics 2024, 13, 3806. https://doi.org/10.3390/electronics13193806

Ethical Dilemmas in Using AI for Academic Writing and an Example Framework for Peer Review in Nephrology Academia: A Narrative Review

Charat Thongprayoon, Supawadee Suppadungsuk, Oscar A. Garcia Valencia, Fawad Qureshi, and Wisit Cheungpasitporn

1 Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA

2 Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bang Phli 10540, Samut Prakan, Thailand

Associated Data

Data supporting this study are available in the original publication, reports, and preprints that were cited in the references.

The emergence of artificial intelligence (AI) has greatly propelled progress across various sectors including the field of nephrology academia. However, this advancement has also given rise to ethical challenges, notably in scholarly writing. AI’s capacity to automate labor-intensive tasks like literature reviews and data analysis has created opportunities for unethical practices, with scholars incorporating AI-generated text into their manuscripts, potentially undermining academic integrity. This situation gives rise to a range of ethical dilemmas that not only question the authenticity of contemporary academic endeavors but also challenge the credibility of the peer-review process and the integrity of editorial oversight. Instances of this misconduct are highlighted, spanning from lesser-known journals to reputable ones, and even infiltrating graduate theses and grant applications. This subtle AI intrusion hints at a systemic vulnerability within the academic publishing domain, exacerbated by the publish-or-perish mentality. The solutions aimed at mitigating the unethical employment of AI in academia include the adoption of sophisticated AI-driven plagiarism detection systems, a robust augmentation of the peer-review process with an “AI scrutiny” phase, comprehensive training for academics on ethical AI usage, and the promotion of a culture of transparency that acknowledges AI’s role in research. This review underscores the pressing need for collaborative efforts among academic nephrology institutions to foster an environment of ethical AI application, thus preserving the esteemed academic integrity in the face of rapid technological advancements. It also makes a plea for rigorous research to assess the extent of AI’s involvement in the academic literature, evaluate the effectiveness of AI-enhanced plagiarism detection tools, and understand the long-term consequences of AI utilization on academic integrity. An example framework has been proposed to outline a comprehensive approach to integrating AI into Nephrology academic writing and peer review. Using proactive initiatives and rigorous evaluations, a harmonious environment that harnesses AI’s capabilities while upholding stringent academic standards can be envisioned.

1. Introduction

Artificial intelligence (AI) is now a cornerstone of contemporary technological progress, fueling breakthroughs in a wide array of fields—from healthcare and finance to transportation and the arts—leading to enhanced efficiency and productivity [ 1 ]. In the medical realm, AI systems are poring over patient histories to forecast health outcomes [ 2 ], while in the financial world, they are dissecting market fluctuations to fine-tune investment approaches [ 3 ]. Self-driving vehicles are transforming how we think about transportation [ 4 ], and in the realm of entertainment, AI is the unseen curator of your music playlists and film queues [ 5 ]. The scope of AI’s reach is both vast and awe-inspiring, especially when considering the capabilities of AI-generated large language models such as ChatGPT [ 6 ], Bard [ 7 ], Bing Chat [ 8 ], and Claude [ 9 ]. Generative AI refers to a subset of AI that generates content, including text and images, by utilizing natural language processing. OpenAI introduced ChatGPT, an AI chatbot employing natural language processing to emulate human conversation. Its latest iteration, GPT-4, possesses image analysis capabilities known as GPT-4 Vision [ 10 ]. Google’s Bard is another AI-driven chat tool utilizing natural language processing and machine learning to simulate human-like conversations [ 7 ]. Microsoft’s Bing Chat, integrated into Bing’s search engine, enables users to engage with an AI chatbot for search inquiries instead of typing queries. It operates on the same model as ChatGPT (GPT-4) from OpenAI [ 8 ]. Claude, developed by Anthropic, is yet another AI chatbot in the field, currently powered by a language model called Claude 2 [ 9 ].

Within academia, AI’s growing influence is reshaping traditional methodologies [ 11 ]. These AI tools, such as chatbots, are capable of providing personalized medical advice [ 12 ], disseminating educational materials and improving medical education [ 13 , 14 , 15 ], aiding in clinical decision-making processes [ 16 , 17 , 18 ], identifying medical emergencies [ 19 ], and providing empathetic responses to patient queries [ 20 , 21 , 22 ]. Specifically, in our nephrology-focused research, we have explored chatbot applications in critical care nephrology [ 23 ], kidney transplant care [ 24 ], renal diet support [ 25 ], nephrology literature searches [ 26 ], and answering nephrology-related questions [ 27 ]. Despite its potential, there are apprehensions about ChatGPT evolving into a “Weapon of Mass Deception”, emphasizing the necessity for rigorous assessments to mitigate inaccuracies [ 28 ]. The World Health Organization (WHO) is calling for caution to be exercised in using AI models to protect and promote healthcare, due to the major concerns such as safety, effectiveness, and ethics [ 21 , 22 , 29 , 30 ]. The remarkable surge in ChatGPT’s presence within the medical literature, accumulating more than 1400 citations on PubMed by October 2023, highlights a pivotal moment in the merging of AI and healthcare. The increasing adoption of natural language processing models like ChatGPT in various forms of writing, including scientific and scholarly publications, presents a notable shift in the academic domain [ 31 ]. These tools offer the potential to streamline academic writing and the peer review process, enhancing efficiency significantly [ 32 , 33 ]. However, this trend is accompanied by several critical concerns. Key among these are the issues of accuracy, bias, relevance, and the reasoning capabilities of these models. Additionally, there is growing apprehension regarding the impact these tools might have on the authenticity and credibility of academic work, resulting in ethical and societal dilemmas [ 34 , 35 ]. The integration of chatbots and similar technologies in academic settings, therefore, necessitates a careful and thorough examination to address these challenges effectively.

In the field of nephrology, the possibility that chatbots, whether deliberately or inadvertently, might generate incorrect references or introduce errors, threatens the reliability of the medical literature [ 26 ]. Similarly, a study assessing the capability of ChatGPT to summarize possible mechanisms of acute kidney injury in patients with coronavirus disease 2019 (COVID-19), with references, found that hallucination is the most significant drawback of ChatGPT [ 36 , 37 ]. In addition, a prospective cross-sectional global survey in urology showed that among 456 urologists, almost half (48%) used ChatGPT or other large language models for medical research, fewer (20%) used the technology in patient care, and more than half (62%) thought there were potential ethical concerns when using ChatGPT for scientific or academic writing [ 38 ]. Practices that compromise academic integrity or disseminate misleading or false information could significantly affect patient care and the overall comprehension of scientific principles. This scenario underscores the need for vigilant assessment and regulation in the academic and peer review processes to uphold the standards of scholarly work.

This review highlights the importance of collaborative efforts among nephrology academic stakeholders to cultivate an ethical AI environment, safeguarding the integrity of scholarly discourse in the face of fast-paced technological progress. It promotes extensive research to gauge AI’s presence in the academic literature, assess the effectiveness of AI-powered plagiarism detection tools, and gain insights into the lasting effects of AI integration on academic integrity. By actively engaging in these initiatives and conducting thorough assessments, we can strive for a harmonious coexistence with AI while upholding the highest standards of academic excellence.

2. AI’s Unethical Role in Scholarly Writing

The transformative impact of AI on various sectors is well documented, and academia is no exception [ 39 , 40 , 41 ]. While AI has been praised for its ability to expedite research by sifting through massive datasets and running complex simulations, its foray into the realm of academic writing is sparking debate. AI large language model tools like ChatGPT offer tantalizing possibilities: automating literature reviews, suggesting appropriate research methods, and even assisting in the composition of scholarly articles [ 42 ]. Ideally, these advancements could liberate researchers to concentrate on groundbreaking ideas and intricate problem-solving. Yet, the reality diverges sharply from this optimistic scenario ( Figure 1 ).

Figure 1. Ethical concerns surrounding AI's role in scholarly writing.

Recent discoveries have unveiled a more troubling aspect of AI’s role in academic writing [ 42 , 43 , 44 , 45 ]. Scholars have been caught red-handed, incorporating verbatim text from AI language models into their peer-reviewed articles. Each of these AI tools brings something different to the table: ChatGPT excels in natural language processing, Bard AI is adept at crafting academic prose, Bing Chat is designed for conversational engagement, and Claude AI can distill complex documents into summaries. Despite their potential for good, these tools have been exploited in ways that erode the bedrock of academic integrity. This malpractice has been detected across a spectrum of journals, from lesser-known outlets to those with substantial academic influence [ 22 , 46 ].

The ethical concerns surrounding this issue are multifaceted and deeply disquieting. Firstly, it casts a pall over the very core of academic integrity and the esteemed peer-review process. When scholars are willing to present machine-generated text as their own work, it raises doubts about the genuineness and caliber of contemporary academic pursuits. Secondly, it erodes the credibility of coauthors, editors, and reviewers who are entrusted with upholding scholarly rigor. How did these articles manage to evade detection at the various checkpoints designed to safeguard quality? The answer might lie in systemic weaknesses within the academic publishing landscape, where the imperative to publish at any cost may be compromising scholarly excellence. Moreover, this problem extends beyond academic articles alone. There is evidence to suggest that even grant applications, vital for securing research funding, have been tainted by AI-generated content. This disconcerting revelation raises profound questions about the allocation of research funds and the overarching integrity of academic research.

The recent guidelines issued by the World Association of Medical Editors (WAME) strongly emphasize that, from both an ethical and a legal standpoint, AI chatbots should not be recognized as coauthors of manuscripts in the scientific literature [ 47 ]. This not only underscores the pressing need for standardized reporting and the implementation of checklists for the utilization of AI tools in medical research, but also advocates for meticulous disclosure of pertinent information about the AI tool employed, including its name, version, and specific prompts. Such transparency is pivotal to upholding the credibility and trustworthiness of AI-assisted academic writing. On the other hand, it has also been recognized that ChatGPT and other AI language models hold the potential to function as personal assistants for journal editors and reviewers [ 28 ]. By automating certain repetitive tasks, these AI tools could enhance and streamline their workflow, thereby potentially optimizing the review process. However, it is important to acknowledge that further research and guidance are essential in this domain.

Numerous studies have highlighted that ChatGPT, while proficient in various tasks, shows limitations when dealing with scientific and mathematical concepts that require advanced cognitive skills. This becomes particularly noticeable in tasks demanding deep understanding and complex problem-solving abilities [ 48 , 49 , 50 , 51 ]. Nephrology, distinct from other medical specialties, primarily focuses on diagnosing and treating kidney diseases, including chronic kidney disease, acute renal failure, hypertension, and electrolyte imbalances. It uniquely intersects fluid, electrolyte, and acid–base balance, fundamental for overall body homeostasis. Long-term care of chronic conditions in nephrology demands deep knowledge in kidney physiology, pathology, immunology, and sometimes oncology and pharmacology. Given its complexity, especially in areas like electrolytes and acid-base disorders requiring intricate calculations, the application of AI models like ChatGPT in nephrology poses significant challenges. These include nuanced interpretations and subtle calculations, making AI integration in nephrology academic writing more complex than in other specialties.

2.1. Examples of Academic Papers That Have Used AI-Generated Content, Focusing on ChatGPT-Based Chatbots

In a blinded, randomized, noninferiority controlled study, GPT-4 was found to be equal to humans in writing introductions with respect to publishability, readability, and content quality [ 52 ]. An article using GPT-3 to write a review on the "effects of sleep deprivation on cognitive function" demonstrated ChatGPT's adherence to ICMJE co-authorship criteria, including conception, drafting, and accountability [ 53 ]; however, it revealed challenges with accurate referencing. Another paper had GPT-3 generate content on Rapamycin and Pascal's wager, effectively summarizing benefits and risks and advising healthcare consultation, and listed ChatGPT as first author [ 54 ]. A further example, testing ChatGPT's capability to draft a scholarly manuscript introduction and expand it with references, showed promising outcomes. However, it became evident that all references generated by the AI were fictitious. This underscores the limitation of relying solely on ChatGPT for medical writing tasks, particularly in contexts where accurate and real references are critical [ 55 ].

In nephrology, there are currently only a small number of published papers featuring AI-generated content. Even so, this is concerning, as it raises questions about the integrity of academic publications. Our prior study employed ChatGPT to write the conclusion of "Assessing the Accuracy of ChatGPT on Core Questions in Glomerular Disease" [ 56 ]. A letter to the editor suggests that academic journals should clarify the proportion of AI language model-generated content in papers, and that excessive use should be considered academic misconduct [ 57 ]. Many scientists disapprove of ChatGPT being listed as an author on research papers [ 58 , 59 ]. Recently, however, some science journals have overturned their bans on ChatGPT-authored papers; the publishing group of the American Association for the Advancement of Science (AAAS) allows authors to incorporate AI-written text and figures into papers if the use of the technology is acknowledged and explained [ 60 ]. Similarly, the WAME Recommendations on ChatGPT and Chatbots in Scholarly Publications were updated due to the rapid increase in chatbot usage in scholarly publishing and concerns about content authenticity. These revised recommendations guide authors and reviewers on appropriately attributing chatbot use in their work. They also stress the necessity for journal editors to have tools for manuscript screening to ensure content integrity [ 61 ]. Although ChatGPT's language generation skills are remarkable, it is important to use it as a supplementary tool rather than a substitute for human expertise, especially in medical writing. Caution and verification are essential when employing AI in such contexts to ensure accuracy and reliability. We should proactively learn about the capabilities, constraints, and possible future developments of these AI tools [ 62 ].

2.2. Systemic Failures: The Root of the Problem

Such lapses in oversight raise critical questions about the efficacy of the peer-review system, which is intended to serve as a multilayered defense for maintaining academic integrity. The first layer that failed was the coauthors, who apparently did not catch the AI-generated content. The second layer was the editorial oversight, which should have flagged the issue before the paper was even sent for peer review. Currently, numerous AI solutions, such as GPTZero, Turnitin AI detection, and AI Detector Pro, have been created for students, research mentors, educators, journal editors, and others to identify texts produced by ChatGPT, though the majority of these tools operate on a subscription model [ 44 ]. The third layer was the peer-review process itself, intended to be a stringent evaluation of a paper’s merit and originality. A study showed that ChatGPT has the potential to generate human-quality text [ 63 ], which raises concerns about the ability to determine whether research was written by a human or an AI tool. As ChatGPT and other language models continue to improve, it is likely that it will become increasingly difficult to distinguish between AI-generated and human-written text [ 64 ]. A study of 72 experienced reviewers of applied linguistics research article manuscripts showed that only 39% were able to distinguish between AI-produced and human-written texts, and the top four rationales used by reviewers were a text’s continuity and coherence, specificity or vagueness of details, familiarity and voice, and writing quality at the sentence level [ 65 ]. Additionally, the accuracy of identification varied depending on the specific texts examined [ 65 ]. The fourth layer was the revision phase, where the paper should have been corrected based on reviewers’ feedback, yet the AI-generated text remained. The fifth and final layer was the proofing stage, where the paper should have undergone a last round of checks before being published. These lapses serve as instructive case studies, spotlighting the deficiencies in the current peer-review system. The breakdown at these various checkpoints suggests that there are underlying systemic problems that risk undermining the quality and integrity of scholarly work.

2.3. The Infiltration of AI in Academic Theses

The problem of AI-generated content is not limited to scholarly articles; it has also infiltrated graduate-level theses. A survey conducted by Intelligent revealed that nearly 30% of college students have used ChatGPT to complete a written assignment, and although 75% considered it a form of cheating, they continue to use it for academic writing [ 66 ]. For example, a master’s thesis from the Department of Letters and English Language displayed unmistakable signs of AI-generated text [ 67 ]. The thesis, focused on Arab American literary characters and titled “The Reality of Contemporary Arab-American Literary Character and the Idea of the Third Space Female Character Analysis of Abu Jaber Novel Arabian Jazz”, included several phrases commonly produced by AI language models like ChatGPT. Among these were disclaimers such as “I apologize, but as an AI language model, I am unable to rewrite any text without having the original text to work with”. The presence of such language in a master’s thesis is a concerning sign that AI-generated content is seeping into even the most rigorous levels of academic scholarship. Dr. Jayachandran, a writing instructor, published a book titled “ChatGPT Guide to Scientific Thesis Writing”. This comprehensive guide offers expert guidance on crafting the perfect abstract, selecting an impactful title, conducting comprehensive literature reviews, and constructing compelling research chapters for undergraduate, postgraduate, and doctoral students [ 68 ]. This situation calls into question the effectiveness of existing safeguards for maintaining academic integrity within educational institutions. While there is no research indicating the extent of AI tool usage in nephrology-related academic theses, the increasing application of these tools in this field is noteworthy.

2.4. The Impact on Grant Applications

The issue of using AI-generated content is not limited to just academic papers and theses; it is also infiltrating the grant application process. A recent article [ 69 ] in The Guardian highlighted that some reports were crafted with the help of ChatGPT. One academic even found the term “regenerate response” in their assessor reports, which is a feature specific to the ChatGPT interface. A Nature survey of over 1600 researchers worldwide revealed that more than 25% use AI to assist with manuscript writing and more than 15% use the technology to aid in grant proposal writing [ 70 ]. The use of ChatGPT in grant proposal writing has not only significantly reduced the workload but has also produced outstanding results, suggesting that the grant application process is flawed [ 71 ]. This also raises concerns that peer reviewers, who play a crucial role in allocating research funds, might not be diligently reviewing the applications they are tasked with assessing. The ramifications of this oversight are significant, with the potential for misallocation of crucial research funding. This issue is exacerbated by the high levels of stress and substantial workloads that academics routinely face. Researchers are often tasked with reviewing a considerable number of lengthy grant proposals, in addition to fulfilling their regular academic duties such as publishing, peer reviewing, and administrative responsibilities. Given the enormity of these pressures, it becomes more understandable why some might resort to shortcuts like using AI-generated content to cope with their responsibilities. At present, the degree to which AI tools are employed in nephrology grant applications is unclear, yet given the rapid rise in AI adoption, attention should be drawn to this area.

2.5. The Inevitability of AI in Academia

The incorporation of AI into academic endeavors is not just a possibility; it is an unavoidable progression [ 72 ]. As we approach this transformative juncture, it becomes imperative for universities, publishers, and other academic service providers to give due consideration to AI tools. This entails comprehending their capabilities, recognizing their limitations, and being mindful of the ethical considerations tied to their utilization [ 73 ]. Rather than debating whether AI should be used, the primary focus should revolve around how it can be harnessed responsibly and effectively [ 74 ]. To ensure that AI acts as a supportive asset rather than an impediment to academic integrity, it is essential to establish clear guidelines and ethical parameters. For example, AI could be deployed to automate initial phases of literature reviews or data analysis, tasks that are often time-consuming but may not necessarily require human creativity [ 26 , 68 ]. However, it is crucial that the use of AI remains transparent, and any content generated using AI should be distinctly marked as such to uphold the integrity of the academic record. The key lies in striking a balance that permits the ethical and efficient application of AI in academia. This involves formulating policies and processes that facilitate academics’ use of AI tools while simultaneously ensuring that these tools are employed in a manner that upholds the stringent standards of academic work. By doing so, we can leverage the potential of AI to propel research and scholarship forward, all while preserving the quality and integrity that constitute the cornerstones of academia.

2.6. Proposed Solutions and Policy Recommendations

  • Advanced AI-driven plagiarism detection: AI-generated content often surpasses the detection capabilities of conventional plagiarism checkers. Implementing next-level, AI-driven plagiarism detection technologies could significantly alter this landscape. Such technologies should be designed to discern the subtle characteristics and structures unique to AI-generated text, facilitating its identification during the review phases. A recent study compared the Japanese stylometric features of texts generated using ChatGPT (GPT-3.5 and GPT-4) with those written by humans and verified the two-class performance of a random forest classifier [ 75 ]. A random forest focusing on the rate of function words achieved 98.1% accuracy, and one using all stylometric features reached 100% on every performance index, including accuracy, recall, precision, and F1 score [ 75 ] (a toy version of such a stylometric classifier is sketched after this list).
  • Revisiting and strengthening the peer-review process: The integrity of academic work hinges on a robust peer-review system, which has shown vulnerabilities in detecting AI-generated content. A viable solution could be the mandatory inclusion of an “AI scrutiny” phase within the peer-review workflow. This would equip reviewers with specialized tools for detecting AI-generated content. Furthermore, academic journals could deploy AI algorithms to preliminarily screen submissions for AI-generated material before they reach human evaluators.
  • Training and resources for academics on ethical AI usage: While academics excel in their specialized domains, they may lack awareness of the ethical dimensions of AI application in research. Educational institutions and scholarly organizations should develop and offer training modules that focus on the ethical and responsible deployment of AI in academic endeavors. These could range from using AI in data analytics and literature surveys to crafting academic papers. In this era of significant advancements, we must recognize and embrace the potential of chatbots in education while simultaneously emphasizing the necessity for ethical guidelines governing their use. Chatbots offer a plethora of benefits, such as providing personalized instruction, facilitating 24/7 access to support, and fostering engagement and motivation. However, it is crucial to ensure that they are used in a manner that aligns with educational values and promotes responsible learning [ 76 ]. In an effort to uphold academic integrity, the New York Education Department implemented a comprehensive ban on the use of AI tools on network devices [ 77 ]. Similarly, the International Conference on Machine Learning (ICML) prohibited authors from submitting scientific writing generated by AI tools [ 78 ]. Furthermore, many scientists disapproved of ChatGPT being listed as an author on research papers [ 58 ].
  • Acknowledgment for AI as contributor: The use of ChatGPT as an author of academic papers is a controversial issue that raises important questions about accountability and contributorship [ 79 ]. On the one hand, ChatGPT can be a valuable tool for assisting with the writing process. It can help to generate ideas, organize thoughts, and produce clear and concise prose. However, ChatGPT is not a human author. It cannot understand the nuances of human language or the complexities of academic discourse. As a result, ChatGPT-generated text can often be superficial and lacking in originality. In addition, the use of ChatGPT raises concerns about accountability. Who is responsible for the content of a paper that is written using ChatGPT? Is it the human user who prompts the chatbot, or is it the chatbot itself? If a paper is found to be flawed or misleading, who can be held accountable? The issue of contributorship is also relevant. If a paper is written using ChatGPT, who should be listed as the author? Should the human user be listed as the sole author, or should ChatGPT be given some form of credit? Therefore, promoting a culture of transparency and safeguarding the integrity of academic work necessitates the acknowledgment of AI’s contribution in research and composition endeavors. It is crucial for authors to openly disclose the degree of AI assistance in a specially designated acknowledgment section within the publication. This acknowledgment should specify the particular roles played by AI, whether in data analysis, literature reviews, or drafting segments of the manuscript, alongside any human oversight exerted to ensure ethical deployment of AI. For example: “Acknowledgment: We hereby recognize the aid of [Specific AI Tool/Technology] in carrying out data analytics, conducting literature surveys, and drafting initial versions of the manuscript. This AI technology enabled a more streamlined research process, under the careful supervision of [Names of Individuals] to comply with ethical guidelines. The perspectives generated by AI significantly contributed to the articulation of arguments in this publication, affirming its valuable input to our work”.
  • Inevitability of Technological Integration: While recognizing ethical concerns, the argument asserts that the adoption of advanced technologies such as AI in academia is inevitable. It recommends shifting the focus from resistance to the establishment of robust ethical frameworks and guidelines to ensure responsible AI usage [ 76 ]. From this perspective, taking a proactive stance on AI integration, firmly rooted in ethical principles, can facilitate the utilization of AI’s advantages in academia while mitigating the associated risks of unethical AI use. By fostering a culture of transparency, accountability, and continuous learning, there is a belief that the academic community can navigate the complexities of AI. This includes crafting policies that clearly define the ethical use of AI tools, creating mechanisms for disclosing AI assistance in academic work, and promoting collaborative efforts to explore and comprehend the implications of AI in academic writing and research.
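To ground the first recommendation above, the snippet below sketches the flavor of such a detector: represent each text by its function-word rates and train a random forest to separate human-written from AI-generated samples. The word list and the four toy texts are assumptions made for illustration and bear no relation to the cited study's actual features or data.

```python
# Toy stylometric detector: function-word rates + a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "is", "as"]

human_texts = [
    "We collected the data in the clinic and compared it to the registry.",
    "The reviewers said that the methods were sound and the analysis clear.",
]
ai_texts = [
    "As an AI language model, I can provide a summary of the findings.",
    "It is important to note that the results may vary with the context.",
]
texts = human_texts + ai_texts
labels = [0] * len(human_texts) + [1] * len(ai_texts)  # 0 = human, 1 = AI

# Count only function words, then normalize counts to per-text rates.
vectorizer = CountVectorizer(vocabulary=FUNCTION_WORDS)
counts = vectorizer.fit_transform(texts).toarray().astype(float)
rates = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(rates, labels)
print(clf.predict(rates))  # sanity check on the training texts
```

A real deployment would need thousands of labeled samples, richer stylometric features, and held-out evaluation; the classifier here only shows the shape of the pipeline.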

3. Ideal Proposal for AI Integration in Nephrology Academic Writing and Peer Review

Nephrology is a rapidly evolving field, and AI integration has the potential to significantly advance research and scholarship. Nevertheless, as highlighted in previous discussions about ethical dilemmas [ 80 ], there is an urgent need to develop a framework to ensure responsible AI utilization, transparency, and academic integrity in nephrology and related fields. This proposed framework outlines a comprehensive approach to integrating AI into nephrology academic writing and peer review, drawing on the expertise of leading nephrologists ( Table 1 ).

Table 1. Framework for AI integration in nephrology academic writing and peer review.

| Component | Objective | Action Items | Stakeholders Involved | Metrics for Success |
| --- | --- | --- | --- | --- |
| Transparent AI assistance acknowledgment | Ensure full disclosure of AI contributions in research. | 1. Add an acknowledgment section in the paper. 2. Specify the AI's role. | Authors, journal editors | Number of publications with transparent acknowledgments |
| Enhanced peer review process with AI scrutiny | Maintain academic rigor and integrity in the use of AI. | 1. Add an "AI scrutiny" phase in peer review. 2. Train reviewers on AI. | Peer reviewers, AI experts | Reduced rate of publication errors related to AI misuse |
| AI ethics training for nephrologists | Equip nephrologists with the knowledge to use AI ethically. | 1. Develop training modules. 2. Conduct workshops. | Nephrologists, ethicists, AI experts | Number of trained personnel |
| AI as a collaborative contributor | Foster a culture where AI and human expertise are seen as complementary. | 1. Advocate for collaboration in publications. 2. Develop guidelines for collaboration. | Nephrologists, AI developers | Number of collaborative publications |
| Continuous monitoring and research | Understand the impact of AI on the field and adapt accordingly. | 1. Initiate long-term studies. 2. Develop AI-specific plagiarism tools. | Nephrologists, data scientists | Published long-term impact studies |
| Ethics checklist | Ensure preliminary ethical compliance in AI usage. | Integrate the ethics checklist into manuscript submission. | Authors, journal editors, ethicists | Number of manuscripts screened for ethical compliance |

3.1. Transparent AI Assistance Acknowledgment

In the realm of nephrology research, it is essential that authors openly recognize the utilization of AI tools [ 56 ]. This recognition should find a dedicated space within their publications, shedding light on the specific roles that AI plays in data analysis, literature reviews, or manuscript drafting. As an example, consider a nephrology research paper that acknowledges AI’s involvement like this: “We extend our gratitude to [Specific AI Tool/Technology] for its contributions in data analysis and literature reviews. AI-driven insights were seamlessly integrated into our research, guided by the expertise of distinguished nephrologists [Names of Nephrologists]”.

3.2. Enhanced Peer Review Process with AI Scrutiny

To preserve academic rigor and uphold integrity, it is advisable for nephrology journals to integrate an "AI scrutiny" stage into their peer-review process. Peer reviewers should be well informed about the potential influence of AI on the manuscripts under their review and should be equipped to recognize AI-generated text. This phase, therefore, should involve nephrology experts with a deep understanding of AI applications. These experts can assess the incorporation of AI-generated content, verifying its adherence to established standards and ethical guidelines in nephrology research.

3.3. AI Ethics Training for Nephrologists

Specialized training in the ethical use of AI tools should be provided to nephrology experts and their fellow researchers. This curriculum should encompass key subjects, including the potential advantages and pitfalls of AI in nephrology research, techniques to recognize and mitigate biases in AI tools, and methods to ensure transparency and accountability in AI-driven research. These educational programs can be delivered through workshops, webinars, and online courses. Nephrology experts are uniquely positioned to enlighten their colleagues about the responsible application of AI, preserving AI's value in nephrology research. Moreover, we stress the significance of fostering collaboration between nephrologists and AI specialists. Through this joint effort, we can create and implement AI tools that are not only ethical but also effective and advantageous to the nephrology field. Collaborative training initiatives with AI experts can also offer a comprehensive understanding of AI's capabilities and limitations.

3.4. AI as a Collaborative Contributor

Nephrology experts should advocate for a collaborative culture that recognizes AI as a valuable research partner [ 24 ]. AI's proficiency in data analysis, pattern recognition, and literature reviews can free nephrologists to delve into novel research inquiries and clinical applications. For example, AI can be employed to analyze extensive patient datasets, uncovering trends and patterns that would be difficult or impossible for nephrologists to identify on their own [ 81 ]. AI can be used to craft innovative diagnostic tools and algorithms, enabling nephrologists to enhance the precision and efficiency of kidney disease diagnosis and monitoring. Additionally, AI holds the potential to support new therapeutic strategies for kidney disease, encompassing personalized treatment plans and the discovery of new drugs. Publications resulting from these collaborations should emphasize the synergistic relationship between AI and nephrologist expertise, demonstrating how AI-generated insights enhance the nephrology field.

3.5. Continuous Monitoring and Research

Nephrologists should play a leading role in continuously evaluating the impact of AI on nephrology research. This requires implementing long-term studies to track changing perceptions, the emergence of AI-focused research trends, and their implications for the quality and integrity of nephrology publications. We can carry out surveys and interviews with nephrologists to gauge their perspectives on AI, their existing utilization of AI in research, and their anticipations regarding AI’s future role in Nephrology. Moreover, an analysis of the nephrology literature can be undertaken to pinpoint developing trends in AI-centric research and appraise AI’s influence on the caliber and credibility of nephrology publications. Additionally, experts in nephrology can provide valuable insights in studies evaluating the efficacy of plagiarism detection tools enhanced using AI, specifically tailored to the nephrology literature, ensuring their alignment with the distinct features of the field.

3.6. Ethics Checklist

Recently, the CANGARU (ChatGPT, Generative Artificial Intelligence and Natural Large Language Models for Accountable Reporting and Use) Guidelines have been proposed as a comprehensive framework for ensuring ethical standards in AI research [ 82 ]. The Ethics Checklist, derived from these newly established guidelines, serves as a crucial tool in the AI integration process, upholding the highest ethical principles in nephrology research. Its adoption in manuscript submissions is essential for the early and systematic consideration of ethical dimensions, significantly mitigating the risk of ethical dilemmas in subsequent stages of research.

The Ethics Checklist plays a central role in the AI integration process, serving as a preemptive step to uphold ethical standards in nephrology research. Its incorporation into manuscript submissions guarantees the early consideration of ethical aspects, reducing the likelihood of ethical issues arising down the line. Effective implementation and review of this checklist ( Table 2 ) depend on collaboration among authors, journal editors, and ethicists, thereby fostering responsible AI utilization in the realm of nephrology. A vital metric for tracking advancement in this domain is the count of manuscripts assessed for ethical adherence, demonstrating a resolute dedication to transparency and the integrity of research.

Table 2. Proposed AI Ethics Checklist for journal submissions.


  • General Information
  • AI Involvement (if AI was not involved, you may skip the rest of this checklist)
  • AI Contribution
  • AI Tools and Technologies
  • Ethical Considerations
  • Author's Declaration: "I, the undersigned, declare that the information provided in this checklist is accurate and complete to the best of my knowledge." Signature: ___________________________
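As a purely illustrative sketch of the "integrate the ethics checklist into manuscript submission" action item from Table 1, a submission system might encode the checklist sections and flag incomplete AI disclosures automatically; the field names below are hypothetical, not part of the CANGARU guidelines.

```python
# Hypothetical encoding of the AI Ethics Checklist for automated screening.
CHECKLIST_SECTIONS = [
    "general_information",
    "ai_involvement",
    "ai_contribution",
    "ai_tools_and_technologies",
    "ethical_considerations",
    "authors_declaration",
]

def unanswered_sections(submission: dict) -> list:
    """Return the checklist sections a submission leaves blank."""
    if submission.get("ai_involvement") is False:
        return []  # AI not involved: the remaining sections may be skipped
    return [s for s in CHECKLIST_SECTIONS if not submission.get(s)]

# Example: a submission that declares AI involvement but discloses nothing else.
print(unanswered_sections({"general_information": "...", "ai_involvement": True}))
```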

4. Future Studies and Research Directions

Undoubtedly, the significance of conducting a thorough analysis to grasp the extent of AI’s presence in academic writings cannot be overstated. There is an immediate necessity to quantify the prevalence and influence of AI in scholarly literature, thereby offering a clear perspective on the current landscape. An exhaustive exploration spanning various academic disciplines and levels of scholarship holds the potential to yield valuable insights into the ubiquity of AI-generated content within academic discourse. Such research can unveil the diverse applications of AI, pinpoint commonly used AI tools, and gauge the transparency with which they are utilized. Moreover, it may spotlight academic domains where AI plays a substantial role, signaling areas demanding prompt attention.

Conventional plagiarism detection tools might grapple with recognizing AI-generated content due to the advanced capabilities of contemporary AI writing assistance. Consequently, there is an urgent demand to appraise the efficacy of plagiarism detection technologies bolstered by AI for identifying AI-generated text. These evaluations could provide a deeper understanding of the capabilities and limitations of these advanced tools and their potential integration into existing plagiarism detection and academic evaluation frameworks. Furthermore, the insights gleaned from these inquiries could inform the development of more robust, AI-focused plagiarism detection systems capable of adapting to evolving AI writing techniques.

To comprehend the long-term ramifications of AI utilization in academic work, it is imperative to undertake extended studies that track changes over an extended period. These investigations could delve into shifts in attitudes toward AI, the evolution of AI-related plagiarism, and its impact on the caliber and authenticity of scholarly endeavors. They may also shed light on how the integration of AI into academic literature influences the reliability of scholarly publications, the peer-review process, and the broader academic community ( Figure 2 ).

Figure 2. Future studies and research directions.

5. Conclusions

The extensive utilization of AI-generated content in academic papers underscores profound issues deeply ingrained within the academic realm. These issues manifest in various ways, including the relentless pressure to publish, shortcomings in peer-review procedures, and an absence of effective safeguards against AI-driven plagiarism. The failure to detect and rectify AI-authored material during the evaluation process erodes the fundamental integrity of scholarly work. Furthermore, the inappropriate deployment of AI technology jeopardizes the rigorous ethical standards maintained by the academic community.

Resolving this challenge necessitates collaborative efforts from all stakeholders in academia. Educational institutions, academic journals, and researchers collectively bear the responsibility to combat unethical AI usage in scholarly publications. Potential solutions encompass fostering an environment characterized by transparency and the ethical use of AI, enhancing peer-review systems with technology tailored to identify AI-generated plagiarism, and advocating for higher ethical standards throughout the academic community. Additionally, the provision of clear guidelines for the responsible use of AI tools and the education of scholars about AI ethics are indispensable measures. Through proactive initiatives, we can navigate the intricate interplay between AI technology and academic integrity, ensuring the preservation of the latter even in the face of technological advancements.

Funding Statement

This research received no external funding.

Author Contributions

Conceptualization, J.M. and W.C.; methodology, J.M. and C.T.; validation, J.M., C.T., F.Q. and W.C.; investigation, J.M. and W.C.; resources, J.M.; data curation, J.M., C.T., S.S., O.A.G.V., F.Q. and W.C.; writing—original draft preparation, J.M., C.T., S.S., O.A.G.V., F.Q. and W.C.; writing—review and editing, J.M., C.T., S.S., O.A.G.V., F.Q. and W.C.; visualization, F.Q. and W.C.; supervision, W.C. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Data Availability Statement

Data supporting this study are available in the original publication, reports, and preprints cited in the references.

Conflicts of Interest

The authors declare no conflicts of interest.


10 Best AI Tools for Streamlining Your Literature Review Process


Diving into the world of academic research can feel like navigating a labyrinth of endless papers, data, and theories. Yet, for many students and researchers, it's a journey that's both exhilarating and overwhelming. If only there were a way to streamline the literature review process! Well, folks, the future is now, thanks to AI. Here’s a detailed look at the 10 best AI tools that can transform your literature review process from drudgery to delight.

1. SciSummary

  • Overview: Designed especially for researchers and students, SciSummary is a powerhouse in summarizing complex academic papers into concise, digestible pieces, ensuring you capture the essence without missing critical information.
  • Standout Feature: It even highlights key points, making your note-taking process seamless.
  • Price: Free tier available; Premium plans start at $6.99/month or $34.99/yr.

2. Zotero

  • Overview: An oldie but a goodie, Zotero remains a favorite for managing references and generating citations effortlessly.
  • Standout Feature: Its browser extension allows for instant source saving while you browse.
  • Price: Free with optional storage add-ons.

3. Mendeley

  • Overview: Mendeley is more than just a reference manager; it’s a researcher's collaborative tool.
  • Standout Feature: The social network feature, enabling researchers to connect and share their work.
  • Price: Free for basic use; Premium starts at $4.99/month.

4. EndNote

  • Overview: This tool is a researcher's secret weapon for bibliography management.
  • Standout Feature: EndNote integrates seamlessly with MS Word, making citation generation a breeze.
  • Price: Free basic version; Advanced options available via subscription.

5. Ref-N-Write

  • Overview: Ever struggled with academic writing style? Ref-N-Write has got your back.
  • Standout Feature: Its academic phrasebank is a life-saver for non-native English speakers.
  • Price: One-time purchase starting at $29.

6. Connected Papers

  • Overview: Provides a visual graph of related papers to help you map out the landscape of your research area.
  • Standout Feature: The exploration graph simplifies finding the most influential and connected papers.
  • Price: Free.
  • Overview: This AI researcher assistant helps you find relevant papers and even extract specific data points.
  • Standout Feature: The ability to run detailed content analysis for deeper insights.
  • Price: Free tier available; Professional plans start at $9/month.

8. Research Rabbit

  • Overview: Think of it as Spotify’s recommendation system but for research papers.
  • Standout Feature: It doesn't just find papers; it suggests entire lines of inquiry you might have missed.

9. Semantic Scholar

  • Overview: Backed by the Allen Institute, this tool uses AI to provide high-quality, nuanced research paper recommendations.
  • Standout Feature: Automatic citation extraction and its TLDR feature that summarizes long papers in a few sentences.

10. Paperpile

  • Overview: If you’re a Google Docs enthusiast, Paperpile is your best friend for reference management within the Google ecosystem.
  • Standout Feature: Smooth integration with Google Scholar and Google Drive.
  • Price: Basic free plan; Premium starts at $2.99/month.

As you delve into these tools, your literature review process will not only become more efficient but also significantly more enjoyable. Don't just take our word for it—try them out and see the transformation for yourself.

If you’re interested in seeing how SciSummary stands up against giants like ChatGPT, feel free to explore our features and see live comparisons at SciSummary . Happy researching!

IMAGES

  1. (PDF) Ethical Guidelines for Artificial Intelligence: A Systematic

    ai ethics literature review

  2. (PDF) An Overview of Artificial Intelligence Ethics

    ai ethics literature review

  3. How to write a literature review faster with ai I ai research assitant I Dr Dee

    ai ethics literature review

  4. AI ethics

    ai ethics literature review

  5. (PDF) Ethics of AI: A Systematic Literature Review of Principles and

    ai ethics literature review

  6. The Ethics of AI in Health Care: A Mapping Review

    ai ethics literature review

VIDEO

  1. The Black Box of AI and Tech Ethics by Maira Elahi

  2. Best AI for Literature Review and Meta Analysis Data Collection| SciSpace Tutorial & Discount Code!

  3. Understanding AI Ethics with OpenAI

  4. Research Methodology for Life Science Projects (4 Minutes)

  5. The Ethics of AI in Creative Arts

  6. AI ethics

COMMENTS

  1. (PDF) Ethics of AI: A Systematic Literature Review of Principles and

    Ethics of AI: A Systematic Literature Review of Principles and Challenges EASE, 13-15 June 2022, Gothenburg, Sweden. Additionally, all the authors have substantial e xperience performing.

  2. Title: Ethics of AI: A Systematic Literature Review of Principles and

    Ethics in AI becomes a global topic of interest for both policymakers and academic researchers. In the last few years, various research organizations, lawyers, think tankers and regulatory bodies get involved in developing AI ethics guidelines and principles. However, there is still debate about the implications of these principles. We conducted a systematic literature review (SLR) study to ...

  3. Worldwide AI ethics: A review of 200 guidelines and recommendations for

    This paper conducts a meta-analysis of 200 governance policies and ethical guidelines for AI usage published by various stakeholders worldwide. It identifies 17 ethical principles that resonate across the documents and presents a database and tool for comparison and analysis.

  4. Ethics & AI: A Systematic Review on Ethical Concerns and Related ...

    This article reviews ethical concerns and strategies for designing with AI in healthcare, based on a systematic literature search. It covers 12 main ethical issues and 19 sub-issues, such as justice, privacy, transparency, and trust, and provides examples of AI applications in healthcare.

  5. A Literature Review on Ethics for AI in Biomedical Research and

    In recent years, there has been growing interest in AI ethics, as reflected by a huge number of (scientific) literature dealing with the topic of AI ethics. The main objectives of this review are: (1) to provide an overview about important (upcoming) AI ethics regulations and international recommendations as well as available AI ethics tools ...

  6. Ethics of AI: A systematic literature review of principles and ...

    Ethics in AI gets significant attention in the last couple of years and there is a need of systematic literature study that discuss the principles and uncover the key challenges of AI ethics. This study is conducted to fill the given research gap by following the SLR approach.

  7. Artificial intelligence for good health: a scoping review of the ethics

    This article examines the ethical implications of artificial intelligence (AI) in health, public health, and global health, based on a review of peer reviewed and grey literature. It identifies common ethical concerns such as privacy, trust, accountability, and bias, and highlights the need for further research and guidance in this field.

  8. Ethics of AI: A Systematic Literature Review of Principles and

    However, there is still debate about the implications of these principles. We conducted a systematic literature review (SLR) study to investigate the agreement on the significance of AI principles and identify the challenging factors that could negatively impact the adoption of AI ethics principles.

  9. Objective metrics for ethical AI: a systematic literature review

    The field of AI Ethics has recently gained considerable attention, yet much of the existing academic research lacks practical and objective contributions for the development of ethical AI systems. This systematic literature review aims to identify and map objective metrics documented in literature between January 2018 and June 2023, specifically focusing on the ethical principles outlined in ...

  10. Mapping the ethic‐theoretical foundations of artificial intelligence

    The study identifies the lack of AI ethics literature that draws upon seminal ethics works and the ensuing disconnectedness among the publications on this subject. ... While there is a vast corpus of literature that discusses AI E, our structured literature review reflects the view that has been echoed by several researchers that little of this ...

  11. Medical artificial intelligence ethics: A systematic review of

    Our systematic literature review of published empirical studies of medical AI ethics identified the three main approaches taken in this line of research and the major findings in each approach. The largest group of studies examines the knowledge and attitudes of medical AI ethics across various stakeholders.

  12. A high-level overview of AI ethics

    Artificial intelligence (AI) ethics is a field that has emerged as a response to the growing concern regarding the impact of AI. ... and frameworks. This literature has flourished, often referred to as "Trustworthy AI". This review embraces inherent interdisciplinarity in the field by providing a high-level introduction to AI ethics drawing ...

  12. Ethics-based AI auditing: A systematic literature review on ...

    The AI auditing literature is fragmented, examining specific contexts such as search engines [21], facial recognition [73], social networks [144], e-commerce [149], and online job boards [55]. Thus far, few syntheses can be found in the literature, with an exception being Batarseh et al. [9], who scored AI assurance methods based on their applicability but devoted little attention to ethics issues.

  13. Ethics & AI: A Systematic Review on Ethical Concerns and Related ...

    In modern life, the application of artificial intelligence (AI) has promoted the implementation of data-driven algorithms in high-stakes domains, such as healthcare.

  14. A literature review on artificial intelligence and ethics in online ...

    Guided by the literature review provided, as well as by the principles noted in the aforementioned citation, we explore the uncharted center of the diagram in Fig. 6.1 (that is, the intersection between AI, online learning, and ethics) by distilling the main ethical considerations that should be taken into account both when designing, as well ...

  15. Responsible AI Governance: A Systematic Literature Review

    By Amna Batool, Didar Zowghi, and Muneera Bano, CSIRO's Data61, Melbourne, Australia. arXiv:2401.10896 [cs.CY].

  16. Generative artificial intelligence and ethical ...

    By analysing the articles identified in our scoping review (especially those with a stronger ethics focus), we selected a set of well-established ethical principles in the AI ethics literature to include in our checklist that are essential to, and operationalisable in, GenAI research in health care.

  17. Ethics

    Ethical issues related to artificial intelligence are a complex and evolving field of concern. As AI technology continues to advance, it raises various ethical dilemmas and challenges. Here are some of the key ethical issues associated with AI: Bias and Fairness: AI systems can inherit and even amplify biases present in their training data ...

  18. A Systematic Literature Review of Human-Centered, Ethical, and ...

    As part of our a posteriori analysis, we used AI-assisted summarization to answer our RQs, using the abstracts of the papers as input to ChatGPT 4.0, following the recent emerging literature on using AI for qualitative analysis (Byun et al., 2023; Abram et al., 2020). We compared its findings with our manual analysis (Section 8). Although ChatGPT provided an overview of the dataset, it is important ...
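
    A minimal sketch of such AI-assisted summarization, assuming the openai Python client; the model name, prompt, and helper function are placeholders of ours, not the cited study's setup:

      # Hypothetical sketch: answer a research question from paper abstracts
      # with a chat model. Assumes OPENAI_API_KEY is set in the environment.
      from openai import OpenAI

      client = OpenAI()

      def summarize_abstracts(abstracts, question):
          """Send the abstracts plus a research question to a chat model."""
          joined = "\n\n".join(abstracts)
          response = client.chat.completions.create(
              model="gpt-4o",  # placeholder model name
              messages=[
                  {"role": "system", "content": "You analyse research-paper abstracts."},
                  {"role": "user", "content": f"{question}\n\nAbstracts:\n{joined}"},
              ],
          )
          return response.choices[0].message.content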

  19. Literature Review of Explainable Tabular Data Analysis

    Explainable artificial intelligence (XAI) is crucial for enhancing transparency and trust in machine learning models, especially for tabular data used in finance, healthcare, and marketing. This paper surveys XAI techniques for tabular data, building on previous work (specifically, a survey of explainable artificial intelligence for tabular data), and analyzes recent advancements. It ...
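
    For instance, one common XAI technique for tabular data is SHAP feature attribution. The minimal sketch below is our illustration, not the paper's code, and assumes the shap and scikit-learn packages are installed:

      # Hypothetical sketch: per-feature SHAP attributions for a tree model
      # on a standard tabular dataset.
      import shap
      from sklearn.datasets import load_breast_cancer
      from sklearn.ensemble import RandomForestClassifier

      X, y = load_breast_cancer(return_X_y=True, as_frame=True)
      model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

      explainer = shap.TreeExplainer(model)              # tree-specific explainer
      shap_values = explainer.shap_values(X.iloc[:100])  # attribution per feature
      shap.summary_plot(shap_values, X.iloc[:100])       # global importance view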

  20. Ethical Dilemmas in Using AI for Academic Writing and an Example

    Proposes an enhanced peer review process with AI scrutiny to maintain academic rigor and integrity in the use of AI: (1) add an "AI scrutiny" phase to peer review and (2) train reviewers on AI, involving peer reviewers and AI experts, with the expected outcome of a reduced rate of publication errors related to AI misuse. It also recommends AI ethics training for nephrologists, to equip them with the knowledge to use ...
