Tools Detecting and/or Measuring Ethical Leadership: A Systematic Literature Review
INTERNATIONAL JOURNAL OF O

This paper presents a Systematic Literature Review of the tools for detecting and/or measuring Ethical Leadership in the business world published between 2000 and 2020. The review attempts to explore, analyze and synthesize the most recent evidence from the field with the aim of summarizing and integrating knowledge regarding Ethical Leadership. Our goal is to build a documentation framework for further research and for the future design of a more concise and accurate tool covering all possible aspects of Ethical Leadership. We chose the Systematic Literature Review method because it is structured and minimizes subjectivity when selecting and analyzing data. Our work addresses researchers/scholars, postgraduate students, PhD candidates, and private and public sector officials who need scientific evidence to support decision-making and/or policy design.

At the same time, a considerable number of instruments detecting or measuring ethical leadership elements appeared. The construction of these tools dates back to the first half of the twentieth century. Some of them were based on existing research, while others were updated versions of previous attempts to evaluate ethical leadership.
This paper seeks to review, critically appraise, and synthesize published evidence regarding tools to detect and/or measure ethical leadership. We implement the Systematic Literature Review (SLR) methodology because it is structured and can lead to "transparent and reproducible" and less biased conclusions (Lame, 2019), as compared to previous literature reviews following the traditional review method. We limit our investigation to the 21st century, intending to open up the topic of Ethical Leadership instruments further by mapping the most recent literature. We attempt to summarize and integrate knowledge with the aspiration of creating evidence-based information on the fast-expanding field of ethical leadership.
The paper is organized into four distinct parts. First, a conceptual clarification of the key terms of leadership, ethics and morality, and ethical leadership is attempted. Second, the methodology of this study is described. The third part presents and analyzes the findings. The last part contains a discussion of the conclusions and implications.

Purpose and Research Question
The purpose of this study is to map and synthesize scientific evidence on Ethical Leadership tools by using the SLR method. We seek to answer this research question: "Is there a plausible instrument/tool to explore as many as possible aspects of ethical leadership in an organization?" We also ask a secondary question: "Is there consensus among researchers on how to detect an ethical leader?" This review aims to identify the most appropriate tool for detecting and/or measuring ethical leadership so that it can be implemented in multiple social sectors.

Theoretical Framework
Leadership. The notion of leadership itself has gone through many definitions and clarifications during the last sixty years, offered by a considerable number of authors (to mention a few: Bass, 2008; Buckingham, 2005; Day & Antonakis, 2012; Kotter, 1990; Leithwood, 2004; Yukl, 1998). A combination of these contributions can provide a more or less complete definition of leadership. Each of these researchers has pointed at one or two particular aspects of the leadership phenomenon. However, all authors agree that leading has to do with influencing significant others to follow the leader's vision and, thus, achieve a common goal.
Modern leadership research moves further than this traditional definition. It includes all stakeholders of the leadership phenomenon, not only the leader. The leader's followers, the context in which the leadership phenomenon occurs, and the relations among participants are some of the dimensions examined in modern leadership literature (Day & Antonakis, 2012). These dimensions become important when constructing a tool detecting and/or measuring leaders' qualities, traits or behaviors. No matter how many definitions of leadership have been offered, new research always brings forward emerging features (Yukl, 1998, p. 2).
Since the beginning of the 21st century, a simplification of the rather vague theoretical or philosophical approach to ethical leadership has been attempted. Other researchers have shown a preference for a descriptive definition, through attitudes, abilities and skills (e.g., Resick et al., 2006), using descriptors such as "character and integrity, ethical awareness, community/people-orientation, motivating, encouraging and empowering, and managing ethical accountability" (p. 346).
Nevertheless, the difficulty of clearly defining ethical leadership is so great that even the same researcher may revise the key components of their definition after a few years. For example, Starratt described the ethic of critique, the ethic of justice, and the ethic of caring as the key components of ethical leadership in 1991; later, in 2004, he moved to new descriptors: responsibility, authenticity, and presence (Starratt, 1991, 2004). For some authors, ethics and leadership are inextricably linked: "Ethics is central to leadership" (Northouse, 2018, p. 495), "ethics can be regarded as the heart of leadership" (Ciulla, 2004; Langlois, 2011, p. 34). However, there are researchers (e.g., Crews, 2011) who have contributed substantially to the definition of ethical leadership, balancing its theoretical and practical characteristics.

Method

Tools
Various terms have been used in literature to define constructions to investigate ethical leadership. Among the most commonly used are "mechanism", "measure", "scale", "questionnaire" or "tool". The term tool is used throughout this paper inclusively in the sense that it represents any of the above terms. Tools may include a variety of investigating means:
• Questionnaires: Direct investigation. Questions or Statements about the ethical level of an active or a candidate leader, usually in Likert scale form.
• Vignettes: Indirect investigation. Short hypothetical scenarios, including ethical dilemmas, attempt to capture a real ethical reaction through appropriate questions/statements.
• A combination of statements and vignettes.

Who Judges the Leader?
Based on the type of judges, the existing tools for detecting and/or measuring ethical leadership can be divided into three categories:
a) Those where detection is based solely on leaders' own judgment of themselves (self-rating or self-referential). Their rationale is that the person who best knows the reality of the organization is the executive. However, an executive may lack knowledge of certain parameters affecting the operation of the organization, a fact that can lead to a misconception about their leadership skills and effectiveness and to possible dissatisfaction on the part of employees.
According to Kim and Yukl, "self-rating tends to be inflated" (1998, p. 368). Studies have shown that the poorest performers rate themselves higher than the highly competent ones for various reasons, such as the presentation of a competent self, social desirability effects, personality characteristics of the rater (Atkins & Wood, 2002), or the individuals' wish to "discount or rationalize negative feedback" (Waldman et al., 1998, p. 6). In other words, the leader's low self-awareness may lead to an inadvertent overestimation of their leadership skills. It may also lead to the inability to assess the effects of their behavior on their subordinates (Tang et al., 2013). Some researchers (e.g., Brutus et al., 1998) argue that this happens mainly in the private sector (less often in the public sector, e.g., education, army), where high-performing managers are rated higher by their coworkers than they rate themselves.
b) Those where the detection and/or measurement is based solely on the judgment of followers (subordinates) or junior executives (hetero-referential or upward rating). Some researchers think that this is a reliable method because an employee is a potential/candidate leader and their coworkers "are those with whom the employee interacts regularly at work, [therefore] their assessments are reliable, valid, and credible" (Edwards & Ewen, 1996, p. 7).
Subordinates' assessment promotes self-awareness and provides feedback for the leader's development and change (Sala, 2003). However, it is important to consider other parameters, such as social conditions, that drastically affect this reliability. Subordinates' comments and criticism of the executive usually refer to social balance, kindness, and camaraderie issues and may be more positive or negative than reality. The possible discrepancy between the ethics of the followers and those of the candidate or incumbent executives should also be considered. The ethics of the executive may be in line with those of the organization for the benefit of the latter. Consequently, the two parties [executives-subordinates/employees or leaders-followers] hold different ethical perspectives under which the quality of the ethical leadership of the executive is judged. The possible lack of evaluative skills among subordinates (due to the lack of relevant training) also needs to be seriously considered when interpreting the results of their judgment.
c) Those where the detection is based on the judgment of both groups, as well as on the views of senior executives (downward rating) and their colleagues (peer-rating). This multifaceted method, the "360° feedback method", is used as the most objective way to evaluate the quality of leadership of an executive (e.g., Edwards & Ewen, 1996; Waldman et al., 1998). Research findings have revealed that it is "generally useful to the group but following through on development was the most critical factor in improving one's skills" (Hazucha et al., 1993). The implementation of this method, however, does not primarily aim at selecting a candidate leader but at providing feedback on an active leader's abilities and/or qualities for the sake of self-improvement, professional development, and increased self-awareness (Tang et al., 2013). This method is not a one-time evaluation statement but a "process of systematic data collection" (Kopsidas, 2021).
Besides, the reliability of the method is questionable since, for example, "the influence of interpersonal effect was stronger in upward and peer ratings than it was in downward feedback" (Antonioni & Park, 2001). According to Carless et al. (1998), "there is a generally low level of agreement between raters providing 360-degree feedback". Although this method may seem to be the best, it has significant weaknesses which need to be considered before its application.

Defining Detecting and/or Measuring
Throughout the title and the text, the terms "detecting and/or measuring" collocate. This is because we conceptualize the use of tools as follows:
a) Tools aiming at detecting the ethical sensitivity of the leader: these tools usually employ qualitative approaches and include mostly scenarios of ethical dilemmas calling for ethical decision-making. Ethical decisions are to be based on ethical motivation, that is, the urge the leader feels to be ethical and act morally. They detect the leader's ethical literacy, as the ability to be ethical is often blurred due to "community shared prejudices", "moral blind spots", and "habituating wrongdoing" (Tuana, 2014, pp. 160-161). In our conceptualization, these tools can function independently [per se] or be part of an extended tool.
b) Tools measuring either attitudes towards ethical values, mirroring the values-based philosophy of the leader (Notman, 2014), or leadership skills to respond to ethical dilemmas and make contextually appropriate ethical decisions, or both. Since the objective of these tools is collecting and analyzing numerical data, they usually include quantitative-type questionnaires. They can be used as an independent method of exploration or can be combined with tools detecting ethical sensitivity.
Moreover, we put forward another distinction regarding the application of the tools. We maintain that there should be a different approach, hence a different tool, when examining the ethical sensitivity and qualities of aspiring leaders compared to experienced leaders. As far as the former are concerned, there is no existing leadership evidence. We can only speculate based on the hypothetical evidence provided by the detecting tools or, perhaps, by a limited number of self-referential questions. The latter have already shown ethical behaviors and decisions in praxis, so both self-referential and hetero-referential tools can be used.

Reviewing Method
This study is based on a Systematic Literature Review (SLR) that took place between October 2019 and October 2021. As a method of reviewing, the SLR has been widely used lately. It provides a more concrete and transparent synthesis and quality appraisal of published research and avoids bias in selecting and including studies (Kraus et al., 2020; Lame, 2019). In this paper, we use an explicit methodological pattern resulting from the research questions mentioned above: defining criteria for inclusion and/or exclusion, locating studies in databases, selecting studies according to set criteria, assessing the quality of selected studies, presenting and analyzing data, and interpreting results to benefit further study.

Defining Criteria for Inclusion
V.a. Only completed and adequately documented tools oriented towards detecting or measuring ethical leadership were included. We clarify below how we conceptualize the term "completed and adequately documented tools": The term "completed" refers to published tools tested for at least their validity or reliability. Uncompleted tools (e.g., Edmonson et al., 2003; Krisharyuli et al., 2020; Northouse, 2018, p. 517) were excluded from the review.
With the term "not adequately documented" we mean tools with a limited number of references, a short mention of previous research, unreliable data collection (e.g., no anonymity), or inconsistencies between the text and index references (e.g., Kaptein et al., 2005).
"Partially oriented to ethical leadership" means that these studies deal with specific perspectives of ethical leadership (to mention a few: servant or authentic leadership, e.g., Barbuto & Wheeler, 2006; Walumbwa et al., 2008; transformational leadership, e.g., Bass & Steidlmeier, 1999; spiritual leadership, e.g., Fry, 2003). Such studies, which detect other types of leadership in which ethics plays an important role and simply "share some characteristics with ethical leadership but are conceptually different" (Den Hartog, 2015), were excluded from the review, as they "tend to rest on narrow and somewhat simplistic characterizations of ethical concepts" (Ciulla & Forsyth, 2011, p. 239).
V.b. An important inclusion criterion is the publication date of each tool. Included tools were produced in the 21st century. The increase of tools in our century dictates the need to update relevant literature. The uncovering of economic scandals worldwide triggered attempts to find ways to avoid them in the future by selecting ethically appropriate leaders (Edmonson et al., 2003; Frisch & Huppenbauer, 2013; Hackett & Wang, 2012; Ryan & Bisson, 2011). These modern tools consider basic philosophic principles or sociological trends and recognize ethics as the universal and timeless bedrock on which strong and meaningful human relations are built. The researchers who constructed them aspire to make them applicable to various social endeavors.

Locating Studies in Databases
Research sources include well-known online databases (DOAJ, ERIC, Google Scholar, JSTOR, ProQuest, Science Direct, Scopus, Springer Link, Web of Science, Wiley Online Library), purchased (printed) books, and national and university dissertation databases. All articles and books are in English. Only one source, a doctoral dissertation, is in Greek.

Ethics
We selected and analyzed facts and evidence in the relevant papers with due respect to the authors and their work. We attempted to be objective and unbiased and present arguments as they derive from the selected points of discussion.

Data Collection Strategy
We applied the "Preferred Reporting Items for Systematic Reviews and Meta-Analyses" (PRISMA), the overall aim of which is "to help ensure the clarity and transparency of reporting of systematic reviews" (Liberati et al., 2009). Its four-phase construction details the quantitative path through which the search took place, based on the review questions and inclusion criteria mentioned above.
The PRISMA selection procedure (Figure 1) produced 27 eligible papers containing an equal number of tools. More specifically, the initial database search produced 1847 items while other sources produced 22 items, totaling 1869. Duplicates were removed, leaving 1276 items to screen using the selected inclusion criteria. Twenty-seven papers were eligible for further assessment, of which 15 were eventually excluded (four were incomplete, three were not adequately documented, and eight represented very specific leadership perspectives). Table 1 shows all 27 tools and indicates which ones were excluded and why.
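The counts reported for the selection procedure can be cross-checked with simple arithmetic. The following is a minimal sketch in Python; the function name `prisma_flow` and the dictionary keys are hypothetical labels introduced here for illustration, and the figures for removed duplicates and screened-out items are derived by subtraction rather than reported in the text.

```python
def prisma_flow(database_hits, other_sources, after_duplicates, eligible, exclusions):
    """Recompute the derived counts of a PRISMA-style selection flow.

    All inputs are the counts reported in the text; every returned value
    is obtained by addition or subtraction only (hypothetical helper).
    """
    identified = database_hits + other_sources
    excluded_total = sum(exclusions.values())
    return {
        "identified": identified,
        "duplicates_removed": identified - after_duplicates,  # derived, not reported
        "screened_out": after_duplicates - eligible,          # derived, not reported
        "excluded_after_assessment": excluded_total,
        "analyzed": eligible - excluded_total,
    }


flow = prisma_flow(
    database_hits=1847,
    other_sources=22,
    after_duplicates=1276,
    eligible=27,
    exclusions={
        "incomplete": 4,
        "not adequately documented": 3,
        "specific leadership perspective": 8,
    },
)

for step, count in flow.items():
    print(f"{step}: {count}")
```

Under these figures, the three exclusion reasons sum to the 15 excluded papers, which leaves 12 tools for the critical analysis.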

Findings
A critical analysis of the remaining tools follows. Tables 2 and 3 contain features and details of the tools analyzed.

Brown, Trevino and Harrison: "Ethical Leadership Scale"
Some points call for further discussion. For example, the researchers state, "we would not expect high agreement between leader self-reports and employees' ratings" (p. 130), and "leaders are almost certain to rate themselves favorably on the ethical dimension of leadership" (p. 131), thus accepting the unreliability of leaders. Therefore, the following issues need further consideration:
a) the ethical basis of the tool is questionable: For example, is it ethical for an ethical leadership tool to presuppose that the assessee is unreliable? How ethical is it to consider that employee judgments assessing the ethics of their supervisor are correct when answering questions like "Sets an example of how to do things the right way in terms of ethics" or "Conducts his/her personal life in an ethical manner"? How do employees know what their supervisor's behavior is in their personal life?
b) the limited range of the tool: The tool is limited to the evaluation of immediate superiors. It takes for granted that junior employees have no clear picture of what happens two levels above their immediate superiors. Thus, it presupposes that the distance between senior executives and employees will lead to an unreliable (positive or negative) judgment.
c) the perspective of its construction: The authors base the entire construction of the tool on Bandura's social learning theory. The researchers argue that employees will [or may] develop an ethical behavior by observing and imitating significant others through attention and retention processes. But what happens when the contextual ethical climate within a team does not match the leader's (moral person) ethics? Bai et al. (2017) attempt to bridge this theoretical and practical gap by extending social learning theory from an individual (moral person) to a team-level (moral manager) social learning perspective. They argue that when the leader's ethical style matches the ethical climate among employees, the latter "might be more likely to accept the ethical leader as their role model". In other words, they do not question the power of Brown et al.'s (2005) tool, but they doubt whether its implementation at the group level is possible. They make a distinction between exhibiting ethical behavior and acting as a role model because the first refers to the ethical climate while the second concerns the leader. Thus, they maintain that the leader's behavior is crucial in creating an ethical climate among employees and that the leader serves as a model of a moral person.

Spangenberg and Theron (2005) implemented the Delphi method (apart from its initial form by the two researchers) in two rounds of evaluation. In the first one, the evaluators were 13 industrial psychologists, "known as experts in the field of leadership assessment and/or development"; in the second round, the judges were 16 executive managers "from some of South Africa's largest and most respected companies", thus combining theoretical knowledge of leadership with the consent and support of the leadership community (p. 3). Spangenberg and Theron's use of the Delphi Method is a "novelty", as it does not follow the relevant theory. Accordingly, the construction of their tool suffers.
To be more specific: Theorists of the Delphi Method (Adler & Ziglio, 1996; Mitroff & Turoff, 2002; Rowe & Wright, 1999) support the existence of a single group of experts taking part in the successive rounds of evaluation and having the chance to revise their views after feedback. However, Spangenberg and Theron's tool includes two different groups of experts, each of which takes part in only one round of evaluation; neither has any chance to revise its views based on feedback from the previous round. The tool's construction method is therefore not the Delphi Method. With the assistance of experts and executive managers, the tool seems to consider the theoretical and practical perspective of ethical leadership, but its construction is weak and rather ineffective.

Sarros, Cooper, and Hartican: "Virtuous Leadership Scale"
There are concerns about two issues: a) the data collection method: 238 unknown, randomly selected readers of the Australian Institute of Management (AIM) website completed and submitted the researchers' questionnaire. Statistically speaking, the randomness is not supported by a relevant methodological technique, while the representativeness of the sample is not documented; b) the theoretical documentation: The main source is the article by Barlow, Jordan, and Hendrix (2003), which uses a sample of military professionals. Both Sarros et al. (2006) and Barlow et al. (2003) propose the generalization of results to a wider population, which raises questions about the tool's validity. Even the researchers themselves question the power of their tool when saying they "found that a self-report questionnaire could, at best, only measure respondents' knowledge of and desire for good or moral knowing and moral feeling" (p. 692).

Loviscky, Trevino, and Jacobs: Managerial Moral Judgment Test
Inspired by Rest et al.'s (1999) DIT tool, the authors of this tool used six hypothetical but realistic scenarios to produce a new measure of ethical judgment. The three researchers claim that the MMJT applies specifically to the managerial sector, compared to the DIT, which refers to society in general. The reliability and validity of the new tool were checked using four other measures (tests): the "Wonderlic Personnel Test" (WPT; Wonderlic and Associates, 1998), the "Verbal Critical Thinking Test" (VCT; Saville & Holdsworth Limited, 1989), the "Hogan Personality Inventory" (HPI; Hogan & Hogan, 1992) and, finally, the DIT itself, in order to find similarities and differences between the DIT and the MMJT. Although the researchers claim that "MMJT is measuring managerial moral judgment in a reliable manner" and "predict[s] ethics-related performance" (p. 275), there are major weaknesses: a) the scenarios were constructed with the assistance of seven Ph.D. students and "11 experienced ethics officers and human resources managers from various industries" (p. 267). This means that the theoretical background of the construction was based exclusively on Ph.D. students' work; on the other hand, the sample of managers constitutes the pilot rather than the main research. When checking the validity and reliability of the tool with regard to the sample, one notes that the average age of the respondents is 23.58 years, while the average duration of their work experience is 2.98 years; these are insufficient parameters even for the authors themselves. The researchers state that "in the current research, we tested hypotheses in a student sample. Future research will be needed in managerial samples" (p. 274); b) although the researchers argue that they predict ethical leadership behaviors in general, they focus on only one component of ethics, i.e., justice, because "corporate ethics officers routinely report that about two-thirds of the issues raised in ethics reporting systems are fairness issues" (p. 267).
In other words, they tend to equate ethics with justice, thus, ignoring other significant ethical features.

Kalshoven, Den Hartog and De Hoogh: Ethical Leadership at Work Questionnaire
The authors attempt to detect seven characteristics they call ethical: fairness, integrity, ethical guidance, people orientation, power sharing, role clarification, and concern for sustainability. The so-called "people orientation or having a true concern for people" corresponds to what other writers define as "care", as it "reflects genuinely caring about, respecting, and supporting subordinates and where possible ensuring that their needs are met" (Kalshoven et al., 2011, p. 53). Even though the first four characteristics constitute elements of ethical behavior, the remaining three pertain to leadership rather than ethical behavior. Here we select two elements for discussion: a) power sharing: for decades, it has been considered either "a leadership functional approach" (Lord, 1977, p. 115), or "a concept of management practice" (Conger & Kanungo, 1988, p. 471), or a component of "shared leadership" (Pearce & Sims, 2000). Additionally, Burke, Fiore, and Salas (2003) argue that shared leadership is a way of "exploiting" the possibilities of subordinates in terms of knowledge, attitudes, and perspective. In other words, Burke et al.'s meaning of sharing and that of Kalshoven et al. (2011) seem rather identical; hence, it is arguable whether it is an ethical feature; b) sustainability: Yukl et al. (2013) maintain that sustainability involves many social issues which make the definition and measurement of ethical leadership too complicated. Other authors also agree with this view, even though sustainability "is, by definition, a moral concept and a moral practice" (Hargreaves & Fink, 2006, p. 26). Sustainable leadership refers to a wide range of life issues, like earth, society, and the health of local and global economies (Kuhlman & Farrington, 2010).

Riggio, Zhu, Reina and Maroosis: "Leadership Virtues Questionnaire"
This tool is based solely on the philosophical rather than the behavioral foundation of ethics. An ethical leader is defined as one who is endowed with wisdom, mental strength, temperance, and justice. Riggio et al. (2010) aspire to detect the existence of these virtues through the LVQ. They focus on tracing elements of character, "the virtues he or she possesses, and the self-knowledge and self-discipline that guide the leader's moral actions" (p. 237). They believe that personal ethics acts as a catalyst in shaping professional ethics. In contrast to most relevant tools, which investigate the existence of ethical leadership through the behavior and actions of the leader, the LVQ "focuses on virtue-ethics [which are] the characteristics that create a moral person instead of focusing on the actions that make a person moral" (p. 246). Though the theoretical/philosophical perspective of ethical leadership seems to be realistic, there are basic weaknesses: a) the samples studied for its construction and testing: While the researchers' intention was to construct an upward rating tool, all samples consisted [only] of managers who evaluated their "direct leader". Employees' evaluations were not included. We cannot assume that executives and "ordinary" employees evaluate leadership from the same perspective; b) the content of the control questions is ambiguous and ambivalent: Let us take the question detecting prudence, "Does as he/she ought to do in a given situation?" How do employees know what their supervisor ought to do? Do they have access to all required information? Does their ethical and/or professional perspective towards the organization coincide with that of their leader? Or, consider two (out of five) assessing items of fortitude: subordinates are expected to be experts in ethics and/or cognitive psychology. Is subordinates' arbitrary opinion reliable when detecting the existence of those virtues? Is the detection procedure an ethical process in itself?
In addition, the question assessing the leader/supervisor's justice "Does not treat others as he/she would like to be treated" seems subjective and rather irrelevant. Can we expect others to know how we would like to be treated?

Tanner, Brügger, Schie and Lebherz: "Ethical Leadership Behavior Scale"
This model seeks to evaluate the degree of difficulty in demonstrating each of the 35 identified ethical components and to determine the number of ethical behaviors displayed by the evaluated leader. The originality of its construction lies exclusively in the application of Rasch's probabilistic model (Bond & Fox, 2015), a model designed to measure the performance of a student or an athlete using "statistics". The authors maintain that they adapted it accordingly to predict the ethical behavior of a leader; so, the mathematical relationship of the ethical components with the Rasch types determines the level of ethical leadership of the leader. Some points regarding validity are obscure in this assumption: The model is monoparametric because it aims at determining the degree of ethical leadership in general. Therefore, a) it does not consider how often each of the "central values" appears, b) its power is questionable, as it is rather risky to create the basic factors (difficulties) of each behavior based only on a specific sample. Such an approach does not ensure generalization, especially when this sample consists exclusively of employees "of two Swiss federal police departments, only 14% of whom were women". If the evaluation referred to the supervisors in the sample, the application of this model would make some sense. However, the predictive power of the tool for any leader ("[ELBS] has great potential to be highly useful for practical purposes" (Tanner et al., 2010, p. 232)) is somewhat exaggerated. The difficulty of demonstrating ethical behavior may differ significantly among employees of various contexts (e.g., a police station or a private for-profit enterprise), c) the researchers' documentation of the application of Rasch's model to behavior determination is limited; it appears in only one study (Kaiser, 1998), which introduced this application. This needs particular attention, as the 35 ELBS behaviors aptly test each of the four components of a leader's ethical behavior.
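As background to the critique of the one-parameter model, the standard dichotomous Rasch formulation can be sketched in a few lines of Python. This is the textbook form, in which the probability of displaying a behavior depends only on the difference between a person parameter and an item difficulty; it is an assumption that this form is close to what the ELBS authors used, and the function name, item labels, and difficulty values below are hypothetical illustrations, not values from the study.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Dichotomous Rasch model: P = exp(theta - b) / (1 + exp(theta - b)).

    theta      -- person parameter (here: the leader's level of ethical leadership)
    difficulty -- item parameter b (here: how hard a behavior is to demonstrate)
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Hypothetical item difficulties on the logit scale (illustration only).
item_difficulties = {
    "easy behavior": -1.0,
    "moderate behavior": 0.0,
    "hard behavior": 1.5,
}

leader_level = 0.0  # hypothetical person parameter

for label, b in item_difficulties.items():
    p = rasch_probability(leader_level, b)
    print(f"{label}: P(displayed) = {p:.3f}")
```

When the person parameter equals the item difficulty, the probability is exactly 0.5; since each item carries a single difficulty parameter, estimating those difficulties from one specific sample directly shapes every subsequent score, which is the generalization concern raised in point b).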
The researchers admit that "only the last response category (''strongly agree'') was able to reveal the distinct behavior difficulties suggests that the behavioral items involved in the instrument tend to be too easy" (Tanner et al., 2010, p. 232), d) the importance attributed to the definition of "central values" and "types of ethical behavior". It should be noted here that the "critical" ability of samples of ordinary students with minimum work experience is questionable, e) regarding the recent standardization of ELBS in Brazil (Filbo, Ferreira, & Valentini, 2019), the results do not seem so encouraging. The Brazilian researchers (Filbo et al., 2019) found that the power of the tool's unidimensionality is limited, as a large number of questions "tend to generate a high amount of residuals" and "consequently, the model tends to have adjustment indicators below the preestablished cut-off points" (Filbo et al., 2019, p. 355). That is why Filbo et al. (2019) suggest a reduction of the number of these questions as an overall reform of the tool: "the number of items can be reduced without significantly reducing the reliability of the scores" (p. 356), f) regarding the power of Rasch's model: Despite its growing appearance in publications (Bond & Fox, 2015), there are reservations about its implementation: "[Rasch's model is] generally unsuitable for use in educational assessment" (Goldstein & Blinkhorn, 1982, p. 167). Moreover, "the measurement tools implied by such models are questionable in terms of any underlying psychological or educational theories", because a "raw score" cannot significantly determine "other features of the total response pattern" (Goldstein, 1980, p. 244).

Zheng, Zhu, Yu, Zhang, and Zhan: "Ethical Leadership Measure"
Zheng et al. (2011) attempt to relate the nature and the consequences of ethical leadership to contextual factors. They introduce a new research element, "Effects on Intention to Leave", associated with ethical leadership. The rationale behind this element is that "ethical leadership can reduce the follower's intention to resign" (Zheng et al., 2011, p. 195); this element is related to the Confucian philosophy (deeply rooted in the Chinese people).
We note some weaknesses: a) the tool tests aspects of behavior. It does not detect ethical leadership traits. What aspect of ethics does the question "Never mixes personal matters with work" fit into? How does this question check the leadership skills of the supervisor? b) conceptual definitions are rather weak. Regarding the perspective of ethics, they claim that leadership itself, along with the purpose of leadership are ethical, c) the research design suffers: the Factor Analysis (Exploratory and Confirmatory) of ELM was based on young inexperienced employees and their developing judgment, d) there is a resemblance between the rating items of ELM and those of ELS (Brown et al., 2005). Accordingly, weaknesses regarding ELS's items (e.g., followers' "wisdom") are repeated in ELM.

Yukl, Mahsud, Hassan and Prussia: "Improved Measure for Ethical Leadership"
There are some weaknesses in their tool regarding the following: a) choosing items from three previous tools (ELS, PLIS, & ELW) and the argument of "research continuity": There is not enough documentation for their choices; they imply that it was done because these tools measure ethical leadership. The question here is on what criteria Yukl et al. (2013) excluded other similar tools, b) structural contradictions: The authors maintain that their tool "has several advantages" over relevant tools, "being short and easy to use". Further, Yukl et al. (2013) state, "we should avoid the temptation to oversimplify the meaning of ethical leadership by equating it to the composite score on a short questionnaire" (p. 46). The second statement contradicts the practicality of short questionnaires claimed in the first, c) issues associated with the sample characteristics: on the one hand, the respondents were graduate students with minimum work experience, hence the reliability of their judgment seems dubious; on the other hand, their employers sponsored their studies, a fact that affects their objectivity, d) some of the IMEL items (e.g., "Shows a strong concern for ethical and moral values") are difficult for the employee to answer accurately on a 6-point Likert scale. Even though employees are familiar with basic ethical principles, how can they be aware of all organizational parameters (e.g., finance or shareholders' decisions/intentions) upon which a leader is called to make their decision?

Wang and Hackett: "Virtuous Leadership Questionnaire"
This is an update of the researchers' earlier tool (LVQ, 2010). The tool focuses on virtue ethics and the character of the business leader and differs from "well-known leadership perspectives". Wang and Hackett (2016) avoid "conflicts" among virtue ethics, deontology, and teleology. The comparative analysis of Aristotelian ethics with Confucian ethics provides a strong philosophical background to this study. However, some elements raise questions: a) For Aristotle, teleology is not distinguishable from virtue ethics: "Every craft and every line of inquiry, and likewise every action and decision seems to seek some good" (Nicomachean Ethics, 1094a §1, Irwin, 1999). Aristotle accepts the inextricable connection between virtue and teleology. Consequently, one cannot separate them, especially in a tool consisting mainly of cardinal virtues (see also Bertland, 2009; Vogt, 2017), b) in contrast with many researchers, Wang and Hackett (2016) do not accept the teaching of business ethics or the necessity of ethics programs. Instead, they believe in personal virtue and character construction over the years, which affects any professional behavior, c) if, regardless of circumstances, personal ethical codes coincide with professional ones, further reflection is needed on the following: Is personal ethics always aligned with professional ethics, and vice versa? Does developing ethical leadership presuppose the existence of a character possessing the cardinal virtues? Conversely, can adopting norms (deontology) and considering consequences (teleology) provide a "sufficient" ethical leadership level, irrespective of long-term character shaping?

Mitropoulou et al.: "Redefining Ethical Leadership: Development and psychometric evaluation of the Questionnaire of Ethical Leadership"
Some points of QueL raise a number of questions: a) the assertions on which the tool itself and its items were constructed. Since QueL is a self-rating tool, many questions entail social desirability and indicate subjectivism (e.g., "I show integrity and consistency in ethical behaviors"), b) the authors argue that QueL is the first self-rating psychometric tool assessing ethical leadership at work, since all previous tools are hetero-referential. Mitropoulou et al. (2020) argue that its construction followed a thorough statistical research design, which proved the factorial invariance of QueL in leaders and subordinates; hence, both groups' views on ethical issues are identical. We believe that it cannot be proven that leaders and followers understand ethical meanings in the same way. Even leaving aside studies of direct factorial invariance, Jöreskog, while asserting that "the battery of tests need not to be the same for each group", concluded by supporting "the same tests" (Jöreskog, 2007, p. 61), c) finally, we question the accuracy and reliability of employees' responses to certain items. For example, consider the question: "My supervisor is an ethical example at her/his personal life and work, systematically?" How do we know that the employees are aware of their boss's personal actions and behavior? Or do they speculate?

Shakeel, Kruyen, and Van Thiel: "Broader Ethical Leadership Scale"
The tool is an interesting synthesis and a literature supplement to the ethical leadership tools, but it contains a number of inadequacies: a) the construction of the tool and the factor loading. Non-overlapping items with factor loadings above .4 were chosen from nine published ethical leadership tools. However, none of the previous tools provided initial underlying factors or loadings of adopted items without at least one empirical study. Combining variables drawn from different factors guarantees neither that the same factors remain nor that the variables keep their loadings on each factor, b) validity: adopting 25 (out of 48) items by just converting them from follower-rating to self-rating forms needs further documentation of validity. The exploratory role of the 17 newly added items also needs further proof, c) minor drawbacks. These relate to assertions about non-existing items of authentic leadership, spiritual leadership, and positive leadership, and to keeping the original meaning of the adopted items. Even though "BELS" seems to be a fully-featured ethical leadership tool, "a validation study can determine which items work best" (Shakeel et al., 2020, p. 17) and which of them have to be discarded.

Discussion and Implications
Ethical leadership is nowadays a matter of great importance and urgent need in both the private and the public sector. This is why researchers struggle to devise an appropriate and effective tool to detect and measure the ethical level of an in-service or a candidate leader, who "has to be above the crowd and yet one of the crowd" (Ciulla, 2005, p. 9). The critical analysis of the above studies revealed the following: a. Researchers have not agreed on whether tool construction should be based on literature, on empirical research, or on both. b. There is no consensus about the length of a tool. The number of items ranges from fewer than ten to more than forty and depends on the theoretical or philosophical perspective of ethical leadership employed in each study. c. There is no unanimity regarding the judging or rating body. Some researchers support the use of hetero-referential tools as the most reliable way of detecting or measuring ethical leaders; others favor self-referential tools; a small number reflect on using both types of tools to achieve reliability of results. Again, the importance given to one type of tool or the other depends on the researchers' background and the perspective of ethical leadership they decide to employ. d. Differences can be noted regarding the form of questions used. Some prefer scenarios and/or vignettes; others favor forced-type questions on a Likert scale. e. Samples used in most studies suffer from reliability and validity problems, as it is difficult to reach samples of active or aspiring leaders or incumbent employees. Most studies used convenience samples, chosen among inexperienced employees or business school [post]graduate students with limited knowledge of the organizational reality. f. Though researchers wish to create a universally applicable tool, fundamental distinctions between Western and Eastern (or rest-of-the-world) ethical cultures are not considered. g. Statistical analyses (i.e., reliability, validity, correlations, factor analysis, and regression analysis) require a strong mathematical background. Otherwise, their results tend to lack clarity and trustworthiness.
In the name of prosperity, fundamental ethical traits and social skills tend to diminish in modern societies, and an individualistic social perspective is predominant. To recover from such a de-humanizing society, ethical leadership becomes an utmost necessity. As a result, constructing new tools for detecting and/or measuring ethical leadership, or improving existing ones, remains an ethical obligation for researchers.
However, we feel that a distinction should be drawn: detecting elements of ethical leadership in aspiring leaders is different from measuring ethical leadership in existing leaders. This means that researchers working on new tools should be explicit about what they measure or detect. Another dimension to consider is the context within which the tool is constructed and tested. We favor context-appropriate tools. We maintain that tools cannot be universal, one-measure-for-all instruments, as the cultural, political, and economic circumstances in different parts of the world create dissimilar demands and requirements for leaders.
Moreover, we agree with Yukl et al. (2013) that "the ratings of leader ethical behaviors may be biased by a subordinate's general evaluation of the leader, but the alternative of using leader self-ratings of ethical behaviors entails an even greater likelihood of biased responses". Hence, researchers should approach ethical leadership both ways, through self-evaluation and hetero-evaluation simultaneously, if they are to achieve the highest degree of reliability and avoid possible bias. Psychometric tools cannot be the only option; tools should include qualitative approaches, enabling researchers to probe more deeply and reveal more aspects of the human personality. Finally, we propose that new tools should consider relevant domains of ethical leadership and share common questions for both leaders and followers/subordinates.