Abstract
Background: Script concordance testing (SCT) is an objective method to evaluate clinical reasoning that assesses the ability to interpret medical information under conditions of uncertainty. Many studies have supported its validity as a tool to assess higher levels of learning, but little is known about its acceptability to major stakeholders. The aim of this study was to determine the acceptability of SCT to residents in otolaryngology – head and neck surgery (OTL-HNS) and a reference group of experts.
Methods: In 2013 and 2016, a set of SCT questions, as well as a post-test exit survey, was included in the National In-Training Examination (NITE) for OTL-HNS. This examination is administered to all OTL-HNS residents across Canada in the second to fifth year of residency. The same SCT questions and survey were then sent to a group of OTL-HNS surgeons from 4 Canadian universities.
Results: For 64.4% of faculty and residents, the study was their first exposure to SCT. Overall, residents found it difficult to adapt to this form of testing, thought that the clinical scenarios were not clear and believed that SCT was not useful for assessing clinical reasoning. In contrast, the vast majority of experts felt that the test questions reflected real-life clinical situations and would recommend SCT as an evaluation method in OTL-HNS.
Conclusion: Views about the acceptability of SCT as an assessment tool for clinical reasoning differed between OTL-HNS residents and experts. Education about SCT and increased exposure to this testing method are necessary to improve residents’ perceptions of SCT.
To deliver safe and effective patient care, clinicians must develop and apply sound clinical reasoning skills. Although definitions of clinical reasoning differ, they generally include the idea that clinical reasoning entails cognitive operations allowing clinicians to observe, collect and analyze information, resulting in decisions and actions that take into account a patient’s specific circumstances and preferences.1,2 Explicit teaching and formal assessment of these essential clinical reasoning skills are crucial during residency, as they represent the formative training years of a burgeoning physician.
In Canada, clinical reasoning skills are currently assessed mainly by oral examinations. However, oral examinations are time consuming, with only 1 resident being evaluated at a time, and they require substantial human and financial resources. Also, human factors (anxiety, stress, attitude, interpersonal conflict) can potentially limit the objectivity of the evaluation.3
Clinical reasoning is often underrepresented in written examinations; for example, in the 2012 National In-Training Examination (NITE), an annual formative examination written by all Canadian residents in otolaryngology – head and neck surgery (OTL-HNS), only 5% of questions were designed to evaluate clinical reasoning. The remainder of the questions were designed to assess either factual knowledge (50%) or the ability to apply knowledge (45%).
Given that no single assessment method adequately addresses all elements of clinical reasoning, a combination of methods is often required. One such method is script concordance testing (SCT).
Script concordance testing was developed in 1998 by Charlin and colleagues.4 It has the unique feature of assessing the interpretation of clinical data under conditions of uncertainty. It has its roots in the concept of the illness script, a specialized knowledge structure, different for each clinician, in which medical knowledge is organized. The script creates links between information such as illnesses, clinical features and management options.5 Typically, when assessing a specific clinical situation, a physician will subconsciously select 1 of his or her scripts on the basis of a key element of the encounter. The physician will then compare the clinical situation with different elements of the script to determine the best hypothesis, actively using these knowledge networks (scripts) to judge the effect that each new piece of information has on the status of the hypothesis. Script concordance tests are designed with the assumption that the participant already has some factual knowledge of the subject being tested. As such, the more experienced the clinician, the more refined the scripts.2
The format of SCT differs from that of other evaluation tools (Figure 1). Each question starts with a clinical scenario, usually inspired by a real patient encounter. Next, a hypothesis is added to activate a script in the participant’s mind.5 Finally, new information is provided to recreate the process by which the physician searches for new information to confirm or rule out a hypothesis. The answer choices are formatted in a way that invites the participant to reflect on whether the probability of the hypothesis being true has changed in light of the new information.
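For readers who think in data structures, the 3-part item format can be sketched as follows. This is purely illustrative: the field names and vignette text are ours, not drawn from any SCT authoring standard, although a 5-point response scale anchored from "hypothesis ruled out" to "hypothesis strongly supported" is typical of published SCT items.

```python
from dataclasses import dataclass

@dataclass
class SCTItem:
    """One SCT question, mirroring the 3-part format described above."""
    vignette: str          # brief clinical scenario with built-in uncertainty
    hypothesis: str        # diagnosis, investigation or treatment to consider
    new_information: str   # finding that may strengthen or weaken the hypothesis

# The examinee answers on a 5-point scale, typically anchored from
# -2 (hypothesis practically ruled out) to +2 (hypothesis strongly supported)
RESPONSE_SCALE = (-2, -1, 0, +1, +2)

# Hypothetical OTL-HNS example, invented for illustration only
item = SCTItem(
    vignette="A 52-year-old smoker presents with 6 weeks of hoarseness.",
    hypothesis="If you were considering laryngeal cancer ...",
    new_information="... and flexible laryngoscopy shows an irregular "
                    "lesion of the right true vocal fold,",
)
```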
Script concordance testing is based on the principle that the multiple judgments made in these clinical reasoning processes can be probed and their concordance with those of a panel of reference experts can be measured. The scoring key is thus based on the answers given by a group of experts to the same questions. The traditional scoring method for SCT is an aggregate one (1 point is given for the most popular answer among the experts, other answers chosen by a minority of experts are given partial points, and answers not chosen by any expert are given 0 points).
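To make the aggregate key concrete, here is a minimal Python sketch. The panel answers are hypothetical, and the proportional-credit rule shown (each answer earns the fraction of panelists choosing it relative to the modal answer) is one common way of operationalizing the description above, not necessarily the exact rule used in this study.

```python
from collections import Counter

def aggregate_key(panel_answers):
    """Build an aggregate scoring key for 1 SCT item.

    panel_answers: responses (e.g., values on the -2..+2 scale) given
    by the reference panel to this item. The modal answer earns 1
    point; answers chosen by a minority of experts earn proportional
    partial credit; answers chosen by no expert earn 0.
    """
    counts = Counter(panel_answers)
    modal = max(counts.values())
    return {answer: n / modal for answer, n in counts.items()}

# Hypothetical panel of 10 experts answering 1 item
key = aggregate_key([+1, +1, +1, +1, +1, 0, 0, 0, +2, +2])
# key == {+1: 1.0, 0: 0.6, +2: 0.4}
print(key.get(-2, 0))  # 0 -- no expert chose -2
```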
In the last decade or so, SCT has been widely studied, with validity evidence gathered in various contexts, such as different disciplines (e.g., pediatrics,6 urology,7 emergency medicine,8 optometry,9 nursing10), different levels of expertise (medical students,11 residents, attending physicians) and different purposes (self-evaluation,12 formative or summative assessments). Specific to OTL-HNS, Kania and colleagues and Iravani and colleagues each compared 2 groups with different levels of expertise and concluded that SCT could discriminate between them.13,14 Despite this emerging knowledge, little is known about the acceptability of SCT to major stakeholders.
The acceptability of a test can be defined as the degree to which it appears practical, pertinent and related to the purpose of the test,15 on the basis of one’s personal experiences, beliefs and (mis)conceptions.16 Acceptability includes the concept of face validity, but it also encompasses other considerations that affect the person taking the test (time to complete, burden, etc.). Acceptability, along with reliability, validity, cost-effectiveness and educational impact, is an essential component of a test’s usefulness, a concept introduced by Van der Vleuten and colleagues in 1996.16 This model defines reliability and validity as the main components of a test’s usefulness, and acceptability, educational impact and costs as other important factors. The weight of acceptability relative to the other components depends on the context in which the test is used.17 The degree of acceptability, to residents or experts, will influence the efficiency and implementation of the test. Therefore, the aim of our study was to assess and compare the acceptability of SCT among OTL-HNS residents and faculty members and to identify potential factors that can affect its acceptability.
Methods
Study overview
We created SCT questions specific to the field of OTL-HNS and then distributed the questions both to OTL-HNS residents (via the NITE) and to staff at 4 hospitals (via email). After answering the SCT questions, both groups responded to a post-test survey about the acceptability of SCT. This study received ethical approval from the Université de Montréal research ethics board (Comité d’éthique de la recherche en santé: 13-120-CERES-D).
Creation of SCT questions, scoring key and post-test survey
An SCT test was created with 8 clinical cases (3 in pediatrics, 3 in oncology, 3 in otology and 1 in rhinology), with a total of 28 questions. All questions were constructed using Fournier and colleagues’ guidelines on how to write SCT tests.18 A team consisting of 1 OTL-HNS resident (A.-A.L.) and 2 OTL-HNS content experts (L.N., T.A.) wrote the SCT questions, and then 2 SCT experts (B.C., S.L.) reviewed and modified them. The SCT questions were then sent to the NITE committee.
Informed by a literature review, 3 of the authors (A.-A.L., L.N., T.A.) created a post-test survey for this study. In the survey, participants were asked about their level of training (for residents), university affiliation, area of expertise (for faculty) and prior knowledge of SCT. Participants also rated their agreement with a series of statements regarding SCT on a 5-point Likert scale, ranging from 1 (strongly disagree) to 5 (strongly agree). Finally, participants were asked whether they recommended SCT as an assessment tool for OTL-HNS residency and were asked to answer 2 open-ended questions (What did you like best about responding to SCT questions? What did you like least about responding to SCT questions?).
Participants
Residents
The SCT questions we created for this study and the post-test survey were added to the end of the annual NITE, with an explicit statement to participants that this was a separate, optional and confidential section and that their test scores for this section would not be forwarded to their program directors. All residents of a Canadian OTL-HNS program in 2013 and 2016, from postgraduate year 2 (PGY-2) to postgraduate year 5 (PGY-5), were eligible to participate. In 2016, residents were excluded if they had participated in the study in 2013.
Experts
There are different criteria that can be used to choose the members of an expert group, and none are universally accepted.19 We chose to include OTL-HNS surgeons who practised in an academic hospital, who were regularly involved in the teaching and assessment of OTL-HNS residents and who had a minimum of 1 year of fellowship training. The expert group was recruited from 4 universities (Université de Montréal, Université Laval, Université de Sherbrooke, McGill University). Emails were sent in 2013 and 2016 to the OTL-HNS program directors of those universities, who then forwarded the messages to all eligible faculty members. Respondents were sent an email with a consent form, questions from the SCT test we created that were related to their clinical area of expertise and the post-test survey.
Statistical analysis
For the analysis of post-test survey results, we calculated the median score in each group for the questions probing agreement with statements on a 5-point Likert scale and compared the 2 participating groups using the Mann–Whitney U test in IBM SPSS Statistics 24, with a p value less than 0.05 considered statistically significant. Subgroup analyses were done with the same tests. Of note, we did not analyze the SCT test scores because our objective in this study was not to assess the validity of SCT. Thematic analysis was used for the open-ended questions.
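For illustration only (the analysis in the study was run in IBM SPSS Statistics 24), an equivalent group comparison can be reproduced with SciPy’s Mann–Whitney U implementation; the Likert ratings below are hypothetical.

```python
from statistics import median
from scipy.stats import mannwhitneyu

# Hypothetical 5-point Likert ratings (1 = strongly disagree,
# 5 = strongly agree) for 1 survey statement
residents = [2, 3, 2, 4, 1, 3, 2, 2, 3, 1]
experts = [4, 5, 4, 3, 5, 4, 4]

# Median rating per group, as reported for each statement
print(median(residents), median(experts))  # 2.0 4

# Two-sided Mann-Whitney U test for 2 independent groups,
# with p < 0.05 considered statistically significant
stat, p = mannwhitneyu(residents, experts, alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")
```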
Results
Participant characteristics
Forty-one residents (response rate 21.0%) participated in this study. The 2016 results of 5 residents were excluded because they had participated in the study in 2013. A total of 36 residents were included in the final analysis, 20 of whom were junior residents (PGY-2 and PGY-3) and 16 of whom were senior residents (PGY-4 and PGY-5). We received 23 responses from experts (response rate 34.8%). The post-test survey was incomplete for 3 participants in the resident group and 2 in the expert group. The characteristics of the 2 groups of participants are described in Table 1 and Table 2. The level of knowledge regarding SCT was low in both groups.
Acceptability
Results from the post-test acceptability survey of residents and experts are presented in Table 3. For each statement, experts had, on average, a more positive opinion of the SCT test than residents. They were more likely to find the questions clear, to find it easy to adapt to this type of test and to think that SCT was a useful tool for evaluating clinical reasoning. Despite these differences, both experts and residents agreed that the clinical scenarios were realistic and likely to be encountered in everyday practice.
Our results did not show a significant difference between the opinions of junior and senior residents (Table 4), with 1 exception: senior residents found the time allotted to answer all the SCT questions insufficient, whereas junior residents found the same amount of time adequate.
The majority of the experts (73.9%) were in favour of integrating SCT into OTL-HNS residency programs as a formal assessment tool, whereas only 33.3% of residents held this view. There was no difference between the views of junior and senior residents (30.0% and 37.5%, respectively). Participants (experts and residents combined) with prior knowledge of SCT were more inclined to recommend SCT (68.4%) than those with no prior knowledge (39.5%) (p < 0.05).
Discussion
To our knowledge, this is the first study to examine the acceptability of SCT as a formal assessment tool in residency among major stakeholders. We found that experts were significantly more inclined to accept SCT than residents. We hypothesize that the discrepancy between residents’ expectations of a test and the reality of being a practising physician may have contributed to the difference of opinion. Residents probably expect certainty in testing conditions, especially in high-stakes examinations. Experts are probably more used to uncertainty. In response to our open-ended questions, experts mentioned that they liked the SCT for its global approach to a case and for its ability to assess knowledge and clinical reasoning at the same time. They were also aware that different evaluation tools must be employed to create an evaluation that truly reflects a resident’s capacity.17
The use of multiple forms of assessment ensures that different aspects of OTL-HNS residents’ knowledge and aptitude are evaluated. It may be adequate to focus on assessing factual knowledge among very junior residents, but, as training progresses, more complex skills such as decision-making, particularly in difficult clinical situations, need to be taught and assessed. Various methods have been described to evaluate clinical reasoning, such as multiple-choice questions (MCQs), patient management problems, key features problems, oral examinations and objective structured clinical examinations (OSCEs). Compared with these methods, a strength of SCT is that more items can be administered per unit of testing time. The more content domains probed in a given amount of time, the more accurate the assessment of a learner’s overall performance, because performance on 1 item does not necessarily predict performance on items probing other content domains (the concept of content specificity).3 For example, an average examinee can complete about 60–90 SCT questions per hour but can work through only 8–12 OSCE items (i.e., stations) in the same time. Multiple-choice questions also combat the content specificity problem, but it is easier to develop MCQs that probe pure factual knowledge than ones that evaluate clinical reasoning. Oral examinations are likewise a good option for testing multiple levels of knowledge, but SCT requires fewer resources in terms of cost, personnel and time.
In our study, residents and experts had a different opinion of SCT. A larger percentage of residents than experts found that the test questions were unclear, that they were not useful for evaluating clinical reasoning, that they were not representative of real clinical situations and that it would be hard to adapt to this kind of test.
Other studies have collected participants’ opinions about SCT as a secondary objective. In 2015, Cobb and colleagues used a focus group of 18 final-year undergraduate veterinary students to assess the acceptability of SCT. The students found SCT confusing but concluded that it was probably more relevant to decision-making in clinical practice and that it encouraged participants to reflect on their own experience and knowledge.20 In a study by Kelly and colleagues, students reported that they did not find SCT questions harder than regular MCQs, but they still preferred MCQs.21 Kania and colleagues reported no difference in opinion on SCT between students and experts.13 The study setting may partially explain why those results differed from ours: our study took place immediately after a formal annual NITE and was not limited to a research setting. The perceived connection to an examination possibly influenced residents’ views on the acceptability of SCT. Also, because the possibility that SCT might be used in a formal examination was more tangible, residents may have viewed this testing method as more threatening.
The majority of the residents in our study were not familiar with SCT, probably because clinical reasoning has traditionally been assessed with oral questions.3 The responses to our open-ended questions showed that some residents felt that oral questions are a better way to evaluate clinical reasoning abilities. The residents’ relative lack of knowledge about the principles of SCT, and their lack of practice with the format, could have negatively affected their opinion of it. This may explain why experts and residents with prior knowledge of SCT were more inclined to recommend its use for evaluating residents. Interestingly, when we compared the opinions of junior and senior residents, the only divergence was related to the time allotted to answer the SCT questions: senior residents wanted more time, whereas junior residents felt they had enough. Given that senior residents have more experience and knowledge, and probably more elaborate scripts, it is possible that they spent more time thinking through their answers and working through their scripts.
Limitations
The main limitation of our study was the low participation rate (21.0% for residents, 34.8% for experts). It is hard to know whether the people who chose to participate differed from those who did not. Similar numbers of junior and senior residents participated, so we believe that the participants were still representative of the population studied. Also, experts were recruited from only 4 Canadian universities, all in Quebec, and most of those who chose to participate were from Université de Montréal (78.3%). As otolaryngology program accreditation and final board examinations are administered by national bodies, we have no reason to think that the responses of our expert participants would differ from those of experts in the rest of Canada. Other limitations are that the questionnaires were administered at the end of the NITE, so fatigue could have influenced the residents’ results, and that the surveys were not validated.
Conclusion
Our study results suggest that OTL-HNS faculty members and residents have differing views on the acceptability of SCT as an assessment tool. Most of the experts and residents who participated in our study had never heard of SCT. We think that using SCT more frequently, in an informal learning setting, may help residents become familiar with it and enhance its acceptability as an assessment tool for clinical reasoning. Use of SCT as a learning tool could also be a way to familiarize residents and faculty members with the method. Different groups are currently working on making this method more accessible for teaching (e.g., in a Web-based format). We believe that SCT could be a useful and powerful addition to our testing methods for residents once it is more widely known and understood.
Footnotes
Presented at the Annual Meeting of the Quebec Association of Otolaryngology and Head and Neck Surgery, Oct. 16–18, 2015, Québec, Que.; at the 69th Annual Meeting of the Canadian Society of Otolaryngology – Head and Neck Surgery, June 6–9, 2015, Winnipeg, Man.; and at the 2nd International Conference on Clinical Reasoning, Oct. 28–31, 2014, Montréal, Que.
Competing interests: None declared.
Contributors: A.-A. Leclerc, L. Nguyen, B. Charlin, S. Lubarsky and T. Ayad designed the study. A.-A. Leclerc acquired the data, which A.-A. Leclerc and T. Ayad analyzed. A.-A. Leclerc and T. Ayad drafted the manuscript, which A.-A. Leclerc, L. Nguyen, B. Charlin, S. Lubarsky and T. Ayad critically reviewed. All authors provided final approval of the version to be published.
Data sharing: The data sets analyzed in this study are not publicly available as they were provided by the National In-Training Examination (NITE) committee, but they are available from the corresponding author on reasonable request.
Accepted July 28, 2020.
This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY-NC-ND 4.0) licence, which permits use, distribution and reproduction in any medium, provided that the original publication is properly cited, the use is noncommercial (i.e., research or educational use), and no modifications or adaptations are made. See: https://creativecommons.org/licenses/by-nc-nd/4.0/