Summary
Phase 3 randomized controlled trials are the widely accepted gold standard on which treatment decisions are based, as they assess the efficacy of a novel treatment against a control in the relevant patient population. The effectiveness of the novel treatment should be derived by measuring patient-important outcomes; however, to assess these outcomes accurately, clinical trials often require extensive patient follow-up and large sample sizes, which can incur substantial expense. For this reason, investigators substitute surrogate end points for patient-important outcomes to reduce the sample size and duration of a trial, ultimately reducing cost. The purpose of this article is to help surgeons appraise surgical literature that uses surrogate end points in place of patient-important outcomes.
In medicine, the phase 3 randomized controlled trial (RCT) is the widely accepted gold standard on which treatment decisions are based. An RCT compares a new therapeutic agent, device or surgical procedure with the current standard of treatment, assessing the efficacy of the novel treatment against the control in the relevant patient population. The effectiveness of the novel treatment should be derived by measuring patient-important outcomes: the clinical events relevant to the patient population.1–3 Common patient-important outcomes include events, such as venous thromboembolism, stroke, tumour recurrence or death, and health-related quality of life measures, such as knee function scores.1,2
However, to assess these outcomes accurately, clinical trials often require extensive patient follow-up and large sample sizes, which can incur substantial expense. For this reason, investigators substitute associated laboratory measurements and physical signs (surrogate end points) for patient-important events to reduce the sample size and duration of a trial, ultimately reducing cost.1,2
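Why surrogates shrink trials can be illustrated with the standard normal-approximation sample-size formulas for comparing 2 groups. The sketch below is only an illustration: the event rates (4% v. 3%) and the standardized effect on a continuous surrogate (d = 0.3) are hypothetical values chosen for the example, not figures from any study cited here, and the 2 scenarios are not directly comparable clinical questions.

```python
from math import ceil
from statistics import NormalDist

# Conventional design choices: two-sided alpha of 0.05 and 80% power.
ALPHA = 0.05
POWER = 0.80


def _z_sum(alpha: float = ALPHA, power: float = POWER) -> float:
    """Return z_(1 - alpha/2) + z_(power) from the standard normal distribution."""
    z = NormalDist()
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)


def n_per_group_binary(p_control: float, p_treatment: float) -> int:
    """Patients per arm to detect a difference between 2 event proportions."""
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return ceil(_z_sum() ** 2 * variance / (p_control - p_treatment) ** 2)


def n_per_group_continuous(effect_size: float) -> int:
    """Patients per arm to detect a standardized mean difference (Cohen's d)."""
    return ceil(2 * _z_sum() ** 2 / effect_size ** 2)


if __name__ == "__main__":
    # Hypothetical patient-important event: rate falls from 4% to 3%.
    print("binary event, 4% v. 3%:", n_per_group_binary(0.04, 0.03))
    # Hypothetical continuous surrogate with a modest standardized effect.
    print("continuous surrogate, d = 0.3:", n_per_group_continuous(0.3))
```

With these hypothetical inputs, the event-based trial needs 5298 patients per arm, whereas the continuous surrogate needs 175 per arm.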
Although surrogate end points are an enticing alternative to patient-important outcomes, their use carries both potential benefits and risks. On one hand, surrogate end points may allow effective treatments to be approved and made available earlier, in turn allowing surgeons to offer a greater array of treatment options to their patients.1,2 On the other hand, because such trials are too small and too short to measure patient-important outcomes directly, a misleading surrogate end point can misrepresent the effectiveness of a therapeutic intervention, resulting in excess morbidity and mortality. It is therefore imperative to be confident in the validity of a surrogate end point when interpreting the results of a clinical trial.1
Prentice4 developed 2 criteria that must be satisfied to ensure the validity of a surrogate end point. First, the surrogate end point must be in the causal pathway of the disease process. Second, the change in the surrogate end point must capture the net effect of the treatment on the patient-important outcome, such that a change in the surrogate end point corresponds to a change in the patient-important outcome.2,4,5
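Prentice's definition is often summarized in the statistical literature as 4 operational conditions, written below with notation introduced here for illustration: Z is treatment assignment, S the surrogate end point, T the patient-important (true) end point and f(·) a probability distribution. The final condition, that the true end point no longer depends on treatment once the surrogate is known, is the formal counterpart of the second criterion above.

```latex
\begin{aligned}
&f(T \mid Z) \neq f(T)       && \text{treatment affects the true end point} \\
&f(S \mid Z) \neq f(S)       && \text{treatment affects the surrogate} \\
&f(T \mid S) \neq f(T)       && \text{the surrogate is prognostic for the true end point} \\
&f(T \mid S, Z) = f(T \mid S) && \text{the surrogate captures the full treatment effect}
\end{aligned}
```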
Familiar examples of surrogate end points include bone mineral density for long-bone fracture risk, uncontrolled blood pressure for stroke, and hemoglobin A1c (HbA1c) for the complications and disease progression associated with type 2 diabetes.1–3 Surrogate end points have recently been gaining popularity in the surgical literature because they address problems that affect modern randomized trials, specifically the large sample sizes, long follow-up and high cost associated with recording patient-important outcomes, although their use is not without controversy.6
The purpose of this article is to help surgeons appraise surgical literature that uses surrogate end points in place of patient-important outcomes.
Clinical scenario
At the last cardiovascular surgery rounds, the newest staff recruit in the division gave a presentation on the results of the robotic-assisted coronary artery bypass grafting (CABG) procedures he had performed in the previous 6 months. He claimed that a main benefit of the robotic procedure compared with traditional CABG was the lower level of pain that patients experienced. In the ensuing discussion, a senior cardiac surgeon challenged him to show the evidence. The presenter then showed an additional slide comparing the amount of narcotics given to his patients with the amount given to patients who had traditional CABG at their centre; narcotic use favoured the robotic group. The senior surgeon was not impressed with this type of evidence. The chief of the service intervened and asked the cardiovascular fellow to review the literature and report back to the group on whether the amount of narcotics administered can be used as a measure of pain after CABG.
Literature search
The ideal article type to address the question posed in the scenario would be a large RCT or a meta-analysis of RCTs that compares pain (measured with a valid pain scale) in a head-to-head comparison of robotic-assisted CABG and traditional CABG. If such an article were not available, you could search for a nonrandomized observational study that compares both techniques.
From a computer in the hospital library, you search the Cochrane Database of Systematic Reviews, but no meta-analysis has been published on this topic. You subsequently search the MEDLINE database from the National Library of Medicine. Your keywords are derived from the clinical question (see Users’ guide to the surgical literature: how to perform a high-quality literature search7). You enter the search terms “coronary artery bypass graft” AND “robotic-assisted” AND “pain,” which yields 15 articles. As RCTs are considered a higher level of evidence, you limit your search to RCTs; this yields no relevant articles. You therefore remove the RCT limit and instead restrict the search to studies published within the last 5 years (January 2012 to December 2016), which narrows your list to 6 articles.
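For readers who want to rerun the search, the sketch below assembles the 3 query strings in PubMed syntax. It assumes the search is run through PubMed and uses its `[pt]` (publication type) and `[dp]` (date of publication) field tags; the helper names are hypothetical, and result counts today will differ as the database grows.

```python
# Hypothetical helpers: compose the PubMed queries for the search described above.
SEARCH_TERMS = ["coronary artery bypass graft", "robotic-assisted", "pain"]


def base_query(terms: list[str]) -> str:
    """Join quoted phrases with the Boolean operator AND."""
    return " AND ".join(f'"{term}"' for term in terms)


def rct_only(query: str) -> str:
    """Restrict a query to the Randomized Controlled Trial publication type."""
    return f"({query}) AND randomized controlled trial[pt]"


def date_limited(query: str, start: str, end: str) -> str:
    """Restrict a query to a publication-date range given as YYYY/MM/DD."""
    return f'({query}) AND ("{start}"[dp] : "{end}"[dp])'


if __name__ == "__main__":
    query = base_query(SEARCH_TERMS)
    print(query)                                          # initial search
    print(rct_only(query))                                # RCT limit
    print(date_limited(query, "2012/01/01", "2016/12/31"))  # 5-year limit
```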
Of these 6 articles, 1 describes a novel therapeutic technique for the treatment of aortic valve stenosis and left main coronary disease.8 Two articles are feasibility studies: 1 addresses the use of robotic technology in transmyocardial revascularization9 and the other in quadruple CABG.10 Two are observational studies that compare postoperative pain and complication rates in patients undergoing robotic versus conventional CABG.11,12 Although both address your intervention of interest, a review of their titles and abstracts reveals that neither uses narcotic consumption as a surrogate for postoperative pain, so both can be excluded. The remaining article is a retrospective propensity-matched analysis published in 2016.13 It is the only article comparing robotic and conventional CABG that uses narcotic consumption as a surrogate end point for postoperative pain, and it therefore addresses the question posed in the clinical scenario. You decide to critically review this article by Raad and colleagues.13 The characteristics of their study are outlined in Table 1.
As in previous articles in the Users’ guide to the surgical literature series, we use the framework shown in Box 1 to critically appraise the validity of the study, interpret the results and apply the conclusions to our patient population.
Box 1: Guidelines for how to assess an article on surrogate outcomes
Are the results valid?
Is there strong documented/published evidence that connects the surrogate outcome to the patient-important surgical outcome under consideration?
Is there strong evidence that a change in the surrogate has led to a change in the target surgical outcome?
Is there strong evidence that similar interventions show similar improvements in both the surrogate and the patient-important surgical outcome?
What are the results?
What was the magnitude of the treatment effect?
Are the results applicable to my patients?
Will the information from this study help me to inform my patients?
Are the results valid?
In this section, we determine whether narcotic use as a surrogate for postoperative pain satisfies the criteria proposed by Prentice4 by addressing the following questions:
Is there strong documented/published evidence that connects the surrogate end point to the patient-important outcome under consideration?
Is there strong evidence that a change in the surrogate end point has led to a change in the target outcome?
Is there strong evidence that similar interventions show similar improvements in both the surrogate end point and the patient-important surgical outcome?
Is there strong documented/published evidence that connects the surrogate end point to the patient-important surgical outcome under consideration?
To substitute a surrogate end point for a patient-important outcome, one must first show a clear and documented association between them. Typically, researchers select surrogate end points when a biologically plausible explanation suggests that a change in the surrogate end point will be accompanied by a change in the target outcome (or vice versa) and when a strong correlation with the patient-important outcome has been shown across multiple observational studies. The more robust this association, the more easily a causal link can be established between the surrogate end point and the patient-important outcome, which is a requirement for a valid surrogate end point.1,2,6
Turning to the study by Raad and colleagues,13 our goal is to assess the strength of the association between narcotic use (surrogate end point) and postoperative pain (target outcome) as a step toward establishing causality. To do this, we must look to the literature for a biological basis for postoperative opioid analgesia as well as evidence of a correlation between the quantity of opioids used and pain severity.
The biological principles of opioid analgesia are well understood and documented in the literature. A review of opioid pharmacology by Trescot and colleagues14 outlines the function of opioids (e.g., morphine, hydromorphone and fentanyl) as agonists at opioid receptors located within the central nervous system and the peripheral tissues.14 Once activated, opioid receptors situated on the presynaptic terminals of nociceptive Aδ and C fibres indirectly inhibit voltage-gated calcium channels and decrease levels of the second messenger cyclic adenosine monophosphate (cAMP). This ultimately prevents the release of pain neurotransmitters (e.g., glutamate, substance P and calcitonin gene-related peptide) and produces an analgesic effect.14 Although several opioid receptors exist, morphine (the opioid archetype) acts primarily on μ receptors found in the brainstem and medial thalamus, which are responsible for analgesia and euphoria (μ1 subtype) as well as respiratory depression, pruritus, sedation, decreased gastrointestinal motility and dependence (μ2 subtype).14–16 Given our understanding of opioid pharmacology, a biologically plausible explanation exists to suggest that greater postoperative pain induced by nociceptive nerve stimulation will create a need for additional opioid analgesia to inhibit signal propagation.
To ensure the surrogate end point is a valid substitute for the patient-important outcome, a strong correlation must be shown across multiple observational studies after accounting for known confounding variables. A large retrospective cohort study by Kruse and colleagues17 analyzed the effect of tourniquet use on postoperative pain and opioid consumption following ankle surgery. They concluded that tourniquet use resulted in higher pain severity scores, which corresponded to a significant increase in postoperative opioid use after controlling for confounders. In justifying their use of opioid consumption as an outcome measure, the authors referred to a large prospective cohort study by Snyder and colleagues,18 which assessed pain medication use as an indicator of perceived pain following oral surgery. Snyder and colleagues demonstrated a strong association between opioid consumption and postoperative pain scores and determined that a patient’s choice to take pain medication (opioids or nonsteroidal anti-inflammatory drugs [NSAIDs]) appeared to be a better indicator of perceived pain than numerical pain scales alone.18
A study by Van Dijk and colleagues19 followed 1084 consecutive patients admitted for elective surgery, measuring the association between patients’ numerical rating scale (NRS) pain score, a validated and reliable pain assessment tool, and their desire for additional opioids on postoperative day (POD) 1. They showed that as pain scores increased in the postoperative setting, the percentage of patients who requested opioids also increased, with the majority requesting opioids at a score of 8 or above on the 11-point NRS. Although Van Dijk and colleagues19 found a correlation between pain and postoperative opioid administration, they noted that patients with high pain scores often declined opioids because they found their pain tolerable or feared the adverse effects associated with opioids.19 This calls into question the role of variables other than pain in postoperative opioid administration. Tanaka and colleagues20 performed a retrospective analysis of postoperative narcotic use in pediatric patients undergoing open versus laparoscopic pyeloplasty and determined that patient age, institutional variability in pain management and surgeon experience all affected postoperative narcotic administration. In a similar retrospective study, Piaggio and colleagues21 determined that patients received more narcotics when their institution’s pain management service was consulted, regardless of surgical intervention.
Although observational cohort studies consistently report an association between narcotic consumption and the patient-important outcome of postoperative pain, they also show that variables other than the patient’s perceived pain can affect opioid administration. This weakens the association between the surrogate end point and the target outcome, calls into question a causal link between them and thereby jeopardizes the validity of narcotic use as a surrogate end point for postoperative pain (Table 2).
Is there strong evidence that a change in the surrogate end point has led to a change in the surgical target outcome?
Although establishing a clear and consistent association between the surrogate end point and the patient-important outcome in observational studies is necessary to establish validity, it is not sufficient. Before one can confidently accept the results of an intervention and make recommendations to patients on the basis of a surrogate end point, a strong relationship with the target outcome must be documented across RCTs.2,6
The study by Raad and colleagues13 does not cite an RCT showing the validity of narcotic use as a surrogate end point for postoperative pain, so we must again turn to the literature for evidence. An RCT by Kim and colleagues22 prospectively followed 51 patients to assess the effect of intraoperative lidocaine infusion on postoperative pain and narcotic use following lumbar surgery. The results demonstrated a clear association between pain (measured on a visual analogue scale) and narcotic use (measured as fentanyl consumption) on POD 1. Specifically, the treatment group experienced significantly less pain, which corresponded to a significant reduction in narcotic consumption as well as improved patient satisfaction.22 This association was further supported by the observation that fentanyl use decreased as pain scores diminished at 2, 4, 8, 12, 24 and 48 hours postoperatively.22 Additionally, an RCT by Keller and colleagues23 prospectively followed 92 patients and compared the effect of transversus abdominis plane block versus placebo on postoperative pain and opioid administration following laparoscopic colorectal surgery. The authors found that lower pain scores were associated with reduced opioid consumption in the immediate postoperative period. However, although the difference in pain scores was significant, the difference in opioid use did not reach statistical significance, which the authors attributed to early motility and lower narcotic intake at POD 0 in the treatment group.23 This inconsistency was again seen in a similar trial by Fields and colleagues,24 which followed 52 patients and evaluated the effect of transversus abdominis plane block on postoperative pain and opioid consumption from immediately after ventral hernia repair up to POD 1.
Although this trial reported a significant decrease in both cumulative opioid use and postoperative pain in the treatment group, the difference in pain scores reached statistical significance much earlier than the difference in narcotic use (1 hr v. 6 hr postoperatively). To explain this, Fields and colleagues24 noted that some patients may have been fatigued in the immediate postoperative period, preventing them from requesting medication. They also suggested that postoperative pain may have peaked at 6–12 hours, so that greater use of pain medication during this window produced a statistically significant difference.24
Although these RCTs show a consistent association between narcotic use and postoperative pain, the strength of this correlation is not clear. Specifically, Keller and colleagues23 found a significant difference in pain scores but not in narcotic use, and Fields and colleagues24 found that pain scores reached statistical significance several hours before narcotic use did. Although the reason for these discrepancies is uncertain, the literature continues to report additional variables that may affect opioid consumption. As a result, although these higher level of evidence (LOE) studies strengthen the association between the surrogate end point and the target outcome, concern remains regarding the validity of narcotic use as a surrogate end point for postoperative pain, and results must therefore be interpreted with caution (Table 2).
Is there strong evidence that similar interventions show similar improvements in both the surrogate end point and the patient-important surgical outcome?
A correlate does not a surrogate make.1,6 Although the focus of this article has, until now, been on establishing the association between the surrogate end point and the target outcome through observational studies and RCTs, it is a common misconception that correlation is sufficient to validate a surrogate end point.6 To justify replacing the patient-important outcome, the surrogate end point must capture the net effect of the treatment on that outcome, a condition much stronger than correlation.1
Surgeons are more likely to accept results based on a surrogate end point if the new therapy resembles an intervention for which RCTs have already demonstrated a clear and consistent relationship between the surrogate end point and the patient-important outcome. The rationale is 2-fold. First, the biological principle that connects the surrogate end point to the target outcome for one therapeutic intervention may not apply to another. Second, different interventions may affect the patient-important outcome in ways unrelated to the surrogate end point (i.e., the treatment under investigation may be beneficial or detrimental to the target outcome through mechanisms independent of the surrogate end point).2 For these reasons, more confidence can be given to RCTs that use a similar intervention to show improvement in both the surrogate end point and the patient-important outcome.
In the study by Raad and colleagues,13 reference is made to a study by Bucerius and colleagues,25 which assessed postoperative pain in patients undergoing robotic, minimally invasive direct coronary artery bypass (MIDCAB) and conventional CABG procedures. This semirandomized trial prospectively followed 190 patients and concluded that the robotic group experienced a significant reduction in postoperative pain and required fewer opioid analgesics, although the difference in opioid use did not reach statistical significance. Although this trial found a correlation between narcotic use and postoperative pain, the association can be considered weak, given that narcotic use did not differ significantly between groups despite a significant difference in pain scores.25 A critical review of this article revealed that healthier patients may have inadvertently been selected for the minimally invasive procedure groups (i.e., robotic and MIDCAB groups), whereas sicker patients, with the potential for a lower pain threshold, were allocated to the conventional group, which may have exaggerated the differences in pain scores and narcotic use. A review of the literature revealed that this is the only randomized trial, albeit only partially randomized, to assess postoperative pain and narcotic use in patients undergoing robotic versus conventional CABG.
In summary, given the limited availability of trials with a high LOE and the weaknesses identified in the study by Bucerius and colleagues,25 there is insufficient evidence of a strong and consistent correlation between the surrogate end point and the patient-important outcome. Consequently, one cannot say with certainty that the surrogate end point captures the full effect of the treatment (robotic v. conventional CABG) on the target outcome, a requirement to establish validity. Because the criteria for a valid surrogate are not met, results that rely on narcotic use as a surrogate end point for postoperative pain should be interpreted with caution (Table 2).
What are the results?
What was the magnitude of the effect?
Once the validity of the surrogate end point has been established, attention must turn to the magnitude of the treatment effect. Rather than simply determining whether the intervention altered the surrogate end point, you must evaluate the extent to which it was affected, specifically the size, precision and duration of the treatment effect.2 For example, a large effect size with narrow confidence intervals that persists over time strengthens our belief that the effect of the intervention on the surrogate end point will be accompanied by a meaningful change in the target outcome. In contrast, a small, short-lived effect with wide confidence intervals reduces our confidence that the change in the surrogate end point will correspond to a meaningful change in the patient-important outcome.2
Raad and colleagues13 concluded that patients in the robotic CABG group had a statistically significant reduction in mean morphine equivalent dose (MED) compared with their conventional CABG counterparts for the primary end point, MED from the start of the operative procedure to POD 3 (181 ± 11 v. 251 ± 8, p < 0.05). Because the secondary analyses also showed significant reductions in the robotic group, both in total in-hospital MED to discharge (317 ± 30 v. 480 ± 28, p < 0.05) and in MED from after the procedure to discharge (190 ± 22 v. 274 ± 18, p < 0.05), it is reasonable to conclude that this study demonstrates a large, precise and lasting treatment effect on the surrogate end point.
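To put numbers on the size of this effect, the sketch below computes absolute and relative reductions in mean MED from the values quoted above. This section does not state whether the ± figures are standard deviations or standard errors, and group sizes are not given here, so the confidence interval is computed only under the labelled assumption that the ± figures are standard errors, using a normal approximation; treat it as illustrative.

```python
from math import sqrt

# Mean MED values quoted above for Raad and colleagues:
# (robotic mean, robotic ±, conventional mean, conventional ±)
REPORTED = {
    "operation to POD 3 (primary)": (181, 11, 251, 8),
    "in-hospital to discharge": (317, 30, 480, 28),
    "after procedure to discharge": (190, 22, 274, 18),
}


def summarize(robotic_mean, robotic_se, conv_mean, conv_se, z=1.96):
    """Return absolute reduction, relative reduction and an approximate 95% CI.

    ASSUMPTION: the ± values are standard errors of the mean. If they are
    standard deviations, the confidence interval computed here is not valid.
    """
    difference = conv_mean - robotic_mean
    relative = difference / conv_mean
    se_diff = sqrt(robotic_se ** 2 + conv_se ** 2)  # independent groups
    return difference, relative, (difference - z * se_diff, difference + z * se_diff)


results = {name: summarize(*values) for name, values in REPORTED.items()}

if __name__ == "__main__":
    for name, (diff, rel, (low, high)) in results.items():
        print(f"{name}: {diff} lower ({rel:.0%}), approx. 95% CI {low:.0f} to {high:.0f}")
```

Under that assumption, mean MED for the primary end point is 70 units (about 28%) lower in the robotic group, with an approximate 95% CI of 43 to 97, and the relative reductions in the 2 secondary analyses are roughly 31% and 34%.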
As a result, had we determined that narcotic use was a valid surrogate end point for postoperative pain, the strength of this treatment effect would give us confidence that the reduction in narcotic consumption in the robotic group would likely correspond to a meaningful reduction in postoperative pain, the target outcome.
In summary, one must be confident in the strength and duration of the treatment effect on the surrogate end point for inferences regarding the impact of the intervention on the patient-important outcome to be believed; this remains true even if the surrogate end point was determined to be valid.
Are the results applicable to my patients?
Will the information from this study help me to inform my patients?
As suggested by Guyatt and colleagues,2 before one can extrapolate the results of a clinical trial and offer recommendations on treatment, the surgeon must ask themselves 3 questions:
Does the study represent my patient population?
Does the trial consider all relevant patient-important outcomes?
Do the benefits of the procedure outweigh any potential risks and costs?
Although assessing the study population and other patient-important outcomes is fairly straightforward, weighing the benefits of treatment against its harms is particularly challenging when our knowledge of the treatment benefit is limited to its effect on the surrogate end point. To address this challenge, one can look to higher LOE studies (i.e., RCTs) that measure both the surrogate end point and the target outcome to better appreciate the magnitude of the treatment effect on the patient-important outcome.2 If none exist, one can extrapolate results from observational studies that relate the surrogate end point to the target outcome.
For the purpose of this analysis, we assume the study cohort is similar to the patient population encountered by the surgeons in our clinical scenario. The study by Raad and colleagues13 found no statistically significant difference between robotic and conventional CABG in other patient-important outcomes, including stroke, wound infection, renal failure requiring hemodialysis, intubation longer than 24 hours, reoperation for bleeding, readmission within 30 days and readmission for pain; mortality was not assessed. Similarly, a retrospective cohort analysis by Leyvi and colleagues26 found that robotic CABG was associated with lower 30-day complication rates, shorter length of stay and decreased need for an acute care facility. These studies consider a broad range of relevant patient-important outcomes, and on the basis of this evidence, the robotic CABG procedure does not appear to carry added risk.
Moreover, the conclusions of Raad and colleagues13 are limited to the effect of the intervention on the surrogate end point (narcotic use) alone. For this reason, we must again turn to the literature to extrapolate the results to the patient-important outcome. Although no RCTs comparing the intervention using both surrogate and target end points exist, a review of our initial search reveals 2 cohort studies by Ezelsoy and colleagues11,12 that measured postoperative pain in patients undergoing robotic versus conventional CABG. In both studies, robotic CABG was associated with a statistically significant reduction in postoperative pain compared with the conventional procedure, on POD 3 and POD 4, respectively.11,12 Although these studies present no data on opioid consumption, the treatment effect they report on the patient-important outcome is consistent with the conclusions of Raad and colleagues.13
A search of the literature for the direct costs associated with robotic versus conventional CABG reveals a single article by Leyvi and colleagues.27 This retrospective propensity-matched study followed 2088 consecutive patients who underwent CABG at a single academic tertiary care centre and compared the direct costs of the index hospitalization and of 30-day morbidity and mortality for robotic and conventional CABG. Although robotic CABG was associated with a shorter operation, shorter length of stay and a lower complication rate, its cost did not differ significantly from that of conventional CABG ($18 717.35 [range $11 316.1–$34 550.6] v. $18 601 [range $13 137–$50 194.75], p = 0.13).27
Given documented evidence of a significant treatment effect on the patient-important outcome from Ezelsoy and colleagues,11,12 which agrees with the results of Raad and colleagues,13 and additional patient-important measures showing similar or better clinical outcomes at similar cost, one can conclude that the benefits of the proposed intervention likely outweigh the associated risks. However, a prospective RCT is still required to define more precisely the advantages of this intervention with respect to narcotic use and postoperative pain.
Resolution of the scenario
At the next cardiovascular surgery rounds, the cardiovascular fellow gave his update on the use of narcotics as a surrogate end point for postoperative pain. He informed the group that although opioid use has previously served as a surrogate end point for postoperative pain in the literature, there is not enough high-level evidence to show that this substitution is valid. He reported that conclusions based solely on the effect of a treatment on a surrogate end point may be misleading and are inherently weaker than results derived by measuring the patient-important outcome directly. The staff recruit who had given the presentation at the previous rounds agreed with this criticism and informed the group of his intention to conduct an RCT comparing robotic and conventional CABG that measures postoperative pain directly using valid and reliable pain scales.
Conclusion
The ideal surrogate end point is one that meets the aforementioned criteria for validity and for which an RCT has demonstrated a strong, precise and durable effect of the intervention. Although substituting a surrogate end point for the patient-important outcome offers some benefits, the results of such trials rest on assumptions that can never be as robust as conclusions derived from measuring the patient-important outcome directly in a head-to-head RCT. We therefore recommend caution when interpreting the results of therapeutic interventions whose treatment effects are represented solely by a change in a surrogate end point.
Footnotes
Competing interests: None declared.
Contributors: All authors designed the study. L. Gallo acquired and analyzed the data, which L. Braga and A. Thoma also analyzed. L. Gallo and A. Thoma wrote the article, which all authors reviewed and approved for publication.
- Accepted March 14, 2017.