Surgical treatments are half as likely as medical therapies to be based on randomized controlled trial (RCT) evidence.1 The paucity of randomized surgical trials may reflect ethical challenges to randomization, insufficient funding, and insufficient knowledge and infrastructure in surgery to conduct large, full-scale and definitive trials. Many surgical conditions, such as scapholunate dissociation, are also relatively rare compared with medical conditions such as hypertension; this undoubtedly accounts for part of the discrepancy. As a result, most surgical research uses retrospective designs, often with a small number of patients.1 Despite efforts to design a methodologically sound study of a surgical technique and perform proper statistical analyses, the results may not accurately reflect the true situation. This is a major concern in observational studies of surgical interventions. Whenever one observes an association between an exposure, such as a surgical intervention, and an outcome measure, one is tempted to draw a causal inference when, in fact, the relation may not be causal. The exposure could be a risk factor causing harm or a protective factor preventing harm. For example, one may detect an increased rate of postoperative wound infection in patients who undergo open appendectomy compared with those who have a laparoscopic procedure and presume that the type of approach (open surgery in this case) is a risk factor for postoperative wound infection. However, this association may simply be owing to the presence of a third factor, or confounding factor, such as obesity (as shown further on), which ought to be controlled for at the stage of design or analysis.2 Failure to control for confounders in a study undermines its credibility and internal validity.

## Objectives of this article

This article will discuss the importance of confounding factors and the methods of detecting and dealing with these factors in surgical research. By the end of this article, the reader will be able to understand the methods available to deal with confounding factors at the stages of design and analysis of any given study. The subject matter is divided into 2 sections:

- What is a confounding factor?
- How do we deal with a confounding factor?

## What is a confounding factor?

To understand the phenomenon of “confounding,” one needs to consider the potential relation between an exposure (e.g., novel surgical technique) and an outcome (e.g., revision surgery or postoperative infection). The term “confounding” refers to a situation in which one finds a spurious association or misses a true association between an exposure variable and an outcome variable as a result of a third factor or group of factors referred to as confounding variable(s). A confounding variable is a factor associated with both the exposure variable and the outcome variable.3 For example, delay to surgery has been shown to increase the risk of mortality in patients with hip fractures in a number of observational cohort studies. It is presumed, therefore, that the delay to surgery (risk factor) increases mortality (outcome). However, critical evaluation of the literature suggests that this effect is confounded by other patient characteristics, such as age and American Society of Anesthesiologists (ASA) score.4 Surgery is often delayed in sicker patients to allow medical optimization before the operation, and sicker patients are generally at greater risk for mortality.

For a variable to be considered a true confounder, it cannot lie in the causal pathway of association between the exposure variable and the outcome variable. Biological and clinical knowledge are usually used to judge whether a potential confounder is in the causal pathway of an association.2

Confounding is essentially an intrinsic limitation of observational studies (e.g., cohort studies, case–control studies) but is usually well addressed by large and well-designed RCTs.5 However, small RCTs can occasionally run into problems with confounding simply owing to chance if their treatment groups are unbalanced with respect to participants’ baseline characteristics. Not recognizing confounding factors certainly increases the chance of this happening — particularly in smaller studies.

What do we mean by confounding? In an observational study, for example, if factor A is associated with condition B, we might conclude that a third factor, X, is a confounder if both of the following are true:5

- Factor X is a known risk factor for condition B.
- Factor X is associated with factor A, but it is not a result of factor A.

Whenever we observe an association between a potential risk factor and an outcome, we have to question whether this association is true or whether it is a result of confounding by a third factor. Let us look at a hypothetical example of open versus laparoscopic appendectomy. We have conducted an unmatched case–control study to see if, compared with laparoscopic appendectomy, open appendectomy is associated with a higher rate of postoperative wound infection. However, we know that postoperative wound infection and obesity are related because obese patients usually have a higher rate of postoperative wound infection than nonobese patients.6 Let us also assume that obese patients are more likely than nonobese patients to undergo open appendectomy. In this hypothetical example, if an association is observed between open appendectomy and postoperative wound infection, it may be, first, that open appendectomy leads to increased postoperative wound infection (Fig. 1A) or, second, that the observed association is confounded by obesity (Fig. 1B). When examining the effect of a risk factor (factor A: open appendectomy) on the outcome (condition B: postoperative wound infection) without taking into account the effect of the confounder (factor X: obesity), one may misrepresent the true effect of open appendectomy in increasing postoperative wound infection (Fig. 1A). Thus, in this hypothetical study, obesity may confound the relation between surgical approach (open or laparoscopic appendectomy) and wound infection unless the results are adjusted for obesity (Fig. 1B).

How do we identify a confounder? A convenient method to check for a potential confounding factor is, first, to find out if the assumed confounding factor is associated with both the outcome variable and the exposure variable and, second, to compare the associations before and after adjusting for that confounding factor. Table 1 displays data from our hypothetical example of an unmatched case–control study of open and laparoscopic appendectomy. Let us assume for simplicity that 100 cases (patients with wound infection) and 100 controls (patients without wound infection) were studied. The unadjusted odds ratio (OR) of postoperative wound infection in open appendectomy versus laparoscopic appendectomy was calculated to be 1.95. The OR is defined as the odds of the outcome occurring in exposed individuals divided by the odds of the outcome occurring in unexposed individuals. If the outcome is not related to the exposure, the OR will be equal to 1. The OR will be greater than 1 for a positive association and less than 1 for a negative association. The unadjusted OR of 1.95 suggests that the odds of postoperative wound infection are almost 2 times higher for open appendectomy than for laparoscopic appendectomy. In the next step, we must examine whether the observed relation between open appendectomy (exposure variable) and postoperative wound infection (outcome variable) is confounded by obesity (confounding variable).
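
The OR calculation described above can be sketched in a few lines of Python. Table 1 is not reproduced in this text, so the cell counts below are an assumption: they were reconstructed to agree with the percentages and ORs reported in this article. The helper name `odds_ratio` is illustrative only.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds of the outcome in exposed patients divided by the odds in unexposed patients."""
    odds_exposed = exposed_cases / exposed_controls
    odds_unexposed = unexposed_cases / unexposed_controls
    return odds_exposed / odds_unexposed

# Assumed cell counts (reconstructed, not copied from Table 1):
# 100 cases (wound infection) and 100 controls (no wound infection).
open_cases, open_controls = 30, 18   # open appendectomy
lap_cases, lap_controls = 70, 82     # laparoscopic appendectomy

unadjusted_or = odds_ratio(open_cases, open_controls, lap_cases, lap_controls)
print(f"Unadjusted OR: {unadjusted_or:.2f}")
```

With these assumed counts the result rounds to the unadjusted OR of 1.95 quoted in the text.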

First, we need to find out whether obesity is related to postoperative wound infection and to open appendectomy. By looking at Table 2, we see that 50% of the patients with postoperative wound infections and 20% of the patients without postoperative wound infections are obese. It seems that, with a ratio of 2.5, obesity is related to increased risk for postoperative wound infection. We then need to assess whether obesity is related to open appendectomy. Table 3 shows the relation between obesity and type of appendectomy for 200 patients. Of 70 obese patients, 35 (50%) underwent open appendectomy and of 130 nonobese patients, 13 (10%) underwent open appendectomy. Thus, with a ratio of 5.0, we clearly observe that obese patients were more likely than nonobese patients to have an open appendectomy. At this point, obesity seems to be related to both postoperative wound infection and open appendectomy.
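
The 2 checks in this step amount to comparing proportions. The sketch below only restates the counts given in the text; the variable names are illustrative.

```python
# Counts stated in the text: 100 cases, 100 controls, 200 patients in total.

# Is obesity related to the outcome (postoperative wound infection)?
obese_among_cases = 50 / 100       # 50% of patients with wound infection are obese
obese_among_controls = 20 / 100    # 20% of patients without wound infection are obese
outcome_ratio = obese_among_cases / obese_among_controls

# Is obesity related to the exposure (open appendectomy)?
open_among_obese = 35 / 70         # 35 of 70 obese patients had open surgery
open_among_nonobese = 13 / 130     # 13 of 130 nonobese patients had open surgery
exposure_ratio = open_among_obese / open_among_nonobese

print(f"Obesity and wound infection, ratio: {outcome_ratio:.1f}")
print(f"Obesity and open appendectomy, ratio: {exposure_ratio:.1f}")
```

Both ratios are well above 1, which is what makes obesity a candidate confounder at this point.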

Second, we need to calculate the adjusted OR and compare it with the unadjusted OR of 1.95. We first stratify the study population into obese and nonobese patients. Within each stratum, a contingency (2 × 2) table is created and the OR is calculated (Table 4). When we calculate the OR separately for obese and nonobese patients, we find that the OR is 1 in each stratum, indicating a lack of association between postoperative wound infection and type of surgical approach. We can conclude that the unadjusted OR of 1.95 in Table 1 was owing to the unbalanced distribution of obesity between cases and controls. Thus, in this example, obesity was a confounder, and the association between open appendectomy and postoperative wound infection was spurious.
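
As a minimal sketch of this stratified comparison, the code below computes the OR within each obesity stratum and then in the collapsed table. The per-stratum cell counts are an assumption: Table 4 is not reproduced here, and the counts were reconstructed to agree with the figures reported in the text.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio for a 2 x 2 table: (a * d) / (b * c)."""
    return (a * d) / (b * c)

# Assumed cell counts per stratum, ordered as
# (open cases, open controls, laparoscopic cases, laparoscopic controls).
strata = {
    "obese": (25, 10, 25, 10),
    "nonobese": (5, 8, 45, 72),
}

stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}

# Collapsing the strata recovers the unadjusted 2 x 2 table.
collapsed = [sum(cells[i] for cells in strata.values()) for i in range(4)]
crude_or = odds_ratio(*collapsed)

print(stratum_ors)
print(f"Crude OR: {crude_or:.2f}")
```

The association appears only when the strata are collapsed, which is the signature of confounding in this example.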

## How do we deal with a confounding factor?

As confounders influence the real treatment effect (obscure the etiological importance of a variable), they need to be dealt with when planning a research study.7 Confounding can be dealt with at the stage of study design (before collecting the data) or at the stage of data analysis (after collecting the data). The commonly used methods to control for confounding factors and improve internal validity are randomization, restriction, matching, stratification, multivariable regression analysis and propensity score analysis.3 A brief explanation of controlling for confounders at each stage is subsequently described. Note that the use of these methods should be justified, predefined, specified and taken into account in a power calculation a priori, at the stage of study design.

### Dealing with confounding at the design stage

#### Randomization (RCT)

Randomization is the optimal method of controlling for confounders. It has the advantage of balancing both measured and unmeasured confounders between study groups, which reduces the uncertainty as to whether the observed associations might be confounded by prognostic factors in the study. In our previous example, a well-designed and adequately powered RCT to study the effect of surgical approach (open v. laparoscopic appendectomy) on postoperative wound infection would more likely provide a balanced number of obese and nonobese patients between the open and laparoscopic groups. Since the method of randomization is based on probability, it is unlikely that this balance will be achieved for all patient characteristics, even with a large number of observations. However, randomization does guarantee that any differences between the 2 groups (open and laparoscopic appendectomy) are owing to chance,7,8 rather than the choice of the surgeon. Thus, although differences between patient characteristics may still exist after randomization, their confounding effects are likely minimized.7 The chances of achieving balanced groups with respect to prognostic factors will increase if larger numbers of patients are studied.1,8 Randomization may be insufficient to achieve balanced groups with small sample sizes (i.e., fewer than 200 patients).8 The larger the sample size, the more confidence one may have that balance of prognostic factors in an RCT has been achieved. The methods of optimizing the chances of achieving balanced groups in RCTs of surgical interventions are explained elsewhere.8
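
A small simulation can illustrate why chance imbalance shrinks as trials get larger. This is a sketch under stated assumptions, not a method from the article: the obesity prevalence of 0.35 is borrowed from the hypothetical example (70 of 200 patients), allocation is simple 1:1 randomization, and the function name and parameters are illustrative.

```python
import random

def mean_imbalance(n_patients, p_obese=0.35, n_trials=300, seed=1):
    """Average absolute difference in the proportion of obese patients between 2 arms."""
    rng = random.Random(seed)
    half = n_patients // 2
    total = 0.0
    for _ in range(n_trials):
        # Each simulated patient is obese with probability p_obese (assumed prevalence).
        obese = [rng.random() < p_obese for _ in range(n_patients)]
        # Simple 1:1 randomization: shuffle the patients, then split them into 2 arms.
        rng.shuffle(obese)
        arm_a, arm_b = obese[:half], obese[half:2 * half]
        total += abs(sum(arm_a) / half - sum(arm_b) / half)
    return total / n_trials

for n in (20, 200, 2000):
    print(n, round(mean_imbalance(n), 3))
```

The average imbalance in obesity between arms falls as the number of randomized patients rises, in line with the point that small trials can still be confounded by chance.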

#### Restriction (observational study)

Restriction is simple and easy to understand and can at least partially eliminate the influence of a confounding factor. We restrict our study population to individuals with certain characteristics by tightening the eligibility criteria. For example, we could include only nonsmokers younger than 60 years of age to remove the effects of smoking and older age. Restriction can be used to address a limited number of confounders. Selecting patients with specific characteristics yields a more homogeneous study population, but this comes at the expense of external validity and generalizability.2 Because of the selection process, the number of eligible participants is usually reduced; consequently, achieving the required sample size becomes more difficult. Another disadvantage of restriction is that once we restrict our study population on certain variables, we can no longer investigate those variables. Therefore, we should only use restriction on variables that we are convinced are confounders. In our example, the investigator would select only nonobese patients to control for the factor of obesity. Note that there are likely other patient characteristics that need to be considered for adjustment at the stage of data analysis.

#### Matching (observational study)

Matching involves pairing the study groups for potential confounding factors, such as smoking, age, tumour size or sex. Matching could be used in case–control studies (e.g., patients with postoperative wound infection are matched to patients without wound infection for 1 or more confounders) and cohort studies (e.g., patients who have open appendectomy are matched to patients who have laparoscopic appendectomy for 1 or more confounders). Matching permits the adjustment for multiple confounding factors, provided that appropriate control patients can be identified. It also improves between-group comparability. Matching is useful when there is a limited number of cases or a limited number of exposed persons. Furthermore, matching each case to more than 1 control maintains the function of matching and increases study power.9 The method of matching must be specified a priori at the stage of study design. There are certain disadvantages to matching that one should consider. First, it can be difficult to find suitable matched pairs for multiple confounding factors. Second, variables selected for matching can no longer be evaluated as risk factors for the outcome or disease under investigation.10 Also, matching is very difficult to perform in surgical studies. In carrying out a matched cohort study or a matched case–control study, we should match variables that have confounding effects and that we have no interest in investigating. Matching on variables other than those is called overmatching and should be avoided. Third, the design is prone to loss of data; if 1 member of a matched pair does not provide adequate data, the pair has to be excluded from the analysis. A further disadvantage of matching is the complexity of the data analysis if unmatched variables need to be adjusted or if the matching ratio varies.5,11 In our example, patients would be matched by obesity status. 
Obese patients undergoing open appendectomy would be matched to obese patients undergoing laparoscopic appendectomy, and nonobese patients undergoing open appendectomy would be matched to nonobese patients undergoing laparoscopic appendectomy, thus eliminating the confounding effect of obesity.
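
A minimal sketch of 1:1 matching on obesity status in a cohort-style comparison is shown below. The group sizes come from the text (35 of 70 obese and 13 of 130 nonobese patients had open surgery); the function `match_one_to_one` and the record layout are illustrative assumptions, and real matching would usually involve several variables.

```python
from collections import defaultdict

def match_one_to_one(exposed, unexposed, key):
    """Pair each exposed patient with an unused unexposed patient sharing `key`."""
    pool = defaultdict(list)
    for patient in unexposed:
        pool[patient[key]].append(patient)
    pairs = []
    for patient in exposed:
        candidates = pool[patient[key]]
        if candidates:  # exposed patients without a suitable match are dropped
            pairs.append((patient, candidates.pop()))
    return pairs

# Exposure groups from the text: 35 obese and 13 nonobese patients had open surgery;
# the remaining 35 obese and 117 nonobese patients had laparoscopic surgery.
open_group = [{"obese": True}] * 35 + [{"obese": False}] * 13
lap_group = [{"obese": True}] * 35 + [{"obese": False}] * 117

pairs = match_one_to_one(open_group, lap_group, key="obese")
obese_open = sum(p["obese"] for p, _ in pairs)
obese_lap = sum(q["obese"] for _, q in pairs)
print(len(pairs), obese_open, obese_lap)
```

After matching, both arms contain the same number of obese patients, so obesity can no longer differ between them. The unmatched laparoscopic patients are not used, which illustrates the loss of data mentioned above.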

### Dealing with confounding at the data analysis stage

#### Stratification

Stratification divides the data into strata, or layers, based on a suspected confounding variable. The data are then analyzed within each stratum. Indeed, we have already used the stratification method of analysis for our clinical example. In our clinical scenario, patients with and without postoperative wound infections were divided into 2 strata based on obesity status, and then stratum-specific ORs for open versus laparoscopic appendectomy were calculated for obese and nonobese patients separately. An OR of 1.0 in each stratum makes the interpretation of the data with respect to the confounding factor simple and self-explanatory. However, in most situations, the treatment effect differs between strata (Table 5). In these situations, statistical methods, such as the Mantel–Haenszel test, are used to produce a single estimate of treatment effect, which is adjusted for the effects of the confounding factor.5,7 This is called the adjusted treatment effect, which is then compared with the unadjusted treatment effect to determine the effect of the confounding factor. There is no general agreement as to how much change is required in the strength of an association after adjustment for a factor to be considered a confounder; some experts have argued that a change of at least 10% in the strength of an association is an acceptable threshold.2,12 To estimate the pooled adjusted treatment effect, the treatment effect (i.e., OR) should be homogeneous across strata, and we need to test for heterogeneity to verify this assumption. If the treatment effect is heterogeneous across strata, a pooled estimate of treatment effect should be avoided because this situation might reflect effect modification, that is, an interaction between the exposure variable and the confounding variable in their effect on the outcome. In our example, assuming that the treatment effect is homogeneous across strata, the adjusted OR is 1.11 (Table 5). 
The change from the unadjusted OR of 1.95 is more than 10%, suggesting that obesity was a confounder. We should therefore report the adjusted OR of 1.11 and suggest that the odds of postoperative wound infection are about 1.1 times higher with open appendectomy than with laparoscopic appendectomy. Note that tests of statistical significance should not be used to detect a confounding effect because we are interested in the strength of the association between the confounder, the exposure and the disease; *p* values do not indicate the magnitude of an association and therefore cannot show whether a particular variable is a confounder.7
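
The pooling step can be sketched with the Mantel–Haenszel estimator of a common OR, which sums a·d/n over strata and divides by the sum of b·c/n. The counts behind Table 5 are not available in this text, so the sketch applies the estimator to assumed stratum counts in which the OR is 1 in each stratum; it therefore returns 1.0 rather than 1.11. The percent-change helper takes the unadjusted OR as its denominator, which is also an assumption, since conventions differ.

```python
def mantel_haenszel_or(strata):
    """Pooled OR: sum(a * d / n) / sum(b * c / n) over 2 x 2 tables (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

def percent_change(unadjusted, adjusted):
    """Change of the adjusted estimate relative to the unadjusted one, in percent."""
    return abs(unadjusted - adjusted) / unadjusted * 100

# Assumed counts per stratum, ordered as
# (open cases, open controls, laparoscopic cases, laparoscopic controls).
strata = [(25, 10, 25, 10), (5, 8, 45, 72)]
pooled_or = mantel_haenszel_or(strata)

# ORs reported in the text for the example with differing strata (Table 5).
change = percent_change(1.95, 1.11)

print(f"Pooled OR from assumed counts: {pooled_or:.2f}")
print(f"Change from 1.95 to 1.11: {change:.0f}%")
```

The change computed from the reported ORs is far above the 10% threshold mentioned above, which supports treating obesity as a confounder.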

Stratification is effective when dealing with dichotomous or other categorical confounding variables because the data can be separated into 2 or more distinct strata. Stratification is more difficult for continuous variables, such as age and tumour size, because arbitrary strata must be created.3 Such adjustment for confounding does not always remove the confounding effect of that variable. There may be residual confounding when there are relatively few strata of continuous variables (e.g., age is in only 2 strata: ≤ 50 yr and > 50 yr).7 The adjustment for the confounding factor can be improved by increasing the number of strata; however, this requires a larger number of observations to keep each stratum adequately populated. The main disadvantage of stratification is the inability to deal with multiple confounding factors simultaneously. If multiple confounders are considered, each stratum may become very small or disappear (i.e., no patients in the stratum).3 Similar principles apply to relative risk (RR) as another measure of treatment effect applicable to prospective designs. Relative risk is defined as the event rate (postoperative wound infection) in the exposed group (open appendectomy) divided by the event rate in the unexposed group (laparoscopic appendectomy). The interpretation of RR is similar to the interpretation of OR. Detailed information on calculating adjusted ORs or adjusted RRs using the Mantel–Haenszel test12 and the standardization method13 for matched data is available elsewhere.
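
To make the definition of RR concrete, the sketch below computes RR and OR from the same table. The cohort counts are hypothetical and do not come from the article; the function names are illustrative.

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Event rate in the exposed group divided by the event rate in the unexposed group."""
    return (exposed_events / exposed_total) / (unexposed_events / unexposed_total)

def odds_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Odds of the event in the exposed group divided by the odds in the unexposed group."""
    odds_exposed = exposed_events / (exposed_total - exposed_events)
    odds_unexposed = unexposed_events / (unexposed_total - unexposed_events)
    return odds_exposed / odds_unexposed

# Hypothetical prospective cohort (not from the article): 20 wound infections in
# 100 open appendectomies and 10 wound infections in 100 laparoscopic appendectomies.
rr = relative_risk(20, 100, 10, 100)
or_estimate = odds_ratio(20, 100, 10, 100)

print(f"RR: {rr:.2f}, OR: {or_estimate:.2f}")
```

Both measures equal 1 when there is no association, but the OR lies further from 1 than the RR when the outcome is common, as it is in this hypothetical cohort.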

#### Multivariable regression analysis

Regression analysis uses a mathematical model to estimate the association between a number of independent variables (potential risk factors) and 1 dependent variable (outcome). This method uses all the study data and examines many variables, either continuous or dichotomous, simultaneously, including the comparison variable (surgical technique).14 Multivariable regression analysis allows the estimation of the effect of an exposure variable (open appendectomy) on a given outcome variable (postoperative wound infection) after controlling for the confounding effect of other included variables (e.g., obesity, age).15,16 Multivariable regression analysis is regarded as the most powerful but also the most complex type of analysis to deal with confounders.3 It is the most commonly used method to deal with confounding factors in the medical literature because of its flexibility.3 When we use regression models to assess the change in the association between outcome and exposure, it is the extent of change, not the statistical significance, that is considered important. One limitation of regression analysis is that it can deal with only a limited number of factors when there is a small number of observations (e.g., a small study sample size with few outcome events).3 The acceptable number of covariates (i.e., potential risk factors, confounding or interacting variables) depends on the number of observations or sample size. It is recommended to have 10 or more outcome events (e.g., postoperative wound infections) per variable.15,17 We suggest including only variables that are likely risk factors for the outcome of interest. A literature review might provide the most relevant variables. 
Other disadvantages of regression analysis are that the interpretation of output from regression models can be challenging for researchers with limited statistical knowledge and that the results may be inaccurate if assumptions of the mathematical models are not satisfied and if sample size is insufficient.3 In our example, a multivariable regression model would include postoperative wound infection as a dependent (outcome) variable and surgical approach (open v. laparoscopic appendectomy), obesity and other patient characteristics as independent variables (covariates).
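
The model described for our example can be sketched as a logistic regression fitted by Newton–Raphson on grouped data. This is an illustration under stated assumptions: the cell counts are reconstructed to agree with the figures in the text, only obesity is included as a covariate, and the helpers `solve` and `fit_logistic` are written here for self-containment. In practice an established statistical package would be used instead.

```python
import math

def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_logistic(groups, n_iter=25):
    """Fit a logistic regression to grouped data by Newton-Raphson.

    groups: list of (covariates, cases, total); covariates start with 1 (intercept).
    Returns the list of fitted coefficients (log odds ratios).
    """
    k = len(groups[0][0])
    beta = [0.0] * k
    for _ in range(n_iter):
        gradient = [0.0] * k
        information = [[0.0] * k for _ in range(k)]
        for x, cases, total in groups:
            p = 1 / (1 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            for i in range(k):
                gradient[i] += x[i] * (cases - total * p)
                for j in range(k):
                    information[i][j] += x[i] * x[j] * total * p * (1 - p)
        beta = [b + s for b, s in zip(beta, solve(information, gradient))]
    return beta

# Assumed counts; covariates are (intercept, open surgery, obese).
groups = [
    ((1, 1, 1), 25, 35),   # open, obese: 25 infections among 35 patients
    ((1, 0, 1), 25, 35),   # laparoscopic, obese
    ((1, 1, 0), 5, 13),    # open, nonobese
    ((1, 0, 0), 45, 117),  # laparoscopic, nonobese
]

adjusted = fit_logistic(groups)
crude = fit_logistic([((1, 1), 30, 48), ((1, 0), 70, 152)])  # obesity omitted

adjusted_or_open = math.exp(adjusted[1])
crude_or_open = math.exp(crude[1])
print(f"Crude OR: {crude_or_open:.2f}, adjusted OR: {adjusted_or_open:.2f}")
```

Exponentiating the coefficient for open surgery gives its OR. Comparing the models with and without obesity shows the extent of change in the association, which is the quantity of interest when judging confounding.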

#### Propensity scores

Propensity score analysis is based on the propensity score, defined as the conditional probability of being treated given the patient’s risk factors (covariates), and can be used to balance the between-group differences and, therefore, reduce bias.16 Binary regression analysis provides an estimate of the propensity toward (probability of) belonging to one group versus another.16 Once the propensity score is calculated for each patient, this score can be used through matching, stratification and regression to estimate the adjusted treatment effect. Propensity scores are applicable when dealing with dichotomous variables and can reduce the number of covariates in a multivariable regression model.18 The main disadvantage of this method is that, like other statistical adjustment methods, the propensity score does not account for unknown and unmeasured covariates. Another limitation is that many physicians are unfamiliar with it. The other limitations of multivariable regression analysis also apply to propensity score analysis.
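
The sketch below estimates a propensity score for open surgery and then stratifies on it. With a single dichotomous covariate (obesity), the probability fitted by a binary regression equals the observed proportion treated in each group, so no model fitting is needed here. The patient counts are an assumption reconstructed from the figures in the text, and all names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio for a 2 x 2 table: (a * d) / (b * c)."""
    return (a * d) / (b * c)

# Assumed records: (obese, open surgery, wound infection, number of patients).
records = [
    (True, True, True, 25), (True, True, False, 10),
    (True, False, True, 25), (True, False, False, 10),
    (False, True, True, 5), (False, True, False, 8),
    (False, False, True, 45), (False, False, False, 72),
]

def propensity_scores(records):
    """Estimated probability of open surgery for each obesity status."""
    scores = {}
    for status in (True, False):
        treated = sum(n for obese, is_open, _, n in records if obese == status and is_open)
        total = sum(n for obese, _, _, n in records if obese == status)
        scores[status] = treated / total
    return scores

scores = propensity_scores(records)

# Stratify on the propensity score and compute the OR within each score stratum.
or_by_score = {}
for status, score in scores.items():
    cells = {(is_open, infected): n
             for obese, is_open, infected, n in records if obese == status}
    or_by_score[score] = odds_ratio(cells[(True, True)], cells[(True, False)],
                                    cells[(False, True)], cells[(False, False)])

print(scores)
print(or_by_score)
```

With several covariates the score would come from a fitted regression model and patients would be grouped into score bands, but the logic of comparing treated and untreated patients with similar scores is the same.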

The advantages and disadvantages of the main methods for adjusting for confounders in a study are summarized in Table 6. Some practical tips on dealing with confounders at the stage of planning a research project are provided in Box 1.

### Tips for dealing with confounders when planning research studies

Form a clear and well-defined research question. For more detailed information on how to develop a research question, hypothesis and objectives, refer to the article by Farrugia and colleagues.19

Review the literature and identify the relevant risk factors and confounders.

Select the optimal design to answer the research question.

Choose the optimal method of dealing with the confounders at the stage of study design before collecting data. Consider the effect of the selected study design on the internal and external validity (generalizability) of the study.

Consider the further adjustments needed at the stage of data analysis after collecting data.

Consider the method chosen to deal with confounders in sample size calculation. Note that the same method used for sample size calculation should be also considered for data analysis.

These considerations should be predefined and justified a priori at the stage of design and before the process of data collection.

## Conclusion

Confounding occurs when a risk factor is associated with both the comparison variable and the outcome of interest. The optimal method for controlling confounders is randomization. When randomization is not feasible, observational studies are conducted. The findings from studies using observational designs are subject to bias if the researchers do not adjust for the effect of confounding variables. In observational studies, confounding variables should be predefined and measured, and the method of adjustment should be specified and taken into account in the power analysis a priori at the stage of study design. Even then, the conclusions drawn from these studies will be less robust than those generated from an RCT because one cannot eliminate the effect of all known confounders, and because there is no way of incorporating the effect of potential unknown confounders. Therefore, the conclusions drawn from observational studies must be interpreted with caution.

## Footnotes

**Competing interests:** M. Bhandari was funded, in part, by a Canada Research Chair, McMaster University. None declared for L.H.P. Braga and F. Farrokhyar.

- Accepted December 21, 2011.