EXCEL 5-years: Trial-Based Vs Evidence-Based Medicine

Pablo Lamelas

At TCT 2019, Dr Stone reported the 5-year results of the EXCEL trial, which triggered considerable controversy at TCT, at EACTS, on social media, etc. Does this 5-year report of EXCEL change clinical practice for patients with severe left main coronary artery disease? Stay tuned for this methodological review.

Note: For background, in early 2019 I wrote an article about the EXCEL trial (vs the NOBLE trial) in which the limitations of both studies were addressed. Interestingly, many points raised in that article turned out to be very relevant for interpreting the 5-year results: the increase in non-cardiovascular death with PCI, the violation of the proportional hazards assumption, and even the impact of peri-procedural myocardial infarction.

EXCEL 5-years summary

Among patients with severe left main coronary artery disease, at 5 years the composite of death, stroke or myocardial infarction occurred in 22.0% of the PCI group vs 19.2% of the CABG group (difference 2.8 percentage points; 95% CI −0.9 to 6.5; P=0.13). Death from any cause: 13.0% PCI vs 9.9% CABG (difference 3.1 percentage points; 95% CI 0.2 to 6.1). Definite cardiovascular death: 5.0% PCI vs 4.5% CABG (difference 0.5; 95% CI −1.4 to 2.5). Myocardial infarction: 10.6% PCI vs 9.1% CABG (difference 1.4; 95% CI −1.3 to 4.2). All cerebrovascular events (stroke or transient ischemic attack) were less frequent after PCI (3.3% vs 5.2% with CABG; difference −1.9; 95% CI −3.8 to 0), while stroke was not significantly different between the two groups (2.9% PCI vs 3.7% CABG; difference −0.8; 95% CI −2.4 to 0.9). Ischemia-driven revascularization was more frequent after PCI than after CABG (16.9% vs 10.0%; difference 6.9; 95% CI 3.7 to 10.0). The authors conclude that “in patients with left main coronary artery disease of low or intermediate anatomical complexity, there was no significant difference between PCI and CABG with respect to the rate of the composite outcome of death, stroke, or myocardial infarction at 5 years” (NEJM 2019).

Methodological review

Peri-procedural myocardial infarction definition

This created a lot of controversy. Peri-procedural myocardial infarction was defined as a CK-MB increase of x10 or more on its own, or a CK-MB increase between x5 and x10 together with other evidence of myocardial infarction (Q-waves on ECG, supporting angiographic evidence, or imaging showing new loss of viable myocardium). The question is: is this a valid way to measure myocardial infarction?

First, none of the dozens of myocardial infarction definitions used in trials or suggested by different societies in the last 20 years is clearly superior to the others (in other words, there is no “gold standard definition”). Why? In part because we cannot reach FULL consensus on what is a myocardial infarction and what is not. Is myocardial infarction the death of at least one myocardial cell because of impaired blood supply (the most sensitive end of the myocardial infarction spectrum, which when small can remain undetected, even at autopsy, and have little or no impact on clinical outcomes)? Or is it the minimum degree of myocardial necrosis large enough to have a relevant impact on other outcomes (the other end of the spectrum)?

I believe that when we are talking about choosing treatments for patients (PCI vs CABG in this case), we need to focus on patient-important outcomes, measured in a way that makes them relevant by themselves, which usually means they have an important prognostic impact for patients. There is no question that myocardial infarction is a patient-important outcome from the trialist, physician and patient perspectives.

Also, a very important point: does a “biomarker-oriented” myocardial infarction definition measure only “biomarker-only myocardial infarctions”? Of course not. In EXCEL: 1) 15 of the 90 peri-procedural myocardial infarctions (PCI + CABG) had CK-MB x5 plus Q-wave, angiographic or imaging evidence supporting the myocardial infarction, so they met additional criteria beyond a biomarker elevation. 2) The 75 patients who had CK-MB of at least x10 (x10 was only the threshold; I cannot find the mean CK-MB of this subgroup in any paper) did not necessarily lack other evidence of myocardial infarction. Very likely the majority of them would have shown other evidence (not just a biomarker elevation) had it been looked for (and I am not talking about systematic MRI), but it was not required to meet the trial definition.

So, we need to dive into “advanced” validity language to answer this question. Does the EXCEL definition have face validity? Yes: with a CK-MB elevation of that degree (x10 or more), or a lower one associated with other evidence of myocardial infarction, it makes sense that we are measuring myocardial infarctions. Does it have criterion validity? Yes: this CK-MB x10 threshold for peri-procedural myocardial infarction has been matched with late gadolinium enhancement on MRI, the “gold standard” for myocardial necrosis, in both the post-CABG and post-PCI literature. Does this EXCEL definition have prognostic validity? Yes. A recent publication on peri-procedural myocardial infarction in EXCEL reported that the risk of cardiovascular death was about three times higher in patients with a peri-procedural myocardial infarction by this definition than in those without (10.2% vs 3.8%; unadjusted HR 2.84, 95% CI 1.42–5.71, p = 0.002; adjusted HR 2.63, 95% CI 1.19–5.81, p = 0.02). This is supported by a large body of prior literature, and by recent reports as well, with better performance than troponin x70.

So, if this definition fits all these validity domains, why do many keep insisting that the EXCEL definition is not valid? Probably because of prior reports in which smaller biomarker elevations, smaller analyses, insufficient follow-up (or other lower-quality methodological aspects) showed little or no impact on outcomes. In other words: the bad reputation of biomarker-based definitions.

Both PCI and CABG can cause a myocardial infarction, the outcome we want to prevent, so it MUST be measured somehow. We all agree that applying the spontaneous myocardial infarction definition (enzymes, chest pain and ECG changes) to post-CABG patients will overdiagnose myocardial infarctions (all patients have chest pain, and many have elevated enzymes or ECG changes [often real transient ischemic changes that do not prompt catheterization unless the patient is unstable]). The EXCEL definition fits many validity domains, so what is the strong rationale against it?

Mortality increase with PCI

Interestingly, EXCEL showed a statistically significant increase in all-cause mortality with PCI, which was driven (at least in part) by non-cardiovascular death. Is this real?

Some background on mortality outcome

Please note that non-cardiovascular deaths (in EXCEL and many other large, well-conducted trials) need a clear cause of death (like pancreatic cancer, traffic accident, appendicitis). Deaths of undetermined cause (typical example: the patient was found dead at home in the morning) are counted as cardiovascular deaths, but not as definite cardiovascular deaths.

A methods purist may say that you should measure (or build your research question around) the outcome that your intervention is supposed to affect. It is pretty clear that with cardiovascular interventions we want to modify cardiovascular death, not non-cardiovascular death (which should be unaffected). Using all-cause mortality therefore brings noise (systematic or random error) into the equation, in this case for the most important outcome for patients (survival).

A good reason why many cardiovascular studies use all-cause mortality is that it makes adjudication a little easier: dead or alive, not much thinking. A not-always-so-good reason to use all-cause mortality instead of cardiovascular mortality has to do with increasing the statistical power of the trial: more events, more power. But this is much more complicated: statistical power is not the same for a cardiovascular trial (say EXCEL) with 300 cardiovascular deaths (using a cardiovascular death-only outcome) as for the same trial with 300 all-cause deaths (using an all-cause mortality-only outcome). Although both statistical analyses have 300 events, the latter has less statistical power. Why?

Because adding noise reduces statistical power, it is that simple. As Dr Botto (one of my mentors) frequently teaches: a statistical test is a battle between signal and noise (equation: signal/noise). Adding non-cardiovascular deaths to the primary outcome (as all-cause mortality) in a cardiovascular trial not only brings noise (random error), but also dilutes the signal (systematic error) by mixing two point estimates: in cardiovascular trials the theoretical or expected relative risk of non-cardiovascular death is “always” 1.00, whereas the estimate for the cardiovascular outcome is different from 1.00 (unless the trial tests equivalence or non-inferiority, like EXCEL, which makes this even more complex to read). Thus, the larger the incidence of non-cardiovascular deaths (or their proportion of total deaths), the more the point estimate is pushed towards the null, that is, no difference. And if you have small numbers, you are also at risk of a random high (harm) or a random low (benefit) in all-cause mortality, driven by an outcome whose true relative risk is 1.00.
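The dilution argument above can be illustrated with a little arithmetic. The sketch below is only an illustration under stated assumptions: the control-arm risks and the cardiovascular relative risk are hypothetical numbers (not EXCEL data), the function name is made up for this example, and cardiovascular and non-cardiovascular risks are assumed to simply add up.

```python
def all_cause_relative_risk(cv_risk, non_cv_risk, cv_rr, non_cv_rr=1.0):
    """All-cause relative risk when cardiovascular (CV) and non-CV deaths
    are combined into one outcome. Assumes the two risks are additive."""
    control_risk = cv_risk + non_cv_risk
    treated_risk = cv_risk * cv_rr + non_cv_risk * non_cv_rr
    return treated_risk / control_risk


# HYPOTHETICAL inputs, chosen only for illustration (not trial data):
CV_RISK = 0.06  # control-arm risk of CV death
CV_RR = 0.70    # true treatment effect on CV death

# Non-CV deaths are unaffected by treatment (relative risk 1.00).
for non_cv_risk in (0.00, 0.03, 0.06, 0.12):
    share = non_cv_risk / (CV_RISK + non_cv_risk)
    rr = all_cause_relative_risk(CV_RISK, non_cv_risk, CV_RR)
    print(f"non-CV share of control-arm deaths {share:.0%} -> all-cause RR {rr:.2f}")
```

With the cardiovascular effect held fixed, the all-cause relative risk moves closer to 1.00 as the share of non-cardiovascular deaths grows, which is the systematic part of the problem. The random part (noise from small numbers) is not modeled here.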

This argument favors abandoning all-cause mortality in clinical trials of cardiovascular disease and focusing on cardiovascular mortality (especially a sensitive definition that includes both definite cardiovascular deaths and deaths of undetermined cause, as EXCEL did), unless there is an additional reason not discussed here.

Mortality results

The EXCEL 5-year results showed an increase in all-cause mortality with PCI (13.0% vs 9.9%; OR 1.38, 95% CI 1.03 to 1.85). Of this 3.1 percentage-point difference between PCI and CABG, 2.0 points (65% of the difference) were driven by non-cardiovascular deaths alone. When the causes of death are analyzed separately (cardiovascular death, definite cardiovascular death, undetermined cause, or non-cardiovascular), there are no “statistically” significant differences (based on the 95% CIs). So, what do we do now? Overall deaths favor CABG, but we cannot really be sure what drives that. Do we believe it or not?
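For readers who want to check the arithmetic behind the 65% figure, here is a minimal sketch. It uses only the percentages quoted above; the variable names are mine.

```python
# Figures quoted in the text above (5-year results, in percent)
pci_all_cause = 13.0
cabg_all_cause = 9.9
non_cv_points = 2.0  # part of the difference attributed to non-CV deaths

# Absolute difference in all-cause mortality (percentage points)
all_cause_points = pci_all_cause - cabg_all_cause

# Share of that difference explained by non-cardiovascular deaths
non_cv_share = non_cv_points / all_cause_points

# What is left for cardiovascular and undetermined causes
other_points = all_cause_points - non_cv_points

print(f"All-cause difference: {all_cause_points:.1f} points")
print(f"Non-CV share of the difference: {non_cv_share:.0%}")
print(f"Remaining difference: {other_points:.1f} points")
```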

When we talk about the uncertainty (or certainty) of a causal effect, it is not a yes or no, it is a spectrum. Applying the Bradford Hill criteria for causation, some points are not met.

Biological plausibility: On one hand, one can perfectly well hypothesize that PCI has higher cardiovascular mortality than CABG (as shown in other areas of the literature, like extensive three-vessel disease and diabetes). But cardiovascular mortality (including deaths of undetermined cause) was not statistically different in this specific trial, which used a sensitive definition. On the other hand, is the increase in non-cardiovascular deaths with PCI plausible? Unlikely. Indeed, if I had to “pre-hypothesize”, a post-CABG patient is more likely than a PCI patient to die from other causes (pneumonia, sepsis from other sources, pneumothorax, etc) or from longer-term conditions induced by the initial CABG.
Consistency: This mortality increase was not observed in the long-term outcomes of other randomized trials, including the left main cohort of SYNTAX, PRECOMBAT or NOBLE. As for non-cardiovascular deaths, there are no conclusive reports of that phenomenon in the PCI vs CABG literature.
Strength: We are talking about a relatively weak association. I know that a 38% relative increase in death is important for patients, physicians and stakeholders, but from the causation point of view it is a weak association, which does not help in this case. Note: many clearly causal associations do not meet this criterion (or others, like specificity or dose-response), but it is worth mentioning.

The only two criteria that are clearly met are temporality (the cause precedes the effect) and experimental design (a randomized trial). The latter is very important, and that’s why we are having this debate, but trials may be wrong sometimes. It is important to mention that the all-cause mortality estimate is quite imprecise, with a wide 95% CI (1.03 to 1.85), which increases the possibility of a chance finding. Imprecision from a small number of events can lead to wrong conclusions due to random error, including the “statistically significant” ones.
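To get a feeling for how close to the threshold this estimate sits, one can back-calculate an approximate standard error from the reported interval. This is a rough sketch, not the trial's own analysis: it assumes the 95% CI is a Wald-type interval, symmetric on the log scale, and the function name is mine.

```python
import math


def z_and_p_from_ratio_ci(point, lower, upper, z_crit=1.96):
    """Approximate SE, z statistic and two-sided p-value for a ratio
    estimate (OR, RR, HR), ASSUMING its CI is symmetric on the log scale."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(point) / se
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# All-cause mortality estimate quoted in the text: OR 1.38 (95% CI 1.03 to 1.85)
se, z, p = z_and_p_from_ratio_ci(1.38, 1.03, 1.85)
print(f"SE(log OR) ~ {se:.3f}, z ~ {z:.2f}, two-sided p ~ {p:.3f}")
```

Under these assumptions the z statistic lands only slightly above the conventional 1.96 cut-off, which is another way of saying that the lower bound of the interval sits very close to 1.00.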

If this is a chance finding, what is the methodological tool to reduce random noise using aggregate data from studies? Meta-analysis. If this increase in non-cardiovascular death or all-cause death were real, it should be visible in a meta-analysis of randomized trials. If it is a chance finding from EXCEL, then it will be balanced by the other studies, which “controls” for random error.
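How pooling tames a single noisy estimate can be sketched with a fixed-effect, inverse-variance meta-analysis. Everything in the example is an assumption made for illustration: the four “trials” and their odds ratios and standard errors are HYPOTHETICAL (they are not the results of EXCEL, SYNTAX, PRECOMBAT or NOBLE), and a real analysis would probably also consider a random-effects model and heterogeneity.

```python
import math


def pool_fixed_effect(log_estimates, standard_errors):
    """Fixed-effect inverse-variance pooling of log odds ratios."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, log_estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# HYPOTHETICAL trials: name -> (odds ratio, SE of the log odds ratio)
trials = {
    "Trial A": (1.38, 0.15),  # the "significant" one
    "Trial B": (0.95, 0.18),
    "Trial C": (1.05, 0.20),
    "Trial D": (0.90, 0.22),
}

log_ors = [math.log(odds_ratio) for odds_ratio, _ in trials.values()]
ses = [se for _, se in trials.values()]

pooled_log_or, pooled_se = pool_fixed_effect(log_ors, ses)
lower = math.exp(pooled_log_or - 1.96 * pooled_se)
upper = math.exp(pooled_log_or + 1.96 * pooled_se)
print(f"Pooled OR {math.exp(pooled_log_or):.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

In this made-up scenario the pooled estimate is pulled back towards 1.00 and its standard error is smaller than that of any single trial. With real data the answer could of course go either way, which is exactly why the pooling needs to be done.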

Besides an updated meta-analysis, which in my prediction (eyeball statistics) will not show that PCI increases non-cardiovascular mortality and is very unlikely to show increased 5-year all-cause mortality vs CABG either, in this article we have raised other points that make this association less likely to be true. Still, the truth is unknown.

Cerebrovascular events

Not surprisingly, there was an increase in cerebrovascular events (stroke and transient ischemic attacks) in CABG patients. Why is this not surprising to me? Because the same was observed in the SYNTAX trial, the effect appears early in the Kaplan-Meier curves (periprocedural, after which the curves become parallel, as expected), and it is biologically plausible (aortic clamping, longer hypotension, or atrial fibrillation). One may argue that NOBLE observed the opposite, and that’s fair, but there the cerebrovascular event curves diverged over time (why?). Not to mention that meta-analyses of CABG vs PCI trials have reported more strokes in the CABG group. I would love to see an updated meta-analysis of stroke including NOBLE to be certain. Compared to the large SYNTAX trial, EXCEL observed fewer definite strokes, likely related to better surgical techniques and improvements in post-procedure care, or simply less sick patients, or a combination.

Ischemia-driven revascularization

This aspect was described in our prior post, but in brief: repeat revascularization in PCI vs CABG trials is a problematic outcome because of confounding by indication, patient preferences, the availability of revascularization targets, and its uncertain real impact (and its magnitude) on other patient-important outcomes.

Impact of SYNTAX score

As noted in our prior article, about a third of the EXCEL cohort had a core-lab-reported SYNTAX score over 32, even though this was an exclusion criterion and prior literature, including 10-year data from the SYNTAX trial, suggests superiority of CABG in that range. In EXCEL there was a clear trend towards more benefit from CABG as the SYNTAX score increased, but the interaction p-value was not reported (by the eyeball test it does not seem significant). Taking this into consideration, together with prior literature suggesting superiority of CABG in diffuse coronary disease and the lack of mortality differences in other left main trials (SYNTAX at 10 years, PRECOMBAT at 5 years and NOBLE at 5 years), I wonder whether in EXCEL a subtle subgroup effect may be driving part of the 5-year data interpretation.

What would the results have been if the study had enrolled only patients with SYNTAX scores under 32 and been powered for that alone? Nobody has the answer, but I guess we would not be having this controversy then. Meta-analysis (especially of individual-patient data) can clarify this better. EXCEL is a clear example of the importance of a core laboratory to standardize measurements prone to error.

Final thoughts

Evidence-Based Medicine is broadly defined as the integration of research into medical decision making. So, practicing medicine by integrating the EXCEL results is a form of Evidence-Based Medicine. But when we dig a little deeper into the proper way to apply Evidence-Based Medicine, we need to consider and critically appraise the whole body of existing evidence on a specific research question (or topic) for clinical decision making. Very often we (doctors) see a new large trial published and quickly jump to conclusions about changing practice, when this is much more complex.

Sometimes it is difficult to differentiate between the historical or classical way of running and interpreting experiments (primary hypothesis, p-values, multiple testing, main study interpretation, primary, secondary, tertiary outcomes, etc) and the current process of Evidence-Based Medicine (accumulate the evidence and critically appraise it).

For example: the trial tested non-inferiority at 3 years, but not at 5 years, since the latter was “exploratory”. If the study did a good job with the 5-year follow-up (and it did indeed do a great job), then when summarizing the available information on this topic we will pick these outcomes for long-term efficacy, regardless of the investigators’ primary objective. If only a selected, smaller group had been followed between 3 and 5 years, then those data would not be “excluded” when appraising the evidence, but that limitation would be taken into account. The same goes for p-values: the journal probably pushed for not reporting many of the p-values, yet in the end, when applying Evidence-Based Medicine, we will extract the numbers, pool them if necessary, and obtain the estimates and the precision we need for decision making.

“Science is meant to be cumulative, but researchers usually don’t cumulate scientifically”
Sir Ian Chalmers.

Important note: you may be thinking “okay, this is an interventional cardiologist, he has conflicts of interest”. I did my best to be as objective as possible, and any doctor living on earth or in other parts of the multiverse can easily be accused of bias, especially given the unconscious part of it (or beliefs). In my favor, before the COMPLETE trial was presented, I published a systematic review with meta-analysis in which I expressed my skepticism that complete revascularization (doing more PCI) reduced mortality by 50% (with many similarities to this EXCEL topic, including the small number of events and the lack of plausibility of such an effect). Luckily, the very large COMPLETE trial was published later, showing no difference in cardiovascular death and, given the many methodological limitations of the prior literature, this is likely the truth. So, people who know me know I am not a robot that wants more PCI for patients, but someone trying to find the best treatment for them.