Rethinking the Interpretation of Randomized Trials: A Reflection on CLOSURE-AF

A few days ago, the CLOSURE AF trial was published online in NEJM. It was accompanied by an editorial titled “another overused method in Cardiology”. Friends sent me links to social media statements, which included things like “the end of left atrial appendage closure” and various funny memes.

I read the main findings on my own and integrated them with prior evidence. Then I started thinking: am I watching a different movie? Do they live on a different planet/multiverse? Am I alone in the pursuit of applying the best standards of evidence-based practice?

Join me in this editorial addressing this specific “controversy”. We will start with the editorial key points. Then we will provide a short trial summary for those not familiar with the topic. Next, we’ll move to specific points of the study, focusing on a discussion of key methods. Finally, we will discuss how to properly integrate this trial with prior evidence. This integration will aid in making future evidence-based clinical decisions. We will also present a provocative proposal.

Disclosures: I will not be politically correct. This represents just my opinion. I do not speak on behalf of any institution/organization, only myself. I perform left atrial appendage closure (LAAC) procedures, proctoring, and consulting. However, this is not about whether LAAC should be used or not. This is about how to interpret trials overall.

Editorial key points

This trial employs a primary composite outcome that cannot be used for clinical decisions. It cannot even be used for trial design. To put this into perspective: if my clinical epidemiology students submit a research proposal with that composite outcome, they lose marks.
Non-inferiority margins that are expressed as relative risks and are not clinically justified are not recommended. That others did it in the past is not enough.
We (all, everyone) should probably abandon the rationale of using the “primary outcome” for main study interpretation. This is the most provocative topic of the day.
Proposal: every large RCT that is published should run (or update) a systematic review. It should also conduct a meta-analysis to provide more accurate evidence-based debate/conclusions.

CLOSURE AF Trial summary (as editors/readers usually prefer):

Key methods discussion

Composite endpoints

The primary combined endpoint was stroke (ischemic or hemorrhagic), systemic embolism, major bleeding (including periprocedural and longer-term non-procedural), and cardiovascular or unexplained death.

When you face a combined endpoint, ask yourself a question before seeing the results: Will this combined endpoint provide a result useful for clinical decision making?

To answer this question, you need to answer a few questions the same way I do with my students (based on Dr. Guyatt’s publications). Let’s do it together:

Are the outcomes of similar importance to patients (in technical terms, do the outcomes have similar utility/disutility)? In this case: Are stroke or CV death and major bleeding of similar importance to patients? How does this compare to systemic (non-stroke) embolism? Do patients perceive a major periprocedural bleed the same way as a major bleed that happens at home and provokes a new hospitalization?
Are all outcomes expected to have the same direction of effect? In this case: is the a priori expected direction of effect for stroke between the LAAC and control groups (mostly OAC) the same as for major bleeding, or for mortality? In other words, is the expected relative risk of stroke (which is already a mix of ischemic and hemorrhagic) in the same direction as that of major bleeding when one group receives mostly long-term OAC? Is the expected direction the same for periprocedural bleeding as for spontaneous bleeding during follow-up?…
Is the magnitude of effects similar across the outcomes? In this case, assuming that all outcomes are expected to move in the same direction, are these magnitudes similar? For example, are the relative risks of stroke, bleeding, and CV death expected to be the same?…
Is the frequency of events similar across outcomes? In this case, do strokes happen about as often as major bleeding, CV death, or systemic embolism?…
Is the underlying biology of the component endpoints similar enough that one would expect similar relative risk reductions? In this case: are the mechanisms of ischemic and hemorrhagic stroke similar? Or those of ischemic stroke and major bleeding?…

You found a great combined outcome if the answer to all these questions is “Yes.” This is seen less commonly than unicorns on the street. If the answer is “Yes” or “maybe Yes” to most of these questions (three, maybe two), you found a combined outcome that you can defend. If the answer to all of them is “no” or “probably no”, you can confidently ask “why are we combining these outcomes?”.

Any unbiased evaluator of this study would conclude that the answer to all these questions is “no” or “probably no”, because:

It is beyond question that there is an important difference in patient importance between stroke (EQ-5D utility around 0.55 at 3 months, 0.65 at 6 months, and about 0.66 beyond 6 months, not to mention a disabling stroke) and major bleeding (EQ-5D utility decrement of about −0.029, decreasing over time) or non-stroke systemic embolism (typical values used are in the range of −0.02 to −0.05).
The expected direction of effect for ischemic stroke (likely neutral if LAAC is effective) is different from that for major bleeding or hemorrhagic stroke (more likely to happen in chronic OAC users). Also, major bleeding mixes procedural bleeding (increased in the LAAC group) with non-procedural spontaneous bleeding after discharge (lower in the LAAC group). Authors (and many readers) would say “it’s OK to mix them because we want to see the net benefit”. The problem is that procedural bleeding usually involves a blood transfusion or a pericardial effusion needing drainage, which may or may not prolong the original admission. In contrast, major bleeding after discharge results in an urgent re-admission and subsequent procedures, not to mention an intracranial OAC-related bleed. Patients therefore perceive major bleeding after discharge as much worse, and more severe, than a major procedural bleed. As a result, the net effect is probably misleading.
The expected magnitude of effects is not uniform between CV death or stroke, where the effect is mild, and major bleeding, where the effect is large. We expect much less impact on stroke estimates than on bleeding.
The number of major bleeding events is expected to grossly exceed the number of strokes or systemic embolisms.
There is no question that the underlying biological mechanism of ischemic stroke differs completely from that of major bleeding. Indeed, they are opposites.

Despite all this, I did not see any comment on it in the editorial or on social media, nor was it raised in debates. This raises serious concerns. Trialists, editors, and consumers need to be more aware of misleading primary combined outcomes.

Non-inferiority margin selection

Let’s assume for a moment that the combined outcome selected was appropriate, or was not a composite outcome at all. Everybody knows that non-inferiority trial conclusions are heavily dependent on non-inferiority margin selection. In this case they stated HR 1.3, which amounts to saying “if we rule out being 30% worse, non-inferiority is met”. This is problematic, for a few reasons.

The credibility of the conclusion of a non-inferiority test is proportional to the rationale behind the selection of this margin. In the text, there is no clear explanation for using a 30% relative risk increase as an appropriate margin. The only justification given is that “other trials did it before.” Borrowing a threshold from prior trials is only an approximation, and it is not sufficient. It is even harder to support with such a complex combined outcome.

Ideally (or at least more frequently) non-inferiority margins should be presented as absolute risk differences. Why? Because that is what matters to patients. Patients do not (generally) value relative risk, they value absolute risk differences. It is true that using absolute margins has a “low-outcome rate” problem, since trials usually face “lower than expected outcome events”. But still, using just a relative risk is probably more problematic.

Relative risk alone can be misleading. A 30% relative change can represent important differences and can also represent non-important differences, depending on outcome event rates. At 3 years, most patients would value a 2% risk difference of stroke (NNH 50) as a minimally important difference. In other words, if the difference between treatments is larger than 2% at 3 years, the treatment is considered harmful. Now apply the HR 1.3 margin, that is, a 30% relative risk increase. If stroke happens in 1% of the control group and the upper confidence limit is 2% in the treatment group, the HR becomes 2.0. Non-inferiority would not be met, even though the 1% difference at 3 years is considered a non-important difference. Conversely, if the control group has a 10% stroke rate and the upper confidence limit in the treatment group is 12.5%, the difference is 2.5% (larger than the 2% that patients consider an important difference), but since the HR is less than 1.3 the treatment will be considered “safe or effective”.
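The two scenarios above can be checked with a few lines of arithmetic. The sketch below is only illustrative: it approximates the hazard ratio by a simple ratio of 3-year risks, and the function name `margin_verdicts` and its parameters are hypothetical, not taken from the trial. The 1.3 relative margin and the 2% absolute threshold are the ones discussed in the text.

```python
def margin_verdicts(control_risk, treatment_upper_risk,
                    relative_margin=1.3, absolute_margin=0.02):
    """Compare a relative and an absolute non-inferiority margin.

    Assumption: the hazard ratio is approximated by the ratio of 3-year
    risks, using the upper confidence limit of the treatment-group risk.
    """
    ratio = treatment_upper_risk / control_risk
    difference = treatment_upper_risk - control_risk
    return {
        "ratio": ratio,
        "difference": difference,
        # Relative rule: the upper limit of the ratio must stay below the margin.
        "meets_relative_margin": ratio < relative_margin,
        # Absolute rule: the upper limit of the difference must not exceed 2%.
        "meets_absolute_margin": difference <= absolute_margin,
    }


# Scenario 1: rare events (1% control risk, 2% upper limit with treatment).
low_rate = margin_verdicts(0.01, 0.02)
# Scenario 2: frequent events (10% control risk, 12.5% upper limit).
high_rate = margin_verdicts(0.10, 0.125)

for label, result in (("low event rate", low_rate), ("high event rate", high_rate)):
    print(f"{label}: ratio={result['ratio']:.2f}, "
          f"difference={result['difference']:.3f}, "
          f"relative OK={result['meets_relative_margin']}, "
          f"absolute OK={result['meets_absolute_margin']}")
```

In the first scenario the relative rule rejects a difference that patients would consider unimportant; in the second it accepts a difference that they would consider important. The two rules disagree in both directions.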

To conclude this point, these trials should use properly and transparently justified absolute risk difference thresholds. These thresholds should be based on what patients consider a minimal important difference in that scenario. On top of that, it is virtually impossible to come up with a threshold for such a “complex” combined primary outcome. Finally, to make this even more complicated, the proportional hazards assumption did not hold…

How to approach this controversy using best evidence-based standards

The following process is based on the GRADE approach. The developers of the GRADE approach will tell you “this is not the bible.” They say “there are other ways to do this…” However, as a user and part-time teacher with no responsibility to be politically correct, I find it clear that this is the most comprehensive approach, considering the limitations of clinical research. Every aspect has been carefully debated by hundreds of top-level clinical epidemiologists for many years. It is considered the international standard. It is adopted by the WHO, CDC, NHS, and many other leading organizations worldwide.

The research question:

First, the research question should be clear. In this case, it would be:

Population: patients with AF and increased risk of bleeding.

Intervention: LAAC

Comparator: Best medical treatment, including OAC (represented in 90%)

Outcome/s…

To select the outcomes, in simple terms think about efficacy and safety. For LAAC vs medical treatment (which is usually OAC):

Efficacy: ischemic stroke, and reduction in adverse events of long term OAC such as spontaneous bleeding (whether cerebral or other major bleeding).

Safety: periprocedural complications (mainly periprocedural major bleeding [including tamponade], stroke, major vascular complication), device embolization, device thrombosis. Mortality and cardiovascular mortality can be considered safety or efficacy depending on the hypothesis, so they are included anyway. For peri-device leak, it is not yet clear whether it is a complication or a non-effective treatment (likewise, is a failed PCI a complication or just inconclusive?).

First, outcomes are ranked based on their importance. Next, outcomes considered critical and important are included in the summary of findings table.

Summary of findings table

After a systematic review (that is, making sure you do not miss any eligible trial), each outcome is extracted independently from each trial. Why? Because clinical decisions are based on the balance of effects across all critical/important outcomes individually, so there is no need to combine outcomes. Note: In some scenarios where there is a strong suspicion of competing risks, combined outcomes may be used. Otherwise, tables are filled with individual outcomes.

Looking at the literature, the most up-to-date summary of findings table for this research question is from the 2025 SCAI-HRS clinical practice guideline. This was published before the OPTION trial and, of course, CLOSURE AF. If we update the most important outcomes, what does the body of the evidence state? Note: I pooled all the prior evidence into a single estimate to save time. It is probably not very different from what would be concluded if prior individual studies were pooled. This is just for illustration purposes.

All mortality

All stroke from enrollment to latest follow-up

Ischemic stroke

Non-procedural major bleeding

Procedural complications cannot be pooled against control group (“0 events”), but in CLOSURE AF 4% had procedural major bleeding (1.1% were tamponade), 1 TIA, 1 embolization, and 2 deaths within 7 days of the procedure.

What can be concluded from this evidence summary? Before jumping to conclusions, each outcome needs to be assessed for certainty. But roughly speaking, the totality of the evidence so far points towards no relevant changes in mortality (trend favoring), all stroke, or ischemic stroke (trend unfavorable), and a 50%ish reduction in non-procedural bleeding with LAAC.

Although we celebrate CLOSURE AF for increasing the precision of the estimates, this study does not add any relevant change in interpretation for any outcome. So, why so much “celebration” by some colleagues? I believe there are two main reasons:

Primary outcome believers

These people still believe (hopefully more balanced after reading this editorial comment) that studies should primarily be interpreted by the conclusion of the primary outcome. In my opinion, we should abandon that traditional rationale, because:

Primary outcomes are selected by investigators according to their understanding of the problem. However, that is absolutely anecdotal for clinical decisions. What if this same study, executed the same way, had just all-cause mortality as its primary outcome? Or all stroke? All look like reasonable primary outcomes, but the conclusions may vary a lot, even though it is still the same study. Indeed, when applying the GRADE approach, the rationale and the primary outcomes selected by investigators are not collected/assessed.

The most valuable role of investigators is to run the most valid (that is, unbiased) study. The larger the study the better, since it increases the precision of all outcomes as much as possible. They should include all patient-important outcomes related to the problem. Investigators must also do a great job in transparent and complete reporting. So, why do most people push investigators to focus conclusions solely on the primary outcome? This is true even for editors in chief of high-impact journals, not just the average doctor. Why do they ignore or understate the rest?

We inherited this rationale from the major threat to randomized trials: random error. It is common knowledge that if you run multiple tests, one may be positive despite there being no true causal association, just by chance. So, pre-specifying a primary outcome “protects us” from falling for spurious associations. Indeed, the FDA promotes the use of hierarchical testing. They should probably base decisions on the body of the evidence, where there is no hierarchical analysis, just the totality of the evidence. The use of primary outcomes and hierarchical testing is in reality a double-edged sword.
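The multiple-testing concern behind this rationale can be put into numbers. The sketch below assumes independent tests, each run at a two-sided alpha of 0.05, with no true effect anywhere. Independence is a simplifying assumption (outcomes within a trial are usually correlated), and the function name is hypothetical.

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent null tests is 'positive'."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 10, 20):
    chance = chance_of_any_false_positive(n_tests)
    print(f"{n_tests:>2} tests: {chance:.1%} chance of at least one false positive")
```

The chance of at least one spurious finding grows with the number of tests, which is the argument for pre-specification. Note that the chance is the same for each individual test, whichever one was labeled “primary”.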

What makes you think that a primary outcome selected by investigators will have less random error than other tests in the study? The likelihood of random error in the primary outcome is the same as for the rest of the outcomes, because it is absolutely impossible to know in advance where it will fall; it is just random. It is true that pre-specification avoids cherry-picking of outcomes and so on. But in the end we will base clinical decisions on tools that ignore this approach.

So, what is the weapon against random error? Unfortunately, we have only one solution. We must increase precision, whether through outcome events or observations, within a single study or by pooling all the existing evidence in a meta-analysis. It is the same as figuring out the chance of getting a 12 in roulette. To get the true precise chance (1 in 37, remember the “0”), you can increase observations from a single table in the same room (increasing outcome events in a single study), or you can include all the tables in the casino (meta-analysis).
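The roulette analogy can be simulated. The sketch below is a toy illustration, assuming a fair single-zero wheel with 37 pockets; the function names and the seed are arbitrary choices. It shows the standard error of the estimated chance shrinking as spins accumulate, regardless of whether the extra spins come from one table or from many tables pooled together.

```python
import math
import random

TRUE_CHANCE = 1 / 37  # single-zero wheel: pockets 0 to 36


def standard_error(n_spins, p=TRUE_CHANCE):
    """Standard error of an estimated proportion after n_spins observations."""
    return math.sqrt(p * (1 - p) / n_spins)


def estimate_chance_of_12(n_spins, seed=0):
    """Spin a simulated fair wheel n_spins times and return the share of 12s."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_spins) if rng.randrange(37) == 12)
    return hits / n_spins


for n_spins in (100, 10_000, 100_000):
    estimate = estimate_chance_of_12(n_spins)
    print(f"{n_spins:>7} spins: estimate={estimate:.4f}, "
          f"standard error={standard_error(n_spins):.4f}")
```

Multiplying the number of spins by 100 cuts the standard error by a factor of 10, which is why pooling events across trials matters more than which outcome was labeled primary.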

Confirmation bias

When people encounter a study that favors their prior beliefs, they tend to be less critical. This is in contrast to how they react to studies that challenge those beliefs. We often observe this on social media. Algorithms increase confirmation bias by showing content aligned with your thinking. This leads to polarized opinions. This study fits perfectly for physicians skeptical about LAAC. This is totally expected. Not much effort is made to find limitations, even though they are obvious. The point is that we should be equally critical whether or not results favor our prior beliefs.

What do these people think about all the information presented above? Is it a valid primary composite outcome for clinical decisions? Did the CLOSURE AF trial change the totality of the evidence on patient-important outcomes using best standards? The answer to these two questions is probably “No”, regardless of your prior beliefs.

This is the reason why I sometimes feel isolated in this journey, but I do not lose hope. I have heard many say “people do not change”, and that is misleading: people do change, but with difficulty. Entropy (second law of thermodynamics) indicates that things go from complex to less complex forms. However, how do you explain the appearance of life (and later intelligent life) on Earth? Because change happens, with difficulty and despite the odds. Same for people.

A proposal

A primary outcome does not need to be indicated. Nor is a conclusion based solely on the present study versus prior isolated trials necessary. In the end, most trial conclusions will be misleading compared to the totality of existing evidence. The body of existing evidence is the standard for evidence-based practice.

To me, it would be much more informative if investigators attached a systematic review. They should include an updated meta-analysis using the current RCT information. This is the basis for drawing conclusions. Otherwise, other people will have to generate the new body of evidence for clinical decisions. This usually happens with some delay, and people will jump to conclusions before that happens.

The abstract that I would prefer to read

In the CLOSURE-AF randomized trial, 912 patients with atrial fibrillation at high risk of stroke (mean CHA₂DS₂-VASc 5.2) and bleeding (mean HAS-BLED 3.0) were assigned to left atrial appendage closure (n=446) or physician-directed medical therapy (n=442) and followed for a median of 3 years. In terms of patient-important outcomes, all stroke occurred in 27 patients in each group (6.1% vs 6.1%), with ischemic stroke in 18 (4.0%) vs 15 (3.4%) and hemorrhagic stroke in 10 (2.2%) vs 13 (2.9%), respectively. Systemic embolism occurred in 3 patients (0.7%) vs 1 (0.2%). Any major bleeding (procedural and non-procedural) occurred in 70 patients (15.7%) in the device group (including 18 procedure-related events) and 61 patients (13.8%) in the medical group. Cardiovascular or unexplained death occurred in 99 patients (22.2%) vs 81 (18.3%). Serious adverse events occurred in 368 patients (82.5%) vs 342 (77.4%). Periprocedural complications within 7 days occurred in 5.7% of device patients (25/446) and included major bleeding requiring transfusion in 18 (4.0%), pericardial tamponade in 5 (1.1%), device embolization requiring surgery in 1 (0.2%), procedure-related TIA in 1 (0.2%), peripheral embolism in 1 (0.2%), and death in 2 patients (0.4%). We performed an updated meta-analysis using current results and there is no major change in patient-important outcomes interpretation.
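For readers who prefer absolute differences, the counts in the abstract above can be turned into crude risk differences. This is a rough sketch, not the trial’s analysis: it ignores time to event, uses a simple Wald 95% interval (unreliable for very rare outcomes such as systemic embolism), and the names `EVENTS` and `risk_difference` are hypothetical. The counts and denominators are the ones quoted above.

```python
import math

N_DEVICE, N_MEDICAL = 446, 442

# (events in the device group, events in the medical group), as quoted above.
EVENTS = {
    "All stroke": (27, 27),
    "Ischemic stroke": (18, 15),
    "Hemorrhagic stroke": (10, 13),
    "Systemic embolism": (3, 1),
    "Any major bleeding": (70, 61),
    "CV or unexplained death": (99, 81),
}


def risk_difference(events_a, n_a, events_b, n_b, z=1.96):
    """Crude risk difference (a minus b) with a Wald confidence interval."""
    p_a, p_b = events_a / n_a, events_b / n_b
    rd = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return rd, rd - z * se, rd + z * se


for outcome, (device_events, medical_events) in EVENTS.items():
    rd, low, high = risk_difference(device_events, N_DEVICE,
                                    medical_events, N_MEDICAL)
    print(f"{outcome}: {rd:+.1%} (95% CI {low:+.1%} to {high:+.1%})")
```

Positive values mean more events in the device group. Each outcome is reported on its own, so the reader can weigh it by its importance to patients.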

Future perspective for LAAC

In less than 10 days the CHAMPION AF trial will be published, and later, in 2026/27, the CATALYST trial as well. Hundreds of events will be added to the meta-analysis. The totality of the evidence is likely to be dramatically impacted. So, nothing shown today really matters in terms of LAAC.

Yet, I do think this matters in terms of how people design trials and how consumers read them, especially in terms of primary composite outcomes. I believe that in the future we will abandon the primary composite outcome design. We will finally globally acknowledge and accept the totality of the evidence for clinical decisions.

Pablo Lamelas MD MSc