Understanding the ACURATE neo2 TAVI Withdrawal: Key Insights

Introduction

Last week, Boston Scientific announced the withdrawal of the ACURATE neo2 TAVI series from the market. While undoubtedly a difficult decision, it was likely the right one. Gaining regulatory approval in the future would require significant investment and modifications to the platform (some already addressed with the PRIME version), and would still carry the risk of replicating the same unfavorable outcomes. However, I can't shake the feeling that the core issue isn't the valve itself, but how it was tested.

This editorial is divided into two sections: first, a summary of the background; and second, a deeper reflection on how the prosthesis was studied and how it might have been studied with a different approach. It's important to note that my observations are not intended to undermine the efforts of other researchers and those related to the industry, but rather to highlight the complexities of testing medical devices.

Background

In Europe, two investigator-initiated trials (SCOPE I and II) tested the original ACURATE neo valve against Sapien and Evolut. These trials were relatively small and underpowered to detect minimally important differences for patient-centered outcomes. They failed to demonstrate non-inferiority using composite endpoints that mixed efficacy, safety, and even surrogate outcomes (especially in SCOPE I).

Later, the larger U.S.-based ACURATE neo IDE trial sought FDA approval using the updated neo2 version, which primarily featured an improved sealing skirt. Presented at TCT 2024, this trial also failed to demonstrate non-inferiority to Sapien or Evolut. Death, stroke, and hospital readmissions were higher with ACURATE neo2, and the curves separated early, within 30 to 90 days.

The under-expansion hypothesis

Post hoc analysis of the ACURATE IDE trial suggested that one out of five patients had under-expansion of the frame at the end of the procedure, and this was associated with an increased risk of events over time. Patients without under-expansion had outcomes similar to the control group.

Under-expansion of the ACURATE neo2 usually results from poor patient selection and/or insufficient experience to detect and properly manage it (i.e., post-dilate). However, sometimes (usually in fewer than 1 out of 5 patients at experienced centers), under-expansion persists despite high-pressure post-dilatation, no matter what you do (this also happens with other platforms). This hypothesis remains the investigators' main explanation for the failure to demonstrate non-inferiority.

However, the subgroup analysis has a major limitation. The comparison is between a subset of ACURATE neo2 patients (those with under-expansion, roughly 20%) and the full control group, which may unknowingly include anatomies that would also have led to under-expansion had ACURATE been used. Conversely, if the control group were stripped of the roughly 20% of patients whose anatomies would have produced an under-expanded ACURATE neo2 (anatomies that may also result in poorly expanded Sapien or Evolut valves, likewise a risk factor for adverse outcomes during follow-up), the remaining control patients might perform even better than the properly expanded ACURATE neo2 subgroup. This undermines the validity of attributing the observed differences solely to the valve. Interestingly, many who present or discuss this analysis do not mention this important limitation; indeed, I had to figure it out myself.
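To make this bias concrete, here is a minimal numeric sketch in Python. All rates are made-up placeholders (not trial data), under the assumption that under-expansion-prone anatomies carry the same excess risk regardless of which valve is implanted:

```python
# Hypothetical illustration only: rates are invented, not from any trial.
p_good = 0.05   # event rate in anatomies that expand well (with either valve)
p_bad = 0.15    # event rate in under-expansion-prone anatomies (either valve)
f_bad = 0.20    # share of under-expansion-prone anatomies (~1 in 5)

# The full control arm unknowingly mixes both anatomy types
control_full = (1 - f_bad) * p_good + f_bad * p_bad   # blended event rate

# The post hoc subgroup comparisons
acurate_underexpanded = p_bad    # compared against control_full: looks bad
acurate_well_expanded = p_good   # compared against control_full: looks fine

# The fair, anatomy-matched comparator: control WITHOUT the bad anatomies
control_well_matched = p_good
```

Under these made-up numbers, the well-expanded ACURATE subset (5%) even beats the full control arm (7%) despite the valve conferring zero benefit; once the control arm is restricted to matching anatomies, that apparent advantage vanishes. The subgroup comparison conditions on a post-randomization variable in one arm only.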

On top of all that, nearly all the experienced operators I consulted cannot easily explain in simple words why the IDE trial failed, even after the presentation of the under-expansion hypothesis. To give some examples: everyone observed that balloon-expandable valves produce higher gradients than self-expanding ones; logic and trials support that, fine. Everyone observed that self-expanding prostheses have more paravalvular leak; logic and trials demonstrate that, fine. Here, I cannot see a crystal-clear explanation of why ACURATE neo2 performed worse than regular practice, and that opens the door to other hypotheses.

I understand that outcome event rates were low and these differences can be easily overlooked in usual clinical practice (that's why, in part, we need some large, simple clinical trials), but the RCT data also look odd. Indeed, the IDE trial reported a statistically significant increase in stroke (an absolute 2.3% more, from 2.1% to 4.4%), which was pointed to as a potential culprit/mediator of the increased events. However, in both SCOPE I and II, the ACURATE neo had numerically fewer strokes than Sapien and Evolut at 1 year. If you meta-analyze both SCOPE trials at one year, you get 3.2% vs 4.5% favoring ACURATE neo (OR 0.70; 95% CI 0.42–1.19), and if you include the IDE trial in the meta-analysis (that is, the body of the evidence, the totality of the evidence), you get 66 vs 60 events (OR 1.10; 95% CI 0.77–1.57), far from anything important. Also, the heterogeneity increases substantially (the I² of the SCOPE trials is 0% for stroke, and it rises to 63% after including the IDE trial).
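For readers who want to reproduce this kind of pooling, below is a minimal fixed-effect (inverse-variance) meta-analysis sketch in Python. The 2x2 counts in the example are illustrative placeholders, not the actual SCOPE/IDE stroke tables:

```python
import math

def or_ci(a, b, c, d):
    """Odds ratio and 95% CI from a 2x2 table:
    a/b = events/non-events in arm 1, c/d = events/non-events in arm 2."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    return math.exp(log_or), (math.exp(log_or - 1.96*se), math.exp(log_or + 1.96*se))

def pool_fixed(tables):
    """Inverse-variance fixed-effect pooled OR, plus I^2 from Cochran's Q."""
    logs = [math.log((a*d) / (b*c)) for a, b, c, d in tables]
    ws = [1 / (1/a + 1/b + 1/c + 1/d) for a, b, c, d in tables]  # 1/variance
    pooled = sum(w*l for w, l in zip(ws, logs)) / sum(ws)
    q = sum(w*(l - pooled)**2 for w, l in zip(ws, logs))
    df = len(tables) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return math.exp(pooled), i2

# Placeholder counts (events, non-events per arm) for two hypothetical trials
trials = [(12, 360, 16, 350), (10, 290, 14, 280)]
pooled_or, i2 = pool_fixed(trials)
```

Adding a discordant trial to a set of concordant ones is exactly what drives Q (and hence I²) up, which is the pattern described above when the IDE trial joins the SCOPE pair.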

So, beyond under-expansion, what else might explain the results? Why so much heterogeneity in stroke between the SCOPE series and the IDE trial (beyond chance, the ever-present threat in RCTs)?

Operator experience

Every seasoned ACURATE neo2 operator knows two things:

1. This isn’t a “workhorse” valve. Patient selection is crucial and requires experience.

2. The learning curve is longer than for other new valves.

Operators familiar with Sapien adapt quickly to Myval. Those trained in Evolut adapt easily to Portico, Navitor, Vitaflow, etc. However, ACURATE neo is unique. It requires different considerations for planning, implant, and troubleshooting.

From my personal experience:

– I felt confident with Sapien by case 50–100, and with Myval by case 3–5.

– I was comfortable with Evolut around case 50–100, and with Portico/Vitaflow by case 3–5.

– With ACURATE neo, it took me no fewer than 30 to 50 cases to feel comfortable.

It is normal for a new valve to have a longer learning curve than pivotal valves like Sapien or Evolut; most centers went through those learning curves more than a decade ago. I see no reason why other operators would find that ACURATE neo (a totally different valve from the pivotal ones) has the same learning curve as other new valves on the market that closely resemble the pivotal ones. In the IDE trial, most operators had zero prior experience with ACURATE neo or neo2. Furthermore, how many implants did each center perform even after finalizing enrollment in the trial? Likely fewer than 50 for most centers. In this context, is it surprising that results were suboptimal?

Reflections on Evidence-Based Device Testing

The power of the research question

When we teach research methods to students, we cannot emphasize enough how important it is to work thoroughly on the research question. It is probably the most important step: it frames the clinical problem and guides how the trial is designed, executed, analyzed, and finally disseminated.

But why is the research question important here? Are we interested in the results of this valve at centers with enough experience (i.e., those that completed the learning curve, such as centers in Europe or Latin America)? Or are we interested in testing the results among centers with no prior experience?

Question 1: Among patients with severe symptomatic aortic stenosis candidates for TAVI, is TAVI with ACURATE neo2 comparable to commercially available valves for the primary outcome?

Question 2: Among patients with severe symptomatic aortic stenosis candidates for TAVI, is TAVI with ACURATE neo2 by centers without a complete learning curve comparable to commercially available valves for the primary outcome?

Those are two very different questions when a learning curve is present. Of course, the latter seems useless from the regulatory point of view, since once the valve is approved all centers will eventually complete the learning curve. Yet the design of the ACURATE neo IDE trial seems more aligned with testing the valve at centers that had not completed the learning curve, since centers started enrolling/randomizing from the very first patient they implanted. How likely is it that results from inexperienced centers would match the favorable "real-world" results the company was promoting (specifically from Europe)?

The internal validity problem

When the actual conditions of a study (whether the population, the intervention, the comparator, or the outcomes) deviate from those planned in the research question, we encounter internal validity issues. When an intervention requires a learning curve or a degree of expertise, not only to select the patient but also to plan and to execute the procedure effectively and safely, internal validity is at risk if the learning curve is not completed before patients are randomized.

Clinical Randomized Trials for devices: One size doesn’t fit all

In cardiology, we are used to evaluating clinical trials testing interventions that have virtually no (or a very short) learning curve, such as medications (whether IV or PO) or coronary stents. Many of my research methods students who come from the surgical or orthopedic fields face this learning-curve situation, and they tailor the study design to represent results in regular practice.

One design option they usually propose is the expertise-based trial, in which the intervention requiring a learning curve is delivered only by doctors with enough experience. Of course, this would not have been feasible in the ACURATE IDE trial, because it would have required bringing doctors from Europe or Latin America to perform the procedures in the United States. Another unfeasible option would have been to send doctors from the United States to Europe or Latin America to train and then come back. So, what other options are available?

Another option is a proctoring or pre-randomization phase, in which doctors must perform a minimum number of procedures before enrolling patients in the actual trial. This, of course, delays the inclusion of patients in the RCT, but it may pay off in the future. A challenge of this design is deciding the minimum number of procedures doctors need, and we have no clear guidelines for that. Given my experience with the device, I would suggest no fewer than 30 to 50 cases, or even more, before centers start enrolling in the randomized trial.

The third option, which was probably the intended one at the beginning of the study, is to provide very high-quality medical proctoring support for the cases. However, due to COVID, it was not possible to send proctors to all the cases during lockdown stages (international travel was limited), and many cases were performed without medical proctors even after the pandemic. Still, the results of experienced operators/centers differ from those of proctored cases (despite superb local staff and equipment).

Even if a decent proctoring phase had been mandated before randomization, the strategy would have been optimal only if a few centers had enrolled most of the patients in the study. So, the decision to run no proctoring phase, not to provide medical proctoring for all the cases, and to include dozens of centers across the United States each enrolling a few patients (instead of a few centers enrolling most of the cases) was probably not the smartest choice for a valve that needs a longer learning curve than other new platforms.

The art of getting positive studies

Sometimes, anecdotes not only express thoughts better but also make reading more enjoyable. I spent four years as a research fellow of probably the most experienced trialist in the world, Dr Salim Yusuf, and his whole team at PHRI. One day at rounds, Dr PJ Devereaux was presenting locally the results of the MANAGE trial (which demonstrated that dabigatran reduced major vascular complications in patients with myocardial injury after non-cardiac surgery; a positive study). After his presentation, Dr Yusuf joined the stage and said, "Congratulations", and after a small pause and a lot of uncertainty in the small audience, he added, "for your first positive study", followed by a hug. PJ had executed many randomized trials in the past, the POISE series among them, all of which were very valuable for the field but had "negative results".

From Dr Yusuf I learned a lot, and one of the most important lessons is to value the effort, smarts, perseverance, and thinking needed to get a positive study, from picking the right question to planning and executing the trial. The better you are as a trialist, the more likely you are to pick, plan, execute, and get positive studies. Designing a trial merely out of a vague curiosity and letting it run on its own will almost certainly produce a negative result. Of course, even the best trialist with the best clinical trial plan and immaculate execution can have negative results (Dr Yusuf, Jolly, Whitlock, and many other top large-trial researchers have had negative studies), but it happens less often to experienced than to less experienced trialists.

Okay, so why bother with this anecdote about Dr Yusuf and PJ? Because before you get a positive study, you need to deeply understand the device being tested and adapt the design to answer the sensible question: does this device work in usual circumstances? If most experienced ACURATE neo/neo2 users acknowledge this longer learning curve, why not incorporate it into the trial design rather than randomize from the very first patient that inexperienced centers perform?

Clarification: I am not referring to getting positive trials by design or data manipulation. I have never seen Dr Yusuf or anyone at PHRI do such things; they happen only in the imagination of people who never spent time at large research centers. That is the advantage of spending part of your life inside the kitchen: you know what you eat. Those who never spent time in the kitchen cannot accurately tell how the food they consume was made, and sometimes they quickly fall into nonsense conspiracy theories ("that sausage contains dog/horse meat").

FDA requirements role

Is it not interesting that a study designed to bring ACURATE neo2 to the US resulted in the valve being withdrawn worldwide? Without that US trial, thousands of people would still receive it every year across the globe, and it is unclear to me that this would have come at the expense of worse outcomes.

In some ways, this reminds me of the renal denervation drama: the therapy looked promising and was approved in Europe, then SYMPLICITY HTN-3 was run for FDA approval and the therapy seemed to die overnight; yet multiple subsequent, properly designed and executed trials demonstrated it is effective, and companies are now fighting the "negative trial" inertia that SYMPLICITY HTN-3 generated. Note: SYMPLICITY HTN-3 advanced knowledge, since it demonstrated that if you perform incomplete denervation, select less responsive patients, and fail to monitor background medication properly, the treatment is less effective than expected. But in the end, a study meant to obtain FDA approval almost exterminated an effective therapy.

Questions that deserve deeper thought:

  1. Is it necessary to always test new interventions on US soil before they can be delivered to US citizens?
  2. Do stakeholders recognize that interventions requiring a learning curve (such as renal denervation or ACURATE neo2) risk a misleading rejection when study designs do not account for it?
  3. Are FDA trials the final answer? For renal denervation we have a final answer (FDA approval in 2023), which will not be the case for ACURATE neo2.

It is crucial to acknowledge the mandate of the FDA to ensure the safety and efficacy of new devices. The goal here is not to criticize but to advocate for adaptive regulatory strategies that recognize the learning curve challenges of new medical devices, such as the ACURATE neo2.

Limitations of my perspective on the learning curve

Most of this article's logic builds on the concept of an incomplete learning curve before enrolling/randomizing patients, based on my personal impression that this valve has a longer learning curve than other new platforms (an impression shared by virtually all the experienced colleagues I exchanged opinions with). That said, some may not agree with me, and that's fine. However, this conversation cannot fairly include physicians who did not reach a decent number of cases, such as most operators in the ACURATE IDE trial. Some may have implanted 10 or 20 cases for the trial and say, "We had no problems, easy procedure", an excellent example of the early peak of the Dunning-Kruger effect.

Open questions remain for future products

Do we need platforms that require a longer learning curve? Why bother with new stuff? The ACURATE neo2 prosthesis has major advantages relative to others on the market, such as deliverability, flexibility, a low pacemaker rate, low coronary occlusion risk, coronary re-access, commissural alignment, and low gradients, especially in small anatomies, among others. That said, yes: as long as new devices offer advances, the learning curves of new procedures are probably justified. Companies should do their best to shorten the learning curve with technology (like the radiopaque markers on the delivery system and the simplified Step 2 of PRIME) and with education/training.

Wrapping up

So, what led to the ACURATE neo commercial failure?

Was it the valve? The way it was tested? A mix of both?

As Kanye West once said: “Guess we’ll never know.” With the valve pulled and no further trials planned, we may never find out. But one thing is certain: many experienced users around the world will miss what ACURATE neo2 brought to the table.

Conflict of interest: I am a (now former) proctor/consultant for ACURATE neo2 and other Boston Scientific devices (I hold no stock), and a proctor/consultant for many other TAVI devices. I also share companies' mission to provide the best to our patients, and Boston Scientific demonstrated a high level of transparency and commitment to that mission by putting everything at risk to test its new platform against the current best standards on US soil. This decision by Boston Scientific reinforces the trust between the medical community and industry.

Pablo Lamelas

Interventional Cardiologist, Fundación Favaloro, Argentina. Assistant Professor of Health Research Methods, Evidence, and Impact, McMaster University, Canada.
