Abstract
For this “hot topic” session in uveitis we selected first and foremost an issue that puts our clinical work and research in “holding pattern.” The issue is our method of evaluating the severity of uveitis. We posed the following questions to our esteemed panelists:
The relative significance of cells vs. flare in following uveitis patients
Cells/flare measurements
A glance into the future and the relevance of endpoints in clinical studies and their methodologies
While opinions differ on managing and monitoring uveitis patients, there seems to be agreement on the pressing need to improve objective modes of reliably measuring both cells and flare, and to better understand their significance.
Lately, there have been some remarks about the role of flare in assessing uveitis severity and activity. Traditionally, cells were considered the major component of uveitis activity and flare was considered a byproduct of longstanding damage to blood vessels. Currently, changing levels of flare seem to be becoming more relevant in the assessment of uveitis activity. Having said that, Quan, as a leader in uveitis clinical studies, can you share some thoughts with us about how to conduct proper clinical trials for uveitis and whether we can consider flare as an outcome?
We know that several uveitis trials have started within the last 17 years since I left fellowship, and very few of them have made it far because of the number of challenges we have.
There are three different aspects that are very important in planning and conducting clinical trials: one is the target population condition. Unlike age-related macular degeneration (AMD), we don't have thousands of patients to enroll into a trial, so our target population is small.
The second point is inclusion criteria, which are also very difficult because of the dispute over whether uveitis is one entity. For example, we can consider the vast differences between VKH and sarcoid. Is it proper to consider both entities as candidates in the same study? Thirdly, outcome measures are still under discussion. Regulatory bodies such as the FDA only accept certain endpoints so far, because we, as a community, have not been able to demonstrate which parameters are appropriate to follow over short and long periods of time.
Thus, challenges exist both in the design and the conduct of trials: different etiologies and disease mechanisms may present with similar clinical manifestations. One etiology may be different than the other. Infection may also trigger and stimulate autoimmune disease. Autoimmune diseases have such a wide variety of etiologies and natural histories, and these are not well known.
Therefore, the question we need to consider is whether there will ever be a clinical trial design that will satisfy a group of uveitis experts like us. Probably if one asks any of our five distinguished colleagues sitting here today, each one of them would have a specific goal in mind of what the trial should look like, and that continues to be a challenge.
What is missing from our group here? First of all, we need to have a better understanding of disease mechanism in different entities, for example, sarcoid. Is it the same as VKH? How is it different from Adamantiades-Behçet disease? As clinician scientists, we do not yet have a very clear understanding of the disease pathophysiology.
Second, there's also a need to have better agreements among uveitis specialists, because if one were to ask the group from SUN, the group from IUSG, AUS or IOIS, each of them may not always come up with the same definition for disease, and that's just our fault. Therefore, before we come to the regulatory authority, whether the FDA or the EMA, we need to come up with better definitions and agreements for endpoints.
Ron just referred to the flare meter, and why we have not incorporated it in any of our clinical trials thus far. It is because we have not yet demonstrated reproducibility and utility in clinical activities.
Also, outcome measures in uveitis clinical trials have been very difficult. Vitreous haze, which is something that the EMA and the FDA have been using, has not always been well accepted by our community. We know that visual acuity is not always an accurate reflection of uveitic activity. Many of us feel that macular thickness in the setting of edema is an anatomic endpoint, and not always a true representation of inflammation.
What are other possible parameters? One of the parameters we're missing, for example, is macular sensitivity. Why can't we incorporate retinal sensitivity into uveitis trials? One of our colleagues from Stanford, Yasir Sepah, has published on this topic to describe how macular sensitivity and fixation pattern have good correlation and can be used as potential markers in patients with maculopathy secondary to uveitis, for example.
Other parameters that we can consider include reading speed and visual field, which may be considered as endpoints in appropriate settings.
Thus, the gap in planning and conducting uveitis studies is the need to develop trials for real-world patients. Rather than limiting ourselves to patients who are easier to recruit so that enrollment can be completed, we may need to reach those who have unmet needs. We should not conduct trials merely to find a role for a new pharmacologic agent. Rather, we should carefully define the clinical and the etiologic diagnosis in the trials.
In addition, we have different devices that can and should be employed in trials. The laser flare meter is certainly a possibility. The flare meter has been promoted by many colleagues, especially those in Asia and Europe, but has not yet been incorporated into clinical trials.
In summary, there are challenges in the design and implementation of clinical trials in uveitis, which are owed to the nature and the presentation of disease, and the need to measure the structural and functional endpoints. Thus far, the various recent and ongoing clinical trials are meeting these challenges in two ways. One, they try to evaluate new treatments or treatments that have not been used in patients, and two, they try to employ innovative outcomes.
We have been handicapped for a long time. I left fellowship in 2001. Since then, over the last 17 years, I have seen a number of clinical trials, and so far there are very few that have made it all the way to the end to be approved because a number of them have failed near the end of the chain.
The session just before us discussed geographic atrophy (GA). Retina colleagues are having difficulty getting drugs approved for GA, perhaps because the endpoints have not been well defined, and the disease mechanisms have not been fully elucidated. For uveitis, I believe it's a problem too, because we don't have an endpoint or endpoints that all of us are completely satisfied with. Hopefully, during the next 45 minutes, we'll be able to reach consensus!
So, Marc, why don't you talk to us about the flare? And then we will discuss together the various ways of measuring an effect.
Recently, the SATURN trial used a novel flare measurement in the posterior segment using both clinical observation and photographic documentation. And there also, a significant discrepancy was observed between the two methods. Ultimately, if you're trying to do studies, you want to be able to document in an objective, reproducible manner meaningful clinical parameters. It is a significant issue for all of us interested in achieving better care in uveitis patients.
In uveitis, we are dealing with (to use a biblical term) the Tower of Babel and this, I think, was very well outlined by Quan. We treat multiple diseases, each of which is different in etiology, course, and outcomes. They therefore have different endpoints regarding progression and response to treatment. Furthermore, we assess these diseases using non-validated scales, where each incremental step may or may not represent a clinically significant worsening of the inflammation. In all its years of existence, the flare/cell scale, which goes back to Hogan and Kimura, was never validated against disease. It was used from its inception as a qualitative assessment.
We use it because it helps us to evaluate uveitis severity. It is also oddly weighted with most uveitis entities falling in the 1 to 2 range and very few reaching a higher level.
There are few grading steps. Having only four levels limits the ability to discern or adjudicate fine changes in inflammation. Hence, in studies, we are often left evaluating an all-or-none response. Each uveitis expert also interprets the scale in his own way, using a slit beam and intensity that can be variable from one exam to the next.
The SUN Working Group attempted to standardize our approach starting by defining what can be considered as active or inactive uveitis. What should be considered as an improvement in activity, or what defines a remission? Also it took a good look at previous scales and provided further standardization.
For cells, the consensus was to measure within a 1 mm by 1 mm max intensity slit beam, the number of cells present. Trace was 1 to 5 cells, 1+ from 6 to 15. Counting these numbers is possible. When you get to 25, 50 or more, it becomes impossible to count that number, particularly since they move around. In fact, to be perfectly honest, beyond 10, I'm uncomfortable giving a number and I start making a mental estimate. Would it not be better to have a “comparative scale” or even take a high-resolution picture?
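To make the counting rule concrete, here is a minimal Python sketch of the mapping from a cell count in the 1 mm by 1 mm slit beam field to a grade label. The function name `sun_ac_cell_grade` is hypothetical. Only the trace (0.5+) and 1+ bands are quoted in the discussion above; the bands above 15 cells are taken from the published SUN grading scheme and should be checked against the original publication before any real use.

```python
def sun_ac_cell_grade(cells_in_field: int) -> str:
    """Map a cell count in a 1 mm x 1 mm slit beam field to a SUN grade label.

    Hypothetical helper for illustration only, not a validated clinical tool.
    """
    if cells_in_field < 0:
        raise ValueError("cell count cannot be negative")
    # Each pair is (inclusive upper bound, grade label).
    # Trace (0.5+) = 1-5 cells and 1+ = 6-15 cells are quoted in the text;
    # the 2+ and 3+ bounds are assumed from the published SUN scheme.
    bands = [(0, "0"), (5, "0.5+"), (15, "1+"), (25, "2+"), (50, "3+")]
    for upper_bound, label in bands:
        if cells_in_field <= upper_bound:
            return label
    return "4+"  # more than 50 cells in the field


# An estimate that is off by one or two cells near a boundary changes the grade.
for count in (4, 14, 16, 60):
    print(count, "->", sun_ac_cell_grade(count))
```

The sketch also shows why estimating beyond about 10 cells matters: a mental estimate of 14 versus 16 lands in different grades even though the two counts are practically indistinguishable at the slit lamp.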
With regards to flare, our definition is vague, particularly between none and faint (0 and 1+). The difference between faint and moderate is also unclear as for the latter, iris and lens details are clearly visible. Hogan and Kimura were more specific in terms of their description.
Yeo et al. came out in 2016 with a study in which they asked 65 uveitis experts how they use the SUN scale. 69% of these experts did use a slit beam that is 1 by 1 mm, but the remaining experts were using a larger beam.
If you look at interobserver variations, and I guess John could tell us a lot more about it, exact matches for anterior chamber cells are present in only 58% of cases. But for vitreous haze, a measurement that could be considered more difficult to perform as it requires the use of an indirect ophthalmoscope, we get better concordance between observers, around 69%.
I believe the reason for this concordance is that to quantitate vitreous haze, we use a photographic scale, while for anterior chamber cells we use a physician's judgment. The use of a photographic grading scale ensures a better standardization of the assessment and reduces interobserver variability.
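For readers who want to see how an exact-match figure like those quoted above is obtained, here is a minimal Python sketch. The function names and the grade lists are hypothetical and are not data from any study. Cohen's kappa is added as an assumption on my part: it is a common chance-corrected companion to raw percent agreement, and the panel did not mention it.

```python
from collections import Counter


def exact_agreement(grades_a, grades_b):
    """Fraction of eyes on which two observers gave exactly the same grade."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("need two non-empty grade lists of equal length")
    matches = sum(a == b for a, b in zip(grades_a, grades_b))
    return matches / len(grades_a)


def cohen_kappa(grades_a, grades_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(grades_a)
    observed = exact_agreement(grades_a, grades_b)
    counts_a, counts_b = Counter(grades_a), Counter(grades_b)
    # Chance agreement: probability both observers pick the same grade
    # if each graded independently with their own grade frequencies.
    expected = sum(counts_a[g] * counts_b[g] for g in counts_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # both observers used one identical grade throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical grades from two observers for the same four eyes.
observer_1 = ["0", "1+", "1+", "2+"]
observer_2 = ["0", "1+", "2+", "2+"]
print(exact_agreement(observer_1, observer_2))
print(cohen_kappa(observer_1, observer_2))
```

With a scale that has few steps and most eyes clustered in one or two grades, raw exact agreement can look better than it is, which is one reason a chance-corrected statistic may be worth reporting alongside it.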
We're not the only ones using scales to assess severity of ocular inflammation. Eaton published in the J Ocul Pharmacology and Therapeutics in 2017, the “SPOT” system used by ophthalmic veterinarians when performing preclinical toxicity assessments. During their evaluation, they quantitate not only intraocular inflammation, using a more elaborate description of flare, they also look at conjunctival hyperemia and chemosis. It is interesting to note that we also see in many cases of the anterior segment uveitis changes in the conjunctiva, and yet we don't grade this at all. A more elaborate description of what each flare grade represents improves consistency, but not as much as can be achieved by comparing photographs of specific grades of flare.
The vitreous haze scale makes use of such images. The original grading scale by Nussenblatt was created by defocusing the view of the retina, making the image progressively blurrier. As mentioned earlier, despite its simplicity, it provides a relatively consistent interobserver assessment of haze, but like the anterior cell/flare grading scale, it lacks discriminatory power. A newer scale developed in Miami by Janet Davis uses nine levels of vitreous haze instead of four. This scale still needs to be “field tested,” but I suspect this will be a step forward from the Nussenblatt scale.
Flare can also be quantitated using technology. The laser flare photometer was introduced nearly two decades ago for this purpose. Technology in principle can provide us with better discriminatory power using a continuous scale and better interobserver reproducibility. Agarwal et al. attempted to correlate laser flare photometry with the SUN classification. From an analysis of 50 eyes, they observed that laser flare photometry provided a continuous variable and was better able to distinguish levels of inflammation. The discriminatory power of the SUN classification was limited, but could be improved by adding an intermediate step between 0 and 1. Without the introduction of this intermediate step, there was considerable overlap in the flare photometry measurements between patients assigned to grades 0 and 1. However, in their study, the state of the anterior chamber, in particular the presence of a cataract, was not taken into account, and yet it is known to influence basal flare values. Hence, if we wish to use flare photometry in clinical studies, we would have to establish standards for the measurement and interpretation of results for this device as well. We would have to define basal flare photometry levels with regard to age and the presence of cataract and, since chronic flare can lead to a persistently high value over time, standardize the number of measurements needed to determine baseline values.
Would other technologies be appropriate? One drawback of laser flare photometry is the need for a dedicated machine. Anterior segment swept-source optical coherence tomography (OCT) also provides a means of assessing both cells and flare in the anterior chamber. If existing OCT machines could be adapted by use of an add-on lens to measure flare in the anterior chamber, this would be a significant advantage for many clinicians, and facilitate the conduct of clinical trials. An exploratory paper using anterior segment swept-source OCT gave striking results with good correlation between the measurements and the SUN clinical grades. There was also little overlap in the OCT values from one clinical grade to the next, indicating that OCT may be better at assessing finer differences in disease severity as compared to laser flare photometry, where overlap between grades was present.
So, what would be the ideal way to look at cell and flare? Well, we need a measurement that is robust, reproducible, correlated with disease severity. We need a simple test that we can use in many different settings, and that is not exorbitant in cost.
Devices would be nice, but one should not discount a clinician's ability to score inflammation. As indicated above, validation of the scoring system is important, as is the use of reference images. I was recently made aware of a UK digital patient record which required ophthalmologists to score diabetic retinopathy based on the same photographic images that were used in the ETDRS studies. Despite scoring while seeing patients in busy practices, the scores were similar to those achieved using independent graders.
In uveitis, we have scales for both the anterior segment and the posterior segment, but they're not ideal right now. They are not as explicit or as well tested as those for diabetic retinopathy, but with a little effort, we can get there. While dedicated machines may be nice, we can also just use a clinical evaluation against photographic comparators, comparators that provide significant distinctive steps between each severity level. Maybe this is where technology can help us best: to establish the ideal scales we as clinicians can use in our practices and clinical studies. Thanks.
So for me, flare definitely plays a role, in JIA associated anterior uveitis. There are some problems with the SUN criteria to define activity and inactivity. If you have 1+ cell in an eye, with a pressure of 30, then this is highly active, and actually suggests a diagnosis of a virally induced anterior uveitis. So, in these situations, the pressure is a sign of activity, not so much the amount of cells.
In contrast to this situation: If you have a +1 cell finding in Fuchs' uveitis, you may say “Well, wonderful. See you in a year.”
I think there are some very good studies. We recently finished a very successful one recruiting 41 patients with intermediate uveitis only. So why not use this example? I agree: that seems to be quite a small group. In our clinic we see ∼1,000 new uveitis patients per year. Approximately 60% have intermediate uveitis. So it's not so rare and it should provide a sufficient number to include in studies.
One thing I want to say in response to Marc: it's very nice to use scales for vitreous haze and cells. The major problem I see is that you are measuring exactly your view to the posterior segment. And this does not portray the complete vitreous inflammation, especially not in intermediate uveitis, where a lot of inflammation happens more peripherally. So, we should have measurements that allow you to capture the whole vitreous inflammation. In addition, the amount of snowballs, for me, does not reflect high activity. I may see patients with 20/20 once a year; they have the same snowballs, so I don't treat them, definitely not. But snowballs are very important for reaching the correct diagnosis.
So, I would encourage people from the industry to be a little bit more patient. Work with the good people, and not necessarily only in Europe and America, definitely not. There are fantastic activities going on in Asia and other places. As a second point I may suggest: include subgroups.
To compare a birdshot with intermediate uveitis is ridiculous. You are having two different animals, and no surprise that you will get different outcomes. I must agree that the Humira studies were probably the best ones we had officially. Some of the others were a complete disaster, and their design ruined the results initially.
The point of this anecdote is that I don't think this scale is precise enough to make much judgment because +2 is really a huge amount of flare, and +1 is much less and improving from 2+ to 0 is probably nearly impossible. I think there must be some better way of measuring flare than this scale if you're going to use it for clinical practice, especially since cases don't necessarily go to 0 when they are quiet.
In the United States, there are fewer than a dozen, probably fewer than 6, clinical centers that have the flare meters from Kowa. We are fortunate to have one at the Byers Eye Institute at Stanford. It has been difficult for us to get others to know about it because very few colleagues, at least in the United States, are familiar with it.
I am beginning to incorporate the flare meter and flare measurement into my clinical practice and in the uveitis clinical trials that we are launching across the United States. But it's hard because so few people have the device. As Marc said, it's about US$50,000 to 60,000. And one of the things people always ask is: how are we going to get this measurement reimbursed if we purchase the flare meter? If we get an OCT, at least we can charge for it. If we get a flare meter, right now, we don't have a way to recover the cost. Therefore, flare meters are mainly at academic centers in the US.
Therefore, getting the community to get accustomed to the flare meter has been a challenge for us because the majority do not have the device and thus cannot appreciate what it can do and what information it can give and how useful it is.
For every “uveitic disease” entity there are different weights for different signs, depending on their relative risk in causing visual impairment. I think we should try to come up with a numerical scale, like other disciplines in medicine (for example, Glasgow coma scale).
In this setting, a flare meter would be a more accurate tool, providing exact numerical data rather than a variable inter-observer evaluation.
An important endpoint that we also currently use is visual acuity (VA), and it may sometimes be misleading. A dense vitritis just in front of the posterior pole correlates with VA; however, a focus of retinitis that does not yet involve the fovea may carry a higher risk of vision loss but will look “better” if VA is the sole parameter.
We should define now entities as well as we can in terms of diagnostic criteria, but also in terms of what is an adequate clinical response. I see no problem in testing different uveitis entities with the same drug, so long as for each we know what represents an acceptable response. Right now, for intermediate uveitis it's fairly well defined. It is not surprising that several recent clinical studies focused specifically on this disease. It is a very select group of patients, and an adequate response in other entities is not the same.
In my mind, the AbbVie trials were good trials, but I don't think they were necessarily that much better than the other trials in terms of the outcome selected. I think that it's more that the drug was more efficacious, so the effect could be detected. I think that a wide variety of different designs would have found that the drug works.
As long as we have some way of measuring activity or inactivity (or whatever we're going to measure), and it's not severely biased, then the trial should find differences between active treatment and comparator for that outcome, provided the drug actually works and the sample size is sufficient. The issue about whether the drug only works for certain types of uveitis is another matter, and a very important one.
If we think about the outcome we are going to use, we should consider how the drug would be used in the future. In general, I think that people are going to decide if they like the drug or not based on whether they think it controls their uveitis. Most of us think we can tell if the uveitis is controlled. And so, it could perhaps be fairly simple to ask for the main outcome in our trials, is the uveitis controlled or not at the key follow-up visits? Or if the goal is corticosteroid-sparing, has that goal been met by the key time point?
I think success in our fundamental clinical objectives is really what we want to look for, because that's also going to support the use of the drug. If the drug is causing us to say that the uveitis is inactive, then we're going to like it in our clinical practice. Such a drug is likely to do well in the market. If the drug is not having too much effect, and we're sort of splitting hairs as to whether it was showing a positive effect or not, then it's probably not a very strong drug.
And if both of these disorders respond nicely to the treatment, then this is probably only for a short time. The more we know about the pathophysiology of these diseases, the more specific our treatment will become.
Summarizing everything as “posterior segment uveitis” and giving them all such an unspecific treatment may result in immediate good results. But the future will tell you that you must work with subgroups. So Behcet's disease needs its own treatment, as we have shown in our interferon alpha studies. Never compare that with the simple signs of inflammation in intermediate uveitis.
Ideally, if you have evaluated inflammation with a fancy machine in the trial, you still want to be able to follow the outcome in clinical practice. This is somewhat analogous to Lucentis when it was first approved. It was suggested that the retreatment criteria should be based on OCT or visual acuity. But only a few university clinics had OCT, so initially, retreatment criteria had to be based on visual acuity only.
So, when the drug comes out in clinical practice at least the majority of clinics should be able to follow the treatment effect.
Where I think the flare meter is particularly useful is, for example, the measurement of flare. I can't measure flare very well with my eyes, but a machine can. When objective measures are critical, a device that measures flare as a continuous variable has advantages. So I agree with you, but at the same time, it's not for the investigator to define or validate the parameters. Maybe you can ask a drug company to do these multiple comparisons because they need them for their label.
And that's the point where my patients start immediately half an hour or an hour. It depends a little bit what happened before. If they can get rid of this one, and this is very, very successful. The next day they go to the doctors, and you may see probably 1- or 2-plus cells which were not there. This is an acute uveitis. That's a beautiful disease, yes? The chronic one, that's a real problem.
If we would like to estimate the risk of vision loss in these patients, we again should aim to establish a dedicated activity scale. Such a scale should consider also their therapy burden and the risk of complications resulting in visual impairment, such as, the presence of posterior synechiae, cataract, cystoid macular edema, etc.
In a clinical trial we should be very accurate. If it helps us to reach conclusions in a shorter period with much more accuracy, that's great, because this is the goal of a clinical trial. It's not to treat patients, just to help us get this drug on the market as soon as we can. And that's different thinking.
That's why I believe that some of the trials did fail, because of a suboptimal design. Like, for instance, the definition of an activation. Looking at small fluctuations, are these really flares? If you're counting the small fluctuations, probably it will not show any difference between study groups.
And, dividing by different disease entities: there was an attempt to do a Behcet's trial, and I agree, we all feel very sorry that it failed, because we could have had a drug that is good for this and may not be working for others. So yes, some diseases within the uveitis spectrum seem to stand out and require special attention.
How do we propose that the uveitis community come up with something that colleagues can agree upon? We have not reached consensus so far.
But in some current studies, because the sponsors want to be successful, they just follow what the VISUAL studies did and start every study subject on high-dose systemic corticosteroids.
There are caveats in such trials that use time-to-failure. The VISUAL studies demonstrate that adalimumab can be beneficial for patients with non-infectious uveitis. Yes, the trials showed that adalimumab worked. However, in clinical practice, colleagues would not necessarily use it as the first line of treatment until they have employed other agents because of the study designs.
I think with the other forms of uveitis, we should give credence to outcomes like overall control of inflammation. This “time to failure” kind of concept to some extent addresses that. I do not like the idea that losing visual acuity is part of failure, because somebody could have gotten intra-vitreous corticosteroids a month and a half before enrollment in the study. Then they develop a cataract during the study, and now it's counted as a failure which may have nothing to do with the treatment. It's true that such an event may end up being balanced across treatment groups, but why do we need it? The immediate goal of treatment is to control inflammation without causing unnecessary side effects. It is true that one of the key reasons for controlling inflammation is in order to preserve vision, but preserving vision is an indirect effect of the treatment. So to me, it seems like controlling inflammation would be the ideal outcome for a study we want to complete quickly, to make the treatment available.
Regarding heterogeneity of effect across different types of uveitis, we're a little bit confused because corticosteroids work for basically all the forms of ocular inflammation, and so we expect all the other drugs to work for all of them, too. Nevertheless, they won't necessarily work equally well for all of them. That said, many types of ocular inflammation are so uncommon that it will be hard ever to do sufficient trials to sort that out, and we probably will have to rely on post-approval clinical outcomes.
In summary, it seems to me that control of inflammation should be what we're monitoring and tells us whether we're succeeding with our therapy. In terms of implementation afterwards, control of inflammation should correspond well with our therapeutic goal in managing the patient. I would argue that we can largely agree on control or lack of control of inflammation (and the related concept of corticosteroid sparing), and ought to be able to do trials using that outcome. We may however be prudent to inflate the sample sizes for heterogeneity of effect across different types of posterior and panuveitis.
We do have a problem in deciding on whether an eye is active or not when there are merely 1 or 2 cells per high-powered field in the anterior chamber. But if we differentiate between very active and minimally active, I think we can have very good agreement in discriminating that for almost any uveitic disease.
