Hello!
I'm wondering if anyone knows of some case studies for stroke victims.
Health, Hope, Joy & Healing :
May you Prosper, even as your Soul Prospers 3John 2
Jennifer Ruby
Email advice is not a substitute for medical treatment.
http://www.rubysemporium.com
http://groups.yahoo.com/group/SymphonicHealth
http://groups.yahoo.com/group/Therapeutic-Laser_Therapy
http://groups.yahoo.com/group/SOTA_LightWorks/
http://www.lazrpulsr.com
______________________________________________
Stroke Study Cases
Re: Stroke Study Cases
2008/1/24 Ruby ruby@industryinet.com :
Dear Jennifer,
Some details below.
Regards.
Sarvadaman Oberoi
Tower 1 Flat 1102, The Uniworld Garden,
Sohna Road, Gurgaon 122018 Haryana INDIA
Mobile: +919818768349 Tele: +911244227522
Website: http://www.freewebs.com/homeopathy249/
email: manioberoi@gmail.com
DETAILS
1.
The Evidence for Homoeopathy Foreword to Version 8.3 September 06
(update of the March 02 Harvard Medical School Course version, and Reilly D. Alt Ther Med Health 2005;11(2):28-31).
14. Is it progressing and contributing to medical advance?
New remedies and approaches are being developed; e.g., see the immunomodulation research in references 5 to 8, and the results from state-of-the-art conventional research labs, where Jonas and colleagues have shown reduced stroke damage in rats by combining conventional knowledge of the toxicity of glutamate released from the damaged brain with the application of the homoeopathic principle, using ultra-low dose glutamate [69].
69. Jonas WB, Lin Y, Tortella F. Neuroprotection from glutamate toxicity with ultra-low dose glutamate. Neuroreport 2001;12:335-9.
2.
Contributed by: Dr. J. Rozencwajg, MD, PhD. from New Plymouth, New Zealand
(jroz@ihug.co.nz )
Primary complaint: A Patient with stroke
This elderly gentleman (who has given permission to report his case) came into my office about 18 months ago. He sat down heavily on the chair and said in a slurred voice, "My head is full of cement". He had been referred to me by a GP who had nothing else to offer him but wanted to help. He was obviously having what is called a stroke in progress: not a sudden event, not TIAs that disappear without leaving a trace, but progressively worsening paresis (partial paralysis), slurring of the voice, slow ideation, the need to think and search for his words, slow movements. To me, he belonged in a neurological ward in hospital, with anticoagulants, monitoring, CT and MRI and angiograms, as he could have completed his stroke at any time. But having heard that explanation, he and his wife decided to give homeopathy a go. So I gave him Plumbum 5C, 1 globule dissolved in 10 mls of water, one drop to be taken 3 times/day for 3 days, then call me. Needless to say I was very nervous about him, and as he did not call on the third day, I called on the fourth. He profoundly apologised for not ringing me, as he had been busy mowing his lawn! All the symptoms had disappeared on the second day, never to reappear.
( Note: Plumbum is the Homeopathic remedy made from Lead.)
(This case is published on the Impossible Cure website, www.impossiblecure.com, the site of Amy Lansky, author of the best-selling book on homeopathy, Impossible Cure.)
A further and even more dramatic case is detailed in Amy Lansky's book.
3.
Users Guides to Evidence Based Practice
http://www.cche.net/usersguides/main.asp
1. Was the assignment of patients to treatment randomized?
During the 1970s and early 1980s surgeons increasingly undertook extracranial-intracranial bypass (that is, anastomosis of a branch of the external carotid artery, the superficial temporal, to a branch of the internal carotid artery, the middle cerebral). They believed it prevented strokes in patients whose symptomatic cerebrovascular disease was otherwise surgically inaccessible. This conviction was based on comparisons of clinical outcomes among non-randomized "cohorts" of patients who, for whatever reason, had and had not undergone this operation; the former appeared to fare much better than the latter. To the surprise of many and the indignation of a few, a large multi-center randomized trial, in which patients were allocated to receive or forego this operation using a process analogous to flipping a coin, demonstrated that the only effect of surgery was to make patients worse off in the immediate post-surgical period; long-term outcome was unaffected [6]. Other surprises generated by randomized trials that contradicted the results of less rigorous trials include the demonstration that steroids may increase (rather than reduce) mortality in patients with sepsis [7], that steroid injections do not ameliorate facet-joint back pain [8], and that plasmapheresis does not benefit patients with polymyositis [9]. Such surprises may occur when treatments are assigned by random allocation, rather than by the conscious decisions of clinicians and patients. In short, clinical outcomes result from many causes, and treatment is just one of them: underlying severity of illness, the presence of comorbid conditions, and a host of other prognostic factors (unknown as well as known) often swamp any effect of therapy. Because these other features also influence the clinician's decision to offer the treatment at issue, nonrandomized studies of efficacy are inevitably limited in their ability to distinguish useful from useless or even harmful therapy.
As confirmation of this fact, it turns out that studies in which treatment is allocated by any method other than randomization tend to show larger (and frequently "false-positive") treatment effects than do randomized trials [10] [11] [12] [13]. The beauty of randomization is that it assures, if sample size is sufficiently large, that both known and unknown determinants of outcome are evenly distributed between treatment and control groups.
What can the clinician do if no one has done a randomized trial of the therapeutic question she faces? She still has to make a treatment decision, and so must rely on weaker studies. In a later article in this series devoted to deciding whether a therapy or an exposure causes harm (a situation when randomization is usually not possible) we deal with how to assess weaker study designs. For now, you should bear in mind that non-randomized studies provide much weaker evidence than do randomized trials.
2. Were all patients who entered the trial properly accounted for and attributed at its conclusion?
This guide has two components: was followup complete, and were patients analyzed in the groups to which they were randomized?
a) Was followup complete?
Every patient who entered the trial should be accounted for at its conclusion. If this is not done, or if substantial numbers of patients are reported as "lost to followup," the validity of the study is open to question. The greater the number of subjects who are lost, the more the trial may be subject to bias because patients who are lost often have different prognoses from those who are retained, and may disappear because they suffer adverse outcomes (even death) or because they are doing well (and so did not return to clinic to be assessed).
Readers can decide for themselves when the loss to follow-up is excessive by assuming, in positive trials, that all patients lost from the treatment group did badly, and all lost from the control group did well, and then recalculating the outcomes under these assumptions. If the conclusions of the trial do not change, then the loss to follow-up was not excessive. If the conclusions would change, the strength of inference is weakened (that is, less confidence can be placed in the study results). The extent to which the inference is weakened will depend on how likely it is that treatment patients lost to followup all did badly, while control patients lost to followup all did well.
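The worst-case recalculation described above can be sketched in a few lines of Python. The trial numbers here are hypothetical, chosen only to illustrate the arithmetic:

```python
def worst_case_rates(events_tx, n_tx, lost_tx, events_ctl, n_ctl):
    """Worst-case sensitivity analysis for loss to follow-up in a
    positive trial: assume every patient lost from the treatment arm
    suffered the outcome, and every patient lost from the control arm
    did well (i.e. contributed no extra events)."""
    worst_tx = (events_tx + lost_tx) / n_tx   # lost treatment patients counted as events
    worst_ctl = events_ctl / n_ctl            # lost control patients counted as event-free
    return worst_tx, worst_ctl

# Hypothetical positive trial: 15/100 events on treatment (5 lost to
# follow-up), 20/100 events on control (5 lost to follow-up).
tx, ctl = worst_case_rates(15, 100, 5, 20, 100)
rrr = (ctl - tx) / ctl * 100   # relative risk reduction under worst-case assumptions
```

In this hypothetical trial the apparent 25% relative risk reduction vanishes entirely under worst-case assumptions (both arms end up at 20%), so a loss of 5 patients per 100 would be excessive.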
b) Were patients analyzed in the groups to which they were randomized?
As in routine practice, patients in randomized trials sometimes forget to take their medicine or even refuse their treatment altogether. Readers might, at first blush, agree that such patients, who never actually received their assigned treatment, should be excluded from analyses of efficacy. Not so.
The reasons people don't take their medication are often related to prognosis. In a number of randomized trials non-compliant patients have fared worse than those who took their medication as instructed, even after taking into account all known prognostic factors, and even when their medications were placebos! [14] [15] [16] [17] [18] [19] Excluding non-compliant patients from the analysis leaves behind those who may be destined to have a better outcome and destroys the unbiased comparison provided by randomization.
The situation is similar with surgical therapies. Some patients randomized to surgery never have the operation because they are too sick, or suffer the outcome of interest (such as stroke or myocardial infarction) before they get to the operating room. If investigators include such patients, who are destined to do badly, in the control arm but not in the surgical arm of a trial, even a useless surgical therapy will appear to be effective. However, the apparent effectiveness of surgery will come not from a benefit to those who have surgery, but the systematic exclusion of those with the poorest prognosis from the surgical group.
This principle of attributing all patients to the group to which they were randomized results in an "intention-to-treat" analysis. This strategy preserves the value of randomization: prognostic factors that we know about, and those we don't know about, will be, on average, equally distributed in the two groups, and the effect we see will be just that due to the treatment assigned.
B. Secondary Guides
3. Were patients, their clinicians, and study personnel "blind" to treatment?
Patients who know that they are on a new, experimental treatment are likely to have an opinion about its efficacy, as are their clinicians or the other study personnel who are measuring responses to therapy. These opinions, whether optimistic or pessimistic, can systematically distort both other aspects of treatment and the reporting of treatment outcomes, thereby reducing our confidence in the study's results. In addition, unblinded study personnel who are measuring outcomes may provide different interpretations of marginal findings or differential encouragement during performance tests, either of which can distort results [20].
The best way of avoiding all this bias is double-blinding (sometimes referred to as double-masking), which is achieved in drug trials by administering a placebo, indistinguishable from active treatment in appearance, taste and texture but lacking the putative active ingredient, to the control group. When you read reports on treatments (such as trials of surgical therapies) in which patients and treating clinicians cannot be kept blind, you should note whether investigators have minimized bias by blinding those who assess clinical outcomes.
4. Were the groups similar at the start of the trial?
For reassurance about a study's validity, readers would like to be informed that the treatment and control groups were similar for all the factors that determine the clinical outcomes of interest save one: whether they received the experimental therapy. Investigators provide this reassurance when they display the "entry" or "baseline" prognostic features of the treatment and control patients. Although we never will know whether similarity exists for the unknown prognostic factors, we are reassured when the known prognostic factors are nicely balanced. Randomization doesn't always produce groups balanced for known prognostic factors. When the groups are small, chance may place those with apparently better prognoses in one group. As sample size increases, this is less and less likely (this is analogous to multiple coin flips: one wouldn't be too surprised to see seven heads out of ten coin flips, but one would be very surprised to see seventy heads out of one hundred coin flips).
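The coin-flip analogy can be made quantitative with a short binomial calculation (an illustration added here, not part of the original article):

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    # P(X >= k) for X ~ Binomial(n, p): chance of k or more heads in n fair flips
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p7_of_10 = prob_at_least(7, 10)      # roughly a 17% chance: not surprising
p70_of_100 = prob_at_least(70, 100)  # well under 1 in 10,000: very surprising
```

The same imbalance that is unremarkable in a small sample becomes vanishingly unlikely in a large one, which is why larger trials give more balanced groups.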
The issue here is not whether there are statistically significant differences in known prognostic factors between treatment groups (in a randomized trial one knows in advance that any differences that did occur happened by chance) but rather the magnitude of these differences. If they are large, the validity of the study may be compromised. The stronger the relationship between the prognostic factors and outcome, and the smaller the trial, the more the differences between groups will weaken the strength of any inference about efficacy.
All is not lost if the treatment groups are not similar at baseline. Statistical techniques permit adjustment of the study result for baseline differences. Accordingly, readers should look for documentation of similarity for relevant baseline characteristics, and if substantial differences exist should note whether the investigators conducted an analysis that adjusted for those differences. When both unadjusted and adjusted analyses reach the same conclusion, readers justifiably gain confidence in the validity of the study result.
5. Aside from the experimental intervention, were the groups treated equally?
Care for experimental and control groups can differ in a number of ways besides the test therapy, and differences in care other than the therapy under study can distort the results. If one group received closer followup, events might be more likely to be reported, and patients may be treated more intensively with non-study therapies. For example, in trials of new forms of therapy for resistant rheumatoid arthritis, ancillary treatment with systemic steroids (extremely effective for relieving symptoms), if administered more frequently to the control group than to the treatment group, could obscure an experimental drug's true treatment effect (unless exacerbation requiring steroids were itself counted as an outcome).
Interventions other than the treatment under study, when differentially applied to the treatment and control groups, often are called "cointerventions". Cointervention is a more serious problem when double-blinding is absent, or when the use of very effective non-study treatments is permitted at the physicians' discretion. Clinicians gain greatest confidence in the results when permissible cointerventions are described in the methods section and documented to be infrequent occurrences in the results.
The foregoing five guides (two primary and three secondary), applied in sequence, will help the reader determine whether the results of an article on therapy are likely to be valid. If the results are valid, then the reader can proceed to consider the magnitude of the effect and the applicability to her patients.
Scenario Resolution
Readers may be interested in how well the trial of plasmapheresis in patients with lupus nephritis met the tests of validity. With respect to primary criteria, randomization was rigorously conducted, as treatment was assigned through a phone call to the study's Methods Center. One patient assigned to standard therapy was lost to followup, and all the other patients were analyzed in the group to which they had been assigned. With respect to secondary criteria, the study was not blinded, the two groups were comparable at the start of the trial, and the authors provide little information about comparability of other treatments.
In the introductory paper in this series, we described the concept of strength of inference. The final assessment of validity is never a "yes" or "no" decision and must, to some extent, be subjective. We judge that the methods in this trial were, overall, strong, and provide a valid start for deciding whether or not to administer plasmapheresis to our patient with severe lupus nephritis.
II. What were the results?
Clinical Scenario
You are a general internist who is asked to see a 65 year-old man with controlled hypertension and a six-month history of atrial fibrillation resistant to cardioversion. Although he has no evidence for valvular or coronary heart disease, the family physician who referred him to you wants your advice on whether the benefits of long-term anticoagulants (to reduce the risk of embolic stroke) outweigh their risks (of hemorrhage from anticoagulant therapy). The patient shares these concerns, and doesn't want to receive a treatment that would do more harm than good. You know that there have been randomized trials of warfarin for non-valvular atrial fibrillation, and decide that you'd better review one of them.
The Search
The ideal article addressing this clinical problem would include patients with non-valvular atrial fibrillation, and would compare the effect of warfarin and a control treatment, ideally a placebo, on the risk of emboli (including embolic stroke) and also on the risk of the complications of anticoagulation. Randomized, double-blind studies would provide the strongest evidence.
In the software program "Grateful Med" you select a Medical Subject Heading (MeSH) that identifies your population, "atrial fibrillation," another that specifies the intervention, "warfarin", and a third that specifies the outcome of interest, "stroke" (which the software automatically converts to "explode cerebrovascular disorders", meaning that all articles indexed under cerebrovascular disorders or its subheadings are potential targets of the search), while restricting the search to English-language studies. To ensure that, at least on your first pass, you identify only the highest quality studies, you include the methodological term "randomized controlled trial (PT)" (PT stands for publication type). The search yields nine articles. Three are editorials or commentaries, one addresses prognosis, and one focuses on quality of life on anticoagulants. You decide to read the most recent of the four randomized trials [21].
Reading the study, you find it meets the validity criteria you learned about in a prior article in this series [22]. To answer your patient's and the referring physician's concerns, however, you need to delve further into the relation between benefits and risks.
1. How large was the treatment effect?
Most frequently, randomized clinical trials carefully monitor how often patients experience some adverse event or outcome. Examples of these "dichotomous" outcomes ("yes" or "no" outcomes that either happen or don't happen) include cancer recurrence, myocardial infarction, and death. Patients either do or do not suffer an event, and the article reports the proportion of patients who develop such events. Consider, for example, a study in which 20% (0.20) of a control group died, but only 15% (0.15) of those receiving a new treatment died. How might these results be expressed? Table 2 provides a summary of ways of presenting the effects of therapy.
Table 2: Measures of the effects of therapy
  Risk without therapy (baseline risk):       X = 20/100 = 0.20 (20%)
  Risk with therapy:                          Y = 15/100 = 0.15 (15%)
  Absolute risk reduction (risk difference):  X - Y = 0.20 - 0.15 = 0.05
  Relative risk:                              Y/X = 0.15/0.20 = 0.75
  Relative risk reduction (RRR):              [1 - Y/X] x 100 = [1 - 0.75] x 100 = 25%
                                              (equivalently [(X - Y)/X] x 100 = [0.05/0.20] x 100 = 25%)
  95% confidence interval for the RRR:        -38% to +59%
One way would be as the absolute difference (known as the absolute risk reduction, or risk difference), between the proportion who died in the control group (X) and the proportion who died in the treatment group (Y), or X - Y = 0.20 - 0.15 = 0.05. Another way to express the impact of treatment would be as a relative risk: the risk of events among patients on the new treatment, relative to that among controls, or Y/X = 0.15 / 0.20 = 0.75.
The most commonly reported measure of dichotomous treatment effects is the complement of this relative risk, and is called the relative risk reduction (RRR). It is expressed as a per cent: (1 - Y/X) x 100% = (1 - 0.75) x 100% = 25%. A RRR of 25% means that the new treatment reduced the risk of death by 25% relative to that occurring among control patients; the greater the relative risk reduction, the more effective the therapy.
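The three measures can be computed together from the worked example in the text; a minimal sketch:

```python
def effect_measures(risk_control, risk_treated):
    # X = risk without therapy, Y = risk with therapy (Table 2 notation)
    arr = risk_control - risk_treated   # absolute risk reduction (risk difference)
    rr = risk_treated / risk_control    # relative risk
    rrr = (1 - rr) * 100                # relative risk reduction, as a percent
    return arr, rr, rrr

# Worked example from the text: 20% control mortality, 15% treated mortality
arr, rr, rrr = effect_measures(0.20, 0.15)   # ARR 0.05, RR 0.75, RRR 25%
```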
2. How precise was the estimate of treatment effect?
The true risk reduction can never be known; all we have is the estimate provided by rigorous controlled trials, and the best estimate of the true treatment effect is that observed in the trial. This estimate is called a "point estimate" in order to remind us that, although the true value lies somewhere in its neighbourhood, it is unlikely to be precisely correct. Investigators tell us the neighbourhood within which the true effect likely lies by the statistical strategy of calculating confidence intervals [23].
We usually (though arbitrarily) use the 95% confidence interval, which can be simply interpreted as defining the range that includes the true relative risk reduction 95% of the time. You'll seldom find the true RRR toward the extremes of this interval, and you'll find the true RRR beyond these extremes only 5% of the time, a property of the confidence interval that relates closely to the conventional level of "statistical significance" of p < 0.05.
It would not be appropriate to apply the results of a trial of symptomatic patients with severe stenoses (> 70%) [35] or a trial of asymptomatic patients with moderate stenoses [36] to their situation. However, it is entirely appropriate to extrapolate from the previously identified study [4], which enrolled symptomatic patients with similar degrees of stenoses as our patients.
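The confidence interval in Table 2 can be reproduced with the standard log relative-risk method. The arm size of 100 patients per group is an assumption here (the excerpt does not state it), chosen because it yields the -38% to +59% interval quoted in the table:

```python
from math import exp, sqrt

def rrr_ci(events_tx, n_tx, events_ctl, n_ctl, z=1.96):
    """Approximate 95% CI for the relative risk reduction, via the
    standard error of the log relative risk."""
    rr = (events_tx / n_tx) / (events_ctl / n_ctl)
    se = sqrt(1/events_tx - 1/n_tx + 1/events_ctl - 1/n_ctl)
    rr_lo, rr_hi = rr * exp(-z * se), rr * exp(z * se)
    # RRR = 1 - RR, so the interval endpoints swap
    return (1 - rr_hi) * 100, (1 - rr_lo) * 100

# 15/100 events on treatment vs 20/100 on control (hypothetical arm sizes)
lo, hi = rrr_ci(15, 100, 20, 100)   # approximately -38% to +59%
```

Note how wide the interval is: a trial of this size cannot exclude either substantial harm (-38%) or a benefit much larger than the 25% point estimate.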
We will now outline two approaches to addressing the latter two questions, our patient's risk of adverse events without treatment, and our patient's risk of harm with therapy. [21]
Approach 1: Generation of Patient-Specific Baseline Risks
Recognizing that patients are rarely identical to the average study patient, clinicians can derive estimates of the patient's baseline risk from various sources. First, if the study reports risk in various subgroups, they can use the baseline risk for the subgroup most like their patient. However, most trials are not large enough to allow the generation of precise estimates of baseline risk in various patient subgroups and one may have to search for systematic reviews (particularly those including individual patient data) [37] to glean useful information. For example, the AF investigators pooled the individual patient data from all of the randomized trials testing antithrombotic therapy in non-valvular atrial fibrillation and were able to provide estimates of prognosis for patients in clinically important subgroups.[24]
Second, as an extension of the subgroup approach, one can use clinical prediction guides to quantitate an individual patient's potential for benefit (and harm) from therapy. [32] [38] [39] Returning to our example, a prognostic model that could identify patients with carotid stenosis most likely to benefit from endarterectomy would be very useful. Such a model would need to incorporate the risk of stroke without surgery (and thus the potential benefit from surgery) with the risk of stroke or other adverse outcomes from surgery. Using the European Carotid Surgery Trial database [40], investigators have developed a preliminary version of just such a model. [41] However, our enthusiasm for applying this clinical prediction guide should be tempered until it has been prospectively validated in a different group of patients (and preferably with different clinicians). [38]
Third, one could derive an estimate of their patient's baseline risk from published papers (preferably population-based cohort studies) [42] that describe the prognosis of similar (untreated) patients. For example, analysis of the Malmo Stroke Registry demonstrated that in the three years after a stroke, patients have a 6% risk of recurrent nonfatal stroke and a 43% risk of death; these risks were even higher in older patients or those with diabetes mellitus or cardiac disease. [43]
Analogous to the estimation of patient-specific baseline risk, clinicians can use these same sources of information to determine an individual patient's likelihood of harm from treatment. For example, a systematic review of 36 studies relating the risk of peri-operative complications from carotid endarterectomy to various pre-operative clinical characteristics revealed that women were at higher risk than men (odds ratio 1.44 [95% CI 1.14 to 1.83], absolute rate 5.2%). [44]
The final step in generating a patient-specific NNT (or NNH) involves the formula: NNT=1/(PEER x RRR) (where PEER= the patient's estimated event rate, or baseline risk). [21] Given the three year risk of recurrent disabling stroke in diabetic patients from the Malmo Stroke Registry (8.4%) and the 49% RRR expected with carotid endarterectomy, the patient-specific NNT in a 65 year old diabetic with ipsilateral carotid stenosis and a minor stroke would be calculated as: NNT=1/(0.084 x 0.49)=24. Clinicians who know a patient's baseline risk and RRR can also call on a nomogram to calculate the NNT. [45]
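The patient-specific NNT formula above can be checked directly:

```python
def nnt(peer, rrr):
    # NNT = 1 / (PEER x RRR), where PEER is the patient's expected event
    # rate (baseline risk) and RRR the relative risk reduction, both as
    # decimal fractions.
    return 1 / (peer * rrr)

# Worked example from the text: 8.4% three-year risk of recurrent
# disabling stroke, 49% RRR with carotid endarterectomy.
n = nnt(0.084, 0.49)   # about 24 patients
```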
Approach 2: Clinical Judgment
Alternately, one can use the study NNT and NNH directly to generate patient-specific estimates. This method involves only two steps and is less time-consuming than the previous method (as, depending on the experience of the clinician, it may not require detailed literature review).
First, the clinician estimates the patient's risk of the outcome event relative to that of the average control patient in the study and converts this risk to a decimal fraction (= 'ft'). [46] Thus, patients judged to be at less risk than those in the trials will be assigned an ft < 1, and those judged to be at greater risk an ft > 1. There are several sources that a clinician could use to obtain a value for 'ft'. The best estimate would come from a systematic review of all available data about the prognosis of similar patients; individual studies about prognosis would provide the next best estimates. Alternatively, she could use her clinical expertise in assigning a value to 'ft'. While this may appear to be overly subjective, preliminary data suggest that experienced clinicians may be accurate in estimating relative differences in baseline risk (i.e., ft) between patients (far exceeding our abilities to judge absolute risks). [47]
Second, the clinician calculates the patient-specific NNT by dividing the average NNT by 'ft'. Thus, if the clinician felt that patient A was at one-fifth the risk of the average patient in the trial (based on the reduced baseline risk for women demonstrated in the subgroup analyses reported by the investigators) [4], her patient specific NNT for the prevention of one disabling stroke would be 100 (20/0.2).
In addition to considering the benefits from therapy, the clinician needs to consider a patient's risk of adverse events from any intervention. Patients A and B need to be informed that carotid endarterectomy does carry with it a risk of peri-operative death. To individualize your patient's risk of death, you can use the 'f' method just described. For example, patient A may be assumed to be at twice the risk (fh = 2) of peri-operative death as patients in the control group of the study because of her gender, hypertension, and the fact that she has left-sided carotid artery stenosis. [4] [44] You can adjust the NNH using 'fh', assuming the relative risk increase is constant across the spectrum of susceptibilities (an assumption which, as we've noted for RRR, may or may not hold depending on the particular therapy being considered). Thus, patient A's NNH is estimated to be 32 (63/2).
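The two-step 'f' adjustment can be sketched as follows, using the worked numbers from the text:

```python
def patient_specific(nnt_avg, ft, nnh_avg, fh):
    # Adjust the trial's average NNT and NNH by the patient's risk relative
    # to the average control patient: ft for the target event, fh for the
    # adverse event, each expressed as a decimal fraction.
    return nnt_avg / ft, nnh_avg / fh

# Patient A: one-fifth the average risk of disabling stroke (ft = 0.2)
# and twice the average peri-operative risk of death (fh = 2), applied
# to a trial NNT of 20 and NNH of 63.
nnt_a, nnh_a = patient_specific(20, 0.2, 63, 2)
# nnt_a is about 100; nnh_a is 31.5 (rounded to 32 in the text)
```

This assumes, as the text notes, that the relative risk reduction (and relative risk increase) is roughly constant across the range of baseline risks.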
Incorporating Patient Values and Preferences
We have determined the individual patient's likelihood of benefit and harm, but we must still incorporate patient values into the decision-making process. As outlined in a previous Users' Guide, [9] systematically-constructed decision analyses and practice guidelines that include an explicit statement of values can be used to integrate the evidence on benefit/harm with patient values to reach treatment recommendations or establish threshold NNTs. [9] [48] Although this situation would be ideal, such evidence is often not available (we could not identify a relevant decision analysis for our scenario). Moreover, as there is often substantial variation in values between individuals, [49] [50] [51] decision analyses which rely on group averages for values may not always be applicable to a particular patient, although close examination of the utility sensitivity analyses can often provide some guidance. [52] [53] [54]
While active patient involvement in decision making can improve outcomes and reported quality of life, and possibly reduce health care expenditures, [55] [56] [57] [58] [59] [60] [61] the initial step in this process is to determine the extent to which your patient wants to be involved with decision-making (recognizing that this may vary with each clinical decision).
How Much Do Patients Want to Participate?
There are 3 main elements to clinical decision making: the disclosure of information (about the risks and benefits of therapeutic alternatives); the exploration of the patient's values about both the therapy and the potential health outcomes; and, the actual decision. Each patient varies in their desired level of involvement with these steps and clinicians may not accurately gauge the degree to which an individual patient wants to be involved. [62] [63] [64] [65] [66] [67] Some patients may want all available information provided to them and to make the decision themselves with the clinician's role being that of information provider. Other patients may want all the information provided but may want the clinician to make the final decision. Still others may want to collaborate with their clinician in the process. These differences emphasize the need for clinicians to accurately assess patient preferences for information, discussion and decision-making, and tailor their approach to the individual.
Regardless of whether the clinician, the patient, or the partnership will make the decision, clinicians must explore patients' values about the therapy and the potential health outcomes. You can elicit your patient's values in informal ways during exploratory discussions with him/her or by more formal (and time-consuming) methods such as the time-tradeoff, standard gamble or rating scale techniques. [68]
Decision Aids
If your patient's goal is shared decision making, there are several models for providing shared decision-making support. First, formal clinical decision analysis, incorporating that patient's likelihood of the outcome events with their own values for each health state, could be used to guide the decision. Performing a clinical decision analysis for each patient would be too time-consuming for the busy clinician, and this approach therefore currently relies on finding an existing decision analysis. In that case, either our patient's values must approximate those in the analysis, or the decision analysis must provide information about the impact of variation in patient values. Computer models available at the bedside may broaden the scope of decision analysis applicability, and permit wider use with individual patients. [69]
Second, investigators have developed numerical methods of presenting information to patients that incorporate calculated patient values though these methods haven't been fully tested. [39] [70] Third, clinicians can utilize "decision aids" that present descriptive and probabilistic information about the disease, treatment options, and potential outcomes. [71] [72] [73] Most commonly, these decision aids present the outcome data in terms of the percentage of people with a certain condition who do well without intervention compared to the percentage who do well with intervention. While each of these methods has considerable merit, they sometimes fall short in terms of comprehensibility, applicability, and efficiency for use on busy clinical services.
The Likelihood of Being Helped or Harmed
One method of presenting information to patients that incorporates their values, that can be applied to any clinical decision, and that preliminary evidence suggests may be useful on busy clinical services is the likelihood of being helped versus harmed. [74] The first step in this method is the exploration of patient values about taking the treatment (relative to not taking it) and about the severity of adverse events the treatment might cause (relative to the severity of the target event we hope to avoid with the treatment). To answer these questions, patients are provided with brief descriptions of both the target event we would like to prevent and the potential adverse event of the treatment. [Table 4]
Table 4: Sample descriptions of stroke and death
A stroke can result in weakness and loss of function in one side of your body. With a disabling stroke, you are admitted to a hospital for initial treatment and then transferred to a rehabilitation hospital for at least two months of intense rehabilitation. You regain some movement in your arm and leg but are left with a permanent weakness in that side of your body and require assistance with activities of daily living such as getting dressed, taking a bath, cooking, eating and using a toilet. You have trouble getting the words out when you speak.
A surgical procedure called carotid endarterectomy can decrease the risk of disabling stroke but can result in death. Death is more likely to occur in the first 30 days after this surgical procedure.
Following the review of the description of the target event, the clinician presents the patient with a rating scale (anchored at 0 [death] and 1 [full health]) and asks her to mark the point that represents the value she places on the target event.
During your discussions with Patient A, you discover that she is a fiercely independent newspaper journalist who lives alone and previously cared for her father after he suffered a disabling stroke. She believes that a disabling stroke is almost as bad as immediate death and assigns it a value of 0.025. Similarly, you give your patient the description of the adverse event that could result from the therapy (death within 30 days of surgery) and ask her to assess this using the rating scale (she assigned a value of 0.15 since death may not necessarily be immediate). Using the two ratings, you could infer that she believes a disabling stroke to be six times worse than death within the next month (0.15/0.025). This exercise should be repeated on another occasion to confirm that her values are stable.
In contrast, during your conversation with Patient B, you find that he is a former truck driver who recently retired to the country with his wife so that he could be near his daughter and grandson. When you explore his values, he decides that death is 8 times worse than having a disabling stroke.
How can you now incorporate your individual patient's values into the description of therapy? The average patient with a hemispheric stroke has a 10.3% chance of having a disabling stroke over 5 years, [Table 1] but this can be decreased for patients with ipsilateral moderate carotid stenosis to 5.3% with carotid endarterectomy. [4] The average NNT for such patients is 20. The absolute risk increase for death for patients having carotid endarterectomy is 1.6% [19], which translates to an average NNH of 63 (1/0.016).
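As a quick check on this arithmetic, the NNT and NNH follow directly from the absolute risk differences. A minimal sketch in Python (variable names are ours):

```python
# NNT and NNH from the trial figures quoted above.
risk_medical = 0.103   # 5-year risk of disabling stroke on medical therapy
risk_surgery = 0.053   # 5-year risk with carotid endarterectomy

arr = risk_medical - risk_surgery   # absolute risk reduction = 0.05
nnt = 1 / arr                       # = 20

ari = 0.016                         # absolute risk increase for perioperative death
nnh = 1 / ari                       # = 62.5, rounded to 63 in the text

print(round(nnt), nnh)
```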
To calculate the likelihood of being helped versus harmed (LHH), 1/NNT (=ARR) and 1/NNH (=ARI) are combined into an aggregate ratio. For both patients, the first approximation of the LHH is given by: LHH = (1/NNT) : (1/NNH) = (1/20) : (1/63) = 3 to 1 in favor of surgery. As a first approximation, both patients can be told that 'carotid endarterectomy is three times as likely to help you as harm you'.
However, this first approximation ignores both patients' unique individual risks of, and values for, stroke and perioperative death. You can particularize the LHH for each patient using the 'f' factors we described previously. As discussed above, women have a lower risk of stroke, and the 'ft' for Patient A can be estimated at approximately 0.2. [4] This study (and a systematic review of other studies) [44] found that women, patients with left-sided carotid disease, and patients with a history of hypertension have increased risks of perioperative death (relative risks ranging from 1.4 to 2.3). Thus, Patient A is at an increased risk of death from surgery (fh = 2). Her risk-adjusted LHH is: LHHA = (1/NNT) x ft : (1/NNH) x fh = (1/20) x 0.2 : (1/63) x 2 = 3 to 1 in favor of medical therapy. Similarly, the LHH for Patient B can be particularized for his unique risks. Men had a greater risk of stroke in the trial [4] and you can estimate from the reported subgroup analyses that Patient B's ft is approximately 1.25. Patient B also has left-sided carotid disease, suggesting that his risk of perioperative death is increased (fh = 2). His risk-adjusted LHH is: LHHB = (1/20) x 1.25 : (1/63) x 2 = 2 to 1 in favor of surgery.
These risk-adjusted LHHs still ignore each patient's values. Patient A ranked a disabling stroke as 6 times worse than death and this number (the 's' or 'severity' factor) can be used to adjust the LHH as follows: LHHA= (1/NNT) x ft x s: (1/NNH) x fh = (1/20) x 0.2 x 6 : (1/63) x 2 = 2 to 1 in favor of surgery. Thus, incorporating Patient A's values and unique risks of benefit and harm, she is twice as likely to be helped as harmed by surgery. On the other hand, Patient B stated that death was 8 times worse than a stroke and incorporating this into his LHH you calculate: LHHB = (1/20) x 1.25 : (1/63) x 2 x 8 = 4 to 1 in favor of medical therapy.
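The three stages of the LHH calculation (crude, risk-adjusted, and value-adjusted) can be sketched as a single function; the function and variable names are ours, and the f and s factors are the ones estimated in the text:

```python
def lhh(nnt, nnh, ft=1.0, fh=1.0, s=1.0):
    """Likelihood of being helped vs harmed.

    nnt, nnh -- numbers needed to treat/harm for the average trial patient
    ft, fh   -- this patient's risks of the target/adverse event relative
                to the average trial patient (the 'f' factors)
    s        -- severity of the target event relative to the adverse event
    Returns the ratio help : harm (values > 1 favor treatment).
    """
    return ((1 / nnt) * ft * s) / ((1 / nnh) * fh)

# Crude approximation, both patients: ~3 to 1 in favor of surgery
print(round(lhh(20, 63)))

# Patient A, risk-adjusted only: ~3 to 1 in favor of medical therapy
print(round(1 / lhh(20, 63, ft=0.2, fh=2)))

# Patient A, risks and values (stroke rated 6x worse than perioperative death)
print(round(lhh(20, 63, ft=0.2, fh=2, s=6)))        # ~2 to 1 favoring surgery

# Patient B (death rated 8x worse than stroke, so s = 1/8)
print(round(1 / lhh(20, 63, ft=1.25, fh=2, s=1/8)))  # ~4 to 1 favoring medical therapy
```

A sensitivity analysis amounts to re-running this function with different ft, fh, or s values and observing how the ratio moves.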
These two cases illustrate how to incorporate your patient's values into the decision-making process. At present, this process is time consuming and inexact, and we don't know how much difference it makes to patients or their clinical outcomes, so this approach is best considered as a logical and feasible, but untested, model. If you are unsure of your patient's 'f' or if there is some uncertainty around your patient's estimate of values, you could do a sensitivity analysis (inserting different values for these variables into the above equation to see how this is reflected in the LHH). We've described a simple formulation for the LHH (ignoring other outcomes from carotid endarterectomy and the risks of the diagnostic workup) [75], but this could be modified for more complex situations.
II. What are the Results?
5. What were the incremental costs and effects of each strategy?
Let us start with the incremental costs. Look in the text and tables for the listings of all the costs considered for each treatment option, and remember that costs are the product of the quantity of a resource used and its unit price. These should include the costs incurred to 'produce' the treatment, such as the physician's time, nurse's time, and materials, which we might term the 'up-front costs', as well as the 'downstream costs' due to resources consumed in the future and associated with clinical events attributable to the therapy.
The study by Mark et al [2] quantifies resources used by treatment group in three time periods over one year: initial hospitalization, discharge to 6 months, and 6 months to one year. Both treatment groups were very similar in their use of hospital resources over the year; both had a mean length of stay of 8 days, of which 3.5 were in the ICU, and both groups had the same rates of CABG (13%) and PTCA (31%) on initial hospitalization. As summarized in Table 2, the one-year health care costs, excluding the thrombolytic agent, were $24,990 per tPA-treated patient and $24,575 per streptokinase-treated patient. As is clear from Table 2, the main cost difference between the two groups is the cost of the thrombolytic drugs themselves: $2,750 for tPA and $320 for streptokinase. The overall difference in cost between tPA-treated and streptokinase-treated patients is therefore our incremental cost, at $2,845 over the first year. This is discounted at 5% per annum for a final figure of $2,760. The authors argue that there is no cost difference between the two groups after one year. These data for incremental costs from tPA are very similar to those estimated by Kalish [3], who found a difference of $2,535 when tPA was used in preference to streptokinase to manage MI.
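The cost arithmetic can be verified directly. A sketch; the final $2,760 figure additionally applies the paper's 5% annual discounting, whose timing details we do not reproduce here:

```python
# One-year health care costs per patient, excluding the thrombolytic agent
care_tpa, care_sk = 24_990, 24_575
# Cost of the thrombolytic drug itself
drug_tpa, drug_sk = 2_750, 320

incremental_cost = (care_tpa + drug_tpa) - (care_sk + drug_sk)
print(incremental_cost)  # 2845 dollars, undiscounted
```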
The measure of effectiveness chosen in the Mark et al [2] study is the gain in life expectancy associated with tPA. The available follow-up experience was to one year, with 89.9% surviving in the streptokinase group versus 91.1% in the tPA group (p < 0.001). To translate these observations into life expectancy gains, the authors project survival curves for another 30 years or more using first a 14-year MI survivorship database from Duke University and then an assumption that survivorship will follow a statistical distribution (Gompertz). Having projected two survival curves, the authors calculate the area under each curve, which represents the expected value of survival time or life expectancy. For tPA patients life expectancy was 15.41 years and for streptokinase 15.27 years. As summarized in Table 2, the difference in life expectancy is 0.14 years per patient; or phrased another way, for every 100 patients treated with tPA in preference to streptokinase we would expect to gain 14 years of life.
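The projection step, taking life expectancy as the area under a survival curve, can be illustrated numerically. The Gompertz parameters below are invented purely for illustration and are not those fitted to the Duke database:

```python
import math

def gompertz_survival(t, b=0.02, c=0.08):
    """Toy Gompertz survival function: S(t) = exp(-(b/c) * (exp(c*t) - 1))."""
    return math.exp(-(b / c) * (math.exp(c * t) - 1))

def life_expectancy(survival, horizon=40.0, step=0.01):
    """Approximate the area under the survival curve by the trapezoid rule."""
    n = int(horizon / step)
    return sum(
        0.5 * (survival(i * step) + survival((i + 1) * step)) * step
        for i in range(n)
    )

# Expected years of survival implied by this (illustrative) curve
print(round(life_expectancy(gompertz_survival), 1))
```

Computing the same area under two projected curves, one per treatment arm, and subtracting yields the gain in life expectancy.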
In other situations, quantifying incremental effectiveness may be more difficult. Not all treatments change survival, and those that do not may affect different dimensions of health in many ways. For example, drug treatment of asymptomatic hypertension may result in short-term health reductions from drug side-effects, in exchange for long-term expected health improvements, such as reduced risk of strokes. Note that in our tPA example the outcome is not unambiguously restricted to survival benefit because there is a small but statistically significant increased risk of non-fatal hemorrhagic stroke associated with tPA [1]. The existence of trade-offs between different aspects of health, or between length of life versus quality of life, means that to arrive at a summary measure of net effectiveness, we must implicitly or explicitly weight the 'desirability' of different outcomes relative to each other.
There is a large and growing literature on quantitative approaches for combining multiple health outcomes into a single metric using patient preferences [32]. Foremost among current practice is the construction of quality-adjusted life years (QALYs) as a measure that captures the impact of therapies in the two broad domains of survival and quality-of-life. (QALYs were described in more detail earlier in this series [10].) Alternative approaches include the Healthy Year Equivalent method [33].
Our second thrombolytic study by Kalish et al [3] used QALYs as their primary measure of effectiveness. First they took the same one-year survival probabilities from the GUSTO study and projected them forward to estimate life expectancy using data from a different longitudinal study, the Worcester Heart Attack Study. Similar to Mark et al [2] they estimate that the average life span after MI is 14.6 years and then used GUSTO risk reductions to estimate life expectancy difference for tPA and streptokinase patients.
To derive QALYs, they applied utility weights (from death = 0 to full health = 1) to patients who survived the MI but sustained morbid events over time, such as non-fatal stroke (utility of 0.79) or reinfarction (utility of 0.93). These utility weights were taken from the literature, based on preference measurements undertaken in the GISSI-2 trial [34]. However, because the treatment groups differed little in their risks of the morbid events that attract a quality adjustment, the adjustment changes the totals but not the difference: although the total number of future QALYs (8.842 for streptokinase and 8.926 for tPA) is smaller than the number of unadjusted life years, the difference in QALYs (0.084), using 30-day GUSTO survival data, is identical to the effect calculated by Mark et al [2] using unadjusted life expectancy.
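The quality-adjustment step itself is simply a weighted sum of time spent in each health state. In the sketch below, the utility weight of 0.79 for non-fatal stroke is the one quoted above; the durations are invented purely for illustration:

```python
def qalys(states):
    """QALYs = sum over health states of (years in state) x (utility of state)."""
    return sum(years * utility for years, utility in states)

# Hypothetical post-MI course: 12 years in full health (utility 1.0), then
# 2 years after a non-fatal stroke (utility 0.79). Durations are illustrative.
example = [(12.0, 1.0), (2.0, 0.79)]
print(round(qalys(example), 2))  # 13.58 QALYs out of 14 life-years
```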
In summary, both studies use the efficacy data from the GUSTO trial as their starting point to conclude that tPA treatment is more costly than streptokinase but that it provides an increase in survival (quality- adjusted or otherwise). Table 2, using Mark et al data, illustrates the next calculation in both studies which determines the incremental cost-effectiveness ratio for tPA. After discounting future costs and effects at 5% per year to reflect time preference (for rationale, see our first paper [35]), the difference (tPA minus streptokinase) in cost per patient over the year (and by extension into the future because they assume no cost differences beyond one year) is $2,760, which is divided by the difference in life expectancy per patient (0.084) to yield a ratio of $32,678 per year of life gained.
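The final ratio is just the discounted cost difference divided by the effect difference. A sketch; with the rounded inputs quoted here the result comes out slightly above the published $32,678, which was computed from unrounded figures:

```python
incremental_cost = 2_760    # discounted cost difference, tPA minus streptokinase ($)
incremental_effect = 0.084  # difference in life expectancy per patient (years)

icer = incremental_cost / incremental_effect
print(round(icer))  # ~32,857 dollars per year of life gained
```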
A simple interpretation of this ratio is that it is the 'price' at which we are buying additional years of life by using tPA in preference to streptokinase; the lower this price, the more attractive is the use of tPA. The Kalish study [3] reaches a similar incremental cost-effectiveness ratio (with their adjusted denominator of QALYs and using the 30-day risk reduction GUSTO data) of $30,300 per QALY. These are the main results of the studies; we will discuss their interpretation later in this article.
In current health care practice, judgements often reflect clinician or societal values concerning whether intervention benefits are worth the cost. Consider the decisions regarding administration of tissue plasminogen activator (tPA) versus streptokinase to patients with acute myocardial infarction, or clopidogrel versus aspirin to patients with a transient ischemic attack. In both cases, evidence from large randomized trials suggests the more expensive agents are, for many patients, more effective. In both cases, many authoritative bodies recommend first-line treatment with the less effective drug, presumably because they believe society's resources would be better used in other ways. Implicitly, they are making a value or preference judgement about the trade-off between deaths and strokes prevented, and resources spent.
By values and preferences, we mean the underlying processes we bring to bear in weighing what our patients and our society will gain, or lose, when we make a management decision. A number of the Users' Guides focus on how clinicians can use research results to clearly understand the magnitude of potential benefits and risks associated with alternative management strategies.[6] [7] [8] [9] [10] Three guides focus on the process of balancing those benefits and risks when using treatment recommendations [11] [12] and in making individual treatment decisions.[13] The explicit enumeration and balancing of benefits and risks brings the underlying value judgements involved in making management decisions into bold relief.
Acknowledging that values play a role in every important patient care decision highlights our limited understanding of eliciting and incorporating societal and individual values. Health economists have played a major role in developing a science of measuring patient preferences.[14] [15] Some decision aids incorporate patient values indirectly: if patients truly understand the potential risks and benefits, their decisions will likely reflect their preferences.[16] These developments constitute a promising start. Nevertheless, many unanswered questions concerning how to elicit preferences, and how to incorporate them in clinical encounters already subject to crushing time pressures, remain. Addressing these issues constitutes an enormously challenging frontier for EBM.
Clinical Scenario
You are a primary care practitioner considering the possibility of anticoagulant therapy with warfarin in a new patient, a 76-year-old woman with chronic congestive heart failure and atrial fibrillation. The patient has no hypertension, valvular disease, or other comorbidity. Aspirin is the only antithrombotic agent that the patient has received over the 10 years during which she has been in atrial fibrillation. Her other medications include captopril, furosemide, and metoprolol. The duration of the patient's atrial fibrillation, and her dilated left atrium on echocardiogram, dissuade you from prescribing antiarrhythmic therapy. Discussing the issue with the patient, you find she places a high value on avoiding a stroke, a somewhat lower value on avoiding a major bleed, and would accept the inconvenience associated with monitoring anticoagulant therapy.
You have little inclination to review the voluminous original literature relating to the benefits of anticoagulant therapy in reducing stroke or its risk of bleeding, but hope to find an evidence-based recommendation to guide your advice to the patient. In your office file relating to this problem you find a report of a primary study [1], a decision analysis [2], and a recent practice guideline [3] that you hope will help.
Introduction
Each day, clinicians make dozens of patient management decisions. Some are relatively inconsequential, some are important. Each one involves weighing benefits and risks, gains and losses, and recommending or instituting a course of action judged to be in the patient's best interest. These decisions involve an implicit consideration of the relevant evidence, an intuitive integration of the evidence, and a weighing of the likely benefits and harms. In making choices, clinicians may benefit from structured summaries of the options and outcomes, systematic reviews of the evidence regarding the relation between options and outcomes, and recommendations regarding the best choices. This Users' Guide explores the process of developing recommendations, suggests how the process may be conducted systematically, and introduces a taxonomy for differentiating recommendations that are more rigorous (and thus more likely to be trustworthy) from those that are less rigorous (and thus at greater risk of being misleading).
While recommendations may be directed at health policy makers, our focus is advice for practicing clinicians. We will begin by considering the implicit steps that are involved in making a recommendation.
The Process of Developing a Recommendation
Figure 1 presents the steps involved in developing a recommendation, and the formal strategies that are available. The first step in clinical decision-making is to define the decision. This involves specifying the alternative courses of action, and the alternative outcomes. Often, treatments are designed to delay or prevent an adverse outcome such as stroke, death, or myocardial infarction. In our discussion, we will refer to the outcomes that treatment is designed to prevent as "target outcomes". Treatments are associated with their own adverse outcomes -- side effects or toxicity. Ideally, the definition of the decision will be comprehensive -- all reasonable alternatives will be considered, and all possible beneficial and adverse outcomes will be identified. In patients like the woman in the scenario, who has non-valvular atrial fibrillation, the options include not treating, giving aspirin, or anticoagulating with warfarin. Outcomes include minor and major embolic stroke, intracranial haemorrhage, gastrointestinal haemorrhage, minor bleeding, and the inconvenience associated with taking and monitoring medication.
Figure 1: A Schematic View of the Process of Developing a Treatment Recommendation
Having identified the options and outcomes, decision-makers must evaluate the links between the two -- what will the alternative management strategies yield in terms of benefit and harm [4]. They must also consider how this impact is likely to vary in different groups of patients [5]. Having made estimates of the consequences of alternative strategies, value judgements about the relative desirability or undesirability of possible outcomes becomes necessary to allow treatment recommendations. We will use the term "preferences" synonymously with "values" or "value judgements" in referring to the process of trading off positive and negative consequences of alternative management strategies.
Recently, investigators have applied scientific principles to the collection, selection, and summarization of evidence, and the valuing of outcomes. We will briefly describe these systematic approaches.
Linking Management Options and Outcomes -- Systematic Reviews
Unsystematic identification and collection of evidence risks biased ascertainment -- treatment effects may be under-, or more commonly, overestimated and side effects may be exaggerated or ignored. Unsystematic summaries of data run similar risks of bias. One result of these unsystematic approaches may be recommendations advocating harmful treatments, and failing to encourage effective therapy. For example, experts advocated routine use of lidocaine for patients with acute myocardial infarction when available data suggested the intervention was ineffective and possibly even harmful, and failed to recommend thrombolytic agents when data showed patient benefit [6].
Systematic reviews deal with this problem by explicitly stating inclusion and exclusion criteria for evidence to be considered, conducting a comprehensive search for the evidence, and summarizing the results according to explicit rules that include examining how effects may vary in different patient sub-groups [7] [8]. When a systematic review pools data across studies to provide a quantitative estimate of overall treatment effect we call it a meta-analysis. Systematic reviews provide strong evidence when the quality of the primary studies is high and sample sizes are large, and less strong evidence when designs are weaker and sample sizes small. Because judgement is involved in many steps in a systematic review (including specifying inclusion and exclusion criteria, applying these criteria to potentially eligible studies, evaluating the methodological quality of the primary studies, and selecting an approach to data analysis) systematic reviews are not immune from bias. Nevertheless, in their rigorous approach to collecting and summarizing data, systematic reviews reduce the likelihood of bias in estimating the causal links between management options and patient outcomes.
Decision Analysis
Rigorous decision analysis provides a formal structure for integrating the evidence about the beneficial and harmful effects of treatment options with the values or preferences associated with those beneficial and harmful effects. When done well, a decision analysis will use systematic reviews of the best evidence to estimate the probabilities of the outcomes and use appropriate sources of preferences (those of society, or of relevant patient groups) to generate treatment recommendations [9] [10]. When a decision analysis includes costs among the outcomes, it becomes an economic analysis, and summarizes tradeoffs between gains (typically valued in quality-adjusted life-years) and resource expenditure (valued in dollars) [11] [12]. A decision analysis will be open to bias if it fails criteria for a systematic overview in accumulating and summarizing evidence, or uses preferences that are arbitrary or come from small or unrepresentative populations (such as a small group of health-care providers).
Practice Guidelines
Practice guidelines provide an alternative structure for integrating evidence and applying values to reach treatment recommendations. Practice guideline methodology places less emphasis on precise quantitation than does decision analysis. Instead, it relies on the consensus of a group of decision-makers, ideally including experts, front-line clinicians, and patients, who carefully consider the evidence and decide on its implications. Rigorous practice guidelines will also use systematic reviews to summarize evidence, and sensible strategies to attribute values to alternative outcomes as they generate treatment recommendations [13] [14]. Guideline developers may focus on local circumstances. For example, clinicians practicing in rural parts of less industrialized countries, without the resources to monitor anticoagulant intensity, may reject anticoagulation as a management approach for patients with atrial fibrillation. Practice guidelines may fail methodologic standards in the same ways as decision analyses.
We will now contrast these systematic approaches to developing recommendations with historical practice.
Current Sources of Treatment Recommendations
Traditionally, authors of original, or primary, research into therapeutic interventions include recommendations about the use of these interventions in clinical practice in the discussion section of their papers. Authors of systematic reviews and meta-analyses also tend to provide their impressions of the management implications of their studies. Typically, however, individual trials or overviews do not consider all possible management options, but focus on a comparison of two or three alternatives. They may also fail to identify subpopulations in which the impact of treatment may vary considerably. Finally, when the authors of overviews provide recommendations, they are not typically grounded in an explicit presentation of societal or patient preferences.
Failure to consider these issues may lead to variable recommendations based on the same data. For example, several meta-analyses of selective decontamination of the gut (antibiotic prophylaxis against pneumonia in critically ill patients) reached very similar results regarding the impact of treatment on target outcomes, yet their recommendations varied from suggesting implementation, through equivocation, to rejecting implementation [15] [16] [17] [18]. Varying recommendations reflect the fact that both investigators reporting primary studies and meta-analysts often make their recommendations without the benefit of an explicit standardized process or set of rules. When benefits or risks are dramatic, and these benefits and risks are essentially homogeneous across an entire population, intuition may provide an adequate guide to making treatment recommendations. Such situations are unusual. In most instances, because of their susceptibility to both bias and random error, intuitive recommendations risk misleading the clinician.
These considerations suggest that when clinicians examine treatment recommendations, they should critically evaluate the methodologic quality of the recommendations. The greater the extent to which recommendations adhere to the methodologic standards we have mentioned, the greater faith clinicians may place in those recommendations [Table 1]. Table 2 presents a scheme for classifying the methodological quality of treatment recommendations, emphasizing the three key components: consideration of all relevant options and outcomes, a systematic summary of the evidence, and explicit and/or quantitative consideration of societal or patient preferences. In the next section of the text, we will describe the rating system summarized in Table 2.
Table 1: Methodologic Requirements for Systematic, Rigorous Recommendations
• Comprehensive statement of management options and possible outcomes.
• Systematic review and summary of evidence linking options to outcomes. Examination of the magnitude of impact, in terms of both benefits and risks, in relative and absolute terms.
• Consideration of different populations, and the characteristics of these populations, that may affect the impact of the intervention.
• Examination of strength of evidence linking options to outcomes. Where evidence is weak, examine the implications of plausible differences in effects.
• Explicit, appropriate specification of values or preferences associated with outcomes.
Table 2: A hierarchy of rigour in making treatment recommendations
Level of Rigour | Systematic Summary of Evidence | Considers All Relevant Options and Outcomes | Explicit Statement of Values | Example Methodologies*
High | Yes | Yes | Yes | Practice guidelines or decision analysis
Intermediate | Yes | Yes or No | No | Systematic review
Low | No | Yes or No | No | Traditional reviews; original articles

* Example methodologies may not reflect the level of rigour shown; exceptions may occur in either direction. For example, if a practice guideline or decision analysis neither systematically collects and summarizes information, nor explicitly considers societal or patients' values, it will produce recommendations which are of Low rigour. If a systematic review does consider all relevant options and at least qualitatively considers values, it can produce recommendations approaching High rigour.
Making recommendations: a hierarchy of rigour
Systematic summary of evidence for all relevant interventions using appropriate values
Quantitative summary of evidence and values
The most rigorous approach to making recommendations (which we will call a systematic synthesis) involves precisely quantifying all benefits and risks; determining the values of either a group of patients or the general population; where uncertainty exists making a systematic and quantitative exploration of the range of possible true values; and using quantitative methods to synthesize the data. One approach to meeting these criteria involves conducting a formal decision analysis. Many decision analyses fail to carry out each step in the process in an optimally rigorous fashion; to do so usually requires a major research project [9] [10].
Challenges for decision analysts include conducting the systematic reviews required to generate the best estimates of benefits and risks associated with treatment options, and measuring how the general public or patients value the relevant outcomes. Typically, a decision analysis values each treatment arm in terms of quality-adjusted life years. When costs are considered, the decision analysis becomes an economic analysis, and we think in terms of additional dollars spent to gain an additional quality-adjusted life year. The optimal therapy, or the cost-effectiveness of alternatives, may differ depending on untreated patients' risk of the target outcome.
What a decision analysis or economic analysis usually does not do is value the benefits, risks and costs and provide an explicit threshold for decision-making. For example, a new treatment might cost $50,000 per quality-adjusted life-year gained. Is this a bargain, or too great a cost to warrant treatment? Often, decision analysts will refer to the cost-effectiveness or cost-utility ratios of currently used treatments to help with this decision. For instance, the decision analysis from the scenario in this article concluded that while the cost of warfarin for patients with at least one factor increasing their risk of embolism was $8,000 per quality-adjusted life-year saved, the cost was $375,000 per quality-adjusted life-year saved for a 65-year-old with no risk factors [2]. The authors compared these figures with the $50,000 to $100,000 cost per quality-adjusted life-year gained when screening adults for hypertension.
Quantitative summary of evidence and values: Explicit Decision Thresholds
Investigators can use the principles of decision analysis to arrive at explicit decision thresholds and present these thresholds in ways that facilitate clinicians' understanding. One such approach involves the number of patients to whom one must administer an intervention to prevent a single target event, the Number Needed to Treat (NNT) [19]. Typically, the NNT falls as patients' risk of an adverse outcome rises, and may become extremely large when patients are at very low risk. In a previous Users' Guide, we have described the threshold NNT [20], the dividing line between when treatment is warranted (the NNT is low enough that the benefits outweigh the costs and risks), and when it is not (the NNT is too great to warrant treatment). Deriving the threshold NNT involves specifying the relative value associated with preventing the target outcome versus incurring the side effects and inconvenience associated with treatment [21].
Investigators using this approach may also consider costs. If so, they face the additional requirement of specifying the number of dollars one would be willing to pay to prevent a single target event. With or without considering costs, investigators can plug the values they adduce into an equation that generates the threshold NNT[20]. They can then look at the risk of the target outcome in untreated subpopulations to whom clinicians might consider administering the intervention. Combining this information with the relative risk reduction associated with the treatment, they can determine on which side of the threshold the treatment falls.
Returning to our example, warfarin decreases the risk of stroke in patients with non-valvular atrial fibrillation. Since anticoagulation increases bleeding risk, it is not self-evident that we should recommend the treatment for our patients; we must find a way of trading off fewer strokes against more bleeds. We can calculate the threshold NNT by specifying the major adverse outcome of treatment, bleeding, and the frequency with which it occurs due to treatment. We then specify the impact of these deleterious effects relative to the target event the treatment prevents, a stroke. A variety of studies of relevant patient populations [22] [23] [24] [25] suggest that, on average, patients consider 1 severe stroke equivalent to 5 episodes of serious gastrointestinal bleeding. We use these figures to calculate our threshold NNT, which proves to be approximately 152 [Figure 2]. This implies that if we need to anticoagulate fewer than 152 patients to prevent a stroke, we will do so; if we must anticoagulate more than 152 patients, our recommendation will be not to treat.
Figure 2: Calculating the Threshold Number Needed to Treat (T-NNT) for Warfarin Treatment of Patients With Nonvalvular Atrial Fibrillation
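As a sketch of the arithmetic behind such a threshold, the following reproduces the figure of approximately 152. Both the form of the equation and the 3.3% excess rate of serious bleeding per patient treated are illustrative assumptions on our part, chosen so that the result matches the article's number; Figure 2 gives the actual derivation.

```python
def threshold_nnt(value_ratio: float, excess_harm_rate: float) -> float:
    """Largest NNT at which the value-weighted benefit of treatment still
    outweighs its harm: treat when (1/NNT) * value_ratio > excess_harm_rate,
    i.e. when NNT < value_ratio / excess_harm_rate."""
    return value_ratio / excess_harm_rate

# value_ratio: patients consider 1 severe stroke equivalent to 5 serious
# gastrointestinal bleeds. excess_harm_rate: an ASSUMED 3.3% excess rate of
# serious bleeding attributable to anticoagulation (illustrative only).
print(round(threshold_nnt(5.0, 0.033)))  # 152
```

On these assumptions, anticoagulating fewer than about 152 patients per stroke prevented does more good than harm.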
The threshold NNT then facilitates recommendations for specific patient groups. Table 3 summarizes the calculation of the NNT, and the associated comparison with the threshold, for two groups of patients. A meta-analysis of randomized trials tells us that anticoagulation reduces the risk of stroke by 68% (95% confidence interval 50% to 79%), and that this risk reduction is consistent across clinical trials [26]. The meta-analysis also provides stroke risk estimates for different groups of patients. Patients over 75 with any of previous cerebrovascular events, diabetes, hypertension, or heart disease have a stroke risk of approximately 8.1% per year. Anticoagulation reduces this risk to 2.6%, for an NNT of 1 / 0.055, or approximately 18, per year. The NNT for this group is appreciably lower than the threshold NNT, suggesting that such patients should be treated. Patients under 65 with no risk factors have a one-year stroke risk of 1%, which anticoagulation reduces to 0.32%. The associated NNT of 146 approximates the threshold NNT of 152 and suggests that the decision about whether or not to treat is a toss-up.
Table 3: Using the NNT to make treatment recommendations
Patient group                  Risk of stroke      Relative risk     Group's absolute   Group's NNT
                               without treatment   reduction with    risk reduction     (Threshold NNT)
                                                   warfarin
Under 65, no risk factors      1%                  68%               0.68%              146 (cost omitted: 152; cost considered: 53)
Previous thromboembolic event  8.1%                68%               5.5%               18 (cost omitted: 152; cost considered: 53)
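The NNT figures in Table 3 follow mechanically from the absolute risk reduction: ARR = baseline risk x relative risk reduction, and NNT = 1/ARR. A minimal sketch (the small discrepancy with the article's 146 for the low-risk group reflects rounding in the published figures):

```python
def nnt(baseline_risk: float, relative_risk_reduction: float) -> float:
    """Number needed to treat = 1 / absolute risk reduction."""
    return 1 / (baseline_risk * relative_risk_reduction)

# High-risk group: 8.1% annual stroke risk, 68% relative risk reduction.
print(round(nnt(0.081, 0.68)))  # 18 -> far below the threshold NNT: treat
# Low-risk group: 1% annual stroke risk.
print(round(nnt(0.010, 0.68)))  # 147 -> near the threshold: a toss-up
```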
Clinicians or health-care decision makers interested in considering costs in their decisions can look to the model for help. Costs can be included by specifying the dollar value associated with preventing adverse outcomes (for example, Laupacis and colleagues have suggested the most that society might be willing to pay to gain a quality-adjusted life year is $100,000 [27]). When we consider costs as calculated in the decision analysis from the patient scenario [2], we arrive at a threshold NNT of 53, suggesting a more conservative approach to anticoagulant administration.
Investigators can use units other than NNT to develop clinically useful decision thresholds. For example, for 81 patients previously treated with cis-platinum based chemotherapy, the average minimum gain in survival which was felt to make the chemotherapy worthwhile was 4.5 months for mild toxicity and 9 months for severe toxicity [28]. Such a threshold could be integrated with information about the actual gain in life associated with the treatment to help form the basis for a recommendation about use of cis-platinum therapy.
Like other quantitative approaches, considering NNT and the threshold NNT, or alternative thresholds, is intended to supplement clinical judgement, not replace it. Investigators exploring different treatment choices have found the methodology useful [29]. However clinicians use it, the approach highlights the necessity for both valuing the benefits and risks of treatment, and understanding the magnitude of those benefits and risks, in making a treatment decision.
Quantitative summary of evidence, qualitative summary of preferences
Practice guidelines, if they are to minimize bias, should not substitute expert opinion for a systematic review of the literature, an explicit and sensible process for valuing outcomes, an explicit consideration of the impact of uncertainty associated with the evidence and values used in the guidelines and an explicit statement of the strength of evidence supporting the guideline. When a practice guideline meets these methodological standards, and thereby minimizes bias, we refer to the guideline as "evidence-based" [Table 1].
Once they have the evidence, investigators and clinicians are often uncomfortable with explicitly specifying preferences in moving from evidence to action. Their reluctance is understandable. Specifying a specific tradeoff between, say, a stroke and a gastrointestinal bleed is not an exercise with which we are familiar. People may feel that identifying a specific value -- a stroke is equivalent to 2.5 gastrointestinal bleeds, for instance -- implies more precision than is realistic. Discomfort may rise further when we specify a dollar value associated with preventing an adverse event.
This may be one reason that participants in the development of rigorous practice guidelines, including experts in the content area, methodologists, community practitioners, and patients and their representatives, seldom use numbers to identify the value judgements they are making. Still, a rigorous guideline will establish, reflect, and make explicit the community and patient values on which the recommendation is based.
Most practice guidelines fail to systematically summarize the evidence. Even those that meet criteria for evidence accumulation and summarization do not usually make their underlying values explicit. Guidelines that do not meet either set of criteria produce recommendations of low methodologic rigour.
Practice guidelines that meet the criteria in Table 1 provide an alternative to quantitative strategies to arrive at a systematic synthesis.
Systematic review of evidence, unsystematic application of values
Traditionally, investigators provide their results and then make an intuitive recommendation about the action that they believe should follow from their evidence. They may do so without considering all treatment options, or all outcomes (Table 2). Even when they consider all relevant treatments and outcomes, they may fail to use community or patient values, or even to make the values they are using explicit. For instance, the authors of a meta-analysis of antithrombotic therapy in atrial fibrillation stated "about one patient in seven in the combined study cohort were at such low risk of stroke (1% per year) that chronic anticoagulation is not warranted." [26] Here, the relative value of stroke and gastrointestinal bleeding is implicit in the recommendation. The nature of the value judgement is not transparent, and we have no guarantee that the implicit values reflect those of our patient or community.
Clinicians faced with such recommendations need to take care that they are aware of all relevant outcomes, both reductions in targets and treatment-related adverse events, and are aware of the relative values implied in the treatment recommendations.
Unsystematic review, unsystematic synthesis
The unsystematic approach represents the traditional strategy of accumulating and summarizing evidence in an unsystematic fashion, and then applying implicit preferences to arrive at a treatment recommendation. The approach is open to bias, and is likely to lead to consistent, valid recommendations only when the gradient between beneficial and adverse consequences of alternative actions is very large.
Intermediate Approaches
Both quantitative strategies and practice guidelines, when done rigorously, are very resource-intensive. Investigators may adopt less onerous methods and still provide useful insights. Meta-analysts may wish to take the first steps in making treatment recommendations without a formal decision analysis or practice guideline development exercise. If they are to optimize the rigour of these tentative recommendations they will comprehensively identify all options and outcomes and use their meta-analysis to establish the causal links between the two. They may then choose to label values in only a qualitative way, such as: "we value preventing a stroke considerably more highly than incurring a gastrointestinal bleed. Given this value, we would be willing to treat a moderate to large number of patients to prevent a single target event, and would therefore recommend treating all but those at lowest risk of stroke."
Clinicians may find such recommendations useful, and they have the advantage of highlighting that if one does not share the specified values, one would choose an alternative treatment strategy. They may not, however, reflect community or patient preferences. In addition, they are less specific than the process of placing a number on our values. While quantifying values may make us uncomfortable, we are regularly (if unconsciously) making such judgements in the process of instituting or withholding treatment for our patients.
Are Treatment Recommendations Desirable at All?
The approaches we have described highlight that patient management decisions are always a function of both evidence and preferences. Clinicians may point out that values are likely to differ substantially between settings. Monitoring of anticoagulant therapy might take on a much stronger negative value in a rural setting where travel distances are large, or in a more severely resource-constrained environment where there is a direct inverse relationship between (for example) the resources available for purchase of antibiotics and those allocated to monitoring levels of anticoagulation.
Patient-to-patient differences in values are equally important. The magnitude of the negative value of anticoagulant monitoring, or the relative negative value associated with a stroke versus a gastrointestinal bleed, will vary widely between individual patients, even in the same setting. If decisions are so dependent on preferences, what is the point of recommendations?
This line of argument suggests that investigators should systematically search, accumulate and summarize information for presentation to clinicians. In addition, investigators may highlight the implications of different sets of values for clinical action. The dependence of the decision on the underlying values, and the variability of values, would suggest that such a presentation would be more useful than a recommendation.
We find this argument compelling. Its implementation is, however, dependent on standard methods of summarizing and presenting information that clinicians are comfortable interpreting and using. Furthermore, it implies clinicians having the time, and the methods, to ascertain patient values that they can then integrate with the information from systematic reviews of the impact of management decisions on patient outcomes. These requirements are unlikely to be fully met in the immediate future. Moreover, treatment recommendations are likely to remain useful for providing insight, marking progress, highlighting areas where we need more information, and stimulating productive controversy. In any case, clinical decisions are likely to improve if clinicians are aware of the underlying determinants of their actions, and are able to be more critical about the recommendations offered to them. Our taxonomy may help to achieve both goals.
Resolution of the Scenario
The closest statement to a recommendation relevant to your patient from the original journal article [1] is the following: "many elderly patients with atrial fibrillation are unable to sustain chronic anticoagulation. Furthermore, the risk of bleeding (particularly intracranial haemorrhage) was increased during anticoagulation of elderly patients in our study." Since this study neither summarized the available evidence nor explicitly stated its underlying values, its recommendation is low in rigour.
The decision analysis uses systematic summaries of the available evidence and specifies the patient values used in developing its conclusion that "Treatment with warfarin is cost-effective in patients with non-valvular atrial fibrillation and one or more additional risk factors for stroke" [2], placing it in the high rigour category. Moreover, the patient values used in the analysis appear consistent with your patient's preferences. The only limitation of the decision analysis is that its bottom-line recommendation involves considerations of cost, which you have reservations about including.
The practice guideline [3] once again uses a systematic summary of the evidence, and though making frequent reference to patient values, does not specify the relative value of stroke and bleeding implied in its strong recommendation that high-risk patients such as yours be offered anticoagulant therapy. Nevertheless, armed with consistent recommendations from a systematic synthesis and a recommendation of intermediate rigour, you feel confident recommending that your patient begin taking warfarin.
(Note: Plumbum is the homeopathic remedy made from lead.)
(This case is published on the Impossible Cure website, www.impossiblecure.com, the site of Amy Lansky, author of the best-selling book on homeopathy, Impossible Cure.)
A further and even more dramatic case is detailed in Amy Lansky's book.
3.
Users' Guides to Evidence-Based Practice
http://www.cche.net/usersguides/main.asp
1. Was the assignment of patients to treatment randomized?
During the 1970s and early 1980s surgeons increasingly undertook extracranial-intracranial bypass (that is, anastomosis of a branch of the external carotid artery, the superficial temporal, to a branch of the internal carotid artery, the middle cerebral). They believed it prevented strokes in patients whose symptomatic cerebrovascular disease was otherwise surgically inaccessible. This conviction was based on the comparison of clinical outcomes among non-randomized "cohorts" of patients who, for whatever reason, had and had not undergone this operation, for the former appeared to fare much better than the latter. To the surprise of many and the indignation of a few, a large multi-center randomized trial, in which patients were allocated to receive or forego this operation using a process analogous to flipping a coin, demonstrated that the only effect of surgery was to make patients worse off in the immediate post-surgical period; long-term outcome was unaffected [6]. Other surprises generated by randomized trials that contradicted the results of less rigorous trials include the demonstration that steroids may increase (rather than reduce) mortality in patients with sepsis [7], that steroid injections do not ameliorate facet-joint back pain [8], and that plasmapheresis does not benefit patients with polymyositis [9]. Such surprises may occur when treatments are assigned by random allocation, rather than by the conscious decisions of clinicians and patients. In short, clinical outcomes result from many causes, and treatment is just one of them: underlying severity of illness, the presence of comorbid conditions, and a host of other prognostic factors (unknown as well as known) often swamp any effect of therapy. Because these other features also influence the clinician's decision to offer the treatment at issue, nonrandomized studies of efficacy are inevitably limited in their ability to distinguish useful from useless or even harmful therapy.
As confirmation of this fact, it turns out that studies in which treatment is allocated by any method other than randomization tend to show larger (and frequently "false-positive") treatment effects than do randomized trials [10] [11] [12] [13]. The beauty of randomization is that it assures, if sample size is sufficiently large, that both known and unknown determinants of outcome are evenly distributed between treatment and control groups.
What can the clinician do if no one has done a randomized trial of the therapeutic question she faces? She still has to make a treatment decision, and so must rely on weaker studies. In a later article in this series devoted to deciding whether a therapy or an exposure causes harm (a situation when randomization is usually not possible) we deal with how to assess weaker study designs. For now, you should bear in mind that non-randomized studies provide much weaker evidence than do randomized trials.
2. Were all patients who entered the trial properly accounted for and attributed at its conclusion?
This guide has two components: Was followup complete? And were patients analyzed in the groups to which they were randomized?
a) Was followup complete?
Every patient who entered the trial should be accounted for at its conclusion. If this is not done, or if substantial numbers of patients are reported as "lost to followup," the validity of the study is open to question. The greater the number of subjects who are lost, the more the trial may be subject to bias because patients who are lost often have different prognoses from those who are retained, and may disappear because they suffer adverse outcomes (even death) or because they are doing well (and so did not return to clinic to be assessed).
Readers can decide for themselves when the loss to follow-up is excessive by assuming, in positive trials, that all patients lost from the treatment group did badly, and all lost from the control group did well, and then recalculating the outcomes under these assumptions. If the conclusions of the trial do not change, then the loss to follow-up was not excessive. If the conclusions would change, the strength of inference is weakened (that is, less confidence can be placed in the study results). The extent to which the inference is weakened will depend on how likely it is that treatment patients lost to followup all did badly, while control patients lost to followup all did well.
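The "worst case" recalculation described above can be sketched in a short function; the trial counts below are hypothetical, chosen only to illustrate how a modest loss to followup can erase an apparent benefit:

```python
def worst_case_check(rx_events, rx_n, rx_lost, ctl_events, ctl_n, ctl_lost):
    """Event rates as reported, and under the worst-case assumption that,
    in a positive trial, every lost treatment patient did badly and every
    lost control patient did well."""
    reported = (rx_events / rx_n, ctl_events / ctl_n)
    worst = ((rx_events + rx_lost) / (rx_n + rx_lost),
             ctl_events / (ctl_n + ctl_lost))
    return reported, worst

# Hypothetical trial: 15/100 events on treatment (5 lost to followup),
# 20/100 events on control (5 lost to followup).
reported, worst = worst_case_check(15, 100, 5, 20, 100, 5)
print(reported)  # (0.15, 0.2): treatment appears to reduce events
print(worst)     # both rates ~0.19: the apparent benefit disappears entirely
```

If the trial's conclusion survives such a recalculation, loss to followup was not excessive; in this hypothetical it does not survive, so the strength of inference is weakened.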
b) Were patients analyzed in the groups to which they were randomized?
As in routine practice, patients in randomized trials sometimes forget to take their medicine or even refuse their treatment altogether. Readers might, at first blush, agree that patients who never actually received their assigned treatment should be excluded from analyses of efficacy. Not so.
The reasons people don't take their medication are often related to prognosis. In a number of randomized trials non-compliant patients have fared worse than those who took their medication as instructed, even after taking into account all known prognostic factors, and even when their medications were placebos! [14] [15] [16] [17] [18] [19] Excluding non-compliant patients from the analysis leaves behind those who may be destined to have a better outcome and destroys the unbiased comparison provided by randomization.
The situation is similar with surgical therapies. Some patients randomized to surgery never have the operation because they are too sick, or suffer the outcome of interest (such as stroke or myocardial infarction) before they get to the operating room. If investigators include such patients, who are destined to do badly, in the control arm but not in the surgical arm of a trial, even a useless surgical therapy will appear to be effective. However, the apparent effectiveness of surgery will come not from a benefit to those who have surgery, but the systematic exclusion of those with the poorest prognosis from the surgical group.
This principle of attributing all patients to the group to which they were randomized results in an "intention-to-treat" analysis. This strategy preserves the value of randomization: prognostic factors that we know about, and those we don't know about, will be, on average, equally distributed in the two groups, and the effect we see will be just that due to the treatment assigned.
B. Secondary Guides
3. Were patients, their clinicians, and study personnel "blind" to treatment?
Patients who know that they are on a new, experimental treatment are likely to have an opinion about its efficacy, as are their clinicians or the other study personnel who are measuring responses to therapy. These opinions, whether optimistic or pessimistic, can systematically distort both the other aspects of treatment, and the reporting of treatment outcomes, thereby reducing our confidence in the study's results. In addition, unblinded study personnel who are measuring outcomes may provide different interpretations of marginal findings or differential encouragement during performance tests, either one of which can distort their results [20].
The best way of avoiding all this bias is double-blinding (sometimes referred to as double-masking), which is achieved in drug trials by administering a placebo, indistinguishable from active treatment in appearance, taste and texture but lacking the putative active ingredient, to the control group. When you read reports on treatments (such as trials of surgical therapies) in which patients and treating clinicians cannot be kept blind, you should note whether investigators have minimized bias by blinding those who assess clinical outcomes.
4. Were the groups similar at the start of the trial?
For reassurance about a study's validity, readers would like to be informed that the treatment and control groups were similar for all the factors that determine the clinical outcomes of interest save one: whether they received the experimental therapy. Investigators provide this reassurance when they display the "entry" or "baseline" prognostic features of the treatment and control patients. Although we never will know whether similarity exists for the unknown prognostic factors, we are reassured when the known prognostic factors are nicely balanced. Randomization doesn't always produce groups balanced for known prognostic factors. When the groups are small, chance may place those with apparently better prognoses in one group. As sample size increases, this is less and less likely (this is analogous to multiple coin flips: one wouldn't be too surprised to see seven heads out of ten coin flips, but one would be very surprised to see seventy heads out of one hundred coin flips).
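The coin-flip analogy can be checked exactly with a binomial tail probability; a minimal sketch using only Python's standard library:

```python
from math import comb

def prob_at_least(heads: int, flips: int) -> float:
    """Probability of at least `heads` heads in `flips` fair coin flips."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2 ** flips

print(prob_at_least(7, 10))    # ~0.17: 7 of 10 heads is unremarkable
print(prob_at_least(70, 100))  # ~0.00004: 70 of 100 would be astonishing
```

The same arithmetic explains why small trials may, by chance alone, be imbalanced on prognostic factors, while large trials almost never are.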
The issue here is not whether there are statistically significant differences in known prognostic factors between treatment groups (in a randomized trial one knows in advance that any differences that did occur happened by chance) but rather the magnitude of these differences. If they are large, the validity of the study may be compromised. The stronger the relationship between the prognostic factors and outcome, and the smaller the trial, the more the differences between groups will weaken the strength of any inference about efficacy.
All is not lost if the treatment groups are not similar at baseline. Statistical techniques permit adjustment of the study result for baseline differences. Accordingly, readers should look for documentation of similarity for relevant baseline characteristics, and if substantial differences exist should note whether the investigators conducted an analysis that adjusted for those differences. When both unadjusted and adjusted analyses reach the same conclusion, readers justifiably gain confidence in the validity of the study result.
5. Aside from the experimental intervention, were the groups treated equally?
Care for experimental and control groups can differ in a number of ways besides the test therapy, and differences in care other than that under study can distort the results. If one group received closer followup, events might be more likely to be reported, and patients may be treated more intensively with non-study therapies. For example, in trials of new forms of therapy for resistant rheumatoid arthritis, ancillary treatment with systemic steroids (extremely effective for relieving symptoms), if administered more frequently to the control group than to the treatment group, could obscure an experimental drug's true treatment effect (unless exacerbation requiring steroids were itself counted as an outcome).
Interventions other than the treatment under study, when differentially applied to the treatment and control groups, often are called "cointerventions". Cointervention is a more serious problem when double-blinding is absent, or when the use of very effective non-study treatments is permitted at the physicians' discretion. Clinicians gain greatest confidence in the results when permissible cointerventions are described in the methods section and documented to be infrequent occurrences in the results.
The foregoing five guides (two primary and three secondary), applied in sequence, will help the reader determine whether the results of an article on therapy are likely to be valid. If the results are valid, then the reader can proceed to consider the magnitude of the effect and the applicability to her patients.
Scenario Resolution
Readers may be interested in how well the trial of plasmapheresis in patients with lupus nephritis met the tests of validity. With respect to primary criteria, randomization was rigorously conducted, as treatment was assigned through a phone call to the study's Methods Center. One patient assigned to standard therapy was lost to followup, and all the other patients were analyzed in the group to which they had been assigned. With respect to secondary criteria, the study was not blinded, the two groups were comparable at the start of the trial, and the authors provide little information about comparability of other treatments.
In the introductory paper in this series, we described the concept of strength of inference. The final assessment of validity is never a "yes" or "no" decision and must, to some extent, be subjective. We judge that the methods in this trial were, overall, strong, and provide a valid start for deciding whether or not to administer plasmapheresis to our patient with severe lupus nephritis.
II. What were the results?
Clinical Scenario
You are a general internist who is asked to see a 65 year-old man with controlled hypertension and a six-month history of atrial fibrillation resistant to cardioversion. Although he has no evidence for valvular or coronary heart disease, the family physician who referred him to you wants your advice on whether the benefits of long-term anticoagulants (to reduce the risk of embolic stroke) outweigh their risks (of hemorrhage from anticoagulant therapy). The patient shares these concerns, and doesn't want to receive a treatment that would do more harm than good. You know that there have been randomized trials of warfarin for non-valvular atrial fibrillation, and decide that you'd better review one of them.
The Search
The ideal article addressing this clinical problem would include patients with non-valvular atrial fibrillation, and would compare the effect of warfarin and a control treatment, ideally a placebo, on the risk of emboli (including embolic stroke) and also on the risk of the complications of anticoagulation. Randomized, double-blind studies would provide the strongest evidence.
In the software program "Grateful Med" you select a Medical Subject Heading (MeSH) that identifies your population, "atrial fibrillation," another that specifies the intervention, "warfarin", and a third that specifies the outcome of interest, "stroke" (which the software automatically converts to "explode cerebrovascular disorders", meaning that all articles indexed under cerebrovascular disorders or its subheadings are potential targets of the search), while restricting the search to English-language studies. To ensure that, at least on your first pass, you identify only the highest quality studies, you include the methodological term "randomized controlled trial (PT)" (PT stands for publication type). The search yields nine articles. Three are editorials or commentaries, one addresses prognosis, and one focuses on quality of life on anticoagulants. You decide to read the most recent of the four randomized trials [21].
Reading the study, you find it meets the validity criteria you learned about in a prior article in this series [22]. To answer your patient's and the referring physician's concerns, however, you need to delve further into the relation between benefits and risks.
1. How large was the treatment effect?
Most frequently, randomized clinical trials carefully monitor how often patients experience some adverse event or outcome. Examples of these "dichotomous" outcomes ("yes" or "no" outcomes that either happen or don't happen) include cancer recurrence, myocardial infarction, and death. Patients either do or do not suffer an event, and the article reports the proportion of patients who develop such events. Consider, for example, a study in which 20% (0.20) of a control group died, but only 15% (0.15) of those receiving a new treatment died. How might these results be expressed? Table 2 provides a summary of ways of presenting the effects of therapy.
Table 2: Measures of the effects of therapy
Measure                                     Formula                 Example
Risk without therapy (baseline risk)        X                       20/100 = 0.20 (20%)
Risk with therapy                           Y                       15/100 = 0.15 (15%)
Absolute risk reduction (risk difference)   X - Y                   0.20 - 0.15 = 0.05
Relative risk                               Y/X                     0.15/0.20 = 0.75
Relative risk reduction (RRR)               [1 - Y/X] x 100         [1 - 0.75] x 100 = 25%
                                            or [(X - Y)/X] x 100    [0.05/0.20] x 100 = 25%
95% confidence interval for the RRR                                 -38% to +59%
One way would be as the absolute difference (known as the absolute risk reduction, or risk difference), between the proportion who died in the control group (X) and the proportion who died in the treatment group (Y), or X - Y = 0.20 - 0.15 = 0.05. Another way to express the impact of treatment would be as a relative risk: the risk of events among patients on the new treatment, relative to that among controls, or Y/X = 0.15 / 0.20 = 0.75.
The most commonly reported measure of dichotomous treatment effects is the complement of this relative risk, and is called the relative risk reduction (RRR). It is expressed as a per cent: (1 - Y/X) x 100% = (1 - 0.75) x 100% = 25%. A RRR of 25% means that the new treatment reduced the risk of death by 25% relative to that occurring among control patients; the greater the relative risk reduction, the more effective the therapy.
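These measures are simple arithmetic. As an illustration (a sketch, not part of the original series), the worked example can be reproduced in a few lines of Python:

```python
def risk_measures(x, y):
    """Summarize a dichotomous treatment effect.

    x -- event rate (risk) in the control group
    y -- event rate (risk) in the treatment group
    Returns (absolute risk reduction, relative risk,
    relative risk reduction in per cent).
    """
    arr = x - y            # absolute risk reduction (risk difference)
    rr = y / x             # relative risk
    rrr = (1 - rr) * 100   # relative risk reduction, per cent
    return arr, rr, rrr

# Worked example: 20% control-group mortality, 15% on treatment.
arr, rr, rrr = risk_measures(0.20, 0.15)
# arr is 0.05, rr is 0.75, rrr is 25%, matching Table 2
```

Note that each measure conveys a different impression: the same trial can be described as a 25% relative reduction or a 5 percentage-point absolute reduction in mortality.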
2. How precise was the estimate of treatment effect?
The true risk reduction can never be known; all we have is the estimate provided by rigorous controlled trials, and the best estimate of the true treatment effect is that observed in the trial. This estimate is called a "point estimate" in order to remind us that, although the true value lies somewhere in its neighbourhood, it is unlikely to be precisely correct. Investigators tell us the neighbourhood within which the true effect likely lies by the statistical strategy of calculating confidence intervals [23].
We usually (though arbitrarily) use the 95% confidence interval, which can be simply interpreted as defining the range that includes the true relative risk reduction 95% of the time. You'll seldom find the true RRR toward the extremes of this interval, and you'll find the true RRR beyond these extremes only 5% of the time, a property of the confidence interval that relates closely to the conventional level of "statistical significance" of p < 0.05.

It would not be appropriate to extrapolate the results of a trial of asymptomatic patients with severe stenoses (> 70%) [35] or a trial of asymptomatic patients with moderate stenoses [36] to their situation. However, it is entirely appropriate to extrapolate from the previously identified study [4], which enrolled symptomatic patients with similar degrees of stenoses as our patients.
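The interval quoted in Table 2 (-38% to +59%) can be reproduced from the raw counts using the normal approximation on the log relative-risk scale (a common method, shown here as a sketch; individual trials may compute their intervals differently):

```python
import math

def rrr_confidence_interval(a, n1, b, n2, z=1.96):
    """Approximate 95% CI for the relative risk reduction (in %).

    a deaths among n1 control patients; b deaths among n2 treated
    patients.  Uses the normal approximation for ln(relative risk).
    """
    rr = (b / n2) / (a / n1)
    se = math.sqrt(1/a - 1/n1 + 1/b - 1/n2)  # SE of ln(RR)
    rr_low = math.exp(math.log(rr) - z * se)
    rr_high = math.exp(math.log(rr) + z * se)
    # RRR = 1 - RR, so the bounds swap
    return (1 - rr_high) * 100, (1 - rr_low) * 100

# Worked example: 20/100 control deaths vs 15/100 treated deaths.
low, high = rrr_confidence_interval(20, 100, 15, 100)
# low is about -38, high about +59, matching Table 2
```

The wide interval reflects the small sample: with only 100 patients per arm, the data are compatible with anything from substantial harm to substantial benefit.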
We will now outline two approaches to addressing the latter two questions: our patient's risk of adverse events without treatment, and our patient's risk of harm with therapy. [21]
Approach 1: Generation of Patient-Specific Baseline Risks
Recognizing that patients are rarely identical to the average study patient, clinicians can derive estimates of the patient's baseline risk from various sources. First, if the study reports risk in various subgroups, they can use the baseline risk for the subgroup most like their patient. However, most trials are not large enough to allow the generation of precise estimates of baseline risk in various patient subgroups and one may have to search for systematic reviews (particularly those including individual patient data) [37] to glean useful information. For example, the AF investigators pooled the individual patient data from all of the randomized trials testing antithrombotic therapy in non-valvular atrial fibrillation and were able to provide estimates of prognosis for patients in clinically important subgroups.[24]
Second, as an extension of the subgroup approach, one can use clinical prediction guides to quantitate an individual patient's potential for benefit (and harm) from therapy. [32] [38] [39] Returning to our example, a prognostic model that could identify patients with carotid stenosis most likely to benefit from endarterectomy would be very useful. Such a model would need to incorporate the risk of stroke without surgery (and thus the potential benefit from surgery) with the risk of stroke or other adverse outcomes from surgery. Using the European Carotid Surgery Trial database [40], investigators have developed a preliminary version of just such a model. [41] However, our enthusiasm for applying this clinical prediction guide should be tempered until it has been prospectively validated in a different group of patients (and preferably with different clinicians). [38]
Third, one could derive an estimate of their patient's baseline risk from published papers (preferably population-based cohort studies) [42] that describe the prognosis of similar (untreated) patients. For example, analysis of the Malmo Stroke Registry demonstrated that in the three years after a stroke, patients have a 6% risk of recurrent nonfatal stroke and a 43% risk of death; these risks were even higher in older patients or those with diabetes mellitus or cardiac disease. [43]
Analogous to the estimation of patient-specific baseline risk, clinicians can use these same sources of information to determine an individual patient's likelihood of harm from treatment. For example, a systematic review of 36 studies relating the risk of peri-operative complications from carotid endarterectomy to various pre-operative clinical characteristics revealed that women were at higher risk than men (odds ratio 1.44 [95% CI 1.14 to 1.83], absolute rate 5.2%). [44]
The final step in generating a patient-specific NNT (or NNH) involves the formula: NNT = 1/(PEER x RRR), where PEER is the patient's estimated event rate, or baseline risk. [21] Given the three-year risk of recurrent disabling stroke in diabetic patients from the Malmo Stroke Registry (8.4%) and the 49% RRR expected with carotid endarterectomy, the patient-specific NNT in a 65-year-old diabetic with ipsilateral carotid stenosis and a minor stroke would be calculated as: NNT = 1/(0.084 x 0.49) = 24. Clinicians who know a patient's baseline risk and RRR can also call on a nomogram to calculate the NNT. [45]
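As a sketch, the patient-specific NNT formula with the numbers from the Malmo example:

```python
def patient_specific_nnt(peer, rrr):
    """NNT for a patient whose baseline risk differs from the trial's.

    peer -- patient's estimated event rate (baseline risk)
    rrr  -- relative risk reduction, as a fraction
    """
    return 1 / (peer * rrr)

# 8.4% three-year risk of disabling stroke, 49% RRR from surgery.
nnt = patient_specific_nnt(0.084, 0.49)
# rounds to 24, as in the text
```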
Approach 2: Clinical Judgment
Alternatively, one can use the study NNT and NNH directly to generate patient-specific estimates. This method involves only two steps and is less time-consuming than the previous method (as, depending on the experience of the clinician, it may not require a detailed literature review).
First, the clinician estimates the patient's risk of the outcome event relative to that of the average control patient in the study and converts this risk to a decimal fraction (= 'ft'). [46] Thus, patients judged to be at lower risk than those in the trials will be assigned an ft < 1, while those judged to be at higher risk will be assigned an ft > 1. There are several sources that a clinician could use to obtain a value for 'ft'. The best estimate would come from a systematic review of all available data about the prognosis of similar patients; individual studies about prognosis would provide the next best estimates. Alternatively, she could use her clinical expertise in assigning a value to 'ft'. While this may appear to be overly subjective, preliminary data suggest that experienced clinicians may be accurate in estimating relative differences in baseline risk (i.e. ft) between patients (far exceeding our abilities to judge absolute risks). [47]
Second, the clinician calculates the patient-specific NNT by dividing the average NNT by 'ft'. Thus, if the clinician felt that patient A was at one-fifth the risk of the average patient in the trial (based on the reduced baseline risk for women demonstrated in the subgroup analyses reported by the investigators) [4], her patient specific NNT for the prevention of one disabling stroke would be 100 (20/0.2).
In addition to considering the benefits from therapy, the clinician needs to consider a patient's risk of adverse events from any intervention. Patients A and B need to be informed that carotid endarterectomy does carry with it a risk of peri-operative death. To individualize your patient's risk of death, you can use the 'f' method just described. For example, patient A may be assumed to be at twice the risk (fh = 2) of peri-operative death as patients in the control group of the study because of her gender, hypertension, and the fact that she has left-sided carotid artery stenosis. [4] [44] You can adjust the NNH using 'fh', assuming the relative risk increase is constant across the spectrum of susceptibilities (an assumption which, as we've noted for RRR, may or may not hold depending on the particular therapy being considered). Thus, patient A's NNH is estimated to be 32 (63/2).
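The 'f' adjustments for Patient A can be sketched the same way (using the ft and fh values assumed in the text):

```python
def f_adjusted(average_n, f):
    """Adjust a trial-average NNT or NNH for an individual patient.

    f is the patient's estimated risk of the relevant event
    relative to the average trial patient.
    """
    return average_n / f

# Patient A: one-fifth the average stroke risk (ft = 0.2),
# twice the average peri-operative death risk (fh = 2).
nnt_a = f_adjusted(20, 0.2)   # patient-specific NNT of 100
nnh_a = f_adjusted(63, 2)     # patient-specific NNH of 31.5 (about 32)
```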
Incorporating Patient Values and Preferences
We have determined the risks of benefit and harm for the individual, but we must still incorporate patient values into the decision-making process. As outlined in a previous Users' Guide, [9] systematically-constructed decision analyses and practice guidelines that include an explicit statement of values can be used to integrate the evidence on benefit/harm with patient values to reach treatment recommendations or establish threshold NNTs. [9] [48] Although this situation would be ideal, such evidence is often not available (we could not identify a relevant decision analysis for our scenario). Moreover, as there is often substantial variation in values between individuals, [49] [50] [51] decision analyses which rely on group averages for values may not always be applicable to a particular patient, although close examination of the utility sensitivity analyses can often provide some guidance. [52] [53] [54]
While active patient involvement in decision making can improve outcomes and reported quality of life, and possibly reduce health care expenditures, [55] [56] [57] [58] [59] [60] [61] the initial step in this process is to determine the extent to which your patient wants to be involved with decision-making (recognizing that this may vary with each clinical decision).
How Much Do Patients Want to Participate?
There are 3 main elements to clinical decision making: the disclosure of information (about the risks and benefits of therapeutic alternatives); the exploration of the patient's values about both the therapy and the potential health outcomes; and, the actual decision. Each patient varies in their desired level of involvement with these steps and clinicians may not accurately gauge the degree to which an individual patient wants to be involved. [62] [63] [64] [65] [66] [67] Some patients may want all available information provided to them and to make the decision themselves with the clinician's role being that of information provider. Other patients may want all the information provided but may want the clinician to make the final decision. Still others may want to collaborate with their clinician in the process. These differences emphasize the need for clinicians to accurately assess patient preferences for information, discussion and decision-making, and tailor their approach to the individual.
Regardless of whether the clinician, the patient, or the partnership will make the decision, clinicians must explore patients' values about the therapy and the potential health outcomes. You can elicit your patient's values in informal ways during exploratory discussions with him/her or by more formal (and time-consuming) methods such as the time-tradeoff, standard gamble or rating scale techniques. [68]
Decision Aids
If your patient's goal is shared decision making, there are several models for providing shared decision-making support. First, formal clinical decision analysis, incorporating the patient's likelihood of the outcome events with his or her own values for each health state, could be used to guide the decision. Performing a clinical decision analysis for each patient would be too time-consuming for the busy clinician, and this approach therefore currently relies on finding an existing decision analysis. In that case, either our patient's values must approximate those in the analysis, or the decision analysis must provide information about the impact of variation in patient values. Computer models available at the bedside may broaden the scope of decision analysis applicability, and permit wider use with individual patients. [69]
Second, investigators have developed numerical methods of presenting information to patients that incorporate calculated patient values though these methods haven't been fully tested. [39] [70] Third, clinicians can utilize "decision aids" that present descriptive and probabilistic information about the disease, treatment options, and potential outcomes. [71] [72] [73] Most commonly, these decision aids present the outcome data in terms of the percentage of people with a certain condition who do well without intervention compared to the percentage who do well with intervention. While each of these methods has considerable merit, they sometimes fall short in terms of comprehensibility, applicability, and efficiency for use on busy clinical services.
The Likelihood of Being Helped or Harmed
One method of expressing information to patients that incorporates their values, can be applied to any clinical decision, and which preliminary evidence suggests may be useful on busy clinical services is the likelihood of being helped versus harmed. [74] The first step in this method is the exploration of patient values about taking the treatment (relative to not taking it) and the severity of adverse events that might be caused by the treatment (relative to the severity of the target event that we hope to avoid with the treatment). To answer these questions, patients are provided with brief descriptions of both the target event we'd like to prevent and the potential adverse event from the treatment. [Table 4]
Table 4: Sample descriptions of stroke and death
A stroke can result in weakness and loss of function in one side of your body. With a disabling stroke, you are admitted to a hospital for initial treatment and then transferred to a rehabilitation hospital for at least two months of intense rehabilitation. You regain some movement in your arm and leg but are left with a permanent weakness in that side of your body and require assistance with activities of daily living such as getting dressed, taking a bath, cooking, eating and using a toilet. You have trouble getting the words out when you speak.
A surgical procedure called carotid endarterectomy can decrease the risk of disabling stroke but can result in death. Death is more likely to occur in the first 30 days after this surgical procedure.
Following the review of the description of the target event, the clinician presents the patient with a rating scale (anchored at 0 [=death] and 1 [=full health]) and asks her to place a mark where she would consider the value of the target event.
During your discussions with Patient A, you discover that she is a fiercely independent newspaper journalist who lives alone and previously cared for her father after he suffered a disabling stroke. She believes that a disabling stroke is almost as bad as immediate death and assigns it a value of 0.025. Similarly, you give your patient the description of the adverse event that could result from the therapy (death within 30 days of surgery) and ask her to assess this using the rating scale (she assigned a value of 0.15 since death may not necessarily be immediate). Using the two ratings, you could infer that she believes a disabling stroke to be six times worse than death within the next month (0.15/0.025). This exercise should be repeated on another occasion to confirm that her values are stable.
In contrast, during your conversation with Patient B, you find that he is a former truck driver who recently retired to the country with his wife so that he could be near his daughter and grandson. When you explore his values, he decides that death is 8 times worse than having a disabling stroke.
How can you now incorporate your individual patient's values into the description of therapy? The average patient with a hemispheric stroke has a 10.3% chance of having a disabling stroke over 5 years, [Table 1] but this can be decreased for patients with ipsilateral moderate carotid stenosis to 5.3% with carotid endarterectomy. [4] The average NNT for such patients is 20. The absolute risk increase for death for patients having carotid endarterectomy is 1.6% [19], which translates to an average NNH of 63 (1/0.016).
To calculate the likelihood of being helped versus harmed (LHH), 1/NNT (=ARR) and 1/NNH (=ARI) are combined into an aggregate ratio. For both patients, the first approximation of the LHH is given by: LHH = (1/NNT) : (1/NNH) = (1/20) : (1/63) = 3 to 1 in favor of surgery. As a first approximation, both patients can be told that 'carotid endarterectomy is three times as likely to help you as harm you'.
However, this first approximation ignores both patients' unique individual risks of, and values for, stroke and perioperative death. You can particularize the LHH for each patient using the 'f' factors we described previously. As discussed above, women have a lower risk of stroke and the 'ft' for Patient A can be estimated at approximately 0.2. [4] This study (and a systematic review of other studies) [44] found that women, patients with left-sided carotid disease, and patients with a history of hypertension have increased risks of perioperative deaths (relative risks ranging from 1.4 to 2.3). Thus, Patient A is at an increased risk of death from surgery (fh = 2). Her risk-adjusted LHH is: LHHA = (1/NNT) x ft : (1/NNH) x fh = (1/20) x 0.2 : (1/63) x 2 = 3 to 1 in favor of medical therapy. Similarly, the LHH for Patient B can be particularized for his unique risks. Men had a greater risk of stroke in the trial [4] and you can estimate from the reported subgroup analyses that Patient B's ft is approximately 1.25. Patient B also has left-sided carotid disease, suggesting that his risk of perioperative death is increased (fh = 2). His risk-adjusted LHH is: LHHB = (1/20) x 1.25 : (1/63) x 2 = 2 to 1 in favor of surgery.
These risk-adjusted LHHs still ignore each patient's values. Patient A ranked a disabling stroke as 6 times worse than death and this number (the 's' or 'severity' factor) can be used to adjust the LHH as follows: LHHA= (1/NNT) x ft x s: (1/NNH) x fh = (1/20) x 0.2 x 6 : (1/63) x 2 = 2 to 1 in favor of surgery. Thus, incorporating Patient A's values and unique risks of benefit and harm, she is twice as likely to be helped as harmed by surgery. On the other hand, Patient B stated that death was 8 times worse than a stroke and incorporating this into his LHH you calculate: LHHB = (1/20) x 1.25 : (1/63) x 2 x 8 = 4 to 1 in favor of medical therapy.
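The LHH arithmetic for both patients can be collected into a single routine (a sketch; the severity factor 's' multiplies the help side when the target event is judged worse than the adverse event, and the harm side when the reverse holds):

```python
def lhh(nnt, nnh, ft=1.0, fh=1.0, s_help=1.0, s_harm=1.0):
    """Likelihood of being helped vs harmed, as the ratio help/harm.

    ft, fh        -- patient's risks relative to the trial average
    s_help/s_harm -- severity weights for the target/adverse event
    """
    return ((1 / nnt) * ft * s_help) / ((1 / nnh) * fh * s_harm)

# Patient A: ft = 0.2, fh = 2, rates a disabling stroke
# as 6 times worse than peri-operative death.
a = lhh(20, 63, ft=0.2, fh=2, s_help=6)   # about 1.9, i.e. 2:1 for surgery

# Patient B: ft = 1.25, fh = 2, rates death as 8 times worse.
b = lhh(20, 63, ft=1.25, fh=2, s_harm=8)  # about 0.25, i.e. 4:1 against
```

Expressing the routine this way makes sensitivity analysis easy: varying ft, fh, or the severity weights shows immediately how robust the conclusion is to each assumption.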
These two cases illustrate how to incorporate your patient's values into the decision-making process. At present, this process is time consuming and inexact, and we don't know how much difference it makes to patients or their clinical outcomes, so this approach is best considered as a logical and feasible, but untested, model. If you are unsure of your patient's 'f' or if there is some uncertainty around your patient's estimate of values, you could do a sensitivity analysis (inserting different values for these variables into the above equation to see how this is reflected in the LHH). We've described a simple formulation for the LHH (ignoring other outcomes from carotid endarterectomy and the risks of the diagnostic workup) [75], but this could be modified for more complex situations.
II. What are the Results?
5. What were the incremental costs and effects of each strategy?
Let us start with the incremental costs. Look in the text and tables for the listings of all the costs considered for each treatment option, and remember that costs are the product of the quantity of a resource used and its unit price. These should include the costs incurred to 'produce' the treatment, such as the physician's time, nurse's time, materials, etc., which we might term the 'up-front costs', as well as the 'downstream costs' due to resources consumed in the future and associated with clinical events that are attributable to the therapy.
The study by Mark et al [2] quantifies resources used by treatment group in three periods of time over one year: initial hospitalization, discharge to 6 months, and 6 months to one year. Both treatment groups were very similar in their use of hospital resources over the year; both experienced a mean length of stay of 8 days, of which 3.5 were in the ICU, and both groups had the same rates of CABG (13%) and PTCA (31%) on initial hospitalization. As summarized in Table 2, the one-year health care costs, excluding the thrombolytic agent, were $24,990 per tPA-treated patient and $24,575 per streptokinase-treated patient. As is clear from Table 2, the main cost difference between the two groups is the cost of the thrombolytic drugs themselves: $2,750 for tPA and $320 for streptokinase. The overall difference in cost between tPA-treated and streptokinase-treated patients is therefore our incremental cost at $2,845 over the first year. This is discounted at 5% per annum for a final figure of $2,760. The authors argue that there is no cost difference between the two groups after one year. These data for incremental costs from tPA are very similar to those estimated by Kalish [3], who found a difference of $2,535 in the use of tPA to manage MI in preference to streptokinase.
The measure of effectiveness chosen in the Mark et al [2] study is the gain in life expectancy associated with tPA. The available follow-up experience was to one year, with 89.9% surviving in the streptokinase group versus 91.1% in the tPA group (p < 0.001). To translate these observations into life expectancy gains, the authors project survival curves for another 30 years or more using first a 14-year MI survivorship database from Duke University and then an assumption that survivorship will follow a statistical distribution (Gompertz). Having projected two survival curves, the authors calculate the area under each curve, which represents the expected value of survival time or life expectancy. For tPA patients life expectancy was 15.41 years and for streptokinase 15.27 years. As summarized in Table 2, the difference in life expectancy is 0.14 years per patient; or phrased another way, for every 100 patients treated with tPA in preference to streptokinase we would expect to gain 14 years of life.
In other situations, quantifying incremental effectiveness may be more difficult. Not all treatments change survival, and those that do not may affect different dimensions of health in many ways. For example, drug treatment of asymptomatic hypertension may result in short-term health reductions from drug side-effects, in exchange for long-term expected health improvements, such as reduced risk of strokes. Note that in our tPA example the outcome is not unambiguously restricted to survival benefit because there is a small but statistically significant increased risk of non-fatal hemorrhagic stroke associated with tPA [1]. The existence of trade-offs between different aspects of health, or between length of life versus quality of life, means that to arrive at a summary measure of net effectiveness, we must implicitly or explicitly weight the 'desirability' of different outcomes relative to each other.
There is a large and growing literature on quantitative approaches for combining multiple health outcomes into a single metric using patient preferences [32]. Foremost among current practice is the construction of quality-adjusted life years (QALYs) as a measure that captures the impact of therapies in the two broad domains of survival and quality-of-life. (QALYs were described in more detail earlier in this series [10].) Alternative approaches include the Healthy Year Equivalent method [33].
Our second thrombolytic study by Kalish et al [3] used QALYs as their primary measure of effectiveness. First they took the same one-year survival probabilities from the GUSTO study and projected them forward to estimate life expectancy using data from a different longitudinal study, the Worcester Heart Attack Study. Similar to Mark et al [2] they estimate that the average life span after MI is 14.6 years and then used GUSTO risk reductions to estimate life expectancy difference for tPA and streptokinase patients.
To derive QALYs they applied utility weights (from death = 0 to healthy = 1) to patients surviving the MI but sustaining morbid events over time, such as non-fatal stroke (utility of 0.79) or reinfarction (utility of 0.93). These utility weights were taken from the literature, based on preference measurements undertaken in the GISSI-2 trial [34]. However, because the differences between treatment groups in the risk of the morbid events receiving quality-adjustment were small, the difference in QALYs (0.084), using 30-day GUSTO survival data, is identical to the effect calculated by Mark et al [2] using unadjusted life expectancy, even though the total number of future QALYs (8.842 for streptokinase and 8.926 for tPA) is lower than the unadjusted life years.
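The quality-adjustment itself is just a weighted sum of time spent in each health state. A minimal sketch (the utility weight for non-fatal stroke is the one quoted above; the years spent in each state are invented purely for illustration):

```python
def qalys(states):
    """Sum quality-adjusted life years over (years, utility) pairs."""
    return sum(years * utility for years, utility in states)

# Hypothetical survivor: 12 years in full health (utility 1.0),
# then 2 years after a non-fatal stroke (utility 0.79).
q = qalys([(12, 1.0), (2, 0.79)])
# 13.58 QALYs, versus 14 unadjusted life years
```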
In summary, both studies use the efficacy data from the GUSTO trial as their starting point to conclude that tPA treatment is more costly than streptokinase but that it provides an increase in survival (quality-adjusted or otherwise). Table 2, using Mark et al data, illustrates the next calculation in both studies which determines the incremental cost-effectiveness ratio for tPA. After discounting future costs and effects at 5% per year to reflect time preference (for rationale, see our first paper [35]), the difference (tPA minus streptokinase) in cost per patient over the year (and by extension into the future because they assume no cost differences beyond one year) is $2,760, which is divided by the difference in life expectancy per patient (0.084) to yield a ratio of $32,678 per year of life gained.
A simple interpretation of this ratio is that it is the 'price' at which we are buying additional years of life by using tPA in preference to streptokinase; the lower this price, the more attractive is the use of tPA. The Kalish study [3] reaches a similar incremental cost-effectiveness ratio (with their adjusted denominator of QALYs and using the 30-day risk reduction GUSTO data) of $30,300 per QALY. These are the main results of the studies; we will discuss their interpretation later in this article.
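The incremental cost-effectiveness ratio itself is a single division. A sketch with the rounded figures quoted above (note that with these rounded inputs the result comes out near $32,900; the published $32,678 reflects less-rounded values for the discounted life-expectancy gain):

```python
def icer(delta_cost, delta_effect):
    """Incremental cost-effectiveness ratio: extra cost per
    extra unit of effect (here, dollars per life-year gained)."""
    return delta_cost / delta_effect

ratio = icer(2760, 0.084)  # roughly 32,857 dollars per life-year
```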
In current health care practice, judgements often reflect clinician or societal values concerning whether intervention benefits are worth the cost. Consider the decisions regarding administration of tissue plasminogen activator (tPA) versus streptokinase to patients with acute myocardial infarction, or clopidogrel versus aspirin to patients with a transient ischemic attack. In both cases, evidence from large randomized trials suggests the more expensive agents are, for many patients, more effective. In both cases, many authoritative bodies recommend first-line treatment with the less effective drug, presumably because they believe society's resources would be better used in other ways. Implicitly, they are making a value or preference judgement about the trade-off between deaths and strokes prevented, and resources spent.
By values and preferences, we mean the underlying processes we bring to bear in weighing what our patients and our society will gain, or lose, when we make a management decision. A number of the Users' Guides focus on how clinicians can use research results to clearly understand the magnitude of potential benefits and risks associated with alternative management strategies.[6] [7] [8] [9] [10] Three guides focus on the process of balancing those benefits and risks when using treatment recommendations [11] [12] and in making individual treatment decisions.[13] The explicit enumeration and balancing of benefits and risks brings the underlying value judgements involved in making management decisions into bold relief.
Acknowledging that values play a role in every important patient care decision highlights our limited understanding of eliciting and incorporating societal and individual values. Health economists have played a major role in developing a science of measuring patient preferences.[14] [15] Some decision aids incorporate patient values indirectly: if patients truly understand the potential risks and benefits, their decisions will likely reflect their preferences.[16] These developments constitute a promising start. Nevertheless, many unanswered questions concerning how to elicit preferences, and how to incorporate them in clinical encounters already subject to crushing time pressures, remain. Addressing these issues constitutes an enormously challenging frontier for EBM.
Clinical Scenario
You are a primary care practitioner considering the possibility of anticoagulant therapy with warfarin in a new patient, a 76-year-old woman with chronic congestive heart failure and atrial fibrillation. The patient has no hypertension, valvular disease, or other comorbidity. Aspirin is the only antithrombotic agent that the patient has received over the 10 years during which she has been in atrial fibrillation. Her other medications include captopril, furosemide, and metoprolol. The duration of the patient's atrial fibrillation, and her dilated left atrium on echocardiogram, dissuade you from prescribing antiarrhythmic therapy. Discussing the issue with the patient, you find she places a high value on avoiding a stroke, a somewhat lower value on avoiding a major bleed, and would accept the inconvenience associated with monitoring anticoagulant therapy.
You have little inclination to review the voluminous original literature relating to the benefits of anticoagulant therapy in reducing stroke or its risk of bleeding, but hope to find an evidence-based recommendation to guide your advice to the patient. In your office file relating to this problem you find a report of a primary study [1], a decision analysis [2], and a recent practice guideline [3] that you hope will help.
Introduction
Each day, clinicians make dozens of patient management decisions. Some are relatively inconsequential, some are important. Each one involves weighing benefits and risks, gains and losses, and recommending or instituting a course of action judged to be in the patient's best interest. These decisions involve an implicit consideration of the relevant evidence, an intuitive integration of the evidence, and a weighing of the likely benefits and harms. In making choices, clinicians may benefit from structured summaries of the options and outcomes, systematic reviews of the evidence regarding the relation between options and outcomes, and recommendations regarding the best choices. This Users' Guide explores the process of developing recommendations, suggests how the process may be conducted systematically, and introduces a taxonomy for differentiating recommendations that are more rigorous (and thus more likely to be trustworthy) from those that are less rigorous (and thus at greater risk of being misleading).
While recommendations may be directed at health policy makers, our focus is advice for practicing clinicians. We will begin by considering the implicit steps that are involved in making a recommendation.
The Process of Developing a Recommendation
Figure 1 presents the steps involved in developing a recommendation, and the formal strategies that are available. The first step in clinical decision-making is to define the decision. This involves specifying the alternative courses of action, and the alternative outcomes. Often, treatments are designed to delay or prevent an adverse outcome such as stroke, death, or myocardial infarction. In our discussion, we will refer to the outcomes that treatment is designed to prevent as "target outcomes". Treatments are associated with their own adverse outcomes -- side effects or toxicity. Ideally, the definition of the decision will be comprehensive -- all reasonable alternatives will be considered, and all possible beneficial and adverse outcomes will be identified. In patients like the woman in the scenario with non-valvular atrial fibrillation, options include not treating, giving aspirin, or anticoagulating with warfarin. Outcomes include minor and major embolic stroke, intracranial haemorrhage, gastrointestinal haemorrhage, minor bleeding, and the inconvenience associated with taking and monitoring medication.
Figure 1: A Schematic View of the Process of Developing a Treatment Recommendation
Having identified the options and outcomes, decision-makers must evaluate the links between the two -- what will the alternative management strategies yield in terms of benefit and harm [4]? They must also consider how this impact is likely to vary in different groups of patients [5]. Having made estimates of the consequences of alternative strategies, value judgements about the relative desirability or undesirability of possible outcomes become necessary to allow treatment recommendations. We will use the term "preferences" synonymously with "values" or "value judgements" in referring to the process of trading off positive and negative consequences of alternative management strategies.
Recently, investigators have applied scientific principles to the collection, selection, and summarization of evidence, and the valuing of outcomes. We will briefly describe these systematic approaches.
Linking Management Options and Outcomes -- Systematic Reviews
Unsystematic identification and collection of evidence risks biased ascertainment -- treatment effects may be under-, or more commonly, overestimated and side effects may be exaggerated or ignored. Unsystematic summaries of data run similar risks of bias. One result of these unsystematic approaches may be recommendations advocating harmful treatments, and failing to encourage effective therapy. For example, experts advocated routine use of lidocaine for patients with acute myocardial infarction when available data suggested the intervention was ineffective and possibly even harmful, and failed to recommend thrombolytic agents when data showed patient benefit [6].
Systematic reviews deal with this problem by explicitly stating inclusion and exclusion criteria for evidence to be considered, conducting a comprehensive search for the evidence, and summarizing the results according to explicit rules that include examining how effects may vary in different patient sub-groups [7] [8]. When a systematic review pools data across studies to provide a quantitative estimate of overall treatment effect we call it a meta-analysis. Systematic reviews provide strong evidence when the quality of the primary studies is high and sample sizes are large, and less strong evidence when designs are weaker and sample sizes small. Because judgement is involved in many steps in a systematic review (including specifying inclusion and exclusion criteria, applying these criteria to potentially eligible studies, evaluating the methodological quality of the primary studies, and selecting an approach to data analysis) systematic reviews are not immune from bias. Nevertheless, in their rigorous approach to collecting and summarizing data, systematic reviews reduce the likelihood of bias in estimating the causal links between management options and patient outcomes.
Decision Analysis
Rigorous decision analysis provides a formal structure for integrating the evidence about the beneficial and harmful effects of treatment options with the values or preferences associated with those beneficial and harmful effects. When done well, a decision analysis will use systematic reviews of the best evidence to estimate the probabilities of the outcomes and use appropriate sources of preferences (those of society, or of relevant patient groups) to generate treatment recommendations [9] [10]. When a decision analysis includes costs among the outcomes, it becomes an economic analysis, and summarizes tradeoffs between gains (typically valued in quality-adjusted life-years) and resource expenditure (valued in dollars) [11] [12]. A decision analysis will be open to bias if it fails criteria for a systematic overview in accumulating and summarizing evidence, or uses preferences that are arbitrary or come from small or unrepresentative populations (such as a small group of health-care providers).
Practice Guidelines
Practice guidelines provide an alternative structure for integrating evidence and applying values to reach treatment recommendations. Practice guideline methodology places less emphasis on precise quantitation than does decision analysis. Instead, it relies on the consensus of a group of decision-makers, ideally including experts, front-line clinicians, and patients, who carefully consider the evidence and decide on its implications. Rigorous practice guidelines will also use systematic reviews to summarize evidence, and sensible strategies to attribute values to alternative outcomes as they generate treatment recommendations [13] [14]. Guideline developers may focus on local circumstances. For example, clinicians practicing in rural parts of less industrialized countries, without the resources to monitor the intensity of anticoagulation, may reject anticoagulation as a management approach for patients with atrial fibrillation. Practice guidelines may fail methodologic standards in the same ways as decision analyses.
We will now contrast these systematic approaches to developing recommendations with historical practice.
Current Sources of Treatment Recommendations
Traditionally, authors of original, or primary, research into therapeutic interventions include recommendations about the use of these interventions in clinical practice in the discussion section of their papers. Authors of systematic reviews and meta-analyses also tend to provide their impressions of the management implications of their studies. Typically, however, individual trials or overviews do not consider all possible management options, but focus on a comparison of two or three alternatives. They may also fail to identify subpopulations in which the impact of treatment may vary considerably. Finally, when the authors of overviews provide recommendations, they are not typically grounded in an explicit presentation of societal or patient preferences.
Failure to consider these issues may lead to variability in recommendations based on the same data. For example, several meta-analyses of selective decontamination of the gut (antibiotic prophylaxis against pneumonia in critically ill patients) reported very similar results regarding the impact of treatment on target outcomes, yet their recommendations varied from advocating implementation, to equivocation, to rejecting implementation [15] [16] [17] [18]. Varying recommendations reflect the fact that both investigators reporting primary studies and meta-analysts often make their recommendations without the benefit of an explicit, standardized process or set of rules. When benefits or risks are dramatic, and essentially homogeneous across an entire population, intuition may provide an adequate guide to making treatment recommendations. Such situations are unusual. In most instances, because of their susceptibility to both bias and random error, intuitive recommendations risk misleading the clinician.
These considerations suggest that when clinicians examine treatment recommendations, they should critically evaluate the methodologic quality of the recommendations. The greater the extent to which recommendations adhere to the methodologic standards we have mentioned, the greater faith clinicians may place in those recommendations [Table 1]. Table 2 presents a scheme for classifying the methodological quality of treatment recommendations, emphasizing the three key components: consideration of all relevant options and outcomes, a systematic summary of the evidence, and explicit and/or quantitative consideration of societal or patient preferences. In the next section of the text, we will describe the rating system summarized in Table 2.
Table 1: Methodologic Requirements for Systematic, Rigorous Recommendations
• Comprehensive statement of management options and possible outcomes.
• Systematic review and summary of evidence linking options to outcomes. Examination of the magnitude of impact, in terms of both benefits and risks, in relative and absolute terms.
• Consideration of different populations, and the characteristics of those populations, that may affect the impact of the intervention.
• Examination of strength of evidence linking options to outcomes. Where evidence is weak, examine the implications of plausible differences in effects.
• Explicit, appropriate specification of values or preferences associated with outcomes.
Table 2: A hierarchy of rigour in making treatment recommendations
Level of Rigour | Systematic Summary of Evidence | Considers All Relevant Options and Outcomes | Explicit Statement of Values | Example Methodologies
High | Yes | Yes | Yes | Practice guidelines or decision analysis*
Intermediate | Yes | Yes or No | No | Systematic review*
Low | No | Yes or No | No | Traditional reviews; original articles*
* Example methodologies may not reflect the level of rigour shown; exceptions may occur in either direction. For example, a practice guideline or decision analysis that neither systematically collects and summarizes information nor explicitly considers societal or patients' values will produce recommendations of Low rigour. A systematic review that does consider all relevant options and at least qualitatively considers values can produce recommendations approaching High rigour.
Making recommendations: a hierarchy of rigour
Systematic summary of evidence for all relevant interventions using appropriate values
Quantitative summary of evidence and values
The most rigorous approach to making recommendations (which we will call a systematic synthesis) involves precisely quantifying all benefits and risks; determining the values of either a group of patients or the general population; where uncertainty exists making a systematic and quantitative exploration of the range of possible true values; and using quantitative methods to synthesize the data. One approach to meeting these criteria involves conducting a formal decision analysis. Many decision analyses fail to carry out each step in the process in an optimally rigorous fashion; to do so usually requires a major research project [9] [10].
Challenges for decision analysts include conducting the systematic reviews required to generate the best estimates of benefits and risks associated with treatment options, and measuring how the general public or patients value the relevant outcomes. Typically, a decision analysis values each treatment arm in terms of quality-adjusted life years. When costs are considered, the decision analysis becomes an economic analysis, and we think in terms of additional dollars spent to gain an additional quality-adjusted life year. The optimal therapy, or the cost-effectiveness of alternatives, may differ depending on untreated patients' risk of the target outcome.
What a decision analysis or economic analysis usually does not do is value the benefits, risks and costs and provide an explicit threshold for decision-making. For example, a new treatment might cost $50,000 per quality-adjusted life-year gained. Is this a bargain, or too great a cost to warrant treatment? Often, decision analysts will refer to the cost-effectiveness or cost-utility ratios of currently used treatments to help with this decision. For instance, the decision analysis from the scenario in this article concluded that while the cost of warfarin for patients with at least one factor increasing their risk of embolism was $8,000 per quality-adjusted life-year saved, the cost was $375,000 per quality-adjusted life-year saved for a 65-year-old with no risk factors [2]. The authors compared these figures to the $50,000 to $100,000 cost per quality-adjusted life-year gained when screening adults for hypertension.
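The comparison the authors make can be expressed as a simple decision rule. This is an illustrative sketch, not part of the article's decision model: the `is_cost_effective` function and the choice of the upper end of the cited $50,000 to $100,000 benchmark range are our own assumptions; the cost-per-QALY figures are those quoted in the text.

```python
# Illustrative sketch: judging cost-utility ratios against a
# willingness-to-pay benchmark. Figures are those quoted in the text;
# the benchmark is an assumption taken from the screening comparison
# the authors cite.

def is_cost_effective(cost_per_qaly: float, benchmark: float) -> bool:
    """A treatment is judged cost-effective when its cost per
    quality-adjusted life-year gained falls at or below the chosen
    willingness-to-pay benchmark."""
    return cost_per_qaly <= benchmark

# Upper end of the $50,000-$100,000/QALY range cited for hypertension screening
BENCHMARK = 100_000

# Warfarin for atrial fibrillation with >= 1 stroke risk factor: $8,000/QALY
print(is_cost_effective(8_000, BENCHMARK))    # cost-effective
# Warfarin for a 65-year-old with no risk factors: $375,000/QALY
print(is_cost_effective(375_000, BENCHMARK))  # not cost-effective
```

The rule makes explicit that the verdict depends entirely on the benchmark chosen, which is itself a value judgement.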
Quantitative summary of evidence and values: Explicit Decision Thresholds
Investigators can use the principles of decision analysis to arrive at explicit decision thresholds and present these thresholds in ways that facilitate clinicians' understanding. One such approach involves the number of patients to whom one must administer an intervention to prevent a single target event, the Number Needed to Treat (NNT) [19]. Typically, the NNT falls as patients' risk of an adverse outcome rises, and may become extremely large when patients are at very low risk. In a previous Users' Guide, we have described the threshold NNT [20], the dividing line between when treatment is warranted (the NNT is low enough that the benefits outweigh the costs and risks), and when it is not (the NNT is too great to warrant treatment). Deriving the threshold NNT involves specifying the relative value associated with preventing the target outcome versus incurring the side effects and inconvenience associated with treatment [21].
Investigators using this approach may also consider costs. If so, they face the additional requirement of specifying the number of dollars one would be willing to pay to prevent a single target event. With or without considering costs, investigators can plug the values they adduce into an equation that generates the threshold NNT [20]. They can then look at the risk of the target outcome in untreated subpopulations to whom clinicians might consider administering the intervention. Combining this information with the relative risk reduction associated with the treatment, they can determine on which side of the threshold the treatment falls.
Returning to our example, warfarin decreases the risk of stroke in patients with non-valvular atrial fibrillation. Since anticoagulation increases bleeding risk, it is not self-evident that we should recommend the treatment for our patients; we must find a way of trading off fewer strokes against more bleeds. We can calculate the threshold NNT by specifying the major adverse outcome of treatment, bleeding, and the frequency with which it occurs as a result of treatment. We then specify the impact of these deleterious effects relative to the target event the treatment prevents, a stroke. A variety of studies of relevant patient populations [22] [23] [24] [25] suggest that, on average, patients consider 1 severe stroke equivalent to 5 episodes of serious gastrointestinal bleeding. We use these figures to calculate our threshold NNT, which proves to be approximately 152 [Figure 2]. This implies that if we need to anticoagulate fewer than 152 patients to prevent a stroke, we will do so; if we must anticoagulate more than 152 patients, our recommendation will be not to treat.
Figure 2: Calculating the Threshold Number Needed to Treat (T-NNT) for Warfarin Treatment of Patients With Nonvalvular Atrial Fibrillation
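The break-even logic behind the threshold NNT can be sketched as follows. The relative value (1 severe stroke equivalent to 5 serious gastrointestinal bleeds) comes from the text; the excess bleeding risk per treated patient is an assumed input, chosen here so that the sketch reproduces the article's threshold of approximately 152 (the actual inputs appear in Figure 2).

```python
# Sketch of the threshold-NNT logic. VALUE_RATIO is from the text;
# EXCESS_BLEED_RISK is an assumed figure chosen to reproduce the
# article's threshold of ~152, not a published estimate.

def threshold_nnt(value_ratio: float, excess_harm_risk: float) -> float:
    """Treating N patients prevents 1 target event, worth `value_ratio`
    harm-equivalents, at the cost of N * excess_harm_risk harm events.
    Treatment is worthwhile while N * excess_harm_risk < value_ratio,
    so the break-even N is value_ratio / excess_harm_risk."""
    return value_ratio / excess_harm_risk

VALUE_RATIO = 5            # 1 severe stroke valued as 5 serious GI bleeds
EXCESS_BLEED_RISK = 0.033  # assumed excess serious bleeds per treated patient

t_nnt = threshold_nnt(VALUE_RATIO, EXCESS_BLEED_RISK)
print(round(t_nnt))  # ~152: treat when the achievable NNT falls below this
```

The sketch shows why the threshold moves: a higher valuation of the target event, or a lower harm rate, raises the NNT one is willing to accept.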
The threshold NNT then facilitates recommendations for specific patient groups. Table 3 summarizes the calculation of the NNT, and the associated comparison with the threshold, for two groups of patients. A meta-analysis of randomized trials tells us that anticoagulation reduces the risk of stroke by 68% (95% confidence interval 50% to 79%), and that this risk reduction is consistent across clinical trials [26]. The meta-analysis also provides stroke risk estimates for different groups of patients. Patients over 75 with any of previous cerebrovascular events, diabetes, hypertension, or heart disease have a stroke risk of approximately 8.1% per year. Anticoagulation reduces this risk to 2.6%, with an NNT of 1 / 0.055, or approximately 18 per year. The NNT for this group is appreciably lower than the threshold NNT, suggesting that such patients should be treated. Patients under 65 with no risk factors have a one-year stroke risk of 1%, which anticoagulation reduces to 0.32%. The associated NNT of 146 approximates the threshold NNT of 152 and suggests the decision about whether or not to treat is a toss-up.
Table 3: Using the NNT to make treatment recommendations
Patient group | Risk of stroke without treatment | Relative risk reduction with warfarin | Group's absolute risk reduction | Group's NNT (Threshold NNT)
Under 65, no risk factors | 1% | 68% | 0.68% | 146 (Cost omitted: 152; Cost considered: 53)
Previous thromboembolic event | 8.1% | 68% | 5.5% | 18 (Cost omitted: 152; Cost considered: 53)
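The NNT arithmetic behind Table 3 can be sketched directly: the absolute risk reduction is the baseline risk multiplied by the relative risk reduction, and the NNT is its reciprocal. Baseline risks and the 68% relative risk reduction are those quoted in the text; an NNT well below the threshold favours treatment, while one near the threshold is a toss-up. (Because the published risks are rounded, the low-risk group comes out at 147 here rather than the table's 146.)

```python
# Sketch of the NNT arithmetic in Table 3. Inputs are the risks and
# relative risk reduction quoted in the text; minor differences from
# the table's figures reflect rounding in the published risks.

def nnt(baseline_risk: float, relative_risk_reduction: float) -> float:
    """NNT is the reciprocal of the absolute risk reduction."""
    arr = baseline_risk * relative_risk_reduction  # absolute risk reduction
    return 1 / arr

THRESHOLD_NNT = 152  # cost omitted, from Figure 2

groups = [("Under 65, no risk factors", 0.01),
          ("Previous thromboembolic event", 0.081)]

for name, risk in groups:
    n = nnt(risk, 0.68)
    print(f"{name}: NNT ~ {n:.0f} (threshold {THRESHOLD_NNT})")
```

Running the sketch reproduces the contrast in Table 3: roughly 18 for the high-risk group, well below the threshold, versus roughly 147 for the low-risk group, close enough to the threshold that the decision depends on individual preferences.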
Clinicians or health-care decision makers interested in considering costs in their decisions can turn to the model for help. Costs can be included by specifying the dollar value associated with preventing adverse outcomes (for example, Laupacis and colleagues have suggested that the most society might be willing to pay to gain a quality-adjusted life year is $100,000 [27]). When we consider costs as calculated in the decision analysis from the patient scenario [2], we arrive at a threshold NNT of 53, suggesting a more conservative approach to anticoagulant administration.
Investigators can use units other than the NNT to develop clinically useful decision thresholds. For example, among 81 patients previously treated with cis-platinum-based chemotherapy, the average minimum gain in survival felt to make the chemotherapy worthwhile was 4.5 months for mild toxicity and 9 months for severe toxicity [28]. Such a threshold could be integrated with information about the actual survival gain associated with the treatment to help form the basis for a recommendation about the use of cis-platinum therapy.
Like other quantitative approaches, considering NNT and the threshold NNT, or alternative thresholds, is intended to supplement clinical judgement, not replace it. Investigators exploring different treatment choices have found the methodology useful [29]. However clinicians use it, the approach highlights the necessity for both valuing the benefits and risks of treatment, and understanding the magnitude of those benefits and risks, in making a treatment decision.
Quantitative summary of evidence, qualitative summary of preferences
Practice guidelines, if they are to minimize bias, should not substitute expert opinion for a systematic review of the literature. They require an explicit and sensible process for valuing outcomes, an explicit consideration of the impact of uncertainty in the evidence and values used in the guidelines, and an explicit statement of the strength of evidence supporting the guideline. When a practice guideline meets these methodological standards, and thereby minimizes bias, we refer to the guideline as "evidence-based" [Table 1].
Once they have the evidence, investigators and clinicians are often uncomfortable with explicitly specifying preferences in moving from evidence to action. Their reluctance is understandable. Specifying a specific tradeoff between, say, a stroke and a gastrointestinal bleed is not an exercise with which we are familiar. People may feel that identifying a specific value -- a stroke is equivalent to 2.5 gastrointestinal bleeds, for instance -- implies more precision than is realistic. Discomfort may rise further when we specify a dollar value associated with preventing an adverse event.
This may be one reason that participants in the development of rigorous practice guidelines, including experts in the content area, methodologists, community practitioners, and patients and their representatives, seldom use numbers to identify the value judgements they are making. Still, a rigorous guideline will establish, reflect, and make explicit the community and patient values on which the recommendation is based.
Most practice guidelines fail to systematically summarize the evidence. Even those that meet criteria for evidence accumulation and summarization do not usually make their underlying values explicit. Guidelines that do not meet either set of criteria produce recommendations of low methodologic rigour.
Practice guidelines that meet the criteria in Table 1 provide an alternative to quantitative strategies to arrive at a systematic synthesis.
Systematic review of evidence, unsystematic application of values
Traditionally, investigators provide their results and then make an intuitive recommendation about the action that they believe should follow from their evidence. They may do so without considering all treatment options, or all outcomes (Table 2). Even when they consider all relevant treatments and outcomes, they may fail to use community or patient values, or even to make the values they are using explicit. For instance, the authors of a meta-analysis of antithrombotic therapy in atrial fibrillation stated "about one patient in seven in the combined study cohort were at such low risk of stroke (1% per year) that chronic anticoagulation is not warranted." [26] Here, the relative value of stroke and gastrointestinal bleeding is implicit in the recommendation. The nature of the value judgement is not transparent, and we have no guarantee that the implicit values reflect those of our patient or community.
Clinicians faced with such recommendations need to take care that they are aware of all relevant outcomes, both reductions in targets and treatment-related adverse events, and are aware of the relative values implied in the treatment recommendations.
Unsystematic review, unsystematic synthesis
The unsystematic approach represents the traditional strategy of accumulating and summarizing evidence in an unsystematic fashion, and then applying implicit preferences to arrive at a treatment recommendation. The approach is open to bias, and is likely to lead to consistent, valid recommendations only when the gradient between beneficial and adverse consequences of alternative actions is very large.
Intermediate Approaches
Both quantitative strategies and practice guidelines, when done rigorously, are very resource-intensive. Investigators may adopt less onerous methods and still provide useful insights. Meta-analysts may wish to take the first steps in making treatment recommendations without a formal decision analysis or practice guideline development exercise. If they are to optimize the rigour of these tentative recommendations they will comprehensively identify all options and outcomes and use their meta-analysis to establish the causal links between the two. They may then choose to label values in only a qualitative way, such as: "we value preventing a stroke considerably more highly than incurring a gastrointestinal bleed. Given this value, we would be willing to treat a moderate to large number of patients to prevent a single target event, and would therefore recommend treating all but those at lowest risk of stroke."
Clinicians may find such recommendations useful, and they have the advantage of highlighting that if one does not share the specified values, one would choose an alternative treatment strategy. They may not, however, reflect community or patient preferences. In addition, they are less specific than the process of placing a number on our values. While quantifying values may make us uncomfortable, we are regularly (if unconsciously) making such judgements in the process of instituting or withholding treatment for our patients.
Are Treatment Recommendations Desirable at All?
The approaches we have described highlight that patient management decisions are always a function of both evidence and preferences. Clinicians may point out that values are likely to differ substantially between settings. Monitoring of anticoagulant therapy might take on a much stronger negative value in a rural setting where travel distances are large, or in a more severely resource-constrained environment where there is a direct inverse relationship between (for example) the resources available for purchase of antibiotics and those allocated to monitoring levels of anticoagulation.
Patient-to-patient differences in values are equally important. The magnitude of the negative value of anticoagulant monitoring, or the relative negative value associated with a stroke versus a gastrointestinal bleed, will vary widely between individual patients, even in the same setting. If decisions are so dependent on preferences, what is the point of recommendations?
This line of argument suggests that investigators should systematically search, accumulate and summarize information for presentation to clinicians. In addition, investigators may highlight the implications of different sets of values for clinical action. The dependence of the decision on the underlying values, and the variability of values, would suggest that such a presentation would be more useful than a recommendation.
We find this argument compelling. Its implementation is, however, dependent on standard methods of summarizing and presenting information that clinicians are comfortable interpreting and using. Furthermore, it implies clinicians having the time, and the methods, to ascertain patient values that they can then integrate with the information from systematic reviews of the impact of management decisions on patient outcomes. These requirements are unlikely to be fully met in the immediate future. Moreover, treatment recommendations are likely to remain useful for providing insight, marking progress, highlighting areas where we need more information, and stimulating productive controversy. In any case, clinical decisions are likely to improve if clinicians are aware of the underlying determinants of their actions, and are able to be more critical about the recommendations offered to them. Our taxonomy may help to achieve both goals.
Resolution of the Scenario
The closest statement to a recommendation relevant to your patient from the original journal article [1] is the following: "many elderly patients with atrial fibrillation are unable to sustain chronic anticoagulation. Furthermore, the risk of bleeding (particularly intracranial haemorrhage) was increased during anticoagulation of elderly patients in our study." Because this study neither summarized the available evidence nor explicitly stated its underlying values, its recommendation is low in rigour.
The decision analysis uses systematic summaries of the available evidence and specifies the patient values used in developing its conclusion that "Treatment with warfarin is cost-effective in patients with non-valvular atrial fibrillation and one or more additional risk factors for stroke" [2], placing it in the high-rigour category. Moreover, the patient values used in the analysis appear consistent with your patient's preferences. The only limitation of the decision analysis is that its bottom-line recommendation involves considerations of cost, which you have reservations about including.
The practice guideline [3] once again uses a systematic summary of the evidence and, though making frequent reference to patients' values, does not specify the relative value of stroke and bleeding implied in its strong recommendation that high-risk patients such as ours be offered anticoagulant therapy. Nevertheless, armed with consistent recommendations from a systematic synthesis and a recommendation of intermediate rigour, you feel confident recommending that your patient begin taking warfarin.