San Miguel (2008) is a Johnson&Johnson Creation
The individuals who are under obligation to answer questions raised below concerning the use of Kaplan-Meier graphs in the San Miguel (2008) clinical trial are the individuals who head Johnson&Johnson because they conducted the research and wrote the report: San Miguel et al, Bortezomib plus Melphalan and Prednisone for Initial Treatment of Multiple Myeloma, New England Journal of Medicine, 2008, 359, 906-917, available at
And Johnson&Johnson leadership seems all the more clearly to be under obligation to answer such questions because San Miguel (2008) aimed primarily to show the benefit of bortezomib, of which Johnson&Johnson appears to be today's chief marketer, though now vending it as "VELCADE":
VELCADE is the market leader in treating relapsed multiple myeloma with over 300,000 patients treated worldwide. VELCADE is co-developed by Millennium Pharmaceuticals and Janssen Pharmaceutical Companies of Johnson & Johnson. Millennium is responsible for commercialization of VELCADE in the U.S., Janssen Pharmaceutical Companies of Johnson & Johnson are responsible for commercialization in Europe and the rest of the world. Takeda Pharmaceutical Company Limited and Janssen Pharmaceutical K.K. co-promote VELCADE in Japan.
The Historical Significance Of San Miguel (2008)
Both the FDA and Health Canada were impressed enough upon reading San Miguel (2008) to approve bortezomib (VELCADE) for front-line (meaning first or initial) treatment of multiple myeloma. The FDA approval came on
and Health Canada followed suit on
One can be sure that it is San Miguel (2008) that the FDA and Health Canada are citing because they each mention that the total number of patients was 682 (which equals the 344+338 appearing underneath the graph below). The Canadian announcement happens to also mention that overall survival in the Bortezomib group was 83% at 24 months, which percentage can be seen in red in the same graph.
A Kaplan-Meier Graph From San Miguel (2008)
Of primary interest in San Miguel (2008) is its overall-survival graph in Figure 1B, which is reproduced below. The Control Group got the two chemotherapy drugs melphalan and prednisone, and the Bortezomib Group got the same two chemotherapy drugs along with bortezomib, the purpose of the clinical trial being to show that adding bortezomib to the chemotherapy cocktail weakened multiple myeloma symptoms, and of particular interest here, gave longer life as seems to be confirmed in Figure 1B.
At the very beginning of the study, when Months of therapy equals zero, Figure 1B shows that all patients in both groups are alive, so both functions, commonly referred to as "curves", show 100% of patients surviving. At Months=27, however, only approximately 83% of the patients in the Bortezomib Group are still alive (an inference that has been traced in orange on the graph), which beats Control Group survival of approximately 67% at Months=27 (an inference that has not been traced in orange). That much is elementary and straightforward.
But what are those "No. at Risk" values stretched underneath the graph? San Miguel does not explain them, nor does he rely on them, so it's easy to shrug them off as inconsequential.
San Miguel et al (2008)
Figure 1. Kaplan–Meier Curves for Overall Survival. [...] Panel B shows overall survival.
If an unsophisticated reader pays any attention to these "No. at Risk" values at all, he may guess that "at risk" means something like "in particular danger", and, seeing the numbers decrease over time, may guess that the longer chemotherapy was continued, the fewer patients were in particular danger, which reinforces the impression that bortezomib was doing good.
However, what the "No. at Risk" numbers really mean is "Patients Remaining", which is to say "Patients Remaining in the Study". As every Patient Remaining in the study is capable of dying, every Patient Remaining can be said to be "at risk" of dying.
For example, the Bortezomib Group starts out with 344 patients, so under Months=0 we see 344 Patients Remaining. By Months=3, the Bortezomib Group is down to 315 Patients Remaining — maybe some died (these are the dead), maybe some dropped out (they fled), and maybe some got kicked out of the experiment (they were shed). The Patients Remaining are patients who weren't yet dead, fled, or shed.
Putting this graph to the use for which graphs are intended, we might ask for example how many Bortezomib patients are still alive at Months=27? Of course we follow the vertical red line up from 27 on the X-axis to the Bortezomib curve, then left where we read off approximately 83% surviving of the original 344 in the Bortezomib Group which equals 286 claimed to be still alive at Months=27.
However, the No. at Risk data underneath the graph inform us that when researchers at Months=27 looked around, the only Bortezomib patients they could see were the 4 still enrolled in the study.
Claimed Alive=286, Observed Alive=4. That is the glaring discrepancy to which the double-headed yellow arrow on a dashed shaft draws our attention. Ordinary survival curves report the number of patients still alive; Kaplan-Meier survival curves, as we move along them from left to right, increasingly report the number of subjects who have disappeared but are guessed to be still alive. How this turning of dross into gold is achieved is explained in Kaplan, E. L. and Meier, P., Nonparametric estimation from incomplete observations, Journal of the American Statistical Association, 1958, 53, 457–481, which can be consulted in several locations, as at www.biecek.pl/~ or at
The key insight that is important to keep in mind when examining this Kaplan-Meier graph is that the 344 Bortezomib-Group patients at Months=0 are living and breathing people whose names can be spoken and whose photographs can be taken, whereas of the 286 Bortezomib patients which this Kaplan-Meier graph tells us have survived at Months=27, 286-4=282 are nameless and faceless and entirely incorporeal phantoms guessed to be alive and lurking among the throng of patients who have dropped out of the study.
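The arithmetic behind the 286-versus-4 comparison can be reproduced in a few lines. This is a minimal sketch that uses only the figures quoted above; the variable and function names are illustrative and do not come from San Miguel (2008).

```python
# Figures quoted in the text for the Bortezomib Group of Figure 1B.
ENROLLED = 344            # patients at Months=0
KM_SURVIVAL_AT_27 = 0.83  # approximate value read off the curve at Months=27
AT_RISK_AT_27 = 4         # "No. at Risk" printed under Months=27


def implied_alive(enrolled: int, km_fraction: float) -> int:
    """Head count obtained by applying the curve's percentage to the original group."""
    return round(enrolled * km_fraction)


claimed = implied_alive(ENROLLED, KM_SURVIVAL_AT_27)
unobserved = claimed - AT_RISK_AT_27

print(f"Implied alive at Months=27:            {claimed}")
print(f"Still enrolled at Months=27:           {AT_RISK_AT_27}")
print(f"Implied alive but no longer followed:  {unobserved}")
```

Changing `KM_SURVIVAL_AT_27` to a slightly different reading of the curve shifts the implied head count by only a few patients, so the size of the gap does not hinge on reading the graph precisely.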
Giving free rein to curiosity as to the nature of the surprising Patients-Remaining data, one may wonder why Figure 1B above presents its Kaplan-Meier curves graphically (they could instead have been presented as numbers in a table), but presents its Patients-Remaining data tabulated (they could instead have been drawn as curves in the graph). If drawn curves are advantageous for "Surviving Patients (%)", why are they not equally advantageous for "No. at Risk", especially when we realize that "No. at Risk" can be first transformed into "Patients Remaining (%)", and therefore drawn into Figure 1B using the same Y-axis range of 0 to 100?
Pursuing this line of thought, the version of Figure 1B below does exactly that — transforms the "No. at Risk" data into "Patients Remaining (%)", and plots that percentage on the same graph as the original Kaplan-Meier magic curves.
Figure 1. Kaplan–Meier Curves for Overall Survival. [...] This version of Panel B shows not only the original Kaplan-Meier survival curves that we have been examining, but also plots Patients-Remaining percentages.
Pondering this new graph raises the question of whether the newly-added Patients-Remaining curves might be of greater practical relevance than the Kaplan-Meier curves. That is, unlike the Kaplan-Meier curves, the Patients-Remaining curves do accord with common-sense expectations of people reading the graph. In the graph immediately above, for example, since at Months=15 the Patients-Remaining Bortezomib-Group curve is opposite 48.8% on the Y-axis, then it is indeed the case that 344 * 0.488 = 168 patients have survived, and that these are corporeal beings who answer to their 168 names and who could be mustered for a group photo.
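The transformation from "No. at Risk" to "Patients Remaining (%)" is a simple rescaling by the starting head count. The sketch below applies it only to the Bortezomib-Group values that the text itself quotes; the remaining time points of Figure 1B are not reproduced here, and the function name is hypothetical.

```python
# "No. at Risk" values for the Bortezomib Group that the text quotes.
# Months 0, 3 and 27 are stated directly; the Months=15 count of 168 is the
# one the text derives from 48.8%. Other time points are deliberately omitted.
NO_AT_RISK = {0: 344, 3: 315, 15: 168, 27: 4}


def patients_remaining_pct(counts: dict[int, int]) -> dict[int, float]:
    """Express each count as a percentage of the count at the earliest month."""
    baseline = counts[min(counts)]
    return {month: round(100 * n / baseline, 1) for month, n in counts.items()}


remaining = patients_remaining_pct(NO_AT_RISK)
for month, pct in remaining.items():
    print(f"Months={month:>2}: {pct:5.1f}% remaining")
```

Because the result lies between 0 and 100, it can share the Y-axis of the original figure, which is the property the redrawn graph relies on.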
What the newly-added Patients-Remaining curves serve to remind us of is that anyone submitting to either the two-drug therapy in the Control Group, or the three-drug therapy in the Bortezomib Group, can expect by Months=30 to have vanished — to be either dead, or to have become so disenchanted with chemotherapy as to refuse to continue it (fled), or to have had his participation terminated by the researchers (shed).
And one more thing about these freshly-added Patient-Remaining curves in the graph just above — the Bortezomib curve no longer differs appreciably from the Control curve, suggesting that adding bortezomib to the two-chemical cocktail does not increase longevity. This is not to imply that drug-treatment differences appearing in Kaplan-Meier curves will generally — in all clinical trials — tend to disappear in the corresponding Patients-Remaining curves. A more conservative inference would be that Kaplan-Meier graphing is capable of encouraging untenable conclusions whether in the area of Overall Survival (how fast the curves drop from left to right) or in the area of drug-treatment differences (how big the separation between curves).
Magical Effects From Tolkien To Kaplan-Meier
In Tolkien's Lord of the Rings, Frodo slipping the One Ring onto his finger makes him invisible, and grants him immortality.
The trick which guarantees that Kaplan-Meier will produce inflated Surviving-Patients percentages originates from its blindness to deaths that occur after a patient is fled/shed. The fled/shed patients, then, being out of sight and out of mind of the researchers, may be thought of as having become invisible to them, and the impossibility of their ever being recorded on the researchers' data sheets as having died has given them a sort of immortality.
A particular instance of researchers shedding patients is closing down a clinical trial prior to having obtained the most valuable piece of information from patients still alive — which is how much longer they are going to live, as is explained in A Practical Guide To Understanding Kaplan-Meier Curves:
[In clinical practice as opposed to research,] we may get information on our patients indefinitely; however, research is expensive, has a beginning and an end, and is formally closed when the study is complete (figuratively the lights go off, the telephones are not answered, and the files are stored, and everyone goes to another job).
Rich et al, A practical guide to understanding Kaplan-Meier Curves, Otolaryngol Head Neck Surgery, 2010, 143, 331-336.
The greater the number of patients categorized as fled or shed, then, the greater the number who are rendered invisible to the researchers, and the more immortality is shovelled into the data, and ultimately the more impressive seems the benefit of the pharmaceutical being tested.
By way of demonstrating this phenomenon, let us consider two extreme cases. The first case is imaginary, and portrays almost all patients in the drug group as fleeing the study. The second case is real, with all subjects remaining in the experiment to its completion.
Imaginary Case 1: Almost all patients fled
Suppose that all but one patient (in this case, 343 out of the 344 patients) who had been mustered to participate in the San Miguel (2008) Bortezomib Group demanded, one hour after their very first injection of bortezomib, to exit the clinical trial, and right then on the spot had put their signatures to their severance declarations, and thus had their names stricken from the list of participating patients, and who had then walked out of the lab intending never to return, and with the lab intending never to lay eyes on them again, and who one hour after that — died! That's 343 out of 344 patients dying right after dropping out of the study. However, we imagine further that the sole Bortezomib-Group patient who did not flee happened to still be alive at Months=30. As for the Control Group in this imaginary study, let us assume that it produced the same data as in the real San Miguel (2008) study.
Everyone apprised of the facts in this imaginary experiment would conclude that bortezomib is an extraordinarily lethal toxin, but that is not the impression that would be left in the minds of naive viewers of the Kaplan-Meier graph below, which sports a survival curve for the Bortezomib Group which is the dotted horizontal line floating opposite 100% surviving. Yes, this is exactly the Kaplan-Meier graph that matches the imaginary study just described.
How is such an incongruity possible? As has already been stated above, it is made possible by Kaplan-Meier discontinuing monitoring patients who are fled or shed. Once a subject is considered non-participating, no further attention is paid to him and no further information is sought from him, so the 343 deaths never enter the statistical picture and are incapable of pulling the Bortezomib curve downward.
This imaginary study is admittedly fantastic, but that is irrelevant: the question being asked at the moment is how the Kaplan-Meier analysis might handle fantastic data, and the answer is that Kaplan-Meier is capable of giving a fantastically-lethal toxin the appearance of a fantastically-successful cancer cure.
It should be noted that the HYPOTHETICAL DATA graph does not depend upon the 343 fled patients dying one hour after dropping out. The graph would look exactly the same if the 343 fled patients had died at random times ranging from, say, one minute after leaving the lab up to 50 years after. That impossible-to-beat bortezomib curve stays buoyed up at 100% not by any particular distribution of deaths after the patients' flight, but by (1) the large number of patients fleeing and (2) the principle that patient flight from the experiment renders a patient invisible to the researchers, and his death unrecorded.
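The behaviour described for Imaginary Case 1 can be checked with a small product-limit calculation. What follows is a sketch rather than the software used in any trial: the function names are hypothetical, times are in months, and "one hour" is approximated as 0.0014 months purely for illustration. A record marked `False` is a patient who left the study alive (fled or shed); only records marked `True` are deaths the researchers observed.

```python
from collections import Counter


def kaplan_meier(records):
    """Product-limit estimate from (time, died) pairs.

    Returns a list of (time, estimated_survival) steps, one for each
    distinct time at which a death was observed.
    """
    deaths = Counter(t for t, died in records if died)
    exits = Counter(t for t, _ in records)  # deaths and departures alike
    at_risk = len(records)
    survival = 1.0
    steps = []
    for t in sorted(exits):
        if deaths[t]:
            survival *= 1 - deaths[t] / at_risk
            steps.append((t, survival))
        at_risk -= exits[t]  # everyone leaving at t drops out of the risk set
    return steps


def survival_at(steps, month):
    """Height of the step curve at `month` (1.0 before the first observed death)."""
    current = 1.0
    for t, s in steps:
        if t <= month:
            current = s
    return current


HOUR_IN_MONTHS = 0.0014  # illustrative conversion, an assumption

# Imaginary Case 1: 343 patients leave one hour in, one is alive at Months=30.
imaginary = [(HOUR_IN_MONTHS, False)] * 343 + [(30.0, False)]

# The same 344 patients if the 343 deaths had been recorded by the researchers.
fully_followed = [(2 * HOUR_IN_MONTHS, True)] * 343 + [(30.0, False)]

print(survival_at(kaplan_meier(imaginary), 30))       # curve never steps down
print(survival_at(kaplan_meier(fully_followed), 30))  # about 1/344
```

The first data set contains no observed deaths, so the estimate has nothing to multiply in and stays at 100% for the whole 30 months. The second differs only in that the deaths were recorded, and the curve falls to roughly 0.3%.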
But, it might be objected, what about the Patients-Remaining counts that we have gotten used to seeing right underneath the graph? Why have they been omitted above? Would not their presence reveal to every reader what had really happened?
Well, just below are the missing "No. at Risk" counts for the imaginary data we are considering, and we are free to picture them inserted just underneath the above graph, as has been usual:
However, there remains a reason why inclusion of the above "No. at Risk" numbers might fail to arouse suspicion that bortezomib is lethal — the reason being that most readers might not know what "No. at Risk" means, and may even take the string of ones in the table above to mean that only a single patient has been "at risk" over most of the study, which implies that nobody else in the Bortezomib Group was at risk, meaning that almost everybody remained safe from danger, which incorrect inference would make bortezomib treatment seem even more effective in fighting multiple myeloma than do the Kaplan-Meier curves alone.
But perhaps an even stronger reason exists why the "No. at Risk" data might fail to arouse suspicions that the Kaplan-Meier curves are misleading, and that reason is (as will be documented further below) that the "No. at Risk" numbers are likely to be withheld from the reader, withheld in the same way that they were withheld in the HYPOTHETICAL DATA graph just above. And so if withheld, they become incapable of triggering any suspicion that something is amiss with the curves in the graph.
Real Case 2: Zero subjects either fled or shed
When research is blessed with no vanished subjects (which is to say, when zero subjects are either fled or shed), as is normally the case when the subjects are animals, the Kaplan-Meier curves are identical to conventional survival curves in which Percent Survival really does reflect the number of corporeal subjects still alive. In such a case, even when a researcher feeds his raw data into Kaplan-Meier software, the survival curves which the software delivers receive no magical Kaplan-Meier distortion and therefore do not mislead. That a Kaplan-Meier analysis, in the circumstance of zero subject loss, is identical to a conventional and straightforward survival analysis is acknowledged by Kaplan and Meier on the first page of their article: "When no losses [no subjects either fled or shed] occur at ages [Months in the example we have been considering] less than t [prior to the Month specified], the estimate of P(t) [proportion or percentage of survivors at that time] in all cases reduces to the usual binomial estimate, namely, the observed proportion of survivors." (p. 457, bold emphasis added). Notice the word "observed": when the number of fled/shed equals zero, survivors can be observed, and if a roll call were to be taken, would be heard answering "here".
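The reduction that Kaplan and Meier describe can be verified from the product-limit formula. The notation here is an assumption introduced for illustration and does not appear in the text above: N subjects at the start, distinct death times t_1 < ... < t_k up to time t, d_i deaths at t_i, and n_i subjects still in the study just before t_i.

```latex
\begin{align*}
\hat{S}(t) &= \prod_{i=1}^{k} \left(1 - \frac{d_i}{n_i}\right)
            = \prod_{i=1}^{k} \frac{n_i - d_i}{n_i} \\
           &= \frac{n_2}{n_1} \cdot \frac{n_3}{n_2} \cdots \frac{n_k - d_k}{n_k}
            \qquad \text{(no losses: } n_1 = N,\; n_{i+1} = n_i - d_i \text{)} \\
           &= \frac{n_k - d_k}{N}
            = \frac{N - \sum_{i=1}^{k} d_i}{N}
\end{align*}
```

With no subjects fled or shed, each denominator cancels the numerator before it, and what is left is the number still alive divided by the number who started, which is the observed proportion of survivors. Once any subject leaves between two deaths, n_{i+1} is smaller than n_i - d_i, the cancellation fails, and the estimate no longer corresponds to a head count.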
But if, in the absence of subject loss, the Kaplan-Meier graph is identical to a conventional Percent-Surviving graph, then it might be better to avoid Kaplan-Meier calculations or software, because acknowledging their use might arouse in some readers' minds the unjustified suspicion that the data have been Kaplan-Meier distorted, and also because such acknowledgment might more deeply inculcate in other readers' minds a sense of Kaplan-Meier ubiquity, and therefore also a sense of its respectability and legitimacy.
Figure 1(G) Treatment-related Kaplan-Meier survival curves of mice. The log-rank test was performed and indicated that mouse survivals among these groups are significantly different (p < 0.001) and that PAA, when combined with low dose of melphalan, extends MM mouse survival.
Demonstration of conventional survival curves which were possibly generated by Kaplan-Meier software. As no subjects were fled/shed, the curves receive no Kaplan-Meier distortion, and would have been the same if they had been analyzed with a Kaplan-Meier-devoid software package. Also, as no subjects were fled/shed, there is no need to provide Subjects-Remaining counts underneath the graph.
PAA = Pharmacological doses of Ascorbic Acid, MM = Multiple Myeloma.
Xia, Xu, Zhang, et al (2017), EBioMedicine
Figure 1(E) Tumor burden was analyzed in ARP1 NOD.Cγ-Rag1 mice treated with PAA and with or without different doses of melphalan (1, 3, 5 mg/kg).
What an absence of fled-or-shed subjects permits researcher Xia (2017) to do in his graph above, and that San Miguel (2008) is unable to do in his Figure 1B, is to point to any plateau in any survival curve, and then point, within a photo of all subjects, to exactly which ones had been counted in the calculation and drawing of that plateau. In any Kaplan-Meier graph, no group photograph of the happy survivors claimed during any but the first days has ever been taken, or ever will be taken, as Kaplan-Meier survivors increasingly become, as the study progresses, incorporeal phantoms whose presence a camera is powerless to record.
Contemporary Usage Of Kaplan-Meier Graphs
Below is evidence compiled by Sato et al (2017) concerning the frequency of use of various statistical tools in the New England Journal of Medicine. Of particular interest is the "Survival methods" curve, which informs us that the Survival-methods category was the most relied on during 2004-2005 (in about 62% of all research articles), and was the second-most relied on during 2015 (in about 57% of all research articles). Furthermore, Table S1 (not shown here) explains that "Survival methods" includes "Survival function, Kaplan-Meier plot, Proportional hazards model, Other survival model, rate adjustment, log-rank test" (bold emphasis added). Although the Sato findings are not detailed enough to reveal the frequency of use of Kaplan-Meier graphing alone, it may be safe from what we have seen to imagine that it is among the most widely used of all statistical tools in clinical trials.
Figure 1. Percentage of Studies Using Particular Types of Statistical Analysis during Four Periods between 1978 and 2015.
Googling "kaplan meier" (without the quotation marks) on 13 Mar 2017 produced a very large number of images, 740 of which contained Kaplan-Meier graphs, the first four of which are shown below, and with only the second of these four being accompanied by a Patients-Remaining count underneath the graph, and with this count misleadingly labelled "Number at risk", thereby echoing the misleading labelling in San Miguel (2008) Figure 1B that we have been discussing.
In the 147 cases in which items-remaining numbers were shown, what labels clarified the meaning of these numbers?

| LABEL PROVIDED | COUNT |
|---|---|
| Number at risk | 38 |
| No. at risk | 32 |
| Number of patients at risk | 12 |
| Number of subjects at risk | 9 |
| Patients at Risk | 9 |
| Patients at Risk, n | 6 |
| At risk | 3 |
| No at risk | 3 |
| At-Risk Patients | 2 |
| n at risk | 2 |
| Number at risk by time | 2 |
| N patients at risk | 1 |
| No. of risk | 1 |
| Number of individuals at risk | 1 |
| Numbers at risk | 1 |
| Numbers at risks | 1 |
| Pts at risk | 1 |
| # Still at Risk | 1 |
| # at risk | 1 |
| SUBTOTAL ("at risk" variants) = | 126 |
| [NO LABEL] | 14 |
| SUBTOTAL (no label) = | 14 |
| n= | 2 |
| n | 1 |
| No. on Active Therapy | 1 |
| Number of eyes | 1 |
| Sample size | 1 |
| Subjects | 1 |
| SUBTOTAL (other labels) = | 7 |
| GRAND TOTAL = | 147 |
Of the 740 Google hits that displayed Kaplan-Meier graphs, a mere 147 (equals 20%) tabulated Patients-Remaining counts underneath the graphs (as is instanced in the second of the four images above). From such images alone, though, it is impossible to say what fraction of the 80% which omitted Patients-Remaining counts did so legitimately (because there had been no subjects fled/shed), or did so improperly (when there had been subjects fled/shed).
In the 147 instances in which Patients-Remaining (or sometimes it could have been Subjects-Remaining, or Lightbulbs-Remaining, or Rats-Remaining, or whatever) numbers were included underneath Kaplan-Meier graphs, they came with no explanatory label 14 times (the [NO LABEL] row of the table), and with some variant of "at risk" 126 times (the rows from "Number at risk" through "# at risk"). That gives a total for a category that we might call "misleading", or "failing to inform the reader", of (14 + 126) / 147 = 95%, which could be summed up by positing that researchers yielded to the impulse to keep the reader in the dark about the meaning of those rows of numbers underneath the graph 95% of the time.
In 7 / 147 = 5% of the cases (the rows from "n=" through "Subjects"), the label was not clearly misleading, but it did fall short of the clear and effective "Patients Remaining" (with "Patients" replaced by whatever more appropriate noun was called for within each particular piece of research).
And in this sample of 147, there was not a single instance of the accurate and intelligible label "Patients Remaining".
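The percentages quoted for the label survey follow directly from the tallies. This is a minimal sketch using only counts stated in the table and text; the constant and function names are illustrative.

```python
# Counts taken from the label table and the surrounding text.
GRAPHS_FOUND = 740       # Google image hits showing Kaplan-Meier graphs
AT_RISK_VARIANTS = 126   # labels containing some variant of "at risk"
NO_LABEL = 14            # counts shown with no label at all
OTHER_LABELS = 7         # "n=", "Sample size", "Subjects", and similar

WITH_COUNTS = AT_RISK_VARIANTS + NO_LABEL + OTHER_LABELS


def pct(part: int, whole: int) -> int:
    """Share of `whole` represented by `part`, rounded to a whole percent."""
    return round(100 * part / whole)


share_with_counts = pct(WITH_COUNTS, GRAPHS_FOUND)
share_uninformative = pct(AT_RISK_VARIANTS + NO_LABEL, WITH_COUNTS)
share_other = pct(OTHER_LABELS, WITH_COUNTS)

print(f"Graphs with counts underneath:     {WITH_COUNTS} of {GRAPHS_FOUND} ({share_with_counts}%)")
print(f"'At risk' variants or unlabelled:  {share_uninformative}%")
print(f"Other labels:                      {share_other}%")
```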
In summary, the Kaplan-Meier analysis appears to be often used by researchers. When used, it is sometimes offered lacking indispensable Patients-Remaining counts. When Patients-Remaining counts are supplied, they are almost always misleadingly labelled and for that reason perhaps often misunderstood by readers. And when they are misunderstood by readers, it may tend to be in the direction of trusting the efficacy of a medical treatment which may in reality be ineffective or harmful.
Duty To Warn
The following Johnson&Johnson statement of its "first responsibility" does Johnson&Johnson credit, although it might have been still more commendable had it ranked patients ahead of doctors and nurses:
And from the above declaration it may be fair to assume that Johnson&Johnson's "first responsibility" includes the duty to warn of dangers known to Johnson&Johnson, but not obvious to patients. The lack of integrity of the Kaplan-Meier methodology must surely top the list of such dangers.
The question of Kaplan-Meier integrity is not an abstract one that ivory-tower intellectuals can be left free to address at their convenience, and with the issue expected to be settled in years to come. Rather, the question of Kaplan-Meier integrity demands immediate action because patients and physicians are daily making life-and-death decisions based on Kaplan-Meier graphs, and so if these graphs are misleading, the life-and-death decisions will be wrong.
As an immediate first step, it is obligatory to begin giving patients some idea of how profoundly-distorted the conclusions based on Kaplan-Meier graphs can be. Among the proper places for such disclosure is within an FDA BLACK BOX, and until entry for this information into an FDA BLACK BOX is granted, perhaps it should be disseminated as broadly as possible inside an UNOFFICIAL RED BOX such as the following:
RED BOX WARNING
This is not yet an FDA Black Box Warning, but should be:
Cancer patients are advised that estimates they may be given concerning survival times are likely to be based on Kaplan-Meier graphs which are capable of so radically distorting research data as to affirm that, for example, after 27 months of treatment, 286 patients are still alive, while the very same data simultaneously confesses that the total number of patients researchers have actually seen alive after 27 months is only 4.
It would seem incumbent upon Johnson&Johnson at this time to either
(1) explain how the above critique of Kaplan-Meier is mistaken, and more specifically to demonstrate that the Kaplan-Meier methodology does not mislead, and so that past conclusions based on it do not need to be retracted, and so that it can continue to be used to analyze future clinical trials, or
(2) disseminate recommendations to distrust conclusions that have relied on Kaplan-Meier methodology, and to recommend that the FDA suspend medical-use regulatory approval from drugs whose claim to efficacy was supported by Kaplan-Meier methodology, and to itself avoid reliance on Kaplan-Meier in all future clinical trials.