Sherbenou (2016) N-Per-Group = 5
Sherbenou (2016) has already been referred to on semmel-weis.org, but needs to be examined again. In its experiment summarized in Figure 4B below, 30 mice were inoculated with Multiple Myeloma (MM) cells on Day=-10, and on Day=0 were given either a placebo (the ones in the PBS column on the left), or else one of five treatments, identified by the five titles above the five right-hand columns. One day later, on Day=1, the spread of the MM is shown in blue, and can be seen to be equivalent for the mice irrespective of treatment column, the first treatment one day earlier on Day=0 having as yet had insufficient time to affect the disease. The various treatments were continued for durations that differed from condition to condition.
Relevant here is that the mice in the fifth column from the left, the column titled CD46-ADC 4mg/kgx4, can be seen to be completely free of MM from Day=8 through Day=43. What is happening in the other columns is highly interesting, but at this moment irrelevant.
Sherbenou (2016) is short for
Daniel W. Sherbenou, Blake T. Aftab, Yang Su, Christopher R. Behrens, Arun Wiita, Aaron C. Logan, Diego Acosta-Alvear, Byron C. Hann, Peter Walter, Marc A. Shuman, Xiaobo Wu, John P. Atkinson, Jeffrey L. Wolf, Thomas G. Martin, and Bin Liu,
Antibody-drug conjugate targeting CD46 eliminates multiple myeloma cells,
Journal of Clinical Investigation, 2016;126(12):4640–4653.
Furthermore, the graph in Figure 4C immediately below follows the same experiment all the way to Day=212.5, where it can be seen that all mice in the fifth column are still alive, and all mice in all other columns are dead. What we see here, in short, seems to be a complete and permanent cure, which is a priceless contribution to our understanding of MM, but it is a cure of one particular kind of MM, and within one particular strain of mice, and nobody expected that whatever worked in such a situation would work the same way for different kinds of MM and in a collection of heterogeneous humans. What was expected is that the mouse data would be a step toward understanding myeloma in mice, and understanding myeloma in mice would be a step toward understanding myeloma in humans.
But what we are primarily interested in at the moment is that this complete mouse cure has been demonstrated using only 5 subjects per condition. Possibly no one examining this evidence has ever considered it unconvincing because it was based on too-small an N-Per-Group.
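How much weight 5 subjects per condition can carry when the outcome is all-or-none can be checked with a few lines of arithmetic. The sketch below is an illustration added here, not an analysis taken from Sherbenou (2016). It assumes that only the CD46-ADC 4mg/kgx4 column (5 of 5 alive) is compared against the PBS column (0 of 5 alive), and it uses a one-sided Fisher exact test, which is a choice made for this illustration. The function name fisher_one_sided is hypothetical.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table
        [[a, b],   # treated: survivors, deaths
         [c, d]]   # control: survivors, deaths
    i.e. the probability of at least `a` treated survivors if
    survival were unrelated to treatment (margins held fixed)."""
    row1 = a + b          # treated group size
    row2 = c + d          # control group size
    col1 = a + c          # total survivors
    total = comb(row1 + row2, col1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        # hypergeometric probability of exactly k treated survivors
        p += comb(row1, k) * comb(row2, col1 - k) / total
    return p

# Assumed table: 5 of 5 treated mice alive, 0 of 5 placebo mice alive.
p = fisher_one_sided(5, 0, 0, 5)
print(p)  # equals 1/252
```

Under those assumptions, a split this clean would arise by chance in 1 of the 252 possible ways of assigning five survivors among ten mice.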
Chang (2011) N-Per-Group = 4 to 6 & N-Per-Group = 1 to 2
And to continue along this line of thought — it is not difficult to find statements advocating 4 to 6 N-Per-Group as a norm:
Mi Hee Chang, Suat L.G. Cirillo, and Jeffrey D. Cirillo
Using Luciferase to Image Bacterial Infections in Mice
Journal of Visualized Experiments, 2011; (48): 2547.
[blue emphasis added]
Although each animal can be followed individually, which controls for a great deal of the variability between animals, it is still necessary to include sufficient animals to allow statistical significance between groups to be determined. Usually, animal numbers should be 4 - 6 per group, allowing differences as low as two-fold between groups to be observed in many cases.
And that is not as low as N-Per-Group can go. We see the same Chang (2011) as was quoted immediately above answering the question of whether a bacterial infection can be detected in bioluminescent images, but relying on a mere N-Per-Group=1 in the control condition in which the sole mouse has no bacterial infection, and N-Per-Group=2 in the experimental condition in which both mice do have a bacterial infection. Each of the four images below is of the same three mice, with the uninfected control mouse always on the left:
Again the question arises: Does an N-Per-Group of 1 and 2 fail to convince anybody that the bioluminescence imagery being tested here is able to detect bacterial infection? Maybe here too everybody is convinced of the conclusion despite the N-Per-Group scraping rock-bottom. And perhaps even the absolutely-rock-bottom of N-Per-Group=1 in each group would have been just as convincing.
Notice, too, that the above mini-experiment took only a bit more than 20 minutes to complete.
Lind (1747) N-Per-Group = 2
We look next at the first clinical trial in medical history, which was performed by Naval Surgeon James Lind aboard HMS Salisbury in 1747 in an attempt to understand scurvy, which we know today to be caused by Vitamin-C deficiency, and whose effects upon seamen of yore were devastating:
During the Age of Exploration (between 1500 and 1800), it has been estimated that scurvy killed at least two million sailors. Jonathan Lamb wrote: "In 1499, Vasco da Gama lost 116 of his crew of 170; In 1520, Magellan lost 208 out of 230, all mainly to scurvy." [...]
During the 18th century, disease killed more British sailors than enemy action. It was mainly by scurvy that George Anson, in his celebrated voyage of 1740–1744, lost nearly two-thirds of his crew (1300 out of 2000) within the first 10 months of the voyage. The Royal Navy enlisted 184,899 sailors during the Seven Years' War; 133,708 of these were "missing" or died from disease, and scurvy was the leading cause.
James Lind (1716-1794)
Lind's clinical trial tested six treatments, using Sailors-Per-Treatment=2:
This began after two months at sea when the ship was afflicted with scurvy. He divided twelve scorbutic sailors into six groups of two. They all received the same diet but, in addition,
group one was given a quart of cider daily,
group two twenty-five drops of elixir of vitriol (sulfuric acid),
group three six spoonfuls of vinegar,
group four half a pint of seawater,
group five received two oranges and one lemon, and
group six a spicy paste plus a drink of barley water.
The treatment of group five stopped after six days when they ran out of fruit, but by that time one sailor was fit for duty while the other had almost recovered. Apart from that, only group one also showed some effect of its treatment.
Wikipedia JAMES LIND [bold emphasis added]
We can imagine what Dr Lind's reply would have been had a time-travelling and jurisdiction-disregarding FDA representative informed him that his clinical trial would be credible only if it were repeated not with two sailors per group but with two thousand.
FDA (2018) N-Per-Group = 10 to THOUSANDS
Seeing above that cause-effect conclusions can be demonstrated with N-Per-Group ranging from 1 to 5, and with the recommendation that it be allowed to go as high as 6, what are we to make of the FDA describing as typical, and perhaps implying tolerable, or even obligatory, the fantastically-high subject numbers for clinical trials (meaning experiments on humans) below (which would need to be halved to get N-Per-Group, as clinical trials usually test two groups):
Clinical Research Phase Studies
Clinical trials follow a typical series from early, small-scale, Phase 1 studies to late-stage, large scale, Phase 3 studies.
Phase 1
Study Participants: 20 to 100 healthy volunteers or people with the disease/condition.
Length of Study: Several months
Phase 2
Study Participants: Up to several hundred people with the disease/condition.
Length of Study: Several months to 2 years
Phase 3
Study Participants: 300 to 3,000 volunteers who have the disease or condition
Length of Study: 1 to 4 years
Phase 4
Study Participants: Several thousand volunteers who have the disease/condition
[No Length of Study stated]
fda.gov [blue emphasis added]
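The halving mentioned above can be spelled out for the two participant ranges that the FDA quotation states numerically. This is only a sketch of the arithmetic, and it rests on the assumption made in the text that a trial compares two groups of equal size; the variable names are hypothetical.

```python
# Total-participant ranges as quoted from fda.gov above
# (only the two ranges that are stated as numbers).
quoted_totals = {"20 to 100": (20, 100), "300 to 3,000": (300, 3000)}

ARMS = 2  # assumption from the text: two equal groups per trial

per_group = {
    label: (low // ARMS, high // ARMS)
    for label, (low, high) in quoted_totals.items()
}

for label, (low, high) in per_group.items():
    print(f"{label} participants -> {low} to {high} per group")
```

On that assumption the quoted totals correspond to 10 to 50 per group at the small end and 150 to 1,500 per group at the large end.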
But why such a Supersized-N-Per-Group? If it is possible to demonstrate sailors being cured of scurvy using Sailors-Per-Group=2, and if it is possible to demonstrate mice being cured of multiple myeloma using Mice-Per-Group=5, why isn't it possible to demonstrate people being cured of anything at all using People-Per-Group=2 or 5? And if not 5, then surely 10 would be enough, and if not 10, then surely 15, but that doesn't begin to approach the FDA requiring sometimes hundreds, and sometimes thousands.
Small-N clinical trials, admittedly, would require the human subjects to be more homogeneous at the start, and for the treatments to be applied more standardly, and for the expected results to be of non-trivial magnitude.
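The dependence of group size on homogeneity and effect magnitude described in the preceding paragraph can be made concrete with the textbook normal-approximation formula for comparing two group means: n per group is roughly 2(z_alpha + z_power)^2 / d^2, where d is the difference between groups divided by the within-group standard deviation. The sketch below rests on assumptions that are not in the text above: a two-sided 5% significance level, 80% power, equal group sizes, and a normal approximation that is optimistic for very small groups. The function name n_per_group is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.8):
    """Approximate subjects needed per group to detect a standardized
    difference d between two group means (normal approximation)."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided significance threshold
    z_power = z.inv_cdf(power)            # quantile for the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

# Larger or more uniform effects (bigger d) need fewer subjects per group.
for d in (0.2, 0.5, 1.0, 2.0, 3.0):
    print(d, n_per_group(d))
```

Homogeneous subjects and standardized treatment shrink the within-group standard deviation, which raises d for the same raw difference and so lowers the group size the formula returns.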
The advantages of Smaller-N research are enormous. For the cost and labor of a single clinical trial of the sort run today, it would be possible to run a hundred Small-N animal experiments, and so our understanding of animal cancer would advance rapidly, and its contribution to our understanding of human cancer would grow proportionately. The hard part is learning how to give many species of animals many kinds of cancers, and curing them all. But this goal could be attained quickly because the diversion of vast resources from staging useless infomercials to running useful science experiments would mean that tens of thousands of useful experiments would be conducted yearly all over the world. The step after that would not be as difficult — science continuing on to learn to cure humans with the help of the precedent set by its having learned to cure animals.
Why, instead, does the FDA not only tolerate Supersized-N-Per-Group clinical trials but also seem to recommend and even require them? I can think of several reasons, none of them good ones:
Supersized-N makes therapy improvement costly, such that only wealthy corporations are capable of doing it, which removes from competition researchers whose good ideas threaten the existing therapies being sold by Big Pharma.
Therapy improvement is slowed, extending the duration that patients remain uncured on antiquated therapies and thus extending the duration of payment for ineffective therapies.
Subjects available for research are depleted, leaving too few for competitors to research their new ideas.
Subject scarcity in the US makes it necessary to conduct research offshore, where it is next-to-impossible for the FDA to monitor, and where poverty and corruption make it easy for researchers to buy the results they want.
When running only a small number of subjects, they can be run more or less contemporaneously, as for example by starting their treatment on the same day, or at least within the same week, or at least not much beyond that, and therefore also completing their participation within the same narrow span of time, a day or a week, and so on. However, when running many hundreds of subjects, or thousands, contemporaneous running becomes impossible, and subjects need to be staggered. They trickle in over an interval of many months or even years. When the clinical trial is terminated, a large number of subjects will have only begun their participation, or will be only half-way through it, or will be more than half-way, but far from completing it. But the decision to terminate the clinical trial is nevertheless made, and the months or years it would take for the latecomers to finish will not be accommodated. These latecomers who are not allowed to finish are the ones I call "shed" and the researchers call "censored". They are the subjects whom Kaplan-Meier graphing is able to imagine to be mainly survivors, and which product of imagination is relied upon to justify the inflation of survival statistics for which Kaplan-Meier is becoming notorious. In brief, Big Pharma needs to Supersize-N its infomercials because it's the Supersizing which necessitates staggered testing, and which later on triggers the subject disappearance which inflates Kaplan-Meier survival.
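For readers who want to see where censored subjects enter the Kaplan-Meier arithmetic, here is a minimal sketch of the standard product-limit calculation. The data are hypothetical toy values invented for this illustration, not figures from any trial, and the function name kaplan_meier is likewise hypothetical.

```python
def kaplan_meier(records):
    """records: list of (time, died) pairs; died=False means the subject
    was censored at `time`. Returns [(time, survival_estimate)] for each
    time at which at least one death was observed."""
    at_risk = len(records)
    surv = 1.0
    curve = []
    for t in sorted({time for time, _ in records}):
        deaths = sum(1 for time, died in records if time == t and died)
        censored = sum(1 for time, died in records if time == t and not died)
        if deaths:
            # multiply by the fraction of at-risk subjects surviving time t
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        # both the dead and the censored leave the at-risk denominator
        at_risk -= deaths + censored
    return curve

# Hypothetical toy data: months on study, and whether death was observed.
toy = [(2, True), (3, False), (4, True), (5, False), (6, False), (8, False)]
for t, s in kaplan_meier(toy):
    print(t, round(s, 3))
```

In this toy data two of six subjects are observed to die and four are censored. The estimate after month 4 is 0.625, because each censored subject is dropped from the denominator from its censoring time onward and is counted neither as a death nor as a confirmed survivor.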
In short, FDA support of Supersized-N research reveals with unmistakable clarity that the FDA is under the influence of Big Pharma, and is therefore incapable of regulating Big Pharma. That Supersized-N is an unmitigated evil is evidenced in the glaring inefficiency which it inflicts and which it is impossible for the FDA to be unaware of:
Right now, the US has exactly 19,816 clinical trials open and ready to recruit patients — trials of promising new therapeutics to fight everything from HIV to cancer to Alzheimer’s. About 18,000 of them will get stuck on the tarmac because they won’t get enough people enrolled. And a third of those will never get off the ground at all, for the same reason.
The cancer patient already carries the burden of not having disclosed to him how many hundreds of millions of dollars his clinic may be accepting from Big Pharma, and also the burden of not having disclosed to him how many tens of millions of dollars his doctor may be accepting — and to those two burdens is now added the third burden of the FDA, trusted to be the patient's protector, instead turning out to be another cancer-industry component that is under Big Pharma influence.
FDA support of Supersized-N infomercials constitutes so egregious a deviation from legitimate scientific procedure that its revelation invites the cartoons below to be viewed not as expressions of hyperbole in the service of humor, but as welcome weapons in the war against Big Pharma's corruption of medical science and sabotaging of the fight against cancer.
The question before you, Dr Richardson, is whether you agree that approval of Supersized-N research is indicative of Big Pharma influence, or whether you are able to defend Supersized-N research. Whatever your reply, you can be sure that I will be delighted to publish it on semmel-weis.org.
I suggest to you also that your frequent lead-authorship or co-authorship of Supersized-N research confers on you a responsibility to answer questions concerning Supersized-N.
P.S. There is a possible defense of Supersized-N research which I have omitted to mention on this page because it is preposterous, and in any case because I have already debunked it in earlier writing on semmel-weis.org, but for the sake of completeness will touch on it yet again here. That defense is that Supersized-N subjects are required wherever the expected effect is teensy, to which the refutation is that such might be a credible defense if the experimental procedure were rigorously standardized and meticulously conducted. As it is the case that Supersized-N infomercials are slovenly beyond belief, no supersizing of their number of subjects can redeem them. A million subjects per condition would bring them not a step closer to justifying the conclusions that they proffer. And, incidentally, no so-called "meta-analysis" can extract meaning from a hundred infomercials when each one regarded individually is meaningless.
And in any case, what justification can be offered for searching for teensy effects when large effects call out for investigation, as for example the large effect suggesting that Bortezomib kills, originally noticed in a graph in BRIGHT FUTURE FOR J&J DARZALEX? showing that subjects disappear sooner in the two CASTOR curves in which they were given BORTEZOMIB than in the two POLLUX curves in which they were given LENALIDOMIDE: