-
Notifications
You must be signed in to change notification settings - Fork 17
Non inferiority by iMRMC
In response to a question posed in the “How do I do a non-inferiority study?” issue on the package’s Github repository (https://github.com/DIDSR/iMRMC/issues/31#issuecomment-1368940679), we have developed an example guide using the R package that demonstrates using the iMRMC software to help future users. This wiki page is based on an R markdown file that contains the R code for reproducing the results: LINK to R markdown file.
The issue asked how to find the p-value of a noninferiority study comparing split-plot radiologist reads against an AI reading all cases. The work of Chen, Petrick, and Sahiner (2012) addresses this issue, but the method they present uses the OR model (Obuchowski and Rockette Jr. (1995)) and assumes the data are fully crossed (all readers read all cases). The iMRMC software can produce the OR variance components only when the data is fully crossed, so the issue is asking for a solution that makes use of the iMRMC software for alternate study designs, and the split-plot study in particular (Obuchowski, Gallas, and Hillis (2012)).
The measure of diagnostic accuracy used is AUC, where the experimental
modality is a single AI algorithm and the conventional modality is the
radiologists’ reads. The margin parameter is set to a level of
To answer this question, we will use the ViperObservation data from the
ViperData package (https://github.com/DIDSR/viperData). This data
assesses radiologists’ diagnostic performance using screen film
mammography (SFM) and full field digital mammography (FFDM) under
varying study designs, which is outlined by Gallas et al. (2019)
(http://dx.doi.org/10.1117/1.JMI.6.1.015501). Similar to the issue
posed, the diagnostic accuracy of each modality is measured by AUC from
a split-plot study. In this manuscript, the experimental modality being
tested is AI simulated from the FFDM data, compared to the conventional
modality of SFM. We will use a margin parameter of
For noninferiority testing, the null and alternative hypotheses have been described in Chen, Petrick, and Sahiner (2012) (https://doi.org/10.1016/j.acra.2012.04.011) and are copied here:
where
Therefore, if we can reject
To use the viperData package, we can download the package in its entirety here: https://didsr.github.io/viperData/, as a tar.gz or zip file and install the package from the computer’s local library. This is possible using the Tools dropdown, selecting “Install packages…,” and changing the “Install From:” option to “Package Archive File.” From here we can browse and select the downloaded viperData package to install and load into our Environment.
#Load iMRMC package
library(iMRMC)
#Load Viper data
library(viperData)
viperObs <- viperData::viperObs
#Choose one sub-study => screeningHighP
screeningHighP <- viperObs[viperObs$subStudyLabel == "screeningHighP", , drop = FALSE]
screeningHighP$caseID <- as.character(screeningHighP$caseID)
The iMRMC function that will perform the analysis expects the input
dataframe to consist of 4 columns: readerID
, caseID
, modalityID
,
and score
. The truth
modality in the resulting iMRMC dataframe
represents the “ground truth” on the state of any case as diseased
(createIMRMCdf()
function specifying the variable columns mentioned above.
We begin preparing the data for analysis by separating the FFDM data from the SFM data. We will use the FFDM data to simulate an AI reader.
#Create iMRMC dataframe
dfMRMC <- createIMRMCdf(screeningHighP,
keyColumns = list(readerID = "readerID",
caseID = "caseID",
modalityID = "modalityID",
score = "score",
truth = "cancerStatus455"),
truePositiveFactor = "1")
#Subset FFDM reads from all readers
dfMRMC_onlyFFDM <- subset(dfMRMC, modalityID == "FFDM")
#Subset SFM reads from all readers
dfMRMC_onlySFM <- subset(dfMRMC, modalityID == "SFM")
#Subset truth reads from all readers
dfMRMC_onlyTruth <- subset(dfMRMC, modalityID == "truth")
To simulate data from an AI reading all cases, both as modality and
reader, we will combine data from multiple readers. We do this by
cycling though all readers in the dfMRMC_onlyFFDM
dataframe and adding
the first instance of each case to the new dataframe dfMRMC_AImod
.
Once we have just one score for each case under this modality, the
modalityID
is changed to “AI” and the readerID
for these scores is
renamed to “AI.1.” Now we have simulated the score results of AI reads
for all cases.
#Create a dataframe for AI simulated reads from FFDM modality
dfMRMC_AImod <- data.frame()
#Loop through all cases in caseID and isolate each case from the chosen readers just once
for(i in unique(dfMRMC_onlyFFDM$readerID)) {
remainingCases <- dfMRMC_onlyFFDM[!dfMRMC_onlyFFDM$caseID %in%
dfMRMC_AImod$caseID, , drop = FALSE]
dfMRMC_AImod <- rbind(dfMRMC_AImod, subset(remainingCases, readerID == i))
}
#Change modalityID to "AI" for all cases
dfMRMC_AImod$modalityID <- "AI"
#Change readerID to "AI.1" for all cases
dfMRMC_AImod$readerID <- "AI.1"
For the purpose of the software, we need to duplicate those results for a second reader which we will call “AI.2.” This process and reasoning is covered on the iMRMC repository wiki here: https://github.com/DIDSR/iMRMC/wiki/Single-reader-analysis. By duplicating these scores on the same cases, there is no reader variability for the AI “readers.”
#Duplicate those cases and rename with readerID "AI.2"
dfMRMC_AImod2 <- dfMRMC_AImod
dfMRMC_AImod2$readerID <- "AI.2"
#Bind dataframe with 2 AI "readers" with duplicate scores on all cases
dfMRMC_AImod <- rbind(dfMRMC_AImod, dfMRMC_AImod2)
## Bind simulated AI reads with split-plot SFM reads
#Bind full dataframe with only SFM, truth and AI rows
dfMRMC_AI <- rbind(dfMRMC_onlyTruth, dfMRMC_onlySFM, dfMRMC_AImod)
Once the dataframe is shaped into iMRMC’s expected format, we run the
doIMRMC
function which provides 5 results. These include:
-
perReader
: per-reader AUC results -
Ustat
: reader-averaged AUC results using U-statistics -
MLEstat
: reader-averaged AUC results using maximum likelihood estimation -
ROC
: modality-specific ROC curves, and -
varDecomp
: 5 variance decompositions for the reader-averaged AUCs.
#Run iMRMC analysis function
results <- doIMRMC(dfMRMC_AI)
As Chen, Petrick, and Sahiner (2012) describe, the p-value for a
noninferiority test is
where modA
and modB
, respectively)
are reader-averaged. The iMRMC function provides these reader-averaged
estimates in the object uStat
. Here we show how to calculate the test
statistic.
#Isolate data frame of reader-averaged U-stats based performance results
uStat <- results[['Ustat']]
#Capture degrees of freedom
#The third row we are calling here assesses the accuracy of modA against modB
dfBDG <- uStat[3, 'dfBDG']
#Capture variance of the difference between modalities, ie. the total variance
varAUCAminusAUCB <- uStat[3, 'varAUCAminusAUCB']
#Define all variables used in the t-statistic calculation
marginParam = 0.05
aucA = uStat[3, 'AUCA']
aucB = uStat[3, 'AUCB']
sd = sqrt(varAUCAminusAUCB)
#Calculate t statistic
tStat <- (aucA - aucB + marginParam) / sd
#Calculate p-value from tStat
pValue <- 2*(1 - pt(tStat, df = dfBDG))
aucA | aucB | marginParameter | sd | tStat | pValue | alpha |
---|---|---|---|---|---|---|
0.7500 | 0.7379 | 0.0500 | 0.0296 | 2.0956 | 0.0380 | 0.0500 |
Calculation components of p-value
The p value of 0.038 is less than our significance level of 0.05,
indicating that we reject
While the total variance of the difference is provided by the doIMRMC
function in the Ustat
dataframe as varAUCAminusAUCB
, we will also
use the varDecomp
results from the same function to find this variance
for the BDG variance components separately to demonstrate the
relationship. Understanding this relationship between coefficients and
components allows us to explore different study designs.
In the BDG variance decomposition, the total variance of the difference between modalities is split into 8 second-order U-statistic moments and their respective coefficients. The total variance is then the linear combinations of these coefficients and component terms.
Here is the one-shot total variance estimate where the coefficients
Further reading of that paper will provide a more detailed description of how the coefficients are calculated based on number of readers, non-diseased cases, and diseased cases when the data is fully crossed (every reader reads every case in all modalities). If the study design is not fully crossed, as it is in the split-plot Viper study, the coefficients are more complicated and can be calculated with the iMRMC software (Chen, Gong, and Gallas 2018). Gallas and Brown (2008) offer explanations on these non-fully-crossed coefficients.
Here we compose two data frames of the BDG coefficients
BDG_AI.SFM_coeff
and components BDG_AI.SFM_comp
from the iMRMC
output.
#List of methods of total variance decompositions
listVarDecomp <- results[['varDecomp']]
#Isolate BDG variance decomposition coefficients and component data frames from list
BDG_AI.SFM_coeff <- listVarDecomp[["BDG"]]$Ustat$coeff$AI.SFM
BDG_AI.SFM_comp <- listVarDecomp[["BDG"]]$Ustat$comp$AI.SFM
These data frames (printed as Table 1 and Table 2, respectively) each
provide 3 rows corresponding to the variance of each modality (labeled
AI
and SFM
) and the covariance of the difference between the two
modalities (labeled AI.SFM
). Values in these tables have been rounded
for conciseness.
Briefly, the coefficients are scale factors determined by the number of
readers, non-diseased, and diseased cases within the data-collection
study design. The coefficients for the covariance between the two
modalities (row 3, AI.SFM
) have already been multiplied by 2, as is
needed for calculating the total variance of the difference in
modalities. Since there is no reader overlap across modalities,
AI | 3.53e-05 | 4.55e-03 | 3.81e-03 | 4.92e-01 | 3.53e-05 | 4.55e-03 | 3.81e-03 | -5.08e-01 |
SFM | 5.66e-05 | 1.78e-03 | 1.49e-03 | 4.67e-02 | 2.27e-04 | 7.13e-03 | 5.96e-03 | -6.34e-02 |
2*(AI.SFM) | 0 | 0 | 0 | 0 | 1.41e-04 | 1.82e-02 | 1.52e-02 | -3.36e-02 |
Coefficients of the moments based on numbers of reader and cases
The components are the U-statistic moment estimates.
AI | 7.49e-01 | 6.48e-01 | 5.95e-01 | 5.61e-01 | 7.49e-01 | 6.48e-01 | 5.95e-01 | 5.61e-01 |
SFM | 7.34e-01 | 6.15e-01 | 5.92e-01 | 5.43e-01 | 5.98e-01 | 5.77e-01 | 5.57e-01 | 5.44e-01 |
AI.SFM | 0 | 0 | 0 | 0 | 5.91e-01 | 5.83e-01 | 5.62e-01 | 5.53e-01 |
Components of second-order U-statistic moment estimates
The code below will execute the linear combination and summation to arrive at the total variance of the difference.
#Create a new data frame of linear combination of coefficient and component moments
contributionsToVariance <- BDG_AI.SFM_coeff * BDG_AI.SFM_comp
#Sum moments to get the total Var contributions for each modality and interaction
varianceSums <- as.data.frame(rowSums(contributionsToVariance))
#Sum variance of each modality and subtract the 2*covariance term
totalVarianceofDiff <- as.numeric(varianceSums[1,1] + varianceSums[2, 1] - varianceSums[3,1])
var(AUC_AI) | 1.06e-03 |
var(AUC_SFM) | 5.06e-04 |
2*cov(AUC_AI, AUC_SFM) | 6.86e-04 |
var(AUC_AI - AUC_SFM) | 8.78e-04 |
Total variance for each modality and interaction
The total variance of the difference is 0.0008782.
Chen, Weijie, Qi Gong, and Brandon D. Gallas. 2018. “Paired Split-Plot Designs of Multireader Multicase Studies.” Journal of Medical Imaging 5: 031410. https://doi.org/10.1117/1.JMI.5.3.031410.
Chen, Weijie, Nicholas A. Petrick, and Berkman Sahiner. 2012. “Hypothesis Testing in Noninferiority and Equivalence MRMC ROC Studies.” Academic Radiology 19: 1158–65. https://doi.org/10.1016/j.acra.2012.04.011.
Gallas, Brandon D. 2006. “One-Shot Estimate of MRMC Variance: AUC.” Academic Radiology 13: 353–62. https://doi.org/10.1016/j.acra.2005.11.030.
Gallas, Brandon D. 2017. “IMRMC-R v1.2.4: Application for Analyzing and Sizing MRMC Reader Studies.” https://cran.r-project.org/web/packages/iMRMC/index.html.
Gallas, Brandon D., Andriy Bandos, Frank W. Samuelson, and Robert F. Wagner. 2009. “A Framework for Random-Effects ROC Analysis: Biases with the Bootstrap and Other Variance Estimators.” Communications in Statistics—Theory and Methods 38: 2586–2603. https://doi.org/10.1080/03610920802610084.
Gallas, Brandon D., and David G. Brown. 2008. “Reader Studies for Validation of CAD Systems.” Neural Networks Special Conference Issue 21: 387–97. https://doi.org/10.1016/j.neunet.2007.12.013.
Gallas, Brandon D., Weijie Chen, Elodia Cole, Robert Ochs, Nicholas A. Petrick, Etta D. Pisano, Berkman Sahiner, Frank W. Samuelson, and Kyle J. Myers. 2019. “Impact of Prevalence and Case Distribution in Lab-Based Diagnostic Imaging Studies.” Journal of Medical Imaging 6: 015501. https://doi.org/10.1117/1.JMI.6.1.015501.
Obuchowski, Nancy A., Brandon D. Gallas, and Stephen L. Hillis. 2012. “Multi-Reader ROC Studies Iwth Split-Plot Designs: A Comparison of Statistical Methods.” Academic Radiology 19: 1508–17. https://doi.org/10.1016/j.acra.2012.09.012.
Obuchowski, Nancy A., and Howard E. Rockette Jr. 1995. “Hypothesis Testing of Diagnostic Accuracy for Multiple Readers and Multiple Tests an Anova Approach with Dependent Observations.” Communications in Statistics-Simulation and Computation 24: 285–308. https://doi.org/10.1080/03610919508813243.