Expert on Test Science Research Document Library

Determining the Necessary Number of Runs in Computer Simulations with Binary Outcomes

Mon, 01 Jan 2024 00:00:00 +0000

How many success-or-failure observations should we collect from a computer simulation? Researchers often use space-filling designs when planning modeling and simulation (M&S) studies. We are not satisfied with existing guidance on justifying the number of runs when developing these designs, either because the guidance is insufficiently justified, does not provide an unambiguous answer, or is not based on optimizing a statistical measure of merit. Analysts should use the confidence interval margin of error as the statistical measure of merit for M&S studies intended to characterize overall M&S behavioral trends. Unfortunately, when logistic regression is used, computing the margin of error for studies with factors and success-or-failure (binary) outcomes requires knowing the model parameters. We explore how an upper bound on the margin of error, which requires less information about the statistical model to be estimated, can assist in sample size planning. While the upper bound needs further theoretical refinement, simulation studies suggest it may provide a means of justifying M&S study sample sizes with a statistical measure of merit.
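
For intuition only, consider the simplest binary-outcome case, with no factors: estimating one success probability p from n independent runs. The Wald margin of error z * sqrt(p(1 - p)/n) depends on the unknown p, but because p(1 - p) <= 1/4 it never exceeds z/(2 sqrt(n)), which gives a run count that requires no knowledge of p. The sketch below illustrates that textbook bound; it is not the logistic-regression bound developed in the report, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def z_value(confidence: float) -> float:
    """Two-sided standard normal critical value (about 1.96 for 95%)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def wald_margin_of_error(p: float, n: int, confidence: float = 0.95) -> float:
    """Wald margin of error for one proportion; requires the unknown p."""
    return z_value(confidence) * math.sqrt(p * (1 - p) / n)


def worst_case_margin_of_error(n: int, confidence: float = 0.95) -> float:
    """Upper bound on the Wald margin of error, using p * (1 - p) <= 1/4."""
    return z_value(confidence) / (2 * math.sqrt(n))


def runs_for_margin(target: float, confidence: float = 0.95) -> int:
    """Smallest n whose worst-case margin of error is at most target."""
    return math.ceil((z_value(confidence) / (2 * target)) ** 2)


n_runs = runs_for_margin(0.05)  # 385 runs for a 95% margin of error of 0.05
```

The bound is tight at p = 0.5 and conservative elsewhere, which mirrors the trade-off described above: less information about the model is needed, possibly at the cost of more runs than strictly necessary.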

Suggested Citation

Duffy, Kelly, Curtis G Miller, and Rebecca Medlin. Sample Size Determination for Computer Simulations with Binary Outcomes. IDA Product 3002814. Alexandria, VA: Institute for Defense Analyses, 2024.

Slides:

Comparing Normal and Binary D-Optimal Designs by Statistical Power

Sun, 01 Jan 2023 00:00:00 +0000

In many Department of Defense test and evaluation applications, binary response variables are unavoidable. Many researchers have considered D-optimal design of experiments for generalized linear models. However, little consideration has been given to assessing how these designs perform in terms of statistical power for a given hypothesis test. Monte Carlo simulations and exact power calculations suggest that normal D-optimal designs generally yield higher power than binary D-optimal designs, even though logistic regression is used in the analysis after data have been collected. Results from using statistical power to compare designs contradict standard design of experiments comparisons, which employ D-efficiency ratios and fraction of design space plots. Power calculations suggest that practitioners who are primarily interested in the resulting statistical power of a design should use normal D-optimal designs over binary D-optimal designs when logistic regression is to be used in the data analysis.
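
A Monte Carlo power calculation of the kind mentioned above can be sketched for the simplest case: a logistic regression with a single two-level factor, analyzed with a Wald test of the factor's coefficient. Because that model is saturated, the maximum likelihood estimate of the coefficient is the sample log-odds ratio, so no iterative fitting is needed. The success probabilities, run allocations, and the choice to count replicates without a finite estimate as non-rejections are illustrative assumptions, not the report's simulation settings; the function name is hypothetical.

```python
import math
import random
from statistics import NormalDist


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def wald_power_two_level(p_low, p_high, n_low, n_high, alpha=0.05, reps=5000, seed=1):
    """Monte Carlo power of the Wald test for a two-level factor in logistic regression.

    p_low, p_high: assumed success probabilities at the two factor levels.
    n_low, n_high: number of runs the design allocates to each level.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        # Simulate the number of successes at each factor level.
        y_low = sum(rng.random() < p_low for _ in range(n_low))
        y_high = sum(rng.random() < p_high for _ in range(n_high))
        # With all successes or all failures at a level the MLE is infinite;
        # such replicates are counted as failures to reject.
        if y_low in (0, n_low) or y_high in (0, n_high):
            continue
        ph_low, ph_high = y_low / n_low, y_high / n_high
        # Saturated model: the coefficient MLE is the sample log-odds ratio,
        # and its standard error comes from the observed information.
        beta_hat = logit(ph_high) - logit(ph_low)
        se = math.sqrt(1 / (n_low * ph_low * (1 - ph_low))
                       + 1 / (n_high * ph_high * (1 - ph_high)))
        if abs(beta_hat / se) > z_crit:
            rejections += 1
    return rejections / reps
```

To compare two candidate designs in this setting, call the function with each design's allocation (n_low, n_high) under the same assumed probabilities and compare the estimated powers.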

Suggested Citation

Medlin, Rebecca M, and Addison D Adams. Comparing Normal and Binary D-Optimal Design of Experiments by Statistical Power. IDA Document 3000032. Alexandria, VA: Institute for Defense Analyses, 2023.

Slides:

D-Optimal as an Alternative to Full Factorial Designs: A Case Study

Tue, 01 Jan 2019 00:00:00 +0000

The use of Bayesian statistics and experimental design as tools to scope testing and analyze data related to defense has increased in recent years. Planning a test using experimental design will allow testers to cover the operational space while maximizing the information obtained from each run. Understanding which factors can affect a detector’s performance can influence military tactics, techniques and procedures, and improve a commander’s situational awareness when making decisions in an operational environment. This presentation will explain how a D-optimal experimental design could be an option for planning a test when the number of runs is limited but an adequate test is desired. Additionally, it will describe how the results of a Bayesian multiple logistic model could be used to show in what way the operational environment can affect the detector’s performance.
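
The idea of replacing a full factorial with a smaller D-optimal design can be illustrated with a toy case, assuming a main-effects linear model in three two-level factors (an assumption; the case study's model and factors are not reproduced here). The sketch exhaustively searches the 2^3 full factorial for the 4-run subset that maximizes det(X'X). Exhaustive search is only practical for tiny candidate sets; design software typically uses exchange algorithms instead. All function names are hypothetical.

```python
from itertools import combinations, product


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular: the model is not estimable from these runs
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def information(runs):
    """X'X for a main-effects model: an intercept plus one column per factor."""
    x = [[1.0, *run] for run in runs]
    p = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]


def best_subset(candidates, n_runs):
    """Exhaustively search for the n_runs-subset of candidates maximizing det(X'X)."""
    return max(combinations(candidates, n_runs), key=lambda s: det(information(s)))


full_factorial = list(product([-1, 1], repeat=3))  # all 2^3 = 8 runs
half_size_design = best_subset(full_factorial, 4)  # a D-optimal 4-run design
```

The full factorial gives det(X'X) = 8^4 = 4096 and the selected half fraction gives 4^4 = 256. Scaled per run, (det X'X)^(1/4)/n equals 1 for both, so for main effects alone the 4-run design loses no per-run efficiency; it cannot, however, separate main effects from two-factor interactions.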

Suggested Citation

Anderson, Breeana G, Heather M Wojton, and Keyla Pagan-Rivera. D-Optimal as an Alternative to Full Factorial Designs: A Case Study. IDA Document NS D-10580. Alexandria, VA: Institute for Defense Analyses, 2019.

Poster:

Impact of Conditions which Affect Exploratory Factor Analysis

Tue, 01 Jan 2019 00:00:00 +0000

Some responses cannot be observed directly and must be inferred from multiple indirect measurements, for example, human experiences assessed through a variety of survey questions. Exploratory Factor Analysis (EFA) is a data-driven method to optimally combine these indirect measurements to infer some number of unobserved factors. Ideally, EFA should identify how many unobserved factors the indirect measures help estimate (factor extraction), as well as accurately capture how well each indirect measure estimates each factor (parameter recovery).
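
Factor extraction is often guided by the eigenvalues of the indicators' correlation matrix. The sketch below applies one common heuristic, the Kaiser criterion (retain as many factors as there are eigenvalues greater than 1), using a small Jacobi eigenvalue routine so it needs only the standard library. The criterion is our illustration rather than a method taken from the poster, and the six-indicator correlation matrix and function names are hypothetical.

```python
import math


def matmul(x, y):
    n, k, m = len(x), len(y), len(y[0])
    return [[sum(x[i][t] * y[t][j] for t in range(k)) for j in range(m)] for i in range(n)]


def jacobi_eigenvalues(matrix, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle chosen so the rotated matrix has a zero in position (p, q).
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                g = [[float(i == j) for j in range(n)] for i in range(n)]
                g[p][p] = g[q][q] = c
                g[p][q], g[q][p] = s, -s
                g_t = [list(col) for col in zip(*g)]
                a = matmul(g_t, matmul(a, g))  # similarity transform G' A G
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_factor_count(corr):
    """Kaiser criterion: retain as many factors as eigenvalues greater than 1."""
    return sum(1 for value in jacobi_eigenvalues(corr) if value > 1.0)


# Hypothetical correlations: two blocks of three indicators (r = 0.6 within a block).
corr = [[1.0 if i == j else (0.6 if i // 3 == j // 3 else 0.0) for j in range(6)]
        for i in range(6)]
```

For this block-structured matrix the eigenvalues are 2.2, 2.2, and four values of 0.4, so the criterion retains two factors. The Kaiser criterion is known to over-extract in some settings, so it is best treated as one heuristic among several.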

Suggested Citation

Krost, Kevin, Daniel J Porter, Stephanie T Lane, and Heather M Wojton. Impact of Conditions Which Affect Exploratory Factor Analysis. IDA Document NS D-10622. Alexandria, VA: Institute for Defense Analyses, 2019.

Poster:

Initial Validation of the Trust of Automated Systems Test (TOAST)

Tue, 01 Jan 2019 00:00:00 +0000

Trust is a key determinant of whether people rely on automated systems, both in the military and among the public. However, there is currently no standard for measuring trust in automated systems. In the present studies, we propose a scale to measure trust in automated systems, grounded in current research and theory on trust formation, which we refer to as the Trust in Automated Systems Test (TOAST). We evaluated both the reliability of the scale structure and its criterion validity using independent military-affiliated and civilian samples. In both studies, the TOAST exhibited a two-factor structure, with one factor measuring system understanding and the other measuring system performance, and factor scores significantly predicted scores on theoretically related constructs, demonstrating clear criterion validity. We discuss the implications of our findings for advancing the empirical literature and for improving interface design.
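
The abstract reports evaluating the reliability of the scale. A common internal-consistency statistic for multi-item scales is Cronbach's alpha; the sketch below computes it from item-level responses. Alpha is our choice of illustration, not necessarily the reliability analysis used in the studies, and the function name and example data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variance_sum = sum(pvariance(item) for item in zip(*responses))
    total_variance = pvariance([sum(person) for person in responses])
    # alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses from three people to a two-item subscale.
example = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(example)  # 2/3 for these data
```

For a two-factor instrument, alpha is usually computed separately for each subscale's items rather than for all items pooled.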

Suggested Citation

Wojton, Heather M., Daniel Porter, Stephanie T Lane, Chad Bieber, and Poornima Madhavan. “Initial Validation of the Trust of Automated Systems Test (TOAST).” The Journal of Social Psychology 160, no. 6 (November 1, 2020): 735–50. https://doi.org/10.1080/00224545.2020.1749020.

Paper:

Power Approximations for Reliability Test Designs

Mon, 01 Jan 2018 00:00:00 +0000

Reliability tests determine which factors drive system reliability. The reliability or failure-time data collected in these tests often follow distinctly non-normal distributions and include censored observations. The experimental design should accommodate the skewed nature of the response and allow for censored observations, which occur when systems under test do not fail within the allotted test time. To account for these design and analysis considerations, Monte Carlo simulations are frequently used to evaluate experimental design properties. Simulation provides accurate power calculations as a function of sample size, allowing researchers to determine adequate sample sizes at each level of the treatment. However, simulation may be inefficient for comparing multiple experiments of various sizes. In this document, we present a closed-form approach for calculating power, based on the noncentral chi-squared approximation to the distribution of the likelihood ratio statistic. The solution can be used to compare multiple designs and accommodate trade-space analyses between power, effect size, model formulation, sample size, censoring rates, and design type. To demonstrate the efficiency of our approach, we provide a comparison to estimates generated using Monte Carlo simulation.
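
The closed-form idea can be illustrated for a test with one degree of freedom. A noncentral chi-squared variable with 1 degree of freedom and noncentrality lambda has the distribution of (Z + sqrt(lambda))^2 for standard normal Z, so power reduces to normal tail probabilities. The sketch takes lambda as given; deriving it from the design, model, effect size, and censoring pattern is not reproduced here, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def lr_power_1df(noncentrality: float, alpha: float = 0.05) -> float:
    """Power of a 1-df likelihood ratio test under a noncentral chi-squared approximation.

    With 1 degree of freedom, the statistic behaves like (Z + sqrt(lam))**2,
    so P(statistic > critical value) is a sum of two normal tail areas.
    """
    nd = NormalDist()
    root_crit = nd.inv_cdf(1 - alpha / 2)  # square root of the chi-squared(1) critical value
    shift = math.sqrt(noncentrality)
    return nd.cdf(-root_crit - shift) + (1 - nd.cdf(root_crit - shift))
```

With lambda = (z_0.975 + z_0.80)^2, about 7.85, the function returns approximately 0.80, and lambda = 0 returns the significance level, 0.05. Because each evaluation is a couple of normal CDF calls, sweeping over many candidate designs is cheap compared with simulation.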

Suggested Citation

Johnson, Thomas H., Rebecca M. Medlin, and Laura Freeman. “Power Approximations for Failure-Time Regression Models.” Quality and Reliability Engineering International 35, no. 6 (2019): 1666–75. https://doi.org/10.1002/qre.2467.

Slides:

Paper:

Power Approximations for Generalized Linear Models using the Signal-to-Noise Transformation Method

Sun, 01 Jan 2017 00:00:00 +0000

Statistical power is a useful measure for assessing the adequacy of an experimental design prior to data collection. This paper proposes an approach, referred to as the signal-to-noise transformation method (SNRx), to approximate power for effects in a generalized linear model. The contribution of SNRx is that, under a couple of assumptions, it generates power approximations for generalized linear model effects using the F-tests typically used in ANOVA for classical linear models. Additionally, SNRx follows Oehlert and Whitcomb's unified approach for sizing an effect, which allows for intuitive effect size definitions and consistent estimates of power. This paper details the process for defining an effect size, constructing the coefficients for the test, and calculating power for the family of generalized linear models. The focus is on experimental designs that have multi-level categorical factors. A simulation study demonstrates that SNRx power approximations agree with simulated power.

Suggested Citation

Johnson, Thomas H., Laura Freeman, Jim Simpson, and Colin Anderson. “Power Approximations for Generalized Linear Models Using the Signal-to-Noise Transformation Method.” Quality Engineering 30, no. 3 (July 3, 2018): 511–24. https://doi.org/10.1080/08982112.2017.1361537.

Slides:

Prediction Uncertainty for Autocorrelated Lognormal Data with Random Effects

Sun, 01 Jan 2017 00:00:00 +0000

Accurately presenting model estimates with appropriate uncertainties is critical to the credibility and defensibility of any piece of statistical analysis. When dealing with complex data that require hierarchical covariance structures, many of the standard approaches for visualizing uncertainty are insufficient. One such case is data fit with log-linear autoregressive mixed effects models. Data requiring such an approach have three exceptional characteristics:

1. The data are sampled in “groups” that exhibit variation unexplained by other model factors.
2. The data are sampled over time and exhibit autocorrelation.
3. The data originate from a skewed distribution.

These data are addressed using a log-linear autoregressive mixed model (LLARMM), which accounts for each of these characteristics.
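
A minimal sketch of why these three characteristics matter when presenting uncertainty: with a random group effect (standard deviation sigma_g) and stationary AR(1) errors (innovation standard deviation sigma_e, coefficient phi), the log-scale variance of a new observation is sigma_g^2 + sigma_e^2/(1 - phi^2), and exponentiating gives a median exp(mu), a mean exp(mu + Var/2), and an asymmetric prediction interval. The sketch treats all parameters as known, so it omits parameter-estimation uncertainty; the function names and parameterization are assumptions, not the LLARMM specification.

```python
import math
from statistics import NormalDist


def log_scale_variance(sigma_group: float, sigma_innov: float, phi: float) -> float:
    """Marginal log-scale variance: random group effect plus stationary AR(1) errors."""
    if not -1 < phi < 1:
        raise ValueError("AR(1) coefficient must satisfy |phi| < 1 for stationarity")
    return sigma_group ** 2 + sigma_innov ** 2 / (1 - phi ** 2)


def original_scale_summary(mu, sigma_group, sigma_innov, phi, level=0.95):
    """Median, mean, and prediction interval after back-transforming from the log scale.

    All parameters are treated as known (no estimation uncertainty).
    """
    var = log_scale_variance(sigma_group, sigma_innov, phi)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    sd = math.sqrt(var)
    return {
        "median": math.exp(mu),
        "mean": math.exp(mu + var / 2),  # lognormal mean exceeds the median
        "interval": (math.exp(mu - z * sd), math.exp(mu + z * sd)),
    }


summary = original_scale_summary(mu=1.0, sigma_group=0.5, sigma_innov=1.0, phi=0.6)
```

Because the interval is computed on the log scale and then exponentiated, it is asymmetric around the median, and dropping either the group variance or the autocorrelation term would understate its width.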

Suggested Citation

Freeman, Laura J, and Matthew R Avery. Lognormal Data with Random Effects. IDA Document NS D-8629. Alexandria, VA: Institute for Defense Analyses, 2017.

Slides:

Regularization for Continuously Observed Ordinal Response Variables with Piecewise-Constant Functional Predictors

Fri, 01 Jan 2016 00:00:00 +0000

This paper investigates regularization for continuously observed covariates that resemble step functions. The motivating examples come from operational test data from a recent United States Department of Defense (DoD) test of the Shadow Unmanned Air Vehicle system. The response variable, quality of video provided by the Shadow to friendly ground units, was measured on an ordinal scale continuously over time. Functional covariates, altitude and distance, can be well approximated by step functions. Two approaches for regularizing these covariates are considered, including a thinning approach commonly used within the DoD to address autocorrelated time series data, and a novel “smoothing” approach, which first approximates the covariates as step functions and then treats each “step” as a uniquely observed data point. Data sets resulting from both approaches are fit using a mixed model cumulative logistic regression, and we compare their results. While the thinning approach identifies altitude as having a significant impact on video quality, the smoothing approach finds no evidence of an effect. This difference is attributable to the larger effective sample size produced by thinning. System characteristics make it unlikely that video quality would degrade at higher altitudes, suggesting that the thinning approach has produced a Type 1 error. By accounting for the functional characteristics of the covariates, the novel smoothing approach has produced a more accurate characterization of the Shadow’s ability to provide full motion video to supported units.
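
The two regularization approaches can be sketched for a single hypothetical covariate that is already a step function, such as altitude rounded to a reporting increment. Thinning keeps every k-th record; the step-based approach collapses each run of constant covariate values into one observation. How to summarize the ordinal response within a step is an assumption here (we keep the modal category and the step length), the records are invented, and the subsequent mixed model cumulative logistic regression is not shown.

```python
from collections import Counter
from itertools import groupby


def thin(records, k):
    """Thinning: keep every k-th time-ordered record."""
    return records[::k]


def collapse_steps(records):
    """Treat each run of constant covariate values as a single observation.

    records: time-ordered (covariate, ordinal_response) pairs.
    Returns (covariate, modal_response, step_length) for each step.
    """
    steps = []
    for covariate, group in groupby(records, key=lambda rec: rec[0]):
        responses = [response for _, response in group]
        modal = Counter(responses).most_common(1)[0][0]
        steps.append((covariate, modal, len(responses)))
    return steps


# Hypothetical (altitude, video-quality category) records in time order.
records = [(1000, 3), (1000, 3), (1000, 2), (1500, 2), (1500, 2), (1000, 4)]
```

Unlike thinning, the number of collapsed records depends on how often the covariate changes, not on a fixed interval k.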

Suggested Citation

Avery, Matthew, Mark Orndorff, Timothy Robinson, and Laura Freeman. “Regularization for Continuously Observed Ordinal Response Variables with Piecewise-Constant Functional Covariates.” Quality and Reliability Engineering International 32, no. 6 (2016): 2033–42. https://doi.org/10.1002/qre.2037.

Paper:

A Comparison of Ballistic Resistance Testing Techniques in the Department of Defense

Wed, 01 Jan 2014 00:00:00 +0000

This paper summarizes sensitivity test methods commonly employed in the Department of Defense. A comparison study shows that modern methods such as Neyer’s method and Three-Phase Optimal Design are improvements over historical methods.
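
As a point of reference for the historical methods, the sketch below runs an up-and-down (Bruceton-style) sequence, in which velocity steps down after a complete penetration and up after a partial penetration, concentrating shots near the V50 velocity. We assume up-and-down is representative of the historical methods compared in the paper; the model-based velocity selection used by Neyer's method and Three-Phase Optimal Design is not shown. The deterministic penetration rule in the example is hypothetical; real outcomes are random.

```python
def up_and_down(penetrates, start, step, shots):
    """Run an up-and-down sequence of shots.

    penetrates: callable returning True if a shot at the given velocity
    completely penetrates the target.
    """
    velocities, outcomes = [], []
    v = start
    for _ in range(shots):
        hit = penetrates(v)
        velocities.append(v)
        outcomes.append(hit)
        v = v - step if hit else v + step  # down after penetration, up otherwise
    return velocities, outcomes


def v50_estimate(velocities):
    """Simple V50 estimate: the mean of the tested velocities (one of several conventions)."""
    return sum(velocities) / len(velocities)


# Hypothetical deterministic target: penetration whenever velocity >= 1000.
velocities, outcomes = up_and_down(lambda v: v >= 1000, start=950, step=20, shots=10)
```

Here the sequence climbs for three shots and then oscillates between 990 and 1010, and the mean of the ten velocities is 992. The early shots spent far from V50 are one inefficiency that adaptive, model-based methods aim to reduce.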

Suggested Citation

Johnson, Thomas H., Laura Freeman, Janice Hester, and Jonathan L. Bell. “A Comparison of Ballistic Resistance Testing Techniques in the Department of Defense.” IEEE Access 2 (2014): 1442–55. https://doi.org/10.1109/ACCESS.2014.2377633.

Paper:

Comparing Computer Experiments for the Gaussian Process Model Using Integrated Prediction Variance

Tue, 01 Jan 2013 00:00:00 +0000

Space-filling designs are a common choice of experimental design strategy for computer experiments. This paper compares space-filling design types based on their theoretical prediction variance properties with respect to the Gaussian Process model.
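
Integrated prediction variance can be sketched in one dimension: for a Gaussian process with known constant mean, unit process variance, and Gaussian correlation exp(-theta (x - x')^2), the prediction variance at x is 1 - r(x)' R^{-1} r(x), which is then averaged over a grid on [0, 1]. The correlation parameter theta = 10, the grid, the small nugget, and the two 5-run designs are hypothetical choices, and simple kriging with known parameters is an assumption made for brevity.

```python
import math


def gauss_corr(x, y, theta):
    """Gaussian correlation function exp(-theta * (x - y)^2)."""
    return math.exp(-theta * (x - y) ** 2)


def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w


def prediction_variance(x0, design, theta=10.0, nugget=1e-10):
    """Simple-kriging variance at x0 with unit process variance and known parameters."""
    corr = [[gauss_corr(a, b, theta) + (nugget if i == j else 0.0)
             for j, b in enumerate(design)] for i, a in enumerate(design)]
    r = [gauss_corr(x0, d, theta) for d in design]
    w = solve(corr, r)
    return 1.0 - sum(ri * wi for ri, wi in zip(r, w))


def integrated_prediction_variance(design, theta=10.0, grid_size=101):
    """Average prediction variance over an evenly spaced grid on [0, 1]."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return sum(prediction_variance(g, design, theta) for g in grid) / grid_size


even = [0.0, 0.25, 0.5, 0.75, 1.0]
clustered = [0.0, 0.1, 0.2, 0.3, 0.4]
```

The evenly spaced design has a much lower integrated prediction variance than the design clustered in [0, 0.4], and the prediction variance is essentially zero at each design point, which reflects the intuition behind space-filling designs.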

Suggested Citation

Silvestrini, Rachel T., Douglas C. Montgomery, and Bradley Jones. “Comparing Computer Experiments for the Gaussian Process Model Using Integrated Prediction Variance.” Quality Engineering 25, no. 2 (April 2013): 164–74. https://doi.org/10.1080/08982112.2012.758284.

Paper:

Choice of Second-Order Response Surface Designs for Logistic and Poisson Regression Models

Thu, 01 Jan 2009 00:00:00 +0000

This paper illustrates the construction of D-optimal second-order designs for situations when the response is either binomial (pass/fail) or Poisson (count data).
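
For logistic and Poisson responses the information matrix X'WX depends on the unknown parameters through the weights W, so a D-optimal design is typically computed for assumed parameter values (a locally D-optimal design). The sketch evaluates det(X'WX) for a full second-order model in two factors, the quantity a D-optimal search would maximize; the search itself is not shown, and the assumed beta, the 3^2 candidate grid, and the function names are illustrative.

```python
import math
from itertools import product


def quadratic_row(x1, x2):
    """Terms of a full second-order model in two factors."""
    return [1.0, x1, x2, x1 * x2, x1 ** 2, x2 ** 2]


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def glm_information(points, beta, family):
    """Fisher information X'WX with GLM weights evaluated at the assumed beta.

    Weights are p(1 - p) for logistic regression and the mean mu for
    Poisson regression with a log link.
    """
    rows = [quadratic_row(*pt) for pt in points]
    weights = []
    for row in rows:
        eta = sum(b * x for b, x in zip(beta, row))
        if family == "logistic":
            p = 1 / (1 + math.exp(-eta))
            weights.append(p * (1 - p))
        elif family == "poisson":
            weights.append(math.exp(eta))
        else:
            raise ValueError(f"unknown family: {family}")
    k = len(rows[0])
    return [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(k)]
            for i in range(k)]


three_level_grid = list(product([-1, 0, 1], repeat=2))  # 3^2 candidate points
```

At beta = 0 the logistic weights all equal 1/4, so det(X'WX) is (1/4)^6 times det(X'X) = 5184 for the 3^2 grid; with nonzero beta the weights differ across points, which is why the optimal design depends on the assumed parameters.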

Suggested Citation

Johnson, Rachel T., and Douglas C. Montgomery. “Choice of Second-Order Response Surface Designs for Logistic and Poisson Regression Models.” International Journal of Experimental Design and Process Optimisation 1, no. 1 (2009): 2. https://doi.org/10.1504/IJEDPO.2009.028954.

Paper:

Designing Experiments for Nonlinear Models—an Introduction

Thu, 01 Jan 2009 00:00:00 +0000

We illustrate the construction of Bayesian D-optimal designs for nonlinear models and compare the relative efficiency of standard designs with these designs for several models and prior distributions on the parameters. Through a relative efficiency analysis, we show that standard designs can perform well in situations where the nonlinear model is intrinsically linear. However, if the model is nonlinear and its expectation function cannot be linearized by simple transformations, the nonlinear optimal design is considerably more efficient than the standard design.
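
Bayesian D-optimality replaces the unknown parameters with an average over a prior. The sketch scores a design for a one-factor logistic model, logit(p) = beta0 + beta1 * x, by the Monte Carlo average of log det M(design, beta) over prior draws, and reports the relative D-efficiency exp((phi_A - phi_B)/p) with p = 2 parameters. The logistic model, the prior, and the function names are illustrative assumptions; the paper considers several models and prior distributions.

```python
import math
import random


def logistic_information(xs, beta0, beta1):
    """Entries (m00, m01, m11) of the 2x2 Fisher information for logit(p) = beta0 + beta1 * x."""
    m00 = m01 = m11 = 0.0
    for x in xs:
        p = 1 / (1 + math.exp(-(beta0 + beta1 * x)))
        w = p * (1 - p)
        m00 += w
        m01 += w * x
        m11 += w * x * x
    return m00, m01, m11


def bayes_d_criterion(xs, prior_draws):
    """Monte Carlo estimate of E[log det M(xs, beta)] over prior draws of beta."""
    total = 0.0
    for b0, b1 in prior_draws:
        m00, m01, m11 = logistic_information(xs, b0, b1)
        total += math.log(m00 * m11 - m01 ** 2)
    return total / len(prior_draws)


def relative_d_efficiency(xs_a, xs_b, prior_draws, n_params=2):
    """Bayesian D-efficiency of design A relative to design B (same prior draws)."""
    diff = bayes_d_criterion(xs_a, prior_draws) - bayes_d_criterion(xs_b, prior_draws)
    return math.exp(diff / n_params)


rng = random.Random(2009)
# Hypothetical prior: intercept ~ Normal(0, 1), slope ~ Uniform(1, 3).
prior = [(rng.gauss(0, 1), rng.uniform(1, 3)) for _ in range(2000)]
```

Because duplicating every run doubles the information matrix, a duplicated design has relative efficiency exactly 2 against the original, which is a useful sanity check; meaningful comparisons use designs with the same number of runs evaluated on the same prior draws.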

Suggested Citation

Johnson, Rachel T., and Douglas C. Montgomery. “Designing Experiments for Nonlinear Models—an Introduction.” Quality and Reliability Engineering International 26, no. 5 (July 2010): 431–41. https://doi.org/10.1002/qre.1063.