Practitioner on Test Science Research Document Library

A Practitioner’s Framework for Federated Model Validation Resource Allocation

Mon, 01 Jan 2024 00:00:00 +0000

Recent advances in computation and statistics have led to an increasing use of federated models for end-to-end system test and evaluation. A federated model is a collection of interconnected models where the outputs of a model act as inputs to subsequent models. However, the process of verifying and validating federated models is poorly understood, especially when testers have limited resources, knowledge-based uncertainties, and concerns over operational realism. Testers often struggle to determine how best to allocate limited test resources for model validation. We propose a network-based representation of federated models, where the network encodes the connections between the models in the federation. The nodes of the graph are the sub-models. A directed edge from node a to node b is drawn if a inputs into b. We quantify the uncertainties through edge weights using meta-modeling and variance-based sensitivity analysis. The network-based framework allows us to propagate the uncertainties through the federated model and optimize resource allocation for validation based on the uncertainties.
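
The network representation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sub-model names, the edge weights, the stand-alone uncertainty scores, and the additive propagation rule are hypothetical and are not taken from the IDA framework.

```python
from collections import deque

# Hypothetical federation: each key feeds its outputs into the listed sub-models.
edges = {
    "sensor": ["tracker"],
    "tracker": ["fire_control"],
    "environment": ["tracker", "fire_control"],
    "fire_control": [],
}

# Hypothetical edge weights: share of upstream uncertainty passed downstream
# (for example, as might be estimated by a sensitivity analysis).
weights = {
    ("sensor", "tracker"): 0.6,
    ("environment", "tracker"): 0.2,
    ("tracker", "fire_control"): 0.5,
    ("environment", "fire_control"): 0.1,
}

# Hypothetical stand-alone uncertainty score for each sub-model.
own_uncertainty = {"sensor": 1.0, "tracker": 0.5, "environment": 2.0, "fire_control": 0.3}


def topological_order(edges):
    """Return nodes so that every model appears after all of its inputs (Kahn's algorithm)."""
    indegree = {node: 0 for node in edges}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1
    queue = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for t in edges[node]:
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    if len(order) != len(edges):
        raise ValueError("federation contains a cycle")
    return order


def propagate(edges, weights, own_uncertainty):
    """Accumulate uncertainty downstream: own score plus weighted upstream totals."""
    total = dict(own_uncertainty)
    for node in topological_order(edges):
        for t in edges[node]:
            total[t] += weights[(node, t)] * total[node]
    return total


totals = propagate(edges, weights, own_uncertainty)
```

Ranking the sub-models by their entries in `totals` is one simple way such a graph could inform where validation effort goes; the optimization step mentioned in the abstract is not attempted here.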

Suggested Citation

Capp, Jo Anna, John T Haman, and Dhruv Patel. A Practitioner’s Framework for Federated Model Validation Resource Allocation. IDA Product ID 3001838. Alexandria, VA: Institute for Defense Analyses, 2024.

Slides:

A Preview of Functional Data Analysis for Modeling and Simulation Validation

Mon, 01 Jan 2024 00:00:00 +0000

Modeling and simulation (M&S) validation for operational testing often involves comparing live data with simulation outputs. Statistical methods known as functional data analysis (FDA) provide techniques for analyzing large data sets (“large” meaning that a single trial has a lot of information associated with it), such as radar tracks. We preview how FDA methods could assist M&S validation by providing statistical tools for handling these large data sets. This may facilitate analyses that make use of more of the available data and thus allow for better detection of differences between M&S predictions and live test results. We demonstrate some fundamental FDA approaches with a notional example of live and simulated radar tracks of a bomber’s flight.

Suggested Citation

Medlin, Rebecca M, and Curtis G Miller. A Preview of Functional Data Analysis for Modeling and Simulation Validation. IDA Product ID 3001829. Alexandria, VA: Institute for Defense Analyses, 2024.

Slides:

A Reliability Assurance Test Planning and Analysis Tool

Mon, 01 Jan 2024 00:00:00 +0000

This presentation documents the work of IDA 2024 Summer Associate Emma Mitchell. The work presented details an R Shiny application developed to provide a user-friendly software tool for researchers to use in planning for and analyzing system reliability. Specifically, the presentation details how one can plan for a reliability test using Bayesian Reliability Assurance test methods. Such tests utilize supplementary data and information, including reliability models, prior test results, expert judgment, and knowledge of environmental conditions, to plan for reliability testing, which in turn can often help in reducing the required amount of testing. In the planning phase, the application enables researchers to use Bayesian methods to incorporate supplementary data when determining appropriate test lengths. In the analysis phase, the tool allows researchers to combine information through Bayesian methods, resulting in better uncertainty quantification than traditional methods.

Suggested Citation

Haman, John T, Rebecca M Medlin, Emma P Mitchell, Keyla Pagán-Rivera, and Dhruv K Patel. A Reliability Assurance Test Planning and Analysis Tool. IDA Product ID 3003359. Institute for Defense Analyses, 2024.

Slides:

Developing AI Trust- From Theory to Testing and the Myths in Between

Mon, 01 Jan 2024 00:00:00 +0000

This introductory work aims to provide members of the Test and Evaluation community with a clear understanding of trust and trustworthiness to support responsible and effective evaluation of AI systems. The paper provides a set of working definitions and works toward dispelling confusion and myths surrounding trust.

Suggested Citation

Razin, Yosef S., and Kristen Alexander. “Developing AI Trust: From Theory to Testing and the Myths in Between.” The ITEA Journal of Test and Evaluation 45, no. 1 (March 31, 2024). https://itea.org/journals/volume-45-1/developing-ai-trust-from-theory-to-testing-and-the-myths-in-between/.

Slides:

Paper:

Operational T&E of AI-Supported Data Integration, Fusion, and Analysis Systems

Mon, 01 Jan 2024 00:00:00 +0000

AI will play an important role in future military systems. However, large questions remain about how to test AI systems, especially in operational settings. Here, we discuss an approach for the operational test and evaluation (OT&E) of AI-supported data integration, fusion, and analysis systems. We highlight new challenges posed by AI-supported systems and we discuss new and existing OT&E methods for overcoming them. We demonstrate how to apply these OT&E methods via a notional test concept that focuses on evaluating an AI-supported data integration system in terms of its technical performance (how accurate is the AI output?) and human systems interaction (how does the AI affect users?).

Suggested Citation

Anderson, Breeana G, Adam M Miller, Logan K Ausman, John T Haman, Keyla Pagan-Rivera, Sarah A Shaffer, and Brian D Vickers. Data Integration, Fusion, and Analysis Systems. IDA Product ID 3001848. Alexandria, VA: Institute for Defense Analyses, 2024.

Slides:

Sequential Space-Filling Designs for Modeling & Simulation Analyses

Mon, 01 Jan 2024 00:00:00 +0000

Space-filling designs (SFDs) are a rigorous method for designing modeling and simulation (M&S) studies. However, they are hindered by their requirement to choose the final sample size prior to testing. Sequential designs are an alternative that can increase test efficiency by testing small amounts of data at a time. We have conducted a literature review of existing sequential space-filling designs and found the methods most applicable to the test and evaluation (T&E) community.

Suggested Citation

Haman, John T, and Anna Flowers. Sequential Space-Filling Designs for Modeling & Simulation Analyses. IDA Product ID 3003752. Alexandria, VA: Institute for Defense Analyses, 2024.

Slides:

Simulation Insights on Power Analysis with Binary Responses--from SNR Methods to 'skprJMP'

Mon, 01 Jan 2024 00:00:00 +0000

Logistic regression is a commonly used method in the test community for analyzing tests with probabilistic responses, yet calculating power for these tests has historically been challenging. This difficulty prompted the development, over the last decade, of methods based on signal-to-noise ratio (SNR) approximations, tailored to address the intricacies of logistic regression’s binary outcomes. However, improvements in statistical software and computational power have reduced the need for such approximate methods. Our research presents a detailed simulation study that compares SNR-based power estimates with those derived from exact Monte Carlo simulations, highlighting the inadequacies of SNR approximations. To address these shortcomings, we will discuss improvements in the open-source R package “skpr” as well as present “skprJMP,” a new plug-in that offers more accurate and reliable power calculations for logistic regression analyses.
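
The idea of estimating power by Monte Carlo simulation can be sketched for the simplest case: a logistic regression with a single two-level factor, where the slope estimate is the log odds ratio and has a closed-form Wald standard error. This is an illustrative sketch only; it does not reproduce skpr or skprJMP, and the function names, success probabilities, and sample sizes are hypothetical.

```python
import math
import random


def wald_rejects(successes_a, successes_b, n_per_group, z_crit=1.96):
    """Wald test of the slope in a logistic regression with one two-level factor.

    For this saturated model the slope estimate is the log odds ratio and its
    standard error is the square root of the sum of reciprocal cell counts.
    """
    cells = [successes_a, n_per_group - successes_a,
             successes_b, n_per_group - successes_b]
    if min(cells) == 0:
        # The maximum likelihood estimate does not exist (separation);
        # this sketch counts such trials as non-rejections.
        return False
    a, b, c, d = cells
    log_odds_ratio = math.log((c * b) / (d * a))
    std_error = math.sqrt(sum(1.0 / x for x in cells))
    return abs(log_odds_ratio / std_error) > z_crit


def simulated_power(p_a, p_b, n_per_group, n_sims=2000, seed=1):
    """Fraction of simulated tests in which the Wald test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        s_a = sum(rng.random() < p_a for _ in range(n_per_group))
        s_b = sum(rng.random() < p_b for _ in range(n_per_group))
        rejections += wald_rejects(s_a, s_b, n_per_group)
    return rejections / n_sims


power = simulated_power(p_a=0.5, p_b=0.9, n_per_group=40)
```

Calling `simulated_power` with equal probabilities for both levels gives a rough check on the false-positive rate of the same test.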

Suggested Citation

Atkins, Robert, Tyler Morgan-Wall, and Curtis Miller. “Simulation Insights on Power Analysis with Binary Responses–From SNR Methods to ‘skprJMP.’” Institute for Defense Analyses IDA Product ID 3002093 (April 2024).

Slides:

Paper:

Poster:

Statistical Advantages of Validated Surveys over Custom Surveys

Mon, 01 Jan 2024 00:00:00 +0000

Surveys play an important role in quantifying user opinion during test and evaluation (T&E). Current best practice is to use surveys that have been tested, or “validated,” to ensure that they produce reliable and accurate results. However, unvalidated (“custom”) surveys are still widely used in T&E, raising questions about how to determine sample sizes for, and interpret data from, T&E events that rely on custom surveys. In this presentation, I characterize the statistical properties of validated and custom survey responses using data from recent T&E events, and then I demonstrate how these properties affect test design, analysis, and interpretation. I show that validated surveys reduce the number of subjects required to estimate statistical parameters or to detect a mean difference between two populations. Additionally, I simulate the survey process to demonstrate how poorly designed custom surveys introduce unintended changes to the data, increasing the risk of drawing false conclusions.

Suggested Citation

Bell, Jonathan L, and Adam M Miller. Statistical Advantages of Validated Surveys over Custom Surveys. IDA Product ID 3001858. Alexandria, VA: Institute for Defense Analyses, 2024.

Poster:

Uncertainty Quantification for Ground Vehicle Vulnerability Simulation

Mon, 01 Jan 2024 00:00:00 +0000

A vulnerability assessment of a combat vehicle uses modeling and simulation (M&S) to predict the vehicle’s vulnerability to a given enemy attack. The system-level output of the M&S is the probability that the vehicle’s mobility is degraded as a result of the attack. The M&S models this system-level phenomenon by decoupling the attack scenario into a hierarchy of sub-systems. Each sub-system addresses a specific scientific problem, such as the fracture dynamics of an exploded munition, or the ballistic resistance provided by the vehicle’s armor. For each sub-system in the hierarchy, laboratory testing is conducted to gather data to fit a subsystem-level model. The M&S hierarchically interconnects the subsystem-level models to enable prediction of the system-level output. As part of the DoD’s ongoing effort to improve M&S using verification, validation, and uncertainty quantification, we present a case study that propagates the uncertainties in the hierarchy of sub-models to the system-level output.

Suggested Citation

Johnson, Thomas H., Dhruv K. Patel, John T. Haman, Jeremy S. Werner, and Dave Higdon. “Uncertainty Quantification for Ground Vehicle Vulnerability Simulation.” Quality Engineering, August 19, 2024. https://www.tandfonline.com/doi/abs/10.1080/08982112.2024.2394437.

Slides:

Paper:

A Team-Centric Metric Framework for Testing and Evaluation of Human-Machine Teams

Sun, 01 Jan 2023 00:00:00 +0000

We propose and present a parallelized metric framework for evaluating human-machine teams that draws upon current knowledge of human-systems interfacing and integration but is rooted in team-centric concepts. Humans and machines working together as a team involves interactions that will only increase in complexity as machines become more intelligent, capable teammates. Assessing such teams will require explicit focus on not just the human-machine interfacing but the full spectrum of interactions between and among agents. As opposed to focusing on isolated qualities, capabilities, and performance contributions of individual team members, the proposed framework emphasizes the collective team as the fundamental unit of analysis and the interactions of the team as the key evaluation targets, with individual human and machine metrics still vital but secondary. With teammate interaction as the organizing diagnostic concept, the resulting framework arrives at a parallel assessment of the humans and machines, analyzing their individual capabilities less with respect to purely human or machine qualities and more through the prism of contributions to the team as a whole. This treatment reflects the increased machine capabilities and will allow for continued relevance as machines develop to exercise more authority and responsibility. This framework allows for identification of features specific to human-machine teaming that influence team performance and efficiency, and it provides a basis for operationalizing in specific scenarios. Potential applications of this research include test and evaluation of complex systems that rely on human-system interaction, including—though not limited to—autonomous vehicles, command and control systems, and pilot control systems.

Suggested Citation

Wilkins, Jay, David A. Sparrow, Caitlan A. Fealing, Brian D. Vickers, Kristina A. Ferguson, and Heather Wojton. “A Team-Centric Metric Framework for Testing and Evaluation of Human-Machine Teams.” Systems Engineering 27, no. 3 (May 1, 2024): 466–84. https://doi.org/10.1002/sys.21730.

Slides:

Paper:

Development of Wald-Type and Score-Type Statistical Tests to Compare Live Test Data and Simulation Predictions

Sun, 01 Jan 2023 00:00:00 +0000

This work describes the development of a statistical test created in support of ongoing verification, validation, and accreditation (VV&A) efforts for modeling and simulation (M&S) environments. The test computes a Wald-type statistic comparing two generalized linear models estimated from live test data and analogous simulated data. The resulting statistic indicates whether the M&S outputs differ from the live data. After developing the test, we applied it to two logistic regression models estimated from live torpedo test data and simulated data from the Naval Undersea Warfare Center’s Environment Centric Weapons Analysis Facility (ECWAF). We developed this test to handle a specific problem with our data: one weapon variant was seen in the in-water test data, but the ECWAF data had two weapon variants. We overcame this deficiency by adjusting the Wald statistic, combining linear model coefficients with the intercept term when a factor is varied in one sample but not the other. A similar approach could be applied with score-type tests, which we also describe.

Suggested Citation

Metts, Carrington, and Curtis Miller. “Development of Wald-Type and Score-Type Statistical Tests to Compare Live Test Data and Simulation Predictions.” The ITEA Journal of Test and Evaluation 44, no. 3 (August 25, 2023). https://itea.org/journals/volume-44-3/development-of-wald-type-and-score-type-statistical-tests-to-compare-live-test-data-and-simulation-predictions/.

Slides:

Paper:

Implementing Fast Flexible Space-Filling Designs in R

Sun, 01 Jan 2023 00:00:00 +0000

Modeling and simulation (M&S) can be a useful tool when testers and evaluators need to augment the data collected during a test event. When planning M&S, testers use experimental design techniques to determine how much and which types of data to collect, and they can use space-filling designs to spread out test points across the operational space. Fast flexible space-filling designs (FFSFDs) are a type of space-filling design useful for M&S because they work well in design spaces with disallowed combinations and permit the inclusion of categorical factors. IDA analysts developed a function to create FFSFDs using the free statistical software R. To our knowledge, there are no R packages for creating an FFSFD that can accommodate a variety of user inputs, such as categorical factors. Moreover, users of IDA’s function can share their code to make their work reproducible.
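
To illustrate what spreading points over a design space with disallowed combinations can look like, here is a minimal sketch. It is not IDA's R function and does not implement the FFSFD algorithm; it swaps in a simple greedy maximin selection over random candidate points, and the constraint, factor interpretation, and function names are hypothetical. Categorical factors are not handled.

```python
import math
import random


def is_allowed(point):
    """Hypothetical constraint: combinations with x + y above 1.5 are disallowed."""
    x, y = point
    return x + y <= 1.5


def greedy_maximin_design(n_points, n_candidates=2000, seed=0):
    """Pick design points one at a time, each as far as possible from those already chosen."""
    rng = random.Random(seed)
    candidates = [(rng.random(), rng.random()) for _ in range(n_candidates)]
    # Discard candidates that fall in the disallowed region.
    candidates = [p for p in candidates if is_allowed(p)]
    design = [candidates.pop(0)]
    while len(design) < n_points:
        # Score each candidate by the distance to its nearest selected point.
        best = max(candidates, key=lambda p: min(math.dist(p, q) for q in design))
        candidates.remove(best)
        design.append(best)
    return design


design = greedy_maximin_design(n_points=10)
```

Because the random seed is fixed, rerunning the sketch reproduces the same design, which mirrors the reproducibility point made in the abstract.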

Suggested Citation

Medlin, Rebecca M, and Christopher T Dimapasok. Space-Filling Designs in R. IDA Document NS 3000045. Alexandria, VA: Institute for Defense Analyses, 2023.

Slides:

Improving Test Efficiency- A Bayesian Assurance Case Study

Sun, 01 Jan 2023 00:00:00 +0000

To improve test planning for evaluating system reliability, we propose the use of Bayesian methods to incorporate supplementary data and reduce testing duration. Furthermore, we recommend that Bayesian methods be employed in the analysis phase to better quantify uncertainty. We find that using Bayesian methods for test planning allows us to scope smaller tests, and that using Bayesian methods in analysis yields a more precise estimate of reliability, improving uncertainty quantification.

Suggested Citation

Medlin, Rebecca M. A Bayesian Assurance Case Study. IDA Document NS 3000024. Alexandria, VA: Institute for Defense Analyses, 2023.

Slides:

Introduction to Design of Experiments in R- Generating and Evaluating Designs with Skpr

Sun, 01 Jan 2023 00:00:00 +0000

This workshop instructs attendees on how to run an end-to-end optimal Design of Experiments workflow in R using the open source skpr package. The workshop is split into two sections: optimal design generation and design evaluation. The first half provides basic instructions on how to use R, as well as how to use skpr to create an optimal design for an experiment: how to specify a model, create a candidate set of potential runs, remove disallowed combinations, and specify the design generation conditions to best suit an experimenter’s goals. The second half covers design evaluation with skpr: how to determine if an experimental design is adequate for the test at hand. The workshop provides information on how to perform power calculations and evaluate other design properties that affect design quality. This also includes instruction on how to generate fraction of design space plots and correlation plots.

Suggested Citation

Morgan-Wall, Tyler T. Introduction to Design of Experiments in R: Generating and Evaluating Designs with Skpr. IDA Document NS D-33397. Alexandria, VA: Institute for Defense Analyses, 2023.

Slides:

Introduction to Measuring Situational Awareness in Mission-Based Testing Scenarios

Sun, 01 Jan 2023 00:00:00 +0000

Situation Awareness (SA) plays a key role in decision making and human performance; higher operator SA is associated with increased operator performance and decreased operator errors. While maintaining or improving “situational awareness” is a common requirement for systems under test, there is no single standardized method or metric for quantifying SA in operational testing (OT). This leads to varied and sometimes suboptimal treatments of SA measurement across programs and test events. This paper introduces Endsley’s three-level model of SA in dynamic decision making, a frequently used model of individual SA, reviews trade-offs in some existing measures of SA, and discusses a selection of potential ways in which SA measurement during OT may be improved.

Suggested Citation

Green, Elizabeth, Miriam Armstrong, and Janna Mantua. “Scientific Measurement of Situation Awareness in Operational Testing.” The ITEA Journal of Test and Evaluation 44, no. 3 (October 2, 2023). https://doi.org/10.61278/itea.44.3.1002.

Slides:

Paper:

Metamodeling Techniques for Verification and Validation of Modeling and Simulation Data

Sat, 01 Jan 2022 00:00:00 +0000

Modeling and simulation (M&S) outputs help the Director, Operational Test and Evaluation (DOT&E) assess the effectiveness, survivability, lethality, and suitability of systems. To use M&S outputs, DOT&E needs models and simulators to be sufficiently verified and validated. The purpose of this paper is to improve the state of verification and validation by recommending and demonstrating a set of statistical techniques—metamodels, also called statistical emulators—to the M&S community.

The paper expands on DOT&E’s existing guidance about metamodel usage by creating methodological recommendations the M&S community could apply to its activities. For a deterministic, discrete response variable, we recommend using a nearest neighbor or decision tree model. For a deterministic, continuous response variable, we recommend Gaussian process interpolation. For a stochastic response variable, we recommend a generalized additive model. We also present a set of techniques that testers can use to assess the adequacy of their metamodels. We conclude with a notional example that demonstrates the recommended techniques.
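
As a small illustration of the first recommendation (a nearest neighbor metamodel for a deterministic, discrete response), the sketch below predicts the outcome at an untested input by copying the outcome of the closest simulated run. The factor names, run values, and outcomes are hypothetical, and the inputs are left unscaled for brevity; in practice they would usually be put on a common scale before distances are computed.

```python
import math


def nearest_neighbor_metamodel(train_inputs, train_outputs):
    """Return a predictor that copies the output of the closest simulated run."""
    def predict(x):
        distances = [math.dist(x, xi) for xi in train_inputs]
        return train_outputs[distances.index(min(distances))]
    return predict


# Hypothetical simulation runs: (range_km, speed_mach) -> discrete outcome.
runs = [(10.0, 0.5), (10.0, 1.5), (40.0, 0.5), (40.0, 1.5)]
outcomes = ["detect", "detect", "detect", "miss"]

predict = nearest_neighbor_metamodel(runs, outcomes)
prediction = predict((38.0, 1.4))  # closest run is (40.0, 1.5)
```

Because the response is deterministic, this kind of metamodel reproduces every simulated run exactly, which is one reason an interpolating method is a natural fit for that case.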

Suggested Citation

Haman, John T, and Curtis G Miller. Metamodeling Techniques for Verification and Validation of Modeling and Simulation Data. IDA Paper P-33230. Alexandria, VA: Institute for Defense Analyses, 2022.

Slides:

Paper:

Predicting Trust in Automated Systems - An Application of TOAST

Sat, 01 Jan 2022 00:00:00 +0000

Following Wojton’s research on the Trust of Automated Systems Test (TOAST), which is designed to measure how much a human trusts an automated system, we aimed to determine how well this scale performs when not used in a military context. We found that participants who used a poorly performing automated system trusted the system less than expected when using that system on a case-by-case basis; however, those who used a high-performing system trusted the system as much as they had expected. Additionally, both participants who used the poorly performing system and those who used the high-performing system lost a significant amount of trust after using the system on a group case basis. These results indicate that having a high-performance system is important for trust, but only when the user has the ability to decide to trust or distrust the system on a case-by-case basis.

Suggested Citation

Porter, Daniel J, and Caitlan A Fealing. Predicting Trust in Automated Systems – An Application of TOAST. IDA Document NS D-33188. Alexandria, VA: Institute for Defense Analyses, 2022.

Slides:

Paper:

Thoughts on Applying Design of Experiments (DOE) to Cyber Testing

Sat, 01 Jan 2022 00:00:00 +0000

This briefing, presented at Dataworks 2022, provides examples of ways in which Design of Experiments (DOE) could be applied to initially scope cyber assessments and, based on the results of those assessments, subsequently design cyber tests in greater detail.

Suggested Citation

Gilmore, James M, Kelly M Avery, Matthew R Girardi, and Rebecca M Medlin. Thoughts on Applying Design of Experiments (DOE) to Cyber Testing. IDA Document NS D-33023. Alexandria, VA: Institute for Defense Analyses, 2022.

Slides:

Topological Modeling of Human-Machine Teams

Sat, 01 Jan 2022 00:00:00 +0000

A Human-Machine Team (HMT) is a group of agents consisting of at least one human and at least one machine, all functioning collaboratively towards one or more common objectives. As industry and defense find more helpful, creative, and difficult applications of AI-driven technology, the need to effectively and accurately model, simulate, test, and evaluate HMTs will continue to grow. Alongside that growing need, new methods are required to evaluate whether a human-machine team is performing effectively as a team in testing and evaluation scenarios. Team performance cannot be predicted from knowledge of the individual team agents alone; interaction between the humans and machines, and interaction between team agents in general, increases the problem space and adds a measure of unpredictability. Collective team or group performance, in turn, depends heavily on how a team is structured and organized, as well as the mechanisms, paths, and substructures through which the agents in the team interact with one another (i.e., the team’s topology). With the tools and metrics for measuring team structure and interaction becoming more highly developed in recent years, we will propose and discuss a practical, topological HMT modeling framework that not only takes into account but is actually built around the team’s topological characteristics, while still utilizing the individual human and machine performance measures.

Suggested Citation

Wilkins, Leonard D, Caitlan A Fealing, V. Bram Lillard, and John Haman. Topological Modeling of Human-Machine Teams. IDA Document NS D-33031. Alexandria, VA: Institute for Defense Analyses, 2022.

Slides:

Introduction to Bayesian Analysis

Fri, 01 Jan 2021 00:00:00 +0000

As operational testing becomes increasingly integrated and research questions become more difficult to answer, IDA’s Test Science team has found Bayesian models to be powerful data analysis methods. Analysts and decision-makers should understand the differences between this approach and the conventional way of analyzing data. It is also important to recognize when an analysis could benefit from the inclusion of prior information—what we already know about a system’s performance—and to understand the proper way to incorporate that information. To apply Bayesian methods, analysts need to comprehend some technical aspects of this approach and know how to properly use appropriate statistical software. In this course, students learn the intuition behind Bayesian statistics, the mathematical details of posterior distributions, how to fit simple Bayesian models using computer software, and how to assess model fit.

Suggested Citation

Wojton, Heather M, Keyla Pagan-Rivera, John T Haman, and Rebecca M Medlin. Introduction to Bayesian Analysis. IDA Document NS D-20484. Alexandria, VA: Institute for Defense Analyses, 2021.

Slides:

Space-Filling Designs for Modeling & Simulation

Fri, 01 Jan 2021 00:00:00 +0000

This document presents arguments and methods for using space-filling designs (SFDs) to plan modeling and simulation (M&S) data collection.

Suggested Citation

Avery, Kelly, John T Haman, Thomas Johnson, Curtis Miller, Dhruv Patel, and Han Yi. Test Design Challenges in Defense Testing. IDA Product ID 3002855. Alexandria, VA: Institute for Defense Analyses, 2024.

Slides:

Paper:

Warhead Arena Analysis Advancements

Fri, 01 Jan 2021 00:00:00 +0000

Fragmentation analysis is a critical piece of the live fire test and evaluation (LFT&E) of the lethality and vulnerability aspects of warheads. But the traditional methods for data collection are expensive and laborious. New optical tracking technology is promising to increase the fidelity of fragmentation data, and decrease the time and costs associated with data collection. However, the new data will be complex, three-dimensional “fragmentation clouds,” possibly with a time component as well, and there will be a larger number of individual data points. This raises questions about how testers can effectively summarize spatial data and use it to draw conclusions about warhead performance for sponsors. In this briefing, we will discuss Bayesian spatial models that are effective for characterizing the mass and velocity fragmentation distributions, along with several exploratory data analysis techniques that help us make sense of the data. Our goals are to:

Produce simple statistics and visuals that help the live fire analyst compare and contrast warhead fragmentations.
Characterize important performance attributes or confirm design/spec compliance.
Provide data methods that ensure higher fidelity data collection translates to higher fidelity modeling and simulation down the line.

Suggested Citation

Couch, Mark, Thomas Johnson, John Haman, Kerry Walzl, Heather Wojton, Thomas Hatch-Aguilar, and David Higdon. Warhead Arena Analysis Advancements. IDA Document NS-D-11038. Alexandria, VA: Institute for Defense Analyses, 2021.

Slides:

A Review of Sequential Analysis

Wed, 01 Jan 2020 00:00:00 +0000

Sequential analysis concerns statistical evaluation in situations in which the number, pattern, or composition of the data is not determined at the start of the investigation, but instead depends upon the information acquired throughout the course of the investigation. Expanding the use of sequential analysis has the potential to save resources and reduce test time (National Research Council, 1998). This paper summarizes the literature on sequential analysis and offers fundamental information for providing recommendations for its use in DoD test and evaluation.

Suggested Citation

Wojton, Heather, Rebecca Medlin, John Dennis, Keyla Pagan-Rivera, and Leonard Wilkins. A Review of Sequential Analysis. IDA Document NS D-20487. Alexandria, VA: Institute for Defense Analyses, 2020.

Paper:

Circular Prediction Regions for Miss Distance Models under Heteroskedasticity

Wed, 01 Jan 2020 00:00:00 +0000

Circular prediction regions are used in ballistic testing to express the uncertainty in shot accuracy. We compare two modeling approaches for estimating circular prediction regions for the miss distance of a ballistic projectile. The miss distance response variable is bivariate normal and has a mean and variance that can change with one or more experimental factors. The first approach fits a heteroskedastic linear model using restricted maximum likelihood, and uses the Kenward-Roger statistic to estimate circular prediction regions. The second approach fits the analogous Bayesian model with unrestricted likelihood modifications, and computes circular prediction regions by sampling from the posterior predictive distribution. The two approaches are applied to an example problem, and are compared using simulation.
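
The sampling-based construction in the second approach can be sketched as follows: given draws of the two-dimensional miss distance, take the radius that contains the desired fraction of them. This is a simplified illustration under stated assumptions: the circle is centered on the aim point at (0, 0), the draws come from a fixed bivariate normal with hypothetical parameters rather than a fitted posterior predictive distribution, and the function name is invented for this example.

```python
import math
import random


def circular_region_radius(draws, coverage=0.95):
    """Radius of the circle about the aim point (0, 0) containing `coverage` of the draws."""
    radii = sorted(math.hypot(x, y) for x, y in draws)
    index = math.ceil(coverage * len(radii)) - 1
    return radii[index]


rng = random.Random(42)
# Stand-in for posterior predictive draws: a fixed bivariate normal with
# hypothetical mean and standard deviations (a full analysis would also
# draw these parameters from their posterior distribution).
draws = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(20000)]
radius = circular_region_radius(draws)
```

For this standard-normal stand-in the miss radius follows a Rayleigh distribution, so the estimate should land close to sqrt(-2 ln 0.05), roughly 2.45, which gives a quick sanity check on the sampling code.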

Suggested Citation

Johnson, Thomas H., John T. Haman, Heather Wojton, and Laura Freeman. “Circular Prediction Regions for Miss Distance Models under Heteroskedasticity.” Quality and Reliability Engineering International 37, no. 7 (November 2021): 2991–3003. https://doi.org/10.1002/qre.2771.

Slides:

Paper:

Poster:

Bayesian Component Reliability- An F-35 Case Study

Tue, 01 Jan 2019 00:00:00 +0000

A challenging aspect of a system reliability assessment is integrating multiple sources of information, such as component, subsystem, and full-system data, along with previous test data or subject matter expert (SME) opinion. A powerful feature of Bayesian analyses is the ability to combine these multiple sources of data and variability in an informed way to perform statistical inference. This feature is particularly valuable in assessing system reliability where testing is limited and only a small number of failures (or none at all) are observed.

The F-35 is DoD’s largest program; approximately one-third of the operations and sustainment cost is attributed to the cost of spare parts and the removal, replacement, and repair of components. The failure rate of those components is the driving parameter for a significant portion of the sustainment cost, and yet for many of these components, available estimates of the failure rate are poor. For many programs, the contractor produces estimates of component failure rates based on engineering analysis and legacy systems with similar parts. While these estimates are useful, the actual removal rates provide a more accurate estimate of the removal and replacement rates the program will experience in future years.

In this document, we show how we applied a Bayesian analysis to combine the engineering reliability estimates with the actual failure data to estimate component reliability. Our analysis technique also allows us to overcome the problems of cases where few or no failures have been observed. We are able to show that combining the engineering knowledge of reliability with the observed operational reliability results in both a more informed estimate of each individual component’s reliability and a more informed estimate of overall F-35 maintenance costs.

The technique presented is broadly applicable to any program where multiple sources of reliability information need to be combined for the best estimation of component failure rates, and ultimately of sustainment costs.
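
One simple way to see how an engineering estimate and observed failures can be combined, including when no failures are observed, is the conjugate gamma-Poisson update sketched below. This is an assumption made for illustration and is not necessarily the model used in the document; the function name, the prior weight expressed as pseudo-hours, and all numbers are hypothetical.

```python
def update_failure_rate(prior_rate, prior_hours, failures, exposure_hours):
    """Conjugate gamma-Poisson update for a constant failure rate.

    The prior is Gamma(shape=prior_rate * prior_hours, rate=prior_hours): it is
    centered on the engineering estimate and worth `prior_hours` of pseudo-data.
    Returns the posterior mean and standard deviation of the failure rate.
    """
    shape = prior_rate * prior_hours + failures
    rate = prior_hours + exposure_hours
    mean = shape / rate
    std_dev = shape ** 0.5 / rate
    return mean, std_dev


# Hypothetical component: engineering estimate of 0.002 failures per flight hour,
# weighted as if it were worth 500 hours of data.
no_failures = update_failure_rate(0.002, 500.0, failures=0, exposure_hours=1500.0)
six_failures = update_failure_rate(0.002, 500.0, failures=6, exposure_hours=1500.0)
```

With zero observed failures the posterior mean stays positive (here it drops from the engineering estimate toward a lower rate) instead of collapsing to zero, which is the behavior that makes this kind of update useful when few or no failures have been seen.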

Suggested Citation

Medlin, Rebecca M, and V. Bram Lillard. Bayesian Component Reliability Estimation: An F-35 Case Study. IDA Document NS D-10561. Alexandria, VA: Institute for Defense Analyses, 2019.

Slides:

Challenges and New Methods for Designing Reliability Experiments

Tue, 01 Jan 2019 00:00:00 +0000

Engineers use reliability experiments to determine the factors that drive product reliability, build robust products, and predict reliability under use conditions. This article uses recent testing of a Howitzer to illustrate the challenges in designing reliability experiments for complex, repairable systems. We leverage lessons learned from current research and propose methods for designing an experiment for a complex, repairable system.

Suggested Citation

Freeman, Laura J., Rebecca M. Medlin, and Thomas H. Johnson. “Challenges and New Methods for Designing Reliability Experiments.” Quality Engineering 31, no. 1 (January 2, 2019): 108–21. https://doi.org/10.1080/08982112.2018.1546394.

Paper:

Handbook on Statistical Design & Analysis Techniques for Modeling & Simulation Validation

Tue, 01 Jan 2019 00:00:00 +0000

This handbook focuses on methods for data-driven validation to supplement the vast existing literature for Verification, Validation, and Accreditation (VV&A) and the emerging references on uncertainty quantification (UQ). The goal of this handbook is to aid the test and evaluation (T&E) community in developing test strategies that support model validation (both external validation and parametric analysis) and statistical UQ.

Suggested Citation

Wojton, Heather, Kelly M Avery, Laura J Freeman, Samuel H Parry, Gregory S Whittier, Thomas H Johnson, and Andrew C Flack. Handbook on Statistical Design & Analysis Techniques for Modeling & Simulation Validation. IDA Document NS D-10455. Alexandria, VA: Institute for Defense Analyses, 2019.

Slides:

Paper:

M&S Validation for the Joint Air-to-Ground Missile

Tue, 01 Jan 2019 00:00:00 +0000

An operational test is resource-limited and must therefore rely on both live test data and modeling and simulation (M&S) data to inform a full evaluation. For the Joint Air-to-Ground Missile (JAGM) system, we needed to create a test design that accomplished dual goals: characterizing missile performance across the operational space and supporting rigorous validation of the M&S. Our key question is: which statistical techniques should be used to compare the M&S to the live data?

Suggested Citation

Crabtree, Brent, Andrew Cseko, Joel Williamson, and Kelly Avery. M&S Validation for the Joint Air-to-Ground Missile. Alexandria, VA: Institute for Defense Analyses, 2019.

Poster:

Operational Testing of Systems with Autonomy

Tue, 01 Jan 2019 00:00:00 +0000

Systems with autonomy pose unique challenges for operational test. This document provides an executive level overview of these issues and the proposed solutions and reforms. In order to be ready for the testing challenges of the next century, we will need to change the entire acquisition life cycle, starting even from initial system conceptualization. This briefing was presented to the Director, Operational Test & Evaluation along with his deputies and Chief Scientist.

Suggested Citation

Wojton, Heather M, Daniel Porter, Yevgeniya Pinelis, Chad Bieber, Heather Wojton, Michael McAnally, and Laura Freeman. Operational Testing of Systems with Autonomy. IDA Document NS D-9266. Alexandria, VA: Institute for Defense Analyses, 2019.

Slides:

Sample Size Determination Methods Using Acceptance Sampling by Variables

Tue, 01 Jan 2019 00:00:00 +0000

Acceptance Sampling by Variables (ASbV) is a statistical testing technique used in Personal Protective Equipment programs to determine the quality of the equipment in First Article and Lot Acceptance Tests. This article intends to remedy the lack of existing references that discuss the similarities between ASbV and certain techniques used in different sub-disciplines within statistics. Understanding ASbV from a statistical perspective allows testers to create customized test plans, beyond what is available in MIL-STD-414.

Suggested Citation

Walzl, Kerry, Lindsey A Davis, Thomas H Johnson, and Heather M Wojton. Sample Size Determination Methods Using Acceptance Sampling by Variables. IDA Document NS D-10666. Alexandria, VA: Institute for Defense Analyses, 2019.

Paper:

The Effect of Extremes in Small Sample Size on Simple Mixed Models- A Comparison of Level-1 and Level-2 Size

Tue, 01 Jan 2019 00:00:00 +0000

We present a simulation study that examines the impact of small sample sizes in both observation and nesting levels of the model on the fixed effect bias, type I error, and the power of a simple mixed model analysis. Despite the need for adjustments to control for type I error inflation, our findings indicate that smaller samples than previously recognized can be used for mixed models under certain conditions prevalent in applied research.

Suggested Citation

Carter, Kristina A, Heather M Wojton, and Stephanie T Lane. “The Effect of Extremes in Small Sample Size on Simple Mixed Models: A Comparison of Level-1 and Level-2 Size.” The ITEA Journal of Test and Evaluation 40, no. 1 (2019): 16–29.

Paper:

The Purpose of Mixed-Effects Models in Test and Evaluation

Tue, 01 Jan 2019 00:00:00 +0000

Mixed-effects models are the standard technique for analyzing data with grouping structure. In defense testing, these models are useful because they allow us to account for correlations between observations, a feature common in many operational tests. In this article, we describe the advantages of modeling data from a mixed-effects perspective and discuss an R package—ciTools—that equips the user with easy methods for presenting results from this type of model.

Suggested Citation

Haman, John, Matthew Avery, and Heather Wojton. “The Purpose of Mixed-Effects Models in Test and Evaluation.” The ITEA Journal of Test and Evaluation 40, no. 4 (2019): 249–55.

Slides:

Paper:

Analysis of Split-Plot Reliability Experiments with Subsampling

Mon, 01 Jan 2018 00:00:00 +0000

Reliability experiments are important for determining which factors drive product reliability. The data collected in these experiments can be challenging to analyze. Often, the reliability or lifetime data collected follow distinctly nonnormal distributions and include censored observations. Additional challenges in the analysis arise when the experiment is executed with restrictions on randomization. The focus of this paper is on the proper analysis of reliability data collected from nonrandomized reliability experiments. Specifically, we focus on the analysis of lifetime data from a split-plot experimental design. We outline a nonlinear mixed-model analysis for a split-plot reliability experiment with subsampling and right-censored Weibull distributed lifetime data. A simulation study compares the proposed method with a two-stage method of analysis.

Suggested Citation

Medlin, Rebecca M., Laura J. Freeman, Jennifer L.K. Kensler, and G. Geoffrey Vining. “Analysis of Split-Plot Reliability Experiments with Subsampling.” Quality and Reliability Engineering International 35, no. 3 (2019): 738–49. https://doi.org/10.1002/qre.2394.

Paper:

Comparing M&S Output to Live Test Data- A Missile System Case Study

Mon, 01 Jan 2018 00:00:00 +0000

In the operational testing of DoD weapons systems, modeling and simulation (M&S) is often used to supplement live test data in order to support a more complete and rigorous evaluation. Before the output of the M&S is included in reports to decision makers, it must first be thoroughly verified and validated to show that it adequately represents the real world for the purposes of the intended use. Part of the validation process should include a statistical comparison of live data to M&S output. This presentation includes an example of one such validation analysis for a tactical missile system. In this case, the goal is to validate a lethality model that predicts the likelihood of destroying a particular enemy target. Using design of experiments, along with basic analysis techniques such as the Kolmogorov-Smirnov test and Poisson regression, we can explore differences between the M&S and live data across multiple operational conditions and quantify the associated uncertainties.

Suggested Citation

Thomas, Dean, and Kelly M Avery. Comparing M&S Output to Live Test Data: A Missile System Case Study. IDA Non-Standard Document NS D-9002. Alexandria, VA: Institute for Defense Analyses, 2018.

Slides:

Improved Surface Gunnery Analysis with Continuous Data

Mon, 01 Jan 2018 00:00:00 +0000

Recasting gunfire data from binomial (hit/miss) to continuous (time-to-kill) allows us to draw statistical conclusions with tactical implications from free-play, live-fire surface gunnery events. Our analysis provided the Navy with suggestions for improvements to its tactics and the employment of its weapons. A censored analysis enabled us to do so, where other methods fell short.

Suggested Citation

Ashwell, Benjamin A, V Bram Lillard, and George M Khoury. Improved Surface Gunnery Analysis with Continuous Data. IDA Document NS D-8990. Alexandria, VA: Institute for Defense Analyses, 2018.

Slides:

Introduction to Observational Studies

Mon, 01 Jan 2018 00:00:00 +0000

A presentation on the theory and practice of observational studies. Specific average treatment effect methods include matching, difference-in-difference estimators, and instrumental variables.

Suggested Citation

Thomas, Dean, and Yevgeniya K Pinelis. Introduction to Observational Studies. IDA Document NS D-9020. Alexandria, VA: Institute for Defense Analyses, 2018.

Slides:

Parametric Reliability Models Tutorial

Mon, 01 Jan 2018 00:00:00 +0000

This tutorial demonstrates how to plot reliability functions parametrically in R using the output from any reliability modeling software. It provides code and sample plots of reliability and failure rate functions with confidence intervals for three different skewed probability distributions: the exponential, the two-parameter Weibull, and the lognormal. These three distributions are the most common parametric models for reliability or survival analysis. This paper also provides mathematical background for the models and recommendations for when to use them.
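
For readers who want a feel for the functions involved, the following is a minimal Python sketch of the standard reliability and failure rate formulas for these distributions. It is not the tutorial's R code, it omits confidence intervals, and the function names and parameter values are hypothetical. The exponential case is obtained by setting the Weibull shape to 1.

```python
import math
from statistics import NormalDist


def weibull_reliability(t, shape, scale):
    """R(t) = exp(-(t / scale) ** shape), for t >= 0."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """Failure rate h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def lognormal_reliability(t, mu, sigma):
    """R(t) = 1 - Phi((ln t - mu) / sigma), for t > 0."""
    return 1.0 - NormalDist(mu, sigma).cdf(math.log(t))


# Hypothetical parameters: a shape of 1 reduces the Weibull to the exponential,
# so the failure rate is constant at 1 / scale.
print(weibull_reliability(100.0, 1.0, 100.0))
print(weibull_hazard(50.0, 1.0, 100.0))
print(lognormal_reliability(1.0, 0.0, 1.0))
```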

Suggested Citation

Pinelis, Yevgeniya K, and William R Whitledge. “Tutorial: Parametric Reliability Models.” Institute for Defense Analyses IDA Non-Standard Document NS D-9171 (September 2018).

Paper:

Scientific Test and Analysis Techniques

Mon, 01 Jan 2018 00:00:00 +0000

Abstract

This document contains the technical content for the Scientific Test and Analysis Techniques (STAT) in Test and Evaluation (T&E) continuous learning module. The module provides a basic understanding of STAT in T&E. Topics covered include design of experiments, observational studies, survey design and analysis, and statistical analysis. It is designed as a four-hour online course, suitable for inclusion in the DAU T&E certification curriculum.

Slides:

Scientific Test and Analysis Techniques- Continuous Learning Module

Mon, 01 Jan 2018 00:00:00 +0000

This document contains the technical content for the Scientific Test and Analysis Techniques (STAT) in Test and Evaluation (T&E) continuous learning module. The module provides a basic understanding of STAT in T&E. Topics covered include design of experiments, observational studies, survey design and analysis, and statistical analysis. It is designed as a four-hour online course, suitable for inclusion in the DAU T&E certification curriculum.

Suggested Citation

Pinelis, Yevgeniya, Laura J Freeman, Heather M Wojton, Denise J Edwards, Stephanie T Lane, and James R Simpson. Scientific Test and Analysis Techniques: Continuous Learning Module. IDA Document NS D-892. Alexandria, VA: Institute for Defense Analyses, 2018.

Testing Defense Systems

Mon, 01 Jan 2018 00:00:00 +0000

The complex, multifunctional nature of defense systems, along with the wide variety of system types, demands a structured but flexible analytical process for testing systems. This chapter summarizes commonly used techniques in defense system testing and specific challenges imposed by the nature of defense system testing. It highlights the core statistical methodologies that have proven useful in testing defense systems. Case studies illustrate the value of using statistical techniques in the design of tests and analysis of the resulting data. The chapter focuses on the unique statistical challenges of designing operational tests, many of which can be attributed to the process, but some of which are inherent to the complexity of the systems and the missions system operators must complete. It provides an overview of the process of designing experiments for military systems with operational users in an operational environment.

Suggested Citation

Freeman, Laura J., Thomas Johnson, Matthew Avery, V. Bram Lillard, and Justace Clutter. “Testing Defense Systems.” In Analytic Methods in Systems and Software Testing, 439–87. John Wiley & Sons, Ltd, 2018. https://doi.org/10.1002/9781119357056.ch18.

Paper:

Comparing Live Missile Fire and Simulation

Sun, 01 Jan 2017 00:00:00 +0000

Modeling and Simulation is frequently used in Test and Evaluation (T&E) of air-to-air weapon systems to evaluate the effectiveness of a weapon. The Air Intercept Missile-9X (AIM-9X) program uses modeling and simulation extensively to evaluate missile miss distances. Since flight testing is expensive, the test program uses relatively few flight tests and supplements those data with large numbers of miss distances from simulated tests across the weapon’s operational space. However, before modeling and simulation can be used to predict performance, it must first be validated. Validation is especially challenging when working with a limited amount of live test data. In this presentation, we show that even with a limited number of live test points (e.g., 16 missile fires), we can still perform a statistical analysis for the validation. We introduce a validation technique known as Fisher’s Combined Probability Test and show how to apply Fisher’s test to validate the AIM-9X model and simulation.
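
Fisher's combined probability test pools k independent p-values into the statistic X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below is a generic implementation of that formula, not the presentation's own code. How each live shot's p-value is obtained (for example, by locating the live miss distance within the simulated distribution for the same conditions) is an assumption here, as are the function name and the example values.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine independent p-values with Fisher's method.

    Under the null, X = -2 * sum(ln p_i) is chi-square with 2k degrees of
    freedom. For even degrees of freedom the upper tail has the closed form
    exp(-x / 2) * sum_{i < k} (x / 2) ** i / i!, so no external library is needed.
    """
    k = len(p_values)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("need at least one p-value in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, tail = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # builds (x / 2) ** i / i! incrementally
        tail += term
    return math.exp(-half_x) * tail


# Hypothetical per-shot p-values from four live fires.
print(fisher_combined_pvalue([0.40, 0.12, 0.75, 0.09]))
```

A small combined p-value would suggest that, taken together, the live shots are inconsistent with the simulation, even when no single shot stands out on its own. The method assumes the individual p-values are independent.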

Suggested Citation

Medlin, Rebecca, Pamela Rambow, and Douglas Peek. Comparing Live Missile Fire and Simulation. IDA Document NS D-8443. Alexandria, VA: Institute for Defense Analyses, 2017.

Slides:

On Scoping a Test that Addresses the Wrong Objective

Sun, 01 Jan 2017 00:00:00 +0000

Statistical literature refers to a type of error that is committed by giving the right answer to the wrong question. If a test design is adequately scoped to address an irrelevant objective, one could say that a Type III error occurs. In this paper, we focus on a specific Type III error that on some occasions test planners commit to reduce test size and resources.

Suggested Citation

Johnson, Thomas H., Rebecca M. Medlin, Laura J. Freeman, and James R. Simpson. “On Scoping a Test That Addresses the Wrong Objective.” Quality Engineering 31, no. 2 (April 3, 2019): 230–39. https://doi.org/10.1080/08982112.2018.1479035.

Paper:

Bayesian Reliability- Combining Information

Fri, 01 Jan 2016 00:00:00 +0000

One of the most powerful features of Bayesian analyses is the ability to combine multiple sources of information in a principled way to perform inference. This feature can be particularly valuable in assessing the reliability of systems where testing is limited. At their most basic, Bayesian methods for reliability develop informative prior distributions using expert judgment or similar systems. Appropriate models allow the incorporation of many other sources of information, including historical data, information from similar systems, and computer models. We introduce the Bayesian approach to reliability using several examples and point to open problems and areas for future work.

Suggested Citation

Wilson, Alyson G., and Kassandra M. Fronczyk. “Bayesian Reliability: Combining Information.” Quality Engineering, August 26, 2016, 0–0. https://doi.org/10.1080/08982112.2016.1211889.

Paper:

Tutorial on Sensitivity Testing in Live Fire Test and Evaluation

Fri, 01 Jan 2016 00:00:00 +0000

A sensitivity experiment is a special type of experimental design that is used when the response variable is binary and the covariate is continuous. Armor protection and projectile lethality tests often use sensitivity experiments to characterize a projectile’s probability of penetrating the armor. In this mini-tutorial we illustrate the challenge of modeling a binary response with a limited sample size, and show how sensitivity experiments can mitigate this problem. We review eight different single covariate sensitivity experiments and present a comparison of these designs using simulation. Additionally, we cover sensitivity experiments for cases that include more than one covariate, and highlight recent research in this area.

Suggested Citation

Johnson, Thomas, Laura Freeman, and Raymond Chen. Tutorial on Sensitivity Testing in Live Fire Test and Evaluation. IDA Document NS D-5829. Alexandria, VA: Institute for Defense Analyses, 2016.

Slides:

Estimating System Reliability from Heterogeneous Data

Thu, 01 Jan 2015 00:00:00 +0000

This briefing provides an example of some of the nuanced issues in reliability estimation in operational testing. The statistical models are motivated by an example of the Paladin Integrated Management (PIM). We demonstrate how to use a Bayesian approach to reliability estimation that uses data from all phases of testing.

Suggested Citation

Browning, Caleb, Laura Freeman, Alyson Wilson, Kassandra Fronczyk, and Rebecca Dickinson. “Estimating System Reliability from Heterogeneous Data.” Presented at the Conference on Applied Statistics in Defense, George Mason University, October 2015.

Slides:

Improving Reliability Estimates with Bayesian Statistics

Thu, 01 Jan 2015 00:00:00 +0000

This paper shows how Bayesian methods are ideal for assessing the reliability of complex systems. Several examples illustrate the methodology.

Suggested Citation

Freeman, Laura J, and Kassandra Fronczyk. “Improving Reliability Estimates with Bayesian Statistics.” ITEA Journal of Test and Evaluation 37, no. 4 (June 2015).

Paper:

Statistical Models for Combining Information Stryker Reliability Case Study

Thu, 01 Jan 2015 00:00:00 +0000

Reliability is an essential element in assessing the operational suitability of Department of Defense weapon systems. Reliability takes a prominent role in both the design and analysis of operational tests. In the current era of reduced budgets and increased reliability requirements, it is challenging to verify reliability requirements in a single test. Furthermore, all available data should be considered in order to ensure evaluations provide the most appropriate analysis of the system’s reliability. This paper describes the benefits of using parametric statistical models to combine information across multiple testing events. Both frequentist and Bayesian inference techniques are employed and they are compared and contrasted to illustrate different statistical methods for combining information. We apply these methods to data collected during the developmental and operational test phases for the Stryker family of vehicles. We show that, when we combine the available information across two test phases for the Stryker family of vehicles, reliability estimates are more accurate and precise than those reported previously using traditional methods that use only operational test data in their reliability assessments.

Suggested Citation

Steiner, Stefan, Rebecca M. Dickinson, Laura J. Freeman, Bruce A. Simpson, and Alyson G. Wilson. “Statistical Methods for Combining Information: Stryker Family of Vehicles Reliability Case Study.” Journal of Quality Technology 47, no. 4 (October 2015): 400–415. https://doi.org/10.1080/00224065.2015.11918142.

Slides:

Paper:

Applying Risk Analysis to Acceptance Testing of Combat Helmets

Wed, 01 Jan 2014 00:00:00 +0000

Acceptance testing of combat helmets presents multiple challenges that require statistically sound solutions. For example, how should first article and lot acceptance tests treat multiple threats and measures of performance? How should these tests account for multiple helmet sizes and environmental treatments? How closely should first article testing requirements match historical or characterization test data? What government and manufacturer risks are acceptable during lot acceptance testing? Similar challenges arise when testing other components of Personal Protective Equipment, and similar statistical approaches should be applied to all components. This presentation explores these questions using operating characteristic curves and simulation studies.

Suggested Citation

Hester, Janice, and Laura Freeman. Applying Risk Analysis to Acceptance Testing of Combat Helmets. IDA Document NS D-5334. Alexandria, VA: Institute for Defense Analyses, 2014.

Slides:

Power Analysis Tutorial for Experimental Design Software

Wed, 01 Jan 2014 00:00:00 +0000

This guide provides both a general explanation of power analysis and specific guidance to successfully interface with two software packages, JMP and Design Expert (DX).

Suggested Citation

Freeman, Laura J., Thomas H. Johnson, and James R. Simpson. “Power Analysis Tutorial for Experimental Design Software.” Fort Belvoir, VA: Defense Technical Information Center, November 1, 2014. https://doi.org/10.21236/ADA619843.

Paper:

Censored Data Analysis- A Statistical Tool for Efficient and Information-Rich Testing

Tue, 01 Jan 2013 00:00:00 +0000

Binomial metrics like probability-to-detect or probability-to-hit typically provide operationally meaningful and easy to interpret test outcomes. However, they are information-poor metrics and extremely expensive to test. The standard power calculations to size a test employ hypothesis tests, which typically result in many tens to hundreds of runs. In addition to being expensive, the test is most likely inadequate for characterizing performance over a variety of conditions due to the inherently large statistical uncertainties associated with binomial metrics. A solution is to convert to a continuous variable, such as miss distance or time-to-detect. The common objection to switching to a continuous variable is that the hit/miss or detect/non-detect binomial information is lost, when the fraction of misses/no-detects is often the most important aspect of characterizing system performance. Furthermore, the new continuous metric appears to no longer be connected to the requirements document, which was stated in terms of a probability. These difficulties can be overcome with the use of censored data analysis. This presentation will illustrate the concepts and benefits of this approach, and will illustrate a simple analysis with data, including power calculations to show the cost savings for employing the methodology.
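
To make the idea concrete, here is a minimal sketch of a censored analysis that assumes exponentially distributed time-to-detect. The abstract does not specify a distribution, so the exponential is used only because its censored maximum likelihood estimate has a simple closed form; the function names and data are hypothetical. Trials that end without a detection still contribute their elapsed time, and the fitted model maps back to a probability of detection within a given time, which reconnects the continuous metric to a probability requirement.

```python
import math


def exponential_mle_censored(times, observed):
    """MLE of the mean time-to-event for exponential data with right censoring.

    times[i] is the event time when observed[i] is True; otherwise it is the
    time at which the trial ended without the event (a right-censored value).
    The estimate is total time on test divided by the number of observed events.
    """
    events = sum(1 for flag in observed if flag)
    if events == 0:
        raise ValueError("at least one observed event is needed for a finite estimate")
    return sum(times) / events


def prob_event_by(t, mean_time):
    """P(event occurs by time t) under the fitted exponential model."""
    return 1.0 - math.exp(-t / mean_time)


# Hypothetical trials: three detections and two runs cut off at 10 minutes.
times = [2.0, 4.0, 6.0, 10.0, 10.0]
observed = [True, True, True, False, False]
mean_time = exponential_mle_censored(times, observed)
print(mean_time, prob_event_by(10.0, mean_time))
```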

Suggested Citation

Lillard, V. Bram. Censored Data Analysis: A Statistical Tool for Efficient and Information-Rich Testing. IDA Document D-4912. Alexandria, VA: Institute for Defense Analyses, 2013.

Slides:

An Expository Paper on Optimal Design

Sat, 01 Jan 2011 00:00:00 +0000

There are many situations where the requirements of a standard experimental design do not fit the research requirements of the problem. Three such situations occur when the problem requires unusual resource restrictions, when there are constraints on the design region, and when a non-standard model is expected to be required to adequately explain the response.

Suggested Citation

Johnson, Rachel T., Douglas C. Montgomery, and Bradley A. Jones. “An Expository Paper on Optimal Design.” Quality Engineering 23, no. 3 (July 2011): 287–301. https://doi.org/10.1080/08982112.2011.576203.

Paper:

Design for Reliability using Robust Parameter Design

Sat, 01 Jan 2011 00:00:00 +0000

Recently, the principles of Design of Experiments (DOE) have been implemented as a method of increasing the statistical rigor of operational tests. The focus has been on ensuring coverage of the operational envelope in terms of system effectiveness. DOE is applicable in reliability analysis as well. A reliability standard, ANSI-0009, advocates the use of Design for Reliability (DfR) early in the product development cycle in order to design-in reliability. Robust parameter design (RPD), first used by Taguchi and then by the response surface community, provides insights on how DOE can be used to make products and processes invariant to changes in factors. Using the principles of RPD, I propose a new application of RPD to DfR.

Suggested Citation

Freeman, Laura. Design for Reliability Using Robust Parameter Design. IDA Document D-4387. Alexandria, VA: Institute for Defense Analyses, 2011.

Slides:

Hybrid Designs- Space Filling and Optimal Experimental Designs for Use in Studying Computer Simulation Models

Sat, 01 Jan 2011 00:00:00 +0000

This tutorial provides an overview of experimental design for modeling and simulation. Pros and cons of each design methodology are discussed.

Suggested Citation

Silvestrini, Rachel Johnson. “Hybrid Designs: Space Filling and Optimal Experimental Designs for Use in Studying Computer Simulation Models.” Monterey, California, May 2011.

Slides:

Examining Improved Experimental Designs for Wind Tunnel Testing Using Monte Carlo Sampling Methods

Fri, 01 Jan 2010 00:00:00 +0000

In this paper we compare data from a fairly large legacy wind tunnel test campaign to smaller, statistically-motivated experimental design strategies. The comparison, using Monte Carlo sampling methodology, suggests a tremendous opportunity to reduce wind tunnel test efforts without losing test information.

Suggested Citation

Hill, Raymond R., Derek A. Leggio, Shay R. Capehart, and August G. Roesener. “Examining Improved Experimental Designs for Wind Tunnel Testing Using Monte Carlo Sampling Methods.” Quality and Reliability Engineering International 27, no. 6 (October 2011): 795–803. https://doi.org/10.1002/qre.1165.