A Practitioner’s Framework for Federated Model Validation Resource Allocation

Recent advances in computation and statistics have led to the increasing use of federated models for end-to-end system test and evaluation. A federated model is a collection of interconnected models in which the outputs of one model act as inputs to subsequent models. However, the process of verifying and validating federated models is poorly understood, especially when testers have limited resources, knowledge-based uncertainties, and concerns over operational realism. Testers often struggle to determine how best to allocate limited test resources for model validation....

2024 · Dhruv Patel, Jo Anna Capp, John Haman

A Reliability Assurance Test Planning and Analysis Tool

This presentation documents the work of IDA 2024 Summer Associate Emma Mitchell. It details an R Shiny application developed to give researchers a user-friendly software tool for planning and analyzing system reliability tests. Specifically, the presentation shows how to plan a reliability test using Bayesian reliability assurance methods. Such tests draw on supplementary data and information, including reliability models, prior test results, expert judgment, and knowledge of environmental conditions, to plan reliability testing, which in turn can often reduce the required amount of testing....
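
A minimal sketch of the kind of calculation such planning tools automate, assuming an exponential lifetime model with a gamma prior on the failure rate; the prior parameters and requirement below are hypothetical, not taken from the application itself:

```python
# Bayesian reliability assurance planning sketch (hypothetical numbers).
from scipy.stats import gamma

a, b = 2.0, 4000.0       # gamma prior: shape a, rate b (prior "test hours")
lam_req = 1.0 / 2000.0   # requirement: at most 1 failure per 2000 hours
assurance = 0.90         # desired posterior probability requirement is met

# For exponential lifetimes, T hours with k failures updates the prior to
# Gamma(a + k, b + T). Find the smallest zero-failure test length T whose
# posterior meets the assurance level.
T = 0.0
while gamma.cdf(lam_req, a, scale=1.0 / (b + T)) < assurance:
    T += 10.0            # search in 10-hour increments
print(f"Plan: about {T:.0f} failure-free test hours")
```

Supplementary information enters through the prior (a, b): stronger prior evidence of reliability shrinks the required test length.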

2024 · Emma Mitchell, Rebecca Medlin, John Haman, Keyla Pagan-Rivera, Dhruv Patel

Developing AI Trust: From Theory to Testing and the Myths in Between

This introductory work aims to provide members of the Test and Evaluation community with a clear understanding of trust and trustworthiness to support responsible and effective evaluation of AI systems. The paper provides a set of working definitions and works toward dispelling confusion and myths surrounding trust. Suggested Citation: Razin, Yosef S., and Kristen Alexander. “Developing AI Trust: From Theory to Testing and the Myths in Between.” The ITEA Journal of Test and Evaluation 45, no....

2024 · Yosef Razin, Kristen Alexander, John Haman

Operational T&E of AI-Supported Data Integration, Fusion, and Analysis Systems

AI will play an important role in future military systems, but major questions remain about how to test AI systems, especially in operational settings. Here, we discuss an approach for the operational test and evaluation (OT&E) of AI-supported data integration, fusion, and analysis systems. We highlight new challenges posed by AI-supported systems and discuss new and existing OT&E methods for overcoming them. We demonstrate how to apply these OT&E methods via a notional test concept that focuses on evaluating an AI-supported data integration system in terms of its technical performance (how accurate is the AI output?...

2024 · Adam Miller, Logan Ausman, John Haman, Keyla Pagan-Rivera, Sarah Shaffer, Brian Vickers

Quantifying Uncertainty to Keep Astronauts and Warfighters Safe

Both NASA and DOT&E increasingly rely on computer models to supplement data collection and use statistical distributions to quantify model uncertainty, so that decision-makers are equipped with the most accurate information about system performance and model fitness. This article provides a high-level overview of uncertainty quantification (UQ) through an example assessment of the reliability of a new space-suit system. The goal is to reach a more general audience in Significance Magazine and to convey the importance and relevance of statistics to the defense and aerospace communities....
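
A minimal sketch of Monte Carlo uncertainty propagation in the spirit of the workflow described above; the component model and input distributions here are hypothetical stand-ins, not the space-suit assessment itself:

```python
# Propagate input uncertainty through a simple strength-vs-load model.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Uncertain inputs: each draw is one plausible configuration (made-up units).
seal_strength = rng.normal(loc=50.0, scale=3.0, size=n)   # kPa
peak_load = rng.lognormal(mean=3.6, sigma=0.2, size=n)    # kPa

# The system succeeds when strength exceeds load; reliability is the mean.
reliability = (seal_strength > peak_load).mean()
margin = seal_strength - peak_load
print(f"Estimated reliability: {reliability:.3f}")
print("5th/95th percentile margin (kPa):", np.percentile(margin, [5, 95]).round(1))
```

The spread of the margin distribution, not just the point estimate, is what UQ puts in front of decision-makers.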

2024 · John Haman, John Dennis, James Warner

Sequential Space-Filling Designs for Modeling & Simulation Analyses

Space-filling designs (SFDs) are a rigorous method for designing modeling and simulation (M&S) studies. However, they are hindered by the requirement to choose the final sample size before testing begins. Sequential designs are an alternative that can increase test efficiency by collecting small batches of data at a time. We have conducted a literature review of existing sequential space-filling designs and identified the methods most applicable to the test and evaluation (T&E) community....
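
A minimal sketch of one common sequential space-filling idea, greedy maximin augmentation: add, one at a time, the candidate point farthest from everything run so far. This illustrates the concept only and is not necessarily one of the specific methods surveyed:

```python
# Greedy maximin augmentation of an existing design in [0,1]^2.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(7)
design = rng.uniform(size=(10, 2))          # initial batch of 10 runs
candidates = rng.uniform(size=(5000, 2))    # dense pool of candidate points

for _ in range(5):                          # add 5 follow-on runs
    d_min = cdist(candidates, design).min(axis=1)   # distance to nearest run
    best = candidates[d_min.argmax()]               # maximin choice
    design = np.vstack([design, best])

print(design[-5:])                          # the 5 sequentially added points
```

Because each batch is chosen after seeing the current design, testing can stop as soon as the emulator or analysis is good enough, avoiding a fixed up-front sample size.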

2024 · Anna Flowers, John Haman

Uncertainty Quantification for Ground Vehicle Vulnerability Simulation

A vulnerability assessment of a combat vehicle uses modeling and simulation (M&S) to predict the vehicle’s vulnerability to a given enemy attack. The system-level output of the M&S is the probability that the vehicle’s mobility is degraded as a result of the attack. The M&S models this system-level phenomenon by decoupling the attack scenario into a hierarchy of sub-systems. Each sub-system addresses a specific scientific problem, such as the fracture dynamics of an exploded munition, or the ballistic resistance provided by the vehicle’s armor....
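
A minimal sketch of how such a hierarchy composes sub-system outputs into a system-level probability; the sub-models and probabilities below are hypothetical stand-ins for the physics codes described above:

```python
# Compose conditional sub-system outcomes into P(mobility degraded).
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Sub-system 1: does a fragment reach a critical component?
hit = rng.random(n) < 0.35
# Sub-system 2: given a hit, does the fragment defeat the armor?
penetrate = hit & (rng.random(n) < 0.60)
# Sub-system 3: given penetration, is mobility degraded?
mobility_kill = penetrate & (rng.random(n) < 0.80)

print(f"P(mobility degraded) = {mobility_kill.mean():.3f}")  # ~0.35*0.60*0.80
```

Uncertainty quantification then asks how uncertainty in each sub-model's inputs and parameters propagates into this system-level probability.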

2024 · John Haman, David Higdon, Thomas Johnson, Dhruv Patel, Jeremy Werner

CDV Method for Validating AJEM using FUSL Test Data

M&S validation is critical for ensuring credible weapon system evaluations. System-level evaluations of Armored Fighting Vehicles (AFV) rely on the Advanced Joint Effectiveness Model (AJEM) and Full-Up System Level (FUSL) testing to assess AFV vulnerability. This report reviews and improves upon one of the primary methods that analysts use to validate AJEM, called the Component Damage Vector (CDV) Method. The CDV Method compares vehicle components that were damaged in FUSL testing to simulated representations of that damage from AJEM....
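
A minimal sketch of the kind of comparison a component damage vector formalizes: observed and simulated damage recorded as binary vectors over components, scored for agreement. The components and the Jaccard score here are hypothetical illustrations, not the CDV Method's actual scoring:

```python
# Compare one observed FUSL damage vector against simulated AJEM runs.
import numpy as np

components = ["engine", "transmission", "left_track", "fuel_cell", "optics"]
fusl_damage = np.array([1, 0, 1, 1, 0])          # observed in FUSL testing
ajem_runs = np.array([[1, 0, 1, 0, 0],           # simulated damage vectors
                      [1, 1, 1, 1, 0],
                      [0, 0, 1, 1, 0]])

# Jaccard similarity: shared damaged components / all damaged components.
inter = (ajem_runs & fusl_damage).sum(axis=1)
union = (ajem_runs | fusl_damage).sum(axis=1)
print(inter / union)                              # per-run agreement scores
```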

2023 · Thomas Johnson, Lindsey Butler, David Grimm, John Haman, Kerry Walzl

Data Principles for Operational and Live-Fire Testing

Many DOD systems undergo operational testing, a field test conducted under realistic combat conditions. Data, analysis, and reporting are the fundamental outcomes of operational testing and support leadership decisions. The importance of data standardization and interoperability is widely recognized by DOD leadership; however, there are no generally recognized standards for the management and handling of data (format, pedigree, architecture, transferability, etc.) in the DOD. In this presentation, I will review a set of data principles that we believe DOD should adopt to improve how it manages test data....

2023 · John Haman, Matthew Avery

Introduction to Design of Experiments for Testers

This training provides details regarding the use of design of experiments, from choosing proper response variables, to identifying factors that could affect such responses, to determining the amount of data necessary to collect. The training also explains the benefits of using a Design of Experiments approach to testing and provides an overview of commonly used designs (e.g., factorial, optimal, and space-filling). The briefing illustrates the concepts discussed using several case studies....
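
A minimal sketch of the simplest design class mentioned above, a two-level full factorial, built over hypothetical factors:

```python
# Enumerate a 2^3 full factorial design: every combination of factor levels.
from itertools import product

factors = {
    "altitude": ["low", "high"],
    "speed": ["slow", "fast"],
    "countermeasures": ["off", "on"],
}
runs = list(product(*factors.values()))   # 2^3 = 8 runs
for run in runs:
    print(dict(zip(factors, run)))
```

Optimal and space-filling designs trade this exhaustive enumeration for fewer, carefully placed runs when the full factorial is too expensive.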

2023 · Breeana Anderson, Rebecca Medlin, John Haman, Kelly Avery, Keyla Pagan-Rivera

Statistical Methods for M&S V&V: An Intro for Non-Statisticians

This is a briefing intended to motivate and explain the basic concepts of applying statistics to verification and validation. The briefing will be presented at the Navy M&S VV&A WG (Sub-WG on Validation Statistical Method Selection). Suggested Citation: Pagan-Rivera, Keyla, John T. Haman, Kelly M. Avery, and Curtis G. Miller. Statistical Methods for M&S V&V: An Intro for Non-Statisticians. IDA Product ID-3000770. Alexandria, VA: Institute for Defense Analyses, 2024....

2023 · John Haman, Kelly Avery, Curtis Miller

Metamodeling Techniques for Verification and Validation of Modeling and Simulation Data

Modeling and simulation (M&S) outputs help the Director, Operational Test and Evaluation (DOT&E) assess the effectiveness, survivability, lethality, and suitability of systems. To use M&S outputs, DOT&E needs models and simulators to be sufficiently verified and validated. The purpose of this paper is to improve the state of verification and validation by recommending and demonstrating a set of statistical techniques—metamodels, also called statistical emulators—to the M&S community. The paper expands on DOT&E’s existing guidance about metamodel usage by creating methodological recommendations the M&S community could apply to its activities....
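
A minimal sketch of a statistical emulator: a Gaussian process metamodel fit to input/output pairs from a simulator, here a cheap stand-in function rather than a real M&S code:

```python
# Fit a Gaussian process metamodel to simulator input/output data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(30, 1))            # simulator input settings
y = np.sin(X).ravel() + rng.normal(0, 0.1, 30)  # noisy "simulator" output

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.01)
gp.fit(X, y)

# The emulator predicts the simulator cheaply, with uncertainty attached.
X_new = np.linspace(0, 10, 5).reshape(-1, 1)
mean, sd = gp.predict(X_new, return_std=True)
print(np.column_stack([mean, sd]).round(2))
```

The attached predictive uncertainty is what makes emulators useful for V&V: it flags input regions where the metamodel (and perhaps the simulator's characterization) is weakly supported by data.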

2022 · John Haman, Curtis Miller

What Statisticians Should Do to Improve M&S Validation Studies

It is often said that many research findings, from the social sciences, medicine, economics, and other disciplines, are false. This claim is trumpeted in the media and by many statisticians. There are several reasons that false research gets published, but to what extent should we be worried about false findings in defense testing and modeling and simulation? In this talk I will present several recommendations for actions that statisticians and data scientists can take to improve the quality of our validations and evaluations....

2022 · John Haman

Introduction to Bayesian Analysis

As operational testing becomes increasingly integrated and research questions become more difficult to answer, IDA’s Test Science team has found Bayesian models to be powerful data analysis methods. Analysts and decision-makers should understand the differences between this approach and the conventional way of analyzing data. It is also important to recognize when an analysis could benefit from the inclusion of prior information—what we already know about a system’s performance—and to understand the proper way to incorporate that information....
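
A minimal sketch of incorporating prior information, using a beta-binomial model for a success probability; the prior and test counts are hypothetical:

```python
# Combine prior knowledge with new test data via a conjugate beta prior.
from scipy.stats import beta

# Prior: earlier testing suggested roughly 8 successes in 10 trials.
a_prior, b_prior = 8, 2

# New operational test: 15 successes in 20 trials.
successes, failures = 15, 5

posterior = beta(a_prior + successes, b_prior + failures)
print(f"Posterior mean success probability: {posterior.mean():.3f}")
print("80% credible interval:", [round(q, 3) for q in posterior.ppf([0.1, 0.9])])
```

The posterior blends both sources of evidence; with a flat prior (a = b = 1), the analysis reduces to what the new data alone support.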

2021 · John Haman, Keyla Pagan-Rivera, Rebecca Medlin, Heather Wojton

Warhead Arena Analysis Advancements

Fragmentation analysis is a critical piece of the live fire test and evaluation (LFT&E) of the lethality and vulnerability aspects of warheads, but the traditional methods for data collection are expensive and laborious. New optical tracking technology promises to increase the fidelity of fragmentation data and decrease the time and costs associated with data collection. However, the new data will be complex, three-dimensional “fragmentation clouds,” possibly with a time component as well, and there will be a larger number of individual data points....

2021 · John Haman, Mark Couch, Thomas Johnson, Kerry Walzl, Heather Wojton

Circular Prediction Regions for Miss Distance Models under Heteroskedasticity

Circular prediction regions are used in ballistic testing to express the uncertainty in shot accuracy. We compare two modeling approaches for estimating circular prediction regions for the miss distance of a ballistic projectile. The miss distance response is modeled as bivariate normal, with a mean and variance that can change with one or more experimental factors. The first approach fits a heteroskedastic linear model using restricted maximum likelihood and uses the Kenward-Roger statistic to estimate circular prediction regions....
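
A minimal sketch of the underlying geometry, assuming equal, uncorrelated axis variances with known sigma; the factor-dependent sigmas below are hypothetical, and the paper's methods additionally account for estimation uncertainty (e.g., via the Kenward-Roger adjustment):

```python
# 95% circular prediction region radius for a bivariate normal miss distance.
import numpy as np
from scipy.stats import chi2

# Heteroskedasticity: per-axis miss-distance SD changes with a test factor.
sigma_by_range = {"short": 0.8, "medium": 1.5, "long": 2.6}  # meters

# If X, Y ~ N(0, sigma^2) independently, (X^2 + Y^2)/sigma^2 ~ chi-square(2),
# so the 95% radius is sigma * sqrt(chi2_{2, 0.95}).
k = np.sqrt(chi2.ppf(0.95, df=2))
for level, sigma in sigma_by_range.items():
    print(f"{level}: radius = {k * sigma:.2f} m")
```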

2020 · Thomas Johnson, John Haman, Heather Wojton, Laura Freeman

Reproducible Research Mini-Tutorial

Analyses are reproducible if the same methods applied to the same data produce identical results when run again by another researcher (or you in the future). Reproducible analyses are transparent and easy for reviewers to verify, as results and figures can be traced directly to the data and methods that produced them. There are also direct benefits to the researcher. Real-world analysis workflows inevitably require changes to incorporate new or additional data, or to address feedback from collaborators, reviewers, or sponsors....

2019 · Andrew Flack, John Haman, Kevin Kirshenbaum

The Purpose of Mixed-Effects Models in Test and Evaluation

Mixed-effects models are the standard technique for analyzing data with grouping structure. In defense testing, these models are useful because they allow us to account for correlations between observations, a feature common in many operational tests. In this article, we describe the advantages of modeling data from a mixed-effects perspective and discuss an R package, ciTools, that equips the user with easy methods for presenting results from this type of model. Suggested Citation: Haman, John, Matthew Avery, and Heather Wojton....
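
The article demonstrates the R package ciTools; here is a parallel sketch in Python using statsmodels, fit to simulated grouped data rather than the article's own example:

```python
# Fit a mixed-effects model to data with a grouping structure (test units).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
units = np.repeat(np.arange(8), 10)                  # 8 test units, 10 shots each
unit_effect = rng.normal(0, 2.0, 8)[units]           # unit-to-unit variation
x = rng.uniform(0, 1, 80)
y = 3.0 + 1.5 * x + unit_effect + rng.normal(0, 1.0, 80)

df = pd.DataFrame({"y": y, "x": x, "unit": units})
model = smf.mixedlm("y ~ x", df, groups=df["unit"]).fit()
print(model.summary())   # fixed slope estimate plus unit-level variance
```

The random intercept per unit is what captures the within-unit correlation that an ordinary regression would wrongly treat as independent observations.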

2019 · John Haman, Matthew Avery, Heather Wojton

Use of Design of Experiments in Survivability Testing

The purpose of survivability testing is to provide decision makers with relevant, credible evidence about the survivability of an aircraft, conveyed with some degree of certainty or inferential weight. In developing an experiment to accomplish this goal, a test planner faces numerous questions: What critical issue or issues are being addressed? What data are needed to answer the critical issues? What test conditions should be varied? What is the most economical way of varying those conditions?...

2019 · Thomas Johnson, Mark Couch, John Haman, Heather Wojton