A Team-Centric Metric Framework for Testing and Evaluation of Human-Machine Teams

We propose a parallelized metric framework for evaluating human-machine teams that draws upon current knowledge of human-systems interfacing and integration but is rooted in team-centric concepts. When humans and machines work together as a team, their interactions will only grow in complexity as machines become more intelligent, capable teammates. Assessing such teams will require explicit focus not just on human-machine interfacing but on the full spectrum of interactions between and among agents....

2023 · Wilkins, David Sparrow, Caitlan Fealing, Brian Vickers, Kristina Ferguson, Heather Wojton

AI + Autonomy T&E in DoD

Test and evaluation (T&E) of AI-enabled systems (AIES) often emphasizes algorithm accuracy over robust, holistic system performance. While this narrow focus may be adequate for some applications of AI, for many complex uses, T&E paradigms removed from operational realism are insufficient. However, leveraging traditional operational testing (OT) methods to evaluate AIESs can fail to capture novel sources of risk. This brief establishes a common AI vocabulary and highlights OT challenges posed by AIESs by answering the following questions...

2023 · Brian Vickers, Matthew Avery, Rachel Haga, Mark Herrera, Daniel Porter, Stuart Rodgers

CDV Method for Validating AJEM using FUSL Test Data

M&S validation is critical for ensuring credible weapon system evaluations. System-level evaluations of Armored Fighting Vehicles (AFV) rely on the Advanced Joint Effectiveness Model (AJEM) and Full-Up System Level (FUSL) testing to assess AFV vulnerability. This report reviews and improves upon one of the primary methods that analysts use to validate AJEM, called the Component Damage Vector (CDV) Method. The CDV Method compares vehicle components that were damaged in FUSL testing to simulated representations of that damage from AJEM....
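
For intuition only, the R sketch below represents one live shot and one simulated shot as binary component damage vectors and computes their fraction of agreement. The component names, damage states, and agreement measure are all hypothetical and are not the CDV Method's actual scoring.

```r
# Hypothetical illustration only: compare a live-test component damage vector
# to a simulated one by the fraction of components whose damage state agrees.
# Component names and damage states are invented; this is not the report's
# actual CDV Method comparison.
components <- c("engine", "transmission", "fire_control", "optics", "radio")
live_damage <- c(engine = 1, transmission = 0, fire_control = 1, optics = 1, radio = 0)
sim_damage  <- c(engine = 1, transmission = 1, fire_control = 1, optics = 0, radio = 0)

agreement <- mean(live_damage[components] == sim_damage[components])
cat("Fraction of components in agreement:", agreement, "\n")
```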

2023 · Thomas Johnson, Lindsey Butler, David Grimm, John Haman, Kerry Walzl

Comparing Normal and Binary D-Optimal Designs by Statistical Power

In many Department of Defense test and evaluation applications, binary response variables are unavoidable. Many have considered D-optimal design of experiments for generalized linear models. However, little consideration has been given to assessing how these new designs perform in terms of statistical power for a given hypothesis test. Monte Carlo simulations and exact power calculations suggest that normal D-optimal designs generally yield higher power than binary D-optimal designs, even when logistic regression is used in the analysis after the data have been collected....
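
As a rough illustration of the kind of comparison described above, the R sketch below uses the skpr package to generate a D-optimal design under the normal-theory (linear model) criterion and then estimates its power for a binary response analyzed with logistic regression via Monte Carlo simulation. The factors, run size, and effect sizes are notional, and construction of the competing binary D-optimal design is not shown.

```r
# Minimal sketch (assumed setup, not the study's code): normal-theory D-optimal
# design, then Monte Carlo power under a logistic-regression analysis.
library(skpr)

candidates <- expand.grid(
  speed    = c(-1, 0, 1),   # notional continuous factors in coded units
  altitude = c(-1, 0, 1)
)

normal_design <- gen_design(
  candidateset = candidates,
  model        = ~ speed + altitude,
  trials       = 30,
  optimality   = "D"
)

# effectsize gives the assumed response probabilities at the low and high
# levels of the effect for the binomial family.
power_mc <- eval_design_mc(
  design     = normal_design,
  model      = ~ speed + altitude,
  alpha      = 0.05,
  nsim       = 1000,
  glmfamily  = "binomial",
  effectsize = c(0.5, 0.8)
)
power_mc
```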

2023 · Addison Adams

Data Principles for Operational and Live-Fire Testing

Many DoD systems undergo operational testing, which is a field test involving realistic combat conditions. Data, analysis, and reporting are the fundamental outcomes of operational test, and they support leadership decisions. The importance of data standardization and interoperability is widely recognized by DoD leadership; however, there are no generally recognized standards for the management and handling of data (format, pedigree, architecture, transferability, etc.) in the DoD. In this presentation, I will review a set of data principles that we believe the DoD should adopt to improve how it manages test data....

2023 · John Haman, Matthew Avery

Development of Wald-Type and Score-Type Statistical Tests to Compare Live Test Data and Simulation Predictions

This work describes the development of a statistical test created in support of ongoing verification, validation, and accreditation (VV&A) efforts for modeling and simulation (M&S) environments. The test computes a Wald-type statistic comparing two generalized linear models estimated from live test data and analogous simulated data. The resulting statistic indicates whether the M&S outputs differ from the live data. After developing the test, we applied it to two logistic regression models estimated from live torpedo test data and simulated data from the Naval Undersea Warfare Center’s Environment Centric Weapons Analysis Facility (ECWAF)....
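
The R sketch below illustrates the general form of such a Wald-type comparison: it fits the same logistic regression to notional "live" and "simulated" data sets and compares the two coefficient vectors. It is a simplified stand-in for, not a reproduction of, the statistic developed in the paper, and the data are simulated here for illustration.

```r
# Simplified illustration: Wald-type statistic comparing two logistic
# regression fits,
#   W = (b_live - b_sim)' [V_live + V_sim]^{-1} (b_live - b_sim),
# referred to a chi-squared distribution with p degrees of freedom.
set.seed(1)
n <- 200
live <- data.frame(range = runif(n), depth = runif(n))
live$hit <- rbinom(n, 1, plogis(-0.5 + 2.0 * live$range + 1.0 * live$depth))
sim  <- data.frame(range = runif(n), depth = runif(n))
sim$hit  <- rbinom(n, 1, plogis(-0.4 + 1.8 * sim$range + 1.2 * sim$depth))

fit_live <- glm(hit ~ range + depth, family = binomial, data = live)
fit_sim  <- glm(hit ~ range + depth, family = binomial, data = sim)

delta   <- coef(fit_live) - coef(fit_sim)
V       <- vcov(fit_live) + vcov(fit_sim)
W       <- as.numeric(t(delta) %*% solve(V) %*% delta)
p_value <- pchisq(W, df = length(delta), lower.tail = FALSE)
c(W = W, p_value = p_value)
```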

2023 · Carrington Metts, Curtis Miller

Framework for Operational Test Design- An Example Application of Design Thinking

This poster provides an example of how a design thinking framework can facilitate operational test design. Design thinking is a problem-solving approach of interest to many groups, including those in the test and evaluation community. Design thinking promotes the principles of human-centeredness, iteration, and diversity, and it can be accomplished via a five-phase approach. Following this approach, designers create innovative product solutions by (1) conducting research to empathize with their users, (2) defining specific user problems, (3) ideating on solutions that address the defined problems, (4) prototyping the product, and (5) testing the prototype....

2023 · Miriam Armstrong

Implementing Fast Flexible Space-Filling Designs in R

Modeling and simulation (M&S) can be a useful tool when testers and evaluators need to augment the data collected during a test event. When planning M&S, testers use experimental design techniques to determine how much and which types of data to collect, and they can use space-filling designs to spread out test points across the operational space. Fast flexible space-filling designs (FFSFDs) are a type of space-filling design useful for M&S because they work well in design spaces with disallowed combinations and permit the inclusion of categorical factors....
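
The R sketch below illustrates the clustering idea behind space-filling designs of this kind: draw a large random candidate set, drop a notional disallowed region, and use cluster centers as design points. It is a simplified illustration of the concept, not a specific package implementation of FFSFDs.

```r
# Simplified sketch of the space-filling idea: cluster a large feasible
# candidate set and take the cluster centers as the design points.
# The factors and disallowed region are notional.
set.seed(42)
n_candidates <- 10000
candidates <- data.frame(
  speed    = runif(n_candidates, 0, 1),
  altitude = runif(n_candidates, 0, 1)
)

# Notional disallowed combination: high speed at low altitude is infeasible.
feasible   <- !(candidates$speed > 0.8 & candidates$altitude < 0.2)
candidates <- candidates[feasible, ]

# Cluster the feasible candidates and take the centers as a 20-run design.
n_runs   <- 20
clusters <- kmeans(candidates, centers = n_runs, nstart = 5)
design   <- as.data.frame(clusters$centers)
design
```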

2023 · Christopher Dimapasok

Improving Test Efficiency- A Bayesian Assurance Case Study

To improve test planning for evaluating system reliability, we propose the use of Bayesian methods to incorporate supplementary data and reduce testing duration. Furthermore, we recommend that Bayesian methods be employed in the analysis phase to better quantify uncertainty. We find that using Bayesian methods for test planning allows us to scope smaller tests, and that using Bayesian methods in the analysis yields a more precise estimate of reliability, improving uncertainty quantification....
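
As a toy illustration of the planning idea, the R sketch below folds notional supplementary data into a beta prior on reliability, updates it with a smaller binomial test, and reports the posterior probability of meeting a requirement. The numbers are invented and do not come from the case study.

```r
# Minimal beta-binomial sketch with assumed counts (not the case study's data):
# an informative prior built from supplementary data lets a smaller new test
# still yield a precise posterior on system reliability.
prior_successes <- 45   # notional supplementary data
prior_failures  <- 5
a0 <- 1 + prior_successes
b0 <- 1 + prior_failures

new_trials    <- 20     # notional, smaller operational test
new_successes <- 18

a_post <- a0 + new_successes
b_post <- b0 + (new_trials - new_successes)

requirement    <- 0.80
posterior_prob <- pbeta(requirement, a_post, b_post, lower.tail = FALSE)
cat("Posterior mean reliability:", round(a_post / (a_post + b_post), 3), "\n")
cat("P(reliability >", requirement, "| data):", round(posterior_prob, 3), "\n")
```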

2023 · Rebecca Medlin

Introduction to Design of Experiments for Testers

This training provides details regarding the use of design of experiments, from choosing proper response variables, to identifying factors that could affect such responses, to determining the amount of data necessary to collect. The training also explains the benefits of using a Design of Experiments approach to testing and provides an overview of commonly used designs (e.g., factorial, optimal, and space-filling). The briefing illustrates the concepts discussed using several case studies....

2023 · Breeana Anderson, Rebecca Medlin, John Haman, Kelly Avery, Keyla Pagan-Rivera

Introduction to Design of Experiments in R- Generating and Evaluating Designs with Skpr

This workshop instructs attendees on how to run an end-to-end optimal Design of Experiments workflow in R using the open-source skpr package. The workshop is split into two sections: optimal design generation and design evaluation. The first half of the workshop provides basic instruction on how to use R, as well as how to use skpr to create an optimal design for an experiment: how to specify a model, create a candidate set of potential runs, remove disallowed combinations, and specify the design generation conditions that best suit an experimenter’s goals....
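
A condensed version of that workflow might look like the R sketch below, which uses skpr with notional factors and a notional disallowed combination; it is not the workshop's actual exercise.

```r
# Condensed sketch of the described workflow (notional factors and run size).
library(skpr)

# 1. Create a candidate set of potential runs.
candidates <- expand.grid(
  speed    = c("low", "medium", "high"),
  altitude = c("low", "high")
)

# 2. Remove a disallowed combination (e.g., high speed at low altitude).
candidates <- subset(candidates, !(speed == "high" & altitude == "low"))

# 3. Generate a D-optimal design for the specified model and run size.
design <- gen_design(
  candidateset = candidates,
  model        = ~ speed + altitude,
  trials       = 12,
  optimality   = "D"
)

# 4. Evaluate the design (power for the model effects at a given alpha).
eval_design(design, model = ~ speed + altitude, alpha = 0.05, effectsize = 2)
```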

2023 · Tyler Morgan-Wall

Introduction to Measuring Situational Awareness in Mission-Based Testing Scenarios

Situation Awareness (SA) plays a key role in decision making and human performance: higher operator SA is associated with increased operator performance and decreased operator errors. While maintaining or improving “situational awareness” is a common requirement for systems under test, there is no single standardized method or metric for quantifying SA in operational testing (OT). This leads to varied and sometimes suboptimal treatments of SA measurement across programs and test events....

2023 · Elizabeth Green, Miriam Armstrong, Janna Mantua

Statistical Methods Development Work for M&S Validation

We discuss four areas in which statistically rigorous methods contribute to modeling and simulation validation studies. These areas are statistical risk analysis, space-filling experimental designs, metamodel construction, and statistical validation. Taken together, these areas implement DOT&E guidance on model validation. In each area, IDA has contributed either research methods, user-friendly tools, or both. We point to our tools on testscience.org, and survey the research methods that we’ve contributed to the M&S validation literature...

2023 · Curtis Miller

Statistical Methods for M&S V&V- An Intro for Non-Statisticians

This is a briefing intended to motivate and explain the basic concepts of applying statistics to verification and validation. The briefing will be presented at the Navy M&S VV&A WG (Sub-WG on Validation Statistical Method Selection). Suggested citation: Pagan-Rivera, Keyla, John T. Haman, Kelly M. Avery, and Curtis G. Miller. Statistical Methods for M&S V&V: An Intro for Non-Statisticians. IDA Product ID-3000770. Alexandria, VA: Institute for Defense Analyses, 2024....

2023 · John Haman, Kelly Avery, Curtis Miller