Everyone on Test Science Research Document Library

Introduction to Human-Systems Interaction in Operational Test and Evaluation Course

Mon, 01 Jan 2024 00:00:00 +0000

Human-System Interaction (HSI) is the study of interfaces between humans and technical systems. The Department of Defense incorporates HSI evaluations into defense acquisition to improve system performance and reduce lifecycle costs. During operational test and evaluation, HSI evaluations characterize how a system’s operational performance is affected by its users. The goal of this course is to provide the theoretical background and practical tools necessary to develop and evaluate HSI test plans, collect and analyze HSI data, and report on HSI results. We will discuss HSI concepts, measurement methods, design of experiments, data analysis, and evaluation and reporting, all from an operational testing perspective.

Suggested Citation

Miller, Dr Adam M, and Keyla Pagan-Rivera. Introduction to Human-Systems Interaction in Operational Test and Evaluation Course. IDA Product ID 3002009. Alexandria, VA: Institute for Defense Analyses, 2024.

Slides:

Meta-Analysis of the Effectiveness of the SALIANT Procedure for Assessing Team Situation Awareness

Mon, 01 Jan 2024 00:00:00 +0000

Many Department of Defense (DoD) systems aim to increase or maintain Situational Awareness (SA) at the individual or group level. In some cases, maintenance or enhancement of SA is listed as a primary function or requirement of the system. However, during test and evaluation SA is examined inconsistently or is not measured at all. Situational Awareness Linked Indicators Adapted to Novel Tasks (SALIANT) is an empirically-based methodology meant to measure SA at the team, or group, level. While research using the SALIANT model suggests that it effectively quantifies team SA, no study has examined the effectiveness of SALIANT across the entirety of the existing empirical research. The aim of the current work is to conduct a meta-analysis of previous research to examine the overall reliability of SALIANT as an SA measurement tool. This meta-analysis will assess when and how SALIANT can serve as a reliable indicator of performance at testing. Additional applications of SALIANT in non-traditional operational testing domains will also be discussed.

Suggested Citation

Shaffer, Sarah, Miriam Armstrong, and Rebecca Medlin. Meta-Analysis of the Effectiveness of the SALIANT Procedure for Assessing Team Situation Awareness. IDA Product ID 3001867. Alexandria, VA: Institute for Defense Analyses, 2024.

Slides:

Quantifying Uncertainty to Keep Astronauts and Warfighters Safe

Mon, 01 Jan 2024 00:00:00 +0000

Both NASA and DOT&E increasingly rely on computer models to supplement data collection, and utilize statistical distributions to quantify the uncertainty in models, so that decision-makers are equipped with the most accurate information about system performance and model fitness. This article provides a high-level overview of uncertainty quantification (UQ) through an example assessment for the reliability of a new space-suit system. The goal is to reach a more general audience in Significance Magazine, and convey the importance and relevance of statistics to the defense and aerospace communities.
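As a rough illustration of describing uncertainty with a statistical distribution, the sketch below computes an interval for a pass/fail reliability estimate. It is not taken from the article: the function name, the uniform prior, and the example counts (48 successes, 2 failures) are all hypothetical assumptions chosen for illustration.

```python
import random


def reliability_interval(successes, failures, level=0.9, draws=20000, seed=1):
    """Approximate a credible interval for reliability from pass/fail counts.

    Assumes a uniform Beta(1, 1) prior, so the posterior distribution is
    Beta(successes + 1, failures + 1). The interval is estimated by sampling.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
    samples = sorted(
        rng.betavariate(successes + 1, failures + 1) for _ in range(draws)
    )
    lower_index = int((1 - level) / 2 * draws)
    upper_index = int((1 + level) / 2 * draws) - 1
    return samples[lower_index], samples[upper_index]


# Hypothetical test record: 48 successful trials and 2 failures.
low, high = reliability_interval(48, 2)
print(f"90% interval for reliability: {low:.3f} to {high:.3f}")
```

Reporting the interval rather than only the point estimate (48/50) shows a decision-maker how much the limited number of trials constrains the conclusion.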

Suggested Citation

Dennis, John W, John T Haman, and James E Warner. “Out-of-This-World Spacesuits: Quantifying Uncertainty Helps Keep Heroes Safe.” Significance 21, no. 4 (September 1, 2024): 10–13. https://doi.org/10.1093/jrssig/qmae056.

Paper:

AI + Autonomy T&E in DoD

Sun, 01 Jan 2023 00:00:00 +0000

Test and evaluation (T&E) of AI-enabled systems (AIES) often emphasizes algorithm accuracy over robust, holistic system performance. While this narrow focus may be adequate for some applications of AI, for many complex uses, T&E paradigms removed from operational realism are insufficient. However, leveraging traditional operational testing (OT) methods to evaluate AIES can fail to capture novel sources of risk. This brief establishes a common AI vocabulary and highlights OT challenges posed by AIES by answering the following questions:

What is “Artificial Intelligence (AI)”?

a. A brief “AI Primer” defines some common terms, highlights words that are used inconsistently, and discusses where definitions are insufficient for identifying systems that require additional T&E considerations.

How does AI impact T&E?

a. AI isn’t new, but systems with AI pose new challenges and may require structural changes to how we conduct T&E.

What makes DoD applications of AI unique?

a. Many Silicon Valley applications of AI lack the task complexity and severe consequences of risk faced by DoD.

What is the warfighter’s role?

a. T&E must assure warfighters have calibrated trust & an adequate understanding of system behavior.

What is the state of DoD AI T&E in IDA and OED?

Suggested Citation

Vickers, Brian D, Matthew R Avery, Rachel A Haga, Mark R Herrera, Daniel J Porter, Stuart M Rodgers, and Rebecca M Medlin. AI + Autonomy T&E in DoD. IDA Document NS 3000083. Alexandria, VA: Institute for Defense Analyses, 2023.

Paper:

CDV Method for Validating AJEM using FUSL Test Data

Sun, 01 Jan 2023 00:00:00 +0000

M&S validation is critical for ensuring credible weapon system evaluations. System-level evaluations of Armored Fighting Vehicles (AFV) rely on the Advanced Joint Effectiveness Model (AJEM) and Full-Up System Level (FUSL) testing to assess AFV vulnerability. This report reviews and improves upon one of the primary methods that analysts use to validate AJEM, called the Component Damage Vector (CDV) Method. The CDV Method compares vehicle components that were damaged in FUSL testing to simulated representations of that damage from AJEM. In the past, the CDV Method has employed a variety of different analysis techniques and results presentations. Many focused on low-level validation results, detailing each component that was damaged in each FUSL event. The unique contribution of this report, which complements past CDV efforts, is that it focuses on high-level results. This has three purposes: (1) to provide a pithy, yet detailed, validation assessment for a given FUSL test series, (2) to discover high-level trends that cut across an entire FUSL test series, such as whether AJEM performed better for one type of threat versus another, and (3) to compare validation results between multiple FUSL test series.

Suggested Citation

Grimm, David K, Thomas H Johnson, Lindsey D Butler, Craig Andres, Julia Ivancik, and Russ Dibelka. Component Data Vector Methodology in Support of FUSL-AJEM Validation. IDA Product ID 3002075. Alexandria, VA: Institute for Defense Analyses, 2024.

Miller, Curtis G. “Statistical Methods Development Work for M&S Validation.” International Test and Evaluation Association 44, no. 3 (September 11, 2023). https://doi.org/10.61278/itea.44.3.1010.

Slides:

Paper:

Statistical Methods for M&S V&V: An Intro for Non-Statisticians

Sun, 01 Jan 2023 00:00:00 +0000

This is a briefing intended to motivate and explain the basic concepts of applying statistics to verification and validation. The briefing will be presented at the Navy M&S VV&A WG (Sub-WG on Validation Statistical Method Selection).

Suggested Citation

Pagan-Rivera, Keyla, John T Haman, Kelly M Avery, and Curtis G Miller. Statistical Methods for M&S V&V: An Intro for Non-Statisticians. IDA Product ID 3000770. Alexandria, VA: Institute for Defense Analyses, 2024.

Slides:

Analysis Apps for the Operational Tester

Sat, 01 Jan 2022 00:00:00 +0000

In the acquisition and testing world, data analysts repeatedly encounter certain categories of data, such as time or distance until an event (e.g., failure, alert, detection), binary outcomes (e.g., success/failure, hit/miss), and survey responses. Analysts need tools that enable them to produce quality and timely analyses of the data they acquire during testing. This poster presents four web-based apps that can analyze these types of data. The apps are designed to assist analysts and researchers with simple repeatable analysis tasks, such as building summary tables and plots for reports or briefings. Using software tools like these apps can increase reproducibility of results, timeliness of analysis and reporting, attractiveness and standardization of aesthetics in figures, and accuracy of results. The first app models reliability of a system or component by fitting parametric statistical distributions to time-to-failure data. The second app fits a logistic regression model to binary data with one or two independent continuous variables. The third calculates summary statistics and produces plots of groups of Likert-scale survey question responses. The fourth calculates the system usability scale (SUS) scores for SUS survey responses and enables the app user to plot scores versus an independent variable. These apps are available for public use on the Test Science Interactive Tools webpage: https://testscience.org/interactive-tools/.
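For readers unfamiliar with SUS scoring, the sketch below applies the standard scoring rule to one respondent's ten answers. It illustrates the published SUS formula only; it is not the code behind the fourth app, and the function name is a hypothetical choice.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for one respondent.

    `responses` holds the ten item ratings in questionnaire order,
    each an integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd-numbered items are positively worded: contribute rating - 1.
            total += rating - 1
        else:
            # Even-numbered items are negatively worded: contribute 5 - rating.
            total += 5 - rating
    # The summed contributions (0 to 40) are rescaled to 0 to 100.
    return total * 2.5


print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1]))
```

Scores from several respondents can then be averaged or plotted against an independent variable, as the abstract describes.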

Suggested Citation

Lillard, V Bram, and William Whitledge. Analysis Apps for the Operational Tester. IDA Document NS D-32959. Alexandria, VA: Institute for Defense Analyses, 2022.

Paper:

Poster:

Case Study on Applying Sequential Analyses in Operational Testing

Sat, 01 Jan 2022 00:00:00 +0000

Sequential analysis concerns statistical evaluation in which the number, pattern, or composition of the data is not determined at the start of the investigation, but instead depends on the information acquired during the investigation. Although sequential analysis originated in ballistics testing for the Department of Defense (DoD) and is widely used in other disciplines, it is underutilized in the DoD. Expanding the use of sequential analysis may save money and reduce test time. In this paper, we introduce sequential analysis, describe its current and potential uses in operational test and evaluation (OT&E), and present a method for applying it to the test and evaluation of defense systems. We evaluate the proposed method by performing simulation studies and applying the method to a case study. Additionally, we discuss challenges to address for sequential analysis in OT&E. Lastly, while operational testing is the focus in this paper, the methodology presented is applicable to campaigns of experimentation and general testing across numerous disciplines.
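To make the idea of a data-dependent stopping point concrete, the sketch below implements Wald's sequential probability ratio test for pass/fail outcomes. This is a classic textbook procedure offered as an assumption-laden illustration, not the method proposed in the paper. The function name, hypothesized success rates, and error rates are hypothetical.

```python
import math


def sprt_bernoulli(outcomes, p0, p1, alpha=0.05, beta=0.2):
    """Run Wald's SPRT on a stream of pass/fail outcomes.

    Tests H0: success probability = p0 against H1: success probability = p1,
    with target error rates alpha (false accept of H1) and beta (false
    accept of H0). Returns (decision, number_of_trials_used).
    """
    upper = math.log((1 - beta) / alpha)  # crossing it favors H1
    lower = math.log(beta / (1 - alpha))  # crossing it favors H0
    log_likelihood_ratio = 0.0
    trials_used = 0
    for outcome in outcomes:
        trials_used += 1
        if outcome:
            log_likelihood_ratio += math.log(p1 / p0)
        else:
            log_likelihood_ratio += math.log((1 - p1) / (1 - p0))
        if log_likelihood_ratio >= upper:
            return "accept_h1", trials_used
        if log_likelihood_ratio <= lower:
            return "accept_h0", trials_used
    # Neither boundary was crossed: more data would be needed.
    return "continue", trials_used


# Hypothetical example: six straight successes are enough to stop early.
print(sprt_bernoulli([1, 1, 1, 1, 1, 1, 1, 1], p0=0.5, p1=0.8))
```

The test stops as soon as the evidence crosses either boundary, which is how a sequential design can end a test with fewer trials than a fixed-sample plan.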

Suggested Citation

Ahrens, Monica, Rebecca Medlin, Keyla Pagán-Rivera, and John W. Dennis. “Case Study on Applying Sequential Analyses in Operational Testing.” Quality Engineering 35, no. 3 (July 3, 2023): 534–45. https://doi.org/10.1080/08982112.2022.2146510.

Medlin, Rebecca, Matthew R Avery, James R Simpson, and Heather M Wojton. Determining How Much Testing Is Enough: An Exploration of Progress in the Department of Defense Test and Evaluation Community. IDA Document NS D-21561. Alexandria, VA: Institute for Defense Analyses, 2021.

Paper:

Introduction to Qualitative Methods

Fri, 01 Jan 2021 00:00:00 +0000

Qualitative data, captured through free-form comment boxes, interviews, focus groups, and activity observation, is heavily employed in testing and evaluation (T&E). The qualitative research approach can offer many benefits, but knowledge of how to implement methods, collect data, and analyze data according to rigorous qualitative research standards is not broadly understood within the T&E community.

This tutorial offers insight into the foundational concepts of method and practice that embody defensible approaches to qualitative research. We discuss where qualitative data comes from, how it can be captured, what kind of value it offers, and how to capitalize on that value through methods and best practices.

Suggested Citation

Medlin, Rebecca, Kristina Carter, Emily Fedele, and Daniel Hellmann. Introduction to Qualitative Methods. IDA Document NS D-21591. Alexandria, VA: Institute for Defense Analyses, 2021.

Slides:

Why are Statistical Engineers Needed for Test & Evaluation?

Fri, 01 Jan 2021 00:00:00 +0000

The Department of Defense (DoD) develops and acquires some of the world’s most advanced and sophisticated systems. As new technologies emerge and are incorporated into systems, OSD/DOT&E faces the challenge of ensuring that these systems undergo adequate and efficient test and evaluation (T&E) prior to operational use. Statistical engineering is a collaborative, analytical approach to problem solving that integrates statistical thinking, methods, and tools with other relevant disciplines. The statistical engineering process provides better solutions to large, unstructured, real-world problems and supports rigorous decision-making. In this talk, we present two case studies on improving how testing and data collection are integrated across the full system lifecycle. These case studies highlight why we believe statistical engineers are necessary for successful T&E.

Suggested Citation

Medlin, Rebecca, Keyla Pagan-Rivera, and Monica Ahrens. Why Are Statistical Engineers Needed for Test & Evaluation? IDA Document NS-D-22722. Alexandria, VA: Institute for Defense Analyses, 2021.

Slides:

A Validation Case Study: The Environment Centric Weapons Analysis Facility (ECWAF)

Wed, 01 Jan 2020 00:00:00 +0000

Reliable modeling and simulation (M&S) allows the undersea warfare community to understand torpedo performance in scenarios that could never be created in live testing, and to do so for a fraction of the cost of an in-water test. The Navy hopes to use the Environment Centric Weapons Analysis Facility (ECWAF), a hardware-in-the-loop simulation, to predict torpedo effectiveness and supplement live operational testing. In order to trust the model’s results, the T&E community has applied rigorous statistical design of experiments techniques to both live and simulation testing. As part of ECWAF’s two-phased validation approach, we ran the M&S experiment with the legacy torpedo and developed an empirical emulator of the ECWAF using logistic regression. Comparing the emulator’s predictions to actual outcomes from live test events supported the test design for the upgraded torpedo. This talk gives an overview of the ECWAF’s validation strategy, decisions that have put the ECWAF on a promising path, and the metrics used to quantify uncertainty.

Suggested Citation

Bartis, Elliot, and Steven Rabinowitz. A Validation Case Study: The Environment Centric Weapons Analysis Facility (ECWAF). IDA Document NS D-12081. Alexandria, VA: Institute for Defense Analyses, 2020.

Slides:

T&E Contributions to Avoiding Unintended Behaviors in Autonomous Systems

Wed, 01 Jan 2020 00:00:00 +0000

To provide assurance that AI-enabled systems will behave appropriately across the range of their operating conditions without performing exhaustive testing, the DoD will need to make inferences about system decision making. However, making these inferences validly requires understanding what causally drives system decision-making, which is not possible when systems are black boxes. In this briefing, we discuss the state of the art and gaps in techniques for obtaining, verifying, validating, and accrediting (OVVA) models of system decision-making.

Suggested Citation

Porter, Daniel J, and Heather Wojton. T&E Contributions to Avoiding Unintended Behaviors in Autonomous Systems. IDA Document NS D-12078. Alexandria, VA: Institute for Defense Analyses, 2020.

Slides:

Test & Evaluation of AI-Enabled and Autonomous Systems: A Literature Review

Wed, 01 Jan 2020 00:00:00 +0000

We summarize a subset of the literature regarding the challenges to and recommendations for the test, evaluation, verification, and validation (TEV&V) of autonomous military systems. This literature review is meant for informational purposes only and does not make any recommendations of its own. A synthesis of the literature identified the following categories of TEV&V challenges:

Problems arising from the complexity of autonomous systems,
Challenges imposed by the structure of the current acquisition system,
Lack of methods, tools, and infrastructure for testing,
Novel safety and security issues,
A lack of consensus on policy, standards, and metrics, and
Issues around how to integrate humans into the operation and testing of these systems.

Recommendations for how to test autonomous military systems can be sorted into five broad groups:

Use certain processes for writing requirements, or for designing and developing systems,
Make targeted investments to develop methods or tools, improve our test infrastructure, or enhance our workforce’s AI skillsets,
Use specific proposed test frameworks,
Employ novel methods for system safety or cybersecurity, and
Adopt specific proposed policies, standards, or metrics.

Suggested Citation

Wojton, Heather M, Daniel J Porter, and John W Dennis. Test & Evaluation of AI-Enabled and Autonomous Systems: A Literature Review. IDA Document NS-D-14331. Alexandria, VA: Institute for Defense Analyses, 2020.

Paper:

Trustworthy Autonomy: A Roadmap to Assurance, Part 1: System Effectiveness

Wed, 01 Jan 2020 00:00:00 +0000

The Department of Defense (DoD) has invested significant effort over the past decade considering the role of artificial intelligence and autonomy in national security (e.g., Defense Science Board, 2012, 2016; Deputy Secretary of Defense, 2012; Endsley, 2015; Executive Order No. 13859, 2019; US Department of Defense, 2011, 2019; Zacharias, 2019a). However, these efforts were broadly scoped and only partially touched on how the DoD will certify the safety and performance of these systems. More recent work has done this big-picture thinking for the test and evaluation (T&E) community (e.g., Ahner & Parson, 2016; Haugh, Sparrow, & Tate, 2018; Porter et al., 2018; Sparrow, Tate, Biddle, Kaminski, & Madhavan, 2018; Zacharias, 2019b). In parallel, individual programs have been generating their own working-level solutions for their own particular use-cases and challenges.

The framework proposed in the current work bridges the gap between the big-picture policy recommendations already made and individual program needs. It is meant to serve as a roadmap that the T&E community can follow in order to provide evidence that artificial intelligence (AI)-enabled and autonomous systems function as intended. At times we echo broad policy recommendations made by others, as they will also enable T&E activities. In other places we make more specific recommendations relating to test planning and analysis. In this document, we present part one of our two-part roadmap. We discuss the challenges and possible solutions to assessing system effectiveness. A future part two will deal with test efficiency, simulation, and infrastructure. Due to the scope of this project, even the main body of this document only provides a survey of the challenges and our proposed solutions. However, this roadmap serves as an outline to a future series of technical papers covering these topics in detail for working-level testers and analysts.

Suggested Citation

Porter, Daniel, Michael McAnally, Chad Bieber, Heather Wojton, and Rebecca Medlin. Trustworthy Autonomy: A Roadmap to Assurance Part I: System Effectiveness. IDA Document P-10768-NS. Alexandria, VA: Institute for Defense Analyses, 2020.

Slides:

Paper:

Visualizing Data: I Don't Remember that Memo, but I Do Remember that Graph

Wed, 01 Jan 2020 00:00:00 +0000

IDA analysts strive to communicate clearly and effectively. Good data visualizations can enhance reports by making the conclusions easier to understand and more memorable. The goal of this seminar is to help you avoid settling for factory defaults and instead present your conclusions through visually appealing and understandable charts. Topics covered include choosing the right level of detail, guidelines for different types of graphical elements (titles, legends, annotations, etc.), selecting the right variable encodings (color, plot symbol, etc.), advice on practical implementations, and determining whether to include a chart at all. Most of the time, there’s no single “right” answer, so this presentation will include audience discussion to examine the trade-offs associated with different options.

Suggested Citation

Avery, Matthew, Heather Wojton, Andrew Flack, and Brian Vickers. Visualizing Data: I Don’t Remember That Memo, but I Do Remember That Graph. Alexandria, VA: Institute for Defense Analyses, 2020.

Freeman, Laura, Karl Glaeser, and Alethea Rucker. “Use of Statistically Designed Experiments to Inform Decisions in a Resource Constrained Environment.” ITEA Journal of Test and Evaluation 32, no. 3 (2011): 267–76.