Developing AI Trust: From Theory to Testing and the Myths in Between

This introductory work aims to provide members of the test and evaluation community with a clear understanding of trust and trustworthiness to support responsible and effective evaluation of AI systems. The paper provides a set of working definitions and works toward dispelling confusion and myths surrounding trust. Suggested Citation: Razin, Yosef S., and Kristen Alexander. “Developing AI Trust: From Theory to Testing and the Myths in Between.” The ITEA Journal of Test and Evaluation 45, no....

2024 · Yosef Razin, Kristen Alexander, John Haman

Operational T&E of AI-Supported Data Integration, Fusion, and Analysis Systems

AI will play an important role in future military systems. However, large questions remain about how to test AI systems, especially in operational settings. Here, we discuss an approach for the operational test and evaluation (OT&E) of AI-supported data integration, fusion, and analysis systems. We highlight new challenges posed by AI-supported systems and we discuss new and existing OT&E methods for overcoming them. We demonstrate how to apply these OT&E methods via a notional test concept that focuses on evaluating an AI-supported data integration system in terms of its technical performance (how accurate is the AI output?...

2024 · Adam Miller, Logan Ausman, John Haman, Keyla Pagan-Rivera, Sarah Shaffer, Brian Vickers
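
To make the abstract's "how accurate is the AI output?" question concrete, here is a minimal Python sketch of one technical-performance measure for a data-integration system: precision and recall of AI-proposed record links scored against analyst-adjudicated ground truth. The record identifiers and links below are hypothetical, invented purely for illustration; they are not from the paper's notional test concept.

# Hypothetical sketch: scoring the technical performance of an AI-supported
# data-integration system. All record pairs and labels are invented; a real
# OT&E event would use operationally collected, adjudicated data.

def precision_recall(predicted: set, truth: set) -> tuple[float, float]:
    """Precision and recall of predicted record links against ground truth."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Ground truth: record pairs analysts adjudicated as the same entity.
truth_links = {("track_01", "report_07"), ("track_02", "report_03"),
               ("track_04", "report_09")}

# Links the (notional) AI fusion system proposed.
ai_links = {("track_01", "report_07"), ("track_02", "report_05"),
            ("track_04", "report_09")}

p, r = precision_recall(ai_links, truth_links)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67

In practice such technical measures would sit alongside the operational measures the abstract alludes to; accuracy of the links alone says nothing about whether analysts can use them effectively.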

A Team-Centric Metric Framework for Testing and Evaluation of Human-Machine Teams

We propose and present a parallelized metric framework for evaluating human-machine teams that draws upon current knowledge of human-systems interfacing and integration but is rooted in team-centric concepts. Humans and machines working together as a team involves interactions that will only increase in complexity as machines become more intelligent, capable teammates. Assessing such teams will require explicit focus not just on human-machine interfacing but on the full spectrum of interactions between and among agents....

2023 · Wilkins, David Sparrow, Caitlan Fealing, Brian Vickers, Kristina Ferguson, Heather Wojton
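
As a hedged illustration of the parallelized, team-centric idea, the Python sketch below scores the same set of trials three ways in parallel: the human alone, the machine alone, and the team's joint decision. The trial structure and metric names are assumptions made for illustration, not the framework's actual metrics.

# Illustrative only: "parallel" metrics computed at the human, machine, and
# team levels, so team performance can be compared against its components.
from dataclasses import dataclass

@dataclass
class TeamTrial:
    human_correct: bool      # did the human's own judgment match ground truth?
    machine_correct: bool    # did the machine's recommendation match it?
    team_correct: bool       # did the final, joint decision match it?

def summarize(trials: list[TeamTrial]) -> dict[str, float]:
    n = len(trials)
    return {
        "human_accuracy": sum(t.human_correct for t in trials) / n,
        "machine_accuracy": sum(t.machine_correct for t in trials) / n,
        "team_accuracy": sum(t.team_correct for t in trials) / n,
    }

trials = [TeamTrial(True, False, True), TeamTrial(False, True, True),
          TeamTrial(True, True, True), TeamTrial(False, False, False)]
print(summarize(trials))

A team accuracy above both individual accuracies suggests complementarity; a team accuracy below either component points at the interaction, not the components, which is precisely why team-level metrics must be evaluated alongside agent-level ones.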

AI + Autonomy T&E in DoD

Test and evaluation (T&E) of AI-enabled systems (AIES) often emphasizes algorithm accuracy over robust, holistic system performance. While this narrow focus may be adequate for some applications of AI, for many complex uses, T&E paradigms removed from operational realism are insufficient. However, leveraging traditional operational testing (OT) methods to evaluate AIESs can fail to capture novel sources of risk. This brief establishes a common AI vocabulary and highlights OT challenges posed by AIESs by answering the following questions...

2023 · Brian Vickers, Matthew Avery, Rachel Haga, Mark Herrera, Daniel Porter, Stuart Rodgers
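
The brief's caution that algorithm accuracy can mask poor holistic performance can be shown with a toy example: the same notional detector is scored on a curated test set and on data shifted to mimic degraded operational conditions. All numbers below are synthetic and purely illustrative; they are not from the brief.

# Toy sketch: benchmark accuracy vs. accuracy under operational realism.
import random

random.seed(0)

def detector(x: float) -> bool:
    """Notional detector: flags a target when the signal exceeds a threshold."""
    return x > 0.5

def accuracy(samples: list[tuple[float, bool]]) -> float:
    return sum(detector(x) == label for x, label in samples) / len(samples)

# Curated test set: targets and clutter are cleanly separated.
curated = ([(random.uniform(0.6, 1.0), True) for _ in range(500)]
           + [(random.uniform(0.0, 0.4), False) for _ in range(500)])

# Operational conditions: sensor degradation pulls target signals downward.
operational = ([(random.uniform(0.3, 0.8), True) for _ in range(500)]
               + [(random.uniform(0.0, 0.4), False) for _ in range(500)])

print(f"curated accuracy:     {accuracy(curated):.2f}")      # ~1.00
print(f"operational accuracy: {accuracy(operational):.2f}")  # ~0.80

The gap between the two numbers is the kind of novel risk the brief argues a purely accuracy-focused T&E paradigm would miss.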

Artificial Intelligence & Autonomy Test & Evaluation Roadmap Goals

As the Department of Defense acquires new systems with artificial intelligence and autonomy (AI&A) capabilities, the test and evaluation (T&E) community will need to adapt to the challenges that these novel technologies present. The goals listed in this AI Roadmap address the broad range of tasks that the T&E community will need to accomplish in order to properly test, evaluate, verify, and validate AI-enabled and autonomous systems. The roadmap includes issues that are unique to AI and autonomous systems, as well as legacy T&E shortcomings that will be compounded by newer technologies....

2021 · Brian Vickers, Daniel Porter, Rachel Haga, Heather Wojton

T&E Contributions to Avoiding Unintended Behaviors in Autonomous Systems

To provide assurance that AI-enabled systems will behave appropriately across the range of their operating conditions without performing exhaustive testing, the DoD will need to make inferences about system decision-making. However, making these inferences validly requires understanding what causally drives system decision-making, which is not possible when systems are black boxes. In this briefing, we discuss the state of the art and gaps in techniques for obtaining, verifying, validating, and accrediting (OVVA) models of system decision-making....

2020 · Daniel Porter, Heather Wojton
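
One elementary way to gather evidence about decision drivers without opening the black box is a designed experiment over controllable operating conditions. The Python sketch below is a hypothetical illustration; the policy and its factors are invented and do not come from the briefing.

# Hedged sketch: factorial probing of an opaque decision policy.
from itertools import product

def black_box_policy(range_km: float, confidence: float, weather_ok: bool) -> str:
    """Stand-in for an opaque autonomous decision function (invented)."""
    return "engage" if confidence > 0.8 and range_km < 10 and weather_ok else "hold"

# Factorial probe over the operating conditions we can control in test.
ranges = [5.0, 15.0]
confidences = [0.7, 0.9]
weather = [True, False]

for r, c, w in product(ranges, confidences, weather):
    print(f"range={r:>4} conf={c} weather_ok={w!s:>5} -> "
          f"{black_box_policy(r, c, w)}")

# Varying factors systematically across such a design reveals which
# conditions flip the decision -- partial evidence about causal drivers,
# even without access to the system's internals.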

Test & Evaluation of AI-Enabled and Autonomous Systems: A Literature Review

We summarize a subset of the literature regarding the challenges to and recommendations for the test, evaluation, verification, and validation (TEV&V) of autonomous military systems. This literature review is meant for informational purposes only and does not make any recommendations of its own. A synthesis of the literature identified the following categories of TEV&V challenges: problems arising from the complexity of autonomous systems; challenges imposed by the structure of the current acquisition system;...

2020 · Heather Wojton, Daniel Porter, John Dennis

Trustworthy Autonomy: A Roadmap to Assurance, Part 1: System Effectiveness

The Department of Defense (DoD) has invested significant effort over the past decade considering the role of artificial intelligence and autonomy in national security (e.g., Defense Science Board, 2012, 2016; Deputy Secretary of Defense, 2012; Endsley, 2015; Executive Order No. 13859, 2019; US Department of Defense, 2011, 2019; Zacharias, 2019a). However, these efforts were broadly scoped and only partially touched on how the DoD will certify the safety and performance of these systems....

2020 · Daniel Porter, Michael McAnally, Chad Bieber, Heather Wojton, Rebecca Medlin

Demystifying the Black Box: A Test Strategy for Autonomy

The purpose of this briefing is to provide a high-level overview of how to frame the question of testing autonomous systems in a way that will enable the development of successful test strategies. The brief outlines the challenges and the broad-stroke reforms needed to prepare for the test challenges of the next century. Suggested Citation: Wojton, Heather M., and Daniel J. Porter. Demystifying the Black Box: A Test Strategy for Autonomy. IDA Document NS D-10465-NS....

2019 · Heather Wojton, Daniel Porter

Operational Testing of Systems with Autonomy

Systems with autonomy pose unique challenges for operational test. This document provides an executive-level overview of these issues and the proposed solutions and reforms. To be ready for the testing challenges of the next century, we will need to change the entire acquisition life cycle, starting from initial system conceptualization. This briefing was presented to the Director, Operational Test & Evaluation, along with his deputies and Chief Scientist....

2019 · Heather Wojton, Daniel Porter, Yevgeniya Pinelis, Chad Bieber, Michael McAnally, Laura Freeman