AI + Autonomy T&E in DoD

Test and evaluation (T&E) of AI-enabled systems (AIES) often emphasizes algorithm accuracy over robust, holistic system performance. While this narrow focus may be adequate for some applications of AI, for many complex uses, T&E paradigms removed from operational realism are insufficient. However, leveraging traditional operational testing (OT) methods to evaluate AIESs can fail to capture novel sources of risk. This brief establishes a common AI vocabulary and highlights the OT challenges posed by AIESs by answering the following questions...

2023 · Brian Vickers, Matthew Avery, Rachel Haga, Mark Herrera, Daniel Porter, Stuart Rodgers

Measuring Training Efficacy: Structural Validation of the Operational Assessment of Training Scale

Effective training of the broad set of users and operators of a system has downstream impacts on usability, workload, and ultimate system performance, all of which are related to mission success. To measure training effectiveness, we designed a survey called the Operational Assessment of Training Scale (OATS) in partnership with the Army Test and Evaluation Center (ATEC). Two subscales were designed to assess the degree to which training covered relevant content for real operations (Relevance subscale) and enabled self-rated ability to interact with systems effectively after training (Efficacy subscale)....

2022 · Brian Vickers, Rachel Haga, Daniel Porter, Heather Wojton
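Subscale scores for a survey like OATS are typically computed as the mean of each subscale's item ratings. The sketch below illustrates that convention only; the item names and the 1-5 rating scale are hypothetical, not the actual OATS items.

```python
# Illustrative scoring of a two-subscale survey. Item IDs and the
# 1-5 rating scale are made up for this sketch.
from statistics import mean

RELEVANCE_ITEMS = ["rel_1", "rel_2", "rel_3"]  # hypothetical item IDs
EFFICACY_ITEMS = ["eff_1", "eff_2", "eff_3"]   # hypothetical item IDs

def score_subscales(response: dict) -> dict:
    """Return the mean rating per subscale for one respondent."""
    return {
        "relevance": mean(response[i] for i in RELEVANCE_ITEMS),
        "efficacy": mean(response[i] for i in EFFICACY_ITEMS),
    }

resp = {"rel_1": 4, "rel_2": 5, "rel_3": 4,
        "eff_1": 3, "eff_2": 4, "eff_3": 3}
print(score_subscales(resp))
```

Averaging within subscales keeps the two constructs (relevance and efficacy) separable, which is what enables the structural validation the paper describes.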

Artificial Intelligence & Autonomy Test & Evaluation Roadmap Goals

As the Department of Defense acquires new systems with artificial intelligence (AI) and autonomy (AI&A) capabilities, the test and evaluation (T&E) community will need to adapt to the challenges that these novel technologies present. The goals listed in this AI Roadmap address the broad range of tasks that the T&E community will need to achieve in order to properly test, evaluate, verify, and validate AI-enabled and autonomous systems. The roadmap includes issues that are unique to AI and autonomous systems, as well as legacy T&E shortcomings that will be compounded by newer technologies....

2021 · Brian Vickers, Daniel Porter, Rachel Haga, Heather Wojton

T&E Contributions to Avoiding Unintended Behaviors in Autonomous Systems

To provide assurance that AI-enabled systems will behave appropriately across the range of their operating conditions without performing exhaustive testing, the DoD will need to make inferences about system decision making. However, making these inferences validly requires understanding what causally drives system decision-making, which is not possible when systems are black boxes. In this briefing, we discuss the state of the art and gaps in techniques for obtaining, verifying, validating, and accrediting (OVVA) models of system decision-making....

2020 · Daniel Porter, Heather Wojton

Test & Evaluation of AI-Enabled and Autonomous Systems: A Literature Review

We summarize a subset of the literature regarding the challenges to and recommendations for the test, evaluation, verification, and validation (TEV&V) of autonomous military systems. This literature review is meant for informational purposes only and does not make any recommendations of its own. A synthesis of the literature identified the following categories of TEV&V challenges: problems arising from the complexity of autonomous systems; challenges imposed by the structure of the current acquisition system;...

2020 · Heather Wojton, Daniel Porter, John Dennis

Trustworthy Autonomy: A Roadmap to Assurance, Part 1: System Effectiveness

The Department of Defense (DoD) has invested significant effort over the past decade considering the role of artificial intelligence and autonomy in national security (e.g., Defense Science Board, 2012, 2016; Deputy Secretary of Defense, 2012; Endsley, 2015; Executive Order No. 13859, 2019; US Department of Defense, 2011, 2019; Zacharias, 2019a). However, these efforts were broadly scoped and only partially touched on how the DoD will certify the safety and performance of these systems....

2020 · Daniel Porter, Michael McAnally, Chad Bieber, Heather Wojton, Rebecca Medlin

Demystifying the Black Box: A Test Strategy for Autonomy

The purpose of this briefing is to provide a high-level overview of how to frame the question of testing autonomous systems in a way that will enable development of successful test strategies. The brief outlines the challenges and broad-stroke reforms needed to get ready for the test challenges of the next century. Suggested Citation: Wojton, Heather M., and Daniel J. Porter. Demystifying the Black Box: A Test Strategy for Autonomy. IDA Document NS D-10465-NS....

2019 · Heather Wojton, Daniel Porter

Initial Validation of the Trust of Automated Systems Test (TOAST)

Trust is a key determinant of whether people in the military and the general public rely on automated systems. However, there is currently no standard for measuring trust in automated systems. In the present studies, we propose a scale to measure trust in automated systems that is grounded in current research and theory on trust formation, which we refer to as the Trust of Automated Systems Test (TOAST). We evaluated both the reliability of the scale structure and its criterion validity using independent military-affiliated and civilian samples....

2019 · Heather Wojton, Daniel Porter, Stephanie Lane, Chad Bieber, Poornima Madhavan
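Scale reliability of the kind evaluated in such validation studies is commonly summarized with Cronbach's alpha, which compares the sum of per-item variances to the variance of respondents' total scores. A minimal sketch, using made-up ratings and a three-item scale (not the actual TOAST items):

```python
# Cronbach's alpha for internal-consistency reliability.
# Data below are invented for illustration.
def cronbach_alpha(items):
    """items: one list of scores per item; all lists cover the same respondents."""
    k = len(items)           # number of items
    n = len(items[0])        # number of respondents

    def var(xs):             # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_var = sum(var(it) for it in items)
    totals = [sum(it[j] for it in items) for j in range(n)]
    return (k / (k - 1)) * (1 - item_var / var(totals))

items = [
    [5, 4, 2, 1],  # item 1 ratings across four respondents (made up)
    [5, 5, 2, 1],  # item 2
    [4, 4, 1, 2],  # item 3
]
print(cronbach_alpha(items))  # high alpha: items move together
```

Values near 1 indicate that items rise and fall together across respondents, which is one ingredient (alongside criterion validity) in judging whether a scale's structure holds up.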

Operational Testing of Systems with Autonomy

Systems with autonomy pose unique challenges for operational test. This document provides an executive level overview of these issues and the proposed solutions and reforms. In order to be ready for the testing challenges of the next century, we will need to change the entire acquisition life cycle, starting even from initial system conceptualization. This briefing was presented to the Director, Operational Test & Evaluation along with his deputies and Chief Scientist....

2019 · Heather Wojton, Daniel Porter, Yevgeniya Pinelis, Chad Bieber, Michael McAnally, Laura Freeman

Pilot Training Next: Modeling Skill Transfer in a Military Learning Environment

Pilot Training Next is an exploratory investigation of new technologies and procedures to increase the efficiency of Undergraduate Pilot Training in the United States Air Force. IDA analysts present a method of quantifying skill transfer from simulators to aircraft under realistic, uncontrolled conditions. Suggested Citation: Porter, Daniel, Emily Fedele, and Heather Wojton. Pilot Training Next: Modeling Skill Transfer in a Military Learning Environment. IDA Document NS D-10927. Alexandria, VA: Institute for Defense Analyses, 2019....

2019 · Daniel Porter, Emily Fedele, Heather Wojton
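As a toy illustration of quantifying transfer, one can regress early aircraft performance on prior simulator performance; the slope indicates how strongly simulator gains carry over. This sketch uses invented data and a simple ordinary-least-squares slope, not the richer model the paper develops for uncontrolled conditions.

```python
# Crude transfer estimate: OLS slope of aircraft scores on simulator scores.
# All data are hypothetical.
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

sim = [10, 20, 30, 40]    # hypothetical simulator hours
acft = [55, 62, 71, 78]   # hypothetical first-sortie aircraft scores
print(ols_slope(sim, acft))
```

A positive slope is consistent with transfer, though under realistic conditions confounds (student aptitude, instructor effects, scheduling) must be modeled before drawing that conclusion.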

A Multi-Method Approach to Evaluating Human-System Interactions During Operational Testing

The purpose of this paper was to identify the shortcomings of a single-method approach to evaluating human-system interactions during operational testing and to offer an alternative, multi-method approach that is more defensible, yields richer insights into how operators interact with weapon systems, and provides practical guidance for identifying when the quality of human-system interactions warrants correction through either operator training or system redesign. Suggested Citation: Thomas, Dean, Heather Wojton, Chad Bieber, and Daniel Porter....

2017 · Dean Thomas, Heather Wojton, Chad Bieber, Daniel Porter