Test and evaluation (T&E) of AI-enabled systems (AIES) often emphasizes algorithm accuracy over robust, holistic system performance. While this narrow focus may be adequate for some applications of AI, for many complex uses, T&E paradigms removed from operational realism are insufficient. However, leveraging traditional operational testing (OT) methods to evaluate AIESs can fail to capture novel sources of risk. This brief establishes a common AI vocabulary and highlights OT challenges posed by AIESs by answering the following questions:

  1. What is “Artificial Intelligence (AI)”?

a. A brief “AI Primer” defines some common terms, highlights words that are used inconsistently, and discusses where definitions are insufficient for identifying systems that require additional T&E considerations.

  2. How does AI impact T&E?

a. AI isn’t new, but systems with AI pose new challenges and may require structural changes to how we conduct T&E.

  3. What makes DoD applications of AI unique?

a. Many Silicon Valley applications of AI lack the task complexity and high-consequence risks faced by DoD.

  4. What is the warfighter’s role?

a. T&E must ensure that warfighters have calibrated trust in, and an adequate understanding of, system behavior.

  5. What is the state of DoD AI T&E in IDA and OED?

Suggested Citation

Vickers, Brian D., Matthew R. Avery, Rachel A. Haga, Mark R. Herrera, Daniel J. Porter, Stuart M. Rodgers, and Rebecca M. Medlin. AI + Autonomy T&E in DoD. IDA Document NS 3000083. Alexandria, VA: Institute for Defense Analyses, 2023.
