Test and evaluation (T&E) of AI-enabled systems (AIES) often emphasizes algorithm accuracy over robust, holistic system performance. While this narrow focus may be adequate for some applications of AI, for many complex uses, T&E paradigms removed from operational realism are insufficient. However, leveraging traditional operational testing (OT) methods to evaluate AIESs can fail to capture novel sources of risk. This brief establishes a common AI vocabulary and highlights OT challenges posed by AIESs by answering the following questions:
- What is “Artificial Intelligence (AI)”?
a. A brief “AI Primer” defines some common terms, highlights words that are used inconsistently, and discusses where definitions are insufficient for identifying systems that require additional T&E considerations.
- How does AI impact T&E?
a. AI isn’t new, but systems with AI pose new challenges and may require structural changes to how we conduct T&E.
- What makes DoD applications of AI unique?
a. Silicon Valley applications of AI often lack the task complexity and severe consequences of failure faced by DoD.
- What is the warfighter’s role?
a. T&E must ensure warfighters have calibrated trust and an adequate understanding of system behavior.
- What is the state of DoD AI T&E in IDA and OED?
Suggested Citation
Vickers, Brian D., Matthew R. Avery, Rachel A. Haga, Mark R. Herrera, Daniel J. Porter, Stuart M. Rodgers, and Rebecca M. Medlin. AI + Autonomy T&E in DoD. IDA Document NS 3000083. Alexandria, VA: Institute for Defense Analyses, 2023.