Test and Evaluation

AI + Autonomy T&E in DoD

Test and evaluation (T&E) of AI-enabled systems (AIES) often emphasizes algorithm accuracy over robust, holistic system performance. While this narrow focus may be adequate for some applications of AI, for many complex uses, T&E paradigms removed from operational realism are insufficient. However, leveraging traditional operational testing (OT) methods to evaluate AIESs can fail to capture novel sources of risk. This brief establishes a common AI vocabulary and highlights OT challenges posed by AIESs by answering the following questions...

Introduction to Measuring Situational Awareness in Mission-Based Testing Scenarios

Situation Awareness (SA) plays a key role in decision making and human performance: higher operator SA is associated with increased operator performance and decreased operator errors. While maintaining or improving “situational awareness” is a common requirement for systems under test, there is no single standardized method or metric for quantifying SA in operational testing (OT). This leads to varied and sometimes suboptimal treatments of SA measurement across programs and test events....

Determining How Much Testing is Enough- An Exploration of Progress in the Department of Defense Test and Evaluation Community

This paper describes holistic progress in answering the question of “How much testing is enough?” It covers areas in which the T&E community has made progress, areas in which progress remains elusive, and issues that have emerged since 1994 that provide additional challenges. The selected case studies used to highlight progress are especially interesting examples, rather than a comprehensive look at all programs since 1994. Suggested Citation Medlin, Rebecca, Matthew R Avery, James R Simpson, and Heather M Wojton....

Trustworthy Autonomy- A Roadmap to Assurance -- Part 1- System Effectiveness

The Department of Defense (DoD) has invested significant effort over the past decade considering the role of artificial intelligence and autonomy in national security (e.g., Defense Science Board, 2012, 2016, Deputy Secretary of Defense, 2012, Endsley, 2015, Executive Order No. 13859, 2019, US Department of Defense, 2011, 2019, Zacharias, 2019a). However, these efforts were broadly scoped and only partially touched on how the DoD will certify the safety and performance of these systems....

Demystifying the Black Box- A Test Strategy for Autonomy

The purpose of this briefing is to provide a high-level overview of how to frame the question of testing autonomous systems in a way that will enable development of successful test strategies. The brief outlines the challenges and broad-stroke reforms needed to get ready for the test challenges of the next century. Suggested Citation Wojton, Heather M, and Daniel J Porter. Demystifying the Black Box: A Test Strategy for Autonomy. IDA Document NS D-10465-NS....

Perspectives on Operational Testing-Guest Lecture at Naval Postgraduate School

This document was prepared to support Dr. Lillard’s visit to the Naval Postgraduate School where he will provide a guest lecture to students in the T&E course. The briefing covers three primary themes: 1) evaluation of military systems on the basis of requirements and KPPs alone is often insufficient to determine effectiveness and suitability in combat conditions, 2) statistical methods are essential for developing defensible and rigorous test designs, 3) operational testing is often the only means to discover critical performance shortcomings....

Taking the Next Step- Improving the Science of Test in DoD T&E

The current fiscal climate demands now, more than ever, that test and evaluation (T&E) provide relevant and credible characterization of system capabilities and shortfalls across all relevant operational conditions as efficiently as possible. In determining the answer to the question, “How much testing is enough?” it is imperative that we use a scientifically defensible methodology. Design of Experiments (DOE) has a proven track record in Operational Test and Evaluation (OT&E) of not only quantifying how much testing is enough, but also where in the operational space the test points should be placed....
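
To make the idea of placing test points across the operational space concrete, here is a minimal sketch of a full factorial design, one of the simplest DOE layouts. The factor names and levels below are hypothetical, chosen only for illustration, and the abstract does not say which design types the paper recommends.

```python
from itertools import product

# Hypothetical operational factors and levels (illustrative assumptions,
# not taken from the source document).
FACTORS = {
    "time_of_day": ["day", "night"],
    "target_type": ["small", "medium", "large"],
    "terrain": ["open", "urban"],
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


design = full_factorial(FACTORS)

# Each entry is one test point in the operational space.
for run_number, point in enumerate(design, start=1):
    print(run_number, point)
```

With 2 x 3 x 2 levels this enumerates 12 test points. In practice a fractional or optimal design would often be used to cover the same space with fewer runs; the full factorial is shown here only because it is the easiest to read.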

Continuous Metrics for Efficient and Effective Testing

In today’s fiscal environment, efficient and effective testing is essential. Often, military system requirements are defined using probability of success as the primary measure of effectiveness – for example, a system must complete its mission 80 percent of the time; or the system must detect 90 percent of targets. The traditional approach to testing these probability-based requirements is to execute a series of trials and then total the number of successes; the ratio of successes to number of trials provides an intuitive measure of the probability of success....
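
The traditional pass/fail calculation described above can be sketched as follows. The trial counts are made up for illustration, and the Wilson score interval is an assumption added here to show how much uncertainty remains around the point estimate; the abstract itself mentions only the ratio of successes to trials.

```python
import math


def success_proportion(successes, trials):
    """Point estimate of probability of success: successes / trials."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    return successes / trials


def wilson_interval(successes, trials, z=1.959963984540054):
    """Approximate two-sided Wilson score interval (default z gives ~95%)."""
    p_hat = success_proportion(successes, trials)
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2.0 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z2 / (4.0 * trials * trials))
    ) / denom
    return center - half_width, center + half_width


# Hypothetical test outcome: 16 successful missions out of 20 trials.
estimate = success_proportion(16, 20)
low, high = wilson_interval(16, 20)
print(f"estimate={estimate:.2f}, interval=({low:.3f}, {high:.3f})")
```

For 16 successes in 20 trials the point estimate is 0.80, but the interval runs from roughly 0.58 to 0.92. Pass/fail data from a small number of trials therefore leaves wide uncertainty about whether an 80 percent requirement is met.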

Designed Experiments for the Defense Community

The areas of application for design of experiments principles have evolved, mimicking the growth of U.S. industries over the last century, from agriculture to manufacturing to chemical and process industries to the services and government sectors. In addition, statistically based quality programs adopted by businesses morphed from total quality management to Six Sigma and, most recently, statistical engineering (see Hoerl and Snee 2010). The good news about these transformations is that each evolution contains more technical substance, embedding the methodologies as core competencies, and is less of a “program....

Statistically Based T&E Using Design of Experiments

This document outlines the charter for the Committee to Institutionalize Scientific Test Design and Rigor in Test and Evaluation. The charter defines the problem, identifies potential steps in a roadmap for accomplishing the goals of the committee, and lists committee membership. Once the committee is assembled, the members will revise this document as needed. The charter will be endorsed by DOT&E and DDT&E, once finalized. Suggested Citation Freeman, Laura. Statistically Based T&E Using Design of Experiments....

Design for Reliability using Robust Parameter Design

Recently, the principles of Design of Experiments (DOE) have been implemented as a method of increasing the statistical rigor of operational tests. The focus has been on ensuring coverage of the operational envelope in terms of system effectiveness. DOE is applicable in reliability analysis as well. A reliability standard, ANSI-0009, advocates the use of Design for Reliability (DfR) early in the product development cycle in order to design-in reliability. Robust parameter design (RPD), first used by Taguchi and then by the response surface community, provides insights on how DOE can be used to make products and processes invariant to changes in factors....

Use of Statistically Designed Experiments to Inform Decisions in a Resource Constrained Environment

There has been recent emphasis on the increased use of statistics, including the use of statistically designed experiments, to plan and execute tests that support Department of Defense (DoD) acquisition programs. The use of statistical methods, including experimental design, has shown great benefits in industry, especially when used in an integrated fashion; for example, see the literature on Six Sigma. The structured approach of experimental design allows the user to determine what data need to be collected and how they should be analyzed to achieve specific decision-making objectives....