Introduction to Human-Systems Interaction in Operational Test and Evaluation Course

Human-System Interaction (HSI) is the study of interfaces between humans and technical systems. The Department of Defense incorporates HSI evaluations into defense acquisition to improve system performance and reduce lifecycle costs. During operational test and evaluation, HSI evaluations characterize how a system’s operational performance is affected by its users. The goal of this course is to provide the theoretical background and practical tools necessary to develop and evaluate HSI test plans, collect and analyze HSI data, and report on HSI results....

2024 · Adam Miller, Keyla Pagan-Rivera

Meta-Analysis of the Effectiveness of the SALIANT Procedure for Assessing Team Situation Awareness

Many Department of Defense (DoD) systems aim to increase or maintain Situational Awareness (SA) at the individual or group level. In some cases, maintenance or enhancement of SA is listed as a primary function or requirement of the system. However, during test and evaluation, SA is examined inconsistently or is not measured at all. Situational Awareness Linked Indicators Adapted to Novel Tasks (SALIANT) is an empirically based methodology meant to measure SA at the team, or group, level....

2024 · Sarah Shaffer, Miriam Armstrong

Quantifying Uncertainty to Keep Astronauts and Warfighters Safe

Both NASA and DOT&E increasingly rely on computer models to supplement data collection, and utilize statistical distributions to quantify the uncertainty in models, so that decision-makers are equipped with the most accurate information about system performance and model fitness. This article provides a high-level overview of uncertainty quantification (UQ) through an example assessment for the reliability of a new space-suit system. The goal is to reach a more general audience in Significance Magazine, and convey the importance and relevance of statistics to the defense and aerospace communities....
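
As a rough illustration of what UQ looks like in code (a minimal sketch with invented numbers, not the article's space-suit analysis), the snippet below propagates uncertainty about a failure rate through a simple reliability model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical component model (not the article's actual analysis): failure
# times are exponential with rate lam, and limited test data leave lam
# uncertain, summarized here by a gamma distribution.
lam = rng.gamma(shape=20.0, scale=1 / 2000, size=10_000)

# Propagate the parameter uncertainty through the model: the probability of
# surviving a 100-hour mission, computed for each plausible value of lam.
mission_hours = 100
p_survive = np.exp(-lam * mission_hours)

# Decision-makers see an interval, not just a point estimate.
lo, med, hi = np.percentile(p_survive, [2.5, 50, 97.5])
print(f"P(survive {mission_hours} h) = {med:.3f}  (95% interval {lo:.3f} to {hi:.3f})")
```

The output is an interval rather than a single number, which is the core message of uncertainty quantification.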

2024 · John Haman, John Dennis, James Warner

AI + Autonomy T&E in DoD

Test and evaluation (T&E) of AI-enabled systems (AIES) often emphasizes algorithm accuracy over robust, holistic system performance. While this narrow focus may be adequate for some applications of AI, for many complex uses, T&E paradigms removed from operational realism are insufficient. However, leveraging traditional operational testing (OT) methods to evaluate AIESs can fail to capture novel sources of risk. This brief establishes a common AI vocabulary and highlights OT challenges posed by AIESs by answering the following questions...

2023 · Brian Vickers, Matthew Avery, Rachel Haga, Mark Herrera, Daniel Porter, Stuart Rodgers

CDV Method for Validating AJEM using FUSL Test Data

M&S validation is critical for ensuring credible weapon system evaluations. System-level evaluations of Armored Fighting Vehicles (AFV) rely on the Advanced Joint Effectiveness Model (AJEM) and Full-Up System Level (FUSL) testing to assess AFV vulnerability. This report reviews and improves upon one of the primary methods that analysts use to validate AJEM, called the Component Damage Vector (CDV) Method. The CDV Method compares vehicle components that were damaged in FUSL testing to simulated representations of that damage from AJEM....

2023 · Thomas Johnson, Lindsey Butler, David Grimm, John Haman, Kerry Walzl

Data Principles for Operational and Live-Fire Testing

Many DoD systems undergo operational testing, which is a field test involving realistic combat conditions. Data, analysis, and reporting are the fundamental outcomes of operational test, which support leadership decisions. The importance of data standardization and interoperability is widely recognized by DoD leadership; however, there are no generally recognized standards for the management and handling of data (format, pedigree, architecture, transferability, etc.) in the DoD. In this presentation, I will review a set of data principles that we believe the DoD should adopt to improve how it manages test data....

2023 · John Haman, Matthew Avery

Framework for Operational Test Design: An Example Application of Design Thinking

This poster provides an example of how a design thinking framework can facilitate operational test design. Design thinking is a problem-solving approach of interest to many groups, including those in the test and evaluation community. Design thinking promotes the principles of human-centeredness, iteration, and diversity, and it can be accomplished via a five-phased approach. Following this approach, designers create innovative product solutions by (1) conducting research to empathize with their users, (2) defining specific user problems, (3) ideating on solutions that address the defined problems, (4) prototyping the product, and (5) testing the prototype....

2023 · Miriam Armstrong

Introduction to Design of Experiments for Testers

This training provides details regarding the use of design of experiments, from choosing proper response variables, to identifying factors that could affect such responses, to determining the amount of data necessary to collect. The training also explains the benefits of using a Design of Experiments approach to testing and provides an overview of commonly used designs (e.g., factorial, optimal, and space-filling). The briefing illustrates the concepts discussed using several case studies....
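
For a flavor of the ideas covered, the sketch below enumerates a full factorial design in Python; the factors and levels are hypothetical, and the training itself also covers optimal and space-filling alternatives that need fewer runs:

```python
from itertools import product

import pandas as pd

# Hypothetical factors and levels for a detection test (illustrative only,
# not an example from the briefing).
factors = {
    "range_nm": [5, 20],
    "sea_state": ["calm", "rough"],
    "operator_experience": ["novice", "expert"],
}

# A 2^3 full factorial enumerates every combination of levels, so main
# effects and interactions can be estimated without confounding.
design = pd.DataFrame(list(product(*factors.values())), columns=list(factors))
print(design)  # 8 runs; optimal or space-filling designs can reduce this
```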

2023 · Breeana Anderson, Rebecca Medlin, John Haman, Kelly Avery, Keyla Pagan-Rivera

Statistical Methods Development Work for M&S Validation

We discuss four areas in which statistically rigorous methods contribute to modeling and simulation validation studies. These areas are statistical risk analysis, space-filling experimental designs, metamodel construction, and statistical validation. Taken together, these areas implement DOT&E guidance on model validation. In each area, IDA has contributed either research methods, user-friendly tools, or both. We point to our tools on testscience.org, and survey the research methods that we’ve contributed to the M&S validation literature...

2023 · Curtis Miller

Statistical Methods for M&S V&V: An Intro for Non-Statisticians

This is a briefing intended to motivate and explain the basic concepts of applying statistics to verification and validation. The briefing will be presented at the Navy M&S VV&A WG (Sub-WG on Validation Statistical Method Selection). Suggested Citation: Pagan-Rivera, Keyla, John T Haman, Kelly M Avery, and Curtis G Miller. Statistical Methods for M&S V&V: An Intro for Non-Statisticians. IDA Product ID-3000770. Alexandria, VA: Institute for Defense Analyses, 2024....

2023 · John Haman, Kelly Avery, Curtis Miller

Analysis Apps for the Operational Tester

In the acquisition and testing world, data analysts repeatedly encounter certain categories of data, such as time or distance until an event (e.g., failure, alert, detection), binary outcomes (e.g., success/failure, hit/miss), and survey responses. Analysts need tools that enable them to produce quality and timely analyses of the data they acquire during testing. This poster presents four web-based apps that can analyze these types of data. The apps are designed to assist analysts and researchers with simple repeatable analysis tasks, such as building summary tables and plots for reports or briefings....

2022 · William Whitledge

Case Study on Applying Sequential Analyses in Operational Testing

Sequential analysis concerns statistical evaluation in which the number, pattern, or composition of the data is not determined at the start of the investigation, but instead depends on the information acquired during the investigation. Although sequential analysis originated in ballistics testing for the Department of Defense (DoD) and is widely used in other disciplines, it is underutilized in the DoD. Expanding the use of sequential analysis may save money and reduce test time....
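
As a concrete, hedged sketch (illustrative thresholds and data, not material from the case study), Wald's sequential probability ratio test shows how a sequential procedure stops as soon as the accumulated evidence crosses a decision boundary:

```python
import math
import random

# Wald's SPRT for a success probability: H0: p = 0.80 vs H1: p = 0.95.
p0, p1 = 0.80, 0.95
alpha, beta = 0.05, 0.20                  # allowed type I / type II error rates
lower = math.log(beta / (1 - alpha))      # accept H0 at or below this bound
upper = math.log((1 - beta) / alpha)      # accept H1 at or above this bound

random.seed(7)
llr, n = 0.0, 0
while lower < llr < upper:
    n += 1
    success = random.random() < 0.93      # simulate one trial (true p = 0.93)
    # Update the cumulative log-likelihood ratio after each observed trial.
    llr += math.log(p1 / p0) if success else math.log((1 - p1) / (1 - p0))

verdict = "accept H1" if llr >= upper else "accept H0"
print(f"Stopped after {n} trials: {verdict} (LLR = {llr:.2f})")
```

Because the boundaries are checked after every trial, a sequential test often concludes well before a comparable fixed-sample test would.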

2022 · Rebecca Medlin, Keyla Pagán-Rivera, Jay Dennis, Monica Ahrens

Introduction to Git

Version control software manages, archives, and (optionally) distributes different versions of files. The most popular program for version control is Git, which serves as the backbone of websites such as Github, Bitbucket, and others. In this mini-tutorial, we will introduce the basics of version control in general, and Git in particular. We explain what role Git plays in a reproducible research context. The goal of the course is to get participants started using Git....

2022 · Curtis Miller

Measuring Training Efficacy: Structural Validation of the Operational Assessment of Training Scale

Effective training of the broad set of users/operators of systems has downstream impacts on usability, workload, and ultimate system performance that are related to mission success. In order to measure training effectiveness, we designed a survey called the Operational Assessment of Training Scale (OATS) in partnership with the Army Test and Evaluation Center (ATEC). Two subscales were designed to assess the degrees to which training covered relevant content for real operations (Relevance subscale) and enabled self-rated ability to interact with systems effectively after training (Efficacy subscale)....

2022 · Brian Vickers, Rachel Haga, Daniel Porter, Heather Wojton

What Statisticians Should Do to Improve M&S Validation Studies

It is often said that many research findings – from social sciences, medicine, economics, and other disciplines – are false. This fact is trumpeted in the media and by many statisticians. There are several reasons that false research is published, but to what extent should we be worried about them in defense testing and modeling and simulation? In this talk I will present several recommendations for actions that statisticians and data scientists can take to improve the quality of our validations and evaluations....

2022 · John Haman

Artificial Intelligence & Autonomy Test & Evaluation Roadmap Goals

As the Department of Defense acquires new systems with artificial intelligence (AI) and autonomy (AI&A) capabilities, the test and evaluation (T&E) community will need to adapt to the challenges that these novel technologies present. The goals listed in this AI Roadmap address the broad range of tasks that the T&E community will need to achieve in order to properly test, evaluate, verify, and validate AI-enabled and autonomous systems. It includes issues that are unique to AI and autonomous systems, as well as legacy T&E shortcomings that will be compounded by newer technologies....

2021 · Brian Vickers, Daniel Porter, Rachel Haga, Heather Wojton

Determining How Much Testing is Enough: An Exploration of Progress in the Department of Defense Test and Evaluation Community

This paper describes holistic progress in answering the question of “How much testing is enough?” It covers areas in which the T&E community has made progress, areas in which progress remains elusive, and issues that have emerged since 1994 that provide additional challenges. The selected case studies used to highlight progress are especially interesting examples, rather than a comprehensive look at all programs since 1994. Suggested Citation: Medlin, Rebecca, Matthew R Avery, James R Simpson, and Heather M Wojton....

2021 · Rebecca Medlin, Matthew Avery, James Simpson, Heather Wojton

Introduction to Qualitative Methods

Qualitative data, captured through free-form comment boxes, interviews, focus groups, and activity observation, is heavily employed in test and evaluation (T&E). The qualitative research approach can offer many benefits, but knowledge of how to implement methods, collect data, and analyze data according to rigorous qualitative research standards is not broadly understood within the T&E community. This tutorial offers insight into the foundational concepts of method and practice that embody defensible approaches to qualitative research....

2021 · Kristina Carter, Emily Fedele, Daniel Hellmann

Why are Statistical Engineers Needed for Test & Evaluation?

The Department of Defense (DoD) develops and acquires some of the world’s most advanced and sophisticated systems. As new technologies emerge and are incorporated into systems, OSD/DOT&E faces the challenge of ensuring that these systems undergo adequate and efficient test and evaluation (T&E) prior to operational use. Statistical engineering is a collaborative, analytical approach to problem solving that integrates statistical thinking, methods, and tools with other relevant disciplines. The statistical engineering process provides better solutions to large, unstructured, real-world problems and supports rigorous decision-making....

2021 · Rebecca Medlin, Keyla Pagan-Rivera, Monica Ahrens

A Validation Case Study: The Environment Centric Weapons Analysis Facility (ECWAF)

Reliable modeling and simulation (M&S) allows the undersea warfare community to understand torpedo performance in scenarios that could never be created in live testing, and do so for a fraction of the cost of an in-water test. The Navy hopes to use the Environment Centric Weapons Analysis Facility (ECWAF), a hardware-in-the-loop simulation, to predict torpedo effectiveness and supplement live operational testing. In order to trust the model’s results, the T&E community has applied rigorous statistical design of experiments techniques to both live and simulation testing....

2020 · Elliot Bartis, Steven Rabinowitz

T&E Contributions to Avoiding Unintended Behaviors in Autonomous Systems

To provide assurance that AI-enabled systems will behave appropriately across the range of their operating conditions without performing exhaustive testing, the DoD will need to make inferences about system decision making. However, making these inferences validly requires understanding what causally drives system decision-making, which is not possible when systems are black boxes. In this briefing, we discuss the state of the art and gaps in techniques for obtaining, verifying, validating, and accrediting (OVVA) models of system decision-making....

2020 · Daniel Porter, Heather Wojton

Test & Evaluation of AI-Enabled and Autonomous Systems: A Literature Review

We summarize a subset of the literature regarding the challenges to and recommendations for the test, evaluation, verification, and validation (TEV&V) of autonomous military systems. This literature review is meant for informational purposes only and does not make any recommendations of its own. A synthesis of the literature identified the following categories of TEV&V challenges: problems arising from the complexity of autonomous systems, challenges imposed by the structure of the current acquisition system,...

2020 · Heather Wojton, Daniel Porter, John Dennis

Trustworthy Autonomy: A Roadmap to Assurance, Part 1: System Effectiveness

The Department of Defense (DoD) has invested significant effort over the past decade considering the role of artificial intelligence and autonomy in national security (e.g., Defense Science Board, 2012, 2016; Deputy Secretary of Defense, 2012; Endsley, 2015; Executive Order No. 13859, 2019; US Department of Defense, 2011, 2019; Zacharias, 2019a). However, these efforts were broadly scoped and only partially touched on how the DoD will certify the safety and performance of these systems....

2020 · Daniel Porter, Michael McAnally, Chad Bieber, Heather Wojton, Rebecca Medlin

Visualizing Data: I Don't Remember that Memo, but I Do Remember that Graph

IDA analysts strive to communicate clearly and effectively. Good data visualizations can enhance reports by making the conclusions easier to understand and more memorable. The goal of this seminar is to help you avoid settling for factory defaults and instead present your conclusions through visually appealing and understandable charts. Topics covered include choosing the right level of detail, guidelines for different types of graphical elements (titles, legends, annotations, etc.), selecting the right variable encodings (color, plot symbol, etc....

2020 · Matthew Avery, Andrew Flack, Brian Vickers, Heather Wojton

Demystifying the Black Box: A Test Strategy for Autonomy

The purpose of this briefing is to provide a high-level overview of how to frame the question of testing autonomous systems in a way that will enable development of successful test strategies. The brief outlines the challenges and broad-stroke reforms needed to get ready for the test challenges of the next century. Suggested Citation: Wojton, Heather M, and Daniel J Porter. Demystifying the Black Box: A Test Strategy for Autonomy. IDA Document NS D-10465-NS....

2019 · Heather Wojton, Daniel Porter

Designing Experiments for Model Validation: The Foundations for Uncertainty Quantification

Advances in computational power have allowed both greater fidelity and more extensive use of computer models. Numerous complex military systems have corresponding models that simulate their performance in the field. In response, the DoD needs defensible practices for validating these models. Design of Experiments and statistical analysis techniques are the foundational building blocks for validating the use of computer models and quantifying uncertainty in that validation. Recent developments in uncertainty quantification have the potential to benefit the DoD in using modeling and simulation to inform operational evaluations....

2019 · Heather Wojton, Kelly Avery, Laura Freeman, Thomas Johnson

Managing T&E Data to Encourage Reuse

Reusing Test and Evaluation (T&E) datasets multiple times at different points throughout a program’s lifecycle is one way to realize their full value. Data management plays an important role in enabling, and even encouraging, this practice. Although Department-level policy on data management is supportive of reuse and consistent with best practices from industry and academia, the documents that shape the day-to-day activities of T&E practitioners are much less so....

2019 · Andrew Flack, Rebecca Medlin

Pilot Training Next: Modeling Skill Transfer in a Military Learning Environment

Pilot Training Next is an exploratory investigation of new technologies and procedures to increase the efficiency of Undergraduate Pilot Training in the United States Air Force. IDA analysts present a method of quantifying skill transfer from simulators to aircraft under realistic, uncontrolled conditions. Suggested Citation: Porter, Daniel, Emily Fedele, and Heather Wojton. Pilot Training Next: Modeling Skill Transfer in a Military Learning Environment. IDA Document NS D-10927. Alexandria, VA: Institute for Defense Analyses, 2019....

2019 · Daniel Porter, Emily Fedele, Heather Wojton

Reproducible Research Mini-Tutorial

Analyses are reproducible if the same methods applied to the same data produce identical results when run again by another researcher (or you in the future). Reproducible analyses are transparent and easy for reviewers to verify, as results and figures can be traced directly to the data and methods that produced them. There are also direct benefits to the researcher. Real-world analysis workflows inevitably require changes to incorporate new or additional data, or to address feedback from collaborators, reviewers, or sponsors....

2019 · Andrew Flack, John Haman, Kevin Kirshenbaum

Statistics Boot Camp

In the test community, we frequently use statistics to extract meaning from data. These inferences may be drawn with respect to topics ranging from system performance to human factors. In this mini-tutorial, we will begin by discussing the use of descriptive and inferential statistics. We will continue by discussing commonly used parametric and nonparametric statistics within the defense community, ranging from comparisons of distributions to comparisons of means. We will conclude with a brief discussion of how to present your statistical findings graphically for maximum impact....
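
To make the parametric/nonparametric distinction concrete, the sketch below runs both kinds of comparison on invented detection-time data (illustrative only, not material from the tutorial):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Notional detection times (seconds) for two system variants.
baseline = rng.lognormal(mean=2.0, sigma=0.5, size=30)
upgrade = rng.lognormal(mean=1.7, sigma=0.5, size=30)

# Parametric comparison of means (Welch's t-test, no equal-variance assumption).
t_stat, t_p = stats.ttest_ind(baseline, upgrade, equal_var=False)

# Nonparametric comparison of distributions (Mann-Whitney U), often safer
# for skewed operational data such as times-to-event.
u_stat, u_p = stats.mannwhitneyu(baseline, upgrade)

print(f"Welch t-test:   p = {t_p:.4f}")
print(f"Mann-Whitney U: p = {u_p:.4f}")
```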

2019 · Kelly Avery, Stephanie Lane

Survey Testing Automation Tool (STAT)

In operational testing, survey administration is typically a manual, paper-driven process. We developed a web-based tool called Survey Testing Automation Tool (STAT), which integrates and automates survey construction, administration, and analysis procedures. STAT introduces a standardized approach to the construction of surveys and includes capabilities for survey management, survey planning, and form generation. Suggested Citation: Finnegan, Gary M, Kelly Tran, Tara A McGovern, and William R Whitledge. Survey Testing Automation Tool (STAT)....

2019 · Kelly Tran, Tara McGovern, William Whitledge

Use of Design of Experiments in Survivability Testing

The purpose of survivability testing is to provide decision makers with relevant, credible evidence about the survivability of an aircraft that is conveyed with some degree of certainty or inferential weight. In developing an experiment to accomplish this goal, a test planner faces numerous questions: What critical issue or issues are being addressed? What data are needed to answer the critical issues? What test conditions should be varied? What is the most economical way of varying those conditions?...

2019 · Thomas Johnson, Mark Couch, John Haman, Heather Wojton

A Groundswell for Test and Evaluation

The fundamental purpose of test and evaluation (T&E) in the Department of Defense (DOD) is to provide knowledge to answer critical questions that help decision makers manage the risk involved in developing, producing, operating, and sustaining systems and capabilities. At its core, T&E takes data and translates it into information for decision makers. Subject matter expertise of the platform and operational mission have always been critical components of developing defensible test and evaluation strategies....

2018 · Laura Freeman

Informing the Warfighter—Why Statistical Methods Matter in Defense Testing

Suggested Citation: Freeman, Laura J., and Catherine Warner. “Informing the Warfighter—Why Statistical Methods Matter in Defense Testing.” CHANCE 31, no. 2 (April 3, 2018): 4–11. https://doi.org/10.1080/09332480.2018.1467627.

2018 · Laura Freeman, Catherine Warner

JEDIS Briefing and Tutorial

Are you sick of having to manually iterate your way through sizing your design of experiments? Come learn about JEDIS, the new IDA-developed JMP Add-In for automating design of experiments power calculations. JEDIS builds multiple test designs in JMP over user-specified ranges of sample sizes, Signal-to-Noise Ratios (SNR), and alpha (1 - confidence) levels. It then automatically calculates the statistical power to detect an effect due to each factor and any specified interactions for each design....
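
The sketch below reproduces the spirit of such a power sweep in plain SciPy (it is not the JEDIS add-in, and the SNR, alpha, and run counts are placeholders): power to detect a two-level factor effect in a balanced design as the number of runs grows.

```python
import numpy as np
from scipy import stats

snr = 1.0      # signal-to-noise ratio: effect size delta / sigma
alpha = 0.10   # significance level (1 - confidence)

for n in (8, 16, 24, 32, 48):          # total runs, split evenly over levels
    df = n - 2
    ncp = snr * np.sqrt(n / 4)         # noncentrality for a balanced 2-level factor
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Power = probability the noncentral t statistic exceeds the critical value.
    power = 1 - stats.nct.cdf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)
    print(f"n = {n:2d}: power = {power:.2f}")
```

Sweeping this calculation over ranges of n, SNR, and alpha is exactly the bookkeeping the add-in automates.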

2018 · Jason Sheldon

Reliability Best Practices and Lessons Learned in the Department of Defense

Despite the importance of acquiring reliable systems to support the warfighter, many military programs fail to meet reliability requirements, which affects the overall suitability and cost of the system. To determine ways to improve reliability outcomes in the future, research staff from the Institute for Defense Analyses Operational Evaluation Division compiled case studies identifying reliability lessons learned and best practices for several DOT&E oversight programs. The case studies provide program-specific information on strategies that worked well or did not work well to produce reliable systems....

2018 · Jon Bell, Jane Pinelis, Laura Freeman

Vetting Custom Scales: Understanding Reliability, Validity, and Dimensionality

For situations in which an empirically vetted scale does not exist or is not suitable, a custom scale may be created. This document presents a comprehensive process for establishing the defensible use of a custom scale. At the highest level, this process encompasses (1) establishing validity of the scale, (2) establishing reliability of the scale, and (3) assessing dimensionality, whether intended or unintended, of the scale. First, the concept of validity is described, including how validity may be established using operators and subject matter experts....

2018 · Stephanie Lane

A Multi-Method Approach to Evaluating Human-System Interactions During Operational Testing

The purpose of this paper was to identify the shortcomings of a single-method approach to evaluating human-system interactions during operational testing and offer an alternative, multi-method approach that is more defensible, yields richer insights into how operators interact with weapon systems, and provides practical implications for identifying when the quality of human-system interactions warrants correction through either operator training or redesign. Suggested Citation: Thomas, Dean, Heather Wojton, Chad Bieber, and Daniel Porter....

2017 · Dean Thomas, Heather Wojton, Chad Bieber, Daniel Porter

Foundations of Psychological Measurement

Psychological measurement is an important issue throughout the Department of Defense (DoD). For instance, the DoD engages in psychological measurement to place military personnel into specialties, evaluate the mental health of military personnel, evaluate the quality of human-systems interactions, and identify factors that affect crime rates on bases. Given its broad use, researchers and decision-makers need to understand the basics of psychological measurement – most notably, the development of surveys. This briefing discusses 1) the goals and challenges of psychological measurement, 2) basic measurement concepts and how they apply to psychological measurement, 3) basics for developing scales to measure psychological attributes, and 4) methods for ensuring that scales are reliable and valid....

2017 · Heather Wojton

Perspectives on Operational Testing: Guest Lecture at Naval Postgraduate School

This document was prepared to support Dr. Lillard’s visit to the Naval Postgraduate School, where he will provide a guest lecture to students in the T&E course. The briefing covers three primary themes: 1) evaluation of military systems on the basis of requirements and KPPs alone is often insufficient to determine effectiveness and suitability in combat conditions, 2) statistical methods are essential for developing defensible and rigorous test designs, 3) operational testing is often the only means to discover critical performance shortcomings....

2017 · V. Bram Lillard

Statistical Methods for Defense Testing

In the increasingly complex and data‐limited world of military defense testing, statisticians play a valuable role in many applications. Before the DoD acquires any major new capability, that system must undergo realistic testing in its intended environment with military users. Although the typical test environment is highly variable and factors are often uncontrolled, design of experiments techniques can add objectivity, efficiency, and rigor to the process of test planning. Statistical analyses help system evaluators get the most information out of limited data sets....

2017 · Dean Thomas, Kelly Avery, Laura Freeman, Matthew Avery

Thinking About Data for Operational Test and Evaluation

While the human brain is a powerful tool for quickly recognizing patterns in data, it will frequently make errors in interpreting random data. Luckily, these mistakes occur in systematic and predictable ways. Statistical models provide an analytical framework that helps us avoid these error-prone heuristics and draw accurate conclusions from random data. This non-technical presentation highlights some tricks of the trade learned by studying data and the way the human brain processes....

2017 · Matthew Avery

Users are Part of the System: How to Account for Human Factors when Designing Operational Tests for Software Systems

The goal of operational testing (OT) is to evaluate the effectiveness and suitability of military systems for use by trained military users in operationally realistic environments. Operators perform missions and make systems function. Thus, adequate OT must assess not only system performance and technical capability across the operational space, but also the quality of human-system interactions. Software systems in particular pose a unique challenge to testers. While some software systems may inherently be deterministic in nature, once placed in their intended environment with error-prone humans and highly stochastic networks, variability in outcomes often occurs, so tests often need to account for both “bug” finding and characterizing variability....

2017 · Kelly Avery, Heather Wojton

A First Step into the Bootstrap World

Bootstrapping is a powerful nonparametric tool for conducting statistical inference with many applications to data from operational testing. Bootstrapping is most useful when the population sampled from is unknown or complex or the sampling distribution of the desired statistic is difficult to derive. Careful use of bootstrapping can help address many challenges in analyzing operational test data. Suggested Citation: Avery, Matthew R. A First Step into the Bootstrap World. IDA Document NS D-5816....
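
A minimal sketch of the procedure, using invented miss-distance data rather than anything from the document:

```python
import numpy as np

rng = np.random.default_rng(42)

# Notional miss distances (meters) from a small operational test.
miss = np.array([3.1, 7.4, 2.2, 9.8, 4.5, 5.1, 12.3, 3.8, 6.0, 4.9])

# Bootstrap: resample the data with replacement many times and recompute
# the statistic of interest (here the median) to approximate its sampling
# distribution without assuming a population model.
boot_medians = np.array([
    np.median(rng.choice(miss, size=miss.size, replace=True))
    for _ in range(10_000)
])

lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"median = {np.median(miss):.1f} m, 95% percentile CI: ({lo:.1f}, {hi:.1f})")
```

The percentile interval requires no assumption about the underlying distribution of miss distances, which is exactly the situation the tutorial targets.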

2016 · Matthew Avery

Bayesian Analysis in R/STAN

In an era of reduced budgets and limited testing, verifying that requirements have been met in a single test period can be challenging, particularly using traditional analysis methods that ignore all available information. The Bayesian paradigm is tailor made for these situations, allowing for the combination of multiple sources of data and resulting in more robust inference and uncertainty quantification. Consequently, Bayesian analyses are becoming increasingly popular in T&E. This tutorial briefly introduces the basic concepts of Bayesian Statistics, with implementation details illustrated in R through two case studies: reliability for the Core Mission functional area of the Littoral Combat Ship (LCS) and performance curves for a chemical detector in the Bio-chemical Detection System (BDS) with different agents and matrices....
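
The simplest version of this data-combination idea needs no MCMC at all. The sketch below (plain Python rather than the tutorial's R/Stan, with notional counts) pools earlier test data with a new test period through a conjugate Beta-binomial update:

```python
from scipy import stats

# Notional reliability counts, invented for illustration.
prior_success, prior_fail = 45, 5    # earlier developmental testing
new_success, new_fail = 18, 2        # current operational test period

# With a Beta prior and Bernoulli trials, the posterior is again Beta,
# with the success/failure counts simply pooled.
posterior = stats.beta(1 + prior_success + new_success,
                       1 + prior_fail + new_fail)

print(f"posterior mean reliability: {posterior.mean():.3f}")
print(f"90% credible interval: ({posterior.ppf(0.05):.3f}, {posterior.ppf(0.95):.3f})")
```

Combining both data sources narrows the credible interval relative to analyzing the single test period alone, which is the robustness-and-uncertainty point the tutorial makes.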

2016 · Kassandra Fronczyk

Censored Data Analysis Methods for Performance Data: A Tutorial

Binomial metrics like probability-to-detect or probability-to-hit typically do not provide the maximum information from testing. Using continuous metrics such as time to detect provides more information, but does not account for non-detects. Censored data analysis allows us to account for both pieces of information simultaneously. Suggested Citation: Lillard, V Bram. Censored Data Analysis Methods for Performance Data: A Tutorial. IDA Document NS D-5811. Alexandria, VA: Institute for Defense Analyses, 2016....
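
A minimal sketch of the idea, assuming an exponential time-to-detect model and invented data (the tutorial itself covers more general censored-data methods):

```python
import numpy as np

# Notional trial data: detected=False marks trials that ended at 30 s with
# no detection, i.e., right-censored observations.
times = np.array([12.0, 30.0, 7.5, 30.0, 18.2, 30.0, 9.9, 25.4])
detected = np.array([True, False, True, False, True, False, True, True])

# For exponential data, the maximum likelihood estimate of the detection
# rate uses both pieces of information: detections in the numerator and
# *all* exposure time (including censored trials) in the denominator.
rate_hat = detected.sum() / times.sum()
print(f"estimated mean time to detect: {1 / rate_hat:.1f} s")

# Ignoring the non-detects entirely biases the estimate optimistic.
naive = times[detected].mean()
print(f"naive mean over detections only: {naive:.1f} s")
```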

2016 · V. Bram Lillard

DOT&E Reliability Course

This reliability course provides information to assist DOT&E action officers in their review and assessment of system reliability. Course briefings cover reliability planning and analysis activities that span the acquisition life cycle. Each briefing discusses review criteria relevant to DOT&E action officers based on DoD policies and lessons learned from previous oversight efforts. Suggested Citation: Avery, Matthew, Jonathan Bell, Rebecca Medlin, and Laura Freeman. DOT&E Reliability Course. IDA Document NS D-5836....

2016 · Matthew Avery, Rebecca Medlin, Jonathan Bell, Laura Freeman

Introduction to Survey Design

An important goal of test and evaluation is to understand not only how a system performs in its intended environment, but also users’ experiences operating the system. This briefing aimed to provide the audience with a set of tools – most notably, surveys – that are appropriate for measuring the user experience. DOT&E guidance regarding these tools is highlighted where appropriate. The briefing was broken into three major sections: conceptualizing surveys, writing survey items, and formatting surveys....

2016 · Heather Wojton, Justin Mary, Jonathan Snavely

Rigorous Test and Evaluation for Defense, Aerospace, and National Security

In April 2016, NASA, DOT&E, and IDA collaborated on a workshop designed to strengthen the community around statistical approaches to test and evaluation in defense and aerospace. The workshop brought practitioners, analysts, technical leadership, and statistical academics together for a three-day exchange of information with opportunities to attend world-renowned short courses, share common challenges, and learn new skill sets from a variety of tutorials. A highlight of the workshop was the Tuesday afternoon technical leadership panel chaired by Dr....

2016 · Laura Freeman

Science of Test Workshop Proceedings, April 11-13, 2016

To mark IDA’s 60th anniversary, we are conducting a series of workshops and symposia that bring together IDA sponsors, researchers, experts inside and outside government, and other stakeholders to discuss issues of the day. These events focus on future national security challenges, reflecting on how past lessons and accomplishments help prepare us to deal with complex issues and environments we face going forward. This publication represents the proceedings of the Science of Test Workshop....

2016 · Laura Freeman, Pamela Rambow, Jonathan Snavely

Best Practices for Statistically Validating Modeling and Simulation (M&S) Tools Used in Operational Testing

In many situations, collecting sufficient data to evaluate system performance against operationally realistic threats is not possible due to cost and resource restrictions, safety concerns, or lack of adequate or representative threats. Modeling and simulation tools that have been verified, validated, and accredited can be used to supplement live testing in order to facilitate a more complete evaluation of performance. Two key questions that frequently arise when planning an operational test are (1) which (and how many) points within the operational space should be chosen in the simulation space and the live space for optimal ability to verify and validate the M&S, and (2) once that data is collected, what is the best way to compare the live trials to the simulated trials for the purpose of validating the M&S?...
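
For the second question, one common starting point (a sketch of a generic approach, not necessarily the method this work recommends) is a distributional comparison of live and simulated outcomes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Placeholder outcomes: a handful of expensive live trials versus many
# cheap simulation runs at matched conditions.
live = rng.normal(loc=100.0, scale=15.0, size=12)
sim = rng.normal(loc=104.0, scale=15.0, size=500)

# Two-sample Kolmogorov-Smirnov test: are the live outcomes consistent
# with having been drawn from the same distribution as the M&S output?
ks_stat, p_value = stats.ks_2samp(live, sim)
print(f"KS statistic = {ks_stat:.3f}, p = {p_value:.3f}")
```

With only a dozen live trials, such a test has limited power, which is why the placement of live and simulated points (the first question) matters so much.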

2015 · Kelly Avery, Laura Freeman, Rebecca Medlin

Surveys in Operational Test and Evaluation

Recently, DOT&E signed out a memo providing Guidance on the Use and Design of Surveys in Operational Test and Evaluation. This guidance memo helps the Human Systems Integration (HSI) community to ensure that useful and accurate HSI data are collected. Information about how HSI experts can leverage the guidance is presented. Specifically, the presentation will cover which HSI metrics can and cannot be answered by surveys. Suggested Citation: Grier, Rebecca A, and Laura Freeman....

2015 · Rebecca Grier, Laura Freeman

Validating the PRA Testbed Using a Statistically Rigorous Approach

For many systems, testing is expensive and only a few live test events are conducted. When this occurs, testers frequently use a model to extend the test results. However, testers must validate the model to show that it is an accurate representation of the real world from the perspective of the intended uses of the model. This raises a problem when only a small number of live test events are conducted, only limited data are available to validate the model, and some testers struggle with model validation....

2015 · Rebecca Medlin, Dean Thomas

Design of Experiments for In-Lab Operational Testing of the AN/BQQ-10 Submarine Sonar System

Operational testing of the AN/BQQ-10 submarine sonar system has never been able to show significant improvements in software versions because of the high variability of at-sea measurements. To mitigate this problem, in the most recent AN/BQQ-10 operational test, the Navy’s operational test agency (in consultation with IDA under the direction of Director, Operational Test and Evaluation) supplemented the at-sea testing with an operationally focused in-lab comparison. This test used recorded real data played back on two different versions of the sonar system....

2014 · Laura Freeman, Justace Clutter, George Khoury

Taking the Next Step: Improving the Science of Test in DoD T&E

The current fiscal climate demands now, more than ever, that test and evaluation (T&E) provide relevant and credible characterization of system capabilities and shortfalls across all relevant operational conditions as efficiently as possible. In determining the answer to the question, “How much testing is enough?” it is imperative that we use a scientifically defensible methodology. Design of Experiments (DOE) has a proven track record in Operational Test and Evaluation (OT&E) of not only quantifying how much testing is enough, but also where in the operational space the test points should be placed....

2014 · Laura Freeman, V. Bram Lillard

A Tutorial on the Planning of Experiments

This tutorial outlines the basic procedures for planning experiments within the context of the scientific method. Too often quality practitioners fail to appreciate how subject-matter expertise must interact with statistical expertise to generate efficient and effective experimental programs. This tutorial guides the quality practitioner through the basic steps, demonstrated by extensive past experience, that consistently lead to successful results. This tutorial makes extensive use of flowcharts to illustrate the basic process....

2013 · Rachel Johnson, Douglas Montgomery, Bradley Jones, Chris Gotwalt

Scientific Test and Analysis Techniques: Statistical Measures of Merit

Design of Experiments (DOE) provides a rigorous methodology for developing and evaluating test plans. Design excellence consists of having enough test points placed in the right locations in the operational envelope to answer the questions of interest for the test. The key aspects of a well-designed experiment include: the goal of the test, the response variables, the factors and levels, a method for strategically varying the factors across the operational envelope, and statistical measures of merit....

2013 · Laura Freeman

A Bayesian Approach to Evaluation of Land Warfare Systems

This presentation was prepared for the Army Conference on Applied Statistics. The presentation covers a brief introduction to land warfare problems and devises a methodology using Bayes’ Theorem to estimate parameters of interest. Two examples are given: a simple one using independent Bernoulli trials, and a more complex one using correlated Red and Blue casualty data in a Loss Exchange Ratio and a hierarchical model. The presentation demonstrates that the Bayesian approach is successful in both examples at reducing the variance of the estimated parameters, potentially reducing the cost of devising a complex test program....
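
For the simple independent Bernoulli-trials example, the Bayesian update has a closed form; assuming a Beta(a, b) prior on the success probability p (an assumption for this sketch, not a detail taken from the presentation), the machinery reduces to:

```latex
% Conjugate update for n independent Bernoulli trials y_1, ..., y_n:
\pi(p \mid y) \propto \pi(p) \prod_{i=1}^{n} p^{y_i} (1 - p)^{1 - y_i}
\quad\Longrightarrow\quad
p \mid y \sim \operatorname{Beta}\Bigl(a + \sum_{i=1}^{n} y_i,\; b + n - \sum_{i=1}^{n} y_i\Bigr).
```

The posterior variance shrinks as prior information and observed trials accumulate, which is the variance-reduction effect the presentation demonstrates.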

2012 · Alyson Wilson, Robert Holcomb, Lee Dewald, Samuel Parry

Continuous Metrics for Efficient and Effective Testing

In today’s fiscal environment, efficient and effective testing is essential. Often, military system requirements are defined using probability of success as the primary measure of effectiveness – for example, a system must complete its mission 80 percent of the time, or the system must detect 90 percent of targets. The traditional approach to testing these probability-based requirements is to execute a series of trials and then total the number of successes; the ratio of successes to number of trials provides an intuitive measure of the probability of success....
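
A back-of-the-envelope comparison (invented numbers) shows why recording the underlying continuous measure is more informative than a pass/fail tally for the same number of trials:

```python
import numpy as np

# What does n = 40 trials buy under each approach?
n = 40
p_hat = 0.80  # observed success rate under the binomial approach

# Normal-approximation 95% confidence interval half-width for a proportion.
half_binom = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n)
print(f"binomial:   {p_hat:.2f} +/- {half_binom:.3f}  ({half_binom / p_hat:.1%} relative)")

# Recording the underlying continuous measure (e.g., time to detect) with an
# assumed coefficient of variation of 0.3 pins down its mean more tightly.
cv = 0.3
half_cont = 1.96 * cv / np.sqrt(n)
print(f"continuous: mean +/- {half_cont:.1%} relative")
```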

2012 · Laura Freeman, V. Bram Lillard

Designed Experiments for the Defense Community

The areas of application for design of experiments principles have evolved, mimicking the growth of U.S. industries over the last century, from agriculture to manufacturing to chemical and process industries to the services and government sectors. In addition, statistically based quality programs adopted by businesses morphed from total quality management to Six Sigma and, most recently, statistical engineering (see Hoerl and Snee 2010). The good news about these transformations is that each evolution contains more technical substance, embedding the methodologies as core competencies, and is less of a “program....

2012 · Rachel Johnson, Douglas Montgomery, James Simpson

Statistically Based T&E Using Design of Experiments

This document outlines the charter for the Committee to Institutionalize Scientific Test Design and Rigor in Test and Evaluation. The charter defines the problem, identifies potential steps in a roadmap for accomplishing the goals of the committee, and lists committee membership. Once the committee is assembled, the members will revise this document as needed. The charter will be endorsed by DOT&E and DDT&E, once finalized. Suggested Citation: Freeman, Laura. Statistically Based T&E Using Design of Experiments....

2012 · Laura Freeman

Design of Experiments in Highly Constrained Design Spaces

This presentation shows the merits of applying experimental design to operational tests, guidance on using DOE from the Director, Operational Test and Evaluation, and presents the design solution for the test of a chemical agent detector. It is important to keep in mind the advanced techniques from DOE (split-plot designs, optimal designs) to determine effective DOEs for operational testing; traditional design strategies often result in designs that are not executable....

2011 · Laura Freeman

Use of Statistically Designed Experiments to Inform Decisions in a Resource Constrained Environment

There has been recent emphasis on the increased use of statistics, including the use of statistically designed experiments, to plan and execute tests that support Department of Defense (DoD) acquisition programs. The use of statistical methods, including experimental design, has shown great benefits in industry, especially when used in an integrated fashion; for example see the literature on Six Sigma. The structured approach of experimental design allows the user to determine what data need to be collected and how it should be analyzed to achieve specific decision making objectives....

2011 · Laura Freeman, Karl Glaeser, Alethea Rucker