2017 | Test Science Research Document Library

A Multi-Method Approach to Evaluating Human-System Interactions During Operational Testing

The purpose of this paper was to identify the shortcomings of a single-method approach to evaluating human-system interactions during operational testing and offer an alternative, multi-method approach that is more defensible, yields richer insights into how operators interact with weapon systems, and provides practical implications for identifying when the quality of human-system interactions warrants correction through either operator training or redesign. Suggested Citation: Thomas, Dean, Heather Wojton, Chad Bieber, and Daniel Porter....

Comparing Live Missile Fire and Simulation

Modeling and simulation is frequently used in Test and Evaluation (T&E) of air-to-air weapon systems to evaluate the effectiveness of a weapon. The Air Intercept Missile-9X (AIM-9X) program uses modeling and simulation extensively to evaluate missile miss distances. Since flight testing is expensive, the test program uses relatively few flight tests and supplements those data with large numbers of miss distances from simulated tests across the weapon's operational space. However, before modeling and simulation can be used to predict performance, it must first be validated....

Foundations of Psychological Measurement

Psychological measurement is an important issue throughout the Department of Defense (DoD). For instance, the DoD engages in psychological measurement to place military personnel into specialties, evaluate the mental health of military personnel, evaluate the quality of human-systems interactions, and identify factors that affect crime rates on bases. Given its broad use, researchers and decision-makers need to understand the basics of psychological measurement – most notably, the development of surveys. This briefing discusses 1) the goals and challenges of psychological measurement, 2) basic measurement concepts and how they apply to psychological measurement, 3) basics for developing scales to measure psychological attributes, and 4) methods for ensuring that scales are reliable and valid....

On Scoping a Test that Addresses the Wrong Objective

Statistical literature refers to a type of error that is committed by giving the right answer to the wrong question. If a test design is adequately scoped to address an irrelevant objective, one could say that a Type III error occurs. In this paper, we focus on a specific Type III error that test planners sometimes commit in order to reduce test size and resources. Suggested Citation: Johnson, Thomas H., Rebecca M....

Perspectives on Operational Testing-Guest Lecture at Naval Postgraduate School

This document was prepared to support Dr. Lillard’s visit to the Naval Postgraduate School, where he will provide a guest lecture to students in the T&E course. The briefing covers three primary themes: 1) evaluation of military systems on the basis of requirements and KPPs alone is often insufficient to determine effectiveness and suitability in combat conditions, 2) statistical methods are essential for developing defensible and rigorous test designs, 3) operational testing is often the only means to discover critical performance shortcomings....

Power Approximations for Generalized Linear Models using the Signal-to-Noise Transformation Method

Statistical power is a useful measure for assessing the adequacy of an experimental design prior to data collection. This paper proposes an approach, referred to as the signal-to-noise transformation method (SNRx), to approximate power for effects in a generalized linear model. The contribution of SNRx is that, with a couple of assumptions, it generates power approximations for generalized linear model effects using F-tests that are typically used in ANOVA for classical linear models. Additionally, SNRx follows Oehlert and Whitcomb’s unified approach for sizing an effect, which allows for intuitive effect size definitions and consistent estimates of power....

Prediction Uncertainty for Autocorrelated Lognormal Data with Random Effects

Accurately presenting model estimates with appropriate uncertainties is critical to the credibility and defensibility of any piece of statistical analysis. When dealing with complex data that require hierarchical covariance structures, many of the standard approaches for visualizing uncertainty are insufficient. One such case is data fit with log-linear autoregressive mixed effects models. Data requiring such an approach have three exceptional characteristics. 1. The data are sampled in “groups” that exhibit variation unexplained by other model factors....

Statistical Methods for Defense Testing

In the increasingly complex and data‐limited world of military defense testing, statisticians play a valuable role in many applications. Before the DoD acquires any major new capability, that system must undergo realistic testing in its intended environment with military users. Although the typical test environment is highly variable and factors are often uncontrolled, design of experiments techniques can add objectivity, efficiency, and rigor to the process of test planning. Statistical analyses help system evaluators get the most information out of limited data sets....

Thinking About Data for Operational Test and Evaluation

While the human brain is a powerful tool for quickly recognizing patterns in data, it will frequently make errors in interpreting random data. Luckily, these mistakes occur in systematic and predictable ways. Statistical models provide an analytical framework that helps us avoid these error-prone heuristics and draw accurate conclusions from random data. This non-technical presentation highlights some tricks of the trade learned by studying data and the way the human brain processes....

Users are Part of the System-How to Account for Human Factors when Designing Operational Tests for Software Systems

The goal of operational testing (OT) is to evaluate the effectiveness and suitability of military systems for use by trained military users in operationally realistic environments. Operators perform missions and make systems function. Thus, adequate OT must assess not only system performance and technical capability across the operational space, but also the quality of human-system interactions. Software systems in particular pose a unique challenge to testers. While some software systems may be inherently deterministic, once they are placed in their intended environment with error-prone humans and highly stochastic networks, outcomes often vary, so tests often need to account for both “bug” finding and characterizing variability....