Laura Freeman

Circular Prediction Regions for Miss Distance Models under Heteroskedasticity

Circular prediction regions are used in ballistic testing to express the uncertainty in shot accuracy. We compare two modeling approaches for estimating circular prediction regions for the miss distance of a ballistic projectile. The miss distance response variable is bivariate normal and has a mean and variance that can change with one or more experimental factors. The first approach fits a heteroskedastic linear model using restricted maximum likelihood, and uses the Kenward-Roger statistic to estimate circular prediction regions....

Challenges and New Methods for Designing Reliability Experiments

Engineers use reliability experiments to determine the factors that drive product reliability, build robust products, and predict reliability under use conditions. This article uses recent testing of a Howitzer to illustrate the challenges in designing reliability experiments for complex, repairable systems. We leverage lessons learned from current research and propose methods for designing an experiment for a complex, repairable system. Suggested Citation Freeman, Laura J., Rebecca M. Medlin, and Thomas H....

Designing Experiments for Model Validation- The Foundations for Uncertainty Quantification

Advances in computational power have allowed both greater fidelity and more extensive use of such models. Numerous complex military systems have a corresponding model that simulates its performance in the field. In response, the DoD needs defensible practices for validating these models. Design of Experiments and statistical analysis techniques are the foundational building blocks for validating the use of computer models and quantifying uncertainty in that validation. Recent developments in uncertainty quantification have the potential to benefit the DoD in using modeling and simulation to inform operational evaluations....

Handbook on Statistical Design & Analysis Techniques for Modeling & Simulation Validation

This handbook focuses on methods for data-driven validation to supplement the vast existing literature for Verification, Validation, and Accreditation (VV&A) and the emerging references on uncertainty quantification (UQ). The goal of this handbook is to aid the test and evaluation (T&E) community in developing test strategies that support model validation (both external validation and parametric analysis) and statistical UQ. Suggested Citation Wojton, Heather, Kelly M Avery, Laura J Freeman, Samuel H Parry, Gregory S Whittier, Thomas H Johnson, and Andrew C Flack....

Operational Testing of Systems with Autonomy

Systems with autonomy pose unique challenges for operational test. This document provides an executive level overview of these issues and the proposed solutions and reforms. In order to be ready for the testing challenges of the next century, we will need to change the entire acquisition life cycle, starting even from initial system conceptualization. This briefing was presented to the Director, Operational Test & Evaluation along with his deputies and Chief Scientist....

A Groundswell for Test and Evaluation

The fundamental purpose of test and evaluation (T&E) in the Department of Defense (DOD) is to provide knowledge to answer critical questions that help decision makers manage the risk involved in developing, producing, operating, and sustaining systems and capabilities. At its core, T&E takes data and translates it into information for decision makers. Subject matter expertise of the platform and operational mission have always been critical components of developing defensible test and evaluation strategies....

Analysis of Split-Plot Reliability Experiments with Subsampling

Reliability experiments are important for determining which factors drive product reliability. The data collected in these experiments can be challenging to analyze. Often, the reliability or lifetime data collected follow distinctly nonnormal distributions and include censored observations. Additional challenges in the analysis arise when the experiment is executed with restrictions on randomization. The focus of this paper is on the proper analysis of reliability data collected from a nonrandomized reliability experiments....

Informing the Warfighter—Why Statistical Methods Matter in Defense Testing

Needs one Suggested Citation Freeman, Laura J., and Catherine Warner. “Informing the Warfighter—Why Statistical Methods Matter in Defense Testing.” CHANCE 31, no. 2 (April 3, 2018): 4–11. https://doi.org/10.1080/09332480.2018.1467627. Paper:

Power Approximations for Reliability Test Designs

Reliability tests determine which factors drive system reliability. Often, the reliability or failure time data collected in these tests tend to follow distinctly non- normal distributions and include censored observations. The experimental design should accommodate the skewed nature of the response and allow for censored observations, which occur when systems under test do not fail within the allotted test time. To account for these design and analysis considerations, Monte Carlo simulations are frequently used to evaluate experimental design properties....

Reliability Best Practices and Lessons Learned in the Department of Defense

Despite the importance of acquiring reliable systems to support thewarfighter, many military programs fail to meet reliability requirements, which affectsthe overall suitability and cost of the system. To determine ways to improve reliabilityoutcomes in the future, research staff from the Institute for Defense analysesOperational Evaluation Division compiled case studies identifying reliability lessonslearned and best practices for several DOT&E oversight programs. The case studiesprovide program specific information on strategies that worked well or did not workwell to produce reliable systems....

Scientific Test and Analysis Techniques

Abstract This document contains the technical content for the Scientific Test and Analysis Techniques (STAT) in Test and Evaluation (T&E) continuous learning module. The module provides a basic understanding of STAT in T&E. Topics coverec include design of experiments, observational studies, survey design and analysis, and statistical analysis. It is designed as a four hour online course, suitable for inclusion in the DAU T&E certification curriculum. Slides

Scientific Test and Analysis Techniques- Continuous Learning Module

This document contains the technical content for the Scientific Test and Analysis Techniques (STAT) in Test and Evaluation (T&E) continuous learning module. The module provides a basic understanding of STAT in T&E. Topics covered include design of experiments, observational studies, survey design and analysis, and statistical analysis. It is designed as a four hour online course, suitable for inclusion in the DAU T&E certification curriculum. Suggested Citation Pinelis, Yevgeniya, Laura J Freeman, Heather M Wojton, Denise J Edwards, Stephanie T Lane, and James R Simpson....

Testing Defense Systems

The complex, multifunctional nature of defense systems, along with the wide variety of system types, demands a structured but flexible analytical process for testing systems. This chapter summarizes commonly used techniques in defense system testing and specific challenges imposed by the nature of defense system testing. It highlights the core statistical methodologies that have proven useful in testing defense systems. Case studies illustrate the value of using statistical techniques in the design of tests and analysis of the resulting data....

On Scoping a Test that Addresses the Wrong Objective

Statistical literature refers to a type of error that is committed by giving the right answer to the wrong question. If a test design is adequately scoped to address an irrelevant objective, one could say that a Type III error occurs. In this paper, we focus on a specific Type III error that on some occasions test planners commit to reduce test size and resources. Suggested Citation Johnson, Thomas H., Rebecca M....

Power Approximations for Generalized Linear Models using the Signal-to-Noise Transformation Method

Statistical power is a useful measure for assessing the adequacy of anexperimental design prior to data collection. This paper proposes an approach referredto as the signal-to-noise transformation method (SNRx), to approximate power foreffects in a generalized linear model. The contribution of SNRx is that, with a coupleassumptions, it generates power approximations for generalized linear model effectsusing F-tests that are typically used in ANOVA for classical linear models.Additionally, SNRx follows Ohlert and Whitcomb’s unified approach for sizing aneffect, which allows for intuitive effect size definitions, and consistent estimates ofpower....

Statistical Methods for Defense Testing

In the increasingly complex and data‐limited world of military defense testing, statisticians play a valuable role in many applications. Before the DoD acquires any major new capability, that system must undergo realistic testing in its intended environment with military users. Although the typical test environment is highly variable and factors are often uncontrolled, design of experiments techniques can add objectivity, efficiency, and rigor to the process of test planning. Statistical analyses help system evaluators get the most information out of limited data sets....

DOT&E Reliability Course

This reliability course provides information to assist DOT&E action officers in their review and assessment of system reliability. Course briefings cover reliability planning and analysis activities that span the acquisition life cycle. Each briefing discusses review criteria relevant to DOT&E action officers based on DoD policies and lessons learned from previous oversight efforts. Suggested Citation Avery, Matthew, Jonathan Bell, Rebecca Medlin, and Freeman Laura. DOT&E Reliability Course. IDA Document NS D-5836....

Regularization for Continuously Observed Ordinal Response Variables with Piecewise-Constant Functional Predictors

This paper investigates regularization for continuously observed covariates that resemble step functions. The motivating examples come from operational test data from a recent United States Department of Defense (DoD) test of the Shadow Unmanned Air Vehicle system. The response variable, quality of video provided by the Shadow to friendly ground units, was measured on an ordinal scale continuously over time. Functional covariates, altitude and distance, can be well approximated by step functions....

Rigorous Test and Evaluation for Defense, Aerospace, and National Security

In April 2016, NASA, DOT&E, and IDA collaborated on a workshopdesigned to strengthen the community around statistical approaches to test andevaluation in defense and aerospace. The workshop brought practitioners, analysts,technical leadership, and statistical academics together for a three day exchange ofinformation with opportunities to attend world renowned short courses, share commodchallenges, and learn new skill sets from a variety of tutorials. A highlight of theworkshop was the Tuesday afternoon technical leadership panel chaired by Dr....

Science of Test Workshop Proceedings, April 11-13, 2016

To mark IDA’s 60th anniversary, we are conducting a series of workshops and symposia that bring together IDA sponsors, researchers, experts inside and outside government, and other stakeholders to discuss issues of the day. These events focus on future national security challenges, reflecting on how past lessons and accomplishments help prepare us to deal with complex issues and environments we face going forward. This publication represents the proceedings of the Science of Test Workshop....

Tutorial on Sensitivity Testing in Live Fire Test and Evaluation

A sensitivity experiment is a special type of experimental design that is used when the response variable is binary and the covariate is continuous. Armor protection and projectile lethality tests often use sensitivity experiments to characterize a projectile’s probability of penetrating the armor. In this mini-tutorial we illustrate the challenge of modeling a binary response with a limited sample size, and show how sensitivity experiments can mitigate this problem. We review eight different single covariate sensitivity experiments and present a comparison of these designs using simulation....

Best Practices for Statistically Validating Modeling and Simulation (M&S) Tools Used in Operational Testing

In many situations, collecting sufficient data to evaluate system performance against operationally realistic threats is not possible due to cost and resource restrictions, safety concerns, or lack of adequate or representative threats. Modeling and simulation tools that have been verified, validated, and accredited can be used to supplement live testing in order to facilitate a more complete evaluation of performance. Two key questions that frequently arise when planning an operational test are (1) which (and how many) points within the operational space should be chosen in the simulation space and the live space for optimal ability to verify and validate the M&S, and (2) once that data is collected, what is the best way to compare the live trials to the simulated trials for the purpose of validating the M&S?...

Estimating System Reliability from Heterogeneous Data

This briefing provides an example of some of the nuanced issues in reliability estimation in operational testing. The statistical models are motivated by an example of the Paladin Integrated Management (PIM). We demonstrate how to use a Bayesian approach to reliability estimation that uses data from all phases of testing. Suggested Citation Browning, Caleb, Laura Freeman, Alyson Wilson, Kassandra Fronczyk, and Rebecca Dickinson. “Estimating System Reliability from Heterogeneous Data.” Presented at the Conference on Applied Statistics in Defense, George Mason University, October 2015....

Improving Reliability Estimates with Bayesian Statistics

This paper shows how Bayesian methods are ideal for the assessment of complex system reliability assessments. Several examples illustrate the methodology. Suggested Citation Freeman, Laura J, and Kassandra Fronczyk. “Improving Reliability Estimates with Bayesian Statistics.” ITEA Journal of Test and Evaluation 37, no. 4 (June 2015). Paper:

Statistical Models for Combining Information Stryker Reliability Case Study

Reliability is an essential element in assessing the operational suitability of Department of Defense weapon systems. Reliability takes a prominent role in both the design and analysis of operational tests. In the current era of reduced budgets and increased reliability requirements, it is challenging to verify reliability requirements in a single test. Furthermore, all available data should be considered in order to ensure evaluations provide the most appropriate analysis of the system’s reliability....

Surveys in Operational Test and Evaluation

Recently DOT&E signed out a memo providing Guidance on the Use and Design of Surveys in Operational Test and Evaluation. This guidance memo helps the Human Systems Integration (HSI) community to ensure that useful and accurate HSI data are collected. Information about how HSI experts can leverage the guidance is presented. Specifically, the presentation will cover which HSI metrics can and cannot be answered by surveys. Suggested Citation Grier, Rebecca A, and Laura Freeman....

A Comparison of Ballistic Resistance Testing Techniques in the Department of Defense

This paper summarizes sensitivity test methods commonly employed in the Department of Defense. A comparison study shows that modern methods such as Neyer’s method and Three-Phase Optimal Design are improvements over historical methods. Suggested Citation Johnson, Thomas H., Laura Freeman, Janice Hester, and Jonathan L. Bell. “A Comparison of Ballistic Resistance Testing Techniques in the Department of Defense.” IEEE Access 2 (2014): 1442–55. https://doi.org/10.1109/ACCESS.2014.2377633. Paper:

Applying Risk Analysis to Acceptance Testing of Combat Helmets

Acceptance testing of combat helmets presents multiple challenges that require statistically-sound solutions. For example, how should first article and lot acceptance tests treat multiple threats and measures of performance? How should these tests account for multiple helmet sizes and environmental treatments? How closely should first article testing requirements match historical or characterization test data? What government and manufacturer risks are acceptable during lot acceptance testing? Similar challenges arise when testing other components of Personal Protective Equipment and similar statistical approaches should be applied to all components....

Design of Experiments for in-Lab Operational Testing of the an/BQQ-10 Submarine Sonar System

Operational testing of the AN/BQQ-10 submarine sonar system has never been able to show significant improvements in software versions because of the high variability of at sea measurements. To mitigate this problem, in the most recent AN/BQQ-10 operational test, the Navy’s operational test agency (in consultation with IDA under the direction of Director, Operational Test and Evaluation) supplemented the at sea testing with an operationally focused in-lab comparison. This test used recorded real data played back on two different versions of the sonar system....

Power Analysis Tutorial for Experimental Design Software

This guide provides both a general explanation of power analysis and specific guidance to successfully interface with two software packages, JMP and Design Expert (DX). Suggested Citation Freeman, Laura J., Thomas H. Johnson, and James R. Simpson. “Power Analysis Tutorial for Experimental Design Software:” Fort Belvoir, VA: Defense Technical Information Center, November 1, 2014. https://doi.org/10.21236/ADA619843. Paper:

Taking the Next Step- Improving the Science of Test in DoD T&E

The current fiscal climate demands now, more than ever, that test and evaluation(T&E) provide relevant and credible characterization of system capabilities andshortfalls across all relevant operational conditions as efficiently as possible. Indetermining the answer to the question, “How much testing is enough?” it isimperative that we use a scientifically defensible methodology. Design ofExperiments (DOE) has a proven track record in Operational Test andEvaluation (OT&E) of not only quantifying how much testing is enough, but alsowhere in the operational space the test points should be placed....

Scientific Test and Analysis Techniques- Statistical Measures of Merit

Design of Experiments (DOE) provides a rigorous methodology for developing and evaluating test plans. Design excellence consists of having enough test points placed in the right locations in the operational envelope to answer the questions of interest for the test. The key aspects of a well-designed experiment include: the goal of the test, the response variables, the factors and levels, a method for strategically varying the factors across the operational envelope, and statistical measures of merit....

Continuous Metrics for Efficient and Effective Testing

In today’s fiscal environment, efficient and effective testing is essential. Often, military system requirements are defined using probability of success as the primary measure of effectiveness – for example, a system must complete its mission 80 percent of the time; or the system must detect 90 percent of targets. The traditional approach to testing these probability-based requirements is to execute a series of trials and then total the number of successes; the ratio of successes to number of trails provides an intuitive measure of the probability of success....

Statistically Based T&E Using Design of Experiments

This document outlines the charter for the Committee to Institutionalize Scientific Test Design and Rigor in Test and Evaluation. The charter defines the problem, identifies potential steps in a roadmap for accomplishing the goals of the committee and lists committeemembership. Once the committee is assembled, the members will revise this document as needed. The charter will be endorsed by DOT&E and DDT&E, once finalize. Suggested Citation Freeman, Laura. Statistically Based T&E Using Design of Experiments....

Design for Reliability using Robust Parameter Design

Recently, the principles of Design of Experiments (DOE) have been implemented as amethod of increasing the statistical rigor of operational tests. The focus has been on ensuringcoverage of the operational envelope in terms of system effectiveness. DOE is applicable inreliability analysis as well. A reliability standard, ANSI-0009, advocates the use Design forReliability (DfR) early in the product development cycle in order to design-in reliability. Robustparameter design (RPD) first used by Taguchi and then by the response surface communityprovides insights on how DOE can be used to make a products and processes invariant tochanges in factors....

Design of Experiments in Highly Constrained Design Spaces

This presentation shows the merits of applying experimental design to operational tests, guidance on using DOE from the Director, Operational Test and Evaluation, and presents the design solution for the test of a chemical agent detector. It is important to keep in mind the advanced techniques from DOE (split-plot designs, optimal designs) to determine effective DOEs for operational testing; traditional design strategies often result in designs that are not executable....

Use of Statistically Designed Experiments to Inform Decisions in a Resource Constrained Environment

There has been recent emphasis on the increased use of statistics, including the use of statistically designed experiments, to plan and execute tests that support Department of Defense (DoD) acquisition programs. The use of statistical methods, including experimental design, has shown great benefits in industry, especially when used in an integrated fashion; for example see the literature on Six Sigma. The structured approach of experimental design allows the user to determine what data need to be collected and how it should be analyzed to achieve specific decision making objectives....