<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Everyone on Test Science Research Document Library</title>
    <link>https://research.testscience.org/audience/everyone/</link>
    <description>Recent content in Everyone on Test Science Research Document Library</description>
    <generator>Hugo -- 0.129.0</generator>
    <language>en-us</language>
    <copyright>Institute for Defense Analyses</copyright>
    <lastBuildDate>Mon, 01 Jan 2024 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://research.testscience.org/audience/everyone/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Introduction to Human-Systems Interaction in Operational Test and Evaluation Course</title>
      <link>https://research.testscience.org/post/2024-introduction-to-human-systems-interaction-in-operational-test-and-evaluation-course/</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2024-introduction-to-human-systems-interaction-in-operational-test-and-evaluation-course/</guid>
      <description>Human-System Interaction (HSI) is the study of interfaces between humans and technical systems. The Department of Defense incorporates HSI evaluations into defense acquisition to improve system performance and reduce lifecycle costs. During operational test and evaluation, HSI evaluations characterize how a system’s operational performance is affected by its users. The goal of this course is to provide the theoretical background and practical tools necessary to plan and evaluate HSI test plans, collect and analyze HSI data, and report on HSI results.</description>
      <content:encoded><![CDATA[<p>Human-System Interaction (HSI) is the study of interfaces between humans and technical systems. The Department of Defense incorporates HSI evaluations into defense acquisition to improve system performance and reduce lifecycle costs. During operational test and evaluation, HSI evaluations characterize how a system’s operational performance is affected by its users. The goal of this course is to provide the theoretical background and practical tools necessary to plan and evaluate HSI test plans, collect and analyze HSI data, and report on HSI results. We will discuss HSI concepts, measurement methods, design of experiments, data analysis, and evaluation and reporting, all from an operational testing perspective.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Miller, Dr Adam M, and Keyla Pagan-Rivera. Introduction to Human-Systems Interaction in Operational Test and Evaluation Course. IDA Product ID 3002009. Alexandria, VA: Institute for Defense Analyses, 2024.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_3002009.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Meta-Analysis of the Effectiveness of the SALIANT Procedure for Assessing Team Situation Awareness</title>
      <link>https://research.testscience.org/post/2024-meta-analysis-of-the-effectiveness-of-the-saliant-procedure-for-assessing-team-situation-awareness/</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2024-meta-analysis-of-the-effectiveness-of-the-saliant-procedure-for-assessing-team-situation-awareness/</guid>
      <description>Many Department of Defense (DoD) systems aim to increase or maintain Situational Awareness (SA) at the individual or group level. In some cases, maintenance or enhancement of SA is listed as a primary function or requirement of the system. However, during test and evaluation SA is examined inconsistently or is not measured at all. Situational Awareness Linked Indicators Adapted to Novel Tasks (SALIANT) is an empirically-based methodology meant to measure SA at the team, or group, level.</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/Vmt1CT__stU?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>Many Department of Defense (DoD) systems aim to increase or maintain Situational Awareness (SA) at the individual or group level. In some cases, maintenance or enhancement of SA is listed as a primary function or requirement of the system. However, during test and evaluation SA is examined inconsistently or is not measured at all. Situational Awareness Linked Indicators Adapted to Novel Tasks (SALIANT) is an empirically-based methodology meant to measure SA at the team, or group, level. While research using the SALIANT model suggests that it effectively quantifies team SA, no study has examined the effectiveness of SALIANT across the entirety of the existing empirical research.  The aim of the current work is to conduct a meta-analysis of previous research to examine the overall reliability of SALIANT as an SA measurement tool. This meta-analysis will assess when and how SALIANT can serve as a reliable indicator of performance at testing. Additional applications of SALIANT in non-traditional operational testing domains will also be discussed.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Shaffer, Sarah, Miriam Armstrong, and Rebecca Medlin. Meta-Analysis of the Effectiveness of the SALIANT Procedure for Assessing Team Situation Awareness. IDA Product ID 3001867. Alexandria, VA: Institute for Defense Analyses, 2024.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Quantifying Uncertainty to Keep Astronauts and Warfighters Safe</title>
      <link>https://research.testscience.org/post/2024-quantifying-uncertainty-to-keep-astronauts-and-warfighters-safe/</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2024-quantifying-uncertainty-to-keep-astronauts-and-warfighters-safe/</guid>
      <description>Both NASA and DOT&amp;amp;E increasingly rely on computer models to supplement data collection, and utilize statistical distributions to quantify the uncertainty in models, so that decision-makers are equipped with the most accurate information about system performance and model fitness. This article provides a high-level overview of uncertainty quantification (UQ) through an example assessment for the reliability of a new space-suit system. The goal is to reach a more general audience in Significance Magazine, and convey the importance and relevance of statistics to the defense and aerospace communities.</description>
      <content:encoded><![CDATA[<p>Both NASA and DOT&amp;E increasingly rely on computer models to supplement data collection, and utilize statistical distributions to quantify the uncertainty in models, so that decision-makers are equipped with the most accurate information about system performance and model fitness.  This article provides a high-level overview of uncertainty quantification (UQ) through an example assessment for the reliability of a new space-suit system.  The goal is to reach a more general audience in Significance Magazine, and convey the importance and relevance of statistics to the defense and aerospace communities.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Dennis, John W, John T Haman, and James E Warner. “Out-of-This-World Spacesuits: Quantifying Uncertainty Helps Keep Heroes Safe.” Significance 21, no. 4 (September 1, 2024): 10–13. <a href="https://doi.org/10.1093/jrssig/qmae056">https://doi.org/10.1093/jrssig/qmae056</a>.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>AI &#43; Autonomy T&amp;E in DoD</title>
      <link>https://research.testscience.org/post/2023-ai-autonomy-t-e-in-dod/</link>
      <pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2023-ai-autonomy-t-e-in-dod/</guid>
      <description>Test and evaluation (T&amp;amp;E) of AI-enabled systems (AIES) often emphasizes algorithm accuracy over robust, holistic system performance. While this narrow focus may be adequate for some applications of AI, for many complex uses, T&amp;amp;E paradigms removed from operational realism are insufficient. However, leveraging traditional operational testing (OT) methods to evaluate AIESs can fail to capture novel sources of risk. This brief establishes a common AI vocabulary and highlights OT challenges posed by AIESs by answering the following questions</description>
      <content:encoded><![CDATA[<p>Test and evaluation (T&amp;E) of AI-enabled systems (AIES) often emphasizes algorithm accuracy over robust, holistic system performance. While this narrow focus may be adequate for some applications of AI, for many complex uses, T&amp;E paradigms removed from operational realism are insufficient. However, leveraging traditional operational testing (OT) methods to evaluate AIESs can fail to capture novel sources of risk. This brief establishes a common AI vocabulary and highlights OT challenges posed by AIESs by answering the following questions:</p>
<ol>
<li>What is “Artificial Intelligence (AI)”?</li>
</ol>
<p>a. A brief “AI Primer” defines some common terms, highlights words that are used inconsistently, and discusses where definitions are insufficient for identifying systems that require additional T&amp;E considerations.</p>
<ol start="2">
<li>How does AI impact T&amp;E?</li>
</ol>
<p>a. AI isn’t new, but systems with AI pose new challenges and may require structural changes to how we T&amp;E.</p>
<ol start="3">
<li>What makes DoD applications of AI unique?</li>
</ol>
<p>a. Many Silicon Valley applications of AI lack the task complexity and severe consequences of risk faced by DoD.</p>
<ol start="4">
<li>What is the warfighter’s role?</li>
</ol>
<p>a. T&amp;E must assure warfighters have calibrated trust &amp; an adequate understanding of system behavior.</p>
<ol start="5">
<li>What is the state of DoD AI T&amp;E in IDA and OED?</li>
</ol>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Vickers, Brian D, Matthew R Avery, Rachel A Haga, Mark R Herrera, Daniel J Porter, Stuart M Rodgers, and Rebecca M Medlin. AI + Autonomy T&amp;E in DoD. IDA Document NS 3000083. Alexandria, VA: Institute for Defense Analyses, 2023.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>CDV Method for Validating AJEM using FUSL Test Data</title>
      <link>https://research.testscience.org/post/2023-cdv-method-for-validating-ajem-using-fusl-test-data/</link>
      <pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2023-cdv-method-for-validating-ajem-using-fusl-test-data/</guid>
      <description>M&amp;amp;S validation is critical for ensuring credible weapon system evaluations. System-level evaluations of Armored Fighting Vehicles (AFV) rely on the Advanced Joint Effectiveness Model (AJEM) and Full-Up System Level (FUSL) testing to assess AFV vulnerability. This report reviews and improves upon one of the primary methods that analysts use to validate AJEM, called the Component Damage Vector (CDV) Method. The CDV Method compares vehicle components that were damaged in FUSL testing to simulated representations of that damage from AJEM.</description>
      <content:encoded><![CDATA[<p>M&amp;S validation is critical for ensuring credible weapon system evaluations. System-level evaluations of Armored Fighting Vehicles (AFV) rely on the Advanced Joint Effectiveness Model (AJEM) and Full-Up System Level (FUSL) testing to assess AFV vulnerability. This report reviews and improves upon one of the primary methods that analysts use to validate AJEM, called the Component Damage Vector (CDV) Method. The CDV Method compares vehicle components that were damaged in FUSL testing to simulated representations of that damage from AJEM. In the past, the CDV Method has employed a variety of different analysis techniques and results presentations. Many focused on low-level validation results, detailing each component that was damaged in each FUSL event. The unique contribution of this report, which complements past CDV efforts, is that it focuses on high-level results. This has three purposes: (1) to provide a pithy, yet detailed, validation assessment for a given FUSL test series, (2) to discover high-level trends that cut across an entire FUSL test series, such as whether AJEM performed better for one type of threat versus another, and (3) to compare validation results between multiple FUSL test series.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Grimm, David K, Thomas H Johnson, Lindsey D Butler, Craig Andres, Julia Ivancik, and Russ Dibelka. Component Data Vector Methodology in Support of FUSL-AJEM Validation. IDA Product ID - 3002075. Alexandria, VA: Institute for Defense Analyses, 2024.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Data Principles for Operational and Live-Fire Testing</title>
      <link>https://research.testscience.org/post/2023-data-principles-for-operational-and-live-fire-testing/</link>
      <pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2023-data-principles-for-operational-and-live-fire-testing/</guid>
      <description>Many DOD systems undergo operational testing, which is a field test involving realistic combat conditions. Data, analysis, and reporting are the fundamental outcomes of operational test, which support leadership decisions. The importance of data standardization and interoperability is widely recognized by leadership in DoD; however, there are no generally recognized standards for the management and handling of data (format, pedigree, architecture, transferability, etc.) in the DOD. In this presentation, I will review a set of data principles that we believe DOD should adopt to improve how it manages test data.</description>
      <content:encoded><![CDATA[<p>Many DOD systems undergo operational testing, which is a field test involving realistic combat conditions. Data, analysis, and reporting are the fundamental outcomes of operational test, which support leadership decisions. The importance of data standardization and interoperability is widely recognized by leadership in DoD; however, there are no generally recognized standards for the management and handling of data (format, pedigree, architecture, transferability, etc.) in the DOD. In this presentation, I will review a set of data principles that we believe DOD should adopt to improve how it manages test data. I will explain the current state of data management, each of the data principles, why the DOD should adopt them, and some of the difficulties of improving data handling.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Medlin, Rebecca, John Haman, and Matthew Avery. Data Principles for Operational and Live-Fire Testing. IDA Document NS - 1038201. Alexandria, VA: Institute for Defense Analyses, 2023.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Framework for Operational Test Design- An Example Application of Design Thinking</title>
      <link>https://research.testscience.org/post/2023-framework-for-operational-test-design-an-example-application-of-design-thinking/</link>
      <pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2023-framework-for-operational-test-design-an-example-application-of-design-thinking/</guid>
      <description>This poster provides an example of how a design thinking framework can facilitate operational test design. Design thinking is a problem-solving approach of interest to many groups including those in the test and evaluation community. Design thinking promotes the principles of human-centeredness, iteration, and diversity and it can be accomplished via a five-phased approach. Following this approach, designers create innovative product solutions by (1) conducting research to empathize with their users, (2) defining specific user problems, (3) ideating on solutions that address the defined problems, (4) prototyping the product, and (5) testing the prototype.</description>
      <content:encoded><![CDATA[<p>This poster provides an example of how a design thinking framework can facilitate operational test design. Design thinking is a problem-solving approach of interest to many groups including those in the test and evaluation community. Design thinking promotes the principles of human-centeredness, iteration, and diversity and it can be accomplished via a five-phased approach. Following this approach, designers create innovative product solutions by (1) conducting research to empathize with their users, (2) defining specific user problems, (3) ideating on solutions that address the defined problems, (4) prototyping the product, and (5) testing the prototype.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Avery, Kelly M, and Miriam E Armstrong. An Example Application of Design Thinking. IDA Document NS D-33368. Alexandria, VA: Institute for Defense Analyses, 2023.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Introduction to Design of Experiments for Testers</title>
      <link>https://research.testscience.org/post/2023-introduction-to-design-of-experiments-for-testers/</link>
      <pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2023-introduction-to-design-of-experiments-for-testers/</guid>
      <description>This training provides details regarding the use of design of experiments, from choosing proper response variables, to identifying factors that could affect such responses, to determining the amount of data necessary to collect. The training also explains the benefits of using a Design of Experiments approach to testing and provides an overview of commonly used designs (e.g., factorial, optimal, and space-filling). The briefing illustrates the concepts discussed using several case studies.</description>
      <content:encoded><![CDATA[<p>This training provides details regarding the use of design of experiments, from choosing proper response variables, to identifying factors that could affect such responses, to determining the amount of data necessary to collect. The training also explains the benefits of using a Design of Experiments approach to testing and provides an overview of commonly used designs (e.g., factorial, optimal, and space-filling). The briefing illustrates the concepts discussed using several case studies.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Haman, John T, Breeana Anderson, Rebecca Medlin, Kelly M Avery, and Keyla Pagan-Rivera. I/ITSEC DOE Tutorial. IDA Document NS-D-33561. Alexandria, VA: Institute for Defense Analyses, 2023.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Statistical Methods Development Work for M&amp;S Validation</title>
      <link>https://research.testscience.org/post/2023-statistical-methods-development-work-for-m-s-validation/</link>
      <pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2023-statistical-methods-development-work-for-m-s-validation/</guid>
      <description>We discuss four areas in which statistically rigorous methods contribute to modeling and simulation validation studies. These areas are statistical risk analysis, space-filling experimental designs, metamodel construction, and statistical validation. Taken together, these areas implement DOT&amp;amp;E guidance on model validation. In each area, IDA has contributed either research methods, user-friendly tools, or both. We point to our tools on testscience.org, and survey the research methods that we&amp;rsquo;ve contributed to the M&amp;amp;S validation literature.</description>
      <content:encoded><![CDATA[<p>We discuss four areas in which statistically rigorous methods contribute to modeling and simulation validation studies. These areas are statistical risk analysis, space-filling experimental designs, metamodel construction, and statistical validation. Taken together, these areas implement DOT&amp;E guidance on model validation. In each area, IDA has contributed either research methods, user-friendly tools, or both. We point to our tools on testscience.org, and survey the research methods that we&rsquo;ve contributed to the M&amp;S validation literature.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Miller, Curtis G. “Statistical Methods Development Work for M&amp;S Validation.” International Test and Evaluation Association 44, no. 3 (September 11, 2023). <a href="https://doi.org/10.61278/itea.44.3.1010">https://doi.org/10.61278/itea.44.3.1010</a>.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Statistical Methods for M&amp;S V&amp;V- An Intro for Non-Statisticians</title>
      <link>https://research.testscience.org/post/2023-statistical-methods-for-m-s-v-v-an-intro-for-non-statisticians/</link>
      <pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2023-statistical-methods-for-m-s-v-v-an-intro-for-non-statisticians/</guid>
      <description>This is a briefing intended to motivate and explain the basic concepts of applying statistics to verification and validation. The briefing will be presented at the Navy M&amp;amp;S VV&amp;amp;A WG (Sub-WG on Validation Statistical Method Selection).
Suggested Citation: Pagan-Rivera, Keyla, John T Haman, Kelly M Avery, and Curtis G Miller. Statistical Methods for M&amp;amp;S V&amp;amp;V: An Intro for Non-Statisticians. IDA Product ID-3000770. Alexandria, VA: Institute for Defense Analyses, 2024.</description>
      <content:encoded><![CDATA[<p>This is a briefing intended to motivate and explain the basic concepts of applying statistics to verification and validation. The briefing will be presented at the Navy M&amp;S VV&amp;A WG (Sub-WG on Validation Statistical Method Selection).</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Pagan-Rivera, Keyla, John T Haman, Kelly M Avery, and Curtis G Miller. Statistical Methods for M&amp;S V&amp;V: An Intro for Non-Statisticians. IDA Product ID-3000770. Alexandria, VA: Institute for Defense Analyses, 2024.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Analysis Apps for the Operational Tester</title>
      <link>https://research.testscience.org/post/2022-analysis-apps-for-the-operational-tester/</link>
      <pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2022-analysis-apps-for-the-operational-tester/</guid>
      <description>In the acquisition and testing world, data analysts repeatedly encounter certain categories of data, such as time or distance until an event (e.g., failure, alert, detection), binary outcomes (e.g., success/failure, hit/miss), and survey responses. Analysts need tools that enable them to produce quality and timely analyses of the data they acquire during testing. This poster presents four web-based apps that can analyze these types of data. The apps are designed to assist analysts and researchers with simple repeatable analysis tasks, such as building summary tables and plots for reports or briefings.</description>
      <content:encoded><![CDATA[<p>In the acquisition and testing world, data analysts repeatedly encounter certain categories of data, such as time or distance until an event (e.g., failure, alert, detection), binary outcomes (e.g., success/failure, hit/miss), and survey responses. Analysts need tools that enable them to produce quality and timely analyses of the data they acquire during testing. This poster presents four web-based apps that can analyze these types of data. The apps are designed to assist analysts and researchers with simple repeatable analysis tasks, such as building summary tables and plots for reports or briefings. Using software tools like these apps can increase reproducibility of results, timeliness of analysis and reporting, attractiveness and standardization of aesthetics in figures, and accuracy of results. The first app models reliability of a system or component by fitting parametric statistical distributions to time-to-failure data. The second app fits a logistic regression model to binary data with one or two independent continuous variables. The third calculates summary statistics and produces plots of groups of Likert-scale survey question responses. The fourth calculates the system usability scale (SUS) scores for SUS survey responses and enables the app user to plot scores versus an independent variable. These apps are available for public use on the Test Science Interactive Tools webpage <a href="https://testscience.org/interactive-tools/">https://testscience.org/interactive-tools/</a>.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Lillard, V Bram, and William Whitledge. Analysis Apps for the Operational Tester. IDA Document NS D-32959. Alexandria, VA: Institute for Defense Analyses, 2022.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="poster">Poster:</h4>
<embed src= "poster.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Case Study on Applying Sequential Analyses in Operational Testing</title>
      <link>https://research.testscience.org/post/2022-case-study-on-applying-sequential-analyses-in-operational-testing/</link>
      <pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2022-case-study-on-applying-sequential-analyses-in-operational-testing/</guid>
      <description>Sequential analysis concerns statistical evaluation in which the number, pattern, or composition of the data is not determined at the start of the investigation, but instead depends on the information acquired during the investigation. Although sequential analysis originated in ballistics testing for the Department of Defense (DoD) and is widely used in other disciplines, it is underutilized in the DoD. Expanding the use of sequential analysis may save money and reduce test time.</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/gYTY5OJY4Yo?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>Sequential analysis concerns statistical evaluation in which the number, pattern, or composition of the data is not determined at the start of the investigation, but instead depends on the information acquired during the investigation. Although sequential analysis originated in ballistics testing for the Department of Defense (DoD) and is widely used in other disciplines, it is underutilized in the DoD. Expanding the use of sequential analysis may save money and reduce test time. In this paper, we introduce sequential analysis, describe its current and potential uses in operational test and evaluation (OT&amp;E), and present a method for applying it to the test and evaluation of defense systems. We evaluate the proposed method by performing simulation studies and applying the method to a case study. Additionally, we discuss challenges to address for sequential analysis in OT&amp;E. Lastly, while operational testing is the focus in this paper, the methodology presented is applicable to campaigns of experimentation and general testing across numerous disciplines.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Ahrens, Monica, Rebecca Medlin, Keyla Pagán-Rivera, and John W. Dennis. “Case Study on Applying Sequential Analyses in Operational Testing.” Quality Engineering 35, no. 3 (July 3, 2023): 534–45. <a href="https://doi.org/10.1080/08982112.2022.2146510">https://doi.org/10.1080/08982112.2022.2146510</a>.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Introduction to Git</title>
      <link>https://research.testscience.org/post/2022-introduction-to-git/</link>
      <pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2022-introduction-to-git/</guid>
      <description>Version control software manages, archives, and (optionally) distributes different versions of files. The most popular program for version control is Git, which serves as the backbone of websites such as GitHub, Bitbucket, and others. In this mini-tutorial, we will introduce the basics of version control in general, and Git in particular. We explain what role Git plays in a reproducible research context. The goal of the course is to get participants started using Git.</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/Pq6dHTNMlOA?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>Version control software manages, archives, and (optionally) distributes different versions of files. The most popular program for version control is Git, which serves as the backbone of websites such as GitHub, Bitbucket, and others. In this mini-tutorial, we will introduce the basics of version control in general, and Git in particular. We explain what role Git plays in a reproducible research context. The goal of the course is to get participants started using Git. We will create and clone repositories, add and track files in a repository, and manage Git branches. We also discuss a few Git best practices.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Miller, Curtis G, and John T Haman. Introduction to Git. IDA Document NS D-33021. Alexandria, VA: Institute for Defense Analyses, 2022.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Measuring Training Efficacy- Structural Validation of the Operational Assessment of Training Scale</title>
      <link>https://research.testscience.org/post/2022-measuring-training-efficacy-structural-validation-of-the-operational-assessment-of-training-scale/</link>
      <pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2022-measuring-training-efficacy-structural-validation-of-the-operational-assessment-of-training-scale/</guid>
      <description>Effective training of the broad set of users/operators of systems has downstream impacts on usability, workload, and ultimate system performance that are related to mission success. In order to measure training effectiveness, we designed a survey called the Operational Assessment of Training Scale (OATS) in partnership with the Army Test and Evaluation Center (ATEC). Two subscales were designed to assess the degrees to which training covered relevant content for real operations (Relevance subscale) and enabled self-rated ability to interact with systems effectively after training (Efficacy subscale).</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/0XkuBNb1TBg?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>Effective training of the broad set of users/operators of systems has downstream impacts on usability, workload, and ultimate system performance that are related to mission success. In order to measure training effectiveness, we designed a survey called the Operational Assessment of Training Scale (OATS) in partnership with the Army Test and Evaluation Center (ATEC). Two subscales were designed to assess the degrees to which training covered relevant content for real operations (Relevance subscale) and enabled self-rated ability to interact with systems effectively after training (Efficacy subscale). The full list of 15 items was given to over 700 users/operators across a range of military systems and test events (comprising both developmental and operational testing phases). Systems included vehicles, aircraft, C3 systems, and dismounted squad equipment, among other types. We evaluated reliability of the factor structure across these military samples using confirmatory factor analysis. We confirmed that OATS exhibited a two-factor structure for training relevance and training efficacy. Additionally, a shortened, six-item measure of the OATS with three items per subscale continues to fit observed data well, allowing for quicker assessments of training. We discuss various ways that the OATS can be applied to one-off, multi-day, multi-event, and other types of training events. Additional OATS details and information about other scales for test and evaluation are available at the Institute for Defense Analyses&rsquo; website, <a href="https://testscience.org/validated-scales-repository/">https://testscience.org/validated-scales-repository/</a>.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Vickers, Brian D, Daniel J Porter, Rachel A Haga, Heather M Wojton, and V. Bram Lillard. Measuring Training Efficacy: Structural Validation of the Operational Assessment of Training Scale (OATS). IDA Document NS D-32972. Alexandria, VA: Institute for Defense Analyses, 2022.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>What Statisticians Should Do to Improve M&amp;S Validation Studies</title>
      <link>https://research.testscience.org/post/2022-what-statisticians-should-do-to-improve-m-s-validation-studies/</link>
      <pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2022-what-statisticians-should-do-to-improve-m-s-validation-studies/</guid>
      <description>It is often said that many research findings &amp;ndash; from social sciences, medicine, economics, and other disciplines &amp;ndash; are false. This fact is trumpeted in the media and by many statisticians. There are several reasons that false research is published, but to what extent should we be worried about them in defense testing and modeling and simulation? In this talk I will present several recommendations for actions that statisticians and data scientists can take to improve the quality of our validations and evaluations.</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/8rZNYHeCJNU?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>It is often said that many research findings &ndash; from social sciences, medicine, economics, and other disciplines &ndash; are false. This fact is trumpeted in the media and by many statisticians. There are several reasons that false research is published, but to what extent should we be worried about them in defense testing and modeling and simulation? In this talk I will present several recommendations for actions that statisticians and data scientists can take to improve the quality of our validations and evaluations.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Haman, John T. What Statisticians Should Do to Improve M&amp;S Validation Studies. Alexandria, VA: Institute for Defense Analyses, 2022.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Artificial Intelligence &amp; Autonomy Test &amp; Evaluation Roadmap Goals</title>
      <link>https://research.testscience.org/post/2021-artificial-intelligence-autonomy-test-evaluation-roadmap-goals/</link>
      <pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2021-artificial-intelligence-autonomy-test-evaluation-roadmap-goals/</guid>
      <description>As the Department of Defense acquires new systems with artificial intelligence (AI) and autonomous (AI&amp;amp;A) capabilities, the test and evaluation (T&amp;amp;E) community will need to adapt to the challenges that these novel technologies present. The goals listed in this AI Roadmap address the broad range of tasks that the T&amp;amp;E community will need to achieve in order to properly test, evaluate, verify, and validate AI-enabled and autonomous systems. It includes issues that are unique to AI and autonomous systems, as well as legacy T&amp;amp;E shortcomings that will be compounded by newer technologies.</description>
      <content:encoded><![CDATA[<p>As the Department of Defense acquires new systems with artificial intelligence (AI) and autonomous (AI&amp;A) capabilities, the test and evaluation (T&amp;E) community will need to adapt to the challenges that these novel technologies present. The goals listed in this AI Roadmap address the broad range of tasks that the T&amp;E community will need to achieve in order to properly test, evaluate, verify, and validate AI-enabled and autonomous systems. It includes issues that are unique to AI and autonomous systems, as well as legacy T&amp;E shortcomings that will be compounded by newer technologies.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Wojton, Heather, Brian Vickers, Daniel Porter, and Rachel Haga. Artificial Intelligence &amp; Autonomy Test &amp; Evaluation Roadmap Goals. IDA Document NS D-22750. Alexandria, VA: Institute for Defense Analyses, 2021.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Determining How Much Testing is Enough- An Exploration of Progress in the Department of Defense Test and Evaluation Community</title>
      <link>https://research.testscience.org/post/2021-determining-how-much-testing-is-enough-an-exploration-of-progress-in-the-department-of-defense-test-and-evaluation-community/</link>
      <pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2021-determining-how-much-testing-is-enough-an-exploration-of-progress-in-the-department-of-defense-test-and-evaluation-community/</guid>
      <description>This paper describes holistic progress in answering the question of “How much testing is enough?” It covers areas in which the T&amp;amp;E community has made progress, areas in which progress remains elusive, and issues that have emerged since 1994 that provide additional challenges. The selected case studies used to highlight progress are especially interesting examples, rather than a comprehensive look at all programs since 1994.</description>
      <content:encoded><![CDATA[<p>This paper describes holistic progress in answering the question of “How much testing is enough?” It covers areas in which the T&amp;E community has made progress, areas in which progress remains elusive, and issues that have emerged since 1994 that provide additional challenges. The selected case studies used to highlight progress are especially interesting examples, rather than a comprehensive look at all programs since 1994.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Medlin, Rebecca, Matthew R Avery, James R Simpson, and Heather M Wojton. Determining How Much Testing Is Enough: An Exploration of Progress in the Department of Defense Test and Evaluation Community. IDA Document NS D-21561. Alexandria, VA: Institute for Defense Analyses, 2021.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Introduction to Qualitative Methods</title>
      <link>https://research.testscience.org/post/2021-introduction-to-qualitative-methods/</link>
      <pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2021-introduction-to-qualitative-methods/</guid>
      <description>Qualitative data, captured through free-form comment boxes, interviews, focus groups, and activity observation, is heavily employed in testing and evaluation (T&amp;amp;E). The qualitative research approach can offer many benefits, but knowledge of how to implement methods, collect data, and analyze data according to rigorous qualitative research standards is not broadly understood within the T&amp;amp;E community.
This tutorial offers insight into the foundational concepts of method and practice that embody defensible approaches to qualitative research.</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/R8Qwc5IF1C8?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>Qualitative data, captured through free-form comment boxes, interviews, focus groups, and activity observation, is heavily employed in testing and evaluation (T&amp;E). The qualitative research approach can offer many benefits, but knowledge of how to implement methods, collect data, and analyze data according to rigorous qualitative research standards is not broadly understood within the T&amp;E community.</p>
<p>This tutorial offers insight into the foundational concepts of method and practice that embody defensible approaches to qualitative research. We discuss where qualitative data comes from, how it can be captured, what kind of value it offers, and how to capitalize on that value through methods and best practices.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Medlin, Rebecca, Kristina Carter, Emily Fedele, and Daniel Hellmann. Introduction to Qualitative Methods. IDA Document NS D-21591. Alexandria, VA: Institute for Defense Analyses, 2021.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Why are Statistical Engineers Needed for Test &amp; Evaluation?</title>
      <link>https://research.testscience.org/post/2021-why-are-statistical-engineers-needed-for-test-evaluation/</link>
      <pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2021-why-are-statistical-engineers-needed-for-test-evaluation/</guid>
      <description>The Department of Defense (DoD) develops and acquires some of the world’s most advanced and sophisticated systems. As new technologies emerge and are incorporated into systems, OSD/DOT&amp;amp;E faces the challenge of ensuring that these systems undergo adequate and efficient test and evaluation (T&amp;amp;E) prior to operational use. Statistical engineering is a collaborative, analytical approach to problem solving that integrates statistical thinking, methods, and tools with other relevant disciplines. The statistical engineering process provides better solutions to large, unstructured, real-world problems and supports rigorous decision-making.</description>
      <content:encoded><![CDATA[<p>The Department of Defense (DoD) develops and acquires some of the world’s most advanced and sophisticated systems. As new technologies emerge and are incorporated into systems, OSD/DOT&amp;E faces the challenge of ensuring that these systems undergo adequate and efficient test and evaluation (T&amp;E) prior to operational use. Statistical engineering is a collaborative, analytical approach to problem solving that integrates statistical thinking, methods, and tools with other relevant disciplines. The statistical engineering process provides better solutions to large, unstructured, real-world problems and supports rigorous decision-making. In this talk, we present two case studies that explore ways to better integrate testing and data collection across the full system lifecycle. These case studies highlight why we believe statistical engineers are necessary for successful T&amp;E.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Medlin, Rebecca, Kayla Pagan-Rivera, and Monica Ahrens. Why Are Statistical Engineers Needed for Test &amp; Evaluation? IDA Document NS-D-22722. Alexandria, VA: Institute for Defense Analyses, 2021.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>A Validation Case Study- The Environment Centric Weapons Analysis Facility (ECWAF)</title>
      <link>https://research.testscience.org/post/2020-a-validation-case-study-the-environment-centric-weapons-analysis-facility-ecwaf/</link>
      <pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2020-a-validation-case-study-the-environment-centric-weapons-analysis-facility-ecwaf/</guid>
      <description>Reliable modeling and simulation (M&amp;amp;S) allows the undersea warfare community to understand torpedo performance in scenarios that could never be created in live testing, and do so for a fraction of the cost of an in-water test. The Navy hopes to use the Environment Centric Weapons Analysis Facility (ECWAF), a hardware-in-the-loop simulation, to predict torpedo effectiveness and supplement live operational testing. In order to trust the model&amp;rsquo;s results, the T&amp;amp;E community has applied rigorous statistical design of experiments techniques to both live and simulation testing.</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/ujrZakOLJJ4?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>Reliable modeling and simulation (M&amp;S) allows the undersea warfare community to understand torpedo performance in scenarios that could never be created in live testing, and do so for a fraction of the cost of an in-water test. The Navy hopes to use the Environment Centric Weapons Analysis Facility (ECWAF), a hardware-in-the-loop simulation, to predict torpedo effectiveness and supplement live operational testing. In order to trust the model&rsquo;s results, the T&amp;E community has applied rigorous statistical design of experiments techniques to both live and simulation testing. As part of ECWAF&rsquo;s two-phased validation approach, we ran the M&amp;S experiment with the legacy torpedo and developed an empirical emulator of the ECWAF using logistic regression. Comparing the emulator&rsquo;s predictions to actual outcomes from live test events supported the test design for the upgraded torpedo. This talk overviews the ECWAF&rsquo;s validation strategy, decisions that have put the ECWAF on a promising path, and the metrics used to quantify uncertainty.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Bartis, Elliot, and Steven Rabinowitz. A Validation Case Study: The Environment Centric Weapons Analysis Facility (ECWAF). IDA Document NS D-12081. Alexandria, VA: Institute for Defense Analyses, 2020.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>T&amp;E Contributions to Avoiding Unintended Behaviors in Autonomous Systems</title>
      <link>https://research.testscience.org/post/2020-t-e-contributions-to-avoiding-unintended-behaviors-in-autonomous-systems/</link>
      <pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2020-t-e-contributions-to-avoiding-unintended-behaviors-in-autonomous-systems/</guid>
      <description>To provide assurance that AI-enabled systems will behave appropriately across the range of their operating conditions without performing exhaustive testing, the DoD will need to make inferences about system decision making. However, making these inferences validly requires understanding what causally drives system decision-making, which is not possible when systems are black boxes. In this briefing, we discuss the state of the art and gaps in techniques for obtaining, verifying, validating, and accrediting (OVVA) models of system decision-making.</description>
      <content:encoded><![CDATA[<p>To provide assurance that AI-enabled systems will behave appropriately across the range of their operating conditions without performing exhaustive testing, the DoD will need to make inferences about system decision making. However, making these inferences validly requires understanding what causally drives system decision-making, which is not possible when systems are black boxes. In this briefing, we discuss the state of the art and gaps in techniques for obtaining, verifying, validating, and accrediting (OVVA) models of system decision-making.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Porter, Daniel J, and Heather Wojton. T&amp;E Contributions to Avoiding Unintended Behaviors in Autonomous Systems. IDA Document NS D-12078. Alexandria, VA: Institute for Defense Analyses, 2020.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Test &amp; Evaluation of AI-Enabled and Autonomous Systems- A Literature Review</title>
      <link>https://research.testscience.org/post/2020-test-evaluation-of-ai-enabled-and-autonomous-systems-a-literature-review/</link>
      <pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2020-test-evaluation-of-ai-enabled-and-autonomous-systems-a-literature-review/</guid>
      <description>We summarize a subset of the literature regarding the challenges to and recommendations for the test, evaluation, verification, and validation (TEV&amp;amp;V) of autonomous military systems. This literature review is meant for informational purposes only and does not make any recommendations of its own. A synthesis of the literature identified the following categories of TEV&amp;amp;V challenges
Problems arising from the complexity of autonomous systems,
Challenges imposed by the structure of the current acquisition system,</description>
      <content:encoded><![CDATA[<p>We summarize a subset of the literature regarding the challenges to and recommendations for the test, evaluation, verification, and validation (TEV&amp;V) of autonomous military systems. This literature review is meant for informational purposes only and does not make any recommendations of its own. A synthesis of the literature identified the following categories of TEV&amp;V challenges:</p>
<ol>
<li>
<p>Problems arising from the complexity of autonomous systems,</p>
</li>
<li>
<p>Challenges imposed by the structure of the current acquisition system,</p>
</li>
<li>
<p>Lack of methods, tools, and infrastructure for testing,</p>
</li>
<li>
<p>Novel safety and security issues,</p>
</li>
<li>
<p>A lack of consensus on policy, standards, and metrics,</p>
</li>
<li>
<p>Issues around how to integrate humans into the operation and testing of these systems.</p>
</li>
</ol>
<p>Recommendations for how to test autonomous military systems can be sorted into five broad groups:</p>
<ol>
<li>
<p>Use certain processes for writing requirements, or for designing and developing systems,</p>
</li>
<li>
<p>Make targeted investments to develop methods or tools, improve our test infrastructure, or enhance our workforce&rsquo;s AI skillsets,</p>
</li>
<li>
<p>Use specific proposed test frameworks,</p>
</li>
<li>
<p>Employ novel methods for system safety or cybersecurity, and</p>
</li>
<li>
<p>Adopt specific proposed policies, standards, or metrics.</p>
</li>
</ol>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Wojton, Heather M, Daniel J Porter, and John W Dennis. Test &amp; Evaluation of AI-Enabled and Autonomous Systems: A Literature Review. IDA Document NS-D-14331. Alexandria, VA: Institute for Defense Analyses, 2020.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Trustworthy Autonomy- A Roadmap to Assurance -- Part 1- System Effectiveness</title>
      <link>https://research.testscience.org/post/2020-trustworthy-autonomy-a-roadmap-to-assurance-part-1-system-effectiveness/</link>
      <pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2020-trustworthy-autonomy-a-roadmap-to-assurance-part-1-system-effectiveness/</guid>
      <description>The Department of Defense (DoD) has invested significant effort over the past decade considering the role of artificial intelligence and autonomy in national security (e.g., Defense Science Board, 2012, 2016, Deputy Secretary of Defense, 2012, Endsley, 2015, Executive Order No. 13859, 2019, US Department of Defense, 2011, 2019, Zacharias, 2019a). However, these efforts were broadly scoped and only partially touched on how the DoD will certify the safety and performance of these systems.</description>
      <content:encoded><![CDATA[<p>The Department of Defense (DoD) has invested significant effort over the past decade considering the role of artificial intelligence and autonomy in national security (e.g., Defense Science Board, 2012, 2016, Deputy Secretary of Defense, 2012, Endsley, 2015, Executive Order No. 13859, 2019, US Department of Defense, 2011, 2019, Zacharias, 2019a). However, these efforts were broadly scoped and only partially touched on how the DoD will certify the safety and performance of these systems. More recent work has done this big-picture thinking for the test and evaluation (T&amp;E) community (e.g., Ahner &amp; Parson, 2016, Haugh, Sparrow, &amp; Tate, 2018, Porter et al., 2018, Sparrow, Tate, Biddle, Kaminski, &amp; Madhavan, 2018, Zacharias, 2019b). In parallel, individual programs have been generating their own working-level solutions for their own particular use-cases and challenges.</p>
<p>The framework proposed in the current work bridges the gap between the big-picture policy recommendations already made and individual program needs. It is meant to serve as a roadmap framework that the T&amp;E community can follow in order to provide evidence that artificial intelligence (AI)-enabled and autonomous systems function as intended. At times we echo broad policy recommendations made by others as they will also enable T&amp;E activities. In other places we make more specific recommendations relating to test planning and analysis. In this document, we present part one of our two-part roadmap. We discuss the challenges and possible solutions to assessing system effectiveness. A future part two will deal with test efficiency, simulation, and infrastructure. Due to the scope of this project, even the main body of this document only provides a survey of the challenges and our proposed solutions. However, this roadmap serves as an outline for a future series of technical papers covering these topics in detail for working-level testers and analysts.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Porter, Daniel, Michael McAnally, Chad Bieber, Heather Wojton, and Rebecca Medlin. Trustworthy Autonomy: A Roadmap to Assurance Part I: System Effectiveness. IDA Document P-10768-NS. Alexandria, VA: Institute for Defense Analyses, 2020.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Visualizing Data- I Don&#39;t Remember that Memo, but I Do Remember that Graph</title>
      <link>https://research.testscience.org/post/2020-visualizing-data-i-don-t-remember-that-memo-but-i-do-remember-that-graph/</link>
      <pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2020-visualizing-data-i-don-t-remember-that-memo-but-i-do-remember-that-graph/</guid>
      <description>IDA analysts strive to communicate clearly and effectively. Good data visualizations can enhance reports by making the conclusions easier to understand and more memorable. The goal of this seminar is to help you avoid settling for factory defaults and instead present your conclusions through visually appealing and understandable charts. Topics covered include choosing the right level of detail, guidelines for different types of graphical elements (titles, legends, annotations, etc.), selecting the right variable encodings (color, plot symbol, etc.</description>
      <content:encoded><![CDATA[<p>IDA analysts strive to communicate clearly and effectively. Good data visualizations can enhance reports by making the conclusions easier to understand and more memorable. The goal of this seminar is to help you avoid settling for factory defaults and instead present your conclusions through visually appealing and understandable charts. Topics covered include choosing the right level of detail, guidelines for different types of graphical elements (titles, legends, annotations, etc.), selecting the right variable encodings (color, plot symbol, etc.), advice on practical implementations, and determining whether to include a chart at all. Most of the time, there’s no single “right” answer, so this presentation will include audience discussion to examine the trade-offs associated with different options.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Avery, Matthew, Heather Wojton, Andrew Flack, and Brian Vickers. Visualizing Data: I Don’t Remember That Memo, but I Do Remember That Graph. Alexandria, VA: Institute for Defense Analyses, 2020.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="poster">Poster:</h4>
<embed src= "poster.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Demystifying the Black Box- A Test Strategy for Autonomy</title>
      <link>https://research.testscience.org/post/2019-demystifying-the-black-box-a-test-strategy-for-autonomy/</link>
      <pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2019-demystifying-the-black-box-a-test-strategy-for-autonomy/</guid>
      <description>The purpose of this briefing is to provide a high-level overview of how to frame the question of testing autonomous systems in a way that will enable development of successful test strategies. The brief outlines the challenges and broad-stroke reforms needed to get ready for the test challenges of the next century.</description>
      <content:encoded><![CDATA[<p>The purpose of this briefing is to provide a high-level overview of how to frame the question of testing autonomous systems in a way that will enable development of successful test strategies. The brief outlines the challenges and broad-stroke reforms needed to get ready for the test challenges of the next century.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Wojton, Heather M, and Daniel J Porter. Demystifying the Black Box: A Test Strategy for Autonomy. IDA Document NS D-10465-NS. Alexandria, VA: Institute for Defense Analyses, 2019.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Designing Experiments for Model Validation- The Foundations for Uncertainty Quantification</title>
      <link>https://research.testscience.org/post/2019-designing-experiments-for-model-validation-the-foundations-for-uncertainty-quantification/</link>
      <pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2019-designing-experiments-for-model-validation-the-foundations-for-uncertainty-quantification/</guid>
      <description>Advances in computational power have allowed both greater fidelity and more extensive use of computer models. Many complex military systems have a corresponding model that simulates their performance in the field. In response, the DoD needs defensible practices for validating these models. Design of Experiments and statistical analysis techniques are the foundational building blocks for validating the use of computer models and quantifying uncertainty in that validation. Recent developments in uncertainty quantification have the potential to benefit the DoD in using modeling and simulation to inform operational evaluations.</description>
      <content:encoded><![CDATA[<p>Advances in computational power have allowed both greater fidelity and more extensive use of computer models. Many complex military systems have a corresponding model that simulates their performance in the field. In response, the DoD needs defensible practices for validating these models. Design of Experiments and statistical analysis techniques are the foundational building blocks for validating the use of computer models and quantifying uncertainty in that validation. Recent developments in uncertainty quantification have the potential to benefit the DoD in using modeling and simulation to inform operational evaluations.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Wojton, Heather, Kelly Avery, Laura Freeman, and Thomas Johnson. “Designing Experiments for Model Validation – The Foundations for Uncertainty Quantification.” The ITEA Journal of Test and Evaluation 40, no. 1 (2019).</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Managing T&amp;E Data to Encourage Reuse</title>
      <link>https://research.testscience.org/post/2019-managing-t-e-data-to-encourage-reuse/</link>
      <pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2019-managing-t-e-data-to-encourage-reuse/</guid>
      <description>Reusing Test and Evaluation (T&amp;amp;E) datasets multiple times at different points throughout a program’s lifecycle is one way to realize their full value. Data management plays an important role in enabling – and even encouraging – this practice. Although Department-level policy on data management is supportive of reuse and consistent with best practices from industry and academia, the documents that shape the day-to-day activities of T&amp;amp;E practitioners are much less so.</description>
      <content:encoded><![CDATA[<p>Reusing Test and Evaluation (T&amp;E) datasets multiple times at different points throughout a program’s lifecycle is one way to realize their full value. Data management plays an important role in enabling – and even encouraging – this practice. Although Department-level policy on data management is supportive of reuse and consistent with best practices from industry and academia, the documents that shape the day-to-day activities of T&amp;E practitioners are much less so. As a result, reuse of T&amp;E datasets does not occur on a consistent basis or in a formalized way. To fill this apparent gap, this article expands upon four best practices – addressed in different ways in Service-specific T&amp;E policies – that can increase the reuse of T&amp;E datasets.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Medlin, Rebecca, and Andrew Flack. “Managing T&amp;E Data to Encourage Reuse.” The ITEA Journal of Test and Evaluation, 2019.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Pilot Training Next- Modeling Skill Transfer in a Military Learning Environment</title>
      <link>https://research.testscience.org/post/2019-pilot-training-next-modeling-skill-transfer-in-a-military-learning-environment/</link>
      <pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2019-pilot-training-next-modeling-skill-transfer-in-a-military-learning-environment/</guid>
      <description>Pilot Training Next is an exploratory investigation of new technologies and procedures to increase the efficiency of Undergraduate Pilot Training in the United States Air Force. IDA analysts present a method of quantifying skill transfer from simulators to aircraft under realistic, uncontrolled conditions.
Suggested Citation Porter, Daniel, Emily Fedele, and Heather Wojton. Pilot Training Next: Modeling Skill Transfer in a Military Learning Environment. IDA Document NS D-10927. Alexandria, VA: Institute for Defense Analyses, 2019.</description>
      <content:encoded><![CDATA[<p>Pilot Training Next is an exploratory investigation of new technologies and procedures to increase the efficiency of Undergraduate Pilot Training in the United States Air Force. IDA analysts present a method of quantifying skill transfer from simulators to aircraft under realistic, uncontrolled conditions.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Porter, Daniel, Emily Fedele, and Heather Wojton. Pilot Training Next: Modeling Skill Transfer in a Military Learning Environment. IDA Document NS D-10927. Alexandria, VA: Institute for Defense Analyses, 2019.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_NS-D-10927-1.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Reproducible Research Mini-Tutorial</title>
      <link>https://research.testscience.org/post/2019-reproducible-research-mini-tutorial/</link>
      <pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2019-reproducible-research-mini-tutorial/</guid>
      <description>Analyses are reproducible if the same methods applied to the same data produce identical results when run again by another researcher (or you in the future). Reproducible analyses are transparent and easy for reviewers to verify, as results and figures can be traced directly to the data and methods that produced them. There are also direct benefits to the researcher. Real-world analysis workflows inevitably require changes to incorporate new or additional data, or to address feedback from collaborators, reviewers, or sponsors.</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/WOyEulNqCKA?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>Analyses are reproducible if the same methods applied to the same data produce identical results when run again by another researcher (or you in the future). Reproducible analyses are transparent and easy for reviewers to verify, as results and figures can be traced directly to the data and methods that produced them. There are also direct benefits to the researcher. Real-world analysis workflows inevitably require changes to incorporate new or additional data, or to address feedback from collaborators, reviewers, or sponsors.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Wojton, Heather, Andrew Flack, John Haman, and Kevin Kirshenbaum. Reproducible Research Mini-Tutorial. IDA Document NS D-10581. Alexandria, VA: Institute for Defense Analyses, 2019.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Statistics Boot Camp</title>
      <link>https://research.testscience.org/post/2019-statistics-boot-camp/</link>
      <pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2019-statistics-boot-camp/</guid>
      <description>In the test community, we frequently use statistics to extract meaning from data. These inferences may be drawn with respect to topics ranging from system performance to human factors. In this mini-tutorial, we will begin by discussing the use of descriptive and inferential statistics. We will continue by discussing commonly used parametric and nonparametric statistics within the defense community, ranging from comparisons of distributions to comparisons of means. We will conclude with a brief discussion of how to present your statistical findings graphically for maximum impact.</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/qVPtm43prdU?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>In the test community, we frequently use statistics to extract meaning from data. These inferences may be drawn with respect to topics ranging from system performance to human factors. In this mini-tutorial, we will begin by discussing the use of descriptive and inferential statistics. We will continue by discussing commonly used parametric and nonparametric statistics within the defense community, ranging from comparisons of distributions to comparisons of means. We will conclude with a brief discussion of how to present your statistical findings graphically for maximum impact.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Wojton, Heather M, Rebecca M Medlin, Kelly M Avery, and Stephanie T Lane. Statistics Bootcamp. IDA Document NS D-10565. Alexandria, VA: Institute for Defense Analyses, 2019.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Survey Testing Automation Tool (STAT)</title>
      <link>https://research.testscience.org/post/2019-survey-testing-automation-tool-stat/</link>
      <pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2019-survey-testing-automation-tool-stat/</guid>
      <description>In operational testing, survey administration is typically a manual, paper-driven process. We developed a web-based tool called Survey Testing Automation Tool (STAT), which integrates and automates survey construction, administration, and analysis procedures. STAT introduces a standardized approach to the construction of surveys and includes capabilities for survey management, survey planning, and form generation.
Suggested Citation Finnegan, Gary M, Kelly Tran, Tara A McGovern, and William R Whitledge. Survey Testing Automation Tool (STAT).</description>
      <content:encoded><![CDATA[<p>In operational testing, survey administration is typically a manual, paper-driven process. We developed a web-based tool called Survey Testing Automation Tool (STAT), which integrates and automates survey construction, administration, and analysis procedures. STAT introduces a standardized approach to the construction of surveys and includes capabilities for survey management, survey planning, and form generation.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Finnegan, Gary M, Kelly Tran, Tara A McGovern, and William R Whitledge. Survey Testing Automation Tool (STAT). IDA Document NS D-10566. Alexandria, VA: Institute for Defense Analyses, 2019.</p>
</blockquote>
<h4 id="poster">Poster:</h4>
<embed src= "poster.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Use of Design of Experiments in Survivability Testing</title>
      <link>https://research.testscience.org/post/2019-use-of-design-of-experiments-in-survivability-testing/</link>
      <pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2019-use-of-design-of-experiments-in-survivability-testing/</guid>
      <description>The purpose of survivability testing is to provide decision makers with relevant, credible evidence about the survivability of an aircraft that is conveyed with some degree of certainty or inferential weight. In developing an experiment to accomplish this goal, a test planner faces numerous questions: What critical issue or issues are being addressed? What data are needed to answer the critical issues? What test conditions should be varied? What is the most economical way of varying those conditions?</description>
      <content:encoded><![CDATA[<p>The purpose of survivability testing is to provide decision makers with relevant, credible evidence about the survivability of an aircraft that is conveyed with some degree of certainty or inferential weight. In developing an experiment to accomplish this goal, a test planner faces numerous questions: What critical issue or issues are being addressed? What data are needed to answer the critical issues? What test conditions should be varied? What is the most economical way of varying those conditions? How many test articles are needed? Design of Experiments provides an analytical basis for test planning tradeoffs when answering these questions.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Couch, Mark, John Haman, Thomas Johnson, and Heather Wojton. “Designs of Experiments (DOE) in Survivability Testing.” Joint Aircraft Survivability Program - JASP Online (blog), March 2019. <a href="https://www.jasp-online.org/asjournal/summer-2019/designs-of-experiments-doe-in-survivability-testing/">https://www.jasp-online.org/asjournal/summer-2019/designs-of-experiments-doe-in-survivability-testing/</a>.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>A Groundswell for Test and Evaluation</title>
      <link>https://research.testscience.org/post/2018-a-groundswell-for-test-and-evaluation/</link>
      <pubDate>Mon, 01 Jan 2018 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2018-a-groundswell-for-test-and-evaluation/</guid>
      <description>The fundamental purpose of test and evaluation (T&amp;amp;E) in the Department of Defense (DOD) is to provide knowledge to answer critical questions that help decision makers manage the risk involved in developing, producing, operating, and sustaining systems and capabilities. At its core, T&amp;amp;E takes data and translates it into information for decision makers. Subject matter expertise of the platform and operational mission have always been critical components of developing defensible test and evaluation strategies.</description>
      <content:encoded><![CDATA[<p>The fundamental purpose of test and evaluation (T&amp;E) in the Department of Defense (DOD) is to provide knowledge to answer critical questions that help decision makers manage the risk involved in developing, producing, operating, and sustaining systems and capabilities. At its core, T&amp;E takes data and translates it into information for decision makers. Subject matter expertise of the platform and operational mission have always been critical components of developing defensible test and evaluation strategies. Recent innovations in data science have improved our ability to collect, store, manage, transfer, process and visualize data. Additionally, advances in statistics and uncertainty quantification are revolutionizing how we think about predictions from all types of data. The ability to integrate system and scientific knowledge, coupled with advances in data science and statistics, will enable us to better target testing, make efficient use of resources, quantify risk, and lead to well informed decisions.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Freeman, Laura J. “A Groundswell for Test and Evaluation.” The ITEA Journal 39, no. 4 (December 2018).</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Informing the Warfighter—Why Statistical Methods Matter in Defense Testing</title>
      <link>https://research.testscience.org/post/2018-informing-the-warfighter-why-statistical-methods-matter-in-defense-testing/</link>
      <pubDate>Mon, 01 Jan 2018 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2018-informing-the-warfighter-why-statistical-methods-matter-in-defense-testing/</guid>
      <description>Suggested Citation Freeman, Laura J., and Catherine Warner. “Informing the Warfighter—Why Statistical Methods Matter in Defense Testing.” CHANCE 31, no. 2 (April 3, 2018): 4–11. https://doi.org/10.1080/09332480.2018.1467627.</description>
      <content:encoded><![CDATA[
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Freeman, Laura J., and Catherine Warner. “Informing the Warfighter—Why Statistical Methods Matter in Defense Testing.” CHANCE 31, no. 2 (April 3, 2018): 4–11. <a href="https://doi.org/10.1080/09332480.2018.1467627">https://doi.org/10.1080/09332480.2018.1467627</a>.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>JEDIS Briefing and Tutorial</title>
      <link>https://research.testscience.org/post/2018-jedis-briefing-and-tutorial/</link>
      <pubDate>Mon, 01 Jan 2018 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2018-jedis-briefing-and-tutorial/</guid>
      <description>Are you sick of having to manually iterate your way through sizing your design of experiments? Come learn about JEDIS, the new IDA-developed JMP Add-In for automating design of experiments power calculations. JEDIS builds multiple test designs in JMP over user-specified ranges of sample sizes, Signal-to-Noise Ratios (SNR), and alpha (1 - confidence) levels. It then automatically calculates the statistical power to detect an effect due to each factor and any specified interactions for each design.</description>
      <content:encoded><![CDATA[<p>Are you sick of having to manually iterate your way through sizing your design of experiments? Come learn about JEDIS, the new IDA-developed JMP Add-In for automating design of experiments power calculations. JEDIS builds multiple test designs in JMP over user-specified ranges of sample sizes, Signal-to-Noise Ratios (SNR), and alpha (1 - confidence) levels. It then automatically calculates the statistical power to detect an effect due to each factor and any specified interactions for each design. When finished, JEDIS presents the statistical power vs. design metrics in interactive plots and stores the data in an easy-to-use format. JEDIS creates factorial and optimal designs, but does not currently support split-plot designs. If you already have a pre-made design table, the JEDIS Light feature can compute power for the design over ranges of SNR and alpha levels.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Pechkis, Daniel, and Jason P Sheldon. JEDIS Briefing and Tutorial. IDA Document NS D-8964. Alexandria, VA: Institute for Defense Analyses, 2018.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Reliability Best Practices and Lessons Learned in the Department of Defense</title>
      <link>https://research.testscience.org/post/2018-reliability-best-practices-and-lessons-learned-in-the-department-of-defense/</link>
      <pubDate>Mon, 01 Jan 2018 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2018-reliability-best-practices-and-lessons-learned-in-the-department-of-defense/</guid>
      <description>Despite the importance of acquiring reliable systems to support the warfighter, many military programs fail to meet reliability requirements, which affects the overall suitability and cost of the system. To determine ways to improve reliability outcomes in the future, research staff from the Institute for Defense Analyses Operational Evaluation Division compiled case studies identifying reliability lessons learned and best practices for several DOT&amp;amp;E oversight programs. The case studies provide program-specific information on strategies that worked well or did not work well to produce reliable systems.</description>
      <content:encoded><![CDATA[<p>Despite the importance of acquiring reliable systems to support the warfighter, many military programs fail to meet reliability requirements, which affects the overall suitability and cost of the system. To determine ways to improve reliability outcomes in the future, research staff from the Institute for Defense Analyses Operational Evaluation Division compiled case studies identifying reliability lessons learned and best practices for several DOT&amp;E oversight programs. The case studies provide program-specific information on strategies that worked well or did not work well to produce reliable systems.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Pinelis, Yevgeniya K, Jonathan L Bell, Charles D Carlson, Brent A Crabtree, Rebecca M Dickinson, Laura J Freeman, Duane A Goehring, et al. Reliability Best Practices and Lessons Learned in the Department Of Defense. IDA Document NS D-8889. Alexandria, VA: Institute for Defense Analyses, 2018.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_D-8889-NS.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Vetting Custom Scales - Understanding Reliability, Validity, and Dimensionality</title>
      <link>https://research.testscience.org/post/2018-vetting-custom-scales-understanding-reliability-validity-and-dimensionality/</link>
      <pubDate>Mon, 01 Jan 2018 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2018-vetting-custom-scales-understanding-reliability-validity-and-dimensionality/</guid>
      <description>For situations in which an empirically vetted scale does not exist or is not suitable, a custom scale may be created. This document presents a comprehensive process for establishing the defensible use of a custom scale. At the highest level, this process encompasses (1) establishing validity of the scale, (2) establishing reliability of the scale, and (3) assessing dimensionality, whether intended or unintended, of the scale. First, the concept of validity is described, including how validity may be established using operators and subject matter experts.</description>
      <content:encoded><![CDATA[<p>For situations in which an empirically vetted scale does not exist or is not suitable, a custom scale may be created. This document presents a comprehensive process for establishing the defensible use of a custom scale. At the highest level, this process encompasses (1) establishing validity of the scale, (2) establishing reliability of the scale, and (3) assessing dimensionality, whether intended or unintended, of the scale. First, the concept of validity is described, including how validity may be established using operators and subject matter experts. The concept of scale reliability is described, with guidelines for computing, interpreting, and using results to inform potential modifications to a custom scale. Next, a method for investigating the dimensionality of a scale, exploratory factor analysis, is described, along with a walkthrough of software implementation and results. Finally, confirmatory factor analysis, a technique for testing a priori hypotheses about dimensionality, is presented.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Wojton, Heather, and Stephanie Lane. Vetting Custom Scales - Understanding Reliability, Validity, and Dimensionality. IDA Non-Standard Document NS D-9168. Alexandria, VA: Institute for Defense Analyses, 2018.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>A Multi-Method Approach to Evaluating Human-System Interactions During Operational Testing</title>
      <link>https://research.testscience.org/post/2017-a-multi-method-approach-to-evaluating-human-system-interactions-during-operational-testing/</link>
      <pubDate>Sun, 01 Jan 2017 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2017-a-multi-method-approach-to-evaluating-human-system-interactions-during-operational-testing/</guid>
      <description>The purpose of this paper was to identify the shortcomings of a single-method approach to evaluating human-system interactions during operational testing and offer an alternative, multi-method approach that is more defensible, yields richer insights into how operators interact with weapon systems, and provides practical implications for identifying when the quality of human-system interactions warrants correction through either operator training or redesign.
Suggested Citation Thomas, Dean, Heather Wojton, Chad Bieber, and Daniel Porter.</description>
      <content:encoded><![CDATA[<p>The purpose of this paper was to identify the shortcomings of a single-method approach to evaluating human-system interactions during operational testing and offer an alternative, multi-method approach that is more defensible, yields richer insights into how operators interact with weapon systems, and provides practical implications for identifying when the quality of human-system interactions warrants correction through either operator training or redesign.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Thomas, Dean, Heather Wojton, Chad Bieber, and Daniel Porter. A Multi-Method Approach to Evaluating Human-System Interactions during Operational Testing. IDA Document NS D-8857. Alexandria, VA: Institute for Defense Analyses, 2017.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Foundations of Psychological Measurement</title>
      <link>https://research.testscience.org/post/2017-foundations-of-psychological-measurement/</link>
      <pubDate>Sun, 01 Jan 2017 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2017-foundations-of-psychological-measurement/</guid>
      <description>Psychological measurement is an important issue throughout the Department of Defense (DoD). For instance, the DoD engages in psychological measurement to place military personnel into specialties, evaluate the mental health of military personnel, evaluate the quality of human-systems interactions, and identify factors that affect crime rates on bases. Given its broad use, researchers and decision-makers need to understand the basics of psychological measurement – most notably, the development of surveys. This briefing discusses 1) the goals and challenges of psychological measurement, 2) basic measurement concepts and how they apply to psychological measurement, 3) basics for developing scales to measure psychological attributes, and 4) methods for ensuring that scales are reliable and valid.</description>
      <content:encoded><![CDATA[<p>Psychological measurement is an important issue throughout the Department of Defense (DoD). For instance, the DoD engages in psychological measurement to place military personnel into specialties, evaluate the mental health of military personnel, evaluate the quality of human-systems interactions, and identify factors that affect crime rates on bases. Given its broad use, researchers and decision-makers need to understand the basics of psychological measurement – most notably, the development of surveys. This briefing discusses 1) the goals and challenges of psychological measurement, 2) basic measurement concepts and how they apply to psychological measurement, 3) basics for developing scales to measure psychological attributes, and 4) methods for ensuring that scales are reliable and valid.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Wojton, Heather. Foundations of Psychological Measurement. IDA Document NS D-8273. Alexandria, VA: Institute for Defense Analyses, 2017.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_NS-D-8273.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Perspectives on Operational Testing-Guest Lecture at Naval Postgraduate School</title>
      <link>https://research.testscience.org/post/2017-perspectives-on-operational-testing-guest-lecture-at-naval-postgraduate-school/</link>
      <pubDate>Sun, 01 Jan 2017 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2017-perspectives-on-operational-testing-guest-lecture-at-naval-postgraduate-school/</guid>
      <description>This document was prepared to support Dr. Lillard&amp;rsquo;s visit to the Naval Postgraduate School where he will provide a guest lecture to students in the T&amp;amp;E course. The briefing covers three primary themes: 1) evaluation of military systems on the basis of requirements and KPPs alone is often insufficient to determine effectiveness and suitability in combat conditions, 2) statistical methods are essential for developing defensible and rigorous test designs, 3) operational testing is often the only means to discover critical performance shortcomings.</description>
      <content:encoded><![CDATA[<p>This document was prepared to support Dr. Lillard&rsquo;s visit to the Naval Postgraduate School where he will provide a guest lecture to students in the T&amp;E course. The briefing covers three primary themes: 1) evaluation of military systems on the basis of requirements and KPPs alone is often insufficient to determine effectiveness and suitability in combat conditions, 2) statistical methods are essential for developing defensible and rigorous test designs, 3) operational testing is often the only means to discover critical performance shortcomings.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Lillard, Vincent A. Perspectives on Operational Testing: Guest Lecture at Naval Postgraduate School. IDA Document D-8333-NS. Alexandria, VA: Institute for Defense Analyses, 2017.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_D-8333.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Statistical Methods for Defense Testing</title>
      <link>https://research.testscience.org/post/2017-statistical-methods-for-defense-testing/</link>
      <pubDate>Sun, 01 Jan 2017 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2017-statistical-methods-for-defense-testing/</guid>
      <description>In the increasingly complex and data‐limited world of military defense testing, statisticians play a valuable role in many applications. Before the DoD acquires any major new capability, that system must undergo realistic testing in its intended environment with military users. Although the typical test environment is highly variable and factors are often uncontrolled, design of experiments techniques can add objectivity, efficiency, and rigor to the process of test planning. Statistical analyses help system evaluators get the most information out of limited data sets.</description>
      <content:encoded><![CDATA[<p>In the increasingly complex and data‐limited world of military defense testing, statisticians play a valuable role in many applications. Before the DoD acquires any major new capability, that system must undergo realistic testing in its intended environment with military users. Although the typical test environment is highly variable and factors are often uncontrolled, design of experiments techniques can add objectivity, efficiency, and rigor to the process of test planning. Statistical analyses help system evaluators get the most information out of limited data sets. Oftentimes new or complex analysis techniques are needed to support the goal of characterizing or predicting system performance across the operational space. Finally, the growing need for computer models or simulations to supplement live testing also means that these models must be appropriately validated before their output can be deemed sufficient for use. Statistical design and analysis techniques are essential for rigorous evaluation of these models.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Avery, Matthew R., Kelly M. Avery, and Laura J. Freeman. “Statistical Methods for Defense Testing.” In Wiley StatsRef: Statistics Reference Online, edited by Ron S. Kenett, Nicholas T. Longford, Walter W. Piegorsch, and Fabrizio Ruggeri, 1st ed., 1–5. Wiley, 2018. <a href="https://doi.org/10.1002/9781118445112.stat07946">https://doi.org/10.1002/9781118445112.stat07946</a>.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Thinking About Data for Operational Test and Evaluation</title>
      <link>https://research.testscience.org/post/2017-thinking-about-data-for-operational-test-and-evaluation/</link>
      <pubDate>Sun, 01 Jan 2017 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2017-thinking-about-data-for-operational-test-and-evaluation/</guid>
      <description>While the human brain is a powerful tool for quickly recognizing patterns in data, it will frequently make errors in interpreting random data. Luckily, these mistakes occur in systematic and predictable ways. Statistical models provide an analytical framework that helps us avoid these error-prone heuristics and draw accurate conclusions from random data. This non-technical presentation highlights some tricks of the trade learned by studying data and the way the human brain processes it.</description>
      <content:encoded><![CDATA[<p>While the human brain is a powerful tool for quickly recognizing patterns in data, it will frequently make errors in interpreting random data. Luckily, these mistakes occur in systematic and predictable ways. Statistical models provide an analytical framework that helps us avoid these error-prone heuristics and draw accurate conclusions from random data. This non-technical presentation highlights some tricks of the trade learned by studying data and the way the human brain processes it. First, we introduce statistics as the science of data, and discuss how the popular conception of randomness differs from its technical definition. Later sections highlight the human brain as a pattern recognition machine. Examples from published literature and media highlight systematic and predictable errors in human cognition as well as how poor data analysis and graphical displays can cause critical errors in analysis. Finally, we&rsquo;ll talk about using statistical models for analysis, including how violations of model assumptions should affect our analyses.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Thomas, Dean, and Matthew Avery. Thinking About Data for Operational Test and Evaluation. IDA Document NS D-8729. Alexandria, VA: Institute for Defense Analyses, 2017.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Users are Part of the System-How to Account for Human Factors when Designing Operational Tests for Software Systems</title>
      <link>https://research.testscience.org/post/2017-users-are-part-of-the-system-how-to-account-for-human-factors-when-designing-operational-tests-for-software-systems/</link>
      <pubDate>Sun, 01 Jan 2017 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2017-users-are-part-of-the-system-how-to-account-for-human-factors-when-designing-operational-tests-for-software-systems/</guid>
      <description>The goal of operational testing (OT) is to evaluate the effectiveness and suitability of military systems for use by trained military users in operationally realistic environments. Operators perform missions and make systems function. Thus, adequate OT must assess not only system performance and technical capability across the operational space, but also the quality of human-system interactions. Software systems in particular pose a unique challenge to testers. While some software systems may inherently be deterministic in nature, once placed in their intended environment with error-prone humans and highly stochastic networks, variability in outcomes often occurs, so tests often need to account for both “bug” finding and characterizing variability.</description>
      <content:encoded><![CDATA[<p>The goal of operational testing (OT) is to evaluate the effectiveness and suitability of military systems for use by trained military users in operationally realistic environments. Operators perform missions and make systems function. Thus, adequate OT must assess not only system performance and technical capability across the operational space, but also the quality of human-system interactions. Software systems in particular pose a unique challenge to testers. While some software systems may inherently be deterministic in nature, once placed in their intended environment with error-prone humans and highly stochastic networks, variability in outcomes often occurs, so tests often need to account for both “bug” finding and characterizing variability. This document outlines common statistical techniques for planning tests of system performance for software systems, and then discusses how testers might integrate human-system interaction metrics into that design and evaluation.</p>
<h4 id="system-performance">System Performance</h4>
<p>Before deciding what class of statistical design techniques to apply, testers should consider whether the system under test is deterministic (repeating a process with the same inputs always produces the same output) or stochastic (even if the inputs are fixed, repeating the process could produce a different result). Software systems (a calculator, for example) may intuitively be deterministic, and as standalone entities in a pristine environment, they are. However, there are other sources of variation to consider when testing such a system in an operational environment with an intended user. If the calculator is intended to be used by scientists in Antarctica, temperature, lighting conditions, and user clothing such as gloves could all affect the users’ ability to operate the system.</p>
<p>Combinatorial covering arrays can cover a large input space extremely efficiently and are useful for conducting functionality checks of a complex system. However, several assumptions must be met in order for testers to benefit from combinatorial designs. The system must be fully deterministic, the response variable of interest must be binary (pass/fail), and the primary goal of the test must be to find problems. Combinatorial designs cannot determine cause and effect and are not designed to detect or quantify uncertainty or variability in responses.</p>
<p>In operational testing, the assumptions listed above typically are not met. Any number of factors, including the human user, the network load, memory leaks, database errors, and a constantly changing environment, can cause variability in the mission-level outcome of interest. While combinatorial designs can be useful for bug checking, they typically are not sufficient for OT. One goal of OT should be to characterize system performance across the operational space. The appropriate designs to support characterization are classical or optimal designs. These designs, including factorial, fractional factorial, response surface, and D-optimal constructs, can quantify variability in outcomes and attribute changes in response to specific factors or factor interactions.</p>
<p>These two broad classes of design (combinatorial and classical) can be merged in order to serve both goals: finding problems and characterizing performance. Testers can develop a “hybrid” design by first building a combinatorial covering array across all factors, and then adding the necessary runs to support a D-optimal design, for example. This allows testers to efficiently detect any remaining “bugs” in the software, while also quantifying variability and supporting statistical regression analysis of the data.</p>
<h4 id="human-system-interaction">Human-System Interaction</h4>
<p>It is not sufficient to assess only technical performance when testing software systems. Systems that account for human factors (operators’ physical and psychological characteristics) are more likely to fulfill their missions. Software that is psychologically challenging often leads to mistakes, inefficiencies, and safety concerns. Testers can use human-system interaction (HSI) metrics to capture software compatibility with key psychological characteristics. Inherent characteristics such as short- and long-term memory processes, capacity for attention, and cognitive load are directly related to measurable constructs such as usability, workload, and task error rates.</p>
<p>To evaluate HSI, testers can use either behavioral metrics (e.g., error rates, completion times, speech/facial expressions) or self-report metrics (surveys and interviews). Though behavioral metrics are generally preferred since they are directly observable, the method you choose depends on the HSI concept you want to measure, your test design, and operational constraints.</p>
<p>The same logic that applies to data collection for system performance applies to HSI data collection. Testers should strive to understand how users’ experience of the system shifts with the operational environment; thus, designed experiments with factors and levels should be applied. In addition, understanding whether, and how much, user experience affects system performance is key to a thorough evaluation.</p>
<p>The easiest way to fit HSI into OT is to leverage the existing test design. First, identify the subset (or possibly superset) of factors that are likely to shape how users experience the system, then distribute those users across the test conditions logically. The number of users, their groupings, and how they will be spread across the factor space all matter when designing an adequate test for HSI.</p>
<p>Most HSI data, including behavioral metrics and empirically validated surveys, can also be analyzed in the same way as system performance data, using statistically rigorous techniques such as regression. Operational conditions, user type, and system characteristics can all affect HSI, so it is critical to account for those factors in the design and analysis.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Freeman, Laura J, Kelly M Avery, and Heather M Wojton. Users Are Part of the System: How to Account for Human Factors When Designing Operational Tests for Software Systems. IDA Document NS D-8630. Alexandria, VA: Institute for Defense Analyses, 2017.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>A First Step into the Bootstrap World</title>
      <link>https://research.testscience.org/post/2016-a-first-step-into-the-bootstrap-world/</link>
      <pubDate>Fri, 01 Jan 2016 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2016-a-first-step-into-the-bootstrap-world/</guid>
      <description>Bootstrapping is a powerful nonparametric tool for conducting statistical inference with many applications to data from operational testing. Bootstrapping is most useful when the population sampled from is unknown or complex or the sampling distribution of the desired statistic is difficult to derive. Careful use of bootstrapping can help address many challenges in analyzing operational test data.</description>
      <content:encoded><![CDATA[<p>Bootstrapping is a powerful nonparametric tool for conducting statistical inference with many applications to data from operational testing. Bootstrapping is most useful when the population sampled from is unknown or complex or the sampling distribution of the desired statistic is difficult to derive. Careful use of bootstrapping can help address many challenges in analyzing operational test data.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Avery, Matthew R. A First Step into the Bootstrap World. IDA Document NS D-5816. Alexandria, VA: Institute for Defense Analyses, 2016.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Bayesian Analysis in R/STAN</title>
      <link>https://research.testscience.org/post/2016-bayesian-analysis-in-r-stan/</link>
      <pubDate>Fri, 01 Jan 2016 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2016-bayesian-analysis-in-r-stan/</guid>
      <description>In an era of reduced budgets and limited testing, verifying that requirements have been met in a single test period can be challenging, particularly using traditional analysis methods that do not use all available information. The Bayesian paradigm is tailor-made for these situations, allowing for the combination of multiple sources of data and resulting in more robust inference and uncertainty quantification. Consequently, Bayesian analyses are becoming increasingly popular in T&amp;amp;E. This tutorial briefly introduces the basic concepts of Bayesian statistics, with implementation details illustrated in R through two case studies: reliability for the Core Mission functional area of the Littoral Combat Ship (LCS) and performance curves for a chemical detector in the Bio-chemical Detection System (BDS) with different agents and matrices.</description>
      <content:encoded><![CDATA[<p>In an era of reduced budgets and limited testing, verifying that requirements have been met in a single test period can be challenging, particularly using traditional analysis methods that do not use all available information. The Bayesian paradigm is tailor-made for these situations, allowing for the combination of multiple sources of data and resulting in more robust inference and uncertainty quantification. Consequently, Bayesian analyses are becoming increasingly popular in T&amp;E. This tutorial briefly introduces the basic concepts of Bayesian statistics, with implementation details illustrated in R through two case studies: reliability for the Core Mission functional area of the Littoral Combat Ship (LCS) and performance curves for a chemical detector in the Bio-chemical Detection System (BDS) with different agents and matrices. Examples are also presented using STAN, high-performance open-source software for Bayesian inference on multi-level models.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Fronczyk, Kassandra. Bayesian Analysis in R/STAN. IDA Document NS D-5831. Alexandria, VA: Institute for Defense Analyses, 2016.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_NS-D-5831-1.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Censored Data Analysis Methods for Performance Data- A Tutorial</title>
      <link>https://research.testscience.org/post/2016-censored-data-analysis-methods-for-performance-data-a-tutorial/</link>
      <pubDate>Fri, 01 Jan 2016 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2016-censored-data-analysis-methods-for-performance-data-a-tutorial/</guid>
      <description>Binomial metrics like probability-to-detect or probability-to-hit typically do not provide the maximum information from testing. Continuous metrics such as time to detect provide more information, but do not account for non-detects. Censored data analysis allows us to account for both pieces of information simultaneously.</description>
      <content:encoded><![CDATA[<p>Binomial metrics like probability-to-detect or probability-to-hit typically do not provide the maximum information from testing. Continuous metrics such as time to detect provide more information, but do not account for non-detects. Censored data analysis allows us to account for both pieces of information simultaneously.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Lillard, V Bram. Censored Data Analysis Methods for Performance Data: A Tutorial. IDA Document NS D-5811. Alexandria, VA: Institute for Defense Analyses, 2016.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>DOT&amp;E Reliability Course</title>
      <link>https://research.testscience.org/post/2016-dot-e-reliability-course/</link>
      <pubDate>Fri, 01 Jan 2016 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2016-dot-e-reliability-course/</guid>
      <description>This reliability course provides information to assist DOT&amp;amp;E action officers in their review and assessment of system reliability. Course briefings cover reliability planning and analysis activities that span the acquisition life cycle. Each briefing discusses review criteria relevant to DOT&amp;amp;E action officers based on DoD policies and lessons learned from previous oversight efforts.</description>
      <content:encoded><![CDATA[<p>This reliability course provides information to assist DOT&amp;E action officers in their review and assessment of system reliability. Course briefings cover reliability planning and analysis activities that span the acquisition life cycle. Each briefing discusses review criteria relevant to DOT&amp;E action officers based on DoD policies and lessons learned from previous oversight efforts.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Avery, Matthew, Jonathan Bell, Rebecca Medlin, and Laura Freeman. DOT&amp;E Reliability Course. IDA Document NS D-5836. Alexandria, VA: Institute for Defense Analyses, 2016.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_NS-D-5836-1.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Introduction to Survey Design</title>
      <link>https://research.testscience.org/post/2016-introduction-to-survey-design/</link>
      <pubDate>Fri, 01 Jan 2016 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2016-introduction-to-survey-design/</guid>
      <description>An important goal of test and evaluation is to understand not only how a system performs in its intended environment, but also users’ experiences operating the system. This briefing aimed to provide the audience with a set of tools – most notably, surveys – that are appropriate for measuring the user experience. DOT&amp;amp;E guidance regarding these tools is highlighted where appropriate. The briefing was broken into three major sections: conceptualizing surveys, writing survey items, and formatting surveys.</description>
      <content:encoded><![CDATA[<p>An important goal of test and evaluation is to understand not only how a system performs in its intended environment, but also users’ experiences operating the system. This briefing aimed to provide the audience with a set of tools – most notably, surveys – that are appropriate for measuring the user experience. DOT&amp;E guidance regarding these tools is highlighted where appropriate. The briefing was broken into three major sections: conceptualizing surveys, writing survey items, and formatting surveys. At the end of this briefing, the audience should have a better understanding of the value and purpose of surveys and how to construct them.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Wojton, Heather, Jonathan Snavely, and Justin Mary. Introduction to Survey Design. IDA Document NS D-5835. Alexandria, VA: Institute for Defense Analyses, 2016.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_NS-D-5835.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Rigorous Test and Evaluation for Defense, Aerospace, and National Security</title>
      <link>https://research.testscience.org/post/2016-rigorous-test-and-evaluation-for-defense-aerospace-and-national-security/</link>
      <pubDate>Fri, 01 Jan 2016 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2016-rigorous-test-and-evaluation-for-defense-aerospace-and-national-security/</guid>
      <description>In April 2016, NASA, DOT&amp;amp;E, and IDA collaborated on a workshop designed to strengthen the community around statistical approaches to test and evaluation in defense and aerospace. The workshop brought practitioners, analysts, technical leadership, and statistical academics together for a three-day exchange of information with opportunities to attend world-renowned short courses, share common challenges, and learn new skill sets from a variety of tutorials. A highlight of the workshop was the Tuesday afternoon technical leadership panel chaired by Dr.</description>
      <content:encoded><![CDATA[<p>In April 2016, NASA, DOT&amp;E, and IDA collaborated on a workshop designed to strengthen the community around statistical approaches to test and evaluation in defense and aerospace. The workshop brought practitioners, analysts, technical leadership, and statistical academics together for a three-day exchange of information with opportunities to attend world-renowned short courses, share common challenges, and learn new skill sets from a variety of tutorials. A highlight of the workshop was the Tuesday afternoon technical leadership panel chaired by Dr. Catherine Warner, Science Advisor, DOT&amp;E. This article summarizes core themes discussed during the panel session.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Freeman, Laura. “Rigorous Test and Evaluation for Defense, Aerospace, and National Security: A Panel Session Summary.” The ITEA Journal of Test and Evaluation 37, no. 4 (2016).</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper_D-8229-non-std-Final.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Science of Test Workshop Proceedings, April 11-13, 2016</title>
      <link>https://research.testscience.org/post/2016-science-of-test-workshop-proceedings-april-11-13-2016/</link>
      <pubDate>Fri, 01 Jan 2016 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2016-science-of-test-workshop-proceedings-april-11-13-2016/</guid>
      <description>To mark IDA&amp;rsquo;s 60th anniversary, we are conducting a series of workshops and symposia that bring together IDA sponsors, researchers, experts inside and outside government, and other stakeholders to discuss issues of the day. These events focus on future national security challenges, reflecting on how past lessons and accomplishments help prepare us to deal with complex issues and environments we face going forward. This publication represents the proceedings of the Science of Test Workshop.</description>
      <content:encoded><![CDATA[<p>To mark IDA&rsquo;s 60th anniversary, we are conducting a series of workshops and symposia that bring together IDA sponsors, researchers, experts inside and outside government, and other stakeholders to discuss issues of the day. These events focus on future national security challenges, reflecting on how past lessons and accomplishments help prepare us to deal with complex issues and environments we face going forward. This publication represents the proceedings of the Science of Test Workshop.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Freeman, Laura, Pamela Rambow, and Jonathan Snavely. Science of Test Workshop Proceedings. IDA Document NS D-8249. Alexandria, VA: Institute for Defense Analyses, 2016.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper_NS-D-8249-1.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Best Practices for Statistically Validating Modeling and Simulation (M&amp;S) Tools Used in Operational Testing</title>
      <link>https://research.testscience.org/post/2015-best-practices-for-statistically-validating-modeling-and-simulation-m-s-tools-used-in-operational-testing/</link>
      <pubDate>Thu, 01 Jan 2015 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2015-best-practices-for-statistically-validating-modeling-and-simulation-m-s-tools-used-in-operational-testing/</guid>
      <description>In many situations, collecting sufficient data to evaluate system performance against operationally realistic threats is not possible due to cost and resource restrictions, safety concerns, or lack of adequate or representative threats. Modeling and simulation tools that have been verified, validated, and accredited can be used to supplement live testing in order to facilitate a more complete evaluation of performance. Two key questions that frequently arise when planning an operational test are (1) which (and how many) points within the operational space should be chosen in the simulation space and the live space for optimal ability to verify and validate the M&amp;amp;S, and (2) once that data is collected, what is the best way to compare the live trials to the simulated trials for the purpose of validating the M&amp;amp;S?</description>
      <content:encoded><![CDATA[<p>In many situations, collecting sufficient data to evaluate system performance against operationally realistic threats is not possible due to cost and resource restrictions, safety concerns, or lack of adequate or representative threats. Modeling and simulation tools that have been verified, validated, and accredited can be used to supplement live testing in order to facilitate a more complete evaluation of performance. Two key questions that frequently arise when planning an operational test are (1) which (and how many) points within the operational space should be chosen in the simulation space and the live space for optimal ability to verify and validate the M&amp;S, and (2) once that data is collected, what is the best way to compare the live trials to the simulated trials for the purpose of validating the M&amp;S? This conference presentation describes various strategies for addressing these two questions. The best methodologies for design and analysis will vary depending on the goal of the operational test, the type of model used in the simulation, and the amount of live and simulated data available.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Avery, Kelly, Laura Freeman, and Rebecca Medlin. Best Practices for Statistically Validating Modeling and Simulation (M&amp;S) Tools Used in Operational Testing. IDA Document NS D-5582. Alexandria, VA: Institute for Defense Analyses, 2015.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Surveys in Operational Test and Evaluation</title>
      <link>https://research.testscience.org/post/2015-surveys-in-operational-test-and-evaluation/</link>
      <pubDate>Thu, 01 Jan 2015 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2015-surveys-in-operational-test-and-evaluation/</guid>
      <description>DOT&amp;amp;E recently signed out a memo providing Guidance on the Use and Design of Surveys in Operational Test and Evaluation. This guidance memo helps the Human Systems Integration (HSI) community to ensure that useful and accurate HSI data are collected. This presentation describes how HSI experts can leverage the guidance. Specifically, it covers which HSI metrics can and cannot be answered by surveys.</description>
      <content:encoded><![CDATA[<p>DOT&amp;E recently signed out a memo providing Guidance on the Use and Design of Surveys in Operational Test and Evaluation. This guidance memo helps the Human Systems Integration (HSI) community to ensure that useful and accurate HSI data are collected. This presentation describes how HSI experts can leverage the guidance. Specifically, it covers which HSI metrics can and cannot be answered by surveys.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Grier, Rebecca A, and Laura Freeman. Surveys in Operational Test &amp; Evaluation. IDA Document D-5410. Alexandria, VA: Institute for Defense Analyses, 2015.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_D-5410-1.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Validating the PRA Testbed Using a Statistically Rigorous Approach</title>
      <link>https://research.testscience.org/post/2015-validating-the-pra-testbed-using-a-statistically-rigorous-approach/</link>
      <pubDate>Thu, 01 Jan 2015 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2015-validating-the-pra-testbed-using-a-statistically-rigorous-approach/</guid>
      <description>For many systems, testing is expensive and only a few live test events are conducted. When this occurs, testers frequently use a model to extend the test results. However, testers must validate the model to show that it is an accurate representation of the real world from the perspective of the intended uses of the model. This raises a problem: when only a small number of live test events are conducted, only limited data are available to validate the model, and some testers struggle with model validation.</description>
      <content:encoded><![CDATA[<p>For many systems, testing is expensive and only a few live test events are conducted. When this occurs, testers frequently use a model to extend the test results. However, testers must validate the model to show that it is an accurate representation of the real world from the perspective of the intended uses of the model. This raises a problem: when only a small number of live test events are conducted, only limited data are available to validate the model, and some testers struggle with model validation. This article describes a statistically rigorous approach for validating a model with only a small number of live test results. We discuss a specific application for validating a model of a naval surface combatant defending itself against a cruise missile attack. The approach takes into account potential correlation in the data and other factors that may drive system performance.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Thomas, Dean, and Rebecca Dickinson. “Validating the Probability of Raid Annihilation Testbed Using a Statistical Approach.” The ITEA Journal of Test and Evaluation 36, no. 2 (June 2015).</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper_PRA" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Design of Experiments for In-Lab Operational Testing of the AN/BQQ-10 Submarine Sonar System</title>
      <link>https://research.testscience.org/post/2014-design-of-experiments-for-in-lab-operational-testing-of-the-an-bqq-10-submarine-sonar-system/</link>
      <pubDate>Wed, 01 Jan 2014 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2014-design-of-experiments-for-in-lab-operational-testing-of-the-an-bqq-10-submarine-sonar-system/</guid>
      <description>Operational testing of the AN/BQQ-10 submarine sonar system has never been able to show significant improvements in software versions because of the high variability of at-sea measurements. To mitigate this problem, in the most recent AN/BQQ-10 operational test, the Navy’s operational test agency (in consultation with IDA under the direction of Director, Operational Test and Evaluation) supplemented the at-sea testing with an operationally focused in-lab comparison. This test used recorded real data played back on two different versions of the sonar system.</description>
      <content:encoded><![CDATA[<p>Operational testing of the AN/BQQ-10 submarine sonar system has never been able to show significant improvements in software versions because of the high variability of at-sea measurements. To mitigate this problem, in the most recent AN/BQQ-10 operational test, the Navy’s operational test agency (in consultation with IDA under the direction of Director, Operational Test and Evaluation) supplemented the at-sea testing with an operationally focused in-lab comparison. This test used recorded real data played back on two different versions of the sonar system. For each version, the test recorded the time it took multiple operators, with varying operational experience, to detect a submarine target once it appeared on the display. This new test methodology had several benefits: (1) the laboratory setting allowed for the use of design of experiments to control factors that are traditionally infeasible to control during an at-sea test; (2) the direct comparison between the two systems demonstrated a statistically significant reduction in the detection time for the new system. Although laboratory testing cannot replace at-sea testing, the results provide a strong indication that we can expect performance improvements in the operational environment. This case study shows that laboratory testing and design of experiments have a place in operational testing and should be expanded to improve testing for other systems.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Clutter, Justace R, George Khoury, and Laura Freeman. Design of Experiments for In-Lab Operational Testing of the AN/BQQ-10 Submarine Sonar System. IDA Document NS D-5486. Alexandria, VA: Institute for Defense Analyses, 2014.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_NS-D-5286.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Taking the Next Step: Improving the Science of Test in DoD T&amp;E</title>
      <link>https://research.testscience.org/post/2014-taking-the-next-step-improving-the-science-of-test-in-dod-t-e/</link>
      <pubDate>Wed, 01 Jan 2014 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2014-taking-the-next-step-improving-the-science-of-test-in-dod-t-e/</guid>
      <description>The current fiscal climate demands now, more than ever, that test and evaluation (T&amp;amp;E) provide relevant and credible characterization of system capabilities and shortfalls across all relevant operational conditions as efficiently as possible. In determining the answer to the question, “How much testing is enough?” it is imperative that we use a scientifically defensible methodology. Design of Experiments (DOE) has a proven track record in Operational Test and Evaluation (OT&amp;amp;E) of not only quantifying how much testing is enough, but also where in the operational space the test points should be placed.</description>
      <content:encoded><![CDATA[<p>The current fiscal climate demands now, more than ever, that test and evaluation (T&amp;E) provide relevant and credible characterization of system capabilities and shortfalls across all relevant operational conditions as efficiently as possible. In determining the answer to the question, “How much testing is enough?” it is imperative that we use a scientifically defensible methodology. Design of Experiments (DOE) has a proven track record in Operational Test and Evaluation (OT&amp;E) of not only quantifying how much testing is enough, but also where in the operational space the test points should be placed. Over the last few years, the T&amp;E community has made great strides in the application of DOE to OT&amp;E, but there is still work to be done in ensuring that the scientific community’s full toolset is utilized. In particular, many test programs have yet to capitalize on the power of the test design when conducting the data analysis. Employing empirical statistical models (e.g., regression techniques, analysis of variance (ANOVA)) allows us to maximize the information from every data point, resulting in defensible analyses that provide crucial information about system performance that decision-makers and warfighters need to know. DOT&amp;E will continue to work to ensure the highest technical caliber in every DOT&amp;E evaluation, and that Test and Evaluation Master Plans (TEMPs) are adequate to support these robust evaluations. As we improve in our use of these test designs and analysis methods, we need to ensure these practices are institutionalized across the entire T&amp;E community and applied across all phases of DoD testing.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Freeman, Laura, and V. Bram Lillard. “Taking the Next Step: Improving the Science of Test in DoD T and E.” The ITEA Journal of Test and Evaluation 35, no. 1 (March 2014). <a href="https://apps.dtic.mil/sti/citations/trecms/AD1123777">https://apps.dtic.mil/sti/citations/trecms/AD1123777</a>.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_D-5101-final-version.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>A Tutorial on the Planning of Experiments</title>
      <link>https://research.testscience.org/post/2013-a-tutorial-on-the-planning-of-experiments/</link>
      <pubDate>Tue, 01 Jan 2013 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2013-a-tutorial-on-the-planning-of-experiments/</guid>
      <description>This tutorial outlines the basic procedures for planning experiments within the context of the scientific method. Too often quality practitioners fail to appreciate how subject-matter expertise must interact with statistical expertise to generate efficient and effective experimental programs. This tutorial guides the quality practitioner through the basic steps, demonstrated by extensive past experience, that consistently lead to successful results. This tutorial makes extensive use of flowcharts to illustrate the basic process.</description>
      <content:encoded><![CDATA[<p>This tutorial outlines the basic procedures for planning experiments within the context of the scientific method. Too often quality practitioners fail to appreciate how subject-matter expertise must interact with statistical expertise to generate efficient and effective experimental programs. This tutorial guides the quality practitioner through the basic steps, demonstrated by extensive past experience, that consistently lead to successful results. This tutorial makes extensive use of flowcharts to illustrate the basic process. Two case studies summarize the applications of the methodology.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Freeman, Laura J., Anne G. Ryan, Jennifer L. K. Kensler, Rebecca M. Dickinson, and G. Geoffrey Vining. “A Tutorial on the Planning of Experiments.” Quality Engineering 25, no. 4 (October 1, 2013): 315–32. <a href="https://doi.org/10.1080/08982112.2013.817013">https://doi.org/10.1080/08982112.2013.817013</a>.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Scientific Test and Analysis Techniques: Statistical Measures of Merit</title>
      <link>https://research.testscience.org/post/2013-scientific-test-and-analysis-techniques-statistical-measures-of-merit/</link>
      <pubDate>Tue, 01 Jan 2013 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2013-scientific-test-and-analysis-techniques-statistical-measures-of-merit/</guid>
      <description>Design of Experiments (DOE) provides a rigorous methodology for developing and evaluating test plans. Design excellence consists of having enough test points placed in the right locations in the operational envelope to answer the questions of interest for the test. The key aspects of a well-designed experiment include: the goal of the test, the response variables, the factors and levels, a method for strategically varying the factors across the operational envelope, and statistical measures of merit.</description>
      <content:encoded><![CDATA[<p>Design of Experiments (DOE) provides a rigorous methodology for developing and evaluating test plans. Design excellence consists of having enough test points placed in the right locations in the operational envelope to answer the questions of interest for the test. The key aspects of a well-designed experiment include: the goal of the test, the response variables, the factors and levels, a method for strategically varying the factors across the operational envelope, and statistical measures of merit. Currently, the majority of test plans utilize statistical measures of merit based on confidence and power. Although important, confidence and power are not the only measures of the adequacy and merit of a test design. The appropriate measure depends on the goal of the test and the experimental design methodology used. There is no one-size-fits-all solution; rather, there is a collection of useful tools that apply in various combinations for different test goals and designs. This talk outlines different statistical measures of merit that should be used when planning an operational test.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Freeman, Laura. Scientific Test and Analysis Techniques: Statistical Measures of Merit. IDA Document D-5070. Alexandria, VA: Institute for Defense Analyses, 2014.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_D-5070-2.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>A Bayesian Approach to Evaluation of Land Warfare Systems</title>
      <link>https://research.testscience.org/post/2012-a-bayesian-approach-to-evaluation-of-land-warfare-systems/</link>
      <pubDate>Sun, 01 Jan 2012 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2012-a-bayesian-approach-to-evaluation-of-land-warfare-systems/</guid>
      <description>This presentation was prepared for the Army Conference on Applied Statistics. It gives a brief introduction to land warfare problems and devises a methodology using Bayes’ theorem to estimate parameters of interest. Two examples are given: a simple one using independent Bernoulli trials, and a more complex one using correlated Red and Blue casualty data in a Loss Exchange Ratio and a hierarchical model. The presentation demonstrates that the Bayesian approach is successful in both examples at reducing the variance of the estimated parameters, potentially reducing the cost of devising a complex test program.</description>
      <content:encoded><![CDATA[<p>This presentation was prepared for the Army Conference on Applied Statistics. It gives a brief introduction to land warfare problems and devises a methodology using Bayes’ theorem to estimate parameters of interest. Two examples are given: a simple one using independent Bernoulli trials, and a more complex one using correlated Red and Blue casualty data in a Loss Exchange Ratio and a hierarchical model. The presentation demonstrates that the Bayesian approach is successful in both examples at reducing the variance of the estimated parameters, potentially reducing the cost of devising a complex test program. The presentation concludes with suggested next steps applicable to the Army Ground Combat Vehicle program.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Wilson, Alyson, Lee Dewald, Robert Holcomb, and Samuel Parry. A Bayesian Approach to Evaluation of Land Warfare Systems. IDA Document NS D-4711. Alexandria, VA: Institute for Defense Analyses, 2012.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_NS-D-4711-1.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Continuous Metrics for Efficient and Effective Testing</title>
      <link>https://research.testscience.org/post/2012-continuous-metrics-for-efficient-and-effective-testing/</link>
      <pubDate>Sun, 01 Jan 2012 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2012-continuous-metrics-for-efficient-and-effective-testing/</guid>
      <description>In today’s fiscal environment, efficient and effective testing is essential. Often, military system requirements are defined using probability of success as the primary measure of effectiveness – for example, a system must complete its mission 80 percent of the time; or the system must detect 90 percent of targets. The traditional approach to testing these probability-based requirements is to execute a series of trials and then total the number of successes; the ratio of successes to number of trials provides an intuitive measure of the probability of success.</description>
      <content:encoded><![CDATA[<p>In today’s fiscal environment, efficient and effective testing is essential. Often, military system requirements are defined using probability of success as the primary measure of effectiveness – for example, a system must complete its mission 80 percent of the time; or the system must detect 90 percent of targets. The traditional approach to testing these probability-based requirements is to execute a series of trials and then total the number of successes; the ratio of successes to number of trials provides an intuitive measure of the probability of success. However, this method of testing has proven to be cost prohibitive, especially at high levels of statistical confidence and power. Often, one or more continuous metrics empirically related to the probability-based metric provide more information about system performance than the pass/fail construct. Using these metrics in lieu of the probability-based metrics to plan testing both reduces test costs and provides a better understanding of system performance. In this talk, the authors discuss the cost of using binary test metrics (e.g., success or failure, hit or miss). They present several common T&amp;E examples, translating the original probability-based requirement to a related continuous metric, and show potential cost savings and information gain achieved by the conversion.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Freeman, Laura J, and V Bram Lillard. Continuous Metrics for Efficient and Effective Testing. IDA Document NS D-4571. Alexandria, VA: Institute for Defense Analyses, 2012.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_NS-D-4571-1.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Designed Experiments for the Defense Community</title>
      <link>https://research.testscience.org/post/2012-designed-experiments-for-the-defense-community/</link>
      <pubDate>Sun, 01 Jan 2012 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2012-designed-experiments-for-the-defense-community/</guid>
      <description>The areas of application for design of experiments principles have evolved, mimicking the growth of U.S. industries over the last century, from agriculture to manufacturing to chemical and process industries to the services and government sectors. In addition, statistically based quality programs adopted by businesses morphed from total quality management to Six Sigma and, most recently, statistical engineering (see Hoerl and Snee 2010). The good news about these transformations is that each evolution contains more technical substance, embedding the methodologies as core competencies, and is less of a “program.</description>
      <content:encoded><![CDATA[<p>The areas of application for design of experiments principles have evolved, mimicking the growth of U.S. industries over the last century, from agriculture to manufacturing to chemical and process industries to the services and government sectors. In addition, statistically based quality programs adopted by businesses morphed from total quality management to Six Sigma and, most recently, statistical engineering (see Hoerl and Snee 2010). The good news about these transformations is that each evolution contains more technical substance, embedding the methodologies as core competencies, and is less of a “program.” Design of experiments is fundamental to statistical engineering and is receiving increased attention within large government agencies such as the National Aeronautics and Space Administration (NASA) and the Department of Defense. Because test policy is intended to shape test programs, numerous test agencies have experimented with policy wording since about 2001. The Director of Operational Test &amp; Evaluation has recently (2010) published guidelines to mold test programs into a sequence of well-designed and statistically defensible experiments. Specifically, the guidelines require, for the first time, that test programs report statistical power as one proof of sound test design. This article presents the underlying tenets of design of experiments, as applied in the Department of Defense, focusing on factorial, fractional factorial, and response surface design and analyses. The concepts of statistical modeling and sequential experimentation are also emphasized. Military applications are presented for testing and evaluation of weapon system acquisition, including force-on-force tactics, weapons employment and maritime search, identification, and intercept.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Johnson, Rachel T., Gregory T. Hutto, James R. Simpson, and Douglas C. Montgomery. “Designed Experiments for the Defense Community.” Quality Engineering 24, no. 1 (January 2012): 60–79. <a href="https://doi.org/10.1080/08982112.2012.627288">https://doi.org/10.1080/08982112.2012.627288</a>.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Statistically Based T&amp;E Using Design of Experiments</title>
      <link>https://research.testscience.org/post/2012-statistically-based-t-e-using-design-of-experiments/</link>
      <pubDate>Sun, 01 Jan 2012 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2012-statistically-based-t-e-using-design-of-experiments/</guid>
      <description>This document outlines the charter for the Committee to Institutionalize Scientific Test Design and Rigor in Test and Evaluation. The charter defines the problem, identifies potential steps in a roadmap for accomplishing the goals of the committee, and lists committee membership. Once the committee is assembled, the members will revise this document as needed. The charter will be endorsed by DOT&amp;amp;E and DDT&amp;amp;E once finalized.</description>
      <content:encoded><![CDATA[<p>This document outlines the charter for the Committee to Institutionalize Scientific Test Design and Rigor in Test and Evaluation. The charter defines the problem, identifies potential steps in a roadmap for accomplishing the goals of the committee, and lists committee membership. Once the committee is assembled, the members will revise this document as needed. The charter will be endorsed by DOT&amp;E and DDT&amp;E once finalized.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Freeman, Laura. Statistically Based T&amp;E Using Design of Experiments. IDA Document D-4548. Alexandria, VA: Institute for Defense Analyses, 2012.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_NS_D-4548.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Design of Experiments in Highly Constrained Design Spaces</title>
      <link>https://research.testscience.org/post/2011-design-of-experiments-in-highly-constrained-design-spaces/</link>
      <pubDate>Sat, 01 Jan 2011 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2011-design-of-experiments-in-highly-constrained-design-spaces/</guid>
      <description>This presentation shows the merits of applying experimental design to operational tests, reviews guidance on using DOE from the Director, Operational Test and Evaluation, and presents the design solution for the test of a chemical agent detector. It is important to keep in mind the advanced techniques from DOE (split-plot designs, optimal designs) to determine effective DOEs for operational testing; traditional design strategies often result in designs that are not executable.</description>
      <content:encoded><![CDATA[<p>This presentation shows the merits of applying experimental design to operational tests, reviews guidance on using DOE from the Director, Operational Test and Evaluation, and presents the design solution for the test of a chemical agent detector. It is important to keep in mind the advanced techniques from DOE (split-plot designs, optimal designs) to determine effective DOEs for operational testing; traditional design strategies often result in designs that are not executable.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Freeman, Laura. “Design of Experiments in Highly Constrained Design Spaces.” Presented at the Army Conference on Applied Statistics, October 2011.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Use of Statistically Designed Experiments to Inform Decisions in a Resource Constrained Environment</title>
      <link>https://research.testscience.org/post/2011-use-of-statistically-designed-experiments-to-inform-decisions-in-a-resource-constrained-environment/</link>
      <pubDate>Sat, 01 Jan 2011 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2011-use-of-statistically-designed-experiments-to-inform-decisions-in-a-resource-constrained-environment/</guid>
      <description>There has been recent emphasis on the increased use of statistics, including the use of statistically designed experiments, to plan and execute tests that support Department of Defense (DoD) acquisition programs. The use of statistical methods, including experimental design, has shown great benefits in industry, especially when used in an integrated fashion; for example, see the literature on Six Sigma. The structured approach of experimental design allows the user to determine what data need to be collected and how they should be analyzed to achieve specific decision-making objectives.</description>
      <content:encoded><![CDATA[<p>There has been recent emphasis on the increased use of statistics, including the use of statistically designed experiments, to plan and execute tests that support Department of Defense (DoD) acquisition programs. The use of statistical methods, including experimental design, has shown great benefits in industry, especially when used in an integrated fashion; for example, see the literature on Six Sigma. The structured approach of experimental design allows the user to determine what data need to be collected and how they should be analyzed to achieve specific decision-making objectives. This focuses decision-making processes, improves test efficiency, and provides objective data for evidence-based decision-making. Today the DoD Test and Evaluation (T&amp;E) community is investigating the use of statistical methods to provide efficient and effective testing. This paper discusses the use of statistics in T&amp;E to assist T&amp;E practitioners and acquisition management in understanding how to improve the quantity and quality of information made available to decision makers to make risk assessments, even in a resource-constrained environment.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Freeman, Laura, Karl Glaeser, and Alethea Rucker. “Use of Statistically Designed Experiments to Inform Decisions in a Resource Constrained Environment.” ITEA Journal of Test and Evaluation 32, no. 3 (2011): 267–76.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper_D-4355-NS-final.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
  </channel>
</rss>
