<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Practitioner on Test Science Research Document Library</title>
    <link>https://research.testscience.org/audience/practitioner/</link>
    <description>Recent content in Practitioner on Test Science Research Document Library</description>
    <generator>Hugo -- 0.129.0</generator>
    <language>en-us</language>
    <copyright>Institute for Defense Analyses</copyright>
    <lastBuildDate>Mon, 01 Jan 2024 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://research.testscience.org/audience/practitioner/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Practitioner’s Framework for Federated Model Validation Resource Allocation</title>
      <link>https://research.testscience.org/post/2024-a-practitioner-s-framework-for-federated-model-validation-resource-allocation/</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2024-a-practitioner-s-framework-for-federated-model-validation-resource-allocation/</guid>
      <description>Recent advances in computation and statistics have led to an increasing use of federated models for end-to-end system test and evaluation. A federated model is a collection of interconnected models in which the outputs of one model act as inputs to subsequent models. However, the process of verifying and validating federated models is poorly understood, especially when testers face limited resources, knowledge-based uncertainties, and concerns over operational realism. Testers often struggle to determine how best to allocate limited test resources for model validation.</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/owcIxrA_sXs?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>Recent advances in computation and statistics have led to an increasing use of federated models for end-to-end system test and evaluation. A federated model is a collection of interconnected models in which the outputs of one model act as inputs to subsequent models. However, the process of verifying and validating federated models is poorly understood, especially when testers face limited resources, knowledge-based uncertainties, and concerns over operational realism. Testers often struggle to determine how best to allocate limited test resources for model validation. We propose a network-based representation of federated models in which the network encodes the connections among the models in the federation. The nodes of the graph are the sub-models, and a directed edge is drawn from node a to node b if the output of a is an input to b. We quantify uncertainties through edge weights, using meta-modeling and variance-based sensitivity analysis. This network-based framework allows us to propagate uncertainties through the federated model and to optimize the allocation of validation resources based on those uncertainties.</p>
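<p>The network representation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ implementation: the three sub-models, their noise levels, and the names <code>sensor</code>, <code>tracker</code>, and <code>engagement</code> are hypothetical; uncertainty is propagated by plain Monte Carlo in topological order; and each edge weight is a crude given-data estimate of the first-order (Sobol) sensitivity index Var(E[Y | X]) / Var(Y), standing in for the meta-model-based estimates the abstract refers to.</p>

```python
import random
import statistics
from graphlib import TopologicalSorter

# Hypothetical federation: sensor -> tracker -> engagement.
# Each key lists its predecessors: an edge a -> b means a's output feeds b.
EDGES = {"tracker": {"sensor"}, "engagement": {"tracker"}}

def sensor(inputs, rng):
    # Hypothetical sub-model: detection range (km) with its own uncertainty.
    return rng.gauss(50.0, 5.0)

def tracker(inputs, rng):
    # Hypothetical sub-model: track quality rises with detection range, plus noise.
    return 0.02 * inputs["sensor"] + rng.gauss(0.0, 0.1)

def engagement(inputs, rng):
    # Hypothetical sub-model: engagement score driven by track quality, plus noise.
    return 3.0 * inputs["tracker"] + rng.gauss(0.0, 0.2)

MODELS = {"sensor": sensor, "tracker": tracker, "engagement": engagement}

def propagate(n_draws=5000, seed=1):
    """Monte Carlo propagation: evaluate sub-models in topological order."""
    rng = random.Random(seed)
    order = list(TopologicalSorter(EDGES).static_order())
    draws = {name: [] for name in order}
    for _ in range(n_draws):
        values = {}
        for name in order:
            parents = {p: values[p] for p in EDGES.get(name, set())}
            values[name] = MODELS[name](parents, rng)
            draws[name].append(values[name])
    return draws

def first_order_index(x, y, n_bins=20):
    """Given-data estimate of Var(E[Y | X]) / Var(Y) using equal-count bins of X."""
    pairs = sorted(zip(x, y))
    size = len(pairs) // n_bins
    bin_means = [statistics.fmean(v for _, v in pairs[i * size:(i + 1) * size])
                 for i in range(n_bins)]
    return statistics.pvariance(bin_means) / statistics.pvariance(y)

def edge_weights(draws):
    """Weight each edge a -> b by the share of b's variance explained by a."""
    return {(p, node): first_order_index(draws[p], draws[node])
            for node, preds in EDGES.items() for p in preds}

if __name__ == "__main__":
    for edge, weight in edge_weights(propagate()).items():
        print(edge, round(weight, 2))
```

<p>For this linear toy chain the exact first-order indices are 0.50 for sensor to tracker and about 0.82 for tracker to engagement, so the printed weights should land near those values; a larger weight means more of the downstream uncertainty is inherited through that connection.</p>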
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Capp, Jo Anna, John T Haman, and Dhruv Patel. A Practitioner’s Framework for Federated Model Validation Resource Allocation. IDA Product ID 3001838. Alexandria, VA: Institute for Defense Analyses, 2024.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>A Preview of Functional Data Analysis for Modeling and Simulation Validation</title>
      <link>https://research.testscience.org/post/2024-a-preview-of-functional-data-analysis-for-modeling-and-simulation-validation/</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2024-a-preview-of-functional-data-analysis-for-modeling-and-simulation-validation/</guid>
      <description>Modeling and simulation (M&amp;amp;S) validation for operational testing often involves comparing live data with simulation outputs. Statistical methods known as functional data analysis (FDA) provide techniques for analyzing large data sets (&amp;ldquo;large&amp;rdquo; meaning that a single trial has a lot of information associated with it), such as radar tracks. We preview how FDA methods could assist M&amp;amp;S validation by providing statistical tools for handling these large data sets. This may facilitate analyses that use more of the available data and thus allow better detection of differences between M&amp;amp;S predictions and live test results.</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/1neGQl8Jtxs?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>Modeling and simulation (M&amp;S) validation for operational testing often involves comparing live data with simulation outputs. Statistical methods known as functional data analysis (FDA) provide techniques for analyzing large data sets (&ldquo;large&rdquo; meaning that a single trial has a lot of information associated with it), such as radar tracks. We preview how FDA methods could assist M&amp;S validation by providing statistical tools for handling these large data sets. This may facilitate analyses that use more of the available data and thus allow better detection of differences between M&amp;S predictions and live test results. We demonstrate some fundamental FDA approaches with a notional example of live and simulated radar tracks of a bomber’s flight.</p>
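<p>One common FDA-style comparison is to place every track on a common time grid, compute a mean function for the live and simulated groups, and judge the distance between the two mean curves against a permutation distribution. The Python sketch below illustrates that idea on synthetic, notional altitude tracks. The data generator, grid, offset, and function names are hypothetical, and this is a generic functional two-sample permutation test, not necessarily the approach demonstrated in the briefing.</p>

```python
import bisect
import math
import random

def to_grid(times, values, grid):
    """Linearly interpolate one irregularly sampled track onto a common grid."""
    out = []
    for t in grid:
        j = bisect.bisect_right(times, t)
        if j == 0:               # before the first sample: hold the first value
            out.append(values[0])
        elif j == len(times):    # at or after the last sample: hold the last value
            out.append(values[-1])
        else:
            t0, t1 = times[j - 1], times[j]
            w = (t - t0) / (t1 - t0)
            out.append(values[j - 1] + w * (values[j] - values[j - 1]))
    return out

def mean_curve(curves):
    """Pointwise mean function of curves that share one grid."""
    return [sum(c[k] for c in curves) / len(curves) for k in range(len(curves[0]))]

def l2_distance(f, g, dt):
    """Riemann-sum approximation of the L2 distance between two curves."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)) * dt)

def permutation_test(live, sim, dt, n_perm=999, seed=0):
    """Permutation p-value for 'live and simulated tracks share one mean function'."""
    rng = random.Random(seed)
    observed = l2_distance(mean_curve(live), mean_curve(sim), dt)
    pooled = live + sim
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = l2_distance(mean_curve(pooled[:len(live)]), mean_curve(pooled[len(live):]), dt)
        exceed += stat >= observed
    return observed, (exceed + 1) / (n_perm + 1)

def notional_tracks(n, offset, rng, grid):
    """Hypothetical altitude tracks: a shared smooth profile plus an offset and noise."""
    curves = []
    for _ in range(n):
        times = [0.0] + sorted(rng.uniform(0.0, 10.0) for _ in range(40)) + [10.0]
        values = [1.0 + 0.5 * math.sin(t) + offset + rng.gauss(0.0, 0.05) for t in times]
        curves.append(to_grid(times, values, grid))
    return curves

if __name__ == "__main__":
    rng = random.Random(42)
    grid = [k / 10 for k in range(101)]        # 0.0 to 10.0 in steps of 0.1
    live = notional_tracks(8, 0.0, rng, grid)
    sim = notional_tracks(8, 0.3, rng, grid)   # simulation biased upward by 0.3
    print(permutation_test(live, sim, dt=0.1))
```

<p>Because the whole curve enters the statistic, a small but persistent bias along the track can register even when no single time point looks unusual, which is the sense in which FDA uses more of the available data.</p>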
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Medlin, Rebecca M, and Curtis G Miller. A Preview of Functional Data Analysis for Modeling and Simulation Validation. IDA Product ID 3001829. Alexandria, VA: Institute for Defense Analyses, 2024.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>A Reliability Assurance Test Planning and Analysis Tool</title>
      <link>https://research.testscience.org/post/2024-a-reliability-assurance-test-planning-and-analysis-tool/</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2024-a-reliability-assurance-test-planning-and-analysis-tool/</guid>
      <description>This presentation documents the work of IDA 2024 Summer Associate Emma Mitchell. It details an R Shiny application that provides researchers with a user-friendly software tool for planning and analyzing system reliability tests. Specifically, the presentation shows how to plan a reliability test using Bayesian reliability assurance test methods. Such tests use supplementary data and information, including reliability models, prior test results, expert judgment, and knowledge of environmental conditions, to plan reliability testing, which can often reduce the required amount of testing.</description>
      <content:encoded><![CDATA[<p>This presentation documents the work of IDA 2024 Summer Associate Emma Mitchell. It details an R Shiny application that provides researchers with a user-friendly software tool for planning and analyzing system reliability tests. Specifically, the presentation shows how to plan a reliability test using Bayesian reliability assurance test methods. Such tests use supplementary data and information, including reliability models, prior test results, expert judgment, and knowledge of environmental conditions, to plan reliability testing, which can often reduce the required amount of testing. In the planning phase, the application enables researchers to use Bayesian methods to incorporate supplementary data when determining appropriate test lengths. In the analysis phase, the tool allows researchers to combine information through Bayesian methods, yielding better uncertainty quantification than traditional methods.</p>
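<p>To make the planning idea concrete, the Python sketch below works through the simplest Bayesian reliability assurance calculation: a pass/fail (binomial) test with a conjugate beta prior on reliability. The plan is the smallest number of units that must all pass for the posterior probability that reliability meets its target to reach a required assurance level. The prior parameters, target, and function names are hypothetical, and the application described here may use other models (for example, time-to-failure models). This sketch illustrates the principle; it is not the tool’s method.</p>

```python
import math

def prob_reliability_at_least(r_target, a, b):
    """P(R >= r_target) for R ~ Beta(a, b) with integer a, b >= 1.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a), so
    P(R >= x) = P(Binomial(a + b - 1, x) <= a - 1).
    """
    m = a + b - 1
    return sum(math.comb(m, k) * r_target**k * (1 - r_target) ** (m - k) for k in range(a))

def zero_failure_test_length(r_target, assurance, a=1, b=1, n_max=10_000):
    """Smallest n such that n passes in n trials gives P(R >= r_target | data) >= assurance.

    With a Beta(a, b) prior and n successes, the posterior is Beta(a + n, b).
    """
    for n in range(n_max + 1):
        if prob_reliability_at_least(r_target, a + n, b) >= assurance:
            return n
    raise ValueError("no test length up to n_max meets the requirement")

def classical_zero_failure_length(r_target, confidence):
    """Classical success-run test length: smallest n with r_target**n <= 1 - confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(r_target))

if __name__ == "__main__":
    r, c = 0.90, 0.80
    print("classical:", classical_zero_failure_length(r, c))
    print("flat Beta(1, 1) prior:", zero_failure_test_length(r, c))
    print("informative Beta(9, 1) prior:", zero_failure_test_length(r, c, a=9, b=1))
```

<p>With a 0.90 reliability target and 0.80 assurance, this prints 16, 15, and 7 units. The hypothetical Beta(9, 1) prior, worth eight prior successes beyond the flat prior, roughly halves the zero-failure test. That is the kind of saving supplementary data can provide when the prior is defensible.</p>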
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Haman, John T, Rebecca M Medlin, Emma P Mitchell, Keyla Pagán-Rivera, and Dhruv K Patel. A Reliability Assurance Test Planning and Analysis Tool. IDA Product ID 3003359. Institute for Defense Analyses, 2024.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "3003359%20Pagan-Rivera%20et%20al-3_slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Developing AI Trust: From Theory to Testing and the Myths in Between</title>
      <link>https://research.testscience.org/post/2024-developing-ai-trust-from-theory-to-testing-and-the-myths-in-between/</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2024-developing-ai-trust-from-theory-to-testing-and-the-myths-in-between/</guid>
      <description>This introductory work aims to provide members of the Test and Evaluation community with a clear understanding of trust and trustworthiness to support responsible and effective evaluation of AI systems. The paper provides a set of working definitions and works toward dispelling confusion and myths surrounding trust.</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/xQL_kBiasPI?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>This introductory work aims to provide members of the Test and Evaluation community with a clear understanding of trust and trustworthiness to support responsible and effective evaluation of AI systems. The paper provides a set of working definitions and works toward dispelling confusion and myths surrounding trust.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Razin, Yosef S., and Kristen Alexander. “Developing AI Trust: From Theory to Testing and the Myths in Between.” The ITEA Journal of Test and Evaluation 45, no. 1 (March 31, 2024). <a href="https://itea.org/journals/volume-45-1/developing-ai-trust-from-theory-to-testing-and-the-myths-in-between/">https://itea.org/journals/volume-45-1/developing-ai-trust-from-theory-to-testing-and-the-myths-in-between/</a>.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Operational T&amp;E of AI-Supported Data Integration, Fusion, and Analysis Systems</title>
      <link>https://research.testscience.org/post/2024-operational-t-e-of-ai-supported-data-integration-fusion-and-analysis-systems/</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2024-operational-t-e-of-ai-supported-data-integration-fusion-and-analysis-systems/</guid>
      <description>AI will play an important role in future military systems. However, major questions remain about how to test AI systems, especially in operational settings. Here, we discuss an approach for the operational test and evaluation (OT&amp;amp;E) of AI-supported data integration, fusion, and analysis systems. We highlight new challenges posed by AI-supported systems, and we discuss new and existing OT&amp;amp;E methods for overcoming them. We demonstrate how to apply these OT&amp;amp;E methods via a notional test concept that focuses on evaluating an AI-supported data integration system in terms of its technical performance (how accurate is the AI output?) and human-systems interaction (how does the AI affect users?).</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/JqlIzJh-RQI?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>AI will play an important role in future military systems. However, major questions remain about how to test AI systems, especially in operational settings. Here, we discuss an approach for the operational test and evaluation (OT&amp;E) of AI-supported data integration, fusion, and analysis systems. We highlight new challenges posed by AI-supported systems, and we discuss new and existing OT&amp;E methods for overcoming them. We demonstrate how to apply these OT&amp;E methods via a notional test concept that focuses on evaluating an AI-supported data integration system in terms of its technical performance (how accurate is the AI output?) and human-systems interaction (how does the AI affect users?).</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Anderson, Breeana G, Adam M Miller, Logan K Ausman, John T Haman, Keyla Pagán-Rivera, Sarah A Shaffer, and Brian D Vickers. Operational T&amp;E of AI-Supported Data Integration, Fusion, and Analysis Systems. IDA Product ID 3001848. Alexandria, VA: Institute for Defense Analyses, 2024.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Sequential Space-Filling Designs for Modeling &amp; Simulation Analyses</title>
      <link>https://research.testscience.org/post/2024-sequential-space-filling-designs-for-modeling-simulation-analyses/</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2024-sequential-space-filling-designs-for-modeling-simulation-analyses/</guid>
      <description>Space-filling designs (SFDs) are a rigorous method for designing modeling and simulation (M&amp;amp;S) studies. However, they require the final sample size to be chosen before testing begins. Sequential designs are an alternative that can increase test efficiency by running a small number of design points at a time. We conducted a literature review of existing sequential space-filling designs and identified the methods most applicable to the test and evaluation (T&amp;amp;E) community.</description>
      <content:encoded><![CDATA[<p>Space-filling designs (SFDs) are a rigorous method for designing modeling and simulation (M&amp;S) studies. However, they require the final sample size to be chosen before testing begins. Sequential designs are an alternative that can increase test efficiency by running a small number of design points at a time. We conducted a literature review of existing sequential space-filling designs and identified the methods most applicable to the test and evaluation (T&amp;E) community.</p>
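<p>As a reference point for what “sequential” means in practice, the Python sketch below augments an existing design in batches with a simple greedy maximin rule: each new point is the candidate farthest from every point already run. This rule, the unit-square design region, the candidate set, and the batch sizes are illustrative assumptions, not one of the specific methods identified in the review.</p>

```python
import math
import random

def greedy_maximin_batch(existing, candidates, batch_size):
    """Choose batch_size new runs, one at a time, each maximizing its
    minimum Euclidean distance to all runs already in the design."""
    design = list(existing)
    pool = list(candidates)
    added = []
    for _ in range(batch_size):
        best = max(pool, key=lambda c: min((math.dist(c, p) for p in design), default=math.inf))
        design.append(best)
        added.append(best)
        pool.remove(best)
    return added

def min_pairwise_distance(points):
    """Smallest distance between any two runs (the maximin criterion value)."""
    return min(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])

if __name__ == "__main__":
    rng = random.Random(0)
    candidates = [(rng.random(), rng.random()) for _ in range(2000)]  # hypothetical 2-factor space
    design = greedy_maximin_batch([], candidates, 5)                    # initial batch
    for stage in range(3):
        # In a real study, run the simulation here and stop once the
        # surrogate model is accurate enough or the budget is spent.
        design += greedy_maximin_batch(design, candidates, 5)
        print(f"stage {stage + 1}: {len(design)} runs, "
              f"min spacing {min_pairwise_distance(design):.3f}")
```

<p>The greedy rule never revisits earlier runs, so each batch can be chosen after the previous batch’s results are in. Many sequential designs go further and use those results to decide where to sample next or when to stop.</p>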
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Haman, John T, and Anna Flowers. Sequential Space-Filling Designs for Modeling &amp; Simulation Analyses. IDA Product ID 3003752. Alexandria, VA: Institute for Defense Analyses, 2024.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "3003752%20Haman%20et%20al-3_slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Simulation Insights on Power Analysis with Binary Responses–From SNR Methods to &#39;skprJMP&#39;</title>
      <link>https://research.testscience.org/post/2024-simulation-insights-on-power-analysis-with-binary-responses-from-snr-methods-to-skprjmp/</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2024-simulation-insights-on-power-analysis-with-binary-responses-from-snr-methods-to-skprjmp/</guid>
      <description>Logistic regression is a commonly used method in the test community for analyzing tests with binary responses, yet calculating power for these tests has historically been challenging. This difficulty prompted the development over the last decade of methods based on signal-to-noise ratio (SNR) approximations, tailored to the intricacies of logistic regression&amp;rsquo;s binary outcomes. However, advances in statistical software and computational power have reduced the need for such approximate methods.</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/j0rINL3L-yo?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>Logistic regression is a commonly used method in the test community for analyzing tests with binary responses, yet calculating power for these tests has historically been challenging. This difficulty prompted the development over the last decade of methods based on signal-to-noise ratio (SNR) approximations, tailored to the intricacies of logistic regression&rsquo;s binary outcomes. However, advances in statistical software and computational power have reduced the need for such approximate methods. Our research presents a detailed simulation study that compares SNR-based power estimates with those derived from exact Monte Carlo simulations, highlighting the inadequacies of SNR approximations. To address these shortcomings, we discuss improvements in the open-source R package &ldquo;skpr&rdquo; and present &ldquo;skprJMP,&rdquo; a new plug-in that offers more accurate and reliable power calculations for logistic regression analyses.</p>
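<p>The simulation-based approach can be illustrated in its simplest form. The Python sketch below simulates many tests with one two-level factor, fits the logistic model, and counts how often a Wald test rejects. With a single two-level factor, the logistic-regression estimate of the factor effect is the sample log odds ratio, with standard error sqrt(1/y1 + 1/(n - y1) + 1/y2 + 1/(n - y2)), so no iterative fitting is needed. The response probabilities, sample sizes, and function name are hypothetical, and this is not the skpr or skprJMP implementation.</p>

```python
import math
import random
from statistics import NormalDist

def simulate_power(p_low, p_high, n_per_level, alpha=0.05, n_sims=5000, seed=1):
    """Monte Carlo power of the Wald test for a two-level factor in logistic regression."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        y1 = sum(rng.random() < p_low for _ in range(n_per_level))   # successes at level 1
        y2 = sum(rng.random() < p_high for _ in range(n_per_level))  # successes at level 2
        cells = (y1, n_per_level - y1, y2, n_per_level - y2)
        if 0 in cells:
            # A zero cell means separation: the MLE is infinite. Count it as
            # "no rejection", one conservative convention among several.
            continue
        log_odds_ratio = math.log(y2 / (n_per_level - y2)) - math.log(y1 / (n_per_level - y1))
        std_err = math.sqrt(sum(1 / c for c in cells))
        if abs(log_odds_ratio / std_err) > z_crit:
            rejections += 1
    return rejections / n_sims

if __name__ == "__main__":
    for n in (20, 40, 80):
        print(n, simulate_power(0.5, 0.8, n))
```

<p>Because every simulated test is analyzed exactly as the real one would be, choices such as how to handle separation are visible and adjustable, which an SNR shortcut cannot reflect.</p>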
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Atkins, Robert, Tyler Morgan-Wall, and Curtis Miller. “Simulation Insights on Power Analysis with Binary Responses&ndash;From SNR Methods to ‘skprJMP.’” Institute for Defense Analyses IDA Product ID 3002093 (April 2024).</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="poster">Poster:</h4>
<embed src= "poster.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Statistical Advantages of Validated Surveys over Custom Surveys</title>
      <link>https://research.testscience.org/post/2024-statistical-advantages-of-validated-surveys-over-custom-surveys/</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2024-statistical-advantages-of-validated-surveys-over-custom-surveys/</guid>
      <description>Surveys play an important role in quantifying user opinion during test and evaluation (T&amp;amp;E). Current best practice is to use surveys that have been tested, or “validated,” to ensure that they produce reliable and accurate results. However, unvalidated (“custom”) surveys are still widely used in T&amp;amp;E, raising questions about how to determine sample sizes for, and interpret data from, T&amp;amp;E events that rely on custom surveys. In this presentation, I characterize the statistical properties of validated and custom survey responses using data from recent T&amp;amp;E events, and then I demonstrate how these properties affect test design, analysis, and interpretation.</description>
      <content:encoded><![CDATA[<p>Surveys play an important role in quantifying user opinion during test and evaluation (T&amp;E). Current best practice is to use surveys that have been tested, or “validated,” to ensure that they produce reliable and accurate results. However, unvalidated (“custom”) surveys are still widely used in T&amp;E, raising questions about how to determine sample sizes for, and interpret data from, T&amp;E events that rely on custom surveys. In this presentation, I characterize the statistical properties of validated and custom survey responses using data from recent T&amp;E events, and then I demonstrate how these properties affect test design, analysis, and interpretation. I show that validated surveys reduce the number of subjects required to estimate statistical parameters or to detect a mean difference between two populations. Additionally, I simulate the survey process to demonstrate how poorly designed custom surveys introduce unintended changes to the data, increasing the risk of drawing false conclusions.</p>
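<p>A back-of-the-envelope version of the sample-size argument follows. Under classical test theory, a survey’s reliability is the ratio of true-score variance to observed-score variance, so less reliable responses have a larger observed standard deviation. The per-group sample size needed to detect a given mean difference grows in proportion to that variance. The Python sketch below uses the normal-approximation formula n = 2(z<sub>1−α/2</sub> + z<sub>1−β</sub>)²σ²/δ² per group. The effect size and the two reliability values are hypothetical and are not results from the presentation.</p>

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means (normal approximation)."""
    z = NormalDist()
    k = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * (k * sd / delta) ** 2)

def observed_sd(true_sd, reliability):
    """Classical test theory: reliability = var(true) / var(observed)."""
    return true_sd / math.sqrt(reliability)

if __name__ == "__main__":
    delta, true_sd = 0.5, 1.0   # hypothetical mean difference and true-score SD
    for label, rel in (("validated (reliability 0.9)", 0.9), ("custom (reliability 0.6)", 0.6)):
        print(label, n_per_group(delta, observed_sd(true_sd, rel)))
```

<p>With these assumed values, the sketch asks for 70 subjects per group for the more reliable instrument and 105 for the less reliable one; a t-based calculation would require slightly more in each case.</p>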
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Bell, Jonathan L, and Adam M Miller. Statistical Advantages of Validated Surveys over Custom Surveys. IDA Product ID 3001858. Alexandria, VA: Institute for Defense Analyses, 2024.</p>
</blockquote>
<h4 id="poster">Poster:</h4>
<embed src= "poster.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Uncertainty Quantification for Ground Vehicle Vulnerability Simulation</title>
      <link>https://research.testscience.org/post/2024-uncertainty-quantification-for-ground-vehicle-vulnerability-simulation/</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2024-uncertainty-quantification-for-ground-vehicle-vulnerability-simulation/</guid>
      <description>A vulnerability assessment of a combat vehicle uses modeling and simulation (M&amp;amp;S) to predict the vehicle&amp;rsquo;s vulnerability to a given enemy attack. The system-level output of the M&amp;amp;S is the probability that the vehicle&amp;rsquo;s mobility is degraded as a result of the attack. The M&amp;amp;S models this system-level phenomenon by decomposing the attack scenario into a hierarchy of subsystems. Each subsystem addresses a specific scientific problem, such as the fracture dynamics of an exploding munition or the ballistic resistance provided by the vehicle&amp;rsquo;s armor.</description>
      <content:encoded><![CDATA[<p>A vulnerability assessment of a combat vehicle uses modeling and simulation (M&amp;S) to predict the vehicle&rsquo;s vulnerability to a given enemy attack. The system-level output of the M&amp;S is the probability that the vehicle&rsquo;s mobility is degraded as a result of the attack. The M&amp;S models this system-level phenomenon by decomposing the attack scenario into a hierarchy of subsystems. Each subsystem addresses a specific scientific problem, such as the fracture dynamics of an exploding munition or the ballistic resistance provided by the vehicle&rsquo;s armor. For each subsystem in the hierarchy, laboratory testing is conducted to gather data for fitting a subsystem-level model. The M&amp;S interconnects the subsystem-level models hierarchically to predict the system-level output. As part of the DoD&rsquo;s ongoing effort to improve M&amp;S through verification, validation, and uncertainty quantification, we present a case study that propagates the uncertainties in the hierarchy of sub-models to the system-level output.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Johnson, Thomas H., Dhruv K. Patel, John T. Haman, Jeremy S. Werner, and Dave Higdon. “Uncertainty Quantification for Ground Vehicle Vulnerability Simulation.” Quality Engineering, August 19, 2024. <a href="https://www.tandfonline.com/doi/abs/10.1080/08982112.2024.2394437">https://www.tandfonline.com/doi/abs/10.1080/08982112.2024.2394437</a>.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>A Team-Centric Metric Framework for Testing and Evaluation of Human-Machine Teams</title>
      <link>https://research.testscience.org/post/2023-a-team-centric-metric-framework-for-testing-and-evaluation-of-human-machine-teams/</link>
      <pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2023-a-team-centric-metric-framework-for-testing-and-evaluation-of-human-machine-teams/</guid>
      <description>We propose a parallelized metric framework for evaluating human-machine teams that draws on current knowledge of human-systems interfacing and integration but is rooted in team-centric concepts. Teamwork between humans and machines involves interactions that will only grow more complex as machines become more intelligent and capable teammates. Assessing such teams will require an explicit focus not just on the human-machine interface but on the full spectrum of interactions between and among agents.</description>
      <content:encoded><![CDATA[<p>We propose a parallelized metric framework for evaluating human-machine teams that draws on current knowledge of human-systems interfacing and integration but is rooted in team-centric concepts. Teamwork between humans and machines involves interactions that will only grow more complex as machines become more intelligent and capable teammates. Assessing such teams will require an explicit focus not just on the human-machine interface but on the full spectrum of interactions between and among agents. Rather than focusing on the isolated qualities, capabilities, and performance contributions of individual team members, the proposed framework treats the collective team as the fundamental unit of analysis and the team’s interactions as the key evaluation targets; individual human and machine metrics remain vital but secondary. With teammate interaction as the organizing diagnostic concept, the framework arrives at a parallel assessment of humans and machines, analyzing their individual capabilities less in terms of purely human or machine qualities and more through the prism of their contributions to the team as a whole. This treatment reflects increased machine capabilities and will keep the framework relevant as machines come to exercise more authority and responsibility. The framework allows identification of features specific to human-machine teaming that influence team performance and efficiency, and it provides a basis for operationalizing the metrics in specific scenarios. Potential applications of this research include test and evaluation of complex systems that rely on human-system interaction, including, but not limited to, autonomous vehicles, command and control systems, and pilot control systems.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Wilkins, Jay, David A. Sparrow, Caitlan A. Fealing, Brian D. Vickers, Kristina A. Ferguson, and Heather Wojton. “A Team-Centric Metric Framework for Testing and Evaluation of Human-Machine Teams.” Systems Engineering 27, no. 3 (May 1, 2024): 466–84. <a href="https://doi.org/10.1002/sys.21730">https://doi.org/10.1002/sys.21730</a>.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Development of Wald-Type and Score-Type Statistical Tests to Compare Live Test Data and Simulation Predictions</title>
      <link>https://research.testscience.org/post/2023-development-of-wald-type-and-score-type-statistical-tests-to-compare-live-test-data-and-simulation-predictions/</link>
      <pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2023-development-of-wald-type-and-score-type-statistical-tests-to-compare-live-test-data-and-simulation-predictions/</guid>
      <description>This work describes the development of a statistical test created in support of ongoing verification, validation, and accreditation (VV&amp;amp;A) efforts for modeling and simulation (M&amp;amp;S) environments. The test computes a Wald-type statistic comparing two generalized linear models estimated from live test data and analogous simulated data. The resulting statistic indicates whether the M&amp;amp;S outputs differ from the live data. After developing the test, we applied it to two logistic regression models estimated from live torpedo test data and simulated data from the Naval Undersea Warfare Center’s Environment Centric Weapons Analysis Facility (ECWAF).</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/8OgvSuwTdys?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>This work describes the development of a statistical test created in support of ongoing verification, validation, and accreditation (VV&amp;A) efforts for modeling and simulation (M&amp;S) environments. The test computes a Wald-type statistic comparing two generalized linear models estimated from live test data and analogous simulated data. The resulting statistic indicates whether the M&amp;S outputs differ from the live data. After developing the test, we applied it to two logistic regression models estimated from live torpedo test data and simulated data from the Naval Undersea Warfare Center’s Environment Centric Weapons Analysis Facility (ECWAF). We developed this test to handle a specific problem with our data: only one weapon variant appeared in the in-water test data, whereas the ECWAF data included two weapon variants. We overcame this mismatch by adjusting the Wald statistic, combining linear model coefficients with the intercept term when a factor is varied in one sample but not the other. A similar approach could be applied to score-type tests, which we also describe.</p>
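<p>The core calculation can be sketched as follows. For independent live and simulated samples, with coefficient estimates β̂<sub>L</sub> and β̂<sub>S</sub> and covariance matrices V<sub>L</sub> and V<sub>S</sub> for the same model terms, the standard Wald statistic W = (β̂<sub>L</sub> − β̂<sub>S</sub>)<sup>T</sup>(V<sub>L</sub> + V<sub>S</sub>)<sup>−1</sup>(β̂<sub>L</sub> − β̂<sub>S</sub>) is approximately chi-square with p degrees of freedom when both data sets come from the same model. The Python sketch below assumes both GLMs have already been fit. The coefficient values are hypothetical, the chi-square tail uses a closed form valid only for even degrees of freedom, and the authors’ adjustment for a factor that varies in only one sample is not reproduced.</p>

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def chi2_sf_even(w, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df % 2:
        raise NotImplementedError("this sketch handles even degrees of freedom only")
    half = w / 2
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))

def wald_compare(beta_live, cov_live, beta_sim, cov_sim):
    """Wald statistic and p-value for H0: live and simulated coefficients are equal."""
    d = [bl - bs for bl, bs in zip(beta_live, beta_sim)]
    v = [[cl + cs for cl, cs in zip(row_l, row_s)] for row_l, row_s in zip(cov_live, cov_sim)]
    w = sum(di * xi for di, xi in zip(d, solve(v, d)))
    return w, chi2_sf_even(w, len(d))

if __name__ == "__main__":
    # Hypothetical (intercept, slope) estimates and covariances from two logistic fits.
    beta_live, cov_live = [1.2, -0.8], [[0.04, 0.01], [0.01, 0.09]]
    beta_sim, cov_sim = [1.0, -0.5], [[0.01, 0.0], [0.0, 0.01]]
    print(wald_compare(beta_live, cov_live, beta_sim, cov_sim))
```

<p>Summing the two covariance matrices relies on the live and simulated samples being independent; a small p-value suggests the simulation’s fitted relationship differs from the one observed in live testing.</p>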
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Metts, Carrington, and Curtis Miller. “Development of Wald-Type and Score-Type Statistical Tests to Compare Live Test Data and Simulation Predictions.” The ITEA Journal of Test and Evaluation 44, no. 3 (August 25, 2023). <a href="https://itea.org/journals/volume-44-3/development-of-wald-type-and-score-type-statistical-tests-to-compare-live-test-data-and-simulation-predictions/">https://itea.org/journals/volume-44-3/development-of-wald-type-and-score-type-statistical-tests-to-compare-live-test-data-and-simulation-predictions/</a>.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Implementing Fast Flexible Space-Filling Designs in R</title>
      <link>https://research.testscience.org/post/2023-implementing-fast-flexible-space-filling-designs-in-r/</link>
      <pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2023-implementing-fast-flexible-space-filling-designs-in-r/</guid>
      <description>Modeling and simulation (M&amp;amp;S) can be a useful tool when testers and evaluators need to augment the data collected during a test event. When planning M&amp;amp;S, testers use experimental design techniques to determine how much and which types of data to collect, and they can use space-filling designs to spread out test points across the operational space. Fast flexible space-filling designs (FFSFDs) are a type of space-filling design useful for M&amp;amp;S because they work well in design spaces with disallowed combinations and permit the inclusion of categorical factors.</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/vg36C3hhDmk?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>Modeling and simulation (M&amp;S) can be a useful tool when testers and evaluators need to augment the data collected during a test event. When planning M&amp;S, testers use experimental design techniques to determine how much and which types of data to collect, and they can use space-filling designs to spread out test points across the operational space. Fast flexible space-filling designs (FFSFDs) are a type of space-filling design useful for M&amp;S because they work well in design spaces with disallowed combinations and permit the inclusion of categorical factors. IDA analysts developed a function to create FFSFDs using the free statistical software R. To our knowledge, no existing R package can create an FFSFD while accommodating a variety of user inputs, such as categorical factors. Moreover, users of IDA’s function can share their code to make their work reproducible.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Medlin, Rebecca M, and Christopher T Dimapasok. Space-Filling Designs in R. IDA Document NS 3000045. Alexandria, VA: Institute for Defense Analyses, 2023.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Improving Test Efficiency: A Bayesian Assurance Case Study</title>
      <link>https://research.testscience.org/post/2023-improving-test-efficiency-a-bayesian-assurance-case-study/</link>
      <pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2023-improving-test-efficiency-a-bayesian-assurance-case-study/</guid>
      <description>To improve test planning for evaluating system reliability, we propose the use of Bayesian methods to incorporate supplementary data and reduce testing duration. Furthermore, we recommend employing Bayesian methods in the analysis phase to better quantify uncertainty. We find that using Bayesian methods in test planning allows us to scope smaller tests, and that using Bayesian methods in analysis yields a more precise estimate of reliability, improving uncertainty quantification.</description>
      <content:encoded><![CDATA[<p>To improve test planning for evaluating system reliability, we propose the use of Bayesian methods to incorporate supplementary data and reduce testing duration. Furthermore, we recommend employing Bayesian methods in the analysis phase to better quantify uncertainty. We find that using Bayesian methods in test planning allows us to scope smaller tests, and that using Bayesian methods in analysis yields a more precise estimate of reliability, improving uncertainty quantification.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Medlin, Rebecca M. A Bayesian Assurance Case Study. IDA Document NS 3000024. Alexandria, VA: Institute for Defense Analyses, 2023.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Introduction to Design of Experiments in R: Generating and Evaluating Designs with Skpr</title>
      <link>https://research.testscience.org/post/2023-introduction-to-design-of-experiments-in-r-generating-and-evaluating-designs-with-skpr/</link>
      <pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2023-introduction-to-design-of-experiments-in-r-generating-and-evaluating-designs-with-skpr/</guid>
      <description>This workshop instructs attendees on how to run an end-to-end optimal Design of Experiments workflow in R using the open source skpr package. The workshop is split into two sections: optimal design generation and design evaluation. The first half of the workshop provides basic instruction on how to use R, as well as how to use skpr to create an optimal design for an experiment: how to specify a model, create a candidate set of potential runs, remove disallowed combinations, and specify the design generation conditions to best suit an experimenter&amp;rsquo;s goals.</description>
      <content:encoded><![CDATA[<p>This workshop instructs attendees on how to run an end-to-end optimal Design of Experiments workflow in R using the open source skpr package. The workshop is split into two sections: optimal design generation and design evaluation. The first half of the workshop provides basic instruction on how to use R, as well as how to use skpr to create an optimal design for an experiment: how to specify a model, create a candidate set of potential runs, remove disallowed combinations, and specify the design generation conditions to best suit an experimenter&rsquo;s goals. The second half of the workshop covers design evaluation with skpr: how to determine whether an experimental design is adequate for the test at hand. It provides information on how to perform power calculations and evaluate other design properties that affect design quality, including how to generate fraction of design space plots and correlation plots.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Morgan-Wall, Tyler T. Introduction to Design of Experiments in R: Generating and Evaluating Designs with Skpr. IDA Document NS D-33397. Alexandria, VA: Institute for Defense Analyses, 2023.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Introduction to Measuring Situational Awareness in Mission-Based Testing Scenarios</title>
      <link>https://research.testscience.org/post/2023-introduction-to-measuring-situational-awareness-in-mission-based-testing-scenarios/</link>
      <pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2023-introduction-to-measuring-situational-awareness-in-mission-based-testing-scenarios/</guid>
      <description>Situation Awareness (SA) plays a key role in decision making and human performance; higher operator SA is associated with increased operator performance and decreased operator errors. While maintaining or improving “situational awareness” is a common requirement for systems under test, there is no single standardized method or metric for quantifying SA in operational testing (OT). This gap leads to varied and sometimes suboptimal treatments of SA measurement across programs and test events.</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/uqmjLsDA_EA?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>Situation Awareness (SA) plays a key role in decision making and human performance; higher operator SA is associated with increased operator performance and decreased operator errors. While maintaining or improving “situational awareness” is a common requirement for systems under test, there is no single standardized method or metric for quantifying SA in operational testing (OT). This gap leads to varied and sometimes suboptimal treatments of SA measurement across programs and test events. This paper introduces Endsley’s three-level model of SA in dynamic decision making (a frequently used model of individual SA), reviews trade-offs among some existing measures of SA, and discusses several potential ways to improve SA measurement during OT.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Green, Elizabeth, Miriam Armstrong, and Janna Mantua. “Scientific Measurement of Situation Awareness in Operational Testing.” The ITEA Journal of Test and Evaluation 44, no. 3 (October 2, 2023). <a href="https://doi.org/10.61278/itea.44.3.1002">https://doi.org/10.61278/itea.44.3.1002</a>.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Metamodeling Techniques for Verification and Validation of Modeling and Simulation Data</title>
      <link>https://research.testscience.org/post/2022-metamodeling-techniques-for-verification-and-validation-of-modeling-and-simulation-data/</link>
      <pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2022-metamodeling-techniques-for-verification-and-validation-of-modeling-and-simulation-data/</guid>
      <description>Modeling and simulation (M&amp;amp;S) outputs help the Director, Operational Test and Evaluation (DOT&amp;amp;E) assess the effectiveness, survivability, lethality, and suitability of systems. To use M&amp;amp;S outputs, DOT&amp;amp;E needs models and simulators to be sufficiently verified and validated. The purpose of this paper is to improve the state of verification and validation by recommending and demonstrating a set of statistical techniques—metamodels, also called statistical emulators—to the M&amp;amp;S community.
The paper expands on DOT&amp;amp;E’s existing guidance about metamodel usage by creating methodological recommendations the M&amp;amp;S community could apply to its activities.</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/s4DUCI1M8Fw?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>Modeling and simulation (M&amp;S) outputs help the Director, Operational Test and Evaluation (DOT&amp;E) assess the effectiveness, survivability, lethality, and suitability of systems. To use M&amp;S outputs, DOT&amp;E needs models and simulators to be sufficiently verified and validated. The purpose of this paper is to improve the state of verification and validation by recommending and demonstrating a set of statistical techniques—metamodels, also called statistical emulators—to the M&amp;S community.</p>
<p>The paper expands on DOT&amp;E’s existing guidance about metamodel usage by offering methodological recommendations that the M&amp;S community could apply to its activities. For a deterministic, discrete response variable, we recommend using a nearest neighbor or decision tree model. For a deterministic, continuous response variable, we recommend Gaussian process interpolation. For a stochastic response variable, we recommend a generalized additive model. We also present a set of techniques that testers can use to assess the adequacy of their metamodels. We conclude with a notional example that demonstrates the recommended techniques.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Haman, John T, and Curtis G Miller. Metamodeling Techniques for Verification and Validation of Modeling and Simulation Data. IDA Paper P-33230. Alexandria, VA: Institute for Defense Analyses, 2022.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Predicting Trust in Automated Systems - An Application of TOAST</title>
      <link>https://research.testscience.org/post/2022-predicting-trust-in-automated-systems-an-application-of-toast/</link>
      <pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2022-predicting-trust-in-automated-systems-an-application-of-toast/</guid>
      <description>Following Wojton&amp;rsquo;s research on the Trust of Automated Systems Test (TOAST), which is designed to measure how much a human trusts an automated system, we aimed to determine how well this scale performs outside a military context. We found that, when using an automated system on a case-by-case basis, participants who used a poorly performing system trusted it less than they had expected to, whereas those who used a high-performing system trusted it as much as they had expected.</description>
      <content:encoded><![CDATA[<p>Following Wojton&rsquo;s research on the Trust of Automated Systems Test (TOAST), which is designed to measure how much a human trusts an automated system, we aimed to determine how well this scale performs outside a military context. We found that, when using an automated system on a case-by-case basis, participants who used a poorly performing system trusted it less than they had expected to, whereas those who used a high-performing system trusted it as much as they had expected. Additionally, both participants who used the poorly performing system and those who used the high-performing system lost a significant amount of trust after using the system on a group case basis. These results indicate that a high-performing system is important for trust, but only when users can decide to trust or distrust the system on a case-by-case basis.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Porter, Daniel J, and Caitlan A Fealing. Predicting Trust in Automated Systems – An Application of TOAST. IDA Document NS D-33188. Alexandria, VA: Institute for Defense Analyses, 2022.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Thoughts on Applying Design of Experiments (DOE) to Cyber Testing</title>
      <link>https://research.testscience.org/post/2022-thoughts-on-applying-design-of-experiments-doe-to-cyber-testing/</link>
      <pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2022-thoughts-on-applying-design-of-experiments-doe-to-cyber-testing/</guid>
      <description>This briefing, presented at Dataworks 2022, provides examples of potential ways in which Design of Experiments (DOE) could be applied to initially scope cyber assessments and, based on the results of those assessments, to subsequently design cyber tests in greater detail.</description>
      <content:encoded><![CDATA[<p>This briefing, presented at Dataworks 2022, provides examples of potential ways in which Design of Experiments (DOE) could be applied to initially scope cyber assessments and, based on the results of those assessments, to subsequently design cyber tests in greater detail.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Gilmore, James M, Kelly M Avery, Matthew R Girardi, and Rebecca M Medlin. Thoughts on Applying Design of Experiments (DOE) to Cyber Testing. IDA Document NS D-33023. Alexandria, VA: Institute for Defense Analyses, 2022.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_NS-D-33023.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Topological Modeling of Human-Machine Teams</title>
      <link>https://research.testscience.org/post/2022-topological-modeling-of-human-machine-teams/</link>
      <pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2022-topological-modeling-of-human-machine-teams/</guid>
      <description>A Human-Machine Team (HMT) is a group of agents consisting of at least one human and at least one machine, all functioning collaboratively towards one or more common objectives. As industry and defense find more helpful, creative, and difficult applications of AI-driven technology, the need to effectively and accurately model, simulate, test, and evaluate HMTs will continue to grow and become increasingly essential. To meet that need, new methods are required to evaluate whether a human-machine team is performing effectively as a team in testing and evaluation scenarios.</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/E1vPChYwf-k?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>A Human-Machine Team (HMT) is a group of agents consisting of at least one human and at least one machine, all functioning collaboratively towards one or more common objectives. As industry and defense find more helpful, creative, and difficult applications of AI-driven technology, the need to effectively and accurately model, simulate, test, and evaluate HMTs will continue to grow and become increasingly essential. To meet that need, new methods are required to evaluate whether a human-machine team is performing effectively as a team in testing and evaluation scenarios. Team performance cannot be predicted from knowledge of the individual team agents alone: interaction between the humans and machines, and between team agents in general, increases the problem space and adds a measure of unpredictability. Collective team or group performance, in turn, depends heavily on how a team is structured and organized, as well as on the mechanisms, paths, and substructures through which the agents in the team interact with one another, i.e., the team&rsquo;s topology. Because the tools and metrics for measuring team structure and interaction have become more highly developed in recent years, we propose and discuss a practical, topological HMT modeling framework that not only accounts for but is built around the team&rsquo;s topological characteristics, while still using the individual human and machine performance measures.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Wilkins, Leonard D, Caitlan A Fealing, V. Bram Lillard, and John Haman. Topological Modeling of Human-Machine Teams. IDA Document NS D-33031. Alexandria, VA: Institute for Defense Analyses, 2022.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Introduction to Bayesian Analysis</title>
      <link>https://research.testscience.org/post/2021-introduction-to-bayesian-analysis/</link>
      <pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2021-introduction-to-bayesian-analysis/</guid>
      <description>As operational testing becomes increasingly integrated and research questions become more difficult to answer, IDA’s Test Science team has found Bayesian models to be powerful data analysis methods. Analysts and decision-makers should understand the differences between this approach and the conventional way of analyzing data. It is also important to recognize when an analysis could benefit from the inclusion of prior information—what we already know about a system’s performance—and to understand the proper way to incorporate that information.</description>
      <content:encoded><![CDATA[<p>As operational testing becomes increasingly integrated and research questions become more difficult to answer, IDA’s Test Science team has found Bayesian models to be powerful data analysis methods. Analysts and decision-makers should understand the differences between this approach and the conventional way of analyzing data. It is also important to recognize when an analysis could benefit from the inclusion of prior information—what we already know about a system’s performance—and to understand the proper way to incorporate that information. To apply Bayesian methods, analysts need to comprehend some technical aspects of this approach and know how to properly use appropriate statistical software. In this course, students learn the intuition behind Bayesian statistics, the mathematical details of posterior distributions, how to fit simple Bayesian models using computer software, and how to assess model fit.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Wojton, Heather M, Keyla Pagan-Rivera, John T Haman, and Rebecca M Medlin. Introduction to Bayesian Analysis. IDA Document NS D-20484. Alexandria, VA: Institute for Defense Analyses, 2021.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_NS-D-20484.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Space-Filling Designs for Modeling &amp; Simulation</title>
      <link>https://research.testscience.org/post/2021-space-filling-designs-for-modeling-simulation/</link>
      <pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2021-space-filling-designs-for-modeling-simulation/</guid>
      <description>This document presents arguments and methods for using space-filling designs (SFDs) to plan modeling and simulation (M&amp;amp;S) data collection.</description>
      <content:encoded><![CDATA[<p>This document presents arguments and methods for using space-filling designs (SFDs) to plan modeling and simulation (M&amp;S) data collection.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Avery, Kelly, John T Haman, Thomas Johnson, Curtis Miller, Dhruv Patel, and Han Yi. Test Design Challenges in Defense Testing. IDA Product ID 3002855. Alexandria, VA: Institute for Defense Analyses, 2024.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Warhead Arena Analysis Advancements</title>
      <link>https://research.testscience.org/post/2021-warhead-arena-analysis-advancements/</link>
      <pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2021-warhead-arena-analysis-advancements/</guid>
      <description>Fragmentation analysis is a critical piece of the live fire test and evaluation (LFT&amp;amp;E) of the lethality and vulnerability aspects of warheads. But the traditional methods for data collection are expensive and laborious. New optical tracking technology promises to increase the fidelity of fragmentation data and decrease the time and costs associated with data collection. However, the new data will be complex, three-dimensional &amp;ldquo;fragmentation clouds,&amp;rdquo; possibly with a time component as well, and will contain a larger number of individual data points.</description>
      <content:encoded><![CDATA[<p>Fragmentation analysis is a critical piece of the live fire test and evaluation (LFT&amp;E) of the lethality and vulnerability aspects of warheads. But the traditional methods for data collection are expensive and laborious. New optical tracking technology promises to increase the fidelity of fragmentation data and decrease the time and costs associated with data collection. However, the new data will be complex, three-dimensional &ldquo;fragmentation clouds,&rdquo; possibly with a time component as well, and will contain a larger number of individual data points. This raises questions about how testers can effectively summarize spatial data and use it to draw conclusions about warhead performance for sponsors. In this briefing, we discuss Bayesian spatial models that are effective for characterizing the mass and velocity fragmentation distributions, along with several exploratory data analysis techniques that help us make sense of the data. Our goals are to:</p>
<ol>
<li>
<p>Produce simple statistics and visuals that help the live fire analyst compare and contrast warhead fragmentations.</p>
</li>
<li>
<p>Characterize important performance attributes or confirm design/spec compliance.</p>
</li>
<li>
<p>Provide data methods that ensure higher fidelity data collection translates to higher fidelity modeling and simulation down the line.</p>
</li>
</ol>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Couch, Mark, Thomas Johnson, John Haman, Kerry Walzl, Heather Wojton, Thomas Hatch-Aguilar, and David Higdon. Warhead Arena Analysis Advancements. IDA Document NS-D-11038. Alexandria, VA: Institute for Defense Analyses, 2021.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>A Review of Sequential Analysis</title>
      <link>https://research.testscience.org/post/2020-a-review-of-sequential-analysis/</link>
      <pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2020-a-review-of-sequential-analysis/</guid>
      <description>Sequential analysis concerns statistical evaluation in situations in which the number, pattern, or composition of the data is not determined at the start of the investigation, but instead depends upon the information acquired throughout the course of the investigation. Expanding the use of sequential analysis has the potential to save resources and reduce test time (National Research Council, 1998). This paper summarizes the literature on sequential analysis and provides foundational information for recommending its use in DoD test and evaluation.</description>
      <content:encoded><![CDATA[<p>Sequential analysis concerns statistical evaluation in situations in which the number, pattern, or composition of the data is not determined at the start of the investigation, but instead depends upon the information acquired throughout the course of the investigation. Expanding the use of sequential analysis has the potential to save resources and reduce test time (National Research Council, 1998). This paper summarizes the literature on sequential analysis and provides foundational information for recommending its use in DoD test and evaluation.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Wojton, Heather, Rebecca Medlin, John Dennis, Keyla Pagan-Rivera, and Leonard Wilkins. A Review of Sequential Analysis. IDA Document NS D-20487. Alexandria, VA: Institute for Defense Analyses, 2020.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper_seq_review.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Circular Prediction Regions for Miss Distance Models under Heteroskedasticity</title>
      <link>https://research.testscience.org/post/2020-circular-prediction-regions-for-miss-distance-models-under-heteroskedasticity/</link>
      <pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2020-circular-prediction-regions-for-miss-distance-models-under-heteroskedasticity/</guid>
      <description>Circular prediction regions are used in ballistic testing to express the uncertainty in shot accuracy. We compare two modeling approaches for estimating circular prediction regions for the miss distance of a ballistic projectile. The miss distance response variable is bivariate normal and has a mean and variance that can change with one or more experimental factors. The first approach fits a heteroskedastic linear model using restricted maximum likelihood, and uses the Kenward-Roger statistic to estimate circular prediction regions.</description>
      <content:encoded><![CDATA[<p>Circular prediction regions are used in ballistic testing to express the uncertainty in shot accuracy. We compare two modeling approaches for estimating circular prediction regions for the miss distance of a ballistic projectile. The miss distance response variable is bivariate normal and has a mean and variance that can change with one or more experimental factors. The first approach fits a heteroskedastic linear model using restricted maximum likelihood, and uses the Kenward-Roger statistic to estimate circular prediction regions. The second approach fits the analogous Bayesian model with unrestricted likelihood modifications, and computes circular prediction regions by sampling from the posterior predictive distribution. The two approaches are applied to an example problem, and are compared using simulation.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Johnson, Thomas H., John T. Haman, Heather Wojton, and Laura Freeman. “Circular Prediction Regions for Miss Distance Models under Heteroskedasticity.” Quality and Reliability Engineering International 37, no. 7 (November 2021): 2991–3003. <a href="https://doi.org/10.1002/qre.2771">https://doi.org/10.1002/qre.2771</a>.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="poster">Poster:</h4>
<embed src= "poster.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Bayesian Component Reliability: An F-35 Case Study</title>
      <link>https://research.testscience.org/post/2019-bayesian-component-reliability-an-f-35-case-study/</link>
      <pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2019-bayesian-component-reliability-an-f-35-case-study/</guid>
      <description>A challenging aspect of a system reliability assessment is integrating multiple sources of information, such as component, subsystem, and full-system data, along with previous test data or subject matter expert (SME) opinion. A powerful feature of Bayesian analyses is the ability to combine these multiple sources of data and variability in an informed way to perform statistical inference. This feature is particularly valuable in assessing system reliability where testing is limited and only a small number of failures (or none at all) are observed.</description>
      <content:encoded><![CDATA[<p>A challenging aspect of a system reliability assessment is integrating multiple sources of information, such as component, subsystem, and full-system data, along with previous test data or subject matter expert (SME) opinion. A powerful feature of Bayesian analyses is the ability to combine these multiple sources of data and variability in an informed way to perform statistical inference. This feature is particularly valuable in assessing system reliability where testing is limited and only a small number of failures (or none at all) are observed.</p>
<p>The F-35 is DoD&rsquo;s largest program; approximately one-third of the operations and sustainment cost is attributed to the cost of spare parts and the removal, replacement, and repair of components. The failure rate of those components is the driving parameter for a significant portion of the sustainment cost, and yet for many of these components, available estimates of the failure rate are poor. For many programs, the contractor produces estimates of component failure rates based on engineering analysis and legacy systems with similar parts. While these estimates are useful, the actual removal rates provide a more accurate estimate of the removal and replacement rates the program will experience in future years.</p>
<p>In this document, we show how we applied a Bayesian analysis to combine the engineering reliability estimates with the actual failure data to estimate component reliability. Our analysis technique also allows us to overcome the problems of cases where few or no failures have been observed. We show that combining the engineering knowledge of reliability with the observed operational reliability results in both a more informed estimate of each individual component&rsquo;s reliability and a more informed estimate of overall F-35 maintenance costs.</p>
<p>The technique presented is broadly applicable to any program where multiple sources of reliability information need to be combined for the best estimation of component failure rates, and ultimately of sustainment costs.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Medlin, Rebecca M, and V. Bram Lillard. Bayesian Component Reliability Estimation: An F-35 Case Study. IDA Document NS D-10561. Alexandria, VA: Institute for Defense Analyses, 2019.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_D-10561-NS-Bayesian-Component-Reliability-Estimation---an-F-35-Case-Study.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Challenges and New Methods for Designing Reliability Experiments</title>
      <link>https://research.testscience.org/post/2019-challenges-and-new-methods-for-designing-reliability-experiments/</link>
      <pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2019-challenges-and-new-methods-for-designing-reliability-experiments/</guid>
      <description>Engineers use reliability experiments to determine the factors that drive product reliability, build robust products, and predict reliability under use conditions. This article uses recent testing of a Howitzer to illustrate the challenges in designing reliability experiments for complex, repairable systems. We leverage lessons learned from current research and propose methods for designing an experiment for a complex, repairable system.</description>
      <content:encoded><![CDATA[<p>Engineers use reliability experiments to determine the factors that drive product reliability, build robust products, and predict reliability under use conditions. This article uses recent testing of a Howitzer to illustrate the challenges in designing reliability experiments for complex, repairable systems. We leverage lessons learned from current research and propose methods for designing an experiment for a complex, repairable system.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Freeman, Laura J., Rebecca M. Medlin, and Thomas H. Johnson. “Challenges and New Methods for Designing Reliability Experiments.” Quality Engineering 31, no. 1 (January 2, 2019): 108–21. <a href="https://doi.org/10.1080/08982112.2018.1546394">https://doi.org/10.1080/08982112.2018.1546394</a>.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Handbook on Statistical Design &amp; Analysis Techniques for Modeling &amp; Simulation Validation</title>
      <link>https://research.testscience.org/post/2019-handbook-on-statistical-design-analysis-techniques-for-modeling-simulation-validation/</link>
      <pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2019-handbook-on-statistical-design-analysis-techniques-for-modeling-simulation-validation/</guid>
      <description>This handbook focuses on methods for data-driven validation to supplement the vast existing literature for Verification, Validation, and Accreditation (VV&amp;amp;A) and the emerging references on uncertainty quantification (UQ). The goal of this handbook is to aid the test and evaluation (T&amp;amp;E) community in developing test strategies that support model validation (both external validation and parametric analysis) and statistical UQ.</description>
      <content:encoded><![CDATA[<p>This handbook focuses on methods for data-driven validation to supplement the vast existing literature for Verification, Validation, and Accreditation (VV&amp;A) and the emerging references on uncertainty quantification (UQ). The goal of this handbook is to aid the test and evaluation (T&amp;E) community in developing test strategies that support model validation (both external validation and parametric analysis) and statistical UQ.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Wojton, Heather, Kelly M Avery, Laura J Freeman, Samuel H Parry, Gregory S Whittier, Thomas H Johnson, and Andrew C Flack. Handbook on Statistical Design &amp; Analysis Techniques for Modeling &amp; Simulation Validation. IDA Document NS D-10455. Alexandria, VA: Institute for Defense Analyses, 2019.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>M&amp;S Validation for the Joint Air-to-Ground Missile</title>
      <link>https://research.testscience.org/post/2019-m-s-validation-for-the-joint-air-to-ground-missile/</link>
      <pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2019-m-s-validation-for-the-joint-air-to-ground-missile/</guid>
      <description>An operational test is resource-limited and must therefore rely on both live test data and modeling and simulation (M&amp;amp;S) data to inform a full evaluation. For the Joint Air-to-Ground Missile (JAGM) system, we needed to create a test design that accomplished two goals: characterizing missile performance across the operational space and supporting rigorous validation of the M&amp;amp;S. Our key question is: which statistical techniques should be used to compare the M&amp;amp;S to the live data?</description>
      <content:encoded><![CDATA[<p>An operational test is resource-limited and must therefore rely on both live test data and modeling and simulation (M&amp;S) data to inform a full evaluation. For the Joint Air-to-Ground Missile (JAGM) system, we needed to create a test design that accomplished two goals: characterizing missile performance across the operational space and supporting rigorous validation of the M&amp;S. Our key question is: which statistical techniques should be used to compare the M&amp;S to the live data?</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Crabtree, Brent, Andrew Cseko, Joel Williamson, and Kelly Avery. M&amp;S Validation for the Joint Air-to-Ground Missile. Alexandria, VA: Institute for Defense Analyses, 2019.</p>
</blockquote>
<h4 id="poster">Poster:</h4>
<embed src= "poster.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Operational Testing of Systems with Autonomy</title>
      <link>https://research.testscience.org/post/2019-operational-testing-of-systems-with-autonomy/</link>
      <pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2019-operational-testing-of-systems-with-autonomy/</guid>
      <description>Systems with autonomy pose unique challenges for operational test. This document provides an executive-level overview of these issues and the proposed solutions and reforms. To be ready for the testing challenges of the next century, we will need to change the entire acquisition life cycle, starting as early as initial system conceptualization. This briefing was presented to the Director, Operational Test &amp;amp; Evaluation, along with his deputies and Chief Scientist.</description>
      <content:encoded><![CDATA[<p>Systems with autonomy pose unique challenges for operational test. This document provides an executive-level overview of these issues and the proposed solutions and reforms. To be ready for the testing challenges of the next century, we will need to change the entire acquisition life cycle, starting as early as initial system conceptualization. This briefing was presented to the Director, Operational Test &amp; Evaluation, along with his deputies and Chief Scientist.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Wojton, Heather M, Daniel Porter, Yevgeniya Pinelis, Chad Bieber, Michael McAnally, and Laura Freeman. Operational Testing of Systems with Autonomy. IDA Document NS D-9266. Alexandria, VA: Institute for Defense Analyses, 2019.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Sample Size Determination Methods Using Acceptance Sampling by Variables</title>
      <link>https://research.testscience.org/post/2019-sample-size-determination-methods-using-acceptance-sampling-by-variables/</link>
      <pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2019-sample-size-determination-methods-using-acceptance-sampling-by-variables/</guid>
      <description>Acceptance Sampling by Variables (ASbV) is a statistical testing technique used in Personal Protective Equipment programs to determine the quality of the equipment in First Article and Lot Acceptance Tests. This article intends to remedy the lack of existing references that discuss the similarities between ASbV and certain techniques used in different sub-disciplines within statistics. Understanding ASbV from a statistical perspective allows testers to create customized test plans, beyond what is available in MIL-STD-414.</description>
      <content:encoded><![CDATA[<p>Acceptance Sampling by Variables (ASbV) is a statistical testing technique used in Personal Protective Equipment programs to determine the quality of the equipment in First Article and Lot Acceptance Tests. This article intends to remedy the lack of existing references that discuss the similarities between ASbV and certain techniques used in different sub-disciplines within statistics. Understanding ASbV from a statistical perspective allows testers to create customized test plans, beyond what is available in MIL-STD-414.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Walzl, Kerry, Lindsey A Davis, Thomas H Johnson, and Heather M Wojton. Sample Size Determination Methods Using Acceptance Sampling by Variables. IDA Document NS D-10666. Alexandria, VA: Institute for Defense Analyses, 2019.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>The Effect of Extremes in Small Sample Size on Simple Mixed Models: A Comparison of Level-1 and Level-2 Size</title>
      <link>https://research.testscience.org/post/2019-the-effect-of-extremes-in-small-sample-size-on-simple-mixed-models-a-comparison-of-level-1-and-level-2-size/</link>
      <pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2019-the-effect-of-extremes-in-small-sample-size-on-simple-mixed-models-a-comparison-of-level-1-and-level-2-size/</guid>
      <description>We present a simulation study that examines how small sample sizes at both the observation and nesting levels of the model affect fixed-effect bias, Type I error, and power in a simple mixed-model analysis. Although adjustments are needed to control Type I error inflation, our findings indicate that, under certain conditions prevalent in applied research, mixed models can be used with smaller samples than previously recognized.</description>
      <content:encoded><![CDATA[<p>We present a simulation study that examines how small sample sizes at both the observation and nesting levels of the model affect fixed-effect bias, Type I error, and power in a simple mixed-model analysis. Although adjustments are needed to control Type I error inflation, our findings indicate that, under certain conditions prevalent in applied research, mixed models can be used with smaller samples than previously recognized.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Carter, Kristina A, Heather M Wojton, and Stephanie T Lane. “The Effect of Extremes in Small Sample Size on Simple Mixed Models: A Comparison of Level-1 and Level-2 Size.” The ITEA Journal of Test and Evaluation 40, no. 1 (2019): 16–29.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>The Purpose of Mixed-Effects Models in Test and Evaluation</title>
      <link>https://research.testscience.org/post/2019-the-purpose-of-mixed-effects-models-in-test-and-evaluation/</link>
      <pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2019-the-purpose-of-mixed-effects-models-in-test-and-evaluation/</guid>
      <description>Mixed-effects models are the standard technique for analyzing data with grouping structure. In defense testing, these models are useful because they allow us to account for correlations between observations, a feature common in many operational tests. In this article, we describe the advantages of modeling data from a mixed-effects perspective and discuss an R package, ciTools, that provides easy-to-use methods for presenting results from this type of model.</description>
      <content:encoded><![CDATA[<p>Mixed-effects models are the standard technique for analyzing data with grouping structure. In defense testing, these models are useful because they allow us to account for correlations between observations, a feature common in many operational tests. In this article, we describe the advantages of modeling data from a mixed-effects perspective and discuss an R package, ciTools, that provides easy-to-use methods for presenting results from this type of model.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Haman, John, Matthew Avery, and Heather Wojton. “The Purpose of Mixed-Effects Models in Test and Evaluation.” The ITEA Journal of Test and Evaluation 40, no. 4 (2019): 249–55.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Analysis of Split-Plot Reliability Experiments with Subsampling</title>
      <link>https://research.testscience.org/post/2018-analysis-of-split-plot-reliability-experiments-with-subsampling/</link>
      <pubDate>Mon, 01 Jan 2018 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2018-analysis-of-split-plot-reliability-experiments-with-subsampling/</guid>
      <description>Reliability experiments are important for determining which factors drive product reliability. The data collected in these experiments can be challenging to analyze. Often, the reliability or lifetime data collected follow distinctly nonnormal distributions and include censored observations. Additional challenges in the analysis arise when the experiment is executed with restrictions on randomization. The focus of this paper is on the proper analysis of reliability data collected from nonrandomized reliability experiments.</description>
      <content:encoded><![CDATA[<p>Reliability experiments are important for determining which factors drive product reliability. The data collected in these experiments can be challenging to analyze. Often, the reliability or lifetime data collected follow distinctly nonnormal distributions and include censored observations. Additional challenges in the analysis arise when the experiment is executed with restrictions on randomization. The focus of this paper is on the proper analysis of reliability data collected from nonrandomized reliability experiments. Specifically, we focus on the analysis of lifetime data from a split-plot experimental design. We outline a nonlinear mixed-model analysis for a split-plot reliability experiment with subsampling and right-censored, Weibull-distributed lifetime data. A simulation study compares the proposed method with a two-stage method of analysis.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Medlin, Rebecca M., Laura J. Freeman, Jennifer L.K. Kensler, and G. Geoffrey Vining. “Analysis of Split-Plot Reliability Experiments with Subsampling.” Quality and Reliability Engineering International 35, no. 3 (2019): 738–49. <a href="https://doi.org/10.1002/qre.2394">https://doi.org/10.1002/qre.2394</a>.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "subsampling_paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Comparing M&amp;S Output to Live Test Data: A Missile System Case Study</title>
      <link>https://research.testscience.org/post/2018-comparing-m-s-output-to-live-test-data-a-missile-system-case-study/</link>
      <pubDate>Mon, 01 Jan 2018 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2018-comparing-m-s-output-to-live-test-data-a-missile-system-case-study/</guid>
      <description>In the operational testing of DoD weapons systems, modeling and simulation (M&amp;amp;S) is often used to supplement live test data in order to support a more complete and rigorous evaluation. Before the output of the M&amp;amp;S is included in reports to decision makers, it must first be thoroughly verified and validated to show that it adequately represents the real world for the purposes of the intended use. Part of the validation process should include a statistical comparison of live data to M&amp;amp;S output.</description>
      <content:encoded><![CDATA[<p>In the operational testing of DoD weapons systems, modeling and simulation (M&amp;S) is often used to supplement live test data in order to support a more complete and rigorous evaluation. Before the output of the M&amp;S is included in reports to decision makers, it must first be thoroughly verified and validated to show that it adequately represents the real world for the purposes of the intended use. Part of the validation process should include a statistical comparison of live data to M&amp;S output. This presentation includes an example of one such validation analysis for a tactical missile system. In this case, the goal is to validate a lethality model that predicts the likelihood of destroying a particular enemy target. Using design of experiments, along with basic analysis techniques such as the Kolmogorov-Smirnov test and Poisson regression, we can explore differences between the M&amp;S and live data across multiple operational conditions and quantify the associated uncertainties.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Thomas, Dean, and Kelly M Avery. Comparing M&amp;S Output to Live Test Data: A Missile System Case Study. IDA Non-Standard Document NS D-9002. Alexandria, VA: Institute for Defense Analyses, 2018.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Improved Surface Gunnery Analysis with Continuous Data</title>
      <link>https://research.testscience.org/post/2018-improved-surface-gunnery-analysis-with-continuous-data/</link>
      <pubDate>Mon, 01 Jan 2018 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2018-improved-surface-gunnery-analysis-with-continuous-data/</guid>
      <description>Recasting gunfire data from binomial (hit/miss) to continuous (time-to-kill) allows us to draw statistical conclusions with tactical implications from free-play, live-fire surface gunnery events. Our analysis provided the Navy with suggestions for improvements to its tactics and the employment of its weapons. A censored analysis enabled us to do so where other methods fell short.</description>
      <content:encoded><![CDATA[<p>Recasting gunfire data from binomial (hit/miss) to continuous (time-to-kill) allows us to draw statistical conclusions with tactical implications from free-play, live-fire surface gunnery events. Our analysis provided the Navy with suggestions for improvements to its tactics and the employment of its weapons. A censored analysis enabled us to do so where other methods fell short.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Ashwell, Benjamin A, V Bram Lillard, and George M Khoury. Improved Surface Gunnery Analysis with Continuous Data. IDA Document NS D-8990. Alexandria, VA: Institute for Defense Analyses, 2018.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_D-8990-NS.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Introduction to Observational Studies</title>
      <link>https://research.testscience.org/post/2018-introduction-to-observational-studies/</link>
      <pubDate>Mon, 01 Jan 2018 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2018-introduction-to-observational-studies/</guid>
      <description>A presentation on the theory and practice of observational studies. Methods covered for estimating average treatment effects include matching, difference-in-differences estimators, and instrumental variables.</description>
      <content:encoded><![CDATA[

    
    <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/csO17jA2cxI?autoplay=0&controls=1&end=0&loop=0&mute=0&start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
      ></iframe>
    </div>

<p>A presentation on the theory and practice of observational studies. Methods covered for estimating average treatment effects include matching, difference-in-differences estimators, and instrumental variables.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Thomas, Dean, and Yevgeniya K Pinelis. Introduction to Observational Studies. IDA Document NS D-9020. Alexandria, VA: Institute for Defense Analyses, 2018.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Parametric Reliability Models Tutorial</title>
      <link>https://research.testscience.org/post/2018-parametric-reliability-models-tutorial/</link>
      <pubDate>Mon, 01 Jan 2018 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2018-parametric-reliability-models-tutorial/</guid>
      <description>This tutorial demonstrates how to plot reliability functions parametrically in R using the output from any reliability modeling software. It provides code and sample plots of reliability and failure rate functions with confidence intervals for three skewed probability distributions: the exponential, the two-parameter Weibull, and the lognormal. These three distributions are the most common parametric models for reliability or survival analysis. This paper also provides mathematical background for the models and recommendations for when to use them.</description>
      <content:encoded><![CDATA[<p>This tutorial demonstrates how to plot reliability functions parametrically in R using the output from any reliability modeling software. It provides code and sample plots of reliability and failure rate functions with confidence intervals for three skewed probability distributions: the exponential, the two-parameter Weibull, and the lognormal. These three distributions are the most common parametric models for reliability or survival analysis. This paper also provides mathematical background for the models and recommendations for when to use them.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Pinelis, Yevgeniya K, and William R Whitledge. “Tutorial: Parametric Reliability Models.” Institute for Defense Analyses IDA Non-Standard Document NS D-9171 (September 2018).</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Scientific Test and Analysis Techniques</title>
      <link>https://research.testscience.org/post/2018-scientific-test-and-analysis-techniques/</link>
      <pubDate>Mon, 01 Jan 2018 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2018-scientific-test-and-analysis-techniques/</guid>
      <description>This document contains the technical content for the Scientific Test and Analysis Techniques (STAT) in Test and Evaluation (T&amp;amp;E) continuous learning module. The module provides a basic understanding of STAT in T&amp;amp;E. Topics covered include design of experiments, observational studies, survey design and analysis, and statistical analysis. It is designed as a four-hour online course, suitable for inclusion in the DAU T&amp;amp;E certification curriculum.</description>
      <content:encoded><![CDATA[<h3 id="abstract">Abstract</h3>
<p>This document contains the technical content for the Scientific Test and Analysis Techniques (STAT) in Test and Evaluation (T&amp;E) continuous learning module. The module provides a basic understanding of STAT in T&amp;E. Topics covered include design of experiments, observational studies, survey design and analysis, and statistical analysis. It is designed as a four-hour online course, suitable for inclusion in the DAU T&amp;E certification curriculum.</p>
<h3 id="slides-hahahugoshortcode75s0hbhb">Slides</h3>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Scientific Test and Analysis Techniques: Continuous Learning Module</title>
      <link>https://research.testscience.org/post/2018-scientific-test-and-analysis-techniques-continuous-learning-module/</link>
      <pubDate>Mon, 01 Jan 2018 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2018-scientific-test-and-analysis-techniques-continuous-learning-module/</guid>
      <description>This document contains the technical content for the Scientific Test and Analysis Techniques (STAT) in Test and Evaluation (T&amp;amp;E) continuous learning module. The module provides a basic understanding of STAT in T&amp;amp;E. Topics covered include design of experiments, observational studies, survey design and analysis, and statistical analysis. It is designed as a four-hour online course, suitable for inclusion in the DAU T&amp;amp;E certification curriculum.</description>
      <content:encoded><![CDATA[<p>This document contains the technical content for the Scientific Test and Analysis Techniques (STAT) in Test and Evaluation (T&amp;E) continuous learning module. The module provides a basic understanding of STAT in T&amp;E. Topics covered include design of experiments, observational studies, survey design and analysis, and statistical analysis. It is designed as a four-hour online course, suitable for inclusion in the DAU T&amp;E certification curriculum.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Pinelis, Yevgeniya, Laura J Freeman, Heather M Wojton, Denise J Edwards, Stephanie T Lane, and James R Simpson. Scientific Test and Analysis Techniques: Continuous Learning Module. IDA Document NS D-892. Alexandria, VA: Institute for Defense Analyses, 2018.</p>
</blockquote>
]]></content:encoded>
    </item>
    <item>
      <title>Testing Defense Systems</title>
      <link>https://research.testscience.org/post/2018-testing-defense-systems/</link>
      <pubDate>Mon, 01 Jan 2018 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2018-testing-defense-systems/</guid>
      <description>The complex, multifunctional nature of defense systems, along with the wide variety of system types, demands a structured but flexible analytical process for testing systems. This chapter summarizes commonly used techniques in defense system testing and specific challenges imposed by the nature of defense system testing. It highlights the core statistical methodologies that have proven useful in testing defense systems. Case studies illustrate the value of using statistical techniques in the design of tests and analysis of the resulting data.</description>
      <content:encoded><![CDATA[<p>The complex, multifunctional nature of defense systems, along with the wide variety of system types, demands a structured but flexible analytical process for testing systems. This chapter summarizes commonly used techniques in defense system testing and specific challenges imposed by the nature of defense system testing. It highlights the core statistical methodologies that have proven useful in testing defense systems. Case studies illustrate the value of using statistical techniques in the design of tests and analysis of the resulting data. The chapter focuses on the unique statistical challenges of designing operational tests, many of which can be attributed to the process, but some of which are inherent to the complexity of the systems and the missions system operators must complete. It provides an overview of the process of designing experiments for military systems with operational users in an operational environment.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Freeman, Laura J., Thomas Johnson, Matthew Avery, V. Bram Lillard, and Justace Clutter. “Testing Defense Systems.” In Analytic Methods in Systems and Software Testing, 439–87. John Wiley &amp; Sons, Ltd, 2018. <a href="https://doi.org/10.1002/9781119357056.ch18">https://doi.org/10.1002/9781119357056.ch18</a>.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Comparing Live Missile Fire and Simulation</title>
      <link>https://research.testscience.org/post/2017-comparing-live-missile-fire-and-simulation/</link>
      <pubDate>Sun, 01 Jan 2017 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2017-comparing-live-missile-fire-and-simulation/</guid>
      <description>Modeling and simulation is frequently used in Test and Evaluation (T&amp;amp;E) of air-to-air weapon systems to evaluate weapon effectiveness. The Air Intercept Missile-9X (AIM-9X) program uses modeling and simulation extensively to evaluate missile miss distances. Because flight testing is expensive, the test program uses relatively few flight tests and supplements those data with large numbers of miss distances from simulated tests across the weapon&amp;rsquo;s operational space. However, before modeling and simulation can be used to predict performance, it must first be validated.</description>
      <content:encoded><![CDATA[<p>Modeling and simulation is frequently used in Test and Evaluation (T&amp;E) of air-to-air weapon systems to evaluate weapon effectiveness. The Air Intercept Missile-9X (AIM-9X) program uses modeling and simulation extensively to evaluate missile miss distances. Because flight testing is expensive, the test program uses relatively few flight tests and supplements those data with large numbers of miss distances from simulated tests across the weapon&rsquo;s operational space. However, before modeling and simulation can be used to predict performance, it must first be validated. Validation is especially challenging when only a limited amount of live test data is available. In this presentation, we show that even with a limited number of live test points (e.g., 16 missile fires), we can still perform a statistical analysis for the validation. We introduce a validation technique known as Fisher&rsquo;s Combined Probability Test and show how to apply Fisher&rsquo;s test to validate the AIM-9X model and simulation.</p>
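<p>The presentation&rsquo;s exact procedure is not reproduced here, but the core of Fisher&rsquo;s method is short enough to sketch. Given k independent p-values (assumed here to be one per live missile fire, each measuring how unusual the live miss distance is relative to the simulated distribution for that shot&rsquo;s conditions), the statistic X = -2 &sum; ln(p<sub>i</sub>) follows a chi-square distribution with 2k degrees of freedom when the simulation is consistent with the live data. How the per-shot p-values are computed is an assumption of this sketch.</p>

```python
import math


def fisher_combined(p_values):
    """Fisher's combined probability test.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p_i).
    Under the joint null hypothesis the statistic is chi-square with 2k df.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom (2k) the chi-square upper tail has a closed
    # form: P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = stat / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return stat, math.exp(-half) * total
```

<p>For example, <code>fisher_combined([0.42, 0.15, 0.77, 0.30])</code> (hypothetical p-values) returns the statistic and a combined p-value; a small combined p-value would suggest that live and simulated miss distances disagree somewhere in the operational space. With a single p-value the function returns that p-value unchanged, a quick sanity check on the closed-form tail probability.</p>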
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Medlin, Rebecca, Pamela Rambow, and Douglas Peek. Comparing Live Missile Fire and Simulation. IDA Document NS D-8443. Alexandria, VA: Institute for Defense Analyses, 2017.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_D-8443.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>On Scoping a Test that Addresses the Wrong Objective</title>
      <link>https://research.testscience.org/post/2017-on-scoping-a-test-that-addresses-the-wrong-objective/</link>
      <pubDate>Sun, 01 Jan 2017 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2017-on-scoping-a-test-that-addresses-the-wrong-objective/</guid>
      <description>Statistical literature refers to a type of error that is committed by giving the right answer to the wrong question. If a test design is adequately scoped to address an irrelevant objective, one could say that a Type III error occurs. In this paper, we focus on a specific Type III error that test planners sometimes commit in order to reduce test size and resources.</description>
      <content:encoded><![CDATA[<p>Statistical literature refers to a type of error that is committed by giving the right answer to the wrong question. If a test design is adequately scoped to address an irrelevant objective, one could say that a Type III error occurs. In this paper, we focus on a specific Type III error that test planners sometimes commit in order to reduce test size and resources.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Johnson, Thomas H., Rebecca M. Medlin, Laura J. Freeman, and James R. Simpson. “On Scoping a Test That Addresses the Wrong Objective.” Quality Engineering 31, no. 2 (April 3, 2019): 230–39. <a href="https://doi.org/10.1080/08982112.2018.1479035">https://doi.org/10.1080/08982112.2018.1479035</a>.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Bayesian Reliability: Combining Information</title>
      <link>https://research.testscience.org/post/2016-bayesian-reliability-combining-information/</link>
      <pubDate>Fri, 01 Jan 2016 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2016-bayesian-reliability-combining-information/</guid>
      <description>One of the most powerful features of Bayesian analyses is the ability to combine multiple sources of information in a principled way to perform inference. This feature can be particularly valuable in assessing the reliability of systems where testing is limited. At their most basic, Bayesian methods for reliability develop informative prior distributions using expert judgment or similar systems. Appropriate models allow the incorporation of many other sources of information, including historical data, information from similar systems, and computer models.</description>
      <content:encoded><![CDATA[<p>One of the most powerful features of Bayesian analyses is the ability to combine multiple sources of information in a principled way to perform inference. This feature can be particularly valuable in assessing the reliability of systems where testing is limited. At their most basic, Bayesian methods for reliability develop informative prior distributions using expert judgment or similar systems. Appropriate models allow the incorporation of many other sources of information, including historical data, information from similar systems, and computer models. We introduce the Bayesian approach to reliability using several examples and point to open problems and areas for future work.</p>
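<p>A minimal illustration of combining information, assuming the simplest possible setting (a pass/fail reliability with a conjugate beta prior) rather than any model from the paper: prior knowledge from expert judgment or a similar system is encoded as Beta(a, b) pseudo-counts, and observing s successes in n new trials gives a Beta(a + s, b + n - s) posterior. The prior and data values are hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    """Beta(a, b) distribution for a success probability (reliability)."""
    a: float
    b: float

    def update(self, successes: int, trials: int) -> "BetaPrior":
        # Conjugate update: successes add to a, failures add to b.
        if not 0 <= successes <= trials:
            raise ValueError("need 0 <= successes <= trials")
        return BetaPrior(self.a + successes, self.b + (trials - successes))

    def mean(self) -> float:
        return self.a / (self.a + self.b)


# Hypothetical prior worth about 10 trials from a similar system (8 of 10 passed).
prior = BetaPrior(a=8.0, b=2.0)
# Hypothetical new test data: 17 successes in 20 trials.
posterior = prior.update(successes=17, trials=20)
print(round(posterior.mean(), 3))  # prints 0.833
```

<p>The posterior mean, 25/30, falls between the prior mean (0.8) and the observed success fraction (0.85), with the prior contributing the equivalent of 10 trials. Shrinking a and b proportionally weakens the prior without changing its mean.</p>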
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Wilson, Alyson G., and Kassandra M. Fronczyk. “Bayesian Reliability: Combining Information.” Quality Engineering, August 26, 2016. <a href="https://doi.org/10.1080/08982112.2016.1211889">https://doi.org/10.1080/08982112.2016.1211889</a>.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Tutorial on Sensitivity Testing in Live Fire Test and Evaluation</title>
      <link>https://research.testscience.org/post/2016-tutorial-on-sensitivity-testing-in-live-fire-test-and-evaluation/</link>
      <pubDate>Fri, 01 Jan 2016 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2016-tutorial-on-sensitivity-testing-in-live-fire-test-and-evaluation/</guid>
      <description>A sensitivity experiment is a special type of experimental design that is used when the response variable is binary and the covariate is continuous. Armor protection and projectile lethality tests often use sensitivity experiments to characterize a projectile&amp;rsquo;s probability of penetrating the armor. In this mini-tutorial we illustrate the challenge of modeling a binary response with a limited sample size, and show how sensitivity experiments can mitigate this problem. We review eight different single-covariate sensitivity experiments and present a comparison of these designs using simulation.</description>
      <content:encoded><![CDATA[<p>A sensitivity experiment is a special type of experimental design that is used when the response variable is binary and the covariate is continuous. Armor protection and projectile lethality tests often use sensitivity experiments to characterize a projectile&rsquo;s probability of penetrating the armor. In this mini-tutorial we illustrate the challenge of modeling a binary response with a limited sample size, and show how sensitivity experiments can mitigate this problem. We review eight different single-covariate sensitivity experiments and present a comparison of these designs using simulation. Additionally, we cover sensitivity experiments for cases that include more than one covariate, and highlight recent research in this area.</p>
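<p>As one concrete example of a single-covariate sensitivity design, here is a sketch of the classic up-and-down (Bruceton) procedure with striking velocity as the covariate: after a penetration the next shot is fired one step slower, and after a non-penetration one step faster, so shots cluster near the V50 (the velocity with a 50% penetration probability). The probit response curve, its parameters (mean 1000, spread 50), the starting velocity, and the step size are all hypothetical; the tutorial&rsquo;s own implementations and comparisons are not reproduced here.</p>

```python
import math
import random


def p_penetrate(velocity, mu=1000.0, sigma=50.0):
    """Assumed probit response curve: P(penetration) = Phi((v - mu) / sigma)."""
    return 0.5 * (1.0 + math.erf((velocity - mu) / (sigma * math.sqrt(2.0))))


def up_and_down(start, step, n_shots, rng):
    """Simulate an up-and-down sequence; returns a list of (velocity, penetrated)."""
    shots = []
    v = start
    for _ in range(n_shots):
        penetrated = rng.random() < p_penetrate(v)
        shots.append((v, penetrated))
        # Step down after a penetration, up after a non-penetration.
        v = v - step if penetrated else v + step
    return shots


rng = random.Random(1)
shots = up_and_down(start=900.0, step=25.0, n_shots=20, rng=rng)
# Rough V50 summary: the mean velocity of all shots in the sequence.
v50_hat = sum(v for v, _ in shots) / len(shots)
print(round(v50_hat, 1))
```

<p>The mean shot velocity is only a rough V50 summary; the Dixon-Mood estimator, or a probit or logit fit to the (velocity, outcome) pairs, is the more principled analysis, and the model fit also estimates the slope of the response curve.</p>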
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Johnson, Thomas, Laura Freeman, and Raymond Chen. Tutorial on Sensitivity Testing in Live Fire Test and Evaluation. IDA Document NS D-5829. Alexandria, VA: Institute for Defense Analyses, 2016.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_NS-D-5829.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Estimating System Reliability from Heterogeneous Data</title>
      <link>https://research.testscience.org/post/2015-estimating-system-reliability-from-heterogeneous-data/</link>
      <pubDate>Thu, 01 Jan 2015 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2015-estimating-system-reliability-from-heterogeneous-data/</guid>
      <description>This briefing illustrates some of the nuanced issues in reliability estimation in operational testing. The statistical models are motivated by the Paladin Integrated Management (PIM) program. We demonstrate how to use a Bayesian approach to reliability estimation that uses data from all phases of testing.
Suggested Citation Browning, Caleb, Laura Freeman, Alyson Wilson, Kassandra Fronczyk, and Rebecca Dickinson. “Estimating System Reliability from Heterogeneous Data.” Presented at the Conference on Applied Statistics in Defense, George Mason University, October 2015.</description>
      <content:encoded><![CDATA[<p>This briefing illustrates some of the nuanced issues in reliability estimation in operational testing. The statistical models are motivated by the Paladin Integrated Management (PIM) program. We demonstrate how to use a Bayesian approach to reliability estimation that uses data from all phases of testing.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Browning, Caleb, Laura Freeman, Alyson Wilson, Kassandra Fronczyk, and Rebecca Dickinson. “Estimating System Reliability from Heterogeneous Data.” Presented at the Conference on Applied Statistics in Defense, George Mason University, October 2015.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Improving Reliability Estimates with Bayesian Statistics</title>
      <link>https://research.testscience.org/post/2015-improving-reliability-estimates-with-bayesian-statistics/</link>
      <pubDate>Thu, 01 Jan 2015 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2015-improving-reliability-estimates-with-bayesian-statistics/</guid>
      <description>This paper shows how Bayesian methods are ideally suited to assessing the reliability of complex systems. Several examples illustrate the methodology.
Suggested Citation Freeman, Laura J., and Kassandra Fronczyk. “Improving Reliability Estimates with Bayesian Statistics.” ITEA Journal of Test and Evaluation 37, no. 4 (June 2015).</description>
      <content:encoded><![CDATA[<p>This paper shows how Bayesian methods are ideally suited to assessing the reliability of complex systems. Several examples illustrate the methodology.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Freeman, Laura J., and Kassandra Fronczyk. “Improving Reliability Estimates with Bayesian Statistics.” ITEA Journal of Test and Evaluation 37, no. 4 (June 2015).</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Statistical Models for Combining Information: Stryker Reliability Case Study</title>
      <link>https://research.testscience.org/post/2015-statistical-models-for-combining-information-stryker-reliability-case-study/</link>
      <pubDate>Thu, 01 Jan 2015 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2015-statistical-models-for-combining-information-stryker-reliability-case-study/</guid>
      <description>Reliability is an essential element in assessing the operational suitability of Department of Defense weapon systems. Reliability takes a prominent role in both the design and analysis of operational tests. In the current era of reduced budgets and increased reliability requirements, it is challenging to verify reliability requirements in a single test. Furthermore, all available data should be considered in order to ensure evaluations provide the most appropriate analysis of the system’s reliability.</description>
      <content:encoded><![CDATA[<p>Reliability is an essential element in assessing the operational suitability of Department of Defense weapon systems. Reliability takes a prominent role in both the design and analysis of operational tests. In the current era of reduced budgets and increased reliability requirements, it is challenging to verify reliability requirements in a single test. Furthermore, all available data should be considered in order to ensure evaluations provide the most appropriate analysis of the system’s reliability. This paper describes the benefits of using parametric statistical models to combine information across multiple testing events. Both frequentist and Bayesian inference techniques are employed, and we compare and contrast them to illustrate different statistical methods for combining information. We apply these methods to data collected during the developmental and operational test phases for the Stryker family of vehicles. We show that, when we combine the available information across two test phases for the Stryker family of vehicles, reliability estimates are more accurate and precise than those reported previously using traditional methods that use only operational test data in their reliability assessments.</p>
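<p>The idea of combining test phases can be sketched with a deliberately simple model, which is our assumption and not the paper&rsquo;s: failures occur at a constant rate, so failure counts over operating hours give a conjugate gamma posterior for the rate. Developmental test (DT) data are folded in with a weight between 0 (ignore DT) and 1 (pool DT with operational test (OT) data as if they came from one test), in the style of a power prior. All hours, counts, and the weight of 0.5 are hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaRate:
    """Gamma(shape, rate) distribution for a failure rate (failures per hour)."""
    shape: float
    rate: float

    def update(self, failures: float, hours: float, weight: float = 1.0) -> "GammaRate":
        # Poisson likelihood for `failures` in `hours`, raised to `weight`
        # (a power-prior style discount when weight is below 1).
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must be between 0 and 1")
        return GammaRate(self.shape + weight * failures, self.rate + weight * hours)

    def mean_rate(self) -> float:
        return self.shape / self.rate


# Weakly informative starting prior (hypothetical).
prior = GammaRate(shape=0.5, rate=1.0)
# Hypothetical DT data: 12 failures in 3000 hours, discounted by half.
after_dt = prior.update(failures=12, hours=3000, weight=0.5)
# Hypothetical OT data: 5 failures in 1000 hours, used at full weight.
posterior = after_dt.update(failures=5, hours=1000)
# Plug-in MTBF estimate: reciprocal of the posterior mean failure rate.
print(round(1.0 / posterior.mean_rate(), 1))
```

<p>Here the posterior is Gamma(11.5, 2501), so the plug-in MTBF estimate is 2501/11.5, about 217.5 hours. Setting <code>weight=0.0</code> for DT reduces to an OT-only analysis resting on just 5 failures; choosing the weight is a judgment about how representative DT conditions are of operational use.</p>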
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Steiner, Stefan, Rebecca M. Dickinson, Laura J. Freeman, Bruce A. Simpson, and Alyson G. Wilson. “Statistical Methods for Combining Information: Stryker Family of Vehicles Reliability Case Study.” Journal of Quality Technology 47, no. 4 (October 2015): 400–415. <a href="https://doi.org/10.1080/00224065.2015.11918142">https://doi.org/10.1080/00224065.2015.11918142</a>.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Applying Risk Analysis to Acceptance Testing of Combat Helmets</title>
      <link>https://research.testscience.org/post/2014-applying-risk-analysis-to-acceptance-testing-of-combat-helmets/</link>
      <pubDate>Wed, 01 Jan 2014 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2014-applying-risk-analysis-to-acceptance-testing-of-combat-helmets/</guid>
      <description>Acceptance testing of combat helmets presents multiple challenges that require statistically sound solutions. For example, how should first article and lot acceptance tests treat multiple threats and measures of performance? How should these tests account for multiple helmet sizes and environmental treatments? How closely should first article testing requirements match historical or characterization test data? What government and manufacturer risks are acceptable during lot acceptance testing? Similar challenges arise when testing other components of Personal Protective Equipment, and similar statistical approaches should be applied to all components.</description>
      <content:encoded><![CDATA[<p>Acceptance testing of combat helmets presents multiple challenges that require statistically sound solutions. For example, how should first article and lot acceptance tests treat multiple threats and measures of performance? How should these tests account for multiple helmet sizes and environmental treatments? How closely should first article testing requirements match historical or characterization test data? What government and manufacturer risks are acceptable during lot acceptance testing? Similar challenges arise when testing other components of Personal Protective Equipment, and similar statistical approaches should be applied to all components. This presentation explores these questions using operating characteristic curves and simulation studies.</p>
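<p>An operating characteristic (OC) curve plots the probability of accepting a lot against the lot&rsquo;s true failure rate. A minimal sketch for a generic single-sampling plan, assuming hypothetical values (test n = 20 helmets, accept the lot if at most c = 1 fails) and a binomial model for failures: one minus the acceptance probability at a good failure rate is the manufacturer&rsquo;s risk, and the acceptance probability at a poor failure rate is the government&rsquo;s risk.</p>

```python
from math import comb


def prob_accept(p_fail: float, n: int, c: int) -> float:
    """P(accept) = P(at most c failures in n trials) under a binomial model."""
    return sum(comb(n, k) * p_fail**k * (1 - p_fail) ** (n - k) for k in range(c + 1))


def oc_curve(n: int, c: int, failure_rates):
    """List of (true failure rate, acceptance probability) pairs."""
    return [(p, prob_accept(p, n, c)) for p in failure_rates]


# Hypothetical plan: test 20 items, accept the lot if no more than 1 fails.
n, c = 20, 1
for p, pa in oc_curve(n, c, [0.01, 0.05, 0.10, 0.20]):
    print(f"p = {p:.2f}  P(accept) = {pa:.3f}")

manufacturer_risk = 1 - prob_accept(0.01, n, c)  # rejecting a good lot (hypothetical 1%)
government_risk = prob_accept(0.20, n, c)        # accepting a bad lot (hypothetical 20%)
```

<p>Increasing n with c fixed shifts the curve toward lower failure rates, lowering the government&rsquo;s risk at the cost of raising the manufacturer&rsquo;s; choosing n and c is the trade-off between the two risks that the questions above raise.</p>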
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Hester, Janice, and Laura Freeman. Applying Risk Analysis to Acceptance Testing of Combat Helmets. IDA Document NS D-5334. Alexandria, VA: Institute for Defense Analyses, 2014.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_NS-D-5334.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Power Analysis Tutorial for Experimental Design Software</title>
      <link>https://research.testscience.org/post/2014-power-analysis-tutorial-for-experimental-design-software/</link>
      <pubDate>Wed, 01 Jan 2014 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2014-power-analysis-tutorial-for-experimental-design-software/</guid>
      <description>This guide provides both a general explanation of power analysis and specific guidance on carrying out power analysis in two software packages, JMP and Design Expert (DX).
Suggested Citation Freeman, Laura J., Thomas H. Johnson, and James R. Simpson. “Power Analysis Tutorial for Experimental Design Software.” Fort Belvoir, VA: Defense Technical Information Center, November 1, 2014. https://doi.org/10.21236/ADA619843.</description>
      <content:encoded><![CDATA[<p>This guide provides both a general explanation of power analysis and specific guidance on carrying out power analysis in two software packages, JMP and Design Expert (DX).</p>
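<p>To give a sense of what such software computes, here is a normal-approximation sketch of power for detecting one two-level factor effect in a balanced design with N runs, with the factor coded -1/+1. The effect size is the difference between the mean responses at the high and low levels divided by the run-to-run standard deviation (a signal-to-noise ratio). Exact calculations generally use the noncentral t or F distribution and typically give somewhat lower power for small N, so treat this as an approximation, not a reproduction of either package.</p>

```python
from math import sqrt
from statistics import NormalDist


def power_two_level_effect(snr: float, n_runs: int, alpha: float = 0.05) -> float:
    """Approximate two-sided power for one two-level factor effect.

    Normal approximation (sigma treated as known). snr is the difference
    between the mean responses at the high and low levels divided by sigma.
    With the factor coded -1/+1 in a balanced design of n_runs runs, the
    coefficient equals snr * sigma / 2 and its standard error is sigma / sqrt(n_runs).
    """
    std_normal = NormalDist()
    noncentrality = (snr / 2.0) * sqrt(n_runs)
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Reject when |coefficient / SE| exceeds z_crit; add both rejection tails.
    return std_normal.cdf(noncentrality - z_crit) + std_normal.cdf(-noncentrality - z_crit)


for n in (8, 16, 32):
    print(n, round(power_two_level_effect(snr=1.0, n_runs=n), 3))
```

<p>Doubling the number of runs multiplies the noncentrality by &radic;2, which is why power gains diminish as a test grows. The signal-to-noise ratio and &alpha; used here are hypothetical planning values; in practice they come from the test&rsquo;s requirements.</p>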
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Freeman, Laura J., Thomas H. Johnson, and James R. Simpson. “Power Analysis Tutorial for Experimental Design Software.” Fort Belvoir, VA: Defense Technical Information Center, November 1, 2014. <a href="https://doi.org/10.21236/ADA619843">https://doi.org/10.21236/ADA619843</a>.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Censored Data Analysis: A Statistical Tool for Efficient and Information-Rich Testing</title>
      <link>https://research.testscience.org/post/2013-censored-data-analysis-a-statistical-tool-for-efficient-and-information-rich-testing/</link>
      <pubDate>Tue, 01 Jan 2013 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2013-censored-data-analysis-a-statistical-tool-for-efficient-and-information-rich-testing/</guid>
      <description>Binomial metrics like probability-to-detect or probability-to-hit typically provide operationally meaningful and easy-to-interpret test outcomes. However, they are information-poor metrics and extremely expensive to test. The standard power calculations to size a test employ hypothesis tests, which typically result in many tens to hundreds of runs. Besides being expensive, such a test is most likely inadequate for characterizing performance over a variety of conditions due to the inherently large statistical uncertainties associated with binomial metrics.</description>
      <content:encoded><![CDATA[<p>Binomial metrics like probability-to-detect or probability-to-hit typically provide operationally meaningful and easy-to-interpret test outcomes. However, they are information-poor metrics and extremely expensive to test. The standard power calculations to size a test employ hypothesis tests, which typically result in many tens to hundreds of runs. Besides being expensive, such a test is most likely inadequate for characterizing performance over a variety of conditions due to the inherently large statistical uncertainties associated with binomial metrics. A solution is to convert to a continuous variable, such as miss distance or time-to-detect. The common objection to switching to a continuous variable is that the hit/miss or detect/non-detect binomial information is lost, when the fraction of misses/no-detects is often the most important aspect of characterizing system performance. Furthermore, the new continuous metric appears to no longer be connected to the requirements document, which was stated in terms of a probability. These difficulties can be overcome with the use of censored data analysis. This presentation illustrates the concepts and benefits of this approach and walks through a simple analysis with data, including power calculations that show the cost savings of employing the methodology.</p>
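<p>The core idea can be sketched with the simplest parametric model. Assume (our assumption; the presentation may use a different distribution) that time-to-detect is exponential with rate &lambda;, and that each run ends either with a detection at time t or with a non-detect at the end of the run window, recorded as right-censored at that time. The maximum-likelihood estimate is (number of detections) / (total observed time, censored runs included), and the requirement-level probability is recovered as P(detect within T) = 1 - exp(-&lambda;T). The run data below are made up.</p>

```python
import math


def exp_censored_mle(times, detected):
    """Maximum-likelihood exponential rate from right-censored data.

    times[i]: detection time, or the end of the run window for a non-detect.
    detected[i]: True if a detection was observed, False if the run was censored.
    The likelihood is the product of rate*exp(-rate*t) over detections and
    exp(-rate*t) over censored runs; it peaks at detections / total time.
    """
    if not times or len(times) != len(detected):
        raise ValueError("times and detected must be non-empty and equal length")
    n_detect = sum(1 for flag in detected if flag)
    if n_detect == 0:
        raise ValueError("no detections: the rate MLE is zero")
    return n_detect / sum(times)


def prob_detect_within(rate, t_req):
    """P(time-to-detect <= t_req) under the fitted exponential model."""
    return 1.0 - math.exp(-rate * t_req)


# Hypothetical runs: seconds to detect; non-detects censored at a 60 s window.
times = [12.0, 30.5, 60.0, 8.2, 45.1, 60.0, 22.7, 15.3]
detected = [True, True, False, True, True, False, True, True]
rate = exp_censored_mle(times, detected)
print(round(prob_detect_within(rate, 30.0), 3))
```

<p>Each non-detect still contributes its full 60-second window to the denominator, so the detect/non-detect information is retained rather than discarded, and the fitted model maps back to a probability-of-detection requirement. A lognormal or Weibull model with censoring is a common alternative when a constant detection rate is implausible.</p>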
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Lillard, V. Bram. Censored Data Analysis: A Statistical Tool for Efficient and Information-Rich Testing. IDA Document D-4912. Alexandria, VA: Institute for Defense Analyses, 2013.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_D-4912-non-std.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>An Expository Paper on Optimal Design</title>
      <link>https://research.testscience.org/post/2011-an-expository-paper-on-optimal-design/</link>
      <pubDate>Sat, 01 Jan 2011 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2011-an-expository-paper-on-optimal-design/</guid>
      <description>There are many situations where the requirements of a standard experimental design do not fit the research requirements of the problem. Three such situations occur when the problem involves unusual resource restrictions, when there are constraints on the design region, and when a non-standard model is expected to be needed to adequately explain the response.</description>
      <content:encoded><![CDATA[<p>There are many situations where the requirements of a standard experimental design do not fit the research requirements of the problem. Three such situations occur when the problem involves unusual resource restrictions, when there are constraints on the design region, and when a non-standard model is expected to be needed to adequately explain the response.</p>
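<p>To illustrate one of these situations, a constrained design region, here is a brute-force sketch of D-optimal design: among all sets of N candidate points, pick the set that maximizes det(X&prime;X) for the assumed model. The first-order model in two factors, the 3x3 candidate grid, the constraint x1 + x2 &le; 1, and N = 4 are all hypothetical choices.</p>

```python
from itertools import combinations


def model_row(x1, x2):
    # First-order model y = b0 + b1*x1 + b2*x2 (assumed for illustration).
    return (1.0, x1, x2)


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists (cofactor expansion)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def information_matrix(points):
    """X'X for the model rows of the given design points."""
    rows = [model_row(*pt) for pt in points]
    return [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]


# Candidate set: a 3x3 grid on [-1, 1] x [-1, 1], keeping only points with x1 + x2 <= 1.
levels = (-1.0, 0.0, 1.0)
candidates = [(a, b) for a in levels for b in levels if a + b <= 1.0]

n_runs = 4
best = max(combinations(candidates, n_runs),
           key=lambda pts: det3(information_matrix(pts)))
print(best, det3(information_matrix(best)))
```

<p>Without the constraint the four corners of the square would be chosen, with det(X&prime;X) = 64; excluding the corner (1, 1) forces a compromise. Exhaustive search is only feasible for tiny candidate sets, which is why practical software relies on exchange algorithms such as coordinate exchange.</p>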
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Johnson, Rachel T., Douglas C. Montgomery, and Bradley A. Jones. “An Expository Paper on Optimal Design.” Quality Engineering 23, no. 3 (July 2011): 287–301. <a href="https://doi.org/10.1080/08982112.2011.576203">https://doi.org/10.1080/08982112.2011.576203</a>.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Design for Reliability using Robust Parameter Design</title>
      <link>https://research.testscience.org/post/2011-design-for-reliability-using-robust-parameter-design/</link>
      <pubDate>Sat, 01 Jan 2011 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2011-design-for-reliability-using-robust-parameter-design/</guid>
      <description>Recently, the principles of Design of Experiments (DOE) have been implemented as a method of increasing the statistical rigor of operational tests. The focus has been on ensuring coverage of the operational envelope in terms of system effectiveness. DOE is applicable in reliability analysis as well. A reliability standard, ANSI-0009, advocates the use of Design for Reliability (DfR) early in the product development cycle in order to design reliability into the product. Robust parameter design (RPD), first used by Taguchi and later by the response surface community, provides insights on how DOE can be used to make products and processes invariant to changes in factors.</description>
      <content:encoded><![CDATA[<p>Recently, the principles of Design of Experiments (DOE) have been implemented as a method of increasing the statistical rigor of operational tests. The focus has been on ensuring coverage of the operational envelope in terms of system effectiveness. DOE is applicable in reliability analysis as well. A reliability standard, ANSI-0009, advocates the use of Design for Reliability (DfR) early in the product development cycle in order to design reliability into the product. Robust parameter design (RPD), first used by Taguchi and later by the response surface community, provides insights on how DOE can be used to make products and processes invariant to changes in factors. Using the principles of RPD, I propose a new application of RPD to DfR.</p>
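<p>In Taguchi-style robust parameter design, each setting of the controllable factors is run across several levels of the noise factors, and a signal-to-noise ratio summarizes both the level and the consistency of the response. The sketch below uses the larger-the-better ratio, SN = -10 log<sub>10</sub>(mean(1/y<sup>2</sup>)), which suits a lifetime-type response such as hours to failure. The control settings, noise conditions, and data are hypothetical, and this is not the specific DfR application the paper proposes.</p>

```python
import math


def sn_larger_the_better(ys):
    """Taguchi larger-the-better signal-to-noise ratio, in dB."""
    if not ys or any(y <= 0 for y in ys):
        raise ValueError("responses must be positive")
    return -10.0 * math.log10(sum(1.0 / (y * y) for y in ys) / len(ys))


# Hypothetical crossed array: each control setting is run under three noise
# conditions (e.g., cold, ambient, hot); responses are hours to failure.
runs = {
    "A-low, B-low": [410.0, 395.0, 120.0],
    "A-low, B-high": [350.0, 340.0, 330.0],
    "A-high, B-low": [520.0, 180.0, 150.0],
    "A-high, B-high": [300.0, 310.0, 295.0],
}
# Rank control settings from most to least robust.
ranked = sorted(runs, key=lambda k: sn_larger_the_better(runs[k]), reverse=True)
for setting in ranked:
    print(f"{setting:15s} SN = {sn_larger_the_better(runs[setting]):6.2f} dB")
```

<p>With these made-up numbers the setting with the most consistent lifetimes (A-low, B-high) ranks first, even though another setting reaches a longer lifetime under one noise condition; rewarding consistency across noise conditions is the essence of robust design.</p>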
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Freeman, Laura. Design for Reliability Using Robust Parameter Design. IDA Document D-4387. Alexandria, VA: Institute for Defense Analyses, 2011.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides_D-4387-non-std.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Hybrid Designs: Space Filling and Optimal Experimental Designs for Use in Studying Computer Simulation Models</title>
      <link>https://research.testscience.org/post/2011-hybrid-designs-space-filling-and-optimal-experimental-designs-for-use-in-studying-computer-simulation-models/</link>
      <pubDate>Sat, 01 Jan 2011 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2011-hybrid-designs-space-filling-and-optimal-experimental-designs-for-use-in-studying-computer-simulation-models/</guid>
      <description>This tutorial provides an overview of experimental design for modeling and simulation. Pros and cons of each design methodology are discussed.
Suggested Citation Silvestrini, Rachel Johnson. “Hybrid Designs: Space Filling and Optimal Experimental Designs for Use in Studying Computer Simulation Models.” Monterey, California, May 2011.</description>
      <content:encoded><![CDATA[<p>This tutorial provides an overview of experimental design for modeling and simulation. Pros and cons of each design methodology are discussed.</p>
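<p>Space-filling designs spread simulation runs throughout the input space rather than concentrating them at the boundary of the region. A common building block is the Latin hypercube: each of d inputs is divided into n equal strata and every stratum is sampled exactly once. A minimal random Latin hypercube on the unit cube is sketched below; the tutorial&rsquo;s hybrid designs are not reproduced here, and the sizes and seed are arbitrary.</p>

```python
import random


def latin_hypercube(n: int, d: int, rng: random.Random):
    """Random Latin hypercube sample of n points in the unit cube [0, 1)^d.

    Each column places exactly one point in each interval [k/n, (k+1)/n).
    """
    columns = []
    for _ in range(d):
        strata = list(range(n))
        rng.shuffle(strata)
        # Jitter uniformly within each assigned stratum.
        columns.append([(k + rng.random()) / n for k in strata])
    # Transpose the d columns into n points with d coordinates each.
    return [tuple(col[i] for col in columns) for i in range(n)]


design = latin_hypercube(n=10, d=3, rng=random.Random(42))
for point in design:
    print(" ".join(f"{x:.3f}" for x in point))
```

<p>Scaling a column to a factor range [lo, hi] is lo + (hi - lo) * x. Random Latin hypercubes can still leave clusters and gaps, so maximin or correlation-reducing variants are common refinements.</p>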
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Silvestrini, Rachel Johnson. “Hybrid Designs: Space Filling and Optimal Experimental Designs for Use in Studying Computer Simulation Models.” Monterey, California, May 2011.</p>
</blockquote>
<h4 id="slides">Slides:</h4>
<embed src= "slides.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
    <item>
      <title>Examining Improved Experimental Designs for Wind Tunnel Testing Using Monte Carlo Sampling Methods</title>
      <link>https://research.testscience.org/post/2010-examining-improved-experimental-designs-for-wind-tunnel-testing-using-monte-carlo-sampling-methods/</link>
      <pubDate>Fri, 01 Jan 2010 00:00:00 +0000</pubDate>
      <guid>https://research.testscience.org/post/2010-examining-improved-experimental-designs-for-wind-tunnel-testing-using-monte-carlo-sampling-methods/</guid>
      <description>In this paper we compare data from a fairly large legacy wind tunnel test campaign to smaller, statistically motivated experimental design strategies. The comparison, using Monte Carlo sampling methodology, suggests a tremendous opportunity to reduce wind tunnel test efforts without losing test information.</description>
      <content:encoded><![CDATA[<p>In this paper we compare data from a fairly large legacy wind tunnel test campaign to smaller, statistically motivated experimental design strategies. The comparison, using Monte Carlo sampling methodology, suggests a tremendous opportunity to reduce wind tunnel test efforts without losing test information.</p>
<h4 id="suggested-citation">Suggested Citation</h4>
<blockquote>
<p>Hill, Raymond R., Derek A. Leggio, Shay R. Capehart, and August G. Roesener. “Examining Improved Experimental Designs for Wind Tunnel Testing Using Monte Carlo Sampling Methods.” Quality and Reliability Engineering International 27, no. 6 (October 2011): 795–803. <a href="https://doi.org/10.1002/qre.1165">https://doi.org/10.1002/qre.1165</a>.</p>
</blockquote>
<h4 id="paper">Paper:</h4>
<embed src= "paper.pdf" width= "100%" height= "700px" type="application/pdf" >

]]></content:encoded>
    </item>
  </channel>
</rss>
