
2.3 Measurement System Analysis

Introduction to Measurement System Analysis

Measurement System Analysis (MSA) is the structured evaluation of how well a measurement process performs. It examines the entire chain from the object being measured to the recorded value, including the instrument, the procedure, the environment, and the people using the system. The goal is to understand and quantify how much of the observed variation in data comes from the measurement system itself versus actual variation in the process or product. Without a capable and stable measurement system, all subsequent analysis and decisions become unreliable. MSA applies to both variable (continuous) data and attribute (discrete, pass/fail, classification) data.

---

Fundamental Concepts in Measurement Systems

Key Definitions

- Measurement system: The complete process used to obtain measurements: instruments, standards, methods, operators, environment, and samples.
- True value: The theoretically correct value of the characteristic being measured. It is usually unknown and approximated by a high-precision reference or consensus standard.
- Observed value: The actual value obtained from the measurement system for a specific part or sample.
- Reference standard: A well-characterized part or artifact whose value is known with high accuracy, often traceable to national or international standards.

Sources of Measurement Variation

Measurement results can be influenced by several factors:

- Instrument: Resolution, wear, calibration state, design limitations.
- Operator: Skill, technique, interpretation, fatigue, bias.
- Method/procedure: Measurement steps, fixtures, setup, instructions.
- Environment: Temperature, humidity, vibration, lighting, cleanliness.
- Sample/part: Geometry, surface finish, variability within the part, positioning.

MSA aims to separate and quantify these influences to estimate how reliable the measurements are.
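The idea that part-to-part variation and measurement error combine into the observed variation can be illustrated with a small simulation. This is a hedged sketch, not from the source; all standard deviations and sample sizes are invented for demonstration.

```python
# Illustrative sketch: independent variance sources add.
# All numbers here are made-up assumptions for demonstration.
import random

random.seed(42)

PART_SD = 0.50         # assumed true part-to-part standard deviation
MEASUREMENT_SD = 0.20  # assumed measurement system standard deviation

# Each observed value = a true part value + independent measurement error.
observed = [random.gauss(10.0, PART_SD) + random.gauss(0.0, MEASUREMENT_SD)
            for _ in range(10_000)]

mean = sum(observed) / len(observed)
observed_var = sum((x - mean) ** 2 for x in observed) / (len(observed) - 1)

# Variances (not standard deviations) add for independent sources:
expected_var = PART_SD ** 2 + MEASUREMENT_SD ** 2
print(f"observed variance = {observed_var:.3f}, expected = {expected_var:.3f}")
```

The simulation shows why a noisy measurement system inflates the variation seen in the data even when the parts themselves have not changed.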
---

Key Performance Characteristics of a Measurement System

Accuracy and Bias

- Accuracy: The closeness of agreement between the observed measurements and the true or reference value.
- Bias: The systematic difference between the average of repeated measurements and the true (or reference) value.
  - Positive bias: measurements are consistently high.
  - Negative bias: measurements are consistently low.

Bias is usually evaluated by measuring a reference part multiple times and comparing the average measurement to its known reference value.

Linearity

- Linearity: The change in measurement bias over the operating range of the measurement system.

A system may be accurate near the center of its range but biased at the upper or lower ends. Linearity analysis evaluates bias at multiple points across the measurement range using parts or standards with known reference values.

Precision, Repeatability, and Reproducibility

- Precision: The closeness of agreement among repeated measurements under specified conditions. It reflects the spread of measurements, not their closeness to the true value.

Precision has two main components:

- Repeatability: Variation when the same operator measures the same part using the same instrument, under the same conditions, over a short period.
- Reproducibility: Variation when different operators measure the same part with the same instrument, using the same method.

Combined, these are often called Gage R&R (Gage Repeatability and Reproducibility).

Stability

- Stability: The ability of a measurement system to produce consistent measurements over time under constant conditions.

Instability appears as drift or sudden shifts in measurements even when the part, method, and environment are unchanged. Stability is typically evaluated using repeated measurements of a master or reference part over an extended period.
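The bias and repeatability definitions above can be sketched numerically. This is a minimal illustration with invented readings; the reference value and data are assumptions, not from the source.

```python
# Hedged sketch: bias and repeatability from repeated measurements of a
# single reference part. The readings below are invented example data.
REFERENCE_VALUE = 25.00  # assumed known reference value (mm)

readings = [25.03, 25.05, 25.02, 25.06, 25.04, 25.03, 25.05, 25.04]

n = len(readings)
average = sum(readings) / n
bias = average - REFERENCE_VALUE  # systematic offset from the reference

# Repeatability: spread of the repeated readings (sample standard deviation).
variance = sum((x - average) ** 2 for x in readings) / (n - 1)
repeatability_sd = variance ** 0.5

print(f"bias = {bias:+.3f} mm, repeatability sd = {repeatability_sd:.4f} mm")
```

Here the positive bias shows the system reads consistently high, while the small standard deviation shows the repeated readings cluster tightly: accuracy and precision are separate questions.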
Resolution and Discrimination

- Resolution (discrimination): The smallest change in the measured characteristic that the instrument can reliably detect and display.

A practical rule is that resolution should be small relative to the process variation or tolerance, so that the measurement system can distinguish meaningful differences between parts.

---

Variable Data MSA: Gage R&R Studies

Purpose and General Approach

For continuous data, Gage R&R studies estimate how much of the total observed variation comes from the measurement system. The study is typically designed to:

- Quantify repeatability and reproducibility.
- Compare measurement variation to:
  - Total process variation.
  - Engineering tolerance.

The results guide decisions on whether the measurement system is adequate for its intended use and what improvements may be needed.

Basic Study Design Elements

A standard Gage R&R study for variable data typically includes:

- Parts: A set of parts that represent the actual or expected process variation (a wide spread across the range).
- Operators: Multiple operators who normally perform the measurements.
- Replicates: Multiple repeated measurements of each part by each operator.
- Randomization: The order of measurements is randomized to avoid sequence-related bias.

A common design has:

- 10 parts
- 3 operators
- 2 or 3 replicates per operator-part combination

Values may be adjusted based on context and available resources, but the design must support separating part, operator, and measurement system effects.

ANOVA vs. Range-Based Methods

Two main approaches are used to analyze Gage R&R data:

- Range-based method (short method): Uses ranges of repeated measurements and averages to estimate repeatability and reproducibility. It is simpler but less flexible for unbalanced designs.
- ANOVA method: Uses analysis of variance to estimate components of variance directly:
  - Part-to-part variance
  - Repeatability variance
  - Reproducibility variance (including operator and part–operator interaction)

The ANOVA method is more general and better suited for complex or unbalanced data.

Components of Variation in Variable MSA

The key components are:

- Part-to-part variation: Variation due to actual differences between parts.
- Equipment variation (EV): Repeatability; variation when the same operator measures the same part repeatedly with the same instrument.
- Appraiser variation (AV): Reproducibility; variation due to differences between operators.
- Gage R&R variation: Combined effect of repeatability and reproducibility:
  - Gage R&R = EV + AV (in terms of variance components)
  - The Gage R&R standard deviation is the square root of the sum of the variance components.
- Total variation (TV): Overall observed variation in the dataset; TV combines part-to-part and Gage R&R variation (again via variance components).

Key Metrics for Variable MSA

Several metrics help interpret Gage R&R results:

- % Contribution: Proportion of total variance attributable to each source:
  - %Contribution (Gage R&R) = (Variance due to Gage R&R / Total variance) × 100
- % Study Variation: Ratio of the standard deviation for each component to the total process standard deviation:
  - %StudyVar (Gage R&R) = (Std dev of Gage R&R / Std dev of total) × 100
- % Tolerance: Portion of the engineering tolerance consumed by measurement system variation:
  - %Tolerance (Gage R&R) = (6 × Std dev of Gage R&R / Tolerance) × 100
  - (Using 6 standard deviations as an approximate full spread of normal variation.)
- Number of distinct categories (ndc): An estimate of the number of reliably distinguishable groups the measurement system can identify:
  - ndc = 1.41 × (Part-to-part standard deviation / Gage R&R standard deviation)
  - ndc is usually rounded down to the nearest whole number.
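The metrics above follow mechanically from the variance components. The sketch below applies the formulas to assumed (invented) variance components and tolerance; the numbers are illustrations, not results from a real study.

```python
# Hedged sketch of the Gage R&R summary metrics, applied to assumed
# (invented) variance components from a hypothetical study.
var_repeatability = 0.0025    # EV variance component (assumed)
var_reproducibility = 0.0011  # AV variance component (assumed)
var_part = 0.0380             # part-to-part variance component (assumed)
tolerance = 1.2               # engineering tolerance width (assumed)

# Variance components add; standard deviations do not.
var_grr = var_repeatability + var_reproducibility
var_total = var_grr + var_part

sd_grr = var_grr ** 0.5
sd_part = var_part ** 0.5
sd_total = var_total ** 0.5

pct_contribution = 100 * var_grr / var_total
pct_study_var = 100 * sd_grr / sd_total
pct_tolerance = 100 * 6 * sd_grr / tolerance
ndc = int(1.41 * sd_part / sd_grr)  # rounded down to a whole number

print(f"%Contribution = {pct_contribution:.1f}%")
print(f"%StudyVar     = {pct_study_var:.1f}%")
print(f"%Tolerance    = {pct_tolerance:.1f}%")
print(f"ndc           = {ndc}")
```

Note that %Contribution and %StudyVar differ because one is a ratio of variances and the other a ratio of standard deviations, so their percentages are not interchangeable.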
These metrics focus on how much measurement variation exists relative to the process variation and tolerance, and how many meaningful groups can be distinguished.

Interpreting Variable Gage R&R Results

Interpretation guidelines (commonly used in practice):

- %StudyVar or %Contribution for Gage R&R:
  - Low values: the measurement system contributes little to total variation.
  - High values: the measurement system is a major source of variation.
- %Tolerance:
  - Small %Tolerance: measurement variation is small relative to the specification limits.
  - Large %Tolerance: measurement variation consumes a large portion of the tolerance window.
- ndc:
  - A larger ndc suggests that the measurement system can differentiate multiple levels of the characteristic.
  - A very low ndc indicates the system is too noisy to distinguish groups reliably.

Thresholds and acceptance criteria vary by industry and application. Interpretation must consider:

- Criticality of the characteristic.
- Process capability.
- Decision risks (internal adjustments vs. customer acceptance decisions).

---

Variable MSA: Bias, Linearity, and Stability Studies

Bias Study

A bias study evaluates how far the measurement system's average reading is from a reference value. Basic steps:

- Select a reference part or standard with a known reference value.
- Obtain multiple measurements using the measurement system under standard conditions.
- Calculate:
  - The average measured value.
  - Bias = Average measured value − Reference value.

The magnitude and direction of bias are evaluated, typically using statistical tests or confidence intervals to determine whether the bias is statistically significant and practically important.

Linearity Study

A linearity study evaluates whether measurement bias is consistent across the entire measurement range. Basic approach:

- Choose multiple parts or standards spanning the operating range of the measurement system.
- For each part:
  - Record the known reference value.
  - Take multiple measurements and calculate the average measured value.
  - Compute the bias at that value (average measured − reference).
- Analyze bias vs. reference value, often using regression to:
  - Estimate the overall bias (intercept).
  - Estimate how bias changes over the range (slope).

A significant slope indicates that bias changes with the size of the measurement, revealing non-linearity in the system.

Stability Study

A stability study tracks measurement results over time to detect drift or sudden shifts. Basic design:

- Use a stable reference part or master sample.
- Measure it periodically over a defined time frame (e.g., daily, weekly).
- Plot the measurements on a control chart or time series.

Key interpretations:

- A stable pattern (random variation within expected limits) suggests adequate stability.
- Trends, cycles, or points outside limits indicate instability and potential issues such as:
  - Instrument wear or damage.
  - Calibration drift.
  - Environmental changes.

Stability problems must be addressed before relying on measurements for process decisions or capability assessments.

---

Attribute Data MSA

Nature of Attribute Measurement Systems

Attribute data are discrete outcomes such as:

- Pass/fail
- Accept/reject
- Defect present/absent
- Category classifications (e.g., grade levels, codes, defect types)

Attribute measurement systems are often less precise than variable systems and more dependent on human judgment. Errors can significantly bias defect rates, process capability estimates, and improvement conclusions.

Types of Error in Attribute Systems

Common error types include:

- False accept (Type II error): Defective or nonconforming units classified as acceptable.
- False reject (Type I error): Good or conforming units classified as defective.
- Misclassification: Assigning incorrect category codes or defect types.

Both operators and criteria (e.g., unclear definitions, inconsistent visual standards) contribute to attribute measurement error.
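The two attribute error types above can be quantified from a comparison of inspection decisions against known reference classifications. This is a hedged sketch; the counts are invented example data.

```python
# Hedged sketch: false accept / false reject rates from a 2x2 comparison
# of appraiser decisions against known reference classifications.
# The counts below are invented example data.
true_positive = 45   # defective units correctly rejected
false_negative = 5   # defective units wrongly accepted (false accept, Type II)
true_negative = 140  # good units correctly accepted
false_positive = 10  # good units wrongly rejected (false reject, Type I)

# Rate of passing defects downstream (consumer's risk):
false_accept_rate = false_negative / (false_negative + true_positive)
# Rate of scrapping or reworking good product (producer's risk):
false_reject_rate = false_positive / (false_positive + true_negative)

print(f"false accept rate = {false_accept_rate:.1%}")
print(f"false reject rate = {false_reject_rate:.1%}")
```

The two rates carry different business risks: false accepts send defects to the customer, while false rejects drive unnecessary scrap and rework, so both should be examined rather than a single overall agreement figure.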
Attribute Agreement Analysis

Attribute agreement analysis evaluates how consistently and correctly attribute decisions are made by one or more appraisers. Typical design elements:

- Sample set: A collection of items representing both good and defective conditions, and all relevant categories. Known reference classifications (standards) are highly desirable.
- Appraisers: Individuals who normally perform the inspection or classification.
- Replications: Multiple rounds of classification, often with randomized order and time separation to reduce memory effects.

The study evaluates:

- Within-appraiser agreement: Consistency of each appraiser across repeated evaluations of the same items.
- Between-appraiser agreement: Consistency among different appraisers on the same items.
- Appraisers vs. reference: Agreement between each appraiser's classifications and the known standard or reference decision.

Metrics for Attribute MSA

Key metrics often include:

- Percent agreement: Proportion of classifications that match the standard or each other:
  - Overall percent agreement.
  - Within-appraiser percent agreement.
  - Between-appraiser percent agreement.
- Sensitivity: Probability that the system correctly identifies defective units (true positive rate).
- Specificity: Probability that the system correctly identifies good units (true negative rate).
- Kappa statistics: Measures of agreement adjusted for agreement expected by chance:
  - Cohen's kappa for agreement between an appraiser and the standard, or between two appraisers.
  - Fleiss-type kappa for multiple appraisers.
  - Higher kappa values indicate better agreement beyond chance.

Interpretation must consider:

- The impact of misclassification on decisions.
- The balance between false accept and false reject risks.
- The consistency of criteria and training.

---

Designing Effective MSA Studies

Selecting Parts and Samples

For both variable and attribute MSA:

- Parts should represent the full range of expected variation.
- Include borderline cases near decision thresholds or specification limits.
- Avoid only "easy" or extreme cases that inflate agreement.

For variable Gage R&R:

- Use enough parts to capture part-to-part variation.
- Ensure parts are stable and will not change during the study (e.g., no significant wear or degradation).

For attribute agreement:

- Include:
  - Clearly good units.
  - Clearly bad units.
  - Borderline units.
- Ensure reference classifications are as accurate as possible, often using expert consensus or rigorous criteria.

Selecting Appraisers and Conditions

- Include appraisers who perform the task in real operations.
- Reflect the variety of skill levels and shifts when feasible.
- Keep instructions consistent and documented.
- Use normal operating conditions to ensure representativeness.

Any variation deliberately introduced in the study (e.g., different shifts or conditions) must be planned so that its effect on measurement performance can be understood.

Replications and Randomization

- Multiple replications:
  - Allow separation of repeatability and reproducibility.
  - Increase the reliability of estimates.
- Randomization:
  - Prevents systematic patterns and memory effects.
  - Reduces learning and order bias.

In attribute studies, sufficient time between replications helps prevent appraisers from remembering prior decisions.

---

Improving Measurement Systems

Investigating Poor MSA Results

When a measurement system is found inadequate, consider investigating:

- Procedures: Ambiguity, unnecessary complexity, missing steps.
- Training: Inconsistent understanding, lack of practice, misinterpretation of criteria.
- Instruments: Wear, inappropriate range, poor resolution, need for calibration or replacement.
- Environment: Temperature or humidity control, vibration, lighting, cleanliness.
- Sample handling: Positioning, fixturing, preparation, marking.

Analysis should focus on the specific component of variation that is high:

- High repeatability variation suggests instrument or method issues.
- High reproducibility variation suggests operator or training issues.
- High bias or non-linearity suggests calibration or design limitations.
- Poor stability suggests drift or uncontrolled environmental factors.

Actions to Enhance Measurement Capability

Possible improvement actions:

- Clarify and standardize measurement procedures.
- Provide focused training and practice with feedback.
- Upgrade or maintain instruments (repair, recalibrate, or replace).
- Improve fixtures and positioning aids to reduce operator influence.
- Enhance environmental controls.
- Refine attribute criteria with better visual aids or examples.
- Automate measurements where appropriate, while validating the new system via MSA.

After making changes, a follow-up MSA is needed to confirm improvement.

---

Relationship Between MSA and Process Analysis

Although Measurement System Analysis is focused on the measurement process rather than the production process, it directly affects the validity of:

- Process capability analysis.
- Control charts and process stability assessments.
- Hypothesis tests and confidence intervals.
- Regression and modeling efforts.
- Improvement and optimization decisions.

Measurement error inflates observed variation and can:

- Underestimate process capability.
- Mask real changes or improvements.
- Create false signals of process shifts.
- Distort estimates of relationships between variables.

Reliable MSA ensures that the data used in analysis reflect true process behavior as accurately as possible.

---

Summary

Measurement System Analysis provides a structured way to understand and quantify how much variation comes from the measurement process itself. It focuses on key characteristics:

- Accuracy and bias: How close measurements are to reference values.
- Linearity: How bias changes across the measurement range.
- Precision (repeatability and reproducibility): How consistent measurements are under repeated and varied conditions.
- Stability: How measurements behave over time.

For variable data, Gage R&R studies estimate measurement variation relative to total variation and tolerance, using metrics like %StudyVar, %Tolerance, and ndc. For attribute data, attribute agreement analysis evaluates agreement within and between appraisers, and against reference standards, using percent agreement and chance-corrected measures such as kappa.

Effective MSA requires thoughtful study design, representative samples, appropriate appraisers, and proper randomization and replication. When MSA results indicate problems, targeted improvements to instruments, procedures, training, and environment can enhance measurement capability.

By ensuring that measurement systems are accurate, precise, stable, and fit for purpose, MSA provides a sound foundation for all subsequent statistical analysis and decision-making.
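As a numerical illustration of the chance-corrected agreement measures mentioned above, the sketch below computes Cohen's kappa for two appraisers' pass/fail decisions. The decision lists are invented example data, and this hand-rolled calculation is a minimal sketch of the standard formula, not a substitute for dedicated MSA software.

```python
# Hedged sketch: Cohen's kappa for two appraisers' pass/fail decisions
# on the same items. "P" = pass, "F" = fail; the lists are invented data.
rater_a = ["P", "P", "F", "P", "F", "P", "P", "F", "P", "P"]
rater_b = ["P", "P", "F", "F", "F", "P", "P", "F", "P", "F"]

n = len(rater_a)
observed_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: for each category, the product of the two raters'
# marginal proportions, summed over categories.
categories = set(rater_a) | set(rater_b)
chance_agreement = sum(
    (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
)

# Kappa rescales observed agreement so that 0 = chance level, 1 = perfect.
kappa = (observed_agreement - chance_agreement) / (1 - chance_agreement)
print(f"observed = {observed_agreement:.2f}, kappa = {kappa:.3f}")
```

Note how the raw 80% agreement shrinks once chance agreement is removed: with only two categories, appraisers agree often even when guessing, which is exactly why kappa is preferred over raw percent agreement.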

Practical Case: Measurement System Analysis

A pharmaceutical plant fills 10 mL vials with a critical liquid. Operators complain that the automated filler "keeps drifting," triggering frequent line stoppages and rework for supposed underfills. Quality reviews recent batches: actual product complaints are low, but in-process checks show high variation and frequent out-of-spec readings. Production wants to recalibrate or replace the filler; maintenance argues the machine is stable.

The Lean Six Sigma team suspects the issue may lie in the measurement system rather than the filler itself. They decide to run a Measurement System Analysis on the in-line checkweigher used to verify vial fill volume. They select a small, representative sample of vials, blind-label them, and have three experienced operators each measure the same vials multiple times, in random order, over one shift, using the same checkweigher.

The analyst compiles the readings and runs a Gage R&R study. The results show that most of the observed variation comes from the measurement system, not the vials. Operator-to-operator inconsistency and poor equipment resolution are both significant. In particular, the same vial is frequently classified as both "in spec" and "out of spec" depending on who measures it and when.

The team retrains operators on measurement technique, standardizes the measurement procedure, and upgrades the checkweigher to improve resolution and stability. A follow-up Measurement System Analysis confirms acceptable measurement variation and clear, consistent pass/fail decisions.

After the improvement, line stoppages drop sharply, rework decreases, and no filler recalibration or major mechanical changes are needed. The plant recognizes that its prior "filler problem" was largely a measurement system issue revealed and resolved through Measurement System Analysis.

Practice Questions: Measurement System Analysis

An engineering team conducts a variable Gage R&R study (crossed design) with 3 operators, 10 parts, and 2 trials. The study shows: Total Gage R&R = 18% of total variation, Part-to-Part = 80%, and Operator-by-Part interaction = 2%. How should the measurement system be interpreted?

A. Acceptable; the measurement system is adequate for most process improvement applications
B. Unacceptable; Gage R&R must always be below 10% of total variation
C. Unacceptable; Operator-by-Part interaction makes the system unusable
D. Marginal; the system can only be used for tracking large shifts, not for process improvement

Answer: A

Reason: For variable data, a Total Gage R&R below 10% is ideal, 10–30% is conditionally acceptable, and above 30% is generally unacceptable. At 18%, with Part-to-Part dominating the variation and negligible interaction, the system is acceptable for most process improvement and control purposes. The other options are incorrect because Gage R&R does not always need to be below 10% (B), the small interaction does not invalidate the system (C), and a value of 18% is not considered merely marginal or limited to tracking large shifts (D).

---

In a measurement system analysis, which metric is most appropriate to evaluate the consistency of a single operator's repeated measurements on the same parts for a continuous characteristic?

A. Percent Tolerance consumed by Gage R&R
B. Within-Operator (Repeatability) standard deviation
C. Between-Operator (Reproducibility) variance component
D. Number of distinct categories (NDC)

Answer: B

Reason: Repeatability is the variation when the same operator measures the same part multiple times using the same instrument; it is quantified by the within-operator standard deviation or variance component.
The other options are not the best choice because %Tolerance is a system-level adequacy measure (A), reproducibility is variation between operators (C), and NDC reflects the overall system's discrimination ability, not specifically a single operator's repeatability (D).

---

A Black Belt is comparing two attribute inspection systems for the same defect type. Each system is evaluated against a known standard using a 2×2 confusion matrix. Which metric is most critical to assess the risk of accepting defective units as conforming?

A. Sensitivity (True Positive Rate)
B. Specificity (True Negative Rate)
C. False Negative Rate
D. False Positive Rate

Answer: C

Reason: The False Negative Rate (1 − Sensitivity) reflects the probability that a defective unit is incorrectly classified as conforming, directly representing the risk of passing defects downstream. The other options are not the best because sensitivity alone is the complement of the desired measure (A), specificity relates to correctly classifying good units (B), and the False Positive Rate reflects the risk of rejecting good units, not the risk of accepting bad ones (D).

---

A variable Gage R&R study for a critical dimension reports: Total Gage R&R = 35% of total variation, Part-to-Part = 65%, NDC = 3. The tolerance range is very tight relative to process variation. What is the most appropriate Black Belt recommendation?

A. Accept the gage as-is; focus solely on reducing process variation
B. Improve or replace the measurement system before using it for process capability assessment
C. Increase the sample size in the Gage R&R study to reduce the %Gage R&R value
D. Rely on operator training only, since repeatability and reproducibility are combined

Answer: B

Reason: A Total Gage R&R above 30% and an NDC below 5 indicate that the measurement system cannot adequately distinguish between parts; using it for capability analysis or fine process adjustments would be misleading. The system should be improved or replaced first.
The other options are incorrect because a poor measurement system cannot simply be ignored (A), a larger sample size will not fix intrinsic measurement variation (C), and training alone is unlikely to resolve both repeatability and reproducibility issues when the overall %R&R is this high (D).

---

During an MSA for an automated measuring device, the Black Belt observes a consistent difference of +0.12 mm from the reference standard across the entire measurement range, with very low dispersion around the mean. Which issue is primarily present, and what is the most appropriate action?

A. High repeatability error; conduct a full Gage R&R study
B. Linearity problem; check device accuracy at multiple points along the range
C. Bias error; perform calibration to adjust the measurement system
D. Stability problem; monitor the device over time with control charts

Answer: C

Reason: A consistent offset from the reference standard across the range indicates bias (systematic error). Calibration or adjustment is needed to remove or compensate for this bias, given that dispersion (precision) is acceptable. The other options are not the best because high repeatability error would show increased dispersion, not a constant offset (A); a linearity problem would manifest as bias that varies over the range, not a uniform shift (B); and stability issues concern changes over time, not a constant bias at a single point in time (D).
