2.3.3 Gage Repeatability & Reproducibility
Introduction

Gage Repeatability & Reproducibility (Gage R&R) is a Measurement System Analysis (MSA) method used to quantify how much of the observed variation in data comes from the measurement system itself rather than from the process or the parts being measured. A solid understanding of Gage R&R is essential whenever you base decisions on measured data, such as determining process capability, setting specifications, or evaluating improvements.

---

Measurement System Basics

What Is a Measurement System?

A measurement system includes:
- Gage or instrument – the device used to obtain the measurement.
- Operators or appraisers – the people using the gage.
- Procedures – methods, instructions, and conditions.
- Environment – temperature, humidity, vibration, lighting.
- Standards and references – master parts, calibration standards.

The goal is to have measurements that are:
- Accurate – close to the true value.
- Precise – consistent and low in variation.
- Stable – consistent over time.
- Linear – performance consistent across the measurement range.

Gage R&R focuses specifically on the precision (variation) portion of the measurement system.

---

Core Concepts of Gage R&R

Total Observed Variation

Whenever you measure a part, the observed variation can be decomposed as:
- Total Variation (TV) = Part-to-Part Variation + Measurement System Variation

Measurement system variation is further decomposed into:
- Repeatability (equipment variation)
- Reproducibility (appraiser variation)

Gage R&R quantifies both and expresses how much of the total variation is due to the measurement system.
Repeatability

Repeatability is the variation observed when:
- The same operator
- Uses the same gage
- To measure the same part
- Under the same conditions
- Over a short time period

High repeatability variation suggests issues with:
- Gage resolution or sensitivity
- Instrument wear, noise, or instability
- Poor fixturing or inconsistent part positioning

Reproducibility

Reproducibility is the variation in the averages of measurements made by different operators using the same gage on the same parts. It captures:
- Operator-to-operator differences
- Interpretation of the procedure
- Technique, training, and consistency

High reproducibility variation suggests:
- Inadequate or ambiguous procedures
- Insufficient training
- Operator bias or inconsistent technique

---

Types of Gage R&R Studies

Crossed vs Nested Designs

The design of the study depends on whether all operators can measure the same parts.
- Crossed Gage R&R
  - All operators measure all selected parts.
  - Typical for non-destructive measurements.
  - The standard and most common design.
- Nested Gage R&R
  - Each operator measures a different set of parts.
  - Used when parts are destroyed by measurement or cannot be remeasured.
  - Parts are nested within operators.

Most standard Gage R&R discussions assume a crossed design.

Variable vs Attribute Gage R&R

- Variable Gage R&R
  - Measurement data is continuous (length, weight, time, diameter).
  - Analyzed statistically via ANOVA or range methods.
  - The primary focus of IASSC Gage R&R knowledge.
- Attribute Gage R&R
  - Data is categorical (pass/fail, good/bad, defect type).
  - Assesses agreement and misclassification.
  - Uses different metrics (percent agreement, kappa); related to but distinct from variable Gage R&R.

This article focuses on variable Gage R&R, while recognizing attribute studies as a related but separate domain.

---

Planning a Gage R&R Study

Study Objectives

Plan the study to answer:
- Is the measurement system variation small enough compared to:
  - Part-to-part variation?
  - Tolerances or specification limits?
- Are there issues with:
  - Equipment (repeatability)?
  - Operators (reproducibility)?
  - Interaction between operators and parts?

Selecting Parts

Parts should:
- Represent the full range of process variation.
- Include low, nominal, and high values within specification and, if possible, some near the specification limits.

Guidelines:
- A typical study uses 10 parts.
- Parts should be stable during the study (no changes in properties).

Selecting Operators

Operators (appraisers) should:
- Be representative of the actual users of the gage.
- Be familiar with normal working conditions.
- Use the same documented procedure.

Guidelines:
- A typical study uses 3 operators.
- Each operator should be trained to follow the same instructions.

Number of Trials

Trials are the repeated measurements on the same part by the same operator.

Guidelines:
- Typical design: 10 parts × 3 operators × 3 trials.
- Trials should be done:
  - In randomized order.
  - Without operators seeing prior results.
  - Under similar environmental conditions.

Randomization and Blinding

To avoid bias:
- Randomize:
  - The order of parts.
  - The order of operators, where practical.
- Blind operators to:
  - True values (if known).
  - Their previous readings.
  - Which specific part they are measuring (if feasible).

---

Conducting the Gage R&R Study

Standard 10 × 3 × 3 Crossed Study

A standard crossed Gage R&R involves:
- 10 parts selected to cover the process range.
- 3 operators.
- Each operator measuring each part 3 times.

Total measurements = 10 × 3 × 3 = 90 readings.

Data Collection Considerations

Ensure consistency in:
- Procedure – the same steps followed for each measurement.
- Setup – the same fixtures, positioning, and reference surfaces.
- Environment – temperature, vibration, and lighting controlled if relevant.
- Warm-up and calibration – gages calibrated and stabilized before use.
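The randomized, blinded run order described above can be sketched in code. The function below is a hypothetical helper (not part of any standard library): it enumerates every operator–part–trial combination for a crossed study and shuffles the order in which readings are taken.

```python
import random

def grr_run_order(n_parts=10, n_operators=3, n_trials=3, seed=42):
    """Build a randomized run order for a crossed Gage R&R study.

    Each (operator, part, trial) combination appears exactly once;
    shuffling prevents operators from anticipating which part or
    trial comes next. Hypothetical helper for illustration only.
    """
    runs = [(op, part, trial)
            for op in range(1, n_operators + 1)
            for part in range(1, n_parts + 1)
            for trial in range(1, n_trials + 1)]
    rng = random.Random(seed)  # fixed seed only so the example is reproducible
    rng.shuffle(runs)
    return runs

order = grr_run_order()
print(len(order))   # 90 readings for the standard 10 x 3 x 3 design
print(order[0])     # first (operator, part, trial) to measure
```

In practice the shuffled list would be printed as a data-collection sheet, with part identities masked so operators cannot see which part they are remeasuring.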
Record:
- Part ID
- Operator ID
- Trial number
- Measurement value
- Any unusual observations (e.g., difficulty measuring)

---

Analyzing Gage R&R Results

Basic Decomposition of Variation

The analysis decomposes the total observed variation (in standard deviation form) into:
- σ Total
- σ Part-to-Part
- σ Gage R&R, where:
  - σ Gage R&R = √(σ Repeatability² + σ Reproducibility²)

In practice, software usually reports:
- Variance components (σ²)
- Standard deviations (σ)
- Percent contribution of each component

Repeatability (Equipment Variation)

For repeatability, compute the variation among repeated measurements with:
- The same part
- The same operator
- The same gage

Using ANOVA or range-based methods, estimate:
- σ Repeatability (within-appraiser variation)

A large repeatability component indicates:
- Limited resolution
- Excessive instrument noise
- Poor stability of fixtures or methods

Reproducibility (Appraiser Variation)

For reproducibility, compare the mean measurements across operators for the same parts.

Estimate:
- σ Reproducibility (between-appraiser variation)

Reproducibility can also include an Operator × Part interaction: some operators measure certain parts consistently high or low.

A large reproducibility component indicates:
- Operator technique differences
- Inconsistent interpretation of criteria or measurement points
- Inadequate training or unclear instructions

---

Key Metrics in Gage R&R

Percent Contribution

Percent contribution indicates how much each source contributes to the total variation (in terms of variance):
- %Contribution (component) = (σ² component / σ² total) × 100%

Common components:
- %Contribution Repeatability
- %Contribution Reproducibility
- %Contribution Gage R&R
- %Contribution Part-to-Part

Interpretation:
- A high %Contribution Gage R&R indicates the measurement system is a major source of variation, which is undesirable.
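The decomposition and the %Contribution calculation above can be illustrated numerically. The variance components below are made-up values chosen only to show the arithmetic:

```python
import math

# Illustrative variance components (made-up numbers, not from a real study).
var_repeatability = 0.0004    # equipment variation, sigma^2
var_reproducibility = 0.0001  # appraiser variation, sigma^2
var_part_to_part = 0.0045     # part-to-part variation, sigma^2

# Variances add; standard deviations do not:
# sigma GRR = sqrt(sigma_repeatability^2 + sigma_reproducibility^2)
var_grr = var_repeatability + var_reproducibility
var_total = var_grr + var_part_to_part
sigma_grr = math.sqrt(var_grr)

# Percent contribution is a ratio of variances, not of standard deviations.
pct_contribution_grr = 100 * var_grr / var_total

print(f"sigma GRR = {sigma_grr:.4f}")
print(f"%Contribution GRR = {pct_contribution_grr:.1f}%")  # 10.0%
```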
Percent Study Variation

Percent study variation compares standard deviations:
- %StudyVar (component) = (σ component / σ total) × 100%

Key metric:
- %StudyVar Gage R&R = (σ Gage R&R / σ total) × 100%

This metric shows how much of the observed variability is due to the measurement system.

Percent Tolerance (P/T Ratio)

Percent tolerance compares measurement variation to the specification range:
- Tolerance = USL − LSL
- %Tolerance (component) = (6 × σ component / Tolerance) × 100%

Often used:
- %Tolerance Gage R&R = (6 × σ Gage R&R / Tolerance) × 100%

This shows how much of the specification range is “consumed” by measurement error.

Number of Distinct Categories (ndc)

The number of distinct categories estimates how many non-overlapping groups of parts the measurement system can reliably distinguish:
- A common formula (used by most software): ndc = 1.41 × (σ Part-to-Part / σ Gage R&R)

Interpretation:
- ndc < 2 – The measurement system cannot distinguish parts.
- 2 ≤ ndc < 5 – Very limited discrimination.
- ndc ≥ 5 – Generally considered adequate for analysis.
- A larger ndc indicates better resolution and discrimination capability.

---

Interpreting Gage R&R Outcomes

General Acceptance Guidelines

Common (but context-dependent) guidelines:
- %StudyVar Gage R&R ≤ 10%
  - Measurement system variation is small.
  - Generally acceptable for most uses.
- 10% < %StudyVar Gage R&R ≤ 30%
  - May be acceptable depending on application, cost, and risk.
  - Improvement of the measurement system is recommended.
- %StudyVar Gage R&R > 30%
  - The measurement system is not acceptable.
  - Use with great caution, and prioritize improvement.

Similar thresholds may be applied to %Tolerance Gage R&R.

Always consider:
- The criticality of the characteristic being measured.
- Regulatory or customer requirements.
- Decision risk (e.g., the risk of misclassifying good/bad parts).
Part-to-Part Variation Check

A reliable measurement system should show:
- High %Contribution Part-to-Part
- Low %Contribution Gage R&R

If part-to-part variation is low:
- The selected parts may not span the process range.
- Gage R&R metrics may look poor simply because the differences between parts are small.
- Reselect parts to better represent the process, then repeat the study.

---

Practical Diagnosis and Improvement

When Repeatability Is the Main Problem

Symptoms:
- High %Contribution Repeatability.
- Large spread in repeated measurements by the same operator on the same part.

Possible actions:
- Improve resolution:
  - Use a more sensitive instrument.
  - Adjust the scale or range.
- Enhance fixturing and method:
  - Use better fixtures or positioning guides.
  - Reduce sensitivity to operator handling.
- Maintain equipment health:
  - Repair or replace worn tools.
  - Improve calibration and preventive maintenance.

When Reproducibility Is the Main Problem

Symptoms:
- High %Contribution Reproducibility.
- Operator means differ significantly.
- An interaction between operator and part is present.

Possible actions:
- Improve standard work:
  - Clarify measurement method details.
  - Specify measurement location, orientation, and timing.
- Strengthen training:
  - Ensure operators interpret standards identically.
  - Practice with reference parts and compare results.
- Reduce subjectivity:
  - Use more objective criteria and well-defined reference points.
  - Where feasible, automate the measurement.

When Operator × Part Interaction Exists

Symptoms:
- Certain operators read specific parts systematically high or low compared to others.

Possible actions:
- Investigate part features:
  - Contours, surfaces, or access issues.
- Refine fixturing to be more robust for all operators.
- Emphasize technique standardization at tricky locations.
---

Relationship to Process Capability and Control

Impact on Capability Measures

A poor measurement system affects:
- Cp, Cpk, Pp, and Ppk calculations:
  - Inflated variability leads to underestimated capability.
  - Poor discrimination hides true process performance.
- Interpretation:
  - Capability results may reflect measurement noise rather than actual process variation.

Capability analysis should be based on:
- A measurement system with acceptable Gage R&R metrics.
- Confirmed stability in the measurement process.

Impact on Control Charts

A poor measurement system:
- Increases false alarms (Type I errors) due to extra noise.
- Masks real shifts or trends (Type II errors).
- Makes it difficult to distinguish common from special causes.

Effective control charts require:
- A stable and capable measurement system.
- Adequate discrimination (sufficient ndc).

---

Special Considerations

Destructive Testing and Nested Designs

When the act of measurement destroys the part (e.g., tensile strength, burst tests):
- The same physical part cannot be measured by multiple operators.
- Use a nested Gage R&R:
  - Parts are nested within operators.
  - The statistical model differs from the crossed design.
- Interpretation still focuses on:
  - Repeatability-like variation within operators.
  - Reproducibility-like variation between operators.

Attribute Gage R&R (High-Level Link)

Although distinct from variable Gage R&R:
- Attribute studies assess:
  - Agreement with a standard.
  - The repeatability and reproducibility of classifications.
- Key measures include:
  - Percent agreement with the standard.
  - Percent within- and between-appraiser agreement.
  - Chance-corrected measures (e.g., kappa).

The underlying idea is similar: evaluate whether the decision-making measurement system is reliable enough for process analysis and control.

---

Summary

Gage Repeatability & Reproducibility is a structured method for quantifying how much measurement error contributes to total observed variation.
It separates:
- Repeatability – variation from the equipment itself.
- Reproducibility – variation between operators and their interaction with parts.

A sound Gage R&R study:
- Uses an appropriate design (usually crossed 10 × 3 × 3 for variable data).
- Selects parts that span the real process range.
- Uses representative operators under realistic conditions.
- Randomizes and blinds measurements to reduce bias.

Key metrics include:
- %Contribution of each variation source.
- %StudyVar Gage R&R – comparing measurement variation to total variation.
- %Tolerance Gage R&R – comparing measurement variation to the specification range.
- ndc – indicating how many distinct part categories the system can reliably distinguish.

Interpreting these results guides whether the measurement system is acceptable for use in capability analysis, control charts, and decision making, and where to focus improvements in equipment, procedures, training, and fixturing. A capable, stable measurement system is essential for trustworthy data and meaningful process improvement.
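As a closing numerical illustration of the capability-inflation effect discussed under "Impact on Capability Measures": observed variance is the sum of true process variance and measurement variance, so a noisy gage deflates the Cp computed from data. All numbers below are hypothetical.

```python
import math

# Illustrative only: how measurement noise deflates observed capability.
usl, lsl = 10.02, 9.98   # hypothetical spec limits (tolerance = 0.04)
sigma_process = 0.004    # true process standard deviation
sigma_meas = 0.003       # measurement system standard deviation

# Observed variance = true process variance + measurement variance.
sigma_observed = math.sqrt(sigma_process**2 + sigma_meas**2)

cp_true = (usl - lsl) / (6 * sigma_process)
cp_observed = (usl - lsl) / (6 * sigma_observed)

print(f"Cp from true process spread:   {cp_true:.2f}")
print(f"Cp from observed (noisy) data: {cp_observed:.2f}")
```

Here a process with a true Cp of about 1.67 appears to have a Cp of about 1.33 purely because of measurement error, which is why capability studies should follow an acceptable Gage R&R.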
Practical Case: Gage Repeatability & Reproducibility

A precision machining company produces stainless steel shafts for a medical device. Each shaft diameter must be 10.00 mm ± 0.02 mm, checked with digital calipers by operators on the shop floor.

Production is frequently stopped because Quality rejects “out-of-spec” shafts, while supervisors insist the same shafts were “in-spec” when measured at the machines. Arguments center on whether the calipers or the operators can be trusted.

The quality engineer runs a Gage R&R study using:
- 3 operators from different shifts
- 10 randomly selected shafts covering the expected range
- 3 repeated measurements per part per operator, using the same calibrated digital caliper

Measurements are taken in random order, without operators knowing which shaft they are re-measuring. The engineer analyzes the data with the company’s statistical software and finds that:
- Part-to-part variation is acceptable.
- Repeatability is poor: the same operator gets noticeably different readings on the same shaft.
- Reproducibility is also poor: operators systematically differ from each other due to where and how they hold the caliper.

Based on this, the team:
- Standardizes the measurement method (fixed measurement location and orientation on the shaft, defined clamping force, brief retraining).
- Repeats the Gage R&R with the same setup.

The second study shows gage variation now well within the company’s acceptance criteria. After implementing the new standard and routine operator refreshers, disputes between Production and Quality drop sharply, false rejects decrease, and process capability data becomes reliable for ongoing improvement.
Practice question: Gage Repeatability & Reproducibility

A machining process produces shafts with a specification of 20.00 ± 0.10 mm. A crossed Gage R&R study shows that total Gage R&R is 28% of total observed variation, and part-to-part variation explains 60% of the total observed variation. How should a Black Belt interpret the adequacy of the measurement system?

A. The measurement system is acceptable for all purposes
B. The measurement system is acceptable only for preliminary or screening use
C. The measurement system is unacceptable and must be replaced before any process decisions
D. The measurement system is excellent and can be used for process release decisions

Answer: B

Reason: Total Gage R&R around 28% typically falls in the “marginal” range (≈10–30%): usable for screening or limited decision-making, but not ideal for final acceptance or tight control. Part-to-part variation at 60% is not dominant enough for a strong system. Other options are not best because A and D overstate the quality of this marginal system, and C is too strict given the system can still be used for limited purposes.

---

An analyst conducts a crossed Gage R&R with 3 appraisers, 10 parts, and 2 replicates. The ANOVA output shows: part-to-part p-value < 0.001, appraiser p-value = 0.65, and Part × Appraiser interaction p-value = 0.02. What is the most appropriate conclusion?

A. Appraiser bias is significant and must be corrected
B. There is no significant appraiser effect, but appraisers are inconsistent across parts
C. The measurement system is dominated by repeatability problems only
D. The measurement system is fully acceptable since appraisers are not different

Answer: B

Reason: A non-significant appraiser main effect (p = 0.65) suggests no consistent bias among appraisers, but a significant Part × Appraiser interaction (p = 0.02) indicates a lack of consistency across parts (a reproducibility issue arising through the interaction), requiring further investigation or training.
Other options are not best because A conflicts with the non-significant appraiser main effect, C ignores the significant interaction, and D ignores the interaction problem.

---

A variable Gage R&R study (crossed design) produced the following variance components (all in squared units): σ²repeatability = 4, σ²reproducibility = 1, σ²part-to-part = 45. What is the approximate %Contribution (variance-based) for total Gage R&R?

A. 10%
B. 20%
C. 30%
D. 40%

Answer: A

Reason: Total Gage R&R variance = 4 + 1 = 5, and total variance = 5 + 45 = 50, so %Contribution = 5/50 = 10%. (The corresponding %StudyVar, which is based on the ratio of standard deviations, would be √(5/50) ≈ 31.6%; the two metrics must not be confused.) Other options are not best because B–D overstate the Gage R&R contribution and contradict the given variance components.

---

A Black Belt wants to understand whether an automated vision system’s measurement error is small relative to process requirements, not just relative to current process variation. Which Gage R&R metric is most appropriate?

A. %StudyVar (Gage R&R / Total)
B. Number of distinct categories (ndc)
C. Gage R&R expressed as % of tolerance (%Tolerance)
D. Appraiser-by-part interaction variance component

Answer: C

Reason: %Tolerance compares measurement variation to the engineering tolerance (specification range), directly addressing whether measurement error is small enough for the required precision, independent of the current process spread. Other options are not best because A is based on current process variation, B summarizes discrimination capability but not directly relative to specs, and D is a subcomponent of reproducibility, not a direct requirement-based metric.

---

A Black Belt is planning a Gage R&R study for a destructive hardness test that consumes the part.
Which approach is most appropriate?

A. Use a crossed Gage R&R design with multiple repeated measures on each part
B. Use an attribute agreement analysis instead of Gage R&R
C. Use a nested Gage R&R design with parts nested within appraisers
D. Use a standard deviation chart to estimate repeatability without appraisers

Answer: C

Reason: For destructive tests, repeated measures on the same physical part are impossible; a nested design (different parts for each appraiser, parts nested within appraisers) is the appropriate Gage R&R structure for estimating repeatability and reproducibility. Other options are not best because A requires repeated readings on the same part (not feasible), B changes the data type and method inappropriately if measurements are continuous, and D ignores systematic appraiser effects and the proper Gage R&R structure.
