
5.2.1 Data Collection for SPC

Introduction

Statistical Process Control (SPC) depends on accurate, reliable data. Without disciplined data collection, control charts and capability analyses can mislead more than they help. This article explains how to design, implement, and validate data collection specifically for SPC applications.

---

Linking Data Collection to SPC Objectives

Defining the SPC Question

SPC data collection starts by clarifying the question the control chart must answer. Typical SPC questions include:

- Stability: Is the process stable over time under current conditions?
- Shift detection: Can we quickly detect small or moderate shifts in the process?
- Capability: Is the stable process capable of meeting specifications?
- Comparison: Are two process conditions meaningfully different in variation or level?

Each question influences:

- What is measured (metric definition)
- How often data are collected (sampling frequency)
- How many data points are needed (sample size and duration)
- Where and by whom data are collected (process locations and operators)

---

Defining the Process and Measurement Scope

Defining the Process Boundaries

To support SPC, define the process segment to be monitored:

- Start and end points: Clear trigger and completion events
- Input conditions: Key materials, machines, settings, and environmental factors
- Output characteristic: The quality characteristic to be charted
- Subgroups: Natural groupings of items in time (shifts, batches, lots, cycles)

This definition ensures:

- Data reflect a single, consistent process
- Special causes can be traced back to real process events
- Subgroups correspond to meaningful time slices of variation

Choosing the Quality Characteristic

Select a characteristic that is:

- Critical to performance: Tied to customer, safety, or regulatory requirements
- Measurable in real time or near real time: Supports timely detection and correction
- Sensitive to process changes: Varies when important inputs or conditions change

Common SPC data types:

- Continuous (variable): Length, weight, temperature, time, force, viscosity
- Discrete (attribute): Pass/fail, defect count, number of errors, presence/absence

The data type largely determines:

- Which control chart is appropriate
- How much data are needed to detect specific shifts
- How the sampling plan should be structured

---

Operational Definitions and Measurement Specifications

Creating Operational Definitions

An operational definition describes exactly how to recognize and record a measurement so that different people and times yield consistent results. For SPC, include:

- What: Exact definition of the characteristic
  - Example: “Hole diameter measured at the widest point, perpendicular to the part face”
- Where: Measurement location or point in process
- When: Timing relative to production (e.g., after cooling, before packaging)
- How: Instrument, method, and any preparation
- Who: Roles responsible for measurement and recording

A good operational definition allows multiple observers to:

- Select the same units to measure
- Obtain closely agreeing numerical values
- Classify items identically as conforming or non-conforming

Specification and Tolerance Clarification

For SPC, specifications are not required to construct control charts, but they are essential when:

- Interpreting process capability (Cp, Cpk, Pp, Ppk)
- Setting sampling priorities
- Establishing reaction plans to out-of-control signals

Clarify:

- Nominal (target) value
- Upper specification limit (USL) and lower specification limit (LSL), when applicable
- One-sided specifications where only USL or LSL exists
- Functional meaning of limits, not just numbers

---

Selecting Data Types and Units

Continuous vs Attribute Data in SPC

Continuous (variable) data:

- Measured on a scale (e.g., millimeters, PSI, seconds, °C)
- Capture magnitude and small changes
- Provide more information per sample
- Allow more sensitive SPC charts (X̄–R, X̄–S, I–MR)

Attribute (discrete) data:

- Count or classification (e.g., defective/not defective, number of defects)
- Often easier and cheaper to collect
- Needed when measurement is inherently categorical
- Used on p, np, c, u charts

When both are possible, continuous data are generally preferred for SPC because they:

- Detect smaller shifts with fewer samples
- Provide better insight into the nature of variation

Units and Resolution

For SPC data:

- Units must be:
  - Consistent across time, locations, and instruments
  - Appropriate to the process and specifications
- Resolution (granularity) should:
  - Be small enough to detect meaningful variation
  - Not be so fine that measurement error dominates

Guidelines:

- Use measurement increments at least 5–10 times smaller than the tolerance band when feasible
- Avoid rounding that masks process variation (e.g., recording to the nearest 0.1 mm when process variation is only about 0.02 mm)

---

Sampling Strategy for SPC

Rational Subgrouping

Rational subgrouping is central to SPC data collection. The goal is to:

- Have within-subgroup variation represent only common-cause variation
- Have between-subgroup variation capture potential special-cause changes over time

Choosing rational subgroups:

- Group items produced:
  - Close together in time
  - Under nearly identical conditions
  - From the same machine, line, or setup when possible
- Avoid mixing within the same subgroup (unless those differences are specifically under study):
  - Different shifts
  - Different machines
  - Different raw materials
  - Major setup or process changes

Consequences of poor subgrouping:

- Masking of special causes within subgroups
- False appearance of stability or instability
- Misleading control limits

Sampling Frequency

Sampling frequency determines how quickly SPC can detect process changes.
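The rational-subgrouping idea above can be sketched numerically. The following is a minimal illustration (the data and the `sigma_within` helper are invented for this example; d2 = 2.326 is the standard constant for subgroups of five):

```python
# Illustrative sketch: within-subgroup ranges estimate common-cause
# variation via Rbar / d2 (d2 = 2.326 for subgroups of size 5).

D2_N5 = 2.326

def sigma_within(subgroups):
    """Estimate short-term sigma as Rbar / d2 from size-5 subgroups."""
    ranges = [max(s) - min(s) for s in subgroups]
    rbar = sum(ranges) / len(ranges)
    return rbar / D2_N5

# Rational subgroups: each drawn close together in time from one machine.
rational = [[10.1, 10.2, 10.0, 10.1, 10.2],
            [10.0, 10.1, 10.1, 10.2, 10.0]]

# Irrational subgroups: machines A (near 10.1) and B (near 10.6)
# mixed inside each subgroup, inflating the within-subgroup ranges.
mixed = [[10.1, 10.6, 10.0, 10.7, 10.2],
         [10.0, 10.5, 10.1, 10.6, 10.1]]

print(sigma_within(rational) < sigma_within(mixed))  # True
```

Mixing two machines inside each subgroup inflates the ranges, so the Rbar/d2 estimate no longer reflects common-cause variation alone, and the resulting control limits are too wide.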
Consider:

- Process speed: Fast processes may require more frequent sampling
- Risk and impact: High-risk outputs justify tighter monitoring
- Expected rate of change: Processes that drift quickly require closer spacing in time
- Cost and feasibility: Balance between detection speed and measurement burden

Common patterns:

- Time-based sampling:
  - Example: Every hour, every shift, every batch
  - Useful for continuous or high-volume processes
- Event-based sampling:
  - Example: At each setup, tool change, lot change
  - Useful when risk of change is tied to specific events

The aim is to choose a sampling interval short enough that:

- The process is unlikely to drift far between samples
- Out-of-control signals are detected before large quantities are affected

Subgroup Size

Subgroup size (n) influences:

- Sensitivity of control charts to shifts
- Reliability of estimates of process mean and variation
- Effort and cost of data collection

Typical sizes:

- For X̄–R or X̄–S charts:
  - Common subgroup sizes: 4–5 units
  - Smaller n (2–3) when the process is expensive or slow
  - Larger n (up to 10) where feasible and justified
- For attribute charts:
  - p and np charts: choose a sample size large enough to observe variation in defectives
  - c and u charts: based on area of opportunity or time, not number of units

Trade-offs:

- Larger subgroups:
  - Better detection of small shifts in the mean
  - Higher measurement cost
- Smaller subgroups:
  - Less sensitive to small shifts
  - More practical for high-cost or low-volume processes

---

Data Collection Plans for SPC

Components of a Data Collection Plan

A structured data collection plan reduces errors and confusion.
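The subgroup sizes discussed above feed directly into the standard X̄–R limit formulas. A minimal sketch (the constants are the published Shewhart values for subgroup sizes 2–10; the function and dictionary names are illustrative):

```python
# Sketch: X-bar and R chart limits from rational subgroups, using the
# standard Shewhart constants A2, D3, D4 (tabulated for n = 2..10).

XBAR_R_CONSTANTS = {  # n: (A2, D3, D4)
    2: (1.880, 0.000, 3.267),
    3: (1.023, 0.000, 2.574),
    4: (0.729, 0.000, 2.282),
    5: (0.577, 0.000, 2.114),
    6: (0.483, 0.000, 2.004),
    7: (0.419, 0.076, 1.924),
    8: (0.373, 0.136, 1.864),
    9: (0.337, 0.184, 1.816),
    10: (0.308, 0.223, 1.777),
}

def xbar_r_limits(subgroups):
    """Control limits from a list of equal-size subgroups of measurements."""
    n = len(subgroups[0])
    a2, d3, d4 = XBAR_R_CONSTANTS[n]
    xbars = [sum(s) / n for s in subgroups]          # subgroup means
    ranges = [max(s) - min(s) for s in subgroups]    # subgroup ranges
    xbarbar = sum(xbars) / len(xbars)                # grand mean
    rbar = sum(ranges) / len(ranges)                 # mean range
    return {
        "xbar": (xbarbar - a2 * rbar, xbarbar, xbarbar + a2 * rbar),
        "r": (d3 * rbar, rbar, d4 * rbar),
    }
```

For example, two size-5 subgroups [1, 2, 3, 4, 5] and [2, 3, 4, 5, 6] give a grand mean of 3.5 and R̄ of 4, so the X̄ limits are 3.5 ± 0.577 × 4. In practice the limits would be based on at least 20–25 subgroups, as discussed below.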
For SPC, include:

- Objective: What SPC question the data will answer
- Characteristic: Operational definition and unit of measure
- Data type: Continuous or attribute, and appropriate chart type
- Sampling strategy: Subgroup size, sampling frequency, and time frame (duration)
- Population and location: Process step, equipment, shift, and operator scope
- Method and equipment: Measurement instruments and procedures
- Recording format: Forms, check sheets, or digital systems, with required fields (date, time, lot, operator, instrument ID)
- Reaction plan: Actions to take for out-of-control signals, missing or suspect data, and instrument failure

Designing Forms and Check Sheets

Well-designed forms support consistent SPC data collection. Effective forms:

- Show clearly:
  - Date, time, and subgroup labels
  - Sample sequence within subgroup
  - Units and specification limits
- Minimize opportunities for:
  - Misinterpretation of columns
  - Missing entries
  - Transcription errors

For attribute data:

- Make categories mutually exclusive and collectively exhaustive
- Provide examples or brief definitions of defect types
- Include space to record:
  - Total units inspected
  - Number of defectives or defects by category

---

Measurement System Considerations

Measurement System Requirements for SPC

SPC assumes that the measurement system:

- Accurately reflects true process variation
- Has measurement error small relative to process variation and specification width
- Is stable over time (no uncontrolled drift)

Key characteristics:

- Bias: Difference between the average measured value and the true value
- Repeatability: Variation when the same operator measures the same item repeatedly
- Reproducibility: Variation among different operators measuring the same item
- Linearity: Consistency of accuracy across the measurement range
- Stability: Consistency of measurement system performance over time

If measurement error is too large:

- Control charts may show false signals or hide real ones
- Process capability indices will be distorted
- Improvement decisions may target measurement noise instead of process causes

Attribute Measurement Consistency

For attribute SPC data:

- Misclassification (e.g., calling defective items good) impacts:
  - Control limits
  - Observed defect rates
- Ensure:
  - Clear defect definitions and examples
  - Common interpretation of borderline cases
  - Consistent inspection conditions (lighting, magnification, angle)

Training and calibration examples help align inspectors’ decisions before routine SPC data collection starts.

---

Data Integrity and Handling

Preventing Data Collection Bias

Common sources of bias in SPC data collection include:

- Convenience sampling: Measuring only easily accessible units instead of representative samples
- Selective recording: Omitting values outside expectations or perceived as “obvious errors” without investigation
- Operator influence: Avoiding measurement when the process is known to be unstable

Preventative practices:

- Adhere strictly to the sampling schedule, even when the process looks stable or unstable
- Record all values, including suspected outliers, and flag them for later review
- Avoid “data smoothing” or retroactively adjusting values to fit expectations

Dealing with Missing or Suspect Data

Missing or invalid data points affect SPC charts. Rules for handling them should be defined in advance:

- Missing subgroups or observations:
  - Do not fabricate or estimate values
  - Record reasons for missing data (machine down, instrument failure, etc.)
  - Resume normal sampling as soon as conditions allow
- Suspect values:
  - If measurement error is confirmed, clearly mark and exclude from control chart calculations
  - If doubt cannot be resolved, retain the data and flag it; let SPC interpretation consider possible measurement issues

---

Preparing Data for SPC Charts

Organizing Data for Charting

For effective chart construction and interpretation:

- Collect and store data in time sequence
- Maintain subgroup identity:
  - Each subgroup with a unique ID or timestamp
  - Within-subgroup order preserved for traceability
- Record contextual information:
  - Shift, machine, lot, material batch, setup ID, operator, and any significant events

This contextual information helps:

- Explain out-of-control points
- Identify special-cause patterns
- Support stratification and further analysis if needed

Initial Data Collection Period

Before formal SPC charting:

- Collect a baseline set of data under normal operating conditions
- Use this baseline to:
  - Estimate process average and variation
  - Compute initial control limits
  - Check for obvious non-random patterns or special causes

Considerations:

- Baseline data should represent consistent conditions:
  - Avoid including known startup, shutdown, or trial runs
- If special causes are found in baseline data:
  - Investigate and correct where feasible
  - Recollect baseline data if conditions change significantly

---

Interpreting SPC Results and Adjusting Data Collection

Linking Signals to Data Collection Adequacy

When control charts show frequent or puzzling signals, consider data collection factors:

- Are subgroups rational, or are different conditions mixed?
- Is sampling frequency appropriate for expected process changes?
- Does measurement resolution hide important variation?
- Has the process definition shifted without updating the data collection plan?
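When reviewing such signals, the two most basic Shewhart checks can be expressed in a few lines. This is a sketch only; the rule thresholds follow common practice (points beyond 3-sigma limits, and a run of eight points on one side of the center line), and the function names are illustrative:

```python
# Sketch of two basic out-of-control checks.

def beyond_limits(points, lcl, ucl):
    """Indices of points falling outside the control limits."""
    return [i for i, x in enumerate(points) if x < lcl or x > ucl]

def run_of_eight(points, center):
    """True if any 8 consecutive points fall on the same side of center."""
    side = [x > center for x in points if x != center]
    for i in range(len(side) - 7):
        window = side[i:i + 8]
        if all(window) or not any(window):
            return True
    return False
```

For example, `beyond_limits([10, 10.1, 9.9, 13], 9, 12)` flags the last point, and eight consecutive means slightly above center trigger `run_of_eight`. How often these rules fire, and whether the flagged points can be explained, feeds directly into the adequacy questions above.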
Adjustments might include:

- Redesigning subgroups (e.g., split by machine or shift)
- Increasing sampling frequency during high-risk periods
- Revisiting operational definitions to remove ambiguity
- Improving measurement system performance

Maintaining Long-Term Data Quality

Over time, processes, equipment, and people change. To preserve SPC data quality:

- Periodically review:
  - Operational definitions for relevance and clarity
  - Sampling plans for alignment with current risks
  - Forms and systems for ease of use and completeness
- Monitor measurement system stability:
  - Repeat key checks at defined intervals
  - Compare old and new instruments or procedures when changes occur

---

Summary

Effective SPC requires disciplined, well-designed data collection. Key elements include:

- Clear SPC objectives and process boundaries
- Precise operational definitions and appropriate selection of continuous or attribute data
- Rational subgrouping, suitable sample sizes, and sampling frequencies that reflect process risk and dynamics
- Structured data collection plans, including reaction plans and robust forms
- Reliable and stable measurement systems, with attention to both continuous and attribute consistency
- Procedures that protect data integrity, manage missing or suspect values, and maintain time sequence and context

When these aspects are in place, SPC charts reliably distinguish common-cause from special-cause variation, enabling sound decisions and sustained process control.
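As a closing illustration, the capability indices mentioned earlier (Cp, Cpk) reduce to a short computation once the process is stable. This is a sketch; the fill-volume numbers are hypothetical, and in practice sigma would be the within-subgroup estimate (e.g., R̄ / d2):

```python
# Sketch: capability indices for a stable process.

def cp_cpk(mean: float, sigma: float, lsl: float, usl: float):
    cp = (usl - lsl) / (6 * sigma)                   # potential capability
    cpk = min(usl - mean, mean - lsl) / (3 * sigma)  # accounts for centering
    return cp, cpk

# Hypothetical process: centered at 100.0 mL with sigma = 0.05 mL.
cp, cpk = cp_cpk(mean=100.0, sigma=0.05, lsl=99.8, usl=100.2)
print(round(cp, 2), round(cpk, 2))   # 1.33 1.33
```

Because the process is centered mid-spec, Cp and Cpk coincide; an off-center mean would lower Cpk while leaving Cp unchanged. These indices are only meaningful once the control chart confirms stability, which is why capability comes after the charting steps above.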

Practical Case: Data Collection for SPC

A mid-sized pharmaceutical plant produces liquid cough syrup on a single filling line. Customers have complained that some bottles feel “underfilled,” leading to returns and internal rework. The quality manager and line supervisor agree to use SPC, starting with disciplined data collection on fill volume.

They define:

- Purpose: detect and prevent underfill/overfill before batches ship.
- Characteristic: net fill volume per bottle.
- Measurement method: calibrated gravimetric scale, converted to volume.
- Sampling plan: every 30 minutes, 5 consecutive bottles pulled by the line operator.
- Data rules: no replacing “bad” bottles, record exact readings, note line speed and tank changeovers.

Operators receive a 10-minute coaching session and a simple paper form with preprinted times. Each sample set is labeled with date, filler head number, and operator initials. Data are entered daily into a shared spreadsheet that auto-plots an X-bar and R chart.

After five days, the SPC chart shows a clear pattern: whenever line speed is increased for a large order, the average fill volume drifts down over the next two sample sets, staying just inside spec but trending toward the lower limit. Maintenance confirms that a worn filler nozzle and a pressure fluctuation are causing the drift at higher speeds.

The team schedules a short shutdown, replaces the nozzle, and adjusts pressure control. They keep the same data collection plan. Over the next week, the SPC chart shows a stable process centered mid-spec, with no further customer complaints or rework due to underfill.

Practice Questions: Data Collection for SPC

A Black Belt is designing an SPC study for a high-volume filling process. The output is fill volume, measured continuously, and subgroups of 5 will be sampled every hour. Which type of data and chart combination requires the most care to ensure rational subgrouping?

A. Attribute data with a p-chart
B. Variable data with an X̄–R chart
C. Attribute data with a c-chart
D. Variable data with an individuals (I–MR) chart

Answer: B

Reason: The process produces variable data and the plan uses subgroups of 5, which is suitable for an X̄–R chart. For X̄–R, rational subgrouping is critical to ensure that within-subgroup variation reflects only common cause and between-subgroup variation captures potential special causes over time.

Other options: Attribute charts (p, c) and I–MR charts either do not use subgroups or use counts, so rational subgrouping is less central or defined differently for their application.

---

A team wants to collect data for an SPC study on the proportion of defective invoices. To ensure that the data are representative over time, which sampling strategy is most appropriate?

A. Convenience sampling from a single day’s production
B. Judgment sampling of invoices suspected to contain errors
C. Systematic sampling, selecting every 20th invoice over multiple days
D. Stratified sampling, selecting only invoices from the busiest shift

Answer: C

Reason: Systematic sampling across time (every 20th invoice over multiple days) yields time-ordered, representative data suitable for SPC, allowing detection of trends and shifts while maintaining operational feasibility.

Other options: Convenience and judgment sampling introduce bias; stratifying on the busiest shift only restricts representativeness of the process across all operating conditions.

---

An SPC data collection plan for an assembly process specifies: sample 4 units every 2 hours, measure torque (Nm) with a calibrated digital torque wrench, and record in the SPC system in time order. Which additional element is most critical to add to improve the plan’s robustness?

A. Definition of operational criteria for what constitutes one “unit”
B. A requirement to recalculate control limits every day
C. A rule to discard outliers before plotting the data
D. A requirement to randomize the time of each subgroup measurement

Answer: A

Reason: Clear operational definitions of what constitutes a “unit” (e.g., which joint, side, station) ensure measurement consistency and comparability, which is fundamental for valid SPC data collection.

Other options: Recalculating limits daily, discarding outliers, or randomizing measurement times can undermine time-based control chart analysis or conceal special causes instead of detecting them.

---

A Black Belt is deciding how much data to collect for estimating control limits of an X̄–R chart for a stable machining process. The process will use subgroups of size 5. Which guideline is most appropriate for determining the minimum data collection effort?

A. Collect at least 5 subgroups regardless of process type
B. Collect at least 20–25 subgroups before calculating initial control limits
C. Collect at least 200 individual observations before calculating limits
D. Collect data until two consecutive points fall beyond 3σ limits

Answer: B

Reason: A common guideline for reliable estimation of control limits on X̄–R charts is to collect at least 20–25 rational subgroups, which provides reasonably stable estimates of the mean and dispersion.

Other options: 5 subgroups is insufficient; 200 individual observations may be unnecessary and are not aligned with subgroup-based estimation; using out-of-control signals to stop data collection confuses estimation with monitoring.

---

During an SPC study, an operator records thickness data by visually reading an analog gauge to the nearest 0.1 mm. The short-term process variation (σ) is approximately 0.02 mm. Which data collection issue is most critical to address before constructing control charts?

A. The sampling frequency is likely too high
B. The measurement resolution is too coarse relative to process variation
C. The process is not suitable for variable data SPC
D. The operator is not rotating among different machines

Answer: B

Reason: With process σ at 0.02 mm, a single 0.1 mm increment spans roughly 5σ of process variation, so readings collapse into a few repeated values. This rounding, or data “chunking,” degrades SPC sensitivity and masks shifts. Measurement system resolution should be small relative to process variation.

Other options: Sampling frequency, data type, and operator rotation may matter, but they are secondary to ensuring adequate measurement resolution for reliable SPC data.
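The resolution problem in the last question can be demonstrated directly. The data below are invented to have a spread of roughly 0.02 mm around 5.0 mm:

```python
# Sketch: reading a gauge to the nearest 0.1 mm when process sigma is
# about 0.02 mm collapses distinct measurements into one repeated value.

raw = [5.003, 4.981, 5.017, 4.996, 5.024, 4.968, 5.009, 4.989,
       5.031, 4.975, 5.012, 4.994]        # fine-resolution measurements

read = [round(x, 1) for x in raw]          # what a 0.1 mm gauge reports

print(len(set(raw)))    # 12 distinct fine-resolution values
print(set(read))        # {5.0}: every reading collapses to one value
```

With every reading identical, subgroup ranges are zero and a control chart built from the rounded data would be meaningless, which is exactly why resolution must be checked before charting.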
