# 3.0 Analyze Phase
## Purpose of the Analyze Phase

The Analyze Phase determines what truly causes the problem identified and measured in earlier work. The aim is to move from symptoms and correlations to evidence-based causes that can be acted on. In this phase the work focuses on:

- Clarifying and refining the problem statement using data
- Exploring the process and data for patterns and relationships
- Identifying potential root causes
- Statistically validating which factors are critical
- Prioritizing causes to address in later phases

Everything in Analyze should support one question: why is the process performing the way it does?

---

## Connecting Analyze to the Project Charter and Data

### Refining Problem and Objectives

The Analyze Phase starts by revisiting earlier definitions using collected data:

- Confirm that the problem statement matches what the data shows
- Verify that baseline performance (defects, cycle time, cost, etc.) is accurate
- Check that project objectives are still realistic given the measured performance

If the initial problem or scope was based on assumptions, the Analyze Phase uses evidence to correct or narrow it.

### Using the Y = f(X) View

Analyze work is organized around a simple model:

- Y = outcome or response (the performance measure: defect rate, time, cost)
- X = input or factor (what might influence Y)
- Y = f(X) means the outcome is a function of its inputs

In Analyze, the main tasks are to:

- Clarify the definition of Y (defect, unit, opportunity, specification)
- Generate a list of possible Xs (process inputs, conditions, methods, materials)
- Use data to distinguish:
  - Critical Xs (major impact on Y)
  - Trivial Xs (minor or no impact on Y)

---

## Understanding and Interpreting Process Behavior

### Process Mapping for Cause Discovery

Process maps are refined in Analyze to reveal where causes may reside.
Common tools include:

- Detailed process map (swimlane, deployment)
- Value stream map (VSM) for time, delays, and waste
- SIPOC refinement if high-level clarity is missing

Use these maps to:

- Locate steps with high variation, rework, or waiting
- Identify handoffs, queues, and information gaps
- Spot non-value-added steps that might drive defects or delays

The maps guide where to look for data patterns and potential Xs.

### Graphical Data Exploration

Graphical analysis helps form and test cause ideas. Typical tools:

- Histogram
  - Shows distribution shape, spread, and centering
  - Suggests potential multimodality (mixtures of subpopulations)
- Boxplot
  - Compares distributions between groups or conditions
  - Highlights median, spread, and outliers
- Time series (run) chart
  - Shows trends, shifts, cycles, and special events over time
  - Helps detect instability or step changes
- Scatter plot
  - Visualizes the relationship between two continuous variables
  - Indicates possible linear or nonlinear patterns
- Pareto chart
  - Ranks categories by frequency or impact
  - Supports focusing on the "vital few" problem sources

Graphical analysis is primarily used for:

- Generating hypotheses about Xs that might affect Y
- Providing initial evidence for or against suspected causes
- Directing deeper statistical analysis

---

## Identifying Possible Root Causes

### Cause-and-Effect Structures

Structured brainstorming helps transform scattered ideas into a coherent list of Xs:

- Cause-and-Effect (Fishbone) Diagram
  - Organizes causes under categories (such as methods, machines, materials, people, environment, measurement)
  - Helps ensure no major cause category is overlooked
- 5 Whys
  - Repeatedly asks "Why?" to move from surface symptoms to underlying process conditions

Outputs from these methods should be:

- Specific, observable process conditions (not vague statements)
- Stated as potential Xs that could be tested with data

### Stratification and Segmentation

Stratification breaks data into meaningful subgroups to reveal hidden patterns:

- By time (shift, day of week, season)
- By location (site, line, machine, workstation)
- By product or service type
- By operator or team
- By supplier or input source

Stratification can show that:

- Problems are concentrated in specific subgroups
- Overall averages hide serious issues in segments
- Different root causes may exist in different segments

In Analyze, stratification is a key step before formal statistical testing.

---

## Relating Inputs (Xs) to Outputs (Y)

### Basic Relationship Types

Analyze focuses on three main relationship forms:

- Continuous Y with discrete X
  - Example: cycle time (Y) vs. shift (X)
  - Often analyzed with t-tests or ANOVA
- Continuous Y with continuous X
  - Example: thickness (Y) vs. temperature (X)
  - Explored using correlation and regression
- Discrete Y (defective / nondefective)
  - Example: defect presence vs. machine type or setting
  - Explored with proportions, chi-square tests, or logistic models

The aim is not just to detect association but to understand:

- Direction and size of the effect
- Practical significance, not just statistical significance
- Which Xs are most influential on Y

### Correlation and Causation

Correlation analysis is useful but limited:

- Correlation coefficient (r)
  - Measures the strength and direction of the linear association between two continuous variables
  - Takes values between −1 and +1
- Key cautions
  - Correlation does not prove causation
  - Outliers can distort correlation
  - Nonlinear relationships may show low correlation even when the relationship is strong

Correlation is typically:

- A preliminary screening tool
- Followed by more detailed modeling or designed experiments when needed

---

## Hypothesis Testing in Analyze

### Purpose and Logic

Hypothesis testing in Analyze is used to decide whether observed differences or relationships are likely due to:

- Natural process variation (random chance), or
- A real underlying effect of an X on Y

Core elements:

- Null hypothesis (H₀): no difference, no effect, or no relationship
- Alternative hypothesis (H₁): there is a difference, effect, or relationship
- Significance level (α): probability of wrongly rejecting H₀ (commonly 0.05)
- p-value: probability of observing data at least as extreme as the sample, assuming H₀ is true

Decision rule:

- If p-value ≤ α: reject H₀ (evidence supports an effect)
- If p-value > α: fail to reject H₀ (insufficient evidence of an effect)

Interpretation must always consider both statistical and practical significance.

### Common Hypothesis Tests in Analyze

The following tests are central to the Analyze Phase toolkit:

- 1-sample t-test
  - Compares a sample mean to a known or target value
  - Used to check whether current performance significantly differs from a requirement
- 2-sample t-test
  - Compares the means of two independent groups
  - Example: mean cycle time on Machine A vs. Machine B
- Paired t-test
  - Compares the means of paired observations (before/after, same unit under two conditions)
  - Example: time for the same operators before and after a small change
- One-way ANOVA
  - Compares means across more than two groups
  - Example: yield across three suppliers
- Proportion tests (1-proportion, 2-proportion)
  - Compare observed defect proportions to a target or between groups
  - Example: defect rates before vs. after a training intervention
- Chi-square test for independence
  - Tests association between two categorical variables
  - Example: defect type vs. shift, or failure mode vs. supplier

Each test requires:

- Choosing the appropriate test based on data type and design
- Checking basic assumptions (independence, approximate normality, sample size)
- Interpreting results in the context of the process and objectives

---

## Analyzing Variation and Sources of Variation

### Common Cause vs. Special Cause

Understanding variation is essential to root cause analysis:

- Common cause variation
  - Inherent in the process design and conditions
  - Stable but potentially high; addressed by process redesign
- Special cause variation
  - Arises from specific, identifiable events or conditions
  - Irregular; addressed by containment and removal of the cause

The Analyze Phase seeks to:

- Detect special causes while focusing on stable process data when evaluating underlying relationships
- Differentiate between normal fluctuation and changes linked to specific Xs

### Stratification and Variance Components

Variation can be decomposed into:

- Within-group variation
  - Variation inside a single machine, operator, day, or batch
- Between-group variation
  - Differences between machines, operators, days, or batches

Methods such as ANOVA and variance component analysis help determine:

- Whether a factor (machine, shift, supplier) contributes significantly to overall variation
- Where to focus improvement efforts (which level or subgroup is most problematic)

---

## Regression and Modeling in Analyze

### Simple Linear Regression

Simple linear regression models the relationship between:

- One continuous X (predictor)
- One continuous Y (response)

Key outputs:

- Slope: expected change in Y for a one-unit change in X
- Intercept: expected Y when X = 0 (if meaningful)
- R² (coefficient of determination): proportion of variation in Y explained by X
- p-value for the slope: evidence that X has a significant impact on Y

Use simple regression to:

- Quantify the strength and direction of an effect
- Predict Y for given values of X
- Prioritize Xs based on effect size and significance

### Multiple Linear Regression

Multiple regression extends the model to several Xs:

- Y = b₀ + b₁X₁ + b₂X₂ + … + bₖXₖ

Benefits:

- Evaluates each X while controlling for the others
- Identifies combinations of Xs that best explain Y
- Detects interaction effects if they are included in the model

Considerations:

- Multicollinearity: strong correlation among Xs can distort estimates
- Overfitting: too many predictors relative to the available data reduces predictive reliability
- Model validation: residual analysis, holdout samples, or cross-validation

In Analyze, regression is used to:

- Select critical Xs from a larger set
- Build a prediction or explanation model as an input to solution design
- Quantify the expected impact of changing specific Xs

---

## Non-Normal Data and Transformations

### Recognizing Non-Normality

Many process measures do not follow a normal distribution, particularly:

- Time-based data (often skewed)
- Count data (defects, errors)
- Bounded data (proportions, percentages)

Indications of non-normality:

- Strong skew in histograms
- Heavy tails or multiple peaks
- Formal normality tests failing (interpreted with caution)

### Transformation Techniques

To apply methods that assume approximate normality, transformations may help:

- Log transformation
  - Common for right-skewed data (e.g., cycle times)
- Square root transformation
  - Useful for count data with moderate skew
- Box-Cox transformation
  - A systematic way to find a suitable power transformation

Transformations should be:

- Applied consistently
- Interpreted carefully when converting model results back to the original scale

Where transformation is not appropriate, alternative methods suited to non-normal data may be used (such as nonparametric tests or models designed for counts or proportions).
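The transformations above can be sketched in a few lines. The snippet below uses synthetic right-skewed "cycle time" data (the distribution parameters are illustrative, not from this text) and `scipy.stats.boxcox`, which returns the transformed data together with the fitted lambda:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Illustrative right-skewed "cycle time" data (synthetic, lognormal)
cycle_times = rng.lognormal(mean=3.0, sigma=0.6, size=200)

# Log transform: often sufficient for right-skewed, strictly positive time data
log_times = np.log(cycle_times)

# Box-Cox: searches for the power transform (lambda) that best normalizes the data;
# requires strictly positive values
transformed, lam = stats.boxcox(cycle_times)

# Skewness should move toward 0 after either transformation
print(f"skew raw:     {stats.skew(cycle_times):.2f}")
print(f"skew log:     {stats.skew(log_times):.2f}")
print(f"skew box-cox: {stats.skew(transformed):.2f} (lambda = {lam:.2f})")
```

Results on the transformed scale must still be converted back (e.g., exponentiated) before being reported in original units.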
---

## Failure Modes and Risk-Based Causal Analysis

### FMEA in Analyze

Failure Modes and Effects Analysis (FMEA) connects causes to risk:

- Failure modes: ways a process step can fail
- Effects: consequences of each failure mode
- Causes: underlying reasons the failure occurs
- Controls: existing detection or prevention mechanisms

In Analyze, FMEA is used to:

- Organize and prioritize potential root causes
- Assess severity, occurrence, and detection to gauge risk
- Highlight causes that warrant deeper statistical validation

This supports selecting which causal paths to investigate further with data.

### Cause Verification vs. Speculation

All suspected causes should undergo verification:

- Data-based confirmation
  - Compare performance with and without the suspected cause
  - Use appropriate tests (proportion tests, t-tests, ANOVA, regression)
- Logical consistency
  - Confirm that the cause mechanism is physically, logically, and operationally plausible
- Reproducibility
  - Check that similar patterns appear across repeated samples or periods

Analyze should end with:

- A short, validated list of high-impact root causes
- Documented evidence that these causes actually drive the problem

---

## Using Designed Experiments in Analyze (Screening)

### When to Use Designed Experiments

When multiple potential Xs are suspected and process trials are feasible, screening experiments can be part of Analyze:

- Purpose
  - Identify which factors from a larger set truly affect Y
  - Estimate the direction and magnitude of effects
- Typical context
  - Adjustable process settings (temperature, speed, pressure, etc.)
  - Multiple factors that may interact

Screening experiments are usually:

- Fractional factorial or other efficient designs
- Focused on main effects rather than precise optimization

### Interpreting Screening Results

Key results from screening-type designs:

- Main effects plots
  - Show the average Y at each level of a factor
- Interaction plots
  - Reveal whether the effect of one factor depends on the level of another
- Pareto of effects
  - Ranks factor effects by size

In Analyze, these results are used to:

- Eliminate non-significant factors from further consideration
- Identify a subset of influential factors for more detailed study or optimization in later phases

---

## Verifying Measurement System Adequacy for Analyze

### Ensuring Data Quality for Causal Analysis

Analyze conclusions are only as strong as the measurements behind them. Prior to or during Analyze, it may be necessary to:

- Reconfirm measurement system capability for critical Ys and Xs
- Use:
  - Gage R&R (for continuous measures)
  - Attribute agreement analysis (for pass/fail or categorical data)

Criteria often checked:

- Measurement variation small compared to process variation
- Acceptable repeatability and reproducibility
- Consistent classification among appraisers

If the measurement system is weak, it must be improved or replaced before relying on cause-and-effect conclusions.

---

## Prioritizing Causes and Preparing for Improvement

### Practical Significance and Impact

Not all statistically significant causes are worth addressing. To prioritize:

- Estimate the effect size of each cause on Y
- Consider the frequency or prevalence of the cause
- Evaluate the feasibility of controlling or eliminating the cause
- Assess the risk and cost of leaving the cause unaddressed

This leads to a focused list of causes to attack in the next phase.
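The main-effects and "Pareto of effects" logic behind screening designs can be computed directly from coded data. The factor names and response values below are hypothetical, purely to show the mechanics for a 2³ design:

```python
import numpy as np

# Hypothetical 2^3 full-factorial screening run with coded factor levels (-1 / +1)
# Columns: temperature, speed, pressure; y = measured response (illustrative values)
design = np.array([
    [-1, -1, -1],
    [ 1, -1, -1],
    [-1,  1, -1],
    [ 1,  1, -1],
    [-1, -1,  1],
    [ 1, -1,  1],
    [-1,  1,  1],
    [ 1,  1,  1],
])
y = np.array([54.0, 62.0, 55.0, 64.0, 52.0, 61.0, 53.0, 63.0])

factors = ["temperature", "speed", "pressure"]
# Main effect of a factor = mean(y at +1) - mean(y at -1)
effects = {f: y[design[:, i] == 1].mean() - y[design[:, i] == -1].mean()
           for i, f in enumerate(factors)}

# "Pareto of effects": rank by absolute size to screen out weak factors
for name, eff in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:12s} effect = {eff:+.2f}")
```

In this invented run, temperature dominates, so speed and pressure would be candidates for elimination from further study.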
### Translating Analyze Output into Requirements for Solutions

Key deliverables at the end of Analyze should include:

- A clearly stated and data-supported view of:
  - How Y behaves (location, spread, patterns)
  - Which Xs (inputs, conditions, factors) most strongly affect Y
- Visual and statistical evidence for each validated cause
- A prioritized list of root causes with quantified impacts where possible
- A clear linkage from each major cause to:
  - A process step
  - A control point
  - A potential solution direction

This prepares for targeted improvements rather than broad, unfocused changes.

---

## Summary

The Analyze Phase transforms raw data and suspected issues into validated knowledge about what drives process performance. It:

- Refines the problem and clarifies the Y using process behavior and stratified data
- Identifies and structures potential causes using maps, cause-and-effect tools, and FMEA
- Applies statistical and graphical methods to distinguish real effects from random variation
- Uses hypothesis testing, regression, and, when appropriate, screening experiments to quantify relationships between Xs and Y
- Verifies measurement adequacy to support credible conclusions
- Produces a short, evidence-based list of root causes prioritized by impact and practicality

With these outputs, subsequent phases can design and implement solutions that directly address the true drivers of the problem.
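As a concrete illustration of two of the core hypothesis tests from this phase (the 2-sample t-test and one-way ANOVA), here is a minimal sketch using `scipy.stats`; all group data is synthetic and the machine/supplier names are invented:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Synthetic cycle times (minutes) for two machines; Machine B is slower on average
machine_a = rng.normal(loc=30.0, scale=3.0, size=40)
machine_b = rng.normal(loc=33.0, scale=3.0, size=40)

# 2-sample t-test: H0 says the two machines have equal mean cycle time
t_stat, p_two = stats.ttest_ind(machine_a, machine_b)
print(f"2-sample t-test: t = {t_stat:.2f}, p = {p_two:.4f}")

# One-way ANOVA extends the comparison to more than two groups (three suppliers)
sup1 = rng.normal(95.0, 2.0, 30)
sup2 = rng.normal(95.5, 2.0, 30)
sup3 = rng.normal(92.0, 2.0, 30)  # noticeably lower yield
f_stat, p_anova = stats.f_oneway(sup1, sup2, sup3)
print(f"one-way ANOVA:   F = {f_stat:.2f}, p = {p_anova:.4f}")

alpha = 0.05
print("reject H0 (machine means differ)?", p_two <= alpha)
```

Note that a small p-value here establishes statistical significance only; whether a 3-minute difference matters is a practical-significance judgment.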
## Practical Case: Analyze Phase

A regional hospital's lab faces frequent late test results for emergency patients. The Define and Measure phases have confirmed that turnaround time is outside the target and have mapped the current process.

### Context and Problem

Emergency doctors complain that lab results often arrive more than 60 minutes after blood draw, delaying treatment decisions. Measure Phase data shows an average turnaround of 82 minutes with high variation, but the main drivers are unclear.

### How the Analyze Phase Was Applied

The project team (lab manager, two technicians, emergency nurse, and a quality analyst) focuses on identifying root causes of the delay. They first review the time-stamped data collected in Measure:

- Order entry time
- Specimen collection time
- Specimen arrival at lab
- Start and end of analysis
- Result verification time
- Result release time

They create a simple time-series plot showing delays by step. It reveals that most delay occurs between specimen arrival at the lab and the start of analysis. The team then:

- Stratifies the data by shift (day/evening/night) and by test type.
- Uses a boxplot to compare turnaround times by shift, finding the evening shift significantly slower.
- Performs a basic correlation check between lab staffing levels and queue length; high queues correlate with low staffing on evenings.
- Conducts a short, structured observation on the evening shift, timing each sub-step from specimen receipt to loading on the analyzer.

Observation and data analysis show:

- Batching behavior: technicians wait to collect 10+ samples before starting the analyzer.
- Frequent rework from incomplete labeling; the evening shift has more labeling errors coming from the ED.
- Only one technician handles both phone calls and analyzer loading during evenings, creating interruptions.

A quick cause-and-effect matrix is used to prioritize causes by impact on turnaround and ease of control.
Batching and interruptions score highest; labeling errors are important but secondary.

### Result

By the end of the Analyze Phase, the team clearly defines three validated root causes:

1. Batching of specimens before analysis on the evening shift.
2. Technician role conflicts causing frequent interruptions.
3. A higher rate of labeling errors from the ED during evenings.

They quantify that batching and interruptions account for about 70% of the excess turnaround time beyond the target. These become the focused inputs for the upcoming Improve Phase, avoiding broad, unfocused solutions.
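The correlation check the team performed can be reproduced in a few lines. The staffing and queue-length numbers below are invented for illustration (the case reports only that high queues correlated with low evening staffing, not the raw values):

```python
import numpy as np
from scipy import stats

# Hypothetical evening-shift snapshots: technicians on duty vs. specimens waiting
staffing = np.array([1, 1, 2, 2, 2, 3, 3, 3, 4, 4])
queue    = np.array([14, 12, 10, 9, 11, 6, 7, 5, 3, 2])

r, p = stats.pearsonr(staffing, queue)
print(f"Pearson r = {r:.2f}, p = {p:.4f}")
# A strongly negative r supports "low staffing -> long queues",
# but correlation alone does not prove causation; the team's
# structured observation supplied the causal mechanism.
```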
## Practice Questions: Analyze Phase

A team investigating long cycle time suspects that rework is a key driver. They have collected cycle time and rework count per order. Which tool is most appropriate to quantify the strength and direction of the linear relationship between these two continuous variables?

A. Chi-square test of independence
B. Pearson correlation coefficient
C. Kruskal–Wallis test
D. Two-sample t-test

Answer: B

Reason: Pearson correlation measures the strength and direction of the linear relationship between two continuous variables and is the standard Analyze Phase tool for this purpose. The other options either test associations between categorical variables (A), compare medians across multiple groups (C), or compare means between two groups (D); none directly quantifies linear correlation.

---

A Black Belt wants to confirm whether machine type (3 different models) has a significant effect on a continuous quality characteristic, assuming normality and equal variances. Which Analyze Phase tool is most appropriate?

A. One-way ANOVA
B. Simple linear regression
C. 2x2 contingency table analysis
D. Mann–Whitney U test

Answer: A

Reason: One-way ANOVA tests whether there are statistically significant differences in the mean of a continuous response across more than two groups (here, three machine types). Regression (B) assumes a continuous or properly coded predictor, contingency tables (C) are for categorical–categorical relationships, and Mann–Whitney (D) compares two groups, not three.

---

A process outputs a continuous CTQ. A potential X is a binary factor (Setting A vs. Setting B). The team collected 30 observations for each setting. Normality and equal-variance assumptions are satisfied. Which hypothesis test should be used to verify the impact of the factor on the CTQ mean?

A. Paired t-test
B. Two-sample (independent) t-test
C. F-test for equality of variances only
D. Chi-square goodness-of-fit test

Answer: B

Reason: A two-sample t-test compares the means of a continuous response between two independent groups (Setting A vs. Setting B) under normality and equal-variance assumptions. The paired t-test (A) is for matched pairs, the F-test alone (C) only addresses variances, and the chi-square goodness-of-fit test (D) is for distributions of categorical data, not means of continuous data.

---

During Analyze, a Black Belt builds a simple linear regression model relating lead time (Y) to work-in-progress (X). The regression equation is: Y = 2.5 + 0.8X. If X increases by 5 units, what is the expected change in Y, and how should this be interpreted?

A. Y increases by 4 units; each unit increase in X adds 0.8 units to Y on average
B. Y increases by 2.5 units; the intercept represents the effect of X
C. Y stays the same; the intercept dominates the model
D. Y decreases by 4 units; the slope is negative

Answer: A

Reason: In simple linear regression, the slope (0.8) represents the expected change in Y for a one-unit increase in X. For a 5-unit increase, Y is expected to increase by 0.8 × 5 = 4 units. The intercept (B, C) is the expected Y when X = 0, not the change per unit; the slope is positive, so (D) is incorrect.

---

In the Analyze Phase, a team performs a Pareto analysis on defect types and then uses a Cause & Effect Matrix (C&E Matrix) for the top defect. What is the primary purpose of the C&E Matrix at this stage?

A. To confirm normality of the response distribution
B. To prioritize potential Xs based on their impact on Y and process knowledge
C. To estimate process capability indices (Cp, Cpk)
D. To validate measurement system repeatability and reproducibility

Answer: B

Reason: A C&E Matrix is used in Analyze to systematically prioritize potential input variables (Xs) by rating their influence on key outputs (Ys) and leveraging team knowledge, guiding which Xs to study further. Normality checks (A), capability calculations (C), and MSA (D) are not the main purposes of a C&E Matrix.
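The slope arithmetic in the regression question can be checked numerically; the function name and the evaluation point below are arbitrary, with the coefficients taken from the question:

```python
def lead_time(wip: float) -> float:
    """Regression model from the question: Y = 2.5 + 0.8 * X."""
    return 2.5 + 0.8 * wip

# The change in Y for a 5-unit increase in X does not depend on where X starts:
x = 10.0
delta_x = 5.0
change = lead_time(x + delta_x) - lead_time(x)
print(f"Expected change in Y for a 5-unit increase in X: {change:.1f}")
```

The intercept cancels in the subtraction, which is why only the slope matters for interpreting a change in X.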
