
2.2.4 Graphical Analysis

Graphical Analysis

Introduction

Graphical analysis is the disciplined use of plots and charts to understand data behavior, evaluate assumptions, and support data‑driven decisions. It connects raw data to statistical methods by making patterns visible and testable. This article explains the essential graphical tools and how to interpret them correctly in process improvement and problem‑solving work.

---

Foundations of Graphical Analysis

Why Graphical Analysis Matters

Graphical analysis is used to:
- Reveal patterns and structure in data
- Detect outliers and special causes
- Assess distributions and model assumptions
- Compare groups, conditions, or time periods
- Communicate findings clearly and convincingly

It is not a replacement for statistical tests; it guides which tests are appropriate and validates their assumptions.

Data Types and Scale

Understanding data types is critical for choosing the right graph:
- Continuous data – measured on a scale (time, length, weight)
- Discrete data – counted values (defects, occurrences)
- Attribute data – categories or classifications (pass/fail, type A/B/C)
- Ordinal data – ranked categories (low/medium/high)

Graph selection depends on:
- Number of variables (one, two, many)
- Type of variables (continuous vs categorical)
- Objective (distribution, comparison, relationship, time trend)

---

Graphs for Univariate Analysis

Univariate analysis looks at one variable at a time to understand its distribution, center, spread, and unusual values.

Histograms

A histogram displays the frequency of data across continuous intervals (bins).
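
To make the binning idea concrete, the sketch below counts observations per equal-width bin and prints a text histogram. The data, the function name `histogram_counts`, and the default bin count (Sturges' rule) are assumptions chosen for illustration; statistical packages use their own binning defaults.

```python
import math

def histogram_counts(values, bins=None):
    """Return (edges, counts) for equal-width bins over the data range."""
    n = len(values)
    if bins is None:
        # Assumption: Sturges' rule as a default; other rules are equally valid.
        bins = max(1, math.ceil(math.log2(n)) + 1)
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against all-equal data
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

# Hypothetical cycle times in minutes (illustrative data only).
cycle_times = [12, 14, 15, 15, 16, 17, 17, 18, 18, 18,
               19, 19, 20, 21, 22, 24, 27, 31, 38, 52]

edges, counts = histogram_counts(cycle_times)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f} | {'#' * count}")

# With only two bins, the long right tail and the gap are no longer visible.
_, coarse = histogram_counts(cycle_times, bins=2)
print("two bins:", coarse)
```

Rerunning the sketch with different `bins` values is a quick way to see how bin width changes the apparent shape of the same data.
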
Use a histogram to:
- Visualize distribution shape (normal, skewed, multimodal)
- Estimate center and spread
- Detect outliers and gaps
- Spot potential subgroups or mixture distributions

Key aspects to interpret:
- Shape
  - Approximately symmetric and bell‑shaped suggests normality
  - Right‑skewed: long tail to the right, many small values
  - Left‑skewed: long tail to the left, many large values
  - Multimodal: multiple peaks, suggesting different sources or conditions
- Spread
  - Wide histograms: higher variation
  - Narrow histograms: lower variation
- Location
  - Where most data lie relative to a target or specification
- Outliers and gaps
  - Isolated bars or empty regions may indicate special causes or data issues

Practical tips:
- Use appropriate bin widths; too few bins hide structure, too many amplify noise.
- Overlay specification limits or targets when relevant to see capability visually.

Dotplots

A dotplot shows each observation as a dot along a scale, stacked where values repeat.

Use dotplots when:
- Sample sizes are small or moderate
- You want to see individual values clearly
- The histogram would be too coarse or misleading

Dotplots help you:
- See clustering and spread
- Detect outliers directly
- Compare small samples across groups side by side

Boxplots

A boxplot (box‑and‑whisker plot) summarizes a distribution using quantiles.
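
The quantile summary behind a boxplot (quartiles, interquartile range, and the common 1.5 × IQR rule for whiskers and outliers) can be sketched as follows. The data and the function name `boxplot_summary` are hypothetical, and the "inclusive" quartile method is an assumption; software packages compute quartiles in slightly different ways.

```python
import statistics

def boxplot_summary(values):
    """Quartiles, 1.5 x IQR fences, whisker ends, and flagged outliers."""
    data = sorted(values)
    # Assumption: 'inclusive' quartiles; statistical packages differ slightly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations that lie inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }

# Hypothetical turnaround times in minutes (illustrative data only).
turnaround = [31, 34, 35, 36, 38, 39, 40, 41, 43, 44, 47, 95]
summary = boxplot_summary(turnaround)
for name, value in summary.items():
    print(f"{name:>12}: {value}")
```

In this sample the single long turnaround falls beyond the upper fence, so it would be drawn as an individual point rather than stretching the whisker.
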
Key elements:
- Median – line inside the box
- Box – from first quartile (Q1, 25th percentile) to third quartile (Q3, 75th percentile)
- Interquartile range (IQR) – Q3 − Q1, central 50% of data
- Whiskers – usually extend to the most extreme points within 1.5 × IQR from Q1 and Q3
- Outliers – points beyond whiskers, often plotted individually

Use boxplots to:
- Compare distributions across multiple groups
- Identify differences in medians (center)
- Compare spreads and detect variability changes
- Spot outliers and skewness (via box asymmetry and whisker lengths)

Interpretation focus:
- Relative median locations vs target or between groups
- Overall height of boxes (variation)
- Whisker length and isolated points (extremes and outliers)
- Asymmetry of the box and whiskers (skewness)

Stem‑and‑Leaf Plots

Stem‑and‑leaf plots are text‑based displays that show both shape and raw data values.

Use them when:
- Data sets are small to moderate
- You want a quick view of distribution while retaining actual values

They help:
- Reveal distribution shape (similar to histogram)
- Detect clusters and gaps
- View individual data points directly

Interpretation is similar to histograms, but with exact values visible.

---

Assessing Normality with Graphs

Many statistical methods assume normality. Graphical tools help evaluate whether this assumption is reasonable.

Normal Probability Plots (Q‑Q Plots)

A normal probability plot compares ordered sample data to the expected quantiles of a normal distribution.
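
As a rough illustration of how the plotted points are formed, the sketch below pairs each ordered observation with a standard normal quantile and uses the correlation of those pairs as a simple measure of straightness. The samples, the function names, and the plotting positions (i − 0.5)/n are assumptions; packages use several plotting-position variants, and the correlation is only a numeric aid, not a formal test.

```python
from statistics import NormalDist, mean

def normal_plot_points(values):
    """Pair each ordered observation with its expected standard normal quantile."""
    data = sorted(values)
    n = len(data)
    # Assumption: plotting positions (i - 0.5) / n; packages use several variants.
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    theoretical = [NormalDist().inv_cdf(p) for p in probs]
    return list(zip(theoretical, data))

def straightness(points):
    """Pearson correlation of the plotted pairs; near 1 means a nearly straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical samples: one roughly symmetric, one strongly right-skewed.
symmetric = [9.1, 9.4, 9.6, 9.8, 9.9, 10.0, 10.1, 10.2, 10.4, 10.6, 10.9]
skewed = [1, 1, 2, 2, 3, 3, 4, 6, 9, 15, 40]

r_symmetric = straightness(normal_plot_points(symmetric))
r_skewed = straightness(normal_plot_points(skewed))
print(f"symmetric: r = {r_symmetric:.3f}, skewed: r = {r_skewed:.3f}")
```

The pairs returned by `normal_plot_points` are the coordinates one would plot; the skewed sample bends away from a straight line, which shows up here as a visibly lower correlation.
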
How to read:
- Plot points form an approximately straight line:
  - Normal assumption is reasonable
- Systematic curvature:
  - S‑shaped: light tails or heavy tails depending on direction
  - Concave up or concave down: skewness (right or left)
- Outlying points far from the line:
  - Potential outliers or mixture distributions

Key uses:
- Assess suitability of normal‑based methods (e.g., z, t, ANOVA, regression residuals)
- Identify whether transformations or non‑normal methods may be needed

Complementary checks:
- Compare normal probability plot with histogram and boxplot
- Look for consistency among views before deciding on assumptions

Anderson‑Darling Plot (Graphical Support)

Some software combines a normal probability plot with an Anderson‑Darling test result. The plot remains interpreted as a Q‑Q plot, while the test provides a p‑value.

Use the graph to:
- Judge the pattern and practical importance of deviations
- Avoid relying blindly on p‑values, especially with large samples

---

Graphs for Bivariate Analysis

Bivariate analysis examines the relationship between two variables, either continuous‑continuous or continuous‑categorical.

Scatterplots

A scatterplot shows paired data points (x, y) to visualize relationships.
Use scatterplots to:
- Detect linear or nonlinear relationships
- Identify clusters or subgroups
- Detect outliers and leverage points
- Check for potential interactions (when grouped or colored by a third variable)

Interpretation:
- Direction
  - Positive association: y increases as x increases
  - Negative association: y decreases as x increases
  - No clear pattern: weak or no linear association
- Form
  - Linear trend suggests linear models (e.g., regression)
  - Curved patterns suggest nonlinear models or transformations
  - Horizontal bands suggest no relationship
- Strength
  - Tight clustering around a line: stronger relationship
  - Wide scatter: weaker relationship
- Outliers
  - Points far from the general cloud may unduly affect correlation or regression

Enhancements:
- Add fit lines to visually inspect linearity
- Use subgroup symbols or colors to see stratification or interactions

Time Series Plots (Run Charts)

A time series plot shows data in time order (x‑axis as time, y‑axis as measurement).

Use time series plots to:
- Detect trends (upward or downward)
- Identify cycles or seasonality
- Spot sudden shifts or jumps
- Detect special‑cause variation visually
- Observe process behavior before and after changes

Interpretation focus:
- Stability: is the process fluctuating randomly around a consistent level?
- Patterns: long runs above/below mean, trends, cycles
- Abrupt changes: suggest process shifts or external events

Time series plots are often the first step before advanced time‑based tools (e.g., control charts). The core interpretation, pattern over time, remains essential.

Stratified Plots

Sometimes a relationship appears weak overall but strong within subgroups (stratification).
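
A small numeric sketch of this effect: the pooled correlation is close to zero, while the correlation within each subgroup is strong. The machine labels, variables, and values are hypothetical and were chosen only to make the masking visible.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired values."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical (machine, setting, output) records; illustrative data only.
records = [
    ("M1", 1, 10), ("M1", 2, 12), ("M1", 3, 14), ("M1", 4, 16),
    ("M2", 5, 7), ("M2", 6, 9), ("M2", 7, 11), ("M2", 8, 13),
]

# Pooled view: all points treated as one group.
pooled = pearson_r([x for _, x, _ in records], [y for _, _, y in records])

# Stratified view: the same points split by machine.
by_group = {}
for machine, x, y in records:
    xs, ys = by_group.setdefault(machine, ([], []))
    xs.append(x)
    ys.append(y)
within = {m: pearson_r(xs, ys) for m, (xs, ys) in by_group.items()}

print(f"pooled r = {pooled:.2f}")
for machine, r in within.items():
    print(f"{machine}: r = {r:.2f}")
```

On a scatterplot, coloring the points by machine would show two parallel rising bands that the pooled cloud hides.
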
Graphical strategies:
- Scatterplots with subgroup markers (different colors/shapes)
- Side‑by‑side boxplots across categories
- Panel (small multiple) plots by category

Use stratification to:
- Reveal hidden relationships masked by aggregation
- Identify interactions between factors
- Avoid misleading conclusions from pooled data

---

Graphs for Multivariate Analysis

When more than two variables are involved, graphical tools help visualize complex relationships.

Matrix Plots (Scatterplot Matrices)

A matrix plot shows multiple scatterplots for all pairs of variables in a grid.

Use matrix plots to:
- Screen for potential relationships among several variables
- Identify candidate predictors for modeling
- Spot collinearity (strong relationships among predictors)
- Detect clusters or unusual patterns across multiple dimensions

Interpretation:
- Look for panels with visible linear or curved patterns
- Watch for pairs with unusual outliers or separate clusters
- Use correlation coefficients (if included) as a guide, but rely on visual patterns for context

Bubble and 3D‑Like Plots

Bubble plots extend scatterplots by using bubble size (and sometimes color) to represent a third or fourth variable.

Use them to:
- Visualize three variables on one 2D plot
- See how a third variable changes with the x‑y relationship

Limitations:
- Can be hard to interpret with many points or similar sizes
- Should be used to explore hypotheses, not as final proof

Practical guidance:
- Keep the number of variables reasonable per graph
- Use clear legends and scaling
- Confirm graphical impressions with appropriate statistical analysis

---

Graphs for Comparing Groups

Often the goal is to compare distributions or means across categories, factors, or conditions.

Side‑by‑Side Boxplots

Side‑by‑side boxplots display boxplots for several groups on a common scale.
Use them to:
- Compare centers (medians) across groups
- Compare spreads (IQR, whiskers)
- Assess overlap and potential differences
- Detect outliers in each group

Interpretation:
- Large separation between medians suggests a difference
- Non‑overlapping boxes indicate stronger visual evidence of difference
- Differences in IQR or whisker range suggest unequal variances

Side‑by‑side boxplots are especially useful before tests like ANOVA or nonparametric comparisons.

Interval Plots

Interval plots show sample means with confidence intervals, usually 95%, for each group.

Use interval plots to:
- Compare mean estimates visually
- Judge whether intervals overlap or are clearly separated
- Evaluate effect sizes and uncertainty

Interpretation:
- Intervals that do not overlap suggest a more pronounced difference
- Overlap does not automatically mean no difference; use as a screening tool
- Length of intervals reflects precision; wider intervals indicate more uncertainty or smaller sample sizes

Interval plots are often used as a graphical companion to hypothesis tests.

---

Graphs for Categorical and Count Data

Attribute and count data require different graphs than continuous data.

Bar Charts

Bar charts display counts or proportions for categorical levels.

Use bar charts to:
- Show frequency distribution of categories
- Compare relative occurrence of defect types, failure modes, or categories
- Communicate categorical patterns clearly

Key variants:
- Simple bar charts – counts or percentages per category
- Stacked bar charts – show composition within categories
- Clustered bar charts – compare category frequencies across groups

Interpretation:
- Identify most frequent categories
- Compare patterns between subgroups (e.g., before/after)
- Look for categories that dominate or are negligible

Pareto Charts

Pareto charts are bar charts ordered from largest to smallest, often with a cumulative percentage line.
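
The ordering and the cumulative percentage line can be computed directly from category tallies. The sketch below is a minimal illustration; the defect categories, their counts, and the function name `pareto_table` are hypothetical.

```python
def pareto_table(counts):
    """Sort categories by count (descending) and add cumulative percentages."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for category, count in ordered:
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows

# Hypothetical defect tallies (illustrative data only).
defects = {"Scratch": 12, "Misalignment": 48, "Porosity": 6,
           "Wrong label": 30, "Other": 4}

table = pareto_table(defects)
for category, count, cumulative in table:
    print(f"{category:<14}{count:>4}  {'#' * (count // 2):<24} {cumulative:5.1f}%")
```

Each row gives one bar (left to right) and the height of the cumulative line at that bar, which makes it easy to see how few categories account for most of the total.
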
Use Pareto charts to:
- Prioritize problem categories by impact
- Focus improvement on the vital few causes contributing most defects
- Communicate which issues are most important to address first

Interpretation:
- Bars on the left represent highest counts or impact
- Cumulative line shows the cumulative contribution
- Determine where a small number of categories contribute a large share of the total

Practical notes:
- Define a consistent measure (e.g., defect count, cost, downtime)
- Group small remaining categories under an “other” bar when appropriate

---

Graphs for Process Capability and Performance

Graphical capability tools show how a process distribution fits within specification limits.

Capability Histograms

Capability analysis often overlays a fitted distribution curve and specification limits onto a histogram.

Use capability histograms to:
- See whether most data fall within specs
- Visualize centering relative to target
- Observe spread relative to spec width

Interpretation:
- Well‑centered and tight distribution within specs suggests capable performance
- Distribution shifted towards or beyond a limit indicates risk of nonconformance
- Wide spread approaching or crossing limits suggests higher defect rates

Combine capability histograms with numerical indices (e.g., Cp, Cpk) and normality checks, but rely on the graph to understand practical impact.

---

Residual Graphical Analysis in Modeling

When applying statistical models (e.g., regression, ANOVA), residual plots check whether model assumptions are met.

Residuals vs Fits

A residuals vs fitted values plot shows residuals on the y‑axis and fitted (predicted) values on the x‑axis.
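
As a minimal sketch of where those plotted values come from, the code below fits a straight line by ordinary least squares to deliberately curved data and lists the residual at each fitted value. The data and the function name `fit_line` are hypothetical; the sign string is only a text stand-in for the plot.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical data with curvature (y = x squared) that a straight line cannot capture.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# Each (fitted, residual) pair is one point on a residuals vs fits plot.
for f, r in zip(fitted, residuals):
    print(f"fitted = {f:7.2f}   residual = {r:6.2f}")

# Text stand-in for the plot: residual signs in order of increasing fitted value.
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(signs)
```

Positive residuals at both ends with negative residuals in the middle form the curved pattern that points to a missing curvature term, even though the residuals still sum to zero.
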
Use it to check:
- Linearity: residuals should scatter randomly around zero
- Homoscedasticity: spread of residuals should be roughly constant
- Model form: patterns suggest missing terms or wrong functional form

Interpretation:
- Random cloud around zero: assumptions reasonably satisfied
- Funnel shape: varying variance, may need transformation or different model
- Curved pattern: model may be missing curvature or interaction

Residuals vs Predictors

Residuals vs individual predictors help detect:
- Nonlinear relationships with specific predictors
- Omitted terms or interactions
- Predictor ranges where the model fits poorly

Interpretation is similar: look for randomness rather than structure.

Normal Probability Plot of Residuals

A normal probability plot of residuals checks normality of model errors.

Use it to:
- Assess validity of normal‑based inference (e.g., p‑values, confidence intervals)
- Decide whether transformations or robust methods are needed

Interpret as with other normal probability plots: straight line is acceptable, systematic deviations suggest non‑normality.

---

Practical Guidelines for Effective Graphical Analysis

Selecting the Right Graph

Match your objective to the tool:
- Understand distribution: histogram, boxplot, dotplot, stem‑and‑leaf
- Check normality: normal probability plot, histogram
- Relationship between two continuous variables: scatterplot
- Trend over time: time series plot
- Compare groups (continuous outcome): side‑by‑side boxplots, interval plots
- Categorical counts: bar chart, Pareto chart
- Multiple variables: matrix plot, stratified plots
- Model diagnostics: residual plots

Avoiding Common Pitfalls

Watch for these issues:
- Inappropriate graph for data type
  - Example: pie chart instead of bar chart for detailed categorical comparison
- Misleading scales
  - Truncated axes or inconsistent scales between graphs can distort perception
- Overplotting
  - Too many points can obscure patterns; use transparency, jitter, or aggregation
- Ignoring context
  - Always interpret graphs within process knowledge and sampling plan
- Over‑interpreting noise
  - Small apparent patterns, especially with low sample sizes, may be random variation

Combining Graphs with Statistical Methods

Graphical analysis should:
- Precede formal tests to guide tool selection and model choice
- Accompany tests to validate assumptions
- Follow tests to interpret and communicate results

Example flow:
- Use histograms and boxplots to understand data and spot outliers.
- Use normal probability plots to assess normality before applying parametric tests.
- Use scatterplots and matrix plots to explore relationships before building models.
- Use residual plots after modeling to verify assumptions and refine models.

---

Summary

Graphical analysis connects raw data to meaningful insight by making structure, patterns, and anomalies visible. Mastery involves:
- Choosing the right graph for distribution, relationship, comparison, or time‑based questions.
- Interpreting histograms, boxplots, dotplots, and normal probability plots to understand distributions and assess normality.
- Using scatterplots, time series plots, and stratified views to uncover relationships, trends, and hidden subgroups.
- Applying boxplots, interval plots, bar charts, and Pareto charts to compare groups and prioritize issues.
- Using capability histograms and residual plots to evaluate process performance and model assumptions.

Consistent, disciplined use of graphical analysis strengthens every stage of data‑driven problem solving, from initial exploration through modeling and final decision making.

Practical Case: Graphical Analysis

A mid-size hospital’s lab reports that turnaround time (TAT) for routine blood tests is “too long and unpredictable.” Physicians complain that morning results often arrive after ward rounds.

Context and Problem

The Lean Six Sigma team collects two weeks of timestamped data for routine blood tests:
- sample collection time
- lab receipt time
- result validation time

The goal is to understand the current TAT behavior before proposing improvements.

How Graphical Analysis Was Applied

The team uses basic graphs from Minitab/Excel:

1. Time Series Plot (TAT by day and hour)
   - Reveals TAT spikes between 6–9 a.m. on weekdays.
   - Shows relatively stable TAT during afternoons and weekends.
2. Boxplots (TAT by shift and test type)
   - Morning shift boxplot is visibly higher and more spread out.
   - One specific test panel shows a much higher median and wider spread than others.
3. Histogram of TAT
   - Shows a long right tail indicating a subset of extreme delays.
   - Confirms that the “average TAT” alone hides significant late outliers.
4. Dotplot of TAT vs. Sample Arrival Time
   - Shows clustering of delayed samples when arrival volume peaks.
   - Highlights a “bottleneck window” where many samples arrive together and delays start.
5. Process Map Overlaid with Cycle Time Bars
   - Not a detailed analysis, just a quick visual overlay of average step times.
   - Visually highlights a queue forming before centrifugation in the morning.

Result

From the graphs, the team identifies that:
- delays are concentrated in the morning shift,
- primarily for one high-volume test panel,
- with a clear bottleneck at centrifugation during the 6–9 a.m. window.

Without changing staffing levels, they:
- stagger phlebotomy rounds by ward,
- pre-stage centrifuges before 6 a.m.,
- prioritize the problematic test panel in the morning batch.

Follow-up graphs (time series and boxplots) over the next month show:
- median TAT reduced and more consistent for the morning shift,
- extreme outliers in TAT significantly reduced,
- complaints from physicians dropping correspondingly.

Practice question: Graphical Analysis

A Black Belt is analyzing call center handle time data that are highly right-skewed with several extreme long calls. She wants to visually assess whether a log transformation improves normality and stabilizes variance before performing a capability analysis. Which graphical approach is most appropriate?

A. Time series plot of raw data
B. Boxplot of raw versus log-transformed data
C. Normal probability plots of raw and log-transformed data, side by side
D. Scatter plot of raw data versus log-transformed data

Answer: C

Reason: Normal probability plots directly assess normality by comparing data to a theoretical normal distribution; plotting raw and transformed data side by side allows evaluation of improvement in linearity (normality) and detection of reduced skew. Options A, B, and D provide useful descriptive views but do not directly and rigorously assess normality before capability analysis.

---

An engineer wants to determine whether a process shift occurred after a new machine setting was implemented at a known point in time. There is a continuous quality characteristic measured daily for 60 days before and 60 days after the change. Which graphical analysis is most appropriate as a first step?

A. Side-by-side boxplots of “before” vs “after” data
B. Time series plot with a reference line at the change point
C. Scatter plot of measurement versus operator
D. Histogram of all data combined

Answer: B

Reason: A time series plot with the change point marked allows visual assessment of level shifts, trends, and patterns over time relative to the implementation date, which is critical for detecting a process shift. Option A summarizes distributions but loses temporal sequence; C is unrelated to time; D masks the shift by combining data before and after the change.

---

A Black Belt reviews a main effects plot from a designed experiment with factors A, B, and C.
The lines for factor A show a steep slope, while lines for B and C are nearly flat. Interaction plots show no clear crossing of lines. Which interpretation is most appropriate?

A. Only factor A appears to have a practical main effect; B and C do not.
B. All three factors have strong main effects with no interactions.
C. There are strong interactions among all factors but no main effects.
D. No conclusions can be drawn from graphical analysis alone.

Answer: A

Reason: In main effects plots, the steepness of the line indicates the strength of the main effect; a steep slope for A and nearly flat lines for B and C indicate A is influential while B and C likely have negligible effects. Absence of crossing in interaction plots suggests limited interaction. Option B overstates effects; C contradicts the non-crossing interaction plots; D ignores the legitimate insights that main and interaction plots provide in DOE.

---

A process capability study is performed on a critical dimension measured in millimeters. The histogram with specification limits overlaid shows a bimodal distribution, even though the overall mean is within specifications and the calculated Cp and Cpk values appear acceptable. Which conclusion is most appropriate?

A. The process is capable, and no further analysis is required.
B. The bimodal distribution suggests a mixture of sources; investigate subgroups before trusting capability indices.
C. The histogram is not relevant; only Cp and Cpk matter.
D. The bimodal shape means the measurement system is automatically unacceptable.

Answer: B

Reason: A bimodal histogram suggests the data may come from different process conditions (e.g., shifts, machines, lots). Capability indices assume a single, stable distribution; mixture must be investigated via stratified or subgrouped graphical analysis before relying on Cp/Cpk. Option A ignores a critical signal; C misuses capability indices; D over-attributes the pattern to measurement error without evidence.

---

A Black Belt is comparing defect rates (proportion nonconforming) for five production lines over the last quarter. Data are defect counts and unit counts per line. She wants a graphical tool to compare performance and visually identify lines with significantly different defect rates. Which is most appropriate?

A. Boxplots of defect counts by line
B. Individual X chart of total defects over time
C. P-chart stratified by line
D. Pareto chart of defect types across all lines combined

Answer: C

Reason: A P-chart is designed for proportions (defect rate) with potentially varying sample sizes; stratifying by line allows comparison of performance and visual detection of statistically significant differences among lines. Option A ignores the denominator (units inspected); B does not separate lines; D loses line-level comparison by aggregating lines into a single Pareto of defect types.
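
The reasoning in the last question can be checked numerically. The sketch below computes each line's proportion nonconforming and the usual 3-sigma P-chart limits around the pooled proportion, p̄ ± 3·sqrt(p̄(1 − p̄)/n), so the limits widen or narrow with each line's sample size. The line names, counts, and the function name `p_chart_limits` are hypothetical and are not data from the question.

```python
import math

def p_chart_limits(defects, units):
    """Per-line proportion with 3-sigma limits around the pooled proportion."""
    p_bar = sum(defects.values()) / sum(units.values())
    rows = {}
    for line, n in units.items():
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        lcl = max(0.0, p_bar - 3 * sigma)  # a proportion cannot fall below 0
        ucl = min(1.0, p_bar + 3 * sigma)  # or rise above 1
        p = defects[line] / n
        rows[line] = (p, lcl, ucl, not lcl <= p <= ucl)
    return p_bar, rows

# Hypothetical quarterly totals per production line (illustrative data only).
defects = {"L1": 40, "L2": 38, "L3": 90, "L4": 45, "L5": 37}
units = {"L1": 2000, "L2": 1800, "L3": 2200, "L4": 2100, "L5": 1900}

p_bar, rows = p_chart_limits(defects, units)
flagged = [line for line, (_, _, _, outside) in rows.items() if outside]

print(f"pooled proportion = {p_bar:.4f}")
for line, (p, lcl, ucl, outside) in rows.items():
    note = "  <-- outside limits" if outside else ""
    print(f"{line}: p = {p:.4f}  limits = [{lcl:.4f}, {ucl:.4f}]{note}")
```

A boxplot of raw defect counts would not show this, because it ignores how many units each line produced.
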
