3.5.2 Kruskal-Wallis
Kruskal-Wallis Introduction

The Kruskal-Wallis test is a nonparametric statistical test used to compare three or more independent groups on a continuous or ordinal outcome. It is often used as a substitute for one-way ANOVA when data do not meet ANOVA’s assumptions, especially normality and equal variances.

Kruskal-Wallis is based on ranks rather than raw data, making it robust against outliers and non-normal distributions. It tests whether the groups come from the same population distribution.

---

Purpose of the Kruskal-Wallis Test

When to Use Kruskal-Wallis

Use Kruskal-Wallis when:

- Objective: Compare the central tendency of 3+ independent groups.
- Data type: Ordinal or continuous data that may be skewed or non-normal.
- Assumptions for parametric ANOVA are doubtful or clearly violated.
- Design: One factor (one-way), with independent samples.

Typical examples:

- Comparing customer satisfaction scores (Likert scales) across several branches.
- Comparing cycle times across several process designs when data are skewed.
- Comparing defect detection scores across more than two training methods.

Relationship to Other Tests

- Alternative to one-way ANOVA:
  - ANOVA compares means assuming normality and equal variances.
  - Kruskal-Wallis compares distributions using ranks and is less sensitive to these assumptions.
- Extension of Mann-Whitney:
  - Mann-Whitney (Wilcoxon rank-sum) compares 2 independent groups.
  - Kruskal-Wallis generalizes this concept to 3 or more independent groups.

---

Assumptions and Data Requirements

Core Assumptions

For valid results, Kruskal-Wallis requires:

- Independent groups:
  - Each observation belongs to only one group.
  - No subject appears in more than one group.
- Independent observations:
  - Measurements from different participants or items are not correlated.
- Ordinal or higher data:
  - You must be able to rank the data meaningfully.
  - Interval and ratio data are acceptable; nominal data are not.
- Similar distribution shapes:
  - Groups should have similarly shaped distributions.
  - If this holds, Kruskal-Wallis mainly compares medians.
  - If not, it signals differences in overall distributions, not just central tendency.

When Not to Use Kruskal-Wallis

Avoid Kruskal-Wallis when:

- You have paired or repeated measures data (within-subjects design); use a repeated-measures nonparametric method such as the Friedman test instead.
- Groups are not independent (clustered in ways that create correlation).
- Data are purely categorical without an inherent order.

---

Conceptual Basis: Ranking and H Statistic

Ranking the Data

Kruskal-Wallis converts raw data to ranks:

- Combine all observations from all groups into a single list.
- Rank them from smallest (rank 1) to largest (rank N, where N is total sample size).
- For ties, assign each tied value the average of the ranks they would occupy.

This rank transformation:

- Reduces the impact of non-normality and outliers.
- Makes the test robust when distributions are skewed.

The H Test Statistic

Kruskal-Wallis uses a test statistic called H. Conceptually:

- It compares the sum of ranks in each group to what would be expected if all groups came from the same population.
- If group medians (or distributions) are similar, the mean ranks of the groups should be similar (rank sums also depend on group size).
- Larger differences in mean ranks lead to larger H values.

H is approximately chi-square distributed under the null hypothesis when sample sizes are not too small.

---

Hypotheses, Interpretation, and Decision Rules

Hypotheses

- Null hypothesis (H₀):
  - All groups come from the same population distribution.
  - Equivalently, there is no difference in distribution location (medians), assuming similar shapes.
- Alternative hypothesis (H₁):
  - At least one group’s distribution differs from at least one other group.

Kruskal-Wallis is a global test: it tells you that differences exist but not which groups differ.
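The ranking step described earlier in this section (pool the observations, sort them, and give tied values the average of the ranks they span) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `average_ranks` and the sample values are hypothetical, and statistical software performs this step internally.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Mean of the 1-based positions pos+1 .. end+1 occupied by this run.
        shared = (pos + 1 + end + 1) / 2
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


# Hypothetical pooled observations; the value 15 occurs three times.
pooled = [12, 15, 15, 9, 20, 15]
print(average_ranks(pooled))  # [2.0, 4.0, 4.0, 1.0, 6.0, 4.0]
```

The three tied values would occupy ranks 3, 4, and 5, so each receives the average rank 4.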
Significance and Decision

After calculating H (and possibly a tie-corrected version):

- Determine the degrees of freedom:
  - df = k − 1, where k is the number of groups.
- Obtain the p-value using the chi-square distribution with df = k − 1.
- Decision rule:
  - If p-value ≤ α (commonly 0.05): reject H₀; at least one group differs.
  - If p-value > α: fail to reject H₀; no statistically significant difference detected.

The test is non-directional: a significant result says only that some difference exists, not which group is higher.

---

Manual Calculation Steps

Step 1: Organize the Data

- List all observations with their group labels.
- Let:
  - k = number of groups.
  - nᵢ = sample size of group i.
  - N = total sample size = n₁ + n₂ + … + n_k.

Step 2: Rank All Observations

- Sort all N values from smallest to largest.
- Assign ranks 1 through N.
- For ties:
  - Compute the average of the ranks that would be assigned if there were no ties.
  - Assign that average rank to each tied observation.

Step 3: Compute Rank Sums and Means per Group

For each group i:

- Sum of ranks: Rᵢ.
- Mean rank: R̄ᵢ = Rᵢ / nᵢ.

These rank sums and mean ranks show which groups tend to have higher or lower values.

Step 4: Compute the H Statistic (No Tie Correction)

If there are no ties, use:

H = [12 / (N(N + 1))] × Σ [Rᵢ² / nᵢ] − 3(N + 1)

where the sum Σ is over all groups i = 1 to k.

Step 5: Correct for Ties (When Present)

Ties affect the distribution of ranks. Use a correction factor C:

- Let:
  - For each set of tied values j, tⱼ is the number of observations sharing that value.
- Compute:

C = 1 − [Σ (tⱼ³ − tⱼ)] / [N³ − N]

Then use the tie-corrected H:

H_corrected = H / C

Use H_corrected when comparing to the chi-square distribution.

Step 6: Obtain p-Value and Compare to Chi-Square

- Degrees of freedom: df = k − 1.
- Use the chi-square distribution with df = k − 1 to get the p-value for H_corrected.
- Compare p-value to the chosen significance level α.
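Steps 1 through 6 can be traced in code. The following is a minimal Python sketch, not a reference implementation: the function name `kruskal_wallis_h` and the sample values are hypothetical and chosen only for illustration. The p-value line relies on the fact that a chi-square variable with 2 degrees of freedom has the closed-form tail probability exp(−x/2), so it applies to the three-group case only; for other values of k, a statistics library's chi-square routine would be needed.

```python
from collections import Counter
from math import exp


def kruskal_wallis_h(groups):
    """Return (H, tie-corrected H) for a list of samples."""
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    counts = Counter(pooled)  # size t_j of each set of tied values

    # Step 2: average rank for each distinct value.
    rank_of = {}
    next_rank = 1
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = next_rank + (t - 1) / 2
        next_rank += t

    # Steps 3-4: rank sums R_i, then H without tie correction.
    weighted = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

    # Step 5: tie correction C (C is 0 only if every value is identical).
    tie_term = sum(t**3 - t for t in counts.values())
    c = 1 - tie_term / (n_total**3 - n_total)
    return h, h / c


# Hypothetical samples for three groups; the value 12 appears three times.
samples = [[10, 12, 12], [12, 15, 16], [18, 20, 21]]
h, h_corrected = kruskal_wallis_h(samples)

# Step 6 for k = 3 only: chi-square with df = 2 has P(X > x) = exp(-x / 2).
p_value = exp(-h_corrected / 2)
print(round(h, 3), round(h_corrected, 3), round(p_value, 3))
```

Because the tie correction C is at most 1, the corrected statistic is never smaller than the uncorrected one.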
For typical process improvement applications, the chi-square approximation is adequate when all group sizes are reasonably large (commonly all nᵢ ≥ 5, preferably ≥ 7–10).

---

Post-Hoc Analysis and Multiple Comparisons

Need for Post-Hoc Tests

If Kruskal-Wallis is significant:

- You know at least one group differs.
- To identify which groups differ, perform pairwise comparisons.

Kruskal-Wallis alone does not specify the pattern of differences.

Common Nonparametric Pairwise Methods

For pairwise follow-up comparisons:

- Mann-Whitney (Wilcoxon rank-sum) on each pair of groups:
  - Compare Group A vs Group B, Group A vs Group C, etc.
  - Use the same rank-based logic as Kruskal-Wallis for two groups.
- Dunn’s test (if available in software):
  - Specifically designed as a post-hoc test for Kruskal-Wallis.
  - Uses a standardized difference in mean ranks, with adjustment for multiple comparisons.

Multiple Comparison Adjustments

Because multiple pairwise tests inflate Type I error:

- Adjust p-values or significance levels using methods such as:
  - Bonferroni adjustment: divide α by the number of comparisons (equivalently, multiply each p-value by the number of comparisons, capped at 1).
  - Other corrections may be available in software interfaces.

Key idea: ensure that your overall error rate remains controlled when testing several pairs.

---

Effect Size and Practical Significance

Why Effect Size Matters

A statistically significant Kruskal-Wallis result does not automatically imply a large or meaningful difference. Effect size helps quantify how substantial the difference is.

Common Effect Size Measures

For Kruskal-Wallis, you can use:

- Eta-squared (η²) approximation:
  - η² ≈ (H − k + 1) / (N − k)
  - Interpretation (rough guidance, not strict rules):
    - Around 0.01: small effect
    - Around 0.06: medium effect
    - Around 0.14: large effect
- Epsilon-squared (ε²):
  - ε² = H(N + 1) / (N² − 1), which simplifies to H / (N − 1)
  - Similar interpretation, but slightly different scaling.

Choosing between η² and ε² is often dictated by convention or software output.
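As a quick numerical illustration of the two effect size measures, the sketch below computes both from an H value. The function names and the inputs (H = 7.2, k = 3, N = 37) are hypothetical. Epsilon-squared is computed in its commonly cited rank-based form, ε² = H(N + 1) / (N² − 1), which simplifies to H / (N − 1).

```python
def eta_squared_h(h, k, n_total):
    """Eta-squared approximation based on H: (H - k + 1) / (N - k)."""
    return (h - k + 1) / (n_total - k)


def epsilon_squared(h, n_total):
    """Rank-based epsilon-squared: H (N + 1) / (N^2 - 1) = H / (N - 1)."""
    return h / (n_total - 1)


# Hypothetical inputs: H = 7.2 from k = 3 groups with N = 37 observations.
h, k, n_total = 7.2, 3, 37
print(round(eta_squared_h(h, k, n_total), 3))  # 0.153
print(round(epsilon_squared(h, n_total), 3))   # 0.2
```

Under the rough guidance above, both values would fall in the large-effect range for these illustrative inputs.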
The key is to interpret the magnitude in the context of the process.

---

Assumption Checks and Practical Diagnostics

Distribution Shape Similarity

Kruskal-Wallis is often described as comparing medians, but this is precise only when distributions have similar shapes and spreads.

To check this:

- Compare:
  - Group medians
  - Spreads (e.g., interquartile ranges)
  - Shapes (e.g., skew direction)

If group distributions differ strongly in shape (not just location), then:

- A significant Kruskal-Wallis result indicates at least one group distribution is different overall.
- Interpretation should emphasize distributional differences, not just medians.

Sample Size Considerations

- Larger sample sizes:
  - Make the chi-square approximation more accurate.
  - Provide higher power to detect meaningful differences.
- Very small sample sizes:
  - The approximation may be less accurate.
  - Exact methods may be needed; some software applies them automatically.

---

Practical Implementation with Statistical Software

Typical Software Inputs

Most statistical packages or tools require you to specify:

- Response variable:
  - The measurement you are comparing (e.g., time, score, rating).
- Factor or group variable:
  - The categorical variable defining the groups (e.g., method, location, treatment).

Typical Outputs

Expect to see:

- H statistic (sometimes labeled simply as the test statistic).
- Degrees of freedom (df).
- p-value.
- Mean ranks per group or rank sums.
- Post-hoc test results (if requested).

Focus interpretation on:

- Whether p-value ≤ α.
- Direction and magnitude of differences in mean ranks.
- Effect size and practical impact on the process or system being studied.

---

Common Pitfalls and Misinterpretations

Misusing Kruskal-Wallis

Avoid these errors:

- Using Kruskal-Wallis with paired data:
  - It is not designed for repeated measures or matched pairs.
- Treating nominal categories as ordered:
  - If categories have no natural order, do not treat them as ordinal.
- Ignoring distribution shape differences:
  - Assuming it compares medians without checking whether shapes are similar.

Over-Reliance on p-Values

- A significant p-value:
  - Indicates evidence of a difference.
  - Does not describe how large or important the difference is.

Combine Kruskal-Wallis results with:

- Effect size estimates.
- Contextual knowledge of the process.
- Visual data summaries where possible (e.g., boxplots).

---

Summary

Kruskal-Wallis is a nonparametric, rank-based test used to compare three or more independent groups on an ordinal or continuous outcome when assumptions of one-way ANOVA are not met. It:

- Uses ranks to reduce sensitivity to non-normality and outliers.
- Tests the null hypothesis that all groups come from the same distribution.
- Produces an H statistic, approximately chi-square distributed with df = k − 1.
- Requires independent groups, independent observations, and at least ordinal data.
- Is most interpretable as a test of median differences when group distributions share similar shapes.
- Signals the presence of differences but requires post-hoc pairwise tests to identify which groups differ.
- Benefits from effect size measures to assess practical significance.

Mastering Kruskal-Wallis involves understanding when to use it, how it is calculated and interpreted, how to conduct and adjust post-hoc analyses, and how to connect statistical findings to meaningful conclusions about group differences.
Practical Case: Kruskal-Wallis

A manufacturing plant wants to reduce assembly time variation across three shifts (day, evening, night). Data show assembly times are skewed with outliers, so a nonparametric test is needed.

The Black Belt collects a small random sample of assembly times per operator from each shift. Because normality and equal variance assumptions are clearly violated and sample sizes are unequal, the team selects Kruskal-Wallis to compare median assembly times across the three shifts.

Using statistical software, the Black Belt:

1. Enters assembly time as the response and shift as the factor.
2. Runs a Kruskal-Wallis test to see if at least one shift has a different median assembly time.
3. Obtains a p-value below the team’s alpha level, concluding there is a statistically significant difference among shifts.
4. Uses the software’s post-hoc pairwise comparisons (with adjusted p-values) to identify that the night shift’s median time is significantly higher than both day and evening; day and evening do not differ.

The team then focuses its improvement efforts specifically on the night shift: reviewing staffing levels, training, and setup standardization, rather than launching plant-wide changes.
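A workflow like the one in this case might look as follows in Python with SciPy. This is a sketch under stated assumptions: the case does not name the software or report the data, so the assembly times below are invented purely for illustration, and the post-hoc step uses pairwise Mann-Whitney tests with a Bonferroni adjustment as one possible choice rather than the method the team actually used. `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu` are existing SciPy functions.

```python
from itertools import combinations

from scipy.stats import kruskal, mannwhitneyu

# Hypothetical assembly times (minutes) per shift; group sizes are unequal.
shifts = {
    "day": [31, 29, 35, 30, 33, 28, 32],
    "evening": [30, 34, 29, 36, 31, 33],
    "night": [41, 38, 45, 39, 52, 40, 44, 37],
}

# Global test: response = assembly time, factor = shift.
h_stat, p_value = kruskal(*shifts.values())
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")

# Post-hoc: pairwise Mann-Whitney tests with Bonferroni-adjusted p-values.
pairs = list(combinations(shifts, 2))
adjusted = {}
for a, b in pairs:
    _, p_pair = mannwhitneyu(shifts[a], shifts[b], alternative="two-sided")
    adjusted[f"{a} vs {b}"] = min(1.0, p_pair * len(pairs))

for label, p_adj in adjusted.items():
    print(f"{label}: adjusted p = {p_adj:.4f}")
```

With these illustrative numbers, every night-shift time exceeds every day and evening time, so the pattern mirrors the case: night differs from both other shifts, while day and evening do not differ.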
Practice question: Kruskal-Wallis

A Black Belt is comparing customer satisfaction scores (1–10 scale) across four service centers. The scores are clearly skewed and contain several extreme outliers. The Black Belt decides to use the Kruskal-Wallis test. Which is the most appropriate justification?

A. It compares medians across multiple groups without assuming normality
B. It estimates differences in group variances without assuming equal variances
C. It replaces the need for any graphical analysis before hypothesis testing
D. It is required whenever the number of groups exceeds three

Answer: A

Reason: Kruskal-Wallis is a nonparametric test that compares the central tendency (location, often interpreted as median) of more than two independent groups using ranks, making it suitable for skewed data with outliers and without assuming normality. Other options are incorrect because the test does not focus on variance estimation (B), does not replace graphical analysis (C), and is chosen because of the data’s distributional properties, not because of the number of groups alone (D).

---

A Black Belt conducts a Kruskal-Wallis test on three process designs with sample sizes of 12, 15, and 10. The test statistic H is 7.20 with a p-value of 0.027. Using α = 0.05, what is the correct conclusion?

A. Fail to reject H₀; there is no statistically significant difference among designs
B. Reject H₀; at least one process design differs in distribution from the others
C. Fail to reject H₀; the test is invalid because group sizes are unequal
D. Reject H₀; all three process designs have different medians

Answer: B

Reason: With p = 0.027 < 0.05, the null hypothesis of equal distributions (typically equal medians) is rejected, indicating that at least one group differs, but not specifying which or how many. Options A and C are incorrect because unequal sample sizes are allowed and the p-value indicates significance; D is too strong, as Kruskal-Wallis does not prove that all groups differ from each other.
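The p-value quoted in the question above can be checked by hand. With three designs, df = 3 − 1 = 2, and a chi-square variable with 2 degrees of freedom has the closed-form tail probability exp(−x/2). The short Python sketch below applies that identity; the helper name `chi2_sf_df2` is hypothetical, and the shortcut holds only for df = 2.

```python
from math import exp


def chi2_sf_df2(x):
    """P(X > x) for a chi-square variable with 2 degrees of freedom."""
    return exp(-x / 2)


k = 3            # number of process designs
df = k - 1       # degrees of freedom for the Kruskal-Wallis test
p = chi2_sf_df2(7.20)
print(df, round(p, 3))  # 2 0.027
```

The result agrees with the stated p-value of 0.027, which is below α = 0.05.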
---

A Black Belt compares cycle time (heavily right-skewed, ordinally recorded in bands) across five suppliers using Kruskal-Wallis. After ranking all observations and calculating H, the Black Belt obtains H = 3.1 with 4 degrees of freedom. The chi-square critical value at α = 0.05 and 4 df is 9.49. What is the correct interpretation?

A. Reject H₀; at least one supplier’s median cycle time is different
B. Fail to reject H₀; there is no evidence of a difference in distributions
C. Reject H₀; the test statistic must exceed 2.0 to show significance
D. Fail to reject H₀; the test is invalid since data are not interval

Answer: B

Reason: Since H = 3.1 < 9.49, the test statistic does not exceed the critical chi-square value; therefore, at α = 0.05 we fail to reject the null hypothesis of equal distributions. Options A and C misinterpret the decision rule, while D is incorrect because Kruskal-Wallis is appropriate for ordinal data and does not require interval-level measurement.

---

A Black Belt is deciding between one-way ANOVA and Kruskal-Wallis for comparing three treatments. A normality test shows p = 0.001 for two groups, and boxplots indicate severe outliers and unequal variances. Sample sizes are small (n = 7, 8, 6). Which is the most appropriate choice and primary rationale?

A. One-way ANOVA, because it is robust to all violations if α is set to 0.10
B. Kruskal-Wallis, because it does not require normality or equal variances
C. One-way ANOVA, because Kruskal-Wallis requires equal group sizes
D. Kruskal-Wallis, because it assumes normality only in the largest group

Answer: B

Reason: Kruskal-Wallis is a rank-based nonparametric alternative to one-way ANOVA, suitable when normality and homoscedasticity assumptions are violated, particularly with small, unequal sample sizes and severe outliers. Options A and C misstate ANOVA robustness and Kruskal-Wallis requirements; D incorrectly describes the test’s assumptions.
---

A Black Belt runs a Kruskal-Wallis test on four call centers and obtains a significant result. What is the most appropriate next analytical step to support process decisions?

A. Conclude all call centers differ and implement changes at all locations
B. Perform appropriate post-hoc pairwise comparisons on the ranks
C. Re-run the test as one-way ANOVA to confirm the result
D. Increase α to 0.10 until the p-value becomes non-significant

Answer: B

Reason: A significant Kruskal-Wallis result indicates at least one difference among groups, so the next step is to perform post-hoc pairwise nonparametric comparisons (e.g., Dunn’s test) to identify which centers differ for targeted actions. Option A overgeneralizes, C is not required and mixes methodologies without purpose, and D is poor statistical practice and invalidates the inference.
