
3.5.3 Mood’s Median

Mood’s Median

Concept and Purpose

Mood’s Median test is a nonparametric hypothesis test used to compare the medians of two or more independent groups.

- Goal: Determine whether population medians are equal.
- Type: Distribution-free, uses medians instead of means.
- Design: One-factor, independent samples, k groups (k ≄ 2).

It is especially useful when:

- Data are not normally distributed.
- There are outliers that distort the mean.
- Only ordinal (or “at least ordinal”) data are available.
- Variances are unequal and classical ANOVA assumptions are not met.

Assumptions and Data Requirements

Data and Design Assumptions

- Independent samples: Observations within each group are independent of each other.
- Independent groups: Different groups do not share participants or repeated measures.
- At least ordinal scale: Data must be rankable or better (interval and ratio scales are fine).
- Similar shapes (desirable): Interpretation is cleanest when group distributions have similar shapes, so differences can reasonably be attributed to medians.

The test is robust and does not require:

- Normality.
- Homogeneity of variance.
- Equal group sizes.

Hypotheses and Test Logic

Hypotheses

For k groups:

- Null hypothesis (H₀): All population medians are equal.
- Alternative hypothesis (H₁): At least one population median is different.

For two groups, this is equivalent to:

- H₀: Median₁ = Median₂
- H₁: Median₁ ≠ Median₂

The test is generally treated as two-sided: a sufficiently large difference in medians, in either direction (higher or lower), leads to rejection of H₀.

Conceptual Logic

- Pool all observations across groups.
- Find the overall median of the pooled data.
- For each group, classify observations as:
  - Above the overall median.
  - At or below the overall median.
- Count how many observations in each group fall into each category.
- Compare the observed counts with the counts expected if all groups had the same median.
- Use a chi-square statistic to quantify the discrepancy between observed and expected counts.

If the discrepancy is large relative to what chance alone would produce, reject H₀.

Step-by-Step Procedure

Step 1: Pool Data and Compute Overall Median

- Collect all observations from all groups into a single combined list.
- Compute the overall median of this pooled dataset.

If there is an even number of total observations, any conventional definition of median is acceptable; common practice is to use the average of the two central values.

Step 2: Classify Observations

For each observation:

- Label as “above median” if strictly greater than the overall median.
- Label as “at or below median” if equal to or less than the overall median.

For each group:

- Count Aᔹ: number of observations above the overall median.
- Count Bᔹ: number of observations at or below the overall median.
- Let nᔹ = Aᔹ + Bᔹ be the group sample size.

Let:

- N: total sample size across all groups.
- T: total number of observations above the overall median across all groups.
- N − T: total number at or below the overall median.

Step 3: Compute Expected Counts

Under H₀ (equal medians), each group would be expected to have:

- E(Aᔹ) = nᔹ × (T / N) observations above median.
- E(Bᔹ) = nᔹ × ((N − T) / N) observations at or below median.

These are the expected frequencies if the probability of being above the median is the same in every group.

Step 4: Compute Chi-Square Test Statistic

The Mood’s Median test statistic is a chi-square statistic based on the 2 × k table of counts (above vs at/below by group):

\[ \chi^2 = \sum_{i=1}^{k} \left[ \frac{(A_i - E(A_i))^2}{E(A_i)} + \frac{(B_i - E(B_i))^2}{E(B_i)} \right] \]

Where:

- Aᔹ, Bᔹ: observed counts.
- E(Aᔹ), E(Bᔹ): expected counts under H₀.

For the usual large-sample version:

- Degrees of freedom (df) = k − 1

The test relies on the chi-square approximation, which is more accurate when expected frequencies are not too small (commonly at least about 5 per cell).
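
Steps 1 to 4 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function name `moods_median_statistic` and the sample data are hypothetical, and ties are placed in the “at or below” category as described in Step 2.

```python
from statistics import median


def moods_median_statistic(groups):
    """Return (chi2, df, overall_median, table) following Steps 1-4.

    `groups` is a list of lists of numbers, one inner list per group.
    `table` holds one (above, at_or_below) pair of counts per group.
    """
    # Step 1: pool all observations and take the overall median.
    pooled = [x for g in groups for x in g]
    overall = median(pooled)

    # Step 2: count values strictly above vs. at or below the overall median.
    table = []
    for g in groups:
        above = sum(1 for x in g if x > overall)
        table.append((above, len(g) - above))

    n_total = len(pooled)
    t_above = sum(a for a, _ in table)
    if t_above == 0 or t_above == n_total:
        raise ValueError("all observations fall on one side of the median")

    # Steps 3-4: expected counts under H0 and the chi-square sum.
    chi2 = 0.0
    for a, b in table:
        n_i = a + b
        exp_a = n_i * t_above / n_total
        exp_b = n_i * (n_total - t_above) / n_total
        chi2 += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b

    return chi2, len(groups) - 1, overall, table


# Hypothetical measurements for three groups (made up for illustration).
samples = [
    [12, 15, 11, 19, 14, 13, 16, 9, 10, 17],
    [22, 25, 18, 30, 21, 27, 24, 16, 29, 20],
    [16, 14, 20, 17, 15, 23, 19, 13, 21, 18],
]
chi2, df, overall, table = moods_median_statistic(samples)
print(f"overall median = {overall}, table = {table}")
print(f"chi-square = {chi2:.3f}, df = {df}")
```

With these made-up numbers, every expected count is 5, so the data sit right at the usual guideline for the chi-square approximation.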

Step 5: Determine p-Value and Decision

- Use the chi-square distribution with df = k − 1 to compute the p-value for the calculated χÂČ.
- Compare the p-value to the chosen significance level α (often 0.05):
  - If p ≀ α, reject H₀ (evidence that at least one group median differs).
  - If p > α, fail to reject H₀ (insufficient evidence to say medians differ).

The test is non-directional as implemented here; it does not specify which group median is higher or lower, only that not all medians are equal.

Handling Ties and Special Cases

Ties at the Overall Median

Values exactly equal to the overall median are classified as “at or below median.” This is standard practice and maintains the 2 × k structure:

- Category 1: Above median.
- Category 2: At or below median (including ties).

This can sometimes produce a large “at or below” group when many values equal the median, slightly reducing power but not invalidating the test.

Small Sample and Exact Tests

When:

- Total sample size is small, or
- Some expected counts are very low,

the chi-square approximation may be inaccurate. In such cases:

- An exact test for the 2 × k contingency table can be used, based on the exact distribution of counts under H₀.
- Software may automatically adjust or warn about low expected counts.

For many practical applications with modest or large samples, the chi-square approximation is adequate and commonly used.

Interpretation and Usage

What a Significant Result Means

If H₀ is rejected:

- There is statistical evidence that not all group medians are equal.
- Mood’s Median test does not indicate:
  - Which specific groups differ.
  - The direction (higher or lower) for each pair.
  - The magnitude of median differences.

To interpret results meaningfully:

- Examine sample medians for each group to understand direction and size of differences.
- Consider graphical summaries (boxplots, dotplots) to visualize median differences and data spread.
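
Returning to the decision rule in Step 5, the p-value for a given χÂČ and df = k − 1 can be computed without a statistics package. The sketch below is one possible approach, assuming integer degrees of freedom; the helper names `chi2_sf` and `decide` and the example statistic are hypothetical. It uses the standard closed-form recurrence for the chi-square upper tail rather than any particular software's method.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1.

    Uses the recurrence for the regularized upper incomplete gamma function
    Q(a, x/2), stepping a up to df/2 from 1/2 (odd df) or from 1 (even df).
    """
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2.0
    if df % 2:  # odd df: start from Q(1/2, x/2) = erfc(sqrt(x/2))
        a, q = 0.5, math.erfc(math.sqrt(half))
    else:       # even df: start from Q(1, x/2) = exp(-x/2)
        a, q = 1.0, math.exp(-half)
    while a < df / 2.0:
        # Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def decide(chi2, k, alpha=0.05):
    """Return (p_value, reject_h0) for a statistic from k groups (df = k - 1)."""
    p = chi2_sf(chi2, k - 1)
    return p, p <= alpha


# Hypothetical statistic: chi-square of 12.8 from k = 3 groups.
p_value, reject = decide(12.8, k=3)
print(f"p = {p_value:.4f}, reject H0: {reject}")
```

As the text notes, this large-sample p-value is only as good as the chi-square approximation; with very low expected counts an exact method is the safer choice.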

Multiple Comparisons and Follow-Up

Mood’s Median test is an overall test. When it is significant and more than two groups are involved, follow-up steps are often needed:

- Compare group medians pairwise by:
  - Inspecting median values.
  - Using complementary nonparametric comparisons where appropriate (for example, based on ranks) to confirm specific differences.

Any multiple comparison strategy should consider control of the overall Type I error rate if many pairwise tests are performed. Mood’s Median itself does not provide adjusted pairwise comparisons.

Strengths and Limitations

Strengths:

- Robust to:
  - Non-normal data.
  - Unequal variances.
  - Outliers.
- Simple interpretation: focuses on medians, which are resistant to extremes.
- Easy to explain to non-technical audiences when needed.

Limitations:

- Uses only position relative to the overall median (above vs at/below), discarding detailed information on:
  - Actual values.
  - Magnitude of differences.
- Generally less powerful than tests using more detailed information (for example, rank-based tests) when their assumptions are reasonably met.
- Sensitive mainly to location (median) differences, not shape differences such as spread or skewness.

For situations where detecting differences in medians under minimal assumptions is central, Mood’s Median is an appropriate choice.

Practical Tips for Application

When to Prefer Mood’s Median

Mood’s Median is particularly suitable when:

- Data distributions are heavily skewed or contain extreme outliers.
- A robust comparison centered strictly on medians is desired.
- Only ordinal or highly non-normal continuous data are available.
- There is concern that unequal variances or distribution shapes could distort parametric or some rank-based tests.

Data Preparation and Checking

Before running the test:

- Check independence:
  - Ensure that each observation belongs to only one group.
  - Confirm there is no repeated measurement of the same unit across groups.
- Inspect distributions:
  - Look for extreme outliers.
  - Assess skewness and shape to support the decision to use a median-based test.
- Verify group sizes:
  - Very small groups can lead to low expected counts; consider exact methods or cautious interpretation.

Reporting Results

When reporting Mood’s Median test, include:

- Group names and sample sizes.
- Sample medians for each group.
- Test statistic and degrees of freedom:
  - χÂČ value, df = k − 1
- p-value.
- Clear conclusion relating back to the question about medians.

Example reporting template:

- “Mood’s Median test indicated a statistically significant difference in medians among the groups (χÂČ = X.XX, df = k − 1, p = 0.0XX). Group sample medians were: [Group 1: M₁, Group 2: M₂,

].”

Or, when not significant:

- “Mood’s Median test did not indicate a statistically significant difference in medians among the groups (χÂČ = X.XX, df = k − 1, p = 0.XXX).”

Common Pitfalls

Misinterpreting What Is Tested

- Mood’s Median tests medians, not means.
- A non-significant result does not guarantee equality of distributions, only no detected difference in medians at the chosen α.
- A significant result says “not all medians are equal,” but does not identify which specific medians differ.

Ignoring Sample Medians

Running the test without examining:

- Sample medians, and
- Graphical summaries

can hide practical patterns. Always pair the test with at least simple descriptive statistics.

Applying to Dependent Samples

Mood’s Median requires independent groups:

- Do not use for repeated measures from the same units over time.
- Do not use for matched pairs or any data with inherent pairing.

Dependent-sample situations require other methods tailored to paired or repeated data.

Summary

Mood’s Median test is a nonparametric, chi-square-based procedure for assessing whether the medians of two or more independent groups are equal. It:

- Uses an overall pooled median and classifies each observation as above or at/below this median.
- Builds a 2 × k contingency table of counts by group and median category.
- Compares observed counts with expected counts under the hypothesis of equal medians using a chi-square statistic with k − 1 degrees of freedom.
- Provides a robust alternative for detecting location (median) differences when normality or equal variances cannot be assumed, or when data are ordinal or strongly affected by outliers.

Careful adherence to assumptions of independent groups, appropriate data types, and correct interpretation ensures effective use of Mood’s Median in comparing group medians under real-world, non-ideal data conditions.

Practical Case: Mood’s Median

A regional hospital wants to compare patient discharge times among three wards (A, B, C) after introducing a new electronic documentation system. The goal is to see if any ward has a systematically longer discharge time.

Nursing leadership suspects Ward C has slower discharges, but the data are highly skewed: some complex cases stay many extra hours, making averages unreliable.

The Lean Six Sigma project team gathers a modest sample of discharge times (in hours) from each ward for the past month. Because:

- the data are non-normal,
- there are clear outliers,
- and sample sizes differ slightly,

the Black Belt recommends Mood’s Median test to compare medians across the three wards instead of using a parametric ANOVA.

They:

1. Combine all discharge times and find the overall median.
2. Classify each discharge as “at or below” or “above” this median.
3. Use Mood’s Median test to check whether the pattern of “at or below” versus “above” differs significantly by ward.

The test shows a statistically significant difference: Ward C has a much higher proportion of times above the median, while Wards A and B do not differ meaningfully from each other.

Result: The team concludes that Ward C’s median discharge time is significantly longer. They focus a follow-up kaizen event on Ward C’s discharge steps (physician sign-off and pharmacy coordination), rather than changing the process hospital-wide.

Practice question: Mood’s Median

A Black Belt is comparing the central tendency of cycle times from four different production lines. Normality tests fail and data are heavily skewed with several extreme outliers. Which test is most appropriate to compare the medians across all four lines?

A. One-way ANOVA
B. Kruskal–Wallis test
C. Mood’s Median test
D. Paired t-test

Answer: C

Reason: Mood’s Median test compares medians across two or more independent groups and is robust to non-normality and outliers, making it appropriate for skewed data with extreme values. A tests means under normality, B compares distributions (not specifically medians) and is more sensitive to shape, and D is for paired, not independent, samples.

---

A Black Belt is evaluating customer wait times for two branches (Branch 1 and Branch 2). The overall median wait time for the combined data set is 11 minutes. The contingency table of counts above/below the combined median is:

- Branch 1: 18 observations ≀ 11 min, 22 observations > 11 min
- Branch 2: 30 observations ≀ 11 min, 10 observations > 11 min

Which conclusion is most consistent with the Mood’s Median test at α = 0.05?

A. Fail to reject H₀; the medians of Branch 1 and Branch 2 are equal
B. Reject H₀; the medians of Branch 1 and Branch 2 are different
C. Cannot use Mood’s Median because sample sizes are unequal
D. Cannot conclude because the median must be recomputed for each branch

Answer: B

Reason: Mood’s Median evaluates whether groups differ in the proportion of observations above versus at or below the overall median. Branch 1 has more observations above 11 minutes (22 of 40), while Branch 2 has more at or below (30 of 40). With 32 of 80 observations above the median overall, each branch has expected counts of 16 above and 24 at or below, giving an uncorrected χÂČ = 36/16 + 36/24 + 36/16 + 36/24 = 7.5 with df = 1. This exceeds the 3.84 critical value (p ≈ 0.006), so H₀ is rejected. C is incorrect because both samples have 40 observations here, and unequal sample sizes would be allowed in any case. D misunderstands the test structure, which uses the single overall median.

---

A Black Belt wants to select a test to compare the medians of three suppliers’ delivery lead times.
Data are ordinal (1–5 ranked delay severity scores), independent, and mildly skewed but without extreme outliers. Which justification best supports choosing Mood’s Median test over Kruskal–Wallis?

A. Mood’s Median has higher power when distributions differ in shape
B. Mood’s Median directly tests equality of medians, independent of distribution shape
C. Mood’s Median requires normality, while Kruskal–Wallis does not
D. Mood’s Median can be used only when there are exactly two samples

Answer: B

Reason: Mood’s Median explicitly tests equality of population medians via counts around a common median and is less influenced by distribution shape, making it appropriate when the parameter of interest is strictly the median. A reverses the relative power property, C is incorrect on assumptions, and D is false since Mood’s Median can be applied to more than two groups.

---

A Black Belt applies Mood’s Median test to compare the median transaction amounts among four regions. The software output shows p-value = 0.18. Which statement is the most appropriate interpretation at α = 0.05?

A. At least one region’s median is significantly different from the others
B. All four regions have exactly the same median transaction amount
C. There is insufficient evidence to conclude that the medians differ among regions
D. The distributions are identical across all four regions in every respect

Answer: C

Reason: With p-value (0.18) > α (0.05), the Black Belt fails to reject H₀ and concludes there is not enough evidence to assert median differences. B and D overstate the conclusion (the test only fails to find differences; it does not prove equality or identical distributions), and A contradicts the non-significant p-value.

---

A Black Belt has non-normal data with severe outliers and is deciding between tests. She wants to compare a continuous CTQ outcome across three independent process alternatives and is interested specifically in detecting differences in the central tendency.
When is Mood’s Median more appropriate than a one-way ANOVA?

A. When the data are normally distributed with equal variances
B. When the primary interest is the median and robustness to extreme outliers is required
C. When sample sizes are large and distributions are symmetric
D. When comparing variances among three or more groups

Answer: B

Reason: Mood’s Median is a nonparametric test that compares medians and is robust to non-normal data and extreme outliers, making it preferable to ANOVA when assumptions of normality/homoscedasticity are violated and the median is the parameter of interest. A describes conditions for ANOVA, C does not justify departing from ANOVA, and D refers to variance-comparison tests (e.g., Bartlett, Levene), not Mood’s Median.
