
3.5.6 1 Sample Wilcoxon

Concept and Purpose

The 1 Sample Wilcoxon test (also called the Wilcoxon signed-rank test) is a nonparametric alternative to the 1-sample t-test. It is used when you want to test whether the median of a single population differs from a hypothesized value, and when the assumptions of the t-test (especially normality) are questionable.

It is applied to:
- A single set of numeric observations
- Data that are at least ordinal (can be ranked)
- Situations where you compare the population median to a benchmark or target

The test uses the ranks of signed differences from the hypothesized value rather than using the raw data values.

---

When to Use the 1 Sample Wilcoxon

Appropriate Situations

Use the 1 Sample Wilcoxon when:
- You have one sample: A single group of observations from one process or condition.
- You have a benchmark: A target or reference value to compare the median against.
- Data are not normal: Skewed distributions or clear violation of normality.
- You want to test the median, not the mean.
- You have small samples: Especially when normality is hard to assess and t-test assumptions are doubtful.

Typical examples:
- Testing whether the median cycle time is lower than a customer requirement.
- Checking whether the median defect count per batch differs from a specified value.
- Evaluating whether the median satisfaction score meets a target.

When Not to Use It

It is not appropriate when:
- Data are not at least ordinal (e.g., purely nominal categories).
- There are many ties or zeros that prevent meaningful ranking.
- You are interested specifically in the mean and the normality assumption is reasonably satisfied (then 1-sample t-test may be preferable).

---

Assumptions and Data Requirements

Key Assumptions

The 1 Sample Wilcoxon test requires:
- Random and independent observations: Each measurement is independent of the others.
- Symmetry of differences around the median: The distribution of differences from the hypothesized value is symmetric. The test is robust but works best when this is roughly true.
- Ordinal or better: Data must be at least rankable; interval or ratio data are fine.

Data Requirements

- Single sample of size n
- Hypothesized median (M₀): A specific numeric benchmark
- Reasonable sample size: The test can be used with very small samples, but power increases with n.

When sample size is large (usually n ≥ 10–20, depending on the tool), a normal approximation is commonly used for the test statistic.

---

Hypotheses and Test Direction

Null and Alternative Hypotheses

You are testing a statement about the population median (M).
- Null hypothesis (H₀): The population median equals the hypothesized value:
  - H₀: M = M₀
- Alternative hypothesis (H₁) (choose form based on the question):
  - Two-sided: H₁: M ≠ M₀
  - Lower one-sided: H₁: M < M₀
  - Upper one-sided: H₁: M > M₀

Examples:
- Two-sided: “Is the median cycle time different from 10 minutes?”
- Lower: “Has the median defect rate been reduced below 2 per batch?”
- Upper: “Is the median lead time greater than 5 days?”

Significance Level and Decision Rule

- Significance level (α): Commonly 0.05.
- Decision rule:
  - If p-value ≤ α: Reject H₀ (evidence that the median differs as specified by H₁).
  - If p-value > α: Do not reject H₀ (insufficient evidence to conclude a difference).

---

Mechanics of the 1 Sample Wilcoxon Test

Step 1: Compute Differences from the Hypothesized Median

Given observations X₁, X₂, ..., Xₙ and hypothesized median M₀:
- Compute differences: Dᵢ = Xᵢ - M₀
- Ignore any zero differences (where Dᵢ = 0); they do not enter the ranking.
- Let n′ be the number of nonzero differences.

Step 2: Rank the Absolute Differences

- Take absolute values: |Dᵢ|
- Rank the absolute differences from smallest to largest:
  - The smallest absolute difference gets rank 1, the next gets rank 2, etc.
- Handle ties:
  - If two or more values of |Dᵢ| are equal, assign them the average of the ranks they would occupy.

Step 3: Apply Signs to the Ranks

- For each nonzero difference:
  - If Dᵢ > 0, assign the rank a positive sign.
  - If Dᵢ < 0, assign the rank a negative sign.

You now have signed ranks.

Step 4: Compute the Test Statistic

Two common equivalent forms are used:
- Sum of positive ranks (T⁺):
  - T⁺ = sum of all ranks where Dᵢ > 0
- Sum of negative ranks (T⁻):
  - T⁻ = sum of all ranks where Dᵢ < 0

Software may use one or both, but they carry the same information. Often the smaller of T⁺ and T⁻ is compared to distribution tables for exact p-values in small samples. For large samples, a normal approximation can be used by converting T⁺ (or T⁻) into a z-statistic.

---

Interpreting Output and Results

Typical Output Elements

Statistical software for the 1 Sample Wilcoxon typically provides:
- Sample size (n) and number of nonzero differences
- Hypothesized median (M₀)
- Test statistic:
  - Sum of positive ranks (T⁺) and/or negative ranks (T⁻)
  - Possibly a z-value (for normal approximation)
- p-value:
  - Two-sided p-value for H₁: M ≠ M₀
  - One-sided p-value for H₁: M < M₀ or M > M₀, as selected
- Estimated median (often sample median)
- Confidence interval for the median (typically a nonparametric CI)

Practical Interpretation

Connect the statistical result to the real question:
- If p-value ≤ α:
  - There is statistically significant evidence that the population median differs from M₀ in the direction stated by H₁.
  - Example: If H₁: M < M₀, a small p-value supports that the median is below the target.
- If p-value > α:
  - There is not enough evidence to conclude that the median differs from M₀.
  - You do not prove equality; you simply fail to reject H₀.
- Use the confidence interval:
  - If the CI for the median does not contain M₀, this is consistent with rejecting H₀.
  - If the CI includes M₀, this is consistent with not rejecting H₀.
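
Steps 1 to 4 of the mechanics can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the routine any particular statistics package uses; the function name `signed_rank_sums` and the sample values are hypothetical.

```python
def signed_rank_sums(data, m0):
    """Return (t_plus, t_minus, n_nonzero) for a one-sample signed-rank test."""
    # Step 1: differences from the hypothesized median; zero differences are dropped.
    diffs = [x - m0 for x in data if x != m0]

    # Step 2: rank |D_i| from smallest to largest, averaging ranks over ties.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        avg_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1

    # Steps 3 and 4: attach signs, then sum positive and negative ranks separately.
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return t_plus, t_minus, len(diffs)


# Hypothetical cycle times (minutes) tested against M0 = 10.
times = [7, 12, 10, 6, 9, 13, 4, 8, 5]
print(signed_rank_sums(times, 10))  # → (7.0, 29.0, 8)
```

In this sample one value equals M₀ and is dropped, so n′ = 8 and T⁺ + T⁻ = 8 × 9 / 2 = 36, which is a quick check that the ranking is complete.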

Direction of Effect

Interpret the pattern of signed ranks:
- Many large positive signed ranks:
  - Data values tend to be greater than M₀.
- Many large negative signed ranks:
  - Data values tend to be less than M₀.

This aligns with one-sided hypothesis decisions.

---

Comparison with the 1-Sample t-Test

Similarities

Both tests:
- Compare a single sample to a numeric benchmark.
- Use hypothesis testing with p-values and confidence intervals.
- Require independent observations.

Differences

- Parameter tested:
  - 1-sample t-test: Mean (μ).
  - 1 Sample Wilcoxon: Median (M).
- Assumptions:
  - t-test: Assumes approximate normality of data or at least of the sampling distribution of the mean.
  - 1 Sample Wilcoxon: Nonparametric, does not require normality, but assumes symmetry of differences.
- Data basis:
  - t-test: Uses raw data values directly.
  - 1 Sample Wilcoxon: Uses ranks of signed differences from M₀.
- Robustness to outliers:
  - 1 Sample Wilcoxon is generally more robust to outliers and heavy tails.

When data are clearly non-normal or include extreme values, and when the median is a more relevant measure of central tendency, the 1 Sample Wilcoxon is often preferable, provided the differences from M₀ are still roughly symmetric.

---

Practical Considerations and Common Pitfalls

Sample Size and Power

- Small samples:
  - The test is valid but may have low power (less ability to detect real differences).
  - Exact p-values (based on the exact distribution of T⁺/T⁻) are often used.
- Larger samples:
  - Normal approximation is common.
  - Power is generally reasonable if the effect size is not very small.

Handling Ties and Zeros

- Ties in |Dᵢ|:
  - Assign average ranks; software does this automatically.
  - Many ties can slightly change the distribution of the test statistic.
- Zero differences (Xᵢ = M₀):
  - Exclude from ranking.
  - Large numbers of zeros reduce effective sample size, weakening the test.
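
The two routes to a p-value mentioned above (the exact distribution for small samples, the normal approximation for larger ones) can be sketched as follows. This is an illustrative sketch, not a description of how any specific package computes its output. The function names and the `tie_sizes` argument (the size of each group of tied |Dᵢ| values) are assumptions introduced here; the exact version assumes no ties, and the approximation uses the standard tie-corrected variance without a continuity correction.

```python
import math


def exact_two_sided_p(t_plus, n):
    """Exact two-sided p-value for T+ with n nonzero, untied differences."""
    total = n * (n + 1) // 2
    # counts[s] = number of sign patterns whose positive ranks sum to s.
    counts = [0] * (total + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        for s in range(total, rank - 1, -1):
            counts[s] += counts[s - rank]
    # Under H0 all 2**n sign patterns are equally likely; double the smaller tail.
    t_small = min(t_plus, total - t_plus)
    lower_tail = sum(counts[: int(t_small) + 1])
    return min(1.0, 2 * lower_tail / 2 ** n)


def normal_approx_two_sided(t_plus, n, tie_sizes=()):
    """Return (z, two-sided p) from the normal approximation to T+."""
    mean = n * (n + 1) / 4
    # Each group of t tied absolute differences lowers the variance by (t^3 - t) / 48.
    var = n * (n + 1) * (2 * n + 1) / 24 - sum(t ** 3 - t for t in tie_sizes) / 48
    z = (t_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area of the standard normal
    return z, p


print(exact_two_sided_p(3, 8))  # → 0.0390625
z, p = normal_approx_two_sided(38, 20)  # hypothetical values: T+ = 38, n' = 20
print(round(z, 2), round(p, 3))
```

With n′ = 20 and no ties, the mean of T⁺ under H₀ is 105 and its variance is 717.5, so T⁺ = 38 gives z of roughly -2.50 and a two-sided p-value of roughly 0.012.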

Checking Assumptions Informally

Even though this is a nonparametric test, consider:
- Symmetry:
  - If the differences Dᵢ = Xᵢ - M₀ are extremely skewed, results may be harder to interpret as a median test.
- Independence:
  - Do not use this test for time-series data with strong autocorrelation without appropriate adjustments.
  - Do not treat repeated measures on the same unit as independent observations.

---

Reporting the 1 Sample Wilcoxon Test

When summarizing results, include:
- Purpose:
  - What median and benchmark were tested.
- Data description:
  - Sample size and context (e.g., type of measurement).
- Test details:
  - Test name (1 Sample Wilcoxon or Wilcoxon signed-rank).
  - Hypothesized median (M₀).
  - Direction of test (two-sided, less-than, greater-than).
- Statistics:
  - Test statistic (T⁺ or T⁻ and, if applicable, z-value).
  - p-value.
  - Sample median and confidence interval for the median.
- Conclusion in context:
  - State whether evidence supports a median different from M₀ and how (higher or lower).

Example structure:
- “A 1 Sample Wilcoxon test was used to evaluate whether the median cycle time differs from 10 minutes (H₀: M = 10). With n = 20, the sum of positive ranks was T⁺ = 38, p = 0.012 (two-sided). The sample median was 8.7 minutes (95% CI: 7.9 to 9.5). We conclude that the median cycle time is significantly different from 10 minutes and is lower in practice.”

---

Summary

The 1 Sample Wilcoxon test is a nonparametric method for testing whether the median of a single population equals a hypothesized value. It is based on the ranks of signed differences from that value and is well-suited for non-normal or ordinal data whose differences from the hypothesized value are roughly symmetric.

Effective use requires understanding:
- When it is appropriate (single sample, benchmark, non-normal data).
- How to set null and alternative hypotheses for the median.
- How to compute and interpret the signed-rank test statistic.
- How to read p-values and confidence intervals to draw practical conclusions.

By focusing on medians and ranks instead of means and raw values, the 1 Sample Wilcoxon test provides a robust tool for assessing central tendency against a target under less restrictive assumptions than the 1-sample t-test.

Practical Case: 1 Sample Wilcoxon

A regional hospital wants to check whether a new triage process actually reduces median emergency-room waiting time below the internal target of 30 minutes. Previously, they used parametric tests, but the new waiting-time data are clearly skewed with several extreme waits; the normality assumption is not reasonable.

The improvement team decides to use a 1 Sample Wilcoxon test against the historical target of 30 minutes. Over one week, they collect a convenience sample of 25 patient waiting times (in minutes) after the new process is implemented.

They input the data into their statistical software and specify:
- Test: 1 Sample Wilcoxon (signed-rank)
- Null median: 30 minutes
- Alternative: true median < 30 minutes
- Alpha: 0.05

The software reports:
- P-value = 0.012
- Median of sample ≈ 24 minutes
- Conclusion: reject the null hypothesis that the median wait time is 30 minutes.

The Lean Six Sigma team documents that, based on the 1 Sample Wilcoxon test, there is statistically significant evidence that the new triage process has reduced the median waiting time below the 30-minute target, and they proceed to standardize and roll out the new process across all shifts.
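
A lower one-sided test like the one in this case could be set up along the following lines. The hospital's 25 observations are not given, so the waiting times below are invented purely for illustration and will not reproduce the reported p-value of 0.012 or the sample median of about 24 minutes. The function name `wilcoxon_less` is hypothetical, and the p-value uses the normal approximation with tie correction rather than whatever method the team's software applied.

```python
import math


def wilcoxon_less(data, m0):
    """One-sided signed-rank test of H1: median < m0 (normal approximation)."""
    # Nonzero differences, ordered by absolute size so position maps to rank.
    diffs = sorted((x - m0 for x in data if x != m0), key=abs)
    n = len(diffs)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[j + 1]) == abs(diffs[i]):
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average rank within a tie group
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (t_plus - mean) / math.sqrt(var)
    # A small T+ supports "median < m0", so the p-value is the lower tail P(Z <= z).
    p_less = 0.5 * math.erfc(-z / math.sqrt(2))
    return t_plus, z, p_less


# Hypothetical waiting times in minutes (NOT the hospital's data).
waits = [12, 18, 21, 24, 19, 35, 27, 22, 16, 41, 25, 23, 28,
         14, 26, 20, 33, 17, 24, 29, 15, 58, 22, 31, 13]
t_plus, z, p_less = wilcoxon_less(waits, 30)
alpha = 0.05
print(f"T+ = {t_plus}, z = {z:.2f}, p = {p_less:.4f}")
print("Reject H0" if p_less <= alpha else "Do not reject H0")
```

If SciPy is available, `scipy.stats.wilcoxon` called on the differences from 30 with `alternative="less"` is a packaged alternative to the hand-rolled function above.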

Practice question: 1 Sample Wilcoxon

A Black Belt wants to test whether the median completion time of a new process differs from the historical target of 20 minutes. The data are non-normal with clear skewness and include several mild outliers. Which is the most appropriate test?

A. 1-Sample t-test
B. 1 Sample Wilcoxon signed-rank test
C. 2 Sample Wilcoxon rank-sum test
D. Chi-square goodness-of-fit test

Answer: B

Reason: The 1 Sample Wilcoxon signed-rank test is the nonparametric alternative to the 1-sample t-test used to test a hypothesized median when normality is violated. Other options: A requires approximate normality, C is for two independent samples, and D is for categorical distributions, not a single continuous median.

---

A Black Belt applies a 1 Sample Wilcoxon test to assess whether the median defect repair time differs from 5 hours (H₀: median = 5). The software output shows p-value = 0.018 at α = 0.05. What is the correct conclusion?

A. Fail to reject H₀; there is not enough evidence that the median is different from 5 hours
B. Reject H₀; there is evidence that the population mean is greater than 5 hours
C. Reject H₀; there is evidence that the population median differs from 5 hours
D. Fail to reject H₀; the data are non-normal so this test is invalid

Answer: C

Reason: p-value < α indicates rejecting H₀, concluding the population median is significantly different from 5 hours (two-sided). Other options: A and D ignore the p-value decision rule; B incorrectly states “mean” and “greater than,” whereas this two-sided test concerns the median and supports only “different from.”

---

A Black Belt has 10 differences from the hypothesized median (Dᵢ = Xᵢ - M₀) from a process and wants to manually verify the Wilcoxon signed-rank calculation. After discarding zeros, they obtain 8 nonzero differences. What is the correct first step in computing the Wilcoxon signed-rank statistic?

A. Convert all differences to z-scores, then sum the positive z-scores
B. Rank the absolute values of the nonzero differences from smallest to largest, assigning average ranks to ties
C. Rank the signed differences from most negative to most positive, ignoring ties
D. Take the ranks from a standard normal table and assign them to each observation

Answer: B

Reason: The Wilcoxon signed-rank procedure ranks the absolute values of nonzero differences, then assigns the sign back to each rank; test statistics are then based on sums of signed ranks. Other options: A and D are unrelated to the Wilcoxon procedure; C incorrectly ranks signed values directly rather than absolute values.

---

A Black Belt compares the median time-to-ship against a contractual target of 48 hours using a 1 Sample Wilcoxon test. There are 50 observations with no ties or zeros in the differences from 48. Software reports: T⁺ = 762, T⁻ = 513, and a two-sided p-value = 0.23. At α = 0.05, which statement is most appropriate?

A. There is not enough evidence to conclude that the process median differs from 48 hours
B. The process median is significantly less than 48 hours
C. The process median is significantly greater than 48 hours
D. The sample size is too small to perform the 1 Sample Wilcoxon test

Answer: A

Reason: A p-value = 0.23 > 0.05 leads to failing to reject H₀, meaning insufficient evidence that the population median differs from 48 hours. Other options: B and C assert directional differences not supported by the non-significant p-value; D is incorrect because n = 50 is adequate for the Wilcoxon test.

---

A Black Belt is deciding between a 1 Sample Wilcoxon test and a sign test to compare a process median cycle time to a target. The data have many small deviations from the target and are symmetrically distributed but non-normal. Which statement best justifies choosing the 1 Sample Wilcoxon test?

A. The Wilcoxon test uses the magnitudes of differences as well as their signs, giving it more power than the sign test
B. The Wilcoxon test is valid only for normal data, unlike the sign test
C. The Wilcoxon test is designed only for large samples, unlike the sign test
D. The Wilcoxon test can only be used for categorical data, making it more robust

Answer: A

Reason: The Wilcoxon signed-rank test uses both sign and rank (magnitude) of differences, providing higher statistical power than the sign test when symmetry is reasonable. Other options: B is false (Wilcoxon is nonparametric), C is incorrect (it handles small and large n with appropriate tables/approximations), and D is incorrect since Wilcoxon is for continuous/ordinal data, not categorical.
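
The point about magnitudes in the last question can be illustrated with a small sketch of the sign test, which looks only at how many observations fall above or below the target. The function name and both samples are hypothetical, and the p-value is the usual two-sided binomial calculation with zeros dropped.

```python
from math import comb


def sign_test_two_sided_p(data, m0):
    """Two-sided sign test p-value for H0: median = m0 (values equal to m0 dropped)."""
    pos = sum(1 for x in data if x > m0)
    neg = sum(1 for x in data if x < m0)
    n, k = pos + neg, min(pos, neg)
    # Under H0 the number of positive signs is Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Two hypothetical samples tested against M0 = 10: same signs, different magnitudes.
barely_below = [9, 11, 12, 13, 14, 15, 16, 17]  # the one low value is 1 below target
far_below = [2, 11, 12, 13, 14, 15, 16, 17]     # the one low value is 8 below target

print(sign_test_two_sided_p(barely_below, 10))  # → 0.0703125
print(sign_test_two_sided_p(far_below, 10))     # → 0.0703125
```

The sign test cannot tell these samples apart, while the signed-rank sums differ: the single negative difference receives rank 1.5 (tied with +1) in the first sample and rank 8 in the second, so T⁻ is 1.5 versus 8.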
