3.4 Hypothesis Testing with Normal Data
Foundations of Hypothesis Testing

Purpose of Hypothesis Testing
Hypothesis testing is a structured way to decide, using sample data, whether a claim about a population parameter should be rejected or not. Typical questions include:
- Is the process mean equal to a target?
- Has a change reduced the process mean or variation?
- Are two processes’ means or variances different?
- Is the proportion of defective items acceptable?

Hypothesis testing balances:
- Random variation: differences caused by chance.
- Real effects: differences that are large enough to be unlikely under chance alone.

Core Concepts and Vocabulary
- Population: the entire set of items or outcomes of interest.
- Sample: a subset drawn from the population.
- Parameter: a numerical characteristic of a population (mean μ, standard deviation σ, proportion p).
- Statistic: a numerical characteristic of a sample (mean x̄, standard deviation s, sample proportion p̂).

In hypothesis testing:
- Null hypothesis (H₀): a statement of “no difference” or “no effect” (the status quo).
- Alternative hypothesis (H₁): a statement representing a difference, effect, or improvement.
- Significance level (α): the probability of rejecting H₀ when H₀ is true (Type I error).
- p-value: the probability, assuming H₀ is true, of obtaining a result at least as extreme as the one observed.
- Test statistic: a standardized value (z, t, F, χ²) compared against a reference distribution.

The decision rule:
- If p-value ≤ α → reject H₀ (evidence of an effect).
- If p-value > α → fail to reject H₀ (insufficient evidence of an effect).

Normality and Its Role in Testing

Normal Distribution Basics
Normal data follow the bell-shaped distribution:
- Symmetric around the mean μ.
- Characterized by μ (center) and σ (spread).
- Many process metrics (cycle time, dimensions, weight) are approximately normal.
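As a minimal sketch of the decision rule above (the z value, α, and two-sided form are hypothetical choices, not values from the text), a p-value can be computed and compared with α using only the standard library:

```python
import math

# Hypothetical two-sided z-test: illustrative numbers only.
z = 2.31          # observed test statistic
alpha = 0.05      # chosen significance level

# P(Z > |z|) for a standard normal, via the complementary error function.
upper_tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
p_value = 2 * upper_tail   # two-sided: both tails count as "at least as extreme"

decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"p-value = {p_value:.4f} -> {decision}")
```

The same comparison applies to t, F, and χ² statistics once the matching reference distribution is used in place of the standard normal.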
Hypothesis tests for means and variances often assume:
- Data are normally distributed, or
- Sample size is large enough for the Central Limit Theorem to apply (approximate normality of sample means).

Checking Normality
Before using tests that assume normality, verify that assumption with:
- Graphical tools:
  - Histograms.
  - Normal probability plots (Q–Q plots).
- Descriptive indicators:
  - Skewness (asymmetry).
  - Kurtosis (tail heaviness).
- Formal tests (if needed):
  - Shapiro–Wilk.
  - Anderson–Darling.

For IASSC-style applications:
- Mild deviations from normality are often acceptable, especially with moderate to large samples.
- Severe skewness or outliers can invalidate tests that strongly rely on normality.

Structure of Hypothesis Tests with Normal Data

Defining Hypotheses
Each test begins with a clear statement of H₀ and H₁.

For means:
- Two-sided: H₀: μ = μ₀ vs H₁: μ ≠ μ₀.
- Lower-sided: H₀: μ ≥ μ₀ vs H₁: μ < μ₀.
- Upper-sided: H₀: μ ≤ μ₀ vs H₁: μ > μ₀.

For variances:
- H₀: σ² = σ₀².
- H₁: σ² ≠ σ₀² (two-sided) or one-sided variants.

For two-sample comparisons:
- Means: H₀: μ₁ = μ₂ vs H₁: μ₁ ≠ μ₂ (or one-sided).
- Variances: H₀: σ₁² = σ₂² vs H₁: σ₁² ≠ σ₂².

Choosing One- vs Two-Tailed Tests
- Two-tailed tests are used when any difference (higher or lower) is important.
- One-tailed tests are used when only one direction is relevant, such as:
  - Verifying improvement (mean decreased).
  - Ensuring a specification is not exceeded (mean less than or equal to a limit).

The choice must be made before seeing the data; switching after seeing results biases the analysis.

Test Statistic, Critical Region, and p-Value
For all tests:
- Compute a test statistic from the sample.
- Determine a reference distribution (z, t, F, χ²).
- Compare the p-value with α, or the test statistic with critical values.

Two equivalent decision methods:
- p-value method: reject H₀ if p-value ≤ α.
- Critical value method: reject H₀ if the test statistic falls in the rejection region determined by α.
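The Shapiro–Wilk check described above takes a few lines in Python; this sketch assumes NumPy and SciPy are available and uses simulated data in place of real process measurements:

```python
import numpy as np
from scipy import stats

# Hypothetical cycle-time sample; in practice this would be real process data.
rng = np.random.default_rng(42)
data = rng.normal(loc=12.0, scale=1.2, size=50)

# Shapiro-Wilk test: H0 = the data come from a normal distribution.
stat, p_value = stats.shapiro(data)
print(f"Shapiro-Wilk W = {stat:.3f}, p-value = {p_value:.3f}")

# A large p-value gives no evidence against normality; a small one
# (e.g., below 0.05) suggests the normality assumption is doubtful.
if p_value < 0.05:
    print("Normality assumption is questionable.")
else:
    print("No evidence against normality.")
```

A histogram or Q–Q plot should still accompany the formal test, since with large samples Shapiro–Wilk can flag deviations that are practically harmless.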
One-Sample Tests for Normal Data

One-Sample z-Test for the Mean (σ Known)
Use when:
- The population standard deviation σ is known (or estimated very reliably).
- Data are normal or the sample size is large.

Hypotheses (example): H₀: μ = μ₀; H₁: μ ≠ μ₀.

Test statistic:
- z = (x̄ − μ₀) / (σ / √n).

Assumptions:
- Data are independent.
- The population is normal or n is large (typically n ≥ 30).

Interpretation:
- Large |z| and a small p-value indicate evidence that μ differs from μ₀.

One-Sample t-Test for the Mean (σ Unknown)
Use when:
- The population standard deviation is unknown (most practical situations).
- Data are approximately normal.

Hypotheses (example): H₀: μ = μ₀; H₁: μ ≠ μ₀.

Test statistic:
- t = (x̄ − μ₀) / (s / √n), with degrees of freedom df = n − 1.

Assumptions:
- Data are independent.
- Population normality is important for small n.
- The test is robust to mild non-normality if n is moderate.

As n grows, the t distribution approaches the normal distribution.

One-Sample Chi-Square Test for Variance
Use to test whether the process variance matches a specified value.

Hypotheses:
- H₀: σ² = σ₀².
- H₁: σ² ≠ σ₀² (or one-sided variants).

Test statistic:
- χ² = (n − 1)s² / σ₀², with df = n − 1.

Assumptions:
- Data are independent.
- The population is normal (this assumption is important).

Interpretation:
- Compare χ² to chi-square critical values, or use the p-value.
- Helps verify whether process variability meets requirements.

Two-Sample Tests for Normal Data

Independent Two-Sample t-Test (Means)
Used to compare the means of two independent groups (e.g., before vs. after with different units, two machines, two suppliers).

Hypotheses:
- H₀: μ₁ = μ₂.
- H₁: μ₁ ≠ μ₂ (or one-sided).

Two main forms:
- Pooled-variance t-test (assumes equal variances).
- Welch’s t-test (does not assume equal variances).

Assumptions:
- Samples are independent.
- Each population is normal, or the samples are large.
- The equality-of-variances assumption must be assessed if using the pooled test.
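The pooled vs. Welch distinction for the independent two-sample t-test can be illustrated with SciPy’s `ttest_ind`; the machine measurements below are simulated stand-ins, not values from the text:

```python
import numpy as np
from scipy import stats

# Hypothetical tensile measurements from two machines (illustrative data only).
rng = np.random.default_rng(7)
machine_a = rng.normal(loc=100.0, scale=2.0, size=20)
machine_b = rng.normal(loc=101.5, scale=2.0, size=20)

# Pooled-variance t-test (assumes equal variances).
t_pooled, p_pooled = stats.ttest_ind(machine_a, machine_b, equal_var=True)

# Welch's t-test (does not assume equal variances).
t_welch, p_welch = stats.ttest_ind(machine_a, machine_b, equal_var=False)

print(f"pooled: t = {t_pooled:.3f}, p = {p_pooled:.4f}")
print(f"Welch : t = {t_welch:.3f}, p = {p_welch:.4f}")
```

With equal group sizes the two t statistics coincide (only the degrees of freedom differ); they diverge when group sizes or variances differ, which is one reason Welch’s form is often treated as the safer default.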
Test for Equality of Variances (F-Test)
Used to check whether two population variances differ.

Hypotheses:
- H₀: σ₁² = σ₂².
- H₁: σ₁² ≠ σ₂² (or one-sided).

Test statistic:
- F = s₁² / s₂² (with s₁² ≥ s₂² by convention),
- df₁ = n₁ − 1, df₂ = n₂ − 1.

Assumptions:
- Both populations are normal.
- Samples are independent.

Interpretation and use:
- If H₀ is rejected, conclude the variances differ.
- Helps choose the appropriate two-sample t-test:
  - Variances equal → pooled-variance t-test.
  - Variances unequal → Welch’s t-test.

Paired t-Test (Matched Pairs)
Used when two measurements are taken from the same units or matched units (e.g., before vs. after on the same item).

Approach:
- Compute differences dᵢ = X₂ᵢ − X₁ᵢ for each pair.
- Perform a one-sample t-test on the mean difference.

Hypotheses:
- H₀: μd = 0 (no average change).
- H₁: μd ≠ 0 (or one-sided for direction).

Test statistic:
- d̄ = average of the differences, sd = standard deviation of the differences.
- t = d̄ / (sd / √n), with df = n − 1 (n = number of pairs).

Assumptions:
- Pairs are dependent; differences across pairs are independent.
- Differences are approximately normal.

Advantages:
- Reduces variability by controlling for unit-to-unit differences.
- More powerful than independent tests when pairing is appropriate.

Proportion Tests with Normal Approximation

One-Sample Proportion Test (Normal Approximation)
Used when:
- The outcome is pass/fail, yes/no, or defect/no defect.
- The sample size is large enough for the normal approximation.

Hypotheses:
- H₀: p = p₀.
- H₁: p ≠ p₀ (or one-sided).

Test statistic:
- p̂ = sample proportion = x / n.
- Standard error under H₀: SE = √[p₀(1 − p₀) / n].
- z = (p̂ − p₀) / SE.

Assumptions:
- Independent Bernoulli trials.
- np₀ and n(1 − p₀) both sufficiently large (commonly ≥ 5 or 10).

Interpretation:
- Helps determine whether the defect proportion differs from a target or specification.
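The one-sample proportion test above reduces to a few lines of arithmetic; the defect counts and target below are hypothetical:

```python
import math

# Hypothetical counts: 30 defective vials out of n = 400, tested
# against a target defect rate p0 = 0.05.
x, n, p0 = 30, 400, 0.05

# Normal approximation is reasonable here: n*p0 = 20 and n*(1 - p0) = 380.
p_hat = x / n                           # sample proportion
se = math.sqrt(p0 * (1 - p0) / n)       # standard error under H0
z = (p_hat - p0) / se
print(f"p_hat = {p_hat:.3f}, z = {z:.3f}")
```

The resulting z is then compared with normal critical values (e.g., ±1.96 for a two-sided test at α = 0.05), exactly as for the z-test of a mean.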
Two-Sample Proportion Test (Normal Approximation)
Used to compare proportions from two independent samples (e.g., defect rates of two lines).

Hypotheses:
- H₀: p₁ = p₂.
- H₁: p₁ ≠ p₂ (or one-sided).

Test statistic:
- p̂₁ = x₁ / n₁, p̂₂ = x₂ / n₂.
- Pooled proportion under H₀: p̂ = (x₁ + x₂) / (n₁ + n₂).
- SE = √[ p̂(1 − p̂)(1/n₁ + 1/n₂) ].
- z = (p̂₁ − p̂₂) / SE.

Assumptions:
- Samples are independent.
- Each group has sufficiently large counts of successes and failures.

Type I Error, Type II Error, and Power

Type I and Type II Errors
Two kinds of errors exist in hypothesis testing:
- Type I error (α): rejecting H₀ when H₀ is true; controlled by selecting the significance level α (e.g., 0.05, 0.01).
- Type II error (β): failing to reject H₀ when H₀ is false; depends on sample size, true effect size, variability, and α.

There is a trade-off:
- Lower α reduces false alarms but can increase β (missed effects).
- Increasing the sample size can reduce both α and β for a given effect size.

Power of a Test
- Power = 1 − β = the probability of correctly rejecting H₀ when H₀ is false.
- High power is desirable to detect practically important differences.

Key influences on power:
- Effect size: larger true differences from H₀ increase power.
- Sample size (n): larger n increases power.
- Variability (σ): lower σ increases power.
- Significance level (α): higher α increases power, but also increases Type I error risk.

In practice:
- Power and sample size calculations are used when planning tests to ensure a reasonable chance of detecting meaningful changes.

Practical Assumptions and Data Integrity

Independence and Random Sampling
Normal-data hypothesis tests rely on:
- Independence: observations do not influence each other.
- Random sampling: every unit has a fair chance of selection.

Violations can cause misleading p-values and conclusions. Common issues:
- Serial correlation in time-ordered data.
- Clusters or batches where measurements are more similar within a batch.
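The effect of sample size on power described above can be sketched for an upper-tailed one-sample z-test. The shift from 12.0 to 12.4 minutes with σ = 1.2 is a hypothetical planning input, and 1.645 is the one-sided critical value for α = 0.05:

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def power_one_sided_z(mu0, mu1, sigma, n, z_crit=1.645):
    """Power of an upper-tailed one-sample z-test (H1: mu > mu0).

    z_crit = 1.645 corresponds to alpha = 0.05, one-sided.
    Reject H0 when x_bar > mu0 + z_crit * sigma / sqrt(n); power is the
    probability of that event when the true mean is mu1.
    """
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))
    return 1.0 - phi(z_crit - shift)

# Hypothetical planning scenario: detect a shift from 12.0 to 12.4 minutes.
for n in (9, 36, 100):
    print(f"n = {n:3d}: power = {power_one_sided_z(12.0, 12.4, 1.2, n):.3f}")
```

Running the loop shows power climbing toward 1 as n grows, which is the quantitative basis for sample-size planning before a test is run.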
Mitigation includes:
- Understanding the data collection process.
- Using rational subgrouping when appropriate.
- Adjusting sampling plans to reduce dependence.

Outliers and Robustness
Outliers can:
- Distort the mean and standard deviation.
- Strongly affect normal-based tests, especially for small samples.

Approach to outliers:
- Investigate causes (measurement error, special causes).
- Correct or remove clearly erroneous values, with documented justification.
- Consider transformations or alternative methods if outliers reflect real, heavy-tailed behavior.

Most t-tests are reasonably robust to modest non-normality when the sample size is moderate and no extreme outliers are present.

Practical vs Statistical Significance
A statistically significant result may or may not matter in practice. Key distinctions:
- Statistical significance: based on the p-value and α; indicates the result is unlikely under H₀.
- Practical significance: based on effect size and business impact; asks whether the difference is large enough to matter.

With large samples, very small differences can be statistically significant but practically negligible. With small samples, meaningful differences may fail to reach significance due to low power.

Always interpret hypothesis test results in the context of:
- Specification limits,
- Cost/benefit,
- Process capability,
- Operational constraints.

Step-by-Step Logic for Hypothesis Testing with Normal Data
A consistent approach helps avoid errors and misinterpretation. Typical sequence:
1. Clarify the question
   - What parameter (mean, variance, proportion) is of interest?
   - What difference is meaningful?
2. State H₀ and H₁
   - Choose the one- or two-tailed form before examining the data.
3. Verify assumptions
   - Data type (continuous vs discrete).
   - Approximate normality for continuous data.
   - Independence and rational sampling.
4. Select the appropriate test
   - One-sample vs two-sample vs paired.
   - Mean vs variance vs proportion.
   - z vs t vs χ² vs F.
5. Choose the significance level α
   - Common choices: 0.10, 0.05, or 0.01, depending on risk tolerance.
6. Compute the test statistic and p-value
   - Use the sample statistics and the correct distribution.
7. Make a decision
   - Compare the p-value to α.
   - Reject or fail to reject H₀.
8. Interpret quantitatively
   - Report the estimated effect size (e.g., difference in means) and confidence intervals when available.
   - Comment on both statistical and practical significance.
9. Document assumptions and limitations
   - How the data were collected.
   - Any concerns about normality, outliers, or independence.

Summary
Hypothesis testing with normal data provides a rigorous, quantitative way to decide whether observed differences in means, variances, or proportions are likely due to random variation or indicate real change. Key elements include:
- Correctly framing H₀ and H₁ and selecting one- or two-tailed forms.
- Choosing the proper test (z, t, χ², F, or normal approximation for proportions) based on data type, sample structure, and assumptions.
- Understanding Type I and Type II errors, the significance level α, and power, and how they relate to sample size and effect size.
- Verifying assumptions such as normality, independence, and stable data collection methods.
- Distinguishing between statistical significance and practical importance to drive sound decisions.

Mastery of these concepts enables consistent, accurate hypothesis testing whenever process data can be reasonably modeled by normal distributions or their normal approximations.
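As one more worked sketch, the one-sample chi-square test for variance from this section can be computed directly with SciPy; all numbers below are hypothetical, and the two-sided p-value is formed by doubling the smaller tail probability (a common convention):

```python
from scipy import stats

# Hypothetical: n = 20 fill volumes with sample sd s = 0.30 ml,
# tested two-sided against a target sd of sigma0 = 0.25 ml.
n, s, sigma0 = 20, 0.30, 0.25

chi2 = (n - 1) * s**2 / sigma0**2   # test statistic
df = n - 1

# Two-sided p-value: double the smaller of the two tail probabilities.
lower = stats.chi2.cdf(chi2, df)
upper = stats.chi2.sf(chi2, df)
p_value = 2 * min(lower, upper)
print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.3f}")
```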
Practical Case: Hypothesis Testing with Normal Data

A pharmaceutical packaging line is filling 50 ml vials of saline. The specification is 50.0 ml ± 0.5 ml. Historical data show fill volumes are approximately normal. The quality manager receives a customer complaint that recent vials “seem underfilled.” Production insists the mean fill is still on target at 50.0 ml.

The manager draws a random sample of 40 vials from the last 2 hours of production. Fill volume is measured using a calibrated gravimetric method. The sample’s mean volume appears slightly below 50.0 ml, and the sample standard deviation is consistent with historical data, supporting the assumption of normality.

Using hypothesis testing with normal data, the manager tests whether the true mean fill has dropped below 50.0 ml. A one-sample test is run assuming normality of the fill volumes. The p-value is well below the preset significance level, so the manager concludes the mean fill is statistically lower than 50.0 ml.

As a result, the line is stopped. The filling pump is recalibrated and adjusted upward. A new sample is collected after adjustment and tested in the same way; this time, the test does not indicate a mean different from 50.0 ml. The line restarts, and a short-term monitoring plan is put in place to ensure the mean remains on target.
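A test along the lines of this case could be sketched as follows; the fill volumes are simulated stand-ins, since the actual sample data are not given in the case:

```python
import numpy as np
from scipy import stats

# Simulated stand-in for the 40-vial sample (the case's real data are not given):
# assume a true mean slightly below target to mirror the scenario.
rng = np.random.default_rng(1)
fills = rng.normal(loc=49.90, scale=0.20, size=40)

# One-sided, one-sample t-test of H0: mu = 50.0 vs H1: mu < 50.0.
t_stat, p_value = stats.ttest_1samp(fills, popmean=50.0, alternative="less")
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")

alpha = 0.05
if p_value <= alpha:
    print("Reject H0: evidence the mean fill is below 50.0 ml.")
else:
    print("Fail to reject H0: no evidence of underfill.")
```

A t-test (rather than a z-test) fits here because σ is estimated from the sample; with n = 40 the normality assumption is also supported by the Central Limit Theorem.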
Practice Questions: Hypothesis Testing with Normal Data

A Black Belt is testing whether a new filling process has changed the mean fill weight from the historical target of 500 g. A random sample of 25 fills produces a sample mean of 507 g and a sample standard deviation of 10 g. The data are approximately normal and the population standard deviation is unknown. Which is the most appropriate test?
A. One-sample z-test for the mean
B. One-sample t-test for the mean
C. Paired t-test
D. Two-sample t-test assuming equal variances

Answer: B
Reason: With normal data, an unknown population standard deviation, and a single sample (n = 25) compared to a known target, the appropriate test is a one-sample t-test. The other options are incorrect because the z-test requires known σ, the paired t-test requires matched pairs, and the two-sample t-test requires two independent samples.

---

A process historically has a mean cycle time of 12.0 minutes with a known population standard deviation of 1.2 minutes and a normal distribution. A Black Belt wants to detect an increase in mean cycle time. A sample of 36 observations yields a mean of 12.4 minutes. Using α = 0.05 for a one-sided test, which conclusion is most appropriate?
A. Fail to reject H₀; no significant increase
B. Reject H₀; significant increase in mean cycle time
C. Fail to reject H₀; significant increase in mean cycle time
D. Reject H₀; significant decrease in mean cycle time

Answer: B
Reason: z = (12.4 − 12.0) / (1.2/√36) = 0.4 / 0.2 = 2.0. For a one-sided test at α = 0.05, the critical z is 1.645. Since 2.0 > 1.645, reject H₀ and conclude a significant increase. The other options misinterpret the direction or the decision rule.

---

An engineer compares the mean tensile strength of parts from two machines, A and B. Each machine provides 20 independent, normally distributed samples. The population variances are unknown but appear similar. Which test is most appropriate to determine whether the mean tensile strengths differ?
A. Two-sample z-test for means
B. Two-sample t-test assuming unequal variances (Welch’s t-test)
C. Two-sample t-test assuming equal variances (pooled t-test)
D. Paired t-test for means

Answer: C
Reason: With two independent normal samples, unknown but similar variances, and no pairing, the pooled two-sample t-test assuming equal variances is appropriate. The other options mis-specify the variance structure, assume known σ, or incorrectly assume pairing.

---

A Black Belt tests whether a new training reduces the average handling time below the current standard of 8.0 minutes. The null and alternative hypotheses are:
H₀: μ = 8.0
H₁: μ < 8.0
Which statement best describes the Type II error in this context?
A. Concluding μ < 8.0 when it is actually equal to 8.0
B. Concluding μ > 8.0 when it is actually equal to 8.0
C. Failing to conclude μ < 8.0 when it is actually less than 8.0
D. Failing to conclude μ > 8.0 when it is actually greater than 8.0

Answer: C
Reason: A Type II error is failing to reject H₀ when H₁ is true; here, not detecting the reduction (μ < 8.0) when it actually exists. The other options confuse Type I vs. Type II errors or use the wrong direction of effect.

---

A supplier claims its product has a mean diameter of 10.00 mm. A Black Belt takes a sample of 16 parts, finds a mean of 9.94 mm and a sample standard deviation of 0.08 mm. Data are normal and σ is unknown. For a two-sided test at α = 0.05, what is the correct test statistic and decision if the critical t values are ±2.131?
A. t = −3.00; reject H₀
B. t = −1.50; fail to reject H₀
C. t = −3.00; fail to reject H₀
D. t = −1.50; reject H₀

Answer: A
Reason: t = (9.94 − 10.00) / (0.08/√16) = (−0.06) / 0.02 = −3.00. Since −3.00 < −2.131, reject H₀ and conclude the mean differs from 10.00 mm. The other options contain incorrect calculations or misapply the critical values to the decision.
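The z and t arithmetic in the worked answers above can be verified directly:

```python
import math

# Cycle-time question: z = (12.4 - 12.0) / (1.2 / sqrt(36))
z = (12.4 - 12.0) / (1.2 / math.sqrt(36))
print(f"z = {z:.2f}")   # 2.00, above the one-sided critical value 1.645

# Diameter question: t = (9.94 - 10.00) / (0.08 / sqrt(16))
t = (9.94 - 10.00) / (0.08 / math.sqrt(16))
print(f"t = {t:.2f}")   # -3.00, beyond the critical value -2.131
```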
