3.5.8 Chi-Squared (Contingency Tables)
Introduction

Chi-squared tests on contingency tables are used to investigate relationships between categorical variables. They answer questions such as:
- Are two categorical variables statistically independent?
- Does the observed distribution of categories differ from what we would expect by chance?

This article focuses on the chi-squared test of independence using contingency tables, with all supporting concepts required to use, interpret, and explain the method correctly.

---

Categorical Data and Contingency Tables

Types of Categorical Variables

A chi-squared test for contingency tables is applied to discrete categories, not continuous measurements.
- Nominal variables: Categories with no inherent order (for example, defect type A/B/C, supplier X/Y).
- Ordinal variables: Categories with a logical order (for example, low/medium/high), treated as categories without using their order in the chi-squared test.

The data must be available as counts (frequencies) in each category combination.

Structure of a Contingency Table

A contingency table (cross-tabulation) summarizes how often combinations of two categorical variables occur.

Example: a 2×3 table (2 rows, 3 columns)
- Rows: Category levels of variable 1.
- Columns: Category levels of variable 2.
- Cells: Count of observations in each row–column combination.
- Marginal totals: Row sums and column sums.
- Grand total: Sum of all cell counts.

The chi-squared test uses:
- Observed frequencies: Actual counts from the data.
- Expected frequencies: Counts predicted under the assumption of independence.

---

Chi-Squared Test of Independence

Purpose of the Test

The chi-squared test of independence evaluates whether two categorical variables are associated.
- Null hypothesis (H₀): The variables are independent (no association).
- Alternative hypothesis (H₁): The variables are not independent (there is an association).

The test compares observed cell counts with expected counts calculated under H₀.
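As a minimal sketch of how such a table of observed counts is assembled, the following Python builds a 2×3 table from raw categorical records; the production-line and defect labels are invented purely for illustration.

```python
# Build a 2x3 contingency table of observed counts from raw categorical
# records. The line and defect labels are hypothetical.
from collections import Counter

# Each record pairs one level of variable 1 with one level of variable 2.
records = [
    ("Line 1", "Defect A"), ("Line 1", "Defect B"), ("Line 1", "Defect A"),
    ("Line 2", "Defect C"), ("Line 2", "Defect B"), ("Line 1", "Defect C"),
    ("Line 2", "Defect A"), ("Line 2", "Defect B"), ("Line 1", "Defect B"),
]

rows = ["Line 1", "Line 2"]
cols = ["Defect A", "Defect B", "Defect C"]

counts = Counter(records)
observed = [[counts[(r, c)] for c in cols] for r in rows]

row_totals = [sum(row) for row in observed]        # marginal row sums
col_totals = [sum(col) for col in zip(*observed)]  # marginal column sums
grand_total = sum(row_totals)                      # N

print(observed)     # [[2, 2, 1], [1, 2, 1]]
print(grand_total)  # 9
```

Every raw observation lands in exactly one cell, so the row totals, column totals, and grand total needed later for expected counts fall out of the table directly.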
Assumptions and Data Requirements

For valid conclusions from the chi-squared test:
- Data type
  - Each observation belongs to exactly one row category and one column category.
  - Cell entries are counts of independent observations, not percentages or rates.
- Sampling
  - Data come from a random sample or a process that can be approximated as random.
  - Each observation is independent of the others.
- Expected cell counts
  - All expected counts should be reasonably large.
  - A common rule: at least 80% of expected counts are ≥ 5 and no expected count is < 1.
- Table structure
  - Any r×c table (r rows, c columns) is allowed, including 2×2 tables.

If the expected-count assumptions are not met, consider combining categories where conceptually appropriate to increase expected frequencies.

---

Calculating Expected Counts

Formula for Expected Frequencies

Under the null hypothesis of independence, the expected frequency for cell (i, j) is:

Eᵢⱼ = (Row i total × Column j total) / Grand total

Where:
- i indexes the row.
- j indexes the column.

This uses the idea that if the variables are independent, the joint probability factors into the product of the marginal probabilities.

Example Structure (Conceptual)

For a 2×3 table with:
- Row totals: R₁, R₂
- Column totals: C₁, C₂, C₃
- Grand total: N

the expected counts are:
- E₁₁ = (R₁ × C₁) / N
- E₁₂ = (R₁ × C₂) / N
- E₁₃ = (R₁ × C₃) / N
- E₂₁ = (R₂ × C₁) / N
- E₂₂ = (R₂ × C₂) / N
- E₂₃ = (R₂ × C₃) / N

These expected counts are compared with the observed counts to compute the chi-squared test statistic.

---

Chi-Squared Test Statistic

Formula

The chi-squared test statistic measures how far the observed frequencies deviate from the expected frequencies:

χ² = Σᵢ Σⱼ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ

Where:
- Oᵢⱼ = observed frequency in cell (i, j)
- Eᵢⱼ = expected frequency in cell (i, j)
- The summation runs over all rows i and columns j.

Large values of χ² indicate stronger evidence against the null hypothesis of independence.
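The expected-count formula and the χ² statistic can be computed directly from a table of observed counts. A sketch with NumPy, using hypothetical 2×3 counts chosen so the arithmetic is easy to follow:

```python
# Compute E_ij = (row i total * column j total) / N and the chi-squared
# statistic for a hypothetical 2x3 table of observed counts.
import numpy as np

observed = np.array([[20, 30, 25],
                     [30, 20, 25]])

row_totals = observed.sum(axis=1)  # R_1, R_2
col_totals = observed.sum(axis=0)  # C_1, C_2, C_3
n = observed.sum()                 # grand total N

# The outer product gives E_ij = R_i * C_j / N for every cell at once.
expected = np.outer(row_totals, col_totals) / n

# Sum of squared deviations scaled by expected counts, over all cells.
chi2 = ((observed - expected) ** 2 / expected).sum()
print(expected)  # every E_ij is 25.0 for this balanced table
print(chi2)      # 4.0
```

Because every marginal total here is balanced, each expected count equals 25, and only the four cells that deviate by 5 contribute (5² / 25 = 1 each), giving χ² = 4.0.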
Degrees of Freedom

The degrees of freedom (df) for an r×c table:

df = (r − 1) × (c − 1)

Where:
- r = number of rows
- c = number of columns

The df and the χ² value together determine the p-value using the chi-squared distribution.

---

P-Value and Decision Making

Interpreting the P-Value

The p-value is the probability, assuming H₀ is true, of obtaining a chi-squared statistic at least as large as the one observed.
- Small p-value (typically < the chosen alpha, for example 0.05)
  - Evidence against independence.
  - Conclude there is a statistically significant association between the variables.
- Large p-value
  - Insufficient evidence to reject independence.
  - Conclude the data are consistent with independence.

The significance level (alpha) should be set before examining the data.

Hypothesis Test Steps (Conceptual)

1. Define the variables and categories.
2. State the hypotheses:
   - H₀: The variables are independent.
   - H₁: The variables are not independent.
3. Construct the contingency table of observed counts.
4. Compute the row totals, column totals, grand total, and expected counts for each cell.
5. Calculate the chi-squared statistic and the degrees of freedom.
6. Determine the p-value from the chi-squared distribution.
7. Compare the p-value with alpha:
   - If p ≤ alpha: reject H₀ (evidence of association).
   - If p > alpha: fail to reject H₀ (no evidence of association).
8. Interpret the result in the context of the process or problem.

---

Practical Use and Interpretation

Checking Practical vs Statistical Significance

A statistically significant chi-squared result indicates the presence of an association, but not its strength or practical importance.
- Statistical significance is driven by effect size and sample size.
- Practical significance relates to whether the association is meaningful for decisions or improvements.

A large sample can make small, unimportant differences statistically significant; always interpret results with process knowledge.
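The conceptual steps can be run end to end with SciPy, whose `chi2_contingency` function returns the statistic, p-value, degrees of freedom, and expected counts in one call; the counts below are hypothetical and chosen only to illustrate the decision rule.

```python
# End-to-end test of independence on a hypothetical 2x3 table using SciPy.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[20, 30, 25],
                     [30, 20, 25]])

# For tables larger than 2x2, no continuity correction is applied.
chi2, p, df, expected = chi2_contingency(observed)

alpha = 0.05  # chosen before looking at the data
if p <= alpha:
    decision = "reject H0: evidence of association"
else:
    decision = "fail to reject H0: data consistent with independence"

print(df)        # (2 - 1) * (3 - 1) = 2
print(decision)
```

For this table χ² = 4.0 with 2 degrees of freedom, giving a p-value of about 0.135, so at α = 0.05 we fail to reject independence.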
Identifying Contributing Cells

Once a test is significant, identify which cells contribute most to χ².
- Cell contributions
  - For each cell: (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ
  - Larger values indicate cells where observed counts deviate most from expectation.
- Standardized residuals (conceptual, not formula-heavy here)
  - Residuals scaled to help judge unusual cells.
  - Large absolute values indicate cells that depart substantially from independence.

Use these diagnostics to understand patterns, for example:
- A category occurring more often than expected with another category.
- A category occurring less often than expected with another category.

2×2 Tables and Continuity Correction

For 2×2 tables, some software applies a continuity correction (Yates' correction), especially for smaller samples.
- The continuity correction reduces the absolute difference between observed and expected counts before squaring.
- It makes the test more conservative (harder to declare significance) in small samples.

In larger samples, the impact of the continuity correction becomes negligible.

---

Effect Size for Categorical Association

Why Effect Size Matters

The chi-squared test indicates whether an association exists but not how strong it is. Effect size measures help quantify the strength of association.

Effect size is especially useful when:
- Comparing associations across different tables or studies.
- Assessing practical impact beyond statistical significance.

Common Effect Size Measures

Effect size measures for contingency tables are based on χ² and the sample size N.
- Phi coefficient (φ)
  - Used for 2×2 tables.
  - φ = sqrt(χ² / N).
  - Range: 0 (no association) to 1 (perfect association in the binary case).
- Cramér's V
  - Used for larger r×c tables.
  - V = sqrt( χ² / [N × (k − 1)] ), where k is the smaller of r and c.
  - Range: 0 (no association) to 1 (strong association), though with many categories values rarely reach 1.
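Both the cell contributions and Cramér's V follow directly from the table. A sketch with NumPy, using hypothetical 3×2 counts (the row and column meanings are invented for illustration):

```python
# Per-cell contributions to chi-squared and Cramer's V effect size for a
# hypothetical 3x2 table (rows could be shifts, columns pass/fail).
import numpy as np

observed = np.array([[40, 10],
                     [25, 25],
                     [15, 35]])

row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
n = observed.sum()
expected = np.outer(row_totals, col_totals) / n

# Contribution of each cell to the chi-squared statistic; the largest
# values flag the cells driving a significant result.
contributions = (observed - expected) ** 2 / expected
chi2 = contributions.sum()

# Cramer's V: k is the smaller of the row and column counts.
k = min(observed.shape)
cramers_v = np.sqrt(chi2 / (n * (k - 1)))
```

Here χ² is about 25.4 and Cramér's V about 0.41; the first row's "column 2" cell contributes most, i.e. that combination occurs far less often than independence would predict.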
Interpretation of effect size should consider context and domain expectations; there are no universal cutoffs.

---

Data Preparation and Common Pitfalls

Correct Data Coding

Ensure categorical variables are properly defined:
- Each category is mutually exclusive.
- Category labels are consistent (no duplicates due to spelling or formatting).
- Missing or "not applicable" responses are handled explicitly:
  - either excluded with justification, or
  - treated as a separate category if meaningful.

Combining Categories

When expected counts are too small:
- Combine categories that are conceptually similar, such as merging rarely used categories.
- Avoid combining categories only to achieve significance; the rationale must be grounded in the meaning of the categories.

Repeatedly modifying categories to find significance can lead to misleading results.

Independence of Observations

The test assumes independence of observations:
- Do not include repeated measurements from the same unit as separate independent observations unless justified.
- If observations are paired or clustered, chi-squared results may be distorted.

When independence is questionable, conclusions from the chi-squared test should be treated cautiously.

---

Using Software and Output Interpretation

Typical Software Output Elements

Most statistical tools produce:
- The contingency table with observed frequencies.
- Expected frequencies for each cell.
- The chi-squared statistic.
- Degrees of freedom.
- The p-value.
- Sometimes also:
  - Cell contributions.
  - Standardized residuals.
  - An effect size (φ or Cramér's V).

How to Read the Output

Focus on:
- The chi-squared statistic and p-value, to decide whether to reject or fail to reject independence.
- Cells with large deviations, using expected counts and residuals to interpret patterns.
- The effect size, to gauge whether a statistically significant association is weak, moderate, or strong in practical terms.
The final interpretation must link the statistical result to the process or problem under investigation.

---

Limitations of Chi-Squared Tests

Nature of the Information

Chi-squared tests on contingency tables:
- Indicate association, not causation.
- Are sensitive to sample size:
  - A very large N may detect trivial associations.
  - A very small N may fail to detect meaningful associations.

Data Restrictions

The chi-squared test is not appropriate when:
- The data are not counts (for example, percentages that do not sum to a total N).
- Expected frequencies are very low and cannot be remedied by valid category combination.
- Observations are not independent (for example, repeated measures on the same unit).

In such cases, alternative methods may be required, but those methods lie outside the scope of this article.

---

Summary

The chi-squared test for contingency tables assesses whether two categorical variables are independent by comparing observed cell counts to expected counts under independence. It relies on:
- Properly constructed contingency tables of counts.
- Calculation of expected frequencies from row and column totals.
- The chi-squared statistic, based on squared deviations between observed and expected counts.
- Degrees of freedom determined by the table dimensions.
- P-values to decide whether an association is statistically significant.

Effect size measures, such as phi and Cramér's V, help quantify the strength of association, while examination of cell contributions and residuals reveals which category combinations drive the result.

Correct data preparation, attention to assumptions (independence, adequate expected counts), and careful interpretation in the context of practical decision-making are essential for valid and useful application of chi-squared tests on contingency tables.
Practical Case: Chi-Squared (Contingency Tables)

A hospital wants to know whether a new text-message reminder system changes outpatient appointment no-show rates across different age groups. They roll out the reminder system in two clinics for one month. At check-in, staff record whether each patient:
- received the reminder (Yes/No), and
- showed up or was a no-show (Show/No-show),

along with the patient's age group (Under 30, 30–60, Over 60).

The quality improvement team builds a contingency table with:
- Rows: Reminder (Yes, No)
- Columns: Attendance (Show, No-show)

They repeat this for each age group and run a chi-squared test of independence on each age-group table to see whether reminders and attendance are associated.

Results show:
- For patients aged 30–60, there is a statistically significant association: those who received reminders have a lower no-show rate.
- For patients under 30 and over 60, no significant association is found.

The hospital decides to:
- Make reminders mandatory for appointments in the 30–60 age group.
- Pilot alternative interventions (phone calls, app notifications) for the other age groups, to be tested later with new contingency tables and chi-squared analysis.
Practice Questions: Chi-Squared (Contingency Tables)

A Black Belt is evaluating whether defect type is associated with production shift using a 3×2 contingency table (3 defect types, 2 shifts). Which chi-squared test is most appropriate?

A. One-sample chi-squared goodness-of-fit test
B. Two-sample t-test for means
C. Chi-squared test of independence
D. Chi-squared test for equal variances

Answer: C

Reason: The chi-squared test of independence is used to determine whether two categorical variables (defect type and shift) are associated, based on a contingency table. The other options are incorrect: A compares one categorical variable against a theoretical distribution, B compares continuous means, and D is not a standard chi-squared contingency-table application.

---

A 2×3 contingency table is used to assess the relationship between machine (M1, M2) and product disposition (Accept, Rework, Scrap). The chi-squared test statistic is 5.99 and the significance level is α = 0.05. The degrees of freedom are:

A. 1
B. 2
C. 3
D. 5

Answer: B

Reason: Degrees of freedom for a contingency table are (rows − 1) × (columns − 1) = (2 − 1) × (3 − 1) = 1 × 2 = 2. The other options do not follow the (r − 1)(c − 1) rule for a 2×3 table.

---

A Black Belt constructs a 2×2 contingency table to test whether a new training program affects whether operators follow a standard work instruction (Yes/No). The expected count in one cell is 3. Which action is most appropriate?

A. Proceed with the standard chi-squared test without changes
B. Combine categories or redesign the study to increase cell counts
C. Ignore the small cell and analyze only the other three cells
D. Use a two-sample proportion z-test instead of any chi-squared method

Answer: B

Reason: The chi-squared assumptions require adequate expected counts; with an expected count as low as 3, combining categories or increasing the sample size is appropriate to meet the assumptions.
The other options are not best because A violates the assumptions, C discards data and biases the results, and D does not resolve the low-count issue in a 2×2 table with small expected frequencies.

---

A Black Belt analyzes the association between supplier (A, B, C) and lot acceptance (Pass, Fail). The chi-squared test of independence yields p = 0.012 at α = 0.05. Which conclusion is most appropriate?

A. Fail to reject H₀; supplier and lot acceptance are independent
B. Reject H₀; supplier and lot acceptance are associated
C. Fail to reject H₀; there is no evidence of any relationship
D. Conclude causation: supplier choice causes lot failure

Answer: B

Reason: With p < α, the null hypothesis of independence is rejected, indicating a statistically significant association between supplier and lot acceptance. A and C misinterpret p < α, and D incorrectly claims causation from an association test.

---

A Black Belt is analyzing a 4×3 contingency table (4 regions, 3 defect categories). The observed and expected counts are available, and the chi-squared statistic is computed as χ² = 11.4. At α = 0.05, the critical chi-squared value with the correct degrees of freedom is approximately 12.59. What is the best decision?

A. Reject H₀; defect category depends on region
B. Fail to reject H₀; no statistically significant association
C. Increase α to 0.10 and then reject H₀
D. Conclude that the sample size is too small and the results are invalid

Answer: B

Reason: The degrees of freedom are (4 − 1)(3 − 1) = 3 × 2 = 6; with χ² = 11.4 < 12.59, we fail to reject H₀ at α = 0.05, indicating no statistically significant association. A contradicts the comparison with the critical value, C changes α post hoc (inappropriate for formal inference), and D is not supported solely by χ² being below the critical value.
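The critical-value comparison in the last question can be checked numerically; a sketch using SciPy's chi-squared distribution that only re-derives the quoted values:

```python
# Verify the 4x3 practice question: df, the critical value at alpha = 0.05,
# and the decision for an observed statistic of 11.4.
from scipy.stats import chi2 as chi2_dist

r, c = 4, 3
df = (r - 1) * (c - 1)                   # 3 * 2 = 6
alpha = 0.05
critical = chi2_dist.ppf(1 - alpha, df)  # approximately 12.59
p_value = chi2_dist.sf(11.4, df)         # tail probability of the statistic

print(round(critical, 2))  # 12.59
print(p_value > alpha)     # True -> fail to reject H0
```

The p-value of roughly 0.077 exceeds 0.05, which matches the critical-value decision: 11.4 falls below 12.59, so H₀ is not rejected.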
