
4.4.2 Linear & Quadratic Mathematical Models

Introduction

Linear and quadratic models are core tools for describing, analyzing, and improving relationships between variables. They are widely used for:

- Modeling cause–effect relationships
- Predicting process performance
- Quantifying the impact of input changes on outputs
- Supporting optimization and decision-making

This article builds from basic form and interpretation through estimation, diagnostics, and practical use in improvement projects, focusing only on linear and quadratic models and closely related concepts.

---

Foundations of Mathematical Modeling

Variables, Parameters, and Error

A mathematical model expresses how a response (output) depends on one or more predictors (inputs):

- Response variable (Y): outcome of interest (e.g., cycle time, defect rate)
- Predictor variable (X): factor thought to influence Y (e.g., temperature, speed)
- Parameters: unknown constants to be estimated from data (e.g., slopes, intercept)
- Error term (ε): random deviation of observations from the model

A general model form has two parts:

- Deterministic part: describes the systematic relationship
  - Example: \( f(X) = \beta_0 + \beta_1 X + \beta_2 X^2 \)
- Stochastic part: random variation not explained by the predictors

Complete model: \( Y = f(X) + \varepsilon \)

Functional vs Statistical Models

- Functional model: exact relationship assumed, no random error
  - Example: geometric formulas, physical laws
- Statistical model: includes random variation
  - Example: regression models with an error term ε

Linear and quadratic regression models are statistical models: parameters are estimated from data, and predictions include uncertainty.
---

Linear Models

Form of a Simple Linear Model

A simple linear model with one predictor X has the form:

\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

- \(\beta_0\) (intercept): predicted value of Y when X = 0
- \(\beta_1\) (slope): change in mean Y for a 1-unit increase in X
- \(\varepsilon\): random error term, typically assumed to have:
  - Mean 0
  - Constant variance
  - Independence
  - Often approximated as normally distributed for inference

For multiple predictors \(X_1, X_2, \dots, X_k\):

\[ Y = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k + \varepsilon \]

Estimation by Least Squares

Parameters are typically estimated by Ordinary Least Squares (OLS):

- Objective: choose \(\hat{\beta}_0, \hat{\beta}_1, \dots\) to minimize the sum of squared residuals:

\[ \text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

- Residual for point i:

\[ e_i = y_i - \hat{y}_i \]

Key ideas:

- The fitted line passes through the “center” of the data in an optimal way
- Residuals measure the unexplained portion of each observation
- The sum of the residuals is zero for a model with an intercept

Interpretation of Linear Coefficients

For the simple linear model:

\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X \]

- Intercept \(\hat{\beta}_0\):
  - Predicted Y when X = 0
  - Often extrapolative and may have limited physical meaning
- Slope \(\hat{\beta}_1\):
  - Estimated change in the mean of Y for a one-unit increase in X
  - Sign indicates direction:
    - Positive: Y increases as X increases
    - Negative: Y decreases as X increases
  - Magnitude indicates the strength of the effect per unit change

In multiple linear regression, each slope is interpreted holding the other predictors constant.
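For a single predictor, the OLS estimates have a closed form: \(\hat{\beta}_1 = S_{xy}/S_{xx}\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\). A minimal sketch in Python (the data are made up; noise-free values from Y = 2 + 3X, so OLS recovers the line exactly):

```python
def fit_simple_linear(x, y):
    """OLS fit of y = b0 + b1*x using the closed-form normal equations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1

# Noise-free data generated from Y = 2 + 3X
x = [1, 2, 3, 4, 5]
y = [2 + 3 * xi for xi in x]
b0, b1 = fit_simple_linear(x, y)
print(b0, b1)  # → 2.0 3.0
```

With real (noisy) data the residuals \(e_i = y_i - \hat{y}_i\) would be nonzero, but their sum would still be zero because the model contains an intercept.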
Goodness of Fit: R² and Adjusted R²

To assess how well a linear model fits:

- Total Sum of Squares: \( \text{SST} = \sum (y_i - \bar{y})^2 \)
- Regression Sum of Squares: \( \text{SSR} = \sum (\hat{y}_i - \bar{y})^2 \)
- Error Sum of Squares: \( \text{SSE} = \sum (y_i - \hat{y}_i)^2 \)
- Decomposition: \( \text{SST} = \text{SSR} + \text{SSE} \)

Coefficient of determination (R²):

\[ R^2 = \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}} \]

- Proportion of the variation in Y explained by the model
- Between 0 and 1; higher values indicate better explanatory power

Adjusted R²:

- Adjusts R² for the number of predictors and the sample size
- Prevents artificial inflation of R² by adding non-informative variables
- Used mainly in multiple regression to compare models with different numbers of predictors

Linear Model Assumptions and Diagnostics

Core assumptions:

- Linearity: the mean of Y is a linear function of the predictors
- Independence: errors are independent
- Homoscedasticity: errors have constant variance for all fitted values
- Normality (for inference): errors are approximately normally distributed

Common diagnostics:

- Residuals vs fitted plot:
  - Should show random scatter around zero
  - Systematic patterns (curvature, funnel shapes) suggest:
    - Nonlinearity
    - Non-constant variance
- Normal Q–Q plot of residuals:
  - Points should follow a straight line
  - Strong deviations suggest non-normality
- Outlier and leverage analyses:
  - Identify unusual observations or influential points

When residual plots show curvature, a linear model may be inadequate; a quadratic model can often provide a better fit.
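The R² decomposition above is easy to compute directly. A sketch with made-up data (five points with some scatter around an upward trend); the adjusted R² formula uses \(1 - (1 - R^2)(n-1)/(n-k-1)\) with k predictors:

```python
def r_squared(x, y):
    """R² and adjusted R² for a simple linear OLS fit of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - sse / sst           # fraction of variation explained
    k = 1                        # one predictor
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj_r2

# Made-up data: y trends upward in x but with scatter
r2, adj = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2, 3), round(adj, 3))  # → 0.6 0.467
```

Note that adjusted R² is below R²; with more predictors and the same n, the penalty grows.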
---

Quadratic Models

Form of a Quadratic Model

A quadratic model in one predictor X has the form:

\[ Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon \]

- \(\beta_0\): intercept
- \(\beta_1\): linear effect of X
- \(\beta_2\): curvature term; determines the shape of the parabola

Properties:

- If \(\beta_2 > 0\): the parabola opens upward (U-shaped)
- If \(\beta_2 < 0\): the parabola opens downward (inverted U)
- Allows modeling of:
  - Diminishing returns
  - Optimum points (minima or maxima)
  - Curved response patterns often found in real processes

Relationship to Linear Regression

Even though the model includes \(X^2\), it is still linear in the parameters:

\[ Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon \]

Define new predictors:

- \(Z_1 = X\)
- \(Z_2 = X^2\)

Then:

\[ Y = \beta_0 + \beta_1 Z_1 + \beta_2 Z_2 + \varepsilon \]

This means:

- OLS methods for linear regression apply directly
- Software handles quadratic models as a special case of multiple linear regression with transformed predictors

Vertex and Optimum Point

The quadratic model has a vertex where the predicted response is a minimum (if \(\beta_2 > 0\)) or a maximum (if \(\beta_2 < 0\)). For:

\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X + \hat{\beta}_2 X^2 \]

- X-coordinate of the vertex:

\[ X^* = -\frac{\hat{\beta}_1}{2\hat{\beta}_2} \]

- Predicted optimum response:

\[ \hat{Y}^* = \hat{\beta}_0 + \hat{\beta}_1 X^* + \hat{\beta}_2 (X^*)^2 \]

Interpretation:

- If \(\hat{\beta}_2 < 0\): the vertex is a maximum; \(X^*\) gives the peak predicted response
- If \(\hat{\beta}_2 > 0\): the vertex is a minimum; \(X^*\) gives the lowest predicted response

This is central for identifying operating conditions that optimize performance.
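The linear-in-parameters point can be made concrete: build the columns 1, \(Z_1 = X\), \(Z_2 = X^2\) and solve the same normal equations used for multiple linear regression. A sketch (all data are hypothetical: noise-free values from Y = 120 − 10X + 0.5X², so the fit recovers the curve and its vertex exactly):

```python
def fit_quadratic(x, y):
    """OLS fit of Y = b0 + b1*X + b2*X^2: treat Z1=X, Z2=X^2 as predictors
    and solve the 3x3 normal equations (X'X) b = X'y by Gaussian elimination."""
    rows = [[1.0, xi, xi * xi] for xi in x]   # design matrix [1, Z1, Z2]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    m = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):                      # elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    b = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):                       # back-substitution
        b[r] = (m[r][3] - sum(m[r][c] * b[c] for c in range(r + 1, 3))) / m[r][r]
    return b

# Hypothetical noise-free data from Y = 120 - 10X + 0.5X^2
x = [2, 4, 6, 8, 10, 12, 14]
y = [120 - 10 * xi + 0.5 * xi * xi for xi in x]
b0, b1, b2 = fit_quadratic(x, y)
x_star = -b1 / (2 * b2)                       # vertex: a minimum, since b2 > 0
y_star = b0 + b1 * x_star + b2 * x_star ** 2
print(round(x_star, 3), round(y_star, 3))     # → 10.0 70.0
```

In practice one would use statistical software for the fit; the point here is only that no new estimation machinery is needed beyond multiple linear regression.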
Interpretation of Quadratic Coefficients

For:

\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X + \hat{\beta}_2 X^2 \]

- \(\hat{\beta}_0\):
  - Predicted Y at X = 0 (may be extrapolative)
- \(\hat{\beta}_1\):
  - Local linear effect near X = 0
  - Not constant across all X when \(\hat{\beta}_2 \ne 0\)
- \(\hat{\beta}_2\):
  - Curvature: the magnitude indicates the degree of bending
  - The sign determines whether the response curves up or down

The instantaneous rate of change of Y with respect to X is the derivative:

\[ \frac{d\hat{Y}}{dX} = \hat{\beta}_1 + 2\hat{\beta}_2 X \]

- Shows that the effect of X on Y changes with the level of X
- The rate of change is zero at the vertex \(X^* = -\hat{\beta}_1 / (2\hat{\beta}_2)\)

---

Building Linear and Quadratic Models from Data

Data Requirements and Preparation

Key considerations:

- Measurement quality:
  - Reliable, valid measurements of Y and X
  - Stable measurement system
- Range of X:
  - Must include enough spread to detect linear or quadratic patterns
  - For quadratic effects, include values on both sides of the potential optimum if feasible
- Sample size:
  - Should be sufficient to estimate the parameters and assess model adequacy
  - More parameters (e.g., adding \(X^2\)) require more data

Data preparation:

- Check for:
  - Missing values
  - Obvious errors or outliers
- Understand units and scaling:
  - Strongly different scales across predictors can affect computations and diagnostics
  - Centering or standardizing X can sometimes improve interpretability and reduce multicollinearity with \(X^2\)

Model Selection: Linear vs Quadratic

Choosing between a linear and a quadratic model involves:

- Subject-matter knowledge:
  - Physical or process understanding may suggest curvature or an optimum
- Graphical examination:
  - Scatter plot of Y vs X
  - Overlaid linear and quadratic fits for comparison
- Statistical comparison (for nested models):
  - Linear model: \(Y = \beta_0 + \beta_1 X + \varepsilon\)
  - Quadratic model: \(Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon\)
  - Test the significance of \(\beta_2\):
    - If \(\beta_2\) is statistically nonzero, curvature is supported
- Model parsimony:
  - Prefer the simpler model if it fits adequately
  - Use the quadratic model when it clearly improves fit and interpretation

Parameter Estimation and Inference

OLS estimation provides:

- Point estimates: \(\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \dots\)
- Standard errors: a measure of estimation uncertainty
- t-tests for individual coefficients:
  - Hypotheses: \(H_0: \beta_j = 0\) vs \(H_a: \beta_j \ne 0\)
  - A large |t| and a small p-value suggest the coefficient is significantly different from 0
- Confidence intervals for the parameters: ranges of plausible values for each coefficient

For the overall model:

- ANOVA F-test:
  - Tests whether the model explains a significant portion of the variation in Y
  - Compares the model with predictors against a null model with only an intercept

---

Model Validation and Diagnostics

Residual Analysis for Linear and Quadratic Fits

Residuals \(e_i = y_i - \hat{y}_i\) are central to assessing model adequacy. Key plots:

- Residuals vs fitted values:
  - Random scatter around zero: acceptable
  - Curved patterns: may indicate that a linear model is insufficient
  - Fanning out or narrowing: signals non-constant variance
- Residuals vs X:
  - Useful specifically for checking overlooked curvature
  - If the linear model shows systematic curvature, a quadratic model may be appropriate
- Normal Q–Q plot of residuals:
  - Evaluates approximate normality
  - Important for valid p-values and confidence intervals

Comparing Linear and Quadratic Residuals

When both models are fit:

- Compare:
  - Residual patterns
  - SSE values
  - R² and adjusted R²
- Assess:
  - Whether the quadratic model significantly reduces SSE compared to the linear model
  - Whether residual plots for the quadratic model show improved randomness and constant variance

If adding the quadratic term:

- Substantially improves fit
- Removes systematic patterns in the residuals
- Yields a significant \(\beta_2\)

then the quadratic model is generally preferred.
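Because the linear model is nested inside the quadratic one, the SSE comparison can be formalized as a partial F-test. A sketch of the arithmetic (the SSE values and sample size below are made up for illustration):

```python
def partial_f(sse_reduced, sse_full, df_diff, n, p_full):
    """Partial F statistic for comparing nested OLS models:
    F = ((SSE_reduced - SSE_full) / df_diff) / (SSE_full / (n - p_full)),
    where p_full counts all parameters in the full model, intercept included."""
    return ((sse_reduced - sse_full) / df_diff) / (sse_full / (n - p_full))

# Hypothetical example: n = 20 observations; the linear model (2 parameters)
# leaves SSE = 48.0, the quadratic model (3 parameters) leaves SSE = 20.0
f_stat = partial_f(sse_reduced=48.0, sse_full=20.0, df_diff=1, n=20, p_full=3)
print(round(f_stat, 2))  # → 23.8
```

A value this large relative to the F(1, n − 3) distribution would support retaining the quadratic term; a value near 1 would favor the simpler linear model.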
Overfitting and Appropriate Complexity

Model complexity should match the data support and the purpose of the model:

- Overfitting:
  - The model captures random noise instead of the underlying relationship
  - Symptoms:
    - Very high R² with poor predictive performance on new data
    - Unstable coefficient estimates and wide confidence intervals
- Balanced approach:
  - Use a linear model when it adequately describes the data
  - Move to a quadratic model when there is clear evidence of curvature
  - Avoid unnecessarily high-degree polynomials without strong justification

---

Using Linear and Quadratic Models for Prediction

Point Predictions

Given a fitted model \(\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X\) or \(\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X + \hat{\beta}_2 X^2\):

- For a specified X value, compute the fitted value \(\hat{Y}\) by substitution
- In multiple regression, substitute the full set of predictor values \(X_1, X_2, \dots\)

These point predictions represent the estimated mean response at that X (and the other predictors).

Prediction Intervals and Uncertainty

Predictions have uncertainty from:

- Estimation error in the coefficients
- Random error in new observations

Key intervals:

- Confidence interval for the mean response at X:
  - Reflects uncertainty in the expected value of Y at that X
  - Narrower than the prediction interval
- Prediction interval for an individual Y at X:
  - Includes both uncertainty in the model parameters and random variation in a specific observation
  - Wider than the confidence interval for the mean

As X moves far from the center of the observed data, both intervals typically widen, reflecting increased uncertainty in extrapolation.
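For simple linear regression these two kinds of uncertainty have closed-form standard errors: \(s\sqrt{1/n + (x_0-\bar{x})^2/S_{xx}}\) for the mean response and \(s\sqrt{1 + 1/n + (x_0-\bar{x})^2/S_{xx}}\) for a new individual observation, where \(s^2 = \text{SSE}/(n-2)\). A sketch with made-up data (interval half-widths would multiply these by a t quantile, omitted here to stay stdlib-only):

```python
import math

def prediction_ses(x, y, x0):
    """Standard errors at x0 for a simple linear OLS fit of y on x:
    (SE of the mean response, SE of a new individual observation)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))             # residual standard error
    lev = 1 / n + (x0 - x_bar) ** 2 / s_xx   # grows as x0 leaves the center
    return s * math.sqrt(lev), s * math.sqrt(1 + lev)

x = [1, 2, 3, 4, 5]                          # made-up data
y = [2, 4, 5, 4, 5]
se_mean_c, se_pred_c = prediction_ses(x, y, x0=3)  # at the center of the data
se_mean_e, se_pred_e = prediction_ses(x, y, x0=5)  # at the edge of the data
# The individual-prediction SE always exceeds the mean-response SE,
# and both grow as x0 moves away from the center of the observed x values
```

This makes the two claims above concrete: prediction intervals are wider than confidence intervals for the mean, and both widen away from the data center.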
Extrapolation Risks

Using the model outside the data range can be misleading:

- For linear models, assuming the linear trend continues indefinitely may be unrealistic
- For quadratic models:
  - Predicted values can grow very large (positive or negative) outside the observed X range
  - The apparent optimum may lie outside the practical or physically meaningful range

To manage risk:

- Restrict interpretation primarily to the range of X used for fitting
- Use domain knowledge to judge the plausibility of extrapolated predictions
- When the optimal conditions are near the edge of the studied range, consider collecting more data beyond that edge if feasible

---

Practical Interpretation and Application

Understanding Effect Size and Sensitivity

For a linear model:

\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X \]

- Sensitivity of Y to X: \(\hat{\beta}_1\) units of change in Y per one unit of X
- Improvement thinking:
  - How much must X change to achieve a desired change in Y?
  - Is this change operationally feasible?

For a quadratic model:

\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X + \hat{\beta}_2 X^2 \]

- Sensitivity depends on X:

\[ \frac{d\hat{Y}}{dX} = \hat{\beta}_1 + 2\hat{\beta}_2 X \]

- Interpretation:
  - Near the vertex, the rate of change may be small (a plateau)
  - Far from the vertex, the rate of change may be larger (more sensitive)

Identifying Optimum Operating Conditions

Quadratic models are especially useful for locating settings that optimize performance. Steps:

- Fit the quadratic model: \( \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X + \hat{\beta}_2 X^2 \)
- Compute:

\[ X^* = -\frac{\hat{\beta}_1}{2\hat{\beta}_2} \]

- Evaluate:
  - Whether \(\hat{\beta}_2 < 0\) (for a maximum) or \(\hat{\beta}_2 > 0\) (for a minimum), depending on the objective
  - Whether \(X^*\) lies within the studied and practically feasible range
- Calculate the predicted optimum response:

\[ \hat{Y}^* = \hat{\beta}_0 + \hat{\beta}_1 X^* + \hat{\beta}_2 (X^*)^2 \]

If the optimum is near the boundary of the tested region:

- Consider extending the range of X and collecting additional data
- Confirm that the observed optimum is not an artifact of extrapolation

Multiple Predictors and Second-Order Models

When more than one predictor is involved, a second-order model can include linear terms, pure quadratic terms, and interaction terms:

\[ Y = \beta_0 + \sum_{i} \beta_i X_i + \sum_{i} \beta_{ii} X_i^2 + \sum_{i<j} \beta_{ij} X_i X_j + \varepsilon \]

Models of this form are the basis of response surface methods for optimizing processes with several inputs.
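The optimum-finding steps can be sketched directly. The fitted coefficients below are hypothetical (Ŷ = 50 + 12X − X², studied over X in [1, 10]); the helper computes the vertex, classifies it, and flags whether it lies inside the studied range:

```python
def quadratic_optimum(b0, b1, b2, x_low, x_high):
    """Locate the vertex of a fitted quadratic Y-hat = b0 + b1*X + b2*X^2,
    classify it as a maximum or minimum, and check it against the studied
    X range [x_low, x_high]."""
    x_star = -b1 / (2 * b2)                       # vertex location
    y_star = b0 + b1 * x_star + b2 * x_star ** 2  # predicted optimum response
    kind = "maximum" if b2 < 0 else "minimum"     # sign of b2 sets the shape
    in_range = x_low <= x_star <= x_high
    return x_star, y_star, kind, in_range

# Hypothetical fitted coefficients: Y-hat = 50 + 12X - X^2, X studied in [1, 10]
x_star, y_star, kind, in_range = quadratic_optimum(50, 12, -1, 1, 10)
print(x_star, y_star, kind, in_range)  # → 6.0 86.0 maximum True
```

If `in_range` came back `False`, the "optimum" would be an extrapolation and should trigger additional data collection rather than a setpoint change.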

Practical Case: Linear & Quadratic Mathematical Models

Context

A bottling plant is experiencing frequent overfill and underfill of 500 ml drink bottles on one production line, causing rework and customer complaints.

Problem

Management wants to adjust conveyor speed and filler valve time to:

- Minimize fill-level variation
- Maintain target throughput

They suspect a simple adjustment rule is needed for operators, usable during shift changes and minor product switches.

Application of Linear & Quadratic Models

Step 1 – Linear model for planning

The team collects data for:

- Conveyor speed (x)
- Bottles filled per minute (throughput, T)

A simple linear model is fitted:

T = a + b·x

This model is used to select a target conveyor speed that meets the daily output goal, giving operators a first-pass setpoint for each product.

Step 2 – Quadratic model for optimization

Next, the team studies:

- Conveyor speed (x)
- Average squared deviation from 500 ml (fill error, E)

The data show that error rises at very low and very high speeds. A quadratic model is fitted:

E = c + d·x + e·x²

Using this model, they compute the conveyor speed that minimizes E while checking that T from the linear model at that speed still meets the throughput requirement.

Step 3 – Simple operating rule

The plant standardizes:

- A small table of recommended speeds by bottle size (from the quadratic optimum)
- A check that predicted throughput (from the linear model) stays above the minimum requirement

Operators adjust only speed; filler valve time remains fixed for each product, based on existing settings.

Result

- Fill-related rework drops significantly.
- Throughput remains at the target level.
- Operators use a one-page job aid with the linear-based throughput check and the quadratic-based optimal speed, enabling consistent settings across shifts without needing to understand the underlying math.
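The two-model rule reduces to a few lines of arithmetic. In the sketch below every coefficient is invented purely for illustration (the case above does not report fitted values): T = 150 + 2.5x for throughput and E = 1000 − 30x + 0.25x² for fill error, with a minimum required throughput of 280 bottles per minute:

```python
def optimal_speed(c, d, e):
    """Speed minimizing fill error E = c + d*x + e*x^2 (requires e > 0)."""
    return -d / (2 * e)

def throughput(a, b, x):
    """Predicted throughput from the linear model T = a + b*x."""
    return a + b * x

# All coefficients are hypothetical, for illustration only
a, b = 150.0, 2.5            # throughput model: T = a + b*x
c, d, e = 1000.0, -30.0, 0.25  # fill-error model: E = c + d*x + e*x^2
t_min = 280.0                # minimum required bottles per minute

x_star = optimal_speed(c, d, e)       # error-minimizing conveyor speed
t_at_opt = throughput(a, b, x_star)   # throughput check at that speed
print(x_star, t_at_opt, t_at_opt >= t_min)  # → 60.0 300.0 True
```

If the throughput check failed, the rule would fall back to the lowest speed that still meets `t_min`, accepting slightly higher fill error; that trade-off is exactly what the one-page job aid encodes.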

Practice Questions: Linear & Quadratic Mathematical Models

A Black Belt is modeling the relationship between cycle time (Y) and batch size (X). A scatter plot shows a strong curved pattern where cycle time decreases up to a certain batch size and then increases beyond that point. Which model is most appropriate to test first?

A. Simple linear regression model
B. Quadratic regression model
C. Exponential regression model
D. Logistic regression model

Answer: B

Reason: The data pattern shows a single turning point (U-shaped or inverted U), which is characteristic of a quadratic relationship (Y = β0 + β1X + β2X²). A quadratic model directly captures curvature with a single turning point, matching the observed pattern. The other options assume monotonic relationships (A, C) or binary outcomes (D), which do not match the continuous, curved behavior described.

---

In evaluating a quadratic regression model Y = β0 + β1X + β2X², the Black Belt finds β1 is statistically significant (p < 0.01) and β2 is not (p = 0.72). Residual plots show no remaining curvature. What is the most appropriate next step?

A. Retain the quadratic term because it was part of the original model
B. Remove the quadratic term and refit a linear regression model
C. Add an X³ term to check for higher-order curvature
D. Transform Y using a logarithmic function

Answer: B

Reason: A non-significant β2 with no curvature in the residuals indicates the quadratic term does not materially contribute to explaining Y. The parsimonious and statistically justified choice is to refit a simpler linear model without X². Keeping an unnecessary term (A), adding higher-order complexity without evidence (C), or transforming Y without an identified need (D) are not supported by the current diagnostics.

---

A Black Belt fits a quadratic model of the form Y = β0 + β1X + β2X² for cost (Y) versus production rate (X). The fitted equation is:

Ŷ = 120 – 10X + 0.5X²
Which interpretation of cost behavior is most appropriate?

A. Cost decreases indefinitely as production rate increases
B. Cost increases at a constant rate with production rate
C. Cost has a minimum at some production rate, then increases afterward
D. Cost has a maximum at some production rate, then decreases afterward

Answer: C

Reason: With β1 < 0 and β2 > 0, the quadratic function is U-shaped: cost initially decreases with increasing X (negative linear term) until the vertex, then increases as X grows (positive quadratic term). This indicates a minimum cost at an optimal production rate. Options A and B describe monotonic or linear relationships inconsistent with a quadratic; D would require β2 < 0, which is not the case here.

---

A team compares two models for predicting defect rate (Y) from machine speed (X):

- Model 1 (linear): Y = β0 + β1X, R² = 0.71
- Model 2 (quadratic): Y = β0 + β1X + β2X², R² = 0.75; adjusted R² increases only from 0.70 to 0.71, and β2 has p = 0.08

Residual plots from both models look similar. What should the Black Belt conclude?

A. Prefer Model 2 because it has a higher R²
B. Prefer Model 2 because it is more complex
C. Prefer Model 1 because the incremental benefit of the quadratic term is minimal
D. Reject both models and fit a cubic polynomial

Answer: C

Reason: The quadratic term provides only a marginal improvement in adjusted R² and is not strongly significant (p = 0.08), with no visible improvement in the residuals. Applying the principle of parsimony, the simpler linear model is preferred. Choosing Model 2 based solely on R² (A, B) ignores overfitting risk; moving to a cubic model (D) is unjustified without evidence of remaining curvature.

---

A Black Belt uses a quadratic regression model to relate throughput (Y) to staffing level (X) and obtains the fitted equation:

Ŷ = 50 + 12X – X², with X measured in number of operators.
To find the staffing level that maximizes predicted throughput within the studied range, which calculation is correct?

A. Set dŶ/dX = 0 ⇒ 12 – 2X = 0 ⇒ X = 6
B. Set dŶ/dX = 0 ⇒ 12 – X² = 0 ⇒ X = √12
C. Evaluate Ŷ at the minimum and maximum observed X only
D. Select the staffing level that gave the highest observed Y in the sample

Answer: A

Reason: For Ŷ = 50 + 12X – X², the derivative is dŶ/dX = 12 – 2X. Setting it to zero gives X = 6 as the vertex, which is a maximum because the coefficient of X² is negative (concave down). This is the correct analytic method for optimizing a quadratic model. Options B, C, and D ignore or misapply the standard vertex/derivative-based optimization for a quadratic function.
