AnalystPrep

Hypothesis Testing in Regression Analysis

Hypothesis testing is used to determine whether the estimated regression coefficients are statistically significant. Either the confidence interval approach or the t-test approach can be used. In this section, we explore the t-test approach.

The t-test Approach

The t-test is performed in the following steps:

  • Set the significance level for the test.
  • Formulate the null and the alternative hypotheses.
  • Calculate the t-statistic using the formula:

$$t=\frac{\widehat{b_1}-b_1}{s_{\widehat{b_1}}}$$

Where:

\(b_1\) = True slope coefficient, as hypothesized under the null.

\(\widehat{b_1}\) = Point estimate for \(b_1\).

\(s_{\widehat{b_1}}\) = Standard error of the regression coefficient.

  • Compare the absolute value of the t-statistic to the critical t-value (\(t_c\)). Reject the null hypothesis if the absolute value of the t-statistic is greater than the critical t-value, i.e., \(t > +t_{\text{critical}}\) or \(t < -t_{\text{critical}}\).
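The decision rule in these steps can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `slope_t_test` and the input values are hypothetical, and the critical value is assumed to have been looked up in a t-table beforehand.

```python
def slope_t_test(estimate, hypothesized, std_error, t_critical):
    """Two-tailed t-test for a single regression coefficient.

    Returns the t-statistic and whether the null hypothesis is rejected.
    """
    t_stat = (estimate - hypothesized) / std_error
    # Reject when the statistic falls in either tail beyond the critical value.
    reject_null = abs(t_stat) > t_critical
    return t_stat, reject_null


# Hypothetical inputs, chosen only to illustrate the decision rule.
t_stat, reject_null = slope_t_test(
    estimate=0.50, hypothesized=0.0, std_error=0.20, t_critical=2.0
)
print(f"t = {t_stat:.2f}, reject H0: {reject_null}")
```

Keeping the hypothesized value as an explicit argument makes the same function usable for tests against values other than zero.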

Example: Hypothesis Testing of the Significance of Regression Coefficients

An analyst generates the following output from the regression analysis of inflation on unemployment:

$$\small{\begin{array}{llll}\hline{}& \textbf{Regression Statistics} &{}&{}\\ \hline{}& \text{Multiple R} & 0.8766 &{} \\ {}& \text{R Square} & 0.7684 &{} \\ {}& \text{Adjusted R Square} & 0.7394 & {}\\ {}& \text{Standard Error} & 0.0063 &{}\\ {}& \text{Observations} & 10 &{}\\ \hline {}& & & \\ \hline{} & \textbf{Coefficients} & \textbf{Standard Error} & \textbf{t-Stat}\\ \hline \text{Intercept} & 0.0710 & 0.0094 & 7.5160 \\\text{Forecast (Slope)} & -0.9041 & 0.1755 & -5.1516\\ \hline\end{array}}$$

At the 5% significance level, test the null hypothesis that the slope coefficient is equal to one against the alternative that it is different from one, that is,

$$ H_{0}: b_{1} = 1\ \text{vs.}\ H_{a}: b_{1}\neq 1 $$

The calculated t-statistic, \(t=\frac{\widehat{b_{1}}-b_1}{s_{\widehat{b_{1}}}}\), is equal to:

$$\begin{align*}\text{t}& = \frac{-0.9041-1}{0.1755}\\& = -10.85\end{align*}$$

The critical two-tail t-values from the table with \(n-2=8\) degrees of freedom are:

$$\text{t}_{c}=±2.306$$

Notice that \(|t|>t_{c}\) i.e., (\(10.85>2.306\))

Therefore, we reject the null hypothesis and conclude that the estimated slope coefficient is statistically different from one.

Note that the confidence interval approach leads to the same conclusion: the hypothesized value of one lies outside the 95% confidence interval for the slope.
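The agreement with the confidence interval approach can be checked numerically. The sketch below reuses the slope estimate, standard error, and critical value from the example; the variable names are illustrative only.

```python
# Values taken from the regression output in the example above.
slope_estimate = -0.9041
slope_std_error = 0.1755
t_critical = 2.306        # two-tailed, 5% level, 8 degrees of freedom
hypothesized_slope = 1.0

margin = t_critical * slope_std_error
ci_lower = slope_estimate - margin
ci_upper = slope_estimate + margin

# Reject H0 when the hypothesized value lies outside the interval.
reject_null = not (ci_lower <= hypothesized_slope <= ci_upper)

print(f"95% CI: [{ci_lower:.4f}, {ci_upper:.4f}]")
print(f"Reject H0 (b1 = 1): {reject_null}")
```

The interval runs from roughly -1.31 to -0.50, so a slope of one is excluded, matching the t-test result.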

Question

Neeth Shinu, CFA, is forecasting price elasticity of supply for a certain product. Shinu uses the quantity of the product supplied for the past 5 months as the dependent variable and the price per unit of the product as the independent variable. The regression results are shown below.

$$\small{\begin{array}{lccccc}\hline \textbf{Regression Statistics} & & & & & \\ \hline \text{Multiple R} & 0.9971 & {}& {}&{}\\ \text{R Square} & 0.9941 & & & \\ \text{Adjusted R Square} & 0.9922 & & & & \\ \text{Standard Error} & 3.6515 & & & \\ \text{Observations} & 5 & & & \\ \hline {}& \textbf{Coefficients} & \textbf{Standard Error} & \textbf{t Stat} & \textbf{P-value}\\ \hline\text{Intercept} & -159 & 10.520 & (15.114) & 0.001\\ \text{Slope} & 0.26 & 0.012 & 22.517 & 0.000\\ \hline\end{array}}$$

Which of the following most likely reports the correct value of the t-statistic for the slope and most accurately evaluates its statistical significance with 95% confidence?

A. \(t=21.67\); slope is significantly different from zero.

B. \(t= 3.18\); slope is significantly different from zero.

C. \(t=22.57\); slope is not significantly different from zero.

Solution

The correct answer is A.

The t-statistic is calculated using the formula:

$$\text{t}=\frac{\widehat{b_{1}}-b_1}{s_{\widehat{b_{1}}}}$$

Where:

\(b_{1}\) = True slope coefficient

\(\widehat{b_{1}}\) = Point estimator for \(b_{1}\)

\(s_{\widehat{b_{1}}}\) = Standard error of the regression coefficient

$$\begin{align*}\text{t}&=\frac{0.26-0}{0.012}\\&=21.67\end{align*}$$

The critical two-tail t-values from the t-table with \(n-2 = 3\) degrees of freedom are:

$$t_{c}=±3.18$$

Notice that \(|t|>t_{c}\) (i.e., \(21.67>3.18\)).

Therefore, the null hypothesis can be rejected. Further, we can conclude that the estimated slope coefficient is statistically different from zero.


13.6 Testing the Regression Coefficients

Learning Objectives

  • Conduct and interpret a hypothesis test on individual regression coefficients.

Previously, we learned that the population model for the multiple regression equation is

[latex]\begin{eqnarray*} y & = & \beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_kx_k +\epsilon \end{eqnarray*}[/latex]

where [latex]x_1,x_2,\ldots,x_k[/latex] are the independent variables, [latex]\beta_0,\beta_1,\ldots,\beta_k[/latex] are the population parameters of the regression coefficients, and [latex]\epsilon[/latex] is the error variable.  In multiple regression, we estimate each population regression coefficient [latex]\beta_i[/latex] with the sample regression coefficient [latex]b_i[/latex].

In the previous section, we learned how to conduct an overall model test to determine if the regression model is valid.  If the outcome of the overall model test is that the model is valid, then at least one of the independent variables is related to the dependent variable—in other words, at least one of the regression coefficients [latex]\beta_i[/latex] is not zero.  However, the overall model test does not tell us which independent variables are related to the dependent variable.  To determine which independent variables are related to the dependent variable, we must test each of the regression coefficients.

Testing the Regression Coefficients

For an individual regression coefficient, we want to test if there is a relationship between the dependent variable [latex]y[/latex] and the independent variable [latex]x_i[/latex].

  • No Relationship .  There is no relationship between the dependent variable [latex]y[/latex] and the independent variable [latex]x_i[/latex].  In this case, the regression coefficient [latex]\beta_i[/latex] is zero.  This is the claim for the null hypothesis in an individual regression coefficient test:  [latex]H_0: \beta_i=0[/latex].
  • Relationship.  There is a relationship between the dependent variable [latex]y[/latex] and the independent variable [latex]x_i[/latex].  In this case, the regression coefficient [latex]\beta_i[/latex] is not zero.  This is the claim for the alternative hypothesis in an individual regression coefficient test:  [latex]H_a: \beta_i \neq 0[/latex].  We are not interested in whether the regression coefficient [latex]\beta_i[/latex] is positive or negative, only that it is not zero.  We only need to find out if the regression coefficient is not zero to demonstrate that there is a relationship between the dependent variable and the independent variable. This makes the test on a regression coefficient a two-tailed test.

In order to conduct a hypothesis test on an individual regression coefficient [latex]\beta_i[/latex], we need to use the distribution of the sample regression coefficient [latex]b_i[/latex]:

  • The mean of the distribution of the sample regression coefficient is the population regression coefficient [latex]\beta_i[/latex].
  • The standard deviation of the distribution of the sample regression coefficient is [latex]\sigma_{b_i}[/latex].  Because we do not know the population standard deviation we must estimate [latex]\sigma_{b_i}[/latex] with the sample standard deviation [latex]s_{b_i}[/latex].
  • The distribution of the sample regression coefficient follows a normal distribution.

Steps to Conduct a Hypothesis Test on a Regression Coefficient

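  • Write down the null and alternative hypotheses in terms of the regression coefficient being tested:

[latex]\begin{eqnarray*} H_0: &  &  \beta_i=0 \\ H_a: &  & \beta_i \neq 0 \end{eqnarray*}[/latex]

  • Collect the sample information for the test and identify the significance level [latex]\alpha[/latex].
  • Find the p-value, which is the sum of the area in the tails of the [latex]t[/latex]-distribution.  The [latex]t[/latex]-score and degrees of freedom are given below, where [latex]n[/latex] is the sample size and [latex]k[/latex] is the number of independent variables:

[latex]\begin{eqnarray*}t & = & \frac{b_i-\beta_i}{s_{b_i}} \\ \\ df &  = & n-k-1 \\  \\ \end{eqnarray*}[/latex]

  • Compare the p-value to the significance level and state the outcome of the test:
      • If the p-value is less than the significance level, reject the null hypothesis in favour of the alternative hypothesis.  The results of the sample data are significant.  There is sufficient evidence to conclude that the null hypothesis [latex]H_0[/latex] is an incorrect belief and that the alternative hypothesis [latex]H_a[/latex] is most likely correct.
      • If the p-value is greater than the significance level, do not reject the null hypothesis.  The results of the sample data are not significant.  There is not sufficient evidence to conclude that the alternative hypothesis [latex]H_a[/latex] may be correct.
  • Write down a concluding sentence specific to the context of the question.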
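The comparison of the p-value with the significance level can be expressed as a small helper. This is a sketch under stated assumptions: the function name `coefficient_test_outcome` and the sample p-values are hypothetical, and treating a p-value exactly equal to the significance level as a rejection is a common convention rather than something the text specifies.

```python
def coefficient_test_outcome(p_value, alpha=0.05):
    """State the outcome of a test on a regression coefficient."""
    # Convention assumed here: a p-value equal to alpha counts as a rejection.
    if p_value <= alpha:
        return "reject H0: the results are significant"
    return "do not reject H0: the results are not significant"


# Hypothetical p-values, for illustration only.
outcome_small = coefficient_test_outcome(0.03)
outcome_large = coefficient_test_outcome(0.20)
print(outcome_small)
print(outcome_large)
```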

The required [latex]t[/latex]-score and p-value for the test can be found on the regression summary table, which we learned how to generate in Excel in a previous section.

The human resources department at a large company wants to develop a model to predict an employee’s job satisfaction from the number of hours of unpaid work per week the employee does, the employee’s age, and the employee’s income.  A sample of 25 employees at the company is taken and the data is recorded in the table below.  The employee’s income is recorded in $1000s and the job satisfaction score is out of 10, with higher values indicating greater job satisfaction.

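| Job Satisfaction | Hours of Unpaid Work per Week | Age | Income ($1000s) |
|---|---|---|---|
| 4 | 3 | 23 | 60 |
| 5 | 8 | 32 | 114 |
| 2 | 9 | 28 | 45 |
| 6 | 4 | 60 | 187 |
| 7 | 3 | 62 | 175 |
| 8 | 1 | 43 | 125 |
| 7 | 6 | 60 | 93 |
| 3 | 3 | 37 | 57 |
| 5 | 2 | 24 | 47 |
| 5 | 5 | 64 | 128 |
| 7 | 2 | 28 | 66 |
| 8 | 1 | 66 | 146 |
| 5 | 7 | 35 | 89 |
| 2 | 5 | 37 | 56 |
| 4 | 0 | 59 | 65 |
| 6 | 2 | 32 | 95 |
| 5 | 6 | 76 | 82 |
| 7 | 5 | 25 | 90 |
| 9 | 0 | 55 | 137 |
| 8 | 3 | 34 | 91 |
| 7 | 5 | 54 | 184 |
| 9 | 1 | 57 | 60 |
| 7 | 0 | 68 | 39 |
| 10 | 2 | 66 | 187 |
| 5 | 0 | 50 | 49 |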

Previously, we found the multiple regression equation to predict the job satisfaction score from the other variables:

[latex]\begin{eqnarray*} \hat{y} & = & 4.7993-0.3818x_1+0.0046x_2+0.0233x_3 \\ \\ \hat{y} & = & \mbox{predicted job satisfaction score} \\ x_1 & = & \mbox{hours of unpaid work per week} \\ x_2 & = & \mbox{age} \\ x_3 & = & \mbox{income (\$1000s)}\end{eqnarray*}[/latex]
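The text obtains this equation from Excel. As a cross-check, the coefficients can be reproduced from the sample data by solving the least-squares normal equations directly. The sketch below uses only the Python standard library; the helper `solve_linear_system` is a hypothetical name introduced for illustration and is not part of the original workflow.

```python
# Columns: job satisfaction, hours of unpaid work, age, income ($1000s).
data = [
    (4, 3, 23, 60), (5, 8, 32, 114), (2, 9, 28, 45), (6, 4, 60, 187),
    (7, 3, 62, 175), (8, 1, 43, 125), (7, 6, 60, 93), (3, 3, 37, 57),
    (5, 2, 24, 47), (5, 5, 64, 128), (7, 2, 28, 66), (8, 1, 66, 146),
    (5, 7, 35, 89), (2, 5, 37, 56), (4, 0, 59, 65), (6, 2, 32, 95),
    (5, 6, 76, 82), (7, 5, 25, 90), (9, 0, 55, 137), (8, 3, 34, 91),
    (7, 5, 54, 184), (9, 1, 57, 60), (7, 0, 68, 39), (10, 2, 66, 187),
    (5, 0, 50, 49),
]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


# Design matrix with a leading column of ones for the intercept.
X = [[1.0, hours, age, income] for _, hours, age, income in data]
y = [float(score) for score, _, _, _ in data]

k = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]

# Order: intercept, hours of unpaid work, age, income.
coefficients = solve_linear_system(xtx, xty)
print([round(b, 4) for b in coefficients])
```

The printed values should agree with the equation above up to rounding. Solving the normal equations is adequate for a small, well-behaved data set like this one; dedicated regression routines are generally preferable in practice.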

At the 5% significance level, test the relationship between the dependent variable “job satisfaction” and the independent variable “hours of unpaid work per week”.

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & \beta_1=0 \\   H_a: & & \beta_1 \neq 0 \end{eqnarray*}[/latex]

The regression summary table generated by Excel is shown below:

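| Regression Statistics | |
|---|---|
| Multiple R | 0.711779225 |
| R Square | 0.506629665 |
| Adjusted R Square | 0.436148189 |
| Standard Error | 1.585212784 |
| Observations | 25 |

| ANOVA | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 54.189109 | 18.06303633 | 7.18812504 | 0.001683189 |
| Residual | 21 | 52.770891 | 2.512899571 | | |
| Total | 24 | 106.96 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 4.799258185 | 1.197185164 | 4.008785216 | 0.00063622 | 2.309575344 | 7.288941027 |
| Hours of Unpaid Work per Week | -0.38184722 | 0.130750479 | -2.9204269 | 0.008177146 | -0.65375772 | -0.10993671 |
| Age | 0.004555815 | 0.022855709 | 0.199329423 | 0.843922453 | -0.04297523 | 0.052086864 |
| Income ($1000s) | 0.023250418 | 0.007610353 | 3.055103771 | 0.006012895 | 0.007423823 | 0.039077013 |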
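Several entries in the summary output are tied together by standard identities, which gives a quick way to sanity-check the table. The sketch below recomputes the mean squares, the F statistic, R Square, Adjusted R Square, and the Standard Error from the sums of squares and degrees of freedom. The variable names are illustrative, and the formulas are the usual textbook ones rather than anything specific to Excel.

```python
import math

# Inputs read from the summary output above.
n_observations = 25
k_predictors = 3
ss_regression = 54.189109
ss_residual = 52.770891

ss_total = ss_regression + ss_residual
df_regression = k_predictors
df_residual = n_observations - k_predictors - 1

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_statistic = ms_regression / ms_residual
r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * (n_observations - 1) / df_residual
standard_error = math.sqrt(ms_residual)

print(f"F = {f_statistic:.4f}")
print(f"R Square = {r_squared:.4f}")
print(f"Adjusted R Square = {adjusted_r_squared:.4f}")
print(f"Standard Error = {standard_error:.4f}")
```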

The p-value for the test on the hours of unpaid work per week regression coefficient is in the bottom part of the table under the P-value column of the Hours of Unpaid Work per Week row.  So the p-value=[latex]0.0082[/latex].

Conclusion:

Because p-value[latex]=0.0082 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 5% significance level there is enough evidence to suggest that there is a relationship between the dependent variable “job satisfaction” and the independent variable “hours of unpaid work per week.”

  • The null hypothesis [latex]\beta_1=0[/latex] is the claim that the regression coefficient for the independent variable [latex]x_1[/latex] is zero.  That is, the null hypothesis is the claim that there is no relationship between the dependent variable and the independent variable “hours of unpaid work per week.”
  • The alternative hypothesis is the claim that the regression coefficient for the independent variable [latex]x_1[/latex] is not zero.  The alternative hypothesis is the claim that there is a relationship between the dependent variable and the independent variable “hours of unpaid work per week.”
  • When conducting a test on a regression coefficient, make sure to use the correct subscript on [latex]\beta[/latex] to correspond to how the independent variables were defined in the regression model and which independent variable is being tested.  Here the subscript on [latex]\beta[/latex] is 1 because the “hours of unpaid work per week” is defined as [latex]x_1[/latex] in the regression model.
  • The p-values for the tests on the regression coefficients are located in the bottom part of the table under the P-value column heading in the corresponding independent variable row.
  • Because the alternative hypothesis is a [latex]\neq[/latex], the p-value is the sum of the area in the tails of the [latex]t[/latex]-distribution.  This is the value calculated by Excel in the regression summary table.
  • The p-value of 0.0082 is small compared to the significance level, so a result at least this extreme is unlikely to occur if the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis.  In other words, the regression coefficient [latex]\beta_1[/latex] is not zero, and so there is a relationship between the dependent variable “job satisfaction” and the independent variable “hours of unpaid work per week.”  This means that the independent variable “hours of unpaid work per week” is useful in predicting the dependent variable.
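To make the "sum of the area in the tails" concrete, the p-value reported in the table can be approximated from the coefficient and its standard error. The sketch below is one possible approach, not how Excel computes it: it integrates the density of the [latex]t[/latex]-distribution numerically with Simpson's rule, using only the standard library. The function names `t_pdf` and `two_tailed_p_value` are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p_value(t_stat, df, steps=2000):
    """Two-tailed p-value via Simpson's rule (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    # The density is symmetric, so double the area between 0 and |t|.
    central_area = 2 * total * h / 3
    return 1 - central_area


# Hours of Unpaid Work per Week row of the summary table.
coefficient = -0.38184722
std_error = 0.130750479
df = 25 - 3 - 1  # n - k - 1

t_stat = coefficient / std_error
p_value = two_tailed_p_value(t_stat, df)
print(f"t = {t_stat:.4f}, p-value = {p_value:.4f}")
```

The result should match the t Stat and P-value columns of the table to several decimal places.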

At the 5% significance level, test the relationship between the dependent variable “job satisfaction” and the independent variable “age”.

[latex]\begin{eqnarray*} H_0: & & \beta_2=0 \\   H_a: & & \beta_2 \neq 0 \end{eqnarray*}[/latex]

The p-value for the test on the age regression coefficient is in the bottom part of the table under the P-value column of the Age row.  So the p-value=[latex]0.8439[/latex].

Because p-value[latex]=0.8439 \gt 0.05=\alpha[/latex], we do not reject the null hypothesis.  At the 5% significance level there is not enough evidence to suggest that there is a relationship between the dependent variable “job satisfaction” and the independent variable “age.”

  • The null hypothesis [latex]\beta_2=0[/latex] is the claim that the regression coefficient for the independent variable [latex]x_2[/latex] is zero.  That is, the null hypothesis is the claim that there is no relationship between the dependent variable and the independent variable “age.”
  • The alternative hypothesis is the claim that the regression coefficient for the independent variable [latex]x_2[/latex] is not zero.  The alternative hypothesis is the claim that there is a relationship between the dependent variable and the independent variable “age.”
  • When conducting a test on a regression coefficient, make sure to use the correct subscript on [latex]\beta[/latex] to correspond to how the independent variables were defined in the regression model and which independent variable is being tested.  Here the subscript on [latex]\beta[/latex] is 2 because “age” is defined as [latex]x_2[/latex] in the regression model.
  • The p-value of 0.8439 is large compared to the significance level, so a result at least this extreme is likely to occur if the null hypothesis is true.  The sample therefore gives no reason to doubt the null hypothesis, and so the conclusion of the test is to not reject the null hypothesis.  In other words, there is not enough evidence to conclude that the regression coefficient [latex]\beta_2[/latex] differs from zero, or that there is a relationship between the dependent variable “job satisfaction” and the independent variable “age.”  This suggests that the independent variable “age” is not particularly useful in predicting the dependent variable.

At the 5% significance level, test the relationship between the dependent variable “job satisfaction” and the independent variable “income”.

[latex]\begin{eqnarray*} H_0: & & \beta_3=0 \\   H_a: & & \beta_3 \neq 0 \end{eqnarray*}[/latex]

The p-value for the test on the income regression coefficient is in the bottom part of the table under the P-value column of the Income row.  So the p-value=[latex]0.0060[/latex].

Because p-value[latex]=0.0060 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 5% significance level there is enough evidence to suggest that there is a relationship between the dependent variable “job satisfaction” and the independent variable “income.”

  • The null hypothesis [latex]\beta_3=0[/latex] is the claim that the regression coefficient for the independent variable [latex]x_3[/latex] is zero.  That is, the null hypothesis is the claim that there is no relationship between the dependent variable and the independent variable “income.”
  • The alternative hypothesis is the claim that the regression coefficient for the independent variable [latex]x_3[/latex] is not zero.  The alternative hypothesis is the claim that there is a relationship between the dependent variable and the independent variable “income.”
  • When conducting a test on a regression coefficient, make sure to use the correct subscript on [latex]\beta[/latex] to correspond to how the independent variables were defined in the regression model and which independent variable is being tested.  Here the subscript on [latex]\beta[/latex] is 3 because “income” is defined as [latex]x_3[/latex] in the regression model.
  • The p-value of 0.0060 is small compared to the significance level, so a result at least this extreme is unlikely to occur if the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis.  In other words, the regression coefficient [latex]\beta_3[/latex] is not zero, and so there is a relationship between the dependent variable “job satisfaction” and the independent variable “income.”  This means that the independent variable “income” is useful in predicting the dependent variable.
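The three tests can be summarized side by side. The sketch below applies the p-value rule to each independent variable and also checks whether zero falls inside the interval given by the last two columns of the summary table. Treating those columns as 95% confidence bounds is an assumption based on the usual layout of Excel's regression output.

```python
alpha = 0.05

# (variable, p-value, lower bound, upper bound) from the summary table.
summary_rows = [
    ("Hours of Unpaid Work per Week", 0.008177146, -0.65375772, -0.10993671),
    ("Age", 0.843922453, -0.04297523, 0.052086864),
    ("Income ($1000s)", 0.006012895, 0.007423823, 0.039077013),
]

results = {}
for name, p_value, lower, upper in summary_rows:
    significant = p_value < alpha
    zero_in_interval = lower <= 0 <= upper
    results[name] = (significant, zero_in_interval)
    decision = "reject H0" if significant else "do not reject H0"
    print(f"{name}: p-value = {p_value:.4f}, {decision}, "
          f"zero inside interval: {zero_in_interval}")
```

For each variable the two checks agree: the null hypothesis is rejected exactly when zero lies outside the interval.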

Concept Review

The test on a regression coefficient determines if there is a relationship between the dependent variable and the corresponding independent variable.  The p-value for the test is the sum of the area in the tails of the [latex]t[/latex]-distribution.  The p-value can be found on the regression summary table generated by Excel.

The hypothesis test for a regression coefficient is a well-established process:

  • Write down the null and alternative hypotheses in terms of the regression coefficient being tested.  The null hypothesis is the claim that there is no relationship between the dependent variable and independent variable.  The alternative hypothesis is the claim that there is a relationship between the dependent variable and independent variable.
  • Collect the sample information for the test and identify the significance level.
  • The p-value is the sum of the area in the tails of the [latex]t[/latex]-distribution.  Use the regression summary table generated by Excel to find the p-value.
  • Compare the p-value to the significance level and state the outcome of the test.

Introduction to Statistics Copyright © 2022 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Linear regression - Hypothesis testing

by Marco Taboga , PhD

This lecture discusses how to perform tests of hypotheses about the coefficients of a linear regression model estimated by ordinary least squares (OLS).

Table of contents

  • Normal vs non-normal model
  • The linear regression model
  • Matrix notation
  • Tests of hypothesis in the normal linear regression model
  • Test of a restriction on a single coefficient (t test)
  • Test of a set of linear restrictions (F test)
  • Tests based on maximum likelihood procedures (Wald, Lagrange multiplier, likelihood ratio)
  • Tests of hypothesis when the OLS estimator is asymptotically normal
  • Test of a restriction on a single coefficient (z test)
  • Test of a set of linear restrictions (Chi-square test)
  • Learn more about regression analysis

The lecture is divided into two parts:

  • in the first part, we discuss hypothesis testing in the normal linear regression model, in which the OLS estimator of the coefficients has a normal distribution conditional on the matrix of regressors;
  • in the second part, we show how to carry out hypothesis tests in linear regression analyses where the hypothesis of normality holds only in large samples (i.e., the OLS estimator can be proved to be asymptotically normal).

How to choose which test to carry out after estimating a linear regression model.

We also denote:

We now explain how to derive tests about the coefficients of the normal linear regression model.

It can be proved (see the lecture about the normal linear regression model ) that the assumption of conditional normality implies that:

How the acceptance region is determined depends not only on the desired size of the test, but also on whether the test is:

  • two-tailed (the coefficient can be either smaller or larger than the hypothesized value);
  • one-tailed (only one of the two things, i.e., either smaller or larger, is possible).

For more details on how to determine the acceptance region, see the glossary entry on critical values .
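For reference, the statistic used to test a restriction on a single coefficient can be written in standard textbook notation. The symbols below are assumptions introduced here, since the lecture's own equations are not reproduced in this text: \(\widehat{\beta}_k\) is the OLS estimate of the \(k\)-th coefficient, \(q\) is its hypothesized value, \(X\) is the \(N \times K\) matrix of regressors, and \(\widehat{\sigma}^2\) is the unbiased estimate of the error variance.

$$t=\frac{\widehat{\beta}_k-q}{\sqrt{\widehat{\sigma}^2\left[(X^{\top}X)^{-1}\right]_{kk}}}$$

In the normal linear regression model, under the null hypothesis \(\beta_k=q\), this statistic has a Student's t distribution with \(N-K\) degrees of freedom, which is what determines the critical values.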

The F test is one-tailed.

A critical value in the right tail of the F distribution is chosen so as to achieve the desired size of the test.

Then, the null hypothesis is rejected if the F statistic is larger than the critical value.
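As a sketch in standard textbook notation (again an assumption, since the lecture's own symbols are not shown here), a set of \(L\) linear restrictions can be written as \(R\beta=r\), where \(R\) is an \(L \times K\) matrix of full row rank. The corresponding statistic is

$$F=\frac{(R\widehat{\beta}-r)^{\top}\left[R(X^{\top}X)^{-1}R^{\top}\right]^{-1}(R\widehat{\beta}-r)}{L\,\widehat{\sigma}^2}$$

In the normal linear regression model, under the null hypothesis, it has an F distribution with \(L\) and \(N-K\) degrees of freedom. Because the numerator is a quadratic form, large values of \(F\) indicate departures from the null in any direction, which is why only the right tail is used.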

In this section we explain how to perform hypothesis tests about the coefficients of a linear regression model when the OLS estimator is asymptotically normal.

As we have shown in the lecture on the properties of the OLS estimator , in several cases (i.e., under different sets of assumptions) it can be proved that:

These two properties are used to derive the asymptotic distribution of the test statistics used in hypothesis testing.

The test can be either one-tailed or two-tailed. The same comments made for the t-test apply here.

Like the F test, the Chi-square test is usually one-tailed.

The desired size of the test is achieved by appropriately choosing a critical value in the right tail of the Chi-square distribution.

The null is rejected if the Chi-square statistic is larger than the critical value.


How to cite

Please cite as:

Taboga, Marco (2021). "Linear regression - Hypothesis testing", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/linear-regression-hypothesis-testing.

Statistics By Jim

Making statistics intuitive

Statistical Hypothesis Testing Overview

By Jim Frost

In this blog post, I explain why you need to use statistical hypothesis testing and help you navigate the essential terminology. Hypothesis testing is a crucial procedure to perform when you want to make inferences about a population using a random sample. These inferences include estimating population properties such as the mean, differences between means, proportions, and the relationships between variables.

This post provides an overview of statistical hypothesis testing. If you need to perform hypothesis tests, consider getting my book, Hypothesis Testing: An Intuitive Guide .

Why You Should Perform Statistical Hypothesis Testing

[Figure: mean drug scores by group. Hypothesis testing determines whether the difference between the means is statistically significant.]

Hypothesis testing is a form of inferential statistics that allows us to draw conclusions about an entire population based on a representative sample. You gain tremendous benefits by working with a sample. In most cases, it is simply impossible to observe the entire population to understand its properties. The only alternative is to collect a random sample and then use statistics to analyze it.

While samples are much more practical and less expensive to work with, there are trade-offs. When you estimate the properties of a population from a sample, the sample statistics are unlikely to equal the actual population value exactly.  For instance, your sample mean is unlikely to equal the population mean. The difference between the sample statistic and the population value is the sampling error.

Differences that researchers observe in samples might be due to sampling error rather than representing a true effect at the population level. If sampling error causes the observed difference, the next time someone performs the same experiment the results might be different. Hypothesis testing incorporates estimates of the sampling error to help you make the correct decision. Learn more about Sampling Error .

For example, if you are studying the proportion of defects produced by two manufacturing methods, any difference you observe between the two sample proportions might be sampling error rather than a true difference. If the difference does not exist at the population level, you won’t obtain the benefits that you expect based on the sample statistics. That can be a costly mistake!

Let’s cover some basic hypothesis testing terms that you need to know.

Background information : Difference between Descriptive and Inferential Statistics and Populations, Parameters, and Samples in Inferential Statistics

Hypothesis Testing

Hypothesis testing is a statistical analysis that uses sample data to assess two mutually exclusive theories about the properties of a population. Statisticians call these theories the null hypothesis and the alternative hypothesis. A hypothesis test assesses your sample statistic and factors in an estimate of the sampling error to determine which hypothesis the data support.

When you can reject the null hypothesis, the results are statistically significant, and your data support the theory that an effect exists at the population level.

The effect is the difference between the population value and the null hypothesis value. The effect is also known as population effect or the difference. For example, the mean difference between the health outcome for a treatment group and a control group is the effect.

Typically, you do not know the size of the actual effect. However, you can use a hypothesis test to help you determine whether an effect exists and to estimate its size. Hypothesis tests convert your sample effect into a test statistic, which they evaluate for statistical significance. Learn more about Test Statistics.

An effect can be statistically significant, but that doesn’t necessarily indicate that it is important in a real-world, practical sense. For more information, read my post about Statistical vs. Practical Significance .

Null Hypothesis

The null hypothesis is one of two mutually exclusive theories about the properties of the population in hypothesis testing. Typically, the null hypothesis states that there is no effect (i.e., the effect size equals zero). The null is often signified by H0.

In all hypothesis testing, the researchers are testing an effect of some sort. The effect can be the effectiveness of a new vaccination, the durability of a new product, the proportion of defects in a manufacturing process, and so on. There is some benefit or difference that the researchers hope to identify.

However, it’s possible that there is no effect or no difference between the experimental groups. In statistics, we call this lack of an effect the null hypothesis. Therefore, if you can reject the null, you can favor the alternative hypothesis, which states that the effect exists (doesn’t equal zero) at the population level.

You can think of the null as the default theory, which you reject only when there is sufficiently strong evidence against it.

For example, in a 2-sample t-test, the null often states that the difference between the two means equals zero.

When you can reject the null hypothesis, your results are statistically significant. Learn more about Statistical Significance: Definition & Meaning .

Related post : Understanding the Null Hypothesis in More Detail

Alternative Hypothesis

The alternative hypothesis is the other theory about the properties of the population in hypothesis testing. Typically, the alternative hypothesis states that a population parameter does not equal the null hypothesis value. In other words, there is a non-zero effect. If your sample contains sufficient evidence, you can reject the null and favor the alternative hypothesis. The alternative is often identified with H1 or HA.

For example, in a 2-sample t-test, the alternative often states that the difference between the two means does not equal zero.

You can specify either a one- or two-tailed alternative hypothesis:

  • If you perform a two-tailed hypothesis test, the alternative states that the population parameter does not equal the null value. For example, when the alternative hypothesis is HA: μ ≠ 0, the test can detect differences both greater than and less than the null value.
  • A one-tailed alternative has more power to detect an effect but it can test for a difference in only one direction. For example, HA: μ > 0 can only test for differences that are greater than zero.
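The difference in power between the two kinds of alternative shows up directly in the p-values. The sketch below uses a hypothetical z-statistic of 1.8 and the standard normal distribution from Python's standard library; the numbers are for illustration only and do not come from the post.

```python
from statistics import NormalDist

z_stat = 1.8  # hypothetical test statistic, for illustration only

standard_normal = NormalDist()

# One-tailed alternative (HA: mu > 0): area in the right tail only.
p_one_tailed = 1 - standard_normal.cdf(z_stat)

# Two-tailed alternative (HA: mu != 0): area in both tails.
p_two_tailed = 2 * (1 - standard_normal.cdf(abs(z_stat)))

print(f"one-tailed p-value: {p_one_tailed:.4f}")
print(f"two-tailed p-value: {p_two_tailed:.4f}")
```

With this statistic the one-tailed p-value falls below 0.05 while the two-tailed p-value does not, so the same data would be significant under one alternative and not under the other.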

Related posts : Understanding T-tests and One-Tailed and Two-Tailed Hypothesis Tests Explained

P-values

A p-value is the probability that you would obtain the effect observed in your sample, or a larger one, if the null hypothesis is correct. In simpler terms, p-values tell you how strongly your sample data contradict the null. Lower p-values represent stronger evidence against the null. You use p-values in conjunction with the significance level to determine whether your data favor the null or alternative hypothesis.

Related post: Interpreting P-values Correctly

Significance Level (Alpha)

For instance, a significance level of 0.05 signifies a 5% risk of deciding that an effect exists when it does not exist.

Use p-values and significance levels together to help you determine which hypothesis the data support. If the p-value is less than your significance level, you can reject the null and conclude that the effect is statistically significant. In other words, the evidence in your sample is strong enough to be able to reject the null hypothesis at the population level.

Related posts: Graphical Approach to Significance Levels and P-values and Conceptual Approach to Understanding Significance Levels

Types of Errors in Hypothesis Testing

Statistical hypothesis tests are not 100% accurate because they use a random sample to draw conclusions about entire populations. There are two types of errors related to drawing an incorrect conclusion.

  • False positives: You reject a null that is true. Statisticians call this a Type I error. The Type I error rate equals your significance level or alpha (α).
  • False negatives: You fail to reject a null that is false. Statisticians call this a Type II error. Generally, you do not know the Type II error rate. However, it is a larger risk when you have a small sample size, noisy data, or a small effect size. The Type II error rate is also known as beta (β).

Statistical power is the probability that a hypothesis test correctly infers that a sample effect exists in the population. In other words, the test correctly rejects a false null hypothesis. Consequently, power is inversely related to the Type II error rate: Power = 1 – β. Learn more about Power in Statistics.
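The two error rates and power can be estimated by simulation. The following Python sketch repeats a hypothetical study many times and counts how often a two-tailed z-test rejects H0: μ = 0. The sample size, true effect of 0.5, known sigma of 1.0, and the name `rejection_rate` are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def rejection_rate(true_mean, n=25, sigma=1.0, alpha=0.05,
                   n_trials=5_000, seed=1):
    """Fraction of simulated studies that reject H0: mu = 0.

    Each study draws `n` normal values and runs a two-tailed z-test,
    assuming the population standard deviation `sigma` is known.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        z = sample_mean / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials


# Null is true: every rejection is a false positive (Type I error).
type_i_rate = rejection_rate(true_mean=0.0)
# Null is false: every rejection is a correct decision.
power = rejection_rate(true_mean=0.5)
# Estimated Type II error rate.
beta = 1 - power
print(f"Type I rate = {type_i_rate:.3f}, power = {power:.3f}, beta = {beta:.3f}")
```

When the null is true, the rejection rate should land close to alpha. Rerunning with a larger `n` or a larger `true_mean` should raise the estimated power, which matches the point above that small samples and small effects increase the Type II risk.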

Related posts: Types of Errors in Hypothesis Testing and Estimating a Good Sample Size for Your Study Using Power Analysis

Which Type of Hypothesis Test is Right for You?

There are many different types of procedures you can use. The correct choice depends on your research goals and the data you collect. Do you need to understand the mean or the differences between means? Or, perhaps you need to assess proportions. You can even use hypothesis testing to determine whether the relationships between variables are statistically significant.

To choose the proper statistical procedure, you’ll need to assess your study objectives and collect the correct type of data. This background research is necessary before you begin a study.

Related post: Hypothesis Tests for Continuous, Binary, and Count Data

Statistical tests are crucial when you want to use sample data to make conclusions about a population because these tests account for sample error. Using significance levels and p-values to determine when to reject the null hypothesis improves the probability that you will draw the correct conclusion.

To see an alternative approach to these traditional hypothesis testing methods, learn about bootstrapping in statistics!

If you want to see examples of hypothesis testing in action, I recommend the following posts that I have written:

  • How Effective Are Flu Shots? This example shows how you can use statistics to test proportions.
  • Fatality Rates in Star Trek. This example shows how to use hypothesis testing with categorical data.
  • Busting Myths About the Battle of the Sexes. A fun example based on a Mythbusters episode that assesses continuous data using several different tests.
  • Are Yawns Contagious? Another fun example inspired by a Mythbusters episode.

Reader Interactions

January 14, 2024 at 8:43 am

Hello professor Jim, how are you doing! Pls. What are the properties of a population and their examples? Thanks for your time and understanding.

January 14, 2024 at 12:57 pm

Please read my post about Populations vs. Samples for more information and examples.

Also, please note there is a search bar in the upper-right margin of my website. Use that to search for topics.

July 5, 2023 at 7:05 am

Hello, I have a question as I read your post. You say in p-values section

“P-values are the probability that you would obtain the effect observed in your sample, or larger, if the null hypothesis is correct. In simpler terms, p-values tell you how strongly your sample data contradict the null. Lower p-values represent stronger evidence against the null.”

But according to your definition of effect, the null states that an effect does not exist, correct? So what I assume you want to say is that “P-values are the probability that you would obtain the effect observed in your sample, or larger, if the null hypothesis is **incorrect**.”

July 6, 2023 at 5:18 am

Hi Shrinivas,

The correct definition of p-value is that it is a probability that exists in the context of a true null hypothesis. So, the quotation is correct in stating “if the null hypothesis is correct.”

Essentially, the p-value tells you the likelihood of your observed results (or more extreme) if the null hypothesis is true. It gives you an idea of whether your results are surprising or unusual if there is no effect.

Hence, with sufficiently low p-values, you reject the null hypothesis because it’s telling you that your sample results were unlikely to have occurred if there was no effect in the population.

I hope that helps make it more clear. If not, let me know and I’ll attempt to clarify!

May 8, 2023 at 12:47 am

Thanks a lot. My best regards

May 7, 2023 at 11:15 pm

Hi Jim Can you tell me something about size effect? Thanks

May 8, 2023 at 12:29 am

Here’s a post that I’ve written about Effect Sizes that will hopefully tell you what you need to know. Please read that. Then, if you have any more specific questions about effect sizes, please post them there. Thanks!

January 7, 2023 at 4:19 pm

Hi Jim, I have only read two pages so far but I am really amazed because in few paragraphs you made me clearly understand the concepts of months of courses I received in biostatistics! Thanks so much for this work you have done it helps a lot!

January 10, 2023 at 3:25 pm

Thanks so much!

June 17, 2021 at 1:45 pm

Can you help in the following question: Rocinante36 is priced at ₹7 lakh and has been designed to deliver a mileage of 22 km/litre and a top speed of 140 km/hr. Formulate the null and alternative hypotheses for mileage and top speed to check whether the new models are performing as per the desired design specifications.

April 19, 2021 at 1:51 pm

Its indeed great to read your work statistics.

I have a doubt regarding the one sample t-test. So as per your book on hypothesis testing with reference to page no 45, you have mentioned the difference between “the sample mean and the hypothesised mean is statistically significant”. So as per my understanding it should be quoted like “the difference between the population mean and the hypothesised mean is statistically significant”. The catch here is the hypothesised mean represents the sample mean.

Please help me understand this.

Regards Rajat

April 19, 2021 at 3:46 pm

Thanks for buying my book. I’m so glad it’s been helpful!

The test is performed on the sample but the results apply to the population. Hence, if the difference between the sample mean (observed in your study) and the hypothesized mean is statistically significant, that suggests that population does not equal the hypothesized mean.

For one sample tests, the hypothesized mean is not the sample mean. It is a mean that you want to use for the test value. It usually represents a value that is important to your research. In other words, it’s a value that you pick for some theoretical/practical reasons. You pick it because you want to determine whether the population mean is different from that particular value.

I hope that helps!

November 5, 2020 at 6:24 am

Jim, you are such a magnificent statistician/economist/econometrician/data scientist etc whatever profession. Your work inspires and simplifies the lives of so many researchers around the world. I truly admire you and your work. I will buy a copy of each book you have on statistics or econometrics. Keep doing the good work. Remain ever blessed

November 6, 2020 at 9:47 pm

Hi Renatus,

Thanks so much for your very kind comments. You made my day!! I’m so glad that my website has been helpful. And, thanks so much for supporting my books! 🙂

November 2, 2020 at 9:32 pm

Hi Jim, I hope you are aware of 2019 American Statistical Association’s official statement on Statistical Significance: https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913 In case you do not bother reading the full article, may I quote you the core message here: “We conclude, based on our review of the articles in this special issue and the broader literature, that it is time to stop using the term “statistically significant” entirely. Nor should variants such as “significantly different,” “p < 0.05,” and “nonsignificant” survive, whether expressed in words, by asterisks in a table, or in some other way."

With best wishes,

November 3, 2020 at 2:09 am

I’m definitely aware of the debate surrounding how to use p-values most effectively. However, I need to correct you on one point. The link you provide is NOT a statement by the American Statistical Association. It is an editorial by several authors.

There is considerable debate over this issue. There are problems with p-values. However, as the authors state themselves, much of the problem is over people’s mindsets about how to use p-values and their incorrect interpretations about what statistical significance does and does not mean.

If you were to read my website more thoroughly, you’d be aware that I share many of their concerns and I address them in multiple posts. One of the authors’ key points is the need to be thoughtful and conduct thoughtful research and analysis. I emphasize this aspect in multiple posts on this topic. I’ll ask you to read the following three because they all address some of the authors’ concerns and suggestions. But you might run across others to read as well.

  • Five Tips for Using P-values to Avoid Being Misled
  • How to Interpret P-values Correctly
  • P-values and the Reproducibility of Experimental Results

September 24, 2020 at 11:52 pm

HI Jim, i just want you to know that you made explanation for Statistics so simple! I should say lesser and fewer words that reduce the complexity. All the best! 🙂

September 25, 2020 at 1:03 am

Thanks, Rene! Your kind words mean a lot to me! I’m so glad it has been helpful!

September 23, 2020 at 2:21 am

Honestly, I never understood stats during my entire M.Ed course and was another nightmare for me. But how easily you have explained each concept, I have understood stats way beyond my imagination. Thank you so much for helping ignorant research scholars like us. Looking forward to get hardcopy of your book. Kindly tell is it available through flipkart?

September 24, 2020 at 11:14 pm

I’m so happy to hear that my website has been helpful!

I checked on flipkart and it appears like my books are not available there. I’m never exactly sure where they’re available due to the vagaries of different distribution channels. They are available on Amazon in India.

  • Introduction to Statistics: An Intuitive Guide (Amazon IN)
  • Hypothesis Testing: An Intuitive Guide (Amazon IN)

July 26, 2020 at 11:57 am

Dear Jim I am a teacher from India . I don’t have any background in statistics, and still I should tell that in a single read I can follow your explanations . I take my entire biostatistics class for botany graduates with your explanations. Thanks a lot. May I know how I can avail your books in India

July 28, 2020 at 12:31 am

Right now my books are only available as ebooks from my website. However, soon I’ll have some exciting news about other ways to obtain it. Stay tuned! I’ll announce it on my email list. If you’re not already on it, you can sign up using the form that is in the right margin of my website.

June 22, 2020 at 2:02 pm

Also can you please let me if this book covers topics like EDA and principal component analysis?

June 22, 2020 at 2:07 pm

This book doesn’t cover principal components analysis. Although, I wouldn’t really classify that as a hypothesis test. In the future, I might write a multivariate analysis book that would cover this and others. But, that’s well down the road.

My Introduction to Statistics covers EDA. That’s the largely graphical look at your data that you often do prior to hypothesis testing. The Introduction book perfectly leads right into the Hypothesis Testing book.

June 22, 2020 at 1:45 pm

Thanks for the detailed explanation. It does clear my doubts. I saw that your book related to hypothesis testing has the topics that I am studying currently. I am looking forward to purchasing it.

Regards, Take Care

June 19, 2020 at 1:03 pm

For this particular article I did not understand a couple of statements and it would great if you could help: 1)”If sample error causes the observed difference, the next time someone performs the same experiment the results might be different.” 2)”If the difference does not exist at the population level, you won’t obtain the benefits that you expect based on the sample statistics.”

I discovered your articles by chance and now I keep coming back to read & understand statistical concepts. These articles are very informative & easy to digest. Thanks for the simplifying things.

June 20, 2020 at 9:53 pm

I’m so happy to hear that you’ve found my website to be helpful!

To answer your questions, keep in mind that a central tenet of inferential statistics is that the random sample that a study drew was only one of an infinite number of possible samples it could’ve drawn. Each random sample produces different results. Most results will cluster around the population value assuming they used good methodology. However, random sampling error always exists and makes it so that population estimates from a sample almost never exactly equal the correct population value.

So, imagine that we’re studying a medication and comparing the treatment and control groups. Suppose that the medicine is truly not effective and that the population difference between the treatment and control groups is zero (i.e., no difference). Despite the true difference being zero, most sample estimates will show some degree of either a positive or negative effect thanks to random sampling error. So, just because a study has an observed difference does not mean that a difference exists at the population level. So, on to your questions:

1. If the observed difference is just random error, then it makes sense that if you collected another random sample, the difference could change. It could change from negative to positive, positive to negative, more extreme, less extreme, etc. However, if the difference exists at the population level, most random samples drawn from the population will reflect that difference. If the medicine has an effect, most random samples will reflect that fact and not bounce around on both sides of zero as much.

2. This is closely related to the previous answer. If there is no difference at the population level, but say you approve the medicine because of the observed effects in a sample. Even though your random sample showed an effect (which was really random error), that effect doesn’t exist. So, when you start using it on a larger scale, people won’t benefit from the medicine. That’s why it’s important to separate out what is easily explained by random error versus what is not easily explained by it.

I think reading my post about how hypothesis tests work will help clarify this process. Also, in about 24 hours (as I write this), I’ll be releasing my new ebook about Hypothesis Testing!

May 29, 2020 at 5:23 am

Hi Jim, I really enjoy your blog. Can you please link me on your blog where you discuss about Subgroup analysis and how it is done? I need to use non parametric and parametric statistical methods for my work and also do subgroup analysis in order to identify potential groups of patients that may benefit more from using a treatment than other groups.

May 29, 2020 at 2:12 pm

Hi, I don’t have a specific article about subgroup analysis. However, subgroup analysis is just the dividing up of a larger sample into subgroups and then analyzing those subgroups separately. You can use the various analyses I write about on the subgroups.

Alternatively, you can include the subgroups in regression analysis as an indicator variable and include that variable as a main effect and an interaction effect to see how the relationships vary by subgroup without needing to subdivide your data. I write about that approach in my article about comparing regression lines. This approach is my preferred approach when possible.

April 19, 2020 at 7:58 am

sir is confidence interval is a part of estimation?

April 17, 2020 at 3:36 pm

Sir can u plz briefly explain alternatives of hypothesis testing? I m unable to find the answer

April 18, 2020 at 1:22 am

Assuming you want to draw conclusions about populations by using samples (i.e., inferential statistics), you can use confidence intervals and bootstrap methods as alternatives to the traditional hypothesis testing methods.

March 9, 2020 at 10:01 pm

Hi JIm, could you please help with activities that can best teach concepts of hypothesis testing through simulation, Also, do you have any question set that would enhance students intuition why learning hypothesis testing as a topic in introductory statistics. Thanks.

March 5, 2020 at 3:48 pm

Hi Jim, I’m studying multiple hypothesis testing & was wondering if you had any material that would be relevant. I’m more trying to understand how testing multiple samples simultaneously affects your results & more on the Bonferroni Correction

March 5, 2020 at 4:05 pm

I write about multiple comparisons (aka post hoc tests) in the ANOVA context. I don’t talk about Bonferroni Corrections specifically but I cover related types of corrections. I’m not sure if that exactly addresses what you want to know but is probably the closest I have already written. I hope it helps!

January 14, 2020 at 9:03 pm

Thank you! Have a great day/evening.

January 13, 2020 at 7:10 pm

Any help would be greatly appreciated. What is the difference between The Hypothesis Test and The Statistical Test of Hypothesis?

January 14, 2020 at 11:02 am

They sound like the same thing to me. Unless this is specialized terminology for a particular field or the author was intending something specific, I’d guess they’re one and the same.

April 1, 2019 at 10:00 am

so these are the only two forms of Hypothesis used in statistical testing?

April 1, 2019 at 10:02 am

Are you referring to the null and alternative hypotheses? If so, yes, those are the standard hypotheses in a statistical hypothesis test.

April 1, 2019 at 9:57 am

year very insightful post, thanks for the write up

October 27, 2018 at 11:09 pm

hi there, am upcoming statistician, out of all blogs that i have read, i have found this one more useful as long as my problem is concerned. thanks so much

October 27, 2018 at 11:14 pm

Hi Stano, you’re very welcome! Thanks for your kind words. They mean a lot! I’m happy to hear that my posts were able to help you. I’m sure you will be a fantastic statistician. Best of luck with your studies!

October 26, 2018 at 11:39 am

Dear Jim, thank you very much for your explanations! I have a question. Can I use t-test to compare two samples in case each of them have right bias?

October 26, 2018 at 12:00 pm

Hi Tetyana,

You’re very welcome!

The term “right bias” is not a standard term. Do you by chance mean right skewed distributions? In other words, if you plot the distribution for each group on a histogram they have longer right tails? These are not the symmetrical bell-shape curves of the normal distribution.

If that’s the case, yes you can as long as you exceed a specific sample size within each group. I include a table that contains these sample size requirements in my post about nonparametric vs parametric analyses.

Bias in statistics refers to cases where an estimate of a value is systematically higher or lower than the true value. If this is the case, you might be able to use t-tests, but you’d need to be sure to understand the nature of the bias so you would understand what the results are really indicating.

I hope this helps!

April 2, 2018 at 7:28 am

Simple and upto the point 👍 Thank you so much.

April 2, 2018 at 11:11 am

Hi Kalpana, thanks! And I’m glad it was helpful!

March 26, 2018 at 8:41 am

Am I correct if I say: Alpha – Probability of wrongly rejection of null hypothesis P-value – Probability of wrongly acceptance of null hypothesis

March 28, 2018 at 3:14 pm

You’re correct about alpha. Alpha is the probability of rejecting the null hypothesis when the null is true.

Unfortunately, your definition of the p-value is a bit off. The p-value has a fairly convoluted definition. It is the probability of obtaining the effect observed in a sample, or more extreme, if the null hypothesis is true. The p-value does NOT indicate the probability that either the null or alternative is true or false, although those are very common misinterpretations. To learn more, read my post about how to interpret p-values correctly.

March 2, 2018 at 6:10 pm

I recently started reading your blog and it is very helpful to understand each concept of statistical tests in easy way with some good examples. Also, I recommend to other people go through all these blogs which you posted. Specially for those people who have not statistical background and they are facing to many problems while studying statistical analysis.

Thank you for your such good blogs.

March 3, 2018 at 10:12 pm

Hi Amit, I’m so glad that my blog posts have been helpful for you! It means a lot to me that you took the time to write such a nice comment! Also, thanks for recommending my blog to others! I try really hard to write posts about statistics that are easy to understand.

January 17, 2018 at 7:03 am

I recently started reading your blog and I find it very interesting. I am learning statistics by my own, and I generally do many google search to understand the concepts. So this blog is quite helpful for me, as it have most of the content which I am looking for.

January 17, 2018 at 3:56 pm

Hi Shashank, thank you! And, I’m very glad to hear that my blog is helpful!

January 2, 2018 at 2:28 pm

thank u very much sir.

January 2, 2018 at 2:36 pm

You’re very welcome, Hiral!

November 21, 2017 at 12:43 pm

Thank u so much sir….your posts always helps me to be a #statistician

November 21, 2017 at 2:40 pm

Hi Sachin, you’re very welcome! I’m happy that you find my posts to be helpful!

November 19, 2017 at 8:22 pm

great post as usual, but it would be nice to see an example.

November 19, 2017 at 8:27 pm

Thank you! At the end of this post, I have links to four other posts that show examples of hypothesis tests in action. You’ll find what you’re looking for in those posts!

  • Browse content in Science and Mathematics
  • Browse content in Biological Sciences
  • Aquatic Biology
  • Biochemistry
  • Bioinformatics and Computational Biology
  • Developmental Biology
  • Ecology and Conservation
  • Evolutionary Biology
  • Genetics and Genomics
  • Microbiology
  • Molecular and Cell Biology
  • Natural History
  • Plant Sciences and Forestry
  • Research Methods in Life Sciences
  • Structural Biology
  • Systems Biology
  • Zoology and Animal Sciences
  • Browse content in Chemistry
  • Analytical Chemistry
  • Computational Chemistry
  • Crystallography
  • Environmental Chemistry
  • Industrial Chemistry
  • Inorganic Chemistry
  • Materials Chemistry
  • Medicinal Chemistry
  • Mineralogy and Gems
  • Organic Chemistry
  • Physical Chemistry
  • Polymer Chemistry
  • Study and Communication Skills in Chemistry
  • Theoretical Chemistry
  • Browse content in Computer Science
  • Artificial Intelligence
  • Computer Architecture and Logic Design
  • Game Studies
  • Human-Computer Interaction
  • Mathematical Theory of Computation
  • Programming Languages
  • Software Engineering
  • Systems Analysis and Design
  • Virtual Reality
  • Browse content in Computing
  • Business Applications
  • Computer Security
  • Computer Games
  • Computer Networking and Communications
  • Digital Lifestyle
  • Graphical and Digital Media Applications
  • Operating Systems
  • Browse content in Earth Sciences and Geography
  • Atmospheric Sciences
  • Environmental Geography
  • Geology and the Lithosphere
  • Maps and Map-making
  • Meteorology and Climatology
  • Oceanography and Hydrology
  • Palaeontology
  • Physical Geography and Topography
  • Regional Geography
  • Soil Science
  • Urban Geography
  • Browse content in Engineering and Technology
  • Agriculture and Farming
  • Biological Engineering
  • Civil Engineering, Surveying, and Building
  • Electronics and Communications Engineering
  • Energy Technology
  • Engineering (General)
  • Environmental Science, Engineering, and Technology
  • History of Engineering and Technology
  • Mechanical Engineering and Materials
  • Technology of Industrial Chemistry
  • Transport Technology and Trades
  • Browse content in Environmental Science
  • Applied Ecology (Environmental Science)
  • Conservation of the Environment (Environmental Science)
  • Environmental Sustainability
  • Environmentalist Thought and Ideology (Environmental Science)
  • Management of Land and Natural Resources (Environmental Science)
  • Natural Disasters (Environmental Science)
  • Nuclear Issues (Environmental Science)
  • Pollution and Threats to the Environment (Environmental Science)
  • Social Impact of Environmental Issues (Environmental Science)
  • History of Science and Technology
  • Browse content in Materials Science
  • Ceramics and Glasses
  • Composite Materials
  • Metals, Alloying, and Corrosion
  • Nanotechnology
  • Browse content in Mathematics
  • Applied Mathematics
  • Biomathematics and Statistics
  • History of Mathematics
  • Mathematical Education
  • Mathematical Finance
  • Mathematical Analysis
  • Numerical and Computational Mathematics
  • Probability and Statistics
  • Pure Mathematics
  • Browse content in Neuroscience
  • Cognition and Behavioural Neuroscience
  • Development of the Nervous System
  • Disorders of the Nervous System
  • History of Neuroscience
  • Invertebrate Neurobiology
  • Molecular and Cellular Systems
  • Neuroendocrinology and Autonomic Nervous System
  • Neuroscientific Techniques
  • Sensory and Motor Systems
  • Browse content in Physics
  • Astronomy and Astrophysics
  • Atomic, Molecular, and Optical Physics
  • Biological and Medical Physics
  • Classical Mechanics
  • Computational Physics
  • Condensed Matter Physics
  • Electromagnetism, Optics, and Acoustics
  • History of Physics
  • Mathematical and Statistical Physics
  • Measurement Science
  • Nuclear Physics
  • Particles and Fields
  • Plasma Physics
  • Quantum Physics
  • Relativity and Gravitation
  • Semiconductor and Mesoscopic Physics
  • Browse content in Psychology
  • Affective Sciences
  • Clinical Psychology
  • Cognitive Neuroscience
  • Cognitive Psychology
  • Criminal and Forensic Psychology
  • Developmental Psychology
  • Educational Psychology
  • Evolutionary Psychology
  • Health Psychology
  • History and Systems in Psychology
  • Music Psychology
  • Neuropsychology
  • Organizational Psychology
  • Psychological Assessment and Testing
  • Psychology of Human-Technology Interaction
  • Psychology Professional Development and Training
  • Research Methods in Psychology
  • Social Psychology
  • Browse content in Social Sciences
  • Browse content in Anthropology
  • Anthropology of Religion
  • Human Evolution
  • Medical Anthropology
  • Physical Anthropology
  • Regional Anthropology
  • Social and Cultural Anthropology
  • Theory and Practice of Anthropology
  • Browse content in Business and Management
  • Business Strategy
  • Business History
  • Business Ethics
  • Business and Government
  • Business and Technology
  • Business and the Environment
  • Comparative Management
  • Corporate Governance
  • Corporate Social Responsibility
  • Entrepreneurship
  • Health Management
  • Human Resource Management
  • Industrial and Employment Relations
  • Industry Studies
  • Information and Communication Technologies
  • International Business
  • Knowledge Management
  • Management and Management Techniques
  • Operations Management
  • Organizational Theory and Behaviour
  • Pensions and Pension Management
  • Public and Nonprofit Management
  • Social Issues in Business and Management
  • Strategic Management
  • Supply Chain Management
  • Browse content in Criminology and Criminal Justice
  • Criminal Justice
  • Criminology
  • Forms of Crime
  • International and Comparative Criminology
  • Youth Violence and Juvenile Justice
  • Development Studies
  • Browse content in Economics
  • Agricultural, Environmental, and Natural Resource Economics
  • Asian Economics
  • Behavioural Finance
  • Behavioural Economics and Neuroeconomics
  • Econometrics and Mathematical Economics
  • Economic Systems
  • Economic Methodology
  • Economic History
  • Economic Development and Growth
  • Financial Markets
  • Financial Institutions and Services
  • General Economics and Teaching
  • Health, Education, and Welfare
  • History of Economic Thought
  • International Economics
  • Labour and Demographic Economics
  • Law and Economics
  • Macroeconomics and Monetary Economics
  • Microeconomics
  • Public Economics
  • Urban, Rural, and Regional Economics
  • Welfare Economics
  • Browse content in Education
  • Adult Education and Continuous Learning
  • Care and Counselling of Students
  • Early Childhood and Elementary Education
  • Educational Equipment and Technology
  • Educational Strategies and Policy
  • Higher and Further Education
  • Organization and Management of Education
  • Philosophy and Theory of Education
  • Schools Studies
  • Secondary Education
  • Teaching of a Specific Subject
  • Teaching of Specific Groups and Special Educational Needs
  • Teaching Skills and Techniques
  • Browse content in Environment
  • Applied Ecology (Social Science)
  • Climate Change
  • Conservation of the Environment (Social Science)
  • Environmentalist Thought and Ideology (Social Science)
  • Management of Land and Natural Resources (Social Science)
  • Natural Disasters (Environment)
  • Pollution and Threats to the Environment (Social Science)
  • Social Impact of Environmental Issues (Social Science)
  • Sustainability
  • Browse content in Human Geography
  • Cultural Geography
  • Economic Geography
  • Political Geography
  • Browse content in Interdisciplinary Studies
  • Communication Studies
  • Museums, Libraries, and Information Sciences
  • Browse content in Politics
  • African Politics
  • Asian Politics
  • Chinese Politics
  • Comparative Politics
  • Conflict Politics
  • Elections and Electoral Studies
  • Environmental Politics
  • Ethnic Politics
  • European Union
  • Foreign Policy
  • Gender and Politics
  • Human Rights and Politics
  • Indian Politics
  • International Relations
  • International Organization (Politics)
  • Irish Politics
  • Latin American Politics
  • Middle Eastern Politics
  • Political Methodology
  • Political Communication
  • Political Philosophy
  • Political Sociology
  • Political Theory
  • Political Behaviour
  • Political Economy
  • Political Institutions
  • Politics and Law
  • Politics of Development
  • Public Administration
  • Public Policy
  • Qualitative Political Methodology
  • Quantitative Political Methodology
  • Regional Political Studies
  • Russian Politics
  • Security Studies
  • State and Local Government
  • UK Politics
  • US Politics
  • Browse content in Regional and Area Studies
  • African Studies
  • Asian Studies
  • East Asian Studies
  • Japanese Studies
  • Latin American Studies
  • Middle Eastern Studies
  • Native American Studies
  • Scottish Studies
  • Browse content in Research and Information
  • Research Methods
  • Browse content in Social Work
  • Addictions and Substance Misuse
  • Adoption and Fostering
  • Care of the Elderly
  • Child and Adolescent Social Work
  • Couple and Family Social Work
  • Direct Practice and Clinical Social Work
  • Emergency Services
  • Human Behaviour and the Social Environment
  • International and Global Issues in Social Work
  • Mental and Behavioural Health
  • Social Justice and Human Rights
  • Social Policy and Advocacy
  • Social Work and Crime and Justice
  • Social Work Macro Practice
  • Social Work Practice Settings
  • Social Work Research and Evidence-based Practice
  • Welfare and Benefit Systems
  • Browse content in Sociology
  • Childhood Studies
  • Community Development
  • Comparative and Historical Sociology
  • Disability Studies
  • Economic Sociology
  • Gender and Sexuality
  • Gerontology and Ageing
  • Health, Illness, and Medicine
  • Marriage and the Family
  • Migration Studies
  • Occupations, Professions, and Work
  • Organizations
  • Population and Demography
  • Race and Ethnicity
  • Social Theory
  • Social Movements and Social Change
  • Social Research and Statistics
  • Social Stratification, Inequality, and Mobility
  • Sociology of Religion
  • Sociology of Education
  • Sport and Leisure
  • Urban and Rural Studies
  • Browse content in Warfare and Defence
  • Defence Strategy, Planning, and Research
  • Land Forces and Warfare
  • Military Administration
  • Military Life and Institutions
  • Naval Forces and Warfare
  • Other Warfare and Defence Issues
  • Peace Studies and Conflict Resolution
  • Weapons and Equipment

Time Series and Panel Data Econometrics

3 Hypothesis Testing in Regression Models

  • Published: October 2015

This chapter introduces some key concepts of statistical inference and shows their use to investigate the statistical significance of the (linear) relationships modelled through regression analysis, or to investigate the validity of the classical assumptions in simple and multiple linear regression models. The discussions cover statistical hypothesis testing in simple and multiple regression models; testing linear restrictions on regression coefficients; joint tests of linear restrictions; testing general linear restrictions; the relationship between the F test and the coefficient of multiple correlation; the joint confidence region; multicollinearity and the prediction problem; implications of mis-specification of the regression model on hypothesis testing; Jarque-Bera's test of the normality of regression residuals; the predictive failure test; the Chow test; and non-parametric estimation of the density function. Exercises are provided at the end of the chapter.


6.4 - The Hypothesis Tests for the Slopes

At the beginning of this lesson, we translated three different research questions pertaining to heart attacks in rabbits (Cool Hearts dataset) into three sets of hypotheses we can test using the general linear F-statistic. The research questions and their corresponding hypotheses are:

Hypotheses 1

Is the regression model containing at least one predictor useful in predicting the size of the infarct?

  • \(H_{0} \colon \beta_{1} = \beta_{2} = \beta_{3} = 0\)
  • \(H_{A} \colon\) At least one \(\beta_{j} \ne 0\) (for j = 1, 2, 3)

Hypotheses 2

Is the size of the infarct significantly (linearly) related to the area of the region at risk?

  • \(H_{0} \colon \beta_{1} = 0 \)
  • \(H_{A} \colon \beta_{1} \ne 0 \)

Hypotheses 3

(Primary research question) Is the size of the infarct area significantly (linearly) related to the type of treatment upon controlling for the size of the region at risk for infarction?

  • \(H_{0} \colon \beta_{2} = \beta_{3} = 0\)
  • \(H_{A} \colon\) At least one \(\beta_{j} \ne 0\) (for j = 2, 3)

Let's test each of the hypotheses now using the general linear F-statistic:

\(F^*=\left(\dfrac{SSE(R)-SSE(F)}{df_R-df_F}\right) \div \left(\dfrac{SSE(F)}{df_F}\right)\)

To calculate the F-statistic for each test, we first determine the error sum of squares for the reduced and full models, SSE(R) and SSE(F), respectively. The number of error degrees of freedom associated with the reduced and full models, \(df_{R}\) and \(df_{F}\), respectively, is the number of observations, n, minus the number of parameters, p, in the model. That is, in general, the number of error degrees of freedom is n - p. We use statistical software, such as Minitab's F-distribution probability calculator, to determine the P-value for each test.
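The calculation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `general_linear_f` is hypothetical, and the sample call uses the sums of squares reported in the analysis of variance table for the rabbit data (SSE(R) = SSTO = 1.50418 with 31 error degrees of freedom, SSE(F) = 0.54491 with 28).

```python
def general_linear_f(sse_reduced, sse_full, df_reduced, df_full):
    """Return the general linear F-statistic F*.

    F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]
    """
    if df_reduced <= df_full:
        raise ValueError("the reduced model must have more error df than the full model")
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


# Illustrative call: intercept-only reduced model vs. three-predictor full model
f_overall = general_linear_f(1.50418, 0.54491, 31, 28)
print(round(f_overall, 2))
```

The same function applies to each of the three tests in this section; only the reduced model, and therefore SSE(R) and \(df_{R}\), changes.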

Testing all slope parameters equal 0

First, let's answer the research question: "Is the regression model containing at least one predictor useful in predicting the size of the infarct?" To do so, we test the hypotheses:

  • \(H_{0} \colon \beta_{1} = \beta_{2} = \beta_{3} = 0 \)
  • \(H_{A} \colon\) At least one \(\beta_{j} \ne 0 \) (for j = 1, 2, 3)

The full model

The full model is the largest possible model — that is, the model containing all of the possible predictors. In this case, the full model is:

\(y_i=(\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i3})+\epsilon_i\)

The error sum of squares for the full model, SSE(F), is just the usual error sum of squares, SSE, that appears in the analysis of variance table. Because there are 4 parameters in the full model, the number of error degrees of freedom associated with the full model is \(df_{F} = n - 4\).

The reduced model

The reduced model is the model that the null hypothesis describes. Because the null hypothesis sets each of the slope parameters in the full model equal to 0, the reduced model is:

\(y_i=\beta_0+\epsilon_i\)

The reduced model suggests that none of the variation in the response y is explained by any of the predictors. Therefore, the error sum of squares for the reduced model, SSE(R), is just the total sum of squares, SSTO, that appears in the analysis of variance table. Because there is only one parameter in the reduced model, the number of error degrees of freedom associated with the reduced model is \(df_{R} = n - 1 \).

Upon plugging in the above quantities, the general linear F-statistic:

\(F^*=\dfrac{SSE(R)-SSE(F)}{df_R-df_F} \div \dfrac{SSE(F)}{df_F}\)

becomes the usual "overall F-test":

\(F^*=\dfrac{SSR}{3} \div \dfrac{SSE}{n-4}=\dfrac{MSR}{MSE}\)

That is, to test \(H_{0} \colon \beta_{1} = \beta_{2} = \beta_{3} = 0 \), we just use the overall F-test and P-value reported in the analysis of variance table:

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value
Regression 3 0.95927 0.31976 16.43 0.000
Area 1 0.63742 0.63742 32.75 0.000
X2 1 0.29733 0.29733 15.28 0.001
X3 1 0.01981 0.01981 1.02 0.322
Error 28 0.54491 0.01946
Total 31 1.50418

Regression Equation

Inf = - 0.135 + 0.613 Area - 0.2435 X2 - 0.0657 X3

There is sufficient evidence (F = 16.43, P < 0.001) to conclude that at least one of the slope parameters is not equal to 0.

In general, to test that all of the slope parameters in a multiple linear regression model are 0, we use the overall F-test reported in the analysis of variance table.

Testing one slope parameter is 0

Now let's answer the second research question: "Is the size of the infarct significantly (linearly) related to the area of the region at risk?" To do so, we test the hypotheses:

  • \(H_{0} \colon \beta_{1} = 0 \)
  • \(H_{A} \colon \beta_{1} \ne 0 \)

Again, the full model is the model containing all of the possible predictors:

\(y_i=(\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i3})+\epsilon_i\)

The error sum of squares for the full model, SSE(F), is just the usual error sum of squares, SSE. Alternatively, because the three predictors in the model are \(x_{1}\), \(x_{2}\), and \(x_{3}\), we can denote the error sum of squares as SSE(\(x_{1}\), \(x_{2}\), \(x_{3}\)). Again, because there are 4 parameters in the model, the number of error degrees of freedom associated with the full model is \(df_{F} = n - 4 \).

Because the null hypothesis sets the first slope parameter, \(\beta_{1}\), equal to 0, the reduced model is:

\(y_i=(\beta_0+\beta_2x_{i2}+\beta_3x_{i3})+\epsilon_i\)

Because the two predictors in the model are \(x_{2}\) and \(x_{3}\), we denote the error sum of squares as SSE(\(x_{2}\), \(x_{3}\)). Because there are 3 parameters in the model, the number of error degrees of freedom associated with the reduced model is \(df_{R} = n - 3\).

The general linear F-statistic:

\(F^*=\dfrac{SSE(R)-SSE(F)}{df_R-df_F} \div \dfrac{SSE(F)}{df_F}\)

simplifies to:

\(F^*=\dfrac{SSR(x_1|x_2, x_3)}{1}\div \dfrac{SSE(x_1,x_2, x_3)}{n-4}=\dfrac{MSR(x_1|x_2, x_3)}{MSE(x_1,x_2, x_3)}\)

Getting the numbers from the Minitab analysis of variance output shown earlier (Adj SS for Area = 0.63742 and MSE = 0.01946), we determine that the value of the F-statistic is:

\(F^* = \dfrac{SSR(x_1 \vert x_2, x_3)}{1} \div \dfrac{SSE(x_1, x_2, x_3)}{28} = \dfrac{0.63742}{0.01946}=32.7554\)

The P-value is the probability, if the null hypothesis were true, that we would get an F-statistic larger than 32.7554. Comparing our F-statistic to an F-distribution with 1 numerator degree of freedom and 28 denominator degrees of freedom, Minitab tells us that the probability is close to 1 that we would observe an F-statistic smaller than 32.7554:

F distribution with 1 DF in Numerator and 28 DF in denominator

x P(X ≤ x)
32.7554 1.00000

Therefore, the probability that we would get an F-statistic larger than 32.7554 is close to 0. That is, the P-value is < 0.001. There is sufficient evidence (F = 32.8, P < 0.001) to conclude that the size of the infarct is significantly related to the size of the area at risk after the other predictors \(x_{2}\) and \(x_{3}\) have been taken into account.
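For readers without Minitab, the upper-tail probability can be approximated with the Python standard library alone. The sketch below is an assumption-laden illustration rather than the lesson's method: the helper name `f_upper_tail` is hypothetical, and it relies on the standard identity that P(F > x) equals the regularized incomplete beta function evaluated at \(df_2/(df_2 + df_1 x)\), integrated here by Simpson's rule. In practice a statistics library routine (for example, SciPy's `scipy.stats.f.sf`) would normally be used instead.

```python
import math


def f_upper_tail(x, df_num, df_den, steps=2000):
    """Approximate P(F > x) for an F(df_num, df_den) variable, with x > 0.

    Uses P(F > x) = I_z(df_den/2, df_num/2), z = df_den / (df_den + df_num*x),
    where I_z is the regularized incomplete beta function, evaluated by
    Simpson's rule. Assumes df_den >= 2 so the integrand is finite at 0,
    and that `steps` is even.
    """
    a, b = df_den / 2.0, df_num / 2.0
    z = df_den / (df_den + df_num * x)
    h = z / steps

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    total = integrand(0.0) + integrand(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3.0
    # log of the complete beta function B(a, b), used to normalize
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return integral / math.exp(log_beta)


p_value = f_upper_tail(32.7554, 1, 28)
print(p_value < 0.001)
```

The printed comparison mirrors the conclusion in the text: the upper-tail probability for F* = 32.7554 with 1 and 28 degrees of freedom falls below 0.001.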

But wait a second! Have you been wondering why we couldn't just use the slope's t-statistic to test that the slope parameter, \(\beta_{1}\), is 0? We can! Notice that the P-value (P < 0.001) for the t-test (t* = 5.72):

Coefficients

Term Coef SE Coef T-Value P-Value VIF
Constant -0.135 0.104 -1.29 0.206  
Area 0.613 0.107 5.72 0.000 1.14
X2 -0.2435 0.0623 -3.91 0.001 1.44
X3 -0.0657 0.0651 -1.01 0.322 1.57

is the same as the P-value we obtained for the F-test. This will always be the case when we test that only one slope parameter is 0. That's because of the well-known relationship between a t-statistic and an F-statistic that has one numerator degree of freedom:

\(t_{(n-p)}^{2}=F_{(1, n-p)}\)

For our example, the square of the t-statistic, 5.72, equals our F-statistic, 32.7554 (within rounding error). That is:

\(t^{*2}=5.72^2=32.72\approx F^*\)
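As a quick numerical check of this relationship, the short Python snippet below squares the reported t-statistic and compares it with the F-statistic computed earlier. The variable names are illustrative only, and the tolerance of 0.05 is an assumption chosen to absorb the rounding of t* to two decimals.

```python
t_star = 5.72       # t-statistic for Area from the coefficients table
f_star = 32.7554    # general linear F-statistic for H0: beta_1 = 0

t_squared = t_star ** 2
print(round(t_squared, 2))
# The two values agree up to the rounding of t* in the software output
print(abs(t_squared - f_star) < 0.05)
```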

Compare the output obtained when \(x_{1}\) = Area is entered into the model last:

Term Coef SE Coef T-Value P-Value VIF
Constant -0.135 0.104 -1.29 0.206  
X2 -0.2435 0.0623 -3.91 0.001 1.44
X3 -0.0657 0.0651 -1.01 0.322 1.57
Area 0.613 0.107 5.72 0.000 1.14

Inf = - 0.135 - 0.2435 X2 - 0.0657 X3 + 0.613 Area

to the output obtained when \(x_{1}\) = Area is entered into the model first:

Term Coef SE Coef T-Value P-Value VIF
Constant -0.135 0.104 -1.29 0.206  
Area 0.613 0.107 5.72 0.000 1.14
X2 -0.2435 0.0623 -3.91 0.001 1.44
X3 -0.0657 0.0651 -1.01 0.322 1.57

Inf = - 0.135 + 0.613 Area - 0.2435 X2 - 0.0657 X3

The t-statistic and P-value are the same regardless of the order in which \(x_{1}\) = Area is entered into the model. That's because, by its equivalence to the F-test, the t-test for one slope parameter adjusts for all of the other predictors included in the model.

So what have we learned in all of this discussion about the equivalence of the F-test and the t-test? In short:

  • We can use either the F-test or the t-test to test that only one slope parameter is 0. Because the t-test results can be read right off of the Minitab output, it makes sense that it would be the test that we'll use most often.
  • But, we have to be careful with our interpretations! The equivalence of the t-test to the F-test has taught us something new about the t-test. The t-test is a test for the marginal significance of the \(x_{1}\) predictor after the other predictors \(x_{2}\) and \(x_{3}\) have been taken into account. It does not test for the significance of the relationship between the response y and the predictor \(x_{1}\) alone.

Testing a subset of slope parameters is 0

Finally, let's answer the third — and primary — research question: "Is the size of the infarct area significantly (linearly) related to the type of treatment upon controlling for the size of the region at risk for infarction?" To do so, we test the hypotheses:

  • \(H_{0} \colon \beta_{2} = \beta_{3} = 0 \)
  • \(H_{A} \colon\) At least one \(\beta_{j} \ne 0 \) (for j = 2, 3)

Because the null hypothesis sets the second and third slope parameters, \(\beta_{2}\) and \(\beta_{3}\), equal to 0, the reduced model is:

\(y_i=(\beta_0+\beta_1x_{i1})+\epsilon_i\)

The ANOVA table for the reduced model is:

Source DF Adj SS Adj MS F-Value P-Value
Regression 1 0.6249 0.62492 21.32 0.000
Area 1 0.6249 0.62492 21.32 0.000
Error 30 0.8793 0.02931
Total 31 1.5042

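Because the only predictor in the model is \(x_{1}\), we denote the error sum of squares as SSE(\(x_{1}\)) = 0.8793. Because there are 2 parameters in the model, the number of error degrees of freedom associated with the reduced model is \(df_{R} = n - 2 = 32 - 2 = 30\). From the full model's analysis of variance table, SSE(F) = 0.54491 with \(df_{F} = 28\). Substituting these values into the general linear F-statistic gives: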

\begin{align} F^*&=\dfrac{SSE(R)-SSE(F)}{df_R-df_F} \div\dfrac{SSE(F)}{df_F}\\&=\dfrac{0.8793-0.54491}{30-28} \div\dfrac{0.54491}{28}\\&= \dfrac{0.33439}{2} \div 0.01946\\&=8.59.\end{align}

Alternatively, we can calculate the F-statistic using a partial F-test :

\begin{align}F^*&=\dfrac{SSR(x_2, x_3|x_1)}{2}\div \dfrac{SSE(x_1,x_2, x_3)}{n-4}\\&=\dfrac{MSR(x_2, x_3|x_1)}{MSE(x_1,x_2, x_3)}.\end{align}

To conduct the test, we regress y = InfSize on \(x_{1}\) = Area, \(x_{2}\), and \(x_{3}\), in that order (and with "Sequential sums of squares" selected under "Options"):

Source DF Seq SS Seq MS F-Value P-Value
Regression 3 0.95927 0.31976 16.43 0.000
Area 1 0.62492 0.62492 32.11 0.000
X2 1 0.31453 0.31453 16.16 0.001
X3 1 0.01981 0.01981 1.02 0.322
Error 28 0.54491 0.01946
Total 31 1.50418

Inf = - 0.135 + 0.613 Area - 0.2435 X2 - 0.0657 X3

yielding SSR(\(x_{2}\) | \(x_{1}\)) = 0.31453, SSR(\(x_{3}\) | \(x_{1}\), \(x_{2}\)) = 0.01981, and MSE = 0.54491/28 = 0.01946. Therefore, the value of the partial F-statistic is:

\begin{align} F^*&=\dfrac{SSR(x_2, x_3|x_1)}{2}\div \dfrac{SSE(x_1,x_2, x_3)}{n-4}\\&=\dfrac{0.31453+0.01981}{2}\div\dfrac{0.54491}{28}\\&= \dfrac{0.33434}{2} \div 0.01946\\&=8.59,\end{align}

which is identical (within round-off error) to the general F-statistic above. The P-value is the probability, if the null hypothesis were true, that we would observe a partial F-statistic more extreme than 8.59. The following Minitab output:

F distribution with 2 DF in Numerator and 28 DF in denominator

x P(X ≤ x)
8.59 0.998767

tells us that the probability of observing an F-statistic smaller than 8.59 is 0.9988. Therefore, the probability of observing an F-statistic larger than 8.59 is 1 - 0.9988 = 0.0012. The P-value is very small. There is sufficient evidence (F = 8.59, P = 0.0012) to conclude that the type of cooling is significantly related to the extent of damage that occurs, after taking into account the size of the region at risk.
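The partial F-test arithmetic above can be reproduced with a short Python sketch. The variable names are illustrative, not taken from the lesson. The P-value line uses a closed form that holds only when the numerator has exactly 2 degrees of freedom, \(P(F > x) = (1 + 2x/df_F)^{-df_F/2}\); that shortcut is an assumption of this sketch and does not apply to tests with other numerator degrees of freedom.

```python
# Sequential sums of squares for the predictors being tested (entered last)
seq_ss = {"x2 | x1": 0.31453, "x3 | x1, x2": 0.01981}
sse_full, df_full = 0.54491, 28   # error SS and error df of the full model

extra_ss = sum(seq_ss.values())   # SSR(x2, x3 | x1)
df_extra = len(seq_ss)            # one df per tested slope
mse = sse_full / df_full          # MSE of the full model
f_partial = (extra_ss / df_extra) / mse

# Closed-form upper tail, valid only because df_extra == 2
p_value = (1.0 + 2.0 * f_partial / df_full) ** (-df_full / 2.0)

print(round(f_partial, 2), round(p_value, 4))
```

Both numbers match the values reported in the text to the precision shown there.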

Summary of MLR Testing

For the simple linear regression model, there is only one slope parameter about which one can perform hypothesis tests. For the multiple linear regression model, there are three different hypothesis tests for slopes that one could conduct. They are:

  • Hypothesis test for testing that all of the slope parameters are 0.
  • Hypothesis test for testing that a subset — more than one, but not all — of the slope parameters are 0.
  • Hypothesis test for testing that one slope parameter is 0.

We have learned how to perform each of the above three hypothesis tests. Along the way, we also took two detours: one to learn about the "general linear F-test" and one to learn about "sequential sums of squares." As you now know, knowledge about both is necessary for performing the three hypothesis tests.

The F-statistic and associated p-value in the ANOVA table are used for testing whether all of the slope parameters are 0. In most applications, this p-value will be small enough to reject the null hypothesis and conclude that at least one predictor is useful in the model. For example, for the rabbit heart attacks study, the F-statistic is (0.95927/(4-1)) / (0.54491/(32-4)) = 16.43 with a p-value reported as 0.000 (that is, less than 0.001).

To test whether a subset — more than one, but not all — of the slope parameters are 0, there are two equivalent ways to calculate the F-statistic:

  • Use the general linear F-test formula by fitting the full model to find SSE(F) and fitting the reduced model to find SSE(R) . Then the numerator of the F-statistic is (SSE(R) – SSE(F)) / ( \(df_{R}\) – \(df_{F}\)) .
  • Alternatively, use the partial F-test formula by fitting only the full model but making sure the relevant predictors are fitted last and "sequential sums of squares" have been selected. Then the numerator of the F-statistic is the sum of the relevant sequential sums of squares divided by the sum of the degrees of freedom for these sequential sums of squares. The denominator of the F -statistic is the mean squared error in the ANOVA table.

For example, for the rabbit heart attacks study, the general linear F-statistic is ((0.8793 – 0.54491) / (30 – 28)) / (0.54491 / 28) = 8.59 with p -value 0.0012. Alternatively, the partial F -statistic for testing the slope parameters for predictors \(x_{2}\) and \(x_{3}\) using sequential sums of squares is ((0.31453 + 0.01981) / 2) / (0.54491 / 28) = 8.59.
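The equivalence of the two calculations can be checked numerically. This is a small sketch using the sums of squares quoted above; the helper names are hypothetical, and the partial F helper assumes each sequential sum of squares carries one degree of freedom.

```python
def general_linear_f(sse_reduced, df_reduced, sse_full, df_full):
    """General linear F-statistic comparing a reduced and a full model."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


def partial_f(sequential_ss, mse_full):
    """Partial F-statistic from sequential sums of squares.

    Assumes one degree of freedom per sequential sum of squares.
    """
    numerator = sum(sequential_ss) / len(sequential_ss)
    return numerator / mse_full


mse_full = 0.54491 / 28
f_general = general_linear_f(0.8793, 30, 0.54491, 28)
f_partial = partial_f([0.31453, 0.01981], mse_full)

print(f"general linear F = {f_general:.2f}")
print(f"partial F        = {f_partial:.2f}")
```

The two values differ only in later decimal places, which reflects rounding in the reported sums of squares.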

To test whether one slope parameter is 0, we can use an F -test as just described. Alternatively, we can use a t -test, which will have an identical p -value since in this case, the square of the t -statistic is equal to the F -statistic. For example, for the rabbit heart attacks study, the F -statistic for testing the slope parameter for the Area predictor is (0.63742/1) / (0.54491/(32–4)) = 32.75 with p -value 0.000. Alternatively, the t -statistic for testing the slope parameter for the Area predictor is 0.613 / 0.107 = 5.72 with p -value 0.000, and \(5.72^{2} = 32.72\).
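The relationship between the t-statistic and the F-statistic for a single slope can be verified from the quoted numbers. A short sketch (variable names are ours):

```python
# Mean squared error of the full model: SSE / (n - number of parameters)
mse_full = 0.54491 / (32 - 4)

# Partial F-statistic for the Area predictor (1 numerator degree of freedom)
f_area = (0.63742 / 1) / mse_full

# t-statistic for the same predictor: estimated slope / standard error
t_area = 0.613 / 0.107

print(f"F   = {f_area:.2f}")
print(f"t   = {t_area:.2f}")
print(f"t^2 = {t_area ** 2:.2f}")
```

The small gap between F and t² comes from the slope and its standard error being reported to only three decimal places.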

Incidentally, you may be wondering why we can't just do a series of individual t-tests to test whether a subset of the slope parameters is 0. For example, for the rabbit heart attacks study, we could have done the following:

  • Fit the model of y = InfSize on \(x_{1}\) = Area and \(x_{2}\) and \(x_{3}\) and use an individual t-test for \(x_{3}\).
  • If the test results indicate that we can drop \(x_{3}\) then fit the model of y = InfSize on \(x_{1}\) = Area and \(x_{2}\) and use an individual t-test for \(x_{2}\).

The problem with this approach is we're using two individual t-tests instead of one F-test, which means our chance of drawing an incorrect conclusion in our testing procedure is higher. Every time we do a hypothesis test, we can draw an incorrect conclusion by:

  • rejecting a true null hypothesis, i.e., make a type I error by concluding the tested predictor(s) should be retained in the model when in truth it/they should be dropped; or
  • failing to reject a false null hypothesis, i.e., make a type II error by concluding the tested predictor(s) should be dropped from the model when in truth it/they should be retained.

Thus, in general, the fewer tests we perform the better. In this case, this means that wherever possible using one F-test in place of multiple individual t-tests is preferable.
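To get a feel for how quickly the error risk grows, here is an illustrative sketch. It assumes, for simplicity, that the tests are independent and that every null hypothesis is true; neither assumption comes from the text above, and sequential t-tests on the same data are not strictly independent.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one type I error across n_tests tests.

    Assumes independent tests, each run at level alpha, with all
    null hypotheses true.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


for k in (1, 2, 5):
    rate = familywise_error_rate(0.05, k)
    print(f"{k} test(s) at alpha = 0.05: {rate:.4f}")
```

Under these assumptions, two tests at the 0.05 level already carry a combined type I error risk of nearly 0.10.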

Hypothesis tests for the slope parameters

The problems in this section are designed to review the hypothesis tests for the slope parameters, as well as to give you some practice on models with a three-group qualitative variable (which we'll cover in more detail in Lesson 8). We consider tests for:

  • whether one slope parameter is 0 (for example, \(H_{0} \colon \beta_{1} = 0 \))
  • whether a subset (more than one but less than all) of the slope parameters are 0 (for example, \(H_{0} \colon \beta_{2} = \beta_{3} = 0 \) against the alternative \(H_{A} \colon \beta_{2} \ne 0 \) or \(\beta_{3} \ne 0 \) or both ≠ 0)
  • whether all of the slope parameters are 0 (for example, \(H_{0} \colon \beta_{1} = \beta_{2} = \beta_{3}\) = 0 against the alternative \(H_{A} \colon \) at least one of the \(\beta_{i}\) is not 0)

(Note the correct specification of the alternative hypotheses for the last two situations.)

Sugar beets study

A group of researchers was interested in studying the effects of three different growth regulators (treat, denoted 1, 2, and 3) on the yield of sugar beets (y = yield, in pounds). They planned to plant the beets in 30 different plots and then randomly treat 10 plots with the first growth regulator, 10 plots with the second growth regulator, and 10 plots with the third growth regulator. One problem, though, is that the amount of available nitrogen in the 30 different plots varies naturally, thereby giving a potentially unfair advantage to plots with higher levels of available nitrogen. Therefore, the researchers also measured and recorded the available nitrogen (\(x_{1}\) = nit, in pounds/acre) in each plot. They are interested in comparing the mean yields of sugar beets subjected to the different growth regulators after taking into account the available nitrogen. The Sugar Beets dataset contains the data from the researchers' experiment.

Preliminary Work

The plot shows a similar positive linear trend within each treatment category, which suggests that it is reasonable to formulate a multiple regression model that would place three parallel lines through the data.

Because the qualitative variable treat distinguishes between the three treatment groups (1, 2, and 3), we need to create two indicator variables, \(x_{2}\) and \(x_{3}\), say, to fit a linear regression model to these data. The new indicator variables should be defined as follows:

treat \(x_2\) \(x_3\)
1 1 0
2 0 1
3 0 0

Use Minitab's Calc >> Make Indicator Variables command to create the new indicator variables in your worksheet.

Minitab creates an indicator variable for each treatment group but we can only use two, for treatment groups 1 and 2 in this case (treatment group 3 is the reference level in this case).

Then, if we assume the trend in the data can be summarized by this regression model:

\(y_{i} = \beta_{0}\) + \(\beta_{1}\)\(x_{1}\) + \(\beta_{2}\)\(x_{2}\) + \(\beta_{3}\)\(x_{3}\) + \(\epsilon_{i}\)

where \(x_{1}\) = nit and \(x_{2}\) and \(x_{3}\) are defined as above, what is the mean response function for plots receiving treatment 3? for plots receiving treatment 1? for plots receiving treatment 2? Are the three regression lines that arise from our formulated model parallel? What does the parameter \(\beta_{2}\) quantify? And, what does the parameter \(\beta_{3}\) quantify?

The fitted equation from Minitab is Yield = 84.99 + 1.3088 Nit - 2.43 \(x_{2}\) - 2.35 \(x_{3}\), which means that the equations for each treatment group are:

  • Group 1: Yield = 84.99 + 1.3088 Nit - 2.43(1) = 82.56 + 1.3088 Nit
  • Group 2: Yield = 84.99 + 1.3088 Nit - 2.35(1) = 82.64 + 1.3088 Nit
  • Group 3: Yield = 84.99 + 1.3088 Nit

The three estimated regression lines are parallel since they have the same slope, 1.3088.

The regression parameter for \(x_{2}\) represents the difference between the estimated intercept for treatment 1 and the estimated intercept for reference treatment 3.

The regression parameter for \(x_{3}\) represents the difference between the estimated intercept for treatment 2 and the estimated intercept for reference treatment 3.
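The indicator coding and the resulting group equations can be sketched in a few lines. The function names are hypothetical; the coefficients are the fitted values reported above.

```python
def indicators(treat):
    """Return (x2, x3) for a treatment label; treatment 3 is the reference."""
    if treat not in (1, 2, 3):
        raise ValueError("treat must be 1, 2, or 3")
    x2 = 1 if treat == 1 else 0
    x3 = 1 if treat == 2 else 0
    return x2, x3


def fitted_yield(nit, treat):
    """Fitted yield from the reported equation for a given nitrogen level."""
    x2, x3 = indicators(treat)
    return 84.99 + 1.3088 * nit - 2.43 * x2 - 2.35 * x3


for treat in (1, 2, 3):
    intercept = fitted_yield(0.0, treat)
    slope = fitted_yield(1.0, treat) - fitted_yield(0.0, treat)
    print(f"treatment {treat}: intercept = {intercept:.2f}, slope = {slope:.4f}")
```

Every treatment shares the same slope, so the fitted lines differ only in their intercepts.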

Testing whether all of the slope parameters are 0

\(H_0 \colon \beta_1 = \beta_2 = \beta_3 = 0\) against the alternative \(H_A \colon \) at least one of the \(\beta_i\) is not 0.

\(F=\dfrac{SSR(X_1,X_2,X_3)\div3}{SSE(X_1,X_2,X_3)\div(n-4)}=\dfrac{MSR(X_1,X_2,X_3)}{MSE(X_1,X_2,X_3)}\)

\(F = \dfrac{\frac{16039.5}{3}}{\frac{1078.0}{30-4}} = \dfrac{5346.5}{41.46} = 128.95\)

Since the p -value for this F -statistic is reported as 0.000, we reject \(H_{0}\) in favor of \(H_{A}\) and conclude that at least one of the slope parameters is not zero, i.e., the regression model containing at least one predictor is useful in predicting the size of sugar beet yield.
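The arithmetic for the overall F-statistic can be written as a small helper. This is a sketch with a hypothetical function name; `n_params` counts the intercept plus the slopes.

```python
def overall_f(ssr, sse, n, n_params):
    """Overall F-statistic for testing that all slope parameters are 0."""
    msr = ssr / (n_params - 1)   # regression mean square
    mse = sse / (n - n_params)   # error mean square
    return msr / mse


f_beets = overall_f(ssr=16039.5, sse=1078.0, n=30, n_params=4)
print(f"F = {f_beets:.2f}")
```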

Tests for whether one slope parameter is 0

\(H_0 \colon \beta_1= 0\) against the alternative \(H_A \colon \beta_1 \ne 0\)

t -statistic = 19.60, p -value = 0.000, so we reject \(H_{0}\) in favor of \(H_{A}\) and conclude that the slope parameter for \(x_{1}\) = nit is not zero, i.e., sugar beet yield is significantly linearly related to the available nitrogen (controlling for treatment).

\(F=\dfrac{SSR(X_1|X_2,X_3)\div1}{SSE(X_1,X_2,X_3)\div(n-4)}=\dfrac{MSR(X_1|X_2,X_3)}{MSE(X_1,X_2,X_3)}\)

Use the Minitab output to calculate the value of this F statistic. Does the value you obtain equal \(t^{2}\), the square of the t -statistic as we might expect?

\(F = \dfrac{\frac{15934.5}{1}}{\frac{1078.0}{30-4}} = \dfrac{15934.5}{41.46} = 384.32\), which agrees with \(t^{2} = 19.60^{2} = 384.16\) up to round-off error.

Because \(t^{2}\) will equal the partial F -statistic whenever you test for whether one slope parameter is 0, it makes sense to just use the t -statistic and P -value that Minitab displays as a default. But, note that we've just learned something new about the meaning of the t -test in the multiple regression setting. It tests for the ("marginal") significance of the \(x_{1}\) predictor after \(x_{2}\) and \(x_{3}\) have already been taken into account.

Tests for whether a subset of the slope parameters is 0

\(H_0 \colon \beta_2=\beta_3= 0\) against the alternative \(H_A \colon \beta_2 \ne 0\) or \(\beta_3 \ne 0\) or both \(\ne 0\).

\(F=\dfrac{SSR(X_2,X_3|X_1)\div2}{SSE(X_1,X_2,X_3)\div(n-4)}=\dfrac{MSR(X_2,X_3|X_1)}{MSE(X_1,X_2,X_3)}\)

\(F = \dfrac{\frac{10.4+27.5}{2}}{\frac{1078.0}{30-4}} = \dfrac{18.95}{41.46} = 0.46\).

F distribution with 2 DF in Numerator and 26 DF in denominator

x P ( X ≤ x )
0.46 0.363677

p-value \(= 1-0.363677 = 0.636\), so we fail to reject \(H_{0}\) and conclude that we cannot rule out \(\beta_2 = \beta_3 = 0\), i.e., there is no significant difference in the mean yields of sugar beets subjected to the different growth regulators after taking into account the available nitrogen.
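The subset test can be reproduced as follows. This is a sketch that uses the closed-form upper tail of the F distribution, which is valid only because the numerator here has 2 degrees of freedom; variable names are ours.

```python
n, n_params = 30, 4
df_error = n - n_params                  # 26 denominator degrees of freedom
mse = 1078.0 / df_error

# Sequential sums of squares for x2 and x3, each with 1 degree of freedom
f_subset = ((10.4 + 27.5) / 2) / mse

# P(F > f) for 2 numerator df: (1 + 2 f / df_error) ** (-df_error / 2)
p_value = (1.0 + 2.0 * f_subset / df_error) ** (-df_error / 2.0)

print(f"F = {f_subset:.3f}")
print(f"P-value = {p_value:.3f}")
print("reject H0 at 0.05" if p_value < 0.05 else "fail to reject H0 at 0.05")
```

Because the code keeps the unrounded F-statistic rather than 0.46, the P-value it prints differs slightly from 0.636 in the third decimal place; the conclusion is unchanged.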

Note that the sequential mean square due to regression, MSR(\(X_{2}\),\(X_{3}\)|\(X_{1}\)), is obtained by dividing the sequential sum of square by its degrees of freedom (2, in this case, since two additional predictors \(X_{2}\) and \(X_{3}\) are considered). Use the Minitab output to calculate the value of this F statistic, and use Minitab to get the associated P -value. Answer the researcher's question at the \(\alpha= 0.05\) level.


Hypothesis Test for Regression Slope

This lesson describes how to conduct a hypothesis test to determine whether there is a significant linear relationship between an independent variable X and a dependent variable Y .

The test focuses on the slope of the regression line

\(Y = \beta_{0} + \beta_{1}X\)

where \(\beta_{0}\) is a constant, \(\beta_{1}\) is the slope (also called the regression coefficient), X is the value of the independent variable, and Y is the value of the dependent variable.

If we find that the slope of the regression line is significantly different from zero, we will conclude that there is a significant relationship between the independent and dependent variables.

Test Requirements

The approach described in this lesson is valid whenever the standard requirements for simple linear regression are met.

  • The dependent variable Y has a linear relationship to the independent variable X .
  • For each value of X, the probability distribution of Y has the same standard deviation σ.
  • The Y values are independent.
  • The Y values are roughly normally distributed (i.e., symmetric and unimodal ). A little skewness is ok if the sample size is large.

The test procedure consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

If there is a significant linear relationship between the independent variable X and the dependent variable Y , the slope will not equal zero.

\(H_{0} \colon \beta_{1} = 0\)

\(H_{a} \colon \beta_{1} \ne 0\)

The null hypothesis states that the slope is equal to zero, and the alternative hypothesis states that the slope is not equal to zero.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to decide whether to reject the null hypothesis. The plan should specify the following elements.

  • Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
  • Test method. Use a linear regression t-test (described in the next section) to determine whether the slope of the regression line differs significantly from zero.

Analyze Sample Data

Using sample data, find the standard error of the slope, the slope of the regression line, the degrees of freedom, the test statistic, and the P-value associated with the test statistic. The approach described in this section is illustrated in the sample problem at the end of this lesson.

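  • Standard error. Most statistics software packages report the standard error of the slope. In the hypothetical output below, the standard error of the slope is equal to 20.

Predictor   Coef   SE Coef   T      P
Constant    76     30        2.53   0.01
X           35     20        1.75   0.04

To compute the standard error of the slope by hand, use

\(SE = s_{b_{1}} = \dfrac{\sqrt{\sum (y_{i} - \hat{y}_{i})^{2} / (n - 2)}}{\sqrt{\sum (x_{i} - \bar{x})^{2}}}\)

where \(y_{i}\) is the observed value of the dependent variable, \(\hat{y}_{i}\) is its fitted value, \(x_{i}\) is the observed value of the independent variable, \(\bar{x}\) is the mean of the independent variable, and n is the number of observations.

  • Slope. Like the standard error, the slope of the regression line will be provided by most statistics software packages. In the hypothetical output above, the slope is equal to 35.

  • Degrees of freedom. For simple linear regression, the degrees of freedom (DF) equal n - 2, where n is the number of observations in the sample.

  • Test statistic. The test statistic is a t statistic, computed as

\(t = b_{1} / SE\)

where \(b_{1}\) is the slope of the sample regression line and SE is the standard error of the slope.

  • P-value. The P-value is the probability of observing a sample statistic at least as extreme as the test statistic if the null hypothesis is true. Since the test statistic is a t statistic, use the t Distribution Calculator to assess the probability associated with the test statistic. Use the degrees of freedom computed above.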
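The quantities listed above can be computed by hand from raw data. The sketch below applies the standard-error formula to a tiny made-up data set; the data and the function name `slope_t_test` are purely illustrative and do not come from this lesson.

```python
import math


def slope_t_test(x, y):
    """Return (slope, standard error, degrees of freedom, t statistic)
    for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx               # least-squares slope
    b0 = y_bar - b1 * x_bar      # least-squares intercept

    # Sum of squared residuals, then SE = sqrt(SSE / (n - 2)) / sqrt(Sxx)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)

    return b1, se, n - 2, b1 / se


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b1, se, df, t_stat = slope_t_test(x, y)
print(f"slope = {b1:.3f}, SE = {se:.4f}, DF = {df}, t = {t_stat:.3f}")
```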

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level , and rejecting the null hypothesis when the P-value is less than the significance level.

Test Your Understanding

The local utility company surveys 101 randomly selected customers. For each survey participant, the company collects the following: annual electric bill (in dollars) and home size (in square feet). Output from a regression analysis appears below.

Annual bill = 0.55 * Home size + 15

Predictor Coef SE Coef T P
Constant 15 3 5.0 0.00
Home size 0.55 0.24 2.29 0.01

Is there a significant linear relationship between annual bill and home size? Use a 0.05 level of significance.

The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

  • State the hypotheses.

\(H_{0}\): The slope of the regression line is equal to zero.

\(H_{a}\): The slope of the regression line is not equal to zero.

  • Formulate an analysis plan . For this analysis, the significance level is 0.05. Using sample data, we will conduct a linear regression t-test to determine whether the slope of the regression line differs significantly from zero.

  • Analyze sample data. We get the slope (\(b_{1}\)) and the standard error (SE) from the regression output.

\(b_{1} = 0.55\)       SE = 0.24

We compute the degrees of freedom and the t statistic, using the following equations.

DF = n - 2 = 101 - 2 = 99

\(t = b_{1} / SE = 0.55 / 0.24 = 2.29\)

where DF is the degrees of freedom, n is the number of observations in the sample, \(b_{1}\) is the slope of the regression line, and SE is the standard error of the slope.

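  • Interpret results. Because the alternative hypothesis is two-sided, the P-value is the probability that a t statistic with 99 degrees of freedom is more extreme than 2.29 in either direction, which is 0.0242. Since the P-value (0.0242) is less than the significance level (0.05), we reject the null hypothesis and conclude that there is a significant linear relationship between annual bill and home size.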
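The steps of this solution can be sketched in code. Since the Python standard library has no t distribution, the sketch below integrates the t density numerically with Simpson's rule; this is our own stand-in for a t distribution calculator, not the method used in the lesson, and the function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t_stat, df, n_intervals=2000):
    """Two-tailed P-value, integrating the density on [0, |t|] with
    Simpson's rule (n_intervals must be even)."""
    upper = abs(t_stat)
    h = upper / n_intervals
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, n_intervals):
        total += (4 if i % 2 == 1 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3          # P(0 < T < |t|)
    return 2 * (0.5 - central_half)


n = 101
b1, se = 0.55, 0.24
df = n - 2
t_stat = b1 / se
p_value = two_tailed_p(t_stat, df)

print(f"DF = {df}, t = {t_stat:.2f}, P-value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

The code uses the unrounded t statistic, so its P-value can differ from 0.0242 in the last digit or two.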


Choosing the Right Statistical Test | Types & Examples

Published on January 28, 2020 by Rebecca Bevans . Revised on June 22, 2023.

Statistical tests are used in hypothesis testing . They can be used to:

  • determine whether a predictor variable has a statistically significant relationship with an outcome variable.
  • estimate the difference between two or more groups.

Statistical tests assume a null hypothesis of no relationship or no difference between groups. Then they determine whether the observed data fall outside of the range of values predicted by the null hypothesis.

If you already know what types of variables you’re dealing with, you can use the flowchart to choose the right statistical test for your data.

Statistical tests flowchart

Table of contents

  • What does a statistical test do?
  • When to perform a statistical test
  • Choosing a parametric test: regression, comparison, or correlation
  • Choosing a nonparametric test
  • Flowchart: choosing a statistical test
  • Frequently asked questions about statistical tests

Statistical tests work by calculating a test statistic – a number that describes how much the relationship between variables in your test differs from the null hypothesis of no relationship.

It then calculates a p value (probability value). The p -value estimates how likely it is that you would see the difference described by the test statistic if the null hypothesis of no relationship were true.

If the value of the test statistic is more extreme than the critical value expected under the null hypothesis (equivalently, if the p value falls below your chosen significance level), then you can infer a statistically significant relationship between the predictor and outcome variables.

If the value of the test statistic is less extreme than that critical value, then you cannot infer a statistically significant relationship between the predictor and outcome variables.


You can perform statistical tests on data that have been collected in a statistically valid manner – either through an experiment , or through observations made using probability sampling methods .

For a statistical test to be valid , your sample size needs to be large enough to approximate the true distribution of the population being studied.

To determine which statistical test to use, you need to know:

  • whether your data meets certain assumptions.
  • the types of variables that you’re dealing with.

Statistical assumptions

Statistical tests make some common assumptions about the data they are testing:

  • Independence of observations (a.k.a. no autocorrelation): The observations/variables you include in your test are not related (for example, multiple measurements of a single test subject are not independent, while measurements of multiple different test subjects are independent).
  • Homogeneity of variance : the variance within each group being compared is similar among all groups. If one group has much more variation than others, it will limit the test’s effectiveness.
  • Normality of data : the data follows a normal distribution (a.k.a. a bell curve). This assumption applies only to quantitative data .

If your data do not meet the assumptions of normality or homogeneity of variance, you may be able to perform a nonparametric statistical test , which allows you to make comparisons without any assumptions about the data distribution.

If your data do not meet the assumption of independence of observations, you may be able to use a test that accounts for structure in your data (repeated-measures tests or tests that include blocking variables).

Types of variables

The types of variables you have usually determine what type of statistical test you can use.

Quantitative variables represent amounts of things (e.g. the number of trees in a forest). Types of quantitative variables include:

  • Continuous (aka ratio variables): represent measures and can usually be divided into units smaller than one (e.g. 0.75 grams).
  • Discrete (aka integer variables): represent counts and usually can’t be divided into units smaller than one (e.g. 1 tree).

Categorical variables represent groupings of things (e.g. the different tree species in a forest). Types of categorical variables include:

  • Ordinal : represent data with an order (e.g. rankings).
  • Nominal : represent group names (e.g. brands or species names).
  • Binary : represent data with a yes/no or 1/0 outcome (e.g. win or lose).

Choose the test that fits the types of predictor and outcome variables you have collected (if you are doing an experiment , these are the independent and dependent variables ). Consult the tables below to see which test best matches your variables.

Parametric tests usually have stricter requirements than nonparametric tests, and are able to make stronger inferences from the data. They can only be conducted with data that adheres to the common assumptions of statistical tests.

The most common types of parametric test include regression tests, comparison tests, and correlation tests.

Regression tests

Regression tests look for cause-and-effect relationships . They can be used to estimate the effect of one or more continuous variables on another variable.

Test | Research question example
Simple linear regression | What is the effect of income on longevity?
Multiple linear regression | What is the effect of income and minutes of exercise per day on longevity?
Logistic regression | What is the effect of drug dosage on the survival of a test subject?

Comparison tests

Comparison tests look for differences among group means . They can be used to test the effect of a categorical variable on the mean value of some other characteristic.

T-tests are used when comparing the means of precisely two groups (e.g., the average heights of men and women). ANOVA and MANOVA tests are used when comparing the means of more than two groups (e.g., the average heights of children, teenagers, and adults).

Test | Research question example
Paired t-test | What is the effect of two different test prep programs on the average exam scores for students from the same class?
Independent t-test | What is the difference in average exam scores for students from two different schools?
ANOVA | What is the difference in average pain levels among post-surgical patients given three different painkillers?
MANOVA | What is the effect of flower species on petal length, petal width, and stem length?

Correlation tests

Correlation tests check whether variables are related without hypothesizing a cause-and-effect relationship.

These can be used to test whether two variables you want to use in (for example) a multiple regression test are correlated with each other.

Test | Research question example
Pearson’s r | How are latitude and temperature related?

Non-parametric tests don’t make as many assumptions about the data, and are useful when one or more of the common statistical assumptions are violated. However, the inferences they make aren’t as strong as with parametric tests.

Test | Use in place of…
Spearman’s r | Pearson’s r
Sign test | One-sample t-test
Kruskal–Wallis H | ANOVA
ANOSIM | MANOVA
Wilcoxon Rank-Sum test | Independent t-test
Wilcoxon Signed-rank test | Paired t-test


This flowchart helps you choose among parametric tests. For nonparametric alternatives, check the table above.

Choosing the right statistical test


Statistical tests commonly assume that:

  • the data are normally distributed
  • the groups that are being compared have similar variance
  • the data are independent

If your data do not meet these assumptions, you might still be able to use a nonparametric statistical test, which has fewer requirements but also makes weaker inferences.

A test statistic is a number calculated by a  statistical test . It describes how far your observed data is from the  null hypothesis  of no relationship between  variables or no difference among sample groups.

The test statistic tells you how different two or more groups are from the overall population mean , or how different a linear slope is from the slope predicted by a null hypothesis . Different test statistics are used in different statistical tests.

Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test . Significance is usually denoted by a p -value , or probability value.

Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that the data is likely to occur less than 5% of the time under the null hypothesis .

When the p -value falls below the chosen alpha value, then we say the result of the test is statistically significant.

Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).

Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).

You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results .

Discrete and continuous variables are two types of quantitative variables :

  • Discrete variables represent counts (e.g. the number of objects in a collection).
  • Continuous variables represent measurable amounts (e.g. water volume or weight).

Cite this Scribbr article


Bevans, R. (2023, June 22). Choosing the Right Statistical Test | Types & Examples. Scribbr. Retrieved September 16, 2024, from https://www.scribbr.com/statistics/statistical-tests/



Regression Analysis – Methods, Types and Examples


Regression Analysis

Regression analysis is a set of statistical processes for estimating the relationships among variables . It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (or ‘predictors’).

Regression Analysis Methodology

Here is a general methodology for performing regression analysis:

  • Define the research question: Clearly state the research question or hypothesis you want to investigate. Identify the dependent variable (also called the response variable or outcome variable) and the independent variables (also called predictor variables or explanatory variables) that you believe are related to the dependent variable.
  • Collect data: Gather the data for the dependent variable and independent variables. Ensure that the data is relevant, accurate, and representative of the population or phenomenon you are studying.
  • Explore the data: Perform exploratory data analysis to understand the characteristics of the data, identify any missing values or outliers, and assess the relationships between variables through scatter plots, histograms, or summary statistics.
  • Choose the regression model: Select an appropriate regression model based on the nature of the variables and the research question. Common regression models include linear regression, multiple regression, logistic regression, polynomial regression, and time series regression, among others.
  • Assess assumptions: Check the assumptions of the regression model. Some common assumptions include linearity (the relationship between variables is linear), independence of errors, homoscedasticity (constant variance of errors), and normality of errors. Violation of these assumptions may require additional steps or alternative models.
  • Estimate the model: Use a suitable method to estimate the parameters of the regression model. The most common method is ordinary least squares (OLS), which minimizes the sum of squared differences between the observed and predicted values of the dependent variable.
  • Interpret the results: Analyze the estimated coefficients, p-values, confidence intervals, and goodness-of-fit measures (e.g., R-squared) to interpret the results. Determine the significance and direction of the relationships between the independent variables and the dependent variable.
  • Evaluate model performance: Assess the overall performance of the regression model using appropriate measures, such as R-squared, adjusted R-squared, and root mean squared error (RMSE). These measures indicate how well the model fits the data and how much of the variation in the dependent variable is explained by the independent variables.
  • Test assumptions and diagnose problems: Check the residuals (the differences between observed and predicted values) for any patterns or deviations from assumptions. Conduct diagnostic tests, such as examining residual plots, testing for multicollinearity among independent variables, and assessing heteroscedasticity or autocorrelation, if applicable.
  • Make predictions and draw conclusions: Once you have a satisfactory model, use it to make predictions on new or unseen data. Draw conclusions based on the results of the analysis, considering the limitations and potential implications of the findings.

Types of Regression Analysis

Types of Regression Analysis are as follows:

Linear Regression

Linear regression is the most basic and widely used form of regression analysis. It models the linear relationship between a dependent variable and one or more independent variables. The goal is to find the best-fitting line that minimizes the sum of squared differences between observed and predicted values.

Multiple Regression

Multiple regression extends linear regression by incorporating two or more independent variables to predict the dependent variable. It allows for examining the simultaneous effects of multiple predictors on the outcome variable.

Polynomial Regression

Polynomial regression models non-linear relationships between variables by adding polynomial terms (e.g., squared or cubic terms) to the regression equation. It can capture curved or nonlinear patterns in the data.

Logistic Regression

Logistic regression is used when the dependent variable is binary or categorical. It models the probability of the occurrence of a certain event or outcome based on the independent variables. Logistic regression estimates the coefficients using the logistic function, which transforms the linear combination of predictors into a probability.
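The transformation from a linear combination of predictors to a probability can be sketched as follows. The coefficient values are hypothetical and chosen only for illustration; this shows the prediction step, not how the coefficients are estimated.

```python
import math


def logistic(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_probability(intercept, coefficients, predictors):
    """Probability of the event given predictor values."""
    z = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return logistic(z)


# Hypothetical model with one predictor: intercept -1.0, coefficient 0.5
for x in (0.0, 2.0, 6.0):
    p = predicted_probability(-1.0, [0.5], [x])
    print(f"x = {x}: predicted probability = {p:.3f}")
```

A linear combination of 0 maps to a probability of 0.5, and larger positive values push the probability toward 1.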

Ridge Regression and Lasso Regression

Ridge regression and Lasso regression are techniques used for addressing multicollinearity (high correlation between independent variables) and variable selection. Both methods introduce a penalty term to the regression equation to shrink or eliminate less important variables. Ridge regression uses L2 regularization, while Lasso regression uses L1 regularization.
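The two penalty terms can be written out explicitly. The notation below is an assumption made for illustration: N observations, n predictors, and a tuning parameter \(\lambda \ge 0\) that controls the strength of the penalty. In their usual formulation, ridge regression minimizes

$$\sum_{i=1}^{N}\Big(y_{i} - \beta_{0} - \sum_{j=1}^{n}\beta_{j}x_{ij}\Big)^{2} + \lambda\sum_{j=1}^{n}\beta_{j}^{2}$$

while Lasso regression minimizes

$$\sum_{i=1}^{N}\Big(y_{i} - \beta_{0} - \sum_{j=1}^{n}\beta_{j}x_{ij}\Big)^{2} + \lambda\sum_{j=1}^{n}\left|\beta_{j}\right|$$

With \(\lambda = 0\) both reduce to ordinary least squares. The absolute-value (L1) penalty can set some coefficients exactly to zero, which is why Lasso is associated with variable selection, whereas the squared (L2) penalty shrinks coefficients toward zero without eliminating them.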

Time Series Regression

Time series regression analyzes the relationship between a dependent variable and independent variables when the data is collected over time. It accounts for autocorrelation and trends in the data and is used in forecasting and studying temporal relationships.

Nonlinear Regression

Nonlinear regression models are used when the relationship between the dependent variable and independent variables is not linear. These models can take various functional forms and require estimation techniques different from those used in linear regression.

Poisson Regression

Poisson regression is employed when the dependent variable represents count data. It models the relationship between the independent variables and the expected count, assuming a Poisson distribution for the dependent variable.

Generalized Linear Models (GLM)

GLMs are a flexible class of regression models that extend the linear regression framework to handle different types of dependent variables, including binary, count, and continuous variables. GLMs incorporate various probability distributions and link functions.
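
One way to picture the role of the link function is to apply different inverse links to the same linear predictor. The sketch below is illustrative only: the coefficients are hypothetical, the names `linear_predictor`, `INVERSE_LINKS`, and `expected_response` are invented for this example, and no model fitting is performed.

```python
import math

def linear_predictor(coeffs, features):
    """eta = b0 + b1*x1 + ... + bn*xn"""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], features))

# Inverse link functions map the linear predictor to the expected response.
INVERSE_LINKS = {
    "identity": lambda eta: eta,                         # linear regression (continuous outcome)
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),   # logistic regression (binary outcome)
    "log": lambda eta: math.exp(eta),                    # Poisson regression (count outcome)
}

def expected_response(coeffs, features, link):
    return INVERSE_LINKS[link](linear_predictor(coeffs, features))

coeffs = [0.0, 0.5]   # hypothetical intercept and slope
features = [2.0]      # one observation with a single predictor, so eta = 1.0

mean_identity = expected_response(coeffs, features, "identity")
mean_logit = expected_response(coeffs, features, "logit")
mean_log = expected_response(coeffs, features, "log")
print(mean_identity, mean_logit, mean_log)
```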

Regression Analysis Formulas

Regression analysis involves estimating the parameters of a regression model to describe the relationship between the dependent variable (Y) and one or more independent variables (X). Here are the basic formulas for linear regression, multiple regression, and logistic regression:

Linear Regression:

Simple Linear Regression Model: Y = β0 + β1X + ε

Multiple Linear Regression Model: Y = β0 + β1X1 + β2X2 + … + βnXn + ε

In both formulas:

  • Y represents the dependent variable (response variable).
  • X represents the independent variable(s) (predictor variable(s)).
  • β0, β1, β2, …, βn are the regression coefficients or parameters that need to be estimated.
  • ε represents the error term (the part of Y not explained by the model); its sample counterpart, the residual, is the difference between the observed and predicted values.

Multiple Regression:

Multiple regression extends the concept of simple linear regression by including two or more independent variables.

Multiple Regression Model: Y = β0 + β1X1 + β2X2 + … + βnXn + ε

This is the same multiple linear regression model given above; each coefficient βi measures the expected change in Y for a one-unit change in Xi, holding the other independent variables constant.

Logistic Regression:

Logistic regression is used when the dependent variable is binary or categorical. The logistic regression model applies a logistic or sigmoid function to the linear combination of the independent variables.

Logistic Regression Model: p = 1 / (1 + e^-(β0 + β1X1 + β2X2 + … + βnXn))

In the formula:

  • p represents the probability of the event occurring (e.g., the probability of success or belonging to a certain category).
  • X1, X2, …, Xn represent the independent variables.
  • e is the base of the natural logarithm.

The logistic function ensures that the predicted probabilities lie between 0 and 1, allowing for binary classification.
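
A short derivation, using the same symbols as the formula above with z as shorthand for the linear combination, shows why the coefficients are usually interpreted on the log-odds scale:

```latex
\begin{aligned}
z &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n, \qquad p = \frac{1}{1 + e^{-z}} \\
1 - p &= \frac{e^{-z}}{1 + e^{-z}} \\
\frac{p}{1 - p} &= e^{z} \\
\ln\!\left(\frac{p}{1 - p}\right) &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n
\end{aligned}
```

In other words, the model is linear in the log-odds, so a one-unit increase in X1 multiplies the odds p / (1 − p) by e^β1 when the other variables are held constant.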

Regression Analysis Examples

Regression Analysis Examples are as follows:

  • Stock Market Prediction: Regression analysis can be used to predict stock prices based on various factors such as historical prices, trading volume, news sentiment, and economic indicators. Traders and investors can use this analysis to make informed decisions about buying or selling stocks.
  • Demand Forecasting: In retail and e-commerce, regression analysis can help forecast demand for products. By analyzing historical sales data along with real-time data such as website traffic, promotional activities, and market trends, businesses can adjust their inventory levels and production schedules to meet customer demand more effectively.
  • Energy Load Forecasting: Utility companies often use real-time regression analysis to forecast electricity demand. By analyzing historical energy consumption data, weather conditions, and other relevant factors, they can predict future energy loads. This information helps them optimize power generation and distribution, ensuring a stable and efficient energy supply.
  • Online Advertising Performance: Regression analysis can be used to assess the performance of online advertising campaigns. By analyzing real-time data on ad impressions, click-through rates, conversion rates, and other metrics, advertisers can adjust their targeting, messaging, and ad placement strategies to maximize their return on investment.
  • Predictive Maintenance: Regression analysis can be applied to predict equipment failures or maintenance needs. By continuously monitoring sensor data from machines or vehicles, regression models can identify patterns or anomalies that indicate potential failures. This enables proactive maintenance, reducing downtime and optimizing maintenance schedules.
  • Financial Risk Assessment: Real-time regression analysis can help financial institutions assess the risk associated with lending or investment decisions. By analyzing real-time data on factors such as borrower financials, market conditions, and macroeconomic indicators, regression models can estimate the likelihood of default or assess the risk-return tradeoff for investment portfolios.

Importance of Regression Analysis

Importance of Regression Analysis is as follows:

  • Relationship Identification: Regression analysis helps in identifying and quantifying the relationship between a dependent variable and one or more independent variables. It allows us to determine how changes in independent variables impact the dependent variable. This information is crucial for decision-making, planning, and forecasting.
  • Prediction and Forecasting: Regression analysis enables us to make predictions and forecasts based on the relationships identified. By estimating the values of the dependent variable using known values of independent variables, regression models can provide valuable insights into future outcomes. This is particularly useful in business, economics, finance, and other fields where forecasting is vital for planning and strategy development.
  • Causality Assessment: While correlation does not imply causation, regression analysis provides a framework for assessing causality by considering the direction and strength of the relationship between variables. It allows researchers to control for other factors and assess the impact of a specific independent variable on the dependent variable. This helps in determining the causal effect and identifying significant factors that influence outcomes.
  • Model Building and Variable Selection: Regression analysis aids in model building by determining the most appropriate functional form of the relationship between variables. It helps researchers select relevant independent variables and eliminate irrelevant ones, reducing complexity and improving model accuracy. This process is crucial for creating robust and interpretable models.
  • Hypothesis Testing: Regression analysis provides a statistical framework for hypothesis testing. Researchers can test the significance of individual coefficients, assess the overall model fit, and determine if the relationship between variables is statistically significant. This allows for rigorous analysis and validation of research hypotheses.
  • Policy Evaluation and Decision-Making: Regression analysis plays a vital role in policy evaluation and decision-making processes. By analyzing historical data, researchers can evaluate the effectiveness of policy interventions and identify the key factors contributing to certain outcomes. This information helps policymakers make informed decisions, allocate resources effectively, and optimize policy implementation.
  • Risk Assessment and Control: Regression analysis can be used for risk assessment and control purposes. By analyzing historical data, organizations can identify risk factors and develop models that predict the likelihood of certain outcomes, such as defaults, accidents, or failures. This enables proactive risk management, allowing organizations to take preventive measures and mitigate potential risks.
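
The hypothesis-testing point above can be sketched for a simple regression as follows: estimate the slope, compute its standard error, and compare the t-statistic with a critical value. The data are hypothetical, the function name `slope_t_test` is invented for this example, and the critical value 2.306 (two-tailed, 5% significance, 8 degrees of freedom) is hard-coded rather than computed.

```python
import math

def slope_t_test(x, y):
    """Return (slope, standard_error, t_statistic) for H0: slope = 0."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = sse / (n - 2)
    standard_error = math.sqrt(residual_variance / sxx)
    return slope, standard_error, slope / standard_error

# Hypothetical data: y is roughly 2x with alternating +1 / -1 disturbances
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3, 3, 7, 7, 11, 11, 15, 15, 19, 19]

slope, standard_error, t_stat = slope_t_test(x, y)
T_CRITICAL = 2.306  # assumed: two-tailed 5% critical value for df = 10 - 2 = 8
reject_null = abs(t_stat) > T_CRITICAL
print(slope, standard_error, t_stat, reject_null)
```

Here the absolute t-statistic far exceeds the critical value, so the null hypothesis of a zero slope would be rejected at the 5% level for this made-up sample.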

When to Use Regression Analysis

  • Prediction: Regression analysis is often employed to predict the value of the dependent variable based on the values of independent variables. For example, you might use regression to predict sales based on advertising expenditure, or to predict a student’s academic performance based on variables like study time, attendance, and previous grades.
  • Relationship analysis: Regression can help determine the strength and direction of the relationship between variables. It can be used to examine whether there is a linear association between variables, identify which independent variables have a significant impact on the dependent variable, and quantify the magnitude of those effects.
  • Causal inference: Regression analysis can be used to explore cause-and-effect relationships by controlling for other variables. For example, in a medical study, you might use regression to determine the impact of a specific treatment while accounting for other factors like age, gender, and lifestyle.
  • Forecasting: Regression models can be utilized to forecast future trends or outcomes. By fitting a regression model to historical data, you can make predictions about future values of the dependent variable based on changes in the independent variables.
  • Model evaluation: Regression analysis can be used to evaluate the performance of a model or test the significance of variables. You can assess how well the model fits the data, determine if additional variables improve the model’s predictive power, or test the statistical significance of coefficients.
  • Data exploration: Regression analysis can help uncover patterns and insights in the data. By examining the relationships between variables, you can gain a deeper understanding of the data set and identify potential patterns, outliers, or influential observations.

Applications of Regression Analysis

Here are some common applications of regression analysis:

  • Economic Forecasting: Regression analysis is frequently employed in economics to forecast variables such as GDP growth, inflation rates, or stock market performance. By analyzing historical data and identifying the underlying relationships, economists can make predictions about future economic conditions.
  • Financial Analysis: Regression analysis plays a crucial role in financial analysis, such as predicting stock prices or evaluating the impact of financial factors on company performance. It helps analysts understand how variables like interest rates, company earnings, or market indices influence financial outcomes.
  • Marketing Research: Regression analysis helps marketers understand consumer behavior and make data-driven decisions. It can be used to predict sales based on advertising expenditures, pricing strategies, or demographic variables. Regression models provide insights into which marketing efforts are most effective and help optimize marketing campaigns.
  • Health Sciences: Regression analysis is extensively used in medical research and public health studies. It helps examine the relationship between risk factors and health outcomes, such as the impact of smoking on lung cancer or the relationship between diet and heart disease. Regression analysis also helps in predicting health outcomes based on various factors like age, genetic markers, or lifestyle choices.
  • Social Sciences: Regression analysis is widely used in social sciences like sociology, psychology, and education research. Researchers can investigate the impact of variables like income, education level, or social factors on various outcomes such as crime rates, academic performance, or job satisfaction.
  • Operations Research: Regression analysis is applied in operations research to optimize processes and improve efficiency. For example, it can be used to predict demand based on historical sales data, determine the factors influencing production output, or optimize supply chain logistics.
  • Environmental Studies: Regression analysis helps in understanding and predicting environmental phenomena. It can be used to analyze the impact of factors like temperature, pollution levels, or land use patterns on phenomena such as species diversity, water quality, or climate change.
  • Sports Analytics: Regression analysis is increasingly used in sports analytics to gain insights into player performance, team strategies, and game outcomes. It helps analyze the relationship between various factors like player statistics, coaching strategies, or environmental conditions and their impact on game outcomes.

Advantages and Disadvantages of Regression Analysis

Advantages of Regression Analysis:

  • Provides a quantitative measure of the relationship between variables
  • Helps in predicting and forecasting outcomes based on historical data
  • Identifies and measures the significance of independent variables on the dependent variable
  • Provides estimates of the coefficients that represent the strength and direction of the relationship between variables
  • Allows for hypothesis testing to determine the statistical significance of the relationship
  • Can handle both continuous and categorical variables
  • Offers a visual representation of the relationship through the use of scatter plots and regression lines
  • Provides insights into the marginal effects of independent variables on the dependent variable

Disadvantages of Regression Analysis:

  • Assumes a linear relationship between variables, which may not always hold true
  • Requires a large sample size to produce reliable results
  • Assumes no multicollinearity, meaning that independent variables should not be highly correlated with each other
  • Assumes the absence of outliers or influential data points
  • Can be sensitive to the inclusion or exclusion of certain variables, leading to different results
  • Assumes the independence of observations, which may not hold true in some cases
  • May not capture complex non-linear relationships between variables without appropriate transformations
  • Requires the assumption of homoscedasticity, meaning that the variance of errors is constant across all levels of the independent variables

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer

Hypothesis Testing and Regression Analysis

  • Mustapha Akinkunmi

Part of the book series: Synthesis Lectures on Engineering ((SLE))

In this chapter, we look at the different stages of data preparation involved in quantitative analysis. Understanding these processes will help us gather reliable data and reach a valid conclusion. We will discuss types of hypotheses and how they are stated mathematically. Furthermore, we shall discuss hypothesis testing with worked examples.

Author information

Authors and affiliations.

American University of Nigeria, Nigeria

Mustapha Akinkunmi

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Akinkunmi, M. (2018). Hypothesis Testing and Regression Analysis. In: Data Mining and Market Intelligence. Synthesis Lectures on Engineering. Springer, Cham. https://doi.org/10.1007/978-3-031-79390-5_5

DOI: https://doi.org/10.1007/978-3-031-79390-5_5

Publisher Name: Springer, Cham

Print ISBN: 978-3-031-79389-9

Online ISBN: 978-3-031-79390-5

eBook Packages: Synthesis Collection of Technology (R0), eBColl Synthesis Collection 8

  • Open access
  • Published: 16 September 2024

Leader prohibitive voice behavior and its effects on followers through leader identification and political skill

  • Xueqin Tian 1 ,
  • Heesun Chae   ORCID: orcid.org/0000-0003-4748-8448 2 &
  • Youngjoe Kim 2  

Humanities and Social Sciences Communications volume 11, Article number: 1219 (2024)

  • Business and management

Prohibitive voice behavior plays a crucial role in organizational effectiveness by signaling previously undetected issues. However, individuals paradoxically perceive this behavior as difficult due to its emphasis on identifying harmful factors, which may lead to misinterpretation and other undesirable social consequences. Based on social learning theory, this study aims to identify how the leader’s prohibitive voice behavior affects the follower’s prohibitive voice behavior. We employed a quantitative research design, collecting and analyzing questionnaires from 317 leader-follower dyads in 59 Chinese companies. To test our hypotheses, we used mediation and moderation analyses, hierarchical regression analysis, and the PROCESS macro. The results partially supported the research hypothesis, indicating that leader prohibitive voice behavior positively influenced follower prohibitive voice behavior. Additionally, leader identification mediated this relationship, whereas follower political skill moderated the mediated relationship. Notably, the mediation’s direction contradicted our hypothesis; the mediated effect through leader identification was pronounced when follower political skill was at lower levels. This study proposes that the leader’s role as a role model is crucial in motivating followers’ prohibitive voice behavior. It highlights the significance of considering followers’ political skills when they emulate their leader’s behavior. From both theoretical and practical perspectives, we discussed these findings and highlighted their implications for organizational behavior and leadership development.

Introduction

To survive and thrive in an increasingly competitive business environment, companies must predict, identify, and respond to potential risks and opportunities, foster creativity and innovation, and continuously explore novel ideas (Villaluz and Hechanova, 2019). Through direct experience and interactions with customers, colleagues, subordinates, and supervisors during work tasks, employees can promptly detect and respond to new opportunities, problems, inefficiencies, and organizational improvements (Crant, 2000; Maria Stock et al. 2017). To enhance organizational processes and effectiveness, organizations increasingly anticipate employees’ expression of ideas, suggestions, and concerns (Grant, 2013).

Voice behavior refers to the voluntary expression of constructive opinions, ideas, or concerns regarding work-related issues, and it is associated with positive organizational outcomes (Ng and Feldman, 2012). Based on the content’s focus, we categorize voice behavior into two main types: promotive voice behavior and prohibitive voice behavior. Promotive voice behavior involves proposing new suggestions and solutions and is frequently perceived positively. In contrast, prohibitive voice behavior involves identifying harmful factors and evokes negative emotions and defensiveness (Liang et al. 2012). Prohibitive speakers directly highlight existing problems, thereby identifying stakeholder failures (Fast et al. 2014). The Doctrine of the Mean holds a significant and esteemed position, especially within the Chinese organizational context. The culture values relationships and preserving one’s reputation, with cultural norms emphasizing collective harmony and high power distance (Wang et al. 2019). Individuals frequently refrain from posing challenging questions in public communication to avoid embarrassing others. Particularly when interacting with leaders, followers focus on the leaders’ words and expressions, consider their feelings, and convey their opinions subtly and tactfully. Therefore, prohibitive speakers frequently experience greater negative social consequences and interpersonal risks than promotive speakers (Liang et al. 2012; Lin and Johnson, 2015).

Moreover, promotive voice behavior requires substantial time and effort to develop innovative ideas aimed at identifying strategies for enhancing the status quo. In contrast, prohibitive voice behavior highlights harmful practices or events that may undermine the status quo, making it highly relevant for organizational improvement (Svendsen et al. 2018). Prohibitive voice behavior can improve team safety performance (Li et al. 2017) and promote team innovation (Liang et al. 2019). The lack of prohibitive voice behavior within organizations leads to the suppression of negative opinions, akin to the story of the emperor’s new clothes. Furthermore, it weakens organizations’ ability to correct mistakes, resulting in the collective phenomenon of organizational silence and the emergence of companies such as Theranos, which duped consumers and investors by falsifying blood tests for over a decade. Thus, although prohibitive voice behavior functions to avert potential problems and raise awareness regarding existing issues, individuals are reluctant to engage in it (MacMillan et al. 2020; Wei et al. 2015). Consequently, it is critical to formulate strategies that promote prohibitive voice behavior.

Previous studies have discovered that individual characteristics (e.g., extraversion, conscientiousness, core self-evaluation) (Aryee et al. 2017; Liu et al. 2014), contextual factors (e.g., social support, voice climate, leader openness) (Frazier and Bowler, 2015; Jada and Mukhopadhyay, 2019; Prince and Rao, 2022), and psychological variables (e.g., felt obligation, commitment, justice, psychological safety) (Chamberlin et al. 2017; Liang et al. 2012; Miao et al. 2020; Qi et al. 2023) significantly affect employee prohibitive voice behavior. Particularly, leaders frequently interact with workplace members, possess the authority to evaluate and make decisions (Morrison, 2011; Yukl, 2012), influence workplace norms regarding voice, and directly encourage or hinder employee behavior (Detert and Burris, 2007; Yang et al. 2021). Prior research has focused on fostering prohibitive voice behavior by utilizing positive leadership strategies. For example, studies have examined the effects on employees’ voice behavior of leadership behaviors such as ethical leadership, transformational leadership, inclusive leadership, and leader openness (Qi et al. 2023; Smith et al. 2017; Svendsen et al. 2018). However, a gap exists in the research regarding the specific leader behaviors that serve as role models and influence follower prohibitive voice behavior. This study addresses this gap by demonstrating how leaders can encourage prohibitive voice behavior among followers by embodying and modeling it themselves. Understanding the translation of leader prohibitive voice behavior into follower prohibitive voice behavior is essential because it involves risk even for leaders. This study explores the effect of leader prohibitive voice behavior, which is a potentially harmful aspect of leadership, on follower prohibitive voice behavior and the underlying mechanisms involved.

Social learning theory (SLT; Bandura, 1977) posits that individuals learn behavior through direct and vicarious experiences of observing others’ behavior and its consequences. However, the challenging nature of prohibitive voice behavior potentially limits its capacity to fully achieve behavioral imitation. The role model’s behavior is learned via a cognitive process where the observer’s characteristics and behavioral choices mutually influence each other (Grusec, 1992). Followers can learn leader prohibitive voice behavior by observing it, perceiving the organization’s contextual clues, and self-regulating based on their cognitive frame of reference.

One plausible explanation is that followers who positively perceive a leader’s prohibitive voice behavior are more likely to engage in the same behavior as their leader. Leader identification involves developing a cognitive and emotional connection with a leader, accepting the leader’s values, goals, and vision (Kark et al. 2003), and holding a favorable perception of the leader’s behavior. Followers identify with a leader when they perceive the leader as a role model; thus, they imitate the leader’s prohibitive voice behaviors. Therefore, leader identification may be an essential mediating factor between leader and follower prohibitive voice behaviors. Social learning is a cognitive process where the learner engages in observation, acquisition, and decision-making regarding the adoption of learned behavior (Grusec, 1992). This study treats leader identification as such a cognitive process and examines whether it mediates the relationship between leader and follower prohibitive voice behaviors.

Meanwhile, the effectiveness of leader identification and follower prohibitive voice may depend on individual differences (Morrison, 2011). Political skill is the capacity to influence others’ behavior in the workplace with the aim of achieving personal and organizational goals (Ferris et al. 2005). It enables individuals to effectively navigate complex social situations, including managing interpersonal relationships and influencing others to achieve desired goals (Chang et al. 2023). Political skill, when viewed as a moderating variable, can significantly influence the effectiveness of social learning mechanisms. It enhances the transmission of leader prohibitive voice behavior to followers, especially where the behavior’s complexity requires astute political maneuvering. Followers with high political skill possess a greater ability to comprehend the underlying causes of a leader’s prohibitive voice behavior and effectively integrate it into their behaviors, thereby enhancing the social learning mechanism. Therefore, follower political skill can moderate the relationship between leader identification, which arises from observing leader prohibitive voice behavior, and follower prohibitive voice behavior.

Drawing on social learning theory, this study aims to identify the effect of the leader’s prohibitive voice on the follower’s prohibitive voice. First, it explores the direct influence of a leader’s prohibitive voice behavior, as a role model or object of imitation, on follower prohibitive voice behavior. Second, it analyzes leader identification as a cognitive mechanism within the social learning theory framework and explores its mediating effect between leader and follower prohibitive voice. Third, it tests the moderated mediation effect of political skill as an individual trait in social learning. The overall theoretical framework is illustrated in Fig. 1.

Figure 1. Conceptual framework.

Literature review

Prohibitive voice behavior

Voice behavior is regarded as a form of extra-role behavior. It is not mere criticism, but rather a positive and challenging concept for improvement (Van Dyne and LePine, 1998). Liang et al. (2012) noted that voice behavior characteristics include the voluntary generation of new suggestions or ideas for organizations, as well as actions that inform problems and express concerns. They categorized voice behavior as either promotive or prohibitive. Promotive voice behavior presents new suggestions to emphasize proposals, whereas prohibitive voice behavior is past-oriented and aims at resolving existing problems (Morrison, 2011). Promotive voice emphasizes the expression of opportunities to enhance organizational functioning by implementing new things in novel ways in the future. Thus, individuals often perceive it as a good intention that can easily be interpreted positively. In contrast, prohibitive voice highlights existing harmful practices or events that may undermine the status quo (Liang et al. 2012).

Despite the shared desire for benign change motivating both voices, meta-analyses indicate significant differences in the antecedents of promotive and prohibitive voice behavior (Chamberlin et al. 2017). Compared with those who engage in promotive voice behaviors, employees who engage in prohibitive voice behaviors are more likely to take risks. Because such employees emphasize evading adverse outcomes and halting or preventing losses, they foster the perception that a prohibitive voice is critical and encourage defensive responses (Liang et al. 2012). A prohibitive voice generates negative emotions within organizations (Lin and Johnson, 2015) and damages interpersonal relationships (Wei et al. 2015). In such instances, the employees’ good intentions may not be easily recognized. Leaders typically give more negative evaluations (Chamberlin et al. 2017). Thus, employees often refrain from using prohibitive voices.

However, prohibitive voice behavior promotes greater adherence to safety measures and mitigates organizational development risks (Yang, 2020). In addition, in preventing organizations from experiencing unwarranted short-term failures, a prohibitive voice may be deemed more effective than a promotive voice. This is because the prohibitive voice highlights existing harmful problems and particularly helps organizations to prevent unnecessary failures (Liang et al. 2012). In contrast, promotive voice, which focuses on developing innovative ideas, can be time- and effort-intensive. Therefore, additional effort and attention are required to motivate and promote prohibitive voice behavior among employees.

Prior research has established that transformational leadership (Svendsen et al. 2018), ethical leadership (Jada and Mukhopadhyay, 2019), high LMX (MacMillan et al. 2020), and servant leadership (Song et al. 2022) are positively correlated with follower prohibitive voice. Positive leader behavior may heighten employees’ willingness to engage in risky discretionary behavior. This is because trust in robust positive leadership makes it safer to express concerns (Li et al. 2017; Liang et al. 2012), thereby facilitating interpersonal risk-taking. This study builds upon this discussion and examines leaders’ roles in reducing the burden of prohibitive voice behavior among followers and promoting its frequency.

Research hypotheses

Leader prohibitive voice behavior and follower prohibitive voice behavior

Social learning theory (Bandura, 1977) posits that human behavior can be learned by observing other people’s behavior without being directly rewarded or punished for specific behaviors. Social learning modeling comprises four elements: attention, retention, reproduction, and motivation (Bandura, 1977). Attention entails selecting a role model to observe and determining the observed behavior, whereas retention entails memorizing the observed behavior in a symbolic form. Reproduction involves converting symbolic images or language stored in memory into actual behavior. Finally, motivation entails inducing behavioral motivation to reinforce imitated behavior through observation. By observing and imitating, individuals can effectively learn without experiencing the potentially adverse consequences of direct learning, such as the punishment associated with trial and error (Manz and Sims, 1981).

A leader assumes the role of a significant figure for their followers in an organizational context due to the leader’s status, power, and influence (Detert and Burris, 2007 ; Brown and Treviño, 2014 ). Working interactions provide followers with ample opportunities to observe their leaders. Leader behavior can be perceived as an exemplar that sets the standards for desirable behavior within the organization. Thus, followers may interpret and imitate the leader’s prohibitive voice behavior as the organization’s expectations. Based on social learning theory, individuals tend to select the most relevant features from a model, identify the types of behaviors they repeatedly observe, and learn from them most effectively (Bandura, 1977 ). Therefore, when followers observe leaders engaging in prohibitive voice behaviors to highlight issues and express concerns, they are inclined to instinctively imitate them.

Moreover, to conduct prohibitive voice behavior, the leader may gather information from followers, seek their opinions, and strive to include their insights into difficulties. This process indicates that the leader values followers’ knowledge and expertise, needs followers’ opinions, and supports voice behavior (Tangirala and Ramanujam, 2012 ). The perception level among followers regarding their ability to effectively communicate with their leaders and have their opinions acknowledged and actively considered directly influences their likelihood to engage in voice behavior (Detert and Burris, 2007 ). Conversely, followers will refrain from actively exercising voice behavior if they believe their leaders are uninterested in listening to their opinions (Ruck et al. 2017 ). By fostering trust in the leader, the perceived leader’s voice behavior positively affects the follower’s voice behavior (Son, 2019 ). Therefore, followers are more likely to imitate prohibitive voice behaviors when they observe their leaders utilizing them to express problems and concerns. The following hypothesis was formulated based on the above discussion and previous research.

Hypothesis 1. Leader prohibitive voice behavior positively affects follower prohibitive voice behavior.

Mediation effect of leader identification

Social learning is a cognitive process where observers observe behaviors, gather information, and determine whether to engage in the observed behaviors (Bandura, 1977 ). Understanding the effect of leader prohibitive voice behavior on followers’ information processing and decision-making through social learning mechanisms is necessary. According to social learning theory’s identification model, it involves a reproduction process that, through attention and maintenance, converts memorized symbolic information into actual behavior (Bandura, 1977 ). During this process, the social environment influences the employee’s judgment and perception, thereby directing the individual’s attention to specific information (Salancik and Pfeffer, 1978 ). Thus, the social environment provides clues that followers utilize to construct and interpret events, thereby influencing their perceptions of the organization and its leaders.

Prohibitive voice behavior entails raising concerns to prevent negative consequences and safeguard the organization (Liang et al. 2012 ). People who can provide effective information to mitigate organizational loss are typically experienced and capable. When leaders engage in prohibitive voice behavior and raise concerns regarding their current job and previous project experience, followers perceive them as demonstrating competency. In addition, prohibitive voice behavior is challenging and indicates organizational problems. Despite the potential for negative evaluation by other members, it can be recognized as a demonstration of honesty and responsibility. Therefore, followers regard a leader who exhibits prohibitive voice behavior as a respected and appealing role model.

Based on social learning theory, individuals gather information by paying attention to and observing the role model’s behavior. Consequently, the evaluation process of identification or self-regulation determines whether to reproduce the role model’s behavior (Bandura, 1977 ). Followers tend to accept the leader’s vision, goals, and values when they perceive their leader as an attractive and trustworthy role model. Moreover, they feel a sense of unity with the leaders’ successes and failures, and adopt the supervisor’s attitudes, values, and behaviors (Bandura, 1997 ; Zeng et al. 2021 ). In other words, when followers positively perceive a leader’s prohibitive voice behavior, they respect the leader, regard the leader as an example, identify with the leader, and ultimately adopt the leader’s beliefs and values (Yukl, 2012 ).

Moreover, social learning theory posits that interpersonal relationships can also be a means for learning behaviors, attitudes, and perceptions (Bandura, 1997 ). When followers recognize that their leader maintains a positive attitude toward the organization, they identify with the leader and develop a similar attitude toward the organization. Moreover, leader prohibitive voice behavior is recognized as a positive and acceptable practice for achieving the organization’s goal when observed within that institution. The positive perception of a leader’s prohibitive voice behavior offsets the potential risks and costs of self-regulation, thereby motivating followers to imitate the leader’s prohibitive voice behavior. Consequently, leader identification will encourage active engagement in prohibitive voice behaviors among followers by imitating the leader’s behaviors and attitudes. Thus, leader prohibitive voice behavior can be explained by the identification process of attention-maintenance-reproduction-motivation resulting in follower-prohibitive voice behavior. Leader identification mediates the relationship between leader prohibitive and follower prohibitive voice behaviors.

Hypothesis 2. Leader identification mediates the relationship between leader prohibitive voice behavior and follower prohibitive voice behavior.

Moderated mediation effect of follower political skill

According to social learning theory, individuals control and alter their environments by utilizing their goals, needs, and cognitive systems (Bandura, 1997 ). Behavior formation can be significantly impacted by external reinforcing factors, such as reward and punishment, and internal factors, such as individual pride and sense of accomplishment. Because prohibitive voice behavior poses risks, including causing a negative reputation in the organization, the conviction that it can be performed successfully, that is, self-efficacy in voice behavior, can significantly impact the performance of prohibitive voice behavior. Followers with excellent political skills use their social insight and interpersonal influence to effectively influence others (Ferris et al. 2007 ). They can skillfully control situations and interpersonal relationships in diverse social interactions (Ferris et al. 2005 ). As a result, they experience an increased sense of confidence when engaging in prohibitive voice behavior.

Due to the characteristics of political skill, such as a sense of control, stability, and confidence (Ferris et al. 2007 ; Jawahar et al. 2007 ), followers may perceive prohibitive voice behavior as less risky. Followers with high political skills can participate in prohibitive voice behavior more effectively while mitigating the potential negative consequences of such behavior (Hung et al. 2012 ). Specifically, followers with high political skills are adept at identifying problems and possess the confidence to effectively influence others in a way that benefits them (Ferris et al. 2005 ). Additionally, followers with high political skills can effectively persuade others and are better equipped to determine when, how, and to whom to voice their concerns; thus, they can express prohibitive voices appropriately (Ferris et al. 2005 ). In contrast, those with low political skills will exhibit less prohibitive voice behavior because they lack the ability or confidence to voice even when they exhibit high leader identification. Therefore, follower political skills have a moderated mediation effect on leader-prohibitive voice behavior, with leader identification influencing follower prohibitive voice behavior. Specifically, followers with higher political skills will experience a greater indirect effect than those with lower political skills. Based on the above discussion, we established the following hypotheses.

Hypothesis 3. The follower’s political skill positively moderates the mediated relationship between the leader’s and the follower’s prohibitive voice behavior; thus, the mediated effect through leader identification is more significant for higher levels of follower political skill.

Data collection and sample

To test the hypotheses and ensure the generalizability of the findings, we surveyed employees from 59 Chinese companies. The participants were from various industries, including manufacturing, finance, IT, services, and government agencies. This approach enabled us to gather a diverse sample that accurately represented various organizational contexts. The targeted departments included those with general management positions, where leaders may serve as observable role models. This enabled office workers to operate in an environment where they could directly observe their leaders’ behaviors. After explaining the study’s objective via email and obtaining consent from company managers, we distributed questionnaires to each of these managers. To avoid common method bias, we one-to-one matched leaders and followers and subsequently administered separate questionnaires to each group. Specifically, the outcome variable, follower prohibitive voice behavior, was assessed by immediate supervisors who continuously observed and evaluated their subordinates. A survey was conducted among the employees to gather data on the independent variable (leader prohibitive voice behavior), mediator (leader identification), and moderator (follower political skill). To enhance the reliability of the survey responses, participants voluntarily participated in the study. Before the survey, respondents were notified that all gathered data would be anonymized to guarantee the non-disclosure of individual identities and affiliations.

A total of 400 pairs of questionnaires were distributed, and 360 leader-follower dyads responded (response rate: 90%). After excluding those that did not answer sincerely or were not actually matched in leader-follower dyads, we used a total of 317 leader-follower dyads. The final employee sample included 145 males (45.7%) and 172 females (54.3%), showing a slightly higher percentage of females. By age group, respondents were in their 20s (21.1%), 30s (55.6%), 40s (21.4%), and 50s (1.9%), with the largest share in their 30s. In terms of education level, 60 participants were high school graduates (18.9%), 72 were two- or three-year college graduates (22.7%), 141 were four-year university graduates (44.5%), and 44 had completed graduate school (13.9%). In terms of tenure with their leader, 47 participants (14.8%) had worked less than 1 year, 173 (54.6%) had worked more than 1 year and less than 5 years, 65 (20.5%) had worked more than 5 years and less than 10 years, 27 (8.5%) had worked more than 10 years and less than 15 years, and 5 (1.6%) had worked more than 15 years with their leader.
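The sample percentages above can be re-derived from the reported counts. The following is a minimal sketch of that arithmetic check, using only the figures given in the text; the helper name `pct` is hypothetical and introduced here for illustration.

```python
# Counts taken from the sample description; N = 317 usable dyads.
distributed, returned, usable = 400, 360, 317

def pct(count, total):
    """Share of `total` as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)

response_rate = pct(returned, distributed)

gender = {"male": 145, "female": 172}
education = {
    "high school": 60,
    "2-3 year college": 72,
    "4-year university": 141,
    "graduate school": 44,
}
tenure_with_leader = {
    "under 1 year": 47,
    "1 to 5 years": 173,
    "5 to 10 years": 65,
    "10 to 15 years": 27,
    "over 15 years": 5,
}

for label, counts in [("gender", gender), ("education", education),
                      ("tenure", tenure_with_leader)]:
    # Each breakdown should account for all usable dyads.
    assert sum(counts.values()) == usable
    print(label, {k: pct(v, usable) for k, v in counts.items()})
```

Each breakdown sums to the 317 usable dyads, and the recomputed shares agree with the percentages reported in the text.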

To assess our hypotheses, items were rated on a 5-point Likert scale ranging from 1 = strongly disagree to 5 = strongly agree. The questionnaires were presented in Chinese. The scale items, which were originally developed in English, were translated into Chinese following standard back-translation procedures (Brislin, 1986 ).

Both leader prohibitive voice behavior and follower prohibitive voice behavior were measured using Liang et al.’s (2012) 5-item scale. A sample item is “This follower dares to voice out opinions on things that might affect efficiency in the work unit, even if that would embarrass others.” We replaced the words “This follower” in the items with “My leader” when measuring leader prohibitive voice behavior (α = 0.891 for follower prohibitive voice behavior; α = 0.889 for leader prohibitive voice behavior).

Follower political skill

Follower political skill was measured on a 6-item scale developed by Ferris et al. ( 2005 ). A sample item for follower political skill is “At work, I know a lot of important people and am well connected.” (α = 0.829).

Leader identification

Leader identification was measured on a 7-item scale developed by Shamir et al. ( 1998 ). A sample item for leader identification is “I am proud to be under my leader’s command.” (α = 0.891).
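The α values reported for each scale are Cronbach's alpha coefficients. As a minimal sketch of how such a coefficient is computed, the function below applies the standard formula α = k/(k−1) · (1 − Σ item variances / variance of the summed score). The response matrix is hypothetical illustration data, not data from this study.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's summed scale score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-point Likert responses (6 respondents x 5 items).
responses = [
    [4, 5, 4, 4, 5],
    [3, 3, 4, 3, 3],
    [5, 5, 5, 4, 5],
    [2, 3, 2, 2, 3],
    [4, 4, 3, 4, 4],
    [3, 2, 3, 3, 2],
]
print(round(cronbach_alpha(responses), 3))
```

Values closer to 1 indicate that the items vary together, which is the sense in which the scales above are described as reliable.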

Control variable

A meta-analysis that examines the relationship between demographic characteristics and voice behavior reveals a significantly positive relationship between voice behavior and age, education, and tenure (Duan et al. 2016 ). For instance, compared with female employees with lower education, male employees with higher education tend to demonstrate greater confidence in their voice behavior (LePine and Van Dyne, 1998 ). Moreover, older employees with greater social and work experience are more inclined to engage in voice behaviors (Detert and Burris, 2007 ), whereas those with longer tenure tend to possess higher voice efficacy (Detert and Burris, 2007 ). Furthermore, follower prohibitive voice behavior may vary based on tenure with a specific leader (Liang et al. 2012 ). We measured gender as a dichotomous dummy variable coded as 0 for males and 1 for females. Moreover, the educational level was measured in four categories: 1 = high school graduate, 2 = 2–3 years college graduate, 3 = 4-year university graduate, and 4 = graduate school. Finally, we measured age and tenure with leaders in terms of years. In addition to these variables, leader promotive voice behavior, measured with a 5-item scale developed by Liang et al. ( 2012 ), was added as a control variable. A sample item of leader promotive voice behavior is “My leader proactively voices out constructive suggestions that help the unit reach its goals.” (α = 0.844).

Confirmatory factor analysis

We conducted confirmatory factor analysis to assess the empirical distinctiveness of the study’s variables. This study employed four variables: leader-prohibitive voice behavior, follower-prohibitive voice behavior, leader identification, and follower political skill. The model fit is good when CFI, TLI, and IFI are 0.9 or more, and RMSEA is 0.08 or less (Kline, 2005 ). As shown in Table 1 , the hypothesized measurement model demonstrated an excellent fit with the data (χ²[224] = 333.278, CFI = 0.971, TLI = 0.966, IFI = 0.970, RMSEA = 0.039) and was superior to any alternative factor model. Thus, the study was conducted using the four-factor model assumed for this study’s variables.
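The reported RMSEA can be cross-checked against the reported chi-square, degrees of freedom, and sample size. The sketch below assumes the common formula RMSEA = sqrt(max(χ² − df, 0) / (df · (N − 1))) and N = 317; the paper does not state which variant its software used, so this is an illustration rather than a reproduction of the authors' computation. The function names are hypothetical.

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 variant, an assumption)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def meets_cutoffs(cfi, tli, ifi, rmsea_value):
    """Cutoffs cited in the text: CFI, TLI, IFI >= 0.90 and RMSEA <= 0.08."""
    return min(cfi, tli, ifi) >= 0.90 and rmsea_value <= 0.08

value = rmsea(chi2=333.278, df=224, n=317)
print(round(value, 3))  # about 0.039, in line with the reported RMSEA
print(meets_cutoffs(cfi=0.971, tli=0.966, ifi=0.970, rmsea_value=value))
```

Under these assumptions, the recomputed RMSEA rounds to the value reported for the hypothesized four-factor model.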

Descriptive statistics and correlation analysis

We performed descriptive statistical analysis and correlation analysis to investigate the relationships among the major variables. Table 2 reports the means, SDs, and correlations among the focal variables. The independent variable, leader prohibitive voice behavior, was significantly correlated with the mediating variable, leader identification ( r = 0.317, p < 0.001), the moderating variable, follower political skill ( r = 0.391, p < 0.001), and the dependent variable, follower prohibitive voice behavior ( r = 0.499, p < 0.001). In addition, leader identification was positively correlated with follower political skill ( r = 0.456, p < 0.001) and follower prohibitive voice behavior ( r = 0.611, p < 0.001). Because the relationships between all variables were consistent with the hypotheses, we subsequently conducted regression analysis to test them further.
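The significance of a Pearson correlation is conventionally tested with t = r·sqrt(n − 2) / sqrt(1 − r²). The sketch below applies this to the reported correlations, assuming n = 317 for every pair (the text does not state pairwise sample sizes) and using a large-sample normal approximation for the two-sided p-value instead of the exact t distribution.

```python
import math

def r_to_t(r, n):
    """t statistic for testing H0: rho = 0 given a sample correlation r."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal (large-sample approximation)."""
    return math.erfc(abs(t) / math.sqrt(2))

n = 317  # assumed sample size for every correlation
reported_r = {
    "leader PV - leader identification": 0.317,
    "leader PV - follower political skill": 0.391,
    "leader PV - follower PV": 0.499,
    "leader identification - follower political skill": 0.456,
    "leader identification - follower PV": 0.611,
}
for pair, r in reported_r.items():
    t = r_to_t(r, n)
    print(f"{pair}: t = {t:.2f}, p < 0.001: {two_sided_p_normal(t) < 0.001}")
```

With a sample of this size, each reported correlation yields a t statistic well beyond the 0.001 threshold, which agrees with the significance levels in the text.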

Hypothesis testing

The hypotheses were tested using hierarchical regression analyses in SPSS and the Hayes ( 2013 ) PROCESS macro for the mediation and moderation analyses. To mitigate multicollinearity before conducting statistical analysis for hypothesis testing, a mean-centering approach was applied to all variables included in the regression analysis. Additionally, it was determined that the values of the variance inflation factor (VIF) did not exceed a maximum of 1.80 (Aiken, 1991 ). Therefore, multicollinearity was not a significant issue in this study. The results of Model 2 for the proposed model (see Table 3 ) showed that the positive effect of leader prohibitive voice behavior on follower prohibitive voice behavior was statistically significant ( β  = 0.25, p < 0.001), thereby supporting Hypothesis 1.
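To illustrate why mean-centering is applied before forming an interaction term, the sketch below computes variance inflation factors (VIF = 1 / (1 − R²), where R² comes from regressing each predictor on the others) for a product term built from raw versus centered variables. The data are synthetic and the variable names are hypothetical; this is not the authors' SPSS procedure.

```python
import numpy as np

def mean_center(X):
    """Subtract each column's mean."""
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=0)

def vif(X):
    """Variance inflation factor for each column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        y = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1 - resid.var() / y.var()
        factors.append(1 / (1 - r2))
    return np.array(factors)

# Synthetic scores on a roughly 1-5 scale (illustration only).
rng = np.random.default_rng(0)
mediator = 3.5 + 0.7 * rng.standard_normal(317)
moderator = 3.5 + 0.7 * rng.standard_normal(317)

raw = np.column_stack([mediator, moderator, mediator * moderator])
centered_main = mean_center(np.column_stack([mediator, moderator]))
centered = np.column_stack([centered_main, centered_main[:, 0] * centered_main[:, 1]])

vif_raw = vif(raw)
vif_centered = vif(centered)
print("VIF, raw product term:     ", np.round(vif_raw, 2))
print("VIF, centered product term:", np.round(vif_centered, 2))
```

In this synthetic case the product of uncentered variables is almost a linear combination of its components, so its VIF is large, while the product of centered variables is nearly uncorrelated with them.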

As the confidence interval did not include zero, the PROCESS results suggested that the indirect effect of leader prohibitive voice behavior on follower prohibitive voice behavior via leader identification was statistically significant (effect estimate = 0.11, se = 0.05, 95% confidence interval (CI) = 0.023 and 0.209). Thus, Hypothesis 2 was supported. As shown in Table 3 , Hypothesis 3 was tested using a four-step hierarchical regression. All control variables and two dimensions of leader voice behavior were entered in Step 1, whereas the mediator and moderator were entered in Steps 2 and 3, respectively. Using the cross-product of the mean-centered mediator and moderator, the interaction term was created and computed. According to the results, follower prohibitive voice behavior was significantly related to the interaction between leader identification and follower political skill ( β  = −0.15, p  < 0.001).
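An indirect effect of this kind is usually estimated as the product a·b, where a is the effect of the predictor on the mediator and b is the effect of the mediator on the outcome controlling for the predictor, with a bootstrap confidence interval. The sketch below uses a percentile bootstrap on synthetic data. The number of resamples, the interval type, and the generating coefficients are assumptions for illustration, and control variables are omitted.

```python
import numpy as np

def ols(y, *predictors):
    """OLS coefficients with the intercept first."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def indirect_effect(x, m, y):
    a = ols(m, x)[1]       # path X -> M
    b = ols(y, x, m)[2]    # path M -> Y, controlling for X
    return a * b

def bootstrap_ci(x, m, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        draws[i] = indirect_effect(x[idx], m[idx], y[idx])
    tail = 100 * (1 - level) / 2
    lower, upper = np.percentile(draws, [tail, 100 - tail])
    return lower, upper

# Synthetic data with a true indirect effect of 0.4 * 0.3 (illustration only).
rng = np.random.default_rng(42)
x = rng.standard_normal(317)
m = 0.4 * x + rng.standard_normal(317)
y = 0.25 * x + 0.3 * m + rng.standard_normal(317)

estimate = indirect_effect(x, m, y)
lower, upper = bootstrap_ci(x, m, y)
print(f"indirect effect = {estimate:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The decision rule matches the one used in the text: the indirect effect is treated as significant when the interval excludes zero.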

Following Aiken’s (1991) recommendations, this interaction effect was plotted at one standard deviation above and below the mean of follower political skill. As depicted in Fig. 2 , the simple slope analysis demonstrated a stronger and more significant positive relationship between leader prohibitive voice behavior and follower prohibitive voice behavior at lower levels of follower political skill (slope = 0.427, p  < 0.001) and a non-significant relationship at higher levels of follower political skill (slope = −0.213, ns.).

Fig. 2 Interaction of leader identification and follower political skill on follower prohibitive voice behavior.
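In a linear interaction model with a mean-centered moderator, the simple slope of the focal predictor at a moderator value w is b_focal + b_interaction · w, evaluated at w = ±1 SD. The sketch below shows this relation and, purely as arithmetic on the two reported slopes, backs out the values they imply under that assumed model. The function names are hypothetical, and the implied values are not coefficients reported by the authors.

```python
def simple_slopes(b_focal, b_interaction, sd_moderator):
    """Slopes of the focal predictor at -1 SD and +1 SD of a centered moderator."""
    low = b_focal + b_interaction * (-sd_moderator)
    high = b_focal + b_interaction * (+sd_moderator)
    return low, high

def implied_coefficients(slope_low, slope_high):
    """Invert the relation above: returns (b_focal, b_interaction * SD)."""
    b_focal = (slope_low + slope_high) / 2
    b_interaction_times_sd = (slope_high - slope_low) / 2
    return b_focal, b_interaction_times_sd

# Slopes reported in the text at low and high follower political skill.
b_focal, b_int_sd = implied_coefficients(0.427, -0.213)
print(round(b_focal, 3), round(b_int_sd, 3))

# Round trip: with SD scaled to 1, the implied values reproduce both slopes.
print(simple_slopes(b_focal, b_int_sd, 1.0))
```

A negative interaction term means that the slope declines as the moderator increases, which is the pattern described for Fig. 2.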

Furthermore, we analyzed the patterns of conditional indirect effects to test the moderated mediation hypothesis (Preacher et al. 2007 ). Table 4 shows the significant negative moderating role of follower political skill. When follower political skill is low ( b = 0.34, SE = 0.05, p < 0.001, 95% CI of 0.017 and 0.204), the conditional indirect effect of leader prohibitive voice behavior on follower prohibitive voice behavior through leader identification is significant and positive; however, it is insignificant when follower political skill is high ( b  = −0.16, SE = 0.03, ns, 95% CI of −0.014 and 0.142). Thus, the moderating role of follower political skill is significant, but this pattern contradicts our expectations, indicating a negative moderating effect that refutes Hypothesis 3.
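When the moderator acts on the path from the mediator to the outcome (second-stage moderation, which is how the interaction between leader identification and political skill is specified here), the conditional indirect effect at a moderator value w is a · (b_m + b_int · w). The sketch below uses hypothetical coefficients chosen only to show the mechanics; they are not the estimates in Table 4.

```python
def conditional_indirect(a, b_m, b_int, w):
    """Indirect effect of X on Y through M at moderator value w (centered)."""
    return a * (b_m + b_int * w)

def index_of_moderated_mediation(a, b_int):
    """Change in the indirect effect per unit increase in the moderator."""
    return a * b_int

# Hypothetical coefficients (assumptions for illustration).
a = 0.30       # X -> M
b_m = 0.35     # M -> Y at the mean of the moderator
b_int = -0.20  # M x W interaction
sd_w = 0.60    # SD of the centered moderator

low = conditional_indirect(a, b_m, b_int, -sd_w)
high = conditional_indirect(a, b_m, b_int, +sd_w)
print(f"indirect effect at -1 SD: {low:.3f}")
print(f"indirect effect at +1 SD: {high:.3f}")
print(f"index of moderated mediation: {index_of_moderated_mediation(a, b_int):.3f}")
```

With a negative interaction coefficient, the indirect effect is larger at low values of the moderator than at high values, which is the direction of the pattern reported above.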

Overall findings

This study analyzed the effect of leader prohibitive voice behavior on follower prohibitive voice behavior, the mediation effect of leader identification, and the moderated mediation effect of follower political skill. Using 317 dyads of leader-follower data, we revealed that a leader’s prohibitive voice behavior positively affected a follower’s prohibitive voice behavior. Additionally, we confirmed that leader identification mediated the relationship between leader and follower prohibitive voice behaviors. In contrast, follower political skills moderated the relationship between leader identification and follower prohibitive voice behavior; however, the direction of moderation contradicts our hypothesis prediction. The following section emphasizes the theoretical implications of the current analysis, as well as the study’s limitations, which indicate avenues for future research.

Theoretical implications

This study significantly contributes to the body of knowledge by examining voice behavior in two distinct dimensions and highlighting the previously understudied aspect of prohibitive voice behavior. It addresses the gap in existing research, which has focused on other forms of voice behavior. Moreover, it examines prohibitive voice behavior as a distinct dimension. Consequently, it provides valuable insights into the broader understanding of organizational voice behavior (Liang et al. 2012 ). From the organization’s perspective, prohibitive voice behavior presents opportunities to identify and address existing issues or to prevent potential harm to the current workflow or processes (Ng and Feldman, 2012 ). However, from an individual’s standpoint, engaging in prohibitive voice behavior can pose risks, such as causing unexpected interpersonal problems or being unintentionally misunderstood (Milliken et al. 2003 ). Consequently, individuals may be reluctant to invest their time and effort in prohibitive voice behavior. In the context of Chinese culture, it is crucial to recognize that individuals typically perceive prohibitive voice behavior as risky. This is due to the potential negative effect of such actions on their reputation and relationships with supervisors and colleagues. This cultural background is characterized by a significant focus on harmony and high power distance, resulting in employees being cautious about directly addressing inefficiencies or problems. This is because such an action may cause the supervisor to lose reputation, and the speaker may incur greater social risks compared to Western culture speakers (Wang et al. 2019 ). The Chinese inclination to employ context-dependent perceptual processes, where the target object is linked to the context, further complicates direct communication (Liang et al. 2019 ). Therefore, exploring ways to mitigate the cost factors associated with prohibitive voice behavior is essential. 
Based on this study, focusing on the role of leaders is one of the ways to address this issue.

Second, based on social learning theory (Bandura, 1977 ), this study examines the impact of leader prohibitive voice behavior on the formation of follower prohibitive voice behavior. A leader embodies a potent social context force that influences the collective value of certain behaviors (Yukl, 2012 ). This effect may be attributed to the characteristics of leaders as significant role models within the organization, as well as the characteristics of voice behavior as a role-exceeding behavior that attracts the attention of organizational members. Imitation and emulation may occur naturally by observing a leader’s prohibitive voice behavior. Although previous research on voice behavior emphasized the impact of leadership behavior on follower voice behavior (Chamberlin et al. 2017 ; Morrison, 2011 ), this study focused on social learning and modeling. It highlighted how a leader’s prohibitive voice behavior is imitated and transferred to followers. Therefore, this study contributes to the application of social learning theory by facilitating an understanding of the formation of prohibitive voice behavior.

Fourth, this research contributes to the leadership literature and enhances our understanding of the significance of leader prohibitive voice behavior in influencing follower prohibitive voice behavior. Although past studies have focused on positive leadership with a follower’s voice (Chamberlin et al. 2017 ), the effect of leader prohibitive voice behavior remains unknown. To address this vital question, we presented empirical evidence of the effect of leader prohibitive voice behavior on follower prohibitive voice behavior. We have demonstrated that leader prohibitive voice behavior significantly affects follower prohibitive voice behavior through leader identification.

Fifth, this research enhances our understanding of the cognitive processes of organizational members by evaluating the impact of leader-prohibitive voice on follower-prohibitive voice behavior. It demonstrates that a leader’s prohibitive voice behavior evokes leader identification and influences the follower’s prohibitive voice behavior. Leader identification mediates the relationship between the leader and the follower’s prohibitive voice behavior. When followers are attracted to their leader’s prohibitive voice behavior, they develop a sense of identification with their leaders based on trust and respect. By exploring self-regulation, internalization, and homogenization, this research explicitly identifies the mechanisms from the perspective of cognitive processes. Based on these processes, it explains followers’ engagement in prohibitive voice behavior. It demonstrates that followers tend to imitate and engage in voice behavior due to identifying with their leaders’ actions, even amidst potential risks and challenges. This study enhances our understanding of the mystery that underlies the relationship between leader and follower prohibitive voice behaviors by revealing the mediating effect of leader identification.

Sixth, the study discovered results that contradict the hypothesized trend of the moderated mediation effect of political skill in the mechanism through which leader prohibitive voice behavior influences follower prohibitive voice behavior. Specifically, followers with higher political skills were expected to demonstrate a stronger relationship between leader identification and prohibitive voice behavior due to their self-efficacy and ability to effectively engage in voice behavior. However, the findings indicated that the relationship was stronger for followers with lower political skills. This contradictory result implies that the motivation for social learning and engagement in voice behavior varies significantly based on individual characteristics. A plausible explanation is that in environments where leader identification is robust, it significantly impacts follower behavior, thereby overshadowing the effects of individual political skills. Followers with lower political skills may depend more on leader identification to mitigate the risks associated with prohibitive voice behavior. This can result in a greater tendency to engage in such behavior when they identify with their leaders. Additionally, considering this study’s cultural context, which involved employees and leaders in China, the established congruence between leader and follower may exert a significant influence irrespective of individual political traits. Leader identification may have a particularly pronounced impact in Chinese organizational contexts, where hierarchical relationships and collective harmony are emphasized (Wang et al. 2019 ). This phenomenon can result in stronger prohibitive voice behavior among followers with lower political skills when closely aligning with their leaders. These insights highlight the intricate interplay between leader behaviors and follower attributes in promoting prohibitive voice behavior. 
Although political skill remains a relevant individual characteristic, the demonstrated influence of leader identification in fostering a conducive environment for voice behavior is paramount. Future research should further explore these dynamics across diverse cultural and organizational contexts to enhance our understanding of how to effectively promote prohibitive voice behavior.

Practical implications

Firstly, to promote followers’ voice behavior, leaders must engage in prohibitive voice behavior. Fostering proactive voice behavior in China, where moderation and balance are highly emphasized, can be challenging due to cultural factors. To address this, the leadership role becomes crucial. As role models for followers, leaders must demonstrate proactive and exemplary voice behavior. By observing and emulating leaders who engage in prohibitive voice behavior, followers can naturally adopt and practice such behavior. Therefore, instead of relying solely on direct instructions, displaying the role model behavior of leaders can be more effective in enhancing followers’ prohibitive voice behavior. In other words, when leaders practice prohibitive voice behavior, it can motivate followers to actively engage in risky prohibitive voice behavior. Ultimately, this can result in the development of an organizational culture that values constructive voice behavior.

Secondly, this study reveals that subordinate identification with leaders facilitates prohibitive voice behavior. Identification is vital for satisfying basic needs, such as respect and a sense of belonging. It suggests that for leaders’ prohibitive voice behavior to impact followers’ prohibitive voice behavior positively, followers must identify themselves with leaders who engage in prohibitive voice behavior. Because identification is highly flexible, followers may find alternative categories to identify with if leaders do not actively seek to be the target of follower identification. For example, due to emotional contagion effects, identification with peers, customers, or labor unions may occur, and this can be counterproductive to leaders’ goals.

Third, followers must observe leaders’ behavior to gain positive influences and develop identification with their leaders. In this regard, organizations can find human resource development programs that emphasize talent development to be vital. These programs should guide leaders to become positive role models regarding their behavior and attitudes. For instance, if employees are encouraged to exceed their assigned roles and engage in voice behavior, the organization can develop programs that provide financial rewards and non-monetary incentives, such as recognition and praise. Employees are more likely to engage in voluntary and challenging voice behavior if they believe that the organization supports and values such behavior.

Fourth, followers’ political skills positively influence their prohibitive voice behavior within the organization. This is because individuals with higher political skills are more sensitive to and capable of utilizing the information provided in social contexts. Political skill is an innate trait and a capability that can be cultivated through learning (Ahearn et al. 2004 ; Munyon et al. 2015 ), thereby enhancing the managerial perspective regarding approachability. Therefore, managers can enhance challenging voice behavior within the organization by offering followers mentoring or developmental opportunities to enhance their political skills.

Fifth, this study reveals that followers with lower political skills depend more on leader identification to engage in prohibitive voice behavior. Leader identification significantly mitigates the risks for these followers, emphasizing the significance of leaders’ behaviors and commitment to safety and communication. This aligns with the findings of Wang and Zhou ( 2021 ), which indicate that leaders’ influence outweighs individual attributes when it comes to determining voice behavior. Even employees with limited political skills are more inclined to engage in prohibitive voice when leaders demonstrate a strong commitment to safety and risk mitigation. These findings emphasize the crucial role of leaders in fostering a supportive environment, even though personal attributes, such as political skill, remain significant. Leaders who model prohibitive voice behavior and promote a culture of openness enhance followers’ inclination to engage in such behavior. Particularly, this effect is pronounced when leader identification is high because followers tend to emulate leaders’ behaviors when they align with their values. Consequently, leadership is crucial in influencing organizational culture. Leaders who aggressively promote safety, encourage transparent communication, and model prohibitive voice behavior foster an environment that supports and values the expression of concerns. This empowers followers with lower political skills and fosters a proactive, risk-aware culture. The evidence indicates that organizational leaders can be crucial in fostering a culture that promotes prohibitive voice behavior, transcending individual traits such as political skills. Future research should explore the interplay between leader behaviors and follower attributes to effectively promote prohibitive voice behavior in various contexts.

Limitations and future research

First, to alleviate the issue of common method bias, data on the independent and dependent variables were collected from leaders and followers, who represent different sources. However, despite these efforts, the cross-sectional design is still limited because the survey on the independent and dependent variables was administered simultaneously. In future research, longitudinal studies would be necessary to provide a better understanding of causality. Second, this study is based on Chinese data and may reflect Chinese cultural characteristics. Consequently, this may reduce our findings’ generalizability to other cultures. Therefore, it is necessary to verify whether similar results are obtained in different cultural settings.

Third, future research should consider various mediating variables not addressed in this study to understand the relationship between leader and follower prohibitive voice behaviors. Other potential factors may influence follower prohibitive voice behavior; therefore, further exploration and validation of additional variables are necessary. For example, potential variables such as emotional factors, perceived fairness, or the quality of exchange relationships with leaders should be considered. Followers with a high LMX (leader–member exchange) relationship enjoy higher levels of trust, latitude, mutual support, and loyalty (Schriesheim et al. 1999) due to greater discretion from their leaders. Thus, high LMX may encourage employees to actively follow the leader’s prohibitive voice behavior.

Fourth, this study considered only prohibitive voice behavior as a separate variable. However, to enhance explanatory power, it is vital to consider the mediating effects of the contrasting dimension of voice behavior, promotive voice behavior. Accordingly, it is valuable to design a research model that theorizes the two types of voice behavior in a parallel structure to confirm the influence of, or distinction between, the two voice behaviors.

Fifth, based on social exchange theory and social learning theory, further research is needed on the relationship between leader voice behavior, leader identification, and follower voice behavior. According to the principle of reciprocity in social exchange theory (Blau, 1964), followers form a positive perception of their leader, known as leader identification, when they believe that the leader’s voice behavior is intended to resolve organizational or follower issues. In this process, followers feel obligated to reciprocate and are more likely to engage in voice behavior (Ng and Feldman, 2012).

Sixth, future research should examine the boundary conditions affecting the social learning dynamics between leader voice behavior and follower voice behavior, extending beyond the follower’s political skill proposed here. For example, individuals with a proactive personality or high achievement motivation are more likely to exhibit a challenging and proactive attitude (Chae and Park, 2022). Consequently, they are more inclined to actively engage in challenging voice behavior. Additionally, organizational situational characteristics may serve as moderating variables. There is a higher probability of engaging in voice behavior in decentralized and open organizations compared to centralized and closed ones. Therefore, research should be conducted on the various situational factors that influence voice behavior.

Prohibitive voice behavior, which is characterized by complex consequences and antecedents, is riskier and more challenging than promotive voice behavior (Liang et al. 2012). Based on social learning theory (Bandura, 1977), this study examines how leader prohibitive voice behavior affects follower prohibitive voice behavior, with a focus on the leader’s influence. Specifically, because followers tend to imitate prohibitive voice behavior through leader identification, the leader’s role as a role model is crucial in motivating followers’ prohibitive voice behavior. Additionally, the study emphasizes the significance of considering followers’ political skill, a personal trait, when they emulate their leader’s behavior. Therefore, this research underscores the crucial significance of leader prohibitive voice behavior, leader identification, and followers’ political skills in reinforcing follower prohibitive voice behavior. Future investigation is needed to explore the precise mechanisms through which specific leader behaviors impact follower prohibitive voice behavior.

Data availability

The datasets generated and/or analyzed during this study are not publicly available to protect participant confidentiality and adhere to organizational privacy policies. However, the data may be made available by the corresponding author upon reasonable request for legitimate research purposes, provided that participant anonymity is maintained.

Ahearn KK, Ferris GR, Hochwarter WA et al. (2004) Leader political skill and team performance. J. Manag. 30(3):309–327. https://doi.org/10.1016/j.jm.2003.01.004

Aiken LS (1991) Multiple regression: Testing and interpreting interactions. Sage

Aryee S, Walumbwa FO, Mondejar R et al. (2017) Core self-evaluations and employee voice behavior: Test of a dual-motivational pathway. J. Manag. 43(3):946–966. https://doi.org/10.1177/0149206314546192

Bandura A (1997) Self-efficacy: The exercise of control. Freeman, New York, NY

Bandura A (1977) Social learning theory. Prentice-Hall, Englewood Cliffs, NJ

Blau H (1964) The Iroquois white dog sacrifice: Its evolution and symbolism. Ethnohistory 11(2):97–119. https://doi.org/10.2307/480853

Brislin RW (1986) The wording and translation of research instruments. In Lonner WL & Berry JW (Eds.) Field methods in cross-cultural research. Sage, Newbury Park, CA, p 137–164

Brown ME, Treviño LK (2014) Do role models matter? An investigation of role modeling as an antecedent of perceived ethical leadership. J Bus Ethics 122:587–598. https://doi.org/10.1007/s10551-013-1769-0

Chae H, Park J (2022) The effect of proactive personality on creativity: The mediating role of feedback-seeking behavior. Sustainability 14(3):1495. https://doi.org/10.3390/su14031495

Chamberlin M, Newton DW, Lepine JA (2017) A meta‐analysis of voice and its promotive and prohibitive forms: Identification of key associations, distinctions, and future research directions. Pers. Psychol. 70(1):11–71. https://doi.org/10.1111/peps.12185

Chang ML, Tang AD, Cheng CF et al. (2023) The bright side of environmental uncertainty for organizational learning: the moderating role of political skill. Asian Bus. Manag. 22(3):978–1007. https://doi.org/10.1057/s41291-022-00185-3

Crant JM (2000) Proactive behavior in organizations. J. Manag. 26(3):435–462. https://doi.org/10.1177/01492063000260030

Detert JR, Burris ER (2007) Leadership behavior and employee voice: Is the door really open? Acad. Manag. J. 50(4):869–884. https://doi.org/10.5465/amj.2007.26279183

Duan JY, Zhang C, Xu Y (2016) A meta-analysis of the relationship between demographic characteristics and employee voice behavior. Adv. Psychol. Sci. 24(10):1568–1582. https://doi.org/10.3724/SP.J.1042.2016.01568

Fast NJ, Burris ER, Bartel CA (2014) Managing to stay in the dark: Managerial self-efficacy, ego defensiveness, and the aversion to employee voice. Acad. Manag. J. 57(4):1013–1034. https://doi.org/10.5465/amj.2012.0393

Ferris GR, Davidson SL, Perrewe PL (2005) Political skill at work, Impact on work effectiveness. Davies-Black Pub, Mountain View, CA

Ferris GR, Treadway DC, Perrewé PL et al. (2007) Political skill in organizations. J. Manag. 33(3):290–320. https://doi.org/10.1177/0149206307300813

Frazier ML, Bowler WM (2015) Voice climate, supervisor undermining, and work outcomes: A group-level examination. J. Manag. 41(3):841–863. https://doi.org/10.1177/014920631143453

Grant AM (2013) Rocking the boat but keeping it steady: The role of emotion regulation in employee voice. Acad. Manag. J. 56(6):1703–1723. https://doi.org/10.5465/amj.2011.0035

Grusec JE (1992) Social learning theory and developmental psychology: The legacies of Robert Sears and Albert Bandura. Develop. Psychol. 28:776–786. https://doi.org/10.1037/0012-1649.28.5.776

Hayes AF (2013) Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. The Guilford Press, New York

Hung HK, Yeh RS, Shih HY (2012) Voice behavior and performance ratings: The role of political skill. Int. J. Hospitality Manag. 31(2):442–450. https://doi.org/10.1016/j.ijhm.2011.07.002

Jada UR, Mukhopadhyay S (2019) Understanding the effects of empowering, transformational and ethical leadership on promotive and prohibitive voice: A moderated mediated examination. Pers. Rev. 48(3):707–730. https://doi.org/10.1108/PR-11-2017-0365

Jawahar IM, Stone TH, Kisamore JL (2007) Role conflict and burnout: The direct and moderating effects of political skill and perceived organizational support on burnout dimensions. Int. J. Stress Manag. 14(2):142. https://doi.org/10.1037/1072-5245.14.2.142

Kark R, Shamir B, Chen G (2003) The two faces of transformational leadership: empowerment and dependency. J. Appl. Psychol. 88(2):246–255. https://doi.org/10.1037/0021-9010.88.2.246

Kline RB (2005) Principles and practice of structural equation modeling: Methodology in the social sciences. Guilford Publications, New York

LePine JA, Van Dyne L (1998) Predicting voice behavior in work groups. J. Appl. Psychol. 83(6):853–868. https://doi.org/10.1037/0021-9010.83.6.853

Li AN, Liao H, Tangirala S et al. (2017) The content of the message matters: The differential effects of promotive and prohibitive team voice on team productivity and safety performance gains. J. Appl. Psychol. 102(8):1259–1270. https://doi.org/10.1037/apl0000215

Liang J, Farh CIC, Farh JL (2012) Psychological antecedents of promotive and prohibitive voice: A two-wave examination. Acad. Manag. J. 55(1):71–92. https://doi.org/10.5465/amj.2010.0176

Liang J, Shu R, Farh CIC (2019) Differential implications of team member promotive and prohibitive voice on innovation performance in research and development project teams: A dialectic perspective. J. Organ. Behav. 40:91–104. https://doi.org/10.1002/job.2325

Lin SHJ, Johnson RE (2015) A suggestion to improve a day keeps your depletion away: Examining promotive and prohibitive voice behaviors within a regulatory focus and ego depletion framework. J. Appl. Psychol. 100(5):1381. https://doi.org/10.1037/apl0000018

Liu SM, Liao JQ, Liao S et al. (2014) The influence of prohibitive voice on proactive personality traits of extraversion, conscientiousness, and neuroticism. Soc. Behav. Personality: Int. J. 42(7):1099–1104. https://doi.org/10.2224/sbp.2014.42.7.1099

MacMillan K, Hurst C, Kelley K et al. (2020) Who says there’s a problem? Preferences on the sending and receiving of prohibitive voice. Hum. Relat. 73(8):1049–1076. https://doi.org/10.1177/00187267198502

Manz CC, Sims Jr HP (1981) Vicarious learning: The influence of modeling on organizational behavior. Acad. Manag. Rev. 6(1):105–113. https://doi.org/10.5465/amr.1981.4288021

Maria Stock R, Jong A, Zacharias NA (2017) Frontline employees’ innovative service behavior as key to customer loyalty: Insights into FLEs’ resource gain spiral. J. Prod. Innov. Manag. 34(2):223–245. https://doi.org/10.1111/jpim.12338

Miao R, Lu L, Cao Y et al. (2020) The high-performance work system, employee voice, and innovative behavior: The …

Milliken FJ, Morrison EW, Hewlin PF (2003) An exploratory study of employee silence: Issues that employees don’t communicate upward and why. J. Manag. Stud. 40(6):1453–1476. https://doi.org/10.1111/1467-6486.00387

Morrison EW (2011) Employee voice behavior: Integration and directions for future research. Acad. Manag. Ann. 5(1):373–412. https://doi.org/10.5465/19416520.2011.574506

Munyon TP, Summers JK, Thompson KM et al. (2015) Political skill and work outcomes: A theoretical extension, meta‐analytic investigation, and agenda for the future. Pers. Psychol. 68(1):143–184

Ng TWH, Feldman DC (2012) Employee voice behavior: A meta‐analytic test of the conservation of resources framework. J. Organ. Behav. 33(2):216–234. https://doi.org/10.1002/job.754

Preacher KJ, Rucker DD, Hayes AF (2007) Addressing moderated mediation hypotheses: Theory, methods, and prescriptions. Multivar. Behav. Res. 42(1):185–227. https://doi.org/10.1080/00273170701341316

Prince R, Rao MK (2022) Efficacy beliefs and employee voice: the role of perceived influence and manager openness. Int. J. Product. Perform. Manag. 71(8):3331–3347. https://doi.org/10.1108/IJPPM-05-2020-0266

Qi L, Xu Y, Liu B (2023) Does justice matter in voice? Inclusive leadership and employee voice: the moderating role of organizational justice perception. Front. Psychol. 14. https://doi.org/10.3389/fpsyg.2023.1313922

Ruck K, Welch M, Menara B (2017) Employee voice: an antecedent to organizational engagement? Public Relat. Rev. 43(5):904–914. https://doi.org/10.1016/j.pubrev.2017.04.008

Salancik GR, Pfeffer J (1978) A social information processing approach to job attitudes and task design. Adm. Sci. Q. 23(2):224–253. https://doi.org/10.2307/2392563

Schriesheim CA, Castro SL, Cogliser CC (1999) Leader-member exchange (LMX) research: A comprehensive review of theory, measurement, and data-analytic practices. Leadersh. Q. 10(1):63–113. https://doi.org/10.1016/S1048-9843(99)80009-5

Shamir B, Zakay E, Breinin E et al. (1998) Correlates of charismatic leader behavior in military units: Subordinates’ attitudes, unit characteristics, and superiors’ appraisals of leader performance. Acad. Manag. J. 41(4):387–409. https://doi.org/10.5465/257080

Smith MJ, Young DJ, Figgins SG et al. (2017) Transformational leadership in elite sport: A qualitative analysis of effective leadership behaviors in cricket. Sport Psychologist 31(1):1–15. https://doi.org/10.1123/tsp.2015-0077

Son SJ (2019) The role of supervisors on employees’ voice behavior. Leadersh. Organ. Dev. J. 40(1):85–96. https://doi.org/10.1108/LODJ-06-2018-0230

Song Y, Tian Q, Kwan HK (2022) Servant leadership and employee voice: a moderated mediation. J. Manag. Psychol. 37(1):1–14. https://doi.org/10.1108/JMP-02-2020-0077

Svendsen M, Unterrainer C, Jønsson TF (2018) The effect of transformational leadership and job autonomy on promotive and prohibitive voice: A two-wave study. J. Lead. Organ. Stud. 25(2):171–183. https://doi.org/10.1177/1548051817750536

Tangirala S, Ramanujam R (2012) Ask and you shall hear (but not always): Examining the relationship between manager consultation and employee voice. Pers. Psychol. 65(2):251–282. https://doi.org/10.1111/j.1744-6570.2012.01248.x

Van Dyne L, LePine JA (1998) Helping and voice extra-role behaviors: Evidence of construct and predictive validity. Acad. Manag. J. 41(1):108–119. https://doi.org/10.5465/256902

Villaluz VC, Hechanova MRM (2019) Ownership and leadership in building an innovation culture. Leadersh. Organ. Dev. J. 40(2):138–150. https://doi.org/10.1108/LODJ-05-2018-0184

Wang H, Wu W, Liu Y, Hao S, Wu S (2019) In what ways do Chinese employees speak up? An exchange approach to supervisor–subordinate guanxi and voice behavior. Int. J. Hum. Resour. Manag. 30(3):479–501. https://doi.org/10.1080/09585192.2016.1253030

Wang X, Zhou F (2021) Managing the uncertainties inherent in prohibitive voice: how leadership interacts with employee political skill. Front. Psychol. 12:702964. https://doi.org/10.3389/fpsyg.2021.702964

Wei X, Zhang ZX, Chen XP (2015) I will speak up if my voice is socially desirable: A moderated mediating process of promotive versus prohibitive voice. J. Appl. Psychol. 100(5):1641–1652. https://doi.org/10.1037/a0039046

Yang L (2020) Regulatory fit demonstrates that prohibitive voice does not lead to low performance evaluation. Front. Psychol. 11:581162. https://doi.org/10.3389/fpsyg.2020.581162

Yang Y, Li J, Sekiguchi T (2021) How supervisors respond to employee voice: an experimental study in China and Japan. Asian Bus. Manag. 20:1–31. https://doi.org/10.1057/s41291-019-00075-1

Yukl G (2012) Effective leadership behavior: What we know and what questions need more attention. Acad. Manag. Perspect. 26(4):66–85. https://doi.org/10.5465/amp.2012.0088

Zeng K, Wang D, Ye Q et al. (2021) Influence of an individual’s unethical behaviour on peers’ vicarious learning in organizations: the role of moral anger. Chin. Manag. Stud. 15(3):557–574. https://doi.org/10.1108/CMS-08-2019-0281

Acknowledgements

This work was supported by the Pukyong National University Industry-university Cooperation Research Fund in 2023 (202312350001).

Author information

Authors and affiliations.

School of Urban Governance and Public Affairs, Suzhou City University, Suzhou City, PR China

Xueqin Tian

School of Business Administration, Pukyong National University, Busan, Republic of Korea

Heesun Chae & Youngjoe Kim

Contributions

All of the authors contributed immensely. Conceptualization, XT, HC and YK; methodology, XT, HC and YK; software, XT; validation, XT and HC; formal analysis, XT; investigation, HC and YK; resources, HC and YK; data curation, XT and HC; writing-original draft preparation, XT; writing-review and editing, HC and YK; visualization, XT and HC; supervision, YK; project administration, YK; funding acquisition, HC. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Heesun Chae or Youngjoe Kim.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Ethical approval

Approval for this study was obtained from the Ethics Committee of Suzhou City University (Approval code: SZCU12020620230901). We confirm that the procedure was conducted in compliance with the ethical guidelines outlined in the 1964 Helsinki Declaration and its subsequent amendments.

Informed consent

All participants were asked to provide informed consent by signing a consent form, indicating their voluntary participation prior to the survey. No identifying information was collected during the survey, and there were no ethical concerns related to the study. The research adhered to ethical standards, ensuring that all responses remained anonymous and confidential.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Cite this article.

Tian, X., Chae, H. & Kim, Y. Leader prohibitive voice behavior and its effects on followers through leader identification and political skill. Humanit Soc Sci Commun 11, 1219 (2024). https://doi.org/10.1057/s41599-024-03740-9

Received: 03 March 2024

Accepted: 05 September 2024

Published: 16 September 2024

DOI: https://doi.org/10.1057/s41599-024-03740-9
