The coefficient of determination (R2, or r-squared) is a statistical measure in a regression model of the proportion of variance in the dependent variable that can be explained by the independent variable(s). In other words, it tells you how well the data fit the model (the goodness of fit). In the linear regression setting, R2 indicates the proportion of the variance in the dependent variable (Y) that is predicted or explained by the regression on the predictor variable (X, also known as the independent variable). In such cases, R2 normally ranges from 0 to 1.

R2 in logistic regression

For example, a professor might develop a linear regression model to predict a student’s final exam score from the third exam score. The coefficient of determination measures how much of the difference in one variable can be explained by differences in a second variable when predicting the outcome of a given event. In other words, this coefficient, more commonly known as r-squared (or r2), assesses how strong the linear relationship between two variables is, and investors rely on it heavily when conducting trend analysis. Such examples illustrate the wide-ranging applications of the coefficient of determination. It is an essential tool in regression analysis, offering an easy-to-understand measure of how well a model fits a dataset. Nevertheless, it is crucial to consider its limitations and to use it in conjunction with other statistical measures and diagnostic checks for a thorough analysis.


Depending on the objective, the answer to “What is a good value for R-squared?” will vary. A high R2 also says nothing about causation: for example, students might find studying less frustrating when they understand the course material well, so they study longer. The previous two examples suggest how the measure should be defined formally.


In summary, the coefficient of determination provides an aggregate measure of the predictive power of a statistical model, and it is frequently expressed as a percentage. Use each of the three formulas for the coefficient of determination to compute its value for the example of ages and values of vehicles.
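The three equivalent formulas can be sketched in code. The vehicle ages and values below are made-up illustrative numbers, not the data from the example in the text:

```python
import numpy as np

# Hypothetical ages (years) and values ($1000s) of vehicles -- illustrative only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([24.0, 21.5, 19.0, 17.5, 15.0, 13.0])

# Least-squares line
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)   # regression (explained) sum of squares
sse = np.sum((y - y_hat) ** 2)          # error (residual) sum of squares

r2_a = ssr / sst                        # formula 1: explained / total
r2_b = 1 - sse / sst                    # formula 2: 1 - unexplained / total
r2_c = np.corrcoef(x, y)[0, 1] ** 2     # formula 3: squared Pearson r

print(r2_a, r2_b, r2_c)
```

For least-squares regression with an intercept, all three formulas give the same number, which is why texts present them interchangeably.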

  1. In Statistical Analysis, the coefficient of determination method is used to predict and explain the future outcomes of a model.
  2. Although the coefficient of determination provides some useful insights regarding the regression model, one should not rely solely on the measure in the assessment of a statistical model.
  3. No universal rule governs how to incorporate the coefficient of determination in the assessment of a model.
  4. The adjusted R2 can be negative, and its value will always be less than or equal to that of R2.
  5. If equation 1 of Kvålseth[12] is used (this is the equation used most often), R2 can be less than zero.
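The behaviour of the adjusted R2 noted in point 4 can be illustrated with the standard adjustment formula. This is a minimal sketch; the sample sizes and R2 values are arbitrary:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Adjusted R2 is always <= R2, since the penalty factor (n-1)/(n-p-1) >= 1
print(adjusted_r2(0.50, n=20, p=3))   # 0.40625
# With a weak fit and many predictors, adjusted R2 can go negative
print(adjusted_r2(0.10, n=12, p=5))   # -0.65
```

The penalty grows with the number of predictors p relative to the sample size n, which is exactly why a poor fit with many regressors can push the adjusted value below zero.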

Lack of Information about Individual Predictors

Most of the time, the coefficient of determination is denoted R2 and read simply as “R squared”. It is the proportion of variance in the dependent variable that is explained by the model. In our Exam Data example this value is 37.04%, meaning that 37.04% of the variation in the final exam scores can be explained by quiz averages. To find and interpret the coefficient of determination for the hours-studied and exam-grade data: approximately 68% of the variation in a student’s exam grade is explained by the least-squares regression equation and the number of hours the student studied. Ingram Olkin and John W. Pratt derived the minimum-variance unbiased estimator for the population R2,[20] known as the Olkin–Pratt estimator.

A value of 1 indicates that the response variable can be perfectly explained, without error, by the predictor variable. The breakdown of variability in the equation above also holds for the multiple regression model, where Xi is a row vector of values of the explanatory variables for case i and b is a column vector of coefficients for the respective elements of Xi. In this form, R2 is the ratio of the explained variance (the variance of the model’s predictions, which is SSreg / n) to the total variance (the sample variance of the dependent variable, which is SStot / n). The coefficient of determination is thus a ratio showing how much of the variation in one variable is accounted for by another.


When the Pearson correlation coefficient is squared, it gives the proportion of variance in one variable that is predictable from the other, which is precisely what the coefficient of determination represents. In simple linear least-squares regression, Y ~ aX + b, R2 coincides with the square of the Pearson correlation coefficient between x1, …, xn and y1, …, yn. The coefficient of determination measures the percentage of variability within the \(y\)-values that can be explained by the regression model. In least-squares regression using typical data, R2 is at least weakly increasing as regressors are added to the model. Because adding regressors can only increase R2, R2 alone cannot be used as a meaningful comparison of models with very different numbers of independent variables. As a reminder of this, some authors denote R2 by Rq2, where q is the number of columns in X (the number of explanators, including the constant).
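The weakly increasing behaviour can be demonstrated with a small least-squares sketch. The data are synthetic, and r2_ols is a helper defined here for illustration, not a library function:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(size=n)

def r2_ols(X, y):
    """R-squared of a least-squares fit, with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

r2_one = r2_ols(x1.reshape(-1, 1), y)

# Add a pure-noise regressor: R2 cannot decrease, even though the
# new variable has no real relationship with y
x2 = rng.normal(size=n)
r2_two = r2_ols(np.column_stack([x1, x2]), y)
print(r2_one, r2_two)
```

Because the one-regressor model is nested inside the two-regressor model, the larger model can always match or reduce the residual sum of squares, so r2_two is at least r2_one. This is the mechanical reason R2 rewards adding variables regardless of their relevance.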

However, a high r-squared is not always good for the regression model. The quality of the coefficient depends on several factors, including the units of measure of the variables, the nature of the variables employed in the model, and any data transformation applied. Thus, a high coefficient can sometimes indicate issues with the regression model. In regression analysis, this statistic (designated r-squared) indicates the percentage of the change occurring in the dependent variable that is explained by the change in the independent variable(s).

R2 can be interpreted as the share of variance captured by the model, which is influenced by the model’s complexity. A high R2 indicates a lower bias error because the model can better explain the change in Y with the predictors. For this reason, the model makes fewer (erroneous) assumptions, which results in a lower bias error. Meanwhile, to accommodate fewer assumptions, the model tends to be more complex.

Studying longer may or may not cause an improvement in students’ scores. Although this causal relationship is very plausible, R2 alone cannot tell us why there is a relationship between students’ study time and exam scores. Another way of thinking about it is that R2 is the proportion of variance shared between the independent and dependent variables. In conclusion, the coefficient of determination serves as a fundamental tool in statistical analysis, assisting in model construction, validation, and comparison.

Investors use it to determine how correlated an asset’s price movements are with those of its benchmark index. Once you have the coefficient of determination, you use it to evaluate how closely the price movements of the asset you’re evaluating correspond to the price movements of the index or benchmark. In the Apple and S&P 500 example, the coefficient of determination for the period was 0.347. The coefficient of determination is a measurement used to explain how much of the variability of one factor is explained by its relationship to another factor. It is represented as a value between 0.0 and 1.0 (0% to 100%).
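A minimal sketch of how such a value is computed from price movements, using made-up daily returns rather than actual Apple or S&P 500 data:

```python
import numpy as np

# Hypothetical daily returns for an asset and a benchmark index -- illustrative only
index = np.array([0.010, -0.005, 0.003, 0.007, -0.012, 0.004, 0.009, -0.002])
asset = np.array([0.014, -0.004, 0.001, 0.010, -0.015, 0.006, 0.011, -0.001])

# Pearson correlation of the two return series, then square it
r = np.corrcoef(index, asset)[0, 1]
r_squared = r ** 2
print(round(r_squared, 3))
```

A value near 1 would mean the asset’s moves are almost entirely explained by the benchmark’s moves over the window, while a value like the 0.347 quoted above would mean only about a third of the variability is shared.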