- Remind me if Xs are highly correlated (scatterplot, VIF, Correlations) if you have a sign switch, correct the situation by throwing one of the variables out of the model. Consider R-squared or adj R-squared when making the decision.
My Y = Revenue, my company name is Amazon
X variables = Earnings, Revenue, and Industry Revenue.
As shown in the first graph above, The X variables are significant and show some linear pattern however it is not a perfect linear line so we can say it is non-linear. The P- values are zero which means there is an evidence of a relationship between the variables. X variables are also statistically significant and co-related as the correlation matrix table shows the amount of correlation between the variables. Only Industry Revenue and Earnings has a low correlation as it has 0.447.
Comparing my tables I do see a sign switch in the Employment as it has a negative sign (-0.000573).
Interpreting the regression analysis in my second graph, we want there to be a relationship between our X and Y variable. X is the independent variable and Y is our dependent variable. So, the relation between X and Y is if the X changes our Y changes as well. The constants are not statistically significant as the P-values are greater than 1. Let’s assume that the unit is 1, My coefficient in Earnings is 114.5 that means is if my Earnings increases by $1, the Revenue will increase by $114.5. In my second X variable, if my Industry Revenue is increased by $1 then my Revenue also increases by $370.1. Also, for the Employment it has a negative coefficient so that means they will move on the opposite direction. Which means if the company employs more than 1 person the revenue will decrease by 0.000573. There is a very low impact so for the company to see a huge difference in the decrease of the revenue the company will need to hire more than 10,000 employs which would decrease by$5.73.
GDP information and what are the biggest sources of revenue in the economy
WORKSHEET 1
Regression Analysis: Revenue versus Earnings
Regression Equation
Revenue | = | -67933 + 114.91 Earnings |
Coefficients
Term | Coef | SE Coef | T-Value | P-Value | VIF |
Constant | -67933 | 5923 | -11.47 | 0.000 | |
Earnings | 114.91 | 8.45 | 13.60 | 0.000 | 1.00 |
Model Summary
S | R-sq | R-sq(adj) | R-sq(pred) |
9225.39 | 67.27% | 66.91% | 65.08% |
Analysis of Variance
Source | DF | Adj SS | Adj MS | F-Value | P-Value |
Regression | 1 | 15742520392 | 15742520392 | 184.97 | 0.000 |
Earnings | 1 | 15742520392 | 15742520392 | 184.97 | 0.000 |
Error | 90 | 7659698627 | 85107763 | ||
Lack-of-Fit | 84 | 7637913547 | 90927542 | 25.04 | 0.000 |
Pure Error | 6 | 21785079 | 3630847 | ||
Total | 91 | 23402219018 |
Fits and Diagnostics for Unusual Observations
Obs | Revenue | Fit | Resid | Std Resid | |
88 | 60450 | 32387 | 28063 | 3.10 | R |
90 | 52890 | 34685 | 18205 | 2.02 | R |
91 | 56580 | 35145 | 21435 | 2.38 | R |
92 | 72380 | 35259 | 37121 | 4.12 | R |
R Large residual
Durbin-Watson Statistic
Durbin-Watson Statistic = | 0.186035 |
Considering the R-squared and the VIF table I have decided to take my Employment and industry revenue out. There was a low R-squared in all my three X variables when I ran the Regression analysis, so I decided to choose the one with the higher R-Square and R-sq(adj). The highest R-sq is the Earnings so I will be keeping only one X variable. Also, The P-value is O which is good and VIF is one which is low however this is the best model I have comparatively. Also looking at my VIF which is 1 there is no sign of multicollinearity as it is not greater than 5 or 10.
- Using the scatter plots you generated, identify any nonlinear relationships between Y and X variables.
Try to correct nonlinearity through transformation (page 233-237). If it works, keep the transformed version of the variable. Otherwise, use the original variable, acknowledge the nonlinearity, and move on to the next test. Use 2 different transformations (ex: Log X, 1/X, X^2 or SQRT(X)).
I ran a transformation to see if my nonlinearity changes. Transforming a data means to change its functional form, so I ran Y and X variable to see if my Scatterplot give me a linear or non-linear pattern. Transformation is one way to keep the information content, but we change the functional form also it is not necessary to always work. However, I do not see drastic changes. I ran Log X and SQRT (X) as my transformation test, but I got same results. The first graph is my Earnings and the second is transformation with Log X and third is SQRT (X). So, I have no choice to acknowledge it and to move on with my non-linearity.
3.Once you correct for nonlinearity and multicollinearity, check for autocorrelation using DW test. Do you have autocorrelation? Correct for autocorrelation if you have any.
Based on my data since I have decided to keep just one of my x variables that is Earnings. The DW = 0.186035 Durbin-Watson Statistic
Durbin-Watson Statistic = | 0.186035 |
In my 90 row I have lower bound of 1.64 and the upper bound 1.69. My DW is lower than the lower bound based on my decision rule I am going to reject the null and I have an auto correlation. So now I am attempting to fix the auto correlation so I will have one more coefficient. I am going to add a lag value of Yt to attempt to replicate assuming it is coming from the Yt.
So my row is 90 and K=1DL
WORKSHEET 1
Regression Analysis: Revenue (yt) versus Earnings, Revenue(yt-1), trend
Method
Rows unused | 1 |
Regression Equation
Revenue (yt) | = | -8952 + 16.4 Earnings + 1.0062 Revenue(yt-1) – 37 trend |
Coefficients
Term | Coef | SE Coef | T-Value | P-Value | VIF |
Constant | -8952 | 19638 | -0.46 | 0.650 | |
Earnings | 16.4 | 39.8 | 0.41 | 0.681 | 117.98 |
Revenue(yt-1) | 1.0062 | 0.0517 | 19.47 | 0.000 | 3.39 |
trend | -37 | 174 | -0.21 | 0.833 | 123.53 |
Model Summary
S | R-sq | R-sq(adj) | R-sq(pred) |
3933.62 | 94.21% | 94.01% | 93.00% |
Analysis of Variance
Source | DF | Adj SS | Adj MS | F-Value | P-Value |
Regression | 3 | 21921168164 | 7307056055 | 472.24 | 0.000 |
Earnings | 1 | 2638051 | 2638051 | 0.17 | 0.681 |
Revenue(yt-1) | 1 | 5864694800 | 5864694800 | 379.02 | 0.000 |
trend | 1 | 692178 | 692178 | 0.04 | 0.833 |
Error | 87 | 1346180110 | 15473335 | ||
Total | 90 | 23267348275 |
Fits and Diagnostics for Unusual Observations
Obs | Revenue (yt) | Fit | Resid | Std Resid | ||
77 | 22720 | 30943 | -8223 | -2.16 | R | |
80 | 35750 | 27150 | 8600 | 2.23 | R | |
81 | 29130 | 37634 | -8504 | -2.22 | R | |
84 | 43740 | 34958 | 8782 | 2.28 | R | |
85 | 35710 | 46101 | -10391 | -2.75 | R | |
88 | 60450 | 46171 | 14279 | 3.77 | R | |
89 | 51049 | 63046 | -11997 | -3.41 | R | X |
90 | 52890 | 53781 | -891 | -0.24 | X | |
91 | 56580 | 55662 | 918 | 0.25 | X | |
92 | 72380 | 59354 | 13026 | 3.64 | R | X |
R Large residual X Unusual X
Durbin-Watson Statistic
Durbin-Watson Statistic = | 2.46865 |
I ran the regression adding a lag, Trend and Revenue (yt-1) which gave me a higher VIF which is more than 117.98 in my X variable. It fixed my DW which is 2.46865 which means there is no autocorrelation detected in the sample. However, I have detected Multicollinearity due to higher VIF. It shows the multicollinearity between my X variable and trend. The T-value from both Earnings and trend are not significant as we can see the P-values which are above 0.05 for both Earnings and trend. This is a consequence of multicollinearity between them two. If I drop the Trend the T-value for earnings will be significant. DW close to close to 0 indicates positive autocorrelation and close to 4 indicates negative autocorrelation. When its close to 2 there is no autocorrelation and that is exactly what we want in the model.
So again, I decided to remove trend from my data and run the regression analysis again. This gave me a perfect model which is shown in the graph below.
Regression Analysis: Revenue (yt) versus Earnings, Revenue(yt-1)
Method
Rows unused | 1 |
Regression Equation
Revenue (yt) | = | -4892 + 8.14 Earnings + 1.0037 Revenue(yt-1) |
Coefficients
Term | Coef | SE Coef | T-Value | P-Value | VIF |
Constant | -4892 | 4117 | -1.19 | 0.238 | |
Earnings | 8.14 | 6.54 | 1.24 | 0.216 | 3.21 |
Revenue(yt-1) | 1.0037 | 0.0500 | 20.07 | 0.000 | 3.21 |
Model Summary
S | R-sq | R-sq(adj) | R-sq(pred) |
3912.21 | 94.21% | 94.08% | 93.27% |
Analysis of Variance
Source | DF | Adj SS | Adj MS | F-Value | P-Value |
Regression | 2 | 21920475987 | 10960237993 | 716.10 | 0.000 |
Earnings | 1 | 23720949 | 23720949 | 1.55 | 0.216 |
Revenue(yt-1) | 1 | 6165533565 | 6165533565 | 402.83 | 0.000 |
Error | 88 | 1346872288 | 15305367 | ||
Total | 90 | 23267348275 |
Fits and Diagnostics for Unusual Observations
Obs | Revenue (yt) | Fit | Resid | Std Resid | ||
77 | 22720 | 31089 | -8369 | -2.17 | R | |
80 | 35750 | 27259 | 8491 | 2.20 | R | |
81 | 29130 | 37719 | -8589 | -2.25 | R | |
84 | 43740 | 34912 | 8828 | 2.30 | R | |
85 | 35710 | 46023 | -10313 | -2.73 | R | |
88 | 60450 | 46113 | 14337 | 3.80 | R | |
89 | 51049 | 62933 | -11884 | -3.36 | R | X |
90 | 52890 | 53611 | -721 | -0.19 | X | |
91 | 56580 | 55492 | 1088 | 0.30 | X | |
92 | 72380 | 59203 | 13177 | 3.63 | R | X |
R Large residual X Unusual X
Durbin-Watson Statistic
Durbin-Watson Statistic = | 2.46554 |
- Incorporate seasonal dummies and trend into your model. Identify if you have seasonality, trend by checking their significance? Is that consistent with your previous findings?
Dummy variable are also called the indicator variables it is how we incorporate non continuous or qualitative date in our analysis. So, incorporating trend and seasonality in our regression model, Yt which is our dependent variable and is also a function of X1 which is Earnings. The coefficient in front of X1 is the marginal effects and beta zero is the intercept and errors. Anything that is not explained in the model, that will go to the error term. So, let us incorporate it in the model so error can look much better. Trend is going to keep a track of the number of years. My sample size is 92 so the trend will increase in the increments of 1. Similarly, seasonality would be into four quarters Q1,Q2,Q3 and Q4.
WORKSHEET 1
Regression Analysis: Revenue versus Earnings, T, q1, q2, q3
Method
Categorical predictor coding | (1, 0) |
Regression Equation
q1 | q2 | q3 | |||
Revenue | = | 27041 – 71.5 Earnings + 800 T | |||
1 | Revenue | = | 22892 – 71.5 Earnings + 800 T | ||
1 | Revenue | = | 22473 – 71.5 Earnings + 800 T | ||
1 | 1 | Revenue | = | 18324 – 71.5 Earnings + 800 T | |
1 | Revenue | = | 22774 – 71.5 Earnings + 800 T | ||
1 | 1 | Revenue | = | 18625 – 71.5 Earnings + 800 T | |
1 | 1 | Revenue | = | 18206 – 71.5 Earnings + 800 T | |
1 | 1 | 1 | Revenue | = | 14057 – 71.5 Earnings + 800 T |
Coefficients
Term | Coef | SE Coef | T-Value | P-Value | VIF |
Constant | 27041 | 44784 | 0.60 | 0.548 | |
Earnings | -71.5 | 90.6 | -0.79 | 0.432 | 120.16 |
T | 800 | 388 | 2.06 | 0.042 | 120.14 |
q1 | |||||
1 | -4267 | 2664 | -1.60 | 0.113 | 1.50 |
q2 | |||||
1 | -4568 | 2662 | -1.72 | 0.090 | 1.50 |
q3 | |||||
1 | -4149 | 2661 | -1.56 | 0.123 | 1.50 |
Model Summary
S | R-sq | R-sq(adj) | R-sq(pred) |
9022.42 | 70.09% | 68.35% | 64.94% |
Analysis of Variance
Source | DF | Adj SS | Adj MS | F-Value | P-Value |
Regression | 5 | 16401476827 | 3280295365 | 40.30 | 0.000 |
Earnings | 1 | 50676370 | 50676370 | 0.62 | 0.432 |
T | 1 | 345468005 | 345468005 | 4.24 | 0.042 |
q1 | 1 | 208856960 | 208856960 | 2.57 | 0.113 |
q2 | 1 | 239764768 | 239764768 | 2.95 | 0.090 |
q3 | 1 | 197858706 | 197858706 | 2.43 | 0.123 |
Error | 86 | 7000742191 | 81403979 | ||
Total | 91 | 23402219018 |
Fits and Diagnostics for Unusual Observations
Obs | Revenue | Fit | Resid | Std Resid | |
88 | 60450 | 35030 | 25420 | 2.92 | R |
89 | 51049 | 31134 | 19915 | 2.30 | R |
90 | 52890 | 30632 | 22258 | 2.60 | R |
91 | 56580 | 31565 | 25015 | 2.92 | R |
92 | 72380 | 36443 | 35937 | 4.17 | R |
R Large residual
Durbin-Watson Statistic
Durbin-Watson Statistic = | 0.126943 |
- Once you corrected for all possible problems, rewrite your final equation, INTERPRET the equation, and forecast y, for 35th in sample observation.
- Analyze the resulting residuals (4-in-11 plot in MINITAB)
- How does regression analysis perform compared to univariate methods you have learned? Create a table that includes the MSD for the univariate models (Trend, Smoothing, Decomposition) and MSE of regression model. (HINT 1: You don’t have to try ALL univariate models. Use your knowledge of your data. For example: If your revenue variable is trending, no need to run single smoothing. Or, if it is linear, no need to run nonlinear trend models etc. HINT 2: Your final is around the corner, no harm in reviewing the previous material ahead of time, either.)
Last Updated on November 16, 2020 by EssayPro