**SOC 2020: Statistics Final Exam B**

**FINAL EXAM – Statistics with SPSS**

For this exam, you will need to use SPSS. You need to save this completed document as a PDF and upload it to Moodle. In addition to this document, I expect you to turn in your *syntax*. Please go to Moodle, click on “Final Syntax,” and upload your syntax once you complete the exam. Be sure to upload the syntax, not the output!

At the beginning of your syntax please mimic the following format (2 points):

*[insert your name here]’s syntax for the final.

You will need to put notes in your syntax so I know what you are doing at what point. For example,

*Question 1. Run the syntax for a frequency distribution on the variable race.

Do this for every new question. It will help you work through the final, and it will also help me see what you were doing for each question. You will lose points if your syntax is missing notes. Don’t forget to save your syntax periodically. You will not receive credit for a question if the syntax is missing.


**Hypothesis Testing**

**(18 points)**

You are going to run a one-sample t-test with the variable “speduc.” This variable indicates the highest year of school completed for the respondent’s spouse.

- Create your own hypothesis based on what you think you know about the general population. Write this hypothesis below (Hint: you will need a number in your hypothesis) (2 points):

H0: µ = 0 vs. H1: µ ≠ 0

- Report the N___1399_____, the mean__13.41______, and the standard deviation _3.280_______ (3 points).
- Look at the output for the one-sample test.

Identify the t-statistic______152.968_____, the degrees of freedom____1398_____, and the level of significance _____5%______(3 points)

- Thinking back to your hypothesis, what can you conclude about the highest years of education among spouses in the American population (2 points)?

Because the p-value for this test is far below .05, we can reject H0: the mean highest year of school completed by spouses in the American population is significantly different from 0 (sample mean = 13.41 years).
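The one-sample t-statistic measures how far the sample mean sits from the hypothesized value, in standard errors: t = (x̄ − µ₀)/(s/√N), with N − 1 degrees of freedom. The minimal Python sketch below recomputes it from the rounded summary statistics reported above, assuming the test value µ₀ = 0 used in the hypothesis. The function name `one_sample_t` is illustrative, and because the inputs are rounded the result differs slightly from the 152.968 in the SPSS output.

```python
import math

def one_sample_t(mean, sd, n, mu0):
    """Return (t, df) for a one-sample t-test computed from summary statistics."""
    se = sd / math.sqrt(n)   # standard error of the mean
    t = (mean - mu0) / se    # distance from mu0 in standard-error units
    return t, n - 1          # degrees of freedom = N - 1

# Rounded values from the SPSS output above, tested against mu0 = 0.
t, df = one_sample_t(mean=13.41, sd=3.280, n=1399, mu0=0)
print(f"t = {t:.2f}, df = {df}")
```

Changing `mu0` to a substantive value (for example, 12 years of schooling) shows how the choice of test value in your hypothesis drives the size of t.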

Now you are going to run a two-sample (independent-samples) t-test. You will test the null hypothesis that whites and blacks have spouses with the same highest year of school completed, using the variables speduc and racecen1.

- Create a hypothesis (2 points):

H0: whites and blacks have spouses with the same mean highest year of school completed.

H1: whites and blacks have spouses with different mean highest years of school completed.

- Now report the mean for Whites_____13.81_____, and the mean for Blacks ______12.71__.

And what is the **actual** difference between these two groups?____1.102_____(3 points)

- Note which tailed test to use based on the hypothesis: ________2-tailed test_____. Write a brief conclusion about your results. Can you reject the null hypothesis? *Explain* (3 points).

Conclusion: Whites and blacks have spouses with different mean highest years of school completed (13.81 vs. 12.71 years).

Explanation: The two-tailed p-value (reported by SPSS as .000, meaning p < .001) is well below α = 0.05, so we reject the null hypothesis in favor of the alternative.
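SPSS's Independent-Samples T Test prints two rows: "Equal variances assumed" (the pooled-variance test) and "Equal variances not assumed" (Welch's test), along with Levene's test for choosing between them. The minimal Python sketch below shows the pooled version, assuming equal variances; the function name `pooled_t` and the three-value groups are hypothetical toy data, not the survey responses. It also shows that the rounded means above give a difference of 1.10, while SPSS's 1.102 comes from the unrounded means.

```python
import math
from statistics import mean, variance

def pooled_t(x, y):
    """Independent-samples t-test assuming equal variances (pooled variance)."""
    nx, ny = len(x), len(y)
    # Pooled estimate of the common variance, weighted by each group's degrees of freedom.
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))  # standard error of the difference in means
    return (mean(x) - mean(y)) / se, nx + ny - 2

# Difference between the rounded group means reported above (whites minus blacks).
diff = 13.81 - 12.71

# Hypothetical toy groups, used only to exercise the function.
t, df = pooled_t([12, 14, 16], [10, 12, 14])
print(f"difference = {diff:.2f}; toy t = {t:.3f}, df = {df}")
```

The decision rule is the same as in the explanation above: reject H0 when the two-tailed p-value for t is below α = 0.05.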

**Cross-Tabulations, Chi-Squares, and Measures of Association**

**(32 points)**

You are going to explore the relationship between sex and fepresch. This survey question asks respondents, “Do preschool children suffer when a mother works?” Responses range from strongly agree to strongly disagree.

- Provide a causal diagram that depicts your hypothesis of how these two variables are related (3 points):
- Use words to explain your causal diagram (2 points):
- What is the level of measurement for these two variables (2 points)?

Sex: nominal

Fepresch: ordinal (ordered responses from strongly agree to strongly disagree)

- Create a cross-tab with these two variables, and ask for a chi-square statistic. What percent of women strongly disagree with the statement?__________

What percent of men strongly agree with the statement (2 points)?_________

- What is your Pearson’s Chi-Square statistic?__________ and the degrees of freedom?_______ What is the level of significance reported in the output?___________(3 points)
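Pearson's chi-square compares each observed cell count with the count expected if the two variables were independent (row total × column total ÷ N), and its degrees of freedom are (rows − 1) × (columns − 1). The minimal Python sketch below computes both, plus the within-row percentages used to answer questions such as "what percent of women strongly disagree," assuming sex is placed in the rows. The function names and the 2 × 4 counts are hypothetical illustrations, not the survey data.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def row_percentages(table):
    """Percent of each row that falls in each column."""
    return [[100 * cell / sum(row) for cell in row] for row in table]

# Hypothetical counts: rows = two sex categories, columns = four ordered response categories.
counts = [[10, 20, 30, 40],
          [20, 30, 30, 20]]
chi2, df = chi_square(counts)
pct = row_percentages(counts)
print(f"chi-square = {chi2:.2f}, df = {df}; last-column % in row 1 = {pct[0][3]:.1f}")
```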

In this section you will need to consider the variables childs and sibs. Before proceeding, I would like you to recode the two variables (4 points).

- For the variable sibs, create a new variable called sibs3 that has only three categories: single child (0), standard family (1-3), and large family (4+). Now, check your recode: What percent of respondents report having 1-3 siblings (1 point)?______
- For the variable childs, create a new variable called child3 that has only three categories: no children (0), standard family (1-3), and large family (4+). Now, check your recode: What percent of respondents report having 4 or more children (1 point)?__________
- You will now run a crosstab between sibs3 and child3. Think about which variable should be the independent variable and which variable should be the dependent variable. Draw a causal diagram that depicts the relationship you would expect between these two variables (3 points).
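The recode for sibs3 and child3 collapses a count into three ordered categories and leaves missing answers missing; in SPSS syntax this corresponds to a RECODE ... INTO command, checked afterward with FREQUENCIES on the new variable. The minimal Python sketch below mirrors that logic. The function names and the short `sibs` list are hypothetical, and missing responses are represented as `None` by assumption.

```python
def recode_three(count):
    """Collapse a count into 0 (none), 1 (1-3), or 2 (4 or more); None stays missing."""
    if count is None:
        return None  # keep missing values missing instead of lumping them into a category
    if count == 0:
        return 0
    if count <= 3:
        return 1
    return 2

def valid_percent(values, category):
    """Percent of non-missing values equal to category (like SPSS's 'Valid Percent' column)."""
    valid = [v for v in values if v is not None]
    return 100 * valid.count(category) / len(valid)

sibs = [0, 2, 3, 5, None, 1, 8, 0]  # hypothetical responses, not the survey data
sibs3 = [recode_three(s) for s in sibs]
print(sibs3, f"{valid_percent(sibs3, 1):.1f}% in the 1-3 category")
```

Checking the recoded variable's percentages against the original frequency table is a quick way to catch an off-by-one boundary (for example, putting 3 in the large-family category).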

Don’t forget to put these variables into the correct boxes (rows or columns) in SPSS. Ask SPSS to provide column percentages, and include the **best** measure of association for examining these variables. Summarize the relationship between siblings and number of children below (be sure to talk about the actual variables).

- Is there a relationship? Write a sentence that states this, using the *variables* (2 points).
- What is the strength of the relationship? What is the maximum difference? (2 points)
- What is the direction of the relationship (1 point)?
- What measure of association *should* you choose? _________ Justify why this is the best measure of association to use (3 points).
- What do the measure of association and its associated p-value tell you about the observed relationship between sibs3 and child3 (2 points)?
- Assume that both sibs and childs are true interval-ratio variables (we know that childs is not, based on its last value, but we would likely treat it as one anyway). What measure of association should you use if you want to examine the relationship between sibs and childs rather than sibs3 and child3 (1 point)?
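For two ordered variables such as sibs3 and child3, ordinal measures are available in the SPSS Crosstabs Statistics dialog (gamma, Somers' d, Kendall's tau-b and tau-c). The minimal Python sketch below computes one of them, Goodman and Kruskal's gamma, as (concordant − discordant) ÷ (concordant + discordant) pairs, along with the maximum percentage-point difference across column percentages. It assumes rows hold the dependent variable and columns the independent variable, both ordered low to high; the function names and the 2 × 2 table are hypothetical, and the sketch does not settle which measure is best for this question.

```python
def gamma(table):
    """Goodman and Kruskal's gamma for a table whose rows and columns are both ordered low to high."""
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # every cell in a higher row
                for m in range(n_cols):
                    if m > j:                    # higher on both variables: concordant
                        concordant += table[i][j] * table[k][m]
                    elif m < j:                  # higher on one, lower on the other: discordant
                        discordant += table[i][j] * table[k][m]
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)

def max_difference(table):
    """Largest percentage-point gap across columns within any row of column percentages."""
    col_totals = [sum(col) for col in zip(*table)]
    pct = [[100 * cell / col_totals[j] for j, cell in enumerate(row)] for row in table]
    return max(max(row) - min(row) for row in pct)

# Hypothetical counts: rows = DV (low, high), columns = IV (low, high).
table = [[10, 5],
         [5, 10]]
g = gamma(table)
maxdiff = max_difference(table)
print(f"gamma = {g:.2f}, maximum difference = {maxdiff:.1f} points")
```

A positive gamma here means that respondents higher on the column variable tend also to be higher on the row variable; a negative value means the reverse.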


**Correlation and Bivariate Regression**

**(20 points)**

Look at the list of interval-ratio variables. We are going to use four of these variables in the following questions. The dependent variable for all of your analyses will be childs (we are treating this as an interval-ratio variable, despite its last category).

- childs (number of children) – DV
- educ (highest year of school completed) – IV
- agewed (age when first married) – IV
- sex (sex of respondent) – IV

- The first independent variable you will use is educ. Draw a causal diagram that depicts the relationship you expect to see between educ and childs (3 points):
- Translate your diagram into words (2 points):

You will run a regression model with these two variables of interest (educ and childs).

- We are interested in the correlation coefficient. What is this number?_______ How do you interpret this correlation coefficient (3 points)?
- What is the value of the R-square?______ What percent of the variation in your dependent variable is explained with your independent variable (2 points)?__________
- What is the coefficient for your independent variable?________ What is the coefficient for the intercept term?________ Interpret the coefficient for your *independent* variable (4 points).
- Now I want you to use the equation Ŷ = a + bX to make a specific statement using the value “14” for your independent variable educ. Show me your work for solving for Ŷ below (3 points).
- What does the answer, when you solve for Ŷ, tell you? (3 points)
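In a bivariate regression, the slope is b = Σ(x − x̄)(y − ȳ) ÷ Σ(x − x̄)², the intercept is a = ȳ − b·x̄, and R-square equals the squared correlation r², so 100 × R² is the percent of variation in the dependent variable explained. The minimal Python sketch below computes these by hand and then plugs X = 14 into Ŷ = a + bX. The function name and the four toy data points are hypothetical, not the survey data.

```python
import math

def simple_regression(x, y):
    """OLS intercept a, slope b, Pearson r, and R-square for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx                   # change in predicted y per one-unit increase in x
    a = my - b * mx                 # predicted y when x = 0
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return a, b, r, r ** 2

# Hypothetical toy data: years of education and number of children.
educ = [10, 12, 14, 16]
childs = [3, 2, 2, 1]
a, b, r, r2 = simple_regression(educ, childs)
y_hat_14 = a + b * 14  # plug educ = 14 into Y-hat = a + bX
print(f"a = {a:.2f}, b = {b:.2f}, r = {r:.3f}, R-square = {r2:.2f}, Y-hat(14) = {y_hat_14:.2f}")
```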

**Multiple Regressions**

**(28 points)**

Now you’re going to add a third variable to your model, and run a multiple regression. Use the variable agewed in your model (keeping your original independent variable of educ in the model).

- What is the coefficient for educ? (1 point) _____________

What is the coefficient for agewed? (1 point) ______________

What is the intercept term? (1 point) _______________

Which of these independent variables (if any) are statistically significant at the .05 level? (2 points)

Interpret the coefficient for the independent variable AGEWED. (3 points)

Interpret the coefficient for the ** intercept **. (3 points)

- Did adding the variable “agewed” improve our regression model? ____________

Justify your answer (3 points):

- Let’s use our equation to make a prediction. Use the number “18” for educ and the number “30” for agewed. Plug these numbers into the equation below and solve for Ŷ for this specific kind of person (3 points):

Ŷ = a + b₁X₁ + b₂X₂

- Your friend Jeremy has no idea what you are doing and doesn’t understand statistics. Explain your prediction from part C (the Ŷ you solved for above) to Jeremy (3 points):
- You are going to produce a multiple regression that uses educ, agewed, and sex as independent variables to predict childs. In order to use the nominal variable sex, you will need to recode it. Recode it to indicate whether or not the respondent is *male* (3 points).
- Is the new variable (male) statistically significant? (1 point) _________
- Write a sentence or two that summarizes your findings below (2 points):
- Interpret the coefficient for “male.” (2 points)
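With two predictors, each slope is a partial effect: the change in predicted childs for a one-unit increase in that variable, holding the other constant. The minimal Python sketch below solves the two-predictor least-squares equations in closed form from deviation sums of squares, then plugs X₁ = 18 and X₂ = 30 into Ŷ = a + b₁X₁ + b₂X₂. It also shows a male dummy recode, assuming sex is coded 1 = male and 2 = female (check the codebook). The function names are hypothetical, and the toy data are generated from a known equation (Ŷ = 1 + 2X₁ − 0.5X₂) so the fit can be checked; they are not the survey data. Adding a third predictor such as male requires solving a larger system, which SPSS's Linear Regression does for you.

```python
def two_predictor_ols(x1, x2, y):
    """OLS intercept and slopes for y = a + b1*x1 + b2*x2, from deviation sums of squares."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

def male_dummy(sex):
    """Recode sex to 1 = male, 0 = not male; assumes 1 = male, 2 = female, None = missing."""
    if sex is None:
        return None  # keep missing values missing
    return 1 if sex == 1 else 0

# Hypothetical toy data generated exactly from y = 1 + 2*x1 - 0.5*x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [2, 4.5, 5, 7.5, 8]
a, b1, b2 = two_predictor_ols(x1, x2, y)
pred = a + b1 * 18 + b2 * 30  # Y-hat for X1 = 18, X2 = 30
print(f"a = {a:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}, Y-hat = {pred:.2f}")
```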

******************************************************************************

EXTRA CREDIT: What is the name of Professor Paino’s cat? (1 point)

This is the extra credit question I have used every year for the past ten years. But, since I recently said goodbye to my kitty, this is the last year this question will be used.

Hint: He made an appearance in some notes and activities.