Numerically Summarizing Data

Chapter 3 – Numerically Summarizing Data

OUTLINE

3.1 Measures of Central Tendency

3.2 Measures of Dispersion

3.3 Measures of Central Tendency and Dispersion from Grouped Data

3.4 Measures of Position

3.5 The Five-Number Summary and Boxplots

Putting It Together

When we look at a distribution of data, we should consider three characteristics of the distribution: shape, center, and spread. In the last chapter, we discussed methods for organizing raw data into tables and graphs. These graphs (such as the histogram) allow us to identify the shape of the distribution: symmetric (in particular, bell shaped or uniform), skewed right, or skewed left.

The center and spread are numerical summaries of the data. The center of a data set is commonly called the average. There are many ways to describe the average value of a distribution. In addition, there are many ways to measure the spread of a distribution. The most appropriate measure of center and spread depends on the distribution’s shape.

Once these three characteristics of the distribution are known, we can analyze the data for interesting features, including unusual data values, called outliers.

Section 3.1

Measures of Central Tendency

Objectives

Determine the Arithmetic Mean of a Variable from Raw Data

Determine the Median of a Variable from Raw Data

Explain What It Means for a Statistic to be Resistant

Determine the Mode of a Variable from Raw Data

Objective 1: Determine the Arithmetic Mean of a Variable from Raw Data

Introduction, Page 1

Answer the following after watching the video.
1) What does a measure of central tendency describe?

Objective 1, Page 1

2) Explain how to compute the arithmetic mean of a variable.

3) What symbols are used to represent the population mean and the sample mean?

Objective 1, Page 2

4) List the formulas used to compute the population mean and the sample mean.

Note: Throughout this course, we agree to round the mean to one more decimal place than that in the raw data.

Objective 1, Page 3

Example 1 Computing a Population Mean and a Sample Mean

Table 1 shows the first exam scores of the ten students enrolled in Introductory Statistics.

Table 1

Student Score

Michelle 82
Ryanne 77
Bilal 90
Pam 71
Jennifer 62
Dave 68
Joel 74
Sam 84
Justine 94
Juan 88

A) Compute the population mean, μ.

B) Find a simple random sample of size n = 4 students.

C) Compute the sample mean, x̄, of the sample found in part (B).
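As a rough check on Example 1, the Python sketch below computes the population mean of the ten scores in Table 1 and the mean of one simple random sample of size 4. The names `scores`, `population_mean`, and `sample_mean`, as well as the seed value, are illustrative choices and not part of the course materials. A different seed gives a different sample and usually a different sample mean.

```python
import random
import statistics

# First exam scores from Table 1 (the entire population of ten students).
scores = {
    "Michelle": 82, "Ryanne": 77, "Bilal": 90, "Pam": 71, "Jennifer": 62,
    "Dave": 68, "Joel": 74, "Sam": 84, "Justine": 94, "Juan": 88,
}

# Part A: population mean = (sum of all N values) / N.
population_mean = statistics.mean(scores.values())

# Part B: one simple random sample of n = 4 students, drawn without replacement.
rng = random.Random(2019)  # arbitrary seed, fixed only so the sketch is repeatable
sample_names = rng.sample(sorted(scores), k=4)

# Part C: sample mean = (sum of the n sampled values) / n.
sample_mean = statistics.mean(scores[name] for name in sample_names)

print(f"population mean = {population_mean}")
print(f"sample = {sample_names}, sample mean = {sample_mean}")
```

The population mean is a fixed number (a parameter), while the sample mean changes from sample to sample (a statistic).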

Objective 1, Page 5

Answer the following after experimenting with the fulcrum animation.
5) What is the mean of the data?

6) Explain why it is helpful to think of the mean as the center of gravity.

Objective 2: Determine the Median of a Variable from Raw Data

Objective 2, Page 1

7) Define the median of a variable.

Objective 2, Page 2

8) List the three steps in finding the median of a data set.

Objective 2, Page 3

Example 2 Determining the Median of a Data Set (Odd Number of Observations)

Table 2 shows the length (in seconds) of a random sample of songs released in the 1970s. Find the median length of the songs.

Table 2

Song Name Length

“Sister Golden Hair” 201

“Black Water” 257

“Free Bird” 284

“The Hustle” 208

“Southern Nights” 179

“Stayin’ Alive” 222

“We Are Family” 217

“Heart of Glass” 206

“My Sharona” 240
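The usual three steps for the median (arrange the data in ascending order, count the observations, locate the middle) can be sketched in Python as follows. The function name `median` and the variable names are illustrative. The same function handles the even case needed for the Table 1 scores.

```python
def median(values):
    """Median via three steps: sort, count, locate the middle."""
    ordered = sorted(values)   # Step 1: arrange in ascending order
    n = len(ordered)           # Step 2: count the observations
    mid = n // 2               # Step 3: locate the middle position
    if n % 2 == 1:
        return ordered[mid]    # odd n: the single middle value
    # even n: mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

# Song lengths in seconds from Table 2 (n = 9, odd).
song_lengths = [201, 257, 284, 208, 179, 222, 217, 206, 240]
print(median(song_lengths))  # 217

# Exam scores from Table 1 (n = 10, even).
exam_scores = [82, 77, 90, 71, 62, 68, 74, 84, 94, 88]
print(median(exam_scores))   # 79.5
```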

Objective 2, Page 5

Example 3 Determining the Median of a Data Set (Even Number of Observations)

Find the median score of the data in Table 1.

Table 1

Student Score

Michelle 82
Ryanne 77
Bilal 90
Pam 71
Jennifer 62
Dave 68
Joel 74
Sam 84
Justine 94
Juan 88

Objective 3: Explain What It Means for a Statistic to be Resistant

Objective 3, Page 1

Answer the following as you work through the Mean versus Median Applet.
9) When the mean and median are approximately 2, how does adding a single observation near 9 affect the mean? How does it affect the median?

10) When the mean and median are approximately 2, how does adding a single observation near 24 affect the mean? The median?

Objective 3, Page 1 (continued)

11) When the mean and median are approximately 40, how does dragging the new observation from 35 toward 0 affect the mean? How does it affect the median?

Objective 3, Page 2

Answer the following as you watch the video.
12) Which measure, the mean or the median, is less affected by extreme observations?

13) Define what it means for a numerical summary of data to be resistant.

14) Which measure, the mean or the median, is resistant?

Objective 3, Page 3

15) State the reason that we compute the mean.

Objective 3, Page 7

Answer the following as you work through Activity 2: Relation among the Mean, Median, and Distribution Shape.

16) If a distribution is skewed left, what is the relation between the mean and median?

17) If a distribution is skewed right, what is the relation between the mean and median?

Objective 3, Page 7 (continued)

18) If a distribution is symmetric, what is the relation between the mean and median?

Objective 3, Page 11

19) Sketch three graphs showing the relation between the mean and median for distributions that are skewed left, symmetric, and skewed right.

Objective 3, Page 12

Example 4 Describing the Shape of a Distribution

The data in Table 4 represent the birth weights (in pounds) of 50 randomly sampled babies.

A) Find the mean and median birth weight.
B) Describe the shape of the distribution.
C) Which measure of central tendency best describes the average birth weight?

Table 4

5.8 7.4 9.2 7.0 8.5 7.6

7.9 7.8 7.9 7.7 9.0 7.1

8.7 7.2 6.1 7.2 7.1 7.2

7.9 5.9 7.0 7.8 7.2 7.5

7.3 6.4 7.4 8.2 9.1 7.3

9.4 6.8 7.0 8.1 8.0 7.5

7.3 6.9 6.9 6.4 7.8 8.7

7.1 7.0 7.0 7.4 8.2 7.2

7.6 6.7
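For part (A), a short Python sketch using the standard `statistics` module is shown below. The variable names are illustrative. Comparing the two results is only one clue about shape, so a histogram of the data is still needed for part (B).

```python
import statistics

# Birth weights in pounds, typed in row by row from Table 4 (n = 50).
birth_weights = [
    5.8, 7.4, 9.2, 7.0, 8.5, 7.6,
    7.9, 7.8, 7.9, 7.7, 9.0, 7.1,
    8.7, 7.2, 6.1, 7.2, 7.1, 7.2,
    7.9, 5.9, 7.0, 7.8, 7.2, 7.5,
    7.3, 6.4, 7.4, 8.2, 9.1, 7.3,
    9.4, 6.8, 7.0, 8.1, 8.0, 7.5,
    7.3, 6.9, 6.9, 6.4, 7.8, 8.7,
    7.1, 7.0, 7.0, 7.4, 8.2, 7.2,
    7.6, 6.7,
]

mean_weight = statistics.mean(birth_weights)
median_weight = statistics.median(birth_weights)

# Round the mean to one more decimal place than the raw data.
print(f"n = {len(birth_weights)}")
print(f"mean = {mean_weight:.2f} lb, median = {median_weight:.2f} lb")
```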

Objective 4: Determine the Mode of a Variable from Raw Data

Objective 4, Page 1

20) Define the mode of a variable.

21) Under what conditions will a set of data have no mode?

22) Under what conditions will a set of data have two modes?

Objective 4, Page 2

Example 5 Finding the Mode of Quantitative Data

The following data represent the number of O-ring failures on the shuttle Columbia for the 17 flights prior to its fatal flight:

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 3

Find the mode number of O-ring failures.
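A minimal Python sketch for finding the mode is given below. It assumes the convention that a data set has no mode when no observation occurs more than once. The function name `modes` is illustrative, and it returns a list so that bimodal or multimodal data are also covered. Because it only counts occurrences, it works for qualitative labels as well as numbers.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every observation occurs once: no mode
    return [value for value, count in counts.items() if count == top]

# O-ring failures for the 17 flights listed in Example 5.
o_ring_failures = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 3]
print(modes(o_ring_failures))  # [0]

# Exam scores from Table 1: each score occurs exactly once.
exam_scores = [82, 77, 90, 71, 62, 68, 74, 84, 94, 88]
print(modes(exam_scores))      # []
```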

Objective 4, Page 3

Example 6 Finding the Mode of Quantitative Data

Find the mode of the exam score data listed in Table 1.

Table 1

Student Score

Michelle 82
Ryanne 77
Bilal 90
Pam 71
Jennifer 62
Dave 68
Joel 74
Sam 84
Justine 94
Juan 88

Objective 4, Page 5

23) What does it mean when we say that a data set is bimodal? Multimodal?

Objective 4, Page 6

Example 7 Finding the Mode of Qualitative Data

The data in Table 5 represent the location of injuries that required rehabilitation by a physical therapist. Determine the mode location of injury.

Table 5

Back Back Hand Neck Knee Knee

Wrist Back Groin Shoulder Shoulder Back

Elbow Back Back Back Back Back

Back Shoulder Shoulder Knee Knee Back

Hip Knee Hip Hand Back Wrist

Data from Krystal Catton, student at Joliet Junior College

Objective 4, Page 8

Summary

24) List the conditions for determining when to use the following measures of central tendency.

A) Mean

B) Median

C) Mode

Section 3.2

Measures of Dispersion

Objectives

Determine the Range of a Variable from Raw Data

Determine the Standard Deviation of a Variable from Raw Data

Determine the Variance of a Variable from Raw Data

Use the Empirical Rule to Describe Data That Are Bell-Shaped

Introduction, Page 1

Measures of central tendency describe the typical value of a variable. We also want to know the amount of dispersion (or spread) in the variable. Dispersion is the degree to which the data are spread out.

Introduction, Page 2

Example 1 Comparing Two Sets of Data

The data tables represent the IQ scores of a random sample of 100 students from two different universities.

For each university, compute the mean IQ score and draw a histogram, using a lower class limit of 55 for the first class and a class width of 15. Comment on the results.

Objective 1: Determine the Range of a Variable from Raw Data

Objective 1, Page 1

1) What is the range of a variable?

Objective 1, Page 2

Example 2 Computing the Range of a Set of Data

The data in the table represent the first exam scores of 10 students enrolled in Introductory Statistics. Compute the range.

Student Score

Michelle 82
Ryanne 77
Bilal 90
Pam 71
Jennifer 62
Dave 68
Joel 74
Sam 84
Justine 94
Juan 88

Objective 2: Determine the Standard Deviation of a Variable from Raw Data

Objective 2, Page 1

2) Explain how to compute the population standard deviation σ and list its formula.

Objective 2, Page 2

Example 3 Computing a Population Standard Deviation

Compute the population standard deviation of the test scores in Table 6.

Table 6

Student Score

Michelle 82
Ryanne 77
Bilal 90
Pam 71
Jennifer 62
Dave 68
Joel 74
Sam 84
Justine 94
Juan 88
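The computation in Example 3 can be checked with a few lines of Python. This is a sketch that follows the definition directly (square root of the mean of the squared deviations about the population mean) and then compares the result with `statistics.pstdev`. The variable names are illustrative.

```python
import math
import statistics

# Test scores from Table 6, treated as the entire population.
scores = [82, 77, 90, 71, 62, 68, 74, 84, 94, 88]

N = len(scores)
mu = sum(scores) / N                                  # population mean
squared_deviations = [(x - mu) ** 2 for x in scores]  # (x_i - mu)^2 for each score
sigma = math.sqrt(sum(squared_deviations) / N)        # divide by N, then take the root

# The standard library's population standard deviation should agree.
assert math.isclose(sigma, statistics.pstdev(scores))

print(round(sigma, 1))  # 9.8
```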

Objective 2, Page 5

3) If a data set has many values that are “far” from the mean, how is the standard deviation affected?

Objective 2, Page 6

4) Explain how to compute the sample standard deviation s and list its formula.

Objective 2, Page 7

5) What do we call the expression n − 1?

Objective 2, Page 8

Example 4 Computing a Sample Standard Deviation

In a previous lesson, we obtained a simple random sample of exam scores and computed a sample mean of 73.75. Compute the sample standard deviation of that sample of test scores.

Objective 2, Page 10

Answer the following after you watch the video.
6) Is standard deviation resistant? Why or why not?

Objective 2, Page 11

7) When comparing two populations, what does a larger standard deviation imply about dispersion?

Objective 2, Page 14

Example 5 Comparing the Standard Deviations of Two Sets of Data

The data tables represent the IQ scores of a random sample of 100 students from two different universities.

Use the standard deviation to determine whether University A or University B has more dispersion in the IQ scores of its students.

Objective 2, Page 17

Answer the following after using the applet in Activity 1: Standard Deviation as a Measure of Spread.

8) Compare the dispersion of the observations in Part A with the observations in Part B. Which set of data is more spread out?

9) In Part D, how does adding a point near 10 affect the standard deviation? How is the standard deviation affected when that point is moved near 25? What does this suggest?

Objective 2, Page 18

Watch the video to reinforce the ideas from Activity 1: Standard Deviation as a Measure of Spread.

Objective 3: Determine the Variance of a Variable from Raw Data

Objective 3, Page 1

10) Define variance.

Objective 3, Page 2

Example 6 Determining the Variance of a Variable for a Population and a Sample

In previous examples, we considered population data of exam scores in a statistics class. For these data, we computed a population mean of μ = 79 points and a population standard deviation of σ ≈ 9.8 points. Then, we obtained a simple random sample of exam scores. For the sample, we computed a sample mean of x̄ = 73.75 points and a sample standard deviation of s points (the value found in Example 4). Use the population standard deviation and the sample standard deviation to determine the population and sample variance of scores on the statistics exam.

Objective 3, Page 3

Answer the following after you watch the video.
11) Using a rounded value of the standard deviation to obtain the variance results in a round-off error. How should you deal with this issue?

Objective 3, Page 5

Whenever a statistic consistently underestimates a parameter, it is said to be biased. To obtain an unbiased estimate of the population variance, divide the sum of the squared deviations about the sample mean by n − 1.
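The effect of the divisor can be illustrated with a small enumeration. The sketch below treats the ten Table 1 scores as the population and lists every ordered sample of size 4. It assumes sampling with replacement so that the observations are independent; that assumption, and all variable names, are choices made for this illustration only. It averages the two candidate variance estimates over all samples.

```python
from itertools import product

# Population: the ten exam scores from Table 1.
scores = [82, 77, 90, 71, 62, 68, 74, 84, 94, 88]
N = len(scores)
mu = sum(scores) / N
sigma_sq = sum((x - mu) ** 2 for x in scores) / N  # population variance

n = 4
biased_total = 0.0    # running sum of estimates that divide by n
unbiased_total = 0.0  # running sum of estimates that divide by n - 1
count = 0

# Every ordered sample of size n, drawn with replacement (10**4 samples).
for sample in product(scores, repeat=n):
    x_bar = sum(sample) / n
    ss = sum((x - x_bar) ** 2 for x in sample)  # squared deviations about x_bar
    biased_total += ss / n
    unbiased_total += ss / (n - 1)
    count += 1

avg_biased = biased_total / count
avg_unbiased = unbiased_total / count

print(f"population variance         = {sigma_sq:.1f}")
print(f"average when dividing by n  = {avg_biased:.1f}")
print(f"average when dividing by n-1 = {avg_unbiased:.1f}")
```

Under these assumptions, dividing by n gives estimates that average out below the population variance, while dividing by n − 1 gives estimates that average out to the population variance itself.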

Objective 4: Use the Empirical Rule to Describe Data That Are Bell-Shaped

Objective 4, Page 1

12) According to the Empirical Rule, if a distribution is roughly bell shaped, then approximately what percent of the data will lie within 1 standard deviation of the mean? What percent of the data will lie within 2 standard deviations of the mean? What percent of the data will lie within 3 standard deviations of the mean?

Objective 4, Page 2

13) Sketch the third part of Figure 5.

Objective 4, Page 3

Example 7 Using the Empirical Rule

Table 9 represents the IQs of a random sample of 100 students at a university.

A) Determine the percentage of students who have IQ scores within 3 standard deviations of the mean according to the Empirical Rule.
B) Determine the percentage of students who have IQ scores between 67.8 and 132.2 according to the Empirical Rule.
C) Determine the actual percentage of students who have IQ scores between 67.8 and 132.2.
D) According to the Empirical Rule, what percentage of students have IQ scores between 116.1 and 148.3?

Table 9

73 103 91 93 136 108 92 104 90 78

108 93 91 78 81 130 82 86 111 93

102 111 125 107 80 90 122 101 82 115

103 110 84 115 85 83 131 90 103 106

71 69 97 130 91 62 85 94 110 85

102 109 105 97 104 94 92 83 94 114

107 94 112 113 115 106 97 106 85 99

102 109 76 94 103 112 107 101 91 107

107 110 106 103 93 110 125 101 91 119

118 85 127 141 129 60 115 80 111 79
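To compare the Empirical Rule with the actual data in part (C), the sketch below counts how many of the 100 IQ scores fall within 1, 2, and 3 sample standard deviations of the sample mean. The values are typed in from Table 9, reading the third entry of the seventh row as 112, which is the value that makes the mean consistent with the interval 67.8 to 132.2 used in the example. The function name `percent_within` is illustrative.

```python
import statistics

# IQ scores from Table 9, row by row (n = 100).
iq_scores = [
    73, 103, 91, 93, 136, 108, 92, 104, 90, 78,
    108, 93, 91, 78, 81, 130, 82, 86, 111, 93,
    102, 111, 125, 107, 80, 90, 122, 101, 82, 115,
    103, 110, 84, 115, 85, 83, 131, 90, 103, 106,
    71, 69, 97, 130, 91, 62, 85, 94, 110, 85,
    102, 109, 105, 97, 104, 94, 92, 83, 94, 114,
    107, 94, 112, 113, 115, 106, 97, 106, 85, 99,
    102, 109, 76, 94, 103, 112, 107, 101, 91, 107,
    107, 110, 106, 103, 93, 110, 125, 101, 91, 119,
    118, 85, 127, 141, 129, 60, 115, 80, 111, 79,
]

x_bar = statistics.mean(iq_scores)  # sample mean
s = statistics.stdev(iq_scores)     # sample standard deviation (divisor n - 1)

def percent_within(k):
    """Actual percentage of scores within k standard deviations of the mean."""
    low, high = x_bar - k * s, x_bar + k * s
    inside = sum(1 for x in iq_scores if low <= x <= high)
    return 100 * inside / len(iq_scores)

print(f"mean = {x_bar:.1f}, s = {s:.1f}")
for k, rule in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    print(f"within {k} s: actual {percent_within(k):.0f}%, Empirical Rule about {rule}")
```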

Section 3.3

Measures of Central Tendency and Dispersion from Grouped Data

Objectives

Approximate the Mean of a Variable from Grouped Data

Compute the Weighted Mean

Approximate the Standard Deviation from a Frequency Distribution

Objective 1: Approximate the Mean of a Variable from Grouped Data

Objective 1, Page 1

1) Explain how to find the class midpoint.

2) List the formulas for approximating the population mean and sample mean from a frequency distribution.

Objective 1, Page 2

Example 1 Approximating the Mean for Continuous Quantitative Data from a Frequency Distribution

The frequency distribution in Table 10 represents the five-year rate of return of a random sample of 40 large-blend mutual funds. Approximate the mean five-year rate of return.

Table 10

Class (5-year rate of return) Frequency

8-8.99 2

9-9.99 2

10-10.99 4

11-11.99 1

12-12.99 6

13-13.99 13

14-14.99 7

15-15.99 3

16-16.99 1

17-17.99 0

18-18.99 0

19-19.99 1
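The approximation in Example 1 can be sketched in Python as shown below. It assumes that each class midpoint is the average of consecutive lower class limits (so the class 8-8.99 has midpoint 8.5) and that every observation in a class is represented by that midpoint. The variable names are illustrative.

```python
# (lower class limit, frequency) pairs from Table 10.
classes = [(8, 2), (9, 2), (10, 4), (11, 1), (12, 6), (13, 13),
           (14, 7), (15, 3), (16, 1), (17, 0), (18, 0), (19, 1)]
class_width = 1  # difference between consecutive lower class limits

# Midpoint of each class: lower limit plus half the class width.
midpoints = [lower + class_width / 2 for lower, _ in classes]
frequencies = [freq for _, freq in classes]

n = sum(frequencies)  # total number of funds in the sample
# Approximate mean: sum of (midpoint * frequency) divided by n.
approx_mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n

print(n, approx_mean)  # 40 13.1
```

The result is an approximation because the raw rates of return inside each class are replaced by the class midpoint.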

Objective 1, Page 2 (Continued)

Objective 2: Compute the Weighted Mean

Objective 2, Page 1

3) When data values have different importance, or weights, associated with them, we compute the weighted mean. Explain how to compute the weighted mean and list its formula.

Objective 2, Page 2

Example 2 Computing the Weighted Mean

Marissa just completed her first semester in college. She earned an A in her 4-hour statistics course, a B in her 3-hour sociology course, an A in her 3-hour psychology course, a C in her 5-hour computer programming course, and an A in her 1-hour drama course. Determine Marissa’s grade point average.
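A weighted mean multiplies each value by its weight, adds the products, and divides by the sum of the weights. The sketch below applies this to Example 2, assuming the common 4-point scale (A = 4, B = 3, C = 2), which the example does not state explicitly. The dictionary `grade_points` and the other names are illustrative.

```python
# Assumed grade-point scale (not given in the example).
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

# (course, credit hours, letter grade) from Example 2.
courses = [
    ("statistics", 4, "A"),
    ("sociology", 3, "B"),
    ("psychology", 3, "A"),
    ("computer programming", 5, "C"),
    ("drama", 1, "A"),
]

# Weights are the credit hours; values are the grade points.
total_hours = sum(hours for _, hours, _ in courses)
quality_points = sum(hours * grade_points[grade] for _, hours, grade in courses)

gpa = quality_points / total_hours
print(quality_points, total_hours, gpa)  # 51 16 3.1875
```

Rounded to two decimal places, the grade point average under this scale is 3.19.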

Objective 3: Approximate the Standard Deviation from a Frequency Distribution

Objective 3, Page 1

4) List the formulas for approximating the population standard deviation and sample standard deviation of a variable from a frequency distribution.

Objective 3, Page 2

Example 3 Approximating the Standard Deviation from a Frequency Distribution

The frequency distribution in Table 11 represents the five-year rate of return of a random sample of 40 large-blend mutual funds. Approximate the standard deviation of the five-year rate of return.

Table 11

Class (5-year rate of return) Frequency

8-8.99 2

9-9.99 2

10-10.99 4

11-11.99 1

12-12.99 6

13-13.99 13

14-14.99 7

15-15.99 3

16-16.99 1

17-17.99 0

18-18.99 0

19-19.99 1
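The approximation in Example 3 can be sketched as follows. As with the grouped mean, it assumes class midpoints at the lower class limit plus half the class width, and because the data are a sample it divides the weighted sum of squared deviations by n − 1. The variable names are illustrative.

```python
import math

# (lower class limit, frequency) pairs from Table 11.
classes = [(8, 2), (9, 2), (10, 4), (11, 1), (12, 6), (13, 13),
           (14, 7), (15, 3), (16, 1), (17, 0), (18, 0), (19, 1)]
class_width = 1

midpoints = [lower + class_width / 2 for lower, _ in classes]
frequencies = [freq for _, freq in classes]
n = sum(frequencies)

# Approximate sample mean from the class midpoints.
x_bar = sum(x * f for x, f in zip(midpoints, frequencies)) / n

# Weighted sum of squared deviations of the midpoints about x_bar.
ss = sum(f * (x - x_bar) ** 2 for x, f in zip(midpoints, frequencies))

# Sample standard deviation: divide by n - 1, then take the square root.
s = math.sqrt(ss / (n - 1))

print(round(s, 2))  # 2.18
```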

Section 3.4

Measures of Position

Objectives

Determine and Interpret z-Scores

Interpret Percentiles

Determine and Interpret Quartiles

Determine and Interpret the Interquartile Range

Check a Set of Data for Outliers

Objective 1: Determine and Interpret z-Scores

Objective 1, Page 1

1) What does a z-score represent?

2) Explain how to find a z-score and list the formulas for computing a population z-score and a sample z-score.

3) What does a positive z-score for a data value indicate? What does a negative z-score indicate?

4) What does a z-score measure?

Objective 1, Page 1 (continued)

5) How are z-scores rounded?

Objective 1, Page 2

Example 1 Determine and Interpret z-Scores

Determine whether the Boston Red Sox or the Colorado Rockies had a relatively better run-producing season. The Red Sox scored 878 runs and play in the American League, where the mean number of runs scored was μ and the standard deviation was σ runs. The Rockies scored 845 runs and play in the National League, where the mean number of runs scored was μ and the standard deviation was σ runs.

Objective 1, Page 5

With negative z-scores, we need to be careful when deciding the better outcome. For example, when comparing finishing times for a marathon, the lower z-score is better because a time that is more standard deviations below the mean is a faster time.

Objective 2: Interpret Percentiles

Objective 2, Page 1

6) What does the kth percentile represent?

Objective 2, Page 2

Example 2 Interpreting a Percentile

Jennifer just received the results of her SAT exam. Her math score of 600 is at the 74th percentile. Interpret this result.

Objective 3: Determine and Interpret Quartiles

Objective 3, Page 1

7) Define the first, second, and third quartiles.

Objective 3, Page 2

8) List the three steps for finding quartiles.

Objective 3, Page 3

Example 3 Finding and Interpreting Quartiles

The Highway Loss Data Institute routinely collects data on collision coverage claims. Collision coverage insures against physical damage to an insured individual’s vehicle. Table 12 represents a random sample of 18 collision coverage claims based on data obtained from the Highway Loss Data Institute for 2007 models. Find and interpret the first, second, and third quartiles for collision coverage claims.

Table 12

$6751 $9908 $3461

$2336 $21,147 $2332

$189 $1185 $370

$1414 $4668 $1953

$10,034 $735 $802

$618 $180 $1657
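Quartiles can be found by sorting the data, finding the median, and then taking the median of each half. The sketch below assumes the convention that the overall median is left out of both halves when the number of observations is odd, which is the convention that reproduces the five-number summary quoted in Section 3.5. The function names are illustrative. Library routines such as `statistics.quantiles` use interpolation rules that can give slightly different values.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)          # Step 1: ascending order
    n = len(ordered)
    q2 = median(ordered)              # Step 2: the median is Q2
    lower_half = ordered[: n // 2]    # Step 3: observations below the median...
    upper_half = ordered[(n + 1) // 2:]  # ...and observations above it
    return median(lower_half), q2, median(upper_half)

# Collision coverage claims in dollars from Table 12 (n = 18).
claims = [6751, 9908, 3461, 2336, 21147, 2332, 189, 1185, 370,
          1414, 4668, 1953, 10034, 735, 802, 618, 180, 1657]

q1, q2, q3 = quartiles(claims)
print(q1, q2, q3)  # 735 1805.0 4668
```

One reading of the result: about 25% of the claims are at most $735, about half are at most $1805, and about 75% are at most $4668.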

Objective 4: Determine and Interpret the Interquartile Range

Objective 4, Page 1

9) Which measure of dispersion is resistant?

10) Define the interquartile range, IQR.

Objective 4, Page 2

Example 4 Finding and Interpreting the Interquartile Range

Determine and interpret the interquartile range of the collision claim data from Table 12 in Example 3.

Table 12

$6751 $9908 $3461

$2336 $21,147 $2332

$189 $1185 $370

$1414 $4668 $1953

$10,034 $735 $802

$618 $180 $1657

Objective 4, Page 4

11) If the shape of a distribution is symmetric, which measure of central tendency and which measure of dispersion should be reported?

12) If the shape of a distribution is skewed left or skewed right, which measure of central tendency and which measure of dispersion should be reported? Why?

Objective 5: Check a Set of Data for Outliers

Objective 5, Page 1

13) What is an outlier?

Objective 5, Page 2

14) List the four steps for checking for outliers by using quartiles.

Objective 5, Page 3

Example 5 Checking for Outliers

Check the data in Table 12 on collision coverage claims for outliers.

Table 12

$6751 $9908 $3461

$2336 $21,147 $2332

$189 $1185 $370

$1414 $4668 $1953

$10,034 $735 $802

$618 $180 $1657
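The quartile-based check for outliers can be sketched as follows: find Q1 and Q3, compute IQR = Q3 − Q1, form the fences Q1 − 1.5(IQR) and Q3 + 1.5(IQR), and flag any value outside the fences. The helper `quartile_bounds` is an illustrative name, and it assumes the median-of-halves convention for quartiles (the median is excluded from both halves when n is odd).

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartile_bounds(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    return median(ordered[: n // 2]), median(ordered[(n + 1) // 2:])

# Collision coverage claims in dollars from Table 12.
claims = [6751, 9908, 3461, 2336, 21147, 2332, 189, 1185, 370,
          1414, 4668, 1953, 10034, 735, 802, 618, 180, 1657]

q1, q3 = quartile_bounds(claims)        # Step 1: quartiles
iqr = q3 - q1                           # Step 2: interquartile range
lower_fence = q1 - 1.5 * iqr            # Step 3: fences
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in claims           # Step 4: values beyond the fences
            if x < lower_fence or x > upper_fence]

print(iqr, lower_fence, upper_fence)  # 3933 -5164.5 10567.5
print(outliers)                       # [21147]
```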

Section 3.5

The Five-Number Summary and Boxplots

Objectives

Determine the Five-Number Summary

Draw and Interpret Boxplots

Objective 1: Determine the Five-Number Summary

Objective 1, Page 1

1) What values does the five-number summary consist of?

Objective 1, Page 2

Example 1 Obtaining the Five-Number Summary

Table 13 shows the finishing times (in minutes) of the men in the 60- to 64-year-old age group in a 5-kilometer race. Determine the five-number summary of the data.

Table 13

19.95 23.25 23.32 25.55 25.83 26.28 42.47

28.58 28.72 30.18 30.35 30.95 32.13 49.17

33.23 33.53 36.68 37.05 37.43 41.42 54.63

Data from Laura Gillogly, student at Joliet Junior College
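The five-number summary (minimum, Q1, median, Q3, maximum) for Table 13 can be computed with the sketch below. It assumes the median-of-halves convention for quartiles, leaving the median out of both halves because n = 21 is odd. The function names are illustrative.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])         # median of the lower half
    q3 = median(ordered[(n + 1) // 2:])    # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]

# Finishing times in minutes from Table 13 (n = 21).
times = [19.95, 23.25, 23.32, 25.55, 25.83, 26.28, 42.47,
         28.58, 28.72, 30.18, 30.35, 30.95, 32.13, 49.17,
         33.23, 33.53, 36.68, 37.05, 37.43, 41.42, 54.63]

minimum, q1, med, q3, maximum = five_number_summary(times)
print(f"{minimum:.2f}  {q1:.3f}  {med:.2f}  {q3:.2f}  {maximum:.2f}")
```

Here Q1 is the average of 25.83 and 26.28, that is 26.055, which is reported as 26.06 when rounded to two decimal places.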

Objective 2: Draw and Interpret Boxplots

Objective 2, Page 1

2) List the five steps for drawing a boxplot.

Objective 2, Page 2

Example 2 Constructing a Boxplot

Use the results of Example 1 to construct a boxplot of the finishing times of the men in the 60- to 64-year-old age group.

(The five-number summary is: 19.95, 26.06, 30.95, 37.24, 54.63.)

Objective 2, Page 4

3) If the right whisker of a boxplot is longer than the left whisker and the median is left of the center of the box, what is the most likely shape of the distribution?

Objective 2, Page 5

When describing the shape of a distribution from a boxplot, be sure to justify your conclusion. Possible areas to discuss:

Compare the length of the left whisker to the length of the right whisker
The position of the median in the box
Compare the distance between the median and the first quartile to the distance between the median and the third quartile
Compare the distance between the median and the minimum value to the distance between the median and the maximum value

Objective 2, Page 10

Example 3 Comparing Two Distributions Using Boxplots

Table 14 shows the red blood cell mass (in milliliters) for 14 rats sent into space (flight group) and for 14 rats that were not sent into space (control group). Construct side-by-side boxplots for red blood cell mass for the flight group and control group. Does it appear that space flight affects the rats’ red blood cell mass?

Table 14

Flight

7.43 7.21 8.59 8.64

9.79 6.85 6.87 7.89

9.30 8.03 7.00 8.80

6.39 7.54

Control

8.65 6.99 8.40 9.66

7.62 7.44 8.55 8.70

7.33 8.58 9.88 9.94

7.14 9.14

Data from NASA Life Sciences Data Archive

Last Updated on September 20, 2019