**Statistics Methods and Inference**

*TUTOR-MARKED ASSIGNMENT (TMA)*

**This assignment is worth 16% of the final mark for MTH220e – Statistics Methods and Inference.**

#### Statistics Methods and Inference Question 1

(a) (i) Find the 0.05-quantile of W, where W ~ χ²(12).

(2 marks)

(ii) Find the value of k such that P(W > k) = 0.01, where W ~ χ²(15).

(2 marks)

(iii) Find the best possible lower bound and the best possible upper bound on P(W > 16.5), where W ~ χ²(8).

(3 marks)

(b) Consider a decision by mission controllers regarding scheduling tasks for an orbiting astronomical platform. Because of its complexity and the hostile space environment, the satellite system experiences random malfunctions in its various subsystems. Controllers devote considerable effort to correcting these.

One scientist has postulated that any greater or lesser incidence of malfunctions in some subsystems may be explained by differences in the satellite’s environment when exposed directly to the sun as opposed to being shaded. If so, various tasks might be scheduled during particular phases of orbit, thereby greatly increasing the satellite’s chances for meeting various mission goals.

From the log of the shakedown orbits, during which system testing was done, the number of malfunctions was noted. These are summarized in the contingency table, Table Q1(b), which involves two qualitative variables. The position variable is represented by a separate column for each attribute, one for when the platform is directly in line with the sun and the other for when it is in Earth’s shadow. There is a different row for each attribute of the second variable, the subsystem that malfunctioned.

Five subsystems have been deemed to be potentially affected by the sun: (1) data transmission, (2) power, (3) data collection, (4) reception, and (5) mechanical. The value within each cell is the number of observations sharing the respective attributes for the two populations.

The data in Table Q1(b) are a representative random sample of the respective populations for all malfunctions that might arise during the lifetime of the satellite system. The scientist wishes to use these data to determine whether solar radiation affects the incidence of subsystem malfunction. The null hypothesis being tested is that exposure to the sun has no effect on any subsystem’s operating characteristics.

Carry out a chi-square test to determine whether these sample data provide sufficient information to conclude that subsystem and position when a malfunction occurs are two independent variables. You are required to state the hypotheses, compute the expected frequencies, calculate the value of the test statistic and report the conclusions. Use α = 0.01 in your analysis.

(20 marks)

| Subsystem | Position: (1) Direct Sunlight | Position: (2) Earth’s Shadow |
| --- | --- | --- |
| (1) Data Transmission | 30 | 4 |
| (2) Power | 10 | 6 |
| (3) Data Collection | 43 | 11 |
| (4) Reception | 41 | 6 |
| (5) Mechanical | 12 | 10 |

**Table Q1(b)**
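The arithmetic behind the test of independence can be cross-checked with a short script. The following is a minimal sketch in Python using only the standard library; the variable names are illustrative, and the critical value is an assumption taken from a standard chi-square table (upper 1% point with 4 degrees of freedom), not computed by the code.

```python
# Observed malfunction counts from Table Q1(b):
# each row is [direct sunlight, Earth's shadow].
observed = {
    "Data Transmission": [30, 4],
    "Power": [10, 6],
    "Data Collection": [43, 11],
    "Reception": [41, 6],
    "Mechanical": [12, 10],
}

row_totals = {name: sum(counts) for name, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(col_totals)

# Expected frequency under independence: (row total * column total) / grand total.
expected = {
    name: [row_totals[name] * col / grand_total for col in col_totals]
    for name in observed
}

# Pearson chi-square statistic: sum over all cells of (O - E)^2 / E.
chi_square = sum(
    (o - e) ** 2 / e
    for name in observed
    for o, e in zip(observed[name], expected[name])
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(col_totals) - 1)

# Assumption: value read from a standard chi-square table for alpha = 0.01, df = 4.
CRITICAL_VALUE = 13.277

for name, cells in expected.items():
    print(f"{name:18s} expected: {cells[0]:7.3f} {cells[1]:7.3f}")
print(f"chi-square = {chi_square:.3f}, df = {df}, critical value = {CRITICAL_VALUE}")
print("Reject H0" if chi_square > CRITICAL_VALUE else "Do not reject H0")
```

The printed expected frequencies also make it easy to check the usual rule of thumb that expected counts should not be too small before relying on the chi-square approximation.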

#### Statistics Methods and Inference Question 2

(a) A study was conducted to investigate the association between the height of the automobile from the top of the door to the ground and the luggage capacity of the automobile’s boot. The data are shown in Table Q2(a).

(i) Determine the Pearson correlation coefficient between these two variables. You need to show the details of your workings clearly.

(9 marks)

(ii) Interpret and comment on the validity of your results.

(3 marks)

| Automobile | Door-to-ground height (X) | Luggage capacity (Y) |
| --- | --- | --- |
| 1 | 47.5 | 16 |
| 2 | 50.0 | 12 |
| 3 | 50.1 | 15 |
| 4 | 49.5 | 10 |
| 5 | 51.5 | 18 |
| 6 | 52.0 | 12 |
| 7 | 48.5 | 10 |
| 8 | 49.5 | 11 |
| 9 | 48.0 | 12 |
| 10 | 50.0 | 11 |
| 11 | 49.0 | 13 |
| 12 | 49.0 | 12 |
| 13 | 50.0 | 18 |
| 14 | 50.5 | 9 |
| 15 | 49.5 | 11 |
| 16 | 49.5 | 13 |
| 17 | 51.0 | 18 |
| 18 | 51.0 | 17 |

**Table Q2(a)**
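The workings for the Pearson correlation coefficient in part (a)(i) can be checked numerically. The sketch below, in Python with only the standard library, computes the sums of squares and cross-products explicitly so that each intermediate quantity can be compared with hand calculations. The names `s_xx`, `s_yy` and `s_xy` are illustrative labels, not notation fixed by the assignment.

```python
from math import sqrt

# Data from the automobile table: X = door-to-ground height, Y = luggage capacity.
heights = [47.5, 50.0, 50.1, 49.5, 51.5, 52.0, 48.5, 49.5, 48.0,
           50.0, 49.0, 49.0, 50.0, 50.5, 49.5, 49.5, 51.0, 51.0]
capacities = [16, 12, 15, 10, 18, 12, 10, 11, 12,
              11, 13, 12, 18, 9, 11, 13, 18, 17]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(capacities) / n

# Corrected sums of squares and cross-products.
s_xx = sum((x - mean_x) ** 2 for x in heights)
s_yy = sum((y - mean_y) ** 2 for y in capacities)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, capacities))

# Pearson correlation coefficient: r = S_xy / sqrt(S_xx * S_yy).
r = s_xy / sqrt(s_xx * s_yy)

print(f"n = {n}, mean X = {mean_x:.4f}, mean Y = {mean_y:.4f}")
print(f"S_xx = {s_xx:.4f}, S_yy = {s_yy:.4f}, S_xy = {s_xy:.4f}")
print(f"r = {r:.4f}")
```

A scatter plot of the same data is a useful companion to the coefficient when commenting on validity in part (a)(ii), since r only measures linear association.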

(b) Comment on the statement: Correlation is not causation. Illustrate using an example of your choice.

(4 marks)

(c) What is the difference between the Pearson correlation coefficient and the Spearman rank correlation coefficient? Briefly discuss the advantage(s) of the Spearman rank correlation coefficient over the Pearson correlation coefficient.

(4 marks)

(d) A sample of the times (in minutes) that Mr. Tan waited for the bus over the past few weeks was collected. A Wilcoxon signed rank test was used to analyze whether the data provide sufficient evidence that, on average, Mr. Tan waits more than 20 minutes for the bus. The p-value of the test is 0.257. At the 5% level of significance, what is your conclusion?

(2 marks)

#### Statistics Methods and Inference Question 3

(a) The following data were collected on the birth weights (in grams) of 15 babies who suffered from Sudden Infant Death Syndrome (SIDS):

3345 3629 2041 2240 3714 3289 3487 2013

3090 3260 4309 3374 3544 2835 3827

Assume that the data can be reasonably modeled by a normal distribution. Interest centers on comparing these figures with an underlying mean birth weight of 3300 g.

(i) Perform a fixed-level test at the 5% level of significance of the null hypothesis H0: µ = 3300 against the alternative hypothesis H1: µ ≠ 3300. Is this a one-sided or two-sided hypothesis test?

What is the distribution of the test statistic when H0 is true? In your answers, state the value of the test statistic and the degrees of freedom.

By obtaining the boundary points of the rejection region for the test, draw your conclusions. You are also required to use R and show the screenshots of the R commands for this hypothesis test to verify the output results.

(15 marks)

(ii) Based on this sample, construct a 90% confidence interval for the mean birth weight.

(4 marks)
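The hand calculations for parts (a)(i) and (a)(ii) can be cross-checked with a few lines of code. This is a minimal sketch in Python (standard library only) and does not replace the R output the question asks for. The two critical values are assumptions read from a standard t table with 14 degrees of freedom, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Birth weights (in grams) of the 15 babies.
weights = [3345, 3629, 2041, 2240, 3714, 3289, 3487, 2013,
           3090, 3260, 4309, 3374, 3544, 2835, 3827]

MU_0 = 3300  # hypothesised mean under H0

# Assumptions: values taken from a standard t table, 14 degrees of freedom.
T_CRIT_TEST = 2.145  # upper 2.5% point, for the two-sided 5% test
T_CRIT_CI = 1.761    # upper 5% point, for the 90% confidence interval

n = len(weights)
df = n - 1
x_bar = mean(weights)
s = stdev(weights)            # sample standard deviation (divisor n - 1)
std_error = s / sqrt(n)

# One-sample t statistic.
t_stat = (x_bar - MU_0) / std_error
reject_h0 = abs(t_stat) > T_CRIT_TEST

# 90% confidence interval for the mean.
ci_90 = (x_bar - T_CRIT_CI * std_error, x_bar + T_CRIT_CI * std_error)

print(f"n = {n}, mean = {x_bar:.1f}, s = {s:.2f}, df = {df}")
print(f"t = {t_stat:.3f}, rejection region: |t| > {T_CRIT_TEST}")
print("Reject H0" if reject_h0 else "Do not reject H0")
print(f"90% CI: ({ci_90[0]:.1f}, {ci_90[1]:.1f})")
```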

(b) A plastic casing for a magnetic disk is composed of two halves. The thickness of each half is normally distributed with a mean of 2 millimeters and a standard deviation of 0.1 millimeter. The two halves are independent.

(i) What are the mean and standard deviation of the total thickness of the two halves?

(5 marks)

(ii) Determine the probability that the total thickness exceeds 4.3 millimeters.

(3 marks)
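Part (b) relies on the fact that a sum of independent normal random variables is normal, with the means and the variances adding. As a sketch of that reasoning, the Python snippet below uses `statistics.NormalDist` from the standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Thickness of one half: normal with mean 2 mm and standard deviation 0.1 mm.
half = NormalDist(mu=2.0, sigma=0.1)

# For two independent halves, means add and variances add.
total_mean = 2 * half.mean
total_sd = sqrt(2 * half.variance)
total = NormalDist(mu=total_mean, sigma=total_sd)

# Probability that the total thickness exceeds 4.3 mm.
p_exceeds = 1 - total.cdf(4.3)

print(f"total mean = {total_mean:.1f} mm, total sd = {total_sd:.4f} mm")
print(f"P(total > 4.3) = {p_exceeds:.4f}")
```

Note that the standard deviations themselves do not add: the standard deviation of the total is the square root of the summed variances.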

(c) A study was conducted to estimate the costs of employee absences. Based on a sample of 176 blue-collar workers, it was estimated that the mean amount of paid time lost during a three-month period was 1.4 days per employee with a standard deviation of 1.3 days. It was also estimated that the mean amount of unpaid time lost during a three-month period was 1.0 day per employee with a standard deviation of 1.8 days.

(i) Suppose that we randomly select a sample of 100 blue-collar workers. State the appropriate distributions for the random variables under study. What is the probability that the average amount of paid time lost during a three-month period for the 100 blue-collar workers will exceed 1.5 days?

(4 marks)

(ii) What is the probability that the average amount of unpaid time lost during a three-month period for the 100 blue-collar workers will exceed 1.5 days?

(3 marks)
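One way to approach part (c) is through the central limit theorem: for a sample of 100, the sample mean is approximately normal with standard deviation σ/√n. The Python sketch below makes two assumptions that should be stated in a written answer: the estimates from the study (1.4 and 1.3 days for paid time, 1.0 and 1.8 days for unpaid time) are treated as the population mean and standard deviation, and n = 100 is taken as large enough for the normal approximation.

```python
from math import sqrt
from statistics import NormalDist

N = 100          # sample size
THRESHOLD = 1.5  # days

# Assumption: study estimates are treated as population parameters.
paid_mean_dist = NormalDist(mu=1.4, sigma=1.3 / sqrt(N))
unpaid_mean_dist = NormalDist(mu=1.0, sigma=1.8 / sqrt(N))

# Upper-tail probabilities for the sample means.
p_paid = 1 - paid_mean_dist.cdf(THRESHOLD)
p_unpaid = 1 - unpaid_mean_dist.cdf(THRESHOLD)

print(f"Paid:   sd of mean = {paid_mean_dist.stdev:.2f}, P(mean > 1.5) = {p_paid:.4f}")
print(f"Unpaid: sd of mean = {unpaid_mean_dist.stdev:.2f}, P(mean > 1.5) = {p_unpaid:.4f}")
```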

#### Statistics Methods and Inference Question 4

(a) In a classic study of problem solving, a researcher asked participants to mount a candle on a wall in an upright position so that it would burn normally. One group of participants was given a candle, matches, and a box of tacks. A second group was given the same items, except the tacks and the box were presented separately as two distinct items.

The solution to this problem involves using the tacks to mount the box on the wall, which creates a shelf for the candle. The researcher reasoned that the first group of participants would have trouble seeing a “new” function of the box (a shelf) because it was already serving a function (holding tacks). For each person, the amount of time to solve the problem was recorded. The data obtained are shown in Table Q4(a) below.

Time to solve problem (in seconds)

| Box of Tacks | Tacks and Box Separate |
| --- | --- |
| 128 | 42 |
| 160 | 24 |
| 53 | 68 |
| 101 | 35 |
| 94 | 47 |

**Table Q4(a)**

(i) State the null and alternative hypotheses for the two-tailed test.

(2 marks)

(ii) Perform a two-sample t test at a significance level of α = 0.01 to determine whether these data indicate a significant difference between the two conditions.

(12 marks)
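The two-sample calculation in part (a)(ii) can be sketched as follows in Python, using only the standard library. The sketch assumes the pooled (equal-variance) form of the two-sample t test, which the question does not explicitly specify; whether that assumption is reasonable for these data is worth discussing in the answer. The critical value is an assumption read from a standard t table (upper 0.5% point, 8 degrees of freedom).

```python
from math import sqrt
from statistics import mean, variance

# Times to solve the problem (in seconds).
box_of_tacks = [128, 160, 53, 101, 94]
tacks_and_box_separate = [42, 24, 68, 35, 47]

n1, n2 = len(box_of_tacks), len(tacks_and_box_separate)
df = n1 + n2 - 2

# Pooled variance: weighted average of the two sample variances.
pooled_var = (
    (n1 - 1) * variance(box_of_tacks)
    + (n2 - 1) * variance(tacks_and_box_separate)
) / df

std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean(box_of_tacks) - mean(tacks_and_box_separate)) / std_error

# Assumption: value from a standard t table for a two-tailed test, alpha = 0.01, df = 8.
T_CRIT = 3.355

print(f"means: {mean(box_of_tacks):.1f} vs {mean(tacks_and_box_separate):.1f}")
print(f"pooled variance = {pooled_var:.2f}, standard error = {std_error:.3f}")
print(f"t = {t_stat:.3f}, df = {df}, rejection region: |t| > {T_CRIT}")
print("Reject H0" if abs(t_stat) > T_CRIT else "Do not reject H0")
```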

(b) Determine the missing values in the one-way ANOVA below.

| Source | SS | DF | MS | F statistic |
| --- | --- | --- | --- | --- |
| Treatment | 382.79 | 3 | ? | ? |
| Error | ? | 20 | 6.51 | |
| Total | 512.96 | 23 | | |

**Table Q4(b)**

(3 marks)
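The missing ANOVA entries follow from three identities: SS(Total) = SS(Treatment) + SS(Error), MS = SS / DF, and F = MS(Treatment) / MS(Error). A minimal Python sketch of that arithmetic, with illustrative variable names, is:

```python
# Known entries from the one-way ANOVA table.
ss_treatment = 382.79
ss_total = 512.96
df_treatment = 3
df_error = 20
ms_error = 6.51

# SS(Error) = SS(Total) - SS(Treatment).
ss_error = ss_total - ss_treatment

# MS(Treatment) = SS(Treatment) / DF(Treatment).
ms_treatment = ss_treatment / df_treatment

# F = MS(Treatment) / MS(Error).
f_statistic = ms_treatment / ms_error

print(f"SS(Error) = {ss_error:.2f}")
print(f"MS(Treatment) = {ms_treatment:.2f}")
print(f"F = {f_statistic:.2f}")

# Consistency check: SS(Error) / DF(Error) should match the tabulated MS(Error) after rounding.
print(f"SS(Error) / DF(Error) = {ss_error / df_error:.2f}")
```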

--- END OF ASSIGNMENT ---