Home Study Task 1: Statistical concepts
The following is a set of multiple choice (MCQ), extended multiple choice (EMCQ), closed-answer and open-answer questions to test your understanding of the statistical concepts covered in the course.
All answers should be provided underneath each question. Please indicate your answers clearly, either by marking your choice in bold or with Word's highlighter tool, or by typing or writing your answer underneath each question or in the gaps provided. Where calculations are required, you are advised to show your workings.
Some MCQs may require more than one selection. There is no negative marking, but if you give more responses than an MCQ asks for, no marks will be awarded for that question.
The marks awarded for each question are indicated in square brackets.
You are advised to read each question carefully.
1. Select the most appropriate term for defining data and describing analysis from the list below to complete each of the following sentences. Each listed term may be used more than once, but use only one term per gap. [6]
 Continuous
 Discrete
 Ordinal
 Nominal
 Binary
 Point estimate
 Population parameter
 Independent variable
 Dependent variable
 Covariable
 Confounder
a) Stage of cancer (1, 2, 3, 4, 5) is a __________ variable.
b) The number of abnormal cells from a cytological smear is a ___________ variable.  
c) The levels of creatinine in blood samples is a __________ variable.  
d) Sex of a person is a ___________ variable.  
e) The number of teeth a toddler has is a ___________ variable.  
f) The continent a person comes from is a ___________ variable.
g) A ___________ that is related to both the exposure and the outcome in a study is a potential ___________.
h) In a study of the effect of a new treatment for aphthous ulcers versus the standard treatment, treatment is the _____________ and aphthous ulcers is the __________; the observed difference between groups in the number of aphthous ulcers developed is a ____________ of the _______________.

2. Which of the following graphs has the largest standard deviation? [1]
 A
 B
 C
 D
 They all have the same variance
 It is not possible to tell from these plots
3. A sample of 100 women undergoing root canal treatment were asked to rate their pain and discomfort using a visual analogue scale (VAS). A 100 mm line was used to represent the severity of the pain and discomfort they felt. The line was scaled from 0 at one end, indicating ‘no pain’, to 100 at the other, indicating ‘the worst pain imaginable’. A histogram of all the measurements is shown below. Which two of the following statements are false? [2]
 The distribution is positively skewed.
 The distribution is not normally distributed.
 The median would be a good measure of spread of these values
 The majority of women (more than 50%) have scores less than 10.
 More women have scores between 30 and 60 than have scores between 0 and 30.
 The mean would be higher than the median for this set of values.
4. The figure below shows the distribution of oral blister diameters in a sample of 2000 patients with oral mucous membrane pemphigoid. What are the units for the scale displayed on the x-axis at the top of the figure (select one choice)? [1]
 68%, 95% and 99.7% confidence intervals
 Standard deviation
 Standard error
 Log of birth length
 Lines corresponding to those on a box and whisker plot
The table below shows some statistics calculated from a sample of 2000 patients with oral mucous membrane pemphigoid. Please use this information to answer question 5:
| | Mean | Median | Range | Interquartile range | Standard deviation | Standard error |
|---|---|---|---|---|---|---|
| Blister diameter (mm) | 5.0 | 5.0 | 1.6 to 8.9 | 4.3 to 5.7 | 1.0 | 0.02 |
5. A consultant wishes to know the lower and upper limits of blister diameter that identify the most extreme 5% of patients with oral mucous membrane pemphigoid in terms of blister diameter.
What can you calculate to help her identify these patients, and what assumption do you make about the distribution? [3]
6. Which one of the following statements about log transformations is correct? [1]
 A log transformation is useful for making a distribution that is negatively skewed more normally distributed.
 If we back-transform the mean of a set of logged values, it is always equal to the arithmetic mean.
 A log transformation does not change the shape of the distribution.
 A log transformation is a type of linear transformation.
 A log transformation will always make a variable more normally distributed regardless of the shape of the distribution.
 None of the above.
7. The figure below shows the relationship between a mother’s pre-pregnancy body mass index (BMI) (in log to the base 2 units) and the birth weight of her offspring. What is the correct value for the correlation coefficient (select one choice)? [2]
 0
 0.2
 0.95
 2
 95
8. Select the most appropriate term from the following list to fill the gaps in each of the statements below. Each listed option may be used only once. [2]
 Asymmetrical distribution
 95% Confidence Interval
 Histogram
 Normal distribution
 Range
 95% reference range
 Sample mean
 Sample median
 Sample mode
 Standard deviation
 Standard error
 Population parameter
 The standard error is the __________ of the distribution of sample means.
 The ____________ is our best guess of the true population mean.
9. A low-exertion fitness test was administered to a sample of 25 men randomly selected from the appointments database of a periodontal disease clinic. A measure of VO₂ max (mL/(kg·min)), an indicator of aerobic fitness, was obtained. The aim was to estimate the average aerobic fitness of men attending treatment for periodontal disease.
a) What is the target population in this study? [1]
b) What is the population parameter of interest? [1]
c) What statistic could you calculate to estimate the population parameter of interest? [1]
d) Why was a random sample taken from the appointments database rather than, for example, all men who attended the clinic during lunch hours over a 3-week period? [3]
e) Which of the following can be estimated from this single sample of data? [1]
 The accuracy of the estimate or amount of bias
 The precision of the estimate or the amount of sampling error
 The probability that the random sampling method was successful
 The probability that another estimate would be within a certain range of values
10. Which two of the following statements best describe a 95% confidence interval (CI)? [2]
 It provides an estimate of sample variability.
 If more data is collected, 95% of the new values in the new data will lie within the 95% CI
 There is a 95% probability that the true value lies within the 95% CI
 The true population value is expected to be in this interval 95% of the time
 It provides a range of values that we are 95% confident contain the true value
 If we took 100 repeat samples of the same size and calculated a 95% CI for each, we would expect 95% of the intervals to contain the true population value
11. In a sample of data that is normally distributed, which statement best describes the 2.5th centile? [1]
 It is the same as the lower limit of a 95% CI of the mean
 Any values lower than the 2.5th centile would be significantly different at the 0.05 level.
 If the data were normally distributed it would approximate the lower limit of a 95% reference range
 If the data were normally distributed it would approximate the lower limit of a 97.5% reference range
12. A very large parallel-group randomised trial was conducted to compare the effect of taking a 1 g/day oral dose of ginger with a 40 mg/day oral dose of vitamin B6 on oral mucosal pain in patients with Burning Mouth Syndrome. Oral mucosal pain was rated on a 10-point scale. At follow-up, oral mucosal pain had reduced by 2.2 points in the ginger group and by 0.9 points in the vitamin B6 group. A two-sided test of the difference between groups produced a p-value of 0.03.
a) The researchers did a hypothesis test of the difference in pain between the two groups. Which of the following is the most appropriate type of statistical test for this hypothesis test? [1]
 Independent samples test
 A chi-squared test
 Paired samples test
 A test of equal variances
b) Write down the null hypothesis for this test. [2]
c) Which two of the following statements concerning the interpretation of the p-value of 0.03 are correct? [3]
 The effect is statistically insignificant; there is no difference in the effect of treatment on pain.
 There is only a 3 in 100 chance of seeing such an extreme difference in pain symptoms if the two treatments have equal effects.
 There is only a 3 in 100 chance that the null hypothesis is true.
 0.03 is the probability that ginger and vitamin B6 have the same effect on oral mucosal pain.
 There’s some evidence that ginger is more effective than vitamin B6 in reducing oral mucosal pain.
 0.03 is the probability that ginger is worse than vitamin B6 in treating oral mucosal pain.
 There’s a 3% probability this is a chance finding.
d) What would the p-value be if a one-sided test that ginger is better than vitamin B6 in treating oral mucosal pain had been performed? [1]
e) The following are the confidence intervals for the difference between the ginger and vitamin B6 groups in the oral mucosal pain trial above. Indicate which is the 90% CI, which is the 95% CI and which is the 99% CI. [2]
 0.3 to 2.3:
 0.5 to 2.1:
 -0.1 to 2.7:
f) If randomisation was done using 1:1 allocation, which two of the following does randomisation help to ensure? [2]
 Each patient has an equal chance of receiving either treatment.
 The doctor is blinded to the treatment the patient is receiving.
 Confounding is minimised
 Loss to follow up bias is minimised
 The two groups are treated in the same way apart from the treatment under study.
 Reverse causality is minimised
13. Researchers at a large dental hospital wanted to examine whether alcohol consumption was associated with oral cancer. They recruited a random sample of 100 patients with a recent diagnosis of oral squamous cell carcinoma and a random sample of 100 patients with no diagnosis of oral cancer. At the same time, they asked the patients to complete a questionnaire about their alcohol intake and other lifestyle factors. What type of study design is this? [1]
 Case-control study.
 Randomised controlled trial.
 Cross-sectional study
 Retrospective cohort study
 Natural experiment
 None of the above
