HASC 413 Assignment one
Question 1: [26 marks]
The dataset “stoneclinic” contains a selection of interview responses and test results collected during routine renal calculus (kidney stone) clinics held at a New Zealand hospital. A data dictionary for this dataset is included in Appendix 1.
- [10 marks] Check all the variables in this dataset. Document any potential data errors in a clearly laid-out table. Describe the extent to which data are missing.
- [8 marks] Produce a table describing the demographics of this sample using appropriate summary statistics (like the one often presented as Table 1 of a journal article). Write a paragraph to accompany this table.
- [5 marks] The researchers are interested in assessing whether dietary intake of calcium differs between males and females. Carry out some descriptive analyses to answer this question. Summarise your findings in a paragraph, referring to appropriate graphs and/or tables.
- [3 marks] We want to look at whether dietary calcium intake affects 24-hour urine calcium levels. Produce a suitable plot to display the relationship between dietary calcium and 24-hour urine calcium levels. Interpret the plot.
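As a starting point for the data checks in the first part, the sketch below shows one possible way to tabulate missing values and flag coded values that are not listed in the data dictionary. It assumes the dataset has been loaded into a pandas DataFrame whose column names match Appendix 1; the function names and the small `demo` table are hypothetical and the demo rows are made up, not taken from stoneclinic.

```python
import numpy as np
import pandas as pd

# Allowed codes copied from the data dictionary in Appendix 1.
VALID_CODES = {
    "fhstone": {1, 2, 7},
    "phinfect": {1, 2},
    "ethnicity": {1, 2, 3, 4},
    "mult_stn": {1, 2},
    "hypcau": {1, 2},
    "gender": {1, 2},
}


def missing_summary(df):
    """Count and percentage of missing values for each variable."""
    n_missing = df.isna().sum()
    return pd.DataFrame({
        "n_missing": n_missing,
        "pct_missing": (100 * n_missing / len(df)).round(1),
    })


def invalid_codes(df, valid_codes=VALID_CODES):
    """List values that are present but not allowed by the data dictionary."""
    problems = []
    for column, allowed in valid_codes.items():
        if column not in df.columns:
            continue  # skip variables absent from this table
        bad = df[df[column].notna() & ~df[column].isin(allowed)]
        for idno, value in zip(bad["idno"], bad[column]):
            problems.append({"idno": idno, "variable": column, "value": value})
    return pd.DataFrame(problems, columns=["idno", "variable", "value"])


# Made-up rows for illustration only; NOT values from the real dataset.
demo = pd.DataFrame({
    "idno": [1, 2, 3, 4],
    "age": [45, np.nan, 52, 61],
    "gender": [1, 2, 3, 2],          # 3 is not a valid gender code
    "fhstone": [1, 7, 2, np.nan],
})

print(missing_summary(demo))
print(invalid_codes(demo))
```

Continuous variables such as age, height and weight have no code list, so they would need a separate check against plausible ranges chosen by the analyst.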
Question 2: [14 marks]
Search for and read the following paper: A randomised, controlled study on the effects of a short-term endurance training programme in patients with major depression. K Knubben, F M Reischies, M Adli, P Schlattmann, M Bauer, F Dimeo. Br J Sports Med 2007;41:29–33.
- [4 marks] Use Table 1 to find:
- a nominal variable
- an ordinal variable
- a binary variable
- a continuous variable
- [9 marks] Identify the response variable(s), explanatory variable(s) and the potential confounding variable(s) in this study. Please provide your reasoning.
- [1 mark] Explain why Figure 1 is superfluous.
Question 3: [14 marks]
The following questions use the Framingham Heart Study, an important longitudinal study that first identified the major risk factors for cardiovascular disease. You can download a subset of the data from Blackboard. You will also find a description of the Framingham Study there, which you will need to read in order to answer the questions below. Your answers may include figures and tables, which need to be inserted into this document and properly referenced. Remember that all your answers must be supported, concisely and clearly.
- [4 marks] Produce detailed summaries of baseline total cholesterol for each sex, including means, medians, ranges and standard deviations.
- [1 mark] What are the three smallest values for baseline total cholesterol?
- [1 mark] What is the baseline total cholesterol value below which 75% of the men in the sample fall?
- [1 mark] Are men or women more variable in their baseline total cholesterol?
- [1 mark] What is the median baseline total cholesterol value for women over the age of 60?
- A high blood level of LDL increases the risk of heart attack and stroke, whereas a high blood level of HDL lowers those risks. We are often interested in looking at the ratio of LDL/HDL.
- i) [2 marks] Create a new variable that represents this ratio, and find the median value for the final period of observation.
- ii) [4 marks] Create a scatter plot of this new variable versus the total cholesterol, for period three. Since a few of the very large values of total cholesterol obscure the effect, only use values where total cholesterol is less than 400 mg/dl. How do these quantities appear to be related?
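The data-handling steps in this question (summaries by sex, a derived ratio variable, and a filtered subset for plotting) could be sketched in pandas as follows. The column names `SEX`, `PERIOD`, `TOTCHOL`, `LDLC` and `HDLC` are assumptions and should be checked against the dataset description on Blackboard; the function names are hypothetical and the `demo` rows are made up for illustration, not Framingham values.

```python
import pandas as pd


def cholesterol_by_sex(df):
    """Count, mean, median, SD, min and max of total cholesterol for each sex."""
    stats = ["count", "mean", "median", "std", "min", "max"]
    return df.groupby("SEX")["TOTCHOL"].agg(stats)


def add_ldl_hdl_ratio(df):
    """Return a copy of df with a new LDL/HDL ratio column."""
    out = df.copy()
    out["LDL_HDL"] = out["LDLC"] / out["HDLC"]
    return out


def period_subset(df, period, max_totchol=400):
    """Rows from one examination period with total cholesterol below the cut-off."""
    return df[(df["PERIOD"] == period) & (df["TOTCHOL"] < max_totchol)]


# Made-up rows for illustration only; NOT values from the Framingham data.
demo = pd.DataFrame({
    "SEX": [1, 1, 2, 2, 2],
    "PERIOD": [3, 3, 3, 3, 1],
    "TOTCHOL": [200, 450, 180, 240, 210],
    "LDLC": [120, 300, 90, 150, 130],
    "HDLC": [40, 50, 60, 50, 65],
})

demo = add_ldl_hdl_ratio(demo)
subset = period_subset(demo, period=3)

print(cholesterol_by_sex(demo))
print(subset[["TOTCHOL", "LDL_HDL"]])
```

For the baseline summaries, the data would first need to be restricted to the baseline examination. With matplotlib installed, `subset.plot.scatter(x="TOTCHOL", y="LDL_HDL")` is one way to draw the scatter plot from the filtered rows.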
Question 4: [15 marks]
Cuckoos are classified as brood parasites because they lay their eggs in the nests of other birds, called hosts. The eggs are then hatched, and the young cuckoo chicks raised, by the hosts. Very often the cuckoo egg hatches earlier than the host’s and the cuckoo chick grows faster, sometimes evicting the eggs or young of the host species. A study found that cuckoos return each year to the same territory and lay their eggs in the nests of the same host species. It is therefore of interest to compare the sizes of cuckoo eggs laid in the nests of particular host species, to determine whether different sub-species have developed. The following data are the lengths (in mm) of cuckoo eggs found in the nests of four host species: Hedge Sparrow, Robin, Pied Wagtail, and Wren. Carry out some descriptive analyses to help the researchers answer their research question.
Hedge Sparrow: 20.85, 21.65, 22.05, 22.85, 23.05, 23.05, 23.05, 23.05, 23.45, 23.85, 23.85, 23.85, 24.05, 25.05.
Robin: 21.05, 21.85, 22.05, 22.05, 22.05, 22.25, 22.45, 22.45, 22.65, 23.05, 23.05, 23.05, 23.05, 23.05, 23.25, 23.85.
Pied Wagtail: 21.05, 21.85, 21.85, 21.85, 22.05, 22.45, 22.65, 23.05, 23.05, 23.25, 23.45, 24.05, 24.05, 24.05, 24.85.
Wren: 19.85, 20.05, 20.25, 20.85, 20.85, 20.85, 21.05, 21.05, 21.05, 21.25, 21.45, 22.05, 22.05, 22.05, 22.05, 22.25.
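One possible first step for the descriptive analysis is a per-host summary table. The sketch below uses only the Python standard library and the egg lengths listed above; the Robin series is read as containing five values of 23.05, which assumes the two adjacent values in the listing are separate observations. The `describe` helper is a hypothetical name, and this is a starting point, not a complete answer.

```python
from statistics import mean, median, stdev

# Egg lengths in mm, transcribed from the lists above.
egg_lengths_mm = {
    "Hedge Sparrow": [20.85, 21.65, 22.05, 22.85, 23.05, 23.05, 23.05,
                      23.05, 23.45, 23.85, 23.85, 23.85, 24.05, 25.05],
    "Robin": [21.05, 21.85, 22.05, 22.05, 22.05, 22.25, 22.45, 22.45,
              22.65, 23.05, 23.05, 23.05, 23.05, 23.05, 23.25, 23.85],
    "Pied Wagtail": [21.05, 21.85, 21.85, 21.85, 22.05, 22.45, 22.65, 23.05,
                     23.05, 23.25, 23.45, 24.05, 24.05, 24.05, 24.85],
    "Wren": [19.85, 20.05, 20.25, 20.85, 20.85, 20.85, 21.05, 21.05,
             21.05, 21.25, 21.45, 22.05, 22.05, 22.05, 22.05, 22.25],
}


def describe(values):
    """Basic summary statistics for one host species."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),      # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "max": max(values),
    }


summary = {host: describe(values) for host, values in egg_lengths_mm.items()}

for host, s in summary.items():
    print(f"{host:<14} n={s['n']:>2}  mean={s['mean']:.2f}  "
          f"median={s['median']:.2f}  sd={s['sd']:.2f}  "
          f"range={s['min']:.2f} to {s['max']:.2f}")
```

Side-by-side box plots of the four groups would complement this table by showing the spread and any overlap between host species.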
Question 5: [4 marks]
In New Zealand one of the most common autosomal recessive disorders is cystic fibrosis, with about one in 3500 live births being affected. If both parents are heterozygous for the abnormal gene, there is a 1 in 4 chance of their child having cystic fibrosis (Cystic Fibrosis Association of New Zealand).
- [1 mark] What is the probability that a couple who are both heterozygous will have two unaffected children and one affected child?
- [1 mark] If they have two unaffected children and one affected child, what is the probability that the fourth child will be unaffected?
- [2 marks] Cystic Fibrosis occurs in people of Caucasian origin, but it is rare in Africans, Asians and Polynesians. In New Zealand, it is estimated that about 1 in 25 of the Caucasian population will be a carrier of the abnormal gene. In a hospital where there are 2400 live births of Caucasian origin, what is the expected number of Caucasian babies per year that are affected by cystic fibrosis (assuming that there is no genetic counselling)?
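The arithmetic behind the first and third parts can be sketched with exact fractions. This is one reading of the questions, under stated assumptions: births are independent, the order of affected and unaffected children does not matter, the two parents' carrier statuses are independent (random mating), and only couples in which both parents are carriers can have an affected child. The `binomial_pmf` helper is a hypothetical name.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k 'successes' in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


# Chance that a child of two heterozygous parents is affected (from the text).
p_affected = Fraction(1, 4)

# Exactly one affected child out of three, in any birth order.
p_one_of_three = binomial_pmf(1, 3, p_affected)

# Carrier frequency in the Caucasian population (from the text).
p_carrier = Fraction(1, 25)

# Assumption: parents' carrier statuses are independent of each other.
p_both_carriers = p_carrier ** 2
p_birth_affected = p_both_carriers * p_affected

births = 2400
expected_affected = births * p_birth_affected

print("P(one affected of three):", p_one_of_three, float(p_one_of_three))
print("P(a birth is affected):", p_birth_affected)
print("Expected affected births:", expected_affected, float(expected_affected))
```

If the first part is instead read as a specific birth order, the `comb(n, k)` factor would be dropped.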
Question 7: [4 marks]
A drug company has developed a new pregnancy test for use on an outpatient basis. The company anticipates that of the women who will use the pregnancy test kit, 10% will actually be pregnant. The sensitivity of the test is 0.95 and the specificity of the test is 0.99. A woman who presents for the test produces a positive result indicating that she is pregnant. What is the probability that she is actually pregnant?
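The quantity asked for here is the positive predictive value, which follows from Bayes' theorem. A minimal sketch, assuming the 10% figure is the prior probability of pregnancy among women using the kit; the function name is hypothetical.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(condition present | positive test), by Bayes' theorem."""
    true_positive = prevalence * sensitivity               # P(present and test +)
    false_positive = (1 - prevalence) * (1 - specificity)  # P(absent and test +)
    return true_positive / (true_positive + false_positive)


# Values stated in the question.
ppv = positive_predictive_value(prevalence=0.10, sensitivity=0.95, specificity=0.99)
print(f"Positive predictive value: {ppv:.4f}")
```

The denominator is the overall probability of a positive test, so the result depends on the prevalence as well as on the test's sensitivity and specificity.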
Appendix 1
KIDNEY STONE CLINIC STUDY: DATA DICTIONARY
Variable | Label | Values |
idno | Identification number | |
age | Age of patient (years) | |
fhstone | Family history of kidney stones | 1=No, 2=Yes, 7=Don’t Know |
phinfect | Past History of Urinary Tract Infection | 1=No, 2=Yes |
dumg1 | 1st sample 24 hr urine magnesium level (mmol/l) | |
dumg2 | 2nd sample 24 hr urine magnesium level (mmol/l) | |
dcalcim | Dietary Intake of Calcium (mmol/day) | |
ethnicity | Ethnicity of patient | 1=European, 2=Maori, 3=Polynesian, 4=Other |
height | Height of patient (cm) | |
weight | Weight of patient (kg) | |
mult_stn | Multiple stones | 1=one stone, 2=more than one stone |
huca | One hour urine calcium level (mmol/l) | |
hypcau | Hypercalciuria | 1=No, 2=Yes |
duk1 | 1st sample 24 hr urine potassium level (mmol) | |
duk2 | 2nd sample 24 hr urine potassium level (mmol) | |
duca | 24 hour urine calcium (mmol/l) | |
blca | Blood calcium (mmol/l) | |
gender | Patient’s gender | 1=Male, 2=Female |