Assignment questions
Part 1: Data cleaning, documentation and data dictionary update 65%
Explore the data and decide on an approach to cleaning the GP and ED datasets. For this assignment, concentrate on within-dataset cleaning: you are not required to cross-check inconsistencies between the two datasets, so there is no need to merge them.
For Part 1, you are required to:
A. Present your work in cleaning the GP data in the following forms (25 marks):
- A written document explaining the process of GP data exploration, data cleaning, decisions made, and results of your analyses (10 marks);
- A flowchart graphically presenting the procedures taken to clean the GP data (5 marks); and
- Annotated SAS code showing the analysis for GP data exploration and data cleaning (10 marks).
B. Present your work in cleaning the ED data in the following forms (20 marks):
- A written document explaining the process of ED data exploration, data cleaning, decisions made, and results of your analyses (8 marks);
- A flowchart graphically presenting the procedures taken to clean the ED data (5 marks); and
- Annotated SAS code showing the analysis for ED data exploration and data cleaning (7 marks).
C. Create new variables in the GP data (see definitions in Table 1) (10 marks):
- Create variable smoke_status_GP to indicate a person’s smoking status. Describe and justify your decision in the written document (6 marks);
- Create variable risky_alcohol_GP to classify health-risk alcohol consumption (1 mark);
- Calculate the BMI score (variable BMI_GP) (1 mark);
- Create variable obese_GP to indicate whether a person is obese (1 mark); and
- Create variable highBP_GP to indicate whether a person has high blood pressure (1 mark).
Table 1: New health risk factor variables, values and definitions (in people aged 18 years and over)
New variable | Values, definition, and value labels |
smoke_status_GP | 0=Never smoked; 1=Current smoker; 2=Ex-smoker |
risky_alcohol_GP | 0=No (≤2 drinks per day); 1=Yes (>2 drinks per day) |
BMI_GP | Weight/Height² (weight divided by the square of the height; weight is measured in kg, height is measured in meters) |
obese_GP | 0=No (BMI <30); 1=Yes (BMI ≥30) |
highBP_GP | 0=Normal blood pressure (systolic <135 mmHg and diastolic <85 mmHg); 1=High blood pressure (systolic ≥135 mmHg or diastolic ≥85 mmHg) |
D. Update the data dictionaries. If you decide not to update a data dictionary, provide reasons for not updating it (10 marks).
- Present an updated data dictionary for the GP dataset, based on the results of data cleaning steps A and C above (5 marks).
- Present an updated data dictionary for the ED dataset, based on the results of data cleaning step B above (5 marks).
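The Table 1 definitions translate directly into conditional logic. The sketch below expresses them in Python purely for illustration (the assignment itself requires SAS), and all function names are hypothetical. It treats a missing input as `None` and propagates it rather than coding it 0. The smoke_status_GP rule is an assumption, one defensible choice among several, since the assignment asks you to justify your own; the "18 years and over" restriction in the Table 1 title is not applied here.

```python
from typing import Optional


def bmi_gp(weight_kg: Optional[float], height_m: Optional[float]) -> Optional[float]:
    """BMI_GP = weight (kg) / height (m) squared; None if either input is missing."""
    if weight_kg is None or height_m is None:
        return None
    return weight_kg / height_m ** 2


def obese_gp(bmi: Optional[float]) -> Optional[int]:
    """obese_GP: 1 if BMI >= 30, 0 if BMI < 30, None if BMI is missing."""
    if bmi is None:
        return None
    return 1 if bmi >= 30 else 0


def risky_alcohol_gp(drinks_day: Optional[float]) -> Optional[int]:
    """risky_alcohol_GP: 1 if more than 2 drinks per day, 0 if 2 or fewer."""
    if drinks_day is None:
        return None
    return 1 if drinks_day > 2 else 0


def high_bp_gp(syst_bp: Optional[float], diast_bp: Optional[float]) -> Optional[int]:
    """highBP_GP: 1 if systolic >= 135 OR diastolic >= 85; 0 only if both are known and normal."""
    if (syst_bp is not None and syst_bp >= 135) or (diast_bp is not None and diast_bp >= 85):
        return 1
    if syst_bp is not None and diast_bp is not None:
        return 0
    return None  # one reading missing and the other normal: status unknown


def smoke_status_gp(ever_smoked: Optional[int], smoke_now: Optional[int]) -> Optional[int]:
    """ASSUMED rule (one possible choice): 0=never, 1=current, 2=ex-smoker."""
    if smoke_now == 1:
        return 1  # assumption: current smoking takes priority over ever_smoked
    if ever_smoked == 1 and smoke_now == 0:
        return 2
    if ever_smoked == 0:
        return 0  # assumption: a "never" answer is kept even if smoke_now is missing
    return None  # any other combination is left missing
```

The explicit missing-value checks matter when the same logic is written in SAS: a missing numeric value compares as smaller than any number, so a statement such as `if drinks_day <= 2 then risky_alcohol_GP = 0;` would also assign 0 to records with a missing drinks_day unless missing values are handled first.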
Part 2: Research Question 20%
The manager of the Medical Plus GP practice is planning health care coordination within the practice and wants to know the characteristics of its clients, such as socio-demographic characteristics, lifestyle factors, health status and other factors. You will help the practice manager analyse the GP dataset and report on patient characteristics. You and the practice manager have agreed that the analysis will be based on a cleaned GP dataset (i.e. the dataset you cleaned in Part 1) and that the report (in Word or PDF) will be reproducible.
For Part 2, you are required to analyse the dataset that you cleaned in Part 1 and:
- In the report, present results of your analysis in tabular format (7 marks). Your table(s) should be presented in an academic format similar to what would be found in the results section of a published journal article. You can present more than one table.
- In the report, provide a written interpretation of the results (8 marks). Your interpretation should be written in an academic style.
- Include SAS code that generates the results that you report above (5 marks). Your SAS code should be well annotated and functioning.
Part 3: Data linkage 15%
No data analysis is required to answer Part 3 questions.
The Medical Plus GP manager wishes to link the practice’s GP dataset to the Registry of Births, Deaths and Marriages (RBDM) deaths data and to Pharmaceutical Benefits Scheme (PBS) data to examine medication compliance among its patients. As the data custodian, Medical Plus GP has access to patient identification information (names, addresses, dates of birth and Medicare numbers). The RBDM data custodian has access to identification information (names, addresses and dates of birth). The PBS data custodian has access to Medicare numbers only. A research institute will be contracted to analyse the linked data.
For this assignment, we can assume that patients have given consent for data linkage and ethics approval has been granted. The linkage will be carried out by the Centre for Health Record Linkage (CHeReL).
A. What data linkage strategies will CHeReL use to link the GP dataset to the RBDM deaths and PBS data? Justify your decision (5 marks).
B. Draw a diagram depicting the exchange of variables between the data custodians (GP, PBS and RBDM), CHeReL and the analyst for data linkage and analysis purposes. Justify this data exchange process (10 marks).
GP data dictionary
Variable | Description | Variable type | Format name | Allowable entries |
ID | Unique person ID | Number | | |
GP_last | Date of most recent GP visit | Date | DDMMYY10. | Dates in the range 01/01/2014 – 31/12/2014 |
age | Age of patient at the most recent GP visit in 2014 | Number | | |
sex | Gender of the patient | Character | | 1=male; 2=female |
cob | In what country were you born? | Number | cobf. | 1=Born in Australia; 2=Born overseas |
healthcare_card | Do you have a healthcare card¹? | Number | ynf. | 1=Yes; 0=No |
ever_smoked | Have you ever been a regular smoker? | Number | ynf. | 1=Yes; 0=No |
smoke_now | Are you a regular smoker now? | Number | ynf. | 1=Yes; 0=No |
age_start | How old were you when you started smoking regularly? | Number | | Invalid if <10 or >105 |
age_stop | How old were you when you stopped smoking? Or when did you stop smoking? | Number | | Invalid if <10 or >105 |
drinks_day | About how many alcoholic drinks do you drink per day? | Number | | Invalid if >20 |
height | How tall are you without shoes? (meters) | Number | | Invalid if <0.55m or >2.40m |
weight | About how much do you weigh? (kilograms) | Number | | Invalid if <5.0kg or >270kg |
adverse_reaction | Have you had any adverse reaction to any medication? | Number | ynf. | 1=Yes; 0=No |
syst_bp | Systolic blood pressure (mmHg) | Number | | |
diast_bp | Diastolic blood pressure (mmHg) | Number | | |
reason | Reason for the most recent GP visit | Character | | HEADACHE, NAUSEA, TINNITUS, VOMITING, ITCHING, ABDOMINAL PAIN, DIZZINESS, SKIN RASH, PALPITATIONS, HALLUCINATIONS |
¹ Australian residents may be eligible for a Health Care Card if they receive financial support from the government. Benefits include lower fees for prescription medicines under the Pharmaceutical Benefits Scheme, higher refunds for medical expenses through the Medicare Safety Net, and some other social concessions.
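The "Invalid if" rules in the GP data dictionary can be checked mechanically before deciding how to treat each offending value. The Python sketch below is illustrative only: the bounds are copied from the dictionary, while the constant and function names are hypothetical. It flags non-missing values that break a rule and leaves the cleaning decision (for example, setting a value to missing) to you. Bounds are treated as inclusive, so 10 and 105 are valid ages; because the dictionary gives drinks_day only an upper limit, negative counts are not flagged by this rule as written.

```python
from datetime import date
from typing import Any, Dict, List, Optional, Tuple

# Bounds transcribed from the "Invalid if ..." entries in the GP data dictionary.
# None means the dictionary states no bound on that side.
GP_VALID_RANGES: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "age_start": (10, 105),
    "age_stop": (10, 105),
    "drinks_day": (None, 20),
    "height": (0.55, 2.40),
    "weight": (5.0, 270),
}
GP_LAST_RANGE = (date(2014, 1, 1), date(2014, 12, 31))


def gp_out_of_range(record: Dict[str, Any]) -> List[str]:
    """Return the names of variables whose non-missing value breaks a dictionary rule."""
    flagged = []
    for var, (low, high) in GP_VALID_RANGES.items():
        value = record.get(var)
        if value is None:
            continue  # missing values are counted separately, not flagged here
        if (low is not None and value < low) or (high is not None and value > high):
            flagged.append(var)
    gp_last = record.get("GP_last")
    if gp_last is not None and not (GP_LAST_RANGE[0] <= gp_last <= GP_LAST_RANGE[1]):
        flagged.append("GP_last")
    return flagged
```

For instance, a record with age_start 8, weight 300 and GP_last of 2 January 2015 is flagged on those three variables, while a valid height of 1.7 is not.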
ED data dictionary
Variable | Description | Variable type | Format name | Allowable entries |
ID | Unique person ID | Number | | |
ed_admission | Date of ED presentation | Date | DDMMYY10. | Dates in the range 01/01/2014 – 31/12/2014 |
ed_separation | Date of ED separation | Date | DDMMYY10. | Dates in the range 01/01/2014 – 31/12/2014 |
age_ed | Age of patient at ED presentation | Number | | |
sex_ed | Gender of the patient | Number | sexf. | 1=male; 2=female |
cob_ed | In what country were you born? | Number | cobf. | 1=Born in Australia; 2=Born overseas |
interpreter | Is an interpreter needed? | Number | ynf. | 1=Yes; 0=No |
health_insurance | Do you have private health insurance? | Number | ynf. | 1=Yes; 0=No |
triage_category | Urgency of presentation | Number | triagef. | 1=Resuscitation; 2=Emergency; 3=Urgent; 4=Semi urgent; 5=Non urgent |
dx1 | Principal presenting diagnosis (ICD-10-AM codes) | Character | | International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Australian Modification, 8th edition |
dx2-dx5 | Up to 4 additional diagnoses (ICD-10-AM codes) | Character | | International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Australian Modification, 8th edition |
separation_mode | Status of the person at separation from the emergency department | Number | sepmodef. | 1=Admitted to hospital; 2=Departed ED; 3=Died in ED; 4=Dead on arrival |
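For the ED dataset, most allowable entries are fixed code lists, so a membership check against each list finds invalid codes. A minimal Python sketch, assuming each record is a dictionary keyed by the ED variable names with missing values stored as `None`; the constant and function names are hypothetical. dx1 to dx5 are deliberately not checked, because validating ICD-10-AM codes requires the classification itself.

```python
from datetime import date
from typing import Any, Dict, List, Set

# Allowable codes transcribed from the ED data dictionary.
ED_ALLOWED_CODES: Dict[str, Set[int]] = {
    "sex_ed": {1, 2},
    "cob_ed": {1, 2},
    "interpreter": {0, 1},
    "health_insurance": {0, 1},
    "triage_category": {1, 2, 3, 4, 5},
    "separation_mode": {1, 2, 3, 4},
}
ED_DATE_VARS = ("ed_admission", "ed_separation")
ED_START, ED_END = date(2014, 1, 1), date(2014, 12, 31)


def ed_invalid_entries(record: Dict[str, Any]) -> List[str]:
    """Return variables whose non-missing value is not an allowable entry."""
    flagged = []
    for var, codes in ED_ALLOWED_CODES.items():
        value = record.get(var)
        if value is not None and value not in codes:
            flagged.append(var)
    for var in ED_DATE_VARS:
        value = record.get(var)
        if value is not None and not (ED_START <= value <= ED_END):
            flagged.append(var)
    return flagged
```

Keeping the code lists in one mapping means the checks stay in step with the dictionary: if a code list changes during cleaning, only the mapping needs updating.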
Marking rubric for quality of Assignment 2A
The grading rubric is outlined below.
Criteria | HD (85–100) | D (75–84) | Credit (65–74) | Pass (50–64) | Fail (less than 50) |
Written documentation of reproducible notes (data dictionary, cleaning notes, flowcharts) | Full logic shown with elaborate and very coherent descriptions that go beyond replication | Full logic shown with coherent description | Most logic shown with mainly adequate detail | Some documentation presented but logic not clear | Not presented |
Question answers | All correct answers, full working shown, with in-depth reflection and suggestions | All correct answers, full working shown, with some critical reflection | Mostly correct answers, with some misinterpretation, working shown | Incomplete answers, misinterpreted results in most parts | No answer |
SAS code | Fully functioning and reproducible code with extensive and consistent annotation and commenting throughout | Fully functioning and reproducible code with moderate but consistent annotation | Functioning code with some annotation | Code with some errors, lacking annotation | Code with errors in most parts, or no code provided |
The scoring rubric for quality of Assignment 2A is outlined below.
Assessment of written document* and SAS code** | Max score | Your mark |
Part 1. Documentation of data cleaning decisions | | |
1.A – GP dataset exploration and cleaning (25 marks) | | |
A written document* explaining results of exploration, decisions made, and results of GP data cleaning | 10 | |
Flow chart for GP data cleaning | 5 | |
SAS code showing work for GP data exploration, decisions made and producing results of cleaning | 10 | |
1.B – ED dataset exploration and cleaning (20 marks) | ||
A written document* explaining results of exploration, decisions made, and results of ED data cleaning | 8 | |
Flow chart for ED data cleaning | 5 | |
SAS code showing work for ED data exploration, decisions made and producing results of cleaning | 7 | |
1.C – SAS code to create the five new variables (10 marks) | | |
smoke_status_GP (including documented justification) | 6 | |
risky_alcohol_GP | 1 | |
BMI_GP | 1 | |
obese_GP | 1 | |
highBP_GP | 1 | |
1.D – Update data dictionaries (10 marks) | ||
Updated data dictionary for GP dataset | 5 | |
Updated data dictionary for ED dataset | 5 | |
Part 2. Analyse cleaned GP data and report (20 marks) | ||
Tabular presentation of results | 7 | |
Written interpretation of results* | 8 | |
SAS code to generate results | 5 | |
Part 3. Data linkage (15 marks) | ||
Data linkage strategies | 5 | |
Diagram depicting variable exchange information | 10 | |
TOTAL MARK | 100 |
*: Assessment of the written document includes the use of academic and linguistic conventions, and coherent and succinct discussion supported by the information presented.
**: Assessment of SAS code includes i) correct use of common SAS commands introduced in the course, ii) effective creation of variables that takes into account variable type and values, and iii) clear annotation that enables reproducibility.
HDAT9400 Management and Curation of Health Data
Management and Curation of Australian Health Data
Assignment 2A – Group work
Please grade (Excellent to No contribution) and provide a score (0 to 20) for each of the other members of your group based on their contribution to the group process. If a peer made absolutely no contribution to the group, please score them zero.
Peer | Contribution grading (Excellent / Very good / Satisfactory / Fair / Poor / No contribution) | Score (0 to 20) | Comment |
1. Insert peer’s name | | | |
2. Insert peer’s name | | | |
3. Insert peer’s name | | | |
4. Insert peer’s name | | | |
Group check list (not for submission, just for your information)
For group tasks and assignments, it is in everybody’s interest to ensure that all group members contribute effectively, so that the finished assignment is of high quality. For this assignment, peer assessment is used to determine the relative contribution of each member to the group process, and this will be used to moderate the marks for the assignment.
The check list below is recommended to guide peer assessment and to monitor and improve the group process. Please refer to the UNSW guide https://student.unsw.edu.au/groupwork for the value of group work and the principles of effective group work.
Check list | Myself | Peer | Comments |
Effectively clarifying task or objective at each stage? | |||
Checking on progress? | |||
Clarifying and recording what the group decides? | |||
Clarifying who is going to do what? | |||
Clarifying when each task is to be done by? | |||
Establishing procedures for handling meetings? | |||
Keeping to agreed procedures? | |||
Listening to each other? | |||
Dominating / Allowing some members to dominate? | |||
Withdrawing / Allowing some members to withdraw? | |||
Compromising individual wants for the sake of the team? | | | |
Recognising the feelings of other members? | |||
Contributing equally to team progress? | |||
Following agreed procedures for writing and file naming? |
Peer assessment to moderate individual final mark
The individual contribution score (0 to 20) will be calculated as the average of the scores given by peers. If the individual contribution score is greater than zero, an individual’s final mark is the sum of the scores for assignment quality and individual contribution. If the individual contribution score is zero (i.e. no contribution was made), that individual’s final mark is zero. The table below illustrates how the final mark will be moderated based on the quality of group work and individual contribution.
Group member | Quality of group work (marks out of 80) | Individual contribution score (marks out of 20) | Final mark (marks out of 100) |
Member 1 | 75 | 18 | 93 |
Member 2 | 75 | 15 | 90 |
Member 3 | 75 | 5 | 80 |
Member 4 | 75 | 0 (zero) | 0 (zero) |
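The moderation rule above is simple arithmetic. A hedged Python sketch, assuming the quality mark is already expressed out of 80 as in the table (how the 100-mark scoring rubric is scaled to 80 is not stated, so the sketch does not attempt it); the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def contribution_score(peer_scores: Sequence[float]) -> float:
    """Individual contribution score: the average of the 0-20 scores given by peers."""
    return mean(peer_scores)


def final_mark(quality_out_of_80: float, peer_scores: Sequence[float]) -> float:
    """Final mark out of 100: quality plus contribution, or zero if contribution is zero."""
    contribution = contribution_score(peer_scores)
    if contribution == 0:
        return 0
    return quality_out_of_80 + contribution
```

For example, `final_mark(75, [18, 18, 18])` gives 93, matching Member 1, and a member whose peers all give 0 receives a final mark of 0, matching Member 4.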