Assessing Data Quality Assignment Answer

Assignment questions 

Part 1: Data cleaning, documentation and data dictionary update 65% 

Explore the data and decide on the approach to clean the GP and ED datasets. For this assignment, you should concentrate on within-dataset cleaning. You are not required to cross-check inconsistencies between the two datasets, so there is no need to merge them for this assignment.

For Part 1, you are required to:

  1. Present your work in cleaning GP data in the following forms (25 marks):
  • A written document explaining the process of GP data exploration, data cleaning, decisions made, and results of your analyses (10 marks);
  • A flowchart to graphically present procedures taken for cleaning GP data (5 marks); and
  • SAS code showing analysis for GP data exploration, data cleaning and annotations (10 marks).

2. Present your work in cleaning ED data in the following forms (20 marks):

  • A written document explaining the process of ED data exploration, data cleaning, decisions made, and results of your analyses (8 marks);
  • A flowchart to graphically present procedures taken for cleaning ED data (5 marks); and
  • SAS code showing analysis for ED data exploration, data cleaning and annotations (7 marks).

3. Create new variables in the GP data (see definitions in Table 1) (10 marks):

  • Create variable smoke_status_GP to indicate a person’s smoking status. Describe/justify your decision in the written document (6 marks);
  • Create variable risky_alcohol_GP to classify health risk alcohol consumption (1 mark);
  • Calculate BMI score (variable BMI_GP) (1 mark);
  • Create variable obese_GP to indicate whether a person is obese (1 mark); and
  • Create variable highBP_GP to indicate whether a person has high blood pressure (1 mark).

Table 1: New health risk factors variables, values and definitions (in people aged 18 years and over) 

New variable  Values, definition, and label of values
smoke_status_GP  0=Never smoked; 1=Current smoker; 2=Ex-smoker
risky_alcohol_GP  0=No (≤2 drinks per day); 1=Yes (>2 drinks per day)
BMI_GP  Weight/Height² (weight divided by the square of the height; weight is measured in kg, height is measured in meters)
obese_GP  0=No (BMI<30); 1=Yes (BMI≥30)
highBP_GP  0=Normal blood pressure (systolic<135mmHg & diastolic<85mmHg); 1=High blood pressure (systolic≥135mmHg or diastolic≥85mmHg)
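
The thresholds in Table 1 can be sketched in code. The example below is a minimal illustration in Python, not the SAS code the assignment requires. The function names and the choice to return None when an input is missing are assumptions made for the sketch, not part of the brief. smoke_status_GP is left out because its derivation is a decision you are asked to justify yourself.

```python
from typing import Optional


def bmi_gp(weight_kg: Optional[float], height_m: Optional[float]) -> Optional[float]:
    """BMI = weight (kg) divided by the square of height (m), as in Table 1."""
    # Assumption: a missing or non-positive input gives a missing BMI.
    if weight_kg is None or height_m is None or height_m <= 0:
        return None
    return weight_kg / height_m ** 2


def obese_gp(bmi: Optional[float]) -> Optional[int]:
    """0 = No (BMI < 30), 1 = Yes (BMI >= 30)."""
    if bmi is None:
        return None
    return 1 if bmi >= 30 else 0


def high_bp_gp(syst_bp: Optional[float], diast_bp: Optional[float]) -> Optional[int]:
    """1 = systolic >= 135 mmHg or diastolic >= 85 mmHg, otherwise 0."""
    # Assumption: both readings must be present before classifying.
    if syst_bp is None or diast_bp is None:
        return None
    return 1 if (syst_bp >= 135 or diast_bp >= 85) else 0


def risky_alcohol_gp(drinks_day: Optional[float]) -> Optional[int]:
    """0 = No (<= 2 drinks per day), 1 = Yes (> 2 drinks per day)."""
    if drinks_day is None:
        return None
    return 1 if drinks_day > 2 else 0
```

Note that the boundary values matter: a BMI of exactly 30 counts as obese, exactly 2 drinks per day is not risky, and a diastolic reading of exactly 85 mmHg is classed as high blood pressure.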

4. Update the data dictionaries. If you decide not to update a data dictionary, you should provide  reasons for not updating the data dictionary (10 marks). 

  • Present an updated data dictionary for the GP dataset, based on the results of data cleaning steps 1 and 3 above (1.A and 1.C in the scoring rubric) (5 marks).
  • Present an updated data dictionary for the ED dataset, based on the results of data cleaning step 2 above (1.B in the scoring rubric) (5 marks).

Part 2: Research Question 20% 

The manager of the Medical Plus GP practice is planning for health care coordination within the practice and wants to know the characteristics of their clients, such as socio-demographic characteristics, lifestyle factors, health status and others. You will be helping the practice manager to analyse the GP dataset and report on patient characteristics. You and the practice manager have agreed that the analysis will be based on a cleaned GP dataset (i.e. the dataset you cleaned in Part 1) and that the report (in Word or PDF) will be reproducible.

For Part 2, you are required to analyse the dataset that you cleaned in Part 1 and: 

  1. In the report, present results of your analysis in tabular format (7 marks). Your table(s) should be presented in an academic format similar to what would be found in the results section of a published journal article. You can present more than one table.
  2. In the report, provide written interpretation of the results (8 marks). Your written interpretation of results should be presented in an academic writing style.
  3. Include SAS code that generates the results that you report above (5 marks). Your SAS code should be well annotated and functioning.

Part 3: Data linkage 15% 

No data analysis is required to answer Part 3 questions. 

The Medical Plus GP manager wishes to link their GP dataset to the Registry of Births, Deaths and Marriages (RBDM) deaths data and PBS data to examine medication compliance among their patients. As the data custodian, the Medical Plus GP has access to patient identification information (names, addresses, dates of birth and Medicare number). The RBDM data custodian has access to identification information (names, addresses, and dates of birth). The PBS data custodian has access to the Medicare number only. A research institute will be contracted to analyse the linked data.

For this assignment, we can assume that patients have given consent for data linkage and ethics approval  has been granted. The linkage will be carried out by the Centre for Health Record Linkage (CHeReL). 

  1. What data linkage strategies will CHeReL use to link the GP dataset to RBDM deaths and PBS data? Justify your decision (5 marks).
  2. Draw a diagram depicting the exchange of variables between the data custodians (GP, PBS and RBDM), CHeReL and the analyst for data linkage and analysis purposes. Justify this data exchange process (10 marks).

GP data dictionary 

Variable  Description  Variable type  Format name  Allowable entries
ID  Unique person ID  Number
GP_last  Date of most recent GP visit  Date  DDMMYY10.  Dates in the range 01/01/2014 – 31/12/2014
age  Age of patient at the most recent GP visit in 2014  Number
sex  Gender of the patient  Character  1=male; 2=female
cob  In what country were you born?  Number  cobf.  1=Born in Australia; 2=Born overseas
healthcare_card  Do you have a healthcare card?¹  Number  ynf.  1=Yes; 0=No
ever_smoked  Have you ever been a regular smoker?  Number  ynf.  1=Yes; 0=No
smoke_now  Are you a regular smoker now?  Number  ynf.  1=Yes; 0=No
age_start  How old were you when you started smoking regularly?  Number  Invalid if <10 or >105
age_stop  How old were you when you stopped smoking? Or when did you stop smoking?  Number  Invalid if <10 or >105
drinks_day  About how many alcoholic drinks do you drink per day?  Number  Invalid if >20
height  How tall are you without shoes? (meters)  Number  Invalid if <0.55m or >2.40m
weight  About how much do you weigh? (kilograms)  Number  Invalid if <5.0kg or >270kg
adverse_reaction  Have you had any adverse reaction to any medication?  Number  ynf.  1=Yes; 0=No
syst_bp  Systolic blood pressure (mmHg)  Number
diast_bp  Diastolic blood pressure (mmHg)  Number
reason  Reason for the most recent GP visit  Character  HEADACHE; NAUSEA; TINNITUS; VOMITING; ITCHING; ABDOMINAL PAIN; DIZZINESS; SKIN RASH; PALPITATIONS; HALLUCINATIONS

¹ Australian residents may be eligible to have a Health Care Card if they receive financial support from the government. Benefits include a lower fee for prescription medicines under the Pharmaceutical Benefits Scheme, higher refunds for medical expenses through the Medicare Safety Net, and some other social concessions.
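
The allowable entries in the GP data dictionary can be turned into simple validity checks during data exploration. The sketch below is an illustration in Python rather than the SAS code the assignment requires. The names RANGE_RULES, CODED_VALUES and invalid_fields are hypothetical, and treating a missing value as "not invalid" is an assumption made for the sketch.

```python
from datetime import date

# Numeric limits copied from the "Allowable entries" column (None = no limit stated).
RANGE_RULES = {
    "age_start": (10, 105),
    "age_stop": (10, 105),
    "drinks_day": (None, 20),
    "height": (0.55, 2.40),   # meters
    "weight": (5.0, 270),     # kilograms
}

# Coded variables and the values their formats allow.
CODED_VALUES = {
    "cob": {1, 2},
    "healthcare_card": {0, 1},
    "ever_smoked": {0, 1},
    "smoke_now": {0, 1},
    "adverse_reaction": {0, 1},
}


def invalid_fields(record: dict) -> list:
    """Return the names of variables in one record that break the dictionary rules."""
    problems = []
    for name, (low, high) in RANGE_RULES.items():
        value = record.get(name)
        if value is None:
            continue  # assumption: missing values are handled separately
        if (low is not None and value < low) or (high is not None and value > high):
            problems.append(name)
    for name, allowed in CODED_VALUES.items():
        value = record.get(name)
        if value is not None and value not in allowed:
            problems.append(name)
    visit = record.get("GP_last")
    if visit is not None and not (date(2014, 1, 1) <= visit <= date(2014, 12, 31)):
        problems.append("GP_last")
    return problems
```

For example, a record with a height of 2.5, a cob of 3 and a GP_last date in 2015 would be flagged on all three variables. Flagging a value does not decide what to do with it; that decision still needs to be documented in your cleaning notes.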

ED data dictionary

Variable  Description  Variable type  Format name  Allowable entries
ID  Unique person ID  Number
ed_admission  Date of ED presentation  Date  DDMMYY10.  Dates in the range 01/01/2014 – 31/12/2014
ed_separation  Date of ED separation  Date  DDMMYY10.  Dates in the range 01/01/2014 – 31/12/2014
age_ed  Age of patient at ED presentation  Number
sex_ed  Gender of the patient  Number  sexf.  1=male; 2=female
cob_ed  In what country were you born?  Number  cobf.  1=Born in Australia; 2=Born overseas
interpreter  An interpreter is needed?  Number  ynf.  1=Yes; 0=No
health_insurance  Do you have private health insurance?  Number  ynf.  1=Yes; 0=No
triage_category  Urgency of presentation  Number  triagef.  1=Resuscitation; 2=Emergency; 3=Urgent; 4=Semi urgent; 5=Non urgent
dx1  Principal presenting diagnosis (ICD-10-AM codes)  Character  International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Australian Modification, 8th edition
dx2-dx5  Up to 4 additional diagnoses (ICD-10-AM codes)  Character  International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Australian Modification, 8th edition
separation_mode  Status of the person at separation from emergency department  Number  sepmodef.  1=Admitted to hospital; 2=Departed ED; 3=Died in ED; 4=Dead on arrival

Marking rubric for quality of Assignment 2A 

The grading rubric is outlined below. 

Criteria  HD (85 – 100)  Distinction (75 – 84)  Credit (65 – 74)  Pass (50 – 64)  Fail (less than 50)
Written documentation of reproducible notes – data dictionary, cleaning notes, flowcharts  Full logic shown with elaborate and very coherent descriptions which go beyond replication  Full logic shown with coherent description  Most logic shown with mainly adequate detail  Some documentation presented but logic not clear  Not presented
Question answers  All correct answers, full working shown, with in-depth reflection and suggestions  All correct answers, full working shown, with some critical reflection  Mostly correct answers, with some misinterpretation, working shown  Incomplete answers, misinterpreted results in most parts  No answer
SAS code  Fully functioning and reproducible code with extensive and consistent annotation and commenting throughout  Fully functioning and reproducible code with moderate but consistent annotation  Functioning code with some annotation  Code with some errors, lacking annotation  Code with errors in most parts, or no code provided

The scoring rubric for quality of Assignment 2A is outlined below. 

Assessment of written document* and SAS code**  Max score  Your mark
Part 1. Documentation of data cleaning decision
1.A – GP dataset exploration and cleaning (25 marks)
A written document* explaining results of exploration, decisions made, and  results of GP data cleaning  10
Flow chart for GP data cleaning  5
SAS code showing work for GP data exploration, decisions made and  producing results of cleaning  10
1.B – ED dataset exploration and cleaning (20 marks)
A written document* explaining results of exploration, decisions made, and  results of ED data cleaning  8
Flow chart for ED data cleaning  5
SAS code showing work for ED data exploration, decisions made and  producing results of cleaning  7
1.C – SAS code to create 4 new variables (10 marks)
Smoke_status_GP (including documented justification)  6
Risky_alcohol_GP  1
BMI_GP  1
Obese_GP  1
HighBP_GP  1
1.D Update data dictionaries (10 marks)
Updated data dictionary for GP dataset  5
Updated data dictionary for ED dataset  5
Part 2. Analyse cleaned GP data and report (20 marks)
Tabular presentation of results  7
Written interpretation of results*  8
SAS code to generate results  5
Part 3. Data linkage (15 marks)
Data linkage strategies  5
Diagram depicting variable exchange information  10
TOTAL MARK  100

*: Assessment of written document includes the use of academic and linguistic conventions, coherent and succinct  discussion with supporting information presented.  

**: Assessment of SAS code includes i) correct use of common SAS commands introduced in the course, ii) effective creation of variables that takes into account variable type and values, and iii) clear annotation that enables reproducibility.

HDAT9400 Management and Curation of Health Data 

Management and Curation of Australian Health Data 

Assignment 2A - Group work

Please grade (Excellent to No contribution) and provide a score (0 to 20) for each other member of  your group for their contribution to group process. If a peer made absolutely no contribution to the  group, please score zero. 

Peer  Excellent  Very good  Satisfactory  Fair  Poor  No contribution  Score (0 to 20)  Comment
1. Insert Peer’s name
2. Insert Peer’s name
3. Insert Peer’s name
4. Insert Peer’s name

Group check list (not for submission, just for your information) 

For group tasks and assignments, it is in everybody’s interest to ensure an effective contribution  from all group members, to make sure that the finished assignment is of high quality. For this  assignment, peer assessment is used to determine the relative contributions of everyone to the  group process and this will be used to moderate the marks for the assignment. 

The check list below is recommended to guide peer assessment and to monitor and improve effective group process. Please refer to the UNSW Guide https://student.unsw.edu.au/groupwork for the value of group work and principles for effective group work.

Check list  Myself  Peer  Comments
Effectively clarifying task or objective at each stage? 
Checking on progress? 
Clarifying and recording what the group decides? 
Clarifying who is going to do what? 
Clarifying when each task is to be done by? 
Establishing procedures for handling meetings? 
Keeping to agreed procedures? 
Listening to each other? 
Dominating / Allowing some members to dominate? 
Withdrawing / Allowing some members to withdraw? 
Compromising individual wants for the sake of the team? 
Recognising the feelings of other members? 
Contributing equally to team progress? 
Following agreed procedures for writing and file naming? 

Peer assessment to moderate individual final mark 

The individual contribution score (0 to 20) will be calculated as the average of the scores given by peers. If the individual contribution score is greater than zero, the final mark of an individual is the sum of the scores for assignment quality and individual contribution. If the individual contribution score is zero (i.e. no contribution was made), the final mark for that individual is zero. The table below illustrates how the final mark will be moderated based on the quality of group work and individual contribution.

Group member  Quality of group work (marks out of 80)  Individual contribution score (marks out of 20)  Final mark (marks out of 100)
Member 1  75  18  93
Member 2  75  15  90
Member 3  75  5  80
Member 4  75  0 (zero)  0 (zero)
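
The moderation rule above can be written as a short calculation. The sketch below is an illustration in Python only. The function name final_mark and the peer scores passed to it are hypothetical, and no rounding is applied because the brief does not specify any.

```python
from statistics import mean


def final_mark(group_quality: float, peer_scores: list) -> float:
    """Combine the group quality mark (out of 80) with the averaged peer score (out of 20)."""
    # Individual contribution score = average of the scores given by peers.
    contribution = mean(peer_scores)
    # A contribution score of zero means no contribution, so the final mark is zero.
    if contribution == 0:
        return 0
    return group_quality + contribution
```

With a group quality mark of 75, hypothetical peer scores of 18, 18 and 18 average to 18 and give a final mark of 93, matching Member 1 in the table. Peer scores of 0, 0 and 0 give a final mark of 0, matching Member 4.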