Data Analysis

Full Marks 10
• In order to complete this Assignment you will need to read through the Course
Specifications (via the Useful links menu of the StudyDesk) of the course, and the
Introductory Material under the “Getting started” section of the StudyDesk. An
alternative direct link to the Course Specifications is
http://www.usq.edu.au/course/specification/2015/STA2300-S2-2015-EXT-TWMBA.html
• This assignment is to be submitted online via the appropriate assignment submission
link on the Data Analysis StudyDesk accessed via UConnect.
• This assignment is intended to encourage you to gain familiarity with the resources
and support available in this course, and plan your use of these resources across
the semester.
• Answer the questions in your own words. Do not copy and paste text from
the Introductory Material on the StudyDesk (or from other sources) into your
assignment.
• Convert your Word document to PDF before submission. See the Introductory
Material (Section 5, Assignments) for information about how to do this properly.
• Satisfactory completion of all the items will contribute 5% to your final mark in
the course.
• This assessment item consists of 4 questions with multiple parts.

Question 1

(2 marks)

Answer the following questions by finding the relevant information under the
Introductory Material link in the “Getting started” section on the Data Analysis
StudyDesk.
(a) List (do not give details) the Course Resources you will have access to on the
StudyDesk. [0.5 marks]

STA2300—Data Analysis

(b) In the sections about assignments, what does it say about what you need to do if
your assignment is going to be submitted late and what is considered a valid reason?
[0.5 marks]
(c) What is the penalty for late submission of an assignment? [0.5 marks]
(d) According to the Course Specifications, what is the weighting of Assignment 3 and
out of what mark is it? [0.5 marks]

Question 2

(3 marks)

Answer the following questions by finding the relevant information on the Data Analysis
StudyDesk.
(a) What are the due dates of Assignment 2 and Assignment 3? [0.5 marks]
(b) Give a summary of the important information in the News Forum post with Important
Information as its subject heading (DO NOT COPY AND PASTE). [1 mark]
(c) According to the Study Schedule, which Module will be studied in the week just
after the mid-semester break? [0.5 marks]
(d) What is the penalty for plagiarism in an assignment? [1 mark]

Question 3

(3.5 marks)

In the Assignments and Datasets folder under the “Getting started” section of the
course StudyDesk, the data file Library2012New.sav contains information on various
aspects of selected libraries in the USA. [You should download the data file to your
computer before you open it in SPSS.]
(a) Write down the name of all categorical variables in the data set. [1.5 marks]
(b) From the data set find out the maximum and minimum values of the variable, local
population size. [1 mark]
(c) How many cases (or individuals) and how many variables are there in the data set
(exclude the last two manipulated columns)? [1 mark]
Note: If you can’t open the Library2012New.sav data file in SPSS for any reason, you
may use the Excel version of the same dataset, Library2012.xls, to answer this question.
However, you will use the Library2012New.sav data file to do your remaining assignments.
To learn the basic data entry skills in SPSS, make sure you work through at least
the first 7 SPSS Practice Exercises before attempting Assignment 2.

Question 4

(1.5 marks)

Read the Course Specifications and answer the following questions based on the
information under Important assessment information:
(a) What is the requirement for a student to satisfactorily complete an assessment
item in the course? [0.5 marks]
(b) Write in your own words (no copy and paste) the requirements for a student to
be awarded a passing grade in the course. [0.5 marks]
(c) What are the items that students are allowed to bring with them in the final
exam? [0.5 marks]

Assignment 2
Due Date: Monday, 7 September 2015
Weighting: 20%

Full Marks 100
• This assignment is important in providing feedback and helping to establish
competency in essential skills.
• Answer all the questions. The questions are not of equal weight, and some
questions are worth much more than others.
• The questions relate to material in Modules 1 to 6.
• Read the Introductory Material of the course before starting this Assignment.
• When you are asked to comment on a finding, usually a short paragraph is all
that is required.
• For all graphs, label the axes correctly, include a contextual title and the units
of measurement or categories.
• In many cases the SPSS output contains much more information than is required
for a correct and complete answer. In those cases just reproducing the output
may not attract any marks. Make sure you report only the information from the
SPSS output relevant to your answer.

Question 1

(22 marks)

The data set Library2012New.sav, available on the course StudyDesk, contains
information on several variables of some selected libraries including city,
accreditation, year of establishment, total revenue, operational revenue, number of
books and registered borrowers. Here we are interested in the accreditation status
and handicapped access facility.
(a) Produce a two-way (contingency) table to investigate any association between
accreditation status and handicapped access facility. [4 marks]
(b) Find the joint distribution of accreditation status and handicapped access facility.
[4 marks]
(c) In no more than 60 words, describe any special features (variation in the cell, and/or,
row/column percentages) of the joint distribution. [3 marks]
(d) Find the (conditional) distribution of handicapped access facility for non-accredited
libraries. [4 marks]
(e) What percentage of accredited libraries have a handicapped access facility?
[3 marks]
(f) Is there any indication of an association (relationship) between accreditation status
and handicapped access facility? Support your answer with evidence from the
data. [4 marks]
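
As an aid to parts (b) to (e), the following is a minimal sketch of how joint and conditional distributions are derived from a two-way table of counts. It assumes Python rather than SPSS, and the counts and category labels are hypothetical; they are not taken from Library2012New.sav.

```python
# Hypothetical counts for illustration only; NOT taken from Library2012New.sav.
# Keys are (accreditation status, handicapped access facility).
counts = {
    ("Accredited", "Access"): 40,
    ("Accredited", "No access"): 10,
    ("Not accredited", "Access"): 30,
    ("Not accredited", "No access"): 20,
}

total = sum(counts.values())

# Joint distribution: every cell expressed as a percentage of the grand total.
joint = {cell: 100 * n / total for cell, n in counts.items()}


def conditional(status):
    """Row percentages: distribution of access within one accreditation group."""
    row = {access: n for (s, access), n in counts.items() if s == status}
    row_total = sum(row.values())
    return {access: 100 * n / row_total for access, n in row.items()}


print(joint)
print(conditional("Accredited"))
print(conditional("Not accredited"))
```

If the conditional distributions differ clearly between the two accreditation groups, that is the kind of evidence part (f) asks for.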

Question 2

(20 marks)

The data set Library2012New.sav, available on the course StudyDesk, contains
information on several variables of some selected libraries including city,
accreditation, year of establishment, total revenue, operational revenue, number of
books and registered borrowers. Here we are interested in investigating the association
between the number of registered borrowers and number of children’s materials in
circulation.
(a) Use an appropriate graph to display the relationship between the number of
registered borrowers and number of children’s materials in circulation. [6 marks]
(b) Describe the form, direction and strength of the relationship between the number
of registered borrowers and number of children’s materials in circulation in about
40 words. [4 marks]
(c) Calculate the value of an appropriate statistic to describe the strength of the linear
association between the number of registered borrowers and number of children’s
materials in circulation. [2 marks]
(d) Write the equation of the regression line to predict the value of the number of
children’s materials in circulation based on the number of registered borrowers. [4
marks]
(e) Use the above regression equation to predict the number of children’s materials in
circulation for the library having 56425 registered borrowers. [Refer to case 85 of
the data set.] [2 marks]
(f) From the above predicted number of children’s materials in circulation, find the
residual if the observed number in circulation is 182571. [2 marks]

Question 3

(16 marks)

The data set Library2012New.sav, available on the course StudyDesk, contains
information on several variables of some selected libraries including city,
accreditation, year of establishment, total revenue, operational revenue, number of
books and registered borrowers. Here we are interested in investigating the distribution
of hours open per week.
(a) Use an appropriate graph to display the distribution of the hours open per week. [4
marks]
(b) Describe the shape, centre and spread of the distribution in about 50 words. [4
marks]
(c) On the same graph, display the two distributions of the hours open per week for the
accredited and non-accredited libraries. [4 marks]
(d) Compare the two distributions of the hours open per week for the accredited and
non-accredited libraries in no more than 60 words. [4 marks]

Question 4

(16 marks)

A recent business survey in Toowoomba reveals that 20% of the retail shops plan to hire
new staff within the financial year. An Economics professor at USQ takes a random
sample of 15 retail shops in Toowoomba to study the issue more thoroughly. A particular
variable of interest is the number of retail shops planning to hire new staff. Based on
the above information answer the following questions:
(a) What is an appropriate model to represent the variable of interest? Write down the
parameters of the model, if any. [3 marks]
(b) Discuss how the conditions of the above model are satisfied in the current study. [4
marks]
(c) Find the mean and standard deviation of the model using the parameters of the
model. [3 marks]
(d) What is the probability that at least 2 of the retail shops in the sample plan to hire
new staff? [3 marks]
(e) What is the probability that no more than 2 of the retail shops in the sample plan
to hire new staff? [3 marks]
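
Calculations of the kind asked for in parts (c) to (e) can be checked in software. The following is a minimal sketch in Python (standard library only), assuming a binomial model with n = 15 and p = 0.20 is the appropriate choice for part (a); the helper name `binom_pmf` is introduced here for illustration.

```python
from math import comb, sqrt

n, p = 15, 0.20  # sample size and success probability stated in the question


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


mean = n * p                      # mean of the binomial model
sd = sqrt(n * p * (1 - p))        # standard deviation of the binomial model

# "No more than 2" means X = 0, 1 or 2.
p_at_most_2 = sum(binom_pmf(k, n, p) for k in range(0, 3))

# "At least 2" is the complement of X = 0 or 1.
p_at_least_2 = 1 - sum(binom_pmf(k, n, p) for k in range(0, 2))

print(f"mean = {mean:.2f}, sd = {sd:.4f}")
print(f"P(X <= 2) = {p_at_most_2:.4f}")
print(f"P(X >= 2) = {p_at_least_2:.4f}")
```

Note that the two events overlap at X = 2, so the two probabilities do not add to 1.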

Question 5

(14 marks)

How does a new vaccine protect against Swine Flu? A pharmaceutical company
prepared three levels of doses for a new Flu vaccine to be tested clinically. The first level
A contained 5ml, the second level B contained 7.5ml, and the third level C contained
10ml of the actual drug. The vaccine was administered to a group of 150 randomly
selected healthy adults equally divided for the three levels of doses. Another 50 randomly
selected healthy adults received a placebo. Like the health workers who administered
the vaccine, the subjects were not aware of the level of dose or placebo they received.
The incidence of Swine Flu was monitored for a period of six months, and the data were
recorded for each group of subjects.
(a) For the above study identify, if appropriate,
(i) the response variable(s). [2 marks]
(ii) the factor and its levels. [2 marks]
(iii) the experimental units. [1 mark]
(b) Is this an experimental or observational study? Justify your answer in the context
of the question. [2 marks]
(c) Is this a double-blind study? Explain in the context of this study. [2 marks]
(d) What was the sample size for the study? [1 mark]
(e) Are the four principles of experimental design used in this study? Explain, in the
context of the study. [4 marks]

Question 6

(12 marks)

The height of the members of a city basketball club is distributed according to a normal
model with mean µ = 170cm and standard deviation σ = 6cm.
(a) What is the probability that a randomly selected member of the club is taller than
160cm? [2 marks]
(b) What proportion of the members are of height between 164cm and 182cm? [3 marks]
(c) Suppose that the tallest 10% of the members are selected for a friendly weekend
match. What is the minimum height to be selected for the match? [3 marks]
(d) What is the cutoff height for the shortest 28.1% of the members of the club? [4
marks]
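
Normal-model probabilities and cutoffs such as those in parts (a) to (d) can be checked against table look-ups. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; software values may differ slightly from answers based on rounded z-scores from tables.

```python
from statistics import NormalDist

# Normal model stated in the question: mean 170 cm, standard deviation 6 cm.
height = NormalDist(mu=170, sigma=6)

# Probability of being taller than 160 cm (upper tail).
p_taller_160 = 1 - height.cdf(160)

# Proportion between 164 cm and 182 cm (difference of two cumulative areas).
p_between = height.cdf(182) - height.cdf(164)

# Minimum height of the tallest 10%: the 90th percentile.
cutoff_top_10 = height.inv_cdf(0.90)

# Cutoff for the shortest 28.1%: the 28.1th percentile.
cutoff_bottom = height.inv_cdf(0.281)

print(f"P(X > 160) = {p_taller_160:.4f}")
print(f"P(164 < X < 182) = {p_between:.4f}")
print(f"90th percentile = {cutoff_top_10:.2f} cm")
print(f"28.1th percentile = {cutoff_bottom:.2f} cm")
```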

Assignment 3
Due Date: Monday, 19 October 2015
Weighting: 25%

Full Marks 100
• This assignment is important in providing feedback and helping to establish
competency in essential skills.
• Answer all the questions. The questions are not of equal weight, and some
questions are worth much more than others.
• The questions relate to material up to and including Module 10.
• Read the Introductory Material of the course before starting this Assignment.
• When you are asked to comment on a finding, usually a short paragraph is all
that is required.
• For all graphs, label the axes correctly, include a contextual title and the units
of measurement.
• In many cases, SPSS output contains much more information than is required
for a correct and complete answer. In those cases just reproducing the output
may not attract any marks. Make sure you report only the information from the
SPSS output relevant to your answer.
• Unless instructed otherwise, show all working and formulae used in calculating
confidence intervals and performing hypothesis tests. (Answers may of course
be checked where possible using computer software.)

Question 1

(25 marks)

Considering the two variables number of visits and number of adult materials in
circulation in the data file Library2012New.sav, available on the course StudyDesk,
assuming that the data represents an SRS, answer the following questions.
(a) Find estimates of the mean (µ) and standard deviation (σ) of the (i) number of
visits and (ii) number of adult materials in circulation. [6 marks]
(b) Obtain a 90% confidence interval for the difference of means of the number of visits
and number of adult materials in circulation in the libraries. [4 marks]
(c) State the hypotheses to test, if the difference of means of the number of visits and
number of adult materials in circulation is significant. [3 marks]
(d) Find the value of the appropriate test statistic for the test in part (c). [5 marks]
(e) Obtain the P -value, and make an appropriate conclusion on the outcome of the
test. [4 marks]
(f) What assumptions are necessary for the inference procedure in part (b) to be valid?
[3 marks]
Note: In some parts of this question you may need to decide if a paired sample or
two independent samples procedure is appropriate. You will not be penalised if you
properly justify your choice and subsequent answers are correct.

Question 2

(13 marks)

A new surgical procedure is successful with probability p = 0.8. Assume that the
operation is performed five times and the results are independent of one another.
(a) What is the probability that all five operations are successful? [2 marks]
(b) What is the probability that fewer than two operations are successful? [3 marks]
(c) Find the mean and standard deviation of the number of successful operations. [3 marks]
(d) If the procedure is performed 100 times, what is the probability that at least 95
operations are successful? [5 marks]
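
For part (d), one possible approach is a normal approximation to the binomial model, which can then be compared with the exact binomial sum. The following is a minimal sketch in Python (standard library only); it is not a statement of the method the course expects, and it applies no continuity correction. This far into the tail, the approximate and exact values can differ noticeably.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.8  # number of operations and success probability from the question

# Common rule of thumb for the normal approximation: np and n(1 - p) both at least 10.
conditions_ok = n * p >= 10 and n * (1 - p) >= 10

# Exact binomial probability of at least 95 successes.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(95, n + 1))

# Normal approximation with the same mean and standard deviation.
approx_model = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = 1 - approx_model.cdf(95)

print(f"conditions satisfied: {conditions_ok}")
print(f"exact binomial P(X >= 95) = {exact:.3e}")
print(f"normal approximation      = {approx:.3e}")
```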

Question 3

(18 marks)

A Psychology research team was interested in studying the reaction time of university
students. They took a random sample of 100 students from across the university and
administered a series of tests to determine the reaction time. The observed mean and
standard deviation of the data are 27.35 seconds and 6.31 seconds respectively.
(a) What is the sampling distribution of the sample mean? Justify your answer. [4
marks]
(b) Find a 95% confidence interval for the mean reaction time of all university students.
[4 marks]
(c) Give the correct interpretation of the above confidence interval. [2 marks]
(d) Calculate the margin of error for a 99% confidence interval for the mean reaction
time. What is the width of the 99% confidence interval? [4 marks]
(e) If the population standard deviation is 8 seconds, what sample size would be
required to produce a 95% confidence interval for the population mean reaction time
with a margin of error of 1.50 seconds? [4 marks]
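
The interval, margin of error and sample size calculations in parts (b), (d) and (e) can be sketched as follows. This is a minimal Python example (standard library only) that assumes normal (z) critical values as a large-sample approximation; if the course expects t critical values when σ is estimated by s, the intervals in parts (b) and (d) will be slightly wider. The helper name `z_star` is introduced here for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

n, xbar, s = 100, 27.35, 6.31  # sample size, mean and standard deviation from the question


def z_star(level):
    """Two-sided standard normal critical value for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


se = s / sqrt(n)  # standard error of the sample mean

# 95% confidence interval: estimate plus or minus margin of error.
me_95 = z_star(0.95) * se
ci_95 = (xbar - me_95, xbar + me_95)

# 99% margin of error; the interval width is twice the margin.
me_99 = z_star(0.99) * se
width_99 = 2 * me_99

# Sample size for a target margin of error with known sigma: n = (z* sigma / m)^2,
# always rounded up to the next whole number.
sigma, target_me = 8, 1.50
n_required = ceil((z_star(0.95) * sigma / target_me) ** 2)

print(f"95% CI: ({ci_95[0]:.2f}, {ci_95[1]:.2f})")
print(f"99% margin of error = {me_99:.3f}, width = {width_99:.3f}")
print(f"required sample size = {n_required}")
```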

Question 4

(17 marks)

From long term experience it is known that the time required to answer a set of 10
computer managed questions in the Data Analysis course follows a normal distribution
with mean µ = 15 minutes and standard deviation σ = 2 minutes. If a randomly
chosen off-campus student answers a test of 10 computer managed questions, answer
the following questions.
(a) What is the probability that she would complete the test in less than 14 minutes?
[4 marks]
(b) What is the probability that she would complete the test between 15 and 19
minutes? [4 marks]
(c) Determine her completion time so that only 10% of the students doing the test
will take longer than her. [4 marks]
(d) For a set of 5 randomly selected tests (each with 10 questions), what is the
probability that her mean completion time will be 14 minutes or more? [5 marks]
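
Part (d) differs from parts (a) to (c) in that it concerns a sample mean, whose standard deviation is σ/√n rather than σ. A minimal sketch of that step, assuming Python's standard-library `statistics.NormalDist` and that the 5 completion times are independent:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 15, 2, 5  # population mean, population sd (minutes) and number of tests

# Sampling distribution of the mean of n independent normal observations.
xbar_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))

# Probability that the mean completion time is 14 minutes or more.
p_mean_at_least_14 = 1 - xbar_dist.cdf(14)

print(f"sd of the sample mean = {xbar_dist.stdev:.4f} minutes")
print(f"P(mean >= 14) = {p_mean_at_least_14:.4f}")
```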

Question 5

(12 marks)

In January this year, 200 randomly selected voters in Australia were asked whether they
believed that the Government is doing a good job to protect the environment.
(a) If 156 of these 200 voters believe the Government is doing a good job, determine
a 90% confidence interval for the true proportion of voters who believe the
Government is doing a good job to protect the environment. [6 marks]
(b) In previous years, approximately 70% of the voters believed the Government
was doing a good job to protect the environment. Has the proportion of voters who
believe the Government is doing a good job to protect the environment changed?
Test this hypothesis at the 1% level. Show all your working. [6 marks]
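
Hand working for parts (a) and (b) can be checked in software. The following is a minimal sketch in Python (standard library only), assuming the usual one-proportion z procedures: the interval uses the sample proportion in its standard error, the test uses the hypothesised proportion, and the alternative is taken to be two-sided because the question asks whether the proportion has "changed".

```python
from math import sqrt
from statistics import NormalDist

n, successes = 200, 156   # sample size and number agreeing, from the question
p_hat = successes / n     # sample proportion
z = NormalDist()          # standard normal model

# (a) 90% confidence interval: standard error based on the sample proportion.
se_ci = sqrt(p_hat * (1 - p_hat) / n)
z_crit = z.inv_cdf(0.95)  # leaves 5% in each tail
ci_90 = (p_hat - z_crit * se_ci, p_hat + z_crit * se_ci)

# (b) Test of H0: p = 0.70 against Ha: p != 0.70 at the 1% level.
p0, alpha = 0.70, 0.01
se_test = sqrt(p0 * (1 - p0) / n)        # standard error under H0
z_stat = (p_hat - p0) / se_test
p_value = 2 * (1 - z.cdf(abs(z_stat)))   # two-sided P-value
reject_h0 = p_value < alpha

print(f"90% CI: ({ci_90[0]:.4f}, {ci_90[1]:.4f})")
print(f"z = {z_stat:.3f}, P-value = {p_value:.4f}, reject H0: {reject_h0}")
```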

Question 6

(15 marks)

Answer the following questions:
(a) In no more than 100 words, identify the problems associated with non-random
sampling. [2 marks]
(b) Based on a random sample of size n = 144 from a population with proportion
p = 0.52, explain the sampling distribution of the sample proportion. State the
name of the distribution, underlying parameter(s), and any assumptions required.
[3 marks]
(c) Based on a random sample of size n = 144 from a population with mean µ = 20 and
standard deviation σ = 6, explain the sampling distribution of the sample mean. In
your answer you may state the name of the distribution, underlying parameter(s),
and any assumptions required. [3 marks]
(d) State how you would describe any association between (i) two categorical variables,
(ii) one categorical and one quantitative variable, and (iii) two quantitative variables. [3 marks]
(e) State and explain the Central Limit Theorem (CLT) for the sample mean when the
population distribution is (i) symmetric and (ii) not symmetric. [2 marks]
(f) With appropriate examples, distinguish between paired samples and two independent
samples. [2 marks]