Project Part II: Data and Descriptive Statistics (47 points total)
In this section you will expand your project to include your actual data set, descriptive statistics for your quantitative
variables, linear regression between those two variables, and a discussion of your results. Your project should be submitted as a
professional report including everything from Part I and II using the following headings:
Research Proposal: Introduction, Methods, and Materials from Part I
*** If you did not complete Part I, you must at a minimum obtain your instructor's approval of your variables and collect data. If
Part I is missing, you will still lose completeness points.
Data: You must collect your own data set using your survey and following the plan you set out in Part I. Include your actual data,
neatly arranged in a table, one row per case with a column for each variable. Remember, you should have at least 20 cases
(subjects)!
For example:
Case          Hours studied per week    GPA
Student 1*    5                         3.0
Student 2     10                        3.2
Student 3     8                         2.9
*Note: Do not report subjects' names, in order to protect confidentiality.
Distribution of Quantitative Data:
Create both a histogram and a modified boxplot for EACH of your quantitative variables (this means you should have a total of 2
histograms and 2 modified boxplots). You can choose what technology to use, but it must look professional (hand-drawn histograms or
boxplots will receive no credit). Indicate how each graph was created using the technology (i.e., state what technology was used
and give instructions).
For each boxplot, report the fences. You must show your computations for anything computed by hand and show instructions for
anything computed using technology.
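
As one illustration of computing fences with technology, the following Python sketch applies the 1.5 × IQR rule. The data values are made up for illustration only, and the quartile method shown ("inclusive") is just one convention; your calculator or software may report slightly different quartiles, so state which tool you used.

```python
import statistics

# Hypothetical hours-studied values, made up for illustration only.
hours = [2, 4, 5, 5, 6, 7, 8, 8, 9, 10, 12, 25]

# With n=4, statistics.quantiles returns [Q1, median, Q3].
# method="inclusive" is an assumption; other tools may use another convention.
q1, median, q3 = statistics.quantiles(hours, n=4, method="inclusive")

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any value outside the fences is designated an outlier.
outliers = [x for x in hours if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"Lower fence = {lower_fence}, upper fence = {upper_fence}")
print(f"Outliers: {outliers}")
```

In this made-up data set only the value 25 falls outside the fences, so it would be plotted as a separate point on a modified boxplot.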
For each quantitative variable: Use the histograms and boxplots to describe the distribution. Discuss the shape (modality,
symmetry/skew, and unusual features such as gaps or outliers) of your data for each variable. For example, were any of your data
points designated outliers based on the fences (which are based on the 1.5 × IQR rule)? Report the summary statistics (mean, SD,
5-number summary, IQR). Indicate whether the mean or median is a better measure of center, explain why, and give that value.
Indicate whether the standard deviation or IQR is a better measure of spread, explain why, and give that value.
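
If you compute the summary statistics with technology, the work might look something like the Python sketch below. The data are hypothetical, the standard deviation is the sample standard deviation, and the quartile convention ("inclusive") is an assumption that may not match your calculator.

```python
import statistics

# Hypothetical hours-studied values, made up for illustration only.
hours = [2, 4, 5, 5, 6, 7, 8, 8, 9, 10, 12, 25]

mean = statistics.mean(hours)
sd = statistics.stdev(hours)  # sample standard deviation (divides by n - 1)

# Quartiles under the "inclusive" convention (an assumption).
q1, median, q3 = statistics.quantiles(hours, n=4, method="inclusive")
five_number = (min(hours), q1, median, q3, max(hours))
iqr = q3 - q1

print(f"Mean = {mean:.2f}, SD = {sd:.2f}")
print(f"5-number summary (min, Q1, median, Q3, max) = {five_number}")
print(f"IQR = {iqr}")
```

Here the single large value pulls the mean above the median, which is the kind of evidence you would cite when arguing that the median and IQR describe a skewed distribution better than the mean and standard deviation.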
Z-Scores and the Normal Distribution:
Identify which one of your quantitative variables has a distribution that is closest to a normal distribution (unimodal and
symmetric). Discuss which variable you chose and why.
For this variable, select the highest and lowest values in your data set and compute the z-score of each. You must show all your
calculations typed neatly using appropriate word processing software with a mathematics package (for example Equation Editor or
MathType in MS Word). Answer the question: which z-score is more extreme?
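
To check hand calculations, a z-score can be computed as (value − mean) / standard deviation. The Python sketch below does this for the lowest and highest values of a hypothetical GPA list (made up for illustration) and uses the sample standard deviation; it does not replace the typed calculations required above.

```python
import statistics

# Hypothetical GPA values, made up for illustration only.
gpa = [2.1, 2.5, 2.8, 2.9, 3.0, 3.0, 3.1, 3.2, 3.4, 3.9]

mean = statistics.mean(gpa)
sd = statistics.stdev(gpa)  # sample standard deviation


def z_score(value):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


z_low = z_score(min(gpa))
z_high = z_score(max(gpa))

# The more extreme z-score is the one farther from 0, regardless of sign.
more_extreme = "highest" if abs(z_high) > abs(z_low) else "lowest"

print(f"z for lowest value ({min(gpa)}): {z_low:.2f}")
print(f"z for highest value ({max(gpa)}): {z_high:.2f}")
print(f"The {more_extreme} value has the more extreme z-score.")
```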
Also, pick one other data value from your data set. Compute its z-score. Use the normal model to approximate the probability
(percent of the data) of being more extreme than this value. For example, if your z-score is negative, such as -0.4, you would want to
find the probability of having a z-score LESS than -0.4. If your z-score is positive, such as 1.2, you would want to find the
probability of having a z-score GREATER than 1.2. If you use technology to assist you, explain how.
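
The two example z-scores above can be worked with technology as follows. This is a minimal Python sketch using the standard normal model; the function name tail_probability is made up for illustration.

```python
from statistics import NormalDist


def tail_probability(z):
    """Probability, under the standard normal model, of a z-score more extreme than z."""
    standard_normal = NormalDist(mu=0, sigma=1)
    if z < 0:
        # Negative z: area to the LEFT of z.
        return standard_normal.cdf(z)
    # Positive z: area to the RIGHT of z.
    return 1 - standard_normal.cdf(z)


print(f"P(Z < -0.4) = {tail_probability(-0.4):.4f}")
print(f"P(Z > 1.2)  = {tail_probability(1.2):.4f}")
```

The results, about 0.3446 and 0.1151, agree with a standard normal table, so roughly 34% of values would be expected to fall below z = -0.4 and roughly 12% above z = 1.2.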
Linear Regression and Correlation:
Recall that in your research proposal you discussed that you believed there might be an association between your two
quantitative variables. You are now going to examine this relationship using Linear Regression. Based on what you wrote in Part I
of your project, state which variable you are selecting to be your explanatory (x) variable and which variable you are selecting to
be your response (y) variable and explain why you made this decision.
Scatterplot: Create a scatterplot of your explanatory and response variables. You can choose what technology to use, but it must
look professional (hand-drawn scatterplots will receive no credit). Based on your scatterplot, discuss the direction, form, and
strength of the association using appropriate statistical terminology. Are there any suspected outliers or clusters? Are linear
correlation and regression appropriate based on your scatterplot?
Correlation Coefficient and Linear Regression: Report the correlation coefficient and linear regression equation using your choice
of technology. Indicate how your correlation coefficient and linear regression equation were created using the technology (i.e.
give detailed instructions). Does your correlation coefficient confirm your observations of the scatterplot from the previous
section? How so, or why not?
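
For reference, the correlation coefficient and least-squares line can be computed directly from their definitions. The Python sketch below uses only the three example rows from the data table earlier in this handout (a real project needs at least 20 cases), and the function name linear_fit is made up for illustration. Most statistical software reports the same quantities.

```python
import math

# The three example rows from the data table in this handout.
hours = [5, 10, 8]      # explanatory variable (x)
gpa = [3.0, 3.2, 2.9]   # response variable (y)


def linear_fit(x, y):
    """Return (r, slope, intercept) for the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)       # correlation coefficient
    slope = sxy / sxx                    # change in predicted y per unit of x
    intercept = y_bar - slope * x_bar    # line passes through (x_bar, y_bar)
    return r, slope, intercept


r, slope, intercept = linear_fit(hours, gpa)
print(f"r = {r:.3f}")
print(f"predicted GPA = {intercept:.3f} + {slope:.4f} * (hours studied per week)")
```

The sign of r should match the direction you described from the scatterplot, and its magnitude should be consistent with the strength you described.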
Discussion: Do the results of your linear regression and correlation analysis appear to confirm or contradict your initial belief
that these variables were associated in some way? Critically evaluate this conclusion by addressing both the evidence in support of
your conclusion about the purported relationship between your two variables, as well as cautions or problems with your data that
would weaken your case. The direction, form, and strength of the association in your scatterplot, as well as the strength of the correlation coefficient,
should be discussed in this evaluation. If your data contained outliers, these must also be discussed for full credit.
Additionally, you must identify the sampling method you used (systematic, simple random, stratified, cluster, convenience, voluntary
response, etc.) and discuss what limitations you see with your research, including sources of bias or other problems that might
limit how well your research generalizes to the greater population. What further conclusions might you draw based on this deeper
analysis?
Technology Considerations: Discuss what technology you chose to use for your displays of quantitative data (Histogram and Boxplot)
and why. Discuss what technology you chose to create your scatterplots and to compute the correlation coefficient and linear
regression equation and why. For full credit be sure to discuss what technology options you considered, the pros and cons of each,
and what considerations or concerns led you to make the choice you made.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Grading Rubric: Part II will be graded by components according to the following guidelines with comments provided to students.
Data and Inclusion of Part I (6 points)
3 Criteria: 1) Part I included 2) Data is Complete/Readable and 3) Data matches proposal in Part I.
6 Points 4 Points 2 Points 0 Points