# Distribution of Quantitative Data

Project Part II: Data and Descriptive Statistics (47 points total)
In this section you will expand your project to include your actual data set, descriptive statistics for your quantitative variables, linear regression between those two variables, and a discussion of your results. Your project should be submitted as a professional report including everything from Parts I and II using the following headings:
Research Proposal: Introduction, Methods, and Materials from Part I
*** If you did not do Part I, you must at a minimum obtain your instructor's approval of your variables and collect data. If Part I is missing, you will still lose completeness points.
Data: You must collect your own data set using your survey and following the plan you set out in Part I. Include your actual data, neatly arranged in a table, one row per case with a column for each variable. Remember, you should have at least 20 cases (subjects)!
For example:
| Case | Hours studied per week | GPA |
|---|---|---|
| Student 1* | 5 | 3.0 |
| Student 2 | 10 | 3.2 |
| Student 3 | 8 | 2.9 |
*Note: Do not report subject names due to confidentiality
Distribution of Quantitative Data:
Create both a histogram and modified boxplot for EACH of your quantitative variables (this means you should have a total of 2 histograms and 2 modified boxplots). You can choose what technology to use, but it must look professional (hand drawn histograms or boxplots will receive no credit). Indicate how your graph was created using the technology (i.e., indicate what technology was used and give instructions).
For each boxplot, report the fences. You must show your computations for anything computed by hand and show instructions for anything computed using technology.
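As an illustration of the fence computation, here is a minimal Python sketch using only the standard library. The data values and the helper name `fences` are made up for this example. The sketch also assumes the "inclusive" quartile convention of `statistics.quantiles`; calculators and textbooks often use a slightly different quartile rule, so use the one taught in your course and expect small differences.

```python
import statistics

# Hypothetical sample: hours studied per week for 20 students (made-up values).
hours = [2, 4, 5, 5, 6, 7, 8, 8, 9, 10, 10, 11, 12, 12, 13, 14, 15, 16, 18, 30]

def fences(data):
    """Return (lower_fence, upper_fence) based on the 1.5 * IQR rule."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

lower, upper = fences(hours)
# Any value outside the fences is designated an outlier on a modified boxplot.
outliers = [x for x in hours if x < lower or x > upper]
print(f"Lower fence: {lower}, upper fence: {upper}, outliers: {outliers}")
```

If you compute the fences this way, your report should still state the quartile values and show the two fence calculations explicitly.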
For each quantitative variable: Use the histograms and boxplots to describe the distribution. Discuss the shape (modality, symmetry/skew, and unusual features such as gaps or outliers) of your data for each variable. For example, were any of your data points designated outliers based on the fences (which are based on the 1.5 IQR rule)? Report the summary statistics (Mean, SD, 5-number summary, IQR). Indicate whether the mean or median is a better measure of center, explain why, and give that value. Indicate whether the standard deviation or IQR is a better measure of spread, explain why, and give that value.
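The summary statistics requested above could be obtained with a short Python sketch like the following. The data and the helper name `summarize` are hypothetical, the standard deviation is the sample SD (dividing by n - 1), and the quartiles again assume the "inclusive" convention of `statistics.quantiles`, which may differ slightly from your calculator or textbook.

```python
import statistics

# Hypothetical sample: hours studied per week for 20 students (made-up values).
hours = [2, 4, 5, 5, 6, 7, 8, 8, 9, 10, 10, 11, 12, 12, 13, 14, 15, 16, 18, 30]

def summarize(data):
    """Return mean, sample SD, five-number summary, and IQR for a list of numbers."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),  # sample standard deviation (n - 1)
        "five_number": (min(data), q1, median, q3, max(data)),
        "iqr": q3 - q1,
    }

summary = summarize(hours)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up sample the single large value (30) pulls the mean above the median, which is the kind of evidence you would cite when arguing that the median and IQR describe a skewed distribution better than the mean and standard deviation.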

Z-Scores and the Normal Distribution:
Identify which one of your quantitative variables has a distribution that is closest to a normal distribution (unimodal and symmetric). Discuss which variable you chose and why.
For this variable, select the highest and lowest value within your data set and compute the z-score. You must show all your calculations typed neatly using appropriate word processing software with a mathematics package (for example Equation Editor or MathType in MS Word). Answer the question: which z-score is more extreme?
Also, pick one other data value from your data set. Compute its z-score. Use the normal model to approximate the probability (percent of the data) of being more extreme than this value. For example, if your z-score is negative such as -0.4 you’d want to find the probability of having a z-score LESS than -0.4. If your z-score is positive such as 1.2, you’d want to find the probability of having a z-score GREATER than 1.2. If you use technology to assist you, explain how.
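One possible way to check the z-score and normal-model calculations with technology is sketched below in Python. It uses z = (value - mean) / SD with the sample standard deviation and the standard normal distribution from the `statistics` module. The GPA values and the helper names `z_score` and `prob_more_extreme` are made up for illustration. Typed hand calculations are still required in the report.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample: GPA for 20 students (made-up values).
gpa = [2.1, 2.4, 2.5, 2.7, 2.8, 2.9, 2.9, 3.0, 3.0, 3.1,
       3.1, 3.2, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 4.0]

def z_score(value, data):
    """Standardize a value: (value - mean) / sample standard deviation."""
    return (value - mean(data)) / stdev(data)

def prob_more_extreme(z):
    """Normal-model probability of a z-score more extreme than z.

    Negative z: area to the LEFT of z. Positive z: area to the RIGHT of z.
    """
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    if z < 0:
        return standard_normal.cdf(z)
    return 1 - standard_normal.cdf(z)

z_low = z_score(min(gpa), gpa)
z_high = z_score(max(gpa), gpa)
# The more extreme z-score is the one farther from 0 in absolute value.
more_extreme = "lowest" if abs(z_low) > abs(z_high) else "highest"
print(f"z(lowest) = {z_low:.2f}, z(highest) = {z_high:.2f}, more extreme: {more_extreme}")

# The two examples from the instructions above:
print(f"P(z < -0.4) = {prob_more_extreme(-0.4):.4f}")
print(f"P(z > 1.2) = {prob_more_extreme(1.2):.4f}")
```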
Linear Regression and Correlation:
Recall that in your research proposal you discussed that you believed there might be an association between your two quantitative variables. You are now going to examine this relationship using Linear Regression. Based on what you wrote in Part I of your project, state which variable you are selecting to be your explanatory (x) variable and which variable you are selecting to be your response (y) variable and explain why you made this decision.
Scatterplot: Create a scatterplot of your explanatory and response variable. You can choose what technology to use, but it must look professional (hand drawn scatterplots will receive no credit). Based on your scatterplot, discuss the direction, form, and strength of the association using appropriate statistical terminology. Are there any suspected outliers or clusters? Are linear correlation and regression appropriate based on your scatterplot?
Correlation Coefficient and Linear Regression: Report the correlation coefficient and linear regression equation using your choice of technology. Indicate how your correlation coefficient and linear regression equation were created using the technology (i.e., give detailed instructions). Does your correlation coefficient confirm your observations of the scatterplot from the previous section? How so, or why not?
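For reference, the correlation coefficient and least-squares line can be computed from their defining sums, as in the Python sketch below. The paired data and the helper name `fit_line` are hypothetical. Any statistics package or calculator should report the same r, slope, and intercept up to rounding, so this is only one of many acceptable technology choices.

```python
import math

# Hypothetical paired data for 20 students (made-up values).
hours = [2, 4, 5, 5, 6, 7, 8, 8, 9, 10, 10, 11, 12, 12, 13, 14, 15, 16, 18, 30]
gpa = [2.1, 2.4, 2.5, 2.7, 2.8, 2.9, 2.9, 3.0, 3.0, 3.1,
       3.1, 3.2, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 4.0]

def fit_line(x, y):
    """Return (r, slope, intercept) for the least-squares line y-hat = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

r, slope, intercept = fit_line(hours, gpa)
print(f"r = {r:.3f}")
print(f"predicted GPA = {intercept:.3f} + {slope:.3f} * hours")
```

The sign of r should agree with the direction you described from the scatterplot, and its magnitude with the strength. If they disagree, look for outliers or a nonlinear form.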
Discussion: Do the results of your linear regression and correlation analysis appear to confirm or contradict your initial belief that these variables were associated in some way? Critically evaluate this conclusion by addressing both the evidence in support of your conclusion and any evidence that would weaken your case. The direction, form, and strength of the association in your scatterplot, as well as the strength of the correlation coefficient, should be discussed in this evaluation. If your data contained outliers, these must also be discussed for full credit. Additionally, you must identify the sampling method you used (systematic, simple random, stratified, cluster, convenience, voluntary response, etc.) and discuss what limitations you see with your research, including sources of bias or other problems that might limit how well your research generalizes to the greater population. What further conclusions might you draw based on this deeper analysis?
Technology Considerations: Discuss what technology you chose to use for your displays of quantitative data (Histogram and Boxplot) and why. Discuss what technology you chose to create your scatterplots and to compute the correlation coefficient and linear regression equation and why. For full credit be sure to discuss what technology options you considered, the pros and cons of each, and what considerations or concerns led you to make the choice you made.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Grading Rubric: Part II will be graded by components according to the following guidelines with comments provided to students.
Data and Inclusion of Part I (6 points)
3 Criteria: 1) Part I included 2) Data is Complete/Readable and 3) Data matches proposal in Part I.
6 Points 4 Points 2 Points 0 Points