1. STATA time!
(a) Download the dataset from
(b) Remove all observations prior to year 1980. (Hint: “keep if” might help here.)
(c) Make a histogram with a kernel density plot of the ATM variable.
(d) Provide descriptive statistics for the ATM variable including mean, median, min, max,
and standard deviation.
(e) Are the mean and the median close for the ATM variable? Based on the plot above, why
do you think they are different?
(f) What is the standard deviation of ATMs? What does this measure and how is it interpreted?
(g) Create a new variable for log of GDP (xlrealgdp) and log of population (xlpopulation).
(Hint: new variables in STATA are created using the generate command, e.g. “gen logx =
log(x)”.)
(h) For both new variables log of population and log of GDP, plot the distribution and derive
the sample statistics. From this information, do you think the new variables appear
normally distributed?
(i) Limit the dataset to cases where cellphone usage is under a million. With this new dataset,
use the table command to identify the number of years each country had fewer than a million
phones. Which 5 countries had the fewest years with under a million phones?
(j) Create a new variable representing the percentage of people who have cell phones in a
country by dividing cellphone numbers by population. Create a summary table of the
descriptive statistics. Do these seem reasonable? Why or why not?
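
As a starting point, the commands for parts (b) through (j) might look roughly like the sketch below. It assumes the dataset is already loaded in memory. The variable names year, atm, realgdp, population, cellphones and country are placeholders, since the actual names in the dataset are not given here. It also assumes xlrealgdp and xlpopulation are the names of the new log variables. Check the real names with “describe” before running anything.

```stata
* Sketch only: year, atm, realgdp, population, cellphones and country
* are assumed variable names; replace them with the names in your dataset.

* (b) Keep observations from 1980 onward (missing years are dropped too)
keep if year >= 1980 & !missing(year)

* (c) Histogram of ATMs with a kernel density estimate overlaid
histogram atm, kdensity

* (d)-(f) Detailed summary: mean, median (50th percentile), min, max, std. dev.
summarize atm, detail

* (g) Log-transformed variables (log() is the natural log in Stata)
generate xlrealgdp = log(realgdp)
generate xlpopulation = log(population)

* (h) Distributions with a normal curve overlaid, plus sample statistics
histogram xlrealgdp, normal
histogram xlpopulation, normal
summarize xlrealgdp xlpopulation, detail

* (i) Restrict to under a million phones, then count rows per country.
* preserve/restore brings the full dataset back afterwards.
preserve
keep if cellphones < 1000000
tabulate country, sort
restore

* (j) Cell phones per 100 people (assumed name: cellpct)
generate cellpct = 100 * cellphones / population
summarize cellpct, detail
```

In part (i), each row of the tabulate output is a country and the frequency is the number of years it appears in the restricted data. With the sort option the table is ordered from most to fewest, so the countries with the fewest years are at the bottom.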