Stat 311 Homework 5
Most questions in this assignment require R, so do this assignment in rmarkdown and upload as a pdf file to
Gradescope. This assignment requires you to install the infer package. General reminders:
- We must be able to see your R code and the output. If you save things to an object, you need to print
the object to display the output.
- All non-R writing goes outside the code chunks, after the code chunks.
- Proofread your assignment. Make sure all the headers and subheaders display correctly. Comment out
or delete code for anything extra that is not part of the problem.
- Double check your pagination before your final submission in Gradescope. If a problem occurs on
more than one page, the problem must be assigned to all the pages on which the problem occurs.
1. The midterm scores in a large introductory chemistry class are distributed $N(\mu = 72, \sigma = 5)$. Let $X$ be the
midterm score for a randomly selected student. Use R to find the probabilities and midterm scores.
a) Find $P(X \le 70)$.
b) Find $P(X > 87)$.
c) Find $P(82 \le X \le 95)$.
d) What midterm score corresponds to the 90th percentile of exam scores?
e) What midterm score separates the top 85% of students from the rest of the students?
f) You take a random sample of 30 students from the class. What is the distribution of $\bar{x}$?
g) For the random sample of 30 students, find $P(\bar{x} > 75)$.
2. The large retail pet store sells 50 lb. bags of your dog’s favorite pet food. However, the 50 lb. bags do not
weigh exactly 50 lbs. If we let $X_i$ be the weight of a randomly selected 50 lb. bag of pet food, historical
data indicate that $X_i \sim N(\mu = 50.3, \sigma = 1.20)$. The local grocery store sells 10 lb. bags of the same dog
food, which also do not weigh exactly 10 lbs. If $Y_i$ is the weight of a randomly selected 10 lb. bag of dog
food, historical data indicate that $Y_i \sim N(\mu = 10.3, \sigma = 0.4)$. If we randomly select five of the smaller 10
lb. bags of dog food (assuming the weights of the 10 lb. bags are independent) and one 50 lb. bag of dog food, what is the
probability that the sum of the weights of the five 10 lb. bags exceeds the weight of one 50 lb. bag? Based
only on weight, which do you think is better: buying five smaller bags or one larger bag? For this problem
you can use R to get the probability, but you must show some work to convince us you know what
you are doing to solve this problem.
3. The random variable _X_ has a continuous uniform distribution on the interval from 1 to 10 and represents
the waiting time, in minutes, to place your order at the often busy, favorite neighborhood coffee shop. Use
this information to answer the following. Show your work.
a) What is the height of the probability density function for _X_? Show your work.
b) What is the expected value of _X_? Show your work. Explain in layperson terms what this means in the
context of the problem.
c) Calculate $P(X \le 4)$. Work out by hand and confirm your answer using R.
d) Calculate $P(2 \le X \le 6)$. Work out by hand and confirm your answer using R.
e) What is $P(X = 8)$?
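
The first three problems lean on R's built-in distribution functions. As a rough sketch of how those calls fit together, the example below uses made-up parameters (a normal with mean 100 and standard deviation 15, a uniform on 0 to 20, and small invented bag weights); none of these numbers come from the problems above, and all variable names are illustrative only.

```r
# Hypothetical parameters for illustration; NOT the values from Problems 1-3.
mu <- 100
sigma <- 15

# Normal probabilities and percentiles
p_below   <- pnorm(85, mean = mu, sd = sigma)                       # P(X <= 85)
p_above   <- pnorm(130, mean = mu, sd = sigma, lower.tail = FALSE)  # P(X > 130)
p_between <- pnorm(115, mu, sigma) - pnorm(85, mu, sigma)           # P(85 <= X <= 115)
q90       <- qnorm(0.90, mean = mu, sd = sigma)                     # 90th percentile

# Sample mean of n independent draws: N(mu, sigma / sqrt(n))
n  <- 25
se <- sigma / sqrt(n)
p_mean_above <- pnorm(106, mean = mu, sd = se, lower.tail = FALSE)  # P(xbar > 106)

# Sum of k independent N(mu_a, sd_a) weights minus one independent N(mu_b, sd_b) weight
k <- 3; mu_a <- 2; sd_a <- 0.3; mu_b <- 6.5; sd_b <- 0.4
d_mean <- k * mu_a - mu_b            # means add and subtract
d_sd   <- sqrt(k * sd_a^2 + sd_b^2)  # variances add, even for a difference
p_sum_exceeds <- pnorm(0, mean = d_mean, sd = d_sd, lower.tail = FALSE)  # P(sum - single > 0)

# Continuous uniform on [0, 20]
u_height <- dunif(5, min = 0, max = 20)        # density height, 1 / (max - min)
u_prob   <- punif(12, 0, 20) - punif(4, 0, 20) # P(4 <= U <= 12)

print(round(c(p_below = p_below, p_above = p_above, p_between = p_between,
              q90 = q90, p_mean_above = p_mean_above,
              p_sum_exceeds = p_sum_exceeds, u_height = u_height,
              u_prob = u_prob), 4))
```

Note that `sd` in `pnorm()` and `qnorm()` expects a standard deviation, not a variance, which is why the variances are summed first and the square root is taken afterwards.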
4. Recall the zone out duration (ZOD) data we looked at in one of the regression lectures from Lesson 3. An
additional experiment was conducted to look at the impact of sugary desserts eaten at lunch, two hours
before class, and ZOD. Twelve students volunteered to participate in the experiment. Students were
randomly assigned to eat a large slice of apple or cherry pie, with six participants randomized in each
group. Two hours later, their ZODs (in minutes) were recorded during a 50-minute lecture. The data are in
the file ZODTwoGroups.csv.
a) In the HW5 template, we provide code to produce a comparative boxplot for ZOD by pie type. Describe the sample distributions of ZOD for apple and cherry pie based on what you see in the boxplots. Does there appear to be a difference between the ZODs for the two groups?
b) In the HW5 template, we provide code to create 1000 permutations for the difference of mean ZOD for cherry pie minus the mean ZOD for apple pie. Note, we use **set.seed(12)** so that all students will get the same permutations. What is the sample observed difference in means for the sample data?
c) Write out the statistical hypotheses, using symbols, for testing that mean ZOD for cherry pie is greater than the mean ZOD for apple pie.
d) In the HW5 template, we use ggplot to produce a histogram of the null distribution with an added vertical line for the sample observed difference. Describe the shape of the null distribution and how the sample observed difference compares with the overall distribution.
e) In the HW5 template, we provide code to calculate the _p_-value for this permutation test. What is the meaning of this _p_-value as a probability? [We are asking for the definition of the _p_-value in context for this problem, not a conclusion about the hypothesis test]
f) What do you conclude for this hypothesis test in the context of the problem?
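
For intuition about what a permutation test for a difference in means does, here is a minimal base-R sketch. It is not the HW5 template code (which uses the infer package), the data are invented rather than taken from ZODTwoGroups.csv, the seed is arbitrary, and the helper `diff_means` is a hypothetical name introduced only for this illustration.

```r
set.seed(1)  # arbitrary seed for this illustration, not the assignment's seed

# Made-up ZOD values; NOT the data in ZODTwoGroups.csv
zod   <- c(4, 7, 5, 9, 6, 8, 10, 12, 9, 14, 11, 13)
group <- rep(c("apple", "cherry"), each = 6)

# Hypothetical helper: mean for cherry minus mean for apple
diff_means <- function(y, g) mean(y[g == "cherry"]) - mean(y[g == "apple"])

# Observed difference in the (made-up) sample
obs_diff <- diff_means(zod, group)

# Null distribution: shuffle the group labels 1000 times and recompute the difference
null_diffs <- replicate(1000, diff_means(zod, sample(group)))

# One-sided p-value: share of shuffled differences at least as large as the observed one
p_value <- mean(null_diffs >= obs_diff)

print(obs_diff)
print(p_value)
```

Shuffling the labels mimics a world in which pie type has no effect on ZOD, so the shuffled differences show how large a difference could arise from the random assignment alone.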
5. This problem uses the PopularDietsCombined.csv data set. We are focusing on the WtLossKG variable
(weight loss after 12 weeks in kg) and Diet. Round all confidence intervals to two decimal places in your
reporting.
a) Make a comparative boxplot to look at differences in WtLossKG by Diet type. Summarize what you see for weight loss by diet type. [Hint: copy the code from Problem 4a and modify it for this problem]
b) Since there does not seem to be too much difference by diet type, we will only work with WtLossKG, ignoring diet type. What is the point estimate for mean weight loss across all diets?
c) In the HW5 template we provide code to create 1000 bootstrapped samples using all 93 observations across all diets and to produce a histogram of the bootstrapped distribution for mean weight loss using ggplot. Describe the shape of the distribution.
d) In part (c) we calculate 1000 bootstrap samples using all 93 observations. What is a single bootstrap sample?
e) In the HW5 template we provide code that calculates the 90% bootstrap confidence interval for weight loss (kg). Report and provide an interpretation of this interval in the context of the problem. Note, we used **set.seed(12)**.
f) Copy/paste/edit the code from 5e to get the 95% and 99% bootstrap confidence intervals from the same bootstrap sample (this means use the same seed as used for part (e); you must reset the seed before each interval you calculate so all students get the same answers). Since we are using built-in functions in R for this, we rerun each time and change the confidence level. Describe how these intervals compare with the 90% interval reported in part (e). [Hint: think about interval widths]
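
To illustrate the idea behind a percentile bootstrap interval for a mean, here is a small base-R sketch. It is not the HW5 template code, the weight-loss values are invented rather than taken from PopularDietsCombined.csv, and the seed is arbitrary. Unlike the template workflow described in part (f), this sketch stores one set of bootstrap means and reads all three intervals from it, so no reseeding is needed here.

```r
set.seed(1)  # arbitrary seed for this illustration, not the assignment's seed

# Made-up weight-loss values in kg; NOT the PopularDietsCombined.csv data
wt_loss <- c(2.1, 4.5, 3.3, 0.8, 5.6, 3.9, 1.7, 4.2, 2.8, 6.1, 3.0, 2.4)

# Point estimate: the sample mean
point_est <- mean(wt_loss)

# Each bootstrap sample draws n values WITH replacement from the original n values;
# we keep only the mean of each resample
boot_means <- replicate(1000, mean(sample(wt_loss, replace = TRUE)))

# Percentile intervals: cut off equal tail areas on each side
ci_90 <- quantile(boot_means, c(0.05, 0.95))
ci_95 <- quantile(boot_means, c(0.025, 0.975))
ci_99 <- quantile(boot_means, c(0.005, 0.995))

print(round(point_est, 2))
print(round(rbind(ci_90 = unname(ci_90), ci_95 = unname(ci_95), ci_99 = unname(ci_99)), 2))
```

Because all three intervals are read from the same vector of bootstrap means, a higher confidence level can only move the endpoints outward, never inward.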