# There is lots of new functionality introduced in this activity, and I do # not spend a lot of time explaining it. Make liberal use of the Help tab # to look up functions or Google them. It is excellent practice to start # looking up how to do things and what various functions accomplish for when # you are out there coding on your own. Of course, feel free to ask me any # questions during the session as well :). # Let's begin by creating a new Project in R. A project enables us to open # right where we left off, with our working directory set, and all of the # variables and datasets we have created still saved in our environment window # (the upper right hand window). # In RStudio, go to File -> New Project... -> New Directory -> New Project -> # Browse and navigate to where you would like your new folder for your # new project to be located. Make sure you select a location you know how to # find because this will be the location for your working directory and will # contain your scripts, data files, output files, etc. As soon as you have # navigated to the location you want, click open. Then, in the Directory Name # box give your project folder a name. For the purposes of this activity, # lets call it testproject, then click Create Project. Notice in your bottom # right pane, under the Files tab, you have a file called testproject.Rproj. # that is the file you will open to open that project if you are moving back # and forth between projects. # I made another copy of the iqlead.csv data file called iqlead2.csv. In this # file, I changed the 1 and 2 for male and female to M and F and I changed the # blood lead levels from 1, 2, 3 to Low, Medium, and High so that we could # work with categorical data that are made up of words (strings) instead of # numbers to give you some practice with that. # Grab the iqlead2.csv file from my faculty.cascadia.edu/rburke website and put # it in the testproject folder. # Go to File -> New File -> R Script to open your script window with a new # untitled script. Go to File -> Save and save your new script as # firstproject.R # Import the data by typing the line of code below and then running that line # of code. iqlead <- read.csv("iqlead2.csv") # Note that we can name the dataset in R something different than the filename. # The filename was iqlead2.csv, and we used the read.csv() function to import # it, but inside R we are calling the dataset iqlead. Notice this dataset now # shows up in your upper right pane in the Environment tab. Let's test out # being able to "pick up where we left off." Save your script again since the # new name is red, and we want the name to be black (saved) before closing. Go # to File -> Quit Session... and say yes to any questions about saving (like # save workspace image). Re-open RStudio and you should be right back where # you came from. If not, go to File -> Open Project and navigate to your # testproject.Rproj file and open it and then you should be right back where # you came from with your iqlead dataset in tact in your Environment tab. # Let's take a look at a summary of the iqlead dataset. In your script window # type and run the line of code below: summary(iqlead) # Observe the lead levels are listed alphabetically, High Low Medium levels(iqlead$LEAD) # We could leave this, but really, it should be Low < Medium < High so let's # fix it: iqlead$LEAD <- factor(iqlead$LEAD, levels = c("Low", "Medium", "High")) # Let's count the number of females and save that as the variable iqlead.F # Notice the use of double quotes around the word (string) F. We need to # use quotation marks when dealing with words instead of numbers. iqlead.F <- length(iqlead$SEX[iqlead$SEX=="F"]) # Let's do the same thing with the males. iqlead.M <- length(iqlead$SEX[iqlead$SEX=="M"]) # Notice that those variables have shown up in the Environment Window. You can # ignore the L after the number. This is just an indication that the number is # an integer. # Let's count (and save) the number of children in the Low group iqlead.Low <- length(iqlead$LEAD[iqlead$LEAD=="Low"]) # Same with the Medium group iqlead.Medium <- length(iqlead$LEAD[iqlead$LEAD=="Medium"]) # Same with the High group iqlead.High <- length(iqlead$LEAD[iqlead$LEAD=="High"]) # Let's calculate the mean of the IQP Low group and save it iqlead.IQP.Low <- mean(iqlead$IQP[iqlead$LEAD=="Low"]) # Same with the Medium group iqlead.IQP.Medium <- mean(iqlead$IQP[iqlead$LEAD=="Medium"]) # Same with the High group iqlead.IQP.High <- mean(iqlead$IQP[iqlead$LEAD=="High"]) # We are now going to create and export a boxplot of IQP broken out by lead # level. I also want to add the means of each group to the plot (since # boxplots display the median, not the mean. To help me with that addition, # I am going to create a vector with the means for each lead level using # the function tapply(). IQP.means <- tapply(iqlead$IQP,iqlead$LEAD,mean) # Now, we'll create a boxplot of IQP broken out by lead level saved as a .png # file with the mean values added as red boxes. First type all the lines of # code in the block below and then highlight all the lines from the png() # function to the dev.off() line and run them all at once. png(file="IQPboxplot.png") boxplot(iqlead$IQP~iqlead$LEAD, main="Performance IQ's of Children Living Close to Lead Smelter", xlab="Blood Lead Level", ylab="Performance IQ Score") points(IQP.means,col="red",pch=15) dev.off() # It looks like not much happened, but in the lower right hand pane in the # Files tab, you should now see IQPboxplot.png. If you double click this # file you will see the plot you created. # Let's run an ANOVA on IQP broken out by blood lead level to see if one of # the three lead level means is statistically significantly different from the # others. We are going to save our results in the variable iqlead.IQP.LEAD.aov iqlead.IQP.LEAD.aov <- aov(iqlead$IQP~iqlead$LEAD) summary(iqlead.IQP.LEAD.aov) # Assuming our dataset satisfied all the requirements in order to utilize an # ANOVA, we get a p-value of 0.0195 (found right below Pr(>F)) and so as long # as we are comfortable with a level of significance greater than 0.0195 we # can make the claim that not all the means are equal (statistically # speaking). Recall that common levels of significance are 0.1, 0.05, and # 0.01. We can claim a significance of 0.05, but not of 0.01 (since our p- # value falls between those two values). # We could now perform a Tukey test to determine which groups are statistically # different from one another saving the results as iqlead.IQP.LEAD.tuk: iqlead.IQP.LEAD.tuk <- TukeyHSD(iqlead.IQP.LEAD.aov) # And then viewing the results: iqlead.IQP.LEAD.tuk # Looking at the p adj column, we have some grounds to claim a difference in # the Low and Medium mean, and in the Low and High mean, but the difference # between the Medium and High mean is not statistically significant. These # p-values are not fabulous. They are below 0.1, but being below 0.05 would # be more convincing and hold up to more scrutiny. # Now, let's export some of our results to a file for documentation purposes # as well as making it easy to cut and paste into Word documents, Power # Point presentations, etc. We'll open a text file called results.txt and # export what we want with the sink() function. sink("results.txt") # Give a label to the summary we are about to run cat("Summary of iqlead dataset\n") # Send the summary to a file summary(iqlead) # Give a label to the mean IQP of the Low group cat("\nMean IQP of the Low group\n") cat(iqlead.IQP.Low) # Give a label to the mean IQP of the Medium group cat("\nMean IQP of the Medium group\n") cat(iqlead.IQP.Medium) # Give a label to the mean IQP of the High group cat("\nMean IQP of the High group\n") cat(iqlead.IQP.High) # Give a label to the ANOVA results of the IQP means for the three lead levels cat("\n\nANOVA Results for the IQP of the Three Lead Levels\n") summary(iqlead.IQP.LEAD.aov) # Give a label to the Tukey results of the IQP means for the three lead levels cat("\n\nTukey HSD Results for the IQP of the Three Lead Levels\n") iqlead.IQP.LEAD.tuk sink() # You will see that you now have a file called results.txt in your Files tab in # the bottom right hand pane. If you double click it, it will open in the top # left pane and you can view your exported results.