1 Introduction to Excel and R

If you are currently participating in a timetabled BIOS103 QS workshop, please ensure that you cover all of this section’s content and complete this week’s formative and summative assessments in the BIOS103 Canvas Course.

1.1 Estimating the Volume of a Snail

In this section, you will learn how to import data from a CSV file into Excel, perform basic calculations, create a scatterplot including a linear trendline to make predictions. The primary objective is to estimate the volume of a snail based on its mass using a provided dataset.

Google Sheets Alternative

If you like, you can do everything in the above video using Google Sheets instead! Here’s my alternate video just for Google Sheets.

1.1.1 Download and Import the CSV File

  1. Download the CSV File:
    • Here is an example dataset. Download it to your local machine.
    • If that link doesn’t you can get the dataset in your browser here. Right click and Save as.
  2. Import into Excel:
    • Open Excel (Use the desktop version - you won’t be able to do this using the online version!).
    • Go to Data > From Text/CSV and select the downloaded CSV file.
    • When the import wizard appears, click Load.

1.1.2 Calculate Volume \(V\) in Excel

We will estimate the volume \(V\) of our snails using the formula for the volume of a sphere: \[ V = \frac{4}{3} \pi r^3 \]

where \(r\) is the radius, which we assume is equal to half the Height L (mm) column.

  1. Add a New Column for Volume \(V\):
    • In the third column, label it as Volume V (mm^3).

    • In the first cell of this column, use the formula:

      = (4/3) * PI() * ((B2/2)^3)
    • Drag the formula down to apply it to all rows.

1.1.3 Add a Linear Trendline and Equation

  1. Create a Scatter Plot:
    • Select the Mass M (g) and Volume V (mm^3) columns.
    • Go to the Insert tab and select Scatter Plot.
  2. Add Trendline:
    • Click on the plot
    • Click the green + icon that appears at the top right of the plot.
    • Click the > symbol on the Trendline option and click More options…
    • From the trendline options menu that appears on the right, select the Linear trendline and check the box isplay Equation on chart.
    • Right-click on a data point in the scatter plot.
  3. Interpret the Equation:
    • The trendline equation will appear on the chart in the form of \(y = ax + b\), where:
      • \(y\) is the volume.
      • \(x\) is the mass.
      • \(a\) and \(b\) are coefficients.
    • For the Example dataset:
      • a = 1341.7 \(mm^3.g^{-1}\)
      • b = 1140.2 \(mm^3\)

1.1.4 Estimation of Volume for a Snail with Mass 10g

  1. Use the Trendline Equation:
    • Substitute \(x = 10\) into the trendline equation to calculate the estimated volume \(V\).
    • Express the volume in \(cm^3\) to one decimal place (Note: \(1 cm^3 = 1000 mm^3\)).
  2. For the example dataset:
    • The estimated volume of a snail that is \(10 g\) is \(14.6 cm^3\).

1.2 Getting Ready for R

Over the next couple of weeks you will continue to use Excel to load, manipulate, analyse and visualise data. Beyond this you will be using the coding language R exclusively. To prepare for this, you need to download, install and configure R and RStudio today.

Chromebook Users

I love a chromebook. Sadly, installing R and RStudio on one involves a bit of extra work compared to Windows and Mac. This YouTube video seems to have all the bases covered: “How to Install RStudio on a Chromebook”

1.2.1 Download and Install (Windows and Mac only)

R and RStudio are actually separate things, although they are often mentioned together.

R is a programming language and software environment specifically designed for statistical computing and data visualisation. Other examples of programming languages include Python, Java and Ruby.

RStudio is the integrated development environment (IDE) for R. It provides a user-friendy interface that will allow you to write all your R scripts and compile them to do stuff. I’m using it to write this handbook right now!

Despite their differences, you might hear the terms R and RStudio used interchangeably, as RStudio serves as the primary interface through which users interact with the R programming language.

You need to download and install both R and Rstudio.

Once both are installed, open up RStudio and get ready to create your first R Project.

All University of Liverpool MWS machines already have R and RStudio installed and ready to use.

1.2.2 Creating Your First R Project in RStudio

Follow these steps to set up and manage your first R project in RStudio:

  1. Open RStudio
  • Launch RStudio from your applications menu.
  1. Start a New Project
  • Click on the File menu at the top of the RStudio window.
  • Select New Project… from the dropdown menu.
  1. Choose Project Type

You will be prompted with three options:

  • New Directory: Create a new project in a new directory.
  • Existing Directory: Use an existing directory as the project’s folder.
  • Version Control: Clone a project from a version control repository (e.g., GitHub).

For your first project, select New Directory.

  1. Select Project Template
  • Choose Empty Project.
  • Click Next.
  1. Set Up Project Directory
  • Directory name: Enter a name for your project folder. This will be the name of the directory created for your project.
  • Subdirectory of: Choose the parent directory where the new project folder will be created. You can navigate to the desired location using the file browser.
  • Click Create Project.
  1. RStudio Project Interface

Once the project is created, you will see a new RStudio window or tab with the following components:

  • Files pane: Displays the files and folders in your project directory.
  • Script editor: Where you write and edit your R scripts.
  • Console: Where you can directly enter and execute R commands.
  • Environment/History: Shows your workspace objects and command history.
  • Plots/Packages/Help/Viewer: Various tabs for viewing plots, managing packages, accessing help documentation, and viewing other outputs.
  1. Create and Save an R Script
  • Click File > New File > R Script.
  • Write some R code in the script editor. For example:

To run your print command, click on the line and click the Run button at the top right of your script editor window or press Ctrl + ENTER (Cmd + Enter on Mac).

You will see the following output in your console window:

## [1] "Hello World!"

And there it is! You’ve just successfully compiled your first line of R. Congratulations!

1.2.3 Something More Complicated

As you delve deeper into R programming, you’ll find that your scripts become more sophisticated than the “Hello World” example above. In the following example, I’ll walk you through a script to create a random number generator. Here’s the script in its entirety:

# create a variable called "seed"
seed <- 999

# set the seed
set.seed(seed)

# generate a random number
random_number <- runif(1)

# print the random number
print(random_number)

Cut and paste these lines into the script file we were working on earlier (overwrite the “Hello World” example).

Now, you can run each line in turn as before (using Ctrl + ENTER) or you can run everything by clicking the Source button. You should see the following number appear in your console:

Let’s break down each line of the script to understand its purpose and functionality.

  1. Creating a Variable Called “seed”
# create a variable called "seed"
seed <- 123
  • seed <- 123: Here, we are using the <- operator to assign the value 123 to the variable named seed. In R, variables are used to store data that can be reused or manipulated later in the script. The number 123 is arbitrary in this case, but we use it to illustrate how to set a seed for random number generation.
  1. Setting the Seed
# set the seed
set.seed(seed)
  • The set.seed() function initializes the R environment’s build in random number generator with the value stored in seed. Setting a seed is essential for reproducibility, meaning that if someone else runs this code with the same seed, they will get the same random number output in their console. This is particularly useful in simulations and randomised experiments where consistent results are needed.
  1. Generating a Random Number
# generate a random number
random_number <- runif(1)
  • random_number <- runif(1): This line generates a single random number between 0 and 1. The function runif() generates random numbers from a uniform distribution, which means that each number within the specified range has an equal probability of being selected. The 1 inside the parentheses specifies that only one random number should be generated. The resulting number is then assigned to the variable called random_number.
  1. Printing the Random Number
# print the random number
print(random_number)
  • print(random_number): The print() function outputs the value stored in random_number to the console. This is useful for verifying the output of your code and ensuring that the operations have been executed correctly.

Summary

This example script demonstrates a more complex task than the basic “Hello World” script. It introduces key concepts like variable assignment with <-, setting a seed for reproducibility using set.seed(), generating random numbers with runif(), and printing results using print(). As you continue learning R, these foundational concepts will become increasingly important, enabling you to build more advanced and meaningful analyses. Remember, comments (#) are your friends! They help explain what each part of your code does, making it easier for you and others to understand and maintain your scripts. Be liberal with your comments. You’ll thank yourself later (trust me).

Give me feedback

I really value your feedback on these materials for quantitative skills. Please rate them below and then leave some feedback. It’s completely anonymous and will help me improve things if necessary. Say what you like, I have a thick skin - but feel free to leave positive comments as well as negative ones. Thank you.

1.3 Complete your Weekly Assignments

In the BIOS103 Canvas course you will find this week’s formative and summative QS assignments. You should aim to complete both of these before the end of the online workshop that corresponds to this section’s content. The assignments are identical in all but the following details:

  • You can attempt the formative assignment as many times as you like. It will not contribute to your overall score for this course. You will receive immediate feedback after submitting formative assignments. Make sure you practice this assignment until you’re confident that you can get the correct answer on your own.
  • You can attempt the summative assignment only once. It will be identical to the formative assignment but will use different values and datasets. This assignment will contribute to your overall score for this course. Failure to complete a summative test before the stated deadline will result in a zero score. You will not receive immediate feedback after submitting summative assignments. Typically, your scores will be posted within 7 days.

In ALL cases, when you click the button to “begin” a test you will have two hours to complete and submit the questions. If the test times out it will automatically submit.