If you work in the natural sciences, you may have heard of researchers using something called “R” for data analysis.
Here, we provide readers – particularly those with a pharmacology or medicinal chemistry background – with a taste of R programming and what R has to offer, even to those with no prior experience.
What is R?
R is a programming language for data analysis and visualization that is widely used by data miners and statisticians. RStudio, an integrated development environment (IDE) often used with R, is an interactive desktop environment where users can write R code and save their work with “.R” as the file extension. R also has the major advantage of being free and open-source!
Who uses R?
Aside from data scientists who use R and other programming languages such as Python for their work, R is also used by researchers and scientists in disciplines such as medicine, pharmacology, statistics, chemistry, biology and everything in between. Examples of recently published scientific research that used R include:
- Chae, D. et al. Mechanistic Model for Blood Pressure and Heart Rate Changes Produced by Telmisartan in Human Beings. Basic & Clinical Pharmacology & Toxicology. 2018, 122 (1), 139–148 DOI: 10.1111/bcpt.12856.
- Hill, A. C. et al. Correction of medication nonadherence results in better seizure outcomes than dose escalation in a novel preclinical epilepsy model of adherence. Epilepsia. 2019, 60 (3), 475–484 DOI: 10.1111/epi.14655.
- Loder, A. L. et al. Water Chemistry of Managed Freshwater Wetlands on Marine-Derived Soils in Coastal Bay of Fundy, Canada. Wetlands. 2019, 39 (3), 521–532 DOI: 10.1007/s13157-018-1101-y.
How can I install R and where can I download RStudio?
- Install R by following the instructions on this page (r-project), selecting the operating system that you’re using.
- Download the desktop version of RStudio by going to RStudio’s webpage (RStudio downloads), again selecting your operating system.
- Follow the installation guide and instructions for both.
Once RStudio has been installed, open the software and you should see something similar to the image shown below. Key areas have been highlighted:
- Code editor pane: Space where R code can be written. After running your code, outputs are displayed in the Console pane.
- Console tab: Commands such as simple calculations can be typed here for immediate execution. While in the console tab, pressing “Ctrl + L” (Windows) clears it.
- Workspace pane: Where the Environment, History and Connections tabs may be found.
- Notebook pane: Where the Files, Plots, Packages, Help and Viewer tabs may be found. The Help tab is useful should you require further assistance.
You can customize the appearance of RStudio based on your personal preferences by going to:
Tools > Global Options… > Appearance
For the rest of this article, the “Twilight” theme is used.
Other Windows keyboard shortcuts you may already be familiar with also work in RStudio, such as:
- “Ctrl + O” : To open files
- “Ctrl + S” : To save .R files
- “Ctrl + Shift + N” : To start a new file
- “Ctrl + W” : To close a tab
Running your code in RStudio
As mentioned earlier, you can immediately execute commands such as calculations by typing directly in the console tab. For the rest of this article, we will use the code editor pane instead. To run code in RStudio:
- Write your R code in the Code Editor pane
- Click “Run” to run the current line or selection (Windows keyboard shortcut “Ctrl + Enter”)
- The output will be displayed in the console tab
If you have more than one line of code such as in the example below, clicking “Source” runs them all sequentially (Windows keyboard shortcut “Ctrl + Shift + S”).
Note: Both “print()” and “cat()” functions can be used to display an output on the console tab.
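As a minimal sketch of a two-line script (the message text is made up for illustration), the following can be typed into the code editor and run with “Source”:

```r
# Two lines of code; "Source" runs them one after the other
print("Hello from R")          # [1] "Hello from R"
cat("Hello again from R\n")    # Hello again from R
```

Running only the first line with “Ctrl + Enter” would display just the print() output.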
Storing variables in R
In R, you can store values in a container referred to as a variable. A value, which can be a numeric or a character data type, can be stored using the “<-” assignment operator. “=” is an alternative to “<-” but for the rest of this article, “<-” will be used.
In the example shown below, character and numeric data types are stored in the user-defined variables “a_variable”, “another_variable”, “testing.again” and “testing.again_with.comment” using the assignment operator “<-”. After pressing “Ctrl + Shift + S”, their values are displayed in the console tab because of print() and cat(). Note that character data types are stored with quotation marks.
Note: The “\n” adds a new line after the first cat(). This is particularly useful when using cat() since unlike print(), the cat() function does not automatically add a new line in the output! Using cat() instead of print() removes the quotation marks in the output in the console tab.
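A sketch of what such a script might look like is given here. The variable names come from the text above, but the stored values are placeholders chosen purely for illustration:

```r
# Values below are placeholders chosen for illustration
a_variable <- "Hello"                  # character: needs quotation marks
another_variable <- 42                 # numeric
testing.again <- "R is fun"
testing.again_with.comment <- 3.5      # a comment can follow code on the same line

print(a_variable)                      # [1] "Hello"
cat(testing.again, "\n")               # R is fun  (no quotation marks, no [1])
print(another_variable)                # [1] 42
cat(testing.again_with.comment, "\n")  # 3.5
```

Removing the "\n" from the first cat() call would make the next output start on the same line.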
Multiple values can be assigned to a variable with the built-in combine function, c(), where each value should be separated with a comma. This is similar to “arrays” in other programming languages like Python.
For instance, several Maltese words can be assigned to “random.maltese.words”:
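The original example is not reproduced here, so the three words below (“grazzi” for thank you, “iva” for yes, “le” for no) are an assumed selection; only the variable name comes from the text:

```r
# Combine several character values into one vector with c()
random.maltese.words <- c("grazzi", "iva", "le")

print(random.maltese.words)          # [1] "grazzi" "iva"    "le"
print(length(random.maltese.words))  # [1] 3
print(random.maltese.words[1])       # [1] "grazzi"  (R counts from 1, not 0)
```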
Data Types in R
Numeric variables in R can be a double (a decimal number) or an integer (a whole number). By default, if a number is stored without the suffix “L”, it is a double data type.
In contrast, including the “L” suffix creates an integer data type. We’ve also encountered character types, which are contained within quotation marks. Storing a number with quotation marks stores it as a character type. You can check the data type with the built-in function typeof(). Logical values (TRUE or FALSE) are referred to as Boolean data types, named after the English mathematician George Boole.
See examples below:
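A short sketch using typeof() on each of the data types described above; the values and variable names are arbitrary:

```r
# Check how R stores different kinds of values
decimal_number <- 3.14
plain_number <- 5        # no "L" suffix, so stored as a double
whole_number <- 5L       # "L" suffix, so stored as an integer
quoted_number <- "5"     # quotation marks, so stored as a character
true_or_false <- TRUE    # logical (Boolean)

print(typeof(decimal_number))  # [1] "double"
print(typeof(plain_number))    # [1] "double"
print(typeof(whole_number))    # [1] "integer"
print(typeof(quoted_number))   # [1] "character"
print(typeof(true_or_false))   # [1] "logical"
```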
If you wish to create an integer type, as.integer() is an alternative to the suffix “L”:
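For instance, a minimal sketch with arbitrary values (note that as.integer() drops the decimal part rather than rounding):

```r
# Convert a double to an integer with as.integer()
converted_number <- as.integer(7)
print(typeof(converted_number))  # [1] "integer"

# Decimal parts are truncated, not rounded
print(as.integer(7.9))           # [1] 7
```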
Leaving Comments in R
When working with .R files, it is good practice to leave comments in the code editor for future reference, so that you and others who look at your code later know what it is for. You may do this using the hash symbol (#) as shown in the example below (don’t worry about the code in the example):
Note: In the code editor, “Ctrl + Shift + C” is the Windows keyboard shortcut for inserting “#” to leave a comment. If you use this shortcut on a line that already contains code, the entire line is turned into a comment, so be careful!
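A small sketch of commented code; the variable names and the dose value are hypothetical and only serve to show where comments can go:

```r
# A comment on its own line describes what follows
dose_mg <- 500             # hypothetical dose in milligrams

# Comments can also sit at the end of a line of code
dose_g <- dose_mg / 1000   # convert milligrams to grams

print(dose_g)              # [1] 0.5
```

Everything after a “#” on a line is ignored when the code runs.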
Performing Calculations with R
R isn’t just for displaying text and numbers in the console tab. Mathematical calculations can be carried out too. More experienced users can import CSV files into RStudio and use R to visualize complex data and perform complex statistical analysis!
Simple math is straightforward in R and, in some cases, the notation you may know from MS Excel can be used in R as well.
Some of the common operators and built-in functions are shown below:
| Operator or Function | What it does |
|---|---|
| max() | Finds the maximum value |
| min() | Finds the minimum value |
| sum() | Finds the sum |
| mean() | Finds the mean |
| sd() | Finds the standard deviation |
| length() | States the length of a vector |
Operations can be performed on single numeric values or multiple numeric values stored in variables as shown in the examples below:
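A sketch of both cases, using arbitrary numbers and a hypothetical vector name; the arithmetic operators shown (+, /, ^) are standard base R:

```r
# Calculations on single numeric values
print(7 + 3)    # [1] 10
print(10 / 4)   # [1] 2.5
print(2^3)      # [1] 8

# Calculations on several values stored in one variable
measured_values <- c(2, 4, 4, 4, 5, 5, 7, 9)

print(max(measured_values))     # [1] 9
print(min(measured_values))     # [1] 2
print(sum(measured_values))     # [1] 40
print(mean(measured_values))    # [1] 5
print(sd(measured_values))      # sample standard deviation, about 2.14
print(length(measured_values))  # [1] 8
```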
Another useful function in base R is summary(), which reports the minimum, quartiles, median, mean and maximum of a numeric vector:
- Some of the results given by summary() are also given by MS Excel’s descriptive statistics feature.
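A brief sketch of summary() on an arbitrary numeric vector (the vector name and values are assumptions for illustration):

```r
# summary() reports several descriptive statistics at once
example_values <- c(2, 4, 4, 4, 5, 5, 7, 9)
values_summary <- summary(example_values)

# Shows Min., 1st Qu., Median, Mean, 3rd Qu. and Max.
print(values_summary)
```

For these numbers the median is 4.5 and the mean is 5.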
Using the basic operators and built-in functions you have learned thus far, you can now use R to perform pharmaceutical calculations!
Pharmaceutical Calculation: Calculation of Molar Mass using R [Sample Problem]
Question: Determine the molar mass of the antiretroviral HIV fusion inhibitor Enfuvirtide (C204H301N51O64) using the following values: C = 12.01 g/mol, H = 1.01 g/mol, N = 14.01 g/mol and O = 16.00 g/mol.
- Assign the atomic mass of each element to a variable in the code editor. For this problem, we’ll use names of the form x_val, where x is the element symbol (C_val, H_val, N_val and O_val).
- Using the molecular formula of enfuvirtide and the values assigned in step 1, multiply each atomic mass by the number of atoms of that element, add the results and store the sum in a variable (something like enfuvirtide_val). This is the molar mass of enfuvirtide in g/mol.
enfuvirtide_val <- C_val * 204 + H_val * 301 + N_val * 51 + O_val * 64
cat("Molar mass of Enfuvirtide [g/mol]", enfuvirtide_val)
- Execute the code and see the results in the console. You should see something like:
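Putting the three steps together, a complete script might look like the sketch below. The atomic masses are those given in the question; the trailing "\n" is an addition so that the console prompt starts on a new line:

```r
# Step 1: atomic masses in g/mol, as given in the question
C_val <- 12.01
H_val <- 1.01
N_val <- 14.01
O_val <- 16.00

# Step 2: multiply each atomic mass by its count in C204H301N51O64 and add up
enfuvirtide_val <- C_val * 204 + H_val * 301 + N_val * 51 + O_val * 64

# Step 3: display the result in the console
cat("Molar mass of Enfuvirtide [g/mol]", enfuvirtide_val, "\n")
# Molar mass of Enfuvirtide [g/mol] 4492.56
```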
You can reuse the variables above for future pharmacology, medicinal chemistry and pharmaceutical calculations!
Where can I learn more about R?
Now that you’ve had a taste of what the R programming language has to offer, you can take some extra steps to learn more about R.
Here are some exceptional resources to help you on your way:
- Zuur AF, Ieno EN, Meesters E. A Beginner’s Guide to R. Springer New York: New York, NY, USA, 2009; DOI:10.1007/978-0-387-93837-0.
- Crawley MJ. The R Book. John Wiley & Sons, Ltd: Chichester, UK, 2007; DOI:10.1002/9780470515075.
Also, typing ?X in the console tab, where X is a built-in function, opens the help page for that function.
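R’s built-in help can be tried with any base R function. As a small illustration, using sd() from the table earlier in the article:

```r
# Open the help page for the built-in sd() function
?sd

# help() is the long form of the same request
help("sd")
```

In RStudio, the page opens in the Help tab.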