If you are working in the natural sciences, you may have heard of researchers using something called “R” for data analysis.
Here, we provide readers – particularly those with a pharmacology or medicinal chemistry background – with a taste of R programming and what R has to offer, even to those with no prior experience.
R is a programming language widely used by data miners and statisticians for data analysis and visualization. RStudio, an integrated development environment (IDE) often used with R, is an interactive desktop environment where users can write R code and save their work with “.R” as the file extension. R also has the major advantage of being free and open-source!
Aside from data scientists who use R and other programming languages such as Python for their work, R is also used by researchers and scientists who work in disciplines such as medicine, pharmacology, statistics, chemistry, biology and everything in between. Examples of recently published scientific research which have used R include:
Once RStudio has been installed, open the software and you should see something similar to the image shown below. Key areas have been highlighted:
You can customize the appearance of RStudio based on your personal preferences by going to:
Tools > Global Options… > Appearance
For the rest of this article, the “Twilight” theme is used.
Other Windows keyboard shortcuts you may be familiar with can be used in RStudio such as:
As mentioned earlier, you can immediately execute commands such as calculations by typing directly into the console tab. For the rest of this article, we will use the code editor pane instead. If you wish to run code in RStudio:
If you have more than one line of code such as in the example below, clicking “Source” runs them all sequentially (Windows keyboard shortcut “Ctrl + Shift + S”).
Note: Both “print()” and “cat()” functions can be used to display an output on the console tab.
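As a minimal sketch of a multi-line script that could be run with “Source”, consider the lines below. The messages are placeholders chosen for illustration and are not the article's original example.

```r
# Each line is run in order when the script is sourced
print("Hello from R")          # print() shows the text with quotation marks
cat("Hello again from R\n")    # cat() shows plain text; "\n" ends the line
print(2 + 3)                   # calculations can be printed too
```

Running all three lines should display the two messages followed by the result of the calculation in the console tab.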
In R, you can store values in a container referred to as a variable. A value, which can be a numeric or a character data type, can be stored using the “<-” assignment operator. “=” is an alternative to “<-” but for the rest of this article, “<-” will be used.
In the example shown below, character and numeric data types are stored in the user-defined variables “a_variable”, “another_variable”, “testing.again” and “testing.again_with.comment” using the assignment operator “<-”. After pressing “Ctrl + Shift + S”, the stored values are displayed in the console tab because of print() and cat(). Note that character data types are stored with quotation marks.
Note: The “\n” adds a new line after the first cat(). This is particularly useful when using cat() since unlike print(), the cat() function does not automatically add a new line in the output! Using cat() instead of print() removes the quotation marks in the output in the console tab.
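The assignment and display steps described above might look something like the sketch below. The variable names are the ones mentioned in the text, but the values assigned to them are assumptions made purely for illustration.

```r
# The values below are placeholders chosen for illustration
a_variable <- "Paracetamol"        # character value, note the quotation marks
another_variable <- 500            # numeric value
testing.again <- "mg"              # dots are allowed in variable names
testing.again_with.comment <- 2    # dots and underscores can be mixed

print(a_variable)                            # shown with quotation marks
cat(a_variable, "\n")                        # shown without quotation marks
cat(another_variable, testing.again, "\n")   # several values on one line
```

Note how cat() can take several values at once and separates them with a space.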
Multiple values can be assigned to a variable with the built-in combine function, c(), where each value should be separated with a comma. This is similar to arrays in other programming languages, or to lists in Python.
For instance, several Maltese words can be assigned to “random.maltese.words”:
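A short sketch of such an assignment is given here. The three words are our own picks for illustration (they mean “thank you”, “yes” and “no”) and may differ from the article's original example.

```r
# Three Maltese words combined into one variable with c()
random.maltese.words <- c("grazzi", "iva", "le")

print(random.maltese.words)           # displays all three words
print(length(random.maltese.words))   # number of values stored: 3
```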
Numeric variables in R can be a double (a decimal number) or an integer (a whole number). By default, if a number is stored without the suffix “L”, it is a double data type.
In contrast, including the “L” suffix creates an integer data type. We’ve also encountered character types, which are contained within quotation marks. Storing a number with quotation marks stores it as a character type. You can check the data type with the built-in function typeof(). Logical values (TRUE or FALSE) are referred to as Boolean data types, named after the English mathematician George Boole.
See examples below:
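The data types described above can be checked with a sketch along these lines. The variable names are hypothetical and chosen only to make each type obvious.

```r
x_double <- 5          # no suffix, so stored as a double
x_integer <- 5L        # the "L" suffix creates an integer
x_character <- "5"     # quotation marks make it a character
x_logical <- TRUE      # logical (Boolean) value

print(typeof(x_double))      # "double"
print(typeof(x_integer))     # "integer"
print(typeof(x_character))   # "character"
print(typeof(x_logical))     # "logical"
```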
If you wish to create an integer type, as.integer() is an alternative to the suffix “L”:
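A minimal sketch of the conversion, using hypothetical variable names:

```r
whole_number <- as.integer(5)    # converts the double 5 into an integer
print(typeof(whole_number))      # "integer"

with_suffix <- 5L                # the "L" suffix gives the same result
print(identical(whole_number, with_suffix))   # TRUE
```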
When working with .R files, it is good practice to leave comments in the code editor for future reference and so that others who may want to look at your code have an idea what the written code is for. You may do this using the hash symbol (#) as shown in the example below (don’t worry about the code in the example):
Note: In the code editor, “Ctrl + Shift + C” is the Windows keyboard shortcut for “#” for leaving comments. If you use this keyboard shortcut on a line with code already written on it, the entire line will be turned into a comment line so be careful!
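To illustrate, here is a small sketch of how comments behave. The variable name and value are hypothetical.

```r
# This whole line is a comment and is ignored by R
dose_mg <- 250   # a comment can also follow code on the same line

# print("this line is commented out, so nothing is displayed")
print(dose_mg)   # displays 250
```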
R isn’t just for displaying text and numbers in the console tab.
Mathematical calculations can be carried out too. More experienced and advanced users can import CSV files into RStudio and use R for visualizing complex data and for performing complex statistical analysis!
Performing basic math in R is straightforward and, in some cases, notation you may be familiar with from MS Excel can be used in R.
Some of the common operators and built-in functions are shown below:
| Operator or Function | What it does |
|---|---|
| max() | Finds the maximum value |
| min() | Finds the minimum value |
| sum() | Finds the sum |
| mean() | Finds the mean |
| sd() | Finds the standard deviation |
| length() | States length of vector |
Operations can be performed on single numeric values or multiple numeric values stored in variables as shown in the examples below:
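The sketch below shows the arithmetic operators and the built-in functions listed above applied to single values and to a variable holding several values. The numbers and the variable name “readings” are placeholders, not data from the article.

```r
# Basic arithmetic on single values
print(7 + 3)     # addition: 10
print(7 - 3)     # subtraction: 4
print(7 * 3)     # multiplication: 21
print(7 / 2)     # division: 3.5
print(2 ^ 3)     # exponent, as in MS Excel: 8

# Built-in functions applied to several values stored in one variable
readings <- c(2, 4, 4, 6)
print(max(readings))      # 6
print(min(readings))      # 2
print(sum(readings))      # 16
print(mean(readings))     # 4
print(sd(readings))       # sample standard deviation, about 1.63
print(length(readings))   # 4

# Operators work on every value in the variable at once
print(readings * 10)      # 20 40 40 60
```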
Another useful function in base R is:
Using the basic operators and built-in functions you have learned thus far, you can now use R to perform pharmaceutical calculations!
Question: Determine the molar mass of the antiretroviral HIV fusion inhibitor Enfuvirtide (C204H301N51O64) using the following values: C = 12.01 g/mol, H = 1.01 g/mol, N = 14.01 g/mol and O = 16.00 g/mol.
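One possible way to work this out in R is sketched below. The variable names are our own choice. Each atomic mass is multiplied by the number of atoms in the molecular formula and the results are added together.

```r
# Atomic masses in g/mol, as given in the question
mass_C <- 12.01
mass_H <- 1.01
mass_N <- 14.01
mass_O <- 16.00

# Enfuvirtide: C204 H301 N51 O64
enfuvirtide_molar_mass <- (204 * mass_C) + (301 * mass_H) +
  (51 * mass_N) + (64 * mass_O)

cat("Molar mass of enfuvirtide:", enfuvirtide_molar_mass, "g/mol\n")
```

With the values given, this comes to 4492.56 g/mol.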
You can reuse the variables above for future pharmacology, medicinal chemistry and pharmaceutical calculations!
Now that you’ve had a taste of what the R programming language has to offer, you can take some extra steps to learn more about R.
Here are some exceptional resources to help you on your way:
Also, typing ?X in the console tab, where X is a built-in function, opens up a help page for that built-in function:
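For instance, taking mean() as the function of interest (our own choice of example), either of the two lines below should bring up its help page:

```r
?mean           # opens the help page for mean()
help("mean")    # equivalent, written as a function call
```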