1 Introduction

In this section, we will introduce some well-known built-in functions. Like any other mature programming language, the R lexicon contains thousands of words which denote various functions. We’ll cover a select few in what follows, but it would be overwhelming and unproductive to try to provide an exhaustive list. Finding the correct function when programming will often involve reading R documentation or a trip to Stack Exchange. You have certainly spent years using search engines to look for all sorts of things. This skill (and it is, indeed, a skill) will come in handy when you need to find the correct function for a task.

By documentation, we mean that R, like all other popular programming languages, has written explanations of how its commands actually work. In R, documentation is available online at the Comprehensive R Archive Network (CRAN). A common first place to look is Stack Exchange. Here, fellow frustrated programmers post questions concerning all types of languages. Often, it makes sense to include your language in whatever query you plug into a search engine (e.g. search for “how to sort a list R” rather than “how to sort a list”).
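
The same documentation can also be reached from inside an R session. Here is a minimal sketch, assuming a standard R installation with its help pages available; the search phrase is only an illustration.

```r
# List every visible object whose name contains "sort".
# apropos() is handy when you only remember part of a function name.
matches = apropos("sort")
print(matches)

# Help pages open in a viewer, so only request them in an interactive session.
if (interactive()) {
  help("sort")                  # same as typing ?sort
  help.search("sort a vector")  # same as typing ??"sort a vector"
}
```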

A convention for these notes will be the posting of questions marked with a Q:. Sometimes these questions will be answered, and other times you’ll be left to figure things out on your own.

Q: Suppose you want to multiply two numbers in R. By any means necessary, find the correct symbol.

The answer here is pretty simple: multiplication is denoted by the "*" symbol. Here, and many times in the future, we are going to show code (presented in gray blocks), along with the output of the code given in typewriter font. Text in typewriter font is essentially communicating that what is being written can be copied directly into a programming language (sometimes URLs are also written in typewriter font).

Let’s give an example. To demonstrate multiplication in R, we may write

#Comment for R code.  Comments give explanations 
#about what is going on with code, 
#but are not actually used in compiling (running) a program.
#Lots of helpful commenting is common practice for good programmers.

4*5
## [1] 20

For another example, let’s look at randomly generated numbers, a concept that we will run into quite often. To draw a single random number, we use

runif(1)
## [1] 0.3328911

Q: If I ran this again, should I expect to produce the same number?

Is this number truly random? Not quite. Programming languages often use pseudorandom numbers, which are actually produced in a deterministic fashion. There are plenty of ways to make such numbers, but a common method involves “modding out” huge integers by large prime numbers. There are several “randomness” tests that exist to check whether the behavior of pseudorandom numbers is similar to that of truly random numbers. We won’t go into such details, and for the sake of this class, feel free to act like the random numbers you generate are in fact random (even if they’re not).
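
One way to see the deterministic nature of pseudorandom numbers is to fix the starting point of the generator with set.seed. The sketch below is only an illustration, and the seed value 2024 is an arbitrary choice.

```r
# Fixing the seed makes the pseudorandom sequence reproducible.
# The seed value 2024 is an arbitrary choice.
set.seed(2024)
first = runif(1)

# Resetting the same seed restarts the same sequence.
set.seed(2024)
second = runif(1)

# Same seed, same "random" number.
first == second
## [1] TRUE
```

Without a call to set.seed, each run of runif(1) will generally produce a different number.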

Q: You would like to generate 100 numbers, randomly chosen between 3 and 5. What should you do?

In this case, R makes things quite easy. The runif function actually has three arguments. The first gives the number of random numbers to generate. The second and third give the lower and upper endpoints of the interval from which the numbers are drawn uniformly. To keep the output short, the example below draws 10 numbers rather than 100.

runif(10, 3, 5)
##  [1] 4.220551 3.737474 3.606614 3.993987 4.949165 4.967508 4.818483 3.297522
##  [9] 4.554884 3.792002

If you move your cursor over a function in RStudio, it will show you the function’s arguments. Sometimes you can omit arguments, and R will just provide default values. What appear to be the default values of runif?

Q: By hovering your cursor over runif, can you tell whether an argument can be omitted, and what default value will be inserted for omitted arguments?
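
If hovering is not available (for example, outside of RStudio), the same information can be printed from code. This is a small sketch using the built-in args and formals functions; the variable name defaults is our own.

```r
# args() prints a function's signature, including any default values.
args(runif)
## function (n, min = 0, max = 1) 
## NULL

# formals() returns the arguments as a list, so defaults can be inspected directly.
defaults = formals(runif)
defaults$min
## [1] 0
defaults$max
## [1] 1
```

The argument n has no default, which is why runif() with no arguments produces an error.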

1.1 The console and R scripts

Suppose we’d like to compute the product 23845*92834. There are two ways to do this (well, three, I suppose, if you have a pen, paper, and some patience…). First, we can use the console (the command line) and simply type what we want.

Q: Compute the product 23845*92834 using the command line.

We can also evaluate by opening a .r file and running the code. To do so, open a new file (File -> New File -> R Script) and type your product in the first line. Then highlight the code you wish to run and click the “Run” button just above your script. Using an R script is a very good idea for (at least) two reasons.

  1. You can save your code for later. To save code, simply click on the floppy disk icon above the script (aside for youngsters: have you actually ever seen one of these things?) and give your file a name. Make sure your file name ends with ‘.r’. Otherwise RStudio won’t know how to read it.
  2. Almost all programs involve multiple lines. The console is good for evaluating quick programs (one-liners). We’ll see soon that nontrivial programs involve many steps, and to evaluate them all at the same time will involve using an R script.

Q: Find the product by using an R-script. Save the file as myfirstfile.r. Open the file to make sure all’s well, and run the script again.

1.2 Exploring basic functions

R can function as a scientific calculator. This is akin to saying that a nuclear bomb can function as a bug zapper. The operators for addition, subtraction, multiplication, and division are given by

  1. \(+\) for addition, so 1+1 = 2.
  2. \(-\) for subtraction, so 1-1 = 0.
  3. \(*\) for multiplication, so 2*3 = 6.
  4. \(/\) for division, so 6/2 = 3.
  5. \(\hat{}\) for exponentiation, so 2^3 = 8.

The PEMDAS order of operations applies, although if you’re in doubt, adding additional parentheses is the way to go.

Q: Twitter sometimes generates controversy (actually, when does it not?) by asking people to evaluate 6/2*(1+2). What does R give? What different responses do you think are often provided? By using parentheses, how should the expression be written to remove any ambiguity? Also, what happens when you type in 6/2(1+2)?

The response from R is:

6/2*(1+2)
## [1] 9

We can expect that some people think the expression should be evaluated as

(6/2)*(1+2)
## [1] 9

There are likely other people who believe that the expression should instead be

6/(2*(1+2))
## [1] 1

The important point here is that parentheses clear up any ambiguity regarding PEMDAS. When in doubt, err on the side of more clarity.
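
As for the last part of the question, 6/2(1+2) is not a valid calculation in R, since R never treats juxtaposition as multiplication. The sketch below captures the resulting error with tryCatch so that a script keeps running; the replacement string "error" is our own choice.

```r
# R reads 2(1+2) as "call the function named 2",
# which fails because 2 is a number, not a function.
# tryCatch() captures the error and returns the string "error" instead.
result = tryCatch(6/2(1+2), error = function(e) "error")
result
## [1] "error"
```

Writing the multiplication symbol explicitly, as in 6/2*(1+2), avoids the error.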

1.3 Variables and class types

If you want to save a value for later, you can use a variable. For instance, let

A = (6/2)*(1+2)

There are two things to note here. First, observe how no response is returned when you run this in an R script. This is because we are simply creating a new variable, rather than asking R to print a value. Second, note how the variable A is now stored in the Environment pane in the upper right corner of RStudio. You’ll see the variable A along with its value ‘9’. Rather than typing the entire expression again, we can simply evaluate

A
## [1] 9

Q: R actually has a special expression <- which is used for object assignment. Test this and create the same variable A using this assignment. Throughout these notes, we will most often use = for variable assignment (what’s a very basic argument for why = is preferable to <-?), but there are in fact cases where the two commands will give you different responses (this won’t arise in our class, however).

A <- (6/2)*(1+2)
A
## [1] 9

Variables can span a vast collection of different types of objects. The type of object can be found with str.

Q: What kind of object is A?

str(A)
##  num 9

This tells you that you’re dealing with a numeric variable, with a value of 9. For another class of objects, consider character objects (often called strings in other languages), which are text-valued variables. Here’s a popular “Hello World” program

s = 'Hello World'
s
## [1] "Hello World"

The variable s has object type

str(s)
##  chr "Hello World"
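
If only the name of the class is needed, without the value, the built-in class function is an alternative to str. A short sketch using the two variables defined above:

```r
A = (6/2)*(1+2)
s = 'Hello World'

# class() returns just the class name as a character string.
class(A)
## [1] "numeric"
class(s)
## [1] "character"
```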

We can also look at lists of numbers, called vectors. These are written using c() and then plugging in whatever numbers you want. This is an ordered list, so c(1,2) is not the same as c(2,1).

v = c(6,4,7,3,8)
str(v)
##  num [1:5] 6 4 7 3 8

This is a vector of length 5, with all numeric values. We can also plug in characters for each entry.

v = c('The','rain','in','Spain')
str(v)
##  chr [1:4] "The" "rain" "in" "Spain"

Q: What happens if we ‘mix’ characters and numbers?

v = c(3,'rain','in','Spain')
str(v)
##  chr [1:4] "3" "rain" "in" "Spain"

When mixing, R will convert everything to characters. This makes sense after some thought. Any number can be written as a string, and it’s fairly easy to convert such a string back to a number. It would be nonsensical, however, to go the other way around and define everything as a number, since a word like ‘rain’ has no numeric value.
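
The asymmetry can be checked directly with as.numeric. This is a small sketch; suppressWarnings is used only to keep the output tidy.

```r
v = c(3,'rain','in','Spain')

# "3" looks like a number, so it converts back cleanly.
as.numeric(v[1])
## [1] 3

# "rain" does not, so R returns NA (and emits a warning, silenced here).
suppressWarnings(as.numeric(v[2]))
## [1] NA
```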

If you’d like to call on a certain index in a vector, use square brackets. Like humans almost always do, R begins counting at 1. This is not an obvious statement! Many languages, such as Python, begin counting at 0, and translating between the two is often filled with ‘off by 1’ errors.

v = c('The','rain','in','Spain')

v[2]
## [1] "rain"

It’s possible to call on several indices at once. You can use a colon to call on a sequence of values.

v = c('The','rain','in','Spain')

v[1:3]
## [1] "The"  "rain" "in"

Or you can simply call on the index values using another vector

v = c('The','rain','in','Spain')

v[c(1,3,4)]
## [1] "The"   "in"    "Spain"

Let’s now take a moment to discuss assigning multiple variables. We’ll start with a simple example:

Q: What does each of these programs evaluate to?

a = 5
b = 4
b = a

print(b)
## [1] 5

a <- -5
b <- -4
b < - a
## [1] TRUE
print(b)
## [1] -4

Note the space in the second program! We’ve broken up the <- symbol into the two symbols < (less than) and - (negative sign). So b < -a is actually asking a yes or no type question: Is b less than -a? Since \(b = -4 < 5 = -(-5) = -a\), the answer is “Yes”, or in terms of R, TRUE or simply 1. This yes/no question didn’t actually reassign b, so when we run print(b), it simply returns its initial value of -4.

1.4 Sorting

We can also apply functions to variables. Functions use parentheses to take in variables. Beware! It is a very, very common mistake to call on functions with brackets, or call a vector index with parentheses.

Consider the vector

v = c(3,6,4,6,1)

What function should you apply to sort the vector?

sort(v)
## [1] 1 3 4 6 6
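
To see what the warning about brackets and parentheses looks like in practice, here is a sketch that runs both the correct and the swapped versions. The swapped versions fail, so they are wrapped in tryCatch, which here returns the string "error" (our own choice) instead of stopping the script.

```r
v = c(3,6,4,6,1)

# Correct: brackets pick out an entry, parentheses call a function.
v[2]
## [1] 6
sort(v)
## [1] 1 3 4 6 6

# Swapped: v is not a function, and sort is not a vector, so both lines fail.
bad_call = tryCatch(v(2), error = function(e) "error")
bad_index = tryCatch(sort[v], error = function(e) "error")
c(bad_call, bad_index)
## [1] "error" "error"
```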

Q: How does R sort text?

v = c('The','rain','in','Spain')

sort(v)
## [1] "in"    "rain"  "Spain" "The"

Sorting is done alphabetically.

Q: What if we want to sort backwards?

v = c(3,6,4,6,1)
sort(v, decreasing = TRUE)
## [1] 6 6 4 3 1

What, then, is the default value of decreasing in sort?

1.5 Zero, infinity, and everything in between

It’s important to remember that while computers often have the upper hand with calculating expressions, once in a while they produce an answer that we humans can immediately identify as not exactly correct. For example, you should know from trigonometry that \(\sin(\pi) = 0\) (you do remember the unit circle, right?). However, when we ask R to evaluate this expression, we obtain

sin(pi)
## [1] 1.224647e-16

The “e-16” that we see in our answer means “times \(10^{-16}\)”, so the result is about \(1.22 \times 10^{-16}\). This is an extremely small number, but not quite the value of zero that we know is the exact answer. The issue here isn’t so much that we’re interested in measuring something with 16 digits of accuracy. These kinds of demands on precision are quite rare. Rather, problems may arise when dealing with things like if statements, which we’ll run into shortly. By this, we might say something along the lines of “if an expression is equal to 0, then run some snippet of code”. If the expression we evaluate is very small, but not exactly 0, we would not run the code snippet. In general, be cautious about expecting exact answers, especially when evaluating continuous functions like sines, exponentials, and logarithms.
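
A common workaround is to compare against a small tolerance rather than testing for exact equality. The following is a minimal sketch; the tolerance 1e-12 is an arbitrary choice, and all.equal uses its own default tolerance.

```r
x = sin(pi)

# Exact comparison fails because x is tiny but nonzero.
x == 0
## [1] FALSE

# Compare against a small tolerance instead (1e-12 is an arbitrary choice).
abs(x) < 1e-12
## [1] TRUE

# all.equal() performs a tolerance-based comparison.
# Wrapping it in isTRUE() guarantees a plain TRUE or FALSE.
isTRUE(all.equal(x, 0))
## [1] TRUE
```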

Sometimes we’ll deal with infinity, denoted in mathematics as \(\infty\), and expressed in R as Inf. As you may have learned in calculus, infinity isn’t exactly a number, but more of a concept. Nevertheless, we can still treat \(\infty\) as a number when dealing with calculations which are intuitive. For instance, it’s reasonable to say things like \(\infty +1 = \infty\). If there are infinitely many balls, and we remove or add some balls, then we should still have infinitely many balls, right? Other cases aren’t so obvious: what if we remove infinitely many balls? Do we still have infinitely many left? In ambiguous cases like these, we simply shrug our shoulders and say “Who knows?”. In R, such an indeterminate form (\(0/0, \infty-\infty\), etc.) is given by NaN (read as “not a number”).

Q: What does each of the following commands do?

2/0
## [1] Inf
-1/0
## [1] -Inf
0/0
## [1] NaN
Inf/Inf
## [1] NaN
Inf+1
## [1] Inf
Inf-1
## [1] Inf
Inf+Inf
## [1] Inf
Inf-Inf
## [1] NaN
Inf*Inf
## [1] Inf
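
Since Inf and NaN can show up in the middle of a longer calculation, it helps to be able to test for them. A short sketch using the built-in is.nan, is.finite, and is.infinite functions:

```r
# is.nan() detects indeterminate results.
is.nan(0/0)
## [1] TRUE
is.nan(Inf-Inf)
## [1] TRUE

# is.finite() and is.infinite() distinguish ordinary numbers from Inf and -Inf.
is.finite(Inf)
## [1] FALSE
is.infinite(-1/0)
## [1] TRUE

# Comparing NaN with == does not give TRUE or FALSE, so use is.nan() instead.
NaN == NaN
## [1] NA
```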

1.6 Booleans

An operator which gives a value of TRUE or FALSE (1 or 0) is called a Boolean operator. Here are Booleans for comparing two numbers.

#Here we are assigning the value of 4 to the variable 'a'
a = 4

#And then the value of 5 to 'b'
b = 5

#And 5 to 'c'
c = 5

#Here are some Boolean statements (which are essentially yes/no questions)

#Is a equal to b?
a == b
## [1] FALSE
#Is a not equal to b?
a != b
## [1] TRUE
#Is a less than b?
a < b
## [1] TRUE
#Is b greater than b?
b > b
## [1] FALSE
#Is c less than or equal to (not greater than) b?
c <= b
## [1] TRUE
#Is c greater than or equal to (not less than) a?
c >= a
## [1] TRUE
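
The remark that TRUE and FALSE correspond to 1 and 0 can be checked directly. A small sketch reusing the values of a and b from above:

```r
a = 4
b = 5

# Converting a Boolean to a number gives 1 for TRUE and 0 for FALSE.
as.numeric(a < b)
## [1] 1
as.numeric(a == b)
## [1] 0

# Booleans are treated as 1 and 0 in arithmetic, so two TRUE values add up to 2.
(a < b) + (a != b)
## [1] 2
```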