Magrittr
{magrittr} is a package that allows us to perform more operations within a pipeline to make our code more efficient to write and to read. Its main implementation is the pipe (%>%), which we already discussed in Dplyr. However, it also offers a number of aliases to make operators available in our pipelines.
. and `
Before we get into the aliases offered by {magrittr}, we should realize that we can do without these aliases by using dots (.) and backticks (`). Here, the dot indicates the data in its current form (i.e. after all previous operations have been applied) and the backticks are put around the operator (e.g. +). The operator is then treated as a function (i.e. parentheses are required).
For instance, take the vector below:
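The definition of the example data is not shown here. Judging from the object name vec_fib used further down and the printed result, it was presumably the first ten Fibonacci numbers, so a reconstruction under that assumption could look like this:

```r
# Assumed reconstruction: the first ten Fibonacci numbers.
# (Inferred from the name vec_fib and the output shown later in this section.)
vec_fib <- c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55)
vec_fib
```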
If we want to subtract 3 from all numbers, multiply by 1.5, and then set all negative values to NA, we could do the following:
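A minimal sketch of what such a pipeline with dots and backticks might look like. The vector vec_fib is an assumed reconstruction (the first ten Fibonacci numbers), chosen so that the result matches the output shown in this section:

```r
library(magrittr)

# Assumed example data: the first ten Fibonacci numbers
vec_fib <- c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55)

result <- vec_fib %>%
    # Subtract 3: the backticked operator is called like a function
    `-`(3) %>%
    # Multiply by 1.5
    `*`(1.5) %>%
    # Set all negative values to NA; the dot stands for the current data
    ifelse(. < 0, NA, .)

result
```

Because the dot is passed directly as an argument to ifelse(), the pipe does not additionally insert the data as the first argument.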
Aliases
However, {magrittr} offers functions that can be used instead of these operators, called aliases, which improve the readability of the code and are easier to use. Using those aliases, we would write the above code as:
# Start pipeline with vector
vec_fib %>%
    # Subtract 3
    subtract(3) %>%
    # Multiply by 1.5
    multiply_by(1.5) %>%
    # Set all negative values to NA
    # Here, the alias is perhaps less useful, because we use . < 0 inside a function call.
    # The aliases are mostly useful as the first function in each step of the pipeline
    ifelse(is_less_than(., 0), NA, .)
[1] NA NA NA 0.0 3.0 7.5 15.0 27.0 46.5 78.0
The available aliases (also available here or in the help function of each alias in R) are:
Alias | Operator |
---|---|
extract | `[` |
extract2 | `[[` |
inset | `[<-` |
inset2 | `[[<-` |
use_series | `$` |
add | `+` |
subtract | `-` |
multiply_by | `*` |
raise_to_power | `^` |
multiply_by_matrix | `%*%` |
divide_by | `/` |
divide_by_int | `%/%` |
mod | `%%` |
is_in | `%in%` |
and | `&` |
or | `|` |
equals | `==` |
is_greater_than | `>` |
is_weakly_greater_than | `>=` |
is_less_than | `<` |
is_weakly_less_than | `<=` |
not (n’est pas) | `!` |
set_colnames | `colnames<-` |
set_rownames | `rownames<-` |
set_names | `names<-` |
set_class | `class<-` |
set_attributes | `attributes<-` |
set_attr | `attr<-` |
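To give a feel for how a few of these aliases read in practice, here is a short sketch. The example objects (my_list, squares, named, found) are made up for illustration:

```r
library(magrittr)

# Hypothetical example list
my_list <- list(a = 1:3, b = letters)

# extract2() is `[[`, raise_to_power() is `^`
squares <- my_list %>%
    extract2("a") %>%
    raise_to_power(2)

# set_names() is `names<-`
named <- c(1, 2, 3) %>%
    set_names(c("x", "y", "z"))

# is_in() is `%in%`
found <- c("a", "1") %>%
    is_in(letters)

squares
named
found
```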
|>
Besides the pipe implemented by {magrittr}, R also offers a native pipe: |>. Instead of referring to the existing data with the dot (.), the native pipe uses an underscore (_) as placeholder. Although in general the pipes function the same, there are some differences in what they can do and how they are used. The tidyverse has a more elaborate explanation on this topic here and more details are also available here and here.
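As a brief illustration of the native pipe, the sketch below assumes R 4.2 or later, where the underscore placeholder is available and has to be supplied to a named argument. It uses the built-in mtcars data; the object names roots and fit are only illustrative:

```r
# Native pipe: the left-hand side becomes the first argument
roots <- c(1, 4, 9) |> sqrt()

# Underscore placeholder: must be passed to a named argument (here data =)
fit <- mtcars |> lm(mpg ~ wt, data = _)

roots
coef(fit)
```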
Other pipes
The %>% pipe is not the only pipe {magrittr} offers us.
%T>%
The Tee pipe returns its left-hand side value instead of the result of the right-hand side. In other words, it returns the input of the function instead of its output. This is helpful when we are only interested in the side effects of a function rather than its main output (e.g. printing to the console).
Imagine we are interested in only the description of a .csv file, but not in actually loading it into our global environment. In that case, we could use the read_csv() function from {readr} with the Tee pipe:
# Load readr
pacman::p_load("readr")
# Get information on .csv file available online
"https://drive.google.com/uc?id=1zO8ekHWx9U7mrbx_0Hoxxu6od7uxJqWw&export=download" %T>%
    # Print only data information
    read_csv()
Rows: 100 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): Customer Id, First Name, Last Name, Company, City, Country, Phone...
dbl (1): Index
date (1): Subscription Date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
[1] "https://drive.google.com/uc?id=1zO8ekHWx9U7mrbx_0Hoxxu6od7uxJqWw&export=download"
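The same behaviour can be sketched without a download. In this small, made-up example, cat() is used only for its side effect of printing, and the Tee pipe passes the original vector on to sum():

```r
library(magrittr)

total <- c(3, 5, 8) %T>%
    # Side effect only: print the values; cat() itself returns NULL
    cat("\n") %>%
    # Receives the original vector, not the NULL returned by cat()
    sum()

total
```

With a regular %>% in place of %T>%, sum() would receive the NULL returned by cat() and the result would be 0.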
%$%
The exposition pipe exposes the names in the data to the next function, which is especially useful if the function does not have a data argument, such as table():
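A minimal sketch of the exposition pipe, using the built-in mtcars data as a stand-in example (the object names tab and r are illustrative):

```r
library(magrittr)

# table() has no data argument, so %$% exposes the columns by name
tab <- mtcars %$% table(cyl, gear)

# The same works for other functions without a data argument, such as cor()
r <- mtcars %$% cor(mpg, wt)

tab
r
```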
%<>%
The assignment pipe is a shorthand pipe for a pipeline that assigns the final value back into the object that was used as the start of the pipeline (i.e. it is short for x <- x %>% ...). For example:
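A minimal sketch with a made-up vector x:

```r
library(magrittr)

x <- c(1, 4, 9)

# Shorthand for: x <- x %>% sqrt()
x %<>% sqrt()

x
```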
Exercises
1. Use aliases to change values
From the starwars dataset (available in {dplyr}), extract the column height and divide by 100.
Answer
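One possible answer, sketched with the use_series() and divide_by() aliases. It assumes {dplyr} is installed so that dplyr::starwars is available; the object name heights is only illustrative:

```r
library(magrittr)

heights <- dplyr::starwars %>%
    # use_series() is the alias for `$`
    use_series(height) %>%
    # divide_by() is the alias for `/`
    divide_by(100)

heights
```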
[1] 1.72 1.67 0.96 2.02 1.50 1.78 1.65 0.97 1.83 1.82 1.88 1.80 2.28 1.80 1.73
[16] 1.75 1.70 1.80 0.66 1.70 1.83 2.00 1.90 1.77 1.75 1.80 1.50 NA 0.88 1.60
[31] 1.93 1.91 1.70 1.85 1.96 2.24 2.06 1.83 1.37 1.12 1.83 1.63 1.75 1.80 1.78
[46] 0.79 0.94 1.22 1.63 1.88 1.98 1.96 1.71 1.84 1.88 2.64 1.88 1.96 1.85 1.57
[61] 1.83 1.83 1.70 1.66 1.65 1.93 1.91 1.83 1.68 1.98 2.29 2.13 1.67 0.96 1.93
[76] 1.91 1.78 2.16 2.34 1.88 1.78 2.06 NA NA NA NA NA
2. Use different pipes to meddle with starwars
Now, using the pipe operators from {magrittr}, overwrite the object starwars with a frequency table of the species of all individuals who are at least 2 meters tall.
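One possible approach, sketched under the assumption that {dplyr} is installed and that height is recorded in centimeters (so 2 meters corresponds to 200). A local copy of the data is made first, so that the assignment pipe has an object to overwrite:

```r
library(magrittr)

# Local copy of the dataset, so the packaged data stays untouched
starwars <- dplyr::starwars

starwars %<>%
    # Keep individuals of at least 2 meters (height is in cm)
    dplyr::filter(height >= 200) %$%
    # Expose the columns so table() can find species
    table(species)

starwars
```

The assignment pipe starts the chain and overwrites starwars with the final result, while the exposition pipe hands the column names to table().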
Next topic
With that, we discussed a large part of the tidyverse. Although some other packages exist, we discuss these in other sections. Now that we know part of the basic grammar of the tidyverse, we can learn a new set of skills useful for any data analysis.
Next: Plotting