This repository has been archived by the owner on Jun 30, 2024. It is now read-only.
---
title: "Conversion to and from R data"
---
One of the key goals of extendr is to provide a framework that allows you to write Rust functions that interact with R, without having to know the intricacies of R internals or even R's C facilities. However, some knowledge of these is unavoidable if one wishes to understand why the extendr-api is the way it is.
Thus, while introducing extendr, we shall mention facts about R internals, but you do not need to keep them in mind going forward.
```{r, echo=FALSE}
# to enable extendr/extendrsrc chunks
library(rextendr)
# names(knitr::knit_engines$get())
```
A fundamental data type in R is the 32-bit integer, `int` in C and `i32` in Rust. Passing that type around is essential, and straightforward:
```{extendrsrc}
#[extendr(use_try_from = true)]
fn ultimate_answer() -> i32 {
    42_i32
}
```
This function is now available within your R session, and calling it returns `r ultimate_answer()`.
Another fundamental data type in R is `numeric` / `f64`, which we can also pass back and forth unhindered, e.g.
```{extendrsrc}
#[extendr(use_try_from = true)]
fn return_tau() -> f64 {
    std::f64::consts::TAU
}
```
where $\tau := 2\pi =$ $`r return_tau()`$.
However, passing data from R to Rust must be done with a bit of care. In R, a plain numeric literal such as `21` is a double; writing a true integer literal requires the suffix `L`, as in `21L`.
```{extendrsrc}
#[extendr(use_try_from = true)]
fn bit_left_shift_once(number: i32) -> i32 {
    number << 1
}
```
This function is supposedly a clever way to multiply by two. However, calling `bit_left_shift_once(21.1)` results in
```{r, error=TRUE, echo=FALSE}
bit_left_shift_once(21.1)
```
whereas `bit_left_shift_once(21L)` returns `r bit_left_shift_once(21L)`, as expected.
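The rejection of `21.1` can be pictured as a checked conversion. The following is a minimal sketch in plain Rust, not extendr's actual implementation, and the helper name `whole_number_to_i32` is hypothetical. It accepts a double only when it is finite, has no fractional part, and fits in an `i32`.

```rust
/// Hypothetical helper: convert a double to `i32` only when no information is lost.
fn whole_number_to_i32(x: f64) -> Option<i32> {
    let fits = x >= f64::from(i32::MIN) && x <= f64::from(i32::MAX);
    // `fract()` is 0.0 exactly when `x` has no fractional part;
    // NaN and infinities are rejected by `is_finite()`
    if x.is_finite() && x.fract() == 0.0 && fits {
        Some(x as i32)
    } else {
        None
    }
}

/// Shifting left by one bit doubles the value, as in `bit_left_shift_once`.
fn shift_left_once(number: i32) -> i32 {
    number << 1
}
```

Under these assumptions, `whole_number_to_i32(21.0).map(shift_left_once)` yields `Some(42)`, while `whole_number_to_i32(21.1)` yields `None`.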
R also has the concept of missing values, `NA`, encoded within its data model. However, `i32` / `f64` do not natively have a representation for `NA`, e.g.
```{r, error=TRUE}
bit_left_shift_once(NA_integer_)
bit_left_shift_once(NA_real_)
bit_left_shift_once(NA)
```
Instead, we have to rely on extendr's scalar variants of R types, `Rint` / `Rfloat`, to encompass the notion of `NA` in our functions:
```{extendrsrc}
#[extendr(use_try_from = true)]
fn double_me(value: Rint) -> Rint {
    if value.is_na() {
        Rint::na()
    } else {
        (value.inner() << 1).into()
    }
}
```
This means that we can now handle missing values in the arguments:
```{r, error=FALSE}
double_me(NA_integer_)
double_me(NA_real_)
double_me(NA)
```
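For intuition on why a bare `i32` cannot express `NA`: R stores `NA_integer_` as the smallest 32-bit integer, so from Rust's point of view the bit pattern is an ordinary number. The sketch below is plain Rust, not extendr code, and the names `NA_SENTINEL`, `shift_ignoring_na`, and `shift_na_aware` are made up for illustration.

```rust
/// R encodes `NA_integer_` as the minimum 32-bit integer.
const NA_SENTINEL: i32 = i32::MIN;

/// Treats the sentinel as a regular number, so the missingness is silently lost.
fn shift_ignoring_na(raw: i32) -> i32 {
    raw << 1
}

/// Checks for the sentinel first, mirroring the `is_na()` branch in `double_me`.
fn shift_na_aware(raw: i32) -> Option<i32> {
    if raw == NA_SENTINEL {
        None // stands in for `Rint::na()`
    } else {
        Some(raw << 1)
    }
}
```

With the sentinel as input, `shift_ignoring_na` returns `0`, whereas `shift_na_aware` returns `None`. A type that carries the `NA` check with it removes the need to remember the sentinel at every call site.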
One may notice that `double_me(NA_real_)` was accepted even though the parameter
is an `Rint`. The reason is that when you specify a type without `&` / `&mut`,
the value is coerced in a similar way to how R coerces values. In order to have
strict type-checking at run-time, use `&` / `&mut`, as in
```{extendrsrc}
#[extendr(use_try_from = true)]
fn wrong_input(value: &Rint) -> Rint {
    value.clone()
}
```
```{r, error=TRUE}
wrong_input(NA_integer_)
wrong_input(NA_real_)
wrong_input(21.0)
wrong_input(21L)
```
Here, only the last literal is a true `Rint`.
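The difference between the two calling conventions can be sketched with a toy model in plain Rust. The `RValue` enum and both functions are hypothetical and only mimic the behaviour described above; extendr's real conversion rules are more involved and also cover `NA`.

```rust
/// Toy stand-in for a length-one R vector (hypothetical, not an extendr type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum RValue {
    Integer(i32),
    Double(f64),
}

/// By-value style: a whole-number double is coerced to an integer.
fn coerce_to_int(value: RValue) -> Option<i32> {
    match value {
        RValue::Integer(i) => Some(i),
        // NaN and infinities fail the `fract()` comparison and fall through
        RValue::Double(d) if d.fract() == 0.0 && d.abs() <= f64::from(i32::MAX) => {
            Some(d as i32)
        }
        RValue::Double(_) => None,
    }
}

/// By-reference style: only a value that already is an integer is accepted.
fn strict_int(value: &RValue) -> Option<i32> {
    match value {
        RValue::Integer(i) => Some(*i),
        RValue::Double(_) => None,
    }
}
```

In this model, `RValue::Double(21.0)` passes `coerce_to_int` but is rejected by `strict_int`, which accepts only `RValue::Integer`.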
## Vectors
Most data in R are vectors. Scalar values are in fact vectors of length one, and
even lists are a vector type. The vector type in Rust is `Vec`. A `Vec` carries
an element type, a length, and a capacity. This means that, if necessary, we may
grow any given `Vec` to hold more values, and a reallocation happens only when
the capacity is exceeded.
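The relationship between length and capacity can be observed with the standard library alone. This is a small sketch, and the helper names `push_and_report_realloc` and `full_vec` are hypothetical.

```rust
/// Pushes `value` and reports whether the push changed the capacity,
/// i.e. whether the buffer had to be reallocated.
fn push_and_report_realloc(values: &mut Vec<i32>, value: i32) -> bool {
    let capacity_before = values.capacity();
    values.push(value);
    values.capacity() != capacity_before
}

/// Builds a vector whose spare capacity is used up completely.
fn full_vec() -> Vec<i32> {
    let mut values = Vec::with_capacity(4);
    // Fill until `len == capacity`; `with_capacity` may reserve more than asked for.
    while values.len() < values.capacity() {
        values.push(0);
    }
    values
}
```

Pushing onto a vector with spare capacity reports `false`, while pushing onto the result of `full_vec()` reports `true`, because the buffer must grow.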
Naively, we may define a function like so
```{extendrsrc}
#[extendr(use_try_from = true)]
fn repeat_us(mut values: Vec<i32>) -> Vec<i32> {
    assert_eq!(values.capacity(), values.len(), "must have zero capacity left");
    values[0] = 100;
    values.push(55);
    values
}
```
```{r, error=TRUE}
x <- c(1L, 2L, 33L)
repeat_us(x)
```
Even though the argument is declared as `mut Vec<_>`, the R vector is first
converted to an owned Rust type. It is that copy that we modify and extend, without syncing back to the original data.
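The copy semantics can be imitated in plain Rust without R. In this sketch, `modify_owned_copy` is a hypothetical function, and `to_vec` plays the role of the conversion from the R vector to an owned `Vec<i32>`.

```rust
/// Works on an owned copy of the caller's data, like `repeat_us` does.
fn modify_owned_copy(original: &[i32]) -> Vec<i32> {
    // `to_vec` allocates a new buffer; `original` is left untouched.
    let mut copy = original.to_vec();
    if let Some(first) = copy.first_mut() {
        *first = 100;
    }
    copy.push(55);
    copy
}
```

Given `[1, 2, 33]`, the function returns `[100, 2, 33, 55]`, and the input still reads `[1, 2, 33]` afterwards.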
Of course, a slice, e.g. `&[i32]` / `&mut [i32]`, could be used instead, and a mutable slice allows us to modify the original data, i.e.
```{extendrsrc}
#[extendr(use_try_from = true)]
fn zero_middle_element(values: &mut [i32]) {
    let middle = values.len() / 2;
    // `get_mut` avoids an out-of-bounds panic when the slice is empty
    if let Some(element) = values.get_mut(middle) {
        *element = 0;
    }
}
```
```{r}
x <- c(100L, 200L, 300L)
zero_middle_element(x)
x
```
This is great! If we wanted to insert an `NA` in the middle, we would have to operate on `&mut [Rint]` instead.
A slice is a view of a sequence of elements that are part of a larger collection. Since a slice represents only part of a collection (a vector, in this case), we cannot add new elements through it. To do so, we have to rely on extendr-provided types that offer a `Vec`-like API to R's vector types. These are the `Integers`, `Logicals`, `Doubles`, and `Strings` types.
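The fixed length of a slice can be seen in plain Rust as well. This is a minimal sketch with a hypothetical function `zero_range`; it does not use extendr's `Integers` type.

```rust
/// Zeroes every element visible through a mutable slice.
fn zero_range(values: &mut [i32]) {
    for value in values.iter_mut() {
        *value = 0;
    }
    // There is no `values.push(..)` here: a slice has a fixed length.
}
```

Calling `zero_range(&mut data[1..3])` on `vec![100, 200, 300, 400]` leaves `data` as `[100, 0, 0, 400]`. The edit reaches the parent collection, but growing it still requires the owning `Vec`.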
## Strings are special