How to Import Data from R
If the Opal server’s data importers are not sufficient (unsupported data format, missing data extraction options etc.), the recommended alternative is to use a R script (which execution can be automated to update data) as follows:
Note
Preliminary: install opalr R package
Connect to Opal server using
opal.login()
Load and prepare data in R as a
tibble
object[optional] Fine tune data dictionary using
attributes()
ordictionary.apply()
ordictionary.annotate()
Save data using
opal.table_save()
[optional] Update data dictionary using
opal.table_dictionary_get()
⇒ The table is created/updated with the imported data and is to be accessed directly or through a view
Step 1 - Prerequisites
Install R Packages
Opal is a server application. The client R script will connect the Opal server. Then the prerequisites are:
Server: having Opal connected to a functional R server,
Client: having the opalr R package installed, and data sources accessible.
See also the Using R documentation.
Connect with Opal Server
Your script must start with:
# load opal library
library(opalr)
# connect to the opal server
o <- opal.login(...)
See the opal.login() documentation for more details about credentials.
Prepare Project
If the destination project does not exist yet, it is possible to create it using R:
# create a new project with a database backend for storing tables' data
opal.project_create(o, "myproject", database = TRUE)
You can find more information in the Opal Projects vignette.
Step 2 - Prepare Data in R
Load Data in R
There are many ways of having data available in R whether the source is a file, a database, a remote service etc. Some reference manuals can be found at:
Make sure your data are tidy, and identify which column holds the identifiers. The expected R data structure for data import into Opal is a tibble.
[optional] Prepare Data Dictionary
The data dictionary can be fully or partially described directly in the tibble that will be imported.
Raw R Attributes
The data dictionary can be fine-tuned before saving the data into a Opal table. As an example, a R column of type double
can be saved as a variable with the integer
value type in place of the default decimal
one. This is done by setting the R vector attributes with some Opal keys.
The R attribute keys that can be used are:
R attribute |
Description |
---|---|
|
Variable value type, as described in the Variables and Categories section. |
|
The measurement unit of the variable values. |
|
The type of the entity referred when variable values are identifiers. |
|
The mime type of the variable values. |
|
Whether the variable has repeated values. True when value is “1”, false otherwise. |
|
Name of the occurrence group, when several variables are repeated together. |
|
Position in the variables list, for ordering. |
For instance, the column cyl will be interpreted as a vector of integer values at data importation time:
data <- tibble::as_tibble(mtcars)
# apply 'opal.value_type' attribute to 'cyl' column
attributes(data$cyl) <- list(opal.value_type = 'integer')
Another example makes a numerical variable with categories in Opal from a factor column in R:
data <- tibble::as_tibble(mtcars)
# make column a factor, each level will be a category
data$cyl <- as.factor(data$cyl)
# append 'opal.value_type' attribute to 'cyl' column
attributes(data$cyl) <- append(attributes(data$cyl), list(opal.value_type = 'integer'))
Full Data Dictionary
Another approach is to apply the full data dictionary (same structure as in the Excel template
) to the tibble to be saved. Use the dictionary.apply() for that purpose.
It is not necessary to use Excel to define this data dictionary:
data <- tibble::as_tibble(mtcars)
variables <- tibble::tribble(
~name, ~valueType, ~`label:en`, ~`Namespace::Name`, ~unit, ~repeatable, ~index,
"mpg", "decimal", "Mpg label", "Value1", "years", 0, 1,
"cyl", "integer", "Cyl label", "Value2", "kg/m2", 0, 2,
"disp", "decimal", "Disp label", NA, NA, 1, 3
)
categories <- tibble::tribble(
~variable, ~name, ~missing, ~`label:en`,
"cyl", "4", 0, "Four",
"cyl", "6", 0, "Six",
"cyl", "8", 1, "Height"
)
data <- dictionary.apply(data, variables, categories)
Taxonomy Term Annotations
To annotate one or more variables with a taxonomy term without having to define a full data dictionary, see the dictionary.annotate() documentation.
# annotate some variables with a taxonomy term
data <- dictionary.annotate(data,
variables = c("A_SDC_EDU_LEVEL", "A_SDC_EDU_LEVEL_AGE"),
taxonomy = "Mlstr_area",
vocabulary = "Sociodemographic_economic_characteristics",
term = "Education")
Step 3 - Import Data
Saving the data into a Opal table is as simple as:
# save 'data' tibble into 'mytable' table, using 'id' column to provide identifiers
opal.table_save(o, data, project = "myproject", table = "mytable", id.name = "id")
See the opal.table_save() documentation for more details about saving operation options.
You can follow the import task progress in the project’s Tasks page.
Step 4 - [optional] Update Data Dictionary
The data dictionary in Opal is the rich description of the data.
Table Dictionary
After data have been saved the data dictionary can be amended, except the variable value types. See previous section (Prepare Data Dictionary) to control value types at importation time.
Other data dictionary properties and attributes can be set using the same data structure as in the Excel template
, expressed in R.
See the opal.table_dictionary_update() documentation (that can be usefully combined with opal.table_dictionary_get()).
As an example the following data dictionary defined in R is applied to an Opal table:
variables <- tibble::tribble(
~name, ~valueType, ~`label:en`, ~`Namespace::Name`, ~unit, ~repeatable, ~index,
"mpg", "decimal", "Mpg label", "Value1", "years", 0, 1,
"cyl", "integer", "Cyl label", "Value2", "kg/m2", 0, 2,
"disp", "decimal", "Disp label", NA, NA, 1, 3
)
categories <- tibble::tribble(
~variable, ~name, ~missing, ~`label:en`,
"cyl", "4", 0, "Four",
"cyl", "6", 0, "Six",
"cyl", "8", 1, "Height"
)
opal.table_dictionary_update(o, "myproject", "mytable", variables, categories)
View Dictionary
When data type has not been specified before the import and needs to be changed, an Opal view can transform values on the fly. See the How to Transform Tables with Views for making a view based on the imported table using R.