How to Import Data from R¶
If the Opal server’s data importers are not sufficient (unsupported data format, missing data extraction options etc.), the recommended alternative is to use a R script (which execution can be automated to update data) as follows:
- Preliminary: install opalr R package
- Connect to Opal server using
- Load and prepare data in R as a
- [optional] Fine tune data dictionary using
- Save data using
- [optional] Update data dictionary using
⇒ The table is created/updated with the imported data and is to be accessed directly or through a view
Step 1 - Prerequisites¶
Install R Packages¶
Opal is a server application. The client R script will connect the Opal server. Then the prerequisites are:
- Server: having Opal connected to a functional R server,
- Client: having the opalr R package installed, and data sources accessible.
See also the Using R documentation.
Connect with Opal Server¶
Your script must start with:
# load opal library library(opalr) # connect to the opal server o <- opal.login(...)
See the opal.login() documentation for more details about credentials.
Step 2 - Prepare Data in R¶
Load Data in R¶
There are many ways of having data available in R whether the source is a file, a database, a remote service etc. Some reference manuals can be found at:
[optional] Prepare Data Dictionary¶
The data dictionary can be fully or partially described directly in the tibble that will be imported.
Raw R Attributes
The data dictionary can be fine-tuned before saving the data into a Opal table. As an example, a R column of type
double can be saved as a variable with the
integer value type in place of the default
decimal one. This is done by setting the R vector attributes with some Opal keys.
The R attribute keys that can be used are:
||Variable value type, as described in the Variables and Categories section.|
||The measurement unit of the variable values.|
||The type of the entity referred when variable values are identifiers.|
||The mime type of the variable values.|
||Whether the variable has repeated values. True when value is “1”, false otherwise.|
||Name of the occurrence group, when several variables are repeated together.|
||Position in the variables list, for ordering.|
For instance, the column cyl will be interpreted as a vector of integer values at data importation time:
data <- tibble::as_tibble(mtcars) # apply 'opal.value_type' attribute to 'cyl' column attributes(data$cyl) <- list(opal.value_type = 'integer')
Another example makes a numerical variable with categories in Opal from a factor column in R:
data <- tibble::as_tibble(mtcars) # make column a factor, each level will be a category data$cyl <- as.factor(data$cyl) # append 'opal.value_type' attribute to 'cyl' column attributes(data$cyl) <- append(attributes(data$cyl), list(opal.value_type = 'integer'))
Full Data Dictionary
It is not necessary to use Excel to define this data dictionary:
data <- tibble::as_tibble(mtcars) variables <- tibble::tribble( ~name, ~valueType, ~`label:en`, ~`Namespace::Name`, ~unit, ~repeatable, ~index, "mpg", "decimal", "Mpg label", "Value1", "years", 0, 1, "cyl", "integer", "Cyl label", "Value2", "kg/m2", 0, 2, "disp", "decimal", "Disp label", NA, NA, 1, 3 ) categories <- tibble::tribble( ~variable, ~name, ~missing, ~`label:en`, "cyl", "4", 0, "Four", "cyl", "6", 0, "Six", "cyl", "8", 1, "Height" ) data <- dictionary.apply(data, variables, categories)
Taxonomy Term Annotations
To annotate one or more variables with a taxonomy term without having to define a full data dictionary, see the dictionary.annotate() documentation.
# annotate some variables with a taxonomy term data <- dictionary.annotate(data, variables = c("A_SDC_EDU_LEVEL", "A_SDC_EDU_LEVEL_AGE"), taxonomy = "Mlstr_area", vocabulary = "Sociodemographic_economic_characteristics", term = "Education")
Step 3 - Import Data¶
Saving the data into a Opal table is as simple as:
# save 'data' tibble into 'mytable' table, using 'id' column to provide identifiers opal.table_save(o, data, project = "myproject", table = "mytable", id.name = "id")
See the opal.table_save() documentation for more details about saving operation options.
You can follow the import task progress in the project’s Tasks page.
Step 4 - [optional] Update Data Dictionary¶
The data dictionary in Opal is the rich description of the data.
After data have been saved the data dictionary can be amended, except the variable value types. See previous section (Prepare Data Dictionary) to control value types at importation time.
Other data dictionary properties and attributes can be set using the same data structure as in the
Excel template, expressed in R.
As an example the following data dictionary defined in R is applied to an Opal table:
variables <- tibble::tribble( ~name, ~valueType, ~`label:en`, ~`Namespace::Name`, ~unit, ~repeatable, ~index, "mpg", "decimal", "Mpg label", "Value1", "years", 0, 1, "cyl", "integer", "Cyl label", "Value2", "kg/m2", 0, 2, "disp", "decimal", "Disp label", NA, NA, 1, 3 ) categories <- tibble::tribble( ~variable, ~name, ~missing, ~`label:en`, "cyl", "4", 0, "Four", "cyl", "6", 0, "Six", "cyl", "8", 1, "Height" ) opal.table_dictionary_update(o, "myproject", "mytable", variables, categories)
When data type has not been specified before the import and needs to be changed, an Opal view can transform values on the fly. See the How to Transform Tables with Views for making a view based on the imported table using R.