How to Import Data from R
=========================
If the Opal server's data importers are not sufficient (unsupported data format, missing data extraction options etc.), the recommended alternative is to use a R script (which execution can be automated to update data) as follows:
.. note::
0. Preliminary: install opalr R package
1. Connect to Opal server using ``opal.login()``
2. Load and prepare data in R as a ``tibble`` object
3. [optional] Fine tune data dictionary using ``attributes()`` or ``dictionary.apply()`` or ``dictionary.annotate()``
4. Save data using ``opal.table_save()``
5. [optional] Update data dictionary using ``opal.table_dictionary_get()``
⇒ The table is created/updated with the imported data and is to be accessed directly or through a view
Step 1 - Prerequisites
----------------------
Install R Packages
~~~~~~~~~~~~~~~~~~
Opal is a server application. The client R script will connect the Opal server. Then the prerequisites are:
* Server: having Opal connected to a functional R server,
* Client: having the `opalr R package `_ installed, and data sources accessible.
See also the :ref:`r` documentation.
Connect with Opal Server
~~~~~~~~~~~~~~~~~~~~~~~~
Your script must start with:
.. code-block:: r
# load opal library
library(opalr)
# connect to the opal server
o <- opal.login(...)
See the `opal.login() `_ documentation for more details about credentials.
Prepare Project
~~~~~~~~~~~~~~~
If the destination project does not exist yet, it is possible to create it using R:
.. code-block:: r
# create a new project with a database backend for storing tables' data
opal.project_create(o, "myproject", database = TRUE)
You can find more information in the `Opal Projects vignette `_.
Step 2 - Prepare Data in R
--------------------------
Load Data in R
~~~~~~~~~~~~~~
There are many ways of having data available in R whether the source is a file, a database, a remote service etc. Some reference manuals can be found at:
* `R Data Import/Export `_
* `R for Data Science: Data import `_
* `rio: A Swiss-Army Knife for Data I/O `_
* ...
Make sure your data are `tidy `_, and identify which column holds the identifiers. The expected R data structure for data import into Opal is a `tibble `_.
[optional] Prepare Data Dictionary
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The data dictionary can be fully or partially described directly in the tibble that will be imported.
.. rubric:: Raw R Attributes
The data dictionary can be fine-tuned before saving the data into a Opal table. As an example, a R column of type ``double`` can be saved as a variable with the ``integer`` value type in place of the default ``decimal`` one. This is done by setting the R vector attributes with some Opal keys.
The R attribute keys that can be used are:
.. list-table::
:header-rows: 1
* - R attribute
- Description
* - ``opal.value_type``
- Variable value type, as described in the :ref:`vars-cats` section.
* - ``opal.unit``
- The measurement unit of the variable values.
* - ``opal.referenced_entity_type``
- The type of the entity referred when variable values are identifiers.
* - ``opal.mime_type``
- The mime type of the variable values.
* - ``opal.repeatable``
- Whether the variable has repeated values. True when value is "1", false otherwise.
* - ``opal.occurrence_group``
- Name of the occurrence group, when several variables are repeated together.
* - ``opal.index``
- Position in the variables list, for ordering.
For instance, the column *cyl* will be interpreted as a vector of integer values at *data* importation time:
.. code-block:: r
data <- tibble::as_tibble(mtcars)
# apply 'opal.value_type' attribute to 'cyl' column
attributes(data$cyl) <- list(opal.value_type = 'integer')
Another example makes a numerical variable with categories in Opal from a factor column in R:
.. code-block:: r
data <- tibble::as_tibble(mtcars)
# make column a factor, each level will be a category
data$cyl <- as.factor(data$cyl)
# append 'opal.value_type' attribute to 'cyl' column
attributes(data$cyl) <- append(attributes(data$cyl), list(opal.value_type = 'integer'))
.. rubric:: Full Data Dictionary
Another approach is to apply the full data dictionary (same structure as in the :download:`Excel template <../../archive/opalVariableTemplate.xls>`) to the tibble to be saved. Use the `dictionary.apply() `_ for that purpose.
It is not necessary to use Excel to define this data dictionary:
.. code-block:: r
data <- tibble::as_tibble(mtcars)
variables <- tibble::tribble(
~name, ~valueType, ~`label:en`, ~`Namespace::Name`, ~unit, ~repeatable, ~index,
"mpg", "decimal", "Mpg label", "Value1", "years", 0, 1,
"cyl", "integer", "Cyl label", "Value2", "kg/m2", 0, 2,
"disp", "decimal", "Disp label", NA, NA, 1, 3
)
categories <- tibble::tribble(
~variable, ~name, ~missing, ~`label:en`,
"cyl", "4", 0, "Four",
"cyl", "6", 0, "Six",
"cyl", "8", 1, "Height"
)
data <- dictionary.apply(data, variables, categories)
.. rubric:: Taxonomy Term Annotations
To annotate one or more variables with a taxonomy term without having to define a full data dictionary, see the `dictionary.annotate() `_ documentation.
.. code-block:: r
# annotate some variables with a taxonomy term
data <- dictionary.annotate(data,
variables = c("A_SDC_EDU_LEVEL", "A_SDC_EDU_LEVEL_AGE"),
taxonomy = "Mlstr_area",
vocabulary = "Sociodemographic_economic_characteristics",
term = "Education")
Step 3 - Import Data
--------------------
Saving the data into a Opal table is as simple as:
.. code-block:: r
# save 'data' tibble into 'mytable' table, using 'id' column to provide identifiers
opal.table_save(o, data, project = "myproject", table = "mytable", id.name = "id")
See the `opal.table_save() `_ documentation for more details about saving operation options.
You can follow the import task progress in the project's **Tasks** page.
Step 4 - [optional] Update Data Dictionary
------------------------------------------
The data dictionary in Opal is the rich description of the data.
.. rubric:: Table Dictionary
After data have been saved the data dictionary can be amended, except the variable value types. See previous section (*Prepare Data Dictionary*) to control value types at importation time.
Other data dictionary properties and attributes can be set using the same data structure as in the :download:`Excel template <../../archive/opalVariableTemplate.xls>`, expressed in R.
See the `opal.table_dictionary_update() `_ documentation (that can be usefully combined with `opal.table_dictionary_get() `_).
As an example the following data dictionary defined in R is applied to an Opal table:
.. code-block:: r
variables <- tibble::tribble(
~name, ~valueType, ~`label:en`, ~`Namespace::Name`, ~unit, ~repeatable, ~index,
"mpg", "decimal", "Mpg label", "Value1", "years", 0, 1,
"cyl", "integer", "Cyl label", "Value2", "kg/m2", 0, 2,
"disp", "decimal", "Disp label", NA, NA, 1, 3
)
categories <- tibble::tribble(
~variable, ~name, ~missing, ~`label:en`,
"cyl", "4", 0, "Four",
"cyl", "6", 0, "Six",
"cyl", "8", 1, "Height"
)
opal.table_dictionary_update(o, "myproject", "mytable", variables, categories)
.. rubric:: View Dictionary
When data type has not been specified before the import and needs to be changed, an Opal view can transform values on the fly. See the :ref:`cb-views` for making a view based on the imported table using R.