Datasources are the entry point in Opal for accessing to Variables and Data. Datasources can be of different kinds, some being more suitable for different purposes (variables import, data import and export, permanent storage).
|Datasource type||Variables Import||Variables and Data Import||Variables and Data Export||Storage|
|Opal Archive Datasource||x||x|
|SPSS R Datasource||x||x|
|SAS R Datasource||x||x|
|Stata R Datasource||x||x|
|Opal SQL Datasource||x|
|Tabular SQL Datasource||x||x||x|
File Based Datasources¶
File based datasources are convenient for import and export operations.
CSV datasource will expect the file to use a “delimiter separated values” format (default delimiter being comma). The first column will represent the entity identifiers and the subsequent column names will identify variables. Each row of the file (except the first row) are the values for one entity. The entity identifier must be unique: there cannot be two rows starting with the same identifier.
Due to the nature of the CSV format, the data dictionary is limited to the variable names (i.e. the name of the columns). A CSV file can be imported as-is but the variables will be considered as being of text type only. When importing CSV data, if the destination table already exists, Opal will consider that the data dictionary of the CSV file is the one of the destination table. Then before importing CSV data it is recommended to prepare the destination table variables first.
The following data dictionary is used in this example:
- Var1: text value type
- Var2: integer value type
- Var3: text value type, repeatable (i.e. each value is a sequence of value)
The data to be represented in CSV are for instance:
|123||This is a value||1||“Value 1”,”Value 2”|
This is a
The CSV file uses the options:
- the separator character: ,
- the quote character: “
ID,Var1,Var2,Var3 123,"This is a value",1,"""Value 1"",""Value 2""" 234,,2,"x,y" 345,"This is a multi-line value",,a
For more information about CSV format:
Opal Archive Datasource¶
Opal Archive datasource is a fully featured file-based datasource. This datasource comes as a .zip file (that can be optionally encrypted) containing a folder for each table having: the full data dictionary in a XML file, a XML data file per entity. This is the file format used when exporting data from Onyx.
An SPSS datasource is a read-only datasource. The SPSS source file must be a valid non-compressed binary file with a .sav extension. In Opal an SPSS file represents a table and its variables are used as the table’s data dictionary. An Opal compatible SPSS file must have its first variable represent the identifiers. If this is not the case, before a file import, the identifier variable must be moved to the first position of the SPSS variable sheet.
The following SPSS variable attributes are imported to the data dictionary:
- format (F9.2, ADate10, etc)
In addition, variable categories and missing values are also imported and converted to their Opal counterparts
Currently, Opal does not handle missing values with large intervals (-9999..9999). Until a more robust solution is implemented, try to keep the intervals small or discrete.
Opal supports both Excel 97 and Excel 2007 formats. Excel format limitations are:
|.xls||Excel 97||256 columns and 64K lines|
|.xlsx||Excel 2007||16K columns and 1M lines|
R Based Datasources¶
R based datasources are datasources that are using R server to extract/write data in a given format. The supported formats are the ones defined in the haven R package (package which is expected to be installed on the R server). Note that this is still an experimental feature: value type mappings with R could change in a future release and some limitations of the haven package may apply.
SPSS R Datasource¶
The expected/produced file extension is .sav.
SAS R Datasource¶
The expected/produced file extension is .sas7bdat. If when importing, a file exists with same base name in the same parent folder and with extension .sas7bcat, it will be automatically used as the catalog file.
Stata R Datasource¶
The expected/produced file extension is .dta.
SQL Based Datasources¶
SQL based datasources are convenient for variables and data storage. With some limitations, this type of datasource can be used for import and export.
Opal SQL Datasource¶
Opal SQL is the most versatile datasource type with MongoDB datasource. The underlying SQL database schema is a EAV which allows to store an unlimited number of variables.
For more information about this datasource see Opal SQL Schema documentation.
Tabular SQL Datasource¶
Tabular SQL datasources are suitable for datasets with a (relatively) small number of variables. Data copied into Tabular SQL datasource are stored in classical SQL tables, i.e. one row per entity and one variable per column. Check SQL database vendor specifications to know the number of columns (i.e. variables) that can be defined for a table: see for instance MySQL Table Column-Count and Row-Size Limits. Comprehensive meta-data for each column field can be optionally stored in separated tables. Opal is able to increment copies into Tabular SQL datasources if update timestamp column is given.
For more information about this datasource see Tabular SQL Schema documentation.
Document Oriented Datasources¶
NoSQL document oriented datasources are convenient alternative to SQL based datasources. It allows to store an unlimited number of variables.
MongoDB is the most versatile datasource type with Opal SQL datasource.
Other Server Based Datasources¶
Server based datasources are convenient for import operations, from a data collection application usually.
Limesurvey datasource is able to extract, from a Limesurvey SQL database, one table per survey with its fully described data dictionary. The data that will be imported are the interviews that are completed.
Opal datasource allows one Opal server to connect to a remote Opal server. This can be useful when syncing datasources in different Opal instances.