How to Import Data from Files¶
Importing SPSS, SAS, Stata Data¶
Unlike the CSV file format, the SPSS/SAS/Stata file formats contain their own data dictionary. Opal uses the haven R package (and then require a functional R server) to import data in these formats: value types, categories, missing values etc. are extracted from the R data structure read by haven.
Procedure
Note
- Go to the destination project’s tables page,
- Select Import,
- Select the Data Format SPSS (R), or SAS (R) etc., follow instructions (upload data file, set options etc.) and launch the import task,
- When the import task is completed, go to the new table’s page.
⇒ The table is created/updated with the imported data
Importing CSV Data¶
The following applies to any delimited textual file formats, CSV (comma delimiter) being the most common one.
Opal supports two CSV importers:
- CSV, which is the built-in one,
- CSV (R), available as a plugin (opal-datasource-readr, see Plugins administration), based on the readr R package.
The main difference between these two CSV importers is how the data types are handled. CSV (R) is recommended for its capability to detect data types.
CSV (R) Importer¶
This R-based importer will attempt to detect the data types by reading the first 1000 rows. In case a variable has no values in these first rows, the text value type will be used.
Procedure
Note
- Preliminary: having the opal-datasource-readr plugin installed and a functional R server,
- Go to the destination project’s tables page,
- Select Import,
- Select the Data Format CSV (R), follow instructions (upload CSV file, set options etc.) and launch the import task,
- When the CSV (R) import task is completed, go to the new table’s page.
⇒ The table is created/updated with the imported data
CSV Importer¶
In the case of CSV importer, the data types are NOT guessed from the provided CSV file:
- Either the destination table already exists and then the variable value types are the ones declared (if there is an inconsistency with the provided data, import will fail),
- Or all the value types are text.
Procedure
Setting up the destination table variables prior to the CSV import can be counter intuitive and error prone (missing variables, wrong data types): this is NOT a recommended procedure. The recommended approach is to import first the CSV data as-is and then make a view (i.e. a logical table) to transform variable value types to the correct ones.
Note
- Go to the destination project’s tables page,
- Select Import,
- Select the Data Format CSV, follow instructions (upload CSV file, set options etc.) and launch the import task,
- When the CSV import task is completed, go to the new table’s page (note that all variables have text type),
- Make a view based on the imported table: How to Transform Tables with Views.
⇒ The table is created/updated with the imported data, and is accessed through a view