Accessing WoSIS points (tutorial in R)

WoSIS (World Soil Information Service) is ISRIC's data service, primarily for soil profile and soil samples data. WoSIS is a PostgreSQL DB with multiple data services running on top. The system currently serves about 100,000 soil profiles (with coordinates) covering some 60+ countries. For more info about WoSIS please refer to the official documentation.

To access soil profile (point) data you can either use QGIS or similar GIS software that supports Web Feature Service. This document explains primarily how to access WoSIS points directly from R, but this assumes that you have already installed:

  • R / RStudio;
  • R packages: plotKML, rgdal, aqp
  • QGIS;
  • GDAL (for Windows machines use e.g. “gdal-???-1800-x64-core.msi” installer;

The complete R tutorial you can download from github. Instructions on how to install and set-up all software used in this example you can find here.

To open and view WoSIS points in QGIS, you need to first connect to the WFS by selecting “Layer → Add Layer → Add WFS layer” (http://wfs.isric.org/geoserver/wosis/wfs).

WoSIS WFS in QGIS.

Next, select spatial layer of interest listed under “Title”. Note, it might take even few minutes until you fetch all points for the global coverage (important: the maximum number of points that it can fetch is set to 100,000). So possibly a more efficient approach, if you require only a selection of points, is to first zoom to the area of interest and then when adding the layer tick on “Only request features overlapping the current view extent”. Another option, when adding the layer, is to double click on “Filter” column, expand “Fields and Values” and, for example, build the expression e.g.:

country_id = 'BR'
Query WoSIS points using expression builder.

After you fetch point data, you can view it in QGIS, best overlaid over the country borders (you can download those from e.g. NaturalEarth website or install the 'OpenLayers plugin' under 'Plugins' menu). The global coverage map will look something like this:

Global distribution of WoSIS points shown in QGIS.

After every major update of WoSIS a snapshot of the harmonized values is generated and shared publicly (read more). These are interesting for the users that aim at running global analysis (downloading all points via WFS could take a lot of time). The following R data object (“wosis_profiles.rda”; 15MB in size) has been prepared on May 15th 2017 by querying directly tabular data in the PostgreSQL database. You can access this data by using e.g.:

> download.file("http://gsif.isric.org/lib/exe/fetch.php?media=wosis_profiles.rda", "wosis_profiles.rda")
> load("wosis_profiles.rda")
> str(wosis_profiles)
List of 2
 $ sites   :'data.frame':    109321 obs. of  6 variables:
  ..$ profile_id   : int [1:109321] 144913 144914 144915 144916 144917 144918 144919 144920 144921 145374 ...
  ..$ country_id   : chr [1:109321] "AE" "AE" "AE" "AE" ...
  ..$ datasets     : chr [1:109321] "US-NCSS" "US-NCSS" "US-NCSS" "US-NCSS" ...
  ..$ latitude     : num [1:109321] 24 24.1 24.3 24.3 24.2 ...
  ..$ longitude    : num [1:109321] 53.3 52.5 54.2 54.2 54.2 ...
  ..$ geom_accuracy: num [1:109321] 1e-05 1e-05 1e-05 1e-05 1e-05 1e-05 1e-05 1e-05 1e-05 1e-05 ...
 $ horizons:'data.frame':    679217 obs. of  70 variables:
  ..$ profile_layer_id      : int [1:679217] 152511 152543 152574 152609 152639 152671 152704 152737 152767 152801 ...
  ..$ dataset_id            : chr [1:679217] "BE-UplandsI" "BE-UplandsI" "BE-UplandsI" "BE-UplandsI" ...
  ..$ profile_id            : int [1:679217] 36897 36898 36899 36900 36901 36902 36903 36904 36905 36906 ...
  ..$ upper_depth           : int [1:679217] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ lower_depth           : int [1:679217] 30 24 30 27 36 36 36 30 27 36 ...
  ..$ sample_code           : chr [1:679217] NA\ NA\ NA\ NA\ ...
  ..$ country_id            : chr [1:679217] "BE" "BE" "BE" "BE" ...
  ..$ title                 : chr [1:679217] NA\ NA\ NA\ NA\ ...
  ..$ reference_page        : chr [1:679217] NA\ NA\ NA\ NA\ ...
  ..$ reference_profile_code: chr [1:679217] NA\ NA\ NA\ NA\ ...
  ..$ isbn                  : chr [1:679217] NA\ NA\ NA\ NA\ ...
  ..$ publisher             : chr [1:679217] NA\ NA\ NA\ NA\ ...
  ..$ publication_year      : int [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ issue                 : chr [1:679217] NA\ NA\ NA\ NA\ ...
  ..$ url                   : chr [1:679217] NA\ NA\ NA\ NA\ ...
  ..$ bd_fe                 : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ bd_ws                 : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ cce_f01               : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ cce_t                 : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ carbon_o              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ carbon_t              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ nitrogen_t            : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ gypsum                : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ electrical_c          : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ hydraulic_c           : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ wr_g                  : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ wr_v                  : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ ph_cacl2              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ ph_h2o                : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ ph_kcl                : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ ph_naf                : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ ph_nh4cl              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ ph_field              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ ph_unk                : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ cfg_f01               : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ cfg_f02               : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ cfg_f03               : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ cfg_t                 : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ cfv_f01               : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ cfv_f02               : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ cfv_f03               : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ cfv_t                 : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ sand_f01              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ sand_f02              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ sand_f03              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ sand_f04              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ sand_f05              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ sand_f06              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ sand_f07              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ sand_f08              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ sand_f09              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ sand_f10              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ sand_t                : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ silt_f01              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ silt_f02              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ silt_f03              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ silt_t                : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ clay_f01              : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ clay_t                : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ exch_al               : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ exch_h                : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ exch_ca               : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ exch_mg               : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ exch_k                : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ exch_na               : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ exch_acidity          : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ exch_bases            : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ cec                   : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ ecec                  : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...
  ..$ ecec_isric            : num [1:679217] NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ NA\ ...

Explanation of column codes can be found here. Note that each soil horizon might have different data license and terms of use, so to be on the safe side, best use WoSIS Web Feature Service to access the most up-to-date information.

A more flexible way to access WoSIS points is to use the GDAL functionality (see WFS driver documentation for GDAL). For this we can use OGR functions ogrinfo and ogr2ogr (basically a translation function). Before you can start, make sure you first install GDAL binaries (do not come with rgdal). Under Windows OS, this requires first locating GDAL executables. If GDAL is installed and path known, we can request information about the data on the server by using the ogrinfo:

> gdal.dir <- shortPathName("C:/Program Files/GDAL")
> ogr2ogr <- paste0(gdal.dir, "/ogr2ogr.exe")
> ogrinfo <- paste0(gdal.dir, "/ogrinfo.exe")
> system(paste(ogrinfo, '-ro WFS:\"http://wfs.isric.org/geoserver/wosis/wfs\"'))
INFO: Open of `WFS:http://wfs.isric.org/geoserver/wosis/wfs'
      using driver `WFS' successful.
1: wosis:profile (Point)
2: wosis:bulk_density_fine_earth (Point)
3: wosis:bulk_density_whole_soil (Point)
4: wosis:calcium_carbonate_equivalent_total (Point)
5: wosis:organic_carbon (Point)
6: wosis:total_carbon (Point)
7: wosis:coarse_fragments_gravimetric_total (Point)
8: wosis:coarse_fragments_volumetric_total (Point)
9: wosis:clay_total (Point)
10: wosis:sand_total (Point)
11: wosis:silt_total (Point)
12: wosis:water_retention_gravimetric (Point)
13: wosis:water_retention_volumetric (Point)
14: wosis:ph_cacl2 (Point)
15: wosis:ph_h2o (Point)
16: wosis:ph_kcl (Point)
17: wosis:ph_naf (Point)

This gives a list of layers currently available via WoSIS. For more info about the code names please refer to the official documentation.

To import points from WoSIS into R and then use them for analysis or visualize them in Google Earth we can fetch only a subset of points i.e. clay content for a bounding box of 10 by 5 degrees (France) directly via the ogr2ogr:

> system(paste(ogr2ogr, '-f \"ESRI Shapefile\" clay_total_sub.shp WFS:\"http://wfs.isric.org/geoserver/wosis/wfs" clay_total -clipsrc 0 45 10 50'))
Warning 6: Normalized/laundered field name: 'profile_layer_id' to 'profile_la'
Warning 6: Normalized/laundered field name: 'descriptor_id' to 'descriptor'
Warning 6: Normalized/laundered field name: 'profile_code' to 'profile_co'
Warning 6: Normalized/laundered field name: 'observation_date' to 'observatio'
Warning 6: Normalized/laundered field name: 'upper_depth' to 'upper_dept'
Warning 6: Normalized/laundered field name: 'lower_depth' to 'lower_dept'

Note that 0 45 10 50 are the xmin ymin xmax ymax coordinates.

Other conversion possibilities using the WFS driver are explained here. The output of the operation above will fetch only few hundred points, which we can now import to R via rgdal package:

> library(rgdal)
> clay_total_sub <- readOGR("clay_total_sub.shp", "clay_total_sub")
OGR data source with driver: ESRI Shapefile
Source: "clay_total_sub.shp", layer: "clay_total_sub"
with 406 features
It has 15 fields

Note that these are in fact 3D points as they refer to different sampling depths (see: “upper_dept”, and “lower_depth” columns). We can visualize the points by using the plotKML package:

> library(plotKML)
> shape = "http://maps.google.com/mapfiles/kml/pal2/icon18.png"
> kml(clay_total_sub, colour=value, shape=shape, points_names=clay_total_sub$value, balloon=TRUE)
KML file opened for writing...
Writing to KML...
Closing  clay_total_sub.kml
> kml_View("clay_total_sub.kml")
WoSIS points with tabular data shown in Google Earth.

Probably more accurate thing to do with this data is to visualize it as 3D points, which is possible by adding the 3rd dimension into the kml function:

> clay_total_sub$depth <- clay_total_sub$upper_dept + (clay_total_sub$lower_dept - clay_total_sub$upper_dept)/2
> kml(clay_total_sub, file="clay_total_sub3D.kml", colour=value, shape=shape, points_names=clay_total_sub$value, balloon=TRUE, altitude=300-depth)
KML file opened for writing...
Writing to KML...
Closing  clay_total_sub3D.kml
kml_View("clay_total_sub3D.kml")
Soil profile with horizon values displayed in Google Earth.

To further explore possibilities of processing and visualizing soil profile data, consider using the aqp package, which will allow you to produce soil depth plots such as the one shown in this gallery.