Geographic Information System (GIS) software has existed since the 1980s, and more than 80 raster and vector file formats have been created since then. Because many of these file formats are proprietary, specific software is required to work with them, which limits their reuse. In an effort to promote the interoperability of geospatial data, a few open standards have emerged. This document focuses on these open file formats and provides a brief explanation of each. Links to resources on how to create and open each file format have been included, along with an example file for each format.
The file formats are grouped by the type of data they can contain: raster, vector, or both. For a brief explanation of raster and vector data types, please visit https://libguides.colostate.edu/gis/geospatial_data_basics
GeoTIFF
Hierarchical Data Format (HDF)
Network Common Data Form (NetCDF)
ESRI Shapefile
Comma-Separated Values (CSV) and Tab-Separated Values (TSV)
Geography Markup Language (GML)
Keyhole Markup Language (KML)
GeoJSON
GeoPackage
A GeoTIFF combines a Tagged Image File Format (TIFF) file with geospatial (i.e., location) information. These files have the extension .tif or .tiff and, at first glance, look like standard image files. Whether a TIFF file includes geospatial information can be determined using a GIS or programmatically using GDAL.
2016 riparian vegetation within the Colorado River Basin https://mountainscholar.org/handle/10217/186040
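As a rough illustration of how the presence of geospatial information can be detected, the sketch below checks a TIFF's first image file directory for the GeoKeyDirectoryTag (tag 34735), which GeoTIFF files use to store georeferencing keys. It uses only the Python standard library rather than GDAL, handles classic TIFF only (not BigTIFF), and the function names and the synthetic test bytes are assumptions made for this example.

```python
import struct

GEO_KEY_DIRECTORY_TAG = 34735  # GeoKeyDirectoryTag from the GeoTIFF specification


def first_ifd_tags(data: bytes) -> set:
    """Return the tag numbers in the first image file directory of a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])  # little- or big-endian
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack_from(order + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is outside this sketch)")
    (entry_count,) = struct.unpack_from(order + "H", data, ifd_offset)
    # Each directory entry is 12 bytes and begins with a 2-byte tag number.
    return {
        struct.unpack_from(order + "H", data, ifd_offset + 2 + 12 * i)[0]
        for i in range(entry_count)
    }


def has_geotiff_keys(data: bytes) -> bool:
    return GEO_KEY_DIRECTORY_TAG in first_ifd_tags(data)


def synthetic_tiff(tags) -> bytes:
    """Build header-only TIFF bytes for demonstration (not a viewable image)."""
    entries = b"".join(struct.pack("<HHII", tag, 3, 1, 0) for tag in tags)
    header = b"II" + struct.pack("<HI", 42, 8)
    return header + struct.pack("<H", len(tags)) + entries + struct.pack("<I", 0)


plain = synthetic_tiff([256, 257])
georeferenced = synthetic_tiff([256, 257, GEO_KEY_DIRECTORY_TAG])
print(has_geotiff_keys(plain), has_geotiff_keys(georeferenced))
```

For a real file, the bytes could be read with open(path, "rb"); a GIS or GDAL remains the better choice for actually interpreting the georeferencing.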
HDF files are very versatile, allowing researchers to store images, tables, graphs, and even documents within one file. Identified by the extensions .hdf, .he5, and .h5, these files are comparable to a folder: a root level contains child folders and files. These self-describing files can store multi-dimensional data and are useful for storing general-purpose scientific data. HDF5 is the latest version and is much more structured than its predecessor, HDF4.
OMI Level 2 O2-O2 Cloud Data https://gamma.hdfgroup.org/ftp/pub/outgoing/opendap/data/HDF5/zoo/OMI-Aura_L2-OMCLDO2_2009m1001t0941-o27729_v003-2009m1001t155925.h5
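Because a file extension alone does not say which HDF version a file uses, a signature check can help. The sketch below, which assumes only the published HDF5 and HDF4 byte signatures, looks for the HDF5 signature at offset 0 and at the doubling offsets (512, 1024, ...) where it may also appear. The function name is hypothetical, and reading the contents of a file would require a dedicated HDF library.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
HDF4_SIGNATURE = b"\x0e\x03\x13\x01"


def hdf_version(data: bytes) -> str:
    """Guess the HDF version from the leading bytes of a file."""
    if data.startswith(HDF4_SIGNATURE):
        return "HDF4"
    offset = 0
    # The HDF5 signature may sit at byte 0 or after a user block of 512, 1024, ... bytes.
    while offset + len(HDF5_SIGNATURE) <= len(data):
        if data[offset:offset + len(HDF5_SIGNATURE)] == HDF5_SIGNATURE:
            return "HDF5"
        offset = 512 if offset == 0 else offset * 2
    return "unknown"


print(hdf_version(HDF5_SIGNATURE + b"\x00" * 8))
print(hdf_version(b"plain text"))
```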
Developed in the late 1980s for sharing atmospheric science data, the NetCDF file format has gained popularity across many scientific disciplines. NetCDF files have the extensions .netcdf, .nc, and .nc4. They can be used to store large numeric multi-dimensional datasets with a time-series and spatial component, making them well suited to computational model outputs. Like the Hierarchical Data Format (HDF), NetCDF files are self-describing, which ensures that documentation accompanies the data when shared. A few organizations using NetCDF include NOAA, NASA, and USGS.
Understanding the response of tropical ascent to warming using an energy balance framework https://mountainscholar.org/handle/10217/199724
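To show what "self-describing" means in practice, the following sketch reads dimension names and lengths straight from the header of a classic-format NetCDF file using only the standard library. It assumes the classic (CDF-1 or CDF-2) header layout; NetCDF-4 files are HDF5-based and are not handled. The function names and the synthetic header are hypothetical, and a NetCDF library is the appropriate tool for real work.

```python
import struct

NC_DIMENSION = 0x0A  # tag that opens the dimension list in a classic header


def classic_netcdf_dimensions(data: bytes) -> dict:
    """Return {dimension name: length} from a classic-format NetCDF header."""
    if data[:3] != b"CDF" or data[3] not in (1, 2):
        raise ValueError("not a classic NetCDF file")
    # Bytes 4-7 hold the record count; the dimension list starts at byte 8.
    tag, count = struct.unpack_from(">II", data, 8)
    if tag != NC_DIMENSION:
        return {}  # no dimensions declared
    offset, dims = 16, {}
    for _ in range(count):
        (name_len,) = struct.unpack_from(">I", data, offset)
        name = data[offset + 4:offset + 4 + name_len].decode("utf-8")
        offset += 4 + (name_len + 3) // 4 * 4  # names are padded to 4-byte boundaries
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        dims[name] = length  # a length of 0 marks the unlimited (record) dimension
    return dims


def _padded_name(text: str) -> bytes:
    raw = text.encode("utf-8")
    return struct.pack(">I", len(raw)) + raw + b"\x00" * (-len(raw) % 4)


# Synthetic header for demonstration: two dimensions, no attributes or variables.
header = (
    b"CDF\x01" + struct.pack(">I", 0) + struct.pack(">II", NC_DIMENSION, 2)
    + _padded_name("time") + struct.pack(">I", 0)
    + _padded_name("lat") + struct.pack(">I", 73)
)
print(classic_netcdf_dimensions(header))
```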
Developed in the early 1990s, the shapefile format is used to store vector data as points, lines, or polygons. The main shapefile with extension .shp must be accompanied by at least two other files with extensions .shx and .dbf. Often more than three files make up a single shapefile, all with the same name but different extensions (denoting the different types of information stored in each). This file format requires field names shorter than 11 characters and file sizes less than 2 GB. When sharing these data, it is important to package all accompanying files in a zip archive so they remain together.
Snow persistence grids and snow zone shapefiles for the western United States https://mountainscholar.org/handle/10217/171907
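The packaging advice above can be sketched in a few lines: collect every file that shares the .shp file's base name, confirm the three required components are present, and write them to one zip archive. The function name and the placeholder file names are assumptions for illustration; the demonstration uses empty files rather than a real shapefile.

```python
import tempfile
import zipfile
from pathlib import Path

REQUIRED_SUFFIXES = {".shp", ".shx", ".dbf"}


def zip_shapefile(shp_path, zip_path):
    """Zip every file sharing the .shp file's base name; return the archived names."""
    shp = Path(shp_path)
    parts = sorted(
        p for p in shp.parent.iterdir()
        if p.is_file() and p.name.startswith(shp.stem + ".") and p.suffix.lower() != ".zip"
    )
    missing = REQUIRED_SUFFIXES - {p.suffix.lower() for p in parts}
    if missing:
        raise FileNotFoundError(f"missing required components: {sorted(missing)}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for part in parts:
            archive.write(part, arcname=part.name)
    return [part.name for part in parts]


# Demonstration with empty placeholder files standing in for a real shapefile.
with tempfile.TemporaryDirectory() as folder:
    for suffix in (".shp", ".shx", ".dbf", ".prj"):
        (Path(folder) / f"snow_zones{suffix}").touch()
    archived = zip_shapefile(Path(folder) / "snow_zones.shp", Path(folder) / "snow_zones.zip")
    print(archived)
```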
CSV and TSV files are plain text, with columns of data separated by commas (or tabs) and each line representing a new record. A field or fields containing census tracts, zip codes, or latitude and longitude values allows each record to be spatially aware. These files can be opened in any text editor, in spreadsheet software, or programmatically, and are identified by the extension .csv or .tsv. It is best to include column names in the top line of these files, though they can be omitted (known as headerless files). Beyond a column header, documentation for these files generally remains separate. Efforts like the ICARTT standard (file extension .ict) reserve a portion at the top of these files to describe each column of data more thoroughly, offering a self-describing option.
2016 NETCARE Amundsen campaign ship's position data SHIPSPOSN_Amundsen_20160714_R0_NETCARE.ict
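As one example of the programmatic route, the sketch below reads delimited text with Python's csv module and converts latitude and longitude columns to numbers. The column names (site, latitude, longitude) and the sample values are made up for illustration; real files will use their own headers.

```python
import csv
import io

SAMPLE_CSV = """site,latitude,longitude
A,40.57,-105.08
B,39.74,-104.99
"""


def read_points(text, delimiter=",", lat_field="latitude", lon_field="longitude"):
    """Parse delimited text into records with numeric 'lat' and 'lon' values."""
    records = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        lat, lon = float(row[lat_field]), float(row[lon_field])
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"coordinates out of range: {row}")
        records.append({**row, "lat": lat, "lon": lon})
    return records


points = read_points(SAMPLE_CSV)
# The same reader handles TSV when the delimiter is a tab.
tsv_points = read_points(SAMPLE_CSV.replace(",", "\t"), delimiter="\t")
print(points[0]["site"], points[0]["lat"], points[0]["lon"])
```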
GML is an Extensible Markup Language (XML) structure for storing geographic features. Initially released in 2000, the GML format provided a way to standardize geospatial data being shared online. GML files can have the extension .gml or .xml and, as with most XML structures, a schema file that describes the elements used is kept separate from the data itself. The GML schema offers metadata for coordinate reference systems, units of measure, and feature descriptors (for primitive objects like points, lines, polygons, and even curves). The schema is also extensible, allowing communities to define new feature descriptions for real-world objects such as roads and bridges. In 2007 GML became an ISO standard, with the official number ISO 19136:2007. GML is the default encoding for all INSPIRE (Infrastructure for Spatial Information in Europe, https://inspire.ec.europa.eu/about-inspire/563) spatial data themes. Though it is not generally known for raster support, GML can also be used to georeference JPEG 2000 images (https://www.ogc.org/standards/gmljp2).
INSPIRE dataset: Airspace Area https://inspire-geoportal.ec.europa.eu/download_details.html?view=downloadDetails&resourceId=%2FINSPIRE-f670705f-f4e9-11e6-81e4-52540023a883_20220601-135202%2Fservices%2F1%2FPullResults%2F201-220%2Fdatasets%2F9&expandedSection=metadata
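Since GML is XML, a standard XML parser can read simple features. The sketch below extracts point positions and their coordinate reference system from a hand-written GML 3.2 fragment. The sample point is made up, the function name is hypothetical, and only gml:Point elements with a gml:pos child are handled, which is a small subset of what the GML schema allows.

```python
import xml.etree.ElementTree as ET

GML_URI = "http://www.opengis.net/gml/3.2"
GML_NS = {"gml": GML_URI}

SAMPLE_GML = """<gml:Point xmlns:gml="http://www.opengis.net/gml/3.2"
    gml:id="p1" srsName="urn:ogc:def:crs:EPSG::4326">
  <gml:pos>40.57 -105.08</gml:pos>
</gml:Point>"""


def read_gml_points(xml_text):
    """Return (srsName, coordinates) for each gml:Point that has a gml:pos child."""
    root = ET.fromstring(xml_text)
    points = []
    for point in root.iter(f"{{{GML_URI}}}Point"):
        pos = point.find("gml:pos", GML_NS)
        if pos is None or not pos.text:
            continue  # other encodings such as gml:coordinates are not handled here
        coords = tuple(float(value) for value in pos.text.split())
        points.append((point.get("srsName"), coords))
    return points


print(read_gml_points(SAMPLE_GML))
```

The axis order of the coordinates depends on the coordinate reference system named in srsName, so the values should be interpreted alongside it.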
KML is a derivative of GML used for styling geographic information, and it includes a specific set of features for use in Google Earth and Google Maps. KML was developed by Keyhole Inc. for its Keyhole Earth Viewer; Google acquired the company in 2004 and released Google Earth a year later. Features from the platform were also incorporated into Google Maps. These files are identified by the .kml extension, and a compressed version of KML has a .kmz extension. KML became an OGC standard in 2008. Under this standard, a file can define a camera view that controls what the user sees and can add image overlays, captions, and icons, which differentiates KML from other geospatial data standards.
Real-Time Earthquakes https://services.google.com/earth/kmz/realtime_earthquakes_n.kmz
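The relationship between .kml and .kmz can be sketched with the standard library: build a small KML document containing one placemark, then compress it into a zip archive to produce KMZ bytes. The function names and the placemark are hypothetical, and naming the main file doc.kml follows common convention rather than anything stated above. KML coordinates are written in longitude,latitude order.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def placemark_kml(name, lon, lat):
    """Return a KML document (as text) holding a single point placemark."""
    ET.register_namespace("", KML_NS)  # serialize without a namespace prefix
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


def to_kmz(kml_text):
    """Compress KML text into KMZ bytes (a zip archive containing doc.kml)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("doc.kml", kml_text)
    return buffer.getvalue()


kml_text = placemark_kml("Sample site", -105.08, 40.57)
kmz_bytes = to_kmz(kml_text)
print(kml_text)
```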
Leveraging JavaScript Object Notation (JSON), the GeoJSON file format defines several geospatial data types for easily sharing data online. Compared to XML, JSON files are typically smaller because they rely on key-value pairs instead of a tag-based structure. Recognized by the extensions .json and .geojson, GeoJSON files can include vector data with one or more point, line, polygon, multi-point, and multi-line features. All GeoJSON files use the same geographic coordinate reference system (WGS84), which simplifies the visualization of these data. Most web mapping libraries can display GeoJSON files, making the format a great choice for sharing and visualizing data online.
Native Territories https://native-land.ca/wp-json/nativeland/v1/api/index.php?maps=territories
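The key-value structure described above is easy to see in a small example. The sketch below builds a GeoJSON FeatureCollection with Python's json module; the helper name, site names, and coordinates are made up for illustration. GeoJSON positions are written in longitude, latitude order.

```python
import json


def point_feature(lon, lat, **properties):
    """Return a GeoJSON Feature for a single point."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


collection = {
    "type": "FeatureCollection",
    "features": [
        point_feature(-105.08, 40.57, name="Site A"),
        point_feature(-104.99, 39.74, name="Site B"),
    ],
}

geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

The resulting text can be saved with a .geojson extension and loaded by tools that read GeoJSON.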
Built upon the open-source database SQLite, GeoPackages provide a platform-independent, self-describing, and compact format that supports both raster tile matrix sets and vector data. These geospatial files have the extension .gpkg and can store many geospatial layers together in one file. Beyond sharing data with others, GeoPackages are useful for field data collection when internet access is intermittent or non-existent, as data can be stored locally and later synced to a remote system.
Global Urban Street Networks GeoPackages https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/E5TPDQ
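Because a GeoPackage is an SQLite database, its layer inventory can be read with Python's built-in sqlite3 module by querying the gpkg_contents table. In the sketch below, an in-memory database with a simplified gpkg_contents table stands in for a real .gpkg file; the function name and layer names are hypothetical, and a real GeoPackage has additional columns and tables.

```python
import sqlite3


def list_layers(connection):
    """Return (table_name, data_type) pairs from the gpkg_contents table."""
    query = "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
    return [tuple(row) for row in connection.execute(query)]


# Demonstration: a simplified stand-in for the contents table of a real .gpkg file.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE gpkg_contents (table_name TEXT PRIMARY KEY, data_type TEXT NOT NULL)"
)
demo.executemany(
    "INSERT INTO gpkg_contents VALUES (?, ?)",
    [("streets", "features"), ("basemap", "tiles")],
)
layers = list_layers(demo)
print(layers)
```

For a file on disk, the connection would instead come from sqlite3.connect with the path to the .gpkg file; vector layers appear with the data type "features" and raster tile layers with "tiles".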