The following document is intended for individuals interested in learning the basics of working with geospatial data.
Data can be stored in many ways, though it is most often represented in a tabular format. Spreadsheet software like Microsoft Excel and Google Sheets facilitates working with this kind of data. Data can also include columns of spatial information, such as latitude and longitude, enabling it to be located on a map. Because all geospatial data shares location information, different datasets can be plotted together to draw new insights. Geospatial data is applied in many fields, including public health, agriculture, wildlife habitat management, and natural hazard forecasting, monitoring, and response, to name a few.
Comma-separated values (CSV) files are among the most common and widely distributed data files. This non-proprietary file format can be created and opened in most text editors, and uses commas between the columns of information and a line break for each new row of data. The format can be limiting, which has given rise to many other data formats. While spatial data can be stored in CSV files, separating the component parts across multiple files has become a common convention in commonly used geospatial data management software. This has given rise to a range of file types, each well suited to its specific domain.
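To make the CSV layout concrete, here is a minimal sketch that reads point locations from CSV text using only the Python standard library. The column names (site, latitude, longitude) and the sample rows are assumptions made up for illustration; real files will use their own headers.

```python
import csv
import io

# Hypothetical sample data; the header names are assumptions for illustration.
SAMPLE_CSV = """site,latitude,longitude
Site A,40.5734,-105.0865
Site B,39.7392,-104.9903
"""


def read_points(text):
    """Parse CSV text into a list of (name, latitude, longitude) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    points = []
    for row in reader:
        # CSV stores everything as text, so coordinates must be converted.
        points.append((row["site"], float(row["latitude"]), float(row["longitude"])))
    return points


points = read_points(SAMPLE_CSV)
for name, lat, lon in points:
    print(f"{name}: lat={lat}, lon={lon}")
```

Note that a CSV file carries no information about which coordinate system the numbers belong to, which is one of the limitations mentioned above.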
The method of converting the surface of the earth onto a flat plane (known as projection), combined with the relation to real-world coordinates (known as scale) and the reference point (or datum), make up the coordinate system. To map geospatial data properly, it is important to know the coordinate system in use. If in doubt, check the documentation or verify with the data provider. For a more in-depth guide to Coordinate Reference Systems see docs.qgis.org/3.16/en/docs/gentle_gis_introduction/coordinate_reference_systems.html.
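As a rough illustration of why the coordinate system matters, the sketch below applies the forward spherical Mercator formulas (the projection behind many web maps) to a latitude and longitude. The function name is hypothetical, and in practice a dedicated library such as pyproj would normally handle projections; this is only meant to show that the same location yields very different numbers in different coordinate systems.

```python
import math

# Sphere radius in meters used by Web Mercator (the WGS 84 semi-major axis).
EARTH_RADIUS_M = 6378137.0


def to_web_mercator(lat_deg, lon_deg):
    """Project latitude/longitude in degrees to spherical Mercator x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Illustrative input only: a point at 40 degrees north, 105 degrees west.
x, y = to_web_mercator(40.0, -105.0)
print(f"x={x:.1f} m, y={y:.1f} m")
```

Plotting the degree values and the meter values on the same map without accounting for the difference would place the point in two very different locations.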
Geospatial data can be either raster (think pictures, where data is stored in a grid of cells) or vector (points, lines, and polygons defined by x and y coordinates).
Figure: Raster Data and Vector Data. Image source: GIS data acquisition.
Vector data can be generated from raster data and vice versa, though there is often data loss in the conversion.
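The data loss in a vector-to-raster conversion can be seen in a small sketch. The hypothetical rasterize function below counts how many points fall in each grid cell; the point coordinates, grid origin, and cell size are all made-up values for illustration.

```python
def rasterize(points, x_min, y_min, cell_size, n_cols, n_rows):
    """Count points per grid cell. Row 0 is the bottom row (an arbitrary choice)."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        # Points outside the grid extent are silently dropped.
        if 0 <= col < n_cols and 0 <= row < n_rows:
            grid[row][col] += 1
    return grid


# Illustrative vector data: four points with exact x/y coordinates.
points = [(0.2, 0.3), (0.8, 0.9), (1.5, 0.5), (2.7, 1.1)]
grid = rasterize(points, x_min=0.0, y_min=0.0, cell_size=1.0, n_cols=3, n_rows=2)
for row in grid:
    print(row)
```

The first two points land in the same cell, so the raster records only that the cell holds two points. Their exact positions cannot be recovered from the grid, which is the kind of loss described above.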
Looking through the data is the first step towards understanding its contents and omissions. Depending on the data type, tabular data can be viewed in a text editor or spreadsheet application, while specialized software is often needed to navigate and visualize geospatial data. The component parts of geospatial data include values, projection, geometry, and data index, and specialized software stores these parts in files with the extensions .dbf, .prj, .shp, and .shx, respectively. There are many other file types to learn about; see shapefile extensions for a more complete list. Software known as Geographic Information Systems (GIS) is commonly used to work with geospatial data and includes desktop and web-based products offering varying levels of functionality. Desktop GIS software includes licensed and open-source options, with ArcGIS and QGIS being among the most popular. Web-based GIS offerings are more limited in functionality than their desktop counterparts, though this distinction continues to erode as web-based technologies improve. Google Maps and OpenStreetMap are two popular options used as the foundation for web-based GIS. Licensed and open-source programming language modules for working with geospatial data offer an alternative to GIS. MathWorks® Mapping Toolbox, Python Cartopy, and R Simple Features are a few popular options.
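Because these component files travel together, it can be useful to confirm that none are missing before opening a dataset. The following is a minimal sketch using the Python standard library; the function name and the roads file name are hypothetical, and the demo creates empty placeholder files in a temporary folder rather than real data.

```python
import tempfile
from pathlib import Path

# Extensions and the component each one holds, as described above.
COMPONENTS = {
    ".shp": "geometry",
    ".shx": "data index",
    ".dbf": "values",
    ".prj": "projection",
}


def check_shapefile_parts(shp_path):
    """Return a dict mapping each expected extension to whether that file exists."""
    base = Path(shp_path)
    return {ext: base.with_suffix(ext).exists() for ext in COMPONENTS}


# Demo with empty placeholder files (not real geospatial data).
with tempfile.TemporaryDirectory() as tmp:
    for ext in (".shp", ".dbf"):
        (Path(tmp) / f"roads{ext}").touch()
    report = check_shapefile_parts(Path(tmp) / "roads.shp")

for ext, found in report.items():
    status = "found" if found else "MISSING"
    print(f"{ext} ({COMPONENTS[ext]}): {status}")
```

In this demo the .shx and .prj files are reported as missing, a common situation when only the .shp file has been copied or shared.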
As there are often many different files to keep track of when working with geospatial data, it is important to establish an organizational structure. Geospatial data software (GIS) merely links to the data location, making project organization important. Moving files and folders or renaming files after a project starts will cause broken links, so plan ahead! Before starting a new geospatial data project, consider setting up a specific location solely for data and another for project files. It is also a good idea to establish a naming convention for data to be used in a project. Depending on the project, it might be advisable to rename datasets to include the collection name and year along with the original file name to keep things clear.
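One way to apply such a naming convention consistently is to generate the new names with a small helper. This is only a sketch of one possible convention (collection, then year, then original name); the function name and the example values are hypothetical.

```python
from pathlib import Path


def project_file_name(original, collection, year):
    """Build a name of the form <collection>_<year>_<original stem><extension>."""
    path = Path(original)
    return f"{collection}_{year}_{path.stem}{path.suffix}"


# Component files that share a base name should be renamed together
# so that they continue to match one another.
originals = ["roads.shp", "roads.shx", "roads.dbf", "roads.prj"]
renamed = [project_file_name(name, "county", 2020) for name in originals]
print(renamed)
```

Renaming is best done before the files are added to a GIS project, since the project only stores links to the file locations.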
Once a sound familiarity with the data is established, the data can be used on its own or in combination with other datasets to help answer questions. The types of questions asked, and the potential techniques employed, depend on the data. In most cases statistical methods will drive the data analysis. Cluster analysis, a statistical technique commonly used with geospatial point data, dates back to the mid-19th century, when it was used to help identify the source of a cholera outbreak. Land-use analysis utilizing historical maps, census data, and health data has revealed how the redlining practices of the early 20th century persist as present-day inequality. Remote sensing data analysis has also illuminated many patterns on our planet (and others), including the loss of rainforest tree cover.
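As a simplified illustration of point-data analysis, the sketch below assigns each case location to its nearest candidate source and tallies the results. This is not a full cluster analysis, only a nearest-source count in the spirit of the cholera example, and every coordinate is made up for illustration. Distances use the haversine great-circle formula with an assumed mean Earth radius of 6371 km.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_source(case, sources):
    """Return the name of the source closest to the case location."""
    return min(sources, key=lambda name: haversine_km(*case, *sources[name]))


# Hypothetical locations as (latitude, longitude); not real data.
sources = {"A": (40.000, -105.000), "B": (40.010, -105.000)}
cases = [(40.001, -105.001), (40.002, -104.999), (40.009, -105.000)]

tally = Counter(nearest_source(case, sources) for case in cases)
for name, count in tally.items():
    print(f"Source {name}: {count} nearby case(s)")
```

A source with a disproportionate share of nearby cases would be a candidate for closer investigation, though a real study would also need to account for factors such as population density.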
As others might benefit from your research findings, it is helpful to publish your data online. Establishing a data management plan before embarking on a new project will facilitate data sharing. Among other considerations, this entails carefully documenting each aspect of your project to support replicability and reuse. To learn more about research data management (RDM) see https://lib.colostate.edu/services/data-management/.
Hopefully this guide has piqued your interest in geospatial data, its uses, and its potential applications. There is a wealth of online learning resources available at https://gis.colostate.edu/resources/learning-resources/, as well as in-person (or virtual) support available by contacting gis@colostate.edu.
Happy Mapping!