What we mean by 'Spatial Data Quality', and why it matters
What do we mean by Data Quality?
Whenever we discuss data quality, the biggest challenge is deciding what the term means, as data that is “good enough” for one purpose may be “totally useless” for another. For this reason, we avoid a subjective approach and instead determine whether data is “fit-for-purpose” by considering its utility value (i.e. the expected value derived from using the data).
There are two main elements to be considered in determining the utility value of the data. The first is the direct impact when data quality is below expectation, and the second is the lost opportunity cost which, although harder to measure, can often be the more significant.
We can recognise data quality impact under a number of headings:
- Financial (increased costs, delays, missed opportunities, penalties, etc)
- Confidence (customer/employee/supplier satisfaction, low organisational trust, low confidence in forecasts, etc)
- Productivity (increased workload and decreased productivity, production delays, sub-standard end product, etc)
- Risk and Compliance (not conforming to regulations, investment risk, competitive challenges, etc)
To avoid these impacts it is necessary to put in place appropriate validation processes that allow us to assess, measure and control the quality of the data. Ideally the results should also be validated by an external data audit to verify the results and establish stakeholder confidence.
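As a minimal sketch of what "assess and measure" could look like in practice, the Python below runs a set of named rules over a list of records and reports the pass rate for each rule. The `Rule` class, the `assess` function, the pipe records and the two rules are all hypothetical, introduced purely for illustration; a real validation process would use rules agreed with the data's stakeholders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A named quality rule (hypothetical structure for this sketch)."""
    name: str
    check: Callable[[dict], bool]  # returns True when a record passes


def assess(records: list[dict], rules: list[Rule]) -> dict[str, float]:
    """Return the pass rate (0.0 to 1.0) of each rule across all records."""
    if not records:
        # No records means nothing can fail, so report a full pass rate.
        return {rule.name: 1.0 for rule in rules}
    return {
        rule.name: sum(rule.check(r) for r in records) / len(records)
        for rule in rules
    }


# Illustrative data: four pipe records, two of which have a problem.
pipes = [
    {"id": 1, "material": "PVC", "diameter_mm": 110},
    {"id": 2, "material": None, "diameter_mm": 90},     # material missing
    {"id": 3, "material": "steel", "diameter_mm": -5},  # impossible diameter
    {"id": 4, "material": "PVC", "diameter_mm": 160},
]

rules = [
    Rule("material_present", lambda r: r.get("material") is not None),
    Rule("diameter_positive", lambda r: (r.get("diameter_mm") or 0) > 0),
]

report = assess(pipes, rules)
for name, rate in report.items():
    print(f"{name}: {rate:.0%} of records pass")
```

Expressing quality as a pass rate per rule gives a figure that can be tracked over time and compared against an agreed threshold, which is one way of moving from a subjective judgement to a measured one.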
The primary objective must be to maximise the recognised utility value of the information by reducing the negative impacts that result from not delivering “fit-for-purpose” data.
How does this apply to Spatial Data?
Although it may be difficult to quantify the monetary value of spatial data quality, we can readily see the impact it has in our daily lives, for example:
- Road network data must be properly connected for satellite navigation systems to function correctly.
- Cadastral information must be accurate to support a functioning property market, provide security to allow investment, facilitate provision of services and enable valid taxation.
- Built environment information must be correct in order to support urban planning, environmental protection, etc.
- Utilities infrastructure information must be accurate in order to ensure safe and effective asset management and maintenance.
In extreme situations, for example when emergency services are sent to the wrong location or they find that they cannot get to the correct location due to incorrectly recorded infrastructure restrictions, accurate spatial data can be a matter of life or death.
The type of data that most people are used to dealing with is typically a record of a simple fact that is either right or wrong, such as a date of birth or a bank account balance. However, when we are working with spatial data representing real-world features, we are usually dealing with a simplified model that is intended to be used for particular activities. It may be fit for that specific purpose but totally inappropriate for another. For example, a building that is represented as a point location may be perfectly adequate for identifying a delivery point for a courier service, but it is completely unsuited to an analysis of the percentage of land area that has been built on in a town. Likewise, a parcel of land that is captured from low-resolution aerial photography and represented as a simplified polygon may be perfectly adequate for an analysis of land use in an environmental study, but it will be totally inadequate if it is to be used in a transaction to sell and transfer part of the land.
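The building example can be made concrete with a small Python sketch. It assumes planar coordinates in metres and uses invented geometry: a 100 m by 100 m town boundary and two buildings. The `polygon_area` helper is our own illustration using the standard shoelace formula, not part of any particular GIS library.

```python
def polygon_area(ring):
    """Area of a simple polygon from its vertices, using the shoelace formula.

    `ring` is a list of (x, y) tuples in planar units (assumed metres here).
    A single-vertex "ring", i.e. a point, naturally has zero area.
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Invented town boundary: a 100 m x 100 m square.
town = [(0, 0), (100, 0), (100, 100), (0, 100)]

# The same two buildings captured as footprints...
buildings_as_polygons = [
    [(0, 0), (10, 0), (10, 20), (0, 20)],      # 10 m x 20 m
    [(50, 50), (80, 50), (80, 60), (50, 60)],  # 30 m x 10 m
]
# ...and as point locations (for example, delivery points).
buildings_as_points = [[(5, 10)], [(65, 55)]]

town_area = polygon_area(town)
fraction_from_polygons = sum(map(polygon_area, buildings_as_polygons)) / town_area
fraction_from_points = sum(map(polygon_area, buildings_as_points)) / town_area

print(f"Built-up share from footprints: {fraction_from_polygons:.1%}")
print(f"Built-up share from points:     {fraction_from_points:.1%}")
```

Both datasets describe the same buildings and both may be accurate, yet only the footprint version can answer the built-up area question. The point version is not wrong; it is simply not fit for that purpose.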
Spatial data is particularly sensitive to two specific quality measures:
Accuracy
- Positional – the geometric representation of the location and shape of the feature
- Topological – the spatial relationships between the features properly reflecting what exists in the real world (e.g. are the pipes connected?)
- Temporal – how a feature changes over time (e.g. a house extension or coastal erosion)
- Thematic – the classification (e.g. is it a river or a canal?)
Completeness
- Missing – is required data missing? (e.g. does a pipe have an appropriate connection?)
- Detail – is the required level of feature detail captured? (e.g. is the land parcel representation suitable for use in a property transfer of part? Is the building represented in CityGML LOD3?)
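The topological question "are the pipes connected?" lends itself to an automated check. The Python sketch below is one simple way to approach it, under several assumptions of our own: pipes are straight segments defined by two endpoints in planar coordinates, a pipe may legitimately end only at a known terminal (such as a valve or hydrant), and two endpoints count as connected when they lie within a snapping tolerance of each other. The function name, the data and the tolerance value are all hypothetical.

```python
import math


def dangling_ends(pipes, terminals, tolerance=0.01):
    """Return (pipe_id, endpoint) pairs that connect to nothing.

    pipes:     dict mapping pipe id -> ((x1, y1), (x2, y2))
    terminals: list of (x, y) locations where a pipe may legitimately end
    tolerance: maximum distance at which two locations count as connected
    """
    def near(a, b):
        return math.dist(a, b) <= tolerance

    problems = []
    for pipe_id, ends in pipes.items():
        for end in ends:
            touches_pipe = any(
                near(end, other_end)
                for other_id, other_ends in pipes.items()
                if other_id != pipe_id
                for other_end in other_ends
            )
            touches_terminal = any(near(end, t) for t in terminals)
            if not (touches_pipe or touches_terminal):
                problems.append((pipe_id, end))
    return problems


# Invented network: pipe C was digitised 0.5 units short of pipe B.
pipes = {
    "A": ((0, 0), (10, 0)),
    "B": ((10, 0), (10, 10)),
    "C": ((10.5, 10), (20, 10)),
}
terminals = [(0, 0), (20, 10)]

for pipe_id, end in dangling_ends(pipes, terminals):
    print(f"Pipe {pipe_id} has an unconnected end at {end}")
```

The choice of tolerance is itself a fitness-for-purpose decision: a gap that is acceptable for a schematic overview may be unacceptable for network tracing. The pairwise comparison here is quadratic in the number of pipes, so a production check over a large network would normally rely on a spatial index instead.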
The richness and complexity of spatial data models representing real-world features demand that the quality measures ensuring fitness-for-purpose are comprehensively determined and that robust, trusted processes to verify compliance are put in place. This allows a transition from fearing that the data is not of high enough quality – and suffering the effects – to knowing the exact level of quality and being able to measure and improve it.
The dependencies that exist between datasets are also important, especially for geospatial data. A lot happens in the same place, so it makes sense that many different perspectives on the same space are recorded to model it. This means it is important to validate any assumptions you have made about those dependencies, and about any facts whose meaning is shared between your datasets. Frequently, legislation or certification makes some datasets authoritative, which in turn influences how you define the accuracy and completeness rules for your own spatial data.
1Spatial’s core business is in making geospatially referenced data current, accessible, easily shared and trusted. We have over 40 years of experience as a global expert, uniquely focused on the modelling, processing, transformation, management, interoperability and maintenance of spatial data, all with an emphasis on data integrity, accuracy and ongoing quality assurance. We have provided spatial data management and production solutions to a wide range of international mapping and cadastral agencies, government, utilities and defence organisations across the world. This gives us unique experience in working with a plethora of data (features, formats, structure, complexity, lifecycle, etc.) within an extensive range of enterprise-level system architectures.
Find out more by downloading our 'Little Book of Spatial Data Quality'