Thursday, January 14, 2010

An Essay on Uncertainty in Spatial Data Integration

Spatial data come in a variety of forms, from oral or written descriptions and values to any number of digital representations of the real world. There is a plethora of techniques for describing the world in space through raster and vector data alone. The quality of such data, however, can be described by addressing four key factors. Accuracy measures how closely data match true values and descriptions. Not all values can be measured exactly; human error or the detection and precision limits of equipment can easily skew accuracy. Accuracy is also closely tied to scale: data captured at a larger scale (a smaller scale denominator, such as 1:24,000 rather than 1:250,000) can generally support higher positional accuracy. Precision, the reproducibility of a measurement, describes how exactly data are measured and stored. With high precision, repeated measurements cluster tightly, so any error recurs in a consistent pattern; with low precision, repeated measurements scatter widely and the original error is not consistently reproduced. Error is the deviation of a recorded value from reality. Finally, uncertainty is the overall doubt or lack of confidence in data. Though error and uncertainty are related, error refers to discrepancies that are known and can possibly be avoided, whereas uncertainty reflects not knowing the true state of a situation. Taken together, these factors describe spatial data quality: how well GIS data represent reality.
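The difference between accuracy and precision is easier to see with numbers. The sketch below uses made-up repeated GPS readings of a single survey marker whose true position is assumed to be known, in projected coordinates in metres. Accuracy is measured as the root-mean-square distance from the true position; precision as the root-mean-square distance from the readings' own mean. These are illustrative choices of statistic, not the only ones used in practice.

```python
import math
import statistics

# Hypothetical repeated GPS fixes (easting, northing in metres) of one survey
# marker whose true position is assumed to be known.
TRUE_POSITION = (500000.0, 4100000.0)
readings = [
    (500002.1, 4100001.9),
    (500001.9, 4100002.2),
    (500002.0, 4100002.0),
    (500002.2, 4100001.8),
]

def distance(p, q):
    """Straight-line distance between two projected points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def accuracy_rmse(points, truth):
    """Accuracy: root-mean-square distance of each reading from the true value."""
    return math.sqrt(sum(distance(p, truth) ** 2 for p in points) / len(points))

def precision_spread(points):
    """Precision: root-mean-square distance of the readings from their own mean,
    i.e. how tightly they cluster, regardless of where the truth lies."""
    mean = (statistics.fmean(p[0] for p in points),
            statistics.fmean(p[1] for p in points))
    return math.sqrt(sum(distance(p, mean) ** 2 for p in points) / len(points))

print(f"RMSE vs. truth (accuracy): {accuracy_rmse(readings, TRUE_POSITION):.2f} m")
print(f"Spread about mean (precision): {precision_spread(readings):.2f} m")
```

Here the readings sit close together but are consistently offset by roughly 2 m in each direction, so the data are precise but not very accurate: a repeatable, systematic error.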

Because a map is a model of reality, and no model can be a completely valid representation of reality, it is important, as a matter of quality control and assurance, to expect a certain level of uncertainty in any project. Further, different data types that attempt to represent the same real-world features inherently exhibit a degree of variation. For instance, a raster image and a vector representation of the same coastline will rarely align exactly; they may be offset by any number of spatial units. Adjusting the scale at which the features are captured may reduce this misalignment. Likewise, vector data digitized separately will rarely align perfectly, so combining two such coverages will result in sliver polygons, disconnected vertices, or dangling polylines. Overlay tools such as spatial join and union are needed to aggregate the datasets, and they must be applied carefully to avoid introducing such errors.
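One common way to reduce disconnected vertices and dangles before an overlay is snapping: vertices in one layer that fall within a small tolerance of vertices in another layer are moved onto them. The sketch below is a minimal pure-Python version, assuming two hypothetical digitizations of the same coastline and an arbitrary 0.5-unit tolerance; GIS packages provide their own snapping tools with more options (for example, snapping to edges as well as vertices).

```python
import math

def snap_vertices(target, reference, tolerance):
    """Move each vertex of `target` onto the nearest `reference` vertex when it
    lies within `tolerance` map units; return the new line and a snap count."""
    snapped, count = [], 0
    for x, y in target:
        nearest = min(reference, key=lambda r: math.hypot(r[0] - x, r[1] - y))
        if math.hypot(nearest[0] - x, nearest[1] - y) <= tolerance:
            snapped.append(nearest)
            count += 1
        else:
            # Too far from any reference vertex: leave it where it was digitized.
            snapped.append((x, y))
    return snapped, count

# Two hypothetical digitizations of the same stretch of coastline.
coastline_a = [(0.0, 0.0), (10.0, 0.0), (20.0, 1.0), (30.0, 0.5)]
coastline_b = [(0.3, 0.2), (10.1, -0.2), (20.2, 1.1), (35.0, 5.0)]

snapped_b, n = snap_vertices(coastline_b, coastline_a, tolerance=0.5)
print(n, snapped_b)
```

Choosing the tolerance is itself a data quality decision: too small and slivers survive, too large and genuinely distinct features get merged.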

Storing information in metadata files is another powerful way to track the errors present in a GIS application. Metadata records information about a given dataset and can describe, among many other parameters, which methods were used to capture the data in a coverage. Providing this information tells others who use the dataset what kinds of deviation or error to anticipate. A lecture in class stressed the need for greater awareness of data quality, since data quality can have a large impact on geographic analysis if it is not addressed properly, or is overlooked entirely.
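To make the idea concrete, here is a minimal sketch of a metadata record that documents how a dataset was captured and what error to expect. The field names and values are hypothetical and are not taken from any particular metadata standard; real projects would follow an established standard and its required elements.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class DatasetMetadata:
    """A hypothetical, minimal metadata record; field names are illustrative
    and are not taken from any particular metadata standard."""
    title: str
    capture_method: str           # how the data were collected
    source_scale: int             # scale denominator of the source map
    horizontal_accuracy_m: float  # estimated positional error, in metres
    known_issues: list = field(default_factory=list)

record = DatasetMetadata(
    title="Coastline (hypothetical example)",
    capture_method="Manual digitizing from a scanned paper map",
    source_scale=24000,
    horizontal_accuracy_m=12.0,
    known_issues=["Sliver polygons where adjoining map sheets meet"],
)

# Serialize so the record can travel alongside the dataset.
print(json.dumps(asdict(record), indent=2))
```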

Scale is another factor that can help in avoiding uncertainty. Choosing an appropriate working scale, one at which any error that may occur in a dataset is minimized, is an effective way to improve the dataset's accuracy.
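A quick way to see why scale matters is to convert a small error on the map into a distance on the ground: ground error equals map error multiplied by the scale denominator. The sketch assumes a 0.5 mm drafting or digitizing error, a commonly quoted rule of thumb rather than a fixed standard.

```python
def ground_error_m(map_error_mm: float, scale_denominator: int) -> float:
    """Convert an error measured on the map (mm) into ground distance (m)
    for a map at a scale of 1:scale_denominator."""
    return map_error_mm * scale_denominator / 1000.0

# Assumed 0.5 mm drafting/digitizing error (a common rule of thumb).
for denom in (10_000, 24_000, 100_000, 250_000):
    print(f"1:{denom:,}  ->  about {ground_error_m(0.5, denom):g} m on the ground")
```

The same half-millimetre slip amounts to about 12 m at 1:24,000 but about 125 m at 1:250,000, which is why the working scale should be chosen with the required accuracy in mind.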

Controlled uncertainty is another spatial data quality issue, one that works in the opposite direction from the methods discussed above. Data are deliberately degraded as a way of protecting them, whether a dataset covers records or locations whose exact position or confidentiality must be guarded. The Fisher reading summarizes this well: analysis performed without accommodating uncertainty can have significantly degraded validity, which makes its usefulness questionable. By planning for a certain degree of error, it is possible to continue to produce valid results.
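One widely used form of deliberate degradation is random displacement, often called geomasking. In the "donut" variant sketched below, each point is moved in a random direction by a distance between a minimum (so it never stays at the true location) and a maximum (so it stays useful for analysis). The 50 m and 250 m distances and the coordinates are arbitrary assumptions for illustration.

```python
import math
import random

def donut_mask(x, y, min_dist, max_dist, rng=random):
    """Displace a point in a random direction by a random distance between
    min_dist and max_dist (same units as x and y)."""
    angle = rng.uniform(0, 2 * math.pi)
    # Drawing r**2 uniformly and taking the square root spreads masked points
    # evenly over the ring's area rather than bunching them near the inner edge.
    r = math.sqrt(rng.uniform(min_dist ** 2, max_dist ** 2))
    return x + r * math.cos(angle), y + r * math.sin(angle)

rng = random.Random(42)  # fixed seed so the example is repeatable
original = (500000.0, 4100000.0)
masked = donut_mask(*original, min_dist=50, max_dist=250, rng=rng)
print(masked)
```

The known displacement range is itself a form of controlled uncertainty: anyone analyzing the masked data can account for errors of up to 250 m.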

4 comments:

Anonymous said...

Dear Justin,

Hi, my name is Muharrir. I am quite interested in your blog, which is full of information and a variety of ideas on geography. Based on your excellent knowledge of GIS, I hope that you can help me solve my problem.

For your information, I am currently doing research on developing a spatial model of crimes related to entertainment outlets (e.g., pubs and bars). Initially, I planned to plot the reported crimes by location on the map and then calculate the distance between pubs and the locations of the reported crimes. But now I have a problem with my data, in particular the crime data from the police. They just gave me crime statistics broken down by type of crime.

My question is: what suitable method can be applied, particularly with GIS, to measure the distance between pubs and crimes given these problematic crime statistics?

Thank you.

Justin said...

Greetings,

Does the dataset the police department provided have any kind of spatial attribute associated with individual crime events, or were you only given summary calculations?

Spatial attributes can be latitude/longitude coordinates, an address, a neighborhood name, or any other description of where an event occurred in a study area. If this type of dataset is available, you have a number of options. You can project the lat/longs onto a base map, geocode addresses against a street network, or manually digitize points and perform a tabular join to associate the dataset with the locations on the map.
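As a rough illustration of the tabular join route, here is a minimal Python sketch. It assumes the crime table carries only a neighborhood name for each event and that you have digitized one point per neighborhood; all names and coordinates are hypothetical. Records with no matching name are kept aside rather than silently dropped, since those gaps are themselves a data quality issue.

```python
# Hypothetical neighborhood locations (e.g. digitized points) and a crime
# table that records only a neighborhood name for each event.
neighborhoods = {
    "Riverside": (1200.0, 3400.0),
    "Old Town": (1850.0, 2900.0),
}
crimes = [
    {"id": 1, "type": "theft", "neighborhood": "Riverside"},
    {"id": 2, "type": "assault", "neighborhood": "Old Town"},
    {"id": 3, "type": "theft", "neighborhood": "Harbour"},  # no matching location
]

def tabular_join(records, locations, key):
    """Attach coordinates to each record whose `key` value matches a location
    name; return (joined, unmatched) lists."""
    joined, unmatched = [], []
    for rec in records:
        xy = locations.get(rec[key])
        if xy is None:
            unmatched.append(rec)
        else:
            joined.append({**rec, "x": xy[0], "y": xy[1]})
    return joined, unmatched

joined, unmatched = tabular_join(crimes, neighborhoods, "neighborhood")
print(len(joined), "joined;", len(unmatched), "unmatched")
```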

Unfortunately, if you were provided only raw statistics (for instance, if your information only states facts like "20% of crimes were alcohol related"), you'll have to keep searching for a more appropriate dataset to use in a mapping project.

Once you are able to plot the crime data on a map, there is a wide variety of spatial analysis techniques to try in your project. If you're using ArcGIS, you will want to use some of the tools in the Spatial Analyst extension to help identify interactions between certain types of crimes and nearby businesses.

You can look at the density of all crime in an area (or of just certain types of crime) by converting the events into a density surface: a raster in which each cell estimates how many events occur per unit area nearby. Kernel density estimation is the usual method. These types of results make great maps that are very effective in demonstrating your point.
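To show what a density surface actually computes, here is a minimal pure-Python kernel density sketch. It assumes projected coordinates in metres and uses a quartic kernel; the crime locations, 10 m cell size, and 30 m bandwidth (search radius) are arbitrary values for illustration, and GIS tools offer their own kernels and defaults.

```python
import math

def kernel_density(points, cell_size, bandwidth, xmin, ymin, ncols, nrows):
    """Return a grid (list of rows, row 0 at ymin) of event density per square
    map unit, using a quartic kernel of radius `bandwidth` at cell centres."""
    grid = []
    for row in range(nrows):
        cy = ymin + (row + 0.5) * cell_size
        values = []
        for col in range(ncols):
            cx = xmin + (col + 0.5) * cell_size
            total = 0.0
            for px, py in points:
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                if d2 < bandwidth ** 2:
                    # Quartic kernel, scaled so each event integrates to 1.
                    total += 3 / (math.pi * bandwidth ** 2) * (1 - d2 / bandwidth ** 2) ** 2
            values.append(total)
        grid.append(values)
    return grid

# Hypothetical cluster of crime locations near (25, 25).
crime_points = [(25.0, 25.0), (24.0, 26.0), (27.0, 23.0)]
grid = kernel_density(crime_points, cell_size=10, bandwidth=30,
                      xmin=0, ymin=0, ncols=10, nrows=10)
```

The bandwidth controls how smooth the map looks: a larger radius spreads each event over more cells, producing broader, gentler hot spots.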

Another option would be to obtain a land-use layer from the city planning office or another government office. This will allow you to look at what crimes happen in relation to certain land uses (parks vs. commercial vs. residential, etc.).
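Relating crimes to land use boils down to a point-in-polygon test for each crime. The sketch below uses the standard ray-casting test and two made-up square land-use zones; a real land-use layer would have many irregular polygons, and a GIS would normally handle this with a spatial join.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) falls inside `polygon`, a list of
    (x, y) vertices. Points exactly on an edge may go either way."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_by_land_use(crimes, zones):
    """Count crimes per land-use zone; crimes outside every zone are ignored."""
    counts = {name: 0 for name in zones}
    for x, y in crimes:
        for name, poly in zones.items():
            if point_in_polygon(x, y, poly):
                counts[name] += 1
                break
    return counts

# Hypothetical land-use zones and crime locations (projected metres).
zones = {
    "park": [(0, 0), (100, 0), (100, 100), (0, 100)],
    "commercial": [(100, 0), (200, 0), (200, 100), (100, 100)],
}
crimes = [(50, 50), (150, 20), (160, 80), (250, 50)]
print(count_by_land_use(crimes, zones))
```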

Since you are interested in straight-line distances from bars, you can create buffers around these locations and count how many crimes happen within 100 meters of a bar, then within 200 meters, and so on. Again, you can look at just a subset of crime types and run the query a few different times (first violent crimes, then theft, then non-violent crimes like public disturbances, etc.).
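The buffer counts can also be worked out directly: a crime lies inside the combined 100 m buffer around all bars exactly when its distance to the nearest bar is 100 m or less. The sketch below uses that equivalence with hypothetical bar and crime locations in projected metres, and an optional filter for running the query one crime type at a time.

```python
import math

def nearest_bar_distance(x, y, bars):
    """Straight-line distance from (x, y) to the closest bar."""
    return min(math.hypot(x - bx, y - by) for bx, by in bars)

def counts_within(crimes, bars, thresholds=(100, 200, 300), crime_type=None):
    """For each threshold, count crimes lying within that distance of any bar.
    Optionally restrict the count to a single crime type."""
    selected = [c for c in crimes if crime_type is None or c["type"] == crime_type]
    distances = [nearest_bar_distance(c["x"], c["y"], bars) for c in selected]
    return {t: sum(1 for d in distances if d <= t) for t in thresholds}

# Hypothetical bar and crime locations (projected metres).
bars = [(0.0, 0.0), (1000.0, 0.0)]
crimes = [
    {"x": 50.0, "y": 0.0, "type": "violent"},
    {"x": 0.0, "y": 150.0, "type": "theft"},
    {"x": 1000.0, "y": 250.0, "type": "violent"},
    {"x": 500.0, "y": 0.0, "type": "disturbance"},
]
print(counts_within(crimes, bars))
print(counts_within(crimes, bars, crime_type="violent"))
```

Distances like these only make sense in a projected coordinate system; with raw latitude/longitude you would need to project the data first or use a geodesic distance.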

This is an interesting topic to research and there are many directions you can go with it. I'll be glad to help further if you send your e-mail address. Otherwise, search Google for anything that doesn't make sense. There's a lot of information on the Internet and in journal articles that may be helpful as well. Good luck!

Anonymous said...

Hi Justin,

It seems that you really understand my research topic very well. As you assumed, I was given only raw statistics, such as rape=123, murder=234, snatch thief=456, etc. This is because in my country, Malaysia, data on crime events are completely restricted from the public, unlike in your country or other developed countries, where the public can easily download that kind of information from the police website. Therefore, if I have no other option, is it possible to use Hawth's Tools, such as the sampling tools, to generate random points for predicting crime related to bars/clubs through a regression model?

I think the availability of crime event data is critical for my research because, as you said, it can be examined alongside or related to other datasets, such as land use.

I also did a literature review and found that some of the geospatial techniques you mentioned have been successfully applied.

Anyway, a zillion thanks for your help; I look forward to hearing more from you. You can contact me via my email: muharir7@gmail.com

Have a nice day!

Essays Writing said...

Many institutions limit access to their online information. Making this information available will be an asset to all.