This chapter presents an overview of the development, capabilities, and utilization of geographic information systems (GIS). The early years of GIS were marked by systems that required substantial financial resources, depended upon the programming skills of the developers, required the development of digital base maps from scratch, and had relatively limited statistical analytical capabilities.
It is the utilization of GIS as a tool for conducting spatial analysis that is the primary focus of this chapter.
The definition of GIS has changed over time in response to the broad applications it is now used for and in response to the definition as viewed through the lens of the end user.
The development of GIS paralleled other technological developments such as computer information systems, software, and analytical algorithms. This led to a moving target of definitions over time, as the definitions given by Longley et al. and others illustrate. It is apparent from these definitions that there has been a transition from viewing GIS as a computerized system for a specific application to a more general set of hardware and software tools that are used to facilitate the utilization of geographic information to analyze and model data, and to solve problems.
The key concept in the definition of GIS for this chapter is the focus on it as a tool for conducting spatial analysis. Geography is crucial because almost every activity, feature, or decision has a geographic component. Geographic data have some connection to spatial aspects of the earth, including all of the spheres associated with earth, e. This definition of geographic data includes the information necessary to create, store and utilize digital representations of the earth as well as the characteristics associated with specific locations and areas.
There are a number of critical aspects of geographic data that differentiate this type of information from other types of data. All geographic data are multidimensional. Location requires some form of spatial reference, such as an x, y coordinate or a latitude, longitude pair, plus some associated definition or attribute. Geographic data, especially the digital base maps, are extremely voluminous. For example, the number of street segments in a single US county of about a million people (Erie County, NY) is in the tens of thousands. The database that the GIS uses to create and utilize this base map is much larger because multiple pieces of data are required for each street segment.
The size of the entire street network is in the range of a gigabyte. The number of parcels in Erie County, New York is over ,, including all types of property. These examples provide an idea of how much data needs to be manipulated by a GIS to perform mapping and spatial analysis. Attribute data e. Geographic data can be stored and analyzed in a GIS in a number of ways. The two most relevant approaches to storing geographic information are as vector or raster representations. Raster data often are images represented by the number of pixels in a row-and-column format that compose the image.
The number of pixels or cells can be quite large, especially for a high-resolution image. Each point within a raster data set has an implied location based on its relationship to a single known location on the raster image, which can be determined by the GIS. Vector data representation is based on the exact location of geographic elements, such as points, lines, and areas.
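The implied location of a raster cell can be illustrated with a short sketch. It assumes a north-up grid whose single known location is the upper-left corner and whose cells are square; the function name, the coordinates, and the 30-unit cell size are hypothetical and chosen only for illustration.

```python
def cell_center(origin_x, origin_y, cell_size, row, col):
    """Return the (x, y) center of a raster cell.

    Assumes a north-up grid: (origin_x, origin_y) is the upper-left
    corner, columns increase eastward and rows increase southward.
    """
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size
    return x, y


# Hypothetical grid with 30-unit cells.
print(cell_center(500000.0, 4750000.0, 30.0, row=0, col=0))
print(cell_center(500000.0, 4750000.0, 30.0, row=2, col=3))
```

Only the origin and the cell size are stored; every other cell location follows from its row and column index, which is why raster locations are described as implied rather than explicit.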
Data storage is usually more efficient for vector data because the geographic features can be represented by points (nodes) that are connected by lines (arcs) to form the features, whereas usually all of the raster cells need to be stored. Commercial GIS can typically handle both vector and raster data, including switching between formats when necessary.
For example, a GIS can be used to identify point features such as buildings, and line features such as streets, from a remotely sensed raster image and then save this information in vector format. Geographic data also exist at a number of scales. Scale is the relationship between the actual size of an object and its representation in an abstract form, such as a digital base map.
Map scale is often presented as either a scale bar on a hardcopy or computer image, or through the representative fraction, which shows the ratio of the abstract to the real world. For example a representative fraction of 1: Scale defines the level of resolution; small scale e.
Scale is relevant because different processes may occur at different scales. For example, it would not be relevant to conduct an analysis of local neighborhoods or of micro-erosion processes using a digital base map at a scale of 1: Scale and geodetic accuracy are closely related. Some types of applications do not require high-level accuracy. Social and health applications may only need to place the locations of attributes e. Great care is necessary for conducting geographic analyses using data sources at more than one scale, to avoid such issues as the modifiable areal unit problem [ 21 ], which is an extension of the ecological fallacy concept.
Mathematical transformations of this spherical, three-dimensional coordinate space, called a map projection, are required in order to accurately produce maps of earth on a plane, such as a hard copy map or a computerized image. Projected coordinates are two dimensional. There are four primary properties that are correct in a spherical representation of the earth that must be considered when projecting onto a flat surface: Map projections can maintain some but not all of these properties and in turn different projections have been developed in order to achieve accuracy in specific properties for the purpose of representation and analysis, e.
Most commercial GIS packages will automatically display latitude and longitude-based coordinates as planar x and y coordinates.
For small areas, the effect of the projection is small or negligible and may not impact the visual representation of the map within a GIS. However, even at the county-level, there may be distortions in the appearance of polygons in terms of size and shape. Additionally, if precise measurements or analyses are to be conducted, a projected coordinate system that maintains a high level of locational accuracy should be used. The Universal Transverse Mercator coordinate system, for example, is a commonly used grid-based system of projections comprised of a series of sixty zones with minimal local distortion.
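As a small illustration of the sixty-zone structure of the Universal Transverse Mercator system, the sketch below computes the zone number from a longitude using the standard six-degree zone width. The function name is hypothetical, the special zone exceptions around Norway and Svalbard are deliberately ignored, and the Buffalo longitude is an approximate value assumed for the example.

```python
import math


def utm_zone(longitude_deg):
    """Return the UTM zone number (1-60) for a longitude in degrees.

    Uses the regular 6-degree zones only; the irregular zones near
    Norway and Svalbard are not handled in this sketch.
    """
    if not -180.0 <= longitude_deg <= 180.0:
        raise ValueError("longitude must be in [-180, 180]")
    zone = int(math.floor((longitude_deg + 180.0) / 6.0)) + 1
    # Longitude 180 belongs to zone 60 rather than a 61st zone.
    return min(zone, 60)


# Approximate longitude of Buffalo, NY (assumed for illustration).
print(utm_zone(-78.88))  # 17
```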
The various sources and high level of complexity of geographic data, especially the data required for creating the underlying digital maps utilized in a GIS, create a major need for organizational standards.
In the US, the National Geospatial Program [ 22 ] coordinates several programs for the creation and implementation of data standards including the Federal Geographic Data Committee, the National Map, and geodata. This topic also highlights the challenges of conducting geographic analysis using GIS in areas of the world that are less developed and lacking in digital map resources.
However, the utilization of global positioning system information can be a major assistance in creating data that are amenable to GIS applications. GIS, like any tool, can be a boon or a bane depending on the relevance of the application.
If there is no relevance, then GIS and spatial analysis is an inappropriate tool for that situation.
However, geographic relevance, while not universal, is quite ubiquitous. Spatial dependence is the statistical recognition that some entity or process is spatially distributed in a non-random manner. If there is no spatial dependence, then spatial analysis is not relevant. The explosive growth in GIS utilization since the early s is a strong endorsement of the fact that much of what exists or occurs on the earth is not randomly distributed. Spatial statistics [ 24 ] are based on exploiting and understanding these spatial dependencies, including networks, spatial regression, spatial clustering, and simple statistics used to identify autocorrelation.
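One widely used simple statistic for spatial autocorrelation is Moran's I. The surviving text does not name the statistic it has in mind, so its use here is an assumption. The sketch below computes it for a list of area values and a spatial weights matrix; the four-area chain and its values are hypothetical.

```python
def morans_i(values, weights):
    """Compute Moran's I for area values and a square weights matrix.

    weights[i][j] is the spatial weight between areas i and j
    (for example 1 if they are adjacent, 0 otherwise).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / s0) * (cross / variance_sum)


# Four hypothetical areas in a row; each is adjacent to its neighbors.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], chain))  # smooth gradient: positive value
print(morans_i([1, 4, 1, 4], chain))  # alternating pattern: negative value
```

Similar neighboring values push the statistic above zero and alternating values push it below zero, while values near zero suggest little spatial dependence.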
High negative spatial autocorrelation indicates that there is a pattern in the spatial distribution, not a simple clustering. Low absolute values indicate a lack of spatial dependence. GIS facilitates the utilization of spatial statistics and modeling because it automates procedures necessary for the calculation of spatial statistics. Spatial dependence can also be explored in a number of additional ways.
Visualization techniques capitalize on the ability of GIS to display spatial information in various ways, including animations, three-dimensional representations, and changes over time [ 26 , 27 ]. GIS and spatial clustering approaches are integral aspects of data mining and knowledge discovery in databases [ 28 , 29 ].
The development of conditioned choropleth maps, which permits the dynamic visual examination of a dependent variable and two potential predictor variables, highlights the intersection between GIS, statistics, and visualization in an application to generate well-informed, relevant hypotheses [ 30 ].
The role of GIS is critical to hypotheses in two ways: An understanding of these capabilities provides the background needed to initiate the use of GIS for a specific application. Although GIS can utilize both raster and vector information, raster data is not truly geographic because it is just a simple array of values. Geographic data elements are those entities that one would readily recognize in the real world or on a map.
The three elements of spatial objects are points, lines, and polygons (areas). GIS uses coordinates to represent these three geographic objects. Combinations of these three objects are able to represent any geographic entity or the attributes of a geographic entity.
Figure 1 shows examples of the three types of geographic data elements. The points show the location of a specific attribute, which in this example could be buildings. Points are located using x, y coordinates and are considered to have zero dimension. Many objects or events relevant for GIS would be represented as points, such as crime locations, mail boxes, wells, individual trees and so on. However, recognize that scale and resolution often have a role on how an entity will be characterized.
For example, a GIS analysis at the scale of North America is likely to display individual cities and towns as points, but they would not be conceptualized in this manner for an analysis of crime within a single city, where the crime locations would be represented as points. The lines on figure 1 represent roads. The thickness of the line is based on the type of road. Note that lines and arcs are synonymous in GIS terminology.
Lines have a single dimension and are represented by the GIS as points connected by arcs. The lines in figure 1 define areas. The areas are polygons of any shape. Polygons are represented in the GIS by a closed set of lines that define a specific area.
These polygons are seen as the areas in figure 1 marked by the letters A and B. A crucial aspect of GIS is that it retains the topological relationships between the geographic data elements. Topology, the mathematics of spatial relationships of connecting adjacent features, is critical for modeling, routing, network analysis, and spatial statistics.
GIS also records whether points fall on a line or within a specific area. Thus, in figure 1 , the GIS recognizes that area A is to the left of the north-running line (road) and that area B is to the right of that line. Georeferencing, also called geocoding, is the ability to specify the location of geographic data. This section on georeferencing focuses on how to create a georeference for attributes.
The creation of the digital base map is a part of computer cartography that provides the tools needed for conducting spatial analysis using GIS, including the development of digital base maps and associated databases used for geocoding.
Georeferencing applies to any method that is able to link some entity to its location in a GIS. This can apply to point, line, or area data. Census tracts, towns, zip-code areas, and counties are examples of areas that can be georeferenced.
The process for these types of entities is usually relatively simple because either place names or specific codes can be matched in a database that links the characteristics of these places to locations on a digital base map in a GIS. The georeferencing process for full address information is more complex, but is readily accomplished for most areas where the base map includes address and street-level information, such as the TIGER files from the US Census. Figure 2 provides an example for georeferencing the address for Buffalo State College.
The college is represented as a point showing the address of Elmwood Avenue. The georeferenced database in figure 2 shows the topology used to link the address to the digital base map. Notice that there is a left and right side of the street address ranges.
This process can be accomplished automatically for large databases, making GIS-based spatial analysis readily applicable to any attribute database that has an address. The digital base map addresses could also be in the form of a parcel database that includes each specific address along a block, rather than the range of addresses.
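The address-range matching described above can be sketched as linear interpolation along a street segment. This is a simplified illustration rather than the procedure of any particular GIS package; the function names, house numbers, and coordinates are hypothetical, and the sketch assumes that the left and right ranges of a segment hold numbers of opposite parity.

```python
def interpolate_address(house_number, range_start, range_end, start_xy, end_xy):
    """Estimate the position of a house number along a street segment.

    The segment runs from start_xy to end_xy and covers the address
    range range_start..range_end.
    """
    if range_end == range_start:
        fraction = 0.5  # single-address range: place at the midpoint
    else:
        fraction = (house_number - range_start) / (range_end - range_start)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("house number outside the segment's address range")
    x = start_xy[0] + fraction * (end_xy[0] - start_xy[0])
    y = start_xy[1] + fraction * (end_xy[1] - start_xy[1])
    return x, y


def side_of_street(house_number, left_range_start):
    """Pick the side whose range has the same parity as the house number."""
    return "left" if house_number % 2 == left_range_start % 2 else "right"


# Hypothetical segment: odd numbers 101-199 on the left, even 100-198 on the right.
print(side_of_street(151, left_range_start=101))
print(interpolate_address(150, 100, 200, (0.0, 0.0), (100.0, 0.0)))
```

A parcel-level base map removes the need for interpolation, because each address already has its own location.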
One main capability of a GIS is to measure distances between objects and to identify whether objects are adjacent to one another. The use of coordinate systems in GIS makes distance measures relatively simple to accomplish, taking into account the sophistication of scale and projection issues. Figure 3 shows the distance measurement capabilities of GIS.
This example shows the distances between a number of points. Distance has many obvious uses, but the most relevant and not as obvious one is for spatial analysis and statistics. The bottom section of figure 3 shows a database output of the distances between the points labeled by zip code.
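A distance table of this kind can be sketched as follows, assuming the points are already in a projected (planar) coordinate system so that straight-line Euclidean distance is appropriate. The coordinates are hypothetical.

```python
import math


def distance_matrix(points):
    """Return the matrix of Euclidean distances between planar points."""
    return [
        [math.hypot(ax - bx, ay - by) for (bx, by) in points]
        for (ax, ay) in points
    ]


# Hypothetical projected coordinates, for example zip code centroids.
centroids = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
for row in distance_matrix(centroids):
    print(row)
```

For unprojected latitude and longitude values a great-circle formula would be needed instead, which is one reason the scale and projection issues noted above matter for distance measurement.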
This information is a non-map format output of GIS that is a critical input for point pattern analysis and related statistics [ 31 ]. Figure 4 shows the same locations, but is now focused on adjacency. Using GIS, adjacency can be measured in a number of ways, including a binary yes (1) or no (0), or by the length of the shared boundary. The figure 4 example provides a contiguity matrix of binary adjacency for a single zip code area. This adjacency matrix would be needed for many types of spatial statistics.
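Both adjacency measures can be sketched for simple polygons. The example assumes topologically clean data in which neighboring polygons share identical edge vertices, and it treats two areas as adjacent only when they share at least one full edge; the area names and coordinates are hypothetical.

```python
import math


def edges(polygon):
    """Return the set of undirected edges of a polygon (list of vertices)."""
    n = len(polygon)
    return {frozenset((polygon[i], polygon[(i + 1) % n])) for i in range(n)}


def contiguity(polygons):
    """Return binary adjacency and shared boundary length for each pair."""
    names = sorted(polygons)
    edge_sets = {name: edges(polygons[name]) for name in names}
    binary, shared_length = {}, {}
    for a in names:
        for b in names:
            shared = edge_sets[a] & edge_sets[b] if a != b else set()
            binary[(a, b)] = 1 if shared else 0
            shared_length[(a, b)] = sum(math.dist(*tuple(e)) for e in shared)
    return binary, shared_length


# Three hypothetical unit squares: A and B touch, C is isolated.
areas = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(3, 0), (4, 0), (4, 1), (3, 1)],
}
binary, shared_length = contiguity(areas)
print(binary[("A", "B")], shared_length[("A", "B")])
print(binary[("A", "C")], shared_length[("A", "C")])
```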
Also note that some geographic models assess lagged relationships i. Essential to the use of GIS is the ability to overlay multiple layers of information and access these various layers simultaneously. Figure 5 shows an example of the overlay function. Points, such as crime locations, lines, such as major roadways, and polygons, such as police districts, are combined into a single digital map by the GIS.
GIS can also count the number of crimes in each district in the final overlay. The counting of events or places within a specific geographic area is often needed to facilitate multi-level hierarchical models [ 32 ].
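Counting points within areas can be sketched with a ray-casting point-in-polygon test. This is one common technique and not necessarily what a given GIS package uses internally; the district names and coordinates are hypothetical, and points lying exactly on a boundary are not treated specially.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray running east from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_by_area(points, areas):
    """Count how many points fall inside each named polygon."""
    return {
        name: sum(point_in_polygon(p, poly) for p in points)
        for name, poly in areas.items()
    }


# Hypothetical police districts and crime locations.
districts = {
    "D1": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "D2": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
crimes = [(1, 1), (0.5, 1.5), (3, 1), (5, 5)]
print(count_points_by_area(crimes, districts))  # {'D1': 2, 'D2': 1}
```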
The overlay function facilitates spatial analysis by the ready creation of combinations of information, by creating new forms of information by allocating points to areas for a new area-based metric e.
For example, if one wanted to find a location that was within a specific police district, within a specific distance of a main road, and also a specific distance from the nearest crime, a query could be written to locate the places that meet those requirements. A spatial buffer identifies a specified area around a specific geographic feature.
Buffers are useful for identifying neighborhood-related factors for decision-making e. Buffers combine the distance measurement capability of GIS by applying it to various features. Figure 6 shows a radial buffer with the Buffalo State College address as the centroid. GIS buffers would enable counts of students or housing or whatever other layers of data that could be available. The use for policy and planning for this function is obvious, but hard to duplicate without a GIS.
Figure 7 shows a buffer around a line feature, in this case a road that has public transportation. This figure illustrates how a buffer can follow the shape of a more complex feature. The buffer could be used to identify patients or workers that have ready access to public transportation.
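A buffer test around a line feature can be sketched by measuring the distance from a point to each segment of the line. The example assumes planar coordinates; the road, the sample points, and the radius are hypothetical.

```python
import math


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_line_buffer(p, line, radius):
    """True if p lies within `radius` of any segment of the polyline."""
    return any(
        distance_to_segment(p, line[i], line[i + 1]) <= radius
        for i in range(len(line) - 1)
    )


# Hypothetical transit road with one bend, and a buffer radius of 2 units.
road = [(0, 0), (10, 0), (10, 10)]
print(within_line_buffer((5, 1), road, 2))  # True
print(within_line_buffer((5, 3), road, 2))  # False
```

A radial buffer around a single point, as in figure 6, is the simpler case in which only the distance to the center point is compared with the radius.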
One can easily imagine how spatial buffers can be combined with multiple overlays and complex queries to facilitate geographic decisions and to create measures that could be used in a variety of statistical or modeling applications. GIS provides the capability of reclassifying data in an automated manner. The example in figure 8 is based on reclassifying one specific attribute (poverty rate) in four different ways.
Map a in the figure shows an equal interval distribution. Equal interval classes are based on creating categories of the attribute that are defined by an equivalent range e.
Note that an equal interval classification does not usually result in an equal proportion of the distribution in each category, as can be seen in figure 8a. Figure 8b shows a classification based on using natural breaks, which utilizes the distributional characteristics of the attribute data to create categories that reflect the majority of the areas as middle ranges and the extremes of the distribution as a smaller number of areas.
Figures 8c and 8d are based on quantile classifications, which allocate the areas into categories that consist of an equal proportion of the areas in each category.
Figure 8c uses a quartile approach; this creates four categories, which can readily be interpreted as two categories of areas below the median and two categories of areas above the median.
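The difference between equal interval and quantile classification can be sketched with a small example. The poverty rates below are hypothetical, the function names are illustrative, and the quantile version assigns classes by rank, so tied values may land in different classes.

```python
def equal_interval_classes(values, k):
    """Assign each value to one of k classes of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # The maximum value falls on the upper edge, so cap it at class k - 1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def quantile_classes(values, k):
    """Assign each value to one of k classes holding equal numbers of areas."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * k // len(values)
    return classes


# Hypothetical poverty rates (percent) for eight areas.
rates = [2, 4, 5, 6, 8, 10, 30, 42]
print(equal_interval_classes(rates, 4))  # [0, 0, 0, 0, 0, 0, 2, 3]
print(quantile_classes(rates, 4))        # [0, 0, 1, 1, 2, 2, 3, 3]
```

With a skewed attribute the equal interval scheme places most areas in the lowest class, whereas the quartile scheme places two areas in each class by construction.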
Figure 8d shows a quintile classification approach. Reclassification can be especially useful as a visualization technique; notice in figure 8 how certain areas are recognizable as having a high poverty rate regardless of the classification scheme. Geodatabase is the term used to describe the database that contains the information relevant to a specific spatial analysis or application [ 33 ].
A geodatabase is a scalable data architecture that allows the storage of all aspects of the geographic application in a relational database format. This approach allows for greater portability and sharing of specific projects or applications while facilitating complex queries.
A geodatabase integrates the GIS application software e.
This collaborative project constitutes a data-driven decision-making approach using small area risk factors, where spatial forms of data are used to assess phenomena across space and to make informed decisions about the most appropriate individual and system-wide responses. These small area risk factors are used to take advantage of available sources of information to improve the planning, provision, and impact of services at the local and county level.
This study illustrates many of the main capabilities of GIS; in this case, GIS was used to facilitate a geographic-based needs assessment, a spatial cluster analysis, and to show that the high-risk areas also overlap with the spatial clusters of individual drug users.
It is impossible or impractical to measure specific outcomes in an entire population. However, information is available on factors associated with the phenomenon, such as economic deprivation, crime, and community disorganization in the case of substance use. Rather than trying to measure all of the specific behavioral outcomes that are of interest, such as early drinking and adolescent drug use, social indicators provide a more economical and efficient way to assess the well-being of different populations and sub-populations of interest.
The use of indicators is an indirect method of needs assessment for services, as it shows the need of one location relative to others in the vicinity and can help to estimate the actual need for service in some situations [ 34 ]. Risk factors and social indicators are particularly convenient sources of information for researchers and policy makers, since they can often be created from publicly available data. They are also effective in providing organizations (local and regional governments and service providers) with information about local problems on which to focus, such as poverty, alcohol availability, and crime, in addition to the specific behavioral outcome of interest.
This information can be used to tailor services to specific characteristics of the population so as to enhance the effectiveness of interventions. The indicators provided in the examples here are drawn from the Erie County Risk Indicator Database (RIDB) and are based on the risk and protective factor model of substance abuse, delinquency, and other problem behaviors developed by Hawkins and Catalano [ 34 ]. Quartiles are used to quantify the level of risk, which is a way of assessing need for services, both because these analyses focus on comparing the risk of small areas relative to one another and because quartiles are readily interpretable.
An important component of the development of this database of risk factors was the validation of the indicators. This validation was carried out to test whether the indicators are related to the outcomes they are meant to signal. Individual-level alcohol and drug use and associated health outcome data from the Erie County Health Outcomes (ECHO) survey were used to assess the relationship between these indicators and the outcomes.
Many significant associations between the risk indicators and the behaviors of individuals from the same geographic areas were found, supporting the notion that the risk indicators are a valid measure of the need for prevention and treatment services. Figure 9 shows crime rate by zip code area, figure 10 shows the trauma death rate by quartile, and figure 11 shows a composite poverty index by quartile.
These maps indicate that the need for alcohol and drug prevention and treatment is not evenly or randomly distributed, as well as that each indicator shows a different aspect of the need for services.
Also note that these maps clearly show a number of key aspects, such as overlays of major roads and municipal boundaries, and the inclusion of a scale bar and a compass rose to help orient the end user.
Each map also provides an inset view of the city of Buffalo so that the details of the main urban area can be easily viewed. The general population sample of 3, total respondents aged to years old from Erie County was gathered using a random-digit-dial procedure during — The sampling frame consisted of all working telephone blocks in Erie County, New York and reflects the varying population densities throughout the county and is highly representative of the underlying population, which can be seen by comparing key figures such as race white: These comparisons indicate a minimum of non-response bias in the sample.
A total of 3, interviews were completed based on 5, eligible respondents, yielding a response rate of Of the completed interviews, were removed from the dataset because the home addresses given by the respondents could not be geocoded.
Additionally, for these analyses, lifetime abstainers of alcohol were removed from the dataset in order to better reflect an at-risk population of controls from which our cases were generated. Lifetime abstainers of alcohol are at substantially lower risk for developing illicit drug use problems when compared to individuals who have ever consumed alcohol. A total of respondents who were lifetime alcohol abstainers were removed from the dataset, bringing the sample size to 3, Survey respondents were questioned on usage of any illicit drug and in turn on usage of specific types of drugs cocaine, heroin, marijuana, LSD, etc.
Respondents who had used an illicit drug within the past twelve months were classified as current users of a drug. The marijuana use variable was analyzed directly, whereas all other drug use categories e.
Statistically significant spatial clusters can be defined as geographically bounded groups of events where the actual number of events exceeds the expected number when compared to a distribution such as Poisson or Bernoulli.
This method has been used to examine breast cancer rates at the county level [ 38 ], alcohol mortality at the county level [ 39 ], and West Nile Virus activity [ 40 ].
When the window encounters a new case, elevated risk is tested with the likelihood function on the events within the window compared to those outside, which allows both high and low clusters to be detected. The likelihood function for the Bernoulli model was used in this research.
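The equation itself does not survive in the text. As a hedged sketch, the code below assumes it is the standard Bernoulli-model likelihood of the spatial scan statistic, in which c cases are observed among n individuals inside the window and C cases among N individuals in the whole study area. The function names and the counts in the example are hypothetical.

```python
import math


def _xlogy(x, y):
    """Return x * log(y), treating 0 * log(0) as 0."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_log_likelihood(c, n, C, N):
    """Log-likelihood with separate case rates inside and outside the window.

    c, n: cases and individuals inside the window.
    C, N: cases and individuals in the whole study area.
    """
    if not 0 < n < N:
        raise ValueError("window must hold some but not all individuals")
    out_cases, out_n = C - c, N - n
    inside = _xlogy(c, c / n) + _xlogy(n - c, (n - c) / n)
    outside = _xlogy(out_cases, out_cases / out_n) + _xlogy(
        out_n - out_cases, (out_n - out_cases) / out_n
    )
    return inside + outside


def log_likelihood_ratio(c, n, C, N):
    """Log-likelihood ratio against the null of one common case rate."""
    null = _xlogy(C, C / N) + _xlogy(N - C, (N - C) / N)
    return bernoulli_log_likelihood(c, n, C, N) - null


# Hypothetical counts: 9 cases among 10 people in the window,
# 50 cases among 100 people overall.
print(log_likelihood_ratio(9, 10, 50, 100))
```

The ratio is zero when the case rate inside the window equals the rate outside and grows as the two rates diverge in either direction, which is what allows both high and low clusters to be flagged.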
The detected clusters are then tested against a simulated Monte Carlo distribution of the data set generated under the null hypothesis. This method allows multiple clusters of both high and low use to be simultaneously detected [ 41 ]. Secondary clusters have overestimates of their true p -values because they are compared to the most likely clusters from the simulations [ 42 ]. This method is especially valuable due to its ease of use particularly in combination with GIS , applicability to both point and area data, controls for multiple comparisons and population density, and incorporation of covariate and temporal analysis which can aid in its real-world implementation for surveillance of drug-related health problems and service assessments.
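The Monte Carlo test can be sketched as a rank-based p-value: the observed statistic for a cluster is compared with the maximum statistics obtained from data sets simulated under the null hypothesis. This is a minimal illustration of the general technique and not SaTScan's code; the function name and the values are hypothetical.

```python
def monte_carlo_p_value(observed, simulated):
    """Rank-based p-value of an observed statistic.

    `simulated` holds the maximum statistic from each data set
    generated under the null hypothesis.
    """
    at_least_as_large = sum(1 for s in simulated if s >= observed)
    # The observed data set counts as one of the replications.
    return (at_least_as_large + 1) / (len(simulated) + 1)


# Hypothetical values: the observed statistic beats all 999 simulations.
print(monte_carlo_p_value(10.0, [1.0] * 999))
print(monte_carlo_p_value(0.5, [1.0, 0.2, 0.7]))  # 0.75
```

Because the smallest attainable value is one over the number of simulations plus one, running more iterations gives a finer resolution for the p-value at the cost of longer run times.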
Spatial cluster analysis incorporated with the use of GIS mapping capabilities offers a wealth of potential applications in research of illicit drug-related phenomena in both searching for and analyzing identified clusters.
Spatial data of possible correlates or causes can be incorporated with detected clusters in GIS, but issues such as latency in exposure, migration and activity space of individuals within a population, and the differing influences of direct and mediated effects of environmental and social factors obfuscate the understanding of clustering processes and remain stumbling blocks for the development of more sophisticated and powerful theories and methods.
SaTScan v4. The user can specify the grid of coordinates used by the scanning window, frequently polygon centroids when using area data, as well as the maximum size of the scanning window as a percentage of the study population and the number of simulation iterations for the generated distribution. In this case, the analysis takes an object-oriented approach using the coordinates of the underlying population as centers for the moving circles and uses the default settings of a 50 percent scanning window and iterations.
The default scanning window setting is the maximum window size and allows for smaller clusters to be detected as well as the largest possible clusters. A higher number of iterations increases the accuracy of the resultant p -values but takes more time, whereas fewer simulations yield slightly more uncertain p -values.
This method can be computationally challenging in terms of time needed to run the cluster analysis for a personal computer, which was the platform used for these analyses. This analysis takes advantage of the overlay capabilities of these programs, allowing the user to layer multiple sources of spatial data on the clustered populations, risk indicators, street network, municipal boundaries, and other pertinent information.
Using the spatial output database of clustered and unclustered populations, appropriate statistical analyses such as cross-tabulations, mean comparisons, and ANOVA can be used to compare clustered and unclustered population groups. Significant spatial clusters were found for this case study for both marijuana use and hard drug use. The clusters were not the same for the marijuana use group (figure 12) and the hard drug use group (figure 13), although there was some overlap in high use cluster members.
The high use clusters for both drug categorizations are centered in the city of Buffalo, though the hard drug use cluster is smaller and focused in the western and north central part of the city.
The marijuana high use cluster extends slightly beyond the city boundary. Additionally, the detected cluster of low use of marijuana extends across the southeastern part of the county, containing an area that is primarily suburban and rural in character; no low hard drug use cluster was detected, suggesting that usage patterns are similar throughout the county outside of the urban high use cluster.
The analyses discussed here are primarily focused on examining the relationship between the risk indicators and clusters of substance use by comparing these data sets.
These analyses assess: The tables show the proportion of each clustered drug user group that lives within the highest risk quartile for all of the RIDB indicators. Characteristics for Marijuana Use Clusters: Characteristics for Hard Drug Use Clusters: