On the Contribution of Volunteered Geographic Information to Land Monitoring Efforts

Land-related inventories are important sources of geoinformation for environmentalists, researchers, policy-makers, practitioners, and ecologists. Traditionally, a considerable amount of energy, time, and money has been dedicated to mapping global, regional, and local land use datasets. While remote sensing images and techniques along with field surveying have been the main sources of data for determining land use features, field measurements of ground truth have always increased the required time and money, as well as the credibility of the information. More recently, volunteered geographic information (VGI) has contributed substantially to different scientific disciplines. This was made possible by Web 2.0 technologies and GPS-enabled devices, which have advanced citizen knowledge-based projects and made it easy for volunteers to collect and share their knowledge about geographical objects. OpenStreetMap (OSM), one of the leading VGI projects, has shown great potential for collecting and providing land use information. The land use features collected collaboratively by diverse citizens could greatly support the most challenging element of land use mapping, namely in-field data gathering. Hence, in this study we look at the completeness, thematic accuracy, and fitness for use of OpenStreetMap features for land mapping purposes over European countries. The empirical findings reveal that the degree of completeness varies widely, ranging from 2% to 96%, and that overall and per-class thematic accuracies reach up to 80% and 96%, respectively, compared to the European GMESUA datasets. Furthermore, more than 50% of the land use features of eight European countries are mapped. This indicates that harnessing citizens' knowledge can play a great role in land mapping as an alternative and complementary data source.


Introduction
Land cover (LC) and land use (LU) inventories contain geoinformation on the coverage and usage of our surrounding lands, respectively. LU and LC inventories are of high importance for many applications in urban and regional planning and policy making, among others. The two concepts are distinct: LU maps explain human activities happening on the land, such as artificial surface construction, farming, and forestry, which represent the usage of land (Ellis 2007; Wästfelt & Arnberg 2013), while LC maps present the physical cover on the ground (De Sherbinin 2002). Traditionally, applying image processing algorithms to remotely sensed data, supplemented with ground-truth measurements and other complementary archive data, has been the main means of collecting LU and LC features (Qi, Yeh, Li & Lin 2012; Saadat et al. 2011). Although remote sensing images and techniques often facilitate earth observation efforts, in-field surveying as well as personal interviews with local residents are required to validate the results, because ground-truth data coming from in-situ measurements play a critical role in delivering end products (Cihlar & Jansen 2001; De Leeuw et al. 2011). Therefore, ancillary data have to be collected as well in order to assign appropriate LU types to land parcels. As a result, LU mapping becomes even more complicated than LC mapping, and extensive data collection from local citizens, land managers, and evidence sources is vital for accurate LU mapping (Fritz et al., 2012).
From financial and temporal perspectives, a great deal of budget and time has been dedicated to producing LU and LC maps at global, regional, and local scales. Examples of global and regional products with coarse resolution include Global Land Cover (GLC)-2000 (Fritz et al., 2003), Moderate-resolution Imaging Spectroradiometer (MODIS; McIver & Friedl, 2002), GlobCover (Arino et al., 2012), CORINE 2000 (Büttner, Feranec, & Gabriel, 2002), and the Global Monitoring for Environment and Security Urban Atlas (GMESUA; Seifert, 2009), among others. In the case of GMESUA, high-resolution images including SPOT, RapidEye, and ALOS images have been utilized to generate fine-scale maps of large metropolitan areas (Kong, Yin, Nakagoshi, & James, 2012). However, the accuracy of these products has been the main concern, as outlined by Fritz et al. (2012) and Herold, Mayaux, Woodcock, Baccini, and Schmullius (2008). Thus, the necessity of an alternative and complementary solution for mapping LU and LC features is evident. We believe that VGI could be of great importance, because the development of web technologies and the wide availability of GPS-enabled devices have resulted in the emergence of a large number of VGI platforms, which provide information about geographical objects from citizens (Fonte, Bastin, See, Foody, & Lupia, 2015). The majority of VGI-like platforms offer very high-resolution satellite and aerial images (from 20 cm spatial resolution) through image libraries (e.g. Bing Maps) in their interfaces, which enable volunteers to view the whole globe in high detail so that they can map a large variety of features and attach respective attributes to them (Rouse, Bergeron, & Harris, 2007). In other words, a sort of visual analysis and interpretation of satellite images is applied. This convenient and straightforward way of visually interpreting remote sensing images can be considered an alternative solution for LU mapping, and even for achieving finer-resolution LU maps than the datasets currently available at a global scale (Jokar Arsanjani, Mooney, Helbich, & Zipf, 2015).
Undoubtedly, OSM has been a pioneering example of VGI and has shown its huge potential for being the Wikipedia of maps, in line with its motto. OSM is a unique platform for several reasons. It has attracted a huge amount of public attention and contributions (Ramm et al., 2011), with more than 2.3 million users to date, and continues to grow, as outlined by Jokar Arsanjani, Helbich, et al. (2015). More importantly, OSM is highly democratic in receiving contributions: it enables any volunteer to add, edit, or modify the existing features and shares the whole data history freely and openly with the public in a structured way (Flanagin & Metzger, 2008; Koukoletsos, Haklay, & Ellul, 2012). Moreover, OSM collects geographic information in the form of GIS vector data such as points, polylines, and polygons and releases them based on different tags, which makes it quite user-friendly for end users (Jokar Arsanjani, Helbich, Bakillah, Hagenauer, & Zipf, 2013; Jokar Arsanjani, Mooney, Helbich, et al., 2015).
An extensive amount of analysis of road networks in OSM has been carried out (Ludwig, Voss, & Krause-Traudes, 2011; Mooney & Corcoran, 2012), but only a few attempts at analyzing OSM for LU mapping have been made. We therefore assess the role of OSM in LU and LC mapping. Besides preparing a LU dataset from OSM contributions, we aim at (a) measuring the completeness of OSM LU features, (b) cross-comparing the thematic accuracy of the OSM LU features with the GMESUA data through a statistical assessment, and (c) assessing the fitness for use of OSM for LU and LC mapping.

OSM dataset
A snapshot of OSM features tagged as 'natural' and 'landuse' from November 2013 and February 2014 was collected. The features tagged with 'natural' describe a wide variety of physical features, which are grouped into different categories such as water bodies, forest, etc., as described in Ramm (2014). The term 'landuse' concerns the human use of land, which represents the purpose a land parcel is being used for.

Reference dataset
In this study, the pan-European GMESUA dataset serves as reference data. It comprises LU data for selected metropolitan areas exceeding 100,000 inhabitants. It is prepared for European needs, and the information it contains has been derived mainly from Earth Observation (EO) data supported by other reference data, including commercial-off-the-shelf (COTS) navigation data and topographic maps. It has a minimum mapping unit (MMU) of 0.25-1 ha and a minimum width of linear elements of 100 m, with ±5 m positional accuracy (European Union, 2011). It currently covers 305 urban regions within Europe. The minimum thematic accuracy for all classes is 80%. For more details see the Urban Atlas mapping guide (European Union, 2011). Table 1 presents the defined classes and their codes in GMESUA at different levels of detail.

Study areas
In this study, the whole European continent was chosen as the study area for the regional scale analysis, and ten random metropolitan areas were selected as case studies for the local scale analysis. These cities, including their metropolitan areas, are Berlin, Frankfurt am Main, Munich, Hamburg, Bucharest, Rome, Stockholm, London, Budapest, and Vienna. Having multiple case studies from different countries helps to understand the heterogeneity of contributions in terms of quantity and quality.
External quality measures the suitability of a dataset for a particular purpose and addresses its 'Fitness for Use' (FoU; Devillers, Bédard, Jeansoulin, & Moulin, 2007; Guptill & Morrison, 1995). The major standards organizations (e.g. ISO, ICA, FGDC, and CEN) have described their main criteria for data quality analysis, and the following five criteria are common amongst them: (1) completeness, (2) positional accuracy, (3) thematic accuracy, (4) temporal accuracy, and (5) logical consistency (Guptill & Morrison, 1995). In this study, two major aspects of internal data quality, namely completeness and thematic accuracy, are considered and their external use is discussed.
Following Figure 1, first, OSM features tagged with 'landuse' and 'natural' are retrieved and merged into a single dataset. Second, overlaps and topological errors in the dataset are resolved to ensure the logical consistency of features. Third, the OSM features are re-classified and matched according to the GMESUA nomenclature. Fourth, the percentage of completeness for each country or city is determined to measure how completely it is mapped. Finally, an error matrix between the OSM and GMESUA datasets is computed to measure the overall thematic accuracy of the OSM features along with a detailed per-class analysis.
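The re-classification step of this workflow can be sketched as a simple tag lookup. The following is a minimal illustration in Python, not the procedure actually used in the study: the tag-to-class pairs in `OSM_TO_GMESUA`, the layout of the feature dictionaries, and the function name `reclassify` are assumptions made for the example. Only the class codes (121, 200, 300, 500) are taken from the GMESUA classes named in this text.

```python
# Hypothetical lookup from an OSM (key, value) tag to a GMESUA class code.
# The pairs below are illustrative assumptions, not the study's mapping.
OSM_TO_GMESUA = {
    ("landuse", "industrial"): 121,  # Industrial, commercial, public, ... units
    ("landuse", "farmland"): 200,    # Agricultural + semi-natural + wetlands
    ("landuse", "forest"): 300,      # Forests
    ("natural", "wood"): 300,        # Forests
    ("natural", "water"): 500,       # Water
}


def reclassify(features):
    """Assign a GMESUA code to each feature; collect ids that have no match."""
    matched, unmatched = {}, []
    for feature in features:
        code = None
        # Check the two tag keys used in the study, 'landuse' first.
        for key in ("landuse", "natural"):
            value = feature["tags"].get(key)
            code = OSM_TO_GMESUA.get((key, value))
            if code is not None:
                break
        if code is None:
            unmatched.append(feature["id"])
        else:
            matched[feature["id"]] = code
    return matched, unmatched


# Made-up sample features for demonstration only.
sample = [
    {"id": 1, "tags": {"landuse": "forest"}},
    {"id": 2, "tags": {"natural": "water"}},
    {"id": 3, "tags": {"landuse": "village_green"}},
]
matched, unmatched = reclassify(sample)
print(matched)
print(unmatched)
```

Keeping the unmatched feature ids separate makes it visible which tags still lack a counterpart in the target nomenclature, which matters before any completeness or accuracy figure is computed.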

Regional (European) scale
Figure 2 presents the measured completeness indices across European countries. The index is calculated as the total mapped area in each country relative to the total area of that country. The values are diverse: while only 1.6% of Iceland is mapped, 96% of Bosnia and Herzegovina is mapped.
More than half of Belgium, Bosnia & Herzegovina, Germany, France, Luxemburg, the Netherlands, Romania, and Slovakia is mapped. The spatial distribution of the mapped features within Europe is displayed in Figure 3 by green cells. It should be noted that, because European countries have dissimilar population and physical patterns, these completeness values should not be used on their own for judging citizen participation in OSM. For instance, Iceland, with an area of 103,000 km² and nearly 300,000 inhabitants, is the least mapped country, which is not comparable with the Netherlands, which has an area of 41,500 km² and nearly 17 million inhabitants and is one of the best mapped countries (82%). Likewise, while the completeness index for Sweden is reported as almost 13%, more than half of this country is covered by forests. This explains the low completeness index value, as few residents live there or mappers do not prioritize mapping forests. This heterogeneity and inequality of public participation should be further investigated, as outlined in Jokar Arsanjani and Bakillah (2015).
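The completeness index described above is a plain area ratio, and a short sketch makes the arithmetic explicit. The function name `completeness_index` is an assumption for illustration. The total area of the Netherlands is taken from the text, while the mapped area is back-calculated from the reported 82% and is not a published figure. The sketch also assumes that overlapping polygons have already been resolved, so that no area is counted twice.

```python
def completeness_index(mapped_area_km2, total_area_km2):
    """Return the share of an area covered by mapped LU features, in percent."""
    if total_area_km2 <= 0:
        raise ValueError("total area must be positive")
    return 100.0 * mapped_area_km2 / total_area_km2


# Total area from the text; the mapped area is back-calculated from the
# reported 82% purely for illustration and is not a published figure.
netherlands_total_km2 = 41_500
netherlands_mapped_km2 = 34_030

print(round(completeness_index(netherlands_mapped_km2, netherlands_total_km2), 1))
```

Because the denominator is the whole national territory, a sparsely populated or heavily forested country can score low even where its settled areas are well mapped, which is the caveat raised for Iceland and Sweden.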

Local (metropolitan) scale
The degree of completeness at the local level, i.e. for metropolitan areas in several countries, was also checked, and a wide range of values was obtained, from 39% for Frankfurt to 100% for Bucharest. These values are shown in Figure 4.

Thematic accuracy
Apart from completeness, thematic accuracy is a key criterion for judging the quality of the contributed LU features. It explores how properly the land parcels are tagged. Thematic accuracy is usually called 'accuracy assessment' in LU/LC classification studies and reflects the difference between a target dataset and a reference dataset (Congalton, 1991; G. M. Foody et al., 2013; Giles M Foody, 2002). It is carried out by summarizing all data in a confusion matrix (i.e., error matrix) and calculating several indicators, including 'overall/per-class accuracies', 'Kappa index of agreement', 'user's accuracy', and 'producer's accuracy' (Giles M Foody, 2002; Herold et al., 2008). In this study, a confusion matrix analysis is applied to obtain these measures. The overall accuracy is calculated by dividing the number of identical pixels by the total number of pixels. However, it does not identify how well individual classes match between the two datasets. Hence, the user's accuracy and producer's accuracy should be calculated to measure the accuracy of each class (Herold et al. 2008). The user's accuracy indicates the probability that a pixel from the OSM LU map actually matches the GMESUA dataset, while the producer's accuracy refers to the probability that a specific LU type from the reference dataset is classified as such. These two measurements are not necessarily equal. For instance, if the land type 'farming' achieves a user's accuracy of 75% and a producer's accuracy of 82%, this implies that, from the user's perspective, roughly 75% of all pixels classified as 'farming' are the same in the reference dataset and, from the producer's perspective, only 82% of all 'farming' pixels in the reference dataset are classified as such (Jokar Arsanjani et al., 2015).
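The measures named above can be derived from a confusion matrix in a few lines. The sketch below is an illustration under stated assumptions rather than the code used in the study: rows are taken to be OSM classes and columns reference (GMESUA) classes, the function name `accuracy_measures` is hypothetical, and the two-class pixel counts are invented for the example. Kappa is computed with the standard formula (observed agreement minus chance agreement, divided by one minus chance agreement).

```python
def accuracy_measures(matrix):
    """Compute accuracy indicators from a square confusion matrix.

    matrix[i][j] is the number of pixels labelled class i in the OSM map
    (rows) and class j in the reference dataset (columns).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(n)]
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    overall = sum(diagonal) / total
    # User's accuracy: agreement relative to what the OSM map shows.
    users = [d / r if r else float("nan") for d, r in zip(diagonal, row_totals)]
    # Producer's accuracy: agreement relative to what the reference shows.
    producers = [d / c if c else float("nan") for d, c in zip(diagonal, col_totals)]
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (overall - expected) / (1 - expected)
    return {"overall": overall, "users": users,
            "producers": producers, "kappa": kappa}


# Invented counts: class 0 = 'farming', class 1 = everything else.
example = [
    [75, 25],
    [15, 85],
]
result = accuracy_measures(example)
print(result)
```

With these invented counts, 'farming' has a user's accuracy of 75/100 and a producer's accuracy of 75/90, which shows how the two per-class measures can differ even though they share the same diagonal cell.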
In order to assess how well LU types in each city are mapped, the Kappa index, overall accuracy, user's accuracy, and producer's accuracy are calculated. Due to heterogeneous accuracies across cities, the interpretation of the confusion matrix is discussed for each city separately in Jokar Arsanjani, Mooney, Zipf, et al. (2015) and Jokar Arsanjani and Vaz (2015). In addition, the geographical distribution of agreements and disagreements is visualized in Jokar Arsanjani, Mooney, Zipf, et al. (2015). In general, land classes such as Isolated structures [113], Industrial, commercial, public, military and private units [121], Agricultural+semi-natural+wetlands [200], Forests [300], and Water [500] show the highest level of agreement between the two datasets. In contrast, the remaining classes show disagreement, assuming that they are correctly reflected in the reference (GMESUA) dataset. This raises the question of whether OSM or the reference dataset represents the correct classification. Finally, it can be concluded that the contributed OSM-LU features are heterogeneously distributed both inside and outside urban areas, which confirms the availability of LU features in both urban and rural areas.

Conclusions and recommendations
The recent emergence and rapid evolution of VGI platforms such as OSM has involved a massive number of citizens in collecting and sharing geolocated information and attributes about geographical objects. This bottom-up process of collecting individuals' contributions has resulted in big (geo)data, which has enabled new applications such as indoor mapping (Goetz & Zipf, 2010), routing applications (Bakillah et al., 2014), tourism recommendations (Sun, Fan, Bakillah, & Zipf, 2013), and environmental monitoring (Fritz et al., 2012; Jokar Arsanjani & Vaz, 2015). Although the question of how to attract users and keep them active in crowdsourcing activities is yet to be addressed, OSM has shown continuing success in attracting more than 2.7 million users. A considerable potential in OSM therefore exists and is yet to be further explored. In this study, we comparatively evaluated the completeness of the contributed OSM-LU features across Europe as well as their thematic accuracy in ten large metropolitan areas, in order to find out how reliably they can be exploited.
The results show that the thematic quality of OSM features ranges from the 'moderate' to the 'substantial' rank in terms of Kappa indices and overall accuracies. Per-class analysis of the LU types shows that, depending on the city, Isolated structures [113], Industrial, commercial, public, military and private units [121], Road and rail networks and associated land [122], Sport and leisure facilities [142], Agricultural+semi-natural+wetlands [200], Forests [300], and Water [500] reach the 'substantial' rank of accuracy, which means that these classes are highly usable. It should be noted that integrating ground-truth information with other reference data for accuracy assessment could be an alternative approach for producing hybrid LU datasets.
From a temporal accuracy perspective, archived images from 2005-2010 were used for mapping the reference LU data, and this could have caused the abovementioned disagreements. The OSM-LU features, by contrast, have mainly been uploaded since 2009, and therefore some information from OSM might be even closer to reality than our reference data. Moreover, the MMU of the GMESUA datasets is 0.25-1 ha, so land parcels smaller than this MMU are ignored in the course of mapping, while in OSM even smaller parcels are mapped, i.e. a smaller MMU is possible in OSM. This means that, in some places, an area represented by a single polygon with one specific LU type in the GMESUA dataset is covered in the OSM-LU dataset by multiple small polygons showing multiple land types.
Concerning the volunteers' recognition of LU features, the citizens' perception of LU types should be further investigated to understand the way they visually interpret LU types from the online image libraries in OSM. As a final conclusion, the OSM-LU features represent a promising data source for updating LU inventories. Most likely, the longer OSM exists, the more contributions will be received, and consequently higher data quality can be achieved.
This study also offers some recommendations to LU researchers, environmental scientists, and policy makers, among others, which may steer future research in more suitable directions. Based on the presented completeness indices across Europe, as well as the accuracy values of the selected cities, the contributed OSM-LU features are a potential alternative data source for mapping LU. Further studies on other areas must be conducted to explore the heterogeneity of completeness and thematic accuracy across space. Furthermore, applying data mining techniques and data fusion with national and regional datasets (e.g., GMESUA) to extract the LU information of unmapped areas is of high importance. Additionally, the land types with the highest reliability can be separately incorporated into respective applications. This enables experts to: (a) possibly find ways to draw the attention of volunteer mappers to mapping LU features by highlighting their importance for more effective environmental monitoring, (b) possibly improve the OSM ontology of the LU dataset, and (c) maximize the efficiency of OSM for LU mapping, given that users are not able to add further features in urban areas because the massive volume of mapped objects (e.g. POIs, roads, buildings, etc.) leaves them too little space for adding LU features.

Figure 1: The flowchart of evaluating OSM land use features.

Figure 2: The calculated completeness index of OpenStreetMap land use features for European countries.

Figure 3: Spatial distribution of land use features from OpenStreetMap in Europe.

Figure 4: Completeness index of OpenStreetMap land use features for ten large metropolitan areas.

Table 1: Classification scheme applied in the preparation of GMESUA datasets as outlined in European Union (2011).