The problem
There are many geocoders; the best known are Google Maps, Bing Maps and OpenStreetMap. The first two are commercial services that provide an API. The last is a collaborative open source project that provides both APIs and raw data. All of them pose problems for companies that need to integrate geocoding information into their private information systems.
Unfortunately the Google Maps APIs, like many commercial geocoding services, are not suitable for generating data to be included in a corporate knowledge base because of their stringent license restrictions: the Google Maps geocoding APIs may only be used in conjunction with a Google Map service, and using geocoding results without displaying them on a map is prohibited. See the Google Maps Geocoding API Usage Limits for more information.
OpenStreetMap (OSM) does not suffer from such license restrictions and exposes both an API and a data interface. Although it is a very good service, the OSM API does not provide any SLA and the public geocoding server is rate limited: no more than one query per second is allowed. The OSM data interface requires an expensive ETL (Extract, Transform, Load) process that must be repeated very often.
Besides this, in all of these systems civic (house) numbers, mainly in rural areas, are not always accurate. As a consequence, creating and maintaining an accurate geocoding knowledge base using these data sets can be really expensive.
The solution
The GeocodIT project proposes a solution to this problem that leverages the many existing linked (open) data sets and semantic web practices to develop a near-zero-maintenance private geocoder.
The geocoder exposes:
Moreover, LinkedData.Center has software agents that recognize KEES descriptions to manage automatic data ingestion and data updating. This makes the upload of multiple datasets automatic and manageable.
The big data sources
OpenStreetMap (OSM) is a collaborative project to create a free editable map of the world. It is licensed under the Open Data Commons Open Database License, a very open license that allows the reuse of data for any purpose. OSM data are exposed in a ready-to-use 5-star format through the LinkedGeoData project. This allows LinkedData.Center to connect to them effortlessly.
Even though OSM coverage is still lower than that of better-known services (such as Google Maps or Bing Maps), the accuracy of its maps is remarkable because all the data are collected and validated by humans. Moreover, the number of active contributors is constantly rising.
Geoportale Nazionale is a portal produced by the Italian Ministry for the Environment and the Protection of Natural and Marine Resources. It offers a series of geo-referenced web services. Among the services offered there is a dataset of house numbers. This dataset is distributed under a Creative Commons BY-SA license. The database was last updated in 2012 and, even though it is not complete and contains some inconsistencies, it offers good coverage of the Italian territory. Data are supplied through a Web Map Service and must be converted before use.
Open data sources in the web long tail
Data sets on the web are growing explosively. Of course they’re still fragmented and of varying quality, but that will be fixed; it is only a matter of time. That is a great opportunity for companies. With Linked Open Data, it does not make sense to use only the biggest data sets, because the highest quality data are often in the smallest ones. These sit in the long tail of the available data sets. Hence you have to use them or you’ll lose a lot of value.
We spent some days scouting the web for resources containing specific geocoding data for the Italian territory. This scouting covered eGovernment portals (at country, regional and local level) and coordination with existing similar projects. A table containing some of the most interesting datasets analysed is available in the white paper you can download from LinkedData.Center. Focusing only on the Italian data released by institutional bodies, we found:
for a total of 178 different datasets from 54 unique data sources.
This picture summarizes dataset size:
The GeocodIT architecture
All required system components are available as a service or as open source code:
All GeocodIT components are released as open source PHP libraries. All the source code is shared in a GitHub repository under the MIT license.
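The one-query-per-second limit on the public OSM geocoding server, described in the problem statement, is a constraint that a client has to enforce on its own side. The following is a minimal sketch in Python of such a client-side throttle. It is an illustration only and is not taken from the GeocodIT code base: the names `Throttle` and `geocode_all`, and the `geocode` callable passed in by the caller, are hypothetical.

```python
import time
from typing import Callable, Dict, Iterable, Optional


class Throttle:
    """Keep consecutive calls at least `min_interval` seconds apart."""

    def __init__(
        self,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        delay = 0.0
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last_call = self._clock()
        return delay


def geocode_all(
    addresses: Iterable[str],
    geocode: Callable[[str], object],
    throttle: Throttle,
) -> Dict[str, object]:
    """Call the caller-supplied (hypothetical) `geocode` once per address, throttled."""
    results: Dict[str, object] = {}
    for address in addresses:
        throttle.wait()
        results[address] = geocode(address)
    return results
```

At one query per second, a batch of N addresses takes roughly N seconds at best, which is one reason a private geocoder built on local data can be attractive for bulk workloads.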