Server-side geo clustering for Drupal 7

More Intro
1. More
  1. Clustering Theory
    1. Names, Keywords
      1. Geospatial clustering
      2. Analytical regionalization
      3. Spatially constrained clustering
    2. Approaches
      1. Where to cluster
      2. Clustering Client-side
      3. Clustering by Logic
      4. Clustering by Database
      5. When to cluster
      6. realtime / on-request
      7. Pre-cluster / store clusters in db
      8. How to cluster
      9. Questions
      10. what about performance?
      11. how do filters work?
      12. how well does it integrate with current drupal mapping technologies?
      13. how does caching work?
      14. how does client interaction work?
      15. how does it work with and without spatial database capabilities?
  2. Articles, soso
    1. Google maps with lots of data, comparison of libs
      1. http://www.svennerberg.com/2009/01/handling-large-amounts-of-markers-in-google-maps/
    2. Damn Cool Algorithms: Spatial indexing with Quadtrees and Hilbert Curves (Geohash)
      1. http://blog.notdot.net/2009/11/Damn-Cool-Algorithms-Spatial-indexing-with-Quadtrees-and-Hilbert-Curves
      2. quote: Quadtrees -> Geohashes
    3. article
      1. link
      2. http://web.archive.org/web/20071121140547/http://trib.tv/tech/clustering-points-on-a-google-map/
      3. http://trib.tv/
      4. Basic Types
      5. Well-separated: Points belong to a cluster when they are closer to every point in the cluster than they are to any point not in the cluster. This requires each cluster of points to be separated by a distance at least equal to the diameter of the largest cluster at its longest point.
      6. Prototype-based: Points belong to a cluster when they are closer or more similar to the cluster’s prototype, i.e. a point representing the cluster as a whole, than they are to any other cluster’s prototype. This tends to result in circular clusters since the prototype (in a geospatial dataset anyway) is usually the centrepoint of the cluster.
      7. Graph-based: A cluster is defined as a set of points that are all connected, directly or via a chain of other points, to each other, and which are not connected to any point not in the cluster.
      8. Density-based: A cluster exists where there is a region of high density surrounded by a region of low density. These clusters can end up being any shape, and vary widely in size.
      9. Conceptual: A cluster exists where all points in the cluster share a common attribute which is not present in any point outside the cluster.
      10. Implementation types
      11. Variable gridding
      12. Interleaved digits
      13. K-Means
      14. http://macwright.org/2012/09/16/k-means.html
      15. QT-Clust
      16. DBSCAN
      17. Agglomerative Hierarchical Clustering
      18. Grid based viral growth algorithm
2. More
  1. Utils
    1. php haversine (c)
      1. https://github.com/php-geospatial/geospatial
    2. PHP hilbert curve
      1. http://sourceforge.net/projects/hilbert-curve/
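Sketch for the haversine utility noted under Utils: a minimal Python version of the great-circle distance, assuming a spherical Earth with a mean radius of 6,371 km. The function name and argument order are illustrative and are not taken from the linked PHP extension.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in metres (spherical model)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine term; clamp before asin to guard against rounding above 1.0.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))
```

The spherical model is usually accurate enough for deciding whether two markers are close enough to merge; an ellipsoidal formula would be needed for survey-grade distances.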
Introduction
1. Motivation
  1. Usability
    1. summarize data at high zoom levels by clustering
    2. allow exploration of individual points at lower zoom levels
  2. Speed
    1. Browser javascript/memory limitations
    2. Even more for mobile
    3. "sebgruhier: performance is the key word. To be efficient (according to me) enough it has to be done around 100ms"
  3. http://blog.davebouwman.com/2012/03/24/server-side-clustering-why-you-need-it/
  4. http://www.svennerberg.com/2009/01/handling-large-amounts-of-markers-in-google-maps/
2. Problem statement
3. Aim of the work
  1. Use cases
  2. Examples
    1. http://www.crunchpanorama.com/
    2. http://gmaps-utility-library.googlecode.com/svn/trunk/markerclusterer/1.0/examples/speed_test_example.html
    3. http://www.panoramio.com/map/
    4. http://www.usda.gov/recovery/map/
4. Methodological approach
5. Structure of the work
Foundations
1. Clustering foundations
  1. What is clustering
    1. why cluster data
    2. what is a cluster
    3. Task
      1. Pattern proximity, similarity
      2. grouping
    4. History
  2. Cluster types
    1. Well-separated
    2. Prototype-based
    3. Graph-based
    4. Density-based
  3. Aspects
    1. Agglomerative vs. divisive
    2. Monothetic vs. Polythetic
    3. Hard vs. Fuzzy
    4. Incremental vs. Non-incremental
  4. Distance
    1. Euclidean
    2. Manhattan
    3. Chebyshev
  5. Representation of Clusters
  6. Large data sets
2. Algorithms
  1. Hierarchical
    1. Single Link
    2. Complete Link
  2. Partitional
    1. Square Error (k-means)
    2. Graph theoretic
    3. ...
3. Spatial Data
  1. Quadtrees
  2. Order, Morton, Peano-Hilbert
  3. Geohash
4. Web Mapping
  1. About
    1. http://en.wikipedia.org/wiki/Web_mapping
    2. Geospatial web technologies
    3. Collaborative mapping
      1. http://en.wikipedia.org/wiki/Collaborative_mapping
      2. OpenStreetMap
      3. Google Map Maker
      4. Licensing
      5. http://mashable.com/2012/01/19/google-maps-world-bank/
      6. Wikimapia
      7. http://wikimapia.org/
    4. http://en.wikipedia.org/wiki/Geosocial_networking
    5. Classification
      1. static vs dynamic
      2. http://kartoweb.itc.nl/webcartography/webmaps/classification.htm
    6. The layered mapping stack
    7. Functional mapping metrics
    8. Technological mapping metrics
  2. Location based services
    1. spatial db
    2. raster vs vector mode
    3. map layers - features and themes
  3. Projections
    1. http://www.progonos.com/furuti/MapProj/Dither/CartHow/cartHow.html
  4. Basemap Tiles
    1. Ready to go
      1. MapQuest
      2. MapBox
      3. OpenStreetMap
      4. CloudMade
      5. Google, ...
    2. Customize Services
    3. Tile Tools
      1. TileMill
      2. mapquest background tiles
      3. just tick 'World baselayer' when generating the embed
      4. http://map.peoplesdistrict.com/fullmap.html
      5. Maperitive
      6. http://maperitive.net/
  5. Overlays
    1. SVG vs Canvas
      1. https://groups.google.com/d/msg/d3-js/4CQ7tmpDi-E/0auSzBMu10gJ
  6. Dynamic Overlays vs. static tiles
    1. http://developmentseed.org/blog/2011/sep/28/gains-dynamic-maps-bridging-couchdb-sqlite/
    2. push data from Drupal into SQLite at regular intervals
    3. UTFGrid
      1. http://mapbox.com/mbtiles-spec/utfgrid/
  7. Interaction
    1. None
    2. Per-point
    3. Hybrid
  8. Database
    1. MySQL Spatial
      1. http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html
    2. PostGIS PostgreSQL
      1. http://postgis.org/
    3. SOLR
      1. MetaCarta - GeoSearch Toolkit for Solr (proprietary)
      2. http://www.metacarta.com/products-overview.htm
      3. http://lucidworks.lucidimagination.com/display/solr/Spatial+Search
    4. http://cartodb.com/
  9. Logic
    1. GEOS
      1. https://github.com/phayes/geoPHP/wiki/GEOS
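Sketch for the Morton order entry under Spatial Data above: a Morton (Z-order) code interleaves the bits of two grid coordinates so that a 2D cell maps to a single sortable integer. The quantisation of lat/lon onto a 2^bits grid and both function names are assumptions made for illustration.

```python
def interleave_bits(x, y, bits=16):
    """Morton code: bits of x land on even positions, bits of y on odd positions."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_for_point(lat, lon, bits=16):
    """Quantise a lat/lon pair (degrees) onto a 2^bits grid and return its Morton code."""
    cells = 1 << bits
    # Clamp so that lat = 90 and lon = 180 fall into the last cell.
    x = min(cells - 1, int((lon + 180.0) / 360.0 * cells))
    y = min(cells - 1, int((lat + 90.0) / 180.0 * cells))
    return interleave_bits(x, y, bits)
```

Dropping the lowest 2n bits of such a code yields the code of the enclosing cell n levels up, which is the same prefix property that Geohash exposes as a string.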
State-of-the-art
1. Web Mapping Stack
  1. Mapping JS
    1. Compare them
      1. http://www.spatialanalysis.ca/2012/alternatives-to-google-maps/
      2. http://geotux.tuxfamily.org/index.php/en/geo-blogs/item/291-comparacion-clientes-web-v6
      3. http://www.smartmapbrowsing.org/html/index_en.html
      4. new modest + leaflet
      5. http://mapbox.com/blog/modest-maps-and-leaflet-new-choices-web-apis/
    2. OpenLayers
      1. The Wary Guide to OpenLayers
      2. http://macwright.org/2012/01/12/openlayers.html
      3. OpenLayers with Canvas
      4. http://trac.osgeo.org/openlayers/wiki/Future/OpenLayersWithCanvas
      5. openlayers 3 plans
      6. http://openlayers.org/blog/2012/11/14/why-are-we-building-openlayers-3/
      7. goals for ol3
      8. http://www.youtube.com/watch?v=cgHudJim07o
      9. ol plus
      10. support many protocols & formats
      11. support multiple projections
      12. ol minus
      13. shows its age (-> make it look more like leaflet, modern web application)
      14. small - fast loading and quick running
      15. still powerful (keep all features)
      16. ensure compatibility (automated tests)
      17. friendly - clean API + pretty UI
    3. Leaflet
    4. Modest Maps
    5. polymaps
      1. http://polymaps.org/
      2. SVG-based large-scale data overlays on interactive maps
    6. d3 / Protovis
      1. http://mbostock.github.com/d3/
      2. http://mbostock.github.com/d3/tutorial/protovis.html
      3. D3 + maps
      4. https://groups.google.com/forum/?fromgroups#!topic/d3-js/4CQ7tmpDi-E
      5. basic d3maps integration
      6. https://github.com/bloomtime/d3map
    7. Other
      1. Tile5
      2. http://www.tile5.org/
      3. Canvas
      4. Openlayers heatmap canvas
      5. http://www.websitedev.de/temp/openlayers-heatmap-layer.html
      6. SVG vs Canvas in Openlayers
      7. http://unterbahn.com/2010/07/comparison-of-svg-and-canvas-in-openlayers/
2. Drupal & Mapping
  1. General
    1. Drupal Geo Stack
      1. http://groups.drupal.org/node/138884
      2. http://groups.drupal.org/node/89769
    2. Mapping Book
    3. Mapping
      1. http://drupal.org/project/mapping
      2. http://groups.drupal.org/node/91114
      3. Podcast
      4. http://drupaleasy.com/podcast/2012/01/drupaleasy-podcast-73-lots-options
      5. 05:35; Drupal was one of the first CMS to integrate with Maps
      6. 13:30; TileMill
      7. 19:00; Client-side performance issues & clustering
      8. 24:35; Geocode & center map on users location by using HTML5 Geolocation
      9. 29:53; OpenLayers complexities, CTools, Display using Views, Panels or API
      10. 34:05; OpenLayers related modules: Geofield, Geocoder, ...
      11. 35:20; Geofield in Drupal 7 instead of Location module before
      12. 37:40; Recommended setup - Quickstart for storing locations and displaying them
      13. OpenLayers + Geofield + Addressfield
      14. GMap + Location
      15. 42:10; Leaflet (Cloudmade)
      16. 44:55; Book wrapup
      17. 46:25; Book introduction chapters + Motivation on mapping
      18. 49:25; Popular mapping blogs, cartography, best map of the year
      19. 1:06; Location CCK migrate sandbox
      20. transforms D6 location_cck fields into D7 geofields
      21. 1:07; Geocoder, get geospatial data from address field into geofield
      22. 1:09:20; Baraka samsara films - map site use case
      23. http://barakasamsara.com/
      24. http://stanford.edu/group/ruralwest/cgi-bin/drupal/visualizations/us_newspapers
      25. http://brandonmorrison.com/sites/default/files/presentations-deckjs/drupal-geospatial/index.html
  2. Storage
    1. Geofield
      1. Presentation
      2. http://drupal.org/node/1570972
    2. PostGIS
      1. http://drupal.org/project/postgis
    3. Sync PostGIS
      1. http://drupal.org/project/sync_postgis
  3. Query + Display
    1. OpenLayers
      1. 3.x
      2. http://drupal.org/node/1353122
    2. Leaflet
      1. Widgets
      2. http://drupal.org/sandbox/Chi/1796796
      3. https://drupal.org/sandbox/GoalGorilla/1792420
      4. http://drupal.org/sandbox/thegreat/1802616
    3. Gmap
    4. Views GeoJSON
      1. http://drupal.org/project/views_geojson
      2. OL integration
      3. http://drupal.org/node/1370448
      4. Documentation, use cases
      5. http://drupal.org/node/1471026
      6. Bounding box filter
      7. http://drupal.org/node/1333324
    5. Leaflet GeoJSON
    6. Drupal SOLR
      1. Drupal Apachesolr Geo sandbox
      2. http://drupal.org/sandbox/pwolanin/1497066
      3. Search API Location
      4. http://drupal.org/project/search_api_location
      5. Support official Solr 3 spatial search APIs
      6. http://drupal.org/node/1328490
      7. Sandbox
      8. http://drupal.org/sandbox/madmatter23/1799850
      9. Search API
      10. Support location based search
      11. http://drupal.org/node/1744250
      12. Geofield
      13. Apache Solr 3.5 update
      14. http://drupal.org/node/1800570
      15. Blog
      16. http://ericlondon.com/posts/250-geospatial-apache-solr-searching-in-drupal-7-using-the-search-api-module-ubuntu-version-part-2
      17. OpenLayers Solr
      18. http://drupal.org/project/openlayers_solr
      19. solarium...
    7. Other
      1. Polymaps (depends on mapping)
      2. http://drupal.org/project/polymaps
      3. Mapstraction (only D6)
      4. http://drupal.org/project/mapstraction
      5. Modestmaps (no code)
      6. http://drupal.org/project/modestmaps
  4. Scenarios
    1. Geofield + Sync_Postgis + Tilestache + Openlayers
      1. http://affinitybridge.com/blog/gis-tools-pnwds-session-video
    2. Article Server-side mapping
      1. http://affinitybridge.com/blog/server-side-mapping
    3. GeoServer + PostGIS
      1. http://www.geops.de/blog/64-spatial-data-and-drupal-7
    4. tilestache + drupal + leaflet
      1. https://github.com/affinitybridge/drupal-tilestache
      2. https://img.skitch.com/20120426-jfu9fj5c17y5rfa9392y48xmb7.jpg
  5. More
    1. http://denver2012.drupal.org/program/sessions/nodejs-javascript-and-future
    2. Drupal Mapping presentation by levelos (overview + openlayers & leaflet focus)
    3. http://www.slideshare.net/loubabe/drupal-mapping-9713919
    4. Drupal Mapping presentation by zzolo 2012
      1. https://github.com/zzolo/spatially-drupal
    5. vector tiles
      1. [2:01pm] friedjoff: dasjo: maybe vector tiles might be worth looking into
      2. [2:01pm] tnightingale: some data doens't make sense to be pushed into drupal
      3. [2:01pm] tnightingale: vector tiles++
      4. [2:02pm] tnightingale: though last i looked, polymaps was the only lib that supported them
    6. Mapping Office hours log
      1. Notes + log
      2. https://docs.google.com/document/d/19F8XCwyxQc4JwouNxDDspiDCjPAWFsVb--xxOyedpmw/edit?pli=1
      3. server-side-clustering
      4. [1:52pm] zzolo: there is still a pretty big hole when trying to map (and handle) lots of geospatial features in Drupal
      5. [1:53pm] phayes: zzolo - I think the answer there is a combination of ViewsGeoJSON / server-side clustering
      6. [1:58pm] dasjo: final question before i have to leave: do we need a server-side clustering like nod_ and me discussed? http://drupal.org/node/1547610
      7. [1:59pm] nod_: dasjo: phayes seems to agree, see backscroll
      8. [1:59pm] Brandonian: dasjo: Definitlely think it'd be a cool feature, not sure what other work is being done with that in php/drupal land though
      9. [2:00pm] zzolo: dasjo: my initial thought is no, if we can focus on toolking up thinks like postgis and tilestache. but since that is not really going to happen overnight, a good stop gap solon could be server side clustering
      10. [2:00pm] phayes: dasjo: Yes we need it! That's going to be critical as we scale to support larger data sets
      11. [2:00pm] dasjo: yep Brandonian zzolo. basically some parts of the database implementation and client-side handling should be pluggable
      12. [2:01pm] dasjo: but maybe we can come with some basic out-of-the-box implementations and leave it open to a database to optimize the clustering
      13. tilestache + drupal + leaflet
      14. log
      15. [1:59pm] tnightingale: yeah - tilestache is (using mapnik) is able to generate tiles on demand from a postgis backend
      16. [1:59pm] mackh: of data from the oil and gas commksion - like pipeline right-of-ways, access roads, wellsites, watercrossings
      17. [2:00pm] tnightingale: zzolo: so we can sync drupal data into postgis and render that data into tiles using mapnik+tilestache
      18. [2:00pm] zzolo: tnightingale: and mackh just to confirm, the postgi data is drupal data? (from sync_postgis, i assume)
      19. [2:00pm] zzolo: sweet
      20. [2:00pm] mackh: yes and no?
      21. [2:00pm] tnightingale: yep
      22. [2:01pm] mackh: stored in drupal, synced to postgis
      23. [2:01pm] tnightingale: but doesn't have to be
      24. [2:01pm] zzolo: awesome. thats what i was hoping. this is the stack i have been envisioning, but i don't get paid to do any of this stuff so i don't have time.
      25. [2:02pm] Brandonian: for the more python/mapnik literate in the room: How feasible is it to do mapnik renders based directly off drupal field data if the database is Postgres/postgis and stored properly?
      26. [2:02pm] tnightingale: Brandonian: that's essentially what we're doing with sync_postgis & tilestache
      27. [2:03pm] zzolo: Brandonian: what i image is getting your design in tile mill, export out the mapnik xml file, then it should be pretty easy
      28. [2:03pm] tnightingale: using a mapnik xml file produced in tilemill as our style guide
      29. [2:06pm] tnightingale: zzolo: we also have a rough cut at a tilestache config management module on g.h
      30. [2:06pm] phayes: ooo! link tnightingale?
      31. [2:07pm] phayes: Oh tnightingale, what's the status of D7 Spatial-tools?
      32. [2:07pm] zzolo: tnightingale: you have a link. i am actually gonna try to get this stack running locally this week as part of my presentation at the TC Drupal camp
      33. [2:07pm] mackh: https://github.com/affinitybridge/drupal-tilestache]
      34. [2:07pm] zzolo: i think this is definitely the future of geo
      35. [2:07pm] tnightingale: re: tilestache module - still pretty rough, be warned
      36. [2:08pm] zzolo: and i think postgis sync is the way to go to handle goespatial data and spatial querying.
      37. https://github.com/affinitybridge/drupal-tilestache
      38. https://img.skitch.com/20120426-jfu9fj5c17y5rfa9392y48xmb7.jpg
      39. search api
      40. [2:08pm] mackh: i also want to extend search api to return WKT onto maps
      41. [2:08pm] mackh: from faceted results, hoping to get that client funded in the next 3 months
      42. Office hours finishing
      43. log
      44. [2:07pm] tom_o_t: Brandonian: perhaps schedule dedicated Q&A time for stuff general questions that aren't related to module development?
      45. [2:08pm] Brandonian: tom_o_t: Not a bad idea.
      46. [2:09pm] jeffschuler: tom_o_t++ … this has been super useful, but It's been more of a facilitated discussion on high-level geo in drupal topics rather than a time for folks to come get help or figure out how they can pitch in… different from core office hours http://drupal.org/node/1242856
    7. Use Case
      1. Online-Biodiversitätsportale mit Indicia, Drupal und OpenLayers
      2. http://www.fossgis.de/konferenz/2011/programm/events/239.en.html
      3. CivicApps
      4. http://civicapps.org/
      5. OpenPublic Map Visualization Feature
      6. http://www.openpublicapp.com/map-visualization-feature
      7. broken?
    8. Geofield proximity updates by brandonian
      1. http://drupal.org/node/1469956#comment-6025452
      2. http://en.wikipedia.org/wiki/Haversine_formula
  6. Solr
    1. Articles
      1. Location-aware search with Apache Lucene and Solr
      2. http://www.ibm.com/developerworks/opensource/library/j-spatial/?ca=drs-
      3. Outdated localsolr
      4. https://issues.apache.org/jira/browse/SOLR-773
3. Cluster Implementations
  1. Client-side (Javascript)
    1. Distance Grid
      1. Leaflet Markercluster
      2. MapBox Clustr Library
      3. https://github.com/mapbox/clustr
    2. Neighbor-check
      1. Openlayers (client side)
      2. http://dev.openlayers.org/releases/OpenLayers-2.11/lib/OpenLayers/Strategy/Cluster.js
      3. Google MarkerClustering
      4. http://code.google.com/p/google-maps-utility-library-v3/wiki/Libraries#MarkerClusterer
      5. http://googlegeodevelopers.blogspot.com/2009/04/markerclusterer-solution-to-too-many.html
      6. http://google-maps-utility-library-v3.googlecode.com/svn/trunk/markerclusterer/docs/reference.html
      7. https://developers.google.com/maps/articles/toomanymarkers
    3. K-means
      1. Polymaps k-means
      2. http://polymaps.org/ex/cluster.html
      3. no interactivity, static
  2. Server-side
    1. Articles & recipes
      1. Google maps Perl impl + discussion
      2. Google Maps Hacks: Tips and Tools for Geographic Searching and Remixing
      3. http://flylib.com/books/en/2.367.1.102/1/
      4. see also downloaded CHM ebook
      5. K-means
      6. Method
      7. 1. Select a center point for each of k clusters, where k is a small integer.
      8. 2. Assign each data point to the cluster whose center point is closest.
      9. 3. After all the data points are assigned, move each cluster's center point to the arithmetic mean of the coordinates of all the points in that cluster, treating each dimension separately.
      10. 4. Repeat from Step 2, until the center points stop moving.
      11. Hierarchical clustering
      12. Method
      13. 1. Assign each point to its own cluster.
      14. 2. Calculate the distance from each cluster to every other cluster, either from their respective mean centers, or from the two nearest points from each cluster.
      15. 3. Take the two closest clusters and combine them into one cluster.
      16. 4. Repeat from Step 2, until you have the right number of clusters, or the clusters are some minimum distance apart from each other, or until you have one big cluster.
      17. Naïve grid-based clustering
      18. grid by display pixels / marker size
      19. assign points to clusters within grid
      20. order clusters by points
      21. make superclusters (heavy clusters claim their neighbors)
      22. Clustering Maps thesis
      23. DBSCAN + R-tree based implementation, tested with up to 400-500 items only (~1 sec)
      24. Demo http://www.wannesm.be/maps/
      25. PHP Implementation of Google Marker Clustering
      26. http://www.appelsiini.net/2008/11/introduction-to-marker-clustering-with-google-maps
      27. article + links
      28. http://labo.eliaz.fr/spip.php?article89
      29. Vizmo
      30. Hierarchical Clustering by Meaningful Units
      31. PHP, Symfony2, closed source?
      32. www.globalimpactstudy.org/wp-content/uploads/.../vizmo-poster.pdf
      33. http://www.globalimpactstudy.org/2011/12/open-source-presentation/
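Sketch for the four k-means steps listed above under Articles & recipes. This is a minimal illustration, assuming planar (x, y) tuples and squared Euclidean distance; the seeded random choice of initial centres and the `max_iter` cap are additions for reproducibility and are not part of the recipe. It stops when assignments no longer change, which implies the centres have stopped moving.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centers, assignment) for a list of (x, y) tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 1: pick k initial centre points
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each point to the closest centre.
        new_assignment = [
            min(range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        if new_assignment == assignment:  # step 4: nothing changed, so stop
            break
        assignment = new_assignment
        # Step 3: move each centre to the per-dimension mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers, assignment
```

For map data the main drawback is that k has to be chosen up front, whereas the number of visible clusters normally depends on zoom level and viewport.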
    2. Database
      1. PostGIS
      2. idea based on z-curve
      3. http://postgis.refractions.net/pipermail/postgis-users/2006-March/011431.html
      4. uses clustering on indices for fast retrieval
      5. http://workshops.opengeo.org/postgis-intro/clusterindex.html
      6. kmeans-postgresql
      7. http://pgxn.org/dist/kmeans/doc/kmeans.html
      8. http://gis.stackexchange.com/questions/11567/spatial-clustering-with-postgis
      9. grid clustering
      10. SnapToGrid
      11. http://postgis.refractions.net/docs/ST_SnapToGrid.html
      12. MySQL
      13. SQLDM – implementing k-means clustering using SQL (linear)
      14. http://www.abibasystems.com/white_paper/sqldm.pdf
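Sketch for the grid clustering idea noted above (SnapToGrid in the database, and the naive grid-based recipe under Articles & recipes): points are binned into square cells and each non-empty cell becomes one cluster. This is an application-side illustration in Python, not the PostGIS function; the cell size is in the same units as the coordinates, and the "superclusters" merge step of the recipe is deliberately left out.

```python
import math
from collections import defaultdict


def grid_clusters(points, cell_size):
    """Bin (lon, lat) points into square cells and return one cluster per cell."""
    cells = defaultdict(list)
    for lon, lat in points:
        # floor() keeps negative coordinates in the correct cell.
        key = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        cells[key].append((lon, lat))
    clusters = []
    for members in cells.values():
        n = len(members)
        clusters.append({
            "count": n,
            "lon": sum(p[0] for p in members) / n,
            "lat": sum(p[1] for p in members) / n,
        })
    # Heaviest clusters first, matching the "order clusters by points" step.
    clusters.sort(key=lambda c: c["count"], reverse=True)
    return clusters
```

In a database the same grouping can be expressed as an aggregate query over the snapped coordinates, so only one row per cell has to leave the server.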
    3. http://www.maptimize.com/
      1. Maptimize can deal with up to 50,000 markers - but the underlying algorithm can handle over one million!
      2. http://v2.maptimize.com/faq#
      3. http://seb.box.re/2009/5/1/maps-geolocalization-and-optimization-with-maptimize
      4. interview
    4. More
      1. Google maps implementation
      2. http://maps.forum.nu/server_side_clusterer/index2.php
      3. with clustering POIs on a route
      4. http://gis5.com/pois_along_route/gm_pois_along_route.php
      5. R-Project
      6. SGCS
      7. Spatial Graph based Clustering Summaries for spatial point patterns
      8. http://www.inside-r.org/packages/sgcs
      9. http://cran.r-project.org/web/packages/SGCS/
      10. graph-based implementation
      11. Flex
      12. http://thunderheadxpler.blogspot.co.at/2008/12/clustering-20k-map-points.html
      13. insert into grid + check for overlaps
      14. clusterPy
      15. http://code.google.com/p/clusterpy/
      16. http://www.rise-group.org/risem/clusterpy/index.html
      17. clusterPy algorithms
      18. AMOEBA A Multidirectional Optimum Ecotope-Based Algorithm
      19. AriSeL - Automatic Rationalization with Initial Seed Location
      20. Automatic Zoning Procedure (AZP)
      21. Reactive tabu variant of Automatic Zoning Procedure (AZP-R-Tabu)
      22. Simulated Annealing variant of Automatic Zoning Procedure (AZP-SA)
      23. Tabu variant of Automatic Zoning Procedure (AZP-Tabu)
      24. Geo Self Organizing Map(geoSOM)
      25. Max-p-regions model (Tabu)
      26. Generate random regions
      27. Self Organizing Map(SOM)
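Sketch for the hierarchical clustering method listed earlier in this section under Server-side, Articles & recipes: every point starts as its own cluster and the two closest clusters are merged until a target count is reached. This is an illustration only, assuming planar (x, y) tuples, distance measured between mean centres, and a fixed target number of clusters as the stopping rule; the function name is hypothetical.

```python
def agglomerative(points, target_clusters):
    """Merge the two closest clusters (by mean centre) until target_clusters remain."""
    clusters = [[p] for p in points]  # step 1: one cluster per point

    def center(cluster):
        n = len(cluster)
        return (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)

    while len(clusters) > max(1, target_clusters):
        best = None
        # Step 2: compare every pair of clusters by squared centre distance.
        for i in range(len(clusters)):
            ci = center(clusters[i])
            for j in range(i + 1, len(clusters)):
                cj = center(clusters[j])
                d = (ci[0] - cj[0]) ** 2 + (ci[1] - cj[1]) ** 2
                if best is None or d < best[0]:
                    best = (d, i, j)
        # Step 3: merge the closest pair, then repeat (step 4).
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```

The pairwise scan in every round makes this cubic in the number of points, which is one reason grid-based or geohash-based approaches look more attractive for on-request clustering.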
4. Drupal & Clustering
  1. Drupal 6 github
    1. https://github.com/ahtih/Geoclustering
    2. description
      1. * source data (points) is in the form of Drupal nodes
      2. * a new Drupal module maintains a DB table of multi-level geographic clusters, using hook_nodeapi() to track changes to source nodes and update the clusters table accordingly. Clusters table is accessible via a Views plugin
      3. * WFS module accesses clusters view and serves out both clusters and source nodes, with clustering level and bbox as request parameters
      4. * a new OpenLayers Strategy or layer type displays the map, selecting a suitable clustering level based on zoom and/or number of points in map view area
      5. http://drupal.org/node/622720#comment-3239286
      6. http://www.letsdoitworld.org/wastemap
  2. Drupal TagMap Clusterer JS
    1. http://drupal.org/project/tagmap
  3. WFS (D6, few users)
    1. http://drupal.org/project/wfs
  4. Geohash with Drupal
    1. Geospatial searching with solr 3.x
      1. http://drupal.org/node/1187888
    2. Apachesolr geo sandbox
      1. http://drupal.org/sandbox/pwolanin/1497066
5. Solr clustering
  1. http://stackoverflow.com/questions/8399152/how-to-best-do-server-side-geo-clustering
  2. Implementation without code
    1. http://blog.sybit.de/2010/11/geografische-suche-mit-solr/
  3. Good discussion
    1. http://www.mail-archive.com/solr-user@lucene.apache.org/msg40651.html
    2. http://searchhub.org/dev/2009/09/28/solrs-new-clustering-capabilities/
  4. Spatial4j
    1. https://github.com/spatial4j/spatial4j
  5. SOLR-2155
    1. https://github.com/dsmiley/SOLR-2155
  6. solr geohash notes
    1. http://spatial4j.16575.n6.nabble.com/Dev-My-Spatial-notes-from-Lucene-Revolution-td4980074.html#a4981005
    2. http://codegouge.blogspot.co.at/2012/05/using-geohashing-in-solr.html
  7. implemented, missing code
    1. http://lucene.472066.n3.nabble.com/Geographic-clustering-td502559.html
  8. Facet prefix filter (David Smiley)
    1. http://stackoverflow.com/questions/11319465/geoclusters-in-solr/11321723#11321723
  9. collapse, group
    1. solr 4.0
      1. http://www.searchworkings.org/blog/-/blogs/result-grouping-field-collapsing-with-solr/
    2. http://lucene.472066.n3.nabble.com/How-to-facet-data-from-a-multivalued-field-td3897853.html
  10. search api solr 4.0 (open)
    1. http://drupal.org/node/1676224
6. Geohash
  1. about
    1. http://en.wikipedia.org/wiki/Morton_number_(number_theory)
    2. http://en.wikipedia.org/wiki/Geohash
  2. good presentation
    1. http://www.lucidimagination.com/sites/default/files/Lucene%20Rev%20Preso%20Smiley%20Spatial%20Search.pdf
    2. http://www.basistech.com/pdf/events/open-source-search-conference/oss-2011-smiley-geospatial-search.pdf
  3. geohash explanation demo
    1. http://openlocation.org/geohash/geohash-js/
  4. SOLR-2155
    1. edge n-gram'ed geohashes with a PrefixTree/Trie search algorithm
    2. https://github.com/dsmiley/SOLR-2155
    3. http://wiki.apache.org/solr/SpatialSearch#SOLR-2155
  5. SOLR
    1. http://lucene.apache.org/core/3_6_0/api/contrib-spatial/index.html?org/apache/lucene/spatial/geohash/GeoHashDistanceFilter.html
    2. https://issues.apache.org/jira/browse/SOLR-2155
    3. A new geospatial framework for Lucene and Solr
      1. https://github.com/spatial4j/spatial4j
  6. MongoDB
    1. http://www.mongodb.org/display/DOCS/Geospatial+Indexing
  7. PHP additional hilbert-curve
    1. http://www.phpclasses.org/package/6202-PHP-Generate-points-of-an-Hilbert-curve.html
    2. http://stackoverflow.com/a/9645315
  8. geoPHP integration
    1. https://github.com/phayes/geoPHP/issues/32
  9. geohash efficiency
    1. http://karussell.wordpress.com/2012/05/23/spatial-keys-memory-efficient-geohashes/
  10. geohash neighbors java
    1. https://github.com/kungfoo/geohash-java/blob/master/src/main/java/ch/hsr/geohash/GeoHash.java
7. Drupal Clustering
  1. leaflet + drupal + client-side clustering
    1. http://drupal.org/project/leaflet_markercluster
  2. Clustering
    1. Blog: exploring large sets of geodata
      1. http://groups.drupal.org/node/104014
    2. Benchmarking OpenLayers with Drupal
      1. http://rjsteinert.com/content/benchmarking-openlayers-module-views-and-openlayersjs
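Sketch for the Geohash notes in section 6 above: points whose geohashes share a prefix lie in the same grid cell, so grouping by a truncated geohash gives a cheap clustering. The encoder follows the standard public geohash scheme (base-32 alphabet, longitude bit first); `cluster_by_prefix` and its output shape are hypothetical and only illustrate the idea, not any module's API.

```python
from collections import defaultdict

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=12):
    """Encode a lat/lon pair (degrees) as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, value, bit_count = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | bit
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)


def cluster_by_prefix(points, prefix_len):
    """Group (lat, lon) points by geohash prefix; return count and centroid per cell."""
    groups = defaultdict(list)
    for lat, lon in points:
        groups[geohash_encode(lat, lon, prefix_len)].append((lat, lon))
    return {
        prefix: {
            "count": len(members),
            "lat": sum(p[0] for p in members) / len(members),
            "lon": sum(p[1] for p in members) / len(members),
        }
        for prefix, members in groups.items()
    }
```

Two points that sit on opposite sides of a cell border get different prefixes even when they are close together, so a practical implementation would also have to inspect neighbouring cells (compare the geohash neighbors link above).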
Objectives
1. Design and implement a server-side clustering algorithm for Drupal
2. Open source
3. Integrative & Extensible
4. Use cases
5. Usability
Realization
1. Analysis
  1. Algorithm considerations
    1. speed, on-the-fly
    2. database, abstract, geohash, portable
    3. database vs. application layer
  2. Planning for the Drupal mapping environment
    1. Drupal integration
      1. Drupal mapping
      2. Recruiter
      3. Geofield
      4. Views
      5. Search API
      6. Recruiter
      7. Solr
    2. Configuring maps in Drupal
      1. Data model
      2. Query
      3. Mapping Library
      4. Mapping stack
    3. Data flow
      1. Views
      2. SAPI
  3. Usability - Leaflet, client-side clustering
2. Algorithm
  1. Geohash
  2. Algorithm - theory
    1. input
    2. output
    3. prototype-based clusters
      1. Prototype-based clusters are defined so that each object is closer to its own cluster’s prototype than to the prototype of any other cluster. The prototype is either a centroid (the mean of all points in the cluster) for continuous data or a medoid (the most central point within the cluster) for categorical data.
    4. hierarchical vs. partitional
    5. agglomerative vs. divisive
    6. hard vs fuzzy
    7. complete vs partial
    8. proximity
    9. STING (grid-based)
    10. geohash
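Sketch for the prototype-based clusters item above: the two prototype choices, centroid and medoid, computed for one cluster. A minimal illustration assuming planar (x, y) tuples and Euclidean distance; both function names are illustrative.

```python
import math


def centroid(points):
    """Mean of all points; usually not itself a member of the cluster."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def medoid(points):
    """Member point with the smallest summed distance to all other members."""
    def total_distance(p):
        return sum(math.hypot(p[0] - q[0], p[1] - q[1]) for q in points)
    return min(points, key=total_distance)
```

For a cluster marker on a map the centroid is cheap to maintain incrementally (running sum and count), while the medoid guarantees that the marker sits on a real data point.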
3. Implementation
  1. Architecture
    1. Index
    2. Parameters
    3. Processing
    4. Visualization
    5. Helpers
  2. Spatial index - geofield
  3. Configuration
  4. Query - views
  5. Display - views, leaflet, leaflet_geojson, ...
  6. Solr
    1. Solr
      1. map requesthandler
      2. handleSelect=true
      3. ?qt=/geocluster
  7. Geospatial computation requirements
    1. https://github.com/mapbox/clustr/blob/gh-pages/API.md#clustr
    2. distance between two points
      1. https://github.com/mapbox/clustr/blob/gh-pages/src/clustr.js#L34
    3. centroid of multiple points
      1. https://github.com/mapbox/clustr/blob/gh-pages/src/clustr.js#L55
    4. meters to pixels
      1. http://msdn.microsoft.com/en-us/library/aa940990.aspx
      2. http://wiki.openstreetmap.org/wiki/Zoom_levels
      3. https://github.com/mapbox/clustr/blob/gh-pages/src/clustr.js#L4
      4. problem: calibrated for the equator; the value shrinks toward the poles
    5. bounding box strategy
      1. http://www.geojson.org/geojson-spec.html#bounding-boxes
      2. openlayers ok
      3. http://openlayers.org/dev/examples/strategy-bbox.html
      4. http://dev.openlayers.org/docs/files/OpenLayers/Strategy/BBOX-js.html#OpenLayers.Strategy.BBOX
      5. leaflet custom?
      6. https://github.com/CloudMade/Leaflet/issues/962
      7. http://switch2osm.org/using-tiles/getting-started-with-leaflet/
      8. http://boomphisto.blogspot.co.at/2011/07/nodejs-express-leaflet-postgis-awesome.html
      9. movend
      10. https://github.com/CloudMade/Leaflet/issues/327#issuecomment-2460917
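Sketch for the "meters to pixels" requirement above: in the common Web Mercator tiling scheme, ground resolution depends on both zoom level and latitude, which is why a value calibrated at the equator is too large further north or south. The sketch assumes 256-pixel tiles and the WGS84 equatorial radius; the function names are illustrative.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 equatorial radius, as used by Web Mercator
TILE_SIZE = 256             # assumed tile edge length in pixels


def meters_per_pixel(lat, zoom):
    """Ground resolution in metres per pixel at a latitude (degrees) and zoom level."""
    circumference = 2 * math.pi * EARTH_RADIUS_M
    return math.cos(math.radians(lat)) * circumference / (TILE_SIZE * 2 ** zoom)


def pixels_to_meters(pixels, lat, zoom):
    """Convert an on-screen distance (e.g. a marker width) into metres on the ground."""
    return pixels * meters_per_pixel(lat, zoom)
```

A clustering distance given in pixels can then be converted per request, for example with the latitude of the viewport centre, before it is compared against haversine distances.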
4. Realization
  1. Geocluster
    1. geocluster_views_post_execute
      1. not very flexible, prefer to do plugins
    2. views_plugin_style_geocluster extends views_plugin_style_geojson
      1. views_geojson has complicated _views_geojson_render_fields
    3. geocluster_plugin_style_geofield_map extends geofield_map_plugin_style_map
      1. fields already rendered by field handler post execute
    4. geocluster_handler_field_geofield extends geofield_handler_field
    5. similar to groupby, get_aggregation_info?
    6. display that uses another display
      1. data source = another view
    7. queryhandler clustering
    8. alter views result research
      1. Search API research 2008
      2. Building a view from a different source (not Database)
      3. http://groups.drupal.org/node/12838
      4. Describing tables to Views
      5. http://views-help.doc.logrus.com/help/views/api-tables
      6. fake entities
      7. geo data is stored in a field api field
      8. think about the pager
      9. group entity altering of views
      10. http://drupalcode.org/project/views.git/blob/refs/heads/7.x-3.x:/modules/field/views_handler_field_field.inc#l727
      11. views_plugin_query_default
      12. $result = $query->execute();
      13. post_process afterwards
      14. register a custom handler to be executed before the field handlers
      15. problem: no way to modify views_object_types()
  2. done
    1. geofield + geohash support
      1. http://drupal.org/node/1662584
  3. Feedback
    1. http://affinitybridge.com/blog/server-side-mapping#comment-297
  4. Geohash algorithm
    1. issue
      1. http://drupal.org/node/1662432
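The geohash idea noted in this section can be illustrated with a short sketch. It is an assumption-laden illustration, not the code from the linked issues or from the Geocluster module: `geohash_encode` implements the standard public geohash bit interleaving, and `cluster_by_prefix` is a hypothetical helper that groups points sharing a geohash prefix into one cluster.

```python
from collections import defaultdict

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=12):
    """Encode a lat/lon pair as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        bit_count += 1
        use_lon = not use_lon
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def cluster_by_prefix(points, length):
    """Group (lat, lon) points by geohash prefix; return count and mean center."""
    cells = defaultdict(list)
    for lat, lon in points:
        cells[geohash_encode(lat, lon, length)].append((lat, lon))
    return {
        cell: {
            "count": len(members),
            "center": (sum(m[0] for m in members) / len(members),
                       sum(m[1] for m in members) / len(members)),
        }
        for cell, members in cells.items()
    }
```

Because a shorter prefix always denotes a larger enclosing cell, a stored full-length hash can serve every zoom level by truncation. Points that lie close together but on opposite sides of a cell boundary still end up in different cells, so a prefix-only grouping would need a neighbor check to merge them.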
5. Project phases
  1. 2011 AustroFeedr
  2. January - March
    1. Drupal + Mapping topics research
  3. April
    1. Frontend United: Geocluster kickoff
      1. 20-22 April
  4. June
    1. first sandbox draft
      1. 10 June
    2. Drupal Developer Days Barcelona: Mapping Sprint
      1. 15 June
    3. Geohash idea (Nick)
  5. September
    1. First demo
      1. 27 September
  6. October
    1. Alpha 1: PHP based clustering
    2. Views GeoJSON integration
    3. Solr plugin
  7. November
    1. Alpha 2: MySQL + Views aggregation based clustering
  8. December
    1. Solr implementation finished
  9. January + February
    1. Documentation
    2. Polishing
Use cases
1. GeoRecruiter
  1. Data
    1. use geo taxonomy / geofield
      1. jobs
      2. districts
  2. Search
    1. Search API integration
      1. http://drupal.org/project/search_api_location
  3. Job
    1. Show job location on map
  4. Mapping & Recruiter
    1. http://drupal.org/node/1254716
    2. by Adam S, based on OpenPublic
Conclusions & outlook
1. Evaluation
  1. Objectives
  2. Performance
    1. Algorithm optimization tasks issue
      1. http://drupal.org/node/1828584
  3. Demo data & test framework
  4. Clustered/aggregated data
    1. issue
      1. http://drupal.org/node/1824954
  5. cluster stability
  6. visualization
2. Related work
3. Future work
More
1. Server-side implementation
  1. http://www.ushahidi.com/
2. Clustering 6 example
  1. http://www.letsdoitworld.org/wastemap#
People
1. Nick_vh
  1. geohash, solr
2. Smiley, David W
  1. solr
3. phayes
  1. geophp
4. mollux, Matthias Michaux
  1. http://drupal.org/user/785804
  2. Search API Location
5. Underdark GIS
6. alex?
7. Austria
  1. Gomogi
    1. http://www.gomogi.com/
    2. Michael Diener
  2. FOSSGIS UG AT
    1. fossgis@spektral.at
    2. http://wiki.alpine-geckos.at/wiki/FOSSGIS_UserGroup_Austria
  3. http://www.alpine-geckos.at/category/geowissenschaften/
Visual
1. http://vis4.net/blog/posts/clean-your-symbol-maps/
2. Visualizing Large Spatial Datasets in Interactive Maps
  1. Voronoi polygons
    1. compare
      1. d3
      2. http://mbostock.github.io/d3/talk/20111116/airports-all.html
  2. hierarchical aggregation.
  3. can effectively be used with datasets of up to 1000 items.
  4. Clutter not only reduces the background visibility, but also hinders the user's understanding of the structure and content of the data.
  5. Hierarchical aggregation is a common visualization technique to make visual representations more visually scalable and less visually cluttered [1]. In particular, hierarchical aggregation techniques have been proposed for exploring spatial data sets [2], [3].
  6. Ellis and Dix [7] distinguish three main types of clutter reduction techniques: appearance (alter the look of the data items), spatial distortion (displace the data items in some ways) and temporal (animation).
  7. Heatmap
    1. However, a problem with this approach is that user interactions with the data items are harder to deal with, as they need to be handled at the pixel level.
  8. Icons
    1. One of the main drawbacks of using icons as aggregation symbols is that they do not show the area covered by the clusters.
  9. Visualization evaluation
    1. Evaluating visualization techniques is a well-known problem [16], [17].
    2. Types
      1. summative (i.e., comparison-based),
      2. formative (evaluation that leads to suggestions for improving the evaluated technique)
      3. exploratory analysis (evaluation that is helpful to discover new ideas and concepts about the technique).
  10. Color
    1. Color is used to draw the user's attention to denser clusters. The color assigned to a polygon is determined using a hot-to-cold color ramp where hot colors are assigned to dense clusters and cold colors to sparse ones [18].
  11. If points are concentrated in small visible areas, polygons will cover a significantly wider area than the area of their points. In that case, using bounding boxes or hulls as cluster footprints could be more effective.
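The hot-to-cold color ramp mentioned under Color can be sketched as below. The exact ramp of the cited work [18] is not given in these notes, so this assumes one common blue, cyan, green, yellow, red interpolation; `hot_to_cold` and its parameters are illustrative names.

```python
def hot_to_cold(value, vmin, vmax):
    """Map a density value to an (r, g, b) tuple: blue = sparse, red = dense."""
    if vmax <= vmin:
        return (0, 0, 255)  # degenerate range: treat everything as sparse
    # Normalize to [0, 1] and clamp values outside the range.
    v = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    if v < 0.25:      # blue -> cyan
        r, g, b = 0.0, 4 * v, 1.0
    elif v < 0.5:     # cyan -> green
        r, g, b = 0.0, 1.0, 1.0 - 4 * (v - 0.25)
    elif v < 0.75:    # green -> yellow
        r, g, b = 4 * (v - 0.5), 1.0, 0.0
    else:             # yellow -> red
        r, g, b = 1.0, 1.0 - 4 * (v - 0.75), 0.0
    return tuple(round(c * 255) for c in (r, g, b))
```

The input value could be a cluster's point count divided by the area of its footprint, with `vmin` and `vmax` taken from the clusters currently visible.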
3. Types
  1. Heatmap
    1. Clustering and Visualizing Geographic Data Using Geo-tree
    2. http://en.wikipedia.org/wiki/Heat_map
  2. Icons
  3. Grid
  4. Container shapes
    1. Boxes
    2. Hulls
    3. Paper
      1. compare
      2. Circle Packing
      3. d3
      4. http://bl.ocks.org/mbostock/4063530
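Two of the types above, Grid and Boxes, combine naturally: points are binned into a regular grid, and each occupied cell reports the bounding box of its members as the cluster footprint. The following is a minimal sketch under that assumption; `grid_clusters` and the cell size in degrees are illustrative choices, not taken from any of the listed sources.

```python
import math
from collections import defaultdict


def grid_clusters(points, cell_deg):
    """Bin (lat, lon) points into square cells of `cell_deg` degrees.

    Returns one dict per occupied cell with the point count, the mean
    center, and the bounding box of the member points.
    """
    cells = defaultdict(list)
    for lat, lon in points:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append((lat, lon))

    clusters = []
    for key in sorted(cells):  # sorted for a stable output order
        members = cells[key]
        lats = [m[0] for m in members]
        lons = [m[1] for m in members]
        clusters.append({
            "count": len(members),
            "center": (sum(lats) / len(lats), sum(lons) / len(lons)),
            # GeoJSON bbox order: west, south, east, north
            "bbox": (min(lons), min(lats), max(lons), max(lats)),
        })
    return clusters
```

The bounding box covers only the member points rather than the whole grid cell, so the footprint shows the area the cluster actually occupies.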
4. Properties
  1. Size
  2. Center (weighted)
  3. Shape
  4. Color
    1. http://en.wikipedia.org/wiki/Map_coloring
  5. Interaction, Animation
5. clustermap
  1. https://code.google.com/p/clustermap/wiki/Introduction
  2. example
6. Attribute clustering
  1. https://github.com/Leaflet/Leaflet.markercluster/issues/96
  2. http://openlayers.org/dev/examples/strategy-cluster-extended.html
7. Process
  1. vis
8. Cluster quality Measures
  1. Purity
  2. Entropy
  3. NMI (Normalized Mutual Information)
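The three measures above compare a clustering against known class labels. The sketch below shows one way to compute them, under stated assumptions: entropy is taken as the size-weighted class entropy within each cluster, in bits, and NMI is normalized by the geometric mean of the two entropies, which is one of several normalizations in use. Function names are illustrative.

```python
import math
from collections import Counter


def _entropy(counts, total):
    """Shannon entropy in bits of a distribution given as raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def purity(clusters, classes):
    """Fraction of points that belong to the majority class of their cluster."""
    best = {}
    for (k, _), c in Counter(zip(clusters, classes)).items():
        best[k] = max(best.get(k, 0), c)
    return sum(best.values()) / len(clusters)


def cluster_entropy(clusters, classes):
    """Size-weighted class entropy per cluster; 0 means every cluster is pure."""
    n = len(clusters)
    pairs = Counter(zip(clusters, classes))
    total = 0.0
    for k, size in Counter(clusters).items():
        counts = [c for (kk, _), c in pairs.items() if kk == k]
        total += (size / n) * _entropy(counts, size)
    return total


def nmi(clusters, classes):
    """Mutual information normalized by sqrt(H(clusters) * H(classes))."""
    n = len(clusters)
    h_k = _entropy(Counter(clusters).values(), n)
    h_c = _entropy(Counter(classes).values(), n)
    if h_k == 0 or h_c == 0:
        return 0.0  # convention: undefined case reported as 0
    # I(clusters; classes) = H(classes) - H(classes | clusters)
    mutual = h_c - cluster_entropy(clusters, classes)
    return mutual / math.sqrt(h_k * h_c)
```

Higher purity and NMI indicate better agreement with the labels, while lower entropy is better. Purity alone rewards many tiny clusters, which is why it is usually read together with NMI.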
9. General
  1. Analysis Visualization
  2. Flat Tree Viewer, 2D Matrix or Heat Map, Hyperbolic Lens Viewer, Table Viewer
10. Examples
  1. Google
    1. https://developers.google.com/maps/documentation/javascript/training/visualizing/earthquakes
    2. Basic markers, sized circles, and heatmaps.
  2. SnapToGrid example
    1. http://developers.cartodb.com/examples/point-clustering.html
  3. Chart Maps
    1. Pie vs Bar
    2. http://kartograph.org/showcase/charts/
  4. Choropleth Maps
    1. Kartograph
      1. http://kartograph.org/showcase/choropleth/
  5. Dot Grid Maps
    1. Kartograph
      1. http://kartograph.org/showcase/dotgrid/
  6. Symbol Maps
    1. Kartograph
      1. http://kartograph.org/showcase/symbols/
  7. 3-dim
    1. Kartograph
      1. http://kartograph.org/showcase/3d/
      2. compare
      3. d3
      4. animated wind chart
      5. http://prcweb.co.uk/lab/ukwind/
  8. Icons
    1. MapBox
      1. http://www.mapofthedead.com/map#14.00/48.8607/2.3440
    2. Wind Icons
      1. http://windhistory.com/map.html#9.00/37.8931/-121.7366
  9. Scaled Data Values
    1. MapBox
      1. http://mapbox.com/blog/scaled-data-value-design-in-tilemill/
  10. Scaled Dots
    1. MapBox
      1. http://mapbox.com/blog/scaled-data-value-design-in-tilemill/
  11. Hexagonal Binning (Heatmap)
    1. MapBox
      1. http://mapbox.com/blog/binning-alternative-point-maps/
  12. Animated OpenLayers
    1. OL
      1. http://acuriousanimal.com/blog/2012/08/19/animated-marker-cluster-strategy-for-openlayers/