Well-separated: Points belong to a cluster when they are closer to every point in the cluster than they are to any point not in the cluster. This requires each cluster of points to be separated by a distance at least equal to the diameter of the largest cluster at its longest point.
Prototype-based: Points belong to a cluster when they are closer or more similar to the cluster’s prototype, ie. a point representing the cluster as a whole, than they are to any other cluster’s prototype. This tends to result in circular clusters since the prototype (in a geospatial dataset anyway) is usually the centrepoint of the cluster.
Graph-based: A cluster is defined as a set of points that are all connected, directly or via a chain of other points, to each other, and which are not connected to any point not in the cluster.
Density-based: A cluster exists where there is a region of high density surrounded by a region of low density. These clusters can end up being any shape, and vary widely in size.
Conceptual: A cluster exists where all points in the cluster share a common attribute which is not present in any point outside the cluster.
Implementation types
Variable gridding
Interleaved digits
K-Means
http://macwright.org/2012/09/16/k-means.html
QT-Clust
DBSCAN
Agglomerative Hierarchical Clustering
Grid based viral growth argorithm
More
Utils
php haversine (c)
https://github.com/php-geospatial/geospatial
PHP hilbert curve
http://sourceforge.net/projects/hilbert-curve/
Introduction
Motivation
Usability
summarize data at high zoom levels by clustering
allow exploration of individual points at lower zoom levels
Speed
Browser javascript/memory limitations
Even more for mobile
"sebgruhier: performance is the key word. To be efficient (according to me) enough it has to be done around 100ms"
[1:52pm] zzolo: there is still a pretty big hole when trying to map (and handle) lots of geospatial features in Drupal
[1:53pm] phayes: zzolo - I think the answer there is a combination of ViewsGeoJSON / server-side clustering
[1:58pm] dasjo: final question before i have to leave: do we need a server-side clustering like nod_ and me discussed? http://drupal.org/node/1547610
[1:59pm] nod_: dasjo: phayes seems to agree, see backscroll
[1:59pm] Brandonian: dasjo: Definitlely think it'd be a cool feature, not sure what other work is being done with that in php/drupal land though
[2:00pm] zzolo: dasjo: my initial thought is no, if we can focus on toolking up thinks like postgis and tilestache. but since that is not really going to happen overnight, a good stop gap solon could be server side clustering
[2:00pm] phayes: dasjo: Yes we need it! That's going to be critical as we scale to support larger data sets
[2:00pm] dasjo: yep Brandonian zzolo. basically some parts of the database implementation and client-side handling should be pluggable
[2:01pm] dasjo: but maybe we can come with some basic out-of-the-box implementations and leave it open to a database to optimize the clustering
tilestash + drupal + leaflet
log
[1:59pm] tnightingale: yeah - tilestache is (using mapnik) is able to generate tiles on demand from a postgis backend
[1:59pm] mackh: of data from the oil and gas commksion - like pipeline right-of-ways, access roads, wellsites, watercrossings
[2:00pm] tnightingale: zzolo: so we can sync drupal data into postgis and render that data into tiles using mapnik+tilestache
[2:00pm] zzolo: tnightingale: and mackh just to confirm, the postgi data is drupal data? (from sync_postgis, i assume)
[2:00pm] zzolo: sweet
[2:00pm] mackh: yes and no?
[2:00pm] tnightingale: yep
[2:01pm] mackh: stored in drupal, synced to postgis
[2:01pm] tnightingale: but doesn't have to be
[2:01pm] zzolo: awesome. thats what i was hoping. this is the stack i have been envisioning, but i don't get paid to do any of this stuff so i don't have time.
[2:02pm] Brandonian: for the more python/mapnik literate in the room: How feasible is it to do mapnik renders based directly off drupal field data if the database is Postgres/postgis and stored properly?
[2:02pm] tnightingale: Brandonian: that's essentially what we're doing with sync_postgis & tilestache
[2:03pm] zzolo: Brandonian: what i image is getting your design in tile mill, export out the mapnik xml file, then it should be pretty easy
[2:03pm] tnightingale: using a mapnik xml file produced in tilemill as our style guide
[2:06pm] tnightingale: zzolo: we also have a rough cut at a tilestache config management module on g.h
[2:06pm] phayes: ooo! link tnightingale?
[2:07pm] phayes: Oh tnightingale, what's the status of D7 Spatial-tools?
[2:07pm] zzolo: tnightingale: you have a link. i am actually gonna try to get this stack running locally this week as part of my presentation at the TC Drupal camp
[2:08pm] mackh: i also want to extend search api to return WKT onto maps
[2:08pm] mackh: from faceted results, hoping to get that client funded in the next 3 months
Office hours finishing
log
[2:07pm] tom_o_t: Brandonian: perhaps schedule dedicated Q&A time for stuff general questions that aren't related to module development?
[2:08pm] Brandonian: tom_o_t: Not a bad idea.
[2:09pm] jeffschuler: tom_o_t++ … this has been super useful, but It's been more of a facilitated discussion on high-level geo in drupal topics rather than a time for folks to come get help or figure out how they can pitch in… different from core office hours http://drupal.org/node/1242856
Use Case
Online-Biodiversitätsportale mit Indicia, Drupal und OpenLayers
Google Maps Hacks: Tips and Tools for Geographic Searching and Remixing
http://flylib.com/books/en/2.367.1.102/1/
siehe auch downloaded chm ebook
K-mean
Method
1. Select a center point for each of k clusters, where k is a small integer.
2. Assign each data point to the cluster whose center point is closest.
3. After all the data points are assigned, move each cluster's center point to the arithmetic mean of the coordinates of all the points in that cluster, treating each dimension separately.
4. Repeat from Step 2, until the center points stop moving.
Hierachical clustering
Method
1. Assign each point to its own cluster.
2. Calculate the distance from each cluster to every other cluster, either from their respective mean centers, or from the two nearest points from each cluster.
3. Take the two closest clusters and combine them into one cluster.
4. Repeat from Step 2, until you have the right number of clusters, or the clusters are some minimum distance apart from each other, or until you have one big cluster.
Naïve grid-based clustering
grid by display pixels / marker size
assign points to clusters within grid
order clusters by points
make superclusters (heavy clusters claim their neighbors)
Clustering Maps thesis
DB-Scan + R-Tree based implementation, tested up to 400-500 items only (~1 sec)
AMOEBA A Multidirectional Optimum Ecotope-Based Algorithm
AriSeL - Automatic Rationalization with Initial Seed Location
Automatic Zoning Procedure (AZP)
Reactive tabu variant of Automatic Zoning Procedure (AZP-R-Tabu)
Simulated Annealing variant of Automatic Zoning Procedure (AZP-SA)
Tabu variant of Automatic Zoning Procedure (AZP-Tabu)
Geo Self Organizing Map(geoSOM)
Max-p-regions model (Tabu)
Generate random regions
Self Organizing Map(SOM)
Drupal & Clustering
Drupal 6 github
https://github.com/ahtih/Geoclustering
beschreibung
* source data (points) is in the form of Drupal nodes
* a new Drupal module maintains a DB table of multi-level geographic clusters, using hook_nodeapi() to track changes to source nodes and update the clusters table accordingly. Clusters table is accessible via a Views plugin
* WFS module accesses clusters view and serves out both clusters and source nodes, with clustering level and bbox as request parameters
* a new OpenLayers Strategy or layer type displays the map, selecting a suitable clustering level based on zoom and/or number of points in map view area
Design and implement a server-side clustering algorithm for Drupal
Open source
Integrative & Exensible
Use cases
Usability
Realization
Analysis
Algorithm considerations
speed, on-the-fly
database, abstract, geohash, portable
database vs. application layer
Planning for the Drupal mapping environment
Drupal integration
Drupal mapping
Recruiter
Geofield
Views
Search API
Recruiter
Solr
Configuring maps in Drupal
Data model
Query
Mapping Library
Mapping stack
Data flow
Views
SAPI
Usability - Leaflet, client-side clustering
Algorithm
Geohash
Algorithm - theory
input
output
prototype-based clusters
Prototype-based clusters are defined so that objects are closer to their cluster’s prototype than to any other one. Prototypes of clusters are either centroids (the mean of all points for a cluster) for continuous data or medoids (the most central point within a cluster) for categorical data.
can effectively be used with datasets of up to 1000 items.
Clutter not only reduces the background visibility, but also hinders the users understanding of the structure and content of the data.
Hierarchical aggregation is a common visualization technique to make visual representations more visually scalable and less visu- ally cluttered [1]. In particular, hierarchical aggregation tech- niques have been proposed for exploring spatial data sets [2], [3].
Ellis and Dix’s [7] distinguish three main types of clutter reduction techniques: appearance (alter the look of the data items), spatial distortion (displace the data items in some ways) and temporal (animation).
Heatmap
However, a problem with this approach is that users interactions with the data items are harder to deal with, as they need to be handled at the pixel level.
Icons
One of the main drawbacks of using icons as aggregation symbols is that they do not show the area covered by the clusters.
Visualization evaluation
Evaluating visualization techniques is a well-known prob- lem [16], [17].
Types
summative (i.e., comparison-based),
formative (evaluation that leads to suggestions for improving the evaluated technique)
exploratory analysis (evaluation that is helpful to discover new ideas and concepts about the technique).
Color
Color is used to drive user’s attention on denser clusters. The color assigned to a polygon is determined using a hot-to- cold color ramp where hot colors are assigned to dense clusters and cold colors to sparse ones [18].
if points are concentrated in small visible areas, polygons will cover a significantly wider area than the area of their points. In that case, using bounding boxes or hulls, as cluster footprints could be more effective.
Types
Heatmap
Clustering and Visualizing Geographic Data Using Geo-tree