
Geospatial Technology

Spatial APIs, coordinate systems, and mapping toolchains

Geospatial Technology and Visualization for Transport

Research document for the Global Intelligence System for Transport (GIST)
Last updated: 2026-02-09

1. Overview

A Global Intelligence System for Transport requires geospatial infrastructure that can ingest, store, query, and visualize transport data at planetary scale and in real time. This document covers the relevant standards, spatial data infrastructure patterns, and visualization technologies.


2. OGC Standards Relevant to Transport

The Open Geospatial Consortium (OGC) defines interoperability standards for geospatial data. The following are most relevant to a global transport data system.

2.1 OGC API Family (Modern REST-based Standards)

The OGC is transitioning from legacy XML-based web service standards (WMS, WFS, WCS) to modern RESTful, JSON-based APIs:

  • OGC API - Features (successor to WFS): RESTful API for querying and serving vector geospatial features. JSON/GeoJSON output. Supports filtering (CQL2), spatial queries, pagination. Ideal for serving stop/station/route data.

  • OGC API - Tiles: Serves pre-rendered map tiles (raster or vector). Supports Mapbox Vector Tile (MVT) format. Efficient for large-scale basemaps and transport network overlays.

  • OGC API - Maps: Dynamic map rendering API. Less relevant for GIST (prefer vector tiles).

  • OGC API - Processes: Server-side geoprocessing. Could be used for on-demand spatial analysis (e.g., isochrone generation, network analysis).

  • OGC API - Records (successor to CSW): Metadata catalog API. Relevant for cataloging transport datasets across the GIST system.

  • OGC API - EDR (Environmental Data Retrieval): Querying data along transport corridors (e.g., weather conditions along a route).

  • OGC SensorThings API: IoT sensor data standard. Highly relevant for real-time transport sensor data (vehicle positions, traffic sensors, environmental sensors). Supports MQTT for real-time streaming.
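
As a rough illustration of how a client might query stop data through OGC API - Features, the sketch below builds an `/items` request URL with a bounding box, a page size, and a CQL2 text filter. The `bbox`, `limit`, `filter`, and `filter-lang` query parameters come from the standard (Parts 1 and 3); the base URL, the `stops` collection, and the `mode` property are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_items_url(base_url, collection, bbox, limit=100, cql2_filter=None):
    """Build an OGC API - Features /items URL (no request is sent)."""
    params = {
        # bbox order: min lon, min lat, max lon, max lat (CRS84 by default)
        "bbox": ",".join(str(v) for v in bbox),
        "limit": str(limit),
    }
    if cql2_filter:
        params["filter"] = cql2_filter
        params["filter-lang"] = "cql2-text"
    path = f"{base_url.rstrip('/')}/collections/{collection}/items"
    return f"{path}?{urlencode(params)}"


# Hypothetical server and collection, for illustration only.
url = build_items_url(
    "https://example.org/ogcapi",
    "stops",
    bbox=(4.7, 52.2, 5.1, 52.5),
    limit=50,
    cql2_filter="mode = 'tram'",
)
print(url)
```

Servers typically return a GeoJSON FeatureCollection with a `next` link for pagination; whether a given server supports CQL2 filtering should be checked against its conformance declaration.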

2.2 Legacy OGC Standards (Still Widely Used)

  • WMS (Web Map Service): Serves rendered map images. Still used by many government mapping agencies.
  • WFS (Web Feature Service): Serves vector features as GML/XML. Still used by INSPIRE and many European spatial data infrastructures.
  • WCS (Web Coverage Service): Serves raster/grid data (elevation, satellite imagery).
  • GML (Geography Markup Language): XML encoding for geographic features. Used by NeTEx, INSPIRE, and many European standards.

2.3 OGC Standards for Transport-Specific Use Cases

  • CityGML: 3D city model standard. Includes transportation module for roads, railways, waterways in 3D. Relevant for 3D transport visualization.
  • IndoorGML: Indoor spatial data model. Relevant for in-station navigation and GTFS-Pathways.
  • Moving Features: Standard for representing objects that move through time and space. Directly relevant to vehicle tracking, vessel tracking, flight paths.

3. EU INSPIRE Directive

3.1 Overview

The INSPIRE Directive (2007/2/EC) establishes a Spatial Data Infrastructure (SDI) for Europe. It mandates that EU member states publish spatial data in interoperable formats using OGC standards.

3.2 Transport-Relevant INSPIRE Themes

  • Transport Networks (Annex I, Theme 7): Road, rail, water, air, cable transport networks. Defines a common data model for European transport infrastructure. Published via WMS/WFS services.
  • Addresses (Annex I, Theme 5): Relevant for geocoding transport locations.
  • Administrative Units (Annex I, Theme 4): Governance boundaries affecting transport jurisdiction.
  • Hydrography (Annex I, Theme 8): Waterway networks.
  • Land Cover (Annex II, Theme 2) and Land Use (Annex III, Theme 4): Context for transport planning.
  • Utility and Governmental Services (Annex III, Theme 6): Public service facilities including transport hubs.

3.3 INSPIRE Transport Network Data Model

The INSPIRE Transport Networks schema defines:

  • Network elements: Nodes (junctions), Links (road/rail/water segments), Link Sequences
  • Properties: Form of way, functional class, number of lanes, speed limits, restrictions
  • Intermodal connections: How different transport networks connect
  • Temporal attributes: Valid from/to dates for network changes

Format: GML 3.2.1, served via WFS 2.0. Some countries also provide GeoJSON or GeoPackage alternatives.
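
To make the node/link model with temporal validity more concrete, here is a minimal in-memory sketch. The class and field names (`Node`, `Link`, `form_of_way`, `valid_from`, and so on) are illustrative assumptions and are not the INSPIRE schema's own element names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Node:
    node_id: str
    lon: float
    lat: float


@dataclass(frozen=True)
class Link:
    link_id: str
    start_node: str
    end_node: str
    form_of_way: str
    speed_limit_kmh: Optional[int]
    valid_from: date
    valid_to: Optional[date] = None  # None means the link is still valid

    def is_valid_on(self, day: date) -> bool:
        """True if the link belongs to the network on the given day."""
        return self.valid_from <= day and (
            self.valid_to is None or day <= self.valid_to
        )


def network_on(links, day):
    """Return the links that make up the network on a given day."""
    return [link for link in links if link.is_valid_on(day)]


links = [
    Link("L1", "N1", "N2", "motorway", 100, date(2010, 1, 1)),
    Link("L2", "N2", "N3", "singleCarriageway", 50, date(2010, 1, 1), date(2020, 6, 30)),
    Link("L3", "N2", "N3", "dualCarriageway", 80, date(2020, 7, 1)),
]
print([link.link_id for link in network_on(links, date(2021, 1, 1))])
```

Keeping validity dates on each link lets the same dataset answer both "what does the network look like now" and "what did it look like on a past date".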

3.4 Relevance to GIST

INSPIRE provides a harmonized European spatial data layer for transport infrastructure. However:

  • Implementation quality varies significantly across member states
  • Updates are often infrequent (annual or less often)
  • Focus is on infrastructure, not services/timetables/real-time
  • Useful as a base network layer, but must be supplemented with operational data from NeTEx, SIRI, DATEX II, etc.

4. Spatial Data Infrastructure Patterns for Global Systems

4.1 Architecture Patterns

Federated SDI: Each data source maintains its own services; a central catalog discovers and mediates access. This is the INSPIRE model. Pros: Data sovereignty, no centralization bottleneck. Cons: Query performance depends on weakest source, inconsistent quality.

Centralized Data Lake: All data is ingested, transformed, and stored centrally. A single system serves all queries. Pros: Consistent quality, fast queries, unified schema. Cons: Storage and compute costs, data freshness challenges, governance complexity.

Hybrid (Recommended for GIST): Core datasets are centralized and harmonized; real-time data is federated with caching; metadata catalog provides discovery across all sources. This balances performance with data freshness and sovereignty.
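
One way to picture "federated with caching" is a thin wrapper that serves a recent copy of a remote feed and only goes back to the source once the copy is older than a time-to-live. The sketch below is an assumption about how such a layer could look; `CachedFederatedSource` and its parameters are hypothetical names, not part of any GIST design.

```python
import time


class CachedFederatedSource:
    """Serve values from a remote source, re-fetching after ttl_seconds."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # callable: key -> value (the remote call)
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._cache = {}             # key -> (fetched_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]            # fresh enough: no remote call
        value = self._fetch(key)
        self._cache[key] = (now, value)
        return value


# Usage with a stand-in fetch function and a fake clock.
calls = []
fake_now = [0.0]
source = CachedFederatedSource(
    fetch=lambda key: calls.append(key) or f"payload-for-{key}",
    ttl_seconds=30,
    clock=lambda: fake_now[0],
)
source.get("feed-a")
fake_now[0] = 10.0
source.get("feed-a")   # served from cache
fake_now[0] = 45.0
source.get("feed-a")   # expired, fetched again
print(len(calls))
```

The TTL is the knob that trades freshness against load on the federated source; real-time feeds would use seconds, slowly changing reference data much longer.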

4.2 Spatial Databases for Transport

PostGIS (PostgreSQL extension):

  • The standard open-source spatial database
  • Supports geometry and geography types, spatial indexes (GiST, SP-GiST), spatial functions (ST_Distance, ST_Intersects, ST_Buffer, ST_Within, etc.)
  • pgRouting extension adds network routing capabilities (Dijkstra, A*, driving distance, TSP)
  • Handles millions of features efficiently
  • Supports time-series with TimescaleDB extension
  • Excellent integration with GTFS data (multiple tools import GTFS into PostGIS)
  • Recommended as primary spatial database for GIST
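
As an illustration of the spatial functions listed above, the following sketch builds a parameterized "stops within a radius" query using `ST_DWithin` and `ST_Distance` on the geography type. The `stops` table and its `stop_id`, `stop_name`, and `geom` columns are hypothetical, and the `%(name)s` placeholders assume a psycopg-style driver; the code only prepares the SQL and parameters and does not connect to a database.

```python
# Hypothetical schema: stops(stop_id, stop_name, geom geometry(Point, 4326))
NEARBY_STOPS_SQL = """
SELECT stop_id,
       stop_name,
       ST_Distance(
           geom::geography,
           ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography
       ) AS distance_m
FROM stops
WHERE ST_DWithin(
          geom::geography,
          ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography,
          %(radius_m)s
      )
ORDER BY distance_m
LIMIT %(limit)s
"""


def nearby_stops_query(lon, lat, radius_m=500, limit=20):
    """Return (sql, params) for a radius search around a WGS 84 point."""
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError("lon/lat out of range")
    params = {"lon": lon, "lat": lat, "radius_m": radius_m, "limit": limit}
    return NEARBY_STOPS_SQL, params


sql, params = nearby_stops_query(4.8952, 52.3702, radius_m=300)
print(params)
```

Casting to geography gives distances in metres; for large tables the cast would normally be backed by a geography column or a matching index so that the spatial index can be used.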

DuckDB + Spatial extension:

  • In-process analytical database with spatial support
  • Excellent for analytical queries over large datasets (billions of rows)
  • Reads Parquet and CSV natively; reads GeoJSON, GeoPackage, and other GDAL-supported formats through the spatial extension
  • Column-oriented storage for fast aggregation
  • Good for analytics pipelines but not for transactional/real-time workloads
  • Recommended for analytical queries and batch processing in GIST

Apache Sedona (formerly GeoSpark):

  • Distributed spatial computing on Apache Spark/Flink
  • Handles planetary-scale spatial data
  • Supports spatial joins, queries, and processing at massive scale
  • Relevant if GIST needs to process very large spatial datasets (e.g., all global AIS data)

H3 (Uber's Hexagonal Hierarchical Spatial Index):

  • Not a database but a spatial indexing system
  • Divides Earth into hierarchical hexagonal cells at multiple resolutions
  • Excellent for aggregating point data (vehicle positions, trip origins/destinations)
  • Available in PostGIS, DuckDB, and BigQuery through extensions or add-on libraries, and in most modern tools
  • Recommended as spatial indexing layer for aggregation and visualization in GIST
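
The aggregation pattern itself is simple: map each point to a cell ID and count per cell. The sketch below shows that pattern with a pluggable cell function. To stay dependency-free it uses a plain square lat/lon grid as a stand-in, which is explicitly not H3; with the `h3` Python package (4.x API) one would presumably pass something like `lambda lat, lon: h3.latlng_to_cell(lat, lon, 8)` instead.

```python
import math
from collections import Counter


def square_cell(lat, lon, size_deg=0.01):
    """Stand-in cell function: a square grid, NOT an H3 hexagon."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def aggregate_by_cell(points, cell_of=square_cell):
    """Count (lat, lon) points per cell; cell_of maps a point to a cell ID."""
    return Counter(cell_of(lat, lon) for lat, lon in points)


vehicle_positions = [
    (52.3701, 4.8952),
    (52.3705, 4.8958),
    (52.3850, 4.9050),
]
counts = aggregate_by_cell(vehicle_positions)
print(counts.most_common())
```

Because the cell function is a parameter, the same aggregation code can be reused at several resolutions for level-of-detail rendering.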

Elasticsearch / OpenSearch with Geo:

  • Full-text search + geospatial queries
  • Good for geographic search (find stops near a point, within a polygon)
  • Real-time indexing
  • Relevant for GIST search functionality

4.3 Cloud-Native Spatial Data Formats

The spatial data ecosystem is rapidly moving to cloud-native formats that support HTTP range requests (no need to download entire files):

| Format | Type | Description | Use Case for GIST |
|---|---|---|---|
| GeoParquet | Vector | Parquet files with geometry columns | Analytical queries, data lake storage |
| FlatGeobuf | Vector | Binary format with spatial index | Fast feature streaming for visualization |
| PMTiles | Tiles | Single-file tile archive | Serverless map tile serving |
| Cloud-Optimized GeoTIFF (COG) | Raster | GeoTIFF with internal tiling | Satellite imagery, terrain data |
| Zarr | Array | Chunked, compressed array storage | Weather/environmental data |
| GeoArrow | Vector | Apache Arrow with geometry | In-memory analytics, inter-process transfer |
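
The common mechanism behind these formats is the HTTP `Range` header, which lets a client read a slice of a large file instead of downloading it. The sketch below only constructs such a request; the URL and the 16 KiB read size are hypothetical, and nothing is sent over the network.

```python
from urllib.request import Request


def range_request(url, offset, length):
    """Build a GET request for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length > 0")
    last_byte = offset + length - 1  # the Range end position is inclusive
    return Request(url, headers={"Range": f"bytes={offset}-{last_byte}"})


# Hypothetical archive on object storage; read only its first 16 KiB.
req = range_request("https://example.org/tiles/network.pmtiles", 0, 16 * 1024)
print(req.get_header("Range"))
```

A server that honours the header answers with status 206 (Partial Content); a client would then use the index information in those first bytes to compute the next range to fetch.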

4.4 Spatial Data Catalogs

  • STAC (SpatioTemporal Asset Catalog): Primarily for Earth observation data, but the catalog pattern is applicable to transport datasets. JSON-based metadata with spatial/temporal extent.
  • CKAN: Open-source data catalog used by many government transport data portals (e.g., data.gov, transport.data.gouv.fr).
  • OGC API - Records: Standard for spatial data catalog services.

5. Real-Time Geospatial Visualization Technologies

5.1 Deck.gl

Developer: Originally by Uber's visualization team, now part of the vis.gl framework suite under open governance (Open Visualization, hosted by the OpenJS Foundation; previously the Urban Computing Foundation under the Linux Foundation).

Architecture: WebGL2/WebGPU-powered visualization framework for large-scale data. React-friendly but also works standalone.

Key capabilities for transport:

  • ScatterplotLayer: Vehicle/stop positions (millions of points)
  • LineLayer / ArcLayer: Origin-destination flows, route visualization
  • PathLayer: Vehicle trajectories, route geometries
  • GeoJsonLayer: General-purpose geospatial rendering
  • TripsLayer: Animated vehicle movements along paths over time
  • HexagonLayer / H3HexagonLayer: Spatial aggregation and heatmaps
  • TileLayer / MVTLayer: Efficient rendering of tiled data
  • IconLayer: Station icons, vehicle icons
  • TextLayer: Labels

Performance: Can render millions of data points at 60fps using GPU acceleration. Supports WebGL instanced rendering and binary data transfer.

Data integration:

  • Reads GeoJSON, binary formats, Arrow/Parquet (via loaders.gl)
  • Supports tiled data loading (tiles loaded on demand as user pans/zooms)
  • Real-time data update via efficient state management

Strengths: Extremely high performance, rich layer library, well-maintained, large community, excellent for transport-specific visualizations (trips, flows, networks).

Limitations: WebGL/GPU requirement (not all devices), learning curve, requires custom development (not a turnkey solution).

Assessment for GIST: Primary recommendation for data visualization layer. The TripsLayer, H3 integration, and massive point rendering capability make it ideal for a global transport intelligence system.
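
TripsLayer animates paths using a list of coordinates plus a parallel list of timestamps, read through its `getPath` and `getTimestamps` accessors. A backend could prepare that shape from raw vehicle positions roughly as follows. The input record fields (`vehicle_id`, `t`, `lon`, `lat`) and the output field names are assumptions chosen for this sketch.

```python
from collections import defaultdict


def to_trips(positions, t0=None):
    """Group position records per vehicle into path/timestamps pairs.

    Timestamps are made relative to t0 (default: earliest record) so the
    values stay small, which suits 32-bit floats on the GPU.
    """
    if not positions:
        return []
    if t0 is None:
        t0 = min(p["t"] for p in positions)
    by_vehicle = defaultdict(list)
    for p in positions:
        by_vehicle[p["vehicle_id"]].append(p)
    trips = []
    for vehicle_id, points in sorted(by_vehicle.items()):
        points.sort(key=lambda p: p["t"])  # paths must be in time order
        trips.append({
            "vehicle_id": vehicle_id,
            "path": [[p["lon"], p["lat"]] for p in points],
            "timestamps": [p["t"] - t0 for p in points],
        })
    return trips


positions = [
    {"vehicle_id": "bus-1", "t": 1010, "lon": 4.91, "lat": 52.37},
    {"vehicle_id": "bus-1", "t": 1000, "lon": 4.90, "lat": 52.36},
    {"vehicle_id": "bus-2", "t": 1005, "lon": 4.80, "lat": 52.30},
]
trips = to_trips(positions)
print(trips[0])
```

On the client, the accessors would then be along the lines of `getPath: d => d.path` and `getTimestamps: d => d.timestamps`.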

5.2 Kepler.gl

Developer: Originally by Uber, now part of vis.gl / Open Visualization Collaboration.

Architecture: Built on top of Deck.gl and MapLibre GL JS. Provides a complete visual analytics application with GUI.

Key capabilities:

  • No-code/low-code geospatial visualization
  • Drag-and-drop data import (CSV, GeoJSON, Arrow)
  • Multiple layer types (point, arc, line, hexbin, heatmap, trip, polygon, cluster, grid, icon, S2, H3)
  • Time playback for temporal data (trip animations)
  • Filters and cross-filtering
  • Split map view
  • Map styles and basemap switching
  • Export to image and HTML

Strengths: Fastest path from data to visualization, excellent for exploration and prototyping, shareable visualizations, no coding required for basic use.

Limitations: Less customizable than raw Deck.gl, not designed for production embedded applications (though embeddable as React component), limited interactivity model, not ideal for real-time streaming data.

Assessment for GIST: Excellent for data exploration, prototyping, and analyst-facing dashboards. Could serve as the visual analytics component for internal data exploration. For the production-facing Transport Angel interface, custom Deck.gl development offers more control.

5.3 MapLibre GL JS

Origin: Community fork of Mapbox GL JS after Mapbox changed its license from BSD to proprietary (December 2020). MapLibre is fully open source (BSD-3-Clause).

Architecture: WebGL-based vector map rendering engine. Renders Mapbox Vector Tiles (MVT) using a style specification (compatible with Mapbox Style Spec).

Key capabilities:

  • High-performance vector tile rendering
  • Style-driven cartography (programmatic map styling via JSON)
  • 3D terrain and buildings
  • Smooth animations and camera control
  • Marker and popup support
  • Custom layers (integrate with Deck.gl)
  • Globe view (3D globe rendering)
  • Right-to-left text rendering (important for global system)
  • Localization support

Ecosystem:

  • MapLibre GL JS (web)
  • MapLibre Native (iOS, Android)
  • MapLibre RS (Rust rendering engine, in development)
  • Large plugin ecosystem
  • Compatible with MapTiler, Stadia Maps, Jawg, and self-hosted tile servers

Integration with Deck.gl: Deck.gl can be used as a MapLibre custom layer, combining MapLibre's basemap rendering with Deck.gl's data visualization layers. This is the recommended architecture for GIST.

Strengths: Fully open source, high performance, beautiful cartography, mature and well-maintained, large community, native mobile support.

Limitations: Vector tile rendering requires tile infrastructure (or use cloud tile services), style spec is complex, WebGL requirement.

Assessment for GIST: Primary recommendation for basemap rendering. MapLibre GL JS + Deck.gl is the strongest open-source stack for real-time transport visualization.
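
Style-driven cartography means the map's appearance is a JSON document. A minimal sketch of generating such a style is shown below; it follows the style specification's top-level structure (`version`, `sources`, `layers`), while the tile URL, the `transport` source name, the `routes` source layer, and the colours are hypothetical.

```python
import json


def transport_style(tile_url_template, source_layer="routes"):
    """Return a minimal MapLibre style with one vector source and a line layer."""
    return {
        "version": 8,
        "sources": {
            "transport": {
                "type": "vector",
                "tiles": [tile_url_template],  # {z}/{x}/{y} template
            }
        },
        "layers": [
            {
                "id": "background",
                "type": "background",
                "paint": {"background-color": "#f8f8f8"},
            },
            {
                "id": "route-lines",
                "type": "line",
                "source": "transport",
                "source-layer": source_layer,
                "paint": {"line-color": "#0066cc", "line-width": 2},
            },
        ],
    }


style = transport_style("https://example.org/tiles/{z}/{x}/{y}.pbf")
print(json.dumps(style, indent=2))
```

Generating styles programmatically makes it straightforward to keep per-mode or per-operator variants consistent.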

5.4 Other Visualization Tools

MapTiler: Commercial tile hosting and processing with generous free tier. Provides pre-built basemap tiles, geocoding, and SDKs. Uses MapLibre under the hood.

Mapbox: Commercial platform (Mapbox GL JS is no longer open source as of v2). Outperforms MapLibre and offers more features in some areas, but is expensive at scale.

Leaflet: Lightweight, simple map library. Good for basic maps. Not suitable for GIST's performance requirements (no WebGL, struggles with large datasets).

OpenLayers: Full-featured open-source web map library. Supports many formats and projections. More complex API than Leaflet. Good OGC standards support. Less performant than MapLibre for vector tiles.

CesiumJS: 3D globe visualization. WebGL-based. Excellent for 3D terrain, flight paths, satellite tracking. Relevant if GIST needs 3D globe visualization for aviation/maritime.

Felt: Commercial collaborative mapping platform. Good for team workflows but not suitable for embedded/custom applications.


6. Real-Time Data Streaming Infrastructure

6.1 Streaming Patterns for Transport Data

Real-time transport visualization requires a streaming data pipeline:

Data Sources --> Ingestion --> Processing --> Serving --> Visualization
(GTFS-RT,     (Kafka,       (Flink,      (WebSocket, (Deck.gl,
 SIRI,        Pulsar,       Spark        SSE,        MapLibre)
 AIS,         MQTT)         Streaming,   HTTP/2)
 ADS-B)                     Kafka
                            Streams)

6.2 Message Brokers

Apache Kafka:

  • Industry standard for high-throughput event streaming
  • Excellent for ingesting multiple transport data feeds
  • Supports exactly-once semantics, compaction, replay
  • Topic-per-source or topic-per-region partitioning for transport data
  • Recommended as primary message broker for GIST

Apache Pulsar:

  • Alternative to Kafka with built-in multi-tenancy
  • Built-in tiered storage (hot/warm/cold)
  • Native geo-replication (relevant for a global system)

MQTT:

  • Lightweight IoT messaging protocol
  • Used by OGC SensorThings API
  • Relevant for vehicle/sensor telemetry
  • Lower overhead than Kafka for edge devices

Redis Streams:

  • In-memory stream processing
  • Very low latency
  • Good for real-time caching layer between backend and WebSocket server

6.3 Client-Side Real-Time Delivery

WebSocket: Full-duplex communication. Best for continuous real-time updates (vehicle positions). Maintains persistent connection.

Server-Sent Events (SSE): One-directional push from server. Simpler than WebSocket. Good for alerts, status updates. Auto-reconnect built in.
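
On the wire, an SSE stream is plain text: optional `id:` and `event:` fields, one or more `data:` lines, and a blank line that ends the event. A small formatter, sketched here with a hypothetical alert payload, shows the framing:

```python
import json


def sse_event(data, event=None, event_id=None):
    """Serialize one Server-Sent Event frame with a JSON payload."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")    # lets clients resume after reconnect
    if event:
        lines.append(f"event: {event}")    # named event type
    payload = json.dumps(data, separators=(",", ":"))
    for line in payload.splitlines():
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"       # blank line terminates the event


frame = sse_event({"alert": "delay", "route": "17"}, event="alert", event_id=7)
print(frame, end="")
```

The `id` field is what makes the built-in reconnect useful: the browser sends the last seen ID back in the `Last-Event-ID` header so the server can replay missed events.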

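HTTP/2 and HTTP/3 streaming: Multiplexed streams over a single connection. HTTP/2 Server Push itself has been removed from Chrome and is effectively deprecated, so it is not a viable delivery mechanism; streamed responses and WebTransport (over HTTP/3) are the emerging alternatives to WebSocket for some use cases.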

gRPC Streaming: Protocol Buffer-based bidirectional streaming. Efficient binary format. Good for service-to-service communication, less common for browser clients (requires gRPC-web proxy).

6.4 Spatial Streaming Patterns

Geospatial subscription: Clients subscribe to updates within a geographic bounding box or polygon. As vehicles enter/exit the area, subscriptions automatically route relevant updates.
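
A minimal sketch of the subscription idea, assuming an in-memory registry and rectangular areas only: each client registers a bounding box, and every incoming position is routed to the clients whose box contains it. `BBox` and `SubscriptionRegistry` are hypothetical names; the linear scan and the lack of antimeridian handling are simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon, lat):
        # Simplification: boxes that cross the antimeridian are not handled.
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


class SubscriptionRegistry:
    def __init__(self):
        self._subs = {}  # client_id -> BBox

    def subscribe(self, client_id, bbox):
        self._subs[client_id] = bbox   # re-subscribing replaces the old box

    def unsubscribe(self, client_id):
        self._subs.pop(client_id, None)

    def recipients(self, lon, lat):
        """Clients whose subscribed area contains the point."""
        return sorted(c for c, b in self._subs.items() if b.contains(lon, lat))


registry = SubscriptionRegistry()
registry.subscribe("amsterdam-view", BBox(4.7, 52.2, 5.1, 52.5))
registry.subscribe("netherlands-view", BBox(3.3, 50.7, 7.3, 53.6))
print(registry.recipients(4.9, 52.37))
```

With many subscribers the linear scan would be replaced by a spatial index over the subscription boxes, or by the cell-based partitioning described next.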

Spatial partitioning: Data is partitioned by geographic region (e.g., H3 cells, S2 cells, geohash prefixes) in the message broker. Consumers subscribe to relevant partitions based on the user's viewport.
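
To illustrate partitioning by geohash prefix, the sketch below encodes a position as a geohash and derives a broker partition from a short prefix, so that nearby positions land in the same partition. The geohash encoding follows the standard algorithm; the choice of a 4-character prefix, CRC32 as the stable hash, and the `partition_for` helper are assumptions for this example.

```python
import zlib

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat, lon, precision=5):
    """Encode a WGS 84 position as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def partition_for(lat, lon, num_partitions, prefix_len=4):
    """Return (partition key, partition index) for a position."""
    key = geohash(lat, lon, prefix_len)
    return key, zlib.crc32(key.encode("ascii")) % num_partitions


print(geohash(57.64911, 10.40744, 5))
print(partition_for(57.64911, 10.40744, num_partitions=12))
```

A shorter prefix gives larger regions per partition; consumers serving a viewport would subscribe to the partitions of all prefixes that intersect it.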

Level-of-detail streaming: At low zoom levels, send aggregated data (heatmaps, cluster counts). At high zoom levels, send individual features. Adapts data volume to viewport.


7. Tile Infrastructure for Global Scale

7.1 Vector Tile Pipeline

For serving transport network data at global scale:

  1. Data processing: Convert transport network data to vector tiles using tippecanoe (originally from Mapbox, now maintained by Felt), or generate tiles on the fly from PostGIS with Martin.
  2. Tile storage: Store as PMTiles (single file on cloud storage) or in MBTiles (SQLite).
  3. Tile serving:
    • PMTiles on S3/R2/GCS: Serverless, uses HTTP range requests. No tile server needed. Very cost-effective.
    • Martin (Rust): Dynamic vector tile server from PostGIS. Real-time tile generation from database queries.
    • pg_tileserv (PostGIS): Another PostGIS tile server option.
    • TileServer GL: Serves pre-generated tiles and can render raster tiles from vector sources.

Suggested layer strategy for GIST:

  • Static layers (transport networks, stop locations, administrative boundaries): Pre-generate as PMTiles, serve from cloud object storage. Update periodically (daily/weekly).
  • Dynamic layers (vehicle positions, real-time status): Serve directly from PostGIS via Martin, or push to clients via WebSocket as GeoJSON/binary features.
  • Basemap: Use MapTiler or self-hosted OpenMapTiles for basemap tiles.
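
Tile servers and clients address tiles by zoom/x/y in the Web Mercator tiling scheme. As a worked example of that addressing, the function below converts a longitude/latitude to its tile coordinates using the standard slippy-map formula; it is a sketch for illustration, not part of any of the tools named above.

```python
import math

MAX_LAT = 85.05112878  # latitude limit of the Web Mercator square


def lonlat_to_tile(lon, lat, zoom):
    """Return (zoom, x, y) of the XYZ tile containing the point."""
    n = 2 ** zoom                      # tiles per axis at this zoom
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or lat = -MAX_LAT stay inside the grid.
    return zoom, min(x, n - 1), min(y, n - 1)


print(lonlat_to_tile(-0.1276, 51.5074, 10))  # central London
```

The same arithmetic is what viewport-aware loading relies on: the visible bounding box maps to a small rectangle of x/y indices at the current zoom.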

8. Geocoding and Search

8.1 Open-Source Geocoding

  • Pelias (Linux Foundation): Modular open-source geocoder. Supports multiple data sources (OSM, Who's on First, OpenAddresses, GeoNames). Elasticsearch-based.
  • Nominatim: OSM's geocoder. PostgreSQL-based. Good quality but single-source (OSM only).
  • Photon: Elasticsearch-based geocoder using OSM data. Fast, supports reverse geocoding.

For GIST, users need to search for:

  • Stop/station names (fuzzy matching, multilingual)
  • Route names and numbers
  • Place names and addresses
  • Operator names

Recommendation: Use Elasticsearch/OpenSearch with custom analyzers for transport entity search. Index stop/station names from GTFS/NeTEx alongside place names from OSM/geocoding services. Support multilingual search with language-specific analyzers.
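
The effect of accent folding and fuzzy matching can be shown without a search cluster. The sketch below uses Python's standard library as a stand-in for what Elasticsearch/OpenSearch analyzers would do at index and query time; it is not how those engines are implemented, and the stop names are sample data.

```python
import difflib
import unicodedata


def normalize(name):
    """Fold case and strip diacritics, e.g. 'Zürich HB' -> 'zurich hb'."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def search_stops(query, stop_names, limit=5, cutoff=0.6):
    """Return stop names that approximately match the query, best first."""
    index = {}
    for name in stop_names:
        index.setdefault(normalize(name), name)  # normalized form -> original
    matches = difflib.get_close_matches(
        normalize(query), list(index), n=limit, cutoff=cutoff
    )
    return [index[m] for m in matches]


stops = ["Zürich HB", "Gare de Lyon", "München Hbf", "Amsterdam Centraal"]
print(search_stops("zurich hb", stops))
print(search_stops("amsterdam central", stops))
```

In a production index the same normalization would be expressed as an analyzer chain (for example ASCII folding plus lowercase), with language-specific analyzers added per field.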


9. Recommended Technology Stack

| Component | Technology | Rationale |
|---|---|---|
| Basemap rendering | MapLibre GL JS | Open source, high performance, mobile support |
| Data visualization | Deck.gl | GPU-accelerated, transport-specific layers |
| Spatial database | PostGIS | Standard, mature, routing support (pgRouting) |
| Analytical database | DuckDB + Spatial | Fast analytics, cloud-native formats |
| Spatial indexing | H3 | Hexagonal aggregation, multi-resolution |
| Tile serving | PMTiles (static) + Martin (dynamic) | Serverless + real-time |
| Real-time streaming | Apache Kafka + WebSocket | High throughput + browser delivery |
| Search | Elasticsearch / OpenSearch | Full-text + geospatial |
| Geocoding | Pelias or Photon | Open source, multilingual |
| Tile processing | Tippecanoe | Industry standard for tile generation |
| Data formats | GeoJSON (API), GeoParquet (storage), FlatGeobuf (streaming), MVT (tiles) | Best format for each use case |

10. Scalability Considerations for Global System

10.1 Data Volume Estimates

| Data Source | Approximate Volume | Update Frequency |
|---|---|---|
| GTFS Schedule (global) | ~50-100 GB (all agencies) | Weekly/monthly |
| GTFS-RT (global) | ~10-50 GB/day | Every 15-60 seconds |
| AIS (global) | ~20-50 GB/day | Every 2-30 seconds |
| ADS-B (global) | ~5-20 GB/day | Every 1-5 seconds |
| GBFS (global) | ~1-5 GB/day | Every 30-60 seconds |
| DATEX II (European) | ~5-20 GB/day | Every 1-5 minutes |
| OSM transport data | ~10-20 GB (extract) | Minutely diffs available |
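
Adding up the per-day estimates above gives a feel for the streaming ingest budget. The short calculation below sums the low and high ends of the daily figures from the table (the static GTFS Schedule and OSM volumes are excluded because they are not per-day rates) and scales them to a 30-day window.

```python
# (low, high) estimates in GB/day, taken from the table above.
DAILY_GB = {
    "GTFS-RT": (10, 50),
    "AIS": (20, 50),
    "ADS-B": (5, 20),
    "GBFS": (1, 5),
    "DATEX II": (5, 20),
}


def total_range_gb(days=1):
    """Sum the low and high daily estimates over a number of days."""
    low = sum(lo for lo, _ in DAILY_GB.values()) * days
    high = sum(hi for _, hi in DAILY_GB.values()) * days
    return low, high


print(total_range_gb())      # per day
print(total_range_gb(30))    # 30-day window
```

So the real-time feeds alone amount to roughly 41-145 GB per day, or about 1.2-4.4 TB for a 30-day window, before any compression or downsampling.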

10.2 Key Scaling Strategies

  1. Spatial partitioning: Partition data by geographic region (continental, national, or H3 cell level).
  2. Temporal partitioning: Hot data (real-time, last 24 hours) in fast storage; warm data (last 30 days) in analytical storage; cold data (historical) in archival storage.
  3. Tile-based delivery: Pre-compute and cache tiles to avoid per-request computation.
  4. Edge caching: Use CDN for static/semi-static data (schedule data, basemap tiles).
  5. Viewport-aware loading: Only load data within the user's current viewport and zoom level.
  6. Progressive loading: Load coarse data first, refine as user zooms in.
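
The temporal partitioning rule in item 2 can be written down directly. The sketch below classifies a record by age using the 24-hour and 30-day thresholds from the list; the function name and the tier labels as return values are assumptions.

```python
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(hours=24)
WARM_WINDOW = timedelta(days=30)


def storage_tier(event_time, now):
    """Return 'hot', 'warm', or 'cold' for a record based on its age."""
    age = now - event_time
    if age <= HOT_WINDOW:      # real-time and last 24 hours
        return "hot"
    if age <= WARM_WINDOW:     # last 30 days
        return "warm"
    return "cold"              # historical archive


now = datetime(2026, 2, 9, 12, 0, tzinfo=timezone.utc)
print(storage_tier(now - timedelta(hours=1), now))
print(storage_tier(now - timedelta(days=7), now))
print(storage_tier(now - timedelta(days=90), now))
```

A scheduled job applying this rule could move partitions between tiers, for example from PostGIS to GeoParquet files and then to archival object storage.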

11. References