Data Interoperability
Standards, formats, and cross-source integration patterns
Transport Data Standards and Interoperability
Research document for the Global Intelligence System for Transport (GIST)
Last updated: 2026-02-09
1. Overview
The global transport data ecosystem is fragmented across hundreds of formats, standards, and proprietary systems. Building a system that aggregates data from many different transport databases requires deep understanding of the dominant standards, their data models, relationships, and the gaps between them. This document maps the landscape of transport data standards and interoperability patterns relevant to GIST.
2. Public Transit Standards
2.1 GTFS (General Transit Feed Specification)
Origin and governance: Created by Google and TriMet (Portland, OR) in 2005 as "Google Transit Feed Specification." Renamed to "General Transit Feed Specification" to reflect broader adoption. Now governed by MobilityData (a non-profit), with an open specification process on GitHub.
GTFS Schedule (static data):
- Format: A collection of CSV files packaged in a ZIP archive.
- Key files:
  - agency.txt -- Transit agency metadata
  - stops.txt -- Stop locations (lat/lon, name, accessibility info)
  - routes.txt -- Route definitions (short name, long name, type)
  - trips.txt -- Individual trips on a route
  - stop_times.txt -- Arrival/departure times at each stop for each trip
  - calendar.txt / calendar_dates.txt -- Service schedules and exceptions
  - shapes.txt -- Geographic path of vehicle travel (polylines)
  - fare_attributes.txt / fare_rules.txt -- Fare information
  - frequencies.txt -- Headway-based schedules
  - transfers.txt -- Transfer rules between stops
  - pathways.txt -- In-station navigation paths
  - levels.txt -- Station levels
  - feed_info.txt -- Feed metadata
- Extensions: GTFS-Fares v2 (complex fare modeling), GTFS-Flex (demand-responsive transit), GTFS-Pathways (in-station navigation).
- Strengths: Extremely widely adopted (thousands of agencies worldwide), simple CSV format, easy to parse, large open-source tooling ecosystem.
- Limitations: No native support for complex fare structures (being addressed by Fares v2), limited demand-responsive transit modeling, no real-time data (separate spec), limited accessibility detail.
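Because a GTFS Schedule feed is just CSV files in a ZIP archive, it can be read with a standard library alone. The following is a minimal sketch, not a full GTFS parser: the sample feed is hypothetical data built in memory, and the helper names (`read_gtfs_table`, `departures_by_stop_name`) are illustrative assumptions.

```python
import csv
import io
import zipfile


def read_gtfs_table(feed: zipfile.ZipFile, name: str) -> list[dict]:
    """Read one GTFS text file from the archive into a list of row dicts."""
    with feed.open(name) as raw:
        # utf-8-sig drops the optional byte-order mark that some feeds include.
        text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
        return list(csv.DictReader(text))


def departures_by_stop_name(feed: zipfile.ZipFile) -> dict[str, list[str]]:
    """Join stop_times.txt to stops.txt on stop_id."""
    names = {r["stop_id"]: r["stop_name"] for r in read_gtfs_table(feed, "stops.txt")}
    result: dict[str, list[str]] = {}
    for row in read_gtfs_table(feed, "stop_times.txt"):
        result.setdefault(names[row["stop_id"]], []).append(row["departure_time"])
    return result


def build_sample_feed() -> bytes:
    """Build a tiny two-file feed in memory (hypothetical data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "stops.txt",
            "stop_id,stop_name,stop_lat,stop_lon\n"
            "S1,Main St,45.5200,-122.6800\n"
            "S2,Oak Ave,45.5300,-122.6700\n",
        )
        archive.writestr(
            "stop_times.txt",
            "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
            "T1,08:00:00,08:00:00,S1,1\n"
            "T1,08:10:00,08:10:30,S2,2\n",
        )
    return buffer.getvalue()


with zipfile.ZipFile(io.BytesIO(build_sample_feed())) as feed:
    DEPARTURES = departures_by_stop_name(feed)
print(DEPARTURES)
```

The flat relational layout is what makes the join a simple dictionary lookup on `stop_id`; real feeds would also need validation of required files and columns.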
GTFS Realtime:
- Format: Protocol Buffers (protobuf) binary format.
- Message types:
  - TripUpdate -- Real-time arrival/departure predictions
  - VehiclePosition -- Current vehicle locations (lat/lon, bearing, speed)
  - Alert -- Service alerts (delays, detours, station closures)
- Delivery pattern: HTTP polling (consumer requests feed URL periodically). Some providers also offer WebSocket or push-based delivery.
- Update frequency: Typically every 15-60 seconds for vehicle positions, 30-120 seconds for trip updates.
- Strengths: Complements GTFS Schedule, wide adoption, efficient binary format.
- Limitations: Polling-based model can create latency; no standardized streaming; protobuf requires schema compilation.
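The polling delivery pattern can be sketched as a loop that re-fetches the feed and skips snapshots it has already seen, using the feed header timestamp. This is an illustrative sketch only: `fetch` and `decode` are injected placeholders, and the demo uses JSON as a stand-in for the protobuf payload so the example runs without third-party packages. A real consumer would decode with the protobuf-generated `FeedMessage` class.

```python
import json
import time
from typing import Callable, Iterator


def poll_feed(
    fetch: Callable[[], bytes],
    decode: Callable[[bytes], dict],
    interval_s: float = 30.0,
    max_polls: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield each decoded feed whose header timestamp is newer than the last one seen."""
    last_timestamp = -1
    for attempt in range(max_polls):
        feed = decode(fetch())
        timestamp = feed["header"]["timestamp"]
        if timestamp > last_timestamp:  # skip snapshots already processed
            last_timestamp = timestamp
            yield feed
        if attempt < max_polls - 1:
            sleep(interval_s)


# Hypothetical snapshots: the second poll returns an unchanged feed.
_snapshots = iter(
    json.dumps({"header": {"timestamp": ts}, "entity": []}).encode()
    for ts in (100, 100, 130)
)
FRESH = list(poll_feed(lambda: next(_snapshots), json.loads, sleep=lambda _s: None))
print([feed["header"]["timestamp"] for feed in FRESH])
```

Injecting `fetch`, `decode`, and `sleep` keeps the loop testable; the worst-case latency a consumer sees is roughly the producer's update interval plus the polling interval.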
Adoption: Over 10,000 transit agencies worldwide produce GTFS feeds. Major feed catalogs and aggregators include Transitland, the Mobility Database (MobilityData's successor to OpenMobilityData), and the European NAPs (National Access Points).
2.2 NeTEx (Network Timetable Exchange) -- CEN/TS 16614
Origin and governance: European Committee for Standardization (CEN) standard, specifically CEN/TS 16614, published in three parts. Based on the Transmodel conceptual data model (EN 12896). Mandatory for EU member states under the ITS Directive and Delegated Regulation (EU) 2017/1926.
Architecture:
- Part 1: Network topology (stops, routes, lines, networks, tariff zones)
- Part 2: Timetables (service journeys, vehicle journeys, passing times, calendar)
- Part 3: Fares and ticketing (fare structures, distance matrices, access rights, sales packages)
Format: XML, with a highly detailed and deeply nested schema. The full NeTEx XSD is extremely large (thousands of elements).
Data model (Transmodel-based):
- Network: Lines, Routes, Route Points, Route Links
- Timing: Service Journeys, Vehicle Journeys, Passing Times
- Stops: Scheduled Stop Points, Stop Places, Quays, Entrances
- Fares: Fare Zones, Distance Matrices, Fare Products, Access Rights, Sales Offer Packages
- Resources: Vehicle Types, Operators, Organisations
Profiles: Because the full NeTEx schema is extremely complex, countries create national profiles:
- Nordic NeTEx Profile (Norway, Sweden, Finland)
- UK NeTEx Profile (used by Bus Open Data Service)
- French NeTEx Profile (Profil France)
- Netherlands NeTEx/OVapi Profile
- EPIP (European Passenger Information Profile) -- a minimal European profile
Relationship to GTFS: NeTEx is far more expressive than GTFS. A NeTEx file can represent everything in GTFS and much more (complex fares, detailed stop place structures, vehicle types, operator info). Conversion is therefore lossy in one direction only: GTFS to NeTEx preserves the source content, while NeTEx to GTFS loses significant detail.
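To illustrate the lossy direction, the sketch below flattens a small NeTEx-style StopPlace with nested Quays into GTFS stops.txt-like rows. It is a simplified assumption, not a conformant converter: the XML fragment and its IDs are hypothetical, only a handful of elements are read, and everything else in the StopPlace would be dropped.

```python
import xml.etree.ElementTree as ET

NS = {"n": "http://www.netex.org.uk/netex"}

SAMPLE = """<StopPlace xmlns="http://www.netex.org.uk/netex" id="EX:StopPlace:1" version="1">
  <Name>Central Station</Name>
  <Centroid><Location><Longitude>10.7500</Longitude><Latitude>59.9100</Latitude></Location></Centroid>
  <quays>
    <Quay id="EX:Quay:11" version="1">
      <PublicCode>A</PublicCode>
      <Centroid><Location><Longitude>10.7501</Longitude><Latitude>59.9101</Latitude></Location></Centroid>
    </Quay>
    <Quay id="EX:Quay:12" version="1">
      <PublicCode>B</PublicCode>
      <Centroid><Location><Longitude>10.7502</Longitude><Latitude>59.9102</Latitude></Location></Centroid>
    </Quay>
  </quays>
</StopPlace>"""


def _lat_lon(element: ET.Element) -> tuple[str, str]:
    """Read the centroid that is a direct child of the given element."""
    location = element.find("n:Centroid/n:Location", NS)
    return (
        location.findtext("n:Latitude", namespaces=NS),
        location.findtext("n:Longitude", namespaces=NS),
    )


def stop_place_to_gtfs_rows(xml_text: str) -> list[dict]:
    """Map each StopPlace to a station row and each Quay to a child platform row."""
    rows = []
    root = ET.fromstring(xml_text)
    for place in root.iter("{http://www.netex.org.uk/netex}StopPlace"):
        name = place.findtext("n:Name", namespaces=NS)
        lat, lon = _lat_lon(place)
        rows.append({"stop_id": place.get("id"), "stop_name": name, "stop_lat": lat,
                     "stop_lon": lon, "location_type": "1", "parent_station": "",
                     "platform_code": ""})
        for quay in place.findall("n:quays/n:Quay", NS):
            lat, lon = _lat_lon(quay)
            rows.append({"stop_id": quay.get("id"), "stop_name": name, "stop_lat": lat,
                         "stop_lon": lon, "location_type": "0",
                         "parent_station": place.get("id"),
                         "platform_code": quay.findtext("n:PublicCode", namespaces=NS) or ""})
    return rows


ROWS = stop_place_to_gtfs_rows(SAMPLE)
for row in ROWS:
    print(row["stop_id"], row["location_type"], row["parent_station"])
```

The deep hierarchy collapses into a two-level station/platform relation via `location_type` and `parent_station`, which is where most of the structural detail is lost.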
Strengths: Comprehensive data model, European legal mandate, handles complex fare structures, integrates stop place management, extensible.
Limitations: Extreme complexity (XML schema is very large), steep learning curve, inconsistent implementation across countries (national profiles diverge), verbose XML format, fewer open-source tools than GTFS.
2.3 SIRI (Service Interface for Real-time Information) -- CEN/TS 15531
Origin and governance: CEN standard for real-time public transport information exchange. Companion standard to NeTEx (NeTEx for planned/static data, SIRI for real-time).
Services (functional modules):
- Production Timetable (PT): Changes to the planned timetable
- Estimated Timetable (ET): Real-time predictions for entire timetables
- Stop Monitoring (SM): Real-time arrivals/departures at a specific stop
- Vehicle Monitoring (VM): Real-time vehicle positions
- Connection Timetable (CT): Planned connections between services
- Connection Monitoring (CM): Real-time connection protection
- General Message (GM): Incident and service information
- Situation Exchange (SX): Structured incident information (aligned with DATEX II)
- Facility Monitoring (FM): Status of facilities (elevators, escalators)
Exchange patterns:
- Request/Response: Client sends a request, server returns current data
- Subscribe/Notify: Client subscribes, server pushes updates
- Fetched Delivery: Client fetches accumulated notifications
Format: XML (SOAP-based in SIRI 1.x, lighter XML/JSON options in SIRI 2.x). SIRI Lite is a simplified RESTful JSON profile.
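As a rough illustration of consuming a Stop Monitoring response in its XML form, the sketch below extracts the line reference and the arrival delay for each monitored journey. The XML fragment is a hand-written, heavily reduced example (real deliveries carry many more elements), and the function name is an assumption for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

SIRI_NS = "{http://www.siri.org.uk/siri}"

SAMPLE = """<Siri xmlns="http://www.siri.org.uk/siri" version="2.0">
  <ServiceDelivery>
    <StopMonitoringDelivery>
      <MonitoredStopVisit>
        <MonitoredVehicleJourney>
          <LineRef>12</LineRef>
          <MonitoredCall>
            <AimedArrivalTime>2026-02-09T08:00:00+01:00</AimedArrivalTime>
            <ExpectedArrivalTime>2026-02-09T08:03:30+01:00</ExpectedArrivalTime>
          </MonitoredCall>
        </MonitoredVehicleJourney>
      </MonitoredStopVisit>
    </StopMonitoringDelivery>
  </ServiceDelivery>
</Siri>"""


def arrival_delays(xml_text: str) -> list[tuple[str, float]]:
    """Return (line_ref, delay_in_seconds) for each journey with both times present."""
    delays = []
    root = ET.fromstring(xml_text)
    for journey in root.iter(f"{SIRI_NS}MonitoredVehicleJourney"):
        call = journey.find(f"{SIRI_NS}MonitoredCall")
        if call is None:
            continue
        aimed = call.findtext(f"{SIRI_NS}AimedArrivalTime")
        expected = call.findtext(f"{SIRI_NS}ExpectedArrivalTime")
        if aimed and expected:
            # Timestamps are ISO 8601 with an explicit UTC offset.
            delta = datetime.fromisoformat(expected) - datetime.fromisoformat(aimed)
            delays.append((journey.findtext(f"{SIRI_NS}LineRef"), delta.total_seconds()))
    return delays


DELAYS = arrival_delays(SAMPLE)
print(DELAYS)
```

Unlike GTFS Realtime, which expresses predictions relative to a static GTFS trip, a SIRI delivery carries both the aimed and expected times, so the delay can be computed from the message alone.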
Strengths: Comprehensive real-time functionality, European standard with legal backing, integrates with NeTEx data model.
Limitations: Complex XML/SOAP interface, heavy bandwidth requirements for full SIRI, inconsistent national implementations, less widely adopted than GTFS-Realtime globally.
2.4 TransXChange
Scope: UK-specific XML standard for bus timetable data exchange. Used by the UK Bus Open Data Service (BODS).
Key elements: Services, Routes, Route Sections, Journey Patterns, Vehicle Journeys, Stop Points (referencing NaPTAN stop database), Operating Profiles (calendar rules).
Relationship to NeTEx: TransXChange predates NeTEx. The UK is gradually transitioning to NeTEx, but TransXChange remains the primary format for bus data submission in England. TransXChange can be mapped to NeTEx.
Relevance to GIST: Important for UK bus data. Conversion tools exist (TransXChange to GTFS converters are available as open source).
3. Shared Mobility Standards
3.1 GBFS (General Bikeshare Feed Specification)
Origin and governance: Created by NABSA (North American Bikeshare Association), now governed by MobilityData. Version 3.0 was released in 2024.
Format: JSON files served over HTTP. Each feed is a separate JSON endpoint.
Key feeds:
- gbfs.json -- Auto-discovery file listing all feed URLs
- system_information.json -- System metadata (name, operator, URL, timezone)
- station_information.json -- Station locations, capacity, coordinates
- station_status.json -- Real-time availability (bikes available, docks available)
- vehicle_types.json -- Types of vehicles (e-bike, scooter, etc.) [v2.1+]
- free_bike_status.json -- Dockless vehicle locations and availability (renamed vehicle_status.json in v3.0)
- system_pricing_plans.json -- Pricing structures
- system_alerts.json -- System-wide alerts
- geofencing_zones.json -- Operating areas, speed zones, no-park zones [v2.1+]
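A consumer typically starts from the auto-discovery file, resolves the feed URLs, and then joins station_information.json with station_status.json on `station_id`. The sketch below works on hypothetical in-memory payloads rather than live HTTP responses. It assumes the v2.x layout where feeds are nested under a language code, and falls back to a flat `feeds` array and the `num_vehicles_available` field as used, to my understanding, in v3.0.

```python
def feed_urls(discovery: dict, language: str = "en") -> dict[str, str]:
    """Map feed name to URL from a gbfs.json auto-discovery document."""
    data = discovery["data"]
    # Flat list (assumed v3.0 layout) or nested under a language code (v1.x/2.x).
    feeds = data["feeds"] if "feeds" in data else data[language]["feeds"]
    return {feed["name"]: feed["url"] for feed in feeds}


def station_availability(information: dict, status: dict) -> dict[str, int]:
    """Join station names to current vehicle counts on station_id."""
    names = {s["station_id"]: s["name"] for s in information["data"]["stations"]}
    availability = {}
    for station in status["data"]["stations"]:
        count = station.get("num_vehicles_available", station.get("num_bikes_available"))
        availability[names.get(station["station_id"], station["station_id"])] = count
    return availability


# Hypothetical payloads; example.com URLs are placeholders.
DISCOVERY = {"last_updated": 1700000000, "ttl": 60, "version": "2.3", "data": {"en": {"feeds": [
    {"name": "station_information", "url": "https://example.com/gbfs/en/station_information.json"},
    {"name": "station_status", "url": "https://example.com/gbfs/en/station_status.json"},
]}}}
INFORMATION = {"data": {"stations": [
    {"station_id": "a", "name": "Harbour", "lat": 59.91, "lon": 10.75, "capacity": 20},
    {"station_id": "b", "name": "Library", "lat": 59.92, "lon": 10.74, "capacity": 12},
]}}
STATUS = {"data": {"stations": [
    {"station_id": "a", "num_bikes_available": 7, "num_docks_available": 13},
    {"station_id": "b", "num_bikes_available": 0, "num_docks_available": 12},
]}}

URLS = feed_urls(DISCOVERY)
AVAILABILITY = station_availability(INFORMATION, STATUS)
print(sorted(URLS), AVAILABILITY)
```

The `ttl` value in each payload tells the consumer how long the data may be cached, which is a natural input for a polling schedule.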
Strengths: Simple JSON format, real-time by design, covers bikeshare and scootershare, growing adoption.
Limitations: Limited to shared micromobility (no car-share, ride-hail), no historical data in spec, no routing information.
Adoption: Over 800 systems worldwide publish GBFS feeds. Required by many city regulators.
3.2 MDS (Mobility Data Specification)
Origin: Created by LADOT (Los Angeles Department of Transportation) in 2018; now governed by the Open Mobility Foundation.
Purpose: City-centric regulatory standard for shared mobility, covering dockless bikes, e-scooters, and other shared vehicles. Unlike GBFS (consumer-facing), MDS is designed for city regulatory compliance.
APIs:
- Provider API: Implemented by operators; cities pull trip data and vehicle status from it
- Agency API: Implemented by cities; operators push vehicle events and telemetry to it in near real time
- Policy API: Cities publish regulations digitally (geofencing, caps, speed limits)
- Metrics API: Aggregated metrics for policy analysis
Relevance to GIST: Complementary to GBFS. MDS provides trip-level data and regulatory context that GBFS lacks.
4. Road Traffic & Travel Data Standards
4.1 DATEX II (CEN/TS 16157)
Origin and governance: CEN standard for road traffic and travel information exchange. Widely used by national road authorities across Europe. Now aligned with the ITS Directive.
Scope:
- Traffic status and flow data (speed, volume, occupancy)
- Travel time information
- Road works and events
- Weather conditions affecting traffic
- Parking information
- Variable message sign content
- Road condition and winter maintenance
- Charging/fueling infrastructure status
- Urban traffic management
Data model: UML-based conceptual model, with XML encoding. Publication/subscription model.
Exchange patterns:
- Push (publisher-initiated): Publisher sends updates to subscribers
- Pull (client-initiated): Client requests current data
- Exchange: Bidirectional data sharing
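Profile structure: DATEX II defines three interoperability levels (A, B, C). Level A is the standard content model. Level B covers extensions that remain compatible with Level A. Level C covers content models that are not compatible with Levels A or B but still follow DATEX II modelling rules and exchange protocols.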
Strengths: Comprehensive coverage of road traffic data, European standard with strong adoption among road authorities, supports both real-time and historical data.
Limitations: Primarily road-focused (not multimodal), complex XML format, national variations in implementation.
4.2 TPEG (Transport Protocol Experts Group)
Scope: ISO standard for delivering traffic and travel information to end users, particularly via broadcast (DAB, internet). Covers traffic events, weather, fuel prices, parking, public transport.
Relevance to GIST: Relevant as a consumption format, less as a data exchange format. TPEG data can be a source for real-time road conditions.
5. Maritime Transport Standards
5.1 AIS (Automatic Identification System)
Governance: International Maritime Organization (IMO), ITU regulations.
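Purpose: Automatic tracking system for vessel identification and positioning. Under SOLAS Chapter V, AIS is mandatory for all ships of 300 gross tonnage and upwards on international voyages, cargo ships of 500 gross tonnage and upwards not on international voyages, and all passenger ships regardless of size.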
Data transmitted:
- Static data: MMSI, IMO number, vessel name, ship type, dimensions
- Dynamic data: Position (lat/lon), course over ground, speed over ground, heading, rate of turn, navigational status
- Voyage data: Destination, ETA, draught, cargo type
Transmission: VHF radio (ship-to-ship and ship-to-shore), with satellite reception (S-AIS) for global coverage.
Data format: ITU-R M.1371 defines the binary message format. Receivers output the messages as NMEA 0183 sentences (AIVDM/AIVDO) over serial or network links, and most modern systems expose the decoded messages as CSV or JSON.
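The path from an NMEA 0183 sentence to decoded fields can be sketched in a few lines: validate the XOR checksum, unpack the 6-bit armoured payload into a bit string, then slice fields at their fixed bit offsets. This minimal sketch handles only single-fragment sentences and only the message type and MMSI. The sample sentence is a widely circulated test vector, and the function names are illustrative.

```python
def nmea_checksum_ok(sentence: str) -> bool:
    """Check the XOR checksum over the characters between '!' (or '$') and '*'."""
    body, _, checksum = sentence.strip().lstrip("!$").partition("*")
    computed = 0
    for char in body:
        computed ^= ord(char)
    return f"{computed:02X}" == checksum.upper()


def payload_to_bits(payload: str) -> str:
    """Convert the 6-bit ASCII armoured payload into a string of '0'/'1' bits."""
    bits = []
    for char in payload:
        value = ord(char) - 48
        if value > 40:
            value -= 8
        bits.append(f"{value:06b}")
    return "".join(bits)


def decode_header(sentence: str) -> dict:
    """Decode message type (bits 0-5) and MMSI (bits 8-37) of a one-fragment sentence."""
    fields = sentence.split(",")
    bits = payload_to_bits(fields[5])  # field 5 is the payload
    return {"type": int(bits[0:6], 2), "mmsi": int(bits[8:38], 2)}


SAMPLE = "!AIVDM,1,1,,B,177KQJ5000G?tO`K>RA1wUbN0TKH,0*5C"
HEADER = decode_header(SAMPLE)
print(nmea_checksum_ok(SAMPLE), HEADER)  # True {'type': 1, 'mmsi': 477553000}
```

Position reports (types 1 to 3) continue with speed, longitude, latitude, and course at further fixed offsets; multi-fragment messages must be reassembled before decoding.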
Data sources:
- MarineTraffic: Major AIS data aggregator
- VesselFinder: AIS-based vessel tracking
- AISHub: Community AIS data sharing
- UN Global Platform / IMO GISIS: Official maritime data
Strengths: Global mandatory coverage, real-time positioning, standardized data, large existing data infrastructure.
Limitations: Spoofing/manipulation risks, coverage gaps in mid-ocean (without satellite AIS), data quality varies, high-frequency data creates large volumes.
5.2 S-100 Framework
Governance: International Hydrographic Organization (IHO).
Purpose: Universal hydrographic data model framework. S-101 (Electronic Navigational Charts) is the successor to S-57. Relevant to maritime routing and navigation.
6. Aviation Standards
6.1 IATA Standards
Key standards:
- SSIM (Standard Schedules Information Manual): Airline schedule data format. Defines the format for communicating airline schedules between carriers, airports, and data aggregators.
- AIDX (Aviation Information Data Exchange): XML-based standard for exchanging operational flight data between airports and airlines.
- NDC (New Distribution Capability): XML/JSON API standard for airline retail (fares, offers, orders). Modernizes airline distribution beyond legacy GDS protocols.
- ONE Order: Simplifies airline order management into a single order record.
6.2 ICAO Standards
- FIXM (Flight Information Exchange Model): XML-based standard for flight data exchange across ATM (Air Traffic Management) systems.
- AIXM (Aeronautical Information Exchange Model): XML/GML standard for aeronautical information (airports, airspace, navaids).
- WXXM (Weather Information Exchange Model): Aviation weather data.
6.3 Eurocontrol / SWIM
SWIM (System Wide Information Management): Architecture for sharing ATM information across European airspace. Uses FIXM, AIXM, WXXM as data models. Relevant to accessing European aviation operational data.
Relevance to GIST: Aviation data is highly standardized but also highly restricted. Schedule data (SSIM) is commercially available through OAG, Cirium, etc. Real-time flight tracking (ADS-B) is more accessible through services like FlightRadar24, FlightAware, OpenSky Network.
7. Geospatial Transport Data Formats
7.1 GeoJSON (RFC 7946)
Format: JSON-based format for encoding geographic data structures.
Geometry types: Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon, GeometryCollection.
Transport applications:
- Stop/station locations (Point)
- Route geometries (LineString)
- Service areas / geofencing zones (Polygon)
- GBFS geofencing zones use GeoJSON
- Many transport APIs return GeoJSON
Strengths: Simple, human-readable, universal web support, native in most mapping libraries (Leaflet, MapLibre, Deck.gl).
Limitations: Verbose (large file sizes for complex geometries), no built-in topology, no native CRS support (assumes WGS84), not suitable for very large datasets.
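A short sketch of encoding transit stops as a GeoJSON FeatureCollection follows. The stop records are hypothetical; the point worth noting is that RFC 7946 orders coordinates as longitude, latitude, the reverse of the lat/lon order used in GTFS columns.

```python
import json


def stop_feature(stop_id: str, name: str, lat: float, lon: float) -> dict:
    """Build a GeoJSON Point feature; coordinates are [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"stop_id": stop_id, "name": name},
    }


def stops_collection(stops: list[dict]) -> dict:
    """Wrap stop features in a FeatureCollection."""
    features = [stop_feature(s["id"], s["name"], s["lat"], s["lon"]) for s in stops]
    return {"type": "FeatureCollection", "features": features}


# Hypothetical stops.
COLLECTION = stops_collection([
    {"id": "S1", "name": "Main St", "lat": 45.52, "lon": -122.68},
    {"id": "S2", "name": "Oak Ave", "lat": 45.53, "lon": -122.67},
])
print(json.dumps(COLLECTION, indent=2))
```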
7.2 GeoPackage (OGC)
Format: SQLite-based container for geospatial data (vector and raster).
Strengths: Single-file format, supports complex queries via SQL, efficient for large datasets, OGC standard, supports multiple layers and coordinate reference systems.
Transport applications: Offline geospatial data distribution, large-scale network datasets, basemap tiles.
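Since a GeoPackage is an SQLite database, its layer catalogue can be inspected with plain SQL. The sketch below builds a reduced stand-in for the `gpkg_contents` table in memory (a real file has more columns and stores geometries as binary blobs that need a spatial library to decode) and lists the vector layers. The layer names are hypothetical.

```python
import sqlite3


def list_feature_layers(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return (table_name, srs_id) for every vector layer registered in the package."""
    cursor = conn.execute(
        "SELECT table_name, srs_id FROM gpkg_contents "
        "WHERE data_type = 'features' ORDER BY table_name"
    )
    return [(name, srs_id) for name, srs_id in cursor]


# Reduced mock of gpkg_contents; for a real file use sqlite3.connect("network.gpkg").
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gpkg_contents ("
    "table_name TEXT PRIMARY KEY, data_type TEXT NOT NULL, identifier TEXT, srs_id INTEGER)"
)
conn.executemany(
    "INSERT INTO gpkg_contents VALUES (?, ?, ?, ?)",
    [
        ("rail_links", "features", "Rail links", 4326),
        ("basemap", "tiles", "Basemap", 3857),
        ("stops", "features", "Stops", 4326),
    ],
)
LAYERS = list_feature_layers(conn)
print(LAYERS)
```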
7.3 FlatGeobuf
Format: Binary format optimized for fast streaming and random access to geospatial features.
Strengths: Extremely fast for large datasets, supports HTTP range requests (cloud-native), compact binary format.
Relevance to GIST: Excellent for serving large transport network datasets via HTTP without full download.
7.4 PMTiles / Vector Tiles (MVT)
Mapbox Vector Tiles (MVT): Protocol Buffer-based format for tiled vector map data. Efficiently serves map data at multiple zoom levels.
PMTiles: Single-file archive format for tilesets. Cloud-native (supports HTTP range requests). Eliminates need for tile server infrastructure.
Relevance to GIST: Critical for real-time geospatial visualization at global scale. Transport networks can be pre-tiled for efficient rendering.
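Pre-tiling relies on mapping each coordinate to a tile address (zoom, x, y) in the Web Mercator tiling scheme that MVT and PMTiles tilesets commonly use. The following sketch shows the standard conversion; the function name and the clamping at the edges are illustrative choices.

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the Web Mercator tile containing the point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


TILE = lonlat_to_tile(0.0, 0.0, 1)
print(TILE)  # (1, 1)
```

Grouping vehicle positions or stops by tile address at a fixed zoom is a cheap way to bucket features before rendering or caching.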
7.5 OpenStreetMap Transport Data Model
Relevance: OSM is the world's largest open geographic database, with extensive transport infrastructure mapping.
Key transport elements in OSM:
- Highway network: highway=* tags classify roads from motorways to footpaths
- Public transport: public_transport=* tags (stop_position, platform, station, stop_area)
- Railway network: railway=* tags
- Waterways: waterway=* tags
- Aeroway: aeroway=* tags for airport infrastructure
- Route relations: type=route relations for bus routes, train lines, cycling routes, etc.
- Public transport v2 schema: Modern tagging scheme using public_transport=* tags with route relations
Data access:
- Overpass API for querying
- Planet file (full dump, ~70 GB compressed)
- Regional extracts (Geofabrik, BBBike)
- Vector tile services (OpenMapTiles, Protomaps)
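As an example of the Overpass route, the sketch below builds an Overpass QL query for bus stops and public transport platforms inside a bounding box and encodes it into a request URL. It only constructs the request and sends nothing. The endpoint shown is the commonly used public instance, and the bounding box is an arbitrary example.

```python
from urllib.parse import urlencode

OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def bus_stop_query(south: float, west: float, north: float, east: float,
                   timeout_s: int = 25) -> str:
    """Build an Overpass QL query; the bbox order is south, west, north, east."""
    bbox = f"{south},{west},{north},{east}"
    return (
        f"[out:json][timeout:{timeout_s}];"
        "("
        f'node["highway"="bus_stop"]({bbox});'
        f'node["public_transport"="platform"]({bbox});'
        ");"
        "out body;"
    )


QUERY = bus_stop_query(45.5, -122.7, 45.6, -122.6)
REQUEST_URL = f"{OVERPASS_URL}?{urlencode({'data': QUERY})}"
print(QUERY)
```

Querying both tags reflects that older `highway=bus_stop` tagging and the newer `public_transport=*` scheme coexist in the database.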
Strengths: Global coverage, community-maintained, free and open, detailed infrastructure mapping, rich attribute data.
Limitations: Variable data quality and completeness by region, no timetable/schedule data, editing vandalism, complex data model for newcomers.
8. Cross-Cutting Interoperability Frameworks
8.1 Transmodel (EN 12896)
The conceptual reference model underpinning NeTEx, SIRI, and IFOPT. Defines a common vocabulary and data model for public transport:
- Network Description: Topological model of routes, lines, stop points
- Timing Information: Timetables, frequencies, calendar
- Vehicle Scheduling: Blocks, vehicle journeys
- Fare Management: Fare structures, products, distribution
- Passenger Information: Real-time information model
- Management Information: Performance, statistics
Importance to GIST: Transmodel provides the semantic foundation for harmonizing European transport data. Understanding Transmodel is essential for mapping between GTFS, NeTEx, SIRI, and other standards.
8.2 EU ITS Directive and NAPs
The EU ITS Directive (2010/40/EU) and delegated regulations require EU member states to establish National Access Points (NAPs) for transport data. Key delegated regulations:
- (EU) 2017/1926: Multimodal travel information services -- requires publishing of static and dynamic transport data through NAPs, preferencing NeTEx and SIRI formats.
- (EU) 2015/962: Real-time traffic information -- requires DATEX II for road traffic data. Repealed and replaced by (EU) 2022/670 with effect from 1 January 2025.
- (EU) 2024/490: Amends (EU) 2017/1926, updating the requirements for multimodal travel information services. It is distinct from the European Mobility Data Space (EMDS) initiative.
NAP implementations: Each EU member state operates a NAP, and several non-EU countries run comparable platforms (e.g., France transport.data.gouv.fr, Germany Mobilithek, Netherlands NDOV/OVapi, Sweden Trafiklab, plus Norway Entur and UK BODS outside the EU).
8.3 Data Standard Mapping / Crosswalk
| Aspect | GTFS | NeTEx | SIRI | GBFS | DATEX II |
|---|---|---|---|---|---|
| Domain | Public transit (schedule) | Public transit (comprehensive) | Public transit (real-time) | Shared mobility | Road traffic |
| Format | CSV/ZIP | XML | XML/JSON | JSON | XML |
| Data model | Flat relational | Transmodel (deep hierarchy) | Transmodel | Flat JSON | UML/XML |
| Real-time | Via GTFS-RT (protobuf) | No (see SIRI) | Yes | Yes (by design) | Yes |
| Geographic scope | Global | Europe (expanding) | Europe (expanding) | Global | Europe (expanding) |
| Complexity | Low | Very High | High | Low | High |
| Open-source tools | Abundant | Limited | Limited | Moderate | Limited |
| Adoption | 10,000+ agencies | Mandatory in EU | Mandatory in EU | 800+ systems | EU road authorities |
| Fares | Basic (v1), improving (v2) | Comprehensive (Part 3) | No | Pricing plans | Tolling info |
| Accessibility | Basic | Comprehensive | Facility monitoring | No | No |
8.4 Key Interoperability Challenges for GIST
- Schema heterogeneity: Standards use fundamentally different data models (flat CSV vs deep XML hierarchies vs JSON).
- Identifier fragmentation: No global stop/station ID system. Each standard and agency uses its own identifiers. Efforts like IFOPT (Identification of Fixed Objects in Public Transport) and QUAY codes help but are not universal.
- Temporal alignment: Different standards handle time differently (absolute timestamps vs offsets, timezone handling, calendar modeling).
- Spatial reference: Most standards use WGS84 lat/lon, but accuracy, precision, and representation vary.
- Semantic gaps: Concepts that exist in one standard (e.g., NeTEx fare products) may have no equivalent in another (GTFS).
- Update frequencies: Static data (GTFS Schedule) updates weekly or monthly; real-time data (GTFS-RT, SIRI) updates every few seconds to every few minutes.
- Data quality: Varies enormously across agencies, countries, and modes. No universal quality framework.
- Licensing: Data licensing ranges from fully open (CC0, CC-BY) to commercially restricted to government-restricted.
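The temporal alignment challenge can be made concrete with GTFS stop times, which are offsets from "noon minus 12 hours" on the service day and may exceed 24:00:00 for trips running past midnight. The sketch below converts such a value to an absolute UTC timestamp for comparison with real-time feeds. The function name is illustrative, and the example passes a fixed UTC offset for portability; production code would pass a proper IANA zone such as `zoneinfo.ZoneInfo("America/Los_Angeles")`.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo


def gtfs_time_to_utc(service_date: date, gtfs_time: str, agency_tz: tzinfo) -> datetime:
    """Convert a GTFS HH:MM:SS offset (possibly >= 24:00:00) to an aware UTC datetime."""
    hours, minutes, seconds = (int(part) for part in gtfs_time.split(":"))
    local_noon = datetime(
        service_date.year, service_date.month, service_date.day, 12, tzinfo=agency_tz
    )
    # Work in UTC so a daylight-saving change does not distort elapsed time.
    service_start = local_noon.astimezone(timezone.utc) - timedelta(hours=12)
    return service_start + timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Assumption for the demo: a fixed UTC-8 offset instead of a full time zone database.
PACIFIC_STANDARD = timezone(timedelta(hours=-8))
ARRIVAL_UTC = gtfs_time_to_utc(date(2024, 1, 15), "25:30:00", PACIFIC_STANDARD)
print(ARRIVAL_UTC.isoformat())  # 2024-01-16T09:30:00+00:00
```

Anchoring on local noon rather than local midnight is what keeps the result correct on days when clocks change.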
8.5 Recommended Interoperability Strategy for GIST
- Use GTFS as the lingua franca for public transit schedule data (widest adoption, best tooling).
- Ingest NeTEx/SIRI natively for European data where GTFS is insufficient (complex fares, detailed stop places).
- Build a canonical internal data model inspired by Transmodel but pragmatically mapped to GTFS concepts.
- Use GeoJSON as the interchange format for geospatial data in APIs and visualization.
- Implement per-source adapters that normalize heterogeneous sources into the canonical model.
- Maintain provenance metadata tracking original source, format, update time, quality metrics, and license.
- Leverage existing converters (NeTEx-to-GTFS, TransXChange-to-GTFS) where available, but accept some data loss.
- Adopt persistent identifiers (consider Transitland Onestop IDs or similar) as a stable reference layer.
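The adapter and provenance recommendations above could take roughly the following shape. This is a sketch under stated assumptions, not the GIST design: the `CanonicalStop` and `Provenance` types, their fields, and the `source:local_id` identifier scheme are all hypothetical, and the input records are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Provenance:
    source: str          # hypothetical short name of the publishing feed
    source_format: str   # e.g. "GTFS", "GBFS", "NeTEx"
    license: str
    fetched_at: datetime


@dataclass(frozen=True)
class CanonicalStop:
    stop_id: str         # namespaced as "<source>:<local id>" (illustrative scheme)
    name: str
    lat: float
    lon: float
    mode: str
    provenance: Provenance


def from_gtfs_stop(row: dict, provenance: Provenance) -> CanonicalStop:
    """Adapter for one stops.txt row (all CSV values arrive as strings)."""
    return CanonicalStop(
        stop_id=f"{provenance.source}:{row['stop_id']}", name=row["stop_name"],
        lat=float(row["stop_lat"]), lon=float(row["stop_lon"]),
        mode="transit", provenance=provenance,
    )


def from_gbfs_station(station: dict, provenance: Provenance) -> CanonicalStop:
    """Adapter for one station_information.json station object."""
    return CanonicalStop(
        stop_id=f"{provenance.source}:{station['station_id']}", name=station["name"],
        lat=float(station["lat"]), lon=float(station["lon"]),
        mode="bikeshare", provenance=provenance,
    )


NOW = datetime(2026, 2, 9, tzinfo=timezone.utc)
STOPS = [
    from_gtfs_stop(
        {"stop_id": "S1", "stop_name": "Main St", "stop_lat": "45.52", "stop_lon": "-122.68"},
        Provenance("agency-x", "GTFS", "CC-BY-4.0", NOW),
    ),
    from_gbfs_station(
        {"station_id": "a", "name": "Harbour", "lat": 59.91, "lon": 10.75},
        Provenance("bikes-y", "GBFS", "CC0-1.0", NOW),
    ),
]
print([stop.stop_id for stop in STOPS])
```

Keeping provenance on every record means license and freshness checks can run on the canonical model without going back to the source adapter.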
9. Emerging Standards and Trends
- GTFS-Fares v2: Major effort to bring complex fare modeling to GTFS (zone-based, distance-based, transfers, capping).
- GTFS-Flex: Modeling demand-responsive transit (dial-a-ride, deviated fixed routes).
- GBFS v3.0: Localization support, revised geofencing rules, and renamed vehicle feeds (free_bike_status.json became vehicle_status.json).
- MDS 2.0: Expanded scope beyond micromobility, improved privacy.
- TOMP-API: Transport Operator to MaaS Provider API standard (European MaaS ecosystem).
- OSDM (Open Sales and Distribution Model): European standard for rail ticket distribution, complementing NeTEx Fares.
- EU Mobility Data Space: European framework for trusted transport data sharing with governance, consent, and business models.
- GTFS-Pathways: Indoor navigation within transit stations.
10. References and Key Resources
- GTFS specification: https://gtfs.org
- MobilityData (GTFS/GBFS governance): https://mobilitydata.org
- NeTEx: https://netex-cen.eu / CEN/TS 16614
- SIRI: CEN/TS 15531
- Transmodel: EN 12896 / https://transmodel-cen.eu
- DATEX II: https://datex2.eu
- GBFS: https://gbfs.org
- MDS: https://github.com/openmobilityfoundation/mobility-data-specification
- EU NAP overview: https://transport.ec.europa.eu/transport-themes/intelligent-transport-systems/road/action-plan-and-directive/national-access-points_en
- OpenStreetMap public transport: https://wiki.openstreetmap.org/wiki/Public_transport
- IMO AIS: https://www.imo.org/en/OurWork/Safety/Pages/AIS.aspx
- IATA standards: https://www.iata.org/en/programs/airline-retailing/
- OGC standards: https://www.ogc.org/standards/