COTA Buses on a Map
Visualizing Columbus public transit routes using GTFS data, React, and Leaflet as part of the Smart Columbus initiative.
I joined the Smart Columbus team during my Master’s at Ohio State, and honestly, I didn’t fully appreciate what I was walking into. Back in 2016, Columbus had won the U.S. Department of Transportation’s Smart City Challenge, a $40 million grant that beat out 77 other cities. The pitch was to rethink how people move through the city. By the time I got involved, the initiative was well underway and the scope of it was staggering: connected vehicles, autonomous shuttles in the Linden neighborhood, six Smart Mobility Hubs along the CMAX bus rapid transit corridor, an open data platform called the Smart Columbus Operating System. All of it aimed at a city where 85% of residents were driving to work alone.
My piece of it was smaller. I was on the data side, working with transit datasets, specifically COTA’s feeds. COTA is the Central Ohio Transit Authority. They run the bus network across Franklin County and parts of the surrounding area, and they’d just launched CMAX in January 2018, Columbus’s first bus rapid transit line running from downtown up Cleveland Avenue to Westerville. The data was there. The question was what to do with it.
A colleague on the team had a good instinct about this. We’d been looking at COTA’s data feeds (route definitions, stop locations, trip schedules) and the conversation kept circling around the same problem: raw data doesn’t communicate. You can hand someone a spreadsheet with 3,000 rows of stop coordinates and they’ll nod politely. Put those stops on a map and suddenly the gaps in coverage become obvious. The colleague suggested we build a visualization, something interactive where you could see the entire bus network overlaid on Columbus. One of those ideas that seems obvious in hindsight but changes how you think about the data once you do it.
So that became my project. Parse COTA’s transit data and render it on a map using React and Leaflet.
The GTFS Format#
Before getting into the code, the data format itself deserves some attention.
GTFS stands for General Transit Feed Specification, and the history behind it is kind of great. In 2005, a Google engineer named Chris Harrelson was trying to figure out how to get transit directions into Google Maps. He connected with Tim and Bibiana McHugh at TriMet, Portland’s transit agency, who were frustrated that mapping services could give you driving directions but had no idea how to tell you to take the bus. They worked out a data format together, Portland became the first city in Google’s Transit Trip Planner in December 2005, and by September 2006 the format was published publicly as the “Google Transit Feed Specification.” It got renamed to “General Transit Feed Specification” in 2009 because the Google branding was making some agencies hesitant to adopt it.
The format is basically a ZIP file full of CSV files with .txt extensions. You could open them in Excel if you wanted to. I think that simplicity is a big part of why it caught on, because hundreds of transit agencies worldwide publish GTFS feeds now.
Here are the files that matter most for a visualization project like this:
agency.txt has the basic info about the transit agency. Name, URL, timezone. For COTA, this is straightforward: one agency, Eastern time.
routes.txt defines each route in the system. Each row has a route_id, route_short_name (like “2” or “CMAX”), route_long_name (like “East Main Street”), and a route_type (3 for bus, which is everything COTA runs). There’s also a route_color field, a six-digit hex color, which comes in handy for color-coding routes on a map.
trips.txt lists individual trips along a route. A single route might have dozens of trips throughout the day, each with a trip_id, a reference back to route_id, a service_id (linking to the schedule), and a shape_id. The shape ID is the important one here.
stops.txt is every stop in the system. Each row gives you a stop_id, stop_name, stop_lat, and stop_lon. COTA has roughly 3,000 of these.
stop_times.txt is the arrival and departure time at each stop for each trip, and it’s the largest file by far. It connects trips to stops with timestamps in HH:MM:SS format. Times can exceed 24:00:00 for trips that run past midnight, which is a fun quirk of the spec.
shapes.txt is the one that makes the visualization work. Each row contains a shape_id, a latitude, a longitude, and a sequence number. Ordered by sequence, these points trace out the actual geographic path a bus follows. Without this file, you’d have to draw straight lines between stops, which looks wrong because buses follow roads.
calendar.txt and calendar_dates.txt define when services run. Days of the week, start and end dates, and exceptions for holidays.
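That past-midnight quirk in stop_times.txt matters in practice: anything built on Date parsing will choke on an hour of 25. A minimal sketch of handling it, with a hypothetical helper name that isn't from the original project:

```javascript
// Converts a GTFS HH:MM:SS string to seconds since the start of the service
// day. GTFS allows hours of 24 and beyond for trips that run past midnight,
// so this deliberately avoids Date parsing, which would reject '25:30:00'.
function gtfsTimeToSeconds(time) {
  const [hours, minutes, seconds] = time.split(':').map(Number)
  return hours * 3600 + minutes * 60 + seconds
}

gtfsTimeToSeconds('08:15:00') // 29700 seconds into the service day
gtfsTimeToSeconds('25:30:00') // 91800, i.e. 1:30 AM on the next calendar day
```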
The relationships between these files are pretty intuitive. Routes have trips, trips have stop_times that reference stops, and trips reference shapes for their geographic path. Once I understood these relationships, the whole feed clicked into place:
```plaintext
agency.txt → routes.txt → trips.txt → stop_times.txt → stops.txt
                              ↓
                         shapes.txt
```

COTA's Feed#
COTA publishes their GTFS data publicly¹. You download a ZIP, unpack it, and you've got all the files I described above. When I was working on this, the feed had around 41 routes and roughly 3,000 stops across their service area.
One thing I noticed early on: not every transit agency includes shapes.txt. It’s technically optional in the spec. COTA does include it, which saved me from having to infer routes from stop sequences. The shapes give you smooth, road-following polylines for each route.
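For feeds that do omit shapes.txt, the usual fallback is to approximate each trip's path with straight segments between its stops, ordered by stop_sequence from stop_times.txt. A rough sketch of that idea (the field names mirror GTFS, but this helper is illustrative, not code from the project):

```javascript
// Approximates a trip's path when shapes.txt is absent: order the trip's
// stop_times rows by stop_sequence and connect the stop coordinates.
// The result is a jagged stop-to-stop line, not a road-following one.
function fallbackPath(tripStopTimes, stopsById) {
  return tripStopTimes
    .slice() // avoid mutating the caller's array
    .sort((a, b) => a.stop_sequence - b.stop_sequence)
    .map((st) => {
      const stop = stopsById[st.stop_id]
      return [stop.stop_lat, stop.stop_lon]
    })
}
```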
COTA also provides GTFS-realtime feeds with vehicle positions and trip updates delivered as protocol buffers, but for this project I stuck with the static feed. Getting the routes and stops on a map first felt like the right foundation before worrying about real-time positions.
Setting Up React and Leaflet#
For the frontend, I went with React and Leaflet. React was at 16.12 at the time. Hooks had been stable for about a year (shipped in 16.8, February 2019), and I’d been writing new code with them rather than class components. For something like this, mostly loading data and passing it as props to map components, hooks felt like a good fit. useEffect for fetching and parsing the GTFS files, useState to hold the parsed results, and the rendering flows from there.
The mapping library was react-leaflet v2.x, which wrapped Leaflet’s API in React components: Map, TileLayer, Polyline, CircleMarker, Popup, and so on. Under the hood, v2 was still built on class-based abstractions, but the component API was clean enough that you didn’t need to care about that.
I scaffolded the project with Create React App:
```bash
npx create-react-app cota-map
cd cota-map
npm install react-leaflet leaflet
```

CRA 3.x handled all the Webpack and Babel configuration, so I could just focus on the application logic. The basic map setup looks like this:
```jsx
import React from 'react'
import { Map, TileLayer } from 'react-leaflet'
import 'leaflet/dist/leaflet.css'

function CotaMap() {
  const columbusCenter = [39.9612, -82.9988]

  return (
    <Map center={columbusCenter} zoom={12} style={{ height: '100vh', width: '100%' }}>
      <TileLayer
        url='https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png'
        attribution='© OpenStreetMap contributors'
      />
    </Map>
  )
}

export default CotaMap
```

A couple of things to note. The Map component (not MapContainer; that's from a later version of react-leaflet) takes a center prop with latitude and longitude for Columbus, and a zoom level. TileLayer pulls map tiles from OpenStreetMap, which is free and doesn't require an API key. You do need to import Leaflet's CSS separately or the map tiles won't render correctly. That tripped me up for longer than I'd like to admit.
Parsing and Rendering the Data#
The core of the project is turning the GTFS text files into something react-leaflet can render. I wrote a few utility functions for this.
Parsing shapes.txt#
shapes.txt is a CSV where each row is a single point along a route’s path. To render a route, you need to group these points by shape_id and sort them by shape_pt_sequence:
```jsx
function parseShapes(csvText) {
  const lines = csvText.trim().split('\n')
  const headers = lines[0].split(',')
  const idIdx = headers.indexOf('shape_id')
  const latIdx = headers.indexOf('shape_pt_lat')
  const lonIdx = headers.indexOf('shape_pt_lon')
  const seqIdx = headers.indexOf('shape_pt_sequence')
  const shapes = {}
  for (let i = 1; i < lines.length; i++) {
    const values = lines[i].split(',')
    const shapeId = values[idIdx]
    const lat = parseFloat(values[latIdx])
    const lon = parseFloat(values[lonIdx])
    const seq = parseInt(values[seqIdx], 10)
    if (!shapes[shapeId]) shapes[shapeId] = []
    shapes[shapeId].push({ lat, lon, seq })
  }
  Object.values(shapes).forEach((points) => {
    points.sort((a, b) => a.seq - b.seq)
  })
  return shapes
}
```

Nothing fancy. Split the CSV, look up the column indices from the header row, accumulate points into arrays keyed by shape ID, and sort each array by sequence. You end up with an object where each key is a shape_id and each value is an ordered array of { lat, lon, seq } points.
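To make the shape of the result concrete, here's the same group-and-sort idiom run on a tiny inline sample (values invented for illustration):

```javascript
// Two shapes, with shape 'A' deliberately listed out of sequence order,
// as GTFS rows can be.
const rows = [
  { shapeId: 'A', lat: 39.97, lon: -83.0, seq: 2 },
  { shapeId: 'A', lat: 39.96, lon: -83.0, seq: 1 },
  { shapeId: 'B', lat: 40.0, lon: -82.9, seq: 1 }
]

// Group by shape ID, then sort each group by sequence number.
const shapes = {}
for (const { shapeId, lat, lon, seq } of rows) {
  if (!shapes[shapeId]) shapes[shapeId] = []
  shapes[shapeId].push({ lat, lon, seq })
}
Object.values(shapes).forEach((points) => points.sort((a, b) => a.seq - b.seq))

// shapes.A is now ordered seq 1 then 2, ready to feed a Polyline
```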
Parsing stops.txt and routes.txt#
Similar approach for stops and routes:
```jsx
function parseStops(csvText) {
  const lines = csvText.trim().split('\n')
  const headers = lines[0].split(',')
  const idIdx = headers.indexOf('stop_id')
  const nameIdx = headers.indexOf('stop_name')
  const latIdx = headers.indexOf('stop_lat')
  const lonIdx = headers.indexOf('stop_lon')
  return lines
    .slice(1)
    .map((line) => {
      const values = line.split(',')
      return {
        stop_id: values[idIdx],
        stop_name: values[nameIdx],
        stop_lat: parseFloat(values[latIdx]),
        stop_lon: parseFloat(values[lonIdx])
      }
    })
    .filter((s) => !isNaN(s.stop_lat) && !isNaN(s.stop_lon))
}

function parseRoutes(csvText) {
  const lines = csvText.trim().split('\n')
  const headers = lines[0].split(',')
  const idIdx = headers.indexOf('route_id')
  const nameIdx = headers.indexOf('route_short_name')
  const colorIdx = headers.indexOf('route_color')
  return lines.slice(1).map((line) => {
    const values = line.split(',')
    return {
      route_id: values[idIdx],
      route_short_name: values[nameIdx],
      route_color: values[colorIdx] ? `#${values[colorIdx]}` : '#3388ff'
    }
  })
}
```

I used indexOf on the header row rather than hardcoding column positions. GTFS doesn't guarantee column order, so this is more robust. The filter on stops catches any malformed rows that might have empty coordinates.
Connecting routes to shapes#
To color-code the route polylines, you need to link routes.txt to shapes.txt through trips.txt. Each trip references both a route_id and a shape_id, so you can build a mapping:
```jsx
function buildRouteShapeMap(tripsText, routes) {
  const lines = tripsText.trim().split('\n')
  const headers = lines[0].split(',')
  const routeIdx = headers.indexOf('route_id')
  const shapeIdx = headers.indexOf('shape_id')

  const routeColorMap = {}
  routes.forEach((r) => {
    routeColorMap[r.route_id] = r.route_color
  })

  const shapeColorMap = {}
  for (let i = 1; i < lines.length; i++) {
    const values = lines[i].split(',')
    const routeId = values[routeIdx]
    const shapeId = values[shapeIdx]
    if (shapeId && routeColorMap[routeId]) {
      shapeColorMap[shapeId] = routeColorMap[routeId]
    }
  }
  return shapeColorMap
}
```

This gives you a lookup from shape_id to hex color, so when you render the polylines you can color each route correctly.
Putting it all together#
With the parsing done, the map component loads everything with useEffect and renders it:
```jsx
import React, { useEffect, useState } from 'react'
import { CircleMarker, Map, Polyline, Popup, TileLayer } from 'react-leaflet'
import 'leaflet/dist/leaflet.css'

function CotaMap() {
  const [shapes, setShapes] = useState({})
  const [stops, setStops] = useState([])
  const [shapeColors, setShapeColors] = useState({})

  useEffect(() => {
    Promise.all([
      fetch('/data/shapes.txt').then((r) => r.text()),
      fetch('/data/stops.txt').then((r) => r.text()),
      fetch('/data/routes.txt').then((r) => r.text()),
      fetch('/data/trips.txt').then((r) => r.text())
    ]).then(([shapesText, stopsText, routesText, tripsText]) => {
      const parsedShapes = parseShapes(shapesText)
      const parsedStops = parseStops(stopsText)
      const parsedRoutes = parseRoutes(routesText)
      const colorMap = buildRouteShapeMap(tripsText, parsedRoutes)
      setShapes(parsedShapes)
      setStops(parsedStops)
      setShapeColors(colorMap)
    })
  }, [])

  return (
    <Map center={[39.9612, -82.9988]} zoom={12} style={{ height: '100vh', width: '100%' }}>
      <TileLayer
        url='https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png'
        attribution='© OpenStreetMap contributors'
      />
      {Object.entries(shapes).map(([shapeId, points]) => (
        <Polyline
          key={shapeId}
          positions={points.map((p) => [p.lat, p.lon])}
          color={shapeColors[shapeId] || '#3388ff'}
          weight={3}
          opacity={0.7}
        />
      ))}
      {stops.map((stop) => (
        <CircleMarker
          key={stop.stop_id}
          center={[stop.stop_lat, stop.stop_lon]}
          radius={4}
          fillColor='#ff7800'
          color='#000'
          weight={1}
          opacity={1}
          fillOpacity={0.8}
        >
          <Popup>{stop.stop_name}</Popup>
        </CircleMarker>
      ))}
    </Map>
  )
}
```

The useEffect fires on mount, fetches all four GTFS files in parallel with Promise.all, parses them, and sets state. React re-renders the map with routes drawn as colored polylines and stops as orange circle markers with popups. Leaflet handles thousands of markers and polylines without any issues, which I wasn't sure about going in. I half expected to need some kind of clustering or viewport culling, but it just worked.
How React Fits This Problem#
I kept thinking about this while building it: React’s component model maps really well to layers on a map. Your routes are Polyline components, your stops are CircleMarker components, and you compose the whole thing the same way you’d compose any other UI.
I was still getting comfortable with hooks at this point. The pattern of “fetch data in useEffect, store it in useState, render from that” was starting to click for me. A year earlier I would’ve written this as a class component with componentDidMount and this.setState, and it would’ve been fine, but the hooks version reads better. I also liked that the useEffect cleanup function gave you a place to handle request cancellation, which was always kind of awkward with class lifecycles.
The separation between data parsing and rendering turned out to be nice too. The parse functions are pure: give them a string, get back structured data. The React component just maps over that data. If I wanted to swap react-leaflet for Mapbox or something else later, the parsing code wouldn’t change at all.
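Stripped of React, the cancellation idea behind that cleanup is just a stale-response guard. A hypothetical sketch (startEffect and both of its arguments are invented for illustration; in the component, the returned function is what the useEffect callback would return):

```javascript
// The effect kicks off async work; the returned cleanup flips a flag so a
// response arriving after unmount is silently dropped instead of triggering
// a state update on a component that no longer exists.
function startEffect(scheduleFetch, onData) {
  let cancelled = false
  scheduleFetch((data) => {
    if (!cancelled) onData(data)
  })
  return () => { cancelled = true } // plays the role of the useEffect cleanup
}
```

The same guard would wrap the Promise.all chain in the real component; fetch itself doesn't need to change.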
What I Learned#
This project taught me a few things I don’t think I would’ve picked up from a tutorial.
The GTFS format is really well-designed. On the surface it's just CSVs in a ZIP file, but the relational model underneath (routes to trips to stop_times to stops, with shapes as a separate geographic layer) does a lot with very little. The barrier to entry is low, since any agency can produce CSVs, yet the format can still model complex transit networks: service calendars, fare rules, pathways within stations. It started as a side project between one Google engineer and one transit agency, and now it's a global standard. That doesn't happen unless the data model is right.
Seeing the routes render on a map changed how I thought about the data. I’d been staring at COTA’s feed as rows and columns for weeks. The moment the colored lines appeared tracing actual streets in Columbus, I started noticing things. Routes bunch up along High Street and Broad Street downtown. Coverage thins out past the I-270 outerbelt. The CMAX corridor runs straight up Cleveland Avenue like a spine. All of that was in the data already, obviously, but I hadn’t seen it.
Being on the Smart Columbus team gave me context I wouldn’t have had otherwise, and it’s hard to overstate how much that mattered. I’d be working on transit data and then sit in a meeting where city planners are talking about Linden’s infant mortality rate and how lack of transportation access contributes to it. That reframes what you’re building. A gap in coverage on the map isn’t an abstract data point, it means there are people in that area who can’t easily get on a bus.
And then there’s open data, which made all of this possible in the first place. GTFS works because it’s open. Any developer can download COTA’s feed and build something with it. The Smart Columbus Operating System was designed so any city could fork the code and stand up their own instance. There’s no reason a grad student with a laptop should be able to visualize an entire city’s transit network in a weekend, but open standards make that weirdly doable.
What’s Next#
I want to take this further eventually. The static GTFS feed gives you the network as it’s designed to operate, but COTA also publishes GTFS-realtime feeds with live vehicle positions. Putting those on the same map, showing where every bus actually is right now, would make this more than just a visualization exercise. The real-time feed uses protocol buffers instead of CSV, so it’s a different parsing problem, but the rendering side would be similar.
There’s also the question of layering other datasets on top. The SCOS platform has crash data, mobility hub locations, demographic information. Overlaying transit coverage with access to healthcare facilities or employment centers could surface the kinds of equity insights the Smart Columbus initiative was built around.
For now though, I’m happy with where this ended up. A few hundred lines of React, an open data format, and a free mapping library gave me a window into how an entire city moves.
Footnotes#

1. COTA GTFS data is available at cota.com/data.