Thursday, March 30, 2017

Foundations of Evaluating Public Transit Networks, Part 1: From Maps to Scores

In the Foundations of Evaluating Public Transit Networks introduction post, I established the goal of measuring how useful a public transit network is: how well it allows many people to get to all of their desired destinations. Public Transit Analytics has developed a scoring system based on isochrone maps to accomplish this goal. This type of map has become an important, and increasingly standard, component of the transit planning process. If you are unfamiliar with them, public transit consultant Jarrett Walker describes them concisely and compellingly on his Human Transit blog. The transit planning software tool Remix also uses isochrone maps to help transit planners see the impact of the service changes that they implement. My local transit agency, King County Metro, has produced a set of isochrone maps using Remix that highlight the impressive growth of their transit network expected between now and 2040.

With such a tool already in place for generating isochrone maps, it is logical to ask why Public Transit Analytics would implement an entirely new software system. I have never used Remix, but its wide adoption speaks to its quality. For King County Metro's long range planning, it provides a clear visual that the network is indisputably improving. However, looking at the maps that it is generating, there appear to be limitations that would hinder score production, especially for more surgical changes to a transit network. For that reason, Public Transit Analytics's Score Generator generates isochrone maps in what appears to be a somewhat different way.

Consider the isochrone maps from King County Metro or on the Remix blog post. The reachable area of the map is curvy and bulbous. Based on that, I would speculate that for every destination that a rider reaches in a transit vehicle, the map software draws a circle based on the amount of remaining time. This circle represents the area that the rider can reach on foot in the time remaining after they have exited the transit vehicle. This introduces a level of imprecision. Dead end streets and hills, for example, can drastically change this reachable area in real life.

Perhaps more critically, note the disclaimer at the bottom of King County Metro's maps: they are generated using "the average time spent waiting to transfer". This is another source of imprecision. Such a map can overstate where riders can reach: if the rider arrives at the stop as the bus is pulling away, they will have to wait longer than the average time. That might seem like a small source of error, but these errors cascade. Each destination that the map thinks the rider can reach opens up a set of connections that may in fact be impossible to make. The reverse is true as well: the map can understate where a rider can reach when circumstances allow a transfer with a below-average wait. Worse, the difference between the average and the true transfer time can be significant relative to the total travel time.
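
To make the size of that error concrete, here is a minimal sketch in Python. The schedule is hypothetical: a connecting bus every 15 minutes, so the "average" wait is assumed to be half the headway. The function name and the numbers are mine, for illustration only.

```python
from bisect import bisect_left

def actual_wait(arrival_minute, departures):
    """Minutes until the first departure at or after arrival_minute.

    Returns None if no departure remains. `departures` must be sorted.
    """
    i = bisect_left(departures, arrival_minute)
    return departures[i] - arrival_minute if i < len(departures) else None

# Hypothetical connecting route: one bus every 15 minutes (minutes after 10:00).
departures = [0, 15, 30, 45, 60]
average_wait = 7.5  # assumed: half of the 15-minute headway

for arrival in (1, 14):
    wait = actual_wait(arrival, departures)
    # Positive error: the average-based map overstates what is reachable.
    print(f"arrive at :{arrival:02d}, wait {wait} min, error {wait - average_wait:+.1f} min")
```

A rider who just misses the bus waits 14 minutes instead of 7.5, while one who arrives a minute before it waits only 1. On a 30 minute trip, either error is a large share of the total time.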

The Score Generator's approach avoids these imprecisions. It divides the area that a transit agency serves into a grid of regions called Sectors. It also keeps track of a set of Transit Stops, as well as which Sector each Transit Stop is in. The Score Generator models the course of a virtual rider that starts at some location, walks to other locations, and rides transit when at Transit Stops. When at a Transit Stop, copies of the virtual rider board every vehicle that arrives within the time constraint and exit at every stop. When a rider exits a vehicle, it marks that Transit Stop and the containing Sector as visited. It then considers the rides it can take at that new stop, as well as each additional Transit Stop and Sector that can be reached by walking. It sends more copies of itself on each of these walks. At each new location, it marks the location as reachable and potentially boards transit vehicles there, continuing the cycle until time is exhausted. Google's DistanceMatrix API is used to compute, with high accuracy, what can be reached by walking.
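
The process described above can be sketched in a few lines of Python. This is not the Score Generator's actual code. It is a simplified illustration that replaces the literal copies of the rider with an earliest-arrival search using a priority queue, which reaches the same set of places because arriving somewhere earlier never removes options. The data structures (`trips`, `walks`, `sector_of`), the stop names, and the times are all hypothetical. The walking times are hard-coded here rather than fetched from a distance service, and each stop on a trip has a single time that serves as both arrival and departure.

```python
import heapq

def reachable_sectors(start, start_minute, duration, trips, walks, sector_of):
    """Return the set of sectors a rider can reach before the deadline.

    trips:     list of trips, each an ordered list of (stop, minute) pairs
    walks:     place -> list of (other_place, walking_minutes)
    sector_of: place -> sector identifier
    """
    deadline = start_minute + duration
    best = {start: start_minute}      # earliest known arrival at each place
    queue = [(start_minute, start)]
    reached = set()
    while queue:
        now, place = heapq.heappop(queue)
        if now > best[place]:
            continue                  # a faster way here was already handled
        reached.add(sector_of[place])
        # Walk to every place that is walkable from here.
        candidates = [(now + minutes, other) for other, minutes in walks.get(place, [])]
        # Board every vehicle that has not yet left, and exit at every later stop.
        for trip in trips:
            for i, (stop, departs) in enumerate(trip):
                if stop == place and departs >= now:
                    candidates.extend((arrives, later) for later, arrives in trip[i + 1:])
        for when, where in candidates:
            if when <= deadline and when < best.get(where, deadline + 1):
                best[where] = when
                heapq.heappush(queue, (when, where))
    return reached

# Hypothetical schedule, in minutes after midnight (600 is 10:00).
trips = [
    [("A", 605), ("B", 615), ("C", 630)],
    [("B", 610), ("D", 620)],   # leaves B before the rider can get there
    [("B", 618), ("D", 628)],
]
walks = {"home": [("A", 4)], "C": [("E", 10)], "D": [("F", 3)]}
sector_of = {"home": (0, 0), "A": (0, 0), "B": (1, 0), "C": (2, 0),
             "D": (1, 1), "E": (2, 1), "F": (1, 1)}

reached = reachable_sectors("home", 600, 30, trips, walks, sector_of)
print(sorted(reached))
```

In this toy network the rider reaches four Sectors in 30 minutes. Because real departure times are used, the 6:10 departure from B that the rider would miss is never boarded, and the Sector containing E stays unreached because the walk from C would run past the deadline.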

Of course, one could argue that such an approach is also imprecise, as it considers reaching any point within a Sector to indicate that the entire Sector was reached. This is moderated by the fact that the Score Generator allows for arbitrarily sized Sectors. Public Transit Analytics customers get to decide how small their Sectors need to be in order to produce a map of adequate precision. The tradeoff is higher cost: smaller Sectors require more computation time and more distance measurements from the DistanceMatrix API.
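
As an illustration of how a location might be assigned to a Sector, here is a minimal sketch that divides a bounding box into an evenly spaced grid. The function name, its parameters, and the coordinates are assumptions for the example, not the Score Generator's interface.

```python
def sector_for(lat, lon, south, west, north, east, rows, cols):
    """Map a point inside the bounding box to a (row, col) grid cell."""
    if not (south <= lat <= north and west <= lon <= east):
        raise ValueError("point lies outside the bounding box")
    # Points exactly on the north or east edge belong to the last row or column.
    row = min(int((lat - south) / (north - south) * rows), rows - 1)
    col = min(int((lon - west) / (east - west) * cols), cols - 1)
    return row, col

# Hypothetical 10 by 10 degree box divided into a 100 by 100 grid.
box = dict(south=0.0, west=0.0, north=10.0, east=10.0, rows=100, cols=100)
print(sector_for(5.05, 9.99, **box))   # (50, 99)
print(sector_for(10.0, 10.0, **box))   # (99, 99)
```

Halving the edge length of each Sector quadruples the number of Sectors, from 10,000 to 40,000 in this example, which is where the additional computation and distance measurements come from.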

The Score Generator also uses real schedules for every part of its calculation, including transfer times. This ensures that as long as the transit agency's schedules are accurate, the result will closely mirror what real riders experience. It will not under- or over-estimate what is reachable.

Once the Score Generator calculates which Sectors can be reached within the time limit, it is easy to compute a score. This score is called Point Utility, reflecting the fact that it measures how useful the transit network is at a given point. Point Utility is Qualified by a start time (in superscript) and a duration (in subscript). It is defined as the number of reached Sectors divided by the total number of Sectors, multiplied by a scaling factor of 1000 and rounded to a three-digit integer to make the number more human-readable.
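
The calculation is simple enough to sketch directly. The function name and the Sector counts below are hypothetical, and the use of Python's built-in `round` is an assumption, since the exact rounding rule is not stated here.

```python
def point_utility(reached_sectors, total_sectors):
    """Reached Sectors over total Sectors, scaled by 1000 and rounded."""
    if total_sectors <= 0:
        raise ValueError("total_sectors must be positive")
    if not 0 <= reached_sectors <= total_sectors:
        raise ValueError("reached_sectors must be between 0 and total_sectors")
    return round(1000 * reached_sectors / total_sectors)

# Hypothetical result on a 100 by 100 grid: 1,234 of 10,000 Sectors reached.
print(point_utility(1234, 10_000))   # 123
```

A score of 123 under these assumptions would mean that a little over 12 percent of the Sectors in the map area were reachable from the starting point within the duration.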

As an added benefit, because the scores are derived directly from maps, the maps themselves become a compelling visual for explaining the score within an agency or to its customers. As an example, here is a map of the Qualified Point Utility from the Public Transit Analytics office on a typical weekday. The map area is the bounding box around the city of Seattle, divided into a 100 by 100 grid of Sectors. An interactive version that shows the best route to the Sector on hover, and detailed information on click, is also available.

Of course, a Qualified Point Utility score on its own does not come close to representing how useful a public transit network is: real riders do not originate from a single location or do all their traveling at 10:00. Nevertheless, it is an important subcomponent that goes directly into the overall network utility calculation. In the next couple of weeks, I will be continuing this series of posts, building up the components that Public Transit Analytics uses to arrive at its overall public transit network score.

Do you represent a transit agency, municipality, or community group that would like analyses like this done for your local transit network? Contact

Sunday, March 19, 2017

Foundations of Evaluating Public Transit Networks, Introduction

When comparing two complex things of the same type, condensing each into a single comparative score can simplify decision making. This is a common strategy in a variety of disciplines. In sports, measurements like American football's Passer Rating and baseball's Wins Above Replacement (WAR) form a single score from many components of a player's performance. In the case of WAR, the combined season-long WARs of the players on a team closely correlate with that team's number of wins. In this way, baseball players can be compared and ranked, even if they have very different sets of baseball skills. In real estate, Walk Score combines many elements of livability into a single score that homebuyers can use when comparing houses in different neighborhoods. Rotten Tomatoes aggregates critics' reviews into a single score that can be used to rank movies.

For transit planning, it would be helpful to have an analogous concept for evaluating how useful a public transit network is. Transit planners could use such a score in a variety of contexts. When considering multiple proposals to restructure transit service, an objective score can be used to select the better one. When a new transit line is planned, the difference in score between the network with the new line and the network without it can be measured against the line's monetary cost to ensure that it is a good use of resources.

Of course, such a score is unlikely to be the final word in any public transit planning decision. Public transit agencies operate in the context of a government and community. The values of those entities may be extremely difficult to capture in a single score. Nevertheless, having scores available, and having tools that demonstrate how the scores were derived, can help guide conversations within an agency and among the community of riders. They can provide a compelling explanation of why certain decisions were made or why certain alternatives were not considered.

To create a score, a tempting choice may be to use the number of riders as the measure of how useful parts of the transit network are. Using ridership, however, makes it easy to draw erroneous conclusions. Low ridership can be cited as a reason to cancel a transit route, as some may attribute the lack of use to redundancy. However, the route might be fundamentally useful, but not often ridden because it does not run frequently enough for customers to rely on it. Additionally, doing any sort of speculative planning, such as adding new lines, requires projecting ridership, which is imprecise at best. On top of that, ridership is difficult to measure: rider counters must work reliably and cover a large enough sample to be representative.

Public Transit Analytics's core score does not use ridership. Instead, the score is motivated by one of the company's tenets: to help build transit networks that are more useful. A useful transportation improvement is one that allows more people to access more of the places that they desire to go. In this series of posts, I will discuss how Public Transit Analytics has derived its own process for measuring exactly that.