Tuesday, September 26, 2017

Route Contributions Revisited

In the last blog post, I considered how a route-oriented measurement of Spontaneous Accessibility could be more illuminating to planners than ridership-based measurements of route productivity. The post analyzed some routes that King County Metro labeled as the most unproductive in their system, and found a variety of clues about their lack of productivity by measuring Network Accessibility Contribution. This post views those previous results in a broader context.

In this analysis, I selected every King County Metro route that met two criteria: the route must run entirely in Seattle, and it must not have changed substantially in routing since the ridership data published in the 2016 Service Evaluation was collected. I selected August 31, 2017 as a typical weekday and, for each route, calculated a 2000-sample Sampled Network Accessibility Contribution where the change was to delete the route. I then negated each measurement, so that it expresses the accessibility gained from the route existing rather than the (negative) benefit of its non-existence, and divided it by the total number of vehicle hours that service on the route requires. The absence of this normalization step was noted as a weakness of the analysis in the previous post. Measuring the contribution on a rate basis rather than an absolute one allows the contributions of routes with very different lengths, spans, and frequencies to be compared fairly. The results are shown in the two charts below.
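The rate-basis normalization can be sketched in a few lines of Python. The route 47 SNAC comes from the analysis below; the route 75 SNAC and all vehicle-hour figures are illustrative placeholders, not the study's actual data.

```python
# Sketch of the rate-basis normalization described above. The route 75
# SNAC and the vehicle-hour figures are illustrative placeholders.

def rate_basis_contribution(deletion_snac, vehicle_hours):
    """Negate a route-deletion SNAC so it expresses the accessibility
    gained from the route existing, then divide by the vehicle hours
    that service on the route requires."""
    return -deletion_snac / vehicle_hours

routes = {
    # route: (SNAC of deleting the route, daily vehicle hours)
    "47": (-0.00003, 20.0),
    "75": (-0.00850, 95.0),
}

for route, (snac, hours) in routes.items():
    print(route, rate_basis_contribution(snac, hours))
```

Dividing by vehicle hours is what lets a short shuttle and a long trunk route be compared on equal footing.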



There are several noteworthy conclusions. The previous post established low absolute SNAC measurements for routes 47 and 99. On a rate basis, these routes similarly provide extremely low value. This additional information confirms that these routes have a low SNAC for reasons other than their short length. Poorly performing routes 24 and 33 are above average in terms of their rate-basis SNAC. This is a challenging situation to rectify: the routes allow trips that other routes are unable to provide. Nevertheless, riders are not using these routes at the expected rate. A consideration for routes of this nature may be to improve frequency despite their present unpopularity. The lack of use may stem from potential riders choosing other means of transportation because timing the bus trip is an inconvenient stricture.

Looking at routes with exceptional rate-basis contributions, there are no clear commonalities. Route 75 (comparative map) provides the highest absolute contribution by a scant margin, but relative to its in-service hours it leads by a wide margin. It connects two major transfer points, the University of Washington campus and the Northgate Transit Center, in an indirect, backwards-C shape. The east-west portions of its route have several connections to north-south routes, while the north-south portion serves a corridor that is not readily accessed by other transit. Route 65 (comparative map) is a fairly direct north-south route in northeast Seattle. Route 50 (comparative map) meanders through a largely east-west path in south Seattle with multiple connections to frequent light rail service. These three routes are similar in that they do not directly serve downtown Seattle and do not cross bridges over the Lake Washington Ship Canal, which are frequent chokepoints. However, the route with the next-highest rate-basis contribution is the RapidRide D Line (comparative map), which shares none of the previous properties.

While the lack of commonalities between routes that contribute the most to Spontaneous Accessibility may be disappointing, it is an important result. There is no simple set of properties that make a transit line valuable. Instead, it is vital to analyze each route in terms of its contribution to the entire network. Spontaneous Accessibility contribution measurements are ideal for guiding this process, allowing both immediate and careful analysis of transit routes.

Friday, September 8, 2017

The Route Productivity Problem

Spontaneous Accessibility measurements concern themselves with properties of transit networks as a whole. This is most evident with Network Accessibility, but even Time Qualified Point Accessibility, though localized to a single center point and starting time, has this property. Though viewed through a narrow aperture, it measures the ability of the transit network as a whole to provide service. Though the blog has touched on the many advantages of Spontaneous Accessibility measurement, it is not without downsides. It can feel far removed from the techniques that planners can actually use to improve a transit network. The realm of planners is one of transit corridors, routes, and their respective frequencies and spans; these are the controls that can be manipulated in network design.

Spontaneous Accessibility measurements evaluate the outcomes from these manipulations, but on their own do not offer much guidance into how the controls can be manipulated to achieve a positive outcome. This is problematic because when planners make changes, they are rarely overhauling the entire transit network. While drawing out a new network from scratch and measuring its Network Accessibility could result in a vastly improved transit network, it is a costly endeavor that is incredibly disruptive to current transit users. For this reason, I consulted service planning documents from King County Metro in Western Washington and the Massachusetts Bay Transportation Authority (MBTA) of the Metro-Boston area, to demonstrate how Spontaneous Accessibility measurement could best improve and supplement tried and tested service planning processes.

Both documents establish procedures for evaluating the benefit that an individual transit route provides. King County Metro calls this route productivity in their 2016 System Evaluation document. Route productivity is based on two measurements, riders per platform hour and passenger miles per platform mile (where the "platform" qualifier indicates that the measurement includes time when the bus is out of service, such as driver breaks or deadheading). While route productivity is a secondary consideration to crowding and lateness when allocating additional service, it is the primary signal for removing service. Routes are divided into urban and suburban categories based on their characteristics. Each route is then evaluated on both measurements for peak, off-peak, and night timeframes. Routes that fall within the bottom 25% of these categories are candidates for service reductions when funding is imperiled.
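The two productivity measurements are simple ratios, sketched below; all figures are invented examples, not Metro data.

```python
# Sketch of King County Metro's two route-productivity measurements as
# described above. The figures passed in are invented examples.

def riders_per_platform_hour(boardings, platform_hours):
    # "Platform" hours include out-of-service time such as driver
    # breaks and deadheading.
    return boardings / platform_hours

def passenger_miles_per_platform_mile(passenger_miles, platform_miles):
    return passenger_miles / platform_miles

print(riders_per_platform_hour(1200, 48.0))            # 25.0
print(passenger_miles_per_platform_mile(5400, 310.0))
```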

The MBTA's Service Delivery Policy describes a Bus Route Cost-Benefit Ratio. This measurement is a weighted combination of the route's ridership, the proportion of those on board who are transit dependent, and a Value to the Network measurement. The latter combines the catchment area (a count of people uniquely served by the route), the number of jobs near the path of the service, and the proportion of passengers who use the service to connect to additional service. Though a low score is not a trigger for making service cuts to a route, it may be used to make other modifications.

Both route productivity and the Bus Route Cost-Benefit Ratio are fairly complex measurements. Ridership measurements, which inform all of route productivity and the vast majority of the Bus Route Cost-Benefit Ratio, require that passenger boardings and deboardings are properly recorded. This can be difficult because equipment may be present on only a subset of buses, the recording may not work accurately, or, due to vehicle shortages, buses with malfunctioning sensors may remain in service, polluting the data. Furthermore, ridership variation may have causes outside of the transit network itself. Weather, extended road closures, and special events may influence rider behavior enough to distort the six-month data collection timeframe that King County Metro uses to measure route productivity. Measuring and utilizing catchment area is also complicated. A route might appear to have a small catchment area because other transit service exists near its path, but that nearby service may not allow the same destinations to be reached as the route being evaluated.

Nevertheless, the ability to assign a value to a single route is clearly an important part of a practical transit planning process. While Spontaneous Accessibility is a network-level measurement, it is possible to apply its principles to the measurement of a single route. Each route contributes some amount of Spontaneous Accessibility to the whole network. A valuable route contributes Spontaneous Accessibility in a unique way, connecting origins and destinations at times that no other route does, whether directly or through the connections that it enables. To measure this, first, a Network Accessibility or Sampled Network Accessibility measurement is computed for the entire network. Then, using the same collection of Sectors if Sampled Network Accessibility was used, a single transit line is marked as ineligible and the measurement is made again. The proportion of change between the ratios, called the Network Accessibility Contribution (NAC) or the Sampled Network Accessibility Contribution (SNAC), is used to evaluate the impact of the route on the network's Spontaneous Accessibility. If the transit line was largely redundant, the network will hardly show an effect, as riders have alternate paths to the Sectors that the line served. Otherwise, the ratio may change substantially, reflecting that many unanticipated trips have become more difficult.

NAC = (NAR' - NAR) / NAR        SNAC = (SNAR' - SNAR) / SNAR

where NAR' and SNAR' indicate the ratios calculated with the proposed change in effect.
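A minimal sketch of the contribution computation, the proportion of change between the baseline ratio and the ratio recomputed with the route marked ineligible; the second example's ratios are illustrative values.

```python
# Sketch: a (S)NAC is the proportional change between the baseline
# accessibility ratio and the ratio recomputed with the candidate
# route marked ineligible.

def accessibility_contribution(baseline_ratio, modified_ratio):
    """A largely redundant route yields a value near zero; a valuable
    route yields a larger negative value, since deleting it lowers
    the ratio."""
    return (modified_ratio - baseline_ratio) / baseline_ratio

# Deleting a redundant route barely moves the ratio...
print(accessibility_contribution(0.06017, 0.06016))
# ...while deleting a vital one moves it more (illustrative values).
print(accessibility_contribution(0.06017, 0.05900))
```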


To demonstrate, SNACs modeling the elimination of several King County Metro routes that route productivity considered unproductive were calculated. A thirty-minute isochrone and 2000 samples were used. Each route ranked in the bottom 25% in at least half of the timeframe and measurement type pairs for which it was eligible (some routes do not operate at night or off-peak, and thus have no measurements from those periods) and did not have any measurement in the top 25%. As a SNAC measures the Spontaneous Accessibility improvement of some change, routes that are more valuable have lower scores. The results indicate that though Metro's measurements view these routes as comparably unproductive, their contributions to the network's Spontaneous Accessibility vary widely, suggesting a variety of causes of, and thus solutions to, their deficiencies.

Route   Route Map   SNAC       Comp. Map (Common scale)   Comp. Map (Normalized scale)
4       Link        -0.00111   Link                       Link
24      Link        -0.00455   Link                       Link
33      Link        -0.00393   Link                       Link
37      Link        -0.00045   Link                       Link
47      Link        -0.00003   Link                       Link
99      Link        -0.00011   Link                       Link

Routes 47 and 99 appear to be largely redundant with other service. For these routes, more frequent service is available on streets that are very close (Broadway for the 47 and 3rd Avenue for the 99), and thus for most trips it is a better option to take this more frequent service and walk to destinations on the paths of these less frequent routes. They appear to be the strongest candidates for complete elimination. Route 37 provides value along the coastline of West Seattle, but the route's tail into the interior provides little benefit from a Spontaneous Accessibility standpoint. As such, costs can be reduced by truncating it. Route 4 provides most of its value on the path that it shares with route 3. It provides additional Spontaneous Accessibility value to the Sectors north of Mount Baker in the I-90 corridor. Perhaps this area would be better served by a connection to frequent transit on Rainier Avenue rather than by a meandering path towards downtown. Routes 24 and 33 provide considerable value throughout Magnolia; reducing their service would make unanticipated trips to and from there considerably more difficult. However, route 24 is very circuitous and route 33 does not reach northern Magnolia in a particularly direct way. In this case, it may only be possible to preserve Spontaneous Accessibility and reduce hours by fundamentally restructuring service to Magnolia.

Conducting an analysis like this one takes mere days from conception to execution to visualization and evaluation. A more thorough study would consider a variety of walking speeds, to ensure the eliminated routes are not critical to maintaining Spontaneous Accessibility for riders with mobility issues. It would also be useful to know the SNAC of eliminating a route per some cost of its operation; this would ensure that short routes are not disproportionately targeted. It would also be best to study a variety of SNACs with different isochrone times. These variations would not extend the study time greatly, allowing a much faster understanding of the value of routes than studying ridership for six months. It is also a more direct measurement. Low ridership is a symptom of many diseases afflicting transit networks; elimination is not always the proper cure. By measuring the properties of the network directly, it is more evident whether and how a route is providing value.

For both King County Metro and the MBTA, using Spontaneous Accessibility Contribution measurements could improve their route evaluation processes in a natural and non-disruptive way. Its extra insights allow any agency that must evaluate its routes to make more nuanced decisions without upending existing processes.

Thursday, August 24, 2017

Technical Brief: Rethinking Parking Requirement Exemptions with Spontaneous Accessibility

Rethinking Parking Requirement Exemptions with Spontaneous Accessibility proposes Spontaneous Accessibility as a mechanism for determining when developers of new residential or commercial sites in Seattle should be exempted from providing parking. Currently this decision is made by evaluating access to frequent transit using a distance and headway-based process. Replacing this process with Spontaneous Accessibility measurements incorporates additional precision and nuance, better determining whether new parking-exempted sites are truly served by transit that can meet the entirety of resident needs. Though focused on a single problem in Seattle, the brief provides a framework for using Network Accessibility and Point Accessibility to solve land use problems generally.

Wednesday, August 16, 2017

Technical Brief: Transit Planning with Spontaneous Accessibility

Transit Planning with Spontaneous Accessibility is a short document that explains how planners can use Spontaneous Accessibility measurements to improve their transit networks. It contrasts Spontaneous Accessibility with existing modeling techniques and explains how planners can use it to incrementally close more of the gap between transit networks and private vehicle ownership by creating networks that are more amenable to unexpected, unanticipated trips.

Monday, August 7, 2017

Two Months

Two months ago, Public Transit Analytics described Network Utility as a measurement that could quantify how useful a transit network is. However, computing this measurement in practice was difficult: calculating it for a reasonably detailed map of Seattle would take an infeasible amount of time. While Cumulative Point Utility was proposed as a method for simplifying the calculation, the parameters of using it practically were unclear. Unfortunately the desired clarity remained elusive. At the same time it became obvious that the utility measurements on the whole failed to communicate their purpose to transit planners. These two setbacks necessitated a challenging rethinking of these measurements, both in their terminology and calculation.

Until now, Public Transit Analytics proposed that its measurements quantified how "useful" a transit network is, or its "utility". Unfortunately these terms are both overly broad and fundamentally miscast in describing those measurements. "Utility" has a specific meaning in economics, which sometimes finds its way into transit planning literature. Meanwhile, there was an understandable resistance to using a term as general as "useful" to solely express the ability of individuals to reach destinations regardless of the time of day or the differing value of locations. Reviewing the academic transit planning literature was critical to properly naming these measurements, giving them precision and clarity.

Fundamentally, Public Transit Analytics's measurements quantify accessibility, which, broadly, is the ability of individuals to reach opportunities. Studies of accessibility are a perennial topic in peer-reviewed transit planning literature. Many of these studies use measurements that resemble what this blog described as Point Utility or Network Utility. However, these studies have often focused on a specific type of opportunity, such as access to jobs during the morning rush hour or how easily certain shopping centers can be reached by transit. In contrast, Public Transit Analytics's measurements have worked towards greater generality: focusing on the ability to start in arbitrary locations and reach arbitrary destinations, at arbitrary times of day. In considering this contrast, it's clear that both Public Transit Analytics and other researchers are measuring some way in which the transit network is useful. Other researchers have largely focused on transit trips that are preexisting or expected. On the other hand, Public Transit Analytics is measuring the ability to take unanticipated, unexpected trips. These trips may occur any time of day, with unpredictable origin and destination points. In other words, the measurements describe the ability of the network to support spontaneous transit trips, and therefore the measurements that once quantified utility now describe Spontaneous Accessibility. Building a transit network that excels at allowing these trips is difficult, given the limitations that transit service has in contrast to private vehicle ownership. By ensuring that new projects and restructures incrementally improve Spontaneous Accessibility, individuals can increasingly count on transit to meet all of their needs.

Furthermore, the literature review made it clear that accessibility-based measurements represent only a subset of the ways of quantifying the value of a transit network. Most prominently, many agencies use a four-step forecasting model to predict the ridership or total time saved of proposed modifications to transit networks. This blog formerly expressed some skepticism of such methods. After a careful examination of the literature, however, it is clear that Spontaneous Accessibility measurements are more a complement to them than a replacement. When modifying a transit network, understanding the impact to existing riders and their anticipated trips is important. At the same time, improving Spontaneous Accessibility has value in attracting new transit trips, discouraging car ownership by making spontaneous trips on transit more achievable, and improving service for those who rely solely on transit.

This recasting of Spontaneous Accessibility would be for naught if it could not feasibly be calculated network-wide across an entire day. Unfortunately, the approach of using mutual information calculations to establish a threshold for using Cumulative Point Utility yielded inconsistent results. Consequently, Public Transit Analytics did not just rename Network Utility to Network Accessibility, but made its calculation faster. By rewriting its destination-finding algorithm to use dynamic programming and eliminating the consideration of Sectors that are entirely on water, the time to calculate Network Accessibility has been improved by multiple orders of magnitude. While now computationally feasible, it is still expensive. It is possible, though, to make a measurement of the current transit network using Network Accessibility, wherein every Sector center and minute of the day is used. For subsequent transit planning experiments, a sample of these starting points can be selected. The sample's closeness to the full Network Accessibility can be tested using a different information theoretic technique: Kullback-Leibler divergence. Once a sufficiently close sample is found, a Sampled Network Accessibility of the experiment can be compared to the original Network Accessibility.
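The closeness test can be sketched as follows, under the assumption that each measurement is first summarized as a probability distribution over reachability bins; the actual pipeline's representation and acceptance threshold may differ, and the distributions below are invented.

```python
import math

# Sketch of testing a sample's closeness to the full measurement with
# Kullback-Leibler divergence, assuming each measurement is summarized
# as a probability distribution over reachability bins (an assumption;
# the real pipeline may differ).

def kl_divergence(p, q):
    """D(P || Q) in bits. Requires q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

full_network = [0.70, 0.15, 0.10, 0.05]   # full Network Accessibility
sample       = [0.68, 0.16, 0.11, 0.05]   # candidate sample

# Accept the sample if it diverges from the full measurement by less
# than some chosen threshold (the value here is arbitrary).
THRESHOLD = 0.01
print(kl_divergence(full_network, sample) < THRESHOLD)
```

Note that KL divergence is asymmetric: it measures how poorly the sample's distribution approximates the full one, which is exactly the direction of interest here.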

The research that went into confronting these two issues yielded a very favorable outcome: Public Transit Analytics submitted a paper to the Transportation Research Board describing Spontaneous Accessibility and analyzing its change in Seattle over a one-year period. If accepted, it will be presented or published in January of 2018. The following map from the paper shows Spontaneous Accessibility in Seattle on January 25, 2016, before the opening of the Link light rail extension through Capitol Hill and the University of Washington. Each Sector is colored based on the proportion of origin location and time pairs that allow the Sector to be reached within 30 minutes. Averaging these proportions results in the Network Accessibility Ratio, which eschews the scaling factor that Network Utility used. In this case, the Network Accessibility Ratio is 0.06017. Currently, the map is only available in a non-interactive form, though improved interactive versions are in the works.
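The averaging that produces the Network Accessibility Ratio can be sketched with a toy reachability grid; the data below is invented for illustration.

```python
# Sketch of the Network Accessibility Ratio: each Sector gets the
# proportion of (origin, start-time) pairs from which it is reachable
# within the isochrone, and the NAR is the mean of those proportions.
# The reachability grid here is an invented toy example.

def network_accessibility_ratio(reachable):
    """reachable[sector] is a list of booleans, one per
    (origin, start-time) pair, True if that pair reaches the sector."""
    proportions = [sum(pairs) / len(pairs) for pairs in reachable]
    return sum(proportions) / len(proportions)

toy = [
    [True, False, False, False],   # sector reached from 1 of 4 pairs
    [True, True, False, False],    # 2 of 4
    [False, False, False, False],  # unreachable
]
print(network_accessibility_ratio(toy))  # 0.25
```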

With Network Accessibility now feasible to compute, this blog will begin to focus primarily on using it to measure hypothetical transit network changes. In lieu of completing the Foundations of Evaluating Public Transit Networks series, a technical brief describing transit planning using Spontaneous Accessibility measurements will be published shortly.

Tuesday, May 30, 2017

Blessing in Disguise

Over the past week, Public Transit Analytics replaced a major component of the Score Generator software. One of the differentiating features of the Score Generator is the care it takes in accurately determining what is reachable by walking in a public transit journey. When the virtual rider starts their journey or departs a transit vehicle, the stops that they can reach are not just those located within some radius of the point. Therefore, when Public Transit Analytics replaced the Score Generator's source of walking distance measurements, the decision to do so was made with utmost care. The change is one with considerable implications, and some surprising benefits.

Before exploring these implications, it is useful to understand the process that the Score Generator uses to determine which destinations are in reach by walking. Recall that a Point Utility score is computed with a center point and a duration. When first run, the Score Generator builds two sets of points. The origin set contains the center point and every transit stop. The destination set contains every transit stop and points corresponding to the center of every Sector. The Score Generator calculates the straight-line distance between each origin and each destination. It then uses a walking speed estimate to convert these distances into times. Because these estimated times can never be longer than the actual walking times between the points, and no walking time can be greater than the duration, any pair whose estimate exceeds the duration can be safely discarded. The retained measurements are called the candidate distance measurements.

Then, as the virtual rider journeys through the transit network, it checks how much time it has left to continue and considers all the candidate distance estimates for its current location that are less than or equal to the remaining duration. It takes these candidates and uses a software component called the Distance Client to get accurate walking directions to each destination. These directions include a total walking time, which provides the final filter on which destinations are reachable.
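The two-stage filter can be sketched as follows, using planar coordinates and an assumed walking speed for simplicity; the exact_walk_seconds callback is a hypothetical stand-in for the Distance Client, which in reality queries a routing engine for true walking directions.

```python
import math

# Sketch of the two-stage reachability filter described above. The
# coordinate math and walking-speed figure are simplifying
# assumptions; exact_walk_seconds stands in for the Distance Client.

WALK_SPEED_MPS = 1.4  # assumed walking-speed estimate, roughly 5 km/h

def straight_line_m(a, b):
    # Toy planar distance; the real system works on geographic coords.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def candidate_seconds(origin, destinations, duration_s):
    """Stage 1: keep destinations whose straight-line time lower bound
    fits in the duration. The bound can never exceed the true walking
    time, so nothing reachable is discarded."""
    times = {d: straight_line_m(origin, d) / WALK_SPEED_MPS
             for d in destinations}
    return {d: t for d, t in times.items() if t <= duration_s}

def reachable(origin, destinations, remaining_s, exact_walk_seconds):
    """Stage 2: confirm the surviving candidates with exact walking
    directions from the Distance Client stand-in."""
    candidates = candidate_seconds(origin, destinations, remaining_s)
    return [d for d in candidates
            if exact_walk_seconds(origin, d) <= remaining_s]

# Toy usage: exact times are 25% longer than the straight-line bound.
dests = [(0, 100), (0, 1000), (0, 3000)]
slow = lambda a, b: straight_line_m(a, b) / WALK_SPEED_MPS * 1.25
print(reachable((0, 0), dests, 900, slow))
```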

Prior to last week, the Distance Client relied on Google's Distance Matrix API to get walking times and distances. Google makes the Distance Matrix API available as a pay-per-use web service with no minimums and no upfront costs. For that reason, its presence made it possible to develop the Score Generator without needing considerable domain knowledge in mapping and wayfinding. Unfortunately, Google imposes a maximum of 100,000 requests per day on its standard pricing plan. In attempting to calculate a full Network Utility for a 10,000-Sector map of Seattle, it became clear that somewhere around four million distance measurements would be needed. Thus, acquiring the data would take around 40 days. Though premium service plans do exist, they are intended for usage patterns that involve a large number of continuous requests, not occasional periods of very heavy use.

Thus, current Utility maps from Public Transit Analytics use map data from OpenStreetMap with a locally-running instance of GraphHopper providing directions. This new solution maintains the most important factors of walking distance measurements: paths that account for the street layout and times that account for slowdowns and speedups from going up and down hills. However, to say the solutions yield identical maps would not be accurate. Compare the previous version of the Outbound Point Utility map from the Public Transit Analytics office (interactive version) with the current one (interactive version).

Overall, the current walking distance measurement source results in many more destinations being treated as reachable more often. Based on several spot checks, variation appears to principally come from walking speed differences rather than routing differences. The open source nature of GraphHopper's navigation software gives some insight as to why. It appears that for most scenarios, it uses a walking speed of five kilometers per hour. This is a very typical speed used to model the preferred human walking speed. Google's speed selection algorithm is unknown, but appears to be slower on average.

That changing walking distance computation considerably alters Utility is an important observation. Both GraphHopper and Google intend to model what can be reached by some hypothetical average human being. An individual request to each service will probably result in only a minute or two of time variation. However, the aggregate impact when computing Point Utility is considerable. If so much variation exists between two systems trying to measure the same thing, even more variation must exist among the many riders of transit systems who, for a variety of reasons, may not behave anything like the average.

So far, the Public Transit Analytics blog has focused on how using the Score Generator makes transit networks more useful. For the network to also be just, it is necessary to ask for whom the network is being made more useful. If an individual's ability to move around the world is sufficiently different than the assumptions that walking distance calculations make, changes that make the network more useful for the average person may seriously degrade the network for that individual. Much like how using real schedule data instead of average transfer times results in a more genuine model of a transit network, a truly accurate model must also have the ability to capture a broad range of pedestrian abilities and preferences, not just some average.

Fortunately, using walking measurements based on open source solutions like OpenStreetMap and GraphHopper enables exactly that; the maps and software can be modified extensively. Transit planners working with Public Transit Analytics can opt for Utility maps that show their transit network from the perspectives of individual transit riders, accounting for factors ranging from mobility impairments, to age, to preferences. Though the need to change the walking distance measurement source was borne out of technical necessity, it has in fact been a blessing in disguise. By helping ensure that the unique needs of individuals are not lost amidst optimizing for averages, it has helped Public Transit Analytics further commit to its core tenet of justice.

Saturday, May 20, 2017

Foundations of Evaluating Public Transit Networks, Part 6: Similar, Rather than Different

Last week in the Foundations of Evaluating Public Transit Networks series, I highlighted the fact that shifting the center of a Point Utility computation a small amount can have a substantial impact on the reachability map and the corresponding score. This presented the problem of how to measure the utility of a whole network when individual points may tell very different stories about how useful the network is. This post describes one strategy for solving this problem by focusing on how Point Utility measurements from different centers are similar rather than how they are different, and using techniques from the discipline of information theory to define this similarity in a rigorous way.

To ease this discussion, this post makes use of the following terminology. Network Utility (NU) is the hypothetical measurement of how useful a transit network is using Public Transit Analytics's definition of useful. In theory, it would be calculated by running a series of full-day Point Utility computations centered at each Sector of the service area. As explained in the last post, this is a very difficult measurement to make, owing to the large number of Sectors from which Point Utility calculations must be run. A related concept is Cumulative Point Utility (CPU). Rather than aggregating the results from every Sector, CPU is the result of generating the Point Utilities at some sample of the Sectors' centers and combining these. CPU has the benefit of being flexible in the amount of computation required; there are no restrictions on how many or few samples make up a CPU calculation. There is, of course, the tradeoff that fewer samples will produce results further away from the true Network Utility.

Information theory comes in handy in establishing what "further away" means. Mutual information is an information theoretic technique that measures how much information one random variable indicates about another. It uses the joint and marginal probability distributions of two random variables. Thus, to measure how far away one Point Utility measurement is from another, it must be possible to model a Point Utility as a probability distribution. The diagram below indicates how this can be done.

Consider the simplified Point Utility maps above. Recall that in a Point Utility map, each Sector is assigned a shade of green depending on how often the Sector can be reached. Each shade of green corresponds to a number, one through nine, referred to as the "bin value". Treat each Sector as an observation of a random variable. The value of that observation is the bin value of the Sector. Sectors that cannot be reached are given a bin value of zero. Looking over a whole Point Utility map, one can count the number of Sectors with each reachability bin value. These sums can be divided by the total number of Sectors, creating a probability distribution for what the bin value of an arbitrary Sector would be on that map.

When considering two different Point Utility maps, it is then possible to construct a joint probability distribution. To do this, choose a Sector on the first map and find the same Sector on the second map. Keep a count of each pair of observations; for example, if Sector one has a bin value of 9 in the first map and a bin value of 5 in the second map, the pair (X=9, Y=5) is recorded. Continue this process for each Sector, obtaining a count for each observed pair. Dividing these counts by the number of Sectors yields the joint probability distribution for the bin values of the two Point Utility maps.

With the joint distribution, and the marginal distributions that can be computed from it, it is possible to compute the mutual information of two Utility measurements of any type. This calculation can be used in two ways. If the Network Utility has been computed at great cost, the mutual information can be used to determine how accurately a Cumulative Point Utility approximates the Network Utility. Once a sufficiently accurate CPU has been found, changes to the transportation network can be measured using CPU rather than NU, saving computational resources and money.
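The counting procedure above translates directly into code; the bin values below are a toy example, not real map data.

```python
import math
from collections import Counter

# Sketch of the mutual-information computation described above. Each
# map is a list of per-Sector bin values (0 = unreachable, 1-9 =
# reachability shades). The bin values here are a toy example.

def mutual_information(map_x, map_y):
    n = len(map_x)
    joint = Counter(zip(map_x, map_y))  # counts of (X, Y) pairs
    px = Counter(map_x)                  # marginal counts for X
    py = Counter(map_y)                  # marginal counts for Y
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

map_a = [0, 0, 0, 3, 5, 9, 0, 2]
map_b = [0, 0, 1, 3, 4, 9, 0, 2]
print(mutual_information(map_a, map_b))
```

Identical maps yield a mutual information equal to the maps' entropy, while maps whose bin values are statistically independent yield zero.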

Furthermore, even if the Network Utility is unknown, mutual information sheds light on whether the Cumulative Point Utility is approaching it. Consider building a series of Cumulative Point Utilities by selecting an initial Sector at random, adding new random Sectors, and computing a new CPU after each new Sector is added. After each CPU is computed, the mutual information between it and the previous CPU is measured. Initially the mutual information will decrease: mutual information between any two Point Utilities is relatively high because both PUs indicate that the plurality of Sectors are unreachable and thus the maps are similar. As more PUs are combined into the CPU, fewer points are outright unreachable, and thus the maps will be less similar from the perspective of mutual information. As more PU measurements are added, it is expected that the rate of decrease will slow or reverse, as the data added from a PU is capable of being inferred from the previous CPU. In a more concrete sense, this means that one can learn something about what is reachable at a point by considering the points around it. After all, that point has similar transit stops within its walking range. Eventually, the random points that were selected comprise nearly all the information about the transit network, even though they are not an exhaustive collection of every location on the map.

Public Transit Analytics is actively working through some of the practical concerns of using Cumulative Point Utility. It is unclear how many samples would be sufficient and what properties of a transit network may result in alterations to the sufficiency criteria. Nonetheless, the approach of using mutual information appears to be a promising one. Its ultimate promise is a way of measuring an entire transit network in an unbiased way not seen in other models available to planners today.