Tuesday, May 30, 2017

Blessing in Disguise

Over the past week, Public Transit Analytics replaced a major component of the Score Generator software. One of the differentiating features of the Score Generator is the care that it gives to accurately determining what is reachable by walking in a public transit journey. When the virtual rider starts their journey or departs a transit vehicle, the stops that they can reach are not simply those located within some radius of the point. Therefore, when Public Transit Analytics replaced the Score Generator's source of walking distance measurements, the decision to do so was made with utmost care. The change has considerable implications, and some surprising benefits.

Before exploring these implications, it is useful to understand the process that the Score Generator uses to determine which destinations are in reach by walking. Recall that a Point Utility score is computed with a center point and a duration. When first run, the Score Generator builds two sets of points. The origin set contains the center point and every transit stop. The destination set contains every transit stop and points corresponding to the center of every Sector. The Score Generator calculates the straight-line distance between each origin and each destination. It then uses a walking speed estimate to convert these distances into times. Since these times are never longer than the actual walking time between the points, and no walking time can be greater than the duration, it retains only the measurements that are under that duration. These are called the candidate distance measurements.
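The pre-filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the Score Generator's actual code: the function names, the dictionary layout, and the use of the haversine formula for straight-line distance are all assumptions made for the example. The filter is only safe if the assumed walking speed is at least as fast as the speed the routing source uses, so that the straight-line time never exceeds the real walking time.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def straight_line_m(a, b):
    """Great-circle (haversine) distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def candidate_measurements(origins, destinations, duration_s, speed_mps):
    """Return {(origin_id, destination_id): lower_bound_seconds}.

    origins and destinations are hypothetical {id: (lat, lon)} dictionaries.
    Only pairs whose straight-line walking time fits in duration_s are kept.
    """
    candidates = {}
    for o_id, o_pt in origins.items():
        for d_id, d_pt in destinations.items():
            if o_id == d_id:
                continue  # transit stops appear in both sets; skip self-pairs
            lower_bound_s = straight_line_m(o_pt, d_pt) / speed_mps
            # Assumption: a pair exactly at the duration is kept rather than dropped.
            if lower_bound_s <= duration_s:
                candidates[(o_id, d_id)] = lower_bound_s
    return candidates
```

Because the straight-line time is a lower bound, any pair discarded here could not have been reached by a real walking route either, so the filter removes work without removing valid destinations.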

Then, as the virtual rider journeys through the transit network, the Score Generator checks how much time the rider has left to continue and considers all the candidate distance measurements for the rider's current location that are less than or equal to the remaining duration. It takes these candidates and uses a software component called the Distance Client to get accurate walking directions to each destination. These directions include a total walking time, which provides the final filter on which destinations are reachable.
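The two-stage filter in this step might look something like the sketch below. The names are hypothetical: `walking_time_s` is a stand-in callable for whatever the Distance Client returns, and the candidate dictionary layout is assumed for illustration only.

```python
def reachable_on_foot(origin_id, remaining_s, candidates, walking_time_s):
    """Return {destination_id: walking_seconds} reachable within remaining_s.

    candidates: {(origin_id, destination_id): straight-line lower bound in seconds}
    walking_time_s: hypothetical callable (origin_id, destination_id) -> seconds,
        standing in for the Distance Client's routed walking time.
    """
    reachable = {}
    for (o_id, d_id), lower_bound_s in candidates.items():
        if o_id != origin_id or lower_bound_s > remaining_s:
            continue  # cheap filter: no routing request is made for this pair
        actual_s = walking_time_s(o_id, d_id)  # expensive, accurate measurement
        if actual_s <= remaining_s:  # final filter on the routed walking time
            reachable[d_id] = actual_s
    return reachable
```

The point of the ordering is that the expensive routing request is only made for pairs that survive the cheap straight-line check against the time the rider has left.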

Prior to last week, the Distance Client relied on Google's Distance Matrix API to get walking times and distances. Google makes the Distance Matrix API available as a pay-per-use web service with no minimums and no upfront costs. For that reason, its availability made it possible to develop the Score Generator without needing considerable domain knowledge in mapping and wayfinding. Unfortunately, Google imposes a maximum of 100,000 requests per day for its standard pricing plan. In attempting to calculate a full Network Utility for a 10,000-Sector map of Seattle, it became clear that somewhere around four million distance measurements would be needed. At the daily maximum, acquiring the data would take around 40 days. Though premium service plans do exist, they are intended for usage patterns that involve a large number of continuous requests, not occasional periods of very heavy use.
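The 40-day figure follows directly from the two numbers above. As a quick check, assuming one request per distance measurement (the function name here is only illustrative):

```python
import math


def days_to_fetch(measurements, daily_limit):
    """Whole days needed to issue `measurements` requests at `daily_limit` per day."""
    return math.ceil(measurements / daily_limit)


print(days_to_fetch(4_000_000, 100_000))  # prints 40
```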

As a result, current Utility maps from Public Transit Analytics use map data from OpenStreetMap with a locally-running instance of GraphHopper providing directions. This new solution maintains the most important factors of walking distance measurements: paths that account for the street layout and times that account for slowdowns and speedups from going up and down hills. However, to say the solutions yield identical maps would not be accurate. Compare the previous version of the Outbound Point Utility map from the Public Transit Analytics office (interactive version) with the current one (interactive version).

Overall, the current walking distance measurement source results in many more destinations being treated as reachable more often. Based on several spot checks, the variation appears to come principally from walking speed differences rather than routing differences. The open source nature of GraphHopper's navigation software gives some insight as to why. It appears that for most scenarios, it uses a walking speed of five kilometers per hour. This is a very typical speed used to model the preferred human walking speed. Google's speed selection algorithm is unknown, but its speeds appear to be slower on average.

The fact that changing the walking distance computation considerably alters Utility is an important observation. Both GraphHopper and Google intend to model what can be reached by some hypothetical average human being. An individual request to each service will probably result in only a minute or two of time variation. However, the aggregate impact when computing Point Utility is considerable. If so much variation exists between two systems trying to measure the same thing, even more variation must exist among the many riders of transit systems who, for a variety of reasons, may not behave anything like the average.

So far, the Public Transit Analytics blog has focused on how using the Score Generator makes transit networks more useful. For the network to also be just, it is necessary to ask for whom the network is being made more useful. If an individual's ability to move around the world differs sufficiently from the assumptions that walking distance calculations make, changes that make the network more useful for the average person may seriously degrade the network for that individual. Much like how using real schedule data instead of average transfer times results in a more genuine model of a transit network, a truly accurate model must also have the ability to capture a broad range of pedestrian abilities and preferences, not just some average.

Fortunately, using walking measurements based on open source solutions like OpenStreetMap and GraphHopper enables exactly that; the maps and software can be modified extensively. Transit planners working with Public Transit Analytics can opt for Utility maps that show their transit network from the perspectives of individual transit riders, accounting for factors ranging from mobility impairments to age to preferences. Though the need to change the walking distance measurement source was born out of technical necessity, it has in fact been a blessing in disguise. By helping ensure that the unique needs of individuals are not lost amidst optimizing for averages, it has helped Public Transit Analytics further commit to its core tenet of justice.