May 7, 2009

Spark it Up

This post is literally 2 years in the making. In the Spring of 2007, Jeff (You Don't Mess With the) Zupan gave me a spreadsheet with the annual 'registrations' (i.e. recorded entries) at each station in the NYC subway system going back to the beginning (1905). At the time, I was heavy into the new open source geo stack, as is reflected in the main piece of work I did at RPA. Hammer in hand, I of course saw this spreadsheet as a bucket of nails.

The result, after much whacking, is, I think, compelling, but you'll have to see for yourself. The general idea is that the history of subway ridership tells a story about the history of a neighborhood that is much richer than the overall trend. An example, below, shows the wild comeback of inner Williamsburg, and how the growth decays at each successive stop away from Manhattan on the L train:


This is somewhat in contrast to the South Bronx, which has yet to see a resurgence in ridership, other than at Yankee Stadium and the Grand Concourse:


The stations around Wall Street tell a totally different story, in which the ups and downs of each dep/recession have more immediate but temporary effects:


My first stab at visualizing this data was a traditional cartographic approach, showing the overall growth from 1977 to 2006 at each station. This told an approximate story at the level of the whole city, but did not leave much room for detailed exploration. Thanks to GeoServer's awesome new(ish) dynamic symbolizers functionality, it was trivial to plot the station-by-station time series sparklines (generated in R, of course) onto the interactive online map. (Originally the plots were produced in Perl and placed onto the map with a JavaScript WFS layer, but that is so 2005.)

For all this, I never really felt like this little experiment was ready for an audience. That all changed when OpenGeo put up its OpenStreetMap base layer for the web, giving fancy overlays like this one the context they need.

http://transit.frumin.net/subway/spark.php

UPDATES:

At least 2 people have taken the data I put out there and used it to make some zippier interactive flash apps:

  • The first is very polished, but I think the designer is quite misguided in deciding not to plot dots on a map, and instead plotting what looks like a network flow diagram but with totally bogus data
  • The second is a little rougher around the edges, but I'd say is much more honest, and thus useful

Not sure if anyone knows, but I also have GIS files for the subway here: http://transit.frumin.net/trx/Alignment

February 28, 2009

It's Not a Tumor

For reasons not worth mentioning, I had a brain MRI in recent weeks. As for prognosis, Arnold definitely said it best:



Not that I ever really thought it would be. What I did think about was the pictures I would get back. And they are money (perhaps even more so than the nap I took in the machine). Here's the best of the crop (click for enlargements, or go here):





I don't think I should have expected to recognize myself from a cross-section of my brain, eyeballs, and sinuses. In fact, I think I look damn silly.



I do recognize my crooked nose in this one, but I didn't expect the crookedness to continue inside the skull. That huge white ball inside my right cheek -- is that mucus? Is there a doctor in the house?



I can't resist this one: peep the brain stem.

December 28, 2008

Merry Fuckin Chanukah

Get it while it's hot: Frumix 2008 (don't shuffle, there's an ordering here).

I've been working on this one for a while. Not because it's so great, but because with my own convention of only including music that is new to me since the previous mix, it can be hard to get to a full 10 or 15 songs.

Don't hate cuz Snoop snuck on or cuz KRS is even preachier in his old age; the Kanye track you've never heard before and the absurd Ghostface flow-fest make it worth dropping before the year is out. Props to Evan and Bennet for the Knife and J. Valentine tracks that I never would have heard otherwise.

If that's not enough, there's also a new personal best of Redman mix (again, no shuffling).

If you're a fan of FunkDoc then there's not much new there. Otherwise, you've been sleeping, so wake up!

PS if you don't already have them, grab Frumix 2003, 2004, and 2006 while you're at it.

PPS Credit where it's due: among the extremely useful and free software packages I used to make these mixes are: Audacity, MP3Tag, and Winamp.

PPPS UPDATE: Audacity/LAME actually wrote out messed-up MP3 frame headers and length tag, so the first version of the new Frumix 2008 would have given you issues in most MP3 players. That's all been fixed now, using a fix I found on the web.

November 27, 2008

What's in a Schedule?

I owe somebody what amounts to this blog post. Pardon the lack of illustrative diagrams.

I have been thinking about mass transit trip planning software for the web and for mobile devices. Between the individual efforts of agencies around the world, and Google's efforts towards open sharing of structured transit system data, we seem to be on the right track, institutionally speaking. As a user, however, I am perpetually frustrated by the focus that every transit trip planner I have ever used puts on the supposed schedule, even for services that are high frequency and/or less-than-perfectly reliable.

This general feeling, combined with two recent and exciting meetings I have had, leaves me with a few nagging questions:

  • In providing transit users with such software, how useful is the schedule by which the transit provider has planned their operations?
  • When are expected waiting and travel times more useful than precise trip-by-trip itineraries?
  • What effect do randomness and unreliability have on those expectations?
  • Should the passenger plan her trip differently if she has to be on time than if her schedule is flexible?
  • Finally, does real-time information obviate the need for any or all of these other inputs?

The answer: it depends. The actual schedule (R trains leave Union St at 8:13, 8:25, 8:37 arriving at Union Square at 8:39, 8:51, 9:03, etc) is only relevant to the degree to which operations follow the plan. And even in the face of near-perfect operations, I only care about the schedule of departures when I have something to lose by ignoring it (i.e. when there's not always another train or bus in tolerably few minutes).

Expectations implied by the schedule (I should wait 6 minutes on average, but never longer than 12, and the ride is expected to take 26 minutes) are meaningful even when the precise schedule isn't, but only if those expectations are reasonable. For example, a simple model shows that as the service becomes even slightly variable, expectation of waiting time increases, as does the maximum. Of course, many things that cause some passengers to wait longer are experienced by other passengers as delays along the way.
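For the curious, here's one simple model of the kind I mean (my choice of model, so treat this as a sketch): if passengers show up at random, each one is more likely to land in a long gap between trains than a short one, which gives an expected wait of E[W] = E[H]/2 × (1 + Var(H)/E[H]²), where H is the headway. With perfectly even 12-minute headways that's the 6-minute average and 12-minute maximum from above; bunch the same trains up a little and both numbers rise. The headway values below are made up for illustration.

```python
from statistics import mean, pvariance

def expected_wait(headways):
    """Expected wait (minutes) for a passenger arriving at a random moment.

    A random arrival lands in a gap of length h with probability
    proportional to h, then waits h/2 on average inside it, so
    E[W] = sum(h**2) / (2 * sum(h)).
    """
    return sum(h * h for h in headways) / (2 * sum(headways))

def expected_wait_formula(headways):
    """Same quantity via E[H]/2 * (1 + Var(H) / E[H]**2)."""
    m = mean(headways)
    return m / 2 * (1 + pvariance(headways) / m ** 2)

even = [12, 12, 12, 12]    # the schedule: a train every 12 minutes
uneven = [8, 16, 10, 14]   # hypothetical: same average headway, some bunching

for name, hs in [("even", even), ("uneven", uneven)]:
    print(f"{name}: avg wait {expected_wait(hs):.2f} min, max wait {max(hs)} min")
# even: avg wait 6.00 min, max wait 12 min
# uneven: avg wait 6.42 min, max wait 16 min
```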

Let's now think specifically about trip planning software for relatively high frequency urban transit services with normal amounts of variability. I don't want to be bothered with exact but fairly useless times of scheduled departures and arrivals. I just want to know how long I can realistically expect to have to wait, and how long the trip is likely to take. And when I have a hard timeline, like getting to a meeting or catching an airplane, I want to know the (approximately) worst case scenario.

Current levels of unreliability in our transit systems are not something we should have to live with. More funding, saner public policy, and better management can go a long way towards fixing some problems. I am not focusing here on the sources of unreliability, but suffice it to say they are many, some debatably the provider's responsibility (eg missing drivers, faulty equipment) and some debatably not (eg on-street traffic, passenger behavior). But given that they are here today, would you rather think a trip will be fast and have it end up being slow, or would you prefer to have the best information possible when making your own decisions?

The copious amounts of real service data collected by transit providers from bus GPS and rail signaling systems are of great value here. They allow us to fairly easily and cheaply describe distributions of waiting and travel times, and thus estimate expectations and approximate maximums for use in trip planning software.
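As a minimal sketch of what "describe distributions" could mean in practice, assuming you already have a list of observed run times for one trip segment (the numbers below are invented): take the mean as the expectation, and a high percentile rather than the raw maximum as the approximate worst case, so one freak delay in the record doesn't dominate the plan.

```python
from statistics import mean, quantiles

def summarize(samples_min, pct=95):
    """Return (expected, approximate worst case) from observed durations.

    The worst case here is the pct-th percentile of the samples,
    interpolated within the observed range.
    """
    cut_points = quantiles(samples_min, n=100, method="inclusive")
    return mean(samples_min), cut_points[pct - 1]

# Hypothetical run times (minutes) derived from GPS or signaling records
observed = [24, 25, 26, 26, 26, 27, 27, 28, 29, 31, 33, 38]
expected, worst = summarize(observed)
print(f"expect ~{expected:.1f} min, plan for ~{worst:.1f} min if it matters")
```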

Often, those systems were in fact installed to provide real-time data, with historical performance analysis a secondary or accidental purpose. The notion of an expected waiting time changes radically when real-time "next-vehicle" information is provided, assuming the real-time predictions are in fact accurate. However, even perfect real-time data doesn't protect against problems occurring down the line, nor does it reduce the variability inherently introduced by successive transfers.

In the next generation of (open source?) web and mobile transit trip planning, please:

  • Give me the option to use the schedule or to use expected values, but try to be smart about the default.
  • When not using the schedule, please allow me to plan depending on how flexible my own schedule is.
  • Use real performance data to generate realistic expected and worst case scenarios.
  • When possible, especially when the trip is imminent, use real time data to reduce uncertainty in my trip plan, but make use of realistic expectations for forecasting the balance of the trip.

To implement such a trip planner, a number of open questions remain:

  • Even for a perfectly reliable system, where exactly is that threshold between using the schedule and using expectations?
  • How does this threshold change as a function of normal or excessive variability in operations?
  • What is the best way to integrate real-time data (of varying predictive quality) with realistic expectations for trip planning on-the-go?
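On that last question, one possible approach (my assumption, not an established method) is to simulate: use the real-time prediction for the wait you're about to have, draw the rest of the trip from historical samples, and report the mean for a flexible traveler or roughly the 95th percentile for someone with a hard deadline. The function name, parameters, and sample data below are all hypothetical.

```python
import random
from statistics import mean, quantiles

def plan_trip(legs, first_wait_rt=None, deadline=False, runs=10_000, seed=1):
    """Estimate total trip minutes by Monte Carlo over historical samples.

    legs: list of (wait_samples, ride_samples) tuples, in minutes.
    first_wait_rt: real-time predicted wait for the first leg, if any; it
        replaces only that leg's historical wait, since later transfers
        still carry their usual variability.
    deadline: True -> ~95th percentile (hard timeline); False -> mean.
    """
    rng = random.Random(seed)  # fixed seed so results are repeatable
    totals = []
    for _ in range(runs):
        total = 0.0
        for i, (waits, rides) in enumerate(legs):
            if i == 0 and first_wait_rt is not None:
                total += first_wait_rt
            else:
                total += rng.choice(waits)
            total += rng.choice(rides)
        totals.append(total)
    return quantiles(totals, n=100)[94] if deadline else mean(totals)

# Hypothetical trip: a subway leg, then a transfer to a bus
subway = ([1, 3, 5, 6, 8, 11], [24, 26, 26, 27, 30])
bus = ([2, 4, 7, 9, 15, 20], [10, 11, 12, 14, 18])

print(f"typical, no real-time: {plan_trip([subway, bus]):.1f} min")
print(f"typical, train in 2 min: {plan_trip([subway, bus], first_wait_rt=2):.1f} min")
print(f"deadline, train in 2 min: {plan_trip([subway, bus], first_wait_rt=2, deadline=True):.1f} min")
```

Keeping the real-time prediction to the first leg reflects the point above: even a perfect countdown clock says nothing about the transfer.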

If you're still awake, and have comments or questions, let's talk. The fact that this post found its way onto your computer makes it highly likely you already know how to get in touch.

November 6, 2008

You Know What I Did Last Summer?

I spent 10 weeks last Summer as an intern on the strategy team of Transport for London's (TfL) London Rail division. This part of TfL is responsible for the London Overground, the Docklands Light Railway, and Tramlink, is the presumptive operator of Crossrail (if and when...), and serves as TfL's interface with the National Rail network. My general task was to help London Rail start to make use of the oceans of data spewing out of the Oyster smartcard ticketing system, but I spent the bulk of my time working on a project that came to be titled Oyster-Based Performance Metrics for the London Overground. I've posted my final report and slides and outline for the presentation I gave to TfL executive management.

Rather than try to explain the work, I've just cut and pasted the executive summary from the report and included some of my favorite figures (with no explanation). It's not a terrible paraphrasing, but there is a lot of really good meat in the document if you are bored and hungry. Snooze on...


The London Overground is a pre-existing rail service in London whose operating responsibility and revenue risk were recently granted to Transport for London (TfL). Here we discuss the prospect of using data from the Oyster smartcard ticketing system to evaluate the performance of the London Overground explicitly from a passenger’s perspective.

The core idea behind our approach is to directly measure end-to-end individual journey times by taking the difference between entry and exit transactions stored by the Oyster system. The focus of this study is Excess Journey Time (EJT), calculated on a trip-by-trip basis as the difference between the observed journey time and some standard. In this case, the standard is determined for each trip with reference to published timetables, indicating how long the trip should have taken under right-time operations. A positive EJT indicates that the journey took longer than was expected.
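As a rough illustration of the calculation (not the report's actual code), here is EJT for a single trip, assuming a hypothetical entry/exit pair and a timetable standard that has already been worked out as a number of minutes. Building that standard from the published timetable is the hard part and is glossed over here.

```python
from datetime import datetime

def excess_journey_time(entry_time, exit_time, standard_minutes):
    """EJT in minutes: observed entry-to-exit time minus the timetable standard.

    A positive result means the journey took longer than it should have
    under right-time operations.
    """
    observed_minutes = (exit_time - entry_time).total_seconds() / 60
    return observed_minutes - standard_minutes

# Hypothetical Oyster entry/exit pair and a hypothetical 25-minute standard
tap_in = datetime(2007, 10, 3, 8, 2)
tap_out = datetime(2007, 10, 3, 8, 31)
print(excess_journey_time(tap_in, tap_out, standard_minutes=25))  # 4.0
```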

Excess Journey Time is interpreted as the delay experienced by passengers as a result of services not running precisely to schedule. The distribution of EJT indicates reliability. We validate these interpretations using a detailed graphical analysis, and then aggregate them to the line and network level over a variety of time periods. Our analysis is conducted on large samples of Oyster data covering several months and millions of Overground trips in 2007.

At the aggregate level, relative values of Excess Journey Time are largely in line with expectations. The North London Line has the highest average Excess Journey Time of all lines on the London Overground, around 3 minutes, and the widest distributions (i.e. least passenger reliability). On all lines, there is significant day-to-day variability of Excess Journey Time. For the whole London Overground, and for the North London Line in particular, Excess Journey Time is worst in the AM and PM Peak timebands.

The current performance regime for the London Overground is the Public Performance Measure (PPM), which measures the fraction of scheduled vehicle trips arriving at their destinations fewer than five minutes late. Over time, EJT shows a strong correlation to PPM. There is clear additional variation in EJT, indicating that it captures certain information about passenger experiences that PPM does not. This variation tends to increase as PPM decreases, particularly in the AM and PM peak timebands, which suggests that the effectiveness of PPM as a measure of the passenger experience decreases as service deteriorates.
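For contrast, PPM boils down to a pass/fail count over trains. This sketch uses made-up lateness values and ignores cancellations and the other fine print of the official definition. A train 4.9 minutes late counts exactly like one that's right on time, and nothing about how many passengers were on board enters the number, which is part of why a passenger-based measure like EJT can see things PPM can't.

```python
def ppm(lateness_minutes, threshold=5.0):
    """Share of scheduled train trips arriving fewer than `threshold` minutes
    late. Early or on-time arrivals (lateness <= 0) count as passes.
    Simplified: cancellations and other official rules are ignored."""
    passes = sum(1 for late in lateness_minutes if late < threshold)
    return passes / len(lateness_minutes)

# Hypothetical arrival lateness (minutes) for eight trains
lateness = [0, 1.5, 6, 3, 12, 4.9, 0, 2]
print(ppm(lateness))  # 0.75
```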

Another quantity of interest derivable from Oyster data is the time between passenger arrival at the station and the scheduled departure of the following train. The spread of the distribution of this quantity indicates the degree to which passengers arrive randomly (i.e. "turn up and go") rather than time their arrivals according to schedules. We have found that on the North London Line, especially during the AM, interpeak, and PM peak periods, passengers tend to arrive randomly. This is apparently in contrast to conventional wisdom for National Rail services, and has distinct implications for crowding levels and timetabling practice. In an appendix to this report we look at this in detail, and recommend that even headways be prioritized in timetabling the North London Line.
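A simplified sketch of that arrival analysis: for each station entry, find the next scheduled departure and look at the gap. Under even headways h, purely random arrivals give gaps spread roughly uniformly between 0 and h (mean h/2, standard deviation h/√12), while schedule-timers bunch up a few minutes before departure. The 15-minute timetable and both sets of entries below are synthetic, and the sketch ignores the walk from gate to platform.

```python
import bisect
import math
import random
from statistics import mean, pstdev

def gaps_to_next_departure(entries, departures):
    """Minutes from each station entry to the next scheduled departure.

    Times are minutes after midnight; `departures` must be sorted.
    Entries after the last departure are skipped.
    """
    gaps = []
    for t in entries:
        i = bisect.bisect_left(departures, t)
        if i < len(departures):
            gaps.append(departures[i] - t)
    return gaps

HEADWAY = 15
departures = list(range(480, 600, HEADWAY))  # synthetic timetable, 08:00 to 09:45
rng = random.Random(7)

# "Turn up and go": entries scattered uniformly over the period
random_entries = [rng.uniform(465, 585) for _ in range(2000)]
# Schedule-timers: entries 2 to 5 minutes before each departure
timed_entries = [d - rng.uniform(2, 5) for d in departures for _ in range(50)]

for label, entries in [("random", random_entries), ("timed", timed_entries)]:
    g = gaps_to_next_departure(entries, departures)
    print(f"{label}: mean {mean(g):.1f} min, std dev {pstdev(g):.1f} min")
print(f"uniform reference: mean {HEADWAY / 2:.1f}, std dev {HEADWAY / math.sqrt(12):.1f}")
```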

The Overground is, by design, part of a larger integrated multimodal network. Oyster data, by nature, is somewhat ambiguous in representing passenger trips on such a network that involve transfers or multiple routing options. This poses certain problems to our methodology, but also presents the opportunity to quantify and understand the experience of passengers across the entire network. We discuss these problems, potential solutions, and opportunities at length, as well as other applications for this methodology, and future research directions.

We have concluded that Oyster-based metrics are effective for monitoring and identifying problems as experienced by passengers on the London Overground. They may be even more effective for use across the whole of London's public transport network, particularly as Oyster is in the process of being rolled out to all National Rail services in the Greater London Area.

July 16, 2008

It's the distribution, stupid

Never thought I'd see this in print, but the MTA let the NY Times publish a distribution of Metrocard usage for monthly passes (see below).

While the caption of the image points out that "some riders use the $81 passes for 40 or fewer rides," it fails to point out that anyone making 46* or fewer trips is losing money on their pass. The calculus of "losing" or "making" money on a monthly pass is of course fraught with nuance (e.g. how much is it worth to me to not have to think about paying on each trip? are any of these passes subsidized?) but the article doesn't touch on it at all.

It's no secret that I love NYC Transit and transit in general, but that doesn't mean people should be buying passes when it's far from beneficial. Don't even ask about the London case...

* 46 trips is the breakeven point when the monthly pass cost $81 and individual trips cost $1.74 (after the bonus)
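The footnote's arithmetic, done in integer cents to sidestep floating-point rounding: the pass loses money whenever the same trips at $1.74 each would cost less than $81.

```python
PASS_CENTS = 8100     # $81 monthly pass
RIDE_CENTS = 174      # $1.74 per ride after the bonus

# Largest trip count at which paying per ride is still cheaper than the pass
breakeven = (PASS_CENTS - 1) // RIDE_CENTS

for trips in (40, breakeven, breakeven + 1):
    pay_per_ride = trips * RIDE_CENTS
    verdict = "pass loses" if pay_per_ride < PASS_CENTS else "pass pays off"
    print(f"{trips} trips: ${pay_per_ride / 100:.2f} per ride vs $81.00 ({verdict})")
# 40 trips: $69.60 per ride vs $81.00 (pass loses)
# 46 trips: $80.04 per ride vs $81.00 (pass loses)
# 47 trips: $81.78 per ride vs $81.00 (pass pays off)
```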

May 20, 2008

Climb the Charts, Schmimb the Charts

Someone I know asked me to post this little ditty from a past life career of mine. Judging from the time scale on the graph, I must have made this diagram 200 weeks after November 21, 1998 -- around the fall of 2002 I guess. In fact, if my memory serves me, the whole thing was really Steve's idea.

What the chart below shows to me is that the notion of "climbing the charts" is basically defunct. Albums hit their peak Billboard positions immediately or very shortly after their release dates, and then fade out like the ego and stature of your high school football team's quarterback.

(click for a full size image)

There are a number of ways I have thought that this particular graphic could be improved -- make it interactive, somehow link albums by the same artist, etc -- but, really, who cares?!?

December 5, 2007

I Know A Guy

Q: Mike, how was it that you got to visit the work site of the East Side Access project, hundreds of feet under Manhattan, and check out a real Tunnel Boring Machine?
A: I Know a Guy

I definitely never thought I'd actually get to say that. Fortunately, the guy I know told me to bring a camera. The battery was dying so I didn't get to snap that many pics, but what I got follows. I think these pictures capture, to a degree, how messy the prospect of assembling a TBM way under ground really is.

The overriding feeling I had throughout was of being inside, around, and on top of a Sandworm in the novel Dune. What really impressed me was the number of people necessary just to put one of these things together, and the enthusiasm of all the people working on the project (sweating it out under 200 feet of bedrock).

The entrance to the work site, in Sunnyside, Queens



The 30-plus-year-old but never-used LIRR tunnel under the East River



The beast itself:

So, point being, East Side Access is really happening! This visit was in late August, just before the machine was brought on line. Perhaps I'll get to visit again to see it in action.

December 3, 2007

Congestion Pricing is a Technology, Remember?

In many ways London's system for Congestion Pricing should be a model for New York, but in other ways it really isn't. The most obvious way that it isn't is in the actual technology proposed to do the job. Yes, there are cameras and computers involved, but that's sort of where the similarity ends.

Specifically, in the UK tradition, all of the cameras relay a full video feed to some central processing location. Not only is this absurdly costly (think fibre!!) but it allows for plenty of privacy invasion by anyone who has access to the cameras' feeds. The proposition for New York is very different. The proposition is much cheaper and seems to all but eliminate the possibility of using the cameras for anything but looking at license plates. That's because the cameras would be equipped with enough smarts to know when to snap a photo, and only that still image would be sent to be processed. If you don't believe me, read this excerpt from IBM's recently released proposal:

A worst case analysis shows that for a very busy lane, with one thousand vehicles passing the detection equipment every hour and forced to send two 100kB images for each vehicle, the bandwidth requirement is a mere 57kB/s. This is within the capacity of wireless networks today, but is not the optimal solution approach.

A more realistic case, in which 50% of vehicles are equipped with an E-ZPass tag, 90% of the remaining license plates are read with a sufficient confidence at roadside and 80% of charges are paid in a timely manner, leads to a bandwidth requirement of 8kB/s. A very busy, six-lane detection point would thus be well within the capacity of NYCWiN, even without local reinforcement of the wireless network.

We estimate that, with our proposed solution approach to vehicle detection at the edge of the network and given the estimated amount of traffic in the city, the average local bandwidth requirement across the system will be on the order of less than 1kB/s per lane, and the overall load on the backbone of the wireless network will be small.
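The quoted figures can be roughly reproduced with a few lines of arithmetic. Two readings here are my assumptions, not IBM's stated method: that "100kB" means 100 × 1024 bytes while "kB/s" means 1000 bytes per second, and that images are sent only for untagged vehicles whose plates either fail roadside recognition (10%) or are read but not paid on time (20% of the 90%).

```python
IMAGE_BYTES = 100 * 1024      # assumption: '100kB' read as 100 KiB
IMAGES_PER_VEHICLE = 2
VEHICLES_PER_HOUR = 1000      # the proposal's "very busy lane"

def lane_bandwidth_kb_s(vehicles_needing_images):
    """Average upstream bandwidth in kB/s (1 kB = 1000 bytes) for one lane."""
    bytes_per_hour = vehicles_needing_images * IMAGES_PER_VEHICLE * IMAGE_BYTES
    return bytes_per_hour / 3600 / 1000

# Worst case: every vehicle needs both images sent
worst = lane_bandwidth_kb_s(VEHICLES_PER_HOUR)

# Hypothetical reading of the "realistic" case
untagged = VEHICLES_PER_HOUR * 0.5                   # 50% have E-ZPass
needs_images = untagged * (0.10 + 0.90 * 0.20)       # unread plates + late payers
realistic = lane_bandwidth_kb_s(needs_images)

print(f"worst case ~{worst:.0f} kB/s, realistic ~{realistic:.0f} kB/s")
# worst case ~57 kB/s, realistic ~8 kB/s
```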

More generally speaking, I think it suffices to say that Congestion Pricing uses technology, and as we know technology only gets better and cheaper over time, so we can be sure that NYC's Congestion Pricing technology will be much better and cheaper than London's.

Now, if only somebody would explain this to all the privacy freaks and civil libertarians that are making this process so painful...

November 20, 2007

Mailer for Mayor (In Memoriam)

In 1969 Norman Mailer ran for Mayor. In 2002 my grandfather (probably the most Mailer-like person I know) gave me an original copy of the campaign poster that he had squirreled away for 30+ years. A week or so ago, Mr. Mailer passed away, so it seems like the appropriate time to put this poster on the web, since I have never been able to find a copy online before. I'm about as far from a knowledgeable design critic as you can get, but this thing is an undeniable work of art, especially in the eye of any native New Yorker.



What a platform:



The boroughs, in order of time I've spent in them:



Some lovely embellishments on the Hudson River:



And my favorite little twist:



Not a bad running mate:



But not a good day either (they came in 4th):



For more info on the campaign itself, check out a recent NYTimes podcast and an interview on WNYC with Jimmy Breslin, Mailer's running mate and another icon of New York realness.