The MTA Taught Me Data Structures

posted 10/28/2024

Code referenced on this page can be found at:

In the constant soul-search for new projects, a CS major looks for three key attributes: usefulness, teachability, and personal appeal. A great project should solve real-world problems, teach its creator a lesson, and be actively fun to make. I'll settle for 2/3.

But before I talk about this project, I want to give a little history lesson. Let me begin motivating this problem with some pictures:

Images credit @cars.destroyed.our.cities

Around the middle of the 20th century, riding on the heels of New York Parks Commissioner Robert Moses's urban renewal programs, local governments across America rubber-stamped federal proposals for devastating highway construction projects that displaced hundreds of thousands of people, disproportionately affecting minority communities. While the interurbans and longer intercity routes that had once connected the country and helped form a unified American culture were becoming insolvent due to corruption, auto lobbying, and White Flight, urban cores themselves began to decay as those with more wealth and political power migrated to suburbs and exurbs well outside city centers. Local governments beholden to these wealthy suburbanites needed to provide options for them to commute comfortably to their downtown jobs, thus, highways were built through the heart of most American cities, ripping apart urban fabric and tearing apart countless communities.

This is worth mentioning because it's a critically important, oft-missing piece of context explaining our current "situation" in America. Suburban exodus and urban decay has formed the demographics of today's political parties, and has set the stage for various frontiers of the "culture war" tearing apart modern America. It primes us for a difficult conversation around solutions to climate change and sustainable ways of living. And it has led to, in my view, some problems.

Defendants of American car dependency typically argue that alternative transportation options are inconvenient, dangerous, or otherwise undesirable, and that they are free to choose to drive their car at their convenience, exercising their freedom to get around without depending on other people and their schedules. But these arguments are predicated on the false depiction of non-car-oriented life that has itself been created by car dependency. Transit is only inconvenient because it has been underfunded for a century. Biking is only dangerous (and cyclists are only assholes) because we do not build safe separated bike trails and lanes. Walking is only undesirable because we have ruined the landscape of our cities and spaced everything inconveniently far apart. Investing in alternatives to the automobile does not mean taking away the choice to drive, or forcing people back into cities. It just means making reasonable compromises about where cars can go and routing some of the hundreds of billions of dollars a year we spend on them to safer, more community-friendly infrastructure.

Back to computer science. One of the (admittedly many) things I could see myself doing for a career is creating models of urban areas that help guide city planning and administration. But cities are really complicated, and I have to start somewhere, so I thought it best to begin with just one aspect: mobility.

In this context, mobility is how people travel around an urban area. And while focusing on cars or pedestrians would be interesting, lots of people are doing that already, so I decided to start somewhere closer to my heart: public transit. I wanted to simulate ridership flow throughout the New York City Subway.

Object-oriented solution overview:

When I first conceived this as a final project for my class, I wrote it in Java, and it was pretty straightforward, requiring essentially no optimization. The small graphics wrapper provided by my professor batched renders, the JVM collected my garbage, and everything ran smoothly. I wrote straightforward OOP code and Java did its job to make it easy. The simulation chugged along at around 30,000 concurrent agents at reasonable speed.

But real programmers make their lives harder, so I rewrote it in C++.

It did not take long to (hypothetically) disappoint my Computer Organization prof and violate the paradigms of low-level languages, as it turned out to be easier and perhaps necessary to continue extending Trains and Nodes from drawable objects in the graphics library for easy rendering--thus returning to horrid OOP.

But many other changes were necessary.

Core C++ rewrite changes:

The results:

There is a lot to criticize about this project, and there is much that I still want to do. By virtue of having no formal training in C++, only one course's worth of experience with C, and more experience than any developer should have in Java, this project's code is styled in a way that satisfies nobody. But bodging is OK.

What's less OK is the criterion that this app misses for being a "good" CS project: usefulness. This "simulation" in no way simulates the real world. Riders are probabilistically generated from annual ridership data, neglecting real-world commute patterns or the daily schedule that drives the actual system. Train times, counts, and capacities are guesstimates designed mostly to keep the simulation visually appealing. But there are things that I plan to do to remedy this:

Throughout the development process, there were also several failed experiments to boost performance. These included path contraction, updates to the path cache, and bidirectional A*. These led me towards several possible future performance improvements:

I hope to make it back around to these improvements as I grow as a programmer.