In the constant soul-search for new projects, a CS major looks for three key attributes: usefulness, teachability, and personal appeal. A great project should solve real-world problems, teach its creator a lesson, and be actively fun to make. I'll settle for 2/3.
But before I talk about this project, I want to give a little history lesson. Let me begin motivating this problem with some pictures:
3rd and Central, Cincinatti, OH, demolished to make room for I71, displacing 25,000
The once vibrant and dense neighborhood surrounding the now-defunct Union Terminal, Cincinatti, OH
Main Street, Kansas City, MO, demolished to make way for I70, strangling the city's downtown
The construction of I-35W in Minneapolis, MW, which displaced thousands
The interior of old Penn Station, New York, NY, demolished to cut costs for the failing Penn Central Railroad
Around the middle of the 20th century, riding on the heels of New York Parks Commissioner Robert Moses's urban renewal programs, local governments across America rubber-stamped federal proposals for devastating highway construction projects that displaced hundreds of thousands of people, disproportionately affecting minority communities. While the interurbans and longer intercity routes that had once connected the country and helped form a unified American culture were becoming insolvent due to corruption, auto lobbying, and White Flight, urban cores themselves began to decay as those with more wealth and political power migrated to suburbs and exurbs well outside city centers. Local governments beholden to these wealthy suburbanites needed to provide options for them to commute comfortably to their downtown jobs, thus, highways were built through the heart of most American cities, ripping apart urban fabric and tearing apart countless communities.
This is worth mentioning because it's a critically important, oft-missing piece of context explaining our current "situation" in America. Suburban exodus and urban decay has formed the demographics of today's political parties, and has set the stage for various frontiers of the "culture war" tearing apart modern America. It primes us for a difficult conversation around solutions to climate change and sustainable ways of living. And it has led to, in my view, some problems.
Car-centric development tends to prioritize visibility for drivers and minimize costs, thus neglecting decoration and vegetation. This leads to lost biodiversity, dead animals, and the "urban heat island effect", which kills people.
Car-centric development complicates pedestrian alternatives and promotes a sedentary lifestyle. Too much sitting also kills people.
Car-centric development is antisocial and frustrating. In your car, you are often alone. In traffic, everyone on the road is your enemy. This can end up killing people.
But more than that, cars promote a solitary lifestyle. As corny as it sounds, in most cases driving makes travel about getting efficiently from A to B, not about appreciating your surroundings. When driving, you're less likely to run into people, or randomly stop at an unplanned location, or explore somewhere new. This leads to a noticeable decrease in quality of life that Americans might often note when travelling abroad.
Car-centric development is ugly. This does not kill people.
Cars are much bigger than people, thus, car-centric development takes up lots of space. This extra space increases the cost of utilities and makes alternatives like walking, biking, and transit much harder to implement, forcing more people into cars in a vicious cycle.
For businesses, car dependency hurts the good guys and helps the bad guys. Commerical businesses, especially small businesses, lose revenue, while seedy lawyers, insurance firms, and polluting car companies profit.
Car dependency restricts social mobility. Without a car, people lose access to jobs, shopping spaces, third places, schools, healthcare, and more, making it harder for them to improve their lives.
Cars are expensive for everyone. The vehicles themselves are expensive, as are predatory insurance and maintenance costs. Direct subsidies like gas and infrastructure are loaded onto the taxpayer, who also unwittingly pays for negative externalities like healthcare, climate change, and "free" parking.
Defendants of American car dependency typically argue that alternative transportation options are inconvenient, dangerous, or otherwise undesirable, and that they are free to choose to drive their car at their convenience, exercising their freedom to get around without depending on other people and their schedules. But these arguments are predicated on the false depiction of non-car-oriented life that has itself been created by car dependency. Transit is only inconvenient because it has been underfunded for a century.ⓘ Biking is only dangerous (and cyclists are only assholes) because we do not build safe separated bike trails and lanes. Walking is only undesirable because we have ruined the landscape of our cities and spaced everything inconveniently far apart. Investing in alternatives to the automobile does not mean taking away the choice to drive, or forcing people back into cities. It just means making reasonable compromises about where cars can go and routing some of the hundreds of billions of dollars a year we spend on them to safer, more community-friendly infrastructure.
Back to computer science. One of the (admittedly many) things I could see myself doing for a career is creating models of urban areas that help guide city planning and administration. But cities are really complicated, and I have to start somewhere, so I thought it best to begin with just one aspect: mobility.
In this context, mobility is how people travel around an urban area. And while focusing on cars or pedestrians would be interesting, lots of people are doing that already, so I decided to start somewhere closer to my heart: public transit. I wanted to simulate ridership flow throughout the New York City Subway.
Object-oriented solution overview:
We construct a graph of Nodes representing stations. We use the ridership of a station to probabilistically generate [start, destination] station pairs.
We use an ArrayList of Citizens as the primary simulation agents. Citizens generate, store, and travel along a path of Nodes.
We also have Trains that monotonously move along a path of Nodes (stored as a Line).
Movement is accomplished through walking or using transit. A timer, path index, and status enum is used to track movement. Walking uses a fixed travel speed. Citizens using transit can "board" a station (waiting until the correct train arrives) or board a train (waiting until the train reaches their stop). Boarding, including waiting for trains, is accomplished with references. Timers are used for realism, including during boarding/deboarding.
A* is used for Citizen path generation. Heuristics are tweaked to reward express lines (a fixed penalty is applied for every stop), penalize transfers between lines (using Line objects), and increase the cost of walking relative to taking the train. To integrate walking into the system, a WalkingTransfer dummy line is used. To store information about a node/line neighbor, simple PathWrapper objects are used.
When I first conceived this as a final project for my class, I wrote it in Java, and it was pretty straightforward, requiring essentially no optimization. The small graphics wrapper provided by my professor batched renders, the JVM collected my garbage, and everything ran smoothly. I wrote straightforward OOP code and Java did its job to make it easy. The simulation chugged along at around 30,000 concurrent agents at reasonable speed.
The simulation manages around 20,000 agents across the system.
Citizens are shown travelling by train and walking between stations in dense Manhattan.
Citizens are spawned by the user near the cursor and distribute around the system.
But real programmers make their lives harder, so I rewrote it in C++.
It did not take long to (hypothetically) disappoint my Computer Organization prof and violate the paradigms of low-level languages, as it turned out to be easier and perhaps necessary to continue extending Trains and Nodes from drawable objects in the graphics library for easy rendering--thus returning to horrid OOP.
But many other changes were necessary.
Core C++ rewrite changes:
We require manual rendering using triangle generation with VertexArrays and VertexBuffers to flush/store data in GPU VRAM.
Low single-thread speeds necessitate multithreading. The final implementation settles on a dedicated render and pathfinding thread, and a thread pool that distributes Citizen update tasks (moving citizens along paths) to various worker threads.
We reduce copy overhead by using a custom data structure to overwrite citizens in-place. A stack of references to "dead" citizens is used to track overwrite-eligible citizens, and the vector is only extended when the stack is empty. We avoid race conditions by pooling citizen pointers and waiting until the end of the frame to push them all to the stack.
We implement a two-way LRU cache for pathfinding with a ~20% hit rate.
The results:
The simulation easily handles 50,000 concurrent agents.
The user interacts with nodes to see how citizens pathfind between two stops.
The user spawns citizens at a custom node to distribute them around the system.
There is a lot to criticize about this project, and there is much that I still want to do. By virtue of having no formal training in C++, only one course's worth of experience with C, and more experience than any developer should have in Java,ⓘ this project's code is styled in a way that satisfies nobody. But bodging is OK.
What's less OK is the criterion that this app misses for being a "good" CS project: usefulness. This "simulation" in no way simulates the real world. Riders are probabilistically generated from annual ridership data, neglecting real-world commute patterns or the daily schedule that drives the actual system. Train times, counts, and capacities are guesstimates designed mostly to keep the simulation visually appealing. But there are things that I plan to do to remedy this:
Use better ridership data and patterns to mimic e.g. rush hour, weekend service, more accurate commuters
Introduce more randomness in citizen timers and more complex pathfinding heuristics for more realism
Use accurate train counts, speeds, and headways
Improve pathfinding with algorithms such as D* that can handle dynamically shifting weights from train arrivals (think of how Google Maps can path you on transit using timetables)
Give trains physics and have them move along their real-world track instead of idealized lines
Introduce delays and update capacity constraints to better model congestion
Throughout the development process, there were also several failed experiments to boost performance. These included path contraction, updates to the path cache, and bidirectional A*. These led me towards several possible future performance improvements:
De-OOPification: experiment with ways to separate objects into arrays for improved performance, as well as test AoS (current) vs. SoA speed.
Path contraction: improved A* heuristics or preprocessing such as neighbor sorting could improve pathfinding performance with path (edge) contraction. Previously, this hurt performance because A* was checking too many extra edges, but it is worth revisiting. Candidates for path contraction would be major transfer points on the same line as a node.
Path compression: after citizen paths are generated, citizens only really need to store transfer points in their path, rather than every node they pass through. This could reduce memory usage by 90% and prevent unecessary checks on citizens who might just be sitting idle on a train, especially in combination with:
Priority queue for citizen updates: every frame, every citizen is checked by a worker thread. CPU usage analysis shows that this is wasting a lot of time. This simply isn't necessary--at any given time, ~35% of citizens are on a train, which has a set arrival time--meaning that those citizens only need to be updated when the train gets to their stop, and don't need to be accessed every frame. Another 45% are at stops and don't need to be accessed until a train is in the station. These access delays could be implemented with a priority queue maaging the core citizen loop--but there are drawbacks to this additional complexity.
I hope to make it back around to these improvements as I grow as a programmer.