🚈 Why Did I Build a Transit Bot?

By Heliomass | November 08, 2023 | 15 minutes

Recently I made a bot to report transit delays in Montreal on Mastodon and Bluesky. If you’ve not given them a follow yet, you’ll find the links here.

I had a few questions come my way once I began sharing the accounts, so I thought I’d write a post about it, which is also my way of drawing a line under the project… at least for now.

Back in 2008 Jeff Atwood of Coding Horror fame wrote what is (in my opinion) a seminal post entitled “UsWare vs. ThemWare”. This post really stuck with me over the years and I often think about it when I’m making anything.

In my day job, we write “ThemWare” for our clients. Often personal projects are “MeWare”. It’s special to find that perfect spot where you’re both writing something for yourself, but which others will also find useful, the so-called “UsWare”.

And so for once, I found an idea which not only I’d find useful, but others might find useful too. I had an idea for UsWare.

Motivations

We’re still in the early days of these emerging social networks. Whilst the site formerly known as Twitter continues to fall apart in numerous ways, a large number of its existing user base hasn’t yet jumped ship. This group includes many media organisations, influencers who’d built up their followers over a decade, and public agencies.

The latter category of public agencies is the focus here, and it’s hard to blame any of these groups for trepidation in jumping from the sinking ship. The emerging social networks (consisting of Mastodon, Bluesky and Threads in this context) have chipped away at the site formerly known as Twitter, fragmenting users over these three networks which are still jostling to be the next Twitter – although in terms of pure numbers Threads seems to be winning the battle here.

There’s also the overhead for an organisation to expand its social media presence. Legacy websites need to be updated, and additional social networks must be monitored and updated.

Coming back on topic, here in Montreal the three main authorities which use the site formerly known as Twitter to push out transit updates are the STM, exo and the REM. None of these organisations have indicated publicly they’ve explored Mastodon or Bluesky, and this is where my motivation comes from, because I predict it will be a long time before they take the plunge.

The more mundane motivation is to have all transit updates for Montreal in one place. The official updates on the site formerly known as Twitter have individual accounts for each line. I thought it would be convenient to have everything in a single update.

Media outlets still embed “tweets” but not “toots” or “skeets”. Momentum of these emerging social media services into the mainstream is going to remain slow, and I think it’s up to hobbyists to bridge the gap until everyone else catches up.

Mini Transit Glossary

A quick note for anyone reading this who isn’t from Canada’s second largest metropolitan region. The hierarchy of transit authorities in Greater Montreal is complex, and I’m hoping at some point to do a deeper dive into the topic. But for now, here are the terms you’ll see cropping up:

STM: Acronym for “Société de transport de Montréal”. The organisation which runs the Metro on the Island of Montreal
REM: Acronym for “Réseau express métropolitain”, the brand new light rail network connecting Greater Montreal

Both of these are at the core of Greater Montreal’s mass transit, and are supplemented by the many bus networks and also by commuter rail services.

Behind the Scenes

We’re going to get a little technical here when taking a peek behind the curtain, so feel free to skip to the next section if this isn’t your area of interest.

How do things work behind the scenes? What’s the architecture and which technologies were employed?

I made the decision early on to have two separate components:

A RESTful API to gather and collate the status info
A script to post the updates to Mastodon and Bluesky

My thinking was to create a division between the act of scraping the status updates and the act of posting them. At the back of my mind I was also considering that I might open up the API to wider consumption at some point, although for now the API endpoint is only available internally on the host.

The API

The API is quite simple. It has one call, /statuses, and it returns a JSON response like this:

{
  "en": {
    "metro": {
      "1": {
        "line_name": "Line 1 - Green",
        "line_status": "Normal metro service"
      },
      "2": {
        "line_name": "Line 2 - Orange",
        "line_status": "Normal metro service"
      },
      "4": {
        "line_name": "Line 4 - Yellow",
        "line_status": "Normal metro service"
      },
      "5": {
        "line_name": "Line 5 - Blue",
        "line_status": "Normal metro service"
      }
    },
    "rem": [
      "REM A1 - Brossard: Normal service",
      "REM Gare Centrale: Normal service"
    ]
  },
  "fr": {
    "metro": {
      "1": {
        "line_name": "Ligne 1 - Verte",
        "line_status": "Service normal du métro"
      },
      "2": {
        "line_name": "Ligne 2 - Orange",
        "line_status": "Service normal du métro"
      },
      "4": {
        "line_name": "Ligne 4 - Jaune",
        "line_status": "Service normal du métro"
      },
      "5": {
        "line_name": "Ligne 5 - Bleue",
        "line_status": "Service normal du métro"
      }
    },
    "rem": [
      "REM A1 - Brossard: Service normal",
      "REM Gare Centrale: Service normal"
    ]
  }
}

At the moment the response contains the actual text gathered from the STM and REM websites in both English and French, with some minor modification of the text (for example, in English the accented “é” is removed from “metro”).

The Metro statuses are organised by line in dictionary format, whereas the REM statuses are organised sequentially in an array. The reason for this is the different ways the STM and the REM separate their services.
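To make the difference between the two shapes concrete, here is a minimal sketch of how a consumer might walk the response. The function name `flatten_statuses` and the tuple layout are illustrative assumptions, not part of the real API or bot; the sample data is a trimmed copy of the response shown above.

```python
import json

# A trimmed copy of the /statuses response shown above (English only).
SAMPLE = """
{
  "en": {
    "metro": {
      "1": {"line_name": "Line 1 - Green", "line_status": "Normal metro service"},
      "2": {"line_name": "Line 2 - Orange", "line_status": "Normal metro service"}
    },
    "rem": [
      "REM A1 - Brossard: Normal service",
      "REM Gare Centrale: Normal service"
    ]
  }
}
"""


def flatten_statuses(payload, language):
    """Return a list of (source, label, status) tuples for one language.

    Hypothetical helper for illustration only.
    """
    section = payload[language]
    rows = []
    # Metro: a dictionary keyed by line number, each value holding name and status.
    for _number, line in sorted(section["metro"].items()):
        rows.append(("metro", line["line_name"], line["line_status"]))
    # REM: a plain list of "destination: status" strings.
    for entry in section["rem"]:
        label, _, status = entry.partition(": ")
        rows.append(("rem", label, status))
    return rows


rows = flatten_statuses(json.loads(SAMPLE), "en")
for row in rows:
    print(row)
```

The Metro half can be looked up directly by line number, while the REM half has to be read in order and split on the colon, which is why the two are handled separately.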

The STM has a static number of lines – numbered 1 (Green), 2 (Orange), 4 (Yellow) and 5 (Blue). Each Metro line has no branches and is a single link between the two termini. Therefore, they provide a single status for each line which we assume applies to trains running in either direction.

(Side note: You may wonder why there’s no line 3. The answer is it was planned but never built)

(Pedantry corner: The Orange Line turns some peak hour trains back at Henri-Bourassa, so you could argue it in fact has three termini rather than just two)

The REM handles things differently. There’s essentially one REM line which is known simply as “Line A”. The choice of a letter would appear to have been done to distinguish the system from the Metro which uses numbers, and perhaps also with the expectation of future REM lines being commissioned.

Unlike the Metro lines, REM Line A will have multiple branches and travel considerably further afield than the Metro. This means the REM reports service status according to destination.

Destinations are named with the line’s letter and a number. For example, A1 is for trains to Brossard, A2 for trains to the airport, and so on.

With the full system not yet complete, there’s a temporary terminus at Gare Centrale which doesn’t have a number assigned to it, and we’ll return to this point in a while when talking about how the updates are displayed in the posts to Mastodon and Bluesky.

The RESTful API is written entirely in Python, save a shell wrapper to create a virtual environment and pull in dependencies via pip. The two main libraries it uses are:

Beautiful Soup: This library is specifically designed for website scraping, and makes it really quite simple to drill down to the data you’re trying to extract. It’s fantastic.
Flask: A well-known library for developing web applications, but also works nicely for a light-weight RESTful API.

The API does a couple of other useful things as well. It keeps a cache of the latest update for a period of time. This is important so the STM and REM websites don’t get hit with requests each time the bot wants to poll for an update. This would become more of a problem if at some point in the future the API is made public and we don’t want anyone black-listing the API’s IP address because we’re sending too many requests.

At the moment the expired cache only gets refreshed at the time the API receives a request for an update. This is OK for the limited number of requests received but has the side effect of some requests to the API taking a few seconds as the cache is refreshed. This would create a bottleneck if the API were to be publicly accessible. A better approach if the API is opened up to other consumers is to have a background process looking after the cache so requests would receive an instant response.
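The lazy, refresh-on-request behaviour described above can be sketched with a small time-to-live cache. This is only an illustration under stated assumptions: the class name `StatusCache`, the 60-second lifetime and the fake scraper are all invented here, and the post does not say how long the real cache lives.

```python
import time


class StatusCache:
    """Keep the last fetched value and refresh it only after it expires."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch        # callable that performs the slow scrape
        self._ttl = ttl_seconds    # how long a cached value stays fresh
        self._clock = clock        # injectable so the example can be exercised
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if expired:
            # Lazy refresh: the request that finds the cache stale pays for the scrape.
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


# Stand-in for the real scraper; it only counts how often it is called.
calls = []


def fake_scrape():
    calls.append(1)
    return {"scrape_number": len(calls)}


fake_now = [0.0]
cache = StatusCache(fake_scrape, ttl_seconds=60, clock=lambda: fake_now[0])

first = cache.get()    # cache is empty, so this triggers a scrape
second = cache.get()   # still fresh, served from the cache
fake_now[0] = 61.0
third = cache.get()    # expired, so this triggers a second scrape
```

Moving the refresh into a background job would mean calling the fetch function on a timer instead of inside `get`, so no request ever waits on the scrape.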

The second benefit is in the way it abstracts the act of data gathering from the act of making a post. Instead of an app we have an architecture, which is much more flexible and future proofed if we want to turn it to additional purposes.

However, the API isn’t stateful. It only provides a glimpse of the current state of affairs, reflecting the experience of visiting the STM or REM website. It’s up to the services leveraging the API to track changes in state if that is what’s required.

The Bot

A separate script consumes from the API and keeps track of the state. If there’s a change in the status of any of the lines, it posts an update to all accounts on both Mastodon and Bluesky. Each of those services has two accounts (one in English, and one in French).
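As a rough sketch of that state tracking, the bot only needs to remember the previous statuses and compare them with the latest ones. The helper name `changed_lines` and the "Service disrupted" text are hypothetical, chosen purely for the example.

```python
def changed_lines(previous, current):
    """Return {label: new_status} for every line whose status differs."""
    return {
        label: status
        for label, status in current.items()
        if previous.get(label) != status
    }


# Statuses remembered from the last poll (hypothetical values).
previous = {
    "Line 1 - Green": "Normal metro service",
    "Line 4 - Yellow": "Normal metro service",
}
# Statuses from the newest poll; the disruption text is invented.
current = {
    "Line 1 - Green": "Service disrupted",
    "Line 4 - Yellow": "Normal metro service",
}

changes = changed_lines(previous, current)
if changes:
    print("Would post an update:", changes)
```

Because the API itself is stateless, the `previous` dictionary would have to live in the bot, for instance in memory or in a small file between runs.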

There’s also a way of posting updates at specific times of the day, regardless of status. At the moment I’ve set this to post a morning update at 7am, and an afternoon update at 4pm.
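One way to decide whether a scheduled post is due is to check if a scheduled time falls between the previous poll and the current one. The sketch below assumes naive local datetimes and invents the names `SCHEDULED_TIMES` and `due_scheduled_posts`; the real bot may well rely on cron or another scheduler instead.

```python
from datetime import datetime, time as day_time

# The two daily slots mentioned above: 7am and 4pm.
SCHEDULED_TIMES = (day_time(7, 0), day_time(16, 0))


def due_scheduled_posts(last_check, now, schedule=SCHEDULED_TIMES):
    """Return the scheduled slots that fall in the window (last_check, now]."""
    due = []
    for slot in schedule:
        scheduled = datetime.combine(now.date(), slot)
        if last_check < scheduled <= now:
            due.append(slot)
    return due


# The poll at 07:00 crosses the morning slot; the next poll does not.
morning = due_scheduled_posts(datetime(2023, 11, 8, 6, 55), datetime(2023, 11, 8, 7, 0))
later = due_scheduled_posts(datetime(2023, 11, 8, 7, 0), datetime(2023, 11, 8, 7, 5))
```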

In terms of formatting, the bot takes the JSON from the RESTful API and turns it into nicely formatted text.

A typical service update as seen on the English language Mastodon account

I’ve made heavy use of emoji in the status to help people home in on their particular line of interest. For example, Metro lines use their colour and number.

For the REM, the destination shows the line’s letter and its destination number (eg: A1 for Brossard). In the case of Gare Centrale which is a temporary destination with no assigned number, I’ve gone with an asterisk to make “A*”. I’m sure there’s an astrophysics joke in there somewhere.

Hashtags are only added to Mastodon posts, and then only for posts which are live updates, versus the morning and afternoon updates. This is to help reduce pollution of the Mastodon timelines and draw attention to the posts where the service status has changed.
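The formatting rules above could be sketched along these lines. Everything specific here is an assumption for illustration: the emoji choices, the exact layout of each line, the placeholder hashtags and the helper names are not taken from the real bot.

```python
import re

# Assumed colour markers for the four Metro lines.
LINE_EMOJI = {"1": "🟢", "2": "🟠", "4": "🟡", "5": "🔵"}


def format_metro(number, line):
    """One line of text per Metro line: colour, number, status."""
    return f"{LINE_EMOJI.get(number, '⚪')} {number}: {line['line_status']}"


def format_rem(entry):
    """Turn 'REM A1 - Brossard: Normal service' into 'A1 Brossard: Normal service'."""
    label, _, status = entry.partition(": ")
    match = re.search(r"\bA\d+\b", label)
    if match:
        code = match.group(0)
        destination = label.split(" - ")[-1]
    else:
        # No destination number assigned (e.g. Gare Centrale), so use "A*".
        code = "A*"
        destination = label[4:] if label.startswith("REM ") else label
    return f"🚈 {code} {destination}: {status}"


def format_post(section, add_hashtags, hashtags="#Montreal #Transit"):
    """Build the post text; the caller enables hashtags only for live Mastodon updates."""
    lines = [format_metro(n, line) for n, line in sorted(section["metro"].items())]
    lines += [format_rem(entry) for entry in section["rem"]]
    if add_hashtags:
        lines.append(hashtags)  # placeholder hashtags, not the real ones
    return "\n".join(lines)


section = {
    "metro": {"1": {"line_name": "Line 1 - Green", "line_status": "Normal metro service"}},
    "rem": ["REM A1 - Brossard: Normal service", "REM Gare Centrale: Normal service"],
}
live_post = format_post(section, add_hashtags=True)
scheduled_post = format_post(section, add_hashtags=False)
print(live_post)
```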

In terms of making the actual posts to Mastodon and Bluesky, both these platforms have done a good job of making integration simple.

On Mastodon, you can generate an API key via your instance, and then it’s just a case of sending a request to the right end-point on your instance’s API using your token and the status update as the data payload. For example, as a curl command it would look like this:

curl https://mastodon.example/api/v1/statuses \
  --header 'Authorization: Bearer MY_TOKEN' \
  --data 'status=This is my status'

Very simple indeed, and straightforward to implement in Python.
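For instance, the same request can be built with nothing but the standard library. This is a sketch rather than the bot's actual code: the helper name `build_status_request` is made up, and the instance URL and token are the same placeholders used in the curl command. The request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request


def build_status_request(instance_url, token, status):
    """Build (but do not send) the POST request equivalent to the curl command."""
    data = urllib.parse.urlencode({"status": status}).encode("utf-8")
    return urllib.request.Request(
        f"{instance_url}/api/v1/statuses",
        data=data,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


request = build_status_request(
    "https://mastodon.example", "MY_TOKEN", "This is my status"
)
# To actually post, pass the request to urllib.request.urlopen(request).
```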

Bluesky works a bit differently. Instead of generating a token, you create a dedicated app password for your account, and then use your Bluesky username and the app password to send the update.

To make this easier I found a nice Python library called atproto which handles the details and is in active development.

Limitations

At the end of the day, a bot can only be as accurate as the data it consumes. There are likely private APIs available to transit agency partners, but in this case we need to use what’s visible on the websites of the STM and REM.

We know not all outages are guaranteed to be posted. For example, the STM promises an update if the outage time will exceed 10 minutes. I may also be cynical in thinking transit agencies don’t necessarily want to be too transparent about the quality of their service.

As we’re essentially scraping data from web pages, we can also end up consuming bad data if there’s a problem with that content.

In one instance the STM set the Yellow Line’s status to “Test”. In another case, there were no REM updates at all published on the REM status page resulting in an empty array in the RESTful API.

Someone at the STM set the Yellow Line's status in an unexpected way

Here's how the web page looked when this glitched out

When these glitches start happening frequently, it’s worth coding around them so you’re not pushing out useless updates, but when it’s a once-in-a-while glitch, it’s something I’m happy to live with.
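If it ever became worth coding around the two glitches described above, a sanity check before posting might look something like this. The function name `looks_valid` and the list of suspicious values are assumptions; this is not something the post says the bot currently does.

```python
def looks_valid(section, suspicious=("test",)):
    """Return False for payloads matching the two glitches described above."""
    # Glitch 1: a whole block is missing, such as an empty REM array.
    if not section.get("metro") or not section.get("rem"):
        return False
    # Glitch 2: a status that is obviously placeholder text, such as "Test".
    for line in section["metro"].values():
        if line["line_status"].strip().lower() in suspicious:
            return False
    return True


good = {
    "metro": {"4": {"line_name": "Line 4 - Yellow", "line_status": "Normal metro service"}},
    "rem": ["REM A1 - Brossard: Normal service"],
}
test_status = {
    "metro": {"4": {"line_name": "Line 4 - Yellow", "line_status": "Test"}},
    "rem": ["REM A1 - Brossard: Normal service"],
}
empty_rem = {"metro": good["metro"], "rem": []}

results = [looks_valid(good), looks_valid(test_status), looks_valid(empty_rem)]
```

The trade-off is that an over-eager filter could swallow a genuine update, which is one reason to leave occasional glitches alone.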

There’s also the risk that at some point in the future the STM and REM websites will change layout, meaning the bot will break and I’ll need to update the API to be able to gather the service data again. I’m hedging my bets that since we’re in an era where transit organisations are running a deficit, they’ll have an “if it ain’t broke don’t fix it” policy.

I would imagine in both the STM and the REM’s cases, they invested a lot up-front in their front-end and back-end infrastructure, and will leave it as-is so long as it works. This is the same reasoning as to why I don’t expect to see feeds implemented on Mastodon or Bluesky in the medium-term.

Artwork

When I realised other people were potentially going to follow these accounts, the design didn’t feel like it was up to scratch. I wanted something which looked more sleek and refined.

The original design during the development phase left a lot to be desired

Thankfully I had help from my talented wife, and opted for a Japanese-style kawaii rendering of the Metro’s MPM-10 “Azur” cars as well as a stylised Montreal skyline. The coloured lines seen in both the avatar and skyline represent the four colours of the Montreal Metro which are instantly familiar to any Montrealer.

Account avatar

Previous Efforts

It’s also worth quickly noting for the sake of completeness that this isn’t the first time I’ve tried to build something for transit updates. My last attempt was a webapp for the iPhone, which I ended up abandoning because I couldn’t get a handle on iOS webapp behaviour, and also because it didn’t add a lot of value beyond the STM’s own status page and their updates on the site formerly known as Twitter.

I wasn’t tracking usage and it wasn’t well promoted, so I’ve no idea how many people other than myself were using it, but it was a fun project.

Future Plans

With the writing of this post I’m drawing a line under this project. I do have ideas for features which will inevitably be added when I have time, energy and attention to do so.

Some of the ideas on that roadmap include:

Weather warnings: This might sound like an odd idea for a feature, as there’s not a lot of point adding regular weather to the bot. Seasonal weather is mostly consistent and Montrealers know exactly what to expect for the time of year. But when extreme weather is predicted, it would be good to know as these things can escalate into transit problems (and also, who doesn’t want a heads-up when there’s a tornado?).
exo updates: There are 5 exo (commuter rail) lines which deserve to be added to the bot. I’m aware that there are limited characters for a post – especially on Bluesky – so I’d also need to change the overall format. Something which shows “Good service on all exo lines” unless there’s something specific to report.
Threads support: I’ve not looked into whether it’s possible to create a bot on Threads, but I suspect out of all the emerging social media platforms Threads is going to be the most used. It may even overtake the site formerly known as Twitter.
Web app: Nowadays it should be straightforward to have a static web page displaying the current transit status. There would also be the option for people to add the app to their phone’s homescreen and use it as a web app.
Performance Statistics: It would be possible to have a database tracking the history of shutdowns over the course of the month, and publish a graph or some interesting stats periodically.
Making the API public: If there’s interest and some good use cases, I’d consider making the API publicly accessible as well as making the API’s source code available on GitHub.

Final Tip

I gave some consideration to iOS and iPadOS widgets and push notifications, but you can sort of do this already.

If you’re using a Mastodon app such as my personal favourite, Ivory, you can allow push notifications from specific accounts.

Expanding an Ivory notification results in seeing the full update.

It’s also possible to use Ivory to show updates from a specific list on a widget, meaning you can create a dedicated list for transit updates and add it to your homescreen.

A homescreen widget of sorts using Ivory. If I fixed the order of the status text, this could work as a widget.

A Project is Never Truly Finished

It makes a change to have tried to build something which is useful to others as well, instead of my usual niche side projects.

If you have any comments or suggestions, you can of course contact me on Mastodon and Bluesky.

You can follow the accounts by going to the main project page, and if you’re already following these accounts, have shown interest in the project or have asked interesting questions, I thank you. That’s always the real motivation to build these kinds of things.

This article made use of Mastopoet and Apple Frames. Undoubtedly both of these are “UsWare”!
