When we have a live incident the ops tickets language is close to how I imagine air traffic control sounds. "Live issue noted" "Confirmed"
— Max (@hawkida) March 30, 2017
As Frasier Crane was wont to say, “I’m listening”.
I’m the test lead for the iPlayer Radio websites. I get into work by 8am most days. I have my breakfast at my desk and ease into the day doing the following things:
Looking at the results of the automated tests that ran the previous night.
Catching up on email.
Reading messages from angry people.
Where there are problems showing from the test automation I investigate the causes. Software is a fickle beast and often things have timed out as the tests ran because the test environment is slow, or empty caches have caused temporary glitches, or BrowserStack decided not to listen when we called it. Sometimes we find actual bugs and I’ll spend some time isolating the problem in order to raise a bug ticket. If we’re lucky there will be nothing to report and the tests will all be green.
Reading email informs me of the state of the product, the team, the department and the company. It contains details of deployed code, Jira ticket state changes and plans for future direction. I flag things that will need follow-up action, such as requests from my line manager or pull requests that need a code review, and I check my schedule for the day.
The “angry” people are also found in my inbox, collected together in reports from the people who manage our audience feedback. I subscribed to these lists in order to see first-hand responses from the audience before they get filtered through more people or deprioritised out of existence by more pressing concerns. There isn’t always time to get through it all, and when I’m otherwise busy I may skip this part of my routine, but I try to leave the reports unread if I haven’t glanced through them. Why? Because I want to know where our products are failing, and ideally I want to know early. One of the core principles of working for the BBC is accepting that “The audience is at the heart of everything we do,” and yet only a few of us see the direct messages they send for our consideration. To my mind, this insight is all part of promoting quality, and it covers both functional failures and areas where assumptions we made in building the product haven’t worked out for the audience.
There’s an argument to be made that says that flippantly calling them angry is at odds with that ethos, but sadly the truth is that you do need something of a thick skin to read the messages about the product you have worked on. The BBC is something of a behemoth. Nobody can tell you the answer to the question “What’s it like to work at the BBC?” because we only have experience of our own corner and departments, teams and experiences differ massively. There are thousands of workers, a multitude of departments, and even the website isn’t run, as many think, by “a” coding team so we don’t necessarily know what the rest of the web teams are doing. As such, sometimes things aren’t as joined up as they might be, and for an audience member it’s quite difficult to submit a complaint or comment using the website while ensuring it will reach the right people. Sometimes the logs show that somebody started out wanting to compliment something, or make a benign comment but the prolonged and confusing route to submitting a message leaves them so incensed it turns into a complaint along the way. That aside, people who contact us are often angry because they believe that we don’t care about them, that their experience is a universal one, and that their problem could be easily fixed if only we weren’t such idiots. We are accused of being incompetent, called derogatory names and belittled in many of the messages we receive. Not all, by any means, and I can’t imagine we’re unique in this.
That aside, it is still good to consume these reports. Sometimes people have a good understanding of what’s going wrong and submit a message akin to a thoroughly researched bug report, explaining the workarounds they’ve found, the exact time the issue started for them, or details of things they’ve tried with or without success. Others are less helpful, simply reporting things like “I wanted to let you know the radioplayer doesn’t work now” – we don’t know what doesn’t work, under what circumstances, or whether it has ever worked for this user. All the same, even that can be helpful if we see a number of similar reports around the same time.
And then there are the times when a sole user alerts us to something we’d missed. It’s something that, as a single report, might take a little while to filter through to the devs, but with the team’s eyes directly on it we’re able to recognise things that look odd and investigate further, and if there’s a real issue we can step up and turn around a fix very quickly. My colleague, our tech lead, Alex, wrote about one such instance at length in his own blog post recently: https://medium.com/@alexgisby/on-live-incidents-and-process-ee7a4d83c5f7
We spotted another issue recently whereby the pop-out player, the main interface on our website for listening to content on a desktop machine, had stopped autoplaying. Again, this was initially an issue we saw in the logs, and through our own testing and investigation we were able to confirm that it was not, as initially thought, a problem on Firefox for the PC only, but related to the very latest version of Firefox, which had been drip-fed into the wild during that week. We worked out where in the code the failure was and alerted the team, who were able to do something to mitigate it.
A full fix would come later, but in the meantime we were able to feed back to the audience-facing staff that for the majority of users there were workarounds: the two main ones being to use an alternative browser, or to press the play button after the window popped up. That might sound ridiculously obvious, but our own behaviour revealed that it wasn’t. Having the autoplay just work was such an entrenched idea for our own team that we sat there dumbfounded when it didn’t, and just stared at the “loading” message wondering how long it would be there. We discovered that a large number of people facing the issue were able to work out for themselves that pressing the play button worked, since there was no noticeable drop in the number of connections on that browser from before the rollout. But for us, and for the audience members who were regular users of the software, the immediate conclusion was “it’s broken”, and many wrote in to say so.
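A failure like this doesn’t have to leave users staring at a “loading” message; it can be detected in code. Here’s a minimal sketch (not our actual player code – the `ui` helper names are hypothetical) of how a player might fall back to showing the play button, using the fact that in modern browsers `HTMLMediaElement.play()` returns a promise that rejects when autoplay is not permitted:

```javascript
// Hypothetical sketch of handling blocked autoplay; not the real
// iPlayer Radio implementation.
function attemptAutoplay(mediaElement, ui) {
  const result = mediaElement.play();
  // Older browsers return undefined from play(); guard before chaining.
  if (result && typeof result.then === "function") {
    return result
      .then(() => ui.showPlayingState())
      // Autoplay was refused: surface the play button so the user
      // can start playback themselves instead of waiting forever.
      .catch(() => ui.showPlayButton());
  }
  // No promise returned: assume playback started as requested.
  ui.showPlayingState();
  return Promise.resolve();
}
```

In browsers that enforce an autoplay policy the rejection is typically a `NotAllowedError`, so the catch branch is exactly where the “press play” fallback belongs.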
This was another area where I intervened. From the logs I could see the standard response being sent to users after the workarounds were discovered, and I was a little stunned to note the order in which we offered them: the first was to use an alternative browser and the second was to press the play button. I pointed out that we ought to offer the path of least resistance first; downloading and installing new software, or even switching to software you have but don’t prefer, is a far more jarring experience than simply pressing play. The fix arrived before we got around to switching the message, so it wasn’t necessary in the end, but the point stands: our team is empowered to make decisions and champion them, from fixing bugs in code to optimising the audience experience outside of it.
One of the most rewarding parts of working at the BBC is to be able to step beyond the standard definition of your job role and get involved in other areas where you can see a way to add value. Keeping that finger on the pulse and direct line to the audience lets us enhance the quality of our product.