Machine Listening

Listening with the pandemic

“When the sounds of the pandemic recede, how will our hearing be changed?”
Shannon Mattern¹

The pandemic is not an intermission, it’s an opportunity

The water behind a dam holds an immense amount of potential energy. A crack might become a hole might become a break. If legal regulation, public opinion or under-investment has been the dam holding machine listening back from becoming truly pervasive, then SARS-CoV-2 has weakened that structure throughout. Every voice industry startup and tech juggernaut, every streaming platform, every military-funded machine diagnostics lab, every automated care industry device manufacturer has been ready and waiting for just this moment to spring into action with a quick technological fix that just so happens to bolster their position. From the perspective of the machine listening industry, the pandemic is a dream come true. It gets to do what it was always going to do anyway, only blanketed now in the twin auras of inevitability and social good.

According to one team of researchers from the UK, Germany, Japan and China, machine listening is not just “ready for implementation” but “urgently necessary” in the “fight against COVID-19”. In addition to audio diagnostic and “pre-diagnostic” tools, which would aim to diagnose the virus from a person’s voice before symptoms had otherwise manifested, they cite as possible use-cases: “automatic recognition of deceptive speech when people are questioned about their recent contacts or whereabouts”; automatic monitoring of “telephone or other spoken conversations”, which “together with GPS coordinates from smart phones… could establish real-time spread maps”; acoustic monitoring of “public obedience and discipline in social-distancing”, and in hospital settings, of patients’ “emotions, eating habits, fatigue, or pain, etc.” Finally, “public spaces could be empowered by AI that detects potentially risky settings, which are over-crowded, under-spaced in terms of distance between individuals, and spot potentially COVID-19 affected subjects among the crowd, and whether these and others are wearing a protective mask while speaking.”²

If this sounds like a blueprint for intensified surveillance, for even greater capture and control of our sonic worlds, a true panacousticism,³ that is apparently not the authors' problem. Important ethical questions exist, they write, which unfortunately “cannot be addressed”. They don’t say why.

Thoughtlessness

One US company hoping to capitalise on the pandemic sells voice analytic technologies, which it says can “vet for fraud, security, and safety risks” with greater than 94% accuracy. All this based on a 2–10 minute phone call, in which it isn’t what you say that matters, but how. Your voice, it is presumed, will betray you. Representation not only can but should be bypassed. Before the pandemic, this company’s products were already available in 13 languages across 12 countries and 23 industries, including to government and military contractors. Today, they also offer “automated telephonic vocal risk assessment” for the determination of fraud in allocating Covid-related welfare and stimulus packages. How the system works, what precisely constitutes “vocal risk” and why, is never explained. It is, after all, proprietary.

The imaginary that underpins this kind of technology does not emerge with machine listening. In addition to the obvious economic incentives, it is directly related to the history of the stethoscope in medicine,⁴ along with lie detection and personality profiling in security and policing.⁵ But vocal risk analysis also extends and modifies these practices insofar as it follows on from machine listening’s original wager: that silicon ears might discern what meat ears⁶ never could; that there is a layer of auditory truth beneath or beyond the threshold of human hearing, and that this can be accessed only by machinic systems whose workings, in many cases, cannot be reverse engineered or explained to those same human ears.

There is a profound and ramifying “thoughtlessness” here: at once ethical, political, and epistemic.⁷ Once you start down the road that machines might directly audit reality, where to get off? A computational physiognomy of voice becomes so much easier to imagine, sell, and embed.

Touchlessness

Most of us will experience machine listening as an interface. Say goodbye to spring-mounted keys and clicking mice, maybe soon even the quiet tap of fingers on capacitive glass. “Alexa,” we command - or is it ask? - into an airy, expectant atmosphere. Touchlessness⁸ refers first to this invisibility of interface, but it is also “social distancing”, remote work, standing no less than 2m apart in a queue for toilet paper, and, in the case of corona voice diagnostics, the idea that computational systems might determine the presence of the virus from the sound of a person’s speech or cough (touchlessness is part of a history of stethoscopy and mediate auscultation too).

There’s no evidence yet that such a thing is possible, but many organisations are trying and they are thirsty for data. “Donate your voice.” “Hit record and read the following sentences while pinching your nose.” “Press record and cough three times.”

Touchless covid diagnosis would make life and labor much safer for primary care workers. It would also be destined for automation and embedding into existing audio systems like telehealth, the smart city, and the smart speakers increasingly found at patients’ bedsides. During a pandemic, where underequipped hospitals, testing centers, workplaces and urban centers are vectors of virus transmission, touchlessness becomes a hygienic imperative as well as an economic one. In a world in which we increasingly understand the air itself as toxic, touchlessness tends towards breathlessness too. After all, smart assistants don’t breathe.

A world of pure touchlessness is a world in which every breath becomes an examination, every word an interrogation. Do you have COVID? Have you been displaying psychotic tendencies? Are you happy or sad? Local or foreign? Were you really in quarantine for the last 14 days? How will we determine what questions can be asked of a voice? Who will decide? And with what degree of scrutiny? What will become of the freedom of speech itself?

Wakewordlessness

Wakewords were never going to last. They were always a Trojan horse, designed to inveigle voice assistants into our homes and machine listening into our daily existence via a fantasy of consent. Their tendency is to disappear. In the future, machine listening will be wakewordless. Much of it is already. The smart city in particular is always listening, though the distribution of this listening is heavily stratified by, for instance, race and class. Urban gunshot detection systems like ShotSpotter,⁹ and the microphones embedded in CCTV cameras and street lights, never sleep. Only personal devices retain the pretense. And not for long.

In response to the pandemic, the latest OS update to Apple Watch will include a feature that uses “machine-learning models to determine motion, which appears to be hand-washing, and then use audio to confirm the sound of running water or squishing soap in your hands.” All this with a view to helping you “keep going for the amount of time recommended by global health organisations.”

Here is a future in which the wake word becomes something we do, a situation, environment or atmosphere. Our bodies will consent on our behalf. Maybe one day soon, it will be our tone of voice, or a cough. As wakewords wane, so the qualities of wakefulness expand. What it means for a microphone to be ‘on’ grows by the day. Already, my headphones know where I am listening, and adjust how I listen accordingly. Though Google Assistant is built in, I never have to ask. And if I want to stop and talk to someone, they pause automatically to let in ambient sound. That way I never have to take them off.

Ambient assisted living

Of the many things the pandemic has clarified, one is that aged care homes are a microcosm worth paying attention to. By August 2020, 68% of Australia’s Covid-related deaths were residents of aged care. In the UK, aged care residents are dying at three times the “normal rate”. No surprise then that aged care is also a laboratory for smart homes and cities.

In Google Home patents, elderly relatives are monitored like Tamagotchi pets. Systems send out suggestive prompts at opportune moments, reminding you to “Give your mother a call.” Across the spectrum of this technological imaginary, from nursing homes to nurseries, care is automated, accident averted, and human touch is always the last resort. When ambient assisted living is generalized, it slips out of the retirement village like a fog: it is the becoming retirement village of the world. Or as Amazon now calls it: Alexa for Residential.

Big tech knows you can get away with things in assisted living facilities that you couldn’t in the city outside. Or not yet anyway. Nursing homes and their residents are already socially isolated, already underfunded, already sites of exploitation and abuse: ready and waiting for a magic bullet offered out of the goodness of some billionaire’s heart. This is a context in which companies can proclaim in all seriousness that “Continuous monitoring offers greater privacy”, where automated care and the total surveillance it entails justifies the absence of human care as a new feature: greater privacy.

For such an ambient sensing environment to work, this very environment must be designed and shaped with embedded cameras and microphones in mind. Every room becomes a studio. Background noise must be minimized to make objects and sounds a little more legible. And we know that such environmental design doesn’t stop at objects and spaces: it reshapes our own patterns of speaking and living as we learn to enunciate with a cadence, accent and tone that an algorithm can understand.

Resources

Shannon Mattern, ‘Urban Auscultation; or, Perceiving the Action of the Heart’, Places (2020)
Schuller et al., COVID-19 and Computer Audition: An Overview on What Speech & Sound Analysis Could Contribute in the SARS-CoV-2 Corona Crisis (2020)
Grant Vetter, The Architecture of Control (Zero, 2012)
Jonathan Sterne, “Mediate Auscultation, the Stethoscope and the ‘Autopsy of the Living’: Medicine’s Acoustic Culture,” Journal of Medical Humanities 22, no. 2 (June 2001): 115–36; Tom Rice, “Learning to Listen: Auscultation and the Transmission of Auditory Knowledge,” Journal of the Royal Anthropological Institute 16, no. 1 (2010): 41–61.
Ken Alder, “A Social History of Untruth: Lie Detection and Trust in Twentieth-Century America,” Representations 80, no. 1 (November 1, 2002): 1–33, https://doi.org/10.1525/rep.2002.80.1.1.
McKenzie Wark, Capital is Dead: Is this something worse? (Verso, 2019)
(reference not found)
Interview with Mark Andrejevic recorded on 21 August, 2020
Interview with Lawrence Abu Hamdan recorded on 13 September, 2020, publication forthcoming 2021. See also Abu Hamdan, “H[gun shot]ow c[gun shot]an I f[gun shot]orget?” (2016)