Dear Readers: My History with Aural Texts, Part 4

Georgina Kleege, UC-Berkeley

Portrait of Georgina Kleege.

Picture shows Kleege, slightly tilted to the left and slightly facing right. She has silver hair worn in bangs and cut to just below her ears. She wears a black, collarless shirt. [end of description]

By the time I began my teaching career, I was able to request a reading assistant as a workplace accommodation under the Americans with Disabilities Act. Fortunately, in academia, it is common for faculty to employ graduate students as research assistants, so it has never been hard to convince my employers to provide this assistance. For approximately the first ten years of teaching, I required students in my writing classes to record themselves reading their work out loud to me. This had the double benefit of reserving time for my paid reading assistants to perform other reading tasks that I needed, while giving the students a useful editing technique. Many told me that reading their work out loud made them notice things they would not have observed otherwise. In addition to these virtues, listening to students read their work gave me unexpected insights. I learned to listen for slight changes in tone and pace that signaled passages that gave them trouble or made them proud, and this allowed me to tailor my comments accordingly.

I have learned many things from my reading assistants over the years. For instance, I learned from one assistant, who was from South Africa and had the corresponding accent, that William Faulkner’s prose has the power to make any reader adopt the cadences of North Mississippi dialects.

The last time I used human readers extensively was when I was writing my book about Helen Keller. I wanted to have all her written work read by a single voice. The reader I hired both contributed to my ambivalence about Keller’s personality and complicated it. Keller’s writing style is characterized by rather elaborate and flowery diction. Though she was a twentieth-century writer, her style was a throwback to earlier literary tastes, but I think she also never met a floral metaphor she didn’t like. In addition to this flowery prose style, even when describing negative experiences, Keller maintains an unrelenting cheerfulness, which can be hard to take. When I started listening to my assistant’s readings of Keller’s prose, I heard a slight mocking irony in her tone. Since this coincided with my own attitude toward Keller, I wasn’t bothered by it. When I questioned her about it, she admitted that Keller reminded her of her sister, a sister with whom she did not get along. My assistant’s voice, colored with her tone of sibling mockery, became intertwined with Keller’s words, and led me to consider the ways Keller deliberately represented herself to conform to the tastes and prejudices of her times.

Despite the many virtues of human readers, since the late 1990s I have completely converted from recordings of live readers to synthesized text-to-speech (TTS) technologies, on my computer and on a range of portable devices, including my phone. I have come to prefer the consistency of the synthesized voice. What it lacks in affect and pronunciation, it makes up for in tirelessness and absolute malleability. It reads what I want, when I want it. It will read and reread endlessly, without complaint or quaver. While human readers get tired, bored, or sick and sometimes add unwanted commentary to texts simply by intonation, the synthesized voice reads everything the same way. There are things I can demand of a synthesized voice that no human reader would tolerate and no analogue playback device would allow. It makes a valiant effort to read whatever text I throw at it, and will persevere until I tell it to stop. It will read the same paragraph or sentence over and over again, at different rates of speed, sometimes word by word, occasionally even letter by letter. It does not need breaks, get sick, or go on vacation. It also does not have personal tastes that can get in the way of what I’m reading. It does not get bored, embarrassed or offended. These virtues are often described by the manufacturers of TTS technology as efficiency.

Recently I was sampling a number of different voices that are available for download to the reading app on my phone. The voices each read a brief text so consumers can comparison shop. The voices are gendered, or at least come in a range of pitches from soprano to bass, and are identified with gendered names. There are even voices identified as child-like, presumably to be more appealing to children. The voices come with national accents: British, American, Australian, and Indian English; Castilian and North American Spanish; and so forth. A typical sample text described the voice as, “Efficient, fast and of very high quality, why not try me out with your own words?” Another claimed that it would “…make your leisure reading more efficient and enjoyable.” A few other voice samples claimed to have a wider emotional range. One opined pluckily, “I have good news, you can try me out.” And another, named Will Happy, said, “I can read text that you want to hear in a cheerful voice.” There is a British male voice named Peter that comes in both happy and sad versions. I admit that I have some trouble telling the versions apart, and I am not sure when or why I would choose a voice to impose this kind of emotional layer on my reading.

Although I appreciate the principle of choice, I seem always to opt for the default voice, even though the newer versions are slightly more humanoid. The voice on my computer is a medium tenor named Reed. This passes for wit in the world of TTS technology. Reed does all the reading. The name may also be an acknowledgment that “read” is one of those words that can be pronounced differently depending on the tense, but the screen reader cannot always infer the tense from the context. So when I read this sentence now, it pronounces the verb in the past tense when I intend it to be in the present.

Other readers may be happy with all these choices. A friend of mine chooses the voice to coincide with the gender of the author. Recently she reported some consternation when the male voice reading a novel by a male author encountered a first-person narrator who was female. This opens up the possibility of all sorts of gender-bending reading against the grain. What I value about the synthesized voice is not so much efficiency as an endearing earnestness. It’s hard not to anthropomorphize. I know that the computer’s voice sometimes narrates my dreams. But it is so ubiquitous, so constantly in my ears, reading everything from email, to all manner of websites, to student papers and my own writing, that my mind erases its influence. It has become aurally transparent. I am no more conscious of it than a sighted reader is of the font the text is printed in.

I’d like to think that the rise in popularity of recorded books for the general public since the 1990s may have normalized blind people’s aural reading practices. Now, I’m hoping that the ubiquity of synthesized speech in quotidian technologies will continue this work toward greater social inclusion. For instance, the marketing of Siri, the iPhone’s text-to-speech and speech-to-text interface, touts the technology as allowing users to read and write eyes- and hands-free. Advertisements showed people running, driving, cooking and engaged in other tasks while commanding Siri to read emails, send texts, and look up information on the web. Some early advertisements featured blind people among all the other users, in a rare characterization of a blind person as a consumer of new technology rather than an object of charity. Public transportation systems around the world feature synthesized voice announcements. There are increasing numbers of household appliances that have some kind of synthesized voice interface (televisions, thermostats, ovens, scales) so that users, sighted and blind alike, can interact eyes- and hands-free. So people are becoming more familiar with synthesized voices, which makes TTS technology designed specifically for blind people seem less unusual.

Synthesized voice technologies are not only ubiquitous but are more and more likely to already be loaded into the device’s software package, as with the iPhone. Not everyone is aware of this, of course, which is why I so often find myself enlisted to show off these features to strangers, such as the young man at the airport. Now that I read student work with my screen reader, my students often comment on my ability to notice proofreading errors. I point out that the synthesized voice reads word for word, while visual reading, where the eye jumps in saccades, makes it too easy to engage in wishful seeing, missing small errors and omissions. I encourage them to turn on the screen reader that’s probably already built into their computer, and try it for themselves.

For anyone who wants to hear the iPhone’s built-in screen reader, here’s what to do. Tap the round “home” button at the bottom of the screen three times fast. You will hear, “voiceover on.” Touch any icon on the screen and hear its name. Tap twice to open the app. For instance, open your email. Tap twice on any message. When it opens, swipe down with two fingers to hear the text read. To turn off the voice, triple tap the home button again. Additional instructions can be found in the accessibility section of the general settings, along with other features identified as aids to vision, hearing, learning and physical/motor issues.

The more mainstream consumers find uses for these features, the more blind people benefit. Newly blind adults may be less dependent on social service agencies as they adapt to their new condition, using the electronic devices they already own in different ways. Manufacturers will be less likely to discontinue these features if they assess that large numbers of general users find them valuable. I harbor no illusions about technology corporations’ goals or priorities. The first computer I owned was an Apple Macintosh. Unlike other personal computers in the 1980s, it made it easy to enlarge the print on the screen. There was also rudimentary screen reader software which worked quite well. Unfortunately, this was discontinued in the early 1990s. When I called to ask why, the person on the phone explained that visually impaired users were not numerous enough to be a viable market. The fact that the Apple Corporation now includes screen readers in all its products does not make me less wary of some future policy change dictated by bottom-line considerations.

But for now, I remain optimistic. I have lived long enough to perceive progress. Technology exists to allow me access to whatever reading material I want, any time I want it. My life has become streamlined as I abandon all the specialized equipment and do all my reading on my phone. For the first time in my life, blind people stand on the threshold of complete access to information on a par with the sighted majority. As print newspapers, magazines, and books are digitized and become available online, blind people benefit, but only if the digital versions of texts are compatible with the reading technologies that blind people use. For instance, early versions of Google Books were produced as image rather than text files, which made them inaccessible to screen readers. Incidentally, a text file is also easier for anyone to navigate using word search functions, so there are many good reasons to produce digital texts this way. There are many who dread the digitization of the world’s libraries, who cannot imagine curling up with a computer screen rather than with a traditional book, who shudder at the sound of a synthetic voice, but my history tells me that human beings can and do adapt to all sorts of unlikely situations. The point of reading is the transmission of ideas from one mind to another, and that can happen through a variety of modalities and technologies.