Infrastructuring Vision: An Examination of Automatic Alt Text on Facebook
In this commentary, I apply Susan Leigh Star and Geoffrey C. Bowker’s concept of infrastructure to Facebook’s automatic alt text tool. I discuss the tool in terms of its relational quality, the interplay between foreground and background, and the importance of standards and classification in providing blind and low-vision users full access to the platform’s content.
When a blind or low-vision user encounters an image on Facebook, the platform’s automatic alt text tool (AAT) uses a computer vision algorithm to process it and generate a description for a screen reader, which reads the description (alt text) aloud (Facebook AI, 2021). AAT is part of Facebook’s infrastructure, through which a network of different actors and objects interact to produce the descriptive text. Infrastructure, as Susan Leigh Star and Geoffrey C. Bowker describe it in their chapter “How to Infrastructure” (2006), “is that which runs ‘underneath’ actual structures.” They argue that new media infrastructure is relational, in that it mediates between people and tools, and that its design requires a “social and theoretical approach” (Star and Bowker, 2006). In this essay, I use examples from Facebook’s alt text infrastructure to illustrate three of the authors’ ideas: infrastructure’s relational quality, the interplay between foreground and background, and the importance of standards and classification.
Automatic alt text on Facebook
In the most basic terms, alt text is embedded text on a web interface that describes an image for those who can’t see it. In 2016, Facebook introduced AAT, which could detect 100 different concepts, such as “tree” or “outdoors” (Facebook AI, 2021). The company avoided more complex descriptions involving identity or adjectives, which tend to be context-dependent, because it feared that inaccuracies could lead to uncomfortable situations for blind users, who would have no way of knowing that the algorithm had misidentified an image’s content (Wu et al., 2017). The latest version of the tool can identify 1,200 concepts, but it still displays only the descriptions in which the system is highly confident (Facebook AI, 2021).
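To make the confidence filter concrete, the following is a minimal Python sketch of how a thresholded description might be assembled. It is not Facebook’s implementation. It assumes that the vision model has already assigned each concept a confidence score between 0 and 1, and the function name `generate_alt_text`, the 0.8 cut-off, and the “Image may contain:” phrasing are illustrative assumptions. The fallback to a bare “image” mirrors the behavior Glaser (2019) reports.

```python
from typing import Dict

# Hypothetical cut-off: the sources do not state the threshold Facebook uses.
CONFIDENCE_THRESHOLD = 0.8


def generate_alt_text(scores: Dict[str, float],
                      threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Build a description from per-concept scores (hypothetical helper).

    Only concepts whose score meets the threshold are mentioned, ordered
    from most to least confident.
    """
    confident = sorted(
        (concept for concept, score in scores.items() if score >= threshold),
        key=lambda concept: scores[concept],
        reverse=True,
    )
    if not confident:
        # Nothing clears the bar, so fall back to a bare label.
        return "image"
    # Assumed output phrasing, for illustration only.
    return "Image may contain: " + ", ".join(confident)


if __name__ == "__main__":
    example_scores = {"tree": 0.97, "outdoors": 0.91, "person": 0.42}
    print(generate_alt_text(example_scores))  # Image may contain: tree, outdoors
```

The sketch makes visible the trade-off Wu et al. (2017) describe. Raising the threshold lowers the risk of a confidently wrong description, but it also leaves more images described only as “image.”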
The tool has drawn criticism from activists, artists, journalists, and scholars for the quality of information it produces. Journalist April Glaser (2019) spoke with blind activists who described Facebook’s accessibility features as so persistently buggy that blind users have come to expect failures as part of the platform’s environment. In their 2021 paper, scholars Margot Hanley, Solon Barocas, Karen Levy, Shiri Azenkot, and Helen Nissenbaum examine the values that AAT inscribes through the policies Facebook’s team develops, and how those values might conflict with the blind community’s needs. Disabled artists and activists Shannon Finnegan and Bojana Coklyat’s project “Alt Text as Poetry” (2020) guides participants through creative ways of translating images into words as an alternative to automatic alt text, with the goal of increasing access for blind and low-vision users.
Interdependent actors and objects
To contextualize these conversations, it is useful to return to the underlying infrastructure of Facebook’s alt text. Star and Bowker (2006) emphasize that infrastructure is relational: “It never stands apart from the people who design, maintain and use it.” A network of actors makes up the alt text infrastructure on Facebook, including the user who uploads the photo, the computer vision algorithm and its designers, screen readers, and blind and low-vision users. Blind and low-vision users depend on screen readers to access image descriptions. Likewise, the screen reader can convey an image only because the computer vision algorithm has described it, unless the uploader writes their own description, in which case the uploader steps in for the algorithm. The way AAT describes an image is a direct result of the designers’ decisions about which concepts the algorithm can and should detect. Each component relies on the others to bring the infrastructure to life.
Complicating foreground and background
What is “underneath” (Star and Bowker, 2006) Facebook’s alt text infrastructure changes depending on who is interacting with it. As Star and Bowker (2006) explain, “Analyzing infrastructure means problematizing the relationship between background and foreground;” infrastructure typically becomes visible when it breaks. For a sighted user, the image is the content, the surface, while alt text is infrastructure: it stays embedded beneath the image until the image fails to load. That can happen because of bandwidth issues, or, as in 2019, when Facebook’s platform went down and users couldn’t upload or view photos, revealing alt text to sighted users at scale for the first time (Hanley et al., 2021). For the blind user, by contrast, alt text is the content they engage with, and the photo is relegated to infrastructure. For the blind community, a breakdown in the infrastructure means no content at all. This happens when Facebook’s algorithm reaches the limit of what it can describe with confidence and writes only “image” (Glaser, 2019).
Standards and classifications, values and access
An integral and often overlooked part of infrastructure, Star and Bowker (2006) argue, is that each layer requires standardization and classification. Facebook’s designers developed a classification system for their computer vision algorithm, which produced the original 100 (and now 1,200) concepts that it can detect (Wu et al., 2017). On a larger scale, the Americans with Disabilities Act (ADA) set forth standards prohibiting discrimination against disabled individuals in a variety of settings (Accessibility.Works, 2021), but it lacks a technical legal framework that tells internet companies exactly how to make their website infrastructure accessible (Reid, 2019). In the absence of such a framework, the W3C’s Web Accessibility Initiative (2008) has published guidelines, but these function as recommendations rather than requirements. Without a binding standard, blind users have access to different levels of information within Facebook’s infrastructure, as well as across different social media platforms. Standards and classifications organize not only the software and hardware but also the way people interact with the platform (Star and Bowker, 2006). While blind users must rely on screen readers, sighted users rarely add alt text, because it is not required when uploading an image.
While this blog post only scratches the surface of Facebook’s alt text infrastructure and how it embodies Star and Bowker’s (2006) arguments, it is clear that infrastructure, and this digital one in particular, is a delicate network of relationships between users, designers, and technical tools. As Star and Bowker (2006) conclude, “a social and theoretical understanding of infrastructure is key to the design of new media applications in our highly networked, information convergent society.” What is particularly striking about Facebook’s alt text infrastructure is that it makes explicit how blind users have less access to information on the platform than their sighted counterparts, specifically because AAT is limited in the concepts it can use in its descriptions. Disabled users of all kinds, not just those who are blind or low-vision, deserve equal access to our digital information infrastructures. Achieving that access raises its own challenges, however: who decides what information is valuable, and how should it be communicated?
Accessibility.Works. “2021 ADA Website Requirements & WCAG Compliance Standards for Websites.” Accessed September 22, 2021. https://www.accessibility.works/blog/2021-ada-wcag-website-accessibility-standards-requirements/.
About Facebook. “Using AI to Improve Photo Descriptions for People Who Are Blind and Visually Impaired,” January 19, 2021. https://about.fb.com/news/2021/01/using-ai-to-improve-photo-descriptions-for-blind-and-visually-impaired-people/.
Coklyat, Bojana, and Shannon Finnegan. “Alt Text as Poetry.” Accessed October 1, 2021. https://alt-text-as-poetry.net.
Facebook AI. “How Facebook Is Using AI to Improve Photo Descriptions for People Who Are Blind or Visually Impaired,” January 19, 2021. https://ai.facebook.com/blog/how-facebook-is-using-ai-to-improve-photo-descriptions-for-people-who-are-blind-or-visually-impaired/.
Facebook Engineering. “Under the Hood: Building Accessibility Tools for the Visually Impaired on Facebook,” April 5, 2016. https://engineering.fb.com/2016/04/04/ios/under-the-hood-building-accessibility-tools-for-the-visually-impaired-on-facebook/.
Glaser, April. “When Things Go Wrong for Blind Users on Facebook, They Go Really Wrong.” Slate, November 20, 2019. https://slate.com/technology/2019/11/facebook-blind-users-no-accessibility.html.
Hanley, Margot, Solon Barocas, Karen Levy, Shiri Azenkot, and Helen Nissenbaum. “Computer Vision and Conflicting Values: Describing People with Automated Alt Text.” Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, July 21, 2021, 543–54. https://doi.org/10.1145/3461702.3462620.
W3C Web Accessibility Initiative (WAI). “Web Content Accessibility Guidelines (WCAG) Overview.” Web Accessibility Initiative (WAI). Accessed September 22, 2021. https://www.w3.org/WAI/standards-guidelines/wcag/.
Reid, Blake E. “Internet Architecture and Disability.” SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, February 17, 2019. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3338589.
Star, Susan Leigh, and Geoffrey C. Bowker. “How to Infrastructure.” In Handbook of New Media: Social Shaping and Social Consequences of ICTs, edited by Leah A. Lievrouw and Sonia Livingstone. Updated student ed. London: SAGE, 2006.
Wu, Shaomei, Jeffrey Wieland, Omid Farivar, and Julie Schiller. “Automatic Alt-Text: Computer-Generated Image Descriptions for Blind Users on a Social Network Service.” In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, 1180–1192. CSCW ’17. New York, NY, USA: Association for Computing Machinery, 2017. https://doi.org/10.1145/2998181.2998364.