Who controls your voice-controlled world?

On: October 24, 2017
One of the latest trends in the tech world is the development of the smart digital voice assistant. It was first popularized by Apple’s Siri on the iPhone, and now companies like Google, Amazon and Microsoft are developing software and hardware in this category. The assistant is no longer confined to your personal phone, however, but is becoming omnipresent throughout the home.

Smart speakers want to control your home. Illustration: Ståle Grut/Breather (Unsplash Licence)

The smart speaker, powered by artificial intelligence and presented as the newest household commodity, has become an integral step in the digitalization of one’s direct environment. One of the leading examples in this category, Amazon’s voice assistant Alexa, claims to make ‘your life easier by letting you voice-control your world’. Amazon seeks to automate many of the menial tasks typically found around the home, as well as to enable these tasks to interact with each other. These new digital assistance products have gained a lot of media attention due to their potential for optimizing the user’s home environment, which is often referred to as ‘home automation’. After reading the many reviews and speculations around this new product category, we were left with questions about what this digitalization actually entails, since it brings forth a myriad of unforeseen social, economic and political consequences.

In our research, we will focus on the voice assistant Alexa. The most important reason is the estimate that Amazon will control 70% of the voice-enabled device market by the end of 2017 (Perez). This suggests that its technology, development and sales are among the most advanced. Amazon has integrated Alexa into a range of different products, all of which can be operated by the user’s voice.

According to Morgan Stanley, Amazon has sold over 11 million Alexa-enabled devices in total (González), a clear indication that voice operation is a new trend. The scope of our research also incorporates the broader systems Alexa is embedded in. Among other things, this refers to the institutions and governments that are potentially interested in data collected by the voice assistant (Raj and Raman 31). A lot of research has been done on the privacy implications of using voice assistants. However, we want to look past the privacy debate and focus on the implications of control for the user, the producer, Alexa and their environments. This leads us to the question:

To what extent does the user gain control over their environment by digitalizing it with a voice operated home automation system?

To answer this research question, we will first look at how Alexa works technically. This approach will lead us to first contextualise Alexa within the Internet of Things, which allows us to understand Alexa not only as a platform, but as a self-contained cybernetic system tied into Amazon’s systems. Once Alexa as a platform and cybernetic system is explored, we will have a better understanding of the notion of its control. We will use different audio and video examples to support our findings and conclude with a final statement.

Unboxing the black box

Amazon gives developers the opportunity to implement Alexa in two ways: by integrating it into any device that has a microphone and a speaker via Alexa Voice Service, or by adding new capabilities to an Alexa device through the Alexa Skills Kit. Reading through the developer design guidelines (Amazon Developer Services) makes it clear that Alexa’s skills are a grammatization of human behaviour (Agre). Possible actions are atomized into units and arranged into sequences that Alexa can execute. Alexa currently has 20,000 available skills, which we have categorized into four basic affordance types (Gibson):

  • a/ providing convenient access to information
  • b/ mediating interaction with third-party services (such as Uber, Domino’s etc.)
  • c/ simplifying everyday tasks (notes, reminders, home automation)
  • d/ providing entertainment

During setup, Alexa asks for your location, time zone and measurement units (Alexa App), so personalization is an option, but only to a certain extent. A problem arises, however, because most users don’t know how to make full use of the product and its convoluted skill set. In a recent US study, 72 percent of those surveyed agreed with the statement that “they don’t know enough about their Smart Speaker to use all its features” (NPR and Edison).

This points to the device’s complicating lack of interface and the difficulty it has conveying its potential to its users. Looking beyond how people work with Alexa, we wanted to investigate how the device itself works. To ‘unbox the black box’ we made use of an Alexa skill testing tool for developers called Echosim. This website provides a browser version of Alexa with a console mode that makes visible the actual code that Alexa is processing during any request.

The question we posed was:

Alexa, what’s the weather in Amsterdam?

The first thing the console shows is that the device itself is not able to answer this kind of question. It sends all information to the “cloud”, where Alexa Voice Service is located.

This video provides a brief explanation of the process that you can read about in full detail below.

Outlined below are the steps that the device takes to answer the question (Amazon Developer):

  1. Activation: by the wake word “Alexa”
  2. Streaming the captured audio to the cloud-based Alexa Voice Service
  3. Automatic speech recognition converts the speech into text
  4. Natural Language Understanding converts the text into intents that Alexa can act on
  5. In this request, the task is to access the weather data and convert the results back into lifelike speech
  6. The final step is for Alexa to send a directive to the device instructing it to play the created audio file
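The six steps above can be sketched as a toy pipeline. This is only an illustration of the sequence, not Amazon's implementation: every function name, the `play_audio` directive and the weather data are hypothetical, and the "audio" is simulated with plain text so the example runs without a microphone or a network.

```python
import re
from dataclasses import dataclass, field

WAKE_WORD = "alexa"  # step 1: the device only reacts after this word

# Invented sample data standing in for a real weather service.
FAKE_WEATHER = {"Amsterdam": "cloudy, 12 degrees"}


@dataclass
class Intent:
    """A structured request the assistant can act on (hypothetical shape)."""
    name: str
    slots: dict = field(default_factory=dict)


def heard_wake_word(utterance: str) -> bool:
    # Step 1: activation by the wake word.
    return utterance.strip().lower().startswith(WAKE_WORD)


def speech_to_text(audio: str) -> str:
    # Steps 2-3: in reality the audio is streamed to the cloud and
    # transcribed there; here the "audio" is already text.
    return audio.strip().lower()


def understand(text: str) -> Intent:
    # Step 4: turn free text into an intent with slots.
    match = re.search(r"weather in ([a-z ]+)", text)
    if match:
        return Intent("GetWeather", {"city": match.group(1).strip().title()})
    return Intent("Unknown")


def fulfil(intent: Intent) -> str:
    # Step 5 (first half): look up the data needed for the answer.
    if intent.name == "GetWeather":
        city = intent.slots["city"]
        report = FAKE_WEATHER.get(city)
        if report is None:
            return f"I have no weather data for {city}."
        return f"The weather in {city} is {report}."
    return "Sorry, I don't know that one."


def text_to_speech(sentence: str) -> bytes:
    # Step 5 (second half): stand-in for speech synthesis.
    return sentence.encode("utf-8")


def handle(utterance: str):
    """Run the whole chain; returns None if the wake word was not heard."""
    if not heard_wake_word(utterance):
        return None
    intent = understand(speech_to_text(utterance))
    audio = text_to_speech(fulfil(intent))
    # Step 6: a directive telling the device to play the audio.
    return {"type": "play_audio", "audio": audio}


if __name__ == "__main__":
    print(handle("Alexa, what's the weather in Amsterdam?"))
```

Note that in this sketch only `heard_wake_word` would plausibly run on the device itself. Everything else stands for work done in the cloud, which is the dependency the next paragraph refers to.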

This process shows how Alexa itself is only as capable as the network it is a part of. Moving away from the technical aspects of Alexa, we will now focus on Alexa’s integration and role in the home environment with the accompanying advantages and disadvantages to the user.

Who is in control?

Alexa and the Internet of Things

Alexa is marketed by showing how voice input is more convenient than traditional ways of interacting with digital products, like typing or touching. Hands-free operation allows for easy completion of a wide range of actions. By connecting Alexa to other smart devices, a simple task like turning on the lights can be made easier but at the same time infinitely more complex. Instead of flipping each individual light switch, you can adjust the lights in the whole home by speaking to Alexa. A ‘morning mode’ or ‘TV mode’ that involves multiple light sources, and possibly other devices such as a thermostat, is an example of this. According to Bratton, many successful platforms achieve quicker generative entrenchment because they incorporate existing systems and add value for their users instead of introducing new systems (50). While most electronic appliances in a home are connected through a power grid, the grid barely, if at all, enables remote control or any interaction between the devices. Alexa can therefore be seen as a forerunner of the change from power grid systems to online cloud systems.
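To make the ‘morning mode’ idea concrete, a scene can be thought of as one spoken name that expands into several device commands. The sketch below is a simplified assumption of how such a grouping might look. The scene contents, device names and the `run_scene` helper are all invented for illustration and do not reflect any actual Alexa API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """One instruction for one device (hypothetical shape)."""
    device: str
    setting: str
    value: object


# Invented example scenes: a single phrase mapped to many commands.
SCENES = {
    "morning mode": [
        Command("bedroom lamp", "brightness", 60),
        Command("kitchen lights", "power", "on"),
        Command("thermostat", "target_celsius", 20),
    ],
    "tv mode": [
        Command("living room lights", "brightness", 20),
        Command("tv", "power", "on"),
    ],
}


def run_scene(name: str, home: dict) -> list:
    """Apply every command of a scene to the simulated home state.

    Returns the list of commands issued; an unknown scene issues none.
    """
    commands = SCENES.get(name.strip().lower(), [])
    for command in commands:
        home.setdefault(command.device, {})[command.setting] = command.value
    return commands


if __name__ == "__main__":
    home_state = {}
    run_scene("Morning mode", home_state)
    print(home_state)
```

The user speaks one phrase, but three devices change state. The convenience is real, and so is the added layer between the user and each individual switch.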

By connecting a sensor or device to the internet, it is possible to access its data remotely and control it in real time. Combining this with the structure provided by the web gives rise to new synergistic services that go beyond the services offered by any isolated system (Kopetz). This is popularly referred to as the ‘Internet of Things’ (IoT), which enables a wide range of interaction between different devices. Samsung says all of its devices will be connected to the internet by 2020, everything from washing machines to office chairs (Murphy). However, this kind of infrastructure is also vulnerable. Just as quickly as physical infrastructures are built, they become susceptible to attack, sabotage or destruction (Parks 364). For instance, inaudible voice commands can trigger a voice-enabled device at any time, bypassing human perception altogether. TV shows and TV advertisements have already triggered devices like Alexa and Google Home, even causing them to place orders for an item (Liptak).

Despite security concerns, the goal for most tech companies is to turn your entire home ‘smart’. The more devices you connect to the cloud, the smarter your home will be, or so they claim. This creates a home environment where anything can be remotely controlled, monitored and turned into a data feed. Companies like Amazon compete aggressively to be the one controlling the user’s home environment (O’Donnell). The position of mediator is potentially a lucrative one that can sway the user into investing more in the company’s ecosystem (Plantin et al. 9).

Alexa as a platform

Shifting different products into a single ecosystem leads us to the notion of platforms. As stated earlier, Alexa claims to make your life easier by letting you voice-control your world. By offering skills whilst the user remains autonomous in decision-making, Alexa can be seen as providing a choice architecture. Moreover, Alexa adds value to the user’s actions (Bratton 50) by ‘suggesting’ possibilities to users in highly personalised ways (Yeung 121). An example of this is the ‘morning mode’ or ‘TV mode’ lighting mentioned earlier. In this sense, control seems to lie in the hands of the user. Yet Alexa relies on opaque software algorithms that are largely, if not completely, unknown to its users while influencing their behaviour. Alexa thereby actively shapes its users and their environment (Yeung 129-30). As it gathers user data, standardizes users’ actions and modifies them via recursive feedback loops, the question of who is in control becomes increasingly vague.

What Alexa affords also limits the user and standardizes their actions. As Amazon presents Alexa as a platform, it stimulates independent developers to benefit from its market share. However, as accessories, software and devices from different companies are often incompatible with one another, users are persuaded into choosing one provider and staying in its “walled garden”, which is hard to escape from. In this way Amazon’s pursuit to directly influence the user’s home environment seems to fade away into the background.

According to Plantin et al., the corporation’s goals reside only in its technical properties, made visible via its terms and conditions (6). When looking at Alexa’s Terms of Use, the user is constantly redirected to the Amazon.com Privacy Notice and Amazon.com Conditions of Use. This ties Alexa into bigger “standards-based technical-economic systems” distributing decentralised interfaces while centralising control (Bratton 42) and incorporating Amazon’s business goals. What Alexa actually does then becomes even murkier, and the notion of who is in control shifts further.

According to Gillespie, the growth and opaqueness of these platforms creates pressures that “mount to strike a different balance between safe and controversial, between socially and financially valuable, between niche and wide appeal” (359). For instance, what the technology and its terms afford or hide actively shapes the public discourse. This becomes painfully clear in the example presented by NYU professor Scott Galloway, who shows how Amazon exploits Alexa’s lack of interface by presenting only a limited set of products, at higher prices. Thus, the disappearance of the interface results in pre-set possibilities which are not to the user’s advantage.

The products work in Amazon’s favour and the notion of control resides in steering the user. Although some control can be retained by the user in taking a step back to Amazon’s original web shop, these “adjustment[s] of both the standard-setting and behaviour modification phases” shed new light on Alexa as a self-contained cybernetic system (Yeung 122) which transmits a form of control that enables, but most definitely shapes.

Alexa as a cybernetic system

Digitalization of the home through a voice-controlled automation device ties neatly into systems theory because it is the systemization of the house. By modulating all the different processes involved in housekeeping into an interacting platform, a new system is born. This allows the user to link formerly independent actions into a chain of effects that can be adjusted and optimized to their wishes. The Internet of Things in the user’s house takes them one step further away from executing these tasks themselves and towards a controlling and surveilling position within the system. The user is no longer directly responsible for the execution of these tasks, but instead manages the overarching endeavours and all their intricacies.

The system that you set up in your private home is not the only way the home automation device is of relevance to the field of cybernetics. Once you have modulated all the processes controlling your house and have made them digitally accessible, this system can be linked up to existing larger systems. An example is Amazon’s Dash Wand, which has Alexa integrated into the device. By scanning a desired product, be it toothpaste, detergent or olive oil, one can order a new shipment of it. This turns the user’s immediate needs into a step in Amazon’s logistics supply chain. This is a big advantage for Amazon as it turns the home into a marketplace. The user data acquired by these systems can be of great profit to corporations, governments or other entities.

Home automation devices are currently set up to promote adding more and more of your home’s assets to their chain of processes. This goes along with the utopian, techno-deterministic thought that the more products Amazon can connect to, the more the user can control and optimize their environment. Despite the notion of a ‘walled garden’ and competition between the main tech companies, there is a great need for compatibility between products. In Amazon’s case this increases the user’s ease of consumption whilst reinforcing their loyalty to the brand and its enabled products. At the same time, all this additional data is fed into Amazon’s database. These aspects directly relate to the growing power and influence that these companies have over the user’s spending and home.


In this intervention, we wanted to explore to what extent the user gains control over their environment by digitalizing it with a voice-operated home automation system. Before, the user was the common denominator in every action being executed around the house. Now, through Alexa, the user is surrendering direct control to Amazon in favour of efficiency and ease of use. Where Alexa’s users want to optimize the processes around the house, Amazon’s goal is to tie the user’s private home (system) into its larger systems of infrastructure. In this pursuit the user, through the home, becomes a part of Amazon’s logistics chain, and the notion of control shapes the user and their environment. In the end, users trade control for convenience. Moreover, as a result of the forthcoming integration of Alexa into cars, watches, headphones and glasses, Amazon’s reach will no longer be restricted to the home, allowing it to collect even more data.

Currently, Alexa’s skills grant the user a direct sense of control because the device is only able to execute actions that have been pre-programmed into it. This gives the main user a grasp of the affordances provided by the voice assistant and of the possible limits of its potential. This awareness could vanish in the future, once the device acquires mainstream traction and pre-programmed skills possibly become the norm.

Further research should be done to investigate the notion of control, as we have only attempted to provide a starting point. However, our research has uncovered how the user of a voice-operated home automation system surrenders direct control in a trade-off for ease of use and convenience.


