Voice recognition: it needs to be all about you and your voice

--

‘Jarvis, let’s get the day started.’

When we have a phone conversation, the context and our relationship to the person we are speaking with dictate its length. If you are speaking to a friend about gossip or a new relationship, the conversation could last anywhere from 20 minutes to an hour. If you are calling a florist or booking a table at a restaurant, it could take five minutes at most.

In most situations, though, the convenience of today's smartphones makes us more prone to texting and booking things online, since this saves time and lets us do other things while keeping in contact with our friends and loved ones. Phone calls are reserved for matters of urgency or inconvenience; I mostly only make them when the person I want to reach isn't answering their texts.

And yet, a lot of articles have been headlining how voice recognition, speech translation, and AI-run assistants are all coming to our devices through the power of voice. These articles make it clear that there will be a voice-enabled revolution, whereby we will be able to speak to our devices in much the way Tony Stark speaks to his AI assistant Jarvis in Iron Man. Whether this revolution happens in a year or in five will depend on advances in machine learning.

But to my mind, the voice revolution will bear a closer analogy to the release of the iPhone. In my last post, I wrote about the increasing connection we have to our smartphones and devices, in sum the Internet of Things (IoT). This connection has made our devices an extension of ourselves. For some people, losing their phone is a greater inconvenience, and a greater personal injury, than losing money.

It is because of this surreal connection that I believe the voice revolution will be a response to companies recognizing that the best way to reach their consumers is to reverse engineer their consumers' lifestyles. Predictive search and predictive text are only useful if they are personally tuned and configured, so they can predict not only how you type, but how you think.
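To make the "personally tuned" point concrete, here is a minimal sketch of the idea; the class and training data are my own illustration, not any vendor's actual system. A per-user next-word predictor keeps separate statistics for each user, so the same code, trained on different people's messages, makes different predictions for each.

```python
from collections import Counter, defaultdict

class PersonalPredictor:
    """Toy per-user next-word predictor: each user gets their own
    bigram counts, so predictions reflect how *they* write."""

    def __init__(self):
        # word -> Counter of the words that follow it, for this user only
        self.bigrams = defaultdict(Counter)

    def learn(self, text):
        words = text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            self.bigrams[prev][nxt] += 1

    def predict(self, word):
        # Most frequent follower for this user, or None if unseen
        options = self.bigrams.get(word.lower())
        return options.most_common(1)[0][0] if options else None

# Two users trained on their own messages diverge immediately.
alice = PersonalPredictor()
alice.learn("book a table for two")

bob = PersonalPredictor()
bob.learn("book a flight to madrid")

print(alice.predict("a"))  # table
print(bob.predict("a"))    # flight
```

The point of the sketch is only that personalization lives in the per-user state: aggregate ("big data") models would give Alice and Bob the same answer, while per-user models learn how each individual actually writes.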

Returning to the Iron Man analogy illuminates this point. Jarvis worked so well as a concept and as an assistant because it knew Tony better than anyone else did. It suggested, predicted, and joked with Tony based on how well it understood him. It did not merely answer requests by scouring the internet; it answered them much like a human would: by inferring what was meant, responding precisely, using tone and inflection to convey its thoughts on the request, and continuing a relationship through every conversation.

Relationships are key to how we navigate the world: financially, intellectually, emotionally, and psychologically. For that reason, relationships with our devices will be what motivates us to speak to them as more than just devices made of hardware and software. In Spike Jonze's Her, the main character falls in love with the AI assistant because it makes him feel loved, makes him feel interesting, and gives him someone who understands his every need. But the moment of conflict arises when he realizes that what he had with the digital assistant wasn't special; he was just one user among a multitude.

We often want to feel special, and our assistants should help remind us how special we are.

What happened in Her should not be the case with our real-world digital assistants. In Her, the assistants were programmed on big data and iterated themselves across multiple users, serving each under the delusion that they were unique to the assistant. In the real world, we each want to be acknowledged for who we are, as we are, and not to feel like one data point or statistic among many.

Because of that, I think the digital assistant market is one that smaller companies will be better suited for. They will not have to worry about scaling to the billions of users that Facebook, Google, and Apple each serve. As a result, smaller companies can give each user a bespoke assistant that reflects a real relationship with that individual. Big data on these users will then be used to generate personal data: data that helps model an assistant for a particular user based on their individual behavior.

In such a scenario, we would each have a Jarvis of our own, rather than having to sullenly accept that Jarvis is just another machine made to please anyone and everyone.

--

I spend my days learning Spanish, coding, and making music, with the singular goal of becoming a philosopher engineer.