By: Michael Feldman
Google has demonstrated an artificial intelligence technology that represents the most sophisticated example to date of a computer engaging in natural conversation with a human. Upon hearing the interaction, some listeners felt the software had convincingly passed the Turing test.
Although it was proposed back in 1950, the Turing test is perhaps the best-known way of measuring an AI system’s capacity to demonstrate human intelligence. Devised by legendary computer scientist Alan Turing, the idea was to have a computer program converse with someone at a level where the person would be unable to tell whether they were talking to a computer or a human. The test actually encompasses a good deal more complexity than that, but the gist of it is to determine whether or not a computer can pass as human.
Before we get too far into this, you need to watch the five-minute demonstration of the technology, known as Google Duplex, presented by Google CEO Sundar Pichai at last week’s Google I/O 2018 event. The demo presents two phone conversations with different people in which Duplex successfully navigated some challenging exchanges. It’s kind of mind-blowing, to the point you almost forget one of the participants is a computer.
<iframe src="https://www.youtube.com/embed/ogfYd705cRs?start=2109" width="784" height="441" frameborder="0" allowfullscreen="allowfullscreen"></iframe>
As Pichai noted, the key to the technology is its ability to “understand the nuances of conversation.” However, Duplex can’t converse about everything. In a blog post by Yaniv Leviathan, Google Duplex lead, and Matan Kalman, engineering manager on the project, the authors explain that pulling this off required constraining the models to particular “closed domains” in order to develop the extensive conversational networks required for specific tasks. At this point, the technology is not sophisticated enough to produce a general-purpose AI conversationalist. In that sense, it would likely fail the Turing test if the conversation strayed into unsupported subject areas.
But the demonstration does illustrate how sophisticated those models are for the selected domains. Not only was Duplex able to converse naturally with the people on the phone, it was able to react appropriately when problems were presented – especially in the second phone call, when the person led the conversation astray. Leviathan and Kalman say the technology is also able to extract the meaning from context when ambiguities are presented. For example, the phrase “OK for four” could refer to 4 people or 4:00, depending on the conversation that preceded it.
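To make the “OK for four” example concrete, here is a deliberately simple sketch of how the preceding turn can decide what a bare number means. It is hand-written keyword matching, not how Duplex works (Google describes a trained recurrent neural network), and the function names, number vocabulary and slot labels such as `party_size` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical, tiny vocabulary of spoken numbers.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

def extract_number(utterance: str) -> Optional[int]:
    """Return the first number in the utterance, spelled out or as digits."""
    for token in re.findall(r"[a-z]+|\d+", utterance.lower()):
        if token.isdigit():
            return int(token)
        if token in NUMBER_WORDS:
            return NUMBER_WORDS[token]
    return None

def interpret(utterance: str, previous_turn: str) -> Optional[Tuple[str, object]]:
    """Resolve a bare number using the preceding turn as context (toy rules)."""
    number = extract_number(utterance)
    if number is None:
        return None
    context = previous_turn.lower()
    # If the last question was about headcount, the number is a party size.
    if "how many" in context or "people" in context:
        return ("party_size", number)
    # If the last question was about timing, read the number as an hour.
    if "what time" in context or "when" in context:
        return ("time", f"{number}:00")
    return ("unknown", number)

print(interpret("OK for four", "How many people will be in your party?"))
print(interpret("OK for four", "What time would you like the reservation?"))
```

The same utterance yields `('party_size', 4)` after a headcount question and `('time', '4:00')` after a timing question. A learned model replaces these brittle keyword rules with patterns extracted from real conversations.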
The other thing that is immediately apparent is how far the underlying speech input and output technology has advanced. On the input side, the poor quality of the call in the first exchange and the strong accent in the second did not appear to trouble the Google software a bit. As for Duplex’s own voices, they appear to be based on the company’s WaveNet technology, which has advanced speech generation to the point where it is all but indistinguishable from a real person. The addition of filler words like “umm” and “uh” and colloquialisms like “mmm-hmm” and “gotcha” is also a nice touch, adding some extra authenticity.
In the blog write-up, Leviathan and Kalman offer a few details on the underlying technology, which they encapsulate thusly:
“At the core of Duplex is a recurrent neural network (RNN) designed to cope with these challenges, built using TensorFlow Extended (TFX). To obtain its high precision, we trained Duplex’s RNN on a corpus of anonymized phone conversation data. The network uses the output of Google’s automatic speech recognition (ASR) technology, as well as features from the audio, the history of the conversation, the parameters of the conversation (e.g. the desired service for an appointment, or the current time of day) and more. We trained our understanding model separately for each task, but leveraged the shared corpus across tasks. Finally, we used hyperparameter optimization from TFX to further improve the model.”
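Google gives no architectural detail beyond “recurrent neural network,” so the following is only a generic illustration of the idea. It is an untrained, from-scratch Elman-style RNN in plain Python, not Duplex’s model and not TFX. Each conversational turn becomes a feature vector, and a hidden state carries the conversation history forward from turn to turn. The feature choices (ASR confidence, speech rate, hour of day, party size), the layer sizes and the three output classes are all hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; a real model would learn these from data."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]

class TinyRNN:
    """Elman-style RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""

    def __init__(self, input_size: int, hidden_size: int, num_outputs: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.w_x = random_matrix(hidden_size, input_size, rng)
        self.w_h = random_matrix(hidden_size, hidden_size, rng)
        self.b = [0.0] * hidden_size
        self.w_out = random_matrix(num_outputs, hidden_size, rng)

    def step(self, x: Vector, h: Vector) -> Vector:
        # Combine the current turn's features with the carried-over state.
        pre = [a + c + d for a, c, d in zip(matvec(self.w_x, x), matvec(self.w_h, h), self.b)]
        return [math.tanh(p) for p in pre]

    def run(self, turns: List[Vector]) -> Vector:
        # The final hidden state summarizes the whole conversation so far.
        h = [0.0] * self.hidden_size
        for x in turns:
            h = self.step(x, h)
        return h

    def predict(self, turns: List[Vector]) -> Vector:
        # Softmax over hypothetical response classes.
        logits = matvec(self.w_out, self.run(turns))
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

def turn_features(asr_confidence: float, speech_rate: float,
                  hour_of_day: int, party_size: int) -> Vector:
    """Hypothetical per-turn features: ASR-, audio- and parameter-derived."""
    return [asr_confidence, speech_rate, hour_of_day / 24.0, party_size / 10.0]

model = TinyRNN(input_size=4, hidden_size=8, num_outputs=3)
conversation = [turn_features(0.92, 1.1, 19, 4), turn_features(0.75, 0.9, 19, 4)]
probs = model.predict(conversation)
print(len(probs), round(sum(probs), 6))  # prints: 3 1.0
```

Because the hidden state is fed back at every step, the prediction after the second turn depends on the first turn as well. That is the property that lets a recurrent model use “the history of the conversation” as one of its inputs.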
No word on what hardware was used or how long the training took for any particular domain. According to Pichai, Duplex has been in the works for years, so it was presumably developed across two or three generations of Tensor Processing Units (TPUs) and possibly other hardware. As we reported last week, Google used this same I/O event to unveil its third-generation TPU, which will be used to develop bigger and better neural networks for the web giant’s internal needs. Special mention was made of using TPU 3.0 to improve the AI behind Google Assistant, which also happens to be the initial platform for Duplex.
In this case, the idea is to be able to tell Google Assistant to schedule something on your behalf by phone, such as the haircut appointment and restaurant reservation in the examples above. The app then performs this phone magic in the background via Duplex and notifies you when it completes the task. Ironically, this initial application is most useful for interacting with low-tech businesses that have yet to embrace modern online tools for managing appointments and reservations. But the underlying technology seems destined to expand into more lucrative areas like automated technical support, human intelligence gathering, or essentially any type of expert system that relies on personal interaction.
Google plans to start testing Duplex in Google Assistant this summer.