Chinese researchers have developed a method to measure the intelligence quotient (IQ) of AI applications and found that Google’s technology scored nearly as well as a human six-year-old. The researchers also measured applications developed by Baidu, Microsoft and Apple, all of which fared less well.
The study was written up by Liu Feng of Beijing Jiaotong University; Yong Shi, the Director of the Chinese Academy of Sciences Research Center on Fictitious Economy and Data Science; and Ying Liu of the School of Economic Management, UCAS. In the study, they attempt to quantify the capabilities of a number of well-known AI technologies. They frame the problem this way: “Quantitative evaluation of artificial intelligence currently in fact faces two important challenges: there is no unified model of an artificially intelligent system, and there is no unified model for comparing artificially intelligent systems with human beings.”
In the paper they published, the authors propose to solve that by developing a “standard intelligence model” that attempts to encompass both AI systems and humans, and which categorizes them across a seven-level taxonomy of knowledge capability. Superimposed on that is the IQ ranking itself.
Before we delve into the details of what that all means, we can illustrate some of its utility by describing the results they obtained in 2016: Google’s AI – presumably its search engine algorithm – was evaluated as having an IQ of 47.28. For reference, the IQ of the average six-year-old child is 55.5, while that of an 18-year-old is 97. Baidu’s AI came in with an IQ of 32.92, which was nearly equivalent to that of Microsoft’s Bing search engine, which recorded an IQ of 31.98. Apple’s Siri trailed the others, with an IQ of 23.94.
The researchers also employed this same methodology to test both Google and Baidu back in 2014. During that earlier effort, Google’s AI IQ was ranked at 26.5, while Baidu’s was 23.5. The authors noted that both technologies improved “significantly” over the two-year interval, with Google’s IQ nearly doubling. The authors describe the rather arcane formulas they used to derive the IQ, and if you are interested in picking that apart, you should download the PDF.
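Taking the scores reported above at face value, the two-year growth is easy to tabulate (this snippet just reproduces the article’s figures; the ratio arithmetic is mine):

```python
# IQ scores as reported for the 2014 and 2016 evaluations
scores = {
    "Google": {2014: 26.5, 2016: 47.28},
    "Baidu":  {2014: 23.5, 2016: 32.92},
}

for name, s in scores.items():
    growth = s[2016] / s[2014]
    print(f"{name}: {s[2014]} -> {s[2016]} ({growth:.2f}x)")

# Google: 26.5 -> 47.28 (1.78x)
# Baidu: 23.5 -> 32.92 (1.40x)
```

So “nearly doubling” for Google works out to a 1.78x improvement, against 1.40x for Baidu over the same interval.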
At a higher level, though, they also devised seven intelligence levels, or grades of intelligence capability, specified as grade 0 through grade 6, which they use to qualify AI systems in a more general sense. They based these grades on a number of attributes, including the system’s ability to perform I/O, the presence (or absence) of a local store of knowledge, the ability to update and expand that knowledge base, and its ability to share its knowledge with other intelligent systems.
We can dismiss the grade 0 systems, since they have no real-world representation. Grade 1 systems are represented by inanimate objects that lack the basic ability to exchange information with anything else. The researchers apparently included this grade in the taxonomy to represent a baseline system that expresses no intelligence – at least to humans. At the other end of the scale, at grade 6, is a god-like intelligence with an infinite ability to innovate and create knowledge. It represents an evolutionary endpoint of sorts for AI, envisioned by Ray Kurzweil and others as the “singularity.”
Grades 2 and 3 are where most of the digital action is today, representing current mainstream intelligent devices and platforms. A grade 2 system can interact with humans and can store a modicum of information, but is essentially static in its capability. Such systems include semi-intelligent appliances like smart TVs, refrigerators, and other such gadgets. Grade 3 systems, on the other hand, are more dynamic, in that they can be enhanced with more functionality and capabilities via software or hardware upgrades. These are represented by devices like smartphones and personal computers.
Grade 4 systems represent some of the cutting-edge work going on in AI. Such systems have all the attributes of grade 3 systems, with the additional ability to self-upgrade via a network. The authors use the example of RoboEarth, an EU-funded project in which robots can share knowledge with one another using a cloud-based data store. Other examples of grade 4 systems include Google Brain and Baidu Brain. The idea here is that these systems have some autonomy with regard to their training and don’t rely on human interaction to direct their evolution.
Grade 5 systems have the ability to create new knowledge and use it to innovate, as well as the ability to apply these innovations to the process of human development. Human beings are the only example, and the authors regard them as “special” artificial intelligent systems since they were created non-artificially, that is, by nature. The authors’ biases appear to be creeping in here, since they seem to regard creativity in a very anthropomorphic sense. “Unlike the previous four types of systems, humans and some other lifeforms share a signature characteristic of creativity, as reflected in the complex webs of knowledge, from philosophy to natural science, literature, the arts, politics, etc., that have been woven by human societies,” write the authors.
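The grading criteria described above can be sketched as a simple classifier. This is my own modeling of the taxonomy, not code from the paper; the capability flags follow the attributes the authors list (I/O, local knowledge store, upgradability, network-based knowledge sharing, creativity):

```python
from dataclasses import dataclass

@dataclass
class SystemTraits:
    """Capability flags drawn from the paper's grading attributes."""
    exchanges_io: bool = False        # can exchange information with the outside world
    stores_knowledge: bool = False    # holds a local store of knowledge
    updates_knowledge: bool = False   # can be extended via software/hardware upgrades
    shares_via_network: bool = False  # self-upgrades by sharing knowledge over a network
    creates_knowledge: bool = False   # creates genuinely new knowledge (innovation)

def grade(traits: SystemTraits) -> int:
    """Map capability flags onto grades 1-5. Grade 0 (no real-world
    representation) and the hypothetical grade-6 'singularity' are omitted."""
    if traits.creates_knowledge:
        return 5  # e.g. human beings
    if traits.shares_via_network:
        return 4  # e.g. RoboEarth, Google Brain, Baidu Brain
    if traits.updates_knowledge:
        return 3  # e.g. smartphones, personal computers
    if traits.stores_knowledge and traits.exchanges_io:
        return 2  # e.g. smart TVs, smart refrigerators
    return 1      # inanimate objects

# A smart TV interacts and stores a little data, but is static -> grade 2
smart_tv = SystemTraits(exchanges_io=True, stores_knowledge=True)
# A smartphone is additionally upgradeable via software updates -> grade 3
smartphone = SystemTraits(exchanges_io=True, stores_knowledge=True,
                          updates_knowledge=True)
print(grade(smart_tv), grade(smartphone))  # 2 3
```

The ordering of the checks encodes the paper’s idea that each grade subsumes the capabilities of the one below it.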
That bias asserts itself a bit more clearly in their evaluation of Google’s AlphaGo, an AI Go-playing application that vanquished Lee Sedol and Ke Jie in separate Go matches in 2016 and 2017, respectively. (Ke Jie is the number one-ranked Go player in the world.) The authors categorized AlphaGo as a grade 3 AI, in the same category as a smartphone or home computer. That was based on the criterion that the program was not hooked up to a network, through which it could have developed some autonomy from its human trainers.
More controversially, the authors claimed that AlphaGo did not exhibit creativity, but was rather driven by internal formulations. “We believe that AlphaGo still relies on a strategy model that uses humans to perform training through the application of big data,” the authors said. “In its game play, AlphaGo decides its moves according to its own internal operational rules and opponents’ moves. Ultimately, the resulting data are collected to form a large game data set.”
But that’s not exactly true. For one thing, human players are also trained by other humans, and they likewise make internal decisions based on their opponents’ moves. The only real difference here is the definition of creativity, which the authors seem to reserve for humans. AlphaGo, like other game-playing applications developed with deep learning tools, exhibits novel strategies that defy explanation. Even the humans who trained these applications had no way of predicting which strategies would emerge during the course of a game.
Nevertheless, the taxonomy the authors laid out and the IQ testing represent some interesting first steps in measuring AI capabilities. If nothing else, they provide a practical method of evaluating the intelligence of various AI platforms in the field. More importantly, the study seems to imply that AI applications are getting smarter rather quickly. At this point the authors are unsure whether AI is on a path that will enable it to overtake humans, or whether it will level off and never quite reach our capabilities. As the research suggests, more data points are needed.