An interdisciplinary approach to artificial intelligence testing
Humanity is confronted with artificial intelligence (AI) more than ever, yet it is still challenging to find common ground. We talked with Marisa Tschopp, researcher at scip ag, about the Artificial Intelligence Quotient (A-IQ), how to automate A-IQ testing, and more.
JAXenter: The term ‘intelligence’ is not easy to understand. What’s the best way to explain it and how can we apply it to machines?
Marisa Tschopp: Human intelligence has been a very controversial topic, and our understanding of it has changed dramatically since its beginnings in the early 19th century. Intelligence gained importance especially in the educational context, as these “mental abilities” were the best predictors of success in school and were used to place students into the right classes. There are various, very elaborate theories that define human intelligence. Nowadays, research on human intelligence takes a more systemic perspective and incorporates various dimensions, not only the ability to calculate or solve riddles.
It is not easy to simply define human intelligence, and the same applies to machine intelligence. We must be aware that we are still in the process of clarifying terms and definitions about AI. For our research, we created the intelligence test from an interdisciplinary perspective. This means we analyzed the various theories and created our intelligence framework, based on what is currently appropriate in an AI context. Our framework is understood as a system of abilities:
- to understand ideas (e.g. questions or commands)
- to operate in a specific environment and learn from experience (e.g. referring to prior information or putting it in context)
- to engage in reasoning to solve problems (e.g. to answer questions or solve tasks).
Areas of human intelligence include verbal skills (such as knowledge, understanding, and numerical reasoning) as well as spatial and visual abilities (such as solving a puzzle or arranging images in a logical manner). Other dimensions are inter- and intrapersonal competencies and physiological or language skills. From the myriad of existing sub-skills, we have chosen several dimensions for testing:
- Explicit Knowledge
- Language Aptitude
- Working Memory
- Verbal and Numerical Reasoning
- Critical and Creative Thinking
JAXenter: And what is Bloom’s Taxonomy? Could you explain the reasoning behind it?
Marisa Tschopp: The intelligence domains aim to measure specific abilities, which all contribute individually with varying significance to the overall concept of interdisciplinary artificial intelligence. Furthermore, we have included Bloom’s Taxonomy to better understand the underlying hierarchies of thought.
Bloom explains thinking along a spectrum from lower to higher order skills. The domain Explicit Knowledge, for example, measures Know-What as opposed to Know-How: it is comparable to information or data found in books or documents, like lexical knowledge. This domain is rated as a lower order thinking skill. On the other end are the higher order thinking skills, and these are represented as Creative or Critical Thinking in our model.
When we want to know if a machine is capable of higher order thinking, we measure the ability to define and analyze a problem and to formulate counter questions adequately to get to a better solution. We investigate the handling of over-simplification, ambiguous questions, and answer uncertainty as part of the Critical Thinking domain. In the end, we try to merge the best scientific approaches to get the best results; a result is good when it is valid, meaning that it accurately measures the actual capabilities.
JAXenter: What are academic IQ tests and how do they work?
Marisa Tschopp: Academic IQ tests aim to quantify intelligence in an objective manner. Scientific standards play a critical role, for example test-retest reliability, which measures the correlation between the results of the same test taken at different times.
In short: The IQ is a standardized, numerical measurement of intelligence, with the Stanford-Binet and Wechsler Scales being the most widely used. Nowadays, the intelligence quotient is a measure of deviation. This means that if you take a valid, standardized test, your result is compared to those of other testees. The distribution of results follows a normal distribution; this means that the majority of people have an IQ around 100 and only about 5% of the testees score very high or very low, or in other words are geniuses or in the state of mental retardation.
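The deviation idea described above can be illustrated with a minimal Python sketch. It is not taken from the interview: it assumes the common convention of a mean of 100 and a standard deviation of 15, and the function names `deviation_iq` and `share_outside` are hypothetical helpers chosen for illustration.

```python
from statistics import NormalDist

IQ_MEAN = 100.0
IQ_SD = 15.0  # assumption: a commonly used scaling, not stated in the interview


def deviation_iq(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Map a raw test score to the IQ scale via its z-score in the norm group."""
    z = (raw_score - norm_mean) / norm_sd
    return IQ_MEAN + IQ_SD * z


def share_outside(low: float, high: float) -> float:
    """Fraction of a normally distributed population scoring below low or above high."""
    dist = NormalDist(IQ_MEAN, IQ_SD)
    return dist.cdf(low) + (1.0 - dist.cdf(high))


if __name__ == "__main__":
    # A raw score one standard deviation above the norm group's mean.
    print(deviation_iq(raw_score=60, norm_mean=50, norm_sd=10))
    # Share of testees more than two standard deviations from the mean.
    print(round(share_outside(70, 130), 4))
```

Under these assumptions, scores more than two standard deviations from the mean (below 70 or above 130) account for a little under 5% of testees, which matches the rough figure quoted above.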
JAXenter: Are there plans to automate A-IQ testing? Can you talk us through the concept?
Marisa Tschopp: In the future, we want to execute A-IQ tests with all kinds of digital assistants, independent of their ecosystem. We are working on a solution to automate the A-IQ testing procedure to make it available to the general public. This device will take over the role of the personal analyst, the investigator, who currently evaluates the test manually, which is quite time-consuming.
A-IQ test questions are administered acoustically from a computer (emulating the analyst) to the digital assistant that is taking the test. Answers will be saved as audio data (e.g. an mp3 file), which will be transformed into transcripts via speech2text. This will allow a continuous comparison with past test results. A distance-based method such as Soundex or Levenshtein will then be used to determine contextual differences. Deviations will be reported to the research department to identify implications and track changes in AI capabilities.
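The comparison step can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used by the researchers: it covers only the Levenshtein variant, and the normalization, the `has_deviated` helper, and the threshold of 0.2 are hypothetical choices.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance scaled to [0, 1] after trimming and lowercasing."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def has_deviated(past: str, current: str, threshold: float = 0.2) -> bool:
    """Flag a transcript that differs from the past one by more than
    the threshold (hypothetical value, chosen for illustration)."""
    return normalized_distance(past, current) > threshold


if __name__ == "__main__":
    past = "The capital of France is Paris"
    current = "I do not know"
    print(normalized_distance(past, current), has_deviated(past, current))
```

Normalizing by the longer transcript makes answers of different lengths comparable. Note that a character-level distance only captures surface differences between transcripts, so two answers with the same meaning but different wording would still be flagged.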
Marisa Tschopp will be speaking at ML Conference 2018 alongside Marc Ruef.