AI Paper Summary - On the Measure of Intelligence, November 5, 2019 (Francois Chollet / Google, Inc.)

신동현 · March 4, 2023

AI Papers



  1. To achieve effective progress in AI, we need an appropriate definition of intelligence and suitable evaluation systems for it.

  2. Measuring task-specific skill alone falls short of measuring intelligence: it does not reveal the system's own generalization power.

  3. The paper's new formal definition of intelligence centers on skill-acquisition efficiency, highlighting the concepts of scope, generalization difficulty, priors, and experience.

  4. The proposed ARC (Abstraction and Reasoning Corpus) can be used to measure a human-like form of general fluid intelligence and enables fair general-intelligence comparisons between AI systems and humans.
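The definition in item 3 can be sketched as a formula. The following is a simplified paraphrase of its shape only, not the paper's full formalism (which weights tasks over an optimal curriculum using Algorithmic Information Theory):

```latex
% Simplified sketch: intelligence as skill-acquisition efficiency.
% GD = generalization difficulty of the tasks mastered,
% P  = priors built into the system,
% E  = experience (practice data) consumed.
I_{\text{system},\,\text{scope}} \;\propto\; \underset{\text{task} \,\in\, \text{scope}}{\operatorname{avg}} \; \frac{GD}{P + E}
```

In words: the greater the generalization difficulty a system can handle, and the less prior knowledge and experience it needs to get there, the more intelligent it is over that scope.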

I. Context and history

I.1 Need for an actionable definition and measure of intelligence

  1. AI systems perform very well on specific tasks, but they have limitations. In particular, they cannot deal with new tasks by themselves without the intervention of human researchers.

  2. AI's only successes so far have been narrow, task-specific systems. In this context, goal definitions and evaluation benchmarks have been the most powerful drivers of scientific progress in AI.

  3. Common-sense dictionary definitions of intelligence are not actionable, explanatory, or measurable. They are not valid for the paper's purpose of pursuing efficient progress.

  4. The goal is to surface implicit assumptions, correct salient biases, and provide an actionable formal definition and a measurement benchmark for human-like general intelligence.

I.2 Defining intelligence: two divergent visions

  1. Legg and Hutter summarized in 2007, "Intelligence measures an agent's ability to achieve goals in a wide range of environments."

  2. One vision emphasizes task-specific skill; the other focuses on generality and adaptation.

  3. One view holds that the mind can only learn what it is programmed to acquire; the other holds that the mind is a general-purpose "blank slate".

I.2.1 Intelligence as a collection of task-specific skills

  1. Early AI researchers viewed the electronic computer as an analogue of the mind: intelligence as a set of static program-like routines, with learned knowledge stored in a database-like memory.

  2. Marvin Minsky's 1968 definition of AI: "AI is the science of making machines capable of performing tasks that would require intelligence if done by humans".

  3. This definition and evaluation philosophy contains a critical paradox.

  4. Hernandez-Orallo pointed out that the field of artificial intelligence has been very successful in developing artificial systems that perform these tasks without featuring intelligence.

I.2.2 Intelligence as a general learning ability

  1. In contrast to Minsky's task-focused definition of AI, a number of researchers hold that intelligence is the general ability to acquire new skills.

  2. On Locke's Tabula Rasa (blank slate) view, intelligence is a flexible, adaptable, highly general process that turns experience into behavior, knowledge, and skills.

  3. Both views - intelligence as either a collection of task-specific programs or a general-purpose Tabula Rasa - are likely wrong.

I.3 AI evaluation: from measuring skills to measuring broad abilities

  1. These two ideas of intelligence have influenced approaches for evaluating machines and humans.

I.3.1 Skill-based, narrow AI evaluation

  1. Human review: stems from the Turing test. It is rarely used because it is expensive, subjective, and cannot be automated.

  2. White-box analysis: inspecting the implementation of the system itself; applicable to a fully-described task in a fully-described environment.

  3. Peer confrontation: determining which AI system is better through direct competition with others.

  4. Benchmarks: evaluating the system by running it on a held-out "test set" and comparing its outputs against the desired outcomes.

  5. Benchmarks have been a major driver of progress in AI, because they are reproducible, fair, scalable, easy to set up, and flexible.

  6. However, a focus on task-specific performance is far from producing systems with genuine intelligence.

  7. Measuring task-specific success is not a suitable measure of intelligence, because success at one task does not guarantee that the system will do well in a totally different environment.

  8. Hence the need to move beyond skill-based evaluation.
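The benchmark procedure in item 4 can be sketched in a few lines. This is a generic illustration, not code from the paper; `system` stands for any model mapping inputs to outputs, and the parity task is an invented toy example.

```python
def benchmark_accuracy(system, test_set):
    """Score a system by comparing its outputs on a held-out test set
    against the desired outcomes (item 4 above)."""
    correct = sum(1 for x, desired in test_set if system(x) == desired)
    return correct / len(test_set)

# Toy usage: a parity task with a hypothetical system.
test_set = [(2, "even"), (3, "odd"), (8, "even")]
system = lambda n: "even" if n % 2 == 0 else "odd"
print(benchmark_accuracy(system, test_set))  # 1.0
```

A perfect score here illustrates the paper's point: it certifies skill on this one task only, and says nothing about how the system would fare in a different environment.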

I.3.2 The spectrum of generalization: robustness, flexibility, generality

I.3.3 Measuring broad abilities and general intelligence: the psychometrics perspective

I.3.4 Integrating AI evaluation and psychometrics

II. A new perspective

II.1 Critical assessment

II.1.1 Measuring the right thing: evaluating skill alone does not move us forward

II.1.2 The meaning of generality: grounding the g factor

II.1.3 Separating the innate from the acquired: insights from developmental psychology

II.2 Defining intelligence: a formal synthesis

II.2.1 Intelligence as skill-acquisition efficiency

II.2.2 Computation efficiency, time efficiency, energy efficiency, and risk efficiency

II.2.3 Practical implications

II.3 Evaluating intelligence in this light

II.3.1 Fair comparisons between intelligent systems

II.3.2 What to expect of an ideal intelligence benchmark

III. A benchmark proposal: the ARC dataset

III.1 Description and goals

III.1.1 What is ARC?
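ARC tasks are distributed as JSON: each task contains a few "train" input/output grid pairs demonstrating a transformation, plus "test" pairs whose outputs must be predicted. Grids are rectangles of integers 0-9 (colors). A minimal sketch of the format follows; the toy task and its "recolor 1 to 2" rule are invented for illustration, not an actual ARC task.

```python
# A toy task in the ARC JSON structure: grids are lists of rows,
# each cell an integer color 0-9. A solver must infer the
# transformation from the train pairs and apply it to the test input.
toy_task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[0, 2], [2, 0]]},
        {"input": [[1, 1]],         "output": [[2, 2]]},
    ],
    "test": [
        {"input": [[1, 0, 1]]},
    ],
}

def apply_recolor(grid, src=1, dst=2):
    """The (invented) rule consistent with the train pairs above."""
    return [[dst if cell == src else cell for cell in row] for row in grid]

# Verify the candidate rule reproduces every train pair, then predict.
assert all(apply_recolor(p["input"]) == p["output"] for p in toy_task["train"])
print(apply_recolor(toy_task["test"][0]["input"]))  # [[2, 0, 2]]
```

Each ARC task uses a different hidden transformation, so no single hard-coded rule like this one generalizes across the corpus - that is the point of the benchmark.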

III.1.2 Core Knowledge priors

III.1.3 Key differences with psychometric intelligence tests

III.1.4 What a solution to ARC may look like, and what it would imply for AI applications

III.2 Weaknesses and future refinements

III.3 Possible alternatives

III.3.1 Repurposing skill benchmarks to measure broad generalization

III.3.2 Open-ended adversarial or collaborative approaches

<Source: On the Measure of Intelligence, Francois Chollet, Google, Inc.>

