DigitalThought.info DRAFT UNDER CONSTRUCTION 2011.3.28, 2012.4.10

Post-Game Analysis of the IBM Watson
Jeopardy Challenge

by celeste.horner @gmail.com

The IBM Watson artificial intelligence system achieved a milestone in the field of artificial intelligence and natural language processing by winning the Jeopardy general knowledge competition. [video] Competing in real time while disconnected from the internet, Watson ran on massively parallel computing resources: 90 servers (equivalent to over 6000 personal computers) with 15 terabytes of RAM, holding 200 million pages of information drawn from sources including the World Book Encyclopedia, Wikipedia, the American Medical Association Encyclopedia of Medicine, the Dictionary of Cliches, Facts on File Word and Phrase Origins, the Internet Movie Database, the New York Times archive, and the Bible [PBS Newshour - Miles O'Brien report] [PBS Nova: The Smartest Machine on Earth]. The hard-working programming team at IBM scored a well-earned public-relations coup for their fine company and helped define an industry-leading vision for ubiquitous, intelligent cloud computing.


Big System Knowledge Power from a Small Computer (plus Google)

DigitalThought.info produces knowledge management software with a conceptual understanding of natural language. It develops both knowledge recording and question answering algorithms, enabling the average person or small business to reproduce Watson's big-system question answering performance with one personal computer and the output of a search engine such as Google or Bing.

Google's performance has proved particularly impressive because it cleverly includes singular/plural, tense (invented/invents, open/opens/opened), and synonym (writer=author, murder=kill) variations in search results, and even compensates for misspellings (estabished/established).

Next Generation IT: Deep Semantics

In the commentary below, we discuss algorithms that are effective for answering Jeopardy questions and propose how to exceed that level of performance by encoding deep semantics related to the questions.


Algorithms for Answering Jeopardy Questions


Two Phase Text Match Method

  1. EXTRACT PRIORITY SEARCH TERMS
    from the question.
    Prioritization helps evade misleading and irrelevant terms (a code sketch follows this outline).

    • FIRST PRIORITY
      • quoted expressions
      • proper nouns
      • CATEGORY INFO (noun phrases preceded by "this" or "these") is extracted separately and used in the MATCH CATEGORY phase of the search
      • noun phrases complementary to a pronoun (joined by comma or BE verb)
      • verb phrase, work in reverse from target noun
      • dates
    • SECOND PRIORITY
      • modified noun string
      • currency, numbers, quantities plus modified expression
      • verb phrases
      • verb stems (Google automatically generates verb form variations)
      • category terms (see EXTRACT CATEGORY below)
    • THIRD PRIORITY
      • 3+ word idioms
      • lexically rare words
      • the whole clue
  2. EXTRACT CATEGORY INFO
    category clue words
    preceded by demonstrative pronoun in question
    • this X (author)
    • these X
    • one of these
    the pronouns his or her indicate the category human and the gender
  3. PERFORM SEARCH (keep top 5 hits)
    • match literal multi-word string if possible
    • eliminate term which is included in question clue
  4. VALIDATE ANSWER
    1. MATCH CATEGORY
      • Accept the first of 5 answers which matches the target category; assign a confidence value
      • is answer candidate a member of target category?
      • First priority: CATEGORY INFO + ANSWER CANDIDATE
      • Second priority: CATEGORY CLUE + ANSWER CANDIDATE
      • Match if terms appear in first 5 hit headlines, or text
    2. SUBSTITUTE answer candidate in statement, try to find corroborating documents
    3. CONFIDENCE LEVEL - high/ideal if NLP syntax comprehension used to read hit text
      • calculate from percentage of hits containing target terms
      • high confidence if headline expresses candidate criteria equivalence
      • high if candidate answer substituted in original clue is corroborated elsewhere
    4. check if category term is an element of the search result dictionary definition,
    5. check if a search juxtaposing the category and the candidate category ranks high in (top 5) search results.
    6. take top results from specialized sources or limited domain source (Bible, opera, medical encyclopedia)
  5. ANSWER
    • list candidate answers ranked by confidence level
    • merge results of competing parallel computations
    • multiple lines of interpretation may contribute a component in ambiguous situations
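
A minimal sketch of the extraction step (1) in Python. It is illustrative only: simple regular expressions stand in for real noun-phrase and verb-phrase chunking, and the priority tiers are collapsed to three numeric levels.

    import re

    def extract_priority_terms(clue):
        """Rank clue fragments for search, roughly following the priority tiers
        above. Regular expressions stand in for real phrase chunking."""
        terms = []
        # First priority: quoted expressions, multi-word proper nouns, four-digit years
        terms += [(1, q) for q in re.findall(r'"([^"]+)"', clue)]
        terms += [(1, p) for p in re.findall(r'\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b', clue)]
        terms += [(1, d) for d in re.findall(r'\b(?:1[5-9]|20)\d{2}\b', clue)]
        # Second priority: currency amounts and quantities
        terms += [(2, m) for m in re.findall(r'\$[\d,.]+(?:\s*(?:million|billion))?', clue)]
        # Third priority: stand-in for lexically rare words (long lower-case words)
        terms += [(3, w) for w in re.findall(r'\b[a-z]{9,}\b', clue)]
        # Keep the best (lowest) priority seen for each distinct term
        best = {}
        for pri, term in terms:
            best[term] = min(pri, best.get(term, pri))
        return sorted((pri, term) for term, pri in best.items())

    clue = ('Wanted for stealing a loaf of bread in "Les Miserables"; '
            'really, really wanted, for other thefts too')
    print(extract_priority_terms(clue))
    # [(1, 'Les Miserables')] -- a fuller extractor would also chunk the
    # verb phrase "stealing a loaf of bread"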

Other important algorithms

  • Fill in the blank. Match text string in question to one in reference resources and identify missing element.
    Common placeholders:
    • this
    • these
    • one of these
  • Syntax - identify part of speech for each word, parse phrases
  • Phonetics - match rhyming words. (Presidential rhyme time: Obama llama, Bush Tush. O'Brien demo round)
  • Length filter - Use clue such as "4-letter word" to select answer by letter count
  • Hypothesis - (1) the answer is defined or described by the clue. Select answer whose definition matches clue, or where clue terms are highly ranked (ex: headline or lead text of search results); (2) the answer is an element of the clue. Seek other quoted instances of clue; select missing element to be the answer.
  • Thesaurus and dictionary look-up. Match the clue to a word's definition or synonym list. Return main entry for term whose definition matches query terms.
  • Boolean - find vocabulary intersection of two search retrieval sets or documents
  • Word Trains - find phrases where the last word of one is the start of the next
  • Determine gender from associated pronouns. Access to gender-specific baby name list would be important
  • PERSONAL NAMES
    • look up first name; infer gender
    • recognize people are often referred to only by last name. Obama = Barack Obama
    • recognize equivalence of references using full name, first name, last name, full name including middle initial
  • Determine animate / inanimate status from verb, word definitions and object classifications database
  • Containment - use clues to assemble group of answer candidates, then select one that contains a target string such as "state" in reinstate and gestate
  • Acronyms - extract initial letters from expressions
  • Text search in news archive, encyclopedia, dictionary, thesaurus, famous quotations and idioms. Important: history, art history, Bible, opera
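
Hedged sketches of a few of these helpers (length filter, acronym extraction, gender from first name). The word and name tables below are tiny placeholders for the much larger resources the list calls for.

    def length_filter(candidates, letter_count):
        """'4-letter word' style clue: keep candidates with the requested letter count."""
        return [c for c in candidates if len(c.replace(" ", "")) == letter_count]

    def acronym(expression):
        """Extract initial letters, e.g. 'all points bulletin' -> 'APB'."""
        return "".join(word[0].upper() for word in expression.split())

    # Placeholder gender-specific name list; a real system would load a full baby-name database.
    FIRST_NAME_GENDER = {"barack": "male", "hillary": "female", "bram": "male"}

    def infer_gender(full_name):
        first = full_name.split()[0].lower()
        return FIRST_NAME_GENDER.get(first, "unknown")

    print(length_filter(["tush", "llama"], 4))   # ['tush']
    print(acronym("all points bulletin"))        # APB
    print(infer_gender("Bram Stoker"))           # male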


Game Analysis


Day 1 video   part 2   text archive   commentary

Day 2 video   part 2   text archive   commentary

Day 3 video   part 2   text archive   commentary   CONCLUSIONS  

PBS Nova: The Smartest Machine on Earth



COMMENTARY DAY 1

Day 1 video   part 2   text archive


LITERARY CHARACTER APB
(Alex: All points bulletin.)
BEATLES PEOPLE
OLYMPIC ODDITIES
NAME THE DECADE
FINAL FRONTIERS
ALTERNATE MEANINGS


Category Clue: Literary Characters APB
Question: Wanted for stealing a loaf of bread in "Les Miserables"; really, really wanted, for other thefts too

This question lends itself to the two-phase Jeopardy algorithm we describe; a search for the quoted phrase "Les Miserables" and the verb phrase "stealing a loaf of bread" immediately retrieves Jean Valjean. Watson won this question, the 18th of the first Jeopardy round televised February 14, 2011.


Solution: Use the two-phase text match algorithm (a verification sketch follows these steps)

1. Extract Priority Search Terms

  • first priority, quoted expressions: "Les Miserables"
  • second priority: verb phrases working backwards from objects: stealing a loaf of bread

2. Extract Category

  • category clue words: Literary Characters APB

3. Perform Search Phase 1. Keep top 5 results: "Les Miserables" stealing a loaf of bread

4. VALIDATE ANSWER: #1: Jean Valjean

  1. SUBSTITUTE candidate answer in clue sentence
    supported by search result: "Valjean steals a loaf of bread"
  2. MATCH CATEGORY (Search Phase 2)
    • category: Literary Characters
    • first answer candidate: Jean Valjean
    • search: literary characters Jean Valjean
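
A sketch of the category-match verification used in step 4, assuming a generic search() callable that returns a ranked list of (headline, snippet) pairs; the scoring here is deliberately crude and only illustrates the idea.

    def matches_category(candidate, category_terms, search):
        """Step 4 (validate answer): juxtapose the category terms with the
        candidate and check whether both appear in the top 5 headlines or
        snippets. `search` is any callable returning ranked (headline, text) pairs."""
        query = category_terms + " " + candidate
        for headline, text in search(query)[:5]:
            blob = (headline + " " + text).lower()
            if candidate.lower() in blob and all(t in blob for t in category_terms.lower().split()):
                return True
        return False

    # Example (with whatever search backend is available):
    # matches_category("Jean Valjean", "literary characters", my_search_engine)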


Computing Deep Semantics of the Les Miserables Question

Victor Hugo's novel, Les Miserables, entrains deep philosophical themes of social justice and morality. At DigitalThought, we are tackling the ambitious task of creating a wise, ethical and compassionate computer capable of understanding human nature and situations.

DigitalThought represents the world as an interaction of entities. Active agent entities have a set of differentially prioritized motivations (directives or obligations) that lead them to perform actions to achieve their goals.
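
One way this entity/agent representation might look in code; the class names, fields, and priority scheme below are illustrative assumptions, not DigitalThought's actual schema.

    from dataclasses import dataclass, field

    @dataclass
    class Motivation:
        goal: str        # e.g. "feed starving children"
        priority: int    # lower number = more urgent

    @dataclass
    class Agent:
        name: str
        motivations: list = field(default_factory=list)

        def next_action_goal(self):
            """Act on the most urgent motivation first."""
            return min(self.motivations, key=lambda m: m.priority).goal

    valjean = Agent("Valjean", [Motivation("feed starving children", 1),
                                Motivation("obey the law", 2)])
    print(valjean.next_action_goal())  # feed starving children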

Consider the interaction of two characters, Valjean and Javert:

Valjean - steals a loaf of bread to feed starving children. Caught, served 19 years hard labor. Embittered by prejudice against ex-convicts, returns to stealing, but converted by example of kind priest who forgives his thievery. He reforms, becomes a successful businessman and then mayor. He is doggedly pursued by law-enforcement officer Javert.

Javert - incorruptible, dogmatic, committed, a legal absolutist, like a robot. Could not transcend his programming or handle an exception. Commits suicide when he realizes that Valjean is actually admirable and does not deserve his persecution. Couldn't reconcile extenuating circumstances with strict adherence to the law.

Evaluate input: understand APB

  • Input parse: Category Clue = (Literary (Character)) APB
  • Input parse: Literary Character modifies APB
  • Input parse: Literary modifies character
  • Input parse: Literary definition: pertains to set including novel and play
  • APB implies $input_type = all points bulletin.
  • APB implies $input_format = X $BE wanted for $criminal_charge.
  • Expected category of X is human
  • Expected realm of X is physical world
  • Input X is of type literary character; realm is fiction
  • Potential humor or metaphor detected due to realm violation of X

The computational basis of the humor is a mild category violation: an APB normally refers to an actual human, whereas in this instance the targeted agent pertains to a fictional realm. Supporting semantic notes:

  • idiom: $subject $BE wanted for $committing? $crime
  • APB: a communication directing law-enforcement agents to apprehend an agent
  • a literary character implies a fictional context
  • if X has stolen Y, X has committed theft, and the owner of Y is no longer in possession of Y
  • X may be motivated by family and children, hunger, class animosity (revenge, inflicting suffering on the privileged to reduce X's perceived lack), or lack of opportunity


Literary Characters APB: His victims include Charity Burbage, Mad Eye Moody & Severus Snape ...

This question was biased against Watson. The answer entailed human-style cognition and was not solvable by straight text lookup methods. Character names like Severus Snape and Mad Eye Moody uniquely allude to the plot of Harry Potter, but there search engine techniques face a dead end. First, the role of victim must be applied to each member of a distributive, non-exhaustive list. The history of each character must be reviewed to find the nature of their misfortune (robbery? turned to frogs?) and who was the common perpetrator. All were apparently killed by or at the behest of Voldemort. The clue "He'd be easier to catch if you'd just name him" is mostly noise to a search engine, but an instant giveaway for an associative mind which understands that Voldemort's name was so feared it became unmentionable. With no verbatim link between the clue and answer, Watson did well to use the pronoun "his" to select only male characters in Harry Potter. Inference of gender from name or context is a task in itself. The correct answer, Voldemort, was Watson's second-choice reply, with 20% confidence.


BEATLES PEOPLE was computationally the easiest set of questions in the initial Jeopardy round. Watson answered all five correctly, winning four of the five. The answer procedure required text matching between Beatles lyrics and the clue, then extracting the lyrics word holding the same relative position as the word "this" or another pronoun in the clue.

Ex: $200 "And anytime you feel the pain, hey" this guy "refrain, don't carry the world upon your shoulders". Answer (Watson): Who is Jude? (baby sings Jude)

Watson did a good job distinguishing between animate and inanimate subjects, answering "Who is Lady Madonna?", but "What is London?". How did Watson know that "Jude" was a person? Perhaps tipped off by question asking for "this guy". Watson and smart computers need a database somewhere that designates the categories of terms, for instance that "guy" as well as king, queen, doctor, president, and lady are generally expected to be human. Otherwise, a great deal of mining and computation is needed to infer status from various references, relationships, and actions.
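
A minimal sketch of the kind of term-category table the paragraph calls for; the entries are a tiny illustrative sample, not a real resource.

    # Small illustrative table; a real system would mine or curate a much larger one.
    TERM_CATEGORY = {"guy": "human", "king": "human", "queen": "human",
                     "doctor": "human", "president": "human", "lady": "human",
                     "city": "place", "instrument": "object"}

    def question_word(category_term):
        """Choose 'Who' for human referents, 'What' otherwise."""
        return "Who" if TERM_CATEGORY.get(category_term) == "human" else "What"

    print(question_word("guy"), "is Jude?")     # Who is Jude?
    print(question_word("city"), "is London?")  # What is London?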

Jeopardy judges let Watson get away with a technical error for answering "What is Maxwell's silver hammer" when the BEATLES PEOPLE category asked for a person's name -- "Who is Maxwell?"

Beatles' Wisdom: All you need is love, Imagine, Hey Jude, Take a sad song and make it better, Here comes the sun


Olympic Oddities: The anatomical oddity of George Eyser

Watson was robbed on this question. After correctly selecting George Eyser's leg as the most anatomically related aspect of his record, judges denied credit because Watson didn't specify that the leg was missing. Eyser had a partial leg and competed with a prosthetic [wikipedia] so that judgement was debatable. It would be interesting to know to what degree Watson, who is being groomed for medical consulting applications, is cognizant that a missing leg constitutes a physical anomaly.


Olympic Oddities: A 1976 entrant in the "modern" this was kicked out for wiring his epee to score points without touching his foe

Following the algorithm of extracting proper nouns, dates, and verb phrases produces a search for 1976 Olympics epee wiring. This immediately retrieves a relevant document. The fill-in-the-blank technique focuses on the placeholder pronoun "this", equating "modern this" to modern pentathlon. Answer: pentathlon.


Beatles lyrics and alternate meanings

Watson did well filling in the blank of Beatles' song lyrics, and matching dictionary definitions of stick as a part of a tree and as the verb meaning puncture, but again this was achievable with text string matching and fill-in-the-blank algorithms; mastery of the nuance of language was not required.





COMMENTARY DAY 2

Day 2 video   video part 2   text archive

ETUDE, BRUTE: Amazingly, Watson completely dominated the category Etude, Brute, whose title, a pun, needed to be ignored as it was only noise or distraction from the answer. Question 1: An etude is a composition that explores a technical musical problem; the name is French for this. Watson correctly provided the English translation of a French word although this action was not explicitly requested. Perhaps Watson has a translation algorithm triggered by X is $language for Y. Watson's distant alternative answers were Studies on Chopin's Etudes and Transcendental Etudes, which indicates that perhaps text matching was used and the dictionary entry got the highest relevance score.
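
A sketch of the speculated translation trigger ("the name is French for this"); the regular expression and the one-entry glossary are assumptions for illustration, not Watson's actual mechanism.

    import re

    # Tiny illustrative glossary; a real system would call a dictionary or translation service.
    FRENCH_TO_ENGLISH = {"etude": "study"}

    def maybe_translate(clue):
        """If the clue says something 'is French for this', look up the clue's
        French word in the glossary and answer with its English translation."""
        if not re.search(r"is French for this", clue, re.IGNORECASE):
            return None
        for word in re.findall(r"[a-zA-Z]+", clue):
            if word.lower() in FRENCH_TO_ENGLISH:
                return FRENCH_TO_ENGLISH[word.lower()]
        return None

    clue = ("An etude is a composition that explores a technical musical problem; "
            "the name is French for this.")
    print(maybe_translate(clue))  # study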

In order to win every remaining question in the category, Watson replaced the pronoun "this" with the appropriate class of answer, for instance musical instrument or human being: 12 Etudes for this instrument (fill in the blank); this Hungarian's Transcendental Etudes; dormant condition - hibernation, diapause, drought.


CATEGORY: ART OF THE STEAL
QUESTION: REMBRANDT'S BIBLICAL SCENE " STORM ON THE SEA OF " THIS WAS STOLEN FROM A BOSTON MUSEUM IN 1990

Algorithm - fill in the blank

  1. EXTRACT first priority SEARCH TERMS: quoted strings and proper nouns.
  2. Rembrandt, " Storm on the Sea ", Boston
  3. PERFORM SEARCH
    • try just high priority terms first
    • If no answer candidates, prepend second priority terms
  4. First result: The Storm on the Sea of Galilee (wikipedia)
  5. Fill in the Blank - seek quoted string in top ranked search hits
  6. Identify "Galilee" as the missing term


A Google search for Rembrandt's Storm on the Sea returns the correct answer first, even when Rembrandt is misspelled!
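
A sketch of the fill-in-the-blank step above, assuming the title of a top search hit is available as plain text; the matching is naive substring matching, not Watson's method.

    def fill_in_blank(clue_fragment, hit_titles):
        """Locate the quoted clue fragment inside a top search-hit title and
        return the word immediately following it as the missing term."""
        frag = clue_fragment.lower().strip()
        for title in hit_titles:
            idx = title.lower().find(frag)
            if idx != -1:
                remainder = title[idx + len(frag):].split()
                if remainder:
                    return remainder[0].strip(' "')
        return None

    # Hypothetical top-hit title for the search: Rembrandt "Storm on the Sea" Boston
    hits = ["The Storm on the Sea of Galilee (Rembrandt) - Wikipedia"]
    print(fill_in_blank("Storm on the Sea of", hits))  # Galilee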



OHIO, NOT SPAIN

A Goya stolen (but recovered) in 2006 belonged to a museum in this city (Ohio, not Spain)

Watson ignored the negation of Spain, answering Madrid instead of Toledo; another error akin to offering Toronto as a U.S. city.



COMMENTARY DAY 3


Day 3 video   text archive   part 2   PBS Nova: The Smartest Machine on Earth


JEOPARDY!
IBM Challenge Game 2, Show #6088
Wednesday, February 16, 2011
BREAKING NEWS
Before this hotel mogul's elbow broke through it, a Picasso he owned was worth $139 million; after, $85 million


CATEGORY: BREAKING NEWS
QUESTION: Before this hotel mogul's elbow broke through it, a Picasso he owned was worth $139 million; after, $85 million

Perform algorithm

  1. EXTRACT SEARCH TERMS
  2. proper nouns
    Picasso
  3. this + noun phrase
    hotel mogul or hotel mogul's elbow
  4. currency
    $139 million
    $85 million
  5. PERFORM SEARCH
    Search: hotel mogul elbow Picasso $139 million $85 million
  6. VERIFY ANSWER
    First result (wikipedia): Steve Wynn. Fits category hotel mogul?
    Search for hotel mogul Steve Wynn. This string is found in top results of refined confirmation search.
    Answer confirmed: Steve Wynn
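
A sketch of the query assembly for this clue (steps 1-5), using naive regular expressions; the patterns are illustrative stand-ins, not Watson's parser.

    import re

    def build_query(clue):
        """Assemble a search query from 'this + noun phrase' targets, proper nouns,
        and currency amounts, loosely following the extraction steps above."""
        this_np = re.findall(r"this ([a-z]+(?: [a-z]+)?)", clue)
        proper  = re.findall(r"(?<=[a-z,;] )[A-Z][a-z]+", clue)   # naive proper-noun grab
        amounts = re.findall(r"\$[\d,.]+(?:\s?(?:million|billion))?", clue)
        return " ".join(this_np + proper + amounts)

    clue = ("Before this hotel mogul's elbow broke through it, a Picasso he owned "
            "was worth $139 million; after, $85 million")
    print(build_query(clue))
    # hotel mogul Picasso $139 million $85 million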



DOUBLE JEOPARDY ROUND -- Wednesday, February 16, 2011   text archive
day 3 categories

CATEGORY: FAMILIAR SAYINGS
QUESTION: IT'S A POOR WORKMAN WHO BLAMES THESE


THE CONTEST WINNING QUESTION

IBM Watson wins $1,000,000

Category: 19th Century Novelists
Final Jeopardy Question: William Wilkinson's "An Account of the Principalities of Wallachia and Moldavia" inspired this author's most famous novel
Answer: Who is Bram Stoker?
Finishing second, Jeopardy master Ken Jennings concedes defeat to "our new computer overlords" [video]

What author was inspired by Wilkinson's Wallachia and Moldavia?
Many Jeopardy questions can be answered with a multi-stage search engine query. See algorithm below.
Question requests "this author". Answer verification search: Bram Stoker = author?


CONCLUSIONS

  • The Jeopardy contest was fair; the proportion of questions favoring human vs machine cognition was balanced.
  • The IBM team did a great job of producing a system which could evaluate options and answer a broad variety of questions in real time
  • The parallel architecture that IBM is focusing on is ideal because numerous hypotheses about text meaning must compete and collaborate in order to forge an interpretation.
  • Superficial text matching approaches to language processing are nonetheless sufficient to answer most questions
  • GRAMMATICAL ANALYSIS DEMONSTRATED? - "The Hedgehog and the Fox" is an essay on this Russian count's view of history by the liberal philosopher Isaiah Berlin; the answer is the count (Tolstoy), not the essay's author
  • Heitor Villa-Lobos dedicated his "12 Etudes" for this instrument (guitar) to Andres Segovia; relations: composer = Villa-Lobos, dedicated_to = Segovia
  • One can answer most Jeopardy questions with only about 5 algorithms
  • Knowledge of categories is important; can be inferred from many text examples
  • gender indicators, '20s date abbreviations, initial letters, rhymes, contains string church, state
  • extract subject of possessive
  • CATEGORIZE: Russian count = human. US city = place
  • VERIFY ANSWER. Reinsert the candidate answer in the question to verify
  • PROPER RELATIONSHIP - not always retrieving the composer; sometimes the instrument, sometimes the author
  • author=source. owner possessive
  • Most impressive: identifying the leg as George Eyser's anatomical oddity, a distinctive feature; not a question feasible by text match alone

Questions Likely Favoring Watson
The answer contains a unique combination of terms specified in the question. Ex:
  • famous quotations and distinctive phrases:
  • What's right twice a day? Clock.
  • What's a horse designed by committee? A camel.
  • What has 736 members? EU parliament
Lookup and rank - what's the northernmost of 3 cities? Look up coordinates and sort by latitude, as sketched below. Watson probably had a custom app for this. (Miles O'Brien demo competition 10:59/21:40)
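
A sketch of that lookup-and-rank routine, with coordinates filled in by hand for illustration.

    # Degrees of north latitude, entered by hand here; a real system would
    # look these up in a gazetteer.
    CITY_LATITUDE = {"Oslo": 59.9, "Copenhagen": 55.7, "Stockholm": 59.3}

    def northernmost(cities):
        """Rank candidate cities by latitude and return the northernmost one."""
        return max(cities, key=lambda c: CITY_LATITUDE[c])

    print(northernmost(["Oslo", "Copenhagen", "Stockholm"]))  # Oslo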

Questions Likely Difficult for Watson
Questions requiring semantic comprehension
Questions demanding physical world knowledge
Questions requiring emotional and moral reasoning
Questions with short text clues -- search too broad for automated approach
Questions whose answers are not explicitly stated and must be inferred
Questions diluted with mostly irrelevant terms
Questions involving negations
Questions which share no common vocabulary with the answer
Questions with common words -- chicken dish recipes (Miles O'Brien demo round 12:55/21:40)

FUTURE WORK PROPOSAL

  • INCREASE LANGUAGE COMPREHENSION PROFICIENCY
    • Level 1: Pattern recognition and processing
    • Level 2: Inference from syntax
    • Level 3: heuristics, adjective processing, predicate evaluation
    • Level 4: internal conceptual modeling
    • Level 5: adept processing with domain expertise
  • DEVELOP KNOWLEDGE RECORDING CAPABILITY - add learning capability to question answering system
  • IMPROVE QUESTION ANSWERING
    • cite sources
    • declare reasoning process
  • Future work: natural language knowledge recording, source citation, semantics, syntax
  • Jeopardy is an engaging forum in which to test AI systems
  • Watson should cite his sources
  • declare reasoning process
  • develop a more sophisticated internal representation of meaning to handle more probing questions
  • explain the basis of the humor or viewpoint
  • Need system transparency and test of values, wisdom in order to avoid HAL scenario
  • deep semantics, human situations and values. Why is the case of the Olympian remarkable? Because he overcame the adversity of losing a leg.
  • language choice guidance -- out of the ordinary. Positive connotation - remarkable, extraordinary, negative - strange, odd; clinical - deviation, anomalous
  • DEEP SEMANTICS, COMPUTER PHILOSOPHER
    • need to cultivate wise, moral computers to avoid HAL scenario
    • condensed ontology nucleus - ground concepts, enable scaling, avoid ad hoc rules
    • human nature, human condition
  • meaning - deprecated: cheating, murder; admired: overcoming adversity

SUGGESTED APPLICATIONS FOR MASSIVELY PARALLEL SYSTEMS

  • Assistance, enhancement and support, not replacement of humans
  • Diagnostics - correlate a complex symptom profile with diagnosis and treatment
  • Invention - Analyze patented and published methods to develop new solutions for problems
  • Mutual benefit transaction design - mediation, negotiation -- analyze goals and assets of parties and generate optimal solution for all
  • Decipher the genetic code, cure diseases, aging
  • AI for autonomous machines (space and ocean exploration, household maintenance assistance, dangerous apps) - track cognition with real-time world modeling

original intro


IBM System overview
Slate ezine: Ken Jennings comments about competing against Watson in Jeopardy
visual thesaurus -- How Watson Trounced the Humans -- Ben Zimmer
Liz Liddy, Syracuse iSchool dean comments   blog
Final Jeopardy - Stephen Baker: "It wasn't just that the computer had to master straightforward language, it had to master humor, nuance, puns, allusions, and slang"
Nova
Congressman beats Watson
Shakespeare Teacher: Ferrucci - How Watson Works
Watson vs. Google data mining RIT
Tech News Daily
quora transcript
Jeopardy match had out-takes
Jeopardy question statistics


Alex Trebek Q: Dave Eggers not-so-modestly titled his memoir "a heartbreaking work of" this
Watson: What is Staggering Genius?

Jeopardy Day 2 introduction: An IBM computer system able to rapidly understand and analyze natural language including puns, riddles, and complex questions across a broad range of knowledge, please welcome Watson.