1. introduction
    1. Modern Human Knowledge Base
      1. Library? no
      2. The Internet
      3. Especially Wikipedia
    2. How do we get cross-site information?
      1. search
        1. visit multiple webpages and extract/compare by hand
    3. Current cross-page information presentation (hand-made)
      1. Wikipedia "List of XXX" pages
        1. man-made information compiled from many entries
        2. slow
          1. need constant updates whenever entries are added or modified
      2. Product comparison pages
        1. product properties are on individual pages
        2. opinions are on a separate set of pages
        3. new products come out very often
      3. Too weak, and only a few list pages are available
    4. The corpus will be Wikipedia instead of the whole Net
      1. Pros:
        1. very high quality
        2. good grammar and vocabulary
        3. the data set is easy to obtain
        4. the data set is small, so processing runs faster
      2. Cons:
        1. Conventional facts are not included
          1. fun!
          2. lyrics
          3. conspiracy
          4. jokes
          5. idioms
          6. curiosity killed the cat
        2. may contain a lot of false information (rumors)
        3. little redundant data for verification
    5. development framework
      1. MapReduce (via Hadoop)
        1. used by Yahoo!; a free implementation of Google's backbone
        2. Wikipedia is still big
          1. 2,869,045 articles
          2. 2.1 GB for the abstracts alone
          3. 19 GB for the full text (XML format)
      2. Programming language
        1. (mainly) Python
        2. data streams are modularized, so mixing languages is easy
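Hadoop Streaming is what makes this language mixing possible: the mapper and reducer are ordinary programs that read lines on stdin and write tab-separated key/value lines on stdout, and Hadoop sorts by key in between. The following is a minimal Python sketch, not the project's actual job; it assumes a hypothetical input format of one article per line as `title<TAB>text` and simply counts tokens across articles.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map: emit 'token<TAB>1' for every token of every article.
    Assumes a hypothetical one-article-per-line 'title<TAB>text' input."""
    for line in lines:
        _title, _, text = line.rstrip("\n").partition("\t")
        for token in text.lower().split():
            yield f"{token}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Reduce: sum the counts per key. Relies on the input being sorted by key,
    which Hadoop's shuffle (or a local `sort`) guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] in ("map", "reduce"):
    # Same script serves as mapper or reducer, selected by the first argument.
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

Locally, the shuffle can be imitated with `sort`: `cat articles.tsv | python wc.py map | sort | python wc.py reduce` (both file names are hypothetical). On a cluster, the same script would be handed to Hadoop Streaming through its `-mapper` and `-reducer` options.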
  2. method
    1. What do facts look like in natural language (NL)?
      1. a method: "How to ..."
        1. step(Topic, (s1, s2, s3, s4, ...))
      2. a description: A is B's friend
        1. (A) -> (friend of B)
      3. a relation: cats eat fish
        1. (cat) --(eat)--> (fish)
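The three shapes above can be kept as three small record types. This is a minimal sketch, assuming Python dataclasses; the class and field names are hypothetical, and the "make tea" steps are an invented example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Method:
    """A how-to: step(Topic, (s1, s2, s3, ...)), with the steps kept in order."""
    topic: str
    steps: Tuple[str, ...]


@dataclass(frozen=True)
class Description:
    """'A is B's friend' becomes (A) -> (friend of B)."""
    subject: str
    description: str


@dataclass(frozen=True)
class Relation:
    """'cats eat fish' becomes (cat) --(eat)--> (fish)."""
    subject: str
    verb: str
    obj: str

    def __str__(self) -> str:
        return f"({self.subject}) --({self.verb})--> ({self.obj})"


facts = [
    Method("make tea", ("boil water", "add tea leaves", "steep")),  # invented example
    Description("A", "friend of B"),
    Relation("cat", "eat", "fish"),
]
print(facts[2])  # (cat) --(eat)--> (fish)
```

Frozen (immutable) records are hashable, so duplicate facts extracted from different articles can be merged with a plain `set`.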
    2. main method
      1. Extraction
        1. POS Tagging
          1. mark noun phrases and verb phrases
          2. Tools:
            1. YamCha
        2. Fact extraction
          1. NVN (noun, verb, noun)
            1. solid, but too narrow
          2. NP VP NP
            1. better
          3. NP*VP*NP
            1. limit the pattern length
          4. NP + be-verb + VBN + preposition
            1. past participle (p.p.): the relation runs in the opposite direction
        3. Normalization
          1. into the form A --B--> C
          2. part of speech
          3. passive
          4. phrases
        4. NER Tagging?
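The extraction patterns and the passive normalization above can be sketched over POS-tagged input. This is a minimal sketch under several assumptions: Penn Treebank tags, `*` in NP*VP*NP read as "a few intervening chunks", and a naive tag-based chunker standing in for YamCha (YamCha's own interface is not modeled). The names `chunk`, `extract`, and `max_gap` are hypothetical. Only a passive followed by "by" is reversed; other prepositions are left alone.

```python
from typing import List, Tuple

Token = Tuple[str, str]          # (word, Penn Treebank POS tag)
Chunk = Tuple[str, List[Token]]  # ("NP" | "VP" | "O", tokens)

# Naive chunking rules: a stand-in for a real chunker such as YamCha.
NP_TAGS = {"DT", "JJ", "JJR", "JJS", "NN", "NNS", "NNP", "NNPS", "PRP", "PRP$", "CD"}
VP_TAGS = {"MD", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "RB"}
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}


def chunk(tokens: List[Token]) -> List[Chunk]:
    """Merge consecutive NP-ish or VP-ish tags; every other token is its own 'O' chunk."""
    chunks: List[Chunk] = []
    for word, tag in tokens:
        label = "NP" if tag in NP_TAGS else "VP" if tag in VP_TAGS else "O"
        if label != "O" and chunks and chunks[-1][0] == label:
            chunks[-1][1].append((word, tag))
        else:
            chunks.append((label, [(word, tag)]))
    return chunks


def words(c: Chunk) -> str:
    return " ".join(w for w, _ in c[1])


def extract(tokens: List[Token], max_gap: int = 2) -> List[Tuple[str, str, str]]:
    """Match NP * VP * NP with at most max_gap chunks on each side of the VP,
    normalizing 'NP + be + VBN + by + NP' (passive) into the active direction."""
    chunks = chunk(tokens)
    facts = []
    for i, (label, vp) in enumerate(chunks):
        if label != "VP":
            continue
        # Nearest NP on each side, searching at most max_gap chunks away.
        left = [chunks[j] for j in range(i - 1, max(i - 1 - max_gap, -1), -1)
                if chunks[j][0] == "NP"]
        right = [chunks[j] for j in range(i + 1, min(i + 1 + max_gap, len(chunks)))
                 if chunks[j][0] == "NP"]
        if not left or not right:
            continue
        subj, obj, verb = words(left[0]), words(right[0]), words(chunks[i])
        followed_by_by = i + 1 < len(chunks) and words(chunks[i + 1]).lower() == "by"
        is_passive = any(w.lower() in BE_FORMS for w, _ in vp) and vp[-1][1] == "VBN"
        if is_passive and followed_by_by:
            # "Fish are eaten by cats" -> (cats) --(eaten)--> (Fish): reversed relation
            subj, obj, verb = obj, subj, vp[-1][0]
        facts.append((subj, verb, obj))
    return facts
```

With `max_gap=1` this reduces to the strict NP VP NP pattern; the passive case needs at least 2 because of the intervening "by". Lemmatizing the verb ("eaten" to "eat") would be a further normalization step not shown here.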
      2. Presenting
        1. Thesaurus on N
          1. for query expansion
          2. Tools:
            1. WordNet
        2. Thesaurus on V
          1. for query expansion
          2. Tools:
            1. WordNet
            2. VerbOcean
          3. auto-generate from N?
        3. Thesaurus on VP?
          1. auto-generate from N/NP?
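For query expansion, what a thesaurus must provide is a lookup from a word to its synonyms. WordNet groups words into synonym sets (synsets), one per sense; the sketch below assumes that shape and indexes such sets so a word maps to the union of every set it appears in. To stay self-contained, tiny hand-written sets stand in for WordNet; with NLTK, the real sets would come from `wordnet.synsets(word, pos=...)` and each synset's `lemma_names()`. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_thesaurus(synsets: Iterable[Iterable[str]]) -> Dict[str, Set[str]]:
    """Index synonym sets so each word maps to the union of every set containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for synset in synsets:
        members = {w.lower() for w in synset}
        for word in members:
            index[word] |= members
    return dict(index)


def expand(word: str, thesaurus: Dict[str, Set[str]]) -> Set[str]:
    """Synonyms of a word (itself included); unknown words expand only to themselves."""
    key = word.lower()
    return set(thesaurus.get(key, {key}))


# Toy synonym sets standing in for WordNet synsets (one list per sense).
nouns = build_thesaurus([["cat", "feline"], ["fish"], ["car", "automobile", "auto"]])
verbs = build_thesaurus([["eat", "consume"], ["eat", "feed"]])
print(sorted(expand("Eat", verbs)))  # ['consume', 'eat', 'feed']
```

Taking the union over all senses favors recall: "eat" picks up both "consume" and "feed" even though they come from different senses. Restricting expansion to fewer senses would trade that recall for precision.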
  3. presentation
    1. A limited search form
      1. 5W + Verb + NP
      2. NP + Verb + 5W
      3. Problems:
        1. 5W slots need NER tagging (e.g. "who" must match a person)
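A query in the limited form can be answered by matching it against stored (subject, verb, object) triples, treating the 5W word as a wildcard. The sketch below also shows why NER is needed: "who" should only accept person names. The `WH_TYPES` table, the entity labels, and the hand-supplied `ner` dictionary are hypothetical stand-ins for a real NER tagger's output, and verbs must match exactly here.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, verb, object)

# Hypothetical 5W -> entity type table; None means any answer is acceptable.
WH_TYPES: Dict[str, Optional[str]] = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "DATE",
    "what": None,
    "why": None,
}


def search(query: Triple, facts: List[Triple], ner: Dict[str, str]) -> List[str]:
    """Answer '5W + Verb + NP' or 'NP + Verb + 5W' against extracted triples."""
    subj, verb, obj = (s.lower() for s in query)
    if subj in WH_TYPES:    # 5W + Verb + NP: the answer fills the subject slot
        wh, answer_pos, fixed_pos, fixed = subj, 0, 2, obj
    elif obj in WH_TYPES:   # NP + Verb + 5W: the answer fills the object slot
        wh, answer_pos, fixed_pos, fixed = obj, 2, 0, subj
    else:
        raise ValueError("one slot of the query must be a 5W word")
    wanted_type = WH_TYPES[wh]
    answers = []
    for fact in facts:
        if fact[1].lower() != verb or fact[fixed_pos].lower() != fixed:
            continue
        answer = fact[answer_pos]
        # This type check is the part that depends on NER tagging.
        if wanted_type is None or ner.get(answer) == wanted_type:
            answers.append(answer)
    return answers
```

For example, with the invented facts `("Alice", "visit", "Paris")` and `("the committee", "visit", "Paris")`, the query `("who", "visit", "Paris")` returns only "Alice" when "Alice" is labeled PERSON, while `("what", "visit", "Paris")` returns both.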
    2. Query Expansion
      1. N expansion using a thesaurus
      2. VP expansion?
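Query expansion can rewrite one limited-form query into several, substituting thesaurus alternatives for the noun slot and the verb while leaving the 5W slot alone. A minimal sketch, assuming thesaurus lookups shaped as word-to-synonym-set dictionaries; the toy entries and function names are hypothetical, and only single-word verbs are handled, since how to expand whole VPs is still open.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Query = Tuple[str, str, str]  # one of the two noun slots holds a 5W word
FIVE_W = {"who", "what", "when", "where", "why"}


def alternatives(word: str, table: Dict[str, Set[str]]) -> List[str]:
    """The word itself first, then its thesaurus alternatives; 5W words are never expanded."""
    if word in FIVE_W:
        return [word]
    return [word] + sorted(table.get(word, set()) - {word})


def expand_query(query: Query, nouns: Dict[str, Set[str]],
                 verbs: Dict[str, Set[str]]) -> List[Query]:
    """All combinations of noun and verb alternatives, the original query first."""
    subj, verb, obj = (s.lower() for s in query)
    return list(product(alternatives(subj, nouns),
                        alternatives(verb, verbs),
                        alternatives(obj, nouns)))


# Toy thesaurus entries, invented for illustration.
NOUNS = {"fish": {"fish", "seafood"}}
VERBS = {"eat": {"eat", "consume", "feed"}}
for q in expand_query(("who", "eat", "fish"), NOUNS, VERBS):
    print(q)
```

Each alternative multiplies the number of queries, so a real system would probably cap the alternatives per slot.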
  4. Related works & topic
    1. related works
      1. TextRunner
        1. comparison: TextRunner's corpus is the whole Net rather than Wikipedia
    2. Comparison with other QA Systems
      1. traditional "Q&A" list
        2. man-made
        2. static
        3. works well for small scopes
        4. usually only indexed (browsable), not searchable
      2. forums
        1. contain more than just Q&As
          1. but much of the content is Q&A
      3. social "Q&A" (Yahoo! Answers)
        1. an alternative form of forum
        2. man-made (user-contributed)
        3. potentially poor quality
      4. smarter ones
        1. Answers.com
          1. not smart enough