1. introduction
    1. Philosophy in AI
      1. perception
      2. action
      3. reasoning
      4. problem-solving
      5. learning and adaptation
      6. sociality
      7. creativity
      8. two views
        1. behavior/action
        2. thought process/reasoning
    2. ideas for AI
      1. kinds of learning
        1. supervised learning
        2. unsupervised learning
        3. reinforcement learning
  2. Search and Problem Solving
    1. what is search
      1. cost
      2. solution
      3. effectiveness
      4. efficiency
      5. step
        1. 1、represent the problem
          1. initial states
          2. goal states
          3. represent the successor functions
          4. construct the state space
          5. find a way from initial states to goal states
        2. 2、search the graph for the solution
    2. problem representation
      1. state space
        1. s: the set of states
        2. f: the set of successor functions
        3. c: the set of function costs
        4. i: the set of initial states
        5. g: the set of goal states
        6. (S,F,C,I,G)
      2. state space based problem solving
        1. represent all states, in which the initial states and goal states are labeled
        2. represent all successor functions
        3. take states as node and successor functions as edge to get the corresponding graph
        4. search the graph for optimal or feasible solution
      3. and/or graph
        1. each node is corresponding with a problem
        2. and node= decomposition
        3. or node= transformation
        4. end nodes
          1. stop nodes= primitive problems
          2. other end nodes= unsolvable problems
    3. Graph Search
      1. idea
      2. basic data stucture
        1. open table
        2. closed table
      3. blind search strategies
        1. BFS
        2. DFS
        3. DLS: depth-limited search
        4. IDS: iterative deepening search
      4. heuristic search
        1. A* algorithm
          1. f(n)=g(n)+h(n)
          2. g(n)- cost so far to reach n
          3. h(n)- estimated cost from n to goal
          4. f(n)- estimated total cost of path through n to goal
          5. never overestimates the cost to reach the goal, it is optimistic
          6. if h(n) is admissible, A* is optimal for tree search
        2. AO* algorithm
          1. for And/or graph search
          2. two stages
          3. top-down graph generation
          4. bottom-up node evaluation
    4. Adversarial Search
      1. game tree
        1. 2-player
        2. deterministic
        3. turn-taking
        4. zero-sum
        5. perfect information
      2. minimax algorithm
        1. cutoff test:limit the depth of game tree
        2. evaluation function: evaluate the game state from the view of MAX
      3. α-β pruning
        1. improve search efficiency through pruning some unnecessary branches in the game tree
        2. use depth-limit search strategy
        3. α is the value of the best choice found so far at any choice point along the path for MAX
        4. if v is worse than α, MAX will avoid it- prune that branch
        5. define β similarly for MIN
  3. Knowledge and Reasoning
    1. Propositional logic and First-order logic
      1. general
        1. Logics are formal languages for representing information such that conclusions can be drawn, can be used as a tool for knowledge representation and reasoning
        2. Syntax defines the sentences in the language
        3. Semantics define the "meaning" of sentences
        4. Propositional logic is the simplest logic-illustrates basic ideas
        5. first-order logic assumes the world contains: Objects, Relations, Functions
        6. complex sentences are constructed from atomic sentences using logic connectives
      2. Inference in first-order logic
        1. sentences are true with respect to a model and an interpretation
          1. validity and satisfiablity
          2. model contains objects and relations among them
          3. Interpretation specifies referents for constant, predicate, function symbols
          4. A sentence is valid if it is true in all interpretations
          5. equivalence and entailment
          6. satisfiable--true in some interpretations
          7. unsatisfiable-- true in no interpretations
          8. two sentences are logically equivalent if they have same truth in all interpretations
          9. p entails another sentence Q iff P->Q is valid
          10. substitution and unification
          11. the first unifier is more general than the second
          12. there is a single most general unifier that is unique up to renaming of variables
      3. Resolution
        1. literal
        2. clause
        3. empty
        4. conjunctive normal form
        5. Resolution strategies
          1. unit preference
          2. set of support
          3. input resolution
          4. linear resolution
          5. elimination
    2. Ontology
      1. In AI
        1. a vocabulary used to describe some domain
        2. an explicit specification of the intended meaning of the vocabulary
        3. constraints capturing additional knowledge about the domain
      2. purpose
        1. share knowledge
        2. reuse domain knowledge
        3. make domain assumptions explicit
        4. distinguish domain knowledge from operational knowledge
      3. definition
        1. informal-from a specific domain
        2. formal-domain-specific vocabulary
      4. description logic
        1. reasoning tasks
          1. determining whether or not some individual satisfies a certain concept
          2. determining whether or not a concept is subsumed by another concept
        2. basics
          1. concepts
          2. roles
          3. individuals
          4. operators
    3. Reasoning under ignorance
      1. Probabilistic Reasoning
      2. Bayesian Belief Networks
        1. Exact inference by enumeration
      3. Dempster-Shafer Theory
        1. probabiliy assignment function
        2. belief function
        3. plausibility function
      4. Fuzzy Logic
        1. three strategies
          1. fire
          2. FIFA
          3. FAFI
  4. Artificial Neural Networks
    1. what is ANN
      1. connectionism
      2. bottom-up AI
        1. motivated by billogical neural systems
      3. artificial neurons
        1. input
        2. threshold
        3. y=f(g(x))
          1. f:activation function
          2. threshold
          3. linear
          4. saturation linear
          5. logistic sigmoid
          6. hyperbolic tangent sigmoid
          7. g:combination function
      4. two main problems
        1. architectures
        2. learning approaches
      5. ANN architecture
        1. feedforward architecture
          1. without loops
          2. static
        2. feedback(recurrent) architecutre
          1. with loops
          2. dynamic(non-linear dynamical systems)
      6. learning approaches to ANN
        1. learning model
        2. two categories
        3. learning strategies
          1. hebbrian learning
          2. error correction
          3. Delta Rule
          4. B-P learing
          5. competitive learning
          6. hard competition
          7. sofe competition
          8. stochastic learning
    2. Multi-layer perceptrons(MLP)
      1. single-layer perceptron
        1. adjust weights to reduce error on training set
        2. can represent AND,OR,NOT but not XOR
      2. multi-layer perceptron
        1. layers are usually fully connected
        2. numbers of hidden units typically chosen by hand
      3. B-P network
        1. architecture
        2. learning approach
          1. input data was put forward from input layer to hidden layer,then to out layer
          2. error info was propagated backward from out layer to hidden layer,then to input layer
        3. B-P derivation
          1. global error measure
          2. gradient descent
      4. Adaptive linear neuron
      5. Radial Basis function network
        1. a kind of multi-layer perceptron
          1. input neurons: pass inputs without transformation
          2. hidden neurons:using radial basis functions
          3. output neurons: adaline neurons
    3. Hopfield network and boltzmann machine
      1. hopfield network
        1. feedback architecture
        2. dynamic system
        3. two types
        4. hopfield model
      2. associative memories
        1. a primary application of hopfield network
        2. recalls data(attractor state)based on an input that partially resembles the data itself
        3. also named content-addressable memory
      3. combinatorial optimization
      4. boltzmann machine
        1. goal:get the globally optimal solution
        2. approach:replace the deterministic local search by a randomized local search
        3. metropolis algorithm
          1. based on monte carlo simulation
          2. start at a high temperature and then decrease it gradually
          3. the simulated annealing alorithm
          4. asymptotic convergence
          5. main drawback is the large amount of computational time
        4. boltzmann machine
          1. starts execution with a random initial configuration
          2. a cooling schedual determines how and when to decrement the control parameter T
          3. no state transitions occur, boltzmann machine has reached the final state
          4. in the first phase,the connection weights are determined
          5. in the second phase, the machine searches the global minimum through the annealing procedure
    4. self-organizing feature map of kohonen
      1. what is SOFM
        1. neural network with unsupervised learning
        2. dimensionality reduction concomitant with preservation of topological info
        3. three principals
          1. self-reinforcing
          2. competition
          3. finding the best matching weight vector for the present input
          4. criterion for determining the winning neuron
          5. cooperation
          6. identify a neighborhood around the winning neuron
          7. topological neighborhood can be of different shapes such as square, hexagonal or gaussian
          8. the width of the neighborhood is a function of time
          9. adaptation
          10. weights of neurons within the winning cluster are updated
  5. Machine learning
    1. what,why and how do machines learn?
      1. defining the learning task
      2. learning element
        1. design of a learning element
        2. type of feedback
          1. supervised learning:correct answers for each example
          2. unsupervised learning:correct answers not given
          3. reinforcement learning:occasional rewards
    2. supervised learning
      1. rote learning
        1. learning=memory
        2. save solutions to problems and retrieve them when encountering the same problem
        3. trade-off between costs of saving and computing
      2. what is supervised learning
        1. learn a function from examples
          1. f is the target function
          2. an example is a pair(x,f(x))
          3. problem:find a hypothesis h,h ≈ f
        2. learning principle
          1. construct/adjust h to agree with f on training set
          2. ockham's razor
      3. decision trees and learning
        1. expressiveness
        2. hypothesis spaces
          1. learning task
          2. prefer to find more compact decision trees
        3. decision tree learning
          1. aim:find a small tree consistent with the training examples
          2. idea:choose "most significant" attribute as root of tree
          3. choose an attribute
          4. using info theory
          5. info gain
          6. overfitting
          7. some other hypothesis that fits the training example less well actually performs better over the entire distribution of instances
          8. stop growing the tree earlier,befor it reaches the point where it perfectly classifies the training data
          9. allow the tree to overfit the data,and then post-prune the tree
          10. minimum description length criterion
          11. occam's razor
          12. prefer the simplest hypothesis that fit the data
          13. choose the shortest explanation for the observed data
          14. explain the train data under decision tree
          15. DL=size of the hypothesis and any additional cost of encoding the data given this hypothesis
          16. encoding the decision tree
          17. encoding the additional data which can not be explained by the decision tree
          18. learning decision tree with MDL
          19. use the criterion of MDL increase, instead of Max IG
          20. after the tree is generated,prune it at bottom-up mode until DL can not be reduced
      4. learning in bayesian inference
        1. Bayes' rule
        2. maximum a posterior criterion
          1. finding the most probable hypothesis
          2. determine the MAP hypotheses by using Bayes theorem to calculate the posterior probability of each candidate hypothesis
        3. what are learned in Bayesian inference
          1. class-conditional probabilities P(D|h)
          2. prior probabilities P(h)
          3. assign the same prior probability to each hypothesis
          4. measure hypothesis frequencies over training data
        4. naive bayesian classifiers(NBC)
          1. attribute values are conditionally independent
        5. Bayesian Belief networks(BBN)
          1. conditional independence assumption apply to set of attributes
          2. X is conditionally independent of Y given Z
          3. learning Bayesian Belief Networks
          4. learning the conditional probability tables
          5. the variables are fully observable
          6. only some of the variables are observed
          7. learning the network structure
          8. gradient ascent training
          9. objective(Mazimum likelihood estimation)
          10. optimization method: gradient ascent
      5. discriminative learning for Bayeisan classifiers:MMP
        1. statistical modeling tool: Gaussian Mixture Models
        2. two categories of learning pattern classifiers
          1. generative learning
          2. the first concern is the fit of the class model to observed data
          3. discriminative learning
          4. directly consider the discrimination between classes in the training phase
          5. DLA demonstrate significant better performance over generative counterparts
    3. unsupervised learning:clustering
      1. what is clustering
        1. classical unsupervised learning
        2. the process of dividing the data set into clusters according to the similarity between data
        3. discovering groups and identifying interesting distributions in the underlying data
      2. two key problesm
        1. how to measure the similarity between data or clusters
          1. distance between two ordered data should satisfy some axioms
          2. widely used distance include Euclidean, City block and so on
          3. statistical measure:represent clusters by using statistical models such as Gaussian
        2. how to divide the data set
      3. partition clustering
        1. partition data set into K subsets under defined criterion
        2. search the optimal partition in the set of all possible partitions
          1. blind search
          2. heuristic search
          3. k-means algorithm
          4. features
          5. time complexity O(tkn)
          6. local optimality
          7. need to preset the number of clusters
          8. sensitive to noises and initial means
          9. k-medoids algorithm
          10. also called PAM
          11. idea
          12. data is partitioned according to its distance to medoids
          13. improve clustering quality through iteratively substituting representative data by other data
          14. improvement of PAM
          15. improve the efficiency, especially for large data set
          16. improve algorithm
      4. hierarchical clustering
        1. similarity measure between clusters is important
        2. bottom-up agglomerative nesting
        3. top-down divisive analysis
        4. CURE algorithm
      5. statistical clustering
        1. the foundation of statistical clustering is probability
        2. the probability acts as the distance between data and the cluster
          1. higher probability means closer distance
          2. lower probability means farther distance
        3. typical method:GMM-MDL clustering
          1. the data modeling tool is Gaussian Mixture Model. each component in GMM denotes a cluster
          2. the EM algorithm is applied to fit GMM with specific number of components to the data
          3. the MDL criterion is adopted to determine the optimal number of GMM components
          4. data is clustered to corresponding GMM component
        4. EM algorithm
          1. learning criterion:Maximum likelihood
          2. solve the incomplete-data problem
          3. E-step: calculating the unknown data and computing the conditional expectation of log likelihood
          4. M-step:update the parameters through maximizing the log-likelihood
  6. Nouvelle AI
    1. autonomous agents
      1. agent definitions
        1. five key qualities
          1. situatedness
          2. autonomous
          3. proactive
          4. reactive
          5. social ability
      2. single-agent architectures
        1. deliberative agent systems
        2. reactive
        3. hybrid
      3. practical reasoning
      4. BDI architecture
        1. Beliefs-modeling world states
        2. Desires-choice between possible states
        3. Intentions-commitment to achieving particular state
      5. reactive architecture
      6. hybrid architectures
      7. turing machines
        1. three layers
          1. reactive layer
          2. planning layer
          3. modeling layer
    2. reinforcement learning
      1. computing return from rewards
        1. additive rewards
        2. average rewards
        3. discounted rewards
  7. evolutionary computation
    1. heuristic search methods
      1. deterministic search
        1. calculus-based
        2. greedy
        3. hill-climbing
        4. branch-and-bound
        5. mathematic programming
      2. stochastic search
        1. random search
        2. simulated annealing
        3. monte carlo
        4. tabu search
    2. biological inspiration to search
      1. darwinian evolution
        1. works on population scale,not individual
        2. survival of the fittest
        3. diversity drives change
        4. chance plays a part
    3. what is evolutionary computation
      1. evolution mechanism
        1. increasing diversity by genetic operators
        2. decreasing diversity by selection
      2. five factors
        1. representation
          1. genotypic representations
          2. phenotypic representations
        2. fitness evaluation
        3. genetic operations
          1. recombination
          2. mutation
        4. selection strategies
          1. parent selection
          2. aid escape from local optima
          3. survivor selection
          4. fitness based
          5. age based
          6. exploitation/exploration balance
          7. selection:exploitation
          8. genetic operators:exploration
          9. key issue:appropriate balance
        5. initialization\Termination schemes
          1. initialisation usually done at random
          2. termination condition checked every generation
      3. advantages
        1. widely applicable
        2. no a priori assumptions
        3. insensitive to noise
        4. parallelize
      4. disadvantages
        1. no guarantee for optimal solution
        2. weak theoretical basis
        3. parameter tuning
    4. evolutionary algorithm
      1. genetic algorithm(GA)
        1. simple genetic algorithm
        2. the genome is used to represent candidate solutions
        3. selection
          1. greedy:the fittest solutions are selected
          2. probabilistic:chances proportional to fitness
        4. crossover
        5. mutation
          1. Pm: mutation rate
      2. evolutionary programming(EP)
        1. representation:for continuous parameter optimization
        2. mutation
        3. recombination:none
        4. selection
      3. evolution strategies(ES)
      4. genetic programming(GP)
  8. swarm intelligence
    1. what is swarm intelligence
      1. components of SI
        1. agents
        2. simple behaviours
        3. communication
      2. characteristics of SI
        1. distributed,no central control or data source
        2. limited communication
        3. no model of the environment
        4. perception of environment
        5. ability to react to environment changes
    2. multi-agents system
      1. what is multi-agent systems
        1. a set of agents which interact in a common environment
        2. collaborative resolution of global problems by a set of distributed entities
        3. satisfy their own local goals as well as the collaborative global goals
        4. require the ability to cooperate,coordinate and negotiate with each other
        5. MAS as seen from distributed AI
      2. multi-agent interaction
        1. key elements to achieve multi-agent interaction
          1. coordination mechanism supported by a common agent communication language and protocol
          2. collaboration mechanism to support the organization goal
          3. a shared ontology
        2. coordination
          1. common dependencies among activities
          2. artificial communication
          3. computer communication
          4. communication in MAS
          5. low-level communication
          6. high-level communication
          7. direct communication vs indirect communication
          8. theory of speech acts
          9. locution:physical utterance by the speaker
          10. illocution:the intended meaning of the utterance by the speaker
          11. prelocution:the action that results from the locution
          12. KQML
          13. three layers
          14. content layer
          15. message layer
          16. communication layer
        3. collaboration
    3. simulate SI for search
      1. ant colony optimization
      2. particle swarm optimization
        1. advantages of PSO
          1. adaptation operates on velocities
          2. simple in concept
          3. easy to implement
          4. computationally efficient
          5. effective on a variety of problems