-
introduction
-
Philosophy in AI
- perception
- action
- reasoning
- problem-solving
- learning and adaptation
- sociality
- creativity
-
two views
- behavior/action
- thought process/reasoning
-
ideas for AI
-
kinds of learning
- supervised learning
- unsupervised learning
- reinforcement learning
-
Search and Problem Solving
-
what is search
- cost
- solution
- effectiveness
- efficiency
-
step
-
1、represent the problem
- initial states
- goal states
- represent the successor functions
- construct the state space
- find a way from initial states to goal states
- 2、search the graph for the solution
-
problem representation
-
state space
- s: the set of states
- f: the set of successor functions
- c: the set of function costs
- i: the set of initial states
- g: the set of goal states
- (S,F,C,I,G)
-
state space based problem solving
- represent all states, in which the initial states and goal states are labeled
- represent all successor functions
- take states as node and successor functions as edge to get the corresponding graph
- search the graph for optimal or feasible solution
-
and/or graph
- each node is corresponding with a problem
- and node= decomposition
- or node= transformation
-
end nodes
- stop nodes= primitive problems
- other end nodes= unsolvable problems
-
Graph Search
- idea
-
basic data stucture
- open table
- closed table
-
blind search strategies
- BFS
- DFS
- DLS: depth-limited search
- IDS: iterative deepening search
-
heuristic search
-
A* algorithm
- f(n)=g(n)+h(n)
- g(n)- cost so far to reach n
- h(n)- estimated cost from n to goal
- f(n)- estimated total cost of path through n to goal
- never overestimates the cost to reach the goal, it is optimistic
- if h(n) is admissible, A* is optimal for tree search
-
AO* algorithm
- for And/or graph search
- two stages
- top-down graph generation
- bottom-up node evaluation
-
Adversarial Search
-
game tree
- 2-player
- deterministic
- turn-taking
- zero-sum
- perfect information
-
minimax algorithm
- cutoff test:limit the depth of game tree
- evaluation function: evaluate the game state from the view of MAX
-
α-β pruning
- improve search efficiency through pruning some unnecessary branches in the game tree
- use depth-limit search strategy
- α is the value of the best choice found so far at any choice point along the path for MAX
- if v is worse than α, MAX will avoid it- prune that branch
- define β similarly for MIN
-
Knowledge and Reasoning
-
Propositional logic and First-order logic
-
general
- Logics are formal languages for representing information such that conclusions can be drawn, can be used as a tool for knowledge representation and reasoning
- Syntax defines the sentences in the language
- Semantics define the "meaning" of sentences
- Propositional logic is the simplest logic-illustrates basic ideas
- first-order logic assumes the world contains: Objects, Relations, Functions
- complex sentences are constructed from atomic sentences using logic connectives
-
Inference in first-order logic
-
sentences are true with respect to a model and an interpretation
- validity and satisfiablity
- model contains objects and relations among them
- Interpretation specifies referents for constant, predicate, function symbols
- A sentence is valid if it is true in all interpretations
- equivalence and entailment
- satisfiable--true in some interpretations
- unsatisfiable-- true in no interpretations
- two sentences are logically equivalent if they have same truth in all interpretations
- p entails another sentence Q iff P->Q is valid
- substitution and unification
- the first unifier is more general than the second
- there is a single most general unifier that is unique up to renaming of variables
-
Resolution
- literal
- clause
- empty
- conjunctive normal form
-
Resolution strategies
- unit preference
- set of support
- input resolution
- linear resolution
- elimination
-
Ontology
-
In AI
- a vocabulary used to describe some domain
- an explicit specification of the intended meaning of the vocabulary
- constraints capturing additional knowledge about the domain
-
purpose
- share knowledge
- reuse domain knowledge
- make domain assumptions explicit
- distinguish domain knowledge from operational knowledge
-
definition
- informal-from a specific domain
- formal-domain-specific vocabulary
-
description logic
-
reasoning tasks
- determining whether or not some individual satisfies a certain concept
- determining whether or not a concept is subsumed by another concept
-
basics
- concepts
- roles
- individuals
- operators
-
Reasoning under ignorance
- Probabilistic Reasoning
-
Bayesian Belief Networks
- Exact inference by enumeration
-
Dempster-Shafer Theory
- probabiliy assignment function
- belief function
- plausibility function
-
Fuzzy Logic
-
three strategies
- fire
- FIFA
- FAFI
-
Artificial Neural Networks
-
what is ANN
- connectionism
-
bottom-up AI
- motivated by billogical neural systems
-
artificial neurons
- input
- threshold
-
y=f(g(x))
- f:activation function
- threshold
- linear
- saturation linear
- logistic sigmoid
- hyperbolic tangent sigmoid
- g:combination function
-
two main problems
- architectures
- learning approaches
-
ANN architecture
-
feedforward architecture
- without loops
- static
-
feedback(recurrent) architecutre
- with loops
- dynamic(non-linear dynamical systems)
-
learning approaches to ANN
- learning model
- two categories
-
learning strategies
- hebbrian learning
- error correction
- Delta Rule
- B-P learing
- competitive learning
- hard competition
- sofe competition
- stochastic learning
-
Multi-layer perceptrons(MLP)
-
single-layer perceptron
- adjust weights to reduce error on training set
- can represent AND,OR,NOT but not XOR
-
multi-layer perceptron
- layers are usually fully connected
- numbers of hidden units typically chosen by hand
-
B-P network
- architecture
-
learning approach
- input data was put forward from input layer to hidden layer,then to out layer
- error info was propagated backward from out layer to hidden layer,then to input layer
-
B-P derivation
- global error measure
- gradient descent
- Adaptive linear neuron
-
Radial Basis function network
-
a kind of multi-layer perceptron
- input neurons: pass inputs without transformation
- hidden neurons:using radial basis functions
- output neurons: adaline neurons
-
Hopfield network and boltzmann machine
-
hopfield network
- feedback architecture
- dynamic system
- two types
- hopfield model
-
associative memories
- a primary application of hopfield network
- recalls data(attractor state)based on an input that partially resembles the data itself
- also named content-addressable memory
- combinatorial optimization
-
boltzmann machine
- goal:get the globally optimal solution
- approach:replace the deterministic local search by a randomized local search
-
metropolis algorithm
- based on monte carlo simulation
- start at a high temperature and then decrease it gradually
- the simulated annealing alorithm
- asymptotic convergence
- main drawback is the large amount of computational time
-
boltzmann machine
- starts execution with a random initial configuration
- a cooling schedual determines how and when to decrement the control parameter T
- no state transitions occur, boltzmann machine has reached the final state
- in the first phase,the connection weights are determined
- in the second phase, the machine searches the global minimum through the annealing procedure
-
self-organizing feature map of kohonen
-
what is SOFM
- neural network with unsupervised learning
- dimensionality reduction concomitant with preservation of topological info
-
three principals
- self-reinforcing
- competition
- finding the best matching weight vector for the present input
- criterion for determining the winning neuron
- cooperation
- identify a neighborhood around the winning neuron
- topological neighborhood can be of different shapes such as square, hexagonal or gaussian
- the width of the neighborhood is a function of time
- adaptation
- weights of neurons within the winning cluster are updated
-
Machine learning
-
what,why and how do machines learn?
- defining the learning task
-
learning element
- design of a learning element
-
type of feedback
- supervised learning:correct answers for each example
- unsupervised learning:correct answers not given
- reinforcement learning:occasional rewards
-
supervised learning
-
rote learning
- learning=memory
- save solutions to problems and retrieve them when encountering the same problem
- trade-off between costs of saving and computing
-
what is supervised learning
-
learn a function from examples
- f is the target function
- an example is a pair(x,f(x))
- problem:find a hypothesis h,h ≈ f
-
learning principle
- construct/adjust h to agree with f on training set
- ockham's razor
-
decision trees and learning
- expressiveness
-
hypothesis spaces
- learning task
- prefer to find more compact decision trees
-
decision tree learning
- aim:find a small tree consistent with the training examples
- idea:choose "most significant" attribute as root of tree
- choose an attribute
- using info theory
- info gain
- overfitting
- some other hypothesis that fits the training example less well actually performs better over the entire distribution of instances
- stop growing the tree earlier,befor it reaches the point where it perfectly classifies the training data
- allow the tree to overfit the data,and then post-prune the tree
- minimum description length criterion
- occam's razor
- prefer the simplest hypothesis that fit the data
- choose the shortest explanation for the observed data
- explain the train data under decision tree
- DL=size of the hypothesis and any additional cost of encoding the data given this hypothesis
- encoding the decision tree
- encoding the additional data which can not be explained by the decision tree
- learning decision tree with MDL
- use the criterion of MDL increase, instead of Max IG
- after the tree is generated,prune it at bottom-up mode until DL can not be reduced
-
learning in bayesian inference
- Bayes' rule
-
maximum a posterior criterion
- finding the most probable hypothesis
- determine the MAP hypotheses by using Bayes theorem to calculate the posterior probability of each candidate hypothesis
-
what are learned in Bayesian inference
- class-conditional probabilities P(D|h)
- prior probabilities P(h)
- assign the same prior probability to each hypothesis
- measure hypothesis frequencies over training data
-
naive bayesian classifiers(NBC)
- attribute values are conditionally independent
-
Bayesian Belief networks(BBN)
- conditional independence assumption apply to set of attributes
- X is conditionally independent of Y given Z
- learning Bayesian Belief Networks
- learning the conditional probability tables
- the variables are fully observable
- only some of the variables are observed
- learning the network structure
- gradient ascent training
- objective(Mazimum likelihood estimation)
- optimization method: gradient ascent
-
discriminative learning for Bayeisan classifiers:MMP
- statistical modeling tool: Gaussian Mixture Models
-
two categories of learning pattern classifiers
- generative learning
- the first concern is the fit of the class model to observed data
- discriminative learning
- directly consider the discrimination between classes in the training phase
- DLA demonstrate significant better performance over generative counterparts
-
unsupervised learning:clustering
-
what is clustering
- classical unsupervised learning
- the process of dividing the data set into clusters according to the similarity between data
- discovering groups and identifying interesting distributions in the underlying data
-
two key problesm
-
how to measure the similarity between data or clusters
- distance between two ordered data should satisfy some axioms
- widely used distance include Euclidean, City block and so on
- statistical measure:represent clusters by using statistical models such as Gaussian
- how to divide the data set
-
partition clustering
- partition data set into K subsets under defined criterion
-
search the optimal partition in the set of all possible partitions
- blind search
- heuristic search
- k-means algorithm
- features
- time complexity O(tkn)
- local optimality
- need to preset the number of clusters
- sensitive to noises and initial means
- k-medoids algorithm
- also called PAM
- idea
- data is partitioned according to its distance to medoids
- improve clustering quality through iteratively substituting representative data by other data
- improvement of PAM
- improve the efficiency, especially for large data set
- improve algorithm
-
hierarchical clustering
- similarity measure between clusters is important
- bottom-up agglomerative nesting
- top-down divisive analysis
- CURE algorithm
-
statistical clustering
- the foundation of statistical clustering is probability
-
the probability acts as the distance between data and the cluster
- higher probability means closer distance
- lower probability means farther distance
-
typical method:GMM-MDL clustering
- the data modeling tool is Gaussian Mixture Model. each component in GMM denotes a cluster
- the EM algorithm is applied to fit GMM with specific number of components to the data
- the MDL criterion is adopted to determine the optimal number of GMM components
- data is clustered to corresponding GMM component
-
EM algorithm
- learning criterion:Maximum likelihood
- solve the incomplete-data problem
- E-step: calculating the unknown data and computing the conditional expectation of log likelihood
- M-step:update the parameters through maximizing the log-likelihood
-
Nouvelle AI
-
autonomous agents
-
agent definitions
-
five key qualities
- situatedness
- autonomous
- proactive
- reactive
- social ability
-
single-agent architectures
- deliberative agent systems
- reactive
- hybrid
- practical reasoning
-
BDI architecture
- Beliefs-modeling world states
- Desires-choice between possible states
- Intentions-commitment to achieving particular state
- reactive architecture
- hybrid architectures
-
turing machines
-
three layers
- reactive layer
- planning layer
- modeling layer
-
reinforcement learning
-
computing return from rewards
- additive rewards
- average rewards
- discounted rewards
-
evolutionary computation
-
heuristic search methods
-
deterministic search
- calculus-based
- greedy
- hill-climbing
- branch-and-bound
- mathematic programming
-
stochastic search
- random search
- simulated annealing
- monte carlo
- tabu search
-
biological inspiration to search
-
darwinian evolution
- works on population scale,not individual
- survival of the fittest
- diversity drives change
- chance plays a part
-
what is evolutionary computation
-
evolution mechanism
- increasing diversity by genetic operators
- decreasing diversity by selection
-
five factors
-
representation
- genotypic representations
- phenotypic representations
- fitness evaluation
-
genetic operations
- recombination
- mutation
-
selection strategies
- parent selection
- aid escape from local optima
- survivor selection
- fitness based
- age based
- exploitation/exploration balance
- selection:exploitation
- genetic operators:exploration
- key issue:appropriate balance
-
initialization\Termination schemes
- initialisation usually done at random
- termination condition checked every generation
-
advantages
- widely applicable
- no a priori assumptions
- insensitive to noise
- parallelize
-
disadvantages
- no guarantee for optimal solution
- weak theoretical basis
- parameter tuning
-
evolutionary algorithm
-
genetic algorithm(GA)
- simple genetic algorithm
- the genome is used to represent candidate solutions
-
selection
- greedy:the fittest solutions are selected
- probabilistic:chances proportional to fitness
- crossover
-
mutation
- Pm: mutation rate
-
evolutionary programming(EP)
- representation:for continuous parameter optimization
- mutation
- recombination:none
- selection
- evolution strategies(ES)
- genetic programming(GP)
-
swarm intelligence
-
what is swarm intelligence
-
components of SI
- agents
- simple behaviours
- communication
-
characteristics of SI
- distributed,no central control or data source
- limited communication
- no model of the environment
- perception of environment
- ability to react to environment changes
-
multi-agents system
-
what is multi-agent systems
- a set of agents which interact in a common environment
- collaborative resolution of global problems by a set of distributed entities
- satisfy their own local goals as well as the collaborative global goals
- require the ability to cooperate,coordinate and negotiate with each other
- MAS as seen from distributed AI
-
multi-agent interaction
-
key elements to achieve multi-agent interaction
- coordination mechanism supported by a common agent communication language and protocol
- collaboration mechanism to support the organization goal
- a shared ontology
-
coordination
- common dependencies among activities
- artificial communication
- computer communication
- communication in MAS
- low-level communication
- high-level communication
- direct communication vs indirect communication
- theory of speech acts
- locution:physical utterance by the speaker
- illocution:the intended meaning of the utterance by the speaker
- prelocution:the action that results from the locution
- KQML
- three layers
- content layer
- message layer
- communication layer
- collaboration
-
simulate SI for search
- ant colony optimization
-
particle swarm optimization
-
advantages of PSO
- adaptation operates on velocities
- simple in concept
- easy to implement
- computationally efficient
- effective on a variety of problems