Suppose we have data D = {x(i)} (i = 1, …, N). Then θ(MLE) = argmax_θ Π(i=1 to N) P(x(i) | θ).
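As a concrete illustration of the argmax above, here is a minimal sketch for a Bernoulli (coin-flip) model, where θ is the probability of observing 1. The Bernoulli choice, the function names, and the data are assumptions made for illustration only. For this model the maximizer has a closed form (the fraction of ones), and a coarse grid search over θ is included as a sanity check.

```python
import math

def bernoulli_log_likelihood(theta, data):
    """Log of the product over i of P(x(i) | theta), for x(i) in {0, 1}."""
    return sum(math.log(theta) if x == 1 else math.log(1 - theta) for x in data)

def bernoulli_mle(data):
    """Closed-form maximizer of the Bernoulli likelihood: the fraction of ones."""
    return sum(data) / len(data)

# Made-up coin flips (6 ones out of 8), for illustration only.
data = [1, 0, 1, 1, 0, 1, 1, 1]
theta_mle = bernoulli_mle(data)

# Sanity check: maximize the log-likelihood over a grid of theta values.
grid = [k / 1000 for k in range(1, 1000)]
theta_grid = max(grid, key=lambda t: bernoulli_log_likelihood(t, data))

print(theta_mle, theta_grid)
```

Working with the log of the product is a common design choice: it turns the product into a sum, does not change the argmax, and avoids numerical underflow for large N.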
Maximum A Posteriori
Suppose we have data D = {x(i)} (i = 1, …, N). Then θ(MAP) = argmax_θ [Π(i=1 to N) P(x(i) | θ)] P(θ), where P(θ) is the prior over θ.
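To make the role of the prior concrete, here is a minimal sketch of MAP estimation for a Bernoulli model with a Beta(a, b) prior on θ. The Bernoulli/Beta pairing, the function names, and the data are assumptions chosen for illustration. For a, b > 1 the maximizer of likelihood times prior has the closed form (k + a - 1) / (N + a + b - 2), where k is the number of ones.

```python
import math

def log_posterior_unnormalized(theta, data, a, b):
    """log [ prod_i P(x(i) | theta) * P(theta) ], up to a constant in theta."""
    log_lik = sum(math.log(theta) if x == 1 else math.log(1 - theta) for x in data)
    # Beta(a, b) prior density, dropping its normalizing constant.
    log_prior = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    return log_lik + log_prior

def bernoulli_map(data, a, b):
    """Closed-form MAP estimate under a Beta(a, b) prior (assumes a, b >= 1)."""
    k, n = sum(data), len(data)
    return (k + a - 1) / (n + a + b - 2)

# Made-up coin flips (6 ones out of 8), for illustration only.
data = [1, 0, 1, 1, 0, 1, 1, 1]

theta_map = bernoulli_map(data, a=2, b=2)      # prior pulls the estimate toward 0.5
theta_uniform = bernoulli_map(data, a=1, b=1)  # uniform prior: same as the MLE

# Sanity check: maximize the unnormalized log-posterior over a grid.
grid = [k / 1000 for k in range(1, 1000)]
theta_grid = max(grid, key=lambda t: log_posterior_unnormalized(t, data, 2, 2))

print(theta_map, theta_uniform, theta_grid)
```

With a uniform prior (a = b = 1), P(θ) is constant, so the MAP estimate coincides with the maximum likelihood estimate.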
Generative approaches
The hypothesis h(x,y) = p(x,y) specifies a generative story for how the data was created. We then pick a hypothesis by maximum likelihood estimation (MLE) or maximum a posteriori (MAP) estimation.
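One possible generative story, sketched below, is: first draw a label y from p(y), then draw a one-dimensional feature x from a Gaussian p(x | y), so that p(x,y) = p(y) p(x | y). The Gaussian assumption, the function names, and the toy data are illustrative choices, not something the notes prescribe. The parameters are fit by maximum likelihood, and a label is predicted by picking the y that maximizes the joint p(x,y).

```python
import math
from collections import defaultdict

def fit_generative(xs, ys):
    """MLE for p(y) and a 1-D Gaussian p(x | y); returns {y: (prior, mean, var)}."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    model = {}
    for y, vals in groups.items():
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)  # MLE variance (divide by n)
        model[y] = (len(vals) / len(xs), mean, var)
    return model

def log_joint(model, x, y):
    """log p(x, y) = log p(y) + log p(x | y)."""
    prior, mean, var = model[y]
    return math.log(prior) - 0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(model, x):
    """Pick the label with the highest joint probability p(x, y)."""
    return max(model, key=lambda y: log_joint(model, x, y))

# Made-up training data, for illustration only.
xs = [1.0, 1.2, 0.8, 3.0, 3.3, 2.7]
ys = [0, 0, 0, 1, 1, 1]
model = fit_generative(xs, ys)

print(predict(model, 1.1), predict(model, 2.9))
```

Because the model describes the full joint distribution, it could also be used to sample new (x, y) pairs, which a purely discriminative model cannot do.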
Discriminative approaches
The hypothesis h directly predicts the label given the features, y = h(x), or more generally models the conditional distribution, p(y|x) = h(x). We then define a loss function and find the hypothesis with minimum loss.
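As one example of this recipe, the sketch below uses logistic regression: the hypothesis is h(x) = sigmoid(w*x + b), interpreted as p(y = 1 | x), the loss is the average log loss, and the minimum is approximated with plain gradient descent. The model choice, learning rate, step count, function names, and toy data are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, xs, ys):
    """Average negative log-likelihood of the labels under h(x) = sigmoid(w*x + b)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)

def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Minimize the log loss with batch gradient descent, starting from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up, slightly overlapping training data, for illustration only.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [0, 0, 1, 0, 1, 1, 1]

initial_loss = log_loss(0.0, 0.0, xs, ys)
w, b = fit_logistic(xs, ys)
final_loss = log_loss(w, b, xs, ys)

print(initial_loss, final_loss, sigmoid(w * 3.5 + b))
```

Note that nothing here models how x itself is distributed; only the mapping from x to the label is learned.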