Saturday, October 15, 2011

Machine Learning online course - Class 1

As I mentioned in my previous blog post, I am going to use this blog as my course notebook. All posts related to this course will have an "ml-class" tag, just in case.

The first class was all introductory material, as expected. What I really liked about this class was the real-world examples, which were very useful for understanding what to expect from the course. Anyway, here are my notes for the class:

Initially there were formal definitions of Machine Learning, one of them with rhyming phrases. I think we can skip those parts.

There are two types of learning algorithms - Supervised and Unsupervised

1) Supervised - A bunch of right answers are already provided to the machine. The machine has to try to produce more of those right answers for the next set of questions.
The data provided already has a sense of direction: it is a set of input values paired with their output values, and we have to predict the output value for a new input based on the existing data. Here the output attribute (the thing we want to predict) is known and defined in advance. We have to find a suitable function which, when applied to the given input values, best matches the corresponding output values. The same function is then used to predict the output values of new inputs.
  - Eg :
    1) Predicting the price of a house of a particular size, given the prices of various houses of varying sizes.
    2) Predicting whether a tumor is malignant or not based on its size, given the answers for tumors of various sizes.
 
  Different Types
    1) Regression - The machine tries to predict a continuous-valued attribute, i.e. the attribute we are trying to predict takes values from a continuous range. (The house price example)
    2) Classification - The machine tries to predict a discrete-valued output, i.e. the possible values form a small, finite set of discrete values. (The tumor example)
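
To make the two types concrete for myself, here is a minimal sketch in Python. Everything in it is an assumption on my part, not something shown in the class: the data values are made up, the names `fit_line`, `fit_threshold`, `predict_price` and `predict_malignant` are hypothetical, and the tumor rule is just a simple midpoint threshold that only works when the two groups do not overlap in size.

```python
# Regression: learn a function from (size, price) pairs, then predict a
# continuous value for a new size. The data below is invented for illustration.
sizes = [1000.0, 1500.0, 2000.0, 2500.0]   # house size in square feet
prices = [200.0, 300.0, 400.0, 500.0]      # price in thousands of dollars


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sizes, prices)


def predict_price(size):
    """Apply the learned function to a new input."""
    return slope * size + intercept


# Classification: learn a rule from (size, label) pairs, then predict a
# discrete label (0 = benign, 1 = malignant) for a new size.
# Invented data; assumes every benign tumor is smaller than every malignant one.
tumors = [(1.0, 0), (2.0, 0), (3.0, 0), (5.0, 1), (6.0, 1), (7.0, 1)]


def fit_threshold(examples):
    """Midpoint between the largest benign and the smallest malignant size."""
    largest_benign = max(size for size, label in examples if label == 0)
    smallest_malignant = min(size for size, label in examples if label == 1)
    return (largest_benign + smallest_malignant) / 2


threshold = fit_threshold(tumors)


def predict_malignant(size):
    """Return 1 (malignant) if the size is above the learned threshold, else 0."""
    return 1 if size > threshold else 0
```

In both cases the right answers (prices, labels) come with the training data; the only difference is that `predict_price` returns a value from a continuous range while `predict_malignant` returns one of two discrete labels.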
   
2) Unsupervised learning - The data set given doesn't come with any right answers. It is just a data set, and we are expected to make sense of it and come up with the inferences ourselves. No target attribute is defined in advance; it has to be inferred by examining the data. Very likely, several target attributes will be defined over the course of analyzing the data.
  - Types :
    1) Clustering of data -
      - Eg : Google news example. Several articles about the same topic are grouped/clustered together. The input data set for this is just a bunch of articles (which is just one dimension/attribute). The other dimension (the common topic) is not well defined, i.e. the topics are not known beforehand. We keep defining them as we go. So we have to infer that some of the articles belong to the same or a similar topic and can be grouped together.
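
Here is a rough Python sketch of the "define the topics as we go" idea. It is only my own illustration and certainly not how Google News actually works: the headlines are made up, the helper names (`words`, `jaccard`, `cluster_titles`) are hypothetical, and the similarity measure (word overlap between headlines) and the 0.25 cut-off are assumptions I picked to keep the example small.

```python
def words(title):
    """Split a headline into a set of lowercase words."""
    return set(title.lower().split())


def jaccard(a, b):
    """Word overlap between two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_titles(titles, threshold=0.25):
    """Greedy grouping: put each title into the first cluster whose first
    member is similar enough, otherwise start a new cluster (a new topic)."""
    clusters = []
    for title in titles:
        for cluster in clusters:
            if jaccard(words(title), words(cluster[0])) >= threshold:
                cluster.append(title)
                break
        else:
            # No existing cluster matched, so this title defines a new topic.
            clusters.append([title])
    return clusters


# Invented headlines; note that no topic labels are supplied.
titles = [
    "oil spill spreads in gulf",
    "gulf oil spill cleanup begins",
    "team wins cricket final",
    "cricket final ends in tie",
]

clusters = cluster_titles(titles)
```

With these four headlines the two oil spill stories end up in one group and the two cricket stories in another, even though nobody told the program that "oil spill" or "cricket" were topics.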

That's it. Done with the first class. Yay! I have yet to attempt the review exercises; I have decided to do the review exercises for this class and the next one together.

Hari Om.
