
Assignment #1 from Udacity’s Deep Learning course gives you a hint that a multinomial logistic (linear) regression model may not deliver the accuracy required for the notMNIST classification problem.
Let us look at the multinomial logistic model as an algorithm and try to work out its complexity.
The two parameters to consider here are W, the weight matrix, and b, the bias vector, for a single-layer model.
Imagine the input is a 28×28 image and the output is a 10-class vector.
The input image is stretched out into individual pixels, each feeding into one input unit, so the input layer has 28×28 = 784 dimensions. The weight matrix W is therefore (28×28)×10, and a 10×1 bias vector b is added to the product. The total number of parameters is:
28×28×10 + 10 = 7850 = (N + 1)×K
where N is the number of inputs and K is the number of outputs.
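The count above can be checked with a few lines of Python (a minimal sketch; the variable names are illustrative, not from the course code):

```python
# Parameter count for a single-layer multinomial logistic classifier.
N = 28 * 28   # number of inputs: a flattened 28x28 image
K = 10        # number of output classes

weights = N * K   # entries in the weight matrix W
biases = K        # entries in the bias vector b
total = weights + biases

print(total)                   # 7850
assert total == (N + 1) * K    # matches the (N + 1) * K formula
```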
Another way to see this: between an input layer with 28×28 nodes and an output layer with 10 nodes, a fully connected network needs a minimum of 28×28×10 weights; the bias on top of that adds the extra 10 in the equation above.
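The fully connected computation itself is just a matrix product plus a bias, followed by a softmax over the 10 class scores. Here is a NumPy sketch with hypothetical data (the batch size of 5 and random values are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 784))          # a batch of 5 flattened 28x28 images
W = rng.standard_normal((784, 10)) * 0.01  # weight matrix: N x K
b = np.zeros(10)                           # bias vector: K

logits = X @ W + b                         # shape (5, 10): one score per class

# Softmax turns the scores into class probabilities.
shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
exp = np.exp(shifted)
probs = exp / exp.sum(axis=1, keepdims=True)

print(probs.shape)  # (5, 10); each row sums to 1
```

Note that W and b account for all 7850 trainable parameters; everything else here is data or intermediate computation.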
The argument started with exploring the accuracy of a logistic regression model, which is close to 90% for this problem. In reality, to achieve higher accuracy we need many more parameters so the model can generalize and extend to a better solution. This paves the way to exploring Deep Neural Networks.
Stay tuned for more.