Machine learning is getting lots of attention lately. It’s amazing that some 200 people showed up at the Hack Bulgaria event and stayed almost 3 hours to learn more about machine learning. Not to mention it was a Friday and the venue was not in the center of the city! It was a clear indication for us that lots of developers are getting curious about machine learning (ML), and that’s totally cool for companies like ours.
This is a short overview of our not-so-technical presentation about machine learning for images. Some of the other lecturers covered different aspects of ML and its application in various cases and industries.
From the moment of their invention, convolutional networks were great at tasks such as face detection and handwriting recognition. Thanks to advances in GPU technology and the growing base of image data, convolutional networks now demonstrate far better results on complicated tasks such as visual classification of objects in images.
There are some specifics when it comes to image recognition using machine learning. Images are matrices of pixels (raster data), which is why recognition is sensitive to lighting, contrast, saturation, blur, noise, geometric transformations (scaling, translation, rotation) and occlusion.
Conventional image recognition methods struggle to find the optimal set of filters (convolutions) to apply for each specific use case.
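To make the term concrete, here is a minimal sketch (our illustration, not part of the presentation) of applying a single hand-crafted filter, a Sobel kernel for vertical edges, using NumPy and SciPy; the input image is just a random stand-in array:

```python
import numpy as np
from scipy.signal import convolve2d

# hand-crafted 3x3 Sobel kernel that responds to vertical edges
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)

image = np.random.rand(64, 64).astype(np.float32)  # stand-in for a grayscale image
edges = convolve2d(image, sobel_x, mode='same')     # slide the filter over the image
```

A classical pipeline stacks a handful of such filters chosen by hand; a convolutional network instead learns the filter weights from the training data.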
There are multiple levels and scales of interest, from low-level features such as texture to high-level features such as composition. On top of that, there’s a need for data augmentation to compensate for this sensitivity (e.g. training with blurred, cropped, scaled, noised versions of the images).
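As a rough sketch of what such augmentation could look like (the file name is hypothetical), here is one way to generate distorted copies of an image with Pillow and NumPy:

```python
import numpy as np
from PIL import Image, ImageFilter

img = Image.open('example.jpg')  # hypothetical input image
w, h = img.size

augmented = [
    img.filter(ImageFilter.GaussianBlur(radius=2)),  # blurred copy
    img.rotate(15, expand=True),                     # rotated copy
    img.resize((w // 2, h // 2)),                    # scaled-down copy
    img.crop((10, 10, w - 10, h - 10)),              # cropped copy
]

# noised copy: add Gaussian noise to the raw pixel values
pixels = np.asarray(img).astype(np.float32)
noisy = np.clip(pixels + np.random.normal(0, 10, pixels.shape), 0, 255)
augmented.append(Image.fromarray(noisy.astype(np.uint8)))
```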
In order to do proper image analysis you will need a huge (both deep and wide) architecture, which requires a massive amount of memory and processing power, made more accessible today via GPU-empowered machines. It still takes a lot of time (up to 10 days) to train some of the large architectures.
There are a few implementations of convolutional neural networks.
- cuda-convnet – Python interface, C++/CUDA implementation of convolutional neural networks, trained with back-propagation; a Fermi-generation NVIDIA GPU (GTX 4xx, GTX 5xx, or Tesla equivalent) is required, and there is no multi-GPU support
- cuda-convnet2 – an upgrade of cuda-convnet, optimized for the newer Kepler-generation NVIDIA GPUs and with added multi-GPU support
- Caffe – a deep learning framework developed by the Berkeley Vision and Learning Center, with a big community of contributors and support for NVIDIA’s GPU-accelerated convnet library, cuDNN
- Torch7 – Lua interface with support for Python, wide support for machine learning algorithms; one of the fastest convnet implementations is a Torch7 extension, fbcunn, from the Facebook Artificial Intelligence Research team; there are other extensions as well, plus support for cuDNN
- Theano – a Python library, open-ended in terms of network architecture and transfer functions, slightly lower-level than the other implementations (see the short sketch below)
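To give a flavor of that lower level, here is a minimal sketch (our illustration, not an official tutorial) of expressing a single convolution in Theano; the filter weights and input are just random placeholders:

```python
import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import conv2d

# symbolic 4D input: (batch, channels, height, width)
x = T.tensor4('x')

# one 3x3 filter over a single input channel, initialised randomly
w = theano.shared(np.random.randn(1, 1, 3, 3), name='w')

# build the symbolic convolution and compile it into a callable function
out = conv2d(x, w)
convolve = theano.function([x], out)

img = np.random.rand(1, 1, 28, 28)  # placeholder grayscale "image"
print(convolve(img).shape)          # (1, 1, 26, 26) with the default 'valid' mode
```

In a real network this single convolution would be stacked with non-linearities, pooling and fully connected layers, and the weights would be learned with back-propagation rather than set randomly.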
We will be publishing a series of articles on our blog about how image recognition is changing the paradigm and allowing image-intensive businesses to finally better understand and monetize their image content.
[slideshare id=47160420&doc=imaggaproductpresentationmleventslideshare-150419060621-conversion-gate02]