From 'Doom' to Viscera: the next leap forward for Convolutional Neural Networks?
A new approach could be the biggest boost for the Convolutional Neural Network since the videogame revolution revived it over a decade ago
In 2012 the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) began to revive the fortunes and professional estimation of one of the research community's oldest stalwarts of Machine Learning - Convolutional Neural Networks (CNNs).
CNNs are computer learning systems modelled on the organization of the animal visual cortex, building on research that dates back to the 1950s. Their great promise in visual recognition and natural language processing was, however, hamstrung by the need to create dedicated hardware and software frameworks, which proved susceptible to rapid obsolescence and were restricted by their high demand for compute cycles.
Not even the steady progress of Moore's Law through the 1990s and 2000s was enough to bring CNNs to parity with machine learning techniques that had grown up alongside the hardware they ran on.
Then, ironically, Doom became the salvation of CNNs.
The new bottleneck for Convolutional Neural Networks
In 2005, researchers from Microsoft presented research demonstrating that the Graphics Processing Units (GPUs) in video cards, advancing faster than conventional CPUs under the pressure of ever more demanding high-resolution, low-latency video games, could be leveraged to radically accelerate machine learning algorithms.
But the potential of GPU/CNN research only became clear around the time of the first ImageNet challenge, which furnished competitors with a comprehensive generic image database and removed, at least for the purposes of the challenge, the main bottleneck in CNN research: the generation of the 'training set'.
For less theoretical work, training-set generation remains an obstacle; though there are now mature frameworks to automate the creation of training databases for CNN research, the resulting materials are usually project-specific. Despite huge reductions in recent years in the cost of compute cycles, and a growing number of open-sourced datasets, any database likely to be genuinely useful to researchers will almost inevitably require painstaking labelling.
The 'pre-baked' CNN dataset
IEEE members led by Nima Tajbakhsh of California-based biotech company CureMetrix have published research indicating that CNNs pre-trained on existing generic image databases hold great practical potential in real-world use cases when 'tuned' (rather than built from scratch) to the parameters of a particular study.
A three-way Convolutional Neural Network operating on medical imaging
The researchers conducted experiments in four areas of medical imaging, among them colonoscopy, and found that their fine-tuned, pre-trained CNNs either outperformed or equalled CNNs trained from scratch. The field chosen - medical imaging - represents a formidable initial challenge for such a 'pre-baked' system. If the results prove replicable across other (less critical) fields, the idea promises a significant boon to the evolution of Convolutional Neural Networks in the GPU age.
Sceptics may wonder whether the amount of 'template tweaking' involved in obtaining these results rivals the trouble and resources needed to train a network from scratch. However, the researchers emphasize that 'neither shallow tuning nor deep tuning was the optimal choice for a particular application', and that both fine-tuned and fully-trained CNNs outperformed the 'handcrafted' solutions whose expense is currently seen as the chief obstacle to the proliferation of new research using this methodology.
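The distinction between 'shallow' and 'deep' tuning can be illustrated with a toy sketch. The snippet below uses a tiny two-layer NumPy network as a stand-in for a pre-trained CNN; all shapes, data and hyperparameters are invented for demonstration. 'Shallow' tuning here simply means updating only the final layer of the pre-trained weights, while 'deep' tuning updates every layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_pretrained():
    # Pretend these weights arrived pre-trained on a large generic dataset.
    return {"W1": rng.normal(0, 0.5, (8, 4)),
            "W2": rng.normal(0, 0.5, (4, 1))}

def forward(params, X):
    h = np.tanh(X @ params["W1"])      # hidden features
    return h, h @ params["W2"]         # (features, prediction)

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

def fine_tune(params, X, y, depth="shallow", lr=0.1, steps=200):
    p = {k: v.copy() for k, v in params.items()}
    n = len(X)
    for _ in range(steps):
        h, pred = forward(p, X)
        grad_out = 2 * (pred - y) / n                    # dL/dpred
        grad_W2 = h.T @ grad_out                         # top-layer gradient
        if depth == "deep":
            # Deep tuning: backpropagate into the lower layer as well.
            grad_h = (grad_out @ p["W2"].T) * (1 - h ** 2)
            p["W1"] -= lr * (X.T @ grad_h)
        p["W2"] -= lr * grad_W2                          # always tune the top layer
    return p

# Invented 'target task' data
X = rng.normal(size=(64, 8))
y = np.sin(X[:, :1])

base = init_pretrained()
shallow = fine_tune(base, X, y, depth="shallow")
deep = fine_tune(base, X, y, depth="deep")

print(f"pre-trained loss:   {mse(forward(base, X)[1], y):.4f}")
print(f"shallow-tuned loss: {mse(forward(shallow, X)[1], y):.4f}")
print(f"deep-tuned loss:    {mse(forward(deep, X)[1], y):.4f}")
```

The sketch mirrors the trade-off the researchers describe: shallow tuning is cheaper and preserves more of the pre-trained features, while deep tuning adapts the whole network to the new task - and which depth wins depends on the application.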
The 'handcrafted' method produced among the highest numbers of false positives, compared with the far lower-effort fine-tuning of a pre-trained CNN.
The emergence of low-maintenance, pre-trained CNN 'templates' capable of achieving this kind of efficacy would have significant import for artificial intelligence research in general.
Image-processing is one of the premier fields within machine learning, covering sectors as diverse as urban and private security, facial recognition, and image and video captioning and classification as well as medical research and diagnosis.
Because it crosses so often into natural language recognition and semantic labelling, research breakthroughs in this sphere hold much potential for cross-fertilisation between disciplines within Supervised Learning.
The researchers based their work on the AlexNet architecture, for which a pre-trained model was already available in the Caffe library, though they note the potential to conduct further investigations with deeper architectures such as VGGNet and GoogLeNet. However, they don't anticipate notable performance gains from deeper architectures.
The paper also suggests that unsupervised pre-trained models, including those produced by restricted Boltzmann machines (RBMs), could be applicable to their approach.
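As a rough illustration of what RBM-style unsupervised pre-training involves, the sketch below trains a minimal Bernoulli RBM with one-step contrastive divergence (CD-1) on invented binary data. It is a toy standing in for the general technique, not the authors' method; sizes, data and hyperparameters are all made up.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0, 0.01, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible-unit biases
        self.b_h = np.zeros(n_hidden)    # hidden-unit biases
        self.lr = lr

    def sample_h(self, v):
        p = sigmoid(v @ self.W + self.b_h)
        return p, (rng.random(p.shape) < p).astype(float)

    def sample_v(self, h):
        p = sigmoid(h @ self.W.T + self.b_v)
        return p, (rng.random(p.shape) < p).astype(float)

    def cd1_step(self, v0):
        # Positive phase: hidden activations driven by the data.
        ph0, h0 = self.sample_h(v0)
        # Negative phase: one Gibbs reconstruction step.
        pv1, _ = self.sample_v(h0)
        ph1, _ = self.sample_h(pv1)
        n = len(v0)
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b_v += self.lr * np.mean(v0 - pv1, axis=0)
        self.b_h += self.lr * np.mean(ph0 - ph1, axis=0)
        return float(np.mean((v0 - pv1) ** 2))  # reconstruction error

# Toy binary 'images': two repeated patterns, no labels needed.
patterns = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]], dtype=float)
data = patterns[rng.integers(0, 2, size=256)]

rbm = RBM(n_visible=6, n_hidden=3)
errors = [rbm.cd1_step(data) for _ in range(300)]
print(f"reconstruction error: {errors[0]:.3f} -> {errors[-1]:.3f}")
```

The appeal for the fine-tuning approach is that the RBM learns its weights from unlabelled data alone, so a network initialised this way sidesteps some of the labelling bottleneck described above.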
Images: Wikimedia Commons / ARXIV