AI in Focus: Robotic Vision

Professor Tom Drummond presenting at the Monash University Machine Learning Symposium, organised by the Monash eResearch Centre and MASSIVE.

Professor Tom Drummond of Monash University is working on computer-based models of the human brain’s visual cortex that will enable robots to make rapid decisions, based on a programmed understanding of their visual environment.

Perhaps the best-known example of a robot responding to visual cues is the self-driving car. But robots can also be deployed in aerial vehicles, for example to inspect tunnels for safety, or to monitor agricultural crops for the distribution of water and nutrients such as nitrates and phosphates. They could even be used to judge which fruit in an orchard is ready for harvesting.

“The key to advances in this area is the confluence of big data and big computing,” Professor Drummond said.

“Our involvement with MASSIVE’s supercomputing capability, particularly its arsenal of GPUs (graphics processing units), enables us to tap into the truly colossal amount of computation required to train a robot to understand the world around it.”

While robots are well established in the structured environment of, say, a factory production line, Professor Drummond wants to take them into ‘unstructured’ environments, like public roads, where they must negotiate fixed obstacles such as lampposts and kerbs as well as the sudden, unpredictable appearance of pedestrians and cyclists.

The interpretation of data from video cameras requires two techniques. One is old-fashioned geometry: triangulating the same point as seen from two or more camera positions to compute its relative location, as sketched below.
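As an illustration of the geometric technique, here is a minimal Python sketch of two-view triangulation using the direct linear transform; the camera matrices and pixel coordinates are illustrative assumptions, not values from the Drummond laboratory.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Recover a 3D point from its projections in two cameras.

    P1, P2 : 3x4 camera projection matrices.
    x1, x2 : (u, v) image coordinates of the same point in each view.
    """
    # Each view contributes two linear constraints on the homogeneous
    # 3D point X, obtained from the projection relation x ~ P @ X.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The least-squares solution is the right singular vector of A
    # with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenise to (x, y, z)

# Two cameras one unit apart along the x-axis, both looking down +z.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
print(triangulate(P1, P2, x1=(0.5, 0.25), x2=(0.0, 0.25)))
# -> approximately [1.  0.5  2.]
```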

The second, called ‘deep learning’, is inspired by the biological brain, in which complex networks of neurons communicate via electrical signals that determine whether each neuron fires.

Deep learning requires the building of multilayered neural networks. The patterns detected in the first layer might be edges or simple shapes; more complex patterns, such as faces or the outlines of heads and shoulders, are recognised in the deeper layers.
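To make that layered structure concrete, here is a minimal sketch of such a network in PyTorch (an illustrative framework choice; the article does not say which tools the laboratory uses). The layer sizes and the ten output classes are assumptions for the example.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    # Layer 1: small filters that learn edge- and blob-like detectors.
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    # Layer 2: combinations of edges into simple shapes and textures.
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    # Layer 3: larger-scale patterns, such as faces or head-and-shoulder
    # outlines, emerge as compositions of the earlier features.
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 10),  # ten object classes (a hypothetical number)
)

x = torch.randn(1, 3, 64, 64)  # one 64x64 RGB video frame
print(model(x).shape)          # torch.Size([1, 10])
```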

Training relies on data sets of video images that a human has painstakingly coloured in, pixel by pixel, so that every pixel carries a faithful label for the physical object it belongs to.
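The following sketch shows, under assumed image shapes and an assumed number of classes, how those pixel-by-pixel labels are used: every pixel becomes a classification target, and the network is penalised wherever its per-pixel prediction disagrees with the human-coloured mask.

```python
import torch
import torch.nn as nn

num_classes = 5                                # e.g. road, kerb, pedestrian, ...
logits = torch.randn(1, num_classes, 64, 64)   # network output, per pixel
labels = torch.randint(0, num_classes, (1, 64, 64))  # human-coloured mask

# Cross-entropy loss averaged over every pixel in the frame.
loss = nn.CrossEntropyLoss()(logits, labels)
print(loss.item())
```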

“‘Training’ a robot is extremely computer-intensive. It can take one week’s use of a high-performance GPU to train one network, with the computer working at a rate of one trillion bits of information per second,” Professor Drummond said.
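Taking the quoted figures at face value, a rough back-of-envelope calculation gives the scale of one training run:

```python
# One week of compute at one trillion (1e12) bits per second,
# as described in the quote above.
bits_per_second = 1e12
seconds_per_week = 7 * 24 * 3600          # 604,800 seconds
total = bits_per_second * seconds_per_week
print(f"{total:.2e} bits processed")      # ~6.05e17
```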

MASSIVE delivers that huge amount of computing power to the desktop computers in the Drummond laboratory.