Teaching robots to see

Child’s play and yet incredibly difficult: that’s the motto. Machines gain the ability to see through vision systems. However, it takes more than just a vision system to identify and understand objects.

A robot moves over a box of colourful building blocks in various shapes, purposefully grabs a red square and places it next to the box. This scenario seems simple enough to humans. A robot programmer, however, faces great challenges here. Reaching into the box, known in robotics as bin picking, is one of the most difficult tasks in the field. It is not the grasping and depositing that cause problems, but the recognition of the unsorted objects.


Definition of vision

Perceiving: this is how the German dictionary defines the term “seeing”. But how is a machine that lacks this sense supposed to perceive anything? As already mentioned, machine vision systems are the solution. They function similarly to human vision, since neither a human nor a machine sees an object itself.

Both “eyes” – the human and the technical – merely perceive the light that an object reflects. In humans, the iris, pupil and retina are responsible for this. The eye bundles light, focuses it and maps colours. Finally, all the information is passed on to the brain. In a machine, these tasks are performed by cameras, shutters, cables and computing units.

Despite these many similarities, there are still major differences between human and technical vision. The biggest difference lies in understanding and interpreting the image information. Over the course of their lives, humans learn to assess and filter the meaning of objects and situations. Machine vision systems, on the other hand, only identify objects correctly if they have been programmed or trained to do so in advance. An example illustrates this: young children can distinguish bananas from apples or horses from cows without any problems.

A technical system, on the other hand, must be adapted for different tasks in order to distinguish between different types of fruit or animals. To do this, programmers must know in advance what a system must later be able to do.
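The point that a system only recognises what it was prepared for in advance can be illustrated with a small sketch. It uses a nearest-centroid classifier, a far simpler technique than the deep learning discussed in this article, chosen only because it fits in a few lines. The colour values, labels and function names are hypothetical and not taken from any real vision system.

```python
from math import dist

def train_centroids(samples):
    """Average the feature vectors of each label into one centroid."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(v / counts[label] for v in acc)
        for label, acc in sums.items()
    }

def classify(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical training data: mean (R, G, B) colour of an image region.
TRAINING = [
    ((230, 210, 60), "banana"),
    ((240, 220, 80), "banana"),
    ((200, 30, 40), "apple"),
    ((180, 40, 50), "apple"),
]

centroids = train_centroids(TRAINING)
print(classify(centroids, (235, 215, 70)))  # a yellowish region
print(classify(centroids, (90, 180, 60)))   # a green region, never trained
```

The second call shows the limitation: a green object that was never part of the training data is still forced into one of the two known labels, because the system has no concept of anything it was not taught.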


How can a machine ultimately perceive objects?

Deep learning and neural networks make it possible to classify images, and this is how standard applications achieve good results. To get there, machine vision systems need a large amount of image material. Often there is not enough training material available, especially images of faulty objects. In addition, it is not enough merely to extract the grip point from the image.

It is also important to think about what actions result from this information for the robot. For example, if a component arrives at the robot in a shifted position, i.e. the grip point has changed somewhat, the robot must recognise the situation so that the component can be gripped and positioned correctly in the subsequent process.
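How a shifted component changes the grip point can be sketched as a simple planar transformation. The sketch assumes that the vision system reports the component's position and rotation in the robot's coordinate frame and that the grip point was taught relative to the component itself; the function name, units and values are hypothetical.

```python
from math import cos, sin, radians

def corrected_grip_point(grip_in_part, part_pose):
    """Map a grip point taught in the part's own frame into the robot frame.

    grip_in_part: (x, y) of the grip point relative to the part, in mm
    part_pose:    (x, y, angle_deg) of the part as detected by the camera
    Returns (x, y, angle_deg) for the gripper in the robot frame.
    """
    gx, gy = grip_in_part
    px, py, angle_deg = part_pose
    a = radians(angle_deg)
    # Rotate the taught offset by the part's rotation, then shift it
    # by the part's detected position.
    x = px + gx * cos(a) - gy * sin(a)
    y = py + gx * sin(a) + gy * cos(a)
    return (x, y, angle_deg)

# Hypothetical example: the part arrives rotated by 90 degrees.
print(corrected_grip_point((50.0, 0.0), (100.0, 200.0, 90.0)))
```

With this kind of correction the robot follows the component rather than a fixed coordinate, so the same taught grip works even when the component arrives displaced or rotated.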


Deep learning expands the existing range of applications

The combination of deep learning and mobility opens up new fields of application, from robot-based harvesting to applications in the care sector. Another major future area is embedded vision. This describes the direct embedding of image processing in end devices and is needed, for example, in autonomous driving.


Assistance from cloud systems

All these areas require high computing capacity from machine vision. In the future, edge and cloud computing concepts in particular will play a role here. Connecting image processing systems in the production plant directly to cloud systems has several advantages:

  1. Direct integration: logistics ERP modules can trigger repeat orders directly, and results from quality assurance can be collected and evaluated transparently as statistics in the cloud.
  2. Flexible scalability: additional storage or computing capacity is available at any time if required.
  3. Greater cost efficiency: users pay only for the resources they actually need.
  4. High resilience: reputable cloud service providers offer highly available, clustered data centres.


All these challenges show that human vision and judgment are still somewhat ahead of technical vision. However, reaching into the box is evolving and improving all the time through deep learning and the cloud. While sensors cannot yet completely replace the human eye, very good results can be achieved through adjustments and training. In addition, a vision system meets the highest quality standards, and because a robot can work 24/7, it helps companies to be more productive and competitive.

25 February 2021
