
Can robots grasp the invisible? RGB cameras and clever perception might be the key

Can robots finally perceive and handle transparent objects? The answer is a cautious “yes, if they see differently”.

As automation expands across manufacturing, logistics, and food service, one stubborn problem has continued to limit robotic deployment: handling transparent and reflective objects. From glassware and plastic packaging to polished metal components, these materials routinely confuse conventional vision systems, forcing costly human intervention.

A new approach from Tokyo University of Science suggests that barrier may be closer to falling.

Researchers have developed HEAPGrasp, a perception and grasping method that enables robots to reliably manipulate objects with diverse optical properties using only a standard RGB camera. In real-world tests, the system achieved a 96.0% grasp success rate while reducing camera movement and execution time.

Robots have long excelled at handling opaque items, where depth sensors can easily reconstruct 3D shapes. But when it comes to transparent or reflective materials, those same sensors often fail. Light passes through clear objects or scatters unpredictably off glossy surfaces, resulting in incomplete or noisy data.

“Traditionally, transparent or mirrored (glossy) objects, such as reflective metal parts and transparent trays, have been difficult to detect reliably using depth sensors or conventional 3D measurement techniques, making automatic grasping by robots difficult and ultimately leading to human intervention,” explains Shogo Arai, one of the creators of the new approach.

“Our approach is based on the idea that even when depth information is unreliable, object shape estimation and grasping are still possible as long as the object’s contours or silhouettes can be captured reliably in images.”

Rather than trying to fix depth sensing, HEAPGrasp avoids it altogether. The system uses semantic segmentation to isolate objects in RGB images, then reconstructs their 3D shape using a method known as Shape-from-Silhouette. By combining outlines from multiple viewpoints, the robot can estimate object geometry without relying on surface properties.

This shift from depth-based perception to contour-based reconstruction is key. It allows the system to work consistently across transparent, reflective, and opaque materials using the same hardware setup.
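For readers who want a concrete picture of Shape-from-Silhouette, here is a minimal voxel-carving sketch in Python. It is not the authors' implementation, just the textbook form of the idea: a candidate 3D point survives only if it projects inside the segmented silhouette in every calibrated view. All function and variable names are illustrative.

```python
import numpy as np

def carve_visual_hull(silhouettes, cameras, grid_points):
    """Shape-from-Silhouette by voxel carving (illustrative sketch).

    silhouettes: list of HxW boolean masks from semantic segmentation
    cameras:     list of 3x4 camera projection matrices, one per view
    grid_points: (N, 3) candidate points sampled over the workspace
    Returns the points that lie inside every silhouette cone.
    """
    keep = np.ones(len(grid_points), dtype=bool)
    homog = np.hstack([grid_points, np.ones((len(grid_points), 1))])
    for mask, P in zip(silhouettes, cameras):
        uvw = homog @ P.T                          # project into this view
        in_front = uvw[:, 2] > 0                   # ignore points behind the camera
        uv = np.zeros((len(grid_points), 2), dtype=int)
        uv[in_front] = (uvw[in_front, :2] / uvw[in_front, 2:3]).round().astype(int)
        h, w = mask.shape
        in_image = (in_front & (uv[:, 0] >= 0) & (uv[:, 0] < w)
                             & (uv[:, 1] >= 0) & (uv[:, 1] < h))
        hit = np.zeros(len(grid_points), dtype=bool)
        hit[in_image] = mask[uv[in_image, 1], uv[in_image, 0]]
        keep &= hit                                # carve away points outside the mask
    return grid_points[keep]
```

Because only image-space contours matter here, the same carving step behaves identically whether the surface is transparent, mirrored, or matte, which is exactly the property the article highlights.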

To ensure efficiency, the researchers also addressed a second challenge: time. Multi-view reconstruction typically requires moving a camera around a scene, which increases handling time. HEAPGrasp introduces an active perception strategy that selects only the most informative viewpoints, balancing accuracy with motion.
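The paper's exact selection criterion is not described here, so the following is only a schematic of the general next-best-view pattern: greedily pick the candidate viewpoint whose expected information gain best justifies the camera motion needed to reach it. The gain function, the pose representation, and the weight alpha are all assumptions made for illustration.

```python
import numpy as np

def next_best_view(candidates, current_pose, info_gain, visited, alpha=0.5):
    """Greedy active-perception step (schematic, not the published method).

    candidates:   list of candidate camera poses, e.g. (x, y, z) arrays
    current_pose: pose the camera occupies now
    info_gain:    callable estimating how much a viewpoint would refine
                  the reconstruction (e.g. newly carved voxel volume)
    alpha:        weight trading reconstruction accuracy against motion
    """
    best_idx, best_score = None, -np.inf
    for i, pose in enumerate(candidates):
        if i in visited:
            continue
        travel = np.linalg.norm(np.asarray(pose) - np.asarray(current_pose))
        score = info_gain(i) - alpha * travel      # gain minus motion penalty
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx  # None means no unvisited viewpoint is worth visiting
```

A loop that stops once the best score falls below some threshold is one plausible way such a strategy could shorten the camera trajectory, consistent with the reductions reported below.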

The result is a system that not only improves reliability, but also speeds up operations. Tests showed a 52% reduction in camera trajectory length and a 19% decrease in execution time compared with conventional scanning approaches.

The method, developed by Arai and Ginga Kennis, was evaluated against established techniques including depth-based grasping networks and neural radiance field models. While those methods saw performance drop sharply on transparent and reflective objects, HEAPGrasp maintained success rates above 92.6% across all categories.

There are still limitations. Because the system reconstructs a “visual hull”, the intersection of the silhouette cones from all viewpoints, it cannot recover concavities that never appear on an object’s outline (the inside of a bowl, for example), and performance can decline in highly cluttered environments. But its robustness to segmentation noise and ability to generalise to unseen objects suggest strong practical potential.

For industry, the implications are significant. By relying only on RGB cameras, HEAPGrasp removes the need for expensive sensing hardware and can be integrated into existing robotic systems. This lowers both cost and complexity, making advanced robotic handling more accessible.

The research, published in IEEE Robotics and Automation Letters and to be presented at the IEEE International Conference on Robotics and Automation 2026, reflects a broader shift in robotics: solving perception challenges through software rather than hardware.