Geometric Reasoning
Overview
The internal representation of the state of the world that the robot lives in is called the robot's belief state. It contains knowledge about the objects in the robot's environment, which includes the current pose of an object and its dynamic state, e.g., whether it is stable or currently in motion, as well as knowledge about the robot's internal state, such as the current configuration of its arms or its location in the world map.
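As a rough illustration of what such a belief state might contain, here is a minimal sketch in Python. CRAM itself is written in Common Lisp and uses its own data structures, so all class and field names below are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# A 6-DOF pose: position (x, y, z) and orientation quaternion (x, y, z, w).
Pose = Tuple[Tuple[float, float, float], Tuple[float, float, float, float]]

@dataclass
class ObjectState:
    pose: Pose          # current pose of the object
    in_motion: bool     # dynamic state: currently moving or at rest
    stable: bool        # whether it would stay put if left alone

@dataclass
class RobotState:
    base_pose: Pose                        # robot's location in the world map
    joint_configuration: Dict[str, float]  # e.g., arm joint angles by joint name

@dataclass
class BeliefState:
    objects: Dict[str, ObjectState] = field(default_factory=dict)
    robot: Optional[RobotState] = None
```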
CRAM uses the Bullet physics engine to represent the objects in the world and to acquire physics knowledge about them, and OpenGL to visualize the belief state and for offscreen rendering, which gives the robot visual information about the world in the virtual environment.
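The sketch below shows the same idea in pybullet, the Python binding to Bullet. CRAM accesses Bullet and OpenGL from Common Lisp, so this is only a conceptual analogue; the object models, camera placement, and image size are arbitrary assumptions:

```python
import numpy as np
import pybullet as p
import pybullet_data

# Headless (DIRECT) connection: no GUI window, suitable for fast reasoning.
p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)

plane = p.loadURDF("plane.urdf")
cube = p.loadURDF("cube_small.urdf", basePosition=[0.5, 0.0, 0.05])

# Render an offscreen image from a virtual camera, e.g. at the robot's head.
view = p.computeViewMatrix(cameraEyePosition=[0.0, 0.0, 1.4],
                           cameraTargetPosition=[0.5, 0.0, 0.05],
                           cameraUpVector=[0, 0, 1])
proj = p.computeProjectionMatrixFOV(fov=60, aspect=1.0, nearVal=0.05, farVal=5.0)
width, height, rgb, depth, seg = p.getCameraImage(128, 128,
                                                  viewMatrix=view,
                                                  projectionMatrix=proj)

# The segmentation buffer stores the id of the body visible at each pixel,
# so "can the cube be seen from here, or is it occluded?" becomes a lookup.
seg = np.reshape(np.asarray(seg), (height, width))
cube_visible = bool((seg == cube).any())
```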
The belief state representation used for geometric reasoning is built upon the representation of the world used in the Bullet physics engine. However, whereas in Bullet the objects are just sets of meshes with certain physical parameters assigned to them (such as mass, inertia, etc.), in CRAM objects have additional semantic information associated with them to enable reasoning.
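Continuing the pybullet sketch above, this extra semantic layer can be pictured as a thin wrapper that attaches a symbolic name and an object type to each Bullet body handle. Again, this is a hypothetical illustration, not CRAM's actual representation:

```python
from dataclasses import dataclass

@dataclass
class SemanticObject:
    body_id: int        # handle of the underlying Bullet rigid body
    name: str           # symbolic name used in plans, e.g. "cube-1"
    object_type: str    # semantic category, e.g. "cube", "plate", "counter"

# Index from symbolic names to annotated bodies, so that reasoning queries can
# be phrased in terms of "the cube" instead of raw Bullet body ids.
world_objects = {
    "cube-1": SemanticObject(body_id=cube, name="cube-1", object_type="cube"),
    "floor":  SemanticObject(body_id=plane, name="floor", object_type="floor"),
}
```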
Although CRAM has also been used outdoors, in the scope of a project for improving rescue operations in avalanche scenarios, its main application area is object manipulation tasks in indoor environments, where the geometric reasoning engine proves particularly powerful. Using the geometric reasoning module of CRAM it is possible to infer:
- a good pose for the robot's base such that the robot is able to grasp a specific object
- where the robot should stand in order to see a specific pose in the world without other objects occluding the view
- whether an object put down at a specific pose will be stable, e.g., the robot can infer that an object placed very close to the edge of the table will fall down (a minimal version of such a check is sketched after this list)
- where to place an object such that it does not collide with other objects, e.g., a good pose for putting down an object on a very cluttered table such that, in the goal configuration, the robot's arms do not collide with anything
- and other geometry-related facts, such as what supports a specific object, which objects have their surfaces touching, the coordinates of a certain part of an object, etc.
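As an example of how such a query can be answered by the physics engine, here is a minimal stability check in pybullet (not CRAM's Lisp interface). The table model comes from pybullet_data, its dimensions are only approximate, and the settling time and tolerance are arbitrary assumptions:

```python
import numpy as np
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)

table = p.loadURDF("table/table.urdf")   # tabletop at roughly 0.62 m height
cube = p.loadURDF("cube_small.urdf")

def is_stable(body, candidate_position, settle_steps=240, tolerance=0.05):
    """Place the body at the candidate position, let physics run briefly, and
    check whether the body stayed (approximately) where it was put."""
    p.resetBasePositionAndOrientation(body, candidate_position, [0, 0, 0, 1])
    for _ in range(settle_steps):
        p.stepSimulation()
    settled, _ = p.getBasePositionAndOrientation(body)
    displacement = np.linalg.norm(np.array(settled) - np.array(candidate_position))
    return displacement < tolerance

print(is_stable(cube, [0.0, 0.0, 0.66]))   # middle of the tabletop: stays put
print(is_stable(cube, [0.9, 0.0, 0.66]))   # past the table edge: falls off
```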
Equipped with the above-mentioned reasoning capabilities, the robot can infer parametrizations of under-specified actions. For example, if the robot is required to bring a plate from the counter to the dining table, it can infer where the counter is positioned, where it should stand in order to see objects on the counter, where it should stand to grasp the plate once it has found one, where it should stand to put the object down at a location on the dining table, and so on. This information can then be used at run time when actually performing the action on the real robot, or in a simulation in order to find the best parametrization based on a number of trials. Which parametrization is the “best” is decided by a cost function, which can, for example, be the distance the robot would drive during action execution.
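A simple way to picture this search: sample candidate base poses, discard the ones that fail the geometric checks in the lightweight simulation, and pick the cheapest remaining one. The sketch below is plain Python with hypothetical names; the feasibility check stands in for the reachability and visibility queries described above:

```python
import math

def driving_distance(current_xy, candidate_xy):
    """Example cost function: straight-line distance the base would drive."""
    return math.dist(current_xy, candidate_xy)

def best_base_pose(current_xy, candidate_poses, is_feasible):
    """Return the feasible candidate with the lowest cost, or None if every
    candidate fails the geometric checks (reachability, visibility, ...)."""
    feasible = [pose for pose in candidate_poses if is_feasible(pose)]
    if not feasible:
        return None
    return min(feasible, key=lambda pose: driving_distance(current_xy, pose))

# Example usage: a handful of base poses sampled around the counter.
candidates = [(1.0, 0.2), (1.1, -0.3), (0.8, 0.5)]
print(best_base_pose((0.0, 0.0), candidates, is_feasible=lambda pose: True))
```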
In order to use the results of the simulation while executing the action in the real world, the simulation has to be extremely fast, so that the robot does not spend too much time “thinking” before actually starting to execute the action. Therefore, the simulation environment used in CRAM is designed to be very lightweight. As a result, all the low-level specifics of executing actions are neglected: manipulation trajectories, for example, are not executed in a continuous manner; instead, the robot's arm is teleported from one key configuration to another, i.e., when grasping an object, the arm is first in the initial configuration, then in the pre-grasp configuration if one is given, and finally in the grasp configuration. The actual execution of the trajectory is then handled by the motion planning library and the low-level controllers.
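The sketch below illustrates this teleporting idea with pybullet: the arm is reset directly into each key configuration and only checked for collisions there; the continuous motion in between is never simulated. Names such as robot, arm_joints, and obstacles are assumptions and would come from the loaded world:

```python
import pybullet as p

def configuration_collision_free(robot, joint_indices, configuration, obstacles):
    """'Teleport' the arm into a key configuration and check for collisions,
    instead of simulating the continuous motion that leads there."""
    for joint_index, joint_value in zip(joint_indices, configuration):
        p.resetJointState(robot, joint_index, joint_value)
    for obstacle in obstacles:
        # Any closest-point pair within 0 m means the bodies touch or overlap.
        if p.getClosestPoints(robot, obstacle, 0.0):
            return False
    return True

# A grasp is validated only at its key configurations (initial, pre-grasp,
# grasp); executing the trajectory between them is left to the motion planner:
# grasp_feasible = all(
#     configuration_collision_free(robot, arm_joints, q, obstacles)
#     for q in [initial_config, pre_grasp_config, grasp_config])
```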
The video demonstrates the execution of a pancake-making task, including table setting and preparation, in the lightweight simulation that we call plan projection. The video is edited and not real-time, so for the time being the efficiency of the projection is not yet satisfactory for our needs. However, the fact that it does not have to calculate and render every timestamp of the continuous world state, as traditional simulator software does, gives it an advantage for efficient lightweight reasoning.
For more information, take a look at the related publications.