© Stefano Nolfi, 2021
Another fundamental property of robots is situatedness, i.e. being situated in an external environment. The presence of an external environment, combined with the fact that the behavior of robots results from the continuous interaction between the robot and the environment, has important consequences.
At each step: (i) the robot determines the state of its actuators on the basis of the current and previous observations; (ii) the action of the robot alters the robot/environment relation and possibly the environment itself (Figure 4.1). Consequently, the action produced by the robot co-determines the sensory states that the robot experiences later.
This implies that the actions performed by a robot have two effects: (1) they enable the robot to achieve its goals, and (2) they alter the perceptual environment of the robot and/or the physical environment. Ideally, the second effect should be instrumental to the first, i.e. the robot should choose actions that alter the environment in a way that facilitates the achievement of its goal. In this Chapter we will show that the possibility of altering the environment plays a key role in adaptive robots.
The term sensory-motor coordination indicates the ability of an agent to act so as to ensure that it will later experience useful observations, i.e. sensory states enabling the robot to achieve its goals.
It is important to consider that coordinating the sensory and motor processes is not simply an exploitable possibility but a necessity, since the actions of the robot inevitably alter the subsequent observations. Indeed, the observations later experienced by the robot always depend on the actions performed by the robot previously. The perceptual environment of the robot is a relational property that depends both on the environment and on the robot.
The fact that the actions performed by a robot have long-term effects on further observations, which in turn determine the actions that the robot will perform next, is also one of the reasons why robots are difficult to design. To identify the right action that the robot should choose at a given moment, the designer has to predict the long-term effect that actions have on forthcoming observations. This is challenging since, as we will see in the next Chapter, the behavior of a robot is a dynamical system originating from continuous non-linear interactions between the robot and the environment. The behavior of dynamical systems of this kind cannot be predicted, even with complete knowledge of the robot and of the environment.
To illustrate the role of sensory-motor coordination, let’s consider an animat (i.e. an abstracted robot) living in a circular world divided into 40 cells (Figure 4.2; Nolfi, 2005). The animat is rewarded or punished with scores of 1 and -1 for each step taken in the left and in the right side of the environment, respectively. The animat is initially placed in a randomly selected cell. Consequently, to maximize its fitness the animat should stay in the left side and, in case it finds itself in the right side, should move to the left side.
The animat observes only the color of the cell in which it is currently located. There are 20 different colors, indicated by 20 corresponding numbers. Each color is present once in the left and once in the right side of the environment, in different positions. At each step, the animat can move one cell clockwise, move one cell anti-clockwise, or remain still.
Now let’s try to guess a policy that could enable the animat to maximize its fitness by moving away from the right side and by remaining in the left side. Take your time before continuing to read.
When I ask this question during lectures or seminars, listeners usually identify solutions in which the agent determines its actions on the basis of the current and previous observations, i.e. solutions that require storing information from previous observations in memory and later using that information to decide how to act. For example, the animat can solve the problem by circling clockwise through all cells and stopping on cell 10 after having observed cell 9. As there is a single cell 10 preceded by cell 9 in the clockwise direction, and since this cell is located in the left side, the animat will correctly reach the left side and stop there. The agent will erroneously abandon the left side when it happens to start from a cell following that pair, but it will re-enter the left side and stay there later on.
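The memory-based strategy just described can be sketched in a few lines. The world layout below is an illustrative assumption (cells numbered clockwise, colors 0..19 on the left side and 19..0 on the right side), chosen so that the only cell of color 10 preceded by color 9 lies in the left side; it is not the arrangement used in the original experiment.

```python
# Illustrative world: 40 cells numbered clockwise; left side = cells 0-19.
# With this arrangement, the only cell of color 10 whose clockwise
# predecessor has color 9 is cell 10, which lies in the left side.
color = list(range(20)) + list(range(19, -1, -1))

def memory_policy(start, steps=100):
    """Circle clockwise, remembering the previous observation, and stop on a
    cell colored 10 whose predecessor was colored 9 (one item of memory)."""
    cell, prev = start, None
    for _ in range(steps):
        if color[cell] == 10 and prev == 9:
            return cell                  # stopped in the left side
        prev = color[cell]
        cell = (cell + 1) % 40           # keep circling clockwise
    return cell

print({memory_policy(s) for s in range(40)})   # every start ends on cell 10
```

Note that the strategy needs exactly one bit of internal state: whether the previous observation was color 9.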
Now let’s try to imagine a solution that does not require memory, i.e. a policy relying only on the color of the cell where the animat is currently located. Is such a solution conceivable? Again, take your time before continuing to read.
I have asked this question more than fifty times to small and large audiences, and nobody has ever identified a solution that does not require memory. Yet such solutions exist. They actually sound trivial after seeing one of them (see Figure 4.10). Reactive solutions of this type can be encoded by using only 20 bits, each specifying the clockwise or anti-clockwise reaction of the animat to one of the 20 colors (“stay still” actions are not useful).
After understanding one of the possible solutions, generating similar solutions becomes easy. What is crucial is that actions are chosen so as to avoid the generation of attractors in the right side of the environment. In principle, actions should also be chosen so as to produce attractors in the left side of the environment, but this occurs naturally as a consequence of the fact that the sequences of colors in the two sides differ. An attractor corresponds to a pair of adjacent cells in which the animat reacts by moving clockwise in the first cell and anti-clockwise in the second (in the clockwise direction), so that it keeps oscillating between the two. Solutions of this type exist for other environments as well, independently of the particular sequence of colors, provided that the sequences of colors in the two sides differ.
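A memoryless solution can be written down and verified directly. The color arrangement and the particular policy below are illustrative constructions (colors 0..19 on the left side, 19..0 on the right side), not the evolved solution of Figure 4.10; the point is only that the 20 reaction bits create a single behavioral attractor, located in the left side.

```python
# Illustrative world: 40 cells; left side = cells 0-19;
# colors 0..19 on the left, 19..0 on the right.
N = 40
color = list(range(20)) + list(range(19, -1, -1))

# A memoryless policy: one clockwise (+1) or anti-clockwise (-1) reaction
# per color. With this arrangement, the only pair of adjacent cells that
# react toward each other (a behavioral attractor) is cells 5-6, in the
# left side; no attractor exists anywhere in the right side.
policy = {c: (+1 if c <= 5 else -1) for c in range(20)}

def simulate(start, steps=80):
    """Return the cell occupied after `steps` purely reactive moves."""
    cell = start
    for _ in range(steps):
        cell = (cell + policy[color[cell]]) % N
    return cell

# From every starting cell the animat ends up oscillating between cells 5
# and 6, i.e. inside the rewarded left side, without any internal memory.
print({simulate(s) for s in range(N)})   # -> {5, 6}
```

Because the policy is a deterministic map on cells that moves one cell per step, every trajectory must end either in an adjacent-pair oscillation or in a full circulation; placing the only such pair in the left side therefore captures all trajectories.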
We can refer to solutions of this type as emergent solutions to emphasize the fact that the ability of the animat to solve the problem emerges from the combined effect of the ways the animat reacts to many observations. The way the animat reacts to a specific color assumes a function only in combination with the way it reacts to the other colors.
I was not able to imagine emergent solutions of this type either. However, I observed them in experiments that I performed by evolving robots. For example, I observed a solution of this type in an experiment in which a Khepera robot (Mondada, Franzi & Ienne, 1993), situated in an arena surrounded by walls and containing a cylindrical object, was evolved to find and remain near the cylinder (Video 4.1; Nolfi, 1996).
The robot has 6 sensory neurons encoding the state of the 6 frontal infrared proximity sensors, 5 internal neurons, and 2 motor neurons controlling the desired speed of the two wheels, which can rotate forward and backward. The sensory neurons enable the robot to detect obstacles up to a distance of 4 cm. The evolving robots are initially placed in randomly chosen positions and orientations. Evaluation episodes are terminated after 500 time steps, corresponding to 50 s. The fitness of the robots corresponds to the fraction of steps the robot spends near the cylinder.
We might reasonably think that to solve this problem the robot should display the following behaviors: explore the environment, discriminate whether a nearby obstacle is a wall or a cylinder, move away from walls, approach the cylinder, and stop near the cylinder. The evolved robots do indeed explore the environment and move away from walls (see Video 4.1). However, they do not stop near the cylinder. Instead, they remain near the cylinder by alternating turn-left and turn-right movements and/or move-forward and move-backward movements.
By analyzing the sensory patterns experienced by the robot I observed that those experienced near walls and those experienced near cylinders largely overlap in sensory space. In other words, as in the case of the foraging animat described in the previous section, a single observation does not allow discriminating a good location (near the cylinder) from a bad one (near a wall). In principle, these robots could discriminate the class of the obstacle by analyzing how observations vary over time while they approach the obstacle (a strategy that requires memorizing previous observations in internal states and elaborating the current and the previous observations). However, the evolved robots do not rely on this strategy. Indeed, they solve the problem with the strategy described above regardless of whether their brain has recurrent connections or not.
The evolved robots rely on an emergent solution analogous to that illustrated in the previous section. This can be appreciated in Figure 4.3, which shows the translational and rotational movements produced by a typical evolved robot near the walls and near the cylinder. As shown in the top part of the Figure, when the robot approaches a wall it keeps moving toward the object up to a distance of about 25 mm and then avoids it by turning right. The robot encounters and avoids three walls in this episode before approaching the cylinder. When the robot approaches the cylinder, instead, it keeps moving toward the object up to a distance of about 10 mm and then starts oscillating by alternating move-forward and move-backward, and turn-left and turn-right actions. In other words, the evolved robots solve the problem by reacting to observations so as to generate behavioral attractors near cylinders and not near walls. The presence or absence of attractors depends on the way the state of the sensors varies near walls and cylinders and on the way in which the robot reacts to observations. By carefully regulating its reactions to observations, the robot ensures that the robot/environment interaction, mediated by the chosen reaction rules, yields behavioral attractors near cylinders and not near walls.
In other words, the robots can produce different behaviors near walls and cylinders, without varying the way in which they react to observations near the two objects, by exploiting the fact that the spatial distribution of observations varies near walls and cylinders. As we have seen for the Braitenberg vehicle illustrated in Figure 2.6, a single control rule can produce qualitatively different behaviors in different environmental circumstances. Consequently, the problem of producing differentiated behaviors in different environmental circumstances admits solutions that rely on a single set of control rules.
Indeed, the solution discovered by the evolved robots is substantially similar to Braitenberg’s Vehicle 3b: the robot sets the speed of the two motors to a high positive value when it is far from obstacles and reduces the speed of the left and right wheels proportionally to the activation of the right and left proximity sensors, respectively. These simple control rules enable the robots to avoid obstacles, move away from walls, and explore the environment by alternating “moving straight” behaviors when far from obstacles and “obstacle avoidance” behaviors near obstacles. By carefully tuning the intensity of the inhibitory effect of the 6 infrared sensors on the two motors, the robots also manage to alter the robot/environment dynamics near walls and cylinders so as to generate a behavioral attractor only near cylinders.
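The control rule described above can be sketched as follows. The sensor ordering (left to right, activations in [0, 1]) and the inhibition gain are illustrative assumptions, not the values found by evolution.

```python
# Braitenberg-3b-style rule: both wheels default to a high forward speed,
# and the sensors on each side inhibit the opposite wheel in proportion
# to their activation, steering the robot away from obstacles.
def motor_speeds(ir, base=1.0, k=0.4):
    """ir: activations of the 6 frontal infrared sensors, ordered left to
    right, each in [0, 1]. k is an assumed inhibition gain. Returns the
    (left_wheel, right_wheel) speeds."""
    left_act = sum(ir[:3])              # activation of the 3 left sensors
    right_act = sum(ir[3:])             # activation of the 3 right sensors
    left_wheel = base - k * right_act   # right sensors inhibit the left wheel
    right_wheel = base - k * left_act   # left sensors inhibit the right wheel
    return left_wheel, right_wheel

print(motor_speeds([0, 0, 0, 0, 0, 0]))       # free space: full speed ahead
print(motor_speeds([0.8, 0.4, 0, 0, 0, 0]))   # obstacle on the left: turns right
```

With an obstacle detected on the left, the right wheel slows down more than the left one, so the robot rotates away from the obstacle; tuning `k` changes how sharply the robot turns, which is precisely the kind of regulation that determines whether a behavioral attractor forms near an object.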
The solutions found in these experiments generalize to different environmental conditions. Indeed, the evolved robots showed the ability to find and stay near the cylindrical object also in post-evaluation tests with larger and smaller cylinders (Nolfi, 1996).
The ability to alter subsequent observations through actions plays a key role in most of the solutions discovered by adaptive robots, regardless of the nature and the complexity of the problem.
An example involving a more complex problem is the experiment concerning a simulated iCub robot (Metta et al., 2008) evolved to discriminate spherical and ellipsoid objects on the basis of rough tactile information (Figure 4.4; Tuci, Massera & Nolfi, 2010).
The neural network of the robot includes 22 sensory neurons, 8 internal neurons with recurrent connections, and 18 motor neurons (Figure 4.5). 7 sensors encode the angular positions of the seven corresponding DOFs of the left arm and wrist, 5 sensors encode the extension/flexion of the fingers, and 10 sensors encode the state of the tactile sensors located on the palm, on the second phalange of the thumb, and on the distal phalange of each other finger (each sensor binarily encodes the presence/absence of pressure). The first 14 motor neurons encode the torques produced by seven couples of antagonistic muscles controlling the seven DOFs of the arm and wrist. The following 2 motor neurons encode the desired extension/flexion of the thumb and of the four fingers. Finally, the last 2 motor neurons are used to indicate the category of the object. The robot is free to determine the convention used to indicate spheres and ellipsoids, provided that it produces differentiated activation patterns for the two types of object (see Tuci, Massera & Nolfi, 2010 for details).
The robots are rewarded for discriminating the category of the current object. Moreover, they receive a smaller reward for touching the object. This second fitness component encourages the robot to come into contact with the object, since contact constitutes a prerequisite for discriminating the object’s category. The fitness does not specify how the object should be manipulated. To solve the discrimination problem the robots should develop relatively complex abilities: they should appropriately control an articulated arm with many DOFs, and they should discriminate similar objects on the basis of low-resolution tactile sensors.
Video 4.2 shows how the evolved robots manage to correctly discriminate the category of the object in the large majority of cases. The acquired ability is also robust with respect to variations in the relative position of the object to be discriminated.
The analysis of the evolved robots indicates that the categorization process involves three phases. In the first phase, the robot manipulates the object by wrapping it with its fingers and by moving the object until a suitable hand/object posture is reached. The information contained in the tactile stimuli experienced during this phase increases progressively, eventually reaching a high informative value when the hand/object couple converges on a posture that remains substantially stable for the rest of the episode (Figure 4.7). During the second phase, the robot starts to indicate whether it believes that the object is an ellipsoid or a sphere, and keeps integrating the information extracted from observations, possibly revising its discrimination decision. The probability that the discrimination answer is modified, however, becomes progressively lower over time. This leads to the third and final phase, in which the discrimination output remains fixed regardless of the nature of new observations.
Figure 4.6 shows the probability that the observations collected during a typical episode lasting 400 steps are experienced while interacting with an ellipsoid or with a sphere. At the beginning of the episode the observations do not contain any information. Later on, while the robot manipulates the object, the amount of information provided by the observations increases. During the second phase, which in this case starts approximately at step 200, the observations indicate that the object belongs to one of the two categories with a confidence of about 80%. This means that single observations provide information on the category of the object but can be misleading. The robots overcome this problem by extending the discrimination process over time, i.e. by summing the partially conflicting evidence extracted from observations until the evidence collected allows performing the categorization with a sufficiently high confidence. For a discussion of the role of continuous processes of this type in the context of human behavior see Spivey (2007).
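The effect of summing partially conflicting evidence over time can be sketched with a simple log-odds accumulator. This is an assumption about the kind of mechanism at work, not the evolved network itself: each observation hints at the correct category with roughly 80% reliability, and accumulating log-likelihood ratios makes the final decision far more reliable than any single observation.

```python
import math
import random

def classify(observations, p=0.8):
    """Accumulate log-likelihood ratios for 'ellipsoid' vs 'sphere' over a
    stream of binary cues, each of which is correct with probability p."""
    llr_step = math.log(p / (1 - p))     # evidence carried by one cue
    total = sum(llr_step if obs == "ellipsoid" else -llr_step
                for obs in observations)
    return "ellipsoid" if total > 0 else "sphere"

# Simulate 100 noisy cues, each matching the true category with prob. 0.8:
# any single cue is wrong 20% of the time, but the summed decision is
# virtually never wrong.
random.seed(0)
true_cat = "ellipsoid"
cues = [true_cat if random.random() < 0.8 else "sphere" for _ in range(100)]
print(classify(cues))   # prints "ellipsoid"
```

With symmetric cue reliabilities this reduces to a majority vote, but the log-odds formulation also shows how a decision could be emitted as soon as the accumulated total crosses a confidence threshold, mirroring the progressive hardening of the robot's answer across the three phases.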
Overall, this analysis indicates that the evolved robots produce actions enabling them to later experience observations that include cues on the category of the object. The reliability of the discrimination process is ensured by summing over time the evidence collected from multiple observations.
We will now see how adaptive robots can use actions to store relevant information in the environment and later exploit that information to achieve their goals. Storing information in the environment, rather than in the robot’s brain, represents a form of cognitive offloading, since it relieves the brain of the need to store, update, and retrieve the information. Moreover, we will see how sensory-motor coordination can produce solutions that anticipate future needs, i.e. solutions in which the execution of the actions that need to be performed at a certain point is preceded by the execution of preparatory actions.
An example of cognitive offloading performed by Tetris players consists in rotating the pieces to check the perceptual match with unfilled spaces rather than rotating them mentally, which is slower and less reliable. Another example consists in crossing the fingers to remember that something needs to be done (Gilbert, 2015). An example of anticipatory behavior, in the context of a reaching and grasping behavior, consists in assuming the right hand posture before the beginning of the grasping phase (Hofsten & Ronnqvist, 1988).
The experiment that can be used to illustrate these aspects involves a simulated MarXbot robot (Bonani et al., 2010) evolved to navigate in a double T-maze environment (Figure 4.7; Carvalho & Nolfi, 2016). The robot, which is initially located in an area at the bottom of the central corridor with a random position and orientation, should travel toward the blue cylinder located in one of the four possible ends of the double T-maze. The correct destination is marked by two green beacons located in the central corridor. For example, if the first green beacon is located on the right and the second on the left, as in the case of Figure 4.7, the robot should turn right and then left at the two junctions of the T-maze to reach the correct destination. The robot thus faces a time-delay problem in which the sensory information experienced at time t should influence the action of the robot at time t+n.
The brain of the robot includes 8 sensory neurons encoding the state of the infrared sensors, 8 visual neurons encoding the percentage of green and blue pixels detected in the 4 frontal sectors of the camera, and 2 motor neurons controlling the speed of the robot’s wheels. The robot can visually detect the green beacons and the blue target only when in their proximity due to the visual occlusions caused by the walls. The experiment has been repeated in two conditions in which the robot is provided with a feed-forward or with a recurrent neural network controller. In the first condition the robot cannot preserve information about previous observations in internal states. The robot is rewarded with 1 point for every successful episode.
As shown in Figure 4.8, the evolved robot manages to navigate to the correct destination most of the time. The analysis of the trajectories produced by varying the position of the beacons and the target destination clarifies the strategy used by the robots to solve the problem. The trajectories first converge in the initial portion of the central corridor and then diverge when the robot passes near the beacons. The initial convergence enables the robot to reduce the differences caused by the fact that its initial position and orientation vary randomly. The divergence, instead, enables the robot to enter one of four separate basins of attraction of the robot/environment dynamics, leading to the correct destination.
The observed behavior implies that the robot does not need to store the position of the green beacons in memory and exploit the stored information to decide whether to turn left or right at the junctions. Moreover, it implies that the robot does not need to recognize whether it is located at the first or at the second junction. The robot “stores” the information conveyed by the position of the green beacons in its relative position with respect to the maze walls. Specifically, it “stores” the information by assuming a distance from the left and right walls of the corridor that varies depending on the position of the beacons and that anticipates the direction in which the robot should turn at the next junction. Consequently, when the robot arrives at a junction, it can decide the direction of turning on the basis of its position and orientation with respect to nearby walls. In other words, the robot reacts to the perception of the green beacons by varying its relative position in the corridor, maintains the assumed position, and later uses that position to turn in the direction indicated by the green beacons experienced earlier.
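The offloading principle can be sketched as a toy reactive controller. This is a deliberate simplification, not the evolved controller: here each beacon happens to be encountered before the corresponding junction, whereas in the real maze both beacons precede both junctions and the robot's lateral offset encodes their combination. The percept labels and the ±1 position encoding are illustrative assumptions.

```python
# Toy sketch of cognitive offloading: the controller holds no internal
# state; one bit of information is "written" into the robot's lateral
# position in the corridor and "read back" at the junction.
def reactive_step(percept, lateral_pos):
    """percept: 'beacon_left', 'beacon_right', 'junction' or 'corridor'.
    lateral_pos: offset in the corridor (-1 = left wall, +1 = right wall).
    Returns (action, new lateral position)."""
    if percept == "beacon_left":
        return "shift_left", -1.0        # write the bit into the position
    if percept == "beacon_right":
        return "shift_right", +1.0
    if percept == "junction":
        action = "turn_left" if lateral_pos < 0 else "turn_right"
        return action, 0.0               # the bit is consumed; recenter
    return "go_straight", lateral_pos    # plain corridor: keep the offset

# Beacon on the right, then on the left -> turn right, then left.
pos = 0.0
for percept in ["beacon_right", "corridor", "junction",
                "beacon_left", "corridor", "junction"]:
    action, pos = reactive_step(percept, pos)
    print(percept, "->", action)
```

No variable in the controller survives between the beacon and the junction; the information travels through the environment, via the physically maintained offset, exactly as in the strategy described above.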
At this point we might wonder whether robots can discover the possibility of storing information in internal states and later using this information to act appropriately. The results of a second set of experiments, in which the robots were equipped with recurrent networks and the position of the robot was perturbed during motion, indicate that evolving robots are indeed able to develop cognitive strategies relying on memory. Consequently, the development of the strategy described above cannot be ascribed to the inability of the robots to develop alternative strategies.
In this new set of experiments, the robots are suddenly displaced to their left or to their right with a given low probability. In the presence of this form of perturbation, the robots can no longer solve the problem by using the strategy described above, or at least by using only that strategy, since the information stored in the robot’s position is cancelled by the displacement.
As shown in Figure 4.9, the robot managed to solve the problem also under these conditions. The analysis of the trajectories shows that it keeps using the cognitive offloading strategy. However, it extends the strategy with the ability to store information about the beacons in internal states and to use this information to re-enter the right basin of attraction after position perturbations.
These results demonstrate how reactive strategies, which use sensory-motor coordination to store relevant information in the environment, and cognitive strategies, which store relevant information in internal states, do not represent alternative solutions and can rather be combined to maximize the effectiveness of the robot’s behavior.
I would like to conclude this Chapter by going back to the notion of situatedness.
As for embodiment, we can define situatedness as a binary property that discriminates systems that are situated in an external environment, with which they interact, from systems that are not. This binary definition is useful since it allows distinguishing qualitatively different systems, e.g. autonomous robots from expert systems.
The term situatedness, however, can also be used to capture another property that is quantitative in nature. It can be used to indicate to what extent a situated system exploits the possibility of altering its perceptual and/or physical environment through actions. From this perspective, autonomous robots can be more or less situated. As for embodiment, the higher the level of situatedness, the greater the chance that the robot will operate effectively. Moreover, the higher the level of situatedness, the greater the amount of computation that is performed through the interaction between the robot and the environment, and consequently the smaller the amount of computation that needs to be performed internally by the brain.
Read Section 13.7 and familiarize yourself with the evorobotpy2 environments that permit evolving wheeled robots. Follow the instructions included in Exercise 6 to replicate the experiment described in Section 4.4. Analyze the behaviors obtained by robots that have two additional infrared sensors on the rear side and a recurrent neural network policy.
Bonani M., Longchamp V., Magnenat S., Retornaz P., Burnier D., Roulet G., Vaussard F. & Mondada F. (2010). The marXbot, a miniature mobile robot opening new perspectives for the collective-robotic research. IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, 2010, pp. 4187-4193, doi: 10.1109/IROS.2010.5649153.
Carvalho J.T. & Nolfi S. (2016). Cognitive offloading does not prevent but rather promotes cognitive development. PLoS ONE. 11(8): e0160679.
Gilbert S.J. (2015). Strategic offloading of delayed intentions into the external environment. The Quarterly Journal of Experimental Psychology, 68 (5): 971-992.
Metta G., Sandini G., Vernon D., Natale L. & Nori F. (2008). The iCub humanoid robot: an open platform for research in embodied cognition. In Proceedings of the 8th Workshop on Performance Metrics for Intelligent Systems, pp. 50-56.
Mondada F., Franzi E. & Ienne P. (1993). Mobile robot miniaturisation: A tool for investigation in control algorithms. In Proceedings of the Third International Symposium on Experimental Robotics, Kyoto, Japan.
Nolfi S. (1996). Adaptation as a more powerful tool than decomposition and integration. In: T.Fogarty and G.Venturini (Eds.), Proceedings of the Workshop on Evolutionary Computing and Machine Learning, 13th International Conference on Machine Learning, University of Bari, Italy.
Nolfi S. (2005). Categories formation in self-organizing embodied agents. In H. Cohen & C. Lefebvre (Eds), Handbook of Categorization in Cognitive Science. Oxford, UK: Elsevier.
Spivey M. (2007). The Continuity of Mind. New York: Oxford University Press.
Tuci E., Massera G. & Nolfi S. (2010). Active categorical perception of object shapes in a simulated anthropomorphic robotic arm. IEEE Transactions on Evolutionary Computation, 14 (6): 885-899.
von Hofsten C. & Ronnqvist L. (1988). Preparation for grasping an object: A developmental study. Journal of Experimental Psychology Human Perception and Performance, 14 (4): 610-621.