© Stefano Nolfi, 2021
In this chapter we will see how groups of evolving robots invent a communication system from scratch and use it to coordinate and cooperate. The evolved communication systems can include symbols, i.e. conventional signs analogous to the vocalizations or gestures used by humans and by some animals to indicate an entity or an event to another individual or to solicit an individual to perform an action. Moreover, we will see how a robot can acquire the ability to comprehend a restricted subset of a human language and use it to cooperate with a human user.
Disembodied neural networks can be trained to produce and/or comprehend human language. For example, they can be trained to translate a text from one language to another or to describe the content of pictures or videos by producing sentences in human language. The level of linguistic competence that these systems can acquire, however, is constrained by the limits of their interaction modalities. A translator system, which experiences and produces written text only, would not be able, for example, to ground (Harnad, 1990) the meaning of the word dog in the perceptual features of dogs. Similarly, a disembodied system trained to describe images in natural language would not be able to ground the meaning of the word chair in the proprioceptive and tactile information that can be experienced by sitting on a chair. A rich meaning of this kind, analogous to the meaning that we humans associate with the word chair, can only be acquired by a system that is able not only to visually observe chairs but also to physically interact with them. For embodied systems of this type a chair is, above all, an object on which you can sit, i.e. something that supports your body when you sit on it and relax the muscles of your legs. Indeed, tree stumps or rigid boxes, which afford sitting, can appropriately be conceptualized as chairs despite lacking the visual features typical of canonical chairs. Overall this implies that robots capable of interacting with the external environment can potentially acquire deeper linguistic competences than disembodied systems.
For this reason, in this chapter we will discuss not only how adaptive robots can develop communication skills but also how they can develop integrated behavioral and communication skills. Moreover, we will analyze how the development of communication skills affects the development of behavioral skills and vice versa. For a review of research studying the evolution of communication and language in robots and the usage of robots to model natural evolution see Nolfi & Mirolli (2010) and Cangelosi & Parisi (2012).
In addition, we will use the experiments described in this chapter to better clarify the factors that promote the evolution of cooperative behaviors in groups of robots and the importance of the multi-level and multi-scale organization of behavior and cognition in robots.
Communication can evolve spontaneously in groups of robots that are rewarded for the ability to carry out tasks requiring coordination and/or cooperation and that have the possibility to alter the perceptual environment of the other robots. This can be illustrated by a simple experiment in which groups of 10 MarXBot robots (Bonani et al., 2010) are evolved for the ability to forage by approaching and remaining near the food source and by avoiding the poison source (Video 9.1; Floreano et al., 2007; Mitri, Floreano & Keller, 2009). The food and poison sources correspond to the red objects located in the bottom-left and in the top-right part of Video 9.1.
The need to cooperate originates from the need to discriminate the food and poison sources from a distance. The two sources can be discriminated by individual robots located over them through the color of the ground, but they are visually identical from a distance. The possibility to alter the perceptual environment of the other robots is ensured by the fact that the robots have a ring of LEDs which can emit green or blue light.
The robots are provided with a neural network controller with 12 sensory neurons, which encode the color of the ground and the intensity of the green and blue light perceived over the frontal-left, frontal-right, rear-left and rear-right sectors of the omnidirectional camera. Moreover, they have 3 motor neurons that control the desired speed of the differential drive system and the color of the LEDs.
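The controller just described can be sketched as a minimal feedforward network mapping the 12 sensor readings to the 3 motor commands. The layer sizes follow the text; the absence of hidden units, the tanh activation, the random weights, and the example sensor values are illustrative assumptions rather than details reported by the authors:

```python
import math
import random

def make_policy(n_inputs=12, n_outputs=3, seed=0):
    """Create random weights for a minimal feedforward policy
    (12 sensory neurons -> 3 motor neurons; no hidden layer -- an assumption)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
               for _ in range(n_outputs)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_outputs)]
    return weights, biases

def forward(policy, sensors):
    """Map the sensor readings to [left speed, right speed, LED color]."""
    weights, biases = policy
    return [math.tanh(b + sum(w * s for w, s in zip(row, sensors)))
            for row, b in zip(weights, biases)]

# hypothetical readings: ground color plus the green and blue light
# intensities perceived over the four camera sectors
sensors = [0.0, 0.0, 1.0, 1.0, 0.2, 0.0, 0.0, 0.1, 0.9, 0.8, 0.0, 0.0]
motors = forward(make_policy(), sensors)
```

In the actual experiments, of course, the connection weights are not drawn at random but are shaped by the evolutionary process described below.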
The robots are rewarded and punished with +1 and -1 for every time step spent close to the food and poison sources, respectively. They are not rewarded for varying the color of their LEDs or for varying their motor behavior as a function of the color signals that they perceive. In other words, the robots do not receive any reward for producing signals, for learning to react to signals appropriately, or for producing the behaviors that are necessary to carry out the task. This, however, does not prevent the evolution of the required communicative and behavioral skills. The required abilities are discovered by the adapting robots since they are instrumental in approaching the food and avoiding the poison.
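The reward scheme amounts to a simple per-step function. The +1/-1 values follow the text; the planar position encoding and the proximity radius are illustrative assumptions:

```python
def step_reward(robot_pos, food_pos, poison_pos, radius=0.1):
    """Per-time-step reward: +1 near the food source, -1 near the poison
    source, 0 otherwise. Positions are (x, y) pairs; the radius defining
    'near' is an illustrative assumption."""
    def near(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5 < radius
    if near(robot_pos, food_pos):
        return 1
    if near(robot_pos, poison_pos):
        return -1
    return 0

# the fitness of a robot is simply the sum over the time steps of an episode
positions = [(0.02, 0.0), (0.5, 0.5), (0.99, 1.0)]
episode = [step_reward(p, (0.0, 0.0), (1.0, 1.0)) for p in positions]
```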
Video 9.1 shows the behavior of the robots at the end of an evolutionary process. As you can see, the robots manage to solve the task by displaying suitable motor behaviors, by varying the color of their LEDs in a meaningful way, and by reacting to the perception of color signals appropriately. In particular, in the case shown in the video, the robots turn their LEDs blue near the food source and react to the perception of blue robots by approaching them. This enables the robots to infer the location of the food and to navigate toward it.
In other replications of the experiment, the robots turn their LEDs green near the food source and blue far from it. This happens since choosing the blue or the green color to communicate the position of the food is clearly arbitrary and functionally equivalent. In other experiments, instead, the robots assume one color near the poison source and a different color elsewhere. This communication strategy is less efficient than the previous one. Indeed, some of the runs that first discover the second communication strategy later switch to the first communication strategy.
As illustrated by the authors, however, an effective communication system of this kind evolves only when the group of robots is homogeneous and/or when the fitness is computed at the level of the group (Figure 9.1).
The evolution of a collective behavior can be realized in three possible ways. The first possibility consists in creating the group of robots from a single genotype and in computing the fitness of such genotype on the basis of the performance of the group (Figure 9.1, top). In this case the group is homogeneous, i.e. it is formed by individuals sharing the same characteristics, and the fitness is computed at group level, i.e. the fitness of the genotype is the sum of the fitness of the robots forming the group. The second possibility consists in creating the group of N individuals from N different genotypes and in attributing to each genotype the sum of the fitness of the robots forming the group (Figure 9.1, center). In this case the group is heterogeneous and the fitness is computed at group level. The third possibility consists in creating the group of N individuals from N different genotypes and in computing the fitness of each genotype on the basis of the performance achieved by the corresponding individual robot (Figure 9.1, bottom). In this case the group is heterogeneous and the fitness is computed at individual level.
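The three schemes differ only in how genotypes are mapped to robots and in how fitness flows back to genotypes. The sketch below renders this schematically, with a toy `robot_score` function standing in for an actual robot evaluation:

```python
def evaluate_group(genotypes_per_robot, robot_score):
    """Return the individual score of each robot in a group whose
    i-th robot is built from genotypes_per_robot[i]."""
    return [robot_score(g) for g in genotypes_per_robot]

def fitness_homogeneous_group(genotype, n_robots, robot_score):
    # one genotype -> whole group; the genotype receives the group sum
    return sum(evaluate_group([genotype] * n_robots, robot_score))

def fitness_heterogeneous_group(genotypes, robot_score):
    # N genotypes -> N robots; every genotype receives the group sum
    total = sum(evaluate_group(genotypes, robot_score))
    return [total] * len(genotypes)

def fitness_heterogeneous_individual(genotypes, robot_score):
    # N genotypes -> N robots; each genotype receives its own robot's score
    return evaluate_group(genotypes, robot_score)

# toy example: a genotype is just a number and scores itself
scores = fitness_heterogeneous_group([1, 2, 3], robot_score=lambda g: g)
```

Notice how, in the third scheme only, a genotype can gain fitness at the expense of the rest of the group, which is the source of the conflict of interest discussed below.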
Effective communication emerges only in the first two cases, in which the group is homogeneous and/or the fitness is computed at group level. In the third case, in which the group is heterogeneous and the fitness is computed at individual level, there is a conflict of interest between the individuals composing the group. Consequently, the robots do not tend to evolve cooperative communicative skills. On the contrary, in the third case, the robots evolve deceptive communication, i.e. they learn to exploit the tendency of the other robots to approach or avoid a color by generating light signals soliciting the execution of the wrong behaviors (approaching the poison and avoiding the food). More precisely, these experiments lead to a continuous alternation of poor deceptive communication phases and poor cooperative communication phases (Mitri et al., 2009). Both phases are unstable. The development of deceptive communication leads to the retention, in successive generations, of variations that eliminate the production of inappropriate behaviors in response to the deceptive signals. Conversely, the unintended production of signals that can be used to infer the position of food or poison from a distance creates the adaptive conditions for the development of deceptive communication, i.e. for the retention, in successive generations, of variations producing deceptive signals.
We can now consider a slightly more complex experimental setting that permits observing how a communication system originates and evolves and how the development of communication abilities facilitates the development of behavioral abilities and vice versa.
The experiment (De Greef & Nolfi, 2010) involves two ePuck robots (Mondada et al., 2009) situated in an environment containing a foraging area and a nest. The robots are rewarded for the ability to find the two areas and to navigate back and forth between them (see Video 9.2). As in the case of the previous experiment, the need to cooperate originates from the fact that the target areas can be perceived only in their proximity, by means of the color of the ground.
The neural network policy of the robots includes 2 motor neurons, encoding the desired speed of the two actuated wheels, and 1 motor neuron that controls the frequency of the sound emitted by the speaker of the robot.
From the sensory point of view the robots are provided with: 8 infrared sensors distributed around the body allowing them to detect the presence of nearby obstacles, 2 ground sensors detecting the color of the ground under the robot, a sound sensor detecting the frequency of the sound produced by the other robot, and 3 visual sensors detecting the intensity of the red light perceived in the left, frontal, and right sectors of the camera. The red color provides information on the relative position of the other robot since each robot is surrounded by a stripe of red paper. Due to the limited capability of the ePuck robot processor, the sound is transmitted and received through the wireless connection.
Overall, this implies that a robot has the ability to affect the perception of another robot by moving in the environment (so as to alter the visual image perceived by the camera of the other robot) and by emitting sounds with varying frequency (so as to alter the state of the microphone of the other robot).
The robots are rewarded with 1 point every time they are concurrently located in the two areas, either for the first time or after exchanging areas, i.e. after the robot located in the white area has moved to the black area and vice versa. The group of robots is homogeneous and the fitness is computed at the group level. As in the case of the previous experiment, the robots are not rewarded for communicating or for producing any specific behavior.
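A minimal sketch of this reward rule, assuming the simulator can report which area (if any) each robot currently occupies; the `'white'`/`'black'`/`None` encoding is an illustrative assumption:

```python
def make_reward():
    """Return a stateful per-step reward function for the two-robot task:
    1 point when the robots concurrently occupy the two areas for the
    first time or after swapping areas; 0 otherwise."""
    last = {'config': None}  # last configuration that earned a point
    def reward(area_a, area_b):
        config = (area_a, area_b)
        # both areas occupied, and not the same configuration as last time
        if {area_a, area_b} == {'white', 'black'} and config != last['config']:
            last['config'] = config
            return 1
        return 0
    return reward

r = make_reward()
steps = [r('white', 'black'),   # first joint occupancy: rewarded
         r('white', 'black'),   # unchanged configuration: no reward
         r('black', 'white')]   # robots swapped areas: rewarded again
```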
Video 9.2 (top) shows an example of evolved behavior in simulation. The robots are initially situated in random positions with random orientations outside the target areas. At the beginning of the evaluation episodes the robots explore the environment by avoiding the walls and by moving straight in the attempt to find the black and white areas. Once they reach the two areas, they coordinate their behavior so as to navigate toward each other and consequently toward the area previously occupied by the other robot.
Video 9.2 (bottom) shows the behavior of the same robots in hardware and includes an audio with the sounds produced. The robots generate three classes of sounds with different frequencies in the black, white, and neutral areas. The frequency of the sounds also varies slightly inside the black and white areas as a function of the time passed inside the area.
Video 9.3 shows the behavior obtained in another replication of the experiment. Much like the previous case, the robots avoid obstacles and move straight outside the areas to explore the environment and find the target areas. In this replication, however, the robots move along the border of the areas after entering them. Moreover, they exchange their positions in two phases. First, the robot located in the white area exits and navigates toward the center of the black area. Then, the robot located in the black area exits and navigates toward the center of the white area. Notice how the robots manage to navigate toward the center of the areas even though they are unable to perceive the areas from a distance and are unable to perceive the center of the areas. In this second replication of the experiment, the robots emit sounds with only two well-differentiated frequencies (Video 9.3, bottom). More specifically, they produce a low frequency sound when they are located inside the black area and a high frequency sound outside the black area. Unlike the solution shown in Video 9.2, the frequency of the sound produced by the robots remains almost constant, i.e. it does not depend on the time spent inside the areas. These two solutions are representative of the solutions found in the other replications of the experiment.
We can now analyze how the solution shown in Video 9.3 develops during the evolutionary process (see Figure 9.2).
The analysis of the behavior exhibited by the robots during the first generations shows that, at this stage, they display two simple behaviors: moving forward and avoiding obstacles, produced when the infrared sensors are off and on, respectively. The combination of these two behaviors allows the robots to explore the environment and to occasionally receive a 1 point reward when the two robots happen to cross over the two areas concurrently.
After a few more generations, the robots develop a new behavior consisting in turning to one side when the ground sensor detects the black area. This produces a remain-in-the-black-area behavior that is realized by moving along a circular trajectory inside the area. This new behavior increases the probability of being rewarded since the reward is collected as soon as the other robot enters the white area.
The development of this new behavior naturally leads to the production of two differentiated sound signals: (i) a high frequency sound that is produced when the robot moves straight outside the black area, and (ii) a low frequency sound that is produced when the robot moves in circles inside the black area. This happens since the neural network policy tends to spontaneously produce different signals during the production of different behaviors (a phenomenon indicated with the term production bias, see Mirolli and Parisi, 2008). The state of the internal neurons, which influences both the motor neurons controlling the wheels and the motor neuron controlling the sound produced, is necessarily different during the production of the two behaviors. Consequently, the sound produced also tends to differ during the production of the two behaviors.
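The production bias can be illustrated with a toy network in which a single hidden layer drives both the wheel motors and the sound motor neuron. The weights below are arbitrary illustrative values; the point is only that different sensory contexts yield different hidden states and therefore, as a side effect, different sounds:

```python
import math

def policy(sensors, w_hidden, w_wheels, w_sound):
    """One shared hidden layer drives both the wheels and the sound neuron."""
    hidden = [math.tanh(sum(w * s for w, s in zip(row, sensors)))
              for row in w_hidden]
    wheels = [sum(w * h for w, h in zip(row, hidden)) for row in w_wheels]
    sound = sum(w * h for w, h in zip(w_sound, hidden))
    return wheels, sound

# arbitrary fixed weights (illustrative, not evolved)
w_hidden = [[0.5, -0.3], [-0.7, 0.9]]
w_wheels = [[1.0, 0.2], [0.3, -1.0]]
w_sound = [0.8, -0.6]

# two different sensory contexts (e.g. outside vs inside the black area)
_, sound_outside = policy([1.0, 0.0], w_hidden, w_wheels, w_sound)
_, sound_inside = policy([0.0, 1.0], w_hidden, w_wheels, w_sound)
# the hidden state differs in the two contexts, so the emitted sound
# differs as well, even though nothing selected for it
```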
The production of two different sounds inside and outside the black area does not have any utility initially, since the robots are not yet able to react to these sounds appropriately. However, it creates the adaptive conditions for the development of the following motor behaviors afforded by the signals: a remain-in-the-white-area behavior, produced in the absence of the low frequency sound; an exit-from-the-white-area behavior, triggered by the perception of the low frequency sound produced by the robot located in the black area; and an exit-from-the-black-area behavior.
The development of these three additional behaviors enables the robots to solve their adaptive task relatively well at an intermediate stage of the evolutionary process. They explore the environment until they find the black or white area, remain in the area in which they are located (unless they are both located in the same area) until the other robot enters the other target area, and then exit from the areas. However, the robots are not yet able to navigate directly to the other target area. After exiting an area, they simply resume their exploration behavior.
The problem of navigating directly toward the other area is challenging for robots which do not perceive the areas from a distance. On the other hand, the problem can be solved by re-using and adapting the behavioral and communicative skills developed in previous generations.
The first modification introduced for this purpose consists in exiting from the white area only when the robot visually perceives the other robot. The condition that triggers the exit-from-the-white-area behavior thus becomes the perception of the low frequency signal combined with the visual perception of the other robot. This permits the robot to exit and to navigate toward the other robot, which is located in the other area. This new strategy permits reaching the destination most of the time but fails occasionally, when the other robot is located on the border of the black area. Once again, however, the development of this new skill creates the conditions for the development of further skills. The relative position assumed by the robot located in the black area now matters in an evolutionary sense, since it determines the direction of the robot exiting from the white area. This leads to the second adaptive modification, which alters the way in which the robots remain in the black area. They now do so by moving along the border of the area and by stopping on one side when they visually perceive the other robot (see Video 9.3). This enables the robot located in the black area to stay on the left side of the area with respect to the other robot. This, in turn, enables the robot exiting from the white area to navigate toward the center of the black area.
The development of the move-along-the-border-and-look-at-the-other-robot behavior, obtained by modifying the remain-in-the-black-area behavior developed in previous generations, also solves the problem of navigating directly from the black area to the white area. This happens since the execution of this behavior implies that the robot located in the black area remains oriented toward the center of the white area. In other words, the execution of this behavior prepares the robot located in the black area to exit toward the center of the white area.
Overall, the development of all these behaviors leads to the optimal solutions shown in Video 9.3.
There are several aspects that are worth noticing in this experiment. The first is that evolution produces a progressive expansion of the behavioral and communication skills of the robots. This is due to the fact that progress is often achieved through the development of new skills and to the fact that old skills tend to be preserved.
The tendency to preserve old skills can be explained by considering that they tend to assume new functions in addition to the original function they were developed for. These additional functions derive from the fact that old skills often support new functional behaviors developed after them. As an example, consider the remain-in-the-black-area behavior and the associated production of low frequency sounds in the black area. This behavior not only increases the chance that the robots are rewarded once, i.e. its original function. It also supports the remain-in-the-white-area, the exit-from-the-white-area, and the exit-from-the-black-area behaviors developed later, which permit obtaining multiple rewards during an evaluation episode. This implies that the probability of retaining variations that eliminate the remain-in-the-black-area behavior is extremely low. The only variations that can be retained are those that vary the way the behavior is realized while preserving its functions. This is the case of the variations leading to the development of the move-along-the-border-and-look-at-the-other-robot behavior, which preserves the original function (remaining in the black area) and plays a new function (enabling the other robot to infer the position of the center of the black area).
A second important aspect is that the expansion of the behavioral and communicative skills of the robots increases their evolvability, i.e. their propensity to further improve their skills. This is due to the fact that new skills can often be produced by re-using old skills. Consequently, the larger the behavioral repertoire of the robots, the greater the chance that they will develop new skills. This positive feedback can potentially produce a snowball effect leading to a progressive acceleration of the adaptive capabilities of the robots.
The third aspect to notice is the strict interdependence between behavioral and communication skills. As we have seen, the development of new signals often supports the development of new behaviors and the latter, in turn, might support the development of new signals. This produces solutions with strongly integrated behavioral and communicative skills, in which the development of new behavioral and communication abilities support each other, producing a form of mutual scaffolding (see also Cangelosi et al., 2010).
Robots can also acquire the ability to comprehend and/or produce human language and use it to cooperate with a human agent. For instance, a robot can be trained to interact with a human in order to satisfy his or her requests expressed in natural language.
An example of how this can be realized is illustrated in an experiment reported by Tuci et al. (2011) in which an iCub humanoid robot (Metta et al., 2008) was trained for the ability to comprehend and accomplish natural language requests such as “touch the red object” or “grasp the blue object”.
As shown in Video 9.4, the robot is situated in front of a table containing three objects of different colors. The neural network policy of the robot receives as input the sensory information extracted from the robot's camera, the proprioceptive information encoding the current positions of the joints of the left arm/hand, the tactile information collected by contact sensors located on the fingers and on the palm of the hand, and the command expressed in human language. The command is composed of an action word, selected from the set [“indicate”, “touch”, “grasp”], and an object word, selected from the set [“red object”, “green object”, “blue object”]. For the sake of simplicity, the words forming the linguistic commands are not recognized from speech but are encoded directly in the states of six input neurons, one for each of the six words. The motor neurons of the policy network encode the desired positions of the 7 DOF of the left arm, the 2 DOF of the head, the 2 DOF of the torso, and the 2 DOF that control the extension/flexion of the thumb and of the other four fingers.
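The linguistic input just described can be sketched as follows; the ordering of the six neurons (three action units followed by three object units) is an assumption made for illustration:

```python
ACTIONS = ["indicate", "touch", "grasp"]
OBJECTS = ["red object", "green object", "blue object"]

def encode_command(action, obj):
    """Encode a two-word command as the states of six input neurons:
    one unit per word, with exactly one action unit and one object
    unit active at a time."""
    units = [0.0] * 6
    units[ACTIONS.index(action)] = 1.0
    units[3 + OBJECTS.index(obj)] = 1.0
    return units

code = encode_command("grasp", "blue object")
```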
The 3 action and the 3 object words can be combined to form 9 different commands. However, to verify the ability of the robots to generalize, the authors trained the robots with 7 commands only and then post-evaluated the trained robots with all 9 commands. In addition, to verify whether the composition of the behavior set was relevant, the authors replicated the experiment in a control condition in which the action word “indicate” was substituted with “avoid”.
The connection weights of the policy network are encoded in a population of genotypes and evolved. During training, the robots were evaluated for 7 episodes, experiencing one of the 7 commands in each episode. The robots were rewarded for the ability to produce a behavior appropriate to the current command, i.e. for bringing the hand over the right object in the case of indicate commands, for touching the right object in the case of touch commands, and for grasping the right object in the case of grasp commands. In the control experiment, the robots were rewarded for moving the hand away from the object in the case of avoid commands. In other words, the robots were rewarded both for producing seven different behaviors and for comprehending the meaning of the sentences by executing the appropriate behavior.
The evolved robots manage to understand the 7 commands experienced during training by exhibiting the 7 corresponding behaviors in all replications of the experiment (see Video 9.4 for an example of the behaviors produced). Moreover, in some of the replications, the robots also understand the two commands that were not used during training by producing the appropriate corresponding behaviors. In other words, the robots display an ability to comprehend the meaning of the individual words and to combine these meanings in a compositional manner. This ability to generalize is never observed in the control experiments performed with the “avoid” action word.
The analysis of the solutions discovered by the robots in the different replications of the experiment shows that the robots which generalize solve the task by developing a set of elementary behaviors and by combining these elementary behaviors to produce the 9 different required behaviors.
More specifically, the robots which understand all 9 commands display the ability to produce the following 5 elementary behaviors: an indicate-the-red-object, an indicate-the-green-object, and an indicate-the-blue-object behavior, which bring the hand over the corresponding object, plus a touch and a grasp behavior, which are produced once the hand is located over the appropriate object.
The object word triggers the appropriate indicate behavior, which takes the hand of the robot over the appropriate object. Then the touch or grasp words trigger the corresponding behaviors. The indicate word does not trigger any additional behavior.
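The compositional strategy described above can be rendered as a short sketch; the behavior labels are illustrative names, not identifiers used by the authors:

```python
def compose(action, obj):
    """Return the sequence of elementary behaviors that realizes a command,
    following the compositional strategy described in the text."""
    behaviors = ["indicate-" + obj]  # the object word brings the hand over the object
    if action == "touch":
        behaviors.append("touch")    # the action word adds a second behavior
    elif action == "grasp":
        behaviors.append("grasp")
    # "indicate" adds nothing: the reaching behavior already realizes it
    return behaviors

seq = compose("grasp", "red")
```

The five elementary behaviors thus suffice to produce all nine composite behaviors, which is what makes generalization to the two held-out commands possible.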
The reason why the adaptive process discovers combinatorial solutions of this type in the standard experimental condition is that these solutions require simpler control rules than non-modular solutions, which would treat each of the 7 language commands independently.
The reason why combinatorial solutions do not emerge in the control condition with the “avoid” action word is that, in this case, the level of overlap among the required behaviors is smaller. Consequently, the advantage that can be gained by generating the 7 required behaviors through the recombination of a smaller set of elementary behaviors is also smaller.
This experiment extends the work of Sugita & Tani (2005), who first demonstrated the possibility of acquiring compositional and integrated behavioral and language comprehension capabilities in robots.
Before concluding, let's discuss some of the implications of the complex system nature of robots' behavior (Nolfi, 2009). The behavior exhibited by a robot often displays a modular organization with semi-discrete and semi-dissociable sub-behaviors playing specific functions. The organization of these behaviors extends over multiple levels since higher-level behaviors, lasting for longer time periods, often originate from sequences of lower-level behaviors, lasting for shorter time periods (Figure 9.3).
This multi-level and multi-scale organization has several advantages:
Read Section 13.13 and do Exercise 11 to learn how to implement back-propagation.
Bonani M., Longchamp V., Magnenat S., Retornaz P., Burnier D., Roulet G., Vaussard F. & Mondada F. (2010). The marXbot, a miniature mobile robot opening new perspectives for the collective-robotic research. IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, 2010, pp. 4187-4193, doi: 10.1109/IROS.2010.5649153.
Cangelosi, A., Metta, G., Sagerer, G., Nolfi, S., Nehaniv, C., Fischer, K., ... & Zeschel, A. (2010). Integration of action and language knowledge: A roadmap for developmental robotics. IEEE Transactions on Autonomous Mental Development, 2(3), 167-195.
Cangelosi A. & Parisi D. (2012). Simulating the evolution of language. Springer Science & Business Media.
De Greef J. & Nolfi S. (2010). Evolution of implicit and explicit communication in a group of mobile robots. In S. Nolfi & M. Mirolli (Eds.), Evolution of Communication and Language in Embodied Agents. Berlin: Springer Verlag.
Floreano D., Mitri S., Magnenat S. & Keller L. (2007). Evolutionary conditions for the emergence of communication in robots. Current Biology, 17: 514-519.
Harnad S. (1990). The symbol grounding problem. Physica D: Nonlinear Phenomena, 42(1-3): 335-346.
Metta G., Sandini G., Vernon D., Natale L. & Nori F. (2008). The iCub humanoid robot: an open platform for research in embodied cognition. In Proceedings of the 8th workshop on performance metrics for intelligent systems, pp. 50-56.
Mirolli M. & Parisi D. (2008). How producer biases can favor the evolution of communication: An analysis of evolutionary dynamics. Adaptive Behavior, 16: 27-52.
Mitri S., Floreano D. & Keller L. (2009). The evolution of information suppression in communicating robots with conflicting interests. Proceedings of the National Academy of Sciences, 106: 15786-15790.
Mondada F., Bonani M., Raemy X., Pugh J., Cianci C., Klaptocz A., ... & Martinoli A. (2009). The e-puck, a robot designed for education in engineering. In Proceedings of the 9th Conference on Autonomous Robot Systems and Competitions, Vol. 1, pp. 59-65. IPCB: Instituto Politécnico de Castelo Branco.
Nolfi S. (2009). Behavior and cognition as a complex adaptive system: Insights from robotic experiments. In C Hooker (Ed.), Handbook of the Philosophy of Science. Volume 10: Philosophy of Complex Systems. General editors: Dov M. Gabbay, Paul Thagard and John Woods. Elsevier.
Nolfi S. & Mirolli M. (2010). Evolution of Communication and Language in Embodied Agents. Berlin: Springer Verlag.
Sugita Y. & Tani J. (2005). Learning semantic combinatoriality from the interaction between linguistic and behavioral processes. Adaptive Behavior, (13) 1: 33–52.
Tuci E., Ferrauto T., Zeschel A., Massera G. & Nolfi S. (2011). An experiment on behaviour generalisation and the emergence of linguistic compositionality in evolving robots. IEEE Transactions on Autonomous Mental Development, 3 (2): 176-189.