Behavioral and Cognitive Robotics
An adaptive perspective

Stefano Nolfi

© Stefano Nolfi, 2021


8. Swarm Robotics

8.1 Introduction

In the previous chapters we discussed problems involving a single robot. In this chapter, instead, we will consider problems concerning multiple robots or robotic swarms, i.e. groups composed of several individuals. By properly interacting, multiple robots can achieve goals that isolated individuals would be unable to achieve (Dorigo, Birattari & Brambilla, 2014).

Multiple cooperating robots can display the following desirable properties (Camazine et al., 2001):

For these reasons, swarm robotics is particularly interesting for dangerous applications such as demining or search and rescue, where the risk of losing robots is high.

The behavior of the individuals forming a swarm is not organized by a central entity that dictates instructions to individuals. Rather, it arises from the complex nonlinear dynamics of local interactions occurring in a distributed and decentralized way.

8.2 Reynolds’ boids

Pioneering research in this area was conducted by Craig Reynolds (1987) who designed swarms of simple agents, called boids, capable of producing flocking behaviors analogous to those exhibited by bird flocks and fish schools.

Each boid has an independent brain and observes the position and orientation of nearby boids, i.e. those boids located within the maximum Euclidean distance indicated by the grey circle in Figure 8.1. Consequently, boids react only to nearby flockmates. Each boid is provided with an actuator that permits varying its orientation and moves at a constant speed.

Figure 8.1. Exemplification of the control rules used by Boids. Boids are represented by isosceles triangles oriented in the direction of the corner opposite to the base. The grey circle contains the flockmates that can be perceived by the black boid located at the center. The red arrows represent the steering behavior generated by the separation, alignment, and cohesion rules on the basis of the position and orientation of nearby boids, shown in blue. Distant boids, shown in white, are ignored. Each boid steers in the direction of the resultant of three vectors calculated with the three corresponding rules. (Adapted from

Each boid turns in the direction of the resultant of the three steering vectors calculated by the following rules:

A large collection of boids operating on the basis of these three simple rules produces a flocking behavior similar to that displayed by flocks of birds (Video 8.1).

Video 8.1. The behavior produced by a swarm of boids (, by Gavin Wood, see also

Some characteristics of the overall behavior exhibited by the swarm can be tuned by varying the relative strength of the steering forces produced by the separation, alignment and cohesion rules. For example, reducing the relative strength of the cohesion rule increases the probability that the swarm divides into multiple flocks that later re-join into a single flock.
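The way the three steering vectors are weighted and summed can be illustrated with a minimal sketch. It is not Reynolds' original implementation: the function name boid_heading, the perception radius, the weights w_sep, w_ali and w_coh, and the inverse-distance weighting of the separation vector are all assumptions made for illustration.

```python
import math

def boid_heading(me, flock, radius=10.0, w_sep=1.5, w_ali=1.0, w_coh=1.0):
    """Return the heading (radians) toward which a boid steers.

    me and each flockmate are (x, y, heading) tuples. The radius and the
    three weights are illustrative values, not taken from Reynolds (1987).
    """
    x, y, heading = me
    # Only flockmates within the perception radius are considered.
    near = [b for b in flock
            if b is not me and math.hypot(b[0] - x, b[1] - y) <= radius]
    if not near:
        return heading  # no perceived flockmates: keep the current heading

    n = len(near)
    # Separation: vector pointing away from nearby flockmates,
    # weighted more strongly for the closest ones (assumed weighting).
    sep_x = sep_y = 0.0
    for bx, by, _ in near:
        d = math.hypot(bx - x, by - y) or 1e-9
        sep_x += (x - bx) / (d * d)
        sep_y += (y - by) / (d * d)
    # Alignment: average heading direction of nearby flockmates.
    ali_x = sum(math.cos(b[2]) for b in near) / n
    ali_y = sum(math.sin(b[2]) for b in near) / n
    # Cohesion: vector toward the average position of nearby flockmates.
    coh_x = sum(b[0] for b in near) / n - x
    coh_y = sum(b[1] for b in near) / n - y

    # Resultant of the three weighted steering vectors.
    rx = w_sep * sep_x + w_ali * ali_x + w_coh * coh_x
    ry = w_sep * sep_y + w_ali * ali_y + w_coh * coh_y
    if rx == 0.0 and ry == 0.0:
        return heading
    return math.atan2(ry, rx)
```

In this sketch, lowering w_coh relative to w_sep makes the resultant vector point less often toward the center of the local group, which is the kind of change that favors the splitting of the swarm into multiple flocks.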

The behavior of the swarm can also be enriched by using additional steering rules. For example, adding an obstacle-avoidance rule and a navigation rule, which consist respectively in steering away from nearby obstacles and toward the target destination, produces a flocking behavior in which the swarm of boids avoids obstacles and moves in a coordinated manner toward the target (Video 8.2).

Video 8.2. The boids approach a moving target in an environment including an obstacle (, by Janne Karhu, see also

Since the 1992 movie “Batman Returns”, directed by Tim Burton, the method has been commonly used to animate collective behaviors in animated movies.

8.3 Self-organization

The flocking behavior described above also illustrates another interesting property of collective systems: self-organization. We can define self-organization as the spontaneous formation of spatial, temporal, or spatiotemporal structures or functions in systems composed of many interacting elements. The structure (e.g. the flock) emerges from local interactions among the individuals without the intervention of external directing influences. Moreover, the emergent structure is typically robust with respect to external perturbations and capable of adapting spontaneously to environmental variations.

Self-organization arises as a result of multiple interactions among agents, fluctuations, and positive and negative feedbacks (Bonabeau, Dorigo & Theraulaz, 1999). Multiple interactions among agents are an obvious requirement. Fluctuations are necessary to explore new states or configurations. Positive and negative feedbacks are necessary to modulate fluctuations and to promote coordination among individuals. More specifically, positive feedback amplifies a trend that originates from a minor random deviation and grows through a sort of snowball effect. Negative feedback, on the other hand, damps or contains the amplification of the deviation. Overall, fluctuations regulated by positive and negative feedbacks enable the swarm to explore new configurations by abandoning undesirable states and by prolonging desirable ones.

Self-organized behaviors are ubiquitous in groups of social biological systems. Video 8.3, for example, illustrates a living bridge that emerges spontaneously as a result of the behaviors of a colony of ants that move back and forth between a food source and a nest. The living bridge, which is constituted by ants supporting the passage of other ants, creates a convenient shortcut. The bridge is progressively transposed and elongated so as to minimize the travelling distance between the nest and the foraging area (Reid et al., 2015).

Video 8.3. Living bridge built by army ants to create a shortcut (by Christopher R. Reid, Matthew J. Lutz, Simon Garnier, SwarmLab, New Jersey Institute of Technology, USA; Reid et al., 2015)

8.4 Self-organized path formation in a swarm of robots

Self-organizing properties are also commonly observed in swarms of robots evolved for the ability to perform some function.

Let’s consider for example the experiment illustrated in Figure 8.2, in which a group of 10 e-puck robots is evolved for the ability to forage by finding the nest and the foraging area and by moving back and forth between the two locations (Sperati, Trianni & Nolfi, 2011). The swarm of robots is rewarded with 1.0 point every time a robot enters one of the two areas for the first time or enters one area after visiting the other area.
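The reward rule can be made concrete with a short sketch. The function name swarm_reward and the encoding of each robot's history as a list of area labels are assumptions introduced here; the original experiment computes the reward inside the simulator.

```python
def swarm_reward(visits):
    """Count the reward earned by a swarm.

    visits holds, for each robot, the ordered sequence of areas it
    entered (labels such as 'A' and 'B' are hypothetical). A robot earns
    1.0 for its first entry into an area and 1.0 each time it enters an
    area different from the one it visited last.
    """
    total = 0.0
    for sequence in visits:
        last = None
        for area in sequence:
            if last is None or area != last:
                total += 1.0
            last = area
    return total
```

Under this rule a robot that keeps re-entering the same area gains nothing, so only trips between the two areas accumulate reward.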

Figure 8.2. (a) The e-puck robot equipped with the colored LED communication turret. (b) A schematic representation of the field of view of the camera of the robot.  The white and grey circles indicate the positions of the LEDs which can be turned on in blue and red, respectively. (c) The neural network architecture. (d) Schematization of the environment. The two gray circles represent the areas. The distance between them is selected randomly within [70, 150] cm at the beginning of evaluation episodes (Sperati, Trianni & Nolfi, 2011)

The robots include infrared sensors, ground sensors, a color camera, two motorized wheels, and an LED ring. In this experiment the frontal and rear LEDs can be turned on in blue and red, respectively. The camera has a view range of 144 degrees and can detect objects up to approximately 35 cm away, due to its limited resolution and to the noise added to the state of the simulated sensors (Figure 8.2b).

Each robot has a neural network with 13 sensory neurons, 3 internal neurons, and 4 motor neurons (Figure 8.2c). The sensory neurons encode the state of 8 infrared sensors, 1 ground sensor, and 4 visual sensors encoding the intensity of red and blue colors perceived on the frontal-left and frontal-right portions of the robot’s visual field. The motor neurons encode the desired speed of the robot’s wheels and whether the blue and red LEDs should be turned on or off. The colored light emitted by the LEDs can be perceived by the robots through their camera and can be used by the robots to coordinate.

The genotype of the evolving robots includes the connection weights and the biases of a single neural network controller that is duplicated N times. The N networks are then embedded in the N corresponding robots forming the swarm. This implies that the swarm is homogeneous, i.e. it is formed by identical individuals. As discussed in Section 9.2, the usage of homogeneous robots eliminates the conflict of interest among individuals and favors the development of cooperative behavior.
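The idea of a single genotype replicated across all robots can be sketched as follows. The layer sizes come from the text, but the fully connected feed-forward layout, the tanh activation, the ordering of the parameters inside the genotype, and the function names are assumptions made for illustration; the actual architecture is the one shown in Figure 8.2c.

```python
import math

N_IN, N_HID, N_OUT = 13, 3, 4  # sensory, internal and motor neurons
GENOTYPE_LEN = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT

def decode(genotype):
    """Split a flat genotype into weights and biases (assumed layout)."""
    assert len(genotype) == GENOTYPE_LEN
    it = iter(genotype)
    w1 = [[next(it) for _ in range(N_IN)] for _ in range(N_HID)]
    b1 = [next(it) for _ in range(N_HID)]
    w2 = [[next(it) for _ in range(N_HID)] for _ in range(N_OUT)]
    b2 = [next(it) for _ in range(N_OUT)]
    return w1, b1, w2, b2

def make_controller(genotype):
    """Build a function mapping 13 sensor values to 4 motor values."""
    w1, b1, w2, b2 = decode(genotype)

    def act(sensors):
        hidden = [math.tanh(sum(w * s for w, s in zip(row, sensors)) + b)
                  for row, b in zip(w1, b1)]
        return [math.tanh(sum(w * h for w, h in zip(row, hidden)) + b)
                for row, b in zip(w2, b2)]
    return act

def make_swarm(genotype, n_robots):
    # Homogeneous swarm: the same genotype is copied into every robot.
    return [make_controller(genotype) for _ in range(n_robots)]
```

Because every controller is decoded from the same genotype, robots that receive the same observation produce the same action, and the evolutionary algorithm only has to search the parameter space of one network regardless of the swarm size.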

The areas are marked by the grey color of the ground and by a red beacon located at their center. Since no explicit map of the environment is available, and since the sensory range of the robot is limited, the robots should explore the environment to find the areas.

The analysis of the evolved robots indicates that the swarm solves the problem by creating dynamic lanes, shaped, generated and maintained by the robots themselves. These lanes enable the robots themselves to move back and forth between the nest and the foraging area by minimizing the travel path.

Video 8.4 shows the behavior of a swarm composed of 10 robots. Video 8.5, instead, shows a post-evaluation in which the evolved network is embedded in 50 robots and in which the two areas are farther apart.

Video 8.4. The behavior of a swarm composed of 10 robots (Sperati, Trianni & Nolfi, 2011).

At the beginning of the evaluation episodes, when the robots are placed in randomly selected positions and orientations and the dynamic lanes are not yet formed, the robots move straight toward nearby target areas (if present), avoid collisions with other robots (if present), and move along semicircular trajectories in all other cases. Once they reach a target area, they invert their direction of motion by producing a U-turn. Moreover, the robots keep their frontal blue LEDs always on and dodge to the left when they perceive a blue light ahead.

Video 8.5. The behavior of a swarm composed of 50 robots (Sperati, Trianni & Nolfi, 2011).

The execution of these behaviors leads to the formation of robot lanes where the first robot keeps producing the behavior described above while the other robots just follow the first robot in an ordered queue. The creation of these linear formations enables the robots in the queue to start moving in the direction of a target area as soon as the first robot of the line perceives it and, consequently, before they can perceive the area with their own sensors.

The behavior displayed by groups of robots moving in linear formations, combined with the production of the U-turn behavior around target areas, leads to the formation of double lanes moving in opposite directions along an ellipse-like trajectory with one of its focal points in one of the target areas.

At the beginning these lanes are unstable and can fall apart after a while, as a result of the interference caused by robot–robot avoidance behaviors. However, new, more stable lanes are created later.

The progression toward the formation of a single lane connecting the two areas is ensured by the inherent instability of the dynamic lanes passing around a single target area and the stability of the ones passing around two target areas. Moreover, this progression toward lanes connecting the two target areas is favored also by the fact that the portion of the lane passing around the target area is more stable than the opposite portion of the lane when the latter does not yet pass around the other target area. This implies that a portion of the lane explores the environment until it finds the other area.

Once a dynamic lane rotating around the two target areas is formed, the problem is substantially solved. The solution, however, is further optimized by progressively eliminating unnecessary deviations from the shortest path and by equalizing the relative distance among robots so as to reduce the interference caused by obstacle avoidance behaviors.

These dynamic lanes are clear examples of self-organizing spatiotemporal structures that originate from the interaction of several robots operating on the basis of simple rules. Indeed, such spatiotemporal structures: (i) self-generate and self-maintain, (ii) are robust with respect to small environmental perturbations and to variations in the number of robots, and (iii) are scalable, i.e. they permit swarms composed of many robots to find and navigate between target areas that would be too hard to reach for swarms composed of fewer individuals.

The formation of lanes moving in opposite directions in these experiments is analogous to the spontaneous formation of lanes in pedestrian flows studied by Helbing and collaborators (Helbing, 1991; Helbing et al., 2005, Video 8.6).

Video 8.6. Lane formation in pedestrian counter-flows (Helbing, 1991; Helbing et al., 1995).

Helbing and collaborators modelled moving pedestrians as agents operating with two simple rules: (i) move in the direction of your destination at a certain preferred velocity, and (ii) steer and/or slow down to avoid collisions. The interactions between several agents and their environment mediated by these rules generate emergent properties, i.e. well-defined lanes moving in opposite directions. These properties enable the agents to flow in an ordered manner without the need to enforce such behavior with signs or other external interventions. The crowd appears equipped with a collective intelligence able to identify and follow the optimal path. The foraging robots and the groups of pedestrian agents thus constitute two examples of collective behaviors that emerge spontaneously from the interaction among the single agents without the need for a centralized coordination mechanism.
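The two pedestrian rules can be sketched as a velocity update. This is only loosely inspired by the social force idea: the exponential form of the repulsion, the parameter values, and the function name pedestrian_velocity are assumptions made for illustration and are not taken from the cited papers.

```python
import math

def pedestrian_velocity(pos, goal, others,
                        preferred_speed=1.3, reach=1.0, strength=1.0):
    """Return the (vx, vy) velocity chosen by one pedestrian agent.

    pos and goal are (x, y) points, others is a list of (x, y) positions
    of nearby pedestrians. All parameter values are illustrative.
    """
    # Rule (i): head toward the destination at the preferred speed.
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(gx, gy) or 1e-9
    vx = preferred_speed * gx / dist
    vy = preferred_speed * gy / dist
    # Rule (ii): steer and/or slow down to avoid collisions. Each
    # neighbor pushes the agent away, more strongly when it is closer
    # (assumed exponential decay with distance).
    for ox, oy in others:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy) or 1e-9
        push = strength * math.exp(-d / reach)
        vx += push * dx / d
        vy += push * dy / d
    return vx, vy
```

A pedestrian facing an oncoming neighbor slightly to one side is slowed down and deflected to the other side, which is the local interaction from which lanes can form when many agents move in opposite directions.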

8.5 Specialization

In some cases, the efficacy of robotic swarms can be improved through division of labor. This implies that two or more subsets of individuals perform different complementary tasks that are functional to the realization of the overall objective of the swarm (Ferrante et al., 2015).

An example that illustrates how a functional specialization of this type can emerge in an experiment involving evolving robots is reported in Pagliuca & Nolfi (2019). The experiment involves a swarm of simulated MarXbots (Bonani et al., 2010). The task consists in collecting invisible food elements distributed in the environment and in periodically releasing the food collected into the nest, i.e. the circular gray area (Video 8.7). The swarm of robots is rewarded with 1.0 point for each food element collected in the environment and released in the nest. The nest cannot be detected through the camera, i.e. the robots can only detect whether or not they are located over the nest through their ground sensors.

The robots are provided with infrared sensors, ground sensors, a color camera, two actuated wheels, and frontal red and rear blue LEDs which can be turned on or off.

Video 8.7. The behavior of an evolved swarmbot in the case of a replication that converged over a non-specialized solution (Pagliuca & Nolfi, 2019).

Video 8.7 shows the behavior displayed by an evolved swarmbot in the case of a replication that converged over a non-specialized solution. As can be seen, the robots display the ability to form lanes centered over the nest. These lanes enable the swarm of robots to explore the environment, to collect food, and to periodically return to the nest to release the food collected. More specifically the robots systematically explore the portions of the environment surrounding the nest by producing ellipsoid-shaped lanes rotating around the nest.

Other replications of the experiment, instead, converge over specialized solutions. For example, the replication shown in Video 8.8 converges to a solution in which the swarm self-divides into two sub-groups: a larger group collecting food and a smaller group (possibly formed by a single robot) marking the position of the nest. We can refer to the robots belonging to the former and latter groups as food collectors and nest markers.

Video 8.8. The behavior of an evolved swarmbot in the case of a replication that converged over a specialized solution (Pagliuca & Nolfi, 2019).

The larger group of food collectors consists of individuals moving independently in different directions to collect food. The smaller group of nest markers consists of individuals which remain in the nest to mark its position to the food collectors which need to periodically return to the nest.

The role of the individual robots might change dynamically during evaluation episodes, since food collectors returning to the nest can become nest markers and since nest markers might exit from the nest and become food collectors. The transition between the two roles is regulated so as to ensure that the number of robots playing the role of nest markers is small.

8.6 Coordinated Locomotion in Self-Assembling Swarm-Bots

Baldassarre et al. (2006, 2007) studied how a swarmbot formed by physically assembled robots could evolve the capability to move in a coordinated manner. The robots are composed of a chassis and a turret that can rotate around the vertical axis (Figure 8.3). The chassis includes a track with two motorized wheels, light sensors, and a force sensor that detects the direction and the intensity of the force that the turret of the robot exerts on the chassis. The turret includes a gripper that enables the robots to attach to their peers. When assembled, the robots can perceive whether and how the other robots are pushing or pulling them through the torque sensor. The robots can self-assemble to carry out missions that could not be performed by isolated robots, such as passing over a gap larger than a single robot. Once assembled, however, they need to coordinate to start moving together and to compensate for misalignments originating during motion.

Figure 8.3. A swarmbot composed of 8 assembled robots (Baldassarre et al., 2007)

A group composed of four robots assembled in a linear structure was evolved for the ability to move in the direction of the light, if present, or to move in any direction otherwise (Video 8.9). The fitness consists of the distance travelled by the swarmbot toward the light, in the former case, and the distance from the original to the final position of the swarmbot, in the latter case. As in the experiments reported in previous sections, the swarm is homogeneous (i.e. the genotype encodes the parameters of a single neural network policy that is replicated four times and embedded into the four corresponding robots).
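The two cases of the fitness measure can be written down as a small function. The name locomotion_fitness and the use of the swarmbot's center position are assumptions; in addition, "distance travelled toward the light" is interpreted here as the reduction of the distance from the light, which is one plausible reading rather than the formula used in the original experiments.

```python
import math

def locomotion_fitness(start, end, light=None):
    """Fitness of one evaluation episode for the assembled swarmbot.

    start and end are the (x, y) positions of the swarmbot at the
    beginning and at the end of the episode; light is the (x, y)
    position of the light source, or None when no light is present.
    """
    if light is None:
        # No light: reward the displacement from the initial position.
        return math.hypot(end[0] - start[0], end[1] - start[1])
    # Light present: reward how much closer to the light the swarmbot
    # got (assumed interpretation of "distance travelled toward").
    d_start = math.hypot(light[0] - start[0], light[1] - start[1])
    d_end = math.hypot(light[0] - end[0], light[1] - end[1])
    return d_start - d_end
```

Both cases reward sustained motion along a single direction, which the four robots can only achieve by agreeing on a common heading.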

Video 8.9. The behavior displayed by a swarmbot evolved for the ability to locomote and, when a light source is present, to move in its direction. The white lines over each robot indicate the direction and the intensity of the torque exerted by the turret of the robot on the chassis (Baldassarre, Parisi & Nolfi, 2006)

As can be seen, the evolved robots manage to negotiate a common direction of motion, start moving along that direction, and compensate for the misalignments originating during motion. The direction of motion corresponds to the direction of the light, when present.

Interestingly, the ability to coordinate acquired by these robots is robust with respect to the topology of their ensemble, to the number of robots, and to whether the robots are assembled through rigid or flexible links (i.e. whether or not the gripper permits a limited degree of relative motion). Moreover, the evolved robots post-evaluated in new environmental conditions spontaneously display useful additional behaviors. We can indicate this property with the term behavioral generalization. The term generalization indicates the ability of the robots to react appropriately to new observations. The term behavioral generalization refers to the ability to react to new environmental conditions by producing novel behaviors that are appropriate for the new conditions.

This is illustrated by Video 8.10, which shows the behavior of 8 robots assembled in a circular topology through non-rigid links and situated in a maze-like environment. The robots are provided with the policy network evolved for the four robots assembled into a linear structure situated in the empty environment shown in Video 8.9. This means that these robots never experienced obstacles before. As can be seen, the robots keep producing coordinated motion and coordinated light approaching in the new situation. In addition, they display an ability to avoid obstacles in a coordinated manner, to explore different portions of the environment, and to rearrange their shape so as to fit narrow passages. The appropriate alternation of these behaviors over time allows the swarmbot to reach a distant light target initially occluded by obstacles.

Video 8.10. The behavior produced by eight robots assembled into a circular structure in a maze environment including walls and cylindrical objects (represented by gray lines and circles). The robots start in the central portion of the maze and reach the light target located in the bottom-left side of the environment by exhibiting a combination of coordinated-movement behaviors, collective obstacle-avoidance, and collective light-approaching behaviors. (Baldassarre et al., 2006)

The ability of the robots to generalize to new conditions is also illustrated in Video 8.11, which shows the behavior produced by the same evolved neural network policy embedded in the physical robots. As can be seen, the swarmbots evolved in simulation transfer to hardware and maintain their ability to display coordinated motion regardless of the topology with which the robots are assembled and of the irregularity of the terrain.

Video 8.11. The behavior produced in the physical environment by swarmbots formed by 4 or 8 robots assembled in different topologies and located over flat and irregular surfaces (Baldassarre et al., 2007)

The overall behavior produced in the maze-like environment (Video 8.10) originates from the combination and alternation of several lower-level behaviors. Some of these behaviors correspond to those shown in the empty environment. Others originate from the combination of the control rules which permit the production of these behaviors and the new environmental conditions. Still others originate from the interaction and the alternation of the former behaviors over time.

Figure 8.4. Schematization of the behavior exhibited by the swarmbot in the maze-like environment. The central portion of the figure represents the brain and the body of four robots (shown in green and pink) situated in an environment (shown in yellow). The external portion indicates the behaviors exhibited by the swarm of robots. The behaviors shown in red and light red are rewarded directly or indirectly by the fitness function. The behaviors indicated in black originate as a result of the interaction of the control rules responsible for the generation of the rewarded behaviors and new environmental conditions or as a result of the interaction between lower-level behaviors (Nolfi, 2009)

Indeed, if we analyze the robots in the maze-like environment, we can see how their behavior can be characterized at four different levels of organization (Figure 8.4). At the first level we can identify the following four elementary behaviors:

The three control rules enabling the production of these four behaviors also permit the production of the other behaviors described below. More specifically, the combination and the alternation of these four elementary behaviors produce the following higher-level behaviors that generally extend over longer time spans:

The combination and the interaction between these behaviors produce the following higher-level behaviors that generally extend over still longer time spans:

The combination of all these behaviors leads to the overall behavior of the swarmbot:

It is worth noticing that only the behaviors shown in red in Figure 8.4 have been rewarded during the evolutionary process. The behaviors shown in light red have been rewarded indirectly since they are instrumental to the production of the behaviors shown in red. All other behaviors have not been rewarded either directly or indirectly. They emerge as a result of the interaction between the control rules responsible for the production of the behaviors shown in red and new environmental conditions, or they are the result of the interaction of lower-level behaviors.

This analysis demonstrates how adapting robots tend to spontaneously display behaviors with a multi-level and multi-scale organization. Moreover, it demonstrates how adaptive robots are capable of displaying not only the behaviors that have been exhibited and rewarded during the adaptive process but also additional potential behaviors that did not manifest in the environmental conditions experienced during the adaptive process. The control rules allowing the production of the rewarded behaviors can produce additional behaviors in new environmental conditions, which can enable the robots to immediately handle such conditions appropriately, without the need to adapt to them.

8.7 Swarm of robotic boats operating in a natural environment

Duarte et al. (2016) demonstrated how a swarm of robotic boats can operate successfully in a natural environment. The robotic boats include differential drive motors, wireless communication, a compass and GPS. The data collected by these sensors are used to update the state of a set of sensory neurons encoding the angle and position of a target destination and the distance of the other robots and of obstacles (in the frontal-left, frontal-right, rear-left and rear-right directions). The motor neurons encode the desired translational and rotational speed of the robotic boats.

The swarm of robots was evolved in simulation for the ability to perform: (i) a homing behavior consisting in navigating toward a given target destination while avoiding obstacles, (ii) a disperse behavior consisting in moving while maintaining a predefined target distance from the nearest robot, (iii) a clustering behavior consisting in approaching the other robots by possibly forming a single cluster, and (iv) a monitoring behavior consisting in maximizing the number of portions of the environment visited recently.

The evolution of robust solutions that could be transferred successfully from the simulated to the real environment was promoted by adding uncorrelated noise to sensors and actuators in simulation and by evaluating the evolving agents for multiple episodes carried out in variable environmental conditions.
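These two ingredients, noise injection and evaluation over multiple variable episodes, can be sketched as follows. The names add_noise and evaluate, the Gaussian form of the noise, and the simulate_episode callable are all hypothetical; the text only states that the noise is uncorrelated and that several episodes are used.

```python
import random

def add_noise(values, sigma, rng):
    """Add independent (uncorrelated) noise to each value.

    Gaussian noise is an assumption made for this sketch.
    """
    return [v + rng.gauss(0.0, sigma) for v in values]

def evaluate(policy, simulate_episode, n_episodes=10, seed=0):
    """Average the fitness of a policy over several episodes.

    simulate_episode is a hypothetical callable (policy, rng) -> fitness
    that draws its own environmental conditions from rng and applies
    add_noise to sensor readings and motor commands in its control loop.
    """
    rng = random.Random(seed)
    scores = [simulate_episode(policy, rng) for _ in range(n_episodes)]
    return sum(scores) / len(scores)
```

Averaging over episodes with different conditions penalizes policies that exploit the particulars of one simulated situation, which is the property that helps the evolved solutions survive the transfer to the real boats.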

The evolved swarms were post-evaluated successfully in a semi-enclosed water area of 300 x 190 m located in Lisbon, Portugal, for the ability to sample the water temperature over an area of interest. This was realized by manually dividing the mission into the following five sequential phases with predetermined time duration: (i) collectively navigate from the base station to the center of the area of interest (homing), (ii) disperse, (iii) monitor the area of interest, (iv) aggregate by using the clustering behavior, and (v) return to the base station (homing).
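A mission made of sequential phases with predetermined durations amounts to a time-indexed lookup of the active behavior. In the sketch below the phase order follows the text, while the durations, the MISSION table and the function name active_phase are made-up placeholders.

```python
from bisect import bisect_right
from itertools import accumulate

# (behavior, duration in seconds); the durations are placeholder values.
MISSION = [
    ("homing to area of interest", 120),
    ("dispersion", 60),
    ("monitoring", 600),
    ("clustering", 120),
    ("homing to base station", 120),
]

def active_phase(t, mission=MISSION):
    """Return the behavior active at mission time t (in seconds),
    or None before the start or after the end of the mission."""
    # Cumulative end time of each phase.
    ends = list(accumulate(duration for _, duration in mission))
    i = bisect_right(ends, t)
    return mission[i][0] if 0 <= t and i < len(mission) else None
```

Each boat can then select, at every control step, the evolved controller associated with the phase returned for the current mission time.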

8.8 Learn how

Learn how to implement a new AI-Gym environment based on the PyBullet 3D dynamic simulator from scratch by reading section 13.12 and by doing Exercise 10.


Baldassarre G., Parisi D. & Nolfi S. (2006). Distributed coordination of simulated robots based on self-organisation. Artificial Life, 12(3):289-311.

Baldassarre G., Trianni V., Bonani M., Mondada F., Dorigo M. & Nolfi S. (2007). Self-organised coordinated motion in groups of physically connected robots. IEEE Transactions on Systems, Man, and Cybernetics, 37(1):224-239.

Bonabeau E., Dorigo M. & Theraulaz G. (1999). Swarm Intelligence: From Natural to Artificial Systems. Oxford, U.K.: Oxford University Press.

Bonani M., Longchamp V., Magnenat S., Retornaz P., Burnier D., Roulet G. et al. (2010). The marXbot, a miniature mobile robot opening new perspectives for the collective-robotic research. In 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4187–4193.

Camazine S., Deneubourg J.-L., Franks N.R., Sneyd J., Theraulaz G. & Bonabeau E. (2001). Self-Organization in Biological Systems. Princeton University Press.

Dorigo M., Birattari M. & Brambilla M. (2014). Swarm robotics. Scholarpedia 9 (1): 1463.

Duarte M., Costa V., Gomes J., Rodrigues T., Silva F., Oliveira S.M. et al. (2016). Evolution of collective behaviors for a real swarm of aquatic surface robots. PLoS ONE 11(3): e0151834.

Ferrante E., Turgut A.E., Duenez-Guzman E., Dorigo M. & Wenseleers T. (2015). Evolution of self-organized task specialization in robot swarms. PLoS Computational Biology, 11(8): e1004273.

Helbing D. (1991). A mathematical model for the behavior of pedestrians. Behavioral Science, 36:298-310.

Helbing D. et al. (1995). Social force model for pedestrian dynamics. Physical Review E, 51:4282.

Helbing D., Buzna L., Johansson A. & Werner T. (2005). Self-organized pedestrian crowd dynamics: Experiments, simulations, and design solutions. Transportation Science, 39 (1).

Nolfi S. (2009). Behavior and cognition as a complex adaptive system: Insights from robotic experiments. In C Hooker (Ed.), Handbook of the Philosophy of Science. Volume 10: Philosophy of Complex Systems. General editors: Dov M. Gabbay, Paul Thagard & John Woods. Elsevier.

Pagliuca P. & Nolfi S. (2019). Robust optimization through neuroevolution. PLoS ONE 14(3): e0213193.

Reid C.R., Lutz M.J., Powell S., Kao A.B., Couzin I. D. & Garnier S. (2015). Army ants dynamically adjust living bridges in response to a cost–benefit trade-off. Proceedings of the National Academy of Sciences, 112(49): 15113-15118.

Reynolds C.W. (1987). Flocks, herds and schools: A distributed behavioral model. In M.C. Stone (Ed.), Proceedings of the 14th annual conference on Computer graphics and interactive techniques. New York: Association for Computing Machinery.

Sperati V., Trianni V. & Nolfi S. (2011). Self-Organised path formation in a swarm of robots. Swarm Intelligence, 5:97-119.