Behavioral and Cognitive Robotics
An adaptive perspective

Stefano Nolfi





2. From Braitenberg Vehicles to Neuro-Robots

2.1 Introduction

The first systematic attempts to build autonomous robots were carried out around the middle of the last century by cyberneticists, an international group of researchers interested in combining ideas and theories from control theory, information theory, biology, and neuroscience.

An example of these pioneering robots is the machina speculatrix (Figure 2.1) built by William Grey Walter, one of the most influential exponents of cybernetics. Walter’s robot has contact, light, and battery sensors. The sensors are connected, by means of electric wires and relays, to two motors that control the rotation speed of the driving wheel and the steering gear. The state of the motors varies as a direct function of the state of the sensors. These simple components enable the robot to appropriately alternate between a wandering and a recharging behavior. The latter is produced when the battery is low and is realized by approaching the light located over the recharging station, waiting until the battery is recharged, and then moving away from the recharging station to resume the wandering behavior.

Interest in these studies unfortunately declined in the 1960s as a consequence of the rise of Artificial Intelligence and of the popularity of the deliberative approach. However, interest in the cybernetic approach, which looked at natural organisms as an important source of inspiration and considered intelligent behavior a property arising from the direct interaction between the agent and its environment, resumed in the 1980s. Indeed, the contributions of Valentino Braitenberg, who published a short but influential book called Vehicles: Experiments in Synthetic Psychology (Braitenberg, 1984), and of Rodney Brooks, the visionary roboticist who introduced the behavior-based approach, can be considered a continuation of the cybernetic approach.

Figure 2.1. The machina speculatrix of William Grey Walter (1953).

2.2 Braitenberg’s vehicles

Valentino Braitenberg, who had a background in medicine and psychiatry, was interested in the study of the human brain. However, like the cyberneticists, he liked the synthetic approach. Namely, he believed that building artificial agents capable of exhibiting natural behavior would contribute to deepening our understanding of natural intelligence. Given the difficulty of constructing robots with the technology of the time, he decided to carry out thought experiments, imagining robots of increasing complexity that he called vehicles.

Braitenberg’s vehicles include sensors measuring continuous properties of the environment (e.g. light intensity or temperature) and motors, which enable the vehicles to move. The brain of a vehicle is realized by connecting the sensors to the motors through wires, either directly or indirectly through internal neurons. The way in which sensors and motors are positioned and wired is chosen by taking inspiration from general properties of natural nervous systems, such as symmetry, cross-lateral connections, excitation and inhibition, and nonlinearity.

The first vehicle hypothesized by Braitenberg, called vehicle 1, is the simplest possible vehicle that can be built since it includes a single motor, a single sensor, and a single linear excitatory wire that connects the sensor to the motor. The sensor measures the intensity of a certain environmental quantity, for example the temperature. The motor controls the forward speed of the driving wheel.

Figure 2.2. Vehicle 1. The large circle, the small circle, and the rectangle indicate the body, the sensor, and the motor of the vehicle. The green line indicates the excitatory wire.

Wires come in two varieties: they can be excitatory or inhibitory. The presence of a wire connecting a sensor to a motor implies that the activity of the motor depends on the state of the sensor. In the case of excitatory wires, the higher the activation of the sensor is, the higher the speed of the motor will be. Vice versa, in the case of inhibitory wires, the higher the activation of the sensor is, the lower the speed of the motor will be. In the case of linear wires, the relation between the activation of the sensor and the speed of the motor is linear. In the case of non-linear wires, the relation between the two varies according to some non-linear function.

Vehicle 1 will move in the direction in which it points, whatever that is. However, it will move slowly in cold regions and faster in warm regions. From the point of view of an external observer, therefore, this vehicle appears to “like” cold regions, in which it spends most of its time, and to “dislike” warm regions, from which it moves away at higher speed. The behavior of this vehicle will also depend on the characteristics of its environment, and more specifically on the distribution of the temperature in space and on the characteristics of the terrain. Indeed, it will move exactly straight only in an idealized, perfectly flat environment. On more realistic surfaces, instead, it will deviate from its course due to the irregularities of the terrain. In the long run, it will produce a complicated trajectory, curving one way or another without apparent reason.
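The control rule of vehicle 1 can be summarized in a couple of lines of Python. This is only an illustrative sketch: the sensor readings and the gain of the wire are hypothetical values, not quantities specified by Braitenberg.

```python
# Illustrative sketch of vehicle 1: a single sensor connected to a single
# motor through a linear excitatory wire. The gain is an arbitrary value
# chosen for the example.
def vehicle1_speed(sensor, gain=1.0):
    """Speed of the driving wheel given the sensed intensity (e.g. temperature)."""
    return gain * sensor

print(vehicle1_speed(0.2))  # cold region -> the vehicle moves slowly
print(vehicle1_speed(0.9))  # warm region -> the vehicle moves fast
```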

Figure 2.3. Vehicle 2a and 2b (left and right, respectively). The vehicles include two sensors and two motors on the left and right portions of their body. The yellow disk indicates a light bulb suspended over the floor. The dashed lines indicate the trajectories produced by the vehicles.

Let’s now imagine a slightly more complex vehicle, vehicle 2, which has two sensors and two motors located in the left and right portions of its body. Let’s assume that the sensors measure ambient light and that the environment contains a light bulb suspended over the floor (Figure 2.3). We can create three versions of this vehicle by using excitatory wires, depending on whether we connect: (a) each sensor to the motor on the same side, (b) each sensor to the motor on the opposite side, or (c) both sensors to both motors. Case (c) behaves like vehicle 1, so we will consider only (a) and (b). In the case of vehicle 2a, the motor located nearer the light will run faster than the other motor. Consequently, the vehicle will steer away from the light and will then move straight away from it at decreasing speed once the light is located on its rear side. In the case of vehicle 2b, instead, the motor located nearer the light will run slower than the other motor. Consequently, the vehicle will steer in the direction of the light and will later move toward it at increasing speed, once the light is located on its frontal side. In principle, vehicle 2a too might move toward the light source, if it happens to be oriented exactly toward it. In practice, however, small misalignments originating during motion, resulting from friction and/or from noise, will amplify over time as a consequence of a positive feedback mechanism (i.e. as a consequence of the fact that small misalignments produce actions that increase the misalignment). The “temperaments” of vehicles 2a and 2b thus look quite different. Both vehicles seem to “dislike” light sources. However, vehicle 2a “fears” the light and runs away from it, slowing down only when the intensity of the light diminishes. Vehicle 2b, instead, is “aggressive”. It resolutely faces the light and moves toward it at increasing speed, as if it “wants” to destroy it.

Figure 2.4. Vehicle 3a and 3b (left and right, respectively). The red lines indicate inhibitory wires.

By using inhibitory instead of excitatory wires, we can create two other vehicles: vehicles 3a and 3b (Figure 2.4). Vehicle 3a has straight connections between sensors and motors. Vehicle 3b, instead, has crossed connections. Both vehicles will slow down near the light and will therefore spend time near it. Vehicle 3a will steer toward the light, since the motor located nearer the light will slow down more than the other motor. The vehicle will then move toward the light at decreasing speed, finally stopping in front of it. Vehicle 3b, instead, will steer away from the light at decreasing speed and will then move straight away from the light at increasing speed. Both vehicles thus “like” the light, although they do so in different manners. Vehicle 3a “loves” it in a permanent way: it turns toward the light, approaches it, and then stops in front of it “to admire its beauty”. Vehicle 3b “loves” the light but also likes “to keep an eye open” for other possible lights around.
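The four vehicles described so far (2a, 2b, 3a, and 3b) differ only in whether the wires are straight or crossed and excitatory or inhibitory. The following sketch makes this explicit; the sensor readings, the gain, and the baseline speed used for the inhibitory vehicles are illustrative assumptions, not values given in the book.

```python
# Illustrative sketch of vehicles 2a, 2b, 3a and 3b. Each vehicle has a left
# and a right light sensor and a left and a right motor. The behavior depends
# on whether the wires are straight or crossed and excitatory or inhibitory.
# The gain and the baseline speed are arbitrary values chosen for the example.
def motor_speeds(sensor_left, sensor_right, crossed, excitatory,
                 gain=1.0, baseline=1.0):
    if crossed:
        drive_left, drive_right = sensor_right, sensor_left
    else:
        drive_left, drive_right = sensor_left, sensor_right
    if excitatory:
        # Excitatory wires: the more light a sensor receives, the faster
        # the motor it drives.
        return gain * drive_left, gain * drive_right
    # Inhibitory wires: the more light, the slower the motor, starting
    # from a constant baseline speed.
    return baseline - gain * drive_left, baseline - gain * drive_right

# Light on the left side of the vehicle (left sensor more activated):
# vehicle 2a (straight, excitatory) steers away from the light,
print(motor_speeds(0.8, 0.2, crossed=False, excitatory=True))
# vehicle 2b (crossed, excitatory) steers toward the light,
print(motor_speeds(0.8, 0.2, crossed=True, excitatory=True))
# vehicle 3a (straight, inhibitory) steers toward the light and slows down,
print(motor_speeds(0.8, 0.2, crossed=False, excitatory=False))
# vehicle 3b (crossed, inhibitory) slows down but steers away from the light.
print(motor_speeds(0.8, 0.2, crossed=True, excitatory=False))
```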

Clearly, we cannot continue a systematic analysis of all the possible vehicles that can be built, since the number of possibilities explodes when the number of sensors and/or motors increases. A slightly more complex case that is worth mentioning is a vehicle including two types of sensors, e.g. light sensors measuring the intensity of light and infrared sensors measuring the proximity of nearby obstacles. Let’s consider, for example, the vehicle shown in Figure 2.5, which has two light sensors and two infrared sensors connected to the motors through crossed excitatory and inhibitory wires, respectively. The vehicle will approach the light (by steering toward it and accelerating as it gets closer) and will avoid obstacles (by steering away from obstacles and decelerating near them). If the conductivity of the wires (i.e. the intensity of the effect of the excitatory and inhibitory wires) is equal, the vehicle can stall in the attempt to steer in one direction to approach the light and in the opposite direction to avoid the obstacle. Instead, if the conductivity of the wires originating from the infrared sensors is greater than that of the wires originating from the light sensors, the vehicle will correctly prioritize the obstacle-avoidance behavior over the light-approaching behavior. In other words, the vehicle will avoid proximal obstacles even when this conflicts with the need to approach the light and will steer toward the light only when this does not conflict with the need to avoid obstacles. Notice also how this simple architecture with four sensors and four wires enables the vehicle to display sequential behaviors, e.g. an obstacle-avoidance behavior followed by a light-approaching behavior.

Figure 2.5. A vehicle provided with two infrared sensors (IR) measuring the proximity of nearby obstacles and two light sensors (L) situated in an environment that contains an obstacle and a light bulb suspended over the floor. The thickness of the green and red lines represents the conductivity of the corresponding excitatory and inhibitory wires.

Incidentally, the vehicle of Figure 2.5 exemplifies a method of behavior arbitration that is an alternative to the subsumption method introduced by Brooks (1986). The method consists in enabling the modules responsible for the production of the different behaviors to operate in parallel while ensuring that the contribution of the module that should be prioritized is significantly greater than the contribution of the other modules (Arkin, 1998). The priority levels of behaviors can be fixed, as in this example, or can be regulated in a context-dependent manner. In the latter case, the relative contribution of the different modules is varied depending on the context, where the relevant contexts can be discriminated by the robot on the basis of information extracted from its observations.
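The same kind of weighting can be sketched in a few lines for the vehicle of Figure 2.5. The sketch assumes that the inhibitory wires leaving the infrared sensors are crossed, consistently with the avoidance behavior described above, and uses arbitrary conductivity values; the only point that matters is that the conductivity of the infrared wires is larger than that of the light wires.

```python
# Illustrative sketch of the vehicle of Figure 2.5. The light sensors are
# connected through crossed excitatory wires (light-approaching module) and
# the infrared sensors through crossed inhibitory wires (obstacle-avoidance
# module). The conductivity of the infrared wires (ir_gain) is larger than
# that of the light wires (light_gain), so avoidance takes priority.
# All numeric values are arbitrary choices made for the example.
def motor_speeds(light_left, light_right, ir_left, ir_right,
                 light_gain=0.5, ir_gain=2.0, baseline=1.0):
    left = baseline + light_gain * light_right - ir_gain * ir_right
    right = baseline + light_gain * light_left - ir_gain * ir_left
    # Motors cannot run backward in this sketch.
    return max(0.0, left), max(0.0, right)

# Light on the left, no obstacle: the vehicle steers toward the light.
print(motor_speeds(0.8, 0.2, 0.0, 0.0))
# Light on the left and an obstacle on the left: avoidance dominates and
# the vehicle steers away from the obstacle.
print(motor_speeds(0.8, 0.2, 0.9, 0.1))
```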

Finally, a thought experiment that is worth mentioning concerns the analysis of how the behavior produced by a vehicle can vary as a result of the characteristics of the environment in which it is situated. Consider, for example, vehicle 2a (Figure 2.3, left) equipped with infrared proximity sensors and situated near an obstacle or near a maze-like structure (Figure 2.6). When placed near the obstacle, the vehicle will steer to avoid it and will then move straight away from it. When placed at the beginning of the maze-like structure, instead, the vehicle will display an articulated behavior that enables it to navigate the maze. This example shows how the behavior produced by a robot depends crucially on the local characteristics of the environment. Moreover, it shows how a robot can produce multiple behaviors on the basis of a single controller. More generally, it shows how the number and the type of behaviors produced by a robot do not depend only on the brain of the robot but also on the environment.

Figure 2.6. The vehicle 2a with infrared proximity sensors situated near an obstacle (left) and near a maze-like structure (right).

In summary, Braitenberg’s vehicles illustrate in a simple and vivid way several important aspects characterizing robots and more generally embodied and situated agents. Firstly, robots can display purposeful behaviors, i.e. behaviors allowing the achievement of a goal, without possessing any representation of their goal. As we will see in Chapter 12, possessing an explicit representation of the goal can provide advantages, but is not a pre-requisite for producing goal-directed behaviors. Secondly, the characteristics of the environment co-determine the behavior produced by a robot. The environment is as important as the brain for the determination of the behavior exhibited by the robot. Thirdly, the complexity of the behavior produced by a robot can exceed the complexity of the robot’s brain. In other words, robots can produce complex behaviors without necessarily possessing complex brains.

2.3 Neuro-Robots

The electric wires and the relays used by the cyberneticists or imagined by Braitenberg can conveniently be replaced with artificial neural networks formed by interconnected sensory, internal, and motor neurons. Modern robots of this type are referred to as neuro-robots.

Artificial neural networks are constituted by simple computational units, called neurons, interconnected by directional weighted links, called connections. Neurons produce as output a real number that is a function of the inputs received from the incoming connections or, in the case of sensory neurons, from the sensors. The input received from an incoming connection depends on the output of the neuron from which the connection originates (the pre-synaptic neuron) and on the connection weight, which is a real number. Connections with positive weights increase the output of the post-synaptic neuron and are referred to as excitatory connections; connections with negative weights decrease it and are referred to as inhibitory connections.

The way in which the neurons are wired together determines the architecture of the neural network. Usually neurons are organized in layers: a sensory layer, one or more internal layers, and a motor layer. In fully connected feed-forward architectures, each neuron of a layer is connected to each neuron of the following layer (Figure 2.7, left). The information thus flows in one direction only, from the sensory neurons to the motor neurons. In recurrent neural networks, instead, neurons can also be connected to neurons of the same or of previous layers (Figure 2.7, right).

Figure 2.7. Left: a feed-forward neural network with two internal layers. Right: a fully-recurrent neural network with one internal layer. Circles represent neurons. Rectangles represent layers. Arrows represent connections from all pre-synaptic neurons to all post-synaptic neurons. The outputs of the neurons and the connection weights are not shown.

In the case of sensory neurons, the output corresponds to the state of the corresponding sensor. For example, the activation state of 16 distance sensors can be encoded in 16 corresponding sensory neurons. The 640x480 image perceived by a color camera can be encoded in 921,600 sensory neurons encoding the RGB values of the 307,200 pixels. The activation of the sensors should be normalized within a suitable range (e.g. [-1.0, 1.0]).

In the case of the internal and motor neurons, the output is a function f() of the sum of the incoming inputs, weighted by the corresponding connection weights, plus a bias term:

$$x_i = f\left(\sum_j W_{ij}\, x_j + b_i\right)$$

where f() is the activation function, j is the pre-synaptic neuron, i is the post-synaptic neuron, x_j is the output of the pre-synaptic neuron, W_ij is the connection weight between the pre-synaptic and post-synaptic neurons, and b_i is the bias term of the post-synaptic neuron. The bias is equivalent to a connection weight originating from a neuron that always produces 1.0 as output. Commonly used activation functions include the tanh nonlinear function, the linear function, and the ReLU function.
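As an illustration, the update of a whole layer of neurons can be computed in a couple of lines with numpy. This is a generic sketch of the formula above, not code taken from the book's exercises.

```python
import numpy as np

def layer_output(x, W, b, f=np.tanh):
    """Output of a layer of neurons: f(W x + b).
    x: outputs of the pre-synaptic neurons, W: connection weights
    (one row per post-synaptic neuron), b: biases, f: activation function."""
    return f(W @ x + b)

# Example: 3 pre-synaptic neurons feeding 2 post-synaptic neurons.
x = np.array([0.2, -0.5, 0.9])
W = np.random.randn(2, 3) * 0.1
b = np.zeros(2)
print(layer_output(x, W, b))
```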

The output of the motor neurons is used to set the state of the robot’s actuators. For example, in the case of a robot arm with 7 actuated degrees of freedom, the torque exerted by the 7 motors controlling the 7 corresponding actuated joints can be set on the basis of the output of 7 motor neurons rescaled within a [-100, 100] Newton-meter range.

The state of the neurons is normally updated with a fixed frequency (e.g. 10 Hz) and in a fixed order, i.e. first the sensory neurons, then the internal neurons of the first layer, then the internal neurons of additional layers, if present, and finally the motor neurons. In the case of recurrent neural networks, the input of recurrent connections (i.e. connections that originate from the same or from successive layers) is calculated on the basis of the output of the pre-synaptic neuron at time t-1. We will discuss recurrent networks in more detail in Chapter 10.

The action of the neural network, i.e. the vector of the outputs of the motor neurons, depends on the observation (i.e. the state of the sensory neurons) and on the connection weights. In the case of recurrent networks, it also depends on the output of the internal neurons at time t-1, which in turn depends on the previous states of the neurons. This implies that feed-forward networks always respond in the same manner to the same observation (as long as the connection weights do not change), while recurrent neural networks can vary their response as a function of previous observations.
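Putting the pieces together, a minimal neural network policy might look as follows. The network sizes, the normalization ranges, and the torque limit are assumptions introduced for the example; Chapter 13 describes the implementation actually used in the exercises.

```python
import numpy as np

class Policy:
    """Minimal sketch of a neural network controller: a feed-forward network
    with one internal layer and, optionally, recurrent connections within
    the internal layer (whose input is the internal state at time t-1)."""

    def __init__(self, n_sensors, n_internal, n_motors, recurrent=False):
        self.recurrent = recurrent
        # Connection weights and biases, randomly initialized; an adaptive
        # algorithm is then expected to modify them.
        self.W1 = np.random.randn(n_internal, n_sensors) * 0.1
        self.b1 = np.zeros(n_internal)
        self.Wr = np.random.randn(n_internal, n_internal) * 0.1
        self.W2 = np.random.randn(n_motors, n_internal) * 0.1
        self.b2 = np.zeros(n_motors)
        self.h = np.zeros(n_internal)   # internal state at time t-1

    def step(self, observation, max_torque=100.0):
        # The observation is assumed to be already normalized (e.g. in [-1, 1]).
        net = self.W1 @ observation + self.b1
        if self.recurrent:
            net += self.Wr @ self.h      # contribution of the state at t-1
        self.h = np.tanh(net)
        # Motor neurons in [-1, 1], rescaled to the actuators' torque range.
        return np.tanh(self.W2 @ self.h + self.b2) * max_torque

# Example: a 7-joint arm controlled on the basis of 16 sensors.
policy = Policy(n_sensors=16, n_internal=10, n_motors=7, recurrent=True)
torques = policy.step(np.random.uniform(-1.0, 1.0, 16))
print(torques)
```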

The connection weights are initially set randomly and are then modified by means of an adaptive algorithm.  

The brain of the robot can be realized by using alternative formalisms to neural networks, such as fuzzy rules or programming languages. Artificial neural networks, however, represent the most popular choice for the following reasons: (i) they provide a natural way to encode quantitative information, (ii) they degrade gracefully as a consequence of variations, and (iii) they generalize, i.e. they respond to new observations by producing actions similar to those produced for similar observations. Moreover, they constitute an ideal substrate for the realization of an adaptation process thanks to the quantitative nature of their parameters and to the properties described above. Finally, they make it possible to exploit the theoretical and practical knowledge developed over the years by the large community of researchers who have studied them.

2.4 Adaptation

Braitenberg managed to correctly predict the behavior produced by his simple vehicles situated in simple environments. Indeed, his predictions were later confirmed by researchers who built the first three vehicles described in his book and reviewed above. Predicting the behavior exhibited by a robot, and designing a robot capable of exhibiting a given desired behavior, however, become more and more challenging as the complexity of the robot and/or of the environment increases.

Braitenberg himself realized this difficulty and proposed to overcome the problem by using a form of artificial selection that could be realized by repeating the following phases several times: (i) placing on a table a bunch of hand-designed vehicles (the table might include lights, sound sources, small cliffs, etc.); (ii) periodically creating copies of vehicles selected among those that remained on the table without falling off; (iii) introducing random mistakes in the copying process (e.g. including an inhibitory instead of an excitatory wire, adding or eliminating a sensor or a wire, etc.). This process can lead to the evolution of vehicles displaying behaviors which increase their ability to “survive”, e.g. moving away from the border of the table, pushing other vehicles out, resisting the aggressive behaviors displayed by the other vehicles. In other words, it can lead to the evolution of vehicles displaying useful behaviors. Indeed, the mistakes produced during the copying process can occasionally produce vehicles capable of surviving longer. These vehicles will later proliferate since they will have a greater chance of being selected as blueprints for new vehicles.

This idea was realized in concrete experiments a few years later within Evolutionary Robotics (Nolfi and Floreano, 2000; Nolfi et al., 2016), a research area that developed methods for evolving robots. A standard evolutionary robotics method that can be used to evolve the brain of a robot involves the following steps:

  1. The experimenter chooses a task and a robotic platform suitable for the problem. For example, the experimenter might decide to evolve a legged robot for the ability to walk on an irregular terrain.
  2. The experimenter designs and implements the fitness function, i.e. a criterion that rates with a scalar value the extent to which a robot accomplishes the task. For example, to evolve walking robots the experimenter can decide to use as fitness the distance between the initial and final positions of the robot during one or more evaluation episodes (a minimal sketch of such a fitness function is shown after this list).
  3. The experimenter evolves the connection weights of the brain of the robot in simulation by using an evolutionary algorithm like that illustrated below.
  4. The experimenter embeds the best evolved brain in the physical robot and verifies that the solution evolved in simulation transfers successfully to the real world.
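As an illustration of step 2, the walking fitness mentioned above can be written as a short function. The robot interface used here (reset(), step(), get_position()) is a hypothetical placeholder for the actual simulator API.

```python
import numpy as np

def walking_fitness(robot, policy, n_episodes=3, n_steps=500):
    """Illustrative fitness for a walking robot: the distance between the
    initial and final position of the robot, averaged over a few episodes.
    The robot interface is a hypothetical placeholder for the simulator API."""
    total = 0.0
    for _ in range(n_episodes):
        observation = robot.reset()
        start = np.array(robot.get_position())
        for _ in range(n_steps):
            observation = robot.step(policy.step(observation))
        total += np.linalg.norm(np.array(robot.get_position()) - start)
    return total / n_episodes
```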

A possible evolutionary algorithm that can be used is the one proposed by Pagliuca, Milano and Nolfi (2018). The procedure starts by creating a population of vectors that encode the parameters of a corresponding population of robots. In this case, the parameters encode the connection weights of the robots’ brains and consist of a vector of real numbers of length p, where p is the number of connection weights and biases of the robots’ neural network. The population θ thus consists of a matrix including λ vectors of length p. The population can be initialized with random Gaussian numbers with mean 0 and variance τ or by using a standard initialization method that varies the distribution on the basis of the number of pre-synaptic and post-synaptic neurons (Glorot & Bengio, 2010). Then, for a certain number of generations, the algorithm evaluates the fitness of the individuals forming the population, ranks the individuals on the basis of their fitness, and replaces the parameters of the worst λ/2 individuals with varied copies of the parameters of the best λ/2 individuals. Variations are introduced by adding to the copied parameters a vector of Gaussian numbers with mean 0 and variance σ. The evaluation of an individual is realized by creating a robot with the parameters specified in the vector θi and by measuring the fitness while the robot interacts with its environment for a given amount of time.
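A minimal Python sketch of this procedure is shown below. The parameter values are illustrative assumptions, and the evaluate() function stands for the fitness measurement performed while the robot interacts with its environment.

```python
import numpy as np

def evolve(evaluate, nparams, lam=20, generations=100, tau=0.1, sigma=0.02):
    """Minimal sketch of the evolutionary algorithm described above.
    evaluate: function mapping a parameter vector to a fitness value.
    nparams: number of connection weights and biases (p).
    lam: population size (lambda, assumed even). tau, sigma: amplitude of the
    initialization and of the mutations (illustrative values)."""
    # Population: lambda vectors of p parameters, initialized randomly.
    theta = np.random.randn(lam, nparams) * tau
    half = lam // 2
    for _ in range(generations):
        # Evaluate the fitness of every individual.
        fitness = np.array([evaluate(theta[i]) for i in range(lam)])
        # Rank the individuals by fitness (best first).
        theta = theta[np.argsort(-fitness)]
        # Replace the worst lambda/2 individuals with perturbed copies
        # of the best lambda/2 individuals (Gaussian mutations).
        theta[half:] = theta[:half] + np.random.randn(half, nparams) * sigma
    return theta[0]   # parameters of the best individual found

# Example usage with a dummy fitness function (maximize the sum of the genes):
best = evolve(lambda params: params.sum(), nparams=10)
print(best)
```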

We will illustrate evolutionary algorithms and other adaptive algorithms in more detail in Chapter 6.

2.5 Learn how

To acquire hands-on knowledge of the topics discussed in this chapter, you can read Sections 3 and 4 of Chapter 13 and do Exercises 2 and 3. You will learn how to implement a neural network policy and the evolutionary algorithm illustrated in Section 2.4 from scratch. You will also learn how to implement the algorithm in a form compatible with the AI-Gym library, so that it can be applied to all the problems available in that library.

References

Arkin R. (1998). Behavior-based Robotics. Cambridge, MA: MIT Press.

Braitenberg V. (1984). Vehicles: Experiments in Synthetic Psychology. Cambridge, MA: MIT Press.

Brooks R.A. (1986). A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, 2(1): 14-23.

Glorot X. & Bengio Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics (pp. 249-256).

Grey Walter W. (1953). The Living Brain. G. Duckworth London, W.W. Norton, New York.

McCulloch W. & Pitts W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5:115-133.

Nolfi S. & Floreano D. (2000). Evolutionary Robotics: The Biology, Intelligence, and Technology of Self-Organizing Machines. Cambridge, MA: MIT Press/Bradford Books.

Nolfi S., Bongard J., Husbands P. & Floreano D. (2016). Evolutionary Robotics. In B. Siciliano and O. Khatib (eds.), Handbook of Robotics, II Edition. Berlin: Springer Verlag.

Pagliuca P., Milano N. & Nolfi, S. (2018). Maximizing adaptive power in neuroevolution. PloS one, 13(7): e0198788.