Behavioral and Cognitive Robotics
An adaptive perspective

Stefano Nolfi

© Stefano Nolfi, 2021

1. Autonomous robots

1.1 Introduction

This book examines autonomous robots, namely systems capable of operating without human intervention for prolonged periods of time in partially unknown environmental conditions. Examples of this class are robotic vacuum cleaners that operate autonomously in ordinary domestic environments for hours, regardless of the type and position of the furniture present in the house. Autonomous robots differ significantly from industrial robots, which perform pre-determined sequences of movements and operate in well-defined environments structured to support the robots’ operation (Figure 1.1). They also differ from tele-operated robots, which are controlled manually by humans.

Figure 1.1. An example of industrial robots used for car manufacturing (Wikimedia Commons).

In this chapter we introduce autonomous robots and their constituent hardware and software components.

1.2 Robots

For the purpose of this book, whose focus is on behavioral and cognitive robots, we can define a robot as an artificial system that: (i) has a physical body that includes actuators, sensors, and a brain, (ii) is situated in a physical environment and possibly in a social environment including other robots and/or humans, and (iii) exhibits a behavior that performs a function.

1.3 The body

Robots’ bodies vary considerably depending on the environment in which they operate and on the task that they perform. For example, robots that need to move on irregular terrain can include legs, robots operating in water can have hydrodynamic fish-like shapes, and robots interacting with humans can have a humanoid morphology.

Robots’ bodies also vary with respect to their material. Traditionally, robots were made of rigid materials enabling precise movements. More recently, however, robots are often also made of soft materials that do not enable precise motion but provide other important advantages. These advantages include compliance, i.e. the ability of the body to adapt its shape spontaneously during the interaction with the environment, and safety, i.e. a reduced risk of inadvertently harming humans. Exploiting compliance makes it possible to build robots capable of performing complex operations with simple control rules. For example, it allows building robot manipulators capable of performing effective pick-and-place operations without knowing the exact position and shape of the object to be grasped (Video 1.1).

Video 1.1. A soft gripper developed by Soft Robotics Inc.

Finally, robots’ bodies vary with respect to the fine-grained characteristics of the elements that compose them, e.g. the length, mass, shape, and elasticity of body parts. As we will see in Chapter 3, these aspects can also play an important role.

1.4 The sensors

Sensors extract information from the robot’s environment by measuring continuous physical properties, e.g. the frequency of sound or light waves, air temperature, the velocity of the robot, etc. Proprioceptive sensors measure properties of the robot’s internal environment, e.g. the speed of the robot’s wheels, the angles of the joints connecting different body parts, the voltage of the battery, the position and/or velocity of actuated joints, etc. Exteroceptive sensors, instead, measure properties of the external environment. Frequently used exteroceptive sensors include contact sensors (possibly distributed over an artificial skin), distance sensors (infrared, sonar, or laser), cameras, microphones, and depth cameras.

Sensors also differ with respect to their passive or active nature. Passive sensors, such as microphones and cameras, measure physical quantities present in the environment. Active sensors, instead, also emit energy. Examples of active sensors are sonar and laser sensors, which extract information from the environment by emitting ultrasound or light waves and measuring the time required for the wave to be reflected back. Active sensors can be more informative than passive sensors but consume more energy and can interfere with each other.
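The time-of-flight principle used by active distance sensors reduces to a one-line calculation: the wave travels to the object and back, so the distance is half of the wave speed multiplied by the round-trip time. The sketch below assumes a speed of sound in air of about 343 m/s (roughly 20 °C); the function name `sonar_distance` is illustrative and not part of any specific sensor driver.

```python
SPEED_OF_SOUND_M_S = 343.0  # approx. speed of sound in air at ~20 °C (assumed value)


def sonar_distance(round_trip_time_s: float,
                   wave_speed_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Distance (in metres) to the object that reflected the emitted wave.

    The wave covers the sensor-object distance twice (out and back),
    so the one-way distance is half of speed * elapsed time.
    """
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return wave_speed_m_s * round_trip_time_s / 2.0


# An echo received 5.83 ms after emission corresponds to about 1 m.
print(f"{sonar_distance(0.00583):.2f} m")  # prints "1.00 m"
```

For a laser rangefinder the same formula applies with `wave_speed_m_s` set to the speed of light (about 3.0e8 m/s), which is why laser sensors need far finer timing resolution than sonar.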

Sensors can be combined with actuators and/or other passive parts to extract more information, or more accurate information. For example, laser sensors can be mounted on a rotating platform to measure distances in multiple directions. Cameras can be combined with conic mirrors to enable omnidirectional vision.
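A laser mounted on a rotating platform returns one distance per angular step, so each reading is naturally expressed in polar form and can be converted into points in the sensor’s own reference frame. The sketch below is a hypothetical helper (`scan_to_points`), assuming equally spaced readings that by default cover a full 360° turn.

```python
import math


def scan_to_points(distances, start_angle_rad=0.0, angle_step_rad=None):
    """Convert a sweep of range readings into (x, y) points in the sensor frame.

    distances[i] is the range measured when the rotating platform points
    at start_angle_rad + i * angle_step_rad. If no step is given, the
    readings are assumed to cover a full turn at equal angular spacing.
    """
    n = len(distances)
    if n == 0:
        return []
    if angle_step_rad is None:
        angle_step_rad = 2 * math.pi / n
    points = []
    for i, r in enumerate(distances):
        theta = start_angle_rad + i * angle_step_rad
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


# Four readings taken at 0°, 90°, 180° and 270°.
for x, y in scan_to_points([1.0, 2.0, 1.5, 0.5]):
    print(f"({x:.2f}, {y:.2f})")
```

Together, the points outline the free space around the robot in every direction, which a single fixed laser could not provide.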

The state of the sensors in a particular moment can be indicated with the term observation.

1.5 The actuators

Actuators enable robots to move, and possibly to modify their environment, by converting stored energy into movement.

The most widely used actuators are electric motors. They produce rotational movements that can also be converted into linear movements. Alternatives include hydraulic and pneumatic actuators and active materials (e.g. electroactive polymers, which expand their volume as a result of electrical stimulation).

Actuators are often combined with rigid or elastic supporting structures, such as wheels, tendons, and joints, that co-determine their effect.

The most straightforward way to control actuators consists in directly controlling the torque they exert (torque control). In the case of electric motors, for example, this can be done by regulating the electric current powering the motor, since the torque of a DC motor is proportional to its current. Alternatively, it is possible to specify the desired velocity of the motor, or the desired position of the joint actuated by the motor, and leave the determination of the voltage to a software routine that computes it from the offset between the desired and the actual position or velocity (velocity or position control). Variable stiffness actuators allow controlling both the position and the stiffness of actuated joints.
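The software routine used in position control can be illustrated with a proportional-derivative (PD) rule, a common choice though not the only one. The sketch below assumes a hypothetical joint with unit inertia in which the motor command acts directly as an angular acceleration; the gains `kp` and `kd` and the command limit `max_cmd` are illustrative values, not taken from a real motor.

```python
def pd_position_control(target, kp=20.0, kd=4.0, dt=0.01, steps=500, max_cmd=12.0):
    """Drive a toy joint toward `target` (radians) with a PD routine.

    The command is computed from the offset between the desired and the
    actual position (proportional term) and from the joint velocity
    (derivative term: for a fixed target, the derivative of the error is
    minus the velocity, so this term damps the motion). The joint model
    (command = angular acceleration, unit inertia) is a simplification.
    """
    position, velocity = 0.0, 0.0
    for _ in range(steps):
        error = target - position
        command = kp * error - kd * velocity
        # Clip the command, as a real driver limits the applied voltage.
        command = max(-max_cmd, min(max_cmd, command))
        # Semi-implicit Euler integration of the toy joint dynamics.
        velocity += command * dt
        position += velocity * dt
    return position, velocity


final_pos, final_vel = pd_position_control(target=1.0)
print(f"final position {final_pos:.3f} rad, velocity {final_vel:.3f} rad/s")
```

With these gains the joint overshoots slightly and then settles on the target within the simulated five seconds; raising `kd` trades overshoot for a slower approach.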

The state of the actuators in a particular moment, e.g. the voltage applied to electric motors, can be indicated with the term action.

1.6 The brain

The brain of the robot regulates the state of the actuators on the basis of the current and previous states of the sensors. In robotics the brain is usually called the controller. I prefer the term brain because the term controller suggests that the brain determines the behavior of the robot, while in reality the brain only contributes to determining it. Alternatively, the brain can be called the policy. As we will see in the next chapters, the brain orchestrates the interaction between the robot and the environment. The behavior of the robot is the result of the robot/environment interaction orchestrated by the brain; it is not a direct product of the brain.

In principle the brain of a robot could consist only of wires and electronic components such as transistors. In practice, however, the brain normally consists of a computer running software, embedded in the body of the robot and connected to the sensors and the actuators through wires. Alternatively, the computer can be located outside the robot and communicate with the sensors and actuators wirelessly. The software running on the computer includes a standard operating system, the drivers that communicate with the sensors and the actuators, and high-level software that determines the state of the actuators (action vector) on the basis of the current and previous states of the sensors (observations).
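The role of the high-level software can be sketched as a sense-act loop that repeatedly reads an observation, computes an action vector, and sends it to the actuators. This is a minimal illustration, not the book’s code: the observation (two distance readings, in metres), the action (left and right wheel speeds), the gains, and the callbacks `read_sensors` and `write_actuators`, which stand in for real drivers, are all hypothetical.

```python
def brain(observation, previous_observation):
    """Hypothetical brain: (left, right) distances -> (left, right) wheel speeds."""
    left, right = observation
    base_speed = 0.2  # forward speed in free space (assumed value)
    # Steer away from the closer side: a positive turn speeds up the left wheel.
    turn = 0.5 * (right - left)
    if previous_observation is not None:
        # Use the previous observation too: if the nearest obstacle got
        # closer since the last step, steer harder.
        approach = min(previous_observation) - min(observation)
        if approach > 0:
            turn *= 2.0
    return (base_speed + turn, base_speed - turn)


def control_loop(read_sensors, write_actuators, steps):
    """Sense-act loop: read an observation, compute an action, send it to the motors."""
    previous = None
    for _ in range(steps):
        observation = read_sensors()
        action = brain(observation, previous)
        write_actuators(action)
        previous = observation


# Simulated drivers: an obstacle progressively approaching on the left side.
readings = iter([(1.0, 1.0), (0.6, 1.0), (0.4, 1.0)])
sent = []
control_loop(lambda: next(readings), sent.append, steps=3)
for action in sent:
    print(tuple(round(v, 2) for v in action))
```

Keeping the drivers behind two callbacks mirrors the separation described above: the same `brain` function could run on a real robot or in a simulator.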

The organization of this software and the way in which it is designed depend on the approach chosen. The main approaches used in robotics are illustrated in the next three sections.

1.6.1 The deliberative approach

The deliberative approach was elaborated, and became dominant, during the classic AI period, i.e. during the 1960s and 1970s, following the advent of digital computers and under the influence of cognitive psychology.

The approach assumes that, to achieve a given goal, a robot (or, more generally, an intelligent system) should elaborate a mental solution (plan) and then execute the sequence of actions forming the plan. In other words, it assumes that the system should think and then act. Planning consists in searching for a sequence of actions that achieves the goal and is realized by looking ahead at the outcomes of the possible actions that the robot can perform. The method requires: (i) an internal representation of the external environment (extracted from observations or compiled manually by the experimenter), (ii) a description of the goal, and (iii) a description of the actions that the system can perform (including the prerequisites and outcomes of each possible action in each possible context).
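The three ingredients listed above map directly onto a classic forward-search planner. The sketch below assumes a STRIPS-style encoding, which is an illustrative choice rather than the representation of any particular system: the internal representation is a set of facts, the goal is a set of facts that must hold, and each action lists its prerequisites and the facts it adds and deletes. Breadth-first search looks ahead by predicting the outcome of each applicable action until it finds a state satisfying the goal. The rooms, the box, and all fact and action names are hypothetical.

```python
from collections import deque
from typing import FrozenSet, List, NamedTuple, Optional


class Action(NamedTuple):
    name: str
    preconditions: FrozenSet[str]  # facts that must hold before the action
    add_effects: FrozenSet[str]    # facts made true by the action
    del_effects: FrozenSet[str]    # facts made false by the action


def plan(initial: FrozenSet[str], goal: FrozenSet[str],
         actions: List[Action]) -> Optional[List[str]]:
    """Breadth-first search over world states; returns a shortest action sequence."""
    frontier = deque([(initial, [])])
    visited = {initial}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:  # every goal fact holds in this state
            return steps
        for a in actions:
            if a.preconditions <= state:
                # Look ahead: predict the outcome of the action in the internal model.
                nxt = (state - a.del_effects) | a.add_effects
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, steps + [a.name]))
    return None  # no plan exists in this model of the world


f = frozenset
actions = [
    Action("go_to_room_B", f({"robot_in_A"}), f({"robot_in_B"}), f({"robot_in_A"})),
    Action("go_to_room_A", f({"robot_in_B"}), f({"robot_in_A"}), f({"robot_in_B"})),
    Action("pick_up_box", f({"robot_in_B", "box_in_B", "hand_empty"}),
           f({"holding_box"}), f({"box_in_B", "hand_empty"})),
    Action("put_down_box", f({"robot_in_A", "holding_box"}),
           f({"box_in_A", "hand_empty"}), f({"holding_box"})),
]
result = plan(f({"robot_in_A", "box_in_B", "hand_empty"}), f({"box_in_A"}), actions)
print(result)  # ['go_to_room_B', 'pick_up_box', 'go_to_room_A', 'put_down_box']
```

The plan is computed entirely in the internal model: if the real world differs from the fact set (say, the box is no longer in room B), the plan is executed anyway, which is the weakness discussed below.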

Brains realized following this approach include three largely independent modules: (i) a perceptual module responsible for extracting the internal representation of the external world from observations, (ii) a reasoning or planning module responsible for elaborating a plan capable of achieving the current goal, and (iii) an action module responsible for executing the sequence of actions that compose the plan (Figure 1.2).

Figure 1.2. Deliberative architectures include three main modules: a perception module, a planning or reasoning module, and an action module.

The most remarkable example of the application of the deliberative approach is the robot Shakey (Nilsson, 1984), developed at the Stanford Research Institute. Shakey was able to perform a series of tasks. It operated on the basis of a representation of the environment compiled manually by the experimenters and accomplished the current task by elaborating a suitable plan. The output of the planning system consisted of a sequence of macro actions, which the action module translated into a longer sequence of micro actions. These micro actions were finally executed and modulated on the basis of the robot’s observations. Shakey, however, was able to operate effectively only in a static and carefully engineered environment made of large empty rooms populated with few objects of uniform colors. Later attempts to generalize the approach to more dynamic and/or natural environments were much less successful.

The advantage of the deliberative approach is that its core component, the planning system, is general: in principle, the same planning system can be used to perform any type of task and to control any type of robot.

The deliberative approach, however, has several drawbacks. A first problem is that elaborating a plan might require too much time, especially when the state space and the number of actions available to the robot are large. This is a serious problem in robotic applications, in which decision delays often have serious consequences. A second problem is that manually elaborating or automatically extracting an exhaustive representation of the external environment is not feasible. The inevitable inaccuracy and incompleteness of the internal representation imply that a plan which leads to the desired outcome in the robot’s mental simulation does not necessarily lead to the same outcome in the real world. A third problem is that plans can become obsolete during their own execution, due to intervening variations of the environmental conditions. These problems can be mitigated, but apparently not solved, through hybrid methods that combine deliberative approaches with alternative approaches. For these reasons, pure deliberative approaches are rarely used in robotics today. They are instead used in domains such as chess playing or robot surgery, in which it is possible to operate on the basis of a complete, accurate, and updated representation of the environment and in which time is not a constraint.

1.6.2 The behavior-based approach

A radically different approach, known as behavior-based robotics, was proposed by Rodney Brooks in the 1980s (Brooks, 1991).

This approach looks at robots as systems that are situated and embodied. Situatedness means that they “are situated in the world --- they do not deal with abstract description, but with the ‘here’ and ‘now’ of the environment that directly influence the behavior of the system.” (Brooks, 1991, p. 1227). Embodiment means that “robots have bodies and experience the world directly --- their actions are part of a dynamic with the world, and the actions have immediate feedback on the robot’s own sensations”. (Brooks, 1991, p. 1227).

Behavior-based robots do not rely on an internal model of the external world and do not mentally simulate the effect of actions before acting in the environment. They determine how to act on the basis of the current and possibly previous observations. They can use internal states to store information extracted from observations. However, the function of those internal states is not to represent the external environment, as in deliberative methods, but rather to regulate the actions produced by the robot.

The brain architecture of behavior-based robots includes multiple modules, called layers, that are responsible for producing the behaviors that enable the robot to perform its task(s). For example, the brain of a robot might include four modules capable of generating an obstacle avoidance behavior, a wandering behavior, a grasping behavior, and a navigation behavior (see Figure 1.3). Each module incorporates the perceptual and action-decision capabilities relevant for the production of the corresponding behavior and has direct access to sensors and actuators.

Figure 1.3. An example of behavior-based architecture. Rectangles represent the layers responsible for the production of the behavior indicated. Each module has direct access to sensors and actuators. The dotted lines represent the fact that higher layers can subsume lower layers.

The designer of a behavior-based robot starts by identifying the behaviors that the robot should produce to perform its task and then implements the corresponding behavioral layers. This is done incrementally: the designer first implements, tests, and refines the layer responsible for the most elementary behavior, and then proceeds with the second layer and with the other layers dedicated to the production of progressively more complex behaviors.

Higher layers can rely on the contribution of lower layers. For example, the navigate layer, which enables the robot to travel toward a target destination, can rely on the avoid-obstacle layer, which enables the robot to turn away from nearby obstacles (see Figure 1.3).

Behavioral modules can be triggered directly by observations that invite (afford) the execution of the corresponding behavior, i.e. that indicate the opportunity to execute a given behavior. For instance, the detection of a nearby obstacle by means of distance sensors, which suggests the opportunity to perform an obstacle avoidance behavior, can directly trigger the avoid-obstacle layer responsible for the production of that behavior. Consequently, observations affording multiple behaviors can trigger the corresponding behavioral layers in parallel. In some cases, however, the execution of one afforded behavior might conflict with the execution of another. For example, a grasping behavior can conflict with an avoidance behavior, since the object to be grasped is itself detected as a nearby obstacle. In these cases, the designer can resolve the conflict by enabling the higher-level layer to subsume, i.e. to inhibit, the conflicting lower-level layer (Brooks, 1986). In our example, this implies that the designer should include in the grasping layer a command that subsumes the obstacle-avoidance layer during the execution of the grasping behavior.
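The triggering and subsumption mechanism can be sketched in a few lines. In this illustration every layer checks whether the current observation affords its behavior, all afforded layers run in parallel, and any layer named in an active layer’s `subsumes` set is inhibited. How the commands of simultaneously active layers are combined is an assumption made for brevity (they are summed here); it is not the wiring used by Brooks, and the layer names, thresholds, and wheel speeds are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, NamedTuple, Tuple

Observation = Dict[str, float]
Action = Tuple[float, float]  # (left wheel speed, right wheel speed)


class Layer(NamedTuple):
    name: str
    afforded: Callable[[Observation], bool]  # does the observation invite this behavior?
    command: Action                          # wheel-speed contribution of the layer
    subsumes: FrozenSet[str] = frozenset()   # lower layers inhibited while this one is active


def brain_step(layers: List[Layer], obs: Observation) -> Tuple[List[str], Action]:
    """Trigger every afforded layer, drop the subsumed ones, and sum the rest."""
    active = [layer for layer in layers if layer.afforded(obs)]
    inhibited = set().union(*(layer.subsumes for layer in active))
    running = [layer for layer in active if layer.name not in inhibited]
    left = sum(layer.command[0] for layer in running)
    right = sum(layer.command[1] for layer in running)
    return [layer.name for layer in running], (left, right)


layers = [
    Layer("wander", lambda o: True, (0.2, 0.2)),
    Layer("avoid_obstacle", lambda o: o["front_distance"] < 0.3, (0.0, -0.4)),
    # The grasping layer slows the robot down and subsumes obstacle avoidance,
    # otherwise the object to be grasped would be avoided as an obstacle.
    Layer("grasp", lambda o: o["object_ahead"] > 0 and o["front_distance"] < 0.3,
          (-0.15, -0.15), frozenset({"avoid_obstacle"})),
]

for obs in ({"front_distance": 1.0, "object_ahead": 0.0},   # free space
            {"front_distance": 0.2, "object_ahead": 0.0},   # plain obstacle
            {"front_distance": 0.2, "object_ahead": 1.0}):  # graspable object
    names, (left, right) = brain_step(layers, obs)
    print(names, (round(left, 2), round(right, 2)))
```

Adding a layer requires deciding which other layers it should subsume, which hints at why managing conflicts becomes harder as the number of layers grows.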

A strength of the behavior-based approach is that it can operate effectively in real time and in dynamic environments. A limit of the approach, however, is that it does not scale easily to problems that involve the production of a rich set of behaviors, because managing the conflicts among behaviors becomes exponentially more complex as the number of behavioral modules increases. A second limit is that identifying the control rules that produce a given desired behavior can be challenging for the designer. Finally, a third limit concerns the difficulty of realizing effective transitions among behaviors. Effective transitions often require adapting the way in which the preceding behavior is produced to the characteristics of the following behavior, and vice versa. However, this possibility is prevented by the incremental modality with which behavioral layers are designed.

1.6.3 The adaptive approach

Finally, the adaptive approach aims to create robots capable of developing autonomously the behavioral and cognitive skills required to perform a task, while they interact with their environment, through evolutionary and/or learning processes. It focuses on model-free approaches with minimal human intervention, in which both the behavior that the robot uses to solve its task and the way in which such behavior is produced are discovered automatically by the adaptive process, i.e. they are not specified by the experimenter.

The role of the experimenter consists instead in designing and implementing the learning algorithm and the reward function (i.e. a function that rates how well or how badly the robot is doing). We will discuss adaptive approaches in detail in Chapter 6.
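The division of labor can be made concrete with a toy example in which the experimenter writes only a reward function and an adaptive algorithm, while the parameter values that produce the behavior are found automatically. The sketch assumes a hypothetical robot moving on a line toward a target, a two-parameter brain, and a (1+1) evolution strategy, one of the simplest evolutionary algorithms, chosen here for brevity; the algorithms actually used for robots are discussed in Chapter 6.

```python
import random


def episode_reward(params, steps=100, dt=0.05, target=1.0):
    """Reward function: minus the accumulated squared distance from the target.

    The hypothetical robot moves on a line; its brain maps the observation
    (distance to target, own velocity) onto an acceleration command through
    two adjustable parameters.
    """
    gain, damping = params
    position, velocity, cost = 0.0, 0.0, 0.0
    for _ in range(steps):
        error = target - position
        action = gain * error - damping * velocity
        action = max(-1.0, min(1.0, action))  # actuator limit
        velocity += action * dt
        position += velocity * dt
        cost += error * error * dt
    return -cost


def one_plus_one_es(generations=200, sigma=0.5, seed=0):
    """(1+1) evolution strategy: mutate the parameters, keep the mutant if it is not worse."""
    rng = random.Random(seed)
    parent = [0.0, 0.0]  # initial brain: the robot does not move at all
    parent_reward = episode_reward(parent)
    for _ in range(generations):
        child = [p + rng.gauss(0.0, sigma) for p in parent]
        child_reward = episode_reward(child)
        if child_reward >= parent_reward:
            parent, parent_reward = child, child_reward
    return parent, parent_reward


initial = episode_reward([0.0, 0.0])
best_params, best_reward = one_plus_one_es()
print(f"reward before adaptation: {initial:.3f}, after: {best_reward:.3f}")
print("evolved parameters:", [round(p, 2) for p in best_params])
```

Nowhere does the code state how the robot should move; the reward only says how good a given behavior is, and the mutation-selection loop discovers parameters that bring the robot to the target.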

The first pioneering attempts to design adaptive robots were carried out by cyberneticists in the 1950s, before the era of digital computers and before the development of the deliberative approach. The first systematic studies, instead, were carried out in the 1990s, after the development of evolutionary, reinforcement learning, and regression learning algorithms.

Adaptive methods present several advantages. A first advantage is that they free the designer from the need to identify a solution to the task. Identifying the behavior that a robot should produce to perform a task, and the characteristics that the robot should have to produce such behavior, can be challenging (we will discuss this aspect in more detail in Chapter 5). Moreover, the solutions that the experimenter does identify might prove ineffective or unrealistic. Designing the reward function can also be challenging, as we will discuss in Chapter 6, but is certainly less challenging than designing a complete solution.

A second benefit is that the adaptive approach allows synthesizing integrated solutions in which each characteristic is co-adapted to every other characteristic. This is because the adaptive process is driven by a reward function that rates the overall effectiveness of the robot’s behavior, and because all characteristics are subjected to variation during the adaptive process. This property contrasts with the incremental approach used in behavior-based robotics, in which the characteristics of a layer can be modified only while that layer is being implemented and remain fixed during the implementation of the subsequent layers.

The downside of the adaptive approach is that the training process is time-consuming and does not guarantee the development of effective solutions. We will discuss how these problems can be handled in the next chapters.

1.7 Learn how

To learn how to create adaptive robots, go to Sections 1 and 2 of Chapter 13 and do Exercise 1. You will be introduced to AI-Gym, a software library developed by OpenAI that makes it possible to experiment with robots in simulation and to compare adaptive algorithms on a large set of problems.


References

Brooks R.A. (1986). A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, 2(1): 14-23.

Brooks R.A. (1991). New approaches to robotics. Science, 253: 1227-1232.

Nilsson N.J. (1984). Shakey the robot. Technical Note 323, SRI International, Menlo Park, CA, USA. A collection of papers and technical notes, some previously unpublished, from the late 1960s and early 1970s.