Reactive task and motion planning for robust whole-body dynamic locomotion in constrained environments

Abstract

Contact-based decision and planning methods are becoming increasingly important to endow higher levels of autonomy for legged robots. Formal synthesis methods derived from symbolic systems have great potential for reasoning about high-level locomotion decisions and achieving complex maneuvering behaviors with correctness guarantees. This study takes a first step toward formally devising an architecture composed of task planning and control of whole-body dynamic locomotion behaviors in constrained and dynamically changing environments. At the high level, we formulate a two-player temporal logic game between the multi-limb locomotion planner and its dynamic environment to synthesize a winning strategy that delivers symbolic locomotion actions. These locomotion actions satisfy the desired high-level task specifications expressed in a fragment of temporal logic. Those actions are sent to a robust finite transition system that synthesizes a locomotion controller that fulfills state reachability constraints. This controller is further executed via a low-level motion planner that generates feasible locomotion trajectories. We construct a set of dynamic locomotion models for legged robots to serve as a template library for handling diverse environmental events. We devise a replanning strategy that takes into consideration sudden environmental changes or large state disturbances to increase the robustness of the resulting locomotion behaviors. We formally prove the correctness of the layered locomotion framework guaranteeing a robust implementation by the motion planning layer. Simulations of reactive locomotion behaviors in diverse environments indicate that our framework has the potential to serve as a theoretical foundation for intelligent locomotion behaviors.

Keywords

Task and motion planning, legged locomotion, temporal logic, robust reachability, sequential composition

1. Introduction

The goal of this paper is to devise a reactive task and motion planning framework for whole-body dynamic locomotion (WBDL) behaviors in constrained environments. We employ formal methods for the synthesis of a symbolic task planner and the design of reachability controllers to achieve legged locomotion behaviors that are reactive to the environment. Although widely used in mobile robot motion planning (Fu and Topcu, 2016; Kloetzer and Belta, 2010; Wongpiromsarn et al., 2012) and autonomous driving (Campbell et al., 2010; Xu et al., 2018), formal methods have not previously been used to reason about keyframe states of dynamic locomotion behaviors. To that end, we rely on dynamic locomotion abstractions that reduce the dimensionality of the reasoning process (Zhao et al., 2017). These abstractions allow us to sequentially compose locomotion modes by reasoning about keyframe dynamic locomotion states and to achieve advanced reactive behaviors that respond to dynamic events in the environment as well as to disturbances, a hallmark of intelligent locomotion behaviors. The complex locomotion behaviors studied in this paper could not be achieved by using motion planners alone without a high-level decision-making process. Reasoning about keyframe dynamic locomotion states has several advantages. It allows us to (1) take advantage of the passive dynamics of legged robots, (2) directly compose behaviors in the phase space of the locomotion process, (3) achieve goal state reachability considering robustness margins, and (4) adjust locomotion behaviors in response to disturbances.

Our technical approach relies on a suite of template-based locomotion modes that span a spectrum of desired whole-body dynamic locomotion behaviors. Sequentially composing these modes via the proposed reactive synthesis enables us to formally combine tasks such as multi-contact locomotion, swinging movements, and hopping motions, as shown in Figure 1. Using simplified models to characterize locomotion dynamics has been widely pursued such as the use of the linear inverted pendulum model (LIPM) (Kajita et al., 2001), the spring-loaded inverted pendulum model (Piovan and Byl, 2015), the brachiation-like pendulum model (Bertram et al., 1999), the multi-contact model (Caron and Kheddar, 2016; Sentis et al., 2010b), and our recently proposed prismatic inverted pendulum model (PIPM) (Zhao and Sentis, 2012), to name a few. Usually, these models are separately considered in their own specific scenarios and lack a framework to seamlessly integrate them. Seminal locomotion results using template models (Alexander, 1984; De and Koditschek, 2015; Full and Koditschek, 1999; Raibert, 1986) and sequential composition of these models (Burridge et al., 1999) championed the advantages of using simplified models to uncover the fundamental locomotion principles related to the fine details of multi-body mechanism and dynamics. The work in Arslan and Saranli (2012) employs sequential composition to achieve reactive and robust planning against both model uncertainty and measurement noise without replanning. Nevertheless, no high-level decision-making algorithms with formal guarantees have been investigated, although the mentality of hierarchical planning and control had been proposed in Full and Koditschek (1999).

Figure 1.

Maneuvering in a constrained environment via multi-contact whole-body dynamic locomotion. Three locomotion modes are illustrated. The contact decisions are made according to the high-level symbolic task planner.

In this study, we aim to bridge this gap by proposing formal symbolic-level decision-making theories to sequentially compose more challenging (highly dynamic, versatile, and non-periodic) locomotion behaviors that are reactive to dynamic environmental events. In the vein of work addressing rough terrain locomotion (Englsberger et al., 2015; Sreenath et al., 2013; Zhao et al., 2016a, 2016b), we address the variability of the terrains by allowing the robot to respond to sudden environmental events. The behaviors we synthesize are required to satisfy formal task specifications in a provably correct manner, which we guarantee by using formal methods with discrete abstractions of hybrid systems (Alur et al., 2000). To the best of our knowledge, our study is the first attempt to apply formal methods to phase-space keyframe states of dynamic locomotion behaviors.

The inherent hybrid dynamics of the locomotion process and our use of keyframe dynamic locomotion states facilitate the discrete planning synthesis. Instead of discretizing the robot’s state space, we rely on a discretization of the phase-space keyframe states for synthesizing symbolic-level decisions, which are then sent to the underlying motion planner. We focus on the integration between the symbolic-level discrete task planner and the continuous motion planner. This top-down planning approach significantly reduces the computational complexity compared to bottom-up approaches (Belta et al., 2017; Liu and Ozay, 2016; Liu et al., 2013; Tabuada, 2009). The correctness of our top-down hierarchy is guaranteed via a correct-by-construction synthesis at the task planner level and a reachability control synthesis at the motion planner level.

The contributions of this paper are as follows. The first one is on devising symbolic reasoning methods that make decisions on keyframe states of the dynamic locomotion process in response to the dynamically changing environment. Our second contribution is on ensuring robust locomotion under bounded disturbances by reasoning about keyframe state reachability. The third contribution is on using game theory to compose complex dynamic locomotion behaviors sequentially. The final contribution is on reasoning about the correctness of the overall planning framework.

This paper is organized as follows. Section 3 introduces various dynamic locomotion models and the problem formulation of switched systems, phase space planning, and temporal logic preliminaries. In Section 4, we present the task specifications for whole-body dynamic locomotion and a reactive planner winning strategy via defining a two-player logic game. Section 5 introduces a robust finite transition system for the hybrid locomotion process to reason about local robustness with respect to the bounded disturbances and proposes robustness margin sets using phase-space Riemannian metrics. In Section 6, we reason about the one-walking-step robust reachability and the correctness of the overall planning strategy. Simulation results of whole-body dynamic locomotion behaviors over changing environments are shown in Section 7. In Sections 8 and 9, we discuss the results and the conclusions of this paper. The Appendix presents supplementary mathematical formulations, algorithms, and propositions. A preliminary version of this paper was published in a conference proceeding (Zhao et al., 2016a, 2016b). Compared to that proceeding, this paper presents a new study on robust reachability control synthesis, incorporates additional locomotion modes and more diverse task specifications, proposes a replanning strategy, and implements more sophisticated behaviors with a diversity of environmental events.

2. Related work

Formal methods have been widely investigated for mobile navigation (DeCastro et al., 2015; Kress-Gazit et al., 2011; Raman et al., 2015; Sadigh and Kapoor, 2016). The authors in Kloetzer and Belta (2010) proposed an automated computational framework for decentralized communications and control of a team of mobile robots from global task specifications. This work suffers from high computational complexity and does not address reactive response to environmental changes. To alleviate the computational burden, the work in Wongpiromsarn et al. (2012) proposed a receding-horizon based hierarchical framework that reduced the complex synthesis problem to a set of significantly smaller problems with a shorter horizon. An autonomous vehicle navigation process is simulated in the presence of exogenous disturbances. Provable correctness is an important property of temporal logic-based control and planning approaches. The work of Kress-Gazit et al. (2009) allows mobile robots to react to the environment in real time and guarantees the provable correctness of controllers. The approach proposed in Liu et al. (2013) extended controller synthesis with guaranteed-correctness to nonlinear switched systems and designed a reactive mechanism in response to an adversarial environment at runtime. Given a high-level discrete controller encoding reactive task behaviors, the work in DeCastro and Kress-Gazit (2015) designed low-level controllers to guarantee the correctness of a high-level controller. More recently, the work of Duperret and Koditschek (2020) solves a formal discrete leaping navigation problem of legged robots to reach a goal set while in the interim reactively avoiding a set of obstacle states. However, all of the work above is applied to 2D-world mobile robots or a single-leg hopper, which have simple dynamics unlike our focus on underactuated and hybrid legged robots. 
In particular, task and motion planning for bipedal or multi-limb robots requires reasoning about contact-based dynamics, which are not present in mobile robots.

2.1. Formal methods for manipulation and locomotion

Formal methods have also gained increasing attention in the mobile manipulation community via task and motion planning (TAMP) methods (Dantam et al., 2018; He et al., 2015; Kaelbling and Lozano-Pérez, 2011; Srivastava et al., 2014) or reactive synthesis methods (Chinchali et al., 2012; He et al., 2017; Sharan, 2014). However, many existing TAMP approaches rely on sampling-based motion planners which ignore the underlying physical dynamics. To fill this gap, the recent work of Toussaint et al. (2018) proposed a logic-geometric program to incorporate manipulation dynamics into the task and motion planning process, where discrete logic rules are used to specify the mode sequence for dynamic manipulation tasks. However, this work lacked a reactive mechanism in response to environmental actions and manipulated objects. More importantly, formal methods are yet to be used to reason about dynamic legged locomotion, or for more complex dynamic tasks for humanoid robots like the ones described in this paper. The authors in Antoniotti and Mishra (1995) determined goals for legged robots by using computational tree logic and synthesized controllers for locomotion. However, their work is restricted to static locomotion tasks which do not allow robots to walk dynamically or jump similarly to humans. An abstraction-based controller was proposed in Ames et al. (2015) for bipedal robots using virtual constraints, but this work focused on controller generation without addressing symbolic task reasoning. Recently, the work of Maniatopoulos et al. (2016) proposed an end-to-end approach to automatically synthesize temporal-logic-based plans on an Atlas humanoid robot. Reaction to low-level failures was formally incorporated by simply terminating the execution. However, the robot behaviors focus on manipulation and grasping tasks, instead of locomotion behaviors. The work of Sreenath et al. 
(2013) proposed a two-layer hybrid controller for locomotion over varying-slope terrains with imprecise sensing. To account for terrain uncertainties, a high-level controller implements a partially observable Markov decision process to make sequential decisions for controller switching. Once again, this work does not address symbolic task reasoning for dynamic locomotion. In addition, this work is limited to walking on terrains with mild roughness while our focus is locomotion on highly rough terrain and constrained environments.

2.2. Robustness reasoning of formal methods

Robustness to disturbances and reactiveness to changing environments are major challenges in robotic systems. Related work includes Fainekos and Pappas (2009) which studies the robust satisfaction of temporal logic specifications associated with continuous-time signals. Signal temporal logic (STL) (Donzé and Maler, 2010) allows to reason about dense-time, real-valued signals, enabling for the evaluation of the extent to which the specifications are satisfied or violated. This property makes STL especially suitable to quantify robustness (Deshmukh et al., 2015; Farahani et al., 2015; Sadraddini and Belta, 2015). The focus of all the work above is on the robust semantics of temporal logic, while our objective is to design robust locomotion planners where robustness margin sets are quantified as a goal in the reachability analysis under bounded disturbances. The work of Majumdar et al. (2011) studied robust controller synthesis on discrete transition systems against disturbances and proposed a robust metric to ensure that the state deviation from the nominal system is bounded by the magnitude of the disturbance. The work of Topcu et al. (2012), on the other hand, investigated the amount of uncertainty that can be tolerated while the controller still satisfies the given specifications. Both of the two papers above, however, focused on robustness reasoning in a purely discrete model, whereas our proposed method reasons about robustness in a hybrid locomotion system and incorporates the underlying physical dynamics. Recently, the work in Plaku et al. (2010), Bhatia et al. (2010), and He et al. (2015) proposed a multi-layered synergistic framework such that the low-level sampling-based planner communicates with the high-level discrete planner through a middle coordinating layer. This coordinating layer allows the motion planner to ask the task planner for a new high-level plan when a failure occurs at the low level. 
This synergy between multiple planning layers enhances the robustness of the planning framework. As an alternative, the work of Dantam et al. (2016) incrementally incorporated geometric information from the failure event of the motion planner into the task planner via the so-called incremental constraint updates. The robustness in the two lines of research above is reasoned from a replanning perspective. While our study employs a similar replanning strategy as theirs, our focus is on the formal synthesis of a task planner that can react to sudden event changes in the environment. In this paper, we address the robustness as follows: (1) at the task planning level, we devise a reactive mechanism that chooses appropriate system actions according to environmental actions, and (2) at the motion planning level, we achieve robustness against bounded state disturbances by designing robust keyframe transitions for dynamic locomotion.

2.3. Multi-contact legged locomotion

Multi-contact locomotion planning and control for humanoid robots have gained traction in recent years as legged robots operate within complex environments more frequently (Bouyarmane and Kheddar, 2011; Chung and Khatib, 2015; Hauser, 2014; Posa et al., 2016; Sentis et al., 2010a). The work in Bretl (2006) studied multi-contact locomotion as a hybrid control problem, while the work in Hauser et al. (2005) posed the multi-contact planning problem as a hierarchy that first reasons about contacts and then interpolates these contacts with trajectories computed from a probabilistic planner. The study in Kudruss et al. (2015) formulated multi-contact centroidal momentum dynamics as an optimal control problem. However, all of the work above focused on either static or quasi-static mobility behaviors. Instead, our planning framework tackles highly dynamic behaviors, that is, non-periodic multi-contact dynamic locomotion over rough and constrained environments. The work in Caron et al. (2015) employed contact wrench cones to geometrically construct dynamic supports in arbitrary virtual planes for multi-contact behaviors. This work did not employ a rich set of locomotion templates due to restrictive assumptions on the center-of-mass behavior. Once again, none of the work above addresses symbolic reasoning of dynamic locomotion behaviors.

3. Preliminaries and problem formulation

Problem Statement: This study focuses on the reactive and robust synthesis of dynamic whole-body locomotion behaviors for robots equipped with arms and legs to maneuver in complex environments exposed to unexpected emergency events. We use a variety of reduced-order models characterizing the robot’s center-of-mass dynamic behaviors (see Figure 2). Robot actions are parameterized by discrete contact decisions (i.e., limb contact configurations) while environmental actions are composed of various features, such as stair height variations and emergency events, including the appearance of humans, terrain cracks, high ceilings, and narrow passages. A two-player game based on the linear temporal logic (LTL) method is employed for the robot to be reactive to environmental events. We combine the reactive synthesis and reachability control to provide formal guarantees for locomotion in terms of correctness and robustness. While the synthesized actions and continuous control policies are designed off-line, we make them available as look-up tables for real-time online execution of reactive whole-body locomotion decisions and control commands. In this paper, we choose a specific set of environmental actions to demonstrate the versatility of our method in employing multiple limbs for locomotion and responding to a diversity of environmental changes and emergency events. Our method is flexible to incorporate more diverse environments, such as including the contact from lateral supporting walls or obstacles coming from different directions.

3.1. Dynamic locomotion modes

We design a phase-space motion planner that consists of a palette of locomotion modes. To begin with, we introduce centroidal momentum dynamics in a general form. Dynamics of mechanical systems can be represented by their rate of linear and angular momenta, which are affected by external wrenches (i.e., force/torque) exerted on the system. We characterize this class of dynamical systems via the balance of moments around the system’s centroid.

\dot{l} = m \ddot{p}_{com} = \sum_{i}^{N_{c}} f_{i} + m g

(1)

\dot{k} = \sum_{i}^{N_{c}} \left( (p_{i} - p_{com}) \times f_{i} + τ_{i} \right)

(2)

where $l \in ℝ^{3}$ and $k \in ℝ^{3}$ represent the centroidal linear and angular momenta, respectively; $f_{i} \in ℝ^{3}$ is the $i$-th ground reaction force; $m$ is the total mass of the robot; $g = (0, 0, -g)^{T}$ corresponds to the gravity field; and $f_{com} = m \ddot{p}_{com} = m (\ddot{x}, \ddot{y}, \ddot{z})^{T}$ is the vector of center-of-mass inertial forces. Equation (1) states that the rate of change of the linear momentum equals the total external force. $p_{i} = (p_{i, x}, p_{i, y}, p_{i, z})^{T}$ is the position of the $i$-th limb contact, and $τ_{i} \in ℝ^{3}$ is the $i$-th contact torque. Equation (2) states that the rate of change of the angular momentum equals the sum of the torques generated by the contact wrenches about the CoM.
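As a minimal numerical sketch of Equations (1) and (2), the following Python snippet evaluates the two momentum rates for a given list of contact wrenches. The function name `centroidal_rates`, the tuple-based vector type, and the default gravity constant are illustrative assumptions and are not part of the original formulation.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def centroidal_rates(mass: float,
                     p_com: Vec3,
                     contacts: List[Tuple[Vec3, Vec3, Vec3]],
                     g: float = 9.81) -> Tuple[Vec3, Vec3]:
    """Return (l_dot, k_dot) following Equations (1) and (2).

    Each contact is a hypothetical triple (p_i, f_i, tau_i):
    contact position, contact force, and contact torque.
    """
    l_dot = [0.0, 0.0, -mass * g]   # gravity term m*g with g = (0, 0, -g)^T
    k_dot = [0.0, 0.0, 0.0]
    for p_i, f_i, tau_i in contacts:
        lever = (p_i[0] - p_com[0], p_i[1] - p_com[1], p_i[2] - p_com[2])
        moment = cross(lever, f_i)  # (p_i - p_com) x f_i
        for j in range(3):
            l_dot[j] += f_i[j]
            k_dot[j] += moment[j] + tau_i[j]
    return (l_dot[0], l_dot[1], l_dot[2]), (k_dot[0], k_dot[1], k_dot[2])
```

For instance, a single foot directly below the CoM that carries exactly the robot weight yields zero rates for both momenta, whereas shifting that foot forward along the sagittal axis produces a nonzero pitch component in the angular momentum rate.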

Given this general model, certain assumptions are commonly imposed to make the problem tractable (Audren et al., 2014). In our case, six locomotion modes are proposed to produce various WBDL behaviors.

Mode (a): prismatic inverted pendulum model. For single foot contact, Equation (2) is simplified to $(p_{com} - p_{foot}) \times (f_{com} + m g) = - τ_{com}$. Given a piece-wise linear CoM path surface to follow, the system dynamics are expressed as

(\begin{matrix} \ddot{x} \\ \ddot{y} \end{matrix}) = ω_{PIPM}^{2} (\begin{matrix} x - x_{foot} - \frac{τ_{y}}{m g} \\ y - y_{foot} - \frac{τ_{x}}{m g} \end{matrix})

(3)

where $\ddot{x}$ and $\ddot{y}$ are the linear CoM accelerations along the sagittal and lateral directions as defined in Equation (1). The PIPM phase-space asymptotic slope (Zhao et al., 2017) is defined as $ω_{PIPM} = \sqrt{g / z_{PIPM}^{apex}}$ with $z_{PIPM}^{apex} = a \cdot x_{foot} + b \cdot y_{foot} + c - z_{foot}$, where $a$ and $b$ are the slopes of the piece-wise linear CoM path surface $ψ_{CoM}(x, y, z) = z - a x - b y - c = 0$. Thus, the dynamics in the vertical direction are given by $\ddot{z} = a \ddot{x} + b \ddot{y}$ and are not shown explicitly here. The control input is $u = (x_{foot}, y_{foot}, ω_{PIPM}, τ_{x}, τ_{y})^{T}$. For more details, please refer to Zhao et al. (2017).
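To illustrate the sagittal row of Equation (3), the sketch below evaluates its closed-form solution under two simplifying assumptions that are ours: zero torso torque ($τ_{y} = 0$) and a constant $ω_{PIPM}$ during the step. The function names and the elapsed-time argument `t` are hypothetical. Under these assumptions the quantity $\dot{x}^{2} - ω_{PIPM}^{2} (x - x_{foot})^{2}$ stays constant along a trajectory, which is what gives the phase-space curve its hyperbolic shape.

```python
import math
from typing import Tuple


def pipm_omega(z_apex: float, g: float = 9.81) -> float:
    """Phase-space asymptotic slope omega = sqrt(g / z_apex)."""
    return math.sqrt(g / z_apex)


def pipm_sagittal_state(x0: float, v0: float, x_foot: float,
                        omega: float, t: float) -> Tuple[float, float]:
    """Closed-form solution of x_ddot = omega^2 * (x - x_foot), zero torque assumed."""
    dx0 = x0 - x_foot
    c, s = math.cosh(omega * t), math.sinh(omega * t)
    x = x_foot + dx0 * c + (v0 / omega) * s
    v = dx0 * omega * s + v0 * c
    return x, v


def pipm_invariant(x: float, v: float, x_foot: float, omega: float) -> float:
    """Conserved quantity v^2 - omega^2 * (x - x_foot)^2 of the torque-free model."""
    return v * v - omega ** 2 * (x - x_foot) ** 2
```

Starting slightly ahead of the foot with zero velocity, the returned position moves away from the foot over time, reflecting the unstable nature of the inverted pendulum dynamics.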

Mode (b): prismatic pendulum model. When the terrain is cracked, the robot has to grasp an overhead support to swing over the unsafe region using brachiation. The system dynamics can be approximated as a prismatic pendulum model (PPM). For a single hand contact, we have

(\begin{matrix} \ddot{x} \\ \ddot{y} \end{matrix}) = - ω_{PPM}^{2} (\begin{matrix} x - x_{hand} - \frac{τ_{y}}{m g} \\ y - y_{hand} - \frac{τ_{x}}{m g} \end{matrix})

(4)

where, similarly, we define $ω_{PPM} = \sqrt{g / z_{PPM}^{apex}}$ with $z_{PPM}^{apex} = z_{hand} - a \cdot x_{hand} - b \cdot y_{hand} - c$, given the same piece-wise linear CoM path surface $ψ_{CoM}(x, y, z) = z - a x - b y - c = 0$ as in Mode (a). Similarly, the vertical dynamics are given by $\ddot{z} = a \ddot{x} + b \ddot{y}$. A difference between modes (a) and (b) is that the PPM dynamics are inherently stable, since the CoM is always attracted toward the apex position, whereas the PIPM dynamics are not. This study assumes that the robot can firmly grasp the overhead support once it receives the upper limb contact command. Detailed reasoning about the low-level grasping model and potential failure scenarios, though important, is beyond the scope of this paper and will be studied in future work.
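The stability of the PPM can be made concrete with a small sketch of the sagittal row of Equation (4). It assumes zero torso torque and a constant $ω_{PPM}$, and the function names are hypothetical. In that case the solution is a bounded oscillation about the hand contact, in contrast to the diverging solution of the PIPM.

```python
import math
from typing import Tuple


def ppm_sagittal_state(x0: float, v0: float, x_hand: float,
                       omega: float, t: float) -> Tuple[float, float]:
    """Closed-form solution of x_ddot = -omega^2 * (x - x_hand), zero torque assumed."""
    dx0 = x0 - x_hand
    c, s = math.cos(omega * t), math.sin(omega * t)
    x = x_hand + dx0 * c + (v0 / omega) * s
    v = -dx0 * omega * s + v0 * c
    return x, v


def ppm_amplitude(x0: float, v0: float, x_hand: float, omega: float) -> float:
    """Largest sagittal excursion from the hand contact along the trajectory."""
    return math.hypot(x0 - x_hand, v0 / omega)
```

Whatever the elapsed time, the distance between the returned position and `x_hand` never exceeds `ppm_amplitude`, which is one way to read the statement that the CoM is always attracted toward the apex position.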

Figure 2.

Illustration of template-based locomotion behaviors dynamically interacting with complex environments. Inspired from real-world human and animal motions, our study focuses on how to make model abstractions and high-level decisions for complex environments. A fundamental problem is how to use template models to characterize essential locomotion modes and sequentially compose these modes to achieve agile and robust locomotion.

Mode (c): stop-launch model. When a human appears, the robot has to come to a stop, wait until the human disappears, and then start to move forward again. The task in this mode consists of decelerating the CoM motion to zero and accelerating it from zero again. We name this model the stop-launch model (SLM), with constant CoM sagittal accelerations. The resulting phase-space trajectory is a parabolic manifold.

Mode (d): multi-contact model. In this mode, we propose a multi-contact model (MCM) built upon the centroidal momentum dynamics. To make the dynamics tractable, we assume a known constant vertical acceleration $a_{z}$ in each step and neglect the angular momentum $k_{z}$ around the z-axis (Audren et al., 2014). Therefore, we have a constant resultant vertical external force, that is, $\sum_{i}^{N_{c}} f_{i, z} = m (\ddot{z} - g)$, where $N_{c}$ is the number of limb contacts. Since our model has point contacts, $τ_{i} = 0, \forall i \leq N_{c}$, and the dynamics are described by

(\begin{matrix} \ddot{x} \\ \ddot{y} \\ \ddot{φ} \\ \ddot{θ} \end{matrix}) = (\begin{matrix} \sum_{i}^{N_{c}} f_{i, x} / m \\ \sum_{i}^{N_{c}} f_{i, y} / m \\ - (\ddot{z} - g) \cdot y + z \cdot \sum_{i}^{N_{c}} f_{i, y} / m - \sum_{i}^{N_{c}} p_{i, z} \cdot f_{i, x} / m + \sum_{i}^{N_{c}} p_{i, z} \cdot f_{i, z} / m \\ (\ddot{z} - g) \cdot x - z \cdot \sum_{i}^{N_{c}} f_{i, x} / m + \sum_{i}^{N_{c}} p_{i, z} \cdot f_{i, x} / m - \sum_{i}^{N_{c}} p_{i, y} \cdot f_{i, y} / m \end{matrix})

where $φ$ and $θ$ are the torso roll and pitch angles aligned with the CoM sagittal and lateral directions, as derived from Equation (2). The external force vector $(f_{i, x}, f_{i, y}, f_{i, z})$ represents the $i$-th contact force. The vertical position $z$ is a function of $x$ and $y$ defined a priori.

Mode (e): hopping model. This model applies when the robot needs to jump over an unsafe region. In this case, the CoM dynamics follow a free-falling ballistic trajectory. We have $\ddot{x} = \ddot{y} = 0, \ddot{z} = - g$. The trajectory is fully determined by the initial condition, where a discontinuous jump in the CoM state can occur and be used to generate a desired linear momentum. For instance, when the robot jumps over a cracked terrain, it needs to push against the ground as the foot lifts to generate a sufficiently large sagittal linear acceleration.

Mode (f): sliding model. This model applies when the robot needs to slide through a constrained region. The CoM dynamics are subject to a constant friction force. Thus, $\ddot{x}$ is a constant negative value, and we assume $\ddot{y} = 0, \ddot{z} = 0$ . The sagittal linear velocity decays at a constant rate.
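Modes (c), (e), and (f) all reduce to constant-acceleration kinematics in the sagittal direction, so their phase-space curves follow $\dot{x}^{2} = \dot{x}_{0}^{2} + 2 \ddot{x} (x - x_{0})$. The sketch below works through this relation. The helper names are hypothetical, and `hop_range` additionally assumes that take-off and landing occur at the same height, which is our simplification and not a condition stated in the text.

```python
import math


def velocity_after(v0: float, accel: float, distance: float) -> float:
    """Sagittal speed after covering `distance` at constant acceleration `accel`."""
    v_sq = v0 * v0 + 2.0 * accel * distance
    if v_sq < 0.0:
        raise ValueError("the CoM comes to rest before covering this distance")
    return math.sqrt(v_sq)


def stopping_distance(v0: float, decel: float) -> float:
    """Distance needed to bring the CoM to rest at constant deceleration decel > 0."""
    if decel <= 0.0:
        raise ValueError("decel must be positive")
    return v0 * v0 / (2.0 * decel)


def hop_range(vx: float, vz: float, g: float = 9.81) -> float:
    """Sagittal distance of a ballistic hop with take-off velocity (vx, vz).

    Assumes landing at the take-off height, so the flight time is 2 * vz / g.
    """
    return vx * 2.0 * vz / g
```

As a usage example, a CoM moving at 1.0 m/s with a constant deceleration of 2.0 m/s² needs `stopping_distance(1.0, 2.0)`, that is, 0.25 m to stop, and `velocity_after(1.0, -2.0, 0.25)` confirms that the speed is zero at that point.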

Given the locomotion modes above, we define the set of locomotion modes as

P := \{p_{PIPM}, p_{MCM}, p_{PPM}, p_{SLM}, p_{HM}, p_{SM}\}

All the locomotion modes above are illustrated in Figure 3. Each mode has closed-form solutions for its phase-space tangent and cotangent manifolds, as will be derived in Section 5 and Appendix C. The timing synchronization between the sagittal and lateral dynamics is guaranteed by a Newton-Raphson foot placement searching algorithm (Zhao et al., 2017). Likewise, more complex tasks can be defined in the locomotion mode set $P$. For instance, cartwheel, dense gaps, and spinkick behaviors as shown in Peng et al. (2018) are promising behaviors to be explored.

Figure 3.

Contact planning strategies for locomotion in rough terrains. We discretize the terrain height to decide what locomotion actions to take, and define them as environmental actions to set up a two-player game decision problem. For instance, given a moderately upward or downward terrain, there can be multiple contact actions to deal with it. Events motivated by ordinary accidents in human daily lives, such as a crack on the terrain and the sudden appearance of a human, are treated as emergency events, and incorporated into the allowable environment. Detailed definitions of environment and system actions are provided in Section 4.1.

Our phase-space planning process produces three-dimensional locomotion. However, the planning framework of this study focuses on forward walking using sagittal keyframes. Given high-level sagittal keyframes, the robot’s lateral dynamic behavior is automatically computed by our motion planner. Turning behaviors can be incorporated in our framework by using the method that we introduced in Zhao et al. (2017).

3.2. Switched systems and phase-space planning

Given the continuous locomotion modes above, we formulate the locomotion planning problem as a switched system (Liberzon, 2012). The dynamics of the WBDL process are defined as

\dot{ξ} (ζ) = f_{p} (ξ (ζ), u (ζ), d (ζ)), p \in P

(5)

where $ξ (ζ) \in Ξ \subseteq ℝ^{12}$ denotes the full system state vector at $ζ \in ℝ_{\geq 0}$, that is, the 12-dimensional center-of-mass position and angular state vector of the robot during the locomotion process¹. The phase progression variable $ζ$, analogous to time, represents the current progress along a locomotion trajectory. The control input is denoted by $u (ζ) = (p_{contact}, ω, τ_{x}, τ_{y}, τ_{z}) \in U$, where $p_{contact}$ represents a set of three-dimensional contact position vectors; $ω$ represents the slope of the phase-space asymptote, which depends on the specific locomotion mode as defined in Section 3.1; and $(τ_{x}, τ_{y}, τ_{z})$ represents a three-dimensional torso torque vector. Each locomotion mode involves only a subset of the full state and control vectors. In addition, $d \in D \subseteq ℝ^{d}$ represents an external disturbance. The locomotion mode signal $p$ indexes a specific locomotion mode belonging to the set $P$, and $f_{p} (\cdot)$ denotes the vector field associated with the locomotion mode $p$. A logic-based switched system modeling the locomotion process is shown in Figure 4.
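The structure of Equation (5) can be sketched as a table of vector fields indexed by the mode signal $p$. The code below is a strongly reduced illustration under our own assumptions: it keeps only the sagittal state $(x, \dot{x})$ instead of the 12-dimensional state, treats $ζ$ as time, lets the disturbance enter as an additive acceleration, and integrates with explicit Euler steps. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[float, float]              # reduced state (x, x_dot); an assumption
Field = Callable[[State, float], State]  # f_p(xi, d) -> xi_dot


def pipm_field(omega: float, x_foot: float) -> Field:
    """Sagittal PIPM vector field with zero torso torque."""
    return lambda s, d: (s[1], omega ** 2 * (s[0] - x_foot) + d)


def ppm_field(omega: float, x_hand: float) -> Field:
    """Sagittal PPM vector field with zero torso torque."""
    return lambda s, d: (s[1], -omega ** 2 * (s[0] - x_hand) + d)


def const_accel_field(accel: float) -> Field:
    """Constant sagittal acceleration (stop-launch, hopping, or sliding)."""
    return lambda s, d: (s[1], accel + d)


def simulate(fields: Dict[str, Field],
             mode_signal: Callable[[int], str],
             s0: State,
             d_zeta: float,
             steps: int,
             disturbance: Callable[[int], float] = lambda k: 0.0) -> List[State]:
    """Explicit-Euler rollout of xi_dot = f_p(xi, d), with p chosen at every step."""
    s = s0
    trajectory = [s]
    for k in range(steps):
        p = mode_signal(k)                        # active locomotion mode
        dx, dv = fields[p](s, disturbance(k))     # evaluate the vector field f_p
        s = (s[0] + d_zeta * dx, s[1] + d_zeta * dv)
        trajectory.append(s)
    return trajectory
```

A mode signal such as `lambda k: "SM" if k < 50 else "HM"` then switches from a sliding field to a hopping field (zero sagittal acceleration) halfway through the rollout. In the framework described here, that signal would come from the high-level task planner and not from a fixed schedule.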

Figure 4.

Logic-based locomotion planner structure. A set of locomotion templates is devised for maneuvering in constrained dynamic environments. Each template is indexed by a locomotion mode signal p. The discrete environment actions are represented by the variable e while control actions are represented by the variable s describing limb contact actions. The discretized dynamic locomotion keyframes are represented by $q = (p_{contact}, {\dot{x}}_{apex})$ . Based on an environmental action e and a keyframe state q at the current walking step, the contact decision maker decides the locomotion mode signal p and the next keyframe locomotion state. More details on the usage of this decision process are discussed in Section 5.

Our phase-space planning is a three-dimensional hybrid bipedal locomotion planning framework based on robustly tracking a set of non-periodic keyframe states. This framework focuses on non-periodic gait generation for robust and agile locomotion over various challenging terrains and under external disturbances. The keyframe state in the phase-space is defined as

Definition 1

(Phase-space keyframe). A keyframe state in the phase-space of a locomotion system is a critical point on the locomotion manifold normally located either at the point of minimal or maximal velocity, or at an approximately central position of the phase-space manifold of one continuous walking step (see the gray and red dots in Figure 5).

In general, this keyframe state refers to the apex state, at which the center-of-mass (CoM) velocity reaches a local minimum or maximum along the CoM sagittal axis². Given two consecutive keyframe states, the phase-space planner evolves continuously and computes the contact transitions of one walking step as defined below.

Definition 2

(Phase-space contact switch). A phase-space contact switch, that is, a contact transition, is defined by the intersection of two adjacent phase-space trajectories (see green dot in Figure 5(a)).

Our contact-triggered switching strategy is especially suitable for non-periodic locomotion, which is abstracted as a progression map $Φ$ between keyframe states that drives the robot’s center-of-mass from one desired keyframe to the next via the control input $u$, that is, $(p_{contact, k + 1}, \dot{x}_{apex, k + 1}) = Φ (p_{contact, k}, \dot{x}_{apex, k}, u)$, where $p_{contact, k}$ and $\dot{x}_{apex, k}$ denote the $k$-th step CoM sagittal position and velocity at the contact apex, respectively. To accomplish whole-body dynamic locomotion behaviors, we will compose a sequence of locomotion modes with planned keyframes. This can be achieved by synthesizing a high-level task planner protocol which makes proper contact decisions like the ones shown in Figure 3 and determines the switching strategy of the low-level motion planner.

Definition 3

(One walking step in the phase-space). One walking step (OWS) of the locomotion process is defined as two consecutive semi-step phase-space trajectories (see Figure 5(a)). The first semi-step trajectory starts at the first keyframe state (gray dot) and ends at the contact switch (green dot) while the second semi-step trajectory starts at the contact switch and ends at the second keyframe (red dot).

Instead of using generalized coordinates associated with the robot joints, our planning framework chooses to use the robot’s center-of-mass state as the output space. This simplified coordinate choice is often used in the locomotion communities. Alternatives for dimensionality reduction include, for instance, differential flatness (Liu et al., 2012) and partial hybrid zero dynamics (Ames et al., 2015). The switched system dynamics in Equation (5) can be represented by a tuple,

SS = (Ξ, Ξ_{0}, U, P, f, A P, ℒ)

(6)

where $Ξ_{0} \subseteq Ξ$ is a set of initial conditions, $AP$ is a set of atomic propositions and $ℒ : Ξ \to 2^{A P}$ is a labeling function. Then a control strategy for $SS$ is a partial function defined as

Ω_{i} (ξ_{0}, ξ_{1}, \dots, ξ_{i}) = u_{i} \in U^{[0, Δ ζ_{i}]}, \forall i = 0,1,2, \dots

(7)

where $ξ_{0}, ξ_{1}, \dots, ξ_{i}$ is a finite sequence of sampled states evaluated at discrete phase progression instants $ζ_{0}, ζ_{1}, \dots, ζ_{i}$ satisfying $ζ_{j+1} - ζ_{j} = Δ ζ_{i}, \forall 0 \leq j \leq i - 1$, and $U^{[0, Δ ζ_{i}]}$ denotes the set of control input signals from $[0, Δ ζ_{i}]$ to $U$. It is assumed that $u_{i}$ is the constant control input with a phase progression duration $Δ ζ_{i}$.

Contact switching planner synthesis problem: Given a switched system $SS$ in Equation (6) and a specification φ expressible in the LTL form, synthesize a contact planning strategy for the system that (i) only generates correct phase-space trajectories κ = ( ξ , ρ, η, μ) in the sense that κ⊧φ for all initial conditions in Ξ₀, (ii) generates a locomotion mode μ in response to the environment actions at runtime. φ is realizable by $SS$ if there exists such a switching strategy. ρ, η and μ are the continuous counterparts of the discrete environment action e, contact action s, and locomotion mode p. More detailed definitions will be introduced in Section 5.

3.3. Finite transition systems and LTL preliminaries

We now define system, environment, and product finite transition systems and describe LTL preliminaries.

Definition 4

(Finite transition system of the robot system). A finite transition system of the robot system is a tuple

T S_{s} : = (Q, P, S, T_{s}, ℐ_{s}, A P_{s}, {\tilde{ℒ}}_{s})

(8)

where $Q$ is a finite set of states, $P$ is a set of system modes as mentioned in Equation (5), $S$ is a finite set of controllable robot contact actions, $T_{s} \subseteq Q \overset{P \times S}{\to} Q$ is a transition, $ℐ_{s} = Q_{0} \subseteq Q$ is a set of initial states, AP_s is a set of atomic propositions, and ${\tilde{ℒ}}_{s} : Q \to 2^{A P_{s}}$ is a labeling function mapping the state to an atomic proposition. $T S_{s}$ is finite if $Q, P, S$ and AP_s are finite.

Definition 5

(Finite transition system of the environment). A finite transition system of the environment is a tuple,

T S_{e} : = (ℰ, T_{e}, ℐ_{e}, A P_{e}, {\tilde{ℒ}}_{e})

(9)

where $ℰ$ is a finite set of environmental states, $T_{e} \subseteq ℰ \times ℰ$ is a transition, $ℐ_{e} = ℰ_{0} \subseteq ℰ$ is a set of initial states, AP_e is a set of atomic propositions, and ${\tilde{ℒ}}_{e} : ℰ \to 2^{A P_{e}}$ is a labeling function mapping the state to an atomic proposition. $T S_{e}$ is finite if $ℰ$ and AP_e are finite.

Definition 6

(Open finite product transition system). Given $T S_{s}$ and $T S_{e}$ , we define an open finite product transition system (OFPTS) to describe the overall system behavior, including the robot and its environment as

T S_{prod} : = (Q, P, S, ℰ, T, ℐ, \tilde{A P}, \tilde{ℒ})

(10)

where $Q, P$ and $S$ are defined as previously, $ℰ$ is a finite set of uncontrollable environmental actions, $T \subseteq ℱ \to ℱ$ with $ℱ = Q \times P \times S \times ℰ$ is a transition, $ℐ = ℱ_{0} \subseteq ℱ$ is a set of initial states, $\tilde{A P}$ is a set of atomic propositions, and $\tilde{ℒ} : Q \to 2^{\tilde{A P}}$ is a labeling function mapping the state to an atomic proposition. $T S_{prod}$ is finite if $Q, P, ℰ, S$ and $\tilde{A P}$ are finite.

Note that the environment states $ℰ$ in $T S_{e}$ are treated as uncontrollable actions in $T S_{prod}$ . This is why $T S_{prod}$ is called an “open” finite transition system (Topcu et al., 2012). Without loss of generality, it is assumed that for every pair $(q, e) \in Q \times ℰ$ , there exists at least one pair (p, s) such that $(q, e) \overset{p, s}{\to} (q^{'}, e^{'})$ . The OFPTS considered in this study has non-deterministic transitions.

Definition 7

(Execution and word of an OFPTS). An execution γ of an OFPTS $T S_{prod}$ is an infinite path sequence γ = (q₀, p₀, e₀, s₀) (q₁, p₁, e₁, s₁) (q₂, p₂, e₂, s₂)…, with $γ_{i} = (q_{i}, p_{i}, e_{i}, s_{i}) \in Q \times P \times ℰ \times S$ and $γ_{i} \overset{T}{\to} γ_{i + 1}$ . The word generated from γ is w_γ = w_γ(0)w_γ(1)w_γ(2)…, with $w_{γ} (i) = \tilde{ℒ} (γ_{i}), \forall i \geq 0$ .

The word w_γ is said to satisfy an LTL formula φ if and only if the execution γ satisfies φ. If all executions of $T S_{prod}$ satisfy φ, we say that $T S_{prod}$ satisfies φ, that is, $T S_{prod} ⊨ φ$ . Please refer to Figure 6 for an illustration of the finite transition systems. Linear temporal logic is an extension of propositional logic that incorporates temporal operators. Preliminaries of linear temporal logic are explained in Appendix B.
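As a minimal sketch of Definition 7, the Python snippet below checks a finite prefix of an execution and builds the corresponding word prefix. The concrete states, the transition set, and the labeling function (which simply exposes each tuple component as a proposition) are illustrative assumptions and are not taken from the framework above. A true execution is infinite, so only prefixes can be enumerated this way.

```python
from typing import FrozenSet, List, Set, Tuple

# One product state gamma_i = (q, p, e, s). All concrete names are hypothetical.
State = Tuple[str, str, str, str]


def label(state: State) -> FrozenSet[str]:
    """Assumed labeling: expose every component of the tuple as a proposition."""
    q, p, e, s = state
    return frozenset({q, p, e, s})


def is_execution_prefix(path: List[State], T: Set[Tuple[State, State]]) -> bool:
    """Check that every consecutive pair of states is a transition in T."""
    return all((a, b) in T for a, b in zip(path, path[1:]))


def word_prefix(path: List[State]) -> List[FrozenSet[str]]:
    """Map each state of the path to its label, w(i) = L(gamma_i)."""
    return [label(g) for g in path]


g0 = ("q_walk-s-s", "p_PIPM", "e_md", "s_lh-an")
g1 = ("q_walk-s-m", "p_PIPM", "e_md", "s_lh-an")
g2 = ("q_stop-s", "p_SLM", "e_ha", "s_ld-an")
T = {(g0, g1), (g1, g1), (g1, g2)}  # toy transition relation

path = [g0, g1, g1, g2]
print(is_execution_prefix(path, T))
print(sorted(word_prefix(path)[-1]))
```

Because the toy relation contains the self-loop (g1, g1), the same prefix can be extended in more than one way, which mirrors the non-deterministic transitions of the OFPTS.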

Figure 6.

The interconnected feedback diagram of system and environment finite transition systems $T S_{s}$ and $T S_{e}$ , and a winning strategy $A_{WBDL}$ synthesized in Section 4.4.

3.4. Discrete task planner synthesis formulation

Given the preliminaries above, we formulate a discrete task planner synthesis problem and introduce a specific fragment of the temporal logic for the task specifications.

Discrete task planner synthesis problem: Given a product transition system $T S_{prod}$ and an LTL specification φ following the assume-guarantee form (Bloem et al., 2012)

φ : = (φ_{e} \Rightarrow (φ_{q} \land φ_{s}))

(11)

where φ_e, φ_q, φ_s are propositions for the admissible environment actions, the keyframe states, and the correct overall system behavior, respectively; in particular, φ_s incorporates the behaviors of locomotion mode p and contact action s; we synthesize a contact planner switching strategy γ that generates only correct executions (q, p, e, s), that is, (q, p, e, s)⊧φ.

To make the computation tractable, we employ a fragment of LTL formulae with a favorable polynomial complexity, named the Generalized Reactivity(1) (GR(1)) formulae (Bloem et al., 2012). This class of formulae is expressed as, for v ∈ {e, q, s},

φ_{v} = φ_{init}^{v} \land \bigwedge_{i \in I_{safety}} □ φ_{trans, i}^{v} \land \bigwedge_{i \in I_{goal}} □ ⋄ φ_{goal, i}^{v}

(12)

where $φ_{init}^{v}$ are the propositional formulae defining initial conditions, $φ_{trans, i}^{v}$ refer to the transitional propositional formulae (i.e., safety conditions) incorporating the state at the next step, and $φ_{goal, i}^{v}$ are the propositional formulae describing the goals to be reached infinitely often (i.e., liveness conditions).

Remark 1

The GR(1) fragment of LTL can reason over a rich set of states and actions while keeping the task planner synthesis process tractable. A motivation for using this automated synthesis is to lay the theoretical foundation for devising a correct-by-construction decision-maker for composing complex locomotion trajectories.

4. Task planning for whole-body dynamic locomotion

In this section, we introduce the temporal logic specifications for locomotion in a possibly adversarial environment. We will specify a two-player game where the environment and keyframe state are the first player while the robot action is the second player. Our task specifications will capture two types of environmental events: (i) varying-height terrains are treated as ordinary events; and (ii) sudden incidents, such as a person appearing on the robot’s path or a crack on the terrain, are treated as adversarial actions, since if the robot does not respond properly they may cause an accident.

The design of LTL specifications relies on human designers who specify locomotion tasks and models of the environment. Our task specifications below are designed according to locomotion heuristics. In general, there is no unique way to evaluate the efficacy of the LTL design process. For instance, we could decide to add an additional environmental specification to forbid repeating the same environmental action terrainCrack-normalCeiling e_tc-nc, expressed as □(e_tc-nc ⇒¬○e_tc-nc). Using locomotion heuristics is an effective way to generate natural and safe locomotion behaviors. The heuristics that we have chosen employ human intuition, which often leads to natural and recognizable behaviors. In addition, another set of heuristics is used to guarantee safety.

4.1. Environment specifications

As previously stated, we treat the environment as a player “acting” against the robot’s locomotion process. We define an environmental action set, $ℰ$ , as the composition of two subsets: a set for varying-height terrain and a set for emergencies (i.e., the so-called sudden events), respectively.

ℰ : = ℰ_{terrain} \cup ℰ_{emergency} = {e_{md}, e_{hd}, e_{mu}, e_{hu}} \cup {e_{tc-nc}, e_{tc-hc}, e_{ha}, e_{np}}

(13)

where the elements in the set $ℰ_{terrain}$ denote different height terrain actions, as illustrated in Figure 3. For instance, e_md denotes moderatelyDownward terrain. The actions in $ℰ_{emergency}$ represent sudden events, that is, terrainCrack-normalCeiling, terrainCrack-highCeiling, humanAppear, and narrowPassage. The environmental action set specified above is generalizable to other environmental events while maintaining computational tractability. Given the environmental actions above, we design the following specifications. First, the following sudden environmental actions are assumed to not occur at the initial instant

φ_{init}^{e} = \neg e_{tc-nc} \land \neg e_{tc-hc} \land \neg e_{ha} \land \neg e_{np}

(14)

Since only one environmental action can be True at any time, we enforce the following transitional proposition

□ ((e_{md} \land \bigwedge_{e \in ℰ \setminus \{e_{md}\}} (\neg e)) \lor (e_{hd} \land \bigwedge_{e \in ℰ \setminus \{e_{hd}\}} (\neg e)) \lor \dots \lor (e_{np} \land \bigwedge_{e \in ℰ \setminus \{e_{np}\}} (\neg e)))

(15)

where the operator $\bigwedge_{e \in ℰ \setminus \{e_{md}\}}$ is used to represent the conjunction of multiple environmental propositions $\neg e, \forall e \in ℰ \setminus \{e_{md}\}$. To enable the robot to maneuver through the dynamic environment, certain sudden environmental actions are forbidden to occur consecutively, as shown in the following transitional specifications:

1. (S_e-1) If the current environmental action is terrainCrack-highCeiling, then the next environmental action cannot be terrainCrack-highCeiling, humanAppear, nor narrowPassage.

□ (e_{tc-hc} \Rightarrow \neg ○ (e_{tc-hc} \lor e_{ha} \lor e_{np}))

(16)

2. (S_e-2) If the current environmental action is terrainCrack-normalCeiling, then the next environmental action cannot be terrainCrack-highCeiling, humanAppear, nor narrowPassage.

□ (e_{tc-nc} \Rightarrow \neg ○ (e_{tc-hc} \lor e_{ha} \lor e_{np}))

(17)

3. (S_e-3) If the current environmental action is narrowPassage, then the next environmental action cannot be terrainCrack-normalCeiling nor terrainCrack-highCeiling.

□ (e_{np} \Rightarrow \neg ○ (e_{tc-hc} \lor e_{tc-nc}))

(18)
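The three transitional rules can be sketched as a lookup of forbidden successors. The Python snippet below is an illustration under stated assumptions: it follows the prose of (S_e-1) to (S_e-3) and the initial condition of Equation (14), it models the environment as exactly one action per step (so the mutual exclusion of Equation (15) holds by construction), and the function and variable names are hypothetical.

```python
TERRAIN = {"e_md", "e_hd", "e_mu", "e_hu"}
EMERGENCY = {"e_tc-nc", "e_tc-hc", "e_ha", "e_np"}

# Successors ruled out by (S_e-1), (S_e-2) and (S_e-3), read from the prose.
FORBIDDEN_NEXT = {
    "e_tc-hc": {"e_tc-hc", "e_ha", "e_np"},
    "e_tc-nc": {"e_tc-hc", "e_ha", "e_np"},
    "e_np": {"e_tc-nc", "e_tc-hc"},
}


def admissible_prefix(actions):
    """Return True if a finite sequence of environmental actions violates
    neither the initial condition (14) nor the three transition rules."""
    if any(a not in TERRAIN | EMERGENCY for a in actions):
        raise ValueError("unknown environmental action")
    # Equation (14): no sudden event at the initial instant.
    if actions and actions[0] in EMERGENCY:
        return False
    # Transition rules: check every (current, next) pair.
    return all(
        nxt not in FORBIDDEN_NEXT.get(cur, set())
        for cur, nxt in zip(actions, actions[1:])
    )


print(admissible_prefix(["e_md", "e_tc-hc", "e_mu", "e_np", "e_ha"]))
print(admissible_prefix(["e_md", "e_tc-hc", "e_ha"]))
```

The check only covers finite prefixes, so it addresses the safety part of the environment assumptions and says nothing about the goal and liveness propositions.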

To evaluate the effectiveness of our proposed approach handling all the allowable environmental actions, we enforce them to occur infinitely often via the goal proposition:

φ_{goal}^{e} = (□ ⋄ e_{md}) \land (□ ⋄ e_{hd}) \land \dots \land (□ ⋄ e_{np})

(19)

To ensure the robot makes progress (i.e., continuously moves forward within the constrained environment), we define the following liveness condition:

φ_{liveness}^{e} : = \neg ⋄ □ e_{ha} \land \neg ⋄ □ e_{np}

(20)

which is consistent with the goal proposition of Equation (12). This specification establishes that the robot cannot eventually always encounter the conditions humanAppear or narrowPassage. In fact, this liveness condition should also include the environmental action terrainCrack-highCeiling, that is, ¬⋄□e_tc-hc, which is already guaranteed by □(e_tc-hc ⇒ ¬○e_tc-hc) in specification (S_e-1).
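To make the goal and liveness propositions concrete, the following Python sketch evaluates them on an ultimately periodic ("lasso") environment word, that is, a finite prefix followed by a cycle repeated forever. Restricting attention to lasso words is an assumption made here for illustration. On such a word, □⋄e holds exactly when e occurs in the cycle, and ¬⋄□e holds exactly when the cycle is not made up of e alone. All names are hypothetical.

```python
ALL_ACTIONS = ["e_md", "e_hd", "e_mu", "e_hu",
               "e_tc-nc", "e_tc-hc", "e_ha", "e_np"]


def always_eventually(cycle, action):
    """[]<> action on prefix.cycle^omega: the action must occur in the cycle."""
    return action in cycle


def not_eventually_always(cycle, action):
    """not <>[] action: the cycle must contain something other than the action."""
    return any(a != action for a in cycle)


def satisfies_env_goals(cycle):
    """Check the goal proposition (19) and the liveness condition (20)."""
    if not cycle:
        raise ValueError("the cycle of a lasso word must be non-empty")
    goal = all(always_eventually(cycle, a) for a in ALL_ACTIONS)
    live = (not_eventually_always(cycle, "e_ha")
            and not_eventually_always(cycle, "e_np"))
    return goal and live


cycle = ["e_md", "e_hd", "e_mu", "e_hu", "e_tc-nc", "e_md",
         "e_tc-hc", "e_md", "e_ha", "e_md", "e_np"]
print(satisfies_env_goals(cycle))
print(satisfies_env_goals(["e_ha"]))
```

The prefix plays no role in either check, since both propositions only constrain what happens infinitely often.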

4.2. Robot specifications

To maneuver in the environment using whole-body dynamic locomotion, we define the following robot actions:

S : = {s_{li-aj}, \forall (i, j) \in S_{index}}

(21)

where the indices ‘l’ and ‘a’ are short for leg and arm, respectively. The pair $(i, j) \in S_{index}$ corresponds to the contact limb with $S_{index} = \{(h, n), (h, h), (h, f), (d, h), (d, f), (d, d), (d, n), (n, f), (n, n)\}$, where the letters ‘h’, ‘f’, ‘d’ and ‘n’ represent hind, fore, dual and no contacts, respectively. For instance, s_lh-af specifies the legHindArmFore contact action in the sense that the robot’s hind leg and the fore arm are in contact for that action while the other two limbs are not in contact. Notice that we don’t specify left and right limbs explicitly as the hind and fore adjectives lead to unique assignments during the locomotion process.
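The naming scheme of the contact actions can be sketched as a small enumeration. In the Python snippet below, the index pairs are copied from Equation (21), while the helper names and the spelled-out word "None" for the no-contact case are assumptions introduced for illustration.

```python
# Index pairs (leg, arm) as listed for S_index.
S_INDEX = [("h", "n"), ("h", "h"), ("h", "f"), ("d", "h"), ("d", "f"),
           ("d", "d"), ("d", "n"), ("n", "f"), ("n", "n")]

# Spelled-out limb words; "None" for 'n' is a naming assumption.
LIMB = {"h": "Hind", "f": "Fore", "d": "Dual", "n": "None"}


def symbol(i, j):
    """Short symbol of a contact action, e.g. s_lh-af."""
    return f"s_l{i}-a{j}"


def long_name(i, j):
    """Readable name of a contact action, e.g. legHindArmFore."""
    return f"leg{LIMB[i]}Arm{LIMB[j]}"


for i, j in S_INDEX:
    print(symbol(i, j), long_name(i, j))
```

The list order matches the order in which the actions are written in the index set, which is convenient if the actions are later referred to by integer index.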

We require that, at the initial instant, the robot does not take actions that respond to the emergency events $ℰ_{emergency}$ of the environment, that is, $φ_{init}^{s} = \neg s_{ln-af} \land \neg s_{ln-an} \land \neg (s_{ld-ah} \lor s_{ld-af})$ , which is already guaranteed by the initial propositions defined for the environmental actions in Equation (14). Given a specific set of locomotion modes $P$ as defined in Section 3.1, the robot transitional specifications $φ_{trans}^{s}$ are defined as follows:

• (S_robot-1) Robot actions in response to varying-height terrain $ℰ_{terrain}$ are specified as

\begin{array}{l} □ ((e_{md} \lor e_{mu}) \Rightarrow (p_{PIPM} \land s_{lh-an}) \lor (p_{MCM} \land (s_{lh - ah} \lor s_{lh - af}))) \\ \land □ (e_{hu} \Rightarrow p_{MCM} \land s_{lh - ah}) \land □ (e_{hd} \Rightarrow p_{MCM} \land s_{lh - af}) \end{array}

where moderate terrain variations allow for the use of more robot contact actions than in the case of huge terrain variations. For instance, if e = e_hu, that is, the terrain has an action hugelyUpward, the robot has only one action to choose from, consisting of using its hind arm for contact such that it can push forward its center of mass to overcome the huge terrain variation as shown in Figure 3.

• (S_robot-2) If the environmental action terrainCrack-normalCeiling occurs, that is, a crack on the terrain appears and the ceiling above the robot has a normal height (assumed to be accessible by the robot), the robot will grab a supposedly existing handle on the overhead support using its forearm (i.e., s_ln-af). On the other hand, when there is no crack on the terrain, we don’t allow the use of that action:

□ (e_{tc-nc} \Rightarrow p_{PPM} \land s_{ln-af}) \land □ (\neg e_{tc-nc} \Rightarrow \neg p_{PPM} \land \neg s_{ln-af})

• (S_robot-3) If the environmental action humanAppear occurs, that is, a person appears in front of the robot, the robot comes to a stop using the legDual contacts and the arm contacts. On the other hand, when the person disappears, the robot should continue walking from where it stopped before:

□ (e_{ha} \Rightarrow p_{SLM} \land (s_{ld - ah} \lor s_{ld - af} \lor s_{ld - an})) \land □ (\neg e_{ha} \Rightarrow \neg p_{SLM} \land \neg (s_{ld - ah} \lor s_{ld - af} \lor \neg s_{ld - an}))

• (S_robot-4) If a narrow passage narrowPassage appears, the robot will slide on the ground using two feet and no arm contacts. On the other hand, if there is no narrow passage, the robot will not use the sliding mode:

□ (e_{np} \Rightarrow p_{SM} \land s_{ld-an}) \land □ (\neg e_{np} \Rightarrow \neg p_{SM})

• (S_robot-5) If the environmental action terrainCrack-highCeiling appears, that is, a crack appears on the terrain and there is a high ceiling, the robot will have to leap over the cracked region using a hopping motion (i.e., s_ln-an). On the other hand, when this environmental action does not occur, we do not allow the use of that action:

□ (e_{tc-hc} \Rightarrow p_{HM} \land s_{ln-an}) \land □ (\neg e_{tc-hc} \Rightarrow \neg p_{HM} \land \neg s_{ln-an})
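The "if the event occurs" halves of (S_robot-1) to (S_robot-5) can be summarized as a table from each environmental action to the admissible (locomotion mode, contact action) pairs. The Python sketch below encodes only those halves; the negated implications are not encoded, and the dictionary layout and function names are assumptions made for illustration.

```python
# Admissible (mode, contact action) pairs per environmental action,
# read from the positive implications in (S_robot-1) to (S_robot-5).
ALLOWED = {
    "e_md": {("p_PIPM", "s_lh-an"), ("p_MCM", "s_lh-ah"), ("p_MCM", "s_lh-af")},
    "e_mu": {("p_PIPM", "s_lh-an"), ("p_MCM", "s_lh-ah"), ("p_MCM", "s_lh-af")},
    "e_hu": {("p_MCM", "s_lh-ah")},
    "e_hd": {("p_MCM", "s_lh-af")},
    "e_tc-nc": {("p_PPM", "s_ln-af")},
    "e_ha": {("p_SLM", "s_ld-ah"), ("p_SLM", "s_ld-af"), ("p_SLM", "s_ld-an")},
    "e_np": {("p_SM", "s_ld-an")},
    "e_tc-hc": {("p_HM", "s_ln-an")},
}


def is_valid_response(e, p, s):
    """True if mode p with contact action s is admissible under action e."""
    return (p, s) in ALLOWED[e]


def events_allowing_mode(p):
    """Environmental actions under which locomotion mode p may be selected."""
    return sorted(e for e, pairs in ALLOWED.items()
                  if any(mode == p for mode, _ in pairs))


print(is_valid_response("e_hu", "p_MCM", "s_lh-ah"))
print(events_allowing_mode("p_PPM"))
```

In this table each of the modes p_PPM, p_SLM, p_SM and p_HM appears under a single emergency event, which agrees with the statements that these modes are not used when the corresponding event is absent.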

As for the goal proposition of the robot, we require that all locomotion modes and contact actions will occur infinitely often to verify their correctness.

φ_{goal}^{s} = (□ ⋄ p_{PIPM}) \land (□ ⋄ s_{ld-ah}) \land (□ ⋄ s_{ld-af}) \land (□ ⋄ s_{ld-an})

(22)

where we do not list all the goal propositions of locomotion modes and contact actions. The reason is that the other goal propositions regarding contact actions and locomotion modes are implied by the goal propositions of the environment defined in Equation (19).

4.3. Keyframe specifications

Our phase-space motion planner relies on a keyframe state vector $q = {p_{contact}, {\dot{x}}_{apex}}$ as defined in Section 3.2. In the task planner, the keyframe state is designed to be non-deterministic. We define a discretized phase-space region to choose keyframe states for each walking step using a Riemannian geometry decomposition as shown in Figure 5(b). The keyframe states consist of ordinary and special types (see further below):

\begin{array}{l} Q : = Q_{ordinary} \cup Q_{special} = {q_{i - j - k}, i \in ℐ_{ordinary - behavior}, \forall (j, k) \in ℐ_{level} \times ℐ_{level}} \\ \cup {q_{i - j}, i \in ℐ_{special - behavior}, \forall j \in ℐ_{level}} \end{array}

(23)

where ordinary behaviors are $ℐ_{ordinary - behavior} = \{walk, brachiation\}$ while special behaviors are $ℐ_{special - behavior} = \{stop, hop, slide\}$. An apex velocity index j and a step length index k refer to the set $ℐ_{level} = \{s, m, l\}$ whose elements are three different keyframe “levels”: s (Small), m (Medium) and l (Large).³ For instance, q_walk-s-l represents walkSmallVelocityLargeStep, a walking keyframe with a small apex velocity and a large step length. In our case, the ordinary locomotion behaviors (i.e., walk and brachiation) each comprise 9 keyframe states, while the special locomotion behaviors (i.e., stop, hop and slide) each comprise 3 keyframe states.
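The keyframe set of Equation (23) is small enough to enumerate directly. The following Python sketch builds the symbols from the behavior and level sets given above; the string format and the function name are assumptions chosen to match the q_walk-s-l notation.

```python
from itertools import product

LEVELS = ["s", "m", "l"]               # small, medium, large
ORDINARY = ["walk", "brachiation"]     # indexed by (apex velocity, step length)
SPECIAL = ["stop", "hop", "slide"]     # indexed by a single level


def keyframes():
    """Enumerate all keyframe symbols of Q = Q_ordinary U Q_special."""
    ordinary = [f"q_{b}-{j}-{k}"
                for b in ORDINARY for j, k in product(LEVELS, LEVELS)]
    special = [f"q_{b}-{j}" for b in SPECIAL for j in LEVELS]
    return ordinary + special


Q = keyframes()
print(len(Q))
print(Q[:3])
```

The enumeration yields 2 × 9 ordinary and 3 × 3 special keyframes, that is, 27 symbols in total.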

Figure 5.

Phase-space partition of locomotion manifolds for keyframe design. The left figure shows a grid-based partition while the right figure is a non-Euclidean partition that follows the phase-space locomotion manifolds. The latter partition is consistent with locomotion dynamics and we define it as the “locomotion-manifold-based partition.” This partition will be used to achieve robust locomotion. We use different granularities for two orthogonal axes.

Given the environmental actions in Section 4.1, the specifications for keyframe states are designed as follows.

• (S_q-1) If the next environmental action is moderatelyDownward e_md, the level for the next keyframe state q remains constant or increases by one level either from step length or apex velocity:

\begin{array}{l} □ ((q_{walk-s-s} \land ○ e_{md}) \Rightarrow ○ (q_{walk-s-s} \lor q_{walk-s-m} \lor q_{walk-m-s})) \\ \land □ ((q_{walk-s-m} \land ○ e_{md}) \Rightarrow ○ (q_{walk-s-m} \lor q_{walk-s-l} \lor q_{walk-m-m})) \\ \dots \\ \land □ ((q_{walk-l-m} \land ○ e_{md}) \Rightarrow ○ (q_{walk-l-m} \lor q_{walk-l-l})) \land □ ((q_{walk-m-l} \land ○ e_{md}) \Rightarrow ○ (q_{walk-m-l} \lor q_{walk-l-l})) \\ \land □ ((q_{walk-l-l} \land ○ e_{md}) \Rightarrow ○ q_{walk-l-l}) \land □ (((q_{brachiation} \lor q_{stop}) \land ○ e_{md}) \Rightarrow ○ (q_{walk-s-m} \lor q_{walk-m-m} \lor q_{walk-l-m})) \end{array}

where, if q = q_walk-s-s, ○q can be q_walk-s-s (remaining constant), q_walk-s-m (step length increases one level) or q_walk-m-s (apex velocity increases one level). All the other keyframes in ordinary scenarios follow the same pattern. There are three special cases: (i) when q = q_walk-l-m, there are only two choices for ○q, that is, q_walk-l-m and q_walk-l-l; (ii) the same situation applies to q_walk-m-l; (iii) when q = q_walk-l-l, the only choice is ○(q = q_walk-l-l). In emergency cases, we assign ○q by q_walk-s-m, q_walk-m-m or q_walk-l-m.

• (S_q-2) If the next environmental action is hugelyDownward e_hd, the level for the next keyframe state increases by one or two units, either on the step length or on the apex velocity. The only exception is as follows: when the current keyframe is q = q_walk-l-l, then the next step is only allowed to choose the keyframe q_walk-l-l.

\begin{array}{l} □ ((q_{walk-s-s} \land ○ e_{hd}) \Rightarrow ○ (q_{walk-m-s} \lor q_{walk-s-m} \lor q_{walk-l-s} \lor q_{walk-s-l} \lor q_{walk-m-m})) \\ \land □ ((q_{walk-s-m} \land ○ e_{hd}) \Rightarrow ○ (q_{walk-s-l} \lor q_{walk-m-m} \lor q_{walk-m-l})) \\ \dots \\ \land □ (((q_{walk-l-m} \lor q_{walk-m-l} \lor q_{walk-l-l}) \land ○ e_{hd}) \Rightarrow ○ q_{walk-l-l}) \\ \land □ (((q_{brachiation} \lor q_{stop}) \land ○ e_{hd}) \Rightarrow ○ (q_{walk-s-m} \lor q_{walk-m-m} \lor q_{walk-l-m})) \end{array}

where, if q = q_walk-s-s, ○q increases by (i) one unit level, that is, q_walk-s-m and q_walk-m-s, or (ii) two unit levels, that is, q_walk-m-m, q_walk-l-s and q_walk-s-l. Special cases are q_walk-l-m, q_walk-m-l and q_walk-l-l where q_walk-l-l is the only choice for the next walking step.

• (S_q-3) If there is a crack on the terrain with a normal-height ceiling, that is, e_tc-nc, then the next keyframe state is q_brachiation relying on a different set of apex velocities and step lengths than for walking behaviors:

□ (○ e_{tc-nc} \Rightarrow ○ (q_{brachiation - s} \lor q_{brachiation - m} \lor q_{brachiation - l}))

• (S_q-4) If there is a crack on the terrain and there is a high ceiling, that is, e_tc-hc, then the keyframe state is q_hop relying on a specific apex velocity, regardless of the current q:

□ (○ e_{tc-hc} \Rightarrow ○ (q_{hop - s} \lor q_{hop - m} \lor q_{hop - l}))

• (S_q-5) If a human appears in front of the robot, that is, e_ha, then the next keyframe state is q_stop relying on a specific step length, regardless of the current q:

□ (○ e_{ha} \Rightarrow ○ (q_{stop - s} \lor q_{stop - m} \lor q_{stop - l}))

• (S_q-6) If there is a narrow passage, that is, e_np, then the next keyframe state is q_slide relying on a specific apex velocity, regardless of the current q:

□ (○ e_{np} \Rightarrow ○ (q_{slide - s} \lor q_{slide - m} \lor q_{slide - l}))

The remaining eight scenarios involving different environment and system action combinations are defined in a similar manner and are omitted here for brevity. The specifications in (S_q-1)-(S_q-6) and all others belong to $φ_{trans}^{q}$ .
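The level-update rule of (S_q-1) for walking keyframes can be sketched as a successor function. The Python snippet below is a minimal illustration, assuming the symbol format q_walk-j-k with apex velocity level j and step length level k; it covers only the walking-to-walking cases of (S_q-1), and the function name is hypothetical.

```python
LEVELS = ["s", "m", "l"]  # small, medium, large


def successors_md(j, k):
    """Admissible next walking keyframes when the next action is e_md.

    The keyframe either stays the same or raises exactly one of the two
    indices (step length k or apex velocity j) by one level, saturating at 'l'.
    """
    ji, ki = LEVELS.index(j), LEVELS.index(k)
    out = [(j, k)]                          # remain constant
    if ki + 1 < len(LEVELS):
        out.append((j, LEVELS[ki + 1]))     # step length increases one level
    if ji + 1 < len(LEVELS):
        out.append((LEVELS[ji + 1], k))     # apex velocity increases one level
    return [f"q_walk-{a}-{b}" for a, b in out]


print(successors_md("s", "s"))
print(successors_md("l", "m"))
print(successors_md("l", "l"))
```

The saturation at the largest level reproduces the three special cases listed under (S_q-1): two choices for q_walk-l-m and q_walk-m-l, and a single choice for q_walk-l-l.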

From a high-level perspective, the goal of our task planner is to enable the robot to continuously maneuver through constrained environments by repeatedly selecting contact actions among $S$ . To be consistent with the environmental goal specification in Equation (19), we enforce the following liveness specification for the keyframe states.

φ_{goal}^{q} = \underset{q \in Q}{\land} (□ ⋄ q)

(24)

All the task specifications have been proposed such that $φ = ((φ_{q} \land φ_{e}) \Rightarrow φ_{s})$ holds.

4.4. Synthesis of a high-level reactive task planner

Here we formulate the high-level locomotion planning problem as a game between the robot and its possibly adversarial environment. Given the task specifications defined above, a reactive control protocol is synthesized such that the controlled legged robot behaviors satisfy all the designed specifications for any admissible uncontrollable environment behavior.

A winning strategy for the task planner represented by the pair $(T S_{prod}, φ)$ is defined as a partial function (γ₀γ₁⋯γ_i−1, e_i)↦(q_i, s_i, p_i), where a keyframe state q_i, a contact action s_i, and a switching mode p_i are chosen according to the state sequence history and the current environmental action in order to satisfy the assume-guarantee form in Equation (11). All the specifications are satisfied for any admissible yet uncontrollable environmental actions.

Definition 8

(Game played by the WBDL task planner). A game for the whole-body dynamic locomotion task planner is a tuple,

G : = 〈 V, X, Y, θ_{i}, θ_{o}, Ψ_{i}, Ψ_{o}, ϕ_{win} 〉

with the following elements

• $X : = ℰ$ is a set of input variables for player 1;

• $Y : = Q \times S \times P$ is a set of output variables for player 2;

• $V = X \times Y$ is a finite set of proposition state variables over finite domains in the game;

• θ_i and θ_o are atomic propositions characterizing initial states of the input and output variables, respectively;

• $Ψ_{i} (V, X^{'})$ and $Ψ_{o} (V, X^{'}, Y^{'})$ are the transition relations for the input and output variables for next steps, respectively;

• ϕ_win is the winning condition given by an LTL formula.

Proposition 1

(Existence of a winning WBDL strategy). A winning WBDL strategy $A_{WBDL}$ exists for the game $G$ in Definition 8 if and only if $(T S_{prod}, φ)$ is realizable.

Figure 7 shows an automaton fragment of the WBDL contact planner $A_{WBDL}$ . Self-transitions exist in moderatelyUpward states (e.g., state 09) and moderatelyDownward states (e.g., states 12 and 14), while hugelyDownward states (e.g., state 35) do not have a self-transition according to proposition (S_es-1). There is no transition between states 09 and 14 due to an infeasible keyframe state transition. States 21 and 28, shown as red nodes, represent humanAppear and terrainCrack events, respectively.

Figure 7.

A fragment of the synthesized automaton for the WBDL contact planner. Non-deterministic transitions are encoded in this automaton. The blue transitions represent a specific execution. For illustration, we index both the environmental action $ℰ$ in Equation (13) and the system action $S$ in Equation (21) as {0, …, 7} and {0, …, 8} in order, respectively. The robot keyframe state $Q$ is indexed as {0, …, 26} in order. For instance, when the automaton state is at number 12, we encounter environmental action e = 1. The winning strategy assigns keyframe state q = 3, robot contact action s = 4 and locomotion switching mode p = 3. This state allows several non-deterministic transitions for the next walking step decision.

Remark 2

Non-deterministic transitions exist in the synthesized automaton as follows: (i) environmental actions are non-deterministic; (ii) given an environmental action, several non-deterministic keyframe states can be chosen; (iii) even when both an environmental action and a keyframe state are given, non-deterministic system contact actions exist for certain transitions. These non-deterministic transitions allow self-transitions. In this case, we can guarantee that the robot makes progress (i.e., maneuvering forward) due to the properties of locomotion keyframe states.

The keyframe specifications in this section purely reason about logic-level decisions and have no knowledge of underlying locomotion dynamics. However, the locomotion dynamics, especially those affected by external disturbances or model uncertainties, often result in the desired keyframe transitions being unrealizable. As such, we need to propose keyframe transitions with robustness margins and synthesize a reachability based controller to determine realizable keyframe transitions by the low-level locomotion dynamical system as proposed in the next section.

5. Robust reachability control of hybrid locomotion systems

When we model robot dynamics and estimate physical environments, uncertainty is ubiquitous due to sensor noise, model inaccuracy, external disturbance, sudden environmental changes, contact surface geometry uncertainty, and so on. As a result, commands from the symbolic task planner are potentially not achievable by the low-level motion planner. Additionally, a mismatch between the high-level discrete and low-level continuous planners is usually caused by the abstraction techniques applied on the underlying continuous systems. To handle these difficulties, we define a robust finite transition system and compute its keyframe transitions via synthesizing reachability controllers for every single walking step. In order to use phase-space locomotion manifolds to define robustness margin sets, a phase-space mapping needs to be defined between the Euclidean and Riemannian spaces to evaluate whether a phase-space state is in the robustness margin or not.

5.1. Phase-space Euclidean-to-Riemannian mapping

We first consider a specific locomotion process, for example, the prismatic inverted pendulum model (PIPM) (see Section 3.1 for more details) in order to establish a Euclidean-to-Riemannian mapping in the phase space. Our previous study derives closed-form solutions of phase-space tangent and cotangent manifolds for this process (Zhao et al., 2017) as follows.

Proposition 2

(PIPM phase-space tangent manifold). Given the PIPM of Equation (3) with initial conditions $(x_{0}, {\dot{x}}_{0}) = (x_{foot}, {\dot{x}}_{apex})$ and known foot placement x_foot, the phase-space tangent manifold is characterized by the states $(x, \dot{x}, x_{foot}, {\dot{x}}_{apex})$ such that

σ (x, \dot{x}, x_{foot}, {\dot{x}}_{apex}) = \frac{{\dot{x}}_{apex}^{2}}{ω_{PIPM}^{2}} ({\dot{x}}^{2} - {\dot{x}}_{apex}^{2} - ω_{PIPM}^{2} {(x - x_{foot})}^{2})

(25)

where σ = 0 represents the nominal phase-space manifold. When σ ≠ 0, it represents the Riemannian distance to the nominal phase-space manifold.

The tangent manifold can be used to measure deviations from the nominal locomotion trajectory in the phase-space. We use this manifold to quantify the width of a phase-space robustness margin.
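The tangent manifold of Equation (25) can be evaluated numerically, as in the Python sketch below. The sketch assumes that the sagittal PIPM dynamics take the linear inverted pendulum form ẍ = ω²(x − x_foot); Equation (3) is not reproduced in this section, so this form, the closed-form trajectory derived from it, and all numerical values are assumptions for illustration only.

```python
import math


def sigma(x, xdot, x_foot, xdot_apex, omega):
    """Evaluate the tangent-manifold value of Equation (25)."""
    return (xdot_apex**2 / omega**2) * (
        xdot**2 - xdot_apex**2 - omega**2 * (x - x_foot)**2
    )


def nominal_state(t, x_foot, xdot_apex, omega):
    """State at time t on the nominal trajectory through the keyframe.

    Assumes xddot = omega^2 (x - x_foot) with x(0) = x_foot and
    xdot(0) = xdot_apex, which gives a sinh/cosh closed form.
    """
    x = x_foot + (xdot_apex / omega) * math.sinh(omega * t)
    xdot = xdot_apex * math.cosh(omega * t)
    return x, xdot


x_foot, xdot_apex, omega = 0.0, 0.6, 3.13   # hypothetical values
x, xdot = nominal_state(0.2, x_foot, xdot_apex, omega)
print(abs(sigma(x, xdot, x_foot, xdot_apex, omega)) < 1e-9)
print(sigma(x_foot, 0.7, x_foot, xdot_apex, omega) > 0.0)
```

Under the stated assumption, states on the nominal trajectory give σ = 0 up to rounding, while a state above the foot moving faster than the apex velocity gives σ > 0, so the sign of σ indicates on which side of the nominal manifold a state lies.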

Proposition 3

(PIPM phase-space cotangent manifold). Let ζ₀ be a nonnegative scaling value representing the initial phase of a cotangent manifold. Given the PIPM of Equation (3) and a specific initial state $(x_{0}, {\dot{x}}_{0})$ different from the keyframe $(x_{foot}, {\dot{x}}_{apex})$ , the cotangent manifold is characterized by the states $(x, \dot{x}, x_{0}, {\dot{x}}_{0})$ such that

ζ (x, \dot{x}, x_{0}, {\dot{x}}_{0}) = ζ_{0} {(\frac{\dot{x}}{{\dot{x}}_{0}})}^{ω_{PIPM}^{2}} \frac{x - x_{foot}}{x_{0} - x_{foot}}

(26)

where ζ₀ is chosen as the phase progression value at the keyframe state in this study.

This cotangent manifold represents the arc length along the tangent manifold σ in Equation (25). We use this cotangent manifold to quantify the length of a phase-space robustness margin. Detailed derivations of these two closed-form solutions above, that is, $σ (x, \dot{x}, x_{foot}, {\dot{x}}_{apex}) = 0$ and $ζ (x, \dot{x}, x_{0}, {\dot{x}}_{0}) = 0$ , are provided in Zhao et al. (2017). A similar analysis can be performed for other locomotion models as described in Section 3.1 (see the propositions in Appendix C for other locomotion modes). Given these analytical solutions, we define a mapping between the Euclidean and Riemannian spaces as

(\begin{matrix} ζ \\ σ \end{matrix}) = Z_{p} (ξ) = (\begin{matrix} Z_{p, ζ} (x, \dot{x}) \\ Z_{p, σ} (x, \dot{x}) \end{matrix})

(27)

where $Z_{p} (ξ)$ is a nonlinear mapping of the CoM state $(x, \dot{x})$ to the Riemannian space states, obtained by using the phase-space manifold of the p^th locomotion mode. This mapping will be used for the robust finite transition system definition in order to quantify the location of the phase-space state in the Riemannian space.

5.2. Robust finite transition system for one walking step

We now focus on a case of the one-walking-step locomotion process as defined in Def. 3. As illustrated in Figure 5(b), the discrete task planner uses a Riemannian discretization of the local state space, which is defined by an abstraction map $ℳ_{Riem} : Ξ \to Q$ such that for all $(ξ, q) \in Ξ \times Q$

| ξ - q | ≼ ν \Rightarrow q = ℳ_{Riem} (ξ)

(28)

where ν is the granularity of the discretization⁴. The operators |⋅| and $≼$ above represent vectorized absolute values and element-wise inequality, respectively.⁵
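The abstraction map of Equation (28) can be sketched as a snap-to-cell-center operation. This sketch assumes axis-aligned cells of half-width ν whose centers lie on a regular lattice; that layout, and the function names, are illustrative assumptions rather than the paper's construction.

```python
import math


def abstraction_map(xi, nu):
    """Return the center q of the cell containing the continuous state xi.

    Assumption: each cell has width 2*nu[i] along dimension i, so that
    |xi - q| <= nu holds element-wise for the returned q.
    """
    return tuple((math.floor(v / (2 * n)) + 0.5) * 2 * n for v, n in zip(xi, nu))


def within_granularity(xi, q, nu, tol=1e-12):
    """Element-wise check of |xi - q| <= nu, with a small float tolerance."""
    return all(abs(v - c) <= n + tol for v, c, n in zip(xi, q, nu))
```

For instance, `abstraction_map((0.23, -0.41), (0.05, 0.1))` returns the center of the cell that contains the state, and `within_granularity` confirms the premise of Equation (28) for that pair.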

To guarantee that the motion planner yields feasible phase-space plans robust to disturbances, such as state measurement errors and disturbances in the dynamics, we introduce ϵ₁ and ϵ₂ as the bounds of the initial and final robustness margins in the one-walking-step transition system. Namely, we consider not only the nominal initial and final keyframe states $q_{initial}$ and $q_{final}$ assigned by the task planner, but also the neighboring keyframe cells that overlap the ϵ-neighborhood of the nominal keyframe states.

Definition 9

(Robustness margin sets). The initial and final robustness margin sets around the nominal keyframe states $q_{initial}, q_{final} \in Q$ are defined as

{\tilde{ℬ}}_{ϵ_{1}} (q_{initial}) : = {ξ (ζ_{0}) \mid | Z_{p_{1}} (ξ (ζ_{0})) - Z_{p_{1}} (q_{initial}) | ≼ ϵ_{1}}

{\tilde{ℬ}}_{ϵ_{2}} (q_{final}) : = {ξ (ζ_{final}) \mid | Z_{p_{2}} (ξ (ζ_{final})) - Z_{p_{2}} (q_{final}) | ≼ ϵ_{2}}

where $ϵ_{1}, ϵ_{2} \in ℝ^{2}$ represent the bounds of ${\tilde{ℬ}}_{ϵ_{1}} (q_{initial})$ and ${\tilde{ℬ}}_{ϵ_{2}} (q_{final})$ , respectively. p₁ and p₂ denote the locomotion modes before and after a contact switch, respectively.

The robustness margins ϵ₁ and ϵ₂ in Def. 9 are defined in the Riemannian space. A mapping $Z$ is applied to the Euclidean states ξ and q to convert them to the Riemannian space. We design ϵ₁ ≻ ν and ϵ₂ ≻ ν such that the robustness margins are larger than the discretized cell. To provide different robustness margins, we allow for non-uniform sets, that is, non-identical values for (ϵ₁, ϵ₂). This non-uniform set design keeps the total number of allowable keyframe transitions manageable.

Now we describe how to simplify the robustness margin sets based on the closed-form phase-space manifolds defined in Section 5.1.

Definition 10

(Phase-space robustness margin sets). Given closed-form locomotion phase-space manifolds from Propositions 2 and 3, the initial and final robustness margin sets are simplified to

\begin{array}{l} ℬ_{ϵ_{1}} (q_{initial}) : = {ξ (ζ, σ) | ζ \in [ζ_{0} - δ ζ_{ϵ_{1}}, ζ_{0} + δ ζ_{ϵ_{1}}], σ \in [- δ σ_{ϵ_{1}}, δ σ_{ϵ_{1}}]} \end{array}

(29)

\begin{array}{l} ℬ_{ϵ_{2}} (q_{final}) : = {ξ (ζ, σ) | ζ \in [ζ_{final} - δ ζ_{ϵ_{2}}, ζ_{final} + δ ζ_{ϵ_{2}}], σ \in [- δ σ_{ϵ_{2}}, δ σ_{ϵ_{2}}]} \end{array}

(30)

where $ξ = (x, \dot{x})$ , the initial state is $q_{initial} = ξ (ζ_{0}, 0)$ and the final state is $q_{final} = ξ (ζ_{final}, 0)$ , and $ϵ_{1} = [δ ζ_{ϵ_{1}}, δ σ_{ϵ_{1}}]$ and $ϵ_{2} = [δ ζ_{ϵ_{2}}, δ σ_{ϵ_{2}}]$ quantify the uncertainty bounds of $ℬ_{ϵ_{1}} (q_{initial})$ and $ℬ_{ϵ_{2}} (q_{final})$ , respectively.

These two pairs of bounds represent Riemannian distances in phase-space, as shown in the upper right miniature subfigure in Figure 8. A locomotion-manifold-based partition is illustrated in Figure 5(b). The proposed robust finite transition system will use this partition to design the robustness margin around keyframe states as shown in Figure 8. A merit of our analysis is that this partition is consistent with the vector field of the locomotion dynamics. Additionally, this partition simplifies mathematical descriptions of robustness margin sets.
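Because the margin sets of Equations (29) and (30) are boxes in the (ζ, σ) coordinates, membership reduces to two interval checks. The following is a minimal sketch under that reading; the function name and argument layout are hypothetical.

```python
def in_margin(zeta, sigma, zeta_center, eps):
    """Check whether a state with Riemannian coordinates (zeta, sigma) lies in
    the margin box centered at (zeta_center, 0) with bounds eps.

    eps is the pair (delta_zeta, delta_sigma) from Def. 10.
    """
    delta_zeta, delta_sigma = eps
    return abs(zeta - zeta_center) <= delta_zeta and abs(sigma) <= delta_sigma
```

The same function serves both the initial margin (with `zeta_center` set to ζ₀ and `eps` set to ϵ₁) and the final margin (with ζ_final and ϵ₂).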

Figure 8.

Phase-space abstraction via locomotion-manifold-based partition. This figure shows a keyframe state transition process with robustness margins. Compared to the conventional grid-based partition in Euclidean phase space of Figure 5(a), this partition complies with locomotion dynamics, further enabling us to define robustness margins based on closed-form locomotion manifolds.

Definition 11

(Robust finite transition system for one walking step). Given two triples composed of nominal keyframe states, locomotion modes, and system contact actions ( q _initial, p₁, s₁) and ( q _final, p₂, s₂)⁶, a finite transition subsystem $T S_{OWS}$ with robustness margins ϵ₁ and ϵ₂ for one walking step (OWS) is defined as a tuple

T S_{OWS} : = (Q_{OWS}, ℐ_{OWS}, P_{OWS}, S_{OWS}, A_{OWS}, T_{OWS})

(31)

with the following elements

• $Q_{OWS}$ is a set of keyframe states determined by the nominal keyframe pair ( q _initial, q _final) and robustness margins (ϵ₁, ϵ₂). $Q_{OWS} \subseteq Q$ , where $Q$ is the set of all the allowable keyframe states defined in Equation (23). $Q_{OWS} = Q_{p_{1}, OWS} \cup Q_{p_{2}, OWS}$ , where $Q_{p_{1}, OWS}$ and $Q_{p_{2}, OWS}$ are defined as

Q_{p_{1}, OWS} = {q_{initial}^{'} \in Q | ℳ_{Riem}^{- 1} (q_{initial}^{'}) \cap ℬ_{ϵ_{1}} (q_{initial}) \neq \emptyset}

(32)

Q_{p_{2}, OWS} = {q_{final}^{'} \in Q | ℳ_{Riem}^{- 1} (q_{final}^{'}) \cap ℬ_{ϵ_{2}} (q_{final}) \neq \emptyset}

(33)

• $ℐ_{OWS} = Q_{p_{1}, OWS}$ is a set of initial states.

• $P_{OWS} = {p_{1}, p_{2}}$ is a pair of locomotion modes for one walking step.

• $S_{OWS} = {s_{1}, s_{2}}$ is a pair of contact actions for one walking step.

• $A_{OWS} = {A_{p_{1}}, A_{p_{2}}}$ with $A_{p_{1}} = \cup_{ζ_{0} \leq ζ \leq ζ_{p_{1}}} U_{p_{1}}^{[ζ_{0}, ζ]}$ and $A_{p_{2}} = \cup_{ζ_{p_{1}} \leq ζ \leq ζ_{final}} U_{p_{2}}^{[ζ_{p_{1}}, ζ]}$ ,⁷ where $ζ_{p_{1}}$ represents the phase instant when the contact switch occurs.

• $((q_{initial}^{'}, p_{1}, s_{1}), a, (q_{final}^{'}, p_{2}, s_{2})) \in T_{OWS}$ (i.e., $(q_{initial}^{'}, p_{1}, s_{1}) \overset{a}{\to} (q_{final}^{'}, p_{2}, s_{2})$ ) for $q_{initial}^{'} \in Q_{p_{1}, OWS}, q_{final}^{'} \in Q_{p_{2}, OWS}$ if there exists a control sequence $a = {u_{p_{1}} (ζ_{0}), \dots, u_{p_{1}} (ζ_{p_{1}}), \dots, u_{p_{2}} (ζ_{p_{2}}), \dots, u_{p_{2}} (ζ_{final})} \in A_{p_{1}} \cup A_{p_{2}}$ for all bounded external disturbances $d : [ζ_{0}, ζ_{final}] \to D_{OWS} \subseteq ℝ^{d}$ such that the resulting solution $ξ^{'} : [ζ_{0}, ζ_{final}] \to ℝ^{n}$ follows

{\dot{ξ}}^{'} (ζ) = f_{p_{1}} (ξ^{'} (ζ), u_{p_{1}} (ζ), d (ζ)), \forall ζ \in [ζ_{0}, ζ_{p_{1}}]

(34)

{\dot{ξ}}^{'} (ζ) = f_{p_{2}} (ξ^{'} (ζ), u_{p_{2}} (ζ), d (ζ)), \forall ζ \in [ζ_{p_{1}}, ζ_{final}]

(35)

satisfying

| Z_{p_{1}} (ξ (ζ_{0})) - Z_{p_{1}} (q_{initial}^{'}) | ≼ ϵ_{1}, q_{initial}^{'} \in ℳ_{Riem} (ξ (ζ_{0}))

(36)

| Z_{p_{2}} (ξ (ζ_{final})) - Z_{p_{2}} (q_{final}^{'}) | ≼ ϵ_{2}, q_{final}^{'} \in ℳ_{Riem} (ξ (ζ_{final}))

(37)

| Z_{p_{1}, σ} (ξ^{'} (ζ_{p_{1}})) - Z_{p_{1}, σ} (q_{switch}^{'}) | \leq δ σ_{ϵ_{1}}, | Z_{p_{2}, σ} (ξ^{'} (ζ_{p_{1}})) - Z_{p_{2}, σ} (q_{switch}^{'}) | \leq δ σ_{ϵ_{2}}

(38)

where $Z (\cdot)$ is the two-dimensional Euclidean-to-Riemannian mapping introduced in Section 5.1. The system vector fields $f_{p_{1}}$ and $f_{p_{2}}$ are jointly determined by the locomotion mode set $P_{OWS}$ and the contact action set $S_{OWS}$ .

The mapping function $Z$ has two dimensions in the phase-space: tangent σ and cotangent ζ manifolds as defined in Equation (27). Initial and final bound conditions are represented by Equations (36) and (37), respectively. Equation (38) essentially defines an intermediate set where the mode switch takes place, and determines the bound for the switching instant $ζ_{p_{1}}$ . The inequalities in Equations (36) and (37) are element-wise.

A conceptual illustration of this transition computation is shown in Figure 9. Using the robustness margins, we construct the transition set $T_{OWS}$ of the robust finite transition subsystem $T S_{OWS}$ , as defined in Def. 11, by adding all the feasible transitions ${(q_{initial}^{'}, p_{1}, s_{1}) \overset{a}{\to} (q_{final}^{'}, p_{2}, s_{2})}$ , where $Z_{p_{2}} (ξ^{'} (ζ_{final}))$ is within the ϵ₂-distance to the targeted state $Z_{p_{2}} (q_{final}^{'})$ .

Figure 9.

Keyframe state reachability with robustness margins for one walking step. Due to measurement error or external disturbance, the initial state ξ ^′(ζ₀) may deviate from the desired keyframe state $q_{initial}^{'}$ . A robustness region $ℬ_{ϵ_{1}} (q_{initial}^{'})$ is defined to bound the allowable state deviations. The actual and desired states evolve according to their respective system dynamics in locomotion modes p₁ and p₂, respectively. The state switches from mode p₁ to mode p₂ at the phase instant $ζ_{p_{1}}$ . The bound of distance between the nominal intermediate keyframe $q_{switch}^{'}$ and $ξ^{'} (ζ_{p_{1}})$ is shown in Equation (38). Finally, these two states reach ξ ^′(ζ_final) and $q_{final}^{'}$ , respectively. To compute the keyframe transition, we require that $Z_{p_{2}} (ξ^{'} (ζ_{final}))$ should be ϵ₂-close to $Z_{p_{2}} (q_{final}^{'})$ , that is, being in the margin $ℬ_{ϵ_{2}} (q_{final}^{'})$ . More details of the definitions of robustness margin set are in Def. 10. Note that, since the robustness margin set $ℬ$ is defined in the Riemannian space, the mapping $Z_{p}$ is applied on the states q and ξ ^′ in the figure symbols for consistency.

The construction of $Q_{OWS}$ is shown in Figure 10. The keyframe transitions in the robust finite transition subsystem $T S_{OWS}$ can be computed as follows: for all $q_{initial}^{'} \in Q_{p_{1}, OWS}$ and $a \in A_{OWS}$ , if there exists $q_{final}^{'} \in Q_{p_{2}, OWS} \cap ℬ_{ϵ_{2}} (ξ)$ , then we add ${q_{initial}^{'} \overset{a}{\to} q_{final}^{'}}$ to the transition set $T_{OWS}$ .

Figure 10.

Sequential procedure of designing feasible robust keyframe transitions. Step 1 determines the nominal keyframe state pair from the symbolic task planner. In Step 2, we design discrete state set $Q_{OWS}$ in the robust finite transition system. Four cases are shown for illustration. The green cell represents the nominal keyframe state determined from the task planner while its surrounding gray cells represent other allowable discrete states in $Q_{OWS}$ . Finally, all feasible robust transitions are determined in Step 3.
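Step 2 of this procedure, which selects the keyframe cells whose preimage overlaps a robustness margin as in Equations (32) and (33), can be sketched as a box-overlap test. The sketch assumes that both the cells and the margin are axis-aligned boxes in the (ζ, σ) coordinates, described by a center and a half-width vector; the names are illustrative.

```python
def boxes_overlap(center_a, half_a, center_b, half_b):
    """Two axis-aligned boxes intersect iff they overlap along every dimension."""
    return all(
        abs(a - b) <= ha + hb
        for a, b, ha, hb in zip(center_a, center_b, half_a, half_b)
    )


def keyframe_cells(cell_centers, nu, q_nominal, eps):
    """Collect the cells (half-width nu) that intersect the margin box
    centered at the nominal keyframe q_nominal with half-width eps."""
    return [q for q in cell_centers if boxes_overlap(q, nu, q_nominal, eps)]
```

With ϵ chosen larger than ν, the nominal cell and at least its immediate neighbors are returned, which matches the green and gray cells described in the caption.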

Algorithm 1. One-walking-step robust finite transition subsystem $T S_{OWS}$ with robustness margins (ϵ₁, ϵ₂)

1: Input: ϵ₁, ϵ₂, q _initial, q _final, p₁, p₂, s₁, s₂

2: Define $A_{p_{1}}, A_{p_{2}}$ as finite subsets of $\cup_{ζ_{0} \leq ζ \leq ζ_{p_{1}}} U_{p_{1}}^{[ζ_{0}, ζ]}$ and $\cup_{ζ_{p_{1}} \leq ζ \leq ζ_{final}} U_{p_{2}}^{[ζ_{p_{1}}, ζ]}$

3: Define the discrete state of keyframe $Q_{OWS}$ according to Equations (32) and (33) of Def. 11 and Figure 10.

4: Initialize transition set $T_{OWS} \leftarrow (Q_{p_{1}, OWS} \times {p_{1}}) \times A_{OWS} \times (Q_{p_{2}, OWS} \times {p_{2}})$ {initialize all possible transitions}

5: for $q_{initial}^{'} \in Q_{p_{1}, OWS}$ do

6: for $q_{final}^{'} \in Q_{p_{2}, OWS}$ do

7: Construct an inter-sampling finite abstraction $T S_{OWS, INT}$

8: isReachable ← ReachabilityControl $(ℬ_{ϵ_{1}} (q_{initial}^{'}), ℬ_{ϵ_{2}} (q_{final}^{'}), T S_{OWS, INT})$

9: if isReachable == false then

10: $T_{OWS} \leftarrow T_{OWS} \setminus {(q_{initial}^{'}, p_{1}) \overset{a}{\to} (q_{final}^{'}, p_{2})}$ {delete unqualified transitions}

11: end if

12: end for

13: end for

14: return $T S_{OWS} = (Q_{OWS}, ℐ_{OWS}, P_{OWS}, S_{OWS}, A_{OWS}, T_{OWS})$

Algorithm 1 details the construction of the robust finite transition subsystem above. The high-level task planner specifies the inputs of the algorithm, that is, two pairs of keyframe states, locomotion modes, and contact actions ( q _initial, p₁, s₁) and ( q _final, p₂, s₂) with robustness margins ϵ₁ and ϵ₂, respectively, and by Def. 11, determines the set of finite states $Q_{OWS}$ . This is the top-down component of our approach. The bottom-up component is the reachability control synthesis introduced in the next subsection. Algorithm 1 integrates the top-down and bottom-up components.
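The pruning loop of Algorithm 1 can be sketched as follows. The callable `is_reachable` is a hypothetical stand-in for Lines 7 and 8 (building the inter-sampling abstraction and calling the reachability synthesis), and the tuple encoding of transitions is an assumption made for illustration.

```python
from itertools import product


def build_ows_transitions(q_init_cells, q_final_cells, actions, is_reachable):
    """Start from all candidate transitions and delete those whose keyframe
    pair fails the reachability check.

    is_reachable(q_i, q_f) -> bool is a hypothetical oracle for Lines 7-8.
    """
    # Line 4: initialize with every (initial cell, action, final cell) triple.
    transitions = set(product(q_init_cells, actions, q_final_cells))
    # Lines 5-13: remove the transitions of every unreachable keyframe pair.
    for q_i, q_f in product(q_init_cells, q_final_cells):
        if not is_reachable(q_i, q_f):
            transitions -= {(q_i, a, q_f) for a in actions}
    return transitions
```

Starting from the full product and deleting mirrors the pseudocode; an equivalent variant would start from the empty set and add only the pairs that pass the check.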

The proposed robust finite transition system (RFTS) differs from the abstraction approaches in Liu and Ozay (2014, 2016), Tabuada (2009), and Belta et al. (2017) with respect to the following points: (i) The most salient difference is that our planning approach is a hierarchy consisting of both top-down and bottom-up components. The RFTS is an interface that takes the desired command from the high-level symbolic task planner (i.e., the top-down component) and uses this command to synthesize a reachability controller in the low-level motion planner (i.e., the bottom-up component). The approach in Liu and Ozay (2014, 2016) is an abstraction of the underlying continuous dynamical system and represents a bottom-up approach. (ii) By using the proposed hierarchical structure, we are able to solve a more challenging problem with whole-body dynamic locomotion in a constrained environment, instead of simple examples such as a 2D mobile robot or vehicle. (iii) Our RFTS reasons about the robustness to bounded state disturbances not only at the inter-sampling level, but also at the locomotion keyframe level, which captures the essential locomotion dynamics. (iv) We incorporate hybrid dynamics into our RFTS design, which is constructed for one walking step. Overall, our planning framework sequentially composes multiple locomotion modes. (v) Instead of a grid-based partition, we use a locomotion-manifold-based partition to characterize the robustness margins of the keyframe states in the phase-space.

5.3. One-walking-step reachability control synthesis

To determine the transitions satisfying the conditions in Equations (36)–(38) of Def. 11, we employ abstraction-based control synthesis developed for general dynamical systems. The idea of this approach is to automatically and rigorously compute the set of states that can be controlled to realize a given specification and generate feedback controllers for those states. Generally, abstraction-based control synthesis consists of three steps: (1) Construct a finite transition system (also called a finite abstraction) that over-approximates the dynamics of the original continuous system. (2) Design control algorithms based on the finite transition system with respect to the given specification. This step not only verifies whether the given specification is realizable by the low-level robot dynamics, but also synthesizes a controller for the abstraction if realizable. (3) Interpolate the synthesized controller to be executed in the original continuous system.

We consider a one-walking-step locomotion subsystem defined on a local state space determined by two keyframe states.

Definition 12

(One-walking-step locomotion subsystem). Given the switched system tuple in Equation (6), a one-walking-step (OWS) locomotion subsystem from a given keyframe state $q_{initial}^{'}$ with a robustness margin ϵ₁ in the $(p_{1}^{th}, s_{1}^{th})$ mode to another keyframe state $q_{final}^{'}$ with a robustness margin ϵ₂ in the $(p_{2}^{th}, s_{2}^{th})$ mode is formulated as

S S_{OWS} = (Ξ_{OWS}, Ξ_{OWS, 0}, U_{OWS}, P_{OWS}, f_{OWS})

(39)

where the state space of the subsystem $Ξ_{OWS} \subseteq Ξ$ is a local area determined by the two keyframe states $q_{initial}^{'}$ and $q_{final}^{'}$ ; $Ξ_{OWS, 0} = ℬ_{ϵ_{1}} (q_{initial})$ is the set of initial continuous states, and $Ξ_{OWS} = Ξ_{p_{1}, OWS} \cup Ξ_{p_{2}, OWS}$ with $Ξ_{p_{1}, OWS}$ and $Ξ_{p_{2}, OWS}$ representing the local state spaces of the locomotion modes p₁ and p₂, respectively; $U_{OWS}$ is the allowable control input set for one walking step; $P_{OWS} = {p_{1}, p_{2}}$ represents a locomotion mode set composed of two consecutive walking steps; $f_{OWS} = {f_{p_{1}}, f_{p_{2}}}$ is the set of vector fields determined by the $(p_{1}^{th}, s_{1}^{th})$ and $(p_{2}^{th}, s_{2}^{th})$ command pairs, respectively. The mode transition instant $ζ_{p_{1}}$ is determined by Equation (38).

Remark 3

The state spaces $Ξ_{p_{1}, OWS}$ and $Ξ_{p_{2}, OWS}$ overlap so that a contact switch can happen during one walking step. This overlap should fully cover the intersection of two robust tubes defined in Equation (38). A straightforward option is to make both $Ξ_{p_{1}, OWS}$ and $Ξ_{p_{2}, OWS}$ identical to the state space fully covering one walking step.

The control actions a defined in Def. 11 are a sequence of control signals for one walking step. We discretize the control space and maintain a constant control signal for each time step. In the following, we propose a finite abstraction of the one-walking-step locomotion subsystem $S S_{OWS}$ . This abstraction is based on a predefined time step δζ for the construction of control signals, a bounded disturbance d, and a Euclidean space discretization (i.e., an abstraction map $ℳ_{OWS, Euc}$ ) that is finer than the one used in the task planner.

Definition 13

(Inter-sampling finite abstraction of one walking step). Given a one-walking-step locomotion subsystem $S S_{OWS}$ , an abstraction map $ℳ_{OWS, Euc} : Ξ_{OWS} \to Q_{OWS, INT}$ , and a time step δζ, a finite transition system

T S_{OWS, INT} = (Q_{OWS, INT}, Q_{OWS, INT, 0}, A_{OWS, INT}, T_{OWS, INT})

is defined as an inter-sampling finite abstraction of $S S_{OWS}$ , denoted by $S S_{OWS} ≼_{(δ ζ, d, ℳ_{OWS, Euc})} T S_{OWS, INT}$ , if the following conditions hold

• $Q_{OWS, INT} = Q_{p_{1}, OWS, INT} \cup Q_{p_{2}, OWS, INT} = \cup_{ξ \in Ξ_{p_{1}, OWS}} ℳ_{OWS, Euc} (ξ) \cup \cup_{ξ \in Ξ_{p_{2}, OWS}} ℳ_{OWS, Euc} (ξ)$ is a finite set of discrete states; an initial set of discrete states is defined as $Q_{OWS, INT, 0} = \cup_{ξ \in Ξ_{OWS, 0}} ℳ_{OWS, Euc} (ξ)$ .

• $A_{OWS, INT} = {U_{p_{1}}, U_{p_{2}}}$ is the set of control values, where $U_{p_{1}}$ and $U_{p_{2}}$ are the allowable control input ranges in the $p_{1}^{th}$ and $p_{2}^{th}$ locomotion modes, respectively.

• $(q, a_{INT}, q^{'}) \in T_{OWS, INT}$ for $q, q^{'} \in Q_{p_{1}, OWS, INT}$ (or $q, q^{'} \in Q_{p_{2}, OWS, INT}$ , respectively), if there exists $a_{INT} \in A_{OWS, INT}$ being constant with one time-step duration δζ and some external disturbance d: [0, δζ] → D_OWS such that the resulting solution $ξ : [0, δ ζ] \to Ξ_{p_{1}, OWS, INT}$ (or $ξ : [0, δ ζ] \to Ξ_{p_{2}, OWS, INT}$ , respectively) satisfies ξ (0) = ξ ₀, $ξ_{0} \in ℳ_{OWS, Euc}^{- 1} (q)$ , $ξ (δ ζ) \in ℳ_{OWS, Euc}^{- 1} (q^{'})$ and the system dynamics in Equation (34) (or Equation (35), respectively).

The abstraction map $ℳ_{OWS, Euc} : Ξ_{OWS, INT} \to Q_{OWS, INT}$ maps a continuous state in Ξ_OWS,INT into a discrete state in the set $Q_{OWS, INT}$ . Equivalently, $Ξ_{OWS, INT} : = \cup_{q \in Q_{OWS, INT}} ℳ_{OWS, Euc}^{- 1} (q)$ . A typical implementation of such a map $ℳ_{OWS, Euc}$ is a uniform partition with a specific granularity. The condition in the last item of Def. 13 indicates that $T S_{OWS, INT}$ is an over-approximation of $S S_{OWS}$ . That is, all the transitions will be included as long as a transition is possible by using the locomotion dynamics under bounded disturbances. For instance, let us examine two consecutive inter-sampling discrete states q and q ^′. We add a transition ( q , a _INT, q ^′) if

ℳ_{OWS, Euc}^{- 1} (q^{'}) \cap ℛ_{δ ζ} (ℳ_{OWS, Euc}^{- 1} (q), a_{INT}) \neq \emptyset

(40)

where $ℛ_{δ ζ} (\cdot, \cdot)$ is defined as

ℛ_{δ ζ} (Ξ_{0}, u) : = {ξ (δ ζ) | \dot{ξ} (τ_{r}) = f_{p_{i}} (ξ (τ_{r}), u (τ_{r}), d (τ_{r})), τ_{r} \in [0, δ ζ], ξ (0) \in Ξ_{0}, i = 1,2}

representing the reachable set of $Ξ_{0} \subseteq Ξ_{OWS, INT}$ after a time step δζ under the constant control input u. For nonlinear dynamics, it is difficult to compute the exact reachable set $ℛ_{δ ζ} (Ξ_{0}, u)$ . To circumvent this hurdle, we compute an over-approximation of the exact reachable set $ℛ_{δ ζ} (Ξ_{0}, u)$ , denoted as ${\hat{ℛ}}_{δ ζ} (Ξ_{0}, u)$ . This over-approximation is obtained by employing interval-valued functions (refer to Jaulin (2001) for the details) of the discretized low-level dynamics. As a counterpart of real-valued functions, such an interval-valued function is evaluated over intervals and obeys interval arithmetic. As such, all the reachable states after a time step δζ from any state in Ξ₀ are captured in the output of an interval-valued function. By refining the set Ξ₀ into smaller intervals, we can approximate the reachable set $ℛ_{δ ζ} (Ξ_{0}, u)$ with arbitrary precision (Liu, 2017).
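The interval-based over-approximation and the transition test of Equation (40) can be illustrated with a small sketch. It uses a linear inverted-pendulum-like model, x'' = ω²(x − x_foot) + u, discretized with one explicit Euler step, purely as a stand-in for the discretized low-level dynamics; the model, class, and function names are assumptions. The sketch does not use outward rounding, so it ignores floating-point error, and it bounds the discretized map only, not the continuous flow.

```python
class Interval:
    """Closed interval [lo, hi] with the few operations this sketch needs."""

    def __init__(self, lo, hi):
        if lo > hi:
            raise ValueError("lower bound exceeds upper bound")
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def scale(self, c):
        lo, hi = c * self.lo, c * self.hi
        return Interval(min(lo, hi), max(lo, hi))

    def shift(self, c):
        return Interval(self.lo + c, self.hi + c)

    def overlaps(self, other):
        return self.lo <= other.hi and other.lo <= self.hi


def euler_step(x, v, u, omega, x_foot, dt):
    """Point-valued Euler step of the stand-in model."""
    return x + dt * v, v + dt * (omega ** 2 * (x - x_foot) + u)


def reach_box(x_box, v_box, u, omega, x_foot, dt):
    """Interval-valued version of euler_step: a box containing the image of
    every point in x_box x v_box under the constant input u."""
    x_next = x_box + v_box.scale(dt)
    accel = x_box.shift(-x_foot).scale(omega ** 2).shift(u)
    v_next = v_box + accel.scale(dt)
    return x_next, v_next


def has_transition(cell_next, reach):
    """Equation (40) with the over-approximated set: add the transition if the
    successor cell intersects the reach box along every dimension."""
    return all(c.overlaps(r) for c, r in zip(cell_next, reach))
```

Splitting `x_box` and `v_box` into smaller sub-boxes and taking the union of the resulting reach boxes tightens the estimate, which corresponds to the refinement argument in the text.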

Next, we will discuss in detail how to compute the over-approximation ${\hat{ℛ}}_{δ ζ} (\cdot, \cdot)$ .

Assumption 1

(Disturbance additivity and boundedness). We assume that the right-hand side of the disturbed switched system in Equation (5) can be divided into a nominal part and a disturbance part:

f_{p} (ξ (ζ), u (ζ), d (ζ)) (ζ) = f_{p} (ξ (ζ), u (ζ)) + g_{p} (ξ (ζ), u (ζ)), p \in P, ζ \geq 0

(41)

and the disturbance part is element-wise upper bounded by

| g_{p} (ξ, u) | ≼ r, \forall p \in P, ξ \in Ξ, u \in U

(42)

with the bound vector $r \in ℝ^{n}$ .

Algorithm 2: Reachability control synthesis

1: procedure ReachabilityControl (initial set I, target set G, inter-sampling finite abstraction $T S_{OWS, INT}$ )

2: assign a queue Que ← G, $G \subseteq Q$ {define a FIFO queue}

3: initialize a winning set $WIN \leftarrow G$

4: $K \leftarrow 0^{N \times M}$ ${N = {| Q_{OWS, INT} |}_{numel} and M = {| A_{OWS, INT} |}_{numel}}$

5: while Que ≠ ∅ do

6: q ^′ ←Que.pop ()

7: for all $q \in Q_{OWS, INT}$ and $a \in A_{OWS, INT}$ such that $q \overset{a}{\to} q^{'}$ do

8: if $q^{''} \in WIN$ for all q ^″ such that $q \overset{a}{\to} q^{''}$ then

9: if $q \notin WIN$ then

10: Que.push ( q )

11: $WIN \leftarrow WIN \cup {q}$

12: $K (q, a) \leftarrow 1$

13: end if

14: end if

15: end for

16: end while

17: if $WIN \cap I \neq \emptyset$ then

18: isReachable ←true

19: else

20: isReachable ←false

21: end if

22: return isReachable, $WIN$ , $K$

Given a locomotion mode, we denote by δ ξ (ζ) the difference between two trajectories ξ ₁ and ξ ₂ at the same instant ζ. These two trajectories start from their initial states ξ _1,0 and ξ _2,0, respectively. With the Lipschitz condition and Assumption 1, we have

| δ \dot{ξ} (ζ) | ≼ L | δ ξ (ζ) | + r \Rightarrow | δ ξ (ζ) | ≼ | ξ_{1,0} - ξ_{2,0} | e^{L ζ} + L^{- 1} (e^{L ζ} - I_{n \times n}) r, \forall ζ \in [0, δ ζ]

which implies that, under a disturbance bounded by the vector r, all the possible states after a time step δζ stay within a ball centered at the nominal trajectory state with a radius vector $r_{δ ζ} = | ξ_{1,0} - ξ_{2,0} | e^{L δ ζ} + L^{- 1} (e^{L δ ζ} - I_{n \times n}) r$ . Hence, the reachable set $ℛ_{δ ζ} (ℳ_{OWS, Euc}^{- 1} (q), a_{INT})$ in Equation (40) can be over-approximated by the estimated reachable set of the nominal system $\dot{ξ} (ζ) = f_{p_{i}} (ξ (ζ), u (ζ))$ enlarged by $r_{δ ζ}$ .
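For intuition, the growth bound can be evaluated in the scalar case, where the Lipschitz matrix L reduces to a positive constant and the matrix exponential to an ordinary exponential. This scalar reduction and the function name are assumptions made for illustration only.

```python
import math


def growth_radius(delta0, lipschitz, r, dt):
    """Scalar form of the bound |delta0| * e^(L*dt) + (e^(L*dt) - 1) / L * r.

    delta0: initial deviation between the two trajectories
    lipschitz: scalar Lipschitz constant L > 0 (assumed scalar here)
    r: scalar disturbance bound
    dt: elapsed phase or time step
    """
    if lipschitz <= 0.0:
        raise ValueError("this sketch assumes a strictly positive Lipschitz constant")
    growth = math.exp(lipschitz * dt)
    return abs(delta0) * growth + (growth - 1.0) / lipschitz * r
```

At `dt = 0` the radius equals the initial deviation, and it grows monotonically with `dt`, which is why a shorter time step δζ yields a tighter enlargement of the nominal reachable set.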

Given the abstraction defined in Def. 13, we synthesize a reachability controller for the inter-sampling finite abstraction $T S_{OWS, INT}$ of a one-walking-step subsystem $S S_{OWS}$ as shown in Algorithm 2. This algorithm takes as inputs an initial set I, a target set G, and a finite abstraction $T S_{OWS, INT}$ . Backward dynamics propagation is used to determine the realizability of the reachability controller. The algorithm returns a Boolean value isReachable indicating the realizability of the target set G. If this target set is realizable, it outputs two additional objects: (i) a winning set $WIN$ defined as all the states from which the reachability goal is satisfied under bounded state disturbances; and (ii) a Boolean matrix $K$ indexing the control strategy $Ω : Q \to 2^{A}$ . Otherwise, $WIN$ and Ω are returned as empty sets. Note that the operator |A|_numel on Line 4 of Algorithm 2 represents the total number of elements in the set A. Given a library of controllers synthesized by Algorithm 2, an execution of the complete reachability controller based on the robust finite transition system is shown in Algorithm 4 in the Appendix.
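The backward propagation of Algorithm 2 can be sketched on a small nondeterministic transition system. The dictionary encoding of transitions and the use of a dictionary in place of the Boolean matrix $K$ are assumptions made for readability; the control flow follows the pseudocode, including the final test that the winning set intersects the initial set.

```python
from collections import deque


def reachability_control(initial, target, transitions):
    """Backward reachability synthesis on a finite abstraction.

    transitions maps (q, a) to the set of ALL possible successors of q under
    action a, so nondeterminism models the bounded disturbance.
    """
    # Index predecessors: for each state, the (q, a) pairs that may lead to it.
    predecessors = {}
    for (q, a), successors in transitions.items():
        for q_next in successors:
            predecessors.setdefault(q_next, set()).add((q, a))

    win = set(target)
    queue = deque(target)  # FIFO queue, as in Line 2
    strategy = {}          # plays the role of the nonzero entries of K
    while queue:
        q_next = queue.popleft()
        for q, a in predecessors.get(q_next, ()):
            # q is winning under a only if every possible successor is winning.
            if q not in win and transitions[(q, a)] <= win:
                win.add(q)
                queue.append(q)
                strategy[q] = a
    is_reachable = bool(win & set(initial))
    return is_reachable, win, strategy
```

Recording an action only when a state first enters the winning set keeps the strategy well-founded: every recorded action leads strictly closer to the target, so the closed loop cannot cycle inside the winning set without reaching G.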

A merit of the proposed hierarchical structure in Figure 12 is that it decomposes the overall high-dimensional contact-rich planning problem into tractable sub-problems with smaller state dimensions, circumventing prohibitive computational complexity. In particular, the symbolic task planner takes charge of the high-level decisions and reacts to the environment actions. The middle-level robust finite transition system reasons about the robustness of a local phase-space region around the nominal keyframe state with respect to bounded state disturbances. The low-level phase-space planner executes the continuous locomotion dynamics. This hierarchy is analogous to the receding horizon control approach in Wongpiromsarn et al. (2012), where the complex high-dimensional planning problem is decomposed into a set of solvable sub-problems. This strategy facilitates efficient decision making during dynamic interactions with uncertain environments.

Remark 4

$T S_{prod}$ and $T S_{OWS}$ establish a hierarchical relationship for task decomposition. $T S_{prod}$ is a high-level decision maker of a nominal keyframe state while $T S_{OWS}$ reasons about the robustness of the local phase-space region around the nominal keyframe state determined from $T S_{prod}$ . Overall, $T S_{prod}$ and $T S_{OWS}$ form a top-down hierarchy (see Figure 11) to simultaneously achieve “global” phase-space decision making and “local” robustness reasoning.

In the next section, we will prove the robust reachability of the synthesized controller, that is, the reachability goal is realizable for $S S_{OWS}$ if it is realizable for $T S_{OWS}$ . With such a guarantee, the robust finite transition system $T S_{OWS}$ interfaces the high-level task planner commands with the low-level hybrid motion planner.

6. Correctness of the reactive task and motion planner

Correctness guarantees of the WBDL planner play a key role in the successful execution of robust legged locomotion interacting with dynamic environments. The objective of this section is to prove this correctness. In particular, the correctness of our planning framework is interpreted as the successful implementation of the high-level task planner on the low-level motion planner under bounded disturbances. Our locomotion planner is a hierarchy composed of a task planning layer with reactive synthesis and a robust motion planning layer synthesized by a robust finite transition system. A high-level task planner, that is, a WBDL winning strategy $A_{WBDL}$ , is synthesized via a two-player game $G$ . The two-player game $G$ is solved between the robot and its environment to make a decision (p, s, q) representing the locomotion mode p, the contact action s, and the keyframe state q, respectively. This locomotion decision determines a nominal phase-space motion plan and is sent to the robust finite transition system $T S_{OWS}$ such that the high-level decision is achieved in a robust manner by the low-level continuous motion planner.

6.1. One-walking-step robust reachability

To guarantee that the robust finite transition system is realizable by its underlying continuous system, we need to prove that the conditions in Equations (36)–(38) of Def. 11 also hold for continuous states. Since the robust finite transition system is based on the keyframes of one walking step, we name the keyframe reachability problem “one-walking-step robust reachability.” The bounded disturbance models initial state deviations, model uncertainties, and external perturbations during the evolution of the locomotion trajectory. The term “robust reachability” refers to the reachability of the goal robustness margin set centered around the final keyframe from the initial robustness margin set.

Theorem 1

(One-walking-step robust reachability). Consider a one-walking-step locomotion subsystem $S S_{OWS}$ with two pairs of decisions ( $q_{initial}$ , p₁, s₁) and ( $q_{final}$ , p₂, s₂) and its inter-sampling finite abstraction $T S_{OWS, INT}$ . Assume that $S S_{OWS} ≼_{(δ ζ, d, ℳ_{OWS, Euc})} T S_{OWS, INT}$ as defined in Def. 13. If the walking step is realizable for $T S_{OWS, INT}$ , then it is realizable for $S S_{OWS}$ ; that is, the robustness margin set $ℬ_{ϵ_{2}} (q_{final})$ of the final keyframe $q_{final}$ is reachable from $ℬ_{ϵ_{1}} (q_{initial})$ .

Proof

Suppose that the walking step from ( $q_{initial}$ , p₁, s₁) to ( $q_{final}$ , p₂, s₂) is realizable for $T S_{OWS, INT}$ , that is, the winning set $WIN_{TS} \neq \emptyset$ , and there exists a control strategy $Ω_{TS} : Q \to 2^{A}$ for $T S_{OWS, INT}$ . Let $q \in ℬ_{ϵ_{1}} (q_{initial})$ and $q \in WIN_{TS}$ . Under bounded disturbances, all the possible state sequences starting from q, generated by the locomotion dynamics and the control strategy synthesized by Algorithm 2, will finally reach the target set $ℬ_{ϵ_{2}} (q_{final})$ . Let ${q_{i}}_{0}^{l}$ $(l \in ℤ)$ be one such sequence generated under a control sequence ${a_{i, INT}}_{0}^{l - 1}$ and a sequence of disturbances ${d_{i}}_{0}^{l - 1}$ such that q₀ = q and $q_{l} \in ℬ_{ϵ_{2}} (q_{final})$ .

We construct a control strategy $Ω_{OWS} : Ξ_{OWS} \to 2^{A}$ for the one-walking-step locomotion system $S S_{OWS}$ by

a_{INT} \in Ω_{OWS} (ξ), if a_{INT} \in Ω_{TS} (ℳ_{OWS, Euc} (ξ))

Given the transitions of $T S_{OWS, INT}$ assigned by Equation (40), for all $ξ (0) \in ℳ_{OWS, Euc}^{- 1} (q)$ , there always exist a state q′ and a control input a_INT such that $ξ (δ ζ) \in ℳ_{OWS, Euc}^{- 1} (q^{'})$ . Thus, for any $ξ (0) \in ℳ_{OWS, Euc}^{- 1} (q)$ , the resulting solution with the same control sequence ${a_{i, INT}}_{0}^{l - 1}$ and disturbance ${d_{i}}_{0}^{l - 1}$ generated by Ω_OWS is guaranteed to reach the target set $ℬ_{ϵ_{2}} (q_{final})$ . This implies that at time t_k = k ⋅ δζ, ∀k ≤ l, $ξ (t_{k}) \in WIN_{SS}$ (i.e., the winning set of $S S_{OWS}$ ). Therefore, $WIN_{SS} \neq \emptyset$ and Ω_OWS is a controller that realizes the one walking step. This completes the proof. □
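The controller construction used in the proof, in which a continuous state inherits the actions that the discrete strategy assigns to its cell, can be sketched as a simple lookup through the abstraction map. The dictionary encoding of Ω_TS and the toy one-dimensional abstraction map in the usage lines are hypothetical.

```python
import math


def lift_strategy(strategy_ts, abstraction_map):
    """Build a continuous-state strategy from a discrete one.

    strategy_ts: dict mapping a discrete state q to its set of allowed actions
    abstraction_map: callable mapping a continuous state xi to its cell q
    """
    def strategy_ows(xi):
        # An action is allowed at xi iff it is allowed at the cell containing xi.
        return strategy_ts.get(abstraction_map(xi), set())

    return strategy_ows


# Toy usage: one-dimensional cells of width 0.1, indexed by an integer.
def toy_map(xi):
    return math.floor(xi / 0.1)


omega_ows = lift_strategy({2: {"u_low"}, 3: {"u_high"}}, toy_map)
```

States outside every cell covered by the discrete strategy receive the empty action set, which corresponds to lying outside the winning set.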

6.2. Correctness of the hierarchical WBDL planner

Given the one-walking-step robust reachability of Theorem 1, we now prove the correctness of the top-down planning hierarchy, that is, $T S_{prod} \to T S_{OWS} \to T S_{OWS, INT} \to S S_{OWS}$ as shown in Figure 11. The correctness is defined in a robust sense; that is, the actual keyframe states of the phase-space trajectory always stay within the robustness margins of the nominal keyframe states determined by the high-level task planner. By Definition 14 and stutter-equivalence (Baier and Katoen, 2008), we can conclude κ⊧φ if and only if γ⊧φ, where φ denotes the task specifications in the symbolic task planner. For our phase-space planning, the interval H_k represents the phase duration of the k^th walking step. This guarantees that the left boundary point of H_k approaches infinity as k → ∞, and thus the continuous implementation rules out Zeno behavior. For detailed explanations, readers can refer to Liu et al. (2013) and the references therein. Given these preliminaries, we have the following correctness theorem:

Figure 11.

Top-down hierarchy of layered abstractions: $T S_{prod}$ represents the finite product transition system of the high-level task planner in Section 3; $T S_{OWS}$ denotes the robust finite transition system for one walking step in Def. 11; $T S_{OWS, INT}$ denotes the inter-sampling finite abstraction of one walking step in Def. 13; $S S_{OWS}$ represents the continuous one-walking-step locomotion subsystem in Def. 12.

Definition 14

Assume a low-level locomotion trajectory κ = ( ξ , ρ, η, μ) and a high-level decision sequence γ = (q, e, s, p) as defined in Definition 7. The low-level trajectory κ is a continuous implementation of the high-level execution γ, if there exists a sequence of non-overlapping phase intervals $ℋ = H_{1} \cup H_{2} \cup H_{3} \dots$ and $\cup_{i = 1}^{\infty} H_{i} = ℝ^{+}$ such that ∀ζ ∈ H_k, ∀k ≥ 1, the following mappings hold

ξ (ζ_{left - bnd}) \in ℳ_{Riem}^{- 1} (q_{k}), ξ (ζ_{right - bnd}) \in ℳ_{Riem}^{- 1} (q_{k + 1}), ρ (ζ) = e_{k}, η (ζ) = s_{k}, μ (ζ) = p_{k}

where $ℳ_{Riem}$ is the Riemannian space abstraction defined in Equation (28), which maps the continuous state ξ region centered at the keyframe state into the discrete keyframe q (Liu et al., 2013). ζ_left-bnd and ζ_right-bnd are the left and right boundary values of the interval H_k.

Theorem 2

(Correctness of the WBDL task and motion planner). Given a robust finite transition system $T S_{OWS}$ , a winning WBDL strategy synthesized from the two-player game is guaranteed to be implementable by the underlying low-level phase-space motion planner in a provably correct manner.

Proof

By Proposition 1, a winning WBDL strategy $A_{WBDL}$ synthesized from the WBDL task planner game solves the discrete locomotion planning problem on $T S_{prod}$. This synthesis is correct by construction thanks to the properties of GR(1) formulae. According to $A_{WBDL}$, the system action s_{k+1}, switching mode p_{k+1}, and keyframe q_{k+1} at the next (k + 1)^th walking step are derived from the next environment action e_{k+1} and the current keyframe state q_k. To verify the correct implementation of a high-level decision sequence γ, we use the switching strategy semantics: given an initial state ξ(ζ₀) and an initial environment action ρ(ζ₀) = e₀, we assign η(ζ₀) = s₀ and μ(ζ₀) = p₀ according to $A_{WBDL}$, where the step index k = 0. By using the control library synthesized from the robust finite transition system $T S_{OWS}$ with decision tuples of two consecutive walking steps (i.e., (p₀, s₀, q₀) and (p₁, s₁, q₁)), we select a specific reachability controller synthesized by $T S_{OWS, INT}$ to achieve a robust keyframe transition at the next walking step. This is guaranteed by the one-walking-step robust reachability in Theorem 1, where the realizability of $T S_{OWS, INT}$ implies the realizability of the underlying continuous system $S S_{OWS}$. By executing this reachability controller, the continuous state ξ(ζ) evolves according to the dynamics of a specific locomotion mode, $\dot{ξ} (ζ) = f_{p_{0}} (ξ (ζ), u (ζ), d (ζ))$, under the bounded disturbance d. Once we detect a new environment action e₂ before the locomotion contact switch, a new decision tuple (p₂, s₂, q₂) is generated immediately based on $A_{WBDL}$. Given this new decision tuple, the same procedure is repeated for every subsequent k^th walking step, where $k \in ℤ_{\geq 2}$. Therefore, the low-level trajectory correctly implements the high-level decision sequence. □

6.3. Replanning strategy and robustness

It is worth noting that the proposed correctness holds under a set of assumptions on allowable environmental and system actions and on disturbance boundedness. However, a real-world disturbance can sometimes violate the bounded-disturbance assumption and perturb the state out of the local reachability region (i.e., the winning set). To handle this situation, we establish a replanning strategy that requests a new command from the high-level task planner. Ideally, the union of all local winning sets is expected to cover the entire state space of interest. However, the existence of such a union often cannot be guaranteed. Thus, it is difficult to generalize the formal correctness of one winning set to that of the union of all winning sets. What we strive for is to maximize the phase-space coverage by the union of all winning sets. From a practical implementation viewpoint, synthesizing a large number of winning sets enables our planner to cover a sufficiently large phase space such that a feasible winning set is likely to be found when large disturbances occur (Figure 12).

Figure 12.

Hierarchical structure of robust WBDL task and motion planner. This planner has three cascaded planning layers: high-level task planner, middle-level keyframe-based robust finite transition system, and low-level continuous motion planner. A replanning process (see blue lines) is triggered when the state is out of the reachability region (i.e., the winning set) or a sudden environmental change occurs.

In other words, there is no ground truth of “formal correctness” for real robotic systems. Even though we have a provably correct planner and implement it on a real robot in a correct way, the actual planner may not be formally “correct” due to many potential hardware issues. For instance, unmodelled actuator dynamics can easily break the correctness guarantee of task specifications at the high level. Thus, it makes more practical sense to target a formally correct approach that generates a palette of robust controllers, the winning sets of which jointly cover a sufficiently large phase-space of interest (if not a global state space). Our results indicate that a properly designed controller switching mechanism among these locomotion winning sets enables an effective replanning strategy such that a set of contiguous phase-space initial robustness margin sets can be controlled to reach a set of contiguous goal robustness margin sets under bounded disturbances.

Overall, the proposed planning framework reasons about robustness at the following three levels.

• The robust finite transition system $T S_{OWS}$ explicitly incorporates neighboring keyframe states via the robustness margin around the nominal keyframe state to handle initial state uncertainty in each walking step.

• If the disturbance is larger than the boundary value modeled in the controller synthesis, the state may be disturbed out of the winning set. In this case, an alternative winning set will be searched in the control library of allowable keyframe transitions determined by $T S_{OWS}$ . If no alternative winning set is feasible, a replanning signal will be sent to the high-level task planner. The task planner will use the synthesized automaton $A_{WBDL}$ to assign a new locomotion decision (p, s, q ) and send it to the motion planner layer for replanning the next walking step.

• When an environment event changes suddenly, a replanning signal will be sent to the high-level task planner. Note that this replanning process can only be executed before the next step transition; otherwise, the contact of the next walking step has already occurred. Figure 23 in the Appendix shows a timing sequence of the replanning process. More details of this replanning process are shown in Algorithm 4 in the Appendix.
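The three robustness levels above can be read as a simple selection cascade. The following is a minimal Python sketch of that cascade, not the authors' implementation: the class `ReachabilityController`, the functions `select_controller` and `box`, and the use of axis-aligned boxes as stand-ins for winning sets are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

State = Tuple[float, float]  # CoM phase-space state (position, velocity)


@dataclass
class ReachabilityController:
    """Hypothetical library entry: a winning-set membership test and a goal keyframe."""
    name: str
    in_winning_set: Callable[[State], bool]
    goal_keyframe: Tuple[float, float]


def box(x_lo: float, x_hi: float, v_lo: float, v_hi: float) -> Callable[[State], bool]:
    """Assumption: approximate a winning set by an axis-aligned box."""
    return lambda s: x_lo <= s[0] <= x_hi and v_lo <= s[1] <= v_hi


def select_controller(
    state: State,
    active: ReachabilityController,
    library: Sequence[ReachabilityController],
) -> Optional[ReachabilityController]:
    # Level 1: the state is still inside the winning set being executed.
    if active.in_winning_set(state):
        return active
    # Level 2: search the control library for another winning set covering the state.
    for candidate in library:
        if candidate.in_winning_set(state):
            return candidate
    # Level 3: nothing covers the state; None stands for a replanning
    # request to the high-level task planner.
    return None


ctrl_a = ReachabilityController("a", box(0.0, 0.5, 0.4, 0.7), (0.5, 0.6))
ctrl_b = ReachabilityController("b", box(0.0, 0.5, 0.7, 1.0), (0.5, 0.8))
library = [ctrl_a, ctrl_b]
```

In this toy setting, a state such as (0.2, 0.9) falls outside the active set of `ctrl_a` but inside that of `ctrl_b`, so the second level returns `ctrl_b`; a state such as (0.2, 1.5) is covered by neither and yields `None`.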

Algorithm 3. Execution of reactive task and motion planner

1: procedure ExecuteReactivePlanner (nextEnvironmentAction e_next, currentKeyframe q_current, currentLocomotionMode p_current, automaton $A_{WBDL}$ )

2: $e_{next} \in ℰ$

3: q_next ← getNextKeyframe (e_next, q_current) {look up $A_{WBDL}$ }

4: s_next ← getNextContactAction (e_next, q_next)

5: p_next ← getNextLocomotionMode (e_next, q_next)

6: ( ξ _trans, t_trans) ← searchContactTransition (q_current, q_next)

7: ( y _limb,next) ← searchLateralLimbLocation (q_current, q_next)

8: ( ξ , χ ) ← ExecuteOWSReachabilityControl (keyframeState q_current, q_next, locomotionMode p_current, p_next, contactAction s_current, s_next, initialCoMState ξ _init, transitionTime t_trans, nextEnvironmentAction e_next)

9: (e, q, p, s)_current ← (e, q, p, s)_next {update task planner states}

10: repeat from Line 2

11: end procedure
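The control flow of Algorithm 3 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the automaton is modeled as a plain dictionary keyed by (next environment action, current keyframe), the contact-transition and lateral-limb searches of Lines 6 and 7 are folded into a user-supplied callback `ows_control`, and all labels in the toy automaton are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# (keyframe q, contact action s, locomotion mode p)
Decision = Tuple[str, str, str]
# Assumed automaton encoding: (e_next, q_current) -> decision for the next step.
Automaton = Dict[Tuple[str, str], Decision]


def execute_reactive_planner(
    env_actions: Iterable[str],
    q_init: str,
    s_init: str,
    p_init: str,
    automaton: Automaton,
    ows_control: Callable[[Decision, Decision, str], None],
) -> List[Decision]:
    """Run one automaton lookup and one OWS control call per walking step."""
    current: Decision = (q_init, s_init, p_init)
    executed: List[Decision] = []
    for e_next in env_actions:
        # Lines 3-5: look up the next keyframe, contact action, and mode.
        nxt = automaton[(e_next, current[0])]
        # Lines 6-8: low-level one-walking-step reachability control (stubbed).
        ows_control(current, nxt, e_next)
        executed.append(nxt)
        # Line 9: update the task planner state.
        current = nxt
    return executed


# Toy automaton with hypothetical labels.
toy_automaton: Automaton = {
    ("normal", "medium"): ("medium", "footContact", "PIPM"),
    ("terrainCrack", "medium"): ("swing", "handContact", "PPM"),
    ("normal", "swing"): ("medium", "footContact", "PIPM"),
}
calls: List[Tuple[Decision, Decision, str]] = []
log = execute_reactive_planner(
    ["normal", "terrainCrack", "normal"],
    "medium", "footContact", "PIPM",
    toy_automaton,
    lambda cur, nxt, e: calls.append((cur, nxt, e)),
)
```

The callback receives the decision tuples of two consecutive walking steps, mirroring how the reachability controller in the text is selected from a pair of consecutive decisions.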

7. Results

We demonstrate WBDL results by sequentially composing the low-level locomotion modes via the symbolic task planner. In particular, we analyze in detail the robustness performance of the reachability control with respect to several key parameters. The Temporal Logic Planning (TuLiP) toolbox, a Python-based embedded control software (Wongpiromsarn et al., 2011), is used to synthesize the symbolic task planner. The gr1c⁸ tool, involving the CU Decision Diagram Package, is used by TuLiP as the underlying synthesis solver. The synthesized planner is correct by construction, that is, it satisfies all the proposed specifications. The LTL synthesis procedure is offline and takes around 20 min to generate an automaton on a MacBook with a 2.9 GHz Intel Core i9 processor and 32 GB of memory. Once the automaton is generated, the task planning process is executed at run-time. To guarantee the successful implementation of the high-level task plans by the low-level motion planner, we perform a robust reachability analysis of the keyframe states by using the so-called robustly complete control synthesis (ROCS)⁹ tool (Li and Liu, 2018), which currently supports abstraction-based control synthesis using both uniform and non-uniform discretizations. ROCS also generates feedback control strategies, which are used to design the control library of our locomotion tasks.

7.1. Case I: Locomotion with stopping and brachiation behaviors

We first demonstrate a locomotion scenario involving environmental actions such as the appearance of a human and a crack in the terrain. The synthesized discrete task planner is represented by a finite state automaton with 27 states and 148 transitions. The two-dimensional keyframe state q is composed of the apex velocity and step length. For either dimension, Small, Medium and Large labels are assigned to {1, 2, 3} in order while Stop and Swing labels are assigned to {0, 4}. Figure 13 illustrates the sequentially composed center-of-mass (CoM) sagittal phase-space trajectory of a 20-step walking process. Figure 14 illustrates the discrete environment and system contact actions, and the corresponding keyframe states. At the low level, four locomotion modes (i.e., PIPM, PPM, MCM, SLM) are alternated according to the high-level decisions¹⁰. These high-level decisions are sent to the low-level motion planner one walking step ahead, that is, with a one-walking-step horizon. By inspecting the discrete sequences, we can verify that all the system contact actions and keyframe states respond to the environmental actions correctly; that is, all the task specifications are satisfied. Figure 15 illustrates dynamic motion snapshots and continuous kinematic trajectories of the vertical CoM, foot, and hand positions. An accompanying video about the WBDL behaviors is available at https://youtu.be/BdxYCmhRIMg.

Figure 13.

Sequential composition of the sagittal CoM phase-space trajectories and mode switchings for a 20-step WBDL maneuver. The top four figures illustrate phase-space manifolds of the four locomotion modes. The mode switching is governed by the synthesized high-level contact planner. Among these steps, two terrain-crack events and one human-appearance event are taken into account. A short multi-contact phase is designed between every two consecutive modes for a smooth transition (see the short red trajectory between two green dots).

Figure 14.

Environment events, system actions and keyframe states of 50 walking steps according to the synthesized automaton. Actions and states are indexed by numbers. Emergency events, that is, human appearance and terrain crack, are highlighted in the shaded regions. In the bottom subfigure, the numbers 0 to 4 on the vertical axis correspond to {0, 0.4, 0.6, 0.8, 1.7} m/s for next step apex velocity and {0.15, 0.5, 0.6, 0.7, 0.6} m for next step step length.

Figure 15.

Snapshots of the WBDL motions in response to two environmental emergencies. The snapshots show a sequence of locomotion behaviors including a brachiation motion over the cracked terrain and a stopping motion when a human appears. The figure at the bottom shows the CoM vertical position trajectory (orange thick line) and the hand and feet trajectories (thin interlaced lines).

7.2. Case II: Locomotion with ground sliding and hopping behaviors

When the robot maneuvers through a narrow passage, an ordinary locomotion mode (e.g., walking or brachiation) no longer works due to the confined height. As such, a natural solution is to use a ground sliding mode: the robot crouches and slides with two feet through this constrained space as shown in Figure 16. The two arms are placed at a low position to avoid contact with the ceiling. As shown in the bottom right subfigure of Figure 16, there are two multi-contact transition phases before and after the sliding phase (see the gray trajectory segments). We assume a constant negative CoM acceleration during the sliding phase, and thus the phase-space trajectory of the sliding phase is a parabola. The low-level locomotion model corresponds to the mode (f) in Section 3.1.
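The parabolic shape of the sliding trajectory follows directly from the constant-acceleration assumption. As a short worked derivation, write the constant sliding acceleration as a < 0 and the state at the start of the sliding phase as (x_0, v_{x,0}); these symbols are introduced here for illustration and are not part of the original notation.

```latex
\dot{x} = v_x, \qquad \dot{v}_x = a
\;\Longrightarrow\;
v_x \frac{d v_x}{d x} = a
\;\Longrightarrow\;
v_x^{2} = v_{x,0}^{2} + 2a\,(x - x_0), \qquad a < 0 .
```

Since x is a quadratic function of v_x, the curve traced in the (x, v_x) phase plane is a parabola, and the sagittal velocity decreases monotonically along the slide.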

Figure 16.

Ground sliding motion when a narrow passage appears in the scene. The robot crouches and slides on the ground through the low-ceiling area with a constant negative acceleration. This ground sliding motion is preceded and succeeded by a multi-contact transition phase as shown in the phase-space subfigure at the lower right corner.

When a crack in the terrain and a high ceiling occur simultaneously, the robot cannot grasp the overhead support any longer. To maneuver forward successfully, the robot has to leap over the unsafe region as shown in Figure 17. Thus, a hopping phase will be executed with no contact with the environment. A constant CoM sagittal velocity shows up in the sagittal phase portrait while a parabola appears in the vertical position trajectory of Figure 17(b). The keyframe state of this hopping motion is chosen to be the center of the horizontal line segment in the sagittal phase-space. The lateral velocity is set to zero to avoid a lateral drift. The state will stay at the red dot in Figure 17(d) during the hopping motion. Since the robot locomotes forward, the lateral phase portrait in Figure 17(d) behaves like a limit cycle but non-periodically due to the rough terrain. The low-level locomotion model corresponds to the mode (e) in Section 3.1. By introducing specific locomotion modes, our planner is capable of handling emergency-motivated scenarios.

Figure 17.

Hopping motion when a crack in the terrain and a high ceiling occur simultaneously. In this case, no overhead support is available for grasping so the robot has to jump over the cracked terrain.

7.3. Case III: Locomotion replanning strategy

When the robot is already leaping in the air and detects a sudden change from an ordinary terrain to a cracked terrain, it has to replan its contact action and locomotion mode to accommodate this sudden change on the fly. The robot will execute a replanning process to ask the high-level planner for a new decision, that is, grasp the ceiling support and swing the robot’s body over the cracked region as shown in Figure 18. Otherwise, the robot will fail to locomote. The top subfigure of Figure 18 shows a decision sequence including the replanning process. The three rows represent environment actions, locomotion modes, and keyframe states, respectively. The second column, with three dotted blocks, is the decision made before the replanning process; it is scheduled for the next walking step and thus has not yet been executed. The third column represents the replanned decision, which is executed in response to the sudden environmental action change. This replanning process in the phase-space is illustrated in Figure 23. Let us consider a more challenging terrain scenario (i.e., terrainCrack-highCeiling as described in Section 4), where there is a cracked terrain with a high ceiling. Then the robot cannot replan by grasping the overhead support anymore, and the locomotion process will inevitably fail. To avoid this situation, the terrainCrack-highCeiling environment action is not allowed to occur consecutively in our environment specification S_e-1.

Figure 18.

Replanning in response to a sudden environment event change. Suppose that during the flight phase, the robot finds out that the next terrain is cracked. Accordingly, the robot triggers its replanning process by changing the locomotion mode to the prismatic pendulum model, that is, grasping the overhead support, swinging the body over the second terrain crack region, and then landing on the non-cracked terrain. The top subfigure shows the decision sequence in the symbolic task planner. The bottom subfigure shows the snapshot sequence of the locomotion process.

Besides the above replanning strategy in response to environmental changes, there is another replanning strategy embedded in the reachability control library. When the robot state is perturbed to be out of the reachability region, that is, the winning set currently being executed, the robot cannot reach the robustness margin of the targeted keyframe. Then the robot calls for a new reachability controller in the library that covers the current perturbed state and uses this replanned controller to reach a new keyframe goal. Overall, the replanning process determines which new controller to call in the control library.

7.4. Case IV: Validation of the robust reachability controller

This case study evaluates the performance of the synthesized controller given a keyframe robust reachability goal. Let us first consider the prismatic inverted pendulum model (PIPM). Assume that we have a sagittal CoM state vector $ξ = {(x, v_{x})}^{T}$ and two consecutive locomotion modes denoted by PIPM₁ and PIPM₂, respectively. The PIPM dynamics in Equation (3) are reformulated as the CoM dynamics below:

$$\begin{pmatrix} \dot{x} (ζ) \\ \dot{v}_{x} (ζ) \end{pmatrix} = \begin{pmatrix} v_{x} (ζ) \\ ω_{PIPM}^{2} (x (ζ) - x_{foot}) \end{pmatrix} \tag{43}$$

where we assume zero torso angular momentum (τ_x, τ_y) = 0 and a predefined foot placement position x_foot for simplicity. The continuous control input ω_PIPM ∈ [ω_nominal − δω, ω_nominal + δω], where δω is a predefined bound. Note that the parameters x_foot,1, x_foot,2, and ω_nominal are determined by the high-level symbolic task planner.
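To make the dynamics in Equation (43) concrete, the sketch below propagates the PIPM state exactly over one zero-order-hold interval, assuming ω_PIPM is held constant during that interval and no disturbance acts. The function names `pipm_step` and `pipm_rollout` and the numerical values in the usage lines are illustrative assumptions, not values prescribed by the planner.

```python
import math
from typing import List, Sequence, Tuple


def pipm_step(x: float, v: float, omega: float, x_foot: float, dt: float) -> Tuple[float, float]:
    """Exact solution of x'' = omega^2 (x - x_foot) over dt for constant omega."""
    z = x - x_foot  # CoM position relative to the foot
    c, s = math.cosh(omega * dt), math.sinh(omega * dt)
    x_new = x_foot + z * c + (v / omega) * s
    v_new = z * omega * s + v * c
    return x_new, v_new


def pipm_rollout(
    x0: float, v0: float, omegas: Sequence[float], x_foot: float, dt: float
) -> List[Tuple[float, float]]:
    """Apply one held control value per time step and collect the states."""
    traj = [(x0, v0)]
    for omega in omegas:
        traj.append(pipm_step(*traj[-1], omega, x_foot, dt))
    return traj


# Illustrative values only: start at the apex above the foot, hold omega = 3 rad/s.
trajectory = pipm_rollout(x0=0.0, v0=0.5, omegas=[3.0] * 100, x_foot=0.0, dt=0.002)
x_end, v_end = trajectory[-1]
# For constant omega, v^2 - omega^2 (x - x_foot)^2 is conserved along the flow.
invariant_end = v_end**2 - 3.0**2 * x_end**2
```

For a constant ω_PIPM, the quantity v_x² − ω²(x − x_foot)² stays constant along the trajectory, which is a convenient sanity check on any numerical integrator used in place of the closed form.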

Let us define two nominal keyframe states $q_{initial} = (x (ζ_{0}), v_{x} (ζ_{0})) = (0 m, 0.5 m / s)$ and $q_{final} = (x (ζ_{final}), v_{x} (ζ_{final})) = (0.5 m, 0.6 m / s)$ ¹¹ determined from the high-level planner. The goal is to solve for the PIPM closed-loop phase-space trajectories starting from the initial robustness margin set $ℬ_{ϵ_{1}} (q_{initial})$ and reaching the final robustness margin set $ℬ_{ϵ_{2}} (q_{final})$ as defined in Def. 10. To this end, we synthesize a controller to determine the realizability of a keyframe transition in one walking step for $T S_{OWS}$ and generate a control strategy $Ω : Ξ_{OWS} \to 2^{A_{OWS}}$ if it is realizable. The intermediate robustness margin set $ℬ_{inter}$ between the locomotion modes PIPM₁ and PIPM₂ is defined by Equation (38). Locomotion mode switching is only allowed when the state is within $ℬ_{inter}$. Overall, the controller synthesis of one walking step is composed of three steps: first, the CoM trajectory starts from $ℬ_{ϵ_{1}} (q_{initial})$ and moves towards $ℬ_{inter}$; second, the state reaches $ℬ_{inter}$ and switches the locomotion mode; third, the CoM state reaches $ℬ_{ϵ_{2}} (q_{final})$. To reach $ℬ_{ϵ_{2}} (q_{final})$, the conditions in Equations (36)–(38) need to be satisfied by propagating the PIPM dynamics forward under bounded state disturbances and a bounded control input ω_PIPM. For this example, we assign the initial and final robustness margins of $ℬ_{ϵ_{1}} (q_{initial})$ and $ℬ_{ϵ_{2}} (q_{final})$ as $δ ζ_{ϵ_{1}} = 0.05, δ σ_{ϵ_{1}} = 0.002$ and $δ ζ_{ϵ_{2}} = 0.05, δ σ_{ϵ_{2}} = 0.006$, respectively.

For the underlying continuous locomotion subsystem $S S_{OWS}$, we assign the one-walking-step state space $Ξ_{OWS} = \cup_{p \in P_{OWS}} Ξ_{p}$, where $P_{OWS} = \{{PIPM}_{1}, {PIPM}_{2}\}$ and Ξ_p = [−0.1 m, 0.7 m] × [0.1 m/s, 1.2 m/s] for all $p \in P_{OWS}$, the initial state set $Ξ_{OWS, 0} = \{ξ : | Z_{p_{1}, σ} (ξ) | \leq δ σ_{bound, init}\}$, and the control space $U_{OWS} = [2, 4]$. To construct an inter-sampling finite abstraction $T S_{OWS, INT}$, we uniformly discretize the state space Ξ_OWS with a granularity [0.005 m, 0.005 m/s] and sample the control space $U_{OWS}$ with a 0.02 rad/s granularity, resulting in a sampled finite control set ${\hat{U}}_{OWS} = 0.02 ℤ \cap [2, 4]$. Let $A_{OWS} = {\hat{U}}_{OWS}^{[0, δ ζ]}$ be a piece-wise trajectory with zero-order-hold values in [0, δζ], where δζ = 2 ms is the time duration for each state in $T_{OWS}$. The system dynamics in (43) are subject to additive disturbances bounded by D_r = (0.05 m; 0.1 m/s), that is, position and velocity disturbances, respectively. Given the PIPM parameters above, we synthesize a reachability controller of $T S_{OWS, INT}$, and the computed winning sets are shown in Figure 19(a). As the result shows, the one-walking-step reachability is realizable as long as the winning set overlaps (at least partially) the initial and final robustness margin sets. Five simulated trajectories under randomly sampled bounded disturbances are shown as the black lines. Figure 19(b) evaluates the changing size of the winning set under different levels of the disturbance. The winning set shrinks as the disturbance set grows because the synthesized controller needs to reach the goal robust set against a larger set of disturbances.
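To give a feel for the size of the resulting abstraction, the sketch below counts grid nodes for the state and control discretizations quoted above. It assumes a uniform grid with both interval endpoints included; ROCS may partition the space into cells rather than nodes, so the counts are indicative only, and the helper `grid_points` is a hypothetical name.

```python
def grid_points(lo: float, hi: float, step: float) -> int:
    """Number of uniformly spaced nodes on [lo, hi], endpoints included (assumption)."""
    return int(round((hi - lo) / step)) + 1


# State space of one mode: [-0.1 m, 0.7 m] x [0.1 m/s, 1.2 m/s], granularity 0.005.
n_x = grid_points(-0.1, 0.7, 0.005)   # 161 position nodes
n_v = grid_points(0.1, 1.2, 0.005)    # 221 velocity nodes
n_states_per_mode = n_x * n_v         # 35581 discrete states per mode

# Sampled control set: 0.02 * Z intersected with [2, 4].
controls = [round(2.0 + 0.02 * k, 2) for k in range(grid_points(2.0, 4.0, 0.02))]
n_controls = len(controls)            # 101 sampled control values
```

Under these assumptions, each mode contributes roughly 3.6 × 10⁴ discrete states, each with 101 candidate control values, which gives a sense of why the granularity directly affects synthesis time.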

Figure 19.

The additive disturbances to the dynamics are bounded by D_r = (0.05 m; 0.1 m/s) in the subfigure (a). The shaded yellow region represents the winning set. The black trajectories are the five closed-loop trajectories simulated in five trials. The blue trajectory represents a trial suffering a large disturbance, that is, a velocity jump in the phase-space. Since the disturbed state is still in the winning set, the CoM trajectory is guaranteed to reach the final robustness margin set. In subfigure (b), different levels of bounded disturbances are modeled in the computation of the winning sets. Naturally, a larger magnitude of the disturbance results in a smaller winning set.

We design reachability controllers for all the combinations of the locomotion mode set P. Consider another locomotion mode transition from the PIPM to the PPM. The PPM dynamics in Equation (4) are reformulated as follows:

$$\begin{pmatrix} \dot{x} (ζ) \\ \dot{v}_{x} (ζ) \end{pmatrix} = \begin{pmatrix} v_{x} (ζ) \\ - ω_{PPM}^{2} (x (ζ) - x_{hand}) \end{pmatrix} \tag{44}$$

with the assumption of τ_x = τ_y = 0 and a predefined hand contact position x_hand. Other parameters are defined in Table 1. To evaluate the robustness performance of the synthesized controller, we examine the success rate of reaching the goal robust set through 50 simulation tests under different granularities and bounded disturbances. In Figure 20(a), each trial is run for one walking step with the PIPM-PPM mode pair. The disturbance exerted in the simulation is the same as the one used in the controller synthesis process, that is, D_r = (0.15 m, 0.3 m/s). As shown in Figure 20(a), all the trials reach the final robustness margin successfully. This agrees with the correctness guarantee provided by the one-walking-step robust reachability property of Theorem 1.
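As a side remark on the PPM dynamics in Equation (44), a short calculation shows how they differ qualitatively from the PIPM case. The derivation assumes ω_PPM is held constant and no disturbance acts; the symbol E_PPM is introduced here only for illustration.

```latex
E_{PPM} := v_x^{2} + \omega_{PPM}^{2}\,(x - x_{hand})^{2},
\qquad
\frac{d E_{PPM}}{d\zeta}
= 2 v_x \dot{v}_x + 2\,\omega_{PPM}^{2}\,(x - x_{hand})\,\dot{x}
= -2\,\omega_{PPM}^{2}\,(x - x_{hand})\,v_x + 2\,\omega_{PPM}^{2}\,(x - x_{hand})\,v_x
= 0 .
```

Hence, for a fixed ω_PPM, the undisturbed PPM phase-space curves are arcs of ellipses centered at (x_hand, 0), whereas the corresponding PIPM quantity v_x² − ω_PIPM²(x − x_foot)² yields hyperbolic curves.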

Table 1.

Parameters of the PIPM-PPM mode transition.

Parameters	Values	Parameters	Values
Initial keyframe q _initial	(0 m, 0.5 m/s)	Final keyframe q _final	(0.6 m, 1.7 m/s)
Initial tangent bound δσ_bound,init	0.002	Initial cotangent bound δζ_bound,init	0.05
Final tangent bound δσ_bound,final	0.06	Final cotangent bound δζ_bound,final	0.005
Mode set $P_{OWS}$	{PIPM,PPM}	Disturbance range D_r	(0.15 m, 0.3 m/s)
OWS state space Ξ_p	[ − 0.1 m, 0.7 m] × [0.1 m/s, 1.8 m/s]	Control space $U_{OWS}$	[2 rad/s, 4 rad/s]

Figure 20.

Success rate of the simulations under varying granularities and disturbances. In subfigure (a), the system is subjected to disturbances bounded by D_r = (0.15 m; 0.3 m/s). All 50 simulation trials reach the goal robustness margin set successfully. In subfigure (b), we run 1000 trials for each case with a specific granularity and a bounded disturbance. The disturbance exerted in the simulation remains the same, that is, D_r = (0.1 m, 0.2 m/s).

We evaluate the effect of the discretization granularity and the magnitude of disturbances used in the controller synthesis process as shown in Figure 20(b). For each controller synthesized using a specific granularity and a specific disturbance bound, we simulate 1000 trials with the bounded disturbance D_r = (0.1 m, 0.2 m/s). Figure 20(b) shows four sets of simulation results for different granularities ranging from 0.002 to 0.005. For each set of simulations, the success rate increases as the modeled disturbance in the controller synthesis increases, and it reaches 100% when the modeled disturbance matches the actual disturbance D_r used in the simulation. This is consistent with our expectation. Let us inspect the figure from another perspective. If we compare the results for different granularities with a specific disturbance set D_i (i = 0, 1, 2, 3, 4), the success rate remains almost the same. This is because, when constructing the abstraction for the robust reachability analysis, we have taken into consideration the effects of approximation errors caused by different discretization granularities by using non-deterministic transitions that over-approximate the dynamics of the system. In addition, we observe that the success rates for all the synthesized controllers are greater than 97%, even when no disturbance is considered in the controller synthesis. This can again be explained by the over-approximation used in the abstraction. Nonetheless, as shown in the simulations, to guarantee a 100% success rate, the modeled disturbance has to be larger than (or at least match) the actual disturbance in the simulation. Moreover, under the same disturbance D_r, the nominal phase-space planner with a fixed open-loop control input only achieves a success rate of 29%. This large discrepancy in success rate clearly shows the advantage of using an abstraction-based feedback controller over an open-loop phase-space planner.

7.5. Case V: Integrated multi-step locomotion via the reachability control library

This case evaluates an integrated multi-step locomotion example with the robust finite transition system $T S_{OWS}$ , the inter-sampling finite abstraction $T S_{OWS, INT}$ , and the replanning strategy. Assume that the decision of the task planner renders a locomotion mode sequence involving the PIPM, PPM, and MCM modes below:

PIPM \to PIPM \to PPM \to PIPM \to MCM \to PIPM \to PIPM

To enable the initial and final keyframe robustness margin sets to cover a sufficiently large phase space, we extend the default 3 × 3 keyframe grid to a 5 × 5 keyframe grid for each mode. This allows the reachability controllers to be applicable to a larger set of keyframe states. For each locomotion mode pair, we synthesize all the feasible controllers that reach the final keyframe robustness margin set under a bounded disturbance. We enumerate all the combinations of the allowable locomotion mode pairs and generate all the reachability control policies offline. These controllers are saved as a control library and are executed at runtime according to the high-level decision and measured states under bounded disturbances.

Parameters of constructing the inter-sampling finite abstraction $T S_{OWS, INT}$ are defined as follows. The controller synthesis and execution process use the same disturbance bound D_r = (0.05 m; 0.1 m/s). The full discretized state space is Ξ_full = [ − 0.2 m, 3.8 m] × [0.2 m/s, 1.9 m/s] with a granularity (0.003 m, 0.003 m/s). The local state space of each walking step is chosen so that it is sufficiently large to cover the space around the two keyframe states. A time step δζ = 0.02s is used for the abstraction construction of each walking step. The control inputs for PIPM, PPM, and MCM satisfy ω_PIPM ∈ [2, 4], ω_PPM ∈ [2, 4] and ω_MCM ∈ [1, 3]. We obtain the sets of sampled control values by a granularity of 0.02. The robustness margins of the phase space manifolds are δσ_PIPM = 0.002, δζ_PIPM = 0.002; δσ_PPM = 0.04, δζ_PPM = 0.003; δσ_MCM = 0.15, δζ_MCM = 0.9 × 10⁻⁵.

The computational time for constructing abstractions is around 30 s on average, and synthesizing a reachability controller for each keyframe pair takes 5 s–15 s, depending on the number of states and transitions of the abstraction. Since we synthesize 625 (i.e., 25 × 25) reachability controllers for each walking step, the time to generate them is approximately 90 min. In our simulation of six consecutive walking steps, all the local reachability control strategies are patched together to cover the overall state space. The time for simulating a single closed-loop walking trajectory is around 2 s. As shown in Figure 21, we simulate six different trials with different initial conditions, that is, starting from different initial robustness margin sets. Each locomotion trajectory is guaranteed to reach one of the robustness margin sets at the next walking step by using the reachability controller from the control library. In particular, a trial is tested to evaluate the replanning strategy when the state is perturbed out of the winning set (Figures 22 and 23).
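The reported synthesis time can be cross-checked with simple arithmetic. The sketch below only restates numbers quoted in the text (a 5 × 5 keyframe grid, 5 s to 15 s per controller, about 90 min in total); the variable names are illustrative.

```python
# Keyframe pairs per walking step: every initial keyframe paired with every final one.
n_keyframes = 5 * 5
n_controllers = n_keyframes ** 2                 # 625

# Bounds on total synthesis time from the per-controller range of 5 s to 15 s.
total_min_lower = n_controllers * 5 / 60         # about 52.1 min
total_min_upper = n_controllers * 15 / 60        # about 156.3 min

# Average per-controller time implied by the reported total of about 90 min.
implied_avg_seconds = 90 * 60 / n_controllers    # 8.64 s
```

The reported total of roughly 90 min lies inside the 52 min to 156 min range and corresponds to an average of about 8.6 s per controller, consistent with the quoted per-controller range.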

Figure 21.

Integrated phase-space trajectories of multi-walking-step simulations under bounded disturbances. The replanning strategy is evaluated with a trial (see the blue trajectory) subjected to a velocity disturbance larger than the modeled disturbance in the MCM mode (around the position x = 2.3 m). In this case, the state is perturbed out of the winning set of the currently used reachability controller. A replanning signal is triggered, and the planner searches within the control library for a new winning set (together with a new reachability controller) that covers the perturbed state. The new controller then drives the perturbed state to a new robustness margin set so that locomotion can continue.

Figure 22.

An illustration of the top-down decision sequence of the high-level reactive task planner and middle-level reachability controller synthesis. It illustrates the relationship between the keyframe state, environment action, and system mode.

Figure 23.

Replanning timing for the next walking step. The high-level task planning for the next walking step is determined at the beginning of one walking step. Then, during the remaining time of the current walking step (before switching to the next walking step), a replanning process can be triggered at any time if the state is out of the reachability region (i.e., the winning set) or the environment action changes suddenly. This figure uses the single-contact prismatic inverted pendulum model for illustration.

8. Discussions and future work

8.1. Low-level uncertainties

This paper proposes a hierarchical approach to the task and motion planning of dynamic locomotion in complex environments. We achieve robustness against a general, bounded disturbance by synthesizing a middle-layer robust reachability controller with robustness margins to accommodate low-level uncertainties. Undoubtedly, a variety of low-level uncertainties can come from time delays, actuation limits, unmodeled dynamics, state estimation, and measurement error from the environment. These uncertainties severely deteriorate the execution success rate of the high-level planner, in particular when the robot performs highly agile motions in complex and unstructured environments. In addition, the abstraction methods can induce approximation errors between the high-level and low-level planners. Although it does not deal directly with these low-level uncertainties and abstraction approximation errors, the keyframe-based robustness margin proposed in this paper can be viewed as an abstract representation of these uncertainties in the center-of-mass (CoM) state space. As long as a mapping can be established between these low-level uncertainties and the CoM phase-space deviations from the nominal trajectory, these uncertainties can be handled indirectly by the proposed reachability controller at run-time. Additionally, a replanning strategy is designed to handle large uncertainties that are not explicitly modeled in the reachability controller. In the future, abstraction refinement (Nilsson and Ozay, 2014) will be inspirational for designing a model abstraction with a proper granularity. More importantly, implementing the proposed high-level decision-making algorithms in dynamic simulation and on real hardware (Kim et al., 2016; Luo et al., 2017), and evaluating the robustness performance against low-level uncertainties, will be our main upcoming task.

8.2. System and environment assumption relaxation

To make the proposed hierarchical planning approach applicable to locomotion tasks in more complex and cluttered environments, it is important to relax the assumptions and approximations of the environment and to model more realistic scenes. For instance, formally designing recovery strategies for slippery terrain (i.e., with inaccurate friction coefficients), large tilting angles, and swing-foot collisions with obstacles is a practically meaningful topic.

Our current planner assumes that all limb contacts switch synchronously. To relax this conservative assumption, we will explore asynchronous contact switching strategies in the future. This relaxation opens up the opportunity to design more natural and diverse locomotion contact behaviors. From a more general perspective, contact actions and keyframe states may exhibit probabilistic features. Incorporating probabilistic models, such as Markov decision processes (MDPs) (Feng et al., 2015; Fu and Topcu, 2014; Platt et al., 2004), into the high-level decisions is a promising direction. Accordingly, studying probabilistic correctness and completeness will be of interest.

8.3. Generalization to complex tasks

Generalizing the proposed planning framework to more realistic locomotion tasks is of practical importance, in particular when robots are deployed in the real world. Such tasks include walking while carrying a payload, walking alongside human teammates, dynamically interacting with a human during motion (Alonso-Mora et al., 2018), and multi-robot coordination (da Silva et al., 2016). To this end, an automatic method for generating locomotion primitives for diverse tasks becomes important. Allocating computing resources efficiently among different planning layers is also an essential topic. A mission planner will be needed to operate at a more abstract level to make decisions on task allocation, coordination, and synchronization. A key problem is how to properly design integrated, scalable, and reactive mission and motion planners (da Silva et al., 2016) for legged robots to accomplish collaborative tasks in dynamic and unstructured environments.

At the individual robot level, our motion planner is designed for the three-dimensional case, although the demonstrated locomotion tasks are primarily straight walking. In the future, we will incorporate steering models (Zhao et al., 2017) so that the locomotion behaviors extend to complex 3D motions with steering capabilities. An advantage of our planning framework is its use of simplified models, which allow us to efficiently compose multiple locomotion modes and achieve dynamic and complex locomotion behaviors in constrained environments. The high-level symbolic planner automates this sequential composition process and guarantees the formal correctness of the overall planning framework.

One application of the proposed whole-body dynamic locomotion methodology in constrained environments is the Subterranean Challenge (DARPA, 2018), created by the US Defense Advanced Research Projects Agency (DARPA) to augment underground operation capabilities. “The Challenge aims to explore new approaches to rapidly map, navigate, and search underground environments … and propose novel methods for tackling time-critical scenarios through unknown courses in mapping subsurface networks and unpredictable conditions, which are too hazardous for human first responders.” Our proposed hierarchical decision-making approach for whole-body dynamic locomotion in constrained environments underscores the importance of decision-making algorithms with formal guarantees for robots as complex as humanoid robots, a research topic of increasing importance as these robots begin to move out of laboratories and work outdoors.

8.4. Planning horizon

Making planner decisions with a sufficiently long predictive horizon has great potential to enable intelligent and robust behaviors in complex and dynamically changing environments (Egerstedt et al., 2018). Our task planner has a one-walking-step horizon and may therefore make myopic locomotion decisions. For instance, if the disturbance is so large that the robot cannot recover within one walking step, our planner executes a replanning process. A natural alternative is to design a recovery strategy over the next several walking steps, as is common in the recovery process of human locomotion. The downside of this strategy is the increase in computational complexity. Our planning process substantially relieves this computational burden by using simplified locomotion models. In addition to this computational consideration, the design of the planning horizon should take into account the versatility of the locomotion behaviors. For instance, if the robot moves at high speed, being able to make predictions over a longer horizon will be advantageous. Overall, both the available computational power and the desired behavior versatility should be considered when designing the planning horizon. Algorithm 2 is designed in a general form and should be extendable to the multiple-walking-step scenario.
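
The trade-off between recovery horizon and computational cost can be made concrete with a toy sketch. It rests on two simplifying assumptions that are not taken from the paper: each walking step can absorb at most a fixed amount of deviation (`per_step_capacity`), and the planner chooses among a fixed number of symbolic actions per step. The function names are hypothetical.

```python
import math
from typing import Optional


def steps_to_recover(deviation: float, per_step_capacity: float,
                     max_horizon: int) -> Optional[int]:
    """Smallest number of walking steps that absorbs `deviation`.

    Returns None when recovery does not fit within `max_horizon` steps,
    which is the case where a planner would have to replan instead.
    Assumption: every step absorbs at most `per_step_capacity`.
    """
    if per_step_capacity <= 0:
        raise ValueError("per_step_capacity must be positive")
    needed = max(1, math.ceil(deviation / per_step_capacity))
    return needed if needed <= max_horizon else None


def num_action_sequences(num_actions: int, horizon: int) -> int:
    """Number of candidate action sequences over `horizon` steps."""
    return num_actions ** horizon


# A deviation that a one-step horizon cannot handle...
one_step = steps_to_recover(0.12, per_step_capacity=0.05, max_horizon=1)
# ...but a three-step horizon can, at the cost of more candidates.
three_step = steps_to_recover(0.12, per_step_capacity=0.05, max_horizon=3)
candidates_1 = num_action_sequences(num_actions=4, horizon=1)
candidates_3 = num_action_sequences(num_actions=4, horizon=3)
```

With these assumed values, the one-step horizon returns `None` (replanning), whereas the three-step horizon recovers in three steps while the number of candidate action sequences grows from 4 to 64. The exponential growth in the horizon is the computational burden that simplified locomotion models are meant to keep manageable.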

9. Conclusions

This paper employs temporal-logic-based formal methods to synthesize a high-level reactive task planner and designs a middle-level discrete controller to achieve a one-walking-step robust reachability process for complex WBDL behaviors in constrained environments. A particular focus has been given to (i) the robustness of the keyframe state reachability with respect to bounded disturbances and (ii) the correctness of the top-down hierarchy from the high-level task planner to the low-level motion planner.

A diverse set of locomotion models is devised at the low level to form a template library in response to various environmental events, including adversarial ones such as cracked terrain, human appearance, and narrow passages. These adversarial events require specifically designed locomotion modes to enable the desired locomotion behaviors. In contrast to the many existing studies that use a single simplified model for a specific locomotion task, our symbolic task planner focuses on integrating and unifying a variety of simplified models and achieves complex locomotion behaviors via sequential composition of trajectories. A key novelty of this task planner lies in solving the traditional contact planning problem via a two-player game. Contact decisions are determined according to the synthesized switching protocol in response to possibly adversarial environment actions.

As for the reachability control under disturbances, we propose a robust metric of the keyframe state and use it to design a robust finite transition system realized by the underlying reachability synthesis. The proposed task and motion planner is validated through simulations of WBDL maneuvers in constrained environments. The performance of the reachability control is benchmarked via a series of synthesis and execution tests. We expect this line of work to serve as an entry point for the locomotion community to employ formal methods to verify and synthesize planners and controllers for legged and humanoid robots (Hereid et al., 2016; Kuindersma et al., 2016; Ramezani et al., 2014). Evaluating the proposed framework on dynamic bipedal robots is a high priority for our future work.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by the NSF [grants #1924978, #1724360, #1652113] and the Office of Naval Research (ONR) [grant #N000141512507], and partially supported by NSERC of Canada, the Canada Research Chairs program, and an Ontario MRIS Early Researcher Award.

ORCID iDs

Ye Zhao

Luis Sentis

References

Alexander (1984) The gaits of bipedal and quadrupedal animals. The International Journal of Robotics Research 3(2): 49–59.

Alonso-Mora, DeCastro, Raman, et al. (2018) Reactive mission and motion planning with deadlock resolution avoiding dynamic obstacles. Autonomous Robots 42(4): 801–824.

Alur, Henzinger, Lafferriere, et al. (2000) Discrete abstractions of hybrid systems. Proceedings of the IEEE 88(7): 971–984.

Ames, Tabuada, Schürmann, et al. (2015) First steps toward formal controller synthesis for bipedal robots. In: International Conference on Hybrid Systems: Computation and Control, pp. 209–218.

Antoniotti and Mishra (1995) Discrete event models + temporal logic = supervisory controller: Automatic synthesis of locomotion controllers. In: IEEE-RAS International Conference on Robotics and Automation, pp. 1441–1446.

Arslan and Saranli (2012) Reactive planning and control of planar spring-mass running on rough terrain. IEEE Transactions on Robotics 28(3): 567–579.

Audren, Vaillant, Kheddar, et al. (2014) Model preview control in multi-contact motion-application to a humanoid robot. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4030–4035.

Baier and Katoen J-P (2008) Principles of Model Checking. Cambridge: MIT Press.

Belta, Yordanov and Gol (2017) Formal Methods for Discrete-Time Dynamical Systems. New York: Springer, Vol. 89.

Bertram, Ruina, Cannon, et al. (1999) A point-mass model of gibbon locomotion. Journal of Experimental Biology 202(19): 2609–2617.

Bhatia, Kavraki and Vardi (2010) Motion planning with hybrid dynamics and temporal goals. In: IEEE Conference on Decision and Control, pp. 1108–1115.

Bloem, Jobstmann, Piterman, et al. (2012) Synthesis of Reactive(1) designs. Journal of Computer and System Sciences 78(3): 911–938.

Bouyarmane and Kheddar (2011) Multi-contact stances planning for multiple agents. In: IEEE International Conference on Robotics and Automation, pp. 5246–5253.

Bretl (2006) Motion planning of multi-limbed robots subject to equilibrium constraints: The free-climbing robot problem. The International Journal of Robotics Research 25(4): 317–342.

Burridge, Rizzi and Koditschek (1999) Sequential composition of dynamically dexterous robot behaviors. The International Journal of Robotics Research 18(6): 534–555.

Campbell, Egerstedt, How, et al. (2010) Autonomous driving in urban environments: approaches, lessons and challenges. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 368(1928): 4649–4672.

Caron and Kheddar (2016) Multi-contact walking pattern generation based on model preview control of 3D CoM accelerations. In: IEEE-RAS International Conference on Humanoid Robots, pp. 550–557.

Caron, Pham Q-C and Nakamura (2015) ZMP support areas for multi-contact mobility under frictional constraints. arXiv preprint arXiv:1510.03232.

Chinchali, Livingston, Topcu, et al. (2012) Towards formal synthesis of reactive controllers for dexterous robotic manipulation. In: IEEE-RAS International Conference on Robotics and Automation, pp. 5183–5189.

Chung S-Y and Khatib (2015) Contact-consistent elastic strips for multi-contact locomotion planning of humanoid robots. In: IEEE-RAS International Conference on Robotics and Automation, pp. 6289–6294.

da Silva, Dai, et al. (2016) Combined top-down and bottom-up approaches to performance-guaranteed integrated task and motion planning of cooperative multi-agent systems. arXiv preprint arXiv:1607.07797.

Dantam, Kingston, Chaudhuri, et al. (2018) An incremental constraint-based framework for task and motion planning. The International Journal of Robotics Research 37: 1134–1151. DOI: 10.1177/0278364918761570.

Dantam, Kingston, Chaudhuri, et al. (2016) Incremental task and motion planning: a constraint-based approach. In: Proceedings of Robotics: Science and Systems. Ann Arbor, Michigan.

DARPA (2018) DARPA Subterranean Challenge. https://www.darpa.mil/program/darpa-subterranean-challenge.

Koditschek (2015) The Penn Jerboa: a platform for exploring parallel composition of templates. arXiv preprint arXiv:1502.05347.

DeCastro, Alonso-Mora, Raman, et al. (2015) Collision-free reactive mission and motion planning for multi-robot systems. In: International Symposium on Robotics Research.

DeCastro and Kress-Gazit (2015) Synthesis of nonlinear continuous controllers for verifiably correct high-level, reactive behaviors. The International Journal of Robotics Research 34(3): 378–394.

Deshmukh, Donzé, Ghosh, et al. (2015) Robust online monitoring of signal temporal logic. In: Runtime Verification. New York: Springer, pp. 55–70.

Donzé and Maler (2010) Robust satisfaction of temporal logic over real-valued signals. In: International Conference on Formal Modeling and Analysis of Timed Systems. Springer, pp. 92–106.

Duperret and Koditschek (2020) Towards reactive control of transitional legged robot maneuvers. In: Robotics Research. New York: Springer, pp. 145–162.

Egerstedt, Pauli, Notomista, et al. (2018) Robot ecology: constraint-based control design for long duration autonomy. Annual Reviews in Control.

Englsberger, Ott and Albu-Schaffer (2015) Three-dimensional bipedal walking control based on divergent component of motion. IEEE Transactions on Robotics 31(2): 355–368.

Fainekos and Pappas (2009) Robustness of temporal logic specifications for continuous-time signals. Theoretical Computer Science 410(42): 4262–4291.

Farahani, Raman and Murray (2015) Robust model predictive control for signal temporal logic synthesis. IFAC-PapersOnLine 48(27): 323–328.

Feng, Wiltsche, Humphrey, et al. (2015) Controller synthesis for autonomous systems interacting with human operators. In: Proceedings of the ACM/IEEE Sixth International Conference on Cyber-Physical Systems, pp. 70–79. ACM.

Fu and Topcu (2014) Probably approximately correct MDP learning and control with temporal logic constraints. In: Proceedings of Robotics: Science and Systems. Berkeley, CA.

Topcu (2016) Synthesis of joint control and active sensing strategies under temporal logic constraints. IEEE Transactions on Automatic Control 61(11): 3464–3476.

Full and Koditschek (1999) Templates and anchors: neuromechanical hypotheses of legged locomotion on land. Journal of Experimental Biology 202(23): 3325–3332.

Hauser (2014) Fast interpolation and time-optimization with contact. The International Journal of Robotics Research 33(9): 1231–1250.

Hauser, Bretl and Latombe J-C (2005) Non-gaited humanoid locomotion planning. In: IEEE-RAS International Conference on Humanoid Robots, pp. 7–12.

Lahijanian, Kavraki, et al. (2015) Towards manipulation planning with temporal logic specifications. In: IEEE-RAS International Conference on Robotics and Automation, pp. 346–352.

Lahijanian, Kavraki, et al. (2017) Reactive synthesis for finite tasks under resource constraints. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5326–5332.

Hereid, Cousineau, Hubicki, et al. (2016) 3D dynamic walking with underactuated humanoid robots: a direct collocation framework for optimizing hybrid zero dynamics. In: IEEE International Conference on Robotics and Automation, pp. 1447–1454.

Jaulin (2001) Applied Interval Analysis: With Examples in Parameter and State Estimation, Robust Control and Robotics. Germany: Springer Science & Business Media, Vol. 1.

Kaelbling and Lozano-Pérez (2011) Hierarchical task and motion planning in the now. In: IEEE International Conference on Robotics and Automation, pp. 1470–1477.

Kajita, Kanehiro, Kaneko, et al. (2001) The 3D linear inverted pendulum mode: a simple modeling for a biped walking pattern generation. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, pp. 239–246.

Kim, Zhao, Thomas, et al. (2016) Stabilizing series-elastic point-foot bipeds using whole-body operational space control. IEEE Transactions on Robotics 32(6): 1362–1379.

Kloetzer and Belta (2010) Automatic deployment of distributed teams of robots from temporal logic motion specifications. IEEE Transactions on Robotics 26(1): 48–61.

Kress-Gazit, Fainekos and Pappas (2009) Temporal-logic-based reactive mission and motion planning. IEEE Transactions on Robotics 25(6): 1370–1381.

Kress-Gazit, Wongpiromsarn and Topcu (2011) Correct, reactive, high-level robot control. IEEE Robotics & Automation Magazine 18(3): 65–74.

Kudruss, Naveau, Stasse, et al. (2015) Optimal control for whole-body motion generation using center-of-mass dynamics for predefined multi-contact configurations.

Kuindersma, Deits, Fallon, et al. (2016) Optimization-based locomotion planning, estimation, and control design for the Atlas humanoid robot. Autonomous Robots 40(3): 429–455.

Liu (2018) ROCS: a robustly complete control synthesis tool for nonlinear dynamical systems. In: Proceedings of the International Conference on Hybrid Systems: Computation and Control, pp. 130–135.

Liberzon (2012) Switching in Systems and Control. Boston: Springer Science & Business Media.

Liu (2017) Robust abstractions for control synthesis: completeness via robustness for linear-time properties. In: Proceedings of the International Conference on Hybrid Systems: Computation and Control, pp. 101–110. ACM.

Liu and Ozay (2014) Abstraction, discretization, and robustness in temporal logic control of dynamical systems. In: Proceedings of the International Conference on Hybrid Systems: Computation and Control, pp. 293–302.

Liu and Ozay (2016) Finite abstractions with robustness margins for temporal logic-based control synthesis. Nonlinear Analysis: Hybrid Systems 22: 1–15.

Liu, Ozay, Topcu, et al. (2013) Synthesis of reactive switching protocols from temporal logic specifications. IEEE Transactions on Automatic Control 58(7): 1771–1785.

Liu, Topcu, Ozay, et al. (2012) Synthesis of reactive control protocols for differentially flat systems. In: IEEE Conference on Decision and Control.

Luo, Zhao, Kim, et al. (2017) Locomotion control of three dimensional passive-foot biped robot based on whole body operational space framework. IEEE International Conference on Robotics and Biomimetics 26: 28.

Majumdar, Render and Tabuada (2011) Robust discrete synthesis against unspecified disturbances. In: Proceedings of the International Conference on Hybrid Systems: Computation and Control, pp. 211–220. ACM.

Maniatopoulos, Schillinger, Pong, et al. (2016) Reactive high-level behavior synthesis for an Atlas humanoid robot. In: IEEE-RAS International Conference on Robotics and Automation, pp. 4192–4199.

Nilsson and Ozay (2014) Incremental synthesis of switching protocols via abstraction refinement. In: IEEE Conference on Decision and Control, pp. 6246–6253.

Peng, Abbeel, Levine, et al. (2018) DeepMimic: Example-guided deep reinforcement learning of physics-based character skills. arXiv preprint arXiv:1804.02717.

Piovan and Byl (2015) Reachability-based control for the active SLIP model. The International Journal of Robotics Research 34(3): 270–287.

Plaku, Kavraki and Vardi (2010) Motion planning with dynamics by a synergistic combination of layers of planning. IEEE Transactions on Robotics 26(3): 469–482.

Platt, Fagg and Grupen (2004) Manipulation gaits: Sequences of grasp control tasks. In: IEEE International Conference on Robotics and Automation, pp. 801–806.

Posa, Kuindersma and Tedrake (2016) Optimization and stabilization of trajectories for constrained dynamical systems. In: IEEE International Conference on Robotics and Automation, pp. 1366–1373.

Raibert (1986) Legged Robots that Balance. Cambridge: MIT Press.

Raman, Donzé, Sadigh, et al. (2015) Reactive synthesis from signal temporal logic specifications. In: Proceedings of the International Conference on Hybrid Systems: Computation and Control, pp. 239–248. ACM.

Ramezani, Hurst, Hamed, et al. (2014) Performance analysis and feedback control of ATRIAS, a three-dimensional bipedal robot. Journal of Dynamic Systems, Measurement, and Control 136(2): 021012.

Sadigh and Kapoor (2016) Safe control under uncertainty with probabilistic signal temporal logic. In: Proceedings of Robotics: Science and Systems. Ann Arbor, Michigan.

Sadraddini and Belta (2015) Robust temporal logic model predictive control. In: Annual Allerton Conference on Communication, Control, and Computing, pp. 772–779.

Sentis, Park J and Khatib (2010a) Compliant control of multicontact and center-of-mass behaviors in humanoid robots. IEEE Transactions on Robotics 26(3): 483–501.

Sentis, Park J and Khatib (2010b) Compliant control of multicontact and center-of-mass behaviors in humanoid robots. IEEE Transactions on Robotics 26(3): 483–501.

Sharan (2014) Formal Methods for Control Synthesis in Partially Observed Environments: Application to Autonomous Robotic Manipulation. PhD thesis. Pasadena: California Institute of Technology.

Sreenath, Hill Jr and Kumar (2013) A partially observable hybrid system model for bipedal locomotion for adapting to terrain variations. In: Proceedings of the International Conference on Hybrid Systems: Computation and Control, pp. 137–142.

Srivastava, Fang, Riano, et al. (2014) Combined task and motion planning through an extensible planner-independent interface layer. In: 2014 IEEE International Conference on Robotics and Automation, pp. 639–646.

Tabuada (2009) Verification and Control of Hybrid Systems: A Symbolic Approach. New York: Springer Science & Business Media.

Topcu, Ozay, Liu, et al. (2012) On synthesizing robust discrete controllers under modeling uncertainty. In: Proceedings of the International Conference on Hybrid Systems: Computation and Control, pp. 85–94.

Toussaint, Allen, Smith, et al. (2018) Differentiable physics and stable modes for tool-use and manipulation planning. In: Proceedings of Robotics: Science and Systems. Pittsburgh, PA.

Wongpiromsarn, Topcu and Murray (2012) Receding horizon temporal logic planning. IEEE Transactions on Automatic Control 57(11): 2817–2830.

Wongpiromsarn, Topcu, Ozay, et al. (2011) TuLiP: a software toolbox for receding horizon temporal logic planning. In: Proceedings of the International Conference on Hybrid Systems: Computation and Control, pp. 313–314.

Grizzle, Tabuada, et al. (2018) Correctness guarantees for the composition of lane keeping and adaptive cruise control. IEEE Transactions on Automation Science and Engineering 15(3): 1216–1229.

Zhao, Fernandez and Sentis (2016a) Robust phase-space planning for agile legged locomotion over various terrain topologies. In: Proceedings of Robotics: Science and Systems. Ann Arbor, Michigan.

Zhao, Fernandez and Sentis (2017) Robust optimal planning and control of non-periodic bipedal locomotion with a centroidal momentum model. The International Journal of Robotics Research 36(11): 1211–1242.

Zhao and Sentis (2012) A three dimensional foot placement planner for locomotion in very rough terrains. In: IEEE-RAS International Conference on Humanoid Robots, pp. 726–733.

Zhao, Topcu and Sentis (2016b) High-level planner synthesis for whole-body locomotion in unstructured environments. In: IEEE Conference on Decision and Control, pp. 6557–6564.