Most tasks faced by a robot require more than a single “action.” To cook a meal, a robot must follow the many steps of a recipe; to navigate a building, it must explore many rooms, find feasible unblocked paths, open doors, and so on; construction tasks have many interlocking dependencies (e.g., insert peg A into hole B before driving screw C into socket D).
Critical to solving these kinds of tasks is some abstraction over the set of actions a robot can take, allowing methods to reason at a higher level (i.e., rather than thinking about joint angles throughout an entire motion, thinking at the level of “pick up object A” and “place object B on object C”). Solving these problems effectively requires designing or automatically finding suitable abstractions for a problem (do you assume the set of actions the robot can perform is known, or must it be discovered?); finding ways to provide feedback through the abstraction (e.g., what happens when you cannot find a motion to pick up an object? Do you never try to pick it up again? Is another object blocking it, or is something else wrong? How do you discover this and inform the search?); and efficiently searching over the combinatorial explosion of options, which becomes infinite once continuous parameters enter.
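To make this concrete, below is a minimal sketch of the plan-then-ground loop that underlies much of the work on this page. Everything here is illustrative: `symbolic_plan` and `plan_motion` are hypothetical stand-ins for a task planner and a motion planner, and the feedback mechanism (forbidding failed groundings) is just one of the strategies discussed above.

```python
from typing import Callable, Optional

def plan_then_ground(
    symbolic_plan: Callable,  # (goal, forbidden) -> Optional[list] of actions
    plan_motion: Callable,    # (state, action) -> Optional[trajectory]
    state,                    # must support .key() and .apply(action)
    goal,
    max_rounds: int = 100,
) -> Optional[list]:
    """Alternate symbolic search and motion-level grounding, feeding
    grounding failures back as constraints on the next symbolic plan."""
    forbidden = set()  # (action, state-key) pairs that failed to ground
    for _ in range(max_rounds):
        plan = symbolic_plan(goal, forbidden)
        if plan is None:
            return None  # no symbolic plan exists under current constraints
        trajectories, current = [], state
        for action in plan:  # e.g., ("pick", "A") or ("place", "B", "C")
            traj = plan_motion(current, action)
            if traj is None:
                # Feedback: record the failure so symbolic search avoids
                # proposing the same infeasible step from the same state.
                forbidden.add((action, current.key()))
                break
            trajectories.append(traj)
            current = current.apply(action)
        else:
            return trajectories  # every action grounded successfully
    return None
```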
2024
arXiv
CaStL: Constraints as Specifications through LLM Translation for Long-Horizon Task and Motion Planning
Large Language Models (LLMs) have demonstrated remarkable ability in long-horizon Task and Motion Planning (TAMP) by translating clear and straightforward natural-language problems into formal specifications such as the Planning Domain Definition Language (PDDL). However, real-world problems are often ambiguous and involve many complex constraints. In this paper, we introduce Constraints as Specifications through LLMs (CaStL), a framework that identifies constraints such as goal conditions, action ordering, and action blocking from natural language in multiple stages. CaStL translates these constraints into PDDL and Python scripts, which are solved using a custom PDDL solver. Tested across three PDDL domains, CaStL significantly improves constraint handling and planning success rates from natural-language specifications in complex scenarios.
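CaStL's prompts and pipeline are not reproduced here, but the multi-stage idea can be sketched as follows. The prompt strings, the generic `llm` callable, and the PDDL rendering are all assumptions for illustration, not the paper's actual implementation.

```python
from typing import Callable

# One focused query per constraint type, rather than one monolithic prompt.
CONSTRAINT_STAGES = {
    "goal":     "List the goal conditions, one PDDL literal per line: {task}",
    "ordering": "List required action orderings in this request: {task}",
    "blocking": "List actions that must never occur in this request: {task}",
}

def extract_constraints(task: str, llm: Callable[[str], str]) -> dict:
    """Stage-by-stage constraint extraction from a natural-language task."""
    return {kind: llm(prompt.format(task=task))
            for kind, prompt in CONSTRAINT_STAGES.items()}

def to_pddl_goal(literals: list) -> str:
    """Render extracted goal literals as a PDDL goal conjunction."""
    return "(:goal (and {}))".format(" ".join(literals))

# e.g., to_pddl_goal(["(on blockA blockB)", "(clear blockA)"]) returns
# "(:goal (and (on blockA blockB) (clear blockA)))"
```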
Long-horizon task planning is important for robot autonomy, especially as a subroutine for frameworks such as Integrated Task and Motion Planning. However, task planning is computationally challenging and struggles to scale to realistic problem settings. We propose to accelerate task planning over an agent’s lifetime by integrating learned abstract strategies: a generalizable planning experience encoding introduced in earlier work. In this work, we contribute a practical approach to planning with strategies by introducing a novel formalism of planning in a skill-augmented domain. We also introduce and formulate the notion of a skill’s affordance, which indicates its predicted benefit to the solution, and use it to guide the planning and skill grounding processes. Together, our observations yield an affordance-directed, lazy-search planning algorithm, which can seamlessly compose strategies and actions to solve long-horizon planning problems. We evaluate our planner in an object rearrangement domain, where we demonstrate performance benefits relative to a state-of-the-art task planner.
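The affordance idea can be sketched as a lazy best-first search: candidate skills are queued by predicted benefit, and the expensive grounding check is deferred until a candidate is actually popped. The interfaces below (`expand`, `affordance`, `ground`) are hypothetical, and the sketch omits the strategy-composition machinery of the paper.

```python
import heapq
from typing import Callable, Optional

def affordance_directed_search(
    start,                 # initial abstract state (hashable)
    goal_test: Callable,   # state -> bool
    expand: Callable,      # state -> iterable of (skill, next_state)
    affordance: Callable,  # (skill, state) -> float, predicted benefit
    ground: Callable,      # (skill, state) -> bool, expensive feasibility check
) -> Optional[list]:
    """Lazy best-first search: grounding is paid only when a node is popped."""
    frontier = [(0.0, 0, None, None, start, [])]
    counter, seen = 1, set()
    while frontier:
        _, _, skill, parent, state, plan = heapq.heappop(frontier)
        if skill is not None and not ground(skill, parent):
            continue  # grounding failed; drop this branch
        if state in seen:
            continue
        seen.add(state)
        plan = plan if skill is None else plan + [skill]
        if goal_test(state):
            return plan
        for nxt_skill, nxt_state in expand(state):
            # Higher affordance -> smaller key -> popped (and grounded) sooner.
            heapq.heappush(frontier, (-affordance(nxt_skill, state), counter,
                                      nxt_skill, state, nxt_state, plan))
            counter += 1
    return None
```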
Rearrangement puzzles are variations of rearrangement problems in which the elements of a problem are potentially logically linked together. To efficiently solve such puzzles, we develop a motion planning approach based on a new state space that is logically factored, integrating the capabilities of the robot through factors of simultaneously manipulatable joints of an object. Based on this factored state space, we propose less-actions RRT (LA-RRT), a planner which optimizes for a low number of actions to solve a puzzle. At the core of our approach lies a new path defragmentation method, which rearranges and optimizes consecutive edges to minimize action cost. We solve six rearrangement scenarios with a Fetch robot, involving planar table puzzles and an escape room scenario. LA-RRT outperforms the next best asymptotically-optimal planner by a factor of 4.01 to 6.58 in final action cost.
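The merging half of path defragmentation is easy to illustrate (the paper also reorders commuting edges so that same-factor edges become adjacent; that step is omitted here, and the edge representation is a made-up one).

```python
def defragment(path):
    """Merge consecutive path edges that actuate the same factor (the same
    set of simultaneously manipulatable joints), so the plan needs fewer
    discrete actions. `path` is a list of (factor, target_config) edges,
    where `factor` is any hashable id for the joints an edge moves."""
    merged = []
    for factor, config in path:
        if merged and merged[-1][0] == factor:
            merged[-1] = (factor, config)  # extend the previous action
        else:
            merged.append((factor, config))
    return merged

# Three edges collapse into two actions, since the first pair share a factor:
assert defragment([("arm", 1), ("arm", 2), ("base", 3)]) \
    == [("arm", 2), ("base", 3)]
```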
3D object reconfiguration encompasses common robot manipulation tasks in which a set of objects must be moved through a series of physically feasible state changes into a desired final configuration. Object reconfiguration is challenging to solve in general, as it requires efficient reasoning about environment physics that determine action validity. This information is typically manually encoded in an explicit transition system. Constructing these explicit encodings is tedious and error-prone, and is often a bottleneck for planner use. In this work, we explore embedding a physics simulator within a motion planner to implicitly discover and specify the valid actions from any state, removing the need for manual specification of action semantics. Our experiments demonstrate that the resulting simulation-based planner can effectively produce physically valid rearrangement trajectories for a range of 3D object reconfiguration problems without requiring more than an environment description and start and goal arrangements.
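The core idea is small enough to sketch: hide the simulator behind save/restore/step calls and let rollouts decide action validity. The `sim` interface below is hypothetical; the paper's environment description and validity criteria are richer.

```python
class SimulatedTransitions:
    """Answer "what happens if I apply this action from this state?" by
    rolling out a physics simulator, instead of consulting a hand-written
    transition system."""

    def __init__(self, sim):
        self.sim = sim  # assumed to support save/restore of full state

    def try_action(self, state, action, steps=240):
        """Return the resulting state if the rollout stays physically
        valid, or None if the action is implicitly invalid here."""
        self.sim.restore(state)
        self.sim.apply(action)
        for _ in range(steps):
            self.sim.step()
            if self.sim.in_violation():  # e.g., dropped or clipping objects
                return None
        return self.sim.save()
```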
Many methods that solve robot planning problems, such as task and motion planners, employ discrete symbolic search to find sequences of valid symbolic actions that are grounded with motion planning. Much of the efficacy of these planners lies in this grounding—bad placement and grasp choices can lead to inefficient planning when a problem has many geometric constraints. Moreover, grounding methods such as naïve sampling often fail to find appropriate values for these choices in the presence of clutter. Towards efficient task and motion planning, we present a novel optimization-based approach for grounding to solve cluttered problems that have many constraints that arise from geometry. Our approach finds an optimal grounding and can provide feedback to discrete search for more effective planning. We demonstrate our method against baseline methods in complex simulated environments.
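A toy version of optimization-based grounding, with `clearance` and `reachability` as problem-specific stand-ins (the paper's actual formulation and solver are not shown here): instead of rejection-sampling placements, descend on a cost that softly penalizes penetration, and report failure so the discrete search can be informed.

```python
import numpy as np
from scipy.optimize import minimize

def ground_placement(x0, clearance, reachability, alpha=10.0):
    """Optimize a placement parameter vector (e.g., x, y, yaw).

    clearance(x)    -> signed distance to nearest obstacle (< 0 = collision)
    reachability(x) -> cost of reaching the pose (e.g., an IK residual)
    """
    def cost(x):
        # Penetration is penalized smoothly, so infeasible iterates still
        # provide signal -- unlike naive sampling, which just rejects them.
        return reachability(x) + alpha * max(0.0, -clearance(x)) ** 2

    result = minimize(cost, np.asarray(x0, dtype=float), method="Nelder-Mead")
    x = result.x
    # Returning None is the hook for feedback to the discrete search.
    return x if clearance(x) >= 0.0 else None
```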
Robotic manipulation is inherently continuous, but typically has an underlying discrete structure, such as whether an object is grasped. Many problems like these are multi-modal, such as pick-and-place tasks where every object grasp and placement is a mode. Multi-modal problems require finding a sequence of transitions between modes, for example, a particular sequence of object picks and placements. However, many multi-modal planners fail to scale when motion planning is difficult (e.g., in clutter) or the task has a long horizon (e.g., rearrangement). This work presents solutions for multi-modal scalability in both these areas. For motion planning, we present an experience-based planning framework, ALEF, which reuses experience from similar modes both online and from training data. For task satisfaction, we present a layered planning approach that uses a discrete lead to bias search towards useful mode transitions, informed by weights over those transitions. Together, these contributions enable multi-modal planners to tackle complex manipulation tasks that were previously infeasible or inefficient, and provide significant improvements in scenes with high-dimensional robots.
Many robotic manipulation problems are multi-modal—they consist of a discrete set of mode families (e.g., whether an object is grasped or placed), each with a continuum of parameters (e.g., where exactly an object is grasped). Core to these problems is solving single-mode motion plans, i.e., given a mode from a mode family (e.g., a specific grasp), find a feasible motion to transition to the next desired mode. Many planners for such problems have been proposed, but complex manipulation plans may require prohibitively long computation times due to the difficulty of solving these underlying single-mode problems. It has been shown that using experience from similar planning queries can significantly improve the efficiency of motion planning. However, even though modes from the same family are similar, they impose different constraints on the planning problem, and thus experience gained in one mode cannot be directly applied to another. We present a new experience-based framework, ALEF, for such multi-modal planning problems. ALEF learns from paths that solve single-mode problems within a mode family and applies this experience to novel modes from the same family. We evaluate ALEF on a variety of challenging problems and show a significant improvement in the efficiency of sampling-based planners both in isolation and within a multi-modal manipulation planner.
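The retrieval idea behind ALEF can be caricatured in a few lines: paths solved under one mode bias the sampler in a novel, similar mode from the same family. The similarity metric, reuse probability, and storage scheme below are placeholders for illustration, not ALEF's actual machinery.

```python
import random

class ModeFamilyExperience:
    """Toy per-mode-family experience store (e.g., all grasps of one object)."""

    def __init__(self):
        self.paths = []  # (mode_params, [waypoint, ...]) from solved queries

    def record(self, mode_params, path):
        self.paths.append((mode_params, path))

    def biased_sample(self, mode_params, uniform_sample, similarity,
                      p_reuse=0.5):
        """With probability p_reuse, return a waypoint from the most similar
        past mode; the new mode's constraints may still reject it, in which
        case the planner simply falls back to further samples."""
        if self.paths and random.random() < p_reuse:
            _, path = max(self.paths,
                          key=lambda entry: similarity(entry[0], mode_params))
            return random.choice(path)
        return uniform_sample()
```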
Robots have begun operating and collaborating with humans in industrial and social settings. This collaboration introduces challenges: the robot must plan while taking the human’s actions into account. In prior work, the problem was posed as a 2-player deterministic game with a limited number of human moves. The limit on human moves is unintuitive, and in many settings determinism is undesirable. In this paper, we present a novel planning method for collaborative human-robot manipulation tasks via probabilistic synthesis. We introduce a probabilistic manipulation domain that captures the interaction by allowing for both robot and human actions, with states that represent the configurations of the objects in the workspace. The task is specified using Linear Temporal Logic over finite traces (LTLf). We then transform our manipulation domain into a Markov Decision Process (MDP) and synthesize an optimal policy to satisfy the specification on this MDP. We present two novel contributions: a formalization of probabilistic manipulation domains that allows us to apply existing techniques, and a comparison of different encodings of these domains. Our framework is validated on a physical UR5 robot.
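Once the manipulation domain and the LTLf specification are composed into a product MDP (construction omitted), synthesizing a policy reduces to a standard computation: the maximal probability of reaching an accepting state. A minimal value-iteration sketch, with a dictionary-based transition encoding assumed for illustration:

```python
def max_reach_probability(states, actions, T, accepting, eps=1e-8):
    """Value iteration for max probability of reaching `accepting` states.

    actions(s) -> iterable of actions enabled in state s
    T[(s, a)]  -> list of (next_state, probability) pairs
    """
    v = {s: (1.0 if s in accepting else 0.0) for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s in accepting:
                continue  # accepting states are absorbing with value 1
            best = max((sum(p * v[s2] for s2, p in T[(s, a)])
                        for a in actions(s)), default=0.0)
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < eps:
            return v  # greedy action at each state yields an optimal policy
```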
Robotic manipulation problems are inherently continuous, but typically have underlying discrete structure, e.g., whether or not an object is grasped. This means many problems are multi-modal and in particular have a continuous infinity of modes. For example, in a pick-and-place manipulation domain, every grasp and placement of an object is a mode. Usually manipulation problems require the robot to transition into different modes, e.g., going from a mode with an object placed to another mode with the object grasped. To successfully find a manipulation plan, a planner must find a sequence of valid single-mode motions as well as valid transitions between these modes. Many manipulation planners have been proposed to solve tasks with multi-modal structure. However, these methods require mode-specific planners and fail to scale to very cluttered environments or to tasks that require long sequences of transitions. This paper presents a general layered planning approach to multi-modal planning that uses a discrete "lead" to bias search towards useful mode transitions. The difficulty of achieving specific mode transitions is captured online and used to bias search towards more promising sequences of modes. We demonstrate our planner on complex scenes and show that significant performance improvements are tied to both our discrete "lead" and our continuous representation.
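The discrete “lead” and its online weights can be sketched as a weighted shortest path over the mode-transition graph, where a failed grounding bumps the weight of the offending transition so later leads avoid it. The additive penalty below is a simplification chosen for brevity, not the paper's update rule.

```python
import heapq
from collections import defaultdict

class LeadPlanner:
    def __init__(self, graph):
        self.graph = graph                      # mode -> list of next modes
        self.weight = defaultdict(lambda: 1.0)  # (mode, next_mode) -> cost

    def lead(self, start, goal):
        """Dijkstra over mode transitions, biased by the learned weights."""
        dist, prev = {start: 0.0}, {}
        frontier, counter = [(0.0, 0, start)], 1
        while frontier:
            d, _, m = heapq.heappop(frontier)
            if m == goal:  # reconstruct the mode sequence (the "lead")
                path = [m]
                while path[-1] != start:
                    path.append(prev[path[-1]])
                return path[::-1]
            if d > dist.get(m, float("inf")):
                continue  # stale queue entry
            for n in self.graph[m]:
                nd = d + self.weight[(m, n)]
                if nd < dist.get(n, float("inf")):
                    dist[n], prev[n] = nd, m
                    heapq.heappush(frontier, (nd, counter, n))
                    counter += 1
        return None

    def report_failure(self, m, n, penalty=1.0):
        self.weight[(m, n)] += penalty  # transition proved hard to ground
```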
We present a new algorithm for task and motion planning (TMP) and discuss the requirements and abstractions necessary to obtain robust solutions for TMP in general. Our Iteratively Deepened Task and Motion Planning (IDTMP) method is probabilistically-complete and offers improved performance and generality compared to a similar, state-of-the-art, probabilistically-complete planner. The key idea of IDTMP is to leverage incremental constraint solving to efficiently add and remove constraints on motion feasibility at the task level. We validate IDTMP on a physical manipulator and evaluate scalability on scenarios with many objects and long plans, showing order-of-magnitude gains compared to the benchmark planner and a four-times self-comparison speedup from our extensions. Finally, in addition to describing a new method for TMP and its implementation on a physical robot, we also put forward requirements and abstractions for the development of similar planners in the future.
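The incremental pattern at the heart of IDTMP can be illustrated with the Z3 SMT solver's push/pop interface. This shows only the pattern, with a two-variable stand-in for the task encoding; IDTMP's actual encoding and constraint generation are far richer.

```python
from z3 import Solver, Bool, Implies, Not, sat

solver = Solver()
pick_a = Bool("pick_A_t0")    # action variables at plan steps (stand-ins)
place_a = Bool("place_A_t1")
solver.add(Implies(place_a, pick_a))  # base task encoding stays on the solver

solver.push()                 # open a retractable scope
solver.add(Not(pick_a))       # motion feedback: grasping A is infeasible now
if solver.check() == sat:
    print(solver.model())     # a task plan consistent with the feedback
solver.pop()                  # retract the constraint when it goes stale
```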