diff --git a/CHANGELOG.md b/CHANGELOG.md
index f9e37da..efa27e1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,7 +10,7 @@ its affiliates is strictly prohibited.
-->
# Changelog
-## Version 0.6.3
+## Version 0.7.0
### Changes in default behavior
- Increased default collision cache to 50 in RobotWorld.
- Changed `CSpaceConfig.position_limit_clip` default to 0 as previous default of 0.01 can make
@@ -18,6 +18,11 @@ default start state in examples be out of bounds.
- MotionGen uses parallel_finetune by default. To get previous motion gen behavior, pass
`warmup(parallel_finetune=False)` and `MotionGenPlanConfig(parallel_finetune=False)`.
- MotionGen loads Mesh Collision checker instead of Primitive by default.
+- UR10e and UR5e no longer have a collision sphere at the tool frame for world collision checking.
+This sphere is only active for self-collision avoidance.
+- With torch>=2.0, cuRobo can use `torch.compile` instead of `torch.jit.script` to generate fused
+kernels. This can take several seconds during the first run. To enable this feature, set the
+environment variable `CUROBO_TORCH_COMPILE_DISABLE=0` (e.g., `export CUROBO_TORCH_COMPILE_DISABLE=0`).
### Breaking Changes
- Renamed `copy_if_not_none` to `clone_if_not_none` to be more descriptive. Now `copy_if_not_none`
@@ -26,6 +31,9 @@ will try to copy data into reference.
parallel problems in optimization.
- Added more inputs to pose distance kernels. Check `curobolib/geom.py`.
- Pose cost `run_vec_weight` should now be `[0,0,0,0,0,0]` instead of `[1,1,1,1,1,1]`
+- ``max_distance`` is now a tensor instead of a ``float`` and is an input to collision kernels.
+- Order of inputs to ``SweptSdfMeshWarpPy`` has changed.
+
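+The `copy_if_not_none`/`clone_if_not_none` rename above distinguishes two semantics: returning a
+fresh copy versus writing into an existing buffer. A minimal language-agnostic sketch of the
+distinction (hypothetical helpers using plain lists, not cuRobo's actual signatures):
+
+```python
+def clone_if_not_none(x):
+    # Return a fresh copy of x (or pass None through), like tensor.clone().
+    return None if x is None else list(x)
+
+
+def copy_if_not_none(x, ref):
+    # Write x's data into the existing buffer `ref` in place, like
+    # tensor.copy_(); callers already holding `ref` observe the update.
+    if x is not None and ref is not None:
+        ref[:] = x
+        return ref
+    return x
+
+
+src = [1.0, 2.0]
+dst = [0.0, 0.0]
+out = copy_if_not_none(src, dst)
+print(out is dst, dst)  # → True [1.0, 2.0]
+```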
### New Features
- Add function to disable and enable collision for specific links in KinematicsTensorConfig.
@@ -37,6 +45,19 @@ calling plan_single after warmup of goalset.
add error when `velocity_scale<0.1`.
- Add experimental robot image segmentation module to enable robot removal in depth images.
- Add constrained planning mode to motion_gen.
+- Use `torch.compile` to leverage better kernel fusion in place of `torch.jit.script`.
+- Significantly improved collision computation for cuboids and meshes. The mesh collision checker
+is now only 2x slower than the cuboid checker (down from 5x). Optimization convergence is also
+improved.
+- LBFGS kernels now support ``history <= 31``, up from ``history <= 15``.
+- 2x faster LBFGS kernel that allocates up to 68kb of shared memory, preventing use on CUDA devices
+with compute capability ``<7.0``.
+- On the benchmarking dataset, planning time is now 42ms on average, down from 50ms. Higher quality
+solutions are also obtained. See [benchmarks](https://curobo.org/source/getting_started/4_benchmarks.html) for more details.
+- Add ``WorldCollisionVoxel``, a new collision checking implementation that uses a voxel grid
+of signed distances (SDF) to compute collision avoidance metrics. Documentation is coming soon;
+see ``benchmark/curobo_voxel_benchmark.py`` for an example.
+- Add API for ESDF computation from world representations, see
+``WorldCollision.get_esdf_in_bounding_box()``.
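+The voxel checker and ESDF API added above operate on a grid of signed distances. As a rough
+illustration of what an ESDF sampled over a bounding box contains, here is a pure-Python sketch
+for a single axis-aligned cuboid (sign convention here is the common graphics one, negative
+inside; cuRobo's internal convention may differ):
+
+```python
+import math
+
+
+def sdf_box(p, center, dims):
+    # Signed distance from point p to an axis-aligned box:
+    # negative inside, positive outside.
+    q = [abs(p[i] - center[i]) - dims[i] / 2 for i in range(3)]
+    outside = math.sqrt(sum(max(qi, 0.0) ** 2 for qi in q))
+    inside = min(max(q), 0.0)
+    return outside + inside
+
+
+# Sample a coarse grid of signed distances over a bounding box, analogous
+# in spirit to what an ESDF query over a voxel grid returns.
+voxel_size = 0.5
+grid = [
+    sdf_box((x * voxel_size, y * voxel_size, 0.0), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
+    for x in range(-2, 3)
+    for y in range(-2, 3)
+]
+print(sdf_box((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)))  # → -0.5
+```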
### BugFixes & Misc.
- refactored wp.index() instances to `[]` to avoid errors in future releases of warp.
@@ -47,7 +68,7 @@ BATCH_GOALSET.
- Added package data to also export `.so` files.
- Fixed bug in transforming link visual mesh offset when reading from urdf.
- Fixed bug in MotionGenPlanConfig.clone() that didn't clone the state of parallel_finetune.
-- Increased weighting from 1.0 to 5.0 for optimized_dt in TrajEvaluator to select shorter
+- Increased weighting from 1.0 to 10.0 for optimized_dt in TrajEvaluator to select shorter
trajectories.
- Improved determinism by setting global seed for random in `graph_nx.py`.
- Added option to clear obstacles in WorldPrimitiveCollision.
@@ -57,10 +78,24 @@ is enabled.
graph.
- plan_goalset will pad for extra goals when called with fewer goals than at initial creation.
- Improved API documentation for Optimizer class.
-- Improved benchmark timings, now within 15ms of results reported in technical report. Added
-numbers to benchmark [webpage](https://curobo.org/source/getting_started/4_benchmarks.html) for
-easy reference.
- Set `use_cuda_graph` to `True` as default from `None` in `MotionGenConfig.load_from_robot_config`
+- Add batched mode to robot image segmentation, supporting a single robot with multiple cameras as
+well as batched robots with batched cameras.
+- Add `log_warn` import to `arm_reacher.py`
+- Remove negative radius check in self collision kernel to allow for self collision checking with
+spheres of negative radius.
+- Added `conftest.py` to disable `torch.compile` for tests.
+- Added UR5e robot with robotiq gripper (2f-140) with improved sphere model.
+- Fix bug in aarch64.dockerfile where curobo was cloned to the wrong path.
+- Fix bug in aarch64.dockerfile where python was used instead of python3.
+- Remove unused variables in kernels.
+- Added ``pybind11`` as a dependency as some PyTorch docker images for Jetson do not have it installed.
+- Fix incorrect dimensions in ``MotionGenResult.success`` in ``MotionGen.plan_batch()`` when
+trajectory optimization fails.
+- Added unit tests for collision checking functions.
+- Fix bug in linear interpolation which was not reading the new ``optimized_dt`` to interpolate
+velocity, acceleration, and jerk.
+
### Known Bugs (WIP)
- Examples don't run in Isaac Sim 2023.1.1 due to behavior change in urdf importer.
@@ -104,11 +139,11 @@ fails.
### Performance Regressions
- cuRobo now generates significantly shorter paths than the previous version. E.g., cuRobo obtains
-2.2 seconds 98th percentile motion time on the 2600 problems (`benchmark/curobo_benchmark.py`), where
-previously it was at 3 seconds (1.36x quicker motions). This was obtained by retuning the weights and
-slight reformulations of trajectory optimization. These changes have led to a slight degrade in
-planning time, 20ms slower on 4090 and 40ms on ORIN MAXN. We will address this slow down in a later
-release. One way to avoid this regression is to set `finetune_dt_scale=1.05` in
+2.2 seconds 98th percentile motion time on the 2600 problems (`benchmark/curobo_benchmark.py`),
+where previously it was at 3 seconds (1.36x quicker motions). This was obtained by retuning the
+weights and slight reformulations of trajectory optimization. These changes have led to a slight
+degrade in planning time, 20ms slower on 4090 and 40ms on ORIN MAXN. We will address this slow down
+in a later release. One way to avoid this regression is to set `finetune_dt_scale=1.05` in
`MotionGenConfig.load_from_robot_config()`.
diff --git a/benchmark/README.md b/benchmark/README.md
index 77c4937..ce59462 100644
--- a/benchmark/README.md
+++ b/benchmark/README.md
@@ -10,49 +10,4 @@ its affiliates is strictly prohibited.
-->
This folder contains scripts to run the motion planning benchmarks.
-Refer to Benchmarks & Profiling instructions: https://curobo.org/source/getting_started/4_benchmarks.html.
-
-Results in the arxiv paper were obtained from v0.6.0.
-
-v0.6.2+ has significant changes to improve motion quality with lower motion time, lower path length, higher pose accuracy (<1mm). v0.6.2+ sacrifices 15ms (RTX 4090) of compute time to achieve signficantly better solutions. The new results are yet to be added to the technical report. For now, to get the latest benchmark results follow instructions here: https://curobo.org/source/getting_started/4_benchmarks.html.
-
-To get results similar to in the technical report, pass `--report_edition` to `curobo_benchmark.py`.
-
-
-# Latest Results (Feb 2024)
-
-Motion Generation on 2600 problems from motion benchmaker and motion policy networks, gives the
-following values on a RTX 4090:
-
-| Metric | Value |
-|-------------------|--------------------------------------------------------------|
-|Success % | 99.84 |
-|Plan Time (s) | mean: 0.068 ± 0.158 median:0.042 75%: 0.055 98%: 0.246 |
-|Motion Time (s) | mean: 1.169 ± 0.360 median:1.140 75%: 1.381 98%: 2.163 |
-|Path Length (rad.) | mean: 3.177 ± 1.072 median:3.261 75%: 3.804 98%: 5.376 |
-|Jerk | mean: 97.700 ± 48.630 median:88.993 75%: 126.092 98%: 199.540|
-|Position Error (mm)| mean: 0.119 ± 0.341 median:0.027 75%: 0.091 98%: 1.039 |
-
-
-## Motion Benchmaker (800 problems):
-
-| Metric | Value |
-|-------------------|--------------------------------------------------------------|
-|Success % | 100 |
-|Plan Time (s) | mean: 0.063 ± 0.137 median:0.042 75%: 0.044 98%: 0.206 |
-|Motion Time (s) | mean: 1.429 ± 0.330 median:1.392 75%: 1.501 98%: 2.473 |
-|Path Length (rad.) | mean: 3.956 ± 0.783 median:3.755 75%: 4.352 98%: 6.041 |
-|Jerk | mean: 67.284 ± 27.788 median:61.853 75%: 83.337 98%: 143.118 |
-|Position Error (mm)| mean: 0.079 ± 0.139 median:0.032 75%: 0.077 98%: 0.472 |
-
-
-## Motion Policy Networks (1800 problems):
-
-| Metric | Value |
-|-------------------|-----------------------------------------------------------------|
-|Success % | 99.77 |
-|Plan Time (s) | mean: 0.068 ± 0.117 median:0.042 75%: 0.059 98%: 0.243 |
-|Motion Time (s) | mean: 1.051 ± 0.306 median:1.016 75%: 1.226 98%: 1.760 |
-|Path Length (rad.) | mean: 2.829 ± 1.000 median:2.837 75%: 3.482 98%: 4.905 |
-|Jerk | mean: 110.610 ± 47.586 median:105.271 75%: 141.517 98%: 217.158 |
-|Position Error (mm)| mean: 0.122 ± 0.343 median:0.024 75%: 0.095 98%: 1.114 |
+Refer to the Benchmarks & Profiling page for the latest results: https://curobo.org/source/getting_started/4_benchmarks.html.
\ No newline at end of file
diff --git a/benchmark/curobo_benchmark.py b/benchmark/curobo_benchmark.py
index 944282b..74cd07f 100644
--- a/benchmark/curobo_benchmark.py
+++ b/benchmark/curobo_benchmark.py
@@ -45,8 +45,8 @@ torch.manual_seed(2)
torch.backends.cudnn.benchmark = True
-torch.backends.cuda.matmul.allow_tf32 = False
-torch.backends.cudnn.allow_tf32 = False
+torch.backends.cuda.matmul.allow_tf32 = True
+torch.backends.cudnn.allow_tf32 = True
np.random.seed(2)
random.seed(2)
@@ -144,6 +144,7 @@ def load_curobo(
collision_activation_distance: float = 0.02,
args=None,
parallel_finetune=False,
+ ik_seeds=None,
):
robot_cfg = load_yaml(join_path(get_robot_configs_path(), "franka.yml"))["robot_cfg"]
robot_cfg["kinematics"]["collision_sphere_buffer"] = collision_buffer
@@ -152,12 +153,14 @@ def load_curobo(
robot_cfg["kinematics"]["ee_link"] = "panda_hand"
# del robot_cfg["kinematics"]
+ if ik_seeds is None:
+ ik_seeds = 24
- ik_seeds = 30 # 500
if graph_mode:
trajopt_seeds = 4
- if trajopt_seeds >= 14:
- ik_seeds = max(100, trajopt_seeds * 2)
+ collision_activation_distance = 0.0
+ if trajopt_seeds >= 16:
+ ik_seeds = 100
if mpinets:
robot_cfg["kinematics"]["lock_joints"] = {
"panda_finger_joint1": 0.025,
@@ -181,11 +184,11 @@ def load_curobo(
K = robot_cfg_instance.kinematics.kinematics_config.joint_limits
K.position[0, :] -= 0.2
K.position[1, :] += 0.2
- finetune_iters = 300
+ finetune_iters = 200
grad_iters = None
if args.report_edition:
finetune_iters = 200
- grad_iters = 125
+ grad_iters = 100
motion_gen_config = MotionGenConfig.load_from_robot_config(
robot_cfg_instance,
world_cfg,
@@ -207,10 +210,11 @@ def load_curobo(
collision_activation_distance=collision_activation_distance,
trajopt_dt=0.25,
finetune_dt_scale=finetune_dt_scale,
- maximum_trajectory_dt=0.16,
+ maximum_trajectory_dt=0.15,
)
mg = MotionGen(motion_gen_config)
mg.warmup(enable_graph=True, warmup_js_trajopt=False, parallel_finetune=parallel_finetune)
+
return mg, robot_cfg
@@ -227,24 +231,20 @@ def benchmark_mb(
# load dataset:
force_graph = False
- interpolation_dt = 0.02
- # mpinets_data = True
- # if mpinets_data:
- file_paths = [motion_benchmaker_raw, mpinets_raw][:]
+ file_paths = [motion_benchmaker_raw, mpinets_raw]
if args.demo:
file_paths = [demo_raw]
- # else:22
- # file_paths = [get_mb_dataset_path()][:1]
enable_debug = save_log or plot_cost
all_files = []
og_tsteps = 32
if override_tsteps is not None:
og_tsteps = override_tsteps
og_finetune_dt_scale = 0.9
- og_trajopt_seeds = 12
- og_parallel_finetune = not args.jetson
+ og_trajopt_seeds = 4
+ og_parallel_finetune = True
og_collision_activation_distance = 0.01
+ og_ik_seeds = None
for file_path in file_paths:
all_groups = []
mpinets_data = False
@@ -258,50 +258,25 @@ def benchmark_mb(
trajopt_seeds = og_trajopt_seeds
collision_activation_distance = og_collision_activation_distance
parallel_finetune = og_parallel_finetune
+ ik_seeds = og_ik_seeds
+
if "cage_panda" in key:
+ trajopt_seeds = 8
+
+ if "table_under_pick_panda" in key:
+ trajopt_seeds = 8
+ finetune_dt_scale = 0.95
+
+ if key == "cubby_task_oriented": # or key == "merged_cubby_task_oriented":
trajopt_seeds = 16
finetune_dt_scale = 0.95
- parallel_finetune = True
- if "table_under_pick_panda" in key:
- tsteps = 40
- trajopt_seeds = 24
- finetune_dt_scale = 0.95
- parallel_finetune = True
- if "table_pick_panda" in key:
- collision_activation_distance = 0.005
-
- if "cubby_task_oriented" in key and "merged" not in key:
- trajopt_seeds = 24
- finetune_dt_scale = 0.95
- collision_activation_distance = 0.015
- parallel_finetune = True
if "dresser_task_oriented" in key:
- trajopt_seeds = 24
- # tsteps = 40
+ trajopt_seeds = 16
finetune_dt_scale = 0.95
- collision_activation_distance = 0.01
- parallel_finetune = True
- if key in [
- "tabletop_neutral_start",
- "merged_cubby_neutral_start",
- "merged_cubby_task_oriented",
- "cubby_neutral_start",
- "cubby_neutral_goal",
- "dresser_neutral_start",
- "dresser_neutral_goal",
- "tabletop_task_oriented",
- ]:
- collision_activation_distance = 0.005
- if key in ["dresser_neutral_goal"]:
- trajopt_seeds = 24 # 24
- # tsteps = 36
- collision_activation_distance = 0.01
- # trajopt_seeds = 12
- scene_problems = problems[key] # [:10]
+
+ scene_problems = problems[key][:]
n_cubes = check_problems(scene_problems)
- # print(n_cubes)
- # continue
mg, robot_cfg = load_curobo(
n_cubes,
enable_debug,
@@ -316,6 +291,7 @@ def benchmark_mb(
collision_activation_distance=collision_activation_distance,
args=args,
parallel_finetune=parallel_finetune,
+ ik_seeds=ik_seeds,
)
m_list = []
i = 0
@@ -326,10 +302,10 @@ def benchmark_mb(
continue
plan_config = MotionGenPlanConfig(
- max_attempts=100,
+ max_attempts=20,
enable_graph_attempt=1,
- disable_graph_attempt=20,
- enable_finetune_trajopt=True,
+ disable_graph_attempt=10,
+ enable_finetune_trajopt=not args.no_finetune,
partial_ik_opt=False,
enable_graph=graph_mode or force_graph,
timeout=60,
@@ -344,9 +320,11 @@ def benchmark_mb(
problem_name = "d_" + key + "_" + str(i)
# reset planner
- mg.reset(reset_seed=True)
+ mg.reset(reset_seed=False)
if args.mesh:
- world = WorldConfig.from_dict(deepcopy(problem["obstacles"])).get_mesh_world()
+ world = WorldConfig.from_dict(deepcopy(problem["obstacles"])).get_mesh_world(
+ merge_meshes=False
+ )
else:
world = WorldConfig.from_dict(deepcopy(problem["obstacles"])).get_obb_world()
mg.world_coll_checker.clear_cache()
@@ -355,7 +333,13 @@ def benchmark_mb(
# run planner
start_state = JointState.from_position(mg.tensor_args.to_device([q_start]))
goal_pose = Pose.from_list(pose)
-
+ if i == 1:
+ for _ in range(3):
+ result = mg.plan_single(
+ start_state,
+ goal_pose,
+ plan_config.clone(),
+ )
result = mg.plan_single(
start_state,
goal_pose,
@@ -389,11 +373,9 @@ def benchmark_mb(
dt = result.optimized_dt.item()
if "trajopt_result" in result.debug_info:
success = result.success.item()
- traj_cost = (
- # result.debug_info["trajopt_result"].debug_info["solver"]["cost"][0]
- result.debug_info["trajopt_result"].debug_info["solver"]["cost"][-1]
- )
- # print(traj_cost[0])
+ traj_cost = result.debug_info["trajopt_result"].debug_info["solver"][
+ "cost"
+ ][-1]
traj_cost = torch.cat(traj_cost, dim=-1)
plot_cost_iteration(
traj_cost,
@@ -438,7 +420,7 @@ def benchmark_mb(
.squeeze()
.numpy()
.tolist(),
- "dt": interpolation_dt,
+ "dt": result.interpolation_dt,
}
debug = {
@@ -728,11 +710,18 @@ if __name__ == "__main__":
help="When True, runs benchmark with parameters for jetson",
default=False,
)
+ parser.add_argument(
+ "--no_finetune",
+ action="store_true",
+        help="When True, runs benchmark without finetune trajectory optimization",
+ default=False,
+ )
args = parser.parse_args()
setup_curobo_logger("error")
- for _ in range(1):
+ for i in range(1):
+ print("*****RUN: " + str(i))
benchmark_mb(
save_log=False,
write_usd=args.save_usd,
diff --git a/benchmark/curobo_profile.py b/benchmark/curobo_profile.py
index 77cbcb2..d6d98a7 100644
--- a/benchmark/curobo_profile.py
+++ b/benchmark/curobo_profile.py
@@ -75,12 +75,13 @@ def load_curobo(
num_ik_seeds=30,
num_trajopt_seeds=12,
interpolation_dt=0.025,
+ finetune_trajopt_iters=200,
# grad_trajopt_iters=200,
store_ik_debug=enable_log,
store_trajopt_debug=enable_log,
)
mg = MotionGen(motion_gen_config)
- mg.warmup(enable_graph=False)
+ mg.warmup(enable_graph=False, warmup_js_trajopt=False)
return mg
@@ -140,14 +141,17 @@ def benchmark_mb(args):
print(result.total_time, result.solve_time)
# continue
# load obstacles
-
+ # exit()
# run planner
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
+ # torch.cuda.profiler.start()
result = mg.plan_single(
start_state,
Pose.from_list(pose),
plan_config,
)
+ # torch.cuda.profiler.stop()
+
print("Exporting the trace..")
prof.export_chrome_trace(join_path(args.save_path, args.file_name) + ".json")
print(result.success, result.status)
@@ -191,5 +195,5 @@ if __name__ == "__main__":
args = parser.parse_args()
- setup_curobo_logger("error")
+ setup_curobo_logger("warn")
benchmark_mb(args)
diff --git a/benchmark/curobo_voxel_benchmark.py b/benchmark/curobo_voxel_benchmark.py
new file mode 100644
index 0000000..bfccc0e
--- /dev/null
+++ b/benchmark/curobo_voxel_benchmark.py
@@ -0,0 +1,660 @@
+#
+# Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual
+# property and proprietary rights in and to this material, related
+# documentation and any modifications thereto. Any use, reproduction,
+# disclosure or distribution of this material and related documentation
+# without an express license agreement from NVIDIA CORPORATION or
+# its affiliates is strictly prohibited.
+#
+
+# Standard Library
+import argparse
+from copy import deepcopy
+from typing import Optional
+
+# Third Party
+import matplotlib.pyplot as plt
+import numpy as np
+import torch
+from robometrics.datasets import demo_raw, motion_benchmaker_raw, mpinets_raw
+from tqdm import tqdm
+
+# CuRobo
+from curobo.geom.sdf.world import CollisionCheckerType, WorldConfig
+from curobo.geom.types import Cuboid
+from curobo.geom.types import Cuboid as curobo_Cuboid
+from curobo.geom.types import Mesh, VoxelGrid
+from curobo.types.base import TensorDeviceType
+from curobo.types.camera import CameraObservation
+from curobo.types.math import Pose
+from curobo.types.robot import RobotConfig
+from curobo.types.state import JointState
+from curobo.util.logger import setup_curobo_logger
+from curobo.util.metrics import CuroboGroupMetrics, CuroboMetrics
+from curobo.util_file import (
+ get_assets_path,
+ get_robot_configs_path,
+ get_world_configs_path,
+ join_path,
+ load_yaml,
+ write_yaml,
+)
+from curobo.wrap.model.robot_world import RobotWorld, RobotWorldConfig
+from curobo.wrap.reacher.motion_gen import MotionGen, MotionGenConfig, MotionGenPlanConfig
+
+torch.manual_seed(0)
+
+torch.backends.cudnn.benchmark = True
+
+torch.backends.cuda.matmul.allow_tf32 = True
+torch.backends.cudnn.allow_tf32 = True
+
+np.random.seed(0)
+
+
+def plot_cost_iteration(cost: torch.Tensor, save_path="cost", title="", log_scale=False):
+ fig = plt.figure(figsize=(5, 4))
+ cost = cost.cpu().numpy()
+ # save to csv:
+ np.savetxt(save_path + ".csv", cost, delimiter=",")
+
+ # if cost.shape[0] > 1:
+ colormap = plt.cm.winter
+ plt.gca().set_prop_cycle(plt.cycler("color", colormap(np.linspace(0, 1, cost.shape[0]))))
+ x = [i for i in range(cost.shape[-1])]
+ for i in range(cost.shape[0]):
+ plt.plot(x, cost[i], label="seed_" + str(i))
+ plt.tight_layout()
+ # plt.title(title)
+ plt.xlabel("iteration")
+ plt.ylabel("cost")
+ if log_scale:
+ plt.yscale("log")
+ plt.grid()
+ # plt.legend()
+ plt.tight_layout()
+ plt.savefig(save_path + ".pdf")
+ plt.close()
+
+
+def plot_traj(act_seq: JointState, dt=0.25, title="", save_path="plot.png", sma_filter=False):
+ fig, ax = plt.subplots(4, 1, figsize=(5, 8), sharex=True)
+ t_steps = np.linspace(0, act_seq.position.shape[0] * dt, act_seq.position.shape[0])
+
+ if sma_filter:
+ kernel = 5
+ sma = torch.nn.AvgPool1d(kernel_size=kernel, stride=1, padding=2, ceil_mode=False).cuda()
+
+ for i in range(act_seq.position.shape[-1]):
+ ax[0].plot(t_steps, act_seq.position[:, i].cpu(), "-", label=str(i))
+
+ ax[1].plot(t_steps[: act_seq.velocity.shape[0]], act_seq.velocity[:, i].cpu(), "-")
+ if sma_filter:
+ act_seq.acceleration[:, i] = sma(
+ act_seq.acceleration[:, i].view(1, -1)
+ ).squeeze() # @[1:-2]
+
+ ax[2].plot(t_steps[: act_seq.acceleration.shape[0]], act_seq.acceleration[:, i].cpu(), "-")
+ if sma_filter:
+ act_seq.jerk[:, i] = sma(act_seq.jerk[:, i].view(1, -1)).squeeze() # @[1:-2]\
+
+ ax[3].plot(t_steps[: act_seq.jerk.shape[0]], act_seq.jerk[:, i].cpu(), "-")
+ ax[0].set_title(title + " dt=" + "{:.3f}".format(dt))
+ ax[3].set_xlabel("Time(s)")
+ ax[3].set_ylabel("Jerk rad. s$^{-3}$")
+ ax[0].set_ylabel("Position rad.")
+ ax[1].set_ylabel("Velocity rad. s$^{-1}$")
+ ax[2].set_ylabel("Acceleration rad. s$^{-2}$")
+ ax[0].grid()
+ ax[1].grid()
+ ax[2].grid()
+ ax[3].grid()
+ ax[0].legend(bbox_to_anchor=(0.5, 1.6), loc="upper center", ncol=4)
+ plt.tight_layout()
+ plt.savefig(save_path)
+ plt.close()
+
+
+def load_curobo(
+ n_cubes: int,
+ enable_debug: bool = False,
+ tsteps: int = 30,
+ trajopt_seeds: int = 4,
+ mpinets: bool = False,
+ graph_mode: bool = False,
+ cuda_graph: bool = True,
+ collision_activation_distance: float = 0.025,
+ finetune_dt_scale: float = 1.0,
+ parallel_finetune: bool = True,
+ args=None,
+):
+ robot_cfg = load_yaml(join_path(get_robot_configs_path(), "franka.yml"))["robot_cfg"]
+ robot_cfg["kinematics"]["collision_sphere_buffer"] = -0.00
+
+ ik_seeds = 24
+ if graph_mode:
+ trajopt_seeds = 4
+ if trajopt_seeds >= 14:
+ ik_seeds = max(100, trajopt_seeds * 2)
+ if mpinets:
+ robot_cfg["kinematics"]["lock_joints"] = {
+ "panda_finger_joint1": 0.025,
+ "panda_finger_joint2": 0.025,
+ }
+ world_cfg = WorldConfig.from_dict(
+ {
+ "voxel": {
+ "base": {
+ "dims": [2.4, 2.4, 2.4],
+ "pose": [0, 0, 0, 1, 0, 0, 0],
+ "voxel_size": 0.005,
+ "feature_dtype": torch.bfloat16,
+ },
+ }
+ }
+ )
+ interpolation_steps = 2000
+ if graph_mode:
+ interpolation_steps = 100
+ robot_cfg_instance = RobotConfig.from_dict(robot_cfg, tensor_args=TensorDeviceType())
+ K = robot_cfg_instance.kinematics.kinematics_config.joint_limits
+ K.position[0, :] -= 0.2
+ K.position[1, :] += 0.2
+
+ motion_gen_config = MotionGenConfig.load_from_robot_config(
+ robot_cfg_instance,
+ world_cfg,
+ trajopt_tsteps=tsteps,
+ collision_checker_type=CollisionCheckerType.VOXEL,
+ use_cuda_graph=cuda_graph,
+ position_threshold=0.005, # 0.5 cm
+ rotation_threshold=0.05,
+ num_ik_seeds=ik_seeds,
+ num_graph_seeds=trajopt_seeds,
+ num_trajopt_seeds=trajopt_seeds,
+ interpolation_dt=0.025,
+ store_ik_debug=enable_debug,
+ store_trajopt_debug=enable_debug,
+ interpolation_steps=interpolation_steps,
+ collision_activation_distance=collision_activation_distance,
+ trajopt_dt=0.25,
+ finetune_dt_scale=finetune_dt_scale,
+ maximum_trajectory_dt=0.15,
+ finetune_trajopt_iters=200,
+ )
+ mg = MotionGen(motion_gen_config)
+ mg.warmup(enable_graph=True, warmup_js_trajopt=False, parallel_finetune=True)
+ # create a ground truth collision checker:
+ world_model = WorldConfig.from_dict(
+ {
+ "cuboid": {
+ "table": {
+ "dims": [1, 1, 1],
+ "pose": [0, 0, 0, 1, 0, 0, 0],
+ }
+ }
+ }
+ )
+ if args.mesh:
+ world_model = world_model.get_mesh_world()
+ config = RobotWorldConfig.load_from_config(
+ robot_cfg_instance,
+ world_model,
+ collision_activation_distance=0.0,
+ collision_checker_type=CollisionCheckerType.MESH,
+ n_cuboids=50,
+ n_meshes=50,
+ max_collision_distance=100.0,
+ )
+ robot_world = RobotWorld(config)
+
+ return mg, robot_cfg, robot_world
+
+
+def benchmark_mb(
+ write_usd=False,
+ save_log=False,
+ write_plot=False,
+ write_benchmark=False,
+ plot_cost=False,
+ override_tsteps: Optional[int] = None,
+ args=None,
+):
+ # load dataset:
+ graph_mode = args.graph
+ interpolation_dt = 0.02
+ file_paths = [demo_raw, motion_benchmaker_raw, mpinets_raw][1:]
+
+ enable_debug = save_log or plot_cost
+ all_files = []
+ og_tsteps = 32
+ if override_tsteps is not None:
+ og_tsteps = override_tsteps
+
+ og_trajopt_seeds = 4
+ og_collision_activation_distance = 0.01
+ if args.graph:
+ og_trajopt_seeds = 4
+ for file_path in file_paths:
+ all_groups = []
+ mpinets_data = False
+ problems = file_path()
+ if "dresser_task_oriented" in list(problems.keys()):
+ mpinets_data = True
+ for key, v in tqdm(problems.items()):
+ scene_problems = problems[key]
+ m_list = []
+ i = -1
+ ik_fail = 0
+ trajopt_seeds = og_trajopt_seeds
+ tsteps = og_tsteps
+ collision_activation_distance = og_collision_activation_distance
+ finetune_dt_scale = 0.9
+ parallel_finetune = True
+ if "cage_panda" in key:
+ trajopt_seeds = 8
+
+ if "table_under_pick_panda" in key:
+ trajopt_seeds = 8
+ finetune_dt_scale = 0.98
+
+ if key == "cubby_task_oriented":
+ trajopt_seeds = 16
+ finetune_dt_scale = 0.98
+
+ if "dresser_task_oriented" in key:
+ trajopt_seeds = 16
+ finetune_dt_scale = 0.98
+
+ mg, robot_cfg, robot_world = load_curobo(
+ 0,
+ enable_debug,
+ tsteps,
+ trajopt_seeds,
+ mpinets_data,
+ graph_mode,
+ not args.disable_cuda_graph,
+ collision_activation_distance=collision_activation_distance,
+ finetune_dt_scale=finetune_dt_scale,
+ parallel_finetune=parallel_finetune,
+ args=args,
+ )
+ for problem in tqdm(scene_problems, leave=False):
+ i += 1
+ if problem["collision_buffer_ik"] < 0.0:
+ continue
+
+ plan_config = MotionGenPlanConfig(
+ max_attempts=10,
+ enable_graph_attempt=1,
+ enable_finetune_trajopt=True,
+ partial_ik_opt=False,
+ enable_graph=graph_mode,
+ timeout=60,
+ enable_opt=not graph_mode,
+ parallel_finetune=True,
+ )
+
+ q_start = problem["start"]
+ pose = (
+ problem["goal_pose"]["position_xyz"] + problem["goal_pose"]["quaternion_wxyz"]
+ )
+
+ problem_name = key + "_" + str(i)
+ if args.mesh:
+ problem_name = "mesh_" + problem_name
+ # reset planner
+ mg.reset(reset_seed=False)
+ world = WorldConfig.from_dict(problem["obstacles"])
+
+ # mg.world_coll_checker.clear_cache()
+ world_coll = WorldConfig.from_dict(problem["obstacles"])
+ if args.mesh:
+ world_coll = world_coll.get_mesh_world(merge_meshes=False)
+ robot_world.clear_world_cache()
+ robot_world.update_world(world_coll)
+
+ esdf = robot_world.world_model.get_esdf_in_bounding_box(
+ Cuboid(name="base", pose=[0, 0, 0, 1, 0, 0, 0], dims=[2.4, 2.4, 2.4]),
+ voxel_size=0.005,
+ dtype=torch.float32,
+ )
+ world_voxel_collision = mg.world_coll_checker
+ world_voxel_collision.update_voxel_data(esdf)
+
+ torch.cuda.synchronize()
+
+ start_state = JointState.from_position(mg.tensor_args.to_device([q_start]))
+ if i == 0:
+ for _ in range(3):
+ result = mg.plan_single(
+ start_state,
+ Pose.from_list(pose),
+ plan_config,
+ )
+ result = mg.plan_single(
+ start_state,
+ Pose.from_list(pose),
+ plan_config,
+ )
+ # if not result.success.item():
+ # world = write_yaml(problem["obstacles"], "dresser_task.yml")
+ # exit()
+
+ # print(result.total_time, result.ik_time, result.trajopt_time, result.finetune_time)
+ if result.status == "IK Fail":
+ ik_fail += 1
+ problem["solution"] = None
+ if save_log or write_usd:
+ world.randomize_color(r=[0.5, 0.9], g=[0.2, 0.5], b=[0.0, 0.2])
+
+ coll_mesh = mg.world_coll_checker.get_mesh_in_bounding_box(
+ curobo_Cuboid(name="test", pose=[0, 0, 0, 1, 0, 0, 0], dims=[2, 2, 2]),
+ voxel_size=0.005,
+ )
+
+ coll_mesh.color = [0.0, 0.8, 0.8, 0.2]
+
+ coll_mesh.name = "voxel_world"
+ # world = WorldConfig(mesh=[coll_mesh])
+ world.add_obstacle(coll_mesh)
+ # get costs:
+ if plot_cost:
+ dt = 0.5
+ problem_name = "approx_wolfe_p" + problem_name
+ if result.optimized_dt is not None:
+ dt = result.optimized_dt.item()
+ if "trajopt_result" in result.debug_info:
+ success = result.success.item()
+ traj_cost = result.debug_info["trajopt_result"].debug_info["solver"][
+ "cost"
+ ][-1]
+ traj_cost = torch.cat(traj_cost, dim=-1)
+ plot_cost_iteration(
+ traj_cost,
+ title=problem_name + "_" + str(success) + "_" + str(dt),
+ save_path=join_path("benchmark/log/plot/", problem_name + "_cost"),
+ log_scale=False,
+ )
+ if "finetune_trajopt_result" in result.debug_info:
+ traj_cost = result.debug_info["finetune_trajopt_result"].debug_info[
+ "solver"
+ ]["cost"][0]
+ traj_cost = torch.cat(traj_cost, dim=-1)
+ plot_cost_iteration(
+ traj_cost,
+ title=problem_name + "_" + str(success) + "_" + str(dt),
+ save_path=join_path(
+ "benchmark/log/plot/", problem_name + "_ft_cost"
+ ),
+ log_scale=False,
+ )
+ if result.success.item():
+ q_traj = result.get_interpolated_plan()
+ problem["goal_ik"] = q_traj.position.cpu().squeeze().numpy()[-1, :].tolist()
+ problem["solution"] = {
+ "position": result.get_interpolated_plan()
+ .position.cpu()
+ .squeeze()
+ .numpy()
+ .tolist(),
+ "velocity": result.get_interpolated_plan()
+ .velocity.cpu()
+ .squeeze()
+ .numpy()
+ .tolist(),
+ "acceleration": result.get_interpolated_plan()
+ .acceleration.cpu()
+ .squeeze()
+ .numpy()
+ .tolist(),
+ "jerk": result.get_interpolated_plan()
+ .jerk.cpu()
+ .squeeze()
+ .numpy()
+ .tolist(),
+ "dt": interpolation_dt,
+ }
+ debug = {
+ "used_graph": result.used_graph,
+ "attempts": result.attempts,
+ "ik_time": result.ik_time,
+ "graph_time": result.graph_time,
+ "trajopt_time": result.trajopt_time,
+ "total_time": result.total_time,
+ "solve_time": result.solve_time,
+ "opt_traj": {
+ "position": result.optimized_plan.position.cpu()
+ .squeeze()
+ .numpy()
+ .tolist(),
+ "velocity": result.optimized_plan.velocity.cpu()
+ .squeeze()
+ .numpy()
+ .tolist(),
+ "acceleration": result.optimized_plan.acceleration.cpu()
+ .squeeze()
+ .numpy()
+ .tolist(),
+ "jerk": result.optimized_plan.jerk.cpu().squeeze().numpy().tolist(),
+ "dt": result.optimized_dt.item(),
+ },
+ "valid_query": result.valid_query,
+ }
+ problem["solution_debug"] = debug
+
+ # check if path is collision free w.r.t. ground truth mesh:
+ # robot_world.world_model.clear_cache()
+
+ q_int_traj = result.get_interpolated_plan().position.unsqueeze(0)
+ d_int_mask = (
+ torch.count_nonzero(~robot_world.validate_trajectory(q_int_traj)) == 0
+ ).item()
+
+ q_traj = result.optimized_plan.position.unsqueeze(0)
+ d_mask = (
+ torch.count_nonzero(~robot_world.validate_trajectory(q_traj)) == 0
+ ).item()
+ # d_world, _ = robot_world.get_world_self_collision_distance_from_joint_trajectory(
+ # q_traj)
+ # thres_dist = robot_world.contact_distance
+ # in_collision = d_world.squeeze(0) > thres_dist
+ # d_mask = not torch.any(in_collision, dim=-1).item()
+ # if not d_mask:
+ # write_usd = True
+ # #print(torch.max(d_world).item(), problem_name)
+ current_metrics = CuroboMetrics(
+ skip=False,
+ success=True,
+ perception_success=d_mask,
+ perception_interpolated_success=d_int_mask,
+ time=result.total_time,
+ collision=False,
+ joint_limit_violation=False,
+ self_collision=False,
+ position_error=result.position_error.item() * 100.0,
+ orientation_error=result.rotation_error.item() * 100.0,
+ eef_position_path_length=10,
+ eef_orientation_path_length=10,
+ attempts=result.attempts,
+ motion_time=result.motion_time.item(),
+ solve_time=result.solve_time,
+ jerk=torch.max(torch.abs(result.optimized_plan.jerk)).item(),
+ )
+
+ # run planner
+ if write_usd: # and not d_int_mask:
+ # CuRobo
+ from curobo.util.usd_helper import UsdHelper
+
+ q_traj = result.get_interpolated_plan()
+ UsdHelper.write_trajectory_animation_with_robot_usd(
+ robot_cfg,
+ world,
+ start_state,
+ q_traj,
+ dt=result.interpolation_dt,
+ save_path=join_path("benchmark/log/usd/", problem_name) + ".usd",
+ interpolation_steps=1,
+ write_robot_usd_path="benchmark/log/usd/assets/",
+ robot_usd_local_reference="assets/",
+ base_frame="/world_" + problem_name,
+ visualize_robot_spheres=True,
+ # flatten_usd=True,
+ )
+ # write_usd = False
+ # exit()
+ if write_plot:
+ plot_traj(
+ result.optimized_plan,
+ result.optimized_dt.item(),
+ # result.get_interpolated_plan(),
+ # result.interpolation_dt,
+ title=problem_name,
+ save_path=join_path("benchmark/log/plot/", problem_name + ".pdf"),
+ )
+ plot_traj(
+ # result.optimized_plan,
+ # result.optimized_dt.item(),
+ result.get_interpolated_plan(),
+ result.interpolation_dt,
+ title=problem_name,
+ save_path=join_path("benchmark/log/plot/", problem_name + "_int.pdf"),
+ )
+
+ m_list.append(current_metrics)
+ all_groups.append(current_metrics)
+ elif result.valid_query:
+ current_metrics = CuroboMetrics()
+ debug = {
+ "used_graph": result.used_graph,
+ "attempts": result.attempts,
+ "ik_time": result.ik_time,
+ "graph_time": result.graph_time,
+ "trajopt_time": result.trajopt_time,
+ "total_time": result.total_time,
+ "solve_time": result.solve_time,
+ "status": result.status,
+ "valid_query": result.valid_query,
+ }
+ problem["solution_debug"] = debug
+
+ m_list.append(current_metrics)
+ all_groups.append(current_metrics)
+ else:
+ # print("invalid: " + problem_name)
+ debug = {
+ "used_graph": result.used_graph,
+ "attempts": result.attempts,
+ "ik_time": result.ik_time,
+ "graph_time": result.graph_time,
+ "trajopt_time": result.trajopt_time,
+ "total_time": result.total_time,
+ "solve_time": result.solve_time,
+ "status": result.status,
+ "valid_query": result.valid_query,
+ }
+ problem["solution_debug"] = debug
+ if save_log: # and not result.success.item():
+ # CuRobo
+ from curobo.util.usd_helper import UsdHelper
+
+ UsdHelper.write_motion_gen_log(
+ result,
+ robot_cfg,
+ world,
+ start_state,
+ Pose.from_list(pose),
+ join_path("benchmark/log/usd/", problem_name) + "_debug",
+ write_ik=True,
+ write_trajopt=True,
+ visualize_robot_spheres=True,
+ grid_space=2,
+ # flatten_usd=True,
+ )
+ exit()
+ g_m = CuroboGroupMetrics.from_list(m_list)
+ print(
+ key,
+ f"{g_m.success:2.2f}",
+ g_m.time.mean,
+ # g_m.time.percent_75,
+ g_m.time.percent_98,
+ g_m.position_error.percent_98,
+ # g_m.position_error.median,
+ g_m.orientation_error.percent_98,
+ g_m.cspace_path_length.percent_98,
+ g_m.motion_time.percent_98,
+ g_m.perception_interpolated_success,
+ # g_m.orientation_error.median,
+ )
+ print(g_m.attempts)
+ g_m = CuroboGroupMetrics.from_list(all_groups)
+ print(
+ "All: ",
+ f"{g_m.success:2.2f}",
+ g_m.motion_time.percent_98,
+ g_m.time.mean,
+ g_m.time.percent_75,
+ g_m.position_error.percent_75,
+ g_m.orientation_error.percent_75,
+ g_m.perception_success,
+ )
+ print(g_m.attempts)
+ if write_benchmark:
+ if not mpinets_data:
+ write_yaml(problems, "mb_curobo_solution_voxel.yaml")
+ else:
+ write_yaml(problems, "mpinets_curobo_solution_voxel.yaml")
+ all_files += all_groups
+ g_m = CuroboGroupMetrics.from_list(all_files)
+ print("######## FULL SET ############")
+ print("All: ", f"{g_m.success:2.2f}")
+ print(
+ "Perception Success (coarse, interpolated):",
+ g_m.perception_success,
+ g_m.perception_interpolated_success,
+ )
+ print("MT: ", g_m.motion_time)
+ print("PT:", g_m.time)
+ print("ST: ", g_m.solve_time)
+ print("accuracy: ", g_m.position_error, g_m.orientation_error)
+ print("Jerk: ", g_m.jerk)
+
+
+if __name__ == "__main__":
+ setup_curobo_logger("error")
+ parser = argparse.ArgumentParser()
+
+ parser.add_argument(
+ "--mesh",
+ action="store_true",
+        help="When True, uses mesh world collision checker",
+ default=False,
+ )
+
+ parser.add_argument(
+ "--graph",
+ action="store_true",
+ help="When True, runs only geometric planner",
+ default=False,
+ )
+
+ parser.add_argument(
+ "--disable_cuda_graph",
+ action="store_true",
+        help="When True, disables CUDA graph during benchmarking",
+ default=False,
+ )
+ args = parser.parse_args()
+ benchmark_mb(
+ save_log=False,
+ write_usd=False,
+ write_plot=False,
+ write_benchmark=False,
+ plot_cost=False,
+ args=args,
+ )
diff --git a/benchmark/curobo_voxel_profile.py b/benchmark/curobo_voxel_profile.py
new file mode 100644
index 0000000..f706d51
--- /dev/null
+++ b/benchmark/curobo_voxel_profile.py
@@ -0,0 +1,305 @@
+#
+# Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual
+# property and proprietary rights in and to this material, related
+# documentation and any modifications thereto. Any use, reproduction,
+# disclosure or distribution of this material and related documentation
+# without an express license agreement from NVIDIA CORPORATION or
+# its affiliates is strictly prohibited.
+#
+
+# Standard Library
+import argparse
+from copy import deepcopy
+from typing import Optional
+
+# Third Party
+import numpy as np
+import torch
+from robometrics.datasets import demo_raw, motion_benchmaker_raw, mpinets_raw
+from torch.profiler import ProfilerActivity, profile, record_function
+from tqdm import tqdm
+
+# CuRobo
+from curobo.geom.sdf.world import CollisionCheckerType, WorldConfig
+from curobo.geom.types import Cuboid
+from curobo.geom.types import Cuboid as curobo_Cuboid
+from curobo.geom.types import Mesh, VoxelGrid
+from curobo.types.base import TensorDeviceType
+from curobo.types.camera import CameraObservation
+from curobo.types.math import Pose
+from curobo.types.robot import RobotConfig
+from curobo.types.state import JointState
+from curobo.util.logger import setup_curobo_logger
+from curobo.util.metrics import CuroboGroupMetrics, CuroboMetrics
+from curobo.util_file import (
+ get_assets_path,
+ get_robot_configs_path,
+ get_world_configs_path,
+ join_path,
+ load_yaml,
+ write_yaml,
+)
+from curobo.wrap.model.robot_world import RobotWorld, RobotWorldConfig
+from curobo.wrap.reacher.motion_gen import MotionGen, MotionGenConfig, MotionGenPlanConfig
+
+torch.manual_seed(0)
+
+torch.backends.cudnn.benchmark = True
+
+torch.backends.cuda.matmul.allow_tf32 = True
+torch.backends.cudnn.allow_tf32 = True
+
+np.random.seed(0)
+
+
+def load_curobo(
+ n_cubes: int,
+ enable_debug: bool = False,
+ tsteps: int = 30,
+ trajopt_seeds: int = 4,
+ mpinets: bool = False,
+ graph_mode: bool = False,
+ cuda_graph: bool = True,
+ collision_activation_distance: float = 0.025,
+ finetune_dt_scale: float = 1.0,
+ parallel_finetune: bool = True,
+):
+ robot_cfg = load_yaml(join_path(get_robot_configs_path(), "franka.yml"))["robot_cfg"]
+ robot_cfg["kinematics"]["collision_sphere_buffer"] = 0.0
+
+ ik_seeds = 30
+ if graph_mode:
+ trajopt_seeds = 4
+ if trajopt_seeds >= 14:
+ ik_seeds = max(100, trajopt_seeds * 2)
+ if mpinets:
+ robot_cfg["kinematics"]["lock_joints"] = {
+ "panda_finger_joint1": 0.025,
+ "panda_finger_joint2": 0.025,
+ }
+ world_cfg = WorldConfig.from_dict(
+ {
+ "voxel": {
+ "base": {
+ "dims": [2.0, 2.0, 3.0],
+ "pose": [0, 0, 0, 1, 0, 0, 0],
+ "voxel_size": 0.01,
+ "feature_dtype": torch.float8_e4m3fn,
+ },
+ }
+ }
+ )
+ interpolation_steps = 2000
+ if graph_mode:
+ interpolation_steps = 100
+ robot_cfg_instance = RobotConfig.from_dict(robot_cfg, tensor_args=TensorDeviceType())
+ K = robot_cfg_instance.kinematics.kinematics_config.joint_limits
+ K.position[0, :] -= 0.2
+ K.position[1, :] += 0.2
+
+ motion_gen_config = MotionGenConfig.load_from_robot_config(
+ robot_cfg_instance,
+ world_cfg,
+ trajopt_tsteps=tsteps,
+ collision_checker_type=CollisionCheckerType.VOXEL,
+ use_cuda_graph=cuda_graph,
+ position_threshold=0.005, # 0.5 cm
+ rotation_threshold=0.05,
+ num_ik_seeds=ik_seeds,
+ num_graph_seeds=trajopt_seeds,
+ num_trajopt_seeds=trajopt_seeds,
+ interpolation_dt=0.025,
+ store_ik_debug=enable_debug,
+ store_trajopt_debug=enable_debug,
+ interpolation_steps=interpolation_steps,
+ collision_activation_distance=collision_activation_distance,
+ trajopt_dt=0.25,
+ finetune_dt_scale=finetune_dt_scale,
+ maximum_trajectory_dt=0.16,
+ finetune_trajopt_iters=300,
+ )
+ mg = MotionGen(motion_gen_config)
+    mg.warmup(enable_graph=True, warmup_js_trajopt=False, parallel_finetune=parallel_finetune)
+ # create a ground truth collision checker:
+ config = RobotWorldConfig.load_from_config(
+ robot_cfg_instance,
+ "collision_table.yml",
+ collision_activation_distance=0.0,
+ collision_checker_type=CollisionCheckerType.PRIMITIVE,
+ n_cuboids=50,
+ )
+ robot_world = RobotWorld(config)
+
+ return mg, robot_cfg, robot_world
+
+
+def benchmark_mb(
+ write_usd=False,
+ save_log=False,
+ write_plot=False,
+ write_benchmark=False,
+ plot_cost=False,
+ override_tsteps: Optional[int] = None,
+ args=None,
+):
+ # load dataset:
+ graph_mode = args.graph
+ interpolation_dt = 0.02
+ file_paths = [demo_raw, motion_benchmaker_raw, mpinets_raw][:1]
+
+ enable_debug = save_log or plot_cost
+ all_files = []
+ og_tsteps = 32
+ if override_tsteps is not None:
+ og_tsteps = override_tsteps
+
+ og_trajopt_seeds = 12
+ og_collision_activation_distance = 0.01 # 0.03
+ if args.graph:
+ og_trajopt_seeds = 4
+ for file_path in file_paths:
+ all_groups = []
+ mpinets_data = False
+ problems = file_path()
+
+ for key, v in tqdm(problems.items()):
+ scene_problems = problems[key]
+ m_list = []
+ i = -1
+ ik_fail = 0
+ trajopt_seeds = og_trajopt_seeds
+ tsteps = og_tsteps
+ collision_activation_distance = og_collision_activation_distance
+ finetune_dt_scale = 1.0
+ parallel_finetune = True
+ if "cage_panda" in key:
+ trajopt_seeds = 16
+ finetune_dt_scale = 0.95
+ parallel_finetune = True
+ if "table_under_pick_panda" in key:
+ tsteps = 36
+ trajopt_seeds = 16
+ finetune_dt_scale = 0.95
+ parallel_finetune = True
+ # collision_activation_distance = 0.015
+
+ if "table_pick_panda" in key:
+ collision_activation_distance = 0.005
+
+ if "cubby_task_oriented" in key: # and "merged" not in key:
+ trajopt_seeds = 16
+ finetune_dt_scale = 0.95
+ collision_activation_distance = 0.005
+ parallel_finetune = True
+ if "dresser_task_oriented" in key:
+ trajopt_seeds = 16
+ finetune_dt_scale = 0.95
+ parallel_finetune = True
+ if key in [
+ "tabletop_neutral_start",
+ "merged_cubby_neutral_start",
+ "cubby_neutral_start",
+ "cubby_neutral_goal",
+ "dresser_neutral_start",
+ "tabletop_task_oriented",
+ ]:
+ collision_activation_distance = 0.005
+ if "dresser_task_oriented" in list(problems.keys()):
+ mpinets_data = True
+
+ mg, robot_cfg, robot_world = load_curobo(
+ 0,
+ enable_debug,
+ tsteps,
+ trajopt_seeds,
+ mpinets_data,
+ graph_mode,
+ not args.disable_cuda_graph,
+ collision_activation_distance=collision_activation_distance,
+ finetune_dt_scale=finetune_dt_scale,
+ parallel_finetune=parallel_finetune,
+ )
+ for problem in tqdm(scene_problems, leave=False):
+ i += 1
+ if problem["collision_buffer_ik"] < 0.0:
+ continue
+
+ plan_config = MotionGenPlanConfig(
+ max_attempts=10,
+ enable_graph_attempt=1,
+ enable_finetune_trajopt=True,
+ partial_ik_opt=False,
+ enable_graph=graph_mode,
+ timeout=60,
+ enable_opt=not graph_mode,
+                    parallel_finetune=parallel_finetune,
+ )
+
+ q_start = problem["start"]
+ pose = (
+ problem["goal_pose"]["position_xyz"] + problem["goal_pose"]["quaternion_wxyz"]
+ )
+
+ # reset planner
+ mg.reset(reset_seed=False)
+ world_coll = WorldConfig.from_dict(problem["obstacles"]).get_obb_world()
+
+ robot_world.update_world(world_coll)
+
+ esdf = robot_world.world_model.get_esdf_in_bounding_box(
+ Cuboid(name="base", pose=[0, 0, 0, 1, 0, 0, 0], dims=[2, 2, 3]), voxel_size=0.01
+ )
+ world_voxel_collision = mg.world_coll_checker
+ world_voxel_collision.update_voxel_data(esdf)
+
+ start_state = JointState.from_position(mg.tensor_args.to_device([q_start]))
+ for _ in range(2):
+ result = mg.plan_single(
+ start_state,
+ Pose.from_list(pose),
+ plan_config,
+ )
+ print("Profiling...")
+ with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
+ torch.cuda.profiler.start()
+
+ result = mg.plan_single(
+ start_state,
+ Pose.from_list(pose),
+ plan_config,
+ )
+ torch.cuda.profiler.stop()
+
+            print("Exporting the trace...")
+ prof.export_chrome_trace("benchmark/log/trace/motion_gen_voxel.json")
+
+ exit()
+
+
+if __name__ == "__main__":
+ setup_curobo_logger("error")
+ parser = argparse.ArgumentParser()
+
+ parser.add_argument(
+ "--graph",
+ action="store_true",
+ help="When True, runs only geometric planner",
+ default=False,
+ )
+ parser.add_argument(
+ "--disable_cuda_graph",
+ action="store_true",
+ help="When True, disable cuda graph during benchmarking",
+ default=False,
+ )
+ args = parser.parse_args()
+ benchmark_mb(
+ save_log=False,
+ write_usd=False,
+ write_plot=False,
+ write_benchmark=False,
+ plot_cost=False,
+ args=args,
+ )
diff --git a/benchmark/ik_benchmark.py b/benchmark/ik_benchmark.py
index 7c97f75..2eb75ba 100644
--- a/benchmark/ik_benchmark.py
+++ b/benchmark/ik_benchmark.py
@@ -59,21 +59,22 @@ def run_full_config_collision_free_ik(
robot_cfg = RobotConfig.from_dict(robot_data)
world_cfg = WorldConfig.from_dict(load_yaml(join_path(get_world_configs_path(), world_file)))
position_threshold = 0.005
+ grad_iters = None
if high_precision:
position_threshold = 0.001
+ grad_iters = 100
ik_config = IKSolverConfig.load_from_robot_config(
robot_cfg,
world_cfg,
- rotation_threshold=0.05,
position_threshold=position_threshold,
- num_seeds=24,
+ num_seeds=20,
self_collision_check=collision_free,
self_collision_opt=collision_free,
tensor_args=tensor_args,
use_cuda_graph=use_cuda_graph,
high_precision=high_precision,
regularization=False,
- # grad_iters=500,
+ grad_iters=grad_iters,
)
ik_solver = IKSolver(ik_config)
@@ -140,7 +141,7 @@ if __name__ == "__main__":
"Position-Error-Collision-Free-IK(mm)": [],
"Orientation-Error-Collision-Free-IK": [],
}
- for robot_file in robot_list:
+ for robot_file in robot_list[:1]:
# create a sampler with dof:
for b_size in b_list:
# sample test configs:
diff --git a/docker/aarch64.dockerfile b/docker/aarch64.dockerfile
index e1ee188..3ea0d5c 100644
--- a/docker/aarch64.dockerfile
+++ b/docker/aarch64.dockerfile
@@ -137,7 +137,7 @@ RUN pip3 install warp-lang
# install curobo:
-RUN mkdir /pkgs && git clone https://github.com/NVlabs/curobo.git
+RUN mkdir /pkgs && cd /pkgs && git clone https://github.com/NVlabs/curobo.git
ENV TORCH_CUDA_ARCH_LIST "7.0+PTX"
@@ -166,7 +166,7 @@ RUN cd /pkgs && git clone https://github.com/nvlabs/nvblox_torch.git && \
python3 -m pip install -e .
-RUN python -m pip install "robometrics[evaluator] @ git+https://github.com/fishbotics/robometrics.git"
+RUN python3 -m pip install "robometrics[evaluator] @ git+https://github.com/fishbotics/robometrics.git"
# upgrade typing extensions:
RUN python3 -m pip install typing-extensions --upgrade
diff --git a/docker/user.dockerfile b/docker/user.dockerfile
index 50bc6dc..b402b83 100644
--- a/docker/user.dockerfile
+++ b/docker/user.dockerfile
@@ -15,12 +15,13 @@ FROM curobo_docker:${IMAGE_TAG}
# Set variables
ARG USERNAME
ARG USER_ID
-ARG CACHE_DATE=2024-02-20
+ARG CACHE_DATE=2024-03-18
# Set environment variables
# Set up sudo user
#RUN /sbin/adduser --disabled-password --gecos '' --uid $USER_ID $USERNAME
+# RUN useradd -l -u 1000 $USERNAME
RUN useradd -l -u $USER_ID -g users $USERNAME
RUN /sbin/adduser $USERNAME sudo
@@ -28,6 +29,7 @@ RUN echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
# Set user
+# RUN mkdir /home/$USERNAME && chown $USERNAME:$USERNAME /home/$USERNAME
USER $USERNAME
WORKDIR /home/$USERNAME
ENV USER=$USERNAME
diff --git a/examples/ik_example.py b/examples/ik_example.py
index 7798a6f..3488100 100644
--- a/examples/ik_example.py
+++ b/examples/ik_example.py
@@ -31,7 +31,7 @@ torch.backends.cudnn.allow_tf32 = True
def demo_basic_ik():
tensor_args = TensorDeviceType()
- config_file = load_yaml(join_path(get_robot_configs_path(), "franka.yml"))
+ config_file = load_yaml(join_path(get_robot_configs_path(), "ur10e.yml"))
urdf_file = config_file["robot_cfg"]["kinematics"][
"urdf_path"
] # Send global path starting with "/"
@@ -48,13 +48,13 @@ def demo_basic_ik():
self_collision_check=False,
self_collision_opt=False,
tensor_args=tensor_args,
- use_cuda_graph=False,
+ use_cuda_graph=True,
)
ik_solver = IKSolver(ik_config)
# print(kin_state)
for _ in range(10):
- q_sample = ik_solver.sample_configs(5000)
+ q_sample = ik_solver.sample_configs(100)
kin_state = ik_solver.fk(q_sample)
goal = Pose(kin_state.ee_position, kin_state.ee_quaternion)
@@ -62,12 +62,12 @@ def demo_basic_ik():
result = ik_solver.solve_batch(goal)
torch.cuda.synchronize()
print(
- "Success, Solve Time(s), Total Time(s) ",
+        "Success, Solve Time(s), hz, Position Error, Rotation Error",
torch.count_nonzero(result.success).item() / len(q_sample),
result.solve_time,
q_sample.shape[0] / (time.time() - st_time),
- torch.mean(result.position_error) * 100.0,
- torch.mean(result.rotation_error) * 100.0,
+ torch.mean(result.position_error),
+ torch.mean(result.rotation_error),
)
@@ -97,7 +97,7 @@ def demo_full_config_collision_free_ik():
# print(kin_state)
print("Running Single IK")
for _ in range(10):
- q_sample = ik_solver.sample_configs(5000)
+ q_sample = ik_solver.sample_configs(1)
kin_state = ik_solver.fk(q_sample)
goal = Pose(kin_state.ee_position, kin_state.ee_quaternion)
@@ -107,7 +107,7 @@ def demo_full_config_collision_free_ik():
total_time = (time.time() - st_time) / q_sample.shape[0]
print(
"Success, Solve Time(s), Total Time(s)",
- torch.count_nonzero(result.success).item() / len(q_sample),
+ torch.count_nonzero(result.success).item(),
result.solve_time,
total_time,
1.0 / total_time,
diff --git a/examples/isaac_sim/helper.py b/examples/isaac_sim/helper.py
index 20a1d2e..9586ed0 100644
--- a/examples/isaac_sim/helper.py
+++ b/examples/isaac_sim/helper.py
@@ -100,7 +100,6 @@ def add_robot_to_scene(
import_config,
dest_path,
)
-
# prim_path = omni.usd.get_stage_next_free_path(
# my_world.scene.stage, str(my_world.scene.stage.GetDefaultPrim().GetPath()) + robot_path, False)
# print(prim_path)
@@ -112,13 +111,13 @@ def add_robot_to_scene(
position=position,
)
if ISAAC_SIM_23:
- robot_p.set_solver_velocity_iteration_count(4)
+ robot_p.set_solver_velocity_iteration_count(0)
robot_p.set_solver_position_iteration_count(44)
my_world._physics_context.set_solver_type("PGS")
robot = my_world.scene.add(robot_p)
-
+ robot_path = robot.prim_path
return robot, robot_path
diff --git a/examples/kinematics_example.py b/examples/kinematics_example.py
index 9b83829..8f2958a 100644
--- a/examples/kinematics_example.py
+++ b/examples/kinematics_example.py
@@ -39,7 +39,7 @@ def demo_basic_robot():
# compute forward kinematics:
- q = torch.rand((10, kin_model.get_dof()), **vars(tensor_args))
+ q = torch.rand((10, kin_model.get_dof()), **(tensor_args.as_torch_dict()))
out = kin_model.get_state(q)
# here is the kinematics state:
# print(out)
@@ -55,7 +55,7 @@ def demo_full_config_robot():
kin_model = CudaRobotModel(robot_cfg.kinematics)
# compute forward kinematics:
- q = torch.rand((10, kin_model.get_dof()), **vars(tensor_args))
+ q = torch.rand((10, kin_model.get_dof()), **(tensor_args.as_torch_dict()))
out = kin_model.get_state(q)
# here is the kinematics state:
# print(out)
diff --git a/examples/mpc_example.py b/examples/mpc_example.py
index fef9b8e..06bb92e 100644
--- a/examples/mpc_example.py
+++ b/examples/mpc_example.py
@@ -94,10 +94,10 @@ def demo_full_config_mpc():
while not converged:
st_time = time.time()
# current_state.position += 0.1
- print(current_state.position)
+ # print(current_state.position)
result = mpc.step(current_state, 1)
- print(mpc.get_visual_rollouts().shape)
+ # print(mpc.get_visual_rollouts().shape)
# exit()
torch.cuda.synchronize()
if tstep > 5:
diff --git a/examples/robot_image_segmentation_example.py b/examples/robot_image_segmentation_example.py
index d98fcdb..52e82c7 100644
--- a/examples/robot_image_segmentation_example.py
+++ b/examples/robot_image_segmentation_example.py
@@ -32,7 +32,6 @@ from curobo.types.robot import RobotConfig
from curobo.types.state import JointState
from curobo.util_file import get_robot_configs_path, get_world_configs_path, join_path, load_yaml
from curobo.wrap.model.robot_segmenter import RobotSegmenter
-from curobo.wrap.model.robot_world import RobotWorld, RobotWorldConfig
torch.manual_seed(30)
@@ -42,7 +41,13 @@ torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
-def create_render_dataset(robot_file, save_debug_data: bool = False):
+def create_render_dataset(
+ robot_file,
+ save_debug_data: bool = False,
+ fov_deg: float = 60,
+ n_frames: int = 20,
+ retract_delta: float = 0.0,
+):
# load robot:
robot_dict = load_yaml(join_path(get_robot_configs_path(), robot_file))
robot_dict["robot_cfg"]["kinematics"]["load_link_names_with_mesh"] = True
@@ -54,6 +59,8 @@ def create_render_dataset(robot_file, save_debug_data: bool = False):
q = kin_model.retract_config
+ q += retract_delta
+
meshes = kin_model.get_robot_as_mesh(q)
world = WorldConfig(mesh=meshes[:])
@@ -71,10 +78,11 @@ def create_render_dataset(robot_file, save_debug_data: bool = False):
mesh_dataset = MeshDataset(
None,
- n_frames=20,
- image_size=480,
+ n_frames=n_frames,
+ image_size=640,
save_data_dir=None,
trimesh_mesh=robot_mesh,
+ fov_deg=fov_deg,
)
q_js = JointState(position=q, joint_names=kin_model.joint_names)
@@ -83,6 +91,7 @@ def create_render_dataset(robot_file, save_debug_data: bool = False):
def mask_image(robot_file="ur5e.yml"):
save_debug_data = False
+ write_pointcloud = False
# create robot segmenter:
tensor_args = TensorDeviceType()
@@ -90,7 +99,7 @@ def mask_image(robot_file="ur5e.yml"):
robot_file, collision_sphere_buffer=0.01, distance_threshold=0.05, use_cuda_graph=True
)
- mesh_dataset, q_js = create_render_dataset(robot_file, save_debug_data)
+ mesh_dataset, q_js = create_render_dataset(robot_file, write_pointcloud, n_frames=20)
if save_debug_data:
visualize_scale = 10.0
@@ -113,14 +122,15 @@ def mask_image(robot_file="ur5e.yml"):
# save robot spheres in current joint configuration
robot_kinematics = curobo_segmenter._robot_world.kinematics
- sph = robot_kinematics.get_robot_as_spheres(q_js.position)
- WorldConfig(sphere=sph[0]).save_world_as_mesh("robot_spheres.stl")
+ if write_pointcloud:
+ sph = robot_kinematics.get_robot_as_spheres(q_js.position)
+ WorldConfig(sphere=sph[0]).save_world_as_mesh("robot_spheres.stl")
- # save world pointcloud in robot origin
+ # save world pointcloud in robot origin
- pc = cam_obs.get_pointcloud()
- pc_obs = PointCloud("world", pose=cam_obs.pose.to_list(), points=pc)
- pc_obs.save_as_mesh("camera_pointcloud.stl", transform_with_pose=True)
+ pc = cam_obs.get_pointcloud()
+ pc_obs = PointCloud("world", pose=cam_obs.pose.to_list(), points=pc)
+ pc_obs.save_as_mesh("camera_pointcloud.stl", transform_with_pose=True)
# run segmentation:
depth_mask, filtered_image = curobo_segmenter.get_robot_mask_from_active_js(
@@ -131,10 +141,12 @@ def mask_image(robot_file="ur5e.yml"):
robot_mask = cam_obs.clone()
robot_mask.depth_image[~depth_mask] = 0.0
- robot_mesh = PointCloud(
- "world", pose=robot_mask.pose.to_list(), points=robot_mask.get_pointcloud()
- )
- robot_mesh.save_as_mesh("robot_segmented.stl", transform_with_pose=True)
+
+ if write_pointcloud:
+ robot_mesh = PointCloud(
+ "world", pose=robot_mask.pose.to_list(), points=robot_mask.get_pointcloud()
+ )
+ robot_mesh.save_as_mesh("robot_segmented.stl", transform_with_pose=True)
# save depth image
imageio.imwrite(
"robot_depth.png",
@@ -150,10 +162,11 @@ def mask_image(robot_file="ur5e.yml"):
world_mask = cam_obs.clone()
world_mask.depth_image[depth_mask] = 0.0
- world_mesh = PointCloud(
- "world", pose=world_mask.pose.to_list(), points=world_mask.get_pointcloud()
- )
- world_mesh.save_as_mesh("world_segmented.stl", transform_with_pose=True)
+ if write_pointcloud:
+ world_mesh = PointCloud(
+ "world", pose=world_mask.pose.to_list(), points=world_mask.get_pointcloud()
+ )
+ world_mesh.save_as_mesh("world_segmented.stl", transform_with_pose=True)
imageio.imwrite(
"world_depth.png",
@@ -190,6 +203,248 @@ def mask_image(robot_file="ur5e.yml"):
print("Segmentation Time (ms), (hz)", np.mean(dt_list[5:]) * 1000.0, 1.0 / np.mean(dt_list[5:]))
+def batch_mask_image(robot_file="ur5e.yml"):
+ """Mask images from different camera views using batched query.
+
+ Note: This only works for a single joint configuration across camera views.
+
+ Args:
+ robot_file: robot to use for example.
+ """
+ save_debug_data = True
+ # create robot segmenter:
+ tensor_args = TensorDeviceType()
+
+ curobo_segmenter = RobotSegmenter.from_robot_file(
+ robot_file, collision_sphere_buffer=0.01, distance_threshold=0.05, use_cuda_graph=True
+ )
+
+ mesh_dataset, q_js = create_render_dataset(robot_file, save_debug_data, fov_deg=60)
+
+ mesh_dataset_zoom, q_js = create_render_dataset(
+ robot_file, save_debug_data, fov_deg=40, n_frames=30
+ )
+
+ if save_debug_data:
+ visualize_scale = 10.0
+ data = mesh_dataset[0]
+ cam_obs = CameraObservation(
+ depth_image=tensor_args.to_device(data["depth"]).unsqueeze(0) * 1000,
+ intrinsics=data["intrinsics"],
+ pose=Pose.from_matrix(data["pose"].to(device=tensor_args.device)),
+ )
+ data_zoom = mesh_dataset_zoom[1]
+ cam_obs = CameraObservation(
+ depth_image=tensor_args.to_device(data["depth"]) * 1000,
+ intrinsics=data["intrinsics"],
+ pose=Pose.from_matrix(data["pose"].to(device=tensor_args.device)),
+ )
+ cam_obs_zoom = CameraObservation(
+ depth_image=tensor_args.to_device(data_zoom["depth"]) * 1000,
+ intrinsics=data_zoom["intrinsics"],
+ pose=Pose.from_matrix(data_zoom["pose"].to(device=tensor_args.device)),
+ )
+ cam_obs = cam_obs.stack(cam_obs_zoom)
+
+ for i in range(cam_obs.depth_image.shape[0]):
+ # save depth image
+ imageio.imwrite(
+ "camera_depth_" + str(i) + ".png",
+ (cam_obs.depth_image[i] * visualize_scale)
+ .squeeze()
+ .detach()
+ .cpu()
+ .numpy()
+ .astype(np.uint16),
+ )
+
+ # run segmentation:
+ depth_mask, filtered_image = curobo_segmenter.get_robot_mask_from_active_js(
+ cam_obs,
+ q_js,
+ )
+ # save robot points as mesh
+
+ robot_mask = cam_obs.clone()
+ robot_mask.depth_image[~depth_mask] = 0.0
+ for i in range(cam_obs.depth_image.shape[0]):
+ # save depth image
+ imageio.imwrite(
+ "robot_depth_" + str(i) + ".png",
+ (robot_mask.depth_image[i] * visualize_scale)
+ .detach()
+ .squeeze()
+ .cpu()
+ .numpy()
+ .astype(np.uint16),
+ )
+
+ # save world points as mesh
+
+ imageio.imwrite(
+ "world_depth_" + str(i) + ".png",
+ (filtered_image[i] * visualize_scale)
+ .detach()
+ .squeeze()
+ .cpu()
+ .numpy()
+ .astype(np.uint16),
+ )
+
+ dt_list = []
+
+ for i in range(len(mesh_dataset)):
+
+ data = mesh_dataset[i]
+ data_zoom = mesh_dataset_zoom[i + 1]
+ cam_obs = CameraObservation(
+ depth_image=tensor_args.to_device(data["depth"]) * 1000,
+ intrinsics=data["intrinsics"],
+ pose=Pose.from_matrix(data["pose"].to(device=tensor_args.device)),
+ )
+ cam_obs_zoom = CameraObservation(
+ depth_image=tensor_args.to_device(data_zoom["depth"]) * 1000,
+ intrinsics=data_zoom["intrinsics"],
+ pose=Pose.from_matrix(data_zoom["pose"].to(device=tensor_args.device)),
+ )
+ cam_obs = cam_obs.stack(cam_obs_zoom)
+ if not curobo_segmenter.ready:
+ curobo_segmenter.update_camera_projection(cam_obs)
+ st_time = time.time()
+
+ depth_mask, filtered_image = curobo_segmenter.get_robot_mask_from_active_js(
+ cam_obs,
+ q_js,
+ )
+
+ torch.cuda.synchronize()
+ dt_list.append(time.time() - st_time)
+
+ print("Segmentation Time (ms), (hz)", np.mean(dt_list[5:]) * 1000.0, 1.0 / np.mean(dt_list[5:]))
+
+
+def batch_robot_mask_image(robot_file="ur5e.yml"):
+ """Mask images from different camera views using batched query.
+
+    Note: This example treats each image as having a different robot joint configuration.
+
+ Args:
+ robot_file: robot to use for example.
+ """
+ save_debug_data = True
+ # create robot segmenter:
+ tensor_args = TensorDeviceType()
+
+ curobo_segmenter = RobotSegmenter.from_robot_file(
+ robot_file, collision_sphere_buffer=0.01, distance_threshold=0.05, use_cuda_graph=True
+ )
+
+ mesh_dataset, q_js = create_render_dataset(robot_file, save_debug_data, fov_deg=60)
+
+ mesh_dataset_zoom, q_js_zoom = create_render_dataset(
+ robot_file, save_debug_data, fov_deg=60, retract_delta=-0.5
+ )
+ q_js = q_js.unsqueeze(0)
+ q_js = q_js.stack(q_js_zoom.unsqueeze(0))
+
+ if save_debug_data:
+ visualize_scale = 10.0
+ data = mesh_dataset[0]
+ cam_obs = CameraObservation(
+ depth_image=tensor_args.to_device(data["depth"]).unsqueeze(0) * 1000,
+ intrinsics=data["intrinsics"],
+ pose=Pose.from_matrix(data["pose"].to(device=tensor_args.device)),
+ )
+ data_zoom = mesh_dataset_zoom[0]
+ cam_obs = CameraObservation(
+ depth_image=tensor_args.to_device(data["depth"]) * 1000,
+ intrinsics=data["intrinsics"],
+ pose=Pose.from_matrix(data["pose"].to(device=tensor_args.device)),
+ )
+ cam_obs_zoom = CameraObservation(
+ depth_image=tensor_args.to_device(data_zoom["depth"]) * 1000,
+ intrinsics=data_zoom["intrinsics"],
+ pose=Pose.from_matrix(data_zoom["pose"].to(device=tensor_args.device)),
+ )
+ cam_obs = cam_obs.stack(cam_obs_zoom)
+
+ for i in range(cam_obs.depth_image.shape[0]):
+ # save depth image
+ imageio.imwrite(
+ "camera_depth_" + str(i) + ".png",
+ (cam_obs.depth_image[i] * visualize_scale)
+ .squeeze()
+ .detach()
+ .cpu()
+ .numpy()
+ .astype(np.uint16),
+ )
+
+ # run segmentation:
+ depth_mask, filtered_image = curobo_segmenter.get_robot_mask_from_active_js(
+ cam_obs,
+ q_js,
+ )
+ # save robot points as mesh
+
+ robot_mask = cam_obs.clone()
+ robot_mask.depth_image[~depth_mask] = 0.0
+ for i in range(cam_obs.depth_image.shape[0]):
+ # save depth image
+ imageio.imwrite(
+ "robot_depth_" + str(i) + ".png",
+ (robot_mask.depth_image[i] * visualize_scale)
+ .detach()
+ .squeeze()
+ .cpu()
+ .numpy()
+ .astype(np.uint16),
+ )
+
+ # save world points as mesh
+
+ imageio.imwrite(
+ "world_depth_" + str(i) + ".png",
+ (filtered_image[i] * visualize_scale)
+ .detach()
+ .squeeze()
+ .cpu()
+ .numpy()
+ .astype(np.uint16),
+ )
+
+ dt_list = []
+
+ for i in range(len(mesh_dataset)):
+
+ data = mesh_dataset[i]
+ data_zoom = mesh_dataset_zoom[i]
+ cam_obs = CameraObservation(
+ depth_image=tensor_args.to_device(data["depth"]) * 1000,
+ intrinsics=data["intrinsics"],
+ pose=Pose.from_matrix(data["pose"].to(device=tensor_args.device)),
+ )
+ cam_obs_zoom = CameraObservation(
+ depth_image=tensor_args.to_device(data_zoom["depth"]) * 1000,
+ intrinsics=data_zoom["intrinsics"],
+ pose=Pose.from_matrix(data_zoom["pose"].to(device=tensor_args.device)),
+ )
+ cam_obs = cam_obs.stack(cam_obs_zoom)
+ if not curobo_segmenter.ready:
+ curobo_segmenter.update_camera_projection(cam_obs)
+ st_time = time.time()
+
+ depth_mask, filtered_image = curobo_segmenter.get_robot_mask_from_active_js(
+ cam_obs,
+ q_js,
+ )
+
+ torch.cuda.synchronize()
+ dt_list.append(time.time() - st_time)
+
+ print("Segmentation Time (ms), (hz)", np.mean(dt_list[5:]) * 1000.0, 1.0 / np.mean(dt_list[5:]))
+
+
def profile_mask_image(robot_file="ur5e.yml"):
# create robot segmenter:
tensor_args = TensorDeviceType()
@@ -236,3 +491,5 @@ def profile_mask_image(robot_file="ur5e.yml"):
if __name__ == "__main__":
mask_image()
# profile_mask_image()
+ # batch_mask_image()
+ # batch_robot_mask_image()
diff --git a/setup.cfg b/setup.cfg
index d7bb59e..d5e1758 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -41,12 +41,12 @@ platforms = any
[options]
install_requires =
+ pybind11
networkx
numpy
numpy-quaternion
pyyaml
setuptools_scm>=6.2
- torchtyping
torch>=1.10
trimesh
yourdfpy>=0.0.53
@@ -55,7 +55,7 @@ install_requires =
tqdm
wheel
importlib_resources
-
+ scikit-image
packages = find_namespace:
package_dir =
= src
diff --git a/setup.py b/setup.py
index c875560..11201a9 100644
--- a/setup.py
+++ b/setup.py
@@ -11,7 +11,7 @@
"""curobo package setuptools."""
-# NOTE (roflaherty): This file is still needed to allow the package to be
+# NOTE: This file is still needed to allow the package to be
# installed in editable mode.
#
# References:
@@ -21,12 +21,6 @@
import setuptools
from torch.utils.cpp_extension import BuildExtension, CUDAExtension
-print("*********************************")
-print("CuRobo: Compiling CUDA kernels...")
-print(
- "This will take 20+ minutes, to speedup compilation set TORCH_CUDA_ARCH_LIST={X}+PTX to "
- + " compile only for a specific architecture. e.g., export TORCH_CUDA_ARCH_LIST=7.0+PTX"
-)
extra_cuda_args = {
"nvcc": [
"--threads=8",
@@ -55,16 +49,6 @@ ext_modules = [
],
extra_compile_args=extra_cuda_args,
),
- CUDAExtension(
- "curobo.curobolib.geom_cu",
- [
- "src/curobo/curobolib/cpp/geom_cuda.cpp",
- "src/curobo/curobolib/cpp/sphere_obb_kernel.cu",
- "src/curobo/curobolib/cpp/pose_distance_kernel.cu",
- "src/curobo/curobolib/cpp/self_collision_kernel.cu",
- ],
- extra_compile_args=extra_cuda_args,
- ),
CUDAExtension(
"curobo.curobolib.line_search_cu",
[
@@ -82,6 +66,16 @@ ext_modules = [
],
extra_compile_args=extra_cuda_args,
),
+ CUDAExtension(
+ "curobo.curobolib.geom_cu",
+ [
+ "src/curobo/curobolib/cpp/geom_cuda.cpp",
+ "src/curobo/curobolib/cpp/sphere_obb_kernel.cu",
+ "src/curobo/curobolib/cpp/pose_distance_kernel.cu",
+ "src/curobo/curobolib/cpp/self_collision_kernel.cu",
+ ],
+ extra_compile_args=extra_cuda_args,
+ ),
]
setuptools.setup(
diff --git a/src/curobo/__init__.py b/src/curobo/__init__.py
index d56871c..5db6e50 100644
--- a/src/curobo/__init__.py
+++ b/src/curobo/__init__.py
@@ -56,8 +56,10 @@ def _get_version():
# `importlib_metadata` is the back ported library for older versions of python.
# Third Party
from importlib_metadata import version
-
- return version("nvidia_curobo")
+ try:
+ return version("nvidia_curobo")
+ except Exception:
+ return "v0.7.0-no-tag"
# Set `__version__` attribute
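The `_get_version` change above falls back to a hard-coded tag when no distribution metadata is found (e.g. a source checkout that was never installed). A hedged stdlib sketch of the same pattern; catching the narrow `PackageNotFoundError` is a suggested refinement that keeps unrelated errors (such as `KeyboardInterrupt`) visible:

```python
from importlib.metadata import PackageNotFoundError, version

FALLBACK = "v0.7.0-no-tag"  # placeholder tag, as in the diff

def get_version(package: str = "nvidia_curobo") -> str:
    """Return the installed distribution version, or a fallback tag.

    PackageNotFoundError is raised by importlib.metadata when the
    distribution has no metadata entry (e.g. uninstalled checkout).
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return FALLBACK
```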
diff --git a/src/curobo/content/assets/robot/franka_description/franka_test.urdf b/src/curobo/content/assets/robot/franka_description/franka_test.urdf
index eb4115d..2b841b3 100644
--- a/src/curobo/content/assets/robot/franka_description/franka_test.urdf
+++ b/src/curobo/content/assets/robot/franka_description/franka_test.urdf
@@ -280,6 +280,7 @@
+
diff --git a/src/curobo/content/configs/robot/franka.yml b/src/curobo/content/configs/robot/franka.yml
index bc3f013..c555f34 100644
--- a/src/curobo/content/configs/robot/franka.yml
+++ b/src/curobo/content/configs/robot/franka.yml
@@ -13,7 +13,7 @@ robot_cfg:
kinematics:
use_usd_kinematics: False
isaac_usd_path: "/Isaac/Robots/Franka/franka.usd"
- usd_path: "robot/franka/franka_panda.usda"
+ usd_path: "robot/non_shipping/franka/franka_panda_meters.usda"
usd_robot_root: "/panda"
usd_flip_joints: ["panda_joint1","panda_joint2","panda_joint3","panda_joint4", "panda_joint5",
"panda_joint6","panda_joint7","panda_finger_joint1", "panda_finger_joint2"]
diff --git a/src/curobo/content/configs/robot/iiwa.yml b/src/curobo/content/configs/robot/iiwa.yml
index 585541a..caf9e9d 100644
--- a/src/curobo/content/configs/robot/iiwa.yml
+++ b/src/curobo/content/configs/robot/iiwa.yml
@@ -39,7 +39,13 @@ robot_cfg:
}
urdf_path: robot/iiwa_allegro_description/iiwa.urdf
asset_root_path: robot/iiwa_allegro_description
-
+ mesh_link_names:
+ - iiwa7_link_1
+ - iiwa7_link_2
+ - iiwa7_link_3
+ - iiwa7_link_4
+ - iiwa7_link_5
+ - iiwa7_link_6
cspace:
joint_names:
[
diff --git a/src/curobo/content/configs/robot/iiwa_allegro.yml b/src/curobo/content/configs/robot/iiwa_allegro.yml
index 700f4f0..b8435fe 100644
--- a/src/curobo/content/configs/robot/iiwa_allegro.yml
+++ b/src/curobo/content/configs/robot/iiwa_allegro.yml
@@ -30,6 +30,25 @@ robot_cfg:
- ring_link_3
- thumb_link_2
- thumb_link_3
+ mesh_link_names:
+ - iiwa7_link_1
+ - iiwa7_link_2
+ - iiwa7_link_3
+ - iiwa7_link_4
+ - iiwa7_link_5
+ - iiwa7_link_6
+ - palm_link
+ - index_link_1
+ - index_link_2
+ - index_link_3
+ - middle_link_1
+ - middle_link_2
+ - middle_link_3
+ - ring_link_1
+ - ring_link_2
+ - ring_link_3
+ - thumb_link_2
+ - thumb_link_3
collision_sphere_buffer: 0.005
collision_spheres: spheres/iiwa_allegro.yml
ee_link: palm_link
diff --git a/src/curobo/content/configs/robot/spheres/franka_mesh.yml b/src/curobo/content/configs/robot/spheres/franka_mesh.yml
index e4b578c..1dec717 100644
--- a/src/curobo/content/configs/robot/spheres/franka_mesh.yml
+++ b/src/curobo/content/configs/robot/spheres/franka_mesh.yml
@@ -134,11 +134,11 @@ collision_spheres:
"radius": 0.022
panda_leftfinger:
- "center": [0.0, 0.01, 0.043]
- "radius": 0.011 # 25
+ "radius": 0.011 #0.025 # 0.011
- "center": [0.0, 0.02, 0.015]
- "radius": 0.011 # 25
+ "radius": 0.011 #0.025 # 0.011
panda_rightfinger:
- "center": [0.0, -0.01, 0.043]
- "radius": 0.011 #25
+ "radius": 0.011 #0.025 #0.011
- "center": [0.0, -0.02, 0.015]
- "radius": 0.011 #25
\ No newline at end of file
+ "radius": 0.011 #0.025 #0.011
\ No newline at end of file
diff --git a/src/curobo/content/configs/robot/spheres/ur10e.yml b/src/curobo/content/configs/robot/spheres/ur10e.yml
index aaedf9b..19973e1 100644
--- a/src/curobo/content/configs/robot/spheres/ur10e.yml
+++ b/src/curobo/content/configs/robot/spheres/ur10e.yml
@@ -58,7 +58,7 @@ collision_spheres:
radius: 0.07
tool0:
- center: [0, 0, 0.12]
- radius: 0.05
+ radius: -0.01
camera_mount:
- center: [0, 0.11, -0.01]
radius: 0.06
\ No newline at end of file
diff --git a/src/curobo/content/configs/robot/ur10e.yml b/src/curobo/content/configs/robot/ur10e.yml
index 11fe3bb..c47b4a4 100644
--- a/src/curobo/content/configs/robot/ur10e.yml
+++ b/src/curobo/content/configs/robot/ur10e.yml
@@ -38,7 +38,7 @@ robot_cfg:
'wrist_1_link': 0,
'wrist_2_link': 0,
'wrist_3_link' : 0,
- 'tool0': 0,
+ 'tool0': 0.05,
}
mesh_link_names: [ 'shoulder_link','upper_arm_link', 'forearm_link', 'wrist_1_link', 'wrist_2_link' ,'wrist_3_link' ]
lock_joints: null
diff --git a/src/curobo/content/configs/robot/ur5e.yml b/src/curobo/content/configs/robot/ur5e.yml
index 689bd9a..3c17015 100644
--- a/src/curobo/content/configs/robot/ur5e.yml
+++ b/src/curobo/content/configs/robot/ur5e.yml
@@ -92,7 +92,7 @@ robot_cfg:
"radius": 0.043
tool0:
- "center": [0.001, 0.001, 0.05]
- "radius": 0.05
+ "radius": -0.01 #0.05
collision_sphere_buffer: 0.005
@@ -109,6 +109,7 @@ robot_cfg:
'wrist_1_link': 0,
'wrist_2_link': 0,
'wrist_3_link' : 0,
+ 'tool0': 0.025,
}
use_global_cumul: True
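Per the changelog, the UR5e/UR10e `tool0` sphere now gets a negative radius so it is skipped during world collision checking while remaining available for self collision (padded via `self_collision_buffer`). The helper below is an illustrative model of that filtering convention, not cuRobo's implementation:

```python
def world_collision_spheres(spheres):
    """Keep only spheres that should be checked against the world.

    Spheres with a negative radius are treated as disabled for world
    collision checking; they can still participate in self collision,
    where a per-link buffer is added to the radius.
    """
    return [s for s in spheres if s["radius"] >= 0.0]

# Example sphere set (values taken from the ur5e.yml diff above):
tool_spheres = [
    {"link": "wrist_3_link", "center": [0.0, 0.0, 0.0], "radius": 0.043},
    {"link": "tool0", "center": [0.001, 0.001, 0.05], "radius": -0.01},
]
```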
diff --git a/src/curobo/content/configs/task/finetune_trajopt.yml b/src/curobo/content/configs/task/finetune_trajopt.yml
index 467d86b..d692422 100644
--- a/src/curobo/content/configs/task/finetune_trajopt.yml
+++ b/src/curobo/content/configs/task/finetune_trajopt.yml
@@ -55,17 +55,17 @@ cost:
bound_cfg:
weight: [5000.0, 50000.0, 50000.0, 50000.0] # needs to be 3 values
- smooth_weight: [0.0,5000.0, 50.0, 0.0] # [vel, acc, jerk,]
+ smooth_weight: [0.0,5000.0, 50.0, 0.0] #[0.0,5000.0, 50.0, 0.0] # [vel, acc, jerk,]
run_weight_velocity: 0.0
run_weight_acceleration: 1.0
run_weight_jerk: 1.0
- activation_distance: [0.1,0.1,0.1,10.0] # for position, velocity, acceleration and jerk
+ activation_distance: [0.1,0.1,0.1,0.1] # for position, velocity, acceleration and jerk
null_space_weight: [0.0]
primitive_collision_cfg:
- weight: 1000000.0
+ weight: 1000000.0 #1000000.0 1000000
use_sweep: True
- sweep_steps: 6
+ sweep_steps: 4
classify: False
use_sweep_kernel: True
use_speed_metric: True
@@ -79,7 +79,7 @@ cost:
lbfgs:
- n_iters: 400 # 400
+ n_iters: 300 # 400
inner_iters: 25
cold_start_n_iters: null
min_iters: 25
@@ -89,7 +89,7 @@ lbfgs:
cost_delta_threshold: 1.0
cost_relative_threshold: 0.999 #0.999
epsilon: 0.01
- history: 15 #15
+ history: 27 #15
use_cuda_graph: True
n_problems: 1
store_debug: False
diff --git a/src/curobo/content/configs/task/gradient_ik.yml b/src/curobo/content/configs/task/gradient_ik.yml
index 97e17b6..c81517b 100644
--- a/src/curobo/content/configs/task/gradient_ik.yml
+++ b/src/curobo/content/configs/task/gradient_ik.yml
@@ -47,7 +47,7 @@ cost:
activation_distance: [0.1]
null_space_weight: [1.0]
primitive_collision_cfg:
- weight: 50000.0
+ weight: 5000.0
use_sweep: False
classify: False
activation_distance: 0.01
@@ -57,11 +57,11 @@ cost:
lbfgs:
- n_iters: 100 #60
- inner_iters: 25
+ n_iters: 80 #60
+ inner_iters: 20
cold_start_n_iters: null
min_iters: 20
- line_search_scale: [0.01, 0.3, 0.7, 1.0]
+ line_search_scale: [0.1, 0.3, 0.7, 1.0]
fixed_iters: True
cost_convergence: 1e-7
cost_delta_threshold: 1e-6 #0.0001
diff --git a/src/curobo/content/configs/task/gradient_trajopt.yml b/src/curobo/content/configs/task/gradient_trajopt.yml
index 471d186..02d3966 100644
--- a/src/curobo/content/configs/task/gradient_trajopt.yml
+++ b/src/curobo/content/configs/task/gradient_trajopt.yml
@@ -40,7 +40,7 @@ cost:
link_pose_cfg:
vec_weight: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0] # orientation, position for all timesteps
- run_vec_weight: [1.00,1.00,1.00,1.0,1.0,1.0] # running weight orientation, position
+ run_vec_weight: [0.00,0.00,0.00,0.0,0.0,0.0] # running weight orientation, position
weight: [2000,50000.0,30,50] #[150.0, 2000.0, 30, 40]
vec_convergence: [0.0,0.0] # orientation, position, orientation metric activation, position metric activation
terminal: True
@@ -54,19 +54,17 @@ cost:
bound_cfg:
weight: [5000.0, 50000.0, 50000.0,50000.0] # needs to be 3 values
- #weight: [000.0, 000.0, 000.0,000.0]
smooth_weight: [0.0,10000.0,10.0, 0.0] # [vel, acc, jerk, alpha_v-not-used]
- #smooth_weight: [0.0,0000.0,0.0, 0.0] # [vel, acc, jerk, alpha_v-not-used]
run_weight_velocity: 0.00
run_weight_acceleration: 1.0
run_weight_jerk: 1.0
- activation_distance: [0.1,0.1,0.1,10.0] # for position, velocity, acceleration and jerk
+ activation_distance: [0.1,0.1,0.1,0.1] # for position, velocity, acceleration and jerk
null_space_weight: [0.0]
primitive_collision_cfg:
weight: 100000.0
use_sweep: True
- sweep_steps: 6
+ sweep_steps: 4
classify: False
use_sweep_kernel: True
use_speed_metric: True
@@ -81,11 +79,11 @@ cost:
lbfgs:
- n_iters: 125 #175
+ n_iters: 100 #175
inner_iters: 25
cold_start_n_iters: null
min_iters: 25
- line_search_scale: [0.01,0.3,0.7,1.0] #[0.01,0.2, 0.3,0.5,0.7,0.9, 1.0] #
+ line_search_scale: [0.01,0.3,0.7,1.0]
fixed_iters: True
cost_convergence: 0.01
cost_delta_threshold: 2000.0
diff --git a/src/curobo/content/configs/task/graph.yml b/src/curobo/content/configs/task/graph.yml
index 92ae0d1..56dbabd 100644
--- a/src/curobo/content/configs/task/graph.yml
+++ b/src/curobo/content/configs/task/graph.yml
@@ -33,7 +33,7 @@ graph:
sample_pts: 1500
node_similarity_distance: 0.1
- rejection_ratio: 20
+ rejection_ratio: 10
k_nn: 15
max_buffer: 10000
max_cg_buffer: 1000
diff --git a/src/curobo/content/configs/task/particle_ik.yml b/src/curobo/content/configs/task/particle_ik.yml
index 903e549..414f873 100644
--- a/src/curobo/content/configs/task/particle_ik.yml
+++ b/src/curobo/content/configs/task/particle_ik.yml
@@ -34,7 +34,7 @@ cost:
run_weight: 1.0
link_pose_cfg:
vec_weight: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
- weight: [30, 50, 10, 10] #[20.0, 100.0]
+ weight: [0, 50, 10, 10] #[20.0, 100.0]
vec_convergence: [0.00, 0.000] # orientation, position
terminal: False
use_metric: True
@@ -67,7 +67,7 @@ mppi:
cov_type : "DIAG_A" #
kappa : 0.01
null_act_frac : 0.0
- sample_mode : 'BEST'
+ sample_mode : 'MEAN'
base_action : 'REPEAT'
squash_fn : 'CLAMP'
n_problems : 1
diff --git a/src/curobo/content/configs/task/particle_mpc.yml b/src/curobo/content/configs/task/particle_mpc.yml
index 2b5b7e8..d10544d 100644
--- a/src/curobo/content/configs/task/particle_mpc.yml
+++ b/src/curobo/content/configs/task/particle_mpc.yml
@@ -89,7 +89,7 @@ mppi:
sample_mode : 'BEST'
base_action : 'REPEAT'
squash_fn : 'CLAMP'
- n_problems : 1
+ n_problems : 1
use_cuda_graph : True
seed : 0
store_debug : False
diff --git a/src/curobo/content/configs/task/particle_trajopt.yml b/src/curobo/content/configs/task/particle_trajopt.yml
index 1c76623..23ae39a 100644
--- a/src/curobo/content/configs/task/particle_trajopt.yml
+++ b/src/curobo/content/configs/task/particle_trajopt.yml
@@ -39,7 +39,7 @@ cost:
link_pose_cfg:
vec_weight: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
- run_vec_weight: [0.00,0.00,0.00,0.0,0.0,0.0] # running weight orientation, position
+ run_vec_weight: [0.0,0.0,0.0,0.0,0.0,0.0] # running weight
weight: [0.0, 5000.0, 40, 40]
vec_convergence: [0.0,0.0,1000.0,1000.0]
terminal: True
@@ -63,11 +63,11 @@ cost:
primitive_collision_cfg:
weight: 5000.0
- use_sweep: True
+ use_sweep: False
classify: False
sweep_steps: 4
- use_sweep_kernel: True
- use_speed_metric: True
+ use_sweep_kernel: False
+ use_speed_metric: False
speed_dt: 0.01 # used only for speed metric
activation_distance: 0.025
@@ -92,7 +92,7 @@ mppi:
cov_type : "DIAG_A" #
kappa : 0.001
null_act_frac : 0.0
- sample_mode : 'BEST'
+ sample_mode : 'MEAN'
base_action : 'REPEAT'
squash_fn : 'CLAMP'
n_problems : 1
diff --git a/src/curobo/cuda_robot_model/cuda_robot_model.py b/src/curobo/cuda_robot_model/cuda_robot_model.py
index 6f5a45a..016816b 100644
--- a/src/curobo/cuda_robot_model/cuda_robot_model.py
+++ b/src/curobo/cuda_robot_model/cuda_robot_model.py
@@ -180,7 +180,7 @@ class CudaRobotModel(CudaRobotModelConfig):
self._batch_robot_spheres = torch.zeros(
(self._batch_size, self.kinematics_config.total_spheres, 4),
device=self.tensor_args.device,
- dtype=self.tensor_args.dtype,
+ dtype=self.tensor_args.collision_geometry_dtype,
)
self._grad_out_q = torch.zeros(
(self._batch_size, self.get_dof()),
diff --git a/src/curobo/cuda_robot_model/types.py b/src/curobo/cuda_robot_model/types.py
index 9c782ad..776c240 100644
--- a/src/curobo/cuda_robot_model/types.py
+++ b/src/curobo/cuda_robot_model/types.py
@@ -23,6 +23,7 @@ from curobo.types.base import TensorDeviceType
from curobo.types.math import Pose
from curobo.types.state import JointState
from curobo.types.tensor import T_DOF
+from curobo.util.logger import log_error
from curobo.util.tensor_util import clone_if_not_none, copy_if_not_none
@@ -138,6 +139,13 @@ class CSpaceConfig:
self.acceleration_scale = self.tensor_args.to_device(self.acceleration_scale)
if isinstance(self.jerk_scale, List):
self.jerk_scale = self.tensor_args.to_device(self.jerk_scale)
+ # check shapes:
+ if self.retract_config is not None:
+ dof = self.retract_config.shape
+ if self.cspace_distance_weight is not None and self.cspace_distance_weight.shape != dof:
+ log_error("cspace_distance_weight shape does not match retract_config")
+ if self.null_space_weight is not None and self.null_space_weight.shape != dof:
+ log_error("null_space_weight shape does not match retract_config")
def inplace_reindex(self, joint_names: List[str]):
new_index = [self.joint_names.index(j) for j in joint_names]
@@ -207,8 +215,8 @@ class CSpaceConfig:
):
retract_config = ((joint_position_upper + joint_position_lower) / 2).flatten()
n_dof = retract_config.shape[-1]
- null_space_weight = torch.ones(n_dof, **vars(tensor_args))
- cspace_distance_weight = torch.ones(n_dof, **vars(tensor_args))
+ null_space_weight = torch.ones(n_dof, **(tensor_args.as_torch_dict()))
+ cspace_distance_weight = torch.ones(n_dof, **(tensor_args.as_torch_dict()))
return CSpaceConfig(
joint_names,
retract_config,
@@ -289,8 +297,8 @@ class KinematicsTensorConfig:
retract_config = (
(self.joint_limits.position[1] + self.joint_limits.position[0]) / 2
).flatten()
- null_space_weight = torch.ones(self.n_dof, **vars(self.tensor_args))
- cspace_distance_weight = torch.ones(self.n_dof, **vars(self.tensor_args))
+ null_space_weight = torch.ones(self.n_dof, **(self.tensor_args.as_torch_dict()))
+ cspace_distance_weight = torch.ones(self.n_dof, **(self.tensor_args.as_torch_dict()))
joint_names = self.joint_names
self.cspace = CSpaceConfig(
joint_names,
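The shape check added to `CSpaceConfig` above can be modeled without tensors; this sketch mirrors the logic with plain sequences (simplified, and raising instead of calling `log_error`):

```python
def check_cspace_shapes(retract_config, cspace_distance_weight, null_space_weight):
    """Raise if per-joint weight vectors disagree with retract_config.

    Mirrors the tensor .shape comparison added to CSpaceConfig, using
    sequence lengths as a stand-in for tensor shapes.
    """
    if retract_config is None:
        return
    dof = len(retract_config)
    for name, vec in (
        ("cspace_distance_weight", cspace_distance_weight),
        ("null_space_weight", null_space_weight),
    ):
        if vec is not None and len(vec) != dof:
            raise ValueError(name + " shape does not match retract_config")
```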
diff --git a/src/curobo/cuda_robot_model/urdf_kinematics_parser.py b/src/curobo/cuda_robot_model/urdf_kinematics_parser.py
index e116f68..f733798 100644
--- a/src/curobo/cuda_robot_model/urdf_kinematics_parser.py
+++ b/src/curobo/cuda_robot_model/urdf_kinematics_parser.py
@@ -175,7 +175,11 @@ class UrdfKinematicsParser(KinematicsParser):
return txt
def get_link_mesh(self, link_name):
- m = self._robot.link_map[link_name].visuals[0].geometry.mesh
+ link_data = self._robot.link_map[link_name]
+
+ if len(link_data.visuals) == 0:
+ log_error("no visual mesh found for link: " + link_name + ", remove from mesh_link_names")
+ m = link_data.visuals[0].geometry.mesh
mesh_pose = self._robot.link_map[link_name].visuals[0].origin
# read visual material:
if mesh_pose is None:
diff --git a/src/curobo/curobolib/cpp/geom_cuda.cpp b/src/curobo/curobolib/cpp/geom_cuda.cpp
index 3775edb..6ea44ac 100644
--- a/src/curobo/curobolib/cpp/geom_cuda.cpp
+++ b/src/curobo/curobolib/cpp/geom_cuda.cpp
@@ -57,7 +57,8 @@ std::vector<torch::Tensor> swept_sphere_obb_clpt(
const bool enable_speed_metric,
const bool transform_back,
const bool compute_distance,
- const bool use_batch_env);
+ const bool use_batch_env,
+ const bool sum_collisions);
std::vector<torch::Tensor>
sphere_obb_clpt(const torch::Tensor sphere_position, // batch_size, 4
@@ -66,6 +67,7 @@ sphere_obb_clpt(const torch::Tensor sphere_position, // batch_size, 4
torch::Tensor sparsity_idx,
const torch::Tensor weight,
const torch::Tensor activation_distance,
+ const torch::Tensor max_distance,
const torch::Tensor obb_accel, // n_boxes, 4, 4
const torch::Tensor obb_bounds, // n_boxes, 3
const torch::Tensor obb_pose, // n_boxes, 4, 4
@@ -78,8 +80,52 @@ sphere_obb_clpt(const torch::Tensor sphere_position, // batch_size, 4
const int n_spheres,
const bool transform_back,
const bool compute_distance,
- const bool use_batch_env);
+ const bool use_batch_env,
+ const bool sum_collisions,
+ const bool compute_esdf);
+std::vector<torch::Tensor>
+sphere_voxel_clpt(const torch::Tensor sphere_position, // batch_size, 3
+ torch::Tensor distance,
+ torch::Tensor closest_point, // batch size, 3
+ torch::Tensor sparsity_idx, const torch::Tensor weight,
+ const torch::Tensor activation_distance,
+ const torch::Tensor max_distance,
+ const torch::Tensor grid_features, // n_boxes, 4, 4
+ const torch::Tensor grid_params, // n_boxes, 3
+ const torch::Tensor grid_pose, // n_boxes, 4, 4
+ const torch::Tensor grid_enable, // n_boxes, 4, 4
+ const torch::Tensor n_env_grid,
+ const torch::Tensor env_query_idx, // n_boxes, 4, 4
+ const int max_nobs, const int batch_size, const int horizon,
+ const int n_spheres, const bool transform_back,
+ const bool compute_distance, const bool use_batch_env,
+ const bool sum_collisions,
+ const bool compute_esdf);
+std::vector<torch::Tensor>
+swept_sphere_voxel_clpt(const torch::Tensor sphere_position, // batch_size, 3
+ torch::Tensor distance,
+ torch::Tensor closest_point, // batch size, 3
+ torch::Tensor sparsity_idx, const torch::Tensor weight,
+ const torch::Tensor activation_distance,
+ const torch::Tensor max_distance,
+ const torch::Tensor speed_dt,
+ const torch::Tensor grid_features, // n_boxes, 4, 4
+ const torch::Tensor grid_params, // n_boxes, 3
+ const torch::Tensor grid_pose, // n_boxes, 4, 4
+ const torch::Tensor grid_enable, // n_boxes, 4, 4
+ const torch::Tensor n_env_grid,
+ const torch::Tensor env_query_idx, // n_boxes, 4, 4
+ const int max_nobs,
+ const int batch_size,
+ const int horizon,
+ const int n_spheres,
+ const int sweep_steps,
+ const bool enable_speed_metric,
+ const bool transform_back,
+ const bool compute_distance,
+ const bool use_batch_env,
+ const bool sum_collisions);
std::vector<torch::Tensor> pose_distance(
torch::Tensor out_distance,
torch::Tensor out_position_distance,
@@ -159,11 +205,11 @@ std::vector<torch::Tensor> self_collision_distance_wrapper(
std::vector<torch::Tensor> sphere_obb_clpt_wrapper(
const torch::Tensor sphere_position, // batch_size, 4
-
torch::Tensor distance,
torch::Tensor closest_point, // batch size, 3
torch::Tensor sparsity_idx, const torch::Tensor weight,
const torch::Tensor activation_distance,
+ const torch::Tensor max_distance,
const torch::Tensor obb_accel, // n_boxes, 4, 4
const torch::Tensor obb_bounds, // n_boxes, 3
const torch::Tensor obb_pose, // n_boxes, 4, 4
@@ -171,8 +217,10 @@ std::vector<torch::Tensor> sphere_obb_clpt_wrapper(
const torch::Tensor n_env_obb, // n_boxes, 4, 4
const torch::Tensor env_query_idx, // n_boxes, 4, 4
const int max_nobs, const int batch_size, const int horizon,
- const int n_spheres, const bool transform_back, const bool compute_distance,
- const bool use_batch_env)
+ const int n_spheres,
+ const bool transform_back, const bool compute_distance,
+ const bool use_batch_env, const bool sum_collisions = true,
+ const bool compute_esdf = false)
{
const at::cuda::OptionalCUDAGuard guard(sphere_position.device());
@@ -185,9 +233,9 @@ std::vector<torch::Tensor> sphere_obb_clpt_wrapper(
CHECK_INPUT(obb_accel);
return sphere_obb_clpt(
sphere_position, distance, closest_point, sparsity_idx, weight,
- activation_distance, obb_accel, obb_bounds, obb_pose, obb_enable,
+ activation_distance, max_distance, obb_accel, obb_bounds, obb_pose, obb_enable,
n_env_obb, env_query_idx, max_nobs, batch_size, horizon, n_spheres,
- transform_back, compute_distance, use_batch_env);
+ transform_back, compute_distance, use_batch_env, sum_collisions, compute_esdf);
}
std::vector<torch::Tensor> swept_sphere_obb_clpt_wrapper(
@@ -205,7 +253,7 @@ std::vector<torch::Tensor> swept_sphere_obb_clpt_wrapper(
const int max_nobs, const int batch_size, const int horizon,
const int n_spheres, const int sweep_steps, const bool enable_speed_metric,
const bool transform_back, const bool compute_distance,
- const bool use_batch_env)
+ const bool use_batch_env, const bool sum_collisions = true)
{
const at::cuda::OptionalCUDAGuard guard(sphere_position.device());
@@ -218,7 +266,37 @@ std::vector<torch::Tensor> swept_sphere_obb_clpt_wrapper(
distance, closest_point, sparsity_idx, weight, activation_distance,
speed_dt, obb_accel, obb_bounds, obb_pose, obb_enable, n_env_obb,
env_query_idx, max_nobs, batch_size, horizon, n_spheres, sweep_steps,
- enable_speed_metric, transform_back, compute_distance, use_batch_env);
+ enable_speed_metric, transform_back, compute_distance, use_batch_env, sum_collisions);
+}
+
+std::vector<torch::Tensor>
+sphere_voxel_clpt_wrapper(const torch::Tensor sphere_position, // batch_size, 3
+ torch::Tensor distance,
+ torch::Tensor closest_point, // batch size, 3
+ torch::Tensor sparsity_idx, const torch::Tensor weight,
+ const torch::Tensor activation_distance,
+ const torch::Tensor max_distance,
+ const torch::Tensor grid_features, // n_boxes, 4, 4
+ const torch::Tensor grid_params, // n_boxes, 3
+ const torch::Tensor grid_pose, // n_boxes, 4, 4
+ const torch::Tensor grid_enable, // n_boxes, 4, 4
+ const torch::Tensor n_env_grid,
+ const torch::Tensor env_query_idx, // n_boxes, 4, 4
+ const int max_ngrid, const int batch_size, const int horizon,
+ const int n_spheres, const bool transform_back,
+ const bool compute_distance, const bool use_batch_env,
+ const bool sum_collisions,
+ const bool compute_esdf)
+{
+ const at::cuda::OptionalCUDAGuard guard(sphere_position.device());
+
+ CHECK_INPUT(distance);
+ CHECK_INPUT(closest_point);
+ CHECK_INPUT(sphere_position);
+ return sphere_voxel_clpt(sphere_position, distance, closest_point, sparsity_idx, weight,
+ activation_distance, max_distance, grid_features, grid_params,
+ grid_pose, grid_enable, n_env_grid, env_query_idx, max_ngrid, batch_size, horizon, n_spheres,
+ transform_back, compute_distance, use_batch_env, sum_collisions, compute_esdf);
}
std::vector<torch::Tensor> pose_distance_wrapper(
@@ -297,6 +375,12 @@ PYBIND11_MODULE(TORCH_EXTENSION_NAME, m)
"Closest Point OBB(curobolib)");
m.def("swept_closest_point", &swept_sphere_obb_clpt_wrapper,
"Swept Closest Point OBB(curobolib)");
+ m.def("closest_point_voxel", &sphere_voxel_clpt_wrapper,
+ "Closest Point Voxel(curobolib)");
+ m.def("swept_closest_point_voxel", &swept_sphere_voxel_clpt,
+ "Swept Closest Point Voxel(curobolib)");
+
+
m.def("self_collision_distance", &self_collision_distance_wrapper,
"Self Collision Distance (curobolib)");
diff --git a/src/curobo/curobolib/cpp/lbfgs_step_cuda.cpp b/src/curobo/curobolib/cpp/lbfgs_step_cuda.cpp
index b9bb8ea..ac8bbe5 100644
--- a/src/curobo/curobolib/cpp/lbfgs_step_cuda.cpp
+++ b/src/curobo/curobolib/cpp/lbfgs_step_cuda.cpp
@@ -14,38 +14,6 @@
#include
-// CUDA forward declarations
-std::vector<torch::Tensor> reduce_cuda(torch::Tensor vec,
- torch::Tensor vec2,
- torch::Tensor rho_buffer,
- torch::Tensor sum,
- const int batch_size,
- const int v_dim,
- const int m);
-
-std::vector<torch::Tensor>
-lbfgs_step_cuda(torch::Tensor step_vec,
- torch::Tensor rho_buffer,
- torch::Tensor y_buffer,
- torch::Tensor s_buffer,
- torch::Tensor grad_q,
- const float epsilon,
- const int batch_size,
- const int m,
- const int v_dim);
-
-std::vector<torch::Tensor>
-lbfgs_update_cuda(torch::Tensor rho_buffer,
- torch::Tensor y_buffer,
- torch::Tensor s_buffer,
- torch::Tensor q,
- torch::Tensor grad_q,
- torch::Tensor x_0,
- torch::Tensor grad_0,
- const int batch_size,
- const int m,
- const int v_dim);
-
std::vector<torch::Tensor>
lbfgs_cuda_fuse(torch::Tensor step_vec,
torch::Tensor rho_buffer,
@@ -59,7 +27,8 @@ lbfgs_cuda_fuse(torch::Tensor step_vec,
const int batch_size,
const int m,
const int v_dim,
- const bool stable_mode);
+ const bool stable_mode,
+ const bool use_shared_buffers);
// C++ interface
@@ -71,58 +40,12 @@ lbfgs_cuda_fuse(torch::Tensor step_vec,
CHECK_CUDA(x); \
CHECK_CONTIGUOUS(x)
-std::vector<torch::Tensor>
-lbfgs_step_call(torch::Tensor step_vec, torch::Tensor rho_buffer,
- torch::Tensor y_buffer, torch::Tensor s_buffer,
- torch::Tensor grad_q, const float epsilon, const int batch_size,
- const int m, const int v_dim)
-{
- CHECK_INPUT(step_vec);
- CHECK_INPUT(rho_buffer);
- CHECK_INPUT(y_buffer);
- CHECK_INPUT(s_buffer);
- CHECK_INPUT(grad_q);
- const at::cuda::OptionalCUDAGuard guard(grad_q.device());
-
- return lbfgs_step_cuda(step_vec, rho_buffer, y_buffer, s_buffer, grad_q,
- epsilon, batch_size, m, v_dim);
-}
-
-std::vector<torch::Tensor>
-lbfgs_update_call(torch::Tensor rho_buffer, torch::Tensor y_buffer,
- torch::Tensor s_buffer, torch::Tensor q, torch::Tensor grad_q,
- torch::Tensor x_0, torch::Tensor grad_0, const int batch_size,
- const int m, const int v_dim)
-{
- CHECK_INPUT(rho_buffer);
- CHECK_INPUT(y_buffer);
- CHECK_INPUT(s_buffer);
- CHECK_INPUT(grad_q);
- CHECK_INPUT(x_0);
- CHECK_INPUT(grad_0);
- CHECK_INPUT(q);
- const at::cuda::OptionalCUDAGuard guard(grad_q.device());
-
- return lbfgs_update_cuda(rho_buffer, y_buffer, s_buffer, q, grad_q, x_0,
- grad_0, batch_size, m, v_dim);
-}
-
-std::vector<torch::Tensor>
-reduce_cuda_call(torch::Tensor vec, torch::Tensor vec2,
- torch::Tensor rho_buffer, torch::Tensor sum,
- const int batch_size, const int v_dim, const int m)
-{
- const at::cuda::OptionalCUDAGuard guard(sum.device());
-
- return reduce_cuda(vec, vec2, rho_buffer, sum, batch_size, v_dim, m);
-}
-
std::vector<torch::Tensor>
lbfgs_call(torch::Tensor step_vec, torch::Tensor rho_buffer,
torch::Tensor y_buffer, torch::Tensor s_buffer, torch::Tensor q,
torch::Tensor grad_q, torch::Tensor x_0, torch::Tensor grad_0,
const float epsilon, const int batch_size, const int m,
- const int v_dim, const bool stable_mode)
+ const int v_dim, const bool stable_mode, const bool use_shared_buffers)
{
CHECK_INPUT(step_vec);
CHECK_INPUT(rho_buffer);
@@ -136,13 +59,11 @@ lbfgs_call(torch::Tensor step_vec, torch::Tensor rho_buffer,
return lbfgs_cuda_fuse(step_vec, rho_buffer, y_buffer, s_buffer, q, grad_q,
x_0, grad_0, epsilon, batch_size, m, v_dim,
- stable_mode);
+ stable_mode, use_shared_buffers);
}
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m)
{
- m.def("step", &lbfgs_step_call, "L-BFGS step (CUDA)");
- m.def("update", &lbfgs_update_call, "L-BFGS Update (CUDA)");
+
m.def("forward", &lbfgs_call, "L-BFGS Update + Step (CUDA)");
- m.def("debug_reduce", &reduce_cuda_call, "L-BFGS Debug");
}
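The deleted `step`/`update` bindings implemented the standard L-BFGS two-loop recursion that the fused `forward` kernel now performs in a single launch. For reference, a plain-Python sketch of that recursion, including the clamped initial-Hessian scaling plus `epsilon` seen in the removed kernel (list-based vectors, illustrative only):

```python
def lbfgs_direction(grad, s_hist, y_hist, epsilon=0.01):
    """Two-loop L-BFGS recursion: returns the step direction -H * grad.

    s_hist / y_hist are lists of past parameter / gradient differences,
    oldest first. With empty history this reduces to -epsilon * grad.
    """
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    q = list(grad)
    alphas, rhos = [], []
    # first loop: newest pair to oldest
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        alphas.append(alpha)
        rhos.append(rho)
    # initial Hessian scaling gamma = s.y / y.y, clamped at 0, plus epsilon
    if s_hist:
        gamma = max(dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1]), 0.0)
    else:
        gamma = 0.0
    r = [(gamma + epsilon) * qi for qi in q]
    # second loop: oldest pair to newest
    for (s, y), alpha, rho in zip(zip(s_hist, y_hist), reversed(alphas), reversed(rhos)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]
```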
diff --git a/src/curobo/curobolib/cpp/lbfgs_step_kernel.cu b/src/curobo/curobolib/cpp/lbfgs_step_kernel.cu
index 1bad714..503aaba 100644
--- a/src/curobo/curobolib/cpp/lbfgs_step_kernel.cu
+++ b/src/curobo/curobolib/cpp/lbfgs_step_kernel.cu
@@ -12,7 +12,7 @@
#include
#include
#include
-
+#include
#include
#include
@@ -32,200 +32,20 @@
// #include
// #include
-#define M_MAX 512
-#define HALF_MAX 65504.0
-#define M 15
-#define VDIM 175 // 25 * 7,
-
+//#define M_MAX 512
+//#define HALF_MAX 65504.0
+//#define M 15
+//#define VDIM 175 // 25 * 7,
#define FULL_MASK 0xffffffff
-
+#define VOLTA_PLUS true
namespace Curobo
{
namespace Optimization
{
- template <typename scalar_t>
- __device__ __forceinline__ void
- scalar_vec_product(const scalar_t a, const scalar_t *b, scalar_t *out,
- const int v_dim)
- {
- for (int i = 0; i < v_dim; i++)
- {
- out[i] = a * b[i];
- }
- }
-
- template <typename scalar_t>
- __device__ __forceinline__ void
- m_scalar_vec_product(const scalar_t *a, const scalar_t *b, scalar_t *out,
- const int v_dim, const int m)
- {
- for (int j = 0; j < m; j++)
- {
- for (int i = 0; i < v_dim; i++)
- {
- out[j * v_dim + i] = a[j] * b[j * v_dim + i];
- }
- }
- }
-
- template <typename scalar_t>
- __device__ __forceinline__ void vec_vec_dot(const scalar_t *a,
- const scalar_t *b, scalar_t& out,
- const int v_dim)
- {
- for (int i = 0; i < v_dim; i++)
- {
- out += a[i] * b[i];
- }
- }
-
- template <typename scalar_t>
- __device__ __forceinline__ void update_r(const scalar_t *rho_y,
- const scalar_t *s_buffer, scalar_t *r,
- scalar_t& alpha, const int v_dim)
- {
- // dot product: and subtract with alpha
- for (int i = 0; i < v_dim; i++)
- {
- alpha -= rho_y[i] * r[i];
- }
-
- // scalar vector product:
- for (int i = 0; i < v_dim; i++)
- {
- r[i] = r[i] + alpha * s_buffer[i];
- }
- }
-
- template <typename scalar_t>
- __device__ __forceinline__ void update_q(const scalar_t *y_buffer, scalar_t *gq,
- const scalar_t alpha,
- const int v_dim)
- {
- //
- for (int i = 0; i < v_dim; i++)
- {
- gq[i] = gq[i] - (alpha * y_buffer[i]);
- }
- }
-
- template <typename scalar_t>
- __global__ void
- lbfgs_step_kernel_old(scalar_t *step_vec, scalar_t *rho_buffer,
- const scalar_t *y_buffer, const scalar_t *s_buffer,
- const scalar_t *grad_q, const float epsilon,
- const int batchSize, const int m, const int v_dim)
- {
- // each thread writes one sphere of some link
- const int t = blockDim.x * blockIdx.x + threadIdx.x; // batch
- const int b_idx = t;
-
- if (t >= (batchSize))
- {
- return;
- }
-
- // get thread start address:
- const int b_start_scalar_adrs = b_idx * m;
- const int b_start_vec_adrs = b_idx * m * v_dim;
- const int b_step_start_adrs = b_idx * v_dim;
-
- scalar_t rho_s[M * VDIM];
-
- // copy floats to local buffer?
- // y_buffer, s_buffer, rho_buffer
- // compute rho_s
- scalar_t loc_ybuf[M * VDIM];
- scalar_t loc_sbuf[M * VDIM];
- scalar_t loc_rho[M];
- scalar_t gq[VDIM]; // , l_q[VDIM];
- scalar_t alpha_buffer[M];
- scalar_t t_1, t_2;
-
- for (int i = 0; i < m * v_dim; i++)
- {
- loc_ybuf[i] = y_buffer[b_start_vec_adrs + i];
- loc_sbuf[i] = s_buffer[b_start_vec_adrs + i];
- }
-
- for (int i = 0; i < v_dim; i++)
- {
- gq[i] = grad_q[b_step_start_adrs + i];
- }
-
- for (int i = 0; i < m; i++)
- {
- loc_rho[i] = rho_buffer[b_start_scalar_adrs + i];
- }
-
- m_scalar_vec_product(&loc_rho[0], &loc_sbuf[0], &rho_s[0], v_dim, m);
-
- // for loop over m
- for (int i = m - 1; i > m - 2; i--)
- {
- // l_start_vec_adrs = i * v_dim;
- // scalar_vec_product(loc_rho[i], &loc_sbuf[i*v_dim], &rho_s[i*v_dim],
- // v_dim);
- vec_vec_dot(&rho_s[i * v_dim], &gq[0], alpha_buffer[i], v_dim);
- update_q(&loc_ybuf[(i * v_dim)], &gq[0], alpha_buffer[i], v_dim);
- }
-
- // compute initial hessian:
- vec_vec_dot(&loc_sbuf[(m - 1) * v_dim], &loc_ybuf[(m - 1) * v_dim], t_1,
- v_dim);
- vec_vec_dot(&loc_ybuf[(m - 1) * v_dim], &loc_ybuf[(m - 1) * v_dim], t_2,
- v_dim);
- t_1 = t_1 / t_2;
-
- if (t_1 < 0)
- {
- t_1 = 0;
- }
-
- t_1 += epsilon;
- scalar_vec_product(t_1, &gq[0], &gq[0], v_dim);
-
- m_scalar_vec_product(&loc_rho[0], &loc_ybuf[0], &rho_s[0], v_dim, m);
-
- for (int i = 0; i < m; i++)
- {
- // scalar_vec_product(loc_rho[i], &loc_ybuf[i*v_dim], &rho_s[i*v_dim],
- // v_dim);
- update_r(&rho_s[i * v_dim], &loc_sbuf[i * v_dim], &gq[0], alpha_buffer[i],
- v_dim);
- }
-
- // write gq to out grad:
-
- for (int i = 0; i < v_dim; i++)
- {
- step_vec[b_step_start_adrs + i] = -1.0 * gq[i];
- }
- }
-
- template <typename psum_t>
- __forceinline__ __device__ psum_t warpReduce(psum_t v, int elems,
- unsigned mask)
- {
- psum_t val = v;
- int shift = 1;
-
- for (int i = elems; i > 1; i /= 2)
- {
- val += __shfl_down_sync(mask, val, shift);
- shift *= 2;
- }
-
- // val += __shfl_down_sync(mask, val, 1); // i=32
- // val += __shfl_down_sync(mask, val, 2); // i=16
- // val += __shfl_down_sync(mask, val, 4); // i=8
- // val += __shfl_down_sync(mask, val, 8); // i=4
- // val += __shfl_down_sync(mask, val, 16); // i=2
- return val;
- }
-
+
+
  template <typename scalar_t, typename psum_t>
- __forceinline__ __device__ void reduce(scalar_t v, int m, psum_t *data,
+ __forceinline__ __device__ void reduce_v0(scalar_t v, int m, psum_t *data,
scalar_t *result)
{
psum_t val = v;
@@ -238,7 +58,8 @@ namespace Curobo
val += __shfl_down_sync(mask, val, 16);
  // int leader = __ffs(mask) - 1; // select a leader lane
- int leader = 0;
+ const int leader = 0;
+
if (threadIdx.x % 32 == leader)
{
@@ -271,7 +92,6 @@ namespace Curobo
}
  // int leader = __ffs(mask2) - 1; // select a leader lane
- int leader = 0;
if (threadIdx.x % 32 == leader)
{
@@ -280,8 +100,35 @@ namespace Curobo
}
}
__syncthreads();
+
}
+
+
  + template <typename scalar_t> __inline__ __device__ scalar_t relu(scalar_t var)
+ {
+ if (var < 0)
+ return 0;
+ else
+ return var;
+ }
+
  + template <typename psum_t>
+ __forceinline__ __device__ psum_t warpReduce(psum_t v,
+ const int elems,
+ unsigned mask)
+ {
+ psum_t val = v;
+ int shift = 1;
+
+ #pragma unroll
+ for (int i = elems; i > 1; i /= 2)
+ {
+ val += __shfl_down_sync(mask, val, shift);
+ shift *= 2;
+ }
+ return val;
+ }
// blockReduce
  template <typename scalar_t, typename psum_t>
__forceinline__ __device__ void reduce_v1(scalar_t v, int m, psum_t *data,
@@ -292,12 +139,23 @@ namespace Curobo
  // int leader = __ffs(mask) - 1; // select a leader lane
int leader = 0;
-
+ if (threadIdx.x % 32 == leader)
+ {
+ if (m < 32)
+ {
+ result[0] = (scalar_t)val;
+ }
+ else
+ {
+ data[(threadIdx.x + 1) / 32] = val;
+ }
+ }
+ /*
if (threadIdx.x % 32 == leader)
{
data[(threadIdx.x + 1) / 32] = val;
}
-
+ */
if (m >= 32)
{
__syncthreads();
@@ -307,10 +165,20 @@ namespace Curobo
if (threadIdx.x / 32 == 0) // only the first warp will do this work
{
- psum_t val2 = warpReduce(data[threadIdx.x % 32], elems, mask2);
+ psum_t val2 = data[threadIdx.x % 32];
+ int shift = 1;
+
+ #pragma unroll
+ for (int i = elems - 1; i > 0; i /= 2)
+ {
+ val2 += __shfl_down_sync(mask2, val2, shift);
+ shift *= 2;
+ }
+
+ //psum_t val2 = warpReduce(data[threadIdx.x % 32], elems - 1, mask2);
  // // int leader = __ffs(mask2) - 1; // select a leader lane
- if (threadIdx.x == leader)
+ if (threadIdx.x % 32 == leader)
{
result[0] = (scalar_t)val2;
}
@@ -325,205 +193,7 @@ namespace Curobo
}
__syncthreads();
}
-
  - template <typename scalar_t, typename psum_t>
- __inline__ __device__ void dot(const scalar_t *mat1, const scalar_t *mat2,
- const int m, psum_t *data, scalar_t *result)
- {
- scalar_t val = mat1[threadIdx.x] * mat2[threadIdx.x];
-
- reduce(val, m, data, result);
- }
-
  - template <typename scalar_t> __inline__ __device__ scalar_t relu(scalar_t var)
- {
- if (var < 0)
- return 0;
- else
- return var;
- }
-
- //////////////////////////////////////////////////////////
- // one block per batch
- // v_dim threads per block
- //////////////////////////////////////////////////////////
  - template <typename scalar_t, typename psum_t>
- __global__ void lbfgs_step_kernel(scalar_t *step_vec, // b x 175
- scalar_t *rho_buffer, // m x b x 1
- const scalar_t *y_buffer, // m x b x 175
- const scalar_t *s_buffer, // m x b x 175
- const scalar_t *grad_q, // b x 175
- const float epsilon, const int batchsize,
- const int m, const int v_dim)
- {
- __shared__ psum_t
- data[32]; // temporary buffer needed for block-wide reduction
- __shared__ scalar_t
- result; // result of the reduction or vector-vector dot product
- __shared__ scalar_t gq[175]; /// gq = batch * v_dim
-
- assert(v_dim < 176);
- int batch = blockIdx.x; // one block per batch
-
- if (threadIdx.x >= v_dim)
- return;
-
- gq[threadIdx.x] = grad_q[batch * v_dim + threadIdx.x]; // copy grad_q to gq
-
- scalar_t alpha_buffer[16];
-
- // assert(m<16); // allocating a buffer assuming m < 16
-
- for (int i = m - 1; i > -1; i--)
- {
- dot(&gq[0], &s_buffer[i * batchsize * v_dim + batch * v_dim], v_dim,
- &data[0], &result);
- alpha_buffer[i] = result * rho_buffer[i * batchsize + batch];
- gq[threadIdx.x] =
- gq[threadIdx.x] -
- alpha_buffer[i] *
- y_buffer[i * batchsize * v_dim + batch * v_dim + threadIdx.x];
- }
-
- // compute var1
- scalar_t val1 =
- y_buffer[(m - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x];
- scalar_t val2 =
- s_buffer[(m - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x];
- reduce(val1 * val1, v_dim, data, &result);
- scalar_t denominator = result;
- reduce(val1 * val2, v_dim, data, &result);
- scalar_t numerator = result;
- scalar_t var1 = numerator / denominator;
-
- scalar_t gamma = relu(var1) + epsilon; // epsilon
-
- gq[threadIdx.x] = gamma * gq[threadIdx.x];
-
- for (int i = 0; i < m; i++)
- {
- dot(&gq[0], &y_buffer[i * batchsize * v_dim + batch * v_dim], v_dim,
- &data[0], &result);
- gq[threadIdx.x] =
- gq[threadIdx.x] +
- (alpha_buffer[i] - result * rho_buffer[i * batchsize + batch]) *
- s_buffer[i * batchsize * v_dim + batch * v_dim + threadIdx.x];
- }
-
- step_vec[batch * v_dim + threadIdx.x] =
- -1 * gq[threadIdx.x]; // copy from shared memory to global memory
- }
-
  - template <typename scalar_t, typename psum_t>
- __global__ void lbfgs_update_buffer_kernel(scalar_t *rho_buffer, // m x b x 1
- scalar_t *y_buffer, // m x b x 175
- scalar_t *s_buffer, // m x b x 175
- scalar_t *q, // b x 175
- scalar_t *x_0, // b x 175
- scalar_t *grad_0, // b x 175
- const scalar_t *grad_q, // b x 175
- const int batchsize, const int m,
- const int v_dim)
- {
- __shared__ psum_t
- data[32]; // temporary buffer needed for block-wide reduction
- __shared__ scalar_t
- result; // result of the reduction or vector-vector dot product
-
- // __shared__ scalar_t y[175]; // temporary shared memory storage
- // __shared__ scalar_t s[175]; // temporary shared memory storage
- assert(v_dim <= VDIM);
- int batch = blockIdx.x; // one block per batch
-
- if (threadIdx.x >= v_dim)
- return;
-
- scalar_t y =
- grad_q[batch * v_dim + threadIdx.x] - grad_0[batch * v_dim + threadIdx.x];
- scalar_t s =
- q[batch * v_dim + threadIdx.x] - x_0[batch * v_dim + threadIdx.x];
- reduce(y * s, v_dim, &data[0], &result);
-
- for (int i = 1; i < m; i++)
- {
- s_buffer[(i - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x] =
- s_buffer[i * batchsize * v_dim + batch * v_dim + threadIdx.x];
- y_buffer[(i - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x] =
- y_buffer[i * batchsize * v_dim + batch * v_dim + threadIdx.x];
- }
-
- // __syncthreads();
-
- s_buffer[(m - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x] = s;
- y_buffer[(m - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x] = y;
- grad_0[batch * v_dim +
- threadIdx.x] =
- grad_q[batch * v_dim + threadIdx.x];
- x_0[batch * v_dim +
- threadIdx.x] =
- q[batch * v_dim + threadIdx.x];
-
- if (threadIdx.x == 0)
- {
- scalar_t rho = 1 / result;
-
- for (int i = 1; i < m; i++)
- {
- rho_buffer[(i - 1) * batchsize + batch] =
- rho_buffer[i * batchsize + batch];
- }
- rho_buffer[(m - 1) * batchsize + batch] = rho;
- }
- }
-
  - template <typename scalar_t, typename psum_t>
- __global__ void reduce_kernel(
- scalar_t *vec1, // b x 175
- scalar_t *vec2, // b x 175
- scalar_t *rho_buffer, // m x b x 1
-
- scalar_t *sum_out, // m x b x 1
-
- const int batchsize, const int m,
- const int v_dim) // s_buffer and y_buffer are not rolled by default
- {
- __shared__ psum_t
- data[32]; // temporary buffer needed for block-wide reduction
- __shared__ scalar_t
- result; // result of the reduction or vector-vector dot product
- int batch = blockIdx.x; // one block per batch
-
- if (threadIdx.x >= v_dim)
- return;
-
- ////////////////////
- // update_buffer
- ////////////////////
- scalar_t y = vec1[batch * v_dim + threadIdx.x];
- scalar_t s = vec2[batch * v_dim + threadIdx.x];
-
- reduce(y * s, v_dim, &data[0], &result);
- scalar_t numerator = result;
-
- if (threadIdx.x == 0)
- {
- sum_out[batch] = 1 / numerator;
- }
-
- // return;
- if (threadIdx.x < m - 1)
- {
  - // m threads participate to shift the values
- // this is safe as m<32 and this happens in lockstep
- rho_buffer[threadIdx.x * batchsize + batch] =
- rho_buffer[(threadIdx.x + 1) * batchsize + batch];
- }
- else if (threadIdx.x == m - 1)
- {
- scalar_t rho = (1 / numerator);
- rho_buffer[threadIdx.x * batchsize + batch] = rho;
- }
- }
+
  template <typename scalar_t, typename psum_t, bool rolled_ys>
__global__ void lbfgs_update_buffer_and_step_v1(
@@ -535,26 +205,24 @@ namespace Curobo
scalar_t *x_0, // b x 175
scalar_t *grad_0, // b x 175
const scalar_t *grad_q, // b x 175
- const float epsilon, const int batchsize, const int m, const int v_dim,
- const bool stable_mode =
- false) // s_buffer and y_buffer are not rolled by default
+ const float epsilon, const int batchsize, const int lbfgs_history, const int v_dim,
+ const bool stable_mode = false) // s_buffer and y_buffer are not rolled by default
{
- // extern __shared__ scalar_t alpha_buffer_sh[];
- extern __shared__ __align__(sizeof(scalar_t)) unsigned char my_smem[];
  - scalar_t *my_smem_rc = reinterpret_cast<scalar_t *>(my_smem);
- scalar_t *alpha_buffer_sh = &my_smem_rc[0]; // m*blockDim.x
- scalar_t *rho_buffer_sh = &my_smem_rc[m * blockDim.x]; // batchsize*m
- scalar_t *s_buffer_sh =
- &my_smem_rc[m * blockDim.x + m * batchsize]; // m*blockDim.x
- scalar_t *y_buffer_sh =
- &my_smem_rc[2 * m * blockDim.x + m * batchsize]; // m*blockDim.x
+ extern __shared__ float my_smem_rc[];
+ //__shared__ float my_smem_rc[21 * (3 * (32 * 7) + 1)];
+ int history_m = lbfgs_history;
+
+ // align the external shared memory by 4 bytes
+ float* s_buffer_sh = (float *) &my_smem_rc; // m*blockDim.x
+
+ float* y_buffer_sh = (float *) &s_buffer_sh[history_m * v_dim]; // m*blockDim.x
+ float* alpha_buffer_sh = (float *) &y_buffer_sh[history_m * v_dim]; // m*blockDim.x
  + float* rho_buffer_sh = (float *) &alpha_buffer_sh[history_m * v_dim]; // history_m
+
+ psum_t* data = (psum_t *)&rho_buffer_sh[history_m];
+ float* result = (float *)&data[32];
+
- __shared__ psum_t
- data[32]; // temporary buffer needed for
- // block-wide reduction
- __shared__ scalar_t
- result; // result of the reduction or
- // vector-vector dot product
int batch = blockIdx.x; // one block per batch
if (threadIdx.x >= v_dim)
@@ -562,7 +230,6 @@ namespace Curobo
scalar_t gq;
gq = grad_q[batch * v_dim + threadIdx.x]; // copy grad_q to gq
-
////////////////////
// update_buffer
////////////////////
@@ -571,18 +238,17 @@ namespace Curobo
// if y is close to zero
scalar_t s =
q[batch * v_dim + threadIdx.x] - x_0[batch * v_dim + threadIdx.x];
- reduce_v1(y * s, v_dim, &data[0], &result);
- // reduce(y * s, v_dim, &data[0], &result);
- scalar_t numerator = result;
+ //reduce_v1(y * s, v_dim, &data[0], &result);
+ reduce_v1(y * s, v_dim, &data[0], result);
+
- // scalar_t rho = 1.0/numerator;
+ scalar_t numerator = result[0];
if (!rolled_ys)
{
-#pragma unroll
-
- for (int i = 1; i < m; i++)
+ #pragma unroll
+ for (int i = 1; i < history_m; i++)
{
scalar_t st =
s_buffer[i * batchsize * v_dim + batch * v_dim + threadIdx.x];
@@ -590,30 +256,30 @@ namespace Curobo
y_buffer[i * batchsize * v_dim + batch * v_dim + threadIdx.x];
s_buffer[(i - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x] = st;
y_buffer[(i - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x] = yt;
- s_buffer_sh[m * threadIdx.x + i - 1] = st;
- y_buffer_sh[m * threadIdx.x + i - 1] = yt;
+ s_buffer_sh[history_m * threadIdx.x + i - 1] = st;
+ y_buffer_sh[history_m * threadIdx.x + i - 1] = yt;
}
}
- s_buffer[(m - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x] = s;
- y_buffer[(m - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x] = y;
- s_buffer_sh[m * threadIdx.x + m - 1] = s;
- y_buffer_sh[m * threadIdx.x + m - 1] = y;
+ s_buffer[(history_m - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x] = s;
+ y_buffer[(history_m - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x] = y;
+ s_buffer_sh[history_m * threadIdx.x + history_m - 1] = s;
+ y_buffer_sh[history_m * threadIdx.x + history_m - 1] = y;
grad_0[batch * v_dim + threadIdx.x] = gq;
x_0[batch * v_dim +
threadIdx.x] =
q[batch * v_dim + threadIdx.x];
- if (threadIdx.x < m - 1)
+ if (threadIdx.x < history_m - 1)
{
      // m threads participate to shift the values
// this is safe as m<32 and this happens in lockstep
scalar_t rho = rho_buffer[(threadIdx.x + 1) * batchsize + batch];
rho_buffer[threadIdx.x * batchsize + batch] = rho;
- rho_buffer_sh[threadIdx.x * batchsize + batch] = rho;
+ rho_buffer_sh[threadIdx.x] = rho;
}
- if (threadIdx.x == m - 1)
+ if (threadIdx.x == history_m - 1)
{
scalar_t rho = 1.0 / numerator;
@@ -623,10 +289,9 @@ namespace Curobo
rho = 0.0;
}
rho_buffer[threadIdx.x * batchsize + batch] = rho;
- rho_buffer_sh[threadIdx.x * batchsize + batch] = rho;
+ rho_buffer_sh[threadIdx.x] = rho;
}
- // return;
__syncthreads();
////////////////////
@@ -635,25 +300,29 @@ namespace Curobo
// scalar_t alpha_buffer[16];
// assert(m<16); // allocating a buffer assuming m < 16
-#pragma unroll
-
- for (int i = m - 1; i > -1; i--)
+ #pragma unroll
+ for (int i = history_m - 1; i > -1; i--)
{
// reduce(gq * s_buffer[i*batchsize*v_dim + batch*v_dim + threadIdx.x],
// v_dim, &data[0], &result);
- reduce_v1(gq * s_buffer_sh[m * threadIdx.x + i], v_dim, &data[0], &result);
- alpha_buffer_sh[threadIdx.x * m + i] =
- result * rho_buffer_sh[i * batchsize + batch];
+ //reduce_v1(gq * s_buffer_sh[m * threadIdx.x + i], v_dim, &data[0], &result);
+ reduce_v1(gq * s_buffer_sh[history_m * threadIdx.x + i], v_dim, &data[0], result);
+
+ alpha_buffer_sh[threadIdx.x * history_m + i] =
+ result[0] * rho_buffer_sh[i];
// gq = gq - alpha_buffer_sh[threadIdx.x*m+i]*y_buffer[i*batchsize*v_dim +
// batch*v_dim + threadIdx.x];
- gq = gq - alpha_buffer_sh[threadIdx.x * m + i] *
- y_buffer_sh[m * threadIdx.x + i];
+ gq = gq - alpha_buffer_sh[threadIdx.x * history_m + i] *
+ y_buffer_sh[history_m * threadIdx.x + i];
}
+ //return;
// compute var1
- reduce_v1(y * y, v_dim, data, &result);
- scalar_t denominator = result;
+ //reduce_v1(y * y, v_dim, &data[0], &result);
+ reduce_v1(y * y, v_dim, &data[0], result);
+
+ scalar_t denominator = result[0];
// reduce(s*y, v_dim, data, &result); // redundant - already computed it above
// scalar_t numerator = result;
@@ -670,50 +339,187 @@ namespace Curobo
scalar_t gamma = relu(var1);
gq = gamma * gq;
-#pragma unroll
-
- for (int i = 0; i < m; i++)
+ #pragma unroll
+ for (int i = 0; i < history_m; i++)
{
// reduce(gq * y_buffer[i*batchsize*v_dim + batch*v_dim + threadIdx.x],
// v_dim, &data[0], &result); gq = gq + (alpha_buffer_sh[threadIdx.x*m+i] -
// result * rho_buffer_sh[i*batchsize+batch]) * s_buffer[i*batchsize*v_dim +
// batch*v_dim + threadIdx.x];
- reduce_v1(gq * y_buffer_sh[m * threadIdx.x + i], v_dim, &data[0], &result);
- gq = gq + (alpha_buffer_sh[threadIdx.x * m + i] -
- result * rho_buffer_sh[i * batchsize + batch]) *
- s_buffer_sh[m * threadIdx.x + i];
+ //reduce_v1(gq * y_buffer_sh[m * threadIdx.x + i], v_dim, &data[0], &result);
+ reduce_v1(gq * y_buffer_sh[history_m * threadIdx.x + i], v_dim, &data[0], result);
+
+ gq = gq + (alpha_buffer_sh[threadIdx.x * history_m + i] -
+ result[0] * rho_buffer_sh[i]) *
+ s_buffer_sh[history_m * threadIdx.x + i];
}
step_vec[batch * v_dim + threadIdx.x] =
-1.0 * gq; // copy from shared memory to global memory
}
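Every fused variant above evaluates the classic L-BFGS two-loop recursion: an alpha pass over the history from newest to oldest, initial Hessian scaling by gamma = (s·y)/(y·y) clamped through `relu`, then a beta pass from oldest to newest. A minimal CPU sketch under the same conventions (hypothetical `lbfgs_step` helper; illustration only — the kernels additionally fuse the buffer roll and the rho update):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Vec = std::vector<double>;

static double dot(const Vec& a, const Vec& b)
{
  double r = 0;
  for (size_t i = 0; i < a.size(); i++) r += a[i] * b[i];
  return r;
}

// CPU sketch of the two-loop recursion the kernels evaluate per batch:
// s[i], y[i] are the stored parameter/gradient differences (oldest first),
// rho[i] = 1 / (y[i] . s[i]). Returns the step -H_k * grad.
Vec lbfgs_step(const Vec& grad, const std::vector<Vec>& s,
               const std::vector<Vec>& y)
{
  const int m = (int)s.size();
  Vec q = grad;
  std::vector<double> alpha(m), rho(m);
  for (int i = 0; i < m; i++) rho[i] = 1.0 / dot(y[i], s[i]);

  for (int i = m - 1; i >= 0; i--)       // first loop: newest to oldest
  {
    alpha[i] = rho[i] * dot(s[i], q);
    for (size_t j = 0; j < q.size(); j++) q[j] -= alpha[i] * y[i][j];
  }

  double gamma = dot(s[m - 1], y[m - 1]) / dot(y[m - 1], y[m - 1]);
  if (gamma < 0) gamma = 0;              // relu(), as in the kernels
  for (size_t j = 0; j < q.size(); j++) q[j] *= gamma;

  for (int i = 0; i < m; i++)            // second loop: oldest to newest
  {
    double beta = rho[i] * dot(y[i], q);
    for (size_t j = 0; j < q.size(); j++) q[j] += (alpha[i] - beta) * s[i][j];
  }

  for (size_t j = 0; j < q.size(); j++) q[j] = -q[j];
  return q;
}
```

When s[i] == y[i] for every history entry the estimated inverse Hessian reduces to the identity and the returned step is exactly -grad, which makes a handy sanity check for the kernels.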
- // (32/M) rolls per warp
- // Threads in a warp in a GPU execute in a lock-step. We leverage that to do
- // the roll without using temporary storage or explicit synchronization.
  - template <typename scalar_t>
- __global__ void lbfgs_roll(scalar_t *a, // m x b x 175
- scalar_t *b, // m x b x 175
- const int m_t, const int batchsize, const int m,
- const int v_dim)
  + template <typename scalar_t, typename psum_t, int FIXED_M, bool rolled_ys>
+ __global__ void lbfgs_update_buffer_and_step_v1_compile_m(
+ scalar_t *step_vec, // b x 175
+ scalar_t *rho_buffer, // m x b x 1
+ scalar_t *y_buffer, // m x b x 175
+ scalar_t *s_buffer, // m x b x 175
+ scalar_t *q, // b x 175
+ scalar_t *x_0, // b x 175
+ scalar_t *grad_0, // b x 175
+ const scalar_t *grad_q, // b x 175
+ const float epsilon, const int batchsize, const int lbfgs_history, const int v_dim,
+ const bool stable_mode = false) // s_buffer and y_buffer are not rolled by default
{
- assert(m_t <= 32);
- int t = blockDim.x * blockIdx.x + threadIdx.x;
+ extern __shared__ float my_smem_rc[];
+ //__shared__ float my_smem_rc[21 * (3 * (32 * 7) + 1)];
+
+ // align the external shared memory by 4 bytes
+ float* s_buffer_sh = (float *) &my_smem_rc; // m*blockDim.x
- if (t >= m_t * v_dim * batchsize)
+ float* y_buffer_sh = (float *) &s_buffer_sh[FIXED_M * v_dim]; // m*blockDim.x
+ float* alpha_buffer_sh = (float *) &y_buffer_sh[FIXED_M * v_dim]; // m*blockDim.x
  + float* rho_buffer_sh = (float *) &alpha_buffer_sh[FIXED_M * v_dim]; // FIXED_M
+
+ psum_t* data = (psum_t *)&rho_buffer_sh[FIXED_M];
+ float* result = (float *)&data[32];
+
+
+ int batch = blockIdx.x; // one block per batch
+
+ if (threadIdx.x >= v_dim)
return;
- int _m = t % m_t;
- int _v_dim = (t / m_t) % v_dim;
- int batch = t / (m * v_dim); // this line could be wrong?
+ scalar_t gq;
+ gq = grad_q[batch * v_dim + threadIdx.x]; // copy grad_q to gq
+ ////////////////////
+ // update_buffer
+ ////////////////////
+ scalar_t y = gq - grad_0[batch * v_dim + threadIdx.x];
- if (_m < m - 1)
+ // if y is close to zero
+ scalar_t s =
+ q[batch * v_dim + threadIdx.x] - x_0[batch * v_dim + threadIdx.x];
+
+ //reduce_v1(y * s, v_dim, &data[0], &result);
+ reduce_v1(y * s, v_dim, &data[0], result);
+
+
+ scalar_t numerator = result[0];
+
+ if (!rolled_ys)
{
- a[_m * batchsize * v_dim + batch * v_dim + _v_dim] =
- a[(_m + 1) * batchsize * v_dim + batch * v_dim + _v_dim];
- b[_m * batchsize * v_dim + batch * v_dim + _v_dim] =
- b[(_m + 1) * batchsize * v_dim + batch * v_dim + _v_dim];
+ scalar_t st = 0;
+ scalar_t yt = 0;
+ #pragma unroll
+ for (int i = 1; i < FIXED_M ; i++)
+ {
+ st =
+ s_buffer[i * batchsize * v_dim + batch * v_dim + threadIdx.x];
+ yt =
+ y_buffer[i * batchsize * v_dim + batch * v_dim + threadIdx.x];
+ s_buffer[(i - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x] = st;
+ y_buffer[(i - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x] = yt;
+ s_buffer_sh[FIXED_M * threadIdx.x + i - 1] = st;
+ y_buffer_sh[FIXED_M * threadIdx.x + i - 1] = yt;
+ }
}
+
+ s_buffer[(FIXED_M - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x] = s;
+ y_buffer[(FIXED_M - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x] = y;
+ s_buffer_sh[FIXED_M * threadIdx.x + FIXED_M - 1] = s;
+ y_buffer_sh[FIXED_M * threadIdx.x + FIXED_M - 1] = y;
+ grad_0[batch * v_dim + threadIdx.x] = gq;
+ x_0[batch * v_dim +
+ threadIdx.x] =
+ q[batch * v_dim + threadIdx.x];
+
+ if (threadIdx.x < FIXED_M - 1)
+ {
  + // m threads participate to shift the values
+ // this is safe as m<32 and this happens in lockstep
+ scalar_t rho = rho_buffer[(threadIdx.x + 1) * batchsize + batch];
+ rho_buffer[threadIdx.x * batchsize + batch] = rho;
+ rho_buffer_sh[threadIdx.x] = rho;
+ }
+
+ if (threadIdx.x == FIXED_M - 1)
+ {
+ scalar_t rho = 1.0 / numerator;
+
+ // if this is nan, make it zero:
+ if (stable_mode && (numerator == 0.0))
+ {
+ rho = 0.0;
+ }
+ rho_buffer[threadIdx.x * batchsize + batch] = rho;
+ rho_buffer_sh[threadIdx.x] = rho;
+ }
+
+ __syncthreads();
+
+ ////////////////////
+ // step
+ ////////////////////
+ // scalar_t alpha_buffer[16];
+ // assert(m<16); // allocating a buffer assuming m < 16
+
+ #pragma unroll
+ for (int i = FIXED_M - 1; i > -1; i--)
+ {
+ // reduce(gq * s_buffer[i*batchsize*v_dim + batch*v_dim + threadIdx.x],
+ // v_dim, &data[0], &result);
+ //reduce_v1(gq * s_buffer_sh[m * threadIdx.x + i], v_dim, &data[0], &result);
+ reduce_v1(gq * s_buffer_sh[FIXED_M * threadIdx.x + i], v_dim, &data[0], result);
+
+ alpha_buffer_sh[threadIdx.x * FIXED_M + i] =
+ result[0] * rho_buffer_sh[i];
+
+ // gq = gq - alpha_buffer_sh[threadIdx.x*m+i]*y_buffer[i*batchsize*v_dim +
+ // batch*v_dim + threadIdx.x];
+ gq = gq - alpha_buffer_sh[threadIdx.x * FIXED_M + i] *
+ y_buffer_sh[FIXED_M * threadIdx.x + i];
+ }
+ //return;
+
+ // compute var1
+ //reduce_v1(y * y, v_dim, &data[0], &result);
+ reduce_v1(y * y, v_dim, &data[0], result);
+
+ scalar_t denominator = result[0];
+
+ // reduce(s*y, v_dim, data, &result); // redundant - already computed it above
+ // scalar_t numerator = result;
+ scalar_t var1 = numerator / denominator;
+
+ // To improve stability, uncomment below line: [this however leads to poor
+ // convergence]
+
+ if (stable_mode && (denominator == 0.0))
+ {
+ var1 = epsilon;
+ }
+
+ scalar_t gamma = relu(var1);
+ gq = gamma * gq;
+
+ #pragma unroll
+ for (int i = 0; i < FIXED_M ; i++)
+ {
+ // reduce(gq * y_buffer[i*batchsize*v_dim + batch*v_dim + threadIdx.x],
+ // v_dim, &data[0], &result); gq = gq + (alpha_buffer_sh[threadIdx.x*m+i] -
+ // result * rho_buffer_sh[i*batchsize+batch]) * s_buffer[i*batchsize*v_dim +
+ // batch*v_dim + threadIdx.x];
+ //reduce_v1(gq * y_buffer_sh[m * threadIdx.x + i], v_dim, &data[0], &result);
+ reduce_v1(gq * y_buffer_sh[FIXED_M * threadIdx.x + i], v_dim, &data[0], result);
+
+ gq = gq + (alpha_buffer_sh[threadIdx.x * FIXED_M + i] -
+ result[0] * rho_buffer_sh[i]) *
+ s_buffer_sh[FIXED_M * threadIdx.x + i];
+ }
+
+ step_vec[batch * v_dim + threadIdx.x] =
+ -1.0 * gq; // copy from shared memory to global memory
}
  template <typename scalar_t, typename psum_t, bool rolled_ys>
@@ -728,14 +534,19 @@ namespace Curobo
const scalar_t *grad_q, // b x 175
const float epsilon, const int batchsize, const int m, const int v_dim,
const bool stable_mode =
- false) // s_buffer and y_buffer are not rolled by default
+ false) // s_buffer and y_buffer are not rolled by default
{
- // extern __shared__ scalar_t alpha_buffer_sh[];
- extern __shared__ __align__(sizeof(scalar_t)) unsigned char my_smem[];
- scalar_t *alpha_buffer_sh = reinterpret_cast(my_smem);
-
- __shared__ psum_t
- data[32]; // temporary buffer needed for block-wide reduction
+ extern __shared__ float alpha_buffer_sh[];
+
+ //extern __shared__ __align__(sizeof(scalar_t)) unsigned char my_smem[];
  + //scalar_t *alpha_buffer_sh = reinterpret_cast<scalar_t *>(my_smem);
  +
  + //extern __shared__ __align__(sizeof(float)) unsigned char my_smem[];
  + //float *alpha_buffer_sh = reinterpret_cast<float *>(my_smem);
+
+ __shared__ psum_t
+ data[32];
+ // temporary buffer needed for block-wide reduction
__shared__ scalar_t
result; // result of the reduction or vector-vector dot product
int batch = blockIdx.x; // one block per batch
@@ -754,7 +565,7 @@ namespace Curobo
// if y is close to zero
scalar_t s =
q[batch * v_dim + threadIdx.x] - x_0[batch * v_dim + threadIdx.x];
- reduce(y * s, v_dim, &data[0], &result);
+ reduce_v1(y * s, v_dim, &data[0], &result);
scalar_t numerator = result;
// scalar_t rho = 1.0/numerator;
@@ -805,11 +616,10 @@ namespace Curobo
// scalar_t alpha_buffer[16];
// assert(m<16); // allocating a buffer assuming m < 16
-#pragma unroll
-
+ #pragma unroll
for (int i = m - 1; i > -1; i--)
{
- reduce(gq * s_buffer[i * batchsize * v_dim + batch * v_dim + threadIdx.x],
+ reduce_v1(gq * s_buffer[i * batchsize * v_dim + batch * v_dim + threadIdx.x],
v_dim, &data[0], &result);
alpha_buffer_sh[threadIdx.x * m + i] =
result * rho_buffer[i * batchsize + batch];
@@ -818,7 +628,7 @@ namespace Curobo
}
// compute var1
- reduce(y * y, v_dim, data, &result);
+ reduce_v1(y * y, v_dim, &data[0], &result);
scalar_t denominator = result;
// reduce(s*y, v_dim, data, &result); // redundant - already computed it above
@@ -837,11 +647,10 @@ namespace Curobo
gq = gamma * gq;
-#pragma unroll
-
+ #pragma unroll
for (int i = 0; i < m; i++)
{
- reduce(gq * y_buffer[i * batchsize * v_dim + batch * v_dim + threadIdx.x],
+ reduce_v1(gq * y_buffer[i * batchsize * v_dim + batch * v_dim + threadIdx.x],
v_dim, &data[0], &result);
gq = gq + (alpha_buffer_sh[threadIdx.x * m + i] -
result * rho_buffer[i * batchsize + batch]) *
@@ -851,157 +660,287 @@ namespace Curobo
step_vec[batch * v_dim + threadIdx.x] =
-1.0 * gq; // copy from shared memory to global memory
}
+
+
+template <typename scalar_t, typename psum_t, int FIXED_M, bool rolled_ys>
+ __global__ void lbfgs_update_buffer_and_step_compile_m(
+ scalar_t *step_vec, // b x 175
+ scalar_t *rho_buffer, // m x b x 1
+ scalar_t *y_buffer, // m x b x 175
+ scalar_t *s_buffer, // m x b x 175
+ scalar_t *q, // b x 175
+ scalar_t *x_0, // b x 175
+ scalar_t *grad_0, // b x 175
+ const scalar_t *grad_q, // b x 175
+ const float epsilon, const int batchsize, const int m, const int v_dim,
+ const bool stable_mode =
+ false) // s_buffer and y_buffer are not rolled by default
+ {
+ extern __shared__ float alpha_buffer_sh[];
+
+ //extern __shared__ __align__(sizeof(scalar_t)) unsigned char my_smem[];
  + //scalar_t *alpha_buffer_sh = reinterpret_cast<scalar_t *>(my_smem);
  +
  + //extern __shared__ __align__(sizeof(float)) unsigned char my_smem[];
  + //float *alpha_buffer_sh = reinterpret_cast<float *>(my_smem);
+
+ __shared__ psum_t data[32];
+ // temporary buffer needed for block-wide reduction
+ __shared__ scalar_t
+ result; // result of the reduction or vector-vector dot product
+ int batch = blockIdx.x; // one block per batch
+
+ if (threadIdx.x >= v_dim)
+ return;
+
+ scalar_t gq;
+ gq = grad_q[batch * v_dim + threadIdx.x]; // copy grad_q to gq
+
+ ////////////////////
+ // update_buffer
+ ////////////////////
+ scalar_t y = gq - grad_0[batch * v_dim + threadIdx.x];
+
+ // if y is close to zero
+ scalar_t s =
+ q[batch * v_dim + threadIdx.x] - x_0[batch * v_dim + threadIdx.x];
+ reduce_v1(y * s, v_dim, &data[0], &result);
+ scalar_t numerator = result;
+
+ // scalar_t rho = 1.0/numerator;
+
+ if (!rolled_ys)
+ {
+
+ # pragma unroll
+ for (int i = 1; i < FIXED_M; i++)
+ {
+ s_buffer[(i - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x] =
+ s_buffer[i * batchsize * v_dim + batch * v_dim + threadIdx.x];
+ y_buffer[(i - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x] =
+ y_buffer[i * batchsize * v_dim + batch * v_dim + threadIdx.x];
+ }
+ }
+
+ s_buffer[(FIXED_M - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x] = s;
+ y_buffer[(FIXED_M - 1) * batchsize * v_dim + batch * v_dim + threadIdx.x] = y;
+
+ grad_0[batch * v_dim + threadIdx.x] = gq;
+ x_0[batch * v_dim +
+ threadIdx.x] =
+ q[batch * v_dim + threadIdx.x];
+
+ if (threadIdx.x < FIXED_M - 1)
+ {
  + // m threads participate to shift the values
+ // this is safe as m<32 and this happens in lockstep
+ rho_buffer[threadIdx.x * batchsize + batch] =
+ rho_buffer[(threadIdx.x + 1) * batchsize + batch];
+ }
+
+ if (threadIdx.x == FIXED_M - 1)
+ {
+ scalar_t rho = 1.0 / numerator;
+
+ // if this is nan, make it zero:
+ if (stable_mode && (numerator == 0.0))
+ {
+ rho = 0.0;
+ }
+ rho_buffer[threadIdx.x * batchsize + batch] = rho;
+ }
+
+
+ #pragma unroll
+ for (int i = FIXED_M - 1; i > -1; i--)
+ {
+ reduce_v1(gq * s_buffer[i * batchsize * v_dim + batch * v_dim + threadIdx.x],
+ v_dim, &data[0], &result);
+ alpha_buffer_sh[threadIdx.x * FIXED_M + i] =
+ result * rho_buffer[i * batchsize + batch];
+ gq = gq - alpha_buffer_sh[threadIdx.x * FIXED_M + i] *
+ y_buffer[i * batchsize * v_dim + batch * v_dim + threadIdx.x];
+ }
+
+ // compute var1
+ reduce_v1(y * y, v_dim, &data[0], &result);
+ scalar_t denominator = result;
+
+ // reduce(s*y, v_dim, data, &result); // redundant - already computed it above
+ // scalar_t numerator = result;
+ scalar_t var1 = numerator / denominator;
+
+ // To improve stability, uncomment below line: [this however leads to poor
+ // convergence]
+
+ if (stable_mode && (denominator == 0.0))
+ {
+ var1 = epsilon;
+ }
+
+ scalar_t gamma = relu(var1);
+
+ gq = gamma * gq;
+
+ #pragma unroll
+ for (int i = 0; i < FIXED_M; i++)
+ {
+ reduce_v1(gq * y_buffer[i * batchsize * v_dim + batch * v_dim + threadIdx.x],
+ v_dim, &data[0], &result);
+ gq = gq + (alpha_buffer_sh[threadIdx.x * FIXED_M + i] -
+ result * rho_buffer[i * batchsize + batch]) *
+ s_buffer[i * batchsize * v_dim + batch * v_dim + threadIdx.x];
+ }
+
+ step_vec[batch * v_dim + threadIdx.x] =
+ -1.0 * gq; // copy from shared memory to global memory
+ }
} // namespace Optimization
} // namespace Curobo
-std::vector<torch::Tensor>
-lbfgs_step_cuda(torch::Tensor step_vec, torch::Tensor rho_buffer,
- torch::Tensor y_buffer, torch::Tensor s_buffer,
- torch::Tensor grad_q, const float epsilon, const int batch_size,
- const int m, const int v_dim)
-{
- using namespace Curobo::Optimization;
- const int threadsPerBlock = 128;
- const int blocksPerGrid =
- ((batch_size) + threadsPerBlock - 1) / threadsPerBlock;
-
- // launch threads per batch:
- // int threadsPerBlock = pow(2,((int)log2(v_dim))+1);
-
- // const int blocksPerGrid = batch_size; //((batch_size) + threadsPerBlock -
- // 1) / threadsPerBlock;
-
- cudaStream_t stream = at::cuda::getCurrentCUDAStream();
-
- AT_DISPATCH_FLOATING_TYPES(
- step_vec.scalar_type(), "lbfgs_step_cu", ([&] {
-        lbfgs_step_kernel_old<scalar_t, scalar_t>
-        << < blocksPerGrid, threadsPerBlock,
-        v_dim * sizeof(step_vec.scalar_type()), stream >> > (
-          step_vec.data_ptr<scalar_t>(), rho_buffer.data_ptr<scalar_t>(),
-          y_buffer.data_ptr<scalar_t>(), s_buffer.data_ptr<scalar_t>(),
-          grad_q.data_ptr<scalar_t>(), epsilon, batch_size, m, v_dim);
- }));
-
- C10_CUDA_KERNEL_LAUNCH_CHECK();
-
- return { step_vec };
-}
-
-std::vector<torch::Tensor>
-lbfgs_update_cuda(torch::Tensor rho_buffer, torch::Tensor y_buffer,
- torch::Tensor s_buffer, torch::Tensor q, torch::Tensor grad_q,
- torch::Tensor x_0, torch::Tensor grad_0, const int batch_size,
- const int m, const int v_dim)
-{
- using namespace Curobo::Optimization;
-
- // const int threadsPerBlock = 128;
- // launch threads per batch:
- // int threadsPerBlock = pow(2,((int)log2(v_dim))+1);
- int threadsPerBlock = v_dim;
-
- const int blocksPerGrid =
- batch_size; // ((batch_size) + threadsPerBlock - 1) / threadsPerBlock;
-
- cudaStream_t stream = at::cuda::getCurrentCUDAStream();
-
- AT_DISPATCH_FLOATING_TYPES(
- y_buffer.scalar_type(), "lbfgs_update_cu", ([&] {
-        lbfgs_update_buffer_kernel<scalar_t, scalar_t>
-        << < blocksPerGrid, threadsPerBlock,
-        v_dim * sizeof(y_buffer.scalar_type()), stream >> > (
-          rho_buffer.data_ptr<scalar_t>(), y_buffer.data_ptr<scalar_t>(),
-          s_buffer.data_ptr<scalar_t>(), q.data_ptr<scalar_t>(),
-          x_0.data_ptr<scalar_t>(), grad_0.data_ptr<scalar_t>(),
-          grad_q.data_ptr<scalar_t>(), batch_size, m, v_dim);
- }));
- C10_CUDA_KERNEL_LAUNCH_CHECK();
-
- return { rho_buffer, y_buffer, s_buffer, x_0, grad_0 };
-}
std::vector<torch::Tensor>
lbfgs_cuda_fuse(torch::Tensor step_vec, torch::Tensor rho_buffer,
torch::Tensor y_buffer, torch::Tensor s_buffer, torch::Tensor q,
torch::Tensor grad_q, torch::Tensor x_0, torch::Tensor grad_0,
- const float epsilon, const int batch_size, const int m,
- const int v_dim, const bool stable_mode)
+ const float epsilon, const int batch_size, const int history_m,
+ const int v_dim, const bool stable_mode, const bool use_shared_buffers)
{
using namespace Curobo::Optimization;
// call first kernel:
+ //const bool use_experimental = true;
+ const bool use_fixed_m = true;
int threadsPerBlock = v_dim;
assert(threadsPerBlock < 1024);
- assert(m < M_MAX);
- int blocksPerGrid =
- batch_size; // ((batch_size) + threadsPerBlock - 1) / threadsPerBlock;
+ assert(history_m < 32);
+ int blocksPerGrid = batch_size;
cudaStream_t stream = at::cuda::getCurrentCUDAStream();
- int smemsize = 0;
- if (true)
+ const int smemsize = history_m * v_dim * sizeof(float);
+ const int shared_buffer_smemsize = (((3 * v_dim) + 1) * history_m + 32) * sizeof(float);
+ const int max_shared_increase = shared_buffer_smemsize;
+ const int max_shared_base = 48000;
+ const int max_shared_allowed = 65536; // Turing limit, others support 98304
+ int max_shared = max_shared_base;
+
+  auto kernel_5 = Curobo::Optimization::lbfgs_update_buffer_and_step_v1_compile_m<float, float, 5, false>;
+  auto kernel_6 = Curobo::Optimization::lbfgs_update_buffer_and_step_v1_compile_m<float, float, 6, false>;
+  auto kernel_7 = Curobo::Optimization::lbfgs_update_buffer_and_step_v1_compile_m<float, float, 7, false>;
+
+  auto kernel_15 = Curobo::Optimization::lbfgs_update_buffer_and_step_v1_compile_m<float, float, 15, false>;
+  auto kernel_27 = Curobo::Optimization::lbfgs_update_buffer_and_step_v1_compile_m<float, float, 27, false>;
+
+  auto kernel_31 = Curobo::Optimization::lbfgs_update_buffer_and_step_v1_compile_m<float, float, 31, false>;
+  auto kernel_n = Curobo::Optimization::lbfgs_update_buffer_and_step_v1<float, float, false>;
+
+
+  auto stable_kernel_5 = Curobo::Optimization::lbfgs_update_buffer_and_step_compile_m<float, float, 5, false>;
+  auto stable_kernel_6 = Curobo::Optimization::lbfgs_update_buffer_and_step_compile_m<float, float, 6, false>;
+  auto stable_kernel_7 = Curobo::Optimization::lbfgs_update_buffer_and_step_compile_m<float, float, 7, false>;
+
+  auto stable_kernel_15 = Curobo::Optimization::lbfgs_update_buffer_and_step_compile_m<float, float, 15, false>;
+  auto stable_kernel_27 = Curobo::Optimization::lbfgs_update_buffer_and_step_compile_m<float, float, 27, false>;
+
+  auto stable_kernel_31 = Curobo::Optimization::lbfgs_update_buffer_and_step_compile_m<float, float, 31, false>;
+  auto stable_kernel_n = Curobo::Optimization::lbfgs_update_buffer_and_step<float, float, false>;
+
+
+ auto selected_kernel = kernel_n;
+ auto stable_selected_kernel = stable_kernel_n;
+
+
+ switch (history_m)
{
- AT_DISPATCH_FLOATING_TYPES(
- y_buffer.scalar_type(), "lbfgs_cuda_fuse_kernel", [&] {
- smemsize = m * threadsPerBlock * sizeof(scalar_t);
- lbfgs_update_buffer_and_step
- << < blocksPerGrid, threadsPerBlock, smemsize, stream >> > (
- step_vec.data_ptr(),
- rho_buffer.data_ptr(),
- y_buffer.data_ptr(), s_buffer.data_ptr(),
- q.data_ptr(), x_0.data_ptr(),
- grad_0.data_ptr(), grad_q.data_ptr(),
- epsilon, batch_size, m, v_dim, stable_mode);
- });
+
+ case 5:
+ selected_kernel = kernel_5;
+ stable_selected_kernel = stable_kernel_5;
+ break;
+ case 6:
+ selected_kernel = kernel_6;
+ stable_selected_kernel = stable_kernel_6;
+ break;
+ case 7:
+ selected_kernel = kernel_7;
+ stable_selected_kernel = stable_kernel_7;
+ break;
+
+ case 15:
+ selected_kernel = kernel_15;
+ stable_selected_kernel = stable_kernel_15;
+
+ break;
+ case 27:
+ selected_kernel = kernel_27;
+ stable_selected_kernel = stable_kernel_27;
+
+ break;
+ case 31:
+ selected_kernel = kernel_31;
+ stable_selected_kernel = stable_kernel_31;
+
+ break;
}
- else
+ if (!use_fixed_m)
{
- // v1 does not work
- AT_DISPATCH_FLOATING_TYPES(
- y_buffer.scalar_type(), "lbfgs_cuda_fuse_kernel_v1", [&] {
- smemsize = 3 * m * threadsPerBlock * sizeof(scalar_t) +
- m * batch_size * sizeof(scalar_t);
- lbfgs_update_buffer_and_step_v1
- << < blocksPerGrid, threadsPerBlock, smemsize, stream >> > (
- step_vec.data_ptr(),
- rho_buffer.data_ptr(),
- y_buffer.data_ptr(), s_buffer.data_ptr(),
- q.data_ptr(), x_0.data_ptr(),
- grad_0.data_ptr(), grad_q.data_ptr(),
- epsilon, batch_size, m, v_dim, stable_mode);
- });
+ stable_selected_kernel = stable_kernel_n;
+ selected_kernel = kernel_n;
+ }
+
+ // try to increase shared memory:
+ // Note that this feature is only available from volta+ (cuda 7.0+)
+ #if (VOLTA_PLUS)
+ {
+
+
+ if (use_shared_buffers && max_shared_increase > max_shared_base && max_shared_increase <= max_shared_allowed)
+ {
+ max_shared = max_shared_increase;
+ cudaError_t result;
+ result = cudaFuncSetAttribute(selected_kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, max_shared_increase);
+ if (result != cudaSuccess)
+ {
+ max_shared = max_shared_base;
+ }
+
+ }
+ }
+ #endif
+
+ if (use_shared_buffers && shared_buffer_smemsize <= max_shared)
+ {
+
+ selected_kernel
+ << < blocksPerGrid, threadsPerBlock, shared_buffer_smemsize, stream >> > (
+      step_vec.data_ptr<float>(),
+      rho_buffer.data_ptr<float>(),
+      y_buffer.data_ptr<float>(), s_buffer.data_ptr<float>(),
+      q.data_ptr<float>(), x_0.data_ptr<float>(),
+      grad_0.data_ptr<float>(), grad_q.data_ptr<float>(),
+ epsilon, batch_size, history_m, v_dim, stable_mode);
+
+
+ }
+ else
+ {
+
+ stable_selected_kernel
+ << < blocksPerGrid, threadsPerBlock, smemsize, stream >> > (
+      step_vec.data_ptr<float>(),
+      rho_buffer.data_ptr<float>(),
+      y_buffer.data_ptr<float>(), s_buffer.data_ptr<float>(),
+      q.data_ptr<float>(), x_0.data_ptr<float>(),
+      grad_0.data_ptr<float>(), grad_q.data_ptr<float>(),
+ epsilon, batch_size, history_m, v_dim, stable_mode);
+
}
+
+
C10_CUDA_KERNEL_LAUNCH_CHECK();
return { step_vec, rho_buffer, y_buffer, s_buffer, x_0, grad_0 };
}
-std::vector<torch::Tensor> reduce_cuda(torch::Tensor vec, torch::Tensor vec2,
- torch::Tensor rho_buffer,
- torch::Tensor sum, const int batch_size,
- const int m, const int v_dim)
-{
- using namespace Curobo::Optimization;
-
- int threadsPerBlock = pow(2, ((int)log2(v_dim)) + 1);
- int blocksPerGrid =
- batch_size; // ((batch_size) + threadsPerBlock - 1) / threadsPerBlock;
- // printf("threadsPerBlock:%d, blocksPerGrid: %d\n",
- // threadsPerBlock, blocksPerGrid);
-
- cudaStream_t stream = at::cuda::getCurrentCUDAStream();
-
- AT_DISPATCH_FLOATING_TYPES(
- vec.scalar_type(), "reduce_cu", ([&] {
- reduce_kernel
- << < blocksPerGrid, threadsPerBlock, 0, stream >> > (
- vec.data_ptr(), vec2.data_ptr(),
- rho_buffer.data_ptr(), sum.data_ptr(),
- batch_size, m, v_dim);
- }));
- C10_CUDA_KERNEL_LAUNCH_CHECK();
-
- return { sum, rho_buffer };
-}
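The launch path above selects between two dynamic shared-memory layouts and, on Volta and newer devices, opts the kernel into more than the default 48 KB budget via `cudaFuncSetAttribute`. A host-side C++ sketch of the sizing rule (helper names are mine; the constants mirror the hunk, not an official API):

```cpp
#include <cassert>
#include <cstddef>

// Illustrative constants matching the hunk above.
constexpr std::size_t kMaxSharedBase    = 48000;  // default static limit, bytes
constexpr std::size_t kMaxSharedAllowed = 65536;  // Turing opt-in limit, bytes

// Stable-kernel buffer: one float per (history entry, optimization variable).
std::size_t stable_smem_bytes(int history_m, int v_dim) {
  return static_cast<std::size_t>(history_m) * v_dim * sizeof(float);
}

// Shared-buffer kernel: (3*v_dim + 1) floats per history entry plus one
// warp's worth of scratch, as in the shared_buffer_smemsize expression.
std::size_t shared_buffer_smem_bytes(int history_m, int v_dim) {
  return ((3u * v_dim + 1u) * history_m + 32u) * sizeof(float);
}

// The larger layout is usable only if it fits the (possibly opted-in)
// budget; otherwise the launch falls back to the stable kernel.
bool can_use_shared_buffers(int history_m, int v_dim, bool opted_in) {
  const std::size_t budget = opted_in ? kMaxSharedAllowed : kMaxSharedBase;
  return shared_buffer_smem_bytes(history_m, v_dim) <= budget;
}
```

On the device side the opt-in is the `cudaFuncSetAttribute(selected_kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, ...)` call shown in the hunk; if that call fails, the code keeps the 48 KB budget and takes the fallback branch.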
diff --git a/src/curobo/curobolib/cpp/line_search_kernel.cu b/src/curobo/curobolib/cpp/line_search_kernel.cu
index ff061a1..ab13d1a 100644
--- a/src/curobo/curobolib/cpp/line_search_kernel.cu
+++ b/src/curobo/curobolib/cpp/line_search_kernel.cu
@@ -9,7 +9,6 @@
* its affiliates is strictly prohibited.
*/
-#pragma once
#include
#include
#include
diff --git a/src/curobo/curobolib/cpp/self_collision_kernel.cu b/src/curobo/curobolib/cpp/self_collision_kernel.cu
index b9aaeb5..7b6d19f 100644
--- a/src/curobo/curobolib/cpp/self_collision_kernel.cu
+++ b/src/curobo/curobolib/cpp/self_collision_kernel.cu
@@ -185,11 +185,11 @@ namespace Curobo
if (coll_matrix[i * nspheres + j] == 1)
{
float4 sph1 = __rs_shared[i];
-
- if ((sph1.w <= 0.0) || (sph2.w <= 0.0))
- {
- continue;
- }
+ //
+ //if ((sph1.w <= 0.0) || (sph2.w <= 0.0))
+ //{
+ // continue;
+ //}
float r_diff = sph1.w + sph2.w;
float d = sqrt((sph1.x - sph2.x) * (sph1.x - sph2.x) +
(sph1.y - sph2.y) * (sph1.y - sph2.y) +
@@ -380,10 +380,10 @@ namespace Curobo
float4 sph1 = __rs_shared[NBPB * i + l];
float4 sph2 = __rs_shared[NBPB * j + l];
- if ((sph1.w <= 0.0) || (sph2.w <= 0.0))
- {
- continue;
- }
+ //if ((sph1.w <= 0.0) || (sph2.w <= 0.0))
+ //{
+ // continue;
+ //}
float r_diff =
sph1.w + sph2.w; // sum of two radii, radii include respective offsets
float d = sqrt((sph1.x - sph2.x) * (sph1.x - sph2.x) +
diff --git a/src/curobo/curobolib/cpp/sphere_obb_kernel.cu b/src/curobo/curobolib/cpp/sphere_obb_kernel.cu
index ff1022f..698aa0b 100644
--- a/src/curobo/curobolib/cpp/sphere_obb_kernel.cu
+++ b/src/curobo/curobolib/cpp/sphere_obb_kernel.cu
@@ -16,8 +16,25 @@
#include "helper_math.h"
#include
+#include
#include
+#include
+#include
+
+// NOTE: AT_ASSERT has become AT_CHECK on master after 0.4.
+#define CHECK_CUDA(x) AT_ASSERTM(x.is_cuda(), # x " must be a CUDA tensor")
+#define CHECK_CONTIGUOUS(x) \
+ AT_ASSERTM(x.is_contiguous(), # x " must be contiguous")
+#define CHECK_INPUT(x) \
+ CHECK_CUDA(x); \
+ CHECK_CONTIGUOUS(x)
+
+#if defined(CUDA_VERSION) && CUDA_VERSION >= 11080 && TORCH_VERSION_MAJOR >= 2 && TORCH_VERSION_MINOR >= 2
+#include <cuda_fp8.h>
+#endif
#define M 4
+#define VOXEL_DEBUG true
+#define VOXEL_UNOBSERVED_DISTANCE -1000.0
// #define MAX_WARP_THREADS 512 // warp level batching. 8 x M = 32
#define MAX_BOX_SHARED 256 // maximum number of boxes we can store distance and closest point
@@ -45,6 +62,109 @@ namespace Curobo
return max(0.0f, sphere_length(v1, v2) - v1.w - v2.w);
}
+
+ #if defined(CUDA_VERSION) && CUDA_VERSION >= 11080 && TORCH_VERSION_MAJOR >= 2 && TORCH_VERSION_MINOR >= 2
+
+ __device__ __forceinline__ float
+ get_array_value(const at::Float8_e4m3fn *grid_features, const int voxel_idx)
+ {
+
+    __nv_fp8_storage_t value_in = (reinterpret_cast<const __nv_fp8_storage_t *>(grid_features))[voxel_idx];
+
+ float value = __half2float(__nv_cvt_fp8_to_halfraw(value_in, __NV_E4M3));
+
+ return value;
+ }
+
+ __device__ __forceinline__ void
+ get_array_value(const at::Float8_e4m3fn *grid_features, const int voxel_idx, float &value)
+ {
+
+    __nv_fp8_storage_t value_in = (reinterpret_cast<const __nv_fp8_storage_t *>(grid_features))[voxel_idx];
+
+ value = __half2float(__nv_cvt_fp8_to_halfraw(value_in, __NV_E4M3));
+
+ }
+
+ #endif
+
+ __device__ __forceinline__ float
+ get_array_value(const at::BFloat16 *grid_features, const int voxel_idx)
+ {
+
+    __nv_bfloat16 value_in = (reinterpret_cast<const __nv_bfloat16 *>(grid_features))[voxel_idx];
+
+ float value = __bfloat162float(value_in);
+
+ return value;
+ }
+
+ __device__ __forceinline__ float
+ get_array_value(const at::Half *grid_features, const int voxel_idx)
+ {
+
+    __nv_half value_in = (reinterpret_cast<const __nv_half *>(grid_features))[voxel_idx];
+
+ float value = __half2float(value_in);
+
+ return value;
+ }
+
+ __device__ __forceinline__ float
+ get_array_value(const float *grid_features, const int voxel_idx)
+ {
+
+ float value = grid_features[voxel_idx];
+
+ return value;
+ }
+
+ __device__ __forceinline__ float
+ get_array_value(const double *grid_features, const int voxel_idx)
+ {
+
+ float value = (float) grid_features[voxel_idx];
+
+ return value;
+ }
+
+
+ __device__ __forceinline__ void
+ get_array_value(const at::BFloat16 *grid_features, const int voxel_idx, float &value)
+ {
+
+    __nv_bfloat16 value_in = (reinterpret_cast<const __nv_bfloat16 *>(grid_features))[voxel_idx];
+
+ value = __bfloat162float(value_in);
+
+ }
+
+ __device__ __forceinline__ void
+ get_array_value(const at::Half *grid_features, const int voxel_idx, float &value)
+ {
+
+    __nv_half value_in = (reinterpret_cast<const __nv_half *>(grid_features))[voxel_idx];
+
+ value = __half2float(value_in);
+
+ }
+
+ __device__ __forceinline__ void
+ get_array_value(const float *grid_features, const int voxel_idx, float &value)
+ {
+
+ value = grid_features[voxel_idx];
+
+ }
+
+ __device__ __forceinline__ void
+ get_array_value(const double *grid_features, const int voxel_idx, float &value)
+ {
+
+ value = (float) grid_features[voxel_idx];
+
+ }
+
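The `get_array_value` overloads above widen low-precision voxel features to `float` at load time; the FP8 path goes through `__nv_cvt_fp8_to_halfraw`. A host-side sketch of E4M3 decoding (my own helper, assuming the standard e4m3fn layout: 1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, and NaN in place of infinities), handy for checking stored values off-device:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Host-side sketch of FP8 E4M3 ("fn": finite values plus NaN) decoding,
// i.e. what the device-side __nv_cvt_fp8_to_halfraw conversion produces.
float decode_e4m3(std::uint8_t byte) {
  const int sign = (byte >> 7) & 0x1;
  const int exp  = (byte >> 3) & 0xF;
  const int man  = byte & 0x7;
  float value;
  if (exp == 0) {
    value = std::ldexp(static_cast<float>(man) / 8.0f, -6);  // subnormal
  } else if (exp == 0xF && man == 0x7) {
    value = std::nanf("");  // S.1111.111 is NaN; e4m3fn has no infinities
  } else {
    value = std::ldexp(1.0f + static_cast<float>(man) / 8.0f, exp - 7);
  }
  return sign ? -value : value;
}
```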
  template<typename scalar_t>
__device__ __forceinline__ void load_obb_pose(const scalar_t *obb_mat,
float3& position, float4& quat)
@@ -72,6 +192,7 @@ namespace Curobo
bounds.z = loc_bounds.z / 2;
}
+
  template<typename scalar_t>
__device__ __forceinline__ void
transform_sphere_quat(const scalar_t *transform_mat, // x,y,z, qw, qx,qy,qz
@@ -164,9 +285,9 @@ namespace Curobo
}
else
{
- C.x = sphere_pos.x;
- C.y = sphere_pos.y;
- C.z = sphere_pos.z;
+ C.x = sphere_pos.x ;
+ C.y = sphere_pos.y ;
+ C.z = sphere_pos.z ;
}
}
@@ -183,73 +304,6 @@ namespace Curobo
C = C + temp_C;
}
-  template<typename scalar_t>
- __device__ __forceinline__ void
- inv_transform_vec_quat(const scalar_t *transform_mat, // x,y,z, qw, qx,qy,qz
- const float4& sphere_pos, float3& C)
- {
- // do dot product:
- // new_p = q * p * q_inv + obs_p
- const scalar_t w = transform_mat[3];
- const scalar_t x = -1 * transform_mat[4];
- const scalar_t y = -1 * transform_mat[5];
- const scalar_t z = -1 * transform_mat[6];
-
- if ((x != 0) || (y != 0) || (z != 0))
- {
- C.x = w * w * sphere_pos.x + 2 * y * w * sphere_pos.z -
- 2 * z * w * sphere_pos.y + x * x * sphere_pos.x +
- 2 * y * x * sphere_pos.y + 2 * z * x * sphere_pos.z -
- z * z * sphere_pos.x - y * y * sphere_pos.x;
- C.y = 2 * x * y * sphere_pos.x + y * y * sphere_pos.y +
- 2 * z * y * sphere_pos.z + 2 * w * z * sphere_pos.x -
- z * z * sphere_pos.y + w * w * sphere_pos.y - 2 * x * w * sphere_pos.z -
- x * x * sphere_pos.y;
- C.z = 2 * x * z * sphere_pos.x + 2 * y * z * sphere_pos.y +
- z * z * sphere_pos.z - 2 * w * y * sphere_pos.x - y * y * sphere_pos.z +
- 2 * w * x * sphere_pos.y - x * x * sphere_pos.z + w * w * sphere_pos.z;
- }
- else
- {
- C.x = sphere_pos.x;
- C.y = sphere_pos.y;
- C.z = sphere_pos.z;
- }
- }
-
-  template<typename scalar_t>
- __device__ __forceinline__ void
- inv_transform_vec_quat_add(const scalar_t *transform_mat, // x,y,z, qw, qx,qy,qz
- const float4& sphere_pos, float3& C)
- {
- // do dot product:
- // new_p = q * p * q_inv + obs_p
- const scalar_t w = transform_mat[3];
- const scalar_t x = -1 * transform_mat[4];
- const scalar_t y = -1 * transform_mat[5];
- const scalar_t z = -1 * transform_mat[6];
-
- if ((x != 0) || (y != 0) || (z != 0))
- {
- C.x += w * w * sphere_pos.x + 2 * y * w * sphere_pos.z -
- 2 * z * w * sphere_pos.y + x * x * sphere_pos.x +
- 2 * y * x * sphere_pos.y + 2 * z * x * sphere_pos.z -
- z * z * sphere_pos.x - y * y * sphere_pos.x;
- C.y += 2 * x * y * sphere_pos.x + y * y * sphere_pos.y +
- 2 * z * y * sphere_pos.z + 2 * w * z * sphere_pos.x -
- z * z * sphere_pos.y + w * w * sphere_pos.y -
- 2 * x * w * sphere_pos.z - x * x * sphere_pos.y;
- C.z += 2 * x * z * sphere_pos.x + 2 * y * z * sphere_pos.y +
- z * z * sphere_pos.z - 2 * w * y * sphere_pos.x -
- y * y * sphere_pos.z + 2 * w * x * sphere_pos.y -
- x * x * sphere_pos.z + w * w * sphere_pos.z;
- }
- {
- C.x += sphere_pos.x;
- C.y += sphere_pos.y;
- C.z += sphere_pos.z;
- }
- }
/**
* @brief Scales the Collision across the trajectory by sphere velocity. This is
@@ -277,7 +331,10 @@ namespace Curobo
norm_vel_vec = (0.5 / dt) * norm_vel_vec;
const float sph_vel = length(norm_vel_vec);
-
+ if(sph_vel < 0.001)
+ {
+ return;
+ }
if (transform_back)
{
float3 sph_acc_vec = make_float3(
@@ -300,11 +357,11 @@ namespace Curobo
vel_arr[2] = norm_vel_vec.z;
// calculate projection ( I - (v * v^T)):
-#pragma unroll 3
+#pragma unroll
for (int i = 0; i < 3; i++)
{
-#pragma unroll 3
+#pragma unroll
for (int j = 0; j < 3; j++)
{
@@ -322,7 +379,7 @@ namespace Curobo
float orth_pt[3]; // orth_proj(3x3) * max_grad(3x1)
float orth_curve[3]; // max_dist(1) * orth_proj (3x3) * curvature_vec (3x1)
-#pragma unroll 3
+#pragma unroll
for (int i = 0; i < 3; i++) // matrix - vector product
{
@@ -348,6 +405,120 @@ namespace Curobo
//
+
+
+ __device__ __forceinline__ void
+ compute_closest_point(const float3& bounds, const float4& sphere,
+ float3& delta, float& distance, float& sph_distance)
+ {
+ float3 pt = make_float3(sphere.x, sphere.y, sphere.z);
+ bool inside = true;
+
+ if (max(max(fabs(sphere.x) - bounds.x, fabs(sphere.y) - bounds.y),
+ fabs(sphere.z) - bounds.z) >= (0.0))
+ {
+ inside = false;
+ }
+
+
+ float3 val = make_float3(sphere.x,sphere.y,sphere.z);
+ val = bounds - fabs(val);
+
+ if(!inside)
+ {
+
+
+ if (val.x < 0) // it's outside limits, clamp:
+ {
+ pt.x = copysignf(bounds.x, sphere.x);
+ }
+
+
+ if (val.y < 0) // it's outside limits, clamp:
+ {
+ pt.y = copysignf(bounds.y, sphere.y);
+ }
+
+ if (val.z < 0) // it's outside limits, clamp:
+ {
+ pt.z = copysignf(bounds.z, sphere.z);
+ }
+ }
+ else
+ {
+
+
+ val = fabs(val);
+
+
+
+
+ if (val.y <= val.x && val.y <= val.z)
+ {
+
+ if(sphere.y > 0)
+ {
+ pt.y = bounds.y;
+ }
+ else
+ {
+ pt.y = -1 * bounds.y;
+ }
+ }
+
+ else if (val.x <= val.y && val.x <= val.z)
+ {
+ if(sphere.x > 0)
+ {
+ pt.x = bounds.x;
+ }
+ else
+ {
+ pt.x = -1 * bounds.x;
+ }
+ }
+ else if (val.z <= val.x && val.z <= val.y)
+ {
+
+ if(sphere.z > 0)
+ {
+ pt.z = bounds.z;
+ }
+ else
+ {
+ pt.z = -1 * bounds.z;
+ }
+ }
+
+
+
+
+ }
+
+ delta = make_float3(pt.x - sphere.x, pt.y - sphere.y, pt.z - sphere.z);
+
+ distance = length(delta);
+
+ if (!inside)
+ {
+ distance *= -1.0;
+ }
+ else
+ {
+ delta = -1 * delta;
+ }
+
+ if (distance != 0.0)
+ {
+ delta = normalize(delta);
+
+ }
+ sph_distance = distance + sphere.w;
+ //
+
+
+ }
+
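`compute_closest_point` above projects the sphere center onto the nearest face of an origin-centered AABB and reports a distance that is positive inside (depth to the closest face) and negative outside (gap to the surface). A scalar host-side sketch of that sign convention (illustrative helper; the sphere radius offset is left out):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Signed distance from a point to an origin-centered box with half-extents
// (bx, by, bz): positive inside (smallest margin to any face, the axis the
// kernel snaps to), negative outside (distance to the clamped surface point).
float signed_box_distance(float px, float py, float pz,
                          float bx, float by, float bz) {
  const bool inside = std::fabs(px) < bx && std::fabs(py) < by &&
                      std::fabs(pz) < bz;
  if (!inside) {
    // clamp to the surface and measure the gap (reported as negative)
    const float cx = std::max(-bx, std::min(px, bx));
    const float cy = std::max(-by, std::min(py, by));
    const float cz = std::max(-bz, std::min(pz, bz));
    return -std::sqrt((px - cx) * (px - cx) + (py - cy) * (py - cy) +
                      (pz - cz) * (pz - cz));
  }
  // inside: depth is the smallest margin to any face
  return std::min(std::min(bx - std::fabs(px), by - std::fabs(py)),
                  bz - std::fabs(pz));
}
```

In the kernel, `sph_distance = distance + sphere.w` then shifts this by the sphere radius, so any `sph_distance > 0` means the sphere touches or penetrates the box.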
/**
* @brief check if sphere is inside. For numerical stability, we assume that if
* sphere is exactly at bound of cuboid, we are not in collision. Note: this is
@@ -358,226 +529,404 @@ namespace Curobo
* @return bool
*/
__device__ __forceinline__ bool
- check_sphere_aabb(const float3 bounds, const float4 sphere)
+ check_sphere_aabb(const float3 bounds, const float4 sphere, bool &inside,
+ float3& delta, float& distance, float& sph_distance)
{
// if((fabs(sphere.x) - bounds.x) >= sphere.w || fabs(sphere.y) - bounds.y >=
// sphere.w || (fabs(sphere.z) - bounds.z) >= sphere.w)
+
+ inside = false;
+
+
if (max(max(fabs(sphere.x) - bounds.x, fabs(sphere.y) - bounds.y),
fabs(sphere.z) - bounds.z) >= (sphere.w))
{
return false;
}
- return true;
+ // if it's within aabb, check more accurately:
+ // compute closest point:
+
+ compute_closest_point(bounds, sphere, delta, distance, sph_distance);
+ if (sph_distance > 0)
+ {
+ inside = true;
+ }
+
+ return inside;
}
-
- __device__ __forceinline__ void
- compute_closest_point(const float3& bounds, const float4& sphere, float4& pt)
- {
- // assuming the cuboid is centered at origin
- // Find the closest face to the sphere position:
- // If sphere is within bounds of obstacle,
- float min_val, curr_val;
- int min_idx;
-
- // We check the distance to each face and store the closest face
- // All we want is the index of the closest face
- min_val = bounds.x - fabsf(sphere.x);
-
- if (min_val < 0) // it's outside limits, clamp:
- {
- pt.x = copysignf(bounds.x, sphere.x);
- }
-
- // check if bounds.x - sphere.x > bounds.x + sphere.x
- min_val = fabsf(min_val);
-
- if (sphere.x > 0)
- {
- min_idx = 0;
- }
- else
- {
- min_idx = 1;
- }
-
- curr_val = bounds.y - fabsf(sphere.y); // check if sphere-y is outside y-lim
-
- if (curr_val < 0) // outside obb-y, we clamp point to bounds
- {
- pt.y = copysignf(bounds.y, sphere.y);
- }
- curr_val = fabsf(curr_val);
-
- if (curr_val < min_val)
- {
- min_val = curr_val;
-
- if (sphere.y > 0)
- {
- min_idx = 2;
- }
- else
- {
- min_idx = 3;
- }
- }
- curr_val = bounds.z - fabsf(sphere.z); // distance to -ve bound
-
- if (curr_val < 0)
- {
- pt.y = copysignf(bounds.z, sphere.z);
- }
- curr_val = fabsf(curr_val);
-
- if (curr_val < min_val)
- {
- min_val = curr_val;
-
- if (sphere.z > 0)
- {
- min_idx = 4;
- }
- else
- {
- min_idx = 5;
- }
- }
-
- // we move pt's value to bounds on the closest face dimension:
- switch (min_idx)
- {
- case 0:
- pt.x = bounds.x;
- break;
-
- case 1:
- pt.x = -1 * bounds.x;
- break;
-
- case 2:
- pt.y = bounds.y;
- break;
-
- case 3:
- pt.y = -1 * bounds.y;
- break;
-
- case 4:
- pt.z = bounds.z;
- break;
-
- case 5:
- pt.z = -1 * bounds.z;
- break;
-
- default:
- break;
- }
- }
-
__device__ __forceinline__ float
- compute_distance(const float3& bounds_w_radius,
- const float4& sphere) // pass in cl_pt
- {// compute closest point:
- float4 cl_pt = make_float4(sphere.x, sphere.y, sphere.z, 0);
-
- compute_closest_point(bounds_w_radius, sphere, cl_pt);
-
- // clamp:
-
+ compute_distance_fn(
+ const float3& bounds,
+ const float4& sphere,
+ const float max_distance,
+ float3& delta,
+ float& sph_dist,
+ float& distance,
+ bool& inside) // pass in cl_pt
+ {
+
// compute distance:
- return sphere_length(sphere, cl_pt); // cl_pt includes radius, and sphere also has radius
+ float4 loc_sphere = sphere;
+ loc_sphere.w = max_distance;
+ distance = max_distance;
+ check_sphere_aabb(bounds, loc_sphere, inside, delta, distance, sph_dist);
+
+ //distance = fabsf(distance);
+ return distance;
}
+
+template<bool SCALE_METRIC>
+__device__ __forceinline__ void scale_eta_metric_vector(
+const float eta,
+float4 &sum_pt)
+{
+ float sph_dist = sum_pt.w;
+
+ if (sph_dist == 0)
+ {
+ sum_pt.x = 0;
+ sum_pt.y = 0;
+ sum_pt.z = 0;
+ sum_pt.w = 0;
+
+ return;
+ }
+ sum_pt.w = sph_dist - eta;
+ //sum_pt.x = sum_pt.x / sph_dist;
+ //sum_pt.y = sum_pt.y / sph_dist;
+ //sum_pt.z = sum_pt.z / sph_dist;
+
+ if (SCALE_METRIC)
+ {
+ if (eta > 0.0 && sph_dist > 0)
+ {
+ //sum_pt.x = sum_pt.x * (1/sph_dist);
+ //sum_pt.y = sum_pt.y * (1/sph_dist);
+ //sum_pt.z = sum_pt.z * (1/sph_dist);
+
+ if (sph_dist> eta)
+ {
+ sum_pt.w = sph_dist - 0.5 * eta;
+
+
+ } else if (sph_dist <= eta)
+ {
+
+ sum_pt.w = (0.5 / eta) * (sph_dist) * (sph_dist);
+ const float scale = (1 / eta) * (sph_dist);
+ sum_pt.x = scale * sum_pt.x;
+ sum_pt.y = scale * sum_pt.y;
+ sum_pt.z = scale * sum_pt.z;
+ }
+
+ }
+
+ }
+}
+
+  template<bool SCALE_METRIC>
+ __device__ __forceinline__ void scale_eta_metric(
+ const float3 delta,
+ const float sph_dist,
+ const float eta,
+ float4& sum_pt)
+ {
+ // compute distance:
+ //float sph_dist = 0;
+
+ sum_pt.x = delta.x;
+ sum_pt.y = delta.y;
+ sum_pt.z = delta.z;
+ sum_pt.w = sph_dist;
+
+
+ if(SCALE_METRIC)
+ {
+
+ if (sph_dist > 0)
+ {
+
+ if (sph_dist > eta)
+ {
+ sum_pt.w = sph_dist - 0.5 * eta;
+ }
+ else if (sph_dist <= eta)
+ {
+ sum_pt.w = (0.5 / eta) * (sph_dist) * (sph_dist);
+ const float scale = (1.0 / eta) * (sph_dist);
+ sum_pt.x = scale * sum_pt.x;
+ sum_pt.y = scale * sum_pt.y;
+ sum_pt.z = scale * sum_pt.z;
+ }
+
+ }
+ else
+ {
+ sum_pt.x = 0.0;
+ sum_pt.y = 0.0;
+ sum_pt.z = 0.0;
+ sum_pt.w = 0.0;
+
+ }
+
+ }
+
+
+
+
+
+ }
+
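Both `scale_eta_metric` variants above apply the same smoothing over the activation distance `eta`: a linear cost `d - eta/2` beyond `eta`, a quadratic blend `d^2 / (2*eta)` inside it with the gradient direction scaled by `d/eta`, and zero cost and gradient when the signed distance is non-positive. A scalar sketch (hypothetical helper names):

```cpp
#include <cassert>
#include <cmath>

// Smoothed collision cost as a function of signed sphere distance sph_dist
// (positive = in collision) and activation distance eta.
float eta_smooth_cost(float sph_dist, float eta) {
  if (sph_dist <= 0.0f) return 0.0f;                 // no contact: no cost
  if (sph_dist > eta) return sph_dist - 0.5f * eta;  // linear region
  return (0.5f / eta) * sph_dist * sph_dist;         // quadratic blend
}

// Factor applied to the unit gradient direction; fades linearly to zero
// as the distance approaches the activation boundary from inside.
float eta_grad_scale(float sph_dist, float eta) {
  if (sph_dist <= 0.0f) return 0.0f;
  if (sph_dist > eta) return 1.0f;
  return sph_dist / eta;
}
```

The two branches meet at `sph_dist == eta` (cost `eta/2`, scale 1), so both the cost and its gradient are continuous across the blend boundary.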
+
+  template<bool SCALE_METRIC>
__device__ __forceinline__ void scale_eta_metric(const float4& sphere, const float4& cl_pt,
const float eta,
+ const bool inside,
float4& sum_pt)
{
// compute distance:
- float sph_dist = 0;
+ float distance = 0;
- sph_dist = sphere_length(sphere, cl_pt);
+ scale_eta_metric(sphere, cl_pt, eta, inside, sum_pt, distance);
+
+
+ }
- if (sph_dist == 0)
+
+  template<typename grid_scalar_t, bool INTERPOLATION>
+ __device__ __forceinline__ void
+ compute_voxel_location_params(
+ const grid_scalar_t *grid_features,
+ const float4& loc_grid_params,
+ const float4& loc_sphere,
+ int3 &xyz_loc,
+ int3 &xyz_grid,
+ float &interpolated_distance,
+ int &voxel_idx)
+ {
+
+
+ // convert location to index: can use floor to cast to int.
+ // to account for negative values, add 0.5 * bounds.
+ const float3 loc_grid = make_float3(loc_grid_params.x, loc_grid_params.y, loc_grid_params.z);
+ const float3 sphere = make_float3(loc_sphere.x, loc_sphere.y, loc_sphere.z);
+ // xyz_loc = make_int3(floorf((sphere + 0.5 * loc_grid) / loc_grid_params.w));
+ xyz_loc = make_int3(floorf((sphere) / loc_grid_params.w) + (0.5 * loc_grid/loc_grid_params.w));
+ xyz_grid = make_int3(floorf(loc_grid / loc_grid_params.w));
+
+ // find next nearest voxel to current point and then do weighted interpolation:
+ voxel_idx = xyz_loc.x * xyz_grid.y * xyz_grid.z + xyz_loc.y * xyz_grid.z + xyz_loc.z;
+
+
+ // compute interpolation distance between voxel origin and sphere location:
+ get_array_value(grid_features, voxel_idx, interpolated_distance);// + 0.5 * loc_grid_params.w;
+ if(!INTERPOLATION)
{
- sum_pt.x = 0;
- sum_pt.y = 0;
- sum_pt.z = 0;
- sum_pt.w = 0;
+ interpolated_distance += 0.5 * loc_grid_params.w;//max(0.0, (0.3 * loc_grid_params.w) - loc_sphere.w);
+ }
+ if(INTERPOLATION)
+ {
+ //
+ float3 voxel_origin = (make_float3(xyz_loc) * loc_grid_params.w) - (loc_grid/2);
+
+
+ float3 delta = sphere - voxel_origin;
+ int3 next_loc = make_int3(floorf((make_float3(xyz_loc) + normalize(delta))));
+ float ratio = length(delta)/loc_grid_params.w;
+ int next_voxel_idx = next_loc.x * xyz_grid.y * xyz_grid.z + next_loc.y * xyz_grid.z + next_loc.z;
+
+ interpolated_distance = ratio * interpolated_distance + (1 - ratio) * get_array_value(grid_features, next_voxel_idx) +
+      max(0.0, (ratio * loc_grid_params.w) - loc_sphere.w);
+
+ }
+
+
+
+
+ }
+
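`compute_voxel_location_params` above maps a query point in an origin-centered grid to integer voxel coordinates by shifting by half the grid extent before flooring, then flattens them row-major with z varying fastest. A host-side sketch of that addressing (illustrative helper; it also mirrors the `voxel_idx = -1` out-of-grid convention used by `compute_voxel_index`):

```cpp
#include <cassert>
#include <cmath>

// Map a point in a grid centered at the origin (extents grid_x/y/z, cell
// edge voxel_size) to a row-major linear voxel index, z fastest; return -1
// when the point falls outside the grid.
int voxel_linear_index(float x, float y, float z,
                       float grid_x, float grid_y, float grid_z,
                       float voxel_size) {
  const int nx = static_cast<int>(std::floor(grid_x / voxel_size));
  const int ny = static_cast<int>(std::floor(grid_y / voxel_size));
  const int nz = static_cast<int>(std::floor(grid_z / voxel_size));
  // shift by half the grid extent so origin-centered coordinates become
  // non-negative cell indices, as in the xyz_loc expression above
  const int ix = static_cast<int>(std::floor(x / voxel_size) + 0.5f * grid_x / voxel_size);
  const int iy = static_cast<int>(std::floor(y / voxel_size) + 0.5f * grid_y / voxel_size);
  const int iz = static_cast<int>(std::floor(z / voxel_size) + 0.5f * grid_z / voxel_size);
  if (ix < 0 || iy < 0 || iz < 0 || ix >= nx || iy >= ny || iz >= nz) {
    return -1;  // out of grid, mirroring the voxel_idx = -1 convention
  }
  return ix * ny * nz + iy * nz + iz;  // row-major flattening, z fastest
}
```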
+  template<typename grid_scalar_t, bool INTERPOLATION>
+ __device__ __forceinline__ void
+ compute_voxel_index(
+ const grid_scalar_t *grid_features,
+ const float4& loc_grid_params,
+ const float4& loc_sphere,
+ int &voxel_idx,
+ int3 &xyz_loc,
+ int3 &xyz_grid,
+ float &interpolated_distance)
+ {
+ // check if sphere is out of bounds
+ // loc_grid_params.x contains bounds
+ float4 local_bounds = 0.5*loc_grid_params - 2*loc_grid_params.w;
+
+ if (loc_sphere.x <= -1 * (local_bounds.x) ||
+ loc_sphere.x >= (local_bounds.x) ||
+ loc_sphere.y <= -1 * (local_bounds.y) ||
+ loc_sphere.y >= (local_bounds.y) ||
+ loc_sphere.z <= -1 * (local_bounds.z) ||
+ loc_sphere.z >= (local_bounds.z))
+ {
+ voxel_idx = -1;
return;
}
- sum_pt.x = (sphere.x - cl_pt.x) / sph_dist;
- sum_pt.y = (sphere.y - cl_pt.y) / sph_dist;
- sum_pt.z = (sphere.z - cl_pt.z) / sph_dist;
+
+ compute_voxel_location_params(grid_features, loc_grid_params, loc_sphere, xyz_loc, xyz_grid, interpolated_distance, voxel_idx);
+ // convert location to index: can use floor to cast to int.
+ // to account for negative values, add 0.5 * bounds.
+
+
+
- if (eta > 0.0)
- {
- // sph_dist = sph_dist - eta;
- if (sph_dist > eta)
- {
- sum_pt.w = sph_dist - 0.5 * eta;
- }
- else if (sph_dist <= eta)
- {
- sum_pt.w = (0.5 / eta) * (sph_dist) * (sph_dist);
- const float scale = (1 / eta) * (sph_dist);
- sum_pt.x = scale * sum_pt.x;
- sum_pt.y = scale * sum_pt.y;
- sum_pt.z = scale * sum_pt.z;
- }
- }
- else
+ }
+
+
+
+
+
+  template<typename grid_scalar_t, bool CENTRAL_DIFFERENCE, bool NORMALIZE, bool ADD_NOISE>
+ __device__ __forceinline__ void
+ compute_voxel_fd_gradient(
+ const grid_scalar_t *grid_features,
+ const int voxel_layer_start_idx,
+ const int3& xyz_loc,
+ const int3& xyz_grid,
+ const float voxel_size,
+ float4 &cl_pt)
+ {
+
+ float3 d_grad;
+ if (CENTRAL_DIFFERENCE)
+ {
+
+ // x difference:
+ int x_plus, x_minus, y_plus, y_minus, z_plus, z_minus;
+
+ x_plus = (xyz_loc.x + 1) * xyz_grid.y * xyz_grid.z + xyz_loc.y * xyz_grid.z + xyz_loc.z;
+ x_minus = (xyz_loc.x - 1)* xyz_grid.y * xyz_grid.z + xyz_loc.y * xyz_grid.z + xyz_loc.z;
+
+ y_plus = (xyz_loc.x) * xyz_grid.y * xyz_grid.z + (xyz_loc.y + 1) * xyz_grid.z + xyz_loc.z;
+ y_minus = (xyz_loc.x )* xyz_grid.y * xyz_grid.z + (xyz_loc.y -1) * xyz_grid.z + xyz_loc.z;
+
+ z_plus = (xyz_loc.x) * xyz_grid.y * xyz_grid.z + xyz_loc.y * xyz_grid.z + xyz_loc.z + 1;
+ z_minus = (xyz_loc.x)* xyz_grid.y * xyz_grid.z + xyz_loc.y * xyz_grid.z + xyz_loc.z - 1;
+
+
+ float3 d_plus = make_float3(
+ get_array_value(grid_features,voxel_layer_start_idx + x_plus),
+ get_array_value(grid_features, voxel_layer_start_idx + y_plus),
+ get_array_value(grid_features,voxel_layer_start_idx + z_plus));
+ float3 d_minus = make_float3(
+ get_array_value(grid_features,voxel_layer_start_idx + x_minus),
+ get_array_value(grid_features, voxel_layer_start_idx + y_minus),
+ get_array_value(grid_features,voxel_layer_start_idx + z_minus));
+
+
+ d_grad = (d_plus - d_minus) * (1/(2*voxel_size));
+ }
+ else
+ {
+ // x difference:
+ int x_minus,y_minus, z_minus;
+
+ x_minus = (xyz_loc.x - 1)* xyz_grid.y * xyz_grid.z + xyz_loc.y * xyz_grid.z + xyz_loc.z;
+ y_minus = (xyz_loc.x )* xyz_grid.y * xyz_grid.z + (xyz_loc.y -1) * xyz_grid.z + xyz_loc.z;
+ z_minus = (xyz_loc.x)* xyz_grid.y * xyz_grid.z + xyz_loc.y * xyz_grid.z + xyz_loc.z - 1;
+
+
+ float3 d_plus = make_float3(cl_pt.w, cl_pt.w, cl_pt.w);
+ float3 d_minus = make_float3(
+ get_array_value(grid_features,voxel_layer_start_idx + x_minus),
+ get_array_value(grid_features, voxel_layer_start_idx + y_minus),
+ get_array_value(grid_features,voxel_layer_start_idx + z_minus));
+
+
+ d_grad = (d_plus - d_minus) * (1/voxel_size);
+ }
+
+
+ if (NORMALIZE)
+ {
+ if (!(d_grad.x ==0 && d_grad.y == 0 && d_grad.z == 0))
{
- sum_pt.w = sph_dist;
+ d_grad = normalize(d_grad);
}
}
-
- /**
- * @brief Compute distance and gradient, with a smooth l2 loss when distance >0
- * and < eta
- *
- * @param bounds_w_radius
- * @param sphere
- * @param sum_pt
- * @param eta
- * @return __device__
- */
- __device__ __forceinline__ void
- compute_sphere_gradient(const float3& bounds_w_radius, const float4& sphere,
- float4& sum_pt, const float eta = 0.0)
+ cl_pt.x = d_grad.x;
+ cl_pt.y = d_grad.y;
+ cl_pt.z = d_grad.z;
+ if (ADD_NOISE)
{
- // compute closest point:
- float4 cl_pt = make_float4(sphere.x, sphere.y, sphere.z, 0.0);
-
- compute_closest_point(bounds_w_radius, sphere, cl_pt);
-
- scale_eta_metric(sphere, cl_pt, eta, sum_pt);
+ if (cl_pt.z == 0 && cl_pt.x == 0 && cl_pt.y == 0)
+ {
+ cl_pt.x = 0.001;
+ cl_pt.y = 0.001;
+ }
+ }
+
}
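`compute_voxel_fd_gradient` above estimates the ESDF gradient per axis by finite differences over neighboring voxels: a central difference `(d+ - d-) / (2h)` when both neighbors are read, or a cheaper backward difference `(d - d-) / h` that reuses the current voxel's distance (`cl_pt.w`) as the forward sample. A scalar sketch of the two schemes:

```cpp
#include <cassert>

// Per-axis finite-difference schemes used for the voxel-grid gradient.
// Central difference: reads both neighbors, second-order accurate.
float fd_central(float d_minus, float d_plus, float voxel_size) {
  return (d_plus - d_minus) / (2.0f * voxel_size);
}

// Backward difference: reuses the already-loaded current distance as the
// forward sample, trading accuracy for one fewer memory read per axis.
float fd_backward(float d_minus, float d_here, float voxel_size) {
  return (d_here - d_minus) / voxel_size;
}
```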
+  template<typename grid_scalar_t, bool SCALE_METRIC>
__device__ __forceinline__ void
- compute_sphere_gradient_add(const float3& bounds_w_radius, const float4& sphere,
- float4& sum_pt, const float k0 = 1.0,
- const float eta = 0.0)
+ compute_sphere_voxel_gradient(const grid_scalar_t *grid_features,
+ const int voxel_layer_start_idx,
+ const int num_voxels,
+ const float4& loc_grid_params,
+ const float4& loc_sphere,
+ float4 &sum_pt,
+ float &signed_distance,
+ const float eta = 0.0,
+ const float max_distance = -10.0,
+ const bool transform_back = true)
{
- // compute closest point:
- float4 cl_pt = make_float4(sphere.x, sphere.y, sphere.z, 0.0);
+ int voxel_idx = 0;
+ int3 xyz_loc = make_int3(0,0,0);
+ int3 xyz_grid = make_int3(0,0,0);
+ float interpolated_distance = 0.0;
+ compute_voxel_index(grid_features, loc_grid_params, loc_sphere, voxel_idx, xyz_loc, xyz_grid, interpolated_distance);
+ if (voxel_idx < 0 || voxel_idx >= num_voxels)
+ {
+ sum_pt.w = VOXEL_UNOBSERVED_DISTANCE;
+ signed_distance = VOXEL_UNOBSERVED_DISTANCE;
+ return;
+ }
+
+
+ //sum_pt.w = get_array_value(grid_features,voxel_layer_start_idx + voxel_idx);
+ sum_pt.w = interpolated_distance;
+
+ if ((!SCALE_METRIC && transform_back)|| (transform_back && sum_pt.w > -loc_sphere.w ))
+ {
+ // compute closest point:
+ compute_voxel_fd_gradient(grid_features, voxel_layer_start_idx, xyz_loc, xyz_grid, loc_grid_params.w, sum_pt);
+ }
- compute_closest_point(bounds_w_radius, sphere, cl_pt);
+ signed_distance = sum_pt.w;
+
+ sum_pt.w += loc_sphere.w;
+
+
+ scale_eta_metric_vector(eta, sum_pt);
+
- // We need to check if sphere is colliding or outside
- // if distance > 0.0, then it's colliding
- // We can achieve this by adding eta to bounds_w_radius and then subtracting
- // it
- float4 local_sum_pt = make_float4(0.0, 0.0, 0.0, 0.0);
- scale_eta_metric(sphere, cl_pt, eta, local_sum_pt);
- sum_pt += local_sum_pt;
}
+
+
+
+
__device__ __forceinline__ void check_jump_distance(
const float4& loc_sphere_1, const float4 loc_sphere_0, const float k0,
- const float eta, const float3& loc_bounds, const float3& grad_loc_bounds,
+ const float3& bounds,
+ const float max_distance,
+ float3& delta,
+ float& sph_dist,
+ float& distance,
+ bool& inside,
+ const float eta,
float4& sum_pt,
float& curr_jump_distance) // we can pass in interpolated sphere here, also
// need to pass cl_pt for use in
@@ -586,17 +935,27 @@ namespace Curobo
const float4 interpolated_sphere =
(k0) * loc_sphere_1 + (1 - k0) * loc_sphere_0;
- if (check_sphere_aabb(loc_bounds, interpolated_sphere))
+ if (check_sphere_aabb(bounds, interpolated_sphere, inside, delta, distance, sph_dist))
{
- compute_sphere_gradient_add(grad_loc_bounds, interpolated_sphere, sum_pt,
- k0, eta); // true, loc_sphere_1 works better
- curr_jump_distance += loc_sphere_1.w; // move by diameter of sphere
+ float4 loc_grad = make_float4(0,0,0,0);
+ scale_eta_metric(delta, sph_dist, eta, loc_grad);
+ sum_pt += loc_grad;
+
}
else
{
- const float dist = compute_distance(grad_loc_bounds, interpolated_sphere);
- curr_jump_distance += max(dist - 2 * loc_sphere_1.w, loc_sphere_1.w);
+ compute_distance_fn(
+ bounds,
+ interpolated_sphere,
+ max_distance,
+ delta,
+ sph_dist,
+ distance,
+ inside
+ );
}
+ curr_jump_distance += max(fabsf(distance), interpolated_sphere.w);
+
}
///////////////////////////////////////////////////////////////
@@ -608,15 +967,15 @@ namespace Curobo
// use of shared memory.
///////////////////////////////////////////////////////////////
-  template<typename scalar_t>
+  template<typename scalar_t, typename dist_scalar_t>
__device__ __forceinline__ void sphere_obb_collision_fn(
const scalar_t *sphere_position,
const int env_idx,
const int bn_sph_idx,
const int sph_idx,
- scalar_t *out_distance, const scalar_t *weight,
- const scalar_t *activation_distance, const scalar_t *obb_accel,
- const scalar_t *obb_bounds, const scalar_t *obb_mat,
+ dist_scalar_t *out_distance, const float *weight,
+ const float *activation_distance, const float *obb_accel,
+ const float *obb_bounds, const float *obb_mat,
const uint8_t *obb_enable, const int max_nobs, const int nboxes)
{
float max_dist = 0;
@@ -639,7 +998,11 @@ namespace Curobo
float4 obb_quat = make_float4(0.0);
float3 obb_pos = make_float3(0.0);
float3 loc_bounds = make_float3(0.0);
-
+ bool inside = false;
+ float distance = 0.0;
+ float sph_dist = 0.0;
+ float3 delta = make_float3(0,0,0);
+
for (int box_idx = 0; box_idx < nboxes; box_idx++)
{
if (obb_enable[start_box_idx + box_idx] == 0) // disabled obstacle
@@ -651,32 +1014,36 @@ namespace Curobo
load_obb_bounds(&obb_bounds[(start_box_idx + box_idx) * 4], loc_bounds);
transform_sphere_quat(obb_pos, obb_quat, sphere_cache, loc_sphere);
-
+
// first check if point is inside or outside box:
- if (check_sphere_aabb(loc_bounds, loc_sphere))
+ if (check_sphere_aabb(loc_bounds, loc_sphere, inside, delta, distance, sph_dist))
{
- max_dist = 1;
- break; // we exit without checking other cuboids if we found a collision.
+ // using same primitive functions:
+ max_dist = 1;
+ break; // we exit without checking other cuboids if we found a collision.
}
+
}
+
out_distance[bn_sph_idx] = weight[0] * max_dist;
}
-  template<typename scalar_t>
+  template<typename scalar_t, typename dist_scalar_t, bool SUM_COLLISIONS>
__device__ __forceinline__ void sphere_obb_distance_fn(
const scalar_t *sphere_position,
const int32_t env_idx,
const int bn_sph_idx,
const int sph_idx,
- scalar_t *out_distance, scalar_t *closest_pt, uint8_t *sparsity_idx,
- const scalar_t *weight, const scalar_t *activation_distance,
- const scalar_t *obb_accel, const scalar_t *obb_bounds,
- const scalar_t *obb_mat, const uint8_t *obb_enable, const int max_nobs,
+ dist_scalar_t *out_distance, scalar_t *closest_pt, uint8_t *sparsity_idx,
+ const float *weight, const float *activation_distance,
+ const float *obb_accel, const float *obb_bounds,
+ const float *obb_mat, const uint8_t *obb_enable, const int max_nobs,
const int nboxes, const bool transform_back)
{
float max_dist = 0.0;
+
const float eta = activation_distance[0];
float3 max_grad = make_float3(0.0, 0.0, 0.0);
@@ -703,7 +1070,11 @@ namespace Curobo
float4 obb_quat = make_float4(0.0);
float3 obb_pos = make_float3(0.0);
float3 loc_bounds = make_float3(0.0);
-
+ float4 loc_grad = make_float4(0,0,0,0);
+ bool inside = false;
+ float distance = 0.0;
+ float sph_dist = 0.0;
+ float3 delta = make_float3(0,0,0);
for (int box_idx = 0; box_idx < nboxes; box_idx++)
{
if (obb_enable[start_box_idx + box_idx] == 0) // disabled obstacle
@@ -716,26 +1087,45 @@ namespace Curobo
load_obb_bounds(&obb_bounds[(start_box_idx + box_idx) * 4], loc_bounds);
transform_sphere_quat(obb_pos, obb_quat, sphere_cache, loc_sphere);
-
+
// first check if point is inside or outside box:
- if (check_sphere_aabb(loc_bounds, loc_sphere))
+ if (check_sphere_aabb(loc_bounds, loc_sphere, inside, delta, distance, sph_dist))
{
// compute closest point:
- loc_bounds = loc_bounds + loc_sphere.w;
+ //loc_bounds = loc_bounds + loc_sphere.w;
// using same primitive functions:
- float4 cl;
- compute_sphere_gradient(loc_bounds, loc_sphere, cl, eta);
+ scale_eta_metric(delta, sph_dist, eta, loc_grad);
- max_dist += cl.w;
-
- if (transform_back)
+
+ if (SUM_COLLISIONS)
{
- inv_transform_vec_quat(obb_pos, obb_quat, cl, max_grad);
+ if (loc_grad.w > 0)
+ {
+ max_dist += loc_grad.w;
- // inv_transform_vec_quat_add(&obb_mat[(start_box_idx + box_idx) * 7], cl,
- // max_grad);
+ if (transform_back)
+ {
+
+ inv_transform_vec_quat_add(obb_pos, obb_quat, loc_grad, max_grad);
+ }
+ }
}
+ else
+ {
+ if (loc_grad.w > max_dist)
+ {
+ max_dist = loc_grad.w;
+
+ if (transform_back)
+ {
+ inv_transform_vec_quat(obb_pos, obb_quat, loc_grad, max_grad);
+
+ }
+ }
+
+ }
+
}
}
@@ -767,18 +1157,824 @@ namespace Curobo
sparsity_idx[bn_sph_idx] = 1;
}
-  template<typename scalar_t>
+
+  template<typename scalar_t, typename dist_scalar_t>
+ __device__ __forceinline__ void sphere_obb_esdf_fn(
+ const scalar_t *sphere_position,
+ const int32_t env_idx,
+ const int bn_sph_idx,
+ const int sph_idx,
+ dist_scalar_t *out_distance, scalar_t *closest_pt, uint8_t *sparsity_idx,
+ const float *weight, const float *activation_distance,
+ const float *max_distance,
+ const float *obb_accel, const float *obb_bounds,
+ const float *obb_mat, const uint8_t *obb_enable, const int max_nobs,
+ const int nboxes, const bool transform_back)
+ {
+
+ const float eta = max_distance[0];
+ float max_dist = -1 * eta;
+
+ float3 max_grad = make_float3(0.0, 0.0, 0.0);
+
+ // Load sphere_position input
+ float4 sphere_cache = *(float4 *)&sphere_position[bn_sph_idx * 4];
+ if (sphere_cache.w <= 0.0)
+ {
+ // write zeros for cost:
+ out_distance[bn_sph_idx] = 0;
+
+ // write zeros for gradient if not zero:
+ if (sparsity_idx[bn_sph_idx] != 0)
+ {
+ sparsity_idx[bn_sph_idx] = 0;
+ *(float4 *)&closest_pt[bn_sph_idx * 4] = make_float4(0.0);
+ }
+ return;
+ }
+ sphere_cache.w += eta;
+
+ //const float sphere_radius = sphere_cache.w + eta;
+
+ const int start_box_idx = max_nobs * env_idx;
+
+ float4 loc_sphere = make_float4(0.0);
+ float4 obb_quat = make_float4(0.0);
+ float3 obb_pos = make_float3(0.0);
+ float3 loc_bounds = make_float3(0.0);
+ bool inside = false;
+ float distance = 0.0;
+ float sph_dist = 0.0;
+ float3 delta = make_float3(0,0,0);
+ float4 loc_grad = make_float4(0,0,0,0);
+
+ for (int box_idx = 0; box_idx < nboxes; box_idx++)
+ {
+ if (obb_enable[start_box_idx + box_idx] == 0) // disabled obstacle
+ {
+ continue;
+ }
+
+ load_obb_pose(&obb_mat[(start_box_idx + box_idx) * 8], obb_pos,
+ obb_quat);
+ load_obb_bounds(&obb_bounds[(start_box_idx + box_idx) * 4], loc_bounds);
+
+ transform_sphere_quat(obb_pos, obb_quat, sphere_cache, loc_sphere);
+
+ // first check if point is inside or outside box:
+ if (check_sphere_aabb(loc_bounds, loc_sphere, inside, delta, distance, sph_dist))
+ {
+ // compute closest point:
+
+
+ // using same primitive functions:
+ scale_eta_metric(delta, sph_dist, eta, loc_grad);
+
+
+ if (loc_grad.w > max_dist)
+ {
+ max_dist = loc_grad.w;
+
+ if (transform_back)
+ {
+ inv_transform_vec_quat(obb_pos, obb_quat, loc_grad, max_grad);
+
+ }
+ }
+ }
+ }
+ // subtract radius:
+ max_dist = max_dist - sphere_cache.w;
+ if (transform_back)
+ {
+ *(float3 *)&closest_pt[bn_sph_idx * 4] = max_grad;
+ }
+ out_distance[bn_sph_idx] = max_dist;
+ }
+
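The rewritten `check_sphere_aabb` above now reports `inside`, `delta`, and the distances instead of a bare boolean. A rough host-side sketch of the underlying test (the names `sd_aabb` and `sphere_hits_aabb` are illustrative, not the kernel's helpers): a sphere collides with an axis-aligned box when the signed distance from its center to the box drops below the sphere radius.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

// Signed distance from point p to an origin-centered AABB with half-extents b.
// Negative when p is inside the box, positive outside.
float sd_aabb(const Vec3 &p, const Vec3 &b) {
  const float qx = std::fabs(p.x) - b.x;
  const float qy = std::fabs(p.y) - b.y;
  const float qz = std::fabs(p.z) - b.z;
  const float ox = std::max(qx, 0.0f);
  const float oy = std::max(qy, 0.0f);
  const float oz = std::max(qz, 0.0f);
  const float outside = std::sqrt(ox * ox + oy * oy + oz * oz);
  const float inside  = std::min(std::max(qx, std::max(qy, qz)), 0.0f);
  return outside + inside;
}

// A sphere collides with the box when its center's signed distance is below
// the sphere radius.
bool sphere_hits_aabb(const Vec3 &center, float radius, const Vec3 &b) {
  return sd_aabb(center, b) < radius;
}
```

The kernels do the same test in the obstacle's local frame after `transform_sphere_quat`, which is why only an origin-centered box needs to be handled.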
+  template<typename geom_scalar_t, typename grid_scalar_t, typename dist_scalar_t, typename grad_scalar_t, int NUM_LAYERS>
+ __device__ __forceinline__ void sphere_voxel_distance_fn(
+ const geom_scalar_t *sphere_position,
+ const int32_t env_idx,
+ const int bn_sph_idx,
+ const int sph_idx,
+ dist_scalar_t *out_distance, grad_scalar_t *closest_pt, uint8_t *sparsity_idx,
+ const float *weight,
+ const float *activation_distance,
+ const float *max_distance,
+ const grid_scalar_t *grid_features,
+ const float *grid_params,
+ const float *obb_mat,
+ const uint8_t *obb_enable,
+ const int max_nobs,
+ const int num_voxels,
+ const bool transform_back)
+ {
+ float max_dist = 0.0;
+ float max_distance_local = max_distance[0];
+ max_distance_local = -1 * max_distance_local;
+ const float eta = activation_distance[0];
+ float3 max_grad = make_float3(0.0, 0.0, 0.0);
+
+ // Load sphere_position input
+ float4 sphere_cache = *(float4 *)&sphere_position[bn_sph_idx * 4];
+
+ if (sphere_cache.w <= 0.0)
+ {
+ // write zeros for cost:
+ out_distance[bn_sph_idx] = 0;
+
+ // write zeros for gradient if not zero:
+ if (sparsity_idx[bn_sph_idx] != 0)
+ {
+ sparsity_idx[bn_sph_idx] = 0;
+ *(float4 *)&closest_pt[bn_sph_idx * 4] = make_float4(0.0);
+ }
+ return;
+ }
+ sphere_cache.w += eta;
+ const int local_env_idx = max_nobs * env_idx;
+ float signed_distance = 0;
+
+ float4 loc_sphere = make_float4(0.0);
+ float4 obb_quat = make_float4(0.0);
+ float3 obb_pos = make_float3(0.0);
+ float4 loc_grid_params = make_float4(0.0);
+
+ if (NUM_LAYERS <= 4)
+ {
+
+ #pragma unroll
+ for (int layer_idx=0; layer_idx < NUM_LAYERS; layer_idx++)
+ {
+
+
+ int local_env_layer_idx = local_env_idx + layer_idx;
+ if (obb_enable[local_env_layer_idx] != 0) // disabled obstacle
+ {
+
+ load_obb_pose(&obb_mat[(local_env_layer_idx) * 8], obb_pos,
+ obb_quat);
+ loc_grid_params = *(float4 *)&grid_params[local_env_layer_idx*4];
+
+ transform_sphere_quat(obb_pos, obb_quat, sphere_cache, loc_sphere);
+ int voxel_layer_start_idx = local_env_layer_idx * num_voxels;
+ // check distance:
+ float4 cl;
+ compute_sphere_voxel_gradient(grid_features,
+ voxel_layer_start_idx, num_voxels,
+ loc_grid_params, loc_sphere, cl, signed_distance, eta,
+ max_distance_local, transform_back);
+ if (cl.w>0.0)
+ {
+ max_dist += cl.w;
+ if (transform_back)
+ {
+ inv_transform_vec_quat_add(obb_pos, obb_quat, cl, max_grad);
+
+ }
+ }
+ }
+ }
+ }
+ else
+ {
+
+
+
+
+ for (int layer_idx=0; layer_idx < max_nobs; layer_idx++)
+ {
+
+
+ int local_env_layer_idx = local_env_idx + layer_idx;
+ if (obb_enable[local_env_layer_idx] != 0) // disabled obstacle
+ {
+
+ load_obb_pose(&obb_mat[(local_env_layer_idx) * 8], obb_pos,
+ obb_quat);
+ loc_grid_params = *(float4 *)&grid_params[local_env_layer_idx*4];
+
+ transform_sphere_quat(obb_pos, obb_quat, sphere_cache, loc_sphere);
+ int voxel_layer_start_idx = local_env_layer_idx * num_voxels;
+ // check distance:
+ float4 cl;
+ compute_sphere_voxel_gradient(grid_features,
+ voxel_layer_start_idx, num_voxels,
+ loc_grid_params, loc_sphere, cl, signed_distance, eta,
+ max_distance_local, transform_back);
+ if (cl.w>0.0)
+ {
+ max_dist += cl.w;
+ if (transform_back)
+ {
+ inv_transform_vec_quat_add(obb_pos, obb_quat, cl, max_grad);
+
+ }
+ }
+ }
+ }
+ }
+ // sparsity opt:
+ if (max_dist == 0)
+ {
+ if (sparsity_idx[bn_sph_idx] == 0)
+ {
+ return;
+ }
+ sparsity_idx[bn_sph_idx] = 0;
+
+ if (transform_back)
+ {
+ *(float3 *)&closest_pt[bn_sph_idx * 4] = max_grad; // max_grad is all zeros
+ }
+ out_distance[bn_sph_idx] = 0.0;
+ return;
+ }
+
+ // else max_dist != 0
+ max_dist = weight[0] * max_dist;
+
+ if (transform_back)
+ {
+ *(float3 *)&closest_pt[bn_sph_idx * 4] = weight[0] * max_grad;
+ }
+ out_distance[bn_sph_idx] = max_dist;
+ sparsity_idx[bn_sph_idx] = 1;
+ }
+
+
+  template<typename scalar_t, typename grid_scalar_t, typename dist_scalar_t, int NUM_LAYERS, bool ENABLE_SPEED_METRIC>
+ __device__ __forceinline__ void swept_sphere_voxel_distance_fn(
+ const scalar_t *sphere_position,
+ const int env_idx,
+ const int b_idx,
+ const int h_idx,
+ const int sph_idx,
+ dist_scalar_t *out_distance,
+ scalar_t *closest_pt,
+ uint8_t *sparsity_idx,
+ const float *weight,
+ const float *activation_distance,
+ const float *max_distance,
+ const float *speed_dt,
+ const grid_scalar_t *grid_features,
+ const float *grid_params,
+ const float *grid_pose,
+ const uint8_t *grid_enable,
+ const int max_nobs,
+ const int env_ngrid,
+ const int num_voxels,
+ const int batch_size,
+ const int horizon,
+ const int nspheres,
+ const int sweep_steps,
+ const bool transform_back)
+ {
+ const int sw_steps = sweep_steps;
+ const int start_box_idx = max_nobs * env_idx;
+ const int b_addrs =
+ b_idx * horizon * nspheres; // + h_idx * n_spheres + sph_idx;
+
+ // We read the same obstacles across
+
+ // Load sphere_position input
+ // if h_idx == horizon -1, we just read the same index
+ const int bhs_idx = b_addrs + h_idx * nspheres + sph_idx;
+
+
+
+
+
+ float max_dist = 0.0;
+ float max_distance_local = max_distance[0];
+ max_distance_local = -1 * max_distance_local;
+ const float eta = activation_distance[0];
+ float3 max_grad = make_float3(0.0, 0.0, 0.0);
+
+ // Load sphere_position input
+ float4 sphere_1_cache = *(float4 *)&sphere_position[bhs_idx * 4];
+
+ if (sphere_1_cache.w <= 0.0)
+ {
+ // write zeros for cost:
+ out_distance[bhs_idx] = 0;
+
+ // write zeros for gradient if not zero:
+ if (sparsity_idx[bhs_idx] != 0)
+ {
+ sparsity_idx[bhs_idx] = 0;
+ *(float4 *)&closest_pt[bhs_idx * 4] = make_float4(0.0);
+ }
+ return;
+ }
+ sphere_1_cache.w += eta;
+ float4 loc_sphere_0, loc_sphere_2;
+
+ bool sweep_back = false;
+ bool sweep_fwd = false;
+ float sphere_0_distance, sphere_2_distance, sphere_0_len, sphere_2_len;
+
+
+ const float dt = speed_dt[0];
+ float4 sphere_0_cache = make_float4(0,0,0,0);
+ float4 sphere_2_cache = make_float4(0,0,0,0);
+
+ if (h_idx > 0)
+ {
+ sphere_0_cache =
+ *(float4 *)&sphere_position[b_addrs * 4 + (h_idx - 1) * nspheres * 4 + sph_idx * 4];
+ sphere_0_cache.w = sphere_1_cache.w;
+ sphere_0_distance = sphere_distance(sphere_0_cache, sphere_1_cache);
+ sphere_0_len = sphere_0_distance + sphere_0_cache.w * 2;
+
+ if (sphere_0_distance > 0.0)
+ {
+ sweep_back = true;
+ }
+ }
+
+ if (h_idx < horizon - 1)
+ {
+ sphere_2_cache =
+ *(float4 *)&sphere_position[b_addrs * 4 + (h_idx + 1) * nspheres * 4 + sph_idx * 4];
+ sphere_2_cache.w = sphere_1_cache.w;
+ sphere_2_distance = sphere_distance(sphere_2_cache, sphere_1_cache);
+ sphere_2_len = sphere_2_distance + sphere_2_cache.w * 2;
+
+ if (sphere_2_distance > 0.0)
+ {
+ sweep_fwd = true;
+ }
+ }
+
+ float signed_distance = 0.0;
+ const int local_env_idx = max_nobs * env_idx;
+
+ float4 loc_sphere = make_float4(0.0);
+ float4 obb_quat = make_float4(0.0);
+ float3 obb_pos = make_float3(0.0);
+ float4 loc_grid_params = make_float4(0.0);
+ float4 sum_grad = make_float4(0.0, 0.0, 0.0, 0.0);
+ float4 cl;
+ float jump_mid_distance = 0.0;
+ float k0, k1;
+ float temp_jump_distance = 0.0;
+
+ if (NUM_LAYERS <= 4)
+ {
+
+
+ #pragma unroll
+ for (int layer_idx=0; layer_idx < NUM_LAYERS; layer_idx++)
+ {
+ float curr_jump_distance = 0.0;
+
+ int local_env_layer_idx = local_env_idx + layer_idx;
+ sum_grad *= 0.0;
+ if (grid_enable[local_env_layer_idx] != 0) // disabled obstacle
+ {
+
+ load_obb_pose(&grid_pose[(local_env_layer_idx) * 8], obb_pos,
+ obb_quat);
+ loc_grid_params = *(float4 *)&grid_params[local_env_layer_idx*4];
+
+ transform_sphere_quat(obb_pos, obb_quat, sphere_1_cache, loc_sphere);
+ transform_sphere_quat(obb_pos, obb_quat, sphere_0_cache, loc_sphere_0);
+ transform_sphere_quat(obb_pos, obb_quat, sphere_2_cache, loc_sphere_2);
+
+ int voxel_layer_start_idx = local_env_layer_idx * num_voxels;
+ // check distance:
+ compute_sphere_voxel_gradient(grid_features,
+ voxel_layer_start_idx, num_voxels,
+ loc_grid_params, loc_sphere, cl, signed_distance, eta,
+ max_distance_local, transform_back);
+ if (cl.w>0.0)
+ {
+ sum_grad += cl;
+ jump_mid_distance = signed_distance;
+ }
+ else if (signed_distance != VOXEL_UNOBSERVED_DISTANCE)
+ {
+ jump_mid_distance = -1 * signed_distance;
+ }
+
+
+ jump_mid_distance = max(jump_mid_distance, loc_sphere.w);
+ curr_jump_distance = jump_mid_distance;
+ if (sweep_back && curr_jump_distance < sphere_0_distance/2)
+ {
+          for (int j = 0; j < sw_steps; j++)
+          {
+            if (curr_jump_distance >= sphere_0_len / 2)
+ {
+ break;
+ }
+ temp_jump_distance = 0.0;
+ k0 = 1 - (curr_jump_distance/sphere_0_len);
+ // compute collision
+ const float4 interpolated_sphere =
+ (k0)*loc_sphere + (1 - k0) * loc_sphere_0;
+
+ compute_sphere_voxel_gradient(
+ grid_features,
+ voxel_layer_start_idx, num_voxels,
+ loc_grid_params, interpolated_sphere, cl, signed_distance, eta,
+ max_distance_local, transform_back);
+ if (cl.w>0.0)
+ {
+ sum_grad += cl;
+ temp_jump_distance = signed_distance;
+ }
+ else if (signed_distance != VOXEL_UNOBSERVED_DISTANCE)
+ {
+ temp_jump_distance = -1 * signed_distance;
+ }
+ temp_jump_distance = max(temp_jump_distance, loc_sphere.w);
+ curr_jump_distance += temp_jump_distance;
+
+
+ }
+ }
+ curr_jump_distance = jump_mid_distance;
+ if (sweep_fwd && curr_jump_distance < sphere_2_distance/2)
+ {
+          for (int j = 0; j < sw_steps; j++)
+          {
+            if (curr_jump_distance >= sphere_2_len / 2)
+ {
+ break;
+ }
+ temp_jump_distance = 0.0;
+ k0 = 1 - (curr_jump_distance/sphere_2_len);
+ // compute collision
+ const float4 interpolated_sphere =
+ (k0)*loc_sphere + (1 - k0) * loc_sphere_2;
+
+ compute_sphere_voxel_gradient(
+ grid_features,
+ voxel_layer_start_idx, num_voxels,
+ loc_grid_params, interpolated_sphere, cl, signed_distance, eta,
+ max_distance_local, transform_back);
+ if (cl.w>0.0)
+ {
+ sum_grad += cl;
+ temp_jump_distance = signed_distance;
+ }
+ else if (signed_distance != VOXEL_UNOBSERVED_DISTANCE)
+ {
+ temp_jump_distance = -1 * signed_distance;
+ }
+ temp_jump_distance = max(temp_jump_distance, loc_sphere.w);
+ curr_jump_distance += temp_jump_distance;
+
+ }
+ }
+ if (sum_grad.w > 0.0 )
+ {
+ max_dist += sum_grad.w;
+ if (transform_back)
+ {
+ inv_transform_vec_quat_add(obb_pos, obb_quat, sum_grad, max_grad);
+ }
+
+ }
+
+
+
+
+ }
+ }
+ }
+ else
+ {
+
+
+
+ for (int layer_idx=0; layer_idx < max_nobs; layer_idx++)
+ {
+ float curr_jump_distance = 0.0;
+
+ int local_env_layer_idx = local_env_idx + layer_idx;
+ sum_grad *= 0.0;
+ if (grid_enable[local_env_layer_idx] != 0) // disabled obstacle
+ {
+
+ load_obb_pose(&grid_pose[(local_env_layer_idx) * 8], obb_pos,
+ obb_quat);
+ loc_grid_params = *(float4 *)&grid_params[local_env_layer_idx*4];
+
+ transform_sphere_quat(obb_pos, obb_quat, sphere_1_cache, loc_sphere);
+ transform_sphere_quat(obb_pos, obb_quat, sphere_0_cache, loc_sphere_0);
+ transform_sphere_quat(obb_pos, obb_quat, sphere_2_cache, loc_sphere_2);
+
+ int voxel_layer_start_idx = local_env_layer_idx * num_voxels;
+ // check distance:
+ compute_sphere_voxel_gradient(grid_features,
+ voxel_layer_start_idx, num_voxels,
+ loc_grid_params, loc_sphere, cl, signed_distance, eta,
+ max_distance_local, transform_back);
+ if (cl.w>0.0)
+ {
+ sum_grad += cl;
+ jump_mid_distance = signed_distance;
+ }
+ else if (signed_distance != VOXEL_UNOBSERVED_DISTANCE)
+ {
+ jump_mid_distance = -1 * signed_distance;
+ }
+
+
+ jump_mid_distance = max(jump_mid_distance, loc_sphere.w);
+ curr_jump_distance = jump_mid_distance;
+ if (sweep_back && curr_jump_distance < sphere_0_distance/2)
+ {
+          for (int j = 0; j < sw_steps; j++)
+          {
+            if (curr_jump_distance >= sphere_0_len / 2)
+ {
+ break;
+ }
+ temp_jump_distance = 0.0;
+ k0 = 1 - (curr_jump_distance/sphere_0_len);
+ // compute collision
+ const float4 interpolated_sphere =
+ (k0)*loc_sphere + (1 - k0) * loc_sphere_0;
+
+ compute_sphere_voxel_gradient(
+ grid_features,
+ voxel_layer_start_idx, num_voxels,
+ loc_grid_params, interpolated_sphere, cl, signed_distance, eta,
+ max_distance_local, transform_back);
+ if (cl.w>0.0)
+ {
+ sum_grad += cl;
+ temp_jump_distance = signed_distance;
+ }
+ else if (signed_distance != VOXEL_UNOBSERVED_DISTANCE)
+ {
+ temp_jump_distance = -1 * signed_distance;
+ }
+ temp_jump_distance = max(temp_jump_distance, loc_sphere.w);
+ curr_jump_distance += temp_jump_distance;
+
+
+ }
+ }
+ curr_jump_distance = jump_mid_distance;
+ if (sweep_fwd && curr_jump_distance < sphere_2_distance/2)
+ {
+          for (int j = 0; j < sw_steps; j++)
+          {
+            if (curr_jump_distance >= sphere_2_len / 2)
+ {
+ break;
+ }
+ temp_jump_distance = 0.0;
+ k0 = 1 - (curr_jump_distance/sphere_2_len);
+ // compute collision
+ const float4 interpolated_sphere =
+ (k0)*loc_sphere + (1 - k0) * loc_sphere_2;
+
+ compute_sphere_voxel_gradient(
+ grid_features,
+ voxel_layer_start_idx, num_voxels,
+ loc_grid_params, interpolated_sphere, cl, signed_distance, eta,
+ max_distance_local, transform_back);
+ if (cl.w>0.0)
+ {
+ sum_grad += cl;
+ temp_jump_distance = signed_distance;
+ }
+ else if (signed_distance != VOXEL_UNOBSERVED_DISTANCE)
+ {
+ temp_jump_distance = -1 * signed_distance;
+ }
+ temp_jump_distance = max(temp_jump_distance, loc_sphere.w);
+ curr_jump_distance += temp_jump_distance;
+
+ }
+ }
+ if (sum_grad.w > 0.0 )
+ {
+ max_dist += sum_grad.w;
+ if (transform_back)
+ {
+ inv_transform_vec_quat_add(obb_pos, obb_quat, sum_grad, max_grad);
+ }
+
+ }
+
+
+
+
+ }
+ }
+ }
+ // sparsity opt:
+ if (max_dist == 0)
+ {
+ if (sparsity_idx[bhs_idx] == 0)
+ {
+ return;
+ }
+ sparsity_idx[bhs_idx] = 0;
+
+ if (transform_back)
+ {
+ *(float3 *)&closest_pt[bhs_idx * 4] = max_grad; // max_grad is all zeros
+ }
+ out_distance[bhs_idx] = 0.0;
+ return;
+ }
+
+    // compute speed metric here:
+ if (ENABLE_SPEED_METRIC)
+ {
+ if (sweep_back && sweep_fwd)
+ {
+ scale_speed_metric(sphere_0_cache, sphere_1_cache, sphere_2_cache, dt,
+ transform_back, max_dist, max_grad);
+ }
+ }
+ // else max_dist != 0
+ max_dist = weight[0] * max_dist;
+
+ if (transform_back)
+ {
+ *(float3 *)&closest_pt[bhs_idx * 4] = weight[0] * max_grad;
+ }
+ out_distance[bhs_idx] = max_dist;
+ sparsity_idx[bhs_idx] = 1;
+
+ }
+
+
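The sweep loops in `swept_sphere_voxel_distance_fn` advance along the segment between consecutive sphere positions by the current clearance (`curr_jump_distance`), clamped below by the sphere radius so the walk never stalls. A minimal host-side sketch of that sphere-marching idea, with an assumed `clearance_at` callback standing in for the voxel distance query:

```cpp
#include <algorithm>
#include <functional>

// March along a segment of length seg_len, stepping each iteration by the
// current clearance reported by a distance field, clamped below by the sphere
// radius (mirroring the clamp to loc_sphere.w in the kernel). Returns the
// number of distance queries issued before covering the segment.
int sweep_steps_taken(const std::function<float(float)> &clearance_at,
                      float seg_len, float radius, int max_steps) {
  float t = 0.0f;
  int steps = 0;
  while (steps < max_steps && t < seg_len) {
    const float step = std::max(clearance_at(t), radius);  // never stall
    t += step;
    ++steps;
  }
  return steps;
}
```

When the field reports large clearance, the whole segment is covered in one query; in contact, the step degenerates to one sphere radius per iteration, bounded by `sw_steps`.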
+  template<typename scalar_t, typename grid_scalar_t, typename dist_scalar_t, int NUM_LAYERS, bool BATCH_ENV_T, bool ENABLE_SPEED_METRIC>
+ __global__ void swept_sphere_voxel_distance_jump_kernel(
+ const scalar_t *sphere_position,
+ dist_scalar_t *out_distance,
+ scalar_t *closest_pt, uint8_t *sparsity_idx, const float *weight,
+ const float *activation_distance,
+ const float *max_distance,
+ const float *speed_dt,
+ const grid_scalar_t *grid_features, const float *grid_params,
+ const float *grid_pose, const uint8_t *grid_enable,
+ const int32_t *n_env_grid, const int32_t *env_query_idx, const int max_nobs,
+ const int num_voxels,
+ const int batch_size, const int horizon, const int nspheres,
+ const int sweep_steps,
+ const bool transform_back)
+ {
+ const int t_idx = blockIdx.x * blockDim.x + threadIdx.x;
+ const int b_idx = t_idx / (horizon * nspheres);
+
+ const int h_idx = (t_idx - b_idx * (horizon * nspheres)) / nspheres;
+ const int sph_idx = (t_idx - b_idx * horizon * nspheres - h_idx * nspheres);
+
+ if ((sph_idx >= nspheres) || (b_idx >= batch_size) || (h_idx >= horizon))
+ {
+ return;
+ }
+
+ int env_idx = 0;
+
+ if (BATCH_ENV_T)
+ {
+ env_idx = env_query_idx[b_idx];
+ }
+
+ const int env_nboxes = n_env_grid[env_idx];
+    swept_sphere_voxel_distance_fn<scalar_t, grid_scalar_t, dist_scalar_t, NUM_LAYERS, ENABLE_SPEED_METRIC>(
+ sphere_position, env_idx, b_idx, h_idx, sph_idx,
+ out_distance, closest_pt,
+ sparsity_idx, weight, activation_distance, max_distance,
+ speed_dt,
+ grid_features,
+ grid_params, grid_pose, grid_enable, max_nobs, env_nboxes, num_voxels, batch_size,
+ horizon, nspheres, sweep_steps, transform_back);
+ }
+
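Like the other kernels in this file, `swept_sphere_voxel_distance_jump_kernel` decomposes a flat thread index into `(b_idx, h_idx, sph_idx)` in row-major order. A plain C++ sketch of that decomposition (the `BHS`/`decompose` names are illustrative):

```cpp
// Row-major decomposition of a flat thread index into batch, horizon-step,
// and sphere indices, matching t_idx = (b * horizon + h) * nspheres + s.
struct BHS { int b, h, s; };

BHS decompose(int t_idx, int horizon, int nspheres) {
  BHS r;
  r.b = t_idx / (horizon * nspheres);
  r.h = (t_idx - r.b * horizon * nspheres) / nspheres;
  r.s = t_idx - r.b * horizon * nspheres - r.h * nspheres;
  return r;
}
```

The bounds check in the kernel (`sph_idx >= nspheres || b_idx >= batch_size || h_idx >= horizon`) then discards the tail threads of the last block.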
+  template<typename geom_scalar_t, typename grid_scalar_t, typename dist_scalar_t, typename grad_scalar_t>
+ __device__ __forceinline__ void sphere_voxel_esdf_fn(
+ const geom_scalar_t *sphere_position,
+ const int32_t env_idx,
+ const int bn_sph_idx,
+ const int sph_idx,
+ dist_scalar_t *out_distance, grad_scalar_t *closest_pt, uint8_t *sparsity_idx,
+ const float *weight,
+ const float *activation_distance,
+ const float *max_distance,
+ const grid_scalar_t *grid_features,
+ const float *grid_params,
+ const float *obb_mat,
+ const uint8_t *obb_enable,
+ const int max_nobs,
+ const int num_voxels,
+ const bool transform_back)
+ {
+ float max_distance_local = max_distance[0];
+ const float eta = max_distance_local;
+ float max_dist = -1 * max_distance_local;
+ max_distance_local = -1 * max_distance_local;
+
+ float3 max_grad = make_float3(0.0, 0.0, 0.0);
+
+ // Load sphere_position input
+ float4 sphere_cache = *(float4 *)&sphere_position[bn_sph_idx * 4];
+
+ if (sphere_cache.w <= 0.0)
+ {
+ // write zeros for cost:
+ out_distance[bn_sph_idx] = 0;
+
+ // write zeros for gradient if not zero:
+ if (sparsity_idx[bn_sph_idx] != 0)
+ {
+ sparsity_idx[bn_sph_idx] = 0;
+ *(float4 *)&closest_pt[bn_sph_idx * 4] = make_float4(0.0);
+ }
+ return;
+ }
+ const float sphere_radius = sphere_cache.w;
+
+ sphere_cache.w += eta;
+ const int local_env_idx = max_nobs * env_idx;
+
+ float4 loc_sphere = make_float4(0.0);
+ float4 obb_quat = make_float4(0.0);
+ float3 obb_pos = make_float3(0.0);
+ float4 loc_grid_params = make_float4(0.0);
+
+ float signed_distance = 0;
+
+ for (int layer_idx=0; layer_idx < max_nobs; layer_idx++)
+ {
+
+
+ int local_env_layer_idx = local_env_idx + layer_idx;
+ if (obb_enable[local_env_layer_idx] != 0) // disabled obstacle
+ {
+
+ load_obb_pose(&obb_mat[(local_env_layer_idx) * 8], obb_pos,
+ obb_quat);
+ loc_grid_params = *(float4 *)&grid_params[local_env_layer_idx*4];
+
+ transform_sphere_quat(obb_pos, obb_quat, sphere_cache, loc_sphere);
+ int voxel_layer_start_idx = local_env_layer_idx * num_voxels;
+ // check distance:
+ float4 cl;
+
+ compute_sphere_voxel_gradient(grid_features,
+ voxel_layer_start_idx, num_voxels,
+ loc_grid_params, loc_sphere, cl, signed_distance, eta,
+ max_distance_local, transform_back);
+ if (cl.w>max_dist)
+ {
+ max_dist = cl.w;
+ if (transform_back)
+ {
+ inv_transform_vec_quat(obb_pos, obb_quat, cl, max_grad);
+
+ }
+ }
+ }
+ }
+
+
+
+
+ max_dist = max_dist - sphere_radius;
+ if (transform_back)
+ {
+ *(float3 *)&closest_pt[bn_sph_idx * 4] = max_grad;
+ }
+ out_distance[bn_sph_idx] = max_dist;
+
+ }
+
+
+  template<typename scalar_t, typename dist_scalar_t, int SWEEP_STEPS, bool SUM_COLLISIONS, bool ENABLE_SPEED_METRIC>
__device__ __forceinline__ void swept_sphere_obb_distance_fn(
const scalar_t *sphere_position,
const int env_idx, const int b_idx,
const int h_idx, const int sph_idx,
- scalar_t *out_distance,
+ dist_scalar_t *out_distance,
scalar_t *closest_pt,
uint8_t *sparsity_idx,
- const scalar_t *weight,
- const scalar_t *activation_distance, const scalar_t *speed_dt,
- const scalar_t *obb_accel, const scalar_t *obb_bounds,
- const scalar_t *obb_mat,
+ const float *weight,
+ const float *activation_distance, const float *speed_dt,
+ const float *obb_accel, const float *obb_bounds,
+ const float *obb_mat,
const uint8_t *obb_enable,
const int max_nobs,
const int nboxes, const int batch_size, const int horizon,
@@ -789,9 +1985,13 @@ namespace Curobo
// warp wide reductions should only happen across nspheres in same batch and horizon
//
// extern __shared__ float psum[];
+ int sw_steps = SWEEP_STEPS;
+ const float max_distance = 1000.0;
- const int sw_steps = sweep_steps;
- const float fl_sw_steps = sw_steps;
+ if (SWEEP_STEPS == -1)
+ {
+ sw_steps = sweep_steps;
+ }
const int start_box_idx = max_nobs * env_idx;
const int b_addrs =
b_idx * horizon * nspheres; // + h_idx * n_spheres + sph_idx;
@@ -829,8 +2029,13 @@ namespace Curobo
const float dt = speed_dt[0];
const float eta = activation_distance[0];
sphere_1_cache.w += eta;
- float4 sphere_0_cache, sphere_2_cache;
-
+ float4 sphere_0_cache = make_float4(0,0,0,0);
+ float4 sphere_2_cache = make_float4(0,0,0,0);
+ float4 loc_grad = make_float4(0,0,0,0);
+ bool inside = false;
+ float distance = 0.0;
+ float sph_dist = 0.0;
+ float3 delta = make_float3(0,0,0);
if (h_idx > 0)
{
sphere_0_cache =
@@ -880,39 +2085,45 @@ namespace Curobo
load_obb_bounds(&obb_bounds[(start_box_idx + box_idx) * 4], loc_bounds);
float curr_jump_distance = 0.0;
- const float3 grad_loc_bounds = loc_bounds + sphere_1_cache.w; // assuming sphere radius
+ //const float3 grad_loc_bounds = loc_bounds + sphere_1_cache.w; // assuming sphere radius
// doesn't change
transform_sphere_quat(obb_pos, obb_quat, sphere_1_cache, loc_sphere_1);
+ transform_sphere_quat(obb_pos, obb_quat, sphere_0_cache, loc_sphere_0);
+ transform_sphere_quat(obb_pos, obb_quat, sphere_2_cache, loc_sphere_2);
// assuming sphere position is in box frame:
// read data:
float4 sum_pt = make_float4(0.0, 0.0, 0.0, 0.0);
-
+ curr_jump_distance = 0.0;
// check at exact timestep:
- if (check_sphere_aabb(loc_bounds, loc_sphere_1))
+ if (check_sphere_aabb(loc_bounds, loc_sphere_1, inside, delta, curr_jump_distance, sph_dist))
{
- compute_sphere_gradient(grad_loc_bounds, loc_sphere_1, sum_pt, eta);
- curr_jump_distance = loc_sphere_1.w; // TODO: check if this is better
+
+ scale_eta_metric(delta, sph_dist, eta, sum_pt);
}
else if (sweep_back || sweep_fwd)
{
// there is no collision, compute the distance to obstacle:
- curr_jump_distance = compute_distance(grad_loc_bounds, loc_sphere_1) -
- loc_sphere_1.w; // - eta;
+ curr_jump_distance = compute_distance_fn(loc_bounds, loc_sphere_1,
+ max_distance, delta, sph_dist, distance, inside);
+
}
+ curr_jump_distance = fabsf(curr_jump_distance) - loc_sphere_1.w;
+ curr_jump_distance = max(curr_jump_distance, loc_sphere_1.w);
+
const float jump_mid_distance = curr_jump_distance;
// compute distance between sweep spheres:
if (sweep_back && (jump_mid_distance < sphere_0_distance / 2))
{
- transform_sphere_quat(obb_pos, obb_quat, sphere_0_cache, loc_sphere_0);
// get unit vector:
// loc_sphere_0 = (loc_sphere_0 - loc_sphere_1)/(sphere_0_len);
// loop over sweep steps and accumulate distance:
+ #pragma unroll
for (int j = 0; j < sw_steps; j++)
{
// jump by current jump distance:
@@ -925,17 +2136,25 @@ namespace Curobo
break;
}
k0 = 1 - (curr_jump_distance / sphere_0_len);
- check_jump_distance(loc_sphere_1, loc_sphere_0, k0, eta, loc_bounds,
- grad_loc_bounds, sum_pt, curr_jump_distance);
+ check_jump_distance(loc_sphere_1, loc_sphere_0,
+ k0,
+ loc_bounds,
+ max_distance,
+ delta,
+ sph_dist,
+ distance,
+ inside,
+ eta,
+ sum_pt,
+ curr_jump_distance);
}
}
if (sweep_fwd && (jump_mid_distance < (sphere_2_len / 2)))
{
curr_jump_distance = jump_mid_distance;
- transform_sphere_quat(obb_pos, obb_quat, sphere_2_cache, loc_sphere_2);
-
+ #pragma unroll
for (int j = 0; j < sw_steps; j++)
{
if (curr_jump_distance >= (sphere_2_len / 2))
@@ -943,23 +2162,54 @@ namespace Curobo
break;
}
k0 = 1 - curr_jump_distance / sphere_2_len;
- check_jump_distance(loc_sphere_1, loc_sphere_2, k0, eta, loc_bounds,
- grad_loc_bounds, sum_pt, curr_jump_distance);
+ check_jump_distance(loc_sphere_1, loc_sphere_2,
+ k0,
+ loc_bounds,
+ max_distance,
+ delta,
+ sph_dist,
+ distance,
+ inside,
+ eta,
+ sum_pt,
+ curr_jump_distance);
}
}
-
- if (sum_pt.w > 0)
+ if (SUM_COLLISIONS)
+ {
+ if (sum_pt.w > 0) // max_dist starts at 0
{
max_dist += sum_pt.w;
// transform point back if required:
if (transform_back)
{
- inv_transform_vec_quat(obb_pos, obb_quat, sum_pt, max_grad);
+ //inv_transform_vec_quat(obb_pos, obb_quat, sum_pt, max_grad);
+ inv_transform_vec_quat_add(obb_pos, obb_quat, sum_pt, max_grad);
+
}
// break;// break after first obstacle collision
}
+ }
+ else
+ {
+
+ if (sum_pt.w > max_dist) // max_dist starts at 0
+ {
+ max_dist = sum_pt.w;
+
+ // transform point back if required:
+ if (transform_back)
+ {
+ inv_transform_vec_quat(obb_pos, obb_quat, sum_pt, max_grad);
+ //inv_transform_vec_quat_add(obb_pos, obb_quat, sum_pt, max_grad);
+
+ }
+
+ // break;// break after first obstacle collision
+ }
+ }
}
@@ -981,7 +2231,7 @@ namespace Curobo
}
// computer speed metric here:
- if (enable_speed_metric)
+ if (ENABLE_SPEED_METRIC)
{
if (sweep_back && sweep_fwd)
{
@@ -999,6 +2249,7 @@ namespace Curobo
out_distance[bhs_idx] = max_dist;
}
+
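The `SUM_COLLISIONS` branches above select between two reductions over obstacles: accumulate every positive penetration cost (and add gradients via `inv_transform_vec_quat_add`), or keep only the deepest penetration. A host-side sketch of the two modes (`reduce_costs` is an illustrative name, not kernel code):

```cpp
#include <vector>

// Reduce per-obstacle penetration costs either by summing every positive
// cost (sum_collisions == true) or by keeping only the deepest penetration.
float reduce_costs(const std::vector<float> &costs, bool sum_collisions) {
  float max_dist = 0.0f;
  for (float c : costs) {
    if (sum_collisions) {
      if (c > 0.0f) max_dist += c;   // accumulate every colliding obstacle
    } else if (c > max_dist) {
      max_dist = c;                  // track only the deepest penetration
    }
  }
  return max_dist;
}
```

Summing gives a smoother cost in cluttered scenes, while the max mode matches the ESDF-style queries that report a single worst-case distance.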
/**
* @brief Swept Collision checking. Note: This function currently does not
@@ -1026,14 +2277,14 @@ namespace Curobo
* @param sweep_steps
* @return __device__
*/
-  template<typename scalar_t>
+  template<typename scalar_t, typename dist_scalar_t>
__device__ __forceinline__ void swept_sphere_obb_collision_fn(
const scalar_t *sphere_position,
const int env_idx, const int b_idx,
- const int h_idx, const int sph_idx, scalar_t *out_distance,
- const scalar_t *weight, const scalar_t *activation_distance,
- const scalar_t *obb_accel, const scalar_t *obb_bounds,
- const scalar_t *obb_mat, const uint8_t *obb_enable, const int max_nobs,
+ const int h_idx, const int sph_idx, dist_scalar_t *out_distance,
+ const float *weight, const float *activation_distance,
+ const float *obb_accel, const float *obb_bounds,
+ const float *obb_mat, const uint8_t *obb_enable, const int max_nobs,
const int nboxes, const int batch_size, const int horizon,
const int nspheres, const int sweep_steps)
{
@@ -1045,7 +2296,12 @@ namespace Curobo
b_idx * horizon * nspheres; // + h_idx * n_spheres + sph_idx;
const int start_box_idx = max_nobs * env_idx;
const int bhs_idx = b_addrs + h_idx * nspheres + sph_idx;
-
+ float4 loc_grad = make_float4(0,0,0,0);
+ float3 delta = make_float3(0,0,0);
+ bool inside = false;
+ float curr_jump_distance = 0.0;
+ float distance = 0.0;
+ float sph_dist = 0.0;
// We read the same obstacles across
// Load sphere_position input
@@ -1101,11 +2357,14 @@ namespace Curobo
transform_sphere_quat(&in_obb_mat[0], sphere_1_cache, loc_sphere_1);
max_dist += box_idx;
-
- if (check_sphere_aabb(loc_bounds, loc_sphere_1))
+
+ if (check_sphere_aabb(loc_bounds, loc_sphere_1, inside, delta, curr_jump_distance, sph_dist))
{
- max_dist = 1;
+
+ max_dist = 1;
+ break;
}
+
if (h_idx > 0)
{
@@ -1120,10 +2379,12 @@ namespace Curobo
k1 = 1 - k0;
interpolated_sphere = k0 * loc_sphere_1 + (k1) * loc_sphere_0;
- if (check_sphere_aabb(loc_bounds, interpolated_sphere))
+ if (check_sphere_aabb(loc_bounds, interpolated_sphere, inside, delta, curr_jump_distance, sph_dist))
{
- max_dist = 1;
- break;
+
+ max_dist = 1;
+ break;
+
}
}
}
@@ -1139,8 +2400,7 @@ namespace Curobo
k0 = (j + 1) / (fl_sw_steps);
k1 = 1 - k0;
interpolated_sphere = k0 * loc_sphere_1 + (k1) * loc_sphere_2;
-
- if (check_sphere_aabb(loc_bounds, interpolated_sphere))
+ if (check_sphere_aabb(loc_bounds, interpolated_sphere, inside, delta, curr_jump_distance, sph_dist))
{
max_dist = 1;
break;
@@ -1156,13 +2416,15 @@ namespace Curobo
out_distance[b_addrs + h_idx * nspheres + sph_idx] = weight[0] * max_dist;
}
-  template<typename scalar_t, bool batch_env_t>
+  template<typename scalar_t, typename dist_scalar_t, bool BATCH_ENV_T, bool COMPUTE_ESDF>
__global__ void sphere_obb_distance_kernel(
const scalar_t *sphere_position,
- scalar_t *out_distance,
- scalar_t *closest_pt, uint8_t *sparsity_idx, const scalar_t *weight,
- const scalar_t *activation_distance, const scalar_t *obb_accel,
- const scalar_t *obb_bounds, const scalar_t *obb_mat,
+ dist_scalar_t *out_distance,
+ scalar_t *closest_pt, uint8_t *sparsity_idx, const float *weight,
+ const float *activation_distance,
+ const float *max_distance,
+ const float *obb_accel,
+ const float *obb_bounds, const float *obb_mat,
const uint8_t *obb_enable, const int32_t *n_env_obb,
const int32_t *env_query_idx,
const int max_nobs,
@@ -1185,40 +2447,56 @@ namespace Curobo
b_idx * horizon * nspheres + h_idx * nspheres + sph_idx;
int env_idx = 0;
- int env_nboxes = n_env_obb[0];
- if (batch_env_t)
+ if (BATCH_ENV_T)
{
env_idx =
env_query_idx[b_idx]; // read env idx from current batch idx
- env_nboxes =
- n_env_obb[env_idx]; // read nboxes in current environment
+
}
+ const int env_nboxes = n_env_obb[env_idx]; // read nboxes in current environment
+ if (COMPUTE_ESDF)
+ {
+ sphere_obb_esdf_fn(sphere_position, env_idx, bn_sph_idx, sph_idx, out_distance,
+ closest_pt, sparsity_idx, weight, activation_distance, max_distance,
+ obb_accel, obb_bounds, obb_mat, obb_enable, max_nobs,
+ env_nboxes, transform_back);
- sphere_obb_distance_fn(sphere_position, env_idx, bn_sph_idx, sph_idx, out_distance,
+ }
+ else
+ {
+      sphere_obb_distance_fn<scalar_t, dist_scalar_t, true>(sphere_position, env_idx, bn_sph_idx, sph_idx, out_distance,
closest_pt, sparsity_idx, weight, activation_distance,
obb_accel, obb_bounds, obb_mat, obb_enable, max_nobs,
env_nboxes, transform_back);
+ }
+
// return the sphere distance here:
// sync threads and do block level reduction:
}
-  template<typename scalar_t>
- __global__ void swept_sphere_obb_distance_jump_kernel(
- const scalar_t *sphere_position,
- scalar_t *out_distance,
- scalar_t *closest_pt, uint8_t *sparsity_idx, const scalar_t *weight,
- const scalar_t *activation_distance, const scalar_t *speed_dt,
- const scalar_t *obb_accel, const scalar_t *obb_bounds,
- const scalar_t *obb_pose, const uint8_t *obb_enable,
- const int32_t *n_env_obb, const int max_nobs, const int batch_size,
- const int horizon, const int nspheres, const int sweep_steps, const bool transform_back)
- {
- // This kernel jumps by sdf to only get gradients at collision points.
+   template<typename grid_scalar_t, typename geom_scalar_t, typename grad_scalar_t, typename dist_scalar_t, bool BATCH_ENV_T, bool COMPUTE_ESDF>
+ __global__ void sphere_voxel_distance_kernel(
+ const geom_scalar_t *sphere_position,
+ dist_scalar_t *out_distance,
+ grad_scalar_t *closest_pt,
+ uint8_t *sparsity_idx,
+ const float *weight,
+ const float *activation_distance,
+ const float *max_distance,
+ const grid_scalar_t *grid_features,
+ const float *grid_params, const float *obb_mat,
+ const uint8_t *obb_enable, const int32_t *n_env_obb,
+ const int32_t *env_query_idx,
+ const int max_nobs, const int num_voxels,
+ const int batch_size, const int horizon, const int nspheres,
+ const bool transform_back)
+ {
// spheres_per_block is number of spheres in a thread
// compute local sphere batch by first dividing threadidx/nboxes
+ // const int sph_idx = blockIdx.x * blockDim.x + threadIdx.x;
const int t_idx = blockIdx.x * blockDim.x + threadIdx.x;
const int b_idx = t_idx / (horizon * nspheres);
const int h_idx = (t_idx - b_idx * (horizon * nspheres)) / nspheres;
@@ -1228,26 +2506,49 @@ namespace Curobo
{
return;
}
+ const int bn_sph_idx =
+ b_idx * horizon * nspheres + h_idx * nspheres + sph_idx;
- const int env_idx = 0;
+ int env_idx = 0;
+ int env_nboxes = n_env_obb[0];
+ if (BATCH_ENV_T)
+ {
+ env_idx =
+ env_query_idx[b_idx]; // read env idx from current batch idx
+ env_nboxes =
+ n_env_obb[env_idx]; // read nboxes in current environment
+ }
+ if (COMPUTE_ESDF)
+ {
+
+ sphere_voxel_esdf_fn(sphere_position, env_idx, bn_sph_idx, sph_idx, out_distance,
+ closest_pt, sparsity_idx, weight, activation_distance, max_distance,
+ grid_features, grid_params, obb_mat, obb_enable, max_nobs, num_voxels,
+ transform_back);
- const int env_nboxes = n_env_obb[env_idx];
-     swept_sphere_obb_distance_fn<scalar_t>(
- sphere_position, env_idx, b_idx, h_idx, sph_idx, out_distance, closest_pt,
- sparsity_idx, weight, activation_distance, speed_dt, obb_accel,
- obb_bounds, obb_pose, obb_enable, max_nobs, env_nboxes, batch_size,
- horizon, nspheres, sweep_steps, transform_back);
+ }
+ else
+ {
+ sphere_voxel_distance_fn(sphere_position, env_idx, bn_sph_idx, sph_idx, out_distance,
+ closest_pt, sparsity_idx, weight, activation_distance, max_distance,
+ grid_features, grid_params, obb_mat, obb_enable, max_nobs, num_voxels,
+ transform_back);
+ }
+
+ // return the sphere distance here:
+ // sync threads and do block level reduction:
}
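The `COMPUTE_ESDF` template parameter resolves the branch above at compile time, so each instantiation carries only one code path. A sketch of the pattern under assumed semantics (ESDF mode returns the signed distance itself, cost mode hinges it at zero); the name `evaluate` is illustrative:

```cpp
#include <cassert>

// Compile-time flag dispatch: the if below is resolved per instantiation,
// and the dead branch is eliminated by the compiler.
template<bool COMPUTE_ESDF>
int evaluate(int signed_distance)
{
  if (COMPUTE_ESDF)
  {
    return signed_distance;  // ESDF mode: report the signed distance
  }
  else
  {
    // cost mode: only penetration (positive distance here) contributes
    return signed_distance > 0 ? signed_distance : 0;
  }
}
```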
-   template<typename scalar_t>
- __global__ void swept_sphere_obb_distance_jump_batch_env_kernel(
+
+   template<typename scalar_t, typename dist_scalar_t, bool BATCH_ENV_T>
+ __global__ void swept_sphere_obb_distance_jump_kernel(
const scalar_t *sphere_position,
- scalar_t *out_distance,
- scalar_t *closest_pt, uint8_t *sparsity_idx, const scalar_t *weight,
- const scalar_t *activation_distance, const scalar_t *speed_dt,
- const scalar_t *obb_accel, const scalar_t *obb_bounds,
- const scalar_t *obb_pose, const uint8_t *obb_enable,
+ dist_scalar_t *out_distance,
+ scalar_t *closest_pt, uint8_t *sparsity_idx, const float *weight,
+ const float *activation_distance, const float *speed_dt,
+ const float *obb_accel, const float *obb_bounds,
+ const float *obb_pose, const uint8_t *obb_enable,
const int32_t *n_env_obb, const int32_t *env_query_idx, const int max_nobs,
const int batch_size, const int horizon, const int nspheres,
const int sweep_steps,
@@ -1270,23 +2571,29 @@ namespace Curobo
return;
}
- const int env_idx = env_query_idx[b_idx];
+ int env_idx = 0;
+
+ if (BATCH_ENV_T)
+ {
+ env_idx = env_query_idx[b_idx];
+ }
+
const int env_nboxes = n_env_obb[env_idx];
-     swept_sphere_obb_distance_fn<scalar_t>(
+     swept_sphere_obb_distance_fn<scalar_t, dist_scalar_t>(
sphere_position, env_idx, b_idx, h_idx, sph_idx, out_distance, closest_pt,
sparsity_idx, weight, activation_distance, speed_dt, obb_accel,
obb_bounds, obb_pose, obb_enable, max_nobs, env_nboxes, batch_size,
horizon, nspheres, sweep_steps, transform_back);
}
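The `BATCH_ENV_T` selection logic shared by these kernels reduces to a small lookup, sketched here with a runtime flag for testability; the name `select_env` is illustrative:

```cpp
#include <cassert>
#include <cstdint>

// With batch_env = false every batch item queries environment 0; with
// batch_env = true each batch item reads its environment from env_query_idx.
int select_env(bool batch_env, const int32_t *env_query_idx, int b_idx)
{
  int env_idx = 0;
  if (batch_env)
  {
    env_idx = env_query_idx[b_idx];  // read env idx from current batch idx
  }
  return env_idx;
}
```

In the kernels the flag is a template parameter, so the single-environment instantiation never touches `env_query_idx` at all.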
-   template<typename scalar_t>
+   template<typename scalar_t, typename dist_scalar_t>
__global__ void swept_sphere_obb_collision_kernel(
const scalar_t *sphere_position,
- scalar_t *out_distance,
- const scalar_t *weight, const scalar_t *activation_distance,
- const scalar_t *obb_accel, const scalar_t *obb_bounds,
- const scalar_t *obb_pose, const uint8_t *obb_enable,
+ dist_scalar_t *out_distance,
+ const float *weight, const float *activation_distance,
+ const float *obb_accel, const float *obb_bounds,
+ const float *obb_pose, const uint8_t *obb_enable,
const int32_t *n_env_obb, const int max_nobs, const int batch_size,
const int horizon, const int nspheres, const int sweep_steps)
{
@@ -1311,13 +2618,13 @@ namespace Curobo
max_nobs, env_nboxes, batch_size, horizon, nspheres, sweep_steps);
}
-   template<typename scalar_t>
+   template<typename scalar_t, typename dist_scalar_t>
__global__ void swept_sphere_obb_collision_batch_env_kernel(
const scalar_t *sphere_position,
- scalar_t *out_distance,
- const scalar_t *weight, const scalar_t *activation_distance,
- const scalar_t *obb_accel, const scalar_t *obb_bounds,
- const scalar_t *obb_pose, const uint8_t *obb_enable,
+ dist_scalar_t *out_distance,
+ const float *weight, const float *activation_distance,
+ const float *obb_accel, const float *obb_bounds,
+ const float *obb_pose, const uint8_t *obb_enable,
const int32_t *n_env_obb, const int32_t *env_query_idx, const int max_nobs,
const int batch_size, const int horizon, const int nspheres,
const int sweep_steps)
@@ -1343,44 +2650,14 @@ namespace Curobo
max_nobs, env_nboxes, batch_size, horizon, nspheres, sweep_steps);
}
-   template<typename scalar_t>
- __global__ void sphere_obb_collision_kernel(
- const scalar_t *sphere_position,
- scalar_t *out_distance,
- const scalar_t *weight, const scalar_t *activation_distance,
- const scalar_t *obb_accel, const scalar_t *obb_bounds,
- const scalar_t *obb_mat, const uint8_t *obb_enable,
- const int32_t *n_env_obb, const int max_nobs, const int batch_size,
- const int horizon, const int nspheres)
- {
- // spheres_per_block is number of spheres in a thread
- // compute local sphere batch by first dividing threadidx/nboxes
- const int t_idx = blockIdx.x * blockDim.x + threadIdx.x;
- const int b_idx = t_idx / (horizon * nspheres);
- const int h_idx = (t_idx - b_idx * (horizon * nspheres)) / nspheres;
- const int sph_idx = (t_idx - b_idx * horizon * nspheres - h_idx * nspheres);
- if ((sph_idx >= nspheres) || (b_idx >= batch_size) || (h_idx >= horizon))
- {
- return;
- }
- const int env_idx = 0;
- const int env_nboxes = n_env_obb[env_idx];
- const int bn_sph_idx =
- b_idx * horizon * nspheres + h_idx * nspheres + sph_idx;
- sphere_obb_collision_fn(sphere_position,
- env_idx, bn_sph_idx, sph_idx, out_distance,
- weight, activation_distance, obb_accel, obb_bounds,
- obb_mat, obb_enable, max_nobs, env_nboxes);
- }
-
-   template<typename scalar_t>
+   template<typename scalar_t, typename dist_scalar_t, bool BATCH_ENV_T>
__global__ void sphere_obb_collision_batch_env_kernel(
const scalar_t *sphere_position,
- scalar_t *out_distance,
- const scalar_t *weight, const scalar_t *activation_distance,
- const scalar_t *obb_accel, const scalar_t *obb_bounds,
- const scalar_t *obb_mat, const uint8_t *obb_enable,
+ dist_scalar_t *out_distance,
+ const float *weight, const float *activation_distance,
+ const float *obb_accel, const float *obb_bounds,
+ const float *obb_mat, const uint8_t *obb_enable,
const int32_t *n_env_obb, const int32_t *env_query_idx, const int max_nobs,
const int batch_size, const int horizon, const int nspheres)
{
@@ -1395,8 +2672,14 @@ namespace Curobo
{
return;
}
- const int env_idx = env_query_idx[b_idx];
+ int env_idx = 0;
+   if (BATCH_ENV_T)
+ {
+ env_idx = env_query_idx[b_idx];
+ }
+
const int env_nboxes = n_env_obb[env_idx];
+
const int bn_sph_idx =
b_idx * horizon * nspheres + h_idx * nspheres + sph_idx;
sphere_obb_collision_fn(sphere_position, env_idx, bn_sph_idx, sph_idx, out_distance,
@@ -1413,6 +2696,7 @@ sphere_obb_clpt(const torch::Tensor sphere_position, // batch_size, 3
torch::Tensor closest_point, // batch size, 3
torch::Tensor sparsity_idx, const torch::Tensor weight,
const torch::Tensor activation_distance,
+ const torch::Tensor max_distance,
const torch::Tensor obb_accel, // n_boxes, 4, 4
const torch::Tensor obb_bounds, // n_boxes, 3
const torch::Tensor obb_pose, // n_boxes, 4, 4
@@ -1421,13 +2705,16 @@ sphere_obb_clpt(const torch::Tensor sphere_position, // batch_size, 3
const torch::Tensor env_query_idx, // n_boxes, 4, 4
const int max_nobs, const int batch_size, const int horizon,
const int n_spheres, const bool transform_back,
- const bool compute_distance, const bool use_batch_env)
+ const bool compute_distance, const bool use_batch_env,
+ const bool sum_collisions,
+ const bool compute_esdf)
{
using namespace Curobo::Geometry;
cudaStream_t stream = at::cuda::getCurrentCUDAStream();
const int bnh_spheres = n_spheres * batch_size * horizon; //
-
+ const bool scale_metric = true;
int threadsPerBlock = bnh_spheres;
+ const bool sum_collisions_ = true;
if (threadsPerBlock > 128)
{
@@ -1435,93 +2722,85 @@ sphere_obb_clpt(const torch::Tensor sphere_position, // batch_size, 3
}
int blocksPerGrid = (bnh_spheres + threadsPerBlock - 1) / threadsPerBlock;
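The launch configuration computed above is one thread per (batch, horizon, sphere) item, capped at 128 threads per block, with ceiling division for the grid size. A host-side sketch; the helper name `launch_config` is illustrative:

```cpp
#include <cassert>
#include <utility>

// Returns {threadsPerBlock, blocksPerGrid} covering bnh_spheres items.
std::pair<int, int> launch_config(int bnh_spheres)
{
  int threadsPerBlock = bnh_spheres;
  if (threadsPerBlock > 128)
  {
    threadsPerBlock = 128;
  }
  // ceiling division so the last, possibly partial, block is still launched
  const int blocksPerGrid = (bnh_spheres + threadsPerBlock - 1) / threadsPerBlock;
  return { threadsPerBlock, blocksPerGrid };
}
```

The early-return bounds check in each kernel handles the overshoot threads in the final block.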
- if (use_batch_env)
+ if (!compute_distance)
{
- if (compute_distance)
+
+ AT_DISPATCH_FLOATING_TYPES(
+ distance.scalar_type(), "SphereObb_clpt_collision", ([&]{
+     auto collision_kernel = sphere_obb_collision_batch_env_kernel<scalar_t, scalar_t, false>;
+     auto batch_collision_kernel = sphere_obb_collision_batch_env_kernel<scalar_t, scalar_t, true>;
+ auto selected_k = collision_kernel;
+ if (use_batch_env)
{
- // TODO: call kernel based on flag:
- AT_DISPATCH_FLOATING_TYPES(
- distance.scalar_type(), "SphereObb_clpt", ([&] {
-         sphere_obb_distance_kernel<scalar_t, true>
-         << < blocksPerGrid, threadsPerBlock, 0, stream >> > (
-           sphere_position.data_ptr<scalar_t>(),
-           distance.data_ptr<scalar_t>(),
-           closest_point.data_ptr<scalar_t>(),
-           sparsity_idx.data_ptr<uint8_t>(),
-           weight.data_ptr<scalar_t>(),
-           activation_distance.data_ptr<scalar_t>(),
-           obb_accel.data_ptr<scalar_t>(),
-           obb_bounds.data_ptr<scalar_t>(),
-           obb_pose.data_ptr<scalar_t>(),
-           obb_enable.data_ptr<uint8_t>(),
-           n_env_obb.data_ptr<int32_t>(),
-           env_query_idx.data_ptr<int32_t>(), max_nobs, batch_size,
-           horizon, n_spheres, transform_back);
- }));
+ selected_k = batch_collision_kernel;
}
- else
- {
- // TODO: call kernel based on flag:
- AT_DISPATCH_FLOATING_TYPES(
- distance.scalar_type(), "SphereObb_collision", ([&] {
-         sphere_obb_collision_batch_env_kernel<scalar_t>
- << < blocksPerGrid, threadsPerBlock, 0, stream >> > (
+
+ selected_k<< < blocksPerGrid, threadsPerBlock, 0, stream >> > (
            sphere_position.data_ptr<scalar_t>(),
-           distance.data_ptr<scalar_t>(), weight.data_ptr<scalar_t>(),
-           activation_distance.data_ptr<scalar_t>(),
-           obb_accel.data_ptr<scalar_t>(),
- obb_bounds.data_ptr