
FAQ New Question Notification #179

Open
eric-schleicher opened this issue Mar 15, 2017 · 16 comments
@eric-schleicher

I'm specifically curious if re-processing datasets will be faster.

Are there Word Types that are/aren't CUDA accelerated?

@matlabbe matlabbe changed the title Added question to the FAQ/ CUDA acceleration FAQ New Question Notification Mar 16, 2017
@matlabbe
Member

I updated the FAQ. I also gave this issue a general title so that, for convenience, we can announce new FAQ entries with a post in this thread instead of creating a new thread each time.

@srinath-iko

Hey @matlabbe

I am trying to replicate the camera tracking on the GPU.

I was going through the odometry and registration code while working with an RGB-D camera. I understand that visual correspondences are used to match features between the last two frames, and that these matched 3D/2D points are fed into solvePnP to return a rotation and translation vector. In OdometryF2M.cpp, the inverse of the previous pose is multiplied with the current transform, and then the previous pose is multiplied again with the result from F2M, leaving just the transform we get from solvePnP.

This would constitute only the deltas (the change in transformation from one frame to the next) instead of the global transform, but the application seems to be tracking the global transform. Am I missing something here? :D

I have disabled all bundle adjustment and motion estimation so I can test the results from solvePnP purely.

@matlabbe
Member

matlabbe commented Oct 8, 2019

The odometry pose is updated here:

return _pose *= t; // update

from the incremental transform t computed by the selected odometry approach (e.g., F2M). In F2M, the result from PnP is the pose, not the increment (note that tmpMap contains the 3D points of the local feature map in the odometry frame):

transform = regPipeline_->computeTransformationMod(
    tmpMap,
    *lastFrame_,
    // special case for ICP-only odom, set guess to identity if we just started or reset
    guessIteration==0 && !guess.isNull()?this->getPose()*guess:!regPipeline_->isImageRequired()&&this->framesProcessed()<2?this->getPose():Transform(),
    &regInfo);

To make it work like the other odometry approaches that output incremental transforms, we have to convert it into an increment too:

// make it incremental
transform = this->getPose().inverse() * transform;

so that the pose update above (in the parent Odometry class) still works.

@srinath-iko

Thanks @matlabbe!

I thought that the matching was done with only the present frame and the one before it (hence the increment), and not with the 3D local feature map and the present frame.

Thanks for clarifying!

@Eufhid

Eufhid commented Sep 19, 2021

Hello
In the iOS app there are a lot of parameters in the settings; can you briefly describe the settings for the best scanning quality and accuracy?
Are there different best settings for scanning a house versus the surroundings of a building with pathways and green spaces?
The point cloud files will be used as-is for viewing in the app, and then later in Revit for building a model.
Thanks a lot

@matlabbe
Member

Hi,
For quality/accuracy, it is mostly the scanning motion you take that makes the difference. For example, avoid looking directly at textureless surfaces, and try to find loop closures when passing by a previously scanned area (for example, come back to the same location before and after scanning a room, to reduce odometry drift).

When scanning large environments, decrease the point cloud density during mapping to reduce rendering load (and save battery). Note that even if you decrease the point cloud density, rtabmap still records full-resolution depth images, so high-resolution point clouds can be generated offline afterwards.

Outdoors, increase the max depth range to better see what is being scanned.

@naitiknakrani-eic

naitiknakrani-eic commented Aug 7, 2024

Hi @matlabbe

Is any information available about CUDA support for point-cloud-based mapping? Are any efforts known? I can see some CUDA support for RGB with OpenCV in the FAQ section. Anything related to PCL?

Or, as an alternative, is there any documentation on parameter tuning for faster point-cloud-based mapping and processing?

@matlabbe
Member

For PCL, you may check/ask on their GitHub: https://github.com/PointCloudLibrary/pcl. It seems they have some algorithms ported to CUDA: https://github.com/PointCloudLibrary/pcl/tree/master/cuda, but rtabmap doesn't use them.

RTAB-Map uses PCL for ICP-based odometry / loop closure and for the 3D local occupancy grid, which require voxel filtering and/or normal estimation. In post-processing, it uses PCL for meshing and texture mapping. To answer your question, "any document available of parameter tuning for faster point cloud based mapping and processing?": which part exactly do you want to speed up?
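As a side note on the voxel filtering mentioned above: conceptually it just buckets points into a regular grid and keeps one centroid per occupied cell. A simplified standalone sketch of that idea (not PCL's actual `pcl::VoxelGrid` implementation):

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Point { float x, y, z; };

// Downsample a cloud by averaging all points falling into the same
// voxel of side `leaf` (the idea behind PCL's VoxelGrid filter).
std::vector<Point> voxelFilter(const std::vector<Point>& cloud, float leaf) {
    struct Acc { double x = 0, y = 0, z = 0; int n = 0; };
    std::unordered_map<std::uint64_t, Acc> cells;
    // Pack the 3 voxel indices into one 64-bit key (21 bits each),
    // offset so moderately negative coordinates stay valid.
    auto idx = [leaf](float v) {
        return static_cast<std::uint64_t>(
            static_cast<std::int64_t>(std::floor(v / leaf)) + (1 << 20)) & 0x1FFFFF;
    };
    for (const Point& p : cloud) {
        std::uint64_t key = (idx(p.x) << 42) | (idx(p.y) << 21) | idx(p.z);
        Acc& a = cells[key];
        a.x += p.x; a.y += p.y; a.z += p.z; ++a.n;
    }
    std::vector<Point> out;
    out.reserve(cells.size());
    for (const auto& kv : cells) {
        const Acc& a = kv.second;
        out.push_back({float(a.x / a.n), float(a.y / a.n), float(a.z / a.n)});
    }
    return out;
}
```

With a 0.1 m leaf, any points falling in the same 10 cm cell collapse into a single averaged point; the GPU-accelerated variants discussed in this thread parallelize exactly this kind of per-point bucketing.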

@naitiknakrani-eic

@matlabbe Thanks for your prompt answer. Let me give you some background: we are trying to improve the overall RTAB-Map SLAM processing speed using a point cloud and external odometry as inputs.

In our code analysis, we found that RTAB-Map does cluster extraction and segmentation in local grid mapping. We are trying to use the octree-based CUDA implementation from PCL in RTAB-Map and are measuring execution times in order to improve overall throughput. We are using external odometry, so ICP odometry estimation won't come into play in the execution time.

On my point "any document available of parameter tuning for faster point cloud based mapping and processing?", I meant: is there any document analyzing the impact of parameter tuning (e.g., memory/Grid/optimizer parameters) on overall mapping speed, occupancy grid generation, or loop closure?

By using both of these approaches together, our aim is to accelerate RTAB-Map for NVIDIA GPUs. Let us know your feedback on this approach, and if you think there is something we are missing in our analysis.

@borongyuan
Contributor

I haven't used the CUDA part of PCL in several years, so I'm not sure what updates they have made since. In the past two years, I have tried NVIDIA's cuPCL, which includes implementations of ICP, NDT, Octree, etc., and supports x86 and Jetson platforms. The only problem is that it is not open source and instead provides pre-compiled dynamic link libraries for different platforms, so integrating it into RTAB-Map may be bloated.
I planned to integrate VDBFusion some time ago (#1286), but after preliminary testing I feel it is not complete enough, so implementing a new map representation directly on top of OpenVDB may be a better choice. With OpenVDB, a GPU may not even be required to handle most scenes. I'm working on two of our new products lately, though, so I'll be back in a month to continue developing this part.

@naitiknakrani-eic

naitiknakrani-eic commented Aug 13, 2024

@borongyuan That's a whole new perspective and approach you have mentioned. Good to know about using OpenVDB and VDBFusion for point cloud processing. I agree with your point about cuPCL, hence we are relying on the PCL/CUDA implementations.

@naitiknakrani-eic

@matlabbe What is your opinion on the possibility of achieving real-time (or 70% of real-time) processing (mapping), with a 512x512 ordered point cloud @ 20 FPS and odometry @ 30 FPS as inputs?

How much can the AGX Orin's GPU be leveraged in the code implementation?

@matlabbe
Member

matlabbe commented Sep 5, 2024

@naitiknakrani-eic

We are trying to use octree based cuda implementation from PCL in RTAB-Map

That sounds like a good idea! We could handle PCL-CUDA like we do with OpenCV CUDA: detect whether PCL's CUDA module is available, then enable related parameters to use the GPU version of some of the filtering algorithms.

I meant any document available which has analysis of parameter tuning (like memory/Grid/optimizer based) impact on overall mapping speed or occupancy grid generation and loop closure?

In section 5 of that paper, we benchmarked the different local and global occupancy grid approaches provided in rtabmap, though not with extensive or detailed results for every part of the chain (like the time for clustering / downsampling / voxel filtering / normal estimation, ...).
[Screenshot: occupancy grid benchmark table from the paper]
We were more concerned with the long-term trend of the computation time, accounting for loop closures, for which we need to regenerate the global map.

What is your opinion on possibilities of achieving real-time (or 70% of real-time) processing (mapping) having 512x512 size of ordered point cloud @20 FPS and odom @30 FPS as an inputs ?

Well, it depends on what you want to update at this rate. For global maps, I don't think we need super-dense point clouds processed super fast, unlike the local occupancy/voxel grids used for obstacle avoidance. The current bottleneck I see with the current occupancy grid is not really the time to create the local grids (which could be improved with some of PCL's CUDA implementations, but it is constant), but the time to update the global occupancy grid map after a loop closure. With RTAB-Map's memory management disabled, these updates can create spikes over the real-time limit when continuously doing SLAM for a long time, as shown in Figure 18 of that paper (note that in that figure the local grids were 2D; with 3D local grids and OctoMap, the "Global Assembling Time" would have increased a lot faster).

How much AGX Orin's GPU can be leveraged in the code implementation ?

Currently we have GPU options that are more related to 2D features (with OpenCV CUDA, with more to come in that PR), not to point cloud processing.

@borongyuan cuPCL looks great for Jetson optimization, though the offering seems similar to what is already in PCL (which seems easier to integrate, as rtabmap already uses PCL a lot). I just stumbled on this page, where the author tried OpenVDB on the TUM RGB-D dataset. That could give an idea of how to use the library with similar sensors. Maybe another alternative: https://github.com/facontidavide/Bonxai

cheers,
Mathieu

@borongyuan
Contributor

There is an Octree implementation in PCL's gpu module, but I don't see any ICP- or NDT-related parts in the cuda and gpu modules. The way cuPCL is provided is indeed not very friendly.
I don't know why NVIDIA has provided so many duplicative and confusing libraries over the years. For example, when we want GPU acceleration for computer vision, we have OpenCV's CUDA module, NVIDIA's VisionWorks, VPI, and CVCUDA. Using PCL's and OpenCV's own CUDA modules is undoubtedly the most convenient. VisionWorks has been abandoned. I was trying to add VPI support, but then I noticed CVCUDA; they have many duplicate functions. A friend of mine told me that VPI is intended for edge devices, while CVCUDA targets servers. I don't even know which one NVIDIA wants developers to use, so I decided to wait and see.
Regarding OpenVDB, NVIDIA also developed GVDB before. Thank goodness it was abandoned too. Only NanoVDB remains, and it has been integrated into OpenVDB, so now we can study OpenVDB with peace of mind.

@naitiknakrani-eic

@matlabbe @borongyuan Thanks for all the responses.

Our detailed STM timing analysis (time for clustering / downsampling / voxel filtering / normal estimation, ...) has shown that the largest part of the time is taken by the search algorithm used for segmentation and clustering. We used radius search (the PCL GPU implementation) in an octree-based implementation.

This thread, #1045 (comment), describes a new approach to optimizing the segmentation process without using search algorithms, specifically avoiding computationally heavy functions like radius search or KNN.

So far we have been working on LiDAR-based SLAM only, so we haven't used any of the visual libraries. Our focus is only on PCL-based optimization.
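For context on why the radius search dominates: the brute-force form of Euclidean clustering re-scans the whole cloud for every accepted point, giving O(n²) behavior. This toy standalone sketch (not rtabmap's or PCL's code) shows the structure that an octree, or a search-free formulation, is meant to accelerate:

```cpp
#include <cstddef>
#include <vector>

struct Point { float x, y, z; };

// Brute-force Euclidean clustering: grow each cluster by repeatedly doing a
// radius search around every accepted point. Each radius query scans the
// whole cloud, so the overall cost is O(n^2); the inner loop is exactly
// what an octree-backed radius search replaces.
std::vector<std::vector<int>> euclideanClusters(const std::vector<Point>& cloud,
                                                float radius) {
    const float r2 = radius * radius;
    std::vector<bool> visited(cloud.size(), false);
    std::vector<std::vector<int>> clusters;
    for (std::size_t seed = 0; seed < cloud.size(); ++seed) {
        if (visited[seed]) continue;
        std::vector<int> cluster = {static_cast<int>(seed)};
        visited[seed] = true;
        for (std::size_t i = 0; i < cluster.size(); ++i) {
            const Point& p = cloud[cluster[i]];
            for (std::size_t j = 0; j < cloud.size(); ++j) { // radius search
                if (visited[j]) continue;
                float dx = cloud[j].x - p.x;
                float dy = cloud[j].y - p.y;
                float dz = cloud[j].z - p.z;
                if (dx * dx + dy * dy + dz * dz <= r2) {
                    visited[j] = true;
                    cluster.push_back(static_cast<int>(j));
                }
            }
        }
        clusters.push_back(cluster);
    }
    return clusters;
}
```

A spatial index drops each radius query from O(n) to roughly O(log n); the search-free idea referenced above avoids the query entirely, e.g. by exploiting the ordered structure of the 512x512 cloud.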

@matlabbe
Member

matlabbe commented Sep 7, 2024

Those optimizations could be great to reduce the "Local Occupancy Grid" time, in particular with sensors generating a lot of points at long range (e.g., an OS2-128 lidar). Another part of the STM time is compressing data to save to the database; I opened an issue the other day with a possible improvement: #1334 (nvCOMP).
