
FAQ New Question Notification #179

Open
eric-schleicher opened this issue Mar 15, 2017 · 16 comments
@eric-schleicher

I'm specifically curious if re-processing datasets will be faster.

Are there Word Types that are/aren't CUDA accelerated?

@matlabbe matlabbe changed the title Added question to the FAQ/ CUDA acceleration FAQ New Question Notification Mar 16, 2017
@matlabbe
Member

I updated the FAQ. I also gave this issue a general title so that, for convenience, we can announce new FAQ entries with a post in this thread instead of creating a new thread each time.

@srinath-iko

Hey @matlabbe

I am trying to replicate the camera tracking on the GPU.

I was going through the odometry and registration code while working with an RGB-D camera. I understand that visual correspondences are used to match features between the last two frames, and that these matched 3D/2D points are fed into solvePnP to return a rotation and translation vector. In OdometryF2M.cpp, the inverse of the previous pose is multiplied with the current transform, and then the previous pose is multiplied again with the result from F2M, leaving just the transform we get from solvePnP.

This would constitute only the deltas (the change in transformation from one frame to the next) instead of the global transform, but the application seems to be tracking the global transform. Am I missing something here? :D

I have disabled all bundle adjustment and motion estimation so I can test the results from solvePnP purely.

@matlabbe
Member

matlabbe commented Oct 8, 2019

The odometry pose is updated here:

return _pose *= t; // update

from the incremental transform t computed by the selected odometry approach (e.g., F2M). In F2M, the result from PnP is the pose, not the increment (note that tmpMap contains the 3D points of the local feature map in the odometry frame):

transform = regPipeline_->computeTransformationMod(
    tmpMap,
    *lastFrame_,
    // special case for ICP-only odom, set guess to identity if we just started or reset
    guessIteration==0 && !guess.isNull()?this->getPose()*guess:!regPipeline_->isImageRequired()&&this->framesProcessed()<2?this->getPose():Transform(),
    &regInfo);

To make it work like the other odometry approaches that output incremental transforms, we have to convert it into an increment too:

// make it incremental
transform = this->getPose().inverse() * transform;

so that the pose update above (in the parent Odometry class) still works.

@srinath-iko

Thanks @matlabbe!

I thought that the matching was done with only the present frame and the one before it (hence the increment), and not with the 3D local feature map and the present frame.

Thanks for clarifying!

@Eufhid

Eufhid commented Sep 19, 2021

Hello
In the iOS app there are a lot of parameters in the settings; can you briefly describe the settings for the best scanning quality and accuracy?
Are there different best settings for scanning a house versus the surroundings of a building with pathways and green spaces?
The point cloud files will be used as-is for viewing in the app, and then later in Revit for building a model.
Thanks a lot

@matlabbe
Member

Hi,
For quality/accuracy, it is mostly the scanning motion you take that makes the difference. For example, avoid looking directly at textureless surfaces, and try to find loop closures when passing by a previously scanned area (for example, come back to the same location before and after scanning a room, to reduce odometry drift).

When scanning large environments, decrease the point cloud density during mapping to reduce rendering load (and save battery). Note that even if you decrease the point cloud density, rtabmap still records full-resolution depth images, so high-resolution point clouds can be generated offline afterwards.

Outdoors, increase the max depth range to better see what is being scanned.

@naitiknakrani-eic

naitiknakrani-eic commented Aug 7, 2024

Hi @matlabbe

Is any information available about CUDA support for point-cloud-based mapping? Are any efforts known? I can see some CUDA support for RGB with OpenCV in the FAQ section. Anything related to PCL?

Or, as an alternative, is there any documentation on parameter tuning for faster point-cloud-based mapping and processing?

@matlabbe
Member

For PCL, you may check/ask on their GitHub: https://github.com/PointCloudLibrary/pcl. It seems they have some algorithms ported to CUDA: https://github.com/PointCloudLibrary/pcl/tree/master/cuda, but rtabmap doesn't use them.

RTAB-Map uses PCL for ICP-based odometry / loop closure and for the 3D local occupancy grid, which require voxel filtering and/or normal estimation. In post-processing, it uses PCL for meshing and texture mapping. To answer your question, "any document available of parameter tuning for faster point cloud based mapping and processing?": which part exactly do you want to speed up?
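As a side note on the voxel filtering mentioned above: conceptually it just buckets points into a regular grid and keeps one centroid per occupied cell. A simplified standalone sketch of that idea (not PCL's actual `pcl::VoxelGrid` implementation):

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Point { float x, y, z; };

// Downsample a cloud by averaging all points falling into the same
// voxel of side `leaf` (the idea behind PCL's VoxelGrid filter).
std::vector<Point> voxelFilter(const std::vector<Point>& cloud, float leaf) {
    struct Acc { double x = 0, y = 0, z = 0; int n = 0; };
    std::unordered_map<std::uint64_t, Acc> cells;
    // Pack the 3 voxel indices into one 64-bit key (21 bits each),
    // offset so moderately negative coordinates stay valid.
    auto idx = [leaf](float v) {
        return static_cast<std::uint64_t>(
            static_cast<std::int64_t>(std::floor(v / leaf)) + (1 << 20)) & 0x1FFFFF;
    };
    for (const Point& p : cloud) {
        std::uint64_t key = (idx(p.x) << 42) | (idx(p.y) << 21) | idx(p.z);
        Acc& a = cells[key];
        a.x += p.x; a.y += p.y; a.z += p.z; ++a.n;
    }
    std::vector<Point> out;
    out.reserve(cells.size());
    for (const auto& kv : cells) {
        const Acc& a = kv.second;
        out.push_back({float(a.x / a.n), float(a.y / a.n), float(a.z / a.n)});
    }
    return out;
}
```

With a 0.1 m leaf, any points falling in the same 10 cm cell collapse into a single averaged point; the GPU-accelerated variants discussed in this thread parallelize exactly this kind of per-point bucketing.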

@naitiknakrani-eic

@matlabbe Thanks for your prompt answer. Let me give you some background: we are trying to improve the overall RTAB-Map SLAM processing speed using a point cloud and external odometry as inputs.

In our code analysis, we found that RTAB-Map does cluster extraction and segmentation in local grid mapping. We are trying to use the octree-based CUDA implementation from PCL in RTAB-Map and are measuring execution times in order to improve overall throughput. We are using external odometry, so ICP odometry estimation won't come into play in the execution time.

On my point "any document available of parameter tuning for faster point cloud based mapping and processing?", I meant: is there any document analyzing the impact of parameter tuning (e.g., memory/Grid/optimizer parameters) on overall mapping speed, occupancy grid generation, or loop closure?

By using both of these approaches together, our aim is to accelerate RTAB-Map for NVIDIA GPUs. Let us know your feedback on this approach, and if you think there is something we are missing in our analysis.

@borongyuan
Contributor

I haven't used the CUDA part of PCL in several years, so I'm not sure what updates they have made since. In the past two years, I have tried NVIDIA's cuPCL, which includes implementations of ICP, NDT, Octree, etc., and supports x86 and Jetson platforms. The only problem is that it is not open source and instead provides pre-compiled dynamic link libraries for different platforms, so integrating it into RTAB-Map may be bloated.
I planned to integrate VDBFusion some time ago (#1286), but after preliminary testing I feel it is not complete enough, so implementing a new map representation directly on top of OpenVDB may be a better choice. With OpenVDB, a GPU may not even be required to handle most scenes. I'm working on two of our new products lately, though, so I'll be back in a month to continue developing this part.

@naitiknakrani-eic

naitiknakrani-eic commented Aug 13, 2024

@borongyuan That's a whole new perspective and approach you have mentioned. Good to know about using OpenVDB and VDBFusion for point cloud processing. I agree with your point about cuPCL, hence we are relying on the PCL/CUDA implementations.

@naitiknakrani-eic

@matlabbe What is your opinion on the possibility of achieving real-time (or 70% of real-time) processing (mapping), with a 512x512 ordered point cloud @ 20 FPS and odometry @ 30 FPS as inputs?

How much can the AGX Orin's GPU be leveraged in the code implementation?

@matlabbe
Member

matlabbe commented Sep 5, 2024

@naitiknakrani-eic

We are trying to use octree based cuda implementation from PCL in RTAB-Map

That sounds like a good idea! We could handle PCL-CUDA like we do with OpenCV CUDA: detect whether PCL's CUDA module is available, then enable related parameters to use the GPU version of some of the filtering algorithms.

I meant any document available which has analysis of parameter tuning (like memory/Grid/optimizer based) impact on overall mapping speed or occupancy grid generation and loop closure?

In section 5 of that paper, we benchmarked the different local and global occupancy grid approaches provided in rtabmap, though not with extensive or detailed results for every part of the chain (like the time for clustering / downsampling / voxel filtering / normal estimation, ...).
[Screenshot: occupancy grid benchmark table from the paper]
We were more concerned with the long-term trend of the computation time, accounting for loop closures, for which we need to regenerate the global map.

What is your opinion on possibilities of achieving real-time (or 70% of real-time) processing (mapping) having 512x512 size of ordered point cloud @20 FPS and odom @30 FPS as an inputs ?

Well, it depends on what you want to update at this rate. For global maps, I don't think we need super-dense point clouds processed super fast, unlike the local occupancy/voxel grids used for obstacle avoidance. The current bottleneck I see with the current occupancy grid is not really the time to create the local grids (which could be improved with some of PCL's CUDA implementations, but it is constant), but the time to update the global occupancy grid map after a loop closure. With RTAB-Map's memory management disabled, these updates can create spikes over the real-time limit when continuously doing SLAM for a long time, as shown in Figure 18 of that paper (note that in that figure the local grids were 2D; with 3D local grids and OctoMap, the "Global Assembling Time" would have increased a lot faster).

How much AGX Orin's GPU can be leveraged in the code implementation ?

Currently we have GPU options that are more related to 2D features (with OpenCV CUDA, with more to come in that PR), not to point cloud processing.

@borongyuan cuPCL looks great for Jetson optimization, though the offering seems similar to what is already in PCL (which seems easier to integrate, as rtabmap already uses PCL a lot). I just stumbled on this page, where the author tried OpenVDB on the TUM RGB-D dataset. That could give an idea of how to use the library with similar sensors. Maybe another alternative: https://github.com/facontidavide/Bonxai

cheers,
Mathieu

@borongyuan
Contributor

There is an Octree implementation in PCL's gpu module, but I don't see any ICP- or NDT-related parts in the cuda and gpu modules. The way cuPCL is provided is indeed not very friendly.
I don't know why NVIDIA has provided so many duplicative and confusing libraries over the years. For example, when we want GPU acceleration for computer vision, we have OpenCV's CUDA module, NVIDIA's VisionWorks, VPI, and CVCUDA. Using PCL's and OpenCV's own CUDA modules is undoubtedly the most convenient. VisionWorks has been abandoned. I was trying to add VPI support, but then I noticed CVCUDA; they have many duplicate functions. A friend of mine told me that VPI is intended for edge devices, while CVCUDA targets servers. I don't even know which one NVIDIA wants developers to use, so I decided to wait and see.
Regarding OpenVDB, NVIDIA also developed GVDB before. Thank goodness it was abandoned too. Only NanoVDB remains, and it has been integrated into OpenVDB, so now we can study OpenVDB with peace of mind.

@naitiknakrani-eic

@matlabbe @borongyuan Thanks for all the responses.

Our detailed STM timing analysis (time for clustering / downsampling / voxel filtering / normal estimation, ...) has shown that the largest part of the time is taken by the search algorithm used for segmentation and clustering. We used radius search (the PCL GPU implementation) in an octree-based implementation.

This thread, #1045 (comment), describes a new approach to optimizing the segmentation process without using search algorithms, specifically avoiding computationally heavy functions like radius search or KNN.

So far we have been working on LiDAR-based SLAM only, so we haven't used any of the visual libraries. Our focus is only on PCL-based optimization.
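For context on why the radius search dominates: the brute-force form of Euclidean clustering re-scans the whole cloud for every accepted point, giving O(n²) behavior. This toy standalone sketch (not rtabmap's or PCL's code) shows the structure that an octree, or a search-free formulation, is meant to accelerate:

```cpp
#include <cstddef>
#include <vector>

struct Point { float x, y, z; };

// Brute-force Euclidean clustering: grow each cluster by repeatedly doing a
// radius search around every accepted point. Each radius query scans the
// whole cloud, so the overall cost is O(n^2); the inner loop is exactly
// what an octree-backed radius search replaces.
std::vector<std::vector<int>> euclideanClusters(const std::vector<Point>& cloud,
                                                float radius) {
    const float r2 = radius * radius;
    std::vector<bool> visited(cloud.size(), false);
    std::vector<std::vector<int>> clusters;
    for (std::size_t seed = 0; seed < cloud.size(); ++seed) {
        if (visited[seed]) continue;
        std::vector<int> cluster = {static_cast<int>(seed)};
        visited[seed] = true;
        for (std::size_t i = 0; i < cluster.size(); ++i) {
            const Point& p = cloud[cluster[i]];
            for (std::size_t j = 0; j < cloud.size(); ++j) { // radius search
                if (visited[j]) continue;
                float dx = cloud[j].x - p.x;
                float dy = cloud[j].y - p.y;
                float dz = cloud[j].z - p.z;
                if (dx * dx + dy * dy + dz * dz <= r2) {
                    visited[j] = true;
                    cluster.push_back(static_cast<int>(j));
                }
            }
        }
        clusters.push_back(cluster);
    }
    return clusters;
}
```

A spatial index drops each radius query from O(n) to roughly O(log n); the search-free idea referenced above avoids the query entirely, e.g. by exploiting the ordered structure of the 512x512 cloud.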

@matlabbe
Member

matlabbe commented Sep 7, 2024

Those optimizations could be great to reduce the "Local Occupancy Grid" time, in particular with sensors generating a lot of points at long range (e.g., an OS2-128 lidar). Another part of the STM time is compressing data to save to the database; I opened an issue the other day with a possible improvement: #1334 (nvCOMP).
