Yes, adding more nodes with more partitions will help.
Hi! I'm tuning Nebula Graph for a scenario with a very high ingestion rate. Could you help me understand the bottlenecks in my experimental setup?
The setup can't insert more than about 1 million entries (vertices and edges combined) per second in total, or roughly 300k entries per graphd process. Yet CPU, disk I/O, and network I/O look underutilized: CPU utilization is around 60%, disk writes are under 5 MB/s, and reads are around 150 MB/s.
RAM is fully utilized, mostly by the file cache, even though I configured frequent flushes following the recommendations for Ceph, which also uses RocksDB: https://ceph.io/en/news/blog/2022/rocksdb-tuning-deep-dive/. It seems that Nebula uses and tracks all available RAM through its memory tracker.
Our setup consists of 3 containerized nodes, each running all three services: graphd, metad, and storaged. I've allocated most of the memory to graphd (0.7), leaving 0.2 (24 GB) for storaged.
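For context, here is what those fractions imply per node. This is a back-of-the-envelope sketch that assumes both fractions are shares of the same per-node memory pool and that "0.2 (24 GB)" pins the pool size; `memory_budget` is a hypothetical helper, not a Nebula setting.

```python
def memory_budget(storaged_gb: float, storaged_fraction: float, graphd_fraction: float) -> dict:
    """Back out the per-node memory pool from the storaged share, then split it.

    Assumption: both fractions refer to the same per-node memory pool.
    """
    total_gb = storaged_gb / storaged_fraction
    graphd_gb = total_gb * graphd_fraction
    # Whatever is left has to cover metad, the OS, and the file cache.
    leftover_gb = total_gb - graphd_gb - storaged_gb
    return {"total_gb": total_gb, "graphd_gb": graphd_gb, "leftover_gb": leftover_gb}


budget = memory_budget(storaged_gb=24, storaged_fraction=0.2, graphd_fraction=0.7)
print({k: round(v, 1) for k, v in budget.items()})
```

Under that assumption each node has about 120 GB, of which graphd gets about 84 GB and only about 12 GB remains for metad, the OS, and the file cache that RocksDB reads go through.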
Resource allocations are:
Configurations:
metad:
graphd:
storaged:
Here's the current load setup:
The graph space has 3x replication and 60 partitions.
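To see why more nodes should help with this space layout, here is a rough sketch of per-host replica load. It assumes partitions and Raft leaders are perfectly balanced across storage hosts, which real placement may not be; `per_node_load` is a hypothetical helper.

```python
def per_node_load(partitions: int, replica_factor: int, nodes: int) -> dict:
    """Evenly balanced replica/leader counts per storage host (an idealized assumption)."""
    if replica_factor > nodes:
        # Each replica of a partition must live on a distinct host.
        raise ValueError("replica_factor cannot exceed the number of storage hosts")
    return {
        # Total replicas in the space, spread evenly over hosts.
        "replicas_per_node": partitions * replica_factor / nodes,
        # Raft leaders per host, if leadership is balanced.
        "leaders_per_node": partitions / nodes,
        # Fraction of all inserted entries each host must apply locally.
        "write_share": replica_factor / nodes,
    }


for n in (3, 4, 6):
    print(n, per_node_load(60, 3, n))
```

With 3 hosts and 3x replication, every storaged applies every write (`write_share` of 1.0). At 4 hosts that drops to 0.75, and at 6 hosts to 0.5, so adding storage hosts is the main lever for spreading write work.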
About 40 loader processes run, each with 3 threads, and each thread writes to a separate graphd. The loader reads data from disk and builds batched INSERT requests. Threads send requests in parallel, but a semaphore caps the number of in-flight requests.
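For reference, the loader pattern described here (batching plus a semaphore-capped in-flight window) can be sketched like this. It swaps the threads for asyncio coroutines for brevity. `send_batch` is a hypothetical stand-in for whatever client call actually executes the INSERT, and the round-robin host choice only approximates "one thread per graphd".

```python
import asyncio
from itertools import islice
from typing import Awaitable, Callable, Iterable, Iterator, List, Sequence

BATCH_SIZE = 1000   # entries per INSERT request, as in the post
MAX_INFLIGHT = 24   # per-process cap on outstanding requests, as in the post

# Hypothetical signature for the client call that executes one batched INSERT.
SendBatch = Callable[[str, List[str]], Awaitable[None]]


def batches(entries: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` entries."""
    it = iter(entries)
    while chunk := list(islice(it, size)):
        yield chunk


async def load(entries: Iterable[str], graphd_hosts: Sequence[str], send_batch: SendBatch,
               batch_size: int = BATCH_SIZE, max_inflight: int = MAX_INFLIGHT) -> int:
    """Send all entries in batches, never exceeding `max_inflight` outstanding requests."""
    sem = asyncio.Semaphore(max_inflight)
    pending = set()
    errors: List[BaseException] = []

    async def guarded(host: str, batch: List[str]) -> None:
        try:
            await send_batch(host, batch)
        except Exception as exc:  # record failures; a real loader would retry
            errors.append(exc)
        finally:
            sem.release()  # free the slot whether the request succeeded or failed

    sent = 0
    for i, batch in enumerate(batches(entries, batch_size)):
        await sem.acquire()  # the reader blocks here once the window is full
        host = graphd_hosts[i % len(graphd_hosts)]  # round-robin over graphd instances
        task = asyncio.create_task(guarded(host, batch))
        pending.add(task)
        task.add_done_callback(pending.discard)  # drop finished tasks to bound memory
        sent += 1
    await asyncio.gather(*pending)
    if errors:
        raise errors[0]
    return sent
```

The semaphore is acquired before a batch is handed off, so disk reading naturally stalls when the servers fall behind instead of queuing unbounded work in the client.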
Right now, each batch contains 1000 entries (vertices and edges combined), and I limit in-flight requests to 24 per process, so the total in-flight is capped at about 1k (40 × 24 = 960). With more in-flight requests, either the Raft buffer overflows or graphd starts rejecting requests with GraphMemoryLimitExceeded.
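Little's law (average number in flight = throughput × time in system) lets you back out the per-request latency that this window implies. This sketch assumes the roughly 960-request window stays saturated at the observed ~1M entries/s; `implied_latency_s` is a hypothetical helper.

```python
def implied_latency_s(inflight_requests: int, entries_per_request: int, entries_per_s: float) -> float:
    """Little's law, L = lambda * W, solved for W (seconds each request spends in flight)."""
    requests_per_s = entries_per_s / entries_per_request  # lambda, in requests per second
    return inflight_requests / requests_per_s             # W = L / lambda


inflight = 40 * 24  # loader processes x per-process semaphore limit
print(implied_latency_s(inflight, 1000, 1_000_000))
```

If that figure of roughly one second per batch holds, throughput can only grow by cutting per-request latency or by widening the window. Widening the window is exactly what triggers the Raft buffer overflow and GraphMemoryLimitExceeded. That suggests the limit is per-request service time on the server side rather than the loader itself.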
Can Nebula handle more load with 4 or more nodes? Would it make sense to run the services in separate containers? Or am I missing an obvious bottleneck that could be resolved simply by adding resources to each node?
Thank you in advance for any advice or suggestions!