
HDDS-10568. When the ldb command is executed, it is output by line #7467

Open · wants to merge 7 commits into master
Conversation

@jianghuazhu (Contributor) commented Nov 21, 2024

What changes were proposed in this pull request?

When executing the ldb command, specify the maximum number of records to print for each file.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10568

How was this patch tested?

CI:
https://github.com/jianghuazhu/ozone/actions/runs/11955271018
Test command:
./bin/ozone debug ldb --db=/home/hadoop/jhz/test/om.db scan --column_family=bucketTable --out=/data12/test/tmp_file1 --max-records-per-file=300
--max-records-per-file specifies the maximum number of records to print per file. It is used together with --out.
result:

-rw-rw-r-- 1 hadoop hadoop 140538 Nov 21 22:02 0
-rw-rw-r-- 1 hadoop hadoop 136737 Nov 21 22:02 1
-rw-rw-r-- 1 hadoop hadoop 136741 Nov 21 22:02 2
-rw-rw-r-- 1 hadoop hadoop 139301 Nov 21 22:02 3
-rw-rw-r-- 1 hadoop hadoop 130741 Nov 21 22:02 4
-rw-rw-r-- 1 hadoop hadoop 130737 Nov 21 22:02 5
-rw-rw-r-- 1 hadoop hadoop 114272 Nov 21 22:02 6

or
./bin/ozone debug ldb --db=/home/hadoop/jhz/test/om.db scan --column_family=bucketTable --limit=3 --out=/data12/test/tmp_file2 --max-records-per-file=300
--max-records-per-file can also be used with --limit.
result:

-rw-rw-r-- 1 hadoop hadoop 614 Nov 21 22:03 0

@jianghuazhu (Contributor, Author)

@adoroszlai @xichen01 @errose28, can you take a look?
Thanks.

@xichen01 (Contributor) left a comment

@jianghuazhu Thank you for the improvement; a few comments for your reference.

// If there are no parent directories, create them
File dirFile = new File(fileName);
if (!dirFile.exists()) {
boolean flg = dirFile.mkdirs();
If we do not pass --max-records-per-file, we need not create the directory; in that scenario, fileName should be a file.

@@ -240,11 +250,28 @@ private boolean displayTable(ManagedRocksIterator iterator,
return displayTable(iterator, dbColumnFamilyDef, out(), schemaV3);
}

// If there are no parent directories, create them
File dirFile = new File(fileName);
How about just adding a suffix (like x.0, x.1, ...) to the output file, instead of creating a directory? Just like the previous design of this PR.
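For illustration, a minimal sketch of the suffix-based splitting suggested here (the class name and method signature are hypothetical, not the PR's actual code): write at most maxRecordsPerFile records per part, naming the parts out.0, out.1, and so on.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class SuffixedWriter {

  /**
   * Writes records to fileName.0, fileName.1, ... with at most
   * maxRecordsPerFile records per part, and returns the part count.
   */
  static int writeParts(String fileName, List<String> records,
      int maxRecordsPerFile) throws IOException {
    int part = 0;
    for (int i = 0; i < records.size(); i += maxRecordsPerFile) {
      int end = Math.min(i + maxRecordsPerFile, records.size());
      // Each part gets the base name plus a numeric suffix, so no
      // directory needs to be created for the split output.
      try (Writer w = Files.newBufferedWriter(Paths.get(fileName + "." + part))) {
        for (String record : records.subList(i, end)) {
          w.write(record);
          w.write(System.lineSeparator());
        }
      }
      part++;
    }
    return part;
  }
}
```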

int exitCode1 = cmd.execute(completeScanArgs1.toArray(new String[0]));
assertEquals(0, exitCode1);
assertTrue(tmpDir1.isDirectory());
assertEquals(3, tmpDir1.listFiles().length);
We can replace this hard-coded value with Math.ceil(records_count / max_records_per_file); we can add a records_count parameter to the prepareTable method.
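The suggested computation can be sketched as a tiny helper (the name is hypothetical); note the cast to double before the division, since integer division would truncate instead of rounding up.

```java
public class ExpectedParts {

  /** Expected number of output files: ceil(recordsCount / maxRecordsPerFile). */
  static int expectedFiles(int recordsCount, int maxRecordsPerFile) {
    return (int) Math.ceil((double) recordsCount / maxRecordsPerFile);
  }
}
```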

File tmpDir1 = new File(scanDir1);
tmpDir1.deleteOnExit();

int exitCode1 = cmd.execute(completeScanArgs1.toArray(new String[0]));
We can add an assert to confirm that the output file is in valid JSON format.

@jianghuazhu (Contributor, Author)

Thanks @xichen01 for the comments and review.
I have updated it.
CI:
https://github.com/jianghuazhu/ozone/actions/runs/11995657782

Test 1:
./bin/ozone debug ldb --db=/home/hadoop/jhz/test/om.db scan --column_family=bucketTable --out=/data12/test/bucket_records --max-records-per-file=300
Result:

-rw-rw-r-- 1 hadoop hadoop 140538 Nov 24 17:20 bucket_records.0
-rw-rw-r-- 1 hadoop hadoop 136737 Nov 24 17:20 bucket_records.1
-rw-rw-r-- 1 hadoop hadoop 136741 Nov 24 17:20 bucket_records.2
-rw-rw-r-- 1 hadoop hadoop 139301 Nov 24 17:20 bucket_records.3
-rw-rw-r-- 1 hadoop hadoop 130741 Nov 24 17:20 bucket_records.4
-rw-rw-r-- 1 hadoop hadoop 130737 Nov 24 17:20 bucket_records.5
-rw-rw-r-- 1 hadoop hadoop 114272 Nov 24 17:20 bucket_records.6

Test 2:
./bin/ozone debug ldb --db=/home/hadoop/jhz/test/om.db scan --column_family=bucketTable --limit=3 --out=/data12/test/bucket_records --max-records-per-file=300
Result:

-rw-rw-r-- 1 hadoop hadoop 1855 Nov 24 17:24 bucket_records.0

Test 3:
./bin/ozone debug ldb --db=/home/hadoop/jhz/test/om.db scan --column_family=bucketTable --limit=2 --out=/data12/test/tmp_file14
Result:

-rw-rw-r-- 1 hadoop hadoop 1239 Nov 24 17:26 tmp_file14

@@ -240,11 +250,31 @@ private boolean displayTable(ManagedRocksIterator iterator,
return displayTable(iterator, dbColumnFamilyDef, out(), schemaV3);
}

// If there are no parent directories, create them
if (recordsPerFile > 0) {
We can automatically create the parent directory for both recordsPerFile > 0 and recordsPerFile == 0; currently, when --max-records-per-file is not given, the parent directory is not created automatically.
Maybe we just need to remove the if.
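A minimal sketch of the unconditional parent-directory creation suggested here (class and method names are hypothetical): derive the parent from the output path and create it whether or not the records-per-file option was given.

```java
import java.io.File;

public class EnsureParent {

  /**
   * Creates the parent directories of fileName if they are missing.
   * Returns true if the parent exists (or was created) afterwards.
   */
  static boolean ensureParentDirs(String fileName) {
    File parent = new File(fileName).getParentFile();
    // A null parent means the path has no directory component,
    // so there is nothing to create.
    return parent == null || parent.isDirectory() || parent.mkdirs();
  }
}
```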

@@ -384,7 +435,7 @@ private void assertContents(Map<String, ?> expected, String actualStr)
* @param tableName table name
* @param schemaV3 set to true for SchemaV3. applicable to block_data table
*/
private void prepareTable(String tableName, boolean schemaV3)
private void prepareTable(String tableName, boolean schemaV3, int... recordsCount)
I think just void prepareTable(String tableName, boolean schemaV3, int recordsCount) is OK; we can pass 5 from the other methods (like prepareTable(KEY_TABLE, true, 5)).

@jianghuazhu (Contributor, Author)

When constructing the BLOCK_DATA dataset, containerCount and blockCount are needed. Can they be shared with recordsCount? What do you think? @xichen01

final int containerCount = 2;
final int blockCount = 2;
int blockId = 1;
for (int cid = 1; cid <= containerCount; cid++) {
  for (int blockIdx = 1; blockIdx <= blockCount; blockIdx++, blockId++) {
    ......
  }
}

@xichen01 (Contributor)

I think you can extract the KEY_TABLE construction code into an independent method, which could be named prepareKeyTable.

@jianghuazhu (Contributor, Author)

Thanks @xichen01.
I have updated it.

Comment on lines 22 to 23
import com.google.gson.JsonElement;
import com.google.gson.JsonParser;
@Tejaskriya (Contributor)

We want to use Jackson instead of Gson due to some issues with Gson on Java 17+. Refer to this JIRA: https://issues.apache.org/jira/browse/HDDS-10538.
Could you please implement the logic with Jackson instead?
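A hedged sketch of a Jackson-based replacement (the class name is hypothetical, and it assumes jackson-databind is on the classpath): ObjectMapper.readTree both parses and validates, throwing on malformed input, so it can also back the JSON-format assertion suggested earlier.

```java
import java.io.IOException;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonCheck {

  private static final ObjectMapper MAPPER = new ObjectMapper();

  /** Parses content as JSON, throwing IOException if it is malformed. */
  static JsonNode parse(String content) throws IOException {
    return MAPPER.readTree(content);
  }
}
```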

@jianghuazhu (Contributor, Author)

Thanks @Tejaskriya for the comment and review.
I have updated it.
