
HDDS-10568. When the ldb command is executed, it is output by line #7467

Open · wants to merge 7 commits into master
Conversation

@jianghuazhu (Contributor) commented Nov 21, 2024

What changes were proposed in this pull request?

When executing the ldb command, specify the maximum number of records to print for each file.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10568

How was this patch tested?

CI:
https://github.com/jianghuazhu/ozone/actions/runs/11955271018
Test command:
./bin/ozone debug ldb --db=/home/hadoop/jhz/test/om.db scan --column_family=bucketTable --out=/data12/test/tmp_file1 --max-records-per-file=300
--max-records-per-file specifies the maximum number of records to print per file. It is used together with --out.
result:

-rw-rw-r-- 1 hadoop hadoop 140538 Nov 21 22:02 0
-rw-rw-r-- 1 hadoop hadoop 136737 Nov 21 22:02 1
-rw-rw-r-- 1 hadoop hadoop 136741 Nov 21 22:02 2
-rw-rw-r-- 1 hadoop hadoop 139301 Nov 21 22:02 3
-rw-rw-r-- 1 hadoop hadoop 130741 Nov 21 22:02 4
-rw-rw-r-- 1 hadoop hadoop 130737 Nov 21 22:02 5
-rw-rw-r-- 1 hadoop hadoop 114272 Nov 21 22:02 6

or
./bin/ozone debug ldb --db=/home/hadoop/jhz/test/om.db scan --column_family=bucketTable --limit=3 --out=/data12/test/tmp_file2 --max-records-per-file=300
--max-records-per-file can also be used with --limit.
result:

-rw-rw-r-- 1 hadoop hadoop 614 Nov 21 22:03 0

@jianghuazhu (Contributor, Author)

@adoroszlai @xichen01 @errose28, can you take a look?
Thanks.

@xichen01 (Contributor) left a comment

@jianghuazhu Thank you for the improvement; a few comments for your reference.

// If there are no parent directories, create them
File dirFile = new File(fileName);
if (!dirFile.exists()) {
boolean flg = dirFile.mkdirs();
If we do not pass --max-records-per-file, we need not create the directory; in that scenario, fileName should be a file.

@@ -240,11 +250,28 @@ private boolean displayTable(ManagedRocksIterator iterator,
return displayTable(iterator, dbColumnFamilyDef, out(), schemaV3);
}

// If there are no parent directories, create them
File dirFile = new File(fileName);
How about just adding a suffix (like x.0, x.1, ...) to the output file, instead of creating a directory? Just like the previous design of this PR.
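For illustration, a minimal sketch of the suffix-based splitting suggested here (the class name and method signature are hypothetical, not the PR's actual code): write at most maxRecordsPerFile records per part, naming the parts out.0, out.1, and so on.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class SuffixedWriter {

  /**
   * Writes records to fileName.0, fileName.1, ... with at most
   * maxRecordsPerFile records per part, and returns the part count.
   */
  static int writeParts(String fileName, List<String> records,
      int maxRecordsPerFile) throws IOException {
    int part = 0;
    for (int i = 0; i < records.size(); i += maxRecordsPerFile) {
      int end = Math.min(i + maxRecordsPerFile, records.size());
      // Each part gets the base name plus a numeric suffix, so no
      // directory needs to be created for the split output.
      try (Writer w = Files.newBufferedWriter(Paths.get(fileName + "." + part))) {
        for (String record : records.subList(i, end)) {
          w.write(record);
          w.write(System.lineSeparator());
        }
      }
      part++;
    }
    return part;
  }
}
```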

int exitCode1 = cmd.execute(completeScanArgs1.toArray(new String[0]));
assertEquals(0, exitCode1);
assertTrue(tmpDir1.isDirectory());
assertEquals(3, tmpDir1.listFiles().length);
We can replace this hard-coded value with Math.ceil(records_count / max_records_per_file); we can add a records_count parameter to the prepareTable method.
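The suggested computation can be sketched as a tiny helper (the name is hypothetical); note the cast to double before the division, since integer division would truncate instead of rounding up.

```java
public class ExpectedParts {

  /** Expected number of output files: ceil(recordsCount / maxRecordsPerFile). */
  static int expectedFiles(int recordsCount, int maxRecordsPerFile) {
    return (int) Math.ceil((double) recordsCount / maxRecordsPerFile);
  }
}
```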

File tmpDir1 = new File(scanDir1);
tmpDir1.deleteOnExit();

int exitCode1 = cmd.execute(completeScanArgs1.toArray(new String[0]));
We can add an assert to confirm that the output file is in valid JSON format.

@jianghuazhu (Contributor, Author)

Thanks @xichen01 for the comments and review.
I have updated it.
CI:
https://github.com/jianghuazhu/ozone/actions/runs/11995657782

Test 1:
./bin/ozone debug ldb --db=/home/hadoop/jhz/test/om.db scan --column_family=bucketTable --out=/data12/test/bucket_records --max-records-per-file=300
Result:

-rw-rw-r-- 1 hadoop hadoop 140538 Nov 24 17:20 bucket_records.0
-rw-rw-r-- 1 hadoop hadoop 136737 Nov 24 17:20 bucket_records.1
-rw-rw-r-- 1 hadoop hadoop 136741 Nov 24 17:20 bucket_records.2
-rw-rw-r-- 1 hadoop hadoop 139301 Nov 24 17:20 bucket_records.3
-rw-rw-r-- 1 hadoop hadoop 130741 Nov 24 17:20 bucket_records.4
-rw-rw-r-- 1 hadoop hadoop 130737 Nov 24 17:20 bucket_records.5
-rw-rw-r-- 1 hadoop hadoop 114272 Nov 24 17:20 bucket_records.6

Test 2:
./bin/ozone debug ldb --db=/home/hadoop/jhz/test/om.db scan --column_family=bucketTable --limit=3 --out=/data12/test/bucket_records --max-records-per-file=300
Result:

-rw-rw-r-- 1 hadoop hadoop 1855 Nov 24 17:24 bucket_records.0

Test 3:
./bin/ozone debug ldb --db=/home/hadoop/jhz/test/om.db scan --column_family=bucketTable --limit=2 --out=/data12/test/tmp_file14
Result:

-rw-rw-r-- 1 hadoop hadoop 1239 Nov 24 17:26 tmp_file14

@@ -240,11 +250,31 @@ private boolean displayTable(ManagedRocksIterator iterator,
return displayTable(iterator, dbColumnFamilyDef, out(), schemaV3);
}

// If there are no parent directories, create them
if (recordsPerFile > 0) {
We can automatically create the parent directory for both recordsPerFile > 0 and recordsPerFile == 0; currently, when --max-records-per-file is not given, the parent directory is not created automatically.
Maybe we just need to remove the if.
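A minimal sketch of the unconditional parent-directory creation suggested here (class and method names are hypothetical): derive the parent from the output path and create it whether or not the records-per-file option was given.

```java
import java.io.File;

public class EnsureParent {

  /**
   * Creates the parent directories of fileName if they are missing.
   * Returns true if the parent exists (or was created) afterwards.
   */
  static boolean ensureParentDirs(String fileName) {
    File parent = new File(fileName).getParentFile();
    // A null parent means the path has no directory component,
    // so there is nothing to create.
    return parent == null || parent.isDirectory() || parent.mkdirs();
  }
}
```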

@@ -384,7 +435,7 @@ private void assertContents(Map<String, ?> expected, String actualStr)
* @param tableName table name
* @param schemaV3 set to true for SchemaV3. applicable to block_data table
*/
private void prepareTable(String tableName, boolean schemaV3)
private void prepareTable(String tableName, boolean schemaV3, int... recordsCount)
I think just void prepareTable(String tableName, boolean schemaV3, int recordsCount) is OK; we can pass 5 from the other methods (like prepareTable(KEY_TABLE, true, 5)).

@jianghuazhu (Contributor, Author)

When constructing the BLOCK_DATA dataset, containerCount and blockCount are needed. Can they be shared with recordsCount? What do you think? @xichen01

final int containerCount = 2;
final int blockCount = 2;
int blockId = 1;
for (int cid = 1; cid <= containerCount; cid++) {
  for (int blockIdx = 1; blockIdx <= blockCount; blockIdx++, blockId++) {
    ......
  }
}

@xichen01 (Contributor)

I think you can extract the KEY_TABLE construction code into an independent method, which could be named prepareKeyTable.

@jianghuazhu (Contributor, Author)

Thanks @xichen01.
I have updated it.

Comment on lines 22 to 23
import com.google.gson.JsonElement;
import com.google.gson.JsonParser;
@Tejaskriya (Contributor)

We want to use Jackson instead of Gson due to some issues with Gson on Java 17+. Refer to this JIRA: https://issues.apache.org/jira/browse/HDDS-10538.
Could you please implement the logic with Jackson instead?
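A hedged sketch of a Jackson-based replacement (the class name is hypothetical, and it assumes jackson-databind is on the classpath): ObjectMapper.readTree both parses and validates, throwing on malformed input, so it can also back the JSON-format assertion suggested earlier.

```java
import java.io.IOException;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonCheck {

  private static final ObjectMapper MAPPER = new ObjectMapper();

  /** Parses content as JSON, throwing IOException if it is malformed. */
  static JsonNode parse(String content) throws IOException {
    return MAPPER.readTree(content);
  }
}
```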

@jianghuazhu (Contributor, Author)

Thanks @Tejaskriya for the comment and review.
I have updated it.
