Refine ScanCode.io d2d pipeline for Java #1404

Open
pombredanne opened this issue Oct 17, 2024 · 1 comment

pombredanne commented Oct 17, 2024

Update the pipeline steps for the binary-to-source analysis of Java to use strings and symbols.

The current implementation matches .java and .class files using paths, the classpath, Java packages and compiler conventions. There are cases where these techniques will not give a correct match. For instance, the .class code may not be compiled from Java at all, but generated directly as bytecode with the ASM library or similar bytecode engineering. This is common with Hibernate and other data frameworks, and with SOAP or web services tools that generate code from @ annotations or XML documents.
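
As an illustration of the "compiler conventions" part, here is a minimal sketch of mapping a .class path back to the .java path it would conventionally come from. This is not the actual ScanCode.io code; the function name `candidate_source_path` is hypothetical, and it assumes the usual javac layout where nested and anonymous classes are emitted as `Outer$Inner.class` or `Outer$1.class`.

```python
from pathlib import PurePosixPath


def candidate_source_path(class_path: str) -> str:
    """Return the .java path that conventionally compiles to ``class_path``.

    Hypothetical helper for illustration only: it relies on the javac
    convention that nested classes are written as Outer$Inner.class.
    """
    path = PurePosixPath(class_path)
    if path.suffix != ".class":
        raise ValueError(f"not a .class file: {class_path}")
    # Keep only the top-level class name: "Bar$Inner" -> "Bar".
    # Caveat: "$" is legal in Java identifiers, so this is a heuristic.
    top_level = path.stem.split("$", 1)[0]
    return str(path.with_name(top_level + ".java"))
```

A .class file generated directly as bytecode has no such .java counterpart, which is exactly the case where this kind of path mapping cannot find a match.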
The approach here would be to:

  • Collect source symbols with the "purl2sym" collect_symbols* pipelines, or with custom processing for XML
  • Collect symbols from the binaries, using either lief or binary strings as could be collected in the scancode-toolkit (we are missing a plugin)
  • Match the source symbols to the binary symbols, sort by the number of matches, and report the correct matches to create a relation between a source and a binary
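
The steps above could be sketched roughly as follows. This is only a rough outline under stated assumptions, not the pipeline implementation: `extract_strings` and `rank_source_matches` are hypothetical names, the string extraction is a naive scan for printable ASCII runs (a real step would more likely parse the class file or use a dedicated tool), and source symbols are assumed to be already collected as a mapping of source path to symbol names.

```python
import re


def extract_strings(data: bytes, min_length: int = 4) -> set[str]:
    """Collect runs of printable ASCII from raw binary content.

    Naive stand-in for a real binary strings collector (assumption).
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return {match.decode("ascii") for match in pattern.findall(data)}


def rank_source_matches(binary_symbols, symbols_by_source):
    """Rank source files by how many symbols they share with a binary.

    ``symbols_by_source`` maps a source path to its collected symbols.
    Returns (source_path, match_count) pairs, most matches first;
    sources with no symbol in common are left out.
    """
    binary_set = set(binary_symbols)
    ranked = []
    for source_path, source_symbols in symbols_by_source.items():
        common = binary_set & set(source_symbols)
        if common:
            ranked.append((source_path, len(common)))
    # Sort by descending match count, then by path for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

The top-ranked source for each binary would then be the candidate for a source-to-binary relation. Where to put a minimum match threshold is left open here.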

pombredanne converted this from a draft issue Oct 17, 2024

pombredanne commented

This PR is a step in that direction:
aboutcode-org/purldb#538

pombredanne changed the title from "Refine ScanCode.io pipeline for Java" to "Refine ScanCode.io d2d pipeline for Java" on Dec 2, 2024