-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NIFI-9259: Adding GeohashRecord Processor #5476
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for contributing @mikayla-yang! The structure of the project, creating the nar, etc. all look good. Overall I think it's a good approach, but there are some conventions that we need to be sure that we're following, and we need to make sure that we are properly handling failure conditions, etc. Please see inline feedback.
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
} | ||
} | ||
|
||
if (updated) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is written, the destination relationship for ALL records will be either 'success' or 'failure' depending on whether or not the last Record in the FlowFile is successful. We should route the original FlowFile to 'failure' if any Record cannot be enriched, OR we should route each individual Record either to 'success' or 'failure'. Or we should simply enrich the records that we could enrich and skip others. I would lean toward providing a property to allow the user to decide. I can definitely envision a use case where users want to say "I expect all records to be enriched. Otherwise, something is wrong and I want the whole FlowFile to be failed." And I can also imagine other use cases where the user will say "Some of the Records have the latitude/longitude elements and some don't. If the elements are there, enrich them, otherwise skip them." So a property gives the user the freedom to configure it as necessary for their use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a ROUTING_STRATEGY property so that users can specify how to route the flowfile, which I believe covers all the user cases you mentioned:
-
Skip_Unenriched: "Some of the Records have the latitude/longitude elements and some don't. If the elements are there, enrich them, otherwise skip them."
-
Require_All_Enriched: “I expect all records to be enriched. Otherwise, something is wrong and I want the whole FlowFile to be failed."
-
Split_Enriched_Unenriched: split the records that are enriched from those are not. The flowfile including successfully enriched records will be sent to ‘success’, the flowfile including records that are unable to be enriched will be sent to ‘failure’, and the original flowfile will be sent to ‘original’.
Just one quick note on the original code here, the destination relationship for all records will be 'success' as long as one record has been updated, which is identical to the current Skip_Unenriched strategy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikayla-yang I'd like to recommend another alternative which I think would simplify a bit the approach here, yet still aligns with what @markap14 described. I think this can be similar to what was done in GeoEnrichIPRecord, by having a property that indicates whether or not processing should be "all or nothing" or to allow individual record processing. And instead of having the success or fail have a "found", "not found" and "original" relationships.
In the instance where it's all or nothing and we have one record that did not produce a geohash, the entire flow file would land on the "not found" relationship, otherwise it would land in "found". In the case where we'd support individual record processing, it would then ensure to split records into found or not found as needed. And it would always populate the original connection with the original payload no matter what option is selected.
What do you think of this? @markap14 does this align with the scenario you described? @mikayla-yang would this be something you could migrate to if @markap14 agrees it's a fit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @YolandaMDavis for your suggestion! I have posted two commits separately just to provide options. The last commit 97c5a5e is aligned with your comment: I tried to make it parallel to the GeoErichIPRecord as you suggested, and the only difference is our “all or nothing” strategy will route the flowfile to the not found relationship if any of the record is not enriched. The second to last commit c63b7e2 is aligned with the previous design, but I make sure we always route the original flowfiles to the original relationship under every routing strategy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution @mikayla-yang! I noted some minor syntax recommendations, as well as one larger potential refactor for input and output handling.
Based on recommendations from @markap14, it sounds like some adjustments will be necessary to route the input file to an original
relationship. Along that line, using a StreamCallback
to bring together the InputStream and OutputStream handling seems like it would be helpful.
One other note, as indicated in the Pull Request checklist, new dependencies should mentioned in a NOTICE
file under src/main/resources/META-INF
in the NAR module. The geohash dependency is licensed under ASL 2.0, which is great, but the inclusion should be noted in this module, and in the nifi-assembly NOTICE
.
nifi-nar-bundles/nifi-geohash-bundle/nifi-geohash-processors/pom.xml
Outdated
Show resolved
Hide resolved
...i-geohash-processors/src/test/java/org/apache/nifi/processors/geohash/GeohashRecordTest.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
@markap14 @exceptionfactory Thank you for the comments! I have made changes according to your reviews. @markap14 For handling the failure conditions, I made the change based on your suggestion and also referred to the logic of GeoEnrichIPRecord. I didn’t refactor the code to use the Map structure as we discussed because for the Require_All_Enriched routing strategy, we will not know which RecordSetWriter to use until we finish processing all records. However, using the Map structure requires us to know the final relationship every time we write a record in order to use a proper RecordSetWriter. LISCENCE and NOTICE files have been added in the NAR module. The NOTICE under nifi-assembly has been updated as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @mikayla-yang! I recommend adjusting the enum
values to use all capital letters following standard Java conventions. I will take an additional look through the updated implementation.
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
Test failure fixed and successfully built with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @mikayla-yang. I noted a few more detail recommendations.
nifi-nar-bundles/nifi-geohash-bundle/nifi-geohash-processors/pom.xml
Outdated
Show resolved
Hide resolved
...i-geohash-processors/src/test/java/org/apache/nifi/processors/geohash/GeohashRecordTest.java
Outdated
Show resolved
Hide resolved
...i-geohash-processors/src/test/java/org/apache/nifi/processors/geohash/GeohashRecordTest.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Show resolved
Hide resolved
.../main/resources/docs/org.apache.nifi.processors.geohash.GeohashRecord/additionalDetails.html
Outdated
Show resolved
Hide resolved
Thank you @exceptionfactory for the comments! I have made changes accordingly. |
Thanks for the updates @mikayla-yang! The latest version looks close to completion. With the recent release of NiFi 1.15.0, can you rebase and update the version references from |
Rebase and fixing the problem of routing unenriched empty flowfiles to success with SPLIT strategy in the upcoming commit... |
…onsistency with EL functions
…ship for every routing strategy
…nd 2)allow individual record processing
317a5ca
to
97c5a5e
Compare
bbee064
to
97c5a5e
Compare
nifi-geohash-nar | ||
Copyright 2021-2021 The Apache Software Foundation | ||
|
||
This product includes software developed at | ||
The Apache Software Foundation (http://www.apache.org/). | ||
|
||
****************** | ||
Apache Software License v2 | ||
****************** | ||
|
||
The following binary components are provided under the Apache Software License v2 | ||
|
||
(ASLv2) Geohash Java | ||
The following NOTICE information applies: | ||
Geohash Java | ||
Copyright 2016 Silvio Heuberger and contributors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not have any Apache licensed software going into the NOTICE file. We can remove this NOTICE file all together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mikayla-yang the code looks much better at this point. I mentioned a couple of very minor points about logging inline. And I think we have an extraneous NOTICE file. Otherwise the code looks pretty good. I'll do some testing of this tomorrow and if all looks good I'll be a +1.
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Outdated
Show resolved
Hide resolved
.../nifi-geohash-processors/src/main/java/org/apache/nifi/processors/geohash/GeohashRecord.java
Show resolved
Hide resolved
Thanks for all of your effort here @mikayla-yang ! All looks good to me now. And with a +1 from @exceptionfactory I've merged your changes to main! |
Thank you for submitting a contribution to Apache NiFi.
Please provide a short description of the PR here:
Description of PR
This PR adds a record-based processor that helps NiFi users encode latitude/longitude into Geohashes with desired format and precision, and decode Geohash values into latitude/longitude coordinates.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically
main
)?Is your initial contribution a single, squashed commit? Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not
squash
or use--force
when pushing to allow for clean monitoring of changes.For code changes:
mvn -Pcontrib-check clean install
at the rootnifi
folder?LICENSE
file, including the mainLICENSE
file undernifi-assembly
?NOTICE
file, including the mainNOTICE
file found undernifi-assembly
?.displayName
in addition to .name (programmatic access) for each of the new properties?For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check GitHub Actions CI for build issues and submit an update to your PR as soon as possible.