Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Tests for ISO compliance #499

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jagill
Copy link

@jagill jagill commented Nov 27, 2019

This PR is meant to start a discussion about where JTS is and isn't
confirming to the ISO spec. These test cases are meant to explore the
"basic" scenarios thoroughly. They aren't meant to test numerical
stability or complex cases, but are meant to test conditions like
nulls and NaNs.

The tests were designed to pass in JTS. Where I've noticed a
departure from the ISO spec, or unexpected behavior, I've put an
"XXX" comment.

This PR is meant to start a discussion about where JTS is and isn't
confirming to the ISO spec.  These test cases are meant to explore the
"basic" scenarios thoroughly.  They aren't meant to test numerical
stability or cpmplex cases, but are meant to test conditions like
nulls and NaNs.

The tests were designed to pass in JTS.  Where I've noticed a
departure from the ISO spec, or unexpected behavior, I've put an
"XXX" comment.
@dr-jts
Copy link
Contributor

dr-jts commented Nov 27, 2019

Can you summarize the discrepancies with ISO in a list? Either here or in the unit test, or both.

@jagill
Copy link
Author

jagill commented Nov 27, 2019

Summarizing some thoughts and incompatibilities:

  1. ISO does not state if !valid -> !simple. JTS has many times where something is not valid but is simple. This seems fine, but there should maybe be a clear indicator that the user should check for both.
  2. ISO is a little unclear if non-valid geometries should throw an error on construction, or just return isValid == false. JTS also treats some issues as construction errors, and some to be reported by isValid.
  3. ISO only specifies validity for Polygons, and not Simplicity. The validity checks are the expensive simplicity checks of the loops, plus various containment/intersection conditions. In JTS, sometimes bad polygons are simple, and sometimes they are not. Might be worth thinking about what polygon.isSimple() should return.

More specific comments:

  1. ISO States that any geometryCollection (which includes MultiPoint, MultiLineString, and MultiPolygon) with a null or empty geometry is invalid. JTS raises an IllegalArgumentException for null geometries, but allows constructing with empty geometries and does not report them as invalid.
  2. According to ISO, creating a LineString/MultiGeometry with a null array should be invalid. JTS converts that to an empty array, which seems reasonable.
  3. LineStrings and Polygons constructed with a null coordinate do not fail construction, but throw an NPE for isValid and isSimple. It seems like it should either fail on constriction, or successfully return false for isValid.
  4. According to ISO. LineStrings with repeated consecutive points are invalid; JTS reports them as valid (and in some cases elides the consecutive points). I'm not sure if any algorithms would have trouble with repeated consecutive points.
  5. Polygons and Multipolygons with a NaN in their coordinates return isValid == false, but fail with an AssertionFailedException when calling isSimple. Should this fail more gracefully?
  6. According to ISO, MultiGeometries with invalid geometries are also invalid. In JTS, a MultiPoint with invalid points (ie, containing NaN) are valid.
  7. MultiPolygons whose polygons touch only on the boundaries are valid (and simple). JTS has them invalid but simple. [Edit -- correction below]
  8. MultiPolygons with polygons whose interiors intersect are valid but not simple. JTS has them invalid but simple. [Edit -- correction below]
  9. ISO has GeometryCollections with geometries whose interiors intersect are valid but not simple. JTS has them valid and simple. [Edit -- correction below]

@dr-jts
Copy link
Contributor

dr-jts commented Nov 27, 2019

MultiPolygons with polygons whose interiors intersect are valid but not simple. JTS has them invalid but simple.

Really? So there is no semantic meaning for an ISO MultiPolygon over a GeometryCollection of Polygons?

MultiPolygons whose polygons touch only on the boundaries are valid (and simple). JTS has them invalid but simple.

In OGC elements of a MultiPolygon can touch at points but not in lines or areas. JTS implements this semantics

@dr-jts
Copy link
Contributor

dr-jts commented Nov 27, 2019

A general comment: The key point is that JTS implements the OGC semantics. So it would be more appropriate to contrast the OGC semantics with ISO semantics. Then if necessary identify anywhere that JTS does not implement OGC semantics.

I suggest splitting the tests into two files, one for JTS-only things and one for discrepancies between OGC and ISO.

It doesn't look possible to meet all of the ISO semantics while keeping OGC compatibility. And a few of the ISO semantics (the ones involving MultiPolygons) look very questionable, since they seem to imply that MultiPolygons have none or very few topological constraints.

The issues around whether geometries containing nulls/NaNs etc are probably addressable. I guess those are not OGC semantics per se, but purely JTS.

@jagill
Copy link
Author

jagill commented Dec 9, 2019

I re-checked the ISO standard, and you're right about MultiPolygon intersections. Here is a more careful reading:

  1. All MultiGeometries (including GeometryCollections) fail on construction if any component is null or empty. All statements below assume construction succeeds.
  2. Excluding subclass restrictions, GeometryCollections are valid if all components are valid. They are simple if all components are simple, and no two components have interiors that intersect.
  3. Additionally, MultiSurfaces (hence MultiPolygons) fail on construction if any two components intersect at more than a finite number of points. If they succeed construction, they are valid/simple if their components are valid/simple.

One challenge with the conforming to OGC is that OGC doesn't define IsValid at all, so it's unclear what that function even means. As you pointed out before, OGC is vague on many edge cases, but maybe these tests case would let us make a de-facto spec for those and then bring them to the OGC committee.

One advantage of ISO is the SQL spec in general is in ISO standard, so the ISO simple geometry sql spec is natural to implement for databases. However, the requirement to pay is an unfortunate hurdle.

ISO leaves some interpretation for those conditions that fail on construction, and those conditions for validity. It seems unreasonable to require containment/intersection checks on a MultiGeometry on construction -- those are very expensive operations, and I like JTS's philosophy of allowing serialization/deserialization of problematic geometries, since we don't know what the user is doing with them or how much they care.

@dr-jts
Copy link
Contributor

dr-jts commented Dec 9, 2019

I re-checked the ISO standard, and you're right about MultiPolygon intersections. Here is a more careful reading:

  1. All MultiGeometries (including GeometryCollections) fail on construction if any component is null or empty. All statements below assume construction succeeds.
  2. Excluding subclass restrictions, GeometryCollections are valid if all components are valid. They are simple if all components are simple, and no two components have interiors that intersect.
  3. Additionally, MultiSurfaces (hence MultiPolygons) fail on construction if any two components intersect at more than a finite number of points. If they succeed construction, they are valid/simple if their components are valid/simple.

Sounds very similar to OGC.

One challenge with the conforming to OGC is that OGC doesn't define IsValid at all, so it's unclear what that function even means.

The semantics of IsValid are presumably defined by the definition of geometry validity. This is reasonably well specified in the OGC SFS.

As you pointed out before, OGC is vague on many edge cases, but maybe these tests case would let us make a de-facto spec for those and then bring them to the OGC committee.

By all means. Although I suspect a textual/mathematical definition of the additional semantics will be required as well (and will be more concise and clear than the tests).

@dr-jts
Copy link
Contributor

dr-jts commented Dec 9, 2019

ISO leaves some interpretation for those conditions that fail on construction, and those conditions for validity. It seems unreasonable to require containment/intersection checks on a MultiGeometry on construction -- those are very expensive operations, and I like JTS's philosophy of allowing serialization/deserialization of problematic geometries, since we don't know what the user is doing with them or how much they care.

So sounds like the current JTS construction semantics are fine. So just need to tighten up the validity checks perhaps? And that only in the area of checking for empty/null components?

Also perhaps extend the definition of IsSimple, which should be fine. The concept of simplicity is actually a good way to expose some powerful topology checks, especially when combined with the JTS idea of generalizing the notion of "boundary node rule". For instance, this allows IsSimple to test whether a linear network is topologically valid (no interior intersections)

@FObermaier
Copy link
Contributor

Out of curiosity, does the ISO spec address undefined/no data ordinate values? e.g. JTS uses Double.NaN, Shapefile spec considers values < –10^38 for measures as no data, ...

@jagill
Copy link
Author

jagill commented Dec 10, 2019

ISO considers ordinates as real numbers. Not integers or floats. So it does not specify anything about infinities, NaNs, or subnormals (or integer effects like overflow). It's pretty specific for the general class of null pointers, so it describes the behavior when you try to pass a null array of Points to the MultiPoint constructor, or if one of the Points is null (in general, this would raise an error on construction).

@jagill
Copy link
Author

jagill commented Jan 5, 2020

I went over the OGC specs more carefully, and I agree that the JTS semantics are generally fine. I'd say the issues remaining are in three categories:

  1. Better sanitization of null/NaN inputs (to raise errors on construction, rather than NPEs/etc on later access).
  2. for MultiGeometries, if any component is not valid/simple, than the MultiGeometry should not be valid/simple.
  3. for MultiGeometries, ISO forbids any component from being empty. OGC doesn't mention empty geometries much. For consistency, I'd prefer that an empty constituent implies not valid (which also removes the need to check for emptiness in algorithms). What do you think?

@dr-jts
Copy link
Contributor

dr-jts commented Jan 5, 2020

I went over the OGC specs more carefully, and I agree that the JTS semantics are generally fine. I'd say the issues remaining are in three categories:

  1. Better sanitization of null/NaN inputs (to raise errors on construction, rather than NPEs/etc on later access).

Maybe... although I wonder if there is a use case for null ordinates?

This is easy to enforce a priori by client code, so does not seem essential.

  1. for MultiGeometries, if any component is not valid/simple, than the MultiGeometry should not be valid/simple.

As far as I know this is the current JTS semantic. Is there an example where this is not the case?

  1. for MultiGeometries, ISO forbids any component from being empty. OGC doesn't mention empty geometries much. For consistency, I'd prefer that an empty constituent implies not valid (which also removes the need to check for emptiness in algorithms). What do you think?

Similar comment to #1. Has appeal, but this would be a breaking change. Relatively easy to check a priori.

For #1 & 3, one approach would be to add these as checks, but allow them to be disabled via a global flag, for backwards compatibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants