Commit

check schema-level information from both pyspark df and pandera schema before applying nullable check

Signed-off-by: Filipe Oliveira <[email protected]>
filipeo2-mck committed Nov 7, 2023
1 parent 38e0b1b commit 4f6ce43
Showing 1 changed file with 6 additions and 4 deletions.
10 changes: 6 additions & 4 deletions pandera/backends/pyspark/column.py
@@ -125,11 +125,13 @@ def coerce_dtype(

     @validate_scope(scope=ValidationScope.SCHEMA)
     def check_nullable(self, check_obj: DataFrame, schema):
-        # If True, ignore this `nullable` check
-        passed = schema.nullable
+        passed = True

-        # If False, execute the costly validation
-        if not schema.nullable:
+        # Use schema level information to optimize execution of the `nullable` check:
+        # ignore this check if Pandera Field's `nullable` property is True
+        # (check not necessary) or if df column's `nullable` property is False
+        # (PySpark's nullable ensures the presence of values when creating the df)
+        if (not schema.nullable) and (check_obj.schema[schema.name].nullable):
             passed = (
                 check_obj.filter(col(schema.name).isNull()).limit(1).count()
                 == 0
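
For context, a minimal standalone sketch of the short-circuit this commit introduces, assuming a local SparkSession and a hypothetical non-nullable "name" column (the DataFrame and the `field_nullable` stand-in are illustrative, not part of the patch):

from pyspark.sql import SparkSession, types as T
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Hypothetical DataFrame whose StructField is declared non-nullable.
df = spark.createDataFrame(
    [("a",), ("b",)],
    schema=T.StructType([T.StructField("name", T.StringType(), nullable=False)]),
)

field_nullable = False  # stand-in for the pandera Field's `nullable` property

# Same condition as the patch: scan only when pandera forbids nulls
# AND PySpark's own schema still allows them.
if (not field_nullable) and df.schema["name"].nullable:
    # `limit(1)` lets Spark stop after the first null row it finds.
    passed = df.filter(col("name").isNull()).limit(1).count() == 0
else:
    passed = True  # check unnecessary, or PySpark already guarantees non-null

print(passed)  # True here: the non-nullable StructField avoids the costly scan

Because the column is declared non-nullable on the PySpark side, the condition is False and no Spark job is submitted at all, which is exactly the saving the commit is after.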
