Describe the bug
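The file contains quoted numbers with comma thousands separators, e.g. "2,126,000,000" (the gdp_for_year column in the sample below holds "2,156,624,900").
These throw off the index alignment between the column names extracted from the header and the types extracted from the data rows, presumably because the commas inside the quotes are treated as field separators. The run then fails with: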
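The library's splitting code isn't shown in the traceback, so this is a sketch under one assumption: that each line is split on `sep` without honouring double quotes. Under that assumption, the first data row of the sample yields 15 fields against a 12-column header, whereas Python's standard `csv` module keeps "2,156,624,900" as a single field. `naive_split` and `quote_aware_split` are hypothetical names used only for illustration.

```python
import csv
import io

# Header and first data row copied from the sample file in this issue.
header = '"id","country","year","sex","age","suicides_no","population","country-year","HDI for year"," gdp_for_year","gdp_per_capita","generation"'
row = '0,"Albania",1987,"male","15-24 years",21,312900,"Albania1987",,"2,156,624,900",796,"Generation X"'


def naive_split(line, sep=","):
    """Split on every separator, ignoring quotes (the suspected failure mode)."""
    return line.split(sep)


def quote_aware_split(line, sep=","):
    """Split with the stdlib csv module, which honours double-quoted fields."""
    return next(csv.reader(io.StringIO(line), delimiter=sep))


print(len(naive_split(header)), len(naive_split(row)))              # 12 15
print(len(quote_aware_split(header)), len(quote_aware_split(row)))  # 12 12
print(quote_aware_split(row)[9])                                     # 2,156,624,900
```

With naive splitting, every column after "HDI for year" is shifted by three positions, which matches the misalignment described above.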
```
  File "/home/greg/prj/sdspop/ingest/workflows/schema-on-read/venv/lib/python3.10/site-packages/csv_schema_inference/csv_schema_inference.py", line 397, in run_inference
    schemas_result = prl.parallel(records = lines,obj=dtype, d_schema = self.__schema)
  File "/home/greg/prj/sdspop/ingest/workflows/schema-on-read/venv/lib/python3.10/site-packages/csv_schema_inference/csv_schema_inference.py", line 165, in parallel
    return [p.get() for p in results]
  File "/home/greg/prj/sdspop/ingest/workflows/schema-on-read/venv/lib/python3.10/site-packages/csv_schema_inference/csv_schema_inference.py", line 165, in <listcomp>
    return [p.get() for p in results]
```
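The traceback ends at `p.get()`, which suggests the real error was raised inside a worker and re-raised in the parent process. Assuming `results` holds `AsyncResult` objects returned by `apply_async` (the library code isn't shown), the sketch below reproduces that pattern with `multiprocessing.pool.ThreadPool`, which exposes the same `AsyncResult` API without needing to pickle anything. `parse_batch`, `run_batches` and the `IndexError` are hypothetical stand-ins; the actual exception type and message are cut off in the traceback above.

```python
from multiprocessing.pool import ThreadPool

HEADER_LEN = 12  # number of columns in the sample header


def parse_batch(batch):
    """Hypothetical worker: fails on a row whose field count is misaligned."""
    for fields in batch:
        if len(fields) != HEADER_LEN:
            raise IndexError(f"row has {len(fields)} fields, header has {HEADER_LEN}")
    return len(batch)


def run_batches(batches):
    """Submit each batch to a pool and collect the results in order."""
    with ThreadPool(2) as pool:
        results = [pool.apply_async(parse_batch, (batch,)) for batch in batches]
        # AsyncResult.get() re-raises any exception raised in the worker, so the
        # error surfaces here, just like the `return [p.get() for p in results]` frame.
        return [p.get() for p in results]


try:
    run_batches([[["x"] * 12], [["x"] * 15]])
except IndexError as exc:
    print("worker error:", exc)  # worker error: row has 15 fields, header has 12
```

The full exception line that follows the `<listcomp>` frame would identify what actually failed inside the worker.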
To Reproduce
Steps to reproduce the behavior: run the code below on suicide_data.csv. Sample of the input file:
```csv
"id","country","year","sex","age","suicides_no","population","country-year","HDI for year"," gdp_for_year","gdp_per_capita","generation"
0,"Albania",1987,"male","15-24 years",21,312900,"Albania1987",,"2,156,624,900",796,"Generation X"
1,"Albania",1987,"male","35-54 years",16,308000,"Albania1987",,"2,156,624,900",796,"Silent"
2,"Albania",1987,"female","15-24 years",14,289700,"Albania1987",,"2,156,624,900",796,"Generation X"
3,"Albania",1987,"male","75+ years",1,21800,"Albania1987",,"2,156,624,900",796,"G.I. Generation"
4,"Albania",1987,"male","25-34 years",9,274300,"Albania1987",,"2,156,624,900",796,"Boomers"
5,"Albania",1987,"female","75+ years",1,35600,"Albania1987",,"2,156,624,900",796,"G.I. Generation"
```
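Code used:
```python
from multiprocessing import freeze_support, Process
from csv_schema_inference import csv_schema_inference


def main():
    # If the inferred data type is INTEGER and FLOAT is also present in the results, the result will be FLOAT
    conditions = {"INTEGER": "FLOAT"}
    pathfile = "/home/greg/prj/sdspop/ingest/workflows/schema-on-read/suicide_data.csv"
    csv_infer = csv_schema_inference.CsvSchemaInference(
        portion=0.9, max_length=100, batch_size=200000, acc=0.8,
        seed=2, header=True, sep=",", conditions=conditions,
    )
    aprox_schema = csv_infer.run_inference(pathfile)
    csv_infer.pretty(aprox_schema)


if __name__ == '__main__':
    freeze_support()
    Process(target=main).start()
```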
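Until quoted fields are handled, one possible workaround is to normalise the file before inference. The sketch below is a hypothetical pre-processing helper, not part of csv_schema_inference: it reads the file with the standard `csv` module and removes the thousands separators from fields that look like comma-grouped numbers. It has not been tested against the library, and if naive splitting is indeed the cause, any other field containing the separator (e.g. free text with commas) would still be split. Note that `csv.writer` uses minimal quoting, so quotes around plain text fields are dropped in the output.

```python
import csv
import re

# A field consisting only of a number with comma thousands separators, e.g. "2,156,624,900".
GROUPED_NUMBER = re.compile(r"^-?\d{1,3}(?:,\d{3})+(?:\.\d+)?$")


def strip_thousands_separators(src_path, dst_path, sep=","):
    """Hypothetical helper: copy a CSV, turning "2,156,624,900" into 2156624900.

    The stdlib csv module parses quoted fields, so the embedded commas are
    never treated as separators while reading.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter=sep)
        writer = csv.writer(dst, delimiter=sep)  # minimal quoting on output
        for row in reader:
            writer.writerow(
                [f.replace(",", "") if GROUPED_NUMBER.match(f) else f for f in row]
            )
```

For example, call `strip_thousands_separators(pathfile, cleaned_path)` inside `main()` and pass `cleaned_path` (a hypothetical output path) to `run_inference` instead of `pathfile`.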
Expected behavior
Schema inference should complete and print a schema, e.g. (output from a different file):

|          | 0 |
|----------|---|
| name     | Username; Identifier;One-time password;Recovery code;First name;Last name;Department;Location |
| type     | STRING |
| nullable | False |

...
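For comparison, a minimal sketch of the expected behaviour on the sample rows, assuming quote-aware parsing. `infer_schema` and `cell_type` are hypothetical helpers whose output mimics the name/type/nullable shape of the expected output above; this is not the library's algorithm. Types widen from INTEGER to FLOAT to STRING across rows, a simple rule in the spirit of the `conditions = {"INTEGER":"FLOAT"}` setting, and empty cells only mark a column as nullable.

```python
import csv
import io
import re

# Integers and decimals, optionally with comma thousands separators ("2,156,624,900").
INTEGER = re.compile(r"^-?(?:\d+|\d{1,3}(?:,\d{3})+)$")
FLOAT = re.compile(r"^-?(?:\d+|\d{1,3}(?:,\d{3})+)?\.\d+$")
RANK = {"INTEGER": 0, "FLOAT": 1, "STRING": 2}  # wider types have higher rank


def cell_type(value):
    """Classify a single non-empty cell."""
    if INTEGER.match(value):
        return "INTEGER"
    if FLOAT.match(value):
        return "FLOAT"
    return "STRING"


def infer_schema(text, sep=","):
    """Hypothetical helper returning {index: {"name", "type", "nullable"}}."""
    rows = csv.reader(io.StringIO(text), delimiter=sep)  # honours quoted fields
    header = next(rows)
    types = [None] * len(header)
    nullable = [False] * len(header)
    for row in rows:
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(row)}")
        for i, value in enumerate(row):
            if value == "":
                nullable[i] = True
                continue
            t = cell_type(value)
            # Keep the widest type seen so far for this column.
            if types[i] is None or RANK[t] > RANK[types[i]]:
                types[i] = t
    # Columns with no non-empty values default to STRING.
    return {
        i: {"name": name, "type": types[i] or "STRING", "nullable": nullable[i]}
        for i, name in enumerate(header)
    }


sample = (
    '"id","country","year","sex","age","suicides_no","population","country-year",'
    '"HDI for year"," gdp_for_year","gdp_per_capita","generation"\n'
    '0,"Albania",1987,"male","15-24 years",21,312900,"Albania1987",,'
    '"2,156,624,900",796,"Generation X"\n'
)
schema = infer_schema(sample)
print(schema[9])  # {'name': ' gdp_for_year', 'type': 'INTEGER', 'nullable': False}
```

Because the header and the data rows go through the same quote-aware reader, column 9 keeps its name and gets a numeric type instead of being split across three columns.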
Desktop
- OS: Ubuntu 22.04
- Python: 3.10.12