
Unable to parse data from Mega system #69

Open
mbenson182 opened this issue Jun 7, 2019 · 2 comments

Comments

@mbenson182
Contributor

I realize this is my second issue in a week, for which I apologize, but the problem I'm having now is much more important to me than the last one, as I'm primarily focused on being able to replicate the parsing and rectification methods on my own data sets. I've gotten the read() and correct() functions working on the test data, which is about as much as I need (at least for now).

However, I've been trying to use these functions on some data my group has collected, and have been unable to get it parsed. The failure seems to occur in the calls to pyread or pyread_single when trying to parse the scans (in getmetadata() and _get_scans(), respectively).

This is the output when I try to run PyHum.read():

Input file is Rec00003.DAT
Son files are in Rec00003/
cs2cs arguments are epsg:26949
Draft: 0.3
Celerity of sound: 1450.0 m/s
Transducer length is 0.108 m
Only 1 chunk will be produced
Data is from the 2 series
Checking the epsg code you have chosen for compatibility with Basemap ...
... epsg code compatible
WARNING: Because files have to be read in byte by byte,
this could take a very long time ...
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 34.1s
[Parallel(n_jobs=2)]: Done 2 out of 4 | elapsed: 34.1s remaining: 34.1s
[Parallel(n_jobs=2)]: Done 4 out of 4 | elapsed: 34.1s remaining: 0.0s
something went wrong with the parallelised version of pyread ...
Traceback (most recent call last):
File "PyHumRead.py", line 78, in
reader()
File "PyHumRead.py", line 18, in reader
ph.read(humfile,sonpath,'epsg:26949',c,draft,0,t,0,0,model,0,0,'100m')
File "/home/user/miniconda2/envs/pyhum/lib/python2.7/site-packages/PyHum/_pyhum_read.py", line 427, in read
metadat = data.getmetadata()
File "PyHum/pyread.pyx", line 532, in PyHum.pyread.pyread.getmetadata
File "PyHum/pyread.pyx", line 538, in PyHum.pyread.pyread.getmetadata
TypeError: 'NoneType' object is not subscriptable
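[Editorial aside: a TypeError like this usually means an upstream parse failure was swallowed and None was handed back in place of data, so the crash surfaces far from the real failure. A minimal, hypothetical sketch of that pattern, not PyHum's actual code:]

```python
# Hypothetical sketch (not PyHum's actual code) of how a swallowed parse
# failure surfaces later as "'NoneType' object is not subscriptable".
def parse_records(raw):
    try:
        raise ValueError("unexpected byte layout")  # stand-in for a failed parse
    except ValueError:
        return None  # error swallowed; caller silently receives None

def getmetadata(raw):
    records = parse_records(raw)
    return records[0]  # blows up here, far from the real failure

try:
    getmetadata(b"\x00\x01")
except TypeError as e:
    print(e)  # 'NoneType' object is not subscriptable
```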

I don't particularly mind whether the Parallel process works, but the interesting thing is that it seems to execute correctly and then crash as it tries to finish up. Anyway, I wrote a script to run the parsing method without actually calling Parallel (a single-threaded way of calling the same code, since the code executed by the except: block is different from that in the try: block). The code is:

import PyHum as ph
import glob, sys
import os
import pdb

import PyHum.utils as humutils
import PyHum.pyread_single as pyread_single

def reader():
	humfile = "Rec00003.DAT"
	sonpath = "Rec00003/"
	c = 1450.0
	draft = 0.3
	t = 0.108
	model = 2


	# ph.read(humfile,sonpath,'epsg:26949',c,draft,0,t,0,0,model,0,0,'100m')

	# get the SON files from this directory
	sonfiles = glob.glob(sonpath+'*.SON')
	if not sonfiles:
		sonfiles = glob.glob(os.getcwd()+os.sep+sonpath+'*.SON')

	base = humfile.split('.DAT') # get base of file name for output
	base = base[0].split(os.sep)[-1]

	# remove underscores, negatives and spaces from basename
	base = humutils.strip_base(base)

	print("WARNING: Because files have to be read in byte by byte,")
	print("this could take a very long time ...")

	# Single-threaded version of Parallel call
	X = []; Y = []; A = []; B = [];
	for k in range(len(sonfiles)):
		X[k], Y[k], A[k], B[k] = getscans(sonfiles[k], humfile, c, model, "epsg:26949")




def getscans(sonfile, humfile, c, model, cs2cs_args):

   data = pyread_single.pyread(sonfile, humfile, c, model, cs2cs_args)

   a, b = data.getscan()

   if b == 'sidescan_port':
      dat = data.gethumdat()
      metadat = data.getmetadata()
   else:
      dat = None
      metadat = None

   return a, b, dat, metadat


if __name__ == '__main__':
	reader()

The output when I run this code block is:

WARNING: Because files have to be read in byte by byte,
this could take a very long time ...
Traceback (most recent call last):
File "PyHumRead.py", line 77, in
reader()
File "PyHumRead.py", line 37, in reader
X[k], Y[k], A[k], B[k] = getscans(sonfiles[k], humfile, c, model, "epsg:26949")
File "PyHumRead.py", line 64, in getscans
a, b = data.getscan()
File "PyHum/pyread_single.pyx", line 473, in PyHum.pyread_single.pyread.getscan
File "PyHum/pyread_single.pyx", line 502, in PyHum.pyread_single.pyread.getscan
File "PyHum/pyread_single.pyx", line 458, in PyHum.pyread_single.pyread._get_scans
MemoryError

There's definitely a problem in the parsing somewhere, but I'm not sure how to tackle it, as the failures happen inside the Cython files, in private attributes I can't figure out how to access.
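[Editorial aside: a MemoryError during scan parsing is consistent with a misparsed header, since a binary reader typically sizes its buffers from header fields; garbage fields then request an impossible allocation. A hypothetical guard illustrating the failure mode (the cap, function, and numbers are illustrative, not PyHum's actual code):]

```python
# Hypothetical sketch: a reader that sizes its buffer from header fields.
# Garbage header values (hundreds of millions of "records") request an
# absurd allocation; a sanity cap turns that into a clear error instead.
MAX_BYTES = 2**33  # 8 GiB sanity cap (illustrative, not PyHum's)

def alloc_scan(numrecords, linesize, itemsize=8):
    nbytes = numrecords * linesize * itemsize
    if nbytes > MAX_BYTES:
        raise MemoryError("header requests %.1f PB" % (nbytes / 1e15))
    return bytearray(nbytes)  # plausible request: allocate normally

try:
    alloc_scan(680853760, 872742912)  # garbage-scale header values
except MemoryError as e:
    print("refused:", e)
```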

Any help would be greatly appreciated! I'd attach the data I'm working off of, but the zipped file is about 130 MB; let me know if there's a good way to get it to you.

@mbenson182
Contributor Author

Daniel,

Would you be able to provide the documentation you used (if any) to determine how to decode the byte packing in the Humminbird files? After some further debugging fueled by coffee this morning, I've found that the _decode_humdat function in pyread_single is definitely not parsing my file correctly. Here's the output if I print headdict, the variable returned by _decode_humdat:

{'linesize': 872742912, 'recordlens_ms': -429582336, 'water_type': 'deep salt', 'lat': 89.99999999999996, 'sonar_name': '151257088', 'lon': -3255.203218234509, 'unix_time': 2028059996, 'numrecords': 680853760, 'water_code': 1, 'filename': 'Rec00003.S', 'utm_x': -362381825, 'utm_y': -639219968}

Some of the filename is cut off, the utm coordinates (and hence lat/long) don't make sense, the unix time is wrong (the one listed is in 2034, and unfortunately I haven't quite mastered time travel yet), and I presume some of the other fields are misparsed as well. Perhaps the model I'm using has a different byte-packing method, or Humminbird changed the firmware for my model (and hence how the data is packed). Either way, some documentation, if it exists, would help greatly.
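[Editorial aside: headdict values like these are the classic signature of reading a fixed-layout binary header with the wrong offsets or byte order. A minimal sketch of the effect, using Python's struct module; the two-int layout and the epoch value (roughly June 2019) are illustrative, not the real Humminbird .DAT spec:]

```python
import struct

# Hypothetical sketch: decode the same 8 bytes under two assumed layouts.
# A writer/reader mismatch in byte packing turns every downstream field
# (utm_x, unix_time, numrecords, ...) into plausible-looking garbage.
raw = struct.pack('>ii', 1559912400, 12345)   # writer: big-endian ints

good = struct.unpack('>ii', raw)  # matching layout -> sensible values
bad = struct.unpack('<ii', raw)   # wrong endianness -> huge nonsense

print(good)  # (1559912400, 12345)
print(bad)
```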

I've also attached the .DAT file if you want to take a look or try it on your own machine.

Thanks,
Mike
DatFile.zip

@dbuscombe-usgs
Member

The data formats are in the docs folder.

Yes, it seems likely the firmware and model number don't match up. I know of no way to program against the various models and firmwares.

Here is a non-Cythonized version of the main reading script. I also translated it into Python 3. I hope to move PyHum fully over to Python 3 over the coming weeks and months.

pyread.zip
