No datasets have been produced after first feedstock run #2
Comments
@auraoupa, the failures seem to be related to a connectivity issue:
Traceback (most recent call last):
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 284, in _execute
response = task()
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 357, in <lambda>
lambda: self.create_worker().do_instruction(request), request)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 597, in do_instruction
return getattr(self, request_type)(
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 635, in process_bundle
bundle_processor.process_bundle(instruction_id))
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1003, in process_bundle
input_op_by_transform_id[element.transform_id].process_encoded(
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 227, in process_encoded
self.output(decoded_value)
File "apache_beam/runners/worker/operations.py", line 526, in apache_beam.runners.worker.operations.Operation.output
File "apache_beam/runners/worker/operations.py", line 528, in apache_beam.runners.worker.operations.Operation.output
File "apache_beam/runners/worker/operations.py", line 237, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 1491, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 623, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "apache_beam/runners/common.py", line 1581, in apache_beam.runners.common._OutputHandler.handle_process_outputs
File "apache_beam/runners/common.py", line 1694, in apache_beam.runners.common._OutputHandler._write_value_to_tag
File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 1491, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 623, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "apache_beam/runners/common.py", line 1581, in apache_beam.runners.common._OutputHandler.handle_process_outputs
File "apache_beam/runners/common.py", line 1694, in apache_beam.runners.common._OutputHandler._write_value_to_tag
File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 1507, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 624, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "/usr/local/lib/python3.9/dist-packages/apache_beam/transforms/core.py", line 1956, in <lambda>
File "/usr/local/lib/python3.9/dist-packages/pangeo_forge_recipes/executors/beam.py", line 40, in exec_stage
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 156, in cache_input
config.storage_config.cache.cache_file(
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 173, in cache_file
_copy_btw_filesystems(input_opener, target_opener)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 43, in _copy_btw_filesystems
data = source.read(BLOCK_SIZE)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 590, in read
return super().read(length)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/spec.py", line 1643, in read
out = self.cache._fetch(self.loc, self.loc + length)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/caching.py", line 377, in _fetch
self.cache = self.fetcher(start, bend)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 111, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 96, in sync
raise return_result
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 53, in _runner
result[0] = await coro
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 624, in async_fetch_range
r = await self.session.get(self.url, headers=headers, **kwargs)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/client.py", line 560, in _request
await resp.start(conn)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 899, in start
message, payload = await protocol.read() # type: ignore[union-attr]
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/streams.py", line 616, in read
await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected [while running 'Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/cache_input/Execute-ptransform-56']
a6170692e70616e67656f2d66-10310800-yvxa-harness-wbm9
Root cause: Traceback (most recent call last):
File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 624, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "/usr/local/lib/python3.9/dist-packages/apache_beam/transforms/core.py", line 1956, in <lambda>
File "/usr/local/lib/python3.9/dist-packages/pangeo_forge_recipes/executors/beam.py", line 40, in exec_stage
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 156, in cache_input
config.storage_config.cache.cache_file(
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 173, in cache_file
_copy_btw_filesystems(input_opener, target_opener)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 43, in _copy_btw_filesystems
data = source.read(BLOCK_SIZE)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 590, in read
return super().read(length)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/spec.py", line 1643, in read
out = self.cache._fetch(self.loc, self.loc + length)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/caching.py", line 377, in _fetch
self.cache = self.fetcher(start, bend)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 111, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 96, in sync
raise return_result
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 53, in _runner
result[0] = await coro
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 624, in async_fetch_range
r = await self.session.get(self.url, headers=headers, **kwargs)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/client.py", line 560, in _request
await resp.start(conn)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 899, in start
message, payload = await protocol.read() # type: ignore[union-attr]
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/streams.py", line 616, in read
await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 284, in _execute
response = task()
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 357, in <lambda>
lambda: self.create_worker().do_instruction(request), request)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 597, in do_instruction
return getattr(self, request_type)(
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 635, in process_bundle
bundle_processor.process_bundle(instruction_id))
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1003, in process_bundle
input_op_by_transform_id[element.transform_id].process_encoded(
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 227, in process_encoded
self.output(decoded_value)
File "apache_beam/runners/worker/operations.py", line 526, in apache_beam.runners.worker.operations.Operation.output
File "apache_beam/runners/worker/operations.py", line 528, in apache_beam.runners.worker.operations.Operation.output
File "apache_beam/runners/worker/operations.py", line 237, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 1491, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 623, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "apache_beam/runners/common.py", line 1581, in apache_beam.runners.common._OutputHandler.handle_process_outputs
File "apache_beam/runners/common.py", line 1694, in apache_beam.runners.common._OutputHandler._write_value_to_tag
File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 1491, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 623, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "apache_beam/runners/common.py", line 1581, in apache_beam.runners.common._OutputHandler.handle_process_outputs
File "apache_beam/runners/common.py", line 1694, in apache_beam.runners.common._OutputHandler._write_value_to_tag
File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 1507, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 624, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "/usr/local/lib/python3.9/dist-packages/apache_beam/transforms/core.py", line 1956, in <lambda>
File "/usr/local/lib/python3.9/dist-packages/pangeo_forge_recipes/executors/beam.py", line 40, in exec_stage
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 156, in cache_input
config.storage_config.cache.cache_file(
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 173, in cache_file
_copy_btw_filesystems(input_opener, target_opener)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 43, in _copy_btw_filesystems
data = source.read(BLOCK_SIZE)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 590, in read
return super().read(length)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/spec.py", line 1643, in read
out = self.cache._fetch(self.loc, self.loc + length)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/caching.py", line 377, in _fetch
self.cache = self.fetcher(start, bend)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 111, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 96, in sync
raise return_result
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 53, in _runner
result[0] = await coro
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 624, in async_fetch_range
r = await self.session.get(self.url, headers=headers, **kwargs)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/client.py", line 560, in _request
await resp.start(conn)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 899, in start
message, payload = await protocol.read() # type: ignore[union-attr]
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/streams.py", line 616, in read
await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected [while running 'Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/cache_input/Execute-ptransform-56']
timestamp: '2022-10-31T15:16:38.664727404Z'
This is most likely the remote server not being happy with multiple requests being made asynchronously (I presume this is caused by Dataflow's scaling...). Unfortunately, I don't know how to address this issue, but I will let others chime in. cc @rabernat / @martindurant / @yuvipanda / @alxmrs
At first I thought you might need rate limiting in the pipeline (something like https://github.com/google/weather-tools/blob/0322cac4d679c105999a96cf9c3fced71e4561ae/weather_mv/loader_pipeline/util.py#L291; Charles and I have discussed this before on a separate issue). However, from the trace, it looks like this is an issue with copying data from their filesystem to ours. I'm interested to hear others' thoughts on the matter.
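For reference, a throttling step of the kind mentioned above could look roughly like the sketch below. This is only a minimal illustration, not the weather-tools utility linked above or anything that exists in Pangeo Forge; `CacheInputFn` and the rate of 2 elements per second are hypothetical placeholders.

```python
# Minimal sketch of per-worker throttling in a Beam pipeline: pass elements
# through unchanged, sleeping as needed so each DoFn instance emits at most
# `max_per_second` elements per second.
import time

import apache_beam as beam


class ThrottleDoFn(beam.DoFn):
    def __init__(self, max_per_second: float):
        self.min_interval = 1.0 / max_per_second
        self._last_emit = 0.0

    def process(self, element):
        now = time.monotonic()
        wait = self.min_interval - (now - self._last_emit)
        if wait > 0:
            time.sleep(wait)  # crude rate limit, local to this DoFn instance
        self._last_emit = time.monotonic()
        yield element


# Hypothetical placement, just before the stage that downloads source files:
# inputs | beam.ParDo(ThrottleDoFn(max_per_second=2)) | beam.ParDo(CacheInputFn())
```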
Note that async methods like ...
Hi @andersy005! I think I know why there are connectivity issues on our OPeNDAP server: there is a lot of traffic on the same network (only one graphics card for transfers and computations...). What I could do is book the machine for a time slot so that the Pangeo Forge operation can be done. Could you tell me a day and a time when you would be able to launch it again?
Thank you for looking into this, @auraoupa! I'm available all day today and tomorrow, and would be happy to help with the new recipe runs. Ping me whenever you are ready for us to try again.
OK, great! Actually, today is a slow day on the machine, so could you give it a try now? Thanks @andersy005
Hi @andersy005, I have not given up on these recipes yet! There have been some modifications to our OPeNDAP server to fix the connectivity issues. Could you give it one last try? If it does not work, I will find another place to host the data... Thanks for your help!
Hi @andersy005, @cisaacstern, @rabernat! Sorry to be pushy, but could you please give this recipe one last try? If it still does not work, I will create a new recipe with a different hosting OPeNDAP server... Thanks!
Hi @cisaacstern and @yuvipanda! Would it be possible to try my recipe one last time, so I know whether the OPeNDAP server on which my data are currently hosted still has connectivity issues? Thanks
👋 Hi @auraoupa, thanks for being persistent here, and apologies for the (terribly) delayed reply. As you can see, Pangeo Forge (both the software and the community) does not support time-sensitive requests particularly well. This is partly a product of our very small maintainer pool, and partly an assumption of the platform design that the public data we are pulling will be (more or less) "always available"... an assumption that of course breaks down when pulling from bandwidth-constrained sources. All that being said, I am of course happy to trigger a re-run now, which I will do by opening (then merging) a PR that makes some arbitrary change to the code. (A merged PR is currently our only switch for triggering a new run.) We can check back on this issue when the new run completes. And again, apologies for the tremendous delay, and thank you for keeping us accountable here.
@auraoupa, the deployments triggered by merging #5 have all failed, despite the pruned subset of each of these recipes having just succeeded in the tests I ran from the discussion thread on #5 (which you can see there). The errors I am seeing in the backend logs are consistent with those above.
So this would still seem to be a concurrency/bandwidth issue with the source file server. Concurrency limiting is a valuable feature that we should have in Pangeo Forge, but we simply have not had the developer time to build it yet.
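As a rough illustration of what such concurrency limiting amounts to, the sketch below caps the number of simultaneous HTTP downloads with an `asyncio.Semaphore`. This is not Pangeo Forge code; the example URL and the limit of 4 are arbitrary assumptions.

```python
# Sketch: download a list of URLs while keeping at most `max_concurrency`
# requests in flight, so a bandwidth-constrained server is not overwhelmed.
import asyncio

import aiohttp


async def fetch(session: aiohttp.ClientSession, sem: asyncio.Semaphore, url: str) -> bytes:
    async with sem:  # blocks here once `max_concurrency` requests are in flight
        async with session.get(url) as resp:
            resp.raise_for_status()
            return await resp.read()


async def download_all(urls: list[str], max_concurrency: int = 4) -> list[bytes]:
    sem = asyncio.Semaphore(max_concurrency)
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, sem, u) for u in urls))


# asyncio.run(download_all(["https://example.com/data/file_001.nc"]))
```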
Thanks @cisaacstern for this test! I guess I will have to find another place to store the data then... One idea, though: do you think it could help if we tried just one of the 3 sub-recipes? Or rearranged the files so that there are not so many of them?
@auraoupa, running just one of the sub-recipes is a good thought, though it is unfortunately not currently supported. (This would be a good future feature to develop under the general heading of concurrency limits.)
This is a promising idea. The production run will make one request per file, so yes, reducing the file count will also reduce concurrency. If files are too large, however, we run the risk of long-running transfers with dropped connections. How many files (of what sizes) does each sub-recipe currently have? As a general guideline, I'd say that if we can reduce the number of files by at least 5x without pushing per-file sizes over 10 GB, it's worth a shot.
Thanks @cisaacstern for the suggestions! I will then make monthly files instead of daily ones, and maybe submit a new recipe with only one dataset at a time, which would be 12 files (instead of 3 × 365 files) of around 8 GB each!
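For the regrouping itself, a short xarray script along the lines of the sketch below could merge the daily NetCDF files into monthly ones before they are published; the filename pattern and the year 2010 are hypothetical placeholders.

```python
# Sketch: combine daily NetCDF files into one file per month with xarray,
# reducing the number of HTTP requests the recipe has to make.
import glob

import xarray as xr


def combine_month(year: int, month: int) -> None:
    # Hypothetical daily filename pattern: daily_YYYYMMDD.nc
    daily_files = sorted(glob.glob(f"daily_{year}{month:02d}??.nc"))
    ds = xr.open_mfdataset(daily_files, combine="by_coords")
    ds.to_netcdf(f"monthly_{year}{month:02d}.nc")


for m in range(1, 13):
    combine_month(2010, m)
```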
@auraoupa, sounds good... this could conceivably work! Perhaps this is clear, but in case not, please make the PR as an edit to the file ...
Hi @cisaacstern, I hope you had a nice end of year, and I wish you the best for 2023! I rewrote the recipe so we can try with fewer files at a time in pull request #6. Could you please merge it? Thanks!
Hi @cisaacstern! Last try for this recipe, in pull request #6: this time the files are smaller than 2 GB each and there are 73 of them...
Hi @andersy005, @cisaacstern! No datasets have been produced after the pull request was merged, and I do not know what went wrong: https://pangeo-forge.org/dashboard/feedstock/87 only says that the status is failed... Do you have any insights on this? Is it possible to run it again?
Originally posted by @auraoupa in pangeo-forge/staged-recipes#189 (comment)