No datasets have been produced after first feedstock run #2

Open
andersy005 opened this issue Oct 31, 2022 · 17 comments
@andersy005 (Member) commented Oct 31, 2022

Hi @andersy005, @cisaacstern! No datasets have been produced since the pull request was merged, and I do not know what went wrong: https://pangeo-forge.org/dashboard/feedstock/87 only says that the status is failed ... Do you have any insights on this? Is it possible to run it again?

Originally posted by @auraoupa in pangeo-forge/staged-recipes#189 (comment)

@andersy005 (Member, Author) commented Oct 31, 2022

@auraoupa, the failures seem to be related to a connectivity issue:

 Traceback (most recent call last):
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 284, in _execute
      response = task()
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 357, in <lambda>
      lambda: self.create_worker().do_instruction(request), request)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 597, in do_instruction
      return getattr(self, request_type)(
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 635, in process_bundle
      bundle_processor.process_bundle(instruction_id))
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1003, in process_bundle
      input_op_by_transform_id[element.transform_id].process_encoded(
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 227, in process_encoded
      self.output(decoded_value)
    File "apache_beam/runners/worker/operations.py", line 526, in apache_beam.runners.worker.operations.Operation.output
    File "apache_beam/runners/worker/operations.py", line 528, in apache_beam.runners.worker.operations.Operation.output
    File "apache_beam/runners/worker/operations.py", line 237, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
    File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
    File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
    File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
    File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 1491, in apache_beam.runners.common.DoFnRunner._reraise_augmented
    File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 623, in apache_beam.runners.common.SimpleInvoker.invoke_process
    File "apache_beam/runners/common.py", line 1581, in apache_beam.runners.common._OutputHandler.handle_process_outputs
    File "apache_beam/runners/common.py", line 1694, in apache_beam.runners.common._OutputHandler._write_value_to_tag
    File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
    File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
    File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
    File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 1491, in apache_beam.runners.common.DoFnRunner._reraise_augmented
    File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 623, in apache_beam.runners.common.SimpleInvoker.invoke_process
    File "apache_beam/runners/common.py", line 1581, in apache_beam.runners.common._OutputHandler.handle_process_outputs
    File "apache_beam/runners/common.py", line 1694, in apache_beam.runners.common._OutputHandler._write_value_to_tag
    File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
    File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
    File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
    File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 1507, in apache_beam.runners.common.DoFnRunner._reraise_augmented
    File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 624, in apache_beam.runners.common.SimpleInvoker.invoke_process
    File "/usr/local/lib/python3.9/dist-packages/apache_beam/transforms/core.py", line 1956, in <lambda>
    File "/usr/local/lib/python3.9/dist-packages/pangeo_forge_recipes/executors/beam.py", line 40, in exec_stage
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 156, in cache_input
      config.storage_config.cache.cache_file(
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 173, in cache_file
      _copy_btw_filesystems(input_opener, target_opener)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 43, in _copy_btw_filesystems
      data = source.read(BLOCK_SIZE)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 590, in read
      return super().read(length)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/spec.py", line 1643, in read
      out = self.cache._fetch(self.loc, self.loc + length)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/caching.py", line 377, in _fetch
      self.cache = self.fetcher(start, bend)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 111, in wrapper
      return sync(self.loop, func, *args, **kwargs)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 96, in sync
      raise return_result
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 53, in _runner
      result[0] = await coro
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 624, in async_fetch_range
      r = await self.session.get(self.url, headers=headers, **kwargs)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/client.py", line 560, in _request
      await resp.start(conn)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 899, in start
      message, payload = await protocol.read()  # type: ignore[union-attr]
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/streams.py", line 616, in read
      await self._waiter
  aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected [while running 'Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/cache_input/Execute-ptransform-56']
  ,

    a6170692e70616e67656f2d66-10310800-yvxa-harness-wbm9
        Root cause: Traceback (most recent call last):
    File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 624, in apache_beam.runners.common.SimpleInvoker.invoke_process
    File "/usr/local/lib/python3.9/dist-packages/apache_beam/transforms/core.py", line 1956, in <lambda>
    File "/usr/local/lib/python3.9/dist-packages/pangeo_forge_recipes/executors/beam.py", line 40, in exec_stage
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 156, in cache_input
      config.storage_config.cache.cache_file(
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 173, in cache_file
      _copy_btw_filesystems(input_opener, target_opener)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 43, in _copy_btw_filesystems
      data = source.read(BLOCK_SIZE)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 590, in read
      return super().read(length)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/spec.py", line 1643, in read
      out = self.cache._fetch(self.loc, self.loc + length)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/caching.py", line 377, in _fetch
      self.cache = self.fetcher(start, bend)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 111, in wrapper
      return sync(self.loop, func, *args, **kwargs)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 96, in sync
      raise return_result
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 53, in _runner
      result[0] = await coro
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 624, in async_fetch_range
      r = await self.session.get(self.url, headers=headers, **kwargs)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/client.py", line 560, in _request
      await resp.start(conn)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 899, in start
      message, payload = await protocol.read()  # type: ignore[union-attr]
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/streams.py", line 616, in read
      await self._waiter
  aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 284, in _execute
      response = task()
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 357, in <lambda>
      lambda: self.create_worker().do_instruction(request), request)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 597, in do_instruction
      return getattr(self, request_type)(
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 635, in process_bundle
      bundle_processor.process_bundle(instruction_id))
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1003, in process_bundle
      input_op_by_transform_id[element.transform_id].process_encoded(
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 227, in process_encoded
      self.output(decoded_value)
    File "apache_beam/runners/worker/operations.py", line 526, in apache_beam.runners.worker.operations.Operation.output
    File "apache_beam/runners/worker/operations.py", line 528, in apache_beam.runners.worker.operations.Operation.output
    File "apache_beam/runners/worker/operations.py", line 237, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
    File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
    File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
    File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
    File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 1491, in apache_beam.runners.common.DoFnRunner._reraise_augmented
    File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 623, in apache_beam.runners.common.SimpleInvoker.invoke_process
    File "apache_beam/runners/common.py", line 1581, in apache_beam.runners.common._OutputHandler.handle_process_outputs
    File "apache_beam/runners/common.py", line 1694, in apache_beam.runners.common._OutputHandler._write_value_to_tag
    File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
    File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
    File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
    File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 1491, in apache_beam.runners.common.DoFnRunner._reraise_augmented
    File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 623, in apache_beam.runners.common.SimpleInvoker.invoke_process
    File "apache_beam/runners/common.py", line 1581, in apache_beam.runners.common._OutputHandler.handle_process_outputs
    File "apache_beam/runners/common.py", line 1694, in apache_beam.runners.common._OutputHandler._write_value_to_tag
    File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
    File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
    File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
    File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 1507, in apache_beam.runners.common.DoFnRunner._reraise_augmented
    File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
    File "apache_beam/runners/common.py", line 624, in apache_beam.runners.common.SimpleInvoker.invoke_process
    File "/usr/local/lib/python3.9/dist-packages/apache_beam/transforms/core.py", line 1956, in <lambda>
    File "/usr/local/lib/python3.9/dist-packages/pangeo_forge_recipes/executors/beam.py", line 40, in exec_stage
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 156, in cache_input
      config.storage_config.cache.cache_file(
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 173, in cache_file
      _copy_btw_filesystems(input_opener, target_opener)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 43, in _copy_btw_filesystems
      data = source.read(BLOCK_SIZE)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 590, in read
      return super().read(length)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/spec.py", line 1643, in read
      out = self.cache._fetch(self.loc, self.loc + length)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/caching.py", line 377, in _fetch
      self.cache = self.fetcher(start, bend)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 111, in wrapper
      return sync(self.loop, func, *args, **kwargs)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 96, in sync
      raise return_result
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 53, in _runner
      result[0] = await coro
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 624, in async_fetch_range
      r = await self.session.get(self.url, headers=headers, **kwargs)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/client.py", line 560, in _request
      await resp.start(conn)
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 899, in start
      message, payload = await protocol.read()  # type: ignore[union-attr]
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/streams.py", line 616, in read
      await self._waiter
  aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected [while running 'Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/cache_input/Execute-ptransform-56']
  [... the same "Root cause" ServerDisconnectedError traceback is repeated verbatim for two additional worker log entries ...]
timestamp: '2022-10-31T15:16:38.664727404Z'

This is most likely the remote server being unhappy with multiple requests being made asynchronously (I presume this is caused by Dataflow's scaling...). Unfortunately, I don't know how to address this issue, but I'll let others chime in. cc @rabernat / @martindurant / @yuvipanda / @alxmrs
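
(For illustration only, and not something pangeo-forge-recipes or the Dataflow runner does here: when driving transfers yourself, a transient ServerDisconnectedError can often be papered over with a retry-with-backoff wrapper along these lines. copy_with_retries and its defaults are hypothetical.)

    import time

    import aiohttp


    def copy_with_retries(copy_fn, retries=5, base_delay=2.0):
        """Call copy_fn(), retrying on transient network errors with exponential backoff."""
        for attempt in range(retries):
            try:
                return copy_fn()
            except (aiohttp.ClientError, OSError) as err:
                # Give up after the last attempt; otherwise wait 2s, 4s, 8s, ...
                if attempt == retries - 1:
                    raise
                delay = base_delay * 2 ** attempt
                print(f"transfer failed ({err!r}); retrying in {delay:.0f}s")
                time.sleep(delay)

    # e.g. copy_with_retries(lambda: copy_one_input(url, cache_path))  # copy_one_input is a stand-in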

@alxmrs commented Oct 31, 2022

At first I thought you might need rate limiting in the pipeline (something like https://github.com/google/weather-tools/blob/0322cac4d679c105999a96cf9c3fced71e4561ae/weather_mv/loader_pipeline/util.py#L291; Charles and I have discussed this before on a separate issue). However, from the trace, it looks like this is an issue with copying data from their filesystem to ours. I'm interested to hear others' thoughts on the matter.
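
(Sketching the general idea Alex mentions, not the weather-tools utility itself and not wired into a Beam pipeline: an asyncio.Semaphore caps how many requests hit the source server at once. The URL list and the limit of 4 are made-up placeholders.)

    import asyncio

    import aiohttp

    MAX_CONCURRENT = 4  # assumed ceiling; tune to what the source server tolerates


    async def fetch(session, sem, url):
        # The semaphore guarantees at most MAX_CONCURRENT requests are in flight.
        async with sem:
            async with session.get(url) as resp:
                resp.raise_for_status()
                return await resp.read()


    async def fetch_all(urls):
        sem = asyncio.Semaphore(MAX_CONCURRENT)
        async with aiohttp.ClientSession() as session:
            return await asyncio.gather(*(fetch(session, sem, u) for u in urls))


    if __name__ == "__main__":
        # hypothetical stand-ins for the recipe's input files
        urls = [f"https://example.org/data/file_{i:03d}.nc" for i in range(12)]
        payloads = asyncio.run(fetch_all(urls))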

@martindurant commented Oct 31, 2022

Note that async methods like cat allow a batch_size argument to control how many requests are sent at a time. Here, however, we are using the stateful file API, so parallelism is controlled entirely outside of fsspec.
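
(For reference, a minimal sketch of the bulk async API Martin is describing, assuming a recent fsspec; the URLs are hypothetical.)

    import fsspec

    fs = fsspec.filesystem("http")

    # hypothetical stand-ins for the recipe's input files
    urls = [f"https://example.org/data/file_{i:03d}.nc" for i in range(20)]

    # Bulk fetch, but with at most 4 requests in flight at any time.
    contents = fs.cat(urls, batch_size=4)  # -> {url: bytes}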

@auraoupa commented:

Hi @andersy005! I think I know why there are connectivity issues on our OPeNDAP server: there is a lot of traffic going over the same network (only one graphics card for transfers and computations ...). What I could do is book the machine for a time slot so that the Pangeo Forge run can be done. Could you indicate a day and a time when you would be able to launch it again?

@andersy005 (Member, Author) commented:

Thank you for looking into this, @auraoupa! I'm available all day today and tomorrow and would be happy to help with the new recipe runs. Ping me whenever you are ready for us to try again.

@auraoupa commented:

OK, great! Actually, today is a slow day on the machine; could you give it a try now? Thanks @andersy005!

@auraoupa commented:

Hi @andersy005, I have not given up on these recipes yet! There have been some modifications to our OPeNDAP server to fix the connectivity issues; could you give it one last try? If it still does not work, I will find another place to host the data ... Thanks for your help!

@auraoupa commented Dec 6, 2022

Hi @andersy005, @cisaacstern, @rabernat! Sorry to be pushy, but could you please give this recipe one last try? If it still does not work, I will create a new recipe with a different hosting OPeNDAP server ... Thanks!

@auraoupa commented:

Hi @cisaacstern and @yuvipanda! Would it be possible to try my recipe one last time, so I know whether the OPeNDAP server on which my data are currently hosted still has connectivity issues? Thanks!

@cisaacstern (Member) commented Dec 19, 2022

👋 Hi @auraoupa, thanks for being persistent here, and apologies for the (terribly) delayed reply. As you can see, Pangeo Forge (both the software and the community) does not support time-sensitive requests particularly well. In part this is a product of our very small maintainer pool, and in part it reflects an assumption of the platform design that the public data we are pulling will be (more or less) "always available". That latter assumption breaks down, of course, when pulling from bandwidth-constrained sources.

All that being said, I am of course happy to trigger a re-run now, which I will do by opening (and then merging) a PR that makes some arbitrary change to the code. (A merged PR is currently our only switch for triggering a new run.) We can check back on this issue when the new run completes.

And again, apologies for the tremendous delay and thank you for keeping us accountable here.

@cisaacstern (Member) commented:

@auraoupa, the deployments triggered by merging #5 have all failed, despite the pruned subset of each of these recipes having just succeeded in the tests I ran from the discussion thread on #5 (which you can see there).

The errors I am seeing in the backend logs are consistent with the above:

aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected [while running 'Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/cache_input/Execute-ptransform-56']

So this still seems to be a concurrency/bandwidth issue with the source file server. Concurrency limiting is a valuable feature that we should have in Pangeo Forge, but we simply have not had the developer time to build it yet.

@auraoupa commented:

Thanks @cisaacstern for this test! I guess I have to find another place to store the data then. One idea, though: do you think it could help if we tried just one of the 3 sub-recipes? Or rearranged the files so that there are not so many of them?

@cisaacstern (Member) commented:

@auraoupa, running just one of the sub-recipes is a good thought, though unfortunately it is not currently supported. (This would be a good future feature to develop under the general heading of concurrency limits.)

Or rearranged the files so that there are not so many of them?

This is a promising idea. The production run will make one request per file, so yes, reducing file count will also reduce concurrency. If files are too large, however, we run the risk of long-running transfers with dropped connections.

How many files (of what sizes) does each sub-recipe currently have?

As a general guideline, I'd say that if we can reduce the number of files by at least 5x without pushing per-file sizes over 10 GB, it's worth a shot.
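
(For illustration, a "fewer, larger files" recipe might define its pattern along these lines; the URL template, year, and chunking below are placeholders, not the actual feedstock code.)

    from pangeo_forge_recipes.patterns import ConcatDim, FilePattern
    from pangeo_forge_recipes.recipes import XarrayZarrRecipe

    months = list(range(1, 13))  # 12 monthly inputs instead of ~365 daily ones


    def make_url(time):
        # hypothetical monthly naming scheme on the source server
        return f"https://example.org/data/monthly/2010-{time:02d}.nc"


    # Far fewer inputs means far fewer concurrent requests against the source server.
    pattern = FilePattern(make_url, ConcatDim("time", keys=months))
    recipe = XarrayZarrRecipe(pattern, target_chunks={"time": 24})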

@auraoupa commented:

Thanks @cisaacstern for the suggestions! I will make monthly files instead of daily ones, and maybe submit a new recipe with only one dataset at a time, which will be 12 files (instead of 3 × 365 files) of around 8 GB each!

@cisaacstern (Member) commented:

@auraoupa, sounds good... this could conceivably work!

Perhaps this is clear, but in case not, please make the PR as an edit to the file feedstock/recipe.py in this repo (not to staged-recipes).

@auraoupa commented Jan 3, 2023

Hi @cisaacstern, I hope you had a nice end of the year, and I wish you the best for 2023! I rewrote the recipe in pull request #6 so we can try with fewer files at a time; could you please merge it? Thanks!

@auraoupa commented:

Hi @cisaacstern! Last try for this recipe in pull request #6: this time the files are smaller than 2 GB each and there are 73 of them.
