File upload wizard #346

Closed
wants to merge 1,504 commits into from

rija
Contributor

@rija rija commented Sep 24, 2019

Pull request for issues:

  1. #146: User stories, GitHub issues, Acceptance tests and dependencies (rija/gigadb-website#146)
  2. #147: Prototype of client/server file upload and software architecture blueprint (rija/gigadb-website#147)
  3. #148: Setup upload server and ftp servers in Docker Compose and for CI/CD (rija/gigadb-website#148)
  4. #151: Build admin function to create restricted access: filesystem directories & security token (rija/gigadb-website#151)
  5. #152: Build secure and reliable upload server based on validated prototype (rija/gigadb-website#152)
  6. #153: Add remaining time estimate for file uploads (rija/gigadb-website#163)
  7. #165: Add FTP server interface to the Drop-Off area (rija/gigadb-website#165)

This is a pull request for the following functionalities:

File Upload Wizard FileDrop administration API and a prototype using it:

  • Create or Delete a FileDrop account and filesystem for a specific dataset through a REST API.
  • JSON Web Token (JWT) based authentication for accessing the REST API
  • The FileDrop account's filesystem can be interacted with through a TUSd file upload server and an FTP server
  • A watcher service (based on Linux's inotify) moves files around the filesystem from the drop-off area to a download area (from which the FTP links used in the mockup page are built)
  • The initial prototype has been updated to use the API and to be optionally deployable from within the project
  • The File Upload Wizard is a standalone, full-stack web application built out of the Yii 2 Advanced template, but it is still accessed through the GigaDB website's URL space and shares the same database server and code repository
  • Nginx setup was updated to proxy multiple web apps and other services (Tusd). See ops/configuration/nginx-conf/sites
  • Nginx setup was updated to make it easy to disable/enable specific webapps (used for feature flagging the prototype). See ops/configuration/nginx-conf/enable_sites
  • Significant changes to the CI/CD scripts for deploying multiple webapps and for improved operational safety
  • Added TUSd service as container
  • Added Pure-FTPD service as container
  • Added a docker-inotify-command service as container
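
To make the watcher item above more concrete, here is a minimal sketch of the idea, not the code in this PR. The PR runs the watcher through the docker-inotify-command container; the sketch instead assumes `inotifywait` from inotify-tools is installed, and the function names and directory layout are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of a drop-off watcher. Paths and function names are
# illustrative only; the PR itself uses the docker-inotify-command container.

# Move one fully written file from the drop-off area to the download area.
move_to_download() {
    src="$1"
    download_dir="$2"
    mkdir -p "$download_dir"
    mv -- "$src" "$download_dir/"
}

# Watch a drop-off directory and move each file once its writer closes it.
# Assumes inotifywait (inotify-tools) is available; Linux only.
watch_dropoff() {
    dropoff_dir="$1"
    download_dir="$2"
    inotifywait -m -q -e close_write --format '%w%f' "$dropoff_dir" |
    while IFS= read -r path; do
        move_to_download "$path" "$download_dir"
    done
}
```

Something like `watch_dropoff /path/to/dropoff /path/to/download` would then block and move files as uploads complete. Reacting to `close_write` rather than `create` is a deliberate choice in the sketch, so that a file is not moved while it is still being written.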

This PR is meant to lay down infrastructure changes needed to implement File Upload Wizard, so no visible business features are delivered here but the prototype was updated to use the API and to be deployable at /proto/.

Both the GoogleDocs spreadsheet for tasks and my Git project board should be up-to-date.

Changes to the database schema

No change to the gigadb database, but there is a new database, fuwdb, to support the File Upload Wizard web application's functions.

fuwdb's schema is almost entirely built out of Yii2 migrations in fuw/app/console/migrations.

Only the user table is created by the Yii2 Advanced Template's Composer script.

Changes to the model classes

No change to GigaDb model classes.
There is a new model layer in the File Upload Wizard webapp, divided into three categories:

  • Backend models (fuw/app/backend/models/FiledropAccount.php and fuw/app/backend/models/DockerManager.php): admin functions, used for admin management of Filedrop accounts
  • Frontend models: user facing for future integration between GigaDB UX and FUW's REST API, currently just scaffolded code from Yii2 Advanced template
  • Common models: model and config used by backend and frontend. fuw/app/common/models/User.php is used for the JSON Web Token (JWT) based authentication to the REST API.

Changes to the controller classes

No change to GigaDB controller classes.
There is a new REST controller in File Upload Wizard for creating and deleting filedrop accounts fuw/app/backend/controllers/FiledropAccountController.php.
That controller, which corresponds to the FiledropAccount.php model, was automatically scaffolded using Gii.

However, I've added a custom fuw/app/backend/actions/FiledropAccountController/DeleteAction.php action so that the DELETE command marks the model with a special status instead of deleting it.

Changes to the view template files

N/A

Changes to the style

N/A

Changes to the tests

In GigaDB:

The test setup for GigaDB was updated to fix an inconsistent and fragile approach to loading and restoring database data, and to work in the context of multi-webapp testing. (relevant commit)

Also, Behat acceptance tests now use profiles instead of tags for configuration. See behat.yml.
Acceptance tests are therefore run with a profile as an argument, e.g.:

$ docker-compose run --rm test bin/behat --profile local -v --stop-on-failure

The 'cli' profile is for the GitLab CI environment.

In File Upload Wizard

The File Upload Wizard has unit tests and functional tests. No business functionality has been completed yet, so no acceptance tests have been added yet.

The tests are located in:

  • fuw/app/backend/tests/
  • fuw/app/common/tests/
  • fuw/app/frontend/tests/

Codeception is set up in the File Upload Wizard directory as a test runner and test framework for unit, functional and acceptance tests (and it can support Behat features). It is integrated, extensible, up-to-date and maintained, and it works with the most recent and future versions of the underlying technologies (PHP, PHPUnit, Yii2, Selenium).

The config files relevant to Codeception are:

  • fuw/app/backend/codeception.yml
  • fuw/app/common/codeception.yml
  • fuw/app/frontend/codeception.yml
  • fuw/app/codeception.yml

More info on testing is included in docs/fuw/developer_guide.md.

Changes to the provisioning

The overall flow and technologies haven't changed. The following changes have been made to accommodate CI and CD of a multi-webapp project or to fix structural weaknesses in the CI/CD process:

  • Build manifests (Dockerfile and Production-Dockerfile) should be part of each webapp directory space, but service composition (docker-compose.*.yml), appliance provisioning (Terraform) and system configuration (Ansible) should stay centralized
  • It is dangerous to only have one Terraform state for all environments as a provisioning mistake can take down the wrong environment (e.g: production), so now there is a distinct Terraform state for each environment represented by a separate environment directory in ops/infrastructure/envs
  • Previously, provisioning a new environment required duplicating Terraform and Ansible code. Now they are modularized with the bulk of the code (Ansible roles and Terraform modules) stored outside the environment directory because they are environment-agnostic.
  • The Ansible playbook and variables and Terraform main script and variables are in the environment-specific directory in ops/infrastructure/envs

Furthermore, the provisioning for GigaDB has been upgraded to Debian 9 Stretch and the database server has been upgraded to PostgreSQL 9.6.

Finally, there are updates to the GitLab CI config .gitlab-ci.yml:

  • distinct stages to deploy on staging for GigaDB only (deploy_gigadb), GigaDB and File Upload Wizard API (deploy_apps), or GigaDB with File Upload Wizard API and the prototype (deploy_proto)
  • It is no longer necessary to have a separate stage for initial deployment of Let's Encrypt certificate.

Changes to the documentation

  • The CI/CD docs have been updated in light of changes in provisioning
  • There's a new docs/fuw directory with documentation relating to File Upload Wizard (with the quickstart guide in docs/fuw/index.md)
  • I now use mkdocs to create a doc server and to build a doc web site easily (configured with mkdocs.yml at the root of the project)
  • There's a docs/index.md with instructions for using mkdocs and to serve as index page for doc server/website.
  • updated README.md and docs/INSTALL.md to reflect the change in starting and testing the GigaDB webapp

Security related changes

Bind-mounting the Docker daemon's unix socket inside a container is a security vulnerability, so to allow inter-container communication, File Upload Wizard uses the Docker-PHP library, which talks to the Docker daemon API instead. See DockerManager.php for the current implementation.
(I haven't yet changed the GigaDB acceptance tests, which still use the unix socket approach.)
However, on the Mac, the current version of Docker Desktop doesn't expose the API on a TCP port, so when working on File Upload Wizard, we need to manually expose the API on a TCP port during development. See docs/fuw/index.md

@pli888
Member

pli888 commented Nov 26, 2019

When trying to get #346 to work in my dev environment, I get a couple of problems when following the instructions in README.md:

  1. Directly after docker-compose run --rm config, running docker-compose run --rm less to generate site.css fails for me:
$ docker-compose run --rm less
Pulling less (deployment_application:latest)...
ERROR: pull access denied for deployment_application, repository does not exist or may require 'docker login'

Running docker-compose run --rm gigadb first then allows docker-compose run --rm less to be successfully executed.

  2. Executing docker-compose run --rm gigadb also failed for me at first with this message:
  [Composer\Json\JsonValidationException]                     
  "./composer.json" does not match the expected JSON schema:  
   - type : Does not match the regex pattern ^[a-z0-9-]+$  

Seems to be caused by a problem in composer.json with a type object value containing capital letters and spaces. Changing the type value to something like yii-website (via ops/configuration/php-conf/composer.json.dist) allows docker-compose run --rm gigadb to then work.

@pli888
Member

pli888 commented Nov 26, 2019

When running unit tests, I got an error message:

$ docker-compose run --rm test ./bin/phpunit --testsuite unit --bootstrap protected/tests/unit_bootstrap.php --verbose --configuration protected/tests/phpunit.xml --no-coverage

Package postgresql-client-9.4 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'postgresql-client-9.4' has no installation candidate
ERROR: Service 'test' failed to build: The command '/bin/sh -c if [ ${INSTALL_PG_CLIENT} = true ]; then     mkdir -p /usr/share/man/man1 &&     mkdir -p /usr/share/man/man7 &&     apt-get update -yq &&     apt-get install -y postgresql-client-${PG_CLIENT_VERSION} ;fi' returned a non-zero code: 100

The problem seems to be caused by ops/packaging/Dockerfile on line 80. For the test container, INSTALL_PG_CLIENT=true (but not for application container) and POSTGRES_VERSION=9.4 from .env file. If changed to POSTGRES_VERSION=9.6 in .env file then this allows the test container to build and pass all unit tests.

@rija
Contributor Author

rija commented Nov 26, 2019

Thanks for the feedback, I’ll have a look

@pli888
Member

pli888 commented Nov 26, 2019

On running the acceptance tests, the redirect test fails:

@ok @javascript
  Scenario: redirect                                                                        # features/dataset-admin.feature:76
    Given Gigadb web site is loaded with "gigadb_testdata.pgdmp" data                       # GigadbWebsiteContext::gigadbWebSiteIsLoadedWithData()
      │ Initializing the database with gigadb_testdata.pgdmp... Terminating DB Backend...
      │ Recreating database gigadb...
      │ Restarting php container for deployment project...
      │ 
    Given I sign in as an admin                                                             # GigadbWebsiteContext::iSignInAsAnAdmin()
    And I am on "/adminDataset/update/id/210"                                               # Behat\MinkExtension\Context\MinkContext::visit()
    When I fill in "urltoredirect" with "http://gigadb.dev/dataset/100002/token/ban74hsfds" # Behat\MinkExtension\Context\MinkContext::fillField()
    And I press "Save"                                                                      # Behat\MinkExtension\Context\MinkContext::pressButton()
    And I go to "/dataset/100002/token/ban74hsfds"                                          # Behat\MinkExtension\Context\MinkContext::visit()
    Then the url should be "/dataset/100002/token/ban74hsfds"                               # DatasetAdminContext::theUrlShouldBe()
    And I wait "20" seconds                                                                 # ClaimDatasetContext::iWaitSeconds()
    And the url should be "/dataset/100002"                                                 # DatasetAdminContext::theUrlShouldBe()
      Failed asserting that two strings are equal.
      --- Expected
      +++ Actual
      @@ @@
      -'/dataset/100002'
      +'/dataset/100002/token/ban74hsfds'
    │
    │  http://gigadb.dev/dataset/100002/token/ban74hsfds
    │
    └─ @AfterStep # GigadbWebsiteContext::debugStep()

--- Failed scenarios:

    features/dataset-admin.feature:76

43 scenarios (42 passed, 1 failed)
430 steps (429 passed, 1 failed)

This redirect also doesn't happen when I test it using http://gigadb.gigasciencejournal.com:9170/dataset/100002/token/ban74hsfds - the browser stays on this URL.

@pli888
Member

pli888 commented Nov 26, 2019

When running the CI/CD pipeline, the run_all_tests job fails with a message ERROR: Job failed: exit code 1. Looking at the pipeline log for this job, there are problems with all 12 tests in FiledropAccountTest.php. All tests have the same [yii\db\Exception] SQLSTATE[08006] error, for example:

1) FiledropAccountTest: Can create writable directories
 Test  tests/unit/FiledropAccountTest.php:testCanCreateWritableDirectories

  [yii\db\Exception] SQLSTATE[08006] [7] could not translate host name "dbname=" to address: Name or service not known

#1  /app/vendor/yiisoft/yii2/db/Connection.php:635
#2  /app/vendor/yiisoft/yii2/db/Connection.php:1015
#3  /app/vendor/yiisoft/yii2/db/Connection.php:1002
#4  /app/vendor/yiisoft/yii2/db/Schema.php:462
#5  /app/vendor/yiisoft/yii2/db/Connection.php:894
#6  /app/vendor/yiisoft/yii2/db/Command.php:209
#7  /app/vendor/yiisoft/yii2/db/Command.php:1115
#8  /app/vendor/yiisoft/yii2/db/Command.php:1136
#9  /app/vendor/yiisoft/yii2/db/Command.php:442
#10 /app/vendor/yiisoft/yii2/db/pgsql/Schema.php:182

@rija
Contributor Author

rija commented Nov 27, 2019

Hi @pli888 ,

Regarding the composer error, I just saw that you made a corresponding fix to the develop branch 3 weeks ago. I'll rebase the PR branch for that and to ensure I haven't missed any other changes.

Regarding the PostgreSQL 9.6 version, I mentioned it in the PR, but I should have explicitly warned that the version in the .env file needs changing, and I should have updated the env-sample file as well.

I'll do all of the above now, tweak the README, and then look at the acceptance test issues you are seeing (I didn't have those problems, so my install must rely on some assumptions).

Rija Menage added 22 commits November 27, 2019 16:00
before restoring the original database after running acceptance tests,
we need to drop the database because some of the test database dumps loaded
during the test run have an out-dated schema that causes the post-tests
pg_restore process to fail. Because all clients need to be
disconnected before dropping the database, the drop and restore has to be done
as an AfterSuite hook in Behat, while we still have a chance
to kill connections and restart the php container. It's also more elegant
than doing it in the bash script.
not really needed as it was there for debug
and it has been removed from Debian 9 causing that step to fail
Gigadb web site can run more than one PHP webapp behind the Nginx web site.
Only one Docker Compose file is used to select the webapps to start and their dependencies.

There is also integrated test runners across all web apps for unit tests and functional tests

There is one Postgresql server holding all the necessary database schema used by the PHP webapps.

Each new and future webapp has its own directory, which should match the Yii2 advanced template
and can be made of separate sub-webapps as well. A Dockerfile and test directories for each webapp/sub-webapp should be kept in
each webapp directory, but the provisioning, CI/CD setup, and integrated test runners are centralized.

functional_custom_bootstrap.php:

functional tests need the main database to have certain test data, but at the same time they need to back up the current database
before running and restore it after running, hence the need for a special bootstrap file "functional_custom_bootstrap.php"
for Gigadb's webapp functional tests.

Previously it wasn't needed because these hooks were run in the overall test runner that ran all the suites for the only webapp, but now we need to be flexible with each suite and run each suite for all webapps.

Since webapps may have a distinct database schema, such hooks should be within a webapp's test directories.

enable_sites:

"enable_sites" is a shell script (using Alpine's ash shell) that symlinks nginx server configs from the sites-available directory to the sites-enabled directory. Until they appear in the latter, nginx won't be able to serve them.
It allows Docker Compose to start nginx for a selected list of webapps.
That and the integrated test runner make it easier to configure new webapps.
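
As a rough illustration of the symlinking approach described in this commit message, here is a minimal sketch, not the actual enable_sites script. The function signature, the `.conf` suffix and the example directory paths are assumptions made for the example; it sticks to POSIX sh so that it should also run under Alpine's ash.

```shell
#!/bin/sh
# Hypothetical sketch of an enable_sites-style helper (POSIX sh).
# Directory names and the ".conf" suffix are assumptions for illustration.

enable_sites() {
    available_dir="$1"   # absolute path, e.g. /etc/nginx/sites-available
    enabled_dir="$2"     # absolute path, e.g. /etc/nginx/sites-enabled
    shift 2

    mkdir -p "$enabled_dir"
    for site in "$@"; do
        conf="$available_dir/$site.conf"
        if [ -f "$conf" ]; then
            # -s: symbolic link, -f: replace a stale link from a previous run
            ln -sf "$conf" "$enabled_dir/$site.conf"
        else
            echo "enable_sites: no config for '$site' in $available_dir" >&2
            return 1
        fi
    done
}
```

A call such as `enable_sites /etc/nginx/sites-available /etc/nginx/sites-enabled gigadb` would then expose only the listed webapps to nginx; sites left out of the argument list stay unlinked, which is what makes this usable as a feature flag.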
Rija Menage added 28 commits November 27, 2019 18:01
there are now three stages to choose from for deploying to staging:
deploy_gigadb: deploy regular gigadb website with nothing related to File Upload Wizard
deploy_apps: deploy GigaDB website and the File Upload Wizard API
deploy_proto: deploy GigaDB website and the File Upload Wizard API and the prototype

missed a step for deploying GigaDB: docker-compose run --rm less

missed a step for deploying File Upload Wizard: docker-compose run --rm fuw-config
to avoid otherwise successful tests returning a non-zero exit status when
the File Upload Wizard containers are not running, the test runner only
runs the FUW tests if the console container is running, and the success status of the GigaDB
test run is preserved and used as the exit status at the end
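
The exit-status handling described in this commit message could look roughly like the sketch below. It is not the project's test runner: the `run_suites` function and its three command arguments are placeholders, and returning the FUW status when the GigaDB run succeeded is my own assumption about the desired behaviour.

```shell
#!/bin/sh
# Hypothetical sketch: run the GigaDB suite, run the FUW suite only when its
# console container is up, and keep the GigaDB status as the final result.
# The three command arguments are placeholders, not the real runner commands.

run_suites() {
    gigadb_cmd="$1"      # command that runs the GigaDB tests
    fuw_is_up_cmd="$2"   # command that succeeds only if the FUW console container runs
    fuw_cmd="$3"         # command that runs the FUW tests

    # Record the GigaDB status without aborting, even under "set -e".
    gigadb_status=0
    sh -c "$gigadb_cmd" || gigadb_status=$?

    fuw_status=0
    if sh -c "$fuw_is_up_cmd" >/dev/null 2>&1; then
        sh -c "$fuw_cmd" || fuw_status=$?
    else
        echo "FUW console container not running, skipping FUW tests" >&2
    fi

    # A GigaDB failure wins; otherwise report whatever the FUW run returned
    # (0 when the FUW tests were skipped).
    if [ "$gigadb_status" -ne 0 ]; then
        return "$gigadb_status"
    fi
    return "$fuw_status"
}
```

The point of capturing `gigadb_status` before the FUW check is that a skipped FUW run can never overwrite a GigaDB failure or turn a GigaDB success into a failure.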
By versioning the host data directory used by postgresql container,
it will enable data migration when upgrading between versions of postgresql
with breaking data format changes.

Then a tool like this can be used:

https://github.com/tianon/docker-postgres-upgrade

Fixed gigadb_testdata.pgdmp to preserve the changes rebased from develop while
being in PostgreSQL 9.6 format (and removed the step in the acceptance scenarios
that was a fix no longer necessary)
This looks like an old bug rather than a regression. It may happen on production too.

There was a framework level validation limiting max characters to 50 only.
The acceptance tests didn't fail and pick that up because the url used in the scenario
was unfortunately exactly 50 characters.

Fixed the test scenario and the validation to match the database column max character.
This looks like an old bug rather than a regression. It may happen on production too.

It is connected to the bug of the redirect not working with long urls:
the test for this didn't fail because of the same root cause as the redirect issue (see previous commit),
but this time the dodgy characters entered in the test scenario were longer than the artificially limited field length,
so the dodgy keyword was indeed rejected, but for the wrong reason.

Added proper validation and unit tests. The acceptance scenario for this feature doesn't need fixing.
@rija
Contributor Author

rija commented Dec 5, 2019

Hi @pli888,

Feel free to try running this PR again, I made a lot of fixes (see list below).

All test suites are passing locally on my machine and in the CI:
https://gitlab.com/gigascience/forks/rija-gigadb-website/pipelines/100809840

I've managed to reproduce the issue you had with the redirect test. It led to my commits c674b9c and 9fd113b below, which fix old bugs rather than regressions; those bugs may also happen on live.

4b4b5caa  Thu Dec 5 11:47:33 2019 +0800   Remove the README section on socat as it is documented in FUW docs
c674b9c8  Thu Dec 5 10:59:36 2019 +0800   Fix issue with keywords validation in Dataset attributes
9fd113be  Thu Dec 5 09:16:28 2019 +0800   Fix dataset redirect link not functioning with long urls
035f988c  Wed Dec 4 14:53:21 2019 +0800   Fix more errors in docs, improve error messages in config generation
bd5dfa35  Tue Dec 3 22:36:22 2019 +0800   Update the configuration samples to reflect postgresql version and FUW
10fc65b5  Tue Dec 3 22:10:22 2019 +0800   Update File Upload Wizard docs to be more coherent
2631fd69  Mon Dec 2 17:55:47 2019 +0800   Update README to mention the usage for --build argument
3ec274f9  Mon Dec 2 17:07:36 2019 +0800   Upgrade test db dump to 9.6 and version postgres data directory on host
3c7fdd8f  Fri Nov 29 15:23:36 2019 +0800  Remove some debug logging in CI
382f2ae8  Fri Nov 29 14:56:01 2019 +0800  Change acceptance profile label for CI to do some JS testing
354b16aa  Fri Nov 29 14:45:08 2019 +0800  Update FUW docs for consistency with README and fix typo in test runner
270acc22  Fri Nov 29 12:40:01 2019 +0800  Port Peter's fix to composer error
5216ea98  Fri Nov 29 12:23:12 2019 +0800  Fix a typo in a target action in an ajax call
c83dedf1  Fri Nov 29 10:42:23 2019 +0800  Change README with updated instructions for starting and testing GigaDB
cbc5b42e  Fri Nov 29 10:36:34 2019 +0800  Update test runners to exit successfully if File Upload Wizard not up
fa23980c  Thu Nov 28 20:11:46 2019 +0800  Merge branch 'file-upload-wizard' of github.com:rija/gigadb-website into file-upload-wizard

@pli888 pli888 closed this Apr 1, 2020