Skip to content

Database Overhaul and Optimization

Compare
Choose a tag to compare
@DnOberon DnOberon released this 19 Oct 16:38
· 2323 commits to master since this release

This is the second major release of the DeepLynx data warehouse. In this release we focused entirely on the database layer, fundamentally changing how we interact with the underlying PostgreSQL database. We also laid the foundation for several future features. The changelist below highlights all changes made as part of this release.

See the attached PowerPoint presentation for motivation behind this release -
Database Overhaul.pptx

  • Completely wiped out all previous migration scripts. This is a breaking change, and it will be tagged as such.
  • Converted all UUID id fields for all database tables into bigserial types. This will allow for better and faster indexing, as well as the ability to scale more gracefully. This will be especially important once we make changes to the nodes/edges table.
  • Made any code changes necessary for supporting the uuid to bigint change. Fortunately, not much needed to change. Javascript cannot natively handle 64bit integers in its number type. Therefore, when working with the new id fields using bigserial/bigint we must continue to treat them as a string or risk loss of fidelity. This is beneficial to the program as a whole as very little changes needed to be made to handle the type change from uuid to bigint
  • Introduced basic inheritance chain in metatypes and metatype relationships. You may now assign a parent/child combination using the *_inheritance join tables - actually incorporating inheritance at a code and HTTP server level will happen AFTER this overhaul has been completed so as to limit the amount of new features in this MR.
  • Created trigger functions for insuring that you cannot add an invalid parent/child pair - e.g where the proposed child is actually a parent further up in the proposed parent's inheritance chain.
  • Created functions for retrieving all keys of a metatype/relationship - both their own keys and inherited keys
  • Created trigger functions for insuring uniqueness in property_name field on metatype/relationship keys even across inherited keys
  • archived field now deleted_at in preparation of doing tuple, or point in time, versioning
  • removed the internal fields for graph and active_graph this is internal only and will not affect outside services. The reasoning and completion of this will be handled in an MR to this branch.
  • Converted all primary keys into bigint - this again didn't require too much code change as we have to represent bigint as strings in javascript due to the lack of ability to represent 64 bit integers in the number type.
  • Paid special attention to data staging, insuring that this was also converted to bigint in order to handle the large amount of incoming data
  • Type mapping transformations now contain the necessary fields for handling edges connecting to nodes based on a composite of their data source id, metatype id, and original id
  • Nodes table converted to be a partitioned by range (created_at) table, created default partition
  • Created trigger function for handling an insert when a node with the same composite id exists, or with the same id. This trigger function sets the old record's deleted_at field to the created_at field of the new record.
  • Converted nodes update and delete functions to work with point in time versioning. Updates now insert a new record and rely on the trigger for setting the old record's deleted_at field to the created_at field of the new/updated record. Deleting a node will now simply set its deleted_at field to the current time.
  • Edges table converted to partion by range (created_at), created default partition.
  • Edge updates and deletes handled the same way as nodes now.
  • Edge table updated to allow us to link edges either by node id, or a composite id consisting of a nodes original_data_id, metatype_id, and data_source_id.
  • Export tables converted to work with the node/edge changes
  • Converted event registration table to use bigint, made code changes to reflect that
  • Migrations for event system included
  • UI updated to work with DB changes, Transformation updated to include new keys for edge transformations