Plan for decoupling velut from Excel

For context, see this project’s readme. If none of this makes sense to you, I apologize.

I’ve already made several webpages with vanilla HTML/CSS/JavaScript that I use to manage velut, in addition to the gigantic Excel file. I’ll probably be making more webpages and Node scripts to fulfil the steps below. The steps are (mostly) in chronological order.

A diagram of the new architecture is after the plan.

More words and statistics are on a dedicated de-Excellation page of the velut website.

Port velut to Next.js. Completed 2022-07-23.
Write this plan for how to finish the de-Excellation of velut. Done 2022-07-23, though I have added to the plan since.
Decide whether to cancel my Render subscription for serving the MERN version. Subscription suspended 2022-09-10.
Write a blogpost about how I ported velut to Next.js. This isn’t urgent, but I should write it before I forget what I did! Eventually completed 2023-01-28.
Make a webpage (or similar) that replaces the Excel sheet wordsform, which generates the phonetic data (etc) for each word. (Top-right in the Excel screenshot in the readme.) It doesn’t need database access. Word Data Generator completed 2022-09-29.
For each lemma in velut, generate the list of forms already in velut. This should be something that can be repeated easily whenever I add to the Excel file. A webpage that has <textarea>s for relevant data would suffice. Name it the Forms Collator (or something better). Forms Collator completed 2022-10-08.
Export the manually-entered data of lemmata into Json. (This is the Excel sheet bottom-left in the screenshot.) Done 2022-10-09.
Stop adding words to the Excel file. I haven’t touched the Excel file since 2022-10-09.
Write something that generates an empty list for each lemma. Name it the Inflector. Done 2022-10-15
Write tests that compare the output of the Forms Collator (the lists of forms already in velut) to the output of the Inflector (a set of empty lists, at this point in time). The tests will fail for all lemmata. Done 2022-10-15
Make the Inflector return the lemma for conjunctions and prepositions, which will make some tests pass. Done 2022-10-15
Make the Inflector return the forms for each lemma as an object containing all parsing data. I don’t want an array of simple strings (["amō","amās","amat",...]) for each lemma, but something more like this (for a verb):


          {
            "unencliticized": {
              "indicative": {
                "active": {
                  "present": {
                    "singular": {
                      "1st": ["amō"],
                      "2nd": ["amās"],
                      "3rd": ["amat"]
                    }
                  }
                }
              }
            }
          }

Done 2022-10-15

Make sure tests can handle what the Inflector generates. If the Forms Collator gives ["amō","amās","amat"] and the Inflector gives the object above, the tests should pass because the forms are the same. Order does not matter. Done 2022-10-15
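
As a rough illustration of the comparison described above, the nested object could be flattened into a set of forms before being checked against the flat list. This is only a sketch, not the actual test code: the function names `flattenForms` and `haveSameForms` are hypothetical, and it assumes every leaf of the object is an array of strings.

```javascript
// Hypothetical helper: recursively collect every string inside the nested object.
function flattenForms(node) {
  if (typeof node === "string") return [node];
  if (Array.isArray(node)) return node.flatMap(flattenForms);
  if (node && typeof node === "object") {
    return Object.values(node).flatMap(flattenForms);
  }
  return [];
}

// Hypothetical helper: true if both inputs contain the same forms,
// ignoring order and duplicates.
function haveSameForms(collatedForms, inflectedForms) {
  const expected = new Set(collatedForms);
  const actual = new Set(flattenForms(inflectedForms));
  return (
    expected.size === actual.size &&
    [...expected].every((form) => actual.has(form))
  );
}

const inflected = {
  unencliticized: {
    indicative: {
      active: {
        present: {
          singular: { "1st": ["amō"], "2nd": ["amās"], "3rd": ["amat"] },
        },
      },
    },
  },
};

// The flat list is deliberately in a different order.
console.log(haveSameForms(["amat", "amō", "amās"], inflected)); // true
console.log(haveSameForms(["amō", "amās"], inflected)); // false
```

Comparing sets rather than arrays is what makes the order irrelevant; it also means a form that appears in several places in the object is only counted once.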
Handling for enclitics — make it so that all forms (where relevant) generated by the Inflector have encliticized forms as well as the unencliticized form. The enclitics in Latin are ‘-que’, ‘-ne’, and ‘-ve’. (Exemptions include conjunctions and lemmata already ending in “-que” such as ‘quisque’. The word ‘ūsquene’ is attested, however.) Done 2022-10-15
Add whatever special cases need to be added to make all the tests pass for conjunctions and prepositions. (Because some conjunctions/prepositions have forms other than the lemma.) Done 2022-10-15
Make the Inflector generate the positive/comparative/superlative forms for adverbs. (Some adverbs will need to be marked as not having comparative/superlatives. There may be other special cases too.) Done 2022-10-30
Make the Inflector generate the forms for adjectives. (Some adjectives will need to be marked as not having comparative/superlatives/etc. There may be other special cases too.) Done 2022-11-13, though I’m temporarily excluding comparatives/superlatives that are not already in velut.
Continue for pronouns, nouns (including proper nouns), and verbs. Pronouns finished 2022-11-18. Non-proper nouns finished 2022-12-10, though I cut a couple of corners with third-declension I-stem nouns. Proper nouns finished 2022-12-30. Verbs finished 2023-04-30. Then I returned to the third declension. This step finished 2023-05-08.
Eventually the list of words in Excel will match that generated by the Inflector, or at least be a subset thereof. (Order does not matter.) Tests will pass. The Inflector generates all the forms that were in Excel, as of when I completed the previous step on 2023-05-08.
Create a local version of the MongoDB database for use in development. So far I’ve been running my development server off the production database, which has been perfectly fine — but it makes sense to have a separate database at this step. Completed 2023-05-20.
Make the parsing data generated by the Inflector get merged into the Json file for the lemmata MongoDB collection. Every lemma will have the Lemma, PartOfSpeech, Meanings (etc) fields that it currently has, but also a Forms field that is the object of parsing data. Completed 2023-05-20.
Replace the lemmata collection in the local MongoDB with the output of the previous step. Completed 2023-05-20.
Add inflection-tables locally on the front-end, using the output of the Inflector in the local database. Completed 2023-05-28.
An example from Wiktionary:
Ensure I can push the inflection-table work to production and not see the inflection-tables on the live website. The flat list of forms will continue to be shown if a lemma doesn’t have the Forms field. True as of 2023-05-28.
Decide whether inflection-tables should include encliticized forms. True as of 2023-05-28 — I’ve made a nice tabs component to switch between enclitics.
Confirm that, if unencliticized forms are generated that are not in Excel, they are all worthy of being added to velut, for some parts of speech. Seeing the forms in the tables will help with this. Proper nouns finished 2023-09-16. Conjunctions finished 2023-09-17. Pronouns finished 2023-09-23.
Create a page on the velut website that shows my progress in checking the output of the Inflector. (This and the steps below that relate to an environment variable were not in my original plan, but I decided to do them after I finished writing the Inflector and started checking its output. It will take a while for me to confirm the forms for all lemmata, and there’s not much point in waiting so long before displaying the inflection-tables for some lemmata.) Done 2023-08-05 — here’s the page that explains about the Inflector.
Create an environment variable to control which lemmata can be displayed with generated forms — set it to all parts of speech locally, but only proper nouns, conjunctions, and pronouns for the live website. Done 2023-09-30.
Replace the lemmata collection in production with the contents of the local lemmata collection, which includes generated forms. This will make inflection-tables appear for some lemmata on the live website. Done 2023-09-30.
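
A minimal sketch of how such a setting might gate the inflection-tables follows. The comma-separated format, the helper names, and the part-of-speech labels are all assumptions made for illustration; only the `Lemma`, `PartOfSpeech`, and `Forms` field names come from this plan.

```javascript
// Hypothetical helper: turn a comma-separated setting into a set of
// lower-cased part-of-speech names.
function parseAllowedPartsOfSpeech(settingValue) {
  return new Set(
    (settingValue ?? "")
      .split(",")
      .map((name) => name.trim().toLowerCase())
      .filter((name) => name.length > 0)
  );
}

// Hypothetical helper: a lemma gets an inflection-table only if it has
// generated forms and its part of speech is allowed in this environment.
// Otherwise the flat list of forms would continue to be shown.
function shouldShowInflectionTable(lemma, allowedPartsOfSpeech) {
  const hasForms = lemma.Forms !== undefined && lemma.Forms !== null;
  const partOfSpeech = String(lemma.PartOfSpeech ?? "").toLowerCase();
  return hasForms && allowedPartsOfSpeech.has(partOfSpeech);
}

// In a real deployment the string would come from an environment variable
// (its name is not given here); a literal keeps the sketch runnable.
const liveSetting = "Proper noun, Conjunction, Pronoun";
const allowed = parseAllowedPartsOfSpeech(liveSetting);

const pronoun = { Lemma: "quisque", PartOfSpeech: "Pronoun", Forms: {} };
const verb = { Lemma: "amō", PartOfSpeech: "Verb", Forms: {} };
const withoutForms = { Lemma: "et", PartOfSpeech: "Conjunction" };

console.log(shouldShowInflectionTable(pronoun, allowed)); // true
console.log(shouldShowInflectionTable(verb, allowed)); // false
console.log(shouldShowInflectionTable(withoutForms, allowed)); // false
```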
Confirm that, if unencliticized forms are generated that are not in Excel, they are all worthy of being added to velut, for all parts of speech. When each part of speech has been checked, update both the lemmata collection and the environment variable in production, so the forms appear. Nouns finished 2024-01-14. Prepositions finished 2024-01-27. Interjections finished 2024-01-28. Adverbs finished 2024-02-17. Third-declension adjectives finished 2024-04-12. Other adjectives finished 2024-07-20. Deponent 1st-conjugation verbs finished 2024-10-17. Deponent 2nd-conjugation verbs finished 2024-10-20. Other deponent verbs finished 2024-10-27. Semi-deponent verbs finished 2024-11-03. All remaining lemmata are non-deponent verbs. See the webpage mentioned earlier for where I’m at in checking the output of the Inflector.

Note from 2024-10-18 — Because all the lemmata I have left to check are verbs, and going through all verbs will take a while, I have allowed verbs that have been checked to have their inflection-tables displayed on the live website. For other verbs, the list of forms from Excel will continue to be shown. The environment variable is still useful for documenting which parts of speech I’ve finished checking.
Consider removing the environment variable that controls which lemmata have inflection-tables — at this point, all lemmata should have them, even on the live website.
Implement programmatic handling of ambiguously stressed forms.

What do I mean by this?

Some pairs of lemmata have particular forms that are identical except for the stress. For example, ‘dominus’ “lord” and ‘dominium’ “banquet” both have a genitive singular ‘dominī’, stressed on the first syllable for “of the lord” and on the second for “of the banquet”. In velut, I differentiate between the two by putting an acute accent on ‘domínī’ “of the banquet”.

(I don’t put the acute on words that don’t need it because they don’t match a word with a different stress. Eg, words like ‘imperī’, the genitive singular of ‘imperium’, have the stress on the penultimate syllable just like ‘domínī’, but there’s no other way of stressing ‘imperī’, so I forgo the accent.)

Currently, I apply the acute accent manually… if I notice that forms can coincide like this. I’d like to have an automated solution.
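
One possible shape for the detection half of such a solution is sketched below. It assumes, hypothetically, that each generated form already comes with the position of its stressed syllable counted from the end of the word (2 for the penultimate, 3 for the antepenultimate); working out that position is not shown. The function and property names are invented for illustration.

```javascript
// Hypothetical helper: return the spellings that occur with more than one
// stress position, which are the ones that would need an acute accent.
function findAmbiguouslyStressedForms(entries) {
  const stressesByForm = new Map();
  for (const { form, stressedSyllableFromEnd } of entries) {
    if (!stressesByForm.has(form)) stressesByForm.set(form, new Set());
    stressesByForm.get(form).add(stressedSyllableFromEnd);
  }
  return [...stressesByForm]
    .filter(([, stresses]) => stresses.size > 1)
    .map(([form]) => form);
}

const entries = [
  // "of the lord": stressed on the first syllable (antepenultimate).
  { form: "dominī", lemma: "dominus", stressedSyllableFromEnd: 3 },
  // "of the banquet": stressed on the second syllable (penultimate).
  { form: "dominī", lemma: "dominium", stressedSyllableFromEnd: 2 },
  // Only one way of stressing this, so it is not flagged.
  { form: "imperī", lemma: "imperium", stressedSyllableFromEnd: 2 },
];

console.log(findAmbiguouslyStressedForms(entries)); // [ 'dominī' ]
```

This only flags the colliding spellings; deciding which of the colliding forms receives the acute would still need a rule on top.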
Check that the previous step hasn’t changed any of the data, beyond correcting any accenting mistakes that may exist.
Refactor code for the Inflector.
Check that the previous step hasn’t changed any of the data.
Create a script that reads the parsing data that was generated by the Inflector, and condenses them into the simple “word and its lemmata” format that the Word Data Generator requires. (This is like the reverse of the Forms Collator: a Lemmata Collator.)
Run the output of the Lemmata Collator through the Word Data Generator.
(At this point I could also paste the output of the Lemmata Collator back into the wordsform Excel sheet that the Word Data Generator replaces. Excel would probably crash though.)
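
The core of the Lemmata Collator step described above could look something like the following sketch, which inverts "lemma with its forms" into "word with its lemmata". The output shape (`{ word, lemmata }`), the function names, and the noun's parsing structure are assumptions for illustration; the exact format the Word Data Generator expects is not specified here.

```javascript
// Hypothetical helper: recursively collect every string inside a Forms object.
function collectForms(node) {
  if (typeof node === "string") return [node];
  if (Array.isArray(node)) return node.flatMap(collectForms);
  if (node && typeof node === "object") {
    return Object.values(node).flatMap(collectForms);
  }
  return [];
}

// Hypothetical helper: map each distinct word to the lemmata it belongs to.
function collateLemmata(lemmata) {
  const lemmataByWord = new Map();
  for (const { Lemma, Forms } of lemmata) {
    // A Set avoids listing a lemma twice when one word fills several slots.
    for (const word of new Set(collectForms(Forms))) {
      if (!lemmataByWord.has(word)) lemmataByWord.set(word, []);
      lemmataByWord.get(word).push(Lemma);
    }
  }
  return [...lemmataByWord].map(([word, lemmata]) => ({ word, lemmata }));
}

const lemmataWithForms = [
  {
    Lemma: "amō",
    Forms: {
      unencliticized: {
        indicative: {
          active: { present: { singular: { "1st": ["amō"] } } },
          passive: { present: { singular: { "1st": ["amor"] } } },
        },
      },
    },
  },
  {
    Lemma: "amor",
    // Invented structure for a noun; the real parsing keys may differ.
    Forms: { unencliticized: { singular: { nominative: ["amor"] } } },
  },
];

console.log(collateLemmata(lemmataWithForms));
// ‘amor’ is listed under both lemmata; ‘amō’ only under the verb.
```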
Replace the words collection in MongoDB locally with the output of the Word Data Generator in the previous step.
Perhaps make some quick fixes to the user interface — the number of rhymes given for a word should not be too huge, inflection-tables and lists of forms should not appear on the “English to Latin” page, etc.
Replace the two collections in the production database with the local data. The new words and inflection-tables will now all be on the live website.
Ensure that whatever script I have to update a database updates both databases.

Check that it is relatively easy to add vocabulary to velut. This means:

1. adding a lemma (or several!) to my relevant Json file,
2. getting the inflected forms with the Inflector,
3. passing the forms into the Word Data Generator via the Lemmata Collator,
4. manually checking that the data are correct,
5. importing to MongoDB (both locally and for production), and
6. manually checking that the words look correct on the live website.

Ensure everything in the Excel file exists outside of it.
Evaluate whether I should discard the development database and switch back to using the production database in development.
Evaluate whether I should delete (or repurpose) the page on the website that shows my progress in checking the output of the Inflector.
Evaluate whether the Excel file can be deprecated.
Continue adding words and going through issues. (I have several private Trello boards, including one for velut.)

Diagram of new architecture

Explanation of diagram

The Json of source lemmata data is read by the Inflector, which generates a Json summary of Inflector work, which is imported into the summary MongoDB collection. The Inflector also generates a Json array of lemmata with their forms, which is imported into the lemmata MongoDB collection and is read by the Lemmata Collator. The Lemmata Collator generates a simple list of words with their lemmata, which is read by the Word Data Generator to generate a Json array of word data, which is imported into the words MongoDB collection. The three MongoDB collections are read by the velut website.

(Note that the “summary” of Inflector work is only useful while I’m moving to the new architecture, since it’s used by the de-Excellation page that displays my progress with that. If I delete that page, I can get rid of that branch of the diagram and go back to having only two MongoDB collections.)

Diagram

flowchart TD
      accTitle: New architecture of velut
      accDescr: Text equivalent in explanation above

    A[fa:fa-file Json of source lemmata data] --> |is read by| B(fa:fa-scroll Inflector)
    B --> |generates| C[fa:fa-file Json summary of Inflector work]
    B --> |generates| D[fa:fa-file Json array of lemmata with their forms]
    D --> |is read by| E(fa:fa-scroll Lemmata Collator)
    E --> |generates| F[fa:fa-file simple list of words with their lemmata]
    F --> |is read by| G(fa:fa-scroll Word Data Generator)
    G --> |generates| H[fa:fa-file Json array of word data]
    C --> |is imported into| I[fa:fa-database `summary` MongoDB collection]
    D --> |is imported into| J[fa:fa-database `lemmata` MongoDB collection]
    H --> |is imported into| K[fa:fa-database `words` MongoDB collection]
    I --> |is read by| L[fa:fa-globe velut website]
    J --> |is read by| L
    K --> |is read by| L
