Configuration Over Code

In Other Words: The Importance of a Data Dictionary
Sunday, March 29, 2020

Data are central to everything we do at Plazi, and they are the main focus of (almost) everything I do there: the Zenodeo API and Ocellus, a sample application that uses that API. Both applications are written in JavaScript. While I stay away from frameworks and libraries as much as I can, I do use Fastify for all the REST infrastructure. Fastify draws on both Hapi.js and Express, and like Hapi, whose tagline emphasized configuration over code, it puts configuration ahead of code. I've really come to appreciate that.

In the data pipeline I have set up, the data dictionary plays the central role. In fact, it plays many roles.

                                   +-----------+                             
                                   | Treatment |                             
                                   | Bank XML  |                             
                                   |   dump    |                             
                                   +-----------+                             
                                         |                                   
                                         |                                   
                                         v                                   
                                   +-----------+                             
                                   |  SQL ETL  |                             
                        +--------->|  queries  |------------+                
                        |          +-----------+            |                
                        |                                   v                
+-----------+     +-----------+    +------------+    +------------+          
|Zenodeo API|     | the data  |    |SQL queries |    |Zenodeo SQL |          
|documentati|<----|dictionary |--->|optimization|--->| datastore  |<--+      
+-----------+     +-----------+    +------------+    +------------+   |      
                        |                                   ^         |      
                        |                                   |         |      
                        |          +------------+           |  +------------+
                        |          |  SQL data  |           |  | parameter  |
                        +--------->| retrieval  |-----------+  | validation |
                        |          +------------+              +------------+
                        |                                             ^      
                        |                                             |      
                        |                                             |      
                        |                                      +------------+
                        |                                      |REST queries|
                        +------------------------------------->|            |
                                                               +------------+
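To make the dictionary's role a little more concrete, here is a minimal JavaScript sketch of what a data-dictionary entry might look like and how the load (ETL), index, and retrieval SQL could all be generated from it. The table name `treatments`, its columns, and the `sqlType`, `primaryKey`, and `indexed` flags are hypothetical illustrations, not Zenodeo's actual schema or code.

```javascript
// HYPOTHETICAL data dictionary for a "treatments" table.
// Column names and flags are illustrative assumptions only.
const dataDictionary = {
  treatments: [
    { name: 'treatmentId', sqlType: 'TEXT', primaryKey: true },
    { name: 'treatmentTitle', sqlType: 'TEXT' },
    { name: 'publicationDate', sqlType: 'TEXT', indexed: true },
    { name: 'journalYear', sqlType: 'INTEGER', indexed: true },
  ],
};

// CREATE TABLE statement built from the column definitions.
function createTableSql(table, columns) {
  const cols = columns.map(c =>
    `${c.name} ${c.sqlType}${c.primaryKey ? ' PRIMARY KEY' : ''}`);
  return `CREATE TABLE IF NOT EXISTS ${table} (${cols.join(', ')})`;
}

// Parameterized INSERT used by the load step of the ETL.
function insertSql(table, columns) {
  const names = columns.map(c => c.name);
  const placeholders = names.map(() => '?').join(', ');
  return `INSERT INTO ${table} (${names.join(', ')}) VALUES (${placeholders})`;
}

// One CREATE INDEX statement per column flagged as indexed.
function createIndexSql(table, columns) {
  return columns
    .filter(c => c.indexed)
    .map(c => `CREATE INDEX IF NOT EXISTS ix_${table}_${c.name} ON ${table} (${c.name})`);
}

// Parameterized SELECT for data retrieval. Only columns present in the
// dictionary may appear in the WHERE clause; values are bound via "?".
function selectSql(table, columns, filterCols) {
  const known = new Set(columns.map(c => c.name));
  for (const f of filterCols) {
    if (!known.has(f)) throw new Error(`unknown column: ${f}`);
  }
  const where = filterCols.length
    ? ` WHERE ${filterCols.map(f => `${f} = ?`).join(' AND ')}`
    : '';
  return `SELECT ${columns.map(c => c.name).join(', ')} FROM ${table}${where}`;
}

// Print every statement the dictionary implies.
for (const [table, columns] of Object.entries(dataDictionary)) {
  console.log(createTableSql(table, columns));
  console.log(insertSql(table, columns));
  createIndexSql(table, columns).forEach(sql => console.log(sql));
  console.log(selectSql(table, columns, ['journalYear']));
}
```

Adding a column to the dictionary, or flipping its `indexed` flag, changes every generated statement at once, without touching the generator functions.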

As the diagram above shows, the queries that perform the Extract-Transform-Load (ETL) from the TreatmentBank XML dump into the Zenodeo datastore are generated by code from the configuration in the data dictionary. So are the SQL queries that create indexes to optimize the database, as well as the SQL queries that retrieve data for the API. On the API side, incoming REST queries are validated against the data dictionary to make sure no hanky-panky is going on with bad or malicious query parameters. And the web-based API documentation, in an OpenAPI-compatible format, is also generated automatically from the data dictionary.
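The last two roles, parameter validation and documentation, can be sketched from a single set of parameter definitions. In this minimal JavaScript sketch, the parameter names, types, and constraints are hypothetical, and the hand-rolled validator stands in for whatever validation the real API uses; it also assumes each query value arrives as a single string. The documentation side emits objects shaped like OpenAPI 3 query parameters.

```javascript
// HYPOTHETICAL query-parameter definitions for a "treatments" endpoint.
// Names and constraints are illustrative assumptions only.
const queryParams = [
  { name: 'treatmentId', type: 'string', pattern: '^[A-Za-z0-9]{32}$',
    description: 'unique ID of the treatment' },
  { name: 'journalYear', type: 'integer', minimum: 1750, maximum: 2100,
    description: 'year the journal was published' },
  { name: 'page', type: 'integer', minimum: 1, default: 1,
    description: 'page of results' },
];

// Validate a raw query object (string values) against the definitions.
// Returns { ok: true, value } with coerced values, or { ok: false, errors }.
function validate(rawQuery, params) {
  const byName = new Map(params.map(p => [p.name, p]));
  const errors = [];
  const value = {};

  // Reject parameters the dictionary does not know about.
  for (const key of Object.keys(rawQuery)) {
    if (!byName.has(key)) errors.push(`unknown parameter: ${key}`);
  }

  for (const p of params) {
    let v = rawQuery[p.name];
    if (v === undefined) {
      if (p.default !== undefined) value[p.name] = p.default;
      continue;
    }
    if (p.type === 'integer') {
      if (!/^-?\d+$/.test(v)) { errors.push(`${p.name} must be an integer`); continue; }
      v = Number(v);
      if (p.minimum !== undefined && v < p.minimum) { errors.push(`${p.name} must be >= ${p.minimum}`); continue; }
      if (p.maximum !== undefined && v > p.maximum) { errors.push(`${p.name} must be <= ${p.maximum}`); continue; }
    } else if (p.pattern && !new RegExp(p.pattern).test(v)) {
      errors.push(`${p.name} does not match ${p.pattern}`);
      continue;
    }
    value[p.name] = v;
  }
  return errors.length ? { ok: false, errors } : { ok: true, value };
}

// OpenAPI-style "parameters" array generated from the same definitions:
// everything except name and description becomes the parameter's schema.
function toOpenApiParameters(params) {
  return params.map(({ name, description, ...schema }) => ({
    name,
    in: 'query',
    required: false,
    description,
    schema,
  }));
}

console.log(validate({ journalYear: '2019' }, queryParams));
console.log(JSON.stringify(toOpenApiParameters(queryParams), null, 2));
```

Because the validator and the documentation read the same definitions, tightening the `maximum` for `journalYear`, for instance, changes both what the API accepts and what it documents, in one place.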

Because all the configuration information lives in the data dictionary, the code is a lot simpler. And if there are any errors, only the configuration has to be adjusted. All of this is still under development, but it should be done and ready for testing soon.

Configuration over code – now I understand the value of this philosophy.