Data Authority Querying: Limitations and Potential Enhancements
The Data Authority pattern provides a robust foundation for CRUDQ operations on bitemporal entities. The querying functionality, while powerful, has certain limitations in its current form and several avenues for future enhancements.
Current Limitations
- `Query` Object Complexity:
    * Clients are responsible for constructing the JSON `Query` object. For complex queries with multiple filters and sorting criteria, this can be verbose and error-prone. There is no schema validation for the `Query` object’s `filter` keys beyond what the `QueryCompiler` in the `Universe` layer can handle, so unknown fields or operators may surface only as runtime errors.
- Page Token Nature:
    * The current page tokens (as described in `data-authority-use.md`) are simple strings. Token generation and decoding belong to `DataAuthorityService`, where the logic is currently a placeholder. If the tokens are merely serialized versions of the `Query` object or simple page numbers, they might be predictable or lack robust integrity checking. Opaque tokens that are validated server-side are generally preferable. The current `GET /query/{page}` endpoint has placeholder logic for decoding these tokens.
- Performance for Large Datasets/Complex Queries:
    * The performance of query operations relies heavily on the efficiency of the underlying `Universe` implementation (e.g., database indexing, query optimization by the `QueryCompiler`). Without careful design at that layer, queries on very large datasets or with many complex filter conditions could become slow. The current model primarily fetches full entity records, which might be inefficient if only a few fields are needed.
- Full-Text Search:
    * The `Query` object’s filter mechanism is primarily designed for structured data (exact matches, range queries, etc.). It does not offer built-in support for advanced full-text search across multiple fields or with relevance scoring.
- Aggregations and Projections:
    * The current query functionality is focused on retrieving lists of `EntityRecord`s. There is no standard mechanism for performing aggregations (e.g., `COUNT`, `SUM`, `AVG`) over filter criteria directly via the API, nor for requesting only specific fields (projections) to reduce payload size. Clients always receive the full `EntityRecord`.
- Filter Expressiveness:
    * The `Query.filter` map typically implies `AND` conditions between its top-level entries. Individual filter values can support operators like `startsWith`, `gte`, and `lt`, but the ability to express `OR` conditions across different fields or deeply nested logical structures might be limited by the `QueryCompiler`’s capabilities.
- Error Handling for Invalid Page Tokens:
    * The logic in the `GET /<version>/<resource>/query/{page}` endpoint for handling invalid, expired, or malformed page tokens is currently a placeholder (`val decodedQuery = Query()`). Robust error handling for such cases (e.g., specific 400 or 404 responses) still needs to be implemented.
- Sorting on Calculated/Joined Fields:
    * The `sortBy` functionality is generally tied to direct fields in the entity’s payload or metadata. Sorting based on calculated fields or fields from related (joined) entities is typically not supported through the simple `Query` object.
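To make the `Query` object complexity and filter expressiveness points concrete, here is a minimal Python sketch of client-side validation run before a query is sent. The JSON shape, the field names, the operator set, the list form of `sortBy`, and the `validate_query` helper are all assumptions for illustration; the actual `Query` schema and the operators accepted by the `QueryCompiler` may differ.

```python
from typing import Any

# Hypothetical allow-lists; the real fields and operators depend on the
# entity type and on what the QueryCompiler actually supports.
ALLOWED_FIELDS = {"status", "priority", "createdAt", "name"}
ALLOWED_OPERATORS = {"eq", "gte", "lt", "startsWith"}


def validate_query(query: dict[str, Any]) -> list[str]:
    """Return a list of problems found in an assumed Query JSON shape."""
    problems: list[str] = []
    for field, condition in query.get("filter", {}).items():
        if field not in ALLOWED_FIELDS:
            problems.append(f"unknown filter field: {field}")
        # A bare value is treated as an exact match; a dict maps
        # operator names to operands.
        if isinstance(condition, dict):
            for operator in condition:
                if operator not in ALLOWED_OPERATORS:
                    problems.append(f"unknown operator on {field}: {operator}")
    for field in query.get("sortBy", []):
        if field not in ALLOWED_FIELDS:
            problems.append(f"unknown sort field: {field}")
    return problems


# Top-level filter entries are implicitly combined with AND.
query = {
    "filter": {
        "status": "ACTIVE",
        "createdAt": {"gte": "2024-01-01", "lt": "2025-01-01"},
        "owner": {"like": "a%"},  # deliberately invalid field and operator
    },
    "sortBy": ["createdAt"],
}
problems = validate_query(query)
```

A check like this only catches mistakes the client already knows about; server-side schema validation would still be needed to reject unknown fields and operators consistently.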
Potential Enhancements
- Simpler Query Language / Client Libraries:
    * Introduce a more concise query language (e.g., a string-based expression parser similar to Lucene syntax or a simplified GraphQL-like syntax) that translates to the `Query` object on the server side.
    * Provide client libraries (SDKs) that offer a fluent API or query builders to construct `Query` objects programmatically, reducing client-side complexity.
- Cursor-Based Pagination:
    * Implement true cursor-based pagination. Instead of page tokens that potentially encode query state or page numbers, a cursor would point to a specific item in a sorted list. This makes pagination more robust and performant, especially when the data changes frequently. Typically the `nextPage` token is derived from the last item of the current result set.
- Field Selection (Projections):
    * Allow clients to specify a list of fields they want to retrieve (e.g., via a `fields` parameter in the `Query` object or a dedicated query parameter). This would reduce network traffic and deserialization overhead for clients that only need partial data.
- Aggregation Capabilities:
    * Extend the query endpoint or add new endpoints to support basic aggregation functions (e.g., `count`, `sum`, `avg`, `min`, `max`) based on a filter. This could return a summarized result instead of entity lists.
    * Example: `POST /<version>/<resource>/aggregate` with a query body.
- Integration with Dedicated Search Engine:
    * For advanced full-text search, synonym support, relevance ranking, and faceting, integrate the Data Authority with a specialized search engine (e.g., Elasticsearch, OpenSearch, Solr). Entities could be indexed asynchronously, and a separate search API endpoint could delegate to the search engine.
- Enhanced Page Token Security and Validation:
    * If page tokens are not already fully opaque and validated by the `Universe` layer, implement more robust server-side validation, encryption, or signing for page tokens to prevent tampering and ensure integrity. Make them short-lived if necessary.
- Caching Strategies:
    * Implement caching mechanisms for frequently executed queries or common `PageResult`s, especially for data that doesn’t change very often. This could be done at the service, gateway, or CDN level. Cache invalidation strategies would be critical.
- Advanced Filtering Logic:
    * Enhance the `QueryCompiler` and `Query.filter` structure to support more complex logical operations, such as `OR` conditions across different fields (e.g., “status is ‘A’ OR priority is ‘HIGH’”), `NOT` conditions, and potentially nested filter groups.
- Standardized Query Operators:
    * Ensure a well-defined and documented set of filter operators (e.g., `eq`, `ne`, `gt`, `gte`, `lt`, `lte`, `in`, `nin`, `startsWith`, `endsWith`, `contains`) that are consistently implemented across different Data Authorities.
- Asynchronous Query Execution for Large Exports:
    * For queries that might return very large datasets (e.g., for data export purposes), consider an asynchronous execution model: the client initiates a query and later polls for, or is notified of, results that are ready for download.
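The cursor-based pagination idea can be sketched in Python as follows. This is an illustrative in-memory version only: the `RECORDS` list, the `(createdAt, id)` sort key, and the `encode_cursor`, `decode_cursor`, and `fetch_page` helpers are hypothetical names, not part of the Data Authority API.

```python
import base64
import json
from typing import Optional

# Hypothetical in-memory records standing in for a sorted Universe result.
RECORDS = [{"id": f"e{i}", "createdAt": 100 + i // 2} for i in range(7)]


def encode_cursor(record: dict) -> str:
    """Derive the cursor from the sort key of the last item on the page."""
    key = [record["createdAt"], record["id"]]
    return base64.urlsafe_b64encode(json.dumps(key).encode()).decode()


def decode_cursor(cursor: str) -> tuple:
    created_at, record_id = json.loads(base64.urlsafe_b64decode(cursor))
    return (created_at, record_id)


def fetch_page(cursor: Optional[str], size: int) -> tuple[list[dict], Optional[str]]:
    """Return up to `size` records after the cursor, plus the next cursor."""
    # The id acts as a tie-breaker so the ordering is total and stable.
    ordered = sorted(RECORDS, key=lambda r: (r["createdAt"], r["id"]))
    if cursor is not None:
        after = decode_cursor(cursor)
        ordered = [r for r in ordered if (r["createdAt"], r["id"]) > after]
    page = ordered[:size]
    # Only hand out a cursor when more records remain.
    next_cursor = encode_cursor(page[-1]) if len(ordered) > size else None
    return page, next_cursor
```

Because each page starts strictly after the last item already seen, records inserted earlier in the sort order between requests do not shift or duplicate later pages. In a database-backed implementation, the list comprehension would typically become a range condition on the sort key combined with a limit. The cursor here is only encoded, not protected, so integrity checking would still have to be added separately.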
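For the page token security item, and for the invalid-token error handling listed under the limitations, one possible approach is to sign tokens with an HMAC and embed an expiry time. The sketch below uses only the Python standard library; the token layout, `SECRET_KEY`, `InvalidPageToken`, and both helper functions are assumptions for illustration, and the mapping of failures to an HTTP 400 response is a suggestion, not existing behavior.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Assumption: in a real service this would come from secure configuration.
SECRET_KEY = b"replace-with-a-server-side-secret"


class InvalidPageToken(Exception):
    """Malformed, tampered, or expired token (could map to HTTP 400)."""


def _sign(payload: bytes) -> bytes:
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()


def issue_page_token(state: dict, ttl_seconds: int = 300,
                     now: Optional[float] = None) -> str:
    """Serialize paging state with an expiry and prepend an HMAC-SHA256 tag."""
    issued = time.time() if now is None else now
    body = json.dumps({"state": state, "exp": issued + ttl_seconds},
                      sort_keys=True).encode()
    return base64.urlsafe_b64encode(_sign(body) + body).decode()


def read_page_token(token: str, now: Optional[float] = None) -> dict:
    """Verify signature and expiry, then return the embedded paging state."""
    try:
        raw = base64.urlsafe_b64decode(token.encode())
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        raise InvalidPageToken("malformed token") from exc
    signature, body = raw[:32], raw[32:]  # SHA-256 digests are 32 bytes
    # Constant-time comparison avoids leaking how much of the tag matched.
    if not hmac.compare_digest(signature, _sign(body)):
        raise InvalidPageToken("signature mismatch")
    claims = json.loads(body)
    current = time.time() if now is None else now
    if claims["exp"] < current:
        raise InvalidPageToken("token expired")
    return claims["state"]
```

Signing prevents tampering but does not hide the state from clients; if the paging state is sensitive, it would need to be encrypted or kept server-side and referenced by an opaque identifier.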
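The advanced filtering item could take the form of a filter tree with `and`, `or`, and `not` groups. The following Python sketch evaluates such a tree against a single record in memory to show the intended semantics. The tree shape, the leaf format, and the `matches` function are hypothetical; a real `QueryCompiler` would more likely translate the tree into the storage layer’s query language.

```python
from typing import Any

# Hypothetical leaf operators, evaluated in memory for illustration only.
OPERATORS = {
    "eq": lambda actual, expected: actual == expected,
    "ne": lambda actual, expected: actual != expected,
    "gte": lambda actual, expected: actual is not None and actual >= expected,
    "lt": lambda actual, expected: actual is not None and actual < expected,
    "in": lambda actual, expected: actual in expected,
}


def matches(node: dict[str, Any], record: dict[str, Any]) -> bool:
    """Evaluate an assumed filter tree of and/or/not groups and field leaves."""
    if "and" in node:
        return all(matches(child, record) for child in node["and"])
    if "or" in node:
        return any(matches(child, record) for child in node["or"])
    if "not" in node:
        return not matches(node["not"], record)
    # Leaf node: {"field": ..., "op": ..., "value": ...}
    return OPERATORS[node["op"]](record.get(node["field"]), node["value"])


# "status is 'A' OR priority is 'HIGH'", excluding archived records.
filter_tree = {
    "and": [
        {"or": [
            {"field": "status", "op": "eq", "value": "A"},
            {"field": "priority", "op": "eq", "value": "HIGH"},
        ]},
        {"not": {"field": "archived", "op": "eq", "value": True}},
    ]
}
```

An explicit tree keeps the existing implicit-`AND` map expressible as a single `and` group, which could ease migration from the current `Query.filter` structure.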
By addressing these limitations and exploring these enhancements, the Data Authority’s querying capabilities can become even more powerful, flexible, and user-friendly for API clients.