Data Authority Querying: Limitations and Potential Enhancements

The Data Authority pattern provides a robust foundation for CRUDQ operations on bitemporal entities. Its querying functionality is powerful, but it has several limitations in its current form and leaves room for a number of enhancements.

Current Limitations

  1. Query Object Complexity:
    * Clients are responsible for constructing the JSON Query object. For complex queries with multiple filters and sorting criteria, this can be verbose and error-prone. There’s no schema validation for the Query object’s filter keys beyond what the QueryCompiler in the Universe layer can handle, potentially leading to runtime errors if unknown fields or operators are used.

  2. Page Token Nature:
    * The current page tokens (as described in data-authority-use.md) are simple strings. Token generation and decoding belong to DataAuthorityService, where the logic is currently a placeholder. If the tokens turn out to be merely serialized versions of the Query object or simple page numbers, they could be predictable and would lack robust integrity checking. Opaque tokens that are validated server-side are generally better. The current GET /<version>/<resource>/query/{page} endpoint has only placeholder logic for decoding these tokens.

  3. Performance for Large Datasets/Complex Queries:
    * Query performance relies heavily on the efficiency of the underlying Universe implementation (e.g., database indexing, query optimization by the QueryCompiler). Without careful design at that layer, queries on very large datasets or with many complex filter conditions could become slow. The current model also fetches full entity records, which is inefficient when only a few fields are needed.

  4. Full-Text Search:
    * The Query object’s filter mechanism is primarily designed for structured data (exact matches, range queries, etc.). It does not offer built-in support for advanced full-text search capabilities across multiple fields or with relevance scoring.

  5. Aggregations and Projections:
    * The current query functionality is focused on retrieving lists of EntityRecords. There is no standard mechanism for performing aggregations (e.g., COUNT, SUM, AVG) based on filter criteria directly via the API, nor for requesting only specific fields (projections) from the entities to reduce payload size. Clients receive the full EntityRecord.

  6. Filter Expressiveness:
    * The Query.filter map typically implies AND conditions between its top-level entries. While individual filter values can support operators like startsWith, gte, lt, the ability to express complex OR conditions across different fields or deeply nested logical structures might be limited by the QueryCompiler’s capabilities.

  7. Error Handling for Invalid Page Tokens:
    * The GET /<version>/<resource>/query/{page} endpoint’s logic for handling invalid, expired, or malformed page tokens is currently a placeholder (val decodedQuery = Query()). Robust error handling (e.g., specific 400/404 errors) for such cases needs to be fully implemented.

  8. Sorting on Calculated/Joined Fields:
    * The sortBy functionality is generally tied to direct fields in the entity’s payload or metadata. Sorting based on calculated fields or fields from related (joined) entities is typically not supported through the simple Query object.
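
To make limitations 1 and 6 more concrete, the following Kotlin sketch checks a filter against an allow-list before it would reach the QueryCompiler, so that unknown fields or operators are reported up front rather than surfacing as runtime errors. The Filter shape, the validateFilter function, and the field names are assumptions made for illustration and are not part of the Data Authority API; the operator names are the ones mentioned in this document.

```kotlin
// Assumed shape for illustration: field name -> (operator -> operand).
// The real Query.filter type is not shown in this document.
typealias Filter = Map<String, Map<String, Any?>>

// Operators named in this document; the set the QueryCompiler accepts may differ.
val knownOperators = setOf("startsWith", "gte", "lt")

// Returns one message per problem; an empty list means the filter passed.
fun validateFilter(filter: Filter, allowedFields: Set<String>): List<String> {
    val errors = mutableListOf<String>()
    for ((field, conditions) in filter) {
        if (field !in allowedFields) errors += "unknown field: $field"
        for (operator in conditions.keys) {
            if (operator !in knownOperators) errors += "unknown operator: $operator (field $field)"
        }
    }
    return errors
}

fun main() {
    val filter: Filter = mapOf(
        "name" to mapOf("startsWith" to "A"),
        "colour" to mapOf("like" to "re%")
    )
    // Hypothetical entity with two filterable fields.
    val errors = validateFilter(filter, allowedFields = setOf("name", "createdAt"))
    errors.forEach(::println)
}
```

A check of this kind could run in the service layer and map a non-empty error list to a 400 response. Because each top-level entry is validated independently, it fits the implicit AND semantics described above but says nothing about OR conditions or nested groups.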

Potential Enhancements

  1. Simpler Query Language / Client Libraries:
    * Introduce a more concise query language (e.g., a string-based expression parser similar to Lucene syntax or a simplified GraphQL-like syntax) that translates to the Query object on the server-side.
    * Provide client libraries (SDKs) that offer a fluent API or query builders to construct Query objects programmatically, reducing client-side complexity.

  2. Cursor-Based Pagination:
    * Implement true cursor-based pagination. Rather than encoding query state or page numbers, a cursor points to a specific item in a sorted list, which makes pagination more robust and performant, especially when the data changes frequently. Typically the nextPage token is derived from the last item of the current result set.

  3. Field Selection (Projections):
    * Allow clients to specify a list of fields they want to retrieve (e.g., via a fields parameter in the Query object or a dedicated query parameter). This would reduce network traffic and deserialization overhead for clients that only need partial data.

  4. Aggregation Capabilities:
    * Extend the query endpoint or add new endpoints to support basic aggregation functions (e.g., count, sum, avg, min, max) based on a filter. This could return a summarized result instead of entity lists.
    * Example: POST /<version>/<resource>/aggregate with a query body.

  5. Integration with Dedicated Search Engine:
    * For advanced full-text search, synonym support, relevance ranking, and faceting, integrate the Data Authority with a specialized search engine (e.g., Elasticsearch, OpenSearch, Solr). Entities could be indexed asynchronously, and a separate search API endpoint could leverage the search engine’s power.

  6. Enhanced Page Token Security and Validation:
    * Unless page tokens are both fully opaque and validated by the Universe layer, implement more robust server-side validation, encryption, or signing for page tokens to prevent tampering and ensure integrity. Make them short-lived if necessary.

  7. Caching Strategies:
    * Implement caching mechanisms for frequently executed queries or common PageResults, especially for data that doesn’t change very often. This could be done at the service, gateway, or CDN level. Cache invalidation strategies would be critical.

  8. Advanced Filtering Logic:
    * Enhance the QueryCompiler and Query.filter structure to support more complex logical operations, such as OR conditions across different fields (e.g., “status is ‘A’ OR priority is ‘HIGH’”), NOT conditions, and potentially nested filter groups.

  9. Standardized Query Operators:
    * Ensure a well-defined and documented set of filter operators (e.g., eq, ne, gt, gte, lt, lte, in, nin, startsWith, endsWith, contains) that are consistently implemented across different Data Authorities.

  10. Asynchronous Query Execution for Large Exports:
    * For queries that might result in very large datasets (e.g., for data export purposes), consider an asynchronous execution model where the client initiates a query and later polls for or receives a notification when the results are ready for download.
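
Enhancements 2 and 6 can be combined in a single token format. The sketch below, written under stated assumptions, derives a cursor from the last item of a page and signs it with HMAC-SHA256 from the JDK's javax.crypto package so that tampering is detected on decode. The Cursor and CursorCodec names, the three-part token layout, and the way the secret is supplied are all hypothetical; nothing here reflects how DataAuthorityService actually builds its tokens.

```kotlin
import java.security.MessageDigest
import java.util.Base64
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// Hypothetical cursor: sort value and id of the last item on the current page.
data class Cursor(val lastSortValue: String, val lastId: String)

// Hypothetical codec; token layout is base64(sortValue).base64(id).base64(hmac).
class CursorCodec(secret: ByteArray) {
    private val key = SecretKeySpec(secret, "HmacSHA256")
    private val encoder = Base64.getUrlEncoder().withoutPadding()
    private val decoder = Base64.getUrlDecoder()

    private fun sign(payload: ByteArray): ByteArray =
        Mac.getInstance("HmacSHA256").apply { init(key) }.doFinal(payload)

    fun encode(cursor: Cursor): String {
        // URL-safe Base64 never contains '.', so it is a safe separator.
        val payload = listOf(cursor.lastSortValue, cursor.lastId)
            .joinToString(".") { encoder.encodeToString(it.toByteArray(Charsets.UTF_8)) }
        val signature = encoder.encodeToString(sign(payload.toByteArray(Charsets.UTF_8)))
        return "$payload.$signature"
    }

    // Returns null for malformed or tampered tokens instead of guessing a default.
    fun decode(token: String): Cursor? {
        val parts = token.split(".")
        if (parts.size != 3) return null
        return try {
            val expected = sign("${parts[0]}.${parts[1]}".toByteArray(Charsets.UTF_8))
            // MessageDigest.isEqual compares in constant time.
            if (MessageDigest.isEqual(expected, decoder.decode(parts[2]))) {
                Cursor(
                    String(decoder.decode(parts[0]), Charsets.UTF_8),
                    String(decoder.decode(parts[1]), Charsets.UTF_8)
                )
            } else {
                null
            }
        } catch (e: IllegalArgumentException) {
            null // a part was not valid Base64
        }
    }
}

fun main() {
    val codec = CursorCodec("demo-secret-only".toByteArray(Charsets.UTF_8))
    val token = codec.encode(Cursor("2024-01-01T00:00:00Z", "entity-42"))
    println(codec.decode(token))        // the original cursor
    println(codec.decode("A$token"))    // null: payload no longer matches the signature
}
```

A null result from decode is the point at which a 400 response for an invalid token could be returned, rather than falling back to an empty Query(). Signing only protects integrity: the cursor contents remain readable by the client, and expiry would need an additional signed timestamp field.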

By addressing these limitations and exploring these enhancements, the Data Authority’s querying capabilities can become even more powerful, flexible, and user-friendly for API clients.
