Criteria API for Magnolia CMS

Version 5.0.10 is available for Magnolia 4.5.
Warning: release 5.0.10 contains API changes due to the removal of all the deprecated Magnolia APIs.

Criteria for Magnolia CMS (openutils-mgnlcriteria) offers a powerful and simple alternative to standard Magnolia query APIs.

The Criteria API consists of two main components:

  • a simplified API for retrieving JCR Nodes by composing Criterion objects, inspired by Hibernate's Criteria API. This is a very convenient approach for functionality like "search" screens where there is a variable number of conditions to be placed upon the result set.
  • an engine that executes JCR queries, giving you great performance and several extra goodies (e.g. pagination, lazy loading, search scoring, spell checker support...) compared to the standard Magnolia one. openutils-mgnlcriteria does not use Magnolia query APIs, but works directly at the JCR level.
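
The composition idea from the first point can be illustrated without the library itself. What follows is a minimal, self-contained sketch and not the openutils-mgnlcriteria API: the class QuerySketch and its methods eq and build are hypothetical names made up for this illustration, as are the /pets path and the attribute names. It only shows how a variable number of optional conditions, as on a "search" screen, can be collected and joined into a single JCR XPath statement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is NOT the openutils-mgnlcriteria API.
final class QuerySketch {

    private final String basePath;
    private final List<String> conditions = new ArrayList<>();

    QuerySketch(String basePath) {
        this.basePath = basePath;
    }

    // Adds an equality condition. Null or empty values are skipped,
    // which is what makes optional search fields easy to handle.
    QuerySketch eq(String attribute, String value) {
        if (value != null && !value.isEmpty()) {
            // Embedded single quotes are doubled, as XPath string literals require.
            conditions.add("@" + attribute + "='" + value.replace("'", "''") + "'");
        }
        return this;
    }

    // Joins all collected conditions with "and" into one XPath statement.
    String build() {
        StringBuilder xpath = new StringBuilder("/jcr:root").append(basePath).append("//*");
        if (!conditions.isEmpty()) {
            xpath.append('[').append(String.join(" and ", conditions)).append(']');
        }
        return xpath.toString();
    }

    public static void main(String[] args) {
        String xpath = new QuerySketch("/pets")
            .eq("jcr:primaryType", "mgnl:content")
            .eq("title", null) // a field left empty on the search form
            .eq("name", "Lucky")
            .build();
        // prints /jcr:root/pets//*[@jcr:primaryType='mgnl:content' and @name='Lucky']
        System.out.println(xpath);
    }
}
```

The caller never concatenates query strings by hand: each form field maps to one method call, and fields left empty simply contribute nothing to the statement.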

 

Better performance than standard Magnolia queries? Are you sure?

Oh, yes! For several use cases thousands of times faster!  

The standard Magnolia APIs for querying the JCR repository (info.magnolia.cms.core.search.QueryManager and friends) are unfortunately poorly optimized and flawed by an old design approach that has been kept since their initial implementation. They don't support pagination at all (or rather, you usually paginate by fetching the whole list of content from the repository and then trashing the items you don't need to display), they eagerly initialize all the search results, post-process the search results to get the correct item type, and post-process them again to apply ACLs. This makes them pretty slow and memory hungry...

Performance/usage comparison between Magnolia Query APIs and Criteria

Pagination:

  • using Magnolia QueryManager: Pagination is not supported; you can only retrieve the full list of results and then paginate at the front end. If you have a large number of results you can't use these APIs, even if you only need to display a handful of items (e.g. if a query returns 100,000 items your Magnolia instance will probably crash, even if you only want to display the first result on the page).
  • using Criteria: Pagination is supported by default: you just specify the desired page size and only the needed items will be fetched from the repository. The result gives you all the details about the total number of items, the current page number and the number of available pages. You will not have to do any calculation yourself!

Fetching of results:

  • using Magnolia QueryManager: All the items are eagerly fetched immediately after a query execution, so they are loaded even if you don't really use them. The result is a Collection of Content objects.
  • using Criteria: Items are lazily fetched from the repository and Content objects are instantiated only when you really need them, by iterating on the query result. The result of a query is an object called "AdvancedResult" that gives you an iterator on the actual items when you call the getItems() method. You will always work with iterators and not collections, as this gives you a boost in performance and reduces memory usage. As a bonus, there is a getItems(class) method that gives you the results automatically converted to JavaBeans when you need them.

Type of result:

  • using Magnolia QueryManager: The type of result (Content or ContentNode) is usually selected after the query execution. You can run a query and then fetch results as Content objects (pages) or ContentNode objects (paragraphs). This may look nice, but it means that QueryManager has to do the hard work of fetching a "wrong" set of results from JCR and then combining them. For example, if you search for a text contained in paragraphs and a page has two paragraphs with that text, QueryManager will fetch 2 results, get the parent page and discard the duplicated occurrences.
  • using Criteria: You select the type of content before running the query, by adding a restriction on the "@jcr:primaryType" attribute. Once the query is executed, you can't change the results returned. Tip: if you need to look in the content of paragraphs when you search for pages you can easily do that using index aggregates in the Jackrabbit search index configuration.

Security checks (ACLs):

  • using Magnolia QueryManager: Security checks are always applied after the query execution. This means that if the search returned 1000 items, but only 1 is accessible by the current user, QueryManager will load all 1000 items, iterate through them, trash 999 of them and return the single item left. There is no way to tell QueryManager not to iterate over all the results to check ACLs, even if you know there are no restrictions, as on a totally public site. Note that the way ACLs work is also the main reason why JCR paging can't work with the current QueryManager implementation (you can't know the number of items left after the security checks, so calculating pagination is impossible).
  • using Criteria: Results are not post-processed in any way after the execution of the query. Applying ACLs is definitely the most complex thing to do while keeping good performance and without breaking pagination. By default a Criteria query doesn't apply any security check: you may be happy with that if you are using it for something where there are no restrictions for particular users. If you need ACLs on Criteria queries, Criteria offers a specific Jackrabbit SearchFilter which applies Magnolia constraints directly at the JCR level, automatically altering any executed query by adding additional constraints. Note that doing this is really tricky and this component is pretty new in the Criteria API, so properly test your query/ACL combinations. See the search index configuration page for details.
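
To make the pagination and lazy fetching points of the comparison more concrete, here is a small self-contained sketch. It is not the library's AdvancedResult implementation: the class PagingSketch and its methods numberOfPages, offset and lazyItems are hypothetical names chosen for this illustration. It shows the page arithmetic a paged result can do for you, and an iterator that loads each item only when it is actually requested.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only: NOT the library's AdvancedResult class.
final class PagingSketch {

    // Number of pages needed to show totalSize items (ceiling division).
    static int numberOfPages(int totalSize, int itemsPerPage) {
        if (itemsPerPage <= 0) {
            throw new IllegalArgumentException("itemsPerPage must be positive");
        }
        return (totalSize + itemsPerPage - 1) / itemsPerPage;
    }

    // Zero-based offset of the first item on a 1-based page.
    static int offset(int page, int itemsPerPage) {
        return (page - 1) * itemsPerPage;
    }

    // Wraps the identifiers of one page. Each item is loaded only
    // when next() is called, never up front.
    static <T> Iterator<T> lazyItems(List<String> ids, Function<String, T> loader) {
        Iterator<String> source = ids.iterator();
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public T next() {
                return loader.apply(source.next());
            }
        };
    }

    public static void main(String[] args) {
        int total = 95;
        int pageSize = 10;
        System.out.println("pages: " + numberOfPages(total, pageSize));
        System.out.println("offset of page 3: " + offset(3, pageSize));

        // The lambda stands in for an expensive repository read.
        Iterator<String> items = lazyItems(Arrays.asList("a", "b", "c"), id -> "loaded " + id);
        System.out.println(items.next()); // only the first item has been loaded so far
    }
}
```

Because the loader runs inside next(), a caller that displays only the first item of a page pays for a single load, which is the behaviour the Criteria column describes.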
 

Hmm, anything missed?

Sure, using QueryManager you'll probably find it hard to use some advanced JCR features like the spell checker, excerpts or the calculated score for the returned items. Criteria queries expose them in the search results.
Well, we are also not mentioning that one of the most important advantages of the Criteria API is the composition of queries with an easy to use API, the automatic escaping of XPath statements... see the usage and samples page for details.
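
As a reminder of why escaping matters when XPath statements are built from user input, here is a minimal sketch. It does not reproduce the library's own escaping code: the class XPathLiteralSketch and its methods quote and titleEquals are hypothetical names used only for this illustration. It covers just one case, string literals, where an embedded single quote has to be doubled.

```java
// Hypothetical illustration only: not the escaping code used by the library.
final class XPathLiteralSketch {

    // Wraps a value in single quotes, doubling any embedded single quote
    // so that the value cannot terminate the literal early.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Builds a simple statement matching nodes by their title attribute.
    static String titleEquals(String title) {
        return "//*[@title=" + quote(title) + "]";
    }

    public static void main(String[] args) {
        // Without quoting, the apostrophe would break the statement.
        System.out.println(titleEquals("O'Brien")); // prints //*[@title='O''Brien']
    }
}
```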

Which version should I use?

The first major release of the Criteria API (1.x) was focused on query composition, but still used the standard Magnolia query manager for query execution. The new engine built in version 2.0 and up required several changes and deprecations in the interfaces: version 2.1.1 is the one you should get if you still depend on the old methods from the first release.

Version 3.0.1 is identical to 2.1.1, but with all the deprecated methods removed, so it's the preferred version to use if you don't already have code that depends on older versions.

Installation

The bundle is provided as a zip file; the archive contains a number of .jar files (the module itself plus the required dependencies not already available in a standard Magnolia installation).

Unzip the module bundle into your Magnolia webapp WEB-INF/lib folder, just like any other module. Please look at the Requirements section below for a detailed list of the jars needed for each specific version of Magnolia.

If you're using Maven, simply declare the dependency in your pom.xml, so all module dependencies will be taken care of by Maven.

Requirements

The Magnolia Criteria API requires Jackrabbit version 1.6 or up, and will not work with other JCR implementations (most of the code uses generic JCR 2.0 APIs, but some parts, e.g. ACLs, are Jackrabbit specific).

Versions 2.x of the Criteria API need Magnolia 4.0 or up and will not work with previous versions. At the time of the 3.0 release Criteria had been tested using several Magnolia versions from 4.0 to 4.3.6, the latest available release at that time.

This is a summary of the required dependencies:

  • Magnolia 4.0 or up (4.3 suggested)
  • Jackrabbit version 1.6.x, 2.0.x or 2.1.x (2.0 or 2.1 suggested)
  • JCR APIs 2.0: they are included by default if you are using Jackrabbit 2.0 or up; you may need to upgrade them from version 1.0 if you are still using Jackrabbit 1.6 (don't worry, Jackrabbit 1.6.x works fine with JCR API 2.0). Not needed anymore since version 3.1.0.
  • cglib 2.2: already included in Magnolia 4.3 and up, you may need to add it manually to WEB-INF/lib if you are using an older version.

Project info & quick links