Distribution and Subscriptions
An out-of-process object cache should not only have a storage component, but a messaging system as well. One of the architects in my group, who is a well-known messaging guru, told me that the ideal object cache should have a state-of-the-art messaging system attached to it.
Our object caches should be distributed and subscribable.
A logical cache can be distributed amongst several different servers. We can do this for load balancing and for failover. Applications also have local caches that communicate with the master, distributed cache(s).
Let's say that we are storing information about each position that our company maintains. We might want to have 3 distributed caches, one that stores positions for our customers in the US, one for customers in Europe and the Middle East/Africa, and one for Asia. Upon startup, the master cache loader will read all of the positions from the database and will populate each of the three caches.
This is an example of a very simple load balancer for the distributed caches. Other load balancing schemes include partitioning the positions by the first digit of the position id, a date range, etc.
Each application that uses positions will have its own local cache. These local caches will usually contain a subset of the data that is in the master caches. For example, the US Derivatives Desk might just need to cache positions from US portfolios that have been active in the last 30 days.
When an application updates or deletes a position in one of the master caches, we need to update all of the other master caches that we are using for failover purposes, and any other master caches that contain that particular position. Similarly, when we create a new position, we need to propagate that new position to any redundant caches or any caches that might be interested in the new position.
We might need to push the new or updated position to any of the local caches that are interested in that position. We have a choice of architecture for distributing updates to local caches.
1) We do not distribute the updated object at all. An application won’t know that there is new data in the master cache until it retrieves that object again.
2) We push the update to the local caches right away. We can push out full objects or just the deltas (changes to the object).
There are disadvantages of both schemes. Under scheme (1), the application could be working with old data. Under scheme (2), we could be updating an object in the local cache while the application is working on that same object. Also, under scheme (2), we now have to worry about messaging more.
The master caches have to have some way of communicating with the local caches. We can communicate with each application by one of the familiar messaging mechanisms; Tibco EMS, Tibco RV, LBM, Sockets, etc.
We need to make sure that the messaging is reliable. Each subscriber must receive the update of the object from the master cache. There is no tolerance for dropped messages. Otherwise, different applications might be working with different versions of an object.
We do not have to make the message durable. In other words, if a client goes offline for a while, then the messaging part of the cache does not have to save the update until a time where that client decided to reconnect. So, this saves us the need of storing out-of-date messages.
Using a JMS-based messaging scheme also means that we can use JMS Selectors to filter out objects that an application is not interested in. Selectors have overhead with them, but it is easy to set up a filter-based pub/sub mechanism between the master caches and any local caches. For example, one application might only be interested in updates to position objects whose position id starts with the prefix “A23”. It is easy to set up a JMS selector that has the pattern “positioned LIKE ‘A23%’”.
©2006 Marc Adler - All Rights Reserved