Wednesday, July 02, 2008

First experiments with Velocity

I played around with Microsoft Velocity this past weekend, and I put together a little Order Cache. This application reads FIX messages off of a Tibco EMS bus, turns the messages into a C# object, and stores the object in the Velocity cache.

The primary key of the Order Cache is the ClOrdId, and this value should be unique for each trading day. Every time we see a FIX message, we check to see if an object with the same ClOrdId is in the cache. If it is, we update the Order object and store it back in the cache. The order and execution state is kept, the quantities are updated, and Cancels and Cancel/Replaces are accounted for.
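A minimal sketch of that get/update/put cycle, in Python rather than the C# of the actual application. The `OrderCache` class is a hypothetical dict-backed stand-in and is not the Velocity API. The `Order` fields and the `apply_fix_message` helper are also assumptions made for illustration. Messages are assumed to arrive already parsed into a dict keyed by FIX tag number (11 = ClOrdID, 35 = MsgType, 38 = OrderQty, 14 = CumQty, 39 = OrdStatus, 55 = Symbol). Cancel and Cancel/Replace handling is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    # Hypothetical order shape; the real C# class is not shown in the post.
    cl_ord_id: str
    symbol: str
    order_qty: int
    cum_qty: int = 0
    ord_status: str = "0"          # raw FIX OrdStatus code, "0" means New
    msg_types: list = field(default_factory=list)


class OrderCache:
    """Dict-backed stand-in for a distributed cache (not the Velocity API)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def put(self, key, value):
        self._store[key] = value


def apply_fix_message(cache, msg):
    """Look up the order by ClOrdID, update it, and store it back."""
    cl_ord_id = msg["11"]
    order = cache.get(cl_ord_id)
    if order is None:
        # First time this ClOrdID is seen today: create the order.
        order = Order(cl_ord_id, msg.get("55", ""), int(msg.get("38", 0)))
    if msg["35"] == "8":
        # Execution report: refresh the filled quantity and status.
        order.cum_qty = int(msg.get("14", order.cum_qty))
        order.ord_status = msg.get("39", order.ord_status)
    order.msg_types.append(msg["35"])
    cache.put(cl_ord_id, order)
    return order
```

Calling `apply_fix_message` first with a new order single (35=D) and then with an execution report (35=8) for the same ClOrdID leaves one object in the cache that carries the latest quantities.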

Admittedly, it is an extremely simple application, but I did it just so I could get familiar with the first CTP of Velocity. One cache holds all order objects. I did not try to alpha-partition the orders by symbols amongst different caches, nor did I experiment with multiple regions yet. Just simple Gets, Puts and Adds. I used Reflector a bit to peer into the Velocity code, and I could see that there is a lot more hidden under the surface of Velocity that has not been exposed yet.

Where can we use an object cache with a Complex Event Processing system?

First, we can use it to support resiliency of the CEP system. Let’s say that our CEP engine crashes in the middle of the day. We want to be able to retrieve the current state of our orders and executions, but we don’t want to have to go back to some sort of FIX repository, and play back each FIX message in order to re-create the state of the CEP engine. We can use the object cache to aid in recovering the state.

Now, most CEP engines have a facility where they can save state. For instance, with Coral8, you can choose to save all windows out to a database every n seconds or minutes. But, there is a certain tension with persistence of a CEP engine that is processing fast streams of real-time data. If you persist too frequently, you can slow down the CEP engine. If you persist infrequently, then you can lose a significant amount of your state if the CEP engine fails.

Most CEP engines use an external database such as Berkeley DB (originally from Sleepycat Software) or SQL Server to store state. But, depending on the size and shape of your saved data, recovery might be slow. So, why not save to an object cache? Reading and writing to an object cache would be extremely fast, and if the object cache supports asynchronous storage to a back-end database (or Microsoft’s SSDS), then you can have the best of both worlds.
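The asynchronous-storage idea is often called write-behind caching. Here is a rough sketch of it in Python, under stated assumptions: `WriteBehindCache` is a hypothetical class, the `persist` callable stands in for a write to the back-end database, and a plain dict plays the role of that database. None of this is the API of Velocity, SSDS, or any CEP engine.

```python
import queue
import threading


class WriteBehindCache:
    """In-memory cache whose writes are persisted later by a background thread."""

    def __init__(self, persist):
        self._data = {}
        self._pending = queue.Queue()
        self._persist = persist        # hypothetical back-end write function
        self.errors = []               # failed writes are recorded, not raised
        threading.Thread(target=self._drain, daemon=True).start()

    def put(self, key, value):
        # The caller only pays for the in-memory write and an enqueue.
        self._data[key] = value
        self._pending.put((key, value))

    def get(self, key):
        return self._data.get(key)

    def flush(self):
        # Block until every queued write has been handed to the back end.
        self._pending.join()

    def _drain(self):
        while True:
            key, value = self._pending.get()
            try:
                self._persist(key, value)
            except Exception as exc:
                self.errors.append((key, exc))
            finally:
                self._pending.task_done()


database = {}                                   # stand-in for the real database
cache = WriteBehindCache(database.__setitem__)
cache.put("A1", {"symbol": "IBM", "cum_qty": 40})
cache.flush()
```

The fast path never waits on the database. The trade-off is that anything still sitting in the queue is lost if the process dies before it drains.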

Among the various CEP engines, I have not seen any built-in adapter support for object caches like GemFire, Tangosol, GigaSpaces, and Velocity. It would be fairly easy to write one, but it takes time to marshal between a native C# object and something like a Coral8 tuple.
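To give a feel for the marshalling step, here is a small Python sketch that flattens an object into an ordered tuple and rebuilds it again. A plain Python tuple stands in for an engine tuple; this is not the Coral8 tuple format, and the `Order` fields and helper names are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class Order:
    # Hypothetical fields chosen for illustration only.
    cl_ord_id: str
    symbol: str
    order_qty: int
    cum_qty: int


def schema_of(cls):
    """Column names, in declaration order, that a tuple for cls must follow."""
    return tuple(f.name for f in fields(cls))


def to_tuple(obj):
    """Flatten an object into a positional tuple matching schema_of()."""
    return tuple(getattr(obj, f.name) for f in fields(obj))


def from_tuple(cls, row):
    """Rebuild an object from a positional tuple."""
    names = schema_of(cls)
    if len(row) != len(names):
        raise ValueError(f"expected {len(names)} columns, got {len(row)}")
    return cls(**dict(zip(names, row)))


row = to_tuple(Order("A1", "IBM", 100, 40))
restored = from_tuple(Order, row)
```

Every object pays this conversion on the way in and on the way out, which is where the marshalling time goes.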

Wish #1: The CEP vendors will explore object caches as a way to do persistence.

Another use of an object cache is to assist with a query façade. CEP engines like Coral8 support the notion of ad-hoc queries of the windows. If the CEP engine is aggregating data like orders and executions inside a publicly queryable window, any application should be able to query the CEP engine for something like “What are the top ten stocks that have open orders between 10:15 and 10:30”.

We don’t want external applications connecting directly to the CEP engine. There are compliance and security issues involved, and we don’t want any application tied to a particular variant of Streaming SQL. (Note: Coral8 has SQLite embedded in it, and that is what you use to query the public windows.) Therefore, we want to provide a “query façade”. All applications have to connect to the query façade in order to do ad-hoc querying of the CEP engine.

There is a cost to performing ad-hoc queries on the CEP engine while it is trying to process tons of streaming data. And, unless you index your public window correctly, the ad-hoc query can be very slow.

So, why not query the object cache instead? If we are storing aggregated orders and execution data inside of Velocity, why can’t we query Velocity to find the top ten stocks that have open orders between 10:15 and 10:30? In other words, querying an object cache should be almost the same as querying a SQL Server database. I don’t care if the query language is SQL 92, Streaming SQL, LINQ, or the LINQ-ish thing that SSDS has... as long as it’s not Q :-)
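As a sketch of what such a query could look like against cached objects, here it is in Python over an in-memory list. Two assumptions: the question is read as "rank symbols by the number of open orders entered in the window", and the `open` and `entered` fields are hypothetical. This is not a Velocity query API.

```python
from collections import Counter
from datetime import time


def top_symbols_with_open_orders(orders, start, end, n=10):
    """Rank symbols by how many open orders were entered within [start, end]."""
    counts = Counter(
        o["symbol"]
        for o in orders
        if o["open"] and start <= o["entered"] <= end
    )
    return [symbol for symbol, _ in counts.most_common(n)]


# Hypothetical cached order data.
orders = [
    {"symbol": "IBM", "open": True, "entered": time(10, 16)},
    {"symbol": "IBM", "open": True, "entered": time(10, 20)},
    {"symbol": "MSFT", "open": True, "entered": time(10, 17)},
    {"symbol": "MSFT", "open": False, "entered": time(10, 18)},
    {"symbol": "ORCL", "open": True, "entered": time(10, 45)},
]

top = top_symbols_with_open_orders(orders, time(10, 15), time(10, 30))
```

With this sample data, `top` is `["IBM", "MSFT"]`: the closed MSFT order and the ORCL order entered at 10:45 are filtered out. A real cache would need to evaluate the filter on the server side, or with an index, to avoid pulling every object to the client.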

Wish #2: Microsoft will get together with the object cache and CEP vendors and come up with a common method to query databases, caches, and CEP engines.




©2008 Marc Adler - All Rights Reserved.
All opinions here are personal, and have no relation to my employer.

5 comments:

Anonymous said...

Marc,

I like your ideas re Cache & CEP.

Using a cache with CEP is a great idea for persistence, etc. but there is another use that, at first blush, is not readily apparent.

Using a cache, and embedding the cache into the development environment with a number of access/update methods (pub/sub, sql, LINQ, etc.) allows the creation of distributed, 'grid' or cluster-like deployments with much less fuss and mess than what's currently required across all of the CEP vendors.

As a practitioner in the area of event driven systems for many years within the Capital Markets space, I don't think I've ever implemented a production, tier 1 app on a single machine. Use of a cache (or bus with different qualities of service) made deployment of these hot/hot solutions much easier than would have otherwise been possible.

Korrelera, (R.I.P.), utilized an embedded pub/sub bus so that when the engine ran, it didn't really know, or care, where data came from and where data was going. This bus could also be used on a single machine deployment. Our first, and only deployment (at the Boston Stock Exchange), consisted of a number of machines running a number of processes (100's). Use of the bus made clustering all that much easier and also allowed the dynamic creation and addition of processes to the system during production (if you were brave enough to try it).

Korrelera didn't have a front end development environment, and 3 years ago we weren't ready to embed anything like a Coherence, etc. so we settled for the next best thing. Today, I would most definitely embed cache functionality within a CEP engine and also incorporate the necessary semantics for its use within the development environment.

Your post highlights something else that I find interesting. As CEP solutions make headway and become more functionally complete and therefore easier to deploy in production, 24x7 environments, the type of functionality you've described becomes a 'must have.'

Along this vein, I'd be interested to know what else you would consider a 'must have' for CEP vendors.

Your blog is interesting and relevant as always.

All the best.

Colin

Anonymous said...

Hi Marc (and Colin),

Great topic, please see my thoughts on it at Aleri's blog: http://blog.aleri.com/object-caches-for-cep-persistencerecoverability-on-demand-queries-or-not/2008/07/08/

Anonymous said...

Hi Marc,

interesting combination...makes a lot of sense to me, I believe relational databases don't really fit the bill...

By the way, you might want to add another Object Cache product to your list, IBM's WebSphere eXtreme Scale (aka ObjectGrid).

Keep up the good blogging!

Regards,
PatternStorm

Ajaxx said...

Good points as always. In my experience, I find far too often that it's the lack of clean abstraction between an entity (CEP or otherwise) and its data access layer that prevents the integration of alternative products. Are there really any fundamental differences between pushing state to a cache or database? It reminds me of the problems we saw in the 90s during the introduction of ODBMS architectures. In theory it's easy to conceptually separate how you represent state, how you push state and how you retrieve state.

I wouldn't mind a larger entity capable of driving a standard to completion solidifying the incredible variance we're seeing in CEP and cache standards. One of the key items Bill Bains pointed out is that Microsoft has such a dominating presence that it may well cause the Cache vendors to coalesce on a standard API. It'd be nice to see the same thing in the CEP space where there are equally as many differing standards.

Thanks for the post ... always a good read.

-- Aaron

Anonymous said...

Marc,

Unfortunately, it seems the URL in my posting was cut off. Could you please add the following to my prior post so folks can easily link to it?

thanks,
jerry


http://blog.aleri.com/object-caches-for-cep-persistencerecoverability-on-demand-queries-or-not/2008/07/08/