Saturday, April 12, 2008

Cloudy Thinking

One of the things that has been on my mind recently is how to represent the "event cloud".

One of the buzzphrases that comes out of the CEP movement is the term "event cloud". This is the huge amalgamation of all of the events that flow through your event processing system.

This week, I had the opportunity to talk with Mary Knox of the Gartner Group. Mary follows the world of CEP. Just as many of the executives at the various CEP companies have told me, Mary confirms that many companies in finance are using CEP for simple streams of processing: a small algo trading system here, a pricing engine there, a portfolio analysis system in another place. Most shops seem to be "under-utilizing" their CEP engines. Our effort is probably one of the few out there that will really attempt to poke holes in the hype surrounding CEP. Either our application will validate the hype, or all of the CEP engines will flame out in a blaze of glory.

Mary has not heard of too many financial companies that are implementing the event cloud, so I wonder if we are breaking new ground in that area. We know what an event cloud is and what it is supposed to do, but how do we actually implement the event cloud in a system like Coral8 or Streambase? Are relational tables/windows the best way to represent the cloud? And what about hierarchies of events that are derived from other events? How can we best represent that complex graph of inter-relationships with SQL-ish windows?

Is anyone out there doing similar work with the cloud?

©2008 Marc Adler - All Rights Reserved


Hans said...

Well, I'll tell you that a huge event cloud project always seemed a little risky to me. Having seen efforts to develop big pools of shared data using databases, message buses, LDAP, etc. I have come to the conclusion that before developing something like this, one had better have some pretty big and specific internal customers and use cases lined up first.

PatternStorm said...

The best way to represent "that complex graph of inter-relationships" is... guess what? To use a graph... well, to use an ontology if you want, that is, something capable of easily representing entities and relationships among entities... but... wait a minute... A graph can do that! Well, to be precise, a hypergraph can do that (since we might have to deal with n-ary relationships with n > 2).

OK, so it seems that we need a (hyper)graph.

But... hold on! "That complex graph of inter-relationships" that we need to model and process is not static: it changes with time. Moreover, we don't have complete knowledge of it; we will uncover its intricacies incrementally as our knowledge increases through careful monitoring and analysis. So a static (hyper)graph will not do: we need an updatable or, if you want, dynamic/evolvable (hyper)graph.

Since our domain of discourse is an event cloud, the entities we are talking about are most probably events. If we have events e1,...,en that we know are related by a relationship R in our event cloud, then we can encode this in our hypergraph as an R-labeled hyper-edge comprising the nodes e1,...,en. Equivalently, we can say that the fact R(e1,...,en) is true, which suggests a very interesting equivalence between databases of facts and hypergraphs: every hypergraph encodes a database of facts and vice versa.
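To make the facts/hyper-edges equivalence concrete, here is a minimal Python sketch. The class name `EventHypergraph` and its methods are hypothetical, invented for illustration; nothing here corresponds to an API in Coral8, Streambase, or any other CEP product. It simply stores each fact R(e1,...,en) as a labeled tuple and treats that tuple as the hyper-edge.

```python
from collections import defaultdict


class EventHypergraph:
    """Hypothetical dynamic hypergraph: nodes are event ids, hyper-edges are labeled tuples."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()                 # each edge is (label, (e1, ..., en))
        self.incident = defaultdict(set)   # event id -> edges that contain it

    def relate(self, label, *events):
        """Assert the fact label(e1, ..., en), i.e. add one labeled hyper-edge."""
        edge = (label, tuple(events))
        for event in events:
            self.nodes.add(event)
            self.incident[event].add(edge)
        self.edges.add(edge)
        return edge

    def holds(self, label, *events):
        """True if the fact label(e1, ..., en) has been asserted."""
        return (label, tuple(events)) in self.edges

    def facts(self):
        """The same hypergraph viewed as a database of facts."""
        return sorted(self.edges)


cloud = EventHypergraph()
cloud.relate("R", "e1", "e2", "e3")   # a 3-ary relationship among three events
```

Because edges can be added at any time, the structure is updatable in the sense asked for above; removing or expiring edges is left out of the sketch.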

So far so good, but an important question arises: how do we know that a bunch of events are inter-related in our event cloud? More importantly, how do we abstract that knowledge about a particular bunch of events being inter-related, so that we can express algorithmically that whenever we see another equivalent bunch of events they will also be inter-related? We can't keep talking about specific event occurrences; we need to start talking about sets of events that satisfy a certain predicate or, if you want, about types of events, in order to be able to write logic such as "whenever n events of type T occur they are interrelated by relationship R".

So we need to be able to (1) detect the presence of nodes in the hypergraph satisfying certain predicates and (2) add hyper-edges to the hypergraph to be able to add new inter-relations among events.

Now that we have raised our level of abstraction to event types, we have a powerful way to model the inter-relations among the events in our event cloud. We can just construct another hypergraph capturing all our knowledge about the inter-relations in our event cloud, as follows: in this hypergraph, nodes are not event occurrences but event types. Now, if R(T,...,T), with T appearing n times, belongs to the hypergraph, that means that whenever n events of type T occur they are interrelated by R.

This hypergraph, which lives at a higher level of abstraction, defines the relationships that will be dynamically built on the hypergraph that deals with event occurrences and lives at a lower level of abstraction.

Bottom line: the hypergraph of event occurrences could be the underlying model used by the CEP engine to store and expose event occurrences. The hypergraph of event types could be the "logic" that instructs the CEP engine which inter-relations to look for and create.
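The two-level idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the name `RuleEngine`, the encoding of a type-level edge R(T,...,T) as a `(label, event_type, arity)` tuple, and in particular the policy of grouping each consecutive run of n same-typed events into one hyper-edge. The text does not say how the n events should be chosen, so a real engine would need a windowing or correlation policy in its place.

```python
from collections import defaultdict


class RuleEngine:
    """Hypothetical engine: type-level rules create hyper-edges among occurrences."""

    def __init__(self, rules):
        # Each rule is (label, event_type, arity), read as the type-level edge R(T, ..., T).
        self.rules = list(rules)
        self.pending = defaultdict(list)   # rule index -> occurrences not yet related
        self.edges = []                    # occurrence-level hyper-edges created so far

    def on_event(self, event_id, event_type):
        """Feed one occurrence; return any hyper-edges it completed."""
        created = []
        for index, (label, rule_type, arity) in enumerate(self.rules):
            if rule_type != event_type:
                continue
            buffer = self.pending[index]
            buffer.append(event_id)
            if len(buffer) == arity:
                edge = (label, tuple(buffer))   # tuple() copies before the buffer is cleared
                self.edges.append(edge)
                created.append(edge)
                buffer.clear()
        return created


engine = RuleEngine([("R", "T", 3)])
engine.on_event("e1", "T")
engine.on_event("x1", "U")            # a different type, ignored by the rule
engine.on_event("e2", "T")
new_edges = engine.on_event("e3", "T")   # completes R(e1, e2, e3)
```

The rule list plays the role of the hypergraph of event types, and `edges` is the hypergraph of occurrences that it builds dynamically.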

I am not aware of any CEP engine that provides this support for dynamically creating relationships among events (even though some CEP engines use databases of facts under the hood, which, as we have seen, are equivalent to hypergraphs). It seems to be a key feature when dealing with clouds of events.


Greg Reemler said...

Hi Marc,

See my reply here:

Event clouds are not "implemented" per se; however they can be represented.

They are represented as POSETS. This is directly from the CEP literature.
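As a rough illustration of what a poset representation might look like in code, here is a small Python sketch. It assumes the partial order is causality ("a precedes b if a is a direct or indirect cause of b"), which is one common reading; the class `EventPoset` and its method names are hypothetical. Events that are unordered with respect to each other are what distinguish a cloud from a totally ordered stream.

```python
class EventPoset:
    """Hypothetical event cloud as a partially ordered set under causality."""

    def __init__(self):
        self.causes = {}   # event id -> set of its direct causes

    def add(self, event, caused_by=()):
        """Add an event whose direct causes are already in the cloud."""
        if event in self.causes:
            raise ValueError(f"duplicate event: {event}")
        for cause in caused_by:
            if cause not in self.causes:
                raise KeyError(f"unknown cause: {cause}")
        # Causes must already exist, so the cause graph can never contain a cycle.
        self.causes[event] = set(caused_by)

    def precedes(self, a, b):
        """True if a is a direct or indirect cause of b (strict order)."""
        stack = list(self.causes[b])
        seen = set()
        while stack:
            current = stack.pop()
            if current == a:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.causes[current])
        return False

    def concurrent(self, a, b):
        """True if neither event causally precedes the other."""
        return a != b and not self.precedes(a, b) and not self.precedes(b, a)


cloud = EventPoset()
cloud.add("order")
cloud.add("fill1", caused_by=["order"])
cloud.add("fill2", caused_by=["order"])
cloud.add("position", caused_by=["fill1", "fill2"])
```

Here `fill1` and `fill2` are concurrent (neither caused the other), while `order` precedes `position` through either fill.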

Yours warmly,


Anonymous said...

The event cloud is a market data cloud.

Market data price or size changes are events - they affect all the core trading infrastructure, risk positions, etc, and the volume of data dwarfs any inhouse event generation.

Consequently any CEP Engine that is not optimized to process low level market data events across a broad market scope is just an expensive toy.

Market data systems also provide insight into core CEP requirements: index calcs are just calculations based on event windows, conflated/intervalized content removes static, handlers and distribution components maintain state, enrich and filter the stream, etc.
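A toy Python sketch of two of these ideas follows, under loudly stated assumptions: ticks are plain `(symbol, price)` pairs, conflation means "keep only the last tick per symbol within one interval", and the index is a simple weighted sum of last prices. The function names `conflate` and `index_level` are made up for illustration and do not come from any market data product.

```python
def conflate(ticks):
    """Keep only the last price seen per symbol within one interval."""
    latest = {}
    for symbol, price in ticks:
        latest[symbol] = price   # later ticks overwrite earlier ones
    return latest


def index_level(last_prices, weights):
    """Weighted sum of the last prices of the index constituents."""
    return sum(weights[symbol] * last_prices[symbol] for symbol in weights)


# One interval's worth of ticks: symbol A trades twice, only the last price survives.
interval_ticks = [("A", 10.0), ("B", 20.0), ("A", 11.0)]
last = conflate(interval_ticks)
level = index_level(last, {"A": 0.5, "B": 0.5})
```

In this reading, the dictionary of last prices is the "window" and the index is a calculation over it, recomputed once per interval rather than once per tick.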

By definition, market events will be represented in market data feeds and systems. Using an abstract CEP event cloud that doesn't embrace this will either require a lot of (unmaintainable) mapping or result in a CEP system that is not sensitive to market events. Neither makes sense for CEP in the front office.

Anonymous said...

A market data feed is not an event cloud, it is an event stream.

These comments illustrate how little people know about CEP, event clouds, etc.

Anonymous said...

Ummmm, I'm confused. You're embarking on a cutting edge event cloud project and you've just bought a product that doesn't "do" event clouds?