Sunday, April 13, 2008

Addendum - Cloudy Thinking

In my previous post, I don't think that I stated my question properly.

I was asking how to actually implement the Event Cloud in a Streaming SQL-like CEP engine, so that we can build up hierarchies of events. I was looking for schemas, SQL statements, best practices, etc.

I am well aware of POSETS, but what might be easy to implement with C# or Java data structures is not that easy in a quasi-relational system such as Coral8.
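To make the contrast concrete, here is a minimal sketch that uses Python's standard-library sqlite3 module as a stand-in for a quasi-relational engine. The schema is hypothetical: an `events` table plus a `caused_by` link table holding direct causal edges. A recursive common table expression answers "did event A transitively cause event B?", which is exactly the comparability test of a causal partial order. This is plain SQLite, not Coral8 CCL; whether a given streaming engine supports recursive queries at all is an assumption to verify.

```python
import sqlite3

def build_cloud(conn):
    # Hypothetical schema: one row per event, one row per direct causal link.
    conn.executescript("""
        CREATE TABLE events (id TEXT PRIMARY KEY, type TEXT NOT NULL);
        CREATE TABLE caused_by (
            child  TEXT NOT NULL REFERENCES events(id),
            parent TEXT NOT NULL REFERENCES events(id)
        );
    """)
    conn.executemany("INSERT INTO events VALUES (?, ?)", [
        ("e1", "order"), ("e2", "trade"), ("e3", "trade"), ("e4", "alert"),
    ])
    # e1 caused both e2 and e3; e2 caused e4. Nothing links e3 and e4.
    conn.executemany("INSERT INTO caused_by VALUES (?, ?)", [
        ("e2", "e1"), ("e3", "e1"), ("e4", "e2"),
    ])

def ancestors(conn, event_id):
    # Follow parent links transitively with a recursive CTE.
    rows = conn.execute("""
        WITH RECURSIVE anc(id) AS (
            SELECT parent FROM caused_by WHERE child = ?
            UNION
            SELECT c.parent FROM caused_by c JOIN anc a ON c.child = a.id
        )
        SELECT id FROM anc
    """, (event_id,)).fetchall()
    return {r[0] for r in rows}

def comparable(conn, a, b):
    # In the causal partial order, two events are comparable only if
    # one is an ancestor of the other (or they are the same event).
    return a == b or a in ancestors(conn, b) or b in ancestors(conn, a)

conn = sqlite3.connect(":memory:")
build_cloud(conn)
print(sorted(ancestors(conn, "e4")))  # ['e1', 'e2']
print(comparable(conn, "e1", "e4"))   # True
print(comparable(conn, "e3", "e4"))   # False
```

The awkward part in a pure streaming engine is the transitive walk: an in-memory graph does it with a simple traversal, while the relational version needs recursion support in the query language.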

So, I was hunting for advice from people who might have implemented event clouds in Coral8, StreamBase, and Aleri, all three of which are based on SQL.

I will follow up this post with a more concrete example.

©2008 Marc Adler - All Rights Reserved

3 comments:

Hans said...

The misuse of the term POSET is a pet peeve of mine, so I wrote a post about it rather than take up space in your comments: http://hansgilde.wordpress.com/2008/04/13/posets-and-ep-a-red-herring/

But I have done quite a lot of thinking on how to do things in SQL vs. other languages and what is easier in each case. So when you do post more concrete examples, I'll try to help out.

Greg said...

Hi Marc,

There are quite a few people in this space who are mostly comfortable thinking in linear, streaming, deterministic problem sets.

Aleri, StreamBase, Coral8, etc. do not process POSETS (in general); they process linear streams of ordered data (out-of-sequence arrival of time-stamped data does not make a TOSET a POSET).
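The point about out-of-sequence arrival can be illustrated with a small sketch; the `(timestamp, event_id)` tuples are hypothetical. Events that arrive out of order still carry timestamps, so reordering them recovers a single total order in which every pair of events is comparable.

```python
from itertools import combinations

# Hypothetical events as (timestamp, event_id), listed in arrival order (out of sequence).
arrivals = [(3, "c"), (1, "a"), (4, "d"), (2, "b")]

# Reordering by timestamp (event_id breaks ties) recovers one total order.
ordered = sorted(arrivals)
print([eid for _, eid in ordered])  # ['a', 'b', 'c', 'd']

def precedes(x, y):
    # Tuples compare lexicographically: timestamp first, then event_id.
    return x < y

# Totality: for any two distinct events, one precedes the other.
print(all(precedes(x, y) or precedes(y, x) for x, y in combinations(arrivals, 2)))  # True
```

Late arrival only changes when an event is seen, not whether it can be ordered against the others.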

These software platforms are simply not designed for POSET processing. POSET processing requires backward chaining, for example, something that none of these stream-processing-oriented products currently do.

This is one of the reasons a few in the business continue to argue the "POSET discussion is not relevant" point.

For example, if CEP was envisioned to find causal relationships in complex POSETS and the current state of the art cannot do it, then some seem to believe it is better to simply redefine CEP to match the lack of capability.

In a nutshell, folks are talking about orthogonal concepts. You can't properly represent (model) an "event cloud" in an "event stream" product unless you reduce the abstraction to something less complex...

In other words, to model POSETS in a stream processing engine, you would need to reduce them to a TOSET (it is just that simple, really).
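One standard way to perform that reduction is to choose a linear extension of the partial order, i.e. a topological sort. The sketch below uses Python's standard-library `graphlib`; the causal links are hypothetical. Every cause is guaranteed to precede its effects in the resulting stream, but the stream also imposes an order on events (such as e2 and e3) that the partial order left incomparable. That extra, arbitrary ordering is the information a TOSET adds and a POSET does not have.

```python
from graphlib import TopologicalSorter

# Hypothetical causal links: each event maps to the events that directly caused it.
caused_by = {"e2": {"e1"}, "e3": {"e1"}, "e4": {"e2"}}

# static_order() yields a linear extension: every cause appears before its effects.
stream = list(TopologicalSorter(caused_by).static_order())
print(stream)  # one valid order; the relative order of e2 and e3 is arbitrary

# Check the linear-extension property explicitly.
position = {e: i for i, e in enumerate(stream)}
print(all(position[p] < position[c] for c, parents in caused_by.items() for p in parents))  # True
```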

Greg

Hans said...

AFAIK, there are several mistakes in the previous comment.

First, processing a POSET does not require backward chaining. As a counterexample, take any analysis of data that does not consider ordering. If ordering is not a factor in the analysis, then whether the sample or population sets are POSETs or TOSETs is irrelevant. Myth busted.
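A minimal sketch of such an order-insensitive analysis, assuming a hypothetical list of event types: a count-by-type aggregate gives the same answer for every possible ordering of its input, so whether the input is totally or partially ordered makes no difference to it.

```python
from collections import Counter
from itertools import permutations

# Hypothetical event types, in whatever order the engine happened to receive them.
events = ["trade", "quote", "trade", "cancel"]

def counts_by_type(evts):
    # A pure aggregate: it never looks at which event came first.
    return Counter(evts)

# Evaluate the aggregate over every permutation of the input and collect distinct results.
results = {frozenset(counts_by_type(p).items()) for p in permutations(events)}
print(len(results))  # 1
```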

Next, backward chaining has absolutely nothing directly to do with processing POSETs. Just take a look at this simple description of backward chaining and you'll see what I mean.

Finally, it is dead simple to represent a POSET in any EP engine, and this is a great example of why the concept is useless. And here it is: any structure that does not preserve ordering.

Store all incoming events/messages in a bucketed window/table/map or any other structure that does not preserve ordering. Ta-da. I present you with a representation of a POSET.
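A sketch of that idea, assuming hypothetical events keyed by ticker symbol: each bucket is a set, which records membership only, so the arrival order is discarded. The result is a trivial partial order in which no two distinct events are comparable. (Python dicts happen to remember key insertion order, but nothing here relies on it.)

```python
from collections import defaultdict

# Hypothetical incoming events: (event_id, symbol).
incoming = [("a", "IBM"), ("b", "MSFT"), ("c", "IBM"), ("d", "MSFT")]

# symbol -> set of event ids; a set keeps no ordering of its members.
buckets = defaultdict(set)
for event_id, symbol in incoming:
    buckets[symbol].add(event_id)

# Sorting here is only for a stable printout; the structure itself holds no order.
print({s: sorted(ids) for s, ids in sorted(buckets.items())})
# {'IBM': ['a', 'c'], 'MSFT': ['b', 'd']}
```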

If you insist on having some kind of ordering available, then use a structure where data is sorted within a bucket, but buckets are not sorted against each other. Now some elements are comparable to each other (within a bucket), but two arbitrary elements are not necessarily comparable. It's just that simple. Don't worry - although I have just added POSET functionality to your EP engine, I will not charge you any royalties.
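The sorted-within-bucket variant can be sketched the same way. The event tuples and the `compare` helper are hypothetical: events are sorted by timestamp (event id breaks ties) inside each bucket, and `compare` returns `None` for events in different buckets. The result is a partial order made of disjoint chains: comparable within a bucket, incomparable across buckets.

```python
from collections import defaultdict

# Hypothetical events: (event_id, bucket_key, timestamp).
incoming = [("a", "IBM", 5), ("b", "MSFT", 2), ("c", "IBM", 1), ("d", "MSFT", 9)]

buckets = defaultdict(list)
for event_id, key, ts in incoming:
    buckets[key].append((ts, event_id))

# Sort inside each bucket only; record each event's bucket and rank within it.
rank = {}
for key, events in buckets.items():
    events.sort()  # by timestamp, then event_id as a tie-break
    for i, (_, event_id) in enumerate(events):
        rank[event_id] = (key, i)

def compare(x, y):
    """Return -1, 0 or 1 for events in the same bucket, or None if incomparable."""
    (kx, ix), (ky, iy) = rank[x], rank[y]
    if kx != ky:
        return None  # different buckets: no ordering is defined
    return (ix > iy) - (ix < iy)

print(compare("c", "a"))  # -1: same bucket, c has the earlier timestamp
print(compare("a", "b"))  # None: different buckets
```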