Wednesday, May 13, 2009

Microsoft CEP - Part 2

Over the next few days, I hope to blog a bit about the new Microsoft CEP product, code-named "Orinoco".

To get yourself a bit more familiar with what was said at the announcement, you may want to go over to Richard Seroter's blog for a great post that summarizes the session at TechEd in which the Microsoft CEP product was announced.

I first got wind of Orinoco in the Spring of 2008 during conversations with some members of the SQL Server group at Microsoft. The Orinoco group maintained radio silence until January of this year, when I got an invitation to come out to Redmond for a Design Review Session. I was joined there by several internal and external Microsoft customers, and we spent 2 days going over Orinoco from front to back.

I will blog a bit more about the technology of Orinoco in some future blog posts. But, I would like to say that Orinoco has the potential to radically change the CEP marketplace.

Orinoco will use LINQ as the "surface language". This builds upon the good work of people like Kevin Hoffman, whose CLINQ is used heavily at Liquidnet. The ability to embed CEP operations within a normal C# application is extremely powerful. You can take advantage of the power of the C# language and you can go back to using mechanisms like inheritance and generics in your CEP applications. This approach combines the expressive power of something like Streaming SQL with the flexibility of something like Apama's MonitorScript.
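To give a flavor of what an embedded CEP query might look like, here is a purely illustrative C# sketch. The stream type and window operator names (`ToCepStream`, `HoppingWindow`) are my own inventions for illustration — the actual Orinoco surface API has not been published yet — but the query syntax itself is ordinary C#/LINQ.

```csharp
// Hypothetical sketch only: "ToCepStream" and "HoppingWindow" are invented
// names; the real Orinoco API may look quite different. The point is that
// the query is ordinary, strongly-typed C#.
public class Quote
{
    public string Symbol { get; set; }
    public double Price { get; set; }
    public long Size { get; set; }
}

// Compute a rolling VWAP per symbol over 5-second windows, hopping every second.
var quotes = marketFeed.ToCepStream<Quote>();

var vwap = from win in quotes.HoppingWindow(TimeSpan.FromSeconds(5),
                                            TimeSpan.FromSeconds(1))
           from q in win
           group q by q.Symbol into g
           select new
           {
               Symbol = g.Key,
               Vwap = g.Sum(q => q.Price * q.Size) / g.Sum(q => q.Size)
           };
```

Because the query is just C#, `Quote` can be an ordinary class (or a subclass), and the query can be composed with generics, helper methods, and the rest of the language — exactly the flexibility that a standalone streaming SQL dialect lacks.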

I don't think that Microsoft has defined their final marketing strategy around Orinoco (although Charles mentions that it will be part of SQL Server 2008 R2). When I met with the Orinoco team in April, they were still gathering feedback from potential customers as to how to market the CEP technology. But, I can imagine that for the immediate future, Orinoco will be free to users of SQL Server. I can also imagine that some part of Orinoco will be free to Visual Studio users (like SQL Server Express) and that an Enterprise version will be licensed like SQL Server. Perhaps a free developer version will come with MSDN? Perhaps it will be a free add-on to SQL Server in the same way that Reporting Services is free. This can be used to drive more SQL Server sales, and put a damper on competitive products like Sybase RAP. It can also be a compelling factor for developers to choose Windows instead of Linux for their CEP servers.

This means that every .Net developer will have access to CEP technology at a price point that will make it extremely attractive.

CEP vendors like CorAleri, StreamBase, and Apama will inevitably find themselves under a degree of pricing pressure. When the 2-ton gorilla bursts into the room with a free or low-cost offering, everyone has to sit up and take notice.

Orinoco is still a while away from being as polished as some of the commercial CEP products. But, the prospect of using LINQ, the integration with Microsoft HPC, and the ability to use Visual Studio as the IDE have gotten us pretty excited.

More later....


©2009 Marc Adler - All Rights Reserved.
All opinions here are personal, and have no relation to my employer.

14 comments:

Anonymous said...

I see two serious problems with this:

1. Performance. As if the current CEP offerings weren't already slow enough, here comes Microsoft trying to implement the engine in .NET using LINQ and Windows. My C++ stream processing systems based on messaging and Linux are probably 100 to 1000 times faster than anything they'll ever come up with.

2. Integration. I suspect that a lot of people use CEP engines as a type of middleware to connect all the components of their infrastructure together. I'm willing to wager that, in the wild, people are regularly (ab)using their CEP servers as glorified pub/sub systems. Middleware can only be viable if it can integrate with different types of technologies. And of course Microsoft (especially the SQL Server team) will do everything in its power to prevent its customers from being able to integrate with non-Microsoft technologies. Microsoft does not even support accessing a plain old SQL Server database from Linux using ODBC!

Microsoft is a sinking ship. This is just another attempt by Microsoft to take an old technology and (badly) reimplement it on Windows and making sure to break compatibility with anything other than Microsoft products. That worked for a long time for Microsoft, but the strategy has already failed in high-end enterprise / high performance computing world (where Linux has a 95%+ market share) and it is also rapidly failing to Apple in the consumer space (see http://www.roughlydrafted.com/2009/05/09/why-windows-7-is-microsofts-next-zune/)

marc said...

Anonymous - A few comments....

1) The CEDR engine, which is the actual CEP engine, is written in C++. I have seen some of the code, and the guys who wrote it are definitely aware of performance considerations. For example, they are aware of the cost of calling certain functions (i.e., memcpy) versus doing several small repeated assignments.

2) The Orinoco team seemed to be very aware of interoperability. This was one of the main concerns of the people at the Design Review who were from the financial services sector. We certainly do not expect all of our Java developers to transition to .Net just to use the capabilities of Orinoco.

3) I have been fairly critical of Microsoft in my blog, and I have probably given them more lumps than praise. And, I certainly pay close attention to the Mini-Microsoft blog. Nevertheless, there is a growing awareness inside of Microsoft that they need to open themselves up to interop and they need to address the concerns of high-end computing. Some of the people high up in the SQL Server team have recently come from Oracle, so there is an awareness building in that team of what the enterprise needs.

Hans said...

Being written in C++ might not be so great: it means you're marshaling every event to .NET before it gets to your code. But we'll see.

I wouldn't count it out yet. I once tested some vendor CEP products against equivalent code written in C++ and the ACE framework and got pretty good results. So if they're smart, MS could replicate that kind of performance. Not the best possible, but pretty good for something inside a framework.

I wonder if this stuff will take off outside of FSIs. Does the average Joe what's-a-deadlock-anyway programmer really want to mess around with real-time queries?

Charles Young said...

As far as performance is concerned, I don't think Anonymous should have too many concerns. It is far too early to talk facts and figures, but the numbers that emerged internally are entirely in the right 'ball park'. In the keynote, Bill Veghte talked about how the engine is currently processing 500 million events a day in relation to one of Microsoft's web sites. By my calculation, this represents only a fraction of its potential capacity.

With regards to integration, one interesting point to note is that, after recent re-orgs, the SQL Server and BizTalk teams are now closely aligned. I’ve heard some interesting speculation in recent weeks about what impact the CEP engine may have on the future evolution of BizTalk Server. With the most recent version of BTS shipping just a few weeks ago, we will have to wait some time before we see. BizTalk is, amongst other things, an enterprise level integration server whose whole purpose is to support integration across the widest range of platforms.

I haven't a clue what the 'old', 'reimplemented' technology is that Anonymous refers to in this space. The engine is built on some fairly cutting-edge technology that has emerged from Microsoft Research. Anonymous clearly entertains a degree of antipathy towards Microsoft, and readers can make of his/her comments what they will.

marc said...

Charles,

The words "event" and "processing" are open to wide interpretation. Is an event a 4000-byte FIX message that has to be parsed, or is it a 16 byte code? Do they perform complex analytics on each event, or do they merely put the event in memory?

This is why organizations like STAC Research are so important. They came out with a series of benchmarks for CEP engines. I would like to see how Orinoco compares with the other CEP engines in these benchmarks.

Hans said...

"As far as performance is concerned, I don't think Anonymous
should have too many concerns."

Ha ha. We will see.

Charles Young said...

Well, yes, you are right Marc, of course. Living in BizTalk-land, I'm often asked by potential customers how many messages BizTalk can handle a second. There isn't an answer. It depends how big the message is, how much metadata you collect around the message, how much processing you do on the message, how you configure BizTalk, how much hardware you throw at the problem, etc., etc.

The point, though, is that early indications are that MS is unlikely to be embarrassed when comparing performance to other engines. Deep optimisation of the engine will almost certainly still be an outstanding task, and it will be some time before realistic benchmark results come along.

Paul B. said...

The sort of customers that have traditionally bought a CEP product have often required very high event processing performance. Hence there will be a need by Microsoft to prove that their engine is capable of high performance in order to be credible against other engines in the same space.

However, I think focusing on performance alone is missing the point. Microsoft will bring CEP to a large number of organisations that won't have looked twice at a CEP product due to its niche position/high price. These organisations may not need extreme performance but will be happy with the performance on offer for the price point, while at the same time able to take advantage of the capabilities that a CEP product can offer.

Unknown said...

So now IBM is also in the mix.

http://tinyurl.com/oqyzsf

Marc, any comments?

Charles Young said...

Yes, I agree very much with Paul. The challenge facing Microsoft (and the CEP world, generally) is how to incorporate CEP into their platform in a way that will make sense to a much broader constituency than is currently the case. That’s why I see significance in the way that Microsoft has re-aligned the teams responsible for analytics, data management, BPM, RFID, BAM, etc.

About IBM, yesterday's launch of InfoSphere Streams/System S follows announcements made a couple of years ago. I may be wrong, but my impression is that this technology is not specifically orientated to event processing, but has a much wider remit. However, it is clearly linked to their 'BEP' strategy which is built on WebSphere Business Events (the old AptSoft product). IBM also has their 'Amit' CEP engine.

Jerry Baulier said...

Given Aleri is the ONLY CEP vendor with certified (STAC) benchmarks, I invite the Microsoft CEP team to take our performance challenge via STAC. To date, no other CEP vendor has taken us up on this - I wonder why :-)

To do this, of course, they will have to support Reuters OMM and updates and deletes, given this is a consolidated multi-venue order book.

Jerry


Jerry Baulier | Chief Technology Officer, VP - R&D
Aleri - Continuous Intelligence™
430 Mountain Ave. | Murray Hill, NJ 07974 | U.S.A.
Office: 908.673.2440
jerry.baulier@aleri.com | www.aleri.com

Anonymous said...

Charles, you commented, "in the keynote, Bill Veghte talked about how the engine is currently processing 500 million events a day in relation to one of Microsoft's web sites." By my math, that's about 347,000 events/minute, or 5,787 events/second. Not very impressive IMHO, and hardly something that calls for CEP. BTW, what's the rate of a typical OPRA feed these days? How about a Reuters equity feed?

Anonymous said...

I believe that MS has mentioned an initial target at release of 100,000 events per second.

Charles Young said...

Oh come on, anonymous. You can surely do better than that. So MS can, today, point to an unreleased, un-optimised, alpha version of the engine handling 500 million events a day on a live web site. The only thing you can logically deduce from that information is that the web site generates 500 million events a day. No one ever suggested that this figure is some upper limit of the engine's capabilities. Please! We simply won't know what the performance is until Microsoft has released fully optimised code and someone has done some proper benchmarking.

Patience, patience.

Re. the figure of 100,000 events per second, this stems from statements in internal papers, repeated publicly at TechEd, that suggest this as a minimum 'ball park' requirement for financial services scenarios. I doubt very much MS would make these statements unless they were confident of meeting and exceeding that kind of performance. Rumours abound that the alpha version of the engine is already performing comfortably within that range, even though it is as yet un-optimised.