Thursday, January 31, 2008

REST Sprinkled into my XMPP

Been doing some low-level IQ packet handling in XMPP lately and, interestingly, RESTful philosophy seems to be influencing me from afar. We started the packet design with the goal of sending a list (a roster, though not an XMPP roster) of users associated with an artifact to a remote replica. The initial protocol design had elaborate Add Document, Add User, Remove User and Delete Document packets, with corresponding XMPP IQ responses and errors for each. It quickly became apparent that there wasn't really a need for separate packets for Add User, Remove User or even Delete Document. When we started envisioning the resource as a document and simply hammering the list of users across along with the document, much the way we would expect an idempotent PUT to work in REST, the entire protocol got much simpler: just send the list every time. When there are no users, send an empty document packet. Otherwise, send the authoritative list. That's much easier than the bookkeeping needed to track whether I've delivered a notice for one user and whether it's been acked; just send the whole thing every time.
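To make that concrete, here's a minimal sketch in Python of what the sender side boils down to. The element names and namespace are invented for illustration (this isn't our actual wire format); the point is that every packet carries the complete list, so an empty list is just an empty payload.

    import uuid
    import xml.etree.ElementTree as ET

    def build_document_users_iq(to_jid, doc_id, users):
        """Build a 'set' IQ carrying the complete, authoritative user list."""
        iq = ET.Element("iq", {"type": "set", "to": to_jid, "id": uuid.uuid4().hex})
        query = ET.SubElement(iq, "query",
                              {"xmlns": "urn:example:doc-users", "doc": doc_id})
        for jid in users:  # an empty list yields an empty document packet
            ET.SubElement(query, "user", {"jid": jid})
        return ET.tostring(iq)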

Even better, if the destination misses a packet, the sender doesn't care; it can just send the current one, because the new packet is authoritative at the time it's sent.

All said and done, this eliminated a ton of state maintenance on both ends of the protocol. The sender didn't have to remember what it had fired off to the recipient or whether it was adding or removing someone, nor did it need to manage the corresponding ack/error packets in detail: if a packet failed, just resend the current state. Likewise, the receiver was able to simply take the incoming update and make it fact; the reconcile process would ultimately be far easier than the alternative, which amounted to checking whether a user existed on the document, removing them if they did and erroring if they didn't. That level of detail was immaterial to the sender; it simply wanted to publish *the* authoritative list of users and nothing more.
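The receiver side of the same sketch collapses to almost nothing (same invented element names as above): take whatever the packet says and make it fact.

    def handle_document_users_iq(store, query):
        """Replace local state with the incoming authoritative list; last write wins."""
        doc_id = query.get("doc")
        store[doc_id] = {user.get("jid") for user in query.findall("user")}
        # No add/remove bookkeeping, no per-user acks; a missed packet costs
        # nothing because the next packet is just as authoritative.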

Too bad it took a practical implementation to realize most of the early protocol design wasn't needed :) Kind of reminds me of the recent SOAP vs. REST debates.

Monday, January 28, 2008

Really Deep Dynamics

Patrick loses me with this statement:

"Face it. The history of programming is one of gradual adoption of dynamic mechanisms."

I read the post as saying the equivalent of "automatics gradually replaced manual transmissions". Sure, it's true depending on how you look at the problem, but I'm not convinced it's really a practical, applied endorsement of dynamically typed languages. I'm not sure it makes the most sense if I'm worried about demonstrating a performant, maintainable, visually appealing result to my customers. You may like that automatic for giving average-Joe customer a test drive, but it's not what you take on the Autobahn and it's not what you use to haul a big load cross-country. More likely, the automatic is the feature you sell to the masses who can't use the more precise, focused machinery.

Looking at some of Patrick's points:

  • "The problem is they are rarely asked by open-minded people." - seems ad hoc and condescending from the beginning. Pot, kettle meet off-white. I'll try and ignore that tone for the rest of the response although it does resonate through the rest of the post.
  • "I know large systems can be built statically. I've done it in more than one language. I've also done it dynamically in more than one dynamic language. Have you?" - I'm guessing Perl and Python meet this requirement on the dynamic side; pick your poison on the static side, C, C++, Java even C# or VB if you like to make me suffer. From my experience, in code bases of roughly 100 KLOC in size or more, having a compiler in place to check that you haven't made any stupid mistakes was actually helpful. Note that I said helpful, not sufficient. The problem I have with relying on tests for code bases of size is that you assume programmers are capable (as a fallible piece of wetware) of writing near-perfect tests. This is simply not true and as the code grows larger, the chance that your tests are inaccurate increases - this is a function of human nature and deadlines, not any specific programming language, static, dynamic or otherwise.
  • As a counterpoint, I spent roughly $2000 of client consulting hours debugging an issue in a commercial product built on Python that looked like this:
    [foo, bar] = getImage(file, iso_varriant)  # ...
  • In normal circumstances, this returned a tuple of two items. However, with certain file system encodings, it returned a tuple of three items. Python, being its dynamic self, chose to ignore the third item in the return tuple under those conditions. In a full unit and integration test suite, the test for this method passed. At runtime, it failed because the third item was ignored. There was no good way to simulate this in a test because you could only reproduce the pain when you physically had a CD in the drive (due to the way the Python interpreter handled the ioctls). In other words, the world's most awesome, dynamic test suite still failed while everything remained syntactically correct. The dynamic language let the failure pass silently, whereas a statically checked return type would have barked before the program ever made it to test, or at least bailed with a stack trace at runtime. Instead, the Python interpreter quietly chugged along until the failure occurred more than 20 frames down the stack, making it even more difficult to diagnose. Not always the case, to be sure, but I think you can make equally valid arguments on both sides of the fence if you've used both static and dynamic languages in the wild. Relying on tests for refactoring and coverage is a luxury of small-to-mid-size code bases. Inevitably, if your code is big enough, it will touch enough edge cases that the tests won't cover every condition, and changing a method will result in runtime errors, not test failures. That is a function of our cognitive limits as humans, not of any programming language.
  • "On the other hand nothing eases the pain of bad projects." - absolutely. Personally I don't see anything about static languages that makes this more likely. From my experience, bad PM can railroad any initiative, static or dynamic.
  • "Face it. The history of programming is one of gradual adoption of dynamic mechanisms." - so true. Except, it is true in the context of static languages becoming dynamic, not in the sense that dynamic languages are conquering the world. Consider the success of static languages since the 70's:
    • The browser I'm using to post this blog is written in C++. Cross-platform, dynamic C++ but statically-typed C++.
    • The system libraries this browser uses are built on C.
    • The windowing libraries this browser is using are built on C and C++.
    • The OS that those windowing and system libraries run on is built on C and exposes strongly-typed C APIs.
    • The iTouch I used to read the original post runs software written almost entirely in C.
    • Nearly every device, feed reader, web server and browser that parses this content will be written in C or C++.
    • The browser you are using to read this was almost certainly written in C or C++. It might use some JavaScript, but that is only optionally typed (based on the current ECMAScript spec) and is interpreted by a C-based interpreter anyway.
Conversely, I'm not using a browser built on Smalltalk. My system libs are not built on Scheme. In fact, to be honest, none of the utilities I use on a daily basis are built on a dynamic language (Objective-C being the potential crossover, and its intent is not to be dynamic). There may be a few Java applications on my MacBook Pro that I use regularly, but even those are heavily reliant on kqueue - a kernel facility - and native windowing libraries. Firefox is proud to announce new Mac-native widgets, not new Smalltalk plugins.

Are dynamic languages influencing all that native code? Well, maybe. I can reload kernel modules and I can link to DLLs or SOs, but loading all that static, native code is done through more static, native code. I don't use a Ruby linker; the linker that makes my OS do its dynamic magic is proudly compiled down to a low-level, usually machine-specific set of instructions, coded by a handful of really smart people at Microsoft or in the GNU project.

I'm hard-pressed to come up with any end-user applications I find useful that do not directly depend on something statically compiled. Yes, some of the web applications I use leverage Ruby, Seaside, etc., but the browser or RESTful client library that accesses them is (for me, anyway) immediately based on a statically-typed library (Firefox, libcurl, etc.).

So, I'm probably missing Patrick's point. Most of the innovative apps I use don't depend on dynamic programming languages. In fact, if you took away many of the languages Patrick cites - Scheme, Lisp and Smalltalk - I would be able to go about my day without a single glitch. Conversely, take away C, C++ or Java, and I'm pretty sure I'd notice (i.e. no OS, more than 75% of HTTP traffic, and roughly half of HTTP application servers, respectively).

Friday, January 25, 2008

XMPP For Integration

Integration bloggers have been talking about using XMPP for years now. The XMPP community has entire specifications dedicated to things like RPC, Service Discovery and PubSub that make weaving together distributed systems easier (I think the native presence information helps as well). Matt Tucker gives a solid overview of XMPP in the integration space with a new post that is getting a lot of press coverage and shot to the top of Digg. I feel like XMPP's potential as an integration tool is finally starting to get the attention it deserves. Maybe it's a result of the imploding WS-Death-Star stacks and heavyweight standards that have plagued integration developers for years? Maybe a natural gravitation toward a better technology stack - hopefully.
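As a small taste of why those building blocks matter: service discovery (XEP-0030) is just another IQ. Here's a minimal sketch of the raw stanza in Python; in practice any XMPP library will build and send this for you.

    import xml.etree.ElementTree as ET

    def build_disco_info_iq(from_jid, to_jid, iq_id="disco1"):
        """Ask a remote entity what identities and features it supports (XEP-0030)."""
        iq = ET.Element("iq", {"type": "get", "from": from_jid,
                               "to": to_jid, "id": iq_id})
        ET.SubElement(iq, "query",
                      {"xmlns": "http://jabber.org/protocol/disco#info"})
        return ET.tostring(iq)

The response enumerates what the other side can do, which is exactly the kind of runtime discovery that makes wiring unfamiliar services together plausible.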

Sunday, January 20, 2008

River Disappoints Again

I had a desire to see if I could get Jython or JRuby working with Apache River this weekend for some distributed monitoring and task execution I want to try. I need something quick and reliable, with code mobility, that can ideally integrate easily with Java. I did see some new updates on the River site, which was initially encouraging, but after digging, it looks like little has changed. There were no source distributions, which was concerning, but I'd used the older Sun distributions of Jini before, so I wasn't scared about building from source. I proceeded to check out from SVN. Sadly, the listed SVN URL doesn't work - there is nothing there.

Walking up the tree to the /asf/ root, there is no sign of the source or the River project. I'm sure it is around somewhere, but seriously, how much less approachable could this project be? This is no way to attract new users. I didn't think it could get worse than the old Sun site; I was wrong. What a shame.

Looks like it's back to Rinda then. I've avoided Rinda in the past for important data, given that there is no reliable storage of tuples. At this point, it's probably easier to bolt that onto Rinda than to pull anything meaningful out of River. Alternatively, maybe a better approach would be to make smarter Rinda clients that can survive a failure of the master ring server, using some local persistence and an eventually consistent approach.
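Roughly what I have in mind, as a sketch (in Python rather than Ruby just to keep it short, and the tuple-space client here is a stand-in interface, not Rinda's actual API): journal writes locally when the ring server is unreachable and replay them once it's back.

    import json
    import os

    class ResilientTupleClient:
        """Stand-in wrapper around a Rinda-like tuple space client."""

        def __init__(self, space, journal_path):
            self.space = space            # anything exposing write(tuple)
            self.journal_path = journal_path

        def write(self, tup):
            try:
                self.space.write(tup)
            except ConnectionError:
                # Ring server is down: persist locally instead of losing the tuple.
                with open(self.journal_path, "a") as journal:
                    journal.write(json.dumps(list(tup)) + "\n")

        def replay(self):
            """Once the ring server is reachable again, push everything we held on to."""
            if not os.path.exists(self.journal_path):
                return
            with open(self.journal_path) as journal:
                for line in journal:
                    self.space.write(tuple(json.loads(line)))
            os.remove(self.journal_path)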

Thursday, January 17, 2008

CSRF is the new XSS

I've been looking at Chris Shiflett's CSRF GET to POST converter and have to say that it's got me a bit freaked out. I don't normally do the annual prediction thing, but the more I look at it, the more I think we'll see 2008 as the year of the CSRF, particularly if social networking sites continue to grow in popularity among less-technical users.

I've seen mention that the attack can be mitigated by using a nonce of sorts in the form and session data, a value that must be posted back with the form for a valid request (I've sketched that mitigation further down). But as I look at how Chris executed his redirector (loading an iframe on click), I can't help but think that once I have an unprotected and confused security context in the browser, I'm able to work around such nonces when:

1) I can make a reasonable guess that the user has an authenticated session. I don't always have to be right; being right every couple of times will do fine.

2) I can parse out a response from the server such that I can snag the nonce parameter. Not hard once I have a confused, unprotected scripting context in an iframe that the browser is mistakenly trusting.

#1 is not a big deal. I can make some reasonable guesses and target my links appropriately using a bit of social engineering. It doesn't have to be terribly accurate; moderate success is enough to grab some "seed" accounts from which to stage future attacks.

#2 is easy - if I can execute script in an iframe (as Chris' redirector demonstrates), then I can string together multiple XHRs. As long as the user has a session I can piggy-back on, extracting a form nonce is no more difficult than submitting the post in Chris' redirector.
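For context, the nonce mitigation I'm referring to is usually just a per-session token: generate it server-side, embed it in the form, and reject any POST that doesn't echo it back. A minimal, framework-agnostic sketch (the session object here is whatever server-side per-user store you have):

    import hmac
    import secrets

    def issue_form_token(session):
        """Generate a per-session token and remember it server-side."""
        token = secrets.token_hex(16)
        session["csrf_token"] = token
        return token  # render this as a hidden <input> in the form

    def is_valid_post(session, submitted_token):
        """Reject any POST whose token doesn't match the one stored in the session."""
        expected = session.get("csrf_token")
        return expected is not None and hmac.compare_digest(expected, submitted_token)

The catch, as described above, is that this only holds as long as the attacker can't read your pages: once script is running in a trusted context, it can fetch the form and lift the token before posting.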

Would love to hear some thoughts on how others are dealing with the issue in the face of user-provided content.

Oh, and if you're wondering how easy it is to execute one of those attacks: if you have a Blogger account *and* you clicked on the link to Chris' account above, consider yourself vulnerable - that's about all it takes.

Sunday, January 13, 2008

+1 To Yahoo! Games '08 Predictions

Yahoo! Games is spot-on with their predictions for the upcoming gaming year.

In particular, I think we'll see the following:

1) The Wii will jump the shark - I played the Wii for several hours over the holiday season and while it was fun, the novelty wore off quickly. I tried like mad to will Wii Sports into an online mode but, to my dismay, one doesn't exist. You can only swing a virtual tennis racket in the confines of your own living room for so long while your neighbor is playing Madden with friends across the country on a system two generations ahead of the Wii.

2) In spite of being the worst-marketed, most technically advanced gaming platform ever devised, the PS3 will rebound. For $200 more than the cost of a Blu-ray player, you also get a Cell processor in a Linux-driven beast of a gaming system that helps society by lending its processing power to protein-folding analysis in the off hours. I have a ton of stories about how its setup was better than the Wii's or the Xbox 360's, but those will have to wait for another post. In short, the PS3 continues to lag in online content, but I still believe the hard-core gamers who spend the most money will continue to gravitate toward the PS3 and away from the Wii, and the developers will follow. The early adopters I know (including me) are moving toward the better platform, in spite of Sony's seemingly insane approach to delivering it. I'm convinced devs will follow; nobody wants to build on the gaming equivalent of Visual Basic when the most advanced 3D platform ever is growing its install base.

3) PC gaming will continue to deteriorate. I love my PC-only games. I think most games being developed today actually play better with a mouse and keyboard than with two awkward analog sticks under-thumb. And while I've really loved playing WoW for the last few weeks, games like Crysis are simply making me miserable. By all accounts, I have plenty of hardware to play Crysis, but I still get Vista core dumps on a regular basis. As a result, I'm rebuilding my awful NVRAID mirror on a regular basis (I'll never buy an Nvidia RAID solution again; that monster of BIOS + Vista + software RAID has cost me more time and pain than a decent hardware RAID solution ever would). Crysis is the best FPS I've ever played. It's immersive, well-designed and capable of making even the most modern GPUs sweat. None of that scares me; it's Vista's instability in 64-bit gaming, and the constant recovery exercise I'm forced into as a result, that will push me closer and closer to buying console-only games.