Computing MD5 Hash of varbinary(max) in SQL Server CLR

SQL Server provides the HASHBYTES function, which computes the MD5 hash value of the bytes passed to it, with one problem... it only accepts the older (pre-2005) max length of 8000 bytes in a varbinary. That's useful, I suppose, if you're trying to get the MD5 hash of a password or a name, but completely useless in a case like I had, where we needed to compute the MD5 hash of a blob column -- and a blob column where the max datalength is dozens if not hundreds of megabytes. I needed something different!

Our first pass was to write a command line application that selected the filedata column (the blob) from the table and then just computed the MD5 from w/in C#. That made some sense, since the point of this was to check an export we were doing of the file data from blobs to files in the file system, but it just seemed wrong somehow. If only HASHBYTES wasn't broken! Then it dawned on us: we could write a SQL CLR function that computed the hash value from w/in SQL Server. This made infinitely more sense.

If you haven't played with them, SQL CLR functions allow you to code a static method in your .NET language of choice (C#!) and then install it into SQL Server as if it were a function you'd written in TSQL. You can use it like any other function in a TSQL statement. SQL Server actually acts as a CLR host, starting the appropriate app domains and managing memory consumption (we'll touch on that later). Your .NET code is actually running IN the SQL Server process! If you're wondering about Microsoft's commitment to SQL CLR, the new spatial data types (geography and geometry) in SQL Server 2008 are implemented using SQL CLR! Even though HASHBYTES is broken, we could write our own hash method that did work on varbinary(max) values and still compute the hash value from w/in SQL Server.

My immediate requirement was for an MD5 hash of a varbinary value, but I wanted to make the API slightly more generic, so I came up with a method signature that took the varbinary we wanted to hash and the type of hash to compute. This way the same method could handle MD5, SHA1, etc.

My first pass looked like the code below, and it worked great in testing. I passed in the blob to hash and a string to tell it the hash type. For example: select dbo.ComputeHash(BlobColumn, 'md5') from TableName worked just fine (in test!). I installed it in production and let her rip!
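(My original listing isn't reproduced here, so this is a minimal sketch of that first pass. The class name matches the SqlHashFunctions.dll assembly you'll see later; note there's no null handling yet.)

using System.Data.SqlTypes;
using System.Security.Cryptography;
using Microsoft.SqlServer.Server;

public class SqlHashFunctions
{
    [SqlFunction]
    public static SqlBinary ComputeHash(SqlBinary source, SqlString hashType)
    {
        // Create the requested algorithm ("md5", "sha1", etc.) by name.
        using (HashAlgorithm algorithm = HashAlgorithm.Create(hashType.Value))
        {
            // First pass: hash the whole value as a single byte[] allocation.
            return new SqlBinary(algorithm.ComputeHash(source.Value));
        }
    }
}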

As proof of why you should test corner cases, two things happened in production. First, what happens when the code above attempts to compute the MD5 hash of a null? That is, the passed-in SqlBinary value is null because the underlying column is null? Well, it goes boom! That's what happens. This isn't a big deal, as the fix is simple: if the source value is null then return a null. I wasn't sure what the "right" value to return was, but I figure the hash of a null is different from the hash of an empty string, so returning null made the most sense. Problem fixed... rerun...
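(For the record, the fix is just a guard at the top of the function -- a sketch matching the signature above:)

        // The hash of NULL is NULL, not the hash of an empty value.
        if (source.IsNull)
            return SqlBinary.Null;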

OK, now there's a new problem. The code ran fine for the first 5000 or so blobs it handled, and then boom! We got around to a blob that was 32MB in size, and that was just too darn big. SQL Server killed the CLR app domain because it was consuming too much memory! Remember, the CLR integration is running in-process in SQL Server, and SQL Server's primary concern is to protect itself. When an app domain gets too big, SQL Server kills it. Specifically, you get:

Msg 6532, Level 16, State 49, Line 1
.NET Framework execution was aborted by escalation policy because of out of memory.

It took some Googling, but I came across Chris Bernard's blog post "Out of memory exception while attempting to do sql clr," and that shed some light on the issue. Thanks, Chris!

However, I didn't like the idea of messing w/the SQL Server settings, especially since this was a one-time function we were running, so I needed a different solution. I knew the problem was that we were passing the blob in as one giant memory allocation. SQL Server probably buffered it up in its own memory, then passed a copy to the CLR, which probably also had a copy or two hanging around. This is all fine when you're passing 500 bytes, but with a 32MB blob it adds up fast! If only we could get raw access to the SQL data...

I learned long ago that if you find yourself saying "Gee, it would have made sense if Microsoft did 'x'" then they probably already do 'x'; you just need to find out where/how. And this was no different. In addition to the SqlBinary class, there's also a SqlBytes class. And guess what, the SqlBytes class provides a property called Stream that gives access to the underlying stream of bytes. Since ComputeHash takes a stream just as readily as it takes a byte[], we could simply pass the stream in and everything worked.
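Putting it all together, the final version looked more or less like this (still a sketch; the Stream overload of ComputeHash is the important part):

using System.Data.SqlTypes;
using System.Security.Cryptography;
using Microsoft.SqlServer.Server;

public class SqlHashFunctions
{
    [SqlFunction]
    public static SqlBinary ComputeHash(SqlBytes source, SqlString hashType)
    {
        // The hash of NULL is NULL.
        if (source.IsNull)
            return SqlBinary.Null;

        using (HashAlgorithm algorithm = HashAlgorithm.Create(hashType.Value))
        {
            // HashAlgorithm.ComputeHash has a Stream overload, so we never
            // materialize the whole blob as a byte[] in our app domain.
            return new SqlBinary(algorithm.ComputeHash(source.Stream));
        }
    }
}

The call from TSQL is unchanged: select dbo.ComputeHash(BlobColumn, 'md5') from TableName.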

I kept an eye on the memory usage of the CLR with this new code and it never exceeded 75MB, a much better solution!

Next time we'll look at how to install the CLR function in SQL Server, and we'll see why I needed the command line applications to create a hex representation of a file and to populate the clipboard from the command line.

Command Line Hex Representation of File or Standard Input

In my last post (Populating Clipboard from Console App) I showed a *simple* command line app to copy something to the clipboard. The "something" I wanted to copy to the clipboard was the output of this simple app -- a command-line application that takes a file (or any standard input data) and returns the hex representation of the input. That is, if you have a file containing "Walden" then this app will return "57616C64656E".

Why in the world would I do this? Well, there are several reasons, including looking at the actual byte values in a file if you're checking for Unicode/code-page issues, or checking to see if there are hidden null values at the end of a file, but the main reason I wanted it was to give me the hex representation of a CLR DLL so I could create a CLR assembly in SQL Server. You can include the path to the DLL in the CREATE ASSEMBLY statement, or you can include the actual DLL bits. I prefer to include the bits, but that's another post.

I created the AsHex command line application to do this for me. It simply takes what's passed in as standard input and writes its hex representation to standard output -- very unix-like, hard to believe for a Windows bigot like me, I know.

Want to get the hex representation of some text? You can 'echo' it into AsHex like this:

echo Test Text | ashex

Do that and you'll get this as output:

546573742054657874200D0A

Note that echo honors everything up to the pipe character, including the trailing space (0x20), and echo adds a CRLF (0x0D0A) to the end of your input.

Now being a standard command line tool you can also redirect standard input with the < sign. So to accomplish what I wanted (the hex rep of a dll) I can type:

ashex < SqlHashFunctions.dll

and I'll get something that looks like this (all on one line):

4D5A90000300000004000000FFFF0000B800000000000000
400000000000000000000000000000000000000000000000
000000000000000000000000800000000E1FBA0E00B409CD
21B8014CCD21546869732070726F6772616D2063616E6E6F
742062652072756E20696E20444F53206D6F64652E0D0D0A
2400000000000000504500004C0103007AE32F4B00000000
...

The code for this is straightforward:
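(The original listing isn't shown here; this is a minimal reconstruction based on the description below.)

using System;

class AsHex
{
    static void Main()
    {
        // Console.OpenStandardInput gives us the raw, binary-safe input stream.
        using (var stdin = Console.OpenStandardInput())
        {
            int b;
            // ReadByte returns -1 at end of stream; otherwise write the byte
            // as two uppercase hex digits.
            while ((b = stdin.ReadByte()) != -1)
                Console.Write("{0:X2}", b);
        }
    }
}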

We simply open the standard input stream, read the bytes one by one, and write the hex representation ({0:X2}) of each to the console.

And if I want the output on the clipboard then I can combine this with the clipboard application and do this:

ashex < SqlHashFunctions.dll | clipboard

and I'll have the hex representation of the dll on the clipboard, ready to be pasted into my SQL script.

Populating Clipboard from Console App

As part of creating a SQL Server CLR function (other posts to come) I needed to copy the output of a command line app to the clipboard. Now I could, of course, simply redirect the text to a file, open the file in Notepad++, do a Ctrl-A Ctrl-C, and go, but that offended my command-line sensibilities. So instead I wrote a short console application that takes standard input and copies it to the clipboard.

To say the code is trivial would be an understatement; it's a one-line program. However, I did discover that if I didn't set the main function to be STA threaded I couldn't access the clipboard. This makes sense to me, since clipboard access is via COM, and COM doesn't like non-STA things.
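(A reconstruction of that one-liner, plus the STA attribute that makes it work:)

using System;

class ClipboardApp
{
    // Clipboard access goes through COM, which requires a single-threaded apartment.
    [STAThread]
    static void Main()
    {
        // Read everything from standard input and put it on the clipboard as text.
        System.Windows.Clipboard.SetData(System.Windows.DataFormats.Text,
            Console.In.ReadToEnd());
    }
}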


Like I said, simple code. We set the clipboard data to a text format copy of everything in the stdin stream.

Usage is simple too; just pipe what you want on the clipboard into the program. For example, want a directory listing of c:\ on the clipboard? Do this:

dir c:\ | clipboard


Or, as another post will show, want the hex representation of a DLL on the clipboard? Do this:

AsHex < TheFile.dll | clipboard


You will need to add a reference to PresentationCore to get access to the System.Windows namespace. And remember, just because it's a PresentationCore class doesn't mean you need to be writing a WPF or WinForms app to use it.

Upgrading Windows 7 RC1 to RTM Version

Disclaimer: Upgrading from RC1 to Release To Manufacturing (RTM) of Windows 7 is not a supported upgrade so you do this at your own risk! Having said that, the upgrade from Beta 1 to RC1 wasn't supported either and lots of people, myself included, did that just fine.

If you attempt to run the upgrade of Windows 7 RTM on an install of RC1, setup will check a couple of things and then present you with this friendly (not) message: "You can't upgrade this prerelease version of Windows 7. Go online to see how to install Windows 7 and keep your files and settings."

However, just like the Beta 1 to RC1 upgrade, there's a simple fix. If you haven't already, copy the install files to a location you can modify; a directory on your drive works great. Then open that directory and locate the CVERSION.INI file in the SOURCES directory. It's a simple text file, so open it with Notepad and change the MinClient value to 7000.0, as shown below. Once that's done, rerun setup, and Windows will happily upgrade your RC1 install of Windows 7 to the RTM version.
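For reference, the stock file on the RTM media looks something like this (from memory, so treat the exact build numbers as approximate and check your own copy):

[HostBuild]
MinClient=7233.0
MinServer=7100.0

and after the edit the MinClient line becomes:

[HostBuild]
MinClient=7000.0
MinServer=7100.0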

Windows 7 (7057) on Acer Aspire One

I'm running Windows 7 build 7057 (appears to be RC1) on my $299 Acer Aspire One netbook from Costco and it works great! Win7 even found the built-in webcam! The latest install went just fine; I booted off my USB key and did an upgrade from the earlier version of Windows 7 I was running (build 7000). The only problem was that there were 4 devices listed as "Base System Device" in Device Manager on the machine -- they'd been unknown in build 7000 too.

In Device Manager these devices were listed as vendor 197B and devices 2381, 2382, 2383 and 2384. For those Googling, these would be the device IDs (as an aside, does anyone know of a centralized list of these?):

  • PCI\VEN_197B&DEV_2381
  • PCI\VEN_197B&DEV_2382
  • PCI\VEN_197B&DEV_2383
  • PCI\VEN_197B&DEV_2384

Everything seemed to work, so I wasn't that worried about it, but I hate having unknown devices. So after some digging I determined that vendor 197B was JMicron, so I guessed these had to do with the card readers on the netbook. This was odd, since the card reader was recognized by Win7. But then again, I'd tried to use an SD card for ReadyBoost, and while Windows found the device and tried to create a ReadyBoost file on it, Windows would always then pull the file off, saying there had been an IO error.

A little hunting led me to an FTP site for JMicron's drivers, specifically ftp://driver.jmicron.com.tw where, within the /jmb38x/XP_Vista folder, there is a list of drivers for the missing devices. Odd; since these are WHQL certified drivers, I would have thought that the centralized Microsoft repository in the sky would have found them. Maybe because they're certified for XP and Vista but not Win7?

I downloaded the JMB38X_WinDrv_R1.00.26_WHQL.zip driver and installed it. Worked like a champ! I now have four JMicron devices listed under the "Memory Technology Driver" branch of the device tree, and ReadyBoost now seems to like my SD card too!

Added 15 Oct 09: I'm upgrading my netbook to the RTM version of Win7, and I see that JMicron now has version 34 of the drivers out, and they are Win7 certified now. Available at the same download location.

LISUG iSeries Debugging Presentation

Gee, an attempt at LiveBlogging! I'm sitting at the March LISUG meeting at the Fox Hollow listening to Charles Guarino's presentation on debugging applications on the IBM i.

The first thing that dawns on me is the relative ignorance of the user base with regard to the options for developing applications on the i with a rich IDE like RDi, as opposed to something like the old, tired SEU. RDi and its predecessor, WDSC, have been available for years, and provide a much richer development experience than SEU. Face it folks, this is the 21st century (has been for a while) and we should all be using 21st century tools. The second thing that amazes me is that it's a big deal that the new RDi runs on machines with 512M of memory. Folks (I like that word today), go read Joel Spolsky's blog about writing better code, especially #8 and #9 (and he puts his money where his mouth is). 512M? I think I have that on my video card!

Into the actual presentation... The first part is an overview of the RDi tool. RDi is based on Eclipse and as such has all the features of Eclipse; this includes perspectives, a GUI interface, drag/drop, keyboard shortcuts, etc. Again, the fact that this is cool and needs to be highlighted shows the current state of development tooling in the i community. For those of us in the Microsoft community using Visual Studio, or for many Java developers, having to highlight these capabilities is like having to highlight that the room you're sitting in today includes gravity and oxygen -- "like, no d'uh" as my kids would say.

An important step: you've got to start the "debug server" on the i. Since RDi is running on your PC, and the PC (obviously) can't run IBM i programs (RPG, COBOL, etc.), the program must run on the i; and there must be some sort of interface between the program being debugged on the i and the debugger running on the PC. The Debug Server is this interface. One Debug Server can handle all the programmers on the machine, so it only needs to be started once. It can be started from the command line too (STRDBGSVR), so you might as well just put it in QSTRUP and forget about it, at least on partitions that have active development.

OK, now we're getting to the meat of the presentation: an example program that Charlie will be debugging from within RDi. We're going to debug a SAX XML parser written in free-form RPG (so much for simplicity :-)). The program seems to read a table to find the list of files in the IFS to parse, and then, for each file found, parse it and do something. To debug a program you simply right-click on the program and select debug. You need to have compiled the program with DBGVIEW(*SOURCE), but that should be your default compile setting anyway. You set a service entry point and then click OK. This then connects to the debug server on the i and starts your debug session. A nice feature here is that you can actively debug multiple jobs at the same time; remember, you're limited to one STRSRVJOB at a time on the i. When you started the debug session you specified a user profile; this user profile tells the i what job(s) to debug. The i will debug the next call to the specified program by the specified user. This is how RDi, the debug server, and the green screen are all linked together.

From w/in RDi you now have access to the program in debug mode. We can do the typical things we would expect to do in a debugger: we can see the variables and the source, step through the code with F6 (was F10 in STRDBG), and jump out with F8 (F12 in STRDBG). A nice feature here is that you can click on the "blue dot" on the right side of the screen, and the editor will jump to the current line. Also on the right side are a number of icons that will show you different views, like the breakpoint view or the variable view. Of course, being Eclipse based, I'm sure you can rearrange these windows any way you want. RDi also includes the ability to hover over a variable and see the current value. Typical rich IDE debug stuff, but a nice addition to the tools a typical i developer is used to.

This is where things start to go wrong for Charlie -- and while it wasn't my fault, it's kind of funny that it goes wrong right after I asked him to change the value of a date field to '2009-02-30' using the debugger. Actually, until now he's been using a Verizon aircard to connect to his machine in his office so he could show us the use of RDi live. Unfortunately the Verizon coverage around the Fox just isn't up to par and the connection dropped several times. So, off to the slides now. Thankfully Charlie is prepared enough to have slides to show, but a nice addition would be to actually record a debugging session with Camtasia; then we could have seen it in action, even w/out a connection. This is also when my netbook's battery died, so the rest is from memory, and I've got a poor memory. :-) From the slides Charlie does show several steps in the debugging process, but he really needed that connection.

Overall, RDi does provide a much richer interface for debugging (and coding) applications for the i, and everyone should be running it. It's a shame that companies aren't willing to spend the $800 for the tool and the $500 for the machine to run it. After all, they're spending $80-100K on each of the developers writing the code, and that's a yearly expense, where the tooling and hardware would be amortized over several years.

In conclusion, great job Charlie. Just wish Verizon had behaved more!

.NET Podcasts that are on my iPod

Based on a Tweet from @shanselman, this is the list of podcasts I listen to on a regular basis as a .NET developer. Do I always find the content useful or interesting? Of course not! But sometimes I'll force myself to listen to something I don't think will be of interest, just to make sure I'm not getting stuck in a rut and that I'm staying current with my knowledge.
  • Hanselminutes - This was my introduction to podcasts. Of course I chose a podcast with great production quality as my intro, so short of .NET Rocks, there aren't many that sound this good.
  • .NET Rocks (DNR) - If Hanselminutes got me into podcasts, then DNR made me an addict. Coming up on their 400th show, there are few podcasters as prolific as Carl and Richard, and the sound quality is amazing. Content rocks too!
  • Monday's - Definitely NOT worksafe! Monday's is a break from the traditional technology podcast, but if you've got a sick sense of humour, and if you like DNR, you owe it to yourself to check out Monday's.
  • RunAs Radio - If you sit in a corner and never think about deployment of your applications, and just pound out code, then you can pass on this podcast, but if you're like many of us developers, where you need to understand the complete lifecycle of the applications you're developing, this is a great podcast. Wonderful to hear about the operations and infrastructure side of the world. And it's got Richard from DNR!
  • TechNet Radio - I'll admit this one is pushing the edges of my interest, but hey, disk is cheap! Some of the topics are interesting, but sometimes they just go off into a world I don't care about. Still, nice to have on the iPod as an alternative.
  • ARCast - An older show by podcast standards (last show at the end of 2007); I don't think it's in production any longer. However, they cover(ed) a number of good architectural topics, and the bottom line is we're not all developing applications using Azure and MEF; the "older" topics are often the ones we need to hear.
  • Polymorphic Podcast - Don't know if Craig is just on hiatus, or if he's stopped making this podcast; the last update was 10/13. Still, he touches/touched on a number of interesting topics, from web parts to SEO to Ajax.
  • Herding Code - More like herding cats sometimes. I'm really torn on this one. In some ways it seems to me like amateur hour, with a bunch of guys on a conference call; then again, they're producing a podcast on a regular basis, which is more than I can say. And their content isn't bad at all. I like the fact that you get different points of view on the topics.
  • Thirsty Developer - One of those podcasts that I wonder about, since the hosts are both MS employees, but their content is typically useful and I've never found them to have a bias, or at least not one they didn't admit to.

UCC Certificates, IIS and GoDaddy.com

OK, so I wanted to do something simple: get a UCC certificate from GoDaddy. It would seem simple enough. However, what's not clear is how you actually generate the CSR for the UCC certificate.

For those that don't know, a UCC certificate contains Subject Alternative Name (SAN) entries for your servers. This single certificate can be used to secure multiple web servers, similar to a wildcard certificate, except the SANs on a UCC cert don't need to be related. That is, you can have www.mycompany.com and www.mycompany.net on the same UCC cert.

Turns out the process for requesting a UCC cert is really simple: you generate a CSR from IIS just like you would for a single host name. Then you paste that CSR into the GoDaddy site, and since GoDaddy knows you're generating a UCC cert, it then asks you for the Subject Alternative Names you want added to the certificate. I spent half the afternoon trying to figure out how to add those names into the CSR on IIS; turns out you don't need to!