Archive for category Geeky

Enterprise Data Integration: The State of the Art

Recently I had to get up to speed on Cast Iron data integration solutions, now part of IBM. I have to admit, I went into it a bit pessimistic about the very notion of “point-and-click programming.” The basic value proposition of Cast Iron is to make it easy to move enterprise data from one place to another, for example, synchronize an Oracle database of accounts and customers with your Salesforce.com database.

[Image: cast-iron-diagram]

You do this by creating Projects with Orchestrations, which are made up of Activities that talk to and listen to Endpoints like HTTP, FTP, SMTP and databases. Most Activities can map and transform data before passing control to the next Activity. There are also control-flow Activities like If/Then, Do/While and Try/Catch. This kind of visual flow building isn’t unique to Cast Iron, of course, and it makes for pretty pictures. The diagram component used in Cast Iron Studio, a Java desktop app, is quite nice to work with.

Within Activity blocks you can map data by dragging and dropping:

[Image: cast-iron-map]

I learned that the mapping interface is essentially an XSLT generator. When you peek a bit under the covers of Cast Iron, you’ll discover that it’s all about converting data to XML representations, building XSLTs and converting XML back to database calls or strings or whatever your end point needs. But Cast Iron hides all that from you, mostly. Once you’ve built your Orchestration and debugged it to your satisfaction, there’s a server (“appliance”) that you can push the Project to, and it’ll host the whole thing. Or you can have Cast Iron host it for you in “the cloud.”
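
If you’ve ever hand-coded this pattern in .NET, it’ll look familiar: serialize the source data to XML, run a stylesheet over it, and hand the result to the target endpoint. A minimal sketch of that middle step (the file names here are hypothetical, not actual Cast Iron artifacts):

using System.Xml.Xsl;

class MapStep
{
    static void Main()
    {
        // The "map" an Orchestration builds visually is, underneath, a stylesheet.
        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load("account-to-salesforce.xslt");

        // Source rows serialized as XML go in; target-shaped XML comes out.
        xslt.Transform("oracle-accounts.xml", "salesforce-accounts.xml");
    }
}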

It’s all actually very neat once it comes together and works. And it’s true: for basic data integrations, no programming is required, and you can just point-and-click your way to data integration nirvana. Sounds easy, right?

Reflecting on this technology, I’m struck by one thing: data integration is still hard.

We had a small but diverse group in the training session. Most participants introduced themselves as developers or systems analysts. But even with a lot of hand-holding and a very knowledgeable and effective instructor, the fairly basic exercises often proved challenging. At various points, each person got stuck and needed help to get unstuck. And these exercises seemed nowhere near the level of complexity real-world data integrations would have.

Making changes required lots of hunting, clicking, dragging, typing little bits of text, waiting and repeating. Even putting aside numerous UI annoyances and glitches, the experience of building even a fairly basic Orchestration with a small handful of Activities can be pretty frustrating. Looking around the room, I felt that while people were pleased when they finally got a lab exercise working, they were concerned that they needed so much help from the instructor and were often stymied trying to work things out on their own. Although my background made the exercises pretty straightforward for me, it was clear this stuff still isn’t quite within easy reach.

Makes me wonder: can data integration really be easy one day?

What if you could run software that works like this:

  • Asks you: Where’s your data?
  • Asks you: Where do you want to move your data?
  • Gathers and verifies all authentication information, then gets to work:
    • Inspects the data sources and targets, including random sampling of the data,
    • Algorithmically generates the ETL Orchestration, highlighting the areas of greatest uncertainty,
    • Includes logging, email/SMS alerts, error handling, intelligent upserts, etc., all heuristically and algorithmically determined,
    • Automatically creates “staging/sandbox” environments based on the data targets, then shows you previews of the data integrations without your having to make your own staging environments
  • Lets you customize more deeply by hand if needed,
  • Once you’re satisfied, allows one-click deployment and activation

Perhaps a very naive vision, but this is what Usable Data Integration would be to me. Tools like Cast Iron are nice, and they pay a lot of lip service to simplicity, but it’s still definitely not “simple” except in the simplest scenarios. Yes, the “secret sauce” would be that magical algorithm in the middle step.

Maybe integration has innate complexity and can never be made simple?

I’d like to think and dream that integration can be simple. It’s just a hard problem.


1 Comment

Human Tolerant Software

Suppose you have a date range picker in your user interface, like this:

[Image: Date range picker]

This picker acts as a filter for selecting records of things that are themselves ranges of dates.  For example, project tasks that have start and end dates.

The usability question is, should the filtered records be ones that (A) fall inside the selected date range, or (B) intersect the selected date range?

Answer: B

One guiding instinct is to err on the side of showing more information when given two otherwise equal options.  However, I feel this “rule” is weak because at some extremes, you wouldn’t want to follow it.

My key rationale is that from a usability standpoint, software should be tolerant of human imprecision.

[Image: Win95 vs. WinXP Start Buttons] See, for example, the classic case of how Microsoft Windows designers snatched defeat from the jaws of victory in Windows 95 by not allowing you to click the bottom-most, left-most pixel to invoke the Start menu…you have to actually move the mouse a few pixels up from there, a source of great frustration particularly to novice, sticky-mouse-ball, twitchy, funky-mousepad, visually impaired, screen-misaligned, elderly and disabled computer users the world over.  See also Fitts’s Law.

In our example, our designer has two real choices:

(A) Expect the human to always know and properly pick the "bounding dates" and exclude anything that doesn’t lie inside the circumscribed dates.  The worst case here is that the human is wrong, and when that happens, records will be missing that he or she should be seeing.  And he or she may not even know that records are missing in some cases,

OR

(B) Be more tolerant and expect that maybe the human accidentally picked 1/2 instead of 1/1, or maybe the human forgot that there’s a task that started on 12/31.  Show more.  Worst case: we show too much.  Then the user has to refine his or her search, or simply ignore the extra record(s).
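
In code, the difference between the two choices is a single comparison.  A minimal sketch, assuming records that carry Start and End dates:

using System;

class TaskRecord
{
    public DateTime Start;
    public DateTime End;

    // (A) Strict: the record must fall entirely inside the filter range.
    public bool Inside(DateTime from, DateTime to)
    {
        return Start >= from && End <= to;
    }

    // (B) Tolerant: any overlap with the filter range counts.
    public bool Intersects(DateTime from, DateTime to)
    {
        return Start <= to && End >= from;
    }
}

A task running 12/31 through 1/3 never shows up under Inside(1/1, 1/31), but Intersects catches it.  That’s exactly the forgiving behavior of option (B).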

Which software would you prefer to use?  The one that expects you to be a machine?  Or the one that expects you’re human?

No Comments

Getting the raw SOAP XML sent via SoapHttpClientProtocol

Suppose you’re using the .NET SoapHttpClientProtocol to invoke a web API.  This is what happens when you use Visual Studio to add a Web Reference and automatically build a proxy for you.  Now suppose you want to programmatically access the raw SOAP XML that you’re sending to the web API.  Sounds straightforward, right?

Turns out, it isn’t straightforward at all.

Looking online for help, I found a few proposed solutions.  Some people suggest using a network sniffer or HTTP proxy to get the raw SOAP XML.  That can work, but it’s not a good solution for programmatically getting the XML.  It’s also somewhat labor intensive to set up initially and then use on a regular basis.

It’s probably possible to create a SOAP Extension to do this.  But that’s a bit heavyweight for my purposes.

One guy dug into the guts of the .NET assembly in the debugger to find the spot in memory where the XML document can be found.  Impressive, but again, not entirely practical for programmatic access.

After some trial-and-error, I decided upon a strategy that allows us to get what we need pretty reliably.  Hopefully this helps you.

Let’s say your WSDL has a service called HelloService (from the WSDL’s <service> tag).  When you add it as a Web Reference, Visual Studio automatically creates a nice class called HelloService derived from SoapHttpClientProtocol.  What we want to do is this:

HelloService svc = new HelloService();
svc.doSomething();
string rawXml = svc.Xml;

To add the Xml property, we could just add it directly to the auto-generated class.  But that’s not ideal: if you get a new WSDL, for example, and re-generate the class, you’ll destroy the changes you’ve made.  So let’s create our own subclass of HelloService.  That’s easy enough; something like this should do it:

namespace MyProject
{
   public class MyHelloService : HelloService
   {
      public MyHelloService() : base() { }
      public string Xml { get { return null; } } // placeholder; wired up below
   }
}

Now we should change our code to use this new subclass:

MyHelloService svc = new MyHelloService();
svc.doSomething();
string rawXml = svc.Xml;

So far so good, but what now?

Now we need to intercept the XML that gets created during SoapHttpClientProtocol.Invoke().  There’s a convenient point for doing that: GetWriterForMessage().  It’s responsible for returning an XmlWriter that gets used to build the SOAP XML message.

To do that, we’ll need our own XmlWriter that wraps another XmlWriter: the original one returned by the HelloService class.  Our strategy is to forward every call to the original XmlWriter while also mirroring it into a buffer backed by a StringWriter.  It’s an XmlWriterSpy.  Here’s how it looks (some methods omitted for brevity):

namespace MyProject
{
    using System.IO;
    using System.Xml;
    public class XmlWriterSpy : XmlWriter
    {
        private XmlWriter _me;     // the real writer the SOAP stack uses
        private XmlTextWriter _bu; // our mirror, buffering a copy of the XML
        private StringWriter _sw;  // backing store for the mirror

        public XmlWriterSpy(XmlWriter implementation)
        {
            _me = implementation;
            _sw = new StringWriter();
            _bu = new XmlTextWriter(_sw);
            _bu.Formatting = Formatting.Indented;
        }
        public override void Flush()
        {
            _me.Flush();
            _bu.Flush();
            _sw.Flush();
        }
        public string Xml { get { return (_sw == null ? null : _sw.ToString()); } }

        public override void Close() { _me.Close(); _bu.Close(); }
        public override string LookupPrefix(string ns) { return _me.LookupPrefix(ns); }
        public override void WriteBase64(byte[] buffer, int index, int count) { _me.WriteBase64(buffer, index, count); _bu.WriteBase64(buffer, index, count); }

        // ...more overrides omitted, you get the idea: every abstract XmlWriter member delegates to _me, and the write methods also mirror to _bu...

        public override void WriteSurrogateCharEntity(char lowChar, char highChar) { _me.WriteSurrogateCharEntity(lowChar, highChar); _bu.WriteSurrogateCharEntity(lowChar, highChar); }
        public override void WriteWhitespace(string ws) { _me.WriteWhitespace(ws); _bu.WriteWhitespace(ws); }

    }
}

Lastly, we just need to use this new XmlWriterSpy class in our MyHelloService class.  Here’s how:

namespace MyProject
{
   public class MyHelloService : HelloService
   {
      private XmlWriterSpy writer;
      public MyHelloService() : base() { }

      protected override XmlWriter GetWriterForMessage(SoapClientMessage message, int bufferSize)
      {
         writer = new XmlWriterSpy(base.GetWriterForMessage(message, bufferSize));
         return writer;
      }
      }

      public string Xml { get { return (writer == null ? null : writer.Xml); } }
   }
}

There you have it.

Update (April 2, 2010): As Robert pointed out, it would be nice to have the XmlWriterSpy easily downloadable in its entirety.  True!  Here it is.


21 Comments

iTunes 9 and Custom iTunes Folder

[Image: itunes-logo] I am not a fan of the iTunes software or philosophy.  On Windows it’s sluggish, the UI is unconventional and awkward for the platform, and it forces the user to accept a bunch of other software like QuickTime.  The philosophy is patronizing and heavy-handed.  Apple says, "don’t worry your pretty little head about where your files are, we’ll take care of everything."  And, "don’t sweat the music format, just play this where and when we tell you."

Unfortunately, it’s probably the best and only way to manage and sync content with my iPods, which I use mainly for exercise and road trips.  But I want all the MP3s available to other programs, like when I make family slideshow movies.  I want to store everything on a separate data volume (drive) that can be shared with my whole home network and backed up on its own schedule.  For me iTunes is a utility for subscribing to my podcasts, managing a few playlists and syncing with iPods.  It is not my media management tool.

But it wants to be.

Here’s how I moved my iTunes files after some experimentation and reading helpful snippets scattered on the ‘net.

Before You Do Anything

Back up your iTunes folder.  If you haven’t customized anything yet, on Vista or Win7 it’s usually somewhere like "C:\Users\<you>\My Music\iTunes".  Just copy the whole folder somewhere.  It may be very big if you have a lot of stuff in there.
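
For example, from a command prompt (the destination path is just an example):

    robocopy "C:\Users\<you>\My Music\iTunes" "E:\Backups\iTunes" /E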

If you’re transitioning from one computer to another, the safest bet is to also "Deauthorize" your computer first.  Then Reauthorize it later, after a reinstall.

Install iTunes

Just do a straight install.  Then quit the program.

Attach Your Library

OK, here’s the tricky part.  Go to your old iTunes folder.  Mine was at "D:\Music\iTunes".  Maybe yours is on a removable disk or something.  In that folder is a file called "iTunes Library.xml".  Open it with a text editor like Notepad.  Make sure the paths for all the music and podcast files are right.  Fix anything that’s not right with Find-Replace.

[Image: itunes-xml]

Now edit the file "iTunes Library.itl" with Notepad.  It’ll look like gobbledygook.  Delete everything in the file and Save it.  The file should still exist, but be 0 bytes in size.

[Image: itunes-itl]
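
Both edits are easy to script if there are a lot of paths to fix.  A throwaway sketch, assuming the D:\Music\iTunes layout from above (the old and new path prefixes are examples; run it against a copy first):

using System.IO;

class FixItunesLibrary
{
    static void Main()
    {
        string folder = @"D:\Music\iTunes"; // your old iTunes folder

        // Fix stale track locations in the library XML. iTunes stores them
        // as URL-encoded file://localhost/ URLs; these prefixes are examples.
        string xmlPath = Path.Combine(folder, "iTunes Library.xml");
        string xml = File.ReadAllText(xmlPath);
        xml = xml.Replace("file://localhost/C:/Users/you/My%20Music/iTunes/",
                          "file://localhost/D:/Music/iTunes/");
        File.WriteAllText(xmlPath, xml);

        // Truncate the .itl so iTunes rebuilds it from the corrected XML.
        File.WriteAllText(Path.Combine(folder, "iTunes Library.itl"), "");
    }
}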

Now launch iTunes while holding your SHIFT key down.  You’ll be prompted to choose a Library.  Browse over to your old iTunes folder and select the "iTunes Library" file there.  Presto, now iTunes will rebuild itself and point to your old iTunes folder!

If you want, you can now delete your default iTunes directory at "C:\Users\<you>\My Music\iTunes". 

Fix Podcasts

Podcasts may not get added automatically.  This is a pain.  So you’ll need to use File > Add Folder to Library… to find your old podcast directories.  Then you’ll need to re-subscribe to them.

Final Tip: Scheduling Podcast Download

I don’t know about you, but I don’t like to leave iTunes running all the time.  With a bunch of podcast subscriptions, it tends to suddenly bog down the internet connection while downloading.  Instead, I use the Task Scheduler to create a task that runs iTunes once a week while I’m asleep.  This means you need to leave your computer on, of course.  When it runs, as usual, iTunes tries to download any new podcast episodes.
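
You can set that task up from the command line, too.  Something like this creates a weekly 3 AM Sunday run (task name, day, time and path are all just examples):

    schtasks /Create /TN "Weekly iTunes" /TR "\"C:\Program Files\iTunes\iTunes.exe\"" /SC WEEKLY /D SUN /ST 03:00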


2 Comments

Going to Windows 7 64-bit

Last evening I bit the bullet and did it: I went for the Windows 7 64-bit Pro upgrade on my main home desktop.  Since I was upgrading from Vista Home Premium 64-bit, I needed to do a clean install, no in-place installs for me.  See the chart from Microsoft at Upgrading Your PC to Windows 7.

[Image: win7] After going through this, I have to say, it doesn’t seem worth it for the average consumer who’s already on Vista.  I suppose if you can do an in-place upgrade, that’s fine.  But wiping clean and re-installing everything, and running into compatibility problems with software…that can’t be worth it for most people.

If you’re buying a new PC with Win7, great.

So some of the snafus I ran into:

  • Norton Internet Security 2009 – not compatible and refuses to install.  Will need to get 2010 version for Win7 support…if I still want to stick with Norton.
  • Norton Ghost 14 – Win7 warns that it’s "incompatible".  Before installing anyway, I did some searching, including this amusing thread.  Sure enough, it seems to run just fine.  There’s no newer version to upgrade to yet anyway.  I’m also exploring alternatives, like Macrium Reflect and Acronis.
  • Microsoft Virtual PC 2007 SP1 – This one is vital to me, since I have several VPC images I use regularly.  In fact, using VPC images is one way these clean OS upgrades are feasible for me.  Officially, VPC 2007 is not supported for Win7.  Unofficially, it works fine, more or less.  You might wonder why I don’t spring for the built-in Win7 Virtual PC.  It requires hardware virtualization support.
  • Google Picasa 3.5 – The trick here is to take a backup of your old \Users\<you>\AppData\Local\Google\Picasa* folders and drop them in after installing a new Picasa.
  • iTunes – This was so convoluted, I’ll need a separate post…

As I was popping DVDs in, I was wishing Win7 included a built-in ability to mount an ISO image as a removable drive.  I back up all my important (read: purchased) software discs, and this would have been a lifesaver.  I know there are a number of 3rd-party tools, but I’m not interested in inadvertently downloading malware.

Some highlights/changes in Win7 that most struck me: Windows Calendar is gone, WordPad & Paint got an overhaul, the Super-Taskbar of course, Click-Titlebar+Shake, slicker Themes, PowerShell included, Sticky Notes.  In short, lots of little superficial things.  Hopefully, stability and performance are what will impress me in the long run.

There’s still a bunch of stuff that needs to be installed, but it’s getting more usable…


No Comments

Plumbing in This Old House

This passage resonates with me some days:

Programming starts out like it’s going to be architecture–all black lines on white paper, theoretical and abstract and spatial and up-in-the-head. Then, right around the time you have to get something fucking working, it has this nasty tendency to turn into plumbing.

It’s more like you’re hired as a plumber to work in an old house full of ancient, leaky pipes laid out by some long-gone plumbers who were even weirder than you are. Most of the time you spend scratching your head and thinking: Why the fuck did they do that?

From the novel "The Bug" by Ellen Ullman.  Haven’t read the book (yet).

No Comments

ASP and ASP.NET on IIS7 on Vista

For 90% of my web projects, I’m good with PHP configured under IIS7.  But there’s one that still needs “Classic ASP” support.  Since I’m on a new machine these days, it’s never been configured for development on this project.  Here are some tips to get that working on Vista:

  1. By default, turning on IIS in Vista doesn’t give you ASP or ASP.NET.  Go into Control Panel > Programs and Features (aka “Add/Remove Programs”)
  2. Turn Windows Features On or Off
  3. Under World Wide Web Services > Application Development Features check and enable ASP and ASP.NET 

    [Image: asp-on-vista]

  4. Next, for a development machine, it’s useful to turn on debug error messages.  Open IIS Manager (right-click on Computer and choose Manage)
  5. Click on your Server icon under the “Connections” pane
  6. Double-click on “ASP”
  7. Expand “Debugging Properties” and set “Send Errors to Browser” to True (only do this on your development machines; or use the appcmd one-liner shown after these steps)

    [Image: asp-config]
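
For the command-line inclined, step 7 can also be done with appcmd from an elevated prompt; this sets the same property the GUI exposes:

    %windir%\system32\inetsrv\appcmd set config /section:asp /scriptErrorSentToBrowser:true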


This link from Microsoft’s “Learn IIS” site was helpful.

, ,

No Comments

Moving a SVN Repository from Machine A to B

Move all files off the old machine.  Check.

Clean the dust out of the machine, keyboard, mouse ball.  Check.

Find OS and drivers discs for the machine.  Check.

Transfer local project files over.  Hmm.

See, I’ve been using SVN + TortoiseSVN on Windows for years to manage source code revisions on my projects.  Love it, but I am still a n00b when it comes to SVN.

Now that I’ve moved to a new development machine, I need to move my whole project environment along with historical commits.  Searching on the ‘net yields some good answers, but no real step-by-step for a TortoiseSVN user.

BTW, lots of searches yield “svn export”, which doesn’t do what I need here – it only copies out the current revision of the files, with none of the history.

OK:

  1. Backup everything
  2. Make sure your SVN version is the same on both machines (just in case)
  3. Know your repository full path.  If you don’t know it:
    1. Find the root folder for your projects
    2. Right click > TortoiseSVN > Repo-browser
    3. Note the URL, like: file:///C:/Documents and Settings/Bob/Documents/SVN Repository
    4. That path, minus the “file:///”, is what you want
  4. On the command line:
    svnadmin dump "C:\Documents and Settings\Bob\Documents\SVN Repository" > projects.dmp

    If you have a lot of projects, a lot of history or big files, this may take a while to run and may create a big file.  This is your whole repository and history of changes, after all.

  5. Copy this big file over to your new machine, say to D:\projects.dmp
  6. On your new machine, create a new directory for the repository, say "D:\SVN Repository"
  7. Create the repository: use svnadmin create, or with TortoiseSVN right-click on the folder > TortoiseSVN > Create repository here
  8. On the command line, run:
    svnadmin load "D:\SVN Repository" < projects.dmp
  9. Now you can restore all the files: create your project directory, say “D:\Projects”
  10. Right click > SVN Checkout…
  11. The URL should point to your new repository.  Now do a fully recursive checkout (or use the command-line equivalent shown after these steps).
    [Image: svn-checkout]
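
For reference, steps 10–11 from the svn command line would look something like this (paths as in the examples above; note the URL-encoded space in the repository URL):

    svn checkout file:///D:/SVN%20Repository D:\Projects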

That’s it.  Repository and all history moved over.  Check.

Props go to Digital Media’s nice instructions, which helped guide me.

Update: After I did all this, it occurred to me that since the two machines were networked together, and could see each other, I may have been able to do this:

  1. Map a drive from machine A to B
  2. Right click > TortoiseSVN > Relocate…
  3. Move the repository over
  4. Pick up from step #9 above

Haven’t tried it, so can’t vouch for it.


No Comments