Category Archives: VB Feng Shui

Configuring Log4Net in a .net VSTO Word Addin

3
Filed under Code Garage, Installations, log4net, VB Feng Shui

VSTO (Visual Studio Tools for Office) is a great way to put together addins for the various MS Office applications (especially Word, Excel, and Outlook).

And Log4Net is a fantastic and unbelievably flexible logging framework for .net applications.

So naturally, I wanted to use them together.

And doing so is not bad at all, till it came to configuration…

A Disclaimer (with a note of Encouragement)

Log4Net is not the most straightforward package out there. It’s extremely flexible, and very easy to work with once you’ve gotten used to it, but getting into the Tao of the thing took a few days, for me, anyway.

Don’t let that discourage you. It really is a spectacular framework for handling virtually any aspect of logging in your applications. And it really doesn’t take much “setup code” at all to get it operational.

The Super Highway

The easiest way to configure log4net in a .net application (VSTO addins included) is to simply call Configure on the XMLConfigurator object:

log4net.Config.XmlConfigurator.Configure()

That’ll work, but unfortunately, since your VSTO addin is a DLL, log4net will, by default, look in the current app.config file, which, if you’re running in Word, for instance, will be WinWord.exe.config in the folder where WinWord.exe lives.

Since WinWord.exe.config is Word’s config file, it’s probably not the best idea in the world to go shoe-horning your own (or log4net’s) config stuff in there as well. Not to mention how do you get your config information easily into that file during installation (or properly remove it during an uninstall).

The Scenic Byway

What you really want is for your VSTO addin DLL to have it’s own config file. Something that lives in the same folder as your DLL itself, and can easily be installed and removed.

Sure enough, that Configure method has an overload that accepts a FileInfo structure for an arbitrary XML Config file. So you can just do this:

Dim MyConfigFile = Me.GetType.Assembly.ManifestModule.Name & ".config"
If My.Computer.FileSystem.FileExists(MyConfigFile) Then
    Dim fi = My.Computer.FileSystem.GetFileInfo(MyConfigFile)
    log4net.Config.XmlConfigurator.Configure(fi)
End If

What this does effectively is construct a filename based on the name of whatever assembly the current code is defined within, but with a “.config” extension.

It then checks for the existence of that file. Since there’s no path on the file, it only looks in the current directory, but that’s fine, since the config file will always be located in the same folder as it’s DLL.

And finally, if the file is found, it retrieves a FILEINFO object for it and configures Log4Net with that file.

Bumps along the Road

Unfortunately, that will get you farther, but not by much.

There are two problems.

  1. In debug mode in the IDE, the “current directory” is, indeed, the same folder as the one with your addin DLL in it. But, when running in release mode, in production, with Visual Studio completely out of the picture, the current directory is very likely the folder where WinWord.exe is located. Not good.
  2. Worse, in production, VSTO addins are generally copied to a “shadow cache” folder by the .net framework, so that the original files can be upgraded in place easily while the application is in use.

Unfortunately, your config will will not get copied to the shadow cache.

This means that for determining where to look for your config file, using something like:

  • Me.GetType.Assembly.CodeBase or
  • Me.GetType.Assembly.Location

won’t work, because often times, they’ll point you off into the wilds of the assembly cache folder and not  the \Program Files\MyCompany\MyProduct\ folder where you installed your addin and where you most likely would prefer your MyProduct.dll.config file to live.

Happy Trails

In the end, I found the best solution to be:

  1. Look in the “current directory” for your config file.
  2. If you find it, use it from there. This accommodated easy debugging while in the IDE because you can easily get to and edit your config file.
  3. If you don’t find it, construct the path to the app’s \Program Files\ folder and check there.
  4. If it’s not there either, just fall back to the default and call XMLConfigurator.Configure() with no parameters and let it default everything.

A routine that puts all that together looks like this:

Private Sub ConfigureLog4Net()
        Dim MyConfig = Me.GetType.Assembly.ManifestModule.Name & ".config"
        If Not My.Computer.FileSystem.FileExists(MyConfig) Then
            '---- not in current dir, so check in our Program Files folder
            Dim pth = Path.Combine(My.Computer.FileSystem.SpecialDirectories.ProgramFiles, My.Application.Info.CompanyName)
            pth = Path.Combinepth, My.Application.Info.ProductName)
            MyConfig = System.IO.Path.Combine(pth, MyConfig)
        End If
        If My.Computer.FileSystem.FileExists(MyConfig) Then
            Dim fi = My.Computer.FileSystem.GetFileInfo(MyConfig)
            log4net.Config.XmlConfigurator.Configure(fi)
        Else
            log4net.Config.XmlConfigurator.Configure()
        End If
End Sub

To use this, you’ll need to make sure that your installation package installs your addin DLL and it’s config file into the \Program Files\Company Name\Product Name folder.

This is the pretty typical case, though, so that shouldn’t be a worry.

Later on Down the Road

This obviously begs the next question. What about other application settings? Log4Net reads stuff out of an arbtrary config file  you specify, but the My.Settings object does not. It’ll still end up looking in the default place, which is the host application’s config file (again, WinWord.exe.config, or Excel.exe.config, etc).

I hope to cover a decent solution to that in a later post.

Implementing Document.SaveCopyAs in Word

0
Filed under Office, VB Feng Shui

If you’ve used the Excel object model, youmay have discovered how incredibly handy the SaveCopyAs function is. Essentially, it allows you to save a currently loaded Excel spreadsheet into some arbitrary file, without altering the state of the loaded copy. In other words, if the user has altered the spreadsheet loaded into Excel, but hasn’t saved it, after a “SaveCopyAs”, that copy of the sheet is still considered dirty, and the saved file on disk does not have the user’s changes saved within it.

This is a very wonderful thing!

Basically, it means you can save off the spreadsheet as it currently exists in memory, complete with all the users edits up to this point, perform any operation on that saved copy that you want, and then either keep that copy or throw it away, and the copy that the user is currently working on is not affected in anyway whatsoever!

Good stuff.

But Word has never had it!

I’d rectified this long ago, in VB6, but recently had a need for this capability again, in VB.net this time.

Turns out, it’s incredibly simple to implement with VB.net, and much more intuitive to.

With .net 3.5, you can even create the function as an Extension Method, and directly add it to the Word Document object interface! Very cool!

The thing is, I knew I’d done this before, but couldn’t remember exactly how. After quite some time googling this, I came across a post where it was theorized it might be possible to implement SaveCopyAs by using the Compare function in Word.

Hmm, I’d never even tried the Compare, but it sounded interesting. After a few false starts, I ended up with this:


Imports System.Runtime.CompilerServices

<System.Runtime.CompilerServices.Extension()> _
Public Sub SaveCopyAs(ByVal Document As Word.Document, ByVal FileName As String)
    Document.Compare(Name:=Document.FullName, CompareTarget:=Microsoft.Office.Interop.Word.WdCompareTarget.wdCompareTargetNew)
    With Document.Application.ActiveDocument
        If .Revisions.Count > 0 Then
            .Revisions.RejectAll()
            .SaveAs(FileName, AddToRecentFiles:=False)
            .Close()
        End If
    End With
    Document.Activate()
End Sub

Notice that I’m using the CompilerServices.Extension attribute to flag this as an extension method. This is what adds this method directly to the Word Document object and makes it so much more intuitive to use.

Essentially, the idea here is to:

  1. Compare the version of the document loaded in memory to the last saved version out on disk.
  2. Write the comparison result to a new document in memory (i.e. what becomes the active document)
  3. Check the revision count.
  4. If there are revisions, you know there’s been changes made to the document since it was last saved, so Reject all the revisions, save the active document and close it.
  5. If there were no revisions, the user hasn’t made any changes to the document since the last time they saved, so there’s really nothing else to do.
  6. Reactivate the user’s originally active document

The real trick here was step 4. Originally, I was accepting the revisions and then saving, and it wasn’t working at all. Eventually I traced the problem back to the comparison order. It turns out, when you perform a compare to another file from a loaded Document object in Word, the results of the comparison are the changes required to convert from the version of the document you currently have open in Word to the other document.

So, if the user has added the word “Nostradamus” in a paragraph, then the comparison would yield a revision that says, essentially, “remove the word Nostradamus from this paragraph”. This is exactly backward from the behavior you want. Turns out that all I had to do was “Reject all changes” instead.

Believe it or not, this works a dream.

But…

I can’t imagine that performing a compare, especially on a large document, is going to be a particularly speedy process and I knew I’d done this before using a alternate interface that the Word Document object implemented, albeit undocumentedly so. I just couldn’t remember how…

Suffice it to say, eventually I stumbled across the long forgotten interface. That interface is the COM IPersistFile. It’s a basic interface COM typically uses when saving Structured Storage documents, which are what Word files usually are (though, fortunately, it works with Word 2007 for saving DOCX style files).

Turns out the Word Document object implements that interface. They just don’t make that fact common knowledge.

Using it is trivially easy, especially given that it’s a COM interface that the .net framework happens to define natively (though do note, you’ll need to Import the InteropServices.ComTypes Namespace as show below.


Imports System.Runtime.CompilerServicesImports System.Runtime.InteropServices.ComTypes

<System.Runtime.CompilerServices.Extension()> _
Public Sub SaveCopyAs(ByVal Document As Word.Document, ByVal FileName As String)
    Dim pf = DirectCast(Document, IPersistFile)       pf.Save(FileName, False)
    'pf.SaveCompleted(FileName)
End Sub

About that last commented line, that calls the SaveCompleted function. From the docs, it sounds like that is necessary, but I’ve never found a situation where it made a difference calling it or not.

Maybe someone can illuminate that aspect of the interface better than me?

But at any rate, the IPersistFile method is likely to be many times faster than the Compare method, especially for large files or files that the user has made many changes to without saving. Still, I thought the Compare method was interesting enough to warrant a mention.

So there you have it. SaveCopyAs for Word, implemented as an extension method that directly extends the Word Document object.

Yet more VB.net sweetness!

VB.net Instance Variable Initializers vs the Constructor

0
Filed under VB Feng Shui

I ran into a very peculiar problem several days ago. Essentially, I had a class, with a few properties. One of those properties had some initialization code for a backing variable, like so:

   Public Property Count() As Integer
      Get
         Return _count
      End Get
      Set(ByVal Value As Integer)
         _count = Value
      End Set
   End Property
   Private _count As Integer = _collection.Count

Now, granted, this is a little contrived. But the idea is that this particular property is initialized with a value from a private object (the _collection variable), that is set up during the constructor.

The problem was that the app was throwing an object not initialized when try to initially set the _count variable’s value.

This completely threw me. How on earth was my constructor not being called? After some debugging, I quickly discovered that the constructor is called after all these private variable initialization lines are executed! Now, that doesn’t really make a whole lot of sense, but, it’s what’s happening.

Only later did I come across this post by Patrick Steele. He’s got a great explanation and description of what’s happening there, so I won’t repeat that.

But the key to the whole issue is a quote by him from the Visual Basic Language Specification:

When a constructor’s first statement is of the form MyBase.New(…), the constructor implicitly performs the initializations specified by the variable initializers of the instance variables declared in the type. This corresponds to a sequence of assignments that are executed immediately after invoking the direct base type constructor. Such ordering ensures that all base instance variables are initialized by their variable initializers before any statements that have access to the instance are executed.

(The emphasis is Patrick’s)

He goes on to quote the relevant part of the C# spec, which explicitly states exactly the opposite behavior for it’s initializers and constructors!

I haven’t been able to find any quote as to why there’s a difference in behavior, or more specifically, why on earth VB’s behavior in this case seems so wrong.

If anyone knows, please comment!

When is the VB.Net Replace function not Replace?

0
Filed under VB Feng Shui

I ran into a little surprise a few days ago. With something as innocuous as the “Replace” function, of all things!

Now, Replace has been around since, well, forever. The basic function has always looked like this:

X = Replace(StringToSearch, StringToSearchFor, StringToReplaceWith)

No big deal. Used it for years.

So imagine my surprise when I got an “Object not set” error on a string variable that I’d just used Replace on! How could that be?

So I checked the variable value. It was, in fact, “Nothing”.

Ok, So I rerun the code, stepping through. The variable starts out as a string containing some data, then gets trimmed, and the resulting string in empty (still a string, just containing no characters).

Then I perform a replace on it, and suddenly, the string is Nothing?!

So, I coded up the following little test snippet.


    Private Sub Test()

        Dim x = "Test A B C"
        Debug.Print(Replace(x, "A", "Z"))
        'Results in "Test Z B C"

        Debug.Print(x.Replace("A", "Z"))
        'also results in "Test Z B C"

        x = ""
        x = x.Replace("A", "Z")
        Debug.Print(x Is Nothing)
        'result is false, (x stays "")

        x = ""
        x = Microsoft.VisualBasic.Replace(x, "A", "Z")
        Debug.Print(x Is Nothing)
        'result is true!?!

        x = ""
        x = Replace(x, "A", "Z")
        Debug.Print(x Is Nothing)
        'Result is true!?!

    End Sub

The first two results are exactly as expected. The search string is replaced with the replace string.

The next result, ALSO, is as expected. x starts out as an empty string. I perform a replace on it, and the result is, duh! an empty string! Now, for as long as I’ve known the Replace function, this has not been how it works.

But the next two results both result in FALSE, meaning the result of the String.Replace function and of the Microsoft.VisualBasic.Replace function is Nothing, and not an empty string!

Google It!

So I started googling. I turned up this bug report from .net 1.1.

The response from Microsoft is:

Posted by Microsoft on 5/18/2006 at 3:02 PM
Hi,
Thank you very much for reporting this issue. Unfortunately, VB had this behavior in previous version too and we can’t change it due to backward compatibility problem.
Thanks,
VB Development Team.

“Can’t be!” I yelled. Well, ok, I didn’t yell, but I exclaimed heartily <g>.

So I fired up VB6 and wrote up another test program, then ran it.

You can see the VB supplied variable inspector tooltip as I hovered over the value of X after I’d just run the normal REPLACE function on it in the EXACT SAME WAY as I’m doing in VB.net.

image

I hate to burst bubbles, but X starts out as an empty string, I run the REPLACE function on it, and it ends up… wait for it… an empty string! Not a null/nothing variable.

Maybe it’s another weird edge case scenario that the VB Team is talking about there, but, come on! This has got to be THE SINGLE MOST COMMON invocation of Replace. Why on earth should it work differently from VB6 to VB.net?

And one other thing. String.Replace is part of the .net FRAMEWORK people. It has virtually nothing to do with VB6 and certainly, compatibility shouldn’t have been a concern there. May for the version in the VisualBasic namespace, but not that String.Replace. Weak!

Short version of the story

Be VERY CAREFUL of using either String.Replace or the Microsoft.VisualBasic.Replace functions.

If you DO need them (they have a nice case-insensitive switch, which makes them handy), write a wrapper function that deals with this VB.net bug (yeah, I went there <g>) so you won’t get bitten and end up with rash.

XPath Testing Tool (XPathMania)

0
Filed under Utilities, VB Feng Shui, XML

I mentioned RAD Software’s RegularExpression Designer a few posts back, but this time, I’m talking XPath.

If you haven’t already played with XPath, and you do much of anything at all with XML data, you owe it to yourself to check it out. Basically, it’s a query language that allows you to query for specific nodes or nodesets from an XML data source very easily. It’s also got some limited function and expression handling capabilities (typical math functions, string manipulation functions, etc).

XPath 1.0 is fully supported by the .net framework. Unfortunately, XPath 2.0, while much more capable, has been dropped by Microsoft in favor of, I believe, XQuery. Trouble is, I can’t find much in the way of VB.net support for XQuery right off.

But, all that aside, XPath 1.0 is still very much a useful tool to have in your arsenal, but, like regular expressions, getting an XPath query right can take some experimenting. That’s where XPathMania comes in.

It’s basically an XPath query tool extension to the VS 2008 IDE.

In the screenshot below, you can see that I’ve got an XML file loaded, and I’ve docked the XPathMania window just below it.

With that, I can easily enter a query again that XML file and view the results (along with the matching nodes actually highlighted in the XML file view! Good stuff!

image

Finally, you’ll also definitely want to check out this page for lots of good XPath example queries.

In the future, I’ll try to put together a little sample app that illustrates how to use the XPathNavigator to easily execute these queries against an XML document and enumerate the resulting nodeset.

Parsing Key-Value Pairs Via Regular Expressions

0
Filed under Code Garage, Regular Expressions, VB Feng Shui

I’ve often found myself in need of a Key-Value pair parser. Simple stuff, really. Essentially, the idea is to be able to parse any of the following from a typical command buffer:

Key=Value (no whitespace in the key or value)

Key=”Value” (whitespace is ok in the value)

Key (when just the existence of the key signals something)

This sort of parser is fairly easy to write, but this time, I’d just finished playing with regular expressions for another parsing task, so I thought, why not give them a try here?

After a few minutes with the Rad Regular Expression Designer, I’d put together what appeared to be a pretty robust expression for this.

My version keys of the matches instead of the seperators. I did this mainly because I wanted the Key and Value parts to be returned as “cleanly” as possible. That means the Key should be just the Key, no whitespace or “=” and the value should never include the leading or trailing quote marks, if they’re there).

The end result is a function that takes a string buffer and returns a generic Dictionary of Key Value string pairs.

Imports System.Text.RegularExpressions

Module RegEx

    Public Function ParseKeyValuePairs(ByVal Buffer As String) As Dictionary(Of String, String)
        Dim Result = New Dictionary(Of String, String)

        '---- There are 3 sub patterns contained here, seperated at the | characters
        '     The first retrieves name="value", honoring doubled inner quotes
        '     The second retrieves name=value where value can't contain spaces
        '     The third retrieves name alone, where there is no "=value" part (ie a "flag" key
        '        where simply its existance has meaning
        Dim Pattern = "(?:(?<key>\w+)\s*\=\s*""(?<value>[^""]*(?:""""[^""]*)*)"") | " & _
                      "(?:(?<key>\w+)\s*\=\s*(?<value>[^""\s]*)) | " & _
                      "(?:(?<key>\w+)\s*)"
        Dim r = New System.Text.RegularExpressions.Regex(Pattern, RegexOptions.IgnorePatternWhitespace)

        '---- parse the matches
        Dim m As System.Text.RegularExpressions.MatchCollection = r.Matches(Buffer)

        '---- break the matches up into Key value pairs in the return dictionary
        For Each Match As System.Text.RegularExpressions.Match In m
            Result.Add(Match.Groups("key").Value, Match.Groups("value").Value)
        Next
        Return Result
    End Function

    Public Sub Test()
        Dim s = "Key1=Value Key2=""My Value here"" Key3=Test Key4 Key5"
        Dim r = ParseKeyValuePairs(s)
        For Each i In r
            Debug.Print(i.Key & "=" & i.Value)
        Next
    End Sub
End Module

I’ve included a simple test function to help validate it.

DISCLAIMER: I’m not RegEx guru, so there are likely much faster ways to assemble the Regex. If you have one, by all means please comment! And finally, it’s entirely possible that I’ve missed some examples of badly formed input that would cause weird parsing results.

For instance, in the above example, notice that “Key3=Test Key4 Key5” will return Key3 set to “Test” and Key4 and Key5 set to empty strings.

If the user meant for Key3’s value to be “Test Key4 Key5”, there would need to be quotes around the value.

But, parsing issues like that will be the norm in any kind of parsing logic for formats such as this, so I’m not terribly worried about it.

Implicit Casts in VB.net

0
Filed under Code Garage, VB Feng Shui

I really hadn’t paid much attention to Implicit Casting in VB. I’d heard the term, thought it might be an interesting idea, but then completely forgot about it.

But I was recently reading a blog post discussing the EntitySpaces ORM, and the author happened to offhandedly mentioned implicit casts with respect to certain features of that ORM. I was intrigued again, so I set off down that road.

Come to find out, Implicit casts were one of the language features new in VS2005! (wow, how’d I miss that?) Essentially, they allow you to explicitly dictate what happens when you implicitly  cast one object to a different type.

How’s that again?

VB is rife with instances where you might need to implicitly cast one object to another of a different type. You’ve got the obvious situations, for instance, converting an object from a subtype to a super type (granted, not the most ideal example, but it illustrates the point):

Dim d = New Dog
Dim a As Animal = d

And then there are more subtle examples. Here, I’ve created a Dog object with the name “Rover” but then I’m comparing it directly to a string. The implicit conversion is from Dog to String:


Dim d = New Dogd.Name = “Rover”


If d = "Rover" Then blah....

Now, before anyone rails on me for encouraging bad programming practices, these are all contrived examples, just to point out the possibilities.

The point is, if you find yourself needing to cast one object into another, and such casting makes logical sense, implicit casting provides a very clean, terse way to accomplish it.

Widening or Narrowing?

Any time you convert one object to another, there’s only 2 possible outcomes:

  • The conversion will always succeed. This is known as a Widening Conversion.
  • The conversion could succeed or fail. This is known as a Narrowing Conversion.

A lot of documentation on the matter will describe a Widening Conversion as one that converts a derived type to one of it’s base types, and a Narrowing Conversion as one that converts a base type to a derived type, but in practice, the two types don’t have to be related at all.

For instance, you can define either a Narrowing or a Widening conversion operator to convert from a StringBuilder Class to a String and vice versa, but neither type is directly related to the other.

The Caveats

There’s always at least one, right? Well, first, as you might guess, used with abandon, implicit casting can lead to code that bears a striking (and quite unwelcome) resemblance to old school VB code chock full of variants.

But a less obvious snag is that when you cast, you need to pay special attention to whether you’re creating a new object or just recasting the existing object.

For instance, you might define a Widening Conversion from Dog to Animal as:

Public Shared Widening Operator CType(ByVal InitialData As Dog) As Animal
    Dim a = New Animal
    a.Name = InitialData.Name
    Return a
End Operator

But in actuality, a new Animal object with the same Name as the original Dog object is created. That might be what you want, or it might not be.

Project Level Implicit Conversion

If you’ve never noticed before, your project actually has an overall level of warning configuration for Implicit Conversion, because it can be such a nasty issue.

You’ll find it on the Compile Tab of the project properties:

image

  1. Set to None, VB won’t warn you at all when you attempt to implicitly convert types.
  2. Set to Warning, VB will show a warning in the Errors list, but will still allow the project to compile.
  3. Set to Error, VB won’t even allow the project to compile.

Generally, you’ll want this set to Error. The good news, however, is that even with this option set to Error, VB will not consider those implicit conversions that are handled explicitly by a Widening Conversion as errors. If you think about it, this makes sense, because a Widening conversion, by its very nature, can’t fail, and thus really isn’t an implicit conversion anymore.

StringBuilder Example

Francesco Balena wrote up an excellent short article here that illustrates using a Widening conversion to make working with StringBuilder objects far easier. He uses Widening conversions to implicitly cast from StringBuilder to String and back, like so:


    Public Shared  Widening Operator CType(ByVal op As StringBuilder6) As  String
        Return op.ToString()
    End Operator

    Public Shared Widening Operator CType(ByVal str As String) As StringBuilder6
        Dim op As New StringBuilder6()
        op.buffer.Append(str)
        Return op
    End Operator

Since these are Widening Conversions, they won’t fail, and thus they won’t be flagged as errors even with Implicit Conversions set to Error in the project properties.

A One Line CSV Parser

0
Filed under Code Garage, Regular Expressions, Utilities, VB Feng Shui

Parsing up Quote Comma delimited text is a pretty common thing to do, and it seems trivial enough till you realize all the little gotcha’s that come with the problem (like doubled quotes, commas in quotes, etc, etc). Then it becomes just another laborious exercise in boring coding.

I came across a regex some time ago that makes the process literally one line of code. I can no longer find the original author but the code I found (what little there was of it) was C# and actually split up into a few lines of code, so this is converted to the equivalent VB.net:

    Public Function QCSplit(ByVal Args As String) As String()

        Return (New System.Text.RegularExpressions.Regex(",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))")).Split(args)
    End Function

The regex used here doesn’t actually match on the contents, it matches on the commas that split things up, and it then uses the SPLIT function to actually split those things up.

I’ve used this a while now and it works great, but I’ve seen a few interesting alternatives since.

The most interesting thus far is this one by Daniel Einspanjer. Not interesting enough yet for me to switch to it, but the regexlib.com website is quite nice as a great repository of good regex recipes.

Regular Expression Tester

And speaking of Regular Expressions, the guys over at RAD Software, have a free Regular expression tester for .net style regular expressions that works fantastically. If you’re just starting out in regex’s (and seriously, who isn’t <G>), you owe it to yourself to pick up a decent regex test tool, and this one is as good as I’ve seen so far.

image

As far as I can tell, it can handle all the various options for regex’s, and dynamically shows match results, etc. Very handy for trying out expressions without actually running them in .net.

Add it to your External Tools menu in VS and it’ll be right there, good to go.

Mail Merge and Reports in Word

2
Filed under Office, VB Feng Shui

If you have a need to generate documents, and I mean lots of documents, you’re likely the eventually investigate the Word mail merge functionality.

It’s ok, but there’s lots of things that it can’t do, especially with respect to generating tables or dealing with table oriented data.

You next stop might be a report generator, like Crystal Reports. But the big problem with those kinds of packages is they require you to use a proprietary report template editor, usually band oriented (the typical “header/body with repeating lines/footer” type reporting system). Not terribly bad, but not great if you actually want to generate mail merged documents.

Word Documents as Templates

But there are alternatives out there. Two of the most intriguing I’ve seen are an open source package called FlexDoc and a commercial application called Windward Reports.

Both of them allow you to use standard Word documents as “report” templates”. But keep in mind, “report template” in this sense is a pretty open ended concept. You could generate everything from a mass mailing letter, to legal documents, to actual reports with tabular data, and even graphs and charts.

And both are lightning fast, since they work directly on the Word document file itself (or more precisely, the DOCX file format, older DOC format files aren’t supported).

The Commercial Product

Windward Reports is polished commercial product. It’s not cheap, but it’s incredibly flexible, supporting everything from XML to SQL data sources, charting, graphs, tabular report type elements, formatted content (ie insert a field into your Word document that consists of formatted HTML, for example), and a lot more.

Windward uses standard Word Mail Merge fields for its tags, specifically the AUTOTEXTLIST field. Those field types have been in Word since even before Word 2000, and it’s very unlikely they’ll change any time soon, so you’re pretty safe there.

One really nice element of Windward is that they have available a “tagging helper” addin for Word called AutoTag. It can really help speed up the design of your Word report templates. It’s optional. Technically, you don’t absolutely have to have it, but it’s a lot more painful to create reports without it.

The Open Source Project

Flexdoc is open source and not quite as polished as Windward. But it’s got much of the same functionality. It can connect to just about any data source, it’s fast, and it can do tabular data very well, but it can’t do charts or graphs, and it can’t handle formatted data (hmtl or otherwise) at all by default, though there are mods that can be made to improve that situation.

The biggest problem with Flexdoc is that it currently relies on the CustomXMLElements functionality of Word that has, as of Jan 10, 2010, been removed from the product because of the lawsuit with I4I.

The author of FlexDoc indicated that there were plans for a ContentControl-based version in the future, but that might be a while off. Still, it’s an open source project, so you could always throw in and help make those changes if you really needed them.

Cleaning up Messy DataContractSerializer XML

5
Filed under Code Garage, Software Architecture, VB Feng Shui, XML

I was working with XML serialization of objects recently and was using the good ol’ DataContractSerializer again.

One thing that I bumped into almost immediately is that the XML that it spits out isn’t exactly the neatest, tidiest of XML possible, to say the least.

So I set out on a little odyssey to see exactly how nice and clean I could make it.

(EDIT: I’ve added more information about how the Name property of the Field object is being serialized twice, which is another big reason for customizing the serialization here, and for specialized dictionary serialization in general).

First, the objects to serialize. I’ve constructed a very rudimentary object hierarchy that still illustrates the problem well.

In this case, I have a List of Record objects, called a Records list. Each Record object is a dictionary of Field objects. And each Field object contains two properties, Name and Value. The code for these (and a little extra code to make populating them easy) is as follows.

Public Class Records
    Inherits List(Of Record)

    Public Sub New()
        '---- default constructor
    End Sub

End Class

Public Class Record
    Inherits Dictionary(Of String, Field)

    Public Sub New()
        '---- default constructor
    End Sub

    Public Sub New(ByVal ParamArray Fields() As Field)
        For Each f In Fields
            Me.Add(f.Name, f)
        Next
    End Sub
End Class

Public Class Field

    Public Sub New()
        '---- default constructor
    End Sub

    Public Sub New(ByVal Name As String, ByVal Value As String)
        Me.Name = Name
        Me.Value = Value
    End Sub

    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal value As String)
            _Name = value
        End Set
    End Property
    Private _Name As String

    Public Property Value() As String
        Get
            Return _Value
        End Get
        Set(ByVal value As String)
            _Value = value
        End Set
    End Property
    Private _Value As String

End Class

Yes, I realize there are DataTables, KeyValuePair objects, etc that could do this, but that’s not the point, so just bear with me<g>.

To populate a Records object, you might have code that looks like this:

Dim Recs = New Records
Recs.Add(New Record(New Field("Name", "Darin"), New Field("City", "Arlington")))
Recs.Add(New Record(New Field("Name", "Gillian"), New Field("City", "Ft Worth")))
Recs.Add(New Record(New Field("Name", "Laura"), New Field("City", "Dallas")))

Ok, so far so good.

Now, lets serialize that with a simple serialization function using the DataContractSerializer:

    ''' <summary>
    ''' Serializes the data contract to a string (XML)
    ''' </summary>
    Public Function Serialize(Of T As Class)(ByVal SerializeWhat As T) As String
        Dim stream = New System.IO.StringWriter
        Dim writer = System.Xml.XmlWriter.Create(stream)

        Dim serializer = New System.Runtime.Serialization.DataContractSerializer(GetType(T))
        serializer.WriteObject(writer, SerializeWhat)
        writer.Flush()

        Return stream.ToString
    End Function

In the test application, I put together, I dump the resulting XML to a text box. Yikes!

image

So, what’re the problems here? <g>

  1. You’ve got that “http://www.w3.org/2001/XMLSchema-instance” namespace attribute amongst other
  2. lots of random letters
  3. no indenting
  4. You can’t really tell it from this shot, but the Record dictionary is serializing the name property twice, because I’m using it as the Key for the dictionary, but it’s also a property of the objects in the dictionary.

All this noise might be fine for computer to computer communication, but it’s pretty tough on human eyes<g>.

Ok, first thing to do is indent:

    ''' <summary>
    ''' Serializes the data contract to a string (XML)
    ''' </summary>
    Public Function Serialize(Of T As Class)(ByVal SerializeWhat As T) As String
        Dim stream = New System.IO.StringWriter
        Dim xmlsettings = New Xml.XmlWriterSettings
        xmlsettings.Indent = True
        Dim writer = System.Xml.XmlWriter.Create(stream, xmlsettings)

        Dim serializer = New System.Runtime.Serialization.DataContractSerializer(GetType(T))
        serializer.WriteObject(writer, SerializeWhat)
        writer.Flush()

        Return stream.ToString
    End Function

Notice that I added the use of the XMLWriterSettings object. This allows me to set the Indent property, and things are much more readable.

image

But that’s still a far cry from nice, simple, tidy XML. Notice all the “ArrayofArrayOf blah blah” names, and the randomized letter sequences? Plus, it’s much more obvious how the NAME jproperty is being serialized twice now. Yuck! Surely, we can do better than this!

Cleaning Up the Single Entity Field Object

The DataContractSerializer certainly works easily enough to serialize the Field object, but unfortunately, it decorates the serialized elements with a load of really nasty looking and completely unnecessary cruft.

My first thought was to simply decorate the class with <DataContract> attributes:

<DataContract(Name:="Field", Namespace:="")> _
Public Class Field

    Public Sub New()
        '---- default constructor
    End Sub

    Public Sub New(ByVal Name As String, ByVal Value As String)
        Me.Name = Name
        Me.Value = Value
    End Sub

    <DataMember()> _
    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal value As String)
            _Name = value
        End Set
    End Property
    Private _Name As String

    <DataMember()> _
    Public Property Value() As String
        Get
            Return _Value
        End Get
        Set(ByVal value As String)
            _Value = value
        End Set
    End Property
    Private _Value As String

End Class

But this yields:

image

So we have several problems:

  • Each field is rendered into a Value element of the Record’s field collection
  • The Key of the Record collection duplicates the Name of the individual Field objects
  • and we still have a noxious xmlns=”” attribute being rendered.

Unfortunately, this is where the DataContractSerializer’s simplicity is it’s downfall. There’s just no way to customize this any further, using ONLY the DataContractSerializer.

However, we can implement IXMLSerializable on our Field object to customize its serialization. All I need to do is remove the DataContract attribute, and add a simple implementation of IXMLSerializable to the class:

Public Class Field
    Implements System.Xml.Serialization.IXmlSerializable

    Public Sub New()
        '---- default constructor
    End Sub

    Public Sub New(ByVal Name As String, ByVal Value As String)
        Me.Name = Name
        Me.Value = Value
    End Sub

    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal value As String)
            _Name = value
        End Set
    End Property
    Private _Name As String

    Public Property Value() As String
        Get
            Return _Value
        End Get
        Set(ByVal value As String)
            _Value = value
        End Set
    End Property
    Private _Value As String

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml

    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        writer.WriteElementString("Name", Me.Name)
        writer.WriteElementString("Value", Me.Value)
    End Sub
End Class

And that yields a serialization of:

image

Definitely better, but still not great.

Cleaning up a Generic Dictionary’s Serialization

The problem now is with the Record dictionary.

Public Class Record
    Inherits Dictionary(Of String, Field)
    Implements System.Xml.Serialization.IXmlSerializable

    Public Sub New()
        '---- default constructor
    End Sub

    Public Sub New(ByVal ParamArray Fields() As Field)
        For Each f In Fields
            Me.Add(f.Name, f)
        Next
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml

    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each f In Me.Values
            DirectCast(f, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
        Next
    End Sub
End Class

Adding an IXMLSerializable implementation to it as well yields the following XML:

image

Definitely much better! Especially notice that we’ve gotten rid of the duplicated “Name” key. It was duplicated before because we used the Name element of the Field object as the Key for the Record dictionary. This be play an important part in deserializing the Record’s dictionary of Field objects later.

Cleaning up the List of Records

Finally, the only thing really left to do is clean up how the generic list of Record objects is serialized.

But once again, the only way to alter the serialization is to implement IXMLSerializable on the class.

<Xml.Serialization.XmlRoot(Namespace:="")> _
Public Class Records
    Inherits List(Of Record)
    Implements System.Xml.Serialization.IXmlSerializable

    Public Sub New()
        '---- default constructor
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml

    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each r In Me
            DirectCast(r, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
        Next
    End Sub
End Class

Notice that I’ve implemented IXMLSerializable, but I also added the XmlRoot attribute with a blank Namespace parameter. This completely clears the Namespace declaration from the resulting output, which now looks like this:

image

And that is just about as clean as your going to get!

But That’s Not all there is To It

Unfortunately, it’s not quite this simple. The thing is, you very well may want to serialize each object independently, not just serialize the Records collection. Doing that as we have things defined right now won’t work. The Start and End elements won’t be generated in the XML properly.

Instead, we need to add XmlRoot attributes to all three classes, and adjust where the WriteStartElement and WriteEndElement calls are made. So we end up with this:


<Xml.Serialization.XmlRoot(Namespace:="")> _
Public Class Records
    Inherits List(Of Record)
    Implements System.Xml.Serialization.IXmlSerializable

    Public Sub New()
        '---- default constructor
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml

    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each r In Me
            writer.WriteStartElement("Record")
            DirectCast(r, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
            writer.WriteEndElement()
        Next
    End Sub
End Class

<Xml.Serialization.XmlRoot(ElementName:="Record", Namespace:="")> _
Public Class Record
    Inherits Dictionary(Of String, Field)
    Implements System.Xml.Serialization.IXmlSerializable

    Public Sub New()
        '---- default constructor
    End Sub

    Public Sub New(ByVal ParamArray Fields() As Field)
        For Each f In Fields
            Me.Add(f.Name, f)
        Next
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml

    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each f In Me.Values
            writer.WriteStartElement("Field")
            DirectCast(f, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
            writer.WriteEndElement()
        Next
    End Sub
End Class

<Xml.Serialization.XmlRoot(ElementName:="Field", Namespace:="")> _
Public Class Field
    Implements System.Xml.Serialization.IXmlSerializable

    Public Sub New()
        '---- default constructor
    End Sub

    Public Sub New(ByVal Name As String, ByVal Value As String)
        Me.Name = Name
        Me.Value = Value
    End Sub

    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal value As String)
            _Name = value
        End Set
    End Property
    Private _Name As String

    Public Property Value() As String
        Get
            Return _Value
        End Get
        Set(ByVal value As String)
            _Value = value
        End Set
    End Property
    Private _Value As String

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml

    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        writer.WriteElementString("Name", Me.Name)
        writer.WriteElementString("Value", Me.Value)
    End Sub
End Class


 

 

And Finally, Deserialization

Of course, all this would be for nought if we couldn’t actually deserialize the xml we’ve just spent all this effort to clean up.

Turns out that deserialization is pretty straightforward. I just needed to add code to the ReadXml member of the implemented IXMLSerializable interface. The full code for my testing form is below. Be sure to add a reference to System.Runtime.Serialization, though, or you’ll have type not defined errors.

Public Class frmSample

    Private Sub btnTest_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles btnTest.Click

        '---- populate the objects
        Dim Recs = New Records
        Recs.Add(New Record(New Field("Name", "Darin"), New Field("City", "Arlington")))
        Recs.Add(New Record(New Field("Name", "Gillian"), New Field("City", "Ft Worth")))
        Recs.Add(New Record(New Field("Name", "Laura"), New Field("City", "Dallas")))

        Dim t As String
        t = Serialize(Of Field)(Recs(0).Values(0))
        Dim fld = Deserialize(Of Field)(t)
        Debug.Print(fld.Name)
        Debug.Print(fld.Value)
        Debug.Print("--------------")

        t = Serialize(Of Record)(Recs(0))
        Dim rec = Deserialize(Of Record)(t)
        Debug.Print(rec.Values.Count)
        Debug.Print("--------------")

        t = Serialize(Of Records)(Recs)
        tbxOutput.Text = t

        Dim recs2 = Deserialize(Of Records)(t)
        Debug.Print(recs2.Count)
    End Sub
End Class

<Xml.Serialization.XmlRoot(Namespace:="")> _
Public Class Records
    Inherits List(Of Record)
    Implements System.Xml.Serialization.IXmlSerializable

    Public Sub New()
        '---- default constructor
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
        reader.MoveToContent()
        reader.ReadStartElement("Records")
        reader.MoveToContent()
        Do While reader.NodeType <> Xml.XmlNodeType.EndElement
            Dim Rec = New Record
            DirectCast(Rec, System.Xml.Serialization.IXmlSerializable).ReadXml(reader)
            Me.Add(Rec)
            reader.MoveToContent()
        Loop
        reader.ReadEndElement()
    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each r In Me
            writer.WriteStartElement("Record")
            DirectCast(r, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
            writer.WriteEndElement()
        Next
    End Sub
End Class

<Xml.Serialization.XmlRoot(ElementName:="Record", Namespace:="")> _
Public Class Record
    Inherits Dictionary(Of String, Field)
    Implements System.Xml.Serialization.IXmlSerializable

    Public Sub New()
        '---- default constructor
    End Sub

    Public Sub New(ByVal ParamArray Fields() As Field)
        For Each f In Fields
            Me.Add(f.Name, f)
        Next
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
        reader.MoveToContent()
        reader.ReadStartElement("Record")
        reader.MoveToContent()
        Do While reader.NodeType <> Xml.XmlNodeType.EndElement
            Dim fld = New Field
            DirectCast(fld, System.Xml.Serialization.IXmlSerializable).ReadXml(reader)
            Me.Add(fld.Name, fld)
            reader.MoveToContent()
        Loop
        reader.ReadEndElement()
    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each f In Me.Values
            writer.WriteStartElement("Field")
            DirectCast(f, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
            writer.WriteEndElement()
        Next
    End Sub
End Class

<Xml.Serialization.XmlRoot(ElementName:="Field", Namespace:="")> _
Public Class Field
    Implements System.Xml.Serialization.IXmlSerializable

    Public Sub New()
        '---- default constructor
    End Sub

    Public Sub New(ByVal Name As String, ByVal Value As String)
        Me.Name = Name
        Me.Value = Value
    End Sub

    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal value As String)
            _Name = value
        End Set
    End Property
    Private _Name As String

    Public Property Value() As String
        Get
            Return _Value
        End Get
        Set(ByVal value As String)
            _Value = value
        End Set
    End Property
    Private _Value As String

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
        reader.MoveToContent()
        reader.ReadStartElement("Field")
        reader.MoveToContent()
        If reader.Name = "Name" Then Me.Name = reader.ReadElementContentAsString
        reader.MoveToContent()
        If reader.Name = "Value" Then Me.Value = reader.ReadElementContentAsString
        reader.MoveToContent()
        reader.ReadEndElement()
    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        writer.WriteElementString("Name", Me.Name)
        writer.WriteElementString("Value", Me.Value)
    End Sub
End Class

Public Module Serialize
    ''' <summary>
    ''' Serializes the data contract to a string (XML)
    ''' </summary>
    Public Function Serialize(Of T As Class)(ByVal SerializeWhat As T) As String
        Dim stream = New System.IO.StringWriter
        Dim xmlsettings = New Xml.XmlWriterSettings
        xmlsettings.Indent = True
        Dim writer = System.Xml.XmlWriter.Create(stream, xmlsettings)

        Dim serializer = New System.Runtime.Serialization.DataContractSerializer(GetType(T))
        serializer.WriteObject(writer, SerializeWhat)
        writer.Flush()

        Return stream.ToString
    End Function

    ''' <summary>
    ''' Deserializes the data contract from xml.
    ''' </summary>
    Public Function Deserialize(Of T As Class)(ByVal xml As String) As T
        Using stream As New MemoryStream(UnicodeEncoding.Unicode.GetBytes(xml))
            Return DeserializeFromStream(Of T)(stream)
        End Using
    End Function

    ''' <summary>
    ''' Deserializes the data contract from a stream.
    ''' </summary>
    Public Function DeserializeFromStream(Of T As Class)(ByVal stream As Stream) As T
        Dim serializer As New DataContractSerializer(GetType(T))
        Return DirectCast(serializer.ReadObject(stream), T)
    End Function
End Module

Of particular note above is the ReadXML function of the Field object.

It checks the name of the node first and then places the value of the node into the appropriate property of that object. If I didn’t do that, the deserialization process would require the fields in the XML to be in a specific order. This is a minor drawback to the DataContractSerializer that this approach alleviates.

What’s Next?

The one unfortunate aspect of this is that it requires you to implement IXMLSerializable on each object that you want the XML cleaned up for.

Generally speaking, The DataContractSerializer will be perfectly fine for those cases where humans aren’t likely to ever have to see the XML you’re generating. And you get a performance boost for sacrificing that flexibility and “cleanliness”.

But for things like data file imports, custom configuration files, and the like, it may be desirable to  implement custom serialization like this so that your xml files can be almost as easy to read as those old school INI files!