Category Archives: XML

Stupid fun with XML Literals

1
Filed under .NET, XML

I’ve been putting the finishing touches on a little command line utility, and wanted to add some nice instruction text etc.

Writing to the Console is certainly simple in .net, but line after line of Console.WriteLine gets old

Then I recalled some tricks using XML Literals in VB.net. A few snips later and I have this:

Private Shared Sub Header()
    Dim v = My.Application.Info.Version.ToString
    Dim x = <info>
MyUtility v<%= v %> - Controller Tool&#13;
(c) 2011 yadda yadda&#13;
</info>
    Output(x.Value)
End Sub

With that, I can enter any number of lines of text I need, I don’t have to worry about quotes or Console.Writeline’s, just a &#13; at the end of each line, and I can substitute in the app’s version number using the familiar ASP <%= notation (notice the variable ‘v’).

Now, if there was just a way to preserve whitespace in XML literals…

Code Garage – Escaping and UnEscaping XML Strings

4
Filed under Code Garage, XML

image If you work much with XML, eventually, you’ll end up needing to take raw string data and either Escape it (replace characters that are illegal in XML in the string with legal XML markup) or UnEscape it (replace any XML markup in the string with the represented characters).

For instance, a “>” (greater than) symbol is illegal in the content of an XML node, and must be represented by &gt;

There are any number of approaches out there to handle this. Most involve simple brute force string search and replaces. They work, but I thought there must be a more elegant (and already thought out) solution to this problem.

However, after quite a bit of searching, I gave up and wrote my own, making use of the XML functions already defined in the .net framework.

To ENCODE a string into legal XML node content…

    Public Function EncodeXML(ByVal s As String) As String
        If Len(s) = 0 Then Return ""
        Dim encodedString = New StringBuilder()
        Dim writersettings = New System.Xml.XmlWriterSettings
        writersettings.ConformanceLevel = Xml.ConformanceLevel.Fragment
        Using writer = System.Xml.XmlWriter.Create(encodedString, writersettings)
            writer.WriteString(s)
        End Using
        Return encodedString.ToString
    End Function

To reverse the process and Decode a string…

    Public Function DecodeXML(ByVal s As String) As String
        If Len(s) = 0 Then Return ""
        Dim decodedString As String = ""
        Dim readersettings = New System.Xml.XmlReaderSettings
        readersettings.ConformanceLevel = Xml.ConformanceLevel.Fragment
        Dim ms = New System.IO.StringReader(s)
        Using reader = System.Xml.XmlReader.Create(ms, readersettings)
            reader.MoveToContent()
            decodedString = reader.ReadString
        End Using
        Return decodedString
    End Function

No, I haven’t performed any exhaustive performance measures on these. I’ve generally not used them in any kind of ultra high volume situations, but they’re clean, simple and leverage existing code that already performs the necessary functions.

XPath Testing Tool (XPathMania)

0
Filed under Utilities, VB Feng Shui, XML

I mentioned RAD Software’s RegularExpression Designer a few posts back, but this time, I’m talking XPath.

If you haven’t already played with XPath, and you do much of anything at all with XML data, you owe it to yourself to check it out. Basically, it’s a query language that allows you to query for specific nodes or nodesets from an XML data source very easily. It’s also got some limited function and expression handling capabilities (typical math functions, string manipulation functions, etc).

XPath 1.0 is fully supported by the .net framework. Unfortunately, XPath 2.0, while much more capable, has been dropped by Microsoft in favor of, I believe, XQuery. Trouble is, I can’t find much in the way of VB.net support for XQuery right off.

But, all that aside, XPath 1.0 is still very much a useful tool to have in your arsenal, but, like regular expressions, getting an XPath query right can take some experimenting. That’s where XPathMania comes in.

It’s basically an XPath query tool extension to the VS 2008 IDE.

In the screenshot below, you can see that I’ve got an XML file loaded, and I’ve docked the XPathMania window just below it.

With that, I can easily enter a query again that XML file and view the results (along with the matching nodes actually highlighted in the XML file view! Good stuff!

image

Finally, you’ll also definitely want to check out this page for lots of good XPath example queries.

In the future, I’ll try to put together a little sample app that illustrates how to use the XPathNavigator to easily execute these queries against an XML document and enumerate the resulting nodeset.

Cleaning up Messy DataContractSerializer XML

7
Filed under Code Garage, Software Architecture, VB Feng Shui, XML

I was working with XML serialization of objects recently and was using the good ol’ DataContractSerializer again.

One thing that I bumped into almost immediately is that the XML that it spits out isn’t exactly the neatest, tidiest of XML possible, to say the least.

So I set out on a little odyssey to see exactly how nice and clean I could make it.

(EDIT: I’ve added more information about how the Name property of the Field object is being serialized twice, which is another big reason for customizing the serialization here, and for specialized dictionary serialization in general).

First, the objects to serialize. I’ve constructed a very rudimentary object hierarchy that still illustrates the problem well.

In this case, I have a List of Record objects, called a Records list. Each Record object is a dictionary of Field objects. And each Field object contains two properties, Name and Value. The code for these (and a little extra code to make populating them easy) is as follows.

Public Class Records
    Inherits List(Of Record)


    Public Sub New()
        '---- default constructor
    End Sub

End Class


Public Class Record
    Inherits Dictionary(Of String, Field)


    Public Sub New()
        '---- default constructor
    End Sub


    Public Sub New(ByVal ParamArray Fields() As Field)
        For Each f In Fields
            Me.Add(f.Name, f)
        Next
    End Sub
End Class


Public Class Field

    Public Sub New()
        '---- default constructor
    End Sub


    Public Sub New(ByVal Name As String, ByVal Value As String)
        Me.Name = Name
        Me.Value = Value
    End Sub


    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal value As String)
            _Name = value
        End Set
    End Property
    Private _Name As String



    Public Property Value() As String
        Get
            Return _Value
        End Get
        Set(ByVal value As String)
            _Value = value
        End Set
    End Property
    Private _Value As String

End Class

Yes, I realize there are DataTables, KeyValuePair objects, etc that could do this, but that’s not the point, so just bear with me<g>.

To populate a Records object, you might have code that looks like this:

Dim Recs = New Records
Recs.Add(New Record(New Field("Name", "Darin"), New Field("City", "Arlington")))
Recs.Add(New Record(New Field("Name", "Gillian"), New Field("City", "Ft Worth")))
Recs.Add(New Record(New Field("Name", "Laura"), New Field("City", "Dallas")))

Ok, so far so good.

Now, lets serialize that with a simple serialization function using the DataContractSerializer:

    ''' <summary>
    ''' Serializes the data contract to a string (XML)
    ''' </summary>
    Public Function Serialize(Of T As Class)(ByVal SerializeWhat As T) As String
        Dim stream = New System.IO.StringWriter
        Dim writer = System.Xml.XmlWriter.Create(stream)

        Dim serializer = New System.Runtime.Serialization.DataContractSerializer(GetType(T))
        serializer.WriteObject(writer, SerializeWhat)
        writer.Flush()

        Return stream.ToString
    End Function

In the test application, I put together, I dump the resulting XML to a text box. Yikes!

image

So, what’re the problems here? <g>

  1. You’ve got that “http://www.w3.org/2001/XMLSchema-instance” namespace attribute amongst other
  2. lots of random letters
  3. no indenting
  4. You can’t really tell it from this shot, but the Record dictionary is serializing the name property twice, because I’m using it as the Key for the dictionary, but it’s also a property of the objects in the dictionary.

All this noise might be fine for computer to computer communication, but it’s pretty tough on human eyes<g>.

Ok, first thing to do is indent:

    ''' <summary>
    ''' Serializes the data contract to a string (XML)
    ''' </summary>
    Public Function Serialize(Of T As Class)(ByVal SerializeWhat As T) As String
        Dim stream = New System.IO.StringWriter
        Dim xmlsettings = New Xml.XmlWriterSettings
        xmlsettings.Indent = True
        Dim writer = System.Xml.XmlWriter.Create(stream, xmlsettings)

        Dim serializer = New System.Runtime.Serialization.DataContractSerializer(GetType(T))
        serializer.WriteObject(writer, SerializeWhat)
        writer.Flush()

        Return stream.ToString
    End Function

Notice that I added the use of the XMLWriterSettings object. This allows me to set the Indent property, and things are much more readable.

image

But that’s still a far cry from nice, simple, tidy XML. Notice all the “ArrayofArrayOf blah blah” names, and the randomized letter sequences? Plus, it’s much more obvious how the NAME jproperty is being serialized twice now. Yuck! Surely, we can do better than this!

Cleaning Up the Single Entity Field Object

The DataContractSerializer certainly works easily enough to serialize the Field object, but unfortunately, it decorates the serialized elements with a load of really nasty looking and completely unnecessary cruft.

My first thought was to simply decorate the class with <DataContract> attributes:

<DataContract(Name:="Field", Namespace:="")> _
Public Class Field

    Public Sub New()
        '---- default constructor
    End Sub


    Public Sub New(ByVal Name As String, ByVal Value As String)
        Me.Name = Name
        Me.Value = Value
    End Sub

    <DataMember()> _
    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal value As String)
            _Name = value
        End Set
    End Property
    Private _Name As String



    <DataMember()> _
    Public Property Value() As String
        Get
            Return _Value
        End Get
        Set(ByVal value As String)
            _Value = value
        End Set
    End Property
    Private _Value As String

End Class

But this yields:

image

So we have several problems:

  • Each field is rendered into a Value element of the Record’s field collection
  • The Key of the Record collection duplicates the Name of the individual Field objects
  • and we still have a noxious xmlns=”” attribute being rendered.

Unfortunately, this is where the DataContractSerializer’s simplicity is it’s downfall. There’s just no way to customize this any further, using ONLY the DataContractSerializer.

However, we can implement IXMLSerializable on our Field object to customize its serialization. All I need to do is remove the DataContract attribute, and add a simple implementation of IXMLSerializable to the class:

Public Class Field
    Implements System.Xml.Serialization.IXmlSerializable


    Public Sub New()
        '---- default constructor
    End Sub


    Public Sub New(ByVal Name As String, ByVal Value As String)
        Me.Name = Name
        Me.Value = Value
    End Sub

    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal value As String)
            _Name = value
        End Set
    End Property
    Private _Name As String


    Public Property Value() As String
        Get
            Return _Value
        End Get
        Set(ByVal value As String)
            _Value = value
        End Set
    End Property
    Private _Value As String


    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function


    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml

    End Sub


    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        writer.WriteElementString("Name", Me.Name)
        writer.WriteElementString("Value", Me.Value)
    End Sub
End Class

And that yields a serialization of:

image

Definitely better, but still not great.

Cleaning up a Generic Dictionary’s Serialization

The problem now is with the Record dictionary.

Public Class Record
    Inherits Dictionary(Of String, Field)
    Implements System.Xml.Serialization.IXmlSerializable


    Public Sub New()
        '---- default constructor
    End Sub


    Public Sub New(ByVal ParamArray Fields() As Field)
        For Each f In Fields
            Me.Add(f.Name, f)
        Next
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml

    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each f In Me.Values
            DirectCast(f, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
        Next
    End Sub
End Class

Adding an IXMLSerializable implementation to it as well yields the following XML:

image

Definitely much better! Especially notice that we’ve gotten rid of the duplicated “Name” key. It was duplicated before because we used the Name element of the Field object as the Key for the Record dictionary. This be play an important part in deserializing the Record’s dictionary of Field objects later.

Cleaning up the List of Records

Finally, the only thing really left to do is clean up how the generic list of Record objects is serialized.

But once again, the only way to alter the serialization is to implement IXMLSerializable on the class.

<Xml.Serialization.XmlRoot(Namespace:="")> _
Public Class Records
    Inherits List(Of Record)
    Implements System.Xml.Serialization.IXmlSerializable


    Public Sub New()
        '---- default constructor
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml

    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each r In Me
            DirectCast(r, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
        Next
    End Sub
End Class

Notice that I’ve implemented IXMLSerializable, but I also added the XmlRoot attribute with a blank Namespace parameter. This completely clears the Namespace declaration from the resulting output, which now looks like this:

image

And that is just about as clean as your going to get!

But That’s Not all there is To It

Unfortunately, it’s not quite this simple. The thing is, you very well may want to serialize each object independently, not just serialize the Records collection. Doing that as we have things defined right now won’t work. The Start and End elements won’t be generated in the XML properly.

Instead, we need to add XmlRoot attributes to all three classes, and adjust where the WriteStartElement and WriteEndElement calls are made. So we end up with this:

<Xml.Serialization.XmlRoot(Namespace:="")> _ Public Class Records Inherits List(Of Record) Implements System.Xml.Serialization.IXmlSerializable Public Sub New() '---- default constructor End Sub Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema Return Nothing End Function Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml End Sub Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml For Each r In Me writer.WriteStartElement("Record") DirectCast(r, System.Xml.Serialization.IXmlSerializable).WriteXml(writer) writer.WriteEndElement() Next End Sub End Class <Xml.Serialization.XmlRoot(ElementName:="Record", Namespace:="")> _ Public Class Record Inherits Dictionary(Of String, Field) Implements System.Xml.Serialization.IXmlSerializable Public Sub New() '---- default constructor End Sub Public Sub New(ByVal ParamArray Fields() As Field) For Each f In Fields Me.Add(f.Name, f) Next End Sub Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema Return Nothing End Function Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml End Sub Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml For Each f In Me.Values writer.WriteStartElement("Field") DirectCast(f, System.Xml.Serialization.IXmlSerializable).WriteXml(writer) writer.WriteEndElement() Next End Sub End Class <Xml.Serialization.XmlRoot(ElementName:="Field", Namespace:="")> _ Public Class Field Implements System.Xml.Serialization.IXmlSerializable Public Sub New() '---- default constructor End Sub Public Sub New(ByVal Name As String, ByVal Value As String) Me.Name = Name Me.Value = Value End Sub Public Property Name() As String Get Return _Name End Get Set(ByVal value As String) _Name = value End Set End Property Private _Name As String Public Property Value() As String Get Return _Value End Get Set(ByVal value As String) _Value = value End Set End Property Private _Value As String Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema Return Nothing End Function Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml End Sub Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml writer.WriteElementString("Name", Me.Name) writer.WriteElementString("Value", Me.Value) End Sub End Class

 

 

And Finally, Deserialization

Of course, all this would be for nought if we couldn’t actually deserialize the xml we’ve just spent all this effort to clean up.

Turns out that deserialization is pretty straightforward. I just needed to add code to the ReadXml member of the implemented IXMLSerializable interface. The full code for my testing form is below. Be sure to add a reference to System.Runtime.Serialization, though, or you’ll have type not defined errors.

Public Class frmSample

    Private Sub btnTest_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles btnTest.Click

        '---- populate the objects
        Dim Recs = New Records
        Recs.Add(New Record(New Field("Name", "Darin"), New Field("City", "Arlington")))
        Recs.Add(New Record(New Field("Name", "Gillian"), New Field("City", "Ft Worth")))
        Recs.Add(New Record(New Field("Name", "Laura"), New Field("City", "Dallas")))

        Dim t As String
        t = Serialize(Of Field)(Recs(0).Values(0))
        Dim fld = Deserialize(Of Field)(t)
        Debug.Print(fld.Name)
        Debug.Print(fld.Value)
        Debug.Print("--------------")

        t = Serialize(Of Record)(Recs(0))
        Dim rec = Deserialize(Of Record)(t)
        Debug.Print(rec.Values.Count)
        Debug.Print("--------------")

        t = Serialize(Of Records)(Recs)
        tbxOutput.Text = t

        Dim recs2 = Deserialize(Of Records)(t)
        Debug.Print(recs2.Count)
    End Sub
End Class


<Xml.Serialization.XmlRoot(Namespace:="")> _
Public Class Records
    Inherits List(Of Record)
    Implements System.Xml.Serialization.IXmlSerializable


    Public Sub New()
        '---- default constructor
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
        reader.MoveToContent()
        reader.ReadStartElement("Records")
        reader.MoveToContent()
        Do While reader.NodeType <> Xml.XmlNodeType.EndElement
            Dim Rec = New Record
            DirectCast(Rec, System.Xml.Serialization.IXmlSerializable).ReadXml(reader)
            Me.Add(Rec)
            reader.MoveToContent()
        Loop
        reader.ReadEndElement()
    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each r In Me
            writer.WriteStartElement("Record")
            DirectCast(r, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
            writer.WriteEndElement()
        Next
    End Sub
End Class


<Xml.Serialization.XmlRoot(ElementName:="Record", Namespace:="")> _
Public Class Record
    Inherits Dictionary(Of String, Field)
    Implements System.Xml.Serialization.IXmlSerializable


    Public Sub New()
        '---- default constructor
    End Sub


    Public Sub New(ByVal ParamArray Fields() As Field)
        For Each f In Fields
            Me.Add(f.Name, f)
        Next
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
        reader.MoveToContent()
        reader.ReadStartElement("Record")
        reader.MoveToContent()
        Do While reader.NodeType <> Xml.XmlNodeType.EndElement
            Dim fld = New Field
            DirectCast(fld, System.Xml.Serialization.IXmlSerializable).ReadXml(reader)
            Me.Add(fld.Name, fld)
            reader.MoveToContent()
        Loop
        reader.ReadEndElement()
    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each f In Me.Values
            writer.WriteStartElement("Field")
            DirectCast(f, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
            writer.WriteEndElement()
        Next
    End Sub
End Class


<Xml.Serialization.XmlRoot(ElementName:="Field", Namespace:="")> _
Public Class Field
    Implements System.Xml.Serialization.IXmlSerializable


    Public Sub New()
        '---- default constructor
    End Sub


    Public Sub New(ByVal Name As String, ByVal Value As String)
        Me.Name = Name
        Me.Value = Value
    End Sub

    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal value As String)
            _Name = value
        End Set
    End Property
    Private _Name As String


    Public Property Value() As String
        Get
            Return _Value
        End Get
        Set(ByVal value As String)
            _Value = value
        End Set
    End Property
    Private _Value As String


    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function


    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
        reader.MoveToContent()
        reader.ReadStartElement("Field")
        reader.MoveToContent()
        If reader.Name = "Name" Then Me.Name = reader.ReadElementContentAsString
        reader.MoveToContent()
        If reader.Name = "Value" Then Me.Value = reader.ReadElementContentAsString
        reader.MoveToContent()
        reader.ReadEndElement()
    End Sub


    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        writer.WriteElementString("Name", Me.Name)
        writer.WriteElementString("Value", Me.Value)
    End Sub
End Class



Public Module Serialize
    ''' <summary>
    ''' Serializes the data contract to a string (XML)
    ''' </summary>
    Public Function Serialize(Of T As Class)(ByVal SerializeWhat As T) As String
        Dim stream = New System.IO.StringWriter
        Dim xmlsettings = New Xml.XmlWriterSettings
        xmlsettings.Indent = True
        Dim writer = System.Xml.XmlWriter.Create(stream, xmlsettings)

        Dim serializer = New System.Runtime.Serialization.DataContractSerializer(GetType(T))
        serializer.WriteObject(writer, SerializeWhat)
        writer.Flush()

        Return stream.ToString
    End Function


    ''' <summary>
    ''' Deserializes the data contract from xml.
    ''' </summary>
    Public Function Deserialize(Of T As Class)(ByVal xml As String) As T
        Using stream As New MemoryStream(UnicodeEncoding.Unicode.GetBytes(xml))
            Return DeserializeFromStream(Of T)(stream)
        End Using
    End Function


    ''' <summary>
    ''' Deserializes the data contract from a stream.
    ''' </summary>
    Public Function DeserializeFromStream(Of T As Class)(ByVal stream As Stream) As T
        Dim serializer As New DataContractSerializer(GetType(T))
        Return DirectCast(serializer.ReadObject(stream), T)
    End Function
End Module

Of particular note above is the ReadXML function of the Field object.

It checks the name of the node first and then places the value of the node into the appropriate property of that object. If I didn’t do that, the deserialization process would require the fields in the XML to be in a specific order. This is a minor drawback to the DataContractSerializer that this approach alleviates.

What’s Next?

The one unfortunate aspect of this is that it requires you to implement IXMLSerializable on each object that you want the XML cleaned up for.

Generally speaking, The DataContractSerializer will be perfectly fine for those cases where humans aren’t likely to ever have to see the XML you’re generating. And you get a performance boost for sacrificing that flexibility and “cleanliness”.

But for things like data file imports, custom configuration files, and the like, it may be desirable to  implement custom serialization like this so that your xml files can be almost as easy to read as those old school INI files!

DataContractSerializer is Not Necessarily Wonderful

0
Filed under .NET, VB Feng Shui, XML

Serializing objects in .NET is surprisingly easy, and there are a number of different ways to go about the process.

One method that was recently brought to my attention is the DataContractSerializer.

To use it, you’ll need to:

Imports System.Runtime.Serialization

Then create an new instance of a DataContractSerializer and a stream and finally, use the Serializer object to write a serialized version of some other object to the stream. I won’t delve into that code here, there’s plenty of examples on the web; even the Microsoft Help page for that object has a decent example.

But what you won’t find mentioned is how incredibly brittle the serializer can be.

First off, it doesn’t round trip the serialized XML, at least not completely. Say you serialize an object to an XML file, then EDIT that file and add comments. If you then deserialize and reserialize the object, you loose all your comments.

While this is somewhat understandable, it is rather unfortunate. Too bad that’s only the lesser of the problems you’ll face.

Another issue is one that plagues XML in general. It’s CASE SENSITIVE. This makes hand editing the files tricky at best. Now, I know a lot of you might say “Jeez, stop hand editing your XML!” but, let’s be realistic. Sometimes, that’s just flat the fastest way to deal with certain issues.

But the real stumper for me was when I recently discovered that the DataContractSerializer is highly dependent on the ORDER of the XML elements in the serialized XML! That’s right, get elements out of the order that the serializer expects them in (and yet still have perfectly legal and reasonable XML) and the serializer fails. But what’s worse is that it fails silently. It just simply fails to deserialize certain property values if their xml elements aren’t in the “proper” order (which happens to be alphabetic, but still, XML should not be order dependent).

There’s a short write up on it here, but even the author, Aarron Skonnard, doesn’t comment on the fact that if elements are out of order, they won’t deserialize properly.

From what I’ve read, that order dependency is partly why the DataContractSerializer is up to 10% faster than some other serialization techniques. And, if you know that your serialized objects will ONLY ever be touched by code that is intended to work with them, that’s probably ok.

But it’s something to be very aware of if you intend on the serialized XML being visible to (or editable by) actual people.

Multi-Line String Literals

2
Filed under .NET, VB Feng Shui, XML

VB has long been my favorite development language, but, as is typical, even it can definitely be improved in innumerable ways.

One aspect of VB that has always bothered me is string literals. Since VB treats EOL as significant, you can’t have strings that wrap across multiple lines in the environment. This is especially problematic when you deal with SQL queries, report formatting strings, and the like. For instance, this is generally what you see when you have to deal with long SQL queries in VB code.

Dim q as String = "Select *" & vbcrlf
q &= "From MyTable" & vbcrlf
q &= "Where MyColumn = 1"

Now, you can do a few formatting tricks and clean it up a little, but even then, it’s going to be cumbersome to work with.

One workaround has always been to embed these long, multi-line literals in a Resource file. The VS 2008 IDE even goes so far as to allow you to enter multi-line strings in the resource editor, although you have to use Ctrl-Enter to get a new line:

image

If you look at the RESX file for the above resource, you’ll see something like this:

image

All this is certainly workable, and, in many cases, isolating strings to a resource file (of some sort) can be a really good thing, but in other cases, it’s just a pain.

In VB10, Microsoft is finally doing what VB should have done all those years ago. They’re adding “intelligent line continuation” which is to say, you can continue a line in certain syntactic situations simply by continuing the code on a new line. No need for the line continuation underscore char, etc.  It won’t necessarily work everywhere (esp in places where it the code would end up ambiguous), but hey, that only makes sense.

In the meantime, you can actually have multi-line string literals in VB.Net without too much pain.

The trick is to use XML Literals. Generally, I’m no fan of XML literals. Once they get peppered with variable references, embedded For loops and other code, they become just so much spaghetti code, albeit with a modern twist.

But, they can be handy for multi-line string literals.

Dim q as String = <String>Select *
From MyTable
Where MyColumn = 1
</String>.value

So what’s going on here? First, I dim the result variable Q as a string (technically, this isn’t necessary as VB can infer the type, but you get the picture).

Then, I declare the XML Literal with the <String></String> tags. Everything between those tags (even across lines), gets treated as part of the string, no quotes or line continuation needed.

Finally, since the result of the <String></String> tags is going to be an XElement object, and what I want is a string, I need to invoke the value property of the XElement object to retrieve its string value.

And there you go.

But, as with anything good, there are a few caveats.

  1. line breaks AND whitespace are preserved in the string, which means if you indent those lines, you’ll get space characters where you might not have wanted them.
  2. if you want to include certain characters that are illegal in XML (for instance, the “<” or “>” characters), you’ll need to escape them.

Handling Escape Characters

One option you have for handling escape characters is to use the XML escape sequences, as in:

Dim q as String = <String>Select * as My&gt;Test&lt;var
From MyTable
Where MyColumn = 1
</String>.value

The &gt; and &lt; will be replaced with > and < respectively. You can escape any arbitrary character with &#nnn; where nnn is the ascii code of the character.

But this is code, dammit, not XML, and, at least for me, I’d much rather see normal C-style escape sequences (\r\n anyone <g> ) here than have to muck with XML “&” sequences. If we’re talking about web pages and HTML, that might be a different story, though.

At any rate, you can have your cake and eat it too, as it turns out.

Just feed that string through System.Text.RegularExpressions.Regex.Unescape to convert those sequences, hence:

Dim q as String = System.Text.RegularExpressions.Regex.Unescape(<String>Select * as MyTestvar \r\n
From MyTable
Where MyColumn = 1
</String>.value)

Notice how the XElement.value is simply passed through to the Unescape method to process that \r\n into a normal carriage return/linefeed pair.

Using that, you can now easily embed C-style escaped multi-line strings in your VB code.

But wait, that Unescape call is fairly nasty to look at and use. Wouldn’t it be nice if that was more or less automatic? Well, with a nice Extension Method, it can be.

<System.Runtime.CompilerServices.Extension()> _
Public Function UnEscaped(ByVal v As XElement) As String
    Return System.Text.RegularExpressions.Regex.Unescape(v.Value)
End Function

.....

Dim q as String = <String>Select * as MyTestvar \r\n
From MyTable
Where MyColumn = 1
</String>.Unescaped

First, define an Extension Method that would apply to the XElement object. This is a very simple method that just takes the associated XElement object, retrieves it’s value, and runs that string through the Regex Unescape method.

Since the extension method extends the XElement itself, you can now simply call the Unescaped method directly on the long string definition (the very last line in the code above).

You can’t get much cleaner than that. Or can you? Let me know!