Deserializing XML: Data at the root level is invalid. Line 1, position 1

A couple of days ago a colleague of mine ran into a problem: he had some objects defining a configuration that needed to be stored in the database.

He would serialize the data with the following code:

using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public static class XmlSerializerHelper
{
    public static string SerializeXml<TObject>(this TObject objectToSerialize)
    {
        using (var memoryStream = new MemoryStream())
        {
            var xmlSerializer = new XmlSerializer(typeof (TObject));

            var xmlnsEmpty = new XmlSerializerNamespaces(new[]
                {
                    new XmlQualifiedName(string.Empty, string.Empty),
                });

            var xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8);

            xmlSerializer.Serialize(xmlTextWriter, objectToSerialize, xmlnsEmpty);

            return Encoding.UTF8.GetString(memoryStream.GetBuffer(), 0, (int) memoryStream.Length);
        }
    }
}

And we would deserialize the string with these lines:

XmlSerializer xmlSerializer = new XmlSerializer(typeof(Entity));
StringReader stringReader = new StringReader(result /* result is the value from the database */);
Entity deserializedEntity = (Entity)xmlSerializer.Deserialize(stringReader);

When the string was then read back from the database, we got an error:

{"Data at the root level is invalid. Line 1, position 1."}

Let’s check why.

Judging by the message, the problem sits at the very start of the XML, so let’s look at the string:

[Image: What's the string?]

Nothing weird there… However, let’s look at the characters themselves:

[Image: char at index 0]

What the? What’s that thing there at [0]? Who put it there?

Well, according to some web documentation it’s the ZERO WIDTH NO-BREAK SPACE, U+FEFF (what’s with the screaming anyway…), also known as the BOM, the byte order mark!

So what is the BOM? Wikipedia explains it better than I can. The only thing you need to know is that in UTF-8 it is optional.

So where is it coming from? And why is this an issue?

Let’s investigate our serializer helper: we run the code again, put a breakpoint in the method that actually serializes the entity, and inspect the memoryStream:

[Image: investigating the stream]

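As we can see, after carefully reading the Wikipedia page, the first 3 bytes (0xEF, 0xBB, 0xBF) are actually the UTF-8 BOM. When those bytes are decoded back into a string, they become the single character U+FEFF we saw at [0]. So the issue is not our reader, it’s our writer.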
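To see this without a debugger, here is a small sketch that mimics the write-then-decode pattern of the helper with a plain StreamWriter. The class and method names (BomDemo, WriteAndDecode) are made up for this illustration and are not part of the original code.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class, not part of the original helper.
public static class BomDemo
{
    // Writes a tiny XML fragment to a MemoryStream with the given encoding,
    // then decodes the raw bytes back into a string with that same encoding.
    public static string WriteAndDecode(Encoding encoding)
    {
        using (var memoryStream = new MemoryStream())
        {
            using (var writer = new StreamWriter(memoryStream, encoding))
            {
                // The writer emits the encoding's preamble (if any) before the text.
                writer.Write("<root />");
            }

            // ToArray still works after the writer has closed the stream.
            byte[] bytes = memoryStream.ToArray();

            // GetString does not strip a BOM; it decodes it to U+FEFF.
            return encoding.GetString(bytes);
        }
    }
}
```

Calling BomDemo.WriteAndDecode(Encoding.UTF8) should give a string whose first character is '\uFEFF', while BomDemo.WriteAndDecode(new UTF8Encoding(false)) should start directly with '<'.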

Let’s get back to the question: why is this an issue? Why can’t our reader just interpret this as a UTF-8 string and parse it? Why would we have to manually call TrimStart() on it?

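It seems that the StringReader used in our code just passes the string through to our deserializer as-is, without caring whether it starts with a BOM. A BOM only means something while bytes are being decoded; here we already have decoded characters, so the parser sees U+FEFF as stray content before the root element and gives up at position 1. On the reading side we can’t do anything except strip the character ourselves with TrimStart(). Pass the character explicitly, as in TrimStart('\uFEFF'): on .NET Framework 4 and later the parameterless TrimStart() no longer treats U+FEFF as white space.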
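If you can only touch the reading side, the workaround could look roughly like this. It is a sketch under the assumption that the stored string may or may not start with U+FEFF; XmlDeserializerHelper and DeserializeXml are names invented for this example.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical reader-side helper, not part of the original code.
public static class XmlDeserializerHelper
{
    private const char ByteOrderMark = '\uFEFF';

    public static TObject DeserializeXml<TObject>(string xml)
    {
        // Remove a leading U+FEFF explicitly; strings without one are left unchanged.
        string cleaned = xml.TrimStart(ByteOrderMark);

        var xmlSerializer = new XmlSerializer(typeof(TObject));
        using (var stringReader = new StringReader(cleaned))
        {
            return (TObject)xmlSerializer.Deserialize(stringReader);
        }
    }
}
```

With this in place, XmlDeserializerHelper.DeserializeXml<Entity>(result) would replace the three deserialization lines shown earlier.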

But since we control both sides, let’s do something less nasty. The BOM is optional, and our reader doesn’t play nice with it, so we’re simply not going to save it at all :)

Let’s check the XmlTextWriter constructor: it accepts 2 parameters, a stream and an encoding. So what’s with this encoding? Apparently, if we use Encoding.UTF8, it emits the BOM. We can avoid this by passing new UTF8Encoding(false) instead, which does not emit one.
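Applied to the original helper, that change might look like the sketch below. It keeps the MemoryStream approach and only swaps the encoding; the names XmlSerializerHelperNoBom and SerializeXmlWithoutBom are invented here so the sketch does not clash with the original class.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical variant of the original helper, renamed for illustration.
public static class XmlSerializerHelperNoBom
{
    public static string SerializeXmlWithoutBom<TObject>(this TObject objectToSerialize)
    {
        // false = do not emit the UTF-8 identifier (the BOM).
        var utf8WithoutBom = new UTF8Encoding(false);

        using (var memoryStream = new MemoryStream())
        {
            var xmlSerializer = new XmlSerializer(typeof(TObject));

            var xmlnsEmpty = new XmlSerializerNamespaces(new[]
            {
                new XmlQualifiedName(string.Empty, string.Empty),
            });

            using (var xmlTextWriter = new XmlTextWriter(memoryStream, utf8WithoutBom))
            {
                xmlSerializer.Serialize(xmlTextWriter, objectToSerialize, xmlnsEmpty);
                xmlTextWriter.Flush();

                // Decode with the same encoding that was used for writing.
                return utf8WithoutBom.GetString(memoryStream.ToArray());
            }
        }
    }
}
```

The resulting string should start with the XML declaration itself, so a StringReader can hand it to the deserializer without any trimming.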

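Another way, which I find cleaner, is to use a StringWriter. It writes characters straight into a string, so no byte encoding, and therefore no BOM, is involved (the XML declaration will then say encoding="utf-16", since .NET strings are UTF-16):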

using System.IO;
using System.Xml;
using System.Xml.Serialization;

public static class XmlSerializerHelper
{
    public static string SerializeXml<TObject>(this TObject objectToSerialize)
    {
        var xmlSerializer = new XmlSerializer(typeof (TObject));

        // An empty prefix/namespace pair suppresses the default xmlns:xsi / xmlns:xsd attributes.
        var xmlnsEmpty = new XmlSerializerNamespaces(new[]
            {
                new XmlQualifiedName(string.Empty, string.Empty),
            });

        // StringWriter builds a string directly, so no bytes (and no BOM) are produced.
        using (var stringWriter = new StringWriter())
        {
            xmlSerializer.Serialize(stringWriter, objectToSerialize, xmlnsEmpty);

            return stringWriter.ToString();
        }
    }
}

This doesn’t involve any encoding being chosen or manipulated on our side, so it relies on a bit more magic, but it’s way more straightforward. Please add your experience to the comments :)
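For completeness, here is a small round-trip sketch of the StringWriter approach. SampleEntity is a hypothetical stand-in for the Entity type (whose definition isn’t shown in this post), and RoundTripDemo is just a name picked for the example.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the Entity type used in the post.
public class SampleEntity
{
    public string Name { get; set; }
}

// Hypothetical demo class showing serialize + deserialize without any byte encoding.
public static class RoundTripDemo
{
    public static string Serialize(SampleEntity entity)
    {
        var xmlSerializer = new XmlSerializer(typeof(SampleEntity));
        using (var stringWriter = new StringWriter())
        {
            xmlSerializer.Serialize(stringWriter, entity);
            return stringWriter.ToString();
        }
    }

    public static SampleEntity Deserialize(string xml)
    {
        var xmlSerializer = new XmlSerializer(typeof(SampleEntity));
        using (var stringReader = new StringReader(xml))
        {
            return (SampleEntity)xmlSerializer.Deserialize(stringReader);
        }
    }
}
```

Serializing a SampleEntity and feeding the string straight back into Deserialize should work without any trimming, because the string starts with '<' and never passed through a byte encoding.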

You can find the full code on GitHub here!

Have a good one,

-Kristof