Monday, May 17, 2010

Encoding strings for the Twitter API

Some time ago I started working on shelltwit, a command-line Twitter updater (it only updates your status). I didn’t want to use an existing library because I wanted to learn how to work directly with the Twitter API.

I looked around for examples. The Twitter API documentation is neither complete nor up to date, so it is hard to start coding right away; you need to read a lot first (I hate when that happens). I found a good sample from Shannon Whitley called Twitter xAuth with .net. I started from that code but hit a problem with international characters such as á, é or ñ, which kept me from posting about #Peñarol. So I went hunting for the bug, searched online, visited the Twitter API user group, and found that many people have trouble with international characters: developers from Brazil, Russia and Japan were complaining about it. Apparently most libraries were written by English-speaking developers, so very few of them ran into the issue.

Now I can happily say that I found the problem, so I thought I would post the solution here.

Strings sent to the API (your Twitter status) must be converted to UTF-8 and then percent-encoded according to RFC 3986. I could not find a native .net function that does that, so after some research I came up with an algorithm that does exactly that. I hope it helps someone else.
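To see why UTF-8 matters here, it helps to look at the bytes behind a single character. The sketch below is only an illustration (the class and method names `Utf8BytesDemo` and `ToHex` are made up for this example): a plain ASCII letter is one byte, while ñ becomes two bytes, and each of those bytes is what ends up after a `%` in the encoded string.

```csharp
using System;
using System.Text;

public static class Utf8BytesDemo
{
    // Returns the UTF-8 bytes of a string as space-separated uppercase hex.
    public static string ToHex(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    public static void Main()
    {
        Console.WriteLine(ToHex("n"));      // 6E
        Console.WriteLine(ToHex("\u00F1")); // C3 B1  (the letter ñ)
    }
}
```

So a correctly encoded ñ is `%C3%B1`, not a single `%F1`.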

// Requires: using System.Text;

// Characters that RFC 3986 (section 2.3) allows to appear unescaped
static string UNRESERVED_CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";

//http://en.wikipedia.org/wiki/Percent-encoding
//http://www.w3schools.com/tags/ref_urlencode.asp see 'Try It Yourself' to see if this function is encoding well
//This should be encoded according to RFC3986 http://tools.ietf.org/html/rfc3986
//I could not find any native .net function to achieve this
/// <summary>
/// Encodes a string according to RFC 3986
/// </summary>
/// <param name="value">string to encode</param>
/// <returns>the percent-encoded string</returns>
public static string EncodeString(string value)
{
    StringBuilder sb = new StringBuilder();
    // Convert the whole string to UTF-8 first. Converting one char at a time
    // breaks surrogate pairs (characters outside the Basic Multilingual Plane).
    foreach (byte b in Encoding.UTF8.GetBytes(value))
    {
        char c = (char)b;
        if (UNRESERVED_CHARS.IndexOf(c) != -1)
            sb.Append(c); // unreserved ASCII character: copy as-is
        else
        {
            // everything else becomes %XX, one escape per UTF-8 byte
            sb.Append('%');
            sb.Append(b.ToString("X2"));
        }
    }
    return sb.ToString();
}
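A quick way to check the result is to run a couple of known strings through the encoder. The sketch below is self-contained, so it carries its own compact stand-in (`EncodeCheck.Encode`, a name invented for this example) that percent-encodes the UTF-8 bytes of the whole string; treat it as an illustration of the expected output rather than the exact code from this post.

```csharp
using System;
using System.Text;

public static class EncodeCheck
{
    const string Unreserved =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";

    // Stand-in encoder for this sketch: percent-encodes every UTF-8 byte
    // that is not an RFC 3986 unreserved character.
    public static string Encode(string value)
    {
        StringBuilder sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(value))
        {
            if (Unreserved.IndexOf((char)b) != -1)
                sb.Append((char)b);
            else
                sb.Append('%').Append(b.ToString("X2"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Encode("#Pe\u00F1arol")); // %23Pe%C3%B1arol
        Console.WriteLine(Encode("hello world!"));  // hello%20world%21
    }
}
```

Note that `#`, the space and `!` are escaped too, because none of them is in the unreserved set.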

1 comment:

Unknown said...

Don't know about .net but in Windows there is the WideCharToMultiByte function. You'll have to mess with p/invoke though, so your solution might be cleaner...

http://msdn.microsoft.com/en-us/library/dd374130(VS.85).aspx