Thursday, January 15, 2009

Writing data to the cloud (SQL Data Services)

In a previous post I talked about setting up the environment for SQL Data Services. Now I’ll show you something I’ve done with it, and hopefully it will help somebody get through it too.

I’ll use my Goomez project because it’s my sandbox and, since it’s available online, you can download it and play with it too. For this implementation I used SQL Data Services (SDS) instead of Lucene.net to store and query the information, which I believe is a good scenario for SDS. That said, the changes you’ll see here were not committed to the svn repository.

But let’s get down to business. SDS has two interfaces. One is through web services: you import the WSDL like you would for any other web service and, if you do it with Visual Studio, it creates all the necessary proxy classes. The other is REST, and since I hadn’t tried anything with it before I thought it’d be a good idea to give that a try too. (I must say it’s not a recommended practice: if you screw up, it’s harder to know where.)

One thing I found missing is a good old REST API for .NET. I don’t know if it’s planned to stay like this, but it’s pretty crappy: you have to create an HttpWebRequest, an HttpWebResponse and so forth… not something I enjoy doing. It was fun though (just this once).

So, here’s the code of the function that actually saves the info of one of the indexed files (FileInfo) to the cloud.

First of all, create the request and response objects and set a few properties on the request:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(m_url);
HttpWebResponse response = null;
MemoryStream stm = null;

request.Credentials = new NetworkCredential("<YourSolution>", "<Password>");
request.Method = "POST";
request.ContentType = "application/x-ssds+xml";


In my example, m_url has the value https://goomez.data.database.windows.net/v1/goomezindex because goomez is the authority I created and goomezindex is the container. So now, the entities.



StringBuilder builder = new StringBuilder("<File xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:x=\"http://www.w3.org/2001/XMLSchema\" xmlns:s=\"http://schemas.microsoft.com/sitka/2008/03/\">");
builder.AppendFormat("<s:Id>{0}</s:Id>", Guid.NewGuid());
builder.AppendFormat("<file xsi:type=\"x:string\">{0}</file>", file.Name);
builder.AppendFormat("<folder xsi:type=\"x:string\">{0}</folder>", file.Directory.FullName);
builder.AppendFormat("<extension xsi:type=\"x:string\">{0}</extension>", file.Extension.Replace(".", string.Empty));
builder.AppendFormat("<size xsi:type=\"x:decimal\">{0}</size>", file.Length.ToString());
builder.AppendFormat("<content xsi:type=\"x:string\">{0}</content>", GoomezSearchHelper.Tokenizer.TokenizeToIndex(file.FullName));
builder.Append("</File>");


Here I’m writing the XML that represents an entity, in my case called File. The file variable you see here is the FileInfo I passed as a parameter to this function.
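One thing to watch with string-built XML is that a file or folder name containing & or < would make the later XElement.Parse call throw. A minimal sketch of an alternative, assuming the same element names and namespaces as above, builds the entity directly with LINQ to XML so the escaping is handled for you. EntityBuilder, Property and BuildFileEntity are hypothetical names used only for illustration, and the tokenized content is passed in as a plain string instead of calling the Goomez tokenizer:

```csharp
using System;
using System.IO;
using System.Xml.Linq;

public static class EntityBuilder
{
    // Namespaces copied from the hand-written XML above.
    static readonly XNamespace S = "http://schemas.microsoft.com/sitka/2008/03/";
    static readonly XNamespace Xsi = "http://www.w3.org/2001/XMLSchema-instance";
    static readonly XNamespace X = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical helper: one property element carrying an xsi:type attribute.
    static XElement Property(string name, string xsdType, object value)
    {
        return new XElement(name, new XAttribute(Xsi + "type", "x:" + xsdType), value);
    }

    // Builds the same File entity as the StringBuilder version;
    // XElement escapes special characters in the values when it serializes.
    public static XElement BuildFileEntity(FileInfo file, string tokenizedContent)
    {
        return new XElement("File",
            new XAttribute(XNamespace.Xmlns + "xsi", Xsi.NamespaceName),
            new XAttribute(XNamespace.Xmlns + "x", X.NamespaceName),
            new XAttribute(XNamespace.Xmlns + "s", S.NamespaceName),
            new XElement(S + "Id", Guid.NewGuid()),
            Property("file", "string", file.Name),
            Property("folder", "string", file.Directory.FullName),
            Property("extension", "string", file.Extension.Replace(".", string.Empty)),
            Property("size", "decimal", file.Length),
            Property("content", "string", tokenizedContent));
    }
}
```

With this approach the resulting XElement can be saved to the request stream as is, so the round trip through StringBuilder and XElement.Parse is no longer needed.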



Now the ‘magic’:



XElement entity = XElement.Parse(builder.ToString(), LoadOptions.SetLineInfo);

stm = new MemoryStream();
entity.Save(stm);
request.ContentLength = stm.Length;
using (Stream stream2 = request.GetRequestStream())
{
    stream2.Write(stm.GetBuffer(), 0, (int)stm.Length);
}

response = (HttpWebResponse)request.GetResponse();
if (response.StatusCode != HttpStatusCode.Created)
{
    throw new WebException(string.Format(CultureInfo.InvariantCulture, "Unexpected status code returned: {0}", new object[] { response.StatusCode }));
}


Create the entity element, save it to a MemoryStream, write those bytes to the request stream and get the response. Anything other than 201 Created is treated as an error.
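The snippets above never close the response or the MemoryStream. A possible way to package the whole step, sketched under the assumption that the URL, credentials and content type are exactly the ones shown earlier, is a small helper that wraps every disposable in a using block. SdsClient, ToBytes and PostEntity are hypothetical names, not part of any SDS library:

```csharp
using System;
using System.IO;
using System.Net;
using System.Xml.Linq;

public static class SdsClient
{
    // Serializes the entity to the bytes that will be sent in the request body.
    public static byte[] ToBytes(XElement entity)
    {
        using (var stm = new MemoryStream())
        {
            entity.Save(stm);
            return stm.ToArray(); // copies only the bytes actually written
        }
    }

    // POSTs the entity to the container URL and expects 201 Created.
    public static void PostEntity(string containerUrl, NetworkCredential credentials, XElement entity)
    {
        byte[] payload = ToBytes(entity);

        var request = (HttpWebRequest)WebRequest.Create(containerUrl);
        request.Credentials = credentials;
        request.Method = "POST";
        request.ContentType = "application/x-ssds+xml";
        request.ContentLength = payload.Length;

        using (Stream body = request.GetRequestStream())
        {
            body.Write(payload, 0, payload.Length);
        }

        // Disposing the response releases the underlying connection.
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            if (response.StatusCode != HttpStatusCode.Created)
            {
                throw new WebException("Unexpected status code returned: " + response.StatusCode);
            }
        }
    }
}
```

Keep in mind that GetResponse already throws a WebException for most error statuses, so the explicit check mainly guards against a success code other than 201.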



So, what was that? What did it do? If you go to the SDS Explorer and query the index, you’ll see the file you just ‘uploaded’.



[Screenshot: SDS Explorer]


Friday, January 09, 2009

My first Parallel Computing programming experience

For reasons I haven’t researched yet, processors aren’t evolving as fast as they did in the past. Processor speed no longer jumps with every new generation like it used to, so manufacturers are adding more processors to computers in order to gain a ‘little’ extra speed.

But what happens with your programs? Will they take advantage of more than one processor in your machine? My guess is “no, they won’t”, and if you’re working with the .NET Framework as I am, the answer is definitely NO, at least not automatically. So if you want your programs to run smoother and take advantage of your computer’s processing power, you’ll have to either learn a new parallel computing language or start playing around with the Parallel Computing extension for the .NET Framework (like I did).

First let me tell you that if you’re used to working with threads (System.Threading) you’ll have no problem understanding the new parallel paradigm. When working with threads you would create one to, say, call a web service without freezing the UI while it runs, right? Well, it’s the same thing here: instead of creating a new Thread you create a new Task.
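To make the comparison concrete, here is a minimal sketch of the same piece of work done once with a Thread and once with a Task. It assumes the .NET Framework 4.0 API (Task.Factory.StartNew); the June 2008 CTP exposes a slightly different surface. SlowCall is a made-up stand-in for something like a web service call:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadVersusTask
{
    // Hypothetical stand-in for a slow operation such as a web service call.
    static int SlowCall()
    {
        Thread.Sleep(100);
        return 42;
    }

    // Classic approach: start a thread, wait for it, read the captured result.
    public static int WithThread()
    {
        int result = 0;
        var worker = new Thread(() => result = SlowCall());
        worker.Start();
        worker.Join();
        return result;
    }

    // Task approach: the task carries its own result.
    public static int WithTask()
    {
        Task<int> task = Task.Factory.StartNew(() => SlowCall());
        return task.Result; // blocks until the task has finished
    }
}
```

Both methods block the caller here only to keep the sketch short; in a UI you would start the work and pick up the result later instead of waiting on it.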

But this .NET extension has another cool feature called PLINQ (Google it, I couldn’t find an official up-to-date resource) which makes things much easier for you. The example I’ll show you, something I did in my Goomez indexer, uses Parallel.ForEach, which comes with the same extension.
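To give an idea of what PLINQ itself looks like, here is a minimal sketch assuming the .NET Framework 4.0 version of the API, where a regular LINQ query becomes parallel by adding AsParallel(). PlinqSketch and VisibleShares are hypothetical names, and the filter mirrors the hidden-share check ("$" suffix) used in the indexer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlinqSketch
{
    // Returns the shares that are not hidden (hidden shares end with "$").
    public static List<string> VisibleShares(IEnumerable<string> shares)
    {
        return shares
            .AsParallel()                                    // switch the query to PLINQ
            .Where(share => !share.EndsWith("$"))            // filter runs in parallel
            .OrderBy(share => share, StringComparer.Ordinal) // sort so the output order is stable
            .ToList();
    }
}
```

For a handful of strings the parallel version will not be faster than plain LINQ; the point is only how little the query has to change.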

So I had a function called IndexFiles that looked like this:

private static void IndexFiles()
{
    try
    {
        List<string> servers = GetConfigList(K_SERVERS);

        foreach (string server in servers)
        {
            foreach (string folder in GetShares(server))
            {
                if (folder.EndsWith("$"))
                    continue;

                string folderFullPath = @"\\" + server + @"\" + folder;

                try
                {
                    IndexFolder(folderFullPath);
                    // ... (the rest of the method is not shown in this excerpt)



I wanted to change the outer foreach (parallelizing the outermost loop is, BTW, one of the team’s recommendations), so I switched to the following:



private static void ParallelIndexFiles()
{
    try
    {
        Parallel.ForEach<string>(GetConfigList(K_SERVERS), server =>
        {
            foreach (string folder in GetShares(server))
            {
                if (folder.EndsWith("$"))
                    continue;

                string folderFullPath = @"\\" + server + @"\" + folder;

                try
                {
                    ParallelIndexFolder(folderFullPath);
                    // ... (the rest of the method is not shown in this excerpt)



See how my foreach is different now? With that change, the code inside the lambda expression runs once for each server in my list, and each run is handed to the first available processor, so several servers can be indexed at the same time… pretty cool, huh?!
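Since the excerpt above is cut short, here is a small self-contained sketch of the same pattern, assuming the .NET Framework 4.0 API. FakeShares is a made-up replacement for GetShares, and instead of indexing anything the loop just collects the folder paths. Because the lambda runs on several threads at once, the results go into a thread-safe ConcurrentBag rather than a plain List:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelForEachSketch
{
    // Hypothetical stand-in for GetShares(server).
    static IEnumerable<string> FakeShares(string server)
    {
        return new[] { "docs", "ADMIN$", "media" };
    }

    public static List<string> CollectFolders(IEnumerable<string> servers)
    {
        var folders = new ConcurrentBag<string>(); // safe to Add from many threads

        // Outer loop in parallel: one lambda invocation per server.
        Parallel.ForEach(servers, server =>
        {
            // Inner loop stays sequential, as in the original code.
            foreach (string share in FakeShares(server))
            {
                if (share.EndsWith("$"))
                    continue; // skip hidden shares

                folders.Add(@"\\" + server + @"\" + share);
            }
        });

        // Parallel.ForEach returns only after every server has been processed.
        return folders.OrderBy(f => f, StringComparer.Ordinal).ToList();
    }
}
```

The order in which servers are processed is not guaranteed, which is why the sketch sorts the result before returning it.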



To try this stuff you can download either the CTP of Visual Studio 2010 and the .NET Framework 4.0, or the Parallel Extensions to the .NET Framework 3.5 June 2008 CTP.



Quick note: if you don’t have two (or more) processors you will end up slowing your program down a little, because parallelizing adds a bit of overhead, which they say will be eliminated in future versions.
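One way to sidestep that overhead, sketched here as an assumption rather than something the Parallel Extensions team prescribes, is to check how many processors the machine has and keep the plain foreach when there is only one. AdaptiveLoop and ForEachServer are hypothetical names; the processor count is a parameter so the behavior is easy to try out, and in real code you would pass Environment.ProcessorCount:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AdaptiveLoop
{
    // Runs indexServer for every server, in parallel only when it can pay off.
    public static void ForEachServer(IEnumerable<string> servers, Action<string> indexServer, int processorCount)
    {
        if (processorCount < 2)
        {
            // Single processor: a plain loop avoids the scheduling overhead.
            foreach (string server in servers)
                indexServer(server);
        }
        else
        {
            Parallel.ForEach(servers, indexServer);
        }
    }
}
```

A call would look like AdaptiveLoop.ForEachServer(servers, IndexServer, Environment.ProcessorCount), where IndexServer is whatever method does the work for one server.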
