Blog

Tuesday, 8 April 2014

Hadoop for Windows 7 64bit

I recently got a new job. Yay for me! Part of this new job involves working with Hadoop and MapReduce. I had previously read the tutorials and understood how it works, but never got into the actual code. As a historically Microsoft developer, stepping into Unix and Java was a scary thought and seemed like a challenge. Despite my fear of Java, I could see the benefits of starting to step away from Microsoft and into the new open source world of Big Data. However, baby steps are required, and stepping away from Windows was one step too far for me right now: I have so much other development to do and mostly use Visual Studio.

I started by following some tutorials for setting up Hadoop on Windows. There is a lot of documentation out there, but sadly it is not up to the standard you might expect, mainly because running Hadoop on Windows is not so common. I struggled with this for a couple of days and felt that it was my duty to try to help anyone in my position. So, please find below a list of tips and tricks to get you up and running. I doubt it is foolproof, but I'm sure it will save you some time.

SSH service setup

I started by following this tutorial (https://gist.github.com/tariqmislam/2159173), but found what seemed to be a better SSHD guide (http://evalumation.com/blog/86-cygwin-windows7-sshd) and then a potentially even better one (http://venuktan.wordpress.com/2012/11/14/setting-up-hadoop-0-20-2-single-node-cluster-on-ubuntu/).

It is worth pointing out that I followed a combination of these to get to my goal.

Cygwin

First things first: you need an introduction to Cygwin. The site mentions what it is and is not, but the best explanation I have seen is on Wikipedia: "Cygwin provides native integration of Windows-based applications, data, and other system resources with applications, software tools, and data of the Unix-like environment".

Problems with installing Cygwin

The install is fairly easy, but I did come across some problems, one of which is picking the right mirror. You pick the mirror in the installer itself. It is almost impossible to know which is right for you, but I found some of them resulted in a bad install, so you have to go back and pick another. It is a simple resolution but can be annoying to figure out.

Problems with Setting up SSH

1) Running the SSH install (Privilege separation issue)
No matter how many guides I followed, I always came up against an issue with privilege separation. If you cannot get your service to start then you likely have the same issue; you can test this by running ssh.exe in the Cygwin tool to see the full error. I found two solutions to this: one felt like a hack, the second changed my life (maybe a bit dramatic, but it helped).

Add config: Add "sshd:x:71:65:SSH daemon:/var/lib/sshd:/bin/false" to the "etc/passwd" file in the Cygwin directory. You need to change the highlighted text to a real user id; to obtain these numbers, just type "id" in the Cygwin prompt. See here for the details that led me to this.
Install Param: Rather than following all the steps to install, I found you can just run "ssh-host-config -y"; it was by far the easier option.

2) Uninstall and start again
A tip: if it is going wrong, uninstall and start again. If you already have the service installed, use the command line below to remove it.

C:\>sc delete [service name]

3) IPV6 Error
When accessing the SSH service I kept getting an error, as seen here. I found that if you run the command below after updating the ssh/config as described in the link, the error is removed.

$ ssh localhost -oAddressFamily=inet

However, I never really solved this: the issue returned later when I ran Hadoop start-all. Despite this, it didn't really cause any major issues.

Hopefully you now have a service up and running?

Hadoop setup

The tutorials I followed all work against an old version of Hadoop, which you can find here. If the link is broken then you need to find version 0.20.2 to run this. I found that newer versions have little or no support for Windows; if you find details of a newer version working on Windows, let me know.

1) Directory
For me it was not clear where to place Hadoop. I tried several places, but in the end you need to put it in the "Home/{UserName}" folder. Before starting the install you need to run "cd hadoop"; this wasn't obvious to me, and I continued to struggle with being in the incorrect directory.

2) Bug Alert
Perhaps this is fixed in newer versions, but at the point of formatting you will be asked to confirm; make sure you use an uppercase "Y". See here for details of the bug.

3) Settings
You will need to configure some settings outside of Cygwin, one of which is JAVA_HOME. If you are not familiar with environment variables then look up how to set them; see here for details.

4) Bug Alert 2
You may see "bin/hadoop: line 320 : C:\Program: Command not found" in the console. The root cause of this is the space between "Program" and "Files" in the path. You can easily resolve this with the fix detailed in the link below: http://stackoverflow.com/questions/12378199/hadoop-configuration-on-windows-through-Cygwin

5) Deprecated commands
It appears that Hadoop or Cygwin has moved on since the tutorial I followed was written. As a result, you will need to amend your core-site.xml to include "hdfs://localhost:9100".

Looking at the HDFS

Technically, if you have followed the tutorials and worked through the issues above, you should have Hadoop up and running. You can validate this by looking at the following URLs.

You can also make sure that the Hadoop HDFS system is in place by following the details in this link (http://stackoverflow.com/questions/8209616/how-to-read-a-file-from-hdfs-through-browser)

You can also run the MapReduce examples, which are in "{drive}:\cygwin64\home\{username}\hadoop\src\examples\org\apache\hadoop\examples". You run these with the following command in Cygwin, or similar.

$ bin/hadoop jar hadoop/examples.jar wordcount /user/t1/tharris/input output2

Update

Although this might seem like a great way for a Windows developer to get started, the reality is you can achieve all of this in minutes if you use AWS or Azure. Basically, save your time on setup and invest it in features.

Wednesday, 30 October 2013

SignalR & MVC5

SignalR is great!

OK, you want a little more than that. Well, I recently set out on a project to learn SignalR and get used to the new VS2013, which I also really like. My first port of call was to upgrade my site to MVC5, which has some great new touches like the standard use of Bootstrap in the template. The big surprise was how easy it was to create a SignalR application; if you think it's more complicated, try this link.

http://www.asp.net/signalr/overview/signalr-20/getting-started-with-signalr-20/tutorial-getting-started-with-signalr-20

Canvas Chat

It takes about 20 minutes to run through this tutorial, and you have a fully working chat at the end. In the end I wanted more, so I decided to make a canvas chat and try to use some of the MCP skills I had learned. After some playing around I got it working.

Remote Console

This then led me to a problem with trying to get touch to work on my Windows Phone 7.8; still no luck, so message me if you have the key. Basically, I cannot find the event for mouse move.

From that the SignalR console was born. It is still in its early stages, and I am fully aware of tools such as JSConsole, but this was so easy to set up that I think it has some potential.

Anyway check it out here: http://www.tomharris.net/MiniProjects/WhiteBoard

My server's a little slow, so apologies for the performance.

Sunday, 27 October 2013

JavaScript testing

JavaScript testing has got to be one of the most overlooked areas of testing since practices like TDD and unit testing began. JavaScript is often seen as code that just adds events and loads data, but these days, with HTML5 bringing new functionality, I have seen more and more people use client-side coding. People are seeing the benefit of client-side processing, and with the introduction of JavaScript workers in new browsers this is only going to continue. This code clearly has a place, and therefore we need to start thinking about how we test it.

I have been working on web applications for ten years, and in the past three years I have really bought into testing, be it TDD or just unit tests written afterwards. I believe it's the staple of any long-lasting project. The question I had to ask myself about a year ago was how I can ensure that my client-side code is as stable as my server-side code. Firstly, it was a case of replacing all the old legacy code with standard libraries like jQuery, which left the community to support those areas. It is also worth pointing out that jQuery is tested.

That alone wasn't enough to get me coverage. I work on a system with complex calculations that process nutrition and recipe data on the client. This data is critical to the business process, so it has to be correct. We needed to make sure these and similar calculations were covered to the same level as our server-side code. So I took on the task of testing our client-side code.

Now, the idea of testing JavaScript isn't hard, and many solutions are out there to help you on this path; some examples are listed below. Actually applying it to real JavaScript code is the hard part. Below are some of the issues and solutions, and my conclusion so far on how you can start testing.

QUnit : http://qunitjs.com/
Jasmine: http://pivotal.github.io/jasmine/
JSUnit: http://jsunit.berlios.de/

DOM Interaction

We have all inherited someone's code, or even written sloppy code ourselves when we were juniors. One of the key mistakes when starting out is the crossing over of JavaScript logic and the DOM. This is perfectly fine when you are working with a DOM-manipulating library, but a solution that does calculations and reads each value from a DOM input every time is a big "no no". When it comes to JavaScript testing, and really any testing, this has to stop. What we should be doing is passing a model or parameters to our method and getting the result back. You can then bind that result to the DOM input. This way, when you run your test, it can run without the DOM in place.
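
As a minimal sketch of that separation (the recipe model, the function names and the "total" element id are all made up for illustration, not taken from our real system), the calculation takes a plain object and only a thin binding function touches the DOM:

```javascript
// Hypothetical calculation: works on a plain model and never reads the DOM.
function totalCalories(recipe) {
  return recipe.ingredients.reduce(function (sum, item) {
    return sum + (item.caloriesPer100g * item.grams) / 100;
  }, 0);
}

// Thin binding layer: the only place that touches the document.
// "total" is an assumed element id, used purely for illustration.
function bindTotal(doc, recipe) {
  doc.getElementById("total").value = totalCalories(recipe);
}

// A test can call the calculation directly, with no page loaded.
var recipe = {
  ingredients: [
    { caloriesPer100g: 50, grams: 200 },
    { caloriesPer100g: 400, grams: 25 }
  ]
};
var total = totalCalories(recipe);
```

Because bindTotal receives the document as a parameter, even the binding step can be exercised with a small stand-in object instead of a real page.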

Server Calls

Next on my list of changes were the methods which hit the server. In the test environment I don't have the data set up to ensure that my tests pass, and if I do have the data I cannot be sure it's the same each time. So what I need is to be able to simulate the server being hit: I need to mock the server. This is common practice in .NET development but maybe not obvious in JavaScript. The article below explains in detail the different libraries that are available:

http://testdrivenwebsites.com/2010/05/06/java-script-mock-frameworks-comparison/
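
The idea of simulating the server can also be sketched by hand, without any mocking library. Everything here is hypothetical (the loader, the "/recipes/" URL and the fake server object); the only assumption is that the code under test receives its server access as a parameter rather than calling the network directly:

```javascript
// Hypothetical loader: server access is passed in, so a test can
// swap in a fake instead of making a real request.
function loadRecipeName(server, id, callback) {
  server.get("/recipes/" + id, function (data) {
    callback(data.name.toUpperCase());
  });
}

// Hand-rolled fake server: records each call and returns canned data.
function createFakeServer(cannedResponses) {
  var calls = [];
  return {
    calls: calls,
    get: function (url, onSuccess) {
      calls.push(url);
      onSuccess(cannedResponses[url]);
    }
  };
}

// The test controls exactly what the "server" returns.
var fakeServer = createFakeServer({ "/recipes/7": { name: "soup" } });
var result = null;
loadRecipeName(fakeServer, 7, function (name) {
  result = name;
});
// result is now "SOUP", and fakeServer.calls holds the single URL requested
```

The mock frameworks compared in the article do the same job with less hand-written code, but the principle is the same: the test owns the data, so it is identical on every run.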

Integrated builds

With any testing framework, what we want is for it to be automated. Ideally this should happen on each check-in. The blog post below explains some of the more popular frameworks and how to get them running in an automated fashion.

http://blog.danmerino.com/continuos-integration-ci-for-javascript-jasmine-and-teamcity/

Conclusion So Far
Again, to be clear, these are my feelings so far, six months in. I'm sure that as I move forward, and as the concept of JavaScript testing does too, the decisions and problems will change. I will try to keep this post updated with the latest.

For me, JavaScript testing, although it is firmly out there, is still not at a maturity that allows a clear decision. Many libraries are available and it's a constantly changing environment. It also depends on the set-up you have and your requirements.

If you are learning about testing and how to structure your code, I would start with QUnit; it's simple to set up and get going. You will, however, outgrow QUnit fairly quickly if you are working with multiple developers.

If you want the full feature set then go for Jasmine; today it's more of a framework for testing than QUnit. It also appears to be a favourite with many developers, and some large libraries like AngularJS are being tested with it today.

Tuesday, 22 October 2013

Making that decision about AngularJS or Knockout

This isn't the first post comparing these two libraries, so I'm going to take a different angle. I want to compare how viable each of these is to apply to a new or existing project.

Why this post title is wrong

Firstly, it's not as clear cut as comparing one to the other. These two libraries are very different, and Knockout covers just a small part of what AngularJS does. If you want to compare like for like, compare AngularJS to Durandal, which is a more complete solution to compete with AngularJS.

http://www.johnpapa.net/compare-durandal-to-angular-not-knockout-to-angular/

The history

I was working on one of our legacy projects which only supports IE, god help me. The time had come for us to upgrade this solution; after all, people use other browsers now. The main root of the problem was that our old legacy system used XML Data Islands. Most of you will not have used these; I had never used them until working on this solution. The concept of Data Islands was actually really good. First you define your object in XML, and then you bind the XML to the HTML using inline attributes on the HTML. Sound familiar?

If you are also in the position where you need to migrate away from Data Islands, I suggest you also look at XML2JSON. This library allows you to move to something like Knockout without the need to rewrite your backend. It's only an interim solution, but a great way to move forward when the budget doesn't allow for more.

Comparing Data Binding

I'm only in my first six months of using these libraries, so this is an early comparison. We started out using Knockout; being a Microsoft development team, this came naturally. The problem came when I heard that Google had something similar. I had to know what they were doing; most of us can accept that Google leads web development.

I started to compare these two solutions, for binding only. In all honesty, when it comes to binding it's hard to find a failure in either of these libraries. However, I did find a few things easier in one than the other, and this is likely just down to our project.

Structuring your code

Structure is often one of the hardest things to decide on a new project; "how will I structure my files?" is a common question. With AngularJS this is somewhat removed from your decision making. AngularJS suggests, or forces, good practice for the structure of objects and files. This in itself is a great thing. The problem comes when you have a 10-year-old solution. Changing files here just isn't possible; the regression testing alone would blow the budget. If working on a greenfield project, I can see the benefit of this as a starting point.

Syntax

Knockout syntax is great and simple; if you are used to XML data binding you can pretty much find a like-for-like syntax, although templating is a different thing altogether. If, on the other hand, you have a greenfield project or have never used binding before, then why learn Knockout when you can learn AngularJS? Both of these libraries have great and ever-improving documentation, so you don't need to worry about the learning side.

AngularJS also has some neat syntax that can help make life easier. Take for example the filter and sorting options when templating. These really can add great functionality to your code for little effort. I'm yet to see an ability to sort in Knockout, although I have seen attempts at this on various boards.
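
To give a feel for what filtering and sorting before templating involves, here is a rough sketch in plain JavaScript. It uses neither library's API; the helper name and the sample data are invented for illustration, and in a real page the returned array would be handed to whichever binding library you use:

```javascript
// Hypothetical helper: returns a filtered, sorted copy of the list.
// filter() creates a new array, so the input is left untouched.
function filterAndSort(items, searchText, key) {
  var needle = searchText.toLowerCase();
  return items
    .filter(function (item) {
      return item.name.toLowerCase().indexOf(needle) !== -1;
    })
    .sort(function (a, b) {
      if (a[key] < b[key]) { return -1; }
      if (a[key] > b[key]) { return 1; }
      return 0;
    });
}

var recipes = [
  { name: "Tomato soup", calories: 150 },
  { name: "Pea soup", calories: 120 },
  { name: "Apple pie", calories: 300 }
];

// Keep only the soups, ordered by calories ascending.
var soups = filterAndSort(recipes, "soup", "calories");
```

Keeping this logic in a plain function also means it can be unit tested without rendering a template.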

Conclusion

Knockout is a great tool but go for Angular unless you are migrating from some older form of data binding. For my project I have gone with Knockout but on our new projects we are using AngularJS. It’s a difficult choice.

I will keep you updated on how the project continues and our final result. As the project continues we will face more unforeseen challenges.

References:

http://angularjs.org/

http://knockoutjs.com/

http://blog.nebithi.com/knockoutjs-vs-angularjs/

Some simple angular examples to get you going:

http://tomharris.net/test/angularJS