Getting Started With Eclipse

Eclipse is a powerful development environment for Java-based programming. It is a free, open-source IDE that supports multiple languages through a plugin interface, with special attention paid to Java. Tools designed for working with Hadoop can be integrated into Eclipse, making it an attractive platform for Hadoop development. In this section we will review how to obtain, configure, and use Eclipse.
Downloading and Installing

Note: The most current release of Eclipse is called Ganymede. Our testing shows that Ganymede is currently incompatible with the Hadoop MapReduce plugin. The most recent version that works properly with the Hadoop plugin is version 3.3.1, "Europa." To download Europa, do not visit the main Eclipse website; instead, retrieve it from the archive site http://archive.eclipse.org/eclipse/downloads/ as the "Archived Release (3.3.1)."

The Eclipse website has several versions available for download; choose either "Eclipse Classic" or "Eclipse IDE for Java Developers."

Because it is written in Java, Eclipse is highly cross-platform and is available for Windows, Linux, and Mac OS X.

Installing Eclipse is very straightforward. Eclipse is packaged as a .zip file. Windows itself can natively unzip the compressed file into a directory. If you encounter errors using the Windows decompression tool (see [1]), try using a third-party unzip utility such as 7-zip or WinRAR.

After you have decompressed Eclipse into a directory, you can run it straight from that directory with no modifications or other "installation" procedure. You may want to move it into C:\Program Files\Eclipse to be consistent with your other applications, but it can reside on the Desktop or elsewhere as well.
Installing the Hadoop MapReduce Plugin

Hadoop comes with a plugin for Eclipse that makes developing MapReduce programs easier. In the hadoop-0.18.0/contrib/eclipse-plugin directory on this CD, you will find a file named hadoop-0.18.0-eclipse-plugin.jar. Copy this into the plugins/ subdirectory of wherever you unzipped Eclipse.
Making a Copy of Hadoop

While we will be running MapReduce programs on the virtual machine, we will be compiling them on the host machine. The host therefore needs a copy of the Hadoop jars to compile your code against. Copy the /hadoop-0.18.0 directory from the CD into a location on your local drive, and remember where this is. You do not need to configure this copy of Hadoop in any way.
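
As a quick sanity check that the copied jars are usable, a small class along the following lines should compile once the core Hadoop jar from that directory (hadoop-0.18.0-core.jar) is on your Eclipse project's build path. This is only an illustrative sketch; the class name is made up and it is not part of the tutorial's code.

// Sketch only: if this compiles, the Hadoop classes resolve against the local copy of the jars.
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CompileCheckMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, LongWritable, Text> {

    // Emits each input record unchanged; the point is only to exercise the Hadoop API on the build path.
    public void map(LongWritable key, Text value,
                    OutputCollector<LongWritable, Text> output, Reporter reporter)
            throws IOException {
        output.collect(key, value);
    }
}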

Running Eclipse

Navigate into the Eclipse directory and run eclipse.exe to start the IDE. Eclipse stores all of your source projects and their related settings in a directory called a workspace.

Upon starting Eclipse, it will prompt you for a directory to act as the workspace. Choose a directory name that makes sense to you and click OK.
[Figure: the Eclipse workspace selection dialog]

Configuring the MapReduce Plugin

In this section, we will walk through the process of configuring Eclipse to switch to the MapReduce perspective and connect to the Hadoop virtual machine.

Step 1: If you have not already done so, start Eclipse and choose a workspace directory. If you are presented with a "welcome" screen, click the button that says "Go to the Workbench." The Workbench is the main view of Eclipse, where you can write source code, launch programs, and manage your projects.

Step 2: Start the virtual machine. Double-click on the image.vmx file in the virtual machine's installation directory to launch the virtual machine. It should begin the Linux boot process.

Step 3: Switch to the MapReduce perspective. In the upper-right corner of the workbench, click the "Open Perspective" button, as shown in Figure 3.4:

Select "Other," then choose "Map/Reduce" in the window that opens. At first, nothing may appear to change. In the menu, choose Window > Show View > Other. Under "MapReduce Tools," select "Map/Reduce Locations." This should make a new panel visible at the bottom of the screen, next to Problems and Tasks.

Step 4: Add the Server. In the Map/Reduce Locations panel, click on the elephant logo in the upper-right corner to add a new server to Eclipse.

You will now be asked to fill in a number of parameters identifying the server. To connect to the VMware image, the values are:

Location name: (Any descriptive name you want; e.g., "VMware server")
Map/Reduce Master Host: (The IP address printed at startup)
Map/Reduce Master Port: 9001
DFS Master Port: 9000
User name: hadoop-user

Next, click on the "Advanced" tab. There are two settings here which must be changed.

Scroll down to hadoop.job.ugi. It contains your current Windows login credentials. Highlight the first comma-separated value in this list (your username) and replace it with hadoop-user.

Next, scroll further down to mapred.system.dir. Erase the current value and set it to /hadoop/mapred/system.
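
For reference, these dialog fields correspond to ordinary Hadoop configuration properties. The sketch below shows how the same values might be set programmatically; the class and method names are invented for illustration, and the group portion of the hadoop.job.ugi value is an assumption, since the tutorial only specifies the first comma-separated entry.

// Sketch only: maps the plugin's connection fields onto Hadoop configuration properties.
import org.apache.hadoop.conf.Configuration;

public class VmConnectionSettings {

    // vmIpAddress is the address the virtual machine prints at startup.
    public static Configuration forVmwareImage(String vmIpAddress) {
        Configuration conf = new Configuration();
        // DFS Master (port 9000) corresponds to fs.default.name
        conf.set("fs.default.name", "hdfs://" + vmIpAddress + ":9000");
        // Map/Reduce Master (port 9001) corresponds to mapred.job.tracker
        conf.set("mapred.job.tracker", vmIpAddress + ":9001");
        // The user name field is the first comma-separated value of hadoop.job.ugi;
        // the group name after the comma is assumed here
        conf.set("hadoop.job.ugi", "hadoop-user,hadoop-user");
        // The Advanced-tab setting described above
        conf.set("mapred.system.dir", "/hadoop/mapred/system");
        return conf;
    }
}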

When you are done, click "Finish." Your server will now appear in the Map/Reduce Locations panel. If you look in the Project Explorer (upper-left corner of Eclipse), you will see that the MapReduce plugin has added the ability to browse HDFS. Click the [+] buttons to expand the directory tree to see any files already there. If you inserted files into HDFS yourself, they will be visible in this tree.
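
If you want to verify the connection outside of Eclipse's tree view, a small program along these lines should list the HDFS root directory. It is a sketch only: the class name is invented, <ip_address> stands for the address your virtual machine printed at boot, and it assumes the FileSystem API as it exists in Hadoop 0.18.

// Sketch only: lists the HDFS root directory over the same connection the plugin uses.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHdfsRoot {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the VM's HDFS; replace <ip_address> with the address printed at startup
        conf.set("fs.default.name", "hdfs://<ip_address>:9000");
        conf.set("hadoop.job.ugi", "hadoop-user,hadoop-user"); // group after the comma is assumed
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
    }
}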

Comments

  1. Hi,
    I am using the VM image hadoop-appliance-0.18.0.vmx and the Eclipse plug-in
    for Hadoop. I have followed all the steps in this tutorial:
    http://public.yahoo.com/gogate/hadoop-tutorial/html/module3.html. My
    problem is that I am not able to browse the HDFS; it only shows an
    entry "Error:null". "Upload files to DFS" and "Create new directory" also fail. Any
    suggestions? I have tried to change all the directories in the hadoop
    location advanced parameters to "/tmp/hadoop-user", but it did not
    work.
    Also, the tutorial mentions a parameter "hadoop.job.ugi" that needs
    to be changed, but I could not find it in the list of parameters.

    Replies
    1. I have the same problem... let me know if you found a solution. Thanks.

    2. I had the same problem. I restarted the VM and Eclipse, then clicked on the Advanced tab and found hadoop.job.ugi; it was set to the host user, and you have to change it to hadoop-user.
      P.S. Use Eclipse Europa; I tried with Juno but couldn't connect.

    3. Did you find any solution? I have the same problem with Eclipse Europa...

    4. This problem usually arises when you are working in a Windows environment. The cause is that Cygwin is not installed, or not properly installed.
      Download the Cygwin installer and run it.
      That's not all: after Cygwin is installed successfully, you have to add the path of the Cygwin bin directory (e.g., C:\cygwin\bin) to the Path environment variable.

      Now restart the virtual machine and Eclipse, click on DFS Locations in the Project Explorer, and try to explore the files. You will get a connection error message.
      Now look for hadoop.job.ugi; you will find it.

    5. I have done all of that; I even set hadoop.job.ugi. But I get the error "Call failed as local exception". Can anyone suggest a solution to my problem, please?

    6. Unfortunately this problem also arises when using Mac OS X. Kindly let me know if you find a solution.

    7. To fix this, go to "\workspace\.metadata\.plugins\org.apache.hadoop.eclipse\locations". Open the XML file there and add the property "hadoop.job.ugi" with the value "hadoop-user,ABC", then restart Eclipse. It worked for me.

  2. I too ran into the same issue. I installed Red Hat Cygwin and updated the "Path" environment variable with the path to cygwin/bin (C:\rhcygwin\bin). Then my Eclipse DFS Locations view was able to connect to Hadoop on the virtual machine. Once that was successful, I saw the option "hadoop.job.ugi". The link v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html in the post above describes installing Cygwin.
    Note: I am running the Hadoop VM on Windows Vista.

  3. I had the same problem with Windows 7. The reason was that Cygwin was not installed on my PC. There are some problems with Cygwin on Windows 7: after installing Cygwin, the "cygwin sshd" service will not start. I found a fix (http://www.kgx.net.nz/2010/03/cygwin-sshd-and-windows-7/comment-page-1/) on the net, and now my PC (Windows 7, 32-bit) is ready with the Yahoo Developer Network Hadoop VM (hadoop-0.18.0) + Eclipse (Juno).

  4. I installed sshd using these instructions:
    http://docs.oracle.com/cd/E24628_01/install.121/e22624/preinstall_req_cygwin_ssh.htm

    I started it:
    net start sshd

    Then, per http://hadoop.6.n7.nabble.com/Eclipse-plugin-td8156.html, in the conf/hadoop-site.xml file of the Hadoop download from the tutorial (i.e., hadoop-0.18.0/conf/hadoop-site.xml),
    I added these lines inside the preexisting <configuration> block:

    <property>
      <name>fs.default.name</name>
      <value>hdfs://<ip_address>:9000</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value><ip_address>:9001</value>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>

    I then shut down the instance of Hadoop running in the virtual machine via
    ./stop-hadoop
    then shut down the virtual machine via
    sudo poweroff
    and exited Eclipse.
    I restarted Eclipse, but from the Cygwin command line, per http://hadoop.6.n7.nabble.com/Eclipse-plugin-td8156.html.
    I restarted VMware Player, opened the virtual machine, and started Hadoop via the "./start-hadoop" command from the tutorial.

    Then I deleted and recreated the VMware server in Eclipse. After that, the hadoop.job.ugi entry still wasn't there. HOWEVER, in the Project Explorer I right-clicked on the VMware server and selected Refresh, then Reconnect. Some entries then appeared below it: a folder named "(2)", with subfolders "hadoop (1)" (with its own subfolder "mapred (1)") and "user (1)", and below that "hadoop-user (0)". Now, when I right-click on the VMware server in Map/Reduce Locations and select "Edit hadoop location" and then "Advanced parameters", the hadoop.job.ugi entry shows up.
