12/15/2014

Setting Up a Pig/Eclipse Environment (Maven Pig Project) on a Windows Machine


Setting up a Pig environment on Linux is very easy, but setting up the same on Windows is not straightforward and can eat up a lot of time at first. So I would like to share this on my blog so that someone can make use of it and save their precious time!
The steps below illustrate setting up PigUnit in Eclipse.
1. Download Eclipse 3.7 or above from the Eclipse site.
2. Install m2eclipse to work with Maven.
3. Install the Pig Eclipse plugin: go to https://cwiki.apache.org/confluence/display/PIG/PigTools and take the update site URL, then in Eclipse go to Help -> Install New Software -> paste the URL -> select the Pig Eclipse plugin -> Next.

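4. Once the installation completes, restart Eclipse.
5. Create a Maven project. The project should contain pom.xml, src/main/resources/sample.data, src/main/resources/wordcount.pig, AppTest.java and FixHadoopOnWindows.java. The FixHadoopOnWindows.java file and the pom.xml dependency changes are essential for running the test case.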
6. Add the following lines to src/main/resources/sample.data:

Johny, Johny!
Yes, Papa
Eating sugar?
No, Papa
Telling lies?
No, Papa
Open your mouth!
Ha! Ha! Ha!

7. Add the following lines to src/main/resources/wordcount.pig (the path the test in AppTest.java loads):

A = load 'src/main/resources/sample.data';
B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
C = group B by word;
D = foreach C generate COUNT(B), group;
dump D;
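To see what the script computes, here is a plain-Java sketch of the same word count (not part of the Pig project, just an illustration). It assumes TOKENIZE's default delimiters are space, double quote, comma, parentheses and asterisk, and it hard-codes the sample.data lines so it runs on its own; the class name WordCountSketch is hypothetical.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Hypothetical plain-Java approximation of wordcount.pig, for illustration only. */
public class WordCountSketch {

    // Assumed equivalent of TOKENIZE's default delimiter set: space " , ( ) *
    private static final String DELIMITERS = " \",()*";

    /** Tokenizes every line (like TOKENIZE + flatten) and counts each word (like group + COUNT). */
    public static Map<String, Integer> countWords(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line, DELIMITERS);
            while (tokens.hasMoreTokens()) {
                // merge() inserts 1 for a new word or adds 1 to an existing count
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] sample = {
            "Johny, Johny!", "Yes, Papa", "Eating sugar?", "No, Papa",
            "Telling lies?", "No, Papa", "Open your mouth!", "Ha! Ha! Ha!"
        };
        // Print in the same (count,word) shape that "dump D" produces
        for (Map.Entry<String, Integer> e : countWords(sample).entrySet()) {
            System.out.println("(" + e.getValue() + "," + e.getKey() + ")");
        }
    }
}
```

This yields the same 13 (count,word) pairs that AppTest.java expects, only in alphabetical order because of the TreeMap. Punctuation other than commas stays attached to words, which is why "Johny" and "Johny!" are counted separately.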

8. Open pom.xml and replace the existing Maven dependencies with the following:



<repositories>
    <repository>
        <id>cloudera-releases</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
        <releases>
            <enabled>true</enabled>
        </releases>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>org.javassist</groupId>
        <artifactId>javassist</artifactId>
        <version>3.18.1-GA</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>1.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pig</groupId>
        <artifactId>pig</artifactId>
        <classifier>h2</classifier>
        <version>0.13.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pig</groupId>
        <artifactId>pigunit</artifactId>
        <version>0.13.0</version>
    </dependency>
    <dependency>
        <groupId>jline</groupId>
        <artifactId>jline</artifactId>
        <version>0.9.94</version>
    </dependency>
    <dependency>
        <groupId>org.antlr</groupId>
        <artifactId>antlr-runtime</artifactId>
        <version>3.5</version>
    </dependency>
    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>18.0</version>
    </dependency>
    <dependency>
        <groupId>joda-time</groupId>
        <artifactId>joda-time</artifactId>
        <version>2.2</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
    </dependency>
</dependencies>
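If the test fails with a ClassNotFoundException, it can help to check which of these dependencies actually made it onto the classpath. The sketch below is a hypothetical helper (ClasspathCheck is not part of the original project); it only tries to load a few class names that I assume come from the jars above. It deliberately avoids org.apache.hadoop.fs.FileUtil, because loading that class before FixHadoopOnWindows patches it would defeat the fix; run it as a separate program.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports which classes cannot be loaded on the current classpath. */
public class ClasspathCheck {

    /** Returns the names from classNames that fail to load. */
    public static List<String> missingClasses(String... classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: only check that the class can be loaded, skip static initializers
                Class.forName(name, false, ClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers classes whose own dependencies are missing
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingClasses(
                "org.apache.pig.pigunit.PigTest",       // pigunit
                "org.apache.pig.PigServer",             // pig (h2 classifier)
                "org.apache.hadoop.conf.Configuration", // Hadoop
                "javassist.ClassPool",                  // javassist
                "junit.framework.TestCase");            // junit
        System.out.println(missing.isEmpty() ? "All classes found" : "Missing: " + missing);
    }
}
```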


9. Create AppTest.java:


import junit.framework.TestCase;

import org.apache.pig.pigunit.PigTest;

/**
 * Unit test for the word-count Pig script.
 */
public class AppTest extends TestCase {

    public void testStudentsPigScript() throws Exception {
        // Patch Hadoop for Windows before PigUnit runs any Hadoop code
        FixHadoopOnWindows.runFix();

        PigTest pigTest = new PigTest("src/main/resources/wordcount.pig");
        // Expected (count,word) tuples of alias D for sample.data
        pigTest.assertOutput("D", new String[] {
                "(2,No)", "(3,Ha!)", "(1,Yes)", "(1,Open)", "(3,Papa)",
                "(1,your)", "(1,Johny)", "(1,lies?)", "(1,Eating)",
                "(1,Johny!)", "(1,mouth!)", "(1,sugar?)", "(1,Telling)" });
    }
}

10. Create FixHadoopOnWindows.java:


import javassist.CannotCompileException;
import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;
import javassist.NotFoundException;

public class FixHadoopOnWindows {

    /**
     * Fixes the following Hadoop problems on Windows:
     * 1) mapReduceLayer.Launcher: Backend error message during job submission java.io.IOException: Failed to set permissions of path: \tmp\hadoop-MyUsername\mapred\staging\
     * 2) java.io.IOException: Failed to set permissions of path: bla-bla-bla\.staging to 0700
     *
     * @throws NotFoundException if FileUtil or its checkReturnValue method cannot be found
     * @throws CannotCompileException if the patched class cannot be compiled or defined
     */
    public static void runFix() throws NotFoundException, CannotCompileException {
        if (isWindows()) { // run the fix only on Windows
            setUpSystemVariables();
            fixCheckReturnValueMethod();
        }
    }

    // Set up a usable temporary directory on Windows
    private static void setUpSystemVariables() {
        System.setProperty("java.io.tmpdir", "C:/TMP/");
    }

    /**
     * org.apache.hadoop.fs.FileUtil#checkReturnValue doesn't work on Windows at all,
     * so replace its body with an empty one using Javassist.
     * This must run before FileUtil is loaded by the class loader.
     */
    private static void fixCheckReturnValueMethod() throws NotFoundException, CannotCompileException {
        ClassPool cp = new ClassPool(true);
        CtClass ctClass = cp.get("org.apache.hadoop.fs.FileUtil");
        CtMethod ctMethod = ctClass.getDeclaredMethod("checkReturnValue");
        ctMethod.setBody("{ }");
        ctClass.toClass(); // define the patched FileUtil in the current class loader
    }

    private static boolean isWindows() {
        String osName = System.getProperty("os.name");
        return osName.startsWith("Windows");
    }

    private FixHadoopOnWindows() { }
}
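setUpSystemVariables() above only sets java.io.tmpdir; it does not create C:/TMP/. A possible refinement, sketched below under my own assumptions, is to create the directory first so the property never points at a missing folder. TempDirSetup and useTempDir are hypothetical names, and the post does not say whether Pig actually requires the directory to pre-exist.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical variant of setUpSystemVariables() that also creates the directory. */
public class TempDirSetup {

    /**
     * Creates dir (and any missing parents), then points java.io.tmpdir at it.
     * Like the original, this only affects code that reads the property after this call.
     */
    public static Path useTempDir(String dir) throws IOException {
        Path path = Paths.get(dir).toAbsolutePath();
        Files.createDirectories(path); // does nothing if the directory already exists
        System.setProperty("java.io.tmpdir", path.toString());
        return path;
    }

    public static void main(String[] args) throws IOException {
        // "C:/TMP/" is the value used in FixHadoopOnWindows; adjust as needed
        System.out.println("java.io.tmpdir = " + useTempDir("C:/TMP/"));
    }
}
```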

After all these changes, running the test case in AppTest.java should pass without failures. Intermediate Pig files are created in C:\tmp. FixHadoopOnWindows is what makes the test cases runnable on a Windows machine; without it, the test cases cannot be run from Eclipse on Windows.

Note that FixHadoopOnWindows.runFix() is called at the start of the test in AppTest.java, before the PigTest object is created.
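The call order matters because CtClass.toClass() defines the patched FileUtil in the class loader, and a class can only be defined once per loader. If Hadoop has already loaded FileUtil, or runFix() is called a second time (for example from a second test class), the definition will most likely fail with a LinkageError wrapped in a CannotCompileException. One way to make repeated calls safe is a small run-once guard; RunOnce below is a hypothetical helper, not something from the original project.

```java
/** Hypothetical run-once guard: the wrapped action executes at most once per instance. */
public final class RunOnce {

    /** Like Runnable, but allowed to throw checked exceptions such as Javassist's. */
    @FunctionalInterface
    public interface Action {
        void run() throws Exception;
    }

    private boolean done = false;

    /**
     * Runs the action only until it has succeeded once.
     * Returns true if the action ran during this call; if it throws, it is not marked done.
     */
    public synchronized boolean run(Action action) throws Exception {
        if (done) {
            return false;
        }
        action.run();
        done = true;
        return true;
    }

    public static void main(String[] args) throws Exception {
        RunOnce hadoopFix = new RunOnce();
        // In the tests this would wrap FixHadoopOnWindows::runFix; a print stands in here
        hadoopFix.run(() -> System.out.println("fix applied"));
        hadoopFix.run(() -> System.out.println("fix applied")); // skipped
    }
}
```

In a real project the instance would be kept in a static final field shared by all test classes, e.g. `HADOOP_FIX.run(FixHadoopOnWindows::runFix);` at the start of each test.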

Comments:
  Hi sir, the following exception occurred while attempting to run:
    java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

    When running the Maven test: Tests in error:
    testStudentsPigScript(com.ram.hjk.debug_pig.AppTest): The system cannot find the path specified

    Thank you.