Setting up a Pig/Eclipse environment (Maven Pig project) on a Windows machine
Setting up a Pig environment on Linux is very easy, but setting it up on Windows is not straightforward and can initially eat up a lot of time. So I would like to share this in a blog post so that someone can make use of it and save their precious time!
The steps below illustrate setting up PigUnit in Eclipse.
1. Download Eclipse 3.7 or above from the Eclipse site.
2. Install m2eclipse to work with Maven.
3. Install the Pig Eclipse plugin: go to https://cwiki.apache.org/confluence/display/PIG/PigTools and take the update site URL from there, then in Eclipse go to Help -> Install New Software -> paste the URL -> select Pig Eclipse -> Next.
4. Create a Maven project. The project folder structure should look like the layout below. The FixHadoopOnWindows.java file and the pom.xml dependency changes are very important for running the test case.
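Based on the file paths used in the later steps, a typical layout would be roughly the following (the project name and the source folders are just placeholders; only the resource and test files listed here are actually referenced in this post):

pig-eclipse-sample/
    pom.xml
    src/main/resources/sample.data
    src/main/resources/wordcount.pig
    src/test/java/AppTest.java
    src/test/java/FixHadoopOnWindows.java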
5. Add the lines below to sample.data:
Johny, Johny!
Yes, Papa
Eating sugar?
No, Papa
Telling lies?
No, Papa
Open your mouth!
Ha! Ha! Ha!
6. Add the lines below to the wordcount.pig file (a simple word-count script); a local-mode sanity check for it follows the listing.
A = load 'src/main/resources/sample.data';
B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
C = group B by word;
D = foreach C generate COUNT(B), group;
dump D;
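If you want to sanity-check the script outside of PigUnit, a small driver like the sketch below can run the same statements through PigServer in local mode. This class is not part of the original post; its name is just a placeholder, and it assumes it sits next to the test sources so that it can reuse FixHadoopOnWindows from step 9 (below).

import java.util.Iterator;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

// Hypothetical helper class (not part of the original post): runs the same
// statements through PigServer in local mode as a quick sanity check.
public class RunWordCountLocally {

    public static void main(String[] args) throws Exception {
        // The Windows fix from step 9 is needed here as well, before any Hadoop class loads.
        FixHadoopOnWindows.runFix();

        PigServer pigServer = new PigServer(ExecType.LOCAL);
        pigServer.registerQuery("A = load 'src/main/resources/sample.data';");
        pigServer.registerQuery("B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;");
        pigServer.registerQuery("C = group B by word;");
        pigServer.registerQuery("D = foreach C generate COUNT(B), group;");

        // openIterator triggers execution of the pipeline behind alias D.
        Iterator<Tuple> result = pigServer.openIterator("D");
        while (result.hasNext()) {
            System.out.println(result.next());
        }
    }
}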
7. Open the pom.xml file and replace the existing Maven dependencies with the ones below.
<repositories>
  <repository>
    <id>cloudera-releases</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>org.javassist</groupId>
    <artifactId>javassist</artifactId>
    <version>3.18.1-GA</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.3.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.pig</groupId>
    <artifactId>pig</artifactId>
    <classifier>h2</classifier>
    <version>0.13.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.pig</groupId>
    <artifactId>pigunit</artifactId>
    <version>0.13.0</version>
  </dependency>
  <dependency>
    <groupId>jline</groupId>
    <artifactId>jline</artifactId>
    <version>0.9.94</version>
  </dependency>
  <dependency>
    <groupId>org.antlr</groupId>
    <artifactId>antlr-runtime</artifactId>
    <version>3.5</version>
  </dependency>
  <dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>18.0</version>
  </dependency>
  <dependency>
    <groupId>joda-time</groupId>
    <artifactId>joda-time</artifactId>
    <version>2.2</version>
  </dependency>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
</dependencies>
8. AppTest.java
import junit.framework.TestCase;

import org.apache.pig.pigunit.PigTest;

/**
 * Unit test for simple App.
 */
public class AppTest extends TestCase {

    public void testStudentsPigScript() throws Exception {
        // Apply the Windows fix before any Pig/Hadoop class is loaded.
        FixHadoopOnWindows.runFix();
        PigTest pigTest = new PigTest("src/main/resources/wordcount.pig");
        pigTest.assertOutput("D", new String[] { "(2,No)", "(3,Ha!)",
                "(1,Yes)", "(1,Open)", "(3,Papa)", "(1,your)", "(1,Johny)",
                "(1,lies?)", "(1,Eating)", "(1,Johny!)", "(1,mouth!)",
                "(1,sugar?)", "(1,Telling)", });
    }
}
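PigUnit can also feed input to an alias directly, so a test does not have to depend on sample.data being on disk. The extra test method below is only a sketch (it is not part of the original post); it assumes the same wordcount.pig script and uses a single distinct word so the expected output does not depend on the order in which grouped results come back. Note that FixHadoopOnWindows.runFix() should be applied only once per JVM, so if you add this alongside the existing test, move the runFix() call into a static initializer (see the note near the end of this post) instead of calling it from both methods.

    // Hypothetical extra test method: overrides the data loaded into alias A.
    // Assumes FixHadoopOnWindows.runFix() has already been applied once for this JVM.
    public void testWordCountWithInlineInput() throws Exception {
        PigTest pigTest = new PigTest("src/main/resources/wordcount.pig");
        String[] input = { "Papa Papa Papa" };   // replaces whatever alias A would have loaded
        String[] expected = { "(3,Papa)" };      // alias D emits (count, word)
        pigTest.assertOutput("A", input, "D", expected);
    }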
9. FixHadoopOnWindows.java
import javassist.CannotCompileException;
import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;
import javassist.NotFoundException;

public class FixHadoopOnWindows {

    /**
     * Fix the following Hadoop problems on Windows:
     * 1) mapReduceLayer.Launcher: Backend error message during job submission java.io.IOException: Failed to set permissions of path: \tmp\hadoop-MyUsername\mapred\staging\
     * 2) java.io.IOException: Failed to set permissions of path: bla-bla-bla\.staging to 0700
     */
    public static void runFix() throws NotFoundException, CannotCompileException {
        if (isWindows()) { // run the fix only on Windows
            setUpSystemVariables();
            fixCheckReturnValueMethod();
        }
    }

    // set up a usable temporary directory on Windows
    private static void setUpSystemVariables() {
        System.getProperties().setProperty("java.io.tmpdir", "C:/TMP/");
    }

    /**
     * org.apache.hadoop.fs.FileUtil#checkReturnValue doesn't work on Windows at all,
     * so replace the method body with an empty body using Javassist.
     */
    private static void fixCheckReturnValueMethod() throws NotFoundException, CannotCompileException {
        ClassPool cp = new ClassPool(true);
        CtClass ctClass = cp.get("org.apache.hadoop.fs.FileUtil");
        CtMethod ctMethod = ctClass.getDeclaredMethod("checkReturnValue");
        ctMethod.setBody("{ }");
        ctClass.toClass();
    }

    private static boolean isWindows() {
        String os = System.getProperty("os.name");
        return os.startsWith("Windows");
    }

    private FixHadoopOnWindows() { }
}
After all these changes, running the test case from AppTest.java should pass without failures. Intermediate Pig files are created under c:\temp. FixHadoopOnWindows is what makes it possible to run the test cases on a Windows machine; without this file the test cases will not run from Eclipse on Windows. FixHadoopOnWindows.runFix() is called before executing the test case in AppTest.java.
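If you later add more test methods or classes, one possible variation (just a sketch, not from the original post) is to apply the fix once when the test class is loaded, since the Javassist patch defines the modified FileUtil class and cannot be applied a second time in the same JVM:

    // Inside AppTest: apply the Windows fix once, when the class is loaded,
    // and drop the FixHadoopOnWindows.runFix() call from the individual test methods.
    static {
        try {
            FixHadoopOnWindows.runFix();
        } catch (Exception e) {
            throw new ExceptionInInitializerError(e);
        }
    }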