12/30/2014

Hadoop Java MapReduce: Sort by Value


Input  (test.txt)
------------
1,50
2,20
3,30
4,10
5,15
6,25
7,55
8,35
9,70

Output
------------
9 70
7 55
1 50
8 35
3 30
6 25
2 20
5 15
4 10
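
As a quick check of the expected output above, the same transformation can be sketched in plain Java with no Hadoop dependency. This is only an illustration of the intended result, not the MapReduce job itself; the class name SortByValueSketch and the method sortByValueDesc are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortByValueSketch {

    // Parses "id,value" lines and returns "id value" lines, largest value first.
    public static List<String> sortByValueDesc(List<String> lines) {
        List<int[]> records = new ArrayList<>();
        for (String line : lines) {
            String[] tokens = line.split(",");
            int id = Integer.parseInt(tokens[0].trim());
            int value = Integer.parseInt(tokens[1].trim());
            records.add(new int[] { id, value });
        }

        // Descending by value (index 1 of each record)
        records.sort((a, b) -> Integer.compare(b[1], a[1]));

        List<String> out = new ArrayList<>();
        for (int[] r : records) {
            out.add(r[0] + " " + r[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Same records as test.txt above
        List<String> input = Arrays.asList(
                "1,50", "2,20", "3,30", "4,10", "5,15",
                "6,25", "7,55", "8,35", "9,70");
        for (String row : sortByValueDesc(input)) {
            System.out.println(row);
        }
    }
}
```

Running main on the sample records prints the rows in the order listed under Output.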



package com.my.cert.example;

import java.nio.ByteBuffer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ValueSortExp {
 public static void main(String[] args) throws Exception {

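  // Hard-coded local paths for a test run on Windows
  Path inputPath = new Path("C:\\hadoop\\test\\test.txt");
  Path outputDir = new Path("C:\\hadoop\\test\\test1");

  // To take the paths from the command line instead:
  // Path inputPath = new Path(args[0]);
  // Path outputDir = new Path(args[1]);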

  // Create configuration
  Configuration conf = new Configuration(true);

  // Create job (Job.getInstance replaces the deprecated Job constructor in Hadoop 2.x)
  Job job = Job.getInstance(conf, "Sort by value");
  job.setJarByClass(ValueSortExp.class);

  // Setup MapReduce
  job.setMapperClass(ValueSortExp.MapTask.class);
  job.setReducerClass(ValueSortExp.ReduceTask.class);
  job.setNumReduceTasks(1);

  // Specify key / value
  job.setMapOutputKeyClass(IntWritable.class);
  job.setMapOutputValueClass(IntWritable.class);
  job.setOutputKeyClass(IntWritable.class);
  job.setOutputValueClass(IntWritable.class);

  // Sort the map output keys in descending order
  job.setSortComparatorClass(IntComparator.class);

  // Input
  FileInputFormat.addInputPath(job, inputPath);
  job.setInputFormatClass(TextInputFormat.class);

  // Output
  FileOutputFormat.setOutputPath(job, outputDir);
  job.setOutputFormatClass(TextOutputFormat.class);

  /*
   * To delete an existing output directory before the job runs
   * (requires import org.apache.hadoop.fs.FileSystem):
   *
   * FileSystem hdfs = FileSystem.get(conf);
   * if (hdfs.exists(outputDir))
   *     hdfs.delete(outputDir, true);
   */

  // Execute job
  int code = job.waitForCompletion(true) ? 0 : 1;
  System.exit(code);

 }
 
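 // Sorts IntWritable keys in descending order by comparing their serialized bytes.
 public static class IntComparator extends WritableComparator {

     public IntComparator() {
         super(IntWritable.class);
     }

     @Override
     public int compare(byte[] b1, int s1, int l1,
             byte[] b2, int s2, int l2) {

         // An IntWritable is serialized as a 4-byte big-endian int
         int v1 = ByteBuffer.wrap(b1, s1, l1).getInt();
         int v2 = ByteBuffer.wrap(b2, s2, l2).getInt();

         // Reversed argument order gives a descending sort
         return Integer.compare(v2, v1);
     }
 }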

 // Mapper: parses each "id,value" line and emits (value, id), so that the
 // shuffle sorts the records by value.
 public static class MapTask extends
   Mapper<LongWritable, Text, IntWritable, IntWritable> {
  @Override
  public void map(LongWritable key, Text value, Context context)
    throws java.io.IOException, InterruptedException {
   String line = value.toString();
   String[] tokens = line.split(","); // Comma is the field delimiter
   int keypart = Integer.parseInt(tokens[0]);
   int valuePart = Integer.parseInt(tokens[1]);
   // Swap the fields: the value becomes the map output key
   context.write(new IntWritable(valuePart), new IntWritable(keypart));
  }
 }

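 // Reducer: receives the keys (the original values) in descending order and
 // swaps each pair back to (id, value) for output.
 public static class ReduceTask extends
   Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
  @Override
  public void reduce(IntWritable key, Iterable<IntWritable> list, Context context)
    throws java.io.IOException, InterruptedException {
   // One output line per id, so ids that share a value are all written
   for (IntWritable value : list) {
    context.write(value, key);
   }
  }
 }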

}
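
The raw comparison done by IntComparator can be tried outside Hadoop. The sketch below assumes only what the listing relies on: an IntWritable is written as a 4-byte big-endian int, which is also ByteBuffer's default byte order. The class RawIntCompareSketch and its methods are hypothetical stand-ins, not Hadoop classes.

```java
import java.nio.ByteBuffer;

public class RawIntCompareSketch {

    // Stand-in for IntWritable serialization: 4 bytes, big-endian.
    public static byte[] serialize(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Same logic as IntComparator.compare: decode both ints, compare in reverse.
    public static int compareDescending(byte[] b1, int s1, int l1,
                                        byte[] b2, int s2, int l2) {
        int v1 = ByteBuffer.wrap(b1, s1, l1).getInt();
        int v2 = ByteBuffer.wrap(b2, s2, l2).getInt();
        return Integer.compare(v2, v1);
    }

    public static void main(String[] args) {
        byte[] seventy = serialize(70);
        byte[] ten = serialize(10);
        byte[] minusFive = serialize(-5);

        // Negative result: 70 sorts before 10
        System.out.println(compareDescending(seventy, 0, 4, ten, 0, 4));
        // Positive result: -5 sorts after 10 (the sign is handled by decoding to int)
        System.out.println(compareDescending(minusFive, 0, 4, ten, 0, 4));
        // Zero: equal keys end up in the same reduce group
        System.out.println(compareDescending(ten, 0, 4, serialize(10), 0, 4));
    }
}
```

Decoding to int before comparing keeps negative numbers in the right place, which a plain byte-by-byte comparison of the serialized form would not.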

35 comments:

  2. This works fine because the input keys and values are all unique. I think it would cause problems with duplicate keys during grouping. Correct me if I am wrong. Thanks.
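
The duplicate case raised in this comment can be explored with a plain-Java simulation. This is a sketch and not Hadoop itself: a TreeMap in reverse order stands in for the shuffle (keys sorted descending, one group per distinct key), and the loop at the end mirrors the reducer in the listing. The class DuplicateValueSketch and the method run are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DuplicateValueSketch {

    public static List<String> run(List<String> lines) {
        // Stand-in for the shuffle: values become keys, sorted descending,
        // with all ids that share a value collected into one group.
        TreeMap<Integer, List<Integer>> grouped =
                new TreeMap<>(Comparator.<Integer>reverseOrder());

        for (String line : lines) {
            String[] tokens = line.split(",");
            int id = Integer.parseInt(tokens[0]);
            int value = Integer.parseInt(tokens[1]);
            grouped.computeIfAbsent(value, k -> new ArrayList<>()).add(id);
        }

        // Stand-in for the reducer: one output line per id in each group.
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> group : grouped.entrySet()) {
            for (int id : group.getValue()) {
                out.add(id + " " + group.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Ids 1 and 3 share the value 50
        List<String> input = new ArrayList<>();
        input.add("1,50");
        input.add("2,20");
        input.add("3,50");
        for (String row : run(input)) {
            System.out.println(row);
        }
    }
}
```

In this simulation both ids with the value 50 appear in the output, because the reducer loop writes every id in a group. In a real job the order of ids within one group is not something this sketch can show, and it should not be relied on.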