12/30/2014

Hadoop Counters Example: How to Track the Number of Records Processed by the Mapper and Reducer

We should use Hadoop counters wherever practical in MapReduce programs, because they let us keep track of the number of records processed in the mappers and reducers.

Input
----------------------
Johny, Johny!
Yes, Papa
Eating sugar?
No, Papa
Telling lies?
No, Papa
Open your mouth!
Ha! Ha! Ha!

Console Output
------------------------------
Total Number of Records Processed in MAP: 8
Total Number of Records Processed in Reducer: 8



package com.my.cert.example;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class CounterUseCaseExp {
 public static void main(String[] args) throws Exception {

  Path inputPath = new Path("C:\\hadoop\\test\\test.txt");
  Path outputDir = new Path("C:\\hadoop\\test\\test1");

  // Create configuration
  Configuration conf = new Configuration(true);

  // Create job
  // Job.getInstance replaces the deprecated Job constructor
  Job job = Job.getInstance(conf, "Hadoop Counter Example");
  job.setJarByClass(CounterUseCaseExp.class);

  // Setup MapReduce
  job.setMapperClass(CounterUseCaseExp.MapTask.class);
  job.setReducerClass(CounterUseCaseExp.ReduceTask.class);
  job.setNumReduceTasks(1);

  job.setOutputKeyClass(NullWritable.class);
  job.setOutputValueClass(Text.class);

  // Input
  FileInputFormat.addInputPath(job, inputPath);
  job.setInputFormatClass(TextInputFormat.class);

  // Output
  FileOutputFormat.setOutputPath(job, outputDir);
  job.setOutputFormatClass(TextOutputFormat.class);
  // Run the job and wait for it to finish; counter totals are only complete after that
  int code = job.waitForCompletion(true) ? 0 : 1;

  // Read the aggregated counter values from the finished job
  Counter mapperCounter = job.getCounters().findCounter(MapTask.MapCounters.MAP_RECORD_COUNTER);
  Counter reducerCounter = job.getCounters().findCounter(ReduceTask.ReducerCounters.REDUCER_RECORD_COUNTER);
  System.out.println("Total Number of Records Processed in MAP: " + mapperCounter.getValue());
  System.out.println("Total Number of Records Processed in Reducer: " + reducerCounter.getValue());
  System.exit(code);

 }

 public static class MapTask extends
   Mapper<LongWritable, Text, NullWritable, Text> {

  static enum MapCounters {
   MAP_RECORD_COUNTER
  }

  @Override
  public void map(LongWritable key, Text value, Context context)
    throws java.io.IOException, InterruptedException {
   // Count every input record seen by the mapper
   context.getCounter(MapCounters.MAP_RECORD_COUNTER).increment(1);
   // Pass the line through unchanged; the NullWritable key sends all records to one reduce call
   context.write(NullWritable.get(), value);
  }
 }

 public static class ReduceTask extends
   Reducer<NullWritable, Text, NullWritable, Text> {
  static enum ReducerCounters {
   REDUCER_RECORD_COUNTER
  }

  @Override
  public void reduce(NullWritable key, Iterable<Text> list, Context context)
    throws java.io.IOException, InterruptedException {
   for (Text item : list) {
    context.write(key, item);
    // Count every record written by the reducer
    context.getCounter(ReducerCounters.REDUCER_RECORD_COUNTER).increment(1);
   }
  }
 }

}
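
To see why both counters report 8 for the sample input, the counting logic can be modeled in plain Java without a cluster. This is only a sketch and does not use the Hadoop API: the class `CounterModel`, the enum `RecordCounter`, and the method `run` are hypothetical names introduced for illustration. It assumes, as in the job above, an identity mapper, a single shared key, and a reducer that counts each value it writes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class CounterModel {

    // Hypothetical stand-in for the two counter enums used in the job above.
    public enum RecordCounter { MAP_RECORD_COUNTER, REDUCER_RECORD_COUNTER }

    // Simulates an identity map phase and an identity reduce phase,
    // incrementing one counter per record in each phase.
    public static Map<RecordCounter, Long> run(List<String> lines) {
        Map<RecordCounter, Long> counters = new EnumMap<>(RecordCounter.class);
        for (RecordCounter c : RecordCounter.values()) {
            counters.put(c, 0L);
        }

        // Map phase: one call per input line; each line is emitted unchanged.
        List<String> mapOutput = new ArrayList<>();
        for (String line : lines) {
            counters.merge(RecordCounter.MAP_RECORD_COUNTER, 1L, Long::sum);
            mapOutput.add(line);
        }

        // Reduce phase: all records share one key, so a single reduce call
        // iterates over every value and counts each one it writes.
        for (String value : mapOutput) {
            counters.merge(RecordCounter.REDUCER_RECORD_COUNTER, 1L, Long::sum);
        }
        return counters;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "Johny, Johny!", "Yes, Papa", "Eating sugar?", "No, Papa",
            "Telling lies?", "No, Papa", "Open your mouth!", "Ha! Ha! Ha!");
        Map<RecordCounter, Long> counters = run(input);
        System.out.println("Total Number of Records Processed in MAP: "
            + counters.get(RecordCounter.MAP_RECORD_COUNTER));
        System.out.println("Total Number of Records Processed in Reducer: "
            + counters.get(RecordCounter.REDUCER_RECORD_COUNTER));
    }
}
```

Because neither phase filters or aggregates records, the two counts match the number of input lines. If the mapper dropped some lines or the reducer merged values, the two counters would diverge, which is exactly the kind of discrepancy counters help detect.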
