A Hadoop Writable must be reusable

A bug caused by this cost me most of a day.

The old, incorrect code:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.Writable;

    // Trove 3.x package name; in Trove 2.x the class is gnu.trove.TIntIntHashMap.
    import gnu.trove.map.hash.TIntIntHashMap;

    public class VectorWritable implements Writable {

        public TIntIntHashMap vectorMap;

        public VectorWritable() {
            this.vectorMap = new TIntIntHashMap();
        }

        //...................

        @Override
        public void readFields(DataInput in) throws IOException {
            int size = in.readInt();

            // BUG: entries left over from a previous readFields() call are never removed.
            for (int i = 0; i < size; ++i) {
                int index = in.readInt();
                int value = in.readInt();
                this.vectorMap.put(index, value);
            }
        }

        @Override
        public void write(DataOutput out) throws IOException {
            // Layout: entry count, then (index, value) pairs.
            int size = this.vectorMap.size();
            out.writeInt(size);

            int[] keys = this.vectorMap.keys();
            for (int key : keys) {
                int value = this.vectorMap.get(key);
                out.writeInt(key);
                out.writeInt(value);
            }
        }
    }

(TIntIntHashMap in the code is a class from the Trove library; it is similar to HashMap, but with primitive int keys and values.)

This code ran without any problem when I tested it on a single machine. Used inside Hadoop, though, it produced wrong output.

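The problem is in the implementation of readFields. The JavaDoc for Writable says this about the readFields() method:
"For efficiency, implementations should attempt to re-use storage in the existing object where possible."
Hadoop reuses the same Writable instance across records and calls readFields() on it repeatedly. In my incorrect implementation, when readFields() is called twice in a row on one object, the entries read by the first call are still in the TIntIntHashMap during the second call, so the result mixes data from both records.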
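The stale-state effect can be reproduced without a cluster. The following is a minimal sketch, not the original class: `BuggyVector` is a hypothetical stand-in that uses `java.util.HashMap` in place of Trove's TIntIntHashMap and does not implement Hadoop's Writable interface, so it runs with the JDK alone. The helper names `record` and `readTwoRecordsIntoOneInstance` are also made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

class StaleStateDemo {

    // Hypothetical stand-in for the buggy VectorWritable (assumption:
    // HashMap replaces TIntIntHashMap, no Hadoop dependency).
    static class BuggyVector {
        final Map<Integer, Integer> vectorMap = new HashMap<>();

        void readFields(DataInput in) throws IOException {
            int size = in.readInt();
            for (int i = 0; i < size; ++i) {
                int index = in.readInt();
                int value = in.readInt();
                vectorMap.put(index, value); // the map is never cleared between calls
            }
        }
    }

    // Serializes a single (index, value) entry: count, index, value.
    static byte[] record(int index, int value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(1);
            out.writeInt(index);
            out.writeInt(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads two single-entry records into ONE instance, as a framework
    // that recycles objects would, and returns the resulting map.
    static Map<Integer, Integer> readTwoRecordsIntoOneInstance() {
        try {
            BuggyVector v = new BuggyVector();
            v.readFields(new DataInputStream(new ByteArrayInputStream(record(1, 10))));
            v.readFields(new DataInputStream(new ByteArrayInputStream(record(2, 20))));
            return v.vectorMap;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The second record holds one entry, but the map now holds two.
        System.out.println(readTwoRecordsIntoOneInstance().size()); // prints 2
    }
}
```

A single-machine test that creates a fresh object for every record never takes this path, which would explain why the bug only showed up under Hadoop.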

The correct approach is to reinitialize the TIntIntHashMap, or clear it, every time readFields is called (note the line that assigns a new map):

    @Override
    public void readFields(DataInput in) throws IOException {
        int size = in.readInt();

        // Discard whatever a previous readFields() call left behind.
        this.vectorMap = new TIntIntHashMap();

        for (int i = 0; i < size; ++i) {
            int index = in.readInt();
            int value = in.readInt();
            this.vectorMap.put(index, value);
        }
    }
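The other option mentioned above, clearing the map instead of allocating a new one, is closer to the JavaDoc's advice about re-using storage. Here is a sketch of that variant under the same assumptions as before: `ReusableVector` is a hypothetical stand-in built on `java.util.HashMap` with no Hadoop dependency, and the helper names are invented for the example. TIntIntHashMap also has a `clear()` method, so the same one-line change should carry over to the real class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

class ClearOnReadDemo {

    // Hypothetical stand-in for a fixed VectorWritable (assumption:
    // HashMap replaces TIntIntHashMap, no Hadoop dependency).
    static class ReusableVector {
        final Map<Integer, Integer> vectorMap = new HashMap<>();

        void readFields(DataInput in) throws IOException {
            // Reset state first, keeping the existing map object.
            vectorMap.clear();

            int size = in.readInt();
            for (int i = 0; i < size; ++i) {
                int index = in.readInt();
                int value = in.readInt();
                vectorMap.put(index, value);
            }
        }
    }

    // Serializes a single (index, value) entry: count, index, value.
    static byte[] record(int index, int value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(1);
            out.writeInt(index);
            out.writeInt(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads two single-entry records into one instance and returns the map.
    static Map<Integer, Integer> readTwoRecordsIntoOneInstance() {
        try {
            ReusableVector v = new ReusableVector();
            v.readFields(new DataInputStream(new ByteArrayInputStream(record(1, 10))));
            v.readFields(new DataInputStream(new ByteArrayInputStream(record(2, 20))));
            return v.vectorMap;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Only the entry from the second record remains.
        System.out.println(readTwoRecordsIntoOneInstance()); // prints {2=20}
    }
}
```

Clearing avoids allocating a new map for every record, which may matter when a job deserializes many records; assigning a fresh map, as in the fix above, is equally correct.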

—END.—
