KeyDistribution (Pig 0.9.3-SNAPSHOT API)

org.apache.hadoop.zebra.io
Class KeyDistribution

java.lang.Object
  org.apache.hadoop.zebra.io.KeyDistribution

public class KeyDistribution
extends Object

Conveys how on-disk data is distributed among key-partitioned buckets. The MapReduce layer uses this class to calculate intelligent splits.

Method Summary
`BlockDistribution`	`getBlockDistribution(RawComparable key)`
`RawComparable[]`	`getKeys()` Get the list of sampling keys.
`long`	`getMinStepSize()` Get the minimum split step size from all tables in the union.
`long`	`length()` Get the total unique bytes contained in the key-partitioned buckets.
`static KeyDistribution`	`merge(KeyDistribution[] sourceKeys)` Merge the key samples. Algorithm: select the smallest key from all clean source ranges and from the ranges that follow their respective dirty ranges.
`int`	`resize(BlockDistribution lastBd)`
`int`	`size()` Get the size of the key sampling.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Method Detail

length

public long length()

Get the total unique bytes contained in the key-partitioned buckets.

Returns: The total number of bytes contained in the key-partitioned buckets.

size

public int size()

Get the size of the key sampling.

Returns: Number of key samples.

getMinStepSize

public long getMinStepSize()

Get the minimum split step size from all tables in the union.

getKeys

public RawComparable[] getKeys()

Get the list of sampling keys.

Returns: A list of sampling keys.

getBlockDistribution

public BlockDistribution getBlockDistribution(RawComparable key)

merge

public static KeyDistribution merge(KeyDistribution[] sourceKeys)
                             throws IOException

Merge the key samples. Algorithm: select the smallest key from all clean source ranges and from the ranges that follow their respective dirty ranges. A dirty range is a range that has been partially needed by one or more of the previous final ranges.

Parameters: sourceKeys - key samples to be merged
Returns: the merged key samples
Throws: IOException
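
The "select the smallest key" step can be pictured as a k-way merge over sorted key lists. The following is a minimal, self-contained sketch of that idea only, not the Zebra implementation. It assumes plain `String` keys in place of `RawComparable`, ignores the byte counts carried by `BlockDistribution`, and does not model dirty ranges at all. The class name `KeySampleMergeSketch`, the method `mergeSortedSamples`, and the collapsing of equal keys are hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch only: NOT part of org.apache.hadoop.zebra.io.
public class KeySampleMergeSketch {

    // Tracks the read position inside one sorted source of keys.
    private static final class Cursor {
        final String[] keys;
        int pos;

        Cursor(String[] keys) {
            this.keys = keys;
        }

        String current() {
            return keys[pos];
        }
    }

    // Repeatedly picks the smallest current key across all sources.
    // Assumption: each source array is already sorted in ascending order.
    public static List<String> mergeSortedSamples(List<String[]> sources) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.current().compareTo(b.current()));
        for (String[] source : sources) {
            if (source.length > 0) {
                heap.add(new Cursor(source));
            }
        }

        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            String key = smallest.current();
            // Assumption for this sketch: equal keys collapse into one sample.
            if (merged.isEmpty() || !merged.get(merged.size() - 1).equals(key)) {
                merged.add(key);
            }
            smallest.pos++;
            if (smallest.pos < smallest.keys.length) {
                heap.add(smallest); // re-insert with its next key
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String[]> sources = Arrays.asList(
            new String[] {"a", "d"},
            new String[] {"b", "d", "f"});
        System.out.println(mergeSortedSamples(sources)); // prints [a, b, d, f]
    }
}
```

A priority queue keeps each selection step at O(log k) for k sources. The real `merge` additionally has to account for ranges that earlier output ranges have partially used, which this sketch leaves out.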

resize

public int resize(BlockDistribution lastBd)