java.lang.Object
  org.apache.pig.data.DefaultAbstractBag
    org.apache.pig.data.SortedSpillBag
      org.apache.pig.data.InternalDistinctBag
public class InternalDistinctBag
An unordered collection of Tuples with no duplicates. Tuples are deduplicated as they are added. In memory, the data is held in a HashSet; when it is time to spill, the contents are copied into an ArrayList, sorted, and written to disk. Despite these extra steps, this was found to be faster than storing the data in a TreeSet. This bag spills proactively when the number of tuples in memory reaches a limit.
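The storage strategy described above can be sketched in plain Java. This is a minimal illustration, not the Pig implementation: the class `DistinctSpillSketch`, its `limit` parameter, the use of `String` in place of `Tuple`, and the one-element-per-line temp file format are all assumptions made for the example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the bag; Strings take the place of Tuples. */
final class DistinctSpillSketch {
    private final int limit;                                // assumed in-memory element limit
    private final Set<String> contents = new HashSet<>();   // drops duplicates on insert
    private final List<Path> spillFiles = new ArrayList<>();

    DistinctSpillSketch(int limit) {
        this.limit = limit;
    }

    /** Adds an element; spills proactively once the in-memory count reaches the limit. */
    void add(String t) {
        // Set.add returns false for a duplicate, so only new elements can trigger a spill.
        if (contents.add(t) && contents.size() >= limit) {
            spill();
        }
    }

    /** Copies the set into a list, sorts it, writes it to a temp file, and frees the set. */
    long spill() {
        if (contents.isEmpty()) {
            return 0;
        }
        List<String> sorted = new ArrayList<>(contents);
        Collections.sort(sorted);
        try {
            Path file = Files.createTempFile("distinct-sketch", ".txt");
            file.toFile().deleteOnExit();
            try (BufferedWriter out = Files.newBufferedWriter(file)) {
                for (String s : sorted) {
                    out.write(s);
                    out.newLine();
                }
            }
            spillFiles.add(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        contents.clear();
        return sorted.size();   // this sketch reports the number of elements written
    }

    int inMemory() {
        return contents.size();
    }

    int spillCount() {
        return spillFiles.size();
    }

    /** Reads back one spill file, one element per line. */
    List<String> readSpill(int index) {
        try {
            return Files.readAllLines(spillFiles.get(index));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that in this sketch the HashSet only removes duplicates among the elements currently in memory. An element that was already spilled can be added again, so duplicates across spill files would have to be resolved when the data is read back.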
Nested Class Summary |
---|
Nested classes/interfaces inherited from class org.apache.pig.data.DefaultAbstractBag |
---|
DefaultAbstractBag.BagDelimiterTuple, DefaultAbstractBag.EndBag, DefaultAbstractBag.StartBag |
Field Summary |
---|
Fields inherited from class org.apache.pig.data.DefaultAbstractBag |
---|
endBag, MAX_SPILL_FILES, mContents, mLastContentsSize, mMemSize, mSize, mSpillFiles, startBag |
Constructor Summary | |
---|---|
InternalDistinctBag() | |
InternalDistinctBag(int bagCount) | |
InternalDistinctBag(int bagCount, double percent) | |
Method Summary | |
---|---|
void | add(Tuple t): Add a tuple to the bag. |
void | addAll(Collection<Tuple> c): Add contents of a container to the bag. |
void | addAll(DataBag b): Add contents of a bag to the bag. |
boolean | isDistinct(): Find out if the bag is distinct. |
boolean | isSorted(): Find out if the bag is sorted. |
Iterator<Tuple> | iterator(): Get an iterator to the bag. |
long | size(): Get the number of elements in the bag, both in memory and on disk. |
long | spill(): Instructs an object to spill whatever it can to disk and release references to any data structures it spills. |
Methods inherited from class org.apache.pig.data.SortedSpillBag |
---|
proactive_spill |
Methods inherited from class org.apache.pig.data.DefaultAbstractBag |
---|
clear, compareTo, equals, getMemorySize, getSpillFile, hashCode, incSpillCount, incSpillCount, markStale, readFields, reportProgress, toString, warn, write |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public InternalDistinctBag()
public InternalDistinctBag(int bagCount)
public InternalDistinctBag(int bagCount, double percent)
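The constructor parameters are not documented on this page. One plausible reading, stated here purely as an assumption, is that `percent` is the fraction of the maximum heap that all such bags may use together and `bagCount` is the number of bags sharing that fraction. Under that assumption, a per-bag memory budget could be derived as in the sketch below; `BagMemoryBudgetSketch` and `perBagBudget` are hypothetical names, not part of the Pig API.

```java
/** Hypothetical helper; shows one way a per-bag budget could be computed. */
final class BagMemoryBudgetSketch {

    /**
     * Splits a fraction of the heap evenly across a number of bags.
     *
     * @param maxHeapBytes total heap available, for example Runtime.getRuntime().maxMemory()
     * @param bagCount     assumed meaning: number of bags sharing the budget
     * @param percent      assumed meaning: fraction of the heap, between 0 and 1
     */
    static long perBagBudget(long maxHeapBytes, int bagCount, double percent) {
        if (bagCount < 1) {
            throw new IllegalArgumentException("bagCount must be at least 1");
        }
        if (percent <= 0.0 || percent > 1.0) {
            throw new IllegalArgumentException("percent must be in (0, 1]");
        }
        return (long) (maxHeapBytes * percent / bagCount);
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory();
        // Two bags sharing a quarter of the heap.
        System.out.println(perBagBudget(heap, 2, 0.25));
    }
}
```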
Method Detail |
---|
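public boolean isSorted()
Description copied from interface: DataBag
Find out if the bag is sorted.

public boolean isDistinct()
Description copied from interface: DataBag
Find out if the bag is distinct.

public long size()
Description copied from class: DefaultAbstractBag
Get the number of elements in the bag, both in memory and on disk.
Specified by: size in interface DataBag
Overrides: size in class DefaultAbstractBag

public Iterator<Tuple> iterator()
Description copied from interface: DataBag
Get an iterator to the bag.

public void add(Tuple t)
Description copied from class: DefaultAbstractBag
Add a tuple to the bag.
Specified by: add in interface DataBag
Overrides: add in class DefaultAbstractBag
Parameters: t - tuple to add.

public void addAll(DataBag b)
Description copied from class: DefaultAbstractBag
Add contents of a bag to the bag.
Specified by: addAll in interface DataBag
Overrides: addAll in class DefaultAbstractBag
Parameters: b - bag to add contents of.

public void addAll(Collection<Tuple> c)
Description copied from class: DefaultAbstractBag
Add contents of a container to the bag.
Overrides: addAll in class DefaultAbstractBag
Parameters: c - Collection to add contents of.

public long spill()
Description copied from interface: Spillable
Instructs an object to spill whatever it can to disk and release references to any data structures it spills.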
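This page does not say how iterator() combines spilled data with data still in memory. Because each spill is written in sorted order, one natural approach is a k-way merge that skips repeated values, which keeps the result distinct even when the same tuple landed in more than one spill. The sketch below illustrates that technique under stated assumptions: `DistinctMergeSketch` and `mergeDistinct` are hypothetical names, `String` stands in for `Tuple`, and in-memory lists stand in for spill files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration of merging sorted runs while dropping duplicates. */
final class DistinctMergeSketch {

    /** Merges sorted runs into one sorted list containing each value once. */
    static List<String> mergeDistinct(List<List<String>> sortedRuns) {
        // Heap entries are {run index, position in run}, ordered by the value they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> sortedRuns.get(a[0]).get(a[1])
                        .compareTo(sortedRuns.get(b[0]).get(b[1])));
        for (int r = 0; r < sortedRuns.size(); r++) {
            if (!sortedRuns.get(r).isEmpty()) {
                heap.add(new int[] {r, 0});
            }
        }

        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            List<String> run = sortedRuns.get(head[0]);
            String value = run.get(head[1]);
            // Equal values come out of the heap consecutively, so comparing
            // against the last emitted value is enough to drop duplicates.
            if (out.isEmpty() || !out.get(out.size() - 1).equals(value)) {
                out.add(value);
            }
            if (head[1] + 1 < run.size()) {
                heap.add(new int[] {head[0], head[1] + 1});
            }
        }
        return out;
    }
}
```

With a merge of this kind, the number of distinct elements across memory and disk is only known once the runs have been merged.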