java.lang.Object
  org.apache.hadoop.zebra.io.BasicTable.Reader

public static class BasicTable.Reader

BasicTable reader.
Nested Class Summary | |
---|---|
static class | BasicTable.Reader.RangeSplit - A range-based split on the zebra table. The content of the split is implementation-dependent.
static class | BasicTable.Reader.RowSplit - A row-based split on the zebra table.
Constructor Summary | |
---|---|
BasicTable.Reader(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) | Create a BasicTable reader.
BasicTable.Reader(org.apache.hadoop.fs.Path path, String[] deletedCGs, org.apache.hadoop.conf.Configuration conf) |
Method Summary | |
---|---|
void | close() - Close the BasicTable for reading.
BlockDistribution | getBlockDistribution(BasicTable.Reader.RangeSplit split) - Given a split range, calculate how the file data that fall into the range are distributed among hosts.
BlockDistribution | getBlockDistribution(BasicTable.Reader.RowSplit split) - Given a row-based split, calculate how the file data that fall into the split are distributed among hosts.
String | getDeletedCGs()
static String | getDeletedCGs(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf)
KeyDistribution | getKeyDistribution(int n, int nTables, BlockDistribution lastBd) - Collect some key samples and use them to partition the table.
DataInputStream | getMetaBlock(String name) - Obtain an input stream for reading a meta block.
String | getName(int i)
String | getPath() - Get the path to the table.
org.apache.hadoop.fs.PathFilter | getPathFilter(org.apache.hadoop.conf.Configuration conf) - Get the path filter used by the table.
int | getRowSplitCGIndex() - Get the index of the column group that will be used for row-based split.
TableScanner | getScanner(BasicTable.Reader.RangeSplit split, boolean closeReader) - Get a scanner that reads a consecutive number of rows as defined in the BasicTable.Reader.RangeSplit object, which should be obtained from previous calls of rangeSplit(int).
TableScanner | getScanner(boolean closeReader, BasicTable.Reader.RowSplit rowSplit) - Get a scanner that reads a consecutive number of rows as defined in the BasicTable.Reader.RowSplit object.
TableScanner | getScanner(org.apache.hadoop.io.BytesWritable beginKey, org.apache.hadoop.io.BytesWritable endKey, boolean closeReader) - Get a scanner that reads all rows whose row keys fall in a specific range.
Schema | getSchema() - Get the schema of the table.
static Schema | getSchema(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) - Get the BasicTable schema without loading the full table index.
SortInfo | getSortInfo()
BasicTableStatus | getStatus() - Get the status of the BasicTable.
boolean | isSorted() - Is the table sorted?
List<BasicTable.Reader.RangeSplit> | rangeSplit(int n) - Split the table into at most n parts.
void | rearrangeFileIndices(org.apache.hadoop.fs.FileStatus[] fileStatus) - Rearrange the files according to the column group index ordering.
List<BasicTable.Reader.RowSplit> | rowSplit(long[] starts, long[] lengths, org.apache.hadoop.fs.Path[] paths, int splitCGIndex, int[] batchSizes, int numBatches) - We already use FileInputFormat to create byte offset-based input splits.
void | setProjection(String projection) - Set the projection for the reader.
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public BasicTable.Reader(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) throws IOException
Create a BasicTable reader.
Parameters:
path - The directory path to the BasicTable.
conf - Optional configuration parameters.
Throws:
IOException
public BasicTable.Reader(org.apache.hadoop.fs.Path path, String[] deletedCGs, org.apache.hadoop.conf.Configuration conf) throws IOException
Throws:
IOException
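A minimal usage sketch: the table path below is a placeholder, and a zebra table is assumed to already exist at that location (requires the Hadoop and zebra jars on the classpath):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.zebra.io.BasicTable;

public class ReaderExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Hypothetical table location; replace with a real BasicTable directory.
    Path tablePath = new Path("/user/demo/employees");

    BasicTable.Reader reader = new BasicTable.Reader(tablePath, conf);
    try {
      System.out.println("sorted: " + reader.isSorted());
      System.out.println("schema: " + reader.getSchema());
    } finally {
      reader.close(); // release the resources held by the table
    }
  }
}
```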
Method Detail |
---|
public boolean isSorted()
public SortInfo getSortInfo()
public String getName(int i)
public void setProjection(String projection) throws ParseException, IOException
Set the projection for the reader.
Parameters:
projection - The projection on the BasicTable for subsequent read operations. For this version of the implementation, the projection is a comma-separated list of column names, such as "FirstName, LastName, Sex, Department". To select all columns, pass projection == null.
Throws:
IOException
ParseException
See Also:
getScanner(RangeSplit, boolean), getScanner(BytesWritable, BytesWritable, boolean), getStatus(), getSchema()
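A short sketch of setting a projection before scanning; `reader` is assumed to be an open BasicTable.Reader, and the column names are illustrative (they must exist in the table's schema):

```java
// Restrict subsequent reads to two columns.
reader.setProjection("FirstName, LastName");

// getSchema() now reflects the projection rather than the full table schema.
Schema projected = reader.getSchema();

// To go back to reading all columns, pass null.
reader.setProjection(null);
```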
public BasicTableStatus getStatus() throws IOException
Get the status of the BasicTable.
Throws:
IOException
public BlockDistribution getBlockDistribution(BasicTable.Reader.RangeSplit split) throws IOException
Given a split range, calculate how the file data that fall into the range are distributed among hosts.
Parameters:
split - The range-based split. Can be null to indicate the whole TFile.
Throws:
IOException
See Also:
rangeSplit(int)
public BlockDistribution getBlockDistribution(BasicTable.Reader.RowSplit split) throws IOException
Given a row-based split, calculate how the file data that fall into the split are distributed among hosts.
Parameters:
split - The row-based split. Cannot be null.
Throws:
IOException
public KeyDistribution getKeyDistribution(int n, int nTables, BlockDistribution lastBd) throws IOException
Collect some key samples and use them to partition the table. The returned KeyDistribution object also contains information on how data are distributed for each key-partitioned bucket.
Parameters:
n - Targeted size of the sampling.
nTables - Number of tables in union.
Throws:
IOException
public TableScanner getScanner(org.apache.hadoop.io.BytesWritable beginKey, org.apache.hadoop.io.BytesWritable endKey, boolean closeReader) throws IOException
Get a scanner that reads all rows whose row keys fall in a specific range.
Parameters:
beginKey - The begin key of the scan range. If null, start from the first row in the table.
endKey - The end key of the scan range. If null, scan till the last row in the table.
closeReader - Close the underlying Reader object when we close the scanner. Should be set to true if we have only one scanner on top of the reader, so that resources are released after the scanner is closed.
Throws:
IOException
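A key-range scan might look like the following sketch. It assumes `reader` is an open BasicTable.Reader on a key-sorted table; the iteration methods (`atEnd()`, `getKey(...)`, `getValue(...)`, `advance()`) and the `TypesUtils.createTuple(...)` helper are taken from the zebra TableScanner and types APIs, not from this page:

```java
BytesWritable begin = new BytesWritable("alice".getBytes());
BytesWritable end = new BytesWritable("bob".getBytes());

// closeReader = false: keep the reader usable after this scanner is closed.
TableScanner scanner = reader.getScanner(begin, end, false);
try {
  BytesWritable key = new BytesWritable();
  Tuple row = TypesUtils.createTuple(reader.getSchema());
  while (!scanner.atEnd()) {
    scanner.getKey(key);
    scanner.getValue(row);
    // process (key, row) ...
    scanner.advance();
  }
} finally {
  scanner.close();
}
```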
public TableScanner getScanner(BasicTable.Reader.RangeSplit split, boolean closeReader) throws IOException, ParseException
Get a scanner that reads a consecutive number of rows as defined in the BasicTable.Reader.RangeSplit object, which should be obtained from previous calls of rangeSplit(int).
Parameters:
split - The split range. If null, get a scanner to read the complete table.
closeReader - Close the underlying Reader object when we close the scanner. Should be set to true if we have only one scanner on top of the reader, so that resources are released after the scanner is closed.
Throws:
IOException
ParseException
public TableScanner getScanner(boolean closeReader, BasicTable.Reader.RowSplit rowSplit) throws IOException, ParseException
Get a scanner that reads a consecutive number of rows as defined in the BasicTable.Reader.RowSplit object.
Parameters:
closeReader - Close the underlying Reader object when we close the scanner. Should be set to true if we have only one scanner on top of the reader, so that resources are released after the scanner is closed.
rowSplit - Split based on row numbers.
Throws:
IOException
ParseException
public Schema getSchema()
Get the schema of the table. The returned schema may be different from getSchema(Path, Configuration) if a projection has been set on the table.
public static Schema getSchema(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) throws IOException
Get the BasicTable schema without loading the full table index.
Parameters:
path - The path to the BasicTable.
conf -
Throws:
IOException
public String getPath()
public org.apache.hadoop.fs.PathFilter getPathFilter(org.apache.hadoop.conf.Configuration conf)
public List<BasicTable.Reader.RangeSplit> rangeSplit(int n) throws IOException
Split the table into at most n parts.
Parameters:
n - Maximum number of parts in the output list.
Throws:
IOException
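A sketch of dividing a table for parallel reads; `reader` is assumed to be an open BasicTable.Reader:

```java
// Ask for at most 4 range splits; the returned list may contain fewer.
List<BasicTable.Reader.RangeSplit> splits = reader.rangeSplit(4);

for (BasicTable.Reader.RangeSplit split : splits) {
  // closeReader = false, since the same reader backs every split's scanner.
  TableScanner scanner = reader.getScanner(split, false);
  try {
    // scan the rows of this split ...
  } finally {
    scanner.close();
  }
}
```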
public List<BasicTable.Reader.RowSplit> rowSplit(long[] starts, long[] lengths, org.apache.hadoop.fs.Path[] paths, int splitCGIndex, int[] batchSizes, int numBatches) throws IOException
We already use FileInputFormat to create byte offset-based input splits.
Parameters:
starts - array of starting bytes of fileSplits.
lengths - array of lengths of fileSplits.
paths - array of paths of fileSplits.
splitCGIndex - index of the column group that is used to create fileSplits.
Throws:
IOException
public void rearrangeFileIndices(org.apache.hadoop.fs.FileStatus[] fileStatus) throws IOException
Rearrange the files according to the column group index ordering.
Parameters:
fileStatus - array of FileStatus to be rearranged.
Throws:
IOException
public int getRowSplitCGIndex() throws IOException
Get the index of the column group that will be used for row-based split.
Throws:
IOException
public void close() throws IOException
Close the BasicTable for reading.
Specified by:
close in interface Closeable
Throws:
IOException
public String getDeletedCGs()
public static String getDeletedCGs(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) throws IOException
Throws:
IOException
public DataInputStream getMetaBlock(String name) throws MetaBlockDoesNotExist, IOException
Obtain an input stream for reading a meta block.
Parameters:
name - The name of the meta block.
Throws:
IOException
MetaBlockDoesNotExist
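A sketch of reading a meta block; `reader` is assumed to be an open BasicTable.Reader, "table.stats" is a hypothetical meta block name, and the stream format is whatever the writer of that block chose:

```java
DataInputStream in = reader.getMetaBlock("table.stats");
try {
  // Illustrative only: assumes the block's writer wrote a single long.
  long rows = in.readLong();
} finally {
  in.close();
}
```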