OmniSciDB
bf83d84833
#include <LazyParquetChunkLoader.h>
Public Member Functions | |
LazyParquetChunkLoader (std::shared_ptr< arrow::fs::FileSystem > file_system) | |
std::list< std::unique_ptr < ChunkMetadata > > | loadChunk (const std::vector< RowGroupInterval > &row_group_intervals, const int parquet_column_index, std::list< Chunk_NS::Chunk > &chunks, StringDictionary *string_dictionary=nullptr) |
std::list< RowGroupMetadata > | metadataScan (const std::set< std::string > &file_paths, const ForeignTableSchema &schema) |
Perform a metadata scan for the paths specified. | |
Static Public Member Functions | |
static bool | isColumnMappingSupported (const ColumnDescriptor *omnisci_column, const parquet::ColumnDescriptor *parquet_column) |
Static Public Attributes | |
static const int | batch_reader_num_elements = 4096 |
Private Attributes | |
std::shared_ptr < arrow::fs::FileSystem > | file_system_ |
A lazy Parquet-to-chunk loader.
Definition at line 32 of file LazyParquetChunkLoader.h.
foreign_storage::LazyParquetChunkLoader::LazyParquetChunkLoader | ( | std::shared_ptr< arrow::fs::FileSystem > | file_system | ) |
Definition at line 1478 of file LazyParquetChunkLoader.cpp.
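static bool foreign_storage::LazyParquetChunkLoader::isColumnMappingSupported | ( | const ColumnDescriptor * | omnisci_column, |
const parquet::ColumnDescriptor * | parquet_column |
) | static |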
Determine if a Parquet to OmniSci column mapping is supported.
omnisci_column | - the column descriptor of the OmniSci column |
parquet_column | - the column descriptor of the Parquet column |
Definition at line 1442 of file LazyParquetChunkLoader.cpp.
References foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::validate_array_mapping(), foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::validate_date_mapping(), foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::validate_decimal_mapping(), foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::validate_floating_point_mapping(), foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::validate_geospatial_mapping(), foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::validate_integral_mapping(), foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::validate_none_type_mapping(), foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::validate_string_mapping(), foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::validate_time_mapping(), and foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::validate_timestamp_mapping().
Referenced by foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::validate_allowed_mapping(), and foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::validate_array_mapping().
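The reference list above shows that isColumnMappingSupported() calls one validator per type family. How it combines their results is not stated on this page. The sketch below assumes a mapping is supported when any validator accepts the column pair. `ColumnSketch`, `MappingValidator`, and `is_mapping_supported` are hypothetical names for illustration only and are not part of OmniSciDB.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified column descriptor (assumption); the real
// ColumnDescriptor and parquet::ColumnDescriptor types carry far more detail.
struct ColumnSketch {
  std::string type_name;
};

// A validator decides whether one family of mappings accepts the pair.
using MappingValidator =
    std::function<bool(const ColumnSketch& target, const ColumnSketch& source)>;

// Assumed combination rule: supported if at least one validator accepts.
inline bool is_mapping_supported(const ColumnSketch& target,
                                 const ColumnSketch& source,
                                 const std::vector<MappingValidator>& validators) {
  for (const auto& validator : validators) {
    if (validator(target, source)) {
      return true;
    }
  }
  return false;
}
```

With an empty validator list the sketch rejects every mapping, which keeps the default conservative.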
std::list< std::unique_ptr< ChunkMetadata > > foreign_storage::LazyParquetChunkLoader::loadChunk | ( | const std::vector< RowGroupInterval > & | row_group_intervals, |
const int | parquet_column_index, |
std::list< Chunk_NS::Chunk > & | chunks, |
StringDictionary * | string_dictionary = nullptr |
) |
Load a number of row groups of a column in a Parquet file into a chunk.
row_group_intervals | - inclusive intervals [start,end] that specify the row groups to load |
parquet_column_index | - the logical column index, in the Parquet file (and in OmniSciDB), of the column to load |
chunks | - a list containing the chunks to load |
string_dictionary | - a string dictionary for the column being loaded, if applicable |
NOTE: if more than one chunk is supplied, the first chunk must be the chunk corresponding to the logical column, while the remaining chunks correspond to physical columns (in ascending order of column id). Similarly, if a metadata update is expected, the list of ChunkMetadata pointers returned will correspond directly to the list of chunks.
Definition at line 1482 of file LazyParquetChunkLoader.cpp.
References foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::append_row_groups(), CHECK, and file_system_.
Referenced by foreign_storage::ParquetDataWrapper::loadBuffersUsingLazyParquetChunkLoader().
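Because each interval is inclusive, an interval [start,end] covers end - start + 1 row groups. The sketch below works through that arithmetic. `IntervalSketch` is a hypothetical stand-in for RowGroupInterval: the real type's fields are not shown on this page, so the field names here are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for RowGroupInterval; field names are assumptions.
struct IntervalSketch {
  std::string file_path;
  int start_index;  // first row group to load, inclusive
  int end_index;    // last row group to load, inclusive
};

// Count the row groups covered by a list of inclusive intervals.
inline std::size_t count_row_groups(const std::vector<IntervalSketch>& intervals) {
  std::size_t total = 0;
  for (const auto& interval : intervals) {
    // An inclusive interval is only well formed when end >= start.
    assert(interval.end_index >= interval.start_index);
    total += static_cast<std::size_t>(interval.end_index - interval.start_index) + 1;
  }
  return total;
}
```

For example, the intervals [0,2] and [5,5] together cover four row groups; a single-element interval such as [5,5] still loads one row group.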
std::list< RowGroupMetadata > foreign_storage::LazyParquetChunkLoader::metadataScan | ( | const std::set< std::string > & | file_paths, |
const ForeignTableSchema & | schema |
) |
Perform a metadata scan for the paths specified.
file_paths | - (ordered) files of the metadata scan |
schema | - schema of the foreign table to perform metadata scan for |
Returns a list of the row group metadata extracted from file_paths
Definition at line 1508 of file LazyParquetChunkLoader.cpp.
References CHECK, file_system_, foreign_storage::get_parquet_table_size(), foreign_storage::ForeignTableSchema::getLogicalAndPhysicalColumns(), foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::metadata_scan_rowgroup_interval(), foreign_storage::open_parquet_table(), foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::populate_encoder_map(), foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::validate_equal_schema(), and foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::validate_parquet_metadata().
Referenced by foreign_storage::ParquetDataWrapper::metadataScanFiles().
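The file_paths parameter is a std::set< std::string >, which always iterates in lexicographic order; that is what makes the files "ordered". The sketch below shows the resulting order. `scan_order` is a hypothetical helper for illustration and is not part of the loader.

```cpp
#include <set>
#include <string>
#include <vector>

// Return the paths in the order a std::set<std::string> iterates them,
// which is ascending lexicographic order regardless of insertion order.
inline std::vector<std::string> scan_order(const std::set<std::string>& file_paths) {
  return std::vector<std::string>(file_paths.begin(), file_paths.end());
}
```

Note that lexicographic order is not numeric order: "part-10.parquet" sorts before "part-2.parquet", so zero-padded file names may be preferable when the scan order matters.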
const int foreign_storage::LazyParquetChunkLoader::batch_reader_num_elements = 4096 | static |
Definition at line 37 of file LazyParquetChunkLoader.h.
Referenced by foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::append_row_groups(), and foreign_storage::anonymous_namespace{LazyParquetChunkLoader.cpp}::resize_values_buffer().
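This page does not show how batch_reader_num_elements is used, but the references above suggest it caps the number of values read per batch. The sketch below makes that assumption and splits a column of `num_values` values into reads of at most 4096 elements. `kBatchSize` and `batch_sizes` are hypothetical names, not part of the loader.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the documented value of batch_reader_num_elements.
constexpr std::int64_t kBatchSize = 4096;

// Split `num_values` into read sizes of at most `batch_size` elements each.
inline std::vector<std::int64_t> batch_sizes(std::int64_t num_values,
                                             std::int64_t batch_size = kBatchSize) {
  assert(num_values >= 0 && batch_size > 0);
  std::vector<std::int64_t> sizes;
  for (std::int64_t remaining = num_values; remaining > 0;) {
    const std::int64_t n = std::min(remaining, batch_size);
    sizes.push_back(n);
    remaining -= n;
  }
  return sizes;
}
```

Under this assumption, 10000 values are read as two full batches of 4096 followed by a final batch of 1808.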
std::shared_ptr< arrow::fs::FileSystem > foreign_storage::LazyParquetChunkLoader::file_system_ | private |
Definition at line 94 of file LazyParquetChunkLoader.h.
Referenced by loadChunk(), and metadataScan().