OmniSciDB  fe05a0c208
File_Namespace::CachingFileMgr Class Reference

A FileMgr capable of limiting its size and storing data from multiple tables in a shared directory. For any table that supports DiskCaching, the CachingFileMgr must contain either metadata for all table chunks, or for none (the cache either has no knowledge of that table, or has complete knowledge of that table). Any data chunk within a table may or may not be contained within the cache.

#include <CachingFileMgr.h>


Public Member Functions

 CachingFileMgr (const std::string &base_path, const size_t num_reader_threads=0, const size_t default_page_size=DEFAULT_PAGE_SIZE)
 
virtual ~CachingFileMgr ()
 
MgrType getMgrType () override
 
std::string getStringMgrType () override
 
size_t getDefaultPageSize ()
 
size_t getMaxSize () override
 
size_t getInUseSize () override
 
size_t getAllocated () override
 
bool isAllocationCapped () override
 
void clearForTable (int32_t db_id, int32_t tb_id)
 Removes all data related to the given table (pages and subdirectories).
 
std::string getOrAddTableDir (int db_id, int tb_id)
 Returns (and optionally creates) a subdirectory for table-specific persistent data (e.g. serialized foreign data wrappers).
 
bool hasFileMgrKey () const override
 Query to determine if the contained pages will have their database and table ids overridden by the filemgr key (FileMgr does this).
 
void closeRemovePhysical () override
 Closes files and removes the caching directory.
 
uint64_t getChunkSpaceReservedByTable (int db_id, int tb_id)
 
uint64_t getMetadataSpaceReservedByTable (int db_id, int tb_id)
 
uint64_t getWrapperSpaceReservedByTable (int db_id, int tb_id)
 
uint64_t getSpaceReservedByTable (int db_id, int tb_id)
 
std::string describeSelf () override
 describes this FileMgr for logging purposes.
 
void checkpoint (const int32_t db_id, const int32_t tb_id) override
 writes buffers for the given table, synchronizes files to disk, updates file epoch, and commits free pages.
 
int32_t epoch (int32_t db_id, int32_t tb_id) const override
 obtain the epoch version for the given table.
 
FileBuffer * putBuffer (const ChunkKey &key, AbstractBuffer *srcBuffer, const size_t numBytes=0) override
 deletes any existing buffer for the given key then copies in a new one.
 
CachingFileBuffer * allocateBuffer (const size_t page_size, const ChunkKey &key, const size_t num_bytes) override
 allocates a new CachingFileBuffer.
 
bool updatePageIfDeleted (FileInfo *file_info, ChunkKey &chunk_key, int32_t contingent, int32_t page_epoch, int32_t page_num) override
 checks whether a page should be deleted.
 
bool failOnReadError () const override
 True if a read error should cause a fatal error.
 
void deleteBufferIfExists (const ChunkKey &key)
 deletes a buffer if it exists in the mgr. Otherwise do nothing.
 
- Public Member Functions inherited from File_Namespace::FileMgr
 FileMgr (const int32_t deviceId, GlobalFileMgr *gfm, const TablePair fileMgrKey, const int32_t max_rollback_epochs=-1, const size_t num_reader_threads=0, const int32_t epoch=-1, const size_t defaultPageSize=DEFAULT_PAGE_SIZE)
 Constructor.
 
 FileMgr (const int32_t deviceId, GlobalFileMgr *gfm, const TablePair fileMgrKey, const size_t defaultPageSize, const bool runCoreInit)
 
 FileMgr (GlobalFileMgr *gfm, const size_t defaultPageSize, std::string basePath)
 
virtual ~FileMgr () override
 Destructor.
 
StorageStats getStorageStats ()
 
FileBuffer * createBuffer (const ChunkKey &key, size_t pageSize=0, const size_t numBytes=0) override
 Creates a chunk with the specified key and page size.
 
bool isBufferOnDevice (const ChunkKey &key) override
 
void deleteBuffer (const ChunkKey &key, const bool purge=true) override
 Deletes the chunk with the specified key.
 
void deleteBuffersWithPrefix (const ChunkKey &keyPrefix, const bool purge=true) override
 
FileBuffer * getBuffer (const ChunkKey &key, const size_t numBytes=0) override
 Returns a pointer to the chunk with the specified key.
 
void fetchBuffer (const ChunkKey &key, AbstractBuffer *destBuffer, const size_t numBytes) override
 
FileBuffer * putBuffer (const ChunkKey &key, AbstractBuffer *d, const size_t numBytes=0) override
 Puts the contents of d into the Chunk with the given key.
 
AbstractBuffer * alloc (const size_t numBytes) override
 
void free (AbstractBuffer *buffer) override
 
Page requestFreePage (size_t pagesize, const bool isMetadata)
 
MgrType getMgrType () override
 
std::string getStringMgrType () override
 
std::string printSlabs () override
 
void clearSlabs () override
 
size_t getMaxSize () override
 
size_t getInUseSize () override
 
size_t getAllocated () override
 
bool isAllocationCapped () override
 
FileInfo * getFileInfoForFileId (const int32_t fileId)
 
FileMetadata getMetadataForFile (const boost::filesystem::directory_iterator &fileIterator)
 
void init (const size_t num_reader_threads, const int32_t epochOverride)
 
void init (const std::string &dataPathToConvertFrom, const int32_t epochOverride)
 
void copyPage (Page &srcPage, FileMgr *destFileMgr, Page &destPage, const size_t reservedHeaderSize, const size_t numBytes, const size_t offset)
 
void requestFreePages (size_t npages, size_t pagesize, std::vector< Page > &pages, const bool isMetadata)
 Obtains free pages (creating new files if necessary) of the requested size.
 
void getChunkMetadataVecForKeyPrefix (ChunkMetadataVector &chunkMetadataVec, const ChunkKey &keyPrefix) override
 
void checkpoint () override
 Fsyncs data files, writes out epoch and fsyncs that.
 
void checkpoint (const int32_t db_id, const int32_t tb_id) override
 
int32_t epochFloor () const
 
int32_t incrementEpoch ()
 
int32_t lastCheckpointedEpoch ()
 Returns value of epoch at last checkpoint.
 
int32_t maxRollbackEpochs ()
 Returns the value of max_rollback_epochs.
 
size_t getNumReaderThreads ()
 Returns the number of threads defined by the parameter num-reader-threads, which should be used during the initial load and subsequent reads of data.
 
FILE * getFileForFileId (const int32_t fileId)
 Returns FILE pointer associated with requested fileId.
 
size_t getNumChunks () override
 
size_t getNumUsedPages () const
 
size_t getNumUsedMetadataPages () const
 
size_t getNumUsedMetadataPagesForChunkKey (const ChunkKey &chunkKey) const
 
int32_t getDBVersion () const
 Index for looking up chunks.
 
bool getDBConvert () const
 
void createTopLevelMetadata ()
 
std::string getFileMgrBasePath () const
 
void removeTableRelatedDS (const int32_t db_id, const int32_t table_id) override
 
void free_page (std::pair< FileInfo *, int32_t > &&page)
 
const TablePair get_fileMgrKey () const
 
boost::filesystem::path getFilePath (const std::string &file_name)
 
void writePageMappingsToStatusFile (const std::vector< PageMapping > &page_mappings)
 
void renameCompactionStatusFile (const char *const from_status, const char *const to_status)
 
void compactFiles ()
 

Private Member Functions

void openOrCreateEpochIfNotExists (int32_t db_id, int32_t tb_id)
 
void openAndReadEpochFileUnlocked (int32_t db_id, int32_t tb_id)
 
void incrementEpoch (int32_t db_id, int32_t tb_id)
 
void init (const size_t num_reader_threads)
 
void createEpochFileUnlocked (int32_t db_id, int32_t tb_id)
 
void writeAndSyncEpochToDisk (int32_t db_id, int32_t tb_id)
 
std::string getOrAddTableDirUnlocked (int db_id, int tb_id)
 
void readTableDirs ()
 
void createBufferFromHeaders (const ChunkKey &key, const std::vector< HeaderInfo >::const_iterator &startIt, const std::vector< HeaderInfo >::const_iterator &endIt)
 
FileBuffer * createBufferUnlocked (const ChunkKey &key, size_t pageSize=0, const size_t numBytes=0) override
 
void incrementAllEpochs ()
 
void removeTableDirectory (int32_t db_id, int32_t tb_id)
 
void removeTableBuffers (int32_t db_id, int32_t tb_id)
 
void writeDirtyBuffers (int32_t db_id, int32_t tb_id)
 

Private Attributes

mapd_shared_mutex epochs_mutex_
 
std::map< TablePair,
std::unique_ptr< EpochInfo > > 
table_epochs_
 

Additional Inherited Members

- Static Public Member Functions inherited from File_Namespace::FileMgr
static void setNumPagesPerDataFile (size_t num_pages)
 
static void setNumPagesPerMetadataFile (size_t num_pages)
 
- Public Attributes inherited from File_Namespace::FileMgr
ChunkKeyToChunkMap chunkIndex_
 
- Static Public Attributes inherited from File_Namespace::FileMgr
static constexpr size_t DEFAULT_NUM_PAGES_PER_DATA_FILE {256}
 
static constexpr size_t DEFAULT_NUM_PAGES_PER_METADATA_FILE {4096}
 
static constexpr char const * COPY_PAGES_STATUS {"pending_data_compaction_0"}
 
static constexpr char const * UPDATE_PAGE_VISIBILITY_STATUS {"pending_data_compaction_1"}
 
static constexpr char const * DELETE_EMPTY_FILES_STATUS {"pending_data_compaction_2"}
 
static constexpr char LEGACY_EPOCH_FILENAME [] = "epoch"
 
static constexpr char EPOCH_FILENAME [] = "epoch_metadata"
 
static constexpr char DB_META_FILENAME [] = "dbmeta"
 
static constexpr char FILE_MGR_VERSION_FILENAME [] = "filemgr_version"
 
static constexpr int32_t INVALID_VERSION = -1
 
- Protected Member Functions inherited from File_Namespace::FileMgr
 FileMgr ()
 
FileInfo * createFile (const size_t pageSize, const size_t numPages)
 Adds a file to the file manager repository.
 
FileInfo * openExistingFile (const std::string &path, const int32_t fileId, const size_t pageSize, const size_t numPages, std::vector< HeaderInfo > &headerVec)
 
void createEpochFile (const std::string &epochFileName)
 
int32_t openAndReadLegacyEpochFile (const std::string &epochFileName)
 
void openAndReadEpochFile (const std::string &epochFileName)
 
void writeAndSyncEpochToDisk ()
 
void setEpoch (const int32_t newEpoch)
 
int32_t readVersionFromDisk (const std::string &versionFileName) const
 
void writeAndSyncVersionToDisk (const std::string &versionFileName, const int32_t version)
 
void processFileFutures (std::vector< std::future< std::vector< HeaderInfo >>> &file_futures, std::vector< HeaderInfo > &headerVec)
 
void migrateToLatestFileMgrVersion ()
 
void migrateEpochFileV0 ()
 
OpenFilesResult openFiles ()
 
void clearFileInfos ()
 
void copySourcePageForCompaction (const Page &source_page, FileInfo *destination_file_info, std::vector< PageMapping > &page_mappings, std::set< Page > &touched_pages)
 
int32_t copyPageWithoutHeaderSize (const Page &source_page, const Page &destination_page)
 
void sortAndCopyFilePagesForCompaction (size_t page_size, std::vector< PageMapping > &page_mappings, std::set< Page > &touched_pages)
 
void updateMappedPagesVisibility (const std::vector< PageMapping > &page_mappings)
 
void deleteEmptyFiles ()
 
void resumeFileCompaction (const std::string &status_file_name)
 
std::vector< PageMapping > readPageMappingsFromStatusFile ()
 
 FileMgr (const int epoch)
 
void closePhysicalUnlocked ()
 
void syncFilesToDisk ()
 
void freePages ()
 
void initializeNumThreads (size_t num_reader_threads=0)
 
void deleteBufferUnlocked (const ChunkKeyToChunkMap::iterator chunk_it, const bool purge=true)
 
- Protected Attributes inherited from File_Namespace::FileMgr
int32_t maxRollbackEpochs_
 
std::string fileMgrBasePath_
 
std::map< int32_t, FileInfo * > files_
 A map of files accessible via a file identifier.
 
PageSizeFileMMap fileIndex_
 Maps page sizes to FileInfo objects.
 
size_t num_reader_threads_
 number of threads used when loading data
 
size_t defaultPageSize_
 
unsigned nextFileId_
 the index of the next file id
 
int32_t db_version_
 
int32_t fileMgrVersion_
 
const int32_t latestFileMgrVersion_ {1}
 
FILE * DBMetaFile_ = nullptr
 pointer to DB level metadata
 
std::mutex getPageMutex_
 
mapd_shared_mutex chunkIndexMutex_
 
mapd_shared_mutex files_rw_mutex_
 
mapd_shared_mutex mutex_free_page_
 
std::vector< std::pair
< FileInfo *, int32_t > > 
free_pages_
 
bool isFullyInitted_ {false}
 
- Static Protected Attributes inherited from File_Namespace::FileMgr
static size_t num_pages_per_data_file_ {DEFAULT_NUM_PAGES_PER_DATA_FILE}
 
static size_t num_pages_per_metadata_file_ {DEFAULT_NUM_PAGES_PER_METADATA_FILE}
 

Detailed Description

A FileMgr capable of limiting its size and storing data from multiple tables in a shared directory. For any table that supports DiskCaching, the CachingFileMgr must contain either metadata for all table chunks, or for none (the cache either has no knowledge of that table, or has complete knowledge of that table). Any data chunk within a table may or may not be contained within the cache.

Definition at line 87 of file CachingFileMgr.h.
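
The all-or-nothing metadata rule in the description can be illustrated with a small model. This is a sketch only, not OmniSciDB code: `TableCacheState` and `is_valid` are hypothetical names. It also assumes that a table unknown to the cache holds no cached data chunks, which the description implies but does not state.

```cpp
#include <cassert>
#include <optional>
#include <set>

// Hypothetical model of what a cache may hold for one table.
struct TableCacheState {
  // Empty optional: the cache knows nothing about the table.
  // Set: ids of the chunks whose metadata is cached (must be all of them).
  std::optional<std::set<int>> chunks_with_metadata;
  // Chunks whose data pages are cached; any subset is acceptable.
  std::set<int> chunks_with_data;
};

// Checks the invariant against the table's full list of chunk ids.
bool is_valid(const TableCacheState& state, const std::set<int>& all_chunks) {
  if (!state.chunks_with_metadata) {
    // Assumption: no knowledge of the table means no cached data either.
    return state.chunks_with_data.empty();
  }
  if (*state.chunks_with_metadata != all_chunks) {
    return false;  // partial metadata is not allowed
  }
  for (int chunk : state.chunks_with_data) {
    if (all_chunks.count(chunk) == 0) {
      return false;  // cached data for a chunk the table does not have
    }
  }
  return true;
}
```

A state with metadata for every chunk and data for only one of them passes the check; a state with metadata for only some chunks does not.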

Constructor & Destructor Documentation

File_Namespace::CachingFileMgr::CachingFileMgr ( const std::string &  base_path,
const size_t  num_reader_threads = 0,
const size_t  default_page_size = DEFAULT_PAGE_SIZE 
)

Definition at line 28 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::defaultPageSize_, File_Namespace::FileMgr::fileMgrBasePath_, init(), File_Namespace::FileMgr::maxRollbackEpochs_, and File_Namespace::FileMgr::nextFileId_.

30  {
31  fileMgrBasePath_ = base_path;
33  defaultPageSize_ = default_page_size;
34  nextFileId_ = 0;
35  init(num_reader_threads);
36 }

File_Namespace::CachingFileMgr::~CachingFileMgr ( )
virtual

Definition at line 38 of file CachingFileMgr.cpp.

38 {}

Member Function Documentation

CachingFileBuffer * File_Namespace::CachingFileMgr::allocateBuffer ( const size_t  page_size,
const ChunkKey &  key,
const size_t  num_bytes 
)
override virtual

allocates a new CachingFileBuffer.

Reimplemented from File_Namespace::FileMgr.

Definition at line 352 of file CachingFileMgr.cpp.

354  {
355  return new CachingFileBuffer(this, page_size, key, num_bytes);
356 }
void File_Namespace::CachingFileMgr::checkpoint ( const int32_t  db_id,
const int32_t  tb_id 
)
override

writes buffers for the given table, synchronizes files to disk, updates file epoch, and commits free pages.

Definition at line 266 of file CachingFileMgr.cpp.

References VLOG.

266  {
267  VLOG(2) << "Checkpointing " << describeSelf() << " (" << db_id << ", " << tb_id
268  << ") epoch: " << epoch(db_id, tb_id);
269  writeDirtyBuffers(db_id, tb_id);
270  syncFilesToDisk();
271  writeAndSyncEpochToDisk(db_id, tb_id);
272  incrementEpoch(db_id, tb_id);
273  freePages();
274 }
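
The order of operations in the listing above (write dirty buffers, sync, persist the epoch, advance it, then release freed pages) can be modelled in a few lines. The sketch below is a toy, not the real implementation: `ToyCache` and all of its members are hypothetical, and plain containers stand in for the disk.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical toy model; none of these names exist in OmniSciDB.
using TableKey = std::pair<int32_t, int32_t>;  // (db_id, tb_id)

struct ToyCache {
  std::map<TableKey, int32_t> epochs;            // in-memory epoch per table
  std::map<TableKey, int32_t> persisted_epochs;  // epoch last written to "disk"
  std::map<TableKey, std::vector<int>> dirty;    // page ids not yet written
  std::vector<int> written_pages;                // pages flushed to "disk"
  std::vector<int> pending_free;                 // pages waiting to be released
  std::vector<int> free_list;                    // pages available for reuse

  // Follows the same order as the listing, for one table only.
  void checkpoint(int32_t db_id, int32_t tb_id) {
    const TableKey key{db_id, tb_id};
    // 1. Write this table's dirty buffers.
    auto& pages = dirty[key];
    written_pages.insert(written_pages.end(), pages.begin(), pages.end());
    pages.clear();
    // 2. A real implementation would sync the data files to disk here.
    // 3. Persist the current epoch, then 4. advance it in memory.
    persisted_epochs[key] = epochs[key];
    ++epochs[key];
    // 5. Release pages that were queued for freeing (shared by all tables).
    free_list.insert(free_list.end(), pending_free.begin(), pending_free.end());
    pending_free.clear();
  }
};
```

In this model, checkpointing table (1, 2) leaves the epoch and dirty pages of every other table untouched, which is the point of the per-table overload.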
void File_Namespace::CachingFileMgr::clearForTable ( int32_t  db_id,
int32_t  tb_id 
)

Removes all data related to the given table (pages and subdirectories).

Definition at line 176 of file CachingFileMgr.cpp.

176  {
177  removeTableBuffers(db_id, tb_id);
178  removeTableDirectory(db_id, tb_id);
179  freePages();
180 }
void File_Namespace::CachingFileMgr::closeRemovePhysical ( )
override virtual

Closes files and removes the caching directory.

Reimplemented from File_Namespace::FileMgr.

Definition at line 201 of file CachingFileMgr.cpp.

201  {
202  mapd_unique_lock<mapd_shared_mutex> write_lock(files_rw_mutex_);
204  table_epochs_.clear();
205  auto dir_name = getFileMgrBasePath();
206  if (bf::exists(dir_name)) {
207  bf::remove_all(dir_name);
208  }
209 }
void File_Namespace::CachingFileMgr::createBufferFromHeaders ( const ChunkKey &  key,
const std::vector< HeaderInfo >::const_iterator &  startIt,
const std::vector< HeaderInfo >::const_iterator &  endIt 
)
private

Definition at line 285 of file CachingFileMgr.cpp.

References CHECK, and get_table_prefix().

Referenced by init().

288  {
289  if (startIt->pageId == -1) {
290  // If the first pageId is not -1 then there is no metadata page for the
291  // current key (which means it was never checkpointed), so we should skip.
292 
293  // Need to acquire chunkIndexMutex_ lock first to avoid lock order cycles.
294  mapd_unique_lock<mapd_shared_mutex> chunk_lock(chunkIndexMutex_);
295  auto [db_id, tb_id] = get_table_prefix(key);
296  mapd_shared_lock<mapd_shared_mutex> epochs_lock(epochs_mutex_);
297  CHECK(table_epochs_.find({db_id, tb_id}) != table_epochs_.end());
298  CHECK(chunkIndex_.find(key) == chunkIndex_.end());
299  chunkIndex_[key] = new CachingFileBuffer(this, key, startIt, endIt);
300 
301  auto buffer = chunkIndex_.at(key);
302  if (buffer->isMissingPages()) {
303  // Detect the case where a page is missing by comparing the amount of pages read
304  // with the metadata size. If data are missing, discard the chunk.
305  buffer->freeChunkPages();
306  }
307  }
308 }

FileBuffer * File_Namespace::CachingFileMgr::createBufferUnlocked ( const ChunkKey &  key,
size_t  pageSize = 0,
const size_t  numBytes = 0 
)
override private virtual

Reimplemented from File_Namespace::FileMgr.

Definition at line 276 of file CachingFileMgr.cpp.

References get_table_prefix().

278  {
279  auto [db_id, tb_id] = get_table_prefix(key);
280  // We need to have an epoch to correspond to each table for which we have buffers.
281  openOrCreateEpochIfNotExists(db_id, tb_id);
282  return FileMgr::createBufferUnlocked(key, page_size, num_bytes);
283 }

void File_Namespace::CachingFileMgr::createEpochFileUnlocked ( int32_t  db_id,
int32_t  tb_id 
)
private

Definition at line 147 of file CachingFileMgr.cpp.

References Epoch::byte_size(), CHECK, and File_Namespace::create().

147  {
148  std::string epoch_file_path(getOrAddTableDirUnlocked(db_id, tb_id) + "/" +
149  EPOCH_FILENAME);
150  CHECK(!bf::exists(epoch_file_path)) << "Can't create epoch file. File already exists";
151  TablePair table_pair{db_id, tb_id};
152  table_epochs_.emplace(
153  table_pair,
154  std::make_unique<EpochInfo>(create(epoch_file_path, sizeof(Epoch::byte_size()))));
155  writeAndSyncEpochToDisk(db_id, tb_id);
156  table_epochs_.at(table_pair)->increment();
157 }

void File_Namespace::CachingFileMgr::deleteBufferIfExists ( const ChunkKey &  key)

deletes a buffer if it exists in the mgr. Otherwise do nothing.

Definition at line 394 of file CachingFileMgr.cpp.

394  {
395  mapd_unique_lock<mapd_shared_mutex> chunk_index_write_lock(chunkIndexMutex_);
396  auto chunk_it = chunkIndex_.find(key);
397  if (chunk_it != chunkIndex_.end()) {
398  deleteBufferUnlocked(chunk_it);
399  }
400 }
std::string File_Namespace::CachingFileMgr::describeSelf ( )
override virtual

describes this FileMgr for logging purposes.

Reimplemented from File_Namespace::FileMgr.

Definition at line 261 of file CachingFileMgr.cpp.

261  {
262  return "cache";
263 }
int32_t File_Namespace::CachingFileMgr::epoch ( int32_t  db_id,
int32_t  tb_id 
) const
override virtual

obtain the epoch version for the given table.

Reimplemented from File_Namespace::FileMgr.

Definition at line 100 of file CachingFileMgr.cpp.

References CHECK.

100  {
101  mapd_shared_lock<mapd_shared_mutex> read_lock(epochs_mutex_);
102  auto table_epoch_it = table_epochs_.find({db_id, tb_id});
103  CHECK(table_epoch_it != table_epochs_.end());
104  auto& [pair, epochInfo] = *table_epoch_it;
105  return static_cast<int32_t>(epochInfo->epoch.ceiling());
106 }
bool File_Namespace::CachingFileMgr::failOnReadError ( ) const
inline override virtual

True if a read error should cause a fatal error.

Reimplemented from File_Namespace::FileMgr.

Definition at line 185 of file CachingFileMgr.h.

185 { return false; }
size_t File_Namespace::CachingFileMgr::getAllocated ( )
inline override

Definition at line 109 of file CachingFileMgr.h.

References UNREACHABLE.

109  {
110  UNREACHABLE() << "Unimplemented";
111  return 0;
112  }
uint64_t File_Namespace::CachingFileMgr::getChunkSpaceReservedByTable ( int  db_id,
int  tb_id 
)

Set of functions to determine how much space is reserved in a table by type.

Definition at line 211 of file CachingFileMgr.cpp.

211  {
212  mapd_shared_lock<mapd_shared_mutex> read_lock(chunkIndexMutex_);
213  uint64_t space_used = 0;
214  ChunkKey min_table_key{db_id, tb_id};
215  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
216  for (auto it = chunkIndex_.lower_bound(min_table_key);
217  it != chunkIndex_.upper_bound(max_table_key);
218  ++it) {
219  auto& [key, buffer] = *it;
220  space_used += (buffer->numChunkPages() * defaultPageSize_);
221  }
222  return space_used;
223 }
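
The listing above relies on a prefix range scan: because `ChunkKey` is a `std::vector<int>` and vectors compare lexicographically, all keys that start with `{db_id, tb_id}` sit next to each other in an ordered map. The following standalone sketch shows the same technique on a plain `std::map`; `Key` and `sum_for_table` are hypothetical names, and the mapped value is a byte count rather than a buffer pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <vector>

using Key = std::vector<int>;  // same shape as ChunkKey in the listing

// Hypothetical helper: sums the values of all entries whose key starts with
// {db_id, tb_id}. Those entries form one contiguous range in the ordered map.
uint64_t sum_for_table(const std::map<Key, uint64_t>& index, int db_id, int tb_id) {
  const Key min_key{db_id, tb_id};
  const Key max_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
  uint64_t total = 0;
  // The end of the range is computed once instead of on every iteration.
  const auto end = index.upper_bound(max_key);
  for (auto it = index.lower_bound(min_key); it != end; ++it) {
    total += it->second;
  }
  return total;
}
```

With entries for tables (1, 1), (1, 2) and (1, 3), a call for (1, 2) visits only the (1, 2) keys. Hoisting `upper_bound` out of the loop condition is a small change from the listing and does not alter the result.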
size_t File_Namespace::CachingFileMgr::getDefaultPageSize ( )
inline

Definition at line 97 of file CachingFileMgr.h.

References File_Namespace::FileMgr::defaultPageSize_.

97 { return defaultPageSize_; }
size_t File_Namespace::CachingFileMgr::getInUseSize ( )
inline override

Definition at line 105 of file CachingFileMgr.h.

References UNREACHABLE.

105  {
106  UNREACHABLE() << "Unimplemented";
107  return 0;
108  }
size_t File_Namespace::CachingFileMgr::getMaxSize ( )
inline override

Definition at line 101 of file CachingFileMgr.h.

References UNREACHABLE.

101  {
102  UNREACHABLE() << "Unimplemented";
103  return 0;
104  }
uint64_t File_Namespace::CachingFileMgr::getMetadataSpaceReservedByTable ( int  db_id,
int  tb_id 
)

Definition at line 225 of file CachingFileMgr.cpp.

References METADATA_PAGE_SIZE.

225  {
226  mapd_shared_lock<mapd_shared_mutex> read_lock(chunkIndexMutex_);
227  uint64_t space_used = 0;
228  ChunkKey min_table_key{db_id, tb_id};
229  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
230  for (auto it = chunkIndex_.lower_bound(min_table_key);
231  it != chunkIndex_.upper_bound(max_table_key);
232  ++it) {
233  auto& [key, buffer] = *it;
234  space_used += (buffer->numMetadataPages() * METADATA_PAGE_SIZE);
235  }
236  return space_used;
237 }
MgrType File_Namespace::CachingFileMgr::getMgrType ( )
inline override

Definition at line 95 of file CachingFileMgr.h.

95 { return CACHING_FILE_MGR; };
std::string File_Namespace::CachingFileMgr::getOrAddTableDir ( int  db_id,
int  tb_id 
)

Returns (and optionally creates) a subdirectory for table-specific persistent data (e.g. serialized foreign data warppers).

Definition at line 182 of file CachingFileMgr.cpp.

182  {
183  mapd_unique_lock<mapd_shared_mutex> write_lock(files_rw_mutex_);
184  return getOrAddTableDirUnlocked(db_id, tb_id);
185 }
std::string File_Namespace::CachingFileMgr::getOrAddTableDirUnlocked ( int  db_id,
int  tb_id 
)
private

Definition at line 187 of file CachingFileMgr.cpp.

References logger::FATAL, File_Namespace::get_dir_name_for_table(), and LOG.

187  {
188  std::string table_dir =
189  getFileMgrBasePath() + "/" + get_dir_name_for_table(db_id, tb_id);
190  if (!bf::exists(table_dir)) {
191  bf::create_directory(table_dir);
192  } else {
193  if (!bf::is_directory(table_dir)) {
194  LOG(FATAL) << "Specified path '" << table_dir
195  << "' for cache table data is not a directory.";
196  }
197  }
198  return table_dir;
199 }
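
The get-or-create pattern in the listing above can be reproduced with the standard library. This is a sketch under assumptions: `get_or_add_dir` is a hypothetical helper, `std::filesystem` stands in for the `boost::filesystem` (`bf`) calls, and an exception replaces `LOG(FATAL)`.

```cpp
#include <cassert>
#include <filesystem>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: returns base/name, creating the directory if needed.
// Throws instead of logging a fatal error when the path is not a directory.
std::string get_or_add_dir(const fs::path& base, const std::string& name) {
  const fs::path dir = base / name;
  if (!fs::exists(dir)) {
    fs::create_directory(dir);  // the parent directory must already exist
  } else if (!fs::is_directory(dir)) {
    throw std::runtime_error("'" + dir.string() + "' is not a directory");
  }
  return dir.string();
}
```

Calling the helper twice with the same arguments returns the same path; the second call finds the directory and creates nothing.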

uint64_t File_Namespace::CachingFileMgr::getSpaceReservedByTable ( int  db_id,
int  tb_id 
)

Definition at line 254 of file CachingFileMgr.cpp.

254  {
255  auto chunk_space = getChunkSpaceReservedByTable(db_id, tb_id);
256  auto meta_space = getMetadataSpaceReservedByTable(db_id, tb_id);
257  auto wrapper_space = getWrapperSpaceReservedByTable(db_id, tb_id);
258  return chunk_space + meta_space + wrapper_space;
259 }
std::string File_Namespace::CachingFileMgr::getStringMgrType ( )
inline override

Definition at line 96 of file CachingFileMgr.h.

96 { return ToString(CACHING_FILE_MGR); }
uint64_t File_Namespace::CachingFileMgr::getWrapperSpaceReservedByTable ( int  db_id,
int  tb_id 
)

Definition at line 239 of file CachingFileMgr.cpp.

References omnisci::file_size(), and File_Namespace::get_dir_name_for_table().

239  {
240  mapd_shared_lock<mapd_shared_mutex> read_lock(files_rw_mutex_);
241  uint64_t space_used = 0;
242  std::string table_dir =
243  getFileMgrBasePath() + "/" + get_dir_name_for_table(db_id, tb_id);
244  if (bf::exists(table_dir)) {
245  for (const auto& file : bf::recursive_directory_iterator(table_dir)) {
246  if (bf::is_regular_file(file.path())) {
247  space_used += bf::file_size(file.path());
248  }
249  }
250  }
251  return space_used;
252 }
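
The directory walk in the listing above adds up the sizes of all regular files beneath the table directory. A minimal standalone version, assuming `std::filesystem` in place of `boost::filesystem`, might look like this; `dir_size_bytes` and `write_bytes` are hypothetical helpers, the latter existing only to create sample files.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: total size in bytes of all regular files under dir,
// or 0 when dir does not exist.
uint64_t dir_size_bytes(const fs::path& dir) {
  uint64_t total = 0;
  if (fs::exists(dir)) {
    for (const auto& entry : fs::recursive_directory_iterator(dir)) {
      if (entry.is_regular_file()) {
        total += entry.file_size();
      }
    }
  }
  return total;
}

// Hypothetical helper used only to build sample data: writes n bytes to file.
void write_bytes(const fs::path& file, std::size_t n) {
  std::ofstream out(file, std::ios::binary);
  out << std::string(n, 'x');
}
```

Directories themselves contribute nothing to the total; only regular files are counted, including those in nested subdirectories.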

bool File_Namespace::CachingFileMgr::hasFileMgrKey ( ) const
inline override virtual

Query to determine if the contained pages will have their database and table ids overriden by the filemgr key (FileMgr does this).

Reimplemented from File_Namespace::FileMgr.

Definition at line 130 of file CachingFileMgr.h.

130 { return false; }
void File_Namespace::CachingFileMgr::incrementAllEpochs ( )
private

Definition at line 323 of file CachingFileMgr.cpp.

Referenced by init().

323  {
324  mapd_shared_lock<mapd_shared_mutex> read_lock(epochs_mutex_);
325  for (auto& [key, epochInfo] : table_epochs_) {
326  epochInfo->increment();
327  }
328 }

void File_Namespace::CachingFileMgr::incrementEpoch ( int32_t  db_id,
int32_t  tb_id 
)
private

Definition at line 108 of file CachingFileMgr.cpp.

References CHECK.

108  {
109  mapd_shared_lock<mapd_shared_mutex> read_lock(epochs_mutex_);
110  auto epochs_it = table_epochs_.find({db_id, tb_id});
111  CHECK(epochs_it != table_epochs_.end());
112  auto& [pair, epochInfo] = *epochs_it;
113  epochInfo->increment();
114 }
void File_Namespace::CachingFileMgr::init ( const size_t  num_reader_threads)
private

Definition at line 40 of file CachingFileMgr.cpp.

References createBufferFromHeaders(), File_Namespace::FileMgr::freePages(), incrementAllEpochs(), File_Namespace::FileMgr::initializeNumThreads(), File_Namespace::FileMgr::isFullyInitted_, File_Namespace::FileMgr::nextFileId_, File_Namespace::FileMgr::openFiles(), readTableDirs(), gpu_enabled::sort(), and VLOG.

Referenced by CachingFileMgr().

40  {
41  readTableDirs();
42  auto open_files_result = openFiles();
43  /* Sort headerVec so that all HeaderInfos
44  * from a chunk will be grouped together
45  * and in order of increasing PageId
46  * - Version Epoch */
47  auto& header_vec = open_files_result.header_infos;
48  std::sort(header_vec.begin(), header_vec.end());
49 
50  /* Goal of next section is to find sequences in the
51  * sorted headerVec of the same ChunkId, which we
52  * can then initiate a FileBuffer with */
53  VLOG(3) << "Number of Headers in Vector: " << header_vec.size();
54  if (header_vec.size() > 0) {
55  auto startIt = header_vec.begin();
56  ChunkKey lastChunkKey = startIt->chunkKey;
57  for (auto it = header_vec.begin() + 1; it != header_vec.end(); ++it) {
58  if (it->chunkKey != lastChunkKey) {
59  createBufferFromHeaders(lastChunkKey, startIt, it);
60  lastChunkKey = it->chunkKey;
61  startIt = it;
62  }
63  }
64  createBufferFromHeaders(lastChunkKey, startIt, header_vec.end());
65  }
66 
67  nextFileId_ = open_files_result.max_file_id + 1;
68  incrementAllEpochs();
69  freePages();
70  initializeNumThreads(num_reader_threads);
71  isFullyInitted_ = true;
72 }
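
The core of the listing above is a sort followed by a single pass that cuts the sorted headers into runs sharing one chunk key. The sketch below isolates that step. `Header` is a hypothetical, simplified stand-in for `HeaderInfo` (only a key and a page id), and the ordering by key and then page id is an assumption made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for HeaderInfo.
struct Header {
  std::vector<int> chunk_key;
  int page_id;
  // Assumed ordering: by chunk key first, then by page id.
  bool operator<(const Header& other) const {
    return std::tie(chunk_key, page_id) < std::tie(other.chunk_key, other.page_id);
  }
};

using Run = std::pair<std::size_t, std::size_t>;  // half-open range [begin, end)

// Sorts headers, then returns one index range per distinct chunk key.
std::vector<Run> group_by_chunk(std::vector<Header>& headers) {
  std::sort(headers.begin(), headers.end());
  std::vector<Run> runs;
  std::size_t start = 0;
  for (std::size_t i = 1; i <= headers.size(); ++i) {
    // Close the current run at the end of input or when the key changes.
    if (i == headers.size() || headers[i].chunk_key != headers[start].chunk_key) {
      runs.emplace_back(start, i);
      start = i;
    }
  }
  return runs;
}
```

Under this ordering a page id of -1 sorts to the front of its run, which is where `createBufferFromHeaders` looks for it via `startIt->pageId`.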
std::vector< int > ChunkKey
Definition: types.h:37
OpenFilesResult openFiles()
Definition: FileMgr.cpp:189
DEVICE void sort(ARGS &&...args)
Definition: gpu_enabled.h:105
void createBufferFromHeaders(const ChunkKey &key, const std::vector< HeaderInfo >::const_iterator &startIt, const std::vector< HeaderInfo >::const_iterator &endIt)
void initializeNumThreads(size_t num_reader_threads=0)
Definition: FileMgr.cpp:1544
#define VLOG(n)
Definition: Logger.h:297
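The grouping step in init() relies on the sort placing all headers of one chunk next to each other, so a single pass can cut the vector into per-chunk ranges. A minimal sketch of that pattern follows. `Header` and `group_by_chunk_key` are hypothetical stand-ins introduced for illustration, not the actual HeaderInfo type or any FileMgr API.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for HeaderInfo: only the fields needed for grouping.
struct Header {
  std::vector<int> chunk_key;
  int page_id;
  // Order by chunk key first, then by page id within a chunk.
  bool operator<(const Header& other) const {
    if (chunk_key != other.chunk_key) {
      return chunk_key < other.chunk_key;
    }
    return page_id < other.page_id;
  }
};

using HeaderIt = std::vector<Header>::const_iterator;

// Sorts the headers, then returns one [begin, end) range per distinct chunk key.
// The returned iterators point into `headers`, which must outlive the result.
std::vector<std::pair<HeaderIt, HeaderIt>> group_by_chunk_key(
    std::vector<Header>& headers) {
  std::sort(headers.begin(), headers.end());
  std::vector<std::pair<HeaderIt, HeaderIt>> runs;
  if (headers.empty()) {
    return runs;
  }
  HeaderIt start = headers.cbegin();
  for (HeaderIt it = start + 1; it != headers.cend(); ++it) {
    if (it->chunk_key != start->chunk_key) {
      runs.emplace_back(start, it);  // close the previous run
      start = it;
    }
  }
  runs.emplace_back(start, headers.cend());  // the final run has no successor
  assert(!runs.empty());
  return runs;
}
```

In the listing above, each such range is handed to createBufferFromHeaders() instead of being collected into a vector.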


bool File_Namespace::CachingFileMgr::isAllocationCapped ( )
inlineoverride

Definition at line 113 of file CachingFileMgr.h.

113 { return false; }
void File_Namespace::CachingFileMgr::openAndReadEpochFileUnlocked ( int32_t  db_id,
int32_t  tb_id 
)
private

Definition at line 124 of file CachingFileMgr.cpp.

References Epoch::byte_size(), CHECK, omnisci::file_size(), omnisci::open(), and File_Namespace::read().

124  {
125  TablePair table_pair{db_id, tb_id};
126  auto table_epoch_it = table_epochs_.find(table_pair);
127  if (table_epoch_it == table_epochs_.end()) {
128  std::string epoch_file_path(getOrAddTableDirUnlocked(db_id, tb_id) + "/" +
130  if (!bf::exists(epoch_file_path)) {
131  // Epoch file was missing or malformed. Create a new one.
132  createEpochFileUnlocked(db_id, tb_id);
133  return;
134  } else {
135  CHECK(bf::is_regular_file(epoch_file_path))
136  << "Found epoch file '" << epoch_file_path << "' which is not a regular file";
137  CHECK(bf::file_size(epoch_file_path) == Epoch::byte_size())
138  << "Found epoch file '" << epoch_file_path << "' which is not of expected size";
139  }
140  table_epochs_.emplace(table_pair, std::make_unique<EpochInfo>(open(epoch_file_path)));
141  }
142  table_epoch_it = table_epochs_.find(table_pair);
143  auto& [epoch, epoch_file, is_checkpointed] = *(table_epoch_it->second);
144  read(epoch_file, 0, Epoch::byte_size(), epoch.storage_ptr());
145 }
size_t read(FILE *f, const size_t offset, const size_t size, int8_t *buf)
Reads the specified number of bytes from the offset position in file f into buf.
Definition: File.cpp:133
std::string getOrAddTableDirUnlocked(int db_id, int tb_id)
static constexpr char EPOCH_FILENAME[]
Definition: FileMgr.h:378
static size_t byte_size()
Definition: Epoch.h:63
FILE * open(int fileId)
Opens/creates the file with the given id; returns NULL on error.
Definition: File.cpp:98
int32_t epoch() const
Definition: FileMgr.h:500
#define CHECK(condition)
Definition: Logger.h:203
void createEpochFileUnlocked(int32_t db_id, int32_t tb_id)
std::pair< const int32_t, const int32_t > TablePair
Definition: FileMgr.h:86
std::map< TablePair, std::unique_ptr< EpochInfo > > table_epochs_
size_t file_size(const int fd)
Definition: omnisci_fs.cpp:31
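The checks in openAndReadEpochFileUnlocked() separate three cases: no epoch file, a usable epoch file, and something unexpected at that path. The sketch below shows the same classification with std::filesystem swapped in for boost::filesystem. `EpochFileState` and `classify_epoch_file` are hypothetical names, and the sketch returns a state where the original aborts through CHECK.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical result type for this sketch.
enum class EpochFileState { kMissing, kValid, kInvalid };

// Classifies the file at `path` against the size an epoch file should have.
EpochFileState classify_epoch_file(const fs::path& path,
                                   std::uintmax_t expected_size) {
  assert(expected_size > 0);
  std::error_code ec;  // use the non-throwing overloads
  if (!fs::exists(path, ec)) {
    return EpochFileState::kMissing;  // caller would create a new epoch file
  }
  if (!fs::is_regular_file(path, ec)) {
    return EpochFileState::kInvalid;
  }
  const std::uintmax_t size = fs::file_size(path, ec);
  if (ec || size != expected_size) {
    return EpochFileState::kInvalid;
  }
  return EpochFileState::kValid;
}
```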


void File_Namespace::CachingFileMgr::openOrCreateEpochIfNotExists ( int32_t  db_id,
int32_t  tb_id 
)
private

Definition at line 116 of file CachingFileMgr.cpp.

Referenced by readTableDirs().

116  {
117  mapd_unique_lock<mapd_shared_mutex> epoch_lock(epochs_mutex_);
118  TablePair table_pair{db_id, tb_id};
119  if (table_epochs_.find(table_pair) == table_epochs_.end()) {
120  openAndReadEpochFileUnlocked(db_id, tb_id);
121  }
122 }
void openAndReadEpochFileUnlocked(int32_t db_id, int32_t tb_id)
std::pair< const int32_t, const int32_t > TablePair
Definition: FileMgr.h:86
std::map< TablePair, std::unique_ptr< EpochInfo > > table_epochs_


FileBuffer * File_Namespace::CachingFileMgr::putBuffer ( const ChunkKey &  key,
AbstractBuffer *  src_buffer,
const size_t  num_bytes = 0 
)
override

Deletes any existing buffer for the given key, then copies in a new one.

putBuffer() needs to behave differently than it does in FileMgr. Specifically, it needs to delete the buffer beforehand and then append, rather than overwrite the existing buffer. This way we only store a single version of the buffer rather than accumulating versions that need to be rolled off.

Definition at line 316 of file CachingFileMgr.cpp.

318  {
320  return FileMgr::putBuffer(key, src_buffer, num_bytes);
321 }
void deleteBufferIfExists(const ChunkKey &key)
Deletes a buffer if it exists in the mgr; otherwise does nothing.
FileBuffer * putBuffer(const ChunkKey &key, AbstractBuffer *d, const size_t numBytes=0) override
Puts the contents of d into the Chunk with the given key.
Definition: FileMgr.cpp:803
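The difference between overwriting in a versioned store and the delete-then-append behaviour described for putBuffer() can be shown with a toy model. `ToyStore` and its methods are hypothetical and exist only for this sketch. They do not model pages, epochs, or the real FileBuffer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Key = std::vector<int>;

// Toy store: each key maps to the list of versions written for it.
class ToyStore {
 public:
  // Versioned behaviour: every put leaves the older versions in place.
  void put_versioned(const Key& key, const std::string& data) {
    versions_[key].push_back(data);
  }

  // Cache-style behaviour: drop whatever exists, then write one fresh version.
  void put_replacing(const Key& key, const std::string& data) {
    versions_.erase(key);  // no-op when the key is absent
    put_versioned(key, data);
    assert(versions_.at(key).size() == 1);
  }

  // Number of versions currently held for `key` (0 if the key is unknown).
  std::size_t version_count(const Key& key) const {
    const auto it = versions_.find(key);
    return it == versions_.end() ? 0 : it->second.size();
  }

 private:
  std::map<Key, std::vector<std::string>> versions_;
};
```

With the replacing variant, nothing accumulates that would later need to be rolled off, which is the stated motivation for the override.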
void File_Namespace::CachingFileMgr::readTableDirs ( )
private

Assumes a base directory exists. Checks for any sub-directories containing table-specific data and creates epochs from found files.

Definition at line 78 of file CachingFileMgr.cpp.

References CHECK, File_Namespace::FileMgr::fileMgrBasePath_, File_Namespace::FileMgr::files_rw_mutex_, openOrCreateEpochIfNotExists(), and table_epochs_.

Referenced by init().

78  {
79  mapd_unique_lock<mapd_shared_mutex> write_lock(files_rw_mutex_);
80  bf::path path(fileMgrBasePath_);
81  CHECK(bf::exists(path)) << "Cache path: " << fileMgrBasePath_ << " does not exit.";
82  CHECK(bf::is_directory(path))
83  << "Specified path '" << fileMgrBasePath_ << "' for disk cache is not a directory.";
84 
85  // Look for directories with table-specific names.
86  boost::regex table_filter("table_([0-9]+)_([0-9]+)");
87  for (const auto& file : bf::directory_iterator(path)) {
88  boost::smatch match;
89  auto file_name = file.path().filename().string();
90  if (boost::regex_match(file_name, match, table_filter)) {
91  int32_t db_id = std::stoi(match[1]);
92  int32_t tb_id = std::stoi(match[2]);
93  CHECK(table_epochs_.find({db_id, tb_id}) == table_epochs_.end())
94  << "Trying to read epoch for existing table";
95  openOrCreateEpochIfNotExists(db_id, tb_id);
96  }
97  }
98 }
std::string fileMgrBasePath_
Definition: FileMgr.h:388
void openOrCreateEpochIfNotExists(int32_t db_id, int32_t tb_id)
#define CHECK(condition)
Definition: Logger.h:203
mapd_unique_lock< mapd_shared_mutex > write_lock
std::map< TablePair, std::unique_ptr< EpochInfo > > table_epochs_
mapd_shared_mutex files_rw_mutex_
Definition: FileMgr.h:403
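The directory filter in readTableDirs() requires the whole file name to match `table_<db_id>_<tb_id>`. A minimal sketch of that parsing step follows, using std::regex in place of boost::regex. `parse_table_dir_name` is a hypothetical helper, not part of CachingFileMgr.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Parses "table_<db_id>_<tb_id>" and returns {db_id, tb_id}, or std::nullopt
// when the name does not match in full. std::stoi may throw
// std::out_of_range for ids that do not fit in an int.
std::optional<std::pair<int, int>> parse_table_dir_name(const std::string& name) {
  static const std::regex table_filter("table_([0-9]+)_([0-9]+)");
  std::smatch match;
  // regex_match succeeds only if the entire string matches the pattern.
  if (!std::regex_match(name, match, table_filter)) {
    return std::nullopt;
  }
  assert(match.size() == 3);  // whole match plus two capture groups
  return std::make_pair(std::stoi(match[1].str()), std::stoi(match[2].str()));
}
```

Because the match is anchored to the full name, an entry such as `table_1_42_extra` is skipped rather than read as table 42.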


void File_Namespace::CachingFileMgr::removeTableBuffers ( int32_t  db_id,
int32_t  tb_id 
)
private

Definition at line 338 of file CachingFileMgr.cpp.

338  {
339  // Free associated FileBuffers and clear buffer entries.
340  mapd_unique_lock<mapd_shared_mutex> write_lock(chunkIndexMutex_);
341  ChunkKey min_table_key{db_id, tb_id};
342  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
343  for (auto it = chunkIndex_.lower_bound(min_table_key);
344  it != chunkIndex_.upper_bound(max_table_key);) {
345  auto& [key, buffer] = *it;
346  buffer->freePages();
347  delete buffer;
348  it = chunkIndex_.erase(it);
349  }
350 }
std::vector< int > ChunkKey
Definition: types.h:37
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:323
mapd_unique_lock< mapd_shared_mutex > write_lock
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:402
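Both removeTableBuffers() and writeDirtyBuffers() select the chunks of one table by bounding a range in a map ordered by ChunkKey. Since `std::vector<int>` compares lexicographically, `{db_id, tb_id}` sorts before every longer key with that prefix. The sketch below shows the erase variant on a plain `std::map`. `erase_table_entries` is a hypothetical helper and the string payload stands in for the buffer pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <vector>

using Key = std::vector<int>;

// Erases every entry whose key falls in [{db, tb}, {db, tb, INT32_MAX}] and
// returns how many entries were removed.
std::size_t erase_table_entries(std::map<Key, std::string>& index,
                                int db_id,
                                int tb_id) {
  const Key min_key{db_id, tb_id};
  const Key max_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
  assert(min_key < max_key);  // a prefix compares less than its extension
  std::size_t removed = 0;
  // Erasing from std::map only invalidates iterators to the erased element,
  // so the end of the range can be computed once.
  const auto end = index.upper_bound(max_key);
  for (auto it = index.lower_bound(min_key); it != end;) {
    it = index.erase(it);  // erase returns the iterator to the next element
    ++removed;
  }
  return removed;
}
```

One consequence of this bound: a key whose third element equals INT32_MAX and that has further elements sorts after `max_key` and would fall outside the range.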
void File_Namespace::CachingFileMgr::removeTableDirectory ( int32_t  db_id,
int32_t  tb_id 
)
private

Definition at line 330 of file CachingFileMgr.cpp.

References File_Namespace::get_dir_name_for_table().

330  {
331  // Delete table-specific directory (stores table epoch data and serialized data wrapper)
332  mapd_unique_lock<mapd_shared_mutex> write_lock(epochs_mutex_);
333  table_epochs_.erase({db_id, tb_id});
334  auto dir_name = getFileMgrBasePath() + "/" + get_dir_name_for_table(db_id, tb_id);
335  bf::remove_all(dir_name);
336 }
std::string get_dir_name_for_table(int db_id, int tb_id)
std::string getFileMgrBasePath() const
Definition: FileMgr.h:328
mapd_unique_lock< mapd_shared_mutex > write_lock
std::map< TablePair, std::unique_ptr< EpochInfo > > table_epochs_


bool File_Namespace::CachingFileMgr::updatePageIfDeleted ( FileInfo *  file_info,
ChunkKey chunk_key,
int32_t  contingent,
int32_t  page_epoch,
int32_t  page_num 
)
overridevirtual

Checks whether a page is marked with a delete or rolloff contingent and, if so, frees it immediately. Returns true if the page was freed.

Reimplemented from File_Namespace::FileMgr.

Definition at line 359 of file CachingFileMgr.cpp.

References File_Namespace::DELETE_CONTINGENT, File_Namespace::FileInfo::freePageImmediate(), and File_Namespace::ROLLOFF_CONTINGENT.

363  {
364  // These contingents are stored by overwriting the bytes used for chunkKeys. If
365  // we run into a key marked for deletion in a fileMgr with no fileMgrKey (i.e.
366  // CachingFileMgr) then we can't know if the epoch is valid because we don't know
367  // the key. At this point our only option is to free the page as though it was
368  // checkpointed (which should be fine since we only maintain one version of each
369  // page).
370  if (contingent == DELETE_CONTINGENT || contingent == ROLLOFF_CONTINGENT) {
371  file_info->freePageImmediate(page_num);
372  return true;
373  }
374  return false;
375 }
constexpr int32_t DELETE_CONTINGENT
A FileInfo type has a file pointer and metadata about a file.
Definition: FileInfo.h:51
constexpr int32_t ROLLOFF_CONTINGENT
Definition: FileInfo.h:52


void File_Namespace::CachingFileMgr::writeAndSyncEpochToDisk ( int32_t  db_id,
int32_t  tb_id 
)
private

Definition at line 159 of file CachingFileMgr.cpp.

References Epoch::byte_size(), CHECK, omnisci::fsync(), and File_Namespace::write().

159  {
160  auto epochs_it = table_epochs_.find({db_id, tb_id});
161  CHECK(epochs_it != table_epochs_.end());
162  auto& [pair, epoch_info] = *epochs_it;
163  auto& [epoch, epoch_file, is_checkpointed] = *epoch_info;
164  write(epoch_file, 0, Epoch::byte_size(), epoch.storage_ptr());
165  int32_t status = fflush(epoch_file);
166  CHECK(status == 0) << "Could not flush epoch file to disk";
167 #ifdef __APPLE__
168  status = fcntl(fileno(epoch_file), 51);
169 #else
170  status = omnisci::fsync(fileno(epoch_file));
171 #endif
172  CHECK(status == 0) << "Could not sync epoch file to disk";
173  is_checkpointed = true;
174 }
size_t write(FILE *f, const size_t offset, const size_t size, const int8_t *buf)
Writes the specified number of bytes to the offset position in file f from buf.
Definition: File.cpp:141
int fsync(int fd)
Definition: omnisci_fs.cpp:60
static size_t byte_size()
Definition: Epoch.h:63
int32_t epoch() const
Definition: FileMgr.h:500
#define CHECK(condition)
Definition: Logger.h:203
std::map< TablePair, std::unique_ptr< EpochInfo > > table_epochs_
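writeAndSyncEpochToDisk() persists the epoch in two stages: fflush() moves the bytes from the stdio buffer to the kernel, and the sync call then asks the kernel to push them to the device. On macOS the listing issues fcntl() with command 51, which appears to correspond to F_FULLFSYNC. A minimal POSIX sketch of the non-Apple path follows. `write_and_sync` is a hypothetical helper that returns a status instead of aborting through CHECK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <unistd.h>  // fsync (POSIX)

// Writes `size` bytes at offset 0 of `f` and forces them to stable storage.
// Returns true only if every step succeeded.
bool write_and_sync(std::FILE* f, const void* data, std::size_t size) {
  assert(f != nullptr && data != nullptr);
  if (std::fseek(f, 0, SEEK_SET) != 0) {
    return false;
  }
  if (std::fwrite(data, 1, size, f) != size) {
    return false;
  }
  if (std::fflush(f) != 0) {  // stdio buffer -> kernel
    return false;
  }
  return fsync(fileno(f)) == 0;  // kernel cache -> storage device
}
```

Skipping the fflush() would let fsync() succeed while the data still sits in the process's own buffer, so the order of the two calls matters.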


void File_Namespace::CachingFileMgr::writeDirtyBuffers ( int32_t  db_id,
int32_t  tb_id 
)
private

Definition at line 377 of file CachingFileMgr.cpp.

377  {
378  mapd_unique_lock<mapd_shared_mutex> chunk_index_write_lock(chunkIndexMutex_);
379  ChunkKey min_table_key{db_id, tb_id};
380  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
381 
382  for (auto chunkIt = chunkIndex_.lower_bound(min_table_key);
383  chunkIt != chunkIndex_.upper_bound(max_table_key);
384  ++chunkIt) {
385  if (chunkIt->second->isDirty()) {
386  // Free previous versions first so we only have one metadata version.
387  chunkIt->second->freeMetadataPages();
388  chunkIt->second->writeMetadata(epoch(db_id, tb_id));
389  chunkIt->second->clearDirtyBits();
390  }
391  }
392 }
std::vector< int > ChunkKey
Definition: types.h:37
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:323
int32_t epoch() const
Definition: FileMgr.h:500
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:402

Member Data Documentation

mapd_shared_mutex File_Namespace::CachingFileMgr::epochs_mutex_
mutableprivate

Definition at line 212 of file CachingFileMgr.h.

std::map<TablePair, std::unique_ptr<EpochInfo> > File_Namespace::CachingFileMgr::table_epochs_
private

Definition at line 214 of file CachingFileMgr.h.

Referenced by readTableDirs().


The documentation for this class was generated from the following files: