OmniSciDB  a5dc49c757
File_Namespace::CachingFileMgr Class Reference

A FileMgr capable of limiting its size and storing data from multiple tables in a shared directory. For any table that supports DiskCaching, the CachingFileMgr must contain metadata either for all of the table's chunks or for none of them (the cache has either no knowledge or complete knowledge of that table). Any individual data chunk within a table may or may not be in the cache. More...

#include <CachingFileMgr.h>

Public Member Functions

 CachingFileMgr (const DiskCacheConfig &config)
 
 ~CachingFileMgr () override
 
MgrType getMgrType () override
 
std::string getStringMgrType () override
 
size_t getPageSize ()
 
size_t getMaxSize () override
 
size_t getMaxDataFiles () const
 
size_t getMaxMetaFiles () const
 
size_t getMaxWrapperSize () const
 
size_t getDataFileSize () const
 
size_t getMetadataFileSize () const
 
size_t getNumDataFiles () const
 
size_t getNumMetaFiles () const
 
size_t getAvailableSpace ()
 
size_t getAvailableWrapperSpace ()
 
size_t getAllocated () override
 
size_t getMaxDataFilesSize () const
 
void removeChunkKeepMetadata (const ChunkKey &key)
 Free pages for chunk and remove it from the chunk eviction algorithm. More...
 
void clearForTable (int32_t db_id, int32_t tb_id)
 Removes all data related to the given table (pages and subdirectories). More...
 
bool hasFileMgrKey () const override
 Query to determine if the contained pages will have their database and table ids overridden by the filemgr key (FileMgr does this). More...
 
void closeRemovePhysical () override
 Closes files and removes the caching directory. More...
 
size_t getChunkSpaceReservedByTable (int32_t db_id, int32_t tb_id) const
 
size_t getMetadataSpaceReservedByTable (int32_t db_id, int32_t tb_id) const
 
size_t getTableFileMgrSpaceReserved (int32_t db_id, int32_t tb_id) const
 
size_t getSpaceReservedByTable (int32_t db_id, int32_t tb_id) const
 
std::string describeSelf () const override
 describes this FileMgr for logging purposes. More...
 
void checkpoint (const int32_t db_id, const int32_t tb_id) override
 writes buffers for the given table, synchronizes files to disk, updates file epoch, and commits free pages. More...
 
int32_t epoch (int32_t db_id, int32_t tb_id) const override
 obtain the epoch version for the given table. More...
 
FileBuffer * putBuffer (const ChunkKey &key, AbstractBuffer *srcBuffer, const size_t numBytes=0) override
 deletes any existing buffer for the given key then copies in a new one. More...
 
CachingFileBuffer * allocateBuffer (const size_t page_size, const ChunkKey &key, const size_t num_bytes=0) override
 allocates a new CachingFileBuffer and tracks its use in the eviction algorithms. More...
 
CachingFileBuffer * allocateBuffer (const ChunkKey &key, const std::vector< HeaderInfo >::const_iterator &headerStartIt, const std::vector< HeaderInfo >::const_iterator &headerEndIt) override
 
bool updatePageIfDeleted (FileInfo *file_info, ChunkKey &chunk_key, int32_t contingent, int32_t page_epoch, int32_t page_num) override
 checks whether a page should be deleted. More...
 
bool failOnReadError () const override
 True if a read error should cause a fatal error. More...
 
void deleteBufferIfExists (const ChunkKey &key)
 deletes a buffer if it exists in the mgr. Otherwise do nothing. More...
 
size_t getNumChunksWithMetadata () const
 Returns the number of buffers with metadata in the CFM. Any buffer with an encoder counts. More...
 
size_t getNumDataChunks () const
 Returns the number of buffers with chunk data in the CFM. More...
 
std::vector< ChunkKey > getChunkKeysForPrefix (const ChunkKey &prefix) const
 Returns the keys for chunks with chunk data that match the given prefix. More...
 
std::unique_ptr< CachingFileMgr > reconstruct () const
 Initializes a new CFM using the initialization values in the current CFM. More...
 
void deleteWrapperFile (int32_t db, int32_t tb)
 Deletes the wrapper file from a table subdir. More...
 
void writeWrapperFile (const std::string &doc, int32_t db, int32_t tb)
 Writes a wrapper file to a table subdir. More...
 
bool hasWrapperFile (int32_t db_id, int32_t table_id) const
 
std::string getTableFileMgrPath (int32_t db, int32_t tb) const
 
size_t getFilesSize () const
 Get the total size of page files (data and metadata files). This includes allocated, but unused space. More...
 
size_t getTableFileMgrsSize () const
 Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirectory for serialized data wrappers and epoch files. More...
 
std::optional< FileBuffer * > getBufferIfExists (const ChunkKey &key)
 an optional version of get buffer if we are not sure a chunk exists. More...
 
void free_page (std::pair< FileInfo *, int32_t > &&page) override
 Unlike the FileMgr, the CFM frees pages immediately instead of holding them until the next checkpoint. More...
 
void getChunkMetadataVecForKeyPrefix (ChunkMetadataVector &chunkMetadataVec, const ChunkKey &keyPrefix) override
 
std::string dumpKeysWithMetadata () const
 
std::string dumpKeysWithChunkData () const
 
std::string dumpTableQueue () const
 
std::string dumpEvictionQueue () const
 
std::string dump () const
 
void setMaxNumDataFiles (size_t max)
 
void setMaxNumMetadataFiles (size_t max)
 
void setMaxWrapperSpace (size_t max)
 
std::set< ChunkKey > getKeysWithMetadata () const
 
void setDataSizeLimit (size_t max)
 
Page requestFreePage (size_t pagesize, const bool isMetadata) override
 requests a free page similar to FileMgr, but this override will also evict existing pages to make space if there are none available. More...
 
- Public Member Functions inherited from File_Namespace::FileMgr
 FileMgr (const int32_t device_id, GlobalFileMgr *gfm, const TablePair file_mgr_key, const int32_t max_rollback_epochs=-1, const size_t num_reader_threads=0, const int32_t epoch=-1)
 Constructor. More...
 
 FileMgr (const int32_t device_id, GlobalFileMgr *gfm, const TablePair file_mgr_key, const bool run_core_init)
 
 FileMgr (GlobalFileMgr *gfm, std::string basePath)
 
 ~FileMgr () override
 Destructor. More...
 
StorageStats getStorageStats () const
 
FileBuffer * createBuffer (const ChunkKey &key, size_t pageSize=0, const size_t numBytes=0) override
 Creates a chunk with the specified key and page size. More...
 
bool isBufferOnDevice (const ChunkKey &key) override
 
void deleteBuffer (const ChunkKey &key, const bool purge=true) override
 Deletes the chunk with the specified key. More...
 
void deleteBuffersWithPrefix (const ChunkKey &keyPrefix, const bool purge=true) override
 
FileBuffer * getBuffer (const ChunkKey &key, const size_t numBytes=0) override
 Returns a pointer to the chunk with the specified key. More...
 
void fetchBuffer (const ChunkKey &key, AbstractBuffer *destBuffer, const size_t numBytes) override
 
FileBuffer * putBuffer (const ChunkKey &key, AbstractBuffer *d, const size_t numBytes=0) override
 Puts the contents of d into the Chunk with the given key. More...
 
AbstractBuffer * alloc (const size_t numBytes) override
 
void free (AbstractBuffer *buffer) override
 
MgrType getMgrType () override
 
std::string getStringMgrType () override
 
std::string printSlabs () override
 
size_t getMaxSize () override
 
size_t getInUseSize () override
 
size_t getAllocated () override
 
bool isAllocationCapped () override
 
FileInfo * getFileInfoForFileId (const int32_t fileId) const
 
FileMetadata getMetadataForFile (const boost::filesystem::directory_iterator &fileIterator) const
 
void copyPage (Page &srcPage, FileMgr *destFileMgr, Page &destPage, const size_t reservedHeaderSize, const size_t numBytes, const size_t offset)
 
void requestFreePages (size_t npages, size_t pagesize, std::vector< Page > &pages, const bool isMetadata)
 Obtains free pages – creates new files if necessary – of the requested size. More...
 
void getChunkMetadataVecForKeyPrefix (ChunkMetadataVector &chunkMetadataVec, const ChunkKey &keyPrefix) override
 
bool hasChunkMetadataForKeyPrefix (const ChunkKey &keyPrefix)
 
void checkpoint () override
 Fsyncs data files, writes out epoch and fsyncs that. More...
 
void checkpoint (const int32_t db_id, const int32_t tb_id) override
 
int32_t epochFloor () const
 
int32_t incrementEpoch ()
 
int32_t lastCheckpointedEpoch () const
 Returns value of epoch at last checkpoint. More...
 
void resetEpochFloor ()
 
int32_t maxRollbackEpochs ()
 Returns value max_rollback_epochs. More...
 
size_t getNumReaderThreads ()
 Returns number of threads defined by parameter num-reader-threads which should be used during initial load and consequent read of data. More...
 
FILE * getFileForFileId (const int32_t fileId)
 Returns FILE pointer associated with requested fileId. More...
 
size_t getNumChunks () override
 
size_t getNumUsedMetadataPagesForChunkKey (const ChunkKey &chunkKey) const
 
bool getDBConvert () const
 
void createOrMigrateTopLevelMetadata ()
 
std::string getFileMgrBasePath () const
 
void removeTableRelatedDS (const int32_t db_id, const int32_t table_id) override
 
const TablePair get_fileMgrKey () const
 
boost::filesystem::path getFilePath (const std::string &file_name) const
 
void writePageMappingsToStatusFile (const std::vector< PageMapping > &page_mappings)
 
void renameCompactionStatusFile (const char *const from_status, const char *const to_status)
 
void compactFiles ()
 
size_t getPageSize () const
 
size_t getMetadataPageSize () const
 
FILE * createFile (const std::string &full_path, const size_t requested_file_size) const
 
std::pair< FILE *, std::string > createFile (const std::string &base_path, const int file_id, const size_t page_size, const size_t num_pages) const
 
size_t writeFile (FILE *f, const size_t offset, const size_t size, const int8_t *buf) const
 

Static Public Member Functions

static size_t getMinimumSize ()
 
- Static Public Member Functions inherited from File_Namespace::FileMgr
static void setNumPagesPerDataFile (size_t num_pages)
 
static void setNumPagesPerMetadataFile (size_t num_pages)
 
static void renameAndSymlinkLegacyFiles (const std::string &table_data_dir)
 

Static Public Attributes

static constexpr char WRAPPER_FILE_NAME [] = "wrapper_metadata.json"
 
static constexpr float METADATA_SPACE_PERCENTAGE {0.1}
 
static constexpr float METADATA_FILE_SPACE_PERCENTAGE {0.01}
 
- Static Public Attributes inherited from File_Namespace::FileMgr
static constexpr size_t DEFAULT_NUM_PAGES_PER_DATA_FILE {256}
 
static constexpr size_t DEFAULT_NUM_PAGES_PER_METADATA_FILE {4096}
 
static constexpr char const * COPY_PAGES_STATUS {"pending_data_compaction_0"}
 
static constexpr char const * UPDATE_PAGE_VISIBILITY_STATUS {"pending_data_compaction_1"}
 
static constexpr char const * DELETE_EMPTY_FILES_STATUS {"pending_data_compaction_2"}
 
static constexpr char LEGACY_EPOCH_FILENAME [] = "epoch"
 
static constexpr char EPOCH_FILENAME [] = "epoch_metadata"
 
static constexpr char DB_META_FILENAME [] = "dbmeta"
 
static constexpr char FILE_MGR_VERSION_FILENAME [] = "filemgr_version"
 
static constexpr int32_t INVALID_VERSION = -1
 
static constexpr int32_t LATEST_FILE_MGR_VERSION = 2
 

Private Member Functions

void incrementEpoch (int32_t db_id, int32_t tb_id)
 Increments epoch for the given table. More...
 
void init (const size_t num_reader_threads)
 Initializes a CFM, parsing any existing files and initializing data structures appropriately (currently not thread-safe). More...
 
void writeAndSyncEpochToDisk (int32_t db_id, int32_t tb_id)
 Flushes epoch value to disk for a table. More...
 
void readTableFileMgrs ()
 Checks for any sub-directories containing table-specific data and creates epochs from found files. More...
 
FileBuffer * createBufferFromHeaders (const ChunkKey &key, const std::vector< HeaderInfo >::const_iterator &startIt, const std::vector< HeaderInfo >::const_iterator &endIt) override
 Creates a buffer and initializes it with info read from files on disk. More...
 
FileBuffer * createBufferUnlocked (const ChunkKey &key, size_t pageSize=0, const size_t numBytes=0) override
 Creates a buffer. More...
 
void createTableFileMgrIfNoneExists (const int32_t db_id, const int32_t tb_id)
 Create and initialize a subdirectory for a table if none exists. More...
 
void incrementAllEpochs ()
 Increment epochs for each table in the CFM. More...
 
void removeTableFileMgr (int32_t db_id, int32_t tb_id)
 Removes the subdirectory content for a table. More...
 
void removeTableBuffers (int32_t db_id, int32_t tb_id)
 Erases and cleans up all buffers for a table. More...
 
void writeDirtyBuffers (int32_t db_id, int32_t tb_id)
 helper function to flush all dirty buffers to disk. More...
 
void touchKey (const ChunkKey &key) const
 Used to track which tables/chunks were least recently used. More...
 
void removeKey (const ChunkKey &key) const
 
std::vector< ChunkKey > getKeysForTable (int32_t db_id, int32_t tb_id) const
 returns set of keys contained in chunkIndex_ that match the given table prefix. More...
 
FileInfo * evictMetadataPages ()
 evicts all metadata pages for the least recently used table. Returns the first FileInfo that a page was evicted from (guaranteed to now have at least one free page in it). More...
 
FileInfo * evictPages ()
 evicts all data pages for the least recently used Chunk (metadata pages persist). Returns the first FileInfo that a page was evicted from (guaranteed to now have at least one free page in it). More...
 
void deleteCacheIfTooLarge ()
 When the cache is read from disk, we don't know which chunks were least recently used. Rather than try to evict random pages to get down to size we just reset the cache to make sure we have space. More...
 
void setMaxSizes ()
 Sets the maximum number of files/space for each type of storage based on the maximum size. More...
 
FileBuffer * getBufferUnlocked (const ChunkKey &key, const size_t numBytes=0) const override
 
ChunkKeyToChunkMap::iterator deleteBufferUnlocked (const ChunkKeyToChunkMap::iterator chunk_it, const bool purge=true) override
 
void readOnlyCheck (const std::string &action, const std::optional< std::string > &file_name={}) const override
 

Private Attributes

heavyai::shared_mutex table_dirs_mutex_
 
std::map< TablePair,
std::unique_ptr< TableFileMgr > > 
table_dirs_
 
size_t max_num_data_files_
 
size_t max_num_meta_files_
 
size_t max_wrapper_space_
 
size_t max_size_
 
std::optional< size_t > limit_data_size_ {}
 
LRUEvictionAlgorithm chunk_evict_alg_
 
LRUEvictionAlgorithm table_evict_alg_
 

Additional Inherited Members

- Public Attributes inherited from File_Namespace::FileMgr
ChunkKeyToChunkMap chunkIndex_
 Index for looking up chunks.
 
- Protected Member Functions inherited from File_Namespace::FileMgr
 FileMgr (const size_t defaultPageSize, const size_t defaultMetadataPageSize)
 
FileInfo * createFileInfo (const size_t pageSize, const size_t numPages)
 Adds a file to the file manager repository. More...
 
FileInfo * openExistingFile (const std::string &path, const int32_t fileId, const size_t pageSize, const size_t numPages, std::vector< HeaderInfo > &headerVec)
 
void createEpochFile (const std::string &epochFileName)
 
int32_t openAndReadLegacyEpochFile (const std::string &epochFileName)
 
void openAndReadEpochFile (const std::string &epochFileName)
 
void writeAndSyncEpochToDisk ()
 
void setEpoch (const int32_t newEpoch)
 
int32_t readVersionFromDisk (const std::string &versionFileName) const
 
void writeAndSyncVersionToDisk (const std::string &versionFileName, const int32_t version)
 
void processFileFutures (std::vector< std::future< std::vector< HeaderInfo >>> &file_futures, std::vector< HeaderInfo > &headerVec)
 
void migrateToLatestFileMgrVersion ()
 
void migrateEpochFileV0 ()
 
void migrateLegacyFilesV1 ()
 
OpenFilesResult openFiles ()
 
void clearFileInfos ()
 
void copySourcePageForCompaction (const Page &source_page, FileInfo *destination_file_info, std::vector< PageMapping > &page_mappings, std::set< Page > &touched_pages)
 
int32_t copyPageWithoutHeaderSize (const Page &source_page, const Page &destination_page)
 
void sortAndCopyFilePagesForCompaction (size_t page_size, std::vector< PageMapping > &page_mappings, std::set< Page > &touched_pages)
 
void updateMappedPagesVisibility (const std::vector< PageMapping > &page_mappings)
 
void deleteEmptyFiles ()
 
void resumeFileCompaction (const std::string &status_file_name)
 
std::vector< PageMapping > readPageMappingsFromStatusFile ()
 
 FileMgr (const int epoch)
 
void closePhysicalUnlocked ()
 
void syncFilesToDisk ()
 
void freePages ()
 
void initializeNumThreads (size_t num_reader_threads=0)
 
- Protected Attributes inherited from File_Namespace::FileMgr
int32_t maxRollbackEpochs_
 
std::string fileMgrBasePath_
 
std::map< int32_t,
std::unique_ptr< FileInfo > > 
files_
 
PageSizeFileMMap fileIndex_
 Maps page sizes to FileInfo objects. More...
 
size_t num_reader_threads_
 number of threads used when loading data More...
 
unsigned nextFileId_
 the index of the next file id More...
 
int32_t fileMgrVersion_
 
FILE * DBMetaFile_ = nullptr
 pointer to DB level metadata More...
 
std::mutex getPageMutex_
 
heavyai::shared_mutex chunkIndexMutex_
 
heavyai::shared_mutex files_rw_mutex_
 
heavyai::shared_mutex mutex_free_page_
 
std::vector< std::pair
< FileInfo *, int32_t > > 
free_pages_
 
bool isFullyInitted_ {false}
 
const size_t page_size_
 
const size_t metadata_page_size_
 
- Static Protected Attributes inherited from File_Namespace::FileMgr
static size_t num_pages_per_data_file_ {DEFAULT_NUM_PAGES_PER_DATA_FILE}
 
static size_t num_pages_per_metadata_file_ {DEFAULT_NUM_PAGES_PER_METADATA_FILE}
 

Detailed Description

A FileMgr capable of limiting its size and storing data from multiple tables in a shared directory. For any table that supports DiskCaching, the CachingFileMgr must contain metadata either for all of the table's chunks or for none of them (the cache has either no knowledge or complete knowledge of that table). Any individual data chunk within a table may or may not be in the cache.

Definition at line 178 of file CachingFileMgr.h.

Constructor & Destructor Documentation

File_Namespace::CachingFileMgr::CachingFileMgr ( const DiskCacheConfig & config )

Definition at line 71 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::fileMgrBasePath_, init(), max_size_, File_Namespace::FileMgr::maxRollbackEpochs_, File_Namespace::FileMgr::nextFileId_, File_Namespace::DiskCacheConfig::num_reader_threads, File_Namespace::DiskCacheConfig::path, setMaxSizes(), and File_Namespace::DiskCacheConfig::size_limit.

72  : FileMgr(config.page_size, config.meta_page_size) {
73  fileMgrBasePath_ = config.path;
75  nextFileId_ = 0;
76  max_size_ = config.size_limit;
77  init(config.num_reader_threads);
78  setMaxSizes();
79 }

File_Namespace::CachingFileMgr::~CachingFileMgr ( )
override

Definition at line 81 of file CachingFileMgr.cpp.

81 {}

Member Function Documentation

CachingFileBuffer * File_Namespace::CachingFileMgr::allocateBuffer ( const size_t page_size,
const ChunkKey & key,
const size_t num_bytes = 0
)
override virtual

allocates a new CachingFileBuffer and tracks its use in the eviction algorithms.

Reimplemented from File_Namespace::FileMgr.

Definition at line 346 of file CachingFileMgr.cpp.

348  {
349  return new CachingFileBuffer(this, page_size, key, num_bytes);
350 }
CachingFileBuffer * File_Namespace::CachingFileMgr::allocateBuffer ( const ChunkKey & key,
const std::vector< HeaderInfo >::const_iterator & headerStartIt,
const std::vector< HeaderInfo >::const_iterator & headerEndIt
)
override virtual

Reimplemented from File_Namespace::FileMgr.

Definition at line 352 of file CachingFileMgr.cpp.

355  {
356  return new CachingFileBuffer(this, key, headerStartIt, headerEndIt);
357 }
void File_Namespace::CachingFileMgr::checkpoint ( const int32_t  db_id,
const int32_t  tb_id 
)
override

writes buffers for the given table, synchronizes files to disk, updates file epoch, and commits free pages.

Definition at line 246 of file CachingFileMgr.cpp.

References CHECK, table_dirs_, and table_dirs_mutex_.

246  {
247  {
249  CHECK(table_dirs_.find({db_id, tb_id}) != table_dirs_.end()) << "No data for table";
250  }
251  VLOG(2) << "Checkpointing " << describeSelf() << " (" << db_id << ", " << tb_id
252  << ") epoch: " << epoch(db_id, tb_id);
253  writeDirtyBuffers(db_id, tb_id);
254  syncFilesToDisk();
255  writeAndSyncEpochToDisk(db_id, tb_id);
256  incrementEpoch(db_id, tb_id);
257  freePages();
258 }
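
The listing above runs five steps in a fixed order: flush dirty buffers, sync files, persist the epoch, advance the epoch, and free pages. The following is a minimal sketch of that ordering only, assuming a hypothetical `ToyTableCache` class that records each step in a log. None of these types belong to OmniSciDB, and the exception stands in for the `CHECK` in the real code.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-table cache; not part of OmniSciDB.
class ToyTableCache {
 public:
  using TableKey = std::pair<int32_t, int32_t>;  // (db_id, tb_id)

  void addTable(int32_t db_id, int32_t tb_id) { epochs_[{db_id, tb_id}] = 0; }

  // Follows the step order of the listing: flush, sync, persist the epoch,
  // advance the epoch, release pages.
  void checkpoint(int32_t db_id, int32_t tb_id) {
    auto it = epochs_.find({db_id, tb_id});
    if (it == epochs_.end()) {
      throw std::runtime_error("No data for table");
    }
    log_.push_back("writeDirtyBuffers");
    log_.push_back("syncFilesToDisk");
    persisted_[it->first] = it->second;  // the epoch is written before it is advanced
    log_.push_back("writeAndSyncEpochToDisk");
    ++it->second;
    log_.push_back("incrementEpoch");
    log_.push_back("freePages");
  }

  int32_t epoch(int32_t db_id, int32_t tb_id) const {
    return epochs_.at({db_id, tb_id});
  }
  int32_t persistedEpoch(int32_t db_id, int32_t tb_id) const {
    return persisted_.at({db_id, tb_id});
  }
  const std::vector<std::string>& log() const { return log_; }

 private:
  std::map<TableKey, int32_t> epochs_;     // in-memory epoch per table
  std::map<TableKey, int32_t> persisted_;  // epoch value last "written to disk"
  std::vector<std::string> log_;           // order in which the steps ran
};
```

In this sketch the value persisted by a checkpoint is the epoch that was current when the checkpoint began, and the in-memory epoch is one higher afterwards.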
void File_Namespace::CachingFileMgr::clearForTable ( int32_t  db_id,
int32_t  tb_id 
)

Removes all data related to the given table (pages and subdirectories).

Definition at line 173 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::freePages(), removeTableBuffers(), and removeTableFileMgr().

173  {
174  removeTableBuffers(db_id, tb_id);
175  removeTableFileMgr(db_id, tb_id);
176  freePages();
177 }

void File_Namespace::CachingFileMgr::closeRemovePhysical ( )
override virtual

Closes files and removes the caching directory.

Reimplemented from File_Namespace::FileMgr.

Definition at line 183 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::closePhysicalUnlocked(), File_Namespace::FileMgr::files_rw_mutex_, File_Namespace::FileMgr::getFileMgrBasePath(), table_dirs_, and table_dirs_mutex_.

183  {
184  {
187  }
188  {
190  table_dirs_.clear();
191  }
192  bf::remove_all(getFileMgrBasePath());
193 }

FileBuffer * File_Namespace::CachingFileMgr::createBufferFromHeaders ( const ChunkKey & key,
const std::vector< HeaderInfo >::const_iterator & startIt,
const std::vector< HeaderInfo >::const_iterator & endIt
)
override private virtual

Creates a buffer and initializes it with info read from files on disk.

Reimplemented from File_Namespace::FileMgr.

Definition at line 279 of file CachingFileMgr.cpp.

References get_table_prefix().

Referenced by init().

282  {
283  if (startIt->pageId != -1) {
284  // If the first pageId is not -1 then there is no metadata page for the
285  // current key (which means it was never checkpointed), so we should skip.
286  return nullptr;
287  }
288  touchKey(key);
289  auto [db_id, tb_id] = get_table_prefix(key);
290  createTableFileMgrIfNoneExists(db_id, tb_id);
291  auto buffer = FileMgr::createBufferFromHeaders(key, startIt, endIt);
292  if (buffer->isMissingPages()) {
293  // Detect the case where a page is missing by comparing the amount of pages read
294  // with the metadata size. If data are missing, discard the chunk.
295  buffer->freeChunkPages();
296  }
297  return buffer;
298 }
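
The listing above makes two decisions when a chunk is rebuilt from disk: skip it if it has no metadata page, and discard its data if pages are missing. The following is a minimal sketch of that decision logic, assuming a simplified input (a sorted list of page ids in which -1 marks the metadata page) and assuming that discarding the chunk leaves its metadata in place, since the listing still returns the buffer. `LoadResult` and `load_chunk_sketch` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical outcome of loading one chunk's headers from disk.
enum class LoadResult { kSkipped, kMetadataOnly, kComplete };

// page_ids: page ids found on disk for one chunk, sorted so that the metadata
// page (id -1) comes first when it exists.
// expected_data_pages: how many data pages the metadata says the chunk needs.
inline LoadResult load_chunk_sketch(const std::vector<int>& page_ids,
                                    std::size_t expected_data_pages) {
  if (page_ids.empty() || page_ids.front() != -1) {
    // No metadata page: the chunk was never checkpointed, so skip it.
    return LoadResult::kSkipped;
  }
  const std::size_t data_pages_found = page_ids.size() - 1;
  if (data_pages_found < expected_data_pages) {
    // Some data pages are missing: drop the data, keep only the metadata.
    return LoadResult::kMetadataOnly;
  }
  return LoadResult::kComplete;
}
```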

FileBuffer * File_Namespace::CachingFileMgr::createBufferUnlocked ( const ChunkKey & key,
size_t pageSize = 0,
const size_t numBytes = 0
)
override private virtual

Creates a buffer.

Reimplemented from File_Namespace::FileMgr.

Definition at line 270 of file CachingFileMgr.cpp.

References get_table_prefix().

272  {
273  touchKey(key);
274  auto [db_id, tb_id] = get_table_prefix(key);
275  createTableFileMgrIfNoneExists(db_id, tb_id);
276  return FileMgr::createBufferUnlocked(key, page_size, num_bytes);
277 }

void File_Namespace::CachingFileMgr::createTableFileMgrIfNoneExists ( const int32_t  db_id,
const int32_t  tb_id 
)
private

Create and initialize a subdirectory for a table if none exists.

Definition at line 260 of file CachingFileMgr.cpp.

261  {
263  TablePair table_pair{db_id, tb_id};
264  if (table_dirs_.find(table_pair) == table_dirs_.end()) {
265  table_dirs_.emplace(
266  table_pair, std::make_unique<TableFileMgr>(getTableFileMgrPath(db_id, tb_id)));
267  }
268 }
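
The listing above is a find-or-emplace on a map keyed by (db_id, tb_id). The following is a minimal sketch of that pattern, assuming an exclusive lock is held for the whole operation (the page references `table_dirs_mutex_`, but the locking line itself is not shown). `TableDirSketch`, `TableDirRegistrySketch`, and the path format are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for TableFileMgr: it only remembers its path.
struct TableDirSketch {
  std::string path;
};

class TableDirRegistrySketch {
 public:
  // Creates an entry for the table if none exists.
  // Returns true if a new entry was created, false if one was already there.
  bool createIfNoneExists(int32_t db_id, int32_t tb_id) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    const std::pair<int32_t, int32_t> key{db_id, tb_id};
    if (dirs_.find(key) != dirs_.end()) {
      return false;
    }
    dirs_.emplace(key,
                  std::make_unique<TableDirSketch>(TableDirSketch{pathFor(db_id, tb_id)}));
    return true;
  }

  std::size_t size() const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return dirs_.size();
  }

  // Hypothetical naming scheme, used only for this illustration.
  static std::string pathFor(int32_t db_id, int32_t tb_id) {
    return "table_" + std::to_string(db_id) + "_" + std::to_string(tb_id);
  }

 private:
  mutable std::shared_mutex mutex_;
  std::map<std::pair<int32_t, int32_t>, std::unique_ptr<TableDirSketch>> dirs_;
};
```

Holding the exclusive lock across both the lookup and the insertion keeps two threads from creating the same entry at once.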
void File_Namespace::CachingFileMgr::deleteBufferIfExists ( const ChunkKey & key )

deletes a buffer if it exists in the mgr. Otherwise do nothing.

Definition at line 396 of file CachingFileMgr.cpp.

396  {
398  auto chunk_it = chunkIndex_.find(key);
399  if (chunk_it != chunkIndex_.end()) {
400  deleteBufferUnlocked(chunk_it);
401  }
402 }
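
The "if exists" members (deleteBufferIfExists here, and getBufferIfExists in the member list) treat a missing key as a normal outcome rather than an error. The following is a minimal sketch of both operations on a locked index, assuming a key is a vector of ints and using a string as a stand-in payload. `BufferIndexSketch` and `KeySketch` are hypothetical names, and the lock types are an assumption.

```cpp
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <utility>
#include <vector>

using KeySketch = std::vector<int>;  // assumption: a chunk key is a vector of ints

class BufferIndexSketch {
 public:
  void put(const KeySketch& key, std::string payload) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    index_[key] = std::move(payload);
  }

  // Returns the payload if the key is present, std::nullopt otherwise.
  std::optional<std::string> getIfExists(const KeySketch& key) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    auto it = index_.find(key);
    if (it == index_.end()) {
      return std::nullopt;
    }
    return it->second;
  }

  // Erases the entry if present; does nothing otherwise.
  // Returns whether an entry was removed.
  bool deleteIfExists(const KeySketch& key) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    auto it = index_.find(key);
    if (it == index_.end()) {
      return false;
    }
    index_.erase(it);
    return true;
  }

 private:
  mutable std::shared_mutex mutex_;
  std::map<KeySketch, std::string> index_;
};
```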
ChunkKeyToChunkMap::iterator File_Namespace::CachingFileMgr::deleteBufferUnlocked ( const ChunkKeyToChunkMap::iterator chunk_it,
const bool purge = true
)
override private virtual

Reimplemented from File_Namespace::FileMgr.

Definition at line 713 of file CachingFileMgr.cpp.

715  {
716  removeKey(chunk_it->first);
717  return FileMgr::deleteBufferUnlocked(chunk_it, purge);
718 }
void File_Namespace::CachingFileMgr::deleteCacheIfTooLarge ( )
private

When the cache is read from disk, we don't know which chunks were least recently used. Rather than try to evict random pages to get down to size we just reset the cache to make sure we have space.

Definition at line 415 of file CachingFileMgr.cpp.

References logger::INFO, LOG, and anonymous_namespace{CachingFileMgr.cpp}::size_of_dir().

Referenced by init().

415  {
418  bf::create_directory(fileMgrBasePath_);
419  LOG(INFO) << "Cache path over limit. Existing cache deleted.";
420  }
421 }

void File_Namespace::CachingFileMgr::deleteWrapperFile ( int32_t  db,
int32_t  tb 
)

Deletes the wrapper file from a table subdir.

Definition at line 652 of file CachingFileMgr.cpp.

References CHECK.

652  {
654  auto it = table_dirs_.find({db, tb});
655  CHECK(it != table_dirs_.end()) << "Wrapper does not exist.";
656  it->second->deleteWrapperFile();
657 }
std::string File_Namespace::CachingFileMgr::describeSelf ( ) const
override virtual

describes this FileMgr for logging purposes.

Reimplemented from File_Namespace::FileMgr.

Definition at line 241 of file CachingFileMgr.cpp.

241  {
242  return "cache";
243 }
std::string File_Namespace::CachingFileMgr::dump ( ) const

Definition at line 58 of file CachingFileMgr.cpp.

References chunk_evict_alg_, File_Namespace::FileMgr::chunkIndex_, LRUEvictionAlgorithm::dumpEvictionQueue(), show_chunk(), and table_evict_alg_.

58  {
59  std::stringstream ss;
60  ss << "Dump Cache:\n";
61  for (const auto& [key, buf] : chunkIndex_) {
62  ss << " " << show_chunk(key) << " num_pages: " << buf->pageCount()
63  << ", is dirty: " << buf->isDirty() << "\n";
64  }
65  ss << "Data Eviction Queue:\n" << chunk_evict_alg_.dumpEvictionQueue();
66  ss << "Metadata Eviction Queue:\n" << table_evict_alg_.dumpEvictionQueue();
67  ss << "\n";
68  return ss.str();
69 }
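
dump() prints two eviction queues, one for chunk data and one for table metadata, both of type LRUEvictionAlgorithm. The following is a minimal sketch of a least-recently-used queue of that general kind, assuming a chunk key is a vector of ints. `LruQueueSketch` and its method names are hypothetical and are not the interface of LRUEvictionAlgorithm.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <map>
#include <optional>
#include <vector>

using ChunkKeySketch = std::vector<int>;  // assumption: a chunk key is a vector of ints

class LruQueueSketch {
 public:
  // Marks key as most recently used, inserting it if it is not tracked yet.
  void touch(const ChunkKeySketch& key) {
    remove(key);
    queue_.push_back(key);
    index_[key] = std::prev(queue_.end());
  }

  // Stops tracking key; does nothing if it is not tracked.
  void remove(const ChunkKeySketch& key) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      queue_.erase(it->second);
      index_.erase(it);
    }
  }

  // Removes and returns the least recently used key, if any.
  std::optional<ChunkKeySketch> evictNext() {
    if (queue_.empty()) {
      return std::nullopt;
    }
    ChunkKeySketch key = queue_.front();
    remove(key);
    return key;
  }

  std::size_t size() const { return queue_.size(); }

 private:
  std::list<ChunkKeySketch> queue_;  // front = least recently used
  std::map<ChunkKeySketch, std::list<ChunkKeySketch>::iterator> index_;
};
```

Pairing the list with an index keeps touch and remove from scanning the queue, since list iterators stay valid while other elements are inserted or erased.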

std::string File_Namespace::CachingFileMgr::dumpEvictionQueue ( ) const
inline

Definition at line 373 of file CachingFileMgr.h.

References chunk_evict_alg_, and LRUEvictionAlgorithm::dumpEvictionQueue().

std::string File_Namespace::CachingFileMgr::dumpKeysWithChunkData ( ) const

Definition at line 631 of file CachingFileMgr.cpp.

References show_chunk().

631  {
633  std::string ret_string = "CFM keys with chunk data:\n";
634  for (const auto& [key, buf] : chunkIndex_) {
635  if (buf->hasDataPages()) {
636  ret_string += " " + show_chunk(key) + "\n";
637  }
638  }
639  return ret_string;
640 }

std::string File_Namespace::CachingFileMgr::dumpKeysWithMetadata ( ) const

Definition at line 620 of file CachingFileMgr.cpp.

References show_chunk().

620  {
622  std::string ret_string = "CFM keys with metadata:\n";
623  for (const auto& [key, buf] : chunkIndex_) {
624  if (buf->hasEncoder()) {
625  ret_string += " " + show_chunk(key) + "\n";
626  }
627  }
628  return ret_string;
629 }
std::string show_chunk(const ChunkKey &key)
Definition: types.h:98
std::shared_lock< T > shared_lock
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:330
heavyai::shared_mutex chunkIndexMutex_
Definition: FileMgr.h:420

std::string File_Namespace::CachingFileMgr::dumpTableQueue ( ) const
inline

Definition at line 372 of file CachingFileMgr.h.

References LRUEvictionAlgorithm::dumpEvictionQueue(), and table_evict_alg_.

LRUEvictionAlgorithm table_evict_alg_

int32_t File_Namespace::CachingFileMgr::epoch ( int32_t  db_id,
int32_t  tb_id 
) const
override virtual

Obtains the epoch version for the given table.

Reimplemented from File_Namespace::FileMgr.

Definition at line 141 of file CachingFileMgr.cpp.

References Epoch::min_allowable_epoch(), table_dirs_, and table_dirs_mutex_.

141  {
143  auto tables_it = table_dirs_.find({db_id, tb_id});
144  if (tables_it == table_dirs_.end()) {
145  // If there is no directory for this table, that means the cache does not recognize
146  // the table that is requested. This can happen if a table was dropped, and its
147  // pages were invalidated but not yet freed and then the server crashed before they
148  // were freed. Upon re-starting the FileMgr will find these pages and attempt to
149  // compare their epoch to know if they are valid or not. In this case we should
150  // return an invalid epoch to indicate that any page for this table is not valid and
151  // should be freed.
153  }
154  auto& [pair, table_dir] = *tables_it;
155  return table_dir->getEpoch();
156 }
heavyai::shared_mutex table_dirs_mutex_
static int64_t min_allowable_epoch()
Definition: Epoch.h:65
std::shared_lock< T > shared_lock
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
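
The unknown-table fallback described in the comment above can be illustrated with a minimal, self-contained sketch. It is not the OmniSciDB implementation: lookup_epoch, kInvalidEpoch, and the plain std::map of epochs are hypothetical stand-ins for table_dirs_, the per-table getEpoch() call, and Epoch::min_allowable_epoch(), and no locking is modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <utility>

// Hypothetical stand-ins for the real types; not part of OmniSciDB.
using TablePair = std::pair<int32_t, int32_t>;
constexpr int32_t kInvalidEpoch = std::numeric_limits<int32_t>::min();

// Returns the stored epoch, or the sentinel when the table is unknown, so that
// callers can treat every page of an unknown table as stale.
int32_t lookup_epoch(const std::map<TablePair, int32_t>& table_epochs,
                     int32_t db_id,
                     int32_t tb_id) {
  auto it = table_epochs.find({db_id, tb_id});
  if (it == table_epochs.end()) {
    return kInvalidEpoch;
  }
  return it->second;
}
```

Returning a sentinel rather than failing lets start-up code compare page epochs uniformly, whether or not the table directory survived a crash.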

FileInfo * File_Namespace::CachingFileMgr::evictMetadataPages ( )
private

Evicts all metadata pages for the least recently used table. Returns the first FileInfo that a page was evicted from (guaranteed to now have at least one free page in it).

Definition at line 475 of file CachingFileMgr.cpp.

References CHECK, anonymous_namespace{CachingFileMgr.cpp}::evict_chunk_or_fail(), and get_table_prefix().

475  {
476  // Locks should already be in place before calling this method.
477  FileInfo* file_info{nullptr};
478  auto key_to_evict = evict_chunk_or_fail(table_evict_alg_);
479  auto [db_id, tb_id] = get_table_prefix(key_to_evict);
480  const auto keys = getKeysForTable(db_id, tb_id);
481  for (const auto& key : keys) {
482  auto chunk_it = chunkIndex_.find(key);
483  CHECK(chunk_it != chunkIndex_.end());
484  auto& buf = chunk_it->second;
485  if (!file_info) {
486  // Return the FileInfo for the first file we are freeing a page from so that the
487  // caller does not have to search for a FileInfo guaranteed to have at least one
488  // free page.
489  CHECK(buf->getMetadataPage().pageVersions.size() > 0);
490  file_info =
491  getFileInfoForFileId(buf->getMetadataPage().pageVersions.front().page.fileId);
492  }
493  // We erase all pages and entries for the chunk, as without metadata all other
494  // entries are useless.
495  deleteBufferUnlocked(chunk_it);
496  }
497  // Serialized datawrappers require metadata to be in the cache.
498  deleteWrapperFile(db_id, tb_id);
499  CHECK(file_info) << "FileInfo with freed page not found";
500  return file_info;
501 }
LRUEvictionAlgorithm table_evict_alg_
void deleteWrapperFile(int32_t db, int32_t tb)
Deletes the wrapper file from a table subdir.
ChunkKeyToChunkMap::iterator deleteBufferUnlocked(const ChunkKeyToChunkMap::iterator chunk_it, const bool purge=true) override
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:330
ChunkKey evict_chunk_or_fail(LRUEvictionAlgorithm &alg)
std::vector< ChunkKey > getKeysForTable(int32_t db_id, int32_t tb_id) const
returns set of keys contained in chunkIndex_ that match the given table prefix.
std::pair< int, int > get_table_prefix(const ChunkKey &key)
Definition: types.h:62
#define CHECK(condition)
Definition: Logger.h:291
FileInfo * getFileInfoForFileId(const int32_t fileId) const
Definition: FileMgr.h:229

FileInfo * File_Namespace::CachingFileMgr::evictPages ( )
private

Evicts all data pages for the least recently used chunk (metadata pages persist). Returns the first FileInfo that a page was evicted from (guaranteed to now have at least one free page in it).

Definition at line 503 of file CachingFileMgr.cpp.

References CHECK, and anonymous_namespace{CachingFileMgr.cpp}::evict_chunk_or_fail().

503  {
504  FileInfo* file_info{nullptr};
505  FileBuffer* buf{nullptr};
506  while (!file_info) {
508  CHECK(buf);
509  if (!buf->hasDataPages()) {
510  // This buffer contains no chunk data (metadata only, uninitialized, size == 0,
511  // etc...) so we won't recover any space by evicting it. In this case it gets
512  // removed from the eviction queue (it will get re-added if it gets populated with
513  // data) and we look at the next chunk in queue until we find a buffer with page
514  // data.
515  continue;
516  }
517  // Return the FileInfo for the first file we are freeing a page from so that the
518  // caller does not have to search for a FileInfo guaranteed to have at least one free
519  // page.
520  CHECK(buf->getMultiPage().front().pageVersions.size() > 0);
521  file_info = getFileInfoForFileId(
522  buf->getMultiPage().front().pageVersions.front().page.fileId);
523  }
524  auto pages_freed = buf->freeChunkPages();
525  CHECK(pages_freed > 0) << "failed to evict a page";
526  CHECK(file_info) << "FileInfo with freed page not found";
527  return file_info;
528 }
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:330
ChunkKey evict_chunk_or_fail(LRUEvictionAlgorithm &alg)
#define CHECK(condition)
Definition: Logger.h:291
FileInfo * getFileInfoForFileId(const int32_t fileId) const
Definition: FileMgr.h:229
LRUEvictionAlgorithm chunk_evict_alg_
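
The skip-until-data loop in the listing above can be sketched in isolation. This is a simplified model under stated assumptions, not the library code: evict_lru_with_data, the std::list queue, and the page-count map are hypothetical, and where the real code goes through evict_chunk_or_fail(), this sketch returns an empty optional when the queue runs out.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <optional>

// Hypothetical model: chunk id -> number of data pages it currently holds.
using PageCounts = std::map<int, int>;

// Pops chunks from the front (least recently used end) of the queue until one
// with data pages is found, frees those pages, and returns the chunk id.
// Chunks without data pages are dropped from the queue and skipped.
std::optional<int> evict_lru_with_data(std::list<int>& lru_queue, PageCounts& pages) {
  while (!lru_queue.empty()) {
    const int chunk = lru_queue.front();
    lru_queue.pop_front();
    auto it = pages.find(chunk);
    if (it == pages.end() || it->second == 0) {
      continue;  // nothing to reclaim from this chunk
    }
    it->second = 0;  // "free" the data pages; the map entry (metadata) remains
    return chunk;
  }
  return std::nullopt;  // queue exhausted in this sketch
}
```

Dropping empty chunks from the queue as they are encountered keeps later evictions from rescanning entries that cannot free any space.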

bool File_Namespace::CachingFileMgr::failOnReadError ( ) const
inline override virtual

True if a read error should cause a fatal error.

Reimplemented from File_Namespace::FileMgr.

Definition at line 298 of file CachingFileMgr.h.

298 { return false; }
void File_Namespace::CachingFileMgr::free_page ( std::pair< FileInfo *, int32_t > &&  page)
override virtual

Unlike the FileMgr, the CFM frees pages immediately instead of holding them until the next checkpoint.

Reimplemented from File_Namespace::FileMgr.

Definition at line 735 of file CachingFileMgr.cpp.

735  {
736  page.first->freePageDeferred(page.second);
737 }
size_t File_Namespace::CachingFileMgr::getAllocated ( )
inline override

Definition at line 218 of file CachingFileMgr.h.

References getFilesSize(), and getTableFileMgrsSize().

Referenced by getAvailableSpace().

218  {
219  return getFilesSize() + getTableFileMgrsSize();
220  }
size_t getFilesSize() const
Get the total size of page files (data and metadata files). This includes allocated, but unused space.
size_t getTableFileMgrsSize() const
Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirector...

size_t File_Namespace::CachingFileMgr::getAvailableSpace ( )
inline

Definition at line 214 of file CachingFileMgr.h.

References getAllocated(), and max_size_.

size_t File_Namespace::CachingFileMgr::getAvailableWrapperSpace ( )
inline

Definition at line 215 of file CachingFileMgr.h.

References getTableFileMgrsSize(), and max_wrapper_space_.

215  {
217  }
size_t getTableFileMgrsSize() const
Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirector...

std::optional< FileBuffer * > File_Namespace::CachingFileMgr::getBufferIfExists ( const ChunkKey key)

An optional-returning version of getBuffer for callers that are not sure the chunk exists.

Definition at line 704 of file CachingFileMgr.cpp.

704  {
706  auto chunk_it = chunkIndex_.find(key);
707  if (chunk_it == chunkIndex_.end()) {
708  return {};
709  }
710  return getBufferUnlocked(key);
711 }
std::shared_lock< T > shared_lock
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:330
heavyai::shared_mutex chunkIndexMutex_
Definition: FileMgr.h:420
FileBuffer * getBufferUnlocked(const ChunkKey &key, const size_t numBytes=0) const override
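
The optional-returning lookup pattern used by getBufferIfExists() can be sketched as follows. The Buffer struct, the int key, and get_if_exists are hypothetical stand-ins for FileBuffer, ChunkKey, and the member function; the shared lock taken by the real code is omitted.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for FileBuffer.
struct Buffer {
  std::string payload;
};

// Returns a pointer to the buffer when the key is present, or an empty
// optional when it is not, so callers never dereference a missing entry.
std::optional<Buffer*> get_if_exists(std::map<int, Buffer>& index, int key) {
  auto it = index.find(key);
  if (it == index.end()) {
    return {};  // empty optional: the key is not cached
  }
  return &it->second;
}
```

A caller tests has_value() before dereferencing, which makes the "may be absent" case explicit in the type instead of relying on a null pointer convention.
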
FileBuffer * File_Namespace::CachingFileMgr::getBufferUnlocked ( const ChunkKey key,
const size_t  numBytes = 0 
) const
override private virtual

Reimplemented from File_Namespace::FileMgr.

Definition at line 729 of file CachingFileMgr.cpp.

730  {
731  touchKey(key);
732  return FileMgr::getBufferUnlocked(key, num_bytes);
733 }
void touchKey(const ChunkKey &key) const
Used to track which tables/chunks were least recently used.
virtual FileBuffer * getBufferUnlocked(const ChunkKey &key, const size_t numBytes=0) const
Definition: FileMgr.cpp:790
std::vector< ChunkKey > File_Namespace::CachingFileMgr::getChunkKeysForPrefix ( const ChunkKey prefix) const

Returns the keys for chunks with chunk data that match the given prefix.

Definition at line 582 of file CachingFileMgr.cpp.

References in_same_table().

583  {
585  std::vector<ChunkKey> chunks;
586  for (auto [key, buf] : chunkIndex_) {
587  if (in_same_table(key, prefix)) {
588  if (buf->hasDataPages()) {
589  chunks.emplace_back(key);
590  touchKey(key);
591  }
592  }
593  }
594  return chunks;
595 }
void touchKey(const ChunkKey &key) const
Used to track which tables/chunks were least recently used.
std::shared_lock< T > shared_lock
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:330
heavyai::shared_mutex chunkIndexMutex_
Definition: FileMgr.h:420
bool in_same_table(const ChunkKey &left_key, const ChunkKey &right_key)
Definition: types.h:83
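
Several listings on this page call touchKey() to record that a key was just used. A minimal sketch of what "touching" means in a least-recently-used queue is given below. It is an illustration only: LruQueue and its int keys are hypothetical, and the real LRUEvictionAlgorithm may be structured differently.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <optional>
#include <unordered_map>

// Hypothetical LRU queue keyed by int; not the real LRUEvictionAlgorithm.
class LruQueue {
 public:
  // Marks `key` as most recently used, adding it if it is not tracked yet.
  void touch(int key) {
    auto it = position_.find(key);
    if (it != position_.end()) {
      order_.erase(it->second);
    }
    order_.push_back(key);
    position_[key] = std::prev(order_.end());
  }

  // Removes and returns the least recently used key, if any.
  std::optional<int> evict() {
    if (order_.empty()) {
      return std::nullopt;
    }
    const int key = order_.front();
    order_.pop_front();
    position_.erase(key);
    return key;
  }

 private:
  std::list<int> order_;  // front = least recently used
  std::unordered_map<int, std::list<int>::iterator> position_;
};
```

Pairing the list with a map of iterators keeps both touch() and evict() at constant time on average, since std::list iterators stay valid when other elements are erased.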

void File_Namespace::CachingFileMgr::getChunkMetadataVecForKeyPrefix ( ChunkMetadataVector chunkMetadataVec,
const ChunkKey keyPrefix 
)
override

Definition at line 720 of file CachingFileMgr.cpp.

722  {
723  FileMgr::getChunkMetadataVecForKeyPrefix(chunkMetadataVec, keyPrefix);
724  for (const auto& [key, meta] : chunkMetadataVec) {
725  touchKey(key);
726  }
727 }
void touchKey(const ChunkKey &key) const
Used to track which tables/chunks were least recently used.
void getChunkMetadataVecForKeyPrefix(ChunkMetadataVector &chunkMetadataVec, const ChunkKey &keyPrefix) override
Definition: FileMgr.cpp:1021
size_t File_Namespace::CachingFileMgr::getChunkSpaceReservedByTable ( int32_t  db_id,
int32_t  tb_id 
) const

Set of functions to determine how much space is reserved in a table by type.

Definition at line 195 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::chunkIndex_, File_Namespace::FileMgr::chunkIndexMutex_, and File_Namespace::FileMgr::page_size_.

Referenced by getSpaceReservedByTable().

195  {
197  size_t space_used = 0;
198  ChunkKey min_table_key{db_id, tb_id};
199  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
200  for (auto it = chunkIndex_.lower_bound(min_table_key);
201  it != chunkIndex_.upper_bound(max_table_key);
202  ++it) {
203  auto& [key, buffer] = *it;
204  space_used += (buffer->numChunkPages() * page_size_);
205  }
206  return space_used;
207 }
std::vector< int > ChunkKey
Definition: types.h:36
const size_t page_size_
Definition: FileMgr.h:551
std::shared_lock< T > shared_lock
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:330
heavyai::shared_mutex chunkIndexMutex_
Definition: FileMgr.h:420
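
The lower_bound/upper_bound range scan in the listing above relies on the lexicographic ordering of std::vector keys. The sketch below shows the same technique on a standalone map. The names pages_for_table and the page-count value type are hypothetical; the real map stores FileBuffer pointers and multiplies by page_size_.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <vector>

using ChunkKey = std::vector<int>;  // same shape as the typedef in types.h

// Sums the page counts of every key that starts with {db_id, tb_id}.
// Hypothetical model: the map value is a page count, not a FileBuffer*.
std::size_t pages_for_table(const std::map<ChunkKey, std::size_t>& index,
                            int db_id,
                            int tb_id) {
  const ChunkKey min_key{db_id, tb_id};
  const ChunkKey max_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
  std::size_t total = 0;
  const auto end = index.upper_bound(max_key);
  for (auto it = index.lower_bound(min_key); it != end; ++it) {
    total += it->second;
  }
  return total;
}
```

Because a shorter vector sorts before any longer vector it is a prefix of, {db, tb} is a lower bound for the table's keys, and {db, tb, INT32_MAX} bounds them from above as long as the third key component is smaller than INT32_MAX.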

size_t File_Namespace::CachingFileMgr::getDataFileSize ( ) const
inline
size_t File_Namespace::CachingFileMgr::getFilesSize ( ) const

Get the total size of page files (data and metadata files). This includes allocated, but unused space.

Definition at line 554 of file CachingFileMgr.cpp.

Referenced by getAllocated().

554  {
556  size_t sum = 0;
557  for (const auto& [id, file] : files_) {
558  sum += file->size();
559  }
560  return sum;
561 }
std::shared_lock< T > shared_lock
heavyai::shared_mutex files_rw_mutex_
Definition: FileMgr.h:421
std::map< int32_t, std::unique_ptr< FileInfo > > files_
Definition: FileMgr.h:413

std::vector< ChunkKey > File_Namespace::CachingFileMgr::getKeysForTable ( int32_t  db_id,
int32_t  tb_id 
) const
private

Returns the keys contained in chunkIndex_ that match the given table prefix.

Definition at line 462 of file CachingFileMgr.cpp.

463  {
464  std::vector<ChunkKey> keys;
465  ChunkKey min_table_key{db_id, tb_id};
466  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
467  for (auto it = chunkIndex_.lower_bound(min_table_key);
468  it != chunkIndex_.upper_bound(max_table_key);
469  ++it) {
470  keys.emplace_back(it->first);
471  }
472  return keys;
473 }
std::vector< int > ChunkKey
Definition: types.h:36
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:330
std::set< ChunkKey > File_Namespace::CachingFileMgr::getKeysWithMetadata ( ) const

Definition at line 739 of file CachingFileMgr.cpp.

739  {
741  std::set<ChunkKey> ret;
742  for (const auto& [key, buf] : chunkIndex_) {
743  if (buf->hasEncoder()) {
744  ret.emplace(key);
745  }
746  }
747  return ret;
748 }
std::shared_lock< T > shared_lock
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:330
heavyai::shared_mutex chunkIndexMutex_
Definition: FileMgr.h:420
size_t File_Namespace::CachingFileMgr::getMaxDataFiles ( ) const
inline

Definition at line 204 of file CachingFileMgr.h.

References max_num_data_files_.

size_t File_Namespace::CachingFileMgr::getMaxDataFilesSize ( ) const

Definition at line 750 of file CachingFileMgr.cpp.

750  {
751  if (limit_data_size_) {
752  return *limit_data_size_;
753  }
754  return getMaxDataFiles() * getDataFileSize();
755 }
std::optional< size_t > limit_data_size_
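
The "explicit limit, else derived limit" rule in the listing above maps directly onto std::optional::value_or. A minimal sketch follows, with max_data_files_size and its parameters as hypothetical free-function equivalents of the member state.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Returns the explicit limit when one is set; otherwise derives the limit
// from the number of data files allowed and the size of each file.
std::size_t max_data_files_size(std::optional<std::size_t> limit,
                                std::size_t max_files,
                                std::size_t file_size) {
  return limit.value_or(max_files * file_size);
}
```

For example, with no explicit limit, 4 files of 100 bytes give a limit of 400 bytes, while an explicit limit of 250 bytes takes precedence over that product.
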
size_t File_Namespace::CachingFileMgr::getMaxMetaFiles ( ) const
inline

Definition at line 205 of file CachingFileMgr.h.

References max_num_meta_files_.

size_t File_Namespace::CachingFileMgr::getMaxSize ( )
inline override

Definition at line 203 of file CachingFileMgr.h.

References max_size_.

203 { return max_size_; }
size_t File_Namespace::CachingFileMgr::getMaxWrapperSize ( ) const
inline

Definition at line 206 of file CachingFileMgr.h.

References max_wrapper_space_.

size_t File_Namespace::CachingFileMgr::getMetadataFileSize ( ) const
inline
size_t File_Namespace::CachingFileMgr::getMetadataSpaceReservedByTable ( int32_t  db_id,
int32_t  tb_id 
) const

Definition at line 209 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::chunkIndex_, File_Namespace::FileMgr::chunkIndexMutex_, and File_Namespace::FileMgr::metadata_page_size_.

Referenced by getSpaceReservedByTable().

210  {
212  size_t space_used = 0;
213  ChunkKey min_table_key{db_id, tb_id};
214  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
215  for (auto it = chunkIndex_.lower_bound(min_table_key);
216  it != chunkIndex_.upper_bound(max_table_key);
217  ++it) {
218  auto& [key, buffer] = *it;
219  space_used += (buffer->numMetadataPages() * metadata_page_size_);
220  }
221  return space_used;
222 }
const size_t metadata_page_size_
Definition: FileMgr.h:552
std::vector< int > ChunkKey
Definition: types.h:36
std::shared_lock< T > shared_lock
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:330
heavyai::shared_mutex chunkIndexMutex_
Definition: FileMgr.h:420

MgrType File_Namespace::CachingFileMgr::getMgrType ( )
inline override

Definition at line 200 of file CachingFileMgr.h.

200 { return CACHING_FILE_MGR; };
static size_t File_Namespace::CachingFileMgr::getMinimumSize ( )
inline static

Definition at line 188 of file CachingFileMgr.h.

References DEFAULT_METADATA_PAGE_SIZE, File_Namespace::FileMgr::DEFAULT_NUM_PAGES_PER_METADATA_FILE, and METADATA_FILE_SPACE_PERCENTAGE.

Referenced by CommandLineOptions::validate().

188  {
189  // Currently the minimum default size is based on the metadata file size and
190  // percentage usage.
193  }
static constexpr size_t DEFAULT_NUM_PAGES_PER_METADATA_FILE
Definition: FileMgr.h:385
#define DEFAULT_METADATA_PAGE_SIZE
static constexpr float METADATA_FILE_SPACE_PERCENTAGE

size_t File_Namespace::CachingFileMgr::getNumChunksWithMetadata ( ) const

Returns the number of buffers with metadata in the CFM. Any buffer with an encoder counts.

Definition at line 609 of file CachingFileMgr.cpp.

609  {
611  size_t sum = 0;
612  for (const auto& [key, buf] : chunkIndex_) {
613  if (buf->hasEncoder()) {
614  sum++;
615  }
616  }
617  return sum;
618 }
std::shared_lock< T > shared_lock
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:330
heavyai::shared_mutex chunkIndexMutex_
Definition: FileMgr.h:420
size_t File_Namespace::CachingFileMgr::getNumDataChunks ( ) const

Returns the number of buffers with chunk data in the CFM.

Definition at line 404 of file CachingFileMgr.cpp.

404  {
406  size_t num_chunks = 0;
407  for (auto [key, buf] : chunkIndex_) {
408  if (buf->hasDataPages()) {
409  num_chunks++;
410  }
411  }
412  return num_chunks;
413 }
std::shared_lock< T > shared_lock
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:330
heavyai::shared_mutex chunkIndexMutex_
Definition: FileMgr.h:420
size_t File_Namespace::CachingFileMgr::getNumDataFiles ( ) const

Definition at line 572 of file CachingFileMgr.cpp.

572  {
574  return fileIndex_.count(page_size_);
575 }
const size_t page_size_
Definition: FileMgr.h:551
std::shared_lock< T > shared_lock
PageSizeFileMMap fileIndex_
Definition: FileMgr.h:414
heavyai::shared_mutex files_rw_mutex_
Definition: FileMgr.h:421
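
Counting files by page size, as getNumDataFiles() does with fileIndex_.count(page_size_), can be sketched on a plain std::multimap. This assumes that the file index behaves like a multimap from page size to file id; PageSizeFileIndex and num_files_with_page_size are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical file index: page size -> file id, one entry per file.
using PageSizeFileIndex = std::multimap<std::size_t, int32_t>;

// Returns how many files were created with the given page size.
std::size_t num_files_with_page_size(const PageSizeFileIndex& index,
                                     std::size_t page_size) {
  return index.count(page_size);
}
```

Under this model, data files and metadata files are told apart purely by their page size, which is why the two getters differ only in the key they pass to count().
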
size_t File_Namespace::CachingFileMgr::getNumMetaFiles ( ) const

Definition at line 577 of file CachingFileMgr.cpp.

577  {
579  return fileIndex_.count(metadata_page_size_);
580 }
const size_t metadata_page_size_
Definition: FileMgr.h:552
std::shared_lock< T > shared_lock
PageSizeFileMMap fileIndex_
Definition: FileMgr.h:414
heavyai::shared_mutex files_rw_mutex_
Definition: FileMgr.h:421
size_t File_Namespace::CachingFileMgr::getPageSize ( )
inline

Definition at line 202 of file CachingFileMgr.h.

References File_Namespace::FileMgr::page_size_.

202 { return page_size_; }
const size_t page_size_
Definition: FileMgr.h:551
size_t File_Namespace::CachingFileMgr::getSpaceReservedByTable ( int32_t  db_id,
int32_t  tb_id 
) const

Definition at line 234 of file CachingFileMgr.cpp.

References getChunkSpaceReservedByTable(), getMetadataSpaceReservedByTable(), and getTableFileMgrSpaceReserved().

234  {
235  auto chunk_space = getChunkSpaceReservedByTable(db_id, tb_id);
236  auto meta_space = getMetadataSpaceReservedByTable(db_id, tb_id);
237  auto subdir_space = getTableFileMgrSpaceReserved(db_id, tb_id);
238  return chunk_space + meta_space + subdir_space;
239 }
size_t getTableFileMgrSpaceReserved(int32_t db_id, int32_t tb_id) const
size_t getMetadataSpaceReservedByTable(int32_t db_id, int32_t tb_id) const
size_t getChunkSpaceReservedByTable(int32_t db_id, int32_t tb_id) const

std::string File_Namespace::CachingFileMgr::getStringMgrType ( )
inline override

Definition at line 201 of file CachingFileMgr.h.

201 { return ToString(CACHING_FILE_MGR); }
std::string File_Namespace::CachingFileMgr::getTableFileMgrPath ( int32_t  db,
int32_t  tb 
) const

Definition at line 179 of file CachingFileMgr.cpp.

References File_Namespace::get_dir_name_for_table(), and File_Namespace::FileMgr::getFileMgrBasePath().

179  {
180  return getFileMgrBasePath() + "/" + get_dir_name_for_table(db_id, tb_id);
181 }
std::string get_dir_name_for_table(int db_id, int tb_id)
std::string getFileMgrBasePath() const
Definition: FileMgr.h:334

size_t File_Namespace::CachingFileMgr::getTableFileMgrSpaceReserved ( int32_t  db_id,
int32_t  tb_id 
) const

Definition at line 224 of file CachingFileMgr.cpp.

References table_dirs_, and table_dirs_mutex_.

Referenced by getSpaceReservedByTable().

224  {
226  size_t space = 0;
227  auto table_it = table_dirs_.find({db_id, tb_id});
228  if (table_it != table_dirs_.end()) {
229  space += table_it->second->getReservedSpace();
230  }
231  return space;
232 }
heavyai::shared_mutex table_dirs_mutex_
std::shared_lock< T > shared_lock
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_

size_t File_Namespace::CachingFileMgr::getTableFileMgrsSize ( ) const

Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirectory for serialized data wrappers and epoch files.

Definition at line 563 of file CachingFileMgr.cpp.

Referenced by getAllocated(), and getAvailableWrapperSpace().

563  {
565  size_t space_used = 0;
566  for (const auto& [pair, table_dir] : table_dirs_) {
567  space_used += table_dir->getReservedSpace();
568  }
569  return space_used;
570 }
heavyai::shared_mutex table_dirs_mutex_
std::shared_lock< T > shared_lock
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_

bool File_Namespace::CachingFileMgr::hasFileMgrKey ( ) const
inline override virtual

Query to determine if the contained pages will have their database and table ids overridden by the filemgr key (FileMgr does this).

Reimplemented from File_Namespace::FileMgr.

Definition at line 237 of file CachingFileMgr.h.

237 { return false; }
bool File_Namespace::CachingFileMgr::hasWrapperFile ( int32_t  db_id,
int32_t  table_id 
) const

Checks if data wrapper file has been written to disk/cached.

Definition at line 671 of file CachingFileMgr.cpp.

671  {
673  auto it = table_dirs_.find({db_id, table_id});
674  if (it != table_dirs_.end()) {
675  return it->second->hasWrapperFile();
676  }
677  return false;
678 }
heavyai::shared_mutex table_dirs_mutex_
std::shared_lock< T > shared_lock
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
void File_Namespace::CachingFileMgr::incrementAllEpochs ( )
private

Increment epochs for each table in the CFM.

Definition at line 318 of file CachingFileMgr.cpp.

Referenced by init().

318  {
320  for (auto& table_dir : table_dirs_) {
321  table_dir.second->incrementEpoch();
322  }
323 }
heavyai::shared_mutex table_dirs_mutex_
std::shared_lock< T > shared_lock
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_

void File_Namespace::CachingFileMgr::incrementEpoch ( int32_t  db_id,
int32_t  tb_id 
)
private

Increments epoch for the given table.

Definition at line 158 of file CachingFileMgr.cpp.

References CHECK, table_dirs_, and table_dirs_mutex_.

158  {
160  auto tables_it = table_dirs_.find({db_id, tb_id});
161  CHECK(tables_it != table_dirs_.end());
162  auto& [pair, table_dir] = *tables_it;
163  table_dir->incrementEpoch();
164 }
heavyai::shared_mutex table_dirs_mutex_
std::shared_lock< T > shared_lock
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
#define CHECK(condition)
Definition: Logger.h:291
void File_Namespace::CachingFileMgr::init ( const size_t  num_reader_threads)
private

Initializes a CFM, parsing any existing files and initializing data structures appropriately (currently not thread-safe).

Definition at line 83 of file CachingFileMgr.cpp.

References createBufferFromHeaders(), deleteCacheIfTooLarge(), File_Namespace::FileMgr::freePages(), incrementAllEpochs(), File_Namespace::FileMgr::initializeNumThreads(), File_Namespace::FileMgr::isFullyInitted_, File_Namespace::FileMgr::nextFileId_, File_Namespace::FileMgr::openFiles(), readTableFileMgrs(), gpu_enabled::sort(), and VLOG.

Referenced by CachingFileMgr().

83  {
86  auto open_files_result = openFiles();
87  /* Sort headerVec so that all HeaderInfos
88  * from a chunk will be grouped together
89  * and in order of increasing PageId
90  * - Version Epoch */
91  auto& header_vec = open_files_result.header_infos;
92  std::sort(header_vec.begin(), header_vec.end());
93 
94  /* Goal of next section is to find sequences in the
95  * sorted headerVec of the same ChunkId, which we
96  * can then initiate a FileBuffer with */
97  VLOG(3) << "Number of Headers in Vector: " << header_vec.size();
98  if (header_vec.size() > 0) {
99  auto startIt = header_vec.begin();
100  ChunkKey lastChunkKey = startIt->chunkKey;
101  for (auto it = header_vec.begin() + 1; it != header_vec.end(); ++it) {
102  if (it->chunkKey != lastChunkKey) {
103  createBufferFromHeaders(lastChunkKey, startIt, it);
104  lastChunkKey = it->chunkKey;
105  startIt = it;
106  }
107  }
108  createBufferFromHeaders(lastChunkKey, startIt, header_vec.end());
109  }
110  nextFileId_ = open_files_result.max_file_id + 1;
112  freePages();
113  initializeNumThreads(num_reader_threads);
114  isFullyInitted_ = true;
115 }
std::vector< int > ChunkKey
Definition: types.h:36
OpenFilesResult openFiles()
Definition: FileMgr.cpp:200
DEVICE void sort(ARGS &&...args)
Definition: gpu_enabled.h:105
void deleteCacheIfTooLarge()
When the cache is read from disk, we don't know which chunks were least recently used. Rather than try to evict random pages to get down to size, we just reset the cache to make sure we have space.
void incrementAllEpochs()
Increment epochs for each table in the CFM.
void readTableFileMgrs()
Checks for any sub-directories containing table-specific data and creates epochs from found files...
void initializeNumThreads(size_t num_reader_threads=0)
Definition: FileMgr.cpp:1604
unsigned nextFileId_
number of threads used when loading data
Definition: FileMgr.h:416
#define VLOG(n)
Definition: Logger.h:388
FileBuffer * createBufferFromHeaders(const ChunkKey &key, const std::vector< HeaderInfo >::const_iterator &startIt, const std::vector< HeaderInfo >::const_iterator &endIt) override
Creates a buffer and initializes it with info read from files on disk.
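
The sort-then-group pass in init() can be sketched independently of the file manager. In the sketch below, Header is a hypothetical (chunk id, page id) pair standing in for HeaderInfo, and group_runs returns index ranges instead of calling createBufferFromHeaders() on each run.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical header record: (chunk id, page id). The real HeaderInfo holds more.
using Header = std::pair<int, int>;

// Sorts the headers, then returns one [begin, end) index range per run of
// headers that share a chunk id.
std::vector<std::pair<std::size_t, std::size_t>> group_runs(std::vector<Header>& headers) {
  std::vector<std::pair<std::size_t, std::size_t>> runs;
  if (headers.empty()) {
    return runs;
  }
  std::sort(headers.begin(), headers.end());
  std::size_t start = 0;
  for (std::size_t i = 1; i < headers.size(); ++i) {
    if (headers[i].first != headers[start].first) {
      runs.emplace_back(start, i);  // close the run that just ended
      start = i;
    }
  }
  runs.emplace_back(start, headers.size());  // close the final run
  return runs;
}
```

As in the listing, the last run has to be closed after the loop, because the loop only emits a run when it sees the first header of the next one.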

FileBuffer * File_Namespace::CachingFileMgr::putBuffer ( const ChunkKey key,
AbstractBuffer src_buffer,
const size_t  num_bytes = 0 
)
override

Deletes any existing buffer for the given key, then copies in a new one.

putBuffer() needs to behave differently than it does in FileMgr. Specifically, it needs to delete the buffer beforehand and then append, rather than overwrite the existing buffer. This way we only store a single version of the buffer rather than accumulating versions that need to be rolled off.

Definition at line 306 of file CachingFileMgr.cpp.

References CHECK, Data_Namespace::AbstractBuffer::isDirty(), Data_Namespace::AbstractBuffer::setAppended(), Data_Namespace::AbstractBuffer::setDirty(), and Data_Namespace::AbstractBuffer::size().

308  {
309  CHECK(!src_buffer->isDirty()) << "Cannot cache dirty buffers.";
311  // Since the buffer is not dirty we mark it as dirty if we are only writing metadata and
312  // appended if we are writing chunk data. We delete + append rather than write to make
313  // sure we don't write multiple page versions.
314  (src_buffer->size() == 0) ? src_buffer->setDirty() : src_buffer->setAppended();
315  return FileMgr::putBuffer(key, src_buffer, num_bytes);
316 }
void deleteBufferIfExists(const ChunkKey &key)
deletes a buffer if it exists in the mgr. Otherwise do nothing.
#define CHECK(condition)
Definition: Logger.h:291
FileBuffer * putBuffer(const ChunkKey &key, AbstractBuffer *d, const size_t numBytes=0) override
Puts the contents of d into the Chunk with the given key.
Definition: FileMgr.cpp:816

void File_Namespace::CachingFileMgr::readOnlyCheck ( const std::string &  action,
const std::optional< std::string > &  file_name = {} 
) const
inline override private virtual

Reimplemented from File_Namespace::FileMgr.

Definition at line 499 of file CachingFileMgr.h.

500  {}) const override{};
void File_Namespace::CachingFileMgr::readTableFileMgrs ( )
private

Checks for any sub-directories containing table-specific data and creates epochs from found files.

Definition at line 117 of file CachingFileMgr.cpp.

References CHECK, File_Namespace::FileMgr::fileMgrBasePath_, table_dirs_, and table_dirs_mutex_.

Referenced by init().

117  {
119  bf::path path(fileMgrBasePath_);
120  CHECK(bf::exists(path)) << "Cache path: " << fileMgrBasePath_ << " does not exist.";
121  CHECK(bf::is_directory(path))
122  << "Specified path '" << fileMgrBasePath_ << "' for disk cache is not a directory.";
123 
124  // Look for directories with table-specific names.
125  boost::regex table_filter("table_([0-9]+)_([0-9]+)");
126  for (const auto& file : bf::directory_iterator(path)) {
127  boost::smatch match;
128  auto file_name = file.path().filename().string();
129  if (boost::regex_match(file_name, match, table_filter)) {
130  int32_t db_id = std::stoi(match[1]);
131  int32_t tb_id = std::stoi(match[2]);
132  TablePair table_pair{db_id, tb_id};
133  CHECK(table_dirs_.find(table_pair) == table_dirs_.end())
134  << "Trying to read data for existing table";
135  table_dirs_.emplace(table_pair,
136  std::make_unique<TableFileMgr>(file.path().string()));
137  }
138  }
139 }
heavyai::shared_mutex table_dirs_mutex_
std::string fileMgrBasePath_
Definition: FileMgr.h:411
std::unique_lock< T > unique_lock
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
#define CHECK(condition)
Definition: Logger.h:291
std::pair< const int32_t, const int32_t > TablePair
Definition: FileMgr.h:98
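
The directory-name filter in readTableFileMgrs() can be tried out in isolation. The sketch below uses std::regex from the standard library instead of boost::regex, with the same pattern. The function parse_table_dir is hypothetical and only covers the name parsing, not the directory iteration or the creation of a TableFileMgr.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Parses "table_<db>_<tb>" into {db_id, tb_id}. Returns an empty optional for
// any name that does not match the whole pattern.
std::optional<std::pair<int32_t, int32_t>> parse_table_dir(const std::string& name) {
  static const std::regex table_filter("table_([0-9]+)_([0-9]+)");
  std::smatch match;
  if (!std::regex_match(name, match, table_filter)) {
    return std::nullopt;
  }
  // std::stoi throws std::out_of_range if the digits do not fit in an int.
  return std::pair<int32_t, int32_t>(std::stoi(match[1].str()),
                                     std::stoi(match[2].str()));
}
```

Because regex_match requires the entire name to match, data and metadata page files that live in the same cache directory are ignored by this filter.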

std::unique_ptr< CachingFileMgr > File_Namespace::CachingFileMgr::reconstruct ( ) const

Initializes a new CFM using the initialization values in the current CFM.

Definition at line 642 of file CachingFileMgr.cpp.

642  {
643  DiskCacheConfig config{fileMgrBasePath_,
646  max_size_,
647  page_size_,
649  return std::make_unique<CachingFileMgr>(config);
650 }
const size_t metadata_page_size_
Definition: FileMgr.h:552
const size_t page_size_
Definition: FileMgr.h:551
std::string fileMgrBasePath_
Definition: FileMgr.h:411
size_t num_reader_threads_
Maps page sizes to FileInfo objects.
Definition: FileMgr.h:415
void File_Namespace::CachingFileMgr::removeChunkKeepMetadata ( const ChunkKey key)

Free pages for chunk and remove it from the chunk eviction algorithm.

Definition at line 597 of file CachingFileMgr.cpp.

References CHECK.

597  {
598  if (isBufferOnDevice(key)) {
599  auto chunkIt = chunkIndex_.find(key);
600  CHECK(chunkIt != chunkIndex_.end());
601  auto& buf = chunkIt->second;
602  if (buf->hasDataPages()) {
603  buf->freeChunkPages();
605  }
606  }
607 }
void removeChunk(const ChunkKey &) override
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:330
bool isBufferOnDevice(const ChunkKey &key) override
Definition: FileMgr.cpp:748
#define CHECK(condition)
Definition: Logger.h:291
LRUEvictionAlgorithm chunk_evict_alg_
void File_Namespace::CachingFileMgr::removeKey ( const ChunkKey key) const
private

Definition at line 535 of file CachingFileMgr.cpp.

References get_table_prefix().

535  {
536  // chunkIndex lock should already be acquired.
538  auto [db_id, tb_id] = get_table_prefix(key);
539  ChunkKey table_key{db_id, tb_id};
540  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
541  for (auto it = chunkIndex_.lower_bound(table_key);
542  it != chunkIndex_.upper_bound(max_table_key);
543  ++it) {
544  if (it->first != key) {
545  // If there are any keys in this table other than the one we are removing, then
546  // keep the table in the eviction queue.
547  return;
548  }
549  }
550  // No other keys exist for this table, so remove it from the queue.
551  table_evict_alg_.removeChunk(table_key);
552 }

void File_Namespace::CachingFileMgr::removeTableBuffers ( int32_t  db_id,
int32_t  tb_id 
)
private

Erases and cleans up all buffers for a table.

Definition at line 335 of file CachingFileMgr.cpp.

Referenced by clearForTable().

335  {
336  // Free associated FileBuffers and clear buffer entries.
338  ChunkKey min_table_key{db_id, tb_id};
339  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
340  for (auto it = chunkIndex_.lower_bound(min_table_key);
341  it != chunkIndex_.upper_bound(max_table_key);) {
342  it = deleteBufferUnlocked(it);
343  }
344 }
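
The loop above relies on the lexicographic ordering of ChunkKey (a std::vector<int>): every key that starts with {db_id, tb_id} sorts between {db_id, tb_id} and {db_id, tb_id, INT32_MAX}. The following is a minimal, self-contained sketch of that range-erase pattern. The function name erase_table_entries and the std::string payload are hypothetical stand-ins, and std::map::erase takes the place of deleteBufferUnlocked(); it is not the CachingFileMgr implementation.

```cpp
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <vector>

using ChunkKey = std::vector<int>;  // same alias as shown for types.h

// Hypothetical helper: erases every entry whose key starts with
// {db_id, tb_id} and returns how many entries were removed.
inline std::size_t erase_table_entries(std::map<ChunkKey, std::string>& index,
                                       int db_id,
                                       int tb_id) {
  const ChunkKey min_table_key{db_id, tb_id};
  const ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int>::max()};
  std::size_t erased = 0;
  // Vectors compare lexicographically, so {db, tb} sorts before any longer
  // key with that prefix, and {db, tb, INT_MAX} sorts after the table's keys.
  for (auto it = index.lower_bound(min_table_key);
       it != index.upper_bound(max_table_key);) {
    // erase() returns the iterator following the removed element, so the
    // loop never advances an invalidated iterator.
    it = index.erase(it);
    ++erased;
  }
  return erased;
}
```

Keys belonging to neighbouring tables, such as {db_id, tb_id + 1, ...}, fall outside the range and are left untouched.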

void File_Namespace::CachingFileMgr::removeTableFileMgr ( int32_t  db_id,
int32_t  tb_id 
)
private

Removes the subdirectory content for a table.

Definition at line 325 of file CachingFileMgr.cpp.

Referenced by clearForTable().

325  {
326  // Delete table-specific directory (stores table epoch data and serialized data wrapper)
328  auto it = table_dirs_.find({db_id, tb_id});
329  if (it != table_dirs_.end()) {
330  it->second->removeDiskContent();
331  table_dirs_.erase(it);
332  }
333 }

Page File_Namespace::CachingFileMgr::requestFreePage ( size_t  pagesize,
const bool  isMetadata 
)
override virtual

Requests a free page as FileMgr does, but this override also evicts existing pages to make space if none are available.

Reimplemented from File_Namespace::FileMgr.

Definition at line 423 of file CachingFileMgr.cpp.

References CHECK, File_Namespace::FileInfo::fileId, and File_Namespace::FileInfo::getFreePage().

423  {
424  std::lock_guard<std::mutex> lock(getPageMutex_);
425  int32_t pageNum = -1;
426  // Splits files into metadata and regular data by size.
427  auto candidateFiles = fileIndex_.equal_range(pageSize);
428  // Check if there is a free page in an existing file.
429  for (auto fileIt = candidateFiles.first; fileIt != candidateFiles.second; ++fileIt) {
430  FileInfo* fileInfo = getFileInfoForFileId(fileIt->second);
431  pageNum = fileInfo->getFreePage();
432  if (pageNum != -1) {
433  return (Page(fileInfo->fileId, pageNum));
434  }
435  }
436 
437  // Try to add a new file if there is free space available.
438  FileInfo* fileInfo = nullptr;
439  if (isMetadata) {
440  if (getMaxMetaFiles() > getNumMetaFiles()) {
441  fileInfo = createFileInfo(pageSize, num_pages_per_metadata_file_);
442  }
443  } else {
444  if (getMaxDataFiles() > getNumDataFiles()) {
445  fileInfo = createFileInfo(pageSize, num_pages_per_data_file_);
446  }
447  }
448 
449  if (!fileInfo) {
450  // We were not able to create a new file, so we try to evict space.
451  // Eviction will return the first file it evicted a page from (a file now guaranteed
452  // to have a free page).
453  fileInfo = isMetadata ? evictMetadataPages() : evictPages();
454  }
455  CHECK(fileInfo);
456 
457  pageNum = fileInfo->getFreePage();
458  CHECK(pageNum != -1);
459  return (Page(fileInfo->fileId, pageNum));
460 }
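
The listing above follows a three-step policy: reuse a free page in an existing file, otherwise create a new file if the file limit allows, otherwise evict to free a page. The sketch below illustrates that control flow only. SketchFile, SketchPage, SketchPool and the "free page 0 of file 0" eviction are hypothetical simplifications, not the real FileInfo, Page, or LRU eviction logic, and the sketch ignores locking and the metadata/data split.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for FileInfo: tracks which pages are in use.
struct SketchFile {
  std::vector<bool> used;
  explicit SketchFile(std::size_t num_pages) : used(num_pages, false) {}
  // Marks the first free page as used and returns its index, or -1 if full.
  int take_free_page() {
    for (std::size_t i = 0; i < used.size(); ++i) {
      if (!used[i]) {
        used[i] = true;
        return static_cast<int>(i);
      }
    }
    return -1;
  }
};

struct SketchPage {
  int file_id;
  int page_num;
};

class SketchPool {
 public:
  SketchPool(std::size_t max_files, std::size_t pages_per_file)
      : max_files_(max_files), pages_per_file_(pages_per_file) {
    assert(max_files_ > 0 && pages_per_file_ > 0);
  }

  SketchPage request_free_page() {
    // Step 1: look for a free page in an existing file.
    for (std::size_t id = 0; id < files_.size(); ++id) {
      const int page = files_[id].take_free_page();
      if (page != -1) {
        return {static_cast<int>(id), page};
      }
    }
    // Step 2: create a new file if the limit has not been reached.
    std::size_t id = 0;
    if (files_.size() < max_files_) {
      files_.emplace_back(pages_per_file_);
      id = files_.size() - 1;
    } else {
      // Step 3: evict; the chosen file is now guaranteed to have a free page.
      id = evict_one_page();
    }
    const int page = files_[id].take_free_page();
    assert(page != -1);
    return {static_cast<int>(id), page};
  }

  std::size_t num_files() const { return files_.size(); }

 private:
  // Placeholder eviction policy (assumption): free page 0 of file 0.
  std::size_t evict_one_page() {
    files_[0].used[0] = false;
    return 0;
  }

  std::size_t max_files_;
  std::size_t pages_per_file_;
  std::vector<SketchFile> files_;
};
```

With a pool limited to one file of two pages, the first two requests fill the file and the third is satisfied only after the eviction step frees a page.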

void File_Namespace::CachingFileMgr::setDataSizeLimit ( size_t  max)
inline

Definition at line 381 of file CachingFileMgr.h.

References limit_data_size_.

381 { limit_data_size_ = max; }
void File_Namespace::CachingFileMgr::setMaxNumDataFiles ( size_t  max)
inline

Definition at line 377 of file CachingFileMgr.h.

References max_num_data_files_.

void File_Namespace::CachingFileMgr::setMaxNumMetadataFiles ( size_t  max)
inline

Definition at line 378 of file CachingFileMgr.h.

References max_num_meta_files_.

void File_Namespace::CachingFileMgr::setMaxSizes ( )
private

Sets the maximum number of files/space for each type of storage based on the maximum size.

Definition at line 689 of file CachingFileMgr.cpp.

References CHECK_GT.

Referenced by CachingFileMgr().

689  {
690  size_t max_meta_space = std::floor(max_size_ * METADATA_SPACE_PERCENTAGE);
691  size_t max_meta_file_space = std::floor(max_size_ * METADATA_FILE_SPACE_PERCENTAGE);
692  max_wrapper_space_ = max_meta_space - max_meta_file_space;
693  auto max_data_space = max_size_ - max_meta_space;
694  auto meta_file_size = metadata_page_size_ * num_pages_per_metadata_file_;
695  auto data_file_size = page_size_ * num_pages_per_data_file_;
696  max_num_data_files_ = max_data_space / data_file_size;
697  max_num_meta_files_ = max_meta_file_space / meta_file_size;
698  CHECK_GT(max_num_data_files_, 0U) << "Cannot create a cache of size " << max_size_
699  << ". Not enough space to create a data file.";
700  CHECK_GT(max_num_meta_files_, 0U) << "Cannot create a cache of size " << max_size_
701  << ". Not enough space to create a metadata file.";
702 }
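
The arithmetic above can be checked with a small worked example. The sketch below reproduces the same partitioning using the two constants documented on this page (0.1 and 0.01). The struct CacheLimits, the function compute_limits, and all page-size and pages-per-file arguments are hypothetical; the sketch also uses double rather than float for the fractions, so rounding may differ slightly from the real code for large sizes.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical result type for the sketch.
struct CacheLimits {
  std::size_t max_wrapper_space;
  std::size_t max_num_data_files;
  std::size_t max_num_meta_files;
};

// Splits max_size into wrapper space, data files and metadata files.
// Page sizes and pages-per-file are caller-supplied assumptions here.
inline CacheLimits compute_limits(std::size_t max_size,
                                  std::size_t page_size,
                                  std::size_t metadata_page_size,
                                  std::size_t pages_per_data_file,
                                  std::size_t pages_per_meta_file) {
  constexpr double kMetadataSpaceFraction = 0.1;       // METADATA_SPACE_PERCENTAGE
  constexpr double kMetadataFileSpaceFraction = 0.01;  // METADATA_FILE_SPACE_PERCENTAGE

  // 10% of the cache is reserved for metadata of any kind...
  const auto max_meta_space =
      static_cast<std::size_t>(std::floor(max_size * kMetadataSpaceFraction));
  // ...of which 1% of the total cache holds metadata page files.
  const auto max_meta_file_space =
      static_cast<std::size_t>(std::floor(max_size * kMetadataFileSpaceFraction));
  // The remaining 90% holds data page files.
  const std::size_t max_data_space = max_size - max_meta_space;

  return {max_meta_space - max_meta_file_space,
          max_data_space / (page_size * pages_per_data_file),
          max_meta_file_space / (metadata_page_size * pages_per_meta_file)};
}
```

For a 1,000,000-byte cache with 1,000-byte data files and 100-byte metadata files, this yields 90,000 bytes of wrapper space, 900 data files and 100 metadata files. If either file count came out as zero, the CHECK_GT calls in the listing would abort with the "Cannot create a cache of size" message.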

void File_Namespace::CachingFileMgr::setMaxWrapperSpace ( size_t  max)
inline

Definition at line 379 of file CachingFileMgr.h.

References max_wrapper_space_.

void File_Namespace::CachingFileMgr::touchKey ( const ChunkKey & key) const
private

Used to track which tables/chunks were least recently used.

Definition at line 530 of file CachingFileMgr.cpp.

References get_table_key().

530  {
533 }
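
The body of touchKey() is not shown in the listing above, so the following is only a generic illustration of least-recently-used tracking, not the LRUEvictionAlgorithm implementation. SketchLru and touch_key are hypothetical names, and the assumption that a touch updates both a per-chunk queue and a per-table queue (keyed by the {db_id, tb_id} prefix) is inferred from the description and the reference to get_table_key().

```cpp
#include <iterator>
#include <list>
#include <map>
#include <vector>

using ChunkKey = std::vector<int>;

// Hypothetical LRU queue: the front of the list is the least recently used key.
class SketchLru {
 public:
  // Moves the key to the most-recently-used end, inserting it if needed.
  void touch(const ChunkKey& key) {
    remove(key);
    order_.push_back(key);
    pos_[key] = std::prev(order_.end());
  }

  // Drops the key from the queue if it is present.
  void remove(const ChunkKey& key) {
    auto it = pos_.find(key);
    if (it != pos_.end()) {
      order_.erase(it->second);
      pos_.erase(it);
    }
  }

  // Pops the least recently used key into victim; returns false if empty.
  bool evict_next(ChunkKey& victim) {
    if (order_.empty()) {
      return false;
    }
    victim = order_.front();
    pos_.erase(victim);
    order_.pop_front();
    return true;
  }

 private:
  std::list<ChunkKey> order_;
  std::map<ChunkKey, std::list<ChunkKey>::iterator> pos_;
};

// Assumed behaviour: touching a chunk also touches its table prefix.
// at() throws std::out_of_range if the key has fewer than two elements.
inline void touch_key(SketchLru& chunk_lru,
                      SketchLru& table_lru,
                      const ChunkKey& key) {
  chunk_lru.touch(key);
  table_lru.touch(ChunkKey{key.at(0), key.at(1)});
}
```

Keeping a list iterator per key makes touch and remove cheap, since std::list iterators stay valid while other elements are inserted or erased.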

bool File_Namespace::CachingFileMgr::updatePageIfDeleted ( FileInfo * file_info,
ChunkKey chunk_key,
int32_t  contingent,
int32_t  page_epoch,
int32_t  page_num 
)
override virtual

Checks whether a page should be deleted and, if so, frees it. Returns true if the page was freed.

Reimplemented from File_Namespace::FileMgr.

Definition at line 360 of file CachingFileMgr.cpp.

References File_Namespace::DELETE_CONTINGENT, File_Namespace::FileInfo::freePage(), and File_Namespace::ROLLOFF_CONTINGENT.

364  {
365  // These contingents are stored by overwriting the bytes used for chunkKeys. If
366  // we run into a key marked for deletion in a fileMgr with no fileMgrKey (i.e.
367  // CachingFileMgr) then we can't know if the epoch is valid because we don't know
368  // the key. At this point our only option is to free the page as though it was
369  // checkpointed (which should be fine since we only maintain one version of each
370  // page).
371  if (contingent == DELETE_CONTINGENT || contingent == ROLLOFF_CONTINGENT) {
372  file_info->freePage(page_num, false, page_epoch);
373  return true;
374  }
375  return false;
376 }

void File_Namespace::CachingFileMgr::writeAndSyncEpochToDisk ( int32_t  db_id,
int32_t  tb_id 
)
private

Flushes epoch value to disk for a table.

Definition at line 166 of file CachingFileMgr.cpp.

References CHECK, table_dirs_, and table_dirs_mutex_.

166  {
168  auto table_it = table_dirs_.find({db_id, tb_id});
169  CHECK(table_it != table_dirs_.end());
170  table_it->second->writeAndSyncEpochToDisk();
171 }
void File_Namespace::CachingFileMgr::writeDirtyBuffers ( int32_t  db_id,
int32_t  tb_id 
)
private

Helper function to flush all dirty buffers to disk.

Definition at line 378 of file CachingFileMgr.cpp.

378  {
380  ChunkKey min_table_key{db_id, tb_id};
381  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
382 
383  for (auto chunk_it = chunkIndex_.lower_bound(min_table_key);
384  chunk_it != chunkIndex_.upper_bound(max_table_key);
385  ++chunk_it) {
386  if (auto [key, buf] = *chunk_it; buf->isDirty()) {
387  // Free previous versions first so we only have one metadata version.
388  buf->freeMetadataPages();
389  buf->writeMetadata(epoch(db_id, tb_id));
390  buf->clearDirtyBits();
391  touchKey(key);
392  }
393  }
394 }
void File_Namespace::CachingFileMgr::writeWrapperFile ( const std::string &  doc,
int32_t  db,
int32_t  tb 
)

Writes a wrapper file to a table subdir.

Definition at line 659 of file CachingFileMgr.cpp.

References CHECK_LE.

659  {
661  auto wrapper_size = doc.size();
662  CHECK_LE(wrapper_size, getMaxWrapperSize())
663  << "Wrapper is too big to fit into the cache";
664  while (wrapper_size > getAvailableWrapperSpace()) {
666  }
668  table_dirs_.at({db, tb})->writeWrapperFile(doc);
669 }
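
The listing above first checks that the wrapper can fit at all, then loops until enough wrapper space is available. What the loop body evicts is not visible in the listing, so the sketch below uses a hypothetical oldest-first policy purely to show why the loop terminates. SketchWrapperBudget and its members are illustrative names, not part of CachingFileMgr.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model: each entry is the wrapper size of one cached table,
// oldest first.
class SketchWrapperBudget {
 public:
  explicit SketchWrapperBudget(std::size_t max_space) : max_space_(max_space) {}

  std::size_t available() const { return max_space_ - used_; }
  std::size_t num_tables() const { return sizes_.size(); }

  void write_wrapper(std::size_t wrapper_size) {
    // Stands in for CHECK_LE: a wrapper larger than the whole budget can
    // never fit, no matter how much is evicted.
    assert(wrapper_size <= max_space_);
    // Because of the check above, evicting everything always leaves enough
    // room, so this loop ends before the queue runs dry.
    while (wrapper_size > available()) {
      evict_oldest();
    }
    sizes_.push_back(wrapper_size);
    used_ += wrapper_size;
  }

 private:
  void evict_oldest() {
    assert(!sizes_.empty());
    used_ -= sizes_.front();
    sizes_.pop_front();
  }

  std::size_t max_space_;
  std::size_t used_ = 0;
  std::deque<std::size_t> sizes_;
};
```

The up-front size check is what guarantees termination: without it, a wrapper larger than the total budget would keep evicting until nothing was left and still not fit.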

Member Data Documentation

LRUEvictionAlgorithm File_Namespace::CachingFileMgr::chunk_evict_alg_
mutable private

Definition at line 512 of file CachingFileMgr.h.

Referenced by dump(), and dumpEvictionQueue().

std::optional<size_t> File_Namespace::CachingFileMgr::limit_data_size_ {}
private

Definition at line 510 of file CachingFileMgr.h.

Referenced by setDataSizeLimit().

size_t File_Namespace::CachingFileMgr::max_num_data_files_
private

Definition at line 506 of file CachingFileMgr.h.

Referenced by getMaxDataFiles(), and setMaxNumDataFiles().

size_t File_Namespace::CachingFileMgr::max_num_meta_files_
private

Definition at line 507 of file CachingFileMgr.h.

Referenced by getMaxMetaFiles(), and setMaxNumMetadataFiles().

size_t File_Namespace::CachingFileMgr::max_size_
private

Definition at line 509 of file CachingFileMgr.h.

Referenced by CachingFileMgr(), getAvailableSpace(), and getMaxSize().

size_t File_Namespace::CachingFileMgr::max_wrapper_space_
private

constexpr float File_Namespace::CachingFileMgr::METADATA_FILE_SPACE_PERCENTAGE {0.01}
static

Definition at line 186 of file CachingFileMgr.h.

Referenced by getMinimumSize().

constexpr float File_Namespace::CachingFileMgr::METADATA_SPACE_PERCENTAGE {0.1}
static

Definition at line 184 of file CachingFileMgr.h.

std::map<TablePair, std::unique_ptr<TableFileMgr> > File_Namespace::CachingFileMgr::table_dirs_
private

heavyai::shared_mutex File_Namespace::CachingFileMgr::table_dirs_mutex_
mutable private

LRUEvictionAlgorithm File_Namespace::CachingFileMgr::table_evict_alg_
mutable private

Definition at line 513 of file CachingFileMgr.h.

Referenced by dump(), and dumpTableQueue().

constexpr char File_Namespace::CachingFileMgr::WRAPPER_FILE_NAME[] = "wrapper_metadata.json"
static

The documentation for this class was generated from the following files: