OmniSciDB  cde582ebc3
File_Namespace::CachingFileMgr Class Reference

A FileMgr capable of limiting its size and storing data from multiple tables in a shared directory. For any table that supports DiskCaching, the CachingFileMgr must contain metadata either for all of the table's chunks or for none of them (the cache either has no knowledge of that table or has complete knowledge of it). Any individual data chunk within a table may or may not be present in the cache.

#include <CachingFileMgr.h>


Public Member Functions

 CachingFileMgr (const DiskCacheConfig &config)
 
 ~CachingFileMgr () override
 
MgrType getMgrType () override
 
std::string getStringMgrType () override
 
size_t getDefaultPageSize ()
 
size_t getMaxSize () override
 
size_t getMaxDataFiles () const
 
size_t getMaxMetaFiles () const
 
size_t getMaxWrapperSize () const
 
size_t getDataFileSize () const
 
size_t getMetadataFileSize () const
 
size_t getNumDataFiles () const
 
size_t getNumMetaFiles () const
 
size_t getAvailableSpace ()
 
size_t getAvailableWrapperSpace ()
 
size_t getAllocated () override
 
size_t getMaxDataFilesSize () const
 
void removeChunkKeepMetadata (const ChunkKey &key)
 Free pages for chunk and remove it from the chunk eviction algorithm. More...
 
void clearForTable (int32_t db_id, int32_t tb_id)
 Removes all data related to the given table (pages and subdirectories). More...
 
bool hasFileMgrKey () const override
 Query to determine if the contained pages will have their database and table ids overridden by the filemgr key (FileMgr does this).
 
void closeRemovePhysical () override
 Closes files and removes the caching directory. More...
 
size_t getChunkSpaceReservedByTable (int32_t db_id, int32_t tb_id) const
 
size_t getMetadataSpaceReservedByTable (int32_t db_id, int32_t tb_id) const
 
size_t getTableFileMgrSpaceReserved (int32_t db_id, int32_t tb_id) const
 
size_t getSpaceReservedByTable (int32_t db_id, int32_t tb_id) const
 
std::string describeSelf () const override
 describes this FileMgr for logging purposes. More...
 
void checkpoint (const int32_t db_id, const int32_t tb_id) override
 writes buffers for the given table, synchronizes files to disk, updates file epoch, and commits free pages. More...
 
int32_t epoch (int32_t db_id, int32_t tb_id) const override
 obtain the epoch version for the given table. More...
 
FileBuffer * putBuffer (const ChunkKey &key, AbstractBuffer *srcBuffer, const size_t numBytes=0) override
 deletes any existing buffer for the given key then copies in a new one.
 
CachingFileBuffer * allocateBuffer (const size_t page_size, const ChunkKey &key, const size_t num_bytes=0) override
 allocates a new CachingFileBuffer and tracks its use in the eviction algorithms.
 
CachingFileBuffer * allocateBuffer (const ChunkKey &key, const std::vector< HeaderInfo >::const_iterator &headerStartIt, const std::vector< HeaderInfo >::const_iterator &headerEndIt) override
 
bool updatePageIfDeleted (FileInfo *file_info, ChunkKey &chunk_key, int32_t contingent, int32_t page_epoch, int32_t page_num) override
 checks whether a page should be deleted. More...
 
bool failOnReadError () const override
 True if a read error should cause a fatal error. More...
 
void deleteBufferIfExists (const ChunkKey &key)
 deletes a buffer if it exists in the mgr. Otherwise do nothing. More...
 
size_t getNumChunksWithMetadata () const
 Returns the number of buffers with metadata in the CFM. Any buffer with an encoder counts. More...
 
size_t getNumDataChunks () const
 Returns the number of buffers with chunk data in the CFM. More...
 
std::vector< ChunkKey > getChunkKeysForPrefix (const ChunkKey &prefix) const
 Returns the keys for chunks with chunk data that match the given prefix.
 
std::unique_ptr< CachingFileMgr > reconstruct () const
 Initializes a new CFM using the initialization values in the current CFM. More...
 
void deleteWrapperFile (int32_t db, int32_t tb)
 Deletes the wrapper file from a table subdir. More...
 
void writeWrapperFile (const std::string &doc, int32_t db, int32_t tb)
 Writes a wrapper file to a table subdir. More...
 
bool hasWrapperFile (int32_t db_id, int32_t table_id) const
 
std::string getTableFileMgrPath (int32_t db, int32_t tb) const
 
size_t getFilesSize () const
 Get the total size of page files (data and metadata files). This includes allocated, but unused space. More...
 
size_t getTableFileMgrsSize () const
 Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirectory for serialized data wrappers and epoch files. More...
 
std::optional< FileBuffer * > getBufferIfExists (const ChunkKey &key)
 an optional version of get buffer if we are not sure a chunk exists. More...
 
void free_page (std::pair< FileInfo *, int32_t > &&page) override
 Unlike the FileMgr, the CFM frees pages immediately instead of holding them until the next checkpoint. More...
 
void getChunkMetadataVecForKeyPrefix (ChunkMetadataVector &chunkMetadataVec, const ChunkKey &keyPrefix) override
 
std::string dumpKeysWithMetadata () const
 
std::string dumpKeysWithChunkData () const
 
std::string dumpTableQueue () const
 
std::string dumpEvictionQueue () const
 
std::string dump () const
 
void setMaxNumDataFiles (size_t max)
 
void setMaxNumMetadataFiles (size_t max)
 
void setMaxWrapperSpace (size_t max)
 
std::set< ChunkKey > getKeysWithMetadata () const
 
void setDataSizeLimit (size_t max)
 
- Public Member Functions inherited from File_Namespace::FileMgr
 FileMgr (const int32_t deviceId, GlobalFileMgr *gfm, const TablePair fileMgrKey, const int32_t max_rollback_epochs=-1, const size_t num_reader_threads=0, const int32_t epoch=-1, const size_t defaultPageSize=DEFAULT_PAGE_SIZE)
 Constructor. More...
 
 FileMgr (const int32_t deviceId, GlobalFileMgr *gfm, const TablePair fileMgrKey, const size_t defaultPageSize, const bool runCoreInit)
 
 FileMgr (GlobalFileMgr *gfm, const size_t defaultPageSize, std::string basePath)
 
 ~FileMgr () override
 Destructor. More...
 
StorageStats getStorageStats () const
 
FileBuffer * createBuffer (const ChunkKey &key, size_t pageSize=0, const size_t numBytes=0) override
 Creates a chunk with the specified key and page size. More...
 
bool isBufferOnDevice (const ChunkKey &key) override
 
void deleteBuffer (const ChunkKey &key, const bool purge=true) override
 Deletes the chunk with the specified key. More...
 
void deleteBuffersWithPrefix (const ChunkKey &keyPrefix, const bool purge=true) override
 
FileBuffer * getBuffer (const ChunkKey &key, const size_t numBytes=0) override
 Returns a pointer to the chunk with the specified key.
 
void fetchBuffer (const ChunkKey &key, AbstractBuffer *destBuffer, const size_t numBytes) override
 
FileBuffer * putBuffer (const ChunkKey &key, AbstractBuffer *d, const size_t numBytes=0) override
 Puts the contents of d into the Chunk with the given key.
 
AbstractBuffer * alloc (const size_t numBytes) override
 
void free (AbstractBuffer *buffer) override
 
MgrType getMgrType () override
 
std::string getStringMgrType () override
 
std::string printSlabs () override
 
size_t getMaxSize () override
 
size_t getInUseSize () override
 
size_t getAllocated () override
 
bool isAllocationCapped () override
 
FileInfo * getFileInfoForFileId (const int32_t fileId) const
 
FileMetadata getMetadataForFile (const boost::filesystem::directory_iterator &fileIterator) const
 
void init (const size_t num_reader_threads, const int32_t epochOverride)
 
void init (const std::string &dataPathToConvertFrom, const int32_t epochOverride)
 
void copyPage (Page &srcPage, FileMgr *destFileMgr, Page &destPage, const size_t reservedHeaderSize, const size_t numBytes, const size_t offset)
 
void requestFreePages (size_t npages, size_t pagesize, std::vector< Page > &pages, const bool isMetadata)
 Obtains free pages – creates new files if necessary – of the requested size. More...
 
void getChunkMetadataVecForKeyPrefix (ChunkMetadataVector &chunkMetadataVec, const ChunkKey &keyPrefix) override
 
bool hasChunkMetadataForKeyPrefix (const ChunkKey &keyPrefix)
 
void checkpoint () override
 Fsyncs data files, writes out epoch and fsyncs that. More...
 
void checkpoint (const int32_t db_id, const int32_t tb_id) override
 
int32_t epochFloor () const
 
int32_t incrementEpoch ()
 
int32_t lastCheckpointedEpoch () const
 Returns value of epoch at last checkpoint. More...
 
void resetEpochFloor ()
 
int32_t maxRollbackEpochs ()
 Returns value max_rollback_epochs. More...
 
size_t getNumReaderThreads ()
 Returns number of threads defined by parameter num-reader-threads which should be used during initial load and consequent read of data. More...
 
FILE * getFileForFileId (const int32_t fileId)
 Returns FILE pointer associated with requested fileId. More...
 
size_t getNumChunks () override
 
size_t getNumUsedMetadataPagesForChunkKey (const ChunkKey &chunkKey) const
 
int32_t getDBVersion () const
 Index for looking up chunks. More...
 
bool getDBConvert () const
 
void createTopLevelMetadata ()
 
std::string getFileMgrBasePath () const
 
void removeTableRelatedDS (const int32_t db_id, const int32_t table_id) override
 
const TablePair get_fileMgrKey () const
 
boost::filesystem::path getFilePath (const std::string &file_name) const
 
void writePageMappingsToStatusFile (const std::vector< PageMapping > &page_mappings)
 
void renameCompactionStatusFile (const char *const from_status, const char *const to_status)
 
void compactFiles ()
 

Static Public Member Functions

static size_t getMinimumSize ()
 
- Static Public Member Functions inherited from File_Namespace::FileMgr
static void setNumPagesPerDataFile (size_t num_pages)
 
static void setNumPagesPerMetadataFile (size_t num_pages)
 
static void renameAndSymlinkLegacyFiles (const std::string &table_data_dir)
 

Static Public Attributes

static constexpr char WRAPPER_FILE_NAME [] = "wrapper_metadata.json"
 
static constexpr float METADATA_SPACE_PERCENTAGE {0.1}
 
static constexpr float METADATA_FILE_SPACE_PERCENTAGE {0.01}
 
- Static Public Attributes inherited from File_Namespace::FileMgr
static constexpr size_t DEFAULT_NUM_PAGES_PER_DATA_FILE {256}
 
static constexpr size_t DEFAULT_NUM_PAGES_PER_METADATA_FILE {4096}
 
static constexpr char const * COPY_PAGES_STATUS {"pending_data_compaction_0"}
 
static constexpr char const * UPDATE_PAGE_VISIBILITY_STATUS {"pending_data_compaction_1"}
 
static constexpr char const * DELETE_EMPTY_FILES_STATUS {"pending_data_compaction_2"}
 
static constexpr char LEGACY_EPOCH_FILENAME [] = "epoch"
 
static constexpr char EPOCH_FILENAME [] = "epoch_metadata"
 
static constexpr char DB_META_FILENAME [] = "dbmeta"
 
static constexpr char FILE_MGR_VERSION_FILENAME [] = "filemgr_version"
 
static constexpr int32_t INVALID_VERSION = -1
 

Private Member Functions

void incrementEpoch (int32_t db_id, int32_t tb_id)
 Increments epoch for the given table. More...
 
void init (const size_t num_reader_threads)
 Initializes a CFM, parsing any existing files and initializing data structures appropriately (currently not thread-safe). More...
 
void writeAndSyncEpochToDisk (int32_t db_id, int32_t tb_id)
 Flushes epoch value to disk for a table. More...
 
void readTableFileMgrs ()
 Checks for any sub-directories containing table-specific data and creates epochs from found files. More...
 
FileBuffer * createBufferFromHeaders (const ChunkKey &key, const std::vector< HeaderInfo >::const_iterator &startIt, const std::vector< HeaderInfo >::const_iterator &endIt) override
 Creates a buffer and initializes it with info read from files on disk.
 
FileBuffer * createBufferUnlocked (const ChunkKey &key, size_t pageSize=0, const size_t numBytes=0) override
 Creates a buffer. More...
 
void createTableFileMgrIfNoneExists (const int32_t db_id, const int32_t tb_id)
 Create and initialize a subdirectory for a table if none exists. More...
 
void incrementAllEpochs ()
 Increment epochs for each table in the CFM. More...
 
void removeTableFileMgr (int32_t db_id, int32_t tb_id)
 Removes the subdirectory content for a table. More...
 
void removeTableBuffers (int32_t db_id, int32_t tb_id)
 Erases and cleans up all buffers for a table. More...
 
void writeDirtyBuffers (int32_t db_id, int32_t tb_id)
 helper function to flush all dirty buffers to disk. More...
 
Page requestFreePage (size_t pagesize, const bool isMetadata) override
 requests a free page similar to FileMgr, but this override will also evict existing pages to make space if there are none available. More...
 
void touchKey (const ChunkKey &key) const
 Used to track which tables/chunks were least recently used. More...
 
void removeKey (const ChunkKey &key) const
 
std::vector< ChunkKey > getKeysForTable (int32_t db_id, int32_t tb_id) const
 returns set of keys contained in chunkIndex_ that match the given table prefix.
 
FileInfo * evictMetadataPages ()
 evicts all metadata pages for the least recently used table. Returns the first FileInfo that a page was evicted from (guaranteed to now have at least one free page in it).
 
FileInfo * evictPages ()
 evicts all data pages for the least recently used Chunk (metadata pages persist). Returns the first FileInfo that a page was evicted from (guaranteed to now have at least one free page in it).
 
void deleteCacheIfTooLarge ()
 When the cache is read from disk, we don't know which chunks were least recently used. Rather than try to evict random pages to get down to size we just reset the cache to make sure we have space. More...
 
void setMaxSizes ()
 Sets the maximum number of files/space for each type of storage based on the maximum size. More...
 
FileBuffer * getBufferUnlocked (const ChunkKey &key, const size_t numBytes=0) const override
 
ChunkKeyToChunkMap::iterator deleteBufferUnlocked (const ChunkKeyToChunkMap::iterator chunk_it, const bool purge=true) override
 

Private Attributes

heavyai::shared_mutex table_dirs_mutex_
 
std::map< TablePair,
std::unique_ptr< TableFileMgr > > 
table_dirs_
 
size_t max_num_data_files_
 
size_t max_num_meta_files_
 
size_t max_wrapper_space_
 
size_t max_size_
 
std::optional< size_t > limit_data_size_ {}
 
LRUEvictionAlgorithm chunk_evict_alg_
 
LRUEvictionAlgorithm table_evict_alg_
 

Additional Inherited Members

- Public Attributes inherited from File_Namespace::FileMgr
ChunkKeyToChunkMap chunkIndex_
 
- Protected Member Functions inherited from File_Namespace::FileMgr
 FileMgr ()
 
FileInfo * createFile (const size_t pageSize, const size_t numPages)
 Adds a file to the file manager repository.
 
FileInfo * openExistingFile (const std::string &path, const int32_t fileId, const size_t pageSize, const size_t numPages, std::vector< HeaderInfo > &headerVec)
 
void createEpochFile (const std::string &epochFileName)
 
int32_t openAndReadLegacyEpochFile (const std::string &epochFileName)
 
void openAndReadEpochFile (const std::string &epochFileName)
 
void writeAndSyncEpochToDisk ()
 
void setEpoch (const int32_t newEpoch)
 
int32_t readVersionFromDisk (const std::string &versionFileName) const
 
void writeAndSyncVersionToDisk (const std::string &versionFileName, const int32_t version)
 
void processFileFutures (std::vector< std::future< std::vector< HeaderInfo >>> &file_futures, std::vector< HeaderInfo > &headerVec)
 
void migrateToLatestFileMgrVersion ()
 
void migrateEpochFileV0 ()
 
void migrateLegacyFilesV1 ()
 
OpenFilesResult openFiles ()
 
void clearFileInfos ()
 
void copySourcePageForCompaction (const Page &source_page, FileInfo *destination_file_info, std::vector< PageMapping > &page_mappings, std::set< Page > &touched_pages)
 
int32_t copyPageWithoutHeaderSize (const Page &source_page, const Page &destination_page)
 
void sortAndCopyFilePagesForCompaction (size_t page_size, std::vector< PageMapping > &page_mappings, std::set< Page > &touched_pages)
 
void updateMappedPagesVisibility (const std::vector< PageMapping > &page_mappings)
 
void deleteEmptyFiles ()
 
void resumeFileCompaction (const std::string &status_file_name)
 
std::vector< PageMapping > readPageMappingsFromStatusFile ()
 
 FileMgr (const int epoch)
 
void closePhysicalUnlocked ()
 
void syncFilesToDisk ()
 
void freePages ()
 
void initializeNumThreads (size_t num_reader_threads=0)
 
- Protected Attributes inherited from File_Namespace::FileMgr
int32_t maxRollbackEpochs_
 
std::string fileMgrBasePath_
 
std::map< int32_t, FileInfo * > files_
 A map of files accessible via a file identifier.
 
PageSizeFileMMap fileIndex_
 Maps page sizes to FileInfo objects.
 
size_t num_reader_threads_
 number of threads used when loading data
 
size_t defaultPageSize_
 
unsigned nextFileId_
 the index of the next file id
 
int32_t db_version_
 
int32_t fileMgrVersion_
 
const int32_t latestFileMgrVersion_ {2}
 
FILE * DBMetaFile_ = nullptr
 pointer to DB level metadata
 
std::mutex getPageMutex_
 
heavyai::shared_mutex chunkIndexMutex_
 
heavyai::shared_mutex files_rw_mutex_
 
heavyai::shared_mutex mutex_free_page_
 
std::vector< std::pair
< FileInfo *, int32_t > > 
free_pages_
 
bool isFullyInitted_ {false}
 
- Static Protected Attributes inherited from File_Namespace::FileMgr
static size_t num_pages_per_data_file_ {DEFAULT_NUM_PAGES_PER_DATA_FILE}
 
static size_t num_pages_per_metadata_file_ {DEFAULT_NUM_PAGES_PER_METADATA_FILE}
 

Detailed Description

A FileMgr capable of limiting its size and storing data from multiple tables in a shared directory. For any table that supports DiskCaching, the CachingFileMgr must contain metadata either for all of the table's chunks or for none of them (the cache either has no knowledge of that table or has complete knowledge of it). Any individual data chunk within a table may or may not be present in the cache.

Definition at line 172 of file CachingFileMgr.h.
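
The all-or-nothing metadata rule described above can be illustrated with a small toy model. This is only a sketch: ToyCache, ToyTableEntry and TableKey are hypothetical names that do not exist in CachingFileMgr, and chunks are reduced to plain integers.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <utility>

// Hypothetical model, not the CachingFileMgr API: a table is identified by
// (db_id, tb_id) and a chunk by a small integer id within that table.
using TableKey = std::pair<int32_t, int32_t>;

struct ToyTableEntry {
  std::set<int> chunks_with_metadata;  // all chunks of the table
  std::set<int> chunks_with_data;      // any subset of chunks_with_metadata
};

class ToyCache {
 public:
  // Metadata is cached for the whole table at once, never chunk by chunk.
  void cacheTableMetadata(const TableKey& table, const std::set<int>& all_chunks) {
    tables_[table] = ToyTableEntry{all_chunks, {}};
  }

  // Data may be cached per chunk, but only for a table the cache already knows.
  bool cacheChunkData(const TableKey& table, int chunk) {
    auto it = tables_.find(table);
    if (it == tables_.end() || it->second.chunks_with_metadata.count(chunk) == 0) {
      return false;
    }
    it->second.chunks_with_data.insert(chunk);
    return true;
  }

  // In this toy model, forgetting a table drops its metadata and data together.
  void evictTable(const TableKey& table) { tables_.erase(table); }

  bool knowsTable(const TableKey& table) const { return tables_.count(table) > 0; }

  bool hasData(const TableKey& table, int chunk) const {
    auto it = tables_.find(table);
    return it != tables_.end() && it->second.chunks_with_data.count(chunk) > 0;
  }

 private:
  std::map<TableKey, ToyTableEntry> tables_;
};
```

A caller would first register a table with cacheTableMetadata() and then add data for individual chunks; a table the model has never seen rejects data outright.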

Constructor & Destructor Documentation

File_Namespace::CachingFileMgr::CachingFileMgr ( const DiskCacheConfig & config)

Definition at line 70 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::defaultPageSize_, File_Namespace::FileMgr::fileMgrBasePath_, init(), max_size_, File_Namespace::FileMgr::maxRollbackEpochs_, File_Namespace::FileMgr::nextFileId_, File_Namespace::DiskCacheConfig::num_reader_threads, File_Namespace::DiskCacheConfig::page_size, File_Namespace::DiskCacheConfig::path, setMaxSizes(), and File_Namespace::DiskCacheConfig::size_limit.

70  {
71  fileMgrBasePath_ = config.path;
73  defaultPageSize_ = config.page_size;
74  nextFileId_ = 0;
75  max_size_ = config.size_limit;
76  init(config.num_reader_threads);
77  setMaxSizes();
78 }
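
The constructor stores size_limit and then calls setMaxSizes(), whose body is not shown on this page. The sketch below shows one plausible way to split a size limit using the METADATA_SPACE_PERCENTAGE (0.1) and METADATA_FILE_SPACE_PERCENTAGE (0.01) constants listed above. The split itself, and the names ToyBudget, splitBudget and maxFiles, are assumptions made for illustration and are not taken from the implementation.

```cpp
#include <cstddef>

struct ToyBudget {
  std::size_t metadata_file_bytes;  // page files that hold chunk metadata
  std::size_t wrapper_bytes;        // serialized wrappers and epoch files
  std::size_t data_bytes;           // page files that hold chunk data
};

// Assumed split: 10% of the limit for metadata overall (METADATA_SPACE_PERCENTAGE),
// of which 1% of the limit goes to metadata page files
// (METADATA_FILE_SPACE_PERCENTAGE). Integer division keeps the arithmetic exact.
inline ToyBudget splitBudget(std::size_t max_size) {
  const std::size_t metadata_total = max_size / 10;
  const std::size_t metadata_files = max_size / 100;
  return {metadata_files, metadata_total - metadata_files, max_size - metadata_total};
}

// Number of whole files, each holding pages_per_file pages, that fit in a budget.
inline std::size_t maxFiles(std::size_t budget_bytes,
                            std::size_t page_size,
                            std::size_t pages_per_file) {
  return budget_bytes / (page_size * pages_per_file);
}
```

Under these assumptions, a file-count limit such as max_num_data_files_ would follow from dividing the data budget by the size of one data file (page size times DEFAULT_NUM_PAGES_PER_DATA_FILE).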

File_Namespace::CachingFileMgr::~CachingFileMgr ( )
override

Definition at line 80 of file CachingFileMgr.cpp.

80 {}

Member Function Documentation

CachingFileBuffer * File_Namespace::CachingFileMgr::allocateBuffer ( const size_t page_size,
const ChunkKey & key,
const size_t num_bytes = 0
)
override virtual

allocates a new CachingFileBuffer and tracks its use in the eviction algorithms.

Reimplemented from File_Namespace::FileMgr.

Definition at line 345 of file CachingFileMgr.cpp.

347  {
348  return new CachingFileBuffer(this, page_size, key, num_bytes);
349 }

CachingFileBuffer * File_Namespace::CachingFileMgr::allocateBuffer ( const ChunkKey & key,
const std::vector< HeaderInfo >::const_iterator & headerStartIt,
const std::vector< HeaderInfo >::const_iterator & headerEndIt
)
override virtual

Reimplemented from File_Namespace::FileMgr.

Definition at line 351 of file CachingFileMgr.cpp.

354  {
355  return new CachingFileBuffer(this, key, headerStartIt, headerEndIt);
356 }

void File_Namespace::CachingFileMgr::checkpoint ( const int32_t db_id,
const int32_t  tb_id 
)
override

writes buffers for the given table, synchronizes files to disk, updates file epoch, and commits free pages.

Definition at line 245 of file CachingFileMgr.cpp.

References CHECK, table_dirs_, and table_dirs_mutex_.

245  {
246  {
248  CHECK(table_dirs_.find({db_id, tb_id}) != table_dirs_.end());
249  }
250  VLOG(2) << "Checkpointing " << describeSelf() << " (" << db_id << ", " << tb_id
251  << ") epoch: " << epoch(db_id, tb_id);
252  writeDirtyBuffers(db_id, tb_id);
253  syncFilesToDisk();
254  writeAndSyncEpochToDisk(db_id, tb_id);
255  incrementEpoch(db_id, tb_id);
256  freePages();
257 }

void File_Namespace::CachingFileMgr::clearForTable ( int32_t db_id,
int32_t  tb_id 
)

Removes all data related to the given table (pages and subdirectories).

Definition at line 172 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::freePages(), removeTableBuffers(), and removeTableFileMgr().

172  {
173  removeTableBuffers(db_id, tb_id);
174  removeTableFileMgr(db_id, tb_id);
175  freePages();
176 }

void File_Namespace::CachingFileMgr::closeRemovePhysical ( )
override virtual

Closes files and removes the caching directory.

Reimplemented from File_Namespace::FileMgr.

Definition at line 182 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::closePhysicalUnlocked(), File_Namespace::FileMgr::files_rw_mutex_, File_Namespace::FileMgr::getFileMgrBasePath(), table_dirs_, and table_dirs_mutex_.

182  {
183  {
186  }
187  {
189  table_dirs_.clear();
190  }
191  bf::remove_all(getFileMgrBasePath());
192 }

FileBuffer * File_Namespace::CachingFileMgr::createBufferFromHeaders ( const ChunkKey & key,
const std::vector< HeaderInfo >::const_iterator & startIt,
const std::vector< HeaderInfo >::const_iterator & endIt
)
override private virtual

Creates a buffer and initializes it with info read from files on disk.

Reimplemented from File_Namespace::FileMgr.

Definition at line 278 of file CachingFileMgr.cpp.

References get_table_prefix().

Referenced by init().

281  {
282  if (startIt->pageId != -1) {
283  // If the first pageId is not -1 then there is no metadata page for the
284  // current key (which means it was never checkpointed), so we should skip.
285  return nullptr;
286  }
287  touchKey(key);
288  auto [db_id, tb_id] = get_table_prefix(key);
289  createTableFileMgrIfNoneExists(db_id, tb_id);
290  auto buffer = FileMgr::createBufferFromHeaders(key, startIt, endIt);
291  if (buffer->isMissingPages()) {
292  // Detect the case where a page is missing by comparing the amount of pages read
293  // with the metadata size. If data are missing, discard the chunk.
294  buffer->freeChunkPages();
295  }
296  return buffer;
297 }
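
The two checks in the listing above (skip a chunk without a metadata page, discard data when pages are missing) can be sketched in isolation. ToyHeader, LoadResult and classifyChunk are hypothetical names, and passing the expected page count as a plain argument is an assumption; the real code asks the buffer via isMissingPages().

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for HeaderInfo: page_id == -1 marks a metadata page.
struct ToyHeader {
  int page_id;
};

enum class LoadResult { Skipped, DataDiscarded, Complete };

// headers: page headers found on disk for one chunk, metadata page first if present.
// expected_data_pages: how many data pages the metadata says the chunk should have.
inline LoadResult classifyChunk(const std::vector<ToyHeader>& headers,
                                std::size_t expected_data_pages) {
  if (headers.empty() || headers.front().page_id != -1) {
    // No metadata page: the chunk was never checkpointed, so it is skipped.
    return LoadResult::Skipped;
  }
  const std::size_t data_pages_found = headers.size() - 1;
  if (data_pages_found < expected_data_pages) {
    // Fewer pages on disk than the metadata promises: discard the chunk data.
    return LoadResult::DataDiscarded;
  }
  return LoadResult::Complete;
}
```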

FileBuffer * File_Namespace::CachingFileMgr::createBufferUnlocked ( const ChunkKey & key,
size_t pageSize = 0,
const size_t numBytes = 0
)
override private virtual

Creates a buffer.

Reimplemented from File_Namespace::FileMgr.

Definition at line 269 of file CachingFileMgr.cpp.

References get_table_prefix().

271  {
272  touchKey(key);
273  auto [db_id, tb_id] = get_table_prefix(key);
274  createTableFileMgrIfNoneExists(db_id, tb_id);
275  return FileMgr::createBufferUnlocked(key, page_size, num_bytes);
276 }
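
The call to touchKey() above records the key as recently used, so that eviction can later pick the least recently used entry. A minimal sketch of that bookkeeping follows. ToyLru and ToyKey are hypothetical names, ToyKey is assumed to be a vector of ints standing in for ChunkKey, and the real LRUEvictionAlgorithm may be implemented differently.

```cpp
#include <iterator>
#include <list>
#include <map>
#include <optional>
#include <vector>

using ToyKey = std::vector<int>;  // assumed stand-in for ChunkKey

class ToyLru {
 public:
  // Mark key as most recently used, inserting it if it is new.
  void touch(const ToyKey& key) {
    remove(key);
    queue_.push_back(key);
    index_[key] = std::prev(queue_.end());
  }

  // Forget a key entirely (no-op if the key is unknown).
  void remove(const ToyKey& key) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      queue_.erase(it->second);
      index_.erase(it);
    }
  }

  // Pop and return the least recently used key, if any.
  std::optional<ToyKey> evictNext() {
    if (queue_.empty()) {
      return std::nullopt;
    }
    ToyKey key = queue_.front();
    queue_.pop_front();
    index_.erase(key);
    return key;
  }

 private:
  std::list<ToyKey> queue_;  // front = least recently used
  std::map<ToyKey, std::list<ToyKey>::iterator> index_;
};
```

Keeping list iterators in the index lets touch() and remove() unlink an entry from the queue without scanning it.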

void File_Namespace::CachingFileMgr::createTableFileMgrIfNoneExists ( const int32_t  db_id,
const int32_t  tb_id 
)
private

Create and initialize a subdirectory for a table if none exists.

Definition at line 259 of file CachingFileMgr.cpp.

260  {
262  TablePair table_pair{db_id, tb_id};
263  if (table_dirs_.find(table_pair) == table_dirs_.end()) {
264  table_dirs_.emplace(
265  table_pair, std::make_unique<TableFileMgr>(getTableFileMgrPath(db_id, tb_id)));
266  }
267 }
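
The listing above is a find-then-emplace on a map keyed by table. The sketch below shows the same pattern guarded by an exclusive lock on a shared mutex. Holding that lock for the whole check-and-insert is an assumption here, as are the names ToyTableDir and ToyDirRegistry and the path format.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for TableFileMgr.
struct ToyTableDir {
  explicit ToyTableDir(std::string p) : path(std::move(p)) {}
  std::string path;
};

class ToyDirRegistry {
 public:
  // Creates an entry for (db_id, tb_id) if none exists. Returns true if created.
  bool createIfNoneExists(int32_t db_id, int32_t tb_id) {
    std::unique_lock<std::shared_mutex> write_lock(mutex_);  // exclusive access
    const std::pair<int32_t, int32_t> key{db_id, tb_id};
    if (dirs_.find(key) != dirs_.end()) {
      return false;
    }
    // The path format below is made up for illustration.
    dirs_.emplace(key,
                  std::make_unique<ToyTableDir>("table_" + std::to_string(db_id) + "_" +
                                                std::to_string(tb_id)));
    return true;
  }

  bool contains(int32_t db_id, int32_t tb_id) const {
    std::shared_lock<std::shared_mutex> read_lock(mutex_);  // shared access
    return dirs_.count({db_id, tb_id}) > 0;
  }

 private:
  mutable std::shared_mutex mutex_;
  std::map<std::pair<int32_t, int32_t>, std::unique_ptr<ToyTableDir>> dirs_;
};
```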

void File_Namespace::CachingFileMgr::deleteBufferIfExists ( const ChunkKey & key)

deletes a buffer if it exists in the mgr. Otherwise do nothing.

Definition at line 395 of file CachingFileMgr.cpp.

395  {
397  auto chunk_it = chunkIndex_.find(key);
398  if (chunk_it != chunkIndex_.end()) {
399  deleteBufferUnlocked(chunk_it);
400  }
401 }

ChunkKeyToChunkMap::iterator File_Namespace::CachingFileMgr::deleteBufferUnlocked ( const ChunkKeyToChunkMap::iterator chunk_it,
const bool purge = true
)
override private virtual

Reimplemented from File_Namespace::FileMgr.

Definition at line 711 of file CachingFileMgr.cpp.

713  {
714  removeKey(chunk_it->first);
715  return FileMgr::deleteBufferUnlocked(chunk_it, purge);
716 }

void File_Namespace::CachingFileMgr::deleteCacheIfTooLarge ( )
private

When the cache is read from disk, we don't know which chunks were least recently used. Rather than try to evict random pages to get down to size we just reset the cache to make sure we have space.

Definition at line 414 of file CachingFileMgr.cpp.

References logger::INFO, LOG, and anonymous_namespace{CachingFileMgr.cpp}::size_of_dir().

Referenced by init().

414  {
417  bf::create_directory(fileMgrBasePath_);
418  LOG(INFO) << "Cache path over limit. Existing cache deleted.";
419  }
420 }

void File_Namespace::CachingFileMgr::deleteWrapperFile ( int32_t  db,
int32_t  tb 
)

Deletes the wrapper file from a table subdir.

Definition at line 650 of file CachingFileMgr.cpp.

References CHECK.

650  {
652  auto it = table_dirs_.find({db, tb});
653  CHECK(it != table_dirs_.end());
654  it->second->deleteWrapperFile();
655 }

std::string File_Namespace::CachingFileMgr::describeSelf ( ) const
override virtual

describes this FileMgr for logging purposes.

Reimplemented from File_Namespace::FileMgr.

Definition at line 240 of file CachingFileMgr.cpp.

240  {
241  return "cache";
242 }

std::string File_Namespace::CachingFileMgr::dump ( ) const

Definition at line 57 of file CachingFileMgr.cpp.

References chunk_evict_alg_, File_Namespace::FileMgr::chunkIndex_, LRUEvictionAlgorithm::dumpEvictionQueue(), show_chunk(), and table_evict_alg_.

57  {
58  std::stringstream ss;
59  ss << "Dump Cache:\n";
60  for (const auto& [key, buf] : chunkIndex_) {
61  ss << " " << show_chunk(key) << " num_pages: " << buf->pageCount()
62  << ", is dirty: " << buf->isDirty() << "\n";
63  }
64  ss << "Data Eviction Queue:\n" << chunk_evict_alg_.dumpEvictionQueue();
65  ss << "Metadata Eviction Queue:\n" << table_evict_alg_.dumpEvictionQueue();
66  ss << "\n";
67  return ss.str();
68 }

std::string File_Namespace::CachingFileMgr::dumpEvictionQueue ( ) const
inline

Definition at line 369 of file CachingFileMgr.h.

References chunk_evict_alg_, and LRUEvictionAlgorithm::dumpEvictionQueue().


std::string File_Namespace::CachingFileMgr::dumpKeysWithChunkData ( ) const

Definition at line 630 of file CachingFileMgr.cpp.

References show_chunk().

630  {
632  std::string ret_string = "CFM keys with chunk data:\n";
633  for (const auto& [key, buf] : chunkIndex_) {
634  if (buf->hasDataPages()) {
635  ret_string += " " + show_chunk(key) + "\n";
636  }
637  }
638  return ret_string;
639 }

std::string File_Namespace::CachingFileMgr::dumpKeysWithMetadata ( ) const

Definition at line 619 of file CachingFileMgr.cpp.

References show_chunk().

619  {
621  std::string ret_string = "CFM keys with metadata:\n";
622  for (const auto& [key, buf] : chunkIndex_) {
623  if (buf->hasEncoder()) {
624  ret_string += " " + show_chunk(key) + "\n";
625  }
626  }
627  return ret_string;
628 }
heavyai::shared_lock< heavyai::shared_mutex > read_lock
std::string show_chunk(const ChunkKey &key)
Definition: types.h:98
std::shared_lock< T > shared_lock
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:328
heavyai::shared_mutex chunkIndexMutex_
Definition: FileMgr.h:410

std::string File_Namespace::CachingFileMgr::dumpTableQueue ( ) const
inline

Definition at line 368 of file CachingFileMgr.h.

References LRUEvictionAlgorithm::dumpEvictionQueue(), and table_evict_alg_.

LRUEvictionAlgorithm table_evict_alg_

int32_t File_Namespace::CachingFileMgr::epoch ( int32_t  db_id,
int32_t  tb_id 
) const
overridevirtual

Obtains the epoch version for the given table.

Reimplemented from File_Namespace::FileMgr.

Definition at line 140 of file CachingFileMgr.cpp.

References Epoch::min_allowable_epoch(), table_dirs_, and table_dirs_mutex_.

140  {
141  heavyai::shared_lock<heavyai::shared_mutex> read_lock(table_dirs_mutex_);
142  auto tables_it = table_dirs_.find({db_id, tb_id});
143  if (tables_it == table_dirs_.end()) {
144  // If there is no directory for this table, that means the cache does not recognize
145  // the table that is requested. This can happen if a table was dropped, and its
146  // pages were invalidated but not yet freed and then the server crashed before they
147  // were freed. Upon re-starting the FileMgr will find these pages and attempt to
148  // compare their epoch to know if they are valid or not. In this case we should
149  // return an invalid epoch to indicate that any page for this table is not valid and
150  // should be freed.
151  return Epoch::min_allowable_epoch();
152  }
153  auto& [pair, table_dir] = *tables_it;
154  return table_dir->getEpoch();
155 }
heavyai::shared_lock< heavyai::shared_mutex > read_lock
heavyai::shared_mutex table_dirs_mutex_
static int64_t min_allowable_epoch()
Definition: Epoch.h:65
std::shared_lock< T > shared_lock
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
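
The lookup-with-sentinel behaviour of epoch() can be sketched in isolation. This is a minimal illustration, not the OmniSciDB implementation: `kInvalidEpoch`, `epoch_for_table`, and the plain `std::map` of epochs are hypothetical stand-ins for Epoch::min_allowable_epoch(), the member function, and table_dirs_.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <utility>

using TablePair = std::pair<int32_t, int32_t>;

// Hypothetical sentinel standing in for Epoch::min_allowable_epoch().
constexpr int32_t kInvalidEpoch = std::numeric_limits<int32_t>::min();

// Returns the stored epoch for {db_id, tb_id}, or the sentinel when the cache
// has no record of the table (for example, after a drop followed by a crash).
int32_t epoch_for_table(const std::map<TablePair, int32_t>& table_epochs,
                        int32_t db_id,
                        int32_t tb_id) {
  const auto it = table_epochs.find(TablePair{db_id, tb_id});
  if (it == table_epochs.end()) {
    return kInvalidEpoch;
  }
  assert(it->second != kInvalidEpoch);  // stored epochs are assumed to be valid
  return it->second;
}
```

A caller that compares a page's epoch against the returned value would then treat every page of an unknown table as stale, which matches the intent described in the source comment.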

FileInfo * File_Namespace::CachingFileMgr::evictMetadataPages ( )
private

Evicts all metadata pages for the least recently used table. Returns the first FileInfo from which a page was evicted, which is therefore guaranteed to have at least one free page.

Definition at line 474 of file CachingFileMgr.cpp.

References CHECK, anonymous_namespace{CachingFileMgr.cpp}::evict_chunk_or_fail(), and get_table_prefix().

474  {
475  // Locks should already be in place before calling this method.
476  FileInfo* file_info{nullptr};
477  auto key_to_evict = evict_chunk_or_fail(table_evict_alg_);
478  auto [db_id, tb_id] = get_table_prefix(key_to_evict);
479  const auto keys = getKeysForTable(db_id, tb_id);
480  for (const auto& key : keys) {
481  auto chunk_it = chunkIndex_.find(key);
482  CHECK(chunk_it != chunkIndex_.end());
483  auto& buf = chunk_it->second;
484  if (!file_info) {
485  // Return the FileInfo for the first file we are freeing a page from so that the
486  // caller does not have to search for a FileInfo guaranteed to have at least one
487  // free page.
488  CHECK(buf->getMetadataPage().pageVersions.size() > 0);
489  file_info =
490  getFileInfoForFileId(buf->getMetadataPage().pageVersions.front().page.fileId);
491  }
492  // We erase all pages and entries for the chunk, as without metadata all other
493  // entries are useless.
494  deleteBufferUnlocked(chunk_it);
495  }
496  // Serialized datawrappers require metadata to be in the cache.
497  deleteWrapperFile(db_id, tb_id);
498  CHECK(file_info) << "FileInfo with freed page not found";
499  return file_info;
500 }
LRUEvictionAlgorithm table_evict_alg_
void deleteWrapperFile(int32_t db, int32_t tb)
Deletes the wrapper file from a table subdir.
ChunkKeyToChunkMap::iterator deleteBufferUnlocked(const ChunkKeyToChunkMap::iterator chunk_it, const bool purge=true) override
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:328
ChunkKey evict_chunk_or_fail(LRUEvictionAlgorithm &alg)
std::vector< ChunkKey > getKeysForTable(int32_t db_id, int32_t tb_id) const
returns set of keys contained in chunkIndex_ that match the given table prefix.
std::pair< int, int > get_table_prefix(const ChunkKey &key)
Definition: types.h:62
#define CHECK(condition)
Definition: Logger.h:222
FileInfo * getFileInfoForFileId(const int32_t fileId) const
Definition: FileMgr.h:224

FileInfo * File_Namespace::CachingFileMgr::evictPages ( )
private

Evicts all data pages for the least recently used chunk (metadata pages persist). Returns the first FileInfo from which a page was evicted, which is therefore guaranteed to have at least one free page.

Definition at line 502 of file CachingFileMgr.cpp.

References CHECK, and anonymous_namespace{CachingFileMgr.cpp}::evict_chunk_or_fail().

502  {
503  FileInfo* file_info{nullptr};
504  FileBuffer* buf{nullptr};
505  while (!file_info) {
507  CHECK(buf);
508  if (!buf->hasDataPages()) {
509  // This buffer contains no chunk data (metadata only, uninitialized, size == 0,
510  // etc...) so we won't recover any space by evicting it. In this case it gets
511  // removed from the eviction queue (it will get re-added if it gets populated with
512  // data) and we look at the next chunk in queue until we find a buffer with page
513  // data.
514  continue;
515  }
516  // Return the FileInfo for the first file we are freeing a page from so that the
517  // caller does not have to search for a FileInfo guaranteed to have at least one free
518  // page.
519  CHECK(buf->getMultiPage().front().pageVersions.size() > 0);
520  file_info = getFileInfoForFileId(
521  buf->getMultiPage().front().pageVersions.front().page.fileId);
522  }
523  auto pages_freed = buf->freeChunkPages();
524  CHECK(pages_freed > 0) << "failed to evict a page";
525  CHECK(file_info) << "FileInfo with freed page not found";
526  return file_info;
527 }
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:328
ChunkKey evict_chunk_or_fail(LRUEvictionAlgorithm &alg)
#define CHECK(condition)
Definition: Logger.h:222
FileInfo * getFileInfoForFileId(const int32_t fileId) const
Definition: FileMgr.h:224
LRUEvictionAlgorithm chunk_evict_alg_

bool File_Namespace::CachingFileMgr::failOnReadError ( ) const
inlineoverridevirtual

True if a read error should cause a fatal error.

Reimplemented from File_Namespace::FileMgr.

Definition at line 294 of file CachingFileMgr.h.

294 { return false; }
void File_Namespace::CachingFileMgr::free_page ( std::pair< FileInfo *, int32_t > &&  page)
overridevirtual

Unlike the FileMgr, the CFM frees pages immediately instead of holding them until the next checkpoint.

Reimplemented from File_Namespace::FileMgr.

Definition at line 733 of file CachingFileMgr.cpp.

733  {
734  page.first->freePageDeferred(page.second);
735 }
size_t File_Namespace::CachingFileMgr::getAllocated ( )
inlineoverride

Definition at line 214 of file CachingFileMgr.h.

References getFilesSize(), and getTableFileMgrsSize().

Referenced by getAvailableSpace().

214  {
215  return getFilesSize() + getTableFileMgrsSize();
216  }
size_t getFilesSize() const
Get the total size of page files (data and metadata files). This includes allocated, but unused space.
size_t getTableFileMgrsSize() const
Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirector...

size_t File_Namespace::CachingFileMgr::getAvailableSpace ( )
inline

Definition at line 210 of file CachingFileMgr.h.

References getAllocated(), and max_size_.

size_t File_Namespace::CachingFileMgr::getAvailableWrapperSpace ( )
inline

Definition at line 211 of file CachingFileMgr.h.

References getTableFileMgrsSize(), and max_wrapper_space_.

211  {
213  }
size_t getTableFileMgrsSize() const
Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirector...

std::optional< FileBuffer * > File_Namespace::CachingFileMgr::getBufferIfExists ( const ChunkKey key)

Optional-returning variant of getBuffer(), for callers that are not sure the chunk exists.

Definition at line 702 of file CachingFileMgr.cpp.

702  {
704  auto chunk_it = chunkIndex_.find(key);
705  if (chunk_it == chunkIndex_.end()) {
706  return {};
707  }
708  return getBufferUnlocked(key);
709 }
std::shared_lock< T > shared_lock
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:328
heavyai::shared_mutex chunkIndexMutex_
Definition: FileMgr.h:410
FileBuffer * getBufferUnlocked(const ChunkKey &key, const size_t numBytes=0) const override
FileBuffer * File_Namespace::CachingFileMgr::getBufferUnlocked ( const ChunkKey key,
const size_t  numBytes = 0 
) const
overrideprivatevirtual

Reimplemented from File_Namespace::FileMgr.

Definition at line 727 of file CachingFileMgr.cpp.

728  {
729  touchKey(key);
730  return FileMgr::getBufferUnlocked(key, num_bytes);
731 }
void touchKey(const ChunkKey &key) const
Used to track which tables/chunks were least recently used.
virtual FileBuffer * getBufferUnlocked(const ChunkKey &key, const size_t numBytes=0) const
Definition: FileMgr.cpp:779
std::vector< ChunkKey > File_Namespace::CachingFileMgr::getChunkKeysForPrefix ( const ChunkKey prefix) const

Returns the keys for chunks with chunk data that match the given prefix.

Definition at line 581 of file CachingFileMgr.cpp.

References in_same_table().

582  {
584  std::vector<ChunkKey> chunks;
585  for (auto [key, buf] : chunkIndex_) {
586  if (in_same_table(key, prefix)) {
587  if (buf->hasDataPages()) {
588  chunks.emplace_back(key);
589  touchKey(key);
590  }
591  }
592  }
593  return chunks;
594 }
heavyai::shared_lock< heavyai::shared_mutex > read_lock
void touchKey(const ChunkKey &key) const
Used to track which tables/chunks were least recently used.
std::shared_lock< T > shared_lock
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:328
heavyai::shared_mutex chunkIndexMutex_
Definition: FileMgr.h:410
bool in_same_table(const ChunkKey &left_key, const ChunkKey &right_key)
Definition: types.h:83

void File_Namespace::CachingFileMgr::getChunkMetadataVecForKeyPrefix ( ChunkMetadataVector chunkMetadataVec,
const ChunkKey keyPrefix 
)
override

Definition at line 718 of file CachingFileMgr.cpp.

720  {
721  FileMgr::getChunkMetadataVecForKeyPrefix(chunkMetadataVec, keyPrefix);
722  for (const auto& [key, meta] : chunkMetadataVec) {
723  touchKey(key);
724  }
725 }
void touchKey(const ChunkKey &key) const
Used to track which tables/chunks were least recently used.
void getChunkMetadataVecForKeyPrefix(ChunkMetadataVector &chunkMetadataVec, const ChunkKey &keyPrefix) override
Definition: FileMgr.cpp:997
size_t File_Namespace::CachingFileMgr::getChunkSpaceReservedByTable ( int32_t  db_id,
int32_t  tb_id 
) const

Set of functions to determine how much space is reserved in a table by type.

Definition at line 194 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::chunkIndex_, File_Namespace::FileMgr::chunkIndexMutex_, and File_Namespace::FileMgr::defaultPageSize_.

Referenced by getSpaceReservedByTable().

194  {
196  size_t space_used = 0;
197  ChunkKey min_table_key{db_id, tb_id};
198  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
199  for (auto it = chunkIndex_.lower_bound(min_table_key);
200  it != chunkIndex_.upper_bound(max_table_key);
201  ++it) {
202  auto& [key, buffer] = *it;
203  space_used += (buffer->numChunkPages() * defaultPageSize_);
204  }
205  return space_used;
206 }
std::vector< int > ChunkKey
Definition: types.h:36
heavyai::shared_lock< heavyai::shared_mutex > read_lock
std::shared_lock< T > shared_lock
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:328
heavyai::shared_mutex chunkIndexMutex_
Definition: FileMgr.h:410
size_t defaultPageSize_
Definition: FileMgr.h:402

size_t File_Namespace::CachingFileMgr::getDataFileSize ( ) const
inline

Definition at line 201 of file CachingFileMgr.h.

References File_Namespace::FileMgr::defaultPageSize_, and File_Namespace::FileMgr::num_pages_per_data_file_.

201  {
203  }
static size_t num_pages_per_data_file_
Definition: FileMgr.h:417
size_t defaultPageSize_
Definition: FileMgr.h:402
size_t File_Namespace::CachingFileMgr::getDefaultPageSize ( )
inline

Definition at line 196 of file CachingFileMgr.h.

References File_Namespace::FileMgr::defaultPageSize_.

196 { return defaultPageSize_; }
size_t defaultPageSize_
Definition: FileMgr.h:402
size_t File_Namespace::CachingFileMgr::getFilesSize ( ) const

Get the total size of page files (data and metadata files). This includes allocated, but unused space.

Definition at line 553 of file CachingFileMgr.cpp.

Referenced by getAllocated().

553  {
555  size_t sum = 0;
556  for (auto [id, file] : files_) {
557  sum += file->size();
558  }
559  return sum;
560 }
heavyai::shared_lock< heavyai::shared_mutex > read_lock
std::shared_lock< T > shared_lock
std::map< int32_t, FileInfo * > files_
Definition: FileMgr.h:399
heavyai::shared_mutex files_rw_mutex_
Definition: FileMgr.h:411

std::vector< ChunkKey > File_Namespace::CachingFileMgr::getKeysForTable ( int32_t  db_id,
int32_t  tb_id 
) const
private

Returns the set of keys contained in chunkIndex_ that match the given table prefix.

Definition at line 461 of file CachingFileMgr.cpp.

462  {
463  std::vector<ChunkKey> keys;
464  ChunkKey min_table_key{db_id, tb_id};
465  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
466  for (auto it = chunkIndex_.lower_bound(min_table_key);
467  it != chunkIndex_.upper_bound(max_table_key);
468  ++it) {
469  keys.emplace_back(it->first);
470  }
471  return keys;
472 }
std::vector< int > ChunkKey
Definition: types.h:36
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:328
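
The prefix scan used by getKeysForTable() relies on the lexicographic ordering of `std::vector<int>` keys inside an ordered map. The following sketch reproduces that idea on its own; `ChunkIndex` and `keys_for_table` are hypothetical names, and the mapped value is a placeholder `int` rather than a FileBuffer pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <vector>

using ChunkKey = std::vector<int>;
// Hypothetical stand-in for ChunkKeyToChunkMap; the mapped value is unused here.
using ChunkIndex = std::map<ChunkKey, int>;

// Collects every key whose first two elements are {db_id, tb_id}.
std::vector<ChunkKey> keys_for_table(const ChunkIndex& index,
                                     int32_t db_id,
                                     int32_t tb_id) {
  // {db, tb} sorts before every longer key that starts with the same prefix.
  const ChunkKey min_table_key{db_id, tb_id};
  // {db, tb, INT32_MAX} sorts after keys of this table with a smaller third element.
  const ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
  std::vector<ChunkKey> keys;
  const auto end = index.upper_bound(max_table_key);
  for (auto it = index.lower_bound(min_table_key); it != end; ++it) {
    assert(it->first.size() >= 2);  // everything in range shares the table prefix
    keys.push_back(it->first);
  }
  return keys;
}
```

Because the map is ordered, the scan touches only the entries of one table instead of the whole index. One caveat of this bounding scheme is that a key whose third element equals `INT32_MAX` and which has further elements would fall outside the range.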
std::set< ChunkKey > File_Namespace::CachingFileMgr::getKeysWithMetadata ( ) const

Definition at line 737 of file CachingFileMgr.cpp.

737  {
739  std::set<ChunkKey> ret;
740  for (const auto& [key, buf] : chunkIndex_) {
741  if (buf->hasEncoder()) {
742  ret.emplace(key);
743  }
744  }
745  return ret;
746 }
heavyai::shared_lock< heavyai::shared_mutex > read_lock
std::shared_lock< T > shared_lock
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:328
heavyai::shared_mutex chunkIndexMutex_
Definition: FileMgr.h:410
size_t File_Namespace::CachingFileMgr::getMaxDataFiles ( ) const
inline

Definition at line 198 of file CachingFileMgr.h.

References max_num_data_files_.

size_t File_Namespace::CachingFileMgr::getMaxDataFilesSize ( ) const

Definition at line 748 of file CachingFileMgr.cpp.

748  {
749  if (limit_data_size_) {
750  return *limit_data_size_;
751  }
752  return getMaxDataFiles() * getDataFileSize();
753 }
std::optional< size_t > limit_data_size_
size_t File_Namespace::CachingFileMgr::getMaxMetaFiles ( ) const
inline

Definition at line 199 of file CachingFileMgr.h.

References max_num_meta_files_.

size_t File_Namespace::CachingFileMgr::getMaxSize ( )
inlineoverride

Definition at line 197 of file CachingFileMgr.h.

References max_size_.

197 { return max_size_; }
size_t File_Namespace::CachingFileMgr::getMaxWrapperSize ( ) const
inline

Definition at line 200 of file CachingFileMgr.h.

References max_wrapper_space_.

size_t File_Namespace::CachingFileMgr::getMetadataFileSize ( ) const
inline

Definition at line 204 of file CachingFileMgr.h.

References METADATA_PAGE_SIZE, and File_Namespace::FileMgr::num_pages_per_metadata_file_.

204  {
206  }
#define METADATA_PAGE_SIZE
Definition: FileBuffer.h:37
static size_t num_pages_per_metadata_file_
Definition: FileMgr.h:418
size_t File_Namespace::CachingFileMgr::getMetadataSpaceReservedByTable ( int32_t  db_id,
int32_t  tb_id 
) const

Definition at line 208 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::chunkIndex_, File_Namespace::FileMgr::chunkIndexMutex_, and METADATA_PAGE_SIZE.

Referenced by getSpaceReservedByTable().

209  {
211  size_t space_used = 0;
212  ChunkKey min_table_key{db_id, tb_id};
213  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
214  for (auto it = chunkIndex_.lower_bound(min_table_key);
215  it != chunkIndex_.upper_bound(max_table_key);
216  ++it) {
217  auto& [key, buffer] = *it;
218  space_used += (buffer->numMetadataPages() * METADATA_PAGE_SIZE);
219  }
220  return space_used;
221 }
#define METADATA_PAGE_SIZE
Definition: FileBuffer.h:37
std::vector< int > ChunkKey
Definition: types.h:36
heavyai::shared_lock< heavyai::shared_mutex > read_lock
std::shared_lock< T > shared_lock
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:328
heavyai::shared_mutex chunkIndexMutex_
Definition: FileMgr.h:410

MgrType File_Namespace::CachingFileMgr::getMgrType ( )
inlineoverride

Definition at line 194 of file CachingFileMgr.h.

194 { return CACHING_FILE_MGR; };
static size_t File_Namespace::CachingFileMgr::getMinimumSize ( )
inlinestatic

Definition at line 182 of file CachingFileMgr.h.

References File_Namespace::FileMgr::DEFAULT_NUM_PAGES_PER_METADATA_FILE, METADATA_FILE_SPACE_PERCENTAGE, and METADATA_PAGE_SIZE.

Referenced by CommandLineOptions::validate().

182  {
183  // Currently the minimum default size is based on the metadata file size and
184  // percentage usage.
187  }
#define METADATA_PAGE_SIZE
Definition: FileBuffer.h:37
static constexpr size_t DEFAULT_NUM_PAGES_PER_METADATA_FILE
Definition: FileMgr.h:371
static constexpr float METADATA_FILE_SPACE_PERCENTAGE

size_t File_Namespace::CachingFileMgr::getNumChunksWithMetadata ( ) const

Returns the number of buffers with metadata in the CFM. Any buffer with an encoder counts.

Definition at line 608 of file CachingFileMgr.cpp.

608  {
610  size_t sum = 0;
611  for (const auto& [key, buf] : chunkIndex_) {
612  if (buf->hasEncoder()) {
613  sum++;
614  }
615  }
616  return sum;
617 }
heavyai::shared_lock< heavyai::shared_mutex > read_lock
std::shared_lock< T > shared_lock
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:328
heavyai::shared_mutex chunkIndexMutex_
Definition: FileMgr.h:410
size_t File_Namespace::CachingFileMgr::getNumDataChunks ( ) const

Returns the number of buffers with chunk data in the CFM.

Definition at line 403 of file CachingFileMgr.cpp.

403  {
405  size_t num_chunks = 0;
406  for (auto [key, buf] : chunkIndex_) {
407  if (buf->hasDataPages()) {
408  num_chunks++;
409  }
410  }
411  return num_chunks;
412 }
heavyai::shared_lock< heavyai::shared_mutex > read_lock
std::shared_lock< T > shared_lock
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:328
heavyai::shared_mutex chunkIndexMutex_
Definition: FileMgr.h:410
size_t File_Namespace::CachingFileMgr::getNumDataFiles ( ) const

Definition at line 571 of file CachingFileMgr.cpp.

571  {
573  return fileIndex_.count(defaultPageSize_);
574 }
heavyai::shared_lock< heavyai::shared_mutex > read_lock
std::shared_lock< T > shared_lock
PageSizeFileMMap fileIndex_
Maps page sizes to FileInfo objects.
Definition: FileMgr.h:400
size_t defaultPageSize_
Definition: FileMgr.h:402
heavyai::shared_mutex files_rw_mutex_
Definition: FileMgr.h:411
size_t File_Namespace::CachingFileMgr::getNumMetaFiles ( ) const

Definition at line 576 of file CachingFileMgr.cpp.

References METADATA_PAGE_SIZE.

576  {
578  return fileIndex_.count(METADATA_PAGE_SIZE);
579 }
#define METADATA_PAGE_SIZE
Definition: FileBuffer.h:37
heavyai::shared_lock< heavyai::shared_mutex > read_lock
std::shared_lock< T > shared_lock
PageSizeFileMMap fileIndex_
Maps page sizes to FileInfo objects.
Definition: FileMgr.h:400
heavyai::shared_mutex files_rw_mutex_
Definition: FileMgr.h:411
size_t File_Namespace::CachingFileMgr::getSpaceReservedByTable ( int32_t  db_id,
int32_t  tb_id 
) const

Definition at line 233 of file CachingFileMgr.cpp.

References getChunkSpaceReservedByTable(), getMetadataSpaceReservedByTable(), and getTableFileMgrSpaceReserved().

233  {
234  auto chunk_space = getChunkSpaceReservedByTable(db_id, tb_id);
235  auto meta_space = getMetadataSpaceReservedByTable(db_id, tb_id);
236  auto subdir_space = getTableFileMgrSpaceReserved(db_id, tb_id);
237  return chunk_space + meta_space + subdir_space;
238 }
size_t getTableFileMgrSpaceReserved(int32_t db_id, int32_t tb_id) const
size_t getMetadataSpaceReservedByTable(int32_t db_id, int32_t tb_id) const
size_t getChunkSpaceReservedByTable(int32_t db_id, int32_t tb_id) const

std::string File_Namespace::CachingFileMgr::getStringMgrType ( )
inlineoverride

Definition at line 195 of file CachingFileMgr.h.

195 { return ToString(CACHING_FILE_MGR); }
std::string File_Namespace::CachingFileMgr::getTableFileMgrPath ( int32_t  db,
int32_t  tb 
) const

Definition at line 178 of file CachingFileMgr.cpp.

References File_Namespace::get_dir_name_for_table(), and File_Namespace::FileMgr::getFileMgrBasePath().

178  {
179  return getFileMgrBasePath() + "/" + get_dir_name_for_table(db_id, tb_id);
180 }
std::string get_dir_name_for_table(int db_id, int tb_id)
std::string getFileMgrBasePath() const
Definition: FileMgr.h:333

size_t File_Namespace::CachingFileMgr::getTableFileMgrSpaceReserved ( int32_t  db_id,
int32_t  tb_id 
) const

Definition at line 223 of file CachingFileMgr.cpp.

References table_dirs_, and table_dirs_mutex_.

Referenced by getSpaceReservedByTable().

223  {
225  size_t space = 0;
226  auto table_it = table_dirs_.find({db_id, tb_id});
227  if (table_it != table_dirs_.end()) {
228  space += table_it->second->getReservedSpace();
229  }
230  return space;
231 }
heavyai::shared_lock< heavyai::shared_mutex > read_lock
heavyai::shared_mutex table_dirs_mutex_
std::shared_lock< T > shared_lock
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_

size_t File_Namespace::CachingFileMgr::getTableFileMgrsSize ( ) const

Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirectory for serialized data wrappers and epoch files.

Definition at line 562 of file CachingFileMgr.cpp.

Referenced by getAllocated(), and getAvailableWrapperSpace().

562  {
564  size_t space_used = 0;
565  for (const auto& [pair, table_dir] : table_dirs_) {
566  space_used += table_dir->getReservedSpace();
567  }
568  return space_used;
569 }
heavyai::shared_lock< heavyai::shared_mutex > read_lock
heavyai::shared_mutex table_dirs_mutex_
std::shared_lock< T > shared_lock
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_

bool File_Namespace::CachingFileMgr::hasFileMgrKey ( ) const
inlineoverridevirtual

Query to determine if the contained pages will have their database and table ids overridden by the filemgr key (FileMgr does this).

Reimplemented from File_Namespace::FileMgr.

Definition at line 233 of file CachingFileMgr.h.

233 { return false; }
bool File_Namespace::CachingFileMgr::hasWrapperFile ( int32_t  db_id,
int32_t  table_id 
) const

Checks if data wrapper file has been written to disk/cached.

Definition at line 669 of file CachingFileMgr.cpp.

669  {
671  auto it = table_dirs_.find({db_id, table_id});
672  if (it != table_dirs_.end()) {
673  return it->second->hasWrapperFile();
674  }
675  return false;
676 }
heavyai::shared_lock< heavyai::shared_mutex > read_lock
heavyai::shared_mutex table_dirs_mutex_
std::shared_lock< T > shared_lock
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
void File_Namespace::CachingFileMgr::incrementAllEpochs ( )
private

Increment epochs for each table in the CFM.

Definition at line 317 of file CachingFileMgr.cpp.

Referenced by init().

317  {
319  for (auto& table_dir : table_dirs_) {
320  table_dir.second->incrementEpoch();
321  }
322 }
heavyai::shared_lock< heavyai::shared_mutex > read_lock
heavyai::shared_mutex table_dirs_mutex_
std::shared_lock< T > shared_lock
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_

void File_Namespace::CachingFileMgr::incrementEpoch ( int32_t  db_id,
int32_t  tb_id 
)
private

Increments epoch for the given table.

Definition at line 157 of file CachingFileMgr.cpp.

References CHECK, table_dirs_, and table_dirs_mutex_.

157  {
159  auto tables_it = table_dirs_.find({db_id, tb_id});
160  CHECK(tables_it != table_dirs_.end());
161  auto& [pair, table_dir] = *tables_it;
162  table_dir->incrementEpoch();
163 }
heavyai::shared_lock< heavyai::shared_mutex > read_lock
heavyai::shared_mutex table_dirs_mutex_
std::shared_lock< T > shared_lock
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
#define CHECK(condition)
Definition: Logger.h:222
void File_Namespace::CachingFileMgr::init ( const size_t  num_reader_threads)
private

Initializes a CFM, parsing any existing files and initializing data structures appropriately (currently not thread-safe).

Definition at line 82 of file CachingFileMgr.cpp.

References createBufferFromHeaders(), deleteCacheIfTooLarge(), File_Namespace::FileMgr::freePages(), incrementAllEpochs(), File_Namespace::FileMgr::initializeNumThreads(), File_Namespace::FileMgr::isFullyInitted_, File_Namespace::FileMgr::nextFileId_, File_Namespace::FileMgr::openFiles(), readTableFileMgrs(), gpu_enabled::sort(), and VLOG.

Referenced by CachingFileMgr().

82  {
85  auto open_files_result = openFiles();
86  /* Sort headerVec so that all HeaderInfos
87  * from a chunk will be grouped together
88  * and in order of increasing PageId
89  * - Version Epoch */
90  auto& header_vec = open_files_result.header_infos;
91  std::sort(header_vec.begin(), header_vec.end());
92 
93  /* Goal of next section is to find sequences in the
94  * sorted headerVec of the same ChunkId, which we
95  * can then initiate a FileBuffer with */
96  VLOG(3) << "Number of Headers in Vector: " << header_vec.size();
97  if (header_vec.size() > 0) {
98  auto startIt = header_vec.begin();
99  ChunkKey lastChunkKey = startIt->chunkKey;
100  for (auto it = header_vec.begin() + 1; it != header_vec.end(); ++it) {
101  if (it->chunkKey != lastChunkKey) {
102  createBufferFromHeaders(lastChunkKey, startIt, it);
103  lastChunkKey = it->chunkKey;
104  startIt = it;
105  }
106  }
107  createBufferFromHeaders(lastChunkKey, startIt, header_vec.end());
108  }
109  nextFileId_ = open_files_result.max_file_id + 1;
111  freePages();
112  initializeNumThreads(num_reader_threads);
113  isFullyInitted_ = true;
114 }
std::vector< int > ChunkKey
Definition: types.h:36
OpenFilesResult openFiles()
Definition: FileMgr.cpp:189
DEVICE void sort(ARGS &&...args)
Definition: gpu_enabled.h:105
void deleteCacheIfTooLarge()
When the cache is read from disk, we don't know which chunks were least recently used. Rather than try to evict random pages to get down to size, we just reset the cache to make sure we have space.
void incrementAllEpochs()
Increment epochs for each table in the CFM.
void readTableFileMgrs()
Checks for any sub-directories containing table-specific data and creates epochs from found files...
void initializeNumThreads(size_t num_reader_threads=0)
Definition: FileMgr.cpp:1573
#define VLOG(n)
Definition: Logger.h:316
FileBuffer * createBufferFromHeaders(const ChunkKey &key, const std::vector< HeaderInfo >::const_iterator &startIt, const std::vector< HeaderInfo >::const_iterator &endIt) override
Creates a buffer and initializes it with info read from files on disk.
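
The grouping step in init() (sort the headers, then hand each run of equal chunk keys to createBufferFromHeaders()) can be illustrated with a reduced example. `Header` and `group_headers` are hypothetical; the real HeaderInfo holds more fields, and the real code builds FileBuffer objects instead of counting pages.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

using ChunkKey = std::vector<int>;

// Hypothetical, reduced header: just a chunk key and a page id.
struct Header {
  ChunkKey chunk_key;
  int page_id;
  bool operator<(const Header& other) const {
    return std::tie(chunk_key, page_id) < std::tie(other.chunk_key, other.page_id);
  }
};

// Sorts the headers, then reports each run of identical chunk keys together
// with the number of headers in that run.
std::vector<std::pair<ChunkKey, std::size_t>> group_headers(std::vector<Header> headers) {
  std::sort(headers.begin(), headers.end());
  std::vector<std::pair<ChunkKey, std::size_t>> groups;
  auto start = headers.begin();
  while (start != headers.end()) {
    // Find the first header that belongs to a different chunk.
    const auto end = std::find_if(start, headers.end(), [&](const Header& h) {
      return h.chunk_key != start->chunk_key;
    });
    assert(end != start);  // every run contains at least one header
    groups.emplace_back(start->chunk_key, static_cast<std::size_t>(end - start));
    start = end;
  }
  return groups;
}
```

Sorting first is what makes a single linear pass sufficient: all headers of a chunk become adjacent, so a run ends exactly where the key changes.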

FileBuffer * File_Namespace::CachingFileMgr::putBuffer ( const ChunkKey key,
AbstractBuffer src_buffer,
const size_t  num_bytes = 0 
)
override

Deletes any existing buffer for the given key, then copies in a new one.

putBuffer() needs to behave differently than it does in FileMgr. Specifically, it needs to delete the buffer beforehand and then append, rather than overwrite the existing buffer. This way we only store a single version of the buffer rather than accumulating versions that need to be rolled off.

Definition at line 305 of file CachingFileMgr.cpp.

References CHECK, Data_Namespace::AbstractBuffer::isDirty(), Data_Namespace::AbstractBuffer::setAppended(), Data_Namespace::AbstractBuffer::setDirty(), and Data_Namespace::AbstractBuffer::size().

307  {
308  CHECK(!src_buffer->isDirty()) << "Cannot cache dirty buffers.";
310  // Since the buffer is not dirty we mark it as dirty if we are only writing metadata and
311  // appended if we are writing chunk data. We delete + append rather than write to make
312  // sure we don't write multiple page versions.
313  (src_buffer->size() == 0) ? src_buffer->setDirty() : src_buffer->setAppended();
314  return FileMgr::putBuffer(key, src_buffer, num_bytes);
315 }
void deleteBufferIfExists(const ChunkKey &key)
deletes a buffer if it exists in the mgr. Otherwise do nothing.
#define CHECK(condition)
Definition: Logger.h:222
FileBuffer * putBuffer(const ChunkKey &key, AbstractBuffer *d, const size_t numBytes=0) override
Puts the contents of d into the Chunk with the given key.
Definition: FileMgr.cpp:805
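
The difference between overwriting and delete-then-append can be shown with a toy store in which every write adds a version. This is only a sketch of the idea described for putBuffer(); `VersionedStore`, `write`, and `put_cached` are hypothetical names and do not correspond to FileMgr APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using ChunkKey = std::vector<int>;

// Hypothetical store in which every write appends a new version of the chunk.
class VersionedStore {
 public:
  // Overwrite semantics: older versions remain until something rolls them off.
  void write(const ChunkKey& key, const std::string& data) {
    versions_[key].push_back(data);
  }

  // Cache semantics: drop whatever exists, then append one fresh version.
  void put_cached(const ChunkKey& key, const std::string& data) {
    versions_.erase(key);
    write(key, data);
    assert(versions_.at(key).size() == 1);
  }

  std::size_t num_versions(const ChunkKey& key) const {
    const auto it = versions_.find(key);
    return it == versions_.end() ? 0 : it->second.size();
  }

 private:
  std::map<ChunkKey, std::vector<std::string>> versions_;
};
```

With `put_cached`, the store never holds more than one version per key, so there is nothing to roll off later.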

void File_Namespace::CachingFileMgr::readTableFileMgrs ( )
private

Checks for any sub-directories containing table-specific data and creates epochs from found files.

Definition at line 116 of file CachingFileMgr.cpp.

References CHECK, File_Namespace::FileMgr::fileMgrBasePath_, table_dirs_, and table_dirs_mutex_.

Referenced by init().

116  {
118  bf::path path(fileMgrBasePath_);
119  CHECK(bf::exists(path)) << "Cache path: " << fileMgrBasePath_ << " does not exist.";
120  CHECK(bf::is_directory(path))
121  << "Specified path '" << fileMgrBasePath_ << "' for disk cache is not a directory.";
122 
123  // Look for directories with table-specific names.
124  boost::regex table_filter("table_([0-9]+)_([0-9]+)");
125  for (const auto& file : bf::directory_iterator(path)) {
126  boost::smatch match;
127  auto file_name = file.path().filename().string();
128  if (boost::regex_match(file_name, match, table_filter)) {
129  int32_t db_id = std::stoi(match[1]);
130  int32_t tb_id = std::stoi(match[2]);
131  TablePair table_pair{db_id, tb_id};
132  CHECK(table_dirs_.find(table_pair) == table_dirs_.end())
133  << "Trying to read data for existing table";
134  table_dirs_.emplace(table_pair,
135  std::make_unique<TableFileMgr>(file.path().string()));
136  }
137  }
138 }
heavyai::shared_mutex table_dirs_mutex_
heavyai::unique_lock< heavyai::shared_mutex > write_lock
std::string fileMgrBasePath_
Definition: FileMgr.h:396
std::unique_lock< T > unique_lock
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
#define CHECK(condition)
Definition: Logger.h:222
std::pair< const int32_t, const int32_t > TablePair
Definition: FileMgr.h:91
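
The directory-name matching in readTableFileMgrs() can be tried out separately from the filesystem walk. The sketch below swaps in `std::regex` for the `boost::regex` used in the listing, and `parse_table_dir` is a hypothetical helper name; only the pattern string is taken from the source.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Parses a directory name of the form "table_<db_id>_<tb_id>".
// Returns std::nullopt when the whole name does not match the pattern.
std::optional<std::pair<int32_t, int32_t>> parse_table_dir(const std::string& name) {
  static const std::regex table_filter("table_([0-9]+)_([0-9]+)");
  std::smatch match;
  if (!std::regex_match(name, match, table_filter)) {
    return std::nullopt;
  }
  assert(match.size() == 3);  // full match plus two capture groups
  // std::stoi throws std::out_of_range for ids that do not fit in an int.
  return std::pair<int32_t, int32_t>{std::stoi(match[1].str()),
                                     std::stoi(match[2].str())};
}
```

Because `regex_match` requires the entire string to match, names with extra prefixes or suffixes are ignored rather than partially parsed.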

std::unique_ptr< CachingFileMgr > File_Namespace::CachingFileMgr::reconstruct ( ) const

Initializes a new CFM using the initialization values in the current CFM.

Definition at line 641 of file CachingFileMgr.cpp.

641  {
642  DiskCacheConfig config{fileMgrBasePath_,
645  max_size_,
647  return std::make_unique<CachingFileMgr>(config);
648 }
std::string fileMgrBasePath_
Definition: FileMgr.h:396
size_t num_reader_threads_
Number of threads used when loading data.
Definition: FileMgr.h:401
size_t defaultPageSize_
Definition: FileMgr.h:402
void File_Namespace::CachingFileMgr::removeChunkKeepMetadata ( const ChunkKey key)

Free pages for chunk and remove it from the chunk eviction algorithm.

Definition at line 596 of file CachingFileMgr.cpp.

References CHECK.

596  {
597    if (isBufferOnDevice(key)) {
598      auto chunkIt = chunkIndex_.find(key);
599      CHECK(chunkIt != chunkIndex_.end());
600      auto& buf = chunkIt->second;
601      if (buf->hasDataPages()) {
602        buf->freeChunkPages();
604      }
605    }
606  }
void removeChunk(const ChunkKey &) override
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:328
bool isBufferOnDevice(const ChunkKey &key) override
Definition: FileMgr.cpp:736
#define CHECK(condition)
Definition: Logger.h:222
LRUEvictionAlgorithm chunk_evict_alg_

void File_Namespace::CachingFileMgr::removeKey ( const ChunkKey & key) const
private

Definition at line 534 of file CachingFileMgr.cpp.

References get_table_prefix().

534  {
535    // chunkIndex lock should already be acquired.
537    auto [db_id, tb_id] = get_table_prefix(key);
538    ChunkKey table_key{db_id, tb_id};
539    ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
540    for (auto it = chunkIndex_.lower_bound(table_key);
541         it != chunkIndex_.upper_bound(max_table_key);
542         ++it) {
543      if (it->first != key) {
544        // If there are any keys in this table other than the one we are removing, then
545        // keep the table in the eviction queue.
546        return;
547      }
548    }
549    // No other keys exist for this table, so remove it from the queue.
550    table_evict_alg_.removeChunk(table_key);
551  }
std::vector< int > ChunkKey
Definition: types.h:36
LRUEvictionAlgorithm table_evict_alg_
void removeChunk(const ChunkKey &) override
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:328
std::pair< int, int > get_table_prefix(const ChunkKey &key)
Definition: types.h:62
LRUEvictionAlgorithm chunk_evict_alg_

+ Here is the call graph for this function:
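
The scan in removeKey works because ChunkKey is a `std::vector<int>`, which compares lexicographically, so all keys sharing a `{db_id, tb_id}` prefix sit in one contiguous range of an ordered map. A minimal sketch of that range check, assuming the index behaves like a `std::map`; `ChunkIndex` and `table_has_other_keys` are hypothetical names used only for illustration.

```cpp
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <vector>

using ChunkKey = std::vector<int>;  // same typedef as shown for types.h

// Hypothetical stand-in for chunkIndex_: an ordered map keyed by ChunkKey.
using ChunkIndex = std::map<ChunkKey, std::string>;

// Returns true if the index holds any key for {db_id, tb_id} other than `key`.
bool table_has_other_keys(const ChunkIndex& index,
                          int db_id,
                          int tb_id,
                          const ChunkKey& key) {
  // {db, tb} sorts before every longer key with that prefix, and
  // {db, tb, INT32_MAX} sorts after every key whose third element is smaller.
  const ChunkKey min_key{db_id, tb_id};
  const ChunkKey max_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
  const auto end = index.upper_bound(max_key);
  for (auto it = index.lower_bound(min_key); it != end; ++it) {
    if (it->first != key) {
      return true;
    }
  }
  return false;
}
```

When this returns false for the key being removed, the table has no remaining chunks and can be dropped from the table-level eviction queue, which is the decision removeKey makes.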

void File_Namespace::CachingFileMgr::removeTableBuffers ( int32_t  db_id,
int32_t  tb_id 
)
private

Erases and cleans up all buffers for a table.

Definition at line 334 of file CachingFileMgr.cpp.

Referenced by clearForTable().

334  {
335    // Free associated FileBuffers and clear buffer entries.
337    ChunkKey min_table_key{db_id, tb_id};
338    ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
339    for (auto it = chunkIndex_.lower_bound(min_table_key);
340         it != chunkIndex_.upper_bound(max_table_key);) {
341      it = deleteBufferUnlocked(it);
342    }
343  }
std::vector< int > ChunkKey
Definition: types.h:36
heavyai::unique_lock< heavyai::shared_mutex > write_lock
ChunkKeyToChunkMap::iterator deleteBufferUnlocked(const ChunkKeyToChunkMap::iterator chunk_it, const bool purge=true) override
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:328
std::unique_lock< T > unique_lock
heavyai::shared_mutex chunkIndexMutex_
Definition: FileMgr.h:410

+ Here is the caller graph for this function:
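
The loop in removeTableBuffers uses the erase-while-iterating idiom: the loop header has no increment, and each iteration continues from the iterator returned by the deletion. A minimal sketch of the same idiom on a plain `std::map`, under the assumption that the chunk index is map-like; `erase_table_range` is a hypothetical helper, and unlike the listing it computes the upper bound once, which is safe for `std::map` because erasing one element does not invalidate iterators to others.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <vector>

using ChunkKey = std::vector<int>;

// Hypothetical helper: erases every entry whose key starts with
// {db_id, tb_id} and returns how many entries were removed.
template <typename Value>
std::size_t erase_table_range(std::map<ChunkKey, Value>& index,
                              int db_id,
                              int tb_id) {
  const ChunkKey min_key{db_id, tb_id};
  const ChunkKey max_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
  std::size_t erased = 0;
  const auto end = index.upper_bound(max_key);
  for (auto it = index.lower_bound(min_key); it != end;) {
    it = index.erase(it);  // erase returns the iterator to the next element
    ++erased;
  }
  return erased;
}
```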

void File_Namespace::CachingFileMgr::removeTableFileMgr ( int32_t  db_id,
int32_t  tb_id 
)
private

Removes the subdirectory content for a table.

Definition at line 324 of file CachingFileMgr.cpp.

Referenced by clearForTable().

324  {
325    // Delete table-specific directory (stores table epoch data and serialized data wrapper)
327    auto it = table_dirs_.find({db_id, tb_id});
328    if (it != table_dirs_.end()) {
329      it->second->removeDiskContent();
330      table_dirs_.erase(it);
331    }
332  }
heavyai::shared_mutex table_dirs_mutex_
heavyai::unique_lock< heavyai::shared_mutex > write_lock
std::unique_lock< T > unique_lock
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_

+ Here is the caller graph for this function:

Page File_Namespace::CachingFileMgr::requestFreePage ( size_t  pagesize,
const bool  isMetadata 
)
override private virtual

Requests a free page as FileMgr does, but this override also evicts existing pages to make space when none are available.

Reimplemented from File_Namespace::FileMgr.

Definition at line 422 of file CachingFileMgr.cpp.

References CHECK, File_Namespace::FileInfo::fileId, and File_Namespace::FileInfo::getFreePage().

422  {
423    std::lock_guard<std::mutex> lock(getPageMutex_);
424    int32_t pageNum = -1;
425    // Splits files into metadata and regular data by size.
426    auto candidateFiles = fileIndex_.equal_range(pageSize);
427    // Check if there is a free page in an existing file.
428    for (auto fileIt = candidateFiles.first; fileIt != candidateFiles.second; ++fileIt) {
429      FileInfo* fileInfo = files_.at(fileIt->second);
430      pageNum = fileInfo->getFreePage();
431      if (pageNum != -1) {
432        return (Page(fileInfo->fileId, pageNum));
433      }
434    }
435
436    // Try to add a new file if there is free space available.
437    FileInfo* fileInfo = nullptr;
438    if (isMetadata) {
439      if (getMaxMetaFiles() > getNumMetaFiles()) {
440        fileInfo = createFile(pageSize, num_pages_per_metadata_file_);
441      }
442    } else {
443      if (getMaxDataFiles() > getNumDataFiles()) {
444        fileInfo = createFile(pageSize, num_pages_per_data_file_);
445      }
446    }
447
448    if (!fileInfo) {
449      // We were not able to create a new file, so we try to evict space.
450      // Eviction will return the first file it evicted a page from (a file now guaranteed
451      // to have a free page).
452      fileInfo = isMetadata ? evictMetadataPages() : evictPages();
453    }
454    CHECK(fileInfo);
455
456    pageNum = fileInfo->getFreePage();
457    CHECK(pageNum != -1);
458    return (Page(fileInfo->fileId, pageNum));
459  }
std::mutex getPageMutex_
Definition: FileMgr.h:409
static size_t num_pages_per_data_file_
Definition: FileMgr.h:417
FileInfo * evictPages()
evicts all data pages for the least recently used Chunk (metadata pages persist). Returns the first F...
PageSizeFileMMap fileIndex_
Maps page sizes to FileInfo objects.
Definition: FileMgr.h:400
static size_t num_pages_per_metadata_file_
Definition: FileMgr.h:418
FileInfo * evictMetadataPages()
evicts all metadata pages for the least recently used table. Returns the first FileInfo that a page w...
FileInfo * createFile(const size_t pageSize, const size_t numPages)
Adds a file to the file manager repository.
Definition: FileMgr.cpp:951
std::map< int32_t, FileInfo * > files_
A map of files accessible via a file identifier.
Definition: FileMgr.h:399
#define CHECK(condition)
Definition: Logger.h:222

+ Here is the call graph for this function:
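
requestFreePage tries three things in order: reuse a free page in an existing file, create a new file if the file limit allows, and otherwise evict. The toy model below sketches only that control flow. All types here (`ToyFile`, `ToyPage`, `ToyCache`) are hypothetical, there is no locking or page-size indexing, and the placeholder eviction simply frees file 0 rather than following the LRU policy of the real class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: a file is a list of page slots, true meaning "in use".
struct ToyFile {
  std::vector<bool> used;

  // Returns a free page number and marks it used, or -1 if the file is full.
  int get_free_page() {
    for (std::size_t i = 0; i < used.size(); ++i) {
      if (!used[i]) {
        used[i] = true;
        return static_cast<int>(i);
      }
    }
    return -1;
  }
};

struct ToyPage {
  std::size_t file_id;
  int page_num;
};

struct ToyCache {
  std::size_t max_files;       // must be at least 1
  std::size_t pages_per_file;  // must be at least 1
  std::vector<ToyFile> files;

  // Placeholder eviction: frees every page of file 0 and returns its id.
  std::size_t evict() {
    assert(!files.empty());
    files[0].used.assign(pages_per_file, false);
    return 0;
  }

  ToyPage request_free_page() {
    // 1. Reuse a free page in an existing file.
    for (std::size_t id = 0; id < files.size(); ++id) {
      const int page = files[id].get_free_page();
      if (page != -1) {
        return {id, page};
      }
    }
    // 2. Add a new file if the limit allows; 3. otherwise evict.
    std::size_t id = 0;
    if (files.size() < max_files) {
      files.push_back(ToyFile{std::vector<bool>(pages_per_file, false)});
      id = files.size() - 1;
    } else {
      id = evict();
    }
    const int page = files[id].get_free_page();
    assert(page != -1);  // a new or freshly evicted file always has a free page
    return {id, page};
  }
};
```

With `max_files = 1` and `pages_per_file = 2`, the first two requests fill file 0 and the third triggers eviction, after which page 0 of file 0 is handed out again.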

void File_Namespace::CachingFileMgr::setDataSizeLimit ( size_t  max)
inline

Definition at line 377 of file CachingFileMgr.h.

References limit_data_size_.

377 { limit_data_size_ = max; }
std::optional< size_t > limit_data_size_
void File_Namespace::CachingFileMgr::setMaxNumDataFiles ( size_t  max)
inline

Definition at line 373 of file CachingFileMgr.h.

References max_num_data_files_.

void File_Namespace::CachingFileMgr::setMaxNumMetadataFiles ( size_t  max)
inline

Definition at line 374 of file CachingFileMgr.h.

References max_num_meta_files_.

void File_Namespace::CachingFileMgr::setMaxSizes ( )
private

Sets the maximum number of files/space for each type of storage based on the maximum size.

Definition at line 687 of file CachingFileMgr.cpp.

References CHECK_GT, and METADATA_PAGE_SIZE.

Referenced by CachingFileMgr().

687  {
688    size_t max_meta_space = std::floor(max_size_ * METADATA_SPACE_PERCENTAGE);
689    size_t max_meta_file_space = std::floor(max_size_ * METADATA_FILE_SPACE_PERCENTAGE);
690    max_wrapper_space_ = max_meta_space - max_meta_file_space;
691    auto max_data_space = max_size_ - max_meta_space;
692    auto meta_file_size = METADATA_PAGE_SIZE * num_pages_per_metadata_file_;
693    auto data_file_size = defaultPageSize_ * num_pages_per_data_file_;
694    max_num_data_files_ = max_data_space / data_file_size;
695    max_num_meta_files_ = max_meta_file_space / meta_file_size;
696    CHECK_GT(max_num_data_files_, 0U) << "Cannot create a cache of size " << max_size_
697                                      << ". Not enough space to create a data file.";
698    CHECK_GT(max_num_meta_files_, 0U) << "Cannot create a cache of size " << max_size_
699                                      << ". Not enough space to create a metadata file.";
700  }
#define METADATA_PAGE_SIZE
Definition: FileBuffer.h:37
static constexpr float METADATA_SPACE_PERCENTAGE
#define CHECK_GT(x, y)
Definition: Logger.h:234
static size_t num_pages_per_data_file_
Definition: FileMgr.h:417
static size_t num_pages_per_metadata_file_
Definition: FileMgr.h:418
size_t defaultPageSize_
Definition: FileMgr.h:402
static constexpr float METADATA_FILE_SPACE_PERCENTAGE

+ Here is the caller graph for this function:
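
The split performed by setMaxSizes can be followed with concrete numbers. The sketch below mirrors the arithmetic using the documented fractions (0.1 for all metadata, 0.01 for metadata files). The struct `CacheLimits` and function `compute_limits` are hypothetical, the file sizes are passed in because their actual values are not shown on this page, and the fractions are held as `double` rather than `float`, so rounding may differ slightly from the real code.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical result type holding the three derived limits.
struct CacheLimits {
  std::size_t max_wrapper_space;
  std::size_t max_num_data_files;
  std::size_t max_num_meta_files;
};

// Mirrors the arithmetic in setMaxSizes(); file sizes are in bytes.
CacheLimits compute_limits(std::size_t max_size,
                           std::size_t data_file_size,
                           std::size_t meta_file_size) {
  constexpr double kMetadataSpaceFraction = 0.1;       // METADATA_SPACE_PERCENTAGE
  constexpr double kMetadataFileSpaceFraction = 0.01;  // METADATA_FILE_SPACE_PERCENTAGE
  const auto max_meta_space =
      static_cast<std::size_t>(std::floor(max_size * kMetadataSpaceFraction));
  const auto max_meta_file_space =
      static_cast<std::size_t>(std::floor(max_size * kMetadataFileSpaceFraction));
  // Wrapper files get whatever metadata space is not set aside for metadata files.
  const std::size_t max_data_space = max_size - max_meta_space;
  return CacheLimits{max_meta_space - max_meta_file_space,
                     max_data_space / data_file_size,
                     max_meta_file_space / meta_file_size};
}
```

As an illustration only, an 11 GiB cache with 512 MiB data files and 16 MiB metadata files gives 19 data files and 7 metadata files. Both divisions truncate, which is why a cache that is too small yields zero files and trips the `CHECK_GT` guards.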

void File_Namespace::CachingFileMgr::setMaxWrapperSpace ( size_t  max)
inline

Definition at line 375 of file CachingFileMgr.h.

References max_wrapper_space_.

void File_Namespace::CachingFileMgr::touchKey ( const ChunkKey & key) const
private

Used to track which tables/chunks were least recently used.

Definition at line 529 of file CachingFileMgr.cpp.

References get_table_key().

529  {
532 }
LRUEvictionAlgorithm table_evict_alg_
void touchChunk(const ChunkKey &) override
ChunkKey get_table_key(const ChunkKey &key)
Definition: types.h:57
LRUEvictionAlgorithm chunk_evict_alg_

+ Here is the call graph for this function:
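
Tracking "least recently used" usually means moving a key to the back of a queue each time it is touched and evicting from the front. The following is a minimal sketch of such a queue. `ToyLruQueue` is a hypothetical class written for illustration; it is not the LRUEvictionAlgorithm class referenced above, whose internals are not shown on this page.

```cpp
#include <iterator>
#include <list>
#include <map>
#include <optional>
#include <vector>

using ChunkKey = std::vector<int>;

// Hypothetical minimal LRU queue (not the LRUEvictionAlgorithm class).
class ToyLruQueue {
 public:
  // Marks `key` as most recently used, inserting it if it is new.
  void touch(const ChunkKey& key) {
    remove(key);
    order_.push_back(key);
    pos_[key] = std::prev(order_.end());
  }

  // Drops `key` from the queue; does nothing if it is not present.
  void remove(const ChunkKey& key) {
    auto it = pos_.find(key);
    if (it != pos_.end()) {
      order_.erase(it->second);
      pos_.erase(it);
    }
  }

  // Pops and returns the least recently used key, if any.
  std::optional<ChunkKey> evict_next() {
    if (order_.empty()) {
      return std::nullopt;
    }
    ChunkKey key = order_.front();
    remove(key);
    return key;
  }

 private:
  std::list<ChunkKey> order_;  // front = least recently used
  std::map<ChunkKey, std::list<ChunkKey>::iterator> pos_;
};
```

CachingFileMgr keeps two such queues, one per chunk and one per table, so data pages and metadata pages can be evicted at different granularities.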

bool File_Namespace::CachingFileMgr::updatePageIfDeleted ( FileInfo * file_info,
ChunkKey chunk_key,
int32_t  contingent,
int32_t  page_epoch,
int32_t  page_num 
)
override virtual

Checks whether a page should be deleted.

Reimplemented from File_Namespace::FileMgr.

Definition at line 359 of file CachingFileMgr.cpp.

References File_Namespace::DELETE_CONTINGENT, File_Namespace::FileInfo::freePage(), and File_Namespace::ROLLOFF_CONTINGENT.

363  {
364    // These contingents are stored by overwriting the bytes used for chunkKeys. If
365    // we run into a key marked for deletion in a fileMgr with no fileMgrKey (i.e.
366    // CachingFileMgr) then we can't know if the epoch is valid because we don't know
367    // the key. At this point our only option is to free the page as though it was
368    // checkpointed (which should be fine since we only maintain one version of each
369    // page).
370    if (contingent == DELETE_CONTINGENT || contingent == ROLLOFF_CONTINGENT) {
371      file_info->freePage(page_num, false, page_epoch);
372      return true;
373    }
374    return false;
375  }
constexpr int32_t DELETE_CONTINGENT
A FileInfo type has a file pointer and metadata about a file.
Definition: FileInfo.h:51
constexpr int32_t ROLLOFF_CONTINGENT
Definition: FileInfo.h:52

+ Here is the call graph for this function:

void File_Namespace::CachingFileMgr::writeAndSyncEpochToDisk ( int32_t  db_id,
int32_t  tb_id 
)
private

Flushes epoch value to disk for a table.

Definition at line 165 of file CachingFileMgr.cpp.

References CHECK, table_dirs_, and table_dirs_mutex_.

165  {
167    auto table_it = table_dirs_.find({db_id, tb_id});
168    CHECK(table_it != table_dirs_.end());
169    table_it->second->writeAndSyncEpochToDisk();
170  }
heavyai::shared_lock< heavyai::shared_mutex > read_lock
heavyai::shared_mutex table_dirs_mutex_
std::shared_lock< T > shared_lock
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
#define CHECK(condition)
Definition: Logger.h:222
void File_Namespace::CachingFileMgr::writeDirtyBuffers ( int32_t  db_id,
int32_t  tb_id 
)
private

Helper function to flush all dirty buffers to disk.

Definition at line 377 of file CachingFileMgr.cpp.

377  {
379    ChunkKey min_table_key{db_id, tb_id};
380    ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
381
382    for (auto chunk_it = chunkIndex_.lower_bound(min_table_key);
383         chunk_it != chunkIndex_.upper_bound(max_table_key);
384         ++chunk_it) {
385      if (auto [key, buf] = *chunk_it; buf->isDirty()) {
386        // Free previous versions first so we only have one metadata version.
387        buf->freeMetadataPages();
388        buf->writeMetadata(epoch(db_id, tb_id));
389        buf->clearDirtyBits();
390        touchKey(key);
391      }
392    }
393  }
std::vector< int > ChunkKey
Definition: types.h:36
void touchKey(const ChunkKey &key) const
Used to track which tables/chunks were least recently used.
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:328
std::unique_lock< T > unique_lock
heavyai::shared_mutex chunkIndexMutex_
Definition: FileMgr.h:410
int32_t epoch() const
Definition: FileMgr.h:517
void File_Namespace::CachingFileMgr::writeWrapperFile ( const std::string &  doc,
int32_t  db,
int32_t  tb 
)

Writes a wrapper file to a table subdir.

Definition at line 657 of file CachingFileMgr.cpp.

References CHECK_LE.

657  {
659    auto wrapper_size = doc.size();
660    CHECK_LE(wrapper_size, getMaxWrapperSize())
661        << "Wrapper is too big to fit into the cache";
662    while (wrapper_size > getAvailableWrapperSpace()) {
664    }
666    table_dirs_.at({db, tb})->writeWrapperFile(doc);
667  }
heavyai::shared_lock< heavyai::shared_mutex > read_lock
heavyai::shared_mutex table_dirs_mutex_
void writeWrapperFile(const std::string &doc, int32_t db, int32_t tb)
Writes a wrapper file to a table subdir.
void createTableFileMgrIfNoneExists(const int32_t db_id, const int32_t tb_id)
Create and initialize a subdirectory for a table if none exists.
std::shared_lock< T > shared_lock
#define CHECK_LE(x, y)
Definition: Logger.h:233
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
FileInfo * evictMetadataPages()
evicts all metadata pages for the least recently used table. Returns the first FileInfo that a page w...
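
writeWrapperFile first checks that the wrapper can fit at all, then loops until enough wrapper space is available. The loop body is not visible in the listing above; the sketch below assumes it evicts older content on each pass. `ToyWrapperStore` is a hypothetical model that tracks only wrapper sizes and evicts oldest-first, which is a simplification of the table-level LRU eviction the class documents.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <numeric>

// Hypothetical model of the wrapper budget: tracks only the size of each wrapper.
struct ToyWrapperStore {
  std::size_t capacity;           // plays the role of getMaxWrapperSize()
  std::deque<std::size_t> sizes;  // stored wrapper sizes, oldest first

  std::size_t used() const {
    return std::accumulate(sizes.begin(), sizes.end(), std::size_t{0});
  }

  // Plays the role of getAvailableWrapperSpace().
  std::size_t available() const { return capacity - used(); }

  void write(std::size_t wrapper_size) {
    // A wrapper larger than the whole budget can never fit.
    assert(wrapper_size <= capacity);
    // Evict until it fits. Once the store is empty, available() == capacity,
    // so the loop exits before it could pop from an empty deque.
    while (wrapper_size > available()) {
      sizes.pop_front();
    }
    sizes.push_back(wrapper_size);
  }
};
```

The up-front size check is what guarantees the loop terminates: without it, an oversized wrapper would keep evicting until nothing was left and still not fit.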

Member Data Documentation

LRUEvictionAlgorithm File_Namespace::CachingFileMgr::chunk_evict_alg_
mutable private

Definition at line 503 of file CachingFileMgr.h.

Referenced by dump(), and dumpEvictionQueue().

std::optional<size_t> File_Namespace::CachingFileMgr::limit_data_size_ {}
private

Definition at line 501 of file CachingFileMgr.h.

Referenced by setDataSizeLimit().

size_t File_Namespace::CachingFileMgr::max_num_data_files_
private

Definition at line 497 of file CachingFileMgr.h.

Referenced by getMaxDataFiles(), and setMaxNumDataFiles().

size_t File_Namespace::CachingFileMgr::max_num_meta_files_
private

Definition at line 498 of file CachingFileMgr.h.

Referenced by getMaxMetaFiles(), and setMaxNumMetadataFiles().

size_t File_Namespace::CachingFileMgr::max_size_
private

Definition at line 500 of file CachingFileMgr.h.

Referenced by CachingFileMgr(), getAvailableSpace(), and getMaxSize().

size_t File_Namespace::CachingFileMgr::max_wrapper_space_
private

constexpr float File_Namespace::CachingFileMgr::METADATA_FILE_SPACE_PERCENTAGE {0.01}
static

Definition at line 180 of file CachingFileMgr.h.

Referenced by getMinimumSize().

constexpr float File_Namespace::CachingFileMgr::METADATA_SPACE_PERCENTAGE {0.1}
static

Definition at line 178 of file CachingFileMgr.h.

std::map<TablePair, std::unique_ptr<TableFileMgr> > File_Namespace::CachingFileMgr::table_dirs_
private

heavyai::shared_mutex File_Namespace::CachingFileMgr::table_dirs_mutex_
mutable private

LRUEvictionAlgorithm File_Namespace::CachingFileMgr::table_evict_alg_
mutable private

Definition at line 504 of file CachingFileMgr.h.

Referenced by dump(), and dumpTableQueue().

constexpr char File_Namespace::CachingFileMgr::WRAPPER_FILE_NAME[] = "wrapper_metadata.json"
static

The documentation for this class was generated from the following files: