OmniSciDB  c1a53651b2
File_Namespace::CachingFileMgr Class Reference

A FileMgr capable of limiting its size and storing data from multiple tables in a shared directory. For any table that supports DiskCaching, the CachingFileMgr must contain either metadata for all of the table's chunks or for none of them (the cache either has no knowledge of that table or has complete knowledge of it). Any data chunk within a table may or may not be contained within the cache.

#include <CachingFileMgr.h>


Public Member Functions

 CachingFileMgr (const DiskCacheConfig &config)
 
 ~CachingFileMgr () override
 
MgrType getMgrType () override
 
std::string getStringMgrType () override
 
size_t getPageSize ()
 
size_t getMaxSize () override
 
size_t getMaxDataFiles () const
 
size_t getMaxMetaFiles () const
 
size_t getMaxWrapperSize () const
 
size_t getDataFileSize () const
 
size_t getMetadataFileSize () const
 
size_t getNumDataFiles () const
 
size_t getNumMetaFiles () const
 
size_t getAvailableSpace ()
 
size_t getAvailableWrapperSpace ()
 
size_t getAllocated () override
 
size_t getMaxDataFilesSize () const
 
void removeChunkKeepMetadata (const ChunkKey &key)
 Frees the pages for the chunk and removes it from the chunk eviction algorithm.
 
void clearForTable (int32_t db_id, int32_t tb_id)
 Removes all data related to the given table (pages and subdirectories).
 
bool hasFileMgrKey () const override
 Queries whether the contained pages will have their database and table ids overridden by the file manager key (FileMgr does this).
 
void closeRemovePhysical () override
 Closes files and removes the caching directory.
 
size_t getChunkSpaceReservedByTable (int32_t db_id, int32_t tb_id) const
 
size_t getMetadataSpaceReservedByTable (int32_t db_id, int32_t tb_id) const
 
size_t getTableFileMgrSpaceReserved (int32_t db_id, int32_t tb_id) const
 
size_t getSpaceReservedByTable (int32_t db_id, int32_t tb_id) const
 
std::string describeSelf () const override
 Describes this FileMgr for logging purposes.
 
void checkpoint (const int32_t db_id, const int32_t tb_id) override
 Writes buffers for the given table, synchronizes files to disk, updates the file epoch, and commits free pages.
 
int32_t epoch (int32_t db_id, int32_t tb_id) const override
 Obtains the epoch version for the given table.
 
FileBuffer * putBuffer (const ChunkKey &key, AbstractBuffer *srcBuffer, const size_t numBytes=0) override
 Deletes any existing buffer for the given key, then copies in a new one.
 
CachingFileBuffer * allocateBuffer (const size_t page_size, const ChunkKey &key, const size_t num_bytes=0) override
 Allocates a new CachingFileBuffer and tracks its use in the eviction algorithms.
 
CachingFileBuffer * allocateBuffer (const ChunkKey &key, const std::vector< HeaderInfo >::const_iterator &headerStartIt, const std::vector< HeaderInfo >::const_iterator &headerEndIt) override
 
bool updatePageIfDeleted (FileInfo *file_info, ChunkKey &chunk_key, int32_t contingent, int32_t page_epoch, int32_t page_num) override
 Checks whether a page should be deleted.
 
bool failOnReadError () const override
 Returns true if a read error should cause a fatal error.
 
void deleteBufferIfExists (const ChunkKey &key)
 Deletes a buffer if it exists in the manager; otherwise does nothing.
 
size_t getNumChunksWithMetadata () const
 Returns the number of buffers with metadata in the CFM. Any buffer with an encoder counts.
 
size_t getNumDataChunks () const
 Returns the number of buffers with chunk data in the CFM.
 
std::vector< ChunkKey > getChunkKeysForPrefix (const ChunkKey &prefix) const
 Returns the keys for chunks with chunk data that match the given prefix.
 
std::unique_ptr< CachingFileMgr > reconstruct () const
 Initializes a new CFM using the initialization values in the current CFM.
 
void deleteWrapperFile (int32_t db, int32_t tb)
 Deletes the wrapper file from a table subdirectory.
 
void writeWrapperFile (const std::string &doc, int32_t db, int32_t tb)
 Writes a wrapper file to a table subdirectory.
 
bool hasWrapperFile (int32_t db_id, int32_t table_id) const
 
std::string getTableFileMgrPath (int32_t db, int32_t tb) const
 
size_t getFilesSize () const
 Returns the total size of page files (data and metadata files), including allocated but unused space.
 
size_t getTableFileMgrsSize () const
 Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirectory for serialized data wrappers and epoch files.
 
std::optional< FileBuffer * > getBufferIfExists (const ChunkKey &key)
 A variant of getBuffer() for when it is uncertain whether the chunk exists; returns an empty optional if it does not.
 
void free_page (std::pair< FileInfo *, int32_t > &&page) override
 Unlike the FileMgr, the CFM frees pages immediately instead of holding them until the next checkpoint.
 
void getChunkMetadataVecForKeyPrefix (ChunkMetadataVector &chunkMetadataVec, const ChunkKey &keyPrefix) override
 
std::string dumpKeysWithMetadata () const
 
std::string dumpKeysWithChunkData () const
 
std::string dumpTableQueue () const
 
std::string dumpEvictionQueue () const
 
std::string dump () const
 
void setMaxNumDataFiles (size_t max)
 
void setMaxNumMetadataFiles (size_t max)
 
void setMaxWrapperSpace (size_t max)
 
std::set< ChunkKey > getKeysWithMetadata () const
 
void setDataSizeLimit (size_t max)
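getChunkKeysForPrefix() and the private getKeysForTable() both select chunk keys by prefix. The following is a minimal sketch of one way to do this. It assumes, for illustration only, that a chunk key is a std::vector<int> such as {db_id, table_id, column_id, fragment_id} and that the index is a std::map ordered by key. keysForPrefix is a hypothetical name, and unlike getChunkKeysForPrefix() it does not filter out chunks without data pages.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in: a chunk key such as {db_id, table_id, column_id, fragment_id}.
using ChunkKey = std::vector<int>;

// Returns every key in `index` that starts with `prefix`. std::map orders
// vectors lexicographically, so all matching keys are contiguous and the first
// one (if any) is at lower_bound(prefix).
template <typename Value>
std::vector<ChunkKey> keysForPrefix(const std::map<ChunkKey, Value>& index,
                                    const ChunkKey& prefix) {
  std::vector<ChunkKey> result;
  for (auto it = index.lower_bound(prefix); it != index.end(); ++it) {
    const ChunkKey& key = it->first;
    // Stop at the first key that no longer shares the prefix.
    if (key.size() < prefix.size() ||
        !std::equal(prefix.begin(), prefix.end(), key.begin())) {
      break;
    }
    result.push_back(key);
  }
  return result;
}
```

Because every key sharing the prefix sits in one contiguous run, the scan can stop at the first non-matching key instead of visiting the whole index.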
 
- Public Member Functions inherited from File_Namespace::FileMgr
 FileMgr (const int32_t device_id, GlobalFileMgr *gfm, const TablePair file_mgr_key, const int32_t max_rollback_epochs=-1, const size_t num_reader_threads=0, const int32_t epoch=-1)
 Constructor.
 
 FileMgr (const int32_t device_id, GlobalFileMgr *gfm, const TablePair file_mgr_key, const bool run_core_init)
 
 FileMgr (GlobalFileMgr *gfm, std::string basePath)
 
 ~FileMgr () override
 Destructor.
 
StorageStats getStorageStats () const
 
FileBuffer * createBuffer (const ChunkKey &key, size_t pageSize=0, const size_t numBytes=0) override
 Creates a chunk with the specified key and page size.
 
bool isBufferOnDevice (const ChunkKey &key) override
 
void deleteBuffer (const ChunkKey &key, const bool purge=true) override
 Deletes the chunk with the specified key.
 
void deleteBuffersWithPrefix (const ChunkKey &keyPrefix, const bool purge=true) override
 
FileBuffer * getBuffer (const ChunkKey &key, const size_t numBytes=0) override
 Returns a pointer to the chunk with the specified key.
 
void fetchBuffer (const ChunkKey &key, AbstractBuffer *destBuffer, const size_t numBytes) override
 
FileBuffer * putBuffer (const ChunkKey &key, AbstractBuffer *d, const size_t numBytes=0) override
 Puts the contents of d into the chunk with the given key.
 
AbstractBuffer * alloc (const size_t numBytes) override
 
void free (AbstractBuffer *buffer) override
 
MgrType getMgrType () override
 
std::string getStringMgrType () override
 
std::string printSlabs () override
 
size_t getMaxSize () override
 
size_t getInUseSize () override
 
size_t getAllocated () override
 
bool isAllocationCapped () override
 
FileInfo * getFileInfoForFileId (const int32_t fileId) const
 
FileMetadata getMetadataForFile (const boost::filesystem::directory_iterator &fileIterator) const
 
void init (const size_t num_reader_threads, const int32_t epochOverride)
 
void init (const std::string &dataPathToConvertFrom, const int32_t epochOverride)
 
void copyPage (Page &srcPage, FileMgr *destFileMgr, Page &destPage, const size_t reservedHeaderSize, const size_t numBytes, const size_t offset)
 
void requestFreePages (size_t npages, size_t pagesize, std::vector< Page > &pages, const bool isMetadata)
 Obtains free pages of the requested size, creating new files if necessary.
 
void getChunkMetadataVecForKeyPrefix (ChunkMetadataVector &chunkMetadataVec, const ChunkKey &keyPrefix) override
 
bool hasChunkMetadataForKeyPrefix (const ChunkKey &keyPrefix)
 
void checkpoint () override
 Fsyncs data files, then writes out the epoch and fsyncs it.
 
void checkpoint (const int32_t db_id, const int32_t tb_id) override
 
int32_t epochFloor () const
 
int32_t incrementEpoch ()
 
int32_t lastCheckpointedEpoch () const
 Returns the value of the epoch at the last checkpoint.
 
void resetEpochFloor ()
 
int32_t maxRollbackEpochs ()
 Returns the value of max_rollback_epochs.
 
size_t getNumReaderThreads ()
 Returns the number of threads, set by the num-reader-threads parameter, to use during the initial load and subsequent reads of data.
 
FILE * getFileForFileId (const int32_t fileId)
 Returns the FILE pointer associated with the requested fileId.
 
size_t getNumChunks () override
 
size_t getNumUsedMetadataPagesForChunkKey (const ChunkKey &chunkKey) const
 
int32_t getDBVersion () const
 
bool getDBConvert () const
 
void createTopLevelMetadata ()
 
std::string getFileMgrBasePath () const
 
void removeTableRelatedDS (const int32_t db_id, const int32_t table_id) override
 
const TablePair get_fileMgrKey () const
 
boost::filesystem::path getFilePath (const std::string &file_name) const
 
void writePageMappingsToStatusFile (const std::vector< PageMapping > &page_mappings)
 
void renameCompactionStatusFile (const char *const from_status, const char *const to_status)
 
void compactFiles ()
 
size_t getPageSize () const
 
size_t getMetadataPageSize () const
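The free_page() override is documented as freeing pages immediately. FileMgr instead holds freed pages until the next checkpoint. The toy types below only contrast those two policies. ToyFile, PageRef, DeferredFreer and ImmediateFreer are hypothetical and do not reflect how FileInfo actually records free pages.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of a page file: one flag per page, true when free.
struct ToyFile {
  std::vector<bool> page_free;
  explicit ToyFile(std::size_t num_pages) : page_free(num_pages, false) {}
};

// A page is identified by its file and its page number, as in free_page().
using PageRef = std::pair<ToyFile*, int>;

// FileMgr-style: freed pages are queued and only released at checkpoint.
struct DeferredFreer {
  std::vector<PageRef> pending;
  void freePage(PageRef&& page) { pending.push_back(std::move(page)); }
  void checkpoint() {
    for (auto& [file, page_num] : pending) {
      file->page_free[page_num] = true;
    }
    pending.clear();
  }
};

// CachingFileMgr-style: a freed page is released on the spot.
struct ImmediateFreer {
  void freePage(PageRef&& page) {
    assert(!page.first->page_free[page.second]);  // catch double frees
    page.first->page_free[page.second] = true;
  }
};
```

In the deferred model a freed page stays unavailable until checkpoint() runs. In the immediate model it can be reused right away.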
 

Static Public Member Functions

static size_t getMinimumSize ()
 
- Static Public Member Functions inherited from File_Namespace::FileMgr
static void setNumPagesPerDataFile (size_t num_pages)
 
static void setNumPagesPerMetadataFile (size_t num_pages)
 
static void renameAndSymlinkLegacyFiles (const std::string &table_data_dir)
 

Static Public Attributes

static constexpr char WRAPPER_FILE_NAME [] = "wrapper_metadata.json"
 
static constexpr float METADATA_SPACE_PERCENTAGE {0.1}
 
static constexpr float METADATA_FILE_SPACE_PERCENTAGE {0.01}
 
- Static Public Attributes inherited from File_Namespace::FileMgr
static constexpr size_t DEFAULT_NUM_PAGES_PER_DATA_FILE {256}
 
static constexpr size_t DEFAULT_NUM_PAGES_PER_METADATA_FILE {4096}
 
static constexpr char const * COPY_PAGES_STATUS {"pending_data_compaction_0"}
 
static constexpr char const * UPDATE_PAGE_VISIBILITY_STATUS {"pending_data_compaction_1"}
 
static constexpr char const * DELETE_EMPTY_FILES_STATUS {"pending_data_compaction_2"}
 
static constexpr char LEGACY_EPOCH_FILENAME [] = "epoch"
 
static constexpr char EPOCH_FILENAME [] = "epoch_metadata"
 
static constexpr char DB_META_FILENAME [] = "dbmeta"
 
static constexpr char FILE_MGR_VERSION_FILENAME [] = "filemgr_version"
 
static constexpr int32_t INVALID_VERSION = -1
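The names METADATA_SPACE_PERCENTAGE and METADATA_FILE_SPACE_PERCENTAGE, together with the setMaxSizes() brief, suggest how the cache budget is divided. The sketch below is a hypothetical reading of those names, not the CFM's actual arithmetic. It assumes the following:

- 10% of the total size is reserved for metadata.
- 1% of the total holds metadata page files.
- The rest of the metadata share holds wrapper files.
- File counts are each share divided by the file size, rounded down.

CacheLimits and computeLimits are illustrative names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type; fields mirror max_num_data_files_,
// max_num_meta_files_ and max_wrapper_space_.
struct CacheLimits {
  std::size_t max_num_data_files;
  std::size_t max_num_meta_files;
  std::size_t max_wrapper_space;
};

// Assumed split: 10% of max_size for metadata (METADATA_SPACE_PERCENTAGE),
// of which 1% of max_size is for metadata page files
// (METADATA_FILE_SPACE_PERCENTAGE); the rest of that 10% is wrapper space.
// Integer percentages are used to avoid floating-point rounding.
CacheLimits computeLimits(std::size_t max_size,
                          std::size_t data_file_size,
                          std::size_t meta_file_size) {
  const std::size_t meta_space = max_size * 10 / 100;
  const std::size_t meta_file_space = max_size * 1 / 100;
  const std::size_t data_space = max_size - meta_space;
  CacheLimits limits{data_space / data_file_size,
                     meta_file_space / meta_file_size,
                     meta_space - meta_file_space};
  // A cache too small to hold one file of each kind would be unusable.
  assert(limits.max_num_data_files > 0 && limits.max_num_meta_files > 0);
  return limits;
}
```

For a 1,000,000-byte cache with 100,000-byte data files and 5,000-byte metadata files, this split allows 9 data files, 2 metadata files and 90,000 bytes of wrapper space.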
 

Private Member Functions

void incrementEpoch (int32_t db_id, int32_t tb_id)
 Increments the epoch for the given table.
 
void init (const size_t num_reader_threads)
 Initializes a CFM, parsing any existing files and initializing data structures appropriately (currently not thread-safe).
 
void writeAndSyncEpochToDisk (int32_t db_id, int32_t tb_id)
 Flushes the epoch value for a table to disk.
 
void readTableFileMgrs ()
 Checks for any subdirectories containing table-specific data and creates epochs from the files found.
 
FileBuffer * createBufferFromHeaders (const ChunkKey &key, const std::vector< HeaderInfo >::const_iterator &startIt, const std::vector< HeaderInfo >::const_iterator &endIt) override
 Creates a buffer and initializes it with info read from files on disk.
 
FileBuffer * createBufferUnlocked (const ChunkKey &key, size_t pageSize=0, const size_t numBytes=0) override
 Creates a buffer.
 
void createTableFileMgrIfNoneExists (const int32_t db_id, const int32_t tb_id)
 Creates and initializes a subdirectory for a table if none exists.
 
void incrementAllEpochs ()
 Increments the epoch for each table in the CFM.
 
void removeTableFileMgr (int32_t db_id, int32_t tb_id)
 Removes the subdirectory content for a table.
 
void removeTableBuffers (int32_t db_id, int32_t tb_id)
 Erases and cleans up all buffers for a table.
 
void writeDirtyBuffers (int32_t db_id, int32_t tb_id)
 Helper function to flush all of the table's dirty buffers to disk.
 
Page requestFreePage (size_t pagesize, const bool isMetadata) override
 Requests a free page as FileMgr does, but also evicts existing pages to make space if none are available.
 
void touchKey (const ChunkKey &key) const
 Records a use of the key so that the least recently used tables and chunks can be tracked.
 
void removeKey (const ChunkKey &key) const
 
std::vector< ChunkKey > getKeysForTable (int32_t db_id, int32_t tb_id) const
 Returns the keys contained in chunkIndex_ that match the given table prefix.
 
FileInfo * evictMetadataPages ()
 Evicts all metadata pages for the least recently used table. Returns the first FileInfo that a page was evicted from (which is guaranteed to now have at least one free page).
 
FileInfo * evictPages ()
 Evicts all data pages for the least recently used chunk (metadata pages persist). Returns the first FileInfo that a page was evicted from (which is guaranteed to now have at least one free page).
 
void deleteCacheIfTooLarge ()
 When the cache is read from disk, we don't know which chunks were least recently used. Rather than try to evict random pages to get down to size, we just reset the cache to make sure we have space.
 
void setMaxSizes ()
 Sets the maximum number of files/space for each type of storage based on the maximum size.
 
FileBuffer * getBufferUnlocked (const ChunkKey &key, const size_t numBytes=0) const override
 
ChunkKeyToChunkMap::iterator deleteBufferUnlocked (const ChunkKeyToChunkMap::iterator chunk_it, const bool purge=true) override
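The CFM keeps two LRUEvictionAlgorithm members:

- chunk_evict_alg_ tracks chunks; evictPages() frees their data pages.
- table_evict_alg_ tracks tables; evictMetadataPages() frees their metadata pages.

touchKey() records a use, and removeKey() is called from deleteBufferUnlocked(). The following is a minimal LRU queue in the same spirit. ToyLRU and its methods are hypothetical and are not the LRUEvictionAlgorithm interface.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <map>
#include <optional>
#include <vector>

using ChunkKey = std::vector<int>;  // hypothetical stand-in

// Minimal LRU queue. touch() moves a key to the most-recently-used end;
// evictNext() removes and returns the least-recently-used key.
class ToyLRU {
 public:
  void touch(const ChunkKey& key) {
    remove(key);
    order_.push_back(key);
    pos_[key] = std::prev(order_.end());
  }

  void remove(const ChunkKey& key) {
    auto it = pos_.find(key);
    if (it != pos_.end()) {
      order_.erase(it->second);
      pos_.erase(it);
    }
  }

  std::optional<ChunkKey> evictNext() {
    if (order_.empty()) {
      return std::nullopt;
    }
    ChunkKey victim = order_.front();
    remove(victim);
    return victim;
  }

 private:
  std::list<ChunkKey> order_;                              // front = least recently used
  std::map<ChunkKey, std::list<ChunkKey>::iterator> pos_;  // key -> node in order_
};
```

Keeping list iterators in a map lets touch() and remove() reposition a key without scanning the queue.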
 

Private Attributes

heavyai::shared_mutex table_dirs_mutex_
 
std::map< TablePair,
std::unique_ptr< TableFileMgr > > 
table_dirs_
 
size_t max_num_data_files_
 
size_t max_num_meta_files_
 
size_t max_wrapper_space_
 
size_t max_size_
 
std::optional< size_t > limit_data_size_ {}
 
LRUEvictionAlgorithm chunk_evict_alg_
 
LRUEvictionAlgorithm table_evict_alg_
 

Additional Inherited Members

- Public Attributes inherited from File_Namespace::FileMgr
ChunkKeyToChunkMap chunkIndex_
 
- Protected Member Functions inherited from File_Namespace::FileMgr
 FileMgr (const size_t defaultPageSize, const size_t defaultMetadataPageSize)
 
FileInfo * createFile (const size_t pageSize, const size_t numPages)
 Adds a file to the file manager repository.
 
FileInfo * openExistingFile (const std::string &path, const int32_t fileId, const size_t pageSize, const size_t numPages, std::vector< HeaderInfo > &headerVec)
 
void createEpochFile (const std::string &epochFileName)
 
int32_t openAndReadLegacyEpochFile (const std::string &epochFileName)
 
void openAndReadEpochFile (const std::string &epochFileName)
 
void writeAndSyncEpochToDisk ()
 
void setEpoch (const int32_t newEpoch)
 
int32_t readVersionFromDisk (const std::string &versionFileName) const
 
void writeAndSyncVersionToDisk (const std::string &versionFileName, const int32_t version)
 
void processFileFutures (std::vector< std::future< std::vector< HeaderInfo >>> &file_futures, std::vector< HeaderInfo > &headerVec)
 
void migrateToLatestFileMgrVersion ()
 
void migrateEpochFileV0 ()
 
void migrateLegacyFilesV1 ()
 
OpenFilesResult openFiles ()
 
void clearFileInfos ()
 
void copySourcePageForCompaction (const Page &source_page, FileInfo *destination_file_info, std::vector< PageMapping > &page_mappings, std::set< Page > &touched_pages)
 
int32_t copyPageWithoutHeaderSize (const Page &source_page, const Page &destination_page)
 
void sortAndCopyFilePagesForCompaction (size_t page_size, std::vector< PageMapping > &page_mappings, std::set< Page > &touched_pages)
 
void updateMappedPagesVisibility (const std::vector< PageMapping > &page_mappings)
 
void deleteEmptyFiles ()
 
void resumeFileCompaction (const std::string &status_file_name)
 
std::vector< PageMapping > readPageMappingsFromStatusFile ()
 
 FileMgr (const int epoch)
 
void closePhysicalUnlocked ()
 
void syncFilesToDisk ()
 
void freePages ()
 
void initializeNumThreads (size_t num_reader_threads=0)
 
- Protected Attributes inherited from File_Namespace::FileMgr
int32_t maxRollbackEpochs_
 
std::string fileMgrBasePath_
 
std::map< int32_t, FileInfo * > files_
 A map of files accessible via a file identifier.
 
PageSizeFileMMap fileIndex_
 Maps page sizes to FileInfo objects.
 
size_t num_reader_threads_
 Number of threads used when loading data.
 
unsigned nextFileId_
 The index of the next file id.
 
int32_t db_version_
 
int32_t fileMgrVersion_
 
const int32_t latestFileMgrVersion_ {2}
 
FILE * DBMetaFile_ = nullptr
 Pointer to DB level metadata.
 
std::mutex getPageMutex_
 
heavyai::shared_mutex chunkIndexMutex_
 
heavyai::shared_mutex files_rw_mutex_
 
heavyai::shared_mutex mutex_free_page_
 
std::vector< std::pair
< FileInfo *, int32_t > > 
free_pages_
 
bool isFullyInitted_ {false}
 
const size_t page_size_
 
const size_t metadata_page_size_
 
- Static Protected Attributes inherited from File_Namespace::FileMgr
static size_t num_pages_per_data_file_ {DEFAULT_NUM_PAGES_PER_DATA_FILE}
 
static size_t num_pages_per_metadata_file_ {DEFAULT_NUM_PAGES_PER_METADATA_FILE}
 

Detailed Description

A FileMgr capable of limiting its size and storing data from multiple tables in a shared directory. For any table that supports DiskCaching, the CachingFileMgr must contain either metadata for all of the table's chunks or for none of them (the cache either has no knowledge of that table or has complete knowledge of it). Any data chunk within a table may or may not be contained within the cache.

Definition at line 172 of file CachingFileMgr.h.
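The all-or-nothing rule for metadata, combined with optional per-chunk data, can be illustrated with a toy index. This is a sketch under stated assumptions, not the CFM's data structures. It assumes chunk keys begin with {db_id, table_id}. ToyCacheIndex and its methods are hypothetical; its clearForTable() only mirrors the intent of the real clearForTable().

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

using ChunkKey = std::vector<int>;  // {db_id, table_id, ...}; hypothetical
using TablePair = std::pair<int, int>;

// Toy index enforcing the rule from the class description: a table's metadata
// is cached for all of its chunks or for none, while data is optional per chunk.
class ToyCacheIndex {
 public:
  // Metadata for every chunk of a table is recorded in one step.
  void cacheTableMetadata(const TablePair& table, const std::set<ChunkKey>& keys) {
    metadata_[table] = keys;
  }

  // Data may be cached only for a chunk whose table metadata is already known.
  bool cacheChunkData(const ChunkKey& key) {
    assert(key.size() >= 2);
    auto it = metadata_.find(TablePair{key[0], key[1]});
    if (it == metadata_.end() || it->second.count(key) == 0) {
      return false;
    }
    data_.insert(key);
    return true;
  }

  // Metadata and data for the table are dropped together.
  void clearForTable(const TablePair& table) {
    metadata_.erase(table);
    for (auto it = data_.begin(); it != data_.end();) {
      if ((*it)[0] == table.first && (*it)[1] == table.second) {
        it = data_.erase(it);
      } else {
        ++it;
      }
    }
  }

  bool hasData(const ChunkKey& key) const { return data_.count(key) > 0; }

 private:
  std::map<TablePair, std::set<ChunkKey>> metadata_;
  std::set<ChunkKey> data_;
};
```

Because data can only be added for chunks the table's metadata already lists, the cache never holds data for a table it has no metadata for.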

Constructor & Destructor Documentation

File_Namespace::CachingFileMgr::CachingFileMgr ( const DiskCacheConfig & config)

Definition at line 73 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::fileMgrBasePath_, init(), max_size_, File_Namespace::FileMgr::maxRollbackEpochs_, File_Namespace::FileMgr::nextFileId_, File_Namespace::DiskCacheConfig::num_reader_threads, File_Namespace::DiskCacheConfig::path, setMaxSizes(), and File_Namespace::DiskCacheConfig::size_limit.

74  : FileMgr(config.page_size, DEFAULT_METADATA_PAGE_SIZE) {
75  fileMgrBasePath_ = config.path;
77  nextFileId_ = 0;
78  max_size_ = config.size_limit;
79  init(config.num_reader_threads);
80  setMaxSizes();
81 }

File_Namespace::CachingFileMgr::~CachingFileMgr ( )
override

Definition at line 83 of file CachingFileMgr.cpp.

83 {}

Member Function Documentation

CachingFileBuffer * File_Namespace::CachingFileMgr::allocateBuffer ( const size_t  page_size,
const ChunkKey & key,
const size_t  num_bytes = 0 
)
override virtual

Allocates a new CachingFileBuffer and tracks its use in the eviction algorithms.

Reimplemented from File_Namespace::FileMgr.

Definition at line 348 of file CachingFileMgr.cpp.

350  {
351  return new CachingFileBuffer(this, page_size, key, num_bytes);
352 }
CachingFileBuffer * File_Namespace::CachingFileMgr::allocateBuffer ( const ChunkKey & key,
const std::vector< HeaderInfo >::const_iterator &  headerStartIt,
const std::vector< HeaderInfo >::const_iterator &  headerEndIt 
)
override virtual

Reimplemented from File_Namespace::FileMgr.

Definition at line 354 of file CachingFileMgr.cpp.

357  {
358  return new CachingFileBuffer(this, key, headerStartIt, headerEndIt);
359 }
void File_Namespace::CachingFileMgr::checkpoint ( const int32_t  db_id,
const int32_t  tb_id 
)
override

Writes buffers for the given table, synchronizes files to disk, updates the file epoch, and commits free pages.

Definition at line 248 of file CachingFileMgr.cpp.

References CHECK, table_dirs_, and table_dirs_mutex_.

248  {
249  {
250  heavyai::shared_lock<heavyai::shared_mutex> read_lock(table_dirs_mutex_);
251  CHECK(table_dirs_.find({db_id, tb_id}) != table_dirs_.end());
252  }
253  VLOG(2) << "Checkpointing " << describeSelf() << " (" << db_id << ", " << tb_id
254  << ") epoch: " << epoch(db_id, tb_id);
255  writeDirtyBuffers(db_id, tb_id);
256  syncFilesToDisk();
257  writeAndSyncEpochToDisk(db_id, tb_id);
258  incrementEpoch(db_id, tb_id);
259  freePages();
260 }
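This checkpoint runs its steps in a fixed order: writeDirtyBuffers(), syncFilesToDisk(), writeAndSyncEpochToDisk(), incrementEpoch(), freePages(). The following toy records that order. A std::map stands in for the per-table epochs held by table_dirs_. ToyCheckpointer is hypothetical and performs no I/O.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using TablePair = std::pair<int, int>;

// Toy per-table checkpoint that logs each step in the order used by
// CachingFileMgr::checkpoint(db_id, tb_id). The log strings are labels only.
struct ToyCheckpointer {
  std::vector<std::string> log;
  std::map<TablePair, int> epochs;  // stands in for the per-table epoch files

  void checkpoint(int db_id, int tb_id) {
    auto it = epochs.find(TablePair{db_id, tb_id});
    assert(it != epochs.end());  // the table must already be known to the cache
    log.push_back("write dirty buffers");
    log.push_back("sync files to disk");
    log.push_back("persist epoch " + std::to_string(it->second));
    ++it->second;  // increment only after the current epoch is on disk
    log.push_back("free pages");
  }
};
```

In the listing, the epoch value persisted is the one in effect when the dirty buffers were written. The increment happens only after that value is on disk.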
void File_Namespace::CachingFileMgr::clearForTable ( int32_t  db_id,
int32_t  tb_id 
)

Removes all data related to the given table (pages and subdirectories).

Definition at line 175 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::freePages(), removeTableBuffers(), and removeTableFileMgr().

175  {
176  removeTableBuffers(db_id, tb_id);
177  removeTableFileMgr(db_id, tb_id);
178  freePages();
179 }

void File_Namespace::CachingFileMgr::closeRemovePhysical ( )
override virtual

Closes files and removes the caching directory.

Reimplemented from File_Namespace::FileMgr.

Definition at line 185 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::closePhysicalUnlocked(), File_Namespace::FileMgr::files_rw_mutex_, File_Namespace::FileMgr::getFileMgrBasePath(), table_dirs_, and table_dirs_mutex_.

185  {
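186  {
187  heavyai::unique_lock<heavyai::shared_mutex> write_lock(files_rw_mutex_);
188  closePhysicalUnlocked();
189  }
190  {
191  heavyai::unique_lock<heavyai::shared_mutex> write_lock(table_dirs_mutex_);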
192  table_dirs_.clear();
193  }
194  bf::remove_all(getFileMgrBasePath());
195 }

FileBuffer * File_Namespace::CachingFileMgr::createBufferFromHeaders ( const ChunkKey & key,
const std::vector< HeaderInfo >::const_iterator &  startIt,
const std::vector< HeaderInfo >::const_iterator &  endIt 
)
override private virtual

Creates a buffer and initializes it with info read from files on disk.

Reimplemented from File_Namespace::FileMgr.

Definition at line 281 of file CachingFileMgr.cpp.

References get_table_prefix().

Referenced by init().

284  {
285  if (startIt->pageId != -1) {
286  // If the first pageId is not -1 then there is no metadata page for the
287  // current key (which means it was never checkpointed), so we should skip.
288  return nullptr;
289  }
290  touchKey(key);
291  auto [db_id, tb_id] = get_table_prefix(key);
292  createTableFileMgrIfNoneExists(db_id, tb_id);
293  auto buffer = FileMgr::createBufferFromHeaders(key, startIt, endIt);
294  if (buffer->isMissingPages()) {
295  // Detect the case where a page is missing by comparing the amount of pages read
296  // with the metadata size. If data are missing, discard the chunk.
297  buffer->freeChunkPages();
298  }
299  return buffer;
300 }
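createBufferFromHeaders() makes two decisions. It skips a chunk whose first header is not a metadata page, and it frees the data pages of a chunk whose pages are incomplete. The following sketch classifies a chunk the same way. It assumes headers are sorted so that a metadata page (pageId == -1) comes first. ToyHeader, LoadOutcome and classifyChunk are hypothetical, and expected_data_pages stands in for whatever isMissingPages() compares against.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified page header: only the page id matters here.
struct ToyHeader {
  int pageId;  // -1 marks a metadata page; >= 0 marks a data page
};

enum class LoadOutcome { Skip, KeepMetadataOnly, KeepAll };

// Mirrors the two checks in createBufferFromHeaders(). Headers are assumed to
// be sorted so that a metadata page, if present, comes first.
LoadOutcome classifyChunk(const std::vector<ToyHeader>& headers,
                          int expected_data_pages) {
  if (headers.empty() || headers.front().pageId != -1) {
    return LoadOutcome::Skip;  // no metadata page: never checkpointed
  }
  const auto data_pages =
      std::count_if(headers.begin(), headers.end(),
                    [](const ToyHeader& h) { return h.pageId >= 0; });
  // Missing data pages: keep the metadata but free the data (freeChunkPages()).
  return data_pages < expected_data_pages ? LoadOutcome::KeepMetadataOnly
                                          : LoadOutcome::KeepAll;
}
```

KeepMetadataOnly corresponds to the listing's freeChunkPages() branch: the buffer is still returned, so the table's metadata stays complete even when its data is discarded.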

FileBuffer * File_Namespace::CachingFileMgr::createBufferUnlocked ( const ChunkKey & key,
size_t  pageSize = 0,
const size_t  numBytes = 0 
)
override private virtual

Creates a buffer.

Reimplemented from File_Namespace::FileMgr.

Definition at line 272 of file CachingFileMgr.cpp.

References get_table_prefix().

274  {
275  touchKey(key);
276  auto [db_id, tb_id] = get_table_prefix(key);
277  createTableFileMgrIfNoneExists(db_id, tb_id);
278  return FileMgr::createBufferUnlocked(key, page_size, num_bytes);
279 }

void File_Namespace::CachingFileMgr::createTableFileMgrIfNoneExists ( const int32_t  db_id,
const int32_t  tb_id 
)
private

Creates and initializes a subdirectory for a table if none exists.

Definition at line 262 of file CachingFileMgr.cpp.

263  {
264  heavyai::unique_lock<heavyai::shared_mutex> write_lock(table_dirs_mutex_);
265  TablePair table_pair{db_id, tb_id};
266  if (table_dirs_.find(table_pair) == table_dirs_.end()) {
267  table_dirs_.emplace(
268  table_pair, std::make_unique<TableFileMgr>(getTableFileMgrPath(db_id, tb_id)));
269  }
270 }
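createTableFileMgrIfNoneExists() performs a find() followed by emplace() while holding a write lock on table_dirs_mutex_. The following sketch does the same create-if-absent step but swaps in std::map::try_emplace, and uses std::shared_mutex in place of heavyai::shared_mutex. ToyTableDir stands in for TableFileMgr, and the "table_<db>_<tb>" directory naming is invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

using TablePair = std::pair<int, int>;

// Hypothetical stand-in for TableFileMgr: it only remembers its directory.
struct ToyTableDir {
  explicit ToyTableDir(std::string p) : path(std::move(p)) {}
  std::string path;
};

class ToyTableDirs {
 public:
  explicit ToyTableDirs(std::string base) : base_(std::move(base)) {}

  // Creates the entry for (db_id, tb_id) unless one exists; returns true if it
  // was created. try_emplace does a single lookup and leaves the unique_ptr
  // empty, so the object is only allocated for a new key.
  bool createIfNoneExists(int db_id, int tb_id) {
    std::unique_lock<std::shared_mutex> write_lock(mutex_);
    const TablePair key{db_id, tb_id};
    auto [it, inserted] = dirs_.try_emplace(key);
    if (inserted) {
      // Hypothetical directory layout; the real path format is not shown here.
      it->second = std::make_unique<ToyTableDir>(
          base_ + "/table_" + std::to_string(db_id) + "_" + std::to_string(tb_id));
    }
    return inserted;
  }

  std::size_t size() const {
    std::shared_lock<std::shared_mutex> read_lock(mutex_);
    return dirs_.size();
  }

  std::string pathOf(int db_id, int tb_id) const {
    std::shared_lock<std::shared_mutex> read_lock(mutex_);
    return dirs_.at(TablePair{db_id, tb_id})->path;
  }

 private:
  std::string base_;
  mutable std::shared_mutex mutex_;
  std::map<TablePair, std::unique_ptr<ToyTableDir>> dirs_;
};
```

Readers take a shared lock and creation takes an exclusive one, matching the shared_lock/unique_lock split visible in the listings on this page.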
void File_Namespace::CachingFileMgr::deleteBufferIfExists ( const ChunkKey & key)

Deletes a buffer if it exists in the manager; otherwise does nothing.

Definition at line 398 of file CachingFileMgr.cpp.

398  {
400  auto chunk_it = chunkIndex_.find(key);
401  if (chunk_it != chunkIndex_.end()) {
402  deleteBufferUnlocked(chunk_it);
403  }
404 }
ChunkKeyToChunkMap::iterator File_Namespace::CachingFileMgr::deleteBufferUnlocked ( const ChunkKeyToChunkMap::iterator  chunk_it,
const bool  purge = true 
)
override private virtual

Reimplemented from File_Namespace::FileMgr.

Definition at line 711 of file CachingFileMgr.cpp.

713  {
714  removeKey(chunk_it->first);
715  return FileMgr::deleteBufferUnlocked(chunk_it, purge);
716 }
void File_Namespace::CachingFileMgr::deleteCacheIfTooLarge ( )
private

When the cache is read from disk, we don't know which chunks were least recently used. Rather than try to evict random pages to get down to size, we just reset the cache to make sure we have space.

Definition at line 417 of file CachingFileMgr.cpp.

References logger::INFO, LOG, and anonymous_namespace{CachingFileMgr.cpp}::size_of_dir().

Referenced by init().

417  {
420  bf::create_directory(fileMgrBasePath_);
421  LOG(INFO) << "Cache path over limit. Existing cache deleted.";
422  }
423 }

void File_Namespace::CachingFileMgr::deleteWrapperFile ( int32_t  db,
int32_t  tb 
)

Deletes the wrapper file from a table subdir.

Definition at line 650 of file CachingFileMgr.cpp.

References CHECK.

650  {
651  heavyai::shared_lock<heavyai::shared_mutex> read_lock(table_dirs_mutex_);
652  auto it = table_dirs_.find({db, tb});
653  CHECK(it != table_dirs_.end());
654  it->second->deleteWrapperFile();
655 }
std::string File_Namespace::CachingFileMgr::describeSelf ( ) const
override virtual

Describes this FileMgr for logging purposes.

Reimplemented from File_Namespace::FileMgr.

Definition at line 243 of file CachingFileMgr.cpp.

243  {
244  return "cache";
245 }
std::string File_Namespace::CachingFileMgr::dump ( ) const

Definition at line 60 of file CachingFileMgr.cpp.

References chunk_evict_alg_, File_Namespace::FileMgr::chunkIndex_, LRUEvictionAlgorithm::dumpEvictionQueue(), show_chunk(), and table_evict_alg_.

60  {
61  std::stringstream ss;
62  ss << "Dump Cache:\n";
63  for (const auto& [key, buf] : chunkIndex_) {
64  ss << " " << show_chunk(key) << " num_pages: " << buf->pageCount()
65  << ", is dirty: " << buf->isDirty() << "\n";
66  }
67  ss << "Data Eviction Queue:\n" << chunk_evict_alg_.dumpEvictionQueue();
68  ss << "Metadata Eviction Queue:\n" << table_evict_alg_.dumpEvictionQueue();
69  ss << "\n";
70  return ss.str();
71 }

std::string File_Namespace::CachingFileMgr::dumpEvictionQueue ( ) const
inline

Definition at line 367 of file CachingFileMgr.h.

References chunk_evict_alg_, and LRUEvictionAlgorithm::dumpEvictionQueue().


std::string File_Namespace::CachingFileMgr::dumpKeysWithChunkData ( ) const

Definition at line 633 of file CachingFileMgr.cpp.

References show_chunk().

633  {
634  heavyai::shared_lock<heavyai::shared_mutex> read_lock(chunkIndexMutex_);
635  std::string ret_string = "CFM keys with chunk data:\n";
636  for (const auto& [key, buf] : chunkIndex_) {
637  if (buf->hasDataPages()) {
638  ret_string += " " + show_chunk(key) + "\n";
639  }
640  }
641  return ret_string;
642 }

std::string File_Namespace::CachingFileMgr::dumpKeysWithMetadata ( ) const

Definition at line 622 of file CachingFileMgr.cpp.

References show_chunk().

622  {
624  std::string ret_string = "CFM keys with metadata:\n";
625  for (const auto& [key, buf] : chunkIndex_) {
626  if (buf->hasEncoder()) {
627  ret_string += " " + show_chunk(key) + "\n";
628  }
629  }
630  return ret_string;
631 }

std::string File_Namespace::CachingFileMgr::dumpTableQueue ( ) const
inline

Definition at line 366 of file CachingFileMgr.h.

References LRUEvictionAlgorithm::dumpEvictionQueue(), and table_evict_alg_.

int32_t File_Namespace::CachingFileMgr::epoch ( int32_t  db_id,
int32_t  tb_id 
) const
override virtual

Obtains the epoch version for the given table.

Reimplemented from File_Namespace::FileMgr.

Definition at line 143 of file CachingFileMgr.cpp.

References Epoch::min_allowable_epoch(), table_dirs_, and table_dirs_mutex_.

143  {
145  auto tables_it = table_dirs_.find({db_id, tb_id});
146  if (tables_it == table_dirs_.end()) {
147  // If there is no directory for this table, that means the cache does not recognize
148  // the table that is requested. This can happen if a table was dropped, and its
149  // pages were invalidated but not yet freed and then the server crashed before they
150  // were freed. Upon re-starting the FileMgr will find these pages and attempt to
151  // compare their epoch to know if they are valid or not. In this case we should
152  // return an invalid epoch to indicate that any page for this table is not valid and
153  // should be freed.
155  }
156  auto& [pair, table_dir] = *tables_it;
157  return table_dir->getEpoch();
158 }
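
epoch() looks the table up in table_dirs_ under a read lock. When the table is unknown, it returns an epoch that marks any leftover page as invalid. The listing above omits the exact return expression, so the sketch below uses a hypothetical kInvalidEpoch constant. EpochRegistry is likewise illustrative. hasWrapperFile() and getTableFileMgrSpaceReserved() use the same find-or-default lookup.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <utility>

// Hypothetical sentinel. The real code derives its "invalid" value from
// Epoch::min_allowable_epoch(); that expression is not shown in the listing.
constexpr int32_t kInvalidEpoch = -1;

using TablePair = std::pair<int32_t, int32_t>;

class EpochRegistry {
 public:
  void setEpoch(int32_t db_id, int32_t tb_id, int32_t epoch) {
    std::unique_lock<std::shared_mutex> write_lock(mutex_);
    epochs_[{db_id, tb_id}] = epoch;
  }

  // An unknown table yields an epoch that no page can match, so pages left
  // behind by a dropped table are treated as invalid and can be freed.
  int32_t epoch(int32_t db_id, int32_t tb_id) const {
    std::shared_lock<std::shared_mutex> read_lock(mutex_);
    auto it = epochs_.find({db_id, tb_id});
    return it == epochs_.end() ? kInvalidEpoch : it->second;
  }

 private:
  mutable std::shared_mutex mutex_;
  std::map<TablePair, int32_t> epochs_;
};
```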

FileInfo * File_Namespace::CachingFileMgr::evictMetadataPages ( )
private

Evicts all metadata pages for the least recently used table. Returns the first FileInfo that a page was evicted from (which is guaranteed to now have at least one free page).

Definition at line 477 of file CachingFileMgr.cpp.

References CHECK, anonymous_namespace{CachingFileMgr.cpp}::evict_chunk_or_fail(), and get_table_prefix().

477  {
478  // Locks should already be in place before calling this method.
479  FileInfo* file_info{nullptr};
480  auto key_to_evict = evict_chunk_or_fail(table_evict_alg_);
481  auto [db_id, tb_id] = get_table_prefix(key_to_evict);
482  const auto keys = getKeysForTable(db_id, tb_id);
483  for (const auto& key : keys) {
484  auto chunk_it = chunkIndex_.find(key);
485  CHECK(chunk_it != chunkIndex_.end());
486  auto& buf = chunk_it->second;
487  if (!file_info) {
488  // Return the FileInfo for the first file we are freeing a page from so that the
489  // caller does not have to search for a FileInfo guaranteed to have at least one
490  // free page.
491  CHECK(buf->getMetadataPage().pageVersions.size() > 0);
492  file_info =
493  getFileInfoForFileId(buf->getMetadataPage().pageVersions.front().page.fileId);
494  }
495  // We erase all pages and entries for the chunk, as without metadata all other
496  // entries are useless.
497  deleteBufferUnlocked(chunk_it);
498  }
499  // Serialized datawrappers require metadata to be in the cache.
500  deleteWrapperFile(db_id, tb_id);
501  CHECK(file_info) << "FileInfo with freed page not found";
502  return file_info;
503 }
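
The CFM keeps two LRU queues: chunk_evict_alg_, keyed by chunk, and table_evict_alg_, keyed by table prefix. evictMetadataPages() pops the least recently used table and deletes every buffer for it. getBufferUnlocked() and the prefix queries call touchKey() to refresh recency. The sketch below is a minimal LRU queue in the spirit of LRUEvictionAlgorithm. The class SimpleLru and its method names are hypothetical and do not reproduce that class's interface.

```cpp
#include <cstddef>
#include <list>
#include <map>
#include <optional>
#include <vector>

using ChunkKey = std::vector<int>;

// Minimal LRU queue: front = most recently used, back = next to evict.
class SimpleLru {
 public:
  // Marks key as most recently used, inserting it if absent.
  void touch(const ChunkKey& key) {
    auto it = pos_.find(key);
    if (it != pos_.end()) {
      order_.erase(it->second);
    }
    order_.push_front(key);
    pos_[key] = order_.begin();
  }

  // Removes and returns the least recently used key, if any.
  std::optional<ChunkKey> evictNext() {
    if (order_.empty()) {
      return std::nullopt;
    }
    ChunkKey victim = order_.back();
    order_.pop_back();
    pos_.erase(victim);
    return victim;
  }

  // Drops key from the queue (e.g. once a table has no keys left).
  void remove(const ChunkKey& key) {
    auto it = pos_.find(key);
    if (it != pos_.end()) {
      order_.erase(it->second);
      pos_.erase(it);
    }
  }

  std::size_t size() const { return order_.size(); }

 private:
  std::list<ChunkKey> order_;
  std::map<ChunkKey, std::list<ChunkKey>::iterator> pos_;
};
```

The list fixes eviction order. The map from key to list position makes touch, evict and remove O(log n) instead of a linear search.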

FileInfo * File_Namespace::CachingFileMgr::evictPages ( )
private

Evicts all data pages for the least recently used chunk (metadata pages persist). Returns the first FileInfo that a page was evicted from (which is guaranteed to now have at least one free page).

Definition at line 505 of file CachingFileMgr.cpp.

References CHECK, and anonymous_namespace{CachingFileMgr.cpp}::evict_chunk_or_fail().

505  {
506  FileInfo* file_info{nullptr};
507  FileBuffer* buf{nullptr};
508  while (!file_info) {
510  CHECK(buf);
511  if (!buf->hasDataPages()) {
512  // This buffer contains no chunk data (metadata only, uninitialized, size == 0,
513  // etc...) so we won't recover any space by evicting it. In this case it gets
514  // removed from the eviction queue (it will get re-added if it gets populated with
515  // data) and we look at the next chunk in queue until we find a buffer with page
516  // data.
517  continue;
518  }
519  // Return the FileInfo for the first file we are freeing a page from so that the
520  // caller does not have to search for a FileInfo guaranteed to have at least one free
521  // page.
522  CHECK(buf->getMultiPage().front().pageVersions.size() > 0);
523  file_info = getFileInfoForFileId(
524  buf->getMultiPage().front().pageVersions.front().page.fileId);
525  }
526  auto pages_freed = buf->freeChunkPages();
527  CHECK(pages_freed > 0) << "failed to evict a page";
528  CHECK(file_info) << "FileInfo with freed page not found";
529  return file_info;
530 }
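
evictPages() keeps pulling keys from the chunk queue until it finds a buffer that actually holds data pages. Metadata-only or empty buffers are simply dropped from the queue, because evicting them would free nothing. The sketch below models that loop with a std::deque ordered least recently used first. MockBuffer and evictFirstWithData() are hypothetical, and assert stands in for CHECK.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <optional>
#include <vector>

using ChunkKey = std::vector<int>;

// Hypothetical buffer: tracks only how many data pages it holds.
struct MockBuffer {
  int data_pages = 0;
};

// Pops keys (front = least recently used) until one refers to a buffer with
// data pages, frees those pages, and returns the key. Keys whose buffers hold
// no data are discarded, like the `continue` branch in evictPages().
std::optional<ChunkKey> evictFirstWithData(std::deque<ChunkKey>& lru_queue,
                                           std::map<ChunkKey, MockBuffer>& index) {
  while (!lru_queue.empty()) {
    ChunkKey key = lru_queue.front();
    lru_queue.pop_front();
    auto it = index.find(key);
    assert(it != index.end());  // stands in for CHECK(buf)
    if (it->second.data_pages == 0) {
      continue;  // nothing to reclaim; try the next candidate
    }
    it->second.data_pages = 0;  // like freeChunkPages(): the entry itself survives
    return key;
  }
  return std::nullopt;  // the name evict_chunk_or_fail suggests the real code fails here
}
```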

bool File_Namespace::CachingFileMgr::failOnReadError ( ) const
inline override virtual

True if a read error should cause a fatal error.

Reimplemented from File_Namespace::FileMgr.

Definition at line 292 of file CachingFileMgr.h.

292 { return false; }
void File_Namespace::CachingFileMgr::free_page ( std::pair< FileInfo *, int32_t > &&  page)
override virtual

Unlike the FileMgr, the CFM frees pages immediately instead of holding them until the next checkpoint.

Reimplemented from File_Namespace::FileMgr.

Definition at line 733 of file CachingFileMgr.cpp.

733  {
734  page.first->freePageDeferred(page.second);
735 }
size_t File_Namespace::CachingFileMgr::getAllocated ( )
inline override

Definition at line 212 of file CachingFileMgr.h.

References getFilesSize(), and getTableFileMgrsSize().

Referenced by getAvailableSpace().

212  {
213  return getFilesSize() + getTableFileMgrsSize();
214  }

size_t File_Namespace::CachingFileMgr::getAvailableSpace ( )
inline

Definition at line 208 of file CachingFileMgr.h.

References getAllocated(), and max_size_.
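
getAllocated() is the sum of getFilesSize() and getTableFileMgrsSize(). The first covers the data and metadata page files, including allocated but unused space; the second covers the table subdirectories. getAvailableSpace() is computed from getAllocated() and max_size_. Its body is not shown here, so the following is a hypothetical sketch of that arithmetic. The saturation at zero is an assumption, not documented behavior.

```cpp
#include <cstddef>

// allocated = page files (data + metadata) + table subdirectory files
std::size_t allocatedBytes(std::size_t files_size, std::size_t table_dirs_size) {
  return files_size + table_dirs_size;
}

// available = max_size - allocated, clamped at zero (assumed, see above).
std::size_t availableBytes(std::size_t max_size,
                           std::size_t files_size,
                           std::size_t table_dirs_size) {
  const std::size_t allocated = allocatedBytes(files_size, table_dirs_size);
  return allocated >= max_size ? 0 : max_size - allocated;
}
```

For example, with a 1000-byte limit, 700 bytes of page files and 50 bytes of subdirectory files, 750 bytes are allocated and 250 remain.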

size_t File_Namespace::CachingFileMgr::getAvailableWrapperSpace ( )
inline

Definition at line 209 of file CachingFileMgr.h.

References getTableFileMgrsSize(), and max_wrapper_space_.

209  {
211  }

std::optional< FileBuffer * > File_Namespace::CachingFileMgr::getBufferIfExists ( const ChunkKey &  key)

An optional-returning version of getBuffer(), for use when it is not certain that the chunk exists.

Definition at line 702 of file CachingFileMgr.cpp.

702  {
704  auto chunk_it = chunkIndex_.find(key);
705  if (chunk_it == chunkIndex_.end()) {
706  return {};
707  }
708  return getBufferUnlocked(key);
709 }
FileBuffer * File_Namespace::CachingFileMgr::getBufferUnlocked ( const ChunkKey &  key,
const size_t  numBytes = 0 
) const
override private virtual

Reimplemented from File_Namespace::FileMgr.

Definition at line 727 of file CachingFileMgr.cpp.

728  {
729  touchKey(key);
730  return FileMgr::getBufferUnlocked(key, num_bytes);
731 }
std::vector< ChunkKey > File_Namespace::CachingFileMgr::getChunkKeysForPrefix ( const ChunkKey &  prefix) const

Returns the keys for chunks with chunk data that match the given prefix.

Definition at line 584 of file CachingFileMgr.cpp.

References in_same_table().

585  {
587  std::vector<ChunkKey> chunks;
588  for (auto [key, buf] : chunkIndex_) {
589  if (in_same_table(key, prefix)) {
590  if (buf->hasDataPages()) {
591  chunks.emplace_back(key);
592  touchKey(key);
593  }
594  }
595  }
596  return chunks;
597 }

void File_Namespace::CachingFileMgr::getChunkMetadataVecForKeyPrefix ( ChunkMetadataVector &  chunkMetadataVec,
const ChunkKey &  keyPrefix 
)
override

Definition at line 718 of file CachingFileMgr.cpp.

720  {
721  FileMgr::getChunkMetadataVecForKeyPrefix(chunkMetadataVec, keyPrefix);
722  for (const auto& [key, meta] : chunkMetadataVec) {
723  touchKey(key);
724  }
725 }
size_t File_Namespace::CachingFileMgr::getChunkSpaceReservedByTable ( int32_t  db_id,
int32_t  tb_id 
) const

One of a set of functions that determine how much space a table reserves, broken down by type (chunk data, metadata, and table subdirectory files).

Definition at line 197 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::chunkIndex_, File_Namespace::FileMgr::chunkIndexMutex_, and File_Namespace::FileMgr::page_size_.

Referenced by getSpaceReservedByTable().

197  {
199  size_t space_used = 0;
200  ChunkKey min_table_key{db_id, tb_id};
201  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
202  for (auto it = chunkIndex_.lower_bound(min_table_key);
203  it != chunkIndex_.upper_bound(max_table_key);
204  ++it) {
205  auto& [key, buffer] = *it;
206  space_used += (buffer->numChunkPages() * page_size_);
207  }
208  return space_used;
209 }
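
Several functions select a table's chunks with the same range scan: getChunkSpaceReservedByTable(), getMetadataSpaceReservedByTable(), getKeysForTable(), removeKey() and removeTableBuffers(). ChunkKey is a std::vector<int>, and vectors compare lexicographically. So every key with prefix {db_id, tb_id} lies between lower_bound({db_id, tb_id}) and upper_bound({db_id, tb_id, INT32_MAX}). The sketch reproduces that scan over a plain std::map. MockBuffer and chunkSpaceForTable() are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <vector>

using ChunkKey = std::vector<int>;

// Hypothetical buffer holding only its data page count.
struct MockBuffer {
  std::size_t num_chunk_pages = 0;
};

// Sums page space for every key that starts with {db_id, tb_id}. A key such as
// {db_id, tb_id, INT32_MAX, 0} sorts after the upper key and is excluded;
// the listings above use the same bounds.
std::size_t chunkSpaceForTable(const std::map<ChunkKey, MockBuffer>& index,
                               int db_id, int tb_id, std::size_t page_size) {
  const ChunkKey min_key{db_id, tb_id};
  const ChunkKey max_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
  std::size_t space = 0;
  const auto end = index.upper_bound(max_key);  // evaluated once
  for (auto it = index.lower_bound(min_key); it != end; ++it) {
    space += it->second.num_chunk_pages * page_size;
  }
  return space;
}
```

Unlike the listing, the sketch computes upper_bound once rather than on every iteration. This is safe because nothing is erased during the walk.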

size_t File_Namespace::CachingFileMgr::getDataFileSize ( ) const
inline
size_t File_Namespace::CachingFileMgr::getFilesSize ( ) const

Get the total size of page files (data and metadata files). This includes allocated, but unused space.

Definition at line 556 of file CachingFileMgr.cpp.

Referenced by getAllocated().

556  {
558  size_t sum = 0;
559  for (auto [id, file] : files_) {
560  sum += file->size();
561  }
562  return sum;
563 }

std::vector< ChunkKey > File_Namespace::CachingFileMgr::getKeysForTable ( int32_t  db_id,
int32_t  tb_id 
) const
private

Returns the keys in chunkIndex_ that match the given table prefix.

Definition at line 464 of file CachingFileMgr.cpp.

465  {
466  std::vector<ChunkKey> keys;
467  ChunkKey min_table_key{db_id, tb_id};
468  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
469  for (auto it = chunkIndex_.lower_bound(min_table_key);
470  it != chunkIndex_.upper_bound(max_table_key);
471  ++it) {
472  keys.emplace_back(it->first);
473  }
474  return keys;
475 }
std::set< ChunkKey > File_Namespace::CachingFileMgr::getKeysWithMetadata ( ) const

Definition at line 737 of file CachingFileMgr.cpp.

737  {
739  std::set<ChunkKey> ret;
740  for (const auto& [key, buf] : chunkIndex_) {
741  if (buf->hasEncoder()) {
742  ret.emplace(key);
743  }
744  }
745  return ret;
746 }
size_t File_Namespace::CachingFileMgr::getMaxDataFiles ( ) const
inline

Definition at line 198 of file CachingFileMgr.h.

References max_num_data_files_.

size_t File_Namespace::CachingFileMgr::getMaxDataFilesSize ( ) const

Definition at line 748 of file CachingFileMgr.cpp.

748  {
749  if (limit_data_size_) {
750  return *limit_data_size_;
751  }
752  return getMaxDataFiles() * getDataFileSize();
753 }
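
getMaxDataFilesSize() prefers the optional limit_data_size_. Otherwise it falls back to getMaxDataFiles() * getDataFileSize(). Here is the same rule as a free function, with hypothetical parameter names standing in for the members:

```cpp
#include <cstddef>
#include <optional>

// An explicit limit wins; otherwise the cap is (max data files) * (data file size).
std::size_t maxDataFilesSize(const std::optional<std::size_t>& limit_data_size,
                             std::size_t max_data_files,
                             std::size_t data_file_size) {
  return limit_data_size.value_or(max_data_files * data_file_size);
}
```

With no explicit limit, 4 data files of 256 bytes each (illustrative values) give a cap of 1024 bytes.
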
size_t File_Namespace::CachingFileMgr::getMaxMetaFiles ( ) const
inline

Definition at line 199 of file CachingFileMgr.h.

References max_num_meta_files_.

size_t File_Namespace::CachingFileMgr::getMaxSize ( )
inline override

Definition at line 197 of file CachingFileMgr.h.

References max_size_.

197 { return max_size_; }
size_t File_Namespace::CachingFileMgr::getMaxWrapperSize ( ) const
inline

Definition at line 200 of file CachingFileMgr.h.

References max_wrapper_space_.

size_t File_Namespace::CachingFileMgr::getMetadataFileSize ( ) const
inline
size_t File_Namespace::CachingFileMgr::getMetadataSpaceReservedByTable ( int32_t  db_id,
int32_t  tb_id 
) const

Definition at line 211 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::chunkIndex_, File_Namespace::FileMgr::chunkIndexMutex_, and File_Namespace::FileMgr::metadata_page_size_.

Referenced by getSpaceReservedByTable().

212  {
214  size_t space_used = 0;
215  ChunkKey min_table_key{db_id, tb_id};
216  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
217  for (auto it = chunkIndex_.lower_bound(min_table_key);
218  it != chunkIndex_.upper_bound(max_table_key);
219  ++it) {
220  auto& [key, buffer] = *it;
221  space_used += (buffer->numMetadataPages() * metadata_page_size_);
222  }
223  return space_used;
224 }

MgrType File_Namespace::CachingFileMgr::getMgrType ( )
inline override

Definition at line 194 of file CachingFileMgr.h.

194 { return CACHING_FILE_MGR; };
static size_t File_Namespace::CachingFileMgr::getMinimumSize ( )
inline static

Definition at line 182 of file CachingFileMgr.h.

References DEFAULT_METADATA_PAGE_SIZE, File_Namespace::FileMgr::DEFAULT_NUM_PAGES_PER_METADATA_FILE, and METADATA_FILE_SPACE_PERCENTAGE.

Referenced by CommandLineOptions::validate().

182  {
183  // Currently the minimum default size is based on the metadata file size and
184  // percentage usage.
187  }

size_t File_Namespace::CachingFileMgr::getNumChunksWithMetadata ( ) const

Returns the number of buffers with metadata in the CFM. Any buffer with an encoder counts.

Definition at line 611 of file CachingFileMgr.cpp.

611  {
613  size_t sum = 0;
614  for (const auto& [key, buf] : chunkIndex_) {
615  if (buf->hasEncoder()) {
616  sum++;
617  }
618  }
619  return sum;
620 }
size_t File_Namespace::CachingFileMgr::getNumDataChunks ( ) const

Returns the number of buffers with chunk data in the CFM.

Definition at line 406 of file CachingFileMgr.cpp.

406  {
408  size_t num_chunks = 0;
409  for (auto [key, buf] : chunkIndex_) {
410  if (buf->hasDataPages()) {
411  num_chunks++;
412  }
413  }
414  return num_chunks;
415 }
size_t File_Namespace::CachingFileMgr::getNumDataFiles ( ) const

Definition at line 574 of file CachingFileMgr.cpp.

574  {
576  return fileIndex_.count(page_size_);
577 }
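
getNumDataFiles() and getNumMetaFiles() count entries in fileIndex_, of type PageSizeFileMMap. Judging by its name and the count() calls, this is a multimap keyed by page size. Data files (page_size_) and metadata files (metadata_page_size_) are therefore told apart purely by page size. The following is a hypothetical sketch with a std::multimap from page size to file name. The sizes and names used in the check are illustrative only.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for fileIndex_: page size -> file name.
using PageSizeFileIndex = std::multimap<std::size_t, std::string>;

// count() returns how many files were created with the given page size.
std::size_t countFilesWithPageSize(const PageSizeFileIndex& index,
                                   std::size_t page_size) {
  return index.count(page_size);
}
```
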
size_t File_Namespace::CachingFileMgr::getNumMetaFiles ( ) const

Definition at line 579 of file CachingFileMgr.cpp.

579  {
581  return fileIndex_.count(metadata_page_size_);
582 }
size_t File_Namespace::CachingFileMgr::getPageSize ( )
inline

Definition at line 196 of file CachingFileMgr.h.

References File_Namespace::FileMgr::page_size_.

196 { return page_size_; }
size_t File_Namespace::CachingFileMgr::getSpaceReservedByTable ( int32_t  db_id,
int32_t  tb_id 
) const

Definition at line 236 of file CachingFileMgr.cpp.

References getChunkSpaceReservedByTable(), getMetadataSpaceReservedByTable(), and getTableFileMgrSpaceReserved().

236  {
237  auto chunk_space = getChunkSpaceReservedByTable(db_id, tb_id);
238  auto meta_space = getMetadataSpaceReservedByTable(db_id, tb_id);
239  auto subdir_space = getTableFileMgrSpaceReserved(db_id, tb_id);
240  return chunk_space + meta_space + subdir_space;
241 }

std::string File_Namespace::CachingFileMgr::getStringMgrType ( )
inline override

Definition at line 195 of file CachingFileMgr.h.

195 { return ToString(CACHING_FILE_MGR); }
std::string File_Namespace::CachingFileMgr::getTableFileMgrPath ( int32_t  db,
int32_t  tb 
) const

Definition at line 181 of file CachingFileMgr.cpp.

References File_Namespace::get_dir_name_for_table(), and File_Namespace::FileMgr::getFileMgrBasePath().

181  {
182  return getFileMgrBasePath() + "/" + get_dir_name_for_table(db_id, tb_id);
183 }

size_t File_Namespace::CachingFileMgr::getTableFileMgrSpaceReserved ( int32_t  db_id,
int32_t  tb_id 
) const

Definition at line 226 of file CachingFileMgr.cpp.

References table_dirs_, and table_dirs_mutex_.

Referenced by getSpaceReservedByTable().

226  {
228  size_t space = 0;
229  auto table_it = table_dirs_.find({db_id, tb_id});
230  if (table_it != table_dirs_.end()) {
231  space += table_it->second->getReservedSpace();
232  }
233  return space;
234 }

size_t File_Namespace::CachingFileMgr::getTableFileMgrsSize ( ) const

Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirectory for serialized data wrappers and epoch files.

Definition at line 565 of file CachingFileMgr.cpp.

Referenced by getAllocated(), and getAvailableWrapperSpace().

565  {
567  size_t space_used = 0;
568  for (const auto& [pair, table_dir] : table_dirs_) {
569  space_used += table_dir->getReservedSpace();
570  }
571  return space_used;
572 }

bool File_Namespace::CachingFileMgr::hasFileMgrKey ( ) const
inline override virtual

Query to determine if the contained pages will have their database and table ids overridden by the filemgr key (FileMgr does this).

Reimplemented from File_Namespace::FileMgr.

Definition at line 231 of file CachingFileMgr.h.

231 { return false; }
bool File_Namespace::CachingFileMgr::hasWrapperFile ( int32_t  db_id,
int32_t  table_id 
) const

Checks whether the data wrapper file for the given table has been written to disk (i.e., cached).

Definition at line 669 of file CachingFileMgr.cpp.

669  {
671  auto it = table_dirs_.find({db_id, table_id});
672  if (it != table_dirs_.end()) {
673  return it->second->hasWrapperFile();
674  }
675  return false;
676 }
void File_Namespace::CachingFileMgr::incrementAllEpochs ( )
private

Increment epochs for each table in the CFM.

Definition at line 320 of file CachingFileMgr.cpp.

Referenced by init().

320  {
322  for (auto& table_dir : table_dirs_) {
323  table_dir.second->incrementEpoch();
324  }
325 }

void File_Namespace::CachingFileMgr::incrementEpoch ( int32_t  db_id,
int32_t  tb_id 
)
private

Increments epoch for the given table.

Definition at line 160 of file CachingFileMgr.cpp.

References CHECK, table_dirs_, and table_dirs_mutex_.

160  {
162  auto tables_it = table_dirs_.find({db_id, tb_id});
163  CHECK(tables_it != table_dirs_.end());
164  auto& [pair, table_dir] = *tables_it;
165  table_dir->incrementEpoch();
166 }
void File_Namespace::CachingFileMgr::init ( const size_t  num_reader_threads)
private

Initializes a CFM, parsing any existing files and initializing data structures appropriately (currently not thread-safe).

Definition at line 85 of file CachingFileMgr.cpp.

References createBufferFromHeaders(), deleteCacheIfTooLarge(), File_Namespace::FileMgr::freePages(), incrementAllEpochs(), File_Namespace::FileMgr::initializeNumThreads(), File_Namespace::FileMgr::isFullyInitted_, File_Namespace::FileMgr::nextFileId_, File_Namespace::FileMgr::openFiles(), readTableFileMgrs(), gpu_enabled::sort(), and VLOG.

Referenced by CachingFileMgr().

85  {
88  auto open_files_result = openFiles();
89  /* Sort headerVec so that all HeaderInfos
90  * from a chunk will be grouped together
91  * and in order of increasing PageId
92  * - Version Epoch */
93  auto& header_vec = open_files_result.header_infos;
94  std::sort(header_vec.begin(), header_vec.end());
95 
96  /* Goal of next section is to find sequences in the
97  * sorted headerVec of the same ChunkId, which we
98  * can then initiate a FileBuffer with */
99  VLOG(3) << "Number of Headers in Vector: " << header_vec.size();
100  if (header_vec.size() > 0) {
101  auto startIt = header_vec.begin();
102  ChunkKey lastChunkKey = startIt->chunkKey;
103  for (auto it = header_vec.begin() + 1; it != header_vec.end(); ++it) {
104  if (it->chunkKey != lastChunkKey) {
105  createBufferFromHeaders(lastChunkKey, startIt, it);
106  lastChunkKey = it->chunkKey;
107  startIt = it;
108  }
109  }
110  createBufferFromHeaders(lastChunkKey, startIt, header_vec.end());
111  }
112  nextFileId_ = open_files_result.max_file_id + 1;
114  freePages();
115  initializeNumThreads(num_reader_threads);
116  isFullyInitted_ = true;
117 }
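
init() sorts the HeaderInfo records returned by openFiles() so that all headers for one chunk are adjacent, ordered by page ID and then version epoch. It then calls createBufferFromHeaders() once per run of equal chunk keys. The sketch below isolates that grouping step. Header is a hypothetical, trimmed-down HeaderInfo, and the sort order (chunk key, page ID, epoch) is assumed from the comment in the listing.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

using ChunkKey = std::vector<int>;

// Hypothetical, trimmed-down HeaderInfo with just the fields used for sorting.
struct Header {
  ChunkKey chunk_key;
  int page_id;
  int version_epoch;
  bool operator<(const Header& other) const {
    return std::tie(chunk_key, page_id, version_epoch) <
           std::tie(other.chunk_key, other.page_id, other.version_epoch);
  }
};

// Sorts headers, then calls on_run(key, begin, end) once per maximal run of
// headers sharing a chunk key, as init() does before createBufferFromHeaders().
template <typename OnRun>
void forEachChunkRun(std::vector<Header>& headers, OnRun on_run) {
  std::sort(headers.begin(), headers.end());
  if (headers.empty()) {
    return;
  }
  auto start = headers.begin();
  for (auto it = headers.begin() + 1; it != headers.end(); ++it) {
    if (it->chunk_key != start->chunk_key) {
      on_run(start->chunk_key, start, it);
      start = it;
    }
  }
  on_run(start->chunk_key, start, headers.end());  // the final run
}
```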

FileBuffer * File_Namespace::CachingFileMgr::putBuffer ( const ChunkKey &  key,
AbstractBuffer *  src_buffer,
const size_t  num_bytes = 0 
)
override

Deletes any existing buffer for the given key, then copies in a new one.

putBuffer() needs to behave differently from FileMgr::putBuffer(): it deletes the existing buffer first and then appends, rather than overwriting it. This way only a single version of the buffer is stored, instead of accumulating versions that need to be rolled off.

Definition at line 308 of file CachingFileMgr.cpp.

References CHECK, Data_Namespace::AbstractBuffer::isDirty(), Data_Namespace::AbstractBuffer::setAppended(), Data_Namespace::AbstractBuffer::setDirty(), and Data_Namespace::AbstractBuffer::size().

310  {
311  CHECK(!src_buffer->isDirty()) << "Cannot cache dirty buffers.";
313  // Since the buffer is not dirty we mark it as dirty if we are only writing metadata and
314  // appended if we are writing chunk data. We delete + append rather than write to make
315  // sure we don't write multiple page versions.
316  (src_buffer->size() == 0) ? src_buffer->setDirty() : src_buffer->setAppended();
317  return FileMgr::putBuffer(key, src_buffer, num_bytes);
318 }
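
The delete-then-append approach in putBuffer() means the cache never holds more than one version of a chunk, whereas FileMgr can accumulate versions that later have to be rolled off. The toy store below contrasts the two policies. VersionedStore and both put methods are hypothetical and model only the version count, not pages or epochs.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using ChunkKey = std::vector<int>;

// Hypothetical store that keeps every written version of each key.
class VersionedStore {
 public:
  // Accumulating put: each write adds another version.
  void putAccumulating(const ChunkKey& key, const std::string& data) {
    versions_[key].push_back(data);
  }

  // CFM-style put: drop whatever exists for key (analogous to
  // deleteBufferIfExists()), then append a single fresh copy.
  void putReplacing(const ChunkKey& key, const std::string& data) {
    versions_.erase(key);
    versions_[key].push_back(data);
  }

  std::size_t versionCount(const ChunkKey& key) const {
    auto it = versions_.find(key);
    return it == versions_.end() ? 0 : it->second.size();
  }

 private:
  std::map<ChunkKey, std::vector<std::string>> versions_;
};
```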

void File_Namespace::CachingFileMgr::readTableFileMgrs ( )
private

Checks for any sub-directories containing table-specific data and creates epochs from found files.

Definition at line 119 of file CachingFileMgr.cpp.

References CHECK, File_Namespace::FileMgr::fileMgrBasePath_, table_dirs_, and table_dirs_mutex_.

Referenced by init().

119  {
121  bf::path path(fileMgrBasePath_);
122  CHECK(bf::exists(path)) << "Cache path: " << fileMgrBasePath_ << " does not exist.";
123  CHECK(bf::is_directory(path))
124  << "Specified path '" << fileMgrBasePath_ << "' for disk cache is not a directory.";
125 
126  // Look for directories with table-specific names.
127  boost::regex table_filter("table_([0-9]+)_([0-9]+)");
128  for (const auto& file : bf::directory_iterator(path)) {
129  boost::smatch match;
130  auto file_name = file.path().filename().string();
131  if (boost::regex_match(file_name, match, table_filter)) {
132  int32_t db_id = std::stoi(match[1]);
133  int32_t tb_id = std::stoi(match[2]);
134  TablePair table_pair{db_id, tb_id};
135  CHECK(table_dirs_.find(table_pair) == table_dirs_.end())
136  << "Trying to read data for existing table";
137  table_dirs_.emplace(table_pair,
138  std::make_unique<TableFileMgr>(file.path().string()));
139  }
140  }
141 }
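
readTableFileMgrs() recognizes table subdirectories by matching their names against table_([0-9]+)_([0-9]+). getTableFileMgrPath() builds those paths with get_dir_name_for_table(). The sketch below assumes that helper produces the same table_<db_id>_<tb_id> form; this is inferred from the regex and not shown in the listing. It uses std::regex_match in place of boost::regex_match, and both functions are hypothetical.

```cpp
#include <cstdint>
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Assumed naming scheme, inferred from the regex in readTableFileMgrs().
std::string dirNameForTable(int32_t db_id, int32_t tb_id) {
  return "table_" + std::to_string(db_id) + "_" + std::to_string(tb_id);
}

// Parses a directory name back into {db_id, tb_id}, or nullopt if it does not
// match. regex_match requires the whole name to match the pattern.
std::optional<std::pair<int32_t, int32_t>> parseTableDirName(const std::string& name) {
  static const std::regex table_filter("table_([0-9]+)_([0-9]+)");
  std::smatch match;
  if (!std::regex_match(name, match, table_filter)) {
    return std::nullopt;
  }
  return std::make_pair(static_cast<int32_t>(std::stoi(match[1].str())),
                        static_cast<int32_t>(std::stoi(match[2].str())));
}
```

Because the match is anchored to the whole name, an entry such as table_1_42.tmp is ignored.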

std::unique_ptr< CachingFileMgr > File_Namespace::CachingFileMgr::reconstruct ( ) const

Initializes a new CFM using the initialization values in the current CFM.

Definition at line 644 of file CachingFileMgr.cpp.

644  {
645  DiskCacheConfig config{
647  return std::make_unique<CachingFileMgr>(config);
648 }
void File_Namespace::CachingFileMgr::removeChunkKeepMetadata ( const ChunkKey &  key)

Free pages for chunk and remove it from the chunk eviction algorithm.

Definition at line 599 of file CachingFileMgr.cpp.

References CHECK.

599  {
600  if (isBufferOnDevice(key)) {
601  auto chunkIt = chunkIndex_.find(key);
602  CHECK(chunkIt != chunkIndex_.end());
603  auto& buf = chunkIt->second;
604  if (buf->hasDataPages()) {
605  buf->freeChunkPages();
607  }
608  }
609 }
void File_Namespace::CachingFileMgr::removeKey ( const ChunkKey key) const
private

Definition at line 537 of file CachingFileMgr.cpp.

References get_table_prefix().

537  {
538  // chunkIndex lock should already be acquired.
540  auto [db_id, tb_id] = get_table_prefix(key);
541  ChunkKey table_key{db_id, tb_id};
542  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
543  for (auto it = chunkIndex_.lower_bound(table_key);
544  it != chunkIndex_.upper_bound(max_table_key);
545  ++it) {
546  if (it->first != key) {
547  // If there are any keys in this table other than the one we are removing, then
548  // keep the table in the eviction queue.
549  return;
550  }
551  }
552  // No other keys exist for this table, so remove it from the queue.
553  table_evict_alg_.removeChunk(table_key);
554 }
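removeKey() and the other per-table helpers in this class select a table's chunks with a prefix range. ChunkKey is a `std::vector<int>`, and vectors compare lexicographically, so `{db_id, tb_id}` sorts before every chunk key of that table. `{db_id, tb_id, INT_MAX}` sorts after every key whose third element is below INT_MAX. Below is a minimal sketch of the same "any other key?" check on a plain `std::map`. `table_has_other_keys` is a hypothetical helper name.

```cpp
#include <limits>
#include <map>
#include <vector>

using ChunkKey = std::vector<int>;

// Returns true if `index` holds any key of table (db_id, tb_id) other than
// `key`. {db, tb} sorts before every {db, tb, ...}; {db, tb, INT_MAX} sorts
// after every {db, tb, c, ...} with c < INT_MAX.
template <typename V>
bool table_has_other_keys(const std::map<ChunkKey, V>& index, int db_id, int tb_id,
                          const ChunkKey& key) {
  const ChunkKey min_key{db_id, tb_id};
  const ChunkKey max_key{db_id, tb_id, std::numeric_limits<int>::max()};
  const auto end = index.upper_bound(max_key);  // map is not modified below
  for (auto it = index.lower_bound(min_key); it != end; ++it) {
    if (it->first != key) {
      return true;
    }
  }
  return false;
}
```

Unlike the listing, the sketch computes `upper_bound` only once. This is safe because the loop never modifies the map.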

void File_Namespace::CachingFileMgr::removeTableBuffers ( int32_t  db_id,
int32_t  tb_id 
)
private

Erases and cleans up all buffers for a table.

Definition at line 337 of file CachingFileMgr.cpp.

Referenced by clearForTable().

337  {
338  // Free associated FileBuffers and clear buffer entries.
340  ChunkKey min_table_key{db_id, tb_id};
341  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
342  for (auto it = chunkIndex_.lower_bound(min_table_key);
343  it != chunkIndex_.upper_bound(max_table_key);) {
344  it = deleteBufferUnlocked(it);
345  }
346 }
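removeTableBuffers() walks the same prefix range but erases entries as it goes, relying on the iterator returned by deleteBufferUnlocked(). The sketch below shows the erase-while-iterating idiom using `std::map::erase`, which returns the iterator that follows the erased element. `erase_table_entries` is hypothetical and omits the per-buffer cleanup that the real function performs.

```cpp
#include <cstddef>
#include <limits>
#include <map>
#include <vector>

using ChunkKey = std::vector<int>;

// Erases every entry of table (db_id, tb_id) and returns how many were removed.
// std::map::erase(iterator) returns the next iterator, so the loop never uses
// an invalidated iterator.
template <typename V>
std::size_t erase_table_entries(std::map<ChunkKey, V>& index, int db_id, int tb_id) {
  const ChunkKey min_key{db_id, tb_id};
  const ChunkKey max_key{db_id, tb_id, std::numeric_limits<int>::max()};
  std::size_t removed = 0;
  for (auto it = index.lower_bound(min_key); it != index.upper_bound(max_key);) {
    it = index.erase(it);  // stand-in for deleteBufferUnlocked(it)
    ++removed;
  }
  return removed;
}
```

If no per-element cleanup is needed, `index.erase(index.lower_bound(min_key), index.upper_bound(max_key))` removes the same range in a single call.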

void File_Namespace::CachingFileMgr::removeTableFileMgr ( int32_t  db_id,
int32_t  tb_id 
)
private

Removes the subdirectory content for a table.

Definition at line 327 of file CachingFileMgr.cpp.

Referenced by clearForTable().

327  {
328  // Delete table-specific directory (stores table epoch data and serialized data wrapper)
330  auto it = table_dirs_.find({db_id, tb_id});
331  if (it != table_dirs_.end()) {
332  it->second->removeDiskContent();
333  table_dirs_.erase(it);
334  }
335 }

Page File_Namespace::CachingFileMgr::requestFreePage ( size_t  pagesize,
const bool  isMetadata 
)
override private virtual

Requests a free page as FileMgr does, but this override also evicts existing pages to make space when none are available.

Reimplemented from File_Namespace::FileMgr.

Definition at line 425 of file CachingFileMgr.cpp.

References CHECK, File_Namespace::FileInfo::fileId, and File_Namespace::FileInfo::getFreePage().

425  {
426  std::lock_guard<std::mutex> lock(getPageMutex_);
427  int32_t pageNum = -1;
428  // Splits files into metadata and regular data by size.
429  auto candidateFiles = fileIndex_.equal_range(pageSize);
430  // Check if there is a free page in an existing file.
431  for (auto fileIt = candidateFiles.first; fileIt != candidateFiles.second; ++fileIt) {
432  FileInfo* fileInfo = files_.at(fileIt->second);
433  pageNum = fileInfo->getFreePage();
434  if (pageNum != -1) {
435  return (Page(fileInfo->fileId, pageNum));
436  }
437  }
438 
439  // Try to add a new file if there is free space available.
440  FileInfo* fileInfo = nullptr;
441  if (isMetadata) {
442  if (getMaxMetaFiles() > getNumMetaFiles()) {
443  fileInfo = createFile(pageSize, num_pages_per_metadata_file_);
444  }
445  } else {
446  if (getMaxDataFiles() > getNumDataFiles()) {
447  fileInfo = createFile(pageSize, num_pages_per_data_file_);
448  }
449  }
450 
451  if (!fileInfo) {
452  // We were not able to create a new file, so we try to evict space.
453  // Eviction will return the first file it evicted a page from (a file now guaranteed
454  // to have a free page).
455  fileInfo = isMetadata ? evictMetadataPages() : evictPages();
456  }
457  CHECK(fileInfo);
458 
459  pageNum = fileInfo->getFreePage();
460  CHECK(pageNum != -1);
461  return (Page(fileInfo->fileId, pageNum));
462 }
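requestFreePage() tries three sources in order: a free page in an existing file of the requested page size, then a newly created file if the file cap has not been reached, and finally eviction. The sketch below models only that ordering. `SketchFile`, `SketchPagePool`, and the trivial eviction policy are hypothetical. The real class evicts the least recently used chunk or table and keeps data and metadata files separate.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified cache file: a fixed number of pages, each used or free.
struct SketchFile {
  int file_id;
  std::vector<bool> used;
  // Claims and returns a free page number, or -1 if the file is full.
  int getFreePage() {
    for (std::size_t i = 0; i < used.size(); ++i) {
      if (!used[i]) {
        used[i] = true;
        return static_cast<int>(i);
      }
    }
    return -1;
  }
};

struct Page {
  int file_id;
  int page_num;
};

class SketchPagePool {
 public:
  SketchPagePool(std::size_t max_files, std::size_t pages_per_file)
      : max_files_(max_files), pages_per_file_(pages_per_file) {
    if (max_files == 0 || pages_per_file == 0) {
      throw std::invalid_argument("pool must hold at least one page");
    }
  }

  Page requestFreePage() {
    // 1. Reuse a free page in an existing file.
    for (auto& file : files_) {
      if (int page = file.getFreePage(); page != -1) {
        return {file.file_id, page};
      }
    }
    // 2. Create a new file if the cap allows it; 3. otherwise evict.
    SketchFile* file = nullptr;
    if (files_.size() < max_files_) {
      files_.push_back({static_cast<int>(files_.size()),
                        std::vector<bool>(pages_per_file_, false)});
      file = &files_.back();
    } else {
      file = evictOnePage();
    }
    const int page = file->getFreePage();
    if (page == -1) {
      throw std::logic_error("no free page after eviction");  // CHECK in the listing
    }
    return {file->file_id, page};
  }

 private:
  // Hypothetical policy: release page 0 of the first file. The real class
  // evicts the least recently used chunk (or table, for metadata).
  SketchFile* evictOnePage() {
    files_.front().used[0] = false;
    return &files_.front();
  }

  std::size_t max_files_;
  std::size_t pages_per_file_;
  std::vector<SketchFile> files_;
};
```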

void File_Namespace::CachingFileMgr::setDataSizeLimit ( size_t  max)
inline

Definition at line 375 of file CachingFileMgr.h.

References limit_data_size_.

375 { limit_data_size_ = max; }
void File_Namespace::CachingFileMgr::setMaxNumDataFiles ( size_t  max)
inline

Definition at line 371 of file CachingFileMgr.h.

References max_num_data_files_.

void File_Namespace::CachingFileMgr::setMaxNumMetadataFiles ( size_t  max)
inline

Definition at line 372 of file CachingFileMgr.h.

References max_num_meta_files_.

void File_Namespace::CachingFileMgr::setMaxSizes ( )
private

Sets the maximum number of files/space for each type of storage based on the maximum size.

Definition at line 687 of file CachingFileMgr.cpp.

References CHECK_GT.

Referenced by CachingFileMgr().

687  {
688  size_t max_meta_space = std::floor(max_size_ * METADATA_SPACE_PERCENTAGE);
689  size_t max_meta_file_space = std::floor(max_size_ * METADATA_FILE_SPACE_PERCENTAGE);
690  max_wrapper_space_ = max_meta_space - max_meta_file_space;
691  auto max_data_space = max_size_ - max_meta_space;
692  auto meta_file_size = metadata_page_size_ * num_pages_per_metadata_file_;
693  auto data_file_size = page_size_ * num_pages_per_data_file_;
694  max_num_data_files_ = max_data_space / data_file_size;
695  max_num_meta_files_ = max_meta_file_space / meta_file_size;
696  CHECK_GT(max_num_data_files_, 0U) << "Cannot create a cache of size " << max_size_
697  << ". Not enough space to create a data file.";
698  CHECK_GT(max_num_meta_files_, 0U) << "Cannot create a cache of size " << max_size_
699  << ". Not enough space to create a metadata file.";
700 }
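setMaxSizes() splits max_size_ three ways. METADATA_SPACE_PERCENTAGE (0.1) reserves 10% for metadata. Within that, METADATA_FILE_SPACE_PERCENTAGE assigns 1% of max_size_ to metadata files and leaves the remaining 9% for wrapper files. The other 90% holds data files. The sketch below repeats the arithmetic using the same float constants. The page sizes and pages-per-file counts in the check are hypothetical values chosen for illustration, not the library's defaults.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>

struct CacheLimits {
  std::size_t max_wrapper_space;
  std::size_t max_num_data_files;
  std::size_t max_num_meta_files;
};

// Values as documented for METADATA_SPACE_PERCENTAGE and
// METADATA_FILE_SPACE_PERCENTAGE.
constexpr float kMetadataSpacePercentage = 0.1f;
constexpr float kMetadataFileSpacePercentage = 0.01f;

CacheLimits compute_cache_limits(std::size_t max_size, std::size_t page_size,
                                 std::size_t pages_per_data_file,
                                 std::size_t metadata_page_size,
                                 std::size_t pages_per_metadata_file) {
  // Metadata gets 10% of max_size; metadata files get 1% of max_size and the
  // remaining 9% is reserved for wrapper files.
  const auto max_meta_space =
      static_cast<std::size_t>(std::floor(max_size * kMetadataSpacePercentage));
  const auto max_meta_file_space =
      static_cast<std::size_t>(std::floor(max_size * kMetadataFileSpacePercentage));
  CacheLimits limits{};
  limits.max_wrapper_space = max_meta_space - max_meta_file_space;
  const std::size_t max_data_space = max_size - max_meta_space;
  limits.max_num_data_files = max_data_space / (page_size * pages_per_data_file);
  limits.max_num_meta_files =
      max_meta_file_space / (metadata_page_size * pages_per_metadata_file);
  // Same guard as the CHECK_GT calls: at least one file of each kind must fit.
  if (limits.max_num_data_files == 0 || limits.max_num_meta_files == 0) {
    throw std::invalid_argument("cache too small for one data and one metadata file");
  }
  return limits;
}
```

With a 20 GB cache and those hypothetical sizes, the 18 GB data share fits 33 data files of 512 MiB each, and the 200 MB metadata-file share fits 11 metadata files of 16 MiB each.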

void File_Namespace::CachingFileMgr::setMaxWrapperSpace ( size_t  max)
inline

Definition at line 373 of file CachingFileMgr.h.

References max_wrapper_space_.

void File_Namespace::CachingFileMgr::touchKey ( const ChunkKey key) const
private

Used to track which tables/chunks were least recently used.

Definition at line 532 of file CachingFileMgr.cpp.

References get_table_key().

532  {
535 }
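The listing omits the body of touchKey() (lines 533 to 534). Its cross-references point to touchChunk(), get_table_key(), chunk_evict_alg_, and table_evict_alg_, which suggests that it refreshes both the chunk and its table in their LRU queues. Below is a minimal sketch under that assumption. `SketchLru` is a hypothetical list-plus-map LRU queue, not LRUEvictionAlgorithm's implementation, and the table key is assumed to be the first two elements of the chunk key.

```cpp
#include <iterator>
#include <list>
#include <map>
#include <stdexcept>
#include <vector>

using ChunkKey = std::vector<int>;

// Hypothetical minimal LRU queue; not the LRUEvictionAlgorithm implementation.
class SketchLru {
 public:
  // Moves `key` to the most recently used end, inserting it if absent.
  void touch(const ChunkKey& key) {
    if (auto it = pos_.find(key); it != pos_.end()) {
      order_.erase(it->second);
    }
    order_.push_back(key);
    pos_[key] = std::prev(order_.end());
  }
  // Removes and returns the least recently used key.
  ChunkKey evict() {
    if (order_.empty()) {
      throw std::runtime_error("nothing to evict");
    }
    ChunkKey victim = order_.front();
    order_.pop_front();
    pos_.erase(victim);
    return victim;
  }

 private:
  std::list<ChunkKey> order_;  // front = least recently used
  std::map<ChunkKey, std::list<ChunkKey>::iterator> pos_;
};

// Assumption: the table key is {db_id, tb_id}, the first two elements of a
// chunk key (so key.size() must be at least 2).
void touch_key(SketchLru& chunks, SketchLru& tables, const ChunkKey& key) {
  chunks.touch(key);
  tables.touch(ChunkKey(key.begin(), key.begin() + 2));
}
```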

bool File_Namespace::CachingFileMgr::updatePageIfDeleted ( FileInfo file_info,
ChunkKey chunk_key,
int32_t  contingent,
int32_t  page_epoch,
int32_t  page_num 
)
override virtual

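Checks whether a page should be deleted. If its contingent is DELETE_CONTINGENT or ROLLOFF_CONTINGENT, the page is freed and the function returns true; otherwise it returns false.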

Reimplemented from File_Namespace::FileMgr.

Definition at line 362 of file CachingFileMgr.cpp.

References File_Namespace::DELETE_CONTINGENT, File_Namespace::FileInfo::freePage(), and File_Namespace::ROLLOFF_CONTINGENT.

366  {
367  // These contingents are stored by overwriting the bytes used for chunkKeys. If
368  // we run into a key marked for deletion in a fileMgr with no fileMgrKey (i.e.
369  // CachingFileMgr) then we can't know if the epoch is valid because we don't know
370  // the key. At this point our only option is to free the page as though it was
371  // checkpointed (which should be fine since we only maintain one version of each
372  // page).
373  if (contingent == DELETE_CONTINGENT || contingent == ROLLOFF_CONTINGENT) {
374  file_info->freePage(page_num, false, page_epoch);
375  return true;
376  }
377  return false;
378 }
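The rule in updatePageIfDeleted() reduces to a sentinel check on the contingent value. The sketch below uses hypothetical sentinel values and a simplified `freePage()` that drops the epoch argument. The real DELETE_CONTINGENT and ROLLOFF_CONTINGENT constants are defined in FileInfo.h and are not reproduced here.

```cpp
#include <cstdint>
#include <set>

// Hypothetical sentinel values; the real constants are defined in FileInfo.h.
constexpr int32_t kDeleteContingent = -1;
constexpr int32_t kRolloffContingent = -2;

// Hypothetical stand-in for FileInfo that records freed page numbers.
struct SketchFileInfo {
  std::set<int32_t> free_pages;
  void freePage(int32_t page_num) { free_pages.insert(page_num); }
};

// Without a fileMgrKey the page's owner is unknown, so any page whose
// contingent marks it as deleted or rolled off is freed unconditionally.
bool update_page_if_deleted(SketchFileInfo& file_info, int32_t contingent,
                            int32_t page_num) {
  if (contingent == kDeleteContingent || contingent == kRolloffContingent) {
    file_info.freePage(page_num);
    return true;
  }
  return false;
}
```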

void File_Namespace::CachingFileMgr::writeAndSyncEpochToDisk ( int32_t  db_id,
int32_t  tb_id 
)
private

Flushes the epoch value for a table to disk.

Definition at line 168 of file CachingFileMgr.cpp.

References CHECK, table_dirs_, and table_dirs_mutex_.

168  {
170  auto table_it = table_dirs_.find({db_id, tb_id});
171  CHECK(table_it != table_dirs_.end());
172  table_it->second->writeAndSyncEpochToDisk();
173 }
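writeAndSyncEpochToDisk() only looks up an entry in table_dirs_, so it takes a shared (reader) lock on table_dirs_mutex_. Functions that insert or erase entries, such as removeTableFileMgr(), take an exclusive lock instead. The sketch below shows that reader/writer split using `std::shared_mutex` in place of `heavyai::shared_mutex`. `SketchTableDirs` and `SketchTableDir` are hypothetical. The flush counter is atomic because several readers may call it concurrently under the shared lock.

```cpp
#include <atomic>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for TableFileMgr that counts epoch flushes.
struct SketchTableDir {
  std::atomic<int> flushes{0};
  void writeAndSyncEpochToDisk() { ++flushes; }
};

class SketchTableDirs {
 public:
  // Inserting changes the map, so it takes the mutex exclusively.
  void add(int32_t db_id, int32_t tb_id) {
    std::unique_lock lock(mutex_);
    dirs_.emplace(std::make_pair(db_id, tb_id), std::make_unique<SketchTableDir>());
  }

  // A lookup leaves the map unchanged, so readers can share the mutex.
  void writeAndSyncEpochToDisk(int32_t db_id, int32_t tb_id) {
    std::shared_lock lock(mutex_);
    auto it = dirs_.find({db_id, tb_id});
    if (it == dirs_.end()) {
      throw std::out_of_range("no directory for table");  // CHECK in the listing
    }
    it->second->writeAndSyncEpochToDisk();
  }

  int flushes(int32_t db_id, int32_t tb_id) const {
    std::shared_lock lock(mutex_);
    return dirs_.at({db_id, tb_id})->flushes.load();
  }

 private:
  mutable std::shared_mutex mutex_;
  std::map<std::pair<int32_t, int32_t>, std::unique_ptr<SketchTableDir>> dirs_;
};
```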
void File_Namespace::CachingFileMgr::writeDirtyBuffers ( int32_t  db_id,
int32_t  tb_id 
)
private

Helper function to flush a table's dirty buffers to disk.

Definition at line 380 of file CachingFileMgr.cpp.

380  {
382  ChunkKey min_table_key{db_id, tb_id};
383  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
384 
385  for (auto chunk_it = chunkIndex_.lower_bound(min_table_key);
386  chunk_it != chunkIndex_.upper_bound(max_table_key);
387  ++chunk_it) {
388  if (auto [key, buf] = *chunk_it; buf->isDirty()) {
389  // Free previous versions first so we only have one metadata version.
390  buf->freeMetadataPages();
391  buf->writeMetadata(epoch(db_id, tb_id));
392  buf->clearDirtyBits();
393  touchKey(key);
394  }
395  }
396 }
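writeDirtyBuffers() rewrites metadata only for buffers whose dirty flag is set, and it frees the previous metadata pages first so that only one version remains. Below is a sketch of that loop over a table's key range. It uses a hypothetical `SketchDirtyBuffer` that records metadata versions in memory, takes the epoch as a parameter, and returns the flushed keys at the point where the listing calls touchKey().

```cpp
#include <limits>
#include <map>
#include <vector>

using ChunkKey = std::vector<int>;

// Hypothetical stand-in for FileBuffer.
struct SketchDirtyBuffer {
  bool dirty = false;
  std::vector<int> metadata_epochs;  // one entry per stored metadata version
  void freeMetadataPages() { metadata_epochs.clear(); }
  void writeMetadata(int epoch) { metadata_epochs.push_back(epoch); }
};

// Flushes metadata for every dirty buffer of one table and returns the keys
// that were flushed.
std::vector<ChunkKey> write_dirty_buffers(std::map<ChunkKey, SketchDirtyBuffer>& index,
                                          int db_id, int tb_id, int epoch) {
  const ChunkKey min_key{db_id, tb_id};
  const ChunkKey max_key{db_id, tb_id, std::numeric_limits<int>::max()};
  std::vector<ChunkKey> touched;
  for (auto it = index.lower_bound(min_key); it != index.upper_bound(max_key); ++it) {
    if (auto& [key, buf] = *it; buf.dirty) {
      buf.freeMetadataPages();  // keep a single metadata version
      buf.writeMetadata(epoch);
      buf.dirty = false;
      touched.push_back(key);
    }
  }
  return touched;
}
```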
void File_Namespace::CachingFileMgr::writeWrapperFile ( const std::string &  doc,
int32_t  db,
int32_t  tb 
)

Writes a wrapper file to a table's subdirectory.

Definition at line 657 of file CachingFileMgr.cpp.

References CHECK_LE.

657  {
659  auto wrapper_size = doc.size();
660  CHECK_LE(wrapper_size, getMaxWrapperSize())
661  << "Wrapper is too big to fit into the cache";
662  while (wrapper_size > getAvailableWrapperSpace()) {
664  }
666  table_dirs_.at({db, tb})->writeWrapperFile(doc);
667 }
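The body of writeWrapperFile()'s eviction loop (line 663) is missing from the listing. Its cross-references include evictMetadataPages(), which evicts the least recently used table. Below is a minimal sketch of the check-then-evict pattern under that assumption. `SketchWrapperBudget` is hypothetical, and in it evicting a table simply releases the bytes of that table's wrapper.

```cpp
#include <cstddef>
#include <list>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical model of the wrapper budget: evicting a table frees its wrapper.
class SketchWrapperBudget {
 public:
  explicit SketchWrapperBudget(std::size_t max_space) : max_space_(max_space) {}

  std::size_t available() const { return max_space_ - used_; }

  void write(const std::string& table, const std::string& doc) {
    if (doc.size() > max_space_) {  // CHECK_LE in the listing
      throw std::length_error("Wrapper is too big to fit into the cache");
    }
    erase(table);  // assumption: rewriting replaces the table's old wrapper
    while (doc.size() > available()) {
      const std::string victim = lru_.front();  // least recently written table
      erase(victim);
    }
    sizes_[table] = doc.size();
    used_ += doc.size();
    lru_.push_back(table);
  }

  bool contains(const std::string& table) const { return sizes_.count(table) > 0; }

 private:
  void erase(const std::string& table) {
    auto it = sizes_.find(table);
    if (it == sizes_.end()) {
      return;
    }
    used_ -= it->second;
    sizes_.erase(it);
    lru_.remove(table);
  }

  std::size_t max_space_;
  std::size_t used_ = 0;
  std::map<std::string, std::size_t> sizes_;
  std::list<std::string> lru_;  // front = least recently written
};
```

Because the size check rejects any wrapper larger than the whole budget, the eviction loop always terminates before the queue runs empty.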

Member Data Documentation

LRUEvictionAlgorithm File_Namespace::CachingFileMgr::chunk_evict_alg_
mutable private

Definition at line 501 of file CachingFileMgr.h.

Referenced by dump(), and dumpEvictionQueue().

std::optional<size_t> File_Namespace::CachingFileMgr::limit_data_size_ {}
private

Definition at line 499 of file CachingFileMgr.h.

Referenced by setDataSizeLimit().

size_t File_Namespace::CachingFileMgr::max_num_data_files_
private

Definition at line 495 of file CachingFileMgr.h.

Referenced by getMaxDataFiles(), and setMaxNumDataFiles().

size_t File_Namespace::CachingFileMgr::max_num_meta_files_
private

Definition at line 496 of file CachingFileMgr.h.

Referenced by getMaxMetaFiles(), and setMaxNumMetadataFiles().

size_t File_Namespace::CachingFileMgr::max_size_
private

Definition at line 498 of file CachingFileMgr.h.

Referenced by CachingFileMgr(), getAvailableSpace(), and getMaxSize().

size_t File_Namespace::CachingFileMgr::max_wrapper_space_
private

constexpr float File_Namespace::CachingFileMgr::METADATA_FILE_SPACE_PERCENTAGE {0.01}
static

Definition at line 180 of file CachingFileMgr.h.

Referenced by getMinimumSize().

constexpr float File_Namespace::CachingFileMgr::METADATA_SPACE_PERCENTAGE {0.1}
static

Definition at line 178 of file CachingFileMgr.h.

std::map<TablePair, std::unique_ptr<TableFileMgr> > File_Namespace::CachingFileMgr::table_dirs_
private

heavyai::shared_mutex File_Namespace::CachingFileMgr::table_dirs_mutex_
mutable private

LRUEvictionAlgorithm File_Namespace::CachingFileMgr::table_evict_alg_
mutable private

Definition at line 502 of file CachingFileMgr.h.

Referenced by dump(), and dumpTableQueue().

constexpr char File_Namespace::CachingFileMgr::WRAPPER_FILE_NAME[] = "wrapper_metadata.json"
static

The documentation for this class was generated from the following files: