OmniSciDB  16c4e035a1
File_Namespace::CachingFileMgr Class Reference

A FileMgr capable of limiting its size and storing data from multiple tables in a shared directory. For any table that supports DiskCaching, the CachingFileMgr must contain either metadata for all table chunks, or for none (the cache either has no knowledge of that table, or has complete knowledge of that table). Any data chunk within a table may or may not be contained within the cache.

#include <CachingFileMgr.h>


Public Member Functions

 CachingFileMgr (const DiskCacheConfig &config)
 
 ~CachingFileMgr () override
 
MgrType getMgrType () override
 
std::string getStringMgrType () override
 
size_t getDefaultPageSize ()
 
size_t getMaxSize () override
 
size_t getMaxDataFiles () const
 
size_t getMaxMetaFiles () const
 
size_t getMaxWrapperSize () const
 
size_t getDataFileSize () const
 
size_t getMetadataFileSize () const
 
size_t getNumDataFiles () const
 
size_t getNumMetaFiles () const
 
size_t getAvailableSpace ()
 
size_t getAvailableWrapperSpace ()
 
size_t getAllocated () override
 
size_t getMaxDataFilesSize () const
 
void removeChunkKeepMetadata (const ChunkKey &key)
 Free pages for chunk and remove it from the chunk eviction algorithm. More...
 
void clearForTable (int32_t db_id, int32_t tb_id)
 Removes all data related to the given table (pages and subdirectories). More...
 
bool hasFileMgrKey () const override
 Query to determine if the contained pages will have their database and table ids overridden by the filemgr key (FileMgr does this).
 
void closeRemovePhysical () override
 Closes files and removes the caching directory. More...
 
size_t getChunkSpaceReservedByTable (int32_t db_id, int32_t tb_id) const
 
size_t getMetadataSpaceReservedByTable (int32_t db_id, int32_t tb_id) const
 
size_t getTableFileMgrSpaceReserved (int32_t db_id, int32_t tb_id) const
 
size_t getSpaceReservedByTable (int32_t db_id, int32_t tb_id) const
 
std::string describeSelf () const override
 describes this FileMgr for logging purposes. More...
 
void checkpoint (const int32_t db_id, const int32_t tb_id) override
 writes buffers for the given table, synchronizes files to disk, updates file epoch, and commits free pages. More...
 
int32_t epoch (int32_t db_id, int32_t tb_id) const override
 obtain the epoch version for the given table. More...
 
FileBuffer * putBuffer (const ChunkKey &key, AbstractBuffer *srcBuffer, const size_t numBytes=0) override
 deletes any existing buffer for the given key then copies in a new one.
 
CachingFileBuffer * allocateBuffer (const size_t page_size, const ChunkKey &key, const size_t num_bytes=0) override
 allocates a new CachingFileBuffer and tracks its use in the eviction algorithms.
 
CachingFileBuffer * allocateBuffer (const ChunkKey &key, const std::vector< HeaderInfo >::const_iterator &headerStartIt, const std::vector< HeaderInfo >::const_iterator &headerEndIt) override
 
bool updatePageIfDeleted (FileInfo *file_info, ChunkKey &chunk_key, int32_t contingent, int32_t page_epoch, int32_t page_num) override
 checks whether a page should be deleted. More...
 
bool failOnReadError () const override
 True if a read error should cause a fatal error. More...
 
void deleteBufferIfExists (const ChunkKey &key)
 deletes a buffer if it exists in the mgr. Otherwise do nothing. More...
 
size_t getNumChunksWithMetadata () const
 Returns the number of buffers with metadata in the CFM. Any buffer with an encoder counts. More...
 
size_t getNumDataChunks () const
 Returns the number of buffers with chunk data in the CFM. More...
 
std::vector< ChunkKey > getChunkKeysForPrefix (const ChunkKey &prefix) const
 Returns the keys for chunks with chunk data that match the given prefix.
 
std::unique_ptr< CachingFileMgr > reconstruct () const
 Initializes a new CFM using the initialization values in the current CFM.
 
void deleteWrapperFile (int32_t db, int32_t tb)
 Deletes the wrapper file from a table subdir. More...
 
void writeWrapperFile (const std::string &doc, int32_t db, int32_t tb)
 Writes a wrapper file to a table subdir. More...
 
std::string getTableFileMgrPath (int32_t db, int32_t tb) const
 
size_t getFilesSize () const
 Get the total size of page files (data and metadata files). This includes allocated, but unused space. More...
 
size_t getTableFileMgrsSize () const
 Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirectory for serialized data wrappers and epoch files. More...
 
std::optional< FileBuffer * > getBufferIfExists (const ChunkKey &key)
 an optional version of get buffer if we are not sure a chunk exists. More...
 
void free_page (std::pair< FileInfo *, int32_t > &&page) override
 Unlike the FileMgr, the CFM frees pages immediately instead of holding them until the next checkpoint. More...
 
void getChunkMetadataVecForKeyPrefix (ChunkMetadataVector &chunkMetadataVec, const ChunkKey &keyPrefix) override
 
std::string dumpKeysWithMetadata () const
 
std::string dumpKeysWithChunkData () const
 
std::string dumpTableQueue () const
 
std::string dumpEvictionQueue () const
 
std::string dump () const
 
void setMaxNumDataFiles (size_t max)
 
void setMaxNumMetadataFiles (size_t max)
 
void setMaxWrapperSpace (size_t max)
 
std::set< ChunkKey > getKeysWithMetadata () const
 
void setDataSizeLimit (size_t max)
 
- Public Member Functions inherited from File_Namespace::FileMgr
 FileMgr (const int32_t deviceId, GlobalFileMgr *gfm, const TablePair fileMgrKey, const int32_t max_rollback_epochs=-1, const size_t num_reader_threads=0, const int32_t epoch=-1, const size_t defaultPageSize=DEFAULT_PAGE_SIZE)
 Constructor. More...
 
 FileMgr (const int32_t deviceId, GlobalFileMgr *gfm, const TablePair fileMgrKey, const size_t defaultPageSize, const bool runCoreInit)
 
 FileMgr (GlobalFileMgr *gfm, const size_t defaultPageSize, std::string basePath)
 
 ~FileMgr () override
 Destructor. More...
 
StorageStats getStorageStats () const
 
FileBuffer * createBuffer (const ChunkKey &key, size_t pageSize=0, const size_t numBytes=0) override
 Creates a chunk with the specified key and page size. More...
 
bool isBufferOnDevice (const ChunkKey &key) override
 
void deleteBuffer (const ChunkKey &key, const bool purge=true) override
 Deletes the chunk with the specified key. More...
 
void deleteBuffersWithPrefix (const ChunkKey &keyPrefix, const bool purge=true) override
 
FileBuffer * getBuffer (const ChunkKey &key, const size_t numBytes=0) override
 Returns a pointer to the chunk with the specified key.
 
void fetchBuffer (const ChunkKey &key, AbstractBuffer *destBuffer, const size_t numBytes) override
 
FileBuffer * putBuffer (const ChunkKey &key, AbstractBuffer *d, const size_t numBytes=0) override
 Puts the contents of d into the Chunk with the given key. More...
 
AbstractBuffer * alloc (const size_t numBytes) override
 
void free (AbstractBuffer *buffer) override
 
MgrType getMgrType () override
 
std::string getStringMgrType () override
 
std::string printSlabs () override
 
size_t getMaxSize () override
 
size_t getInUseSize () override
 
size_t getAllocated () override
 
bool isAllocationCapped () override
 
FileInfo * getFileInfoForFileId (const int32_t fileId) const
 
FileMetadata getMetadataForFile (const boost::filesystem::directory_iterator &fileIterator) const
 
void init (const size_t num_reader_threads, const int32_t epochOverride)
 
void init (const std::string &dataPathToConvertFrom, const int32_t epochOverride)
 
void copyPage (Page &srcPage, FileMgr *destFileMgr, Page &destPage, const size_t reservedHeaderSize, const size_t numBytes, const size_t offset)
 
void requestFreePages (size_t npages, size_t pagesize, std::vector< Page > &pages, const bool isMetadata)
 Obtains free pages – creates new files if necessary – of the requested size. More...
 
void getChunkMetadataVecForKeyPrefix (ChunkMetadataVector &chunkMetadataVec, const ChunkKey &keyPrefix) override
 
void checkpoint () override
 Fsyncs data files, writes out epoch and fsyncs that. More...
 
void checkpoint (const int32_t db_id, const int32_t tb_id) override
 
int32_t epochFloor () const
 
int32_t incrementEpoch ()
 
int32_t lastCheckpointedEpoch () const
 Returns value of epoch at last checkpoint. More...
 
void resetEpochFloor ()
 
int32_t maxRollbackEpochs ()
 Returns value max_rollback_epochs. More...
 
size_t getNumReaderThreads ()
 Returns number of threads defined by parameter num-reader-threads which should be used during initial load and consequent read of data. More...
 
FILE * getFileForFileId (const int32_t fileId)
 Returns FILE pointer associated with requested fileId. More...
 
size_t getNumChunks () override
 
size_t getNumUsedMetadataPagesForChunkKey (const ChunkKey &chunkKey) const
 
int32_t getDBVersion () const
 Index for looking up chunks. More...
 
bool getDBConvert () const
 
void createTopLevelMetadata ()
 
std::string getFileMgrBasePath () const
 
void removeTableRelatedDS (const int32_t db_id, const int32_t table_id) override
 
const TablePair get_fileMgrKey () const
 
boost::filesystem::path getFilePath (const std::string &file_name) const
 
void writePageMappingsToStatusFile (const std::vector< PageMapping > &page_mappings)
 
void renameCompactionStatusFile (const char *const from_status, const char *const to_status)
 
void compactFiles ()
 

Static Public Member Functions

static size_t getMinimumSize ()
 
- Static Public Member Functions inherited from File_Namespace::FileMgr
static void setNumPagesPerDataFile (size_t num_pages)
 
static void setNumPagesPerMetadataFile (size_t num_pages)
 

Static Public Attributes

static constexpr char WRAPPER_FILE_NAME [] = "wrapper_metadata.json"
 
static constexpr float METADATA_SPACE_PERCENTAGE {0.1}
 
static constexpr float METADATA_FILE_SPACE_PERCENTAGE {0.01}
 
- Static Public Attributes inherited from File_Namespace::FileMgr
static constexpr size_t DEFAULT_NUM_PAGES_PER_DATA_FILE {256}
 
static constexpr size_t DEFAULT_NUM_PAGES_PER_METADATA_FILE {4096}
 
static constexpr char const * COPY_PAGES_STATUS {"pending_data_compaction_0"}
 
static constexpr char const * UPDATE_PAGE_VISIBILITY_STATUS {"pending_data_compaction_1"}
 
static constexpr char const * DELETE_EMPTY_FILES_STATUS {"pending_data_compaction_2"}
 
static constexpr char LEGACY_EPOCH_FILENAME [] = "epoch"
 
static constexpr char EPOCH_FILENAME [] = "epoch_metadata"
 
static constexpr char DB_META_FILENAME [] = "dbmeta"
 
static constexpr char FILE_MGR_VERSION_FILENAME [] = "filemgr_version"
 
static constexpr int32_t INVALID_VERSION = -1
 

Private Member Functions

void incrementEpoch (int32_t db_id, int32_t tb_id)
 Increments epoch for the given table. More...
 
void init (const size_t num_reader_threads)
 Initializes a CFM, parsing any existing files and initializing data structures appropriately (currently not thread-safe). More...
 
void writeAndSyncEpochToDisk (int32_t db_id, int32_t tb_id)
 Flushes epoch value to disk for a table. More...
 
void readTableFileMgrs ()
 Checks for any sub-directories containing table-specific data and creates epochs from found files. More...
 
FileBuffer * createBufferFromHeaders (const ChunkKey &key, const std::vector< HeaderInfo >::const_iterator &startIt, const std::vector< HeaderInfo >::const_iterator &endIt) override
 Creates a buffer and initializes it with info read from files on disk.
 
FileBuffer * createBufferUnlocked (const ChunkKey &key, size_t pageSize=0, const size_t numBytes=0) override
 Creates a buffer. More...
 
void createTableFileMgrIfNoneExists (const int32_t db_id, const int32_t tb_id)
 Create and initialize a subdirectory for a table if none exists. More...
 
void incrementAllEpochs ()
 Increment epochs for each table in the CFM. More...
 
void removeTableFileMgr (int32_t db_id, int32_t tb_id)
 Removes the subdirectory content for a table. More...
 
void removeTableBuffers (int32_t db_id, int32_t tb_id)
 Erases and cleans up all buffers for a table. More...
 
void writeDirtyBuffers (int32_t db_id, int32_t tb_id)
 helper function to flush all dirty buffers to disk. More...
 
Page requestFreePage (size_t pagesize, const bool isMetadata) override
 requests a free page similar to FileMgr, but this override will also evict existing pages to make space if there are none available. More...
 
void touchKey (const ChunkKey &key) const
 Used to track which tables/chunks were least recently used. More...
 
void removeKey (const ChunkKey &key) const
 
std::vector< ChunkKey > getKeysForTable (int32_t db_id, int32_t tb_id) const
 returns set of keys contained in chunkIndex_ that match the given table prefix.
 
FileInfo * evictMetadataPages ()
 evicts all metadata pages for the least recently used table. Returns the first FileInfo that a page was evicted from (guaranteed to now have at least one free page in it).
 
FileInfo * evictPages ()
 evicts all data pages for the least recently used Chunk (metadata pages persist). Returns the first FileInfo that a page was evicted from (guaranteed to now have at least one free page in it).
 
void deleteCacheIfTooLarge ()
 When the cache is read from disk, we don't know which chunks were least recently used. Rather than try to evict random pages to get down to size we just reset the cache to make sure we have space. More...
 
void setMaxSizes ()
 Sets the maximum number of files/space for each type of storage based on the maximum size. More...
 
FileBuffer * getBufferUnlocked (const ChunkKeyToChunkMap::iterator chunk_it, const size_t numBytes=0) override
 
ChunkKeyToChunkMap::iterator deleteBufferUnlocked (const ChunkKeyToChunkMap::iterator chunk_it, const bool purge=true) override
 

Private Attributes

mapd_shared_mutex table_dirs_mutex_
 
std::map< TablePair,
std::unique_ptr< TableFileMgr > > 
table_dirs_
 
size_t max_num_data_files_
 
size_t max_num_meta_files_
 
size_t max_wrapper_space_
 
size_t max_size_
 
std::optional< size_t > limit_data_size_ {}
 
LRUEvictionAlgorithm chunk_evict_alg_
 
LRUEvictionAlgorithm table_evict_alg_
 

Additional Inherited Members

- Public Attributes inherited from File_Namespace::FileMgr
ChunkKeyToChunkMap chunkIndex_
 
- Protected Member Functions inherited from File_Namespace::FileMgr
 FileMgr ()
 
FileInfo * createFile (const size_t pageSize, const size_t numPages)
 Adds a file to the file manager repository.
 
FileInfo * openExistingFile (const std::string &path, const int32_t fileId, const size_t pageSize, const size_t numPages, std::vector< HeaderInfo > &headerVec)
 
void createEpochFile (const std::string &epochFileName)
 
int32_t openAndReadLegacyEpochFile (const std::string &epochFileName)
 
void openAndReadEpochFile (const std::string &epochFileName)
 
void writeAndSyncEpochToDisk ()
 
void setEpoch (const int32_t newEpoch)
 
int32_t readVersionFromDisk (const std::string &versionFileName) const
 
void writeAndSyncVersionToDisk (const std::string &versionFileName, const int32_t version)
 
void processFileFutures (std::vector< std::future< std::vector< HeaderInfo >>> &file_futures, std::vector< HeaderInfo > &headerVec)
 
void migrateToLatestFileMgrVersion ()
 
void migrateEpochFileV0 ()
 
OpenFilesResult openFiles ()
 
void clearFileInfos ()
 
void copySourcePageForCompaction (const Page &source_page, FileInfo *destination_file_info, std::vector< PageMapping > &page_mappings, std::set< Page > &touched_pages)
 
int32_t copyPageWithoutHeaderSize (const Page &source_page, const Page &destination_page)
 
void sortAndCopyFilePagesForCompaction (size_t page_size, std::vector< PageMapping > &page_mappings, std::set< Page > &touched_pages)
 
void updateMappedPagesVisibility (const std::vector< PageMapping > &page_mappings)
 
void deleteEmptyFiles ()
 
void resumeFileCompaction (const std::string &status_file_name)
 
std::vector< PageMapping > readPageMappingsFromStatusFile ()
 
 FileMgr (const int epoch)
 
void closePhysicalUnlocked ()
 
void syncFilesToDisk ()
 
void freePages ()
 
void initializeNumThreads (size_t num_reader_threads=0)
 
- Protected Attributes inherited from File_Namespace::FileMgr
int32_t maxRollbackEpochs_
 
std::string fileMgrBasePath_
 
std::map< int32_t, FileInfo * > files_
 
PageSizeFileMMap fileIndex_
 A map of files accessible via a file identifier. More...
 
size_t num_reader_threads_
 Maps page sizes to FileInfo objects. More...
 
size_t defaultPageSize_
 number of threads used when loading data More...
 
unsigned nextFileId_
 
int32_t db_version_
 the index of the next file id More...
 
int32_t fileMgrVersion_
 
const int32_t latestFileMgrVersion_ {1}
 
FILE * DBMetaFile_ = nullptr
 
std::mutex getPageMutex_
 pointer to DB level metadata More...
 
mapd_shared_mutex chunkIndexMutex_
 
mapd_shared_mutex files_rw_mutex_
 
mapd_shared_mutex mutex_free_page_
 
std::vector< std::pair
< FileInfo *, int32_t > > 
free_pages_
 
bool isFullyInitted_ {false}
 
- Static Protected Attributes inherited from File_Namespace::FileMgr
static size_t num_pages_per_data_file_ {DEFAULT_NUM_PAGES_PER_DATA_FILE}
 
static size_t num_pages_per_metadata_file_ {DEFAULT_NUM_PAGES_PER_METADATA_FILE}
 

Detailed Description

A FileMgr capable of limiting its size and storing data from multiple tables in a shared directory. For any table that supports DiskCaching, the CachingFileMgr must contain either metadata for all table chunks, or for none (the cache either has no knowledge of that table, or has complete knowledge of that table). Any data chunk within a table may or may not be contained within the cache.

Definition at line 165 of file CachingFileMgr.h.

Constructor & Destructor Documentation

File_Namespace::CachingFileMgr::CachingFileMgr ( const DiskCacheConfig & config )

Definition at line 68 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::defaultPageSize_, File_Namespace::FileMgr::fileMgrBasePath_, init(), max_size_, File_Namespace::FileMgr::maxRollbackEpochs_, File_Namespace::FileMgr::nextFileId_, File_Namespace::DiskCacheConfig::num_reader_threads, File_Namespace::DiskCacheConfig::page_size, File_Namespace::DiskCacheConfig::path, setMaxSizes(), and File_Namespace::DiskCacheConfig::size_limit.

68  {
69  fileMgrBasePath_ = config.path;
71  defaultPageSize_ = config.page_size;
72  nextFileId_ = 0;
73  max_size_ = config.size_limit;
74  init(config.num_reader_threads);
75  setMaxSizes();
76 }

File_Namespace::CachingFileMgr::~CachingFileMgr ( ) [override]

Definition at line 78 of file CachingFileMgr.cpp.

78 {}

Member Function Documentation

CachingFileBuffer * File_Namespace::CachingFileMgr::allocateBuffer ( const size_t page_size, const ChunkKey & key, const size_t num_bytes = 0 ) [override, virtual]

allocates a new CachingFileBuffer and tracks its use in the eviction algorithms.

Reimplemented from File_Namespace::FileMgr.

Definition at line 334 of file CachingFileMgr.cpp.

336  {
337  return new CachingFileBuffer(this, page_size, key, num_bytes);
338 }
CachingFileBuffer * File_Namespace::CachingFileMgr::allocateBuffer ( const ChunkKey & key, const std::vector< HeaderInfo >::const_iterator & headerStartIt, const std::vector< HeaderInfo >::const_iterator & headerEndIt ) [override, virtual]

Reimplemented from File_Namespace::FileMgr.

Definition at line 340 of file CachingFileMgr.cpp.

343  {
344  return new CachingFileBuffer(this, key, headerStartIt, headerEndIt);
345 }
void File_Namespace::CachingFileMgr::checkpoint ( const int32_t db_id, const int32_t tb_id ) [override]

writes buffers for the given table, synchronizes files to disk, updates file epoch, and commits free pages.

Definition at line 234 of file CachingFileMgr.cpp.

References CHECK, table_dirs_, and table_dirs_mutex_.

234  {
235  {
236  mapd_shared_lock<mapd_shared_mutex> read_lock(table_dirs_mutex_);
237  CHECK(table_dirs_.find({db_id, tb_id}) != table_dirs_.end());
238  }
239  VLOG(2) << "Checkpointing " << describeSelf() << " (" << db_id << ", " << tb_id
240  << ") epoch: " << epoch(db_id, tb_id);
241  writeDirtyBuffers(db_id, tb_id);
242  syncFilesToDisk();
243  writeAndSyncEpochToDisk(db_id, tb_id);
244  incrementEpoch(db_id, tb_id);
245  freePages();
246 }
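The call order in the listing above can be illustrated with a toy model. This is a minimal sketch, not the OmniSciDB implementation: `ToyTable` and `toy_checkpoint` are hypothetical names, and the assumption that the epoch written to disk is the value held before `incrementEpoch` runs is inferred only from the order of the calls shown.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one table's cache state (not the real TableFileMgr).
struct ToyTable {
  int epoch = 0;             // in-memory epoch
  int persisted_epoch = -1;  // last epoch value written to "disk"
  int dirty_buffers = 0;     // buffers waiting to be flushed
  std::vector<std::string> steps;  // records the order of operations
};

// Mirrors the order of calls in CachingFileMgr::checkpoint(db_id, tb_id).
void toy_checkpoint(ToyTable& table) {
  table.steps.push_back("writeDirtyBuffers");
  table.dirty_buffers = 0;

  table.steps.push_back("syncFilesToDisk");

  // Assumption: the epoch is persisted before it is incremented.
  table.steps.push_back("writeAndSyncEpochToDisk");
  table.persisted_epoch = table.epoch;

  table.steps.push_back("incrementEpoch");
  ++table.epoch;

  table.steps.push_back("freePages");
}
```

Under this model, the persisted epoch trails the in-memory epoch by one after every checkpoint.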
void File_Namespace::CachingFileMgr::clearForTable ( int32_t db_id, int32_t tb_id )

Removes all data related to the given table (pages and subdirectories).

Definition at line 161 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::freePages(), removeTableBuffers(), and removeTableFileMgr().

161  {
162  removeTableBuffers(db_id, tb_id);
163  removeTableFileMgr(db_id, tb_id);
164  freePages();
165 }

void File_Namespace::CachingFileMgr::closeRemovePhysical ( ) [override, virtual]

Closes files and removes the caching directory.

Reimplemented from File_Namespace::FileMgr.

Definition at line 171 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::closePhysicalUnlocked(), File_Namespace::FileMgr::files_rw_mutex_, File_Namespace::FileMgr::getFileMgrBasePath(), table_dirs_, and table_dirs_mutex_.

171  {
172  {
173  mapd_unique_lock<mapd_shared_mutex> write_lock(files_rw_mutex_);
175  }
176  {
177  mapd_unique_lock<mapd_shared_mutex> tables_lock(table_dirs_mutex_);
178  table_dirs_.clear();
179  }
180  bf::remove_all(getFileMgrBasePath());
181 }

FileBuffer * File_Namespace::CachingFileMgr::createBufferFromHeaders ( const ChunkKey & key, const std::vector< HeaderInfo >::const_iterator & startIt, const std::vector< HeaderInfo >::const_iterator & endIt ) [override, private, virtual]

Creates a buffer and initializes it with info read from files on disk.

Reimplemented from File_Namespace::FileMgr.

Definition at line 267 of file CachingFileMgr.cpp.

References get_table_prefix().

Referenced by init().

270  {
271  if (startIt->pageId != -1) {
272  // If the first pageId is not -1 then there is no metadata page for the
273  // current key (which means it was never checkpointed), so we should skip.
274  return nullptr;
275  }
276  touchKey(key);
277  auto [db_id, tb_id] = get_table_prefix(key);
278  createTableFileMgrIfNoneExists(db_id, tb_id);
279  auto buffer = FileMgr::createBufferFromHeaders(key, startIt, endIt);
280  if (buffer->isMissingPages()) {
281  // Detect the case where a page is missing by comparing the amount of pages read
282  // with the metadata size. If data are missing, discard the chunk.
283  buffer->freeChunkPages();
284  }
285  return buffer;
286 }
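The early return in the listing above relies on the metadata page carrying a page id of -1 and appearing first in the header range. A minimal sketch of that check follows; `ToyHeader` and `starts_with_metadata_page` are hypothetical names, and the assumption that headers arrive sorted by page id is not stated on this page.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, reduced stand-in for HeaderInfo; the real struct has more fields.
struct ToyHeader {
  int page_id;
};

// Page id that marks a metadata page, per the comment in the listing.
constexpr int kMetadataPageId = -1;

// Returns true when the first header in a range (assumed sorted ascending by
// page id) is a metadata page. A chunk without one was never checkpointed.
bool starts_with_metadata_page(const std::vector<ToyHeader>& headers) {
  return !headers.empty() && headers.front().page_id == kMetadataPageId;
}
```

A chunk whose first header has a non-negative page id would be skipped, which matches the `nullptr` return in the listing.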

FileBuffer * File_Namespace::CachingFileMgr::createBufferUnlocked ( const ChunkKey & key, size_t pageSize = 0, const size_t numBytes = 0 ) [override, private, virtual]

Creates a buffer.

Reimplemented from File_Namespace::FileMgr.

Definition at line 258 of file CachingFileMgr.cpp.

References get_table_prefix().

260  {
261  touchKey(key);
262  auto [db_id, tb_id] = get_table_prefix(key);
263  createTableFileMgrIfNoneExists(db_id, tb_id);
264  return FileMgr::createBufferUnlocked(key, page_size, num_bytes);
265 }

void File_Namespace::CachingFileMgr::createTableFileMgrIfNoneExists ( const int32_t db_id, const int32_t tb_id ) [private]

Create and initialize a subdirectory for a table if none exists.

Definition at line 248 of file CachingFileMgr.cpp.

249  {
250  mapd_unique_lock<mapd_shared_mutex> write_lock(table_dirs_mutex_);
251  TablePair table_pair{db_id, tb_id};
252  if (table_dirs_.find(table_pair) == table_dirs_.end()) {
253  table_dirs_.emplace(
254  table_pair, std::make_unique<TableFileMgr>(getTableFileMgrPath(db_id, tb_id)));
255  }
256 }
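The create-if-absent pattern in the listing above can be sketched with standard library types. This is an illustration only: `ToyTableDirs` and `ToyTableDir` are hypothetical, and `std::shared_mutex` is assumed here as a stand-in for `mapd_shared_mutex`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <utility>

// Hypothetical stand-in for TableFileMgr.
struct ToyTableDir {
  int epoch = 0;
};

class ToyTableDirs {
 public:
  // Takes an exclusive lock and adds an entry only when the key is absent.
  // Returns true if a new entry was created.
  bool createIfNoneExists(int db_id, int tb_id) {
    std::unique_lock<std::shared_mutex> write_lock(mutex_);
    const std::pair<int, int> key{db_id, tb_id};
    if (dirs_.find(key) != dirs_.end()) {
      return false;
    }
    dirs_.emplace(key, std::make_unique<ToyTableDir>());
    return true;
  }

  // Lookups only need a shared lock, so readers do not block each other.
  bool contains(int db_id, int tb_id) const {
    std::shared_lock<std::shared_mutex> read_lock(mutex_);
    return dirs_.find({db_id, tb_id}) != dirs_.end();
  }

  std::size_t size() const {
    std::shared_lock<std::shared_mutex> read_lock(mutex_);
    return dirs_.size();
  }

 private:
  mutable std::shared_mutex mutex_;
  std::map<std::pair<int, int>, std::unique_ptr<ToyTableDir>> dirs_;
};
```

Calling `createIfNoneExists` twice with the same ids leaves a single entry, which is why the buffer-creation paths can call it unconditionally.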
void File_Namespace::CachingFileMgr::deleteBufferIfExists ( const ChunkKey & key )

deletes a buffer if it exists in the mgr. Otherwise do nothing.

Definition at line 384 of file CachingFileMgr.cpp.

384  {
385  mapd_unique_lock<mapd_shared_mutex> chunk_index_write_lock(chunkIndexMutex_);
386  auto chunk_it = chunkIndex_.find(key);
387  if (chunk_it != chunkIndex_.end()) {
388  deleteBufferUnlocked(chunk_it);
389  }
390 }
ChunkKeyToChunkMap::iterator File_Namespace::CachingFileMgr::deleteBufferUnlocked ( const ChunkKeyToChunkMap::iterator chunk_it, const bool purge = true ) [override, private, virtual]

Reimplemented from File_Namespace::FileMgr.

Definition at line 691 of file CachingFileMgr.cpp.

693  {
694  removeKey(chunk_it->first);
695  return FileMgr::deleteBufferUnlocked(chunk_it, purge);
696 }
void File_Namespace::CachingFileMgr::deleteCacheIfTooLarge ( ) [private]

When the cache is read from disk, we don't know which chunks were least recently used. Rather than try to evict random pages to get down to size we just reset the cache to make sure we have space.

Definition at line 403 of file CachingFileMgr.cpp.

References logger::INFO, LOG, and anonymous_namespace{CachingFileMgr.cpp}::size_of_dir().

Referenced by init().

403  {
406  bf::create_directory(fileMgrBasePath_);
407  LOG(INFO) << "Cache path over limit. Existing cache deleted.";
408  }
409 }
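The reset-if-oversized behaviour described above can be sketched with `std::filesystem`. This is a hedged illustration, not the code from CachingFileMgr.cpp: `dir_size_bytes` is a hypothetical stand-in for the referenced `size_of_dir()` helper, and `reset_dir_if_too_large` is a hypothetical name for the overall check.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Sums the sizes of all regular files under dir, recursing into subdirectories.
std::uintmax_t dir_size_bytes(const fs::path& dir) {
  std::uintmax_t total = 0;
  for (const auto& entry : fs::recursive_directory_iterator(dir)) {
    if (entry.is_regular_file()) {
      total += entry.file_size();
    }
  }
  return total;
}

// Deletes and recreates cache_dir when its contents exceed max_bytes.
// Returns true if the directory was reset.
bool reset_dir_if_too_large(const fs::path& cache_dir, std::uintmax_t max_bytes) {
  if (dir_size_bytes(cache_dir) <= max_bytes) {
    return false;
  }
  fs::remove_all(cache_dir);        // drop every cached file
  fs::create_directory(cache_dir);  // leave an empty directory behind
  return true;
}
```

Resetting the whole directory trades cached data for simplicity: no usage history survives a restart, so there is no principled way to choose which pages to evict.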

void File_Namespace::CachingFileMgr::deleteWrapperFile ( int32_t db, int32_t tb )

Deletes the wrapper file from a table subdir.

Definition at line 639 of file CachingFileMgr.cpp.

References CHECK.

639  {
640  mapd_shared_lock<mapd_shared_mutex> read_lock(table_dirs_mutex_);
641  auto it = table_dirs_.find({db, tb});
642  CHECK(it != table_dirs_.end());
643  it->second->deleteWrapperFile();
644 }
std::string File_Namespace::CachingFileMgr::describeSelf ( ) const [override, virtual]

describes this FileMgr for logging purposes.

Reimplemented from File_Namespace::FileMgr.

Definition at line 229 of file CachingFileMgr.cpp.

229  {
230  return "cache";
231 }
std::string File_Namespace::CachingFileMgr::dump ( ) const

Definition at line 55 of file CachingFileMgr.cpp.

References chunk_evict_alg_, File_Namespace::FileMgr::chunkIndex_, LRUEvictionAlgorithm::dumpEvictionQueue(), show_chunk(), and table_evict_alg_.

55  {
56  std::stringstream ss;
57  ss << "Dump Cache:\n";
58  for (const auto& [key, buf] : chunkIndex_) {
59  ss << " " << show_chunk(key) << " num_pages: " << buf->pageCount()
60  << ", is dirty: " << buf->isDirty() << "\n";
61  }
62  ss << "Data Eviction Queue:\n" << chunk_evict_alg_.dumpEvictionQueue();
63  ss << "Metadata Eviction Queue:\n" << table_evict_alg_.dumpEvictionQueue();
64  ss << "\n";
65  return ss.str();
66 }

std::string File_Namespace::CachingFileMgr::dumpEvictionQueue ( ) const [inline]

Definition at line 357 of file CachingFileMgr.h.

References chunk_evict_alg_, and LRUEvictionAlgorithm::dumpEvictionQueue().


std::string File_Namespace::CachingFileMgr::dumpKeysWithChunkData ( ) const

Definition at line 619 of file CachingFileMgr.cpp.

References show_chunk().

619  {
620  mapd_shared_lock<mapd_shared_mutex> read_lock(chunkIndexMutex_);
621  std::string ret_string = "CFM keys with chunk data:\n";
622  for (const auto& [key, buf] : chunkIndex_) {
623  if (buf->hasDataPages()) {
624  ret_string += " " + show_chunk(key) + "\n";
625  }
626  }
627  return ret_string;
628 }

std::string File_Namespace::CachingFileMgr::dumpKeysWithMetadata ( ) const

Definition at line 608 of file CachingFileMgr.cpp.

References show_chunk().

608  {
609  mapd_shared_lock<mapd_shared_mutex> read_lock(chunkIndexMutex_);
610  std::string ret_string = "CFM keys with metadata:\n";
611  for (const auto& [key, buf] : chunkIndex_) {
612  if (buf->hasEncoder()) {
613  ret_string += " " + show_chunk(key) + "\n";
614  }
615  }
616  return ret_string;
617 }
std::string show_chunk(const ChunkKey &key)
Definition: types.h:94
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:327
mapd_shared_lock< mapd_shared_mutex > read_lock
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:407


std::string File_Namespace::CachingFileMgr::dumpTableQueue ( ) const
inline

Definition at line 356 of file CachingFileMgr.h.

References LRUEvictionAlgorithm::dumpEvictionQueue(), and table_evict_alg_.

LRUEvictionAlgorithm table_evict_alg_


int32_t File_Namespace::CachingFileMgr::epoch ( int32_t  db_id,
int32_t  tb_id 
) const
override virtual

Obtains the epoch version for the given table.

Reimplemented from File_Namespace::FileMgr.

Definition at line 138 of file CachingFileMgr.cpp.

References CHECK, table_dirs_, and table_dirs_mutex_.

138  {
139  mapd_shared_lock<mapd_shared_mutex> read_lock(table_dirs_mutex_);
140  auto tables_it = table_dirs_.find({db_id, tb_id});
141  CHECK(tables_it != table_dirs_.end());
142  auto& [pair, table_dir] = *tables_it;
143  return table_dir->getEpoch();
144 }
mapd_shared_mutex table_dirs_mutex_
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
mapd_shared_lock< mapd_shared_mutex > read_lock
#define CHECK(condition)
Definition: Logger.h:211
FileInfo * File_Namespace::CachingFileMgr::evictMetadataPages ( )
private

Evicts all metadata pages for the least recently used table. Returns the first FileInfo from which a page was evicted, which is therefore guaranteed to have at least one free page.

Definition at line 463 of file CachingFileMgr.cpp.

References CHECK, anonymous_namespace{CachingFileMgr.cpp}::evict_chunk_or_fail(), and get_table_prefix().

463  {
464  // Locks should already be in place before calling this method.
465  FileInfo* file_info{nullptr};
466  auto key_to_evict = evict_chunk_or_fail(table_evict_alg_);
467  auto [db_id, tb_id] = get_table_prefix(key_to_evict);
468  const auto keys = getKeysForTable(db_id, tb_id);
469  for (const auto& key : keys) {
470  auto chunk_it = chunkIndex_.find(key);
471  CHECK(chunk_it != chunkIndex_.end());
472  auto& buf = chunk_it->second;
473  if (!file_info) {
474  // Return the FileInfo for the first file we are freeing a page from so that the
475  // caller does not have to search for a FileInfo guaranteed to have at least one
476  // free page.
477  CHECK(buf->getMetadataPage().pageVersions.size() > 0);
478  file_info =
479  getFileInfoForFileId(buf->getMetadataPage().pageVersions.front().page.fileId);
480  }
481  // We erase all pages and entries for the chunk, as without metadata all other
482  // entries are useless.
483  deleteBufferUnlocked(chunk_it);
484  }
485  // Serialized datawrappers require metadata to be in the cache.
486  deleteWrapperFile(db_id, tb_id);
487  CHECK(file_info) << "FileInfo with freed page not found";
488  return file_info;
489 }
LRUEvictionAlgorithm table_evict_alg_
void deleteWrapperFile(int32_t db, int32_t tb)
Deletes the wrapper file from a table subdir.
ChunkKeyToChunkMap::iterator deleteBufferUnlocked(const ChunkKeyToChunkMap::iterator chunk_it, const bool purge=true) override
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:327
ChunkKey evict_chunk_or_fail(LRUEvictionAlgorithm &alg)
std::vector< ChunkKey > getKeysForTable(int32_t db_id, int32_t tb_id) const
Returns the set of keys contained in chunkIndex_ that match the given table prefix.
std::pair< int, int > get_table_prefix(const ChunkKey &key)
Definition: types.h:58
#define CHECK(condition)
Definition: Logger.h:211
FileInfo * getFileInfoForFileId(const int32_t fileId) const
Definition: FileMgr.h:225


FileInfo * File_Namespace::CachingFileMgr::evictPages ( )
private

Evicts all data pages for the least recently used chunk (metadata pages persist). Returns the first FileInfo from which a page was evicted, which is therefore guaranteed to have at least one free page.

Definition at line 491 of file CachingFileMgr.cpp.

References CHECK, and anonymous_namespace{CachingFileMgr.cpp}::evict_chunk_or_fail().

491  {
492  FileInfo* file_info{nullptr};
493  FileBuffer* buf{nullptr};
494  while (!file_info) {
496  CHECK(buf);
497  if (!buf->hasDataPages()) {
498  // This buffer contains no chunk data (metadata only, uninitialized, size == 0,
499  // etc...) so we won't recover any space by evicting it. In this case it gets
500  // removed from the eviction queue (it will get re-added if it gets populated with
501  // data) and we look at the next chunk in queue until we find a buffer with page
502  // data.
503  continue;
504  }
505  // Return the FileInfo for the first file we are freeing a page from so that the
506  // caller does not have to search for a FileInfo guaranteed to have at least one free
507  // page.
508  CHECK(buf->getMultiPage().front().pageVersions.size() > 0);
509  file_info = getFileInfoForFileId(
510  buf->getMultiPage().front().pageVersions.front().page.fileId);
511  }
512  auto pages_freed = buf->freeChunkPages();
513  CHECK(pages_freed > 0) << "failed to evict a page";
514  CHECK(file_info) << "FileInfo with freed page not found";
515  return file_info;
516 }
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:327
ChunkKey evict_chunk_or_fail(LRUEvictionAlgorithm &alg)
#define CHECK(condition)
Definition: Logger.h:211
FileInfo * getFileInfoForFileId(const int32_t fileId) const
Definition: FileMgr.h:225
LRUEvictionAlgorithm chunk_evict_alg_
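
The eviction loop in evictPages() can be illustrated with a small stand-alone sketch. SimpleLru and evict_first_with_data below are hypothetical names invented for this example; they are not the real LRUEvictionAlgorithm or CachingFileMgr code, and the page bookkeeping is reduced to a plain map of page counts. The sketch only shows the general shape: pop the least recently used key, skip keys that hold no data pages, and stop at the first key whose eviction would actually free space.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <optional>
#include <vector>

using Key = std::vector<int>;  // stand-in for ChunkKey

// Hypothetical LRU queue; not the real LRUEvictionAlgorithm.
class SimpleLru {
 public:
  // Mark a key as most recently used, inserting it if it is new.
  void touch(const Key& key) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      queue_.erase(it->second);
    }
    queue_.push_front(key);
    index_[key] = queue_.begin();
  }

  // Pop the least recently used key, or return nothing if the queue is empty.
  std::optional<Key> evict() {
    if (queue_.empty()) {
      return std::nullopt;
    }
    Key victim = queue_.back();
    queue_.pop_back();
    index_.erase(victim);
    return victim;
  }

 private:
  std::list<Key> queue_;  // front = most recent, back = least recent
  std::map<Key, std::list<Key>::iterator> index_;
};

// Keep popping keys until one with data pages is found (assumed bookkeeping:
// `data_pages` maps each key to its number of data pages).
std::optional<Key> evict_first_with_data(SimpleLru& lru,
                                         const std::map<Key, int>& data_pages) {
  while (auto victim = lru.evict()) {
    auto it = data_pages.find(*victim);
    if (it != data_pages.end() && it->second > 0) {
      return victim;  // evicting this key frees at least one page
    }
    // Metadata-only or empty buffer: it stays out of the queue, try the next key.
  }
  return std::nullopt;
}
```

In this sketch an empty queue yields an empty optional, whereas the listing above relies on evict_chunk_or_fail() and CHECK to treat that case as a failure.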


bool File_Namespace::CachingFileMgr::failOnReadError ( ) const
inline override virtual

True if a read error should cause a fatal error.

Reimplemented from File_Namespace::FileMgr.

Definition at line 287 of file CachingFileMgr.h.

287 { return false; }
void File_Namespace::CachingFileMgr::free_page ( std::pair< FileInfo *, int32_t > &&  page)
override virtual

Unlike the FileMgr, the CFM frees pages immediately instead of holding them until the next checkpoint.

Reimplemented from File_Namespace::FileMgr.

Definition at line 713 of file CachingFileMgr.cpp.

713  {
714  page.first->freePageDeferred(page.second);
715 }
size_t File_Namespace::CachingFileMgr::getAllocated ( )
inline override

Definition at line 207 of file CachingFileMgr.h.

References getFilesSize(), and getTableFileMgrsSize().

Referenced by getAvailableSpace().

207  {
208  return getFilesSize() + getTableFileMgrsSize();
209  }
size_t getFilesSize() const
Gets the total size of page files (data and metadata files). This includes allocated but unused space.
size_t getTableFileMgrsSize() const
Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirector...


size_t File_Namespace::CachingFileMgr::getAvailableSpace ( )
inline

Definition at line 203 of file CachingFileMgr.h.

References getAllocated(), and max_size_.


size_t File_Namespace::CachingFileMgr::getAvailableWrapperSpace ( )
inline

Definition at line 204 of file CachingFileMgr.h.

References getTableFileMgrsSize(), and max_wrapper_space_.

204  {
206  }
size_t getTableFileMgrsSize() const
Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirector...


std::optional< FileBuffer * > File_Namespace::CachingFileMgr::getBufferIfExists ( const ChunkKey & key)

An optional variant of getBuffer for callers that are not sure the chunk exists. Returns an empty optional when the key is not in the chunk index.

Definition at line 682 of file CachingFileMgr.cpp.

682  {
683  mapd_shared_lock<mapd_shared_mutex> chunk_index_read_lock(chunkIndexMutex_);
684  auto chunk_it = chunkIndex_.find(key);
685  if (chunk_it == chunkIndex_.end()) {
686  return {};
687  }
688  return getBufferUnlocked(chunk_it);
689 }
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:327
FileBuffer * getBufferUnlocked(const ChunkKeyToChunkMap::iterator chunk_it, const size_t numBytes=0) override
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:407
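
The "empty optional on a miss" pattern used by getBufferIfExists() can be sketched in isolation. The Buffer struct and the free function below are hypothetical simplifications made up for this example (no locking, no LRU touch, a plain std::map as the index); they only show how a lookup can report absence without asserting.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

using Key = std::vector<int>;  // stand-in for ChunkKey

// Hypothetical buffer type standing in for FileBuffer.
struct Buffer {
  std::string payload;
};

// Return a pointer to the buffer for `key`, or an empty optional when the key
// is not indexed. The caller never has to handle a failed assertion.
std::optional<Buffer*> get_buffer_if_exists(std::map<Key, Buffer>& index,
                                            const Key& key) {
  auto it = index.find(key);
  if (it == index.end()) {
    return {};
  }
  return &it->second;
}
```

A caller would typically test the result with has_value() before dereferencing it.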
FileBuffer * File_Namespace::CachingFileMgr::getBufferUnlocked ( const ChunkKeyToChunkMap::iterator  chunk_it,
const size_t  numBytes = 0 
)
override private virtual

Reimplemented from File_Namespace::FileMgr.

Definition at line 707 of file CachingFileMgr.cpp.

708  {
709  touchKey(chunk_it->first);
710  return FileMgr::getBufferUnlocked(chunk_it, num_bytes);
711 }
void touchKey(const ChunkKey &key) const
Used to track which tables/chunks were least recently used.
virtual FileBuffer * getBufferUnlocked(const ChunkKeyToChunkMap::iterator chunk_it, const size_t numBytes=0)
Definition: FileMgr.cpp:780
std::vector< ChunkKey > File_Namespace::CachingFileMgr::getChunkKeysForPrefix ( const ChunkKey & prefix) const

Returns the keys for chunks with chunk data that match the given prefix.

Definition at line 570 of file CachingFileMgr.cpp.

References in_same_table().

571  {
572  mapd_shared_lock<mapd_shared_mutex> read_lock(chunkIndexMutex_);
573  std::vector<ChunkKey> chunks;
574  for (auto [key, buf] : chunkIndex_) {
575  if (in_same_table(key, prefix)) {
576  if (buf->hasDataPages()) {
577  chunks.emplace_back(key);
578  touchKey(key);
579  }
580  }
581  }
582  return chunks;
583 }
void touchKey(const ChunkKey &key) const
Used to track which tables/chunks were least recently used.
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:327
mapd_shared_lock< mapd_shared_mutex > read_lock
bool in_same_table(const ChunkKey &left_key, const ChunkKey &right_key)
Definition: types.h:79
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:407


void File_Namespace::CachingFileMgr::getChunkMetadataVecForKeyPrefix ( ChunkMetadataVector & chunkMetadataVec,
const ChunkKey & keyPrefix 
)
override

Definition at line 698 of file CachingFileMgr.cpp.

700  {
701  FileMgr::getChunkMetadataVecForKeyPrefix(chunkMetadataVec, keyPrefix);
702  for (const auto& [key, meta] : chunkMetadataVec) {
703  touchKey(key);
704  }
705 }
void touchKey(const ChunkKey &key) const
Used to track which tables/chunks were least recently used.
void getChunkMetadataVecForKeyPrefix(ChunkMetadataVector &chunkMetadataVec, const ChunkKey &keyPrefix) override
Definition: FileMgr.cpp:986
size_t File_Namespace::CachingFileMgr::getChunkSpaceReservedByTable ( int32_t  db_id,
int32_t  tb_id 
) const

Set of functions to determine how much space is reserved in a table by type.

Definition at line 183 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::chunkIndex_, File_Namespace::FileMgr::chunkIndexMutex_, and File_Namespace::FileMgr::defaultPageSize_.

Referenced by getSpaceReservedByTable().

183  {
184  mapd_shared_lock<mapd_shared_mutex> read_lock(chunkIndexMutex_);
185  size_t space_used = 0;
186  ChunkKey min_table_key{db_id, tb_id};
187  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
188  for (auto it = chunkIndex_.lower_bound(min_table_key);
189  it != chunkIndex_.upper_bound(max_table_key);
190  ++it) {
191  auto& [key, buffer] = *it;
192  space_used += (buffer->numChunkPages() * defaultPageSize_);
193  }
194  return space_used;
195 }
std::vector< int > ChunkKey
Definition: types.h:37
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:327
size_t defaultPageSize_
Definition: FileMgr.h:399
mapd_shared_lock< mapd_shared_mutex > read_lock
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:407
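
The table-range scan used above (and again in getMetadataSpaceReservedByTable(), getKeysForTable() and removeTableBuffers()) relies on the lexicographic ordering of std::vector<int> keys in an ordered map. The following is a minimal sketch of that idea, assuming a hypothetical map from key to page count rather than the real ChunkKeyToChunkMap; space_for_table is a name invented for this example.

```cpp
#include <cstddef>
#include <limits>
#include <map>
#include <vector>

using Key = std::vector<int>;  // stand-in for ChunkKey

// Sum pages * page_size over every key that starts with {db_id, tb_id}.
// {db_id, tb_id} sorts before any longer key with the same prefix, and
// {db_id, tb_id, INT_MAX} sorts after keys whose third element is smaller.
std::size_t space_for_table(const std::map<Key, std::size_t>& pages_by_key,
                            int db_id,
                            int tb_id,
                            std::size_t page_size) {
  const Key min_key{db_id, tb_id};
  const Key max_key{db_id, tb_id, std::numeric_limits<int>::max()};
  std::size_t total = 0;
  const auto end = pages_by_key.upper_bound(max_key);
  for (auto it = pages_by_key.lower_bound(min_key); it != end; ++it) {
    total += it->second * page_size;
  }
  return total;
}
```

Because the map is ordered, the scan visits only the keys of the requested table instead of the whole index.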


size_t File_Namespace::CachingFileMgr::getDataFileSize ( ) const
inline

Definition at line 194 of file CachingFileMgr.h.

References File_Namespace::FileMgr::defaultPageSize_, and File_Namespace::FileMgr::num_pages_per_data_file_.

194  {
196  }
static size_t num_pages_per_data_file_
Definition: FileMgr.h:414
size_t defaultPageSize_
Definition: FileMgr.h:399
size_t File_Namespace::CachingFileMgr::getDefaultPageSize ( )
inline

Definition at line 189 of file CachingFileMgr.h.

References File_Namespace::FileMgr::defaultPageSize_.

189 { return defaultPageSize_; }
size_t defaultPageSize_
Definition: FileMgr.h:399
size_t File_Namespace::CachingFileMgr::getFilesSize ( ) const

Gets the total size of page files (data and metadata files). This includes allocated but unused space.

Definition at line 542 of file CachingFileMgr.cpp.

Referenced by getAllocated().

542  {
543  mapd_shared_lock<mapd_shared_mutex> read_lock(files_rw_mutex_);
544  size_t sum = 0;
545  for (auto [id, file] : files_) {
546  sum += file->size();
547  }
548  return sum;
549 }
mapd_shared_lock< mapd_shared_mutex > read_lock
std::map< int32_t, FileInfo * > files_
Definition: FileMgr.h:396
mapd_shared_mutex files_rw_mutex_
Definition: FileMgr.h:408


std::vector< ChunkKey > File_Namespace::CachingFileMgr::getKeysForTable ( int32_t  db_id,
int32_t  tb_id 
) const
private

Returns the set of keys contained in chunkIndex_ that match the given table prefix.

Definition at line 450 of file CachingFileMgr.cpp.

451  {
452  std::vector<ChunkKey> keys;
453  ChunkKey min_table_key{db_id, tb_id};
454  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
455  for (auto it = chunkIndex_.lower_bound(min_table_key);
456  it != chunkIndex_.upper_bound(max_table_key);
457  ++it) {
458  keys.emplace_back(it->first);
459  }
460  return keys;
461 }
std::vector< int > ChunkKey
Definition: types.h:37
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:327
std::set< ChunkKey > File_Namespace::CachingFileMgr::getKeysWithMetadata ( ) const

Definition at line 717 of file CachingFileMgr.cpp.

717  {
718  mapd_shared_lock<mapd_shared_mutex> read_lock(chunkIndexMutex_);
719  std::set<ChunkKey> ret;
720  for (const auto& [key, buf] : chunkIndex_) {
721  if (buf->hasEncoder()) {
722  ret.emplace(key);
723  }
724  }
725  return ret;
726 }
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:327
mapd_shared_lock< mapd_shared_mutex > read_lock
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:407
size_t File_Namespace::CachingFileMgr::getMaxDataFiles ( ) const
inline

Definition at line 191 of file CachingFileMgr.h.

References max_num_data_files_.

size_t File_Namespace::CachingFileMgr::getMaxDataFilesSize ( ) const

Definition at line 728 of file CachingFileMgr.cpp.

728  {
729  if (limit_data_size_) {
730  return *limit_data_size_;
731  }
732  return getMaxDataFiles() * getDataFileSize();
733 }
std::optional< size_t > limit_data_size_
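
The "explicit limit if set, otherwise computed default" logic in getMaxDataFilesSize() maps directly onto std::optional::value_or. The free function below is a hypothetical restatement written for this example, with the member values passed in as parameters.

```cpp
#include <cstddef>
#include <optional>

// Use the explicit limit when one is set; otherwise fall back to the product
// of the maximum file count and the per-file size.
std::size_t max_data_files_size(std::optional<std::size_t> limit,
                                std::size_t max_files,
                                std::size_t file_size) {
  return limit.value_or(max_files * file_size);
}
```

Note that value_or evaluates its argument even when a limit is present, which is harmless for a simple multiplication like this one.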
size_t File_Namespace::CachingFileMgr::getMaxMetaFiles ( ) const
inline

Definition at line 192 of file CachingFileMgr.h.

References max_num_meta_files_.

size_t File_Namespace::CachingFileMgr::getMaxSize ( )
inline override

Definition at line 190 of file CachingFileMgr.h.

References max_size_.

190 { return max_size_; }
size_t File_Namespace::CachingFileMgr::getMaxWrapperSize ( ) const
inline

Definition at line 193 of file CachingFileMgr.h.

References max_wrapper_space_.

size_t File_Namespace::CachingFileMgr::getMetadataFileSize ( ) const
inline

Definition at line 197 of file CachingFileMgr.h.

References METADATA_PAGE_SIZE, and File_Namespace::FileMgr::num_pages_per_metadata_file_.

197  {
199  }
#define METADATA_PAGE_SIZE
Definition: FileBuffer.h:37
static size_t num_pages_per_metadata_file_
Definition: FileMgr.h:415
size_t File_Namespace::CachingFileMgr::getMetadataSpaceReservedByTable ( int32_t  db_id,
int32_t  tb_id 
) const

Definition at line 197 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::chunkIndex_, File_Namespace::FileMgr::chunkIndexMutex_, and METADATA_PAGE_SIZE.

Referenced by getSpaceReservedByTable().

198  {
199  mapd_shared_lock<mapd_shared_mutex> read_lock(chunkIndexMutex_);
200  size_t space_used = 0;
201  ChunkKey min_table_key{db_id, tb_id};
202  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
203  for (auto it = chunkIndex_.lower_bound(min_table_key);
204  it != chunkIndex_.upper_bound(max_table_key);
205  ++it) {
206  auto& [key, buffer] = *it;
207  space_used += (buffer->numMetadataPages() * METADATA_PAGE_SIZE);
208  }
209  return space_used;
210 }
#define METADATA_PAGE_SIZE
Definition: FileBuffer.h:37
std::vector< int > ChunkKey
Definition: types.h:37
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:327
mapd_shared_lock< mapd_shared_mutex > read_lock
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:407


MgrType File_Namespace::CachingFileMgr::getMgrType ( )
inline override

Definition at line 187 of file CachingFileMgr.h.

187 { return CACHING_FILE_MGR; };
static size_t File_Namespace::CachingFileMgr::getMinimumSize ( )
inline static

Definition at line 175 of file CachingFileMgr.h.

References File_Namespace::FileMgr::DEFAULT_NUM_PAGES_PER_METADATA_FILE, METADATA_FILE_SPACE_PERCENTAGE, and METADATA_PAGE_SIZE.

Referenced by CommandLineOptions::validate().

175  {
176  // Currently the minimum default size is based on the metadata file size and
177  // percentage usage.
180  }
#define METADATA_PAGE_SIZE
Definition: FileBuffer.h:37
static constexpr size_t DEFAULT_NUM_PAGES_PER_METADATA_FILE
Definition: FileMgr.h:370
static constexpr float METADATA_FILE_SPACE_PERCENTAGE


size_t File_Namespace::CachingFileMgr::getNumChunksWithMetadata ( ) const

Returns the number of buffers with metadata in the CFM. Any buffer with an encoder counts.

Definition at line 597 of file CachingFileMgr.cpp.

597  {
598  mapd_shared_lock<mapd_shared_mutex> read_lock(chunkIndexMutex_);
599  size_t sum = 0;
600  for (const auto& [key, buf] : chunkIndex_) {
601  if (buf->hasEncoder()) {
602  sum++;
603  }
604  }
605  return sum;
606 }
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:327
mapd_shared_lock< mapd_shared_mutex > read_lock
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:407
size_t File_Namespace::CachingFileMgr::getNumDataChunks ( ) const

Returns the number of buffers with chunk data in the CFM.

Definition at line 392 of file CachingFileMgr.cpp.

392  {
393  mapd_shared_lock<mapd_shared_mutex> read_lock(chunkIndexMutex_);
394  size_t num_chunks = 0;
395  for (auto [key, buf] : chunkIndex_) {
396  if (buf->hasDataPages()) {
397  num_chunks++;
398  }
399  }
400  return num_chunks;
401 }
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:327
mapd_shared_lock< mapd_shared_mutex > read_lock
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:407
size_t File_Namespace::CachingFileMgr::getNumDataFiles ( ) const

Definition at line 560 of file CachingFileMgr.cpp.

560  {
561  mapd_shared_lock<mapd_shared_mutex> read_lock(files_rw_mutex_);
562  return fileIndex_.count(defaultPageSize_);
563 }
PageSizeFileMMap fileIndex_
Maps page sizes to FileInfo objects.
Definition: FileMgr.h:397
size_t defaultPageSize_
Definition: FileMgr.h:399
mapd_shared_lock< mapd_shared_mutex > read_lock
mapd_shared_mutex files_rw_mutex_
Definition: FileMgr.h:408
size_t File_Namespace::CachingFileMgr::getNumMetaFiles ( ) const

Definition at line 565 of file CachingFileMgr.cpp.

References METADATA_PAGE_SIZE.

565  {
566  mapd_shared_lock<mapd_shared_mutex> read_lock(files_rw_mutex_);
567  return fileIndex_.count(METADATA_PAGE_SIZE);
568 }
#define METADATA_PAGE_SIZE
Definition: FileBuffer.h:37
PageSizeFileMMap fileIndex_
Maps page sizes to FileInfo objects.
Definition: FileMgr.h:397
mapd_shared_lock< mapd_shared_mutex > read_lock
mapd_shared_mutex files_rw_mutex_
Definition: FileMgr.h:408
size_t File_Namespace::CachingFileMgr::getSpaceReservedByTable ( int32_t  db_id,
int32_t  tb_id 
) const

Definition at line 222 of file CachingFileMgr.cpp.

References getChunkSpaceReservedByTable(), getMetadataSpaceReservedByTable(), and getTableFileMgrSpaceReserved().

222  {
223  auto chunk_space = getChunkSpaceReservedByTable(db_id, tb_id);
224  auto meta_space = getMetadataSpaceReservedByTable(db_id, tb_id);
225  auto subdir_space = getTableFileMgrSpaceReserved(db_id, tb_id);
226  return chunk_space + meta_space + subdir_space;
227 }
size_t getTableFileMgrSpaceReserved(int32_t db_id, int32_t tb_id) const
size_t getMetadataSpaceReservedByTable(int32_t db_id, int32_t tb_id) const
size_t getChunkSpaceReservedByTable(int32_t db_id, int32_t tb_id) const


std::string File_Namespace::CachingFileMgr::getStringMgrType ( )
inline override

Definition at line 188 of file CachingFileMgr.h.

188 { return ToString(CACHING_FILE_MGR); }
std::string File_Namespace::CachingFileMgr::getTableFileMgrPath ( int32_t  db,
int32_t  tb 
) const

Definition at line 167 of file CachingFileMgr.cpp.

References File_Namespace::get_dir_name_for_table(), and File_Namespace::FileMgr::getFileMgrBasePath().

167  {
168  return getFileMgrBasePath() + "/" + get_dir_name_for_table(db_id, tb_id);
169 }
std::string get_dir_name_for_table(int db_id, int tb_id)
std::string getFileMgrBasePath() const
Definition: FileMgr.h:332


size_t File_Namespace::CachingFileMgr::getTableFileMgrSpaceReserved ( int32_t  db_id,
int32_t  tb_id 
) const

Definition at line 212 of file CachingFileMgr.cpp.

References table_dirs_, and table_dirs_mutex_.

Referenced by getSpaceReservedByTable().

212  {
213  mapd_shared_lock<mapd_shared_mutex> read_lock(table_dirs_mutex_);
214  size_t space = 0;
215  auto table_it = table_dirs_.find({db_id, tb_id});
216  if (table_it != table_dirs_.end()) {
217  space += table_it->second->getReservedSpace();
218  }
219  return space;
220 }
mapd_shared_mutex table_dirs_mutex_
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
mapd_shared_lock< mapd_shared_mutex > read_lock


size_t File_Namespace::CachingFileMgr::getTableFileMgrsSize ( ) const

Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirectory for serialized data wrappers and epoch files.

Definition at line 551 of file CachingFileMgr.cpp.

Referenced by getAllocated(), and getAvailableWrapperSpace().

551  {
552  mapd_shared_lock<mapd_shared_mutex> read_lock(table_dirs_mutex_);
553  size_t space_used = 0;
554  for (const auto& [pair, table_dir] : table_dirs_) {
555  space_used += table_dir->getReservedSpace();
556  }
557  return space_used;
558 }
mapd_shared_mutex table_dirs_mutex_
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
mapd_shared_lock< mapd_shared_mutex > read_lock


bool File_Namespace::CachingFileMgr::hasFileMgrKey ( ) const
inline override virtual

Query to determine if the contained pages will have their database and table ids overridden by the filemgr key (FileMgr does this).

Reimplemented from File_Namespace::FileMgr.

Definition at line 226 of file CachingFileMgr.h.

226 { return false; }
void File_Namespace::CachingFileMgr::incrementAllEpochs ( )
private

Increment epochs for each table in the CFM.

Definition at line 306 of file CachingFileMgr.cpp.

Referenced by init().

306  {
307  mapd_shared_lock<mapd_shared_mutex> read_lock(table_dirs_mutex_);
308  for (auto& table_dir : table_dirs_) {
309  table_dir.second->incrementEpoch();
310  }
311 }
mapd_shared_mutex table_dirs_mutex_
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
mapd_shared_lock< mapd_shared_mutex > read_lock


void File_Namespace::CachingFileMgr::incrementEpoch ( int32_t  db_id,
int32_t  tb_id 
)
private

Increments epoch for the given table.

Definition at line 146 of file CachingFileMgr.cpp.

References CHECK, table_dirs_, and table_dirs_mutex_.

146  {
147  mapd_shared_lock<mapd_shared_mutex> read_lock(table_dirs_mutex_);
148  auto tables_it = table_dirs_.find({db_id, tb_id});
149  CHECK(tables_it != table_dirs_.end());
150  auto& [pair, table_dir] = *tables_it;
151  table_dir->incrementEpoch();
152 }
mapd_shared_mutex table_dirs_mutex_
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
mapd_shared_lock< mapd_shared_mutex > read_lock
#define CHECK(condition)
Definition: Logger.h:211
void File_Namespace::CachingFileMgr::init ( const size_t  num_reader_threads)
private

Initializes a CFM, parsing any existing files and initializing data structures appropriately (currently not thread-safe).

Definition at line 80 of file CachingFileMgr.cpp.

References createBufferFromHeaders(), deleteCacheIfTooLarge(), File_Namespace::FileMgr::freePages(), incrementAllEpochs(), File_Namespace::FileMgr::initializeNumThreads(), File_Namespace::FileMgr::isFullyInitted_, File_Namespace::FileMgr::nextFileId_, File_Namespace::FileMgr::openFiles(), readTableFileMgrs(), gpu_enabled::sort(), and VLOG.

Referenced by CachingFileMgr().

80  {
83  auto open_files_result = openFiles();
84  /* Sort headerVec so that all HeaderInfos
85  * from a chunk will be grouped together
86  * and in order of increasing PageId
87  * - Version Epoch */
88  auto& header_vec = open_files_result.header_infos;
89  std::sort(header_vec.begin(), header_vec.end());
90 
91  /* Goal of next section is to find sequences in the
92  * sorted headerVec of the same ChunkId, which we
93  * can then initiate a FileBuffer with */
94  VLOG(3) << "Number of Headers in Vector: " << header_vec.size();
95  if (header_vec.size() > 0) {
96  auto startIt = header_vec.begin();
97  ChunkKey lastChunkKey = startIt->chunkKey;
98  for (auto it = header_vec.begin() + 1; it != header_vec.end(); ++it) {
99  if (it->chunkKey != lastChunkKey) {
100  createBufferFromHeaders(lastChunkKey, startIt, it);
101  lastChunkKey = it->chunkKey;
102  startIt = it;
103  }
104  }
105  createBufferFromHeaders(lastChunkKey, startIt, header_vec.end());
106  }
107  nextFileId_ = open_files_result.max_file_id + 1;
109  freePages();
110  initializeNumThreads(num_reader_threads);
111  isFullyInitted_ = true;
112 }
std::vector< int > ChunkKey
Definition: types.h:37
OpenFilesResult openFiles()
Definition: FileMgr.cpp:189
DEVICE void sort(ARGS &&...args)
Definition: gpu_enabled.h:105
void deleteCacheIfTooLarge()
When the cache is read from disk, we don't know which chunks were least recently used. Rather than try to evict random pages to get down to size, we just reset the cache to make sure we have space.
void incrementAllEpochs()
Increment epochs for each table in the CFM.
void readTableFileMgrs()
Checks for any sub-directories containing table-specific data and creates epochs from found files...
void initializeNumThreads(size_t num_reader_threads=0)
Definition: FileMgr.cpp:1529
#define VLOG(n)
Definition: Logger.h:305
FileBuffer * createBufferFromHeaders(const ChunkKey &key, const std::vector< HeaderInfo >::const_iterator &startIt, const std::vector< HeaderInfo >::const_iterator &endIt) override
Creates a buffer and initializes it with info read from files on disk.
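
The grouping step in init(), which sorts the headers and then hands each run of equal chunk keys to createBufferFromHeaders(), can be sketched on its own. Header and group_by_chunk below are hypothetical simplifications invented for this example; the real HeaderInfo carries more fields and its own ordering. The sketch returns one (key, page count) pair per run instead of building buffers.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

using Key = std::vector<int>;  // stand-in for ChunkKey

// Hypothetical, simplified stand-in for HeaderInfo.
struct Header {
  Key chunk_key;
  int page_id;
};

// Sort headers by (chunk_key, page_id), then emit one entry per run of
// identical chunk keys together with the number of headers in that run.
std::vector<std::pair<Key, std::size_t>> group_by_chunk(std::vector<Header> headers) {
  std::sort(headers.begin(), headers.end(), [](const Header& a, const Header& b) {
    return std::tie(a.chunk_key, a.page_id) < std::tie(b.chunk_key, b.page_id);
  });
  std::vector<std::pair<Key, std::size_t>> runs;
  auto start = headers.begin();
  while (start != headers.end()) {
    // Find the first header that belongs to a different chunk.
    auto stop = std::find_if(start, headers.end(), [&](const Header& h) {
      return h.chunk_key != start->chunk_key;
    });
    runs.emplace_back(start->chunk_key, static_cast<std::size_t>(stop - start));
    start = stop;
  }
  return runs;
}
```

Sorting first is what makes a single linear pass sufficient: all headers of one chunk end up adjacent, so each run boundary is a key change.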


FileBuffer * File_Namespace::CachingFileMgr::putBuffer ( const ChunkKey & key,
AbstractBuffer * src_buffer,
const size_t  num_bytes = 0 
)
override

Deletes any existing buffer for the given key, then copies in a new one.

putBuffer() needs to behave differently than it does in FileMgr. Specifically, it needs to delete the buffer beforehand and then append, rather than overwrite the existing buffer. This way we only store a single version of the buffer rather than accumulating versions that need to be rolled off.

Definition at line 294 of file CachingFileMgr.cpp.

References CHECK, Data_Namespace::AbstractBuffer::isDirty(), Data_Namespace::AbstractBuffer::setAppended(), Data_Namespace::AbstractBuffer::setDirty(), and Data_Namespace::AbstractBuffer::size().

296  {
297  CHECK(!src_buffer->isDirty()) << "Cannot cache dirty buffers.";
299  // Since the buffer is not dirty we mark it as dirty if we are only writing metadata and
300  // appended if we are writing chunk data. We delete + append rather than write to make
301  // sure we don't write multiple page versions.
302  (src_buffer->size() == 0) ? src_buffer->setDirty() : src_buffer->setAppended();
303  return FileMgr::putBuffer(key, src_buffer, num_bytes);
304 }
void deleteBufferIfExists(const ChunkKey &key)
Deletes a buffer if it exists in the mgr; otherwise does nothing.
#define CHECK(condition)
Definition: Logger.h:211
FileBuffer * putBuffer(const ChunkKey &key, AbstractBuffer *d, const size_t numBytes=0) override
Puts the contents of d into the Chunk with the given key.
Definition: FileMgr.cpp:806


void File_Namespace::CachingFileMgr::readTableFileMgrs ( )
private

Checks for any sub-directories containing table-specific data and creates epochs from found files.

Definition at line 114 of file CachingFileMgr.cpp.

References CHECK, File_Namespace::FileMgr::fileMgrBasePath_, table_dirs_, and table_dirs_mutex_.

Referenced by init().

114  {
115  mapd_unique_lock<mapd_shared_mutex> write_lock(table_dirs_mutex_);
116  bf::path path(fileMgrBasePath_);
117  CHECK(bf::exists(path)) << "Cache path: " << fileMgrBasePath_ << " does not exit.";
118  CHECK(bf::is_directory(path))
119  << "Specified path '" << fileMgrBasePath_ << "' for disk cache is not a directory.";
120 
121  // Look for directories with table-specific names.
122  boost::regex table_filter("table_([0-9]+)_([0-9]+)");
123  for (const auto& file : bf::directory_iterator(path)) {
124  boost::smatch match;
125  auto file_name = file.path().filename().string();
126  if (boost::regex_match(file_name, match, table_filter)) {
127  int32_t db_id = std::stoi(match[1]);
128  int32_t tb_id = std::stoi(match[2]);
129  TablePair table_pair{db_id, tb_id};
130  CHECK(table_dirs_.find(table_pair) == table_dirs_.end())
131  << "Trying to read data for existing table";
132  table_dirs_.emplace(table_pair,
133  std::make_unique<TableFileMgr>(file.path().string()));
134  }
135  }
136 }
mapd_shared_mutex table_dirs_mutex_
std::string fileMgrBasePath_
Definition: FileMgr.h:393
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
#define CHECK(condition)
Definition: Logger.h:211
mapd_unique_lock< mapd_shared_mutex > write_lock
std::pair< const int32_t, const int32_t > TablePair
Definition: FileMgr.h:92
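
The directory-name matching in readTableFileMgrs() can be tried out in isolation. The sketch below swaps in std::regex from the standard library in place of boost::regex so that it has no external dependencies; parse_table_dir is a hypothetical helper name invented for this example. It uses the same pattern as the listing and returns the (db_id, tb_id) pair, or an empty optional for names that do not match.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Parse a directory name of the form "table_<db_id>_<tb_id>".
// regex_match requires the whole name to match, so trailing text is rejected.
std::optional<std::pair<int, int>> parse_table_dir(const std::string& name) {
  static const std::regex table_filter("table_([0-9]+)_([0-9]+)");
  std::smatch match;
  if (!std::regex_match(name, match, table_filter)) {
    return std::nullopt;
  }
  // std::stoi throws std::out_of_range if the digits do not fit in an int.
  return std::make_pair(std::stoi(match[1].str()), std::stoi(match[2].str()));
}
```

Directory entries that do not follow the naming scheme are simply skipped by the caller, which matches how the loop above ignores non-matching names.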


std::unique_ptr< CachingFileMgr > File_Namespace::CachingFileMgr::reconstruct ( ) const

Initializes a new CFM using the initialization values in the current CFM.

Definition at line 630 of file CachingFileMgr.cpp.

630  {
631  DiskCacheConfig config{fileMgrBasePath_,
634  max_size_,
636  return std::make_unique<CachingFileMgr>(config);
637 }
std::string fileMgrBasePath_
Definition: FileMgr.h:393
size_t num_reader_threads_
Number of threads used when loading data.
Definition: FileMgr.h:398
size_t defaultPageSize_
Definition: FileMgr.h:399
void File_Namespace::CachingFileMgr::removeChunkKeepMetadata ( const ChunkKey & key)

Free pages for chunk and remove it from the chunk eviction algorithm.

Definition at line 585 of file CachingFileMgr.cpp.

References CHECK.

585  {
586  if (isBufferOnDevice(key)) {
587  auto chunkIt = chunkIndex_.find(key);
588  CHECK(chunkIt != chunkIndex_.end());
589  auto& buf = chunkIt->second;
590  if (buf->hasDataPages()) {
591  buf->freeChunkPages();
593  }
594  }
595 }
void removeChunk(const ChunkKey &) override
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:327
bool isBufferOnDevice(const ChunkKey &key) override
Definition: FileMgr.cpp:736
#define CHECK(condition)
Definition: Logger.h:211
LRUEvictionAlgorithm chunk_evict_alg_
void File_Namespace::CachingFileMgr::removeKey ( const ChunkKey & key) const
private

Definition at line 523 of file CachingFileMgr.cpp.

References get_table_prefix().

523  {
524  // chunkIndex lock should already be acquired.
526  auto [db_id, tb_id] = get_table_prefix(key);
527  ChunkKey table_key{db_id, tb_id};
528  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
529  for (auto it = chunkIndex_.lower_bound(table_key);
530  it != chunkIndex_.upper_bound(max_table_key);
531  ++it) {
532  if (it->first != key) {
533  // If there are any keys in this table other than the one we are removing, then
534  // keep the table in the eviction queue.
535  return;
536  }
537  }
538  // No other keys exist for this table, so remove it from the queue.
539  table_evict_alg_.removeChunk(table_key);
540 }
std::vector< int > ChunkKey
Definition: types.h:37
LRUEvictionAlgorithm table_evict_alg_
void removeChunk(const ChunkKey &) override
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:327
std::pair< int, int > get_table_prefix(const ChunkKey &key)
Definition: types.h:58
LRUEvictionAlgorithm chunk_evict_alg_


void File_Namespace::CachingFileMgr::removeTableBuffers ( int32_t  db_id,
int32_t  tb_id 
)
private

Erases and cleans up all buffers for a table.

Definition at line 323 of file CachingFileMgr.cpp.

Referenced by clearForTable().

323  {
324  // Free associated FileBuffers and clear buffer entries.
325  mapd_unique_lock<mapd_shared_mutex> write_lock(chunkIndexMutex_);
326  ChunkKey min_table_key{db_id, tb_id};
327  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
328  for (auto it = chunkIndex_.lower_bound(min_table_key);
329  it != chunkIndex_.upper_bound(max_table_key);) {
330  it = deleteBufferUnlocked(it);
331  }
332 }
std::vector< int > ChunkKey
Definition: types.h:37
ChunkKeyToChunkMap::iterator deleteBufferUnlocked(const ChunkKeyToChunkMap::iterator chunk_it, const bool purge=true) override
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:327
mapd_unique_lock< mapd_shared_mutex > write_lock
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:407

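The listing above selects all buffers of one table by scanning an ordered map between the keys `{db_id, tb_id}` and `{db_id, tb_id, INT32_MAX}`. This works because `std::vector<int>` keys compare lexicographically, so every chunk key that starts with the table prefix sorts inside that range. The following is a minimal sketch of the same range scan, not the OmniSciDB implementation: `ToyChunkIndex` and `erase_table_range` are hypothetical names, and the map stores strings instead of file buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <vector>

using ChunkKey = std::vector<int>;  // same shape as the ChunkKey typedef (types.h)

// Hypothetical stand-in for ChunkKeyToChunkMap: values are labels, not FileBuffers.
using ToyChunkIndex = std::map<ChunkKey, std::string>;

// Erases every entry whose key starts with {db_id, tb_id}.
// Returns the number of erased entries.
std::size_t erase_table_range(ToyChunkIndex& index, int db_id, int tb_id) {
  const ChunkKey min_table_key{db_id, tb_id};
  const ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int>::max()};
  std::size_t erased = 0;
  // The upper bound lies outside the erased range, so it stays valid:
  // std::map::erase only invalidates iterators to the erased element.
  const auto last = index.upper_bound(max_table_key);
  for (auto it = index.lower_bound(min_table_key); it != last;) {
    it = index.erase(it);  // erase returns the iterator to the next element
    ++erased;
  }
  return erased;
}
```

The loop advances only through the iterator returned by `erase`, which mirrors how the listing reassigns `it` from `deleteBufferUnlocked(it)` instead of incrementing it.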

void File_Namespace::CachingFileMgr::removeTableFileMgr ( int32_t  db_id,
int32_t  tb_id 
)
private

Removes the subdirectory content for a table.

Definition at line 313 of file CachingFileMgr.cpp.

Referenced by clearForTable().

313  {
314  // Delete table-specific directory (stores table epoch data and serialized data wrapper)
315  mapd_unique_lock<mapd_shared_mutex> write_lock(table_dirs_mutex_);
316  auto it = table_dirs_.find({db_id, tb_id});
317  if (it != table_dirs_.end()) {
318  it->second->removeDiskContent();
319  table_dirs_.erase(it);
320  }
321 }
mapd_shared_mutex table_dirs_mutex_
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
mapd_unique_lock< mapd_shared_mutex > write_lock


Page File_Namespace::CachingFileMgr::requestFreePage ( size_t  pagesize,
const bool  isMetadata 
)
override private virtual

Requests a free page as FileMgr does, but this override also evicts existing pages to make space when none are available.

Reimplemented from File_Namespace::FileMgr.

Definition at line 411 of file CachingFileMgr.cpp.

References CHECK, File_Namespace::FileInfo::fileId, and File_Namespace::FileInfo::getFreePage().

411  {
412  std::lock_guard<std::mutex> lock(getPageMutex_);
413  int32_t pageNum = -1;
414  // Splits files into metadata and regular data by size.
415  auto candidateFiles = fileIndex_.equal_range(pageSize);
416  // Check if there is a free page in an existing file.
417  for (auto fileIt = candidateFiles.first; fileIt != candidateFiles.second; ++fileIt) {
418  FileInfo* fileInfo = files_.at(fileIt->second);
419  pageNum = fileInfo->getFreePage();
420  if (pageNum != -1) {
421  return (Page(fileInfo->fileId, pageNum));
422  }
423  }
424 
425  // Try to add a new file if there is free space available.
426  FileInfo* fileInfo = nullptr;
427  if (isMetadata) {
428  if (getMaxMetaFiles() > getNumMetaFiles()) {
429  fileInfo = createFile(pageSize, num_pages_per_metadata_file_);
430  }
431  } else {
432  if (getMaxDataFiles() > getNumDataFiles()) {
433  fileInfo = createFile(pageSize, num_pages_per_data_file_);
434  }
435  }
436 
437  if (!fileInfo) {
438  // We were not able to create a new file, so we try to evict space.
439  // Eviction will return the first file it evicted a page from (a file now guaranteed
440  // to have a free page).
441  fileInfo = isMetadata ? evictMetadataPages() : evictPages();
442  }
443  CHECK(fileInfo);
444 
445  pageNum = fileInfo->getFreePage();
446  CHECK(pageNum != -1);
447  return (Page(fileInfo->fileId, pageNum));
448 }
std::mutex getPageMutex_
pointer to DB level metadata
Definition: FileMgr.h:406
static size_t num_pages_per_data_file_
Definition: FileMgr.h:414
FileInfo * evictPages()
evicts all data pages for the least recently used Chunk (metadata pages persist). Returns the first F...
PageSizeFileMMap fileIndex_
A map of files accessible via a file identifier.
Definition: FileMgr.h:397
static size_t num_pages_per_metadata_file_
Definition: FileMgr.h:415
FileInfo * evictMetadataPages()
evicts all metadata pages for the least recently used table. Returns the first FileInfo that a page w...
FileInfo * createFile(const size_t pageSize, const size_t numPages)
Adds a file to the file manager repository.
Definition: FileMgr.cpp:952
std::map< int32_t, FileInfo * > files_
Definition: FileMgr.h:396
#define CHECK(condition)
Definition: Logger.h:211

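The listing above follows a three-stage policy: reuse a free page in an existing file, otherwise create a new file if the file limit allows, otherwise evict to free a page. The toy model below sketches that control flow under simplifying assumptions. `ToyPage` and `ToyPageCache` are hypothetical names, there is a single page size, and eviction frees one page in first-in-first-out order, whereas the real class evicts the pages of the least recently used chunk or table.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <vector>

// Hypothetical toy model; none of these types exist in OmniSciDB.
struct ToyPage {
  std::size_t file_id;
  std::size_t page_num;
};

class ToyPageCache {
 public:
  ToyPageCache(std::size_t max_files, std::size_t pages_per_file)
      : max_files_(max_files), pages_per_file_(pages_per_file) {
    assert(max_files > 0 && pages_per_file > 0);
  }

  ToyPage request_free_page() {
    // Stage 1: reuse a free page in an existing file.
    for (std::size_t f = 0; f < files_.size(); ++f) {
      if (auto page = take_free_page(f)) {
        return {f, *page};
      }
    }
    // Stage 2: create a new file if the file budget allows it.
    if (files_.size() < max_files_) {
      files_.emplace_back(pages_per_file_, false);
      const std::size_t f = files_.size() - 1;
      return {f, *take_free_page(f)};
    }
    // Stage 3: evict the oldest allocation (FIFO for brevity) and reuse its page.
    const ToyPage victim = allocation_order_.front();
    allocation_order_.pop_front();
    files_[victim.file_id][victim.page_num] = false;
    ++evictions_;
    return {victim.file_id, *take_free_page(victim.file_id)};
  }

  std::size_t evictions() const { return evictions_; }

 private:
  // Marks the first unused page of file f as used and returns its number.
  std::optional<std::size_t> take_free_page(std::size_t f) {
    for (std::size_t p = 0; p < files_[f].size(); ++p) {
      if (!files_[f][p]) {
        files_[f][p] = true;
        allocation_order_.push_back({f, p});
        return p;
      }
    }
    return std::nullopt;
  }

  std::size_t max_files_;
  std::size_t pages_per_file_;
  std::size_t evictions_ = 0;
  std::vector<std::vector<bool>> files_;  // files_[f][p] is true when page p is in use
  std::deque<ToyPage> allocation_order_;  // oldest allocation at the front
};
```

With a budget of two files of two pages each, the first four requests are served without eviction, and the fifth reclaims the oldest page.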

void File_Namespace::CachingFileMgr::setDataSizeLimit ( size_t  max)
inline

Definition at line 365 of file CachingFileMgr.h.

References limit_data_size_.

365 { limit_data_size_ = max; }
std::optional< size_t > limit_data_size_

void File_Namespace::CachingFileMgr::setMaxNumDataFiles ( size_t  max)
inline

Definition at line 361 of file CachingFileMgr.h.

References max_num_data_files_.

void File_Namespace::CachingFileMgr::setMaxNumMetadataFiles ( size_t  max)
inline

Definition at line 362 of file CachingFileMgr.h.

References max_num_meta_files_.

void File_Namespace::CachingFileMgr::setMaxSizes ( )
private

Sets the maximum number of files/space for each type of storage based on the maximum size.

Definition at line 667 of file CachingFileMgr.cpp.

References CHECK_GT, and METADATA_PAGE_SIZE.

Referenced by CachingFileMgr().

667  {
668  size_t max_meta_space = std::floor(max_size_ * METADATA_SPACE_PERCENTAGE);
669  size_t max_meta_file_space = std::floor(max_size_ * METADATA_FILE_SPACE_PERCENTAGE);
670  max_wrapper_space_ = max_meta_space - max_meta_file_space;
671  auto max_data_space = max_size_ - max_meta_space;
672  auto meta_file_size = METADATA_PAGE_SIZE * num_pages_per_metadata_file_;
673  auto data_file_size = defaultPageSize_ * num_pages_per_data_file_;
674  max_num_data_files_ = max_data_space / data_file_size;
675  max_num_meta_files_ = max_meta_file_space / meta_file_size;
676  CHECK_GT(max_num_data_files_, 0U) << "Cannot create a cache of size " << max_size_
677  << ". Not enough space to create a data file.";
678  CHECK_GT(max_num_meta_files_, 0U) << "Cannot create a cache of size " << max_size_
679  << ". Not enough space to create a metadata file.";
680 }
#define METADATA_PAGE_SIZE
Definition: FileBuffer.h:37
static constexpr float METADATA_SPACE_PERCENTAGE
#define CHECK_GT(x, y)
Definition: Logger.h:223
static size_t num_pages_per_data_file_
Definition: FileMgr.h:414
static size_t num_pages_per_metadata_file_
Definition: FileMgr.h:415
size_t defaultPageSize_
number of threads used when loading data
Definition: FileMgr.h:399
static constexpr float METADATA_FILE_SPACE_PERCENTAGE

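The arithmetic in the listing above can be worked through in isolation. The sketch below repeats the same steps using the two percentages documented for this class (0.1 and 0.01). The file sizes are passed in as parameters because this page does not state the page sizes or pages-per-file counts; `ToyCacheLimits` and `compute_limits` are hypothetical names, and the example sizes in the note after the block are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical result struct; field names mirror the locals in setMaxSizes().
struct ToyCacheLimits {
  std::uint64_t max_meta_space;
  std::uint64_t max_meta_file_space;
  std::uint64_t max_wrapper_space;
  std::uint64_t max_data_space;
  std::uint64_t max_num_data_files;
  std::uint64_t max_num_meta_files;
};

ToyCacheLimits compute_limits(std::uint64_t max_size,
                              std::uint64_t data_file_size,
                              std::uint64_t meta_file_size) {
  // Percentages as documented for METADATA_SPACE_PERCENTAGE and
  // METADATA_FILE_SPACE_PERCENTAGE (both constexpr float).
  constexpr float kMetadataSpacePercentage = 0.1f;
  constexpr float kMetadataFileSpacePercentage = 0.01f;
  assert(data_file_size > 0 && meta_file_size > 0);

  ToyCacheLimits limits{};
  limits.max_meta_space =
      static_cast<std::uint64_t>(std::floor(max_size * kMetadataSpacePercentage));
  limits.max_meta_file_space =
      static_cast<std::uint64_t>(std::floor(max_size * kMetadataFileSpacePercentage));
  // Wrapper files get the metadata share that is not reserved for metadata files.
  limits.max_wrapper_space = limits.max_meta_space - limits.max_meta_file_space;
  limits.max_data_space = max_size - limits.max_meta_space;
  // Integer division: a partial file does not count.
  limits.max_num_data_files = limits.max_data_space / data_file_size;
  limits.max_num_meta_files = limits.max_meta_file_space / meta_file_size;
  return limits;
}
```

Assuming 512 MiB data files and 16 MiB metadata files, a 4 GiB cache yields 7 data files and 2 metadata files. Under the same assumed file sizes, a 1 GiB cache yields 0 metadata files, which is the condition the second `CHECK_GT` in the listing rejects.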

void File_Namespace::CachingFileMgr::setMaxWrapperSpace ( size_t  max)
inline

Definition at line 363 of file CachingFileMgr.h.

References max_wrapper_space_.

void File_Namespace::CachingFileMgr::touchKey ( const ChunkKey & key ) const
private

Used to track which tables/chunks were least recently used.

Definition at line 518 of file CachingFileMgr.cpp.

References get_table_key().

518  {
521 }
LRUEvictionAlgorithm table_evict_alg_
void touchChunk(const ChunkKey &) override
ChunkKey get_table_key(const ChunkKey &key)
Definition: types.h:53
LRUEvictionAlgorithm chunk_evict_alg_

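Tracking "least recently used" keys needs three operations: touch a key, remove a key, and pick the next key to evict. The sketch below shows one common way to build such a queue from a linked list plus an index into it. It is a generic illustration, not the `LRUEvictionAlgorithm` class referenced above: `ToyLruQueue` and its method names are hypothetical.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <map>
#include <optional>
#include <vector>

using ChunkKey = std::vector<int>;  // same shape as the ChunkKey typedef (types.h)

// Hypothetical LRU queue; not OmniSciDB's LRUEvictionAlgorithm.
class ToyLruQueue {
 public:
  // Marks key as most recently used, inserting it if it is not tracked yet.
  void touch(const ChunkKey& key) {
    remove(key);
    queue_.push_back(key);
    position_[key] = std::prev(queue_.end());
  }

  // Stops tracking key; does nothing if the key is unknown.
  void remove(const ChunkKey& key) {
    auto it = position_.find(key);
    if (it != position_.end()) {
      queue_.erase(it->second);
      position_.erase(it);
    }
  }

  // Removes and returns the least recently used key, if any.
  std::optional<ChunkKey> evict_next() {
    if (queue_.empty()) {
      return std::nullopt;
    }
    ChunkKey key = queue_.front();
    remove(key);
    return key;
  }

 private:
  std::list<ChunkKey> queue_;  // least recently used key at the front
  std::map<ChunkKey, std::list<ChunkKey>::iterator> position_;
};
```

Because `std::list` iterators stay valid when other elements are erased, the stored positions remain usable across touches and removals.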

bool File_Namespace::CachingFileMgr::updatePageIfDeleted ( FileInfo * file_info,
ChunkKey chunk_key,
int32_t  contingent,
int32_t  page_epoch,
int32_t  page_num 
)
override virtual

Checks whether a page should be deleted.

Reimplemented from File_Namespace::FileMgr.

Definition at line 348 of file CachingFileMgr.cpp.

References File_Namespace::DELETE_CONTINGENT, File_Namespace::FileInfo::freePage(), and File_Namespace::ROLLOFF_CONTINGENT.

352  {
353  // These contingents are stored by overwriting the bytes used for chunkKeys. If
354  // we run into a key marked for deletion in a fileMgr with no fileMgrKey (i.e.
355  // CachingFileMgr) then we can't know if the epoch is valid because we don't know
356  // the key. At this point our only option is to free the page as though it was
357  // checkpointed (which should be fine since we only maintain one version of each
358  // page).
359  if (contingent == DELETE_CONTINGENT || contingent == ROLLOFF_CONTINGENT) {
360  file_info->freePage(page_num, false, page_epoch);
361  return true;
362  }
363  return false;
364 }
constexpr int32_t DELETE_CONTINGENT
A FileInfo type has a file pointer and metadata about a file.
Definition: FileInfo.h:51
constexpr int32_t ROLLOFF_CONTINGENT
Definition: FileInfo.h:52


void File_Namespace::CachingFileMgr::writeAndSyncEpochToDisk ( int32_t  db_id,
int32_t  tb_id 
)
private

Flushes epoch value to disk for a table.

Definition at line 154 of file CachingFileMgr.cpp.

References CHECK, table_dirs_, and table_dirs_mutex_.

154  {
155  mapd_shared_lock<mapd_shared_mutex> read_lock(table_dirs_mutex_);
156  auto table_it = table_dirs_.find({db_id, tb_id});
157  CHECK(table_it != table_dirs_.end());
158  table_it->second->writeAndSyncEpochToDisk();
159 }
mapd_shared_mutex table_dirs_mutex_
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
mapd_shared_lock< mapd_shared_mutex > read_lock
#define CHECK(condition)
Definition: Logger.h:211

void File_Namespace::CachingFileMgr::writeDirtyBuffers ( int32_t  db_id,
int32_t  tb_id 
)
private

Helper function to flush all dirty buffers to disk.

Definition at line 366 of file CachingFileMgr.cpp.

366  {
367  mapd_unique_lock<mapd_shared_mutex> chunk_index_write_lock(chunkIndexMutex_);
368  ChunkKey min_table_key{db_id, tb_id};
369  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
370 
371  for (auto chunk_it = chunkIndex_.lower_bound(min_table_key);
372  chunk_it != chunkIndex_.upper_bound(max_table_key);
373  ++chunk_it) {
374  if (auto [key, buf] = *chunk_it; buf->isDirty()) {
375  // Free previous versions first so we only have one metadata version.
376  buf->freeMetadataPages();
377  buf->writeMetadata(epoch(db_id, tb_id));
378  buf->clearDirtyBits();
379  touchKey(key);
380  }
381  }
382 }
std::vector< int > ChunkKey
Definition: types.h:37
void touchKey(const ChunkKey &key) const
Used to track which tables/chunks were least recently used.
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:327
int32_t epoch() const
Definition: FileMgr.h:513
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:407

void File_Namespace::CachingFileMgr::writeWrapperFile ( const std::string &  doc,
int32_t  db,
int32_t  tb 
)

Writes a wrapper file to a table subdir.

Definition at line 646 of file CachingFileMgr.cpp.

References CHECK_LE.

646  {
648  auto wrapper_size = doc.size();
649  CHECK_LE(wrapper_size, getMaxWrapperSize())
650  << "Wrapper is too big to fit into the cache";
651  while (wrapper_size > getAvailableWrapperSpace()) {
653  }
654  mapd_shared_lock<mapd_shared_mutex> read_lock(table_dirs_mutex_);
655  table_dirs_.at({db, tb})->writeWrapperFile(doc);
656 }
void writeWrapperFile(const std::string &doc, int32_t db, int32_t tb)
Writes a wrapper file to a table subdir.
mapd_shared_mutex table_dirs_mutex_
void createTableFileMgrIfNoneExists(const int32_t db_id, const int32_t tb_id)
Create and initialize a subdirectory for a table if none exists.
#define CHECK_LE(x, y)
Definition: Logger.h:222
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
FileInfo * evictMetadataPages()
evicts all metadata pages for the least recently used table. Returns the first FileInfo that a page w...
mapd_shared_lock< mapd_shared_mutex > read_lock

Member Data Documentation

LRUEvictionAlgorithm File_Namespace::CachingFileMgr::chunk_evict_alg_
mutable private

Definition at line 491 of file CachingFileMgr.h.

Referenced by dump(), and dumpEvictionQueue().

std::optional<size_t> File_Namespace::CachingFileMgr::limit_data_size_ {}
private

Definition at line 489 of file CachingFileMgr.h.

Referenced by setDataSizeLimit().

size_t File_Namespace::CachingFileMgr::max_num_data_files_
private

Definition at line 485 of file CachingFileMgr.h.

Referenced by getMaxDataFiles(), and setMaxNumDataFiles().

size_t File_Namespace::CachingFileMgr::max_num_meta_files_
private

Definition at line 486 of file CachingFileMgr.h.

Referenced by getMaxMetaFiles(), and setMaxNumMetadataFiles().

size_t File_Namespace::CachingFileMgr::max_size_
private

Definition at line 488 of file CachingFileMgr.h.

Referenced by CachingFileMgr(), getAvailableSpace(), and getMaxSize().

size_t File_Namespace::CachingFileMgr::max_wrapper_space_
private

constexpr float File_Namespace::CachingFileMgr::METADATA_FILE_SPACE_PERCENTAGE {0.01}
static

Definition at line 173 of file CachingFileMgr.h.

Referenced by getMinimumSize().

constexpr float File_Namespace::CachingFileMgr::METADATA_SPACE_PERCENTAGE {0.1}
static

Definition at line 171 of file CachingFileMgr.h.

std::map<TablePair, std::unique_ptr<TableFileMgr> > File_Namespace::CachingFileMgr::table_dirs_
private

mapd_shared_mutex File_Namespace::CachingFileMgr::table_dirs_mutex_
mutable private

LRUEvictionAlgorithm File_Namespace::CachingFileMgr::table_evict_alg_
mutable private

Definition at line 492 of file CachingFileMgr.h.

Referenced by dump(), and dumpTableQueue().

constexpr char File_Namespace::CachingFileMgr::WRAPPER_FILE_NAME[] = "wrapper_metadata.json"
static

The documentation for this class was generated from the following files: