OmniSciDB  6686921089
File_Namespace::CachingFileMgr Class Reference

A FileMgr capable of limiting its size and storing data from multiple tables in a shared directory. For any table that supports DiskCaching, the CachingFileMgr must contain metadata either for all of the table's chunks or for none of them (the cache has either complete knowledge of a table or no knowledge of it). Any individual data chunk within a table may or may not be present in the cache. More...

#include <CachingFileMgr.h>


Public Member Functions

 CachingFileMgr (const DiskCacheConfig &config)
 
 ~CachingFileMgr () override
 
MgrType getMgrType () override
 
std::string getStringMgrType () override
 
size_t getDefaultPageSize ()
 
size_t getMaxSize () override
 
size_t getMaxDataFiles () const
 
size_t getMaxMetaFiles () const
 
size_t getMaxWrapperSize () const
 
size_t getDataFileSize () const
 
size_t getMetadataFileSize () const
 
size_t getNumDataFiles () const
 
size_t getNumMetaFiles () const
 
size_t getAvailableSpace ()
 
size_t getAvailableWrapperSpace ()
 
size_t getAllocated () override
 
void removeChunkKeepMetadata (const ChunkKey &key)
 Free pages for chunk and remove it from the chunk eviction algorithm. More...
 
void clearForTable (int32_t db_id, int32_t tb_id)
 Removes all data related to the given table (pages and subdirectories). More...
 
bool hasFileMgrKey () const override
 Query to determine if the contained pages will have their database and table ids overridden by the filemgr key (FileMgr does this). More...
 
void closeRemovePhysical () override
 Closes files and removes the caching directory. More...
 
size_t getChunkSpaceReservedByTable (int32_t db_id, int32_t tb_id) const
 
size_t getMetadataSpaceReservedByTable (int32_t db_id, int32_t tb_id) const
 
size_t getTableFileMgrSpaceReserved (int32_t db_id, int32_t tb_id) const
 
size_t getSpaceReservedByTable (int32_t db_id, int32_t tb_id) const
 
std::string describeSelf () const override
 describes this FileMgr for logging purposes. More...
 
void checkpoint (const int32_t db_id, const int32_t tb_id) override
 writes buffers for the given table, synchronizes files to disk, updates file epoch, and commits free pages. More...
 
int32_t epoch (int32_t db_id, int32_t tb_id) const override
 obtain the epoch version for the given table. More...
 
FileBuffer * putBuffer (const ChunkKey &key, AbstractBuffer *srcBuffer, const size_t numBytes=0) override
 deletes any existing buffer for the given key then copies in a new one. More...
 
CachingFileBuffer * allocateBuffer (const size_t page_size, const ChunkKey &key, const size_t num_bytes=0) override
 allocates a new CachingFileBuffer and tracks its use in the eviction algorithms. More...
 
CachingFileBuffer * allocateBuffer (const ChunkKey &key, const std::vector< HeaderInfo >::const_iterator &headerStartIt, const std::vector< HeaderInfo >::const_iterator &headerEndIt) override
 
bool updatePageIfDeleted (FileInfo *file_info, ChunkKey &chunk_key, int32_t contingent, int32_t page_epoch, int32_t page_num) override
 checks whether a page should be deleted. More...
 
bool failOnReadError () const override
 True if a read error should cause a fatal error. More...
 
void deleteBufferIfExists (const ChunkKey &key)
 deletes a buffer if it exists in the mgr. Otherwise do nothing. More...
 
size_t getNumChunksWithMetadata () const
 Returns the number of buffers with metadata in the CFM. Any buffer with an encoder counts. More...
 
size_t getNumDataChunks () const
 Returns the number of buffers with chunk data in the CFM. More...
 
std::vector< ChunkKey > getChunkKeysForPrefix (const ChunkKey &prefix) const
 Returns the keys for chunks with chunk data that match the given prefix. More...
 
std::unique_ptr< CachingFileMgr > reconstruct () const
 Initializes a new CFM using the initialization values in the current CFM. More...
 
void deleteWrapperFile (int32_t db, int32_t tb)
 Deletes the wrapper file from a table subdir. More...
 
void writeWrapperFile (const std::string &doc, int32_t db, int32_t tb)
 Writes a wrapper file to a table subdir. More...
 
std::string getTableFileMgrPath (int32_t db, int32_t tb) const
 
size_t getFilesSize () const
 Get the total size of page files (data and metadata files). This includes allocated, but unused space. More...
 
size_t getTableFileMgrsSize () const
 Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirectory for serialized data wrappers and epoch files. More...
 
std::optional< FileBuffer * > getBufferIfExists (const ChunkKey &key)
 an optional version of get buffer if we are not sure a chunk exists. More...
 
void free_page (std::pair< FileInfo *, int32_t > &&page) override
 Unlike the FileMgr, the CFM frees pages immediately instead of holding them until the next checkpoint. More...
 
void getChunkMetadataVecForKeyPrefix (ChunkMetadataVector &chunkMetadataVec, const ChunkKey &keyPrefix) override
 
std::string dumpKeysWithMetadata () const
 
std::string dumpKeysWithChunkData () const
 
std::string dumpTableQueue () const
 
void setMaxNumDataFiles (size_t max)
 
void setMaxNumMetadataFiles (size_t max)
 
void setMaxWrapperSpace (size_t max)
 
std::set< ChunkKey > getKeysWithMetadata () const
 
- Public Member Functions inherited from File_Namespace::FileMgr
 FileMgr (const int32_t deviceId, GlobalFileMgr *gfm, const TablePair fileMgrKey, const int32_t max_rollback_epochs=-1, const size_t num_reader_threads=0, const int32_t epoch=-1, const size_t defaultPageSize=DEFAULT_PAGE_SIZE)
 Constructor. More...
 
 FileMgr (const int32_t deviceId, GlobalFileMgr *gfm, const TablePair fileMgrKey, const size_t defaultPageSize, const bool runCoreInit)
 
 FileMgr (GlobalFileMgr *gfm, const size_t defaultPageSize, std::string basePath)
 
 ~FileMgr () override
 Destructor. More...
 
StorageStats getStorageStats ()
 
FileBuffer * createBuffer (const ChunkKey &key, size_t pageSize=0, const size_t numBytes=0) override
 Creates a chunk with the specified key and page size. More...
 
bool isBufferOnDevice (const ChunkKey &key) override
 
void deleteBuffer (const ChunkKey &key, const bool purge=true) override
 Deletes the chunk with the specified key. More...
 
void deleteBuffersWithPrefix (const ChunkKey &keyPrefix, const bool purge=true) override
 
FileBuffer * getBuffer (const ChunkKey &key, const size_t numBytes=0) override
 Returns a pointer to the chunk with the specified key. More...
 
void fetchBuffer (const ChunkKey &key, AbstractBuffer *destBuffer, const size_t numBytes) override
 
FileBuffer * putBuffer (const ChunkKey &key, AbstractBuffer *d, const size_t numBytes=0) override
 Puts the contents of d into the Chunk with the given key. More...
 
AbstractBuffer * alloc (const size_t numBytes) override
 
void free (AbstractBuffer *buffer) override
 
MgrType getMgrType () override
 
std::string getStringMgrType () override
 
std::string printSlabs () override
 
size_t getMaxSize () override
 
size_t getInUseSize () override
 
size_t getAllocated () override
 
bool isAllocationCapped () override
 
FileInfo * getFileInfoForFileId (const int32_t fileId) const
 
FileMetadata getMetadataForFile (const boost::filesystem::directory_iterator &fileIterator)
 
void init (const size_t num_reader_threads, const int32_t epochOverride)
 
void init (const std::string &dataPathToConvertFrom, const int32_t epochOverride)
 
void copyPage (Page &srcPage, FileMgr *destFileMgr, Page &destPage, const size_t reservedHeaderSize, const size_t numBytes, const size_t offset)
 
void requestFreePages (size_t npages, size_t pagesize, std::vector< Page > &pages, const bool isMetadata)
 Obtains free pages – creates new files if necessary – of the requested size. More...
 
void getChunkMetadataVecForKeyPrefix (ChunkMetadataVector &chunkMetadataVec, const ChunkKey &keyPrefix) override
 
void checkpoint () override
 Fsyncs data files, writes out epoch and fsyncs that. More...
 
void checkpoint (const int32_t db_id, const int32_t tb_id) override
 
int32_t epochFloor () const
 
int32_t incrementEpoch ()
 
int32_t lastCheckpointedEpoch ()
 Returns value of epoch at last checkpoint. More...
 
int32_t maxRollbackEpochs ()
 Returns value max_rollback_epochs. More...
 
size_t getNumReaderThreads ()
 Returns number of threads defined by parameter num-reader-threads which should be used during initial load and consequent read of data. More...
 
FILE * getFileForFileId (const int32_t fileId)
 Returns FILE pointer associated with requested fileId. More...
 
size_t getNumChunks () override
 
size_t getNumUsedMetadataPagesForChunkKey (const ChunkKey &chunkKey) const
 
int32_t getDBVersion () const
 Index for looking up chunks. More...
 
bool getDBConvert () const
 
void createTopLevelMetadata ()
 
std::string getFileMgrBasePath () const
 
void removeTableRelatedDS (const int32_t db_id, const int32_t table_id) override
 
const TablePair get_fileMgrKey () const
 
boost::filesystem::path getFilePath (const std::string &file_name)
 
void writePageMappingsToStatusFile (const std::vector< PageMapping > &page_mappings)
 
void renameCompactionStatusFile (const char *const from_status, const char *const to_status)
 
void compactFiles ()
 

Static Public Member Functions

static size_t getMinimumSize ()
 
- Static Public Member Functions inherited from File_Namespace::FileMgr
static void setNumPagesPerDataFile (size_t num_pages)
 
static void setNumPagesPerMetadataFile (size_t num_pages)
 

Static Public Attributes

static constexpr char WRAPPER_FILE_NAME [] = "wrapper_metadata.json"
 
static constexpr float METADATA_SPACE_PERCENTAGE {0.1}
 
static constexpr float METADATA_FILE_SPACE_PERCENTAGE {0.01}
 
- Static Public Attributes inherited from File_Namespace::FileMgr
static constexpr size_t DEFAULT_NUM_PAGES_PER_DATA_FILE {256}
 
static constexpr size_t DEFAULT_NUM_PAGES_PER_METADATA_FILE {4096}
 
static constexpr char const * COPY_PAGES_STATUS {"pending_data_compaction_0"}
 
static constexpr char const * UPDATE_PAGE_VISIBILITY_STATUS {"pending_data_compaction_1"}
 
static constexpr char const * DELETE_EMPTY_FILES_STATUS {"pending_data_compaction_2"}
 
static constexpr char LEGACY_EPOCH_FILENAME [] = "epoch"
 
static constexpr char EPOCH_FILENAME [] = "epoch_metadata"
 
static constexpr char DB_META_FILENAME [] = "dbmeta"
 
static constexpr char FILE_MGR_VERSION_FILENAME [] = "filemgr_version"
 
static constexpr int32_t INVALID_VERSION = -1
 

Private Member Functions

void incrementEpoch (int32_t db_id, int32_t tb_id)
 Increments epoch for the given table. More...
 
void init (const size_t num_reader_threads)
 Initializes a CFM, parsing any existing files and initializing data structures appropriately (currently not thread-safe). More...
 
void writeAndSyncEpochToDisk (int32_t db_id, int32_t tb_id)
 Flushes epoch value to disk for a table. More...
 
void readTableFileMgrs ()
 Checks for any sub-directories containing table-specific data and creates epochs from found files. More...
 
FileBuffer * createBufferFromHeaders (const ChunkKey &key, const std::vector< HeaderInfo >::const_iterator &startIt, const std::vector< HeaderInfo >::const_iterator &endIt) override
 Creates a buffer and initializes it with info read from files on disk. More...
 
FileBuffer * createBufferUnlocked (const ChunkKey &key, size_t pageSize=0, const size_t numBytes=0) override
 Creates a buffer. More...
 
void createTableFileMgrIfNoneExists (const int32_t db_id, const int32_t tb_id)
 Create and initialize a subdirectory for a table if none exists. More...
 
void incrementAllEpochs ()
 Increment epochs for each table in the CFM. More...
 
void removeTableFileMgr (int32_t db_id, int32_t tb_id)
 Removes the subdirectory content for a table. More...
 
void removeTableBuffers (int32_t db_id, int32_t tb_id)
 Erases and cleans up all buffers for a table. More...
 
void writeDirtyBuffers (int32_t db_id, int32_t tb_id)
 helper function to flush all dirty buffers to disk. More...
 
Page requestFreePage (size_t pagesize, const bool isMetadata) override
 requests a free page similar to FileMgr, but this override will also evict existing pages to make space if there are none available. More...
 
void touchKey (const ChunkKey &key) const
 Used to track which tables/chunks were least recently used. More...
 
void removeKey (const ChunkKey &key) const
 
std::vector< ChunkKey > getKeysForTable (int32_t db_id, int32_t tb_id) const
 returns set of keys contained in chunkIndex_ that match the given table prefix. More...
 
FileInfo * evictMetadataPages ()
 evicts all metadata pages for the least recently used table. Returns the first FileInfo that a page was evicted from (guaranteed to now have at least one free page in it). More...
 
FileInfo * evictPages ()
 evicts all data pages for the least recently used Chunk (metadata pages persist). Returns the first FileInfo that a page was evicted from (guaranteed to now have at least one free page in it). More...
 
void deleteCacheIfTooLarge ()
 When the cache is read from disk, we don't know which chunks were least recently used. Rather than try to evict random pages to get down to size we just reset the cache to make sure we have space. More...
 
void setMaxSizes ()
 Sets the maximum number of files/space for each type of storage based on the maximum size. More...
 
FileBuffer * getBufferUnlocked (const ChunkKeyToChunkMap::iterator chunk_it, const size_t numBytes=0) override
 
ChunkKeyToChunkMap::iterator deleteBufferUnlocked (const ChunkKeyToChunkMap::iterator chunk_it, const bool purge=true) override
 

Private Attributes

mapd_shared_mutex table_dirs_mutex_
 
std::map< TablePair,
std::unique_ptr< TableFileMgr > > 
table_dirs_
 
size_t max_num_data_files_
 
size_t max_num_meta_files_
 
size_t max_wrapper_space_
 
size_t max_size_
 
LRUEvictionAlgorithm chunk_evict_alg_
 
LRUEvictionAlgorithm table_evict_alg_
 

Additional Inherited Members

- Public Attributes inherited from File_Namespace::FileMgr
ChunkKeyToChunkMap chunkIndex_
 
- Protected Member Functions inherited from File_Namespace::FileMgr
 FileMgr ()
 
FileInfo * createFile (const size_t pageSize, const size_t numPages)
 Adds a file to the file manager repository. More...
 
FileInfo * openExistingFile (const std::string &path, const int32_t fileId, const size_t pageSize, const size_t numPages, std::vector< HeaderInfo > &headerVec)
 
void createEpochFile (const std::string &epochFileName)
 
int32_t openAndReadLegacyEpochFile (const std::string &epochFileName)
 
void openAndReadEpochFile (const std::string &epochFileName)
 
void writeAndSyncEpochToDisk ()
 
void setEpoch (const int32_t newEpoch)
 
int32_t readVersionFromDisk (const std::string &versionFileName) const
 
void writeAndSyncVersionToDisk (const std::string &versionFileName, const int32_t version)
 
void processFileFutures (std::vector< std::future< std::vector< HeaderInfo >>> &file_futures, std::vector< HeaderInfo > &headerVec)
 
void migrateToLatestFileMgrVersion ()
 
void migrateEpochFileV0 ()
 
OpenFilesResult openFiles ()
 
void clearFileInfos ()
 
void copySourcePageForCompaction (const Page &source_page, FileInfo *destination_file_info, std::vector< PageMapping > &page_mappings, std::set< Page > &touched_pages)
 
int32_t copyPageWithoutHeaderSize (const Page &source_page, const Page &destination_page)
 
void sortAndCopyFilePagesForCompaction (size_t page_size, std::vector< PageMapping > &page_mappings, std::set< Page > &touched_pages)
 
void updateMappedPagesVisibility (const std::vector< PageMapping > &page_mappings)
 
void deleteEmptyFiles ()
 
void resumeFileCompaction (const std::string &status_file_name)
 
std::vector< PageMapping > readPageMappingsFromStatusFile ()
 
 FileMgr (const int epoch)
 
void closePhysicalUnlocked ()
 
void syncFilesToDisk ()
 
void freePages ()
 
void initializeNumThreads (size_t num_reader_threads=0)
 
- Protected Attributes inherited from File_Namespace::FileMgr
int32_t maxRollbackEpochs_
 
std::string fileMgrBasePath_
 
std::map< int32_t, FileInfo * > files_
 A map of files accessible via a file identifier. More...
 
PageSizeFileMMap fileIndex_
 Maps page sizes to FileInfo objects. More...
 
size_t num_reader_threads_
 number of threads used when loading data More...
 
size_t defaultPageSize_
 
unsigned nextFileId_
 the index of the next file id More...
 
int32_t db_version_
 
int32_t fileMgrVersion_
 
const int32_t latestFileMgrVersion_ {1}
 
FILE * DBMetaFile_ = nullptr
 pointer to DB level metadata More...
 
std::mutex getPageMutex_
 
mapd_shared_mutex chunkIndexMutex_
 
mapd_shared_mutex files_rw_mutex_
 
mapd_shared_mutex mutex_free_page_
 
std::vector< std::pair
< FileInfo *, int32_t > > 
free_pages_
 
bool isFullyInitted_ {false}
 
- Static Protected Attributes inherited from File_Namespace::FileMgr
static size_t num_pages_per_data_file_ {DEFAULT_NUM_PAGES_PER_DATA_FILE}
 
static size_t num_pages_per_metadata_file_ {DEFAULT_NUM_PAGES_PER_METADATA_FILE}
 

Detailed Description

A FileMgr capable of limiting its size and storing data from multiple tables in a shared directory. For any table that supports DiskCaching, the CachingFileMgr must contain metadata either for all of the table's chunks or for none of them (the cache has either complete knowledge of a table or no knowledge of it). Any individual data chunk within a table may or may not be present in the cache.

Definition at line 164 of file CachingFileMgr.h.

Constructor & Destructor Documentation

File_Namespace::CachingFileMgr::CachingFileMgr ( const DiskCacheConfig & config )

Definition at line 54 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::defaultPageSize_, File_Namespace::FileMgr::fileMgrBasePath_, init(), max_size_, File_Namespace::FileMgr::maxRollbackEpochs_, File_Namespace::FileMgr::nextFileId_, File_Namespace::DiskCacheConfig::num_reader_threads, File_Namespace::DiskCacheConfig::page_size, File_Namespace::DiskCacheConfig::path, setMaxSizes(), and File_Namespace::DiskCacheConfig::size_limit.

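CachingFileMgr::CachingFileMgr(const DiskCacheConfig& config) {
  fileMgrBasePath_ = config.path;
  // Source line 56 is absent from this listing; per the References above it
  // involves maxRollbackEpochs_.
  defaultPageSize_ = config.page_size;
  nextFileId_ = 0;
  max_size_ = config.size_limit;
  init(config.num_reader_threads);
  setMaxSizes();
}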

File_Namespace::CachingFileMgr::~CachingFileMgr ( )
override

Definition at line 64 of file CachingFileMgr.cpp.

CachingFileMgr::~CachingFileMgr() {}

Member Function Documentation

CachingFileBuffer * File_Namespace::CachingFileMgr::allocateBuffer ( const size_t  page_size,
const ChunkKey &  key,
const size_t  num_bytes = 0 
)
override virtual

allocates a new CachingFileBuffer and tracks its use in the eviction algorithms.

Reimplemented from File_Namespace::FileMgr.

Definition at line 320 of file CachingFileMgr.cpp.

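CachingFileBuffer* CachingFileMgr::allocateBuffer(const size_t page_size,
                                                  const ChunkKey& key,
                                                  const size_t num_bytes) {
  return new CachingFileBuffer(this, page_size, key, num_bytes);
}
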
CachingFileBuffer * File_Namespace::CachingFileMgr::allocateBuffer ( const ChunkKey &  key,
const std::vector< HeaderInfo >::const_iterator &  headerStartIt,
const std::vector< HeaderInfo >::const_iterator &  headerEndIt 
)
override virtual

Reimplemented from File_Namespace::FileMgr.

Definition at line 326 of file CachingFileMgr.cpp.

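CachingFileBuffer* CachingFileMgr::allocateBuffer(
    const ChunkKey& key,
    const std::vector<HeaderInfo>::const_iterator& headerStartIt,
    const std::vector<HeaderInfo>::const_iterator& headerEndIt) {
  return new CachingFileBuffer(this, key, headerStartIt, headerEndIt);
}
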
void File_Namespace::CachingFileMgr::checkpoint ( const int32_t  db_id,
const int32_t  tb_id 
)
override

writes buffers for the given table, synchronizes files to disk, updates file epoch, and commits free pages.

Definition at line 220 of file CachingFileMgr.cpp.

References CHECK, table_dirs_, and table_dirs_mutex_.

void CachingFileMgr::checkpoint(const int32_t db_id, const int32_t tb_id) {
  {
    // The table must already have a subdirectory registered in the cache.
    mapd_shared_lock<mapd_shared_mutex> read_lock(table_dirs_mutex_);
    CHECK(table_dirs_.find({db_id, tb_id}) != table_dirs_.end());
  }
  VLOG(2) << "Checkpointing " << describeSelf() << " (" << db_id << ", " << tb_id
          << ") epoch: " << epoch(db_id, tb_id);
  writeDirtyBuffers(db_id, tb_id);
  syncFilesToDisk();
  writeAndSyncEpochToDisk(db_id, tb_id);
  incrementEpoch(db_id, tb_id);
  freePages();
}

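
The order of the checkpoint steps listed above can be illustrated with a small stand-alone model. This is only a sketch: ToyTableCheckpointer and its members are hypothetical names that borrow the step names from the listing, and each step merely records that it ran. It shows that the epoch written to disk is the one that was current during the checkpoint, and that the in-memory epoch is advanced only afterwards.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy model of one table's checkpoint state. Each step only
// records its name; nothing here is the real CachingFileMgr implementation.
class ToyTableCheckpointer {
 public:
  // Runs the steps in the same order as the checkpoint() listing.
  void checkpoint() {
    writeDirtyBuffers();
    syncFilesToDisk();
    writeAndSyncEpochToDisk();
    incrementEpoch();
    freePages();
  }

  const std::vector<std::string>& steps() const { return steps_; }
  std::int32_t epoch() const { return epoch_; }
  std::int32_t epochOnDisk() const { return epoch_on_disk_; }

 private:
  void writeDirtyBuffers() { steps_.push_back("writeDirtyBuffers"); }
  void syncFilesToDisk() { steps_.push_back("syncFilesToDisk"); }
  void writeAndSyncEpochToDisk() {
    // Persist the epoch that was current while the buffers were written.
    epoch_on_disk_ = epoch_;
    steps_.push_back("writeAndSyncEpochToDisk");
  }
  void incrementEpoch() {
    ++epoch_;
    steps_.push_back("incrementEpoch");
  }
  void freePages() { steps_.push_back("freePages"); }

  std::vector<std::string> steps_;
  std::int32_t epoch_{0};
  std::int32_t epoch_on_disk_{-1};  // -1: nothing persisted yet (toy convention)
};
```

After one call to checkpoint() on a fresh ToyTableCheckpointer, five steps have been recorded, the persisted epoch is 0, and the in-memory epoch is 1.
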
void File_Namespace::CachingFileMgr::clearForTable ( int32_t  db_id,
int32_t  tb_id 
)

Removes all data related to the given table (pages and subdirectories).

Definition at line 147 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::freePages(), removeTableBuffers(), and removeTableFileMgr().

void CachingFileMgr::clearForTable(int32_t db_id, int32_t tb_id) {
  removeTableBuffers(db_id, tb_id);
  removeTableFileMgr(db_id, tb_id);
  freePages();
}

void File_Namespace::CachingFileMgr::closeRemovePhysical ( )
override virtual

Closes files and removes the caching directory.

Reimplemented from File_Namespace::FileMgr.

Definition at line 157 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::closePhysicalUnlocked(), File_Namespace::FileMgr::files_rw_mutex_, File_Namespace::FileMgr::getFileMgrBasePath(), table_dirs_, and table_dirs_mutex_.

void CachingFileMgr::closeRemovePhysical() {
  {
    mapd_unique_lock<mapd_shared_mutex> write_lock(files_rw_mutex_);
    closePhysicalUnlocked();
  }
  {
    mapd_unique_lock<mapd_shared_mutex> tables_lock(table_dirs_mutex_);
    table_dirs_.clear();
  }
  bf::remove_all(getFileMgrBasePath());
}

FileBuffer * File_Namespace::CachingFileMgr::createBufferFromHeaders ( const ChunkKey &  key,
const std::vector< HeaderInfo >::const_iterator &  startIt,
const std::vector< HeaderInfo >::const_iterator &  endIt 
)
override private virtual

Creates a buffer and initializes it with info read from files on disk.

Reimplemented from File_Namespace::FileMgr.

Definition at line 253 of file CachingFileMgr.cpp.

References get_table_prefix().

Referenced by init().

FileBuffer* CachingFileMgr::createBufferFromHeaders(
    const ChunkKey& key,
    const std::vector<HeaderInfo>::const_iterator& startIt,
    const std::vector<HeaderInfo>::const_iterator& endIt) {
  if (startIt->pageId != -1) {
    // If the first pageId is not -1 then there is no metadata page for the
    // current key (which means it was never checkpointed), so we should skip.
    return nullptr;
  }
  touchKey(key);
  auto [db_id, tb_id] = get_table_prefix(key);
  createTableFileMgrIfNoneExists(db_id, tb_id);
  auto buffer = FileMgr::createBufferFromHeaders(key, startIt, endIt);
  if (buffer->isMissingPages()) {
    // Detect the case where a page is missing by comparing the number of pages read
    // with the metadata size. If data are missing, discard the chunk.
    buffer->freeChunkPages();
  }
  return buffer;
}
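
The two checks in createBufferFromHeaders() can be sketched in isolation. The following is a simplified model under stated assumptions, not the actual FileBuffer logic: ToyHeader, has_metadata_page(), expected_data_pages() and is_missing_pages() are hypothetical names, headers are assumed to be sorted so that a metadata page (page id -1) comes first, and each data page is assumed to appear exactly once and to carry a fixed payload size.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified page header: only the page id is modelled.
// A page id of -1 marks the metadata page, as in the listing above.
struct ToyHeader {
  int page_id;
};

// True when the sorted header range starts with a metadata page.
bool has_metadata_page(const std::vector<ToyHeader>& sorted_headers) {
  return !sorted_headers.empty() && sorted_headers.front().page_id == -1;
}

// Number of data pages needed to hold chunk_bytes (rounded up).
// payload_bytes_per_page must be greater than zero.
std::size_t expected_data_pages(std::size_t chunk_bytes,
                                std::size_t payload_bytes_per_page) {
  return (chunk_bytes + payload_bytes_per_page - 1) / payload_bytes_per_page;
}

// True when fewer data pages were read than the recorded chunk size requires.
bool is_missing_pages(const std::vector<ToyHeader>& sorted_headers,
                      std::size_t chunk_bytes,
                      std::size_t payload_bytes_per_page) {
  std::size_t data_pages = 0;
  for (const auto& header : sorted_headers) {
    if (header.page_id >= 0) {
      ++data_pages;
    }
  }
  return data_pages < expected_data_pages(chunk_bytes, payload_bytes_per_page);
}
```

In this model, a chunk of 250 bytes with 100 payload bytes per page needs three data pages, so a header list holding the metadata page and only two data pages is reported as missing pages.
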

FileBuffer * File_Namespace::CachingFileMgr::createBufferUnlocked ( const ChunkKey &  key,
size_t  pageSize = 0,
const size_t  numBytes = 0 
)
override private virtual

Creates a buffer.

Reimplemented from File_Namespace::FileMgr.

Definition at line 244 of file CachingFileMgr.cpp.

References get_table_prefix().

FileBuffer* CachingFileMgr::createBufferUnlocked(const ChunkKey& key,
                                                 size_t page_size,
                                                 const size_t num_bytes) {
  touchKey(key);
  auto [db_id, tb_id] = get_table_prefix(key);
  createTableFileMgrIfNoneExists(db_id, tb_id);
  return FileMgr::createBufferUnlocked(key, page_size, num_bytes);
}

void File_Namespace::CachingFileMgr::createTableFileMgrIfNoneExists ( const int32_t  db_id,
const int32_t  tb_id 
)
private

Create and initialize a subdirectory for a table if none exists.

Definition at line 234 of file CachingFileMgr.cpp.

void CachingFileMgr::createTableFileMgrIfNoneExists(const int32_t db_id,
                                                    const int32_t tb_id) {
  mapd_unique_lock<mapd_shared_mutex> write_lock(table_dirs_mutex_);
  TablePair table_pair{db_id, tb_id};
  if (table_dirs_.find(table_pair) == table_dirs_.end()) {
    table_dirs_.emplace(
        table_pair, std::make_unique<TableFileMgr>(getTableFileMgrPath(db_id, tb_id)));
  }
}

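
The find-or-create pattern used by createTableFileMgrIfNoneExists() can be reproduced with standard-library types only. The sketch below is an illustration under assumptions: ToyTableDir and ToyTableRegistry are hypothetical stand-ins for TableFileMgr and the table_dirs_ map, std::shared_mutex takes the place of mapd_shared_mutex, and the path format is invented. Creation takes an exclusive lock, while read-only lookups such as epoch() take a shared lock.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for TableFileMgr: remembers a path and an epoch.
struct ToyTableDir {
  explicit ToyTableDir(std::string p) : path(std::move(p)) {}
  std::string path;
  std::int32_t epoch{0};
};

class ToyTableRegistry {
 public:
  // Creates an entry for the table if none exists.
  // Returns true if a new entry was created, false if one was already there.
  bool createIfNoneExists(std::int32_t db_id, std::int32_t tb_id) {
    std::unique_lock<std::shared_mutex> write_lock(mutex_);
    const std::pair<std::int32_t, std::int32_t> table_pair{db_id, tb_id};
    if (dirs_.find(table_pair) != dirs_.end()) {
      return false;
    }
    dirs_.emplace(table_pair, std::make_unique<ToyTableDir>(pathFor(db_id, tb_id)));
    return true;
  }

  // Read-only lookup under a shared lock. Returns -1 for an unknown table
  // (the real epoch() uses CHECK instead; -1 is a toy convention).
  std::int32_t epoch(std::int32_t db_id, std::int32_t tb_id) const {
    std::shared_lock<std::shared_mutex> read_lock(mutex_);
    const auto it = dirs_.find({db_id, tb_id});
    return it == dirs_.end() ? -1 : it->second->epoch;
  }

  std::size_t size() const {
    std::shared_lock<std::shared_mutex> read_lock(mutex_);
    return dirs_.size();
  }

 private:
  // Invented path scheme, for illustration only.
  static std::string pathFor(std::int32_t db_id, std::int32_t tb_id) {
    return "table_" + std::to_string(db_id) + "_" + std::to_string(tb_id);
  }

  mutable std::shared_mutex mutex_;
  std::map<std::pair<std::int32_t, std::int32_t>, std::unique_ptr<ToyTableDir>> dirs_;
};
```

Holding the exclusive lock across both the lookup and the emplace keeps two threads from creating the same entry concurrently, which appears to be the reason the listing acquires a unique lock before calling find().
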
void File_Namespace::CachingFileMgr::deleteBufferIfExists ( const ChunkKey & key )

deletes a buffer if it exists in the mgr. Otherwise do nothing.

Definition at line 370 of file CachingFileMgr.cpp.

void CachingFileMgr::deleteBufferIfExists(const ChunkKey& key) {
  mapd_unique_lock<mapd_shared_mutex> chunk_index_write_lock(chunkIndexMutex_);
  auto chunk_it = chunkIndex_.find(key);
  if (chunk_it != chunkIndex_.end()) {
    deleteBufferUnlocked(chunk_it);
  }
}

ChunkKeyToChunkMap::iterator File_Namespace::CachingFileMgr::deleteBufferUnlocked ( const ChunkKeyToChunkMap::iterator  chunk_it,
const bool  purge = true 
)
override private virtual

Reimplemented from File_Namespace::FileMgr.

Definition at line 677 of file CachingFileMgr.cpp.

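ChunkKeyToChunkMap::iterator CachingFileMgr::deleteBufferUnlocked(
    const ChunkKeyToChunkMap::iterator chunk_it,
    const bool purge) {
  // Drop the key from the eviction tracking before the buffer itself is deleted.
  removeKey(chunk_it->first);
  return FileMgr::deleteBufferUnlocked(chunk_it, purge);
}
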
void File_Namespace::CachingFileMgr::deleteCacheIfTooLarge ( )
private

When the cache is read from disk, we don't know which chunks were least recently used. Rather than try to evict random pages to get down to size we just reset the cache to make sure we have space.

Definition at line 389 of file CachingFileMgr.cpp.

References logger::INFO, LOG, and anonymous_namespace{CachingFileMgr.cpp}::size_of_dir().

Referenced by init().

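void CachingFileMgr::deleteCacheIfTooLarge() {
  // The next two lines are missing from the extracted listing and are reconstructed
  // from the functions this definition references; the exact condition is assumed.
  if (size_of_dir(fileMgrBasePath_) > max_size_) {
    closeRemovePhysical();
    bf::create_directory(fileMgrBasePath_);
    LOG(INFO) << "Cache path over limit. Existing cache deleted.";
  }
}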

void File_Namespace::CachingFileMgr::deleteWrapperFile ( int32_t  db,
int32_t  tb 
)

Deletes the wrapper file from a table subdir.

Definition at line 625 of file CachingFileMgr.cpp.

References CHECK.

void CachingFileMgr::deleteWrapperFile(int32_t db, int32_t tb) {
  mapd_shared_lock<mapd_shared_mutex> read_lock(table_dirs_mutex_);
  auto it = table_dirs_.find({db, tb});
  CHECK(it != table_dirs_.end());
  it->second->deleteWrapperFile();
}

std::string File_Namespace::CachingFileMgr::describeSelf ( ) const
override virtual

describes this FileMgr for logging purposes.

Reimplemented from File_Namespace::FileMgr.

Definition at line 215 of file CachingFileMgr.cpp.

std::string CachingFileMgr::describeSelf() const {
  return "cache";
}

std::string File_Namespace::CachingFileMgr::dumpKeysWithChunkData ( ) const

Definition at line 605 of file CachingFileMgr.cpp.

References show_chunk().

std::string CachingFileMgr::dumpKeysWithChunkData() const {
  mapd_shared_lock<mapd_shared_mutex> read_lock(chunkIndexMutex_);
  std::string ret_string = "CFM keys with chunk data:\n";
  for (const auto& [key, buf] : chunkIndex_) {
    if (buf->hasDataPages()) {
      ret_string += " " + show_chunk(key) + "\n";
    }
  }
  return ret_string;
}

std::string File_Namespace::CachingFileMgr::dumpKeysWithMetadata ( ) const

Definition at line 594 of file CachingFileMgr.cpp.

References show_chunk().

std::string CachingFileMgr::dumpKeysWithMetadata() const {
  mapd_shared_lock<mapd_shared_mutex> read_lock(chunkIndexMutex_);
  std::string ret_string = "CFM keys with metadata:\n";
  for (const auto& [key, buf] : chunkIndex_) {
    if (buf->hasEncoder()) {
      ret_string += " " + show_chunk(key) + "\n";
    }
  }
  return ret_string;
}

std::string File_Namespace::CachingFileMgr::dumpTableQueue ( ) const
inline

Definition at line 354 of file CachingFileMgr.h.

References LRUEvictionAlgorithm::dumpEvictionQueue(), and table_evict_alg_.


int32_t File_Namespace::CachingFileMgr::epoch ( int32_t  db_id,
int32_t  tb_id 
) const
override virtual

obtain the epoch version for the given table.

Reimplemented from File_Namespace::FileMgr.

Definition at line 124 of file CachingFileMgr.cpp.

References CHECK, table_dirs_, and table_dirs_mutex_.

int32_t CachingFileMgr::epoch(int32_t db_id, int32_t tb_id) const {
  mapd_shared_lock<mapd_shared_mutex> read_lock(table_dirs_mutex_);
  auto tables_it = table_dirs_.find({db_id, tb_id});
  CHECK(tables_it != table_dirs_.end());
  auto& [pair, table_dir] = *tables_it;
  return table_dir->getEpoch();
}

FileInfo * File_Namespace::CachingFileMgr::evictMetadataPages ( )
private

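Evicts all metadata pages for the least recently used table. Because chunk data is useless without its metadata, every buffer of that table is deleted, along with the table's serialized data wrapper file. Returns the first FileInfo that a page was evicted from, which is therefore guaranteed to have at least one free page.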

Definition at line 449 of file CachingFileMgr.cpp.

References CHECK, anonymous_namespace{CachingFileMgr.cpp}::evict_chunk_or_fail(), and get_table_prefix().

449  {
450  // Locks should already be in place before calling this method.
451  FileInfo* file_info{nullptr};
452  auto key_to_evict = evict_chunk_or_fail(table_evict_alg_);
453  auto [db_id, tb_id] = get_table_prefix(key_to_evict);
454  const auto keys = getKeysForTable(db_id, tb_id);
455  for (const auto& key : keys) {
456  auto chunk_it = chunkIndex_.find(key);
457  CHECK(chunk_it != chunkIndex_.end());
458  auto& buf = chunk_it->second;
459  if (!file_info) {
460  // Return the FileInfo for the first file we are freeing a page from so that the
461  // caller does not have to search for a FileInfo guaranteed to have at least one
462  // free page.
463  CHECK(buf->getMetadataPage().pageVersions.size() > 0);
464  file_info =
465  getFileInfoForFileId(buf->getMetadataPage().pageVersions.front().page.fileId);
466  }
467  // We erase all pages and entries for the chunk, as without metadata all other
468  // entries are useless.
469  deleteBufferUnlocked(chunk_it);
470  }
471  // Serialized datawrappers require metadata to be in the cache.
472  deleteWrapperFile(db_id, tb_id);
473  CHECK(file_info) << "FileInfo with freed page not found";
474  return file_info;
475 }
LRUEvictionAlgorithm table_evict_alg_
void deleteWrapperFile(int32_t db, int32_t tb)
Deletes the wrapper file from a table subdir.
ChunkKeyToChunkMap::iterator deleteBufferUnlocked(const ChunkKeyToChunkMap::iterator chunk_it, const bool purge=true) override
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:318
ChunkKey evict_chunk_or_fail(LRUEvictionAlgorithm &alg)
std::vector< ChunkKey > getKeysForTable(int32_t db_id, int32_t tb_id) const
Returns the keys contained in chunkIndex_ that match the given table prefix.
std::pair< int, int > get_table_prefix(const ChunkKey &key)
Definition: types.h:58
#define CHECK(condition)
Definition: Logger.h:209
FileInfo * getFileInfoForFileId(const int32_t fileId) const
Definition: FileMgr.h:218
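
The table-level eviction described above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the OmniSciDB implementation: `LruQueue`, `Cache`, and `evictLeastRecentTable` are hypothetical names, the chunk index is reduced to a map from key to page count, and locking is omitted.

```cpp
#include <cstddef>
#include <iterator>
#include <limits>
#include <list>
#include <map>
#include <stdexcept>
#include <vector>

using ChunkKey = std::vector<int>;  // {db_id, tb_id, ...}

// Hypothetical stand-in for an LRU queue of table keys.
class LruQueue {
 public:
  // Moves (or adds) a key to the most recently used end of the queue.
  void touch(const ChunkKey& key) {
    remove(key);
    queue_.push_back(key);
    pos_[key] = std::prev(queue_.end());
  }

  void remove(const ChunkKey& key) {
    auto it = pos_.find(key);
    if (it != pos_.end()) {
      queue_.erase(it->second);
      pos_.erase(it);
    }
  }

  // Pops the least recently used key; throws if the queue is empty.
  ChunkKey evictNext() {
    if (queue_.empty()) {
      throw std::runtime_error("nothing to evict");
    }
    ChunkKey key = queue_.front();
    remove(key);
    return key;
  }

 private:
  std::list<ChunkKey> queue_;
  std::map<ChunkKey, std::list<ChunkKey>::iterator> pos_;
};

// Hypothetical cache: chunk key -> number of pages held by that chunk.
struct Cache {
  std::map<ChunkKey, std::size_t> chunk_index;
  LruQueue table_queue;
};

// Deletes every chunk of the least recently used table; returns pages freed.
std::size_t evictLeastRecentTable(Cache& cache) {
  const ChunkKey table_key = cache.table_queue.evictNext();
  const ChunkKey max_key{table_key[0], table_key[1], std::numeric_limits<int>::max()};
  std::size_t freed = 0;
  auto it = cache.chunk_index.lower_bound(table_key);
  const auto end = cache.chunk_index.upper_bound(max_key);
  while (it != end) {
    freed += it->second;
    it = cache.chunk_index.erase(it);  // erase returns the next valid iterator
  }
  return freed;
}
```

Evicting at table granularity keeps the invariant from the class description: the cache holds metadata for all of a table's chunks or for none of them.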

FileInfo * File_Namespace::CachingFileMgr::evictPages ( )
private

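Evicts all data pages for the least recently used chunk that actually holds data (metadata pages persist). Buffers without data pages are skipped and dropped from the eviction queue. Returns the first FileInfo that a page was evicted from, which is therefore guaranteed to have at least one free page.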

Definition at line 477 of file CachingFileMgr.cpp.

References CHECK, and anonymous_namespace{CachingFileMgr.cpp}::evict_chunk_or_fail().

477  {
478  FileInfo* file_info{nullptr};
479  FileBuffer* buf{nullptr};
480  while (!file_info) {
482  CHECK(buf);
483  if (!buf->hasDataPages()) {
484  // This buffer contains no chunk data (metadata only, uninitialized, size == 0,
485  // etc...) so we won't recover any space by evicting it. In this case it gets
486  // removed from the eviction queue (it will get re-added if it gets populated with
487  // data) and we look at the next chunk in queue until we find a buffer with page
488  // data.
489  continue;
490  }
491  // Return the FileInfo for the first file we are freeing a page from so that the
492  // caller does not have to search for a FileInfo guaranteed to have at least one free
493  // page.
494  CHECK(buf->getMultiPage().front().pageVersions.size() > 0);
495  file_info = getFileInfoForFileId(
496  buf->getMultiPage().front().pageVersions.front().page.fileId);
497  }
498  auto pages_freed = buf->freeChunkPages();
499  CHECK(pages_freed > 0) << "failed to evict a page";
500  CHECK(file_info) << "FileInfo with freed page not found";
501  return file_info;
502 }
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:318
ChunkKey evict_chunk_or_fail(LRUEvictionAlgorithm &alg)
#define CHECK(condition)
Definition: Logger.h:209
FileInfo * getFileInfoForFileId(const int32_t fileId) const
Definition: FileMgr.h:218
LRUEvictionAlgorithm chunk_evict_alg_
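
The skip-until-data loop in evictPages() can be illustrated with a reduced model. The sketch below is hypothetical: `Buffer`, `evictDataPages`, and the use of string keys with a `std::deque` as the LRU queue are assumptions made for brevity and are not part of the CachingFileMgr API.

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical, simplified buffer: only tracks how many data pages it holds.
struct Buffer {
  std::size_t data_pages{0};

  bool hasDataPages() const { return data_pages > 0; }

  // Releases all data pages and reports how many were freed.
  std::size_t freeChunkPages() {
    const std::size_t freed = data_pages;
    data_pages = 0;
    return freed;
  }
};

// Evicts the data pages of the least recently used buffer that holds data.
// `lru` lists keys from least to most recently used. Returns the evicted key.
std::string evictDataPages(std::deque<std::string>& lru,
                           std::map<std::string, Buffer>& index) {
  while (!lru.empty()) {
    const std::string key = lru.front();
    lru.pop_front();  // the candidate leaves the queue whether or not it is evicted
    Buffer& buf = index.at(key);
    if (!buf.hasDataPages()) {
      continue;  // metadata-only buffer: nothing to reclaim, try the next one
    }
    buf.freeChunkPages();
    return key;
  }
  throw std::runtime_error("no buffer with data pages left to evict");
}
```

Note that the evicted buffer stays in the index with zero data pages, which mirrors the statement above that metadata pages persist.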

bool File_Namespace::CachingFileMgr::failOnReadError ( ) const
inline override virtual

True if a read error should cause a fatal error.

Reimplemented from File_Namespace::FileMgr.

Definition at line 285 of file CachingFileMgr.h.

285 { return false; }
void File_Namespace::CachingFileMgr::free_page ( std::pair< FileInfo *, int32_t > &&  page)
override virtual

Unlike the FileMgr, the CFM frees pages immediately instead of holding them until the next checkpoint.

Reimplemented from File_Namespace::FileMgr.

Definition at line 699 of file CachingFileMgr.cpp.

699  {
700  page.first->freePageDeferred(page.second);
701 }
size_t File_Namespace::CachingFileMgr::getAllocated ( )
inline override

Definition at line 206 of file CachingFileMgr.h.

References getFilesSize(), and getTableFileMgrsSize().

Referenced by getAvailableSpace().

206  {
207  return getFilesSize() + getTableFileMgrsSize();
208  }
size_t getFilesSize() const
Get the total size of page files (data and metadata files). This includes allocated, but unused space.
size_t getTableFileMgrsSize() const
Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirectory for serialized data wrappers and epoch files.

size_t File_Namespace::CachingFileMgr::getAvailableSpace ( )
inline

Definition at line 202 of file CachingFileMgr.h.

References getAllocated(), and max_size_.

size_t File_Namespace::CachingFileMgr::getAvailableWrapperSpace ( )
inline

Definition at line 203 of file CachingFileMgr.h.

References getTableFileMgrsSize(), and max_wrapper_space_.

203  {
205  }
size_t getTableFileMgrsSize() const
Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirectory for serialized data wrappers and epoch files.

std::optional< FileBuffer * > File_Namespace::CachingFileMgr::getBufferIfExists ( const ChunkKey key)

An optional variant of getBuffer for callers that are not sure a chunk exists. Returns an empty optional if the key is not in the chunk index.

Definition at line 668 of file CachingFileMgr.cpp.

668  {
669  mapd_shared_lock<mapd_shared_mutex> chunk_index_read_lock(chunkIndexMutex_);
670  auto chunk_it = chunkIndex_.find(key);
671  if (chunk_it == chunkIndex_.end()) {
672  return {};
673  }
674  return getBufferUnlocked(chunk_it);
675 }
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:318
FileBuffer * getBufferUnlocked(const ChunkKeyToChunkMap::iterator chunk_it, const size_t numBytes=0) override
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:400
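
The optional-returning lookup above follows a common pattern that can be shown on its own. The sketch below is a simplified illustration: `Buffer` and `BufferIndex` are hypothetical types, and the shared lock taken by the real method is omitted.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical buffer type used only for this sketch.
struct Buffer {
  int size{0};
};

class BufferIndex {
 public:
  void put(const std::string& key, int size) { buffers_[key].size = size; }

  // Returns a pointer to the buffer, or an empty optional when the key is unknown.
  std::optional<Buffer*> getIfExists(const std::string& key) {
    auto it = buffers_.find(key);
    if (it == buffers_.end()) {
      return {};
    }
    return &it->second;
  }

 private:
  std::map<std::string, Buffer> buffers_;
};
```

A caller checks `has_value()` before dereferencing, which makes the "chunk may be absent" case explicit without resorting to a null pointer or an exception.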
FileBuffer * File_Namespace::CachingFileMgr::getBufferUnlocked ( const ChunkKeyToChunkMap::iterator  chunk_it,
const size_t  numBytes = 0 
)
override private virtual

Reimplemented from File_Namespace::FileMgr.

Definition at line 693 of file CachingFileMgr.cpp.

694  {
695  touchKey(chunk_it->first);
696  return FileMgr::getBufferUnlocked(chunk_it, num_bytes);
697 }
void touchKey(const ChunkKey &key) const
Used to track which tables/chunks were least recently used.
virtual FileBuffer * getBufferUnlocked(const ChunkKeyToChunkMap::iterator chunk_it, const size_t numBytes=0)
Definition: FileMgr.cpp:764
std::vector< ChunkKey > File_Namespace::CachingFileMgr::getChunkKeysForPrefix ( const ChunkKey prefix) const

Returns the keys for chunks with chunk data that match the given prefix.

Definition at line 556 of file CachingFileMgr.cpp.

References in_same_table().

557  {
558  mapd_shared_lock<mapd_shared_mutex> read_lock(chunkIndexMutex_);
559  std::vector<ChunkKey> chunks;
560  for (auto [key, buf] : chunkIndex_) {
561  if (in_same_table(key, prefix)) {
562  if (buf->hasDataPages()) {
563  chunks.emplace_back(key);
564  touchKey(key);
565  }
566  }
567  }
568  return chunks;
569 }
void touchKey(const ChunkKey &key) const
Used to track which tables/chunks were least recently used.
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:318
mapd_shared_lock< mapd_shared_mutex > read_lock
bool in_same_table(const ChunkKey &left_key, const ChunkKey &right_key)
Definition: types.h:79
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:400

void File_Namespace::CachingFileMgr::getChunkMetadataVecForKeyPrefix ( ChunkMetadataVector chunkMetadataVec,
const ChunkKey keyPrefix 
)
override

Definition at line 684 of file CachingFileMgr.cpp.

686  {
687  FileMgr::getChunkMetadataVecForKeyPrefix(chunkMetadataVec, keyPrefix);
688  for (const auto& [key, meta] : chunkMetadataVec) {
689  touchKey(key);
690  }
691 }
void touchKey(const ChunkKey &key) const
Used to track which tables/chunks were least recently used.
void getChunkMetadataVecForKeyPrefix(ChunkMetadataVector &chunkMetadataVec, const ChunkKey &keyPrefix) override
Definition: FileMgr.cpp:970
size_t File_Namespace::CachingFileMgr::getChunkSpaceReservedByTable ( int32_t  db_id,
int32_t  tb_id 
) const

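Returns the space reserved for the chunk data pages of the given table (number of chunk pages times the default page size). Part of a set of functions that report how much space a table reserves, by type.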

Definition at line 169 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::chunkIndex_, File_Namespace::FileMgr::chunkIndexMutex_, and File_Namespace::FileMgr::defaultPageSize_.

Referenced by getSpaceReservedByTable().

169  {
170  mapd_shared_lock<mapd_shared_mutex> read_lock(chunkIndexMutex_);
171  size_t space_used = 0;
172  ChunkKey min_table_key{db_id, tb_id};
173  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
174  for (auto it = chunkIndex_.lower_bound(min_table_key);
175  it != chunkIndex_.upper_bound(max_table_key);
176  ++it) {
177  auto& [key, buffer] = *it;
178  space_used += (buffer->numChunkPages() * defaultPageSize_);
179  }
180  return space_used;
181 }
std::vector< int > ChunkKey
Definition: types.h:37
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:318
size_t defaultPageSize_
Default page size used for data pages.
Definition: FileMgr.h:392
mapd_shared_lock< mapd_shared_mutex > read_lock
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:400
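
The prefix scan used here (and in several other methods on this page) relies on the lexicographic ordering of `std::vector<int>` keys in an ordered map. The following sketch shows the idea under simplifying assumptions: `PageCountIndex` and `spaceReservedByTable` are hypothetical names, and each entry stores a page count directly instead of a FileBuffer.

```cpp
#include <cstddef>
#include <limits>
#include <map>
#include <vector>

using ChunkKey = std::vector<int>;

// Hypothetical index: chunk key -> number of data pages held by that chunk.
using PageCountIndex = std::map<ChunkKey, std::size_t>;

// Sums pages * page_size over all keys that start with {db_id, tb_id}.
std::size_t spaceReservedByTable(const PageCountIndex& index,
                                 int db_id,
                                 int tb_id,
                                 std::size_t page_size) {
  // {db, tb} sorts before every longer key with the same prefix, and
  // {db, tb, INT_MAX} sorts after the keys of any ordinary column id.
  const ChunkKey min_key{db_id, tb_id};
  const ChunkKey max_key{db_id, tb_id, std::numeric_limits<int>::max()};
  std::size_t space = 0;
  const auto end = index.upper_bound(max_key);
  for (auto it = index.lower_bound(min_key); it != end; ++it) {
    space += it->second * page_size;
  }
  return space;
}
```

Because the map is ordered, only the entries of the requested table are visited; the scan does not touch keys of other tables.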

size_t File_Namespace::CachingFileMgr::getDataFileSize ( ) const
inline

Definition at line 193 of file CachingFileMgr.h.

References File_Namespace::FileMgr::defaultPageSize_, and File_Namespace::FileMgr::num_pages_per_data_file_.

193  {
195  }
static size_t num_pages_per_data_file_
Definition: FileMgr.h:407
size_t defaultPageSize_
Default page size used for data pages.
Definition: FileMgr.h:392
size_t File_Namespace::CachingFileMgr::getDefaultPageSize ( )
inline

Definition at line 188 of file CachingFileMgr.h.

References File_Namespace::FileMgr::defaultPageSize_.

188 { return defaultPageSize_; }
size_t defaultPageSize_
Default page size used for data pages.
Definition: FileMgr.h:392
size_t File_Namespace::CachingFileMgr::getFilesSize ( ) const

Get the total size of page files (data and metadata files). This includes allocated, but unused space.

Definition at line 528 of file CachingFileMgr.cpp.

Referenced by getAllocated().

528  {
529  mapd_shared_lock<mapd_shared_mutex> read_lock(files_rw_mutex_);
530  size_t sum = 0;
531  for (auto [id, file] : files_) {
532  sum += file->size();
533  }
534  return sum;
535 }
mapd_shared_lock< mapd_shared_mutex > read_lock
std::map< int32_t, FileInfo * > files_
Definition: FileMgr.h:389
mapd_shared_mutex files_rw_mutex_
Definition: FileMgr.h:401

std::vector< ChunkKey > File_Namespace::CachingFileMgr::getKeysForTable ( int32_t  db_id,
int32_t  tb_id 
) const
private

Returns the keys contained in chunkIndex_ that match the given table prefix.

Definition at line 436 of file CachingFileMgr.cpp.

437  {
438  std::vector<ChunkKey> keys;
439  ChunkKey min_table_key{db_id, tb_id};
440  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
441  for (auto it = chunkIndex_.lower_bound(min_table_key);
442  it != chunkIndex_.upper_bound(max_table_key);
443  ++it) {
444  keys.emplace_back(it->first);
445  }
446  return keys;
447 }
std::vector< int > ChunkKey
Definition: types.h:37
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:318
std::set< ChunkKey > File_Namespace::CachingFileMgr::getKeysWithMetadata ( ) const

Definition at line 703 of file CachingFileMgr.cpp.

703  {
704  mapd_shared_lock<mapd_shared_mutex> read_lock(chunkIndexMutex_);
705  std::set<ChunkKey> ret;
706  for (const auto& [key, buf] : chunkIndex_) {
707  if (buf->hasEncoder()) {
708  ret.emplace(key);
709  }
710  }
711  return ret;
712 }
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:318
mapd_shared_lock< mapd_shared_mutex > read_lock
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:400
size_t File_Namespace::CachingFileMgr::getMaxDataFiles ( ) const
inline

Definition at line 190 of file CachingFileMgr.h.

References max_num_data_files_.

size_t File_Namespace::CachingFileMgr::getMaxMetaFiles ( ) const
inline

Definition at line 191 of file CachingFileMgr.h.

References max_num_meta_files_.

size_t File_Namespace::CachingFileMgr::getMaxSize ( )
inline override

Definition at line 189 of file CachingFileMgr.h.

References max_size_.

189 { return max_size_; }
size_t File_Namespace::CachingFileMgr::getMaxWrapperSize ( ) const
inline

Definition at line 192 of file CachingFileMgr.h.

References max_wrapper_space_.

size_t File_Namespace::CachingFileMgr::getMetadataFileSize ( ) const
inline

Definition at line 196 of file CachingFileMgr.h.

References METADATA_PAGE_SIZE, and File_Namespace::FileMgr::num_pages_per_metadata_file_.

196  {
198  }
#define METADATA_PAGE_SIZE
Definition: FileBuffer.h:37
static size_t num_pages_per_metadata_file_
Definition: FileMgr.h:408
size_t File_Namespace::CachingFileMgr::getMetadataSpaceReservedByTable ( int32_t  db_id,
int32_t  tb_id 
) const

Definition at line 183 of file CachingFileMgr.cpp.

References File_Namespace::FileMgr::chunkIndex_, File_Namespace::FileMgr::chunkIndexMutex_, and METADATA_PAGE_SIZE.

Referenced by getSpaceReservedByTable().

184  {
185  mapd_shared_lock<mapd_shared_mutex> read_lock(chunkIndexMutex_);
186  size_t space_used = 0;
187  ChunkKey min_table_key{db_id, tb_id};
188  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
189  for (auto it = chunkIndex_.lower_bound(min_table_key);
190  it != chunkIndex_.upper_bound(max_table_key);
191  ++it) {
192  auto& [key, buffer] = *it;
193  space_used += (buffer->numMetadataPages() * METADATA_PAGE_SIZE);
194  }
195  return space_used;
196 }
#define METADATA_PAGE_SIZE
Definition: FileBuffer.h:37
std::vector< int > ChunkKey
Definition: types.h:37
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:318
mapd_shared_lock< mapd_shared_mutex > read_lock
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:400

MgrType File_Namespace::CachingFileMgr::getMgrType ( )
inline override

Definition at line 186 of file CachingFileMgr.h.

186 { return CACHING_FILE_MGR; };
static size_t File_Namespace::CachingFileMgr::getMinimumSize ( )
inline static

Definition at line 174 of file CachingFileMgr.h.

References File_Namespace::FileMgr::DEFAULT_NUM_PAGES_PER_METADATA_FILE, METADATA_FILE_SPACE_PERCENTAGE, and METADATA_PAGE_SIZE.

Referenced by CommandLineOptions::validate().

174  {
175  // Currently the minimum default size is based on the metadata file size and
176  // percentage usage.
179  }
#define METADATA_PAGE_SIZE
Definition: FileBuffer.h:37
static constexpr size_t DEFAULT_NUM_PAGES_PER_METADATA_FILE
Definition: FileMgr.h:363
static constexpr float METADATA_FILE_SPACE_PERCENTAGE

size_t File_Namespace::CachingFileMgr::getNumChunksWithMetadata ( ) const

Returns the number of buffers with metadata in the CFM. Any buffer with an encoder counts.

Definition at line 583 of file CachingFileMgr.cpp.

583  {
584  mapd_shared_lock<mapd_shared_mutex> read_lock(chunkIndexMutex_);
585  size_t sum = 0;
586  for (const auto& [key, buf] : chunkIndex_) {
587  if (buf->hasEncoder()) {
588  sum++;
589  }
590  }
591  return sum;
592 }
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:318
mapd_shared_lock< mapd_shared_mutex > read_lock
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:400
size_t File_Namespace::CachingFileMgr::getNumDataChunks ( ) const

Returns the number of buffers with chunk data in the CFM.

Definition at line 378 of file CachingFileMgr.cpp.

378  {
379  mapd_shared_lock<mapd_shared_mutex> read_lock(chunkIndexMutex_);
380  size_t num_chunks = 0;
381  for (auto [key, buf] : chunkIndex_) {
382  if (buf->hasDataPages()) {
383  num_chunks++;
384  }
385  }
386  return num_chunks;
387 }
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:318
mapd_shared_lock< mapd_shared_mutex > read_lock
mapd_shared_mutex chunkIndexMutex_
Definition: FileMgr.h:400
size_t File_Namespace::CachingFileMgr::getNumDataFiles ( ) const

Definition at line 546 of file CachingFileMgr.cpp.

546  {
547  mapd_shared_lock<mapd_shared_mutex> read_lock(files_rw_mutex_);
548  return fileIndex_.count(defaultPageSize_);
549 }
PageSizeFileMMap fileIndex_
Maps page sizes to FileInfo objects.
Definition: FileMgr.h:390
size_t defaultPageSize_
Default page size used for data pages.
Definition: FileMgr.h:392
mapd_shared_lock< mapd_shared_mutex > read_lock
mapd_shared_mutex files_rw_mutex_
Definition: FileMgr.h:401
size_t File_Namespace::CachingFileMgr::getNumMetaFiles ( ) const

Definition at line 551 of file CachingFileMgr.cpp.

References METADATA_PAGE_SIZE.

551  {
552  mapd_shared_lock<mapd_shared_mutex> read_lock(files_rw_mutex_);
553  return fileIndex_.count(METADATA_PAGE_SIZE);
554 }
#define METADATA_PAGE_SIZE
Definition: FileBuffer.h:37
PageSizeFileMMap fileIndex_
Maps page sizes to FileInfo objects.
Definition: FileMgr.h:390
mapd_shared_lock< mapd_shared_mutex > read_lock
mapd_shared_mutex files_rw_mutex_
Definition: FileMgr.h:401
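
Both getNumDataFiles() and getNumMetaFiles() count entries in a multimap keyed by page size. A minimal sketch of that lookup follows; `PageSizeFileIndex`, `countFilesWithPageSize`, and `fileIdsWithPageSize` are hypothetical names, and the value type (a file id) is an assumption based on how requestFreePage() uses the index.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for fileIndex_: page size -> file id, one entry per file.
using PageSizeFileIndex = std::multimap<std::size_t, int>;

// Number of files whose pages have the given size.
std::size_t countFilesWithPageSize(const PageSizeFileIndex& index, std::size_t page_size) {
  return index.count(page_size);
}

// Ids of all files whose pages have the given size, in insertion order per key.
std::vector<int> fileIdsWithPageSize(const PageSizeFileIndex& index, std::size_t page_size) {
  std::vector<int> ids;
  const auto range = index.equal_range(page_size);
  for (auto it = range.first; it != range.second; ++it) {
    ids.push_back(it->second);
  }
  return ids;
}
```

Keying by page size separates data files from metadata files only as long as the two page sizes differ.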
size_t File_Namespace::CachingFileMgr::getSpaceReservedByTable ( int32_t  db_id,
int32_t  tb_id 
) const

Definition at line 208 of file CachingFileMgr.cpp.

References getChunkSpaceReservedByTable(), getMetadataSpaceReservedByTable(), and getTableFileMgrSpaceReserved().

208  {
209  auto chunk_space = getChunkSpaceReservedByTable(db_id, tb_id);
210  auto meta_space = getMetadataSpaceReservedByTable(db_id, tb_id);
211  auto subdir_space = getTableFileMgrSpaceReserved(db_id, tb_id);
212  return chunk_space + meta_space + subdir_space;
213 }
size_t getTableFileMgrSpaceReserved(int32_t db_id, int32_t tb_id) const
size_t getMetadataSpaceReservedByTable(int32_t db_id, int32_t tb_id) const
size_t getChunkSpaceReservedByTable(int32_t db_id, int32_t tb_id) const

std::string File_Namespace::CachingFileMgr::getStringMgrType ( )
inline override

Definition at line 187 of file CachingFileMgr.h.

187 { return ToString(CACHING_FILE_MGR); }
std::string File_Namespace::CachingFileMgr::getTableFileMgrPath ( int32_t  db,
int32_t  tb 
) const

Definition at line 153 of file CachingFileMgr.cpp.

References File_Namespace::get_dir_name_for_table(), and File_Namespace::FileMgr::getFileMgrBasePath().

153  {
154  return getFileMgrBasePath() + "/" + get_dir_name_for_table(db_id, tb_id);
155 }
std::string get_dir_name_for_table(int db_id, int tb_id)
std::string getFileMgrBasePath() const
Definition: FileMgr.h:323

size_t File_Namespace::CachingFileMgr::getTableFileMgrSpaceReserved ( int32_t  db_id,
int32_t  tb_id 
) const

Definition at line 198 of file CachingFileMgr.cpp.

References table_dirs_, and table_dirs_mutex_.

Referenced by getSpaceReservedByTable().

198  {
199  mapd_shared_lock<mapd_shared_mutex> read_lock(table_dirs_mutex_);
200  size_t space = 0;
201  auto table_it = table_dirs_.find({db_id, tb_id});
202  if (table_it != table_dirs_.end()) {
203  space += table_it->second->getReservedSpace();
204  }
205  return space;
206 }
mapd_shared_mutex table_dirs_mutex_
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
mapd_shared_lock< mapd_shared_mutex > read_lock

size_t File_Namespace::CachingFileMgr::getTableFileMgrsSize ( ) const

Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirectory for serialized data wrappers and epoch files.

Definition at line 537 of file CachingFileMgr.cpp.

Referenced by getAllocated(), and getAvailableWrapperSpace().

537  {
538  mapd_shared_lock<mapd_shared_mutex> read_lock(table_dirs_mutex_);
539  size_t space_used = 0;
540  for (const auto& [pair, table_dir] : table_dirs_) {
541  space_used += table_dir->getReservedSpace();
542  }
543  return space_used;
544 }
mapd_shared_mutex table_dirs_mutex_
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
mapd_shared_lock< mapd_shared_mutex > read_lock

bool File_Namespace::CachingFileMgr::hasFileMgrKey ( ) const
inline override virtual

Query to determine if the contained pages will have their database and table ids overridden by the filemgr key (FileMgr does this).

Reimplemented from File_Namespace::FileMgr.

Definition at line 224 of file CachingFileMgr.h.

224 { return false; }
void File_Namespace::CachingFileMgr::incrementAllEpochs ( )
private

Increment epochs for each table in the CFM.

Definition at line 292 of file CachingFileMgr.cpp.

Referenced by init().

292  {
293  mapd_shared_lock<mapd_shared_mutex> read_lock(table_dirs_mutex_);
294  for (auto& table_dir : table_dirs_) {
295  table_dir.second->incrementEpoch();
296  }
297 }
mapd_shared_mutex table_dirs_mutex_
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
mapd_shared_lock< mapd_shared_mutex > read_lock

void File_Namespace::CachingFileMgr::incrementEpoch ( int32_t  db_id,
int32_t  tb_id 
)
private

Increments epoch for the given table.

Definition at line 132 of file CachingFileMgr.cpp.

References CHECK, table_dirs_, and table_dirs_mutex_.

132  {
133  mapd_shared_lock<mapd_shared_mutex> read_lock(table_dirs_mutex_);
134  auto tables_it = table_dirs_.find({db_id, tb_id});
135  CHECK(tables_it != table_dirs_.end());
136  auto& [pair, table_dir] = *tables_it;
137  table_dir->incrementEpoch();
138 }
mapd_shared_mutex table_dirs_mutex_
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
mapd_shared_lock< mapd_shared_mutex > read_lock
#define CHECK(condition)
Definition: Logger.h:209
void File_Namespace::CachingFileMgr::init ( const size_t  num_reader_threads)
private

Initializes a CFM, parsing any existing files and initializing data structures appropriately (currently not thread-safe).

Definition at line 66 of file CachingFileMgr.cpp.

References createBufferFromHeaders(), deleteCacheIfTooLarge(), File_Namespace::FileMgr::freePages(), incrementAllEpochs(), File_Namespace::FileMgr::initializeNumThreads(), File_Namespace::FileMgr::isFullyInitted_, File_Namespace::FileMgr::nextFileId_, File_Namespace::FileMgr::openFiles(), readTableFileMgrs(), gpu_enabled::sort(), and VLOG.

Referenced by CachingFileMgr().

66  {
69  auto open_files_result = openFiles();
70  /* Sort headerVec so that all HeaderInfos
71  * from a chunk will be grouped together
72  * and in order of increasing PageId
73  * - Version Epoch */
74  auto& header_vec = open_files_result.header_infos;
75  std::sort(header_vec.begin(), header_vec.end());
76 
77  /* Goal of next section is to find sequences in the
78  * sorted headerVec of the same ChunkId, which we
79  * can then initiate a FileBuffer with */
80  VLOG(3) << "Number of Headers in Vector: " << header_vec.size();
81  if (header_vec.size() > 0) {
82  auto startIt = header_vec.begin();
83  ChunkKey lastChunkKey = startIt->chunkKey;
84  for (auto it = header_vec.begin() + 1; it != header_vec.end(); ++it) {
85  if (it->chunkKey != lastChunkKey) {
86  createBufferFromHeaders(lastChunkKey, startIt, it);
87  lastChunkKey = it->chunkKey;
88  startIt = it;
89  }
90  }
91  createBufferFromHeaders(lastChunkKey, startIt, header_vec.end());
92  }
93  nextFileId_ = open_files_result.max_file_id + 1;
95  freePages();
96  initializeNumThreads(num_reader_threads);
97  isFullyInitted_ = true;
98 }
std::vector< int > ChunkKey
Definition: types.h:37
OpenFilesResult openFiles()
Definition: FileMgr.cpp:189
DEVICE void sort(ARGS &&...args)
Definition: gpu_enabled.h:105
void deleteCacheIfTooLarge()
When the cache is read from disk, we don't know which chunks were least recently used. Rather than try to evict random pages to get down to size, we just reset the cache to make sure we have space.
void incrementAllEpochs()
Increment epochs for each table in the CFM.
void readTableFileMgrs()
Checks for any sub-directories containing table-specific data and creates epochs from found files.
void initializeNumThreads(size_t num_reader_threads=0)
Definition: FileMgr.cpp:1513
#define VLOG(n)
Definition: Logger.h:303
FileBuffer * createBufferFromHeaders(const ChunkKey &key, const std::vector< HeaderInfo >::const_iterator &startIt, const std::vector< HeaderInfo >::const_iterator &endIt) override
Creates a buffer and initializes it with info read from files on disk.
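
The grouping step in init() (sort the headers, then hand each run of equal chunk keys to a buffer constructor) can be sketched independently. In the illustration below, `Header` is a hypothetical, reduced stand-in for HeaderInfo, and `groupByChunk` returns run lengths instead of creating buffers.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

using ChunkKey = std::vector<int>;

// Hypothetical, reduced header: the real HeaderInfo carries more fields.
struct Header {
  ChunkKey chunk_key;
  int page_id;

  // Order by chunk key first so headers of one chunk end up adjacent.
  bool operator<(const Header& other) const {
    return std::tie(chunk_key, page_id) < std::tie(other.chunk_key, other.page_id);
  }
};

// Sorts the headers, then returns one (key, header count) entry per run of equal keys.
std::vector<std::pair<ChunkKey, std::size_t>> groupByChunk(std::vector<Header> headers) {
  std::vector<std::pair<ChunkKey, std::size_t>> groups;
  std::sort(headers.begin(), headers.end());
  auto run_start = headers.begin();
  while (run_start != headers.end()) {
    // Find the first header whose key differs from the current run's key.
    auto run_end = std::find_if(run_start, headers.end(), [&](const Header& h) {
      return h.chunk_key != run_start->chunk_key;
    });
    groups.emplace_back(run_start->chunk_key,
                        static_cast<std::size_t>(run_end - run_start));
    run_start = run_end;
  }
  return groups;
}
```

Using `std::find_if` to locate the end of each run also handles the empty and single-run cases without the separate trailing call that the listing above needs after its loop.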

FileBuffer * File_Namespace::CachingFileMgr::putBuffer ( const ChunkKey key,
AbstractBuffer src_buffer,
const size_t  num_bytes = 0 
)
override

Deletes any existing buffer for the given key, then copies in a new one.

putBuffer() needs to behave differently than it does in FileMgr. Specifically, it needs to delete the buffer beforehand and then append, rather than overwrite the existing buffer. This way we only store a single version of the buffer rather than accumulating versions that need to be rolled off.

Definition at line 280 of file CachingFileMgr.cpp.

References CHECK, Data_Namespace::AbstractBuffer::isDirty(), Data_Namespace::AbstractBuffer::setAppended(), Data_Namespace::AbstractBuffer::setDirty(), and Data_Namespace::AbstractBuffer::size().

282  {
283  CHECK(!src_buffer->isDirty()) << "Cannot cache dirty buffers.";
285  // Since the buffer is not dirty we mark it as dirty if we are only writing metadata and
286  // appended if we are writing chunk data. We delete + append rather than write to make
287  // sure we don't write multiple page versions.
288  (src_buffer->size() == 0) ? src_buffer->setDirty() : src_buffer->setAppended();
289  return FileMgr::putBuffer(key, src_buffer, num_bytes);
290 }
void deleteBufferIfExists(const ChunkKey &key)
Deletes a buffer if it exists in the mgr; otherwise does nothing.
#define CHECK(condition)
Definition: Logger.h:209
FileBuffer * putBuffer(const ChunkKey &key, AbstractBuffer *d, const size_t numBytes=0) override
Puts the contents of d into the Chunk with the given key.
Definition: FileMgr.cpp:790
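
The difference between overwriting and delete-then-append can be illustrated with a toy versioned store. This sketch is only an analogy for the behavior described above: `VersionedStore`, `putVersioned`, and `putReplacing` are hypothetical names and do not model pages or epochs.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical store: each key maps to the list of versions written for it.
using VersionedStore = std::map<std::string, std::vector<std::string>>;

// Plain write: every put adds another version that must later be rolled off.
void putVersioned(VersionedStore& store, const std::string& key, const std::string& data) {
  store[key].push_back(data);
}

// Cache-style write: drop whatever exists for the key, then append once.
void putReplacing(VersionedStore& store, const std::string& key, const std::string& data) {
  store.erase(key);  // no-op if the key is absent
  putVersioned(store, key, data);
}
```

With `putReplacing`, a key never holds more than one version, so no separate rollback or compaction step is needed to bound the space it uses.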

void File_Namespace::CachingFileMgr::readTableFileMgrs ( )
private

Checks for any sub-directories containing table-specific data and creates epochs from found files.

Definition at line 100 of file CachingFileMgr.cpp.

References CHECK, File_Namespace::FileMgr::fileMgrBasePath_, table_dirs_, and table_dirs_mutex_.

Referenced by init().

100  {
101  mapd_unique_lock<mapd_shared_mutex> write_lock(table_dirs_mutex_);
102  bf::path path(fileMgrBasePath_);
103  CHECK(bf::exists(path)) << "Cache path: " << fileMgrBasePath_ << " does not exist.";
104  CHECK(bf::is_directory(path))
105  << "Specified path '" << fileMgrBasePath_ << "' for disk cache is not a directory.";
106 
107  // Look for directories with table-specific names.
108  boost::regex table_filter("table_([0-9]+)_([0-9]+)");
109  for (const auto& file : bf::directory_iterator(path)) {
110  boost::smatch match;
111  auto file_name = file.path().filename().string();
112  if (boost::regex_match(file_name, match, table_filter)) {
113  int32_t db_id = std::stoi(match[1]);
114  int32_t tb_id = std::stoi(match[2]);
115  TablePair table_pair{db_id, tb_id};
116  CHECK(table_dirs_.find(table_pair) == table_dirs_.end())
117  << "Trying to read data for existing table";
118  table_dirs_.emplace(table_pair,
119  std::make_unique<TableFileMgr>(file.path().string()));
120  }
121  }
122 }
mapd_shared_mutex table_dirs_mutex_
std::string fileMgrBasePath_
Definition: FileMgr.h:386
std::map< TablePair, std::unique_ptr< TableFileMgr > > table_dirs_
#define CHECK(condition)
Definition: Logger.h:209
mapd_unique_lock< mapd_shared_mutex > write_lock
std::pair< const int32_t, const int32_t > TablePair
Definition: FileMgr.h:86
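
The directory-name matching in readTableFileMgrs() can be tried out without a file system. The sketch below swaps in `std::regex` for the `boost::regex` used in the listing; `parseTableDirName` is a hypothetical helper, not part of the class.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Parses "table_<db_id>_<tb_id>"; returns an empty optional for any other name.
std::optional<std::pair<int, int>> parseTableDirName(const std::string& name) {
  static const std::regex table_filter("table_([0-9]+)_([0-9]+)");
  std::smatch match;
  // regex_match requires the whole name to match, not just a substring.
  if (!std::regex_match(name, match, table_filter)) {
    return std::nullopt;
  }
  return std::make_pair(std::stoi(match[1].str()), std::stoi(match[2].str()));
}
```

Because the whole name must match, entries such as page files or directories with extra prefixes or suffixes are ignored rather than misread as tables.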

std::unique_ptr< CachingFileMgr > File_Namespace::CachingFileMgr::reconstruct ( ) const

Initializes a new CFM using the initialization values in the current CFM.

Definition at line 616 of file CachingFileMgr.cpp.

616  {
617  DiskCacheConfig config{fileMgrBasePath_,
620  max_size_,
622  return std::make_unique<CachingFileMgr>(config);
623 }
std::string fileMgrBasePath_
Definition: FileMgr.h:386
size_t num_reader_threads_
Number of threads used when loading data.
Definition: FileMgr.h:391
size_t defaultPageSize_
Default page size used for data pages.
Definition: FileMgr.h:392
void File_Namespace::CachingFileMgr::removeChunkKeepMetadata ( const ChunkKey key)

Free pages for chunk and remove it from the chunk eviction algorithm.

Definition at line 571 of file CachingFileMgr.cpp.

References CHECK.

571  {
572  if (isBufferOnDevice(key)) {
573  auto chunkIt = chunkIndex_.find(key);
574  CHECK(chunkIt != chunkIndex_.end());
575  auto& buf = chunkIt->second;
576  if (buf->hasDataPages()) {
577  buf->freeChunkPages();
579  }
580  }
581 }
void removeChunk(const ChunkKey &) override
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:318
bool isBufferOnDevice(const ChunkKey &key) override
Definition: FileMgr.cpp:720
#define CHECK(condition)
Definition: Logger.h:209
LRUEvictionAlgorithm chunk_evict_alg_
void File_Namespace::CachingFileMgr::removeKey ( const ChunkKey key) const
private

Definition at line 509 of file CachingFileMgr.cpp.

References get_table_prefix().

509  {
510  // chunkIndex lock should already be acquired.
512  auto [db_id, tb_id] = get_table_prefix(key);
513  ChunkKey table_key{db_id, tb_id};
514  ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
515  for (auto it = chunkIndex_.lower_bound(table_key);
516  it != chunkIndex_.upper_bound(max_table_key);
517  ++it) {
518  if (it->first != key) {
519  // If there are any keys in this table other than that one we are removing, then
520  // keep the table in the eviction queue.
521  return;
522  }
523  }
524  // No other keys exist for this table, so remove it from the queue.
525  table_evict_alg_.removeChunk(table_key);
526 }
std::vector< int > ChunkKey
Definition: types.h:37
LRUEvictionAlgorithm table_evict_alg_
void removeChunk(const ChunkKey &) override
ChunkKeyToChunkMap chunkIndex_
Definition: FileMgr.h:318
std::pair< int, int > get_table_prefix(const ChunkKey &key)
Definition: types.h:58
LRUEvictionAlgorithm chunk_evict_alg_

void File_Namespace::CachingFileMgr::removeTableBuffers ( int32_t  db_id,
int32_t  tb_id 
)
private

Erases and cleans up all buffers for a table.

Definition at line 309 of file CachingFileMgr.cpp.

Referenced by clearForTable().

309  {
310    // Free associated FileBuffers and clear buffer entries.
311    mapd_unique_lock<mapd_shared_mutex> write_lock(chunkIndexMutex_);
312    ChunkKey min_table_key{db_id, tb_id};
313    ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
314    for (auto it = chunkIndex_.lower_bound(min_table_key);
315         it != chunkIndex_.upper_bound(max_table_key);) {
316      it = deleteBufferUnlocked(it);
317    }
318 }
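
The loop above advances only through the iterator returned by the delete call, which is the usual way to erase from an ordered map while iterating over it. The sketch below shows the same pattern with std::map::erase. ChunkIndex and erase_table_entries are hypothetical stand-ins, and the string payload replaces the real buffer objects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <vector>

using ChunkKey = std::vector<int>;

// Hypothetical stand-in for chunkIndex_: chunk key -> some buffer payload.
using ChunkIndex = std::map<ChunkKey, std::string>;

// Erases every entry in the {db_id, tb_id} key range and returns how many
// entries were removed.
std::size_t erase_table_entries(ChunkIndex& index, int32_t db_id, int32_t tb_id) {
  const ChunkKey min_table_key{db_id, tb_id};
  const ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
  std::size_t removed = 0;
  for (auto it = index.lower_bound(min_table_key);
       it != index.upper_bound(max_table_key);) {
    // erase() returns the iterator that follows the removed element, so the
    // loop never touches an invalidated iterator.
    it = index.erase(it);
    ++removed;
  }
  return removed;
}
```

Erasing from a std::map invalidates only the iterator to the erased element, which is why re-evaluating upper_bound on each pass, as the listing does, stays safe.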

void File_Namespace::CachingFileMgr::removeTableFileMgr ( int32_t  db_id,
int32_t  tb_id 
)
private

Removes the subdirectory content for a table.

Definition at line 299 of file CachingFileMgr.cpp.

Referenced by clearForTable().

299  {
300    // Delete table-specific directory (stores table epoch data and serialized data wrapper)
301    mapd_unique_lock<mapd_shared_mutex> write_lock(table_dirs_mutex_);
302    auto it = table_dirs_.find({db_id, tb_id});
303    if (it != table_dirs_.end()) {
304      it->second->removeDiskContent();
305      table_dirs_.erase(it);
306    }
307 }

Page File_Namespace::CachingFileMgr::requestFreePage ( size_t  pagesize,
const bool  isMetadata 
)
override private virtual

Requests a free page as FileMgr does, but this override also evicts existing pages to make space when no free page is available.

Reimplemented from File_Namespace::FileMgr.

Definition at line 397 of file CachingFileMgr.cpp.

References CHECK, File_Namespace::FileInfo::fileId, and File_Namespace::FileInfo::getFreePage().

397  {
398    std::lock_guard<std::mutex> lock(getPageMutex_);
399    int32_t pageNum = -1;
400    // Splits files into metadata and regular data by size.
401    auto candidateFiles = fileIndex_.equal_range(pageSize);
402    // Check if there is a free page in an existing file.
403    for (auto fileIt = candidateFiles.first; fileIt != candidateFiles.second; ++fileIt) {
404      FileInfo* fileInfo = files_.at(fileIt->second);
405      pageNum = fileInfo->getFreePage();
406      if (pageNum != -1) {
407        return (Page(fileInfo->fileId, pageNum));
408      }
409    }
410
411    // Try to add a new file if there is free space available.
412    FileInfo* fileInfo = nullptr;
413    if (isMetadata) {
414      if (getMaxMetaFiles() > getNumMetaFiles()) {
415        fileInfo = createFile(pageSize, num_pages_per_metadata_file_);
416      }
417    } else {
418      if (getMaxDataFiles() > getNumDataFiles()) {
419        fileInfo = createFile(pageSize, num_pages_per_data_file_);
420      }
421    }
422
423    if (!fileInfo) {
424      // We were not able to create a new file, so we try to evict space.
425      // Eviction will return the first file it evicted a page from (a file now guaranteed
426      // to have a free page).
427      fileInfo = isMetadata ? evictMetadataPages() : evictPages();
428    }
429    CHECK(fileInfo);
430
431    pageNum = fileInfo->getFreePage();
432    CHECK(pageNum != -1);
433    return (Page(fileInfo->fileId, pageNum));
434 }
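
The listing tries three sources in order: a free page in an existing file, a newly created file while the file cap allows it, and finally eviction. The sketch below models that fallback chain in isolation. ToyFile and ToyPageAllocator are hypothetical names, pages are tracked as a simple bit vector, and a first-in-first-out order stands in for the LRU eviction that the real class performs at chunk and table granularity.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical toy file: tracks only which page slots are in use.
struct ToyFile {
  explicit ToyFile(std::size_t num_pages) : used(num_pages, false) {}

  // Claims the first unused page and returns its number, or -1 if the file is full.
  int get_free_page() {
    for (std::size_t i = 0; i < used.size(); ++i) {
      if (!used[i]) {
        used[i] = true;
        return static_cast<int>(i);
      }
    }
    return -1;
  }

  std::vector<bool> used;
};

class ToyPageAllocator {
 public:
  ToyPageAllocator(std::size_t max_files, std::size_t pages_per_file)
      : max_files_(max_files), pages_per_file_(pages_per_file) {
    assert(max_files_ > 0 && pages_per_file_ > 0);
  }

  // Returns {file_id, page_num} for a page that is now owned by the caller.
  std::pair<int, int> request_free_page() {
    // 1. Reuse a free page in an existing file.
    for (std::size_t f = 0; f < files_.size(); ++f) {
      const int page = files_[f].get_free_page();
      if (page != -1) {
        return record(static_cast<int>(f), page);
      }
    }
    // 2. Create a new file while the file cap has not been reached.
    if (files_.size() < max_files_) {
      files_.emplace_back(pages_per_file_);
      const int file_id = static_cast<int>(files_.size()) - 1;
      return record(file_id, files_.back().get_free_page());
    }
    // 3. Evict the oldest allocation (FIFO stands in for the LRU policy).
    assert(!allocation_order_.empty());
    const std::pair<int, int> victim = allocation_order_.front();
    allocation_order_.pop_front();
    ToyFile& file = files_[static_cast<std::size_t>(victim.first)];
    file.used[static_cast<std::size_t>(victim.second)] = false;
    const int page = file.get_free_page();
    assert(page != -1);  // the evicted slot is guaranteed to be free
    return record(victim.first, page);
  }

 private:
  std::pair<int, int> record(int file_id, int page_num) {
    allocation_order_.emplace_back(file_id, page_num);
    return std::make_pair(file_id, page_num);
  }

  std::size_t max_files_;
  std::size_t pages_per_file_;
  std::vector<ToyFile> files_;
  std::deque<std::pair<int, int>> allocation_order_;
};
```

As in the listing, the final step can assert that a free page exists, because eviction hands back a file from which a page was just released.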

void File_Namespace::CachingFileMgr::setMaxNumDataFiles ( size_t  max)
inline

Definition at line 357 of file CachingFileMgr.h.

References max_num_data_files_.

void File_Namespace::CachingFileMgr::setMaxNumMetadataFiles ( size_t  max)
inline

Definition at line 358 of file CachingFileMgr.h.

References max_num_meta_files_.

void File_Namespace::CachingFileMgr::setMaxSizes ( )
private

Sets the maximum number of files/space for each type of storage based on the maximum size.

Definition at line 653 of file CachingFileMgr.cpp.

References CHECK_GT, and METADATA_PAGE_SIZE.

Referenced by CachingFileMgr().

653  {
654    size_t max_meta_space = std::floor(max_size_ * METADATA_SPACE_PERCENTAGE);
655    size_t max_meta_file_space = std::floor(max_size_ * METADATA_FILE_SPACE_PERCENTAGE);
656    max_wrapper_space_ = max_meta_space - max_meta_file_space;
657    auto max_data_space = max_size_ - max_meta_space;
658    auto meta_file_size = METADATA_PAGE_SIZE * num_pages_per_metadata_file_;
659    auto data_file_size = defaultPageSize_ * num_pages_per_data_file_;
660    max_num_data_files_ = max_data_space / data_file_size;
661    max_num_meta_files_ = max_meta_file_space / meta_file_size;
662    CHECK_GT(max_num_data_files_, 0U) << "Cannot create a cache of size " << max_size_
663                                      << ". Not enough space to create a data file.";
664    CHECK_GT(max_num_meta_files_, 0U) << "Cannot create a cache of size " << max_size_
665                                      << ". Not enough space to create a metadata file.";
666 }
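
With METADATA_SPACE_PERCENTAGE at 0.1 and METADATA_FILE_SPACE_PERCENTAGE at 0.01 (both are fractions, despite the names), roughly 10% of the cache is set aside for metadata: about 1% of the total for metadata files and the remaining 9% or so for wrapper files. The other 90% goes to data files. The sketch below reproduces the arithmetic as a free function. compute_cache_limits and CacheLimits are hypothetical names, and the file sizes are passed in as parameters because the page sizes and pages-per-file counts are not stated in this section.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Hypothetical result type for the three limits the listing computes.
struct CacheLimits {
  std::size_t max_wrapper_space;
  std::size_t max_num_data_files;
  std::size_t max_num_meta_files;
};

CacheLimits compute_cache_limits(std::size_t max_size,
                                 std::size_t data_file_size,
                                 std::size_t meta_file_size) {
  // Values taken from the constants documented for CachingFileMgr.
  constexpr float kMetadataSpacePercentage = 0.1f;
  constexpr float kMetadataFileSpacePercentage = 0.01f;
  assert(data_file_size > 0 && meta_file_size > 0);

  const auto max_meta_space =
      static_cast<std::size_t>(std::floor(max_size * kMetadataSpacePercentage));
  const auto max_meta_file_space =
      static_cast<std::size_t>(std::floor(max_size * kMetadataFileSpacePercentage));

  CacheLimits limits;
  limits.max_wrapper_space = max_meta_space - max_meta_file_space;
  limits.max_num_data_files = (max_size - max_meta_space) / data_file_size;
  limits.max_num_meta_files = max_meta_file_space / meta_file_size;

  // The listing aborts through CHECK_GT; an exception plays that role here.
  if (limits.max_num_data_files == 0 || limits.max_num_meta_files == 0) {
    throw std::invalid_argument("cache too small to hold one data and one metadata file");
  }
  return limits;
}
```

Because the divisions are integer divisions, any space that does not fill a whole file is simply left unused, which is why a cache that is too small fails outright instead of being rounded up.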

void File_Namespace::CachingFileMgr::setMaxWrapperSpace ( size_t  max)
inline

Definition at line 359 of file CachingFileMgr.h.

References max_wrapper_space_.

void File_Namespace::CachingFileMgr::touchKey ( const ChunkKey & key) const
private

Used to track which tables/chunks were least recently used.

Definition at line 504 of file CachingFileMgr.cpp.

References get_table_key().

504  {
507 }
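
Tracking least-recent use at both chunk and table granularity can be pictured with two small LRU queues. The sketch below is only an illustration of that idea: ToyLruQueue and touch_key are hypothetical names, the list-plus-map layout is an assumption about how such a queue could be built, and none of it is taken from LRUEvictionAlgorithm.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <map>
#include <stdexcept>
#include <vector>

using ChunkKey = std::vector<int>;

// Hypothetical LRU queue: front is least recently used, back is most recent.
class ToyLruQueue {
 public:
  // Marks `key` as most recently used, inserting it if it is not tracked yet.
  void touch(const ChunkKey& key) {
    remove(key);
    queue_.push_back(key);
    position_[key] = std::prev(queue_.end());
  }

  // Stops tracking `key`; does nothing if the key is unknown.
  void remove(const ChunkKey& key) {
    const auto it = position_.find(key);
    if (it == position_.end()) {
      return;
    }
    queue_.erase(it->second);
    position_.erase(it);
  }

  // Removes and returns the least recently used key.
  ChunkKey evict_next() {
    if (queue_.empty()) {
      throw std::runtime_error("nothing to evict");
    }
    ChunkKey key = queue_.front();
    queue_.pop_front();
    position_.erase(key);
    return key;
  }

 private:
  std::list<ChunkKey> queue_;
  std::map<ChunkKey, std::list<ChunkKey>::iterator> position_;
};

// Records a use of `key` for the chunk and for the table that owns it.
void touch_key(ToyLruQueue& chunk_lru, ToyLruQueue& table_lru, const ChunkKey& key) {
  assert(key.size() >= 2);
  chunk_lru.touch(key);
  table_lru.touch(ChunkKey{key[0], key[1]});  // {db_id, tb_id} prefix
}
```

Keeping the two queues separate lets data pages be evicted per chunk while metadata is evicted per table, which matches the descriptions of evictPages() and evictMetadataPages() elsewhere on this page.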

bool File_Namespace::CachingFileMgr::updatePageIfDeleted ( FileInfo * file_info,
ChunkKey & chunk_key,
int32_t  contingent,
int32_t  page_epoch,
int32_t  page_num 
)
override virtual

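Checks whether a page should be deleted. Pages marked with DELETE_CONTINGENT or ROLLOFF_CONTINGENT are freed, and the function returns true for them.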

Reimplemented from File_Namespace::FileMgr.

Definition at line 334 of file CachingFileMgr.cpp.

References File_Namespace::DELETE_CONTINGENT, File_Namespace::FileInfo::freePage(), and File_Namespace::ROLLOFF_CONTINGENT.

338  {
339    // These contingents are stored by overwriting the bytes used for chunkKeys. If
340    // we run into a key marked for deletion in a fileMgr with no fileMgrKey (i.e.
341    // CachingFileMgr) then we can't know if the epoch is valid because we don't know
342    // the key. At this point our only option is to free the page as though it was
343    // checkpointed (which should be fine since we only maintain one version of each
344    // page).
345    if (contingent == DELETE_CONTINGENT || contingent == ROLLOFF_CONTINGENT) {
346      file_info->freePage(page_num, false, page_epoch);
347      return true;
348    }
349    return false;
350 }

void File_Namespace::CachingFileMgr::writeAndSyncEpochToDisk ( int32_t  db_id,
int32_t  tb_id 
)
private

Flushes epoch value to disk for a table.

Definition at line 140 of file CachingFileMgr.cpp.

References CHECK, table_dirs_, and table_dirs_mutex_.

140  {
141    mapd_shared_lock<mapd_shared_mutex> read_lock(table_dirs_mutex_);
142    auto table_it = table_dirs_.find({db_id, tb_id});
143    CHECK(table_it != table_dirs_.end());
144    table_it->second->writeAndSyncEpochToDisk();
145 }
void File_Namespace::CachingFileMgr::writeDirtyBuffers ( int32_t  db_id,
int32_t  tb_id 
)
private

Helper function that flushes all dirty buffers for a table to disk.

Definition at line 352 of file CachingFileMgr.cpp.

352  {
353    mapd_unique_lock<mapd_shared_mutex> chunk_index_write_lock(chunkIndexMutex_);
354    ChunkKey min_table_key{db_id, tb_id};
355    ChunkKey max_table_key{db_id, tb_id, std::numeric_limits<int32_t>::max()};
356
357    for (auto chunk_it = chunkIndex_.lower_bound(min_table_key);
358         chunk_it != chunkIndex_.upper_bound(max_table_key);
359         ++chunk_it) {
360      if (auto [key, buf] = *chunk_it; buf->isDirty()) {
361        // Free previous versions first so we only have one metadata version.
362        buf->freeMetadataPages();
363        buf->writeMetadata(epoch(db_id, tb_id));
364        buf->clearDirtyBits();
365        touchKey(key);
366      }
367    }
368 }
void File_Namespace::CachingFileMgr::writeWrapperFile ( const std::string &  doc,
int32_t  db,
int32_t  tb 
)

Writes a wrapper file to a table subdir.

Definition at line 632 of file CachingFileMgr.cpp.

References CHECK_LE.

632  {
634    auto wrapper_size = doc.size();
635    CHECK_LE(wrapper_size, getMaxWrapperSize())
636        << "Wrapper is too big to fit into the cache";
637    while (wrapper_size > getAvailableWrapperSpace()) {
639    }
640    mapd_shared_lock<mapd_shared_mutex> read_lock(table_dirs_mutex_);
641    table_dirs_.at({db, tb})->writeWrapperFile(doc);
642 }
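
The listing first rejects a wrapper that could never fit, then loops until enough wrapper space is available before writing. The body of that loop is not shown above, so the sketch below fills it with an assumption: each pass frees the oldest stored wrapper. ToyWrapperStore is a hypothetical class that tracks wrappers only by size.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>

// Hypothetical model of a size-capped wrapper store, oldest entry first.
class ToyWrapperStore {
 public:
  explicit ToyWrapperStore(std::size_t max_space) : max_space_(max_space) {}

  std::size_t available() const { return max_space_ - used_; }
  std::size_t count() const { return sizes_.size(); }

  void write(std::size_t wrapper_size) {
    // Mirrors the CHECK_LE guard: a wrapper larger than the cap can never fit.
    if (wrapper_size > max_space_) {
      throw std::length_error("Wrapper is too big to fit into the cache");
    }
    // Assumed eviction step: drop the oldest wrapper until the new one fits.
    // The loop ends before the queue is empty-accessed, because an empty
    // store has available() == max_space_ >= wrapper_size.
    while (wrapper_size > available()) {
      used_ -= sizes_.front();
      sizes_.pop_front();
    }
    sizes_.push_back(wrapper_size);
    used_ += wrapper_size;
  }

 private:
  std::size_t max_space_;
  std::size_t used_ = 0;
  std::deque<std::size_t> sizes_;
};
```

Checking the hard cap before the loop is what guarantees termination: once everything has been evicted, the available space equals the cap, which the wrapper is already known to fit within.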

Member Data Documentation

LRUEvictionAlgorithm File_Namespace::CachingFileMgr::chunk_evict_alg_
mutable private

Definition at line 484 of file CachingFileMgr.h.

size_t File_Namespace::CachingFileMgr::max_num_data_files_
private

Definition at line 480 of file CachingFileMgr.h.

Referenced by getMaxDataFiles(), and setMaxNumDataFiles().

size_t File_Namespace::CachingFileMgr::max_num_meta_files_
private

Definition at line 481 of file CachingFileMgr.h.

Referenced by getMaxMetaFiles(), and setMaxNumMetadataFiles().

size_t File_Namespace::CachingFileMgr::max_size_
private

Definition at line 483 of file CachingFileMgr.h.

Referenced by CachingFileMgr(), getAvailableSpace(), and getMaxSize().

size_t File_Namespace::CachingFileMgr::max_wrapper_space_
private

constexpr float File_Namespace::CachingFileMgr::METADATA_FILE_SPACE_PERCENTAGE {0.01}
static

Definition at line 172 of file CachingFileMgr.h.

Referenced by getMinimumSize().

constexpr float File_Namespace::CachingFileMgr::METADATA_SPACE_PERCENTAGE {0.1}
static

Definition at line 170 of file CachingFileMgr.h.

std::map<TablePair, std::unique_ptr<TableFileMgr> > File_Namespace::CachingFileMgr::table_dirs_
private

mapd_shared_mutex File_Namespace::CachingFileMgr::table_dirs_mutex_
mutable private

LRUEvictionAlgorithm File_Namespace::CachingFileMgr::table_evict_alg_
mutable private

Definition at line 485 of file CachingFileMgr.h.

Referenced by dumpTableQueue().

constexpr char File_Namespace::CachingFileMgr::WRAPPER_FILE_NAME[] = "wrapper_metadata.json"
static

The documentation for this class was generated from the following files: