OmniSciDB  d2f719934e
TableOptimizer Class Reference

Driver for running cleanup processes on a table. TableOptimizer provides functions for various cleanup processes that improve performance on a table. Only tables that have been modified using updates or deletes are candidates for cleanup. If the table descriptor corresponds to a sharded table, the table optimizer processes each physical shard. More...

#include <TableOptimizer.h>


Public Member Functions

 TableOptimizer (const TableDescriptor *td, Executor *executor, const Catalog_Namespace::Catalog &cat)
 
void recomputeMetadata () const
 Recomputes per-chunk metadata for each fragment in the table. Updates and deletes can cause chunk metadata to become wider than the values in the chunk. Recomputing the metadata narrows the range to fit the chunk and sets or unsets the nulls flag as appropriate. More...
 
void recomputeMetadataUnlocked (const TableUpdateMetadata &table_update_metadata) const
 Recomputes column chunk metadata for the given set of fragments. The caller of this method is expected to have already acquired the executor lock. More...
 
void vacuumDeletedRows () const
 Compacts fragments to remove deleted rows. When a row is deleted, a boolean deleted system column is set to true. Vacuuming removes all deleted rows from a fragment. Note that vacuuming is a checkpointing operation, so data on disk will increase even though the number of rows for the current epoch has decreased. More...
 
void vacuumFragmentsAboveMinSelectivity (const TableUpdateMetadata &table_update_metadata) const
 

Private Member Functions

DeletedColumnStats recomputeDeletedColumnMetadata (const TableDescriptor *td, const std::set< size_t > &fragment_indexes={}) const
 
void recomputeColumnMetadata (const TableDescriptor *td, const ColumnDescriptor *cd, const std::unordered_map< int, size_t > &tuple_count_map, std::optional< Data_Namespace::MemoryLevel > memory_level, const std::set< size_t > &fragment_indexes) const
 
std::set< size_t > getFragmentIndexes (const TableDescriptor *td, const std::set< int > &fragment_ids) const
 
void vacuumFragments (const TableDescriptor *td, const std::set< int > &fragment_ids={}) const
 
DeletedColumnStats getDeletedColumnStats (const TableDescriptor *td, const std::set< size_t > &fragment_indexes) const
 

Private Attributes

const TableDescriptor * td_
 
Executor * executor_
 
const Catalog_Namespace::Catalog & cat_
 

Static Private Attributes

static constexpr size_t ROW_SET_SIZE {1000000000}
 

Detailed Description

Driver for running cleanup processes on a table. TableOptimizer provides functions for various cleanup processes that improve performance on a table. Only tables that have been modified using updates or deletes are candidates for cleanup. If the table descriptor corresponds to a sharded table, the table optimizer processes each physical shard.

Definition at line 38 of file TableOptimizer.h.

Constructor & Destructor Documentation

TableOptimizer::TableOptimizer ( const TableDescriptor *  td,
Executor *  executor,
const Catalog_Namespace::Catalog &  cat 
)

Definition at line 29 of file TableOptimizer.cpp.

References CHECK.

32  : td_(td), executor_(executor), cat_(cat) {
33  CHECK(td);
34 }

Member Function Documentation

DeletedColumnStats TableOptimizer::getDeletedColumnStats ( const TableDescriptor *  td,
const std::set< size_t > &  fragment_indexes 
) const
private

Definition at line 209 of file TableOptimizer.cpp.

References anonymous_namespace{TableOptimizer.cpp}::build_ra_exe_unit(), cat_, CHECK_EQ, DeletedColumnStats::chunk_stats_per_fragment, ColumnDescriptor::columnId, CPU, executor_, anonymous_namespace{TableOptimizer.cpp}::get_compilation_options(), anonymous_namespace{TableOptimizer.cpp}::get_execution_options(), get_logical_type_info(), get_table_infos(), Catalog_Namespace::Catalog::getDeletedColumn(), TableDescriptor::hasDeletedCol, kCOUNT, LOG, anonymous_namespace{TableOptimizer.cpp}::set_metadata_from_results(), TableDescriptor::tableId, DeletedColumnStats::total_row_count, DeletedColumnStats::visible_row_count_per_fragment, and logger::WARNING.

Referenced by recomputeDeletedColumnMetadata(), and vacuumFragmentsAboveMinSelectivity().

211  {
212  if (!td->hasDeletedCol) {
213  return {};
214  }
215 
216  auto cd = cat_.getDeletedColumn(td);
217  const auto column_id = cd->columnId;
218 
219  const auto input_col_desc =
220  std::make_shared<const InputColDescriptor>(column_id, td->tableId, 0);
221  const auto col_expr =
222  makeExpr<Analyzer::ColumnVar>(cd->columnType, td->tableId, column_id, 0);
223  const auto count_expr =
224  makeExpr<Analyzer::AggExpr>(cd->columnType, kCOUNT, col_expr, false, nullptr);
225 
226  const auto ra_exe_unit = build_ra_exe_unit(input_col_desc, {count_expr.get()});
227  const auto table_infos = get_table_infos(ra_exe_unit, executor_);
228  CHECK_EQ(table_infos.size(), size_t(1));
229 
230  const auto co = get_compilation_options(ExecutorDeviceType::CPU);
231  const auto eo = get_execution_options();
232 
233  DeletedColumnStats deleted_column_stats;
234  Executor::PerFragmentCallBack compute_deleted_callback =
235  [&deleted_column_stats, cd](
236  ResultSetPtr results, const Fragmenter_Namespace::FragmentInfo& fragment_info) {
237  // count number of tuples in $deleted as total number of tuples in table.
238  if (cd->isDeletedCol) {
239  deleted_column_stats.total_row_count += fragment_info.getPhysicalNumTuples();
240  }
241  if (fragment_info.getPhysicalNumTuples() == 0) {
242  // TODO(adb): Should not happen, but just to be safe...
243  LOG(WARNING) << "Skipping completely empty fragment for column "
244  << cd->columnName;
245  return;
246  }
247 
248  const auto row = results->getNextRow(false, false);
249  CHECK_EQ(row.size(), size_t(1));
250 
251  const auto& ti = cd->columnType;
252 
253  auto chunk_metadata = std::make_shared<ChunkMetadata>();
254  chunk_metadata->sqlType = get_logical_type_info(ti);
255 
256  const auto count_val = read_scalar_target_value<int64_t>(row[0]);
257 
258  // min element 0 max element 1
259  std::vector<TargetValue> fakerow;
260 
261  auto num_tuples = static_cast<size_t>(count_val);
262 
263  // calculate min
264  if (num_tuples == fragment_info.getPhysicalNumTuples()) {
265  // nothing deleted
266  // min = false;
267  // max = false;
268  fakerow.emplace_back(TargetValue{int64_t(0)});
269  fakerow.emplace_back(TargetValue{int64_t(0)});
270  } else {
271  if (num_tuples == 0) {
272  // everything marked as delete
273  // min = true
274  // max = true
275  fakerow.emplace_back(TargetValue{int64_t(1)});
276  fakerow.emplace_back(TargetValue{int64_t(1)});
277  } else {
278  // some deleted
279  // min = false
280  // max = true;
281  fakerow.emplace_back(TargetValue{int64_t(0)});
282  fakerow.emplace_back(TargetValue{int64_t(1)});
283  }
284  }
285 
286  // place manufacture min and max in fake row to use common infra
287  if (!set_metadata_from_results(*chunk_metadata, fakerow, ti, false)) {
288  LOG(WARNING) << "Unable to process new metadata values for column "
289  << cd->columnName;
290  return;
291  }
292 
293  deleted_column_stats.chunk_stats_per_fragment.emplace(
294  std::make_pair(fragment_info.fragmentId, chunk_metadata->chunkStats));
295  deleted_column_stats.visible_row_count_per_fragment.emplace(
296  std::make_pair(fragment_info.fragmentId, num_tuples));
297  };
298 
299  executor_->executeWorkUnitPerFragment(ra_exe_unit,
300  table_infos[0],
301  co,
302  eo,
303  cat_,
304  compute_deleted_callback,
305  fragment_indexes);
306  return deleted_column_stats;
307 }


std::set< size_t > TableOptimizer::getFragmentIndexes ( const TableDescriptor *  td,
const std::set< int > &  fragment_ids 
) const
private

Definition at line 408 of file TableOptimizer.cpp.

References CHECK, shared::contains(), and TableDescriptor::fragmenter.

Referenced by recomputeMetadataUnlocked(), and vacuumFragmentsAboveMinSelectivity().

410  {
411  CHECK(td->fragmenter);
412  auto table_info = td->fragmenter->getFragmentsForQuery();
413  std::set<size_t> fragment_indexes;
414  for (size_t i = 0; i < table_info.fragments.size(); i++) {
415  if (shared::contains(fragment_ids, table_info.fragments[i].fragmentId)) {
416  fragment_indexes.emplace(i);
417  }
418  }
419  return fragment_indexes;
420 }


void TableOptimizer::recomputeColumnMetadata ( const TableDescriptor *  td,
const ColumnDescriptor *  cd,
const std::unordered_map< int, size_t > &  tuple_count_map,
std::optional< Data_Namespace::MemoryLevel >  memory_level,
const std::set< size_t > &  fragment_indexes 
) const
private

Definition at line 309 of file TableOptimizer.cpp.

References anonymous_namespace{TableOptimizer.cpp}::build_ra_exe_unit(), cat_, CHECK, CHECK_EQ, ColumnDescriptor::columnId, ColumnDescriptor::columnName, ColumnDescriptor::columnType, CPU, executor_, TableDescriptor::fragmenter, anonymous_namespace{TableOptimizer.cpp}::get_compilation_options(), anonymous_namespace{TableOptimizer.cpp}::get_execution_options(), get_logical_type_info(), get_table_infos(), logger::INFO, kCOUNT, kINT, kMAX, kMIN, LOG, anonymous_namespace{TableOptimizer.cpp}::set_metadata_from_results(), TableDescriptor::tableId, and logger::WARNING.

Referenced by recomputeMetadata(), and recomputeMetadataUnlocked().

314  {
315  const auto ti = cd->columnType;
316  if (ti.is_varlen()) {
317  LOG(INFO) << "Skipping varlen column " << cd->columnName;
318  return;
319  }
320 
321  const auto column_id = cd->columnId;
322  const auto input_col_desc =
323  std::make_shared<const InputColDescriptor>(column_id, td->tableId, 0);
324  const auto col_expr =
325  makeExpr<Analyzer::ColumnVar>(cd->columnType, td->tableId, column_id, 0);
326  auto max_expr =
327  makeExpr<Analyzer::AggExpr>(cd->columnType, kMAX, col_expr, false, nullptr);
328  auto min_expr =
329  makeExpr<Analyzer::AggExpr>(cd->columnType, kMIN, col_expr, false, nullptr);
330  auto count_expr =
331  makeExpr<Analyzer::AggExpr>(cd->columnType, kCOUNT, col_expr, false, nullptr);
332 
333  if (ti.is_string()) {
334  const SQLTypeInfo fun_ti(kINT);
335  const auto fun_expr = makeExpr<Analyzer::KeyForStringExpr>(col_expr);
336  max_expr = makeExpr<Analyzer::AggExpr>(fun_ti, kMAX, fun_expr, false, nullptr);
337  min_expr = makeExpr<Analyzer::AggExpr>(fun_ti, kMIN, fun_expr, false, nullptr);
338  }
339  const auto ra_exe_unit = build_ra_exe_unit(
340  input_col_desc, {min_expr.get(), max_expr.get(), count_expr.get()});
341  const auto table_infos = get_table_infos(ra_exe_unit, executor_);
342  CHECK_EQ(table_infos.size(), size_t(1));
343 
344  const auto co = get_compilation_options(ExecutorDeviceType::CPU);
345  const auto eo = get_execution_options();
346 
347  std::unordered_map</*fragment_id*/ int, ChunkStats> stats_map;
348 
349  Executor::PerFragmentCallBack compute_metadata_callback =
350  [&stats_map, &tuple_count_map, cd](
351  ResultSetPtr results, const Fragmenter_Namespace::FragmentInfo& fragment_info) {
352  if (fragment_info.getPhysicalNumTuples() == 0) {
353  // TODO(adb): Should not happen, but just to be safe...
354  LOG(WARNING) << "Skipping completely empty fragment for column "
355  << cd->columnName;
356  return;
357  }
358 
359  const auto row = results->getNextRow(false, false);
360  CHECK_EQ(row.size(), size_t(3));
361 
362  const auto& ti = cd->columnType;
363 
364  auto chunk_metadata = std::make_shared<ChunkMetadata>();
365  chunk_metadata->sqlType = get_logical_type_info(ti);
366 
367  const auto count_val = read_scalar_target_value<int64_t>(row[2]);
368  if (count_val == 0) {
369  // Assume chunk of all nulls, bail
370  return;
371  }
372 
373  bool has_nulls = true; // default to wide
374  auto tuple_count_itr = tuple_count_map.find(fragment_info.fragmentId);
375  if (tuple_count_itr != tuple_count_map.end()) {
376  has_nulls = !(static_cast<size_t>(count_val) == tuple_count_itr->second);
377  } else {
378  // no deleted column calc so use raw physical count
379  has_nulls =
380  !(static_cast<size_t>(count_val) == fragment_info.getPhysicalNumTuples());
381  }
382 
383  if (!set_metadata_from_results(*chunk_metadata, row, ti, has_nulls)) {
384  LOG(WARNING) << "Unable to process new metadata values for column "
385  << cd->columnName;
386  return;
387  }
388 
389  stats_map.emplace(
390  std::make_pair(fragment_info.fragmentId, chunk_metadata->chunkStats));
391  };
392 
393  executor_->executeWorkUnitPerFragment(ra_exe_unit,
394  table_infos[0],
395  co,
396  eo,
397  cat_,
398  compute_metadata_callback,
399  fragment_indexes);
400 
401  auto* fragmenter = td->fragmenter.get();
402  CHECK(fragmenter);
403  fragmenter->updateChunkStats(cd, stats_map, memory_level);
404 }
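The MIN/MAX/COUNT narrowing logic above, including the nulls inference, can be sketched with a standalone helper. This is an illustrative model only: in a real chunk, NULLs are encoded per type, and the aggregates run through the executor. `null_sentinel` and `NarrowStats` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct NarrowStats {
  int64_t min;
  int64_t max;
  bool has_nulls;
};

// Recompute narrow chunk stats the way recomputeColumnMetadata() does with
// MIN/MAX/COUNT aggregates. COUNT skips NULLs, so a COUNT below the visible
// tuple count implies the chunk contains NULLs. (The real code bails out
// early when COUNT is 0, assuming a chunk of all NULLs.)
NarrowStats recompute_stats(const std::vector<int64_t>& values,
                            int64_t null_sentinel,
                            size_t visible_tuple_count) {
  NarrowStats stats{INT64_MAX, INT64_MIN, /*has_nulls=*/true};  // default wide
  size_t count = 0;
  for (auto v : values) {
    if (v == null_sentinel) {
      continue;  // NULLs are excluded from the aggregates
    }
    stats.min = std::min(stats.min, v);
    stats.max = std::max(stats.max, v);
    ++count;
  }
  stats.has_nulls = count != visible_tuple_count;
  return stats;
}
```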


DeletedColumnStats TableOptimizer::recomputeDeletedColumnMetadata ( const TableDescriptor *  td,
const std::set< size_t > &  fragment_indexes = {} 
) const
private

Definition at line 193 of file TableOptimizer.cpp.

References cat_, CHECK, TableDescriptor::fragmenter, Catalog_Namespace::Catalog::getDeletedColumn(), getDeletedColumnStats(), and TableDescriptor::hasDeletedCol.

Referenced by recomputeMetadata(), and recomputeMetadataUnlocked().

195  {
196  if (!td->hasDeletedCol) {
197  return {};
198  }
199 
200  auto stats = getDeletedColumnStats(td, fragment_indexes);
201  auto* fragmenter = td->fragmenter.get();
202  CHECK(fragmenter);
203  auto cd = cat_.getDeletedColumn(td);
204  fragmenter->updateChunkStats(cd, stats.chunk_stats_per_fragment, {});
205  fragmenter->setNumRows(stats.total_row_count);
206  return stats;
207 }


void TableOptimizer::recomputeMetadata ( ) const

Recomputes per-chunk metadata for each fragment in the table. Updates and deletes can cause chunk metadata to become wider than the values in the chunk. Recomputing the metadata narrows the range to fit the chunk and sets or unsets the nulls flag as appropriate.

Definition at line 120 of file TableOptimizer.cpp.

References cat_, CHECK_GE, Catalog_Namespace::DBMetadata::dbId, DEBUG_TIMER, executor_, Catalog_Namespace::Catalog::getAllColumnMetadataForTable(), Catalog_Namespace::Catalog::getCurrentDB(), Catalog_Namespace::Catalog::getDataMgr(), Catalog_Namespace::Catalog::getPhysicalTablesDescriptors(), lockmgr::TableLockMgrImpl< TableDataLockMgr >::getWriteLockForTable(), logger::INFO, LOG, TableDescriptor::nShards, recomputeColumnMetadata(), recomputeDeletedColumnMetadata(), ROW_SET_SIZE, TableDescriptor::tableId, TableDescriptor::tableName, and td_.

Referenced by Parser::OptimizeTableStmt::execute(), and migrations::MigrationMgr::migrateDateInDaysMetadata().

120  {
121  auto timer = DEBUG_TIMER(__func__);
122  mapd_unique_lock<mapd_shared_mutex> lock(executor_->execute_mutex_);
123 
124  LOG(INFO) << "Recomputing metadata for " << td_->tableName;
125 
126  CHECK_GE(td_->tableId, 0);
127 
128  std::vector<const TableDescriptor*> table_descriptors;
129  if (td_->nShards > 0) {
130  const auto physical_tds = cat_.getPhysicalTablesDescriptors(td_);
131  table_descriptors.insert(
132  table_descriptors.begin(), physical_tds.begin(), physical_tds.end());
133  } else {
134  table_descriptors.push_back(td_);
135  }
136 
137  auto& data_mgr = cat_.getDataMgr();
138 
139  // acquire write lock on table data
140  auto table_data_lock = lockmgr::TableDataLockMgr::getWriteLockForTable(cat_, td_->tableName);
141 
142  for (const auto td : table_descriptors) {
143  ScopeGuard row_set_holder = [this] { executor_->row_set_mem_owner_ = nullptr; };
144  executor_->row_set_mem_owner_ =
145  std::make_shared<RowSetMemoryOwner>(ROW_SET_SIZE, /*num_threads=*/1);
146  executor_->catalog_ = &cat_;
147  const auto table_id = td->tableId;
148  auto stats = recomputeDeletedColumnMetadata(td);
149 
150  // TODO(adb): Support geo
151  auto col_descs = cat_.getAllColumnMetadataForTable(table_id, false, false, false);
152  for (const auto& cd : col_descs) {
153  recomputeColumnMetadata(td, cd, stats.visible_row_count_per_fragment, {}, {});
154  }
155  data_mgr.checkpoint(cat_.getCurrentDB().dbId, table_id);
156  executor_->clearMetaInfoCache();
157  }
158 
159  data_mgr.clearMemory(Data_Namespace::MemoryLevel::CPU_LEVEL);
160  if (data_mgr.gpusPresent()) {
161  data_mgr.clearMemory(Data_Namespace::MemoryLevel::GPU_LEVEL);
162  }
163 }


void TableOptimizer::recomputeMetadataUnlocked ( const TableUpdateMetadata table_update_metadata) const

Recomputes column chunk metadata for the given set of fragments. The caller of this method is expected to have already acquired the executor lock.

Definition at line 165 of file TableOptimizer.cpp.

References cat_, CHECK, TableUpdateMetadata::columns_for_metadata_update, Data_Namespace::CPU_LEVEL, DEBUG_TIMER, getFragmentIndexes(), Catalog_Namespace::Catalog::getMetadataForTable(), recomputeColumnMetadata(), and recomputeDeletedColumnMetadata().

166  {
167  auto timer = DEBUG_TIMER(__func__);
168  std::map<int, std::list<const ColumnDescriptor*>> columns_by_table_id;
169  auto& columns_for_update = table_update_metadata.columns_for_metadata_update;
170  for (const auto& entry : columns_for_update) {
171  auto column_descriptor = entry.first;
172  columns_by_table_id[column_descriptor->tableId].emplace_back(column_descriptor);
173  }
174 
175  for (const auto& [table_id, columns] : columns_by_table_id) {
176  auto td = cat_.getMetadataForTable(table_id);
177  auto stats = recomputeDeletedColumnMetadata(td);
178  for (const auto cd : columns) {
179  CHECK(columns_for_update.find(cd) != columns_for_update.end());
180  auto fragment_indexes = getFragmentIndexes(td, columns_for_update.find(cd)->second);
181  recomputeColumnMetadata(td,
182  cd,
183  stats.visible_row_count_per_fragment,
184  Data_Namespace::MemoryLevel::CPU_LEVEL,
185  fragment_indexes);
186  }
187  }
188 }


void TableOptimizer::vacuumDeletedRows ( ) const

Compacts fragments to remove deleted rows. When a row is deleted, a boolean deleted system column is set to true. Vacuuming removes all deleted rows from a fragment. Note that vacuuming is a checkpointing operation, so data on disk will increase even though the number of rows for the current epoch has decreased.

Definition at line 422 of file TableOptimizer.cpp.

References cat_, Catalog_Namespace::Catalog::checkpoint(), File_Namespace::GlobalFileMgr::compactDataFiles(), DEBUG_TIMER, Catalog_Namespace::Catalog::getDatabaseId(), Catalog_Namespace::Catalog::getDataMgr(), Data_Namespace::DataMgr::getGlobalFileMgr(), Catalog_Namespace::Catalog::getPhysicalTablesDescriptors(), Catalog_Namespace::Catalog::getTableEpochs(), lockmgr::TableLockMgrImpl< TableDataLockMgr >::getWriteLockForTable(), Catalog_Namespace::Catalog::removeFragmenterForTable(), Catalog_Namespace::Catalog::setTableEpochsLogExceptions(), TableDescriptor::tableId, td_, and vacuumFragments().

Referenced by Parser::OptimizeTableStmt::execute(), and DBHandler::sql_execute_impl().

422  {
423  auto timer = DEBUG_TIMER(__func__);
424  const auto table_id = td_->tableId;
425  const auto db_id = cat_.getDatabaseId();
426  const auto table_lock =
427  lockmgr::TableDataLockMgr::getWriteLockForTable(cat_, td_->tableName);
428  const auto table_epochs = cat_.getTableEpochs(db_id, table_id);
429  const auto shards = cat_.getPhysicalTablesDescriptors(td_);
430  try {
431  for (const auto shard : shards) {
432  vacuumFragments(shard);
433  }
434  cat_.checkpoint(table_id);
435  } catch (...) {
436  cat_.setTableEpochsLogExceptions(db_id, table_epochs);
437  throw;
438  }
439 
440  for (auto shard : shards) {
441  cat_.removeFragmenterForTable(shard->tableId);
442  cat_.getDataMgr().getGlobalFileMgr()->compactDataFiles(cat_.getDatabaseId(),
443  shard->tableId);
444  }
445 }


void TableOptimizer::vacuumFragments ( const TableDescriptor *  td,
const std::set< int > &  fragment_ids = {} 
) const
private

Definition at line 447 of file TableOptimizer.cpp.

References cat_, UpdelRoll::catalog, CHECK_EQ, CHUNK_KEY_COLUMN_IDX, CHUNK_KEY_FRAGMENT_IDX, ColumnDescriptor::columnId, shared::contains(), Data_Namespace::CPU_LEVEL, TableDescriptor::fragmenter, Chunk_NS::Chunk::getChunk(), Data_Namespace::DataMgr::getChunkMetadataVecForKeyPrefix(), Catalog_Namespace::Catalog::getDatabaseId(), Catalog_Namespace::Catalog::getDataMgr(), Catalog_Namespace::Catalog::getDeletedColumn(), Catalog_Namespace::Catalog::getLogicalTableId(), UpdelRoll::logicalTableId, UpdelRoll::memoryLevel, UpdelRoll::stageUpdate(), UpdelRoll::table_descriptor, and TableDescriptor::tableId.

Referenced by vacuumDeletedRows(), and vacuumFragmentsAboveMinSelectivity().

448  {
449  // "if not a table that supports delete return, nothing more to do"
450  const ColumnDescriptor* cd = cat_.getDeletedColumn(td);
451  if (nullptr == cd) {
452  return;
453  }
454  // vacuum chunks which show sign of deleted rows in metadata
455  ChunkKey chunk_key_prefix = {cat_.getDatabaseId(), td->tableId, cd->columnId};
456  ChunkMetadataVector chunk_metadata_vec;
457  cat_.getDataMgr().getChunkMetadataVecForKeyPrefix(chunk_metadata_vec, chunk_key_prefix);
458  for (auto& [chunk_key, chunk_metadata] : chunk_metadata_vec) {
459  auto fragment_id = chunk_key[CHUNK_KEY_FRAGMENT_IDX];
460  // If delete has occurred, only vacuum fragments that are in the fragment_ids set.
461  // Empty fragment_ids set implies all fragments.
462  if (chunk_metadata->chunkStats.max.tinyintval == 1 &&
463  (fragment_ids.empty() || shared::contains(fragment_ids, fragment_id))) {
464  UpdelRoll updel_roll;
465  updel_roll.catalog = &cat_;
466  updel_roll.logicalTableId = cat_.getLogicalTableId(td->tableId);
467  updel_roll.memoryLevel = Data_Namespace::MemoryLevel::CPU_LEVEL;
468  updel_roll.table_descriptor = td;
469  CHECK_EQ(cd->columnId, chunk_key[CHUNK_KEY_COLUMN_IDX]);
470  const auto chunk = Chunk_NS::Chunk::getChunk(cd,
471  &cat_.getDataMgr(),
472  chunk_key,
473  updel_roll.memoryLevel,
474  0,
475  chunk_metadata->numBytes,
476  chunk_metadata->numElements);
477  td->fragmenter->compactRows(&cat_,
478  td,
479  fragment_id,
480  td->fragmenter->getVacuumOffsets(chunk),
481  updel_roll.memoryLevel,
482  updel_roll);
483  updel_roll.stageUpdate();
484  }
485  }
486  td->fragmenter->resetSizesFromFragments();
487 }


void TableOptimizer::vacuumFragmentsAboveMinSelectivity ( const TableUpdateMetadata table_update_metadata) const

Vacuums fragments with a deleted rows percentage that exceeds the configured minimum vacuum selectivity threshold.

Definition at line 489 of file TableOptimizer.cpp.

References cat_, Catalog_Namespace::Catalog::checkpoint(), Catalog_Namespace::Catalog::checkpointWithAutoRollback(), DEBUG_TIMER, Data_Namespace::DISK_LEVEL, executor_, TableUpdateMetadata::fragments_with_deleted_rows, g_vacuum_min_selectivity, Catalog_Namespace::Catalog::getDatabaseId(), getDeletedColumnStats(), getFragmentIndexes(), Catalog_Namespace::Catalog::getMetadataForTable(), Catalog_Namespace::Catalog::getTableEpochs(), lockmgr::TableLockMgrImpl< TableDataLockMgr >::getWriteLockForTable(), TableDescriptor::persistenceLevel, shared::printContainer(), ROW_SET_SIZE, Catalog_Namespace::Catalog::setTableEpochsLogExceptions(), TableDescriptor::tableId, td_, vacuumFragments(), DeletedColumnStats::visible_row_count_per_fragment, and VLOG.

490  {
491  if (td_->persistenceLevel != Data_Namespace::MemoryLevel::DISK_LEVEL) {
492  return;
493  }
494  auto timer = DEBUG_TIMER(__func__);
495  std::map<const TableDescriptor*, std::set<int32_t>> fragments_to_vacuum;
496  for (const auto& [table_id, fragment_ids] :
497  table_update_metadata.fragments_with_deleted_rows) {
498  auto td = cat_.getMetadataForTable(table_id);
499  // Skip automatic vacuuming for tables with uncapped epoch
500  if (td->maxRollbackEpochs == -1) {
501  continue;
502  }
503 
504  DeletedColumnStats deleted_column_stats;
505  {
506  mapd_unique_lock<mapd_shared_mutex> executor_lock(executor_->execute_mutex_);
507  ScopeGuard row_set_holder = [this] { executor_->row_set_mem_owner_ = nullptr; };
508  executor_->row_set_mem_owner_ =
509  std::make_shared<RowSetMemoryOwner>(ROW_SET_SIZE, /*num_threads=*/1);
510  deleted_column_stats =
511  getDeletedColumnStats(td, getFragmentIndexes(td, fragment_ids));
512  executor_->clearMetaInfoCache();
513  }
514 
515  std::set<int32_t> filtered_fragment_ids;
516  for (const auto [fragment_id, visible_row_count] :
517  deleted_column_stats.visible_row_count_per_fragment) {
518  auto total_row_count =
519  td->fragmenter->getFragmentInfo(fragment_id)->getPhysicalNumTuples();
520  float deleted_row_count = total_row_count - visible_row_count;
521  if ((deleted_row_count / total_row_count) >= g_vacuum_min_selectivity) {
522  filtered_fragment_ids.emplace(fragment_id);
523  }
524  }
525 
526  if (!filtered_fragment_ids.empty()) {
527  fragments_to_vacuum[td] = filtered_fragment_ids;
528  }
529  }
530 
531  if (!fragments_to_vacuum.empty()) {
532  const auto db_id = cat_.getDatabaseId();
533  const auto table_lock =
534  lockmgr::TableDataLockMgr::getWriteLockForTable(cat_, td_->tableName);
535  const auto table_epochs = cat_.getTableEpochs(db_id, td_->tableId);
536  try {
537  for (const auto& [td, fragment_ids] : fragments_to_vacuum) {
538  vacuumFragments(td, fragment_ids);
539  VLOG(1) << "Auto-vacuumed fragments: " << shared::printContainer(fragment_ids)
540  << ", table id: " << td->tableId;
541  }
542  cat_.checkpointWithAutoRollback(td_->tableId);
543  } catch (...) {
544  cat_.setTableEpochsLogExceptions(db_id, table_epochs);
545  throw;
546  }
547  } else {
548  // Checkpoint, even when no data update occurs, in order to ensure that epochs are
549  // uniformly incremented in distributed mode.
550  cat_.checkpoint(td_->tableId);
551  }
552 }
DeletedColumnStats getDeletedColumnStats(const TableDescriptor *td, const std::set< size_t > &fragment_indexes) const
static WriteLock getWriteLockForTable(const Catalog_Namespace::Catalog &cat, const std::string &table_name)
Definition: LockMgrImpl.h:155
TableToFragmentIds fragments_with_deleted_rows
Definition: Execute.h:322
static constexpr size_t ROW_SET_SIZE
const TableDescriptor * td_
Executor * executor_
int getDatabaseId() const
Definition: Catalog.h:281
void checkpointWithAutoRollback(const int logical_table_id) const
Definition: Catalog.cpp:4286
void checkpoint(const int logicalTableId) const
Definition: Catalog.cpp:4278
Data_Namespace::MemoryLevel persistenceLevel
float g_vacuum_min_selectivity
#define DEBUG_TIMER(name)
Definition: Logger.h:358
void setTableEpochsLogExceptions(const int32_t db_id, const std::vector< TableEpochInfo > &table_epochs) const
Definition: Catalog.cpp:3171
PrintContainer< CONTAINER > printContainer(CONTAINER &container)
Definition: misc.h:104
const TableDescriptor * getMetadataForTable(const std::string &tableName, const bool populateFragmenter=true) const
Returns a pointer to a const TableDescriptor struct matching the provided tableName.
#define VLOG(n)
Definition: Logger.h:305
const Catalog_Namespace::Catalog & cat_
std::vector< TableEpochInfo > getTableEpochs(const int32_t db_id, const int32_t table_id) const
Definition: Catalog.cpp:3107
std::set< size_t > getFragmentIndexes(const TableDescriptor *td, const std::set< int > &fragment_ids) const
void vacuumFragments(const TableDescriptor *td, const std::set< int > &fragment_ids={}) const

+ Here is the call graph for this function:

Member Data Documentation

Executor* TableOptimizer::executor_
private
constexpr size_t TableOptimizer::ROW_SET_SIZE {1000000000}
static private

Definition at line 101 of file TableOptimizer.h.

Referenced by recomputeMetadata(), and vacuumFragmentsAboveMinSelectivity().

const TableDescriptor* TableOptimizer::td_
private

The documentation for this class was generated from the following files: