OmniSciDB  5ade3759e0
TableOptimizer Class Reference

Driver for running cleanup processes on a table. TableOptimizer provides functions for various cleanup processes that improve performance on a table. Only tables that have been modified using updates or deletes are candidates for cleanup. If the table descriptor corresponds to a sharded table, the table optimizer processes each physical shard.

#include <TableOptimizer.h>

Public Member Functions

 TableOptimizer (const TableDescriptor *td, Executor *executor, const Catalog_Namespace::Catalog &cat)
 
void recomputeMetadata () const
 Recomputes per-chunk metadata for each fragment in the table. Updates and deletes can cause chunk metadata to become wider than the values in the chunk. Recomputing the metadata narrows the range to fit the chunk, as well as setting or unsetting the nulls flag as appropriate.
 
void vacuumDeletedRows () const
 Compacts fragments to remove deleted rows. When a row is deleted, a boolean deleted system column is set to true. Vacuuming removes all deleted rows from a fragment. Note that vacuuming is a checkpointing operation, so data on disk will increase even though the number of rows for the current epoch has decreased.
 

Private Attributes

const TableDescriptor * td_
 
Executor * executor_
 
const Catalog_Namespace::Catalog & cat_
 

Detailed Description

Driver for running cleanup processes on a table. TableOptimizer provides functions for various cleanup processes that improve performance on a table. Only tables that have been modified using updates or deletes are candidates for cleanup. If the table descriptor corresponds to a sharded table, the table optimizer processes each physical shard.
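
The shard handling described above can be illustrated with a small standalone sketch. It is not the OmniSciDB implementation: TableDesc is a hypothetical, simplified stand-in for TableDescriptor, and the physical_shards argument plays the role of a catalog lookup. The selection logic mirrors the nShards check in the recomputeMetadata() listing on this page.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for TableDescriptor; only the fields
// needed for this illustration are modeled.
struct TableDesc {
  int table_id;
  std::string name;
  int n_shards;  // 0 means the table is not sharded
};

// Returns the tables a cleanup pass should visit: every physical shard for a
// sharded logical table, otherwise the table itself. physical_shards is an
// assumption of this sketch standing in for a catalog query.
std::vector<const TableDesc*> tables_to_process(
    const TableDesc* td,
    const std::vector<const TableDesc*>& physical_shards) {
  assert(td != nullptr);
  std::vector<const TableDesc*> out;
  if (td->n_shards > 0) {
    out.insert(out.begin(), physical_shards.begin(), physical_shards.end());
  } else {
    out.push_back(td);
  }
  return out;
}
```

For a logical table with two shards, tables_to_process returns the two shard descriptors; for an unsharded table it returns a single-element list holding the table itself.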

Definition at line 32 of file TableOptimizer.h.

Constructor & Destructor Documentation

◆ TableOptimizer()

TableOptimizer::TableOptimizer ( const TableDescriptor * td,
Executor * executor,
const Catalog_Namespace::Catalog & cat
)
inline

Definition at line 34 of file TableOptimizer.h.

References CHECK, recomputeMetadata(), and vacuumDeletedRows().

37  : td_(td), executor_(executor), cat_(cat) {
38  CHECK(td);
39  }

Member Function Documentation

◆ recomputeMetadata()

void TableOptimizer::recomputeMetadata ( ) const

Recomputes per-chunk metadata for each fragment in the table. Updates and deletes can cause chunk metadata to become wider than the values in the chunk. Recomputing the metadata narrows the range to fit the chunk, as well as setting or unsetting the nulls flag as appropriate.
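
To make the idea of metadata that is "wider than the values in the chunk" concrete, here is a minimal standalone sketch. SimpleChunkStats and recompute_stats are hypothetical names, not part of OmniSciDB. The sketch scans the values directly, whereas the listing below obtains MIN, MAX, and COUNT through the executor and derives the nulls flag by comparing the count with the fragment's tuple count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for per-chunk statistics.
struct SimpleChunkStats {
  int64_t min;
  int64_t max;
  bool has_nulls;
};

// Rescans a chunk and returns stats that fit its current values exactly.
// Returns std::nullopt when the chunk holds no non-null value; a caller
// would then leave the existing stats untouched.
std::optional<SimpleChunkStats> recompute_stats(
    const std::vector<std::optional<int64_t>>& chunk) {
  std::optional<SimpleChunkStats> stats;
  bool saw_null = false;
  for (const auto& v : chunk) {
    if (!v) {
      saw_null = true;
      continue;
    }
    if (!stats) {
      stats = SimpleChunkStats{*v, *v, false};
    } else {
      stats->min = std::min(stats->min, *v);
      stats->max = std::max(stats->max, *v);
    }
  }
  if (stats) {
    stats->has_nulls = saw_null;
  }
  return stats;
}
```

If a chunk once held the values 1 through 100 and updates later left only 5, NULL, and 9, stale metadata would still report the range 1 to 100; rescanning yields min 5, max 9, and the nulls flag set.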

Definition at line 107 of file TableOptimizer.cpp.

References anonymous_namespace{TableOptimizer.cpp}::build_ra_exe_unit(), cat_, CHECK, CHECK_EQ, CHECK_GE, ChunkMetadata::chunkStats, ColumnDescriptor::columnId, CPU, Data_Namespace::CPU_LEVEL, Catalog_Namespace::DBMetadata::dbId, executor_, anonymous_namespace{TableOptimizer.cpp}::get_compilation_options(), anonymous_namespace{TableOptimizer.cpp}::get_execution_options(), get_logical_type_info(), get_table_infos(), Catalog_Namespace::Catalog::getAllColumnMetadataForTable(), Catalog_Namespace::Catalog::getCurrentDB(), Catalog_Namespace::Catalog::getDataMgr(), Catalog_Namespace::Catalog::getDeletedColumn(), Catalog_Namespace::Catalog::getPhysicalTablesDescriptors(), Data_Namespace::GPU_LEVEL, logger::INFO, INJECT_TIMER, kCOUNT, kINT, kMAX, kMIN, LOG, TableDescriptor::nShards, anonymous_namespace{TableOptimizer.cpp}::set_metadata_from_results(), ChunkMetadata::sqlType, TableDescriptor::tableId, TableDescriptor::tableName, td_, and logger::WARNING.

Referenced by Catalog_Namespace::Catalog::checkDateInDaysColumnMigration(), and TableOptimizer().

107  {
108  INJECT_TIMER(optimizeMetadata);
109  std::lock_guard<std::mutex> lock(executor_->execute_mutex_);
110 
111  LOG(INFO) << "Recomputing metadata for " << td_->tableName;
112 
113  CHECK_GE(td_->tableId, 0);
114 
115  std::vector<const TableDescriptor*> table_descriptors;
116  if (td_->nShards > 0) {
117  const auto physical_tds = cat_.getPhysicalTablesDescriptors(td_);
118  table_descriptors.insert(
119  table_descriptors.begin(), physical_tds.begin(), physical_tds.end());
120  } else {
121  table_descriptors.push_back(td_);
122  }
123 
124  auto& data_mgr = cat_.getDataMgr();
125 
126  for (const auto td : table_descriptors) {
127  ScopeGuard row_set_holder = [this] { executor_->row_set_mem_owner_ = nullptr; };
128  executor_->row_set_mem_owner_ = std::make_shared<RowSetMemoryOwner>();
129  executor_->catalog_ = &cat_;
130  const auto table_id = td->tableId;
131 
132  std::unordered_map</*fragment_id*/ int, size_t> tuple_count_map;
133 
134  // Special case handle $deleted column if it exists
135  // whilst handling the delete column also capture
136  // the number of non deleted rows per fragment
137  if (td->hasDeletedCol) {
138  auto cd = cat_.getDeletedColumn(td);
139  const auto column_id = cd->columnId;
140 
141  const auto input_col_desc =
142  std::make_shared<const InputColDescriptor>(column_id, table_id, 0);
143  const auto col_expr =
144  makeExpr<Analyzer::ColumnVar>(cd->columnType, table_id, column_id, 0);
145  const auto count_expr =
146  makeExpr<Analyzer::AggExpr>(cd->columnType, kCOUNT, col_expr, false, nullptr);
147 
148  const auto ra_exe_unit = build_ra_exe_unit(input_col_desc, {count_expr.get()});
149  const auto table_infos = get_table_infos(ra_exe_unit, executor_);
150  CHECK_EQ(table_infos.size(), size_t(1));
151 
152  const auto co = get_compilation_options(ExecutorDeviceType::CPU);
153  const auto eo = get_execution_options();
154 
155  std::unordered_map</*fragment_id*/ int, ChunkStats> stats_map;
156 
157  size_t total_num_tuples = 0;
158  PerFragmentCB compute_deleted_callback =
159  [&stats_map, &tuple_count_map, &total_num_tuples, cd](
160  ResultSetPtr results,
161  const Fragmenter_Namespace::FragmentInfo& fragment_info) {
162  // count number of tuples in $deleted as total number of tuples in table.
163  if (cd->isDeletedCol) {
164  total_num_tuples += fragment_info.getPhysicalNumTuples();
165  }
166  if (fragment_info.getPhysicalNumTuples() == 0) {
167  // TODO(adb): Should not happen, but just to be safe...
168  LOG(WARNING) << "Skipping completely empty fragment for column "
169  << cd->columnName;
170  return;
171  }
172 
173  const auto row = results->getNextRow(false, false);
174  CHECK_EQ(row.size(), size_t(1));
175 
176  const auto& ti = cd->columnType;
177 
178  ChunkMetadata chunk_metadata;
179  chunk_metadata.sqlType = get_logical_type_info(ti);
180 
181  const auto count_val = read_scalar_target_value<int64_t>(row[0]);
182  if (count_val == 0) {
183  // Assume chunk of all nulls, bail
184  return;
185  }
186 
187  // min element 0 max element 1
188  std::vector<TargetValue> fakerow;
189 
190  auto num_tuples = static_cast<size_t>(count_val);
191 
192  // calculate min
193  if (num_tuples == fragment_info.getPhysicalNumTuples()) {
194  // nothing deleted
195  // min = false;
196  // max = false;
197  fakerow.emplace_back(TargetValue{int64_t(0)});
198  fakerow.emplace_back(TargetValue{int64_t(0)});
199  } else {
200  if (num_tuples == 0) {
201  // everything marked as delete
202  // min = true
203  // max = true
204  fakerow.emplace_back(TargetValue{int64_t(1)});
205  fakerow.emplace_back(TargetValue{int64_t(1)});
206  } else {
207  // some deleted
208  // min = false
209  // max = true;
210  fakerow.emplace_back(TargetValue{int64_t(0)});
211  fakerow.emplace_back(TargetValue{int64_t(1)});
212  }
213  }
214 
215  // place manufacture min and max in fake row to use common infra
216  if (!set_metadata_from_results(chunk_metadata, fakerow, ti, false)) {
217  LOG(WARNING) << "Unable to process new metadata values for column "
218  << cd->columnName;
219  return;
220  }
221 
222  stats_map.emplace(
223  std::make_pair(fragment_info.fragmentId, chunk_metadata.chunkStats));
224  tuple_count_map.emplace(std::make_pair(fragment_info.fragmentId, num_tuples));
225  };
226 
227  executor_->executeWorkUnitPerFragment(
228  ra_exe_unit, table_infos[0], co, eo, cat_, compute_deleted_callback);
229 
230  auto* fragmenter = td->fragmenter;
231  CHECK(fragmenter);
232  fragmenter->updateChunkStats(cd, stats_map);
233  fragmenter->setNumRows(total_num_tuples);
234  } // finished special handling deleted column;
235 
236  // TODO(adb): Support geo
237  auto col_descs = cat_.getAllColumnMetadataForTable(table_id, false, false, false);
238  for (const auto& cd : col_descs) {
239  const auto ti = cd->columnType;
240  const auto column_id = cd->columnId;
241 
242  if (ti.is_varlen()) {
243  LOG(INFO) << "Skipping varlen column " << cd->columnName;
244  continue;
245  }
246 
247  const auto input_col_desc =
248  std::make_shared<const InputColDescriptor>(column_id, table_id, 0);
249  const auto col_expr =
250  makeExpr<Analyzer::ColumnVar>(cd->columnType, table_id, column_id, 0);
251  auto max_expr =
252  makeExpr<Analyzer::AggExpr>(cd->columnType, kMAX, col_expr, false, nullptr);
253  auto min_expr =
254  makeExpr<Analyzer::AggExpr>(cd->columnType, kMIN, col_expr, false, nullptr);
255  auto count_expr =
256  makeExpr<Analyzer::AggExpr>(cd->columnType, kCOUNT, col_expr, false, nullptr);
257 
258  if (ti.is_string()) {
259  const SQLTypeInfo fun_ti(kINT);
260  const auto fun_expr = makeExpr<Analyzer::KeyForStringExpr>(col_expr);
261  max_expr = makeExpr<Analyzer::AggExpr>(fun_ti, kMAX, fun_expr, false, nullptr);
262  min_expr = makeExpr<Analyzer::AggExpr>(fun_ti, kMIN, fun_expr, false, nullptr);
263  }
264  const auto ra_exe_unit = build_ra_exe_unit(
265  input_col_desc, {min_expr.get(), max_expr.get(), count_expr.get()});
266  const auto table_infos = get_table_infos(ra_exe_unit, executor_);
267  CHECK_EQ(table_infos.size(), size_t(1));
268 
269  const auto co = get_compilation_options(ExecutorDeviceType::CPU);
270  const auto eo = get_execution_options();
271 
272  std::unordered_map</*fragment_id*/ int, ChunkStats> stats_map;
273 
274  PerFragmentCB compute_metadata_callback =
275  [&stats_map, &tuple_count_map, cd](
276  ResultSetPtr results,
277  const Fragmenter_Namespace::FragmentInfo& fragment_info) {
278  if (fragment_info.getPhysicalNumTuples() == 0) {
279  // TODO(adb): Should not happen, but just to be safe...
280  LOG(WARNING) << "Skipping completely empty fragment for column "
281  << cd->columnName;
282  return;
283  }
284 
285  const auto row = results->getNextRow(false, false);
286  CHECK_EQ(row.size(), size_t(3));
287 
288  const auto& ti = cd->columnType;
289 
290  ChunkMetadata chunk_metadata;
291  chunk_metadata.sqlType = get_logical_type_info(ti);
292 
293  const auto count_val = read_scalar_target_value<int64_t>(row[2]);
294  if (count_val == 0) {
295  // Assume chunk of all nulls, bail
296  return;
297  }
298 
299  bool has_nulls = true; // default to wide
300  auto tuple_count_itr = tuple_count_map.find(fragment_info.fragmentId);
301  if (tuple_count_itr != tuple_count_map.end()) {
302  has_nulls = !(static_cast<size_t>(count_val) == tuple_count_itr->second);
303  } else {
304  // no deleted column calc so use raw physical count
305  has_nulls = !(static_cast<size_t>(count_val) ==
306  fragment_info.getPhysicalNumTuples());
307  }
308 
309  if (!set_metadata_from_results(chunk_metadata, row, ti, has_nulls)) {
310  LOG(WARNING) << "Unable to process new metadata values for column "
311  << cd->columnName;
312  return;
313  }
314 
315  stats_map.emplace(
316  std::make_pair(fragment_info.fragmentId, chunk_metadata.chunkStats));
317  };
318 
319  executor_->executeWorkUnitPerFragment(
320  ra_exe_unit, table_infos[0], co, eo, cat_, compute_metadata_callback);
321 
322  auto* fragmenter = td->fragmenter;
323  CHECK(fragmenter);
324  fragmenter->updateChunkStats(cd, stats_map);
325  }
326  data_mgr.checkpoint(cat_.getCurrentDB().dbId, table_id);
327  executor_->clearMetaInfoCache();
328  }
329 
330  data_mgr.clearMemory(Data_Namespace::MemoryLevel::CPU_LEVEL);
331  if (data_mgr.gpusPresent()) {
332  data_mgr.clearMemory(Data_Namespace::MemoryLevel::GPU_LEVEL);
333  }
334 }
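
The special handling of the deleted column in the listing above reduces to a three-way mapping from row counts to a manufactured (min, max) pair. The following is a minimal sketch of that mapping as the code comments describe it; deleted_column_range is a hypothetical helper, not an OmniSciDB function. Note that the listing appears to return early when the count is zero, before the three-way branch is reached; this sketch does not reproduce that ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical helper: maps a fragment's row counts to the (min, max) pair
// recorded for its boolean deleted column, with false = 0 and true = 1.
// Returns std::nullopt for a completely empty fragment, which is skipped.
std::optional<std::pair<int64_t, int64_t>> deleted_column_range(
    std::size_t live_rows, std::size_t physical_rows) {
  assert(live_rows <= physical_rows);
  if (physical_rows == 0) {
    return std::nullopt;
  }
  if (live_rows == physical_rows) {
    return std::make_pair(int64_t{0}, int64_t{0});  // nothing deleted
  }
  if (live_rows == 0) {
    return std::make_pair(int64_t{1}, int64_t{1});  // everything deleted
  }
  return std::make_pair(int64_t{0}, int64_t{1});  // some rows deleted
}
```

Manufacturing the pair this way lets the deleted column reuse the same metadata-setting path as ordinary columns, which is what the "fake row" in the listing is for.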

◆ vacuumDeletedRows()

void TableOptimizer::vacuumDeletedRows ( ) const

Compacts fragments to remove deleted rows. When a row is deleted, a boolean deleted system column is set to true. Vacuuming removes all deleted rows from a fragment. Note that vacuuming is a checkpointing operation, so data on disk will increase even though the number of rows for the current epoch has decreased.
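
As an illustration of what compaction means here, the following standalone sketch models a fragment as a vector of rows carrying a deleted flag and removes the flagged rows in place. Row and vacuum_fragment are hypothetical names for this example only; the real work is delegated to Catalog_Namespace::Catalog::vacuumDeletedRows(), and checkpointing is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row model: a payload value plus the boolean deleted flag.
struct Row {
  int64_t value;
  bool deleted;
};

// Removes every row flagged as deleted, preserving the order of the
// remaining rows. Returns the number of rows removed.
std::size_t vacuum_fragment(std::vector<Row>& fragment) {
  const std::size_t old_size = fragment.size();
  fragment.erase(std::remove_if(fragment.begin(),
                                fragment.end(),
                                [](const Row& r) { return r.deleted; }),
                 fragment.end());
  return old_size - fragment.size();
}
```

After the call, the fragment holds only live rows, so later scans no longer need to skip over deleted ones.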

Definition at line 336 of file TableOptimizer.cpp.

References cat_, Catalog_Namespace::Catalog::checkpoint(), TableDescriptor::tableId, td_, and Catalog_Namespace::Catalog::vacuumDeletedRows().

Referenced by anonymous_namespace{UpdelStorageTest.cpp}::delete_and_vacuum_varlen_rows(), MapDHandler::sql_execute_impl(), and TableOptimizer().

336  {
337  const auto table_id = td_->tableId;
338  cat_.vacuumDeletedRows(table_id);
339  cat_.checkpoint(table_id);
340 }

Member Data Documentation

◆ cat_

const Catalog_Namespace::Catalog& TableOptimizer::cat_
private

Definition at line 61 of file TableOptimizer.h.

Referenced by recomputeMetadata(), and vacuumDeletedRows().

◆ executor_

Executor* TableOptimizer::executor_
private

Definition at line 60 of file TableOptimizer.h.

Referenced by recomputeMetadata().

◆ td_

const TableDescriptor* TableOptimizer::td_
private

Definition at line 59 of file TableOptimizer.h.

Referenced by recomputeMetadata(), and vacuumDeletedRows().


The documentation for this class was generated from the following files: