OmniSciDB  04ee39c94c
anonymous_namespace{HashJoinRuntime.cpp} Namespace Reference

Functions

int64_t translate_str_id_to_outer_dict (const int64_t elem, const int64_t min_elem, const int64_t max_elem, const void *sd_inner_proxy, const void *sd_outer_proxy)
 

Function Documentation

◆ translate_str_id_to_outer_dict()

int64_t anonymous_namespace{HashJoinRuntime.cpp}::translate_str_id_to_outer_dict(
    const int64_t elem,
    const int64_t min_elem,
    const int64_t max_elem,
    const void* sd_inner_proxy,
    const void* sd_outer_proxy)
inline

Joins between two dictionary-encoded string columns that do not share a string dictionary are computed by translating IDs from the inner dictionary into the outer dictionary while the hash table is being filled. The translation works as follows:

Given two tables t1 and t2, with t1 the outer table and t2 the inner table, and two columns t1.x and t2.x, both dictionary-encoded strings without a shared dictionary, we read each value in t2.x, fetch its string from the t2.x dictionary, and look that string up in the dictionary for t1.x. If the lookup returns a valid ID, we insert that ID into the hash table. Otherwise, we skip adding a hash table entry for that inner value. We can also skip any entry whose translated ID falls outside the range of the outer column, i.e. outside [min_elem, max_elem].
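The translate-then-insert procedure above can be sketched in isolation. This is a minimal, hypothetical model, not OmniSciDB code: ToyDict stands in for a string dictionary, fill_translated is an illustrative name, and the buffer layout (one slot per outer ID in [min_elem, max_elem], holding the inner row index, with -1 marking an empty slot) is an assumption chosen for clarity. The sketch also assumes inner keys are unique, so a duplicate simply overwrites.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a string dictionary (NOT the OmniSciDB API):
// strings[id] gives the string for an ID, ids[str] gives the ID for a string.
struct ToyDict {
  std::vector<std::string> strings;
  std::unordered_map<std::string, int32_t> ids;

  // Adds a string if absent and returns its ID.
  int32_t add(const std::string& s) {
    const auto it = ids.find(s);
    if (it != ids.end()) {
      return it->second;
    }
    const auto id = static_cast<int32_t>(strings.size());
    strings.push_back(s);
    ids.emplace(s, id);
    return id;
  }

  // Returns -1 (assumed invalid-ID sentinel) when the string is absent.
  int32_t idOf(const std::string& s) const {
    const auto it = ids.find(s);
    return it == ids.end() ? -1 : it->second;
  }
};

// Hypothetical fill loop: for each inner row, translate its ID into the outer
// dictionary and store the row index at slot (outer_id - min_elem). Inner
// values missing from the outer dictionary, or translating outside
// [min_elem, max_elem], are skipped.
std::vector<int64_t> fill_translated(const std::vector<int32_t>& inner_col,
                                     const ToyDict& inner_dict,
                                     const ToyDict& outer_dict,
                                     int64_t min_elem,
                                     int64_t max_elem) {
  assert(min_elem <= max_elem);
  std::vector<int64_t> buff(static_cast<size_t>(max_elem - min_elem + 1), -1);
  for (size_t row = 0; row < inner_col.size(); ++row) {
    const std::string& s = inner_dict.strings.at(static_cast<size_t>(inner_col[row]));
    const int64_t outer_id = outer_dict.idOf(s);
    if (outer_id < 0 || outer_id < min_elem || outer_id > max_elem) {
      continue;  // no matching outer string, or outside the outer range
    }
    buff[static_cast<size_t>(outer_id - min_elem)] = static_cast<int64_t>(row);
  }
  return buff;
}
```

With an outer dictionary {apple, banana, cherry} and an inner column whose strings are cherry, kiwi, apple, the "kiwi" row is dropped because it has no outer ID, and narrowing the range to [1, 2] also drops "apple" (outer ID 0).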

Consider a join of the form

SELECT x, n FROM (SELECT x, COUNT(*) n FROM t1 GROUP BY x HAVING n > 10), t2 WHERE t1.x = t2.x;

and let the result of the subquery be t1_s. Because the HAVING clause only removes groups, the IDs in t1_s are a subset of the IDs in t1, so the range of IDs in t1_s is contained in the range of IDs in t1. Suppose an element a in t2.x is also in t1_s.x. Then the ID of a must lie within the range of t1_s. Therefore it is safe to ignore any element whose ID is not in the dictionary corresponding to t1_s.x or lies outside the range of column t1_s.
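The range argument can be checked on a toy column. The sketch below is illustrative only: having_count_gt models the subquery's GROUP BY ... HAVING COUNT(*) > threshold over a column of dictionary IDs, id_range computes an inclusive [min, max], and may_match applies the range test. All three names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Inclusive [min, max] range of a non-empty set of dictionary IDs.
std::pair<int64_t, int64_t> id_range(const std::vector<int32_t>& ids) {
  assert(!ids.empty());
  const auto [lo, hi] = std::minmax_element(ids.begin(), ids.end());
  return {*lo, *hi};
}

// Models SELECT x, COUNT(*) n FROM t1 GROUP BY x HAVING n > threshold:
// returns the distinct IDs of x whose group count exceeds threshold.
std::vector<int32_t> having_count_gt(const std::vector<int32_t>& t1_x,
                                     int64_t threshold) {
  std::map<int32_t, int64_t> counts;
  for (const int32_t id : t1_x) {
    ++counts[id];
  }
  std::vector<int32_t> kept;
  for (const auto& [id, n] : counts) {
    if (n > threshold) {
      kept.push_back(id);
    }
  }
  return kept;
}

// A translated inner ID can only match a row of t1_s if it lies in t1_s's range.
bool may_match(int64_t outer_id, std::pair<int64_t, int64_t> range) {
  return outer_id >= range.first && outer_id <= range.second;
}
```

For t1.x = {0, 1, 1, 1, 2, 2, 2, 3} and a threshold of 2, only IDs 1 and 2 survive, so t1_s has range [1, 2] inside t1's range [0, 3], and an inner value translating to ID 3 can be skipped. The range test is conservative: an ID inside the range may still be absent from t1_s, so it only prunes and never proves a match.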

Definition at line 69 of file HashJoinRuntime.cpp.

References CHECK, StringDictionaryProxy::getString(), and StringDictionary::INVALID_STR_ID.

Referenced by count_matches_impl(), count_matches_sharded(), fill_hash_join_buff_impl(), fill_hash_join_buff_sharded_impl(), fill_row_ids_impl(), and fill_row_ids_sharded_impl().

{
  CHECK(sd_outer_proxy);
  // The proxies arrive type-erased; recover the StringDictionaryProxy pointers.
  const auto sd_inner_dict_proxy =
      static_cast<const StringDictionaryProxy*>(sd_inner_proxy);
  const auto sd_outer_dict_proxy =
      static_cast<const StringDictionaryProxy*>(sd_outer_proxy);
  // Inner ID -> string -> outer ID.
  const auto elem_str = sd_inner_dict_proxy->getString(elem);
  const auto outer_id = sd_outer_dict_proxy->getIdOfString(elem_str);
  // Outer IDs outside the outer column's [min_elem, max_elem] cannot match.
  if (outer_id > max_elem || outer_id < min_elem) {
    return StringDictionary::INVALID_STR_ID;
  }
  return outer_id;
}
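To exercise the function's logic outside OmniSciDB, the sketch below reproduces it against a hypothetical MockProxy that implements only getString and getIdOfString. It is not the real StringDictionaryProxy. The sentinel value kInvalidStrId = -1 is an assumption standing in for StringDictionary::INVALID_STR_ID, and assert stands in for CHECK.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Assumed stand-in for StringDictionary::INVALID_STR_ID.
constexpr int32_t kInvalidStrId = -1;

// Hypothetical proxy exposing only the two lookups the function uses.
class MockProxy {
 public:
  explicit MockProxy(std::vector<std::string> strs) : strs_(std::move(strs)) {
    for (size_t i = 0; i < strs_.size(); ++i) {
      ids_.emplace(strs_[i], static_cast<int32_t>(i));
    }
  }
  std::string getString(int32_t id) const {
    return strs_.at(static_cast<size_t>(id));
  }
  int32_t getIdOfString(const std::string& s) const {
    const auto it = ids_.find(s);
    return it == ids_.end() ? kInvalidStrId : it->second;
  }

 private:
  std::vector<std::string> strs_;
  std::unordered_map<std::string, int32_t> ids_;
};

// Same steps as the documented function: inner ID -> string -> outer ID,
// then reject outer IDs outside [min_elem, max_elem].
int64_t translate_str_id_to_outer_dict_sketch(const int64_t elem,
                                              const int64_t min_elem,
                                              const int64_t max_elem,
                                              const void* sd_inner_proxy,
                                              const void* sd_outer_proxy) {
  assert(sd_outer_proxy);  // plays the role of CHECK(sd_outer_proxy)
  const auto* inner = static_cast<const MockProxy*>(sd_inner_proxy);
  const auto* outer = static_cast<const MockProxy*>(sd_outer_proxy);
  const auto elem_str = inner->getString(static_cast<int32_t>(elem));
  const auto outer_id = outer->getIdOfString(elem_str);
  if (outer_id > max_elem || outer_id < min_elem) {
    return kInvalidStrId;
  }
  return outer_id;
}
```

A string absent from the outer dictionary yields the sentinel through the same range check, because -1 is below any non-negative min_elem. No separate "not found" branch is needed.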