OmniSciDB  72c90bc290
TableFunctions_Namespace::OneHotEncoder_Namespace::KeyToOneHotColBytemap Struct Reference

A struct that creates a bytemap to map each key to its corresponding one-hot column index.

Public Member Functions

 KeyToOneHotColBytemap (const std::vector< int32_t > &top_k_keys, const int32_t min_key, const int32_t max_key, const bool has_other_key)
 
int16_t get_col_idx_for_key (const int32_t key) const
 

Static Public Member Functions

static std::vector< int16_t > init_bytemap (const std::vector< int32_t > &top_k_keys, const int32_t min_key, const int32_t max_key, const bool has_other_key)
 

Public Attributes

const int32_t min_key_
 
const int32_t max_key_
 
const bool has_other_key_
 
const int32_t other_key_
 
const std::vector< int16_t > bytemap_
 

Detailed Description

A struct that creates a bytemap to map each key to its corresponding one-hot column index.

Definition at line 189 of file OneHotEncoder.cpp.

Constructor & Destructor Documentation

TableFunctions_Namespace::OneHotEncoder_Namespace::KeyToOneHotColBytemap::KeyToOneHotColBytemap ( const std::vector< int32_t > &  top_k_keys,
const int32_t  min_key,
const int32_t  max_key,
const bool  has_other_key 
)
inline

Definition at line 190 of file OneHotEncoder.cpp.

  : min_key_(min_key)
  , max_key_(max_key)
  , has_other_key_(has_other_key)
  , other_key_(top_k_keys.size())
  , bytemap_(init_bytemap(top_k_keys, min_key, max_key, has_other_key)) {}

Member Function Documentation

int16_t TableFunctions_Namespace::OneHotEncoder_Namespace::KeyToOneHotColBytemap::get_col_idx_for_key ( const int32_t  key) const
inline
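
This page does not reproduce the body of get_col_idx_for_key(). Since the member data below are all listed as referenced by it, one plausible reading is a range check followed by a direct index into the bytemap. The following is a minimal sketch under that assumption, not the actual implementation: `LookupSketch`, `col_idx_for_key`, and `kInvalidColIdx` are hypothetical names, and the value -1 for the invalid column index is assumed because the definition of INVALID_COL_IDX is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed value: the page references INVALID_COL_IDX but does not show its definition.
constexpr int16_t kInvalidColIdx = -1;

// Hypothetical stand-in that mirrors the documented data members.
struct LookupSketch {
  int32_t min_key;
  int32_t max_key;
  bool has_other_key;
  int32_t other_key;  // top_k_keys.size() in the documented constructor
  std::vector<int16_t> bytemap;

  int16_t col_idx_for_key(const int32_t key) const {
    // The bytemap is expected to hold one slot per key in [min_key, max_key].
    assert(static_cast<int64_t>(bytemap.size()) ==
           static_cast<int64_t>(max_key) - min_key + 1);
    // Keys outside the mapped range cannot be indexed, so they fall back to
    // the OTHER column when one exists, or to the invalid marker otherwise.
    if (key < min_key || key > max_key) {
      return has_other_key ? static_cast<int16_t>(other_key) : kInvalidColIdx;
    }
    // In-range keys are a single array read, offset by min_key.
    return bytemap[key - min_key];
  }
};
```

With min_key = 10, max_key = 14, two top-k keys, and a bytemap of {0, 2, 2, 1, 2}, key 13 maps to column 1, while key 12 and any out-of-range key map to the OTHER column 2.
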
static std::vector<int16_t> TableFunctions_Namespace::OneHotEncoder_Namespace::KeyToOneHotColBytemap::init_bytemap ( const std::vector< int32_t > &  top_k_keys,
const int32_t  min_key,
const int32_t  max_key,
const bool  has_other_key 
)
inline static

Definition at line 200 of file OneHotEncoder.cpp.

References TableFunctions_Namespace::OneHotEncoder_Namespace::INVALID_COL_IDX.

{
  // The bytemap can be quite large if the dictionary-encoded key range is large, so for
  // efficiency we store the offsets as int16_t. Since we use `top_k_keys.size()` as the
  // sentinel for the OTHER key, we check that top_k_keys.size() is smaller
  // than the maximum allowable value for int16_t.
  if (static_cast<int64_t>(top_k_keys.size()) >= std::numeric_limits<int16_t>::max()) {
    std::ostringstream error_oss;
    error_oss << "Error: More than " << std::numeric_limits<int16_t>::max() - 1
              << " top k categorical keys not allowed.";
    throw std::runtime_error(error_oss.str());
  }
  std::vector<int16_t> bytemap(max_key - min_key + 1,
                               has_other_key ? top_k_keys.size() : INVALID_COL_IDX);
  int16_t offset = 0;
  for (const auto& key : top_k_keys) {
    bytemap[key - min_key] = offset++;
  }
  return bytemap;
}
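
To make the fill-then-overwrite behaviour concrete, here is a small standalone sketch of the same construction that can be compiled outside the codebase. It is an illustration rather than the library code: `build_bytemap_sketch` and `kInvalidColIdxSketch` are hypothetical names, the value -1 for the invalid column index is an assumption, and the sketch additionally assumes every top-k key lies within [min_key, max_key].

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Assumed value: the definition of INVALID_COL_IDX is not shown on this page.
constexpr int16_t kInvalidColIdxSketch = -1;

// Hypothetical free-function version of the construction shown above.
std::vector<int16_t> build_bytemap_sketch(const std::vector<int32_t>& top_k_keys,
                                          const int32_t min_key,
                                          const int32_t max_key,
                                          const bool has_other_key) {
  assert(max_key >= min_key);
  // top_k_keys.size() doubles as the OTHER column index, so it must itself
  // be representable as an int16_t below the maximum value.
  if (top_k_keys.size() >=
      static_cast<std::size_t>(std::numeric_limits<int16_t>::max())) {
    throw std::runtime_error("too many top-k keys for an int16_t column index");
  }
  // Every slot starts as OTHER (or invalid), so keys that are not in the
  // top-k list need no further handling.
  const int16_t fill = has_other_key ? static_cast<int16_t>(top_k_keys.size())
                                     : kInvalidColIdxSketch;
  std::vector<int16_t> bytemap(static_cast<std::size_t>(max_key - min_key + 1), fill);
  // Top-k keys receive column indexes in list order.
  int16_t offset = 0;
  for (const int32_t key : top_k_keys) {
    bytemap[key - min_key] = offset++;
  }
  return bytemap;
}
```

For top_k_keys = {13, 10}, min_key = 10, max_key = 14, and has_other_key = true, the result is {1, 2, 2, 0, 2}: key 13 takes column 0, key 10 takes column 1, and the three remaining keys share the OTHER column 2.
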

Member Data Documentation

const std::vector<int16_t> TableFunctions_Namespace::OneHotEncoder_Namespace::KeyToOneHotColBytemap::bytemap_

Definition at line 234 of file OneHotEncoder.cpp.

Referenced by get_col_idx_for_key().

const bool TableFunctions_Namespace::OneHotEncoder_Namespace::KeyToOneHotColBytemap::has_other_key_

Definition at line 232 of file OneHotEncoder.cpp.

Referenced by get_col_idx_for_key().

const int32_t TableFunctions_Namespace::OneHotEncoder_Namespace::KeyToOneHotColBytemap::max_key_

Definition at line 231 of file OneHotEncoder.cpp.

Referenced by get_col_idx_for_key().

const int32_t TableFunctions_Namespace::OneHotEncoder_Namespace::KeyToOneHotColBytemap::min_key_

Definition at line 230 of file OneHotEncoder.cpp.

Referenced by get_col_idx_for_key().

const int32_t TableFunctions_Namespace::OneHotEncoder_Namespace::KeyToOneHotColBytemap::other_key_

Definition at line 233 of file OneHotEncoder.cpp.

Referenced by get_col_idx_for_key().


The documentation for this struct was generated from the following file: