OmniSciDB  72c90bc290
OneHotEncoder.cpp File Reference
#include "OneHotEncoder.h"
#include "QueryEngine/TableFunctions/SystemFunctions/os/Shared/TableFunctionsCommon.hpp"
#include "Shared/ThreadInfo.h"
#include <tbb/parallel_for.h>
#include <tbb/parallel_sort.h>

Classes

struct  TableFunctions_Namespace::OneHotEncoder_Namespace::KeyToOneHotColBytemap
 Builds a bytemap that maps each key to its corresponding one-hot column index.
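
A minimal sketch of the key-to-column lookup idea, assuming a dense table indexed by `key - min_key`. The names `KeyToColSketch`, `kInvalidColIdx`, and `col_for_key` are hypothetical and do not come from OneHotEncoder.cpp; the sentinel only mirrors the `INVALID_COL_IDX {-1}` value listed under Variables.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel, modeled on INVALID_COL_IDX {-1} from this page.
constexpr int16_t kInvalidColIdx{-1};

// Illustrative dense lookup table (not the library's struct):
// slot (key - min_key) holds the one-hot column index for that key,
// or kInvalidColIdx when the key has no column of its own.
struct KeyToColSketch {
  int32_t min_key;
  int32_t max_key;
  std::vector<int16_t> slots;

  // Assumes min_k <= every key <= max_k and fewer than 32768 keys.
  KeyToColSketch(const std::vector<int32_t>& keys, int32_t min_k, int32_t max_k)
      : min_key(min_k),
        max_key(max_k),
        slots(static_cast<size_t>(static_cast<int64_t>(max_k) - min_k + 1),
              kInvalidColIdx) {
    for (size_t i = 0; i < keys.size(); ++i) {
      // The position of a key in `keys` becomes its column index.
      slots[static_cast<size_t>(keys[i] - min_key)] = static_cast<int16_t>(i);
    }
  }

  // Returns the column index for `key`, or kInvalidColIdx if it has none.
  int16_t col_for_key(int32_t key) const {
    if (key < min_key || key > max_key) {
      return kInvalidColIdx;
    }
    return slots[static_cast<size_t>(key - min_key)];
  }
};
```

A dense table trades memory proportional to the key range for a branch-light, hash-free lookup per row, which is one plausible reason to prefer it when keys are small dictionary ids.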
 

Namespaces

 TableFunctions_Namespace
 
 TableFunctions_Namespace::OneHotEncoder_Namespace
 

Functions

NEVER_INLINE HOST std::pair< std::vector< int32_t >, bool > TableFunctions_Namespace::OneHotEncoder_Namespace::get_top_k_keys (const Column< TextEncodingDict > &text_col, const int32_t top_k, const double min_perc_col_total_per_key)
 Calculates the top k most frequent keys (categories) in the provided column, subject to a minimum percentage of the column total per key. Returns the top k keys together with a boolean indicating whether other keys exist beyond the top k.
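
The selection described above can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not the code in OneHotEncoder.cpp: it takes a `std::vector<int32_t>` of dictionary ids instead of a `Column< TextEncodingDict >`, runs serially, and assumes the threshold is a percentage on a 0 to 100 scale. The name `top_k_keys_sketch` and the tie-breaking rule are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical serial sketch. `min_perc` is assumed to be on a 0-100 scale.
std::pair<std::vector<int32_t>, bool> top_k_keys_sketch(
    const std::vector<int32_t>& keys, int32_t top_k, double min_perc) {
  using KeyCount = std::pair<int32_t, int64_t>;

  // Count how often each key occurs.
  std::unordered_map<int32_t, int64_t> counts;
  for (int32_t key : keys) {
    ++counts[key];
  }

  // Rank keys by descending count; ties broken by ascending key (assumption).
  std::vector<KeyCount> ranked(counts.begin(), counts.end());
  std::sort(ranked.begin(), ranked.end(),
            [](const KeyCount& a, const KeyCount& b) {
              return a.second != b.second ? a.second > b.second
                                          : a.first < b.first;
            });

  const double total = static_cast<double>(keys.size());
  std::vector<int32_t> result;
  for (const KeyCount& entry : ranked) {
    if (static_cast<int64_t>(result.size()) >= top_k) {
      break;
    }
    // Keys are sorted by count, so every later key is at least as rare.
    if (100.0 * static_cast<double>(entry.second) / total < min_perc) {
      break;
    }
    result.push_back(entry.first);
  }

  // True when at least one distinct key was left out of the result.
  const bool has_other = result.size() < ranked.size();
  return {result, has_other};
}
```

The boolean lets a caller decide whether an extra catch-all column is needed for rows whose key did not make the cut.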
 
template<typename F >
NEVER_INLINE HOST std::vector< std::vector< F > > TableFunctions_Namespace::OneHotEncoder_Namespace::allocate_one_hot_cols (const int64_t num_one_hot_cols, const int64_t col_size)
 Allocates memory for the one-hot encoded columns and initializes them to zero. Takes the number of one-hot columns and the column size, and returns a vector of one-hot encoded columns.
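
A minimal serial sketch of zero-initialized column allocation. The function name `allocate_cols_sketch` is hypothetical, and the file's TBB includes suggest the actual code may parallelize work that this sketch does in a single thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: returns num_cols columns of col_size elements each,
// with every element value-initialized to zero.
template <typename F>
std::vector<std::vector<F>> allocate_cols_sketch(int64_t num_cols,
                                                 int64_t col_size) {
  return std::vector<std::vector<F>>(
      static_cast<size_t>(num_cols),
      std::vector<F>(static_cast<size_t>(col_size), F(0)));
}
```

For example, `allocate_cols_sketch<float>(3, 4)` yields three columns of four zeros each, ready to have individual rows set to one.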
 
std::pair< int32_t, int32_t > TableFunctions_Namespace::OneHotEncoder_Namespace::get_min_max_keys (const std::vector< int32_t > &top_k_keys)
 Finds the minimum and maximum keys in a given vector of keys and returns them as a pair.
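
One way to compute such a pair is with `std::minmax_element` from the standard library. The sketch below is illustrative only: the name `min_max_keys_sketch` is hypothetical, and the sentinel returned for an empty input is an assumption of this example, since the page does not say how an empty vector is handled.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical sketch: returns {min, max} of the keys.
// For an empty input it returns {INT32_MAX, INT32_MIN} (an assumption made
// here so the function is total; the real behavior is not documented above).
std::pair<int32_t, int32_t> min_max_keys_sketch(
    const std::vector<int32_t>& keys) {
  if (keys.empty()) {
    return {std::numeric_limits<int32_t>::max(),
            std::numeric_limits<int32_t>::min()};
  }
  const auto mm = std::minmax_element(keys.begin(), keys.end());
  return {*mm.first, *mm.second};
}
```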
 
template<typename F >
NEVER_INLINE HOST OneHotEncodedCol< F > TableFunctions_Namespace::OneHotEncoder_Namespace::one_hot_encode (const Column< TextEncodingDict > &text_col, const TableFunctions_Namespace::OneHotEncoder_Namespace::OneHotEncodingInfo &one_hot_encoding_info)
 One-hot encodes a column of dictionary-encoded text (TextEncodingDict) using the supplied one-hot encoding information. Returns an object containing the one-hot encoded columns and their corresponding categorical features.
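
The encoding step itself can be illustrated with a small serial sketch. It is not the implementation in OneHotEncoder.cpp: it works on a `std::vector<int32_t>` of dictionary ids, uses a hash map where the file uses a bytemap, and returns bare columns without the categorical feature names. The name `one_hot_encode_sketch` and the `include_other` flag are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: one output column per entry of top_k_keys, plus an
// optional trailing catch-all column for keys outside top_k_keys.
template <typename F>
std::vector<std::vector<F>> one_hot_encode_sketch(
    const std::vector<int32_t>& keys,
    const std::vector<int32_t>& top_k_keys,
    bool include_other) {
  const size_t num_top = top_k_keys.size();
  const size_t num_cols = num_top + (include_other ? 1 : 0);

  // All cells start at zero; exactly one cell per row may be set to one.
  std::vector<std::vector<F>> cols(num_cols,
                                   std::vector<F>(keys.size(), F(0)));

  // Map each selected key to its column position.
  std::unordered_map<int32_t, size_t> key_to_col;
  for (size_t i = 0; i < num_top; ++i) {
    key_to_col[top_k_keys[i]] = i;
  }

  for (size_t row = 0; row < keys.size(); ++row) {
    const auto it = key_to_col.find(keys[row]);
    if (it != key_to_col.end()) {
      cols[it->second][row] = F(1);
    } else if (include_other) {
      cols[num_top][row] = F(1);  // key not selected: mark the catch-all
    }
  }
  return cols;
}
```

With `include_other` set to false, rows whose key is not in `top_k_keys` stay all-zero across every column.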
 
template<typename F >
NEVER_INLINE HOST std::vector< OneHotEncodedCol< F > > TableFunctions_Namespace::OneHotEncoder_Namespace::one_hot_encode (const ColumnList< TextEncodingDict > &text_cols, const std::vector< TableFunctions_Namespace::OneHotEncoder_Namespace::OneHotEncodingInfo > &one_hot_encoding_infos)
 One-hot encodes multiple columns of dictionary-encoded text (TextEncodingDict) in a column list, given a vector of one-hot encoding information for each column.
 

Variables

constexpr int16_t TableFunctions_Namespace::OneHotEncoder_Namespace::INVALID_COL_IDX {-1}