OmniSciDB  21ac014ffc
generate_TableFunctionsFactory_init Namespace Reference

Classes

class  Bracket
 

Functions

def find_comma
 
def line_is_incomplete
 
def find_signatures
 
def is_template_function
 
def build_template_function_call
 
def format_annotations
 
def parse_annotations
 

Variables

tuple Signature = namedtuple('Signature', ['name', 'inputs', 'outputs', 'line'])
 
string ExtArgumentTypes
 
string OutputBufferSizeTypes
 
string SupportedAnnotations
 
tuple translate_map
 
tuple _is_int = re.compile(r'\d+')
 
list input_files = [os.path.join(os.path.dirname(__file__), 'test_udtf_signatures.hpp')]
 
string content
 
tuple dirname = os.path.dirname(output_filename)
 
tuple f = open(output_filename, 'w')
 

Detailed Description

Given a list of input files, scan for lines containing UDTF
specification statements in the following form:

  UDTF: function_name(<arguments>) -> <output column types>

where <arguments> is a comma-separated list of argument types. The
supported argument type specifications are:

- scalar types:
    Int8, Int16, Int32, Int64, Float, Double, Bool, TextEncodingDict, etc
- column types:
    ColumnInt8, ColumnInt16, ColumnInt32, ColumnInt64, ColumnFloat, ColumnDouble, ColumnBool, etc
- column list types:
    ColumnListInt8, ColumnListInt16, ColumnListInt32, ColumnListInt64, ColumnListFloat, ColumnListDouble, ColumnListBool, etc
- cursor type:
    Cursor<t0, t1, ...>
  where t0, t1 are column or column list types
- output buffer size parameter type:
    RowMultiplier<i>, ConstantParameter<i>, Constant<i>
  where i is a literal integer

The output column types are specified as a comma-separated list of column types (see above).

In addition, the following equivalents are supported:
  Column<T> == ColumnT
  ColumnList<T> == ColumnListT
  Cursor<T, V, ...> == Cursor<ColumnT, ColumnV, ...>
  int8 == int8_t == Int8, etc
  float == Float, double == Double, bool == Bool
  T == ColumnT for output column types
  RowMultiplier == RowMultiplier<i> where i is the one-based position of the sizer argument
  when no sizer argument is provided, Constant<1> is assumed
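
The type equivalences listed above can be illustrated with a small normaliser. This is only a sketch: the script performs normalisation in Bracket.normalize, whose body is not shown on this page, so the names SCALARS, normalize_scalar and normalize_type below are hypothetical and cover just the Column, ColumnList and scalar cases.

```python
import re

# Hypothetical alias table; the real script may handle more spellings.
SCALARS = {
    'int8': 'Int8', 'int16': 'Int16', 'int32': 'Int32', 'int64': 'Int64',
    'float': 'Float', 'double': 'Double', 'bool': 'Bool',
}


def normalize_scalar(name):
    """Map int8 / int8_t / float / ... to the canonical scalar name."""
    base = name[:-2] if name.endswith('_t') else name
    return SCALARS.get(base, name)


def normalize_type(spec, kind='input'):
    """Apply Column<T> == ColumnT, ColumnList<T> == ColumnListT, and
    T == ColumnT for output column types."""
    m = re.fullmatch(r'(ColumnList|Column)<\s*(\w+)\s*>', spec.strip())
    if m:
        return m.group(1) + normalize_scalar(m.group(2))
    scalar = normalize_scalar(spec.strip())
    if kind == 'output' and not scalar.startswith('Column'):
        return 'Column' + scalar
    return scalar


print(normalize_type('Column<int32_t>'))
print(normalize_type('ColumnList<double>'))
print(normalize_type('float', kind='output'))
```

Cursor<...> and the sizer types are deliberately left out of this sketch; they need bracket-aware parsing of nested arguments.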

Argument types can be annotated using the `|` (bar) symbol after an
argument type specification. An annotation is specified by a label and
a value separated by the `=` (equals) symbol. Multiple annotations can be
specified by using the `|` (bar) symbol as the annotation separator.
Supported annotation labels are:

- name: to specify argument name
- input_id: to specify the dict id mapping for output TextEncodingDict columns.
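
The annotation syntax can be sketched as follows. The helper name parse_annotated_type and the sample argument are hypothetical; the splitting mirrors what get_types_and_annotations does for a single argument, under the assumption that annotation values never contain a `|`.

```python
def parse_annotated_type(spec):
    """Split 'Type | label=value | ...' into (type, [(label, value), ...])."""
    typ, *annots = spec.split('|')
    pairs = []
    for annot in annots:
        # Split on the first '=' only, so values may themselves contain '='.
        label, value = annot.split('=', 1)
        pairs.append((label.strip(), value.strip()))
    return typ.strip(), pairs


typ, pairs = parse_annotated_type(
    'Column<TextEncodingDict> | name=names | input_id=args<0>')
print(typ)
print(pairs)
```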

Function Documentation

def generate_TableFunctionsFactory_init.build_template_function_call (   name,
  input_types,
  output_types 
)

Definition at line 392 of file generate_TableFunctionsFactory_init.py.

References join().

Referenced by parse_annotations().

def build_template_function_call(name, input_types, output_types):

    def format_cpp_type(cpp_type, idx, is_input=True):
        # Perhaps integrate this to Bracket?
        col_typs = ('Column', 'ColumnList')
        idx = str(idx)
        # TODO: use name in annotations when present?
        arg_name = 'input' + idx if is_input else 'out' + idx
        const = 'const ' if is_input else ''

        if any(cpp_type.startswith(ct) for ct in col_typs):
            return '%s%s& %s' % (const, cpp_type, arg_name), arg_name
        else:
            return '%s %s' % (cpp_type, arg_name), arg_name

    input_cpp_args = []
    output_cpp_args = []
    arg_names = []

    for idx, input_type in enumerate(input_types):
        cpp_type = input_type.get_cpp_type()
        cpp_arg, arg_name = format_cpp_type(cpp_type, idx)
        input_cpp_args.append(cpp_arg)
        arg_names.append(arg_name)

    for idx, output_type in enumerate(output_types):
        cpp_type = output_type.get_cpp_type()
        cpp_arg, arg_name = format_cpp_type(cpp_type, idx, is_input=False)
        output_cpp_args.append(cpp_arg)
        arg_names.append(arg_name)

    callee = name
    called = name.split('__')[0]
    args = ', '.join(input_cpp_args + output_cpp_args)
    arg_names = ', '.join(arg_names)

    template = ("EXTENSION_NOINLINE int32_t\n"
                "%s(%s) {\n"
                " return %s(%s);\n"
                "}\n") % (callee, args, called, arg_names)
    return template
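
To see the kind of wrapper this function emits, the condensed sketch below follows the same formatting rules with a stand-in type object. FakeType, build_wrapper and the function name my_udtf__template_int32 are hypothetical, and the C++ type strings are assumed values of get_cpp_type(), which is not shown on this page.

```python
class FakeType:
    """Hypothetical stand-in for Bracket: only get_cpp_type() is needed."""

    def __init__(self, cpp_type):
        self.cpp_type = cpp_type

    def get_cpp_type(self):
        return self.cpp_type


def build_wrapper(name, input_types, output_types):
    def fmt(t, idx, is_input):
        cpp_type = t.get_cpp_type()
        arg = ('input' if is_input else 'out') + str(idx)
        if cpp_type.startswith('Column'):  # covers Column and ColumnList
            const = 'const ' if is_input else ''
            return '%s%s& %s' % (const, cpp_type, arg), arg
        return '%s %s' % (cpp_type, arg), arg

    pairs = [fmt(t, i, True) for i, t in enumerate(input_types)]
    pairs += [fmt(t, i, False) for i, t in enumerate(output_types)]
    decls = ', '.join(d for d, _ in pairs)
    names = ', '.join(n for _, n in pairs)
    called = name.split('__')[0]  # strip the '__<suffix>' part
    return ("EXTENSION_NOINLINE int32_t\n"
            "%s(%s) {\n"
            "  return %s(%s);\n"
            "}\n") % (name, decls, called, names)


wrapper = build_wrapper(
    'my_udtf__template_int32',
    [FakeType('Column<int32_t>'), FakeType('int32_t')],
    [FakeType('Column<int32_t>')])
print(wrapper)
```

Column arguments are passed by reference (const for inputs), scalars by value, and the wrapper forwards to the function whose name precedes the first double underscore.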

def generate_TableFunctionsFactory_init.find_comma (   line)

Definition at line 231 of file generate_TableFunctionsFactory_init.py.

Referenced by find_signatures(), and generate_TableFunctionsFactory_init.Bracket.parse().

def find_comma(line):
    d = 0
    for i, c in enumerate(line):
        if c in '<([{':
            d += 1
        elif c in '>)]}':
            d -= 1
        elif d == 0 and c == ',':
            return i
    return -1
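
The depth counter is what lets callers split an argument list on top-level commas only. The sketch below repeats find_comma so that it runs on its own (treating `}` as a closing bracket) and adds a hypothetical helper, split_args, that loops in the same way get_types_and_annotations does.

```python
def find_comma(line):
    """Return the index of the first comma outside any brackets, or -1."""
    d = 0
    for i, c in enumerate(line):
        if c in '<([{':
            d += 1
        elif c in '>)]}':
            d -= 1
        elif d == 0 and c == ',':
            return i
    return -1


def split_args(line):
    """Hypothetical helper: split on top-level commas only."""
    rest, parts = line.strip(), []
    while rest:
        i = find_comma(rest)
        if i == -1:
            parts.append(rest)
            break
        parts.append(rest[:i].rstrip())
        rest = rest[i + 1:].lstrip()
    return parts


print(split_args('Cursor<ColumnInt32, ColumnDouble>, RowMultiplier<2>'))
```

The comma inside Cursor<...> sits at depth 1 and is skipped, so the cursor stays a single argument.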


def generate_TableFunctionsFactory_init.find_signatures (   input_file)
Returns a list of parsed UDTF signatures.

Definition at line 249 of file generate_TableFunctionsFactory_init.py.

References find_comma(), join(), line_is_incomplete(), omnisci.open(), Signature, and split().

Referenced by parse_annotations().

def find_signatures(input_file):
    """Returns a list of parsed UDTF signatures.
    """

    def get_function_name(line):
        return line.split('(')[0]

    def get_types_and_annotations(line):
        """Line is a comma separated string of types.
        """
        rest = line.strip()
        types, annotations = [], []
        while rest:
            i = find_comma(rest)
            if i == -1:
                type_annot, rest = rest, ''
            else:
                type_annot, rest = rest[:i].rstrip(), rest[i+1:].lstrip()
            if '|' in type_annot:
                typ, annots = type_annot.split('|', 1)
                typ, annots = typ.rstrip(), annots.lstrip().split('|')
            else:
                typ, annots = type_annot, []
            types.append(typ)
            pairs = []
            for annot in annots:
                label, value = annot.strip().split('=', 1)
                label, value = label.rstrip(), value.lstrip()
                pairs.append((label, value))
            annotations.append(pairs)
        return types, annotations

    def get_input_types_and_annotations(line):
        start = line.rfind('(') + 1
        end = line.find(')')
        assert -1 not in [start, end], line
        return get_types_and_annotations(line[start:end])

    def get_output_types_and_annotations(line):
        start = line.rfind('->') + 2
        end = len(line)
        assert -1 not in [start, end], line
        return get_types_and_annotations(line[start:end])

    signatures = []

    last_line = None
    for line in open(input_file).readlines():
        line = line.strip()
        if last_line is not None:
            line = last_line + line
            last_line = None
        if not line.startswith('UDTF:'):
            continue
        if line_is_incomplete(line):
            last_line = line
            continue
        last_line = None
        line = line[5:].lstrip()
        i = line.find('(')
        j = line.find(')')
        if i == -1 or j == -1:
            sys.stderr.write('Invalid UDTF specification: `%s`. Skipping.\n' % (line))
            continue

        expected_result = None
        if '!' in line:
            line, expected_result = line.split('!', 1)
            expected_result = expected_result.strip()

        name = get_function_name(line)
        input_types, input_annotations = get_input_types_and_annotations(line)
        output_types, output_annotations = get_output_types_and_annotations(line)

        input_types = tuple([Bracket.parse(typ).normalize(kind='input') for typ in input_types])
        output_types = tuple([Bracket.parse(typ).normalize(kind='output') for typ in output_types])

        # Apply default sizer
        has_sizer = False
        consumed_nargs = 0
        for i, t in enumerate(input_types):
            if t.is_output_buffer_sizer():
                has_sizer = True
                if t.is_row_multiplier():
                    if not t.args:
                        t.args = Bracket.parse('RowMultiplier<%s>' % (consumed_nargs + 1)).args
            elif t.is_cursor():
                consumed_nargs += len(t.args)
            else:
                consumed_nargs += 1
        if not has_sizer:
            t = Bracket.parse('kTableFunctionSpecifiedParameter<1>')
            input_types += (t,)

        # Apply default input_id to output TextEncodedDict columns
        default_input_id = None
        for i, t in enumerate(input_types):
            if t.is_column_text_encoded_dict():
                default_input_id = 'args<%s>' % (i,)
                break
            elif t.is_column_list_text_encoded_dict():
                default_input_id = 'args<%s, 0>' % (i,)
                break
        for t, annots in zip(output_types, output_annotations):
            if t.is_any_text_encoded_dict():
                has_input_id = False
                for a in annots:
                    if a[0] == 'input_id':
                        has_input_id = True
                        break
                if not has_input_id:
                    assert default_input_id is not None
                    annots.append(('input_id', default_input_id))

        result = name + '('
        result += ', '.join([' | '.join([str(t)] + [k + '=' + v for k, v in a]) for t, a in zip(input_types, input_annotations)])
        result += ') -> '
        result += ', '.join([' | '.join([str(t)] + [k + '=' + v for k, v in a]) for t, a in zip(output_types, output_annotations)])

        if expected_result is not None:
            assert result == expected_result, (result, expected_result)
            if 1:
                # Make sure that we have stable parsing result
                line = result
                name = get_function_name(line)
                input_types, input_annotations = get_input_types_and_annotations(line)
                output_types, output_annotations = get_output_types_and_annotations(line)
                input_types = tuple([Bracket.parse(typ).normalize(kind='input') for typ in input_types])
                output_types = tuple([Bracket.parse(typ).normalize(kind='output') for typ in output_types])
                result2 = name + '('
                result2 += ', '.join([' | '.join([str(t)] + [k + '=' + v for k, v in a]) for t, a in zip(input_types, input_annotations)])
                result2 += ') -> '
                result2 += ', '.join([' | '.join([str(t)] + [k + '=' + v for k, v in a]) for t, a in zip(output_types, output_annotations)])
                assert result == result2, (result, result2)
        signatures.append(Signature(name, input_types, output_types, input_annotations, output_annotations))

    return signatures
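
The "Apply default sizer" step is easier to follow in isolation. The sketch below models each input as a plain (name, args) pair instead of a Bracket object; fill_row_multiplier and SIZERS are hypothetical names, and the counting follows the loop above: a cursor consumes one position per column, any other non-sizer argument consumes one, and a bare RowMultiplier receives the next one-based position.

```python
SIZERS = ('RowMultiplier', 'ConstantParameter', 'Constant')


def fill_row_multiplier(inputs):
    """inputs: list of (name, args) pairs. Returns a new list in which a
    RowMultiplier without arguments gets its one-based position."""
    consumed, out = 0, []
    for name, args in inputs:
        if name in SIZERS:
            if name == 'RowMultiplier' and not args:
                args = (str(consumed + 1),)
        elif name == 'Cursor':
            consumed += len(args)  # one position per cursor column
        else:
            consumed += 1
        out.append((name, tuple(args)))
    return out


print(fill_row_multiplier([
    ('Cursor', ('ColumnInt32', 'ColumnDouble')),
    ('RowMultiplier', ()),
]))
```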

def generate_TableFunctionsFactory_init.format_annotations (   annotations_)

Definition at line 435 of file generate_TableFunctionsFactory_init.py.

References join().

Referenced by parse_annotations().

def format_annotations(annotations_):
    s = "std::vector<std::map<std::string, std::string>>{"
    s += ', '.join(('{' + ', '.join('{"%s", "%s"}' % (k, v) for k, v in a) + '}') for a in annotations_)
    s += "}"
    return s
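
A short usage example shows the C++ initializer this produces. The function is repeated so the snippet runs on its own; the sample annotation lists are made up for illustration, with one list of (label, value) pairs per argument.

```python
def format_annotations(annotations_):
    s = "std::vector<std::map<std::string, std::string>>{"
    s += ', '.join(('{' + ', '.join('{"%s", "%s"}' % (k, v) for k, v in a) + '}')
                   for a in annotations_)
    s += "}"
    return s


# One entry per argument; an argument without annotations yields '{}'.
annotations = [[('name', 'x')], [], [('input_id', 'args<0>')]]
formatted = format_annotations(annotations)
print(formatted)
# std::vector<std::map<std::string, std::string>>{{{"name", "x"}}, {}, {{"input_id", "args<0>"}}}
```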

def generate_TableFunctionsFactory_init.is_template_function (   sig)

Definition at line 388 of file generate_TableFunctionsFactory_init.py.

Referenced by parse_annotations().

def is_template_function(sig):
    return '_template' in sig.name


def generate_TableFunctionsFactory_init.line_is_incomplete (   line)

Definition at line 243 of file generate_TableFunctionsFactory_init.py.

Referenced by find_signatures().

def line_is_incomplete(line):
    # TODO: try to parse the line to be certain about completeness.
    # `!' is used to separate the UDTF signature and the expected result
    return line.endswith(',') or line.endswith('->') or line.endswith('!')
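
This check is what allows a UDTF statement to span several source lines. The sketch below isolates the joining loop used in find_signatures; join_udtf_lines is a hypothetical name and the sample lines are made up.

```python
def line_is_incomplete(line):
    return line.endswith(',') or line.endswith('->') or line.endswith('!')


def join_udtf_lines(lines):
    """Yield complete 'UDTF:' statements, gluing continuation lines together."""
    pending = None
    for raw in lines:
        line = raw.strip()
        if pending is not None:
            line = pending + line
            pending = None
        if not line.startswith('UDTF:'):
            continue
        if line_is_incomplete(line):
            pending = line  # wait for the next line to complete it
            continue
        yield line


sample = [
    '// some comment',
    'UDTF: f(ColumnInt32,',
    '        RowMultiplier) ->',
    '  ColumnInt32',
    'int x;',
]
joined = list(join_udtf_lines(sample))
print(joined)
```

Because each line is stripped before concatenation, no whitespace is inserted at the join points.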


def generate_TableFunctionsFactory_init.parse_annotations (   input_files)

Definition at line 442 of file generate_TableFunctionsFactory_init.py.

References build_template_function_call(), find_signatures(), format_annotations(), is_template_function(), and join().

def parse_annotations(input_files):

    add_stmts = []
    template_functions = []

    for input_file in input_files:
        for sig in find_signatures(input_file):

            # Compute sql_types, input_types, and sizer
            sql_types_ = []
            input_types_ = []
            sizer = None
            for t in sig.inputs:
                if t.is_output_buffer_sizer():
                    if t.is_user_specified():
                        sql_types_.append(Bracket.parse('int32').normalize(kind='input'))
                        input_types_.append(sql_types_[-1])
                    assert sizer is None  # exactly one sizer argument is allowed
                    assert len(t.args) == 1, t
                    sizer = 'TableFunctionOutputRowSizer{OutputBufferSizeType::%s, %s}' % (t.name, t.args[0])
                elif t.name == 'Cursor':
                    for t_ in t.args:
                        input_types_.append(t_)
                    sql_types_.append(Bracket('Cursor', args=()))
                else:
                    input_types_.append(t)
                    if t.is_column_any():
                        # XXX: let Bracket handle mapping of column to cursor(column)
                        sql_types_.append(Bracket('Cursor', args=()))
                    else:
                        sql_types_.append(t)
            assert sizer is not None

            ns_output_types = tuple([a.apply_namespace(ns='ExtArgumentType') for a in sig.outputs])
            ns_input_types = tuple([t.apply_namespace(ns='ExtArgumentType') for t in input_types_])
            ns_sql_types = tuple([t.apply_namespace(ns='ExtArgumentType') for t in sql_types_])

            input_types = 'std::vector<ExtArgumentType>{%s}' % (', '.join(map(str, ns_input_types)))
            output_types = 'std::vector<ExtArgumentType>{%s}' % (', '.join(map(str, ns_output_types)))
            sql_types = 'std::vector<ExtArgumentType>{%s}' % (', '.join(map(str, ns_sql_types)))
            annotations = format_annotations(sig.input_annotations + sig.output_annotations)

            add = 'TableFunctionsFactory::add("%s", %s, %s, %s, %s, %s);' % (sig.name, sizer, input_types, output_types, sql_types, annotations)
            add_stmts.append(add)

            if is_template_function(sig):
                t = build_template_function_call(sig.name, input_types_, sig.outputs)
                template_functions.append(t)

    return add_stmts, template_functions

Variable Documentation

tuple generate_TableFunctionsFactory_init._is_int = re.compile(r'\d+')

Definition at line 90 of file generate_TableFunctionsFactory_init.py.

string generate_TableFunctionsFactory_init.content
Initial value:
= '''
/*
  This file is generated by %s. Do no edit!
*/

#include "QueryEngine/TableFunctions/TableFunctionsFactory.h"
#include "QueryEngine/TableFunctions/TableFunctions.hpp"
#include "QueryEngine/OmniSciTypes.h"

extern bool g_enable_table_functions;

namespace table_functions {

std::once_flag init_flag;

void TableFunctionsFactory::init() {
  if (!g_enable_table_functions) {
    return;
  }
  std::call_once(init_flag, []() {
    %s
  });
}

%s

} // namespace table_functions
'''
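
The template has three `%s` placeholders. How the script fills them is not shown on this page; the sketch below assumes the natural order (generator name, the add statements, then the template function wrappers) and uses a shortened template. TEMPLATE, render and the sample arguments are hypothetical.

```python
# Shortened stand-in for the real template, keeping the three placeholders.
TEMPLATE = '''
/*
  This file is generated by %s.
*/
void TableFunctionsFactory::init() {
  std::call_once(init_flag, []() {
    %s
  });
}

%s
'''


def render(generator, add_stmts, template_functions):
    """Fill the placeholders; add statements are indented to sit in the lambda."""
    return TEMPLATE % (generator,
                       '\n    '.join(add_stmts),
                       '\n'.join(template_functions))


rendered = render('gen.py',
                  ['TableFunctionsFactory::add(/* first */);',
                   'TableFunctionsFactory::add(/* second */);'],
                  ['/* wrapper functions go here */'])
print(rendered)
```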

Definition at line 508 of file generate_TableFunctionsFactory_init.py.

tuple generate_TableFunctionsFactory_init.dirname = os.path.dirname(output_filename)

Definition at line 538 of file generate_TableFunctionsFactory_init.py.

string generate_TableFunctionsFactory_init.ExtArgumentTypes
Initial value:
= ''' Int8, Int16, Int32, Int64, Float, Double, Void, PInt8, PInt16,
PInt32, PInt64, PFloat, PDouble, PBool, Bool, ArrayInt8, ArrayInt16,
ArrayInt32, ArrayInt64, ArrayFloat, ArrayDouble, ArrayBool, GeoPoint,
GeoLineString, Cursor, GeoPolygon, GeoMultiPolygon, ColumnInt8,
ColumnInt16, ColumnInt32, ColumnInt64, ColumnFloat, ColumnDouble,
ColumnBool, ColumnTextEncodingDict, TextEncodingNone, TextEncodingDict,
ColumnListInt8, ColumnListInt16, ColumnListInt32, ColumnListInt64,
ColumnListFloat, ColumnListDouble, ColumnListBool, ColumnListTextEncodingDict'''

Definition at line 55 of file generate_TableFunctionsFactory_init.py.

tuple generate_TableFunctionsFactory_init.f = open(output_filename, 'w')

Definition at line 542 of file generate_TableFunctionsFactory_init.py.

list generate_TableFunctionsFactory_init.input_files = [os.path.join(os.path.dirname(__file__), 'test_udtf_signatures.hpp')]

Definition at line 496 of file generate_TableFunctionsFactory_init.py.

string generate_TableFunctionsFactory_init.OutputBufferSizeTypes
Initial value:
= '''
kConstant, kUserSpecifiedConstantParameter, kUserSpecifiedRowMultiplier, kTableFunctionSpecifiedParameter
'''

Definition at line 64 of file generate_TableFunctionsFactory_init.py.

tuple generate_TableFunctionsFactory_init.Signature = namedtuple('Signature', ['name', 'inputs', 'outputs', 'line'])

Definition at line 51 of file generate_TableFunctionsFactory_init.py.

Referenced by find_signatures().

string generate_TableFunctionsFactory_init.SupportedAnnotations
Initial value:
= '''
input_id, name
'''

Definition at line 68 of file generate_TableFunctionsFactory_init.py.

tuple generate_TableFunctionsFactory_init.translate_map
Initial value:
= dict(
    Constant='kConstant',
    ConstantParameter='kUserSpecifiedConstantParameter',
    RowMultiplier='kUserSpecifiedRowMultiplier',
    UserSpecifiedConstantParameter='kUserSpecifiedConstantParameter',
    UserSpecifiedRowMultiplier='kUserSpecifiedRowMultiplier',
    TableFunctionSpecifiedParameter='kTableFunctionSpecifiedParameter',
    short='Int16',
    int='Int32',
    long='Int64',
)
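
As an illustration of how this map ties the short sizer names in a UDTF line to the OutputBufferSizeType enumerators, the sketch below builds the sizer expression that parse_annotations emits. It assumes the translation is a plain dictionary lookup with unknown names passed through unchanged; the helper sizer_expr is hypothetical.

```python
translate_map = dict(
    Constant='kConstant',
    ConstantParameter='kUserSpecifiedConstantParameter',
    RowMultiplier='kUserSpecifiedRowMultiplier',
    UserSpecifiedConstantParameter='kUserSpecifiedConstantParameter',
    UserSpecifiedRowMultiplier='kUserSpecifiedRowMultiplier',
    TableFunctionSpecifiedParameter='kTableFunctionSpecifiedParameter',
    short='Int16',
    int='Int32',
    long='Int64',
)


def sizer_expr(name, arg):
    """Translate a sizer name and format it as a C++ initializer."""
    kind = translate_map.get(name, name)  # unknown names pass through
    return 'TableFunctionOutputRowSizer{OutputBufferSizeType::%s, %s}' % (kind, arg)


print(sizer_expr('RowMultiplier', 2))
print(sizer_expr('Constant', 5))
```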

Definition at line 72 of file generate_TableFunctionsFactory_init.py.