OmniSciDB  2e3a973ef4
create_table.SyntheticTable Class Reference

Public Member Functions

def __init__ (self, **kwargs)
 
def createDataAndImportTable (self, skip_data_generation=False)
 
def generateColumnsSchema (self)
 
def getCreateTableCommand (self)
 
def getCopyFromCommand (self)
 
def generateData (self, thread_idx, size)
 
def generateDataParallel (self)
 
def createExpectedTableDetails (self)
 
def doesTableHasExpectedSchemaInDB (self)
 
def doesTableHasExpectedNumEntriesInDB (self)
 
def createTableInDB (self)
 
def importDataIntoTableInDB (self)
 

Public Attributes

 table_name
 
 fragment_size
 
 num_fragments
 
 db_name
 
 db_user
 
 db_password
 
 db_server
 
 db_port
 
 data_dir_path
 
 num_entries
 
 column_list
 
 is_remote_server
 
 data_file_name_base
 

Detailed Description

Definition at line 47 of file create_table.py.

Constructor & Destructor Documentation

◆ __init__()

def create_table.SyntheticTable.__init__ (   self,
  **kwargs 
)
    kwargs:
        table_name(str): name of the synthetic table in the database
        fragment_size(int): fragment size (number of entries per fragment)
        num_fragments(int): total number of fragments for the synthetic table
        db_user(str): database username
        db_password(str): database password
        db_port(int): database port
        db_name(str): database name
        db_server(str): database server name
        data_dir_path(str): path to the directory that will hold the generated data
        is_remote_server(bool): if True, this class is not instantiated on the
            same machine that hosts the server.

Definition at line 48 of file create_table.py.

def __init__(self, **kwargs):
    """
    kwargs:
        table_name(str): name of the synthetic table in the database
        fragment_size(int): fragment size (number of entries per fragment)
        num_fragments(int): total number of fragments for the synthetic table
        db_user(str): database username
        db_password(str): database password
        db_port(int): database port
        db_name(str): database name
        db_server(str): database server name
        data_dir_path(str): path to the directory that will hold the generated data
        is_remote_server(bool): if True, this class is not instantiated on the
            same machine that hosts the server.
    """
    self.table_name = kwargs["table_name"]
    self.fragment_size = kwargs["fragment_size"]
    self.num_fragments = kwargs["num_fragments"]
    self.db_name = kwargs["db_name"]
    self.db_user = kwargs["db_user"]
    self.db_password = kwargs["db_password"]
    self.db_server = kwargs["db_server"]
    self.db_port = kwargs["db_port"]
    self.data_dir_path = kwargs["data_dir_path"]
    self.num_entries = self.num_fragments * self.fragment_size
    self.column_list = self.generateColumnsSchema()
    self.is_remote_server = kwargs["is_remote_server"]
    # create the output directory on first use
    if not os.path.isdir(self.data_dir_path):
        os.mkdir(self.data_dir_path)
    self.data_file_name_base = self.data_dir_path + "/data"
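The constructor reads every key directly from kwargs, so a missing key surfaces as a bare KeyError. The sketch below shows one way a caller might check the arguments up front and derive the row count the same way the constructor does. The helper check_table_kwargs, the REQUIRED_KEYS tuple, and all example values are hypothetical and not part of create_table.py.

```python
# Hypothetical pre-flight check; not part of create_table.py.
# The key names are the ones read by SyntheticTable.__init__ in the listing.
REQUIRED_KEYS = (
    "table_name", "fragment_size", "num_fragments", "db_name", "db_user",
    "db_password", "db_server", "db_port", "data_dir_path", "is_remote_server",
)


def check_table_kwargs(**kwargs):
    """Return the derived row count, or raise KeyError naming missing keys."""
    missing = [key for key in REQUIRED_KEYS if key not in kwargs]
    if missing:
        raise KeyError("missing SyntheticTable arguments: " + ", ".join(missing))
    # same derivation as the constructor: num_entries = num_fragments * fragment_size
    return kwargs["num_fragments"] * kwargs["fragment_size"]


# All values below are placeholders chosen for illustration only.
example_kwargs = {
    "table_name": "synthetic_example",
    "fragment_size": 1000,
    "num_fragments": 4,
    "db_name": "example_db",
    "db_user": "example_user",
    "db_password": "example_password",
    "db_server": "localhost",
    "db_port": 6274,
    "data_dir_path": "/tmp/synthetic_example",
    "is_remote_server": False,
}
num_entries = check_table_kwargs(**example_kwargs)
```

With these placeholder values the derived table size is 4 fragments of 1000 entries each.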

Member Function Documentation

◆ createDataAndImportTable()

def create_table.SyntheticTable.createDataAndImportTable (   self,
  skip_data_generation = False 
)

Definition at line 80 of file create_table.py.

References create_table.SyntheticTable.createTableInDB(), create_table.SyntheticTable.data_file_name_base, create_table.SyntheticTable.doesTableHasExpectedNumEntriesInDB(), create_table.SyntheticTable.doesTableHasExpectedSchemaInDB(), create_table.SyntheticTable.generateDataParallel(), create_table.SyntheticTable.importDataIntoTableInDB(), create_table.SyntheticTable.is_remote_server, and create_table.SyntheticTable.num_entries.

def createDataAndImportTable(self, skip_data_generation=False):
    # decide whether data must be generated and imported into the database,
    # or whether it already exists there:
    if (
        self.doesTableHasExpectedSchemaInDB()
        and self.doesTableHasExpectedNumEntriesInDB()
    ):
        print(
            "Data already exists in the database, proceeding to the queries:"
        )
    else:
        if self.is_remote_server:
            # at this point, we abort the procedure because the data is
            # either not present on the remote server or its schema/number
            # of rows does not match those indicated by this class.
            raise Exception(
                "Proper data does not exist in the remote server."
            )
        else:
            # generate random synthetic data
            if not skip_data_generation:
                # choosing a relatively unique name for the generated csv files
                current_time = str(datetime.datetime.now()).split()
                self.data_file_name_base += "_" + current_time[0]

                self.generateDataParallel()
                print(
                    "Synthetic data created: "
                    + str(self.num_entries)
                    + " rows"
                )
            # create a table in the database:
            self.createTableInDB()
            # import the generated data into the database:
            self.importDataIntoTableInDB()
            print("Data imported into the database")
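The branching in createDataAndImportTable() can be summarised without a database connection. The sketch below is a simplified model under stated assumptions: plan_steps and dated_base are hypothetical names, the two boolean inputs stand in for the results of the schema and row-count checks, and the returned step names are illustrative labels only.

```python
import datetime


def plan_steps(schema_ok, count_ok, is_remote_server, skip_data_generation=False):
    """Model of the decision flow: returns the actions that would run, in order."""
    if schema_ok and count_ok:
        return []  # data already present, nothing to do
    if is_remote_server:
        # mirrors the listing: data cannot be generated for a remote server
        raise Exception("Proper data does not exist in the remote server.")
    steps = []
    if not skip_data_generation:
        steps.append("generate")
    steps += ["create_table", "import"]
    return steps


def dated_base(data_file_name_base, now):
    """Append the date part of a timestamp, as the listing does for CSV names."""
    # str(datetime) looks like "YYYY-MM-DD HH:MM:SS[.ffffff]"; split()[0] is the date
    return data_file_name_base + "_" + str(now).split()[0]


local_plan = plan_steps(schema_ok=False, count_ok=False, is_remote_server=False)
example_base = dated_base("/tmp/example/data", datetime.datetime(2021, 3, 4, 5, 6, 7))
```

Because only the date is appended, two runs on the same day reuse the same base name, which is why the listing calls the name "relatively" unique.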

◆ createExpectedTableDetails()

def create_table.SyntheticTable.createExpectedTableDetails (   self)
Creates table details in the same format as expected 
from pymapd's get_table_details  

Definition at line 207 of file create_table.py.

References create_table.SyntheticTable.column_list.

Referenced by create_table.SyntheticTable.doesTableHasExpectedSchemaInDB().

def createExpectedTableDetails(self):
    """
    Creates table details in the same format as expected
    from pymapd's get_table_details
    """
    return [
        column.createColumnDetailsString() for column in self.column_list
    ]

◆ createTableInDB()

def create_table.SyntheticTable.createTableInDB (   self)

Definition at line 277 of file create_table.py.

References create_table.SyntheticTable.db_name, create_table.SyntheticTable.db_password, create_table.SyntheticTable.db_port, create_table.SyntheticTable.db_server, create_table.SyntheticTable.db_user, create_table.SyntheticTable.getCreateTableCommand(), and create_table.SyntheticTable.table_name.

Referenced by create_table.SyntheticTable.createDataAndImportTable().

def createTableInDB(self):
    try:
        con = pymapd.connect(
            user=self.db_user,
            password=self.db_password,
            host=self.db_server,
            port=self.db_port,
            dbname=self.db_name,
        )
        # drop the current table if it exists:
        con.execute("DROP TABLE IF EXISTS " + self.table_name + ";")
        # create a new table:
        con.execute(self.getCreateTableCommand())
    except Exception as e:
        # keep the original error attached for easier debugging
        raise Exception("Failure in creating a new table.") from e

◆ doesTableHasExpectedNumEntriesInDB()

def create_table.SyntheticTable.doesTableHasExpectedNumEntriesInDB (   self)
    Verifies whether the existing table in the database holds the number
    of entries expected by this class (num_entries).

Definition at line 253 of file create_table.py.

References create_table.SyntheticTable.db_name, create_table.SyntheticTable.db_password, create_table.SyntheticTable.db_port, create_table.SyntheticTable.db_server, create_table.SyntheticTable.db_user, create_table.SyntheticTable.num_entries, and create_table.SyntheticTable.table_name.

Referenced by create_table.SyntheticTable.createDataAndImportTable().

def doesTableHasExpectedNumEntriesInDB(self):
    """
    Verifies whether the existing table in the database holds the number
    of entries expected by this class (num_entries).
    """
    try:
        con = pymapd.connect(
            user=self.db_user,
            password=self.db_password,
            host=self.db_server,
            port=self.db_port,
            dbname=self.db_name,
        )
        result = con.execute(
            "select count(*) from " + self.table_name + ";"
        )
        if list(result)[0][0] == self.num_entries:
            return True
        else:
            print("Expected num rows did not match:")
            return False
    except Exception as e:
        # keep the original error attached for easier debugging
        raise Exception("Pymapd's connection to the server has failed.") from e
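The row-count check boils down to running a count(*) query and comparing the first field of the first row with the expected value. The sketch below reproduces that pattern against an in-memory SQLite database from the Python standard library, used here purely as a stand-in so the example runs without an OmniSciDB server. The function name has_expected_num_entries and the table t are hypothetical.

```python
import sqlite3


def has_expected_num_entries(con, table_name, expected):
    """Compare the table's row count with the expected number of entries."""
    result = con.execute("select count(*) from " + table_name + ";")
    # the query yields a single row with a single field: the count
    return list(result)[0][0] == expected


# SQLite stands in for the real server; the table and rows are illustrative.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x10 INT);")
con.executemany("INSERT INTO t VALUES (?);", [(i,) for i in range(5)])
matches = has_expected_num_entries(con, "t", 5)
```

As in the listing, the table name is concatenated into the SQL text, so this pattern is only suitable for trusted, internally generated names.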

◆ doesTableHasExpectedSchemaInDB()

def create_table.SyntheticTable.doesTableHasExpectedSchemaInDB (   self)
    Verifies whether the existing table in the database has the expected
    schema or not. 

Definition at line 216 of file create_table.py.

References create_table.SyntheticTable.createExpectedTableDetails(), create_table.SyntheticTable.db_name, create_table.SyntheticTable.db_password, create_table.SyntheticTable.db_port, create_table.SyntheticTable.db_server, create_table.SyntheticTable.db_user, and create_table.SyntheticTable.table_name.

Referenced by create_table.SyntheticTable.createDataAndImportTable().

def doesTableHasExpectedSchemaInDB(self):
    """
    Verifies whether the existing table in the database has the expected
    schema or not.
    """
    try:
        con = pymapd.connect(
            user=self.db_user,
            password=self.db_password,
            host=self.db_server,
            port=self.db_port,
            dbname=self.db_name,
        )
    except Exception as e:
        raise Exception("Pymapd's connection to the server has failed.") from e
    try:
        table_details = con.get_table_details(self.table_name)
    except Exception:
        # table does not exist
        print("Table does not exist in the database")
        return False

    if [
        str(table_detail) for table_detail in table_details
    ] == self.createExpectedTableDetails():
        return True
    else:
        print("Schema does not match the expected one:")
        print(
            "Observed table details: "
            + str([str(table_detail) for table_detail in table_details])
        )
        print(
            "Expected table details: "
            + str(self.createExpectedTableDetails())
        )
        # explicit result; the original fell through and returned None
        return False

◆ generateColumnsSchema()

def create_table.SyntheticTable.generateColumnsSchema (   self)

Definition at line 117 of file create_table.py.

def generateColumnsSchema(self):
    column_list = []
    # columns with uniform distribution and step=1
    column_list.append(Column("x10", "INT", 1, 10))
    column_list.append(Column("y10", "INT", 1, 10))
    column_list.append(Column("z10", "INT", 1, 10))
    column_list.append(Column("x100", "INT", 1, 100))
    column_list.append(Column("y100", "INT", 1, 100))
    column_list.append(Column("z100", "INT", 1, 100))
    column_list.append(Column("x1k", "INT", 1, 1000))
    column_list.append(Column("x10k", "INT", 1, 10000))
    column_list.append(Column("x100k", "INT", 1, 100000))
    column_list.append(Column("x1m", "INT", 1, 1000000))
    column_list.append(Column("x10m", "INT", 1, 10000000))

    # columns with step != 1
    # cardinality = 10k, range = 100m
    column_list.append(Column("x10k_s10k", "BIGINT", 1, 10000, 10000))
    column_list.append(Column("x100k_s10k", "BIGINT", 1, 100000, 10000))
    column_list.append(Column("x1m_s10k", "BIGINT", 1, 1000000, 10000))
    return column_list
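The Column class itself is not documented on this page, so the meaning of its last three arguments can only be inferred from the comments in the listing ("cardinality = 10k, range = 100m" for bounds 1 to 10000 with step 10000). The sketch below is a guess at that behaviour, assuming an entry is a uniformly drawn integer between the bounds multiplied by the step. SketchColumn and generate_entry are hypothetical names and may differ from the real implementation.

```python
import random


class SketchColumn:
    """Hypothetical stand-in for Column; the real class is not shown here."""

    def __init__(self, column_name, sql_type, lower, upper, step=1):
        self.column_name = column_name
        self.sql_type = sql_type
        self.lower = lower
        self.upper = upper
        self.step = step

    def generate_entry(self, rng=random):
        # ASSUMPTION: uniform draw in [lower, upper], scaled by step, so that
        # cardinality stays (upper - lower + 1) while the value range grows.
        return rng.randint(self.lower, self.upper) * self.step


col = SketchColumn("x10k_s10k", "BIGINT", 1, 10000, 10000)
rng = random.Random(0)  # seeded generator so the sample is repeatable
sample = [col.generate_entry(rng) for _ in range(1000)]
```

Under this assumption the x10k_s10k column takes at most 10,000 distinct values spread over a range of 100 million, which matches the comment in the listing and explains the BIGINT type.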

◆ generateData()

def create_table.SyntheticTable.generateData (   self,
  thread_idx,
  size 
)
    Single-thread random data generation based on the provided schema.
    Data is stored in CSV format.

Definition at line 161 of file create_table.py.

References create_table.SyntheticTable.column_list and create_table.SyntheticTable.data_file_name_base.

Referenced by create_table.SyntheticTable.generateDataParallel().

def generateData(self, thread_idx, size):
    """
    Single-thread random data generation based on the provided schema.
    Data is stored in CSV format.
    """
    file_name = (
        self.data_file_name_base + "_part" + str(thread_idx) + ".csv"
    )
    with open(file_name, "w") as f:
        # one CSV row per entry, one generated value per column
        for _ in range(size):
            f.write(
                ",".join(
                    map(
                        str,
                        [col.generateEntry() for col in self.column_list],
                    )
                )
            )
            f.write("\n")
◆ generateDataParallel()

def create_table.SyntheticTable.generateDataParallel (   self)
    Uses all available CPU threads to generate random data based on the 
    provided schema. Data is stored in CSV format.

Definition at line 181 of file create_table.py.

References create_table.SyntheticTable.generateData(), and create_table.SyntheticTable.num_entries.

Referenced by create_table.SyntheticTable.createDataAndImportTable().

def generateDataParallel(self):
    """
    Uses all available CPU threads to generate random data based on the
    provided schema. Data is stored in CSV format.
    """
    num_threads = cpu_count()
    # ceiling division, done in integer arithmetic to avoid float rounding
    num_entries_per_thread = (
        self.num_entries + num_threads - 1
    ) // num_threads
    thread_index = list(range(num_threads))

    # making sure we end up having as many fragments as the user asked for
    num_balanced_entries = [
        num_entries_per_thread for _ in range(num_threads)
    ]
    if self.num_entries != num_entries_per_thread * num_threads:
        # the last worker takes whatever remains after the others
        last_threads_portion = (
            self.num_entries - num_entries_per_thread * (num_threads - 1)
        )
        num_balanced_entries[-1] = last_threads_portion

    arguments = zip(thread_index, num_balanced_entries)

    # one worker process per CPU, each writing its own CSV part file
    with Pool(num_threads) as pool:
        pool.starmap(self.generateData, arguments)
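The listing gives every worker the ceiling of num_entries / num_threads and shrinks only the last worker's share. That works when the table is much larger than the worker count, but for small tables the last share can go negative: with 5 entries and 4 workers the shares come out as 2, 2, 2 and -1, and six rows are written instead of five. The sketch below shows an alternative split that always sums to the requested total. It is a suggested variation, not the method used in create_table.py, and split_entries is a hypothetical name.

```python
def split_entries(num_entries, num_workers):
    """Split num_entries into num_workers shares that differ by at most one."""
    base, remainder = divmod(num_entries, num_workers)
    # the first `remainder` workers each take one extra entry
    return [base + 1 if i < remainder else base for i in range(num_workers)]


small_split = split_entries(5, 4)
even_split = split_entries(8, 4)
```

The resulting list could be zipped with the worker indices and passed to Pool.starmap in the same way as num_balanced_entries in the listing.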

◆ getCopyFromCommand()

def create_table.SyntheticTable.getCopyFromCommand (   self)

Definition at line 154 of file create_table.py.

References create_table.SyntheticTable.data_file_name_base, and create_table.SyntheticTable.table_name.

Referenced by create_table.SyntheticTable.importDataIntoTableInDB().

def getCopyFromCommand(self):
    copy_sql = "COPY " + self.table_name + " FROM '"
    copy_sql += (
        self.data_file_name_base + "*.csv' WITH (header = 'false');"
    )
    return copy_sql
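The COPY command relies on a wildcard so that a single statement picks up every part file written by generateData(). The sketch below builds the same string as a free function and uses the standard-library fnmatch module to illustrate which file names the pattern covers. copy_from_command and the example paths are hypothetical, and fnmatch only approximates how the server expands the wildcard.

```python
import fnmatch


def copy_from_command(table_name, data_file_name_base):
    """Build the COPY statement in the same shape as getCopyFromCommand()."""
    return (
        "COPY " + table_name + " FROM '"
        + data_file_name_base + "*.csv' WITH (header = 'false');"
    )


base = "/tmp/example/data_2021-03-04"  # placeholder path
command = copy_from_command("synthetic_example", base)
# part files follow the naming used by generateData(): <base>_part<idx>.csv
part_matches = fnmatch.fnmatchcase(base + "_part0.csv", base + "*.csv")
```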

◆ getCreateTableCommand()

def create_table.SyntheticTable.getCreateTableCommand (   self)

Definition at line 139 of file create_table.py.

References create_table.SyntheticTable.column_list, create_table.SyntheticTable.fragment_size, and create_table.SyntheticTable.table_name.

Referenced by create_table.SyntheticTable.createTableInDB().

def getCreateTableCommand(self):
    create_sql = "CREATE TABLE " + self.table_name + " ( "
    for column_idx in range(len(self.column_list)):
        column = self.column_list[column_idx]
        create_sql += column.column_name + " " + column.sql_type
        if column_idx != (len(self.column_list) - 1):
            create_sql += ", "
    create_sql += ")"
    # the FRAGMENT_SIZE clause is only emitted for a non-default value
    if self.fragment_size != 32000000:
        create_sql += (
            " WITH (FRAGMENT_SIZE = " + str(self.fragment_size) + ")"
        )
    create_sql += ";"
    return create_sql
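The index-based loop in getCreateTableCommand() exists only to avoid a trailing comma, which str.join handles directly. The sketch below is an equivalent formulation written as a free function. ColumnSpec, create_table_command, and NO_CLAUSE_FRAGMENT_SIZE are hypothetical names, and 32000000 is simply the value for which the listing omits the FRAGMENT_SIZE clause.

```python
from collections import namedtuple

# Minimal stand-in carrying the two attributes the listing reads from Column.
ColumnSpec = namedtuple("ColumnSpec", ["column_name", "sql_type"])

# Value for which the listing emits no FRAGMENT_SIZE clause.
NO_CLAUSE_FRAGMENT_SIZE = 32000000


def create_table_command(table_name, columns, fragment_size):
    """Build a CREATE TABLE statement with the same text as the listing."""
    column_defs = ", ".join(c.column_name + " " + c.sql_type for c in columns)
    sql = "CREATE TABLE " + table_name + " ( " + column_defs + ")"
    if fragment_size != NO_CLAUSE_FRAGMENT_SIZE:
        sql += " WITH (FRAGMENT_SIZE = " + str(fragment_size) + ")"
    return sql + ";"


columns = [ColumnSpec("x10", "INT"), ColumnSpec("x1m_s10k", "BIGINT")]
with_clause = create_table_command("t", columns, 1000)
without_clause = create_table_command("t", columns, 32000000)
```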

◆ importDataIntoTableInDB()

def create_table.SyntheticTable.importDataIntoTableInDB (   self)

Definition at line 293 of file create_table.py.

References create_table.SyntheticTable.db_name, create_table.SyntheticTable.db_password, create_table.SyntheticTable.db_port, create_table.SyntheticTable.db_server, create_table.SyntheticTable.db_user, and create_table.SyntheticTable.getCopyFromCommand().

Referenced by create_table.SyntheticTable.createDataAndImportTable().

def importDataIntoTableInDB(self):
    try:
        con = pymapd.connect(
            user=self.db_user,
            password=self.db_password,
            host=self.db_server,
            port=self.db_port,
            dbname=self.db_name,
        )
        # import generated data:
        con.execute(self.getCopyFromCommand())
    except Exception as e:
        # keep the original error attached for easier debugging
        raise Exception("Failure in importing data into the table") from e

Member Data Documentation

◆ column_list

◆ data_dir_path

create_table.SyntheticTable.data_dir_path

Definition at line 71 of file create_table.py.

◆ data_file_name_base

◆ db_name

◆ db_password

◆ db_port

◆ db_server

◆ db_user

◆ fragment_size

create_table.SyntheticTable.fragment_size

Definition at line 64 of file create_table.py.

Referenced by create_table.SyntheticTable.getCreateTableCommand().

◆ is_remote_server

create_table.SyntheticTable.is_remote_server

◆ num_entries

◆ num_fragments

create_table.SyntheticTable.num_fragments

Definition at line 65 of file create_table.py.

◆ table_name


The documentation for this class was generated from the following file: