July 5, 2006
SQL Set-Based Processing
I just finished coding part of a conversion (SQR and
dynamic SQL) and a set of
data-scrambling scripts (SQL). Some of the steps
in both projects took a very long time to run.
As usual, the way the SQL was written was by far the most important
factor in how fast the processes ran. Just a few
tricks yielded huge performance gains.
One of the supposed advantages of Application Engine
over languages such as PeopleCode or SQR is that
it uses SQL set-based processing, in which one
SQL statement affects many rows. The alternative
is to call separate procedures for each row
within SQR, or use a cursor within a PL/SQL procedure.
(I'm using Oracle concepts today, although most of
these things should have equivalents in other
RDBMSs.) But set-based processing is an App Engine
"advantage" only because App Engine pretty much forces us
to use it. SQR and PeopleCode are just as capable,
and much more flexible.
The simplest example of using set-based processing
to improve performance is to use a join instead of calling
separate procedures for each row in SQR. For example,
if you are reading employee data and want to get the
description of each row's department, you could call
a separate procedure within the SELECT loop
to read the department table. But it's more efficient
to join the department table and just have one SELECT.
If you need the descriptions from other tables
(location, company, etc.), coding one SELECT can
save significant time.
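Here is a minimal sketch of the difference, using SQLite from Python as a stand-in for Oracle. The table and column names (employees, departments, deptid, descr) are simplified, made-up versions of the real PeopleSoft tables, which have many more columns:

```python
import sqlite3

# Hypothetical, simplified stand-ins for the employee and department tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emplid TEXT, name TEXT, deptid TEXT);
    CREATE TABLE departments (deptid TEXT, descr TEXT);
    INSERT INTO employees VALUES ('1001', 'Pat', 'D10'), ('1002', 'Lee', 'D20');
    INSERT INTO departments VALUES ('D10', 'Finance'), ('D20', 'Payroll');
""")

# Row-by-row: an extra SELECT per employee, like calling a separate
# procedure inside the SELECT loop (the slow pattern).
slow = []
for emplid, name, deptid in conn.execute(
    "SELECT emplid, name, deptid FROM employees"
):
    (descr,) = conn.execute(
        "SELECT descr FROM departments WHERE deptid = ?", (deptid,)
    ).fetchone()
    slow.append((emplid, name, descr))

# Set-based: one SELECT with a join does the same work in a single statement.
fast = conn.execute("""
    SELECT e.emplid, e.name, d.descr
    FROM employees e
    JOIN departments d ON d.deptid = e.deptid
    ORDER BY e.emplid
""").fetchall()
```

Both produce the same rows, but the join makes one round trip instead of one per employee, and lets the database pick the access path.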
But be careful with this. Remember that PeopleSoft
does not enforce referential integrity within the
database. If there is a chance that a particular row
of employee data will not have a corresponding row
in the department table, then you will need to use
an outer join to ensure that you select the employee
row. It may seem obvious that there should be a
department table row corresponding to each employee
row's department ID, but check your data. I've seen
some surprising situations. These problems
can usually be traced back to direct database inserts or
updates, but they can also be due to correction-mode
updates within the other tables.
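The following sketch (again SQLite from Python, with made-up table names) shows what goes wrong: employee '1003' points at a department row that no longer exists, so an inner join silently drops that employee, while a left outer join keeps the row and returns NULL for the description:

```python
import sqlite3

# Hypothetical data: employee '1003' references a department that was
# deleted (say, by a direct database update), so no matching row exists.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emplid TEXT, deptid TEXT);
    CREATE TABLE departments (deptid TEXT, descr TEXT);
    INSERT INTO employees VALUES ('1001', 'D10'), ('1003', 'D99');
    INSERT INTO departments VALUES ('D10', 'Finance');
""")

# An inner join silently loses employee '1003'.
inner = conn.execute("""
    SELECT e.emplid, d.descr
    FROM employees e
    JOIN departments d ON d.deptid = e.deptid
""").fetchall()

# A left outer join keeps every employee row; the missing
# description comes back as NULL (None in Python).
outer = conn.execute("""
    SELECT e.emplid, d.descr
    FROM employees e
    LEFT OUTER JOIN departments d ON d.deptid = e.deptid
    ORDER BY e.emplid
""").fetchall()
```

In Oracle's older syntax the same outer join is written with the (+) marker on the department side of the WHERE clause.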
In addition to referential integrity concerns, we also
need to be aware of the potential for optimization
problems when joins become too complicated. You may
have seen this when using row-level security views
in the HR/HCM product, for example. If you have a
huge, complicated join, the SQL can seem to run
forever. If you were expecting to retrieve only a
few rows of data, it is sometimes more efficient
to jump out to separate procedures for the other
tables.
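A minimal sketch of that fallback pattern, under the same assumptions as before (SQLite standing in for Oracle, illustrative table names): select the small driving set first, then do a cheap per-row lookup against each related table instead of building one sprawling join:

```python
import sqlite3

# Illustrative tables; the point is the access pattern, not the schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emplid TEXT, deptid TEXT, location TEXT);
    CREATE TABLE departments (deptid TEXT, descr TEXT);
    CREATE TABLE locations (location TEXT, descr TEXT);
    INSERT INTO employees VALUES ('1001', 'D10', 'NYC');
    INSERT INTO departments VALUES ('D10', 'Finance');
    INSERT INTO locations VALUES ('NYC', 'New York');
""")

def lookup(table, key_col, key):
    """Separate per-row lookup; cheap when the driving set is tiny."""
    row = conn.execute(
        f"SELECT descr FROM {table} WHERE {key_col} = ?", (key,)
    ).fetchone()
    return row[0] if row else None

# Small driving query first, then one indexed lookup per related table.
results = []
for emplid, deptid, location in conn.execute(
    "SELECT emplid, deptid, location FROM employees WHERE emplid = '1001'"
):
    results.append((emplid,
                    lookup("departments", "deptid", deptid),
                    lookup("locations", "location", location)))
```

With only a handful of driving rows, a few indexed single-row lookups can beat a join the optimizer handles badly.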
How can we use set-based processing for inserts and
updates? Next week I'll describe how to update
a view (a good way to set several columns whose
values are selected from other tables), how to
insert large amounts of data efficiently, and how to
deal with missing rows using functions that handle
nulls.
Until next time...