
Selecting ODBC functions for optimized SQL statements

In this section from The Data Access Handbook you'll find best practices and guidelines for deciding which ODBC functions will give you the best performance when executing SQL statements. You'll also learn about the ODBC cursor library, how to limit the data retrieved, and how to use bound columns.

 

 

Executing SQL Statements

Use the guidelines in this section to help you select which ODBC functions will give you the best performance when executing SQL statements.

Using Stored Procedures
Database drivers can call stored procedures on the database using either of the following methods:

  • Execute the procedure the same way as any other SQL statement. The database parses the SQL statement, validates argument types, and converts arguments into the correct data types.
  • Invoke a Remote Procedure Call (RPC) directly in the database. The database skips the parsing and optimization that executing a SQL statement requires.

Remember that SQL is always sent to the database as a character string. For example, consider the following stored procedure call, which passes a literal argument to the stored procedure:

{call getCustName (12345)}

Although the argument to getCustName() is an integer, the argument is passed inside a character string to the database, namely {call getCustName (12345)}. The database parses the SQL statement, isolates the single argument value of 12345, and converts the string 12345 into an integer value before executing the procedure as a SQL language event. Using an RPC on the database, your application can instead pass the arguments to the procedure as parameters. The driver sends a database protocol packet that contains the parameters in their native data type formats, skipping the parsing and optimization required to execute the stored procedure as a SQL statement. Compare the following examples.


Example A: Not Using a Server-Side RPC

The stored procedure getCustName is not optimized to use a server-side RPC. The database treats the SQL stored procedure execution request as a normal SQL language event, which includes parsing the statement, validating argument types, and converting arguments into the correct data types before executing the procedure.

strcpy (sqlStatement,"{call getCustName (12345)}");
rc = SQLPrepare((SQLHSTMT)hstmt, sqlStatement, SQL_NTS);
rc = SQLExecute(hstmt);


Example B: Using a Server-Side RPC

The stored procedure getCustName is optimized to use a server-side RPC. Because the application avoids literal arguments and calls the procedure by specifying arguments as parameters, the driver optimizes the execution by invoking the stored procedure directly on the database as an RPC. The SQL language processing by the database is avoided, and execution time is faster.

strcpy (sqlStatement,"{call getCustName (?)}");
rc = SQLPrepare((SQLHSTMT)hstmt, sqlStatement, SQL_NTS);
rc = SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT,
            SQL_C_LONG, SQL_INTEGER, 10, 0,
            &id, sizeof(id), NULL);
id = 12345;
rc = SQLExecute(hstmt);


Why doesn't the driver parse and automatically change the SQL stored procedure call when it encounters a literal argument so that it can execute the stored procedure using an RPC? Consider this example:

{call getCustName (12345)}

The driver doesn't know if the value 12345 represents an integer, a decimal, a smallint, a bigint, or another numeric data type. To determine the correct data type for packaging the RPC request, the driver must make an expensive network round trip to the database server. The overhead needed to determine the true data type of the literal argument far outweighs the benefit of trying to execute the request as an RPC.

Using Statements Versus Prepared Statements
Most applications have a certain set of SQL statements that are executed multiple times and a few SQL statements that are executed only once or twice during the life of the application. Choose the SQLExecDirect function or the SQLPrepare/SQLExecute functions depending on how frequently you plan to execute the SQL statement.

The SQLExecDirect function is optimized for a SQL statement that is only executed once. In contrast, the SQLPrepare/SQLExecute functions are optimized for SQL statements that use parameter markers and are executed multiple times. Although the overhead for the initial execution of a prepared statement is high, the advantage is realized with subsequent executions of the SQL statement.

Using the SQLPrepare/SQLExecute functions typically results in at least two network round trips to the database server:

  • One network round trip to parse and optimize the statement
  • One or more network round trips to execute the statement and retrieve results

See "SQL Statements," page 27, for more information about statements versus prepared statements.

Using Arrays of Parameters
Updating large amounts of data typically is done by preparing an Insert statement and executing that statement multiple times, resulting in many network round trips.

With ODBC 3.x, calls to SQLSetStmtAttr with the SQL_ATTR_PARAMSET_SIZE, SQL_ATTR_PARAMS_PROCESSED_PTR, and SQL_ATTR_PARAM_STATUS_PTR arguments supersede the ODBC 2.x call to SQLParamOptions.

Before executing the statement, the application sets the value of each data element in the bound array. When the statement is executed, the driver tries to process the entire array contents using one network round trip. Compare the following examples.


Example A: Executing a Prepared Statement Multiple Times

A prepared statement is used to execute an Insert statement multiple times, requiring 101 network round trips to perform 100 Insert operations: 1 round trip to prepare the statement and 100 additional round trips to execute its iterations.

rc = SQLPrepare (hstmt, "INSERT INTO DailyLedger (...) "
     "VALUES (?,?,...)", SQL_NTS);
// bind parameters

do {
// read ledger values into bound parameter buffers
...
rc = SQLExecute (hstmt);
// insert row
} while (!eof);


Example B: Arrays of Parameters

When arrays of parameters are used to consolidate 100 Insert operations, only two network round trips are required: one to prepare the statement and another to execute the array. Although arrays of parameters use more CPU cycles, performance is gained by reducing the number of network round trips.

SQLPrepare (hstmt, "INSERT INTO DailyLedger (...) "
     "VALUES (?,?,...)", SQL_NTS);
SQLSetStmtAttr (hstmt, SQL_ATTR_PARAMSET_SIZE, (SQLPOINTER)100,
      SQL_IS_UINTEGER);
SQLSetStmtAttr (hstmt, SQL_ATTR_PARAMS_PROCESSED_PTR,
      &rows_processed, SQL_IS_POINTER);
// Specify an array in which to retrieve the status of
// each set of parameters.
SQLSetStmtAttr(hstmt, SQL_ATTR_PARAM_STATUS_PTR,
      ParamStatusArray, SQL_IS_POINTER);
// pass 100 parameters per execute
// bind parameters
...
do {
// read up to 100 ledger values into
// bound parameter buffers.
...
rc = SQLExecute (hstmt);
// insert a group of 100 rows
} while (!eof);


Using the Cursor Library
The ODBC cursor library is a component of Microsoft Data Access Components (MDAC) and is used to implement static cursors (one type of scrollable cursor) for drivers that normally don't support them.

What if you don't know whether your driver supports scrollable cursors? Using the following code ensures that the ODBC cursor library is only used when the driver doesn't support scrollable cursors:

rc = SQLSetConnectAttr (hdbc, SQL_ATTR_ODBC_CURSORS,
     (SQLPOINTER)SQL_CUR_USE_IF_NEEDED, SQL_IS_INTEGER);
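
Note that SQL_ATTR_ODBC_CURSORS is a connection attribute and takes effect only if it is set before the connection is established. The following sketch shows where the call fits; the data source name and credentials are placeholders:

SQLHENV henv;
SQLHDBC hdbc;

SQLAllocHandle (SQL_HANDLE_ENV, SQL_NULL_HANDLE, &henv);
SQLSetEnvAttr (henv, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
SQLAllocHandle (SQL_HANDLE_DBC, henv, &hdbc);

// Use the cursor library only if the driver can't provide scrollable cursors.
rc = SQLSetConnectAttr (hdbc, SQL_ATTR_ODBC_CURSORS,
     (SQLPOINTER)SQL_CUR_USE_IF_NEEDED, SQL_IS_INTEGER);

// Placeholder data source name and credentials.
rc = SQLConnect (hdbc, "MyDSN", SQL_NTS, "user", SQL_NTS, "secret", SQL_NTS);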

Retrieving Data

Retrieve only the data you need, and choose the most efficient method to retrieve that data. Use the guidelines in this section to optimize your performance when retrieving data.

Retrieving Long Data
Retrieving long data—such as large XML data, long varchar/text, long varbinary, Clobs, and Blobs—across a network is slow and resource intensive. Most users really don't want to see long data. For example, consider the user interface of an employee directory application that allows the user to look up an employee's phone extension and department, and optionally, view an employee's photograph by clicking the name of the employee.

Employee   Phone   Dept
Harding    X4568   Manager
Hoover     X4324   Sales
Taft       X4569   Sales
Lincoln    X4329   Tech

Retrieving each employee's photograph would slow performance unnecessarily. If the user does want to see the photograph, he can click the employee name and the application can query the database again, specifying only the long columns in the Select list. This method allows users to retrieve result sets without paying a high performance penalty for network traffic.
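
For example, the follow-up query might retrieve only the long column for the selected employee, reading it in parts with SQLGetData because a photograph may not fit in a single buffer. In this sketch, the photo column name and the Where clause are assumptions:

SQLCHAR chunk[32768];
SQLLEN  lenOrInd;

// Retrieve only the long column, and only for the selected employee.
rc = SQLExecDirect (hstmt, "SELECT photo FROM employees "
           "WHERE SSID = '999-99-2222'", SQL_NTS);
rc = SQLFetch (hstmt);

// SQL_SUCCESS_WITH_INFO means more data remains for this column.
do {
   rc = SQLGetData (hstmt, 1, SQL_C_BINARY, chunk, sizeof(chunk), &lenOrInd);
   if (rc == SQL_SUCCESS || rc == SQL_SUCCESS_WITH_INFO) {
      // append this chunk to the image buffer
   }
} while (rc == SQL_SUCCESS_WITH_INFO);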

Although excluding long data from the Select list is the best approach, some applications do not formulate the Select list before sending the query to the driver (that is, some applications use SELECT * FROM table ...). If the Select list contains long data, the driver is forced to retrieve that long data, even if the application never requests the long data from the result set. When possible, use a method that does not retrieve all columns of the table. For example, consider the following code:

rc = SQLExecDirect (hstmt, "SELECT * FROM employees "
           "WHERE SSID = '999-99-2222'", SQL_NTS);
rc = SQLFetch (hstmt);

When a query is executed, the driver has no way to determine which result columns the application will use; an application may fetch any result column that is retrieved. When the driver processes a SQLFetch or SQLExtendedFetch request, it retrieves at least one, and often multiple, result rows from the database across the network. A result row contains all the column values for each row. What if one of the columns includes long data such as an employee photograph? Performance would slow considerably.

Limiting the Select list to contain only the name column results in a faster performing query at runtime. For example:

rc = SQLExecDirect (hstmt, "SELECT name FROM employees "
           "WHERE SSID = '999-99-2222'", SQL_NTS);
rc = SQLFetch(hstmt);
rc = SQLGetData(hstmt, 1, ...);

Limiting the Amount of Data Retrieved
If your application executes a query that retrieves five rows when it needs only two, application performance suffers, especially if the unnecessary rows include long data.

Make sure that your Select statements use a Where clause to limit the amount of data that is retrieved. Even when using a Where clause, a Select statement that does not adequately restrict its request could retrieve hundreds of rows of data. For example, if you want data from the employees table for each manager hired in recent years, your application could execute the following statement and subsequently filter out the rows of employees who are not managers:

SELECT * FROM employees
WHERE hiredate > 2000

However, suppose the employees table contains a column that stores photographs of each employee. In this case, retrieving extra rows is extremely expensive to your application performance. Let the database filter the request for you and avoid sending extra data that you don't need across the network. The following query uses a better approach, limiting the data retrieved and improving performance:

SELECT * FROM employees
WHERE hiredate > 2000 AND job_title='Manager'

Sometimes applications need to use SQL queries that generate a large amount of network traffic. For example, consider an application that displays information from support case histories, each of which contains a 10MB log file. Does the user really need to see the entire contents of the log file? If not, performance would improve if the application displayed only the first 1MB of the log file.

Suppose you have a GUI-based application, and each screen can display no more than 20 rows of data. It's easy to construct a query that may retrieve a million rows, such as SELECT * FROM employees, but it's hard to imagine a scenario where a query that retrieves a million rows would be useful. When designing applications, it's good practice to call the SQLSetStmtAttr function with the SQL_ATTR_MAX_ROWS option as a fail-safe to limit the number of rows that a query can retrieve. For example, if an application calls SQLSetStmtAttr(SQL_ATTR_MAX_ROWS, 10000, 0), no query will retrieve more than 10,000 rows.

In addition, calling the SQLSetStmtAttr function with the SQL_ATTR_MAX_LENGTH option limits the bytes of data that can be retrieved for a column value with the following data types:

  • Binary
  • Varbinary
  • Longvarbinary
  • Char
  • Varchar
  • Longvarchar

For example, consider an application that allows users to select from a repository of technical articles. Rather than retrieve and display the entire article, the application can call SQLSetStmtAttr(SQL_ATTR_MAX_LENGTH, 153600, 0) to retrieve only the first 150KB of the article's text, enough to give users a reasonable preview.
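
A minimal sketch of setting both limits on a statement handle, using the values from the examples above:

// Fail-safe: no query on this statement retrieves more than 10,000 rows.
rc = SQLSetStmtAttr (hstmt, SQL_ATTR_MAX_ROWS, (SQLPOINTER)10000, 0);

// Truncate character and binary column values at 150KB (153,600 bytes).
rc = SQLSetStmtAttr (hstmt, SQL_ATTR_MAX_LENGTH, (SQLPOINTER)153600, 0);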

Using Bound Columns
Data can be retrieved from the database using either the SQLBindCol function or the SQLGetData function. When SQLBindCol is called, it associates, or binds, a variable to a column in the result set. Nothing is sent to the database. SQLBindCol tells the driver to remember the addresses of the variables, which the driver will use to store the data when it is actually retrieved. When SQLFetch is executed, the driver places the data into the addresses of the variables specified by SQLBindCol. In contrast, SQLGetData returns data directly into variables. It's commonly called to retrieve long data, which often exceeds the length of a single buffer and must be retrieved in parts.

The following code uses the SQLGetData function to retrieve data:

rc = SQLExecDirect (hstmt, "SELECT <20 columns> "
           "FROM employees "
           "WHERE HireDate >= ?", SQL_NTS);
do {
rc = SQLFetch (hstmt);
// call SQLGetData 20 times
} while ((rc == SQL_SUCCESS) || (rc == SQL_SUCCESS_WITH_INFO));

If the query retrieves 90 result rows, 1,891 ODBC calls are made (20 calls to SQLGetData x 90 result rows + 91 calls to SQLFetch).

The following code uses the SQLBindCol function instead of SQLGetData:

rc = SQLExecDirect (hstmt, "SELECT <20 columns> "
           "FROM employees "
           "WHERE HireDate >= ?", SQL_NTS);
// call SQLBindCol 20 times
do {
rc = SQLFetch (hstmt);
} while ((rc == SQL_SUCCESS) || (rc == SQL_SUCCESS_WITH_INFO));

The number of ODBC calls is reduced from 1,891 to 111 (20 calls to SQLBindCol + 91 calls to SQLFetch). In addition to reducing the number of calls required, many drivers optimize how SQLBindCol is used by binding result information directly from the database into the user's buffer. That is, instead of the driver retrieving information into a container and then copying that information to the user's buffer, the driver requests that the information from the database be placed directly into the user's buffer.
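
For reference, one of the 20 SQLBindCol calls elided in the examples above might look like the following; the column position, type, and buffer are assumptions:

SQLCHAR lastName[64];
SQLLEN  lastNameLenOrInd;

// Bind result column 1 to a character buffer; on every SQLFetch the driver
// fills the buffer and the length/indicator value.
rc = SQLBindCol (hstmt, 1, SQL_C_CHAR, lastName,
     sizeof(lastName), &lastNameLenOrInd);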

Using SQLExtendedFetch Instead of SQLFetch
Most ODBC drivers now support SQLExtendedFetch for forward-only cursors. Yet, most ODBC applications continue to use SQLFetch to fetch data.

Again, consider the same example we used in the section, "Using Bound Columns," page 145, but using SQLExtendedFetch instead of SQLFetch:

rc = SQLSetStmtOption (hstmt, SQL_ROWSET_SIZE, 100);
// use arrays of 100 elements
rc = SQLExecDirect (hstmt, "SELECT <20 columns> "
           "FROM employees "
           "WHERE HireDate >= ?", SQL_NTS);
// call SQLBindCol 1 time specifying row-wise binding
do {
rc = SQLExtendedFetch (hstmt, SQL_FETCH_NEXT, 0,
     &RowsFetched, RowStatus);
} while ((rc == SQL_SUCCESS) || (rc == SQL_SUCCESS_WITH_INFO));

The number of ODBC calls made by the application has been reduced from 1,891 to 4 (1 SQLSetStmtOption + 1 SQLExecDirect + 1 SQLBindCol + 1 SQLExtendedFetch). Besides reducing the ODBC call load, some ODBC drivers can retrieve data from the server in arrays, further improving the performance by reducing network traffic.
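
The row-wise binding mentioned in the comment above could be sketched as follows. The struct layout and the single bound column are assumptions, and the ODBC 2.x SQL_BIND_TYPE option is used to match the SQLSetStmtOption/SQLExtendedFetch style of the example:

typedef struct {
   SQLCHAR name[64];        // result column 1
   SQLLEN  nameLenOrInd;
   // ... buffers for the remaining 19 columns ...
} EmployeeRow;

EmployeeRow rows[100];      // one element per row in the rowset

// Row-wise binding: each array element holds one complete row.
rc = SQLSetStmtOption (hstmt, SQL_BIND_TYPE, sizeof(EmployeeRow));

// One SQLBindCol per column; the driver steps through the array.
rc = SQLBindCol (hstmt, 1, SQL_C_CHAR, rows[0].name,
     sizeof(rows[0].name), &rows[0].nameLenOrInd);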

For ODBC drivers that do not support SQLExtendedFetch, your application can enable forward-only cursors using the ODBC cursor library by calling SQLSetConnectAttr. Using the cursor library won't improve performance, but neither will it hurt application response time when forward-only cursors are used, because no logging is required. For scrollable cursors, it's a different story (see "Using the Cursor Library," page 141). In addition, using the cursor library when SQLExtendedFetch is not supported natively by the driver simplifies the code because the application can always depend on SQLExtendedFetch being available. The application doesn't require two algorithms (one using SQLExtendedFetch and one using SQLFetch).

Determining the Number of Rows in a Result Set
ODBC defines two types of cursors:

  • Forward-only
  • Scrollable (static, keyset-driven, dynamic, and mixed)

Scrollable cursors let you go both forward and backward through a result set. However, because of limited support for server-side scrollable cursors in many database systems, ODBC drivers often emulate scrollable cursors, storing rows from a scrollable result set in a cache on the machine where the driver resides (client or application server).

Unless you are certain that the database natively supports using a scrollable result set, do not call the SQLExtendedFetch function to find out how many rows the result set contains. For drivers that emulate scrollable cursors, calling SQLExtendedFetch causes the driver to retrieve all results across the network to reach the last row. This emulated model of scrollable cursors provides flexibility for the developer but comes with a performance penalty until the client cache of rows is fully populated. Instead of calling SQLExtendedFetch to determine the number of rows, count the rows by iterating through the result set or obtain the number of rows by submitting a Select statement with the Count function. For example:

SELECT COUNT(*) FROM employees
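
A minimal sketch of obtaining the count through ODBC (the variable names are assumptions):

SQLINTEGER rowCount;
SQLLEN     lenOrInd;

rc = SQLExecDirect (hstmt, "SELECT COUNT(*) FROM employees", SQL_NTS);
rc = SQLBindCol (hstmt, 1, SQL_C_LONG, &rowCount, 0, &lenOrInd);
rc = SQLFetch (hstmt);   // rowCount now holds the number of rows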

Unfortunately, there's no easy way to tell if a database driver uses native server-side scrollable cursors or emulates this functionality. For Oracle or MySQL, you know the driver emulates scrollable cursors, but for other databases, it's more complicated. See "Using Scrollable Cursors," page 36, for details about which common databases support server-side scrollable cursors and how database drivers emulate scrollable cursors.

Choosing the Right Data Type
When designing your database schema, it's obvious that you need to think about the impact of storage requirements on the database server. Less obvious, but just as important, you need to think about the network traffic required to move data in its native format to and from the ODBC driver. Retrieving and sending certain data types across the network can increase or decrease network traffic.

See "Choosing the Right Data Type," page 34, for information about which data types are processed faster than others.

 

