Home > Guidelines for managing data updates to optimize ODBC performance
Chapter Download:
EMAIL THIS

Guidelines for managing data updates to optimize ODBC performance

17 Sep 2009 | Written by: John Goodson and Robert A. Steward

Tips, expert advice and sample chapters
Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google

Data Access Handbook cover
This section from The Data Access Handbook will provide you with best practices and guidelines for updating data and managing your updates efficiently to optimize ODBC performance. Read a definition of pseudo-columns and learn about their role in SQL and ODBC functions. Find out how to minimize the slow retrieving catalog functions in your ODBC application and how to use a dummy query to determine table characteristics.

Table of Contents

Designing for performance: Strategic database application deployments
An introduction to database transaction management
Executing SQL statements using prepared statements and statement pooling
Database access security: network authentication or data encryption?
Static SQL vs. dynamic SQL for database application performance
Improving ODBC application performance and coding
Selecting ODBC functions for optimized SQL statements
Guidelines for managing data updates to optimize ODBC performance

Copyright Info
The Data Access Handbook by John Goodson and Robert A. Steward
ISBN 0137143931
First Printing March 2009
Prentice Hall Professional
Updating Data

Use the guidelines in this section to manage your updates more efficiently.

Using SQLSpecialColumns to Optimize Updates and Deletes
Many databases have hidden columns, named pseudo-columns, that represent a unique key associated with every row in a table. Typically, pseudo-columns in a SQL statement provide the fastest way to access a row because they usually point to the exact location of the physical record.

Performance Tip
Use SQLSpecialColumns to identify the most optimal columns, typically pseudo-columns, to use in the Where clause for updating data.
Some applications, such as an application that forms a Where clause consisting of a subset of the column values retrieved in the result set, cannot be designed to take advantage of positioned updates and deletes. Some applications may formulate the Where clause by using searchable result columns or by calling SQLStatistics to find columns that may be part of a unique index. These methods usually work but can result in fairly complex queries. For example:

rc = SQLExecDirect (hstmt, "SELECT first_name, last_name," +
           "ssn, address, city, state, zip" +
           "FROM employees", SQL_NTS);
// fetch data using complex query

...
rc = SQLExecDirect (hstmt, "UPDATE employees SET address = ?" +
           "WHERE first_name = ? AND last_name = ?" +
           "AND ssn = ? AND address = ? AND city = ? AND" +
           "state = ? AND zip = ?", SQL_NTS);

Many databases support pseudo-columns that are not explicitly defined in the table definition but are hidden columns of every table (for example, ROWID for Oracle). Because pseudo-columns are not part of the explicit table definition, they're not retrieved when SQLColumns is called. To determine if pseudo-columns exist, your application must call SQLSpecialColumns. For example:

...
rc = SQLSpecialColumns (hstmt, SQL_BEST_ROWID, ...);
...
rc = SQLExecDirect (hstmt, "SELECT first_name, last_name," +
           "ssn, address, city, state, zip," +
           "ROWID FROM employees", SQL_NTS);
// fetch data and probably "hide" ROWID from the user
...
rc = SQLExecDirect (hstmt, "UPDATE employees SET address = ?" +
           "WHERE ROWID = ?", SQL_NTS);
// fastest access to the data!

If your data source doesn't contain pseudo-columns, the result set of SQLSpecialColumns consists of the columns of the most optimal unique index on the specified table (if a unique index exists). Therefore, your application doesn't need to call SQLStatistics to find the smallest unique index.

Using Catalog Functions

Performance Tip
Although it's almost impossible to write an ODBC application without using a catalog function, you can improve performance by minimizing their use.
Catalog functions retrieve information about a result set, such as the number and type of columns. Because catalog functions are slow compared to other ODBC functions, using them frequently can impair performance. Use the guidelines in this section to optimize performance when selecting and using catalog functions.

Minimizing the Use of Catalog Functions
Compared to other ODBC functions, catalog functions that generate result sets are slow. To retrieve all result column information mandated by the ODBC specification, an ODBC driver often must perform multiple or complex queries to retrieve the result set for a single call to a catalog function.

In addition to avoid executing catalog functions multiple times, you should cache information retrieved from result sets generated by catalog functions. For example, call SQLGetTypeInfo once, and cache the elements of the result set that your application depends on. It's unlikely that any application will use all elements of the result set generated by a catalog function, so the cache of information shouldn't be difficult to maintain.

Performance Tip
Always supply as many non-null arguments as possible to result sets that generate catalog functions.
Avoiding Search Patterns
Catalog functions support arguments that can limit the amount of data retrieved. Using null values or search patterns, such as %A%, for these arguments often generates time-consuming queries. In addition, network traffic can increase because of unnecessary results.

In the following example, an application uses the SQLTables function to determine whether the table named WSTable exists and provides null values for most of the arguments:

rc = SQLTables(hstmt, null, 0, null, 0, "WSTable",
           SQL_NTS, null, 0);

The driver interprets the request as follows: Retrieve all tables, views, system tables, synonyms, temporary tables, and aliases named WSTable that exist in any database schema in the database catalog.

In contrast, the following request provides non-null values for all arguments, allowing the driver to process the request more efficiently:

rc = SQLTables(hstmt, "cat1", SQL_NTS, "johng", SQL_NTS,
nbsp;          "WSTable", SQL_NTS, "Table", SQL_NTS);

Performance Tip
To determine characteristics about a database table, avoid using the SQLColumns function. Instead, use a dummy query inside a prepared statement that executes the SQLDescribeCol function. Only use the SQLColumns function when you cannot obtain the requested information from result set metadata (for example, using the table column default values).
The driver interprets the request as follows: Retrieve all tables in catalog "cat1" that are named "WSTable" and owned by "johng." No synonyms, views, system tables, aliases, or temporary tables are retrieved.

Sometimes little is known about the object that you are requesting information for. Any information that the application can provide the driver when calling catalog functions can result in improved performance and reliability.

Using a Dummy Query to Determine Table Characteristics
Sometimes you need information about columns in the database table, such as column names, column data types, and column precision and scale. For example, an application that allows users to choose which columns to select may need to request the names of each column in the database table.

The following examples show the benefit of using the SQLDescribeCol function over the SQLColumns function.



Example A: SQLColumns Function

A potentially complex query is prepared and executed, the result description information is formulated, the driver retrieves the result rows, and the application fetches the result. This method results in increased CPU use and network communication.

rc = SQLColumns (... "UnknownTable" ...);
// This call to SQLColumns will generate a query to the
// system catalogs... possibly a join which must be
// prepared, executed, and produce a result set.
rc = SQLBindCol (...);
rc = SQLExtendedFetch (...);
// user must retrieve N rows from the server
// N = # result columns of UnknownTable
// result column information has now been obtained



Example B: SQLDescribeCol Function

A simple query that retrieves result set information is prepared, but the query is not executed and result rows are not retrieved by the driver. Only information about the result set is retrieved (the same information retrieved by SQLColumns in Example A).

// prepare dummy query
rc = SQLPrepare (... "SELECT * FROM UnknownTable" +
     "WHERE 1 = 0" ...);
// query is never executed on the server - only prepared
rc = SQLNumResultCols (...);
for (irow = 1; irow <= NumColumns; irow++) {
     rc = SQLDescribeCol (...)
     // + optional calls to SQLColAttributes
     }
// result column information has now been obtained
// Note we also know the column ordering within the table!
// This information cannot be
// assumed from the SQLColumns example.

What if the database server, such as a Microsoft SQL Server server does not support prepared statements by default? The performance of Example A wouldn't change, but the performance of Example B would decrease slightly because the dummy query is evaluated in addition to being prepared. Because the Where clause of the query always evaluates to FALSE, the query generates no result rows and executes the statement without retrieving result rows. So, even with a slight decrease in performance, Example B still outperforms Example A.


Summary

The performance of ODBC applications can suffer if they fail to reduce network traffic, limit disk I/O, simplify queries, and optimize the interaction between the application and driver. Reducing network communication probably is the most important technique for improving performance. For example, when you need to update large amounts of data, using arrays of parameters rather than executing an Insert statement multiple times reduces the number of network round trips required to complete the operation.

Typically, creating a connection is the most performance-expensive task your application performs. Connection pooling can help you manage your connections efficiently, particularly if your application has numerous users. Regardless of whether your application uses connection pooling, make sure that your application closes connections immediately after the user is finished with them.

Making smart choices about how to handle transactions can also improve performance. For example, using manual commits instead of auto-commit mode provides better control over when work is committed. Similarly, if you don't need the protection of distributed transactions, using local transactions can improve performance.

Inefficient SQL queries slow the performance of ODBC applications. Some SQL queries don't filter data, causing the driver to retrieve unnecessary data. Your application pays a huge penalty in performance when that unnecessary data is long data, such as data stored as a Blob or Clob. Even well-formed SQL queries can be more or less effective depending on how they are executed. For example, using SQLExtendedFetch instead of SQLFetch and using SQLBindCol instead of SQLGetData reduces ODBC calls and improves performance.

More on accessing data:




Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google



RELATED CONTENT
Data warehouse strategy
Advantages and disadvantages of XML shredding
How to shred XML with the DB2 XMLTABLE function
Shredding XML docs into relational tables with annotated XML schemas
Examples of single and bulk XML shredding of XML documents
Definition of primary, super, foreign and candidate key in the DBMS
What are some emerging data warehouse and DBMS trends?
Improving ODBC application performance and coding
Selecting ODBC functions for optimized SQL statements
How to capture metadata information, ETL rules with CA Erwin Data Modeler
Designing for performance: Strategic database application deployments

Data management tutorials
Seven secrets to business intelligence (BI) success
How to design project management plan templates for large projects
Handling project management issues for a proprietary ownership conflict
Should we use traditional or agile software development processes?
Next steps for business intelligence: Video
Improving ODBC application performance and coding
Selecting ODBC functions for optimized SQL statements
Is it better to have a centralized or decentralized master data structure?
Can we leverage existing data quality tools for an MDM program?
Do we need business intelligence (BI) tools to be successful?

Enterprise data architecture best practices
Advantages and disadvantages of XML shredding
How to shred XML with the DB2 XMLTABLE function
Shredding XML docs into relational tables with annotated XML schemas
Teradata takes a logical approach to data warehousing appliances
Examples of single and bulk XML shredding of XML documents
What is the difference between a logical and physical warehouse design?
What are some emerging data warehouse and DBMS trends?
Teradata VP talks data warehouse appliances, reveals cloud and SSD plans
Selecting ODBC functions for optimized SQL statements
Data Warehouse Platforms Product Directory

RELATED RESOURCES
2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
Search Bitpipe.com for the latest white papers and business webcasts
Whatis.com, the online computer dictionary




Data Compliance Articles and Research: Data Privacy, Financial Data Management, Healthcare Data
About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Site Map




All Rights Reserved, Copyright 2005 - 2009, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts