The Data Access Handbook
This chapter from The Data Access Handbook discusses the strategy that goes into creating a database architecture, including how to make database applications, tools and configurations work together to reach the highest possible performance level. In this section, you'll learn how database applications affect performance, why database connections are expensive and how connection pooling works.
Table of Contents
Designing for performance: Strategic database application deployments
An introduction to database transaction management
Executing SQL statements using prepared statements and statement pooling
Database access security: network authentication or data encryption?
Static SQL vs. dynamic SQL for database application performance
Designing for Performance: What's Your Strategy?
Designing your database application and the configuration of the database middleware that connects your application to the database server for optimal performance isn't easy. We refer to all these components as your database application deployment. There is no onesize-fits-all design. You must think about every component to get the best performance possible.
Often you are not in control of every component that affects performance. For example, your company may dictate that all applications run on an application server. Also, your database administrator most likely controls the database server machine's configuration. In these cases, you need to consider the configurations that are dictated when designing your database application deployment. For example, if you know that the applications will reside on an application server, you probably want to spend ample time planning for connection and statement pooling, which are both discussed in this chapter.
Many software architects and developers don't think that the design of their database applications impacts the performance of those applications. This is not true; application design is a key factor. An application is often coded to establish a new connection to gather information about the database, such as supported data types or database version. Avoid establishing additional connections for this purpose because connections are performance-expensive, as we explain in this chapter.
This section explores several key functional areas of applications that you need to consider to achieve maximum performance:
- Database connections
- SQL statements
- Data retrieval
Some functional areas of applications, such as data encryption, affect performance, but you can do little about the performance impact. We discuss these areas and provide information about the performance impact you can expect.
When you make good application design decisions, you can improve performance by doing the following:
- Reducing network traffic
- Limiting disk I/O
- Optimizing application-to-driver interaction
- Simplifying queries
For API-specific code examples and discussions, you should also read the chapter for the standards-based API that you work with:
- For ODBC users, see Chapter 5, "ODBC Applications:Writing Good Code."
- For JDBC users, see Chapter 6, "JDBC Applications:Writing Good Code."
- For ADO.NET users, see Chapter 7, ".NET Applications: Writing Good Code."
The way you implement database connections may be the most important design decision you make for your application.
Your choices for implementing connections are as follows:
- Obtain a connection from a connection pool. Read the section, "Using Connection Pooling," page 12.
- Create a new connection one at a time as needed. Read the section, "Creating a New Connection One at a Time as Needed," page 16.
The right choice mainly depends on the CPU and memory conditions on the database server, as we explain throughout this section.
Facts About Connections
Before we discuss how to make this decision, here are some important facts about connections:
- Creating a connection is performance-expensive compared to all other tasks a database application can perform.
- Open connections use a substantial amount of memory on both the database server and database client machines.
- Establishing a connection takes multiple network round trips to and from the database server.
- Opening numerous connections can contribute to out-of-memory conditions, which might cause paging of memory to disk and, thus, overall performance degradation.
- In today's architectures, many applications are deployed in connection pooled environments, which are intended to improve performance. However, many times poorly tuned connection pooling can result in performance degradation. Connection pools can be difficult to design, tune, and monitor.
Why Connections Are Performance-Expensive
Developers often assume that establishing a connection is a simple request that results in the driver making a single network round trip to the database server to initialize a user. In reality, a connection typically involves many network round trips between the driver and the database server. For example, when a driver connects to Oracle or Sybase, that connection may take anywhere from seven to ten network round trips to perform the following actions:
- Validate the user's credentials.
- Negotiate code page settings between what the database driver expects and what the database has available, if necessary.
- Get database version information.
- Establish the optimal database protocol packet size to be used for communication.
- Set session settings.
In addition, the database management system establishes resources on behalf of the connection, which involves performance-expensive disk I/O and memory allocation.
You might be thinking that you can eliminate network round trips if you place your applications on the same machine as the database system. This is, in most cases, not realistic because of the complexity of real-world enterprises—many, many applications accessing many database systems with applications running on several application servers. In addition, the server on which the database system runs must be well tuned for the database system, not for many different applications. Even if one machine would fit the bill, would you really want a single point of failure?
Using Connection Pooling
A connection pool is a cache of physical database connections that one or more applications can reuse. Connection pooling can provide significant performance gains because reusing a connection reduces the overhead associated with establishing a physical connection. The caveat here is that your database server must have enough memory to manage all the connections in the pool.
In this book, we discuss client-side connection pooling (connection pooling provided by database drivers and application servers), not database-side connection pooling (connection pooling provided by database management systems). Some database management systems provide connection pooling, and those implementations work in conjunction with client-side connection pooling. Although specific characteristics of database-side connection pooling vary, the overall goal is to eliminate the overhead on the database server of establishing and removing connections. Unlike client-side connection pooling, database-side connection pooling does not optimize network round trips to the application. As we stated previously, connecting to a database is performance-expensive because of the resource allocation in the database driver (network round trips between the driver and the database), and the resource allocation on the database server. Client-side connection pooling helps solve the issue of expensive resource allocation for both the database driver and database server. Database-side connection pooling only helps solve the issue on the database server.
How Connection Pooling Works
In a pooled environment, once the initial physical connection is established, it will likely not be closed for the life of the environment. That is, when an application disconnects, the physical connection is not closed; instead, it is placed in the pool for reuse. Therefore, re-establishing the connection becomes one of the fastest operations instead of one of the slowest.
Here is a basic overview of how connection pooling works (as shown in Figure 2-1):
- When the application or application server is started, the connection pool is typically populated with connections.
- An application makes a connection request.
- Either the driver or the Connection Pool Manager (depending on your architecture) assigns one of the pooled connections to the application instead of requesting that a new connection be established. This means that no network round trips occur between the driver and the database server for connection requests because a connection is available in the pool. The result: Your connection request is fast.
- The application is connected to the database.
- When the connection is closed, it is placed back into the pool.
Figure 2-1 Connection pooling
Guidelines for Connection Pooling
Here are some general guidelines for using connection pooling. For details about different connection pooling models, see Chapter 8, "Connection Pooling and Statement Pooling."
- A perfect scenario for using connection pooling is when your applications reside on an application server, which implies multiple users using the applications.
- Consider using connection pooling if your application has multiple users and your database server has enough memory to manage the maximum number of connections that will be in the pool at any given time. In most connection pooling models, it is easy to calculate the maximum number of connections that will be in a pool because the connection pool implementation allows you to configure the maximum. If the implementation you are using does not support configuring the maximum number of connections in a pool, you must calculate how many connections will be in the pool during peak times to determine if your database server can handle the load.
- Determine whether the number of database licenses you have accommodates a connection pool. If you have limited licenses, answer the following questions to determine if you have enough licenses to support a connection pool:
a. Will other applications use database licenses? If yes, take this into account when calculating how many licenses you need for your connection pool.
b. Are you using a database that uses a streaming protocol, such as Sybase, Microsoft SQL Server, or MySQL? If yes, you may be using more database connections than you think. In streaming protocol databases, only one request can be processed at a time over a single connection; the other requests on the same connection must wait for the preceding request to complete before a subsequent request can be processed. Therefore, some database driver implementations duplicate connections (establish another connection) when multiple requests are sent over a single connection so that all requests can be processed in a timely manner.
- When you develop your application to use connection pooling, open connections just before the application needs them. Opening them earlier than necessary decreases the number of connections available to other users and can increase the demand for resources. Don't forget to close them when the database work is complete so that the connection can return to the pool for reuse.
When Not to Use Connection Pooling
Some applications are not good candidates for using connection pooling. If your applications have any of the following characteristics, you probably don't want to use connection pooling. In fact, for these types of applications, connection pooling may degrade performance.
- Applications that restart numerous times daily—This typically applies only to architectures that are not using an application server. Depending on the configuration of the connection pool, it may be populated with connections each time the application is started, which causes a performance penalty up front.
- Single-user applications, such as report writers—If your application only needs to establish a connection for a single user who runs reports two to three times daily, the memory usage on the database server associated with a connection pool degrades performance more than establishing the connection two or three times daily.
- Applications that run single-user batch jobs, such as end-of-day/week/month reporting—Connection pooling provides no advantage for batch jobs that access only one database server, which typically equates to only one connection. Furthermore, batch jobs are usually run during off hours when performance is not as much of a concern.
Creating a New Connection One at a Time as Needed
When you create a new connection one at a time as needed, you can design your application to create either of the following:
- One connection for each statement to be executed
- One connection for multiple statements, which is often referred to as using multiple threads
Figure 2-2 compares these two connection models.
Figure 2-2 Comparing two connection models
The advantage of using one connection for each statement is that each statement can access the database at the same time. The disadvantage is the overhead of establishing multiple connections.
The advantages and disadvantages of using one connection for multiple statements are explained later in this section.
One Connection for Multiple Statements
Before we can explain the details of one connection for multiple statements, we need to define statement. Some people equate "statement" to "SQL statement." We like the definition of "statement" that is found in the Microsoft ODBC 3.0 Programmer's Reference:
A statement is most easily thought of as an SQL statement, such as SELECT * FROM Employee.However, a statement is more than just an SQL statement— it consists of all of the information associated with that SQL statement, such as any result sets created by the statement and parameters used in the execution of the statement. A statement does not even need to have an application defined SQL statement. For example, when a catalog function such asSQLTables is executed on a statement, it executes a predefined SQL statement that returns a list of table names. (Source: Microsoft ODBC 3.0 Programmer's Reference and SDK Guide, Volume I. Redmond: Microsoft Press,1997)
To summarize, a statement is not only the request sent to the database but the result of the request.
How One Connection for Multiple Statements Works
When you develop your application to use one connection for multiple statements, an application may have to wait for a connection. To understand why, you must understand how one connection for multiple statements works; this depends on the protocol of the database system you are using: streaming or cursor based. Sybase, Microsoft SQL Server, and MySQL are examples of streaming protocol databases. Oracle and DB2 are examples of cursor-based protocol databases.
Streaming protocol database systems process the query and send results until there are no more results to send; the database is uninterruptable. Therefore, the network connection is "busy" until all results are returned (fetched) to the application.
Cursor-based protocol database systems assign a database server-side "name" (cursor) to a SQL statement. The server operates on that cursor in incremental time segments. The driver tells the database server when to work and how much information to return. Several cursors can use the network connection, each working in small slices of time.
Let's look at the case where your SQL statement creates result sets and your application is accessing a streaming protocol database. In this case, the connection is unavailable to process another SQL statement until the first statement is executed and all results are returned to the application. The time this takes depends on the size of the result set. Figure 2-3 shows an example.
Figure 2-3 Streaming protocol result sets
Let's look at the case where the SQL statement updates the database and your application is accessing a streaming protocol database, as shown in Figure 2-4. The connection is available as soon as the statement is executed and the row count is returned to the application.
Figure 2-4 Streaming protocol updates
Last, let's look at the case where your SQL statement creates result sets and your application is accessing a cursor-based protocol database. Unlike Example A, which is a streaming protocol example, the connection is available before all the results are returned to the application. When using cursor-based protocol databases, the result sets are returned as the driver asks for them. Figure 2-5 shows an example.
Figure 2-5 Cursor-based protocol/result sets
More on accessing databases:
- Read the next section — An introduction to database transaction management.
- Intrigued by this book excerpt? Download a free PDF of this chapter: Chapter 2 —Designing for Performance: What's Your Strategy?
- Read more excerpts and download more sample chapters from our Data Management bookshelf
- To purchase the book or similar titles, visit InformIT