Application timeout and retry logic implementation with ScaleArc
This knowledge base article describes client application features and behaviors that make the fullest use of ScaleArc's database availability features.
At its core, ScaleArc is a fully ACID-compliant, state-aware SQL proxy deployed between application servers and a SQL server environment. Clients access the database by directing connections to a cluster address configured on ScaleArc. The ScaleArc appliance in turn forwards the queries received on the client connections over separate connections to the database servers. Because of this position in the traffic flow between the application servers and the database servers, ScaleArc can improve both the performance and the availability of the application infrastructure.
Because ScaleArc acts as the database server from the client application's perspective, it can decide which database server in a cluster is the best choice to receive each client query. ScaleArc monitors not only the health of the individual database servers but also their response times for previous communications. When more than one server is available to process a query, response time is factored into the selection: a server that is responding faster is selected over one that is responding more slowly.
If the primary read/write server in a database cluster fails, ScaleArc queues incoming connections from the client application servers until either the failed server becomes available again or a backup server is promoted to primary. This behavior is intended to insulate your application servers from database service interruptions. To make full use of it, certain ScaleArc configuration options need to be set, such as the Idle Client Connection Timeout and Maximum Client Connections cluster settings.
|Idle Client Connection Timeout||Time to wait (in seconds) before closing an idle connection from a client (web server, application server, or MSSQL client). If you use a client-side connection pool (Java/Apache connection pooling), set this value higher than the client connection pool timeout. Default: 1200 seconds|
|Maximum Client Connections||Total number of concurrent client connections that can be opened.|
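The relationship between the two timeouts in the table can be checked before deployment. The following is a minimal sketch: the helper name `pool_timeout_is_safe` is hypothetical, and only the 1200-second default comes from the table above. The application's own pool idle timeout is assumed to be known in seconds.

```python
# Default Idle Client Connection Timeout from the table above, in seconds.
SCALEARC_IDLE_CLIENT_TIMEOUT_DEFAULT = 1200


def pool_timeout_is_safe(pool_idle_timeout_seconds,
                         scalearc_idle_timeout_seconds=SCALEARC_IDLE_CLIENT_TIMEOUT_DEFAULT):
    """Return True when the ScaleArc idle timeout exceeds the pool timeout.

    Hypothetical helper (not part of any ScaleArc tooling): it only encodes
    the rule that the ScaleArc setting should be larger than the client
    connection pool timeout.
    """
    return scalearc_idle_timeout_seconds > pool_idle_timeout_seconds
```

For example, a pool that retires idle connections after 600 seconds passes the check against the default, while a pool timeout of 1200 seconds or more would call for raising the ScaleArc setting.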
In addition to the ScaleArc configuration settings, certain application behaviors can be implemented to further increase the resilience of the end-to-end system. Connection retry logic is a good example. Typically, applications rely on the underlying operating system to establish and control the TCP connections between the application server and the database. If a TCP connection is disrupted, the operating system reports an error to the application that was using the connection, indicating that it is no longer available. If the application does not handle that error, it will often fail, and this type of failure is often visible to end users. One way to mitigate the impact of connection failures, or even insulate end users from them, is to capture the error and handle it appropriately. A further step toward higher application resiliency is to implement logic in the application code to retry the connection to the database. Retrying the connection to the database/ScaleArc can add delay before the end user receives the requested information, but this is often preferable to an error message. Some applications may be able to notify the end user of the expected delay.
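The capture-and-retry approach described above can be sketched as follows. This is a minimal illustration, not a ScaleArc API: `query_with_retry` and `run_query` are hypothetical names, and `run_query` is assumed to be a function that connects to the ScaleArc cluster address, runs a query, and raises `OSError` (the base class of Python's `ConnectionError` family) when the connection is lost.

```python
import time


def query_with_retry(run_query, attempts=3, delay_seconds=2.0, sleep=time.sleep):
    """Run run_query(); if the connection fails, wait and try again.

    run_query is a hypothetical zero-argument callable supplied by the
    application. Returns (result, None) on success, or (None, message)
    once all attempts have failed, so the caller can show a friendly
    message instead of crashing.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return run_query(), None
        except OSError as error:  # covers ConnectionError and its subclasses
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)  # fixed pause before the next attempt
    return None, f"Database unavailable after {attempts} attempts: {last_error}"
```

Returning a message rather than raising keeps the failure under the application's control; an application that prefers exceptions could re-raise the last error instead.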
If connection retry logic is in place, other considerations may need to be addressed to avoid exacerbating the effect of a database failure. For instance, if there is a long delay in either promoting a standby database server to the primary role or recovering the failed primary server, the retry logic could cause floods of new connection traffic. These floods could stress network capacity, or even ScaleArc's overall performance if it is already heavily loaded. Two extensions to the retry logic help avoid this scenario: a maximum number of retries before the application simply fails, and a random time delay between retries. The maximum number of retries prevents infinite loops. The random delay spreads the new connection requests over time, which in effect prevents floods of traffic at any one moment.
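A retry loop with both safeguards, a retry cap and a random delay, might look like the sketch below. The function name, parameter names, and default values are illustrative assumptions; `operation` stands for any hypothetical callable that talks to the ScaleArc cluster address and raises `OSError` on a connection failure.

```python
import random
import time


def run_with_jittered_retries(operation, max_retries=5, min_delay=0.5,
                              max_delay=3.0, sleep=time.sleep, rng=random):
    """Call operation(), retrying at most max_retries times on OSError.

    Each retry is preceded by a random pause between min_delay and
    max_delay seconds, so that many clients failing at the same moment
    do not all reconnect at the same moment.
    """
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except OSError:
            if attempt == max_retries:
                raise  # retry budget exhausted: fail rather than loop forever
            sleep(rng.uniform(min_delay, max_delay))
```

The `sleep` and `rng` parameters exist only so the behavior can be tested without real waiting; production callers would normally leave them at their defaults.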
With application retry logic in place, it becomes possible to perform simple maintenance on an entire cluster of database servers without impact to the application beyond an increase in response time. As an example, suppose a small schema change is needed, and testing has determined that it can be applied in 45 seconds. With ScaleArc in place, traffic to the database servers can be quiesced, the entire database cluster marked offline, the schema update applied, and the databases brought back online without impacting the client application. ScaleArc will queue the incoming connections and wait for the primary database server to become available again. If a client connection does time out during the schema update, the application retry logic allows additional time for the database to become available before the end user sees an error.
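Whether a given retry policy covers the 45-second window in this example can be estimated with simple arithmetic. The sketch below assumes that every attempt waits its full client-side timeout before failing (because ScaleArc holds the connection queued) and that each retry is preceded by at least `min_delay` seconds. The function name and the timeout and retry values are hypothetical; only the 45-second figure comes from the example above.

```python
def minimum_wait_budget(connect_timeout, max_retries, min_delay):
    """Shortest total time (seconds) the client waits before giving up.

    Assumes each of the (max_retries + 1) attempts blocks for
    connect_timeout seconds and each retry adds at least min_delay.
    """
    attempts = max_retries + 1
    return attempts * connect_timeout + max_retries * min_delay


MAINTENANCE_WINDOW_SECONDS = 45.0  # schema change duration from the example

# Hypothetical policy: 10-second timeout, 4 retries, at least 0.5 s between them.
budget = minimum_wait_budget(connect_timeout=10.0, max_retries=4, min_delay=0.5)
covers_window = budget >= MAINTENANCE_WINDOW_SECONDS
```

With these assumed values the budget is 5 × 10 + 4 × 0.5 = 52 seconds, which exceeds the 45-second window; a policy with only two retries would give 31 seconds and would not.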
If you are experiencing issues with ScaleArc or with any of its features, please contact ScaleArc Support. We are available 24x7 by phone at 855 800 7225 or +1 408 412 7315. For general support inquiries, you can also e-mail us at firstname.lastname@example.org.