Tuesday, January 12, 2010

SQL Query Optimization Tips

SQL Query Optimization Tips

The most important part and parcel of a Database management system is the SQL. It will differ from product to product. Given below are some tips on how to write a well optimized query and avoid performance bottlenecks.

Primary verification of the SQL.

1) The SQL query will need to be verified by two parameters

1. The SQL returns the appropriate results.
2. The SQL adequately addresses the issues of optimization and efficiency.

SQL processing.

When we write a query and process the same, we are carrying out other overhead operations to finally fetch the appropriate result. A typical query undergoes the process of (a) Parsing (b) Actual execution (c) and Fetch. When we tune/ optimize the SQL query, we are in fact finetuning the performance of one of these aspects. The result is faster time to generate the result and hence optimization.

Parsing includes

- Verification of query syntax

- Verification of objects

Execution includes

- The process of actual reads, writes in order to achieve the desired result.

Fetch includes

- Retrieval of the result rows and final display (such as ordered).

A very good method to counter the parsing time is to make extensive and judicious mix of Procedures, packages, views, functions and so on. Most RDBMS optimizers work on the principle of reducing the query complexity by arriving at the optimized way of execution. With increase in query complexity, it is found that the optimizer too takes a longer time to process the query.

Certain tips to work towards an optimized query.

1) Practically it is always feasible to tune the query after initial verification of the validity and working of the query.

2) Ensure that the SQL statements are written in a uniform and identical manner throughout to aid re-use. The added advantage is the avoidance of re-parsing when the optimizer encounters a similar query.

3) When working with large tables, avoid the usage of SELECT * to increase efficiency. Make sure to use the necessary column names and query the table for these columns only.

1. E.g. IF there exists a table called EMPLOYEE, with around 36 fields, it is better to use the relevant columns in the SELECT clause instead of using SELECT *

SELECT emp_id, emp_name, emp_bonus

FROM employee

is faster than using

SELECT *

FROM employee

4) Make judicious usage of the GROUP BY and ORDER BY. Verify if we can have alternate solutions instead of using GROUP BY and ORDER BY.

5) Desist from making use of HAVING clause with the GROUP BY all the time. Use it only if it is unavoidable.

SELECT empname

FROM employee

WHERE empname = ‘Abhilash’

Is better than

SELECT empName

FROM employee

GROUP BY empname

HAVING empname = ‘Abhilash’

6) Let us compare the following two queries.

SELECT emp_ID, empname, salary

WHERE salary > 0

And

SELECT emp_ID, empname, salary

WHERE salary != 0

The query written as

SELECT emp_ID, empname, salary

WHERE salary > 0 is more optimized than the other and will return a low cost.

7) Usage of an appropriate join will be more advantages than using a sub-query due to parsing constraints.

8) Avoid queries with full table scan unless it is warranted for.

9) Judiciously use the Table JOINS, IN and EXISTS.

10) Using the IN in a subquery is faster than using other constructs

11) Whenever possible use a non-column expression. A non-column expression indicates having the FIELD COLUMN on one side of the equation/ operator and the values on the other side.

E.g

A condition to verify for the electric charge and its slab_rate can be written in the following form.

Using the expression

WHERE Elec_Charge < 750/(1 + slab_rate)

Will be always faster than using

WHERE Elec_Charge + (slab_rate * Elec_Charge) < 750

12) Using the IN with a subquery holds good while using EXISTS within the parent query works better.

13) Using IN within the parent query tends to be slow.

14) Using image/ objects within the table tends to slow down the query. Instead use the underlying file system to save these objects and refer to these objects by using a pointer stored in the database table.

15) Avoid having full table scans for very large tables.

16) Verify proper usage of the indexes in order to derive the best out of the database.

17) Using EXISTS instead of IN definitely works faster.

E.g.

SELECT title

FROM titles

WHERE EXISTS

(SELECT *

FROM publishers

WHERE pub_id = titles.pub_id

AND city LIKE ‘B%’)

is faster than using the query with the IN

SELECT title
FROM titles
WHERE pub_id IN
(SELECT pub_id
FROM publishers
WHERE city LIKE 'B%')

18) In cases when we encounter a NOT IN, try writing the same query using an OUTER JOIN. Definitely helps.

E.g.

SELECT Order_id from Order

WHERE Order_id NOT IN (SELECT Order_id from Sales)

Can be rewritten using

SELECT o.Order_id from Order o, sales s

WHERE o.Order_Id = s.Order_ID(+)

19) Avoid the usage of LIKE. Instead use the equality operation, which works faster.

Hope this holds good for you. Get back with your comments and suggestions. Thank you.

No comments:

Post a Comment

Drop in your comments/ feedback

Calorie Calculator

Calculate how much you expend in 1 hour of your favorite exercise. Health Tips.
Powered By Blogger

Followers

Images - Decision tables

Images - Decision tables
Important image details for the Decision tables

Risk Management

Risk Management
Risk Management