Wednesday, October 15, 2008

Technical - Database Testing

Database Testing
Abhilash Gopi

Database Testing?

• It is done as part of a detailed test activity for the Database and its interaction with the Application.
• The testing is done to ensure that right data is being input through the User Interface, and the right data is being retrieved from the database and displayed in the relevant reports.

Essence of Database Testing

• You have a SQL database, some stored procedures, and a layer of code sitting
between your application and the database.
• How can you put tests in place to make sure your code really is reading and writing the right data from the database?

An Approach

# Some of us may do a test as follows
• set up database,
• write code to access database,
• run code,
• do a SELECT to see if the records showed up in the database.

If the query shows up some records in the database, they are done.

Hmmm, Well, Does this really scrutinize the database operations and verify the same in detail. Absolutely NOT.


Visual Inspection in DB testing

• What has been done right now is a simple VISUAL INSPECTION.
• The problem with visual inspection is that the tester/ developer does not do it often.
• Also it may be possible that when the tester/ developer makes changes to a system, and maybe months later, the code is broken & some data will go missing.
• Most of us may not spend much time checking the data itself, so it may take a while for this mistake to surface.

How do you start?
• In order to go about a Database Test, it is advisable to have a good test set.
• What is a good test set?
• A good test set is self-sufficient and creates all the data it needs.
• Testing can be simplified if you can get the database in a known state before a test is run.
• One way to do this is to have a separate unit test database
• The Test cases should be designed in such a way that the tester/ developer can start with the database in the initial state and then proceed to test from that state onwards.
Eg.
test cases to clean out the database before starting any tests.

An Approach

• One of the things to be taken care of while testing a database is to have more than one database.
• These database can be run on one server if necessary, or synonyms can be created for efficient testing.

Types of Databases.

# Types of databases (required for testing)
1• The production database.
• Contains Live data. No testing is done on this database.
2• Your local development database.
• Most which is the testing is carried out on this database.
3• A populated development database.
• A database shared by all developers so that they can run the application and see it work with realistic amounts of data, rather than the hand full of records the testers have in their test database.

• This may not be a mandatory database, but is used by most Projects to ensure that the application code work with lots of data (i.e., a copy of the production
database's data).

4. A deployment database.
• Also called and Integration Database, where the tests are run prior to deployment to make sure that local database changes have been applied.
• Also ensure that any database structure or stored procedure changes have been made to the production database before you go live with your code.

Concepts of a Database

• The most simplest element of a database is a cell.
• The cell contents represent an attribute, when considered on a broad perspective.
• Collection of related attributes is called a ROW.
• Collection of different attributes values, and representing an attribute is called as a column.
• Collection of rows is called a table.
• Collection of tables is called a DATABASE.

Database Normalization

• Normalization is the process of removing redundant data from your tables in order to improve storage efficiency, data integrity and scalability.
• This improvement is balanced against an increase in complexity and potential
performance losses from the joining of the normalized tables at query-time.

Database Normalization


# Consider a Bookshop storing the following information
• Title
• Author
• Author Biography
• ISBN
• Price
• Subject
• Number of Pages
• Publisher
• Description
• Review
• Reviewer Name

Database Normalization

• Suppose it contains records pertaining to 2 authors, and we assume that each name is 20 characters long i.e. 20 bytes; then it is found that it will contain 20 x 10 = 200 byes for storing 10 records of each author.
• It will also fail DATA INTEGRITY, since to repeat their names, somebody might make a mistake.

Database Normalization

• Imagine this scenario occurs for only 2 authors. What if, we had few 1000 authors.
• Hence Normalization is necessary.

What is Normalization

• Normalization is the process of removing redundant data from your tables in order to improve storage efficiency, data integrity and scalability.

Normal Forms

• We can have 5 normal forms.
• The 5 normal forms are rarely implemented.
• All Databases are normalized only up to the 3rd Normal Form.
• The drawback of further Normalization is the generation / creation of complex query
structure.



First Normal Form



• Remove redundant data from the horizontal rows.
• This ensures that there is no duplication of data in a given row, and every column stores the least amount of information possible (i.e. atomic data).

First Normal Form

• The Book Information given below is organized as

First Normal Form




Relationship

• Once this breakup is achieved, Now it is time to create a relationship.
• Relationship exists between two tables
- One to One
- One to Many
- Many to Many

Relationship

• The relationship between the Book table and the Author table is a many-to-many
relationship: A book can have more than one author, and an author can write more than
one book.
• To represent a many-to-many relationship in a relational database we need a third table to serve as a link between the two:

First Normal Form -Relationship

Second Normal Form





• Where the First Normal form deals with redundancy of data across a horizontal row,
Second Normal Form (2NF) deals with redundancy of data in vertical columns.




Third Normal Form

• Identify and separate data that is not fully dependent on the primary key, but dependent on another value in the table.

Eg. In the REVIEWER table, the Reviewer_Street and Reviewer_City fields
are really dependent on the Reviewer_PostalCode and not on the
reviewer_ID. Hence we may do the following.



Fourth Normal Form

• Where non-key attributes depend on the key column exclusive of other non-key columns are eliminated.

Fifth Normal Form

• 5NF indicates when an entity cannot be further decomposed.
• 5NF is complex and not intuitive.
• In simple words the 5NF says “After a table has been decomposed into 3 or smaller
tables, must be capable of being joined again on common keys to form the original table”.

Benefits of Normalization


• Greater overall database organization • Reduction of redundant data
• Data consistency within the database
• A much more flexible database design
• A better handle on database security

Drawbacks of Normalization

• reduced database performance.
• Impact on CPU usage, memory usage, and input/output (I/O).

User related Database Design
considerations


• What data should be stored in the database?
• How will the user access the database?
• What privileges does the user require?
• How should the data be grouped in the database?
• What data is the most commonly accessed?
• How is all data related in the database?
• What measures should be taken to ensure accurate data?


SUB QUERY

• It is possible to embed a SQL statement within another. When this is done on the WHERE or the HAVING statements, we have a subquery construct. What is subquery useful for?
• It can also be used to join tables. Also, there are cases where the only way to correlate two tables is through a subquery.
• The syntax is as follows:
SELECT "column_name1"
FROM "table_name"
WHERE "column_name2" [Comparison Operator]
(SELECT "column_name1"
FROM "table_name"
WHERE [Condition])


INSERT clause


• This statement is used to insert a row of data in a table. All inserted values are enclosed using single quote strings. The syntax of this command is:

insert into tablename
(column1name,column2name...columnxname)
values (value1,value2...valuex);

Example
insert into citylist
(name, state, population, zipcode)
values ('Abhi', ‘Kerala', ‘11400000', ‘682 020');

Advanced Concepts of Normalization

• Database Dependency can be put forth with respect to two aspects, (a) Functional and (b) Multivalued dependencies.
• These express the presence of structure in database relations, that can be utilized in the database design process.
• The discovery of database dependencies can be viewed as an induction problem, in which general rules (dependencies) are obtained from specific facts (relation).

Functional Dependency


# Functional dependency
• Why is one relational schema better than the other?
• Informal design guidelines
• Semantics of the attributes : design a relational schema so that it is easy to explain its meaning.
• Reduce redundant values in tuples: Minimize storage space
• Reduce the null value in tuples: Nulls should apply in exceptional cases only.
• Avoid generating spurious tuples. Design relations that can be joined with equality conditions on attributes that are primary or foreign keys

Functional Dependency

• Definition (Functional Dependency) :
• A functional dependency, denoted by X-> Y between two sets of attributes X and Y that are subsets of the attributes of relation R, specifies that the values in a tuple corresponding to the attributes in Y are uniquely determined by the
values corresponding to the attributes in X.
Eg. The Emp Number uniquely determines a name EMPID -> Name


Functional Dependency

• Functional dependencies are determined by the semantics of the relation, in general they cannot be determined by inspection of an instance of the relation. That is, a functional dependency is a constraint and not a property derived from a relation.

KEYS

# Keys
• We say that a set of one or more attributes (A_1,…., A_n) is a key for a relation R if
• Those attributes functionally determine all other attributes of the relation
• No proper subset of those attributes functionally determines all other attributes of R . i.e. a key is minimal.
• A set of attributes that contains a key is called a superkey. Thus every key is a superkey but not every superkey is minimal.
• If a relation has more than one key, one of the keys is designated as the primary key.

Database Designing (Advanced Concepts)


• The Schema
• Lending-schema = (bname, assets, bcity, loan#, cname, amount) Is a bad design
• The set of functional dependencies we required to hold on this schema is

Bname -> assets, bcity
Loan# -> amount, bname

Database Designing (Advanced Concepts)

• if we decompose it into
Branch-schema = (bname, assets, bcity)
Loan-info-schema = (bname, loan#, amount)
Borrow-schema = (cname, loan#)
• We can say that this decomposition has several desirable properties.
• Lossless –join decomposition
• Dependency preservation

Database Designing (Advanced Concepts)


• First we decompose Lending-schema into
• Branch-schema = (bname, bcity, assets)
• Loan-info-schema = (bname, cname, loan#, amount)
• Since bname assets bcity, the augmentation rule for functional dependencies implies that
• bname bname assets bcity
• Since Branch-schema Borrow-schema = bname, our decomposition is lossless join.

Database Designing (Advanced Concepts)


• Next we decompose Borrow-schema into
• Loan-schema = (bname, loan#, amount)
• Borrow-schema = (cname, loan#)
• As loan# is the common attribute, and
• loan# amount bname

This is also a lossless-join decomposition.


Comparison – BCNF and 3NF

• Comparison of BCNF and 3NF
• We have seen BCNF and 3NF.
• It is always possible to obtain a 3NF design without sacrificing lossless-join or dependency-preservation.
• If we do not eliminate all transitive dependencies, we may need to use null values to represent some of the meaningful relationships.
• Repetition of information occurs.
• These problems can be illustrated with Banker-schema.
• As banker-name bname , we may want to express relationships between a banker and his or her branch.

• The table above shows how we must either have a corresponding value for customer name, or include a null.
• Repetition of information also occurs.
• Every occurrence of the banker's name must be accompanied by the branch name.

• If we must choose between BCNF and dependency preservation, it is generally better to opt for 3NF.
• If we cannot check for dependency preservation efficiently, we either pay a high price in system performance or risk the integrity of the data.
• The limited amount of redundancy in 3NF is then a lesser evil.

# To summarize, our goal for a relational database design is
• BCNF.
• Lossless-join.
• Dependency-preservation.

# If we cannot achieve this, we accept
• 3NF
• Lossless-join.
• Dependency-preservation.

Multivalued Dependency

• When designing a database using the E-R modelling technique, a common problem that
arises is that of an entity having multivalued attributes.
• The multivalued dependency relates to this problem when more than one multivalued
attributes exist.

Multivalued Dependency

• Consider the following relation that represents an entity employee that has one
mutlivalued attribute proj: emp (e#, dept, salary, proj)

• For example, e# -> dept implies only one dept value for each value of e#.

Multivalued Dependency

• Not all information in a database is single-valued.
• Eg: proj in an employee relation may be the list of all projects that the employee is currently working on.
• Although e# determines the list of all projects that an employee is working on, e# -> proj is not a functional dependency.


Multivalued Dependency

• The fourth and fifth normal forms deal with multivalued dependencies.
• Before discussing the 4NF and 5NF we discuss the following example to illustrate the concept of multivalued dependency programmer (emp_name, qualifications, languages)


Multivalued Dependency

• The relation includes two multivalued attributes of entity programmer; qualifications and languages.

Multivalued Dependency

• The above relation is therefore in 3NF (even in BCNF) but it still has some disadvantages.
• Suppose a programmer has several qualifications (B.Sc, Dip. Comp. Sc, etc) and
is proficient in several programming languages; how should this information be
represented?


Multivalued Dependency

• Note that there is no relationship between qualifications and programming languages.
• All these variations have some disadvantages.
• If the information is repeated we face the same problems of repeated information and anomalies as we did when second or third normal form conditions are violated.
• If there is no repetition, there are still some difficulties with search, insertions and deletions.

Multivalued Dependency



• The basis of the above decomposition is the concept of multivalued dependency (MVD).
• Functional dependency A ->B relates one value of A to one value of B while multivalued dependency A ->> B defines a relationship in which a set of values of attribute B are determined by a single value of A.



Thank you,

No comments:

Post a Comment

Drop in your comments/ feedback

Calorie Calculator

Calculate how much you expend in 1 hour of your favorite exercise. Health Tips.
Powered By Blogger

Followers

Images - Decision tables

Images - Decision tables
Important image details for the Decision tables

Risk Management

Risk Management
Risk Management