Washington Area Informix Users Group


January 1996 Newsletter

Volume 6, No. 1


Highlights of This Issue

Informix Invites You To View The Future of Technology

Announcing the International Informix Users Group, by Walt Hultgren

The Quality of Your SQL, by Gavin Nour


Forum 96 - Optimizing Your Informix Database and Applications

Forum 96 is a one-day event for programmers, database administrators, systems analysts, and managers who want to get the most out of their Informix applications and databases. Meet and network with other Informix users, see the newest technology, and learn key technical tips that will enable you to optimize your databases and applications. This is our third one-day Technical Forum. The past two Forums each drew over 200 participants, featured speakers offering many technical tips, and hosted exhibitors with the newest products. If you are using Informix 4GL, SE, OnLine 5.X, or moving to NewEra or OnLine 7.X, the Forum will have something for you.

Location:      Sheraton Premier at Tysons Corner

               8661 Leesburg Pike, Vienna, Virginia

Date:          Friday, March 1, 1996

Time:          8:00 am to 5:00 pm

Keynote: "Future Directions for Informix"

  • Bob Macdonald, Vice President, INFORMIX Software Inc.

Technical Session Topics:

  • Optimizing Informix 4GL
  • OnLine 5.X Tuning
  • 4GL to NewEra Migration
  • OnLine 7.1 Tuning
  • NewEra Implementation
  • OnLine 8.X - New Features

Lunch Session: "New Features in NewEra 3.0"

  • Anne Buzbee, Senior Marketing Manager for Tools, INFORMIX Software, Inc.

Closing Talk and Technical Q&A Session:

Jonathan Leffler, author of "Using Informix SQL" and one of the most helpful contributors to the Internet newsgroup comp.databases.informix, will be our closing guest speaker and will respond to your questions in our Question and Answer session.

Lunch is sponsored by INFORMIX Federal Division

Breaks are sponsored by INFORMIX Commercial Division

Exhibitors

  • Advanced DataTools Corporation
  • Arbor Software Corporation
  • BMC Software
  • Concepts Dynamic, Inc.
  • Crystal Services, Inc.
  • INFORMIX Software, Inc.
  • Performance Software, Inc.
  • Platinum Technology, Inc.
  • Pure Software, Inc.
  • Pyramid Technology
  • SelectStar, Inc.
  • Summit Data Group
  • The Technology Group

December User Group Meeting

Our December meeting featured presentations from Compuware Corporation and BMC Software, plus a look at our new user group Web pages by Lester Knutsen. Compuware presented Eco-Tools, which allows Informix database administrators to pro-actively manage distributed, mission-critical Informix databases. BMC Software presented Patrol for Informix OnLine databases, which brings automated, 24-hour, event-driven management to database administration. Then we explored the World Wide Web, looking at Informix resources on the Internet, and introduced our user group Web pages.

WAIUG Newsletter Articles Around the World

Did you know that two of our WAIUG newsletter articles appeared in Informix Tech Notes? We are also sharing articles with other user groups around the world and benefiting from articles written by other user groups. Please let us know if you would like to contribute an article.

Newsletter Sponsorship

We would like to thank the following companies for sponsoring this newsletter:

Business Systems Support Group, Inc

Information Concepts, Inc.

Marriott

Price Waterhouse LLP

Summit Data Group

New Benefits for Members

In addition to this newsletter and our local activities, there is a new reason to be a member of the Washington Area Informix Users Group. All current members will automatically become members of the International Informix Users Group for one year. Benefits include discounts to the Informix Worldwide User Conference in Chicago in July and full access to the members-only section of the IIUG Web pages. Other discount programs are being worked on as well. Have you renewed your membership for 1996? Renew your membership and register for Forum 96 at the same time and save $10. See the registration form on the back of this newsletter. Membership dues are $20.00. We also have a Corporate Membership Program. For membership information, please call our Membership Director, John Petruzzi, at 703-490-4598.


Informix Invites You To View The Future of Technology

Informix is the database company with its eye on the future - from its beginning as the pioneer of database technology for UNIX to its position today as the acknowledged leader in scalable, high-performance database technology for open systems.

Informix's acquisition of Illustra Information Technologies, Inc., will result in the creation of INFORMIX-Universal Server, the first database server for handling all information in your enterprise.

Illustra has taken the power of relational database technology and applied it to the rich diversity of information that is at the heart of next-generation application environments like the World Wide Web. Its ability to manage and process any type of information - documents, photos, web html pages, geo-spatial data, sound, and video - has made Illustra the database for cyberspace.

The combination of Illustra's sophisticated abilities with Informix's powerful features for mission-critical transaction processing and data warehousing will make INFORMIX-Universal Server the information foundation for the future of your enterprise.

To hear how INFORMIX-Universal Server will benefit your organization, please join us on February 6 at The Washington Press Club, Washington, D.C., for a worldwide satellite broadcast.

Seating is limited, so please RSVP at 1-800-529-6999.


Announcing the International Informix Users Group

by Walt Hultgren

I am pleased to officially announce the formation of the International Informix Users Group (IIUG)! IIUG was formally established at the 1995 Informix Worldwide User Conference in San Jose, California. A number of people from the Informix user community have been working for some time on starting such a group, and it's exciting to have taken this first step.

Mission and Goals

The mission of IIUG is to support Informix users worldwide by working to achieve several broad goals:

  • To promote the growth of local and regional Informix User Groups;
  • To provide a communication channel to disseminate news and other information efficiently to the Informix user community;
  • To give users a coherent collective voice to articulate the issues that they find important as they utilize Informix products in their particular environments;
  • To foster the professional development of individual Informix users through special training, documentation, and other programs;
  • To give an organizational home to those Informix users without access to a local group, and to allow local group members to participate in a global organization.

A number of possible programs have been proposed to meet these goals. We will be investigating these and other proposals to determine which will provide real benefit to IIUG members and to the Informix community at large, yet will be feasible.

Board of Directors

During the inaugural meeting in San Jose, a President and Board of Directors were elected and charged with the task of creating a Charter and a set of Bylaws. They will also finish defining the internal structure of the Group. The following people are guiding the operation of IIUG during its first year:

President:

  • Walt Hultgren, Atlanta, Georgia, USA

Board of Directors:

  • Clem Akins, Muscle Shoals, Alabama, USA
  • Tom Bondur, Fremont, California, USA
  • Robert David, Sunnyvale, California, USA
  • Carlton Doe, Salt Lake City, Utah, USA
  • Cathy Kipp, Fort Collins, Colorado, USA
  • Lester Knutsen, Washington, DC, USA
  • Gavin Nour, Sydney, New South Wales, Australia
  • Mike Reetz, Boulder, Colorado, USA
  • Kerry Sainsbury, Auckland, New Zealand
  • Nancy Twomey, Menlo Park, California, USA
  • Malcolm Weallans, London, England

Each of these people brings unique strengths and experience to the Board.

Membership

Membership in IIUG is open to everyone within the Informix User Community, including end-users, vendors, and employees of Informix Software, Inc. As a way of introducing IIUG to Informix users worldwide, we have declared the period continuing through December 31, 1996, to be a Free Charter Membership Period. During that time, membership in IIUG will be free of charge to anyone completing a Membership Application or joining through their local user group. We'll post more details soon on how to join IIUG.

Current Programs

Establishing the full set of activities and benefits that we envision for IIUG will take time. However, there are several programs that we have already been able to put in place:

IIUG Internet Home Site

We have registered our own Internet domain, "IIUG.ORG", and have set up a home site on the Internet, iiug.iiug.org (IP 128.117.71.100). This site can be referred to as simply "iiug.org" in most cases. The two functions that iiug.org will serve initially will be to house our FTP archive and Web pages. A separate message will be posted with more details about these facilities.

Resources for Local and Regional User Groups

There are a number of resources that IIUG is offering to local and regional Informix user groups free of charge: a method for granting free IIUG memberships for all local group members at one time, space on www.iiug.org for local group Web pages as well as assistance in developing them, space on ftp.iiug.org for separate FTP archives maintained by local groups, plus other resources. We will soon be sending details on these offerings to each local and regional group.

1996 Informix Worldwide User Conference

IIUG will be coordinating the various activities related to user groups at this year's Informix Worldwide User Conference in Chicago. Many of these activities will be brought back from last year, and there'll be some new features as well. Prominent among these will be registration discounts for IIUG members, and a special IIUG Call for Papers.

Informix User Group Liaison

Informix Software has created a permanent position, the User Group Liaison, to help promote the growth and success of all Informix user groups. For more information, contact Christine Shannon by sending e-mail to "usergrp@informix.com". She has been working closely with IIUG to help get us going, and has already contacted many local group representatives.

More Information

As we continue with the process of turning the International Informix Users Group into a first-rate organization, we'll make sure to let everyone know of our progress through various channels, including postings to this forum. For more immediate information, visit our Web site at "http://www.iiug.org/" or send e-mail to "info@iiug.org".

It has taken a great deal of effort to bring IIUG this far, yet much formative work remains to be done. If you would like to get involved in any of our activities, feel free to contact me directly.

Walt Hultgren

President, International Informix Users Group

Walt Hultgren			Internet: walt@rmy.emory.edu  (IP 128.140.8.1)

Emory University		UUCP: {...,gatech,rutgers,uunet}!emory!rmy!walt

954 Gatewood Road, NE		BITNET: walt@EMORY

Atlanta, GA 30329 USA		Voice: +1 404 727 0648


The Quality of Your SQL

by Gavin Nour

Some organisations are surprised when I suggest that all SQL should be passed through Quality Assurance (QA) by analysing the output of the command SET EXPLAIN ON.

This article will discuss why we should QA SQL, explain the role of the Informix optimiser (including how to read the output from SET EXPLAIN) and offer some tips to improve SQL performance.

The subject should be pertinent to all development where an Informix database is used, regardless of the tool. The SQL development could be via an Informix tool such as 4GL, I-SQL, ESQL or NEWERA or it could be via a third party product accessing an Informix database.

Firstly, why QA your SQL? Generally I recommend placing strict controls on ALL SQL development. This is especially important today, as we are finding that databases are growing at an alarming rate. The larger our databases become, the more visible our mistakes are and the more problems we have. Large databases are less forgiving, especially in the area of performance. Paying careful attention in the development phase will of course reward us in the future.

Quality assurance will not only help us avoid performance problems, it will also help us avoid other serious problems such as filling the root database space unnecessarily with large temporary tables (possibly causing other INFORMIX applications to fail) or filling UNIX filesystem space with large sort files (possibly causing any UNIX process to fail) or even filling our logical logs with long transactions and having to restore the database from an archive.

One of the best ways to avoid such problems is to enforce SQL development standards. Usually the best approach is to separate the SQL Quality Assurance process from the Development process. Many large organisations do this and have a team of staff devoted to general QA; others have just one person. My opinion is that usually the DBA is in the best position to QA SQL statements, as he/she is most familiar with the database and knows how the engine goes about its task. Also, the DBA should be aware of the SQL being executed, as this may highlight the need to alter the database schema (e.g. add an index). Regardless of who performs the QA, the most important thing is that it gets done and that both the developer and the person responsible for QAing the SQL understand what to look out for.

Undoubtedly the best tool to use for optimising a query or for the QA process is the command SET EXPLAIN ON.

We should remember first that although we can optimise the queries, proper database design plays a major part in query performance, and often the redesign of tables and an effective indexing strategy will solve many problems. Maybe a future article will discuss database schema optimisation. This article will focus on the SET EXPLAIN ON option and endeavour to explain (pun intended) how this command can help you optimise queries. First, though, we need to understand how the optimiser works.



Understanding the Informix Optimiser

What is the optimiser?

The optimiser is the part of the INFORMIX engine that has the job of trying to anticipate the best route to take before running a given query. Because there are often many paths that can be taken to get to the data, some paths will take much longer than others. This is especially true when there are many tables involved in a join. Developers usually develop the query and then leave the task of finding the best route to the optimiser, assuming that the engine will use the best method available. Unfortunately, because the optimiser does a good job most of the time, we tend to forget that the engine does not always make the right decision. Sometimes we know more than the engine does and can see a better way of doing things. The output from SET EXPLAIN ON explains how the optimiser has chosen to access the data. With this information available we may discover the optimiser has chosen a path that will be time consuming, and we can take steps to restructure the query. Sometimes a small alteration will influence the optimiser to take an alternative route. Other times we realise that an index can be added to improve performance, or we may just find out that the query is fine the way it is but we need to add more temporary space to cater for a sort.

How does the optimiser make its decision?

The answer to this is very complex. The main aim of the optimiser is to reduce I/O by limiting the amount of data to search through in order to obtain the requested data in the most efficient way. The optimiser makes its decisions based on information in the system catalogues. In version 5 this information consists of:

  • the number of rows in each table used in the query (systables.nrows)
  • how many pages are used for data and how many pages are used for indexes (systables.npused)
  • whether column values are unique or not (sysconstraints)
  • what indexes exist (sysindexes)
  • whether the indexes are ascending or descending (sysindexes); not required in version 7
  • whether the data is in the same order as the index, i.e. clustered (sysindexes.clust)
  • how many levels there are in an index
  • what the second largest and second lowest values are for each column, which gives the optimiser a rough idea of the range of values (syscolumns.colmin and colmax). Version 7 can obtain more detail about the distribution of data (sysdistrib table).

Using all of this information the optimiser will determine all possible routes and then weigh each method with an estimated cost. The cost will be based on several considerations, including disk access, CPU resources required and network access. In the process the optimiser will determine in what order to join tables, whether to perform sequential scans, whether it should create temporary tables, and whether it can use an index for the select list, the filter, the sort or a GROUP BY. Once the optimiser selects a plan which it believes is the most cost effective, it passes the query on for processing, and if SET EXPLAIN ON is in effect the chosen method will be recorded in a file.

To illustrate how important it is for the optimiser to have the right information available, we can look at a very simple query and a decision that would have to be made by the optimiser to perform a join.

Let's say we have two tables, one with 500,000 rows (tab2) and the other with 200 rows (tab1), both with a unique index on the joining column. A simple select to find related rows from the two tables would be:


    SELECT * FROM tab1, tab2

    WHERE tab1.col1=tab2.col2

Hopefully the optimiser would choose to select from the smaller table first and then join to the second using the index. This would result in 1,000 disk reads (assuming one row per page and that it takes three index reads and one data read per row in table two), i.e. 200 reads for table one plus 200 x 4 for table two.

Now, let's say the optimiser did not have accurate information available about the number of rows and the number of unique values, and chose to select from table two first. This would result in two million reads (assuming it takes just two index reads and one data read per row in table one), i.e. 500,000 plus 500,000 x 3.

Of course, if the optimiser had even less accurate information, didn't know about an index, and used a sequential scan of table two for each row returned from table one, we would have to perform around one hundred million reads (200 x 500,000).

We can see from this example, using a very simple query, that the wrong information and the wrong decision could have dramatic effects on performance: 1,000 reads versus 100,000,000. If there are many tables involved, along with more filter conditions in the WHERE clause, the decision process becomes much more complex and the importance of accurate statistics is magnified.

How accurate are your statistics?

It is very important to remember that the information in the system catalogues used by the optimiser is only updated when the UPDATE STATISTICS command is executed. The optimiser is only as good as the information it has been provided with, so, rule number one is: execute UPDATE STATISTICS as regularly as possible. More often for very dynamic tables than for tables which rarely change. If the system catalogue information is out of date the optimiser may make the wrong decision and severe performance problems could be experienced. It is surprising to see how many sites fail to do this.

We can execute UPDATE STATISTICS for the whole database, for individual tables, for columns in a table and for stored procedures.
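
As a minimal sketch of each form (the customer table, lastname column and get_customer procedure are hypothetical names used only for illustration):

    -- The whole database (every table):
    UPDATE STATISTICS;

    -- A single table:
    UPDATE STATISTICS FOR TABLE customer;

    -- Specific columns in a table:
    UPDATE STATISTICS FOR TABLE customer (lastname);

    -- A stored procedure (re-optimises the SQL it contains):
    UPDATE STATISTICS FOR PROCEDURE get_customer;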

What I usually recommend, instead of relying on memory, is to automate the execution of UPDATE STATISTICS every night via a cron command. Be careful when running the UPDATE STATISTICS command whilst other applications are running, because the system tables are momentarily locked when the catalogues are updated. This can result in other processes receiving errors. Do not update too often, because the command takes time.

When does the engine perform the optimisation?

The optimisation occurs whenever an SQL statement is prepared, providing there are no host variables. If there are host variables the optimiser does not have all the information required until the variables are passed (when the cursor is opened) and in this case the optimisation occurs on the OPEN statement. So, with standard SQL (i.e. not prepared) the query is optimised whenever it is executed. If an SQL statement is used repeatedly within a program it is best to PREPARE the SQL statement to improve performance. This way the SQL is only optimised once within a program instead of every time it is used (unless specifically requested to reoptimise by using the re-optimisation keyword). With Stored Procedures the SQL is optimised when the procedure is created or when UPDATE STATISTICS is executed for that stored procedure.

Suddenly slow?

Even after running UPDATE STATISTICS regularly in production we may notice one day that a query which previously took 10 minutes suddenly takes one hour. This can be the result of the optimiser suddenly choosing a new path based on new information in the system catalogues. This may mean the query needs restructuring to influence the optimiser to make a better decision. Because of this we should have the ability to turn SET EXPLAIN ON for any program in production instead of being forced to recompile the program to find out the query plan. I usually recommend building a simple function into all programs which enables either a parameter to be passed or an environment variable to be set to turn SET EXPLAIN ON.

Development Verses Production

Remember, not only can the statistics used by the optimiser change, but if you are testing a query in development on a small database the statistics may be quite different to what is in production. A query may appear to perform well in development, but once it is in production the optimiser may choose (rightly or wrongly) to take a totally different route. The only way to get around this problem in the QA process is to run the query against a test database which is of a similar size to production, with the same data. Alternatively (providing the database schemas are the same), one might consider manually updating some of the columns in the system catalogue tables after running UPDATE STATISTICS on the test database, just so that the optimiser makes the same decisions; however, this is definitely not recommended and INFORMIX may not support you if corruption is caused.



Controlling the optimiser

Optimisation level

In version 5.x and above we have the ability to influence how much information the engine attempts to obtain. We can do this by using the command SET OPTIMISATION HIGH or LOW. HIGH (which is the default) causes the engine to examine all access paths and LOW causes the engine to eliminate less likely options at an earlier stage in order to improve optimisation time. LOW can be useful if many tables are used in a join (e.g. five or more) as the optimisation time can hinder performance. The downside to using OPTIMISATION LOW is that an access path may be eliminated too early when in fact it would have been the most suitable option.
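
As a small illustration (note that the SQL keyword itself uses the American spelling OPTIMIZATION):

    -- Before a query that joins five or more tables:
    SET OPTIMIZATION LOW;

    -- ... run the large multi-table join here ...

    -- Restore the default for subsequent queries:
    SET OPTIMIZATION HIGH;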

OPTCOMPIND

Beginning with version 7 we have a little more control over the optimiser. Previous versions assumed that if an index existed this would be the most cost-efficient access path. Now we can specify that we would like the optimiser to compare the cost of an index read versus the cost of a table scan, so that the most cost-efficient path can be chosen. This is done by setting OPTCOMPIND=2 (the default). We can still emulate the functionality of previous versions by setting OPTCOMPIND=0. Setting OPTCOMPIND=1 instructs the optimiser to work as it would with the value set to 2, unless REPEATABLE READ is set, in which case it works as it would with the value set to 0. The reason for this option is that a sequential scan with repeatable read will effectively place a shared lock on the whole table (as it scans all rows) during the read. Usually undesirable!

Obtaining data distribution for the optimiser

Beginning with version 6, INFORMIX introduced data distribution analysis and storage in order to help the optimiser make more informed decisions. Data distribution analysis is used to give the optimiser a better understanding of the values contained in a column, e.g.; how many unique values there are in each area of the table. This is done by sampling the data in a column and storing information about sections of the table in various bins. This information can be extremely valuable to the optimiser when dealing with large tables. To generate distribution for a column the UPDATE STATISTICS command is used. The amount of data sampled is controlled by the keywords MEDIUM and HIGH e.g. UPDATE STATISTICS HIGH FOR table(column). MEDIUM merely samples the data and is very quick whereas HIGH evaluates all the rows in the table and is therefore slower but more accurate. LOW is like using previous versions of UPDATE STATISTICS and does not obtain any distribution data at all. The distribution analysis can be influenced further by the use of the keyword RESOLUTION and by specifying a resolution value (number of bins) and a confidence value (level of sampling). Refer to the manual for more information on using these parameters.
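
For example, a minimal sketch assuming a hypothetical orders table:

    -- Sampled distributions with the default resolution and confidence:
    UPDATE STATISTICS MEDIUM FOR TABLE orders;

    -- Finer-grained bins (RESOLUTION 1.0) and a higher sampling confidence (0.95):
    UPDATE STATISTICS MEDIUM FOR TABLE orders RESOLUTION 1.0 0.95;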

An UPDATE STATISTICS strategy for ONLINE DSA Version 7

The recommended UPDATE STATISTICS strategy for INFORMIX DSA version 7 is to do the following, in this order:

1) Run UPDATE STATISTICS MEDIUM on all tables using DISTRIBUTIONS ONLY, without listing columns and using the default RESOLUTION parameters. Prior to 7.10.UD1 it is better to actually list the columns that do NOT head an index, for better performance.

2) Run UPDATE STATISTICS HIGH for all columns that head an index, or for columns that are definitely part of a query. Execute a separate command for each column.

3) Run UPDATE STATISTICS LOW for all other columns in composite indexes.

Step one is very fast as it just samples data, but still gathers useful distribution information. This is used first to obtain information about all the columns that are not in an index. From 7.10.UD1 it is not worth specifying each column, due to the speed of MEDIUM. Step two is to get as much distribution information as possible for all columns which are important for query performance (e.g. joining columns). Because the index can be used (in version 7), the statistics can be gathered relatively quickly; however, do note that only one column should be specified at a time with HIGH so that it can make use of an index. In the final step, the remaining columns in all the indexes which are not at the beginning of the index can be referenced in one statement (per table) using the LOW parameter.
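
As an illustration, a minimal sketch of the three steps for a hypothetical orders table with a single-column index on order_num, a composite index on (cust_id, order_date) and no other indexes:

    -- Step 1: quick sampled distributions for every column of every table
    UPDATE STATISTICS MEDIUM DISTRIBUTIONS ONLY;

    -- Step 2: full distributions, one command per column that heads an index
    UPDATE STATISTICS HIGH FOR TABLE orders (order_num);
    UPDATE STATISTICS HIGH FOR TABLE orders (cust_id);

    -- Step 3: basic statistics for the remaining columns of composite indexes
    UPDATE STATISTICS LOW FOR TABLE orders (order_date);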

Do not re-run UPDATE STATISTICS for large static tables.

Once the UPDATE STATISTICS steps have been completed you can view any data distribution profiles by using the -hd option with dbschema, e.g.: dbschema -d databasename -hd tablename

TIP: This command is a very useful tool to help determine a good fragmentation strategy.



Examining the Optimiser's Choice

Once we are comfortable that we have supplied the optimiser with enough information, we can see what query plan the INFORMIX optimiser has chosen. To do this, use SET EXPLAIN ON within the query or within the NewEra, 4GL or ESQL program. When this is set, the optimiser will write output for all queries (for the same process) to a file called "sqexplain.out", usually in the current directory (the filename and location depend on the operating system and on whether the query is executed on a remote host). Once the query is submitted it is at the point of no return. The only way to examine the optimiser output without completing the query is to hit the interrupt key just after the query starts (after the status line in dbaccess says "Explain set").

Typical SET EXPLAIN output:


QUERY:
------
select customers.cust_id, orders.*
from orders, customers
where order_date > "01/12/1995" AND order_date < "01/01/1996"
AND customers.cust_id = orders.cust_id
order by order_date DESC

Estimated Cost: 10
Estimated # of Rows Returned: 200
Temporary Files Required For: Order By

1) informix.orders: INDEX PATH

    (1) Index keys: order_date
        Lower Index Filter: informix.orders.order_date > "01/12/1995"
        Upper Index Filter: informix.orders.order_date < "01/01/1996"

2) informix.customers: INDEX PATH

    (1) Index keys: cust_id (Key-Only)
        Lower Index Filter: informix.customers.cust_id = informix.orders.cust_id

Understanding the SET EXPLAIN output:

Query: {LOW}

This section of the output shows the actual query which was optimised. LOW is displayed if SET OPTIMISATION is currently set to LOW. Note that sqexplain.out is appended to if the file already exists.

Estimated Cost:

This value is simply a number the optimiser has assigned to the chosen access method. The value is not meaningful except to the optimiser as it bears no relationship to real time. It cannot be compared to the estimated cost of other queries and is best ignored. It can be used however to compare changes made for the same query (e.g.; an index change).

Estimated # of Rows Returned:

This is the optimiser's estimate based on information in the system catalogue tables. Remember that the catalogue information is fairly limited (especially prior to version 7), so this value will often be inaccurate (more so if the query involves a join). In ONLINE DSA Version 7, distribution information can be obtained for the data, which helps the optimiser estimate the number of rows more accurately.

Temporary Files Required For: Order By Group By

When this is displayed there is a GROUP BY or an ORDER BY in the query, and the optimiser has determined that there is no corresponding index available to obtain the data in the required order. A temporary file will be created to order the result set. This could be very large (depending on the size of the tables), so check available disk space and be aware of the effect this sort could have on performance. Indexes cannot be used when the columns to be sorted are from more than one table. Note that in version 7 the optimiser can choose to traverse an index in the direction of the ORDER BY regardless of whether the INDEX is in the same order as the ORDER BY. Prior to version 7 the ability of the optimiser to use the index for an ORDER BY was dependent on whether the ASCENDING/DESCENDING values on the index and the ORDER BY matched.

1) owner.table: INDEX PATH (Key-Only)

This is the table that the optimiser has chosen to read first (indicated by the 1). Subsequent table accesses (for a nested loop join, for example) will be displayed further down in the explain output and will be indicated by a higher number. For each row returned at this level the engine will query the tables at a lower level. INDEX PATH indicates an index will be used to access this table.

The (Key-Only) notation (with ONLINE only) indicates that only the index will be read and the actual data row will not be read from this table. Key-only access would generally be very efficient (prior to DSA 7.2) due to the smaller size of the index compared to the row. Not only is the read for the data row eliminated, but more index key values are likely to fit on one page, which in turn reduces I/O. This type of access is only achieved if no columns are selected from the same table. Tip: avoid the use of SELECT * if possible and only select the required columns. Note that with ONLINE DSA 7.2, key-only reads are in fact slower in most cases due to the read-ahead capabilities.

(1) Index keys: column_name
    Lower Index Filter: owner.table.column > x
    Upper Index Filter: owner.table.column < y

Column_name is the name of the column used in the INDEX PATH read. Lower Index Filter shows the first key value (x) where the index read will begin. Upper Index Filter shows the key value (y) where the index read will stop.

1) owner.table: SEQUENTIAL SCAN (Serial, fragments: ALL)

In the above case all rows will be read from this table using a sequential scan.

The section in brackets relates to version 7. If Parallel is displayed instead of Serial, then the engine will perform a parallel scan (this behaviour is influenced by the PDQPRIORITY setting). The ALL notation indicates all fragments have to be scanned because the optimiser could not eliminate fragments after examining the WHERE predicates. NONE indicates the opposite, that is, the optimiser has eliminated all fragments and therefore none need to be examined. A number (or list of numbers) indicates the engine will only examine the fragment(s) listed (numbers are relative to the order in the sysfragments table). Pay special attention if the sequential scan is being performed at a lower level in the query plan (indicated by a higher number), as this could mean the whole table is being scanned for each row returned in a previous step. Often this is one of the warning bells when optimising/QAing a query. Sequential scans are not so bad when they are for small tables or when they are in the first step of a query plan, providing the engine does not have to scan a large table in order to retrieve just a fraction of it.

AUTOINDEX PATH: owner.table.column

Used more in Version 4. In order to avoid sequential access a temporary index is built on this column to perform a join. Used if an index does not exist on this join column and is generally an indication that you need a permanent index.

SORT SCAN: owner.table.column

Used in combination with a sequential scan when no index is available on the join column. This column will be sorted for later use with a join (See MERGE JOIN).

MERGE JOIN Merge Filters: owner.table.column = owner.table.column

A Merge Join will be used to join the results of the two previous selection sets, which were prepared for a join. Once the join columns are obtained in the appropriate order (possibly via a SORT SCAN if an index does not exist), the server sequentially reads both result sets and merges them before accessing the rows. Considered faster than a nested loop join in many cases.

DYNAMIC HASH JOIN (Build Outer) Dynamic Hash Filters: owner.tab1.col = owner.tab2.column ...

Version 7 only. A Hash Join will be used to join the two preceding tables in the explain output. The Build Outer notation tells us which table is used first. The Filter shows how the tables will be joined. Where some complex queries cannot use an index, the hash join takes over. A hash join is also used instead of a sort-merge join and is considered more efficient: whereas a sort-merge join sorts both tables, a hash join typically only sorts one. Hash joins are favoured with large amounts of data, especially for PDQ with fragmentation. Rows are placed in a hash table after using an internal hash algorithm. The cost of a hash join can be lower than using an index, especially when more than around 15% of the data from a large table needs to be scanned. When the data is not clustered (in a different order to the index), the cost of traversing the index in addition to retrieving the actual rows (in a different order) is quite high compared to a table scan with a hash join. OPTCOMPIND=2 (the default) will cause the optimiser to consider hash joins instead of an index. Note that OPTCOMPIND ought to be set to 1 when REPEATABLE READ is being used (and arguably this should be the default).



SQL Query Quality Assurance and Optimisation
(Some Tips, Some Warnings, Some Things To Check and Some Things to Avoid):

  1. Avoid sequential scans on a large table if it is not in the first position of the query plan as the whole table scan is repeated for every row in the preceding table. This can severely affect performance. This not only affects the query in action but can also impact other tasks by changing the recently used pages in shared memory (as it keeps reading the whole table into shared memory). This increases disk I/O all round as other processes are forced to read from disk. Consider adding an index if the query cannot be restructured and performance is slow.

  2. Avoid potentially large temporary sort files. These can consume all available CPU resources, increase disk I/O and consume all available disk space. Consider adding an index on the columns being sorted. A hint: if the optimiser is not using an existing index for the ORDER BY, this may be because the column being ordered is not in the WHERE clause. In this case the optimiser can be influenced to use the index instead of creating a temporary file by adding a dummy WHERE condition on the ORDER BY column (e.g. AND order_num >= 0); see the sketch after this list. Prior to version 7 the index has to be in the same order as the ORDER BY. When sort files are used, check the $PSORT_DBTEMP and $DBSPACETEMP settings; these can help significantly to improve performance by enabling the engine to use more than one disk for the sort file.

  3. Correlated subqueries. Subqueries that make reference (in the WHERE clause) to a selected column in the main query can severely affect performance. This causes the subquery to be executed repeatedly, once for each main table row. Be very cautious when using statements like EXISTS with large tables, as logical logs can fill very quickly in the process. Temporary table space is allocated and logged for each row in the main table (even if no rows are returned in the final select). The worst effect of this is filling the logs and having to restore from an archive. Correlated subqueries should be rewritten to use a join wherever possible (see the sketch after this list). Some more complex rewrites may involve joining to a temporary table.

  4. OR conditions on different columns can prevent the optimiser from being able to use an existing index. If an index does exist and the optimiser chooses a sequential scan in the query plan, consider using a UNION (one SELECT for each OR condition), as this will give the optimiser the opportunity to use the index; see the sketch after this list.

  5. If the query is slow even when an INDEX PATH has been chosen on all levels of the query plan, do not assume the optimiser has made the right decision. Check the query plan to see if the tables are being filtered in the right order. The aim is usually to eliminate as many rows as possible in the early stages of the query, but unfortunately the optimiser does not always have enough information to do this correctly (especially in versions prior to 6). Using UPDATE STATISTICS HIGH in version 6 and above will give the optimiser more information about the data distribution so that the right tables are eliminated first. Also note the INDEX reads are not necessarily the best. For example a sequential scan of a table can be faster than an index scan if the data pages have to be retrieved and the pages are not in the same physical order.

  6. Converting data types and comparing character columns is very expensive, e.g. tab1.character_col = tab2.integer_col. If possible, consider changing the column types to be numeric. Remember, if the join columns are character types, the columns have to be compared byte by byte for each row. Although version 7 handles conversions better, the cost of the conversion overhead is still not considered by the optimiser.

  7. Look out for WHERE predicates which may not be able to use indexes. These include: OR, LIKE or MATCHES with a wildcard at the beginning (e.g. MATCHES "*NOUR"), functions (e.g. MONTH, DAY, LENGTH), negative expressions (e.g. != "NOUR"), and non-initial substring searches (e.g. postcode[4,5] > 10).

  8. Except in very old versions of the engine, the order of tables in a SELECT statement or the order of elements in the WHERE clause will not have an effect. However, in some cases when the optimiser believes the cost is the same for two different paths, the order of the statements in the WHERE clause may have an effect. Some tricks have been suggested in the past (such as repeating predicates in the WHERE clause) to try to force the optimiser to use a particular index; however, this is no longer recommended and will not work with future versions. E.g. with version 7 the optimiser's query rewrite feature eliminates duplicates in the WHERE clause.

  9. Avoid long transactions in logged databases. We probably all know by now that long transactions threaten to destroy our databases by filling the logical logs. Warning bells should ring with the following: LOAD statements, INSERT INTO xx SELECT yy FROM zz, and UPDATE or DELETE statements spanning many rows. As an aside, consider locking the table in exclusive mode to avoid excessive lock usage (which hinders performance and, even worse, risks running out of locks).

  10. Check the placement of WHERE conditions with joining columns to see if all combinations of the WHERE predicate have been included. For example, the WHERE predicate tab1.x = tab2.x AND tab1.x > 1000 would probably cause the optimiser to use the index on tab1.x, however the index on tab2.x might be much more appropriate. Adding the condition AND tab2.x > 1000 would give the optimiser more choices to evaluate. Another example is tab1.x = tab2.x AND tab2.x = tab3.x. The optimiser in version 5 would not consider joining tab1 directly to tab3, so adding tab1.x = tab3.x would again provide more choices. Note however that the optimiser in ONLINE DSA V7 rewrites queries so that all possible combinations are examined, and these suggestions are not applicable.

  11. Select only the columns required; this reduces the communication between the front end and the back end and reduces I/O. Avoid the temptation to SELECT * (all columns) when they are not required.

  12. Use temporary tables when a subset of data is being reselected with different WHERE predicates. For example, a SELECT with the WHERE clause orderdate > "01/12/1995" AND x = 1 might be followed by a SELECT with the WHERE clause orderdate > "01/12/1995" AND x = 2. In this case, if the table is very large, performance could be improved by first selecting all rows WHERE orderdate > "01/12/1995" into a temporary table and then performing the subsequent selects on the temporary table.

  13. Use temporary tables to influence the optimiser to take the route that you know is best. This can be done by first selecting the rows you want from a large table into a temporary table and then joining the temp table to the rest of the tables. The optimiser may use a different query plan knowing that the temporary table is much smaller than the original large table.

  14. Consider creating indexes on temporary tables. Often overlooked just because the table is temporary.

  15. When using temporary tables for subsequent selects with ORDER BYs, create the temporary table using an ORDER BY, e.g. SELECT x FROM y ORDER BY x INTO TEMP temp_tab (see the sketch after this list). This syntax is not available in earlier versions of the engine (pre 4.1).

  16. Consider running UPDATE STATISTICS (within the application) for large temporary tables. Again this is often overlooked just because the table is temporary.

  17. Use the WITH NO LOG clause when explicitly creating temporary tables; this will help performance by eliminating the overhead of writing to the logical logs. It also avoids the possibility of creating a long transaction with a large temporary table. In version 7, temporary tables can be created in a special dbspace which is not logged. Use this feature whenever possible.

  18. Time commands with the UNIX time or timex command, or use SELECT CURRENT FROM systables WHERE tabid = 1 before and after the query being tested/optimised.

  19. When testing/timing queries, remember that the second time a query is executed it is much more likely to be faster, because the pages read the first time around will probably still be in memory. Be aware that this may distort the test results. Use tbstat -p to monitor disk/buffer usage. When timing queries, consider restarting the instance between tests in order to re-initialise shared memory.

  20. Fragmentation and parallel queries. It is important to fragment the data in a way that makes it easy for the optimiser to determine which fragments are active. Making the WHERE predicates clear enough to enable the optimiser to eliminate fragments will reduce I/O and enable parallelism. PDQ and fragmentation are a whole new subject in their own right (perhaps another article). Just remember, if you are using ONLINE DSA version 7 the trick is to divide and conquer!
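
To illustrate tip 2, a minimal sketch assuming a hypothetical orders table with an index on order_num (the table and column names are illustrative only):

    -- Without a filter on the ORDER BY column the optimiser may build a sort file:
    SELECT order_num, order_date, ship_charge
      FROM orders
     ORDER BY order_num;

    -- A dummy condition on the ORDER BY column can persuade the optimiser
    -- to walk the order_num index instead of sorting:
    SELECT order_num, order_date, ship_charge
      FROM orders
     WHERE order_num >= 0
     ORDER BY order_num;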
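
To illustrate tip 3, a sketch using hypothetical customer and orders tables: the correlated EXISTS subquery is re-executed for every customer row, while the equivalent join lets the optimiser choose a single plan.

    -- Correlated subquery: the inner SELECT runs once per customer row
    SELECT customer_num, lname
      FROM customer
     WHERE EXISTS (SELECT customer_num FROM orders
                    WHERE orders.customer_num = customer.customer_num);

    -- Rewritten as a join (DISTINCT removes the duplicates the join introduces)
    SELECT DISTINCT customer.customer_num, customer.lname
      FROM customer, orders
     WHERE orders.customer_num = customer.customer_num;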
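
To illustrate tip 4, a sketch assuming separate indexes on customer_num and salesperson_num in the same hypothetical orders table: the OR across two columns often forces a sequential scan, while the UNION gives each SELECT the chance to use its own index (UNION also removes duplicate rows).

    -- OR on two different columns: the optimiser may ignore both indexes
    SELECT order_num, order_date
      FROM orders
     WHERE customer_num = 104 OR salesperson_num = 12;

    -- Rewritten as a UNION, each branch can use its own index
    SELECT order_num, order_date
      FROM orders
     WHERE customer_num = 104
    UNION
    SELECT order_num, order_date
      FROM orders
     WHERE salesperson_num = 12;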
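
To illustrate tips 15 and 17, a sketch with the same hypothetical orders table: the subset is selected once, already ordered, into an unlogged temporary table, and later selects then run against the much smaller table.

    -- Select the subset once, ordered, into an unlogged temporary table
    SELECT order_num, customer_num, order_date
      FROM orders
     WHERE order_date > "01/12/1995"
     ORDER BY order_date
      INTO TEMP recent_orders WITH NO LOG;

    -- Subsequent queries read the much smaller temporary table
    SELECT * FROM recent_orders WHERE customer_num = 104;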


To Summarise

The Informix cost-based optimiser does an excellent job of doing what it is supposed to do, that is, shielding us from the task of having to think about the best way to retrieve our data. For this reason we often take it for granted. Just imagine if we had to make the same complex decisions as the optimiser does every time we wanted to access our data. Nevertheless, it is important that we understand the optimiser and the consequences of the decisions it makes. More importantly, we need to provide the optimiser with the information it needs to perform its job effectively.

Informix are constantly refining the optimiser and finding new ways to improve performance. Some of the comments made in this article may not be valid in future releases. However this article was checked for accuracy (in Nov 95) by the brains behind the optimiser (at INFORMIX, Menlo Park, California) and I wish to publicly thank that person for his valuable time.

I hope this article has been of value to you. Any criticisms, comments or suggestions would be welcome.



Gavin Nour is the Director of Select Software Solutions Pty Ltd. He has ten years' Informix experience covering a variety of industries and applications. He was a DP Manager for a leading financial organisation in the United Kingdom, a Technical Support Manager for a leading Informix distributor in Australia, and spent two years at a leading Informix software house in Australia before forming his own company specialising in Informix services.

Gavin is committed to the Informix solution and to helping the Informix community. He is the Secretary of the NSW Informix User Group and is on the Board of Directors of the International Informix Users Group.

Gavin can be contacted in one of the following ways:

Mobile: 0419 803 113 (Australia)

Fax: +61 2 314 6310

Email: nourg@acslink.net.au


Washington Area Informix User Group Web Pages and On-Line Newsletters

by Lester Knutsen

We are pleased to announce that we have opened a Web page and converted all our newsletters from the past three years to WWW documents, and they are available for on-line viewing. This is the place to find the latest news about our meetings and future plans, follow links to other Informix-related material, and catch up on back issues of our newsletter. An index of our newsletters from the past three years is available on-line. Visit our Home Page at http://www.access.digex.net/~waiug. (The index is not duplicated here.)


This Newsletter is published by the Washington Area Informix Users Group.

Lester Knutsen, President/Editor

Washington Area Informix Users Group

4216 Evergreen Lane, Suite 136, Annandale, VA 22003

Phone: 703-256-0267

lester@access.digex.net