Washington Area Informix Users Group


January 1995 Newsletter

Volume 5, No. 1


Highlights of This Issue

Forum 94 - by Peter Wages

Informix Announces Site on World Wide Web - Press Release

Informix Query Optimization - by Kevin Fennimore

Informix DRDA-DB2 Gateway - by Israel Gotay (Note: this article is not available online)


WAIUG Forum 94

Last December, the Washington Area Informix Users Group held its second user group forum in Tysons Corner, Virginia. The forum was a one-day event with 16 sessions on Informix database development and 12 exhibitors demonstrating exciting new database tools and products. Over 185 people attended the event. See the article by Peter Wages in this issue for more details. On behalf of the user group, I would like to thank all the people who volunteered to make it happen, the speakers, the exhibitors, and the folks from Informix.

NEXT MEETING

1) Informix Query Optimization

2) New InformixLink WWW site on the Internet

The next meeting will cover two topics. Kevin Fennimore will present and discuss Informix Query Optimization. (See Kevin's article in this newsletter for a preview.) Then, Lester Knutsen will present the new InformixLink World Wide Web site on the Internet. There will be time before and after the meeting for you to try out InformixLink on the Internet. The meeting will be at the new Informix Bethesda Office (Commercial Division).

Date: March 24, 1995 Time: 9:00 am to 12:00 noon

Location: Informix Software, Inc. Phone: 301-214-1300

Two Democracy Plaza, 6707 Democracy Blvd., Suite 500

Bethesda, MD 20817

1995 Informix Worldwide User Conference

The Informix Worldwide User Conference is a once-a-year opportunity for Informix end users, partners, industry executives and interested customers to meet. This year's conference will feature more than 60 conference sessions, tutorials and other breakouts designed to help Informix's customers and partners effectively use Informix database technology to be more competitive. The conference will also feature a 75,000 square foot exhibition hall where over 100 Informix customers, hardware and software partners, and VARs will showcase Informix-based solutions. The conference and exhibition will be held July 17-20 in San Jose, California. For more information call 1-800-926-EXPO.

Sponsors for the Newsletter

This newsletter is sponsored by American Computer Technology, Inc., Advanced DataTools Corporation, and Business Systems Support Group. The sponsorship fee allows a company to have a full page for advertising or announcements. If your company is interested in sponsoring a future newsletter, please call Lester Knutsen at 703-256-0267.

List of Employment and Consulting Opportunities

As editor and president of WAIUG, I get several calls each month from companies looking for skilled Informix programmers and consultants. I have started a list of the companies that have called and will fax or email this information to any member who wants it. I would also like to start a directory of members who are Informix consultants. Several companies have asked for a directory of members or consultants. We don't want to get into the job- or consultant-placement business, but will have information available for members who need it. We will be discussing this at our next Board meeting and I would appreciate any feedback from members on this topic.

Plans for 1995

We plan to have quarterly meetings and publish four newsletters this year. The meetings will be in March, June, September, and November. The four topics of most interest to our members, based on responses from the forum registrations, are 1) client/server database development, 2) database networking and management, 3) Informix NewEra, and 4) database performance tuning. Our goal is to focus our meetings on these topics.

June Meeting - Client/Server Development

Our June user group meeting will focus on client/server development and setting up a database network. What our members are most interested in is other Informix users describing what works and how they did it. If you have implemented client/server and network databases using Informix and would be interested in discussing how you accomplished it, please give me a call (Lester at 703-256-0267). We need to cover topics such as: what needs to be installed on a PC to connect it, how to install I-Net and I-Star, what tools work, what mistakes have been made, and how to set it up for a small office as well as for large projects.

Elections to the Board of Directors

At our June meeting we will be holding elections to the WAIUG Board of Directors. The Board is comprised of volunteers who plan our activities and work hard at putting them together. If you are interested in volunteering to be on the Board, or would like to nominate a member, please contact the Secretary of the WAIUG, Rick Montgomery, at 703-756-7273.


Forum 94

By Peter Wages

The Washington Area Informix Users Group held its second annual forum on December 15, 1994 at the Tysons Corner Westpark Hotel. Forum 94 hosted 183 participants, eleven speakers and ten exhibitors. Each participant received a diskette of public domain software contributed by members of the Informix mailing list on the Internet.

Forum 94 opened with a rousing address by Informix Inc. Vice President Robert MacDonald. Mr. MacDonald reviewed the accomplishments that Informix had in 1994, including the introduction of New Era, a new graphical client/server application development tool, and the addition of United Airlines as a major customer. With friendly enthusiasm, he interspersed his remarks with several jabs at Oracle Corporation. For example, Mr. MacDonald showed a slide of an Informix billboard located at the entrance of Oracle Headquarters. He reported that Oracle employees hate that billboard.

WAIUG members and Informix Inc. then presented talks about various subjects relating to Informix. Mr. Wayne Beekman of Information Concepts Inc. spoke about "Visual Basic Client/Server With Informix". Judging from the discussion about Open Database Connectivity (ODBC) drivers on the Internet mailing list, many are interested in this topic. Information Concepts Inc. has had some success with Visual Basic and Q+E Inc.'s ODBC driver.

Lester Knutsen talked about "The Internet, Public Domain Software, and Support". He impressed Filomena Resurrecion of Netrix Inc. with the abundance of information available.

David Sears of Informix Inc. and John Woolsoncroft of Concepts Dynamic Inc. discussed New Era, the new graphical client/server tool released by Informix this year. New Era will compete in the marketplace with other client/server products such as Visual Basic and PowerBuilder. Among New Era's many benefits is that the programmer does not have to write C code for many routines. Informix is developing a New Era function library.

Mayank Patel of Informix Inc. spoke on "Online Performance Tuning". Although he was very knowledgeable, he geared the discussion to version 6 of Informix, which caused some confusion because many in the audience were still using version 5. The two versions are rather different.

It was great to see old friends and colleagues. One does enjoy the chance to hear what others are doing. Lester Knutsen and the folks at Summit Data Group did a wonderful job organizing the event.

Bob Formica of the Aberdeen Proving Ground won the raffle. He received a copy of New Era.

Peter Wages

VSE Corporation 2760 Eisenhower Ave Alexandria, Virginia

pmwages@cais.com Work: (703) 329-2626 Home: (202) 244-3511


PRESS RELEASE: INFORMIX ANNOUNCES SITE ON WORLD WIDE WEB

Company is First Independent Database Vendor to Offer Online Information to Customers Via World Wide Web

MENLO PARK, Calif., (January 3, 1995) -- Informix Software, Inc. (NASDAQ:IFMX), the leader in parallel processing database technology, announced today the expansion of its site on the World Wide Web (WWW) to a fully interactive Web server, called InformixLink, which will provide users with easy access to a variety of information on Informix products, services, partners and customers.

Informix was the first independent database vendor to offer a site on the Web in 1994, and is now the first database company to offer a complete interactive Web site. Informix's WWW site will be accessible at http://www.informix.com/ by January 4.

"This announcement is our response to the growing needs of our customers for more immediate, readily accessible information about our company, products and services," said Jim Hendrickson, vice president of Customer Services. "The Informix Web site will give customers a current resource of information whenever and from wherever they need it." WWW Site an Extension of Original InformixLink

The new Informix Web site is an extension of an existing interactive, online subscription information service for Informix customers, resellers, partners and developers. The original InformixLink, now called InformixLink Classic, is still available.

Although the new InformixLink Web site will be available to the general public, a portion of the site will be restricted to subscription customers. These subscribers will be able to access a "members only" portion of the site containing information such as product documentation, training, software and technical notes, in addition to all of the publicly available information. Informix employees will be able to access information on corporate policies, programs, benefits and internal job opportunities.

InformixLink Provides Easy Access to Information

InformixLink will allow any Internet user to access information about Informix. Informix's Home Page offers an easy, interactive interface to the company's web site and information on:

* Corporate background: Company information and facts; an events calendar including user conferences, tradeshows and seminars; recent Informix news and press releases; investor relations information; and a corporate calendar.

* Products: Product briefs and brochures on Informix's industry leading database servers, tools and connectivity products.

* Customer/user services: General information regarding Informix customer service; an explanation of the various levels of Informix customer support; training schedules, course descriptions and on-line registration for classes; descriptions of Informix consulting services; listing of Informix user groups; and information on ordering products.

* Industry solutions: Customer success stories organized by industry and business need. (White papers on key Informix technology initiatives and icon links to customer press releases will be added later in the year.)

* Partner programs: An overview of Informix Value Added Partners (VAPs) with icon links to related success stories.

* InSync program: An overview of the Informix InSync development partner program with icon links to related successes.

* "E-Zines": Electronic copies of "Informix Times ", the Informix customer newsletter; and briefs on Informix-related books available from Informix Press.

* Job opportunities: Openings in Menlo Park, Calif., and Lenexa, Kan., and sales positions worldwide.

* Help, comments and suggestions: Instructions on how to use the server; a table of contents; explanations of icons; and an interactive form for user comments and suggestions.

About Informix

Informix Software is the leading supplier of high-performance, parallel processing database technology for open systems. The company's database servers are the number one choice of computer hardware manufacturers for publishing Transaction Processing Performance Council (TPC) benchmarks for UNIX-based systems. Informix products also include application development tools for creating client/server production applications, decision support systems, and ad-hoc query interfaces, and connectivity software that allows information to be shared transparently from PCs to mainframes within the corporate computing environment. The company's corporate headquarters is in Menlo Park, California.


Informix Query Optimization

by Kevin Fennimore

An often overlooked step in developing a database application is optimizing the queries performed against the database. Database queries include any statement that searches the database for specific rows, such as select, update and delete (i.e., statements with a where clause). These statements are usually written without regard to how the database engine will access the data requested. It is assumed that we, the programmers, tell the database server what data we want and the server will find that information for us in the most efficient way possible. The database server does its best but, unfortunately, it does not always find the most efficient way. Following are some ideas and examples of how we can optimize our queries to ensure the database server gets the data we request in the most efficient way possible.

The first thing to do when optimizing a query is to set up a good test environment. Some guidelines to follow for the test environment are:

  • run the queries several times (keep track of the changes and what effect each one had on performance)
  • run queries on a machine with minimal system activity
  • use data that represents the production data the query will eventually run against and use the same data for each run of the queries
  • run update statistics on the tables involved in the query
  • use the output of the set explain command
  • time each run of the queries
  • check the output of the query each time to make sure that it is correct
Running the queries several times will help to determine which is the best query to use. It is also a very good idea to keep track of each change; it is very frustrating to write a select that works well, try to improve it and lose the one that worked well. Using a machine with minimal system activity will help reduce differences in the performance of the changed queries due to other users.

Running the queries against "production" data will give a better simulation of how they will perform in a production environment. Often, developers will run their programs against a test database that does not have the volume of data or the distribution of data that is typically found in a production environment. This can hide how inefficient a query may actually be. There is a big difference between sequentially reading from a table with a few hundred records and one with a few hundred thousand records.

As mentioned in a previous article ("The Set Explain Command"), the database engine has an optimizer which determines how to access the requested data. It examines the information in the system catalogs and determines the best query path to access the data. Since the information it uses is stored in the system catalogs, it is very important for the update statistics command to be run frequently, especially on tables that are very dynamic. This ensures that the information being used by the optimizer is accurate.
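
For example, after a large load or purge of the order_items table used later in this article, the statistics could be refreshed from dbaccess. This is a minimal sketch; on newer engines a mode such as HIGH or MEDIUM can also be specified, but the plain form below works on all versions:

		update statistics for table order_items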

The output from the set explain command is the most useful tool for optimizing database queries. It tells you exactly how the database engine will read from the tables to get the data. When changes are made, the effect of those changes can be seen in the set explain output. Sometimes it is a good idea to run a problem program with set explain turned on so that all of the queries performed by the program can be examined. This can be a hassle since the program would have to have the set explain command added to it and then be recompiled. A solution is to add a call to the following 4GL function at the beginning of a program, after the database statement:

		function set_explain()
			define set_explain char(10)
			let set_explain = fgl_getenv( "SET_EXPLAIN" )
			if( set_explain = "ON" )
			then
				set explain on
			else
				set explain off
			end if
		end function
This function gets the value of the environment variable SET_EXPLAIN. If the value is "ON" then set explain is turned on; otherwise it is turned off. (The set explain output is written to the file sqexplain.out in the current directory.) This function is useful for running a program in a production environment without having to recompile the program to get set explain output.
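
As a sketch of how the function might be wired in (the program body and database name here are hypothetical):

		main
			database mydb		# hypothetical database name; connect first
			call set_explain()	# honors the SET_EXPLAIN environment variable
			# ... rest of the program
		end main

Setting SET_EXPLAIN to "ON" in the environment before starting the program then turns the output on without a recompile.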

Timing the queries is also useful and can be accomplished by using UNIX's timex command or a similar utility. The query can be put into a .sql file and run from the command line as follows:

		timex dbaccess dbname query.sql > try1.out 2>try1.err
The output from the query is stored in the file try1.out and the errors, including the output from timex, are in the file try1.err. Look at the man page for timex to see the different options and the output for each one. Some versions have an option that reports time for all of the child processes associated with the process being timed. This is useful since it would include the time for the database server (sqlexec or sqlturbo).

The next thing to do is to optimize the queries. Keep in mind that the main goal of optimization is to reduce I/O performed by all application processes (i.e., the front-end and back-end processes). Some of the best optimization techniques are to use indexes appropriately and to prepare statements that are executed over and over again. These have been discussed in previous articles ("Indexing Strategies" and "The Prepare Statement"). However, there are other ways to optimize queries that go beyond indexes and preparing statements.
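
As a quick reminder of the second technique (covered fully in "The Prepare Statement"), a minimal 4GL sketch might look like the following; some_table and the host variables are hypothetical. The statement is parsed by the server once and then executed repeatedly with new values:

		# parse the statement once...
		prepare ins_stmt from "insert into some_table values( ?, ? )"
		# ...then execute it many times with different values
		for i = 1 to 1000
			execute ins_stmt using p_id, p_desc
		end for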

Using Temp Tables

The primary key to optimizing a query is to know what the query is doing and the type of data being retrieved. For example, if we had a table of order items that contained 500,000 rows of data and, for a variety of close-out numbers and purchase numbers, we wanted to know how many were of type "Special", we might write the following selects:

		select count(*) from order_items
		where order_type = "Special"  and closeout_num = ?
		
		select count(*) from order_items
		where order_type = "Special" and purchase_num = ?
Hopefully, there would be indexes on purchase_num and closeout_num that could be used to access the data. If not, we would sequentially read 500,000 records from the order_items table each time one of these queries was executed. Let's assume that only 20,000 records were of type "Special". That would mean that we would sequentially read through 500,000 records even though we are only interested in 20,000. If there were indexes on these columns but the indexes were not very selective, there could still be performance problems.

An alternative to the above selects is to create a temporary table that would contain only the records from the order_items table that are of type "Special". We could then build indexes on that temporary table and repeatedly query from that table as follows:

		select closeout_num, purchase_num
		from order_items
		where order_type = "Special"
		into temp special_orders with no log
Note: The with no log clause is for OnLine users; it tells the server not to log modifications to the table.

		create index i_spec_orders1 on special_orders( closeout_num )
		create index i_spec_orders2 on special_orders( purchase_num )
		
		select count(*) from special_orders
		where closeout_num = ?
		select count(*) from special_orders
		where purchase_num = ?
The selects from this temporary table would be very efficient and, with OnLine, the selects would be able to perform key-only reads. Using temp tables in this manner is only good for selecting data that is not going to change. Once the data is retrieved into the temporary table, it does not change as the data in the original table changes; it is just a copy.

UNIONs instead of ORs

Queries that use an or in the where clause present another opportunity for optimization. If, for example, we were looking for customers that lived in California or New York, we could write the following select:

		select * from customers
		where state in ( "CA", "NY" )
Note: The in clause translates into an or clause (e.g., state = "CA" or state = "NY").

If there was an index on state, the server could use that index to satisfy the query. If, however, we were looking for customers that lived in California or had a status of inactive, we might write the following select:

		select * from customers
		where state = "CA"
		      or customer_status = "I"
This gets the rows we are interested in but causes the server to read through the table sequentially because it cannot use an index. If indexes existed on state and customer_status, a more appropriate way to write the above select is to use a UNION as follows:

		select * from customers
		where state = "CA"
		UNION
		select * from customers
		where customer_status = "I"
This causes the server to execute each select individually and thereby allows it to use the indexes on state and customer_status.

Another example of using a union instead of an or, which was not as obvious, was the following select:

		select count(*) from transaction_log
		where trans_date >= ?
		   and trans_date < ?
		   and trans_id in ( 1, 5, ..., 100 )
		   and some other criteria
There was an index on trans_date and a composite index on trans_id, trans_date and fields used in the "other criteria". The optimizer decided to use the index on trans_date to read from the transaction_log table, which was extremely inefficient since the date range being used in the query included most of the table. However, the condition on trans_id only included a small portion of the table. The only way to make the optimizer use the index on trans_id was to remove the index on trans_date. This was not a feasible solution since the effect on other queries could not be determined. The not so obvious solution was to remove the or clause (the in clause in this case) and replace it with a union as follows:

		select count(*) from transaction_log
		where trans_date >= ?
		   and trans_date < ?
		   and trans_id = 1
		   and some other criteria
		UNION ALL
		select count(*) from transaction_log
		where trans_date >= ?
		   and trans_date < ?
		   and trans_id = 5
		   and some other criteria
			...
		UNION ALL
		select count(*) from transaction_log
		where trans_date >= ?
		   and trans_date < ?
		   and trans_id = 100
		   and some other criteria
Note: The UNION ALL in the example above returns all of the records from each select. The UNION without the ALL excludes any duplicate records.

This caused the optimizer to execute each statement individually and use the appropriate index. It did not do much for the readability of the code since it expanded one simple select into about ten or so unioned selects. The code also had to be changed to loop through and sum all of the counts returned by the union, as sketched below. It did have a dramatic effect on performance; the program went from running an average of 45 to 60 minutes to less than 10 minutes.
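
A sketch of that change (the cursor and variable names are hypothetical): the unioned selects are declared as a single cursor, and the individual counts are summed in the foreach loop:

		declare c_counts cursor for
			select count(*) from transaction_log
			where trans_date >= p_start and trans_date < p_end
			   and trans_id = 1 and some other criteria
			UNION ALL
			...

		let p_total = 0
		foreach c_counts into p_count
			let p_total = p_total + p_count	# accumulate the count from each select
		end foreach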

Select Specific Columns vs. Select *

As mentioned earlier, the object of optimization is to reduce I/O. This applies to both front-end and back-end (server) processes. One commonly overlooked I/O is the communication, through pipes, between the front-end and the back-end. When a select is executed, the front-end writes the select to the back-end, the back-end executes the select and then writes the results back to the front-end. When selecting information from a table it is very common to do the following select:

		select * from customers
If all of the columns from the customers table are not needed, then this select is inefficient. If only a few rows are returned it is not a problem, but when several thousand rows are returned and the size of each row is several hundred bytes, the writes across the pipe start to add up. It is therefore more efficient to select only the columns that are necessary.
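
For example, if a report needs only the customer number and name (column names borrowed from the outer join example later in this article), selecting just those columns keeps each row written across the pipe small:

		select cnum, name from customers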

Inserts with Select Statements as Value Lists

It is also helpful to reduce the number of statements passed, or written, to the server. For example, if we were to select data from a table and then insert it into another table, the code might look as follows:

		declare c1 cursor for
		select * from customer where state = "CA"
		foreach c1 into rec.*
			insert into some_table values( rec.customer_id, ... )
		end foreach
This would cause data to be passed to the front-end and the insert to be passed to the server for each customer record. An alternative to this would be:

		insert into some_table
		select customer_id, ...
		from customer
		where state = "CA"
This inserts all of the rows that would have been inserted in the previous section of code. In this case, however, only one statement is passed to the server and no records are returned to the front-end. The problem with this is that there is no control over errors in the insert. If one record fails to be inserted, the rest of the records will not be inserted. In the first example, error handling could be added to process errors in the insert and continue inserting subsequent rows, as sketched below.
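
A sketch of that error handling in 4GL (log_error is a hypothetical logging function): whenever error continue keeps an engine error from aborting the program, and sqlca.sqlcode is checked after each insert:

		whenever error continue
		declare c1 cursor for
		select * from customer where state = "CA"
		foreach c1 into rec.*
			insert into some_table values( rec.customer_id, ... )
			if ( sqlca.sqlcode < 0 )
			then
				call log_error( rec.customer_id, sqlca.sqlcode )
			end if
		end foreach
		whenever error stop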

Outer Joins

Another way to reduce the number of statements passed to the server is to use outer joins. If we wanted to select all of the customers and we also wanted to see how many orders they had, we might write the following code:

		declare c1 cursor for
		select customers.cnum, customers.name
		from customers
		foreach c1 into p_cnum, p_name
			select count(*) into p_count from orders where cnum = p_cnum
			output to report xyz( p_cnum, p_name, p_count )
		end foreach
This would execute the select from orders for each customer record. Using an outer join, the code would look as follows:

		declare c1 cursor for
		select customers.cnum, customers.name, orders.cnum, count(*)
		from customers, outer orders
		where customers.cnum = orders.cnum
		group by customers.cnum, customers.name, orders.cnum
		foreach c1 into p_cnum, p_name, p_orders_cnum, p_count
			if ( p_orders_cnum is NULL )
			then
				let p_count = 0
			end if
			output to report xyz( p_cnum, p_name, p_count )
		end foreach
The outer join tells the server to retrieve all of the rows from the customers table and then to retrieve any corresponding rows from the orders table if they exist. If no corresponding rows exist in the orders table, a NULL value is returned for orders.cnum. So in the code we check the value of p_orders_cnum to see if it is NULL; if it is, there were no orders for that customer. This reduces the number of statements sent to the server and thus improves performance.

Indexes and Order Bys

The last optimization technique is using indexes to perform order bys. The optimizer gives priority to indexes that will help the server reduce the number of rows fetched (i.e., the where clause). It then looks at the order by clause to see if the indexes being used for the where clause will produce the correct order. Remember that doing an index read returns the rows ordered by the key of the index. If the index will not produce the correct order, a temporary sort file is used. Sometimes it is more efficient to do an indexed read on an index that is not used in the where clause but does produce the correct ordering. For example, if we want to see all of the customers with a status of "A" and we want it ordered by name, the select would be:

		select * from customer
		where customer_stat = "A"
		order by customer_name
If there are indexes on customer_stat and customer_name, the optimizer will choose the index on customer_stat to access the customer table and a temporary file will be used to sort the records (this can be seen by examining the output of set explain). In this case let's assume that 90% of the customers have a status of "A". This means that we will retrieve 90% of the records into a temporary file. This is not very optimal if there are 500,000 records in the table (that would be 450,000 records in the temporary file).

It would be better if the optimizer used the index on customer_name since the index would read the records ordered by customer_name and thus there would be no need for a temporary file. To achieve this, we must "force" the optimizer to use the index on customer_name by including customer_name in the where clause as follows:

		select * from customer
		where customer_name >= ""
		  and customer_stat = "A"
		order by customer_name
This should cause the optimizer to use the index on customer_name to satisfy the where condition and the order by. Since all strings are greater than or equal to the empty string, the additional condition on customer_name does not affect the results of the query. If the optimizer still insists on using the index on customer_stat, which in some cases it may, the check for customer_stat = "A" can be removed from the select and put in the code; any row returned that has a status not equal to "A" can be skipped, as sketched below.
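
A sketch of that last variation: the status check moves out of the select and into the loop, so the server reads the table through the customer_name index and the program skips the rows it does not want:

		declare c1 cursor for
		select * from customer
		where customer_name >= ""
		order by customer_name
		foreach c1 into p_cust.*
			if ( p_cust.customer_stat != "A" )
			then
				continue foreach	# skip customers with any other status
			end if
			# process the row
		end foreach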

These are just some of the techniques that can be used to optimize database queries. There are many other techniques that can be learned through practice (a.k.a. trial and error). The best way to learn new techniques is to try various options. Think of alternative ways to perform queries that might not seem immediately obvious. The key is to remember that you are trying to improve performance by reducing I/O. This means reduce the number of rows read, reduce the communication between the front-end and back-end, and reduce the need for temporary files.

The other thing to keep in mind is to always use the output of set explain. It tells you exactly how the server will perform your query and whether it will use indexes, sequential scans, temporary files, etc. It is the most useful tool available for optimizing queries.

Kevin Fennimore

UCI Consulting Inc.

(919) 644-1073


Informix DRDA-DB2 Gateway

by Israel Gotay

Reprinted from the April edition of Computing Solutions with permission from Technical Enterprises, Inc. (NASPA). Copyright 1994.

(Note - this article is not available online)


This Newsletter is published by the Washington Area Informix Users Group.

Lester Knutsen, President/Editor

Washington Area Informix Users Group

4216 Evergreen Lane, Suite 136, Annandale, VA 22003

Phone: 703-256-0267

lester@access.digex.net