Washington Area Informix Users Group

July 1995 Newsletter

Volume 5, No. 3


User Group Activities and Events

Highlights of This Special Issue

This is the third year we have produced a special edition of our newsletter for our new members and for the Informix Worldwide User Conference. In the last year we have grown to over 600 Informix users on our mailing list. We held a one-day Forum last December with 180 attendees, 16 speakers and 10 exhibitors. We had technical presentations on Informix 4GL programming tips, networking, OnLine DSA, and NewEra. Our quarterly meetings have included presentations on support services through InformixLink, reviews of Informix products and practical information for Informix programmers and DBAs. This has been a very exciting year for our user group and we look forward to another challenging year. Our objective is to provide technical tips, new product information and the opportunity to network with other Informix users.

In This Issue:

User Group Activities and Events

Data Warehouse - by Jack Parker

Schema Optimization - by Kevin Fennimore

Informix Archive Sites - by Walt Hultgren

Writing Reusable Informix-4GL Code - by Kerry Sainsbury

User Group Activities at the Informix Worldwide User Conference

The Informix Worldwide User Conference will take place July 17-20 in San Jose. The following are some of the events of interest for User Group members.

I. Track Session: User Groups at Home and Around the World

Tuesday July 18 at 2:00p

Speakers: Carlton Doe, Walt Hultgren, Cathy Kipp, Lester Knutsen, Malcolm Weallans

II. BOF Session: International Informix User Group Kick-Off

Tuesday July 18 at 5:00p

Organization Chair: Walt Hultgren, Research: Cathy Kipp

The Birds-of-a-Feather session will be the first meeting of the International Informix User Group.

III. User Group Exhibit Booth

Tuesday July 18 to Thursday July 20

Coordinator: Carlton Doe

The booth will have information on Informix user groups around the world. It will also have Internet access via a Sun workstation and will show a collection of Informix FTP sites and WWW pages.

Next Meeting

Our next user group meeting will be in September and will feature news from the Informix Worldwide User Conference and a presentation on Load Testing by Performix, Inc. A mailing will be sent out with the date and location, or call 703-256-0267, ext. 4 for the latest information.

Performix, Inc. will talk about load testing. Load testing is beneficial to users who are developing new applications or who are migrating to new systems that will support a large number of users. Since load testing is a relatively new concept, many people don't know that this type of technology exists. The presentation will discuss the basic concept of load testing and how it fits into the application development process. Load testing analyzes the effect of many users upon an application. Conscientious load testing during the development process protects the application from failing under the weight of simultaneous users. Applied early in the development process, load testing gives developers an opportunity to correct problems--even if it means changing an architecture or major component--before the problems become too expensive to fix.

Newsletter Sponsorship

The user group has been supported by many companies over the years. If your company would like to sponsor a mailing of the newsletter please call for more information. We would like to thank the following companies for sponsoring this newsletter:

Advanced DataTools Corporation

Business Systems Support Group, Inc

Market Consultants

Performix, Inc.

Summit Data Group

User Group Membership

The Informix User Group in the Washington area was started in May 1991. Nineteen people from a variety of companies and federal agencies met to hear about release 5.0 and make plans for a user group. At that meeting we identified the goals of the user group: to share information, learn about new products, provide technical input, discuss problems, share solutions, and network with other Informix users. Our first newsletter was sent out in June 1991. Our user group has grown to over 600 names on our mailing list this year. We are incorporated as a non-profit organization. In order to cover our expenses and support continued growth, our membership dues are $20.00. For membership information call our Membership Director, John Petruzzi, at 703-490-4598.


Data Warehouse

by Jack Parker

In February I received the following comment via email regarding data warehousing:

"Cynic that I am, I suspect that data warehousing is just another jazzed up buzzword for something we have been doing all along. Sort of like voice mail is the new term for answering machine. Not quite the same thing, I know, but not significantly different."

I had the same feeling back in October when I was asked to build one. I sat through a four-hour videotape on the subject by Richard Irwin and was quite impressed and jazzed on the whole topic by the time it was done.

Essentially there are a number of problems with "operational" systems - systems which handle day-to-day tasks.

1. They are designed to do a job, not to satisfy user queries.

2. They are built with well-delineated query requirements (e.g., 37 reports).

3. Getting a different slice of data out of them is a pain.

4. They are "functional silos". In a company such as ours (HP) there are all sorts of different business functions which throw data over the wall to the next business function. That may work very well for what they are doing, but when it comes time to take a look at all of the data together to try and draw out trends or make decisions, it's very difficult. Especially given that they use different applications, different OS's, etc.

5. Definition and ownership of data elements. Part_number is one thing to one person and something else to another. Component_number, product_number may be the same thing, but not always. Who owns it and who is responsible for it? With multiple non-communicating systems these things get pretty out of whack.

6. Time element. Data from system A is as of day X, from system B, as of day Y. You run a report one day and the data changes the next. Why did you make the decision that way? Where is yesterday's data?

7. Operational systems as a whole are geared to producing something, not stepping back and looking at how/what the company is doing.

8. Priorities. Operational systems are transaction processing-oriented. They care about the speed of transactions. A data warehouse user doesn't care if their request takes 15 minutes, they just need to be able to formulate a request easily, quickly and without knowing 87 different arcane rules.

Of the many problems that users have with Information Systems (IS) data, two of the more critical are:

1. We respond slowly. A user cannot always predict what data they will need tomorrow, much less in three months or three years. At times they don't know what they want until they see what they can get. When it takes that long to get results out of IS, users don't even bother to ask.

2. They don't trust the data. Too often systems which are designed to fill one need are pressed into service to fill another. Ingenious users figure out how to make the system work for them, even when it means breaking some rules and putting data into the system which makes perfect sense from their perspective, but destroys data integrity from an IS perspective.

A prime example at several companies is the practice of putting intelligence into part numbers, such as "ABC#546 004065". Trying to pull the business rule out of that data element is always interesting in 20-year-old systems. "Well, if it's ABC, then the next portion is going to be the original part number we assigned them, but then we started buying more than 999 parts from them and we had to shift all the new numbers by one digit, then...".

By the time it comes for us to get data out, there are all sorts of rules that we no longer know about or cannot deduce. At times we flat out have bugs which destroy the data integrity. The first thing any sensible IS customer does upon getting a new report from us is to sit down and check the data.

The best example I can give FOR a data warehouse is a user who came to me back in September with a request. "Give me a total for failures for these components across product lines." Getting that alone was interesting and took a day or two, first figuring out where the data lived (not my system), then figuring out how to not select the data he didn't want. Then letting the query run once (45 minutes), twice, five times while I worked out the kinks. Then he comes back with "Interesting - I don't believe it"; sure enough I still had a kink in there. Then "Well can you split out shop-floor failures vs. returns failures?"; another couple of days. By then I was beginning to figure out that I had better start pulling the base data he needed into its own table just to deal with him. "Ok, now break it out by component, type of failure...", ya-di-da-di-da. Finally, "put it into a format I can import into Excel." When all was said and done it took three weeks. On top of that, the product I delivered was data, not information. The user still had work to do in Excel before he found the information he sought.

In a data warehouse environment, the user gets a nice GUI interface (whatever it is - I don't care) - he gets to point and click to join things together. If what he needs doesn't exist, he can build it or make a simple request to the warehouse and it will be built for him. His queries will take half-an-hour to run, but he'll be able to get the data together that he needs by the end of the day, not the end of the month.

So what in fact is a warehouse?

Essentially it is a repository. Operational systems send data to it. This data is saved for eternity or the next best thing. The data is reorganized into a view the user would like to see and then placed in a location where the user can best view the data given their tool. The warehouse does not contain information related to the day-to-day running of the operational system. Key elements which drop out in data collection are flags, fields which indicate that some process or other has occurred or which indicate some relationship with other data. Since the data in the warehouse is perfectly normalized (right?), there is no need for data which indicates a relationship with other data different from the norm.

The data warehouse is a collection of data from diverse systems which is integrated (definition AND data), write-only (so old data never goes away), and geared towards user requests, NOT transaction processing. Instead of forcing the user to join fourteen tables with 'x' criteria, you build a view just for them. If you've set things up in a warehouse-style environment, this is not as difficult as from transactional systems. As the user dreams up new ways of demanding data, a warehouse can respond more effectively.

I still seek wisdom on how to build a warehouse. My current effort is to build an environment where a warehouse can reside - in other words, I'm a novice. What is scary is that people keep coming to me to ask how I'm doing it, as though I were a guru on some IS mountain. Scarier still is that I have yet to run into another warehouse person who is building an environment similar to ours. I'm either in front of or behind the 8-ball.

I would welcome anyone who cares to start a discussion on the topic. I've invented 90% of what we've done here and think it's right, but know there are things I haven't thought of yet. What I've seen to date is that it's real easy to take short-cuts and do things wrong. I've tried to stay on the straight and narrow and most of the work to date has been conceptual - not coding, although the little code that there is to drive this thing can be rather deep.

What are the parts?

Our warehouse consists of three "non-moving" parts and a number of "moving" parts:

Non-moving Part 1 - Standard Archive

Of all of the data that we currently store and plan to store in the future, there will probably be a business need for only some portion. In the future some customer is going to come forth and request historical information. The standard archive is a place to keep this information offline, but where it can easily be called back. In our case we converted all of the incoming information into an 'open' format; we resolve decimal points, use a common date format, convert currencies into dollars, and such. This data is then compressed and stored offline. This level of data is further categorized by the level of detail that goes into it. In our case every transaction which has occurred is stored there.

Non-moving Part 2 - Subject Inventories

Data which has a current use is stored in a normalized form in several SQL databases, one per subject matter. For example, one of the databases is called "part" and has complete information describing all parts used by our company. Another is called "supplier", and so forth. Yes, there is a relationship between part and supplier in that the latter provides the former. But there are also other warehouse projects underway within the company, and at some point we will discover that another division has a better, or more cost effective supplier subject inventory than ours and we will want to use theirs instead of our own. By organizing the databases by subject matter in this fashion, and given the distributive power of SQL (Informix in our case), it doesn't matter whether the supplier subject inventory is housed at our site or at some remote location.

We are also concerned at this early date about the potential size of any of these subject inventories. Using this organization it is a simple matter to move some excessively large databases off onto a separate platform.

At this level of the data we also 'lightly' summarize the data. The types of queries that will be run against this level are not likely to want the exhaustive detail of an operational system. We have still kept that detail back at the archive level in case we need to rebuild things a different way.

Non-moving Part 3 - Business Views

Information that's stored in a subject inventory is great for us IS folk. We can easily develop a query to pull out the data we need, but this same structure is impossible for a user to navigate, and users will undoubtedly make dreadfully expensive queries against these data stores. Finally, their tools may not be geared toward access of information from our particular subject inventories. So we give them a third layer of information called a Business View. As we enter this view we leave normalization at the door; the goal here is to organize the data in the manner that the customer will need it. Information is highly summarized (e.g. monthly or yearly totals). In this manner the customer need not worry about how to navigate some obscure SQL structure. As they need new business view information, they can either access the subject inventory directly and retrieve it, or ask us and we will build them a new view, which we can do rapidly because we have all of the data neatly stashed away in our subject inventory.
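As a rough sketch of what such a build looks like (the table and column names here are invented for illustration; they are not our actual structures, nor the metadata-driven mechanism described later in this article), a monthly-summary business view could be rebuilt from a subject inventory with plain SQL:

     -- Hypothetical business view: one row per part per month
     create table bv_part_failures_monthly
     (
          part_num     integer,
          fail_month   date,      -- first day of the month
          fail_count   integer
     );

     -- Periodic refresh from a hypothetical "part_failure" detail table
     -- in the subject inventory: wipe and re-summarize.
     delete from bv_part_failures_monthly;

     insert into bv_part_failures_monthly
          select f.part_num,
                 mdy(month(f.fail_date), 1, year(f.fail_date)),
                 count(*)
            from part_failure f
           group by 1, 2;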

The Moving Parts

The first of these are the strip or extraction programs. These retrieve data from the operational systems and deliver them to the warehouse for storage. Prism has a package which will retrieve this sort of information for you. In our case we deemed the coding of these programs to be so trivial that we opted to write them ourselves. Data is stripped out of the operational systems and then delivered (by ftp) to our Unix warehouse platform.

At that point each data flow is checked and converted into our open format. There is a single program which reads the metadata to determine what is contained in each file. This data is then converted according to rules in the metadata. Any errors are rejected and noted. The data is then spit out in the open format.

That data is then routed, first into an SQL work area, and secondly to the archive itself. Unfortunately many of the systems we retrieve data from have no 'date last updated' field to determine if it is new data or not. In other words, we accept a snapshot each time. We worked around this by archiving a difference between the current snapshot and the most recent snapshot. This difference is stored in script form so that it can be applied directly against a preceding snapshot to make it current. Every 'n' occurrences we do a full archive of the snapshot, so that to rebuild a given day's worth of data, we need only go back to the last full snapshot and then apply all difference scripts to make it current for a given day. At some point in the future we will write our own difference processor to store data a little more efficiently. In the meantime we use the Unix diff -s command.

We call the SQL work area a 'reservoir' to denote that data flows to it and is held until needed. Periodically, according to a schedule, a number of integration programs are run which build subject inventory data from this reservoir data. This same mechanism is used to populate business views from the subject inventories.

All of the information required to process each of these data flows is collected and stored in the metadata. All that is required to build a new data flow is to make an entry in the metadata describing what should be included. From this, new files can be accepted, data can be unearthed from the archives, and the integration programs are generated to populate downstream portions of the warehouse.

Finally the data is delivered to the business view where the users have their own tools to view it. In one case, the data is delivered to a multi-dimensional database with a specific front end built for it. In another case, we ship the data to a Powerplay application, and in yet a third case, we deliver data to an SQL database where generic tools can access it.

As new tools become available, or as new demands are placed on our data structures we can quickly build new views. In short, we can now meet new data demands within a 48 hour time frame in most cases where we store the data. As we work further on our warehouse and automate more of its functionality my goal is to average six hours between request and delivery.

Building a warehouse - putting the rubber on the road.

Charter

One of the challenges of building a warehouse is the political aspect of the effort. The data you will collect resides in multiple systems across multiple environments owned by multiple people. Many of these people have no idea what a warehouse is and will see any effort to use their data in another system as a threat to the system they support, or to the integrity of their data. You must explain to them all what a warehouse is, that the data will not be coming back (so their fears of mangled data are groundless), and that the warehouse will be relieving them of the need to spin off new reporting capabilities.

Perhaps the second largest challenge is that everybody has some concept of what a warehouse should be, and they don't necessarily match. Take the first couple of weeks or even months and discuss what exactly you desire your warehouse to look like, what it will and will not do. We spent the first month of a three month project defining a set of principles to live by. I wish we could have spent longer, but we needed results, so we went with the basics.

Define pieces

There is a utopian warehouse that is probably impossible to build, or which would take several years to launch; meanwhile, your manager wants results NOW. The opposite extreme is yet another database which culls data from everywhere, which you've put together with baling wire and into which anybody can put whatever they like. It is important to steer a course between these two extremes. In our case we got a prototype out the door in a hurry, and are now spending time improving that prototype. One of our key founding principles is that we are careful about what data we allow into the warehouse.

Communicate

I cannot stress enough the importance of communication. While this is true for any large project, this is especially true for a warehouse project. In an environment where there are many diverse professionals running around, each with their expertise and background, the importance of keeping them all pushing on the boulder from the same direction cannot be over-emphasized.

Summary

There are many more issues than I have covered here. Some of the more important ones we are currently wrestling with include how to handle data over time, where to get data that has no source system, how to predict what the usage and size will be, or what hardware we will need to support it.

Two summers ago we had an intern develop an executive information system which collected and integrated data from eight different systems. Users access this Informix SQL database through a simple Windows-based spreadsheet. There are many things that he didn't handle - he used hard-coded data flows, there is no history available, and the subject matter is very limited. But in three months this fellow put together what is essentially an information warehouse that even now feeds customers data they were screaming for. They are deliriously happy with the system and usage of it is continually growing. This little system put together by a novice (who did an excellent job) also severely cut back on the number of one-time information requests which bothered developers.

In short, although he and the customers didn't realize it, a little information warehouse which barely manages to pass the criteria for a business view had a tremendously beneficial impact.

Should YOU build a warehouse?

The first thing to ask is what sort of questions are your customers likely to ask in the future. If these questions can be easily handled by your current information systems, then probably not. If you've got a mess of thirty different operational systems which don't interface very well, and where your customers complain about not being able to get information, or not believing the information they get then undoubtedly yes. I leave it to you to figure out where you fall on that continuum.

If your answer is no or not right now, then you should still consider the future. 80% of the work involved in building a warehouse is in sourcing the information - figuring out where each data element comes from, defining them, ensuring that the source has adequate editing capabilities to provide correct data. This information is much easier to consider and collect at the time that these systems are written than by later generations.

Sources

I highly recommend Bill Inmon's book 'Building the Data Warehouse' for a more in-depth discussion of what a warehouse is and why to build one. Check out the Data Warehousing Institute in Washington, DC. Also, Richard Irwin teaches a class on data warehousing. I only managed to catch a videotape of it, but as indicated at the beginning, I found that he explained the concept very clearly.

____________________________________________________________________________

Jack Parker - Hewlett Packard, BSMC, Boise, Idaho, USA

jparker@hpbs3645.boi.hp.com


Schema Optimization

by Kevin Fennimore

The database schema consists of table definitions, index definitions, permissions, etc. Optimizing the schema involves creating tables that make efficient use of system resources (i.e. disk space) and allow the database to access the data in an efficient manner. An inefficient schema can result in tables that require more disk space, are harder to maintain and require more I/O to access the data. All of this can lead to poor performance.

Column Definitions

The first area of optimization is the type of the columns defined in the table. Oftentimes, character strings are used in place of integers. This is sometimes a result of standards that require all fields that are not used in mathematical calculations to be character. This can be inefficient for two reasons, the first of which is space. An eight-character field that only contains numeric values uses eight bytes for each record. If this field were numeric, such as an integer, it would be big enough to store the data and it would only use four bytes of storage per record. This means that if the column is used as a key in an index the key size will be smaller, and the record size will be smaller. The benefits of smaller row sizes are discussed later.

The second inefficiency is in comparisons. It takes longer for the database engine to compare two eight-character fields than two integer fields. With character strings the engine has to compare the characters in the string to determine equality; with integers only one comparison is necessary. The engine compares character strings somewhat more efficiently than a naive character-by-character loop, but it is still not as efficient as comparing two integers.

Another issue with column definitions is related to the size of the fields. It is very common for fields to be oversized, especially when the database is first designed and it is unclear what the data will actually look like. For example, if a price column was defined in the database as money(11,2), it would be able to hold a value up to $999,999,999.99. That is an awfully expensive piece of merchandise. Defining the column as money(5,2) would allow for values up to $999.99 and save two to three bytes per record.

The same concept applies to other data types such as datetime. Most applications define these fields as datetime year to fraction(5). The problem with this is that most UNIX machines do not support fractions of a second past the hundredths or thousandths. Some of these space savings may seem trivial but a few bytes here and there on a table that contains 1 million rows starts to add up.
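For illustration, here are the same columns defined both ways (the column names are hypothetical, and the "right" sizes always depend on the actual data):

     -- Oversized definitions
     create table order_item_wide
     (
          order_num    char(8),                      -- numeric data stored as characters
          unit_price   money(11,2),                  -- allows up to $999,999,999.99
          added_at     datetime year to fraction(5)
     );

     -- Right-sized definitions for the same data
     create table order_item_tight
     (
          order_num    integer,                      -- 4 bytes, one-step comparisons
          unit_price   money(5,2),                   -- allows up to $999.99
          added_at     datetime year to second
     );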

Primary/Foreign Keys

Another common schema design mistake is to use large primary keys as the foreign keys in other tables. For example, if the primary key of an orders table consisted of a 20 character order number, it would not be efficient to use that primary key as the foreign key in the order items table.

An alternative would be to define a serial field in the orders table and carry the serial value for an order into the order items table. This saves space in the order items table and speeds up the join between the two tables, since they would be joined by an integer field. A drawback to this is that the order items table could not be accessed directly by the order number; the orders table would have to be accessed first.
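A minimal sketch of this approach, using hypothetical column names (the long character order number lives only in the orders table, and the join runs on the four-byte serial value):

     create table orders
     (
          order_id     serial,       -- compact surrogate key
          order_num    char(20),     -- the long "natural" order number, stored only here
          order_date   date
     );

     create table order_items
     (
          order_id     integer,      -- carries the serial value from orders
          item_num     smallint,
          quantity     integer
     );

     -- Find the items for one order; "A123" is a made-up order number
     select i.*
       from orders o, order_items i
      where o.order_num = "A123"
        and o.order_id = i.order_id;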

"Denormalization"

Performance gains can also be realized by "denormalizing" the database. Typically a normalized database does not contain derived data. However, sometimes derived data can be a great help for performance. For example, let's assume that a commonly queried value is the total price of an order and it is important for this query to return as quickly as possible.

One way to perform this query would be to sum the total for all of the items associated with an order. For a system with a few items for each order this might not be a problem, but if there were several hundred items for each order it would require more I/O to get the sum. An alternative is to keep an order total field in the order table. Getting the order total for a given order then requires reading only one record from the database.
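The two approaches side by side, with hypothetical table and column names:

     -- Normalized: the total is summed from every item row for the order
     select sum(price * quantity)
       from order_items
      where order_id = 1001;

     -- Denormalized: the total is kept in the order row and read back
     -- with a single-row fetch (order 1001 is a made-up example)
     select order_total
       from orders
      where order_id = 1001;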

Another example of denormalizing for performance would be repeating data in a detail table. Let's say we have a table which contains information about cartons in inventory and a detail table about the contents of each carton. The carton table might contain the status of the carton while the carton_detail would contain the actual items in the carton. If we were looking for all of the available cartons in inventory that contained a specific item we would have to join the carton and carton_detail tables as follows:

     select *
       from carton, carton_detail
      where carton.id = carton_detail.id
        and carton.status = "A"
        and carton_detail.item = some_item

If this were a very common query it might be better to move the status to the carton_detail table so that the carton table would not need to be joined. In this case an index could be built on item and status in the carton_detail table, which would make the query run very quickly.

Typically, "denormalization" is frowned upon because of the additional overhead and programming needed to maintain the derived and repetitive data. There is also the risk of compromising data integrity if the derived or repeated data is not updated properly. For example if the status in the carton table is updated but the status in the carton_detail records is not there is a data integrity problem. All of these problems can be addressed by using triggers and stored procedures to maintain the denormalized data. There are trade-offs when the schema is denormalized but the key is to achieve better query performance.

Wide Tables

It is common to store all of the information about an entity (i.e. an order) in one table, which might produce a row that is very large. A table which has a large row size is called a wide table. Wide tables are inefficient for several reasons, one of which is the amount of caching that is done.

With OnLine, when a row is read from the disk, the entire page containing that row is read into shared memory. Then the row is read from shared memory. If additional rows are needed that are on pages already in shared memory, they are read from memory instead of being physically read from the disk (this is called caching). This means that if more rows can be put on a page, there will be more rows eligible to be read from memory.

Standard Engine is at the mercy of the operating system with regard to the amount of caching that is performed. However, if caching is performed, then having more rows in the cache buffer can mean fewer physical reads.

Another performance problem with wide tables is logging. When a row is updated, inserted or deleted in a database with logging, the whole record is written to the transaction log. Updates are the biggest problem since an update causes the before image and the after image of the record to be written to disk.

Let's look at an example of an order table that contains all of the information possible about an order (i.e. its status, billing information and shipping information). Let's assume that the record is 300 bytes long (20 for status information, 80 for billing and 200 for shipping).

When an order is placed, all of this information is entered into one table. As the order is processed the status information is accessed frequently and sometimes changed, but the billing and shipping information is accessed infrequently and never changed. When the status_flag (one character) is updated, which is the most common update, the before and after image of the entire record, for a total of 600 bytes, is written to the transaction log. This is a lot of overhead for an update to a one-character field. Also, when the status information of a record is accessed, the entire row is read. With OnLine the entire page containing the row is read. This means that 280 bytes are read and not used.

To correct these inefficiencies the table should be broken up into three tables with a one-to-one correspondence. There would be an order_status table (20 bytes), an order_billing table (80 bytes + order_num) and an order_shipping table (200 bytes + order_num). In this case when the order_status is updated only 40 bytes would be written to the log. Also, in an OnLine system with a 2K page size, there would be 102 records per page in the order_status table as opposed to 7 records per page when the record size was 300 bytes. This means more rows will be cached and performance will be increased.
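A sketch of the split, with invented column names sized to roughly match the example:

     -- One-to-one split of the 300-byte order row; order_num ties the
     -- three tables together.
     create table order_status
     (
          order_num    integer,
          status_flag  char(1),
          status_date  date
          -- remaining status columns bring this to roughly 20 bytes
     );

     create table order_billing
     (
          order_num    integer,
          bill_name    char(30),
          bill_addr    char(50)      -- roughly 80 bytes of billing data
     );

     create table order_shipping
     (
          order_num    integer,
          ship_name    char(30),
          ship_addr    char(50),
          ship_instr   char(120)     -- roughly 200 bytes of shipping data
     );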

Long Tables

A long table is one that has many rows. Typically, systems will contain data for several years but will only access recent data for day-to-day activities. This might be the case with a patient information system. Data for patient visits might be retained for years, but most of the daily activity and reporting is done on the most recent three months. All of the data prior to the last three months could be considered historical and has a negative impact on the daily operations of the database, since it is still part of indexes and must be scanned when searching for "current" data.

To improve performance, a historical table could be created that would contain all of the data older than 3 months. This would give the programs the ability to just use the current table or to also select from the historical table if necessary. Selecting from the two tables can be accomplished by using a union.
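Using hypothetical patient tables, the combined query might look like this:

     -- Search the current table and, when necessary, the historical
     -- table as well; table and column names are illustrative only.
     select visit_date, patient_id, diagnosis
       from patient_visit
      where patient_id = 4711
     union all
     select visit_date, patient_id, diagnosis
       from patient_visit_hist
      where patient_id = 4711;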

A table that contains rows that have distinct purposes can also create long tables. An example might be a log table that contains log information that is uploaded to a central machine and log records that are just used for local tracking. This type of table could be broken up into separate tables, one for local tracking and one for uploaded information. This would improve performance for selects run for just local tracking information or for just uploaded information.

Another way that long tables can be broken up is to use table fragmentation, which is available with OnLine DSA 7.10. This makes it possible to break up a table into separate fragments based on a value or round-robin. In the historical data example above, the table might be fragmented by date, with the most recent three months in one fragment and the historical information in another. This would give the application the appearance of one table. However, if a query was run with a date criterion within the last three months, the optimizer would only look at the fragment with the most recent three months and would skip the other fragment.
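A sketch of the fragmentation clause, with invented dbspace names and an arbitrary cut-off date (in practice the fragments would be rebuilt as the "current" window moves):

     -- Expression-based fragmentation: recent rows land in one dbspace,
     -- history in another; the optimizer can skip fragments that the
     -- date criterion rules out.
     create table patient_visit
     (
          patient_id   integer,
          visit_date   date,
          diagnosis    char(40)
     )
     fragment by expression
          visit_date >= "04/01/1995" in dbs_current,
          visit_date <  "04/01/1995" in dbs_history;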

These are some of the most common problems with schema design, and there are probably many more. As with query optimization, the goal of schema optimization is to reduce I/O and processing time. By using the right data types, using appropriate foreign keys, denormalizing the database and splitting wide and long tables, the amount of I/O and processing time for queries can be greatly reduced, and performance thereby increased.

Kevin Fennimore

UCI Consulting

(301) 540-7113


Informix Archive Sites

by Walt Hultgren

This is a listing of sites known to contain archives of Informix-related files and documentation. Please send any additions or changes to this listing to Walt Hultgren <walt@rmy.emory.edu>.

ARCHIVE SITES

Name: cc1.kuleuven.ac.be
IP: 134.58.8.32
Location: Belgium, GMT + 1
Access: World Wide Web/Mosaic using URL http://cc1.kuleuven.ac.be/student/informix-faq.html
Contact: Bert Barbe <stud01@cc4.kuleuven.ac.be>, GUTEST8@BLEKUL11.BITNET
Contents: HTML version of the Informix FAQ listing.

Name: ftp.demon.co.uk
IP: 158.152.1.69
Location: United Kingdom, local time = GMT
Access: Anonymous FTP
Directory: /pub/perl/db/isqlperl
Contact: Bill Hails <bill@tardis.co.uk>
Contents: Files related to interfacing the perl programming language with Informix databases using isqlperl, developed by Bill Hails.

Name: kcbbs.gen.nz
IP: 202.14.102.1
Location: New Zealand, GMT + 12
Access: Anonymous FTP
Directory: /informix
Contact: Kerry Sainsbury <kerry@kcbbs.gen.nz>
Contents: Files related to the Informix FAQ listing (primary site).

Name: mathcs.emory.edu
IP: 128.140.2.1
Location: United States, Eastern time zone, GMT - 5
Access: Anonymous FTP
Directory: /pub/informix
Contact: Walt Hultgren <walt@rmy.emory.edu>
Contents: Utilities, documentation and a complete archive of informix-list and comp.databases.informix articles. Also mirrors files related to the Informix FAQ Listing from kcbbs.gen.nz.

Name: quasar.ucar.edu
IP: 128.117.71.21
Location: United States, Mountain time zone, GMT - 7
Access: Anonymous FTP, World Wide Web/Mosaic using URL http://quasar.ucar.edu/
Directory: /pub/Informix_Archive
Contact: Mike Reetz <reetz@ncar.ucar.edu>
Contents: Mirror of the Informix section of mathcs.emory.edu, updated nightly.

Name: InformixLink
Location: United States, Pacific time zone, GMT - 8
Access: Interactive dial-up via local telephone numbers world-wide.
Contact: Informix Software, Inc., +1 800 331 1763 (USA), or local office. Sheryl Ingravallo <sheryl@informix.com>, +1 415 926 6355
Contents: Bug lists, work-arounds, technical tips, product and marketing information. Recent comp.databases.informix articles with text capability.

NOTES

Most of these systems have primary purposes other than being public archives. Please note the time zone local to each system, and try to access it during off-peak hours. This is especially important if you plan to retrieve many files at one time.

mathcs.emory.edu

The system's full domain-style name is emory.mathcs.emory.edu. Note that this is not the same system as the home of the Informix mailing list. The first files you should get from the archive are "README" and "ls-lR" in the directory /pub/informix.

InformixLink

InformixLink is an on-line electronic information service offered by Informix Software, Inc. A Basic level of access is provided at no charge to Informix maintenance customers, with Full access available for a subscription fee. World-wide access is provided via local or Public Data Network dial-up.

USING ANONYMOUS FTP

If you've never used anonymous FTP, here is a quick primer, in the form of a sample session on mathcs.emory.edu with comments in {}. Lines where you type something are marked with "<<<":

% ftp mathcs.emory.edu          {or: ftp 128.140.2.1}                 <<<
Connected to mathcs.emory.edu.
220 emory FTP server (SunOS 4.1) ready.
Name (mathcs.emory.edu:walt): anonymous                               <<<  
331 Guest login ok, send ident as password.
Password: walt@rmy.emory.edu           {use your e-mail address here} <<<  
230 Guest login ok, access restrictions apply.
ftp> cd pub/informix                                                  <<<  
250 CWD command successful.
ftp> get README                                                       <<<  
200 PORT command successful.
226 ASCII Transfer complete.
ftp> get ls-lR                                                        <<<  
200 PORT command successful.
226 ASCII Transfer complete.
          .
          .       {repeat this step for each file you want}
          .
ftp> quit                                                             <<<  
221 Goodbye.
%

USING AN FTP MAIL SERVER TO DO ANONYMOUS FTP VIA E-MAIL

There are sites connected to the Internet that provide mail servers for use by anyone who doesn't have access to FTP. Such a server acts as an intermediary that takes your request by e-mail, gets the files you want from the FTP site you specify, then e-mails those files back to you. The following sites either have publicized themselves for general use, or have specifically stated they are available for retrieving files from mathcs:

  • bitftp@pucc.princeton.edu Princeton University, Princeton, NJ, USA (*)
  • ftpmail@cs.uow.edu.au University of Wollongong, N.S.W., Australia
  • ftpmail@decwrl.dec.com DEC Western Research Lab, Palo Alto, CA, USA
To save wear and tear on the Net, use the site nearest to yours. To obtain a help message, send e-mail to the appropriate server address above consisting of the word "help". The instructions for that server will be returned to you in an e-mail message.

Walt Hultgren               Internet: walt@rmy.emory.edu (IP 128.140.8.1)
Emory University            UUCP: {...,gatech,rutgers,uunet}!emory!rmy!walt
954 Gatewood Road, NE       BITNET: walt@EMOR
Atlanta, GA 30329, USA      Voice: +1 404 727 0648


WRITING REUSABLE INFORMIX-4GL CODE (or: Why I love Modular Variables)

by Kerry Sainsbury

I want to explain how to create easily reusable and extendable software components. Object-oriented programmers use some of these buzzwords; we can too, if you like:

Encapsulation: All code related to a task lives in a single place, with the actual implementation hidden from the user code.

Inheritance: The ability to take existing functionality and extend it without breaking existing user code.

ENCAPSULATION

Imagine we need a program to control a simple vending machine :-) Each machine needs to be able to:

  • accept money
  • accept product selection
  • deliver product
  • give change
  • record the sale

We will need to write a different program for each product manufacturer to cope with their own special requirements, but these are the essential functions. Here are some solutions we could adopt:

A. The MAIN way.

We could code the entire routine in one MAIN section, without any functions, and use local variables to keep track of data.

     main_eat.4gl:
          MAIN
          DEFINE l_money, l_product, l_cost, l_change...
               DISPLAY "EAT FOOD" -- Special code for this vendor
               INPUT l_money...
               INPUT l_product ...
               DISPLAY "Here is your ", l_product
               SELECT l_cost FROM price WHERE product = l_product
               LET l_change = l_money - l_cost
               DISPLAY "Here is your change: ", l_change
               INSERT INTO sales(product, day) VALUES (l_product, TODAY)
               DISPLAY "THANKS FOR EATING FOOD" -- More vendor-specific code
          END MAIN
This is a poor solution because the only way we can reuse this code (for example in a later model machine with the ability to disallow selection of out-of-stock product, or the ability to vend coffee) is to cut and paste the code into a new program.

Similarly if we later wish to change the recording of sales to include TIME of purchase we will need to remember to alter both the original vending program, that of the later model, and that of the coffee machine.

B. The GLOBAL way.

We could code each routine individually and communicate between the routines via global variables.

     main_drink.4gl:
          GLOBALS g_money
          MAIN
               DISPLAY "DRINK SUGAR"    -- Vendor specific
               CALL enter_money()
               DISPLAY "Thanks for the $", g_money -- Vendor specific
               CALL enter_product()
               CALL deliver_product()
               CALL give_change()
               CALL record_sale()
          END MAIN

     vend.4gl:
          GLOBALS g_money, g_product, g_cost, g_change...

          FUNCTION enter_money()
               INPUT g_money...
          END FUNCTION

          FUNCTION enter_product()
               INPUT g_product ...
          END FUNCTION

          FUNCTION deliver_product()
               DISPLAY "Here is your ", g_product
          END FUNCTION

          FUNCTION give_change()
               SELECT g_cost FROM price WHERE product = g_product
               LET g_change = g_money - g_cost
               DISPLAY "Here is your change: ", g_change
          END FUNCTION

          FUNCTION record_sale()
               INSERT INTO sales(product, day) VALUES (g_product, TODAY)
          END FUNCTION
This looks like a reasonable solution except that a maintenance programmer has no idea what function sets "g_money" on the vendor-specific line in MAIN, and sharing global variables between programs is a VERY messy business.

The programmer also needs to somehow know that "g_money" is the global variable he needs to use, and hope that "g_money" doesn't already exist in his program when he finds he needs to add an interface to vend.4gl (remember in the real world it's unlikely to be something as clear-cut as a vending machine!)

C. The LOCAL way.

We could code each routine individually and communicate between the routines via local variables.

     main_chocolate.4gl:
          MAIN
          DEFINE l_money, l_product, l_cost, l_change
               DISPLAY "EAT CHOCOLATE"  -- Vendor specific
               LET l_money = enter_money()
               DISPLAY "Thanks for the $", l_money -- Vendor specific
               LET l_product = enter_product()
               CALL deliver_product(l_product)
               CALL give_change(l_product, l_money)
                    RETURNING l_cost, l_change
               CALL record_sale(l_product)
          END MAIN

     vend.4gl:
          FUNCTION enter_money()
          DEFINE l_money
               INPUT l_money...
               RETURN l_money
          END FUNCTION

          FUNCTION enter_product()
          DEFINE l_product
               INPUT l_product ...
               RETURN l_product
          END FUNCTION

          FUNCTION deliver_product(l_product)
          DEFINE l_product
               DISPLAY "Here is your ", l_product
          END FUNCTION

          FUNCTION give_change(l_product, l_money)
          DEFINE l_product, l_money, l_cost, l_change
               SELECT l_cost FROM price WHERE product = l_product
               LET l_change = l_money - l_cost
               DISPLAY "Here is your change: ", l_change
               RETURN l_cost, l_change
          END FUNCTION

          FUNCTION record_sale(l_product)
          DEFINE l_product
               INSERT INTO sales(product, day) VALUES (l_product, TODAY)
          END FUNCTION
This version greatly increases programmer understanding of the data flow between the routines, but introduces many variables to the calling program which aren't actually required by it. The only variable required by main_chocolate.4gl is "l_money", with the other three variables being passed back and forth only for vend.4gl's benefit. Hardly encapsulation!

MAIN_CHOCOLATE SHOULD NOT NEED TO KNOW HOW VEND.4GL WORKS.

If a decision is later made to record the amount of change in the "sales" table then each main_.4gl would need to be altered to pass l_change into record_sale, so that it could be INSERTed into "sales".

D. The MODULAR way.

We code the routines individually and communicate via modular variables. Yes - this is the sexy solution:

main_condom.4gl:
          MAIN
          DEFINE l_money
               DISPLAY "HAVE SAFE SEX"  -- Vendor specific
               CALL enter_money()
               LET l_money = get_vend_money()
               DISPLAY "Thanks for the $", l_money -- Vendor specific
               CALL enter_product()
               CALL deliver_product()
               CALL give_change()
               CALL record_sale()
          END MAIN

     vend.4gl:
          DEFINE m_money, m_product, m_cost, m_change...

          FUNCTION enter_money()
               INPUT m_money...
          END FUNCTION

          FUNCTION enter_product()
               INPUT m_product ...
          END FUNCTION

          FUNCTION deliver_product()
               DISPLAY "Here is your ", m_product
          END FUNCTION

          FUNCTION give_change()
               SELECT m_cost FROM price WHERE product = m_product
               LET m_change = m_money - m_cost
               DISPLAY "Here is your change: ", m_change
          END FUNCTION

          FUNCTION record_sale()
               INSERT INTO sales(product, day) VALUES (m_product, TODAY)
          END FUNCTION

-- Here are the new routines...

          FUNCTION get_vend_money()     -- NEW!!!
               RETURN m_money           -- NEW!!!
          END FUNCTION                  -- NEW!!!

          FUNCTION get_vend_product()   -- NEW!!!
               RETURN m_product         -- NEW!!!
          END FUNCTION                  -- NEW!!!
This, in my humble opinion, is just glorious. The main_condom.4gl is clear and uncluttered, and has no idea at all how vend.4gl goes about its business.

Interfaces to vend.4gl variables are via the get_ functions. main_routines need only call the get_ functions for the variables they are interested in.

As new functionality is required it is only vend.4gl which requires modification - all main_.4gls just need recompiling. For example: Recording change in the "sales" table is just a matter of adding m_change to the list of inserted columns.

INHERITANCE

I've shown how we can encapsulate functionality via modular variables. Now I want to show how we can use modular variables to expand on the basic functionality of our vending machine, to include features specific to new vending machines without breaking our old machines.

- New vending machines are able to identify when they are out of stock of a product.

New vending machines would have a main section looking like this:

     main_new.4gl:
          MAIN
          DEFINE l_money
               DISPLAY "BUY *NEW* THINGS"    -- Vendor specific
               CALL define_new_machine()     -- NEW!!!!
               CALL enter_money()
               LET l_money = get_vend_money()
               DISPLAY "Thanks for the $", l_money -- Vendor specific
               CALL enter_product()
               CALL deliver_product()
               CALL give_change()
               CALL record_sale()
          END MAIN

     vend.4gl:
          DEFINE m_money, m_product, m_cost, m_change,
                 m_new_machine  -- NEW!!!
               [ unchanged routines not reproduced here ]

          FUNCTION define_new_machine()
               LET m_new_machine = TRUE
          END FUNCTION

          FUNCTION enter_product()
          DEFINE   l_got_product

               LET l_got_product = FALSE
               WHILE NOT l_got_product
                    INPUT m_product ...
                    IF m_new_machine THEN
                         SELECT onhand FROM stock WHERE product = m_product
                         IF onhand = 0 THEN
                              DISPLAY "Sorry - out of stock of ", m_product
                         ELSE
                              LET l_got_product = TRUE
                              LET m_new_machine = FALSE
                         END IF
                    ELSE
                         LET l_got_product = TRUE
                    END IF
               END WHILE
          END FUNCTION
The new "define_new_machine" function simply sets a flag for later use inside "enter_product". The beauty of this is that when we recompile vend.4gl against our *old* programs the code will not break - old programs don't set m_new_machine to TRUE, so none of the code dependant on that variable will be executed.

Note that it's important to reset such flags to FALSE as soon as you've finished with them so that subsequent calls to the routine don't pick up your settings. For example:

Imagine a routine which usually displays a message in RED, BLINKing text on line 3 of your screen, but on some occasions needs to display on other lines.

ugly_blink.4gl
     DEFINE m_row   SMALLINT
     FUNCTION define_blink_line(l_row)
     DEFINE   l_row    SMALLINT
          LET m_row = l_row
     END FUNCTION

     FUNCTION display_red_blink()
          IF m_row = 0 THEN
               LET m_row = 3
          END IF
          DISPLAY "UGLY MESSAGE" AT m_row, 1 ATTRIBUTE(RED, BLINK)
          LET m_row = 0  -- Reset row indicator
     END FUNCTION
It's important that display_red_blink() resets m_row; otherwise the following code would display on line 5 twice, rather than on line 5 and then line 3:

     CALL define_blink_line(5)     -- Set to line 5
     CALL display_red_blink()      -- Display on line 5 and reset to normal
     CALL display_red_blink()      -- Now display on normal line 3
Let me know what you think of this technique.

Kerry Sainsbury, kerry@kcbbs.gen.nz | THE INFORMIX FAQ v2.2 Mar 95
Quanta Systems, Auckland            | kcbbs.gen.nz:/informix/informix.[faq|apx]
New Zealand. Work: +64 9 377-4473   | mathcs.emory.edu:/pub/informix/faq/ " "
Home: +64 9 279-3571                | http://www.garpac.com/informix.html


Washington Area Informix User Group Sponsorship

The following options are available for companies who would like to participate in our activities:

Presentation at Meetings

The user group meets every other month. We would like to have one presentation per meeting from vendors that have products that work with Informix. If you would like to make a presentation at a meeting please contact Lester Knutsen at 703-256-0267.

Newsletter Sponsorship

The newsletter is produced quarterly. Each mailing goes to over 600 users in the Washington area. Companies sponsoring the newsletter may place a one page ad. If you would like to sponsor an issue, please contact Lester Knutsen at 703-256-0267.

Forum 1995 - An Informix User Group One Day Event

We are planning a one-day user group forum in November. John Petruzzi is the coordinator for this event. If you would be interested in demonstrating your products, speaking, or conducting a seminar, please contact John Petruzzi at 703-490-4598.


This Newsletter is published by the Washington Area Informix Users Group.

Lester Knutsen, President/Editor

Washington Area Informix Users Group

4216 Evergreen Lane, Suite 136, Annandale, VA 22003

Phone: 703-256-0267

lester@access.digex.net