Volume IV (Dec 1998)

Take care when adding a join to an existing query

Posted: 12/31/98

You should take care when adding a join to an existing query. By doing so, you may introduce a new table that has a column name that is the same as another already in the query. The impact of this can range from annoying to disastrous.

On the annoying end, this may cause the result of the query (in a CFOUTPUT or variable assignment referencing the column name) to use the value from the "new" table rather than that from the "old" one as expected.

On the disastrous end, if the column with the same name is used in the selection of data in the query, it could cause the QUERY to return incorrect results (fortunately, some obvious mistakes of this sort are trapped by the ODBC driver and rejected, but the error has burned us before!) Some examples of each situation will demonstrate.

Incorrect Output

Suppose you have the following query, referencing a table called departments, with the columns deptid, deptname, and manager. We want to show the manager for each dept. Simple enough:
    <cfquery datasource="bni" name="getdept">
    	select *
    	from departments
    <cfoutput query="getdept">
    	#getdept.deptname#: #getdept.manager# 
This query will display the names of the departments and the manager for each. Now suppose you are asked to also show the name of the section (with departments belonging to a higher-level section in the organizational hierarchy). Let's presume that there's an additional column on the departments table, called sectionid.

Simple enough, we just need to join to the Sections table to get the section name. Here's where the trouble can be introduced. Let's say that the sections table has the columns sectionid, sectionname, and--here's the problem--manager. A section has a manager. Nothing wrong with that, but note that the column is named the same in both tables.

But if we're not careful, we'll make a big logical mistake. Let's say we simply add the needed join and make no other changes to the query or CFOUTPUT section, as in:
    <cfquery datasource="bni" name="getdept">
      select *
      from departments 
      inner join sections 
      on departments.sectionid = sections.sectionid
    <cfoutput query="getdept">
      #getdept.deptname# (#getdept.sectionname#):
    #getdept.manager# </cfoutput>

Looks ok, but the results will be that the name shown for the manager will be that of the SECTION manager! Not the DEPARTMENT manager. Since they both have a column of the same name, the CFQUERY will return (at least in Access) the value of the column in the second table listed in the join. That's right, if the tables were reversed in the join, it would work as expected, but that's not the natural thing to do in this situation, since one is joining the "new" (or foriegn key) table to the existing one, it would be more natural to list it second.

This is so insideous because there is no actual error to indicate that it's happened, and if you're not really familiar with the data, you may not even notice the error. Only later when someone who knows the manager for a given department and points out the mistake will you think to look at it. And even then, you may be inclined to assume that it's a data error. But it's not.

The simplest way to solve this, when you notice the error, is to simply specifically reference the column name you are looking for with an alias in the SELECT list of the query, as in:
    <cfquery datasource="bni" name="getdept">
    	select *, departments.manager as deptmanager
    	from departments 
    	inner join sections 
Then you would change any references to manager to be deptmanager. At least now it's no longer ambiguous as to which manager you mean.

Of course, another approach would be to rename the columns in the tables, so that the names indicate this distinction and become deptmanager and sectionmanager, in departments and sections, respectively. But that's a risky move as it may impact many existing programs. Still, it's wise to keep this in mind when developing new tables, so as to preclude the problem before it bites you in the rear.

You'll note that this was done to a degree in these tables where the "id" and "name" columns were effectively distinguished, as deptid and deptname for example when they could have simply been id and name. If that were the case, this problem would have moved from simply annoying to disastrous!

Selecting Data

This tip will be expanded soon to include an example of this error. Fortunately, the ODBC driver catches some obvious errors that might be caused by this mistake, but there have been cases where we've been burned. We'll try to find a good example.

Segmenting a single Verity collection

Posted: 12/14/98

If your sites are hosted with a commercial provider, chances are you must pay for separate Verity collections. (A Verity collection is the "database" in which data is indexed for use by the CFSEARCH tag.)

There is a way to "segment" a single collection to serve disparate purposes, such that one "segment" provides results for one set of searches while another "segment" provides results for another.

This can be useful, for instance, if you are indexing files in two different sets of directories, on entirely separate subjects. You don't want the searches run for one set to return records in the other set. Yet, you don't want to pay for two separate collections. Well, you don't have to!

There may be other ways to do this, or instances in which this approach won't work (such as when indexing query data rather than file contents), but the approach will help you see possibilities that may work in those other situations.

The approach outlined here merely takes advantage of the fact that, since in our example we are indexing entirely different directories, we can know that the url for the path to these files will be different from each other.

On the simplest level, you need simply specify some additional CFSEARCH criteria to name the distinct path. One way to do this is with the criteria:
    cf_key <contains> "directoryname"
Where "directoryname" is the name of the directory that is distinct for one file group versus the other. The brackets around "contains" are part of the verity search criteria.

If you have set up a form and a CFSEARCH process to use that form, perhaps presuming a "type=simple" CFSEARCH query, you can simply modify the template to force a "type=explicist" on the CFSEARCH and then append the above criteria, as in:
    <CFSEARCH NAME="Search1" COLLECTION="xxx" TYPE="explicit" CRITERIA = "#form.searchstring# and cf_key <contains> ""#form.source#""">
Note here that the path is being passed in a form variable called "source", and that this must be enclosed in double quotes because the search criteria demands quotes while the string is within CFQUERY's CRITERIA parm which must be quoted.

There are a couple drawbacks to this solution: the search record count showing the number of records is larger than that which could possibly be returned, since the number includes records for both sets of data. Also, this is naturally searching that larger set, which has some performance penalty, but since we're searching on a Verity keyword, it would seem a small penalty.

Again, there may be other, perhaps more effective ways, but we needed a quick solution and thought others may appreciate this tip.

    BTW, as a result of this discovery, we are now able to offer a search capability for these CF Tips. Note that there is now a search button in the navigation bar at upper left. This applies the tip's approach to leverage an existing collection we had on our site and allowed us to add this search capability at no extra cost. (Hey, this tips area is a volunteer effort. Cash is not spent on the site casually. :-)

"Take only what you need"

Posted: 12/6/98
Most of us have been taught that when we have a choice, we should take only "what we need". Everything in moderation, right?

That's a good lesson to remember when coding SELECT statements in SQL. This is easily overlooked. When we're looking for the records that meet a given criteria, it's all too easy to simply say:

    SELECT * FROM ... (rest of query)

Our focus is usually more on getting the WHERE clause and/or joins right. The thing is, if we only need a single or small number of columns, or (worse) a count of the records found, it's generally far better to SELECT only the column(s) we need, such as:

    SELECT contactid,firstname FROM xxx
In the case of a count, we could use:
    SELECT count(*) as mycount FROM xxx
For more on where you can go really wrong using Select * to count records, see here.

In some tests, this simple change has led to a dramatic reduction in execution time for the query. The difference will of course depend on the number of columns needed versus available, the amount of data in those columns and records, etc.

The think to think about is that there's much more than just DB processing going on here. Even if a test on the DB shows only a minor difference between using Select * and selecting columns, remember that the SELECT * will a) pulls all the columns (for all the selected records) from the DB, b) load up the DB cache with that data (possibly forcing out other more important data that will later need to be reloaded), c) send that across the network to the CF server, d) load up the CF server memory with the entire found resultset, and then e) process it in your CFML.

Naturally, the larger the records are, the more dramatic the impact will be. We have seen reductions of as much as 90% in processing time (a query that took 10 seconds might now take just 1). In interactive applications, that's substantial!

Studio Link Verification Overlooks CFINCLUDE, perhaps others

Posted: 12/5/98

If you rely on Studio's "link verification" tool (tool>verify links), be forewarned. While it's excellent for validating that your various links and file references are accurate (such as href's, img src's, background's, etc.), it DOES NOT validate that your CFINCLUDE template (and probably CFMODULE) references are accurate.

In other words, when it reports that you have no broken links, that won't be true if, for instance, a CFINCLUDE refers to a template that does not exist in the location specified on the TEMPLATE parm. This seems a severe oversight in the moving of the feature from Homesite to Studio.

A curious error regarding SQL columnnames

Posted: 12/3/98

We came across a curious error that seems tied to the use of a SQL columname in a SQL function.

The following syntax:
    select *, {fn mid(postingdatetime,6,10)} as xdate
It got the error:
    Undefined function 'mid' in expression.
After some research, we tried the following modification, in which the columnname "postingdatetime" was enclosed within a pair of brackets, as
    select *, {fn mid([postingdatetime],6,10)} as xdate
This solved the problem. Apparently, the columnname was causing an interaction with the ODBC driver, perhaps even an interaction with CF's processing of the query. Anyway, the brackets solved the problem.

This happened to be in Access, but while the error arose in one server's Access 97 driver, it did not occur in anothers. There was a slight difference in versions of the driver, but the solution works in both, so let this be a warning for folks.
Tips Contents:

| Home | ColdFusion | Articles | Presentations
| User Groups | Other Resources | Press Releases | Company

© 1998-2024, Charles Arehart, SysteManage
Our Practice Makes You Perfect