nprakashn9businessobjects

Query Joins - Inner & Outer Joins

Original article at: http://www.databasedev.co.uk/query_joins.html

You will often need to join tables in a Microsoft Access query where you need to view information from two or more separate database tables. For this you will need to use query joins to accomplish the task.
When you run a database query to find data in related tables, by default Microsoft Access will look for records that have a matching value on both sides of the relationship. Whilst this is what you may want to do most of the time, by knowing how to use the different types of query joins, you can control which records will be displayed as the output. This enables you to again find the exact data that you want - not more and not less.

Query Joins

A join is a temporary relationship that you can create between two tables in a database query that do not already have an established relationship or common field with the same fieldname or data type. Database tables that are joined in a query are related in that query only, and nowhere else. The type of join that you use indicates which records the query will select or perform the chosen actions on.
Note: Creating a query join will not establish a permanent relationship between the tables. Permanent relationships can only be created in the Microsoft Access relationships window.

Inner Join

Definition: An inner join is a join that selects only those records from both database tables that have matching values. Records with values in the joined field that do not appear in both of the database tables will be excluded from the query. One or more fields can serve as the join fields.

The inner join is also known as an equi-join.
The inner join is the default join type in Microsoft Access

The above shows a conceptual diagram of the inner join between Customer data and Order data.
Analogy: Consider a business that employs both managers and engineers - and some employees that are both. An inner join is like a union of this set; it selects the set of people that are both managers and engineers and provides information about them in both roles.

Outer Join

Definition: An outer join selects all of the records from one database table and only those records in the second table that have matching values in the joined field. In a left outer join, the selected records will include all of the records in the first database table. In a right outer join, the selected records will include all records of the second database table. One or more fields can serve as the join fields.

Left Outer Join example:

The above conceptual diagram details the Left Outer Join between Customer data and Order data
Analogy: Consider again the business that employs both managers and engineers. A left outer join selects all of the managers, providing the information about them, but in the case of managers who are also engineers, it provides additional information about them.

Right Outer Join example:

The above shows a conceptual diagram of a Right Outer Join between the Customer data and the Order data.
Analogy: Consider again the business that employs both managers and engineers. A right outer join selects the set of all engineers, providing information about them, but in the case of engineers and also managers, if provides additional information about them.

Changing the Query Join Type

The Join Properties dialog box enables you to specify how two tables are to be joined in a Microsoft Access query. The three options that it includes describe which records you want the query to select. Option 1 in the dialog box is the inner join (the default in Microsoft Access).
Options 2 and 3 represent outer joins. Read the table names carefully when selecting these joins: if the join line was drawn starting from the table on the left, the second option represents the left outer join and the third option will represent a right outer join.
In a traditional database diagram, the "one" or "primary" table is usually drawn to the left of the "many" or "secondary" table.
In this case, a left outer join includes all records from the table on the "left side", and the right outer join includes all records from the table on the "right side".

Explanation of the Different SQL Joins

Original article at: http://www.somecoders.com/2006/02/explanation-of-the-different-sql-joins/

The main point of Relational Databases is to reduce the amount of duplicated data in tables by “normalising” the data – that is, reduce duplication and redundant data. For instance, if we were creating a table of employees and their departments in a business – rather than list the same department over and over
again, we could split out the departments into their own table and reference that table in the employees table as below:
Employees Table

*Employee_ID*	*First_Name*	*Last_Name*	*Department_ID*
1	Joe	Bloggs	1
2	Fred	Jones	2
3	Bill	Smith	3
4	Ann	Jones	2
5	Sally	Dee	1
6	Jim	Lee	2

Departments Table

*Department_ID*	*Department_Name*
1	Sales
2	Tech Support
3	Finance

The problem then comes when you want to list all employees and their department. In order
to do this we will need to JOIN the 2 tables in the SQL statement to get the relevant data back. This is the statement we will use:

SELECT
First_Name, Last_Name, Department_Name
FROM
Employees INNER JOIN Departments ON
Employee.Department_ID = Department.Department_ID

Which will give us the following

First_Name	Last_Name	Department_Name
Joe	Bloggs	Sales
Sally	Dee	Sales
Fred	Jones	Tech Support
Ann	Jones	Tech Support
Jim	Lee	Tech Support
Bill	Smith	Finance

Things to note from the statement:

We have specified an INNER JOIN. An inner join returns only those rows from the
listed tables that match the criteria specified in the ON portion of
the JOIN clause. In this case we have said where Department_ID
in the Employees Table matches the Department_ID in the Departments
table.

Although here we have specified that it is an INNER JOIN, this is the default type
of join so you can just specify JOIN and get the same results.

The JOIN statement is in the FROM clause. This is part of the ANSI SQL-92 standard
for how to format Transact SQL statements. It is also quicker
and easier for SQL to evaluate joins in the FROM clause before it gets
to the WHERE clause.

It is not required but it is considered good practice to write all keywords in upper case.
Although Query Analyzer will colour-code keywords, this practice will
help to more easily identify and separate them.

There were also no duplicate column names in our query so we just listed the columns we wanted and
listed the full table names in the ON portion of the JOIN clause.
If there were duplicate column names in the query we would have had
to have specified to SQL Server which table we wanted the column data
to be displayed from just like in the ON portion otherwise SQL will
complain that it doesn’t know which table you want the data from.
If we had to do this, it would make our SQL statement look like:

SELECT
Employees.First_Name,
Employees.Last_Name,
Departments.Department_Name
FROM
Employees INNER JOIN Departments ON
Employee.Department_ID = Department.Department_ID

Which looks unwieldy enough when there are just a few columns – can you imagine a statement containing
30 columns and 4 JOINS written out like this?
Fortunately, SQL allows us to use Table Aliases to replace the table names in most of the statement,
so we can rewrite our statement as below:

SELECT
e.First_Name,
e.Last_Name,
d.Department_Name
FROM
Employees e JOIN Departments d ON
e.Department_ID = d.Department_ID

As you can see, we have now replaced the table names with the letters ‘e’ and ‘d’ respectively.
We told SQL to recognise these as the table names in the FROM clause
when we said:
Employees e JOIN Departments d
It did not have to be a single letter it is just easier (and quicker to type!). Indeed you will find that, once you are writing complex queries involving a lot of tables, you will have to start using more characters. When
using aliases, try to use something that will identify the table – in our case it was easy as we only had 2 tables so using the first letter of each was enough to identify them - If our 2 tables were called Employees
and Entitlements we would have had to use more than one letter! One important note to remember is that, once you have used a table alias you must then use it everywhere else in the statement in place of the
table name.
Now lets get a bit more complex. Instead of listing employees and what department they are in, we want to list all the departments and which employees work in them. Given the tables above and the knowledge we have gained so far we could write a statement such as:

SELECT
d.Department_Name,
e.First_Name,
e.Last_Name
FROM
Departments d JOIN Employees e ON
d.Department_ID = e.Department_ID
ORDER BY
Department_Name

* Note I have thrown in an ORDER BY clause to order the listing by department – nothing to
do with joins it just makes the output neater

Which would give us:

Department_Name	First_Name	Last_Name
Finance	Bill	Smith
Tech Support	Jim	Lee
Tech Support	Fred	Jones
Tech Support	Ann	Jones
Sales	Sally	Dee
Sales	Joe	Bloggs

Which is great, and just what we wanted. But now, suppose our business expands and adds
some more departments so that the departments table looks like this:

*Department_ID*	*Department_Name*
1	Sales
2	Tech Support
3	Finance
4	Marketing
5	Legal

If we run our statement again, we will get exactly the same output! Where are our missing
departments? Because an inner join will only give us those rows that match from both tables we don’t see the Legal department or the Marketing department.
In order to see all the departments regardless of whether an employee is assigned to that department
we will have to use an OUTER JOIN. An outer join returns all rows from the outer table and those rows that match from the other (inner) table.
There are 2 types of outer join, a LEFT OUTER JOIN and a RIGHT OUTER JOIN. They do the same
thing except one takes the left most listed table as the outer table and the other takes the right most listed table as the outer table. It just allows you to be more flexible in how you can write your joins
and can become invaluable when you have to join 3 or more tables.
All we have to do to our statement is change JOIN to read LEFT OUTER JOIN as below:

SELECT
d.Department_Name,
e.First_Name,
e.Last_Name
FROM
Departments d LEFT OUTER JOIN Employees e
ON d.Department_ID = e.Department_ID
ORDER BY
Department_Name

And we will get the output we wanted as below:

Department_Name	First_Name	Last_Name
Legal	NULL	NULL
Finance	Bill	Smith
Tech Support	Jim	Lee
Tech Support	Fred	Jones
Tech Support	Ann	Jones
Sales	Sally	Dee
Sales	Joe	Bloggs
Marketing	NULL	NULL

Some things to note about the statement and the output:

The keyword OUTER is optional so you can just use LEFT JOIN (or RIGHT JOIN).

You can see that we now have all the departments listed and where there was no matching
Department_ID in the Employees table we have the entry NULL against
First_Name and Last_Name from that table.

Now lets expand our employees table. It is clear from looking at it that we have people working
in the same departments so lets promote some of them. We will record this fact by adding a Manager_ID column. This will be populated with an Employee_ID whenever a person has a manager, to indicate who
that manager is. This will leave our table looking like this:

*Employee_ID*	*First_Name*	*Last_Name*	*Department_ID*	*Manager_ID*
1	Joe	Bloggs	1	NULL
2	Fred	Jones	2	NULL
3	Bill	Smith	3	NULL
4	Ann	Jones	2	2
5	Sally	Dee	1	1
6	Jim	Lee	2	2

This shows that Joe, Fred and Bill have no managers (Indicated by the NULL in the column for this
value). Ann and Jim are managed by Fred and Sally is managed by Joe.
We now need to list all employees and the name of their manager. We know from above that,
as not everyone has a manager, we will have to use an OUTER JOIN but we do not have another table to join to. We will have to use a SELF JOIN – that is, join the table to itself - and this is where
table aliases really come into play. The statement we will use is this:

SELECT
e.First_Name,
e.Last_Name,
e2.First_Name AS Manager_Name
FROM
Employees e LEFT JOIN Employees e2 ON
e.manager_id = e2.employee_id

*First_Name*	*Last_Name*	*Manager_Name*
Joe	Bloggs	NULL
Fred	Jones	NULL
Bill	Smith	NULL
Ann	Jones	Fred
Sally	Dee	Joe
Jim	Lee	Fred

Things to note about the statement:

Basically we have used table aliases to create a “copy” of the employees table
and called it e2. Now we can use our copy of the employees table just as if it was another table.

You can also see in the statement that I have aliased the column name as well so
that the column title makes more sense – you can do this in any SQL
statement if you feel that the actual column name doesn’t make much
sense. For instance, if we had named our columns Emp_FName and
Emp_LName it doesn’t look too good on a report so we could alias them
to First Name and Last Name. You can have spaces in these column
names as log as you enclose the name in quotes like below:

SELECT
e.First_Name AS ‘First Name’,
e.Last_Name AS ‘Last Name’,
e2.First_Name AS ‘Manager Name’
FROM
Employees e LEFT JOIN Employees e2 ON
e.manager_id = e2.employee_id

First Name	Last Name	Manager Name
Joe	Bloggs	NULL

* As a test, how would you rewrite our managers query above so that it listed managers and
their employees? My solution at the end of the article.
Up until now we have only been dealing with 2 tables. What happens when you need to join
3 (or more) tables?
Our employees are hardworking so lets allow them to work some overtime. To do this we will add
2 tables – an Overtime table and an Overtime_Allowed table.
Why 2 tables? Well, we will be restricting who can take overtime based on department and
to comply with normalisation rules, rather then put all that information into one table with lots of duplication we will create 1 table for the overtime definition and then what is called a “link” table to contain information about both departments and overtime. The new tables are below:
Overtime Table

*OTime_ID*	*OTime_Name*	*Pay_Scale*
1	Weekday	0
2	Weekend	1.5
3	Bank Holiday	2

Overtime_Allowed Table

*Dept_ID*	*OTime_ID*	*OTime_Allowed*
1	1	Y
1	2	N
1	3	N
2	1	Y
2	2	Y
2	3	Y
3	1	Y
3	2	Y
3	3	N
4	1	Y
4	2	Y
4	3	N
5	1	Y
5	2	Y
5	3	Y

Hopefully you can now see the point of the 2 tables.
** NOTE – in reality the OTime_Allowed column would be data type BIT (a 1 or a 0 to indicate
TRUE or FALSE or YES and NO in our case) I have made it a CHAR field to keep the statement simple so we don’t get bogged down in understanding the syntax of converting a BIT field into a ‘Yes’ or ‘No’ when the results are displayed.
So now suppose we want to list all departments and their overtime allowances. In order to do this we will have to join the Departments, Overtime and Overtime_Allowed tables.
When joining 3 or more tables it is important to understand how SQL processes the joins so that you can order them correctly to get the results you need. I like to think of multi-table joins as like the game “six degrees of Kevin Bacon” - you take each table in turn and relate it to the next until finally you have linked Tom Cruise to Kevin Bacon in as few hops as possible. But maybe that’s just me!
SQL processes joins in much the same way - It takes the first two tables listed and creates
a result set for them. It then takes the next table and compares it to the temporary table it has created containing the previous result-set and, based on the conditions in the ON clause, creates a result set
and so on until it creates the final table it displays.
An example is needed:

SELECT
d.Department_Name
AS Department,
o.OTime_Name AS ‘Overtime Allowed’
FROM
Departments d
JOIN Overtime_Allowed oa ON d.Department_ID = oa.Dept_ID
JOIN Overtime o ON oa.OTime_ID = o.OTime_ID
WHERE
oa.OTime_Allowed = ‘Y’
ORDER BY
d.Department_Name

*Department*	Overtime Allowed
Finance	Weekend
Finance	Weekday
Legal	Weekend
Legal	Weekday
Legal	Bank Holiday
Marketing	Weekend
Marketing	Weekday
Sales	Weekday
Tech Support	Bank Holiday
Tech Support	Weekend
Tech Support	Weekday

Here you can see that we used the link table to get our results by joining Departments to
Overtime via Overtime_Allowed.
We have now covered the 2 main types of join – INNER JOIN and OUTER JOIN and
their various uses such as SELF JOIN. We have also covered joining more than 2 tables. And still there is more to talk about!
There are 2 other types of joins that we haven’t yet covered – CROSS JOIN and FULL OUTER JOIN.
These are less commonly used joins (ie – I’ve never had course to use them!) but can come in handy.
Cross join:
A cross join on 2 tables will return every possible combination of rows and columns in those
2 tables. This is called the Cartesian product of the 2 tables. This means that if we were to join our Employees and Departments tables in a cross join we would get 30 rows returned as Employees has 6 rows
and Departments has 5 (6*5=30)
When you specify a cross join you can either use the CROSS JOIN keywords or separate the table
names with a comma:

SELECT
First_Name,
Last_Name,
Department_Name
FROM
Employees CROSS JOIN Departments

SELECT
First_Name,
Last_Name,
Department_Name
FROM
Employees,
Departments

Both would give the results shown below:

First Name	Last Name	Manager Name
Joe	Bloggs	Sales
Fred	Jones	Sales
Bill	Smith	Sales
Ann	Jones	Sales
Sally	Dee	Sales
Jim	Lee	Sales
Joe	Bloggs	Tech Support
Fred	Jones	Tech Support
Bill	Smith	Tech Support
Ann	Jones	Tech Support
Sally	Dee	Tech Support
Jim	Lee	Tech Support
Joe	Bloggs	Finance
Fred	Jones	Finance
Bill	Smith	Finance
Ann	Jones	Finance
Sally	Dee	Finance
Jim	Lee	Finance
Joe	Bloggs	Marketing
Fred	Jones	Marketing
Bill	Smith	Marketing
Ann	Jones	Marketing
Sally	Dee	Marketing
Jim	Lee	Marketing
Joe	Bloggs	Legal
Fred	Jones	Legal
Bill	Smith	Legal
Ann	Jones	Legal
Sally	Dee	Legal
Jim	Lee	Legal

Full outer join:
A full outer join on 2 tables would return all rows from both tables with Null entries in the
left columns where things don’t match with the right and vice versa.
For example, suppose a temp worker joins the company – temp workers have no department so
the Department_ID field in Employees would be NULL. We already have 2 departments that have no workers that we added earlier so a FULL OUTER JOIN between Employees and Departments would then give us:

SELECT
First_Name,
Last_Name,
Department_Name
FROM
Employees FULL OUTER JOIN Departments ON
Employee.Department_ID = Department.Department_ID

First_Name	Last_Name	Department_Name
John	Deed	NULL
Joe	Bloggs	Sales
Sally	Dee	Sales
Fred	Jones	Tech Support
Ann	Jones	Tech Support
Jim	Lee	Tech Support
Bill	Smith	Finance
NULL	NULL	Marketing
NULL	NULL	Legal

So there you go.
INNER JOIN, OUTER JOIN, SELF JOIN, CROSS JOIN and FULL OUTER JOIN.
Hopefully you now understand all the various ways SQL allows you to join tables and you can make use of this to normalise your database and still display the data you need.
Before we go though, some final points on using joins:

The columns which are used in join statement have to be either the same data type or compatible data types. Compatible data types means that SQL Server can perform an implicit conversion on the data type when it comes to compare the columns.

The performance of queries using joins can, in general, be increased if all columns
in a join are indexed.

The only thing that remains is to see if you rewrote the managers and employees query in the same
way I did – my solution below:

SELECT
e.First_Name AS Manager_First,
e.Last_Name AS Manager_Last,
e2.First_Name AS Employee
FROM
Employees e JOIN Employees e2 ON
e.Employee_ID = e2.Manager_ID
ORDER BY
e.First_Name

I hope this little tutorial helps explain a bit about the different sort of joins available to you and I’d like to express my sincere thanks to Mark for the time and effort he put into this. Cheers mate.

SQL joins - multi-table queries

Original article at: http://www.databasejournal.com/sqletc/article.php/1402351

Most of you should be familiar with basic queries, SELECTS, UPDATES and DELETES using one table. But to harness the true power of relational databases it is vital to master queries using more than one table. This article will introduce you to database joins - queries using 2 or more tables.
This article assumes you know how to perform basic SQL queries. If you don't, I suggest you read Simple SQL: Getting Started With SQL first. A warning though - SQL implementations are notoriously non-standard, with almost every DBMS having its own extensions, as well as exclusions, especially when it gets to the realm of inner and outer joins! And version by version, they change. So, although most of these examples should work with most implementations, they don't work with all. The final word should come from the documentation of your particular database installation.
Let's do a quick recap for those who may be unsure. We will perform queries on the table below, containing data about tourist guides. The table is defined with:

CREATE TABLE tour_guides(

       employee_number INT NOT NULL,

       employee_name VARCHAR(100),

       hourly_rate INT,

       PRIMARY KEY(employee_number));

To add records into this table, we use:

INSERT INTO tour_guides(

            employee_number,

            employee_name,

            hourly_rate)

       VALUES(981,'Xoliswe Xaba',25)

INSERT INTO tour_guides(

            employee_number,

            employee_name,

            hourly_rate)

       VALUES(978,'Siyabonge Nomvete',30)

INSERT INTO tour_guides(

            employee_number,

            employee_name,

            hourly_rate)

       VALUES(942,'Jean-Marc Ithier',35)

INSERT INTO tour_guides(

            employee_number,

            employee_name,

            hourly_rate)

       VALUES(923,'Tyrone Arendse',32)

Note that you can also do a shortcut INSERT statement, leaving out the fieldnames if the number and order of the fields is the same, e.g:

INSERT INTO tour_guides VALUES(982,'Mathew Booth',25)

I don't suggest using the shortcut however, as, particularly if you're doing your SQL from within an application, the table structure may change, and then the SQL may no longer be valid. For example, I may add another field, such as months_employed. Now, the above INSERT statement will not work. You will get an error something like

Column count doesn't match value count at row 1

If you had written the statement as:

INSERT into tour_guides(

            employee_number,

            employee_name,

            hourly_rate)

       VALUES(982,'Mathew Booth',25)

it would have worked. For this reason, I suggest you don't use the shortcut - it makes your queries much less flexible, and less able to survive changes in table structure.
After the above, your table will looks as follows: Now the table contains:

employee_number	employee_name	hourly_rate
978	Siyabonge Nomvete	30
942	Jean-Marc Ithier	35
923	Tyrone Arendse	32
981	Xoliswe Xaba	25
982	Mathew Booth	25

To delete data from the table is also easy. Perhaps we have entered Xoliswe Xaba prematurely, as he has not accepted our generous offer of employment, and taken another job. To remove him, we use the statement:

DELETE FROM tour_guides WHERE employee_number=981

We use the employee number, and not the employee name, as according to our table definition the employee number is unique (the primary key). It is impossible for there to be another employee 981, while, unlikely as it may seem, there may, at least in theory, be another Xoliswe Xaba. This is why we almost always create a field that contains a unique code for each table.
Let's assume we need to know how long someone has worked for us, so we add the months_employed field talked about earlier. We use:

ALTER TABLE tour_guides ADD months_employed int

After the DELETE, and the creation of the new field, the table now looks as follows:

employee_number	employee_name	hourly_rate	months_employed
978	Siyabonge Nomvete	30	0
942	Jean-Marc Ithier	35	0
923	Tyrone Arendse	32	0
982	Mathew Booth	25	0

See how all the months_employed values have defaulted to zero. Numeric fields, unless you specify otherwise, automatically default to zero. To start adding the months_employed, we use the UPDATE statement, as follows:

UPDATE tour_guides

       SET months_employed=6

       WHERE employee_number=978

UPDATE tour_guides

       SET months_employed=12

       WHERE employee_number=942

UPDATE tour_guides

       SET months_employed=6

       WHERE employee_number=923

Now our table contains the following:

employee_number	employee_name	hourly_rate	months_employed
978	Siyabonge Nomvete	30	6
942	Jean-Marc Ithier	35	12
923	Tyrone Arendse	32	6
982	Mathew Booth	25	0

All of this putting data in, changing and taking out is of course of no use unless you can retrieve the data. This is done with the SELECT statement. For example,

SELECT employee_name FROM tour_guides

returns

employee_name

Siyabonge Nomvete

Jean-Marc Ithier

Tyrone Arendse

Mathew Booth

while

SELECT employee_name FROM tour_guides WHERE hourly_rate>30

returns

employee_name

Jean-Marc Ithier

Tyrone Arendse

If all this SELECT stuff is a bit over your head, and you're unsure of the powerful ways it can limit and perform calculations on the data, I suggest you first read a more introductory article, such as Simple SQL: Getting Started With SQL
All of this so far should be familiar to regular readers of SQLwire. Now, we introduce the 2nd table, that will show the real power of relational databases. First, a quick introduction to relational databases. Why are they given this name? The answer comes from the fact that, unlike earlier database structures (hierarchical and network), relational databases allow potentially every file, or table, to relate to every other one. They do this by using a common field. Let's add another couple of tables. (For more on database design, I suggest you read the article on database normalization.) We add the following tables:

tour_locations

location_code	location_name
1024	Table Mountain
1025	Robben Island
1026	Kruger National Park
1027	St Lucia

tour_expeditions

location_code	employee_number	hours_worked	tourgroup_size
1024	978	5	8
1027	942	8	4
1025	923	3	20
1026	982	6	8
1024	978	5	8
1025	978	3	16

I assume by now you can do the CREATE and INSERT statements to populate the above tables. Now you should be able to see the reason for the term relational database. The way these tables relate is by the common fields they have - tour expeditions joins to tour_guides through the field employee_number, and to tour_locations though location_code. Note that the field names do not have to be the same in both tables, as long as their definitions are the same (ie both int in this case). Try and make as many fields as possible NOT NULL (fields where there cannot logically be a NULL value). For example, the fields location_code and employee_number in the table tour_expeditions are good candidates. Make them NOT NULL now, and we'll reap the benefits later!
Now comes the crux. How would we answer the question "Which employees worked in which locations?". The secret here is to use the fields that relate in each table to join. Let's first answer a more simple question to introduce the concept. "Which employee_numbers worked in which locations?". We would use the following query:

SELECT employee_number,location_name

       FROM tour_locations,tour_expeditions

       WHERE tour_locations.location_code = tour_expeditions.location_code

This returns:

employee_number	location_name
978	Table Mountain
942	St Lucia
923	Robben Island
982	Kruger National Park
978	Table Mountain
978	Robben Island

How did we get this query? The first part, immediately after the SELECT lists the fields we want to return. Easy enough - employee_numbers and location_name. The second part, after the FROM, provides the tables that contain the fields. In this case it's clearly tour_locations for location_name. But which table to choose for employee_number? Both tour_expeditions and tour_guides contain this field. Here, we have to look at which table is related to tour_location. Since tour_location is related only to tour_expedition (through the location_code field), we could only use tour_expedition. And the third part, after the WHERE clause, tells us which fields the relation exists on, or are being joined.
The usual SELECT rules apply. To bring back only the employee_numbers that gave a tour to Table Mountain, and to bring back only unique records (notice that the above query brought back a duplicate value, as there are 2 records that apply), we use:

SELECT DISTINCT employee_number,location_name

       FROM tour_locations,tour_expeditions

       WHERE tour_locations.location_code = tour_expeditions.location_code AND

             location_name='Table Mountain'

employee_number	location_name
978	Table Mountain

is the only row returned. Note how the DISTINCT keyword returns only one identical row. Without it, we would have returned 2 identical rows, one for each time employee 978 gave a tour to Table Mountain.
How now do we return the name of the employees, as we originally requested, not just their numbers. To do so, we join the tour_expedition table to the tour_guide table, on the employee_number field, as follows:

SELECT DISTINCT

    tour_expeditions.employee_number,employee_name,location_name

    FROM

    tour_locations,tour_expeditions,tour_guides

    WHERE tour_locations.location_code = tour_expeditions.location_code AND

    tour_expeditions.employee_number=tour_guides.employee_number

This brings back

employee_number	employee_name	location_name
978	Siyabonge Nomvete	Table Mountain
923	Tyrone Arendse	Robben Island
978	Siyabonge Nomvete	Robben Island
982	Mathew Booth	Kruger National Park
942	Jean-Marc Ithier	St Lucia

Note the changes we made to our original join. We've added employee_name to the fields returned, tour_guides to the table list, and we've had to add the name of the table to the employee_number field of tour_expeditions, making it tour_expeditions.employee_number (now that there are 2 tables returning the employee_number, we need to specify which table to use.) In this case it makes no difference, but in others it may. And finally, we've added a join condition to the WHERE clause, pointing out the relation to use to join the 2 tables.
Let's add another record to the tour_guide table.

INSERT into tour_guides(

       employee_number,

       employee_name)

VALUES(983,'Nelson Jiranga');

Now run the query again:

SELECT DISTINCT

  tour_expeditions.employee_number,

  employee_name,location_name

  FROM tour_locations,

       tour_expeditions,

       tour_guides

  WHERE tour_locations.location_code = tour_expeditions.location_code AND

        tour_expeditions.employee_number=tour_guides.employee_number

We get identical results

employee_number	employee_name	location_name
978	Siyabonge Nomvete	Table Mountain
923	Tyrone Arendse	Robben Island
978	Siyabonge Nomvete	Robben Island
982	Mathew Booth	Kruger National Park
942	Jean-Marc Ithier	St Lucia

This makes sense, as our new tour guide has not yet undertaken any tours. He does not yet appear in the tour_expeditions table, and so the join does not work, as there is nothing in tour_expeditions to join to.
But what if we want all the employees back, regardless of whether they have undertaken a tour or not? We need to explicitly state this, and we do so using a LEFT JOIN (also called a LEFT OUTER JOIN). To introduce the concept, try the following query:

select DISTINCT employee_name

       from tour_guides

       LEFT JOIN tour_expeditions ON

          tour_guides.employee_number = tour_expeditions.employee_number

This returns:

employee_name

Siyabonge Nomvete

Jean-Marc Ithier

Tyrone Arendse

Mathew Booth

Nelson Jiranga

Note the syntax is almost the same, except that the table names are separated by LEFT JOIN, not a comma, and ON is used for the fields to be joined, rather than WHERE.
So, going back to our original question - how do we return the employee numbers, names and locations of all guides, including those who have not yet given a tour. The query is as follows:

SELECT DISTINCT

       tour_guides.employee_number,employee_name,location_name

       FROM tour_guides

       LEFT JOIN tour_expeditions ON

          tour_guides.employee_number = tour_expeditions.employee_number

       LEFT JOIN tour_locations ON

          tour_locations.location_code=tour_expeditions.location_code

This now returns:

employee_number	employee_name	location_name
978	Siyabonge Nomvete	Table Mountain
923	Tyrone Arendse	Robben Island
978	Siyabonge Nomvete	Robben Island
982	Mathew Booth	Kruger National Park
942	Jean-Marc Ithier	St Lucia
983	Nelson Jiranga	NULL

Note that Nelson Jiranga now appears in the results table, and as he has not yet lead any tours, that field is NULL.
Many existing queries make use of what are called subselects (selects within selects). For example, try the following query, which returns all employees who've worked with a tourgroup of more than 10 people:

SELECT employee_name

  FROM tour_guides

  WHERE employee_number IN

    (SELECT employee_name

     FROM tour_expeditions

     WHERE tourgroup_size>10)

The results are

employee_name

Siyabonge Nomvete

Mathew Booth

This query resolves in 2 steps - first the inner query (which returns 923 and 978) is resolved. Then we are left with

SELECT employee_name

       FROM tour_guides

       WHERE employee_number IN

             (923,978)

which resolves to the results above. But I've just demonstrated another way to do this, and which is usually a better way - the join. You can rewrite this query as:

SELECT employee_name

       FROM tour_guides,tour_expeditions

       WHERE tourgroup_size>10 AND

          tour_guides.employee_number=tour_expeditions.employee_number

Why do I say this is better? 2 reasons. One is that many DBMS's (such as early versions of MySQL) do not support nested selects. And the second reason is that more often they can be rewritten as a join, and the join is usually more efficient. On those big, heavily used tables, where performance is vital, you will want to do without nested selects as much as possible.
Let's take another example. How could we find all tour_guides who have not yet given a tour? We could write

SELECT employee_name

       FROM tour_guides

       WHERE employee_number NOT IN

             (SELECT employee_number

                     FROM tour_expeditions)

And this would return

employee_name

Nelson Jiranga

But, using the same principle as before, we could rewrite this as a join, in this case a LEFT JOIN (remembering that LEFT JOINS return values that are not present). Try the following:

SELECT employee_name

       FROM tour_guides

       LEFT JOIN tour_expeditions ON

            tour_guides.employee_number = tour_expeditions.employee_number

       WHERE tour_expeditions.employee_number IS NULL

Now we see an advantage of declaring employee_number NOT NULL. It allows us to use this kind of query, which is often more efficient than the nested select, and it also saves space (the DBMS does not have to waste space telling if the field is NULL or not--by definition it's not)
For this exercise, we first need to INSERT another tour guide, as follows:

INSERT INTO tour_guides

  VALUES('999','Jon Qwelane',30,5)

Now consider another request. We want to find the names of all the employees who have the same hourly rate as Siyabonge Nomvete. Again, we can do this with a nested select:

SELECT employee_name

  FROM tour_guides

  WHERE hourly_rate IN

  (select hourly_rate

     from tour_guides

     where employee_name='Siyabonge Nomvete')

But again, a join is preferable. In this case it will be a self-join, as all the data that we need is in the one table--tour_guides. So, we could use the following, more efficient, query:

SELECT e1.employee_name

       FROM tour_guides e1,tour_guides e2

       WHERE e1.hourly_rate=e2.hourly_rate AND

             e2.employee_name='Siyabonge Nomvete'

This returns:

employee_name

Siyabonge Nomvete

Jon Qwelane

There are a few important points to notice here. We could not have used the query, as some of you may have thought,

SELECT employee_name

       FROM tour_guides

       WHERE employee_number=employee_number AND

             employee_name ='Siyabonge Nomvete'

The reason is that we need to see the table as two separate tables to be joined. This query only returns "Siyabonge Nomvete", satisfying the final condition. In order to make the DBMS see the query as a join, we need to provide an alias for the tables. We give them the names e1 and e2. Also important is why we use e1 in SELECT e1.employee_name and e2 in e2.employee_name='Siyanbonge Nomvete'. These 2 have to come from the 2 'different' versions of the table. If we chose the employee_name from the same table that we impose the condition: WHERE employee_name='Siyabonge Nomvete', of course we're on'y going to get that one result back.
Good luck with your joins--remember to keep it as simple as possible for your DBMS, as few as possible nested selects, and you'll see the benefits in your applications!

“Getting” Joins

Original article at: http://www.khankennels.com/blog/index.php/archives/2007/04/20/getting-joins/

I was asked to post this after explaining it to someone on IRC.
If you have tried to understand how joins work and constantly get confused about what join to use, you just need to keep a simple picture in mind ( I like pictures). I will be explaining joins by referencing a Venn diagram. Have no fear - I won’t get into any of the set theory/math involved. This is just a basic overview to give you an idea the data a particular join will return to you. This is not a technical discussion - just concepts.
We will start with just an empty diagram:
basicvenn.thumbnail.png

The T1 circle represents all the records in table 1. The T2 circle represents all the records in table 2. Notice how there is a bit of overlap of the 2 circles in the middle. Simple right?
I will use red to signify the records that will be returned by a particular join.
INNER JOIN
An inner join only returns those records that have “matches” in both tables. So for every record returned in T1 - you will also get the record linked by the foreign key in T2. In programming logic - think in terms of AND.
venn1.thumbnail.png

OUTER JOIN
An outer join is the inverse of the inner join. It only returns those records not in T1 and T2. “Give me the records that DON’T have a match.” In programming logic - think in terms of NOT AND.
outervenn.thumbnail.png

LEFT JOIN
A left join returns all the records in the “left” table (T1) whether they have a match in the right table or not.
If, however, they do have a match in the right table - give me the “matching” data from the right table as well. If not - fill in the holes with null.
left_venn.thumbnail.png

It should be noted that the same thing is possible with a right join - most people just use a left one.
LEFT OUTER JOIN
A left outer join combines the ideas behind a left join and an outer join. Basically - if you use a left outer join you will get the records in the left table that DO NOT have a match in the right table.
leftOutervenn.thumbnail.png

Again it is noted that the same thing is possible with a right outer join - most people just use a left one.
Theta JOIN
A theta join is the Cartesian product of the 2 tables and not normally what people are looking for - but what they sometimes get by mistake. How many of us have written a join similar to this only to get way more then we were ever expecting.

SELECT t1.*, t2.*
FROM table1 t1, table2 t2
WHERE t1.id = 5;
thetavenn.thumbnail.png

So there you have the basic concepts of joins. Next time you need to use a join and have no clue what to do to get the data you need from the database, draw a picture. It may help you figure out what join to use. Least it helps me.
Important: Please be sure to read the comments below. There are definite situations where my simplistic drawings are invalid and brought to light.
Disclaimer: I am by no means a database guru. Please be sure to learn more about joins from the many online articles available.

nprakashn9businessobjects

Wednesday, 27 April 2011

Oracle Joins Explained

Query Joins - Inner & Outer Joins

Query Joins

Inner Join

Outer Join

Left Outer Join example:

Right Outer Join example:

Changing the Query Join Type

Explanation of the Different SQL Joins

SQL joins - multi-table queries

tour_locations

tour_expeditions

“Getting” Joins