Oracle Semijoins
What is Semijoins in Oracle?
A semi-join between two tables returns rows that match an EXISTS subquery without duplicating rows from the left side of the predicate when multiple rows on the right side satisfy the criteria of the subquery.
- Semi-joins are written using EXISTS or IN.
- A semi-join can be performed using the nested loops, hash join, or merge join algorithms
- Semijoin and antijoin transformation cannot be done if the subquery is on an OR branch of the WHERE clause.
Difference between a semi-join and a conventional join
The main difference between a semi-join and a conventional join is that rows in the table, left side of the predicate will be returned at most once. Even if the table right side of predicate contains two matches for a row in the table left side of the predicate, only one copy of the row will be returned.
Sample table: employees
Sample table: departments
Example: Oracle Semijoins
The following sql statement returns a list of departments with at least one employee.
SELECT D.department_id, D.department_name
FROM departments D
WHERE EXISTS
(SELECT 1
FROM employees E
WHERE E.department_id = D.department_id)
ORDER BY D.department_id;
Sample Output:
DEPARTMENT_ID DEPARTMENT_NAME
------------- ------------------------------
10 Administration
20 Marketing
30 Purchasing
40 Human Resources
50 Shipping
60 IT
70 Public Relations
80 Sales
90 Executive
100 Finance
110 Accounting
11 rows selected.
The above results show that no department appears more than once. Oracle stops processing each department as soon as the first employee in that department is found.
Here is the Query if we wrote with a conventional join:
The department name in the following query result will appear as many times as the number of employees in it. So, for example, if a department has 20 employees then that department will appear in the query output 20 times.
To eliminate the duplicate rows, you could use the DISTINCT or GROUP BY keywords. A more elegant solution is to use a semi-join between the departments and employees tables instead of a conventional join:
SELECT D.department_id, D.department_name
FROM departments D, employees E
WHERE E.department_id = D.department_id
ORDER BY D.department_id;
Sample Output:
DEPARTMENT_ID DEPARTMENT_NAME
------------- ------------------------------
10 Administration
20 Marketing
20 Marketing
30 Purchasing
30 Purchasing
30 Purchasing
30 Purchasing
30 Purchasing
30 Purchasing
40 Human Resources
50 Shipping
50 Shipping
50 Shipping
50 Shipping
50 Shipping
50 Shipping
50 Shipping
50 Shipping
50 Shipping
50 Shipping
50 Shipping
50 Shipping
50 Shipping
.........
.........
The above example shows that a department with N employees will appear in the list N times. To eliminate the duplicate rows, you could use the DISTINCT or GROUP BY keywords .
Example - 2:
The following example returns only one row needs to be returned from the departments table, even though many rows in the employees table might match the subquery. If no index has been defined on the salary column in employees, then a semijoin can be used to improve query performance.
SELECT * FROM departments
WHERE EXISTS
(SELECT * FROM employees
WHERE departments.department_id = employees.department_id
AND employees.salary >= 3500);
Sample Output:
DEPARTMENT_ID DEPARTMENT_NAME MANAGER_ID LOCATION_ID
------------- ------------------------------ ---------- -----------
10 Administration 200 1700
20 Marketing 201 1800
30 Purchasing 114 1700
40 Human Resources 203 2400
50 Shipping 121 1500
60 IT 103 1400
70 Public Relations 204 2700
80 Sales 145 2500
90 Executive 100 1700
100 Finance 108 1700
110 Accounting 205 1700
11 rows selected.
Previous:
CROSS JOIN
Next:
ANTIJOINS
