Oracle Semijoins

Last update on August 19 2022 21:50:41 (UTC/GMT +8 hours)

What is Semijoins in Oracle?

A semi-join between two tables returns rows that match an EXISTS subquery without duplicating rows from the left side of the predicate when multiple rows on the right side satisfy the criteria of the subquery.

Semi-joins are written using EXISTS or IN.
A semi-join can be performed using the nested loops, hash join, or merge join algorithms
Semijoin and antijoin transformation cannot be done if the subquery is on an OR branch of the WHERE clause.

Difference between a semi-join and a conventional join

The main difference between a semi-join and a conventional join is that rows in the table, left side of the predicate will be returned at most once. Even if the table right side of predicate contains two matches for a row in the table left side of the predicate, only one copy of the row will be returned.

Sample table: employees

Sample table: departments

Example: Oracle Semijoins

The following sql statement returns a list of departments with at least one employee.

SELECT   D.department_id, D.department_name
FROM departments D
WHERE EXISTS
(SELECT 1
FROM   employees E
WHERE  E.department_id = D.department_id)
ORDER BY D.department_id;

Sample Output:

DEPARTMENT_ID DEPARTMENT_NAME
------------- ------------------------------
           10 Administration
           20 Marketing
           30 Purchasing
           40 Human Resources
           50 Shipping
           60 IT
           70 Public Relations
           80 Sales
           90 Executive
          100 Finance
          110 Accounting

11 rows selected.

The above results show that no department appears more than once. Oracle stops processing each department as soon as the first employee in that department is found.

Here is the Query if we wrote with a conventional join:

The department name in the following query result will appear as many times as the number of employees in it. So, for example, if a department has 20 employees then that department will appear in the query output 20 times.

To eliminate the duplicate rows, you could use the DISTINCT or GROUP BY keywords. A more elegant solution is to use a semi-join between the departments and employees tables instead of a conventional join:

SELECT D.department_id, D.department_name
FROM departments D, employees E
WHERE E.department_id = D.department_id
ORDER BY D.department_id;

Sample Output:

DEPARTMENT_ID DEPARTMENT_NAME
------------- ------------------------------
           10 Administration
           20 Marketing
           20 Marketing
           30 Purchasing
           30 Purchasing
           30 Purchasing
           30 Purchasing
           30 Purchasing
           30 Purchasing
           40 Human Resources
           50 Shipping
           50 Shipping
           50 Shipping
           50 Shipping
           50 Shipping
           50 Shipping
           50 Shipping
           50 Shipping
           50 Shipping
           50 Shipping
           50 Shipping
           50 Shipping
           50 Shipping
		   .........
		   .........

The above example shows that a department with N employees will appear in the list N times. To eliminate the duplicate rows, you could use the DISTINCT or GROUP BY keywords .

Example - 2:

The following example returns only one row needs to be returned from the departments table, even though many rows in the employees table might match the subquery. If no index has been defined on the salary column in employees, then a semijoin can be used to improve query performance.

SELECT * FROM departments 
WHERE EXISTS 
(SELECT * FROM employees 
WHERE departments.department_id = employees.department_id 
AND employees.salary >= 3500);

Sample Output:

DEPARTMENT_ID DEPARTMENT_NAME                MANAGER_ID LOCATION_ID
------------- ------------------------------ ---------- -----------
           10 Administration                        200        1700
           20 Marketing                             201        1800
           30 Purchasing                            114        1700
           40 Human Resources                       203        2400
           50 Shipping                              121        1500
           60 IT                                    103        1400
           70 Public Relations                      204        2700
           80 Sales                                 145        2500
           90 Executive                             100        1700
          100 Finance                               108        1700
          110 Accounting                            205        1700

11 rows selected.

Previous: CROSS JOIN
Next: ANTIJOINS