Oracle Semijoins
What is Semijoins in Oracle?
A semi-join between two tables returns rows that match an EXISTS subquery without duplicating rows from the left side of the predicate when multiple rows on the right side satisfy the criteria of the subquery.
- Semi-joins are written using EXISTS or IN.
- A semi-join can be performed using the nested loops, hash join, or merge join algorithms
- Semijoin and antijoin transformation cannot be done if the subquery is on an OR branch of the WHERE clause.
Difference between a semi-join and a conventional join
The main difference between a semi-join and a conventional join is that rows in the table, left side of the predicate will be returned at most once. Even if the table right side of predicate contains two matches for a row in the table left side of the predicate, only one copy of the row will be returned.
Sample table: employees
Sample table: departments
Example: Oracle Semijoins
The following sql statement returns a list of departments with at least one employee.
SELECT D.department_id, D.department_name
FROM departments D
WHERE EXISTS
(SELECT 1
FROM employees E
WHERE E.department_id = D.department_id)
ORDER BY D.department_id;
Sample Output:
DEPARTMENT_ID DEPARTMENT_NAME ------------- ------------------------------ 10 Administration 20 Marketing 30 Purchasing 40 Human Resources 50 Shipping 60 IT 70 Public Relations 80 Sales 90 Executive 100 Finance 110 Accounting 11 rows selected.
The above results show that no department appears more than once. Oracle stops processing each department as soon as the first employee in that department is found.
Here is the Query if we wrote with a conventional join:
The department name in the following query result will appear as many times as the number of employees in it. So, for example, if a department has 20 employees then that department will appear in the query output 20 times.
To eliminate the duplicate rows, you could use the DISTINCT or GROUP BY keywords. A more elegant solution is to use a semi-join between the departments and employees tables instead of a conventional join:
SELECT D.department_id, D.department_name
FROM departments D, employees E
WHERE E.department_id = D.department_id
ORDER BY D.department_id;
Sample Output:
DEPARTMENT_ID DEPARTMENT_NAME ------------- ------------------------------ 10 Administration 20 Marketing 20 Marketing 30 Purchasing 30 Purchasing 30 Purchasing 30 Purchasing 30 Purchasing 30 Purchasing 40 Human Resources 50 Shipping 50 Shipping 50 Shipping 50 Shipping 50 Shipping 50 Shipping 50 Shipping 50 Shipping 50 Shipping 50 Shipping 50 Shipping 50 Shipping 50 Shipping ......... .........
The above example shows that a department with N employees will appear in the list N times. To eliminate the duplicate rows, you could use the DISTINCT or GROUP BY keywords .
Example - 2:
The following example returns only one row needs to be returned from the departments table, even though many rows in the employees table might match the subquery. If no index has been defined on the salary column in employees, then a semijoin can be used to improve query performance.
SELECT * FROM departments
WHERE EXISTS
(SELECT * FROM employees
WHERE departments.department_id = employees.department_id
AND employees.salary >= 3500);
Sample Output:
DEPARTMENT_ID DEPARTMENT_NAME MANAGER_ID LOCATION_ID ------------- ------------------------------ ---------- ----------- 10 Administration 200 1700 20 Marketing 201 1800 30 Purchasing 114 1700 40 Human Resources 203 2400 50 Shipping 121 1500 60 IT 103 1400 70 Public Relations 204 2700 80 Sales 145 2500 90 Executive 100 1700 100 Finance 108 1700 110 Accounting 205 1700 11 rows selected.
Previous:
CROSS JOIN
Next:
ANTIJOINS
- Weekly Trends and Language Statistics
- Weekly Trends and Language Statistics