Unlocking Upserts in PostgreSQL: A Guide to RETURNING and ON CONFLICT
Understanding the Concepts:
- SQL (Structured Query Language): A standardized language for interacting with relational databases, including creating, retrieving, updating, and deleting data.
- PostgreSQL: A powerful, open-source object-relational database management system (DBMS) known for its extensibility and advanced features.
- Upsert (UPDATE or INSERT): A database operation that combines insert and update functionality. It attempts to insert a new row, but if a conflict occurs due to a unique constraint violation, it updates the existing row instead.
RETURNING Clause:
- The
RETURNING
clause in PostgreSQL allows you to specify columns or expressions to be computed and returned after anINSERT
statement is executed. - This is particularly useful for obtaining values generated during the insert, such as those assigned by serial sequences or default values.
ON CONFLICT Clause:
- The
ON CONFLICT
clause, introduced in PostgreSQL 9.5, provides a way to define how the database should handle potential conflicts during insert operations. - It lets you specify an action to be taken when a unique constraint violation is encountered:
DO NOTHING
: The insert is skipped for the conflicting row(s).DO UPDATE SET ...
: The existing row is updated with the values from the attempted insert. You can use excluded columns (values from the attempted insert) to set new values in the update.
By combining RETURNING
with ON CONFLICT DO UPDATE
, you can achieve upsert behavior in PostgreSQL. Here's how it works:
- Insert Attempt: You attempt to insert a new row into a table.
- Conflict Check: The database checks if the insert would violate a unique constraint.
- Conflict Resolution:
- No Conflict: If there's no conflict, the row is inserted, and the
RETURNING
clause (if present) computes and returns the specified values. - Conflict Occurs: If a conflict occurs, the
ON CONFLICT DO UPDATE
clause takes effect. The existing row that caused the conflict is updated with the values from the attempted insert. You can use theEXCLUDED
pseudo-table to access the values from the attempted insert within the update. - The
RETURNING
clause, if present, still computes and returns values, even for updated rows.
- No Conflict: If there's no conflict, the row is inserted, and the
Example:
INSERT INTO users (username, email)
VALUES ('john_doe', '[email protected]')
ON CONFLICT (username) DO UPDATE SET email = EXCLUDED.email
RETURNING id, username;
In this example:
- We attempt to insert a new user with
username
'john_doe' andemail
'[email protected]'. - If the username already exists, the existing row is updated to set the
email
to the value from the attempted insert (usingEXCLUDED.email
). - The
RETURNING
clause returns theid
andusername
of the inserted or updated row, allowing you to retrieve the newly created or modified user information.
Benefits:
- Upserts can simplify your code by handling both inserts and updates in a single statement.
RETURNING
provides valuable information about the inserted or updated row, like automatically generated IDs.
I hope this explanation clarifies how RETURNING
with ON CONFLICT
is used for upserts in PostgreSQL!
Inserting a new row and returning its ID:
INSERT INTO products (name, price)
VALUES ('T-Shirt', 19.99)
RETURNING id;
This code attempts to insert a new product with name 'T-Shirt' and price 19.99. If successful, it returns the automatically generated ID of the new product row.
Updating an existing row and returning the updated data:
UPDATE customers
SET email = '[email protected]'
WHERE id = 123
RETURNING id, name, email;
This code updates the email address of the customer with ID 123 to '[email protected]'. It then returns the updated customer information (id
, name
, and email
).
Upsert with conflict resolution:
INSERT INTO orders (customer_id, product_id, quantity)
VALUES (1, 2, 3)
ON CONFLICT (customer_id, product_id) DO UPDATE SET quantity = quantity + EXCLUDED.quantity
RETURNING id, customer_id, product_id, quantity;
This code attempts to insert a new order with customer_id
1, product_id
2, and quantity
3. However, if an order for the same customer and product already exists, the ON CONFLICT
clause triggers an update instead. The update increments the existing quantity
by the quantity from the attempted insert (using EXCLUDED.quantity
). Finally, the RETURNING
clause returns information about the inserted or updated order, including the adjusted quantity.
Separate INSERT and UPDATE Statements (Simple Cases):
- If you don't need to return any specific information after the operation, a two-step approach can be used:
- First, attempt an
INSERT
statement. - If a conflict occurs (unique constraint violation), catch the error and perform an
UPDATE
statement.
- First, attempt an
This is suitable for simpler scenarios where you only care about successful insertion or update, and returning data is not required.
INSERT with DO NOTHING (Ignoring Conflicts):
- The
ON CONFLICT DO NOTHING
clause allows you to simply skip conflicting inserts.
This method works if you only want to insert new rows and don't need to update existing ones that violate unique constraints.
Conditional Logic with INSERT and SELECT (Checking for Existing Rows):
- You can use a
SELECT
statement to check if a row with the desired values already exists. - If it doesn't exist, perform an
INSERT
statement.
This approach involves more code but can be useful when you need to perform additional actions based on whether an insert is successful or not.
Triggers (Advanced Conflict Handling):
- PostgreSQL triggers allow you to define custom logic that fires before or after specific database operations like inserts.
- You can create a trigger that checks for conflicts and performs custom actions like updates, logging, or sending notifications.
Triggers offer greater flexibility but require more complex code and careful design to avoid unintended side effects.
Choosing the Right Method:
The best method depends on your specific requirements. Consider factors like:
- Need to return data about inserted or updated rows.
- How you want to handle conflicts (skip, update, etc.).
- Complexity of conflict resolution logic.
- Performance considerations (separate statements might be faster for simple cases).
For upserts with data retrieval and clear update logic, RETURNING
with ON CONFLICT
is a strong choice. For simpler scenarios or when advanced conflict handling is needed, consider alternative approaches like separate statements or triggers.
sql postgresql upsert