Detect Duplicates in Postgres with Cartesian Product

Brett Cassette
InstructorBrett Cassette
Share this video with your friends

Social Share Links

Send Tweet
Published 7 years ago
Updated 5 years ago

The cartesian product is a humdinger -- it relates all data in one table to all data in another! Now that’s a lot of rows, and fast. Surprisingly, cartesian products are as fast as inner joins at duplicate detection, thanks to the magic of the query planner!

[00:00] Let's look at an inner join. We'll inner join backers when projects.ID equal backers.projectID. This gives us all the projects where there's a backer for that project. A cross join doesn't need a clause. It takes every single project and matches it to every single backer regardless of whether or not they backed that project.

[00:30] Here we see the second project matched again to every single backer, and again, and again. This gives us the entire Cartesian product of the projects table matched up with the backers table. We could even cross join onto ourselves. P1 cross joins onto P2.

[00:53] Maybe we'd use this to detect some duplicates. Perhaps, where the P1.ID is not equal to the P2.ID and the P1.name is equal to the P2.name. This shows us two different projects that have the same name.

egghead
egghead
~ 32 minutes ago

Member comments are a way for members to communicate, interact, and ask questions about a lesson.

The instructor or someone from the community might respond to your question Here are a few basic guidelines to commenting on egghead.io

Be on-Topic

Comments are for discussing a lesson. If you're having a general issue with the website functionality, please contact us at support@egghead.io.

Avoid meta-discussion

  • This was great!
  • This was horrible!
  • I didn't like this because it didn't match my skill level.
  • +1 It will likely be deleted as spam.

Code Problems?

Should be accompanied by code! Codesandbox or Stackblitz provide a way to share code and discuss it in context

Details and Context

Vague question? Vague answer. Any details and context you can provide will lure more interesting answers!

Markdown supported.
Become a member to join the discussionEnroll Today