Patent US8046372 - Duplicate entry detection system and method
http://www.google.de/patents/US8046372?utm_source=gb-gplus-share

A computer system and method for determining whether the subject matter described in a received document is substantially similar to the subject matter of other documents in a document corpus, such that the received document can be considered a duplicate document. After receiving a first document, a...
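
The abstract is cut off before it describes how similarity is actually determined, so the following is only an illustrative sketch of one common way to test whether a received document is "substantially similar" to documents in a corpus. It is not the method claimed in the patent. The technique (Jaccard similarity over word shingles), the function names `shingles`, `jaccard` and `find_duplicates`, the shingle size `k=3` and the `threshold=0.5` cutoff are all assumptions chosen for illustration.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles of a text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Texts shorter than k words yield a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicates(
    received: str,
    corpus: dict[str, str],
    threshold: float = 0.5,  # assumed cutoff, not taken from the patent
) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs at or above the threshold, best match first."""
    target = shingles(received)
    scored = ((doc_id, jaccard(target, shingles(text))) for doc_id, text in corpus.items())
    hits = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    corpus = {
        "a": "The printer fails to start after the latest firmware update",
        "b": "Quarterly sales figures for the northern region",
    }
    received = "the printer fails to start after the latest firmware update!"
    print(find_duplicates(received, corpus))
```

Under these assumptions, a received document would be treated as a duplicate when `find_duplicates` returns at least one match. Shingle overlap only captures near-identical wording; a system comparing subject matter, as the abstract describes, may well rely on a different similarity measure.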