Rerank search (BETA)

A Similarity search query retrieves an extensive list of potentially relevant results from hundreds of millions of documents. Documents are matched and ranked to provide the best possible results quickly. However, some of the results might not be as relevant as expected.

To further improve the quality of results, you can apply a second phase: a re-ranker based on a Cross-Encoder that scores the most promising first-phase candidates again, using the same text provided in the Similarity query.

The re-ranker assigns each text chunk a new relevance value between 0 and 1 and drops the chunks whose relevance value is below the specified rerank_threshold.
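As a rough illustration of the thresholding step, the sketch below applies the same rule on the client side to a list of scored chunks. The `ScoredChunk` class and the `drop_below_threshold` function are hypothetical names used only for this illustration; the actual filtering happens inside the service, and treating a score exactly equal to the threshold as kept is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    """Hypothetical stand-in for a chunk with a re-ranker relevance in [0, 1]."""
    text: str
    relevance: float


def drop_below_threshold(chunks, rerank_threshold):
    """Keep chunks whose relevance is at least rerank_threshold (assumed inclusive)."""
    if not 0.0 <= rerank_threshold <= 1.0:
        raise ValueError("rerank_threshold must be between 0 and 1")
    return [c for c in chunks if c.relevance >= rerank_threshold]


chunks = [
    ScoredChunk("signed a partnership agreement", 0.98),
    ScoredChunk("quarterly earnings call", 0.12),
    ScoredChunk("agreement under discussion", 0.91),
]
kept = drop_below_threshold(chunks, rerank_threshold=0.9)
print([c.text for c in kept])
```

A higher threshold trades recall for precision: with the sample scores above, raising it to 0.95 would keep only the first chunk.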

We recommend specifying a date_range and retrieving many documents or chunks so that all the first-phase chunks pass through the re-ranker. Only the chunks returned after the second phase count toward API query unit usage.

The following example searches the previous week and returns the chunks with a relevance higher than 0.9:

from bigdata_client import Bigdata
from bigdata_client.query import Similarity
from bigdata_client.daterange import RollingDateRange

bigdata = Bigdata()

# Search using Reranker
query = Similarity("partnership agreement")
search = bigdata.search.new(query, rerank_threshold=0.9, date_range=RollingDateRange.LAST_WEEK)
documents = search.run(5)

# Read all retrieved documents and print some details
for doc in documents:
    print(f"\nDocument headline: {doc.headline}")
    print(f"Document timestamp: {doc.timestamp}")
    for chunk in doc.chunks:
        # Print the chunk relevance and text
        print(f"  Chunk relevance: {chunk.relevance}")
        print(f"  Chunk text: {chunk.text}")

Output:

Document headline: PMAESA and RCREEE sign partnership agreement
Document timestamp: 2025-01-24 14:26:34+00:00
Chunk relevance: 0.9964342
Chunk text: The partnership agreement is primarily centered on strengthening cooperation through joint activities focused on sustainable energy practices for sustainable port management as well as supporting development of Green Port technologies and energy efficiency practices within Eastern and Southern African ports.

Document headline: WABE, Appen Media Group/Decaturish sign new partnership agreement
Document timestamp: 2025-01-22 16:57:39+00:00
Chunk relevance: 0.9891816
Chunk text: WABE and Appen Media Group/Decaturish recently signed a new partnership agreement that will mean more collaboration between the two organizations. The two companies are also discussing a multimedia collaboration to be determined.
Chunk relevance: 0.938124
Chunk text: Whisenhunt notes the company's other productive partnership with Atlanta News First, the local CBS affiliate. "Our agreement with ANF and WABE is proof positive that collaborations, when carefully considered and managed, can be mutually beneficial," he said. "Our readers have benefited from the expanded reach we achieved with access to articles provided by our content partners."

Document headline: UAE delegation concludes successful participation at World Economic Forum in Davos
Document timestamp: 2025-01-24 13:42:15+00:00
Chunk relevance: 0.98340684
Chunk text: The UAE and the World Economic Forum signed a partnership agreement to promote global adoption of future technologies. Leveraging the UAE's Centre for the Fourth Industrial Revolution (UAE C4IR), overseen by the Dubai Future Foundation (DFF), this partnership will foster collaboration and development of emerging technologies, solidifying the UAE's role as a global hub for Fourth Industrial Revolution innovation.

Document headline: Clipper Realty Inc. files FORM S-3 on Jan 23, 2025
Document timestamp: 2025-01-23 19:03:11+00:00
Chunk relevance: 0.97640073
Chunk text: Amendments to the Partnership Agreement Amendments to the partnership agreement may only be proposed by the general partner. Generally, the partnership agreement may be amended with the general partner's approval and the approval of the limited partners holding a majority in interests of all limited partners. Certain amendments that would, among other things, have the following effects, must be approved by each partner adversely affected thereby:
Chunk relevance: 0.97334224
Chunk text: Neither the general partner nor its directors and officers are liable to the operating partnership, the limited partners or their assignees for losses sustained, liabilities incurred or benefits not derived as a result of errors in judgment or mistakes of fact or law or of any act or omission, so long as such person acted in good faith. The partnership agreement provides for indemnification of the general partner, its affiliates and each of their respective officers, directors, employees and any persons the general partner may designate from time to time in its sole and absolute discretion, to the fullest extent permitted by applicable law, provided that the operating partnership will not indemnify such person for (i) material acts or omissions that were committed in bad faith or were the result of active and deliberate dishonesty, (ii) any transaction for which such person received an improper personal benefit in money, property or services violation or breach of any provision of the partnership agreement, or (iii) in the case of a criminal proceeding, the person had reasonable cause to believe the act or omission was unlawful, as set forth in the partnership agreement.
Chunk relevance: 0.9548163
Chunk text: Pursuant to the partnership agreement, the general partner is the partnership representative of the operating partnership and has certain other rights relating to tax matters. Accordingly, as both the general partner and partnership representative, we have authority to handle tax audits and to make tax elections under the Code, in each case, on behalf of the operating partnership. The partnership agreement provides that the operating partnership is to be operated in a manner that will enable us to satisfy the requirements for qualification as a REIT for U.S. federal income tax purposes, and ensure that the operating partnership will not be classified as a "publicly traded partnership" taxable as a corporation for purposes of the Code.
Chunk relevance: 0.920218
Chunk text: With certain limited exceptions, the limited partners may not transfer their interests in our operating partnership, in whole or in part, without the general partner's prior written consent, which consent may be withheld in the general partner's sole and absolute discretion. The partnership agreement generally provides that the general partner may cause the operating partnership to make quarterly (or more frequent) distributions of all, or such portion as the general partner may in its sole and absolute discretion determine, of available cash (which is defined to be cash available for distribution as determined by the general partner) pro rata according to the partners' respective percentage interests. The operating partnership also has the ability to grant preferred operating partnership interests, which would be entitled to distributions in accordance with any such preference (and, within each such class, pro rata according to their respective percentage interests).

Document headline: UAE concludes successful participation at WEF 2025
Document timestamp: 2025-01-24 13:13:06+00:00
Chunk relevance: 0.9753431
Chunk text: The UAE and the World Economic Forum signed a partnership agreement to promote global adoption of future technologies. Leveraging the UAE's Centre for the Fourth Industrial Revolution (UAE C4IR), overseen by the Dubai Future Foundation (DFF), this partnership will foster collaboration and development of emerging technologies, solidifying the UAE's role as a global hub for Fourth Industrial Revolution innovation.

Warning

The re-ranker has a timeout of 1 second. If the re-ranker fails (for example, on a timeout), the chunks are still returned with their original first-phase relevance value. We plan to increase the timeout and to add a warning message in the SDK for any re-ranker error.
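Because a failed re-rank currently produces no warning, one possible client-side check is to look for returned chunks whose relevance falls below the requested rerank_threshold. The re-ranker would have dropped those, so their presence suggests that the original scores came back unchanged. This is only a heuristic sketch: `chunks_below_threshold` is a hypothetical helper, not part of the SDK, and it assumes documents expose `headline` and `chunks` with a `relevance` attribute as in the example above. It cannot detect a fallback when every original score happens to exceed the threshold.

```python
from types import SimpleNamespace


def chunks_below_threshold(documents, rerank_threshold):
    """Return (headline, relevance) pairs for chunks scoring under the threshold."""
    return [
        (doc.headline, chunk.relevance)
        for doc in documents
        for chunk in doc.chunks
        if chunk.relevance < rerank_threshold
    ]


# Stand-in objects mimicking the attributes used in the example above.
documents = [
    SimpleNamespace(
        headline="Doc A",
        chunks=[SimpleNamespace(relevance=0.97), SimpleNamespace(relevance=0.41)],
    ),
    SimpleNamespace(headline="Doc B", chunks=[SimpleNamespace(relevance=0.93)]),
]

suspicious = chunks_below_threshold(documents, rerank_threshold=0.9)
if suspicious:
    print(f"Re-ranker may not have been applied: {suspicious}")
```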

Why don’t we use the re-ranker in the first phase directly?

The re-ranker model is optimized to re-rank only a subset of the best candidates from the first phase; it is not built to search millions of documents.
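The trade-off can be sketched with a toy two-phase pipeline: a cheap scorer runs over every document, and a more expensive scorer runs only over the top candidates. Both scoring functions below are simplistic placeholders chosen for illustration (word overlap and an exact-phrase bonus); they are not the models the service uses, and `two_phase_search` is a hypothetical name.

```python
def cheap_score(query, text):
    """First phase: fraction of query words present in the text (fast, coarse)."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)


def expensive_score(query, text):
    """Second-phase placeholder standing in for a Cross-Encoder: rewards the exact phrase."""
    phrase_bonus = 0.5 if query.lower() in text.lower() else 0.0
    return min(1.0, 0.5 * cheap_score(query, text) + phrase_bonus)


def two_phase_search(query, corpus, candidates=3, rerank_threshold=0.9):
    # Phase 1: score the whole corpus cheaply and keep the best candidates.
    first_phase = sorted(corpus, key=lambda t: cheap_score(query, t), reverse=True)[:candidates]
    # Phase 2: run the costly scorer on the candidates only, then apply the threshold.
    rescored = [(expensive_score(query, t), t) for t in first_phase]
    return sorted((pair for pair in rescored if pair[0] >= rerank_threshold), reverse=True)


corpus = [
    "The two firms signed a partnership agreement on Monday",
    "Agreement reached on partnership terms after long talks",
    "Quarterly results beat expectations",
    "A new partnership was announced",
]
results = two_phase_search("partnership agreement", corpus)
print(results)
```

In this sketch the expensive scorer is called at most `candidates` times per query regardless of corpus size, which is the reason for keeping it out of the first phase.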