Similarity Search Demystified

Finding Relevant Needles in the Data Haystack!

Introduction

The Bigdata.com API provides powerful retrieval capabilities, enabling you to search and analyze news articles, transcripts, corporate filings, and other documents. Notably, it supports both keyword-based searches and similarity searches, along with a range of other advanced search features.
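To make the distinction concrete, here is a toy sketch in plain Python. It is not how Bigdata.com computes similarity. The three-dimensional "embeddings" are hand-made, hypothetical vectors standing in for the output of a real embedding model, and keyword_search, cosine and similarity_search are illustrative helpers, not part of bigdata_client. A keyword search only matches documents that share literal terms with the query. A similarity search ranks documents by how close their vectors are to the query vector, here measured with cosine similarity.

```python
import math

# Hypothetical, hand-made 3-d "embeddings"; a real system would use a trained embedding model
DOCS = {
    "Powell signals patience on rate cuts": [0.9, 0.1, 0.2],
    "Central bank weighs price pressures from import duties": [0.8, 0.3, 0.1],
    "Tech stocks rally on earnings": [0.1, 0.9, 0.3],
}
QUERY_TEXT = "Fed addresses inflation amid tariff concerns"
QUERY_VEC = [0.85, 0.2, 0.15]  # hypothetical embedding of QUERY_TEXT


def keyword_search(query, docs):
    """Return titles that share at least one lowercase word with the query."""
    terms = set(query.lower().split())
    return [title for title in docs if terms & set(title.lower().split())]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def similarity_search(query_vec, docs):
    """Rank every title by cosine similarity to the query vector, best first."""
    scored = [(cosine(query_vec, vec), title) for title, vec in docs.items()]
    return [title for _, title in sorted(scored, reverse=True)]


print(keyword_search(QUERY_TEXT, DOCS))       # no title shares a literal word with the query
print(similarity_search(QUERY_VEC, DOCS))     # the two Fed-related titles rank first
```

None of the toy titles contains a word from the query, so the keyword search finds nothing, while the vector ranking still places the two central-bank headlines ahead of the unrelated one.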

In this notebook, we’ll demonstrate how to use the Bigdata.com API to perform a similarity search effectively.

# Import required modules and classes
import html

from IPython.display import display, HTML
from bigdata_client import Bigdata
from bigdata_client.daterange import RollingDateRange
from bigdata_client.models.advanced_search_query import Similarity
from bigdata_client.models.search import DocumentType, SortBy

# Initialize the Bigdata client
# Make sure BIGDATA_USERNAME and BIGDATA_PASSWORD are set in the environment
# Alternatively, you can pass your credentials directly to the Bigdata class
bigdata = Bigdata()
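If the credential variables are missing, the client has nothing to authenticate with. A quick pre-flight check, run before creating the client, makes that problem obvious. The sketch below uses only the standard library; the variable names come from the comment above, while missing_credentials is a hypothetical helper, not part of bigdata_client, and it never contacts the API.

```python
import os

# Variable names taken from the setup comment above
REQUIRED_VARS = ("BIGDATA_USERNAME", "BIGDATA_PASSWORD")


def missing_credentials(env=None):
    """Return the names of required credential variables that are unset or empty.

    Hypothetical helper: it only inspects a mapping (os.environ by default).
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Set these environment variables before creating the client:", ", ".join(missing))
else:
    print("Credentials found in the environment.")
```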

Helper Functions

We define two helper functions: escape_special_chars makes text safe to embed in HTML, and print_results_html renders the search results as formatted HTML cards:

def escape_special_chars(text):
    """Escapes special characters for safe HTML display."""
    text = html.escape(text)  # Escapes HTML special characters like <, >, &
    # text = text.replace(r"$", r"\$")  # Escape the dollar sign properly
    text = text.replace("  ", "&nbsp;&nbsp;")  # Preserve double spaces
    return text


def print_results_html(results):
    """Renders search results as styled HTML cards in the notebook."""
    html_output = """
    <style>
        .results-container {
            font-family: Arial, sans-serif;
            background: #1e1e1e;
            color: white;
            padding: 20px;
            border-radius: 10px;
            max-width: 800px;
            margin: auto;
            box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.5);
        }
        .result-card {
            border: 1px solid #444;
            padding: 15px;
            margin: 15px 0;
            border-radius: 8px;
            background: #2a2a2a;
            transition: transform 0.2s, box-shadow 0.2s;
        }
        .result-card:hover {
            transform: scale(1.02);
            box-shadow: 0px 4px 10px rgba(255, 255, 255, 0.1);
        }
        .rank-container {
            display: flex;
            gap: 10px; /* Space between rank bubbles */
            align-items: center;
            margin-bottom: 10px;
        }
        .rank-badge {
            font-weight: bold;
            font-size: 16px;
            padding: 6px 12px;
            border-radius: 20px;
            display: inline-block;
            color: white;
        }
        .badge-blue {
            background: #1E88E5;
        }
        .headline {
            font-size: 20px;
            font-weight: bold;
            color: #ffcc00;
        }
        .timestamp {
            font-size: 14px;
            color: #cccccc;
        }
        .text {
            font-size: 16px;
            line-height: 1.6;
            color: #dddddd;
        }
    </style>
    <div class='results-container'>
    """

    for idx, document in enumerate(results, 1):
        # Extract the fields shown on each card (headline, timestamp, first chunk)
        headline = escape_special_chars(document.headline.title())
        timestamp = document.timestamp.strftime("%Y-%m-%d %H:%M:%S")
        relevance = round(document.chunks[0].relevance, 2)
        first_chunk_text = escape_special_chars(document.chunks[0].text)

        html_output += f"""
        <div class='result-card'>
            <div class='rank-container'>
                <div class='rank-badge badge-blue'>{('📕📗📘' * idx)[:idx]}</div>
            </div>
            <div class='headline'>{headline}</div>
            <div class='timestamp'><strong>Timestamp:</strong> {timestamp}</div>
            <div class='relevance'><strong>📘 Relevance:</strong> {relevance}</div>
            <div class='text'>{first_chunk_text}</div>
        </div>
        """

    html_output += "</div>"

    display(HTML(html_output))
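The two string transformations these helpers rely on can be checked on their own, without IPython or any search results. The sketch below wraps the badge expression from print_results_html in a small function (rank_badge is a hypothetical name introduced here) and shows what html.escape, the standard-library call behind escape_special_chars, does to a headline containing markup characters.

```python
import html


def rank_badge(idx: int) -> str:
    """Repeat the three book emojis and keep the first `idx` of them (same expression as the helper)."""
    return ("📕📗📘" * idx)[:idx]


# The badge grows by one emoji per rank and cycles red, green, blue
for rank in (1, 2, 3, 4):
    print(rank, rank_badge(rank))

# html.escape turns <, >, & and (by default) quotes into entities,
# so a headline cannot inject markup into the rendered card
print(html.escape("Fed's <b>tariff</b> plan & rates"))
```

Rank 4 yields 📕📗📘📕, matching the fourth card in the output below, and the apostrophe in "Fed's" becomes the entity &#x27;, which the browser displays as a normal apostrophe.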

Define Search Query and Parameters

We define our search parameters: the query, the time period, the rerank threshold, and the maximum number of documents to retrieve. In this example, we search for articles about the Federal Reserve’s response to inflation amid concerns about tariffs.

# Create a similarity search query
query = Similarity('Fed addresses inflation amid tariff concerns')

# Search within a rolling time window covering the last week
DATE_RANGE = RollingDateRange.LAST_WEEK

# Set the rerank threshold to improve search relevance
RERANK_THRESHOLD = 0.85

# Set the maximum number of documents to retrieve
DOCUMENT_LIMIT = 10
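RollingDateRange.LAST_WEEK is a relative range, so the dates it covers depend on when the search runs; re-running the notebook later should return newer articles. The sketch below illustrates the idea with the standard library only, assuming the window is the seven days ending at query time. The exact boundaries the API uses (trailing days or calendar week, time zone) are an assumption here and worth checking in the documentation, and rolling_window is a hypothetical helper, not part of bigdata_client.

```python
from datetime import datetime, timedelta, timezone


def rolling_window(days, now=None):
    """Return (start, end) for a trailing window of `days` days ending at `now` (UTC).

    Hypothetical helper: it assumes a trailing-days definition, which may differ
    from how RollingDateRange.LAST_WEEK is resolved by the API.
    """
    end = datetime.now(timezone.utc) if now is None else now
    return end - timedelta(days=days), end


# The same "last week" covers different dates depending on when it is evaluated
feb = rolling_window(7, now=datetime(2025, 2, 13, 12, 0, tzinfo=timezone.utc))
mar = rolling_window(7, now=datetime(2025, 3, 13, 12, 0, tzinfo=timezone.utc))
for start, end in (feb, mar):
    print(start.date(), "->", end.date())
```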

Execute Search

We now run the search using the specified parameters.

One of the key features of the Bigdata API is the ability to rerank search results by relevance. Reranking uses a cross-encoder, a model that reads the query and each candidate passage together and scores how well they match, which helps you find the most relevant documents quickly. You can read more about the reranking feature in the Bigdata.com API documentation.

We activate this feature by setting the rerank_threshold parameter:

# Configure the search with the specified parameters
search = bigdata.search.new(
    query=query,
    date_range=DATE_RANGE,
    rerank_threshold=RERANK_THRESHOLD,
    scope=DocumentType.NEWS,  # Limit to news articles
    sortby=SortBy.RELEVANCE  # Sort by relevance score
)

# Run the search and get results
results = search.run(DOCUMENT_LIMIT)
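Conceptually, thresholded reranking can be pictured as filter, sort, truncate: each candidate gets a relevance score, candidates below rerank_threshold are dropped, and the rest are ordered by score and cut to the document limit. The sketch below mimics only that bookkeeping. The scores are made-up numbers standing in for cross-encoder outputs, rerank_and_limit and ScoredChunk are hypothetical names, and whether the API keeps scores exactly equal to the threshold is an assumption; the real scoring and filtering happen server-side.

```python
from typing import NamedTuple


class ScoredChunk(NamedTuple):
    headline: str
    score: float  # stand-in for a cross-encoder relevance score in [0, 1]


def rerank_and_limit(chunks, threshold, limit):
    """Drop chunks scoring below `threshold`, sort the rest high to low, keep at most `limit`."""
    kept = [c for c in chunks if c.score >= threshold]  # boundary handling is an assumption
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept[:limit]


# Made-up candidates and scores, for illustration only
candidates = [
    ScoredChunk("Fed weighs tariff-driven inflation", 0.97),
    ScoredChunk("Dollar rises on tariff headlines", 0.88),
    ScoredChunk("Oil prices slip on demand worries", 0.41),
    ScoredChunk("Powell testifies on rate path", 0.93),
]

for chunk in rerank_and_limit(candidates, threshold=0.85, limit=2):
    print(f"{chunk.score:.2f}  {chunk.headline}")
```

A higher threshold trades recall for precision: with 0.85, the off-topic oil headline never reaches the results, no matter how large the document limit is.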

Display Results

Now that we have the search results, we can display them in a readable format:

print_results_html(results)

📕

Tariffs Could Factor Into Fed'S Rate-Cut Plans Amid Inflation Concerns, Experts Say

Timestamp: 2025-02-12 23:32:00

📘 Relevance: 1.0

Federal Reserve Chair Jerome Powell testified before the House Financial Services Committee on Wednesday and was asked about the impact of tariffs on Americans' cost of living and the central bank's efforts to tame inflation, and the chairman noted that the Fed doesn't comment on policy decisions it doesn't have discretion over.

📕📗

Dollar Strengthens Amid Tariff Announcements, Stocks Rise

Timestamp: 2025-02-10 16:52:57

📘 Relevance: 1.0

Amidst the market stir, economists express concerns about renewed inflation pressures, potentially affecting Federal Reserve's rate flexibility. With Fed Chair Jerome Powell to speak soon, investors keenly await insights into monetary policy directions amid these tariff tensions.

📕📗📘

Market Jitters Amid Inflation Concerns And Tariff Talks

Timestamp: 2025-02-13 12:53:11

📘 Relevance: 0.99

The latest consumer price index reflects the largest inflation rise in 18 months, reinforcing the Federal Reserve's stance against immediate interest rate cuts. This sentiment is echoed by BNP Paribas economists, who note the persistent inflation concerns amid rising labor demand and anticipated tariff impacts.

📕📗📘📕

Us Inflation Worsened Last Month With Cost Of Groceries And Gasoline Rising The Trend Will Likely Underscore The Federal Reserve'S Resolve To Delay Any Further Interest Rate Cuts

Timestamp: 2025-02-12 15:44:20

📘 Relevance: 0.99

On Tuesday, Fed Chair Powell acknowledged that higher tariffs could lift inflation and limit the central bank's ability to cut rates, calling it "a possible outcome." But he emphasized that it would depend on how many imports are hit with tariffs and for how long.

📕📗📘📕📗

Dollar Gains As Trump Warns Of New Tariffs

Timestamp: 2025-02-10 20:17:36

📘 Relevance: 0.99

Analysts express concerns over potential U.S. inflation pressures due to these tariffs, possibly affecting the Federal Reserve's rate cut options. Fed Chair Jerome Powell's upcoming testimony is anticipated for insights into monetary policy adjustments regarding tariffs and inflation prospects.

📕📗📘📕📗📘

Fed Faces Thorny Decisions As It Weighs When To Lower Interest Rates Amid Trump'S Tariffs

Timestamp: 2025-02-11 10:24:31

📘 Relevance: 0.99

"This is not what an overheating economy looks like to me," he said. Fed Governor Christopher Waller, considered a more "hawkish" official who's more apt to keep rates high to fight inflation, said he's not worried about tariffs because they represent a "one-time" increase in prices. In other words, annual inflation should return to where it was the following year. Also, the Fed's rates are aimed at reducing inflation by cooling consumer demand for products and services. Waller's view is consistent with the idea that keeping rates high may not be relevant in this case because it's tariffs that are raising prices, not consumer spending, Goolsbee said.

📕📗📘📕📗📘📕

U.S. Inflation Worsened Last Month With Cost Of Groceries And Gasoline Rising

Timestamp: 2025-02-12 14:06:35

📘 Relevance: 0.99

📕📗📘📕📗📘📕📗

Inflation Fears Grow With Tariff Uncertainty Looming

Timestamp: 2025-02-10 19:45:14

📘 Relevance: 0.98

Several Fed officials have said in recent weeks that the central bank's response to higher prices resulting from tariffs will depend on whether inflation expectations remain well anchored. U.S. consumers' long-term inflation expectations edged higher in January ahead of tariff announcements by the Trump administration, a monthly Federal Reserve Bank of New York survey showed. Expected inflation five years ahead rose to 3% last month, the highest since May 2024, according to results of the New York Fed's Survey of Consumer Expectations published Feb. 10. Expected inflation rates over the next year and three years ahead were both unchanged from December at 3%.

📕📗📘📕📗📘📕📗📘

Commodity Prices Mixed As Trump'S Tariffs Fuel Inflation Fears

Timestamp: 2025-02-11 16:05:50

📘 Relevance: 0.98

New York: Commodity prices showed a mixed trend last week as US President Donald Trump's escalating tariff policies fueled inflation concerns, casting uncertainty over global markets. According to Anadolu Agency, investors are closely watching the upcoming US macroeconomic data, including inflation figures and Federal Reserve Chair Jerome Powell's statements, amid concerns that the Fed's monetary policy may not align with Trump's trade strategies. Analysts note that while the US is avoiding a trade war with Canada and Mexico, tensions with China continue to rise, heightening fears of inflationary pressure.

📕📗📘📕📗📘📕📗📘📕

U.S. Inflation Increases To 3 Percent, Groceries And Gasoline Prices Heading Higher

Timestamp: 2025-02-12 16:16:23

📘 Relevance: 0.98
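Outside Jupyter, where IPython's HTML display is unavailable, a plain-text summary may be more convenient. The sketch below reads the same attributes print_results_html uses (headline, timestamp, and the first chunk's relevance and text) and, unlike the HTML helper, leaves the headline's original casing untouched. results_to_text is a hypothetical helper; to keep it runnable without API access, it is exercised on a stand-in object built with types.SimpleNamespace, and with real data you would pass the results list returned by search.run.

```python
from datetime import datetime
from types import SimpleNamespace


def results_to_text(results, max_chars=120):
    """Summarize each document as 'rank. [relevance] timestamp headline' plus a truncated snippet."""
    lines = []
    for idx, doc in enumerate(results, 1):
        chunk = doc.chunks[0]  # same assumption as print_results_html: at least one chunk
        text = chunk.text
        snippet = text if len(text) <= max_chars else text[:max_chars].rstrip() + "..."
        lines.append(f"{idx}. [{chunk.relevance:.2f}] {doc.timestamp:%Y-%m-%d %H:%M} {doc.headline}")
        lines.append(f"   {snippet}")
    return "\n".join(lines)


# Hypothetical stand-in for a search result, mimicking only the attributes used above
fake_doc = SimpleNamespace(
    headline="Example headline about the Fed and tariffs",
    timestamp=datetime(2025, 2, 12, 23, 32),
    chunks=[SimpleNamespace(relevance=0.987, text="Example chunk text.")],
)
print(results_to_text([fake_doc]))
```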

Conclusion

For more details, refer to the official Bigdata.com API documentation, which describes many more filters you can apply to narrow down your search results.

Happy Searching! 🚀