Themes (Experimental)

class ThemeTree

A hierarchical tree structure rooted in a main theme, branching into distinct sub-themes that guide the analyst’s research process.

Each node in the tree provides a unique identifier, a descriptive label, and a summary explaining its relevance.

Parameters:
  • label (str) – The name of the theme or sub-theme.

  • node (int) – A unique identifier for the node.

  • summary (str) – A brief explanation of the node’s relevance. For the root node (main theme), this describes the overall theme; for sub-nodes, it explains their connection to the parent theme.

  • children (Optional[List[ThemeTree]]) – A list of child nodes representing sub-themes.
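The four documented fields can be pictured as a small recursive record. The sketch below is a hypothetical stand-in (`ThemeNodeSketch` is not part of the library, and the labels and summaries are made-up illustration data); it only mirrors the parameter list above and does not reproduce the actual `ThemeTree` implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThemeNodeSketch:
    """Hypothetical stand-in mirroring the documented ThemeTree fields."""

    label: str                       # name of the theme or sub-theme
    node: int                        # unique identifier for the node
    summary: str                     # why this node is relevant
    children: Optional[List["ThemeNodeSketch"]] = None  # sub-themes, if any


# Illustrative data only: a main theme with two sub-themes.
root = ThemeNodeSketch(
    label="Renewable Energy",
    node=1,
    summary="Overall theme: the shift toward renewable power sources.",
    children=[
        ThemeNodeSketch(
            label="Solar",
            node=2,
            summary="Photovoltaic generation as a branch of renewable energy.",
        ),
        ThemeNodeSketch(
            label="Wind",
            node=3,
            summary="Onshore and offshore wind as a branch of renewable energy.",
        ),
    ],
)

print([child.label for child in root.children])
```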

static from_dict(tree_dict)

Create a ThemeTree object from a dictionary.

Parameters:

tree_dict (dict) –

A dictionary representing the ThemeTree structure with the following keys:

  • label (str): The name of the theme or sub-theme.

  • node (int): A unique identifier for the node.

  • summary (str): A brief explanation of the node’s relevance.

  • children (list, optional): A list of dictionaries representing sub-themes, each following the same structure.

Returns:

The ThemeTree object generated from the dictionary.

Return type:

ThemeTree
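To illustrate the dictionary shape described above, here is a minimal sketch of a recursive constructor. `ThemeNodeSketch` is a hypothetical stand-in class, and treating a missing `children` key as "no children" is an assumption based on the key being documented as optional; the library's own `from_dict` may behave differently in edge cases.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ThemeNodeSketch:
    """Hypothetical stand-in, not the library's ThemeTree."""

    label: str
    node: int
    summary: str
    children: List["ThemeNodeSketch"] = field(default_factory=list)

    @staticmethod
    def from_dict(tree_dict: Dict[str, Any]) -> "ThemeNodeSketch":
        # "children" is optional, so a missing or None value means a leaf node.
        raw_children = tree_dict.get("children") or []
        return ThemeNodeSketch(
            label=tree_dict["label"],
            node=tree_dict["node"],
            summary=tree_dict["summary"],
            children=[ThemeNodeSketch.from_dict(child) for child in raw_children],
        )


# Illustrative input following the documented keys.
tree_dict = {
    "label": "Electric Vehicles",
    "node": 1,
    "summary": "Overall theme: adoption of electric vehicles.",
    "children": [
        {"label": "Batteries", "node": 2, "summary": "Cell technology that powers EVs."},
        {"label": "Charging", "node": 3, "summary": "Infrastructure needed to recharge EVs."},
    ],
}

tree = ThemeNodeSketch.from_dict(tree_dict)
print(tree.label, [child.label for child in tree.children])
```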

get_label_summaries()

Extract the label summaries from the tree.

Returns:

Dictionary with all the labels of the ThemeTree as keys and their associated summaries as values.

Return type:

dict[str, str]
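The label-to-summary mapping can be sketched as a recursive walk over the nested dictionary form of a tree. `collect_label_summaries` is a hypothetical helper written for this example, not the library's method; note that in this sketch a repeated label would overwrite an earlier entry.

```python
from typing import Any, Dict


def collect_label_summaries(tree: Dict[str, Any]) -> Dict[str, str]:
    """Map every label in the tree (root included) to its summary."""
    result = {tree["label"]: tree["summary"]}
    for child in tree.get("children") or []:
        result.update(collect_label_summaries(child))
    return result


# Illustrative data only.
tree = {
    "label": "Cloud Computing",
    "node": 1,
    "summary": "Overall theme.",
    "children": [
        {"label": "Storage", "node": 2, "summary": "Object and block storage."},
        {"label": "Compute", "node": 3, "summary": "Virtual machines and containers."},
    ],
}

label_summaries = collect_label_summaries(tree)
print(label_summaries)
```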

get_summaries()

Extract the node summaries from a ThemeTree.

Returns:

List of all ‘summary’ values in the tree, including its children.

Return type:

list[str]
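As a rough illustration, the summaries can be gathered with a depth-first walk. The helper name `collect_summaries` is hypothetical, and the parent-before-children ordering is an assumption; the documentation above does not state the order in which the library returns summaries.

```python
from typing import Any, Dict, List


def collect_summaries(tree: Dict[str, Any]) -> List[str]:
    """Return the summary of the node followed by those of its descendants."""
    summaries = [tree["summary"]]
    for child in tree.get("children") or []:
        summaries.extend(collect_summaries(child))
    return summaries


# Illustrative data only.
tree = {
    "label": "Root",
    "node": 1,
    "summary": "root summary",
    "children": [
        {
            "label": "A",
            "node": 2,
            "summary": "summary A",
            "children": [{"label": "A1", "node": 3, "summary": "summary A1"}],
        },
        {"label": "B", "node": 4, "summary": "summary B"},
    ],
}

summaries = collect_summaries(tree)
print(summaries)
```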

get_terminal_label_summaries()

Extract the summaries from terminal nodes of the tree.

Returns:

Dictionary with the labels of the terminal (leaf) nodes of the ThemeTree as keys and their associated summaries as values.

Return type:

dict[str, str]
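The difference from the full label mapping is that only nodes without children contribute. A minimal sketch, assuming a terminal node is one whose `children` entry is missing or empty (`collect_terminal_label_summaries` is a hypothetical helper, not the library's method):

```python
from typing import Any, Dict


def collect_terminal_label_summaries(tree: Dict[str, Any]) -> Dict[str, str]:
    """Map the label of each leaf node to its summary."""
    children = tree.get("children") or []
    if not children:
        # No sub-themes: this node is terminal, so it contributes itself.
        return {tree["label"]: tree["summary"]}
    result: Dict[str, str] = {}
    for child in children:
        result.update(collect_terminal_label_summaries(child))
    return result


# Illustrative data only.
tree = {
    "label": "Root",
    "node": 1,
    "summary": "root summary",
    "children": [
        {
            "label": "A",
            "node": 2,
            "summary": "summary A",
            "children": [{"label": "A1", "node": 3, "summary": "summary A1"}],
        },
        {"label": "B", "node": 4, "summary": "summary B"},
    ],
}

terminal = collect_terminal_label_summaries(tree)
print(terminal)
```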

print(prefix='')

Print the tree.

Parameters:

prefix (str) – Prefix to add to each branch, if any.

Returns:

None.

Return type:

None
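The exact output format is not documented here, so the following is only a sketch of how a prefix can be threaded through a recursive tree printer. The connector characters and the helper names `render_branches` and `print_tree` are assumptions made for this example.

```python
from typing import Any, Dict, List


def render_branches(node: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return one line per descendant, each starting with the given prefix."""
    lines: List[str] = []
    children = node.get("children") or []
    for index, child in enumerate(children):
        is_last = index == len(children) - 1
        connector = "└── " if is_last else "├── "
        lines.append(prefix + connector + child["label"])
        # Deeper levels extend the prefix so the branches stay aligned.
        extension = "    " if is_last else "│   "
        lines.extend(render_branches(child, prefix + extension))
    return lines


def print_tree(tree: Dict[str, Any], prefix: str = "") -> None:
    """Print the root label, then every branch below it."""
    print(prefix + tree["label"])
    for line in render_branches(tree, prefix):
        print(line)


# Illustrative data only.
tree = {
    "label": "Energy",
    "children": [
        {"label": "Solar", "children": [{"label": "Rooftop"}]},
        {"label": "Wind"},
    ],
}

print_tree(tree)
```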

visualize()

Visualize the tree as a Plotly treemap.

Returns:

None. Displays the tree visualization as a Plotly figure.

Return type:

None
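A Plotly treemap is typically described by parallel lists of labels and parent labels, with an empty string as the parent of the root. The sketch below shows one way to flatten a nested tree into that shape; `treemap_columns` is a hypothetical helper, and how the library actually builds its figure is not documented here.

```python
from typing import Any, Dict, List, Tuple


def treemap_columns(
    tree: Dict[str, Any], parent: str = ""
) -> Tuple[List[str], List[str]]:
    """Flatten a nested tree into parallel (labels, parents) lists."""
    labels = [tree["label"]]
    parents = [parent]  # the root has the empty string as its parent
    for child in tree.get("children") or []:
        child_labels, child_parents = treemap_columns(child, parent=tree["label"])
        labels.extend(child_labels)
        parents.extend(child_parents)
    return labels, parents


# Illustrative data only.
tree = {
    "label": "Energy",
    "children": [
        {"label": "Solar", "children": [{"label": "Rooftop"}]},
        {"label": "Wind"},
    ],
}

labels, parents = treemap_columns(tree)
print(list(zip(labels, parents)))

# With Plotly installed, the two lists could be passed on like this:
#   import plotly.graph_objects as go
#   go.Figure(go.Treemap(labels=labels, parents=parents)).show()
```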

generate_theme_tree(main_theme, dataset, focus='', llm_model_config=None)

Generate a ThemeTree object from a main theme and a dataset.

Parameters:
  • main_theme (str) – The primary theme to analyze.

  • dataset (SourceType) – The dataset type to filter by.

  • focus (str, optional) – Specific aspect(s) to guide sub-theme generation.

  • llm_model_config (dict, optional) – Configuration for the large language model used to generate themes. Expected keys:

      • provider (str): The model provider (e.g., ‘openai’).

      • model (str): The model name (e.g., ‘gpt-4o-mini’).

      • kwargs (dict): Additional parameters for model execution, such as temperature (float), top_p (float), frequency_penalty (float), presence_penalty (float), and seed (int).

Returns:

The generated theme tree.

Return type:

ThemeTree
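A configuration dictionary following the documented keys might look like the sketch below. The provider and model names come from the examples above; the numeric values, the main theme, and the focus string are illustrative assumptions, not recommended settings. The call itself is left as a comment because it needs the library and access to a model provider.

```python
# Illustrative values only; adjust to your own provider and model.
llm_model_config = {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "kwargs": {
        "temperature": 0.0,
        "top_p": 1.0,
        "frequency_penalty": 0.0,
        "presence_penalty": 0.0,
        "seed": 42,
    },
}

print(sorted(llm_model_config))

# Assuming generate_theme_tree and SourceType have been imported from the library:
#   tree = generate_theme_tree(
#       main_theme="Electric Vehicles",
#       dataset=SourceType.PATENTS,
#       focus="battery technology",
#       llm_model_config=llm_model_config,
#   )
#   tree.print()
```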

stringify_label_summaries(label_summaries)

Convert the label summaries of a ThemeTree into a list of strings.

Parameters:

label_summaries (dict[str, str]) – A dictionary of label summaries of ThemeTree. Expected format: {label: summary}.

Returns:

A list of strings, each one containing a label and its summary.

Return type:

List[str]
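The documentation does not specify how each label and summary are joined, so the sketch below assumes a simple "label: summary" format. `stringify_sketch` is a hypothetical helper, not the library function.

```python
from typing import Dict, List


def stringify_sketch(label_summaries: Dict[str, str]) -> List[str]:
    """Turn {label: summary} into one string per entry (assumed format)."""
    return [f"{label}: {summary}" for label, summary in label_summaries.items()]


# Illustrative data only.
label_summaries = {
    "Solar": "Photovoltaic generation.",
    "Wind": "Onshore and offshore turbines.",
}

strings = stringify_sketch(label_summaries)
print(strings)
```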

class SourceType

Enumeration representing different types of data sources that can be analyzed.

PATENTS

Represents patent-related data.

Type:

str

JOBS

Represents job postings data.

Type:

str

CORPORATE_DOCS

Represents corporate documents such as reports or filings.

Type:

str

__new__(value)
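For readers unfamiliar with string-valued enumerations, here is a minimal sketch of the pattern. `SourceTypeSketch` is a hypothetical stand-in: the member names match the ones listed above, but the string values and the `str` mixin are assumptions, since the actual values are not shown in this documentation.

```python
from enum import Enum


class SourceTypeSketch(str, Enum):
    """Hypothetical stand-in; the string values below are assumed."""

    PATENTS = "patents"
    JOBS = "jobs"
    CORPORATE_DOCS = "corporate_docs"


# Members can be referenced by name or looked up by value.
by_name = SourceTypeSketch.PATENTS
by_value = SourceTypeSketch("jobs")

print(by_name.value, by_value.name)
```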