Themes (Experimental)
- class ThemeTree
A hierarchical tree structure rooted in a main theme, branching into distinct sub-themes that guide the analyst’s research process.
Each node in the tree provides a unique identifier, a descriptive label, and a summary explaining its relevance.
- Parameters:
label (str) – The name of the theme or sub-theme.
node (int) – A unique identifier for the node.
summary (str) – A brief explanation of the node’s relevance. For the root node (main theme), this describes the overall theme; for sub-nodes, it explains their connection to the parent theme.
children (Optional[List[ThemeTree]]) – A list of child nodes representing sub-themes.
- static from_dict(tree_dict)
Create a ThemeTree object from a dictionary.
- Parameters:
tree_dict (dict) –
A dictionary representing the ThemeTree structure with the following keys:
label (str): The name of the theme or sub-theme.
node (int): A unique identifier for the node.
summary (str): A brief explanation of the node’s relevance.
children (list, optional): A list of dictionaries representing sub-themes, each following the same structure.
- Returns:
The ThemeTree object generated from the dictionary.
- Return type:
ThemeTree
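The dictionary layout described above nests naturally, because each entry in children repeats the same keys. The sketch below is a self-contained stand-in, not the library's implementation: `ThemeNode` and its `from_dict` are hypothetical names that only mirror the documented fields, the stand-in stores an empty list where the documented class allows an optional children value, and the theme content is invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List


@dataclass
class ThemeNode:
    """Hypothetical stand-in mirroring the documented ThemeTree fields."""

    label: str
    node: int
    summary: str
    children: List["ThemeNode"] = field(default_factory=list)

    @staticmethod
    def from_dict(tree_dict: dict) -> "ThemeNode":
        # 'children' is optional, so a missing or None value means a leaf node.
        children = [ThemeNode.from_dict(c) for c in tree_dict.get("children") or []]
        return ThemeNode(
            label=tree_dict["label"],
            node=tree_dict["node"],
            summary=tree_dict["summary"],
            children=children,
        )


# Illustrative input only; the labels and summaries are made up.
tree_dict = {
    "label": "Energy Transition",
    "node": 1,
    "summary": "Shift from fossil fuels to low-carbon energy.",
    "children": [
        {"label": "Solar", "node": 2, "summary": "Photovoltaic generation and supply chain."},
        {
            "label": "Storage",
            "node": 3,
            "summary": "Technologies that buffer intermittent supply.",
            "children": [
                {"label": "Batteries", "node": 4, "summary": "Grid-scale battery systems."},
            ],
        },
    ],
}

tree = ThemeNode.from_dict(tree_dict)
print(tree.label, [child.label for child in tree.children])
```

Recursing on the children list keeps the parser as small as the format itself: every sub-theme dictionary is handled by the same function that handles the root.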
- get_label_summaries()
Extract the label summaries from the tree.
- Returns:
Dictionary with all the labels of the ThemeTree as keys and their associated summaries as values.
- Return type:
dict[str, str]
- get_summaries()
Extract the node summaries from a ThemeTree.
- Returns:
List of all ‘summary’ values in the tree, including its children.
- Return type:
list[str]
- get_terminal_label_summaries()
Extract the summaries from terminal nodes of the tree.
- Returns:
Dictionary with the labels of the terminal (leaf) nodes of the ThemeTree as keys and their associated summaries as values.
- Return type:
dict[str, str]
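The three extraction methods above differ only in which nodes they visit and in the shape of what they return. The following is a minimal sketch on a hypothetical stand-in class (`ThemeNode`, not the library's ThemeTree). It assumes a pre-order traversal and treats a node with no children as terminal; the actual ordering and the handling of duplicate labels are not specified in this section.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class ThemeNode:
    """Hypothetical stand-in for a theme tree node (not the library class)."""

    label: str
    node: int
    summary: str
    children: List["ThemeNode"] = field(default_factory=list)

    def walk(self) -> Iterator["ThemeNode"]:
        # Pre-order traversal (an assumption): the node, then each subtree.
        yield self
        for child in self.children:
            yield from child.walk()

    def get_summaries(self) -> List[str]:
        # Every summary in the tree, root included.
        return [n.summary for n in self.walk()]

    def get_label_summaries(self) -> Dict[str, str]:
        # Label -> summary for every node; a repeated label would overwrite.
        return {n.label: n.summary for n in self.walk()}

    def get_terminal_label_summaries(self) -> Dict[str, str]:
        # Label -> summary for nodes that have no children.
        return {n.label: n.summary for n in self.walk() if not n.children}


# Illustrative tree; labels and summaries are made up.
tree = ThemeNode("Energy Transition", 1, "Shift to low-carbon energy.", [
    ThemeNode("Solar", 2, "Photovoltaic generation."),
    ThemeNode("Storage", 3, "Buffering intermittent supply.", [
        ThemeNode("Batteries", 4, "Grid-scale battery systems."),
    ]),
])

print(tree.get_summaries())
print(tree.get_label_summaries())
print(tree.get_terminal_label_summaries())
```

In this sketch "Storage" appears in the full label dictionary but not in the terminal one, because it has a child.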
- generate_theme_tree(main_theme, dataset, focus='', llm_model_config=None)
Generate a ThemeTree object from a main theme and a dataset.
- Parameters:
main_theme (str) – The primary theme to analyze.
dataset (SourceType) – The dataset type to filter by.
focus (str, optional) – Specific aspect(s) to guide sub-theme generation.
llm_model_config (dict, optional) –
Configuration for the large language model used to generate themes. Expected keys:
provider (str): The model provider (e.g., ‘openai’).
model (str): The model name (e.g., ‘gpt-4o-mini’).
kwargs (dict): Additional parameters for model execution, such as temperature (float), top_p (float), frequency_penalty (float), presence_penalty (float), and seed (int).
- Returns:
The generated theme tree.
- Return type:
ThemeTree
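A configuration following the keys listed for llm_model_config might look like the dictionary below. The provider and model names reuse the examples from the parameter description, while the numeric settings, the main theme, and the focus string are arbitrary illustrations. The call itself is left commented out, since the module that exports generate_theme_tree and SourceType is not named in this section.

```python
# Illustrative values only; provider and model reuse the examples from the
# parameter description, and the kwargs settings are arbitrary choices.
llm_model_config = {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "kwargs": {
        "temperature": 0.0,
        "top_p": 1.0,
        "frequency_penalty": 0.0,
        "presence_penalty": 0.0,
        "seed": 42,
    },
}

# Shape of the call as documented (not executed here, because the import
# path for these names is not given in this section):
# tree = generate_theme_tree(
#     main_theme="Energy Transition",
#     dataset=SourceType.CORPORATE_DOCS,
#     focus="grid storage",
#     llm_model_config=llm_model_config,
# )
print(sorted(llm_model_config))
```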
- stringify_label_summaries(label_summaries)
Convert the label summaries of a ThemeTree into a list of strings.
- Parameters:
label_summaries (dict[str, str]) – A dictionary of label summaries of ThemeTree. Expected format: {label: summary}.
- Returns:
A list of strings, each one containing a label and its summary.
- Return type:
List[str]
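As a rough illustration of the conversion, the sketch below turns a {label: summary} dictionary into one string per entry. The function name `stringify_label_summaries_sketch` is hypothetical, and the "label: summary" layout is an assumption: this section states only that each string contains a label and its summary, not how they are joined.

```python
from typing import Dict, List


def stringify_label_summaries_sketch(label_summaries: Dict[str, str]) -> List[str]:
    """Hypothetical re-implementation; the 'label: summary' layout is assumed."""
    return [f"{label}: {summary}" for label, summary in label_summaries.items()]


# Illustrative input in the documented {label: summary} format.
label_summaries = {
    "Solar": "Photovoltaic generation.",
    "Batteries": "Grid-scale battery systems.",
}
lines = stringify_label_summaries_sketch(label_summaries)
print(lines)
```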
- class SourceType
Enumeration representing different types of data sources that can be analyzed.
- PATENTS
Represents patent-related data.
- Type:
str
- JOBS
Represents job postings data.
- Type:
str
- CORPORATE_DOCS
Represents corporate documents such as reports or filings.
- Type:
str
- __new__(value)