SAE MODEL #2

FEATURE 1506

Justification/Rationale for Actions

This page presents a detailed analysis of one feature of the Sparse Autoencoder (SAE). It includes the semantic interpretation generated by the LLM, the documents on which the feature activates most strongly, and summary statistics such as density and activation distribution.

Semantic Interpretation

gemma3:12b #17

The neuron appears to activate when discussing the reasons, processes, or justifications behind actions, decisions, or changes in behavior, particularly within the context of health, lifestyle choices, or systems requiring explanation. This includes rationales for medical procedures, adherence to rules, or shifts in personal beliefs (e.g., exiting vegetarianism). The key element is the 'why' behind an action, often involving a process of evaluation or justification. It's not simply about the action itself, but the reasoning and context surrounding it. The activation is tied to explanations, analyses, and the underlying motivations for choices.

STATISTICS & DISTRIBUTION

Density

0.00100

Peak Act

3.62
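The page does not say how Density and Peak Act are computed. A minimal sketch, assuming density is the fraction of positions where the feature's activation exceeds a threshold and peak is the maximum activation observed; the function name `feature_stats`, the `threshold` parameter, and the toy data are hypothetical:

```python
def feature_stats(activations, threshold=0.0):
    """Return (density, peak) for one feature's activations over a corpus.

    Assumption: density = share of positions with activation > threshold,
    peak = largest activation. The dashboard may define these differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    density = active / len(activations)
    peak = max(activations)
    return density, peak


# Toy data (hypothetical): one active position out of 1000.
toy = [0.0] * 999 + [3.62]
density, peak = feature_stats(toy)
print(f"density={density:.5f} peak={peak:.2f}")  # density=0.00100 peak=3.62
```

Under this definition, a density of 0.00100 would mean the feature fires on roughly one position in a thousand.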

Global Context

TOP ACTIVATING CONTEXTS

DOC #151

ACT: 3.6173

Validating the Access to an Electronic Health Record: Classification and Content Analysis of Access Logs. Electronic Health Records (EHRs) have made patient information widely available, allowing health professionals to provide better care. However, information confidentiality is an issue that continually needs to be taken into account. The object…

DOC #650

ACT: 1.3813

Health and citizenship: the characteristics of 21st century health. Health is at the core of modernity and its governance has been characterised by two expansions: an expansion of the territory of health into an increasing array of personal and political spaces; and an expansion of the do-ability of health. Health is an exemplary area to study the…

DOC #523

ACT: 1.2738

Endoscopic lung volume reduction. Chronic obstructive pulmonary disease (COPD) is a category of diseases characterized by chronic airflow obstruction and hyperinflation. The GOLD committee and the American Thoracic Society/European Respiratory Society have published detailed, evidence-based reviews of management approaches, providing stepped-care …

DOC #977

ACT: 1.2612

Effective medium theory for drag-reducing micro-patterned surfaces in turbulent flows. Many studies in the last decade have revealed that patterns at the microscale can reduce skin drag. Yet, the mechanisms and parameters that control drag reduction, e.g. Reynolds number and pattern geometry, are still unclear. We propose an effective medium repre…

DOC #993

ACT: 1.2384

Acne cosmetica revisited: a case-control study shows a dose-dependent inverse association between overall cosmetic use and post-adolescent acne. Case-control studies to support the concept of acne cosmetica are lacking. To examine the association of post-adolescent acne with the use of cosmetics and cosmetic procedures. 910 post-adolescent patient…

CORRELATIONS

W – Weight-space · similarity between decoder vectors (features that point in similar directions in the embedding space).

D – Data / co-activation · features that tend to fire together on the same documents (co-occurrence in the dataset).
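The two correlation types can be illustrated with a short sketch. It assumes W is measured as cosine similarity between decoder vectors and D as the Jaccard overlap of the document sets on which two features fire; the page names neither metric, so both choices, the function names, and the toy vectors are hypothetical:

```python
import math


def cosine_similarity(u, v):
    """W-style similarity: cosine of the angle between two decoder vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_u * norm_v)


def coactivation_jaccard(docs_a, docs_b):
    """D-style similarity: overlap of the document sets where each feature fires."""
    a, b = set(docs_a), set(docs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical 3-dimensional decoder vectors for two features.
w_sim = cosine_similarity([1.0, 0.0, 1.0], [1.0, 1.0, 0.0])

# Document IDs: the first set reuses IDs listed on this page,
# the second set is invented for illustration.
d_sim = coactivation_jaccard({151, 650, 523}, {151, 523, 42})

print(f"W={w_sim:.2f} D={d_sim:.2f}")
```

The two measures can disagree: features with nearly parallel decoder vectors may rarely fire on the same documents, and vice versa, which is presumably why the page reports both.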

System Prompt

You are a meticulous researcher investigating a specific neuron in a language model. Your task is to determine what behavior this neuron is responsible for: what concepts, topics, or linguistic features does it activate on?

INPUT DESCRIPTION: You will receive two inputs: 1) Maximum Activation Examples and 2) Zero Activation Examples.
1. You will be given several text examples that activate the neuron, along with a number indicating how strongly it was activated. This means there is some feature, concept, or pattern in this text that 'excites' this neuron.
2. You will also be given several text examples that do NOT activate the neuron. This means the feature or concept is not present in these texts.

OUTPUT DESCRIPTION: Given the inputs provided, complete the following tasks.
1. Based on the MAXIMUM ACTIVATION EXAMPLES, list potential topics, concepts, themes, and features they have in common. Be specific. You may need to look at different levels of granularity. List as many as possible. Give greater weight to concepts more prominent in higher-activation examples.
2. Based on the zero activation examples, systematically exclude any topic/concept/feature listed above that also appears in the zero activation examples.
3. Based on the two previous steps, perform a thorough analysis of which feature, concept, or topic, at which level of granularity, is likely to activate this neuron. Use Occam's razor, as long as it fits the evidence provided. Be highly rational and analytical.
4. Based on step 3, summarize this concept in 1-8 words, in the form FINAL: <explanation>. Do NOT return anything after these 1-8 words.

Respond EXCLUSIVELY with valid JSON: {'label': '...', 'description': '...'}
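A minimal sketch of how the two inputs described in the prompt might be assembled and how the reply might be read back. The message layout, the function names `build_user_message` and `parse_interpretation`, and the sample strings are assumptions, not the tool's actual code. Note that step 4 asks for a `FINAL: <explanation>` line while the closing instruction asks for JSON only; this sketch follows the JSON instruction. Because the template in the prompt uses single quotes, which strict JSON does not allow, the parser also tolerates a Python-style dict literal:

```python
import ast
import json


def build_user_message(max_examples, zero_examples):
    """Format (text, activation) pairs and zero-activation texts for the model."""
    lines = ["MAXIMUM ACTIVATION EXAMPLES:"]
    # Strongest activations first, so the most informative examples lead.
    for text, act in sorted(max_examples, key=lambda p: p[1], reverse=True):
        lines.append(f"[{act:.4f}] {text}")
    lines.append("")
    lines.append("ZERO ACTIVATION EXAMPLES:")
    lines.extend(f"[0.0000] {text}" for text in zero_examples)
    return "\n".join(lines)


def parse_interpretation(reply):
    """Extract (label, description) from a reply containing one object."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no object found in reply")
    payload = reply[start:end + 1]
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        # Fall back to a safe literal parse for single-quoted dicts.
        data = ast.literal_eval(payload)
    if not isinstance(data, dict) or not {"label", "description"} <= data.keys():
        raise ValueError("reply lacks 'label' or 'description'")
    return data["label"], data["description"]


# Hypothetical usage with placeholder texts.
message = build_user_message(
    [("second example", 1.3813), ("first example", 3.6173)],
    ["unrelated text"],
)
label, description = parse_interpretation(
    "{'label': 'Example label', 'description': 'Example description.'}"
)
print(message)
print(label, "|", description)
```

Validating the keys before use keeps a malformed reply from silently producing an empty label on the feature page.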
