refactor: search provider factory pattern + unit tests
All checks were successful
Stuffle/nebula-os/pipeline/head This commit looks good
All checks were successful
Stuffle/nebula-os/pipeline/head This commit looks good
- New module: src/execution_system/tools/search_providers/ - base.py: SearchProvider ABC + SearchResult dataclass - duckduckgo.py: DuckDuckGoProvider (httpx, no API key, always available) - serper.py: SerperProvider (Google via Serper.dev, requires SERPER_API_KEY) - factory.py: get_search_provider() — auto-selects by availability, respects hint - web_tools.py: now transport-only, delegates entirely to get_search_provider() - 33 unit tests: provider behaviour, factory selection, tool execute() wiring
This commit is contained in:
717
docs/INTEGRATIONS_REPORT.md
Normal file
717
docs/INTEGRATIONS_REPORT.md
Normal file
@@ -0,0 +1,717 @@
|
||||
# NebulaOS Integration Report
|
||||
## Path-to-Integrate Reference
|
||||
|
||||
> Source: `integration_catalog` and `integration_providers` tables.
|
||||
> `auth_config` fields: `developerKey` = Nebula-side platform key; `userApiKey` = user supplies own key; `userOAuth` = user does OAuth flow; `requireLogin` = IAM login required; `localOnly` = runs via bridge extension only.
|
||||
|
||||
---
|
||||
|
||||
## Legend — Dependency Columns
|
||||
|
||||
| Column | Meaning |
|
||||
|---|---|
|
||||
| **User API Key** | User must supply their own API key / token (stored encrypted per-user) |
|
||||
| **User OAuth** | User authorizes via OAuth2 — no key stored, token managed by Nebula |
|
||||
| **User Credentials** | Username + password or host details the user supplies |
|
||||
| **Platform Key (Nebula)** | Nebula must procure and configure a platform-level key / app registration (shared across all users) |
|
||||
| **Code-Side Support** | Requires a Nebula-side plugin, SDK wrapper, or webhook handler to be built |
|
||||
| **Auth Method** | Primary auth type from DB |
|
||||
| **Status** | `built_in` / `partial` / `not_started` |
|
||||
|
||||
---
|
||||
|
||||
## 1. Dev Tools — Source Control
|
||||
|
||||
| Integration | Auth Method | User API Key | User OAuth | User Credentials | Platform Key (Nebula) | Code-Side Support | Status |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **GitHub** | oauth2 / api_key / github_app | ✅ Personal Access Token or App token | ✅ OAuth app flow | — | ✅ GitHub OAuth App or GitHub App registration (client_id + client_secret) | ✅ Webhook handler for push/PR events | partial |
|
||||
| **GitLab** | oauth2 / api_key | ✅ Personal Access Token | ✅ OAuth app flow | — | ✅ GitLab OAuth app registration | ✅ Webhook handler for MR/pipeline events | not_started |
|
||||
| **Bitbucket** | oauth2 / api_key | ✅ App password or token | ✅ OAuth app flow | — | ✅ Bitbucket OAuth consumer | ✅ Webhook handler | not_started |
|
||||
|
||||
### Chat Setup Workflow
|
||||
|
||||
**GitHub (via chat)**
|
||||
- Chat asks: "Which GitHub repo do you want to connect to Nebula?"
|
||||
- Chat asks: "Do you prefer OAuth (click to authorize) or a Personal Access Token?"
|
||||
- OAuth path → Nebula redirects to GitHub authorization page → token saved automatically
|
||||
- PAT path → Chat asks: "Paste your GitHub PAT (repo + webhook scopes required)"
|
||||
- Chat asks: "What events should trigger Nebula? (push, pull_request, issues, releases)"
|
||||
- Chat confirms: "Webhook registered at `https://nebula.armco.dev/webhooks/<slug>`"
|
||||
- Chat asks: "Should Nebula create a test issue to verify connectivity?"
|
||||
|
||||
**GitLab / Bitbucket** — Same flow, substitute PAT or OAuth app per provider.
|
||||
|
||||
---
|
||||
|
||||
## 2. Dev Tools — Project Management
|
||||
|
||||
| Integration | Auth Method | User API Key | User OAuth | User Credentials | Platform Key (Nebula) | Code-Side Support | Status |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **Jira** | api_key / oauth2 | ✅ API token (email + token pair) | ✅ Atlassian OAuth | — | ✅ Atlassian OAuth app (client_id + secret) for OAuth path | ✅ Webhook receiver for issue events | partial |
|
||||
| **Linear** | api_key / oauth2 | ✅ Personal API key | ✅ OAuth app | — | ✅ Linear OAuth app registration | ✅ Webhook receiver | partial |
|
||||
| **ClickUp** | oauth2 / api_key | ✅ Personal API token | ✅ OAuth app | — | ✅ ClickUp OAuth app | ✅ Webhook receiver | not_started |
|
||||
| **Asana** | oauth2 / api_key | ✅ Personal Access Token | ✅ OAuth app | — | ✅ Asana OAuth app | ✅ Webhook receiver | not_started |
|
||||
| **Trello** | api_key / oauth2 | ✅ API key + token | ✅ OAuth flow | — | ✅ Trello API key (Nebula app) | ✅ Webhook receiver | not_started |
|
||||
|
||||
### Chat Setup Workflow
|
||||
|
||||
**Jira (via chat)**
|
||||
- Chat asks: "What's your Jira site URL? (e.g. `mycompany.atlassian.net`)"
|
||||
- Chat asks: "Enter your Atlassian account email"
|
||||
- Chat asks: "Paste your Jira API token (generate at `id.atlassian.com/manage-profile/security/api-tokens`)"
|
||||
- Chat asks: "Which Jira project should be the default? (key, e.g. `PROJ`)"
|
||||
- Chat confirms: "I'll now verify connectivity by fetching open issues from PROJ…"
|
||||
- Chat asks: "Should Nebula receive Jira webhooks to react to issue changes in real time?"
|
||||
|
||||
**Linear (via chat)**
|
||||
- Chat asks: "Paste your Linear API key (Settings → API → Personal API Keys)"
|
||||
- Chat asks: "Which team should be the default? (I'll list them)"
|
||||
- Chat confirms connection and optionally registers a webhook for status changes.
|
||||
|
||||
---
|
||||
|
||||
## 3. Dev Tools — Incident Management
|
||||
|
||||
| Integration | Auth Method | User API Key | User OAuth | User Credentials | Platform Key (Nebula) | Code-Side Support | Status |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **PagerDuty** | api_key | ✅ REST API key + Events Routing key | — | — | — | ✅ Webhook receiver for incident events | partial |
|
||||
|
||||
### Chat Setup Workflow
|
||||
|
||||
**PagerDuty (via chat)**
|
||||
- Chat asks: "Paste your PagerDuty REST API key (read+write scopes)"
|
||||
- Chat asks: "Paste your PagerDuty Events API Routing Key (v2 Integration key)"
|
||||
- Chat asks: "Which service should Nebula trigger incidents on by default?"
|
||||
- Chat confirms: "I'll send a test trigger to verify…"
|
||||
- Chat asks: "Should Nebula receive webhook alerts from PagerDuty to auto-acknowledge incidents?"
|
||||
|
||||
---
|
||||
|
||||
## 4. Dev Tools — Monitoring, CI/CD & Enterprise DevOps
|
||||
|
||||
| Integration | Auth Method | User API Key | User OAuth | User Credentials | Platform Key (Nebula) | Code-Side Support | Status |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **Azure DevOps** | api_key | ✅ Personal Access Token (PAT) | — | ✅ Organization URL (`dev.azure.com/<org>`) | — | ✅ ADO REST API client + service hook receiver for pipeline/work-item events | partial |
|
||||
| **Sentry** | api_key | ✅ Auth token | — | — | ✅ Sentry app install (for project key auto-discovery) | ✅ Webhook receiver for issues | not_started |
|
||||
| **CircleCI** | api_key | ✅ Personal API token | — | — | — | ✅ Webhook receiver for build events | not_started |
|
||||
| **Jenkins** | api_key | ✅ API token + Jenkins user | — | ✅ Jenkins host URL + user credentials | — | ✅ Build trigger + webhook | not_started |
|
||||
| **TravisCI** | api_key | ✅ API token | — | — | — | ✅ Webhook | not_started |
|
||||
|
||||
### Chat Setup Workflow
|
||||
|
||||
**Sentry (via chat)**
|
||||
- Chat asks: "Paste your Sentry auth token (Settings → Auth Tokens)"
|
||||
- Chat asks: "What's your Sentry organization slug?"
|
||||
- Chat asks: "Which project(s) should Nebula monitor?"
|
||||
- Chat confirms: "I'll list recent unresolved issues to verify…"
|
||||
|
||||
**CircleCI (via chat)**
|
||||
- Chat asks: "Paste your CircleCI Personal API token"
|
||||
- Chat asks: "What's the default project slug? (e.g. `gh/org/repo`)"
|
||||
- Chat asks: "Should Nebula receive pipeline webhooks to react to build failures?"
|
||||
|
||||
---
|
||||
|
||||
## 5. Dev Tools — Containers & Code Quality
|
||||
|
||||
| Integration | Auth Method | User API Key | User OAuth | User Credentials | Platform Key (Nebula) | Code-Side Support | Status |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **DockerHub** | api_key | ✅ Access token | — | — | — | ✅ Webhook receiver for image pushes | not_started |
|
||||
| **Harbor** | api_key | ✅ Robot account token | — | ✅ Harbor host URL | — | ✅ API client | not_started |
|
||||
| **SonarQube** | api_key | ✅ User token | — | ✅ SonarQube host URL | — | ✅ API client + webhook | not_started |
|
||||
| **Codecov** | api_key | ✅ Upload token | — | — | — | ✅ API client | not_started |
|
||||
|
||||
---
|
||||
|
||||
## 6. Communication — Team Chat
|
||||
|
||||
| Integration | Auth Method | User API Key | User OAuth | User Credentials | Platform Key (Nebula) | Code-Side Support | Status |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **Slack** | oauth2 / bot_token | — | ✅ OAuth app install | — | ✅ Slack App registration (client_id + client_secret + signing_secret + bot scopes) | ✅ Event subscription handler, slash command endpoint | partial |
|
||||
| **Microsoft Teams** | oauth2 | — | ✅ OAuth (Microsoft identity platform) | — | ✅ Azure AD app registration (tenant, client_id, client_secret) | ✅ Bot Framework adapter or webhook handler | not_started |
|
||||
| **Discord** | bot_token / oauth2 | — | ✅ OAuth2 guild install | — | ✅ Discord Application registration (client_id + bot token — Nebula-owned app) | ✅ Gateway or interaction endpoint | not_started |
|
||||
|
||||
### Chat Setup Workflow
|
||||
|
||||
**Slack (via chat)**
|
||||
- Chat says: "Click here to add Nebula to your Slack workspace → **Install App**"
|
||||
- (Nebula's registered Slack App handles the OAuth handshake; bot token returned)
|
||||
- Chat asks: "What's the default channel Nebula should post to? (e.g. `#nebula-alerts`)"
|
||||
- Chat asks: "Should Nebula accept slash commands? (e.g. `/nebula run <agent>`)"
|
||||
- Chat confirms: "Sending a test message to `#nebula-alerts`…"
|
||||
|
||||
**Microsoft Teams (via chat)**
|
||||
- Chat says: "Click here to authorize Nebula in your Microsoft 365 tenant → **Authorize**"
|
||||
- (Azure AD app OAuth flow; admin consent may be required for tenant-wide install)
|
||||
- Chat asks: "Which Teams channel should Nebula post to by default?"
|
||||
- Chat confirms connectivity with a test card message.
|
||||
|
||||
---
|
||||
|
||||
## 7. Communication — Consumer Chat
|
||||
|
||||
| Integration | Auth Method | User API Key | User OAuth | User Credentials | Platform Key (Nebula) | Code-Side Support | Status |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **Telegram** | bot_token | ✅ Bot token (user creates their own bot via @BotFather) | — | — | ✅ Nebula webhook endpoint registered per bot (but user supplies the token) | ✅ Webhook handler (`/webhooks/<slug>`) | partial |
|
||||
| **WhatsApp** | webhook | ✅ Access token + phone number ID | — | — | ✅ Meta app registration (Nebula WhatsApp Business account OR user's Business Account under Nebula's app) + Verify Token | ✅ Webhook handler + template management | partial |
|
||||
| **Signal** | linked_device | — | — | ✅ Signal CLI linked to user's phone number | — | ✅ Signal CLI integration (no official API) | not_started |
|
||||
|
||||
### Chat Setup Workflow
|
||||
|
||||
**Telegram (via chat)**
|
||||
- Chat asks: "Do you have a Telegram bot already? If not, I'll guide you to create one via @BotFather in 2 minutes."
|
||||
- Chat asks: "Paste your bot token (from @BotFather)"
|
||||
- Chat asks: "What's the allowed chat ID or username? (leave blank to accept messages from anyone)"
|
||||
- Chat confirms: "Registering webhook… sending test message to verify."
|
||||
|
||||
**WhatsApp (via chat)**
|
||||
- Chat asks: "Are you using Meta's WhatsApp Business Platform? (We need a verified Business account)"
|
||||
- Chat asks: "Paste your WhatsApp Cloud API access token"
|
||||
- Chat asks: "Paste your Phone Number ID (from Meta Business Manager → WhatsApp → API Setup)"
|
||||
- Chat explains: "Nebula will register a webhook — Meta requires verifying our endpoint. This is handled automatically."
|
||||
- Chat asks: "Which message template should Nebula use for outbound notifications? (must be pre-approved by Meta)"
|
||||
- Chat confirms: "Sending test template message…"
|
||||
|
||||
---
|
||||
|
||||
## 8. Communication — SMS / CPaaS / Email
|
||||
|
||||
| Integration | Auth Method | User API Key | User OAuth | User Credentials | Platform Key (Nebula) | Code-Side Support | Status |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **Twilio** | api_key | ✅ Account SID + Auth Token | — | — | ✅ Twilio application SID (optional, for voice) | ✅ Webhook handler for inbound SMS/voice | not_started |
|
||||
| **SMS Gateways** | api_key | ✅ Gateway API key | — | — | ✅ Gateway platform key (e.g. Vonage API key) | ✅ Generic SMS adapter | not_started |
|
||||
| **SendGrid** | api_key | ✅ API key (or Nebula supplies platform-level key for shared sending) | — | — | ✅ Verified Sender Domain on Nebula's SendGrid account | ✅ Email template engine + webhook | not_started |
|
||||
| **Mailgun** | api_key | ✅ API key | — | — | ✅ Verified domain on Mailgun | ✅ Inbound email handler | not_started |
|
||||
|
||||
### Chat Setup Workflow
|
||||
|
||||
**Twilio (via chat)**
|
||||
- Chat asks: "Paste your Twilio Account SID"
|
||||
- Chat asks: "Paste your Twilio Auth Token"
|
||||
- Chat asks: "What's your Twilio phone number? (in E.164 format, e.g. `+12025551234`)"
|
||||
- Chat asks: "Should Nebula receive inbound SMS at this number? (I'll register the webhook with Twilio)"
|
||||
- Chat confirms: "Sending a test SMS to verify outbound…"
|
||||
|
||||
---
|
||||
|
||||
## 9. Communication — Email Clients & Protocols
|
||||
|
||||
| Integration | Auth Method | User API Key | User OAuth | User Credentials | Platform Key (Nebula) | Code-Side Support | Status |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **Gmail** | oauth2 | — | ✅ Google OAuth | — | ✅ Google Cloud project with Gmail API enabled + OAuth client (Nebula's GCP project, client_id + client_secret) | ✅ Gmail API client | not_started |
|
||||
| **Outlook** | oauth2 | — | ✅ Microsoft OAuth | — | ✅ Azure AD app registration with Microsoft Graph scopes | ✅ Microsoft Graph client | not_started |
|
||||
| **SMTP** | credentials | — | — | ✅ Host, port, username, password | — | ✅ SMTP mailer | not_started |
|
||||
| **IMAP** | credentials | — | — | ✅ Host, port, username, password | — | ✅ IMAP client + polling | not_started |
|
||||
|
||||
### Chat Setup Workflow
|
||||
|
||||
**Gmail (via chat)**
|
||||
- Chat says: "Click here to connect your Google account → **Authorize with Google**"
|
||||
- (Nebula's GCP OAuth flow; scopes: `gmail.modify`, `gmail.send`)
|
||||
- Chat asks: "Should Nebula monitor your inbox for triggers? If so, which labels or senders?"
|
||||
- Chat confirms: "I'll send a test email to verify access…"
|
||||
|
||||
**SMTP / IMAP (via chat)**
|
||||
- Chat asks: "What's the SMTP host and port? (e.g. `smtp.gmail.com:587`)"
|
||||
- Chat asks: "What's the email account username/email address?"
|
||||
- Chat asks: "Paste the password or app-specific password"
|
||||
- Chat asks: "Should IMAP monitoring be enabled? If so, which mailbox/folder?"
|
||||
- Chat confirms by attempting a connection.
|
||||
|
||||
---
|
||||
|
||||
## 10. Documents & Productivity
|
||||
|
||||
| Integration | Auth Method | User API Key | User OAuth | User Credentials | Platform Key (Nebula) | Code-Side Support | Status |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **Notion** | api_key / oauth2 | ✅ Internal integration token | ✅ Public OAuth | — | ✅ Notion OAuth app (client_id + secret) for public integrations | ✅ Notion API client | partial |
|
||||
| **Confluence** | api_key / oauth2 | ✅ API token (email + token) | ✅ Atlassian OAuth | — | ✅ Atlassian OAuth app | ✅ Confluence API client | not_started |
|
||||
| **Coda** | api_key | ✅ API token | — | — | — | ✅ Coda API client | not_started |
|
||||
| **Airtable** | api_key / oauth2 | ✅ Personal access token | ✅ OAuth app | — | ✅ Airtable OAuth app | ✅ Airtable API client + webhook | not_started |
|
||||
| **Google Drive** | oauth2 | — | ✅ Google OAuth | — | ✅ Google Cloud project, Drive API enabled, OAuth client | ✅ Drive API client | not_started |
|
||||
| **Dropbox** | oauth2 / api_key | ✅ Access token | ✅ OAuth app | — | ✅ Dropbox app (app key + secret) | ✅ Dropbox API client | not_started |
|
||||
| **OneDrive** | oauth2 | — | ✅ Microsoft OAuth | — | ✅ Azure AD app with Files.ReadWrite scope | ✅ Microsoft Graph client | not_started |
|
||||
| **SharePoint** | oauth2 | — | ✅ Microsoft OAuth | — | ✅ Azure AD app with Sites.ReadWrite scope | ✅ Microsoft Graph client | not_started |
|
||||
| **Miro** | oauth2 | — | ✅ OAuth app | — | ✅ Miro app registration | ✅ Miro API client | not_started |
|
||||
| **Figma** | api_key / oauth2 | ✅ Personal access token | ✅ OAuth app | — | ✅ Figma OAuth app | ✅ Figma API client | not_started |
|
||||
| **Canva** | oauth2 | — | ✅ OAuth app | — | ✅ Canva Connect app (client_id + secret) | ✅ Canva API client | not_started |
|
||||
|
||||
### Chat Setup Workflow
|
||||
|
||||
**Notion (via chat)**
|
||||
- Chat asks: "Do you have an internal Notion integration already? If not, I can guide you to create one (2 min)."
|
||||
- Chat asks: "Paste your Notion integration token (or click to authorize via OAuth)"
|
||||
- Chat asks: "Which Notion databases or pages should Nebula have access to? (You must share them with the integration in Notion)"
|
||||
- Chat confirms: "I'll run a test query to verify access to the linked database…"
|
||||
|
||||
**Google Drive (via chat)**
|
||||
- Chat says: "Click here to connect your Google account → **Authorize with Google**"
|
||||
- (Scopes: `drive.file` or `drive` depending on required access level)
|
||||
- Chat asks: "Should Nebula have access to all files or just files it creates?"
|
||||
- Chat confirms with a test folder listing.
|
||||
|
||||
---
|
||||
|
||||
## 11. Video Conferencing
|
||||
|
||||
| Integration | Auth Method | User API Key | User OAuth | User Credentials | Platform Key (Nebula) | Code-Side Support | Status |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **Zoom** | oauth2 / jwt | — | ✅ OAuth (Server-to-Server or User) | — | ✅ Zoom app (account_id + client_id + client_secret for S2S OAuth) | ✅ Zoom API client + webhook handler | not_started |
|
||||
| **Google Meet** | oauth2 | — | ✅ Google OAuth | — | ✅ Google Cloud project, Calendar API | ✅ Calendar API client | not_started |
|
||||
| **Webex** | oauth2 / bot_token | — | ✅ OAuth | — | ✅ Webex app integration (client_id + secret) | ✅ Webex API client | not_started |
|
||||
|
||||
### Chat Setup Workflow
|
||||
|
||||
**Zoom (via chat)**
|
||||
- Chat says: "Click here to connect your Zoom account → **Authorize**"
|
||||
- Chat asks: "Should Nebula be able to create meetings on your behalf?"
|
||||
- Chat asks: "Should Nebula receive webhook events (e.g. meeting started/ended, recording ready)?"
|
||||
- Chat confirms: "Creating a test instant meeting to verify…"
|
||||
|
||||
---
|
||||
|
||||
## 12. Cloud Infrastructure
|
||||
|
||||
| Integration | Auth Method | User API Key | User OAuth | User Credentials | Platform Key (Nebula) | Code-Side Support | Status |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **AWS** | api_key / iam_role / sso | ✅ Access Key ID + Secret Access Key (or IAM Role) | — | — | — (each user supplies own AWS creds; no shared platform key) | ✅ Boto3 or AWS SDK adapter per service (S3, EC2, Lambda, etc.) | not_started |
|
||||
| **Azure** | oauth2 / api_key / managed_identity | ✅ Service principal credentials or access token | ✅ Azure AD OAuth | — | ✅ Azure AD app registration (if using delegated OAuth) | ✅ Azure SDK adapter | not_started |
|
||||
| **Google Cloud** | oauth2 / service_account / api_key | ✅ Service account JSON key | ✅ OAuth | — | ✅ GCP project + APIs enabled (Cloud Run, BigQuery, etc.) | ✅ Google Cloud client libraries | not_started |
|
||||
| **Kubernetes** | kubeconfig / service_account | ✅ kubeconfig file or service account token | — | — | — | ✅ kubectl / k8s client | not_started |
|
||||
| **Terraform** | api_key | ✅ Terraform Cloud API token | — | — | — | ✅ Terraform CLI runner in agent sandbox + plan/apply wrapper (requires `terraform` binary in sandbox) | not_started |
|
||||
| **Pulumi** | api_key | ✅ Pulumi access token | — | — | — | ✅ Pulumi CLI runner in agent sandbox (requires `pulumi` binary + language runtime in sandbox) | not_started |
|
||||
| **Cloudflare** | api_key / api_token | ✅ API token (zone or account-scoped) | — | — | — | ✅ Cloudflare API client | not_started |
|
||||
|
||||
### Chat Setup Workflow
|
||||
|
||||
**AWS (via chat)**
|
||||
- Chat warns: "AWS credentials are sensitive — Nebula stores them encrypted and never logs them."
|
||||
- Chat asks: "Paste your AWS Access Key ID"
|
||||
- Chat asks: "Paste your AWS Secret Access Key"
|
||||
- Chat asks: "Which AWS region should be the default?"
|
||||
- Chat asks: "Which services do you want Nebula to access? (S3, EC2, Lambda, DynamoDB…)"
|
||||
- Chat asks: "Optionally, name an IAM role ARN if you prefer role assumption over long-lived keys"
|
||||
- Chat confirms: "Running STS GetCallerIdentity to verify credentials…"
|
||||
|
||||
**Kubernetes (via chat)**
|
||||
- Chat asks: "Paste your kubeconfig (or a service account token + cluster URL)"
|
||||
- Chat asks: "Which namespace should Nebula operate in by default?"
|
||||
- Chat asks: "Should Nebula be restricted to read-only operations for safety?"
|
||||
- Chat confirms: "Listing pods in the namespace to verify connectivity…"
|
||||
|
||||
---
|
||||
|
||||
## 13. Databases
|
||||
|
||||
| Integration | Auth Method | User API Key | User OAuth | User Credentials | Platform Key (Nebula) | Code-Side Support | Status |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **PostgreSQL** | credentials / connection_string | — | — | ✅ Host, port, DB name, user, password | — | ✅ asyncpg / psycopg2 adapter (built-in) | built_in |
|
||||
| **MySQL** | credentials / connection_string | — | — | ✅ Host, port, DB name, user, password | — | ✅ MySQL client adapter | not_started |
|
||||
| **SQLite** | none (local file) | — | — | ✅ File path only | — | ✅ sqlite3 (built-in Python stdlib) | not_started |
|
||||
| **MongoDB** | connection_string / credentials | — | — | ✅ Connection string or host+auth | — | ✅ PyMongo adapter | not_started |
|
||||
| **Redis** | credentials / connection_string | — | — | ✅ Host, port, password | — | ✅ redis-py adapter | not_started |
|
||||
| **Elasticsearch** | api_key / credentials | ✅ API key (cloud) | — | ✅ Host + basic auth (self-hosted) | — | ✅ elasticsearch-py adapter | not_started |
|
||||
| **ClickHouse** | credentials | — | — | ✅ Host, port, user, password | — | ✅ clickhouse-driver adapter | not_started |
|
||||
| **DuckDB** | none (local file) | — | — | ✅ File path only | — | ✅ duckdb Python adapter | not_started |
|
||||
| **Snowflake** | credentials / api_key / oauth2 | ✅ Account + user + warehouse + token | — | — | — | ✅ snowflake-connector-python | not_started |
|
||||
| **BigQuery** | oauth2 / service_account | ✅ Service account JSON key | ✅ OAuth | — | ✅ GCP project + BigQuery API enabled | ✅ google-cloud-bigquery adapter | not_started |
|
||||
| **Apache Kafka** | credentials / sasl | — | — | ✅ Bootstrap servers + SASL credentials | — | ✅ confluent-kafka-python adapter | not_started |
|
||||
|
||||
### Chat Setup Workflow
|
||||
|
||||
**PostgreSQL (via chat)**
|
||||
- Chat warns: "Database credentials are encrypted at rest and never logged."
|
||||
- Chat asks: "Paste your PostgreSQL connection string OR provide host, port, database, username, password separately"
|
||||
- Chat asks: "Should Nebula have read-only or read+write access?"
|
||||
- Chat asks: "Any specific schemas or tables Nebula should be aware of?"
|
||||
- Chat confirms: "Running `SELECT 1` to verify connectivity…"
|
||||
|
||||
**Redis (via chat)**
|
||||
- Chat asks: "What's your Redis host and port?"
|
||||
- Chat asks: "Is authentication enabled? If so, paste the password."
|
||||
- Chat asks: "Should Nebula use Redis as a pub/sub channel or just for caching?"
|
||||
- Chat confirms: "Sending a PING to verify…"
|
||||
|
||||
---
|
||||
|
||||
## 14. Payments & Finance
|
||||
|
||||
| Integration | Auth Method | User API Key | User OAuth | User Credentials | Platform Key (Nebula) | Code-Side Support | Status |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **Stripe** | api_key | ✅ Secret key (user's own Stripe account) | — | — | ✅ Stripe webhook signing secret (registered on Nebula's endpoint) | ✅ Stripe SDK + webhook handler | not_started |
|
||||
| **PayPal** | oauth2 / api_key | ✅ Client ID + Secret (Sandbox or Live) | ✅ OAuth app flow | — | ✅ PayPal app registration | ✅ PayPal SDK + webhook handler | not_started |
|
||||
| **Razorpay** | api_key | ✅ Key ID + Key Secret | — | — | ✅ Webhook secret for signature verification | ✅ Razorpay SDK + webhook handler | not_started |
|
||||
|
||||
### Chat Setup Workflow
|
||||
|
||||
**Stripe (via chat)**
|
||||
- Chat warns: "Never share your live Stripe secret key over unsecured channels. Nebula stores it encrypted."
|
||||
- Chat asks: "Paste your Stripe Secret Key (`sk_live_...` or `sk_test_...` for testing)"
|
||||
- Chat asks: "Should Nebula receive Stripe webhooks? (payment_intent, subscription events, etc.)"
|
||||
- Chat asks: "Which events matter most? (charge.succeeded, invoice.paid, subscription.updated…)"
|
||||
- Chat confirms: "Registering webhook endpoint… verifying with a test event…"
|
||||
|
||||
---
|
||||
|
||||
## 15. AI Models
|
||||
|
||||
| Integration | Auth Method | User API Key | User OAuth | User Credentials | Platform Key (Nebula) | Code-Side Support | Status |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **OpenAI** | api_key | ✅ API key (BYOAI) | — | — | ✅ Nebula platform key (for Offered AI quota) | ✅ openai Python SDK (built-in) | built_in |
|
||||
| **Anthropic** | api_key | ✅ API key (BYOAI) | — | — | ✅ Nebula platform key | ✅ anthropic Python SDK | built_in |
|
||||
| **Mistral AI** | api_key | ✅ API key | — | — | ✅ Nebula platform key | ✅ OpenAI-compatible client | built_in |
|
||||
| **Google Gemini** | api_key / oauth2 | ✅ API key | — | — | ✅ Nebula platform key | ✅ google-generativeai SDK | built_in |
|
||||
| **Cohere** | api_key | ✅ API key | — | — | — | ✅ cohere Python SDK | not_started |
|
||||
| **HuggingFace** | api_key | ✅ API key (inference) | — | — | — | ✅ huggingface_hub client | not_started |
|
||||
| **Groq** | api_key | ✅ API key | — | — | ✅ Nebula platform key | ✅ OpenAI-compatible client | built_in |
|
||||
| **Ollama** | none (local) | — | — | — | — | ✅ OpenAI-compatible client via Bridge extension | built_in |
|
||||
| **vLLM** | none / api_key | — | — | ✅ Endpoint URL | — | ✅ OpenAI-compatible client via Bridge | not_started |
|
||||
| **LM Studio** | none (local) | — | — | — | — | ✅ OpenAI-compatible client via Bridge | built_in |
|
||||
|
||||
### Chat Setup Workflow
|
||||
|
||||
**OpenAI BYOAI (via chat)**
|
||||
- Chat says: "You can register your own OpenAI key to bypass the shared quota."
|
||||
- Chat asks: "Paste your OpenAI API key (`sk-...`)"
|
||||
- Chat asks: "Which models do you want to make available? (gpt-4o, gpt-4o-mini, o1…)"
|
||||
- Chat confirms: "Testing key with a lightweight completion…"
|
||||
|
||||
**Ollama / LM Studio (via chat)**
|
||||
- Chat says: "These run locally. Make sure the Nebula Local Bridge extension is installed and active."
|
||||
- Chat asks: "What's the local URL? (default: `http://localhost:11434` for Ollama)"
|
||||
- Chat asks: "Which model do you want to use? (I can list available models)"
|
||||
- Chat confirms: "Bridge is active — testing model availability…"
|
||||
|
||||
---
|
||||
|
||||
## 16. Automation Platforms
|
||||
|
||||
| Integration | Auth Method | User API Key | User OAuth | User Credentials | Platform Key (Nebula) | Code-Side Support | Status |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **Playwright** | none (local) | — | — | — | — | ✅ Playwright Python runner (sandboxed agent tool) | not_started |
|
||||
| **Puppeteer** | none (local) | — | — | — | — | ✅ Puppeteer Node runner | not_started |
|
||||
| **Selenium** | none (local) | — | — | — | — | ✅ Selenium WebDriver runner | not_started |
|
||||
| **Zapier** | api_key / webhook | ✅ Zapier API key | — | — | ✅ Zapier webhook URL registered per workflow | ✅ Webhook trigger handler | not_started |
|
||||
| **IFTTT** | webhook | ✅ IFTTT Webhooks key (per-user, from ifttt.com/maker_webhooks) | — | — | — | ✅ Webhook trigger handler | not_started |
|
||||
| **Make (Integromat)** | api_key / webhook | ✅ Make API key | — | — | ✅ Webhook URL registered in Make scenario | ✅ Webhook trigger handler | not_started |
|
||||
| **n8n** | api_key / webhook | ✅ n8n API key | — | — | ✅ n8n instance URL + webhook URL | ✅ Webhook trigger handler | not_started |
|
||||
| **Apache Airflow** | api_key / credentials | ✅ REST API key or user+pass | — | ✅ Airflow host URL | — | ✅ Airflow REST API client | not_started |
|
||||
| **Prefect** | api_key | ✅ Prefect API key | — | — | — | ✅ Prefect Python client | not_started |
|
||||
| **Temporal** | api_key / mtls | ✅ API key | — | ✅ Temporal host + namespace | — | ✅ Temporal Python SDK | not_started |
|
||||
| **Dagster** | api_key | ✅ Dagster+ token | — | ✅ Dagster host URL | — | ✅ Dagster GraphQL client | not_started |
|
||||
|
||||
### Chat Setup Workflow
|
||||
|
||||
**Playwright (via chat)**
|
||||
- Chat says: "Playwright runs as a sandboxed browser on the Nebula server (or locally via Bridge)."
|
||||
- Chat asks: "Should browser automation run locally (via Bridge) or on the Nebula cloud sandbox?"
|
||||
- Chat asks: "Any starting URL or site the agent should target?"
|
||||
- Chat confirms: "Launching a headless test navigation to `example.com`…"
|
||||
|
||||
**Zapier (via chat)**
|
||||
- Chat asks: "Paste your Zapier API key"
|
||||
- Chat asks: "Which Zap should Nebula trigger? (list by name or ID)"
|
||||
- Chat asks: "What data payload should be sent with the trigger?"
|
||||
- Chat confirms: "Sending a test trigger to Zapier…"
|
||||
|
||||
---
|
||||
|
||||
## 17. Observability
|
||||
|
||||
| Integration | Auth Method | User API Key | User OAuth | User Credentials | Platform Key (Nebula) | Code-Side Support | Status |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **Prometheus** | none / api_key | — | — | ✅ Prometheus endpoint URL | — | ✅ PromQL HTTP client | not_started |
|
||||
| **Grafana** | api_key / oauth2 | ✅ Service account token | ✅ OAuth | ✅ Grafana host URL | — | ✅ Grafana HTTP API client | not_started |
|
||||
| **Datadog** | api_key | ✅ API key + App key | — | — | ✅ Datadog app key (for Nebula integration in Datadog catalog) | ✅ Datadog Python SDK | not_started |
|
||||
| **New Relic** | api_key | ✅ License key + User API key | — | — | — | ✅ New Relic REST/NerdGraph client | not_started |
|
||||
| **OpenTelemetry** | none / api_key | — | — | ✅ OTLP endpoint URL | — | ✅ opentelemetry-sdk (built into runtime) | partial |
|
||||
| **Jaeger** | none | — | — | ✅ Jaeger host URL | — | ✅ Jaeger HTTP API client | not_started |
|
||||
| **ELK Stack** | api_key / credentials | ✅ API key (cloud) | — | ✅ Elasticsearch URL + auth (self-hosted) | — | ✅ elasticsearch-py + Kibana API | not_started |
|
||||
| **Loki** | api_key | ✅ API key | — | ✅ Loki host URL | — | ✅ LogQL HTTP client | not_started |
|
||||
|
||||
### Chat Setup Workflow
|
||||
|
||||
**Datadog (via chat)**
|
||||
- Chat asks: "Paste your Datadog API key (Site: `app.datadoghq.com` or `datadoghq.eu`)"
|
||||
- Chat asks: "Paste your Datadog Application key"
|
||||
- Chat asks: "Which Datadog site are you on? (US1, EU, US3, etc.)"
|
||||
- Chat asks: "Should Nebula receive Datadog webhook alerts to auto-trigger incident workflows?"
|
||||
- Chat confirms: "Posting a test event to verify…"
|
||||
|
||||
**Prometheus / Grafana (via chat)**
|
||||
- Chat asks: "What's the Prometheus/Grafana URL? (e.g. `http://prometheus.internal:9090`)"
|
||||
- Chat asks: "Is authentication required? (basic auth or bearer token)"
|
||||
- Chat asks: "Which metrics or dashboards should Nebula monitor?"
|
||||
- Chat confirms: "Running a test PromQL query to verify…"
|
||||
|
||||
---
|
||||
|
||||
## 18. Data Services
|
||||
|
||||
| Integration | Auth Method | User API Key | User OAuth | User Credentials | Platform Key (Nebula) | Code-Side Support | Status |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **Weather API** | api_key | ✅ API key (OpenWeatherMap, WeatherAPI, etc.) | — | — | ✅ Nebula platform key for shared usage (optional) | ✅ HTTP client with provider adapter | not_started |
|
||||
|
||||
### Chat Setup Workflow
|
||||
|
||||
**Weather API (via chat)**
|
||||
- Chat asks: "Which weather provider do you prefer? (OpenWeatherMap, WeatherAPI.com, Weatherstack)"
|
||||
- Chat asks: "Paste your API key, or Nebula can use its shared key (rate limits apply)"
|
||||
- Chat asks: "Default location for weather queries? (city name or lat/lng)"
|
||||
- Chat confirms: "Fetching current weather for the default location to verify…"
|
||||
|
||||
---
|
||||
|
||||
## Summary: Platform Keys Nebula Must Procure
|
||||
|
||||
These are **Nebula-side** registrations — not per-user, but shared infrastructure the operator must configure once:
|
||||
|
||||
| Platform | What to Register | Where |
|
||||
|---|---|---|
|
||||
| **Slack** | Slack App (OAuth, bot scopes, signing secret) | api.slack.com/apps |
|
||||
| **GitHub** | GitHub OAuth App or GitHub App | github.com/settings/apps |
|
||||
| **GitLab** | OAuth application | gitlab.com/oauth/applications |
|
||||
| **Bitbucket** | OAuth consumer | bitbucket.org/account/settings |
|
||||
| **Jira / Confluence / Trello** | Atlassian OAuth app | developer.atlassian.com |
|
||||
| **Linear** | Linear OAuth app | linear.app/settings/api |
|
||||
| **ClickUp / Asana** | OAuth app | dev portals per service |
|
||||
| **Microsoft Teams / Outlook / OneDrive / SharePoint / Azure** | Azure AD app registration | portal.azure.com |
|
||||
| **Google (Gmail / Drive / Meet / Gemini / GCP)** | GCP project + OAuth client | console.cloud.google.com |
|
||||
| **Zoom** | Zoom Server-to-Server OAuth app | marketplace.zoom.us |
|
||||
| **WhatsApp** | Meta Business App + WhatsApp Business Platform | developers.facebook.com |
|
||||
| **Discord** | Discord Application + Bot | discord.com/developers |
|
||||
| **Webex** | Webex Integration app | developer.webex.com |
|
||||
| **Miro / Canva / Figma** | OAuth app per service | respective developer portals |
|
||||
| **Datadog** | App key for webhook integration | app.datadoghq.com |
|
||||
| **Stripe / Razorpay** | Webhook endpoint signing secret | Stripe / Razorpay dashboard |
|
||||
| **Meta/PayPal** | App registration for OAuth | developer platforms |
|
||||
| **OpenAI / Anthropic / Groq / Mistral / Gemini** | Platform-level API key (Offered AI quota) | respective API consoles |
|
||||
|
||||
---
|
||||
|
||||
## Summary: Auth Pattern Quick Reference
|
||||
|
||||
| Auth Pattern | Integrations | User Action Required | Nebula Action Required |
|
||||
|---|---|---|---|
|
||||
| **OAuth2 (user-delegated)** | Slack, GitHub, Jira, Google (Drive/Gmail/Meet), Microsoft (Teams/Outlook/Azure), Zoom, Notion, Figma, Dropbox, Miro, Canva | Click authorize button | Register OAuth app, handle callback, store token |
|
||||
| **User API Key** | Linear, PagerDuty, Stripe, OpenAI, Anthropic, Datadog, Sentry, Jira, Airtable, Razorpay, etc. | Paste key from their account | Store encrypted per-user, never log |
|
||||
| **Platform/Developer Key (Nebula-only)** | WhatsApp, Slack (app-level), MS Teams (Azure app), Google GCP | None (transparent) | Operator must register once |
|
||||
| **User Credentials** | PostgreSQL, MySQL, SMTP, IMAP, Kafka, Kubernetes, MongoDB, Redis | Paste host+user+password | Encrypt, test connectivity |
|
||||
| **Local / No Auth** | Ollama, LM Studio, DuckDB, SQLite, Playwright, Selenium | Configure local URL | Bridge extension routes traffic |
|
||||
| **Webhook** | GitHub, Jira, Stripe, PagerDuty, Slack events, WhatsApp, Telegram | Register endpoint (auto) | Expose `/webhooks/<slug>`, verify HMAC |
|
||||
| **Bot Token** | Telegram, Discord | Create bot and paste token | Register webhook per bot token |
|
||||
| **Service Account / IAM Role** | AWS, GCP, BigQuery, Azure Managed Identity | Supply JSON key or role ARN | None (per-user config) |
|
||||
|
||||
---
|
||||
|
||||
---
|
||||
|
||||
# Consolidated Work Breakdown — What Needs to Be Built in Nebula
|
||||
|
||||
> Work is categorized into four tiers: **Global** (all integrations benefit), **Type-Specific** (a group shares the same infrastructure), **Integration-Specific** (unique per provider), and **Zero-Code** (existing framework already covers it fully).
|
||||
|
||||
---
|
||||
|
||||
## Tier 1: Global — All Integrations Use This
|
||||
|
||||
These are foundational subsystems. Every integration that is built will depend on them. Build once, leverage everywhere.
|
||||
|
||||
| # | Work Item | Description | Integrations That Benefit |
|
||||
|---|---|---|---|
|
||||
| G1 | **Encrypted credential store** | Per-user, per-integration encrypted storage for API keys, tokens, connection strings. Fernet encryption, never logged, decrypted only at task runtime. Already partially exists in `system_settings`. Must be generalized to per-user scope. | Every integration requiring user credentials |
|
||||
| G2 | **Integration instance lifecycle** | `integration_instances` table + service layer: create, activate, test-connection, deactivate, delete. Already in DB schema; needs service + API wiring to be complete. | Every integration |
|
||||
| G3 | **Chat-driven setup conductor** | Chat tool (`setup_integration`) that drives the multi-step Q&A flow for any integration. Must be config-driven: reads `config_schema` from `integration_providers` to know what fields to collect, then calls the credential store. | Every integration set up via chat |
|
||||
| G4 | **Connectivity verification** | Generic `test_connection` executor per integration type — runs a lightweight read-only probe and returns success/error. Called after setup to confirm credentials work. | Every integration |
|
||||
| G5 | **Webhook endpoint dispatcher** | Single inbound route `/webhooks/<slug>` that routes to registered handler by slug. Already partially built (`webhooks` + `webhook_deliveries` tables exist). Needs generic dispatch + HMAC verification that adapts per-provider signing scheme. | GitHub, Jira, Slack, PagerDuty, Stripe, WhatsApp, Telegram, Twilio, Sentry, ADO, Datadog, Razorpay, ClickUp, Asana, Linear, Zoom, DockerHub |
|
||||
| G6 | **OAuth2 callback handler** | Generic OAuth2 authorization-code flow: redirect-to-provider, handle callback, exchange code for token, store token. Already partially in `oauth2_states` table. Needs to be completed per-provider with client_id/secret from env config. | Slack, GitHub, GitLab, Bitbucket, Jira, Linear, ClickUp, Asana, Google (all), Microsoft (all), Zoom, Notion, Dropbox, Figma, Miro, Canva, Discord, Webex, PayPal, BigQuery, Airtable |
|
||||
| G7 | **Token refresh loop** | Background job that refreshes OAuth2 access tokens before expiry using refresh tokens. Needs to run per-integration-instance. | All OAuth2 integrations |
|
||||
| G8 | **Integration catalog API** | `GET /api/v1/integrations/catalog` and `GET /api/v1/integrations/catalog/{slug}` — exposes catalog with auth requirements, capabilities, setup instructions. Needed by chat discovery and `/integrations` UI. | All integrations (catalog browsing) |
|
||||
| G9 | **Chat resolve endpoint** | `GET /api/v1/integrations/resolve?name=<free_text>` — maps user's natural language (e.g. "my slack") to a provider slug, returns auth requirements and setup path. Used by chat to detect integration intent. | All integrations (chat discovery) |
|
||||
| G10 | **Secure sandbox for tool execution** | Agent sandbox environment where tool calls against external APIs are made. Enforces policy, captures output, logs every call. All integration tool calls must route through this. | All integrations invoked by agents/workflows |
|
||||
|
||||
---
|
||||
|
||||
## Tier 2: Type-Specific — One Implementation Covers a Group
|
||||
|
||||
Each group below shares the same underlying pattern. Build the pattern once; adding a new integration in that group is primarily configuration + thin adapter.
|
||||
|
||||
### 2A. REST API Key integrations (majority of catalog)
|
||||
> Covers: Linear, PagerDuty, Jira (api_key path), Notion (token path), Sentry, CircleCI, TravisCI, DockerHub, SonarQube, Codecov, Datadog, New Relic, Grafana, Stripe, Razorpay, Twilio, SendGrid, Mailgun, Snowflake, Cloudflare, Zapier, n8n, Make, Airflow, Prefect, Dagster, HuggingFace, Cohere, Harbor, Weather API, ELK, Loki
|
||||
|
||||
| Work Item | Description |
|
||||
|---|---|
|
||||
| **Generic API key collector** | Chat tool that prompts for key fields defined in `config_schema.required[]`, validates format (non-empty, optional prefix check), stores encrypted. |
|
||||
| **Per-integration HTTP client base** | Thin base class / factory: takes `base_url` + auth header from stored credentials → makes authenticated HTTP requests. Adapters per provider are ~20-50 lines each (endpoint paths + response parsing). |
|
||||
| **Connectivity probe registry** | Registry of lightweight probe calls per slug: e.g. `jira → GET /rest/api/3/myself`, `stripe → GET /v1/account`, `linear → POST /graphql {viewer{id}}`. Called by G4. |
|
||||
|
||||
### 2B. OAuth2 User-Delegated integrations
|
||||
> Covers: Slack, GitHub, GitLab, Bitbucket, Jira (OAuth path), ClickUp, Asana, Google (Gmail/Drive/Meet/Gemini/GCP), Microsoft (Teams/Outlook/OneDrive/SharePoint/Azure), Zoom, Notion (OAuth path), Dropbox, Figma, Miro, Canva, Discord, Webex, PayPal, BigQuery, Airtable
|
||||
|
||||
| Work Item | Description |
|
||||
|---|---|
|
||||
| **OAuth2 app registry** | Config store (env vars or DB) for per-provider `client_id` + `client_secret` + `scopes` + `authorize_url` + `token_url`. Operator configures once. |
|
||||
| **OAuth2 flow UI trigger** | Chat returns an action card with a "Connect" URL that opens the provider's authorization page. Post-callback, token is stored and instance is activated. |
|
||||
| **Scope negotiation** | Each integration defines its required scopes. OAuth2 flow must request only declared scopes. |
|
||||
|
||||
### 2C. Webhook-inbound integrations
|
||||
> Covers: GitHub, Jira, Slack Events API, PagerDuty, Stripe, WhatsApp, Telegram, Twilio SMS, Sentry, ADO Service Hooks, Datadog, Razorpay, Linear, ClickUp, DockerHub, Zoom
|
||||
|
||||
| Work Item | Description |
|
||||
|---|---|
|
||||
| **HMAC verifier registry** | Per-provider HMAC verification logic: header name + algorithm differ (e.g. `X-Hub-Signature-256` for GitHub, `X-Slack-Signature` for Slack, `Stripe-Signature` for Stripe). Build as a pluggable verifier map. |
|
||||
| **Event type router** | Dispatch inbound webhook payload to the correct agent/workflow trigger based on event type field (e.g. GitHub `X-GitHub-Event`, Stripe `type`, Slack `event.type`). |
|
||||
| **Webhook registration automator** | After user provides credentials, automatically call the provider's API to register Nebula's endpoint as the webhook destination (GitHub: `POST /repos/{owner}/{repo}/hooks`, Stripe: `POST /webhook_endpoints`, etc.). |
|
||||
|
||||
### 2D. Database / connection-string integrations
|
||||
> Covers: PostgreSQL, MySQL, SQLite, MongoDB, Redis, Elasticsearch, ClickHouse, DuckDB, Snowflake, Apache Kafka
|
||||
|
||||
| Work Item | Description |
|
||||
|---|---|
|
||||
| **Database connection manager** | Pool of named connections keyed by integration instance ID. Connects on first use, keeps alive, reconnects on failure. |
|
||||
| **Query tool** | Generic agent tool `db_query(instance_id, query, params)` — routes to correct driver by provider slug. Must enforce read-only mode if configured. |
|
||||
| **Schema inspector** | `db_schema(instance_id)` — returns table/collection list and schemas for agent context. |
|
||||
| **Driver adapter set** | Thin adapters per DB: asyncpg (✅ built-in), aiomysql, motor (MongoDB), redis-py, elasticsearch-py, clickhouse-driver, duckdb, snowflake-connector, confluent-kafka. Install as optional deps. |
|
||||
|
||||
### 2E. Local / Bridge-routed integrations
|
||||
> Covers: Ollama, LM Studio, vLLM, SQLite, DuckDB, Playwright, Puppeteer, Selenium, Signal CLI, Terraform CLI, Pulumi CLI
|
||||
|
||||
| Work Item | Description |
|
||||
|---|---|
|
||||
| **Bridge routing layer** | For `localOnly=true` integrations, route all calls through the Nebula Local Bridge extension (already exists). Needs a standardized message protocol for tool call → bridge → local service → response. |
|
||||
| **Local sandbox executor** | For CLI tools (Playwright, Terraform, Pulumi), a sandboxed subprocess executor in the agent environment. Controls PATH, captures stdout/stderr, enforces timeouts. |
|
||||
|
||||
### 2F. Bot-token consumer messaging integrations
|
||||
> Covers: Telegram, Discord
|
||||
|
||||
| Work Item | Description |
|
||||
|---|---|
|
||||
| **Bot token webhook registrar** | After user supplies bot token, auto-register Nebula's inbound webhook URL with the provider (Telegram: `setWebhook`, Discord: interactions endpoint URL). |
|
||||
| **Inbound message dispatcher** | Parse provider-specific message format → normalize to Nebula `InboundMessage` model → route to assigned agent or conversation. |
|
||||
| **Outbound message sender** | Generic `send_message(instance_id, chat_id, text, attachments?)` — routes to correct provider client. |
|
||||
|
||||
### 2G. Cloud provider / IAM-credential integrations
|
||||
> Covers: AWS, GCP, Azure (service principal), Kubernetes, BigQuery (service account path)
|
||||
|
||||
| Work Item | Description |
|
||||
|---|---|
|
||||
| **Service account / key file ingestion** | Accept JSON key files (GCP service account, AWS credentials file) or structured key-value pairs from chat, store securely per-instance. |
|
||||
| **Assumed-role / token-exchange flow** | For AWS IAM Role assumption and Azure Managed Identity, implement STS `AssumeRole` and token-exchange flows. |
|
||||
| **Cloud SDK credential injector** | At tool call time, inject stored credentials into the SDK call context (boto3 Session, google-auth, azure-identity). Never expose raw credentials to agent prompts. |
|
||||
|
||||
### 2H. Email protocol integrations (SMTP/IMAP)
|
||||
> Covers: SMTP, IMAP
|
||||
|
||||
| Work Item | Description |
|
||||
|---|---|
|
||||
| **SMTP sender tool** | `send_email(instance_id, to, subject, body, attachments?)` backed by aiosmtplib with TLS support. |
|
||||
| **IMAP monitor** | Polling or IMAP IDLE listener per instance. On new mail, emits an event that can trigger an agent/workflow. |
|
||||
|
||||
---
|
||||
|
||||
## Tier 3: Integration-Specific — Unique Per Provider
|
||||
|
||||
These items cannot be generalized — each requires its own bespoke implementation.
|
||||
|
||||
| Integration | Specific Work Required |
|
||||
|---|---|
|
||||
| **Slack** | Slash command endpoint (`/slack/commands`), interactive message handler (buttons, modals), Event Subscriptions URL verification, bot presence management |
|
||||
| **GitHub** | GitHub App installation flow (different from OAuth App), `check_run` and `deployment` API calls for CI integration, repository dispatch events |
|
||||
| **WhatsApp** | Meta webhook verification handshake (`GET` with `hub.challenge`), message template management API (templates must be pre-approved by Meta), media upload flow |
|
||||
| **Telegram** | Inline keyboard builder, bot command registry (`setMyCommands`), file/media download from Telegram CDN |
|
||||
| **Jira** | Atlassian Connect app descriptor (for deep marketplace integration), JQL query builder, sprint/board management APIs |
|
||||
| **Microsoft Teams** | Bot Framework adapter or Teams manifest + App Studio, Adaptive Cards renderer for rich messages, meeting bot participant API |
|
||||
| **Discord** | Interactions endpoint verification (Ed25519 signature), slash command registration (`/applications/{id}/commands`), guild permission management |
|
||||
| **AWS** | Per-service sub-adapters: S3 (presigned URLs, multipart upload), EC2 (describe/start/stop), Lambda (invoke sync/async), DynamoDB (query/scan), CloudWatch (metrics/logs/alarms) |
|
||||
| **Kubernetes** | kubeconfig parser + multi-cluster support, namespace-scoped RBAC enforcement, watch API for real-time pod/deployment events |
|
||||
| **Stripe** | Idempotency key management for payment operations, webhook event deduplication (Stripe sends retries), portal session creation for customer self-service |
|
||||
| **Terraform** | Plan output parser (JSON plan format), approval gate before `apply`, state file locking awareness |
|
||||
| **Pulumi** | Stack output parser, deployment log streaming, language runtime selection (Python/Node/Go) |
|
||||
| **Playwright** | Browser context pool management, screenshot/PDF storage in artifacts, network interception rule builder |
|
||||
| **Signal** | signal-cli daemon process management, linked device registration flow via QR code, message encryption is handled by signal-cli |
|
||||
| **OpenTelemetry** | Nebula runtime instrumentation (agent execution spans, tool call spans, policy check spans) — emit to user-configured OTLP collector |
|
||||
| **Datadog** | Two-way: query Datadog for metrics/logs AND receive Datadog webhook alerts → trigger Nebula workflows. Requires both DD API client and inbound webhook handler |
|
||||
| **BigQuery** | Dataset/table schema caching for agent context, streaming insert support, query job polling (async queries can take minutes) |
|
||||
| **Apache Kafka** | Consumer group management, offset commit strategy, schema registry integration (Avro/Protobuf), partition assignment |
|
||||
| **Twilio** | Voice call TwiML generator, conference call management, Studio flow trigger (separate from messaging API) |
|
||||
|
||||
---
|
||||
|
||||
## Tier 4: Zero Additional Code Required — Existing Framework Covers It
|
||||
|
||||
These integrations require **no new Nebula code** once the foundational layers (G1–G10) and relevant type-specific tier (Tier 2) are in place. Adding them is purely configuration + thin credential adapter.
|
||||
|
||||
> **Why:** They use the existing OpenAI-compatible model provider interface already wired in Nebula. The model provider selector pattern handles auth, routing, and tool calling.
|
||||
|
||||
### AI Model Providers — Model Provider Interface (already built-in)
|
||||
|
||||
| Integration | Why Zero Additional Code |
|
||||
|---|---|
|
||||
| **OpenAI** | ✅ Already built-in via openai SDK + model provider system |
|
||||
| **Anthropic** | ✅ Already built-in via anthropic SDK + model provider system |
|
||||
| **Mistral AI** | ✅ Already built-in — OpenAI-compatible endpoint |
|
||||
| **Google Gemini** | ✅ Already built-in via google-generativeai SDK |
|
||||
| **Groq** | ✅ Already built-in — OpenAI-compatible endpoint |
|
||||
| **Ollama** | ✅ Already built-in — OpenAI-compatible endpoint via Bridge |
|
||||
| **LM Studio** | ✅ Already built-in — OpenAI-compatible endpoint via Bridge |
|
||||
| **Cohere** | Zero new infra — just add cohere SDK as optional dep + model provider adapter (~30 lines) |
|
||||
| **HuggingFace Inference API** | Zero new infra — OpenAI-compatible or REST adapter (~30 lines) |
|
||||
| **vLLM** | Zero new infra — OpenAI-compatible endpoint, user supplies URL |
|
||||
|
||||
### Local File Databases — Already Available in Python stdlib / thin dep
|
||||
|
||||
| Integration | Why Zero Additional Code |
|
||||
|---|---|
|
||||
| **SQLite** | `sqlite3` is Python stdlib — just expose as a `db_query` tool with file-path credential |
|
||||
| **DuckDB** | `duckdb` is a pip install — same `db_query` tool pattern, no infra change |
|
||||
|
||||
### Simple Outbound-Only HTTP Integrations (once generic REST client is built)
|
||||
|
||||
Once the Tier 2A generic API key HTTP client base (§2A) exists, these require only a thin config entry (base URL + auth header + endpoint map) — no new framework code:
|
||||
|
||||
| Integration | What's Needed Beyond Framework |
|
||||
|---|---|
|
||||
| **Weather API** | Endpoint map + response field normalizer (~20 lines) |
|
||||
| **Codecov** | Endpoint map for coverage report fetch (~20 lines) |
|
||||
| **TravisCI** | Endpoint map for build status (~20 lines) |
|
||||
| **Prefect** | Python SDK call wrappers (~30 lines) |
|
||||
| **New Relic** | NRQL query wrapper (~20 lines) |
|
||||
| **Jaeger** | HTTP trace query wrapper (~20 lines) |
|
||||
| **Loki** | LogQL HTTP query wrapper (~20 lines) |
|
||||
| **Prometheus** | PromQL HTTP query wrapper (~20 lines) |
|
||||
| **Coda** | REST API adapter with api_key header (~30 lines) |
|
||||
| **Cloudflare** | REST API adapter with api_key header (~30 lines) |
|
||||
|
||||
---
|
||||
|
||||
## Summary Matrix
|
||||
|
||||
| Category | Count | Key Dependency |
|
||||
|---|---|---|
|
||||
| **Global (G1–G10)** | 10 items | Must be built first — everything blocks on these |
|
||||
| **Type-Specific (Tier 2, groups A–H)** | 8 groups | Build once per group, unlocks all integrations in that group |
|
||||
| **Integration-Specific (Tier 3)** | 19 integrations | Custom work per provider, can be done in parallel |
|
||||
| **Zero Additional Code (Tier 4)** | 20+ integrations | Unlocked automatically once Tier 1 + relevant Tier 2 is done |
|
||||
|
||||
### Recommended Build Order
|
||||
|
||||
```
|
||||
Phase 0 (Unblocks everything):
|
||||
G1 (credential store) → G2 (instance lifecycle) → G5 (webhook dispatcher) → G6 (OAuth2 callback)
|
||||
|
||||
Phase 1 (First working integrations):
|
||||
Tier 2A (REST API key client) → wire Jira + Linear + PagerDuty + Notion
|
||||
Tier 2C (Webhook HMAC verifier) → wire GitHub + Slack (already partial)
|
||||
Tier 2D (DB connection manager) → wire PostgreSQL (already built-in)
|
||||
|
||||
Phase 2 (Expand coverage):
|
||||
G3 (chat setup conductor) → all new integrations become chat-configurable
|
||||
Tier 2B (OAuth2 full flow) → unlock Google + Microsoft ecosystem
|
||||
Tier 2F (Bot messaging) → wire Telegram + Discord
|
||||
|
||||
Phase 3 (Complex / high-effort):
|
||||
Tier 3 specifics: Slack slash commands, WhatsApp templates, Teams Bot Framework, Kubernetes watch, AWS sub-services
|
||||
Tier 2E (Local/Bridge/CLI sandbox) → Playwright, Terraform, Pulumi
|
||||
|
||||
Phase 4 (Zero-code + thin adapters):
|
||||
All Tier 4 entries — just register new providers in catalog and config
|
||||
```
|
||||
@@ -82,8 +82,7 @@ faker==22.0.0
|
||||
playwright>=1.49.0
|
||||
pytest-playwright==0.4.4
|
||||
|
||||
# --- Web Scraper + Search (optional, feature-gated) ---
|
||||
ddgs>=0.1.0
|
||||
# --- Web Scraper (optional, feature-gated) ---
|
||||
crawl4ai==0.4.247
|
||||
firecrawl-py==1.4.0
|
||||
scrapegraphai==1.13.3
|
||||
|
||||
@@ -140,6 +140,30 @@ def _extract_reply_text(data: Dict[str, Any]) -> str:
|
||||
)
|
||||
|
||||
|
||||
def _synthesize_tool_calls_from_plan(plan_json: Dict[str, Any]) -> List[Dict[str, Any]]:
|
||||
"""Convert plan JSON requested_tools into OpenAI-compatible tool_calls.
|
||||
|
||||
requested_tools format emitted by LLM per prompt instructions:
|
||||
[{"recipient_name": "functions.tool_name", "parameters": {...}}]
|
||||
"""
|
||||
synthesized: List[Dict[str, Any]] = []
|
||||
for i, entry in enumerate(plan_json.get("requested_tools") or []):
|
||||
recipient = entry.get("recipient_name", "")
|
||||
tool_name = recipient.removeprefix("functions.").strip()
|
||||
if not tool_name:
|
||||
continue
|
||||
params = entry.get("parameters") or {}
|
||||
synthesized.append({
|
||||
"id": f"tc_plan_{i}",
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": tool_name,
|
||||
"arguments": json.dumps(params),
|
||||
},
|
||||
})
|
||||
return synthesized
|
||||
|
||||
|
||||
def _extract_plan_json(raw_text: str) -> Optional[Dict[str, Any]]:
|
||||
if not raw_text:
|
||||
return None
|
||||
@@ -1247,46 +1271,78 @@ async def _react_loop(
|
||||
)
|
||||
|
||||
if not tool_calls:
|
||||
# LLM produced text — loop ends naturally
|
||||
# LLM produced text — check if it's a plan JSON before terminating the loop.
|
||||
reply_text = _extract_reply_text(data)
|
||||
log.info("react_loop_complete", {
|
||||
"component": "api.chat",
|
||||
"operation": "_react_loop",
|
||||
"entity_id": "nebula-assistant",
|
||||
"correlation_id": cid,
|
||||
"metadata": {
|
||||
"iterations": iteration + 1,
|
||||
"tool_calls_total": len(tool_actions),
|
||||
"total_llm_ms": total_llm_ms,
|
||||
"total_tools_ms": total_tools_ms,
|
||||
},
|
||||
})
|
||||
return reply_text, tool_actions, model_used, total_llm_ms, total_tools_ms
|
||||
|
||||
plan_json = _extract_plan_json(_extract_reply_text(data))
|
||||
if plan_json and not any(ta.get("tool") == "_plan" for ta in tool_actions):
|
||||
tool_actions.append({
|
||||
"tool": "_plan",
|
||||
"result": {
|
||||
"success": True,
|
||||
"message": "Execution plan prepared.",
|
||||
"plan_json": plan_json,
|
||||
},
|
||||
"spinner_text": "Planning...",
|
||||
"iteration": iteration + 1,
|
||||
})
|
||||
await _record_execution_step(
|
||||
execution_steps,
|
||||
bus=tool_ctx.bus,
|
||||
session_id=session_id,
|
||||
user_id=tool_ctx.user_id,
|
||||
correlation_id=cid,
|
||||
step_type="plan_generated",
|
||||
step_label="Execution plan prepared",
|
||||
status="completed",
|
||||
iteration=iteration + 1,
|
||||
metadata={"requested_tools": plan_json.get("requested_tools", [])},
|
||||
)
|
||||
plan_json = _extract_plan_json(reply_text)
|
||||
if plan_json and not any(ta.get("tool") == "_plan" for ta in tool_actions):
|
||||
synthesized = _synthesize_tool_calls_from_plan(plan_json)
|
||||
if synthesized:
|
||||
# LLM chose the plan-JSON path. Record the plan step, synthesize
|
||||
# real tool_calls, and fall through to the tool execution round.
|
||||
tool_actions.append({
|
||||
"tool": "_plan",
|
||||
"result": {
|
||||
"success": True,
|
||||
"message": "Execution plan prepared.",
|
||||
"plan_json": plan_json,
|
||||
},
|
||||
"spinner_text": "Planning...",
|
||||
"iteration": iteration + 1,
|
||||
})
|
||||
await _record_execution_step(
|
||||
execution_steps,
|
||||
bus=tool_ctx.bus,
|
||||
session_id=session_id,
|
||||
user_id=tool_ctx.user_id,
|
||||
correlation_id=cid,
|
||||
step_type="plan_generated",
|
||||
step_label="Execution plan prepared",
|
||||
status="completed",
|
||||
iteration=iteration + 1,
|
||||
metadata={"requested_tools": plan_json.get("requested_tools", [])},
|
||||
)
|
||||
log.info("react_loop_plan_synthesized", {
|
||||
"component": "api.chat",
|
||||
"operation": "_react_loop",
|
||||
"entity_id": "nebula-assistant",
|
||||
"correlation_id": cid,
|
||||
"metadata": {
|
||||
"iteration": iteration + 1,
|
||||
"synthesized_tool_count": len(synthesized),
|
||||
},
|
||||
})
|
||||
tool_calls = synthesized
|
||||
# Fall through: tool_calls is now set — execution round runs below.
|
||||
else:
|
||||
# Plan JSON has no recognisable tools — return plan text as reply.
|
||||
log.info("react_loop_complete", {
|
||||
"component": "api.chat",
|
||||
"operation": "_react_loop",
|
||||
"entity_id": "nebula-assistant",
|
||||
"correlation_id": cid,
|
||||
"metadata": {
|
||||
"iterations": iteration + 1,
|
||||
"tool_calls_total": len(tool_actions),
|
||||
"total_llm_ms": total_llm_ms,
|
||||
"total_tools_ms": total_tools_ms,
|
||||
},
|
||||
})
|
||||
return reply_text, tool_actions, model_used, total_llm_ms, total_tools_ms
|
||||
else:
|
||||
# Normal text reply — loop ends naturally.
|
||||
log.info("react_loop_complete", {
|
||||
"component": "api.chat",
|
||||
"operation": "_react_loop",
|
||||
"entity_id": "nebula-assistant",
|
||||
"correlation_id": cid,
|
||||
"metadata": {
|
||||
"iterations": iteration + 1,
|
||||
"tool_calls_total": len(tool_actions),
|
||||
"total_llm_ms": total_llm_ms,
|
||||
"total_tools_ms": total_tools_ms,
|
||||
},
|
||||
})
|
||||
return reply_text, tool_actions, model_used, total_llm_ms, total_tools_ms
|
||||
|
||||
# ── _respond_directly escape hatch ───────────────────────────────────
|
||||
# On iteration 0 the LLM may call _respond_directly to signal that it
|
||||
|
||||
@@ -1,87 +1,24 @@
|
||||
"""Web tools for Nebula chat: web_scraper and internet_search."""
|
||||
"""Web chat tools: internet_search and web_scraper.
|
||||
|
||||
Search logic is fully delegated to the search provider layer:
|
||||
src/execution_system/tools/search_providers/
|
||||
|
||||
This module is transport-only — it wires ChatTool definitions to providers.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import os
|
||||
import time
|
||||
from typing import Any, Dict, List, Optional
|
||||
from typing import Any, Dict, Optional
|
||||
|
||||
from libs.logging import get_logger
|
||||
from src.api.routers.chat_tools import ChatTool, ToolContext, registry
|
||||
from src.execution_system.tools.search_providers import get_search_provider
|
||||
|
||||
log = get_logger("api.chat.tools.web")
|
||||
|
||||
|
||||
# ── internet_search ───────────────────────────────────────────────────────────
|
||||
|
||||
def _check_duckduckgo() -> bool:
|
||||
try:
|
||||
import ddgs # noqa: F401
|
||||
return True
|
||||
except ImportError:
|
||||
return False
|
||||
|
||||
|
||||
def _check_serper() -> bool:
|
||||
return bool(os.getenv("SERPER_API_KEY"))
|
||||
|
||||
|
||||
async def _search_duckduckgo(query: str, max_results: int) -> List[Dict[str, Any]]:
|
||||
"""Search using ddgs library (no API key required)."""
|
||||
try:
|
||||
from ddgs import DDGS
|
||||
except ImportError as exc:
|
||||
raise RuntimeError(
|
||||
"ddgs is not installed. Run: pip install ddgs"
|
||||
) from exc
|
||||
|
||||
loop = asyncio.get_event_loop()
|
||||
|
||||
def _do_search():
|
||||
with DDGS() as client:
|
||||
return list(client.text(query, max_results=max_results))
|
||||
|
||||
raw = await loop.run_in_executor(None, _do_search)
|
||||
return [
|
||||
{
|
||||
"title": r.get("title", ""),
|
||||
"url": r.get("href", ""),
|
||||
"snippet": r.get("body", ""),
|
||||
}
|
||||
for r in raw
|
||||
]
|
||||
|
||||
|
||||
async def _search_serper(query: str, max_results: int) -> List[Dict[str, Any]]:
|
||||
"""Search using Serper.dev API (requires SERPER_API_KEY)."""
|
||||
import json
|
||||
import urllib.request
|
||||
|
||||
api_key = os.getenv("SERPER_API_KEY", "")
|
||||
payload = json.dumps({"q": query, "num": max_results}).encode()
|
||||
req = urllib.request.Request(
|
||||
"https://google.serper.dev/search",
|
||||
data=payload,
|
||||
headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
|
||||
)
|
||||
loop = asyncio.get_event_loop()
|
||||
|
||||
def _do_request():
|
||||
with urllib.request.urlopen(req, timeout=10) as resp:
|
||||
return json.loads(resp.read())
|
||||
|
||||
data = await loop.run_in_executor(None, _do_request)
|
||||
results: List[Dict[str, Any]] = []
|
||||
for item in data.get("organic", [])[:max_results]:
|
||||
results.append({
|
||||
"title": item.get("title", ""),
|
||||
"url": item.get("link", ""),
|
||||
"snippet": item.get("snippet", ""),
|
||||
})
|
||||
return results
|
||||
|
||||
|
||||
@registry.register
|
||||
class InternetSearchTool(ChatTool):
|
||||
name = "internet_search"
|
||||
@@ -91,8 +28,9 @@ class InternetSearchTool(ChatTool):
|
||||
"name": "internet_search",
|
||||
"description": (
|
||||
"Search the internet for current information, news, documentation, or any topic. "
|
||||
"Uses DuckDuckGo (no key needed) or Serper.dev if SERPER_API_KEY is set. "
|
||||
"Returns a list of search results with title, URL, and snippet."
|
||||
"Uses DuckDuckGo by default (no key needed). "
|
||||
"Set SERPER_API_KEY to use Google Search via Serper.dev. "
|
||||
"Returns a list of results with title, URL, and snippet."
|
||||
),
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
@@ -110,7 +48,7 @@ class InternetSearchTool(ChatTool):
|
||||
"engine": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Search engine to use: 'duckduckgo' (default, free) "
|
||||
"Search provider: 'duckduckgo' (default, free) "
|
||||
"or 'serper' (requires SERPER_API_KEY)."
|
||||
),
|
||||
"enum": ["duckduckgo", "serper"],
|
||||
@@ -124,33 +62,19 @@ class InternetSearchTool(ChatTool):
|
||||
async def execute(self, args: Dict[str, Any], ctx: ToolContext) -> Dict[str, Any]:
|
||||
query = (args.get("query") or "").strip()
|
||||
max_results = max(1, min(int(args.get("max_results") or 5), 10))
|
||||
engine = (args.get("engine") or "").strip().lower()
|
||||
engine_hint = (args.get("engine") or "").strip().lower() or None
|
||||
|
||||
if not query:
|
||||
return {"success": False, "error": "query is required", "results": []}
|
||||
|
||||
# Auto-select engine: prefer serper if key is set and not forced to ddg
|
||||
if engine == "serper" or (engine != "duckduckgo" and _check_serper()):
|
||||
selected_engine = "serper"
|
||||
elif _check_duckduckgo():
|
||||
selected_engine = "duckduckgo"
|
||||
else:
|
||||
return {
|
||||
"success": False,
|
||||
"error": (
|
||||
"No search backend available. "
|
||||
"Install ddgs (`pip install ddgs`) "
|
||||
"or set SERPER_API_KEY."
|
||||
),
|
||||
"results": [],
|
||||
}
|
||||
try:
|
||||
provider = get_search_provider(engine_hint)
|
||||
except ValueError as exc:
|
||||
return {"success": False, "error": str(exc), "results": []}
|
||||
|
||||
t0 = time.monotonic()
|
||||
try:
|
||||
if selected_engine == "serper":
|
||||
results = await _search_serper(query, max_results)
|
||||
else:
|
||||
results = await _search_duckduckgo(query, max_results)
|
||||
results = await provider.search(query, max_results)
|
||||
elapsed_ms = int((time.monotonic() - t0) * 1000)
|
||||
log.info("internet_search_completed", {
|
||||
"component": "api.chat.tools.web",
|
||||
@@ -159,7 +83,7 @@ class InternetSearchTool(ChatTool):
|
||||
"correlation_id": ctx.cid,
|
||||
"metadata": {
|
||||
"query": query,
|
||||
"engine": selected_engine,
|
||||
"provider": provider.name,
|
||||
"result_count": len(results),
|
||||
"elapsed_ms": elapsed_ms,
|
||||
},
|
||||
@@ -167,8 +91,8 @@ class InternetSearchTool(ChatTool):
|
||||
return {
|
||||
"success": True,
|
||||
"query": query,
|
||||
"engine": selected_engine,
|
||||
"results": results,
|
||||
"engine": provider.name,
|
||||
"results": [r.to_dict() for r in results],
|
||||
"result_count": len(results),
|
||||
"elapsed_ms": elapsed_ms,
|
||||
}
|
||||
@@ -179,7 +103,11 @@ class InternetSearchTool(ChatTool):
|
||||
"operation": "internet_search",
|
||||
"entity_id": ctx.user_id or "anon",
|
||||
"correlation_id": ctx.cid,
|
||||
"metadata": {"query": query, "engine": selected_engine, "error": str(exc)},
|
||||
"metadata": {
|
||||
"query": query,
|
||||
"provider": provider.name,
|
||||
"error": str(exc),
|
||||
},
|
||||
})
|
||||
return {"success": False, "error": str(exc), "results": [], "query": query}
|
||||
|
||||
|
||||
27
src/execution_system/tools/search_providers/__init__.py
Normal file
27
src/execution_system/tools/search_providers/__init__.py
Normal file
@@ -0,0 +1,27 @@
|
||||
"""Search provider interface, concrete providers, and factory.
|
||||
|
||||
Providers
|
||||
---------
|
||||
- DuckDuckGoProvider — uses DuckDuckGo HTML endpoint via httpx (no API key)
|
||||
- SerperProvider — uses Serper.dev Google Search API (requires SERPER_API_KEY)
|
||||
|
||||
Factory
|
||||
-------
|
||||
get_search_provider(engine_hint) → SearchProvider
|
||||
Auto-selects the best available provider, or respects an explicit hint.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from src.execution_system.tools.search_providers.base import SearchProvider, SearchResult
|
||||
from src.execution_system.tools.search_providers.duckduckgo import DuckDuckGoProvider
|
||||
from src.execution_system.tools.search_providers.serper import SerperProvider
|
||||
from src.execution_system.tools.search_providers.factory import get_search_provider
|
||||
|
||||
__all__ = [
|
||||
"SearchProvider",
|
||||
"SearchResult",
|
||||
"DuckDuckGoProvider",
|
||||
"SerperProvider",
|
||||
"get_search_provider",
|
||||
]
|
||||
35
src/execution_system/tools/search_providers/base.py
Normal file
35
src/execution_system/tools/search_providers/base.py
Normal file
@@ -0,0 +1,35 @@
|
||||
"""Abstract base for search providers."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from typing import Any, Dict, List
|
||||
from dataclasses import dataclass
|
||||
|
||||
|
||||
@dataclass
|
||||
class SearchResult:
|
||||
title: str
|
||||
url: str
|
||||
snippet: str
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
return {"title": self.title, "url": self.url, "snippet": self.snippet}
|
||||
|
||||
|
||||
class SearchProvider(ABC):
|
||||
"""Interface every search provider must implement."""
|
||||
|
||||
name: str
|
||||
|
||||
@abstractmethod
|
||||
def is_available(self) -> bool:
|
||||
"""Return True if this provider can be used (keys present, libs installed, etc.)."""
|
||||
|
||||
@abstractmethod
|
||||
async def search(self, query: str, max_results: int) -> List[SearchResult]:
|
||||
"""Execute a search and return a list of results.
|
||||
|
||||
Raises:
|
||||
RuntimeError: if the provider is unavailable or the request fails.
|
||||
"""
|
||||
65
src/execution_system/tools/search_providers/duckduckgo.py
Normal file
65
src/execution_system/tools/search_providers/duckduckgo.py
Normal file
@@ -0,0 +1,65 @@
|
||||
"""DuckDuckGo search provider.
|
||||
|
||||
Uses the DuckDuckGo HTML endpoint via httpx — no API key required.
|
||||
httpx is already a project dependency (requirements.txt).
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import html
|
||||
import re
|
||||
from typing import List
|
||||
|
||||
import httpx
|
||||
|
||||
from src.execution_system.tools.search_providers.base import SearchProvider, SearchResult
|
||||
|
||||
_DDG_HTML_URL = "https://html.duckduckgo.com/html/"
|
||||
_DDG_HEADERS = {
|
||||
"User-Agent": (
|
||||
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
|
||||
"AppleWebKit/537.36 (KHTML, like Gecko) "
|
||||
"Chrome/124.0.0.0 Safari/537.36"
|
||||
),
|
||||
"Accept-Language": "en-US,en;q=0.9",
|
||||
}
|
||||
_RESULT_RE = re.compile(
|
||||
r'class="result__a"[^>]+href="([^"]+)"[^>]*>([^<]+)<',
|
||||
re.IGNORECASE,
|
||||
)
|
||||
_SNIPPET_RE = re.compile(
|
||||
r'class="result__snippet"[^>]*>([^<]+)<',
|
||||
re.IGNORECASE,
|
||||
)
|
||||
|
||||
|
||||
class DuckDuckGoProvider(SearchProvider):
|
||||
"""Search via DuckDuckGo HTML endpoint. Always available — no key needed."""
|
||||
|
||||
name = "duckduckgo"
|
||||
|
||||
def is_available(self) -> bool:
|
||||
return True
|
||||
|
||||
async def search(self, query: str, max_results: int) -> List[SearchResult]:
|
||||
async with httpx.AsyncClient(
|
||||
headers=_DDG_HEADERS, follow_redirects=True, timeout=12
|
||||
) as client:
|
||||
resp = await client.post(
|
||||
_DDG_HTML_URL,
|
||||
data={"q": query, "kl": "us-en"},
|
||||
)
|
||||
resp.raise_for_status()
|
||||
|
||||
body = resp.text
|
||||
urls_titles = _RESULT_RE.findall(body)
|
||||
snippets = [html.unescape(s.strip()) for s in _SNIPPET_RE.findall(body)]
|
||||
|
||||
results: List[SearchResult] = []
|
||||
for i, (url, title) in enumerate(urls_titles[:max_results]):
|
||||
results.append(SearchResult(
|
||||
title=html.unescape(title.strip()),
|
||||
url=url,
|
||||
snippet=snippets[i] if i < len(snippets) else "",
|
||||
))
|
||||
return results
|
||||
77
src/execution_system/tools/search_providers/factory.py
Normal file
77
src/execution_system/tools/search_providers/factory.py
Normal file
@@ -0,0 +1,77 @@
|
||||
"""Factory for selecting the appropriate search provider.
|
||||
|
||||
Priority order (when engine_hint is blank):
|
||||
1. SerperProvider — if SERPER_API_KEY is set (Google-quality results)
|
||||
2. DuckDuckGoProvider — always available, no key needed
|
||||
|
||||
An explicit engine_hint bypasses auto-selection.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Dict, Optional
|
||||
|
||||
from libs.logging import get_logger
|
||||
from src.execution_system.tools.search_providers.base import SearchProvider
|
||||
from src.execution_system.tools.search_providers.duckduckgo import DuckDuckGoProvider
|
||||
from src.execution_system.tools.search_providers.serper import SerperProvider
|
||||
|
||||
log = get_logger("execution_system.tools.search_providers.factory")
|
||||
|
||||
_REGISTRY: Dict[str, SearchProvider] = {
|
||||
p.name: p
|
||||
for p in [DuckDuckGoProvider(), SerperProvider()]
|
||||
}
|
||||
|
||||
|
||||
def get_search_provider(engine_hint: Optional[str] = None) -> SearchProvider:
|
||||
"""Return the best available SearchProvider.
|
||||
|
||||
Args:
|
||||
engine_hint: Optional provider name ('duckduckgo' | 'serper').
|
||||
If blank, auto-selects based on availability.
|
||||
|
||||
Returns:
|
||||
A concrete SearchProvider instance.
|
||||
|
||||
Raises:
|
||||
ValueError: if engine_hint is given but not recognised.
|
||||
"""
|
||||
hint = (engine_hint or "").strip().lower()
|
||||
|
||||
if hint:
|
||||
provider = _REGISTRY.get(hint)
|
||||
if provider is None:
|
||||
raise ValueError(
|
||||
f"Unknown search engine '{hint}'. "
|
||||
f"Available: {list(_REGISTRY.keys())}"
|
||||
)
|
||||
log.debug("search_provider_selected", {
|
||||
"component": "execution_system.tools.search_providers.factory",
|
||||
"operation": "get_search_provider",
|
||||
"entity_id": provider.name,
|
||||
"correlation_id": "factory",
|
||||
"metadata": {"hint": hint, "mode": "explicit"},
|
||||
})
|
||||
return provider
|
||||
|
||||
for provider in _REGISTRY.values():
|
||||
if provider.name != "duckduckgo" and provider.is_available():
|
||||
log.debug("search_provider_selected", {
|
||||
"component": "execution_system.tools.search_providers.factory",
|
||||
"operation": "get_search_provider",
|
||||
"entity_id": provider.name,
|
||||
"correlation_id": "factory",
|
||||
"metadata": {"hint": None, "mode": "auto"},
|
||||
})
|
||||
return provider
|
||||
|
||||
fallback = _REGISTRY["duckduckgo"]
|
||||
log.debug("search_provider_selected", {
|
||||
"component": "execution_system.tools.search_providers.factory",
|
||||
"operation": "get_search_provider",
|
||||
"entity_id": fallback.name,
|
||||
"correlation_id": "factory",
|
||||
"metadata": {"hint": None, "mode": "fallback"},
|
||||
})
|
||||
return fallback
|
||||
48
src/execution_system/tools/search_providers/serper.py
Normal file
48
src/execution_system/tools/search_providers/serper.py
Normal file
@@ -0,0 +1,48 @@
|
||||
"""Serper.dev search provider (Google Search API).
|
||||
|
||||
Requires SERPER_API_KEY environment variable.
|
||||
Uses httpx — already a project dependency.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
from typing import List
|
||||
|
||||
import httpx
|
||||
|
||||
from src.execution_system.tools.search_providers.base import SearchProvider, SearchResult
|
||||
|
||||
_SERPER_URL = "https://google.serper.dev/search"
|
||||
|
||||
|
||||
class SerperProvider(SearchProvider):
|
||||
"""Search via Serper.dev (Google results). Requires SERPER_API_KEY."""
|
||||
|
||||
name = "serper"
|
||||
|
||||
def is_available(self) -> bool:
|
||||
return bool(os.getenv("SERPER_API_KEY"))
|
||||
|
||||
async def search(self, query: str, max_results: int) -> List[SearchResult]:
|
||||
api_key = os.getenv("SERPER_API_KEY", "")
|
||||
if not api_key:
|
||||
raise RuntimeError("SERPER_API_KEY is not set")
|
||||
|
||||
async with httpx.AsyncClient(timeout=10) as client:
|
||||
resp = await client.post(
|
||||
_SERPER_URL,
|
||||
json={"q": query, "num": max_results},
|
||||
headers={"X-API-KEY": api_key},
|
||||
)
|
||||
resp.raise_for_status()
|
||||
data = resp.json()
|
||||
|
||||
return [
|
||||
SearchResult(
|
||||
title=item.get("title", ""),
|
||||
url=item.get("link", ""),
|
||||
snippet=item.get("snippet", ""),
|
||||
)
|
||||
for item in data.get("organic", [])[:max_results]
|
||||
]
|
||||
257
tests/unit/test_search_providers.py
Normal file
257
tests/unit/test_search_providers.py
Normal file
@@ -0,0 +1,257 @@
|
||||
"""Unit tests for the search provider layer.
|
||||
|
||||
Tests:
|
||||
- DuckDuckGoProvider: availability, search, HTTP errors, empty results
|
||||
- SerperProvider: availability (key detection), search, missing key error
|
||||
- get_search_provider factory: auto-select, explicit hint, invalid hint
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import pytest
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
from src.execution_system.tools.search_providers.base import SearchResult
|
||||
from src.execution_system.tools.search_providers.duckduckgo import DuckDuckGoProvider
|
||||
from src.execution_system.tools.search_providers.serper import SerperProvider
|
||||
from src.execution_system.tools.search_providers.factory import get_search_provider
|
||||
|
||||
|
||||
def _ddg_html(*results: dict) -> str:
|
||||
"""Build minimal DDG HTML that the provider's regex can parse."""
|
||||
parts = []
|
||||
for r in results:
|
||||
parts.append(
|
||||
f'<a class="result__a" href="{r["url"]}">{r["title"]}</a>'
|
||||
f'<span class="result__snippet">{r.get("snippet", "")}</span>'
|
||||
)
|
||||
return "\n".join(parts)
|
||||
|
||||
|
||||
# ── DuckDuckGoProvider ────────────────────────────────────────────────────────
|
||||
|
||||
class TestDuckDuckGoProvider:
|
||||
|
||||
def test_always_available(self):
|
||||
assert DuckDuckGoProvider().is_available() is True
|
||||
|
||||
def test_name(self):
|
||||
assert DuckDuckGoProvider().name == "duckduckgo"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_search_parses_results(self):
|
||||
html = _ddg_html(
|
||||
{"url": "https://techcrunch.com/ai", "title": "TechCrunch AI", "snippet": "Latest AI news"},
|
||||
{"url": "https://reuters.com/ai", "title": "Reuters AI", "snippet": "Reuters coverage"},
|
||||
)
|
||||
mock_resp = MagicMock()
|
||||
mock_resp.text = html
|
||||
mock_resp.raise_for_status = MagicMock()
|
||||
|
||||
mock_client = AsyncMock()
|
||||
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
|
||||
mock_client.__aexit__ = AsyncMock(return_value=False)
|
||||
mock_client.post = AsyncMock(return_value=mock_resp)
|
||||
|
||||
with patch(
|
||||
"src.execution_system.tools.search_providers.duckduckgo.httpx.AsyncClient",
|
||||
return_value=mock_client,
|
||||
):
|
||||
results = await DuckDuckGoProvider().search("AI news", 5)
|
||||
|
||||
assert len(results) == 2
|
||||
assert isinstance(results[0], SearchResult)
|
||||
assert results[0].title == "TechCrunch AI"
|
||||
assert results[0].url == "https://techcrunch.com/ai"
|
||||
assert results[0].snippet == "Latest AI news"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_search_respects_max_results(self):
|
||||
html = _ddg_html(*[
|
||||
{"url": f"https://example.com/{i}", "title": f"Result {i}", "snippet": "..."}
|
||||
for i in range(10)
|
||||
])
|
||||
mock_resp = MagicMock()
|
||||
mock_resp.text = html
|
||||
mock_resp.raise_for_status = MagicMock()
|
||||
|
||||
mock_client = AsyncMock()
|
||||
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
|
||||
mock_client.__aexit__ = AsyncMock(return_value=False)
|
||||
mock_client.post = AsyncMock(return_value=mock_resp)
|
||||
|
||||
with patch(
|
||||
"src.execution_system.tools.search_providers.duckduckgo.httpx.AsyncClient",
|
||||
return_value=mock_client,
|
||||
):
|
||||
results = await DuckDuckGoProvider().search("test", 3)
|
||||
|
||||
assert len(results) == 3
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_empty_html_returns_empty_list(self):
|
||||
mock_resp = MagicMock()
|
||||
mock_resp.text = "<html><body>No results</body></html>"
|
||||
mock_resp.raise_for_status = MagicMock()
|
||||
|
||||
mock_client = AsyncMock()
|
||||
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
|
||||
mock_client.__aexit__ = AsyncMock(return_value=False)
|
||||
mock_client.post = AsyncMock(return_value=mock_resp)
|
||||
|
||||
with patch(
|
||||
"src.execution_system.tools.search_providers.duckduckgo.httpx.AsyncClient",
|
||||
return_value=mock_client,
|
||||
):
|
||||
results = await DuckDuckGoProvider().search("noresults", 5)
|
||||
|
||||
assert results == []
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_http_error_raises(self):
|
||||
import httpx
|
||||
|
||||
mock_client = AsyncMock()
|
||||
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
|
||||
mock_client.__aexit__ = AsyncMock(return_value=False)
|
||||
mock_client.post = AsyncMock(
|
||||
side_effect=httpx.HTTPStatusError(
|
||||
"503", request=MagicMock(), response=MagicMock()
|
||||
)
|
||||
)
|
||||
|
||||
with patch(
|
||||
"src.execution_system.tools.search_providers.duckduckgo.httpx.AsyncClient",
|
||||
return_value=mock_client,
|
||||
):
|
||||
with pytest.raises(httpx.HTTPStatusError):
|
||||
await DuckDuckGoProvider().search("test", 5)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_result_to_dict(self):
|
||||
html = _ddg_html({"url": "https://example.com", "title": "Example", "snippet": "A site"})
|
||||
mock_resp = MagicMock()
|
||||
mock_resp.text = html
|
||||
mock_resp.raise_for_status = MagicMock()
|
||||
|
||||
mock_client = AsyncMock()
|
||||
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
|
||||
mock_client.__aexit__ = AsyncMock(return_value=False)
|
||||
mock_client.post = AsyncMock(return_value=mock_resp)
|
||||
|
||||
with patch(
|
||||
"src.execution_system.tools.search_providers.duckduckgo.httpx.AsyncClient",
|
||||
return_value=mock_client,
|
||||
):
|
||||
results = await DuckDuckGoProvider().search("test", 1)
|
||||
|
||||
d = results[0].to_dict()
|
||||
assert set(d.keys()) == {"title", "url", "snippet"}
|
||||
|
||||
|
||||
# ── SerperProvider ────────────────────────────────────────────────────────────
|
||||
|
||||
class TestSerperProvider:
|
||||
|
||||
def test_unavailable_when_no_key(self):
|
||||
with patch.dict("os.environ", {}, clear=True):
|
||||
assert SerperProvider().is_available() is False
|
||||
|
||||
def test_available_when_key_set(self):
|
||||
with patch.dict("os.environ", {"SERPER_API_KEY": "test_key"}):
|
||||
assert SerperProvider().is_available() is True
|
||||
|
||||
def test_name(self):
|
||||
assert SerperProvider().name == "serper"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_search_returns_results(self):
|
||||
mock_resp = MagicMock()
|
||||
mock_resp.json = MagicMock(return_value={
|
||||
"organic": [
|
||||
{"title": "Google Result 1", "link": "https://google.com/1", "snippet": "First"},
|
||||
{"title": "Google Result 2", "link": "https://google.com/2", "snippet": "Second"},
|
||||
]
|
||||
})
|
||||
mock_resp.raise_for_status = MagicMock()
|
||||
|
||||
mock_client = AsyncMock()
|
||||
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
|
||||
mock_client.__aexit__ = AsyncMock(return_value=False)
|
||||
mock_client.post = AsyncMock(return_value=mock_resp)
|
||||
|
||||
with patch.dict("os.environ", {"SERPER_API_KEY": "test_key"}):
|
||||
with patch(
|
||||
"src.execution_system.tools.search_providers.serper.httpx.AsyncClient",
|
||||
return_value=mock_client,
|
||||
):
|
||||
results = await SerperProvider().search("test query", 5)
|
||||
|
||||
assert len(results) == 2
|
||||
assert isinstance(results[0], SearchResult)
|
||||
assert results[0].title == "Google Result 1"
|
||||
assert results[0].url == "https://google.com/1"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_raises_when_no_key(self):
|
||||
with patch.dict("os.environ", {}, clear=True):
|
||||
with pytest.raises(RuntimeError, match="SERPER_API_KEY"):
|
||||
await SerperProvider().search("test", 5)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_respects_max_results(self):
|
||||
organic = [
|
||||
{"title": f"R{i}", "link": f"https://g.com/{i}", "snippet": "..."}
|
||||
for i in range(10)
|
||||
]
|
||||
mock_resp = MagicMock()
|
||||
mock_resp.json = MagicMock(return_value={"organic": organic})
|
||||
mock_resp.raise_for_status = MagicMock()
|
||||
|
||||
mock_client = AsyncMock()
|
||||
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
|
||||
mock_client.__aexit__ = AsyncMock(return_value=False)
|
||||
mock_client.post = AsyncMock(return_value=mock_resp)
|
||||
|
||||
with patch.dict("os.environ", {"SERPER_API_KEY": "key"}):
|
||||
with patch(
|
||||
"src.execution_system.tools.search_providers.serper.httpx.AsyncClient",
|
||||
return_value=mock_client,
|
||||
):
|
||||
results = await SerperProvider().search("test", 3)
|
||||
|
||||
assert len(results) == 3
|
||||
|
||||
|
||||
# ── get_search_provider factory ───────────────────────────────────────────────
|
||||
|
||||
class TestGetSearchProvider:
|
||||
|
||||
def test_returns_duckduckgo_when_no_serper_key(self):
|
||||
with patch.dict("os.environ", {}, clear=True):
|
||||
provider = get_search_provider()
|
||||
assert provider.name == "duckduckgo"
|
||||
|
||||
def test_returns_serper_when_key_set_and_no_hint(self):
|
||||
with patch.dict("os.environ", {"SERPER_API_KEY": "key"}):
|
||||
provider = get_search_provider()
|
||||
assert provider.name == "serper"
|
||||
|
||||
def test_explicit_duckduckgo_hint_overrides_serper(self):
|
||||
with patch.dict("os.environ", {"SERPER_API_KEY": "key"}):
|
||||
provider = get_search_provider("duckduckgo")
|
||||
assert provider.name == "duckduckgo"
|
||||
|
||||
def test_explicit_serper_hint(self):
|
||||
with patch.dict("os.environ", {"SERPER_API_KEY": "key"}):
|
||||
provider = get_search_provider("serper")
|
||||
assert provider.name == "serper"
|
||||
|
||||
def test_invalid_hint_raises_value_error(self):
|
||||
with pytest.raises(ValueError, match="unknown_engine"):
|
||||
get_search_provider("unknown_engine")
|
||||
|
||||
def test_returns_duckduckgo_as_fallback_with_no_env(self):
|
||||
with patch.dict("os.environ", {}, clear=True):
|
||||
provider = get_search_provider(None)
|
||||
assert provider.name == "duckduckgo"
|
||||
294
tests/unit/test_web_tools.py
Normal file
294
tests/unit/test_web_tools.py
Normal file
@@ -0,0 +1,294 @@
|
||||
"""Unit tests for web_tools chat tools: internet_search and web_scraper."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import pytest
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
from typing import Any, Dict, List
|
||||
|
||||
from src.execution_system.tools.search_providers.base import SearchResult
|
||||
|
||||
|
||||
# ── Helpers ───────────────────────────────────────────────────────────────────
|
||||
|
||||
def _make_ctx(user_id: str = "test_user", cid: str = "test_cid") -> MagicMock:
|
||||
ctx = MagicMock()
|
||||
ctx.user_id = user_id
|
||||
ctx.cid = cid
|
||||
return ctx
|
||||
|
||||
|
||||
def _make_results(*titles: str) -> List[SearchResult]:
|
||||
return [
|
||||
SearchResult(title=t, url=f"https://example.com/{i}", snippet=f"Snippet for {t}")
|
||||
for i, t in enumerate(titles)
|
||||
]
|
||||
|
||||
|
||||
def _make_provider(name: str, results: List[SearchResult]) -> MagicMock:
|
||||
provider = MagicMock()
|
||||
provider.name = name
|
||||
provider.search = AsyncMock(return_value=results)
|
||||
return provider
|
||||
|
||||
|
||||
# ── internet_search ───────────────────────────────────────────────────────────
|
||||
|
||||
class TestInternetSearch:
|
||||
"""Tests for InternetSearchTool.execute() — mocked at the provider layer."""
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _tool(self):
|
||||
from src.api.routers.chat_tools.web_tools import InternetSearchTool
|
||||
self.tool = InternetSearchTool()
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_returns_results_via_duckduckgo(self):
|
||||
"""Factory returns DDG provider; results are serialised correctly."""
|
||||
fake = _make_results("AI News TechCrunch", "Reuters AI")
|
||||
provider = _make_provider("duckduckgo", fake)
|
||||
|
||||
with patch("src.api.routers.chat_tools.web_tools.get_search_provider", return_value=provider):
|
||||
result = await self.tool.execute({"query": "latest AI news"}, _make_ctx())
|
||||
|
||||
assert result["success"] is True
|
||||
assert result["engine"] == "duckduckgo"
|
||||
assert result["query"] == "latest AI news"
|
||||
assert len(result["results"]) == 2
|
||||
assert result["results"][0]["title"] == "AI News TechCrunch"
|
||||
assert "result_count" in result
|
||||
assert "elapsed_ms" in result
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_empty_query_returns_error(self):
|
||||
"""Empty query is rejected before the factory is even called."""
|
||||
with patch("src.api.routers.chat_tools.web_tools.get_search_provider") as mock_factory:
|
||||
result = await self.tool.execute({"query": ""}, _make_ctx())
|
||||
mock_factory.assert_not_called()
|
||||
|
||||
assert result["success"] is False
|
||||
assert "query is required" in result["error"]
|
||||
assert result["results"] == []
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_max_results_clamped_to_10(self):
|
||||
"""max_results above 10 is clamped before being passed to the provider."""
|
||||
provider = _make_provider("duckduckgo", _make_results(*[f"R{i}" for i in range(5)]))
|
||||
|
||||
with patch("src.api.routers.chat_tools.web_tools.get_search_provider", return_value=provider):
|
||||
result = await self.tool.execute({"query": "test", "max_results": 99}, _make_ctx())
|
||||
|
||||
provider.search.assert_awaited_once()
|
||||
_, call_max = provider.search.call_args[0]
|
||||
assert call_max == 10
|
||||
assert result["success"] is True
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_selects_serper_when_factory_returns_it(self):
|
||||
"""When factory selects serper, result engine reflects that."""
|
||||
provider = _make_provider("serper", _make_results("Google Result"))
|
||||
|
||||
with patch("src.api.routers.chat_tools.web_tools.get_search_provider", return_value=provider):
|
||||
result = await self.tool.execute({"query": "test query"}, _make_ctx())
|
||||
|
||||
assert result["success"] is True
|
||||
assert result["engine"] == "serper"
|
||||
assert result["results"][0]["title"] == "Google Result"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_engine_hint_forwarded_to_factory(self):
|
||||
"""engine arg is passed as hint to get_search_provider."""
|
||||
provider = _make_provider("duckduckgo", _make_results("DDG Result"))
|
||||
|
||||
with patch(
|
||||
"src.api.routers.chat_tools.web_tools.get_search_provider",
|
||||
return_value=provider,
|
||||
) as mock_factory:
|
||||
result = await self.tool.execute(
|
||||
{"query": "test", "engine": "duckduckgo"}, _make_ctx()
|
||||
)
|
||||
|
||||
mock_factory.assert_called_once_with("duckduckgo")
|
||||
assert result["engine"] == "duckduckgo"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_unknown_engine_hint_returns_error(self):
|
||||
"""Invalid engine hint from factory ValueError becomes success=False."""
|
||||
with patch(
|
||||
"src.api.routers.chat_tools.web_tools.get_search_provider",
|
||||
side_effect=ValueError("Unknown search engine 'bad_engine'"),
|
||||
):
|
||||
result = await self.tool.execute(
|
||||
{"query": "test", "engine": "bad_engine"}, _make_ctx()
|
||||
)
|
||||
|
||||
assert result["success"] is False
|
||||
assert "bad_engine" in result["error"]
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_provider_exception_returns_failure(self):
|
||||
"""Exception raised by provider.search() is caught and returned as error."""
|
||||
provider = _make_provider("duckduckgo", [])
|
||||
provider.search = AsyncMock(side_effect=RuntimeError("DDG timeout"))
|
||||
|
||||
with patch("src.api.routers.chat_tools.web_tools.get_search_provider", return_value=provider):
|
||||
result = await self.tool.execute({"query": "test"}, _make_ctx())
|
||||
|
||||
assert result["success"] is False
|
||||
assert "DDG timeout" in result["error"]
|
||||
assert result["results"] == []
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_empty_results_from_provider(self):
|
||||
"""Provider returning empty list yields success=True with zero results."""
|
||||
provider = _make_provider("duckduckgo", [])
|
||||
|
||||
with patch("src.api.routers.chat_tools.web_tools.get_search_provider", return_value=provider):
|
||||
result = await self.tool.execute({"query": "xyznonexistent"}, _make_ctx())
|
||||
|
||||
assert result["success"] is True
|
||||
assert result["results"] == []
|
||||
assert result["result_count"] == 0
|
||||
|
||||
|
||||
# ── web_scraper ───────────────────────────────────────────────────────────────
|
||||
|
||||
class TestWebScraper:
|
||||
"""Tests for WebScraperTool.execute()"""
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _tool(self):
|
||||
from src.api.routers.chat_tools.web_tools import WebScraperTool
|
||||
self.tool = WebScraperTool()
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_scrapes_url_successfully(self):
|
||||
"""Successful scrape returns structured result with markdown."""
|
||||
fake_scrape_result = {
|
||||
"url": "https://example.com",
|
||||
"library": "crawl4ai",
|
||||
"markdown": "# Example\n\nSome content here.",
|
||||
"title": "Example Domain",
|
||||
"links": ["https://iana.org"],
|
||||
"metadata": {},
|
||||
"char_count": 35,
|
||||
"elapsed_ms": 500,
|
||||
}
|
||||
with patch(
|
||||
"src.api.routers.chat_tools.web_tools._available_scraper_library",
|
||||
return_value="crawl4ai",
|
||||
):
|
||||
with patch(
|
||||
"src.execution_system.tools.web_scraper.scrape_url",
|
||||
new_callable=AsyncMock,
|
||||
return_value=fake_scrape_result,
|
||||
) as mock_scrape:
|
||||
result = await self.tool.execute(
|
||||
{"url": "https://example.com"}, _make_ctx()
|
||||
)
|
||||
|
||||
assert result["success"] is True
|
||||
assert result["url"] == "https://example.com"
|
||||
assert result["library"] == "crawl4ai"
|
||||
assert result["markdown"] == "# Example\n\nSome content here."
|
||||
assert result["title"] == "Example Domain"
|
||||
assert result["char_count"] == 35
|
||||
assert "elapsed_ms" in result
|
||||
mock_scrape.assert_awaited_once()
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_missing_url_returns_error(self):
|
||||
"""Missing url argument returns error without calling scrape_url."""
|
||||
result = await self.tool.execute({}, _make_ctx())
|
||||
assert result["success"] is False
|
||||
assert "url is required" in result["error"]
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_no_backend_returns_error(self):
|
||||
"""No available scraper library returns descriptive error."""
|
||||
with patch(
|
||||
"src.api.routers.chat_tools.web_tools._available_scraper_library",
|
||||
return_value=None,
|
||||
):
|
||||
result = await self.tool.execute(
|
||||
{"url": "https://example.com"}, _make_ctx()
|
||||
)
|
||||
|
||||
assert result["success"] is False
|
||||
assert "No scraper backend" in result["error"]
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_explicit_library_is_passed_through(self):
|
||||
"""Explicit library arg is forwarded to scrape_url."""
|
||||
fake_result: Dict[str, Any] = {
|
||||
"url": "https://example.com",
|
||||
"library": "firecrawl",
|
||||
"markdown": "content",
|
||||
"title": "Example",
|
||||
"links": [],
|
||||
"metadata": {},
|
||||
"char_count": 7,
|
||||
"elapsed_ms": 200,
|
||||
}
|
||||
with patch(
|
||||
"src.execution_system.tools.web_scraper.scrape_url",
|
||||
new_callable=AsyncMock,
|
||||
return_value=fake_result,
|
||||
) as mock_scrape:
|
||||
result = await self.tool.execute(
|
||||
{"url": "https://example.com", "library": "firecrawl"}, _make_ctx()
|
||||
)
|
||||
|
||||
assert result["success"] is True
|
||||
assert result["library"] == "firecrawl"
|
||||
call_args = mock_scrape.call_args
|
||||
assert call_args[0][1] == "firecrawl"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_scrape_exception_returns_failure(self):
|
||||
"""Exception from scrape_url is caught and returned as error."""
|
||||
with patch(
|
||||
"src.api.routers.chat_tools.web_tools._available_scraper_library",
|
||||
return_value="crawl4ai",
|
||||
):
|
||||
with patch(
|
||||
"src.execution_system.tools.web_scraper.scrape_url",
|
||||
new_callable=AsyncMock,
|
||||
side_effect=RuntimeError("crawl4ai not installed"),
|
||||
):
|
||||
result = await self.tool.execute(
|
||||
{"url": "https://example.com"}, _make_ctx()
|
||||
)
|
||||
|
||||
assert result["success"] is False
|
||||
assert "crawl4ai not installed" in result["error"]
|
||||
assert result["url"] == "https://example.com"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_links_truncated_to_20(self):
|
||||
"""Links list is truncated to first 20 items."""
|
||||
fake_result: Dict[str, Any] = {
|
||||
"url": "https://example.com",
|
||||
"library": "crawl4ai",
|
||||
"markdown": "content",
|
||||
"title": "Page",
|
||||
"links": [f"https://example.com/link-{i}" for i in range(50)],
|
||||
"metadata": {},
|
||||
"char_count": 7,
|
||||
"elapsed_ms": 100,
|
||||
}
|
||||
with patch(
|
||||
"src.api.routers.chat_tools.web_tools._available_scraper_library",
|
||||
return_value="crawl4ai",
|
||||
):
|
||||
with patch(
|
||||
"src.execution_system.tools.web_scraper.scrape_url",
|
||||
new_callable=AsyncMock,
|
||||
return_value=fake_result,
|
||||
):
|
||||
result = await self.tool.execute(
|
||||
{"url": "https://example.com"}, _make_ctx()
|
||||
)
|
||||
|
||||
assert result["success"] is True
|
||||
assert len(result["links"]) == 20
|
||||
Reference in New Issue
Block a user