1 change: 1 addition & 0 deletions acceptance/README.md
@@ -33,5 +33,6 @@
| Run Cache | `run-cache.feature` |
| Source Intake and Fetching v0.1 | `source-intake-and-fetching.feature` |
| Source Discovery v0.1 | `source-discovery.feature` |
| Manual Source URL Intake v0.1 | `manual-source-url-intake.feature` |

- source-quality-and-freshness.feature: Source quality/freshness indicators, unknown caveats, and report summary behavior
25 changes: 25 additions & 0 deletions acceptance/manual-source-url-intake.feature
@@ -0,0 +1,25 @@
Feature: Manual Source URL Intake

Scenario: User starts an investigation with optional source URLs
Given the user is on the TraceMap landing page
When the user enters a research topic
And the user enters one or more source URLs
And the user starts the investigation
Then a new analysis run should be created
And manual source URLs should be passed into source intake
And manual source URLs should be prioritized over discovered sources
And the run page should display the investigation result

Scenario: User enters duplicate source URLs
Given the user is on the TraceMap landing page
When the user enters duplicate source URLs
And the user starts the investigation
Then duplicate URLs should be removed before source intake
And the run should not fail because of duplicates

Scenario: User enters an invalid source URL
Given the user is on the TraceMap landing page
When the user enters an invalid source URL
And the user starts the investigation
Then the form should show a clear validation error
And no broken analysis run should be created
1 change: 1 addition & 0 deletions e2e/home.spec.ts
@@ -14,4 +14,5 @@ test("home page and health endpoint are reachable", async ({ page, request }) =>

await expect(page.getByRole("heading", { level: 1, name: "TraceMap" })).toBeVisible();
await expect(page.getByRole("button", { name: "Start Investigation" })).toBeVisible();
await expect(page.getByTestId("manual-source-urls-input")).toBeVisible();
});
1 change: 1 addition & 0 deletions specs/README.md
@@ -44,5 +44,6 @@ Each feature spec should describe:
| Run Cache | [run-cache.md](./run-cache.md) |
| Source Intake and Fetching v0.1 | [source-intake-and-fetching.md](./source-intake-and-fetching.md) |
| Source Discovery v0.1 | [source-discovery.md](./source-discovery.md) |
| Manual Source URL Intake v0.1 | [manual-source-url-intake.md](./manual-source-url-intake.md) |

- source-quality-and-freshness.md: Source Quality & Freshness Inspector v0.1 (derived quality signals, unknown caveats, report summary)
91 changes: 91 additions & 0 deletions specs/manual-source-url-intake.md
@@ -0,0 +1,91 @@
# Manual Source URL Intake v0.1

## Purpose

Allow users to submit optional source URLs from the landing page so Investigation Mission runs can prioritize user-specified references in source intake without changing the core answer/evidence pipeline.

## User value

- Users can seed investigations with known high-signal references (official IR, press releases, public docs, papers, government data).
- Evidence Map traceability improves because user-intended sources are treated as first-class candidates.
- Invalid input is blocked early with clear form feedback, preventing broken runs.

## Scope

- Add optional multi-line URL input (`sourceUrls`) on landing intake form.
- Parse, validate, normalize, and deduplicate URLs in server action.
- Pass valid manual URLs through run creation options to source intake.
- Merge manual URLs with topic-extracted/discovered URLs, prioritizing manual URLs.
- Keep OpenAI provider schema unchanged; continue using existing `sourceCandidates` path.
- Keep existing Source Cache / Fetch Snapshot route for URL resolution.

## Non-goals

- RAG, embeddings, reranking, full-text crawling, PDF parsing.
- Background jobs, streaming response.
- New source tables, auth/workspace changes, upload flows.
- Large OpenAI provider schema redesign.

## Existing implementation constraints

- `AnalysisRun.question` is not renamed in this slice.
- `question` form field remains required and unchanged.
- Investigation mode selector behavior remains unchanged.
- Existing Evidence Map / Unknown Map / Source Lineage / Briefing Report / Report Export Lite must keep working.

## Data model strategy

- No Prisma schema changes and no DB migration.
- Manual URLs are transient form input passed via server-side options.
- Source candidate persistence continues via existing `source_snapshots` and source cache/fetch snapshot linkage.

## UI requirements

- Add textarea between Research topic and Investigation depth.
- Label: `Optional source URLs`.
- Help text: `Add one URL per line. TraceMap will prioritize these sources when building the evidence map.`
- Name: `sourceUrls`.
- `data-testid="manual-source-urls-input"`.
- The input is optional; an empty value preserves existing behavior.
- Validation errors may be shown near the existing form error region.

## Server action requirements

- Read `sourceUrls` from `FormData`.
- Split by line, trim, drop empty lines.
- Validate as absolute `http(s)` URLs.
- Normalize and dedupe before forwarding.
- On invalid line(s), return form error and do not create run.
- Error message: `Source URLs must be valid http(s) URLs, one per line.`
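
The validation and normalization steps above could look roughly like the sketch below. It is illustrative only: `normalizeHttpUrl` and its rules (drop the fragment, drop `utm_*` tracking parameters) are assumptions chosen to match the behavior exercised in this PR's unit test, not the project's actual `normalizeSourceUrl`, whose implementation is not part of this diff.

```typescript
// Hypothetical helper, NOT the project's normalizeSourceUrl.
type NormalizeResult =
  | { kind: "ok"; normalizedUrl: string }
  | { kind: "invalid" };

function normalizeHttpUrl(input: string): NormalizeResult {
  let url: URL;
  try {
    // The URL constructor throws on relative or malformed input
    // and lowercases the hostname of valid input.
    url = new URL(input.trim());
  } catch {
    return { kind: "invalid" };
  }

  // Only absolute http(s) URLs are accepted.
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return { kind: "invalid" };
  }

  // Assumed normalization rule: fragments never identify a distinct source.
  url.hash = "";

  // Assumed normalization rule: strip utm_* tracking parameters.
  const trackingKeys: string[] = [];
  url.searchParams.forEach((_value, key) => {
    if (key.toLowerCase().startsWith("utm_")) {
      trackingKeys.push(key);
    }
  });
  for (const key of trackingKeys) {
    url.searchParams.delete(key);
  }

  return { kind: "ok", normalizedUrl: url.toString() };
}
```

Under these assumed rules, two lines that differ only by host casing, a fragment, or a tracking parameter collapse to the same normalized string, which is what lets the server action deduplicate with a plain `Set`.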

## Source intake requirements

- Accept `manualSourceUrls` option in `buildSourceIntakeFromQuestion`.
- Merge URL inputs in this precedence order:
1) manual source URLs,
2) URLs extracted from question text,
3) discovery provider URLs.
- Deduplicate by normalized URL while preserving higher-priority origin.
- Invalid URLs should be safely reported into `ignoredUrls` if they still reach intake.
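
The precedence and deduplication rules above can be sketched as a first-writer-wins merge. This is a simplified illustration, not the service code: `mergeByPriority`, `simpleNormalize`, and the plain `string[]` shape for `ignoredUrls` are hypothetical stand-ins (the real `SourceIntakeResult["ignoredUrls"]` type is not shown in this diff).

```typescript
type Origin = "manual_url" | "topic_url" | "discovered";
type TaggedUrl = { url: string; origin: Origin };

// Hypothetical normalizer used only for this sketch: returns null for
// anything that is not an absolute http(s) URL, and drops the fragment.
function simpleNormalize(input: string): string | null {
  try {
    const url = new URL(input);
    if (url.protocol !== "http:" && url.protocol !== "https:") return null;
    url.hash = "";
    return url.toString();
  } catch {
    return null;
  }
}

function mergeByPriority(
  manualUrls: string[],
  topicUrls: string[],
  discoveredUrls: string[],
): { merged: TaggedUrl[]; ignoredUrls: string[] } {
  // Concatenate in precedence order: manual, then topic, then discovered.
  const ordered: TaggedUrl[] = [
    ...manualUrls.map((url) => ({ url, origin: "manual_url" as const })),
    ...topicUrls.map((url) => ({ url, origin: "topic_url" as const })),
    ...discoveredUrls.map((url) => ({ url, origin: "discovered" as const })),
  ];

  // A Map keeps insertion order, so the first (highest-priority) origin
  // seen for a normalized URL is the one that survives.
  const seen = new Map<string, TaggedUrl>();
  const ignoredUrls: string[] = [];
  for (const item of ordered) {
    const key = simpleNormalize(item.url);
    if (key === null) {
      ignoredUrls.push(item.url);
      continue;
    }
    if (!seen.has(key)) {
      seen.set(key, { url: key, origin: item.origin });
    }
  }

  return { merged: Array.from(seen.values()), ignoredUrls };
}
```

For example, if the same page arrives both as a manual URL and as a discovered URL, only one candidate is kept and it retains the `manual_url` origin.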

## Provider requirements

- OpenAI provider schema remains unchanged.
- Manual URLs are surfaced only via existing `sourceCandidates` context.
- Optional prompt tweak may prefer user-provided candidates, but no large prompt inflation.

## Cache requirements

- Manual URLs can change output; avoid stale run-cache reuse.
- For v0.1 safety: skip run-cache lookup/store when `manualSourceUrls` are present.
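
A minimal sketch of the cache-bypass rule, assuming a single predicate gates both lookup and store. `shouldUseRunCache` and `RunOptions` are hypothetical names for illustration; the PR itself inlines an equivalent `hasManualSourceUrls` check.

```typescript
// Hypothetical option shape mirroring the manualSourceUrls field in this PR.
type RunOptions = { manualSourceUrls?: string[] };

// The run cache is used only when no manual URLs were supplied, because
// the cache key does not account for them in v0.1.
function shouldUseRunCache(options: RunOptions): boolean {
  return (options.manualSourceUrls?.length ?? 0) === 0;
}
```

Gating both the lookup and the store on the same predicate avoids serving a cached result that ignores the user's URLs and avoids caching a result that depends on them under a key that does not.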

## Test requirements

- Unit tests for manual URL parser/validator normalization + dedupe.
- Server action behavior for valid/invalid/manual-empty paths.
- Source intake merge priority and duplicate removal coverage.
- Existing tests remain green.

## Acceptance references

- `acceptance/manual-source-url-intake.feature`
11 changes: 10 additions & 1 deletion src/app/actions/create-run.ts
@@ -5,6 +5,7 @@ import { redirect } from "next/navigation";

import { resolveInvestigationMode } from "@/server/analysis/investigation-limits";
import { createAnalysisRunFromProvider } from "@/server/analysis/create-analysis-run-from-provider";
import { parseManualSourceUrls } from "@/app/actions/manual-source-urls";

export type CreateRunFormState = {
error?: string;
@@ -24,6 +25,14 @@ export async function createMockRunAction(
typeof rawMode === "string" ? rawMode : undefined,
);

const runId = await createAnalysisRunFromProvider(raw.trim(), { mode });
const manualSourceUrlsResult = parseManualSourceUrls(formData.get("sourceUrls"));
if (manualSourceUrlsResult.kind === "error") {
return { error: manualSourceUrlsResult.message };
}

const runId = await createAnalysisRunFromProvider(raw.trim(), {
mode,
manualSourceUrls: manualSourceUrlsResult.manualSourceUrls,
});
redirect(`/runs/${runId}` as Route);
}
30 changes: 30 additions & 0 deletions src/app/actions/manual-source-urls.ts
@@ -0,0 +1,30 @@
import { normalizeSourceUrl } from "@/server/analysis/source-url-normalization";

export const MANUAL_SOURCE_URLS_ERROR_MESSAGE =
"Source URLs must be valid http(s) URLs, one per line.";

export type ParseManualSourceUrlsResult =
| { kind: "ok"; manualSourceUrls: string[] }
| { kind: "error"; message: string };

export function parseManualSourceUrls(raw: FormDataEntryValue | null): ParseManualSourceUrlsResult {
if (typeof raw !== "string" || raw.trim().length === 0) {
return { kind: "ok", manualSourceUrls: [] };
}

const lines = raw
.split(/\r?\n/)
.map((line) => line.trim())
.filter((line) => line.length > 0);

const unique = new Set<string>();
for (const line of lines) {
const normalized = normalizeSourceUrl(line);
if (normalized.kind !== "ok") {
return { kind: "error", message: MANUAL_SOURCE_URLS_ERROR_MESSAGE };
}
unique.add(normalized.normalizedUrl);
}

return { kind: "ok", manualSourceUrls: [...unique] };
}
17 changes: 17 additions & 0 deletions src/features/landing/components/question-intake.tsx
@@ -48,13 +48,30 @@ export function QuestionIntake() {
/>
<div className="muted" data-testid="research-topic-examples" style={{ marginTop: "0.6rem" }}>
<p>Examples:</p>
<p style={{ marginTop: "0.35rem" }}>Pasting official URLs makes it easier to verify evidence and trace sources.</p>
<ul style={{ marginTop: "0.35rem", paddingLeft: "1rem" }}>
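<li>Organize Toyota Motor's EV strategy, with supporting evidence, into growth drivers, risks, competitive landscape, and unconfirmed items</li>
<li>Compare the major players in Japan's domestic generative AI market and organize market opportunities and unknowns based on public information</li>
<li>Break down the differences between RAG and AI agents into technical claims, evidence, and unconfirmed points</li>
<li>Investigate the growth drivers and risks of Vertical SaaS in the SaaS market for small and medium-sized businesses</li>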
</ul>
</div>
<label className="question-label" htmlFor="sourceUrls" style={{ marginTop: "1rem" }}>
Optional source URLs
</label>
<p className="muted" style={{ marginTop: "0.25rem", marginBottom: "0.5rem" }}>
Add one URL per line. TraceMap will prioritize these sources when building the evidence map.
</p>
<textarea
id="sourceUrls"
name="sourceUrls"
placeholder={`https://example.com/official-report
https://example.com/press-release
https://example.com/technical-doc`}
rows={4}
disabled={isPending}
data-testid="manual-source-urls-input"
/>
<label className="question-label" htmlFor="mode" style={{ marginTop: "1rem" }}>
Investigation depth
</label>
25 changes: 16 additions & 9 deletions src/server/analysis/create-analysis-run-from-provider.ts
@@ -16,6 +16,7 @@ import type { InvestigationMode } from "@/server/analysis/investigation-limits";

export type CreateAnalysisRunOptions = {
mode?: InvestigationMode;
manualSourceUrls?: string[];
};

/**
@@ -31,6 +32,8 @@ export async function createAnalysisRunFromProvider(
options.mode ?? process.env.TRACEMAP_INVESTIGATION_MODE?.trim(),
);

const hasManualSourceUrls = (options.manualSourceUrls?.length ?? 0) > 0;

const run = await prisma.analysisRun.create({
data: {
question,
@@ -53,8 +56,9 @@ export async function createAnalysisRunFromProvider(
let payload: GeneratedAnswerGraphPayload | null = null;
let shouldStoreRunCache = false;

try {
const cached = await lookupRunCacheEntry(cacheKeyInfo);
if (!hasManualSourceUrls) {
try {
const cached = await lookupRunCacheEntry(cacheKeyInfo);
if (cached.kind === "hit") {
console.info("[analysis] run cache hit", {
runId: run.id,
@@ -69,17 +73,20 @@ export async function createAnalysisRunFromProvider(
reason: cached.reason,
});
}
} catch (cause) {
console.error("[analysis] run cache lookup failed", {
runId: run.id,
cause,
});
} catch (cause) {
console.error("[analysis] run cache lookup failed", {
runId: run.id,
cause,
});
}
}

if (payload === null) {
let sourceIntake: SourceIntakeResult = { candidates: [], ignoredUrls: [] };
try {
sourceIntake = await buildSourceIntakeFromQuestion(question);
sourceIntake = await buildSourceIntakeFromQuestion(question, {
manualSourceUrls: options.manualSourceUrls ?? [],
});
} catch (cause) {
console.error("[analysis] source intake failed", { runId: run.id, cause });
}
@@ -152,7 +159,7 @@ export async function createAnalysisRunFromProvider(
} satisfies Prisma.AnalysisRunUpdateInput,
});

if (shouldStoreRunCache) {
if (shouldStoreRunCache && !hasManualSourceUrls) {
try {
await storeRunCacheEntry({ cacheKeyInfo, payload });
console.info("[analysis] run cache stored", {
27 changes: 22 additions & 5 deletions src/server/analysis/source-intake/source-intake-service.ts
@@ -2,10 +2,21 @@ import type { SourceCandidate, SourceIntakeResult } from "@/types/source-intake"
import { resolveSourceCacheForUrl } from "@/server/analysis/source-cache-service";
import { extractUrls } from "@/server/analysis/source-intake/extract-urls";
import { resolveSourceDiscoveryProvider } from "@/server/analysis/source-discovery/resolve-source-discovery-provider";
import { DEFAULT_DISCOVERY_MAX_RESULTS, DEFAULT_SOURCE_CANDIDATE_MAX_RESULTS } from "@/server/analysis/source-discovery/source-discovery-service";
import {
DEFAULT_DISCOVERY_MAX_RESULTS,
DEFAULT_SOURCE_CANDIDATE_MAX_RESULTS,
} from "@/server/analysis/source-discovery/source-discovery-service";

export async function buildSourceIntakeFromQuestion(question: string): Promise<SourceIntakeResult> {
const manualUrls = extractUrls(question);
export type BuildSourceIntakeOptions = {
manualSourceUrls?: string[];
};

export async function buildSourceIntakeFromQuestion(
question: string,
options: BuildSourceIntakeOptions = {},
): Promise<SourceIntakeResult> {
const topicUrls = extractUrls(question);
const manualUrls = options.manualSourceUrls ?? [];
const discoveryProvider = resolveSourceDiscoveryProvider();
const ignoredUrls: SourceIntakeResult["ignoredUrls"] = [];

@@ -24,6 +35,7 @@ export async function buildSourceIntakeFromQuestion(question: string): Promise<S

const merged = [
...manualUrls.map((url) => ({ url, origin: "manual_url" as const })),
...topicUrls.map((url) => ({ url, origin: "topic_url" as const })),
...discoveredUrls.map((url) => ({ url, origin: "discovered" as const })),
];

@@ -52,14 +64,19 @@ export async function buildSourceIntakeFromQuestion(question: string): Promise<S
normalizedUrl: result.normalizedUrl,
originalUrl: result.originalUrl,
finalUrl: result.finalUrl,
label: result.finalUrl ? new URL(result.finalUrl).hostname : new URL(result.normalizedUrl).hostname,
label: result.finalUrl
? new URL(result.finalUrl).hostname
: new URL(result.normalizedUrl).hostname,
excerpt: result.excerpt,
contentType: result.contentType,
httpStatus: result.httpStatus,
fetchedAt: result.checkedAt,
sourceCacheEntryId: result.sourceCacheEntryId,
sourceFetchSnapshotId: result.sourceFetchSnapshotId,
fetchErrorMessage: result.verificationStatus !== "verified" ? `verification_status:${result.verificationStatus}` : null,
fetchErrorMessage:
result.verificationStatus !== "verified"
? `verification_status:${result.verificationStatus}`
: null,
origin: item.origin,
});

2 changes: 1 addition & 1 deletion src/types/source-intake.ts
@@ -1,4 +1,4 @@
export type SourceCandidateOrigin = "manual_url" | "discovered";
export type SourceCandidateOrigin = "manual_url" | "topic_url" | "discovered";

export type SourceCandidate = {
normalizedUrl: string;
34 changes: 34 additions & 0 deletions tests/manual-source-urls.test.ts
@@ -0,0 +1,34 @@
import { describe, expect, it } from "vitest";

import {
MANUAL_SOURCE_URLS_ERROR_MESSAGE,
parseManualSourceUrls,
} from "@/app/actions/manual-source-urls";

describe("parseManualSourceUrls", () => {
it("returns empty list for blank input", () => {
expect(parseManualSourceUrls(null)).toEqual({ kind: "ok", manualSourceUrls: [] });
expect(parseManualSourceUrls(" ")).toEqual({ kind: "ok", manualSourceUrls: [] });
});

it("normalizes and deduplicates urls by normalized form", () => {
const result = parseManualSourceUrls(
"https://Example.com/a?utm_source=x&k=1\nhttps://example.com/a?k=1#frag\nhttps://example.com/b",
);
expect(result.kind).toBe("ok");
if (result.kind === "ok") {
expect(result.manualSourceUrls).toEqual([
"https://example.com/a?k=1",
"https://example.com/b",
]);
}
});

it("rejects invalid or non-http urls", () => {
const invalid = parseManualSourceUrls("not-a-url");
expect(invalid).toEqual({ kind: "error", message: MANUAL_SOURCE_URLS_ERROR_MESSAGE });

const ftp = parseManualSourceUrls("ftp://example.com/a");
expect(ftp).toEqual({ kind: "error", message: MANUAL_SOURCE_URLS_ERROR_MESSAGE });
});
});