# Web Scraper

A purpose-built fetch tool, separate from generic `http_request` / `curl`. It exists because the agent doesn't want raw HTML - it wants the *article*.

## What it does

* Fetches a URL.
* Strips boilerplate (nav, ads, footer, scripts).
* Returns clean text the agent can reason over.
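The stripping step above can be sketched with Python's standard-library `html.parser` alone. This is a minimal illustration, not the tool's actual implementation - the real boilerplate list and heuristics are surely richer, and the `SKIP_TAGS` set and `extract_text` helper here are assumptions:

```python
from html.parser import HTMLParser

# Tags whose subtrees are treated as boilerplate, not article text.
# (Assumption: the real tool's skip list is longer and smarter.)
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}

class ArticleExtractor(HTMLParser):
    """Collect text nodes while skipping boilerplate subtrees."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many skip-tags we are currently nested inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    parser = ArticleExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)
```

For example, `extract_text("<nav>Menu</nav><p>Hello</p>")` drops the nav text and keeps `"Hello"`.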

## Guardrails

* Caps response at 1 MB - large pages get truncated, not silently dropped.
* 20-second timeout - slow servers don't stall the conversation.
* Subject to the same proxy and URL-guard rules as other network tools.
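The truncate-don't-drop behaviour can be sketched as a capped read: request one byte more than the limit, and if it arrives, you know the page was cut off. A minimal sketch under stated assumptions - the constants mirror the guardrails above, and the `read_capped` helper is hypothetical, not the tool's real API:

```python
import io

MAX_BYTES = 1_000_000   # 1 MB response cap, per the guardrails above
TIMEOUT_S = 20          # per-request timeout, per the guardrails above

def read_capped(stream, cap=MAX_BYTES):
    """Read at most `cap` bytes and report whether the source was truncated."""
    data = stream.read(cap + 1)       # one extra byte reveals truncation
    truncated = len(data) > cap
    return data[:cap], truncated

# Wired to a real fetch, this would look roughly like (hypothetical):
#   import urllib.request
#   with urllib.request.urlopen(url, timeout=TIMEOUT_S) as resp:
#       body, truncated = read_capped(resp)
```

The point of the extra byte is that the caller can distinguish "exactly at the cap" from "over the cap" and surface the truncation instead of silently dropping the page.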

## What it's good for

* Reading articles, blog posts, docs pages, GitHub READMEs without the noise.
* Following up on a [Web Search](/openhuman/features/native-tools/web-search.md) result.
* Summarising a single page on demand.

## See also

* [Web Search](/openhuman/features/native-tools/web-search.md) - find URLs to feed into the scraper.
* [Smart Token Compression](/openhuman/features/token-compression.md) - what trims long pages before they hit the model.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://tinyhumans.gitbook.io/openhuman/features/native-tools/web-scraper.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question, along with relevant excerpts and sources from the documentation.

Use this mechanism when:

* The answer is not explicitly present in the current page.
* You need clarification or additional context.
* You want to retrieve related documentation sections.
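As a sketch, the query URL can be assembled with Python's standard library. The page URL is taken from the example above; the `ask_url` helper name is an illustration, not part of the documented interface:

```python
from urllib.parse import quote

PAGE_URL = "https://tinyhumans.gitbook.io/openhuman/features/native-tools/web-scraper.md"

def ask_url(question: str) -> str:
    """Percent-encode the question into the `ask` query parameter."""
    return f"{PAGE_URL}?ask={quote(question)}"
```

For example, `ask_url("What is the response size cap?")` yields the page URL followed by `?ask=What%20is%20the%20response%20size%20cap%3F`, ready for an HTTP GET.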
