How do you reduce HTML tokens before sending content to an AI?
To reduce the token count of HTML content, start by removing technical noise: scripts, styles, menus, footers, ads, unnecessary attributes and repeated blocks. Then convert the useful content into clean Markdown, which keeps headings, paragraphs, lists, links, tables and important metadata in a lighter format.
Explanation
A web page’s HTML often contains much more than the content you want to analyze. CSS classes, inline styles, scripts, menus, footers, hidden components, tracking URLs and duplicated blocks consume context without improving the AI answer. Reducing tokens means improving the signal-to-noise ratio.
The right method is not to cut everything aggressively. Keep what carries meaning: heading structure, main paragraphs, useful links, tables, important images, meta tags and JSON-LD if the SEO analysis requires it. Converting to Markdown is a good compromise: it preserves editorial hierarchy while removing much of the useless code.
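If the SEO analysis does need JSON-LD, it has to be collected before script elements are discarded, because JSON-LD lives inside a script tag. The following is a minimal sketch using only the Python standard library; the names JsonLdExtractor and extract_json_ld are assumptions made for this example, and malformed blocks are simply skipped.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed content of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self.in_jsonld = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self.buffer)))
            except json.JSONDecodeError:
                pass  # assumption: silently ignore malformed JSON-LD


def extract_json_ld(html: str) -> list:
    """Return every JSON-LD object found in the page, in document order."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

The extracted objects can then be serialized compactly and appended to the Markdown only when the analysis requires them.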
Formula / method
Reduction checklist:
- remove script and style elements, menus, footers and repeated blocks;
- remove CSS classes, inline styles and useless attributes;
- keep useful headings, paragraphs, lists, links and tables;
- keep metadata or JSON-LD only when needed;
- convert the result into clean Markdown;
- add a short and precise instruction.
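The checklist above can be sketched in Python with only the standard library. This is a minimal illustration, not a complete converter: the names MarkdownReducer, html_to_markdown and SKIP_TAGS are assumptions made for this example, menus are only caught when they use nav markup, and tables, images and nested lists are not handled.

```python
import re
from html.parser import HTMLParser

# Elements dropped with everything inside them (assumed noise; adjust per site).
SKIP_TAGS = {"script", "style", "noscript", "nav", "footer", "aside"}
HEADINGS = {"h1": "#", "h2": "##", "h3": "###", "h4": "####", "h5": "#####", "h6": "######"}
BLOCKS = {"p", "ul", "ol"}


class MarkdownReducer(HTMLParser):
    """Keeps headings, paragraphs, list items and links; ignores all attributes except href."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.parts = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        if tag in HEADINGS:
            self.parts.append("\n\n" + HEADINGS[tag] + " ")
        elif tag in BLOCKS:
            self.parts.append("\n\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            if self.href:
                self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth:
            return
        if tag == "a":
            if self.href:
                self.parts.append("](" + self.href + ")")
            self.href = None
        elif tag in HEADINGS or tag in BLOCKS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse runs of whitespace but keep spaces between inline elements.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    parser = MarkdownReducer()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = "\n".join(line.strip() for line in text.split("\n"))
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

Calling html_to_markdown on a page's HTML returns a Markdown string that can be placed under a short instruction in the prompt. For production use, a dedicated HTML-to-Markdown library is likely a safer choice than this sketch.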
Concrete example
A page may contain 40,000 characters of HTML but only a few thousand characters of truly useful content. If you remove scripts, styles, navigation and repeated blocks, the final Markdown becomes shorter, more readable and easier for ChatGPT, Claude or Gemini to analyze.
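To make this kind of comparison measurable, the size before and after cleaning can be reported in characters. The sketch below is only an illustration: the function name reduction_report is an assumption, and the 4,000-character figure in the usage line is a hypothetical stand-in for "a few thousand characters".

```python
def reduction_report(original_html: str, reduced_markdown: str) -> dict:
    """Compare character counts before and after cleaning."""
    before = len(original_html)
    after = len(reduced_markdown)
    # Guard against an empty input to avoid dividing by zero.
    saved = 0.0 if before == 0 else (before - after) * 100 / before
    return {
        "chars_before": before,
        "chars_after": after,
        "percent_saved": round(saved, 1),
    }


# Hypothetical sizes: 40,000 characters of HTML reduced to 4,000 of Markdown.
report = reduction_report("x" * 40_000, "y" * 4_000)
print(report)
```

Character counts are a proxy only; they show the direction of the saving, not the exact number of tokens saved.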
Common mistake
Do not reduce tokens so much that you remove meaning. The classic trap is deleting links, headings or tables that were actually useful for the analysis. Also be careful with fixed estimates such as “characters ÷ 4”: they remain approximate and vary depending on the model, content and files.
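One way to avoid over-trusting a single divisor is to report a range instead of one number. The sketch below is a rough heuristic only: the function name estimate_token_range and the bounds of 3 and 5 characters per token are assumptions chosen for illustration, not values published by any model provider. For an exact count, use the token counter of the model you are targeting.

```python
import math


def estimate_token_range(text: str, low_cpt: float = 3.0, high_cpt: float = 5.0) -> tuple:
    """Return a (minimum, maximum) rough token estimate.

    low_cpt and high_cpt are assumed characters-per-token bounds;
    real ratios depend on the model, the language and the content.
    """
    n = len(text)
    return (math.ceil(n / high_cpt), math.ceil(n / low_cpt))


low, high = estimate_token_range("a" * 40_000)
print(f"Roughly between {low} and {high} tokens")
```

Presenting the estimate as a range keeps the uncertainty visible when deciding whether the content fits in the context window.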
Sources & methodology
- OpenAI — Counting tokens — Documentation on token counting, limits of local estimates and prompt optimization.
- OpenAI — Prompt engineering — Guidance on structuring prompts with Markdown, sections and hierarchy.
- CommonMark — Markdown specification — Reference for Markdown as a structured and readable plain-text format.
- WHATWG — HTML Standard — Reference for HTML, its elements and document structure.
This content follows Outilo's editorial guidelines.