Outilo

How do you reduce HTML tokens before sending content to an AI?

Edited by Outilo. Reviewed by Yoann Begue. Last verified on 24/05/2026.
Quick answer

To reduce the tokens of HTML content, start by removing technical noise: scripts, styles, menus, footers, ads, useless attributes and repeated blocks. Then convert the useful content into clean Markdown to keep headings, paragraphs, lists, links, tables and important metadata in a lighter format.

Explanation

A web page’s HTML often contains much more than the content you want to analyze. CSS classes, inline styles, scripts, menus, footers, hidden components, tracking URLs and duplicated blocks consume context without improving the AI answer. Reducing tokens means improving the signal-to-noise ratio.

The right method is not to cut everything aggressively. Keep what carries meaning: heading structure, main paragraphs, useful links, tables, important images, meta tags and JSON-LD if the SEO analysis requires it. Converting to Markdown is a good compromise: it preserves editorial hierarchy while removing much of the useless code.
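
When the SEO analysis does need metadata, it can be pulled out before the rest of the head is discarded. The sketch below is one possible approach using only Python's standard library. The names `MetadataExtractor` and `extract_metadata` are hypothetical, and the code assumes that named meta tags and `application/ld+json` scripts are the only metadata worth keeping.

```python
import json
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Hypothetical helper: collects named <meta> tags and JSON-LD blocks."""

    def __init__(self):
        super().__init__()
        self.meta = {}            # e.g. {"description": "..."}
        self.json_ld = []         # parsed JSON-LD objects
        self._in_json_ld = False  # True while inside a JSON-LD script
        self._chunks = []         # raw text of the current JSON-LD script

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
            self._in_json_ld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is simply skipped in this sketch


def extract_metadata(html: str):
    """Return (meta, json_ld) found in the HTML string."""
    parser = MetadataExtractor()
    parser.feed(html)
    parser.close()
    return parser.meta, parser.json_ld
```

Ordinary scripts are ignored, so only the structured data reaches the prompt. Whether to include it at all still depends on the question being asked.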

Formula / method

Reduction checklist:

  • remove script and style elements, menus, footers and repeated blocks;
  • remove CSS classes, inline styles and useless attributes;
  • keep useful headings, paragraphs, lists, links and tables;
  • keep metadata or JSON-LD only when needed;
  • convert the result into clean Markdown;
  • add a short and precise instruction.
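
The checklist above can be sketched in Python with only the standard library. This is a minimal illustration, not a complete converter: the names `MarkdownExtractor`, `html_to_markdown`, `SKIP_TAGS` and `PREFIXES` are hypothetical, tables and images are not handled, and real pages usually need a site-specific list of noise elements.

```python
from html.parser import HTMLParser

# Assumption: these elements are treated as noise. Adjust the set per site.
SKIP_TAGS = {"script", "style", "noscript", "nav", "footer", "aside"}
# Markdown prefix for each block element that is kept.
PREFIXES = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}


class MarkdownExtractor(HTMLParser):
    """Hypothetical helper: keeps headings, paragraphs, list items and links."""

    def __init__(self):
        super().__init__()
        self.lines = []      # finished Markdown blocks
        self.buffer = []     # text pieces of the block being read
        self.prefix = ""     # Markdown prefix of the current block
        self.skip_depth = 0  # greater than 0 while inside a noise element
        self.href = None     # target of the link being read

    def flush(self):
        # Collapse whitespace and store the current block if it has text.
        text = " ".join("".join(self.buffer).split())
        if text:
            self.lines.append(self.prefix + text)
        self.buffer, self.prefix = [], ""

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in PREFIXES:
            self.flush()
            self.prefix = PREFIXES[tag]
        elif tag == "a":
            # Only href is read; class, style and other attributes are dropped.
            self.href = dict(attrs).get("href")
            if self.href:
                self.buffer.append("[")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag in PREFIXES:
            self.flush()
        elif tag == "a" and self.href:
            self.buffer.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    """Return a lightweight Markdown rendering of the useful content."""
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n\n".join(parser.lines)
```

The short instruction from the last checklist item is then placed before the returned Markdown in the prompt.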

Concrete example

A page may contain 40,000 characters of HTML but only a few thousand characters of truly useful content. If you remove scripts, styles, navigation and repeated blocks, the final Markdown becomes shorter, more readable and easier for ChatGPT, Claude or Gemini to analyze.
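
To make such a comparison measurable, the size before and after cleaning can be reported side by side. The function below is a small sketch. The name `reduction_report` is hypothetical, and it counts characters, not tokens, so the percentage is only an indication.

```python
def reduction_report(raw_html: str, cleaned: str) -> dict:
    """Compare character counts before and after cleaning."""
    before, after = len(raw_html), len(cleaned)
    # Avoid dividing by zero when the input is empty.
    saved = 0.0 if before == 0 else 100 * (before - after) / before
    return {
        "chars_before": before,
        "chars_after": after,
        "saved_percent": round(saved, 1),
    }


# Illustrative sizes only: placeholder strings, not a real page.
report = reduction_report("x" * 40_000, "y" * 4_000)
print(report)
```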

Common mistake

Do not reduce tokens so much that you remove meaning. The classic trap is deleting links, headings or tables that the analysis actually needed. Also be careful with fixed estimates such as “characters ÷ 4”: they are only approximations, and the real count varies with the model, the content and the type of file.
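
One way to catch this mistake is to compare the structure of the original HTML with what survives in the Markdown. The sketch below flags headings, links or tables that existed in the source but disappeared entirely. The names `StructureCounter`, `lost_structure` and `rough_token_estimate` are hypothetical, the Markdown patterns are simple heuristics, and the token estimate deliberately uses the approximate “characters ÷ 4” rule, so it should be read as an order of magnitude only.

```python
import re
from html.parser import HTMLParser


class StructureCounter(HTMLParser):
    """Hypothetical helper: counts headings, links and tables in HTML."""

    def __init__(self):
        super().__init__()
        self.counts = {"headings": 0, "links": 0, "tables": 0}

    def handle_starttag(self, tag, attrs):
        if re.fullmatch(r"h[1-6]", tag):
            self.counts["headings"] += 1
        elif tag == "a" and dict(attrs).get("href"):
            self.counts["links"] += 1
        elif tag == "table":
            self.counts["tables"] += 1


def lost_structure(html: str, markdown: str) -> list:
    """Return the kinds of structure present in the HTML but absent from the Markdown."""
    counter = StructureCounter()
    counter.feed(html)
    counter.close()
    kept = {
        "headings": len(re.findall(r"^#{1,6} ", markdown, flags=re.M)),
        "links": len(re.findall(r"\[[^\]]*\]\([^)]+\)", markdown)),
        # Heuristic: any line that starts and ends with a pipe is a table row.
        "tables": len(re.findall(r"^\|.*\|\s*$", markdown, flags=re.M)),
    }
    return [name for name, n in counter.counts.items() if n > 0 and kept[name] == 0]


def rough_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough estimate; real counts depend on the model's tokenizer."""
    return round(len(text) / chars_per_token)
```

An empty list does not prove that nothing useful was removed, since navigation links are dropped on purpose. A non-empty list is simply a signal to review the cleaning rules.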

