HTML translation that preserves your markup.
Tags stay tags. Attributes stay attributes. Class names, href values, data attributes — none of them change. Only the visible text your users actually read is translated.
Broken markup is worse than no translation.
Sending HTML through a standard translation API is risky. Tags get partially translated. Attribute values get corrupted. Class names that shouldn't change do. The output looks correct in plain text but breaks in a browser — and it breaks silently. For a content-heavy site or a CMS export, fixing corrupt HTML across 10 languages is not a reasonable workflow.
Worse, standard translation APIs often translate content that should never change — CSS class names, href URLs, src attributes, aria labels that contain technical identifiers, and data attributes. The resulting HTML no longer renders correctly, and some changes are subtle enough that they pass visual testing before causing runtime errors.
For CMS-generated HTML, email templates, or any HTML being programmatically processed, post-translation repair is not optional. It becomes a hidden maintenance cost that compounds with every content update.
PolyLingo parses the DOM before translating.
PolyLingo treats your HTML as a document, not a string. It traverses the DOM structure, identifies text nodes containing visible content, and sends only those to the translation model. Tag names, class names, IDs, href values, and data attributes are never seen by the translation model; attributes that carry natural language, such as alt and aria-label, are translated like any other visible text. The output HTML is valid and structurally identical to what you sent.
PolyLingo uses a DOM parser rather than treating HTML as raw text. Before any translation occurs, the HTML is parsed into a node tree. Each node is classified: text nodes that contain natural language are extracted for translation, while element nodes, attribute values that are non-linguistic (class names, IDs, hrefs, srcs, data attributes), and non-translatable content (scripts, styles, code blocks) are excluded entirely.
The translated text nodes are then reinserted into the original DOM structure. The resulting HTML keeps the source's tag nesting, non-linguistic attribute values, and structural markup intact. Only the natural language text changes.
<article class="post">
<h1 class="post-title">How to build a multilingual site</h1>
<p>Building a <strong>multilingual website</strong> doesn't have to be
complicated. The key is choosing the right translation layer.</p>
<a href="/pricing" class="cta-button" aria-label="View pricing">
See our plans
</a>
<img src="/hero.png" alt="Multilingual platform dashboard" />
</article>

<article class="post">
<h1 class="post-title">Comment créer un site multilingue</h1>
<p>Créer un <strong>site web multilingue</strong> n'a pas à être
compliqué. La clé est de choisir la bonne couche de traduction.</p>
<a href="/pricing" class="cta-button" aria-label="Voir les tarifs">
Voir nos offres
</a>
<img src="/hero.png" alt="Tableau de bord de la plateforme multilingue" />
</article>

What gets translated vs what gets preserved
| | Translated | Preserved |
|---|---|---|
| Text content between tags | ✓ | ✕ |
| Title attribute (accessible label) | ✓ | ✕ |
| Alt text on images | ✓ | ✕ |
| Aria-label attributes | ✓ | ✕ |
| Placeholder text in inputs | ✓ | ✕ |
| HTML tag names | ✕ | ✓ |
| Class and ID attributes | ✕ | ✓ |
| href and src URLs | ✕ | ✓ |
| Data attributes | ✕ | ✓ |
| Script and style blocks | ✕ | ✓ |
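The parse, classify, and reinsert flow described above, together with the rules in this table, can be sketched with Python's standard-library `html.parser`. This is a minimal illustration and not PolyLingo's implementation: `TextNodeTranslator`, `translate_html`, and the `translate` callback are hypothetical names, the set of translatable attributes is taken from the table, and treating `code` and `pre` as non-translatable is an assumption. Unlike the service described here, this sketch re-serialises tags, so quoting and entity encoding may differ from the input byte for byte.

```python
from html import escape
from html.parser import HTMLParser

# Attributes treated as natural language (taken from the table above).
TRANSLATABLE_ATTRS = {"alt", "title", "aria-label", "placeholder"}
# The parser delivers script/style content verbatim, so it is emitted as-is.
RAW_TAGS = {"script", "style"}
# Assumption: code and pre are also left untranslated.
SKIP_TAGS = RAW_TAGS | {"code", "pre"}


def _attr(value):
    """Re-escape an attribute value for a double-quoted attribute."""
    return value.replace("&", "&amp;").replace('"', "&quot;")


class TextNodeTranslator(HTMLParser):
    """Rebuilds the HTML, passing only natural-language text to `translate`."""

    def __init__(self, translate):
        super().__init__(convert_charrefs=True)
        self.translate = translate  # hypothetical callback: str -> str
        self.out = []
        self.skip = []  # stack of open non-translatable elements

    def _tag(self, tag, attrs, close=">"):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # boolean attribute such as "disabled"
                parts.append(name)
                continue
            if name in TRANSLATABLE_ATTRS and value.strip() and not self.skip:
                value = self.translate(value)
            parts.append(f'{name}="{_attr(value)}"')
        return "<" + " ".join(parts) + close

    def handle_starttag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs))
        if tag in SKIP_TAGS:
            self.skip.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs, " />"))

    def handle_endtag(self, tag):
        if self.skip and self.skip[-1] == tag:
            self.skip.pop()
        self.out.append(f"</{tag}>")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_data(self, data):
        if self.skip and self.skip[-1] in RAW_TAGS:
            self.out.append(data)  # script/style: untouched
            return
        if not self.skip and data.strip():
            # Translate the text but keep surrounding whitespace intact.
            lead = data[: len(data) - len(data.lstrip())]
            trail = data[len(data.rstrip()):]
            data = lead + self.translate(data.strip()) + trail
        self.out.append(escape(data, quote=False))


def translate_html(html, translate):
    parser = TextNodeTranslator(translate)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


# Stand-in "translation" (upper-casing) to make the effect visible.
print(translate_html('<a href="/pricing" aria-label="View pricing">See our plans</a>', str.upper))
```

Using `str.upper` as the callback makes it easy to see which strings reached the translator: the link text and the `aria-label` change, while `href` does not.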
What PolyLingo handles in every HTML translation
- ✓ Tag names and structure never modified
- ✓ Class names, IDs, data attributes untouched
- ✓ Link href and src values preserved exactly
- ✓ Only visible text nodes translated
- ✓ RTL languages: dir attribute handled correctly
- ✓ Works with full pages, fragments, and components
- ✓ Nested HTML: any depth of element nesting handled correctly
- ✓ Email HTML: inline styles and table-based layouts preserved
How to translate HTML content with PolyLingo
Send your HTML to the API
POST your HTML content to /v1/translate. Set format to "html" or omit it — PolyLingo detects HTML automatically from the content. Include your target language codes.
Receive clean, translated HTML
The response contains one translated HTML string per target language. Every tag, class, ID, and attribute is exactly as you sent it. Only the natural language text has changed.
Write to your CMS, template, or file
Use the translated HTML directly in your CMS, email builder, static site generator, or any other tool that consumes HTML. No post-processing or repair needed.
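The three steps above might look like the following in Python, using only the standard library. Only the `/v1/translate` path and the `format` value of `"html"` come from this page. The host name, the `Authorization` header, and the `content` and `targets` field names are placeholders chosen for illustration, so check the API reference for the real ones. The response is returned as parsed JSON without assuming its shape.

```python
import json
import urllib.request

# Placeholder host; only the /v1/translate path is taken from this page.
API_URL = "https://api.example.com/v1/translate"


def build_translate_request(html, targets, api_key):
    """Build the POST request. Field names here are assumptions."""
    payload = {
        "content": html,     # assumed field name for the HTML body
        "format": "html",    # may be omitted; HTML is detected automatically
        "targets": targets,  # assumed field name for target language codes
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def translate(html, targets, api_key):
    """Send the request and return the decoded JSON response."""
    request = build_translate_request(html, targets, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.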
Where HTML translation is needed
CMS page and post exports
Headless CMS platforms store content as HTML or rich text that serialises to HTML. PolyLingo translates this content for each locale while preserving the structure and formatting the CMS created.
Email templates
Email HTML is notoriously fragile — table-based layouts, inline styles, and deeply nested structures break when passed through naive translation. PolyLingo preserves all of it.
E-commerce product descriptions
Product descriptions often contain formatted HTML with styled lists, bold text, and structured content. Translating them at scale requires exact format preservation so the output renders consistently across languages.
Frequently asked questions about HTML translation
Does PolyLingo translate attributes like title and alt text?
Yes. alt attributes on images, title attributes on elements, aria-label attributes, and placeholder attributes on form inputs are identified as containing natural language text and are translated. href and src attributes, class names, IDs, and data attributes are not translated.
What about inline JavaScript or style attributes?
Script tags and style tags are never translated — their contents are passed through unchanged. Inline style attributes (style="...") are also preserved exactly. The only content that is translated is natural language text.
Can PolyLingo handle full-page HTML including doctype and head?
Yes. PolyLingo can handle a full HTML document including doctype, head, and body. The title element, meta description, and Open Graph tags in the head are translatable. The canonical URL, meta charset, and technical meta tags are preserved unchanged.
Does it work with HTML generated by rich-text editors like ProseMirror or TipTap?
Yes. HTML output from rich-text editors serialises to standard HTML. PolyLingo handles any valid HTML regardless of how it was generated. The DOM parser works from the HTML string itself, not from any editor-specific format.
What happens with HTML entities like &amp;amp; or &amp;copy;?
HTML entities are decoded before translation and re-encoded in the output. Named entities like &amp;amp;, &amp;copy;, and &amp;mdash; are preserved. Numeric entities are also handled correctly. The translated output uses the same entity encoding as the source.
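As a rough illustration of the decode, translate, re-encode round trip, the sketch below uses Python's `html` module. It is not PolyLingo's code: `reencode_like_source` is a hypothetical helper, and it only restores named entities that appeared in the source and that exist in the standard library's HTML 4 entity table.

```python
import re
from html import escape, unescape
from html.entities import codepoint2name


def reencode_like_source(source, translated):
    """Re-encode `translated` using the named entities found in `source`."""
    # Characters that the source wrote as named entities, e.g. "&copy;".
    used = {unescape(m) for m in re.findall(r"&[A-Za-z][A-Za-z0-9]*;", source)}
    # Always escape markup-significant characters first.
    out = escape(translated, quote=False)
    for ch in used:
        if len(ch) == 1 and ch not in "&<>" and ord(ch) in codepoint2name:
            out = out.replace(ch, f"&{codepoint2name[ord(ch)]};")
    return out


source = "&copy; 2024 &mdash; Tom &amp; Co"
decoded = unescape(source)                # entities become plain characters
translated = decoded.replace("Co", "Cie") # stand-in for the translation step
print(reencode_like_source(source, translated))
```

Decoding first means the translation model sees ordinary characters rather than entity syntax, and re-encoding afterwards keeps the output consistent with how the source was written.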
Is there a file size limit for HTML translation?
Individual translation requests are limited to 64KB of content. For longer documents (long-form articles, full pages), the recommended approach is to split at natural boundaries (sections, articles) and reassemble after translation. This also helps with token usage accuracy.
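One way to apply the split-and-reassemble approach is to group already-separated sections into batches that stay under the limit. The sketch below assumes the limit means 64 × 1024 bytes of UTF-8, which this page does not specify, and that the caller has already split the document at section or article boundaries. `chunk_sections` is a hypothetical helper name.

```python
MAX_BYTES = 64 * 1024  # assumption: limit counted in UTF-8 bytes


def chunk_sections(sections, max_bytes=MAX_BYTES):
    """Group consecutive HTML sections into chunks of at most max_bytes."""
    chunks, current, size = [], [], 0
    for section in sections:
        n = len(section.encode("utf-8"))
        if n > max_bytes:
            raise ValueError("section exceeds the request limit; split it further")
        if current and size + n > max_bytes:
            # Adding this section would overflow: close the current chunk.
            chunks.append("".join(current))
            current, size = [], 0
        current.append(section)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks
```

Because chunks preserve section order, concatenating the translated chunks in sequence reassembles the document.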
Translate your HTML without the cleanup.
Free tier. No credit card. Valid HTML out the other side.
Paste any HTML and see exactly what translates and what doesn't.