HTML translation

HTML translation that preserves your markup.

Tags stay tags. Attributes stay attributes. Class names, href values, and data attributes never change. Only the text your users actually read is translated, including human-readable attributes such as alt and aria-label.

  • Tags: always preserved
  • Class/ID attributes: untouched
  • href/src: never translated
  • 36 target languages

Broken markup is worse than no translation.

Sending HTML through a standard translation API is risky. Tags get partially translated. Attribute values get corrupted. Class names that shouldn't change do. The output looks correct in plain text but breaks in a browser — and it breaks silently. For a content-heavy site or a CMS export, fixing corrupt HTML across 10 languages is not a reasonable workflow.

Worse, standard translation APIs often translate content that should never change — CSS class names, href URLs, src attributes, aria labels that contain technical identifiers, and data attributes. The resulting HTML no longer renders correctly, and some changes are subtle enough that they pass visual testing before causing runtime errors.

For CMS-generated HTML, email templates, or any HTML being programmatically processed, post-translation repair is not optional. It becomes a hidden maintenance cost that compounds with every content update.

PolyLingo parses the DOM before translating.

PolyLingo treats your HTML as a document, not a string. It traverses the DOM structure, identifies text nodes and human-readable attributes (alt, title, aria-label, placeholder) containing visible content, and sends only those to the translation model. Tag names, class names, href values, and data attributes are never seen by the translation model. The output HTML is valid and has the same structure as what you sent.

PolyLingo uses a DOM parser rather than treating HTML as raw text. Before any translation occurs, the HTML is parsed into a node tree. Each node is classified: text nodes that contain natural language are extracted for translation, while element nodes, attribute values that are non-linguistic (class names, IDs, hrefs, srcs, data attributes), and non-translatable content (scripts, styles, code blocks) are excluded entirely.

The translated text is then reinserted into the original DOM structure. The resulting HTML is structurally equivalent to the source, with identical tag nesting and identical non-linguistic attribute values. Only the natural-language text changes.
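The parse, classify, and reinsert flow described above can be sketched with Python's standard `html.parser` module. This is a minimal illustration, not PolyLingo's implementation: the names `TextNodeTranslator` and `translate_html` are hypothetical, the `translate` callable stands in for a real translation model, and the sketch covers text nodes only (translatable attributes such as alt are left out for brevity).

```python
import html
from html.parser import HTMLParser

RAW_TEXT = {"script", "style"}            # parser hands their content over verbatim
UNTRANSLATED = RAW_TEXT | {"code", "pre"}  # elements whose text is passed through


class TextNodeTranslator(HTMLParser):
    """Re-emits the source markup, replacing only natural-language text nodes."""

    def __init__(self, translate):
        super().__init__(convert_charrefs=True)
        self.translate = translate  # callable: str -> str (stand-in for a model)
        self.out = []
        self.skipped = []           # stack of open untranslated elements

    def handle_starttag(self, tag, attrs):
        if tag in UNTRANSLATED:
            self.skipped.append(tag)
        self.out.append(self.get_starttag_text())  # original tag text, untouched

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <img ... />

    def handle_endtag(self, tag):
        if self.skipped and self.skipped[-1] == tag:
            self.skipped.pop()
        self.out.append(f"</{tag}>")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_data(self, data):
        if self.skipped:
            raw = self.skipped[-1] in RAW_TEXT
            self.out.append(data if raw else html.escape(data, quote=False))
            return
        core = data.strip()
        if not core:                 # whitespace-only node: keep as-is
            self.out.append(data)
            return
        start = data.index(core)     # keep leading and trailing whitespace
        translated = html.escape(self.translate(core), quote=False)
        self.out.append(data[:start] + translated + data[start + len(core):])


def translate_html(markup, translate):
    parser = TextNodeTranslator(translate)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Because start tags are copied from the source text rather than rebuilt, class names, href values, and data attributes cannot be altered by the translation step in this sketch.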

Input — HTML article (English)
<article class="post">
  <h1 class="post-title">How to build a multilingual site</h1>
  <p>Building a <strong>multilingual website</strong> doesn't have to be
  complicated. The key is choosing the right translation layer.</p>
  <a href="/pricing" class="cta-button" aria-label="View pricing">
    See our plans
  </a>
  <img src="/hero.png" alt="Multilingual platform dashboard" />
</article>
Output — French (tags and attributes intact)
<article class="post">
  <h1 class="post-title">Comment créer un site multilingue</h1>
  <p>Créer un <strong>site web multilingue</strong> n'a pas à être
  compliqué. La clé est de choisir la bonne couche de traduction.</p>
  <a href="/pricing" class="cta-button" aria-label="Voir les tarifs">
    Voir nos offres
  </a>
  <img src="/hero.png" alt="Tableau de bord de la plateforme multilingue" />
</article>

What gets translated vs what gets preserved

Translated:
  • Text content between tags
  • Title attribute (accessible label)
  • Alt text on images
  • Aria-label attributes
  • Placeholder text in inputs

Preserved:
  • HTML tag names
  • Class and ID attributes
  • href and src URLs
  • Data attributes
  • Script and style blocks
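One way to picture the split above is as an allowlist: a small set of attributes is treated as human-readable, and everything else is preserved by default. The sketch below is an assumption about how such a rule could look, not PolyLingo's actual rule set; `TRANSLATABLE_ATTRS` and `split_attributes` are hypothetical names.

```python
# Attributes the page lists as translated; anything else is preserved.
TRANSLATABLE_ATTRS = {"title", "alt", "aria-label", "placeholder"}


def split_attributes(attrs: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Return (to_translate, to_preserve) for one element's attributes."""
    to_translate, to_preserve = {}, {}
    for name, value in attrs.items():
        if name.lower() in TRANSLATABLE_ATTRS:
            to_translate[name] = value
        else:
            to_preserve[name] = value  # class, id, href, src, data-*, style, ...
    return to_translate, to_preserve


# The link from the example article:
translate, preserve = split_attributes(
    {"href": "/pricing", "class": "cta-button", "aria-label": "View pricing"}
)
```

Defaulting to "preserve" means an unfamiliar attribute is never sent for translation by accident.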

What PolyLingo handles in every HTML translation

  • Tag names and structure never modified
  • Class names, IDs, data attributes untouched
  • Link href and src values preserved exactly
  • Only visible text and human-readable attributes (alt, title, aria-label, placeholder) translated
  • RTL languages: dir attribute handled correctly
  • Works with full pages, fragments, and components
  • Nested HTML — any depth of element nesting handled correctly
  • Email HTML — inline styles and table-based layouts preserved

How to translate HTML content with PolyLingo

1. Send your HTML to the API

POST your HTML content to /v1/translate. Set format to "html" or omit it — PolyLingo detects HTML automatically from the content. Include your target language codes.
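A request along these lines could be built with Python's standard library. Only the `/v1/translate` path and the `format` value come from the text above; the base URL, the `content` and `targets` field names, and the bearer-token header are assumptions made for illustration, so check the API docs for the real names.

```python
import json
import urllib.request

API_BASE = "https://api.polylingo.example"  # hypothetical base URL


def build_translate_request(html_content: str, targets: list[str], api_key: str):
    """Build (but do not send) a POST request for /v1/translate."""
    payload = {
        "content": html_content,  # assumed field name
        "format": "html",         # may be omitted; HTML is auto-detected
        "targets": targets,       # assumed field name for language codes
    }
    return urllib.request.Request(
        url=f"{API_BASE}/v1/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_translate_request("<p>See our plans</p>", ["fr", "de"], "YOUR_API_KEY")
# To send it: urllib.request.urlopen(request)
```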

2. Receive clean, translated HTML

The response contains one translated HTML string per target language. Every tag, class, ID, and non-linguistic attribute is exactly as you sent it. Only the natural language text has changed.
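If you want to verify that claim in your own pipeline, one option is to compare a structural "skeleton" of the source and of each translated string. The sketch below is a hypothetical check (the names `skeleton` and `same_structure` are not part of any PolyLingo SDK): it records every tag and every attribute except the human-readable ones, which are expected to change.

```python
from html.parser import HTMLParser

# Attributes expected to differ between source and translation.
TRANSLATABLE_ATTRS = {"title", "alt", "aria-label", "placeholder"}


class _Skeleton(HTMLParser):
    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        kept = tuple((k, v) for k, v in attrs if k not in TRANSLATABLE_ATTRS)
        self.items.append(("open", tag, kept))

    def handle_endtag(self, tag):
        self.items.append(("close", tag))


def skeleton(markup: str) -> list:
    """Tags and preserved attributes in document order, ignoring all text."""
    parser = _Skeleton()
    parser.feed(markup)
    parser.close()
    return parser.items


def same_structure(source: str, translated: str) -> bool:
    return skeleton(source) == skeleton(translated)
```

Run against a mapping of language code to translated HTML, this becomes a cheap regression test: `all(same_structure(source, out) for out in translations.values())`.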

3. Write to your CMS, template, or file

Use the translated HTML directly in your CMS, email builder, static site generator, or any other tool that consumes HTML. No post-processing or repair needed.
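For the file case, a minimal sketch might write one file per locale. It assumes the translations are already available as a mapping of language code to HTML string; the directory layout and the `write_translations` helper are illustrative choices, not something PolyLingo prescribes.

```python
from pathlib import Path


def write_translations(translations: dict[str, str], out_dir: str,
                       filename: str = "index.html") -> list[Path]:
    """Write each translated HTML string to <out_dir>/<lang>/<filename>."""
    written = []
    for lang, markup in translations.items():
        target = Path(out_dir) / lang / filename
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(markup, encoding="utf-8")  # UTF-8 for every locale
        written.append(target)
    return written
```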

Where HTML translation is needed

🖥️

CMS page and post exports

Headless CMS platforms store content as HTML or rich text that serialises to HTML. PolyLingo translates this content for each locale while preserving the structure and formatting the CMS created.

📧

Email templates

Email HTML is notoriously fragile — table-based layouts, inline styles, and deeply nested structures break when passed through naive translation. PolyLingo preserves all of it.

🏪

E-commerce product descriptions

Product descriptions often contain formatted HTML with styled lists, bold text, and structured content. Translating them at scale requires exact format preservation so the output renders consistently across languages.

Frequently asked questions about HTML translation

Does PolyLingo translate attributes like title and alt text?

Yes. alt attributes on images, title attributes on elements, aria-label attributes, and placeholder attributes on form inputs are identified as containing natural language text and are translated. href and src attributes, class names, IDs, and data attributes are not translated.

What about inline JavaScript or style attributes?

Script tags and style tags are never translated — their contents are passed through unchanged. Inline style attributes (style="...") are also preserved exactly. The only content that is translated is natural language text.

Can PolyLingo handle full-page HTML including doctype and head?

Yes. PolyLingo can handle a full HTML document including doctype, head, and body. The title element, meta description, and Open Graph tags in the head are translatable. The canonical URL, meta charset, and technical meta tags are preserved unchanged.

Does it work with HTML generated by rich-text editors like ProseMirror or TipTap?

Yes. Rich-text editors serialise their content to standard HTML, and PolyLingo handles any valid HTML regardless of how it was generated. The DOM parser works from the HTML string itself, not from any editor-specific format.

What happens with HTML entities like &amp; or &copy;?

HTML entities are decoded before translation and re-encoded in the output. Named entities like &amp;, &copy;, and &mdash; are preserved. Numeric entities are also handled correctly. The translated output uses the same entity encoding as the source.
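The decode and re-encode round trip can be illustrated with Python's standard `html` module. This is a simplified sketch, not PolyLingo's implementation: the standard library re-escapes only `&`, `<`, and `>`, so restoring the original named form of entities such as `&copy;` would need extra bookkeeping that is not shown here.

```python
import html


def decode_for_translation(text: str) -> str:
    """Turn entities into plain characters so the model sees real text."""
    return html.unescape(text)


def encode_for_output(text: str) -> str:
    """Re-escape the characters that are unsafe in HTML text content."""
    return html.escape(text, quote=False)


plain = decode_for_translation("Caf&#233; &amp; bar &copy; 2024")
safe = encode_for_output(plain)
```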

Is there a file size limit for HTML translation?

Individual translation requests are limited to 64 KB of content. For longer documents such as long-form articles or full pages, split the HTML at natural boundaries (sections, articles) and reassemble it after translation. This also helps with token usage accuracy.
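The splitting approach can be sketched as a greedy batcher over sections you have already cut at natural boundaries. Two assumptions are baked in and should be checked against the API docs: that 64 KB means 65,536 bytes, and that size is measured on the UTF-8 encoded content. `batch_sections` is a hypothetical helper.

```python
MAX_BYTES = 64 * 1024  # assumed: 64 KB measured as UTF-8 bytes


def batch_sections(sections: list[str], max_bytes: int = MAX_BYTES) -> list[str]:
    """Group consecutive sections into request-sized chunks, in order."""
    batches, current, size = [], [], 0
    for section in sections:
        n = len(section.encode("utf-8"))
        if n > max_bytes:
            raise ValueError("section exceeds the request limit; split it further")
        if current and size + n > max_bytes:
            batches.append("".join(current))  # close the current batch
            current, size = [], 0
        current.append(section)
        size += n
    if current:
        batches.append("".join(current))
    return batches
```

Because batches keep the original order, reassembly after translation is a plain concatenation of the translated chunks.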

Translate your HTML without the cleanup.

Free tier. No credit card. Valid HTML out the other side.

Paste any HTML and see exactly what translates and what doesn't.
