opencc-js

The Pure JavaScript version of Open Chinese Convert (OpenCC)

opencc-js is a pure JavaScript implementation of OpenCC for both browsers and Node.js. It bundles dictionary data generated from opencc-data at build time, and no native binary is required.

The conversion pipeline aligns with the official OpenCC implementation, including phrase-level segmentation for the built-in converters, verified against upstream OpenCC test cases and golden outputs. Exact parity with the official OpenCC output is not guaranteed for all inputs.

opencc-js supports the OpenCC mmseg-style segmentation used by the built-in converters, but does not support extended segmenters such as Jieba.
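
To make the idea of phrase-level, mmseg-style matching concrete, here is a minimal sketch of forward maximum matching: at each position the longest dictionary entry wins, so a phrase entry takes priority over its single characters. This is an illustration under assumptions, not the opencc-js source; the function name convertLongestMatch and the toy dictionary are hypothetical.

```javascript
// Hypothetical helper: not part of the opencc-js API.
// Converts text by always taking the longest dictionary entry that matches
// at the current position (forward maximum matching).
function convertLongestMatch(text, dict) {
  const maxLen = Math.max(1, ...[...dict.keys()].map((k) => [...k].length));
  const chars = [...text]; // iterate by code point, not by UTF-16 unit
  let out = '';
  let i = 0;
  while (i < chars.length) {
    let matched = false;
    for (let len = Math.min(maxLen, chars.length - i); len >= 1; len--) {
      const piece = chars.slice(i, i + len).join('');
      if (dict.has(piece)) {
        out += dict.get(piece);
        i += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      out += chars[i]; // no entry: copy the character unchanged
      i += 1;
    }
  }
  return out;
}

// Toy dictionary (illustrative entries only, not opencc-data).
const toyDict = new Map([
  ['发', '發'],
  ['头', '頭'],
  ['头发', '頭髮'],
]);

console.log(convertLongestMatch('头发', toyDict)); // 頭髮 (phrase entry wins)
console.log(convertLongestMatch('发头', toyDict)); // 發頭 (falls back to single characters)
```

The phrase entry is what keeps 头发 from being converted character by character into 頭發.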

Note: For a comparison with the opencc and opencc-wasm packages, see "Differences between various opencc npm packages" below.

Data

Dictionary data is generated from opencc-data at build time and bundled in the published package. Browser usage does not fetch extra dictionary text files at runtime.

To avoid producing tofu boxes for glyphs that are often missing from browser and system fonts, opencc-js does not bundle OpenCC's TSCharactersExt tofu-risk mappings. A small number of rare Traditional-to-Simplified extension-character conversions may therefore intentionally differ from the upstream OpenCC test data.

Usage

Choose the installation method that matches your environment.

Important: Version 1.3.2-next.0 contains a critical bugfix. If you are using a CDN or self-hosted build, use this prerelease until the next stable release is published.

Install opencc-js for Node.js or a bundler

npm install opencc-js

ES modules:

import OpenCC from 'opencc-js';

CommonJS:

const OpenCC = require('opencc-js');

Use opencc-js in a browser

Self-hosted ES module:

<script type="module">
  import OpenCC from './dist/esm/full.js';

  const converter = OpenCC.Converter({ from: 'cn', to: 'tw' });
  console.log(converter('汉语'));
</script>

CDN ES module:

<script type="module">
  // Use the latest stable version from https://www.npmjs.com/package/opencc-js, or 1.3.2-next.0 for the latest bugfix
  import OpenCC from 'https://cdn.jsdelivr.net/npm/opencc-js@1.3.2-next.0/dist/esm/full.js';

  const converter = OpenCC.Converter({ from: 'cn', to: 'tw' });
  console.log(converter('汉语'));
</script>

UMD build for plain script tags:

<!-- Use the latest stable version from https://www.npmjs.com/package/opencc-js, or 1.3.2-next.0 for the latest bugfix -->

<script src="https://cdn.jsdelivr.net/npm/opencc-js@1.3.2-next.0/dist/umd/full.js"></script>

Basic usage

// Convert Traditional Chinese (Hong Kong) to Simplified Chinese (Mainland China)
const converter = OpenCC.Converter({ from: 'hk', to: 'cn' });
console.log(converter('漢語')); // output: 汉语

Custom Converter

const converter = OpenCC.CustomConverter([
  ['香蕉', 'banana'],
  ['蘋果', 'apple'],
  ['梨', 'pear'],
]);
console.log(converter('香蕉 蘋果 梨')); // output: banana apple pear

Alternatively, pass a single string that uses a space to separate each word from its replacement and a vertical bar to separate entries.

const converter = OpenCC.CustomConverter('香蕉 banana|蘋果 apple|梨 pear');
console.log(converter('香蕉 蘋果 梨')); // output: banana apple pear
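
The relationship between the two input forms can be sketched as a small parser that turns the delimited string into the array-of-pairs form. The helper parseDictString is hypothetical and only illustrates the format; how opencc-js itself parses the string (for example, its handling of extra spaces) may differ.

```javascript
// Hypothetical helper, not part of the opencc-js API: turn
// 'word replacement|word replacement' into [[word, replacement], ...].
function parseDictString(dictString) {
  return dictString
    .split('|')
    .filter((entry) => entry.length > 0) // tolerate a trailing '|'
    .map((entry) => {
      const sep = entry.indexOf(' ');
      if (sep === -1) {
        throw new Error(`Missing space delimiter in entry: "${entry}"`);
      }
      // Split on the first space only, so a replacement may contain spaces.
      return [entry.slice(0, sep), entry.slice(sep + 1)];
    });
}

const pairs = parseDictString('香蕉 banana|蘋果 apple|梨 pear');
console.log(pairs);
// [ [ '香蕉', 'banana' ], [ '蘋果', 'apple' ], [ '梨', 'pear' ] ]
```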

Add words

  • Use the low-level function ConverterFactory to create the converter.
  • Get the built-in dictionaries from the Locale property.

const customDict = [
  ['“', '「'],
  ['”', '」'],
  ['‘', '『'],
  ['’', '』'],
];
const converter = OpenCC.ConverterFactory(
  OpenCC.Locale.from.cn,                   // Simplified Chinese (Mainland China) => OpenCC standard
  OpenCC.Locale.to.tw.concat([customDict]) // OpenCC standard => Traditional Chinese (Taiwan) with custom words
);
console.log(converter('悟空道:“师父又来了。怎么叫做‘水中捞月’?”'));
// output: 悟空道:「師父又來了。怎麼叫做『水中撈月』?」

The following produces the same result by passing the custom dictionary as an extra conversion step.

const customDict = [
  ['“', '「'],
  ['”', '」'],
  ['‘', '『'],
  ['’', '』'],
];
const converter = OpenCC.ConverterFactory(
  OpenCC.Locale.from.cn, // Simplified Chinese (Mainland China) => OpenCC standard
  OpenCC.Locale.to.tw,   // OpenCC standard => Traditional Chinese (Taiwan)
  [customDict]           // Traditional Chinese (Taiwan) => custom words
);
console.log(converter('悟空道:“师父又来了。怎么叫做‘水中捞月’?”'));
// output: 悟空道:「師父又來了。怎麼叫做『水中撈月』?」
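
The difference between merging a custom dictionary into an existing step and appending it as an extra step can be modelled with a short sketch. It assumes each argument is a group of dictionaries applied in one longest-match pass, and that passes run left to right. The names makeStage and chainGroups and the toy data are hypothetical; this models the idea and is not the ConverterFactory implementation.

```javascript
// Hypothetical model, not the opencc-js source. A dictionary is a list of
// [from, to] pairs; a group is a list of dictionaries applied in one pass.
function makeStage(group) {
  const merged = new Map(group.flat()); // on duplicate keys, the later dictionary wins
  const maxLen = Math.max(1, ...[...merged.keys()].map((k) => [...k].length));
  return (text) => {
    const chars = [...text];
    let out = '';
    for (let i = 0; i < chars.length; ) {
      let step = 1;
      let piece = chars[i]; // default: copy the character unchanged
      for (let len = Math.min(maxLen, chars.length - i); len >= 1; len--) {
        const key = chars.slice(i, i + len).join('');
        if (merged.has(key)) {
          piece = merged.get(key);
          step = len;
          break;
        }
      }
      out += piece;
      i += step;
    }
    return out;
  };
}

// Each group becomes one pass; passes run left to right.
function chainGroups(...groups) {
  const stages = groups.map(makeStage);
  return (text) => stages.reduce((acc, stage) => stage(acc), text);
}

// Toy data for illustration only (not the bundled dictionaries).
const fromCn = [[['师', '師'], ['来', '來'], ['为', '爲']]];
const toTw = [[['爲', '為']]];
const quotes = [['“', '「'], ['”', '」']];

const mergedForm = chainGroups(fromCn, toTw.concat([quotes]));
const extraStep = chainGroups(fromCn, toTw, [quotes]);

console.log(mergedForm('“师父为何来”')); // 「師父為何來」
console.log(extraStep('“师父为何来”')); // 「師父為何來」
```

In this model the two forms agree because the custom entries do not overlap with the other dictionaries. With overlapping keys they could diverge: merged entries compete within one pass, while an extra step only sees text that has already been converted.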

DOM operations

The HTML lang attribute selects which elements are converted.

<span lang="zh-HK">漢語</span>

// Convert from Traditional Chinese (Hong Kong) to Simplified Chinese (Mainland China)
const converter = OpenCC.Converter({ from: 'hk', to: 'cn' });
// Start from the root node, i.e. convert the whole page
const rootNode = document.documentElement;
// Convert all elements with lang='zh-HK' and change their attribute to lang='zh-CN'
const HTMLConvertHandler = OpenCC.HTMLConverter(converter, rootNode, 'zh-HK', 'zh-CN');
HTMLConvertHandler.convert(); // Convert: 漢語 -> 汉语
HTMLConvertHandler.restore(); // Restore: 汉语 -> 漢語

API

  • .Converter({}): creates a converter whose direction is given by two locales.
    • default: { from: 'tw', to: 'cn' }
    • syntax : { from: locale1, to: locale2 }
  • locales: letter codes that identify a writing locale and, in some cases, its idiomatic phrasing.
    • cn: Simplified Chinese (Mainland China)
    • tw: Traditional Chinese (Taiwan)
      • twp: with phrase conversion (e.g. 自行車 -> 腳踏車)
    • hk: Traditional Chinese (Hong Kong)
    • jp: Japanese Shinjitai
    • t: Traditional Chinese (OpenCC standard; for most use cases, prefer a regional locale such as tw or hk)
  • .CustomConverter([]): creates a converter from a custom dictionary.
    • default: []
    • syntax : [ ['item1','replacement1'], ['item2','replacement2'], … ]
  • .HTMLConverter(converter, rootNode, langAttrInitial, langAttrNew): uses a previously created converter to convert the text content of the matching HTML elements, from rootNode downward, into the target locale. It also changes lang attributes from langAttrInitial to langAttrNew and converts placeholder and aria-label attributes.
  • lang attributes: HTML attributes that tell the browser the language of the text content, before conversion (langAttrInitial) and after it (langAttrNew).
  • ignore-opencc: HTML class marking an element that, together with its sub-nodes, is not converted.
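
The interplay of the lang attribute, the ignore-opencc class, and convert/restore can be sketched on a plain-object stand-in for a DOM subtree, so that it runs without a browser. Everything here is hypothetical: makeTreeConverter, the node shape, and the rule that another explicit lang value ends the target region are assumptions for illustration, not the behaviour of HTMLConverter.

```javascript
// Hypothetical plain-object model of a DOM subtree (not real DOM nodes):
// { lang?: string, classes?: string[], text?: string, children?: node[] }
function makeTreeConverter(convert, root, langFrom, langTo) {
  const undo = []; // closures that put the original values back

  function walk(node, inTarget) {
    if ((node.classes || []).includes('ignore-opencc')) return; // skip the subtree
    if (node.lang === langFrom) {
      inTarget = true;
      undo.push(() => { node.lang = langFrom; });
      node.lang = langTo;
    } else if (node.lang !== undefined) {
      inTarget = false; // assumption: another explicit lang ends the target region
    }
    if (inTarget && typeof node.text === 'string') {
      const original = node.text;
      undo.push(() => { node.text = original; });
      node.text = convert(original);
    }
    (node.children || []).forEach((child) => walk(child, inTarget));
  }

  return {
    convert() { walk(root, false); },
    restore() { while (undo.length > 0) undo.pop()(); },
  };
}

// Toy converter standing in for OpenCC.Converter({ from: 'hk', to: 'cn' }).
const toyConvert = (s) => s.replace(/漢/g, '汉').replace(/語/g, '语');

const page = {
  children: [
    { lang: 'zh-HK', text: '漢語' },
    { lang: 'zh-HK', classes: ['ignore-opencc'], text: '漢語' },
    { lang: 'en', text: '漢語' },
  ],
};

const handler = makeTreeConverter(toyConvert, page, 'zh-HK', 'zh-CN');
const show = () => page.children.map((n) => `${n.lang}:${n.text}`);

handler.convert();
console.log(show()); // [ 'zh-CN:汉语', 'zh-HK:漢語', 'en:漢語' ]
handler.restore();
console.log(show()); // [ 'zh-HK:漢語', 'zh-HK:漢語', 'en:漢語' ]
```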

Bundle optimization

  • Tree shaking (ES modules only) may reduce the size of the bundle.
  • Use ConverterFactory instead of Converter.
  • Prefer explicit locale dictionaries such as tw, hk, or cn over the generic OpenCC standard t preset.

import * as OpenCC from 'opencc-js/core'; // primary code
import * as Locale from 'opencc-js/preset'; // dictionary

const converter = OpenCC.ConverterFactory(Locale.from.hk, Locale.to.cn);
console.log(converter('漢語'));

Differences between various opencc npm packages

There are three related npm packages for OpenCC conversion. They differ in runtime environment, implementation approach, and segmentation support.

opencc-js is a pure JavaScript implementation for browsers and Node.js. It bundles dictionary data generated from opencc-data at build time, requiring no native binaries and no runtime file fetching. Its conversion pipeline aligns with the official OpenCC implementation, including mmseg-style phrase segmentation for built-in converters, verified against upstream OpenCC test cases and golden outputs. Exact parity with the official OpenCC output is not guaranteed for all inputs. Extended segmenters such as Jieba are not supported.

opencc is the official Node.js native binding for the OpenCC C++ project. It depends on native or prebuilt binaries and follows the official OpenCC engine. Extended segmentation algorithms such as Jieba are supported when the official OpenCC configuration and runtime allow it.

opencc-wasm is another browser-capable implementation using WebAssembly. Its configuration and conversion logic stay aligned with the official opencc package, and it can support Jieba segmentation through the official OpenCC runtime.

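|                              | opencc-js             | opencc             | opencc-wasm       |
|------------------------------|-----------------------|--------------------|-------------------|
| Browser                      | Yes                   | No                 | Yes               |
| Node.js                      | Yes                   | Yes                | Yes               |
| Implementation               | Pure JavaScript       | Native C++ binding | WebAssembly       |
| Native binary required       | No                    | Yes                | No                |
| Dictionary source            | Bundled at build time | Loaded at runtime  | Loaded at runtime |
| Aligned with official OpenCC | Approximately         | Yes                | Yes               |
| mmseg segmentation           | Yes                   | Yes                | Yes               |
| Jieba segmentation available | No                    | Yes                | Yes               |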
