Asciidoctor-pdf is a fast-evolving project. CJK support was still on the wish list when I last blogged about it. At the time of writing, however, its CJK support is in a somewhat usable state, and a lot of other progress has been made (see its milestone on GitHub). More specifically, asciidoctor-pdf is able to process CJK documents and produce CJK-enabled PDFs with the help of a custom theme, although line wrapping is currently broken (a fix has already been proposed). Here are my experiences.

The technology

Prawn (the underlying layout engine of asciidoctor-pdf) can use custom TTF fonts for text. This makes it possible to display text in any language (even right-to-left languages) as long as a suitable font file is provided. asciidoctor-pdf exposes this functionality through a font configuration section in the theme and an option to use custom themes. So CJK characters in asciidoctor-pdf will just work with the right font configuration.

Prawn also supports justified line wrapping at word boundaries as well as at Unicode zero-width spaces (ZWSPs). This enables custom line wrap rules: one can insert ZWSPs between CJK words so that the text wraps correctly. This approach has been confirmed by an asciidoctor-pdf collaborator but is not yet merged into asciidoctor-pdf. Once it is merged, asciidoctor-pdf will have basic CJK support, which satisfies at least my own needs.
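
To make the ZWSP idea concrete, here is a minimal sketch in Python. It is not the fix proposed for asciidoctor-pdf, only an illustration of the preprocessing step. The function name `insert_zwsp` is hypothetical, and the sketch assumes that a break is acceptable between any two adjacent characters of the basic CJK Unified Ideographs block (U+4E00 to U+9FFF). It does no real word segmentation and does not handle kana or hangul.

```python
import re

ZWSP = "\u200b"  # Unicode zero-width space

# Assumption: only the basic CJK Unified Ideographs block counts as "CJK" here.
# Punctuation such as "。" is left out on purpose, so no break opportunity is
# created directly before or after it.
_BETWEEN_IDEOGRAPHS = re.compile(r"(?<=[\u4e00-\u9fff])(?=[\u4e00-\u9fff])")


def insert_zwsp(text):
    """Insert a zero-width space between every pair of adjacent ideographs."""
    return _BETWEEN_IDEOGRAPHS.sub(ZWSP, text)
```

Running the document source through such a function before conversion would give the layout engine a break opportunity inside each run of Chinese text, while Latin words and punctuation stay untouched. A real implementation would probably want proper word segmentation rather than a break after every character.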

The configuration

Here is an example of using the Adobe Source Han Sans font to produce a Chinese-enabled PDF via asciidoctor-pdf, step by step.

  1. Download a TTF version of the Source Han Sans font. Prawn only supports TTF fonts, not OTF or OTC.

  2. Copy asciidoctor-pdf/data into another directory, and put the Source Han Sans files in data/fonts. In my case, I put 'SourceHanSansSC-Normal.ttf' and 'SourceHanSansSC-Bold.ttf' there. The directory looks like this:

    data
    ├── fonts
    │   ├── mplus1mn-bold-ascii.ttf
    │   ├── mplus1mn-bolditalic-ascii.ttf
    │   ├── mplus1mn-italic-ascii.ttf
    │   ├── mplus1mn-regular-ascii-conums.ttf
    │   ├── mplus1p-regular-multilingual.ttf
    │   ├── notoserif-bolditalic-latin.ttf
    │   ├── notoserif-bold-latin.ttf
    │   ├── notoserif-italic-latin.ttf
    │   ├── notoserif-regular-latin.ttf
    │   ├── SourceHanSansSC-Bold.ttf
    │   └── SourceHanSansSC-Normal.ttf
    └── themes
        └── cn-theme.yml

  3. Enable the custom fonts in cn-theme.yml:

    font:
      catalog:
        NotoSerif:
          normal: notoserif-regular-latin.ttf
          bold: notoserif-bold-latin.ttf
          italic: notoserif-italic-latin.ttf
          bold_italic: notoserif-bolditalic-latin.ttf
        Mplus1mn:
          normal: mplus1mn-regular-ascii-conums.ttf
          bold: mplus1mn-bold-ascii.ttf
          italic: mplus1mn-italic-ascii.ttf
          bold_italic: mplus1mn-bolditalic-ascii.ttf
        Mplus1pMultilingual:
          normal: mplus1p-regular-multilingual.ttf
          bold: mplus1p-regular-multilingual.ttf
          italic: mplus1p-regular-multilingual.ttf
          bold_italic: mplus1p-regular-multilingual.ttf
        SourceHanSansSC:  # (1)
          normal: SourceHanSansSC-Normal.ttf
          bold: SourceHanSansSC-Bold.ttf
          italic: SourceHanSansSC-Normal.ttf
          bold_italic: SourceHanSansSC-Normal.ttf
      fallbacks:
        - SourceHanSansSC  # (2)

    (1) Define the SourceHanSansSC font.
    (2) Use SourceHanSansSC as a fallback font so that only Chinese content uses it.

  4. Build the PDF document with the custom theme:

    asciidoctor-pdf -a pdf-style=data/themes/cn-theme.yml -a pdf-fontsdir=data/fonts example.adoc
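
The steps above can be wrapped in a small helper script. The following Python sketch is only an illustration: the function names `non_ttf_fonts` and `build_command` are made up for this post, and the only asciidoctor-pdf attributes it uses are the two shown in the command above (`pdf-style` and `pdf-fontsdir`). The first function looks for font files that Prawn would not accept, the second assembles the command line as an argument list.

```python
from pathlib import Path


def non_ttf_fonts(fontsdir):
    """Return the names of files in fontsdir that do not end in .ttf."""
    return sorted(
        p.name
        for p in Path(fontsdir).iterdir()
        if p.is_file() and p.suffix.lower() != ".ttf"
    )


def build_command(theme, fontsdir, source):
    """Assemble the asciidoctor-pdf call from step 4 as an argument list."""
    return [
        "asciidoctor-pdf",
        "-a", f"pdf-style={theme}",
        "-a", f"pdf-fontsdir={fontsdir}",
        source,
    ]
```

With the layout from step 2, `build_command("data/themes/cn-theme.yml", "data/fonts", "example.adoc")` yields the same arguments as the command above, and the list can be handed to `subprocess.run(..., check=True)` once `non_ttf_fonts("data/fonts")` comes back empty.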

An example

The source document:

= 中文文档示例
XX, zyangmath@gmail.com
v0.1
:source-highlighter: pygments

== 引言

测试中文和English混合编排。这是中文。This is English. 什么都没有。 *黑体*. _斜体_.
测试中文和English混合编排。这是中文。This is English. 什么都没有。 *黑体*. _斜体_.
测试中文和English混合编排。这是中文。This is English. 什么都没有。 *黑体*. _斜体_.
测试中文和English混合编排。这是中文。This is English. 什么都没有。 *黑体*. _斜体_.
测试中文和English混合编排。这是中文。This is English. 什么都没有。 *黑体*. _斜体_.
测试中文和English混合编排。这是中文。This is English. 什么都没有。 *黑体*. _斜体_.
测试中文和English混合编排。这是中文。This is English. 什么都没有。 *黑体*. _斜体_.
测试中文和English混合编排。这是中文。This is English. 什么都没有。 *黑体*. _斜体_.

[source, cpp]
----
// 简单的测试粒子
#include <iostream>

int main(int argc, char** argv) {
  return 0; // <1>
}
----
<1> 返回0

What it looks like:

[image: example pdf]