Rules for Simple Placement of Japanese Ruby

This document describes a simple method of ruby composition for Japanese layout realized with technologies like CSS, SVG, and XSL-FO, as information for rendering engine implementers. Unlike JLReq [[JLREQ]], this document presents only one layout method for each case, chosen with consideration of best practices and important points in Japanese layout. The points taken into consideration are described in . This document also covers the layout of double-sided ruby, which has two distinct runs of ruby text attached to the same ruby base character string and which is not described in [[JLREQ]].

In part, [[JLREQ]] is a record of Japanese layout practice as established in the printing industry. It often explains multiple ways of doing one thing, and some of them are very complex. Ruby is one such case. There are many factors to consider, and the requirements often contradict each other (cf. the note "Protrusion of ruby from base characters"). This complexity makes ruby challenging to automate.

It would seem beneficial to come up with a method that is simple and robust, and one that is suitable for automatic processing. The positioning might not be as sophisticated but we must at least make sure that it causes no misunderstanding.

The following is a proposal for a simple processing system. The target audience is implementers and specification writers. A full system is expected to be more complex than what is described here, both because of interaction with other features or other writing systems, and because those designing such systems may wish to provide alternative options. Note that the terminology is based on that defined in [[JLREQ]].

Matters considered by the simple placement rules

Ruby is the name given to the small annotations in Japanese content that are rendered alongside base text, usually to provide a pronunciation guide, but sometimes to provide other information. (See the article “What is ruby” by the Internationalization Working Group for more information.)

The Difficulties of Ruby Processing

When performing ruby layout in Japanese, the following factors need to be considered in order to decide on the position:

How to handle the correspondence between the base characters and the ruby
What to do when the string of base characters is longer than the ruby string
What to do when the string of base characters is shorter than the ruby string
When the ruby string protrudes from the base character string, whether it can be allowed to be laid over the characters preceding or following, and whether this affects the position of the base characters

Following the principle of avoiding extra spacing between characters in Japanese composition, one layout method allows the ruby string to be laid over preceding or following Hiragana (cl-15) [[JLREQ]] characters, but not over preceding or following Kanji characters. This method works well when the characters preceding and following the ruby base are both Hiragana (cl-15) [[JLREQ]] or both Kanji. However, it can look unbalanced when one is Hiragana (cl-15) [[JLREQ]], the other is Kanji, and the ruby string is longer than the ruby base (see , a mono-ruby case in which the first example applies this rule). The same is true for group-ruby (see ). In letterpress printing, the layout was adjusted with the circumstances of each case taken into account. Some publishers added a rule that a ruby string in Katakana (cl-16) [[JLREQ]] must not be laid over any preceding or following character, since a ruby string in Katakana (cl-16) [[JLREQ]] is considered to be a unit. In such cases, depending on the size of the ruby string, a method that does not lay the ruby string over any adjacent character can produce a more balanced appearance (see , in which the second example takes this method).

Mono-ruby with ruby characters protruding from their base characters (kanji as next character)

Group-ruby with ruby characters protruding from their base characters (kanji as next character)

Group-ruby with ruby characters protruding from their base characters (kana as before and next character)
When the ruby string protrudes from the base character string, and the base character string is at the start or the end of the line, whether the base character string or the ruby string should be aligned with the line edge
When there are multiple base characters, whether there can be line wrap opportunity between them

In movable type typography, such matters were resolved based on generic principles, and could always be corrected during the proofreading phase. Essentially, each case was adjusted individually in a flexible manner.

In computer-based typesetting, the layout needs to be more or less determined based on predetermined rules, but it remained necessary to adjust the results in certain cases, for example by changing the association between base characters and the ruby string, or by switching to a different placement policy.

When thinking about computing placement for web content, it is not practical to decide on the positioning case by case as was done in movable type typography. It is therefore necessary to decide upon comprehensive rules that provide solutions to all the problems listed above, so that placement may be determined fully automatically. A system that covered all the possibilities that existed in movable type typesetting would need to be very complex.

Matters considered by the placement rules

Here are the fundamental assumptions underlying the simple placement rules.

Ruby is used to display the reading or the meaning of the base characters. Therefore, the number one priority here is to avoid misreadings. Specifically, a ruby string which protrudes from the base character string is not allowed to be laid over the preceding or following characters, whether they are Kanji or Kana characters.
The main placement method defined in JIS X 4051 [[JISX4051]] allows some amount of overhang over the preceding and following base characters, but recognizes the method defined here as an allowed variant.
The method is agnostic to horizontal vs vertical writing, and will use the same logic in either case. Specifically, the center of the ruby string and of the base character string are aligned in the inline direction for mono-ruby.
A two-step processing method is used. In the first step, layout processing considers only the ruby string and the base character string (collectively called the ruby block in this document), and decides the relative position of the ruby string and the ruby base character string. In the second step, layout processing decides the position of the ruby base character string in the line, taking the preceding and following characters into account. In other words, the relative position of the ruby string and the ruby base character string decided in the first step is not modified, regardless of any preceding or following characters. Also, this document does not adopt the method of aligning the first or last character of the ruby base character string to the line head or the line end by modifying the relative position of the ruby string and the ruby base character string when the ruby base character string is placed at the line head or the line end. In summary, the positions resulting from the first step are not modified by the second step at all.
Although [[JLREQ]] and JIS X 4051 [[JISX4051]] show multiple ways of positioning ruby in some cases, this document describes only one method, based on the policies described above. The methods described in this document are mostly chosen from those provided in JIS X 4051 [[JISX4051]]. In some cases, this document picks a method that is allowed as an implementation-defined option, such as not laying a protruding ruby string over any preceding or following Kana characters.
There is a demand for using a larger (or smaller) font size for the ruby string. In this document, the default font size of the ruby string is half the font size of the ruby base character string, and the examples in the figures are shown with the default font size. The sizes of spacing adjustments during justification are defined relative to the font size of the ruby base character string, not that of the ruby string, so the layout methods also apply when the font size of the ruby string is not half that of its ruby base character string.

Types of ruby

Ruby in Japanese may be divided into the following 3 different types, based on the relationship between the ruby and the base characters (see JLReq “3.3.1 Usage of Ruby” [[JLREQ]]).

Mono-ruby
Jukugo-ruby
Group-ruby

Which one to use depends on the relationship between the ruby and the base characters. Mono-ruby is used to connect ruby to a single base character, Jukugo-ruby is used when multiple base characters each have a corresponding ruby and at the same time the whole group needs to be processed together, and group-ruby is used when ruby is attached to a group of base characters together (see ). Each is used when specified.
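As an illustration only, the three types could be modeled as data before any placement is computed. The names `RubyType`, `RubyPair`, `RubyAnnotation`, and `is_consistent` are hypothetical and do not come from [[JLREQ]] or any existing API, and the shape checks reflect one reading of the descriptions above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class RubyType(Enum):
    MONO = "mono"
    JUKUGO = "jukugo"
    GROUP = "group"


@dataclass(frozen=True)
class RubyPair:
    base: str  # base character string
    ruby: str  # ruby string attached to it


@dataclass(frozen=True)
class RubyAnnotation:
    kind: RubyType
    pairs: List[RubyPair]

    def is_consistent(self) -> bool:
        """Check that the pairs match the shape assumed for the declared type."""
        single_chars = all(len(p.base) == 1 for p in self.pairs)
        if self.kind is RubyType.MONO:
            # one base character with its own ruby string
            return len(self.pairs) == 1 and single_chars
        if self.kind is RubyType.JUKUGO:
            # several base characters, each with its own ruby string
            return len(self.pairs) >= 2 and single_chars
        # group-ruby: one ruby string attached to the whole base string
        return len(self.pairs) == 1


tokyo = RubyAnnotation(
    RubyType.JUKUGO,
    [RubyPair("東", "とう"), RubyPair("京", "きょう")],
)
```

The type is carried explicitly rather than inferred, in line with the statement that each type is used when specified.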

Rules for Simple Placement of Japanese Ruby

Ruby character size and character placement

The size of the ruby characters and their placement in the inline direction relative to the base characters is as follows:

The size of the ruby is by default set to half of the size of the base characters.
In vertical text, ruby is placed to the right of the base characters, and the character frame of the ruby is placed flush against the character frame of the base characters.

Example of vertical ruby
In horizontal text, ruby is placed above the base characters, and the character frame of the ruby is placed flush against the character frame of the base characters.

Example of horizontal ruby

The following sections describe in detail the placement of mono-ruby, jukugo-ruby, and group-ruby. However, since jukugo-ruby is more complex, it is explained last.

Placement of mono-ruby

Mono-ruby is placed as follows. In terms of the two-step processing method described in , points 1, 2, and 3 belong to the first step, and points 4 and 5 belong to the second step.

When the ruby is made of two or more characters, each character in the ruby string is placed immediately next to its neighboring character, without any inter-letter spacing. Furthermore, when the ruby is composed of characters such as Grouped numerals (cl-24), Unit symbols (cl-25), Western word space (cl-26), or Western characters (cl-27) [[JLREQ]] which have their own individual width, they are placed based on each character’s metrics.

Example mono-ruby with western characters
The center of the ruby string and of the base character string are aligned in the inline direction. (see ).
Since the base character and its associated ruby form a single unit there is no line wrapping opportunity inside a mono-ruby.
When the ruby string is longer than the base character string, the part of the ruby string that extends beyond the base characters must not hang over the characters preceding or following (see ). Space is introduced accordingly between these preceding or following characters and the base characters.

Example 1 of mono-ruby protruding

However, next to punctuation marks such as Full stops (cl-06) [[JLREQ]], which have blank space before or after the symbol, the ruby characters do hang over the preceding or following characters (see ). (Punctuation marks such as Full stops (cl-06) [[JLREQ]] play an important role as breaks between sentences, and the spacing before and after them should stay constant, because extra spacing around them could change how the break between sentences is read. In addition, the issues described in the note "Protrusion of ruby from base characters" do not arise here. Therefore, this method uses a different layout next to punctuation marks such as Full stops (cl-06) [[JLREQ]].)
- If the character preceding the base character is one of Closing brackets (cl-02), Full stops (cl-06), Commas (cl-07), Full-width ideographic space (cl-14), or Middle dots (cl-05) [[JLREQ]], then the ruby must hang over the blank portion at the end of the character. (This blank portion is usually half the character’s width, except in the case of Middle dots (cl-05) [[JLREQ]], where it is a quarter of the character’s width.) However, if this blank portion has been compressed due to justification or similar processing of the line, then the ruby may only hang over the resulting compressed blank space (e.g. if it was reduced from half an em to a quarter em, the ruby hangs over at most a quarter em).
- If the character following the base character is one of Opening brackets (cl-01), Full-width ideographic space (cl-14), or Middle dots (cl-05) [[JLREQ]], then the ruby must hang over the blank portion at the start of the character. (This blank portion is usually half the character’s width for Opening brackets (cl-01), or a quarter of the character’s width for Middle dots (cl-05) [[JLREQ]].) However, if this blank portion has been compressed due to justification or similar processing of the line, then the ruby may only hang over the resulting compressed blank space (e.g. if it was reduced from half an em to a quarter em, the ruby hangs over at most a quarter em).
Example 2 of mono-ruby protruding
When the ruby string is longer than the base character string, and the ruby falls at the start of the line, then the start of the ruby string is aligned with the line’s start edge (see ), while if the ruby falls at the end of the line, then the end of the ruby string is aligned with the line’s end edge (see ).

Example of mono-ruby at the line start

Example of mono-ruby at the line end
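The mono-ruby rules above can be sketched as two small steps. This is a minimal illustration, not a definitive implementation: it assumes one full-width base character, full-width ruby characters at the default half size, and lengths expressed in em of the base font. The names `RubyBlock`, `mono_ruby_block`, `side_gap`, and `base_inset_at_line_edge` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubyBlock:
    """Result of step 1: fixed geometry of base and ruby, in em of the base font."""
    base_len: float
    ruby_len: float

    @property
    def protrusion(self) -> float:
        """How far the centered ruby extends beyond the base on each side."""
        return max(0.0, (self.ruby_len - self.base_len) / 2)


def mono_ruby_block(ruby_chars: int, ruby_scale: float = 0.5) -> RubyBlock:
    """Step 1: ruby set solid and centered on a single full-width base character."""
    return RubyBlock(base_len=1.0, ruby_len=ruby_chars * ruby_scale)


def side_gap(block: RubyBlock, neighbor_blank: float = 0.0) -> float:
    """Step 2: space to insert between the base and one adjacent character.

    neighbor_blank is the blank portion of an adjacent punctuation mark that
    faces the base, after any compression by justification. It is 0 for
    Kanji and Kana, so the ruby never hangs over those characters.
    """
    return max(0.0, block.protrusion - neighbor_blank)


def base_inset_at_line_edge(block: RubyBlock) -> float:
    """Step 2 at a line edge: the ruby, not the base, is flush with the edge."""
    return block.protrusion


three = mono_ruby_block(3)          # ruby is 1.5 em long on a 1 em base
gap_next_to_kanji = side_gap(three)             # 0.25 em of space is inserted
gap_after_full_stop = side_gap(three, 0.5)      # ruby hangs over the blank part
```

Because `RubyBlock` is frozen, step 2 can only read the geometry decided in step 1, which mirrors the rule that the second step never modifies the result of the first.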

Placement of group-ruby

In this section, the placement rules for group-ruby are described in terms of two groups of characters. "Western characters" have proportional widths and consist of characters such as Grouped numerals (cl-24), Unit symbols (cl-25), Western word space (cl-26), and Western characters (cl-27) [[JLREQ]]. "Japanese characters" have a fixed full width (see also 2.1.2 Kanji, Hiragana and Katakana [[JLREQ]]) and consist of characters such as Hiragana (cl-15), Katakana (cl-16), and Ideographic characters (cl-19) [[JLREQ]]. Because strings of Western characters are read as clusters of multiple characters, it is desirable to avoid adding spacing between their characters for justification. The way the ruby string and the base character string are positioned depends on how their respective lengths would compare if each were laid out without any inter-letter spacing. When their respective lengths would be the same, both are laid out without inter-letter spacing and placed such that their respective centers in the inline direction are aligned (see ). For other cases, the placement depends on the following:

In terms of the two-step processing method described in , points 1, 2, and 3 belong to the first step, and points 4 and 5 belong to the second step.

When both the ruby string and the ruby base character string consist of "Japanese characters", the placement depends on the following:
- When the ruby string is shorter than the base character string, space is inserted between every character in the ruby string as well as at the start and the end of the ruby string so that it becomes the same length as the base character string, then their centers in the inline direction are aligned. The size of the space inserted between each of the ruby characters is twice the size of the space inserted at the end and at the start (see ).
  
  Example 2 of group-ruby
  
  However, the size of the space inserted at the start and end must be capped at no more than half the size of one base character, and the space inserted between each ruby character is enlarged to compensate (see ).
  
  Example 3 of group-ruby
- When the ruby string is longer than the base character string, space is inserted between every character in the base character string as well as at the start and the end of the base character string so that it becomes the same length as the ruby string, then their centers in the inline direction are aligned. The size of the space inserted between each of the base characters is twice the size of the space inserted at the end and at the start (see ).
  
  Example 4 of group-ruby
When the ruby string consists of "Japanese characters" and the ruby base character string consists of Western characters, the placement depends on the following (see ):
- When the ruby string is shorter than the base character string, space is inserted between every character in the ruby string as well as at the start and the end of the ruby string so that it becomes the same length as the base character string, then their centers in the inline direction are aligned. The size of the space inserted between each of the ruby characters is twice the size of the space inserted at the end and at the start.
- When the ruby string is longer than the base character string, both are laid out without inter-letter spacing and placed such that their respective centers in the inline direction are aligned. In this case, the ruby string protrudes from the base character string.
Example of ruby with western characters
When the ruby string consists of Western characters and the ruby base character string consists of "Japanese characters", the placement depends on the following (see ):
- When the ruby string is shorter than the base character string, both are laid out without inter-letter spacing and placed such that their respective centers in the inline direction are aligned.
- When the ruby string is longer than the base character string, space is inserted between every character in the base character string as well as at the start and the end of the base character string so that it becomes the same length as the ruby string, then their centers in the inline direction are aligned. The size of the space inserted between each of the base characters is twice the size of the space inserted at the end and at the start.
When the ruby string is longer than the base character string and protrudes, whether and how it hangs over characters preceding or following the base character string is handled in the same way as with mono-ruby (see ). Also, when the ruby string is longer than the base character string, protrudes, and is located at the start or end of the line, the resulting layout is also identical to that of mono-ruby.

Example of protruding group-ruby
In the case of group ruby, the base character string and its associated ruby string are treated as a unit, so there is no line wrapping opportunity inside either string.

As group-ruby is treated as a unit, there is no wrap opportunity. However, there are examples where allowing wrapping may be desirable. In such cases, given an appropriate association of base characters and ruby characters, it may be appropriate to handle the wrapping opportunities in the same way as they are handled for jukugo-ruby.

Wrapping group-ruby
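The first step for group-ruby can be sketched as follows. This is an illustration under stated assumptions: lengths are in em of the base font, the function names are hypothetical, and two choices are not taken from the text above: the cap on the end space is applied only when both strings consist of "Japanese characters", and the combination of Western ruby on a Western base (not listed above) is simply centered without spacing.

```python
from typing import Optional, Tuple


def distribute_space(n_chars: int, deficit: float,
                     end_cap: Optional[float] = None) -> Tuple[float, float]:
    """Return (end_space, inter_space) that stretches a string by `deficit`.

    Each inter-character space is twice the space at the start and at the
    end, so 2 * end + (n_chars - 1) * (2 * end) = deficit.
    """
    if n_chars < 1 or deficit < 0:
        raise ValueError("need at least one character and a non-negative deficit")
    if n_chars == 1:
        return deficit / 2, 0.0  # a single character can only be centered
    end = deficit / (2 * n_chars)
    if end_cap is not None and end > end_cap:
        end = end_cap  # cap the end space and widen the inner spaces instead
        return end, (deficit - 2 * end) / (n_chars - 1)
    return end, 2 * end


def place_group_ruby(base_len: float, base_n: int, base_is_japanese: bool,
                     ruby_len: float, ruby_n: int, ruby_is_japanese: bool
                     ) -> Tuple[str, float, float]:
    """Return (stretched_string, end_space, inter_space) for one group-ruby."""
    if ruby_len < base_len and ruby_is_japanese:
        # Assumption: the half-em cap applies only against a Japanese base.
        cap = 0.5 if base_is_japanese else None
        return ("ruby",) + distribute_space(ruby_n, base_len - ruby_len, cap)
    if ruby_len > base_len and base_is_japanese:
        return ("base",) + distribute_space(base_n, ruby_len - base_len)
    # Equal lengths, or the string that would be stretched is Western:
    # both strings are set solid and centered on each other.
    return ("none", 0.0, 0.0)


# Four half-size kana (2 em) over three kanji (3 em): the ruby is stretched.
short_ruby = place_group_ruby(3.0, 3, True, 2.0, 4, True)
# Six half-size kana (3 em) over two kanji (2 em): the base is stretched.
long_ruby = place_group_ruby(2.0, 2, True, 3.0, 6, True)
```

In the capped case, two kana over four kanji would otherwise get 0.75 em at each end; with the cap the ends get 0.5 em and the single inner space grows to 2 em.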

Placement of Jukugo-ruby

Jukugo-ruby is placed as follows:

In terms of the two-step processing method described in , points 1, 2, and 3 belong to the first step, and point 4 belongs to the second step.

With jukugo-ruby, each base character is associated with its own ruby string. When each of these ruby strings, laid out without inter-letter spacing, is no longer than its corresponding base character, placement is determined as follows:
- When the ruby string associated with an individual base character is 1 character long, the ruby character and the base character are placed such that their respective centers in the inline direction are aligned (see ).
  
  Example 1 of jukugo-ruby
- When the ruby string associated with an individual base character is 2 characters long or more, the ruby string is laid out without inter-letter spacing, and placed such that its center and the center of its base character are aligned in the inline direction (see ).
For simple ruby implementations, if even a single ruby string is longer than its corresponding base character when laid out without inter-letter spacing, the resulting layout would look identical to group-ruby (see and ).

Example 2 of jukugo-ruby

Example 3 of jukugo-ruby
With jukugo-ruby, individual base characters and their associated ruby string are treated as a unit, and line wrap opportunities are allowed between two base characters. When such a line wrap occurs, if a single base character that is part of the jukugo is placed alone at the end or at the start of a line, it is laid out identically to mono-ruby; conversely when several base characters that are part of the jukugo are placed together at the end or start of a line, they are laid out together as has been described in this section about jukugo-ruby (see ).

Example of wrapping jukugo-ruby
When the ruby string is longer than the base character string and protrudes, whether and how it hangs over characters preceding or following the base character string is handled in the same way as with mono-ruby. Also, when the ruby string is longer than the base character string, protrudes, and is located at the start or end of the line, the resulting layout is also identical to that of mono-ruby.
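The choice between per-character placement and the group-ruby fallback can be sketched as below. This is a minimal illustration that assumes full-width ruby characters at a uniform scale, full-width base characters, and lengths in em of the base font; `jukugo_layout_mode` and `per_character_offsets` are hypothetical names.

```python
from typing import List


def jukugo_layout_mode(ruby_counts: List[int], ruby_scale: float = 0.5) -> str:
    """Return "per-character" if every ruby string fits its base character.

    ruby_counts[i] is the number of ruby characters attached to the i-th
    base character. If even one ruby string is longer than its base
    character, the whole jukugo is laid out like group-ruby.
    """
    fits = all(count * ruby_scale <= 1.0 for count in ruby_counts)
    return "per-character" if fits else "group"


def per_character_offsets(ruby_counts: List[int],
                          ruby_scale: float = 0.5) -> List[float]:
    """Start of each ruby string, measured from the start of the jukugo.

    Each ruby string is set solid and centered on its own base character.
    Only meaningful when jukugo_layout_mode() returns "per-character".
    """
    return [i + (1.0 - count * ruby_scale) / 2
            for i, count in enumerate(ruby_counts)]


mode_fits = jukugo_layout_mode([2, 1])       # both ruby strings fit
mode_fallback = jukugo_layout_mode([3, 1])   # first ruby string is too long
offsets = per_character_offsets([2, 1])
```

After a line wrap, the same functions could be applied separately to the base characters that end up on each line, since a lone base character is laid out like mono-ruby and several together like jukugo-ruby.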

Placement of Double-Sided Ruby

Placement of Double-Sided Ruby by Combination of Type of Ruby

Full rules for the placement of double-sided ruby require quite complex methods. For simple placement of double-sided ruby, rules can be written per combination of mono-ruby, group-ruby, and jukugo-ruby on the two sides. As in the two-step processing, the handling of a ruby string that extends beyond the ruby base characters in relation to the preceding and following characters, and placement at the line head or the line end, are processed in the same way as when a ruby string is attached on one side only.

When two adjacent lines have double-sided ruby, the two ruby strings between them can overlap, depending on the configured space between lines. Such overlap should be avoided. The following methods can be applied to avoid it:

Configure the space between lines over the whole document in advance so that two ruby strings cannot overlap.
Use a wider line gap where two ruby strings would overlap. Some methods also applied a rule of keeping a quarter em gap between the two ruby strings, in addition to avoiding overlap.
Place lines with double-sided ruby in a region spanning multiple lines, instead of widening the line gap. For example, a line with double-sided ruby is set at the center of a two-line space ("center of two line space"). (See JLReq "Processing of Gyou-dori" [[JLREQ]].)
Use a wider line gap over the whole paragraph, not only at the line gap where two ruby strings would overlap.

In letterpress printing, the first method was used for documents with a large amount of ruby or many reference marks for notes, and the second or third method was used for documents with less ruby. For the automatic processing used for Web documents, the third method might be suitable. Using the third method with an integer number of line spaces (for example, the center of a two-line space) limits the disturbance of line positions to the affected line, and keeps line positions aligned with those of the adjacent column. (See also JLReq "Adjustment of Processing of Realm in Block Direction" [[JLREQ]].)

Combination of type of ruby

Possible combinations of type of ruby are as follows:

Mono-ruby and mono-ruby
Group-ruby and group-ruby
Mono-ruby and group-ruby
Mono-ruby and jukugo-ruby
Jukugo-ruby with group-ruby or jukugo-ruby

Rules for Placement of Double-Sided Ruby per Combinations

JIS X 4051 [[JISX4051]] specifies rules for the first, second, and third cases in the list of combinations above (see the note in JLReq [[JLREQ]]). The rule for the third case is to process the consecutive mono-ruby as group-ruby, which makes it the same as the second case.

For the fourth case, mono-ruby and jukugo-ruby, the first case is applicable if the jukugo-ruby is divided into consecutive mono-ruby, taking each Kanji character and its ruby string as an individual pair of ruby base character and ruby string. For the fifth case, jukugo-ruby with group-ruby or jukugo-ruby, the second case is applicable if the jukugo-ruby is handled as group-ruby.
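The reduction of the five combinations to two rule sets can be written as a small lookup. This is only a sketch; the function name `reduce_combination` and the string labels are hypothetical.

```python
def reduce_combination(side_a: str, side_b: str) -> str:
    """Map the ruby types on the two sides to the rule set used for placement.

    Returns "mono+mono" (first case) or "group+group" (second case).
    """
    kinds = {side_a, side_b}
    if not kinds <= {"mono", "group", "jukugo"}:
        raise ValueError(f"unknown ruby type in {sorted(kinds)}")
    # First case as is, or fourth case with the jukugo-ruby divided into
    # consecutive mono-ruby.
    if kinds == {"mono"} or kinds == {"mono", "jukugo"}:
        return "mono+mono"
    # Second case as is, third case with mono-ruby processed as group-ruby,
    # or fifth case with jukugo-ruby handled as group-ruby.
    return "group+group"
```

For example, `reduce_combination("jukugo", "mono")` selects the mono-ruby rules, while `reduce_combination("jukugo", "jukugo")` selects the group-ruby rules.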

This section gives rules for the simple placement of double-sided ruby for the first and second cases.

In addition, which of the two ruby strings is placed on which side follows what is specified in the content.

Placement of combination of mono-ruby and mono-ruby

In the combination of mono-ruby and mono-ruby, both ruby strings are set solid and placed so that their centers match that of the ruby base character (see ). For other points, follow the rules for placement of mono-ruby described in [[[#placement-of-mono-ruby]]].

Double-sided ruby example with both mono-ruby

Placement of combination of group-ruby and group-ruby

When both ruby strings are shorter than the ruby base character string, follow the rules for placement of group-ruby described in [[[#placement-of-group-ruby]]]. When a ruby string consists of "Japanese characters" as defined in [[[#placement-of-group-ruby]]], spacing is inserted between every character in the ruby string as well as at the start and the end of the ruby string (see ).

Double-sided ruby example 1 with both group-ruby

When one of the ruby strings is longer than the base character string, take the longer ruby string and place it following the rules for placement of group-ruby described in [[[#placement-of-group-ruby]]]. When the ruby base character string consists of "Japanese characters" as defined in [[[#placement-of-group-ruby]]], spacing is inserted between every character in the ruby base character string as well as at the start and the end of the ruby base character string. After the ruby base character string has been placed, place the shorter ruby string against the length of the ruby base character string measured without the spacing at its start and end, but including the inter-character spacing when the ruby base character string consists of "Japanese characters".

When the shorter ruby string is longer than the ruby base character string with inter-character spacing, the shorter ruby string is set solid and placed so that its center matches that of the ruby base character string (see ).

Double-sided ruby example 2 with both group-ruby

When the shorter ruby string is shorter than the ruby base character string with inter-character spacing, follow the rules for placement of group-ruby described in [[[#placement-of-group-ruby]]], using the length of the ruby base character string with inter-character spacing. When the shorter ruby string consists of "Japanese characters" as described in [[[#placement-of-group-ruby]]], spacing is inserted between every character in the ruby string as well as at the start and the end of the ruby string (see ).

Double-sided ruby example 3 with both group-ruby

For other points, follow the rules for placement of group-ruby described in [[[#placement-of-group-ruby]]].
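The group-ruby and group-ruby combination can be sketched as follows. This is an illustration under several assumptions: all three strings consist of full-width "Japanese characters", ruby is at a uniform scale, lengths are in em of the base font, and the cap on the end space from the group-ruby rules is left out for brevity. The names `spread` and `double_sided_group` are hypothetical.

```python
from typing import Dict, Tuple


def spread(n_chars: int, deficit: float) -> Tuple[float, float]:
    """Return (end_space, inter_space), with inter_space twice end_space."""
    end = deficit / (2 * n_chars)
    return end, (2 * end if n_chars > 1 else 0.0)


def double_sided_group(base_n: int, ruby_a_n: int, ruby_b_n: int,
                       ruby_scale: float = 0.5) -> Dict[str, Tuple[float, float]]:
    """Return (end_space, inter_space) for the base and for ruby sides a and b."""
    base_len = float(base_n)
    counts = {"a": ruby_a_n, "b": ruby_b_n}
    lengths = {side: n * ruby_scale for side, n in counts.items()}
    result = {"base": (0.0, 0.0)}
    longer = max(lengths, key=lengths.get)
    if lengths[longer] <= base_len:
        # Neither ruby string protrudes: stretch each one to the solid base.
        for side in ("a", "b"):
            result[side] = spread(counts[side], base_len - lengths[side])
        return result
    shorter = "b" if longer == "a" else "a"
    # The longer ruby string is set solid and the base is stretched to it.
    result["base"] = spread(base_n, lengths[longer] - base_len)
    result[longer] = (0.0, 0.0)
    # Base length including inter-character spacing, excluding end spacing.
    effective = base_len + (base_n - 1) * result["base"][1]
    if lengths[shorter] >= effective:
        result[shorter] = (0.0, 0.0)  # set solid and centered
    else:
        result[shorter] = spread(counts[shorter], effective - lengths[shorter])
    return result


# Two kanji with six kana on one side and two kana on the other.
layout = double_sided_group(2, 6, 2)
```

In this example the base gets 0.25 em at each end and 0.5 em between its characters, so the shorter ruby string is stretched to 2.5 em rather than to the full 3 em of the longer one.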