This document describes a simple method of ruby composition for Japanese layout realized with technologies like CSS, SVG, and XSL-FO, as information for rendering engine implementers. Unlike JLReq [[JLREQ]], this document presents only one layout method for each case, chosen with consideration of best practices and important points in Japanese layout. The points taken into consideration are described in . This document also covers the layout of double-sided ruby, which has two distinct runs of ruby text attached to the same ruby base character string and is not described in [[JLREQ]].
[[JLREQ]] is in part a record of Japanese layout practice as established in the printing industry. It often explains multiple ways of doing one thing, and sometimes they can be very complex. Ruby is one such case. There are many factors to consider, and requirements often contradict each other (cf. the note "Protrusion of ruby from base characters"). This complexity makes ruby challenging to automate.
It would seem beneficial to come up with a method that is simple and robust, and one that is suitable for automatic processing. The positioning might not be as sophisticated but we must at least make sure that it causes no misunderstanding.
The following is a proposal for a simple processing system. The target audience is implementers and specification writers. It is expected that a full system may be more complex than what is described here, both because of interactions with other features or other writing systems, and because those designing such systems may wish to provide alternative options. Note that the terminology is based on that defined in [[JLREQ]].
This document was initially written in Japanese and translated to English by the Japanese Writing Technology Working Group of the Advanced Publishing Laboratory of Keio University.
It represents the subjective view of its authors and contributors as to one possible approach to address the problem, and does not claim to be the only possible solution. It is submitted to present a non-Japanese speaking audience with this particular approach, and to encourage discussion of this topic.
The original Japanese version is available in PDF format.
Ruby is the name given to the small annotations
in Japanese content that are rendered alongside base text,
usually to provide a pronunciation guide,
but sometimes to provide other information.
(See the article “What is ruby”
by the Internationalization Working Group
for more information.)
When performing ruby layout in Japanese,
the following factors need to be considered
in order to decide on the position:

- When the ruby string protrudes from the base character string,
  whether it can be allowed to be laid over the characters preceding or following,
  and whether this affects the position of the base characters
- When there are multiple base characters,
  whether there can be a line wrap opportunity between them

In movable type typography,
such matters were resolved based on generic principles,
and could always be corrected during the proofreading phase.
Essentially, each case was adjusted individually in a flexible manner.

In computer-based typesetting,
the layout needs to be more or less determined based on predetermined rules,
but it remained necessary to adjust the results in certain cases,
for example by changing the association between base characters
and the ruby string,
or by switching to a different placement policy.

When thinking about computing placement for web content,
it is not practical to decide on the positioning
case by case as was done in movable type typography.
It is therefore necessary to decide upon comprehensive rules
that provide solutions to all the problems listed above,
so that placement may be determined fully automatically.
Considering all the possibilities that existed in movable type typesetting,
the system to be designed needs to be very complex.

Here are the fundamental assumptions underlying the simple placement rules.

Ruby in Japanese may be divided into the following 3 different types,
based on the relationship between the ruby and the base characters
(see JLReq “3.3.1 Usage of Ruby” [[JLREQ]]).

Which one to use depends on the relationship
between the ruby and the base characters.
Mono-ruby is used to connect ruby to a single base character,
jukugo-ruby is used when multiple base characters each have a corresponding ruby
and at the same time the whole group needs to be processed together,
and group-ruby is used when ruby is attached to a group of base characters together (see ).
Each is used when specified.
The Difficulties of Ruby Processing
Matters considered by the placement rules
Types of ruby
The size of the ruby characters and their placement in the inline direction relative to the base characters are as follows:
The following sections describe in detail the placement of mono-ruby, jukugo-ruby, and group-ruby. However, since jukugo-ruby is more complex, it is explained last.
Mono-ruby is placed as follows. To align the following items with the two-step processing method described in , points 1, 2, and 3 belong to the first step, and points 4 and 5 belong to the second step.
When the ruby is made of two or more characters, each character in the ruby string is placed immediately next to its neighboring character, without any inter-letter spacing. Furthermore, when the ruby is composed of characters such as Grouped numerals (cl-24), Unit symbols (cl-25), Western word space (cl-26), or Western characters (cl-27) [[JLREQ]] which have their own individual width, they are placed based on each character’s metrics.
When the ruby string is longer than the base character string, the part of the ruby string that extends beyond the base characters must not hang over the characters preceding or following (see ). Space is introduced accordingly between these preceding or following characters and the base characters. However, for punctuation marks such as Full stops (cl-06) [[JLREQ]], which have spacing before or after the symbol, the ruby characters do hang over the preceding or following characters (see ). (Punctuation marks such as Full stops (cl-06) [[JLREQ]] play an important role as breaks between sentences, so it is desirable to keep the spacing before and after them constant; extra spacing around these characters could change how the break between sentences is read. In addition, the issues described in the note "Protrusion of ruby from base characters" do not arise here. This method therefore applies a different layout to punctuation marks such as Full stops (cl-06) [[JLREQ]].)
When the ruby string is longer than the base character string, and the ruby falls at the start of the line, then the start of the ruby string is aligned with the line’s start edge (see ), while if the ruby falls at the end of the line, then the end of the ruby string is aligned with the line’s end edge (see ).
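The handling of a protruding mono-ruby string described above can be sketched as a small box computation. This is an illustrative sketch only, not part of the rules. It assumes that lengths are expressed in em of the base font size, that the ruby string is centered on its base character, and that the caller decides how much overhang the neighboring character tolerates. The names `MonoRubyBox` and `place_mono_ruby` and the parameters `overhang_before` and `overhang_after` are hypothetical; the overhang values would be non-zero only next to punctuation such as full stops.

```python
from dataclasses import dataclass


@dataclass
class MonoRubyBox:
    advance: float      # inline size reserved on the line for this ruby unit
    base_offset: float  # start of the base character, relative to the box start
    ruby_offset: float  # start of the ruby string, relative to the box start
                        # (negative when the ruby hangs over the preceding character)


def place_mono_ruby(base_len, ruby_len, overhang_before=0.0, overhang_after=0.0):
    """Place one mono-ruby unit; all lengths are in em of the base font size.

    Assumption of this sketch: the ruby string is centered on the base.
    """
    # How far the ruby sticks out on each side of the base (0 if it fits).
    protrusion = max(0.0, (ruby_len - base_len) / 2)
    # Part of the protrusion absorbed by a neighbor that tolerates overhang.
    hang_before = min(protrusion, overhang_before)
    hang_after = min(protrusion, overhang_after)
    # The remainder becomes space between the base and its neighbors.
    space_before = protrusion - hang_before
    space_after = protrusion - hang_after
    advance = space_before + base_len + space_after
    base_offset = space_before
    ruby_offset = base_offset + (base_len - ruby_len) / 2
    return MonoRubyBox(advance, base_offset, ruby_offset)
```

With no permitted overhang, the ruby string starts at offset 0 and ends at `advance`, so a box placed at the line start or line end has its ruby aligned with the corresponding line edge.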
In this section, the placement rules for group-ruby are described in terms of two groups of characters. One is "Western characters", which have proportional widths and consist of characters such as Grouped numerals (cl-24), Unit symbols (cl-25), Western word space (cl-26), and Western characters (cl-27) [[JLREQ]]. The other is "Japanese characters", which have a fixed full width (see also 2.1.2 Kanji, Hiragana and Katakana [[JLREQ]]) and consist of characters such as Hiragana (cl-15), Katakana (cl-16), and Ideographic characters (cl-19) [[JLREQ]]. Because strings of Western characters are read as clusters of multiple characters, it is desirable to avoid adding spacing between their characters for justification. The way the ruby string and the base character string are positioned depends on how their respective lengths would compare if they were each laid out without any inter-letter spacing. When their respective lengths would be the same, both are laid out without inter-letter spacing and placed such that their respective centers in the inline direction are aligned (see ). For other cases, the placement depends on the following:
To align the following items with the two-step processing method described in , points 1, 2, and 3 belong to the first step, and points 4 and 5 belong to the second step.
When both the ruby string and the ruby base character string consist of "Japanese characters", the placement depends on the following:
When the ruby string is shorter than the base character string, space is inserted between every character in the ruby string as well as at the start and the end of the ruby string so that it becomes the same length as the base character string, then their centers in the inline direction are aligned. The size of the space inserted between each of the ruby characters is twice the size of the space inserted at the end and at the start (see ). However, the size of the space inserted at the start and end must be capped at no more than half the size of one base character, and the space inserted between each ruby character is enlarged to compensate (see ).
When the ruby string is longer than the base character string, space is inserted between every character in the base character string as well as at the start and the end of the base character string so that it becomes the same length as the ruby string, then their centers in the inline direction are aligned. The size of the space inserted between each of the base characters is twice the size of the space inserted at the end and at the start (see ).
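The two spacing rules above for "Japanese characters" can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: lengths are in em of the base font size, ruby characters are assumed to be half the size of base characters, and the function names (`distribute_space`, `char_offsets`, `place_japanese_group_ruby`) are invented for this example. The behavior for a single ruby character when the cap would apply is not stated in the text; the sketch simply centers it.

```python
def distribute_space(slack, count, edge_cap=None):
    """Split `slack` around `count` characters in a 1:2:...:2:1 ratio.

    Returns (edge, between): the space at the start and at the end, and the
    space between neighboring characters.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if slack < 0:
        raise ValueError("slack must not be negative")
    edge = slack / (2 * count)  # 2 * edge + (count - 1) * 2 * edge == slack
    between = 2 * edge
    # Cap the edge space and enlarge the inner gaps to compensate.
    # Assumption: with a single character there are no inner gaps, so the
    # cap is ignored and the character is simply centered.
    if edge_cap is not None and count > 1 and edge > edge_cap:
        edge = edge_cap
        between = (slack - 2 * edge) / (count - 1)
    return edge, between


def char_offsets(count, char_len, edge, between):
    """Start offset of each character, relative to the start of the unit."""
    return [edge + i * (char_len + between) for i in range(count)]


def place_japanese_group_ruby(base_count, ruby_count, base_char=1.0, ruby_char=0.5):
    """Decide which string is spaced out and where its characters start."""
    base_len = base_count * base_char
    ruby_len = ruby_count * ruby_char
    if ruby_len <= base_len:
        # Shorter ruby: space out the ruby, edge space capped at half a base character.
        edge, between = distribute_space(base_len - ruby_len, ruby_count,
                                         edge_cap=base_char / 2)
        spaced, offsets = "ruby", char_offsets(ruby_count, ruby_char, edge, between)
    else:
        # Longer ruby: space out the base characters instead (no cap is stated).
        edge, between = distribute_space(ruby_len - base_len, base_count)
        spaced, offsets = "base", char_offsets(base_count, base_char, edge, between)
    return {"spaced": spaced, "edge": edge, "between": between, "offsets": offsets}
```

For example, two ruby characters over four base characters would give an uncapped edge space of 0.75 em, so the cap of 0.5 em applies and the single inner gap grows to 2 em.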
When the ruby string consists of "Japanese characters" and the ruby base character string consists of Western characters, the placement depends on the following (see ):
When the ruby string consists of Western characters and the ruby base character string consists of "Japanese characters", the placement depends on the following (see ):
When the ruby string is longer than the base character string and protrudes, whether and how it hangs over characters preceding or following the base character string is handled in the same way as with mono-ruby (see ). Also, when the ruby string is longer than the base character string, protrudes, and is located at the start or end of the line, the resulting layout is also identical to that of mono-ruby.
In the case of group ruby, the base character string and its associated ruby string are treated as a unit, so there is no line wrapping opportunity inside either string.
Jukugo-ruby is placed as follows:
To align the following items with the two-step processing method described in , points 1, 2, and 3 belong to the first step, and point 4 belongs to the second step.

With jukugo-ruby, each base character is associated with its own ruby string. When each of these ruby strings, laid out without inter-letter spacing, is shorter than the length of its corresponding base character, placement is determined as follows:
For simple ruby implementations, if even a single ruby string is longer than its corresponding base character when laid out without inter-letter spacing, the resulting layout would look identical to group-ruby (see and ).
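The fallback decision for jukugo-ruby can be sketched as a simple check over the base/ruby pairs. This is an illustrative sketch; the function name `jukugo_layout_mode` and the returned labels are hypothetical, and each pair holds the length of one base character and the length of its ruby string laid out without inter-letter spacing, in the same unit.

```python
def jukugo_layout_mode(pairs):
    """Choose how to lay out a jukugo-ruby run.

    `pairs` is a list of (base_char_len, ruby_len) tuples, one per base
    character. If even one ruby string is longer than its base character,
    the whole run is laid out like group-ruby.
    """
    if any(ruby_len > base_len for base_len, ruby_len in pairs):
        return "group"
    return "per-character"
```

A run such as `[(1.0, 1.0), (1.0, 0.5)]` stays per-character, while `[(1.0, 1.5), (1.0, 0.5)]` falls back to the group-ruby layout.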
Full rules for the placement of double-sided ruby would require quite complex methods. For simple placement of double-sided ruby, rules can be written for each combination of mono-ruby, group-ruby, and jukugo-ruby on the two sides. As with the two-step processing, the handling of a ruby string that extends beyond the ruby base characters relative to the preceding and following characters, and placement at the line head or the line end, are processed in the same way as when ruby is attached to one side only.
Possible combinations of type of ruby are as follows:
JIS X 4051 [[JISX4051]] specifies rules for the first, second, and third cases in the above list of combinations (see the note in JLReq [[JLREQ]]). The placement rule for the third case is to process consecutive mono-ruby as group-ruby, which makes the result the same as the second case.
For the fourth case, mono-ruby with jukugo-ruby, the first case applies once the jukugo-ruby is divided into consecutive mono-ruby by pairing each Kanji base character with its ruby string. For the fifth case, jukugo-ruby with group-ruby or with jukugo-ruby, the second case applies once the jukugo-ruby is handled as group-ruby.
This section gives rules for simple placement of double-sided ruby for the first and second cases, as follows:
In addition, which of the two ruby strings goes on which side follows what the content specifies.
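The reduction of the five combinations to two placement cases can be sketched as a lookup. This is only a sketch: the list of combinations is not reproduced here, so the mapping below is inferred from the surrounding text (first case taken to be mono-ruby with mono-ruby, second case group-ruby with group-ruby, third case mono-ruby with group-ruby). The function name `reduce_double_sided` and the string labels are hypothetical.

```python
def reduce_double_sided(side_a, side_b):
    """Map the ruby types on the two sides to the placement case to apply.

    Each argument is "mono", "group", or "jukugo". Returns "mono+mono"
    (first case) or "group+group" (second case).
    """
    allowed = {"mono", "group", "jukugo"}
    kinds = {side_a, side_b}
    if not kinds <= allowed:
        raise ValueError("unknown ruby type")
    if kinds == {"mono"}:
        return "mono+mono"    # first case
    if "group" in kinds:
        # Second case, or third/fifth case: the other side is handled as group-ruby.
        return "group+group"
    if "mono" in kinds:
        # Fourth case: jukugo-ruby is divided into consecutive mono-ruby.
        return "mono+mono"
    # Fifth case: jukugo-ruby on both sides, handled as group-ruby.
    return "group+group"
```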
In the case of a combination of mono-ruby and mono-ruby, both ruby strings are set solid and placed so that their centers match that of the ruby base character (see ). For other points, follow the same rules for placement of mono-ruby described in [[[#placement-of-mono-ruby]]].
When both ruby strings are shorter than the ruby base character string, follow the rules for placement of group-ruby described in [[[#placement-of-group-ruby]]]. When a ruby string consists of "Japanese characters" as defined in [[[#placement-of-group-ruby]]], spacing is inserted between every character in the ruby string as well as at the start and the end of the ruby string (see ).
When one of the ruby strings is longer than the base character string, take the longer ruby string and place it following the rules for placement of group-ruby described in [[[#placement-of-group-ruby]]]. When the ruby base character string consists of "Japanese characters" as defined in [[[#placement-of-group-ruby]]], spacing is inserted between every character in the ruby base character string as well as at its start and end. After the ruby base character string has been placed, place the shorter ruby string based on the length of the ruby base character string excluding the spacing at its start and end, but including the inter-character spacing when the ruby base character string consists of "Japanese characters".
When the shorter ruby string is longer than the ruby base character string including its inter-character spacing, the shorter ruby string is set solid and placed so that its center matches that of the ruby base character string (see ).
When the shorter ruby string is shorter than the ruby base character string including its inter-character spacing, follow the rules for placement of group-ruby described in [[[#placement-of-group-ruby]]], using the length of the ruby base character string including its inter-character spacing. When the shorter ruby string consists of "Japanese characters" as described in [[[#placement-of-group-ruby]]], spacing is inserted between every character in the ruby string as well as at the start and the end of the ruby string (see ).
For other points, follow the same rules for placement of group-ruby described in [[[#placement-of-group-ruby]]].
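The steps for double-sided group-ruby in which one ruby string is longer than the base can be sketched as follows, for the case where all three strings consist of "Japanese characters". This is a sketch under stated assumptions, not a definitive implementation: lengths are in em of the base font size, ruby characters are assumed to be half the size of base characters, and the names `distribute_space` and `place_double_sided_group` are invented for this example. Treating a shorter ruby string that exactly matches the spaced base as a zero-slack spaced layout is also a choice made by this sketch.

```python
def distribute_space(slack, count, edge_cap=None):
    """Split `slack` around `count` characters in a 1:2:...:2:1 ratio."""
    edge = slack / (2 * count)
    between = 2 * edge
    # Cap the edge space and enlarge the inner gaps to compensate.
    if edge_cap is not None and count > 1 and edge > edge_cap:
        edge = edge_cap
        between = (slack - 2 * edge) / (count - 1)
    return edge, between


def place_double_sided_group(base_count, long_ruby_count, short_ruby_count,
                             base_char=1.0, ruby_char=0.5):
    """Place the base and the shorter ruby once the longer ruby sets the width.

    Offsets are relative to the start of the longer ruby string.
    """
    base_len = base_count * base_char
    long_len = long_ruby_count * ruby_char
    short_len = short_ruby_count * ruby_char
    if long_len <= base_len:
        raise ValueError("this sketch covers only a longer ruby that exceeds the base")
    # Step 1: space out the base so that it matches the longer ruby.
    base_edge, base_between = distribute_space(long_len - base_len, base_count)
    # Step 2: the reference span for the shorter ruby is the base including
    # its inter-character spacing but excluding the space at its start and end.
    span_start = base_edge
    span_len = base_len + (base_count - 1) * base_between
    if short_len > span_len:
        # Set solid and centered on the base character string.
        center = span_start + span_len / 2
        mode, short_start, short_between = "solid", center - short_len / 2, 0.0
    else:
        # Group-ruby spacing within the reference span.
        edge, short_between = distribute_space(span_len - short_len, short_ruby_count,
                                               edge_cap=base_char / 2)
        mode, short_start = "spaced", span_start + edge
    return {"mode": mode, "base_edge": base_edge, "base_between": base_between,
            "short_start": short_start, "short_between": short_between}
```

For two base characters with a six-character ruby on one side and a two-character ruby on the other, the base is spaced with 0.25 em at each end and 0.5 em in between, and the shorter ruby is then spaced within the resulting 2.5 em span.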