This document describes a simple method of ruby composition
for Japanese layout realized with technologies like CSS, SVG, and XSL-FO,
intended as information for rendering engine implementers.
Unlike JLReq [[JLREQ]], this document presents only one layout method
for each case, chosen with consideration of best practices and important
points in Japanese layout.
The points taken into consideration are described in
the section "Matters considered by the placement rules".
This document also covers the layout of double-sided ruby, in which two
distinct runs of ruby text are attached to the same base character string.
Double-sided ruby is not described in [[JLREQ]].
[[JLREQ]] is, in part, a record of Japanese layout practices established
in the printing industry. It explains multiple ways of handling the same
thing, and these can sometimes be very complex. Ruby is one such example.
There are many factors to consider, and requirements often contradict
each other (cf. the note "Protrusion of ruby from base characters").
This complexity makes ruby layout challenging to automate.
It would seem beneficial to devise a method that is simple, robust,
and suitable for automatic processing. The resulting positioning might
not be as sophisticated, but it must at least never cause a
misunderstanding.
The following is a proposal for a simple processing system. The target audience is implementers and specification writers.
A full system is expected to be more complex than what is described here,
both because of interactions with other features or other writing systems,
and because those designing such a system may wish to provide alternative options.
Note that the terminology is based on that defined in [[JLREQ]].
This proposal represents the subjective view of its authors and contributors
as to one possible approach to the problem,
and does not claim to be the only possible solution.
It is submitted to present this particular approach
to a non-Japanese-speaking audience,
and to encourage discussion of this topic.
Ruby is the name given to the small annotations
in Japanese content that are rendered alongside base text,
usually to provide a pronunciation guide,
but sometimes to provide other information.
(See the article “What is ruby”
by the Internationalization Working Group
for more information.)
The Difficulties of Ruby Processing
When performing ruby layout in Japanese,
the following factors need to be considered
in order to decide on the position:
How to handle the correspondence between the base characters and the ruby
What to do when the string of base characters is longer than the ruby string
What to do when the string of base characters is shorter than the ruby string
When the ruby string protrudes from the base character string,
whether it may be laid over the preceding or following characters,
and whether this affects the position of the base characters
When the ruby string protrudes from the base character string,
and the base character string is at the start or the end of the line,
whether the base character string or the ruby string should be aligned with the line edge
When there are multiple base characters,
whether there can be a line wrap opportunity between them
In movable type typography,
such matters were resolved based on general principles,
and could always be corrected during the proofreading phase.
Essentially, each case was adjusted individually in a flexible manner.
In computer-based typesetting,
the layout needs to be more or less determined based on predetermined rules,
but it remained necessary to adjust the results in certain cases,
for example by changing the association between base characters
and the ruby string,
or by switching to a different placement policy.
When thinking about computing placement for web content,
it is not practical to decide on the positioning
case by case as was done in movable type typography.
It is therefore necessary to decide upon comprehensive rules
that provide solutions to all the problems listed above,
so that placement may be determined fully automatically.
Covering all the possibilities that existed in movable type typesetting
would require a very complex system.
Matters considered by the placement rules
Here are the fundamental assumptions underlying the simple placement rules.
Ruby is used to display the reading or the meaning of the base characters.
Therefore, the number one priority here is to avoid misreadings.
Specifically, a ruby string which protrudes from the base character
string is not allowed to be laid over the preceding or
following characters, whether they are Kanji or Kana characters.
The method is agnostic to horizontal vs vertical writing,
and uses the same logic in either case.
In particular, for mono-ruby, the centers of the ruby string and of the base
character string are aligned in the inline direction.
A two-step processing method is used.
In the first step, layout considers only the ruby string and the base
character string (collectively called the ruby block in this document),
and decides the relative position of the ruby string and the base
character string.
In the second step, layout decides the position of the base character
string in the line, taking the preceding and following characters into
consideration.
In other words, the relative position of the ruby string and the
base character string decided in the first step is not modified,
regardless of the preceding and following characters.
Likewise, this document does not adopt the method of aligning the first or
last character of the base character string with the line head or the
line end by modifying the relative position of the ruby string and the
base character string when the base character string is placed at the
line head or the line end.
In summary, the positioning decided in the first step is never
modified by the second step.
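As a rough illustration of this separation, the following Python sketch computes the ruby offset once (step 1, here using the mono-ruby centering rule) and then only moves the whole ruby block during line layout (step 2). It is a minimal sketch, not a reference implementation: the names `RubyBlock`, `step1_mono` and `step2_place` are hypothetical, and all lengths are assumed to be inline lengths in em of the base font.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubyBlock:
    """Step 1 result: geometry of the ruby relative to its base (hypothetical type)."""
    base_len: float     # inline length of the base character string, in base em
    ruby_len: float     # inline length of the ruby string, in base em
    ruby_offset: float  # inline offset of the ruby start from the base start


def step1_mono(base_len: float, ruby_len: float) -> RubyBlock:
    """Step 1: align the centers of ruby and base; neighbours are not consulted."""
    return RubyBlock(base_len, ruby_len, (base_len - ruby_len) / 2)


def step2_place(block: RubyBlock, base_start: float) -> tuple[float, float]:
    """Step 2: decide where the base goes in the line.

    Only base_start is chosen here; the ruby keeps the offset fixed in step 1.
    """
    return base_start, base_start + block.ruby_offset


block = step1_mono(base_len=1.0, ruby_len=1.5)  # one kanji, three half-size kana
print(step2_place(block, base_start=3.0))       # (3.0, 2.75)
```

Making `RubyBlock` frozen is one way to enforce, in code, that step 2 cannot alter what step 1 decided.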
Although [[JLREQ]] and JIS X 4051 [[JISX4051]] sometimes show multiple
ways of positioning ruby, this document describes only one method, based on
the policies described above.
The methods described in this document are mostly chosen from those provided
in JIS X 4051 [[JISX4051]].
In some cases, this document picks one of the options that JIS X 4051 leaves
implementation-defined, such as not laying a protruding ruby string over
any preceding or following Kana characters.
There is demand for using a larger (or smaller) font size for the ruby string.
In this document, the default font size of the ruby string is half the
font size of the base character string, and the examples in figures are
shown with this default font size.
The sizes of spacing adjustments during justification are defined based on
the font size of the base character string rather than that of the ruby
string, which keeps the layout methods applicable when the font size of
the ruby string is not half that of its base character string.
Types of ruby
Ruby in Japanese may be divided into the following three types,
based on the relationship between the ruby and the base characters
(see JLReq “3.3.1 Usage of Ruby” [[JLREQ]]).
Mono-ruby
Jukugo-ruby
Group-ruby
Types of ruby
Which one to use depends on the relationship
between the ruby and the base characters.
Mono-ruby is used to attach ruby to a single base character,
jukugo-ruby is used when multiple base characters each have their own ruby
and at the same time the whole group needs to be processed together,
and group-ruby is used when ruby is attached to a group of base characters as a whole (see ).
The type used in each case is the one specified in the content.
Rules for Simple Placement of Japanese Ruby
Ruby character size and character placement
The size of the ruby characters
and their placement in the inline direction relative to the base characters is as follows:
The size of the ruby is by default set to
half of the size of the base characters.
In vertical text, ruby is placed to the right of the base characters,
and the character frame of the ruby is placed flush
against the character frame of the base characters.
Example of vertical ruby
In horizontal text, ruby is placed above the base characters,
and the character frame of the ruby is placed flush
against the character frame of the base characters.
Example of horizontal ruby
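Because the placement logic is the same in both writing modes, an implementation might compute positions along the inline axis only and map them to physical coordinates at the very end. The Python sketch below assumes a y-down page coordinate system, a square base character frame of `base_size`, and vertical text whose inline axis runs downward; the function `ruby_box_origin` and its parameters are hypothetical.

```python
def ruby_box_origin(base_x: float, base_y: float, base_size: float,
                    inline_offset: float, vertical: bool,
                    ratio: float = 0.5) -> tuple[float, float]:
    """Top-left corner (x, y) of the ruby box, in a y-down coordinate system.

    base_x, base_y: top-left corner of the base character frame.
    base_size: font size of the base characters (frame is base_size square).
    inline_offset: ruby start minus base start along the inline axis.
    ratio: ruby font size relative to the base font size (default 1/2).
    """
    ruby_size = base_size * ratio
    if vertical:
        # Inline axis runs downward; the ruby frame touches the base frame's right edge.
        return base_x + base_size, base_y + inline_offset
    # Inline axis runs rightward; the ruby frame touches the base frame's top edge.
    return base_x + inline_offset, base_y - ruby_size
```

Only the final mapping differs between the two branches; the `inline_offset` itself is computed the same way in either writing mode.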
The following sections describe in detail the placement of
mono-ruby,
jukugo-ruby,
and group-ruby.
However, since jukugo-ruby is more complex,
it is explained last.
Placement of mono-ruby
Mono-ruby is placed as follows.
In terms of the two-step processing method described in
the section "Matters considered by the placement rules",
points 1, 2, and 3 below belong to the first step, and points 4 and 5 to the second
step.
When the ruby is made of two or more characters,
each character in the ruby string is placed
immediately next to its neighboring character,
without any inter-letter spacing.
Furthermore, when the ruby contains characters such as
Grouped numerals (cl-24),
Unit symbols (cl-25),
Western word space (cl-26),
or Western characters (cl-27) [[JLREQ]],
which have their own individual widths,
they are placed according to each character’s metrics.
Example mono-ruby with western characters
The centers of the ruby string and of the base character string
are aligned in the inline direction
(see ).
Since the base character and its associated ruby form a single unit,
there is no line wrapping opportunity inside a mono-ruby.
When the ruby string is longer than the base character string,
the part of the ruby string that extends beyond the base characters
must not hang over the characters preceding or following
(see ).
Space is introduced accordingly
between these preceding or following characters and the base characters.
Example 1 of mono-ruby protruding
However, when the preceding or following character is a punctuation mark
such as Full stops (cl-06)
[[JLREQ]],
which has blank space before or after its glyph,
the ruby characters do hang over that character
(see ).
(Punctuation marks such as
Full stops (cl-06)
[[JLREQ]] play an important role as breaks between sentences,
so it is desirable to keep the spacing before and after them constant:
adding extra space around these characters could change
how the breaks between sentences are perceived.
Also, hanging ruby over them does not cause issues like those noted in the note
"Protrusion of ruby from base characters".
Therefore, this method treats punctuation marks such as
Full stops (cl-06)
[[JLREQ]] differently.)
If the character preceding the base character is one of:
Closing brackets (cl-02),
Full stops (cl-06),
Commas (cl-07),
Full-width ideographic space (cl-14),
or Middle dots (cl-05) [[JLREQ]],
then the ruby must hang over
the blank portion at the end of that character.
(This blank portion is usually half the character’s width,
except in the case of Middle dots (cl-05) [[JLREQ]],
where it is a quarter of the character’s width.)
However, if this blank part has been compressed
due to justification or similar processing of the line,
then the ruby may only hang over the resulting
compressed blank space
(e.g. if it was reduced from half to a quarter em,
hang at most a quarter em).
If the character following the base character is one of:
Opening brackets (cl-01),
Full-width ideographic space (cl-14),
or Middle dots (cl-05) [[JLREQ]],
then the ruby must hang over
the blank portion at the start of that character.
(This blank portion is usually
half the character’s width for Opening brackets (cl-01),
or a quarter of the character’s width for Middle dots (cl-05) [[JLREQ]].)
However, if this blank part has been compressed
due to justification or similar processing of the line,
then the ruby may only hang over the resulting
compressed blank space
(e.g. if it was reduced from half to a quarter em,
hang at most a quarter em).
Example 2 of mono-ruby protruding
When the ruby string is longer than the base character string,
and the ruby falls at the start of the line,
then the start of the ruby string is aligned with the line’s start edge
(see ),
while if the ruby falls at the end of the line,
then the end of the ruby string is aligned with the line’s end edge
(see ).
Example of mono-ruby at the line start
Example of mono-ruby at the line end
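The protrusion and overhang rules for mono-ruby can be summarized as arithmetic on the second step. The Python sketch below is a minimal illustration, assuming lengths in base em, a centered ruby, and the blank-portion widths given above; the value for Full-width ideographic space (cl-14) on the following side is an assumption, since the text only gives widths for cl-01 and cl-05 there. The names `TRAILING_BLANK`, `LEADING_BLANK`, `hangable` and `mono_ruby_spacing` are hypothetical.

```python
# Blank portion (fraction of the base em) that protruding ruby may hang over.
# The cl-14 entry in LEADING_BLANK is an assumption (not given in the text).
TRAILING_BLANK = {"cl-02": 0.5, "cl-06": 0.5, "cl-07": 0.5, "cl-14": 0.5, "cl-05": 0.25}
LEADING_BLANK = {"cl-01": 0.5, "cl-14": 0.5, "cl-05": 0.25}


def hangable(table, cls, compressed=None):
    """Blank space the ruby may hang over; never more than justification left."""
    nominal = table.get(cls, 0.0)  # any other class (Kanji, Kana, ...): no overhang
    return nominal if compressed is None else min(nominal, compressed)


def mono_ruby_spacing(base_len, ruby_len, prev_cls=None, next_cls=None,
                      prev_compressed=None, next_compressed=None):
    """Return (space_before, space_after) in base em to insert around the base.

    prev_cls / next_cls are JLReq class names of the neighbours, or None at a
    line edge, where the ruby is aligned with the edge and the base is
    therefore indented by the full protrusion.
    """
    protrusion = max(0.0, (ruby_len - base_len) / 2)  # per side, ruby centered
    before = max(0.0, protrusion - hangable(TRAILING_BLANK, prev_cls, prev_compressed))
    after = max(0.0, protrusion - hangable(LEADING_BLANK, next_cls, next_compressed))
    return before, after
```

For example, a single kanji (1 em) with four half-size kana (2 em) after a middle dot and before an opening bracket needs 0.25 em of extra space before it and none after it.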
Placement of group-ruby
In this section, the placement rules for group-ruby are described in terms
of combinations of two groups of characters:
"Western characters", which have proportional widths
and include characters such as
Grouped numerals (cl-24),
Unit symbols (cl-25),
Western word space (cl-26),
and
Western characters (cl-27)
[[JLREQ]],
and "Japanese characters", which have a fixed full width
(see also 2.1.2 Kanji, Hiragana and Katakana [[JLREQ]])
and include characters such as
Hiragana (cl-15),
Katakana (cl-16), and
Ideographic characters (cl-19)
[[JLREQ]].
Western character strings are read as clusters of multiple characters,
so it is desirable to avoid adding spacing between their characters for justification.
The way they are positioned depends
on how their respective lengths would
compare if they were each laid out
without any inter-letter spacing.
When their respective lengths would be the same,
both are laid out without inter-letter spacing
and placed such that their respective centers in the inline direction are aligned
(see ).
For other cases, the placement depends on the following:
Example 1 of group-ruby
In terms of the two-step processing method described in
the section "Matters considered by the placement rules",
points 1, 2, and 3 below belong to the first step, and points 4 and 5 to the second
step.
When both the ruby string and the base character string consist of
"Japanese characters",
the placement is as follows:
When the ruby string is shorter than the base character string,
space is inserted between every character in the ruby string
as well as at the start and the end of the ruby string
so that it becomes the same length as the base character string,
then their centers in the inline direction are aligned.
The size of the space inserted between each of the ruby characters
is twice the size of the space inserted at the end and at the start
(see ).
Example 2 of group-ruby
However, the size of the space inserted at the start and end must
be capped at no more than half the size of one base character,
and the space inserted between each ruby character is enlarged to compensate
(see ).
Example 3 of group-ruby
When the ruby string is longer than the base character string,
space is inserted between every character in the base character string
as well as at the start and the end of the base character string
so that it becomes the same length as the ruby string,
then their centers in the inline direction are aligned.
The size of the space inserted between each of the base characters
is twice the size of the space inserted at the end and at the start
(see ).
Example 4 of group-ruby
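The spacing arithmetic above follows from requiring that the edge spaces plus the inter-character spaces fill the length difference: with n characters, edge space e and inter-character space 2e, we need 2e + (n − 1)·2e = 2ne, so e = difference / (2n). The Python sketch below applies that formula and the cap; the function name `justify_spacing` is hypothetical, lengths are assumed to be in base em, and the single-character case (simply centered) is an assumption, since there is then no inter-character gap to absorb the excess.

```python
def justify_spacing(target_len, natural_len, n_chars, edge_cap=None):
    """Spacing that stretches a solid string of n_chars to target_len.

    Returns (edge, between): the space at the start and at the end, and the
    space between adjacent characters. Normally between == 2 * edge; if
    edge_cap limits the edges, the excess goes between the characters.
    """
    extra = target_len - natural_len
    if extra <= 0 or n_chars == 0:
        return 0.0, 0.0
    if n_chars == 1:
        # Assumption: a single character is simply centered.
        return extra / 2, 0.0
    edge = extra / (2 * n_chars)  # 2*edge + (n_chars - 1) * 2*edge == extra
    if edge_cap is not None and edge > edge_cap:
        edge = edge_cap
        return edge, (extra - 2 * edge) / (n_chars - 1)
    return edge, 2 * edge
```

When the ruby is shorter, one would call `justify_spacing(base_len, ruby_len, n_ruby, edge_cap=0.5)`, the cap being half of one base character in base em; when the ruby is longer, `justify_spacing(ruby_len, base_len, n_base)` with no cap. For two half-size kana over four kanji, the edges are capped at 0.5 em and each ruby character pair is separated by 2 em.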
When the ruby string consists of "Japanese characters" and
the base character string consists of Western characters,
the placement is as follows (see ):
When the ruby string is shorter than the base character string,
space is inserted between every character in the ruby string
as well as at the start and the end of the ruby string
so that it becomes the same length as the base character string,
then their centers in the inline direction are aligned.
The size of the space inserted between each of the ruby characters
is twice the size of the space inserted at the end and at the start.
When the ruby string is longer than the base character string,
both are laid out without inter-letter spacing
and placed such that their respective centers in the inline direction are aligned.
In this case, the ruby string protrudes from the base character string.
Example of ruby with western characters
When the ruby string consists of Western characters and
the base character string consists of "Japanese characters",
the placement is as follows (see ):
When the ruby string is shorter than the base character string,
both are laid out without inter-letter spacing
and placed such that their respective centers in the inline direction are aligned.
When the ruby string is longer than the base character string,
space is inserted between every character in the base character string
as well as at the start and the end of the base character string
so that it becomes the same length as the ruby string,
then their centers in the inline direction are aligned.
The size of the space inserted between each of the base characters
is twice the size of the space inserted at the end and at the start.
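Read together, the group-ruby cases above can be summarized as: only the shorter string is ever stretched, and only if it consists of "Japanese characters"; otherwise both strings are set solid and centered. The Python sketch below encodes that reading. It is an interpretation, not wording from the rules themselves, and the Western/Western combination is an assumption because it is not covered above. The names `Script` and `string_to_stretch` are hypothetical.

```python
from enum import Enum


class Script(Enum):
    JAPANESE = "japanese"  # fixed full width: cl-15, cl-16, cl-19
    WESTERN = "western"    # proportional: cl-24, cl-25, cl-26, cl-27


def string_to_stretch(ruby_script, ruby_len, base_script, base_len):
    """Return "ruby", "base", or None (both set solid with centers aligned).

    Lengths are natural (solid) lengths. Only a shorter Japanese string is
    stretched; Western strings are never letter-spaced.
    """
    if ruby_len == base_len:
        return None
    if ruby_len < base_len:
        shorter, script = "ruby", ruby_script
    else:
        shorter, script = "base", base_script
    return shorter if script is Script.JAPANESE else None
```

When the result is `None` and the ruby is the longer string, it protrudes and is then handled by the mono-ruby overhang rules.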
When the ruby string is longer than the base character string and protrudes,
whether and how it hangs over characters preceding or following
the base character string
is handled in the same way as with mono-ruby
(see ).
Also, when the ruby string is longer than the base character string,
protrudes, and is located at the start or end of the line,
the resulting layout is also identical to that of mono-ruby.
Example of protruding group-ruby
In the case of group ruby,
the base character string and its associated ruby string
are treated as a unit,
so there is no line wrapping opportunity inside either string.
Placement of Jukugo-ruby
Jukugo-ruby is placed as follows:
In terms of the two-step processing method described in
the section "Matters considered by the placement rules",
points 1, 2, and 3 below belong to the first step, and point 4 to the second
step.
With jukugo-ruby, each base character is associated with its own ruby string.
When each of these ruby strings, laid out without inter-letter spacing,
is no longer than its corresponding base character,
placement is determined as follows:
When the ruby string associated with an individual base character is 1 character long,
the ruby character and the base character
are placed such that their respective centers in the inline direction are aligned
(see ).
Example 1 of jukugo-ruby
When the ruby string associated with an individual base character is two or more characters long,
the ruby string is laid out without inter-letter spacing,
and placed such that its center and the center of its base character are aligned in the inline direction
(see ).
For simple ruby implementations,
if even a single ruby string is longer than its corresponding base character
when laid out without inter-letter spacing,
the resulting layout would look identical to group-ruby
(see and ).
Example 2 of jukugo-ruby
Example 3 of jukugo-ruby
With jukugo-ruby, individual base characters and their associated ruby string are treated as a unit,
and line wrap opportunities are allowed between two base characters.
When such a line wrap occurs,
if a single base character that is part of the jukugo is placed alone at the end or at the start of a line,
it is laid out identically to mono-ruby;
conversely when several base characters that are part of the jukugo
are placed together at the end or start of a line,
they are laid out together as has been described in this section about jukugo-ruby
(see ).
Example of wrapping jukugo-ruby
When the ruby string is longer than the base character string and protrudes,
whether and how it hangs over characters preceding or following
the base character string
is handled in the same way as with mono-ruby.
Also, when the ruby string is longer than the base character string,
protrudes, and is located at the start or end of the line,
the resulting layout is also identical to that of mono-ruby.
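The jukugo-ruby decisions above, including what happens after a line wrap, can be sketched as a per-fragment classification. In this minimal Python sketch each base character is represented as a `(base_len, ruby_len)` pair of solid lengths in base em; the function names `jukugo_fragment_layout` and `split_at_breaks` are hypothetical, and break positions are assumed to lie strictly between base characters.

```python
def jukugo_fragment_layout(pairs):
    """Decide how one line fragment of a jukugo-ruby is laid out.

    pairs: list of (base_len, ruby_len), one per base character in the
    fragment; ruby_len is the solid length of that character's ruby.
    """
    if len(pairs) == 1:
        return "mono"           # a lone base character is laid out as mono-ruby
    if all(ruby <= base for base, ruby in pairs):
        return "per-character"  # each ruby centered on its own base character
    return "group"              # any overflowing ruby: laid out as group-ruby


def split_at_breaks(pairs, breaks):
    """Split a jukugo into line fragments at the given indices (1..len-1)."""
    edges = [0, *sorted(breaks), len(pairs)]
    return [pairs[a:b] for a, b in zip(edges, edges[1:])]
```

For instance, a three-kanji jukugo whose second ruby is 1.5 em long is laid out as group-ruby when unbroken; if a line wrap falls after the first kanji, the first fragment becomes mono-ruby and the remaining two kanji are still laid out as group-ruby.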
Placement of Double-Sided Ruby
Placement of Double-Sided Ruby by Combination of Type of Ruby
Fully general rules for placing double-sided ruby would require quite complex methods.
For simple placement of double-sided ruby, rules can be written for each
combination of mono-ruby, group-ruby, and jukugo-ruby on the two sides.
As with the two-step processing for single-sided ruby, the handling of a ruby
string that extends beyond the base characters with respect to the preceding and
following characters, and placement at the line head or the line end, are processed
in the same way as when ruby is used on one side only.
Combination of type of ruby
The possible combinations of ruby types are as follows:
Mono-ruby and mono-ruby
Group-ruby and group-ruby
Mono-ruby and group-ruby
Mono-ruby and jukugo-ruby
Jukugo-ruby with group-ruby or jukugo-ruby
Rules for Placement of Double-Sided Ruby per Combinations
In JIS X 4051 [[JISX4051]], rules are given for the first, second, and third cases
in the above list of combinations
(see the note in JLReq [[JLREQ]]).
The rule for the third case is to process consecutive mono-ruby as
group-ruby, which results in the same placement as the second case.
For the fourth case, mono-ruby and jukugo-ruby, the rules for the first case apply
after dividing the jukugo-ruby into consecutive mono-ruby, taking each Kanji character and its ruby string as an individual pair of base character and ruby string.
For the fifth case, jukugo-ruby with group-ruby or jukugo-ruby, the rules for the second
case apply, handling the jukugo-ruby as group-ruby.
This section gives rules for simple placement of double-sided ruby for the first and second
cases, as follows.
In addition, which side each of the two ruby strings is placed on follows what is
specified in the content.
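The reduction of the five combinations to the two rule sets described above can be written as a small lookup. This Python sketch is illustrative only: `RubyType` and `double_sided_rule` are hypothetical names, and the function returns which rule set ("mono+mono" or "group+group") would be applied after the conversions described above.

```python
from enum import Enum


class RubyType(Enum):
    MONO = "mono"
    GROUP = "group"
    JUKUGO = "jukugo"


def double_sided_rule(side_a: RubyType, side_b: RubyType) -> str:
    """Return the rule set used for a double-sided combination.

    - mono + mono: mono-ruby rules.
    - mono + jukugo: the jukugo is split into consecutive mono-ruby.
    - mono + group: consecutive mono-ruby is treated as group-ruby.
    - jukugo + group or jukugo + jukugo: jukugo is treated as group-ruby.
    """
    kinds = {side_a, side_b}
    if RubyType.MONO in kinds and RubyType.GROUP not in kinds:
        return "mono+mono"
    return "group+group"
```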
Placement of combination of mono-ruby and mono-ruby
In the combination of mono-ruby and mono-ruby, both ruby strings are set
solid and placed so that their centers match that of the
base character
(see ).
For other points, follow the same rules for placement of mono-ruby
described in [[[#placement-of-mono-ruby]]].
Double-sided ruby example with both mono-ruby
Placement of combination of group-ruby and group-ruby
When both ruby strings are shorter than the base character string,
follow the rules for placement of group-ruby described in
[[[#placement-of-group-ruby]]].
When a ruby string consists of "Japanese characters" as defined in
[[[#placement-of-group-ruby]]],
spacing is inserted between every character in the ruby string as well as at the
start and the end of the ruby string
(see ).
Double-sided ruby example 1 with both group-ruby
When one of the ruby strings is longer than the base character string,
take the longer ruby string and place it
following the rules for placement of group-ruby described in
[[[#placement-of-group-ruby]]].
When the base character string consists of "Japanese characters"
as defined in
[[[#placement-of-group-ruby]]],
spacing is inserted between every character in the base character string
as well as at the start and the end of the base character string.
After the base character string has been placed,
place the shorter ruby string against the length of the base character
string excluding the spacing at its start and end,
but including the inter-character spacing when the base character string consists of
"Japanese characters".
When the shorter ruby string is longer than the base character string
including inter-character spacing, the shorter ruby string is
set solid and placed so that its center matches that of the
base character string
(see ).
Double-sided ruby example 2 with both group-ruby
When the shorter ruby string is shorter than the base character string
including inter-character spacing,
follow the rules for placement of group-ruby described in
[[[#placement-of-group-ruby]]],
using the length of the base character string including inter-character
spacing.
When the shorter ruby string consists of "Japanese characters" as defined in
[[[#placement-of-group-ruby]]],
spacing is inserted between every character in the ruby string
as well as at the start and the end of the ruby string
(see ).
Double-sided ruby example 3 with both group-ruby
For other points, follow the same rules for placement of group-ruby
described in [[[#placement-of-group-ruby]]].
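The double-sided group-ruby procedure above can be sketched for the common case in which the base character string and both ruby strings consist only of "Japanese characters". This is a minimal Python sketch under that assumption, with each base character 1 em wide and all lengths solid lengths in base em; the function `double_sided_group_layout` is hypothetical. It returns the spacing of the base string, the span available to the shorter ruby string (base length without the edge spacing), and whether the shorter ruby must be set solid and centered. Placing the shorter ruby within that span would then use the ordinary group-ruby spacing rules.

```python
def double_sided_group_layout(base_chars, ruby_a, ruby_b):
    """Double-sided group-ruby, all strings assumed to be "Japanese characters".

    base_chars: number of full-width base characters (each 1 base em).
    ruby_a, ruby_b: solid lengths of the two ruby strings, in base em.
    Returns (base_edge, base_between, span_for_shorter, shorter_is_solid).
    """
    base_len = float(base_chars)
    longer, shorter = max(ruby_a, ruby_b), min(ruby_a, ruby_b)
    if longer <= base_len:
        # Both rubies fit: each is placed by the ordinary group-ruby rules.
        return 0.0, 0.0, base_len, False
    # Stretch the base to the longer ruby: edge space e, inter-character 2e.
    edge = (longer - base_len) / (2 * base_chars)
    between = 2 * edge
    # The shorter ruby is placed against the base without its edge spacing.
    span = base_len + (base_chars - 1) * between
    return edge, between, span, shorter > span
```

With two kanji, a 3 em ruby on one side and a 1 em ruby on the other, the base gets 0.25 em at each edge and 0.5 em between its characters, leaving a 2.5 em span in which the shorter ruby is spread by the group-ruby rules.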