This page describes platform-specific JSSCxml extensions for speech interaction in supported Web browsers. These extensions are not guaranteed to interoperate with other SCXML implementations. JSSCxml currently supports only speech synthesis out of the box.
The <speak> element

The <speak> element can be used wherever executable content is allowed. It connects SCXML to the Web Speech Synthesis API recently implemented in several Web browsers, and exposes most of its functionality.
In the datamodel, _x.voices is the list of voices supported by the platform, as returned by the SpeechSynthesis.getVoices() method. Items of that list are not voiceURIs, but the full SpeechSynthesisVoice objects with a voiceURI property.
The namespace for <speak> must be "http://www.jsscxml.org", for which I suggest the shorthand "jssc". Thus:
<scxml xmlns="http://www.w3.org/2005/07/scxml" xmlns:jssc="http://www.jsscxml.org"> … <jssc:speak text="Hello world!" xml:lang="en-US"… /> …
The reason JSSCxml does not use the SSML namespace is that the element has some new attributes, and that inline SSML content would look good but be completely inflexible. Instead, SSML documents can be created (or parsed from actual SSML content in a <data> element) and manipulated by ECMAScript code, then passed to the <speak> element.
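For instance, an SSML string held in a <data> element can be parsed into a Document and handed to <speak> through expr. This is an illustrative sketch; the id greetingSSML and the variable ssmlDoc are made-up names, not part of JSSCxml:

```xml
<datamodel>
  <!-- hypothetical data element holding an SSML string -->
  <data id="greetingSSML"
        expr="'&lt;speak&gt;Hello &lt;emphasis&gt;world&lt;/emphasis&gt;!&lt;/speak&gt;'"/>
</datamodel>
…
<onentry>
  <!-- parse the string into a Document, then pass it to <speak> via expr -->
  <script>var ssmlDoc = new DOMParser().parseFromString(greetingSSML, 'application/xml');</script>
  <jssc:speak expr="ssmlDoc" xml:lang="en-US"/>
</onentry>
```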
Name | Required | Default value | Valid values | Description
---|---|---|---|---
text | yes*, but no more than one of text and expr | none | text with optional SSML tags | the text or SSML that will be spoken
expr | (see text) | none | an expression evaluating to a string or an SSML Document | evaluated when the <speak> element is executed, then used as if there had been a text attribute with the resulting (linearized) value
xml:lang | no, and only one of xml:lang and langexpr | platform-specific | any RFC 3066 language code supported by the platform | the language of the text to be read
langexpr | (see xml:lang) | none | an expression evaluating to a language code | evaluated when the <speak> element is executed, then used as if there had been an xml:lang attribute with the resulting value
voice | no | platform-specific | expression that must evaluate to a member of _x.voices | the voice used to read the text
volume | no | 1 | 0 – 1 | how loud the text will be spoken
rate | no | 1 | 0.1 – 10 | how fast the text will be spoken
pitch | no | 1 | 0 – 2 | pitch modifier for the synthesized voice
interrupt / nomore | no* | false | boolean | stops speaking and cancels queued utterances
* The boolean nomore (or interrupt) may appear alone in the <speak> tag.
Note that the DOMParser API used by JSSCxml to parse all this may reject SCXML documents where boolean attributes are written without a value. It is therefore advised to give them the value "true" (although the interpreter will accept any value at all, as long as the XML is well-formed).
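For example, following the advice above to give boolean attributes an explicit value:

```xml
<!-- an explicit value keeps the XML well-formed for DOMParser -->
<jssc:speak nomore="true"/>

<!-- cancel the queue, then speak the new text immediately -->
<jssc:speak nomore="true" text="Never mind that." xml:lang="en-US"/>
```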
Children

None.
When executed, the <speak> element causes its text or SSML content to be read by the platform's SpeechSynthesis implementation, using the supplied parameters. When the synthesizer has reached the end of the utterance, a speak.end event will be placed in the external queue, with its data field containing a reference to the underlying SpeechSynthesisUtterance object.
If the nomore or interrupt attribute is present, current and queued utterances will be cancelled first, so the new utterance (if supplied) will be spoken immediately, no matter what. The interpreter will place a speak.error event in the external queue for each cancelled utterance.
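A machine can react to both outcomes with ordinary transitions. The state names below are illustrative:

```xml
<state id="announcing">
  <onentry>
    <!-- cancel whatever is still being spoken, then start the new utterance -->
    <jssc:speak interrupt="true" text="New announcement." xml:lang="en-US"/>
  </onentry>
  <!-- queued when the utterance finishes normally -->
  <transition event="speak.end" target="idle"/>
  <!-- queued once for each utterance that was cancelled -->
  <transition event="speak.error" target="idle"/>
</state>
```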
At this time, Chrome's and Safari's implementations disagree on the way to select a voice. Chrome's utterance objects have a voiceURI property which can be set to the voiceURI value of a voice, whereas Safari's utterance objects have a voice property which accepts only references to whole SpeechSynthesisVoice objects. In order to hide this discrepancy from authors, the voice attribute defined here always takes a reference, and JSSCxml will ensure that each browser gets what it expects.
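Since the voice attribute expects a member of _x.voices, authors typically need a small selection helper in the datamodel. JSSCxml does not provide one; the function below is a hypothetical sketch that assumes the list items carry the standard lang and voiceURI properties of SpeechSynthesisVoice:

```javascript
// Pick a voice from a SpeechSynthesisVoice-like list by language code.
// Prefers an exact match (e.g. "en-US"), then falls back to another
// regional variant of the same primary language, then to null.
function pickVoice(voices, lang) {
  var exact = voices.filter(function (v) { return v.lang === lang; });
  if (exact.length) return exact[0];
  var primary = lang.split("-")[0];
  var variant = voices.filter(function (v) {
    return v.lang.split("-")[0] === primary;
  });
  return variant.length ? variant[0] : null;
}
```

Defined in a <script> element, it could then be used as voice="pickVoice(_x.voices, 'en-GB')" on a <jssc:speak> element.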
If no voice is specified, the xml:lang attribute will cause the platform to choose the default voice for that language, if one is available, or failing that for another regional variant of that language. An xml:lang attribute higher in the document hierarchy (typically on the root element) is inherited by <speak> elements, so there is no need to repeat it every time.
speak events

speak.* events queued when using speech synthesis have "DOM Event" as their origintype, but their origin is the corresponding SpeechSynthesisUtterance object rather than a node. There is no reason to <send> any event back to those objects (and the interpreter won't accept them as a valid target anyway), but their text property lets you track which utterance has started, ended, or been cancelled.
The event's data will contain the elapsedTime, charIndex, and name properties of the original DOM event, instead of a copy of the event itself as would be the case for DOM events converted in the usual way by JSSCxml.
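A transition can inspect both fields; a minimal sketch:

```xml
<!-- log which utterance ended; _event.origin is the utterance object -->
<transition event="speak.end">
  <log expr="'done speaking: ' + _event.origin.text
             + ' (elapsed: ' + _event.data.elapsedTime + ')'"/>
</transition>
```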