Hi Math WG,
Continuing on feedback for a future MathML specification, here is a (probably non-exhaustive) list of inconsistencies between MathML and HTML5/CSS regarding whitespace and attribute canonicalization. As a rule of thumb, it would be better for web engines if MathML could align with HTML5, so that we can reuse as much code as possible and avoid extra code to handle MathML special cases. Also, people familiar with HTML5 will be less surprised when handling MathML.

1) Whitespace collapsing/trimming
https://www.w3.org/TR/MathML/chapter2.html#fund.collapse

Whitespace collapsing is consistent with the default behavior of the CSS "white-space" property, and people are familiar with it.

Removing "whitespace at the beginning and end of the content" is less expected. Gecko has some code to handle this, but it would be very helpful to avoid this additional complexity. WebKit does not handle it at the moment and it's not clear it's worth doing... Except in the MathML spec/tests, everybody seems to just write <mo>(</mo> and not <mo> ( </mo>. Can we deprecate this behavior in MathML4? Or maybe you should work with the HTML5 WG to define such collapsing rules during document parsing, so that the MathML rendering code no longer needs to handle it?

2) In MathML, white spaces are understood as XML spaces (U+0020), tabs (U+0009), line feeds (U+000A), and carriage returns (U+000D), while HTML5 also includes "form feed" (U+000C).
https://www.w3.org/TR/html5/infrastructure.html#space-character

3) MathML attributes are case-sensitive while HTML5 attributes are case-insensitive. Case sensitivity is probably not a problem for users and it makes parsing easier. However, WebKit developers writing or reviewing patches have often considered doing case-insensitive comparisons, as that's consistent with the rest of the code base.

4) MathML boolean attributes take the values "true" and "false". In HTML5, the boolean value is given by the presence/absence of the attribute, and the only allowed values are the empty string and the name of the attribute. This allows a more compact syntax like <mo largeop stretchy> instead of <mo largeop="true" stretchy="true">. However, web engines and authoring tools will continue to support the true/false syntax anyway, so it's probably not worth adding complexity here...
https://www.w3.org/TR/html5/infrastructure.html#boolean-attributes

5) As I said in a previous message, the values "small", "normal", "big" of mathsize do not exist for CSS font-size. Removing them would simplify the parsing code a bit.

6) The definition of numbers is also not very precise in the MathML recommendation compared to HTML5. One has to check the RelaxNG schemas and the predefined RelaxNG types to know the exact syntax. Again, I think it would be best to rely on the HTML5 definitions. For example, <math><mspace width="1E1em" height="10em" mathbackground="red"/></math> draws a red square in WebKit, but Gecko says "1E1em" is invalid.
https://www.w3.org/TR/html5/infrastructure.html#numbers

Frédéric
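Points 1 and 2 together amount to a small text transformation. Below is a minimal Python sketch, assuming the rule described in point 1 (trim whitespace at both ends of token content, collapse internal runs to a single U+0020) and taking the whitespace set as a parameter so the XML set and the HTML5 set (which adds U+000C) can be compared. `collapse_and_trim`, `MATHML_WS` and `HTML5_WS` are hypothetical names, not Gecko or WebKit code.

```python
# Whitespace sets from point 2: MathML uses the XML set; HTML5 adds U+000C.
MATHML_WS = " \t\n\r"
HTML5_WS = " \t\n\r\f"


def collapse_and_trim(text: str, ws: str = MATHML_WS) -> str:
    """Trim whitespace at both ends and collapse internal runs to one U+0020.

    `ws` lists the characters treated as whitespace; anything else
    (including U+00A0 NO-BREAK SPACE) is kept as content.
    """
    # Map every whitespace character to a plain space first.
    spaced = text.translate({ord(c): " " for c in ws})
    # Splitting on " " yields empty strings for leading, trailing and
    # repeated spaces; dropping them trims and collapses in one step.
    return " ".join(part for part in spaced.split(" ") if part)
```

For example, `collapse_and_trim(" ( ")` gives `"("`. With the XML set a form feed survives as content; with `HTML5_WS` it is trimmed like any other space. Passing the set as a parameter is one way to express a per-parser choice (XML white space for application/xml, HTML white space for text/html).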
On 01/08/2016 16:31, Frédéric Wang wrote:
> Hi Math WG,

Some personal "first thought" replies ...

>
> Continuing on feedback for a future MathML specification, here is a (probably non-exhaustive) list of inconsistencies between MathML and HTML5/CSS regarding whitespace and attributes canonicalization. As a rule of thumb, it would be better for web engines if MathML can align on HTML5 so that we can reuse as much code as possible and avoid extra code to handle MathML special cases. Also people familiar with HTML5 will be less surprised when handling MathML.
>
> 1) Whitespace collapsing/trimming
> https://www.w3.org/TR/MathML/chapter2.html#fund.collapse
>
> Whitespace collapsing is consistent with the default CSS property "white-space" and people are familiar with it.
>
> Removing "whitespace at the beginning and end of the content" is less expected. Gecko has some code to handle this but it would be very helpful to avoid this additional complexity. WebKit does not handle it at the moment and it's not clear it's worth doing it... Except in the MathML spec/test, everybody seems to just write <mo>(</mo> and not <mo> ( </mo>. Can we deprecate this behavior in MathML4? Or maybe you should work with the HTML5 WG to define such collapsing rules during document parsing, so that the MathML rendering code no longer need to handle it?

White space is always a problem :-) but I'd be sorry to just drop this completely: it's a well-established feature of math typesetting (in TeX and elsewhere) that user whitespace is ignored and the math typesetter re-adds white space as needed. That said, I agree that the fact that TeX treats 1+2 like 1 + 2 doesn't necessarily mean that MathML should treat <mo>+</mo> like <mo> + </mo>. If the trimming could happen during text/html parsing, that would simplify some things.

>
> 2) In MathML, white spaces are understood as XML spaces (U+0020), tabs (U+0009), line feeds (U+000A), and carriage returns (U+000D) while HTML5 also includes "form feed" (U+000C).
>
> https://www.w3.org/TR/html5/infrastructure.html#space-character
>

Probably we should just change that. Either always include U+000C, or specify that white space characters are XML white space in application/xml parsing and HTML white space in text/html parsing, or something ...

> 3) MathML attributes are case-sensitive while HTML5 attributes are case-insensitive. case-sensitiveness is probably not a problem for users and it's easier for the parsing. However, WebKit developers writing or reviewing patches have often considered doing case-insensitive comparisons as that's consistent with the rest of the code base.

Do you mean the attribute values or the attribute names? For the latter, my understanding is that it's the same as (X)HTML, in that the text/html parser will normalise the case of the attribute name (to lower case, except for definitionURL), so giving an appearance of case insensitivity.

>
> 4) MathML boolean attributes take value "true" and "false". In HTML5, the boolean value is given by the presence/absence of the attribute and the only allowed value is the name of the attribute. This allows to get more compact syntax like <mo largeop stretchy> instead of <mo largeop="true" stretchy="true">. However, Web engines and authoring tools will continue to support the true/false syntax anyway, so it's probably not worth adding complexity here...

I don't think allowing stretchy=stretchy as an alternative to stretchy=true would break anything on the XML side of things, and it would potentially, as you say, allow just stretchy in text/html using its version of the old SGML shorttag feature. You could say more than me whether that would simplify or complicate things at the implementation level.

>
> https://www.w3.org/TR/html5/infrastructure.html#boolean-attributes
>
> 5) As I said in a previous message, the values "small", "normal", "big" of mathsize do not exist for CSS font-size. Removing them will simplify a bit the parsing code.

Are these conceptually more difficult than CSS names like small, medium, large, x-large? (just asking :-)

>
> 6) The definition of numbers is also not very accurate in the MathML recommendation compared to HTML5. One has to check the RelaxNG schemas and the predefined RelaxNG types to know the exact syntax.

Well, hopefully section 2.1.5.1
https://www.w3.org/Math/draft-spec/mathml.html#chapter2_id.2.1.5.1
is reasonably exact (but the main point, that it's not exactly the same as HTML5, is of course undeniable).

> Again, it think it would be best to rely on the HTML5 definitions. For example, <math><mspace width="1E1em" height="10em" mathbackground="red"/></math> draws a red square in WebKit but Gecko says "1E1em" is invalid.

Certainly there is scope for documenting the syntaxes there and seeing whether any differences give extra functionality or are just historical. I suspect that we should be able to specify a profile of MathML for text/html parsing that brings things more into line with HTML/CSS numeric syntax, if that's needed.

>
> https://www.w3.org/TR/html5/infrastructure.html#numbers
>
> Frédéric
>

David

________________________________
The Numerical Algorithms Group Ltd is a company registered in England and Wales with company number 1249803. The registered office is: Wilkinson House, Jordan Hill Road, Oxford OX2 8DR, United Kingdom.

This e-mail has been scanned for all viruses by Microsoft Office 365.
________________________________
Le 01/08/2016 à 18:45, David Carlisle a écrit :
> white space is always a problem:-) but I'd be sorry to just drop this completely, it's a well established feature of math typesetting (in TeX and elsewhere) that user-whitespace is ignored and the math typesetter re-adds white space as needed. That said, I agree that the fact that TeX treats 1+2 like 1 + 2 doesn't necessarily mean that mathml should treat <mo>+</mo> like <mo> + </mo>. If the trimming could happen during text/html parsing that would simplify some things.

TeX-to-MathML converters trim and collapse the whitespace, and that's probably the same for other MathML generators. That's why I don't see why it needs to be done again by renderers.

> Do you mean the attribute values or the attribute names?

Values.

>
>> 4) MathML boolean attributes take value "true" and "false". In HTML5, the boolean value is given by the presence/absence of the attribute and the only allowed value is the name of the attribute. This allows to get more compact syntax like <mo largeop stretchy> instead of <mo largeop="true" stretchy="true">. However, Web engines and authoring tools will continue to support the true/false syntax anyway, so it's probably not worth adding complexity here...
>
> I don't think allowing stretchy=stretchy as an alternative to stretch=true would break anything on the XML side of things, and would potentially, as you say, allow just stretchy in text/html using its version of the old SGML shorttag feature. You could say more than me whether that would simplify or complicate things at implementation level.

MathML syntax (which I guess we don't want, as that would break all existing documents). But maybe the new syntax is not too much to add if it's really something users want.

>>
>> https://www.w3.org/TR/html5/infrastructure.html#boolean-attributes
>>
>> 5) As I said in a previous message, the values "small", "normal", "big" of mathsize do not exist for CSS font-size. Removing them will simplify a bit the parsing code.
>
> Are these conceptually more difficult than css names like small,medium,large,x-large? (just asking:-)

True, I forgot these keywords. I'll have to read the Gecko/WebKit code to check where these values are resolved. But as I see it, the lists of keywords are different, so we won't be able to use exactly the same code anyway.

>
>>
>> 6) The definition of numbers is also not very accurate in the MathML recommendation compared to HTML5. One has to check the RelaxNG schemas and the predefined RelaxNG types to know the exact syntax.
>
> Well hopefully section 2.1.5.1
> https://www.w3.org/Math/draft-spec/mathml.html#chapter2_id.2.1.5.1
> is reasonably exact (but the main point that it's not exactly the same as HTML5 is of course undeniable)

It could be worth just reusing the HTML5 definitions, so that web engine implementers don't have to check the differences.
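Since point 3 turns out to be about attribute values, the two comparison styles can be contrasted directly. This is a minimal sketch with hypothetical function names (not engine code): MathML-style exact comparison versus HTML5's "ASCII case-insensitive" matching, which folds only A-Z. Python's `str.lower()` is deliberately avoided because it applies full Unicode case mapping (it turns U+212A KELVIN SIGN into "k", for example), which is not what the HTML5 rule does.

```python
import string

# Translation table that folds only ASCII A-Z to a-z.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(s: str) -> str:
    """Lowercase ASCII letters only; every other character is unchanged."""
    return s.translate(_ASCII_FOLD)


def mathml_value_matches(value: str, keyword: str) -> bool:
    """MathML-style: attribute values are compared case-sensitively."""
    return value == keyword


def html_value_matches(value: str, keyword: str) -> bool:
    """HTML5-style: 'ASCII case-insensitive' match."""
    return ascii_lower(value) == ascii_lower(keyword)
```

With these, `mathml_value_matches("True", "true")` is false while `html_value_matches("True", "true")` is true. If an engine switched to the second style, documents relying on case-sensitive rejection of values like "True" would render differently, which is the kind of divergence the message warns about.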
In reply to this post by Frédéric Wang-2
Frédéric Wang <[hidden email]> writes in part:

A specific point:

> ...
> For example, <math><mspace width="1E1em" height="10em" mathbackground="red"/></math> draws a red square in WebKit but Gecko says "1E1em" is invalid.

1E1 is ridiculous. For one thing, to my eye, it's 10.0 (floating point) -- implied by the E notation -- rather than simply 10

-- Bill
On 01/08/2016 22:33, William F Hammond wrote:
>
> Frédéric Wang <[hidden email]> writes in part:
>
> A specific point:
>
>> ...
>> For example, <math><mspace width="1E1em" height="10em" mathbackground="red"/></math> draws a red square in WebKit but Gecko says "1E1em" is invalid.
>
> 1E1 is ridiculous. For one thing, to my eye, it's 10.0 (floating point) -- implied by the E notation -- rather than simply 10
>
> -- Bill

? The length is a floating-point quantity here. 1E1em isn't valid MathML syntax, but it seems a perfectly reasonable suggested extension, doesn't it? 10em is the same as 10.0em and could have been the same as 1e1em if we'd specified it that way, couldn't it?

David
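David's remark, spelled out: E notation only scales the mantissa by a power of ten, so these spellings all denote the same length.

```latex
1\mathrm{E}1\,\mathrm{em} \;=\; 1 \times 10^{1}\,\mathrm{em} \;=\; 10\,\mathrm{em} \;=\; 10.0\,\mathrm{em}
```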
In reply to this post by David Carlisle
Le 01/08/2016 à 18:45, David Carlisle a écrit :
> I don't think allowing stretchy=stretchy as an alternative to stretch=true would break anything on the XML side of things, and would potentially, as you say, allow just stretchy in text/html using its version of the old SGML shorttag feature. You could say more than me whether that would simplify or complicate things at implementation level.

One additional complexity is that MathML boolean attributes really have three values: "false" or "true" when they are explicit, and "automatic" (computed from the operator dictionary, etc.) when they are not specified. So I'm not sure the HTML5 syntax will work.
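The three-state problem can be made concrete. In this sketch, `None` stands for an absent attribute; `mathml_boolean` returns `True`, `False`, or `None` (meaning "automatic", to be resolved from the operator dictionary), while `html5_boolean` can only return two states. Accepting "" and the attribute's own name as extra spellings of true is the hypothetical extension discussed in the thread, not part of any MathML version, and treating unknown values as unspecified is also an assumption. All function names are illustrative.

```python
from typing import Optional


def html5_boolean(value: Optional[str]) -> bool:
    """HTML5 semantics: present means true, absent means false (two states).

    `value` is None when the attribute is absent.
    """
    return value is not None


def mathml_boolean(name: str, value: Optional[str]) -> Optional[bool]:
    """Three-state reading: True, False, or None for 'automatic'."""
    if value is None:
        return None  # not specified: defer to the operator dictionary
    if value == "true":
        return True
    if value == "false":
        return False
    # Hypothetical extension: also accept the HTML5-style spellings
    # stretchy="" and stretchy="stretchy" as meaning true.
    if value == "" or value == name:
        return True
    return None  # assumption: treat invalid values as if unspecified


def resolve(name: str, value: Optional[str], dictionary_default: bool) -> bool:
    """Use the explicit value if there is one, else the dictionary default."""
    explicit = mathml_boolean(name, value)
    return dictionary_default if explicit is None else explicit
```

Because the true/false spellings are kept, absence can still mean "automatic" and an explicit "false" can still override a dictionary default of true. Under pure HTML5 semantics, the absent case and the explicit-false case would collapse into one, which is the difficulty raised above.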
In reply to this post by William F Hammond
Le 01/08/2016 à 23:33, William F Hammond a écrit :
>
> 1E1 is ridiculous. For one thing, to my eye, it's 10.0 (floating point) -- implied by the E notation -- rather than simply 10
>
> -- Bill

Not sure I understand your point either. As David said, lengths use floating-point numbers. Gecko's MathML code implements its own parsing to verify that the number matches the MathML syntax before converting it to a float, while WebKit's parsing code is simpler and just calls an internal toFloat method immediately (letting it decide what's valid syntax). If MathML aligns with HTML5 and the typical syntax for floats, then Gecko's code could be simplified a bit. Maybe that would also help converters that generate lengths via some calculations; I don't know.
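The difference between "validate against the MathML grammar, then convert" and an HTML5-style float grammar can be sketched with two number patterns. `MATHML_NUMBER` is a simplified approximation of the MathML 3 number syntax (digits with at most one decimal point, no exponent), not a transcription of section 2.1.5.1; `HTML5_FLOAT` follows the HTML5 "valid floating-point number" rules. The unit list, `parse_length`, and the omission of named spaces are assumptions of this sketch, not Gecko or WebKit behavior.

```python
import re
from typing import Optional, Tuple

# Simplified approximation of the MathML 3 number syntax: optional "-",
# decimal digits with at most one decimal point, no exponent.
MATHML_NUMBER = r"-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)"

# HTML5 "valid floating-point number": optional "-", an integer part and/or
# a fractional part, then an optional e/E exponent with an optional sign.
HTML5_FLOAT = r"-?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)(?:[eE][-+]?[0-9]+)?"

# Assumed unit list; the unit is optional, so bare numbers also match.
UNITS = r"(?:em|ex|px|in|cm|mm|pt|pc|%)?"


def parse_length(value: str, number: str) -> Optional[Tuple[float, str]]:
    """Validate `value` as `number` plus optional unit, then convert.

    Returns (magnitude, unit), or None if the string does not match.
    Validating first matters: float() alone would also accept strings
    such as "inf" or "nan".
    """
    match = re.fullmatch("(" + number + ")(" + UNITS + ")", value)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)
```

With this sketch, `parse_length("1E1em", MATHML_NUMBER)` returns `None` (the Gecko-like outcome in the example), while `parse_length("1E1em", HTML5_FLOAT)` returns `(10.0, "em")`. Only the number pattern changes between the two, which is roughly the simplification suggested for Gecko if MathML adopted the HTML5 definition.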
My intention was to defend the Gecko behavior and to say that 'E' notation should not be used with human-scale lengths.
Sent from my iPhone

> On Aug 2, 2016, at 9:30 AM, Frédéric Wang <[hidden email]> wrote: