
Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum


Rob Shearer-4
This message is in regard to the discussion related to [this](http://lists.w3.org/Archives/Public/public-owl-wg/2008Jul/0101.html 
).

When I was implementing the Cerebra OWL reasoner, I came to the firm  
conclusion that the OWL (1.0) spec was downright broken on this point,  
and I fear we're in danger of breaking OWL 2.0 in exactly the same way.

Putting aside the issue of whether or not it's possible to use (only)  
the XML Schema datatypes to represent meaningful and implementable OWL  
datatype value spaces, I expect that there is consensus that when  
users were writing `xsd:float` and `xsd:double` without values in OWL  
1.0, what they really meant was "any number". No user ever intended to  
restrict the semantic space to a nowhere-dense number line. If the OWL  
spec presupposes that most of our users would prefer a number line  
which does not include 1/3, my choice as an implementor would be to  
once again ignore the spec and be intentionally non-compliant. Doing  
what all my users want and expect in this case turns out to be way way  
easier than doing what a broken spec would require. Any working group  
who would produce such a spec would clearly be putting their own  
interests (ease of spec authoring and political considerations) above  
their duty to their intended users.

(Note that in the course of the discussion I read on public-owl-wg the  
notions of "dense" and "continuous" seem to have become confused. I  
think the notion of density is probably the only one that makes a  
difference in terms of current OWL semantics, since number  
restrictions can cause inconsistencies in non-dense number lines, but  
continuity is really what users have in their heads.)
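To make the density point concrete, here is a minimal Python sketch (using the standard library's `math.nextafter`, available since Python 3.9) showing that two distinct IEEE doubles can have no representable value between them, so a restriction to an open interval over such a value space can be provably empty, which is never true over the reals:

```python
import math

a = 1.0
b = math.nextafter(a, math.inf)  # the next representable double above 1.0
assert a < b                     # two distinct values...
assert b - a == 2 ** -52         # ...separated by one ulp near 1.0

# No double lies strictly between a and b: stepping from a toward b
# lands directly on b.  So a restriction to the open interval (a, b)
# is unsatisfiable under double semantics, while over the reals that
# interval contains uncountably many values.
assert math.nextafter(a, b) == b
```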

The [XML Schema datatype spec](http://www.w3.org/TR/xmlschema-2/) is  
focused on representing particular values, not on classes of values.  
The notion of "value spaces" is used within the spec, but only in  
service of representation of values---note that there's not a single  
value space mentioned which is continuous with respect to the reals,  
nor are such notions as "rationals" defined. This makes sense in terms  
of data serialization (the driving XML use case) and standard  
programming languages (where manipulation of values is the driving use  
case), but OWL is in a very different situation. The primary OWL use  
case is reasoning about the emptiness (or size) of value spaces, and  
the definitions provided in the XML Schema spec do not serve this  
purpose well.

Note that I'm not saying XML Schema is a bad spec; merely that it  
addresses different problems than we have.


I strongly encourage the working group to publish a spec which  
provides for the following types of semantic spaces:

1. A countably infinite, nowhere-dense datatype. I.e. the integers.

2. A countably infinite, dense datatype. I.e. strings.

3. An uncountably infinite, dense, continuous datatype. I.e. the reals.

I don't particularly care what each of these three is called; as long  
as OWL specifies the internal semantics of these three types of  
spaces, then it's straightforward to "implement" the datatypes users  
will actually want in terms of them. But, of course, the ability to  
use XML Schema Datatypes to encode specific values within each of  
these spaces would be quite convenient---and would use the XML Schema  
specification for *exactly* what it's good at.
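The distinction among the three spaces can be illustrated with a short Python sketch; the rationals stand in for the reals below, since a program can only manipulate countable approximations of an uncountable space:

```python
from fractions import Fraction

# 1. Integers: nowhere dense -- no integer lies strictly between 1 and 2,
#    so an exclusive range (1, 2) over the integers is empty.
assert not any(1 < n < 2 for n in range(0, 4))

# 2. Strings: countably infinite, yet between "a" and "b" lie infinitely
#    many others ("aa", "aaa", ...), each sorting strictly between them.
assert all("a" < "a" * n < "b" for n in range(2, 6))

# 3. Rationals (standing in for the uncountable reals): between any two
#    distinct values, the midpoint lies strictly between them.
x, y = Fraction(1, 3), Fraction(1, 2)
assert x < (x + y) / 2 < y
```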

-rob


Alan Ruttenberg-2

Thanks for your comments Rob. It occurs to me that both the XML Schema  
and the OWL working groups are in progress, that this is an issue that  
touches both groups, and that having the specifications readable in  
conjunction, without confusion as to their relationship, would benefit  
both the W3C's overall efforts towards harmonization and the  
implementors and users of those specifications.

Perhaps we can take advantage of the felicitous timing to ensure that  
our respective specifications are consistent with each other by  
ensuring that terms are used in the same way, if necessary adding  
clarification, and by attempting to have any additional datatype  
concepts needed for a good OWL specification be incorporated into the  
XML Schema specification.

Towards the end of understanding the terminology, I've been trying to  
understand what the value space of XML Schema means, given that it  
doesn't mean what one would expect in a mathematical sense. Similarly,  
an underlying type for the date types seems to be missing: although  
there is reference to timeOnTimeline, this value type is not surfaced  
in the type hierarchy.

One thought is that a correct interpretation might be more along the  
lines of considering the value spaces as data structures. In favor of  
this interpretation are the fact that floats and integers are clearly  
distinct, and the stated influences: Java, SQL, and machine-independent  
data types. Against it is that it makes it harder to interpret the  
integers as a restriction of the decimals, since arbitrary-precision  
decimals are represented by a different data structure, and to  
understand the difference between base64Binary and hexBinary, which  
are represented on the machine as the same data structure.

Another thought is that the value spaces are another aspect of lexical  
expression. This would account well for there being a difference  
between base64Binary and hexBinary, but not explain why these are not  
pattern facet restrictions on string.

Finally, I wonder if you have comments on a couple of other aspects of  
datatypes that appear in XML Schema. Specifically, datatypes that are  
derived by list, and the time and date types. Clearly such concepts,  
or similar ones, are relevant to OWL given work on, e.g., workflow or  
spatial reasoning. Where do they fit into your view of OWL class space?

-Alan

On Jul 4, 2008, at 12:46 PM, Rob Shearer wrote:




Dave Peterson-6

[Since this discussion is going to other lists, let me explain that I'm
a member of the Schema WG, and I work primarily on datatypes.  Although
this message was directed from Alan Ruttenberg to Rob Shearer, I'm
going to inject comments since it went to the general WG lists as
well.]

At 4:56 PM -0400 2008-07-04, Alan Ruttenberg wrote:

>Thanks for your comments Rob. It occurs to me that both the XML
>Schema and the OWL working groups are in progress, that this is an
>issue that touches both groups, that having the specifications be
>able to be read in conjunction without confusion as to their
>relationship would be beneficial to overall efforts of the W3C
>towards harmonization and the implementors and users of those
>specifications.
>
>Perhaps we can take advantage of the felicitous timing to ensure
>that our respective specifications are consistent with each other by
>ensuring that terms are used in the same way, if necessary adding
>clarification, and by attempting to have any additional datatype
>concepts needed for a good OWL specification be incorporated into
>the XML Schema specification.

I'll leave comments on this aspect to others who are more up on precise
development/publication schedules.

>Towards the end of understanding the terminology, I've been trying
>to understand what the value space of XML Schema means, given that
>it doesn't mean what one would expect in a mathematical sense.

I'll have to take exception to that.  I'm sure it doesn't mean what you
would expect in a mathematical sense.  But it does very definitely mean
what I would expect in a mathematical sense.  (Credentials:  PhD, U.C.
Berkeley, 1965, primarily in Analysis and Foundations of Mathematics;
Assistant Professor of Mathematics and Computer Science, and Associate
Professor of Mathematics at various times during my career.)  So please
don't generalize to an arbitrary "one" and imply that that's the only
possible reasonable expectation.

>Similarly, there seems to be missing an underlying type for the date
>types - although there is reference to timeOnTimeline, this value
>type is not surfaced in the type hierarchy.

I'd very much like to hear how you'd do this; unlike the number datatypes,
where I could envisage how to pull them together, I can't envisage a
reasonable way for all the d/t datatypes to be derived from a universal
one.  And I did try.

>One thought is that whether a correct interpretation is more along
>the lines of considering the value spaces as data structures.

I'm curious what you mean by "data structure" here.  Reading on, it sounds
like you mean various possible machine representations of the values.
Let me assure you that that's not what is meant by a value space.  In
fact, I can think of several extremely different-appearing representations
of, for example, the integers, that are nonetheless isomorphic.  They
are all potential machine representations of the values for the same
datatype.  XSD does not have anything to say about machine representations,
except to say that if an implementation has two different representations
of the same value, it is obligated to generally treat them the same.

>Another thought is that the value spaces are another aspect of
>lexical expression. This would account well for there being a
>difference between base64Binary and hexBinary, but not explain why
>these are not pattern facet restrictions on string.

base64Binary and hexBinary are different because they use entirely different
lexical mappings.  Different lexical mappings mean different datatypes.
Except for our decision to paint the two value spaces different colors
so we can tell them apart, the value spaces of these two datatypes are
the same.  (In this case, I suspect that the obvious equality across
these two value spaces would not bother anyone.  But we weren't going
to do that for some obvious datatype pairs and not others.)

They are not pattern-facet restrictions on string for the same reason that
float and double are not pattern-facet restrictions on string.  The value
spaces are different.  String values are character strings; the xxxBinary
values are bit-strings.  Bits aren't characters.
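Dave's point about the lexical mappings can be sketched in a few lines of Python: one and the same bit-string value has entirely different lexical representations under the two mappings, and both map back to the identical value:

```python
import base64
import binascii

# One and the same bit-string value...
octets = bytes([0xDE, 0xAD, 0xBE, 0xEF])

# ...rendered through the two different lexical mappings:
hex_lexical = binascii.hexlify(octets).decode("ascii")  # hexBinary style
b64_lexical = base64.b64encode(octets).decode("ascii")  # base64Binary style
print(hex_lexical)  # deadbeef
print(b64_lexical)  # 3q2+7w==

# Both lexical forms denote the identical bit-string value:
assert binascii.unhexlify(hex_lexical) == base64.b64decode(b64_lexical) == octets
```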

>Finally, I wonder if you have comments on a couple of other aspects
>of datatypes that appear in XML schema. Specifically, data types
>that are derived by list and time and date types. Clearly such
>concepts or similar are relevant to OWL given work on, e.g.
>workflow, or  in spatial reasoning. Where do they fit into your view
>of OWL class space?

You both should definitely look up the latest Public Working Draft (a
Last Call draft) for XSD.  I think it might clear up some of the questions,
hopefully providing a better understanding or description of list
datatypes and date/time datatypes.
--
Dave Peterson
SGMLWorks!


Rob Shearer-4
In reply to this post by Rob Shearer-4
>>> Putting aside the issue of whether or not it's possible to use  
>>> (only) the XML Schema datatypes to represent meaningful and  
>>> implementable OWL datatype value spaces, I expect that there is  
>>> consensus that when users were writing `xsd:float` and  
>>> `xsd:double` without values in OWL 1.0, what they really meant was  
>>> "any number".
>
> I don't know what users meant :) I would think that they should use  
> xsd:decimal if that was their intent (or perhaps the new  
> owl:rational/real).

I'm providing you with my experience: every user I've ever spoken to  
about this topic has wanted the real number line.
They are used to using the xsd datatypes `float` and `double` to  
represent number values, so they use these without values in OWL to  
mean "some number".

My experience is that the use of xsd datatypes as value spaces in OWL  
1.0 causes users to write what they don't mean. My experience is that  
*every* ontology using `xsd:float` and `xsd:double` without values  
would be better off using `xsd:decimal`, but that the user intent was  
"some real number" (and I should note that I'm against requiring  
support for `xsd:decimal` values). And my expectation is that users  
would be much less confused if this distinction between the types used  
for specific values and the types used for value spaces were clear.

To repeat: as an implementor, I did willfully implement semantics  
contradictory to the spec, and I will do so again for OWL 2.0 if the  
spec is "broken" in the same way.

> When I am working as a user, I generally, both in programming  
> languages and in kbs, am very careful about computational types and  
> numerical methods. It's easy to find extensive discussions in  
> programming language circles about the pitfalls of floats. All  
> things being equal, it doesn't seem to be that difficult to  
> recommend that they use a more suitable type such as decimal.  
> Indeed, that is what's been happening as more and more programming  
> languages bundle in decimal types.

I am also a very careful programmer, and am familiar with the details  
of the IEEE spec.

All the good programmers I've ever worked with are aware of the basic  
problems with floats but almost always use them when they mean "any  
real number" anyway. The mental model I use, and that I encouraged  
among junior programmers, was that floats were "real numbers, but  
assume that they wiggle around a little all the time". Not technically  
correct, but a safe and useful mental model for programming. The point  
being that density of the number line is *not* an issue programmers  
encounter as a matter of course, and one for which their natural  
intuition might well be wrong.
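The "wiggle" model is easy to demonstrate; a minimal Python sketch of the standard pitfalls (1/3 has no exact IEEE double, and rounding error surfaces in ordinary arithmetic):

```python
from fractions import Fraction

# 1/3 has no exact IEEE double; what is actually stored is a nearby
# rational with a power-of-two denominator.
third = 1 / 3
assert Fraction(third) != Fraction(1, 3)
assert Fraction(third) == Fraction(6004799503160661, 2 ** 54)

# The familiar "wiggle": rounding error surfaces in ordinary arithmetic.
assert 0.1 + 0.2 != 0.3
```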

>>> No user ever intended to restrict the semantic space to a nowhere-
>>> dense number line. If the OWL spec presupposes that most of our  
>>> users would prefer a number line which does not include 1/3, my  
>>> choice as an implementor would be to once again ignore the spec  
>>> and be intentionally non-compliant.
>
> An alternative choice is to signal that a repair is required and  
> perhaps do it.

I hereby signal that a repair to the OWL spec is required. (Are we  
really pretending that everybody thought datatypes in OWL 1.0 were  
fine and dandy?)

>>> Doing what all my users want and expect in this case turns out to  
>>> be way way easier than doing what a broken spec would require. Any  
>>> working group who would produce such a spec would clearly be  
>>> putting their own interests (ease of spec authoring and political  
>>> considerations) above their duty to their intended users.
>
> I think your rhetoric flew ahead of reality here. It's not actually  
> easier to spec this (as the ongoing battle has shown :)). As you  
> well know, it's much easier to give in to Boris than not to :) I  
> don't believe I'm particularly motivated by political considerations  
> per se. I do think that departing from existing behavior  
> (disjointness) and normal meaning (in computer science) needs to be  
> done carefully.
Let me expand upon my rhetoric:

1. Users want a (dense) real number line.
2. Users expect a (dense) real number line when they write `xsd:float`  
in OWL 1.0 ontologies.
3. OWL 1.0 implementations reason as though the `xsd:float` value  
space is dense.
4. The OWL 1.0 specifications state that the `xsd:float` value space  
is nowhere-dense.

If you disagree about the first two points then it's certainly worth  
discussion: Alan's [investigation](http://lists.w3.org/Archives/Public/public-owl-wg/2008Jul/0103.html 
) seems to support my experience on point 1. I have yet to see a  
single counter-example to point 2---and I've asked many users what  
they meant when they wrote their datatype restrictions.

I admit I haven't done a comprehensive survey on point 3, but it's a  
point of fact and not opinion so we should be able to gather evidence  
one way or the other.

The crux of my rhetoric is that points 1--3 (if you accept them)  
completely and utterly trump point 4. "Existing behavior" is *not*  
what the OWL 1.0 spec says. It's what OWL users (implementors and  
ontology authors) are doing.
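To make the gap between points 3 and 4 concrete, here is an illustrative Python sketch (not any reasoner's actual code) of what the nowhere-dense reading entails: the number of doubles strictly inside an open interval is finite and exactly countable from the bit patterns, so a sufficiently large cardinality restriction over that interval is inconsistent under the spec's semantics, yet trivially satisfiable over the reals:

```python
import math
import struct

def doubles_between(a: float, b: float) -> int:
    """Count the IEEE doubles strictly between two positive doubles a < b.

    For positive finite doubles, the bit pattern viewed as an integer
    is monotonic in the value, so counting is simple subtraction.
    """
    to_bits = lambda x: struct.unpack("<q", struct.pack("<d", x))[0]
    return to_bits(b) - to_bits(a) - 1

# Five successor steps above 1.0:
b = 1.0
for _ in range(5):
    b = math.nextafter(b, math.inf)

# Exactly 4 doubles lie strictly between 1.0 and b.  Under the
# nowhere-dense reading, a restriction demanding 5 distinct values in
# the open interval (1.0, b) is therefore inconsistent; under the dense
# reading users expect, it is trivially satisfiable.
assert doubles_between(1.0, b) == 4
```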

> Given that some people have already asked for NaN support (of some  
> form) and that one of the most championed use cases is managing  
> scientific computation results, I don't think we can be too quick to  
> alter things.

I agree that it's an issue, and as a member of the public I don't  
intend to get mightily bogged down in details of the solution to be  
chosen. I'd think that NaN occurs quite rarely, and that semantics  
such as "any real" would suffice, but I don't have strong opinions on  
the issue.

>>> (Note that in the course of the discussion I read on public-owl-wg  
>>> the notions of "dense" and "continuous" seem to have become  
>>> confused. I think the notion of density is probably the only one  
>>> that makes a difference in terms of current OWL semantics, since  
>>> number restrictions can cause inconsistencies in non-dense number  
>>> lines, but continuity is really what users have in their heads.)
>>>
>>> The [XML Schema datatype spec](http://www.w3.org/TR/xmlschema-2/)  
>>> is focused on representing particular values, not on classes of  
>>> values. The notion of "value spaces" is used within the spec, but  
>>> only in service of representation of values
>
> I'm not sure what you mean. It seems clear that the spec is all  
> about classes of values (i.e., types) and their relations.
I mean that the problems that spec is designed to solve involve  
values, not sets of values. The most complex reasoning the XML Schema  
people have in mind is model checking, not satisfiability and  
consistency reasoning. Thus we can't necessarily expect their spec to  
have addressed all the issues which arise in our quite different  
context.

>>> I strongly encourage the working group to publish a spec which  
>>> provides for the following types of semantic spaces:
>>>
>>> 1. A countably infinite, nowhere-dense datatype. I.e. the integers.
>>>
>>> 2. A countably infinite, dense datatype. I.e. strings.
>>>
>>> 3. An uncountably infinite, dense, continuous datatype. I.e. the  
>>> reals.
>
> These are all on the agenda. The first two were in OWL1 and the  
> third is being worked on as part of the n-ary data predicate  
> proposal, but is separate from it (i.e., I believe it will be added  
> regardless of the fate of n-ary).
>
> (Note that this will likely be the algebraic reals and only rational  
> constants. So, no transcendentals. I'd be interested in your view on  
> that. I can imagine adding the trans. but would prefer to defer it  
> until a later iteration.)
This is getting ridiculous---so you're saying you think there is a  
substantial user base who need to be able to specify that a value is  
the solution to some algebraic equation? I have absolutely no idea  
what perspective the working group is taking here---what implementor  
or user has expressed interest in anything other than the real number  
line???

Can you guys please just come up with a version of the [`numeric`](http://www.w3.org/TR/xmlschema-2/#rf-numeric 
) notion? Pretty please?

>>> I don't particularly care what each of these three is called; as  
>>> long as OWL specifies the internal semantics of these three types  
>>> of spaces, then it's straightforward to "implement" the datatypes  
>>> users will actually want in terms of them. But, of course, the  
>>> ability to use XML Schema Datatypes to encode specific values  
>>> within each of these spaces would be quite convenient
>
> Do you mean the lexical spaces?

I mean the only time I explicitly want XML Schema is when my  
implementation is parsing specific values provided by the user. If you  
happen to re-use the XML Schema spec for other things that is for your  
own convenience, not mine.

>>> ---and would use the XML Schema specification for *exactly* what  
>>> it's good at.
>
> The additional question is whether to require additional types that  
> are not the above three. Among these are float and double. My belief  
> is that if we are going to add such datatypes as required, and we  
> are going to take them from xsd, then they should reflect the  
> semantics of those types and our advice to users is to only use them  
> if they specifically intend those semantics.

I'd guess that using xsd names for value spaces will just (continue  
to) confuse users.
More importantly, and yet again, I have never ever encountered a user  
who would prefer to use the `float` or `double` value spaces if a  
`real` value space were available. If there are users who feel the  
other way, then please produce them---merely hypothesizing their  
theoretical existence does not seem useful. (I grant that the class is  
satisfiable. I contend that its size is vanishingly small in practice.)

> The n-ary predicate definition system will, at most, be over the  
> core three types above (e.g., polynomial or linear inequations with  
> rational coefficients over the reals ). However, one can pretty  
> easily imagine a predicate definition system that was focused on the  
> floats and was sensitive to the various semantics. It wouldn't have  
> to be direct floating point based equations, but an interval  
> arithmetic system which was designed to help reason about  
> measurements and computations (and their errors).

I care not a whit for n-ary datatypes. I might implement them if  
they're in the spec; I might not. But if the spec says you need to use  
n-ary datatypes to get real numbers, and leaves the issues raised with  
the `float` value space in place, I will ignore the spec and implement  
the real number line for unary datatypes. Just like I did for OWL 1.0.  
As a member of the public, that is my feedback to the working group.

> I grant entirely that that use case is quite speculative at the  
> moment. But given that 1) we have alternatives for the "any number"  
> type and 2) cardinality reasoning with the floats is not very much  
> more difficult than with user-defined finite ranges over the  
> integers (except for the fact that users have to do much more work  
> to get there), I don't think we should muck with the semantics of  
> floats.

I strongly disagree with 2. I don't want my implementation to care  
about the difference between `double` and `float`, and I consider any  
line of code I write involving the internals of float representation  
to be a wasted line of code, because my users really don't care.

Much more importantly, it's my job to turn your spec into user-facing  
documentation and support, and there is not a chance in hell I'm going  
to explain this issue to my users. They don't care, and they don't  
want the semantics you are describing. Experience with OWL 1.0 has  
demonstrated this.

> Your feedback and insight are, as always, appreciated. I hope you  
> see that my position doesn't *quite* fall into the error you are  
> rightly concerned with. There's still the problem of educating  
> people about float and double, but that is a problem of long  
> standing :)
>
> I'll also admit up front that I *like* float and double as they are.  
> I think that IEEE binary floating point is an amazingly clever thing.  
> But then, I've always worked in programming languages that had  
> bigints and fractions available, so been spoiled for choice :)
I'm a big fan of balanced ternary. But I don't intend to implement  
that, either.

-rob




Alan Ruttenberg-2
In reply to this post by Dave Peterson-6

On Jul 5, 2008, at 1:58 AM, Dave Peterson wrote:

>> Towards the end of understanding the terminology, I've been trying  
>> to understand what the value space of XML Schema means, given that  
>> it doesn't mean what one would expect in a mathematical sense.
>
> I'll have to take exception to that.  I'm sure it doesn't mean what  
> you
> would expect in a mathematical sense.  But it does very definitely  
> mean
> what I would expect in a mathematical sense.  (Credentials:  Phd, U.C.
> Berkeley, 1965, primarily in Analysis and Foundations of Mathematics;
> Assistant Professor of Mathematics and Computer Science, and Associate
> Professor of Mathematics at various times during my career.)  So  
> please
> don't generalize to an arbitrary "one" and imply that that's the only
> possible reasonable expectation.

I'm sorry for the overgeneralization and didn't mean to insult. It's  
just that as much as I think about it, I can't understand the idea  
that the value space of floats and the value space of decimal are  
disjoint. Fundamentally these represent some of the same real numbers  
and this isn't reflected in the spec. In addition, many numbers that  
can be finitely expressed, and calculated with, find no place in  
*any* of the value spaces, e.g. 1/3. It is this sense of  
"mathematical" that I was referring to.

I have looked at the functions and operators specification. I  
understand how you come to your previous points about different  
choice of equality, as the specification promotes decimal to float.  
As a matter of clarity, I probably would have called the comparison  
not "equality" but "equality as floats" and "equality as doubles".
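The "equality as floats" point can be illustrated in Python: promotion to float identifies decimals that are distinct as exact values. (The long literal below is simply a truncation of the decimal expansion of the double nearest to 0.1; it is chosen for illustration, not taken from any spec.)

```python
from decimal import Decimal

# Two decimals that are distinct as exact values; the second is a
# truncation of the decimal expansion of the double nearest to 0.1.
d1 = Decimal("0.1")
d2 = Decimal("0.1000000000000000055511151231257827")

assert d1 != d2                # unequal as exact decimals...
assert float(d1) == float(d2)  # ...but "equal as doubles" after promotion
```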

Considering the definition of equality, I would ask: Is that  
something someone would do if they weren't constrained to use  
floating point numbers? It is a perfectly reasonable thing to do if  
you don't have any more expressive numeric types, just as it is  
perfectly reasonable to throw an exception when a  
multiplication of integers exceeds the limit of the integer datatype.  
However we now have libraries that support arbitrary precision  
integer and rational numbers. Floats can be promoted to the latter  
without loss of precision, as can decimal. Again, no addressing of  
this in the spec, nor any theoretical justification of how it is even  
possible to do an exact (sometimes) promotion of a decimal value to a  
float value if their value spaces are disjoint. Maybe there's a way  
to make sense of this. I'm trying.
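The lossless-promotion observation is easy to check in Python with the standard library's `Fraction`, which models exactly the arbitrary-precision rationals mentioned above:

```python
from decimal import Decimal
from fractions import Fraction

# Every finite IEEE double is an exact rational (here an odd numerator
# over a power of two), so promoting it to an arbitrary-precision
# rational loses nothing:
exact = Fraction(0.1)
assert exact == Fraction(3602879701896397, 36028797018963968)
assert float(exact) == 0.1  # round-trips exactly

# Decimals are rationals with a power-of-ten denominator, so they too
# promote without loss:
assert Fraction(Decimal("0.1")) == Fraction(1, 10)
```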

To offer a concrete suggestion (I'll get to putting something into  
the bug tracker...), and speaking to the possibility of harmonizing  
the OWL specification and the XSD specification, something to  
consider would be to add xsd:real and xsd:rational. This could at  
least prevent the (strong) possibility of OWL defining those types  
itself. Personally, I think it would be cleaner to have all the  
numeric types handled in the XML Schema documents.  I realize that  
this might be a bit of work, but at least that work would have  
interested parties from both the OWL and XSD WGs.

I'd also consider reviewing the part of the spec that says:
> Should a derivation be made using a derivation mechanism that  
> removes ·lexical representations· from the ·lexical space· to the  
> extent that one or more values cease to have any ·lexical  
> representation·, then those values are dropped from the ·value space·.

I've still no understanding of why that is a desirable thing to do,  
and we've discussed aspects that some might consider undesirable.

>> Similarly, there seems to be missing an underlying type for the  
>> date types - although there is reference to timeOnTimeline, this  
>> value type is not surfaced in the type hierarchy.
>
> I'd very much like to hear how you'd do this; unlike the number  
> datatypes,
> where I could envisage how to pull them together, I can't envisage a
> reasonable way for all the d/t datatypes to be derived from a  
> universal
> one.  And I did try.

I had in mind subtyping the dates into those with and those without a  
timezone, and having each descend from a separate timeOnTimeline.

>> One thought is whether a correct interpretation is more along  
>> the lines of considering the value spaces as data structures.
>
> I'm curious what you mean by "data structure" here.  Reading on, it  
> sounds
> like you mean various possible machine representations of the values.
> Let me assure you that that's not what is meant by a value space.  In
> fact, I can think of several extremely different-appearing  
> representations
> of, for example, the integers, that are nonetheless isomorphic.  They
> are all potential machine representations of the values for the same
> datatype.  XSD does not have anything to say about machine  
> representations,
> except to say that if an implementation has two different  
> representations
> of the same value, it is obligated to generally treat them the same.

Again, it is trying to wrestle with the disjointness of float and  
decimal value spaces that is leading me to look for some explanation.  
While XSD does not explicitly speak about machine representation,  
that does not mean that those concepts do not (overly) influence the  
specification. To explain my approach a bit further: I spend a lot of  
time developing ontologies, and searching out unspoken but operant  
knowledge and constraints, then exposing them, is a common aspect of  
this work.

What I specifically meant by data structure in this case was the  
little data structure that is a floating point number, composed of  
parts: an integer mantissa, an integer exponent, a sign bit, plus  
some symbol encodings. I compared that to integer, which doesn't have  
these parts. However, decimal seems necessarily to be composed of  
different kinds of parts.
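That decomposition is directly observable; a minimal sketch, assuming Python and IEEE 754 doubles (the example value 6.5 is arbitrary):

```python
# A double really is a little record of parts: sign bit, exponent,
# mantissa.  Pull them out of the raw IEEE 754 bits.
import math
import struct

x = 6.5
bits = struct.unpack(">Q", struct.pack(">d", x))[0]
sign = bits >> 63
exponent = (bits >> 52) & 0x7FF          # 11-bit biased exponent field
mantissa = bits & ((1 << 52) - 1)        # 52-bit fraction field
assert sign == 0                         # 6.5 is positive

# math.frexp exposes the same structure arithmetically: x = m * 2**e
m, e = math.frexp(x)
assert m * 2 ** e == x
```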

>> Another thought is that the value spaces are another aspect of  
>> lexical expression. This would account well for there being a  
>> difference between base64Binary and hexBinary, but not explain why  
>> these are not pattern facet restrictions on string.
>
> base64Binary and hexBinary are different because they use entirely  
> different
> lexical mappings.  Different lexical mappings mean different  
> datatypes.

But not disjoint value spaces.

> Except for our decision to paint the two value spaces different colors
> so we can tell them apart,

Why would one want to tell them apart? Why not consider a single  
lexical mapping that has a disjunction? Just as more than one lexical  
representation can map to the same float, more than one lexical  
representation of a bit sequence can map to the same value.
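The point that two lexical mappings can share one value space of bit sequences can be sketched in a few lines of Python (standard library only; the sample bytes are arbitrary):

```python
# Two different lexical mappings (hex and base64) over one and the
# same value space of bit sequences.
import base64
import binascii

raw = b"\xde\xad\xbe\xef"                     # an arbitrary bit sequence
hex_form = binascii.hexlify(raw).decode()     # 'deadbeef'
b64_form = base64.b64encode(raw).decode()     # '3q2+7w=='

# Different lexical forms, identical value:
assert binascii.unhexlify(hex_form) == base64.b64decode(b64_form) == raw
```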

> the value spaces of these two datatypes are
> the same.  (In this case, I suspect that the obvious equality across
> these two value spaces would not bother anyone.  But we weren't going
> to do that for some obvious datatype pairs and not others.)

It's the obviousness, and the spec's decision not to respect that  
obviousness, that is my concern.

> They are not pattern-facet restrictions on string for the same  
> reason that
> float and double are not pattern-facet restrictions on string.  The  
> value
> spaces are different.  String values are character strings; the  
> xxxBinary
> values are bit-strings.  Bits aren't characters.

Fair enough. My mistake.

>> Finally, I wonder if you have comments on a couple of other  
>> aspects of datatypes that appear in XML schema. Specifically, data  
>> types that are derived by list and time and date types. Clearly  
>> such concepts or similar are relevant to OWL given work on, e.g.  
>> workflow, or  in spatial reasoning. Where do they fit into your  
>> view of OWL class space?
>
> You both should definitely look up the latest Public Working Draft (a
> Last Call draft) for XSD.  I think it might clear up some of the  
> questions,
> hopefully providing a better understanding or description of list
> datatypes and date/time datatypes.

Have been. Will be doing more.

-Alan

RE: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

Michael Kay
In reply to this post by Dave Peterson-6

> >Similarly, there seems to be missing an underlying type for the date
> >types - although there is reference to timeOnTimeline, this
> value type
> >is not surfaced in the type hierarchy.
>
> I'd very much like to hear how you'd do this; unlike the
> number datatypes, where I could envisage how to pull them
> together, I can't envisage a reasonable way for all the d/t
> datatypes to be derived from a universal one.  And I did try.

Since the types date, time, dateTime, gYear, gYearMonth, gMonth, gMonthDay,
and gDay are disjoint in both their value spaces and lexical spaces, I would
have thought it quite easy to define a primitive type that is essentially
the union of all of these (it might or might not be abstract), and derive
these 8 types from this new type by restriction. Where exactly is the
difficulty?

The QT operations on dates and times could be greatly simplified if this
were done (well, perhaps not retrospectively...)

>
> base64Binary and hexBinary are different because they use
> entirely different lexical mappings.  Different lexical
> mappings mean different datatypes.

This is certainly an unfortunate feature of the system. Clearly one would
like all operations defined on one of these types to be equally applicable
to the other. Having two different external representations of the values is
really a very weak justification for making them different types. Of course
it's too late to change this; but I'm sure it could have been done better. I
would hope that if we introduced hexadecimal notation as an alternative
lexical representation of integers we would find some way of doing it that
didn't involve introducing a new primitive type.

Michael Kay
http://www.saxonica.com/



Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

Rob Shearer-4
In reply to this post by Alan Ruttenberg-2
>> I'm providing you with my experience: every user I've ever spoken  
>> to about this topic has wanted the real number line.
>> They are used to using the xsd datatypes `float` and `double` to  
>> represent number values, so they use these without values in OWL to  
>> mean "some number".
>
> Do they mean bounded numbers? (i.e. with min and max sizes?) Do they  
> distinguish between double and float? Do they care about NaNs?  
> (Alan's users care about the latter.)

Whether it's "forall R > 1.0^^xsd:float" or "forall R `xsd:float`"  
they seem to intend a dense number line. In the first case `float` is  
just the easiest way to specify the value; in the second you can  
certainly argue that they should have used `decimal`...but that's a  
pointless argument because my reasoner didn't really support decimal.

>> My experience is that the use of xsd datatypes as value spaces in  
>> OWL 1.0 causes users to write what they don't mean.
>
> For me, this would suggest removing them or enforcing them more  
> clearly.

I'd suggest removing them.

>> My experience is that *every* ontology using `xsd:float` and  
>> `xsd:double` without values would be better off using  
>> `xsd:decimal`, but that the user intent was "some real number" (and  
>> I should note that I'm against requiring support for `xsd:decimal`  
>> values).
>
> Values? Or the datatype? In OWL 1, all these types were optional and  
> poorly speced and had no documentation whatsoever. Part of the goal  
> here is to spec well and document clearly any types we require.

I would like to use doubles internally to represent points on the real  
number line. A heterogeneous mix of internal representations is a  
pain. And I seriously doubt that many users really care about the  
extra representational power of `decimal`. It makes sense as an optional  
feature reasoners can support, but it seems completely unnecessary to  
require it in the spec---it's exactly the sort of thing I'd put off  
implementing indefinitely unless users asked for it.

The reason `decimal` keeps coming up is just that it's dense. So are  
we using the xsd spec as an excuse to conflate density with complex  
internal representations?

>> And my expectation is that users would be much less confused if  
>> this distinction between the types used for specific values and the  
>> types used for value spaces were clear.
>
> I don't understand this distinction. Every datatype is used for a  
> value space. Some value spaces are finite and some are infinite.  
> Some are dense and some are not. etc. So, please clarify what you  
> mean by this distinction.

A particular value is a point on the number line. XSD offers plenty of  
different lexical representations for such points. I support using XSD  
types to specify such points.

A "value space" (for numerics, at least) is a (possibly infinite) set  
of points on the number line. Although each xsd type is associated  
with some value space, I think xsd is a really really crap spec for  
value spaces. The entire spec is oriented in terms of lexical  
representation, not value spaces: the type hierarchy, for example, is  
a hierarchy of lexical representations, not a hierarchy of value  
spaces. Referring any user over to that spec to understand value  
spaces is obnoxious and counter-productive: even WG members seem to be  
having trouble grokking it. (And bravo to anyone making the pedantic  
point that a particular value is a degenerate value space.)

I contend that OWL users only want a tiny tiny number of different  
value spaces to play with: integers, strings, and reals.
It is possible, however, that they will want a larger number of ways  
to lexically represent particular values within these three spaces.
Most importantly, I do not think there is necessarily a direct  
correlation between the lexical representations used to represent  
particular values and the value spaces in which those particular  
values live. I.e. users want to be able to specify particular values  
within the `real` value space using `xsd:float`, but they do *not*  
have any interest in use of the `xsd:float` value space.

Thus we've got two orthogonal concepts which happen to coincide for  
strings and integers but not for real numbers.

My proposed solution would be to use brand-new OWL names for all value  
spaces, but use xsd syntax to specify particular values.

>> To repeat: as an implementor, I did willfully implement semantics  
>> contradictory to the spec,
>
> The spec made all the types except string and integer optional, and  
> didn't say much about them.
>
> It's true that they did defer to the XSD spec, so perhaps you meant  
> you violated that? Could you be specific in how you violated it? For  
> example:
>
> If an integer typed value appeared as the object of a float ranged  
> property, was the KB inconsistent?
No---I made the integer number line a subset of the real number line.

> If a float typed value was outside the range of xsd:float and used  
> in an assertion, was the KB inconsistent?

For the restriction "forall R `xsd:float`" I simply bounded the real  
number line at the min and max values of floats. Still a dense,  
infinite number line, but with bounds. I hated this usage, however,  
and would prefer if it became illegal.

> If a float typed value was NaN (i'm not sure what the constant is  
> there), was the KB consistent?

I don't entirely recall; I think NaN became either "any real" or "no  
real". Despite asking around a bit, I couldn't find any users who had  
strong opinions on the topic, however.

> I presume from what you said having a min cardinality which was  
> larger than the size of the floats on a float ranged property was  
> consistent. But I'm skeptical that this ever occurred in the wild  
> (in this way :))

Other than a little internal test in our suite to remind me of the  
issue, I also doubt that my use of the real number line instead of the  
float number line ever arose.

> Did you have a user defined type derived from float that was fairly  
> small in range (e.g., -1.0 to 1.0) that interacted with a  
> cardinality restriction? Did you support user defined types?

We supported ranges, but I doubt any users wrote a range so small (and  
a cardinality restriction so big) that the issue ever arose. The hard  
data is that I never got any bug reports on the topic; the soft data  
is that when I interrogated users about their intent they were  
surprised by the notion that the set of floats was finite. (They  
understood this in principle but had no intention of deriving  
inconsistency as a result.)

>> and I will do so again for OWL 2.0 if the spec is "broken" in the  
>> same way.
>
> I think we can take this as read for the moment :)
>
>>> When I am working as a user, I generally, both in programming  
>>> languages and in kbs, am very careful about computational types  
>>> and numerical methods. Its easy to find extensive discussions in  
>>> programming language circles about the pitfalls of floats. All  
>>> things being equal, it doesn't seem to be that difficult to  
>>> recommend that they use a more suitable type such as decimal.  
>>> Indeed, that is what's been happening as more and more programming  
>>> languages bundled in decimal types.
>>
>> I am also a very careful programmer, and am familiar the details of  
>> the IEEE spec.
>>
>> All the good programmers I've ever worked with are aware of the  
>> basic problems with floats but almost always use them when they  
>> mean "any real number" anyway.
>
> But the systems don't respect that.
The systems respect their own semantics, and as a programmer I  
determined that the needs I had for reals were served sufficiently  
well by the language semantics for floats. And occasionally a tiny bit  
of extra code (i.e. turning a few equality tests into distance tests).
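The "distance test" trick mentioned here is the standard one; a minimal sketch in Python (the tolerance value is an arbitrary choice, not from the thread):

```python
# "Turning equality tests into distance tests": after float arithmetic,
# exact equality is fragile, so compare within a tolerance instead.
import math

a = 0.1 + 0.2
assert a != 0.3                       # exact equality fails

# distance test: a and 0.3 agree within a relative tolerance
assert math.isclose(a, 0.3, rel_tol=1e-9)
```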

> Ye old Pascal used the name "real" for binary floats, but nobody  
> would go for that today. People certainly think there are bounds, at  
> least, and they are aware of the exactness problems.
>
>> The mental model I use, and that I encouraged among junior  
>> programmers, was that floats were "real numbers, but assume that  
>> they wiggle around a little all the time". Not technically correct,  
>> but a safe and useful mental model for programming.
>
> For some programming. For lots, not (since the error ranges can go  
> *really* wide unexpectedly). For lots of things this does matter  
> quite a bit. For other things it doesn't.
>
>> The point being that density of the number line is *not* an issue  
>> programmers encounter as a matter of course, and one for which  
>> their natural intuition might well be wrong.
>
> That's true. Bounds and inexactness are more obvious because of the  
> standard rounding action of float operations. But we aren't doing  
> rounding here, we are (potentially) doing counting. I agree that  
> people's intuitions are bad about counting, but I don't think it's  
> that hard to grasp that floats are more like an enumeration of  
> integers. It has to be explained of course.
To what end do we want to explain this? Help me finish this  
conversation:

user: Why is this KB inconsistent?
me: You've said that this needs to be a float, and there's this  
cardinality restriction, and there are only so many floats.
user: When I wrote float I just meant a number. Isn't that obvious?
me: Well, yes, it is obvious, but it's possible you meant something  
else, and the spec says...
user: But if it's completely obvious what I meant, then why didn't the  
system just do what I meant?
me: ...

In almost all cases we know that the user doesn't mean what he wrote.  
So why would we pass it through, produce a bug, and then try to teach  
the user the crazy semantics she never actually wanted to begin with?
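The "only so many floats" step in that dialogue is concrete and checkable: for non-negative IEEE floats, the bit patterns are ordered the same way as the values, so counting the `xsd:float` values in a range reduces to integer subtraction. A sketch, assuming Python and single-precision IEEE 754 (the helper names are made up for illustration):

```python
# Count how many xsd:float values lie in a closed range by comparing
# the IEEE 754 single-precision bit patterns as integers.
import struct

def float_bits(x: float) -> int:
    """Bit pattern of x as an IEEE 754 single-precision float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def count_floats(lo: float, hi: float) -> int:
    """Number of representable floats in [lo, hi], for 0 <= lo <= hi."""
    return float_bits(hi) - float_bits(lo) + 1

# Only 2**23 + 1 floats lie between 1.0 and 2.0 inclusive, so a
# cardinality restriction above that is unsatisfiable over xsd:float:
print(count_floats(1.0, 2.0))        # 8388609
```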

>>>>> No user ever intended to restrict the semantic space to a  
>>>>> nowhere-dense number line. If the OWL spec presupposes that most  
>>>>> of our users would prefer a number line which does not include  
>>>>> 1/3, my choice as an implementor would be to once again ignore  
>>>>> the spec and be intentionally non-compliant.
>>>
>>> An alternative choice is to signal that a repair is required and  
>>> perhaps do it.
>>
>> I hereby signal that a repair to the OWL spec is required.
>
> I meant repair of the ontology :)
I didn't support `xsd:decimal`.

>> (Are we really pretending that everybody thought datatypes in OWL  
>> 1.0 were fine and dandy?)
>
> Not at all. We're trying to do a much better job here. The design  
> choice we're faced with is whether to include floats at all (as  
> required), and if so, exactly what to spec them as. We'll include an  
> owl:real type (I hope) which will *really* be the reals, and perhaps  
> require decimal as well. A lot of the time, programmers use floats  
> for reals because there is no other choice (or they think  
> computation performances is critical). We're in a somewhat different  
> situation.
>
> Clear education material is definitely needed.
>
>>>>> Doing what all my users want and expect in this case turns out  
>>>>> to be way way easier than doing what a broken spec would  
>>>>> require. Any working group who would produce such a spec would  
>>>>> clearly be putting their own interests (ease of spec authoring  
>>>>> and political considerations) above their duty to their intended  
>>>>> users.
>>>
>>> I think your rhetoric flew ahead of reality here. It's not  
>>> actually easier to spec this (as the ongoing battle has shown :)).  
>>> As you well know, it's much easier to give in to Boris than not  
>>> to :) I don't believe I'm particularly motivated by political  
>>> considerations per se. I do think that departing from existing  
>>> behavior (disjointness) and normal meaning (in computer science)  
>>> needs to be done carefully.
>>
>> Let me expand upon my rhetoric:
>>
>> 1. Users want a (dense) real number line.
>
> Agreed. (But we have decimal and are going to offer real.)
>
>> 2. Users expect a (dense) real number line when they write  
>> `xsd:float` in OWL 1.0 ontologies.
>
> Unclear to me. Further, it's unclear to me whether we should respect  
> that or work against it.
This is an easy point for the WG to establish. Grab a whole load of  
OWL 1.0 ontologies that use `xsd:float` without values, track down the  
authors, and ask them. Absent cajoling from the interrogator, I'm  
willing to bet big money on the results.

>> 3. OWL 1.0 implementations reason as though the `xsd:float` value  
>> space is dense.
>> 4. The OWL 1.0 specifications state that the `xsd:float` value  
>> space is nowhere-dense.
>
> By reference to xsd, yes.
>
>> If you disagree about the first two points then it's certainly  
>> worth discussion: Alan's [investigation](http://lists.w3.org/Archives/Public/public-owl-wg/2008Jul/0103.html 
>> ) seems to support my experience on point 1.
>
> See above.
>
>> I have yet to see a single counter-example to point 2---and I've  
>> asked many users what they meant when they wrote their datatype  
>> restrictions.
>
> Let me stipulate that for a minute. I trust you would contend this  
> for programming languages too.
I don't contend that for programming languages at all. Programmers  
understand that their data types require concrete representation, and  
there is no reasonable concrete representation for all reals.

OWL does *not* need concrete representations to reason about data types.

> But programming languages don't silently substitute a dense value  
> space for floats. I would contend that most people expect exact  
> computations from their reals too, but again, programming languages  
> don't do that (though, amazingly to me, calculators do! I did some  
> testing last year and many even simple online calculators use reals  
> internally so you can go from 1/3 * 3 and get 1 again.)

Double or nothing on the last bet: they don't use reals  
internally. Rationals, maybe. To be clear: OWL can handle the real value  
space. Standard programming languages cannot manipulate arbitrary reals.
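The rationals-versus-floats distinction being argued here is easy to demonstrate; a minimal sketch in Python (the standard `fractions` module; the 1/49 case is an illustrative pick):

```python
# Float arithmetic only sometimes rounds back to the intuitive answer;
# rational arithmetic is exact every time.
from fractions import Fraction

assert (1 / 3) * 3 == 1.0        # lucky rounding in IEEE doubles...
assert (1 / 49) * 49 != 1.0      # ...and here the luck runs out

assert Fraction(1, 3) * 3 == 1   # exact with rationals
assert Fraction(1, 49) * 49 == 1
```

A calculator that always "gets 1 again" is therefore doing something like the second pair, not genuine real arithmetic.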

> There are lots of features of OWL that aren't obvious to many users  
> (the open world assumption, the unique name assumption). This isn't  
> a reason to blithely ignore user instincts, of course. Far from it!  
> But it is not immediate.

OWL includes OWA and discards UNA because its authors assume that most  
of its users will benefit from these choices. In those cases user  
needs and user expectations seem to be at odds, so hard choices need  
to be made. If users both want and expect real numbers you'd be crazy  
to do anything else.

[snip]

>>> (Note that this will likely be the algebraic reals and only  
>>> rational constants. So, no transcendentals. I'd be interested in  
>>> your view on that. I can imagine adding the trans. but would  
>>> prefer to defer it until a later iteration.)
>>
This is getting ridiculous---so you're saying you think there is a  
substantial user base who need to be able to specify that a value  
is the solution to some algebraic equation?
>
> Yes. Remember that we are (considering) adding linear and perhaps  
> polynomial inequations. This has been requested by users (including  
> those working on commercial projects, see:
> http://www.w3.org/2007/OWL/wiki/N-ary_Data_predicate_use_case
> for some examples)
It would be crazy to make that stuff a core part of OWL.

> If we rule out the irrationals, then we have intuitively solvable  
> equations which are not solvable in the rationals.
>
> For writing down values, it seems that the rationals are enough for  
> a wide range of cases where you are looking for a dense line. Until  
> you expand the range of constants (usually with equations of some  
> form) it's hard to see the utility to users of additional reals  
> (e.g., the square root of two isn't really a constant, but a radical).
>
>> I have absolutely no idea what perspective the working group is  
>> taking here---what implementor
>
> RacerPro already has some form of linear inequations over the  
> algebraic reals (and, I believe, the algebraic complex numbers). FaCT
> ++ and Pellet developers have indicated that they have interest in  
> implementation. We have several classes of users (some of which are  
> represented on the page above.)
>
> The working group does not yet have consensus on these features. It  
> also does not have a complete design.
>
>> or user has expressed interest in anything other than the real  
>> number line???
>
> The algebraic reals just are the above (i.e., the solutions to  
> polynomials with rational coefficients). The transcendental reals  
> are still more. Without transcendental coefficents or constants, you  
> can't "get" to them. So, practically speaking, it makes no  
> difference to the consistency of any knowledge base whether we spec  
> the type as being all reals or being algebraic reals.
>
> I personally would like to separate them to make augmenting the  
> lexical space a bit easier in the future. That is, if we're not  
> going to have transcendental constants, I think it makes sense to  
> *call* the type we're introducing something like "algebraic  
> reals" (and even restrict the value space). That leaves it open for  
> a future group to introduce the reals as a super type (both in the  
> lexical and in the value spaces).
But why on earth do we need special value spaces for *any* of these  
sets? Once again, what percentage of users is going to want to say  
"number that can be expressed as some rational"? Surely all these  
wacky numerics just provide more evidence that unadorned OWL  
ontologies should just be referencing the entire real number line  
instead of "accidentally" restricting it to some counter-intuitive  
subset!

>> Can you guys please just come up with a version of the [`numeric`](http://www.w3.org/TR/xmlschema-2/#rf-numeric 
>> ) notion? Pretty please?
>
> That's interesting and potentially helpful for some things (like  
> spliting off the strings from the numbers in general), but I don't  
> see it helps with our current situation.
>
>>>>> I don't particularly care what each of these three is called; as  
>>>>> long as OWL specifies the internal semantics of these three  
>>>>> types of spaces, then it's straightforward to "implement" the  
>>>>> datatypes users will actually want in terms of them. But, of  
>>>>> course, the ability to use XML Schema Datatypes to encode  
>>>>> specific values within each of these spaces would be quite  
>>>>> convenient
>>>
>>> Do you mean the lexical spaces?
>>
>> I mean the only time I explicitly want XML Schema is when my  
>> implementation is parsing specific values provided by the user. If  
>> you happen to re-use the XML Schema spec for other things that is  
>> for your own convenience, not mine.
>
> So what is your position on user defined types? We reuse that from  
> XSD too.
No use of XSD to specify value spaces at all. Only particular values.

> The core types, and even things like float, seem coherent from an  
> OWL point of view (excepting certain facets and corner issues). That  
> is, you can coherently reason over them.

The fact that a system is internally coherent is a pretty low bar. The  
goal should be a semantic model which matches user needs and  
expectations. Not always possible, but `float` and `rational` value  
spaces seem to be headed in exactly the opposite direction.

>>>>> ---and would use the XML Schema specification for *exactly* what  
>>>>> it's good at.
>>>
>>> The additional question is whether to require additional types  
>>> that are not the above three. Among these are float and double. My  
>>> belief is that if we are going to add such datatypes as required,  
>>> and we are going to take them from xsd, then they should reflect  
>>> the semantics of those types and our advice to users is to only  
>>> use them if they specifically intend those semantics.
>>
>> I'd guess that using xsd names for value spaces will just (continue  
>> to) confuse users.
>
> Seems so.
>
>> More importantly, and yet again, I have never ever encountered a  
>> user who would prefer to use the `float` or `double` value spaces  
>> if a `real` value space were available.
>
> But that suggests (to me) we not provide the float or double types.
For value spaces, I agree.

>> If there are users who feel the other way, then please produce  
>> them---merely hypothesizing their theoretical existence does not  
>> seem useful. (I grant that the class is satisfiable. I contend that  
>> its size is vanishingly small in practice.)
>
> Again, there are many aspects of the types, e.g., disjointness,  
> size, NaN, lexical space, and discreteness. As far as I can tell,  
> users have picked on several of these (while champions have claimed  
> that some parts that other people have dismissed are critical).
>
> That all being said, my personal concern with "getting floats right"  
> involves future, hypothetical use. Which is a big fucking weakness of  
> my position. But I would prefer not to require them at all than to  
> require them with "wrong" semantics. I would prefer directing people  
> who want a real type to the real type. I think that is generally  
> better for a number of reasons, including education. It's rather odd  
> to introduce primitive types with different names but no different  
> semantics, not even *intended* different semantics.
>
> Oh, of course, one reason is if the lexical spaces are different. I  
> don't have a problem giving the lexical space of our real a lot of  
> lexical freedom (in the initial proposal we suggested a fraction  
> like syntax, but we could add all sorts of variants; but you have to  
> be careful because other syntaxes sometimes require infinite  
> expansions).
>
>>> The n-ary predicate definition system will, at most, be over the  
>>> core three types above (e.g., polynomial or linear inequations  
>>> with rational coefficients over the reals ). However, one can  
>>> pretty easily imagine a predicate definition system that was  
>>> focused on the floats and was sensitive to the various semantics.  
>>> It wouldn't have to be direct floating point based equations, but  
>>> an interval arithmetic system which was designed to help reason  
>>> about measurements and computations (and their errors).
>>
>> I care not a whit for n-ary datatypes. I might implement them if  
>> they're in the spec; I might not. But if the spec says you need to  
>> use n-ary datatypes to get real numbers,
>
> No no no. The real datatype will be available, simpliciter. Its use  
> in n-ary is an *additional* motivation for it.
>
>> and leaves the issues raised with the `float` value space in place,
>
> ?
>
>> I will ignore the spec and implement the real number line for unary  
>> datatypes.
>
> With transcendentals? With what lexical space?
In theory, yes. My documentation will reference the whole real number  
line. But my parser will probably only handle `xsd:float` and  
`xsd:double` for values.

>> Just like I did for OWL 1.0. As a member of the public, that is my  
>> feedback to the working group.
>
> Thanks! Please note that all this is not settled yet.
>
>
>>> I grant entirely that that use case is quite speculative at the  
>>> moment. But given that 1) we have alternatives for the "any  
>>> number" type and 2) cardinality reasoning with the floats is not  
>>> very much more difficult than with user-defined finite ranges over  
>>> the integers (except for the fact that users have to do much more  
>>> work to get there), I don't think we should muck with the  
>>> semantics of floats.
>>
>> I strongly disagree with 2.
>
> Really? <http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm 
> >
>
> There's some special casing for various "odd" bits (NaN, etc.) but  
> this shows that sizing float ranges can be reduced to sizing integer  
> ranges. Thus, it's not fundamentally different.
>
>> I don't want my implementation to care about the difference between  
>> `double` and `float`,
>
> So you want them exactly identical?
>
>> and I consider any line of code I write involving the internals of  
>> float representation to be a wasted line of code, because my users  
>> really don't care.
>>
>> Much more importantly, it's my job to turn your spec into user-
>> facing documentation
>
> This is my job too :) Both inside and outside the working group.
>
>> and support, and there is not a chance in hell I'm going to explain  
>> this issue to my users. They don't care, and they don't want the  
>> semantics you are describing. Experience with OWL 1.0 has  
>> demonstrated this.
> [snip]
>
> Can you say exactly what the semantics are you want? I get that you  
> want them dense (and think that I'm dense :)). But I'm unclear on:
> disjointness (from each other and from decimal and its subtypes  
> like integer)
You keep referring to the value spaces specified in xsd. I don't care  
about those value spaces. I don't think they are relevant.

But integers should probably live on the same number line as the rest  
of the reals.

> range

Infinite in both directions for the number line, but if particular  
values can only be specified with xsd datatypes then users will only  
be able to specify particular values within some range.
For integers, I'd support limiting particular values to `xsd:long`  
(and would consider a spec which only required `xsd:int` reasonable).  
If you required support for any `xsd:integer` I probably wouldn't  
implement it unless there was great user demand.

> NaN like constants

I don't have a strong opinion; no contact with users who have NaN needs.

> Thanks for the feedback.
>
> Cheers,
> Bijan.

And if you're going to request further comment from a member of the  
public, could you please do it on a list to which the public can post?  
Shifting back to the WG list excludes me from comment. (Which is fine  
if you don't address questions directly to me.)

-rob



RE: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

Dave Peterson-6
In reply to this post by Michael Kay

At 12:41 PM +0100 2008-07-05, Michael Kay wrote:

>Since the types date, time, dateTime, gYear, gYearMonth, gMonth, gMonthDay,
>and gDay are disjoint in both their value spaces and lexical spaces, I would
>have thought it quite easy to define a primitive type that is essentially
>the union of all of these (it might or might not be abstract), and derive
>these 8 types from this new type by restriction. Where exactly is the
>difficulty?

I don't see that moments in time, segments of time, and repeating
intervals make up a sensible datatype.  That's my particular problem
with the idea.  E.g., how does one define order?  Is 14:00:00 less
than or equal to 1997?

However, it could be done, even if the value space seemed to contain  
apples and oranges, so to speak, just as anySimpleType and  
anyAtomicType are artificially constructed datatypes.  Why hasn't  
it been suggested before?

I'm curious how the simplification would be effected for QT.
--
Dave Peterson

[hidden email]


RE: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

Michael Kay

>
> I don't see that moments in time, segments of time, and
> repeating intervals make up a sensible datatype.  That's my
> particular problem with the idea.  

Well, one can certainly conceive of a generalization of these types that is
a three-dimensional space whose axes are the start instant (perhaps
unknown), the duration (perhaps zero), and the interval between repeats
(perhaps infinite). Alternatively, and perhaps more conveniently, you can
think of it as a seven-dimensional space containing year, month, day, hour,
minute, second, and timezone-offset, allowing components at either end to be
omitted, where the absence of a high-order component indicates a repeating
interval and the absence of a low-order component indicates a time span.

E.g., how does one define order?  Is 14:00:00 less than or equal to 1997?

You could define an ordering (if you wanted to) by filling in the gaps,
treating 14:00:00 as say 0000-01-01T14:00:00 and 1997 as
1997-01-01T00:00:00. Or you could say that the new primitive type is
unordered, only the subtypes are ordered, as we do with the two duration
subtypes.
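A gap-filling comparison along these lines can be sketched as follows (a sketch only; the `normalize` helper and its default-filling rules are assumptions for illustration, not anything from the XSD spec):

```python
def normalize(partial: str) -> tuple:
    """Fill in the missing components of a partial date/time:
    missing date parts default to 0000-01-01, missing time parts to
    00:00:00. Returns a (year, month, day, hour, minute, second)
    tuple, which Python compares lexicographically."""
    if ":" in partial:  # a bare time such as "14:00:00"
        h, m, s = (int(p) for p in partial.split(":"))
        return (0, 1, 1, h, m, s)
    parts = [int(p) for p in partial.split("-")]
    year = parts[0]
    month = parts[1] if len(parts) > 1 else 1
    day = parts[2] if len(parts) > 2 else 1
    return (year, month, day, 0, 0, 0)

# "Is 14:00:00 less than or equal to 1997?" Under this filling, yes:
print(normalize("14:00:00") <= normalize("1997"))  # True
```

Whether such an ordering is *useful* is, of course, exactly the question raised above; the alternative of leaving the primitive type unordered avoids it entirely.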
>
> I'm curious how the simplification would be effected for QT.

Difficult to do retrospectively, but with such a type, instead of XSLT
defining three functions format-date, format-time, and format-dateTime, it
could have defined a single function which would work perfectly well on all
eight types, as well as on other logically-consistent subtypes like
gHourMinute.

Michael Kay
http://www.saxonica.com/



RE: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

Dave Peterson-6

At 10:13 AM +0100 2008-07-06, Michael Kay wrote:

>  >
>>  I don't see that moments in time, segments of time, and
>>  repeating intervals make up a sensible datatype.  That's my
>>  particular problem with the idea.
>
>Well, one can certainly conceive of a generalization of these types that is
>a three-dimensional space whose axes are the start instant (perhaps
>unknown), the duration (perhaps zero), and the interval between repeats
>(perhaps infinite). Alternatively, and perhaps more conveniently, you can
>think of it as a seven-dimensional space containing year, month, day, hour,
>minute, second, and timezone-offset, allowing components at either end to be
>omitted, where the absence of a high-order component indicates a repeating
>interval and the absence of a low-order component indicates a time span.
>
>E.g., how does one define order?  Is 14:00:00 less than or equal to 1997?
>
>You could define an ordering (if you wanted to) by filling in the gaps,
>treating 14:00:00 as say 0000-01-01T14:00:00 and 1997 as
>1997-01-01T00:00:00. Or you could say that the new primitive type is
>unordered, only the subtypes are ordered, as we do with the two duration
>subtypes.
>>
>>  I'm curious how the simplification would be effected for QT.
>
>Difficult to do retrospectively, but with such a type, instead of XSLT
>defining three functions format-date, format-time, and format-dateTime, it
>could have defined a single function which would work perfectly well on all
>eight types, as well as on other logically-consistent subtypes like
>gHourMinute.

Good ideas all.  Fodder for Schema 2.0, I'd say.  It takes time to
think these things out; equality didn't diverge from identity in 1.0
because we didn't have time to think out the ramifications.  Sigh--
even standards creation is a publish-or-perish world, and if a version
of the standard doesn't get out the door in a reasonable time, even
if the possible improvements haven't been thought out yet, the
creating standards group finds its resources gone and no standard
at all gets out.

One does the best one can, and hopes one hasn't closed off too many
useful possibilities for the next round--or left things totally
screwed up by not closing up some loopholes that leave the standard
useless.  A fine balancing act.

(This, of course, is preaching to the choir WRT Mike Kay himself;
he's been involved in the production of at least several standards.)
--
Dave Peterson
SGMLWorks!

[hidden email]


Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

Bijan Parsia-3
In reply to this post by Rob Shearer-4

On Jul 5, 2008, at 1:04 PM, Rob Shearer wrote:

>>> I'm providing you with my experience: every user I've ever spoken  
>>> to about this topic has wanted the real number line.
>>> They are used to using the xsd datatypes `float` and `double` to  
>>> represent number values, so they use these without values in OWL  
>>> to mean "some number".
>>
>> Do they mean bounded numbers? (i.e. with min and max sizes?) Do  
>> they distinguish between double and float? Do they care about  
>> NaNs? (Alan's users care about the latter.)
>
> Whether it's "forall R > 1.0^^xsd:float" or "forall R `xsd:float`"  
> they seem to intend a dense number line.

So you had user defined restrictions on floats, interesting.

> In the first case `float` is just the easiest way to specify the  
> value; in the second you can certainly argue that they should have  
> used `decimal`...but that's a pointless argument because my  
> reasoner didn't really support decimal.

That's interesting. I think part of what we need to do is select a  
set of sane datatypes to require. Strings, integers, and reals seem  
reasonable.

>>> My experience is that the use of xsd datatypes as value spaces in  
>>> OWL 1.0 causes users to write what they don't mean.
>>
>> For me, this would suggest removing them or enforcing them more  
>> clearly.
>
> I'd suggest removing them.

That's where I'm heading too.

>>> My experience is that *every* ontology using `xsd:float` and  
>>> `xsd:double` without values would be better off using  
>>> `xsd:decimal`, but that the user intent was "some real  
>>> number" (and I should note that I'm against requiring support for  
>>> `xsd:decimal` values).
>>
>> Values? Or the datatype? In OWL 1, all these types were optional  
>> and poorly speced and had no documentation whatsoever. Part of the  
>> goal here is to spec well and document clearly any types we require.
>
> I would like to use doubles internally to represent points on the  
> real number line.

For what lexical syntax?

> A heterogeneous mix of internal representations is a pain. And I  
> seriously doubt that many users really care about the extra  
> representation power of `decimal`. It makes sense as an optional  
> feature reasoners can support, but it seems completely unnecessary  
> to require it in the spec---it's exactly the sort of thing I'd put  
> off implementing indefinitely unless users asked for it.
>
> The reason `decimal` keeps coming up is just that it's dense.

That's true. But there are several issues floating about, including  
the possibility of interaction between floats and cardinality. It  
seems to me that for most users, that will be a rare occurrence, even  
accidentally. It certainly requires ranges of floats (since it's  
unlikely that the cardinalities required to cause a problem would be  
feasible anyway). E.g., if we had unbounded binary numbers then such  
floats would be no harder than integers.

> So are we using the xsd spec as an excuse to conflate density with  
> complex internal representations?

I don't think so.

[snip]
> Referring any user over to that spec to understand value spaces is  
> obnoxious and counter-productive:

We definitely don't intend to do that, I hope. Part of our current  
effort is to make sure we carefully document the types we require and/
or sanction.

> even WG members seem to be having trouble grokking it. (And bravo  
> to anyone making the pedantic point that a particular value is a  
> degenerate value space.)
>
> I contend that OWL users only want a tiny tiny number of different  
> value spaces to play with: integers, strings, and reals.

I certainly agree that these are key. I think the group agrees too.  
The other types are something of a legacy.

> It is possible, however, that they will want a larger number of  
> ways to lexically represent particular values within these three  
> spaces.

This wouldn't surprise me at all.

> Most importantly, I do not think there is necessarily a direct  
> correlation between the lexical representations used to represent  
> particular values and the value spaces in which those particular  
> values live. I.e. users want to be able to specify particular  
> values within the `real` value space using `xsd:float`,

You mean the type name or the lexical syntax (e.g., "12.78e-2")? I'm  
personally more comfortable with allowing the latter than pushing  
"xsd:float" as a synonym for the real value space. Your mileage  
obviously varies.

> but they do *not* have any interest in use of the `xsd:float` value  
> space.

Some do at least to the extent of wanting NaN (and perhaps -0). I'd  
personally prefer not to shove them into the real type (certainly  
NaN; I suppose we could make our reals the affine reals and handle  
+inf).

> Thus we've got two orthogonal concepts which happen to coincide for  
> strings and integers but not for real numbers.
>
> My proposed solution would be to use brand-new OWL names for all  
> value spaces, but use xsd syntax to specify particular values.

Could you say what you think the lexical space of the reals should  
include? At least, as a first cut? (It seems decimal, scientific, and  
rational notation would all be useful, the first two for common ways  
of writing and the third for full coverage of the rationals.)

[snipped lots of useful details]

Thanks very much for those. I find them extremely helpful.

> Thanks for the feedback.
>>
>> Cheers,
>> Bijan.
>
> And if you're going to request further comment from a member of the  
> public, could you please do it on a list to which the public can  
> post? Shifting back to the WG list excludes me from comment.

D'oh! Sorry. That was an accident. My apologies.

> (Which is fine if you don't address questions directly to me.)

Thanks again for the discussion.

Cheers,
Bijan.


Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

Rob Shearer-4
>> Most importantly, I do not think there is necessarily a direct  
>> correlation between the lexical representations used to represent  
>> particular values and the value spaces in which those particular  
>> values live. I.e. users want to be able to specify particular  
>> values within the `real` value space using `xsd:float`,
>
> You mean the type name or the lexical syntax (e.g., "12.78e-2")?

XSD offers a lexical syntax for points that happen to lie on the real  
number line---that's what I suggest using it for. The easiest approach  
is that xsd names on their own are not valid "datatypes"; particular  
values encoded using xsd, however, are (because particular values are  
single-element value spaces).
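The distinction can be sketched in a few lines (illustrative only; the class names are invented for this sketch, not proposed OWL vocabulary):

```python
class ValueSpace:
    """Abstract value space: a set of values."""
    def contains(self, v) -> bool:
        raise NotImplementedError

class RealLine(ValueSpace):
    """The whole real number line (approximated here by Python numbers)."""
    def contains(self, v) -> bool:
        return isinstance(v, (int, float))

class Singleton(ValueSpace):
    """A particular value, e.g. "1.5"^^xsd:float: a one-element space."""
    def __init__(self, value):
        self.value = value
    def contains(self, v) -> bool:
        return v == self.value

# An xsd name alone denotes no datatype on this view, but an
# xsd-encoded value denotes a perfectly good (singleton) value space:
print(Singleton(1.5).contains(1.5))  # True
print(Singleton(1.5).contains(2.0))  # False
```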

> I'm personally more comfortable with allowing the latter than  
> pushing "xsd:float" as a synonym for the real value space. Your  
> mileage obviously varies.
>
>> but they do *not* have any interest in use of the `xsd:float` value  
>> space.
>
> Some do at least to the extent of wanting NaN (and perhaps -0). I'd  
> personally prefer not to shove them into the real type (certainly  
> NaN; I suppose we could make our reals the affine reals and handle  
> +inf).
I'd endorse including only one zero, but I agree there's an issue with  
NaN. My principled stand is that it's inconsistent (a value space of  
size zero), but I'd definitely want to analyze the use cases to see  
who loses important functionality from that decision.

But my main point is that users have no interest in the "holes"  
introduced by the xsd:float value space: providing them access to a  
value space of numbers representable in float representation is not  
useful, and could lead to lots of confusion, particularly if users  
could easily use such a space "by accident". That's the situation  
we've fallen into with floats in OWL 1.0.

>> Thus we've got two orthogonal concepts which happen to coincide for  
>> strings and integers but not for real numbers.
>>
>> My proposed solution would be to use brand-new OWL names for all  
>> value spaces, but use xsd syntax to specify particular values.
>
> Could you say what you think the lexical space of the reals should  
> include?

I don't know what you mean by "lexical space of the reals". I don't  
propose defining the reals lexically; I propose defining the value  
space mathematically. But implementations should allow users to  
specify particular points in that value space using the lexical  
representations for `xsd:float` and `xsd:int` values. I expect most  
implementations will also support points represented as `xsd:double`  
and `xsd:long` as well. I do *not* think a conformant implementation  
should have to deal with arbitrary points represented as `xsd:decimal`  
(since the vast majority of users don't need the extra  
representational power, and there is substantial implementation burden  
and performance penalty for dealing with such values correctly).

> At least, as a first cut? (It seems decimal, scientific, and  
> rational notation would all be useful, the first two for common ways  
> of writing and the third for full coverage of the rationals.)

The WG should consider that some implementations might allow lots of  
xsd syntaxes but lose precision on some of them (allow use of  
`xsd:decimal` in ontology files for user convenience, but convert them  
to floats during parsing)---thus a vocabulary for what it means to  
"support" a numeric xsd type for particular values would be useful. My  
big concern here is that an ontology will be developed and tested with  
a reasoner with "full" `xsd:decimal` support but then when it's used  
with an implementation with "imprecise" `xsd:decimal` support  
everything goes pear-shaped. Spitting out warnings during parsing  
isn't a great solution...

And of course some implementations might offer additional value spaces  
as well, but I'd like the spec to make it very clear that this is a  
very different thing than the above. For one thing, I'd suggest  
outlawing any use of names within the xsd namespace for value spaces,  
even spaces implementors have added as extensions. "Support for  
`xsd:decimal`" should mean `xsd:decimal` syntax for points on the real  
number line and nothing else.

-rob



Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

Richard H. McCullough-3
In reply to this post by Dave Peterson-6

FYI
When I designed the mKR language, I purposely avoided placing any
constraints
on the space,time,view specification of context.  This permits the user to
choose
whatever level of detail is appropriate in a given situation.  The resulting
descriptions
are always useful, and sometimes just plain fun!

Some of my specifications:
    space, time = here, now
    time = past, present, future
    time = yesterday, today
    space = my house, the store
    view = Aristotle, feminist
    view = RDF, OWL, mKR, CycL, Amazon, Google

Dick
----- Original Message -----
From: "Dave Peterson" <[hidden email]>
To: "Michael Kay" <[hidden email]>; "'Alan Ruttenberg'"
<[hidden email]>; "'Rob Shearer'" <[hidden email]>
Cc: <[hidden email]>; <[hidden email]>;
<[hidden email]>
Sent: Sunday, July 06, 2008 6:51 AM
Subject: RE: ISSUE-126 (Revisit Datatypes): A new proposal for the real <->
float <-> double conundrum


Dick McCullough
http://mKRmKE.org/
Ayn Rand do speak od mKR done;
knowledge := man do identify od existent done;
knowledge haspart proposition list;
mKE do enhance od "Real Intelligence" done;




Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

Bijan Parsia-3
In reply to this post by Rob Shearer-4

On Jul 6, 2008, at 8:07 PM, Rob Shearer wrote:

>>> Most importantly, I do not think there is necessarily a direct  
>>> correlation between the lexical representations used to represent  
>>> particular values and the value spaces in which those particular  
>>> values live. I.e. users want to be able to specify particular  
>>> values within the `real` value space using `xsd:float`,
>>
>> You mean the type name or the lexical syntax (e.g., "12.78e-2")?
>
> XSD offers a lexical syntax for points that happen to lie on the  
> real number line

It offers several and we're free to define one for owl:real. If we  
use any decimal notation, we have exactness problems (e.g., 1/3), but  
decimal is very user friendly. So, I was thinking that the valid  
syntax for a real would be decimal floating points and ratios of  
integers. We could include scientific notation as well.

> ---that's what I suggest using it for. The easiest approach is that  
> xsd names on their own are not valid "datatypes"; particular values  
> encoded using xsd, however, are (because particular values are  
> single-element value spaces).
>
>> I'm personally more comfortable with allowing the latter than  
>> pushing "xsd:float" as a synonym for the real value space. Your  
>> milage obviously varies.
>>
>>> but they do *not* have any interest in use of the `xsd:float`  
>>> value space.
>>
>> Some do at least to the extent of wanting NaN (and perhaps -0).  
>> I'd personally prefer not to shove them into the real type  
>> (certainly NaN; I suppose we could make our reals the affine reals  
>> and handle +inf).
>
> I'd endorse including only one zero, but I agree there's an issue  
> with NaN.

And the infinities, though we could always go for the affine real line.

> My principled stand is that it's inconsistent (a value space of  
> size zero), but I'd definitely want to analyze the use cases to see  
> who loses important functionality from that decision.
>
> But my main point is that users have no interest in the "holes"  
> introduced by the xsd:float value space: providing them access to a  
> value space of numbers representable in float representation is not  
> useful, and could lead to lots of confusion, particularly if users  
> could easily use such a space "by accident".

Well, you'll get exactness holes with binary or decimal notation,  
regardless of density issues.

> That's the situation we've fallen into with floats in OWL 1.0.
>
>>> Thus we've got two orthogonal concepts which happen to coincide  
>>> for strings and integers but not for real numbers.
>>>
>>> My proposed solution would be to use brand-new OWL names for all  
>>> value spaces, but use xsd syntax to specify particular values.
>>
>> Could you say what you think the lexical space of the reals should  
>> include?
>
> I don't know what you mean by "lexical space of the reals".

XSD datatypes have a lexical space (e.g., the syntax) and a value  
space. You are suggesting, I thought, that we adopt a value space  
that is the reals and something about using xsd syntax (i.e., lexical  
spaces) for the syntax. XSD offers exact syntax only for binary and  
decimals (I believe it's exact for binary). I was wondering what sort  
of lexical space you want.

> I don't propose defining the reals lexically;

Sure.

> I propose defining the value space mathematically.

Well, of course. But that's what XSD does as well. The decimals are a  
well defined mathematical set.

> But implementations should allow users to specify particular points  
> in that value space using the lexical representations for  
> `xsd:float` and `xsd:int` values.

So you want a very broad lexical space for our real type, i.e., "1",  
"1.0",  and "12.78e-2". If we want exactness for the rationals, we  
need either to allow repeating (e.g., 0.333repeating) (usually done  
with a macron) or fraction syntax (e.g., 1/3).
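Python's exact `fractions.Fraction` type illustrates why fraction syntax (or repeating notation) is needed: no finite decimal covers the rationals. A sketch:

```python
from fractions import Fraction

third = Fraction(1, 3)           # fraction syntax: exact
approx = Fraction("0.333333")    # any finite decimal falls short
print(third == approx)           # False
print(third - approx)            # 1/3000000: the residual error
```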

> I expect most implementations will also support points represented  
> as `xsd:double` and `xsd:long` as well.

You mean their syntax, i.e., their lexical space.

(Sorry for using the XSD terminology, but I think it's a bit clearer  
if we stick to it for the moment.)

> I
> do *not* think a conformant implementation should have to deal  
> with arbitrary points represented as `xsd:decimal` (since the vast  
> majority of users don't need the extra representational power, and  
> there is substantial implementation burden and performance penalty  
> for dealing with such values correctly).

Given that more and more languages (e.g., Java) now bundle a decimal  
type with their core libraries, I'm not so clear on the first. I'd  
like to hear more about the second.

>> At least, as a first cut? (It seems decimal, scientific, and  
>> rational notation would all be useful, the first two for common  
>> ways of writing and the third for full coverage of the rationals.)
>
> The WG should consider that some implementations might allow lots  
> of xsd syntaxes but lose precision on some of them (allow use of  
> `xsd:decimal` in ontology files for user convenience, but convert  
> them to floats during parsing)

Obviously, this can cause quite serious interoperability problems.  
So I'm inclined against it on first blush.

> ---thus a vocabulary for what it means to "support" a numeric xsd  
> type for particular values would be useful.

This is what we're after. Anything we spec will be tightly specced.  
At the moment, we only have required and optional as modalities of  
support. I think supporting various levels of precision  (or variant  
mapping) would be quite hard to understand.

> My big concern here is that an ontology will be developed and  
> tested with a reasoner with "full" `xsd:decimal` support but then  
> when it's used with an implementation with "imprecise"  
> `xsd:decimal` support everything goes pear-shaped.

That would be bad :) There could be subtler problems if people mapped  
decimal syntax to binary in variant ways (i.e., which float do you  
take 0.1 to?)
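The variance is easy to exhibit (a sketch; it relies on Python floats being IEEE doubles and on `struct` to round-trip a value through single precision):

```python
import struct
from fractions import Fraction

# Round-trip the decimal literal 0.1 through 32-bit precision:
as_f32 = struct.unpack("<f", struct.pack("<f", 0.1))[0]
as_f64 = 0.1  # Python floats are 64-bit IEEE doubles

# The nearest float32 and the nearest float64 to decimal 0.1 are
# two *different* points on the real number line:
print(Fraction(as_f32) == Fraction(as_f64))  # False
```

So two implementations that both "map decimal syntax to binary" can still disagree on which real number "0.1" denotes.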

> Spitting out warnings during parsing isn't a great solution...
>
> And of course some implementations might offer additional value  
> spaces as well, but I'd like the spec to make it very clear that  
> this is a very different thing than the above. For one thing, I'd  
> suggest outlawing any use of names within the xsd namespace for  
> value spaces, even spaces implementors have added as extensions.  
> "Support for `xsd:decimal`" should mean `xsd:decimal` syntax for  
> points on the real number line and nothing else.

This doesn't seem likely. Existing implementations already do  
different things with different xsd types. It'll be very hard to get  
buy in from the RDF community. It seems like a more likely strategy  
is to fix a (required) set of OWL types (or core types) which are  
easy to understand and robust with respect to intuitive behavior, and  
leave the more specialized types for future people to standardize.

On this model, users would just have to decide between integers and  
reals. We could have quite a wide lexical space for reals (and even  
for integers, i.e., allow 1.0 to mean the integer 1). But  
"0.1"^^xsd:float would not be required, but also we wouldn't change  
the meaning along the lines you suggest (we'd just be silent about  
it). It's fairly simple to migrate old ontologies to the new one with  
a simple converter. If enough implementations did it silently, that  
would be information for a future group.

Thanks again.

Cheers,
Bijan.


Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

Rob Shearer-4
>> XSD offers a lexical syntax for points that happen to lie on the  
>> real number line
>
> It offers several and we're free to define one for owl:real. If we  
> use any decimal notation, we have exactness problems (e.g., 1/3),  
> but decimal is very user friendly. So, I was thinking that the valid  
> syntax for a real would be decimal floating points and ratios of  
> integers. We could include scientific notation as well.

Why on earth would the OWL group come up with their own syntax for  
encoding numbers? The XSchema guys have already done that, and people  
have implemented parsers for their spec. If there's going to be a  
syntax for rationals or algebraics, then that seems to be right up  
their alley.

>> But my main point is that users have no interest in the "holes"  
>> introduced by the xsd:float value space: providing them access to a  
>> value space of numbers representable in float representation is not  
>> useful, and could lead to lots of confusion, particularly if users  
>> could easily use such a space "by accident".
>
> Well, you'll get exactness holes with binary or decimal notation,  
> regardless of density issues.

I thought I had made my proposal clear on this: the value space does  
not have holes. The representations supported for particular values  
are not sufficient to address all the points in that space, but the  
space itself does *not* have holes.

>> I don't know what you mean by "lexical space of the reals".
>
> XSD datatypes have a lexical space (e.g., the syntax) and a value  
> space. You are suggesting, I thought, that we adopt a value space  
> that is the reals and something about using xsd syntax (i.e.,  
> lexical spaces) for the syntax.

For the syntax of particular values. I keep trying to stress that  
value spaces should be kept separate from the syntax used for  
particular values.

> XSD offers exact syntax only for binary and decimals (I believe it's  
> exact for binary). I was wondering what sort of lexical space you  
> want.

XSD offers a well-defined mapping from lexical representation to IEEE  
floats. XSD defines an *exact* value for each valid lexical  
representation. You may not like the way the mapping is defined  
(because the value of "1.1e0^^xsd:float" on the real number line is  
not equal to the value of "1.1^^xsd:decimal"), but there is no  
imprecision whatsoever about what each string represents. I am  
satisfied with the work the XSchema group did on floating-point  
lexical representations.
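The point is easy to check with Python's exact types (a sketch; `Fraction` recovers the precise real number each parsed value denotes):

```python
from decimal import Decimal
from fractions import Fraction

as_float = Fraction(1.1)               # exact real denoted by the float64 "1.1"
as_decimal = Fraction(Decimal("1.1"))  # exact real denoted by the decimal "1.1"

print(as_decimal)                      # 11/10
print(as_float == as_decimal)          # False: nearby but distinct reals
```

Each lexical form maps to one perfectly determinate point; the two points just differ.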

>> But implementations should allow users to specify particular points  
>> in that value space using the lexical representations for  
>> `xsd:float` and `xsd:int` values.
>
> So you want a very broad lexical space for our real type, i.e., "1",  
> "1.0",  and "12.78e-2".

No. I want `real` to be a value space with no lexical connotations.
I want to be able to specify a particular point in this value space  
using a string such as "1.0e0^^xsd:float".
The XSD lexical forms are not "the lexical space for reals". There is  
no such thing as "the lexical space for reals". There is such a thing  
as "the space of lexical representations which a conformant  
implementation must support for particular values in the real value  
space", but this space is much smaller than the real value space.

> If we want exactness for the rationals, we need either to allow  
> repeating (e.g., 0.333repeating) (usually done with a macron) or  
> fraction syntax (e.g., 1/3).

I don't intend to support exactness for rationals. A conformant  
implementation should only be required to provide exact support for  
`xsd:int` and `xsd:float` values.

>> I expect most implementations will also support points represented  
>> as `xsd:double` and `xsd:long` as well.
>
> You mean their syntax, i.e., their lexical space.

Supporting these syntaxes means that reasoners must also support  
reasoning with the particular values representable in those syntaxes.  
Support for additional syntaxes does not change the underlying  
semantics of the real number line, but it might make implementation of  
those semantics a bit harder.

> (Sorry for using the XSD terminology, but I think it's a bit clearer  
> if we stick to it for the moment.)
>
>> I
>> do *not* think a conformant implementation should have to deal  
>> with arbitrary points represented as `xsd:decimal` (since the vast  
>> majority of users don't need the extra representational power, and  
>> there is substantial implementation burden and performance penalty  
>> for dealing with such values correctly).
>
> Given that more and more languages (e.g., Java) now bundle a decimal  
> type with their core libraries, I'm not so clear on the first.

I'm not sure Java is an example of "more and more languages". In fact  
it is the flagship "you only ever need one language" proposal. And  
even in super-OO Java you have to program differently if you're going  
to play with polymorphic numbers than you would if you stuck to ints  
and floats.

I'd like to write a distributed OWL reasoner in Erlang. But Javascript  
and C are perhaps more persuasive counterexamples to your argument.

> I'd like to hear more about the second.

The most efficient bignum and decimal libraries are an order of  
magnitude slower than corresponding int and float calculations.  
Hardware is good with ints and floats.
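A rough, machine-dependent way to see the gap in CPython (`decimal` here stands in for an arbitrary-precision library; the exact ratio varies by platform):

```python
import timeit
from decimal import Decimal

# Identical arithmetic, hardware floats vs software decimals.
t_float = timeit.timeit("x * y + x", globals={"x": 1.1, "y": 2.2},
                        number=100_000)
t_dec = timeit.timeit("x * y + x",
                      globals={"x": Decimal("1.1"), "y": Decimal("2.2")},
                      number=100_000)
print(f"float: {t_float:.4f}s  decimal: {t_dec:.4f}s  "
      f"ratio: {t_dec / t_float:.1f}x")
```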

>> ---thus a vocabulary for what it means to "support" a numeric xsd  
>> type for particular values would be useful.
>
> This is what we're after. Anything we spec will be tightly specced.  
> At the moment, we only have required and optional as modalities of  
> support. I think supporting various levels of precision  (or variant  
> mapping) would be quite hard to understand.

But presumably you're making clear that implementations which  
implement some "optional" functionality, but do so in a way which  
contradicts the optional semantics, are non-compliant. If so, then  
specifying what support for additional lexical representations means  
(i.e. exact) would make clear that a product which parsed  
`xsd:decimal` but internally converted to floating point would not  
"support `xsd:decimal`" by the terms of the OWL 2.0 spec. The  
implementors could always claim "partial support", however.

> On this model, users would just have to decide between integers and  
> reals. We could have quite a wide lexical space for reals (and even  
> for integers, i.e., allow 1.0 to mean the integer 1).

I'm getting really confused what you're talking about---constants  
appearing in XML and RDF OWL 2.0 files should be typed; there's no  
need at all to guess the type based on syntax.

And of course "1.0e0^^xsd:float" and "1^^xsd:integer" are exactly the  
same point on the real number line.
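Interpreting both lexical forms exactly bears this out (a sketch using Python's `Fraction` as a stand-in for points on the real line):

```python
from fractions import Fraction

# Two different lexical forms, two different XSD datatypes, but the
# very same point on the real number line:
float_point = Fraction("1.0e0")   # the value of "1.0e0"^^xsd:float
int_point = Fraction("1")         # the value of "1"^^xsd:integer
print(float_point == int_point)   # True
```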

> But "0.1"^^xsd:float would not be required, but also we wouldn't  
> change the meaning along the lines you suggest (we'd just be silent  
> about it). It's fairly simple to migrate old ontologies to the new  
> one with a simple converter. If enough implementations did it  
> silently, that would be information for a future group.

No idea what this means. But I'm guessing I disagree with it.

-rob



Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

Bijan Parsia-3

On Jul 6, 2008, at 10:55 PM, Rob Shearer wrote:

>>> XSD offers a lexical syntax for points that happen to lie on the  
>>> real number line
>>
>> It offers several and we're free to define one for owl:real. If we  
>> use any decimal notation, we have exactness problems (e.g., 1/3),  
>> but decimal is very user friendly. So, I was thinking that the  
>> valid syntax for a real would be decimal floating points and  
>> ratios of integers. We could include scientific notation as well.
>
> Why on earth would the OWL group come up with their own syntax for  
> encoding numbers?

I'm presuming we're sticking with the basic xsd framework. So types  
have a lexical space and a value space. So, owl:real has a value  
space of the reals. But what should the lexical space be? I'd propose  
that at least the union of the xsd numeric types lexical spaces be  
the lexical space for our new type. I would add additional syntax for  
exact rationals (such as 1/3). The first part is isomorphic to your  
proposal about xsd syntax, I believe.

> The XSchema guys have already done that, and people have  
> implemented parsers for their spec. If there's going to be a syntax  
> for rationals or algebraics, then that seems to be right up their  
> alley.

They don't seem interested, alas.

>>> But my main point is that users have no interest in the "holes"  
>>> introduced by the xsd:float value space: providing them access to  
>>> a value space of numbers representable in float representation is  
>>> not useful, and could lead to lots of confusion, particularly if  
>>> users could easily use such a space "by accident".
>>
>> Well, you'll get exactness holes with binary or decimal notation,  
>> regardless of density issues.
>
> I thought I had made my proposal clear on this: the value space  
> does not have holes.

Sure.

> The representations supported for particular values are not  
> sufficient to address all the points in that space, but the space  
> itself does *not* have holes.

I just meant that you can't write down 1/3 in decimal. That's all.

>>> I don't know what you mean by "lexical space of the reals".
>>
>> XSD datatypes have a lexical space (e.g., the syntax) and a value  
>> space. You are suggesting, I thought, that we adopt a value space  
>> that is the reals and something about using xsd syntax (i.e.,  
>> lexical spaces) for the syntax.
>
> For the syntax of particular values. I keep trying to stress that  
> values spaces should be kept separate from the syntax used for  
> particular values.

Sure. But that's true in XSD as well. From what I can tell, you want  
all the literals that have "xsd:float" (to pick an example) to map to  
(a subset) of the reals (as the value space) and constrain/enable  
certain syntax. So "1.0"^^xsd:float would be a syntax error.

>> XSD offers exact syntax only for binary and decimals (I believe  
>> it's exact for binary). I was wondering what sort of lexical space  
>> you want.
>
> XSD offers a well-defined mapping from lexical representation to  
> IEEE floats.

Yes. I just hadn't checked the spec, hence my hesitation.

> XSD defines an *exact* value for each valid lexical representation.  
> You may not like the way the mapping is defined (because the value  
> of "1.1e0^^xsd:float" on the real number line is not equal to the  
> value of "1.1^^xsd:decimal"),

No that's fine.

> but there is no imprecision whatsoever about what each string  
> represents.

You've got the wrong string. I only hedged because I hadn't looked  
and I don't like to speak with certainty without looking. My point was  
only that there are numbers which can not be exactly represented in  
binary or in decimal.
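That point is easy to demonstrate with Python's exact types: 1/3 has no finite decimal (or binary) representation, so any fixed-precision result is off by a nonzero rational amount.

```python
from decimal import Decimal
from fractions import Fraction

# 1/3 rounded to 28 decimal digits (the default Decimal precision)
# is close to, but not equal to, the exact rational 1/3.
third_dec = Decimal(1) / Decimal(3)
print(Fraction(third_dec) == Fraction(1, 3))   # False
# The same failure in binary: the double nearest 1/3 is not 1/3.
print(Fraction(1 / 3) == Fraction(1, 3))       # False
```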

> I am satisfied with the work the XSchema group did on floating-
> point lexical representations.
>
>>> But implementations should allow users to specify particular  
>>> points in that value space using the lexical representations for  
>>> `xsd:float` and `xsd:int` values.
>>
>> So you want a very broad lexical space for our real type, i.e.,  
>> "1", "1.0",  and "12.78e-2".
>
> No. I want `real` to be a value space with no lexical connotations.

I'd be surprised if we could get consensus on abandoning the lexical  
space/value space language and understanding. It's pretty deeply  
embedded into RDF.

> I want to be able to specify a particular point in this value space  
> using a string such as "1.0e0^^xsd:float".

Yeah, I'm kinda against that. But I would support "1.0e0^^owl:real".

> The XSD lexical forms are not "the lexical space for reals". There  
> is no such thing as "the lexical space for reals".

Bravo! ;)

> There is such a thing as "the space of lexical representations  
> which a conformant implementation must support for particular  
> values in the real value space", but this space is much smaller  
> than the real value space.

Our initial proposal for owl:real is to support, as the syntax, pairs  
of integers with the second being non-zero (i.e., standard fraction  
syntax for rationals) and, as the value space, (at least) the  
algebraic reals. If you don't have equations or special constants, you  
can't address the irrationals or transcendentals anyway. We are  
aiming to support some classes of equation, but only with rational  
constants.
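The proposed fraction syntax is trivial to interpret exactly; a hypothetical sketch (the `parse_owl_real` name and grammar details are assumptions for illustration, not from any spec):

```python
from fractions import Fraction

def parse_owl_real(lexical: str) -> Fraction:
    # Hypothetical parser for the proposed owl:real syntax: either a
    # plain decimal/integer form or a pair of integers "n/d", d != 0.
    num, sep, den = lexical.partition("/")
    if not sep:
        return Fraction(num)
    d = int(den)
    if d == 0:
        raise ValueError("denominator must be non-zero")
    return Fraction(int(num), d)

print(parse_owl_real("1/3"))      # exact: Fraction(1, 3)
print(parse_owl_real("12.78e-2") == Fraction(1278, 10000))  # True
```

(Python's `Fraction` string constructor in fact already accepts both "1/3" and decimal/scientific forms, so the design cost of such a syntax is small.)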

>> If we want exactness for the rationals, we need either to allow  
>> repeating (e.g., 0.333repeating) (usually done with a macron) or  
>> fraction syntax (e.g., 1/3).
>
> I don't intend to support exactness for rationals. A conformant  
> implementation should only be required to provide exact support for  
> `xsd:int` and `xsd:float` values.

I don't think that would fly.

[snip]
>> Given that more and more languages (e.g., Java) now bundle a  
>> decimal type with their core libraries, I'm not so clear on the  
>> first.
>
> I'm not sure Java is an example of "more and more languages". In  
> fact it is the flagship "you only ever need one language" proposal.

I picked java because it didn't have it for a long time and now it  
does. To pick another example, Python now has a bundled decimal  
class. Both of these are quite recent additions to popular languages.  
SQL supports it.  Visual Basic seems to.

> And even in super-OO Java you have to program differently if you're  
> going to play with polymorphic numbers than you would if you stuck  
> to ints and floats.
>
> I'd like to write a distributed OWL reasoner in Erlang. But  
> Javascript and C are perhaps more persuasive counterexamples to  
> your argument.

Javascript is a bit odd in not supporting integers either :) There  
are high quality decimal libraries for C++ (e.g., from IBM) and the  
committee is considering decimal support (<http://open-std.org/JTC1/ 
SC22/WG21/>)

>> I'd like to hear more about the second.
>
> The most efficient bignum and decimal libraries are an order of  
> magnitude slower than corresponding int and float calculations.  
> Hardware is good with ints and floats.

Sure, but I wouldn't have thought that this would be a significant  
factor. Obviously, if the user writes really big or really small  
numbers, you have to deal with them anyway. If you only have user-
defined types (no equations), then the operations (and number  
thereof) are pretty limited (inclusion and cardinality testing). I'm a bit  
skeptical that it makes a huge practical difference. Perhaps because  
it doesn't come up too much.

Also, perhaps I misrecall, but don't you want arbitrarily sized floats?

"""For the restriction "forall R `xsd:float`" I simply bounded the  
real number line at the min and max values of floats. Still a dense,  
infinite number line, but with bounds. I hated this usage, however,  
and would prefer if it became illegal."""

So you did bound...but you "hate it"? Which, the bounds? the  
universal quantifier?

Implementations could always throw a warning or error if they hit a  
too large number.

>>> ---thus a vocabulary for what it means to "support" a numeric xsd  
>>> type for particular values would be useful.
>>
>> This is what we're after. Anything we spec will be tightly  
>> specced. At the moment, we only have required and optional as  
>> modalities of support. I think supporting various levels of  
>> precision  (or variant mapping) would be quite hard to understand.
>
> But presumably you're making clear that implementations which  
> implement some "optional" functionality, but do so in a way which  
> contradicts the optional semantics, are non-compliant.

That's always a problem with optional :(

> If so, then specifying what support for additional lexical  
> representations means (i.e. exact) would make clear that a product  
> which parsed `xsd:decimal` but internally converted to floating  
> point would not "support `xsd:decimal`" by the terms of the OWL 2.0  
> spec.

They can convert as long as the observable behavior is the same.
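The criterion is observable behavior, and internal conversion to floating point fails it: distinct xsd:decimal values become indistinguishable. A small Python illustration:

```python
from decimal import Decimal

# Two distinct decimal values...
a = Decimal("0.1")
b = Decimal("0.10000000000000000001")
print(a == b)                # False: observably different as decimals
# ...collapse to the same value after conversion to a double:
print(float(a) == float(b))  # True: the difference is lost
```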

> The implementors could always claim "partial support", however.

If they are going to vary in observable ways, I would prefer that  
they would make that clear in documentation and by giving warnings. A  
"strict" mode would also be quite welcome to me as a user.

>> On this model, users would just have to decide between integers  
>> and reals. We could have quite a wide lexical space for reals (and  
>> even for integers, i.e., allow 1.0 to mean the integer 1).
>
> I'm getting really confused what you're talking about---constants  
> appearing in XML and RDF OWL 2.0 files should be typed; there's no  
> need at all to guess the type based on syntax.
>
> And of course "1.0e0^^xsd:float" and "1^^xsd:integer" are exactly  
> the same point on the real number line.

Sure.

But I was talking about owl:real. It seems reasonable to allow  
"1.0e0^^owl:real:" and "1^^owl:real". (xsd:integer could be a subtype  
of owl:real as well).

>> But "0.1"^^xsd:float would not be required, but also we wouldn't  
>> change the meaning along the lines you suggest (we'd just be  
>> silent about it). It's fairly simple to migrate old ontologies to  
>> the new one with a simple converter. If enough implementations did  
>> it silently, that would be information for a future group.
>
> No idea what this means. But I'm guessing I disagree with it.

Me too :)

Cheers,
Bijan.


Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

Rob Shearer-4
>>>> XSD offers a lexical syntax for points that happen to lie on the  

>>>> real number line
>>>
>>> It offers several and we're free to define one for owl:real. If we  
>>> use any decimal notation, we have exactness problems (e.g., 1/3),  
>>> but decimal is very user friendly. So, I was thinking that the  
>>> valid syntax for a real would be decimal floating points and  
>>> ratios of integers. We could include scientific notation as well.
>>
>> Why on earth would the OWL group come up with their own syntax for  
>> encoding numbers?
>
> I'm presuming we're sticking with the basic xsd framework. So types  
> have a lexical space and a values space.

NO. There is no lexical space to represent all of the reals. That's  
the whole point---the reals include lots and lots of values that  
cannot necessarily be represented lexically.

> So, owl:real has a value space of the reals. But what should the  
> lexical space be? I'd propose that at least the union of the xsd  
> numeric types lexical spaces be the lexical space for our new type.  
> I would add additional syntax for exact rationals (such as 1/3). The  
> first part is isomorphic to your proposal about xsd syntax, I believe.

Again, I have no intention of implementing rationals.

And if you want to come up with some syntax for encoding rational  
numbers in XML I suggest you join the XSchema working group, because  
that's way beyond the OWL charter.

>> The XSchema guys have already done that, and people have  
>> implemented parsers for their spec. If there's going to be a syntax  
>> for rationals or algebraics, then that seems to be right up their  
>> alley.
>
> They don't seem interested, alas.

And I very much hope the OWL WG takes that as a sign that they should  
be even less interested.

>>>> XSD datatypes have a lexical space (e.g., the syntax) and a value  
>>>> space. You are suggesting, I thought, that we adopt a value space  
>>>> that is the reals and something about using xsd syntax (i.e.,  
>>>> lexical spaces) for the syntax.
>>
>> For the syntax of particular values. I keep trying to stress that  
>> values spaces should be kept separate from the syntax used for  
>> particular values.
>
> Sure. But that's true in XSD as well.

No it's [not](http://www.w3.org/TR/xmlschema-2/#value-space): "Each  
value in the value space of a datatype is denoted by one or more  
literals in its ·lexical space·." In XSD the lexical and value spaces  
are very tightly bound together. This should *not* be true for OWL.

> From what I can tell, you want all the literals that have  
> "xsd:float" (to pick an example) to map to (a subset) of the reals  
> (as the value space) and constrain/enable certain syntax. So  
> "1.0"^^xsd:float would be a syntax error.

That looks like a valid [float](http://www.w3.org/TR/xmlschema-2/#float)  
to me. But `"rob"^^xsd:float` looks like a syntax error.  
Again, all these syntax issues should be deferred to the XSD spec.
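That division of labor is cheap to implement: the XSD spec defines the lexical space, and a checker just applies the xsd:float grammar. The regex below paraphrases the pattern in XML Schema Part 2 (treat it as an approximation):

```python
import re

# Approximation of the xsd:float lexical space from XML Schema Part 2:
# a decimal mantissa with optional exponent, or the specials INF/-INF/NaN.
XSD_FLOAT = re.compile(
    r"^(?:[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?"
    r"|-?INF|NaN)$"
)

for lex in ("1.0", "12.78e-2", "-INF", "rob", "1/3"):
    print(lex, bool(XSD_FLOAT.match(lex)))
# "1.0" is a valid float literal; "rob" and "1/3" are syntax errors.
```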

>> No. I want `real` to be a value space with no lexical connotations.
>
> I'd be surprised if we could get consensus on abandoning the lexical  
> space/value space language and understanding. It's pretty deeply  
> embedded into RDF.

It's impossible to have a real number line and have lexical  
representations for all values.

The "value spaces" in OWL serve a fundamentally different purpose than  
the "value spaces" defined in XSD. See my first message. XSD is  
concerned with representing values. Future incarnations can add more  
representations for more values, and existing data sets can be  
seamlessly extended with these new values.

OWL is concerned with spaces of values. If the OWL 2.0 value space for  
numbers does not include, for example, the rationals, then the system  
can *not* be seamlessly extended to include the rationals, because  
they've already been excluded. The value spaces from XSD are  
inappropriate for OWL because they fail in exactly the wrong way. OWL  
extensions trim down value spaces. XSD extensions build up value spaces.

To make this clearer, perhaps we should abandon the "value space"  
terminology altogether and instead talk about OWL "data domains". I  
suggest that OWL have a string data domain and a number data domain.  
The integer data domain is a subset of the number data domain. There  
is absolutely no need for a float data domain. OWL implementations  
should support particular values encoded using the `xsd:int` and  
`xsd:float` lexical representations. These values are all in the  
number domain.
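Under this proposal every numeric literal, whatever its lexical type, maps into the one number domain. A hypothetical sketch (the function name and datatype dispatch are assumptions for illustration):

```python
import struct
from fractions import Fraction

def number_domain_value(lexical: str, datatype: str) -> Fraction:
    # Map a typed literal into a single "number" data domain.
    if datatype == "xsd:int":
        return Fraction(int(lexical))
    if datatype == "xsd:float":
        # Round through IEEE 754 single precision, as XSD's mapping does.
        single = struct.unpack("<f", struct.pack("<f", float(lexical)))[0]
        return Fraction(single)
    raise ValueError(f"unsupported datatype: {datatype}")

# "1"^^xsd:int and "1.0e0"^^xsd:float denote the same domain element:
print(number_domain_value("1", "xsd:int") ==
      number_domain_value("1.0e0", "xsd:float"))  # True
```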

>> I want to be able to specify a particular point in this value space  
>> using a string such as "1.0e0^^xsd:float".
>
> Yeah, I'm kinda against that. But I would support "1.0e0^^owl:real".

That's craziness. You're crazy. Stop being crazy.

>> The XSD lexical forms are not "the lexical space for reals". There  
>> is no such thing as "the lexical space for reals".
>
> Bravo! ;)
>
>> There is such a thing as "the space of lexical representations  
>> which a conformant implementation must support for particular  
>> values in the real value space", but this space is much smaller  
>> than the real value space.
>
> Our initial proposal for owl:real is to support for syntax, pairs of  
> integers with the second being non-zero (i.e., standard fraction  
> syntax for rationals) and (at least) the algebraic reals for the  
> value space. If you don't have equations or special constants, you  
> can't address the irrationals or transcendentals anyway. We are  
> aiming to support some classes of equation, but only with rational  
> constants.

But why on earth would you cut down the value space at *all*? That  
just cuts off any possibility of future extension!

>>> If we want exactness for the rationals, we need either to allow  
>>> repeating (e.g., 0.333repeating) (usually done with a macron) or  
>>> fraction syntax (e.g., 1/3).
>>
>> I don't intend to support exactness for rationals. A conformant  
>> implementation should only be required to provide exact support for  
>> `xsd:int` and `xsd:float` values.
>
> I don't think that would fly.

Who is the vast army of users in need of support for exact rationals?  
I strongly strongly suspect that if they really existed they would  
have pushed on the XSchema folks to give them a lexical  
representation---XML is kind of big as a data representation language,  
you know.

> Also, perhaps I misrecall, but don't you want arbitrarily sized  
> floats?

An `xsd:float` has a limited size, by definition.

> """For the restriction "forall R `xsd:float`" I simply bounded the  
> real number line at the min and max values of floats. Still a dense,  
> infinite number line, but with bounds. I hated this usage, however,  
> and would prefer if it became illegal."""
>
> So you did bound...but you "hate it"? Which, the bounds? the  
> universal quantifier?

I hated that the user was saying `float` and I was interpreting it as  
"real between `FLT_MIN` and `FLT_MAX`". I hope OWL 2.0 allows only OWL  
data domains in such a context. (But of course an individual value is  
a valid data domain, and complex data domains could be built using  
facets with individual values.)
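The bounded reading Rob describes can be made precise with exact arithmetic; a sketch (the constant is the largest finite IEEE 754 single-precision value, what C calls `FLT_MAX`):

```python
from fractions import Fraction

# Largest finite IEEE 754 single-precision value, exactly:
# (2 - 2**-23) * 2**127 == (2**24 - 1) * 2**104.
MAX_FLOAT = (2**24 - 1) * 2**104

def in_bounded_real_line(x) -> bool:
    # "real between -FLT_MAX and FLT_MAX": a dense, infinite interval,
    # not the nowhere-dense set of float values themselves.
    return -MAX_FLOAT <= x <= MAX_FLOAT

print(in_bounded_real_line(Fraction(1, 3)))  # True: 1/3 lies in the interval
print(in_bounded_real_line(2**128))          # False: beyond the bound
```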

-rob


Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

Bijan Parsia-3

On Jul 7, 2008, at 1:02 AM, Rob Shearer wrote:

>>>>> XSD offers a lexical syntax for points that happen to lie on  
>>>>> the real number line
>>>>
>>>> It offers several and we're free to define one for owl:real. If  
>>>> we use any decimal notation, we have exactness problems (e.g.,  
>>>> 1/3), but decimal is very user friendly. So, I was thinking that  
>>>> the valid syntax for a real would be decimal floating points and  
>>>> ratios of integers. We could include scientific notation as well.
>>>
>>> Why on earth would the OWL group come up with their own syntax  
>>> for encoding numbers?
>>
>> I'm presuming we're sticking with the basic xsd framework. So  
>> types have a lexical space and a values space.
>
> NO. There is no lexical space to represent all of the reals.

I didn't say or imply there was. There's, afaict, no requirement that  
the lexical space cover the entire value space. Indeed, in the  
current owl:real, where we are talking about a value space over the  
algebraic reals (which *are* denumerable), we only allow rational  
constants. (We need the additional reals as possible solutions to  
equations with rational constants.)
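The algebraic reals are indeed denumerable and exactly representable, e.g. as a rational-coefficient polynomial plus an isolating interval. A sketch (names assumed for illustration; bisection keeps the endpoints exact rationals):

```python
from fractions import Fraction

def refine_root(coeffs, lo, hi, steps=20):
    # coeffs: polynomial with rational coefficients, highest degree first.
    # (lo, hi): isolating interval with a sign change at the endpoints.
    def p(x):
        acc = Fraction(0)
        for c in coeffs:         # Horner evaluation, all exact
            acc = acc * x + c
        return acc
    for _ in range(steps):
        mid = (lo + hi) / 2
        if p(lo) * p(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return lo, hi

# sqrt(2): the root of x**2 - 2 isolated in (1, 2), rational constants only.
lo, hi = refine_root([Fraction(1), Fraction(0), Fraction(-2)],
                     Fraction(1), Fraction(2))
print(float(lo), float(hi))  # tight rational bounds around sqrt(2)
```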

> That's the whole point---the reals include lots and lots of values  
> that cannot necessarily be represented lexically.

I'm skeptical that that's the whole point, as it doesn't seem relevant.

>> So, owl:real has a value space of the reals. But what should the  
>> lexical space be? I'd propose that at least the union of the xsd  
>> numeric types lexical spaces be the lexical space for our new  
>> type. I would add additional syntax for exact rationals (such as  
>> 1/3). The first part is isomorphic to your proposal about xsd  
>> syntax, I believe.
>
> Again, I have no intention of implementing rationals.
>
> And if you want to come up with some syntax for encoding rational  
> numbers in XML I suggest you join the XSchema working group,  
> because that's way beyond the OWL charter.

I'm not sure why you say that. Designing an OWL type seems well  
within our purview. Consider rdf:Literal.

>>> The XSchema guys have already done that, and people have  
>>> implemented parsers for their spec. If there's going to be a  
>>> syntax for rationals or algebraics, then that seems to be right  
>>> up their alley.
>>
>> They don't seem interested, alas.
>
> And I very much hope the OWL WG takes that as a sign that they  
> should be even less interested.

The reason (one member) gave (privately) is that they didn't think  
that reals beyond decimals were necessary for a schema language. I  
think we agree that they are for an ontology language. So, my  
conclusion is the opposite of your hope.

>>>>> XSD datatypes have a lexical space (e.g., the syntax) and a  
>>>>> value space. You are suggesting, I thought, that we adopt a  
>>>>> value space that is the reals and something about using xsd  
>>>>> syntax (i.e., lexical spaces) for the syntax.
>>>
>>> For the syntax of particular values. I keep trying to stress that  
>>> values spaces should be kept separate from the syntax used for  
>>> particular values.
>>
>> Sure. But that's true in XSD as well.
>
> No it's [not](http://www.w3.org/TR/xmlschema-2/#value-space): "Each  
> value in the value space of a datatype is denoted by one or more  
> literals in its ·lexical space·."

Oh, ick. I had interpreted that as contingent for the set defined,  
not as a general principle for all types in an extended system. Ick.

Yes, well, as the current design for owl:real shows, we are already  
ignoring this constraint :(

> In XSD the lexical and value spaces are very tightly bound  
> together. This should *not* be true for OWL.

Well, not exactly, but certainly moreso than I thought.

They seem to be loosening this in Schema 1.1:
        http://www.w3.org/TR/xmlschema11-2/#value-space
"""Each value in the value space of a ·primitive· or ·ordinary·  
datatype is denoted by one or more character strings in its ·lexical  
space·, according to ·the lexical mapping·; ·special· datatypes, by  
contrast, may include "ineffable" values not mapped to by any lexical  
representation. """
[snip]
>> I'd be surprised if we could get consensus on abandoning the  
>> lexical space/value space language and understanding. It's pretty  
>> deeply embedded into RDF.
>
> It's impossible to have a real number line and have lexical  
> representations for all values.

Yes, so we have to at least relax the Schema 1.0 constraint that  
every value have a corresponding literal. Thanks for pointing that out.

[snip]

> To make this clearer, perhaps we should abandon the "value space"  
> terminology altogether and instead talk about OWL "data domains". I  
> suggest that OWL have a string data domain and a number data  
> domain. The integer data domain is a subset of the number data  
> domain. There is absolutely no need for a float data domain. OWL  
> implementations should support particular values encoded using the  
> `xsd:int` and `xsd:float` lexical representations. These values are  
> all in the number domain.

This goes against existing implementation and use, wherein xsd:float  
is disjoint from xsd:int. (The non-real values of float are a problem  
as well.)

[snip]

>>> There is such a thing as "the space of lexical representations  
>>> which a conformant implementation must support for particular  
>>> values in the real value space", but this space is much smaller  
>>> than the real value space.
>>
>> Our initial proposal for owl:real is to support for syntax, pairs  
>> of integers with the second being non-zero (i.e., standard  
>> fraction syntax for rationals) and (at least) the algebraic reals  
>> for the value space. If you don't have equations or special  
>> constants, you can't address the irrationals or transcendentals  
>> anyway. We are aiming to support some classes of equation, but  
>> only with rational constants.
>
> But why on earth would you cut down the value space at *all*?

Once you let the value space vary from the lexical space, there's  
less need. I think it helps to have the concept if that's what you  
are actually using. At the time, we were motivated in part by not  
wanting to spook people.

(If our principle is the most broad relevant type, then complex seems  
to be the right supertype. The algebraic reals have a lot of nice  
properties which make them pretty suitable for our purposes.)

> That just cuts off any possibility of future extension!
[snip]

I'm not sure why. We're free to introduce new types or extend old types.

>> For the restriction "forall R `xsd:float`" I simply bounded the  
>> real number line at the min and max values of floats. Still a  
>> dense, infinite number line, but with bounds. I hated this usage,  
>> however, and would prefer if it became illegal."""
>>
>> So you did bound...but you "hate it"? Which, the bounds? the  
>> universal quantifier?
>
> I hated that the user was saying `float` and I was interpreting it  
> as "real between `FLT_MIN` and `FLT_MAX`". I hope OWL 2.0 allows  
> only OWL data domains in such a context.

So you do want arbitrarily sized floats. Ok.

> (But of course an individual value is a valid data domain, and  
> complex data domains could be built using facets with individual  
> values.)

We may be reaching diminishing returns on the public debate. We can  
continue in private if you like, and then summarize back.

Thanks for the discussion.

Cheers,
Bijan.

Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

Rob Shearer-4
>>>> The XSchema guys have already done that, and people have  

>>>> implemented parsers for their spec. If there's going to be a  
>>>> syntax for rationals or algebraics, then that seems to be right  
>>>> up their alley.
>>>
>>> They don't seem interested, alas.
>>
>> And I very much hope the OWL WG takes that as a sign that they  
>> should be even less interested.
>
> The reason (one member) gave (privately) is that they didn't think  
> that reals beyond decimals were necessary for a schema language. I  
> think we agree that they are for an ontology language. So, my  
> conclusion is the opposite of your hope.

Rational numbers, and linear equations, and n-ary data predicates, all  
seem *much* more relevant to data representation and model checking  
than satisfiability reasoning; these are systems people want to use to  
store and compute particular values based on input, not to check  
satisfiability. (The n-ary datatype use cases, for example, don't  
offer much insight into how such a feature could be used to draw  
valuable new inferences.) And yet the XSchema group---the data  
representation and model-checking crowd---decided that such notions  
were far too ambitious for even them.

Again, I urge the OWL working group to follow that example and focus  
on the small set of features which will actually benefit users, and  
make sure that they get those features right.

-rob



Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

Rob Shearer-4
In reply to this post by Bijan Parsia-3
>> The integer data domain is a subset of the number data domain.  
>> There is absolutely no need for a float data domain. OWL  
>> implementations should support particular values encoded using the  
>> `xsd:int` and `xsd:float` lexical representations. These values are  
>> all in the number domain.
>
> This goes against existing implementation and use, wherein xsd:float  
> is disjoint from xsd:int.

Cerebra did not make them disjoint. Neither does KAON2. And testing  
reveals that neither does FaCT++. The only reasoner I can find that  
makes them disjoint is Pellet. This sheds a lot of light on your  
definition of "existing implementation and use".

If you intend to attempt to enshrine bugs in Clark & Parsia products  
in the OWL standard, I suggest that you do it as a representative of  
Clark & Parsia and not as a representative of Manchester, which has an  
interest in FaCT++.

-rob

