[Bug 27001] New: Terminology: identity

Bugzilla from bugzilla@jessica.w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=27001

            Bug ID: 27001
           Summary: Terminology: identity
           Product: XPath / XQuery / XSLT
           Version: Working drafts
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XQuery 3.1
          Assignee: [hidden email]
          Reporter: [hidden email]
        QA Contact: [hidden email]

Michael Sperberg-McQueen is asking us to consider the terminology we use
surrounding identity.  I am splitting this off from Bug 26958 so that we can
have the terminological discussion separately.

(In reply to C. M. Sperberg-McQueen from comment #8)

> I am generally sympathetic to Michael Kay's request that we conduct our
> discussion in terms of data independence and not by splitting hairs over the
> meaning of terms.  I apologize, therefore, for commenting solely on the
> question of terminology and not on the questions of design.  My excuse is
> that I can't contribute to any design discussion if I cannot understand what
> people are saying, and the use of the unqualified term "identity" to mean
> solely "persistent identity across mutation or update" (instead of what I
> understand "identity" to mean) makes it very hard for me to follow some of
> the discussion here.  Also, since I seem to be responsible for making JR
> self-conscious about his usage of the term, I would like to try to show that
> a less misleading usage is possible.
>
> JR asks, in the initial description of the issue:
>
>     I do not believe that the value of $z should be changed, so
>     I think that we should use copy semantics here.  Is there a
>     good way to say this without referring to identity?
>
> Yes, there are plenty of ways to say it without any use of the term
> "identity".  There are also plenty of ways to say it that use the term
> "identity" in its conventional English sense, without any notion that
> "identity" applies only to complex mutable objects and does not apply to
> (say) the integers.  
>
> By "identity" I believe normal English usage means either (a) similarity
> among distinct objects (as in "identical twins") or (b) the property of
> being itself and being distinct from other things.  We really do not want
> sense (a) here or elsewhere.  In sense (b), every thing which we can
> identify necessarily has "identity"; saying that maps, arrays, and elements
> have identity, therefore, is true but not particularly helpful, since it
> doesn't help distinguish them from other constructs in our data model or our
> languages.  What is at issue here, I think, is that we envisage having
> operators whose results depend only on the identity of the maps, arrays, or
> elements to which the operators are applied, or (roughly the same thing in
> different words) we envisage having operators which expose the identity of
> maps and arrays in much the same way that 'is' and '<<' and '>>' expose the
> identity of nodes.
>
> To test my claim that we can express what we need to express without using
> the term "identity" in the ways I continue to object to, let me suggest
> wordings for some sentences which, I believe, accurately convey the intended
> meaning.
>
>   - For "Suppose we ultimately decide that maps and arrays have identity,"
> read "Suppose we ultimately decide to expose the identity of maps and
> arrays".
>
>   - For "(in pseudocode, assuming maps and arrays do have identity)" read
> "(in pseudocode)".  
>
>   - For "Elements do have identity" read "Elements have node identity".
>
>   - For "creating a GUID to represent the identity of a map" read "creating
> a GUID to represent the identity of a map" (i.e. no change is needed).
>
>   - For "to change the semantics of our languages in ways that lose
> identity", I do not know what to write, because I'm not sure what's being
> said.
>
>   - For "This implies identity" read, perhaps, "This implies some sort of
> identity across updates".
>
> None of the references to object identity, preserving identity, exposing
> identity, maintaining identity, or changing the identity of nodes needs
> revision, because all of them make perfect sense when "identity" is
> understood as the property of things which makes them identical to
> themselves and different from other things.

--
You are receiving this mail because:
You are the QA Contact for the bug.

--- Comment #1 from Jonathan Robie <[hidden email]> ---
(In reply to Jonathan Robie from comment #0)
> JR asks, in the initial description of the issue:
>
>     I do not believe that the value of $z should be changed, so
>     I think that we should use copy semantics here.  Is there a
>     good way to say this without referring to identity?

FWIW, I was not asking whether we could create new terminology; I was asking
whether we needed to have something analogous to "node identity" for maps and
arrays in order to distinguish copy semantics from reference semantics in a
constructor.  We rely on node identity for this in the semantics of element
constructors, attribute constructors, etc.
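A minimal sketch of the copy-versus-reference distinction, written in Python rather than XQuery. The names construct_by_copy and construct_by_reference are hypothetical, and nothing here reflects how any XQuery processor builds maps, arrays, or nodes; it only shows why the two semantics need some notion of "the same thing" to be told apart.

```python
import copy


def construct_by_copy(child):
    # Copy semantics: the container receives a fresh copy of the child.
    return {"content": copy.deepcopy(child)}


def construct_by_reference(child):
    # Reference semantics: the container holds the very same child.
    return {"content": child}


z = {"one": 1}
copied = construct_by_copy(z)
referenced = construct_by_reference(z)

# Both containers hold content with the same value as z ...
values_equal = (copied["content"] == z, referenced["content"] == z)
# ... but only the reference version holds z itself.
same_object = (copied["content"] is z, referenced["content"] is z)

# An update through the copy leaves z alone; one through the reference does not.
copied["content"]["two"] = 2
z_after_copy_edit = dict(z)
referenced["content"]["three"] = 3
z_after_reference_edit = dict(z)
```

Comparing values alone cannot separate the two constructors; only the identity check or a later update reveals which semantics was used.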

> None of the references to object identity, preserving identity, exposing
> identity, maintaining identity, or changing the identity of nodes needs
> revision, because all of them make perfect sense when "identity" is
> understood as the property of things which makes them identical to
> themselves and different from other things.

I don't think this is clear about what it means by "identical to themselves" -
does that mean having the same values? Is an object still "identical to itself"
if its value changes?  What about "different from other things" - is a node N
"different from" a node O if they have exactly the same values?

In the status quo terminology, each node N has a unique node identity that
allows it to be distinguished from another node O, even if (1) N has exactly
the same values as O, or (2) the values of N change.

Applied to nodes, value comparisons and general comparisons are based on
values, not node identity. Node comparisons are based on node identity or
document order.  PULs identify changes to nodes by identity.

> Yes, there are plenty of ways to say it without any use of the term
> "identity".  There are also plenty of ways to say it that use the term
> "identity" in its conventional English sense, without any notion that
> "identity" applies only to complex mutable objects and does not apply to
> (say) the integers.  

Extending the term "identity" to integers leads to two different meanings of
"identity", and I find that confusing.

I also find your definition of identity as "the property of things which makes
them identical to themselves and different from other things" confusing. I
prefer your definition (b) here:

> By "identity" I believe normal English usage means either (a) similarity
> among distinct objects (as in "identical twins") or (b) the property of
> being itself and being distinct from other things.

Our specification uses "identity" much as Grady Booch does when he says:

<quote source="Object-Oriented Analysis and Design with Applications">
An object is an entity that has state, behavior, and identity.
</quote>

The XDM does not have behavior, but it does have state and identity.

<quote source="Object-Oriented Analysis and Design with Applications">
The state of an object encompasses all of the (usually static) properties of
the object plus the current (usually dynamic) values of each of these
properties.
</quote>

<quote source="Object-Oriented Analysis and Design with Applications">
Identity is that property of an object which distinguishes it from all other
objects.
</quote>

Let's consider this with a few examples:

Example 1: Elements

$a := <i>1</i>
$b := <i>1</i>

Are $a and $b "identical"?  We don't license that question. We can ask if
they are the same node, which is equivalent to asking if they have the same
identity. They are two different nodes.  We can also ask if they have the same
value, and they do.  These two questions must be distinguished.

Example 2: Integers

$a := 1
$b := 1

Again, are $a and $b "identical"? They have the same value.  We can't really
ask if they have the same identity - that would be equivalent to asking if they
are "the same 1", as opposed to "different 1s", which is a rather odd question
to ask.

>   - For "Suppose we ultimately decide that maps and arrays have identity,"
> read "Suppose we ultimately decide to expose the identity of maps and
> arrays".

This does not seem clearer to me.  We are asking whether two instances of a map
or array that have the same values are distinguishable.

Example 3: Maps

$a := { "one" : 1 }
$b := { "one" : 1 }

Do $a and $b refer to "the same map" or "different maps"?

We could decide that our data model and our language do not license the
question, as for integers - maps are just values, we can ask if they have the
same value or not.  Or we could decide that they do license the question - maps
can be distinguished from each other. If they are not distinguishable, there is
no identity to expose.
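The three examples can be mimicked in Python, purely as an analogy. The Node class below is a hypothetical stand-in for an element node, dicts stand in for maps, and Python's own choices (it does let you ask whether two dicts are the same object) say nothing about what XDM should decide.

```python
class Node:
    """Hypothetical stand-in for an element node: a name plus text content."""

    def __init__(self, name, text):
        self.name = name
        self.text = text

    def __eq__(self, other):
        # Value comparison: same name and same content.
        return (
            isinstance(other, Node)
            and (self.name, self.text) == (other.name, other.text)
        )


# Example 1: two elements with the same value.
a = Node("i", "1")
b = Node("i", "1")
nodes_same_value = a == b    # value question
nodes_same_node = a is b     # identity question

# Example 2: integers. Only the value question is asked here; whether two
# equal Python ints are one object is an implementation detail.
ints_same_value = 1 == 1

# Example 3: maps, modelled as dicts.
m1 = {"one": 1}
m2 = {"one": 1}
maps_same_value = m1 == m2
maps_same_object = m1 is m2  # Python licenses this question; XDM need not
```

In this analogy the nodes and the maps are equal in value but are distinct objects, which is the "same map or different maps" question posed above.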

>   - For "(in pseudocode, assuming maps and arrays do have identity)" read
> "(in pseudocode)".  

That does not seem clearer to me. It loses the assumption behind the example.
Under a different assumption, the behavior would be different.

>   - For "Elements do have identity" read "Elements have node identity".

Both statements are true.

>   - For "creating a GUID to represent the identity of a map" read "creating
> a GUID to represent the identity of a map" (i.e. no change is needed).

Because a GUID is a value, this is not identity in the sense of node identity.
If the value of the GUID were to change, or two different maps were assigned
the same GUID, we would lose the ability to distinguish maps independently of
their values.

That may be perfectly acceptable, but we should not blur these distinct uses of
the term "identity".
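A small Python sketch of why a GUID stored as a value differs from identity in the node-identity sense. TaggedMap and same_by_guid are hypothetical names invented for this illustration; the point is only that a GUID is an ordinary property that can be overwritten or duplicated.

```python
import uuid


class TaggedMap:
    """Hypothetical map that carries a GUID as an ordinary property."""

    def __init__(self, entries, guid=None):
        self.entries = dict(entries)
        self.guid = guid if guid is not None else str(uuid.uuid4())


def same_by_guid(x, y):
    # Distinguishes maps only through a value that they carry.
    return x.guid == y.guid


m1 = TaggedMap({"one": 1})
m2 = TaggedMap({"one": 1})
distinct_at_first = not same_by_guid(m1, m2)

# Because the GUID is just a value, nothing stops it from being reassigned.
m2.guid = m1.guid
confused = same_by_guid(m1, m2)    # the GUID test now says "same map"
still_two_objects = m1 is not m2   # although there are still two objects
```

Once the GUIDs coincide, the two maps can no longer be distinguished independently of their values, which is the loss described above.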

>   - For "to change the semantics of our languages in ways that lose
> identity", I do not know what to write, because I'm not sure what's being
> said.

Read "lose the ability to distinguish maps or arrays independently of their
values".

>   - For "This implies identity" read, perhaps, "This implies some sort of
> identity across updates".

Again, "identity" is a shorthand for the ability to distinguish maps or arrays
independently of their values.

C. M. Sperberg-McQueen <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #2 from C. M. Sperberg-McQueen <[hidden email]> ---
Thank you for the quotations from Grady Booch.  A couple questions occur to me
in that connection.

  - You and others have argued in the past that we must avoid the term "object
identity" because we have no objects and are not defining an OO language.
Given that premise, is it not an inconsistency on your part to suggest that we
should follow what you suggest is an explicitly OO usage of the term
"identity"?

  - The quotations from Booch seem to me perfectly normal instantiations of the
definitions of "identity" found in standard dictionaries of English as "the
condition of being ... itself, and not another" (this formulation from American
College Dictionary, ed. Clarence Barnhart [New York:  Random House, 1947]).
None of them seem to me to license your conclusion that integers and other
immutable things lack identity.  On the contrary, they also can be
distinguished from all other things, and thus they seem to fit his
characterization of identity.  Does he elsewhere say that integers have no
identity, or that 1 is not identical to 1?  Or do you believe that the
quotations you give license those conclusions?

--- Comment #3 from Jonathan Robie <[hidden email]> ---
(In reply to C. M. Sperberg-McQueen from comment #2)
> Thank you for the quotations from Grady Booch.  A couple questions occur to
> me in that connection.
>
>   - You and others have argued in the past that we must avoid the term
> "object identity" because we have no objects and are not defining an OO
> language.  Given that premise, is it not an inconsistency on your part to
> suggest that we should follow what you suggest is an explicitly OO usage of
> the term "identity"?

Our nodes have state (properties and values) and identity.  They do not have
behavior.

Our specifications talk about properties, values, and identity with the same
meaning as that of the OO model, and have done so for a very long time.  The
phrase "Each node has a unique node identity" goes back to 2002, and has been
used in two recommendations.

>   - The quotations from Booch seem to me perfectly normal instantiations of
> the definitions of "identity" found in standard dictionaries of English as
> "the condition of being ... itself, and not another" (this formulation from
> American College Dictionary, ed. Clarence Barnhart [New York:  Random House,
> 1947]).  None of them seem to me to license your conclusion that integers
> and other immutable things lack identity.  On the contrary, they also can be
> distinguished from all other things, and thus they seem to fit his
> characterization of identity.  Does he elsewhere say that integers have no
> identity, or that 1 is not identical to 1?  Or do you believe that the
> quotations you give license those conclusions?

To Booch, an object has identity, state and behavior. State has properties and
values.  An integer is a value.  

His model does not use the term 'identity' with respect to integers.  Neither
does ours. To do so, we would have to define what we mean by it. Our
specifications have no need to define identity for integers, because the
identity of an integer is indistinguishable from its value.  The integer 1 is
no longer the integer 1 if you change its value to 2.  Introducing a concept
like this would be confusing terminology for people who use 'identity' in the
sense of 'object identity' or 'node identity', and I don't think it adds value.

The word "identical" is not a technical term that is defined in our model. I
can't answer your question unless we define it.  If it means "has the same
value for all properties", I would answer it one way.  If it means "has the
same value for all properties and the same unique identity", I would answer it
the other way.

--- Comment #4 from C. M. Sperberg-McQueen <[hidden email]> ---
A note on a point of detail, for the record.  In comment 3, Jonathan Robie
writes

   The word "identical" is not a technical term that is defined in our model.

I'm assuming that "in our model" here means "in our specifications". (If not,
this remark will be irrelevant.)

In context, I believe JR is referring to the term "identical" as applied to
values.  But "identical" is defined for values in our specifications, in
section 1.6.4 Properties of functions [1] of Functions and Operators.  And in
section 2.3 Node Identity [2], the XDM spec answers explicitly the question JR
declines to answer.

[1]
https://www.w3.org/XML/Group/qtspecs/specifications/xpath-functions-31/html/Overview.html#properties-of-functions

[2]
https://www.w3.org/XML/Group/qtspecs/specifications/xpath-datamodel-31/html/Overview.html#node-identity

Liam R E Quin <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #5 from Liam R E Quin <[hidden email]> ---
A lot of the rhetoric around identity has of course come from the implementation
of programming languages. In C and C++ style languages identity is usually
implemented as "machine address", but atomic values such as an integer,
character or floating-point number don't have a machine address: you can't
write 6 to the location where 42 is defined, since there is no such location.
Indirection is not generally used to access atomic values in such languages.

The pragmatic use for identity becomes, "identity is the property that lets you
distinguish two or more things efficiently, and gives you a handle to a thing
that may change over time". That does not define identity, of course, except
through its properties.

This is not the same usage as mathematical identity, but is very common in
programming language design and specification. Not all languages use the term
"identity" for this concept, however. Perhaps we should rather say haecceity,
"this-one-ness".

--- Comment #6 from Jonathan Robie <[hidden email]> ---
(In reply to Liam R E Quin from comment #5)

> This is not the same usage as mathematical identity, but is very common in
> programming language design and specification. Not all languages use the
> term "identity" for this concept, however. Perhaps we should rather say
> haecceity, "this-one-ness".

Not all programming languages use the term 'haecceity' for this either ;->

But I agree that this discussion is parallel to the philosophical discussion of
haecceity and quiddity - see
http://plato.stanford.edu/entries/medieval-haecceity/ for a good introduction.

--- Comment #7 from Jonathan Robie <[hidden email]> ---
(In reply to C. M. Sperberg-McQueen from comment #4)

> In context, I believe JR is referring to the term "identical" as applied to
> values.  But "identical" is defined for values in our specifications, in
> section 1.6.4 Properties of functions [1] of Functions and Operators.  And
> in section 2.3 Node Identity [2], the XDM spec answers explicitly the
> question JR declines to answer.

The term "identical" in Functions and Operators is defined in the context of
saying whether two function calls return the same result, but if we want it as
a defined term, perhaps it should be defined with a wider scope.

The XDM is indeed quite explicit on this:

"Each node has a unique identity. Every node in an instance of the data model is
unique: identical to itself, and not identical to any other node. (Atomic
values do not have identity; every instance of the value “5” as an integer is
identical to every other instance of the value “5” as an integer.)"

Indeed, that paragraph seems quite clear to me.

Ghislain Fourny <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #8 from Ghislain Fourny <[hidden email]> ---
The way I am looking at identity, in this case, is purely from a query/result
perspective, that is, about what is visible or exposed to the user, as opposed
to any kind of physical identity/address in memory. That is, I define identity
not in absolute terms, but "axiomatically", just as you can define what a line
and a point are in geometry only by expressing axioms using these words (two
points uniquely determine a line, etc.). I remember from my childhood a book in
which Euclidean axioms were reformulated with camels and asparagus rather than
lines and points, to illustrate the separation between words and meaning :-)

In other words:

1. For XML nodes, I view identity in terms of the "is" operator's returning
true or false (and document order operators).

2. If you include updates (be it copy-modify-return or scripting), identity is
also exposed, in that two nodes are identical if, applying updates to the one,
you see them on the other node as well.

3. If you include a persistent layer, you might notice identity exposure via a
change of behavior w.r.t. the above definitions ("is" was returning true and
now returns false, or applying updates had and no longer has an effect at
several places in the structure, etc).


In the case of maps/arrays, there are no "is" or document order comparison
operators, so only updates and persistence reveal the "identity". I think that
in that case the definitions above apply as well.
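Definition 2 above can be sketched without ever comparing addresses, here in Python with dicts standing in for maps. The helper update_visible_through is hypothetical; it assumes the probe key is not already in use and that both arguments are mutable mappings.

```python
import copy


def update_visible_through(x, y, probe_key="__probe__"):
    """Return True if an update applied to x can be observed through y.

    Only observable behaviour is used: x and y are never compared directly.
    """
    if probe_key in x or probe_key in y:
        raise ValueError("probe key already in use")
    x[probe_key] = True       # apply an update to x ...
    seen = probe_key in y     # ... and check whether y shows it
    del x[probe_key]          # undo the probe update
    return seen


m = {"one": 1}
alias = m                     # a second name for the same map
clone = copy.deepcopy(m)      # an equal but separate map

alias_shares_updates = update_visible_through(m, alias)
clone_shares_updates = update_visible_through(m, clone)
```

Under this purely behavioural test the alias counts as "the same" map and the clone does not, even though all three names have equal values throughout.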


I'm not 100% sure whether staying at the language semantics level is
sufficient, but at least it has been for me so far. I am unsure whether invoking
physical-level or OO-programming-level machinery helps or makes the matter more
complicated here.

To put it simply, my feeling is that the specifications should still be
implementable if we used the word "asparagus" instead of "identity" everywhere
:-)

Just my naive perspective. I hope it helps.

--- Comment #9 from Jonathan Robie <[hidden email]> ---
I'm still not sure if there's a problem that needs to be solved in our
specifications.

Is the current terminology problematic? If so, why? If not, I'm in favor of
sticking with status quo terminology that has been used since 2002 (13 years
now).

Do we need a better description of these terms in our specifications?

--- Comment #10 from Ghislain Fourny <[hidden email]> ---
I think I am fine with the existing terminology as well.

Although I do feel like the definition here:


"Each node has a unique identity. Every node in an instance of the data model
is unique: identical to itself, and not identical to any other node."


is cyclic and informal ("itself", "identical" and "other" kind of recursively
rely on the very term "identity" being defined here). But it feels like it's
the "is" operator that is implicitly hidden in this sentence so I am fine with
it. I wouldn't mind making it explicit though.

Michael Kay <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #11 from Michael Kay <[hidden email]> ---
How about: "A node has a hidden property referred to as its identity. Many
operations that construct new nodes are defined to return a node whose identity
is distinct from that of any other node. The identity of nodes is exposed in a
number of ways:

* When two expressions return nodes, it is possible to compare the identity of
the nodes they return using the "is" operator. If f() is an operation defined
to construct new nodes, then the result of the expression "f() is f()" is
false.

* Some operators, for example the "union" operator and the path operator "/",
are defined to eliminate duplicate nodes, meaning that there will never be two
items at different positions in the result sequence that are nodes with the
same identity.

* The function fn:generate-id takes a node as argument: it is guaranteed that
when two nodes have the same identity, fn:generate-id applied to those nodes
will return equal strings (compared using the Unicode Codepoint Collation), and
that when they have different identity, fn:generate-id will return unequal
strings.

* In XQuery Update, it is possible to modify properties of a node (for example,
adding or removing children, or changing the string value) without changing the
node's identity. The semantics of update rely heavily on the concept of node
identity: for example, adding two attributes to the same node is fundamentally
different from adding them to different nodes."
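The four bullet points can be modelled in a few lines of Python, offered only as a rough analogy. Every name here (Node, new_node, is_same, union, generate_id) is a hypothetical stand-in, and the sketch ignores document order, which the real "union" and "/" operators also impose.

```python
import itertools

_next_identity = itertools.count(1)


class Node:
    def __init__(self, name):
        self.name = name
        self._identity = next(_next_identity)  # the hidden property


def new_node(name="e"):
    """Stand-in for a constructing operation f(): always a fresh identity."""
    return Node(name)


def is_same(x, y):
    """Analogue of the "is" operator."""
    return x._identity == y._identity


def union(xs, ys):
    """Analogue of duplicate elimination: at most one item per identity."""
    seen, result = set(), []
    for n in itertools.chain(xs, ys):
        if n._identity not in seen:
            seen.add(n._identity)
            result.append(n)
    return result


def generate_id(n):
    """Analogue of fn:generate-id: equal strings exactly when identities match."""
    return f"n{n._identity}"


a, b = new_node(), new_node()
fresh_pair_is_same = is_same(new_node(), new_node())  # "f() is f()"
merged = union([a, b], [b, a])                        # duplicates removed

id_before = generate_id(a)
a.name = "renamed"            # an update changes a property, not the identity
id_after = generate_id(a)
```

The update in the last lines changes the node's name while its generated id stays the same, mirroring the point that properties can change without changing identity.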

--- Comment #12 from Ghislain Fourny <[hidden email]> ---
I like Mike's suggestion. Thank you for also adding other expressions exposing
identity such as duplicate elimination or ID generation.

--- Comment #13 from Jonathan Robie <[hidden email]> ---
I like Mike's suggestion too.

Which specification should this text live in?

Jonathan Robie <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #14 from Jonathan Robie <[hidden email]> ---
The Working Group agrees to close this in favor of the resolution of
https://www.w3.org/Bugs/Public/show_bug.cgi?id=27040.
