Appendix H: Internationalization

Alexandre Rademaker
I am about to finish the translation of our OpenWordNet-PT to RDF
integrating it with the original Princeton WordNet 3.0.

In appendix H of http://www.w3.org/TR/wordnet-rdf/:

"... Integration of WordNets implies creating mappings between
entities in the WordNets to indicate lexico-semantic relationships
between them, e.g. a property that signifies that the meanings of two
Synsets overlap. The entities that represent language concepts that
should be able to map are instances of the classes: Synset, WordSense
and Word..."

I can easily see the utility of a relation between Synsets and
WordSenses like "hasTranslation". But I can't see any use in relating
the words... Any idea?

Best,

Alexandre Rademaker
http://arademaker.github.com/



Re: Appendix H: Internationalization

Aldo Gangemi-5
Dear Alexandre, that's good news.
What schema are you using for the RDF translation? The original RDF WordNet one? In that case, are you making any extensions, e.g. for special relations existing in OpenWordNet-PT?

Concerning Appendix H, it was made mainly as a pointer to future work on creating links between wordnets.
Definitely, the suggestion that entities involved in multi-language mappings include synsets, word senses, and words was pretty generic :).

I agree that Synset and WordSense mappings are the important ones. However, it's not crazy to conceive of multi-lingual mappings between words, e.g. historical (or synchronic) relations between words of different languages. An example is the English word "spider", which used to mean a "light carriage with big wheels" in 1879, but is currently used in Italian (in a sense historically borrowed from English) with the meaning of "roadster".

In principle, you may represent such a relation as one between the word senses "wordsense-spider@en@1879" and "wordsense-spider@it", but it'd be useful (especially in the absence of "historical" wordnets) to have a relation between "word-spider@en" and "wordsense-spider@it" or even "word-spider@it" (for derivation aspects).
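The three alternatives above can be sketched as plain subject/property/object triples. This is only an illustration: the property name `ex:derivedFrom` and the direction of the link (Italian entity pointing to its English source) are assumptions, not part of the W3C schema or of OpenWordNet-PT; the entity identifiers are the ones used in the paragraph above.

```python
# Hypothetical property name; the thread does not name one.
DERIVED_FROM = "ex:derivedFrom"

# Each triple is (subject, property, object).
triples = [
    # Sense to sense: requires a dated ("historical") English sense to exist.
    ("wordsense-spider@it", DERIVED_FROM, "wordsense-spider@en@1879"),
    # Sense to word: usable when no historical English sense is available.
    ("wordsense-spider@it", DERIVED_FROM, "word-spider@en"),
    # Word to word: records the derivation only.
    ("word-spider@it", DERIVED_FROM, "word-spider@en"),
]


def kind(identifier: str) -> str:
    """Return the entity class encoded in the identifier prefix."""
    return "WordSense" if identifier.startswith("wordsense-") else "Word"


def link_kinds(ts):
    """Summarise each link by the classes of its two endpoints."""
    return [(kind(s), kind(o)) for s, _, o in ts]


print(link_kinds(triples))
```

Only the first triple needs a sense that a present-day English wordnet is unlikely to contain, which is why the other two forms may be the practical choice.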

Best
Aldo

On 10 May 2012, at 22:57, Alexandre Rademaker wrote:

> [...]




Re: Appendix H: Internationalization

Chris Welty-2
In reply to this post by Alexandre Rademaker

Alexandre,

One criticism of WordNet synsets is that a binary classification must happen: each word must either be a member of a synset or not.  In reality, there is a degree to which a word may belong to a synset, and this may be useful to capture, especially when translating.

One example is "to know" in English and "savoir" vs. "connaitre" in French.  In basic French, we learn that "savoir" is to know something, and "connaitre" is to know a person.  We were taught that what in English seems to be a single sense is two senses in French.

If the English WordNet had been constructed without knowledge of this distinction, there would be only one sense of "to know", which would then be translatable to two synsets in French. You would need to understand that this mapping is incomplete.

It gets more complicated when you realize that what we learned in basic French is not completely true. While we use the word "know" in English for knowing people, the best translation of the French "connaitre" is "to be familiar with".  Indeed, French uses the word that way: you can "connaitre" a place, a store, etc. It turns out to be something of a historical artifact that (American) English more commonly uses "to know" in this case.  But "familiar" does not belong to this (English) synset as strongly as "know" does. It belongs, and would be understood, but based on the frequency of usage it would sound a little archaic and formal to use "familiar" instead of "know" for a person.

So, the point is, how can you capture the fact that subtleties of language can create partial mappings between languages?

This is often easier to explain with something that has a scientific understanding as a range of values, like colors.  Take the English word "maroon", which is a color that lies somewhere on the spectrum between red and purple.  Would you lump this into the synset for red, or for purple?   Where do you draw the line in that synset, at a particular point in the spectrum?  What if you found that different languages and cultures draw their boundaries differently, so that, say, Italians "see" red as a darker color than Germans do, and the mapping of "maroon" into these languages is partial?
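One rough way to picture such a partial mapping is to treat each color term as an interval on a single axis and measure how much of one interval another covers. The sketch below assumes an arbitrary 0-100 scale running from red towards purple, and every boundary value is invented for illustration; nothing here comes from an actual wordnet or from color science.

```python
def overlap(a, b):
    """Fraction of interval a that is also covered by interval b."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return max(0.0, hi - lo) / (a[1] - a[0])


# Invented boundaries on an arbitrary 0-100 scale (red -> purple).
maroon_en = (20, 40)  # English "maroon"
red_it = (0, 30)      # hypothetical Italian "red", reaching into darker shades
red_de = (0, 25)      # hypothetical German "red", with a narrower boundary

print(overlap(maroon_en, red_it))  # 0.5
print(overlap(maroon_en, red_de))  # 0.25
```

With these made-up numbers, "maroon" maps to the Italian term more strongly than to the German one, and to neither completely, which is the kind of information a plain yes/no link cannot express.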

Does that make 'sense' ;) ?

-Chris

On 5/10/2012 4:57 PM, Alexandre Rademaker wrote:

> [...]


Re: Appendix H: Internationalization

Gerard de Melo-2
Dear Chris,

Thanks for bringing up these interesting examples. I am one of the people
working with Alexandre.

I think the cases you describe actually point to the more general problems
we face whenever we attempt to discretize meaning into nicely manageable
and identifiable entries. These same issues appear in the form of lumping
vs. splitting decisions when compiling monolingual dictionaries or when
defining resources on the Semantic Web (is my definition of Italy really
the same as the one used in your dataset?).

There is no simple answer to these problems. Some differences will be
considered irrelevant, and others will be considered important. Maroon,
magenta, fuchsia, and so on indeed all have separate WordNet entries, and
WordNet does distinguish several different senses of "know". For
cross-linguistic work, indeed new resources should be defined when
differences are deemed important.

I agree that RDF often pushes us towards binary yes/no choices.
Weights or probabilities might serve as a (very crude) approximation of
the gradability involved in such decisions.
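As a minimal sketch of that idea, assume each cross-lingual link carries a weight between 0 and 1. The identifiers and all weights below are invented for illustration and are not taken from any wordnet; the example only shows how a threshold turns graded links back into the yes/no statements a plain triple can make.

```python
# (source, target) -> weight in [0, 1]; every value here is invented.
weighted_links = {
    ("know@en", "savoir@fr"): 0.6,
    ("know@en", "connaitre@fr"): 0.4,
    ("be familiar with@en", "connaitre@fr"): 0.9,
}


def binarise(links, threshold):
    """Collapse weighted links into the yes/no links a plain triple can state."""
    return sorted(pair for pair, weight in links.items() if weight >= threshold)


print(binarise(weighted_links, 0.5))
```

Raising or lowering the threshold changes which links survive, so the binary result depends on a choice that the weights themselves leave open.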

Best regards,
Gerard

--
Gerard de Melo [[hidden email]]
http://www.icsi.berkeley.edu/~demelo/




Re: Appendix H: Internationalization

Valeria de Paiva
In reply to this post by Aldo Gangemi-5
Dear Aldo,

Thank you for your informative message. I am one of the people working with Alexandre on OpenWordNet-PT. We don't have special relations in OpenWordNet-PT at the moment, only a field for comments. I'm sure you're right that people can think of several other interesting relations, but we're trying to keep things as simple as we can. (Also, I believe we're using the original RDF WordNet schema, but I'm based in California and Alexandre is based in Rio, so I may have the wrong end of the stick on this.)

Thank you,
Valeria de Paiva

On Mon, May 14, 2012 at 12:27 AM, Aldo Gangemi <[hidden email]> wrote:
> [...]

--
Valeria de Paiva
http://www.cs.bham.ac.uk/~vdp/
http://valeriadepaiva.org/www/

Re: Appendix H: Internationalization

Valeria de Paiva
In reply to this post by Chris Welty-2
Dear Chris,
Thank you for the interesting discussion. (I am another one of the people working with Alexandre. Actually, I guess I am the originator of the project, since I wanted to reproduce for Portuguese the work using the Bridge system that I worked on with Danny Bobrow, Ron Kaplan and the whole NLTT team at PARC.)

Indeed, while the binary classification of members of synsets can seem very coarse at times, we feel that a first approximation to a resource just like the original Princeton WordNet would be very useful for Portuguese. Bootstrapping it from a translation from English seems the quick way to go about it, but it also leads into the difficult stuff you mention.

My take is that we should try to produce a first version of something a bit like a Portuguese version of WordNet, and then, having a coarse approximation, get Brazilian lexicographers to do their work. Whether we can get them interested or not, we don't know yet...

Best regards,
Valeria

On Mon, May 14, 2012 at 6:19 AM, Chris Welty <[hidden email]> wrote:

> [...]

--
Valeria de Paiva
http://www.cs.bham.ac.uk/~vdp/
http://valeriadepaiva.org/www/