Hi
The XSD 1.1 Datatypes spec, in its base64Binary section, gives the following pseudocode for calculating the octet length of a base64Binary-encoded string:

1) lex2 := killwhitespace(lexform)  -- remove whitespace characters
2) lex3 := strip_equals(lex2)  -- strip padding characters at end
3) length := floor (length(lex3) * 3 / 4)  -- calculate length

My understanding is that, for a base64Binary-encoded string, its lexical length would be a multiple of 4 and its octet length would be a multiple of 3.
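For readers who want to try the numbers, the three pseudocode steps can be sketched in Python roughly as below. The function name `octet_length` is made up for this illustration and is not from the spec, and `str.split()` is assumed to be an adequate stand-in for the spec's whitespace removal.

```python
def octet_length(lexform: str) -> int:
    """Octet length of a base64Binary lexical form, following the three steps."""
    # Step 1: remove whitespace (assumption: str.split() covers the
    # space, tab, CR and LF characters that can appear in the lexical form).
    lex2 = "".join(lexform.split())
    # Step 2: strip the padding characters at the end.
    lex3 = lex2.rstrip("=")
    # Step 3: floor(length(lex3) * 3 / 4), done in integer arithmetic.
    return (len(lex3) * 3) // 4


# Twelve characters including two "=" signs: lex3 has length 10.
print(octet_length("TWFuTWFuTQ=="))  # 7
```

This only computes a length; it does not check that the input is a valid base64Binary lexical form.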
As an example, take a base64Binary-encoded string that contains no whitespace or padding characters (=), so that lexform is the same as lex3 in the above code. If lex3 has length 10, then according to the above code the octet length would be 7 (not a multiple of 4).
Are octet lengths which are not a multiple of 4 valid for a base64Binary-encoded string? Also, what should be the formula for calculating the lexical length from the octet length of a base64Binary string?
Should it be something like this:

lexicallength := ceil( octetlength*4/3)

If we take an example with octetlength=10, the lexicallength is not a multiple of 4.
I am clueless here. Appreciate your help on the same.

Best Regards,
Satya Prakash Tripathi
On Apr 9, 2011, at 2:49 PM, xmlplus custodians wrote:

> Hi
>
> The XSD1.1 DataTypes spec in the base64Binary section gives following pseudocode for calculating octet length of a base64Binary encoded string.
>
> 1) lex2 := killwhitespace(lexform)  -- remove whitespace characters
> 2) lex3 := strip_equals(lex2)  -- strip padding characters at end
> 3) length := floor (length(lex3) * 3 / 4)  -- calculate length
>
> My understanding is that, for a base64Binary encoded string, it's lexical length would be a multiple of 4 and it's octet length would be a multiple of 3.

It's been a while since I read the base64 spec, but my recollection is that base64 encodes octet sequences of any length, not just octet sequences whose length is a multiple of three. The lexical length (ignoring whitespace) will indeed always be a multiple of four; the padding characters are added at the end in order to ensure that this is so.

> As an example if we take a base64Binary encoded string, which doesn't contain whitespaces or padding
> chars(=), so that lexform is same as lex3 in above code. Now let us take a lex3 of length 10 then,
> according to above code, the octet length would be 7(not a multiple of 4).

Yes, precisely. If the lexical form, ignoring whitespace, is twelve characters long and the last two characters are equals signs, then what you have is two clusters of four characters, each of which encodes three octets, followed by a final cluster of two non-padding characters, which encodes the final octet.

> Are octetlengths which are not multiple of 4, valid in case of base64Binary encoded string ?

Yes.

> Also, what should be the formulae for calculating lexicallength from the octetlength of a base64Binary string ?
> Should it be something like this:
>
> lexicallength := ceil( octetlength*4/3)
>
> If we take an example with octetlength=10, the lexicallength is not a multiple of 4.
> I am clueless here. Appreciate your help on the same.
In base64 encoding, any input octet stream is subdivided into 24-bit (i.e. three-octet) groups, each of which is encoded in four base64 digits. If there are fewer than 24 bits in the final group of bits, then padding characters are used. So if you wish to calculate the minimum length of the base64 encoding for an arbitrary sequence of octets (i.e. the length of an encoding without any white space), then I think the formula you want will be 4 * ceil( octetlength / 3).

It is a good idea, though, to follow the recommendations in the RFC for adding whitespace and newlines; it makes debugging problems easier, if nothing else.

You may find it helpful to read RFC 3548, which is normatively referred to from the XSD spec.

http://www.ietf.org/rfc/rfc3548.txt

I hope this helps.

****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
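As a quick sanity check, the formula 4 * ceil(octetlength / 3) can be compared against what a real encoder produces. The sketch below uses Python's standard `base64` module; the helper name `encoded_length` is invented for this example, and the encoder is assumed to emit padded output with no line breaks, which is how `base64.b64encode` behaves.

```python
import base64


def encoded_length(octet_length: int) -> int:
    """Minimum base64 length (no whitespace) for the given number of octets."""
    # (n + 2) // 3 is ceil(n / 3) in integer arithmetic.
    return 4 * ((octet_length + 2) // 3)


# Compare the formula with the length of actual encoder output.
for n in range(0, 50):
    actual = len(base64.b64encode(bytes(n)))
    assert encoded_length(n) == actual, (n, actual)

print(encoded_length(10))  # 16
```

For octetlength = 10 this gives 16, a multiple of 4, whereas ceil(10 * 4 / 3) would give 14.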
> You may find it helpful to read RFC 3548, which is normatively referred
> to from the XSD spec.
>
> http://www.ietf.org/rfc/rfc3548.txt

It may also be worth noting that XSD requires strict conformance to the RFC, whereas most base64 implementations available "in the wild" are liberal in what they accept, for example in areas such as the exact number of trailing "=" signs.

Michael Kay
Saxonica
Hi Sperberg-McQueen,

The idea that every 3 octets map to 4 base64 characters is what misled me into thinking that octet lengths have to be a multiple of 3. I had not accounted for the use of padding characters when encoding to base64. With 1 or 2 padding characters added where needed, the base64-encoded string, once stripped of whitespace, always has a length that is a multiple of 4. It's clear now!

Thanks for the detailed and patient reply! It was both very insightful and helpful.

Best Regards,
Satya Prakash Tripathi

On Mon, Apr 11, 2011 at 9:50 PM, C. M. Sperberg-McQueen <[hidden email]> wrote:

Mike,
I agree. It is evident that only 1 or 2 trailing "=" characters should be allowed. This can also be deduced directly, for those who prefer an intuitive argument: if the last group holds fewer than 24 bits, that is, either 8 bits or 16 bits, then those bits encode to 2 or 3 base64 characters respectively. Hence exactly 2 or 1 padding (=) characters are needed in those cases to complete the final 4-character base64 group.
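The same deduction can be written down as a small Python sketch. The function name `padding_chars` is hypothetical, and the comparison with `base64.b64encode` assumes that encoder always pads its output, which the standard library one does.

```python
import base64


def padding_chars(octet_length: int) -> int:
    """Number of trailing "=" characters for the given number of octets."""
    # Remainder 1 (8 leftover bits)  -> 2 base64 chars + 2 padding chars.
    # Remainder 2 (16 leftover bits) -> 3 base64 chars + 1 padding char.
    # Remainder 0 (full groups only) -> no padding.
    return (3 - octet_length % 3) % 3


# Cross-check against the standard library encoder on all-zero input,
# whose non-padding output consists only of "A" characters.
for n in range(0, 20):
    assert padding_chars(n) == base64.b64encode(bytes(n)).count(b"=")
```

So a strictly conforming lexical form never ends in more than two "=" signs.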

Best Regards,
Satya Prakash Tripathi

On Mon, Apr 11, 2011 at 11:03 PM, Michael Kay <[hidden email]> wrote:
