.NET Forum / Languages / C# / July 2007
Base64 question
Jim Brandley - 07 Jul 2007 23:41 GMT I need to append a short ciphertext string as a query variable, encoded so it's valid for a URL. After encryption, I convert the bytes to Base64. However, the result includes characters that are invalid for a URL, notably '+' symbols. So, I have to cycle the output string through HttpUtility.UrlEncode(). That takes time. I wrote my own URL-safe Base64 converter in C# that's about as lean as I can make it. It is much slower (about 6 times) than the one provided. However, it runs in about 70% of the time required to use the standard Base64 converter followed by a trip through UrlEncode().
I am using .Net 2.0, and I have not found a way to coerce the built in Base64 converter to use a character set that could avoid the trip through UrlEncode. Am I missing anything? If not, is there any way to add this capability to a future release?
Thanks,
Jim Brandley
Arne Vajhøj - 08 Jul 2007 00:18 GMT
> I need to append a short ciphertext string as a query variable encoded so
> it's valid for a URL. After encryption, I convert the bytes to Base64.
[quoted text clipped - 10 lines]
> UrlEncode. Am I missing anything? If not, is there any way to add this
> capability to a future release?
I find it difficult to believe that URL encoding could have a noticeable impact on total performance.
Arne
Jim Brandley - 08 Jul 2007 01:54 GMT According to the Stopwatch class it did.
>> I need to append a short ciphertext string as a query variable encoded so
>> it's valid for a URL. After encryption, I convert the bytes to Base64.
[quoted text clipped - 15 lines]
>
> Arne
Arne Vajhøj - 08 Jul 2007 03:19 GMT
>>> I need to append a short ciphertext string as a query variable encoded so
>>> it's valid for a URL. After encryption, I convert the bytes to Base64.
[quoted text clipped - 12 lines]
>> I find it difficult to believe that URL encoding could have a
>> noticeable impact on total performance.
> According to the Stopwatch class it did.
Is Stopwatch a load simulator program?
Arne
Jim Brandley - 08 Jul 2007 03:28 GMT It is a new (to 2.0) class in System.Diagnostics. It is easy to use and is useful when comparing the performance of different approaches to solving a problem.
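For readers who want to reproduce this kind of comparison, here is a minimal sketch of a Stopwatch-based timing harness. The names EncodeTiming, Encoder, Time, Base64ThenUrlEncode and Base64ThenReplace are hypothetical and not from the thread, and Uri.EscapeDataString stands in for HttpUtility.UrlEncode so that the sketch needs no System.Web reference; the two do not produce identical output for every input, so treat the numbers as indicative only.

```csharp
using System;
using System.Diagnostics;

// Hypothetical delegate type for any byte[] -> string encoder under test.
public delegate string Encoder(byte[] data);

public static class EncodeTiming
{
    // Runs the encoder `iterations` times and returns elapsed milliseconds.
    public static long Time(Encoder encode, byte[] data, int iterations)
    {
        encode(data); // warm-up call so JIT compilation is not measured
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            encode(data);
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Built-in Base64 followed by percent-encoding
    // (assumption: Uri.EscapeDataString replaces HttpUtility.UrlEncode here).
    public static string Base64ThenUrlEncode(byte[] data)
    {
        return Uri.EscapeDataString(Convert.ToBase64String(data));
    }

    // Built-in Base64 followed by two single-character replacements.
    public static string Base64ThenReplace(byte[] data)
    {
        return Convert.ToBase64String(data).Replace('+', '-').Replace('/', '_');
    }
}
```

A call such as EncodeTiming.Time(EncodeTiming.Base64ThenUrlEncode, data, 1000000) can then be compared against the same call with Base64ThenReplace. Results depend heavily on the machine, the runtime and the input length, so no figures are claimed here.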
>>>> I need to append a short ciphertext string as a query variable encoded
>>>> so it's valid for a URL. After encryption, I convert the bytes to
[quoted text clipped - 18 lines]
>
> Arne
Arne Vajhøj - 08 Jul 2007 03:47 GMT
>>>>> I need to append a short ciphertext string as a query variable encoded
>>>>> so it's valid for a URL. After encryption, I convert the bytes to
[quoted text clipped - 15 lines]
>> Stopwatch a load simulator program ?
> It is a new (to 2.0) class in System.Diagnostics. It is easy to use and is
> useful when comparing the performance of different approaches to solving a
> problem.
Is it?
Try thinking about this.
You can do about 1 million Base64 conversions of a small string in 1 second
=>
If your web server is CPU bound at about 1000 requests/second, with one conversion per request, then the Base64 conversion is using 0.1% of your CPU (1000 / 1,000,000) and something else is chewing up the other 99.9%.
Arne
Jim Brandley - 08 Jul 2007 04:25 GMT I did not mean to imply this was a bottleneck. I strive to prevent the creation of bottlenecks - easier to do that than track them down later. I'm working on a very large (to me anyway - approx 2M lines of C#, not counting aspx and ascx pages) web app for intranets. Pages are generated with maybe 2% static text and 98% dynamic, and can have 1500 to 1700 users at any given time. It is primarily presenting and recording real-time information in large manufacturing environments.
Responsiveness is a big deal for our customers. I spend all my time in the business objects, data layer and writing SQL. I very seldom do anything with screens, except present the information they need for binding. Any time I write a bit of code that gets executed with any frequency, I try to find the time to analyze it carefully and shave whatever I can.
>>>>>> I need to append a short ciphertext string as a query variable
>>>>>> encoded so it's valid for a URL. After encryption, I convert the
[quoted text clipped - 34 lines]
>
> Arne
Arne Vajhøj - 09 Jul 2007 02:16 GMT
> "Arne Vajhøj" <arne@vajhoej.dk> wrote in message
>> You can do about 1 million conversions to base64 of
[quoted text clipped - 7 lines]
> I did not mean to imply this was a bottleneck. I strive to prevent the
> creation of bottlenecks - easier to do that than track them down later. I'm
> working on a very large (to me anyway - approx 2M lines of C#, not counting
> aspx and ascx pages) web app for intranets. Pages are generated with maybe
> 2% static text and 98% dynamic, and can have 1500 to 1700 users at any given
> time. It is primarily presenting and recording real-time information in
> large manufacturing environments.
>
> Responsiveness is a big deal for our customers. I spend all my time in the
> business objects, data layer and writing SQL. I very seldom do anything with
> screens, except present the information they need for binding. Any time I
> write a bit of code that gets executed with any frequency, I try to find the
> time to analyze it carefully and shave whatever I can.
I still don't think it is worth it.
You should write 95%-98% of your code with easy maintenance as the priority, and then optimize for speed only the 2%-5% of your code that has been proven to impact performance.
Writing clever code that optimizes stuff that does not need to be optimized does not reduce hardware costs but will increase maintenance costs dramatically.
Simple code is usually better than clever code when we talk business.
I used to do a lot of that type of micro-optimization in the 1980s. But not any more.
I think you should use the framework methods and just consider the optimized code an interesting academic exercise.
Arne
Peter Bromberg [C# MVP] - 08 Jul 2007 01:56 GMT Jim, You could consider using HEX instead. This article may provide some ideas: http://www.eggheadcafe.com/articles/20060427.asp
-- Peter Site: http://www.eggheadcafe.com UnBlog: http://petesbloggerama.blogspot.com BlogMetaFinder(BETA): http://www.blogmetafinder.com
> I need to append a short ciphertext string as a query variable encoded so
> it's valid for a URL. After encryption, I convert the bytes to Base64.
[quoted text clipped - 14 lines]
>
> Jim Brandley
Jim Brandley - 08 Jul 2007 02:08 GMT Thanks for the response. I already have code in place to convert byte arrays to hex char arrays. It's fast too. The problem is that hex output is about 50% longer than Base64 (two characters per byte instead of four characters per three bytes), which increases the risk of exceeding the legal length for URLs.
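The length trade-off between the two encodings can be checked with a few lines of arithmetic. This is only an illustrative sketch; the EncodedLength class and its method names are hypothetical and not part of the thread or the Framework.

```csharp
using System;

public static class EncodedLength
{
    // Hex uses two characters for every input byte.
    public static int Hex(int byteCount)
    {
        return byteCount * 2;
    }

    // Padded Base64 uses four characters for every group of up to three bytes.
    public static int Base64(int byteCount)
    {
        return ((byteCount + 2) / 3) * 4;
    }
}
```

For a 48-byte ciphertext this gives 96 hex characters against 64 Base64 characters, which is the 50% difference described above. Percent-encoding any '+', '/' or '=' afterwards would add two characters per escaped symbol to the Base64 figure.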
> Jim,
> You could consider using HEX instead. This article may provide some ideas:
[quoted text clipped - 25 lines]
>>
>> Jim Brandley
Jim Brandley - 08 Jul 2007 02:13 GMT Peter - I did not notice your name when I responded to your previous post. I have read many of your articles. I like the way you write, and I appreciate your contribution to the knowledgebase available on the web.
Jim
> Jim,
> You could consider using HEX instead. This article may provide some ideas:
[quoted text clipped - 25 lines]
>>
>> Jim Brandley
Arne Vajhøj - 08 Jul 2007 03:22 GMT
> I need to append a short ciphertext string as a query variable encoded so
> it's valid for a URL. After encryption, I convert the bytes to Base64.
[quoted text clipped - 5 lines]
> the time required to use the standard Base64 converter followed by a trip
> through UrlEncode().
I believe that + is the only character in Base64 output that is not valid in a URL.
Why not a simple String.Replace?
> I am using .Net 2.0, and I have not found a way to coerce the built in
> Base64 converter to use a character set that could avoid the trip through
> UrlEncode. Am I missing anything? If not, is there any way to add this
> capability to a future release?
Base64 is a standard. It is not common to allow mocking with a standard.
Arne
Jim Brandley - 08 Jul 2007 03:25 GMT I'll try that and see what it costs. I was hoping to avoid another iteration through the characters in the string.
>> I need to append a short ciphertext string as a query variable encoded so
>> it's valid for a URL. After encryption, I convert the bytes to Base64.
[quoted text clipped - 18 lines]
>
> Arne
Jim Brandley - 08 Jul 2007 03:46 GMT Arne - That was faster - Thanks for the idea. However, Base64 output also contains the slash '/' character, which means a second pass with string.Replace().
BTW - I agree that altering something that complies with a standard is a bad thing to do. I was on an ANSI committee years ago, and I know why they are built the way they are. However, supplementing that method with an optimized conversion is not a bad thing to do. Maybe call it UrlSafeBase64. The name would convey the reason for the existence of the method, along with a pretty good idea of what the output might be. Just a thought.
Jim
> I'll try that and see what it costs. I was hoping to avoid another
> iteration through the characters in the string.
[quoted text clipped - 21 lines]
>>
>> Arne
Peter Bromberg [C# MVP] - 09 Jul 2007 01:22 GMT Jim, As Jon Skeet pointed out, modifying the Framework System.Convert classes may be the way to go here. A quick decompilation of the System.Convert Base64 methods reveals that:
1) They use unsafe code, which probably accounts for the speed factor.
2) There is a char[] Base64Table used.
So, you could decompile this, create your own method (say, Convert.ToBase64StringUrlSafe), and all you would need to do is change the values in the Base64Table char[] array.
Peter
> Arne - That was faster - Thanks for the idea. However, Base64 is also
> sending out the slash '/' character - that means a second pass with
[quoted text clipped - 34 lines]
> >>
> >> Arne
Jim Brandley - 09 Jul 2007 03:27 GMT Thanks Peter. I'll look into that.
> Jim,
> As Jon Skeet pointed out, modifying the Framework System.Convert classes
[quoted text clipped - 59 lines]
>> >>
>> >> Arne
Arne Vajhøj - 09 Jul 2007 02:09 GMT
> BTW - I agree that altering something that complies with a standard is a bad
> thing to do. I was on an ANSI committee years ago, and I know why they are
> built the way they are. However, supplementing that method with an optimized
> conversion is not a bad thing to do. Maybe call it UrlSafeBase64. The name
> would convey the reason for the existence of the method, along with a pretty
> good idea of what the output might be. Just a thought.
If you insist on pursuing the idea, there is some code attached below, which is the fastest I can write without unsafe code.
Arne
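===================================================
public class Base64
{
    // Standard alphabet. As discussed in the thread, changing the last two
    // characters is what a URL-safe variant would need.
    private static char[] EncVals = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".ToCharArray();
    private static int[] DecVals;

    static Base64()
    {
        // Reverse lookup table: character code -> 6-bit value.
        DecVals = new int[128];
        for(int i = 0; i < 64; i++)
        {
            DecVals[EncVals[i]] = i;
        }
    }

    public string Encode(byte[] b)
    {
        int len = (b.Length * 8 + 5) / 6;   // data characters, without padding
        int extra = 3 - (len + 3) % 4;      // number of '=' padding characters
        char[] res = new char[len + extra];
        int p = b.Length - b.Length % 3;    // bytes in complete 3-byte groups
        int ix = 0;
        int tmp;
        for(int i = 0; i < p; i += 3)
        {
            tmp = (b[i] << 16) | (b[i + 1] << 8) | b[i + 2];
            res[ix + 3] = EncVals[tmp & 0x3F];
            res[ix + 2] = EncVals[(tmp >> 6) & 0x3F];
            res[ix + 1] = EncVals[(tmp >> 12) & 0x3F];
            res[ix] = EncVals[tmp >> 18];
            ix += 4;
        }
        if(extra == 1)
        {
            // Two bytes left over: three data characters plus one '='.
            tmp = (b[p] << 16) | (b[p + 1] << 8);
            res[ix + 3] = '=';
            res[ix + 2] = EncVals[(tmp >> 6) & 0x3F];
            res[ix + 1] = EncVals[(tmp >> 12) & 0x3F];
            res[ix] = EncVals[tmp >> 18];
        }
        else if(extra == 2)
        {
            // One byte left over: two data characters plus "==".
            tmp = b[p] << 16;
            res[ix + 3] = '=';
            res[ix + 2] = '=';
            res[ix + 1] = EncVals[(tmp >> 12) & 0x3F];
            res[ix] = EncVals[tmp >> 18];
        }
        return new String(res);
    }

    public byte[] Decode(string s)
    {
        // The posted version sized the result as (len / 4 + 2) * 3, which
        // returned trailing zero bytes; size it exactly instead.
        int len = s.Length;
        while(len > 0 && s[len - 1] == '=') len--;   // ignore padding
        byte[] res = new byte[len * 3 / 4];
        int full = len - len % 4;                    // characters in complete groups
        int ix = 0;
        int tmp;
        for(int i = 0; i < full; i += 4)
        {
            tmp = (DecVals[s[i]] << 18) | (DecVals[s[i + 1]] << 12) | (DecVals[s[i + 2]] << 6) | DecVals[s[i + 3]];
            res[ix] = (byte)(tmp >> 16);
            res[ix + 1] = (byte)((tmp >> 8) & 0xFF);
            res[ix + 2] = (byte)(tmp & 0xFF);
            ix += 3;
        }
        if(len - full == 3)
        {
            // Three data characters carry two bytes.
            tmp = (DecVals[s[full]] << 18) | (DecVals[s[full + 1]] << 12) | (DecVals[s[full + 2]] << 6);
            res[ix] = (byte)(tmp >> 16);
            res[ix + 1] = (byte)((tmp >> 8) & 0xFF);
        }
        else if(len - full == 2)
        {
            // Two data characters carry one byte.
            tmp = (DecVals[s[full]] << 18) | (DecVals[s[full + 1]] << 12);
            res[ix] = (byte)(tmp >> 16);
        }
        return res;
    }
}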
Jim Brandley - 09 Jul 2007 03:43 GMT Thanks Arne - That's pretty much what I have written. I was using a StringBuilder in Encode last night. I was able to cut the cost in half today by using a char array as you have done. I was surprised at the difference.
>> BTW - I agree that altering something that complies with a standard is a
>> bad thing to do. I was on an ANSI committee years ago, and I know why
[quoted text clipped - 80 lines]
> }
> }
Jon Skeet [C# MVP] - 08 Jul 2007 07:40 GMT
> > I need to append a short ciphertext string as a query variable encoded so
> > it's valid for a URL. After encryption, I convert the bytes to Base64.
[quoted text clipped - 7 lines]
>
> I believe that + is the only non URL valid character in base64 output.
Depending on the exact context, it can be handy to get rid of / and = too. In some cases it's just + that needs to be replaced though, yes.
> Why not a simple String Replace ?
Indeed... possibly with a check to see whether a replacement is needed to start with.
> > I am using .Net 2.0, and I have not found a way to coerce the built in
> > Base64 converter to use a character set that could avoid the trip through
> > UrlEncode. Am I missing anything? If not, is there any way to add this
> > capability to a future release?
>
> Base64 is a standard. It is not common to allow mocking with a standard.
I think it's pretty common to adapt base64 to only include URL-safe characters. Put it this way - it's common enough to have made it into Wikipedia:
http://en.wikipedia.org/wiki/Base64#URL_Applications
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too
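The replace-with-a-check idea can be sketched on top of the built-in converter as follows. This is a hedged illustration, not code from any poster: the class and member names are hypothetical, and the choice of '-' and '_' as substitutes follows the base64url alphabet of RFC 4648, which is an assumption about which variant one would pick. The '=' padding is dropped on encode and restored on decode.

```csharp
using System;

public static class UrlSafeBase64
{
    private static readonly char[] UnsafeChars = { '+', '/' };

    // Encodes with Convert.ToBase64String, then removes URL-unsafe characters.
    public static string Encode(byte[] data)
    {
        string s = Convert.ToBase64String(data).TrimEnd('=');
        // Only pay for the replacement passes when '+' or '/' actually occurs.
        if (s.IndexOfAny(UnsafeChars) >= 0)
        {
            s = s.Replace('+', '-').Replace('/', '_');
        }
        return s;
    }

    // Reverses Encode: restores the standard alphabet and the '=' padding.
    public static byte[] Decode(string s)
    {
        string b64 = s.Replace('-', '+').Replace('_', '/');
        int pad = (4 - b64.Length % 4) % 4;
        return Convert.FromBase64String(b64 + new string('=', pad));
    }
}
```

Because both ends of the conversation are the same application, the only requirement is that Encode and Decode agree with each other. Whether the IndexOfAny check actually saves time depends on the input and should be measured.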
Arne Vajhøj - 08 Jul 2007 16:23 GMT
>> Base64 is a standard. It is not common to allow mocking with a standard.
>
[quoted text clipped - 3 lines]
>
> http://en.wikipedia.org/wiki/Base64#URL_Applications
Hmm.
People seem already to have forgotten the nightmare of incompatible uuencode versions.
:-(
Arne
Jon Skeet [C# MVP] - 08 Jul 2007 20:23 GMT
> >> Base64 is a standard. It is not common to allow mocking with a standard.
> >
[quoted text clipped - 8 lines]
> People seem already to have forgotten the nightmare of
> incompatible uuencode versions.
This isn't usually for communicating between two applications though - it's to allow a stateless application to communicate effectively with itself. In other words, you're in complete control of both "ends" of the conversation, so can be compatible with yourself appropriately. Base64 happens to be a pretty simple format for representing arbitrary binary data, and it just needs a little tweak for the sake of URL encoding.
Arne Vajhøj - 09 Jul 2007 02:22 GMT
>>>> Base64 is a standard. It is not common to allow mocking with a standard.
>>> I think it's pretty common to adapt base64 to only include URL-safe
[quoted text clipped - 14 lines]
> binary data, and it just needs a little tweak for the sake of URL
> encoding.
There is always some excuse to break the standards.
It starts with being used for one page communicating with itself. Then it becomes used for communicating between pages. Then it starts getting used down the lower layers. Then it gets exposed as a service to Java and Python apps. Etc. etc.
Maybe.
Arne
Jon Skeet [C# MVP] - 09 Jul 2007 07:32 GMT
> > This isn't usually for communicating between two applications though -
> > it's to allow a stateless application to communicate effectively with
[quoted text clipped - 10 lines]
> used down the lower layers. Then it gets exposed as a service to
> Java and Python apps. Etc.etc..
So you avoid doing that - keep it very tightly controlled, and there are no problems. I really don't see anything wrong in this case.
Arne Vajhøj - 15 Jul 2007 02:22 GMT
>>> This isn't usually for communicating between two applications though -
>>> it's to allow a stateless application to communicate effectively with
[quoted text clipped - 12 lines]
> So you avoid doing that - keep it very tightly controlled, and there
> are no problems. I really don't see anything wrong in this case.
How does one prevent code reuse?
Arne
Jon Skeet [C# MVP] - 15 Jul 2007 09:45 GMT
> > So you avoid doing that - keep it very tightly controlled, and there
> > are no problems. I really don't see anything wrong in this case.
>
> How does one prevent code reuse ?
There's no problem reusing the code - within the appropriate layer. There's no reason why multiple web applications shouldn't all use the same code converting URL parameters into arbitrary binary data. You just need to be careful not to use it inappropriately elsewhere. Software engineering always requires discipline like that. Naming the class UrlSafeBase64 or something like that would make it pretty obvious though, IMO.
Jim Brandley - 09 Jul 2007 03:35 GMT That's exactly what I'm using it for. In a stateless environment, I need a secure way to return context to myself to service http requests coming in from our own pages. Since I generate a lot of these, it needs to be done quickly.
Jim Brandley - 08 Jul 2007 16:38 GMT Thanks. That's similar to what I have written. I'll see if I can get mine to perform better. I was using a StringBuilder to accept the encoded characters. I'll see if it performs better using a character array, and save the string construction until it's complete.