Differences in ASCII:
URLENCODE:
- Calculates start and end positions of the input string from its length, and allocates memory
- Walks through a while loop, advancing until it reaches the end of the string
- Grabs the present character
- If the character is equal to ASCII char 0x20 (i.e., a "space"), add a '+' sign to the output string.
- If it's not a space, and it's also not alphanumeric (isalnum(c)), and also isn't a '_', '-' or '.' character, then we output a '%' sign to array position 0 (to[0]). Next we look up the key c (the present character) in the os_toascii array (an array from Apache that translates a character to its ASCII code), bitwise shift that value right by 4, and use the result as an index into the hexchars array; that hex digit goes into position 1 (to[1]). Position 2 (to[2]) gets the same lookup, except that instead of shifting we perform a bitwise AND with 15 (0xF), which keeps only the low four bits (a value from 0 to 15). At the end, you end up with the character encoded as a '%' followed by two hex digits.
- If it turns out it's not a space, and it's alphanumeric or one of the '_', '-', '.' chars, it outputs exactly what it is.
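To make the walkthrough above concrete, here's a minimal C sketch of that loop. It's my own reconstruction, not PHP's actual source: sketch_urlencode and sketch_urlencode_cstr are hypothetical names, it uses plain malloc instead of PHP's allocator, and it skips the os_toascii step (so it assumes an ASCII build).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hex digit for each 4-bit value 0..15, in the spirit of PHP's hexchars. */
static const char hexchars[] = "0123456789ABCDEF";

/*
 * Sketch of the urlencode loop (not PHP's actual source):
 * ' ' becomes '+', alphanumerics and '_', '-', '.' are copied, and every
 * other byte becomes "%XX". The caller must free() the result.
 */
char *sketch_urlencode(const char *s, size_t len)
{
    const unsigned char *from, *end;
    char *start, *to;

    assert(s != NULL);
    from = (const unsigned char *)s;
    end = from + len;

    /* Worst case: every byte expands to three, plus a terminating NUL. */
    start = to = malloc(3 * len + 1);
    if (start == NULL)
        return NULL;

    while (from < end) {
        unsigned char c = *from++;          /* grab the present character */

        if (c == ' ') {
            *to++ = '+';
        } else if (!isalnum(c) && c != '_' && c != '-' && c != '.') {
            to[0] = '%';
            to[1] = hexchars[c >> 4];       /* high four bits */
            to[2] = hexchars[c & 15];       /* low four bits, 0..15 */
            to += 3;
        } else {
            *to++ = (char)c;                /* safe character: copy as-is */
        }
    }
    *to = '\0';  /* terminated so the sketch returns an ordinary C string */
    return start;
}

/* Convenience wrapper for NUL-terminated input. */
char *sketch_urlencode_cstr(const char *s)
{
    return sketch_urlencode(s, strlen(s));
}
```

For example, sketch_urlencode_cstr("a b~c") should give "a+b%7Ec". The space turns into '+', and '~' (0x7E) is split into hexchars[7] and hexchars[14].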
RAWURLENCODE:
- Allocates memory for the string
- Iterates over it based on length provided in function call (not calculated in function as with URLENCODE).
Note: Many programmers have probably never seen a for loop iterate this way. It's somewhat hackish and not the standard convention used with most for loops, so pay attention: it assigns x and y, checks for exit on len reaching 0, and increments both x and y. I know, it's not what you'd expect, but it's valid code.
- Assigns the present character to the matching character position in str.
- Checks if the present character is alphanumeric or one of the '_', '-', '.' chars. If it isn't, it does almost the same assignment as URLENCODE, with the same lookups. However, it increments differently, using y++ rather than writing to to[1] and to[2]. This is because the strings are being built in different ways, but they reach the same goal at the end anyway.
- When the loop's done and the length's gone, it actually terminates the string by assigning the \0 byte.
- It returns the encoded string.
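Here's the same idea for the rawurlencode loop. Again, this is a hypothetical sketch rather than PHP's code. It's mainly there to show the for (x = 0, y = 0; len--; x++, y++) idiom. Like the description above, it leaves only alphanumerics and '_', '-', '.' unescaped. The names sketch_rawurlencode and sketch_rawurlencode_cstr are made up.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

static const char hexchars[] = "0123456789ABCDEF";

/*
 * Sketch of the rawurlencode loop (not PHP's source).
 * x walks the input, y walks the output, and "len--" is the loop
 * condition, so the body runs exactly len times.
 */
char *sketch_rawurlencode(const char *s, size_t len)
{
    size_t x, y;
    unsigned char *str;

    assert(s != NULL);
    str = malloc(3 * len + 1);
    if (str == NULL)
        return NULL;

    for (x = 0, y = 0; len--; x++, y++) {
        unsigned char c = (unsigned char)s[x];

        str[y] = c;  /* copy the present character to the matching slot */
        if (!isalnum(c) && c != '_' && c != '-' && c != '.') {
            /* Overwrite that slot with "%XX". y is left on the last hex
               digit; the loop's own y++ then moves past it. */
            str[y++] = '%';
            str[y++] = (unsigned char)hexchars[c >> 4];
            str[y] = (unsigned char)hexchars[c & 15];
        }
    }
    str[y] = '\0';  /* explicit terminator */
    return (char *)str;
}

/* Convenience wrapper for NUL-terminated input. */
char *sketch_rawurlencode_cstr(const char *s)
{
    return sketch_rawurlencode(s, strlen(s));
}
```

Since there's no space branch, sketch_rawurlencode_cstr("a b") comes out as "a%20b". The trick is that y gets bumped twice inside the if and once more by the loop header, so each escaped byte advances y by three.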
Differences:
- UrlEncode checks for a space and assigns a '+' sign; RawUrlEncode does not.
- UrlEncode does not assign a \0 byte to the string; RawUrlEncode does (this may be a moot point).
- They iterate differently; one may be prone to overflow with malformed strings. I'm merely suggesting this; I haven't actually investigated.
They basically iterate differently, and one assigns a '+' sign in the event of ASCII 0x20.
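On the overflow question: whether either loop can overrun comes down to how big the output buffer is. As far as I know, both PHP functions allocate 3 * len + 1 bytes up front (safe_emalloc(3, len, 1) in the PHP source). That covers the worst case, where every byte is escaped. The hypothetical helpers below, encoded_capacity and raw_encoded_length, just show that bound. They also refuse a length where 3 * len + 1 would wrap around size_t.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes needed for the worst case, where every input
 * byte becomes "%XX", plus one for the terminating NUL. Returns 0 when
 * 3 * len + 1 would not fit in a size_t (a real result is always >= 1).
 */
size_t encoded_capacity(size_t len)
{
    if (len > (SIZE_MAX - 1) / 3)
        return 0;
    return 3 * len + 1;
}

/* Hypothetical helper: exact output length (excluding the NUL) when only
   alphanumerics and '_', '-', '.' are left unescaped. */
size_t raw_encoded_length(const char *s, size_t len)
{
    size_t n = 0;

    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        n += (isalnum(c) || c == '_' || c == '-' || c == '.') ? 1 : 3;
    }
    return n;
}
```

A string made entirely of bytes that need escaping hits the bound exactly. For example, raw_encoded_length("\x01\x02", 2) is 6, and encoded_capacity(2) is 7, which leaves room for the \0.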
Differences in EBCDIC:
URLENCODE:
- Same iteration setup as with ASCII
- Still translating the "space" character to a '+' sign. Note: I think this needs to be compiled in EBCDIC or you'll end up with a bug? Can someone edit and confirm this?
- It checks if the present char comes before '0' (with the exception of '.' or '-'), OR is less than 'A' but greater than '9', OR is greater than 'Z' and less than 'a' but not a '_', OR is greater than 'z' (yeah, EBCDIC is kinda messed up to work with). If it matches any of those, it does a similar lookup as found in the ASCII version (it just doesn't require a lookup in os_toascii).
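Here's that range check written out as a C predicate. The name urlencode_needs_escape is hypothetical, not PHP's. One caveat is worth seeing in code: the comparisons only work if digits, uppercase letters and lowercase letters are each contiguous and ordered '0'..'9' < 'A'..'Z' < 'a'..'z', which is how ASCII lays them out. The matches_isalnum_rule helper (also hypothetical) checks every byte value against the "not alphanumeric and not _-." rule, assuming the default C locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/*
 * The range test as a predicate: true means "escape this byte as %XX".
 * A space also returns true here; in urlencode the space branch runs
 * first and turns it into '+'. Assumes ASCII ordering of digits/letters.
 */
bool urlencode_needs_escape(unsigned char c)
{
    return (c < '0' && c != '-' && c != '.') ||  /* before '0', except - and . */
           (c < 'A' && c > '9') ||               /* between '9' and 'A' */
           (c > 'Z' && c < 'a' && c != '_') ||   /* between 'Z' and 'a', except _ */
           (c > 'z');                            /* after 'z' */
}

/* Compare the range test with "not alphanumeric and not one of _ - ."
   for every byte value; isalnum() here uses the default "C" locale. */
bool matches_isalnum_rule(void)
{
    for (int c = 0; c < 256; c++) {
        bool by_class = !isalnum(c) && c != '_' && c != '-' && c != '.';
        if (urlencode_needs_escape((unsigned char)c) != by_class)
            return false;
    }
    return true;
}
```

On an ASCII build, assert(matches_isalnum_rule()) should pass, so the range version and the isalnum version agree byte for byte.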
RAWURLENCODE:
- Same iteration setup as with ASCII
- Same check as described in the EBCDIC version of URL Encode, with the exception that if it's greater than 'z', it excludes '~' from the URL encode.
- Same assignment as the ASCII RawUrlEncode
- Still appending the \0 byte to the string before return.
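Here's the rawurlencode flavour of the same check, where the last clause skips '~'. Again, this is only a sketch. The names rawurlencode_needs_escape and raw_encode_into are hypothetical, the caller supplies the output buffer, and the same ASCII ordering assumption applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static const char hexchars[] = "0123456789ABCDEF";

/* Same range test as urlencode, except bytes after 'z' are escaped only
   when they are not '~'. */
bool rawurlencode_needs_escape(unsigned char c)
{
    return (c < '0' && c != '-' && c != '.') ||
           (c < 'A' && c > '9') ||
           (c > 'Z' && c < 'a' && c != '_') ||
           (c > 'z' && c != '~');
}

/*
 * Encode s[0..len) into out, which must have room for 3 * len + 1 bytes.
 * There is no '+' branch, so a space is escaped like any other byte (%20).
 * Returns the output length, not counting the NUL written at the end.
 */
size_t raw_encode_into(char *out, const char *s, size_t len)
{
    size_t y = 0;

    assert(out != NULL && (s != NULL || len == 0));
    for (size_t x = 0; x < len; x++) {
        unsigned char c = (unsigned char)s[x];
        if (rawurlencode_needs_escape(c)) {
            out[y++] = '%';
            out[y++] = hexchars[c >> 4];
            out[y++] = hexchars[c & 15];
        } else {
            out[y++] = (char)c;
        }
    }
    out[y] = '\0';  /* terminate before returning */
    return y;
}
```

With this, '~' passes through, while '{' (the byte right after 'z') is escaped. For example, "a ~" encodes to "a%20~".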
Grand Summary
- Both use the same hexchars lookup table
- UrlEncode doesn't terminate the string with \0; RawUrlEncode does.
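- If you're working in EBCDIC, I'd suggest using RawUrlEncode, as it handles the '~' character that UrlEncode does not (this is a reported issue). Note that the space check compares against the ' ' character literal rather than a fixed byte value, so it matches whatever a space is in the compiled character set: 0x20 in ASCII, 0x40 in EBCDIC.
- They iterate differently; one may be faster, and one may be prone to memory or string based exploits.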
- UrlEncode makes a space into '+'; RawUrlEncode makes a space into %20 via array lookups.
Disclaimer: I haven't touched C in years, and I haven't looked at EBCDIC in a really really long time. If I'm wrong somewhere, let me know.