URLENCODE vs RAWURLENCODE

By Game Changer → Saturday, December 5, 2015


Differences in ASCII:

URLENCODE:
  • Calculates a start/end length of the input string, allocates memory
  • Walks through a while-loop, increments until we reach the end of the string
  • Grabs the present character
  • If the character is equal to ASCII Char 0x20 (ie, a "space"), add a + sign to the output string.
  • If it's not a space, and it's also not alphanumeric (isalnum(c)), and also isn't and _-, or .character, then we , output a % sign to array position 0, do an array look up to the hexcharsarray for a lookup for os_toascii array (an array from Apache that translates char to hex code) for the key of c (the present character), we then bitwise shift right by 4, assign that value to the character 1, and to position 2 we assign the same lookup, except we preform a logical and to see if the value is 15 (0xF), and return a 1 in that case, or a 0 otherwise. At the end, you'll end up with something encoded.
  • If it ends up it's not a space, it's alphanumeric or one of the _-. chars, it outputs exactly what it is.
RAWURLENCODE:
  • Allocates memory for the string
  • Iterates over it based on length provided in function call (not calculated in function as with URLENCODE).
Note: Many programmers have probably never seen a for loop iterate this way, it's somewhat hackish and not the standard convention used with most for-loops, pay attention, it assigns x and y, checks for exit on len reaching 0, and increments both x and y. I know, it's not what you'd expect, but it's valid code.
  • Assigns the present character to a matching character position in str.
  • It checks if the present character is alphanumeric, or one of the _-. chars, and if it isn't, we do almost the same assignment as with URLENCODE where it preforms lookups, however, we increment differently, using y++ rather than to[1], this is because the strings are being built in different ways, but reach the same goal at the end anyway.
  • When the loop's done and the length's gone, It actually terminates the string, assigning the \0byte.
  • It returns the encoded string.
Differences:
  • UrlEncode checks for space, assigns a + sign, RawURLEncode does not.
  • UrlEncode does not assign a \0 byte to the string, RawUrlEncode does (this may be a moot point)
  • They iterate differntly, one may be prone to overflow with malformed strings, I'm merely suggesting this and I haven't actually investigated.
They basically iterate differently, one assigns a + sign in the event of ASCII 20.

Differences in EBCDIC:

URLENCODE:
  • Same iteration setup as with ASCII
  • Still translating the "space" character to a + sign. Note-- I think this needs to be compiled in EBCDIC or you'll end up with a bug? Can someone edit and confirm this?
  • It checks if the present char is a char before 0, with the exception of being a . or -OR less than A but greater than char 9OR greater than Z and less than a but not a _OR greater than z (yeah, EBCDIC is kinda messed up to work with). If it matches any of those, do a similar lookup as found in the ASCII version (it just doesn't require a lookup in os_toascii).
RAWURLENCODE:
  • Same iteration setup as with ASCII
  • Same check as described in the EBCDIC version of URL Encode, with the exception that if it's greater than z, it excludes ~ from the URL encode.
  • Same assignment as the ASCII RawUrlEncode
  • Still appending the \0 byte to the string before return.

Grand Summary

  • Both use the same hexchars lookup table
  • URIEncode doesn't terminate a string with \0, raw does.
  • If you're working in EBCDIC I'd suggest using RawUrlEncode, as it manages the ~ that UrlEncode does not (this is a reported issue). It's worth noting that ASCII and EBCDIC 0x20 are both spaces.
  • They iterate differently, one may be faster, one may be prone to memory or string based exploits.
  • URIEncode makes a space into +, RawUrlEncode makes a space into %20 via array lookups.
Disclaimer: I haven't touched C in years, and I haven't looked at EBCDIC in a really really long time. If I'm wrong somewhere, let me know.
Sael

I'm Sael. An expert coder and system admin. I enjoy to make codes easy for novice.

Website: fb/Fujael

No Comment to " URLENCODE vs RAWURLENCODE "