The PHP 'explode' function splits a string into an array on a separator character (or separator string). That is not enough to build a parser for a template language, because most languages allow strings to contain any character, including the separator. In this post we show one function that splits a string while respecting quotes, and a second one that removes the quotes while allowing escaped quotes inside the string.
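To make the limitation concrete, here is a small sketch. The input string is a made-up example rather than real template syntax: its first field is a quoted string that happens to contain the separator.

```php
<?php
// Made-up input: two fields, the first one quoted and containing a '|'.
$input = "'one|uno'|two";

// explode() knows nothing about quotes, so it also splits inside them.
$parts = explode('|', $input);

echo json_encode($parts) . "\n";
// Prints: ["'one","uno'","two"]
```

We wanted two fields, but we get three, and the quoted string is cut in half.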
The easy assignment
Write a function or program that can split a string at each non-escaped occurrence of a separator character.
It should accept three input parameters:
- The string
- The separator character
- The escape character
It should output a list of strings. (source)
Test case
The input string:
"one^|uno||three^^^^|four^^^|^cuatro|"
Should result in an array of 5 strings:
[ "one|uno", "", "three^^", "four^|cuatro", "" ]
In this example the ‘^’ is the escape character and the ‘|’ is the separator.
The code
<?php

function token_with_escape($str, $escape = '^', $separator = '|')
{
    $tokens = [];
    $token = '';
    $escaped = false;
    for ($i = 0; $i < strlen($str); $i++) {
        $c = $str[$i];
        if (!$escaped) {
            if ($c === $escape) {
                // Drop the escape character; the next character is taken literally.
                $escaped = true;
            } elseif ($c === $separator) {
                // Unescaped separator: close the current token.
                $tokens[] = $token;
                $token = '';
            } else {
                $token .= $c;
            }
        } else {
            // Escaped character: always part of the token.
            $token .= $c;
            $escaped = false;
        }
    }
    // Whatever follows the last separator is the final token (possibly empty).
    $tokens[] = $token;
    return $tokens;
}

$input = "one^|uno||three^^^^|four^^^|^cuatro|";
$output = token_with_escape($input);
echo json_encode($output) . "\n";
And it does in fact output the expected array of 5 strings.
The hard assignment (complex templates)
Write a function or program that can split a string at each occurrence of a separator character that is not within non-escaped quotes.
It should accept four input parameters:
- The string
- The quote character
- The escape character
- The separator character
It should output a list of strings.
Test case
You need to avoid splitting inside a quoted string. So you want:
"'one|uno'||'three^'^''|'four^^^'^cuatro'|"
to be split into (step 1):
[ "'one|uno'", "", "'three^'^''", "'four^^^'^cuatro'", "" ]
and to be parsed into (step 2):
[ "one|uno", "", "three''", "four^'cuatro", "" ]
As you can see you never split within a quoted string.
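To see how such an input could be produced, here is a minimal sketch of the reverse direction. The function name 'token_quote' is hypothetical and not part of the code in this post; it assumes the same quote and escape characters as the test case. It wraps a value in quotes and puts the escape character in front of every quote or escape character inside it.

```php
<?php
// Hypothetical helper (an assumption for illustration): the inverse of step 2.
function token_quote(string $value, string $quote = "'", string $escape = '^'): string
{
    $out = '';
    for ($i = 0; $i < strlen($value); $i++) {
        $c = $value[$i];
        // Quotes and escape characters inside the value must be escaped.
        if ($c === $quote || $c === $escape) {
            $out .= $escape;
        }
        $out .= $c;
    }
    return $quote . $out . $quote;
}

$fields = ["one|uno", "three''", "four^'cuatro"];
echo implode('|', array_map('token_quote', $fields)) . "\n";
// Prints: 'one|uno'|'three^'^''|'four^^^'cuatro'
```

The last field differs slightly from the test case, which also escapes the 'c' of 'cuatro'. Escaping an ordinary character is allowed but unnecessary, so both forms parse to the same string.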
The code
This function will take care of the first step:
<?php

function token_with_quote($str, $quote = "'", $escape = '^', $separator = '|')
{
    $tokens = [];
    $token = '';
    $escaped = false;
    $quoted = false;
    $seplen = strlen($separator);
    for ($i = 0; $i < strlen($str); $i++) {
        $c = $str[$i];
        if (!$quoted) {
            if ($c === $quote) {
                $quoted = true;
            } elseif (substr($str, $i, $seplen) === $separator) {
                // Separator outside quotes: close the token and skip the separator.
                $tokens[] = $token;
                $token = '';
                $i += $seplen - 1;
                continue;
            }
        } else {
            if (!$escaped) {
                if ($c === $quote) {
                    $quoted = false;
                } elseif ($c === $escape) {
                    $escaped = true;
                }
            } else {
                $escaped = false;
            }
        }
        // Quotes and escape characters stay in the token; step 2 removes them.
        $token .= $c;
    }
    $tokens[] = $token;
    return $tokens;
}

$input = "'one|uno'||'three^'^''|'four^^^'^cuatro'|";
$output = token_with_quote($input);
echo json_encode($output) . "\n";
This function will take care of the second step:
function token_unquote($arr, $quote = "'", $escape = '^')
{
    for ($i = 0; $i < count($arr); $i++) {
        $str = trim($arr[$i]);
        // Only unquote tokens that both start and end with the quote character.
        if (strlen($str) > 1 && $str[0] === $quote && $str[strlen($str) - 1] === $quote) {
            $escaped = false;
            $token = '';
            $str = substr($str, 1, strlen($str) - 2);
            for ($j = 0; $j < strlen($str); $j++) {
                $c = $str[$j];
                if (!$escaped) {
                    if ($c === $escape) {
                        // Skip the escape character itself.
                        $escaped = true;
                        continue;
                    }
                } else {
                    $escaped = false;
                }
                $token .= $c;
            }
            $arr[$i] = $token;
        }
    }
    return $arr;
}

$input = "'one|uno'||'three^'^''|'four^^^'^cuatro'|";
$output = token_unquote(token_with_quote($input));
echo json_encode($output) . "\n";
And as expected the output is parsed correctly.
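If you do not need the intermediate result of step 1, both steps can be folded into one loop. The following is only a sketch under a few assumptions: the name 'split_unquoted' is hypothetical, the separator is a single character, and, unlike 'token_unquote', it removes every unescaped quote (even in a field that is only partly quoted) and does not trim whitespace.

```php
<?php
// Hypothetical single-pass variant (an assumption, not the code from this post):
// splits on unquoted separators and strips quotes and escapes in the same loop.
function split_unquoted(string $str, string $quote = "'", string $escape = '^', string $separator = '|'): array
{
    $tokens = [];
    $token = '';
    $quoted = false;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $c = $str[$i];
        if (!$quoted) {
            if ($c === $quote) {
                $quoted = true;              // opening quote: drop it
            } elseif ($c === $separator) {
                $tokens[] = $token;          // separator outside quotes
                $token = '';
            } else {
                $token .= $c;
            }
        } elseif ($c === $escape && $i + 1 < $len) {
            $token .= $str[++$i];            // escaped character: keep it literally
        } elseif ($c === $quote) {
            $quoted = false;                 // closing quote: drop it
        } else {
            $token .= $c;
        }
    }
    $tokens[] = $token;
    return $tokens;
}

$input = "'one|uno'||'three^'^''|'four^^^'^cuatro'|";
echo json_encode(split_unquoted($input)) . "\n";
// Prints: ["one|uno","","three''","four^'cuatro",""]
```

The two-step version keeps the raw tokens around, which is handy when a later stage needs to know whether a field was quoted. The single-pass sketch trades that away for less code.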
Enjoy!