[wp-hackers] Portable tokenising from the shell

Otto otto at ottodestruct.com
Sat Dec 1 17:06:30 UTC 2012


This guy seems to have a JSON parser written in awk. Dunno if that
will help any.

https://github.com/dubiousjim/awkenough/blob/master/library.awk#L581
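
Failing that, here is a rough sketch of how the grep -o tokeniser quoted
below might be mimicked with awk's match(). It is not taken from either
project. The function name "tokenize" is made up for illustration, and it
assumes an awk that has match() and POSIX character classes (so, as far as
I know, nawk or /usr/xpg4/bin/awk on Solaris rather than the old
/usr/bin/awk). The {4} interval is spelled out as four hex digits because
some older awks do not handle intervals.

```shell
#!/bin/sh
# tokenize (hypothetical name): read JSON on stdin and print one token per
# line, approximating: grep -E -o "$STRING|$NUMBER|$KEYWORD|$SPACE|."
tokenize() {
  awk '
  BEGIN {
    # Same patterns as the JSON.sh snippet, built as awk strings.
    # "\\\\" in an awk string is one literal backslash in the regex.
    hex     = "[0-9a-fA-F]"
    escape  = "(\\\\[^u[:cntrl:]]|\\\\u" hex hex hex hex ")"
    char    = "[^[:cntrl:]\"\\\\]"
    string  = "\"" char "*(" escape char "*)*\""
    number  = "-?(0|[1-9][0-9]*)([.][0-9]*)?([eE][+-]?[0-9]*)?"
    keyword = "null|false|true"
    space   = "[[:space:]]+"
    # Anchor at the start; awk picks the longest alternative that matches.
    token   = "^(" string "|" number "|" keyword "|" space "|.)"
  }
  {
    rest = $0
    # Peel one token off the front of the line until nothing is left.
    while (rest != "" && match(rest, token)) {
      print substr(rest, 1, RLENGTH)
      rest = substr(rest, RLENGTH + 1)
    }
  }'
}
```

Usage would be something like:

    printf '%s\n' '{"a": [1, true]}' | tokenize

Like grep, this works a line at a time, so whitespace runs come out as
their own lines and still need filtering afterwards.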

-Otto


On Sat, Dec 1, 2012 at 10:57 AM, David Anderson <david at wordshell.net> wrote:
> Hi,
>
> Some of you may remember an earlier discussion about parsing JSON output,
> which is one of the formats available from api.wordpress.org. JSON was the
> most suitable for portably parsing from a Bourne/Bash shell.
>
> This guy has implemented such a parser already:
> http://github.com/dominictarr/JSON.sh
>
> One part of the parser is this. It's the tokeniser, splitting up the JSON
> into parts:
>
>     local ESCAPE='(\\[^u[:cntrl:]]|\\u[0-9a-fA-F]{4})'
>     local CHAR='[^[:cntrl:]"\\]'
>     local STRING="\"$CHAR*($ESCAPE$CHAR*)*\""
>     local NUMBER='-?(0|[1-9][0-9]*)([.][0-9]*)?([eE][+-]?[0-9]*)?'
>     local KEYWORD='null|false|true'
>     local SPACE='[[:space:]]+'
>     grep -E -o "$STRING|$NUMBER|$KEYWORD|$SPACE|."
>
> It's an interesting use of grep; basically it matches *everything*, but
> splits it up based on certain separators, in a certain order.
>
> However... my research shows that the "-o" switch (which causes grep to
> output only each matched portion, one per line) is not part of POSIX, but is
> nonetheless available in GNU (hence Linux and Cygwin), Free/Net/OpenBSD and
> Mac OS X - but not in Solaris (either in the grep in /usr/bin or in
> /usr/xpg4/bin).
>
> So it's not quite totally portable. My question: does anyone have sufficient
> sed or awk skills to advise me how to reproduce the above in one of those?
> As I said, it's a tokeniser, that splits the input into the discrete chunks
> indicated. I'm an awk novice. I'm trying to write code that assumes only
> POSIX, or failing that the common subset of GNU/BSD/Mac/Solaris. If I fail I
> can use various hacks (e.g. search for perl, use that if found, search for
> PHP, use that), but it'd be nice if I didn't have to resort to multiple code
> paths in that way.
>
> Many thanks,
> David
>
> --
> WordShell - WordPress fast from the CLI - www.wordshell.net
>

