[wp-trac] [WordPress Trac] #10473: Shortcode parser in is flawed

WordPress Trac wp-trac at lists.automattic.com
Thu Jul 23 23:27:38 UTC 2009


#10473: Shortcode parser in is flawed
--------------------------+-------------------------------------------------
 Reporter:  godfreykfc    |       Owner:                          
     Type:  defect (bug)  |      Status:  new                     
 Priority:  high          |   Milestone:  Unassigned              
Component:  Shortcodes    |     Version:  2.8.1                   
 Severity:  critical      |    Keywords:  regex, shortcode, parser
--------------------------+-------------------------------------------------
 There are several critical flaws in the shortcode parser. I'm using 2.8.2.
 This probably deserves multiple tickets but I think grouping them all here
 will be easier for the maintainers to follow up (because a single patch
 that fixes one of the following might break the others):

 Problem 1: It doesn't work properly, at all!

 Plugin code:
 {{{
 function some_handler($atts, $content){
     return "**BEGIN** Content: '{$content}' **END**";
 }

 function my_init() {
     add_shortcode('sc', 'some_handler');
 }

 add_action('init','my_init');
 }}}

 Post content:
 {{{
 1[sc]1
 2[sc]with content[/sc]2
 3[sc /]3
 4[sc]with content[/sc]4
 5[sc with="attr"][/sc]5
 6[sc with="attr"]and content[/sc]6
 7[sc]7
 8multiple line content8
 9[/sc]9
 }}}

 I apologize if this is a bit hard to read, but the idea is every line
 should be a self contained shortcode (except 7-9 which is one big fat
 chunk of short code). I'm adding the line number before and after the tag
 so that you can see clearly what's going on. (I'm not even testing the
 single line shortcode fail as described in #10082)

 Expected HTML output:
 {{{
 1**BEGIN** Content: '' **END**1<br />
 2**BEGIN** Content: 'with content' **END**2<br />
 3**BEGIN** Content: '' **END**3<br />
 4**BEGIN** Content: 'with content' **END**4<br />
 5**BEGIN** Content: '' **END**5<br />
 4**BEGIN** Content: 'and content' **END**4<br />
 7**BEGIN** Content: '7
 8multiple line content8
 9' **END**9
 }}}

 Should be fairly straight forward. But the ACTUAL HTML output blew my
 mind...
 {{{
 1**BEGIN** Content: '1<br />
 2[sc]with content' **END**2<br />
 3**BEGIN** Content: '3<br />
 4[sc]with content' **END**4<br />
 5**BEGIN** Content: '[/sc]5<br />
 6[sc with="attr"]and content' **END**6<br />
 7**BEGIN** Content: '7<br />
 8multiple line content8<br />
 9' **END**9
 }}}

 Problem 1.1:
 New lines inside a shortcode should not be converted to {{{<br />}}} tags,
 at least when they are entered in the HTML (non-visual) editor. This is
 perhaps more of an enhancement than defect, but it makes it pretty much
 impossible to create shortcodes that would take multiple lines content. If
 it's a desire behavior for that particular shortcode, it should be up to
 the author to call a function like nl2br to convert those new lines.

 Problem 1.2:
 Using the same shortcode back-to-back would make the parse spill, even if
 they are on different lines. For example, the parser picked up the opening
 tag from the first line and the closing tag from the second line. What is
 even more weird is that it happens even when the shortcode tag is self-
 closed (see line 3-4).

 ----

 Problem 2:
 Incorrect types of arguments being passed into the handler function.

 {{{
 function some_handler($atts, $content = null){
     var_dump($atts);
     var_dump($content);
 }

 function my_init() {
     add_shortcode('sc', 'some_handler');
 }

 add_action('init','my_init');
 }}}

 Post content:
 {{{
 [sc]
 }}}

 Output:
 {{{
 string '' (length=0)
 string '' (length=0)
 }}}

 Problem 2.1:
 According to the [http://codex.wordpress.org/Shortcode_API documentation],
 $content should be null when the tag is self-closed and it can be checked
 by is_null($content). In reality, it doesn't matter if you use {{{[sc]}}}
 or {{{[sc /]}}} or {{{[sc][/sc]}}}, they would all pass in an empty String
 instead of null. If this is the desired behavior, the documentation should
 be updated. (But I believe there are benefits when you can actually
 distinguish between empty content and no content.)

 Problem 2.2:
 What makes me stretch my head even more is the behavior when no attributes
 are given. Intuitively {{{$atts}}} should be an empty array when no
 arguments are passed in, so that you can loop through it using iterators
 like {{{foreach()}}} without the fear of it throwing an error. (Or at
 least it should be null..) However, when no arguments are there an empty
 string is passed in. This does not make a lot of sense, but if for some
 reason that is the intended behavior then the documentation should be
 updated too.

 Some of you may argue that it doesn't matter because you are suppose to
 normalize it using {{{shortcode_atts()}}} before you do anything else, and
 it seems to handle empty strings just fine. However, this is not always
 possible. For instance, according to the API, unnamed attributes are
 supported and they will be passed in as items with numeric keys. However,
 there aren't any good ways to make {{{shortcode_atts()}}} keep the
 numeric-keyed items in the array except naming all of them explicitly in
 the default, i.e. {{{shortcode_atts(array( 0=>'', 1=>'',
 2=>''....),$atts);}}}. (And even if you're fine with doing that, you can't
 really pass that normalized array into {{{extract()}}} because it'd just
 skip them. This might be considered as a defect of {{{shortcode_atts()}}},
 but I am not going to touch that part. So, in order to extract your
 unnamed attributes, you would have to loop through the array and do your
 stuff before passing it into {{{shortcode_atts()}}}. So the differences
 between an empty array and an empty string DOES matter.

 ----

 Problem 3:

 This is by far the most critical flaw of the parser and it does not
 require much explanation:
 {{{
 function sc_handler($atts, $content = null){
         return "A [sc] shortcode.";
 }

 function scalar_handler($atts, $content = null){
         return "A [scalar] shortcode.";
 }

 function sc_extend_handler($atts, $content = null){
         return "A [sc-extend] shortcode.";
 }

 function my_init() {
   add_shortcode('sc', 'sc_handler');
   add_shortcode('scalar', 'sc_handler');
         add_shortcode('sc-extend', 'sc_extend_handler');
 }

 add_action('init','my_init');
 }}}

 Post content:
 {{{
 [sc]
 [scalar]
 [sc-extend]
 }}}

 HTML output:
 {{{
 A [sc] shortcode.
 A [scalar] shortcode.
 A [sc] shortcode.
 }}}

 If you make it do a {{{var_dump()}}} on $atts you will have a better idea
 of what's going on:

 [sc]:
 {{{
 string '' (length=0)
 }}}

 [scalar]:
 {{{
 string '' (length=0)
 }}}

 [sc-extend]:
 {{{
 array
   0 => string '-extend' (length=7)
 }}}

 This is obviously very bad. For example, if someone registered a shortcode
 'rss', anything that starts with 'rss' and includes a dash ('rss-fetch' or
 something) would fail. Throughout my testing, shortcode names with dashes
 seems to be causing a lot of issues (but according to the documentation
 it's fully supported, it's even used in oen of the examples). I haven't
 looked into the code yet, but are we even escaping those dashes before
 sticking it into the pattern?

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/10473>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software


More information about the wp-trac mailing list