[wp-hackers] Getting the entire HTML page generated from WordPress

Mario Peshev mario at peshev.net
Tue Feb 7 05:21:45 UTC 2012


I actually tried debugging the plain buffer with no modifications at first,
so there must be another problem in the way. I'll try the suggestions above
and then proceed with the formatting. I've also read a lot about the regex
functions and their performance, and I'll try not to overuse them.

Best,

Mario Peshev
Training and Consulting Services @ DevriX
http://www.linkedin.com/in/mpeshev
http://devrix.com
http://peshev.net/blog



On Tue, Feb 7, 2012 at 7:05 AM, Daniel Grundel <
daniel at webpresencepartners.com> wrote:

> Mario,
>
> Are you using str_replace or preg_replace on the entire set of buffered
> page content? That may explain your blank output.
> I am doing something very similar and was getting blank output as well when
> trying to use those functions with very large (100,000+ character) strings.
>
> Haven't come up with a solution yet, but hopefully that helps you track
> down your problem. (Ultimately I decided I didn't really *need* to replace
> those characters anyhow...)
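One possible cause of blank output with the regex functions, offered as a
guess rather than a diagnosis: preg_replace() returns null when PCRE gives
up (for example when pcre.backtrack_limit is exceeded on a very large
string), and echoing null prints nothing. A minimal sketch of a guard
follows; safe_preg_replace() is a hypothetical helper name, not a PHP or
WordPress function.

```php
<?php
// Hypothetical helper: run a regex replacement over a large buffer without
// losing the whole page when PCRE reports an error.
function safe_preg_replace(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);

    if ($result === null) {
        // preg_replace() returns null on error; preg_last_error() holds the
        // error code if you want to log it. Fall back to the untouched input.
        return $subject;
    }

    return $result;
}
```

With a guard like this the page is sent unfiltered instead of blank, which
at least makes the failure visible while debugging.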
>
> Daniel J. Grundel
> Web Presence Partners
> webpresencepartners.com
> daniel at webpresencepartners.com
> 772 678 0697
>
>
>
> On Mon, Feb 6, 2012 at 11:02 PM, <wp-hackers-request at lists.automattic.com> wrote:
>
> > Today's Topics:
> >
> >   1. Getting the entire HTML page generated from WordPress
> >      (Mario Peshev)
> >   2. Re: Getting the entire HTML page generated from WordPress (Otto)
> >   3. Re: Getting the entire HTML page generated from WordPress
> >      (Mario Peshev)
> >   4. Re: Getting the entire HTML page generated from WordPress
> >      (Brian Layman)
> >   5. Re: Getting the entire HTML page generated from WordPress
> >      (Mario Peshev)
> >   6. Re: Getting the entire HTML page generated from WordPress
> >      (zhaiziming at gmail.com)
> >   7. Re: Getting the entire HTML page generated from WordPress (Otto)
> >   8. Ajax requests, admin-ajax.php and the WP_ADMIN constant (24/7)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Mon, 6 Feb 2012 20:58:59 +0200
> > From: Mario Peshev <mario at peshev.net>
> > Subject: [wp-hackers] Getting the entire HTML page generated from
> >        WordPress
> > To: wp-hackers at lists.automattic.com
> > Message-ID:
> >        <CAN_8tK6MbyhWb_BTwPAp=qfiiQuyGrjDZ=mtxBrjtd8JBqwnag at mail.gmail.com>
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > Hello everyone,
> >
> > I'm trying to intercept the entire HTML page generated on a page load in
> > WordPress. the_content() is not enough as I need some filtering in the
> > meta tags, footer, sidebar etc.
> >
> > I've been looking for some filters and the HTTP API, but nothing seems to
> > work on a page basis, i.e. post-filtering of every element in the entire
> > DOM tree.
> >
> > Any tips on this?
> >
> > Mario Peshev
> > Training and Consulting Services @ DevriX
> > http://www.linkedin.com/in/mpeshev
> > http://devrix.com
> > http://peshev.net/blog
> >
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Mon, 6 Feb 2012 13:10:12 -0600
> > From: Otto <otto at ottodestruct.com>
> > Subject: Re: [wp-hackers] Getting the entire HTML page generated from
> >        WordPress
> > To: wp-hackers at lists.automattic.com
> > Message-ID:
> >        <CAD-FghywUe=9eGsqZ7sXxzA3AZRZi=N_rmsi0w3uX=hr-tC6oA at mail.gmail.com>
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > Use output buffering. On the init action, do an ob_start(), then hook
> > a function to the shutdown action hook and you can do an
> > ob_get_clean() to get the contents, filter them as you will, then echo
> > the result.
> >
> > Not the best, but the only whole-page method I know of.
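The approach Otto describes can be sketched roughly as below. The mypage_*
names and the placeholder filter are hypothetical; only add_action(),
ob_start(), ob_get_level() and ob_get_clean() are real functions. The
priority 0 on shutdown is an assumption on my part: as far as I know, core
hooks wp_ob_end_flush_all() to shutdown at priority 1, so a callback at the
default priority may find the buffer already flushed.

```php
<?php
// Placeholder filter (hypothetical): swap in the real whole-page processing.
function mypage_filter_html(string $html): string
{
    return str_replace('<!-- generator -->', '', $html);
}

// Start capturing everything WordPress prints from here on.
function mypage_start_buffer(): void
{
    ob_start();
}

// Take the captured page, filter it, and send the result.
function mypage_end_buffer(): void
{
    if (ob_get_level() === 0) {
        return; // nothing is buffered, e.g. something else already flushed it
    }
    // Note: this only collects the innermost open buffer.
    echo mypage_filter_html((string) ob_get_clean());
}

// Wire the hooks only when WordPress is loaded, so the file also runs alone.
if (function_exists('add_action')) {
    add_action('init', 'mypage_start_buffer');
    // Assumption: priority 0 runs before core flushes buffers at priority 1.
    add_action('shutdown', 'mypage_end_buffer', 0);
}
```

If the page still comes out blank with this in place, checking ob_get_level()
inside the shutdown callback is a quick way to see whether the buffer is
still open at that point.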
> >
> > -Otto
> >
> >
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Mon, 6 Feb 2012 21:26:43 +0200
> > From: Mario Peshev <mario at peshev.net>
> > Subject: Re: [wp-hackers] Getting the entire HTML page generated from
> >        WordPress
> > To: wp-hackers at lists.automattic.com
> > Message-ID:
> >        <CAN_8tK4wk0FTkgg4kk9M0bH2O7Zmbp=npF-4dJaYL0GFGfB8sg at mail.gmail.com>
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > I've tried one solution from stackexchange:
> >
> > http://wordpress.stackexchange.com/a/41351/11506
> >
> > Didn't seem to work though, the end buffering had problems and the
> > content was blank at the end. Or am I doing it wrong?
> >
> > Mario Peshev
> > Training and Consulting Services @ DevriX
> > http://www.linkedin.com/in/mpeshev
> > http://devrix.com
> > http://peshev.net/blog
> >
> >
> > ------------------------------
> >
> > Message: 4
> > Date: Mon, 06 Feb 2012 14:53:04 -0500
> > From: Brian Layman <wp-hackers at thecodecave.com>
> > Subject: Re: [wp-hackers] Getting the entire HTML page generated from
> >        WordPress
> > To: wp-hackers at lists.automattic.com
> > Message-ID: <4F302FA0.8040002 at thecodecave.com>
> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> >
> > That looks like it's the right idea..
> >
> > The basic paradigm is:
> >
> > Somewhere very early on do
> > ob_end_clean(); // Ensure a clean buffer.
> > ob_start();
> >
> > On the "wp" or "init" action should be fine.
> >
> > Then, as the very last thing you do, capture, process and echo the results
> > $body = ob_get_contents();
> > ob_end_clean();  // Clear the results so the unaltered stuff isn't sent
> > echo makeItBetter($body);
> >
> > Perhaps in the "shutdown" action. (I've heard rumors of times shutdown
> > is not called, I have no specifics on that. That may just have been when
> > a die() was hit.)
> >
> > Headers, if you need them, can be gotten through
> > $headers = headers_list();
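One caveat with the paradigm above: a bare ob_end_clean() raises a notice
when no buffer is active. A hedged variant that avoids this is to remember
the buffer level at the start and unwind only down to it at the end. The
page_buffer_* names are hypothetical helpers, not WordPress functions.

```php
<?php
// Open our buffer and return how many buffers were already active.
function page_buffer_start(): int
{
    $level = ob_get_level();
    ob_start();
    return $level;
}

// Close our buffer plus any opened after it, then filter the combined body.
function page_buffer_finish(int $level, callable $filter): string
{
    $body = '';
    while (ob_get_level() > $level) {
        $chunk = ob_get_clean();
        if ($chunk === false) {
            break; // buffer could not be removed; stop instead of looping
        }
        // Inner buffers hold later output, so prepend the outer chunks.
        $body = $chunk . $body;
    }
    return $filter($body);
}
```

In the shutdown callback this would be used as
echo page_buffer_finish($level, 'makeItBetter'), with $level kept from the
earlier page_buffer_start() call; headers_list() can be called alongside it
if the headers are needed.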
> >
> > You could even do this code before and after the wp stuff in the main
> > index.php, but you're likely to be slapped in the face with a large
> > trout if you try something as hackish as that.
> >
> > Hope that helps,
> >
> > Brian Layman
> > http://eHermitsInc.com
> >
> > On 2/6/2012 2:26 PM, Mario Peshev wrote:
> > > I've tried one solution from stackexchange:
> > >
> > > http://wordpress.stackexchange.com/a/41351/11506
> > >
> > > Didn't seem to work though
> > >
> >
> >
> > ------------------------------
> >
> > Message: 5
> > Date: Mon, 6 Feb 2012 21:58:39 +0200
> > From: Mario Peshev <mario at peshev.net>
> > Subject: Re: [wp-hackers] Getting the entire HTML page generated from
> >        WordPress
> > To: wp-hackers at lists.automattic.com
> > Message-ID:
> >        <CAN_8tK4kvCS31fWYerKQvdiDCZ1s72SXdEtkx6Pzhb=1mohJWg at mail.gmail.com>
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > Sure, I can do it in index.php or the wp-blog-header file, but I don't
> > feel that suicidal yet :)
> >
> > Will try this out, thanks. I remember having some problems with the
> > shutdown hook, but it should work for normal posts and standard post
> > types I believe.
> >
> > Thanks,
> >
> > Mario Peshev
> > Training and Consulting Services @ DevriX
> > http://www.linkedin.com/in/mpeshev
> > http://devrix.com
> > http://peshev.net/blog
> >
> >
> > ------------------------------
> >
> > Message: 6
> > Date: Mon, 06 Feb 2012 12:19:50 -0800
> > From: "zhaiziming at gmail.com" <zhaiziming at gmail.com>
> > Subject: Re: [wp-hackers] Getting the entire HTML page generated from
> >        WordPress
> > To: wp-hackers at lists.automattic.com
> > Message-ID: <4F3035E6.7070602 at gmail.com>
> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> >
> > Don't buffer your whole page; it is not a good idea for page response
> > time. Instead, in your template file, use ob_start("your filter
> > function") and ob_end_flush() to filter the output of get_header(),
> > the_content(), get_sidebar(), and get_footer() separately.
> > Or, if you really want to do a whole-page filter, template_redirect is
> > the best hook to start the buffer, and use the PHP function
> > register_shutdown_function() to flush the buffer.
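The per-section variant can be sketched like this: ob_start() accepts a
callback, and PHP passes the buffered output through it when the buffer is
flushed. The function names and the http-to-https filter are hypothetical
placeholders, and the outer buffer exists only so the sketch can return a
string instead of printing.

```php
<?php
// Hypothetical section filter; PHP calls it with the buffered output.
function mysection_filter(string $buffer): string
{
    return str_replace('http://', 'https://', $buffer);
}

// Render one section through the filter and return the filtered markup.
function render_section_filtered(callable $render): string
{
    ob_start();                   // outer buffer, only to capture the result
    ob_start('mysection_filter'); // output passes through the callback on flush
    $render();                    // in a template: the header, loop, sidebar, ...
    ob_end_flush();               // runs the callback, sends result outward
    return (string) ob_get_clean();
}
```

In a template file the two inner calls would simply wrap the section:
ob_start('mysection_filter') before it and ob_end_flush() after it.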
> >
> > James Zhai
> > http://www.zhaiziming.com/zZ/
> >
> >
> > ------------------------------
> >
> > Message: 7
> > Date: Mon, 6 Feb 2012 15:35:12 -0600
> > From: Otto <otto at ottodestruct.com>
> > Subject: Re: [wp-hackers] Getting the entire HTML page generated from
> >        WordPress
> > To: wp-hackers at lists.automattic.com
> > Message-ID:
> >        <CAD-FghyQVfXZdN9Cw17RoFqDyqgsYC=RZ1UP9errK+QSUx7qrA at mail.gmail.com>
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > Don't use the register_shutdown_function() call. WordPress already
> > does that for you automatically. Just use the "shutdown" action hook
> > instead.
> >
> > -Otto
> >
> >
> > ------------------------------
> >
> > Message: 8
> > Date: Mon, 6 Feb 2012 20:02:20 -0800 (PST)
> > From: 24/7 <24-7 at gmx.net>
> > Subject: [wp-hackers] Ajax requests, admin-ajax.php and the WP_ADMIN
> >        constant
> > To: wp-hackers at googlegroups.com
> > Message-ID:
> >        <24355460.1426.1328587340955.JavaMail.geo-discussion-forums at yqcg21
> >
> > Content-Type: text/plain; charset="utf-8"
> >
> > I'm currently developing an app that needs some Ajax requests. Suddenly I
> > found myself in a weird position: I couldn't make any requests from my
> > Ajax callback function. Everything loaded fine, the request was sent via
> > jQuery.post(), and the response was `0`. As I made the request from
> > inside a class, I tested whether I could hook the callback from outside,
> > and it worked.
> >
> > After a lot of debugging [1] I found the *"error"*: is_admin() returns
> > true, as admin-ajax.php defines WP_ADMIN as true at the top of the file.
> > This is somewhat odd behavior, as we have both the wp_ajax_*
> > (admin/logged-in) and the wp_ajax_nopriv_* (public/guest) hooks. If I
> > want to, as in my case, build the front-end stuff completely separately
> > from the back-end extensions, then I have the problem that I simply
> > can't divide things. This puts me in a strange situation for organizing
> > my files: I can't keep together the stuff that belongs together.
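One commonly used workaround, sketched here under assumptions rather than
as the official answer: admin-ajax.php also defines the DOING_AJAX constant,
so code can tell a real admin screen apart from an Ajax request. Both
function names below are hypothetical helpers.

```php
<?php
// Pure decision logic: an admin context that is not an Ajax request.
function is_backend_screen(bool $isAdmin, bool $doingAjax): bool
{
    return $isAdmin && !$doingAjax;
}

// Reads the real WordPress state when available; false outside WordPress.
function current_request_is_backend_screen(): bool
{
    $isAdmin   = function_exists('is_admin') && is_admin();
    $doingAjax = defined('DOING_AJAX') && DOING_AJAX;

    return is_backend_screen($isAdmin, $doingAjax);
}
```

Back-end-only code can then be loaded when
current_request_is_backend_screen() is true, while the Ajax callbacks are
registered independently of is_admin().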
> >
> > Is there some work-around, some ideas, some concepts? Or will the ticket
> > from Denis-de-Bernard [2] receive some attention for 3.4?
> >
> > Thanks!
> >
> > [1] I found the note about the is_admin() in the Ajax Codex article a
> > little too late: http://codex.wordpress.org/AJAX_in_Plugins
> > [2] http://core.trac.wordpress.org/ticket/12400
> >
> > ------------------------------
> >
> > _______________________________________________
> > wp-hackers mailing list
> > wp-hackers at lists.automattic.com
> > http://lists.automattic.com/mailman/listinfo/wp-hackers
> >
> >
> > End of wp-hackers Digest, Vol 85, Issue 9
> > *****************************************
> >
> _______________________________________________
> wp-hackers mailing list
> wp-hackers at lists.automattic.com
> http://lists.automattic.com/mailman/listinfo/wp-hackers
>

