[wp-hackers] wp_remote_request not telling me the 301'd URL

Edward de Leau e at leau.net
Wed Mar 23 17:39:12 UTC 2011


As an update for those who find this via Google and are also looking for how
to work with redirects and find out where they were redirected to:

1)
http://plugins.trac.wordpress.org/browser/wp-favicons/trunk/includes/class-http.php
basically works, including in-page JavaScript redirects and HTML meta
redirects (it redirects manually, so 'redirection' => 0). Whatever you throw
at it, it will return a favicon IF one exists. But...

2) There is one open issue:
https://core.trac.wordpress.org/ticket/16855#comment:29 (for cURL,
FOLLOWLOCATION must be set to false, and likewise a new parameter for the
others). In my test set, 500 out of 1500 URLs get me the message "Too many
redirects" (!). I don't think anyone ever used 'redirection' => 0.

3) But the only reason one would want to set 'redirection' to 0 is to know
where we are heading first (at least that is what I think; given the missing
feature in (2), apparently no one could have used it with any success
anyway). https://core.trac.wordpress.org/ticket/16950, which would work
independently of (2), makes much of the code in (1) unnecessary: it simply
requests the URL that is forwarded to, with redirects handled as normal.
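
A minimal sketch of the manual redirect loop described in (1) to (3), written
in Python rather than PHP so that it runs without WordPress. The names
`fetch`, `next_location` and `follow` are hypothetical and are not part of
wp-favicons or the WordPress HTTP API. `fetch` is assumed to perform a single
request with automatic redirects disabled and to return a status code plus a
dict of lower-cased headers. The 5-hop limit and the check for a Refresh
header both come from earlier messages in this thread.

```python
import re
from urllib.parse import urljoin

MAX_HOPS = 5  # the thread mentions redirecting 5 times at most


def next_location(status, headers):
    """Return the redirect target named by one response, or None."""
    # Regular HTTP redirect: 3xx status with a Location header.
    if status in (301, 302, 303, 307, 308) and "location" in headers:
        return headers["location"]
    # Some servers send "Refresh: 0; url=..." instead of Location.
    refresh = headers.get("refresh")
    if refresh:
        match = re.search(r"url\s*=\s*['\"]?([^'\";]+)", refresh, re.IGNORECASE)
        if match:
            return match.group(1).strip()
    return None


def follow(url, fetch, max_hops=MAX_HOPS):
    """Follow redirects one hop at a time.

    `fetch(url)` is an assumed helper returning (status, headers) for a
    single request that does NOT follow redirects itself.
    Returns the chain of (url, status) pairs that was visited.
    """
    chain = []
    for _ in range(max_hops + 1):
        status, headers = fetch(url)
        chain.append((url, status))
        target = next_location(status, headers)
        if target is None:
            return chain  # final destination reached
        url = urljoin(url, target)  # resolve relative redirect targets
    raise RuntimeError("Too many redirects")
```

With a `fetch` that answers 301 for http://nu.nl and 200 for
http://www.nu.nl/, `follow("http://nu.nl", fetch)` returns both hops, so the
last entry in the chain is the URL that can serve as base_href.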

One addition: I'm thinking of dropping the initial HEAD requests that were
suggested earlier and only doing GETs. There are too many cases where we
first get a 200 and then a 301/302, or a 405 Method Not Allowed (for the
HEAD), etc. I have to run that through some more tests.
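
For comparison, a small sketch of the middle ground between "HEAD first" and
"GET only": try HEAD and fall back to GET when the server rejects the method.
This is an illustration in Python, not what wp-favicons does; `request` is a
hypothetical callable taking a method and a URL and returning a status code
and headers. It only covers the 405 case; it does not help where HEAD and GET
disagree (a 200 for one and a 301/302 for the other), which is an argument
for using GET only.

```python
def probe(url, request):
    """Try HEAD first; fall back to GET when the server rejects HEAD.

    `request(method, url)` is an assumed helper returning (status, headers).
    """
    status, headers = request("HEAD", url)
    # 405 Method Not Allowed / 501 Not Implemented: HEAD is unsupported here.
    if status in (405, 501):
        status, headers = request("GET", url)
    return status, headers
```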



On Sat, Mar 12, 2011 at 5:07 AM, Edward de Leau <e at leau.net> wrote:

> thanks!
>
>
> On Sat, Mar 12, 2011 at 4:50 AM, Jacob Santos <wordpress at santosj.name> wrote:
>
>> 1. Check the Content-Type header, if it exists. If it is "text/html", then
>> run the filter to get the favicon.ico.
>>
>> 2. Oh my god, who would have thought a use case like this would have come
>> up?
>>
>> 3. You need to look for the "Refresh" header as well. Some web servers (IIS)
>> will send Refresh instead of Location, as will web sites that show a
>> redirect message for systems that do not support redirects.
>>
>> Jacob Santos
>>
>> On Fri, Mar 11, 2011 at 2:09 PM, Edward de Leau <e at leau.net> wrote:
>>
>> > I have implemented manual redirection for the wp-favicons plugin here:
>> >
>> >
>> > http://plugins.trac.wordpress.org/browser/wp-favicons/trunk/includes/class-http.php
>> >
>> > (part of the next version, 0.5.1, where a mouseover on a redirect/tiny URL
>> > shows the URL it redirects to)
>> > I redirect 5 times max.
>> >
>> > e.g. (1) nu.nl (301) ---> (2) www.nu.nl (200) --> (3)
>> > www.nu.nl/images/favicon.ico (200)
>> >
>> > I needed the manual redirection because I need a base_href when no
>> > base_href is given in the HTML source.
>> > I then use the redirected URI as the base_href.
>> >
>> > The code is not completely done, since a use case like:
>> >
>> > e.g. (1) newscred.com (301) --> (2) http://platform.newscred.com (200)
>> > (look in page) --> (3) http://newscred.com/favicon.ico (301) (wtf? redirect
>> > of content in page) -->
>> > (4) http://newspapers.newscred.com/favicon.ico (200) --> (5)
>> > http://newspapers.newscred.com//media/img/favicon.ico (200)
>> >
>> > does not work yet, since this site gives (4) as the redirect URL while (4)
>> > is actually a page. So I need to add another check for binary content at
>> > the beginning.
>> >
>> > But for all non-favicon self-redirection this should work.
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Mar 1, 2011 at 12:31 AM, Scott Kingsley Clark <scott at skcdev.com> wrote:
>> >
>> > > The spidering process can really take a lot of time for a large site,
>> > > and can end up eating resources and adding time to the infamous PHP
>> > > max_execution_time, so I was looking to cut corners. If I've gotta do
>> > > two requests to do this, I'll do it. Thanks for the advice and attention.
>> > >
>> > > -Scott
>> > >
>> > > On Monday, February 28, 2011 5:28:54 PM UTC-6, Jacob Santos wrote:
>> > > >
>> > > > Not really. wp_remote_request() simply defaults to GET; you can change
>> > > > it to HEAD, which seems to be what you want anyway. You can check
>> > > > whether the response is a redirect and then send another request. It
>> > > > does not sound like speed is a concern (albeit it is one factor, since
>> > > > many sites can quite frankly get up there in the number of redirects
>> > > > that Canonical URLs might give you). (Hint: it should be at most 2
>> > > > requests, one for the redirect and one for the actual page.)
>> > > >
>> > > > You'll probably want to use wp_remote_head() instead, since
>> > > > wp_remote_request() is a generic function made to accommodate the rest
>> > > > of HTTP and the HTTP extensions (there isn't any built-in call support
>> > > > for Subversion or WebDAV).
>> > > >
>> > > > Jacob Santos
>> > > >
>> > > > On Mon, Feb 28, 2011 at 5:22 PM, Scott Kingsley Clark <sc... at skcdev.com> wrote:
>> > > >
>> > > > > Actually, this is in regard to a plugin I'm currently developing.
>> > > > > It's in beta right now, but it's available on WP.org. It's called
>> > > > > Search Engine and it's like a mini-Google on your site. It spiders
>> > > > > your site (or other sites too) and indexes content into the DB.
>> > > > >
>> > > > > http://wordpress.org/extend/plugins/search-engine/
>> > > > >
>> > > > > The use case is that I want to be able to tell whether a page that's
>> > > > > linked to on a site is really redirected elsewhere. Right now, since
>> > > > > I switched to wp_remote_request, I only get the content of the final
>> > > > > destination page, without any knowledge of the path it has taken. So
>> > > > > the best my script (or any script) can tell is that, when you get
>> > > > > content using wp_remote_request and it's redirected, the page exists
>> > > > > at the URL requested, oblivious to the real redirect happening.
>> > > > > (Previously I was using a home-brewed version similar to
>> > > > > wp_remote_request, but calling cURL and others manually.)
>> > > > >
>> > > > > So it looks like right now I'll need to write a little extra code to
>> > > > > make my own wp_remote_request-like function, which does both the
>> > > > > 301/302 redirect header check and the body content return.
>> > > > >
>> > > > > -Scott
>> > > > >
>> > > > > On Monday, February 28, 2011 5:11:22 PM UTC-6, Dion Hulse (dd32) wrote:
>> > > > > >
>> > > > > > 2 separate requests will be 2 separate requests.
>> > > > > > What's the use case you're working on here?
>> > > > > > Personally, if you want the body, I'd do a normal fetch, followed
>> > > > > > by a HEAD if it returned an exceeded-redirects error; otherwise,
>> > > > > > just the URL.
>> > > > > > But I can't think of a case where you'd want one or the other.
>> > > > > >
>> > > > > > On 1 March 2011 04:06, Scott Kingsley Clark <sc... at skcdev.com> wrote:
>> > > > > >
>> > > > > > > Not sure if anyone knows this, but does the page get loaded
>> > > > > > > twice, or is the second load served from some sort of cache?
>> > > > > > > I'm specifically referring to the idea of using wp_remote_head
>> > > > > > > on a URL to check for a redirect, and then using
>> > > > > > > wp_remote_request on the same URL to get the content, etc.
>> > > > > > > _______________________________________________
>> > > > > > > wp-hackers mailing list
>> > > > > > > wp-h... at lists.automattic.com
>> > > > > > > http://lists.automattic.com/mailman/listinfo/wp-hackers

