[wp-hackers] pulling a massive HTML site into Wordpress

Dion Hulse (dd32) wordpress at dd32.id.au
Tue Jun 7 01:09:54 UTC 2011


On 6 June 2011 23:02, John Black <immanence7 at gmail.com> wrote:

>
> On 6 Jun 2011, at 16:50, Dion Hulse (dd32) wrote:
>
> >> I see there are some plugins to handle 301 redirects. But these tend to
> >> be for a handful of files, not 50,000. Any thoughts on how this would be
> >> managed?
> >>
> >
> > I'd be storing a meta of their original file location when inserting them.
> > That way you can add a filter later to the 404/canonical handlers to check
> > the URL against the meta fields to find the old document, and issue the 301.
> > Or, you could store the meta, retrieve it later to create a massive redirect
> > list, and feed that into .htaccess or similar.
>

> How would you generate the meta? Some of the more recent HTML files have a
> note of the URL of the file embedded. But a quick check shows that the older
> files (as I say, the archive goes back to 1998) don't.
>
I wouldn't store the full URL; rather, just the path of that particular page.
I assumed the files you have are in the same structure as the live files? If
so, I'd store (for example) /2008/directory_here/filename.html as the meta.
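A rough sketch of that approach, assuming a meta key named `legacy_path` (the key name and the helper shape are my own, not an existing plugin): store the archive-relative path when inserting each document, then hook `template_redirect` to catch 404s, look the requested path up against the meta, and issue the 301.

```php
<?php
// Sketch only; assumes WordPress is loaded. 'legacy_path' is a
// made-up meta key, pick whatever suits the import script.

// At import time: create the post and record its original path.
$post_id = wp_insert_post( array(
    'post_title'   => $title,
    'post_content' => $body,
    'post_status'  => 'publish',
) );
add_post_meta( $post_id, 'legacy_path', '/2008/directory_here/filename.html', true );

// At request time: on a 404, see if the requested path matches a
// stored legacy path, and 301 to the post's new permalink if so.
add_action( 'template_redirect', function () {
    if ( ! is_404() ) {
        return;
    }
    $path  = parse_url( $_SERVER['REQUEST_URI'], PHP_URL_PATH );
    $found = get_posts( array(
        'meta_key'       => 'legacy_path',
        'meta_value'     => $path,
        'posts_per_page' => 1,
        'fields'         => 'ids',
    ) );
    if ( $found ) {
        wp_redirect( get_permalink( $found[0] ), 301 );
        exit;
    }
} );
```

The meta lookup only fires on requests that would otherwise 404, so it adds no cost to normal page loads.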


> I was hoping to do the migration on a localhost install. To get the meta
> would I have to do the migration on the actual server of this organization?
>

Not at all, just store the part of the URL which matters.
Of course, if the files are live on the web in a different format/URL
structure, you would need a way of mapping the live structure to the archive
files you have.
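The other option mentioned above, dumping the stored meta into a static redirect list, could look like this (again assuming the hypothetical `legacy_path` key; the output uses Apache's mod_alias `Redirect` syntax for .htaccess):

```php
<?php
// Sketch only; assumes WordPress is loaded. Emits one permanent
// redirect line per imported document, suitable for .htaccess.
$ids = get_posts( array(
    'meta_key'       => 'legacy_path',
    'posts_per_page' => -1,
    'fields'         => 'ids',
) );
foreach ( $ids as $post_id ) {
    $old = get_post_meta( $post_id, 'legacy_path', true );
    $new = get_permalink( $post_id );
    echo "Redirect 301 {$old} {$new}\n";
}
```

With 50,000 entries, mod_rewrite's RewriteMap (a hash file of old-path → new-URL pairs) would be kinder to Apache than 50,000 literal Redirect lines, but the generation step is the same.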


>
> best,
> JB
> _______________________________________________
> wp-hackers mailing list
> wp-hackers at lists.automattic.com
> http://lists.automattic.com/mailman/listinfo/wp-hackers
>

