[wp-trac] [WordPress Trac] #14525: Blogger importer prepends ">" to all content

WordPress Trac wp-trac at lists.automattic.com
Thu May 31 17:00:48 UTC 2012


#14525: Blogger importer prepends ">" to all content
------------------------------------+----------------------------
 Reporter:  mdawaffe                |       Owner:  Otto42
     Type:  defect (bug)            |      Status:  assigned
 Priority:  normal                  |   Milestone:  WordPress.org
Component:  Import                  |     Version:
 Severity:  major                   |  Resolution:
 Keywords:  has-patch dev-feedback  |
------------------------------------+----------------------------

Comment (by Workshopshed):

 I'm looking at adding images to this importer, does anyone have an opinion
 on this? Is it within the remit of what the plugin should be doing or
 should it be separate?

 I've been looking at a couple of other plugins that do similar things, but
 they don't seem to do quite everything I'd expect.

 * http://wordpress.org/extend/plugins/remote-images-grabber
 * http://notions.okuda.ca/wordpress-plugins/blogger-image-import/

 The images on blogger are typically in the format of:

 {{{
 <a href="hirezimage.jpg"><img src="lowrezimage.jpg"></a>
 }}}
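
 As a rough illustration of pulling out those link/image pairs, here is a
 minimal sketch in Python (chosen only for illustration; the importer itself
 is PHP). The names ImagePairParser and extract_image_pairs are hypothetical
 and not part of any existing plugin.

```python
from html.parser import HTMLParser


class ImagePairParser(HTMLParser):
    """Collect (link_href, img_src) pairs for images found in post content."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._href = None  # href of the <a> we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._href = attrs.get("href")
        elif tag == "img" and attrs.get("src"):
            # An image outside any link gets None as its link target.
            self.pairs.append((self._href, attrs["src"]))

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def extract_image_pairs(html):
    """Return a list of (link_href, img_src) tuples for the given HTML."""
    parser = ImagePairParser()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

 An image that is not wrapped in a link comes back with None as its link
 target, so the caller can decide whether to skip the high resolution step.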

 The structure of the image URLs contains a "size" indicator as one of the
 "folders", e.g. s144 for a low-resolution file and s800 for a large one.
 There is a note in the Blogger image importer that some of these point at
 a page rather than an image:

 * https://lh4.googleusercontent.com/-nt66qhxzDyY/TZOD-RhTYMI/AAAAAAAACd4/Elzm1smRFb4/s144/Ski%2520Trip.jpg (small image)
 * https://lh4.googleusercontent.com/-nt66qhxzDyY/TZOD-RhTYMI/AAAAAAAACd4/Elzm1smRFb4/s800/Ski%2520Trip.jpg (large image)
 * https://lh4.googleusercontent.com/-nt66qhxzDyY/TZOD-RhTYMI/AAAAAAAACd4/Elzm1smRFb4/s800-h/Ski%2520Trip.jpg (page)
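
 A small sketch of how the size "folder" could be detected and rewritten,
 again in Python for illustration only. It assumes the pattern seen in the
 three URLs above (an "s" followed by digits, with an optional "-h" suffix
 marking the page variant); other URL shapes may exist. The function names
 are hypothetical.

```python
import re

# Size "folder" such as s144 or s800, optionally followed by "-h" (page variant).
SIZE_SEGMENT = re.compile(r"/s(\d+)(-h)?/")


def describe_size(url):
    """Return (size, is_page) for a URL with a size segment, or None."""
    match = SIZE_SEGMENT.search(url)
    if match is None:
        return None
    return int(match.group(1)), match.group(2) is not None


def with_size(url, size):
    """Rewrite the first size segment to the given size, dropping any "-h"."""
    return SIZE_SEGMENT.sub("/s%d/" % size, url, count=1)
```

 Whether dropping the "-h" always yields a real image is an assumption based
 only on the examples above, so the result would still need checking.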

 On my blogs, which have images going back 4 or 5 years, the images are
 located on xxx.googleusercontent.com, xxx.ggpht.com and
 xxx.bp.blogspot.com, with some of the higher-resolution images
 additionally coming from picasaweb.google.com.
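
 To decide which images the importer should try to fetch, a host check
 along these lines might be enough. This is a Python sketch for
 illustration; the host list is just the one given above, "xxx" is treated
 as "any subdomain", and is_blogger_image_host is a hypothetical name.

```python
from urllib.parse import urlparse

# Host suffixes taken from the list above; "xxx" is treated as any subdomain.
BLOGGER_IMAGE_HOSTS = (
    "googleusercontent.com",
    "ggpht.com",
    "bp.blogspot.com",
    "picasaweb.google.com",
)


def is_blogger_image_host(url):
    """Return True if the URL's host is one of the known hosts or a subdomain."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in BLOGGER_IMAGE_HOSTS)
```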

 Can I confirm that the correct process to handle the images would be
 something like the following?

 * Get the location of the upload folder (wp_upload_dir)
 * Check image is not already downloaded
 * Confirm that the medium resolution (img src) url points to an image
 (wp_check_filetype)
 * Download the medium resolution image to uploads folder (download_url)
 * Read meta data for the image
 * Confirm that the high resolution (link href) url points to an image
 * Download the high resolution image to uploads folder
 * Read meta data for the image
 * Add the image details to the database as an attachment
 * Add attachment meta data
 * Generate thumbnails (wp_generate_attachment_metadata)

 * Change link URL to point to the high resolution image
 * Change image URL to point to the low resolution image
 * Link the attachment to the post
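
 The download part of that process could look roughly like the sketch
 below. It is Python for illustration only and does not call any WordPress
 API: fetch(url) and the seen dictionary are hypothetical stand-ins for
 download_url and for the "already downloaded" lookup, and the extension
 check only approximates what wp_check_filetype would do.

```python
import mimetypes
import os
from urllib.parse import unquote, urlparse


def looks_like_image(url):
    """Guess from the file extension alone whether the URL is an image."""
    mime, _ = mimetypes.guess_type(urlparse(url).path)
    return mime is not None and mime.startswith("image/")


def import_image(url, upload_dir, fetch, seen):
    """Download url into upload_dir once; return the local path or None.

    fetch(url) -> bytes and seen (a dict of url -> local path) are
    hypothetical stand-ins supplied by the caller.
    """
    if url in seen:                   # already downloaded earlier
        return seen[url]
    if not looks_like_image(url):     # extension is not an image type
        return None
    name = os.path.basename(unquote(urlparse(url).path))
    path = os.path.join(upload_dir, name)
    with open(path, "wb") as handle:
        handle.write(fetch(url))
    seen[url] = path
    return path
```

 Note that an extension check alone would not catch the "s800-h" page URLs,
 since they also end in .jpg; the downloaded content (or the response
 content type) would need checking as well.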

 I'm thinking of a separate class that takes a list of URLs and handles the
 downloading and creation of the attachment, passing the details back to the
 caller. It should theoretically be possible to use that in any importer?

 I'm thinking that the changes would be along the lines of

 * New class that handles the image processes as described above

 * WP_SimplePie_BlogItem would be extended to include a get_images function
 returning a collection of images, possibly in pairs of image and link?

 * Blogger_Importer->import_blog change to pass these images to the new
 class to process them
 * Blogger_Importer->import_blog change to store the results in
 BloggerEntry
 * Blogger_Importer->import_blog change to do a find and replace on URLs in
 the content

 * Blogger_Importer->import_post changed to connect the attachment details
 to the post
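
 For the find and replace step, something like the following sketch might
 do (Python for illustration only; replace_urls and the shape of url_map
 are hypothetical). It takes a mapping from original Blogger URLs to the
 new local URLs and also reports how many replacements were made.

```python
def replace_urls(content, url_map):
    """Replace each old URL with its new one; return (content, replacements)."""
    count = 0
    # Longest first, so a URL that is a prefix of another is not replaced early.
    for old in sorted(url_map, key=len, reverse=True):
        hits = content.count(old)
        if hits:
            content = content.replace(old, url_map[old])
            count += hits
    return content, count
```

 A plain string replace is the simplest option; it assumes the URLs appear
 in the content exactly as they were extracted, with the same escaping.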

 Perhaps we could also keep a count of the attachments downloaded?

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/14525#comment:29>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

