[wp-hackers] preventing the duplicate wp_insert_post effect

Haluk Karamete halukkaramete at gmail.com
Fri Jun 22 00:12:00 UTC 2012


One discovery...

If I change the wp_insert_post call to $wpdb->insert and do a straight
wpdb insert, there are no problems. So this all comes down to
wp_insert_post.
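
A rough sketch of what such a direct insert could look like, for comparison. This is an illustration, not the migration code itself: `sketch_direct_insert` is a hypothetical name, the column values are assumptions, and `$wpdb` (the WordPress database object) is passed in as a parameter so the function can be read on its own.

```php
<?php
// Hypothetical helper, not part of the original migration script.
// $wpdb is the WordPress database object (normally `global $wpdb`).
function sketch_direct_insert($wpdb, array $post_data)
{
    // Simplification: UTC is used for both date columns.
    $now = gmdate('Y-m-d H:i:s');

    $row = array(
        'post_content'          => isset($post_data['post_content']) ? $post_data['post_content'] : '',
        'post_title'            => isset($post_data['post_title']) ? $post_data['post_title'] : '',
        'post_status'           => isset($post_data['post_status']) ? $post_data['post_status'] : 'draft',
        'post_type'             => 'post',
        'post_date'             => $now,
        'post_date_gmt'         => $now,
        // Text columns that are NOT NULL in wp_posts, set explicitly.
        'post_excerpt'          => '',
        'to_ping'               => '',
        'pinged'                => '',
        'post_content_filtered' => '',
    );

    // wpdb::insert() returns the number of rows inserted, or false on error.
    $inserted = $wpdb->insert($wpdb->posts, $row);

    // wpdb::$insert_id holds the auto-increment id of the last insert.
    return $inserted ? (int) $wpdb->insert_id : 0;
}
```

A raw insert like this skips everything wp_insert_post does around the INSERT (sanitizing, slug generation, hooks such as save_post), so it is a workaround rather than an explanation of the duplicates.
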

On Thu, Jun 21, 2012 at 12:08 PM, Haluk Karamete
<halukkaramete at gmail.com> wrote:
> This is all part of a migration routine I wrote, where a while loop
> goes through 15,000 records from an MS SQL table and converts each
> record into a post_data array to be fed into the wp_insert_post API.
> Technically, since I'm dealing with 15,000 records in MS SQL, I should
> end up with 15,000 posts in a freshly installed wp_posts table. Right?
> In my case, I sometimes get 16000, sometimes 290000. It's never the
> same; every run is different. I'm almost at the point of rewriting the
> code. But if the culprit is wp_insert_post itself, or some internal WP
> process or WordPress cache that I do not know of, or the MySQL server
> config, and so on, even the new approach won't work.
>
> After some research, I've come to learn I'm not alone on this. But
> from what I read, I could not get my head around it.
>
> As I said above, I create the posts by going through a while loop.
>
> Since the process takes a long time, I had to build a custom time-out
> module. As the code goes through the iterations of the while loop, it
> records its progress in a time_out table. A watchdog page running in
> an iframe keeps an eye on that table's progress, so even if the bottom
> frame (the one that handles the MS SQL to wp_posts process) times out,
> I know which record to restart the process from.
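
A stripped-down sketch of this kind of checkpoint-and-resume loop, to make the resume step concrete. It is not the actual time-out module: `sketch_process_batch`, the `$checkpoint` array (standing in for a row in the time_out table) and the `$max_per_run` cut-off (standing in for a PHP timeout) are all hypothetical, and record ids are assumed to be ascending integers.

```php
<?php
// Hypothetical sketch of a resumable loop; not the original module.
// $records    : array keyed by ascending integer record id
// $checkpoint : array('last_recid' => int), stands in for the time_out table
// $handler    : callable that processes one record
function sketch_process_batch(array $records, array &$checkpoint, $handler, $max_per_run)
{
    $done = 0;

    foreach ($records as $recid => $record) {
        // Skip everything at or below the recorded checkpoint.
        if ($recid <= $checkpoint['last_recid']) {
            continue;
        }

        // Stand-in for the request timing out.
        if ($done >= $max_per_run) {
            break;
        }

        call_user_func($handler, $recid, $record);

        // Progress is recorded only after the handler has finished.
        $checkpoint['last_recid'] = $recid;
        $done++;
    }

    return $checkpoint['last_recid'];
}
```

Note that the handler runs before the checkpoint is written. If a request dies between those two steps, the next run hands the same record to the handler again, so the handler has to tolerate repeats. Whether that is what happens in this migration is not established here.
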
>
> This way I do not have to sit in front of the computer and keep
> clicking a 'continue' link.
>
> Well, this trick works on all of my implementations that are cut
> short by timeouts. I eventually get the whole table processed, and
> the watchdog iframe stops refreshing when the number of records to be
> processed is equal to the number of records that have been processed
> so far. So the time-out module works perfectly, except...
>
> Only when the destination table is wp_posts and the wp_insert_post
> API is involved do I get a problem.
>
> After the first time-out, things start going haywire.
>
> From time to time, wp_insert_post ends up firing two or three times,
> causing the same record to be inserted multiple times.
>
> To remedy the situation, I've tried 3 different techniques, including
> the use of custom fields, but no luck. I also tried inserting a unique
> key into the post itself to minimize the database activity. I found
> this is better because it avoids the involvement of custom fields and
> yet achieves the same goal. I put the source_key (which is something
> like --{$db_name}.{$table_name}.{$recid_value}--) in the post, and at
> insert time I just check for the existence of that key (using the LIKE
> operator) to see if that post was added previously. But even that
> fails: wp_insert_post still creates duplicate records, and up until
> now I simply cannot pin down when the problem occurs.
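
One detail of the LIKE lookup that is easy to miss: in a LIKE pattern, `_` matches any single character and `%` matches any run of characters, so a key containing an underscore (a table name such as media_files, say) is matched more loosely than intended. A small sketch of building and escaping the key follows. The function names are hypothetical; the escaping is the same `addcslashes()` call that current WordPress wraps as `$wpdb->esc_like()`.

```php
<?php
// Hypothetical helpers for illustration only.

// Build the db.table.recid key described above.
function sketch_source_key($db_name, $table_name, $recid)
{
    return "{$db_name}.{$table_name}.{$recid}";
}

// Backslash-escape the LIKE wildcards (% and _) and the backslash itself.
function sketch_escape_like($text)
{
    return addcslashes($text, '_%\\');
}

// Full pattern for: post_content LIKE <pattern>
function sketch_like_pattern($source_key)
{
    return '%--' . sketch_escape_like($source_key) . '--%';
}
```

The resulting pattern still has to be quoted for SQL (for example via `$wpdb->prepare()`) before it goes into a query. This tightens the match but does not by itself explain the duplicates.
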
>
> I read "Prevent duplicate posts in wp_insert_post using custom
> fields", but my techniques were simply alternatives to it.
>
> Here is a cut-down version of how I do it...
>
> while ($row = mysql_fetch_assoc($RS)) :
>
>        $source_recid = $row['source_recid'];
>        // source_recid is something like db.table.tablerecid
>        // example: services.media.1223
>
>        $post_data['post_content'] = $row['some_field_thats_got_page_content'];
>
>        // append the source_recid to the post data in the form of an
>        // HTML comment, for uniqueness purposes
>        $post_data['post_content'] = $post_data['post_content'] .
>            "<!--{$source_recid}-->";
>
>        // do the other stuff... (shortened here for clarity)
>
>        ....
>
>        $post_data['post_status'] = 'publish';
>
>        Insert_or_UpdatePost($dbh,$post_data,$source_recid,$post_id);
>
>
>        if ($post_id==0):
>
>            //log the error and move on
>
>            continue;
>
>        endif;
>
>
> endwhile;
>
> function Insert_or_UpdatePost($dbh,$post_data,$source_recid,&$post_id){
>
>        // this function first looks up the --db.table.tablerecid-- sequence
>        // across the post_content field of all wp_posts rows
>        // to determine whether this record has already been INSERTed!
>
>        // depending on the outcome, it does an insert or an update
>        // and returns the $post_id accordingly
>
>        // if the function determines there are no such records,
>        // then and only then does it run wp_insert_post!
>
>        // if the function determines that there is a post like that,
>        // it retrieves the post_id
>        // and then switches the operation
>        // to use wp_update_post instead!
>
>        // with this approach, technically speaking,
>        // it should be impossible to run wp_insert_post on an existing post!
>
>        // and yet, I still get duplicate posts...
>
>        // here we go;
>
>        $post_id_probed = blp_sql_getdbvals($dbh,
>            "select id from wp_posts where post_content LIKE '%--{$source_recid}--%'");
>
>        if (blp_isnumber($post_id_probed)):
>            $post_id = $post_id_probed;
>
>            $post_data['ID'] = $post_id;
>            $post_id = wp_update_post( $post_data );
>
>            if ($post_id == 0):
>
>                //add error
>
>                return FALSE;
>
>
>            else:
>
>                update_post_meta($post_id, "wpcf-blp-migration-date",
>                    blp_now('mysql'));
>
>                return TRUE;
>
>            endif;
>
>
>        endif;
>
>        // if we make it to this part, it means only one thing!
>        // there is no post for the db.table.tablerecid yet,
>        // so do the insert!
>
>        $post_id = wp_insert_post( $post_data );
>
>
>
>        if ($post_id == 0):
>
>            //add error
>
>            return FALSE;
>
>
>        else:
>
>            // add_post_meta($post_id, "wpcf-blp-migration-source", $source_recid, TRUE);
>            // no need for that anymore
>
>            return TRUE;
>
>
>        endif;
>
>
> }
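
The check-then-insert logic above is only safe if a single copy of the script runs at a time. If a timed-out request were still running on the server while the watchdog started a new one (an assumption; nothing in this thread establishes that this is the cause), both copies could pass the LIKE check before either had inserted. One way to rule that out is a MySQL advisory lock around each run. A minimal sketch, with a hypothetical function name and lock name, and with `$wpdb` passed in as a parameter:

```php
<?php
// Hypothetical sketch; not part of the original migration script.
// $wpdb is the WordPress database object, $callback does the actual work.
function sketch_with_migration_lock($wpdb, $lock_name, $callback)
{
    // MySQL GET_LOCK() returns 1 if the lock was obtained, 0 on timeout,
    // NULL on error. A timeout of 0 means "do not wait".
    $got = $wpdb->get_var(
        $wpdb->prepare('SELECT GET_LOCK(%s, %d)', $lock_name, 0)
    );

    if ((string) $got !== '1') {
        // Another run appears to hold the lock; do nothing.
        return false;
    }

    try {
        call_user_func($callback);
    } finally {
        // Release explicitly; MySQL also releases the lock when the
        // connection that holds it closes.
        $wpdb->query($wpdb->prepare('SELECT RELEASE_LOCK(%s)', $lock_name));
    }

    return true;
}
```

Usage would be along the lines of `sketch_with_migration_lock($wpdb, 'blp_migration', $run_one_batch)`, where `$run_one_batch` is a placeholder for the while loop shown earlier. If duplicates stop once overlapping runs are excluded, that points at concurrency rather than at wp_insert_post itself.
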
>
> There is also a post I created on the WordPress Stack Exchange site:
>
> http://wordpress.stackexchange.com/questions/19732/prevent-duplicate-posts-in-wp-insert-post-using-custom-fields

