[wp-hackers] preventing the duplicate wp_insert_post effect
Haluk Karamete
halukkaramete at gmail.com
Fri Jun 22 00:12:00 UTC 2012
One discovery...
If I replace wp_insert_post with a straight $wpdb->insert, there are
no problems. So this all comes down to wp_insert_post.
On Thu, Jun 21, 2012 at 12:08 PM, Haluk Karamete
<halukkaramete at gmail.com> wrote:
> This is all part of a migration routine I wrote, where a while loop
> goes through 15,000 records from an ms-sql table and converts each
> record into a post_data array to be fed into the wp_insert_post API.
> Since I'm dealing with 15000 recs in ms-sql, I should end up with
> 15000 posts in a freshly installed wp_posts table. Right? In my case,
> I sometimes get 16000, sometimes 290000. It's never the same; every
> run is different. I'm almost at the point of rewriting the code. But
> if the culprit is wp_insert_post itself, or some internal wp process,
> a WordPress cache that I do not know of, or the mysql server config,
> then even the new approach won't work.
>
> After some research, I've come to learn I'm not alone on this. But
> from what I read, I could not wrap my head around it.
>
> As I said above, I create the posts going thru a while loop.
>
> Since the process takes a long time, I had to build a custom time-out
> module. As the code goes through the iterations of the while loop, it
> records its progress in a time_out table. A watchdog page running in
> an iframe keeps an eye on the time_out table's progress, so even if
> the bottom frame ( the one that handles the ms-sql to wp_posts
> process ) times out, I know which record to restart the process from.
>
> This way I do not have to sit in front of the computer and keep
> clicking on a 'continue' link.
>
> Well, this trick works on all of my implementations that are
> interrupted by timeouts. Eventually the whole table gets processed,
> and the watchdog iframe stops refreshing when the number of records
> to be processed is equal to the number of records that have been
> processed in the last instance. So the time-out module works perfectly
> except....
>
> Only when the destination table is wp_posts and the wp_insert_post
> API is involved do I get a problem.
>
> After the first time-out, things start going haywire.
>
> From time to time, wp_insert_post ends up firing twice/thrice causing
> the same record to be inserted multiple times.
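The thread does not establish why the duplicates appear. One possibility worth ruling out, and it is only an assumption here, is that a restarted run overlaps with an earlier one that the server is still executing. A minimal sketch of a guard against that, using PHP's flock() on a lock file; the helper names and the lock file path are hypothetical and not part of the original routine:

```php
<?php
// Hypothetical helper: take an exclusive, non-blocking lock on a lock file.
// Returns the open file handle on success, or null if the lock is not available.
function blp_acquire_run_lock(string $lock_path)
{
    $handle = fopen($lock_path, 'c'); // create if missing, never truncate
    if ($handle === false) {
        return null;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    return $handle;
}

// Hypothetical helper: release the lock and close the handle.
function blp_release_run_lock($handle): void
{
    flock($handle, LOCK_UN);
    fclose($handle);
}

// The lock file location is an assumption; any writable path would do.
$lock = blp_acquire_run_lock(sys_get_temp_dir() . '/blp_migration.lock');
if ($lock === null) {
    echo "another run is still active, not starting\n";
} else {
    // ... the while loop over the source records would go here ...
    blp_release_run_lock($lock);
}
```

If the duplicates persist with such a guard in place, overlapping runs can at least be excluded as the cause.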
>
> To remedy the situation, I've tried 3 different techniques including
> the use of custom fields, but no luck. I also tried inserting a unique
> key into the post itself to minimize the database activity. I found
> this is better because it avoids the involvement of custom fields and
> yet achieves the same goal. I put the source_key ( which is something
> like --{$db_name}.{$table_name}.{$recid_value}-- ) in the post and,
> at insert time, I just check for the existence of that key ( using the
> LIKE operator ) to see if that post was added previously. But even
> that fails... wp_insert_post surprisingly creates double records, and
> up until now I simply cannot pin down when the problem occurs.
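A side note on the LIKE check: in a LIKE pattern, `_` matches any single character and `%` matches any run of characters, so a database or table name containing an underscore can match more rows than intended. The sketch below builds the marker and an escaped pattern for use in a parameterized query. The function names are hypothetical, and this is an illustration of the escaping step only, not a diagnosis of the duplicates:

```php
<?php
// Hypothetical helpers, not part of the original routine.

// Build the marker that is searched for, e.g. "--services.media.1223--".
function blp_source_marker(string $source_recid): string
{
    return "--{$source_recid}--";
}

// Escape the characters LIKE treats specially: underscore, percent, backslash.
function blp_escape_like(string $text): string
{
    return addcslashes($text, '_%\\');
}

// Build the pattern to bind to a "post_content LIKE ?" placeholder.
function blp_like_pattern(string $source_recid): string
{
    return '%' . blp_escape_like(blp_source_marker($source_recid)) . '%';
}

// Prints: %--services.media\_items.1223--%
echo blp_like_pattern('services.media_items.1223'), "\n";
```

The record ID `services.media_items.1223` is made up for the example; the IDs shown in the post contain only dots and digits.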
>
> I read this: "Prevent duplicate posts in wp_insert_post using custom
> fields". But my techniques were simply alternatives to it.
>
> Here is a cut-down version of how I do it...
>
> while ($row = mysql_fetch_assoc($RS)) :
>
> $source_recid = $row['source_recid'];
> //source_recid is something like db.table.tablerecid
> //example services.media.1223
>
> $post_data['post_content'] = $row['some_field_thats_got_page_content'];
>
> //append the source_recid to the post content in the form of an html
> //comment for uniqueness purposes
> $post_data['post_content'] = $post_data['post_content'] .
> "<!--{$source_recid}-->";
>
> //do the other stuff... ( shortened here for clarity... )
>
> ....
>
> $post_data['post_status'] = 'publish';
>
> Insert_or_UpdatePost($dbh,$post_data,$source_recid,$post_id);
>
>
> if ($post_id==0):
>
> //log the error and move on
>
> continue;
>
> endif;
>
>
> endwhile;
>
> function Insert_or_UpdatePost($dbh,$post_data,$source_recid,&$post_id){
>
> // this function first looks up the --db.table.tablerecid-- sequence
> // across all wp_posts post_content field
> // in determining if this record has been already INSERTed!
>
> // and depending on the outcome, it does an insert or update
> // and return the $post_id accordingly
>
> // if the function determines there are no such records,
> // then and only then, it runs the wp_insert_post!
>
> // if the function determines that there is a post like that,
> // it retrieves the post_id
> // and then switches to operation
> // to use the wp_update_post instead!
>
> // with this approach, technically speaking,
> // it should be impossible to run wp_insert_post on an existing post!
>
> // and yet, I still get duplicate posts...
>
> // here we go;
>
> $post_id_probed = blp_sql_getdbvals($dbh,"select id from
> wp_posts where post_content LIKE '%--{$source_recid}--%'");
>
> if (blp_isnumber($post_id_probed)):
> $post_id = $post_id_probed;
>
> $post_data['ID'] = $post_id;
> $post_id = wp_update_post( $post_data );
>
> if ($post_id == 0):
>
> //add error
>
> return FALSE;
>
>
> else:
>
> update_post_meta($post_id, "wpcf-blp-migration-date",
> blp_now('mysql'));
>
> return TRUE;
>
> endif;
>
>
> endif;
>
> // if we make it to this part, it means only one thing!
> // there is no post for the db.table.tablerecid yet,
> // so do the insert!
>
> $post_id = wp_insert_post( $post_data );
>
>
>
> if ($post_id == 0):
>
> //add error
>
> return FALSE;
>
>
> else:
>
> //add_post_meta($post_id, "wpcf-blp-migration-source", $source_recid,TRUE);
> //no need for that anymore
>
> return TRUE;
>
>
> endif;
>
>
> }
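The look-up-then-insert-or-update decision in the function above can be exercised without WordPress. In the sketch below, the lookup, insert and update steps are injected as callables and backed by in-memory arrays. All names are hypothetical; in a real run the three callables would wrap the lookup query, wp_insert_post and wp_update_post. Unlike the original, this version returns the post ID directly and does not use a by-reference parameter, which is a simplification made for the example:

```php
<?php
// Hypothetical sketch of the decision logic only.
// Returns the post ID on success, or 0 on failure.
function blp_insert_or_update(
    array $post_data,
    string $source_recid,
    callable $find_id,  // fn(string $source_recid): int, 0 when not found
    callable $insert,   // fn(array $post_data): int, 0 on failure
    callable $update    // fn(array $post_data): int, 0 on failure
): int {
    $existing_id = (int) $find_id($source_recid);
    if ($existing_id > 0) {
        $post_data['ID'] = $existing_id;
        return (int) $update($post_data);
    }
    return (int) $insert($post_data);
}

// In-memory stand-ins for the database, for illustration only.
$posts = [];          // post ID => post data
$ids_by_source = [];  // source_recid => post ID

$find = function (string $recid) use (&$ids_by_source): int {
    return $ids_by_source[$recid] ?? 0;
};
$insert = function (array $data) use (&$posts): int {
    $id = count($posts) + 1;
    $posts[$id] = $data;
    return $id;
};
$update = function (array $data) use (&$posts): int {
    $posts[$data['ID']] = $data;
    return $data['ID'];
};

// Process the same source record twice, as a restarted run would.
foreach (['services.media.1223', 'services.media.1223'] as $recid) {
    $id = blp_insert_or_update(['post_content' => 'body'], $recid, $find, $insert, $update);
    if ($id > 0) {
        $ids_by_source[$recid] = $id;
    }
}

echo count($posts), "\n"; // prints 1: the second pass updated, not inserted
```

When the passes run strictly one after another, the second always takes the update branch. That fits the poster's expectation that the logic itself should not produce duplicates when it runs sequentially.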
>
> There is also a post I created at the wordpress.stackexchange site.
>
> http://wordpress.stackexchange.com/questions/19732/prevent-duplicate-posts-in-wp-insert-post-using-custom-fields