[wp-trac] [WordPress Trac] #41920: Fixing extreme admin page renders due to broken systemd-resolved making WordPress appear slow

Tue Sep 19 13:13:33 UTC 2017

#41920: Fixing extreme admin page renders due to broken systemd-resolved making
WordPress appear slow
-----------------------------+-----------------------------
 Reporter:  dfavor           |      Owner:
     Type:  feature request  |     Status:  new
 Priority:  normal           |  Milestone:  Awaiting Review
Component:  General          |    Version:  4.8.1
 Severity:  normal           |   Keywords:
  Focuses:                   |
-----------------------------+-----------------------------
 '''Feature Request'''

 I've marked this as a feature request because this is not a WordPress
 issue + any WordPress site running on a systemd-based OS can be affected.

 This makes WordPress appear to run slow (which is a total myth), so likely
 this is good information for somewhere in the Codex.

 My request is for the Codex to note that unexplained slowness in WordPress
 admin pages may be due to systemd-resolved + that, if this service is
 running, the fix is to replace it with a working DNS resolver + retest.

 '''Related tickets...'''

 [https://core.trac.wordpress.org/ticket/40266] - Previous WordPress
 ticket.

 [https://github.com/rmccue/Requests/issues/272] - Upstream GitHub ticket.

 '''This problem occurs when two conditions exist, which are very
 common.'''

 1) OS Distro uses the systemd abomination (my opinion).

 2) Users logged in as admin visit any page.

 This seems to occur as follows...

 1) systemd-resolved caching is completely broken + every DNS lookup
 generates a recursive lookup, rather than pulling DNS records from
 previously cached lookups.

 This is extremely easy to verify.

 Just issue '''host google.com''' repeatedly + you'll see multi-second
 lags before the IP is returned. These lags occur every time the host
 command is issued, for any DNS lookup.

 So caching never seems to work... ever...

 And executing this - '''while : ; do host google.com ; done''' - will
 sometimes recreate the problem.

 Since this only seems to occur sometimes, my guess is that some DNS lookup
 returns data which corrupts systemd-resolved memory, leaving the process
 running but never resolving correctly again.

 2) systemd-resolved hangs for 5 seconds + times out + returns an error,
 which might be the reason the Requests library has hostname/ip missing in
 some exceptions. This was the original ticket issue.

 This is the behavior once the apparent memory corruption from step 1
 occurs.

 3) systemd-resolved, once hung for a specific hostname, seems hung
 forever. In other words, once a timeout occurs for a hostname lookup,
 future lookups of that hostname also fail with a timeout.

 You can see this behavior using the Query Monitor plugin, tracking HTTP
 requests. Once a request goes red/fail, it never recovers.
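
 The repeated-lookup check described in step 1 can be made a little more
 measurable by timing each run. The following is a minimal sketch, not part
 of the original report: time_lookups is a hypothetical helper name, and it
 assumes GNU date (for nanosecond timestamps), as shipped on Debian/Ubuntu.

```shell
#!/bin/sh
# Sketch: run a command COUNT times in a row and print how long each run took.
# Assumption: GNU date, which supports %N (nanoseconds).
# time_lookups is a hypothetical helper name, not part of any existing tool.
time_lookups() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s%N)
        status=0
        "$@" >/dev/null 2>&1 || status=$?
        end=$(date +%s%N)
        # Convert the nanosecond difference to milliseconds.
        echo "run $i: $(( (end - start) / 1000000 )) ms (exit $status)"
        i=$((i + 1))
    done
}

# Example (needs network access and the host utility):
#   time_lookups 5 host google.com
```

 With a working cache, runs after the first would be expected to finish in
 a few milliseconds. Multi-second times on every run, or a non-zero exit
 status that never clears, would match the behavior described above.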

 '''The Simple Fix'''

 Nuke systemd-resolved + replace with a working DNS system.

 I host client sites in LXC containers, so container startup includes...

 1) If missing, install dnsmasq-base (Debian/Ubuntu).

 2) If systemd-resolved exists, nuke it - disable it in systemd + remove
 all related packages.

 For Ubuntu, the commands I use are...

 {{{
 systemctl stop systemd-resolved
 systemctl disable systemd-resolved
 apt-get -yqq purge libnss-resolve
 }}}

 The disable (counterintuitive) is required because the Ubuntu systemd
 packaging is broken: systemd still tries to start systemd-resolved even
 after it's purged from the system, so doing the disable speeds up boots
 (which otherwise hang while systemd tries to start systemd-resolved).

 3) Start dnsmasq using a highly optimized caching config, which avoids all
 disk i/o.
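
 The ticket does not include the dnsmasq config itself. As an illustration
 only, a memory-only caching setup might look something like the sketch
 below. dnsmasq_cache_args is a hypothetical helper, and the upstream
 server and cache size defaults are assumptions, not the values used on the
 reporter's containers. The options themselves are standard dnsmasq flags.

```shell
#!/bin/sh
# Sketch: print dnsmasq options for a local, cache-only resolver.
# dnsmasq_cache_args is a hypothetical helper; the defaults below
# (upstream 8.8.8.8, cache size 10000) are illustrative assumptions.
dnsmasq_cache_args() {
    upstream=${1:-8.8.8.8}
    cache_size=${2:-10000}
    # --no-resolv : do not read /etc/resolv.conf for upstream servers
    # --no-hosts  : do not read /etc/hosts
    # --no-poll   : do not poll resolv.conf for changes
    printf '%s\n' \
        "--listen-address=127.0.0.1" \
        "--bind-interfaces" \
        "--no-resolv" \
        "--no-hosts" \
        "--no-poll" \
        "--server=$upstream" \
        "--cache-size=$cache_size"
}

# Example (as root, with /etc/resolv.conf pointing at 127.0.0.1):
#   dnsmasq $(dnsmasq_cache_args 8.8.8.8 10000)
```

 Passing the upstream resolver explicitly with --server is what makes
 --no-resolv workable here, since dnsmasq then has no need to consult
 /etc/resolv.conf at all.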

 Admin pages that took 10-30 seconds now render the first time in <10
 seconds + subsequent times in <1 second, as expected.
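
 Before retesting, it may be worth confirming that lookups no longer go
 through systemd-resolved. A rough sketch, assuming the stub listener
 address 127.0.0.53 that systemd-resolved normally writes into its
 resolv.conf (uses_resolved_stub is a hypothetical helper name):

```shell
#!/bin/sh
# Sketch: succeed if a resolv.conf-style file lists the systemd-resolved
# stub listener (127.0.0.53) as a nameserver.
# uses_resolved_stub is a hypothetical helper name.
uses_resolved_stub() {
    resolv_conf=${1:-/etc/resolv.conf}
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+127\.0\.0\.53([[:space:]]|$)' "$resolv_conf"
}

# Example:
#   if uses_resolved_stub; then echo "still using the systemd-resolved stub"; fi
#   systemctl is-active systemd-resolved   # reports the unit's current state
```

 If the check still succeeds after the purge, /etc/resolv.conf probably
 needs to be pointed at the replacement resolver by hand.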

--
Ticket URL: <https://core.trac.wordpress.org/ticket/41920>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform