Antw: [EXT] Re: Connections blocked for some tens of seconds while a single slapd thread running 100%

6 Nov 2020


      ...
...
...
Simone Piccardi piccardi@truelite.it wrote on 05.11.2020 at 16:17 in
message 5a6d778a-b75b-3027-3a88-f5507c83977b@truelite.it:
...
On 03/11/20 22:49, Quanah Gibson-Mount wrote:
...
...
The problem shows no periodicity, and looking at the number of
connections before it we could not see any usage peak. We tried to
strace the slapd threads during the problem, and they seem blocked on a
mutex, waiting for the one running at 100% (on a single CPU, user time).
I'm attaching top output from one of these events.
If you can attach to the process while this is occurring, I'd suggest
obtaining a full GDB backtrace to see what the different slapd threads
are doing at that time.  Also, what mutex specifically is slapd waiting
on?
...
...
I ran gstack on the slapd pid during one of these events and saved the
output. The results are attached, but the running slapd is stripped, so
they are quite obscure (at least to me).
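
Even from a stripped binary, frame #0 of each thread usually still resolves when it sits in a shared library such as libpthread, so a per-thread summary can separate the waiting threads from the busy one. The following is a minimal sketch, assuming the usual gdb/gstack layout of "Thread N (...)" headers followed by "#0 ..." frame lines; the function name and the sample text are illustrative, not taken from the attached files.

```python
import re
from collections import Counter

# Frame #0 lines look like "#0  0x00007f... in symbol () from lib"
# (the address part is optional when gdb has line information).
FRAME0_RE = re.compile(r"^#0\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)", re.MULTILINE)


def top_frames(text):
    """Count how many threads have each symbol as their innermost frame."""
    return Counter(m.group(1) for m in FRAME0_RE.finditer(text))


if __name__ == "__main__":
    # Hypothetical sample; in practice read the saved file instead, e.g.
    # text = open("gstack.txt").read()
    sample = (
        "Thread 3 (Thread 0x7f3a5bfff700 (LWP 2101)):\n"
        "#0  0x00007f3a6a1c54ed in __lll_lock_wait () from /lib64/libpthread.so.0\n"
        "#1  0x00007f3a6a1c0dcb in pthread_mutex_lock () from /lib64/libpthread.so.0\n"
        "Thread 2 (Thread 0x7f3a5b7fe700 (LWP 2102)):\n"
        "#0  0x00007f3a6a1c54ed in __lll_lock_wait () from /lib64/libpthread.so.0\n"
        "Thread 1 (Thread 0x7f3a6b0a1740 (LWP 2100)):\n"
        "#0  0x000000000045a1b2 in ?? ()\n"
    )
    for symbol, count in top_frames(sample).most_common():
        print(count, symbol)
```

Threads whose innermost frame is unresolved ("??") are inside the stripped slapd code itself, which is where the thread running at 100% would be expected to show up.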
I think even when stripped, you could "re-attach" the symbols (given that you
saved them before stripping). For some distributions, such symbol (debug)
packages are available for installation. I don't know about your package
source, however.
...
We are trying to put a non-stripped version (compiled with
CFLAGS='-g' and --enable-debug=yes) in use for a test, but that's a
production machine, and it will take a while.
What should I do to find out which mutex it is? In the straces they are
identified just by a number.
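
The number strace prints as the first argument of futex() is the user-space address of the futex word, so grouping the waiting threads by that address at least shows whether they all wait on the same lock. A minimal sketch, assuming strace -f output with "[pid N]" prefixes; the function name and the sample lines are hypothetical:

```python
import re
from collections import defaultdict

# Matches e.g. "futex(0x7f3a5c0010a0, FUTEX_WAIT_PRIVATE, 2, NULL"
FUTEX_RE = re.compile(r"futex\((0x[0-9a-fA-F]+),\s*(FUTEX_[A-Z_]+)")
PID_RE = re.compile(r"\[pid\s+(\d+)\]")


def futex_waiters(lines):
    """Map each futex address to the sorted thread ids seen waiting on it."""
    waiters = defaultdict(set)
    for line in lines:
        m = FUTEX_RE.search(line)
        if not m or "WAIT" not in m.group(2):
            continue  # ignore wake-ups and unrelated syscalls
        pid = PID_RE.search(line)
        waiters[m.group(1)].add(pid.group(1) if pid else "?")
    return {addr: sorted(pids) for addr, pids in waiters.items()}


if __name__ == "__main__":
    # Hypothetical strace -f excerpt, not taken from the attached traces.
    sample = [
        "[pid  2101] futex(0x7f3a5c0010a0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>",
        "[pid  2102] futex(0x7f3a5c0010a0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>",
        "[pid  2103] futex(0x7f3a5c0020b0, FUTEX_WAKE_PRIVATE, 1) = 1",
    ]
    for addr, pids in futex_waiters(sample).items():
        print(addr, "waited on by", ", ".join(pids))
```

With glibc, if the address belongs to a pthread_mutex_t, its __data.__owner field holds the thread id of the current owner; gdb can print that when it is attached and glibc debug information is available. Whether the address is a mutex rather than, say, a condition variable is an assumption that would still need checking.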
...
...
So a first question is: is there any other configuration parameter about
indexing that I can try?
If you really believe that this is indexing related, you should be able
to tell this from the slapd logs at "stats" logging, where you would see
a specific search taking a significant amount of time.  However that
generally does not lead to a system that's paused as searches shouldn't
trigger a mutex issue like what you're describing.
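
To check the "stats" logs for a long-running search, the SRCH line of each operation can be paired with its SEARCH RESULT line by conn and op number. This is a minimal sketch, assuming syslog-style timestamps ("Nov  5 00:39:04") in an English locale, which carry no year, so the year is passed in as a parameter; the function name, threshold and sample lines are hypothetical.

```python
import re
from datetime import datetime

LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s.*?"
    r"conn=(?P<conn>\d+) op=(?P<op>\d+) (?P<kind>SRCH base=|SEARCH RESULT )"
)


def slow_searches(lines, threshold_s=5.0, year=2020):
    """Return (conn, op, seconds) for searches lasting at least threshold_s."""
    started = {}
    slow = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        key = (int(m.group("conn")), int(m.group("op")))
        if m.group("kind").startswith("SRCH"):
            started[key] = ts  # search request logged
        elif key in started:
            elapsed = (ts - started.pop(key)).total_seconds()
            if elapsed >= threshold_s:
                slow.append((key[0], key[1], elapsed))
    return slow


if __name__ == "__main__":
    # Hypothetical log lines, not taken from the attached excerpt.
    sample = [
        'Nov  5 00:38:50 ldap1 slapd[2100]: conn=1001 op=2 SRCH base="dc=example,dc=com" scope=2 deref=0 filter="(uid=alice)"',
        "Nov  5 00:38:50 ldap1 slapd[2100]: conn=1001 op=2 SRCH attr=uid",
        "Nov  5 00:39:20 ldap1 slapd[2100]: conn=1001 op=2 SEARCH RESULT tag=101 err=0 nentries=1 text=",
    ]
    for conn, op, seconds in slow_searches(sample):
        print(f"conn={conn} op={op} took {seconds:.0f}s")
```

Syslog timestamps only have one-second resolution, so this is only useful for operations that take several seconds, which is the range described in this thread.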
No, it is not that I believe that; as I said, it was just a guess about
something that could need full CPU for tens of seconds, blocking all
other operations. But from what you are saying, the guess is probably
plain wrong.
...
Is this on RHEL7 or later?  If you have both "stats" and "sync" logging
enabled (the recommended setting for replicating nodes), what does the
slapd log show is happening at this time?
The server is running an updated version of Amazon Linux (Amazon Linux
AMI 2018.03).
We enabled stats and sync logging, and I'm attaching a redacted excerpt
of the logs around the incident time, when I also took the gstack.txt
(at 00:39:04) and gstack2.txt (at 00:39:20) backtraces. But during
that time the logs contain no data.
Simone
