Hello,
To start with, this is my very first post on a mailing list of this kind.
I've read much about how to do things right, like here:
https://www.openldap.org/doc/admin24/troubleshooting.html or here:
https://www.openldap.org/faq/data/cache/59.html
Nonetheless, please do tell me if I'm not following the standards.
I'm facing an issue with my OpenLDAP servers that I struggle to fix or even
find info about.
To summarize:
We have an architecture of two providers with mirror-mode replication for
their configuration. They are LXC containers hosted in a datacenter.
We have two consumers, also with mirror-mode replication for their config,
which pull the database (mdb backend) from the providers above. They are
VMs hosted on a different hypervisor and in a different datacenter than the
providers.
In front of all this, a load-balancing layer spreads requests evenly across
the four servers. Write requests that land on the consumers are forwarded
to the providers using the chain overlay.
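For context, the chaining on the consumers is the stock slapo-chain setup;
a minimal sketch of its shape in cn=config (all DNs, URIs and credentials
below are placeholders, not our real values, which are in the attached
config.ldif):
```
dn: olcOverlay={0}chain,olcDatabase={-1}frontend,cn=config
objectClass: olcOverlayConfig
objectClass: olcChainConfig
olcOverlay: {0}chain
olcChainReturnError: TRUE

dn: olcDatabase={0}ldap,olcOverlay={0}chain,olcDatabase={-1}frontend,cn=config
objectClass: olcLDAPConfig
objectClass: olcChainDatabase
olcDatabase: {0}ldap
olcDbURI: "ldap://provider1.example.com"
olcDbIDAssertBind: bindmethod=simple binddn="cn=chain,dc=example,dc=com" credentials=secret mode=self
```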
The slapd process of those two consumers randomly hangs. When it happens,
the slapd service is still running and listening, but not a single query
can get through, and even logging stops: a complete freeze. The service
comes back to life after some time (several minutes, different every time)
without any manual intervention. Trying to restart the process manually
during such an event fails, unless I kill -9, which is not something I want
to do.
My observations so far :
- The hangs appear only during business hours (nothing at night or on
weekends).
- Only the consumers hang. The providers are always fine.
- CPU / RAM / disks look fine before and during the hangs.
- The hangs seem random and I could not find a way to trigger them at will.
- strace shows the following lines in an infinite loop during hangs.
```
[pid 1082] gettimeofday({tv_sec=1726669191, tv_usec=694473}, NULL) = 0
[pid 1082] poll([{fd=18, events=POLLIN|POLLPRI}], 1, 100) = 0 (Timeout)
[pid 1082] sched_yield()
```
It seems the process is waiting for data that never comes.
fd=18 refers to a TCP socket connected to provider1, which would suggest
that the provider is not sending what the consumer is waiting for.
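To see what slapd is actually blocked on, I plan to grab per-thread
backtraces during the next hang; a minimal sketch, assuming gdb and debug
symbols are available on the consumer:
```
# attach non-interactively to the frozen slapd and dump every thread's stack
gdb -p "$(pidof slapd)" -batch -ex 'thread apply all bt'
```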
This has led me to several suppositions:
1. Network issues
Unlikely, since only these two servers in that datacenter misbehave, and
during a hang I can telnet / ldapsearch from the faulty consumer to the
provider without any problem (see the spot-check sketch after this list).
2. Syncrepl / slapo-chain misconfiguration
I might have a configuration problem, but no matter how much I review it, I
don't see what's wrong. I've provided the syncrepl / slapo-chain conf in
the config.ldif attachment.
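For reference, the spot-check I run from the faulty consumer during a hang
looks roughly like this (the host name is a placeholder; nettimeout keeps
the client from blocking forever):
```
# base-scope read of the provider's root DSE with a 5-second network timeout
ldapsearch -x -H ldap://provider1.example.com -o nettimeout=5 -b '' -s base namingContexts
```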
I've also provided several other attachments that I thought would be
helpful:
- log.txt => The slapd logs at trace level, showing the last bind before
the service hangs.
- ldaps_request.log => An ldapsearch performed against the faulty consumer
during hang.
- starttls_request.log => Same as above, but using StartTLS; the
connection seems to succeed and then stops at the server-hello TLS exchange.
Versions :
- slapd: 2.5.18 LTS
- Server: Ubuntu 22.04 LTS
I am a bit out of ideas on what else to try to fix this issue, hence this
email.
I thank you for your time and I am very grateful for any help you could
give me.
Regards,
Pierre-Jean
On 24/08/2009 14:16, Jonathan Clarke wrote:
> On 20/08/2009 14:39, Brian Neu wrote:
>> Forgive me if pasting here is bad etiquette.
>>
>>
>> <consumer slapd.conf>
>>
>> include /etc/openldap/schema/corba.schema
>> include /etc/openldap/schema/core.schema
>> include /etc/openldap/schema/cosine.schema
>> include /etc/openldap/schema/duaconf.schema
>> include /etc/openldap/schema/dyngroup.schema
>> include /etc/openldap/schema/inetorgperson.schema
>> include /etc/openldap/schema/java.schema
>> include /etc/openldap/schema/misc.schema
>> include /etc/openldap/schema/nis.schema
>> include /etc/openldap/schema/openldap.schema
>> include /etc/openldap/schema/ppolicy.schema
>> include /etc/openldap/schema/collective.schema
>> include /etc/openldap/schema/samba.schema
>>
>> allow bind_v2
>>
>> pidfile /var/run/openldap/slapd.pid
>> argsfile /var/run/openldap/slapd.args
>>
>> TLSCACertificateFile /etc/openldap/cacerts/cavictory2.crt
>> TLSCertificateFile /etc/openldap/keys/victory3cert.pem
>> TLSCertificateKeyFile /etc/openldap/keys/victory3key.pem
>>
>> database hdb
>> suffix "dc=srg,dc=com"
>> checkpoint 1024 15
>> rootdn "cn=Manager,dc=srg,dc=com"
>>
>> rootpw {MD5}blah
>>
>> directory /var/lib/ldap
>>
>> index objectClass eq,pres
>> index ou,cn,mail,surname,givenname eq,pres,sub
>> index uidNumber,gidNumber,loginShell eq,pres
>> index uid,memberUid eq,pres,sub
>> index nisMapName,nisMapEntry eq,pres,sub
>>
>> syncrepl rid=0
>> provider=ldap://victory2.srg.com:389
>> bindmethod=simple
>> starttls=critical
>> binddn="cn=replicator,dc=srg,dc=com"
>> credentials=blah
>> searchbase="dc=srg,dc=com"
>> logbase="cn=accesslog"
>> schemachecking=on
>> type=refreshAndPersist
>> retry="60 +"
>> syncdata=accesslog
>
> I don't see anything wrong with this - although I'm not very familiar
> with accesslog configuration.
>
> Does the "cn=replicator,dc=srg,dc=com" have full access on the provider
> to read necessary data?
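> Something along these lines on the provider would grant that (a sketch,
> assuming your layout; adjust the DN to taste):
>
> access to *
>     by dn.exact="cn=replicator,dc=srg,dc=com" read
>     by * break
>
> limits dn.exact="cn=replicator,dc=srg,dc=com" time=unlimited size=unlimited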
Please ignore this post - I hadn't seen that the discussion had already
continued. My mailer displayed it as a separate thread, which got me
confused on a Monday morning :/
I have the following attributes set in my ldap backend for the chain overlay.
olcDbURI: "ldaps://ds2-q.global.aep.com:636"
olcDbStartTLS: ldaps starttls=no tls_cacert="/appl/openldap/etc/openldap/tls/cacerts.cer" tls_reqcert=demand tls_crlcheck=none
The referenced file is the exact same file I use in this global attribute
olcTLSCACertificateFile: /appl/openldap/etc/openldap/tls/cacerts.cer
This is a 2.4.44 replication consumer using the following replication attribute
olcSyncrepl: {1}rid=112 provider=ldaps://ds2-q.global.aep.com:636 binddn="cn=syncuser,ou=Automatons,ou=Users,dc=Global,dc=aep,dc=com" bindmethod=simple credentials=<redacted> searchbase="dc=Global,dc=aep,dc=com" type=refreshAndPersist retry="5 5 300 +" timeout=1
Replication works perfectly and changes to the referenced master are replicated to this slave. I can see successful connections for rid=112 to this master in the log. The problem is that when the chain overlay tries to follow referrals to this same master, I get the following error:
595fbb1c conn=1000 op=1 ldap_chain_op: ref="ldaps://ds2-q.global.aep.com:636/uid=s012235,ou=Employees,ou=Users,dc=Global,dc=aep,dc=com" -> "ldaps://ds2-q.global.aep.com:636"
595fbb1c conn=1000 op=1 ldap_chain_op: ref="ldaps://ds2-q.global.aep.com:636/uid=s012235,ou=Employees,ou=Users,dc=Global,dc=aep,dc=com": URI="ldaps://ds2-q.global.aep.com:636" found in cache
ldap_create
ldap_url_parse_ext(ldaps://ds2-q.global.aep.com:636)
595fbb1c =>ldap_back_getconn: conn=1000 op=1: lc=0x10180430 inserted refcnt=1 rc=0
ldap_sasl_bind
ldap_send_initial_request
ldap_new_connection 1 1 0
ldap_int_open_connection
ldap_connect_to_host: TCP ds2-q.global.aep.com:636
ldap_new_socket: 16
ldap_prepare_socket: 16
ldap_connect_to_host: Trying 10.92.127.52:636
ldap_pvt_connect: fd: 16 tm: -1 async: 0
attempting to connect:
connect success
TLS trace: SSL_connect:before/connect initialization
TLS trace: SSL_connect:SSLv2/v3 write client hello A
TLS trace: SSL_connect:SSLv3 read server hello A
TLS certificate verification: depth: 1, err: 19, subject: /C=US/ST=Ohio/L=Columbus/O=American Electric Power/OU=Complex - Middleware/CN=AEP Root CA (2014)/emailAddress=middleware(a)aep.com, issuer: /C=US/ST=Ohio/L=Columbus/O=American Electric Power/OU=Complex - Middleware/CN=AEP Root CA (2014)/emailAddress=middleware(a)aep.com
TLS certificate verification: Error, self signed certificate in certificate chain
TLS trace: SSL3 alert write:fatal:unknown CA
TLS trace: SSL_connect:error in error
TLS trace: SSL_connect:error in error
TLS: can't connect: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed (self signed certificate in certificate chain).
595fbb1c send_ldap_result: conn=1000 op=1 p=3
595fbb1c send_ldap_result: conn=1000 op=1 p=3
595fbb1c send_ldap_response: msgid=2 tag=103 err=52
ber_flush2: 14 bytes to sd 15
595fbb1c conn=1000 op=1 RESULT tag=103 err=52 text=
So, is there something wrong with the value of the olcDbStartTLS attribute that I'm not seeing?
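For what it's worth, my next step is to sanity-check the trust chain
out-of-band against the same CA bundle; a quick diagnostic, independent of
the slapd config:
```
# verify the server's certificate chain against the CA file slapd uses
openssl s_client -connect ds2-q.global.aep.com:636 \
  -CAfile /appl/openldap/etc/openldap/tls/cacerts.cer -verify_return_error
```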
-Jon C. Kidder
American Electric Power
Complex - Middleware Engineering
Hi,
I'm trying to set up a proxy-cache to a couple of OpenLDAP servers configured in mirror mode.
The back-ldap part is working fine and I can query the underlying slapd instances through the proxy.
However, I'm still struggling with the caching bit.
I couldn't find any documentation or posts about setting this up using the new cn=config way of configuring slapd. So after some reading and a bit of guessing, I came up with the following config:
# {1}ldap, config
dn: olcDatabase={1}ldap,cn=config
objectClass: olcDatabaseConfig
objectClass: olcLDAPConfig
olcDatabase: {1}ldap
olcSuffix: dc=sol1,dc=net
olcAccess: {0}to dn.base="" by * read
olcAccess: {1}to dn.base="cn=Subschema" by * read
olcAccess: {2}to * by self write by users read by anonymous auth
olcRootDN: uid=ldapadmin,dc=sol1,dc=net
olcRootPW: secret
olcDbURI: "ldap://192.168.200.12 ldap://192.168.200.14"
olcDbACLBind: bindmethod=simple binddn="uid=ldapadmin,dc=sol1,dc=net" credentials="secret" starttls=no
# {0}pcache, {1}ldap, config
dn: olcOverlay={0}pcache,olcDatabase={1}ldap,cn=config
objectClass: olcOverlayConfig
objectClass: olcPcacheConfig
olcOverlay: {0}pcache
olcPcache: bdb 10000 3 1000 100
olcPcacheAttrset: 0 uid userPassword uidNumber gidNumber cn homeDirectory loginShell gecos description objectClass
olcPcacheAttrset: 1 sudoCommand sudoHost
olcPcacheAttrset: 2 gidNumber
olcPcacheTemplate: (&(objectClass=)(uid=)) 0 300
olcPcacheTemplate: (sudoUser=) 1 300
olcPcacheTemplate: (&(objectClass=)(memberUid=)) 2 300
# {2}bdb, config
dn: olcDatabase={2}bdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcBdbConfig
olcDatabase: {2}bdb
olcDbDirectory: /var/lib/ldap/cache
olcSuffix: cn=proxy
olcRootDN: uid=ldapadmin,dc=sol1,dc=net
olcDbCacheSize: 5000
olcDbConfig: {0}set_cachesize 0 2097152 0
olcDbConfig: {1}set_lk_max_objects 1500
olcDbConfig: {2}set_lk_max_locks 1500
olcDbConfig: {3}set_lk_max_lockers 1500
olcDbIndex: objectClass eq
olcDbIndex: uid eq
olcDbIndex: cn eq
olcDbIndex: uidNumber eq
olcDbIndex: gidNumber eq
olcDbIndex: memberUid eq
olcDbIndex: uniqueMember eq
olcDbIndex: mail eq
olcDbIndex: surname eq
olcDbIndex: givenname eq
olcDbIndex: sambaSID eq
olcDbIndex: sambaPrimaryGroupSID eq
olcDbIndex: sambaDomainName eq
olcDbIndex: sudoUser eq
But running ldapsearch keeps returning:
# search result
search: 2
result: 52 Server is unavailable
text: pcachePrivDB: cacheDB not available
I didn't find any way to specify which database the overlay should use, apart from the 'bdb' part of olcPcache, but that seems to be interpreted as the database type, not its name (I tried replacing it with cn=proxy, but that throws an error).
Looking at the pcache overlay source (I'm running 2.4.21 from Ubuntu Lucid and also checked the latest 2.4.23 stable source), I can see this bit:
{ "pcache-", "private database args",
1, 0, STRLENOF("pcache-"), ARG_MAGIC|PC_PRIVATE_DB, pc_cf_gen,
NULL, NULL, NULL },
That seems to be for the private DB options, but the other equivalent "pcacheXXXX" entries in this file have a corresponding attribute declaration for the schema instead of 'NULL, NULL, NULL'.
Anyway, I'm obviously missing something :)
If someone who's got this working or a developer could point me in the right direction, that would be greatly appreciated!
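In case it helps anyone answering: my current guess from the source is that
the private cache database wants to be declared as a child entry of the
overlay entry, along these lines. The objectClass combination below is only
my reading of the code, not something I've confirmed works:
```
dn: olcDatabase={0}bdb,olcOverlay={0}pcache,olcDatabase={1}ldap,cn=config
objectClass: olcBdbConfig
objectClass: olcPcacheDatabase
olcDatabase: {0}bdb
olcDbDirectory: /var/lib/ldap/cache
olcDbCacheSize: 5000
```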
Thanks,
Nico
Hi
Please ignore my previous mail.
1. There are too many errors like above.
Ans: We have about 48 files of 500 MB each, and on average each file
produces 15-20 such errors.
2. increase cachesize to at least 4GB, that is
set_cachesize 4 0 1
Ans: I set this in the DB_CONFIG file as
set_cachesize 0 4294967295 0
and db_stat shows it as 3GB 1023M 1023K.
Please let me know if this is wrong, and how and where I should modify it
(see the set_cachesize note below).
3. checkpoint 128 15
I would set checkpointing to 0 15
Ans: Is that only for the data import, or also after the import?
4. Please note that overlay declarations follow all database declarations;
modify slapd.conf accordingly.
Ans: We use a static config file for all configuration. I don't see why the
placement of the overlay declarations matters; can I put them right after
the schema declarations?
Thanks a lot, this gives me much better insight.
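For my own reference while testing point 2: the BDB syntax is
set_cachesize <gbytes> <bytes> <ncache>, so 0 4294967295 0 requests
2^32 - 1 bytes (just under 4 GiB), which db_stat displays as
3GB 1023M 1023K. If I understand the suggestion correctly, DB_CONFIG should
instead read:
```
# DB_CONFIG: set_cachesize <gbytes> <bytes> <ncache>
# a full 4 GiB in a single cache region, as suggested
set_cachesize 4 0 1
```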
On Wed, Dec 26, 2012 at 9:58 AM, anil beniwal <beni.anil(a)gmail.com> wrote:
> Hi
>
> I didn't get it.
>
> Can you please elaborate it a bit. It would be great help for me.
>
>
>
>
> On Wed, Dec 26, 2012 at 2:37 AM, Dieter Klünter <dieter(a)dkluenter.de>wrote:
>
>> Am Tue, 25 Dec 2012 21:27:39 +0530
>> schrieb anil beniwal <beni.anil(a)gmail.com>:
>>
>> > Hi
>> >
>> > We are having 4 million users to migrate, all data exported from
>> > oracle to multiple ldif files.
>> > Imported 1 million till now, took almost 28 hours. and openldap-data
>> > dir of about 28G.
>> > openldap version 2.4.33 bdb version 5.1.29 RHEL 6.3 RAM 8G 4 cpu ,
>> > system is a VM.
>> >
>> > Currently running slapadd output
>> > + /apps/openldap/sbin/slapadd -q -c -w -f
>> > /apps/openldap/etc/openldap/slapd.conf -l /root/User9.ldif
>> > bdb_monitor_db_open: monitoring disabled; configure monitor database
>> > to enable
>> > . 2.27% eta 21h31m elapsed 29m57s
>> > spd 1.6 k/s str2entry: invalid value for attributeType
>> > postalAddress #0 (syntax 1.3.6.1.4.1.1466.115.121.1.41)
>> > slapadd: could not parse entry (line=394416)
>> > * 2.81% eta 19h59m elapsed 34m40s spd
>> > 10.1 k/s
>>
>> 1. There are too many errors like above.
>>
>> > It seems it will take weeks to import the whole data.
>>
>> It takes about 2 - 4 hours to slapadd 4 million entries, depending
>> on the file system and disk type.
>> >
>> > Is there any tool or other approach we can use to make this faster,
>> > or are we going with the wrong configuration?
>> > Or do we have to switch to ODS or RHDS?
>>
>> There is no necessity for other tools, just modify the ldif file.
>>
>> [...]
>> > DB_CONFIG
>> >
>> > set_cachesize 0 4294967295 0
>>
>> increase cachesize to at least 4GB that is
>> set_cachesize 4 0 1
>>
>> [...]
>> > checkpoint 128 15
>>
>> I would set checkpointing to 0 15
>> [...]
>>
>> > concurrency 100
>> > index entryCSN eq
>> > index entryUUID eq
>> > index
>> >
>> mail,uid,postalCode,smail,channelType,channelValue,answer,behavName,objectclass,tokenID,type
>> > eq
>> > index givenName,sn,city,question,behavValue,cn,extName sub
>> > index displayName approx
>> > # Replication Configuration
>> > overlay syncprov
>> > syncprov-checkpoint 100 10
>> > syncprov-sessionlog 100
>> >
>> > serverid 1
>> >
>> > syncrepl rid=111
>> > provider=ldap://s01.com
>> > binddn="cn=Manager,dc=example,dc=com"
>> > bindmethod=simple
>> > starttls=yes
>> > tls_reqcert=allow
>> > schemachecking=off
>> > credentials=G00gle#
>> > searchbase="dc=example,dc=com"
>> > type=refreshAndPersist
>> > retry="5 5 300 +"
>> > interval=00:00:00:10
>> >
>> > syncrepl rid=222
>> > provider=ldap://m04.com
>> > binddn="cn=Manager,dc=example,dc=com"
>> > bindmethod=simple
>> > starttls=yes
>> > tls_reqcert=allow
>> > schemachecking=off
>> > credentials=G00gle#
>> > searchbase="dc=example,dc=com"
>> > type=refreshAndPersist
>> > retry="5 5 300 +"
>> > interval=00:00:00:10
>> >
>> > ######
>> >
>> > mirrormode TRUE
>> >
>> > directory /apps/openldap/var/openldap-data
>> >
>> > overlay unique
>> > unique_attributes mail
>> >
>> > overlay ppolicy
>> > ppolicy_default "cn=default,ou=pwdPolicy,dc=example,dc=com"
>> > ppolicy_use_lockout
>>
>> Please note that overlay declarations must follow all database
>> declarations; modify slapd.conf accordingly.
>>
>> -Dieter
>>
>> --
>> Dieter Klünter | Systemberatung
>> http://dkluenter.de
>> GPG Key ID:DA147B05
>> 53°37'09,95"N
>> 10°08'02,42"E
>>
>>
>
>
> --
>
> Thanks&Regards
> Anil Beniwal
> +919891695048
>
>
--
Thanks&Regards
Anil Beniwal
+919891695048
Hi everyone,
I have noticed a problem with the new writetimeout feature in
OpenLDAP 2.4.17. I have a setup with one master and four replicas
running on Solaris 10, using the hdb backend with BDB 4.4 from the
opencsw repository. Clients are Sun, Linux and Mac boxes in every
possible variation.
I updated my servers from version 2.4.15 last Friday, and over the
weekend some cron jobs accessing the directory sporadically failed.
I monitored the server logs (log level 'stats') while running the
involved scripts repeatedly, and after failed runs I got something like
the following on all servers (IPs and DNs altered):
> Jul 20 11:42:43 ldapserver slapd[9053]: [ID 848112 local4.debug] conn=12479 fd=79 ACCEPT from IP=192.168.1.1:50210 (IP=0.0.0.0:389)
> Jul 20 11:42:43 ldapserver slapd[9053]: [ID 270379 local4.debug] conn=12479 op=0 EXT oid=1.3.6.1.4.1.1466.20037
> Jul 20 11:42:43 ldapserver slapd[9053]: [ID 560212 local4.debug] conn=12479 op=0 STARTTLS
> Jul 20 11:42:43 ldapserver slapd[9053]: [ID 875301 local4.debug] conn=12479 op=0 RESULT oid= err=0 text=
> Jul 20 11:42:43 ldapserver slapd[9053]: [ID 105384 local4.debug] conn=12479 fd=79 TLS established tls_ssf=256 ssf=256
> Jul 20 11:42:43 ldapserver slapd[9053]: [ID 215403 local4.debug] conn=12479 op=1 BIND dn="uid=dummyuser,ou=System,dc=example,dc=com" method=128
> Jul 20 11:42:43 ldapserver slapd[9053]: [ID 600343 local4.debug] conn=12479 op=1 BIND dn="uid=dummyuser,ou=System,dc=example,dc=com" mech=SIMPLE ssf=0
> Jul 20 11:42:43 ldapserver slapd[9053]: [ID 588225 local4.debug] conn=12479 op=1 RESULT tag=97 err=0 text=
> Jul 20 11:42:43 ldapserver slapd[9053]: [ID 469902 local4.debug] conn=12479 op=2 SRCH base="ou=people,dc=example,dc=com" scope=2 deref=2 filter="(objectClass=posixAccount)"
> Jul 20 11:42:43 ldapserver slapd[9053]: [ID 744844 local4.debug] conn=12479 op=2 SRCH attr=uid userpassword uidnumber gidnumber gecos homedirectory loginshell
> Jul 20 11:43:00 ldapserver slapd[9053]: [ID 485650 local4.debug] conn=12479 fd=79 closed (writetimeout)
I don't have the writetimeout keyword configured on any of the boxes, and
the affected script doesn't do any writes anyway. Also, the problem only
seems to arise when the client takes a while to process a search result.
The failing scripts are Net::LDAP-based Perl scripts running on some old
SPARC boxes, so they take up to half a minute or more to complete.
Setting writetimeout to a high enough value seems to solve the problem,
but according to the docs, this shouldn't happen with the keyword unset or
set to 0.
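For now I've worked around it in slapd.conf like this (120 is just an
arbitrary value comfortably above the slowest client's processing time):
```
# seconds slapd waits on a blocked write before closing the connection;
# per the docs, 0 (or leaving it unset) should mean no timeout at all
writetimeout 120
```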
Is this a bug, or did I miss something? Has anyone else encountered this?
Regards,
Christian Manal
Daniel Pocock wrote:
> On 03/03/12 12:46, Michael Ströder wrote:
>> Using DNS SRV is simply not specified with regard to SSL/TLS. There's
>> no way to map a naming context to a server cert, even if your local
>> security policy says your DNS is trusted by some other means.
>>
>>> A hostname can be a reference identity
>>>
>>> But a reference identity is not always a hostname. It depends on the
>>> client configuration.
>>
>> The naming context (aka search root) cannot be a reference identity in
>> the context of SSL/TLS. Period.
>
> The RFC does not say that.
It does not have to say that.
> Neither does it state that an implementation
> should support the concept. It appears to leave the implementor some
> discretion about their choice of reference identity.
Yes.
>> Feel free to write an Internet draft updating TLS-RFCs and RFC 2782.
>> This could specify how a naming context is compared to some information
>> in a subjectAltName extension since there's no standard saying something
>> about this yet. A suitable GeneralName choice would be directoryName.
>
> It is already in RFC 6125:
>
> http://tools.ietf.org/html/rfc6125#section-6.2
I can't see any language which clearly defines rules for deriving the
reference identity from an LDAP naming context. Just pointing out that it
is not a problem for SIP is not sufficient.
> Maybe a mini-RFC about best practice for RFC 6125 in the LDAP world would
> be useful though
You would have to clearly define what a client has to do to derive and check
the reference identity if its configuration simply contains
ldaps:///dc=example,dc=com
Frankly I don't like RFC 6125. IIRC I gave up reviewing the I-Ds when the
authors insisted on adding support for wildcard certs.
>> The right discussion forum could be the ldap-ext and ietf-tls mailing
>> lists.
>
> My question is really about how OpenLDAP client code supports this (or
> is anyone working on such things already)
If there are no clear rules yet, OpenLDAP IMO cannot support it.
Basically we both disagree on what we consider to be sufficiently specified
and secure:
If I understood you correctly, you propose that if a server's cert contains
subjectAltName:dNSName:example.com, this cert should be accepted for any
application protocol having something like dc=example,dc=com, @example.com,
etc.
I don't like that approach. It broadens the semantics in such a way that
you cannot have distinct server certs for different services. Mainly I
think that the untyped GeneralName::dNSName was defined before we had
several types of SRV RRs (MX RRs aside), and no one ever clearly specified
its semantics.
It's the same problem as looking up the MX RR for a domain and then
connecting to the MX servers with StartTLS.
For a naming context ldaps:///dc=example,dc=com one could specify that the
subjectAltName MUST contain dNSName:_ldaps._tcp.example.com, to at least
express the connection to the SRV RR for the LDAP service. Or other
approaches with other types of GeneralName.
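Just to make that concrete, expressed as an openssl x509v3 extension stanza
(purely illustrative; no standard specifies this today):
```
# hypothetical SAN carrying the SRV label for the LDAP service
[ srv_ext ]
subjectAltName = DNS:_ldaps._tcp.example.com
```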
Ciao, Michael.