We sometimes see an invalid closesocket() call inside the LDAP client library. (Why do we care? It signals a bug that can disrupt an innocent thread: if that thread happens to open a socket that gets the same descriptor number, the buggy thread's second close will close it out from under it.) We tracked it down to the same socket being closed twice in rapid succession.
Do you know how it could happen that sb_iod and sb_iod->sbiod_next share the same sbiod_sb?
Below: the call stack, the data of the LDAP* conn, the relevant code, a description of the data and code, and a workaround.
Call stack:

	my_app_1!closesocket
	my_app_1!sb_stream_close+0x21
	my_app_1!ber_int_sb_close+0x1e
	my_app_1!ldap_free_connection+0x28d
	my_app_1!ldap_ld_free+0x51
Data (WinDbg dt dump of the LDAP* conn):

	+0x000 conn : 0x0523fa80 ldap
	   +0x000 ld_sb : 0x0526b3b8 sockbuf
	      +0x000 sb_opts : lber_options
	         +0x000 lbo_valid            : 0n3
	         +0x002 lbo_options          : 0
	         +0x004 lbo_debug            : 0n0
	      +0x008 sb_iod : 0x05259dd8 sockbuf_io_desc
	         +0x000 sbiod_level          : 0n10
	         +0x004 sbiod_sb             : 0x0526b3b8 sockbuf
	            +0x000 sb_opts : lber_options
	               +0x000 lbo_valid      : 0n3
	               +0x002 lbo_options    : 0
	               +0x004 lbo_debug      : 0n0
	            +0x008 sb_iod : 0x05259dd8 sockbuf_io_desc
	               +0x000 sbiod_level    : 0n10
	               +0x004 sbiod_sb       : 0x0526b3b8 sockbuf
	               +0x008 sbiod_io       : 0x0091ef88 sockbuf_io
	               +0x00c sbiod_pvt      : (null)
	               +0x010 sbiod_next     : 0x0520c380 sockbuf_io_desc
	            +0x00c sb_fd             : 0x1534
	            +0x010 sb_max_incoming   : 0
	            +0x014 sb_trans_needs_read  : 0y0
	            +0x014 sb_trans_needs_write : 0y0
	         +0x008 sbiod_io : 0x0091ef88 sockbuf_io
	            +0x000 sbi_setup  : 0x006ace40 int  my_app_1!sb_fd_setup+0
	            +0x004 sbi_remove : (null)
	            +0x008 sbi_ctrl   : 0x0052cdd0 int  my_app_1!mbufFlush+0
	            +0x00c sbi_read   : 0x006acdd0 long my_app_1!sb_stream_read+0
	            +0x010 sbi_write  : 0x006acdf0 long my_app_1!sb_stream_write+0
	            +0x014 sbi_close  : 0x006ace10 int  my_app_1!sb_stream_close+0
	         +0x00c sbiod_pvt            : (null)
	         +0x010 sbiod_next           : 0x0520c380 sockbuf_io_desc
	            +0x000 sbiod_level       : 0n10
	            +0x004 sbiod_sb          : 0x0526b3b8 sockbuf
	               +0x000 sb_opts : lber_options
	               +0x008 sb_iod  : 0x05259dd8 sockbuf_io_desc
	               +0x00c sb_fd   : 0x1534
	               +0x010 sb_max_incoming      : 0
	               +0x014 sb_trans_needs_read  : 0y0
	               +0x014 sb_trans_needs_write : 0y0
	            +0x008 sbiod_io : 0x0091ef88 sockbuf_io
	               +0x000 sbi_setup  : 0x006ace40 int  my_app_1!sb_fd_setup+0
	               +0x004 sbi_remove : (null)
	               +0x008 sbi_ctrl   : 0x0052cdd0 int  my_app_1!mbufFlush+0
	               +0x00c sbi_read   : 0x006acdd0 long my_app_1!sb_stream_read+0
	               +0x010 sbi_write  : 0x006acdf0 long my_app_1!sb_stream_write+0
	               +0x014 sbi_close  : 0x006ace10 int  my_app_1!sb_stream_close+0
	            +0x00c sbiod_pvt         : (null)
	            +0x010 sbiod_next        : (null)
	      +0x00c sb_fd                   : 0x1534
	      +0x010 sb_max_incoming         : 0
	      +0x014 sb_trans_needs_read     : 0y0
	      +0x014 sb_trans_needs_write    : 0y0
	(etc.)
ber_int_sb_close():

	p = sb->sb_iod;
	while ( p ) {
		if ( p->sbiod_io->sbi_close && p->sbiod_io->sbi_close( p ) < 0 ) {
			return -1;
		}
		p = p->sbiod_next;
	}
sb_stream_close( Sockbuf_IO_Desc *sbiod ):

	tcp_close( sbiod->sbiod_sb->sb_fd );
So socket number 0x1534 was closed twice in rapid succession because the chain contains two linked-list nodes with the same sb_fd. More precisely, both sbiod_sb pointers reference the sockbuf at 0x0526b3b8, which is their "parent", a.k.a. conn->ld_sb; i.e. conn->ld_sb->sb_iod->sbiod_sb == conn->ld_sb, and conn->ld_sb->sb_iod->sbiod_next->sbiod_sb == conn->ld_sb as well. The question is how and when that sbiod_next node was added.
A dirty workaround would be to detect this special case in ber_int_sb_close(), but I'd strongly prefer to fix the underlying problem:

	  p = sb->sb_iod;
	  while ( p ) {
	  	if ( p->sbiod_io->sbi_close && p->sbiod_io->sbi_close( p ) < 0 ) {
	  		return -1;
	  	}
	+ 	// workaround for double-closesocket:
	+ 	while ( p->sbiod_next                                 // there's another sbiod
	+ 		&& ( p->sbiod_next->sbiod_sb == p->sbiod_sb )     // points to the same socket
	+ 		&& ( p->sbiod_next->sbiod_io == p->sbiod_io )     // same functions used
	+ 		&& ( p->sbiod_io->sbi_close == sb_stream_close )  // tcp
	+ 		) {
	+ 		p = p->sbiod_next;  // skip, don't close it again
	+ 	}
	+ 	// end of workaround
	  	p = p->sbiod_next;
	  }