b2b_entities: fix crash calling b2b_init_request from async resume route#3822
b2b_entities: fix crash calling b2b_init_request from async resume route#3822NormB wants to merge 1 commit into
Conversation
|
Hi @NormB , thanks for the detailed report here. Even if I fully agree on how the crash happens, the fix is not 100% correct - it avoids the crash, indeed, but it misbehave, in the sense of not pushing (into the UAC side of the b2b) any lump/change that was done prior to the async jump. |
|
@bogdan-iancu I will be able to test and validate. |
|
This is still a valid bug, unfortunately the fix I was hoping for is far from simple....still WIP |
|
@NormB , please re-open it |
…penSIPSGH-3796) The previous proposed fix for the b2b_init_request()-from-async-resume crash simply skipped b2b_apply_lumps() for FL_TM_FAKE_REQ messages. That avoids the crash but, as noted in review, silently discards any lump/change applied before the async jump (record_route, append_hf, SDP edits, ...), so they never reach the request b2b builds its entities from. A faked request is a hybrid: fix_fake_req_headers()/clone_headers() give it a single pkg block of header nodes, but the nodes' parsed sub-structures and the message buffer still live in the SHM transaction clone. The generic in-place path in b2b_apply_lumps() (free_sip_msg() then reparse into the same buffer) therefore (a) pkg_free()s SHM parsed structures -> the OpenSIPSGH-3796 abort, and (b) would overwrite the SHM clone buffer. Instead, for FL_TM_FAKE_REQ, flatten the lumps into a private pkg buffer, parse it into a fresh, fully pkg-owned sip_msg, and graft that self-contained state onto the faked request, carrying over the pkg fields fake_req() set up. The new FL_TM_FAKE_REQ_REBUILT flag tells tm's free_faked_req() to release the message as a standalone pkg sip_msg rather than via the clone-relative teardown (which assumes an SHM buffer and a single-block pkg header list). Reproduced and verified on aarch64 with a DBG_MALLOC build: the worker no longer aborts in free_via_param_list()/qm_free_dbg(), and the pre-async record_route and append_hf lumps are present on the rebuilt request. Refs: OpenSIPS#3796
f555c85 to
e0f1769
Compare
|
reviewing the latest commit, looking interesting.... |
Summary
Fixes a crash (
fm_free: freeing dangling pkg pointer/qm_free_dbg: bad pointer (out of memory block\!) - aborting) whenb2b_init_request()is called from an async resume route (e.g. afterrest_post()). The worker process aborts with signal 6 (SIGABRT) and a core dump is generated, taking down the entire OpenSIPS instance.Details
When an initial INVITE is processed with an async operation (e.g.
async(rest_post(...), resume_route)), the TM module:t_newtran(), including all parsed headers and any lumps added before the async call (e.g. fromrecord_route(),fix_nated_contact(),append_hf(), etc.)t_resume_async_request()intm/async.ccallsfake_req()(tm/t_msgbuilder.h:194) to build a faked SIP message for the resume routefake_req()doesmemcpy(faked_req, shm_msg, sizeof(struct sip_msg))at line 202 — this copies the entiresip_msgstruct, meaning the faked request'smsg->headers,msg->add_rm, andmsg->body_lumpspointers all point into SHM memory (they are the same SHM pointers from the transaction's stored request)fake_req()then PKG-duplicates only specific fields (new_uri,dst_uri,path_vec,set_global_address,set_global_port) and setsfaked_req->msg_flags |= FL_TM_FAKE_REQat line 211When the resume route calls
b2b_init_request(), the script-level handlerb2bl_script_init_request()(b2b_logic/logic.c:3228) callsb2b_api.apply_lumps(msg)directly at line 3245 — before any B2B entities are created. This dispatches tob2b_apply_lumps()inb2b_entities/dlg.c, where the following crash sequence occurs:b2b_apply_lumps()checksif(\!msg->body_lumps && \!msg->add_rm) return 0;— if any module added lumps before the async call, these SHM lump pointers are non-NULL, so execution continuesbuild_req_buf_from_sip_req()processes the lumps to build a new bufferfree_sip_msg(msg)is called to free the old message structurefree_sip_msg()(parser/msg_parser.c:965), the call tofree_hdr_field_lst(msg->headers)at line 980 walks the parsed header linked list and callspkg_free()on each header nodepkg_free()on SHM pointers triggers the memory allocator's safety check and aborts the processTM's own
free_faked_req()(tm/t_msgbuilder.h:345) is designed to handle this correctly — it carefully frees only the PKG-duplicated fields and never callsfree_hdr_field_lst()on the headers. The b2b_entities module'sb2b_apply_lumps()was missing this awareness.The crash is deterministic: it occurs on every call to
b2b_init_request()from an async resume route, provided any module added lumps to the request before the async call. Common lump-producing functions includerecord_route(),fix_nated_contact(),append_hf(),remove_hf(), and many others. In production configurations, lumps are almost always present.Note that
b2b_apply_lumps()is called from 6 sites — not onlyb2bl_script_init_request()(logic.c:3245) but alsob2bl_script_bridge()(bridging.c:2290) and 4 internal call sites withindlg.citself (lines 1103, 3306, 3373, 3755). The fix in the function body protects all of them.Affected versions: The bug exists in all versions where
b2b_apply_lumps()exists alongside TM async support. Confirmed on 3.6.2 (reporter) and 4.0.0-dev (our testing).Solution
Add a guard at the top of
b2b_apply_lumps()that checks theFL_TM_FAKE_REQflag and returns early:This mirrors TM's own approach —
FL_TM_FAKE_REQ(1<<18, defined inparser/msg_parser.h:121) is already set byfake_req()attm/t_msgbuilder.h:211precisely to mark faked requests. TM'sfree_faked_req()uses this same awareness to avoid freeing SHM headers.Skipping lump application is safe for faked requests because:
b2b_apply_lumps()returning 0 simply means "no modifications were made", which all 6 callers already handleReproduction
Tested on OpenSIPS 4.0.0-dev (x86_64/linux,
DBG_MALLOCenabled) with this minimal config:A simple Python HTTP server on port 8899 with a 100ms delay ensures the async machinery activates. A SIP INVITE sent via
nc -utriggers the flow.record_route)CRITICAL: qm_free_dbg: bad pointer (out of memory block\!) - aborting, worker killed by signal 6, core dump generated, entire OpenSIPS instance terminatesrecord_route)b2b_init_requestcompletes successfully, B2B session established, process stablerecord_route)b2b_apply_lumpsreturns early at the\!msg->body_lumps && \!msg->add_rmcheck, confirming the lump-dependent triggerCompatibility
No backward compatibility issues. The fix only adds an early-return guard for a specific message type (TM faked requests) that was previously crashing. No behavior change for normal (non-faked) SIP messages. The
FL_TM_FAKE_REQflag is defined in core headers (parser/msg_parser.h) and is available to all modules.Closing issues
Closes #3796