fix: case-insensitive channel matching in activityRelations enrich pipes (IN-1185) #4239

Open
joanagmaia wants to merge 2 commits into linuxfoundation:main from joanagmaia:fix/repositories-mapping-in-activityrelations-buckets

@joanagmaia commented Jun 19, 2026

Problem

Activities from GitHub have their channel field set directly from the GitHub API response, which preserves the original repository owner casing (e.g. https://github.com/NousResearch/hermes-agent).

The repositories Tinybird datasource, however, stores URLs in lowercase (e.g. https://github.com/nousresearch/hermes-agent). This means that when the activityRelations_bucket_clean_enrich_copy_pipe_* pipes filter activities using:

AND (channel, segmentId) IN (SELECT channel, segmentId FROM repos_to_channels)

the tuple comparison is case-sensitive in ClickHouse, so activities whose channel URL has a different casing than the stored repository URL are silently dropped from activityRelations_deduplicated_cleaned_bucket_*_ds.

Concretely: querying activityRelations for a given segment and type returns rows, but querying activityRelations_deduplicated_cleaned_bucket_union with the same filters returns 0 rows, because the channel casing mismatch causes the filter to exclude every affected activity.
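The drop can be reproduced outside ClickHouse with a small Python model of the filter. This is only a sketch: a set of tuples stands in for the repos_to_channels subquery, and `SEGMENT_ID` and `passes_filter` are hypothetical names invented for the illustration, not part of the pipes.

```python
# Toy model of the filter: a set of (channel, segmentId) tuples stands in for
# the repos_to_channels subquery. SEGMENT_ID is a made-up value.
SEGMENT_ID = "segment-1"

# Lowercase URL, as stored in the repositories datasource.
repos_to_channels = {("https://github.com/nousresearch/hermes-agent", SEGMENT_ID)}

# Activity channel with the owner casing returned by the GitHub API.
activity = ("https://github.com/NousResearch/hermes-agent", SEGMENT_ID)


def passes_filter(row, allowed):
    """Exact tuple membership: each element must match character for character."""
    return row in allowed


print(passes_filter(activity, repos_to_channels))  # prints False
```

The activity belongs to a known repository and segment, yet the exact-match comparison rejects it purely because of the owner casing.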

Fix

Apply lower() to both sides of the channel comparison in all 10 activityRelations_bucket_clean_enrich_copy_pipe_*.pipe files:

-- before
AND (channel, segmentId) IN (SELECT channel, segmentId FROM repos_to_channels)

-- after
AND (lower(channel), segmentId) IN (SELECT lower(channel), segmentId FROM repos_to_channels)

This is safe for all platforms (GitHub, GitLab, Gerrit) since lower() is a no-op when the URL is already lowercase. After deploying, the next scheduled copy run (every 10 minutes) will backfill previously dropped activities.
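As a rough check of that reasoning, here is the same comparison modeled in Python with both sides lowercased. It is a sketch under assumptions: `SEGMENT_ID` and `passes_filter_lowered` are hypothetical names, and Python's `str.lower()` stands in for ClickHouse's `lower()`, which is equivalent for the ASCII URLs used here.

```python
SEGMENT_ID = "segment-1"  # hypothetical segment id

# Lowercase URL, as stored in the repositories datasource.
repos_to_channels = {("https://github.com/nousresearch/hermes-agent", SEGMENT_ID)}


def passes_filter_lowered(row, allowed):
    """Model of (lower(channel), segmentId) IN (SELECT lower(channel), segmentId ...)."""
    channel, segment_id = row
    # Lowercase the channel on both sides; segmentId is still compared exactly.
    lowered_allowed = {(c.lower(), s) for c, s in allowed}
    return (channel.lower(), segment_id) in lowered_allowed


mixed_case = ("https://github.com/NousResearch/hermes-agent", SEGMENT_ID)
already_lower = ("https://github.com/nousresearch/hermes-agent", SEGMENT_ID)

print(passes_filter_lowered(mixed_case, repos_to_channels))     # prints True
print(passes_filter_lowered(already_lower, repos_to_channels))  # prints True
```

Rows that already matched keep matching, and only the channel comparison is relaxed: a row with the right URL but a different segmentId is still excluded.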


Note

Medium Risk
Changes analytics pipeline filtering for activity relations buckets; wrong matching logic could include or exclude activities incorrectly, but the change is narrowly scoped and aligns with lowercase repo storage.

Overview
Fixes git-related activities disappearing from activityRelations_deduplicated_cleaned_bucket_* when activity channel URLs differ in casing from repository URLs in repos_to_channels (ClickHouse tuple IN is case-sensitive).

All 10 activityRelations_bucket_clean_enrich_copy_pipe_*.pipe files now compare (lower(channel), segmentId) on both sides of the filter for git, gerrit, github, and gitlab platforms. Scheduled COPY replace runs will repopulate buckets with previously dropped rows on the next copy cycle.

Reviewed by Cursor Bugbot for commit 6a041b4. Bugbot is set up for automated code reviews on this repo.

Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Copilot AI review requested due to automatic review settings June 19, 2026 10:38

Copilot AI left a comment

Pull request overview

Fixes a data-loss bug in the Tinybird activityRelations bucketing clean/enrich copy pipeline where git-platform activities could be dropped due to case-sensitive repository URL matching (channel) against repos_to_channels.

Changes:

  • Updates all 10 activityRelations_bucket_clean_enrich_copy_pipe_*.pipe copy pipes to compare (lower(channel), segmentId) against (lower(channel), segmentId) from repos_to_channels.
  • Keeps the non-git path unchanged (platform NOT IN (...)), limiting behavior change to git-like platforms.
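The two branches described above can be modeled roughly as follows. This is a sketch, not the pipe's actual SQL: the platform list is taken from the PR overview, while the exact shape of the WHERE clause (a bypass for non-git platforms combined with the channel check) is an assumption, and `keep_activity` is a hypothetical helper name.

```python
# Platforms named in the PR overview as subject to the channel check.
GIT_PLATFORMS = {"git", "gerrit", "github", "gitlab"}


def keep_activity(platform, channel, segment_id, repos_to_channels):
    """Assumed filter shape: non-git platforms bypass the channel check."""
    if platform not in GIT_PLATFORMS:
        return True  # unchanged path: no channel comparison at all
    # Git-like platforms: case-insensitive channel match, exact segmentId match.
    lowered = {(c.lower(), s) for c, s in repos_to_channels}
    return (channel.lower(), segment_id) in lowered


# Illustrative data; the segment id and the slack channel are made up.
repos = {("https://github.com/nousresearch/hermes-agent", "segment-1")}

print(keep_activity("slack", "general", "segment-1", repos))
print(keep_activity("github", "https://github.com/NousResearch/hermes-agent", "segment-1", repos))
print(keep_activity("github", "https://github.com/other/repo", "segment-1", repos))
```

Under this model the first two calls keep the activity and the third drops it, so the case-insensitive change only widens matching for git-like platforms.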

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_0.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_1.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_2.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_3.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_4.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_5.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_6.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_7.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_8.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_9.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |

@joanagmaia changed the title from "fix: case-insensitive channel matching in activityRelations clean/enrich pipes" to "fix: case-insensitive channel matching in activityRelations clean/enrich pipes (IN-1185)" Jun 19, 2026
@joanagmaia changed the title from "fix: case-insensitive channel matching in activityRelations clean/enrich pipes (IN-1185)" to "fix: case-insensitive channel matching in activityRelations enrich pipes (IN-1185)" Jun 19, 2026

2 participants