fix: case-insensitive channel matching in activityRelations enrich pipes (IN-1185) #4239
Open
joanagmaia wants to merge 2 commits into
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Pull request overview
Fixes a data-loss bug in the Tinybird activityRelations bucketing clean/enrich copy pipeline where git-platform activities could be dropped due to case-sensitive repository URL matching (channel) against repos_to_channels.
Changes:
- Updates all 10 `activityRelations_bucket_clean_enrich_copy_pipe_*.pipe` copy pipes to compare `(lower(channel), segmentId)` against `(lower(channel), segmentId)` from `repos_to_channels`.
- Keeps the non-git path unchanged (`platform NOT IN (...)`), limiting behavior change to git-like platforms.
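The effect of the change on the git-platform membership check can be sketched outside ClickHouse. The Python below is only an illustrative model, not the pipe SQL: the `segment-1` identifier, the in-memory `repos_to_channels` set, and both function names are hypothetical, and the non-git path is not modelled.

```python
# Illustrative model only: the real filter lives in the Tinybird .pipe SQL.
# Hypothetical rows of the repos_to_channels lookup: (channel, segmentId).
repos_to_channels = {
    ("https://github.com/nousresearch/hermes-agent", "segment-1"),
}

# Hypothetical activity whose channel keeps the owner casing from the API.
activity = {
    "channel": "https://github.com/NousResearch/hermes-agent",
    "segmentId": "segment-1",
}


def matches_case_sensitive(activity, lookup):
    """Previous behaviour: exact tuple membership, so casing must match."""
    return (activity["channel"], activity["segmentId"]) in lookup


def matches_case_insensitive(activity, lookup):
    """New behaviour: lowercase the channel on both sides before comparing."""
    lowered = {(channel.lower(), segment_id) for channel, segment_id in lookup}
    return (activity["channel"].lower(), activity["segmentId"]) in lowered


print(matches_case_sensitive(activity, repos_to_channels))    # False
print(matches_case_insensitive(activity, repos_to_channels))  # True
```

Only the channel is lowercased; `segmentId` is still compared exactly, so an activity in a different segment remains excluded. One caveat of the model: Python's `str.lower()` is Unicode-aware, whereas ClickHouse's `lower()` lowercases ASCII letters, a difference that should not matter for typical repository URLs.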
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_0.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_1.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_2.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_3.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_4.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_5.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_6.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_7.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_8.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_9.pipe | Makes git-platform (channel, segmentId) membership check case-insensitive via lower(channel) |
Problem
Activities from GitHub have their `channel` field set directly from the GitHub API response, which preserves the original repository owner casing (e.g. `https://github.com/NousResearch/hermes-agent`).
The `repositories` Tinybird datasource, however, stores URLs in lowercase (e.g. `https://github.com/nousresearch/hermes-agent`). The `activityRelations_bucket_clean_enrich_copy_pipe_*` pipes filter git-platform activities with a `(channel, segmentId)` tuple `IN` check against `repos_to_channels`, and that tuple comparison is case-sensitive in ClickHouse. Activities whose channel URL differs in casing from the stored repository URL are therefore silently dropped from `activityRelations_deduplicated_cleaned_bucket_*_ds`.
Concretely: querying `activityRelations` for a given segment and type returns rows, but querying `activityRelations_deduplicated_cleaned_bucket_union` for the same filters returns 0 rows, because the channel casing mismatch causes the filter to exclude all matching activities.
Fix
Apply `lower()` to both sides of the channel comparison in all 10 `activityRelations_bucket_clean_enrich_copy_pipe_*.pipe` files, so that the filter compares `(lower(channel), segmentId)` on both sides.
This is safe for all platforms (GitHub, GitLab, Gerrit) since `lower()` is a no-op when the URL is already lowercase. After deploying, the next scheduled copy run (every 10 minutes) will backfill previously-dropped activities.
Note
Medium Risk
Changes analytics pipeline filtering for activity relations buckets; wrong matching logic could include or exclude activities incorrectly, but the change is narrowly scoped and aligns with lowercase repo storage.
Overview
Fixes git-related activities disappearing from `activityRelations_deduplicated_cleaned_bucket_*` when activity `channel` URLs differ in casing from repository URLs in `repos_to_channels` (ClickHouse tuple `IN` is case-sensitive).
All 10 `activityRelations_bucket_clean_enrich_copy_pipe_*.pipe` files now compare `(lower(channel), segmentId)` on both sides of the filter for `git`, `gerrit`, `github`, and `gitlab` platforms. Scheduled COPY replace runs will repopulate buckets with previously dropped rows on the next copy cycle.
Reviewed by Cursor Bugbot for commit 6a041b4. Bugbot is set up for automated code reviews on this repo.