Skip to content

feat(ingest/sql): column logic + join extraction #13426

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: master
Choose a base branch
from
Open

Conversation

hsheth2
Copy link
Collaborator

@hsheth2 hsheth2 commented May 5, 2025

Adds a couple new features to our SQL parser

  • Best-effort column logic extraction. In case a column goes through multiple layers of transformation, we only take the final one.
    • For each column, we also annotate whether it was a direct copy or if it underwent any transformation - see the is_direct_copy bool.
  • Join extraction. For each join clause, we try to figure out (1) the on clause used and (2) the full list of tables/columns involved in the join. The latter is more reliable when there's CTEs.

The column logic extraction is also now integrated into the SqlParsingAggregator, so our main warehouse integrations should benefit from it. The join logic extraction is not yet used anywhere.

All tests are updated, and I also added a couple new ones for tricky edge cases (e.g. self-joins).

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label May 5, 2025
Copy link

codecov bot commented May 5, 2025

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
2593 1 2592 47
View the top 1 failed test(s) by shortest run time
tests.integration.cassandra.test_cassandra::test_cassandra_ingest
Stack Traces | 7.31s run time
docker_compose_runner = <function docker_compose_runner.<locals>.run at 0x7f88718f9eb0>
pytestconfig = <_pytest.config.Config object at 0x7f88cbf2ee60>
tmp_path = PosixPath('.../pytest-of-runner/pytest-0/test_cassandra_ingest0')
monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f88822b6140>

    @pytest.mark.integration
    def test_cassandra_ingest(docker_compose_runner, pytestconfig, tmp_path, monkeypatch):
        # Tricky: The cassandra container makes modifications directly to the cassandra.yaml
        # config file.
        # See https://github..../cassandra/issues/165
        # To avoid spurious diffs, we copy the config file to a temporary location
        # and depend on that instead. The docker-compose file has the corresponding
        # env variable usage to pick up the config file.
        cassandra_config_file = _resources_dir / "setup/cassandra.yaml"
        shutil.copy(cassandra_config_file, tmp_path / "cassandra.yaml")
        monkeypatch.setenv("CASSANDRA_CONFIG_DIR", str(tmp_path))
    
>       with docker_compose_runner(
            _resources_dir / "docker-compose.yml", "cassandra"
        ) as docker_services:

.../integration/cassandra/test_cassandra.py:29: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.../hostedtoolcache/Python/3.8.18....../x64/lib/python3.8/contextlib.py:113: in __enter__
    return next(self.gen)
.../datahub/testing/docker_utils.py:59: in run
    with pytest_docker.plugin.get_docker_services(
.../hostedtoolcache/Python/3.8.18....../x64/lib/python3.8/contextlib.py:113: in __enter__
    return next(self.gen)
venv/lib/python3.8........./site-packages/pytest_docker/plugin.py:213: in get_docker_services
    docker_compose.execute(command)
venv/lib/python3.8........./site-packages/pytest_docker/plugin.py:141: in execute
    return execute(command, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

command = 'docker compose -f ".../integration/cassandra/docker-compose.yml" -p "pytest5051-cassandra" up --build -d'
success_codes = (0,), ignore_stderr = False

    def execute(command: str, success_codes: Iterable[int] = (0,), ignore_stderr: bool = False) -> Union[bytes, Any]:
        """Run a shell command."""
        try:
            stderr_pipe = subprocess.DEVNULL if ignore_stderr else subprocess.STDOUT
            output = subprocess.check_output(command, stderr=stderr_pipe, shell=True)
            status = 0
        except subprocess.CalledProcessError as error:
            output = error.output or b""
            status = error.returncode
            command = error.cmd
    
        if status not in success_codes:
>           raise Exception(
                'Command {} returned {}: """{}""".'.format(command, status, output.decode("utf-8"))
            )
E           Exception: Command docker compose -f ".../integration/cassandra/docker-compose.yml" -p "pytest5051-cassandra" up --build -d returned 2: """ test-cassandra-load-keyspace Pulling 
E            test-cassandra Pulling 
E            215ed5a63843 Pulling fs layer 
E            094bfcb4db7a Pulling fs layer 
E            4df791be4da6 Pulling fs layer 
E            97a8e80e60c2 Pulling fs layer 
E            7be47dbb02a7 Pulling fs layer 
E            6d8759458974 Pulling fs layer 
E            8058deae7a94 Pulling fs layer 
E            97708ad2f62d Pulling fs layer 
E            23886a8bd282 Pulling fs layer 
E            04c79e19f98c Pulling fs layer 
E            6d8759458974 Waiting 
E            8058deae7a94 Waiting 
E            97708ad2f62d Waiting 
E            23886a8bd282 Waiting 
E            04c79e19f98c Waiting 
E            97a8e80e60c2 Waiting 
E            7be47dbb02a7 Waiting 
E            215ed5a63843 Pulling fs layer 
E            094bfcb4db7a Pulling fs layer 
E            4df791be4da6 Pulling fs layer 
E            97a8e80e60c2 Pulling fs layer 
E            7be47dbb02a7 Pulling fs layer 
E            6d8759458974 Pulling fs layer 
E            8058deae7a94 Pulling fs layer 
E            97708ad2f62d Pulling fs layer 
E            23886a8bd282 Pulling fs layer 
E            6d8759458974 Waiting 
E            97a8e80e60c2 Waiting 
E            04c79e19f98c Pulling fs layer 
E            04c79e19f98c Waiting 
E            8058deae7a94 Waiting 
E            97708ad2f62d Waiting 
E            23886a8bd282 Waiting 
E            7be47dbb02a7 Waiting 
E            094bfcb4db7a Downloading [>                                                  ]  163.1kB/16.15MB
E            094bfcb4db7a Downloading [>                                                  ]  163.1kB/16.15MB
E            4df791be4da6 Downloading [>                                                  ]  474.4kB/46.96MB
E            4df791be4da6 Downloading [>                                                  ]  474.4kB/46.96MB
E            215ed5a63843 Downloading [>                                                  ]  310.5kB/29.53MB
E            215ed5a63843 Downloading [>                                                  ]  310.5kB/29.53MB
E            094bfcb4db7a Downloading [===============>                                   ]  5.139MB/16.15MB
E            094bfcb4db7a Downloading [===============>                                   ]  5.139MB/16.15MB
E            4df791be4da6 Downloading [==================>                                ]  17.12MB/46.96MB
E            4df791be4da6 Downloading [==================>                                ]  17.12MB/46.96MB
E            215ed5a63843 Downloading [===========>                                       ]  7.043MB/29.53MB
E            215ed5a63843 Downloading [===========>                                       ]  7.043MB/29.53MB
E            094bfcb4db7a Verifying Checksum 
E            094bfcb4db7a Download complete 
E            094bfcb4db7a Verifying Checksum 
E            094bfcb4db7a Download complete 
E            4df791be4da6 Downloading [==============================>                    ]  29.02MB/46.96MB
E            4df791be4da6 Downloading [==============================>                    ]  29.02MB/46.96MB
E            215ed5a63843 Downloading [====================>                              ]  12.27MB/29.53MB
E            215ed5a63843 Downloading [====================>                              ]  12.27MB/29.53MB
E            4df791be4da6 Verifying Checksum 
E            4df791be4da6 Download complete 
E            4df791be4da6 Verifying Checksum 
E            4df791be4da6 Download complete 
E            215ed5a63843 Downloading [=========================>                         ]  15.06MB/29.53MB
E            215ed5a63843 Downloading [=========================>                         ]  15.06MB/29.53MB
E            97a8e80e60c2 Downloading [==================================================>]     156B/156B
E            97a8e80e60c2 Verifying Checksum 
E            97a8e80e60c2 Download complete 
E            97a8e80e60c2 Verifying Checksum 
E            97a8e80e60c2 Download complete 
E            215ed5a63843 Downloading [===========================================>       ]  25.52MB/29.53MB
E            215ed5a63843 Downloading [===========================================>       ]  25.52MB/29.53MB
E            215ed5a63843 Download complete 
E            215ed5a63843 Download complete 
E            215ed5a63843 Extracting [>                                                  ]  327.7kB/29.53MB
E            215ed5a63843 Extracting [>                                                  ]  327.7kB/29.53MB
E            7be47dbb02a7 Downloading [===============>                                   ]     720B/2.282kB
E            7be47dbb02a7 Downloading [==================================================>]  2.282kB/2.282kB
E            7be47dbb02a7 Downloading [===============>                                   ]     720B/2.282kB
E            7be47dbb02a7 Verifying Checksum 
E            7be47dbb02a7 Verifying Checksum 
E            7be47dbb02a7 Download complete 
E            7be47dbb02a7 Download complete 
E            215ed5a63843 Extracting [==========>                                        ]  6.226MB/29.53MB
E            215ed5a63843 Extracting [==========>                                        ]  6.226MB/29.53MB
E            6d8759458974 Downloading [====================>                              ]     720B/1.733kB
E            6d8759458974 Downloading [==================================================>]  1.733kB/1.733kB
E            6d8759458974 Verifying Checksum 
E            6d8759458974 Download complete 
E            6d8759458974 Downloading [====================>                              ]     720B/1.733kB
E            6d8759458974 Downloading [==================================================>]  1.733kB/1.733kB
E            6d8759458974 Verifying Checksum 
E            6d8759458974 Download complete 
E            215ed5a63843 Extracting [===============>                                   ]  9.175MB/29.53MB
E            215ed5a63843 Extracting [===============>                                   ]  9.175MB/29.53MB
E            8058deae7a94 Downloading [>                                                  ]  118.5kB/11.5MB
E            8058deae7a94 Downloading [>                                                  ]  118.5kB/11.5MB
E            215ed5a63843 Extracting [==========================>                        ]   15.4MB/29.53MB
E            215ed5a63843 Extracting [==========================>                        ]   15.4MB/29.53MB
E            97708ad2f62d Downloading [>                                                  ]  10.27kB/983.7kB
E            97708ad2f62d Downloading [>                                                  ]  10.27kB/983.7kB
E            8058deae7a94 Downloading [==================>                                ]  4.287MB/11.5MB
E            8058deae7a94 Downloading [==================>                                ]  4.287MB/11.5MB
E            97708ad2f62d Verifying Checksum 
E            97708ad2f62d Download complete 
E            97708ad2f62d Verifying Checksum 
E            97708ad2f62d Download complete 
E            215ed5a63843 Extracting [=======================================>           ]  23.27MB/29.53MB
E            215ed5a63843 Extracting [=======================================>           ]  23.27MB/29.53MB
E            8058deae7a94 Downloading [======================================>            ]  8.899MB/11.5MB
E            8058deae7a94 Downloading [======================================>            ]  8.899MB/11.5MB
E            23886a8bd282 Downloading [>                                                  ]  530.7kB/72.29MB
E            23886a8bd282 Downloading [>                                                  ]  530.7kB/72.29MB
E            8058deae7a94 Verifying Checksum 
E            8058deae7a94 Download complete 
E            8058deae7a94 Verifying Checksum 
E            8058deae7a94 Download complete 
E            23886a8bd282 Downloading [========>                                          ]  11.77MB/72.29MB
E            23886a8bd282 Downloading [========>                                          ]  11.77MB/72.29MB
E            215ed5a63843 Extracting [===========================================>       ]  25.89MB/29.53MB
E            215ed5a63843 Extracting [===========================================>       ]  25.89MB/29.53MB
E            04c79e19f98c Downloading [=============================>                     ]     720B/1.227kB
E            04c79e19f98c Download complete 
E            04c79e19f98c Downloading [=============================>                     ]     720B/1.227kB
E            04c79e19f98c Download complete 
E            23886a8bd282 Downloading [===============>                                   ]  22.98MB/72.29MB
E            23886a8bd282 Downloading [===============>                                   ]  22.98MB/72.29MB
E            215ed5a63843 Extracting [===============================================>   ]  28.18MB/29.53MB
E            215ed5a63843 Extracting [===============================================>   ]  28.18MB/29.53MB
E            23886a8bd282 Downloading [=====================>                             ]  30.97MB/72.29MB
E            23886a8bd282 Downloading [=====================>                             ]  30.97MB/72.29MB
E            215ed5a63843 Extracting [=================================================> ]  29.16MB/29.53MB
E            215ed5a63843 Extracting [=================================================> ]  29.16MB/29.53MB
E            23886a8bd282 Downloading [========================>                          ]  35.27MB/72.29MB
E            23886a8bd282 Downloading [========================>                          ]  35.27MB/72.29MB
E            215ed5a63843 Extracting [==================================================>]  29.53MB/29.53MB
E            215ed5a63843 Extracting [==================================================>]  29.53MB/29.53MB
E            215ed5a63843 Pull complete 
E            215ed5a63843 Pull complete 
E            094bfcb4db7a Extracting [>                                                  ]  163.8kB/16.15MB
E            094bfcb4db7a Extracting [>                                                  ]  163.8kB/16.15MB
E            23886a8bd282 Downloading [==============================>                    ]  43.81MB/72.29MB
E            23886a8bd282 Downloading [==============================>                    ]  43.81MB/72.29MB
E            094bfcb4db7a Extracting [========>                                          ]  2.621MB/16.15MB
E            094bfcb4db7a Extracting [========>                                          ]  2.621MB/16.15MB
E            23886a8bd282 Downloading [====================================>              ]  52.91MB/72.29MB
E            23886a8bd282 Downloading [====================================>              ]  52.91MB/72.29MB
E            094bfcb4db7a Extracting [=======================>                           ]    7.7MB/16.15MB
E            094bfcb4db7a Extracting [=======================>                           ]    7.7MB/16.15MB
E            23886a8bd282 Downloading [=======================================>           ]  57.71MB/72.29MB
E            23886a8bd282 Downloading [=======================================>           ]  57.71MB/72.29MB
E            094bfcb4db7a Extracting [=====================================>             ]  12.12MB/16.15MB
E            094bfcb4db7a Extracting [=====================================>             ]  12.12MB/16.15MB
E            23886a8bd282 Downloading [==========================================>        ]     62MB/72.29MB
E            23886a8bd282 Downloading [==========================================>        ]     62MB/72.29MB
E            094bfcb4db7a Extracting [============================================>      ]  14.25MB/16.15MB
E            094bfcb4db7a Extracting [============================================>      ]  14.25MB/16.15MB
E            23886a8bd282 Downloading [==============================================>    ]  66.81MB/72.29MB
E            23886a8bd282 Downloading [==============================================>    ]  66.81MB/72.29MB
E            23886a8bd282 Verifying Checksum 
E            23886a8bd282 Download complete 
E            23886a8bd282 Verifying Checksum 
E            23886a8bd282 Download complete 
E            094bfcb4db7a Extracting [==============================================>    ]  15.07MB/16.15MB
E            094bfcb4db7a Extracting [==============================================>    ]  15.07MB/16.15MB
E            094bfcb4db7a Extracting [===============================================>   ]  15.24MB/16.15MB
E            094bfcb4db7a Extracting [===============================================>   ]  15.24MB/16.15MB
E            094bfcb4db7a Extracting [=================================================> ]  16.06MB/16.15MB
E            094bfcb4db7a Extracting [=================================================> ]  16.06MB/16.15MB
E            094bfcb4db7a Extracting [==================================================>]  16.15MB/16.15MB
E            094bfcb4db7a Extracting [==================================================>]  16.15MB/16.15MB
E            094bfcb4db7a Pull complete 
E            094bfcb4db7a Pull complete 
E            4df791be4da6 Extracting [>                                                  ]  491.5kB/46.96MB
E            4df791be4da6 Extracting [>                                                  ]  491.5kB/46.96MB
E            4df791be4da6 Extracting [=========>                                         ]  9.339MB/46.96MB
E            4df791be4da6 Extracting [=========>                                         ]  9.339MB/46.96MB
E            4df791be4da6 Extracting [===================>                               ]  18.68MB/46.96MB
E            4df791be4da6 Extracting [===================>                               ]  18.68MB/46.96MB
E            4df791be4da6 Extracting [============================>                      ]  27.03MB/46.96MB
E            4df791be4da6 Extracting [============================>                      ]  27.03MB/46.96MB
E            4df791be4da6 Extracting [=====================================>             ]   34.9MB/46.96MB
E            4df791be4da6 Extracting [=====================================>             ]   34.9MB/46.96MB
E            4df791be4da6 Extracting [==============================================>    ]  43.25MB/46.96MB
E            4df791be4da6 Extracting [==============================================>    ]  43.25MB/46.96MB
E            4df791be4da6 Extracting [==================================================>]  46.96MB/46.96MB
E            4df791be4da6 Extracting [==================================================>]  46.96MB/46.96MB
E            4df791be4da6 Pull complete 
E            4df791be4da6 Pull complete 
E            97a8e80e60c2 Extracting [==================================================>]     156B/156B
E            97a8e80e60c2 Extracting [==================================================>]     156B/156B
E            97a8e80e60c2 Extracting [==================================================>]     156B/156B
E            97a8e80e60c2 Extracting [==================================================>]     156B/156B
E            97a8e80e60c2 Pull complete 
E            97a8e80e60c2 Pull complete 
E            7be47dbb02a7 Extracting [==================================================>]  2.282kB/2.282kB
E            7be47dbb02a7 Extracting [==================================================>]  2.282kB/2.282kB
E            7be47dbb02a7 Extracting [==================================================>]  2.282kB/2.282kB
E            7be47dbb02a7 Extracting [==================================================>]  2.282kB/2.282kB
E            7be47dbb02a7 Pull complete 
E            7be47dbb02a7 Pull complete 
E            6d8759458974 Extracting [==================================================>]  1.733kB/1.733kB
E            6d8759458974 Extracting [==================================================>]  1.733kB/1.733kB
E            6d8759458974 Extracting [==================================================>]  1.733kB/1.733kB
E            6d8759458974 Extracting [==================================================>]  1.733kB/1.733kB
E            6d8759458974 Pull complete 
E            6d8759458974 Pull complete 
E            8058deae7a94 Extracting [>                                                  ]  131.1kB/11.5MB
E            8058deae7a94 Extracting [>                                                  ]  131.1kB/11.5MB
E            8058deae7a94 Extracting [=====================>                             ]  4.981MB/11.5MB
E            8058deae7a94 Extracting [=====================>                             ]  4.981MB/11.5MB
E            8058deae7a94 Extracting [==========================>                        ]   6.16MB/11.5MB
E            8058deae7a94 Extracting [==========================>                        ]   6.16MB/11.5MB
E            8058deae7a94 Extracting [======================================>            ]  8.913MB/11.5MB
E            8058deae7a94 Extracting [======================================>            ]  8.913MB/11.5MB
E            8058deae7a94 Extracting [=================================================> ]   11.4MB/11.5MB
E            8058deae7a94 Extracting [=================================================> ]   11.4MB/11.5MB
E            8058deae7a94 Extracting [==================================================>]   11.5MB/11.5MB
E            8058deae7a94 Extracting [==================================================>]   11.5MB/11.5MB
E            8058deae7a94 Pull complete 
E            8058deae7a94 Pull complete 
E            97708ad2f62d Extracting [=>                                                 ]  32.77kB/983.7kB
E            97708ad2f62d Extracting [=>                                                 ]  32.77kB/983.7kB
E            97708ad2f62d Extracting [==================================================>]  983.7kB/983.7kB
E            97708ad2f62d Extracting [==================================================>]  983.7kB/983.7kB
E            97708ad2f62d Pull complete 
E            97708ad2f62d Pull complete 
E            23886a8bd282 Extracting [>                                                  ]  557.1kB/72.29MB
E            23886a8bd282 Extracting [>                                                  ]  557.1kB/72.29MB
E            23886a8bd282 Extracting [================>                                  ]   23.4MB/72.29MB
E            23886a8bd282 Extracting [================>                                  ]   23.4MB/72.29MB
E            23886a8bd282 Extracting [===============================>                   ]  46.24MB/72.29MB
E            23886a8bd282 Extracting [===============================>                   ]  46.24MB/72.29MB
E            23886a8bd282 Extracting [==========================================>        ]  61.28MB/72.29MB
E            23886a8bd282 Extracting [==========================================>        ]  61.28MB/72.29MB
E            23886a8bd282 Extracting [==================================================>]  72.29MB/72.29MB
E            23886a8bd282 Extracting [==================================================>]  72.29MB/72.29MB
E            23886a8bd282 Pull complete 
E            23886a8bd282 Pull complete 
E            04c79e19f98c Extracting [==================================================>]  1.227kB/1.227kB
E            04c79e19f98c Extracting [==================================================>]  1.227kB/1.227kB
E            04c79e19f98c Extracting [==================================================>]  1.227kB/1.227kB
E            04c79e19f98c Extracting [==================================================>]  1.227kB/1.227kB
E            04c79e19f98c Pull complete 
E            04c79e19f98c Pull complete 
E            test-cassandra Pulled 
E            test-cassandra-load-keyspace Pulled 
E           fatal error: concurrent map writes
E           
E           goroutine 40 [running]:
E           github..../v2/pkg/compose.(*composeService).pullRequiredImages.func1.1()
E           	github..../pkg/compose/pull.go:328 +0x236
E           golang..../x/sync/errgroup.(*Group).Go.func1()
E           	golang.org/x/[email protected]/errgroup/errgroup.go:79 +0x50
E           created by golang..../x/sync/errgroup.(*Group).Go in goroutine 39
E           	golang.org/x/[email protected]/errgroup/errgroup.go:76 +0x96
E           
E           goroutine 1 [semacquire]:
E           sync.runtime_Semacquire(0xc00048c348?)
E           	runtime/sema.go:71 +0x25
E           sync.(*WaitGroup).Wait(0xc000812140?)
E           	sync/waitgroup.go:118 +0x48
E           golang..../x/sync/errgroup.(*Group).Wait(0xc000812100)
E           	golang.org/x/[email protected]/errgroup/errgroup.go:56 +0x25
E           github..../v2/pkg/progress.RunWithStatus({0x2ba9388, 0xc0007333e0}, 0xc00080a340, 0xc000120f60, {0x27c8359, 0x7})
E           	github..../pkg/progress/writer.go:97 +0x225
E           github..../v2/pkg/progress.Run({0x2ba9388, 0xc0007333e0}, 0xc0008120c0, 0xc000120f60)
E           	github..../pkg/progress/writer.go:61 +0x85
E           github..../v2/pkg/compose.(*composeService).Up(0xc00052bec0, {0x2ba9388, 0xc0007333e0}, _, {{0xc0004aed80, {0xc00015d180, 0x0, 0x2}, 0x0, 0x0, ...}, ...})
E           	github..../pkg/compose/up.go:40 +0x213
E           github..../v2/cmd/compose.runUp({_, _}, {_, _}, {_, _}, {0x1, 0x0, {0x27c670c, 0x6}, ...}, ...)
E           	github..../cmd/compose/up.go:319 +0xb54
E           github..../v2/cmd/compose.upCommand.func2({0x2ba9388, 0xc0007333e0}, 0xc0007507e0, {0xc00015d180, 0x0, 0x2})
E           	github..../cmd/compose/up.go:143 +0x29f
E           github..../v2/cmd/compose.upCommand.(*ProjectOptions).WithServices.func5({0x2ba93c0, 0xc0005020f0}, {0xc00015d180, 0x0, 0x2})
E           	github..../cmd/compose/compose.go:187 +0x22d
E           github..../v2/cmd/compose.upCommand.(*ProjectOptions).WithServices.Adapt.func7({0x2ba93c0?, 0xc0005020f0?}, 0x2?, {0xc00015d180?, 0x2b8dfe8?, 0x241d23b?})
E           	github..../cmd/compose/compose.go:137 +0x30
E           github..../v2/cmd/compose.upCommand.(*ProjectOptions).WithServices.Adapt.AdaptCmd.func8(0xc0001f4f08, {0xc00015d180, 0x0, 0x2})
E           	github..../cmd/compose/compose.go:121 +0x143
E           github..../cli/cli-plugins/plugin.RunPlugin.func1.1.2(0xc0001f4f08, {0xc00015d180, 0x0, 0x2})
E           	github.com/docker/[email protected]+.../cli-plugins/plugin/plugin.go:65 +0x6c
E           github..../v2/cmd/cmdtrace.Setup.wrapRunE.func2(0xc0001f4f08?, {0xc00015d180?, 0x0?, 0x2?})
E           	github..../cmd/cmdtrace/cmd_span.go:85 +0x63
E           github.com/spf13/cobra.(*Command).execute(0xc0001f4f08, {0xc0004afd60, 0x2, 0x2})
E           	github.com/spf13/[email protected]/command.go:1015 +0xa94
E           github.com/spf13/cobra.(*Command).ExecuteC(0xc0003a4f08)
E           	github.com/spf13/[email protected]/command.go:1148 +0x40c
E           github.com/spf13/cobra.(*Command).Execute(...)
E           	github.com/spf13/[email protected]/command.go:1071
E           github..../cli/cli-plugins/plugin.RunPlugin(0xc0002d6c80, 0xc0001f4908, {{0x27c5150, 0x5}, {0x27cf09e, 0xb}, {0x2b790c8, 0x7}, {0x0, 0x0}, ...})
E           	github.com/docker/[email protected]+.../cli-plugins/plugin/plugin.go:80 +0x145
E           github..../cli/cli-plugins/plugin.Run(0x29213b0, {{0x27c5150, 0x5}, {0x27cf09e, 0xb}, {0x2b790c8, 0x7}, {0x0, 0x0}, {0x0, ...}})
E           	github.com/docker/[email protected]+.../cli-plugins/plugin/plugin.go:95 +0x105
E           main.pluginMain()
E           	github..../v2/cmd/main.go:38 +0xa5
E           main.main()
E           	github..../v2/cmd/main.go:98 +0x19c
E           
E           goroutine 5 [IO wait]:
E           internal/poll.runtime_pollWait(0x7f2d03b78e20, 0x72)
E           	runtime/netpoll.go:351 +0x85
E           internal/poll.(*pollDesc).wait(0xc0004afd80?, 0xc0000847bf?, 0x0)
E           	internal/poll/fd_poll_runtime.go:84 +0x27
E           internal/poll.(*pollDesc).waitRead(...)
E           	internal/poll/fd_poll_runtime.go:89
E           internal/poll.(*FD).Read(0xc0004afd80, {0xc0000847bf, 0x1, 0x1})
E           	internal/poll/fd_unix.go:165 +0x27a
E           net.(*netFD).Read(0xc0004afd80, {0xc0000847bf?, 0x0?, 0x0?})
E           	net/fd_posix.go:55 +0x25
E           net.(*conn).Read(0xc000522d90, {0xc0000847bf?, 0x0?, 0x0?})
E           	net/net.go:189 +0x45
E           github..../cli/cli-plugins/socket.ConnectAndWait.func1()
E           	github.com/docker/[email protected]+.../cli-plugins/socket/socket.go:162 +0x45
E           created by github..../cli/cli-plugins/socket.ConnectAndWait in goroutine 1
E           	github.com/docker/[email protected]+.../cli-plugins/socket/socket.go:159 +0x118
E           
E           goroutine 9 [select]:
E           go.opentelemetry..../otel/sdk/trace.(*batchSpanProcessor).processQueue(0xc000516000)
E           	go.opentelemetry.io/otel/[email protected]/trace/batch_span_processor.go:302 +0x114
E           go.opentelemetry..../otel/sdk/trace.NewBatchSpanProcessor.func1()
E           	go.opentelemetry.io/otel/[email protected]/trace/batch_span_processor.go:117 +0x4e
E           created by go.opentelemetry..../otel/sdk/trace.NewBatchSpanProcessor in goroutine 1
E           	go.opentelemetry.io/otel/[email protected]/trace/batch_span_processor.go:115 +0x2e5
E           
E           goroutine 7 [IO wait]:
E           internal/poll.runtime_pollWait(0x7f2d03b78d08, 0x72)
E           	runtime/netpoll.go:351 +0x85
E           internal/poll.(*pollDesc).wait(0xc000402380?, 0xc0003bd000?, 0x0)
E           	internal/poll/fd_poll_runtime.go:84 +0x27
E           internal/poll.(*pollDesc).waitRead(...)
E           	internal/poll/fd_poll_runtime.go:89
E           internal/poll.(*FD).Read(0xc000402380, {0xc0003bd000, 0x1000, 0x1000})
E           	internal/poll/fd_unix.go:165 +0x27a
E           net.(*netFD).Read(0xc000402380, {0xc0003bd000?, 0x0?, 0x2b805a0?})
E           	net/fd_posix.go:55 +0x25
E           net.(*conn).Read(0xc000522e70, {0xc0003bd000?, 0x0?, 0x0?})
E           	net/net.go:189 +0x45
E           net/http.(*persistConn).Read(0xc0000d8240, {0xc0003bd000?, 0x777f25?, 0x2379d40?})
E           	net/http/transport.go:2052 +0x4a
E           bufio.(*Reader).fill(0xc000227d40)
E           	bufio/bufio.go:110 +0x103
E           bufio.(*Reader).Peek(0xc000227d40, 0x1)
E           	bufio/bufio.go:148 +0x53
E           net/http.(*persistConn).readLoop(0xc0000d8240)
E           	net/http/transport.go:2205 +0x185
E           created by net/http.(*Transport).dialConn in goroutine 6
E           	net/http/transport.go:1874 +0x154f
E           
E           goroutine 8 [select]:
E           net/http.(*persistConn).writeLoop(0xc0000d8240)
E           	net/http/transport.go:2519 +0xe7
E           created by net/http.(*Transport).dialConn in goroutine 6
E           	net/http/transport.go:1875 +0x15a5
E           
E           goroutine 23 [syscall]:
E           os/signal.signal_recv()
E           	runtime/sigqueue.go:152 +0x29
E           os/signal.loop()
E           	os/signal/signal_unix.go:23 +0x13
E           created by os/signal.Notify.func1.1 in goroutine 1
E           	os/signal/signal.go:151 +0x1f
E           
E           goroutine 24 [chan receive]:
E           github..../v2/cmd/compose.upCommand.AdaptCmd.func4.1()
E           	github..../cmd/compose/compose.go:115 +0x27
E           created by github..../v2/cmd/compose.upCommand.AdaptCmd.func4 in goroutine 1
E           	github..../cmd/compose/compose.go:114 +0x10a
E           
E           goroutine 25 [chan receive]:
E           github..../v2/cmd/compose.upCommand.(*ProjectOptions).WithServices.Adapt.AdaptCmd.func8.1()
E           	github..../cmd/compose/compose.go:115 +0x27
E           created by github..../v2/cmd/compose.upCommand.(*ProjectOptions).WithServices.Adapt.AdaptCmd.func8 in goroutine 1
E           	github..../cmd/compose/compose.go:114 +0x10a
E           
E           goroutine 36 [select]:
E           github..../v2/pkg/progress.(*plainWriter).Start(0xc000824040, {0x2ba8f50, 0x412eac0})
E           	github..../pkg/progress/plain.go:34 +0x67
E           github..../v2/pkg/progress.RunWithStatus.func1()
E           	github..../pkg/progress/writer.go:83 +0x2a
E           golang..../x/sync/errgroup.(*Group).Go.func1()
E           	golang.org/x/[email protected]/errgroup/errgroup.go:79 +0x50
E           created by golang..../x/sync/errgroup.(*Group).Go in goroutine 1
E           	golang.org/x/[email protected]/errgroup/errgroup.go:76 +0x96
E           
E           goroutine 37 [semacquire]:
E           sync.runtime_Semacquire(0xc000010348?)
E           	runtime/sema.go:71 +0x25
E           sync.(*WaitGroup).Wait(0x25c5480?)
E           	sync/waitgroup.go:118 +0x48
E           golang..../x/sync/errgroup.(*Group).Wait(0xc0007302c0)
E           	golang.org/x/[email protected]/errgroup/errgroup.go:56 +0x25
E           github..../v2/pkg/progress.RunWithStatus({0x2ba9388, 0xc000786ae0}, 0xc00078a300, 0xc000120f60, {0x27c8359, 0x7})
E           	github..../pkg/progress/writer.go:97 +0x225
E           github..../v2/pkg/progress.Run({0x2ba9388, 0xc000786ae0}, 0xc000786b40, 0xc000120f60)
E           	github..../pkg/progress/writer.go:61 +0x85
E           github..../v2/pkg/compose.(*composeService).pullRequiredImages(0xc00052bec0, {0x2ba9388, 0xc000786ae0}, 0xc0007507e0, 0xc0007861b0, 0x0)
E           	github..../pkg/compose/pull.go:320 +0x2ef
E           github..../v2/pkg/compose.(*composeService).ensureImagesExists.func1({0x2ba9388?, 0xc000786ae0?})
E           	github..../pkg/compose/build.go:278 +0x37
E           github..../v2/pkg/compose.(*composeService).ensureImagesExists.SpanWrapFunc.func3({0x2ba9388, 0xc000786180})
E           	github..../internal/tracing/wrap.go:43 +0x13d
E           github..../v2/pkg/compose.(*composeService).ensureImagesExists(0xc00052bec0, {0x2ba9388, 0xc000786180}, 0xc0007507e0, 0xc0004aed80, 0x0)
E           	github..../pkg/compose/build.go:280 +0x29e
E           github..../v2/pkg/compose.(*composeService).create(0xc00052bec0, {0x2ba9388, 0xc000786180}, 0xc0007507e0, {0xc0004aed80, {0xc00074c8e0, 0x2, 0x2}, 0x0, 0x0, ...})
E           	github..../pkg/compose/create.go:83 +0xdf
E           github..../v2/pkg/compose.(*composeService).Up.func1({0x2ba9388, 0xc000786180})
E           	github..../pkg/compose/up.go:41 +0x85
E           github..../v2/pkg/compose.(*composeService).Up.SpanWrapFunc.func5({0x2ba9388, 0xc00080c5d0})
E           	github..../internal/tracing/wrap.go:43 +0x13d
E           github..../v2/pkg/progress.Run.func1({0x2ba9388?, 0xc00080c5d0?})
E           	github..../pkg/progress/writer.go:62 +0x22
E           github..../v2/pkg/progress.RunWithStatus.func2()
E           	github..../pkg/progress/writer.go:90 +0x70
E           golang..../x/sync/errgroup.(*Group).Go.func1()
E           	golang.org/x/[email protected]/errgroup/errgroup.go:79 +0x50
E           created by golang..../x/sync/errgroup.(*Group).Go in goroutine 1
E           	golang.org/x/[email protected]/errgroup/errgroup.go:76 +0x96
E           
E           goroutine 38 [select]:
E           github..../v2/pkg/progress.(*plainWriter).Start(0xc00074c160, {0x2ba8f50, 0x412eac0})
E           	github..../pkg/progress/plain.go:34 +0x67
E           github..../v2/pkg/progress.RunWithStatus.func1()
E           	github..../pkg/progress/writer.go:83 +0x2a
E           golang..../x/sync/errgroup.(*Group).Go.func1()
E           	golang.org/x/[email protected]/errgroup/errgroup.go:79 +0x50
E           created by golang..../x/sync/errgroup.(*Group).Go in goroutine 37
E           	golang.org/x/[email protected]/errgroup/errgroup.go:76 +0x96
E           
E           goroutine 39 [semacquire]:
E           sync.runtime_Semacquire(0x0?)
E           	runtime/sema.go:71 +0x25
E           sync.(*WaitGroup).Wait(0xc000786b10?)
E           	sync/waitgroup.go:118 +0x48
E           golang..../x/sync/errgroup.(*Group).Wait(0xc000730340)
E           	golang.org/x/[email protected]/errgroup/errgroup.go:56 +0x25
E           github..../v2/pkg/compose.(*composeService).pullRequiredImages.func1({0x2ba9388, 0xc000786b70})
E           	github..../pkg/compose/pull.go:340 +0x39b
E           github..../v2/pkg/progress.Run.func1({0x2ba9388?, 0xc000786b70?})
E           	github..../pkg/progress/writer.go:62 +0x22
E           github..../v2/pkg/progress.RunWithStatus.func2()
E           	github..../pkg/progress/writer.go:90 +0x70
E           golang..../x/sync/errgroup.(*Group).Go.func1()
E           	golang.org/x/[email protected]/errgroup/errgroup.go:79 +0x50
E           created by golang..../x/sync/errgroup.(*Group).Go in goroutine 37
E           	golang.org/x/[email protected]/errgroup/errgroup.go:76 +0x96
E           
E           goroutine 27 [IO wait]:
E           internal/poll.runtime_pollWait(0x7f2d03b78bf0, 0x72)
E           	runtime/netpoll.go:351 +0x85
E           internal/poll.(*pollDesc).wait(0xc0004ae000?, 0xc00069b000?, 0x0)
E           	internal/poll/fd_poll_runtime.go:84 +0x27
E           internal/poll.(*pollDesc).waitRead(...)
E           	internal/poll/fd_poll_runtime.go:89
E           internal/poll.(*FD).Read(0xc0004ae000, {0xc00069b000, 0x1000, 0x1000})
E           	internal/poll/fd_unix.go:165 +0x27a
E           net.(*netFD).Read(0xc0004ae000, {0xc00069b000?, 0x0?, 0x2b805a0?})
E           	net/fd_posix.go:55 +0x25
E           net.(*conn).Read(0xc00048e070, {0xc00069b000?, 0x0?, 0x0?})
E           	net/net.go:189 +0x45
E           net/http.(*persistConn).Read(0xc00077e000, {0xc00069b000?, 0x777f25?, 0x2379d40?})
E           	net/http/transport.go:2052 +0x4a
E           bufio.(*Reader).fill(0xc00053a420)
E           	bufio/bufio.go:110 +0x103
E           bufio.(*Reader).Peek(0xc00053a420, 0x1)
E           	bufio/bufio.go:148 +0x53
E           net/http.(*persistConn).readLoop(0xc00077e000)
E           	net/http/transport.go:2205 +0x185
E           created by net/http.(*Transport).dialConn in goroutine 26
E           	net/http/transport.go:1874 +0x154f
E           
E           goroutine 28 [select]:
E           net/http.(*persistConn).writeLoop(0xc00077e000)
E           	net/http/transport.go:2519 +0xe7
E           created by net/http.(*Transport).dialConn in goroutine 26
E           	net/http/transport.go:1875 +0x15a5
E           """.

venv/lib/python3.8........./site-packages/pytest_docker/plugin.py:36: Exception
View the full list of 1 ❄️ flaky tests
tests.audit_events.audit_events_test::test_policy_events

Flake rate in main: 50.00% (Passed 7 times, Failed 7 times)

Stack Traces | 29.9s run time
auth_exclude_filter = {'condition': 'EQUAL', 'field': 'name', 'negated': True, 'values': ['Test Session Token']}

    def test_policy_events(auth_exclude_filter):
        user_session = login_as(admin_user, admin_pass)
        json = {
            "query": """mutation createPolicy($input: PolicyUpdateInput!) {\n
                createPolicy(input: $input) }""",
            "variables": {
                "input": {
                    "type": "METADATA",
                    "name": "Test Metadata Policy",
                    "description": "My Metadaata Policy",
                    "state": "ACTIVE",
                    "resources": {"type": "dataset", "allResources": True},
                    "privileges": ["EDIT_ENTITY_TAGS"],
                    "actors": {
                        "users": ["urn:li:corpuser:datahub", "urn:li:corpuser:admin"],
                        "resourceOwners": False,
                        "allUsers": False,
                        "allGroups": False,
                    },
                }
            },
        }
    
        response = user_session.post(f"{get_frontend_url()}....../api/v2/graphql", json=json)
        response.raise_for_status()
        res_data = response.json()
    
        assert res_data
        assert res_data["data"]
        assert res_data["data"]["createPolicy"]
    
        wait_for_writes_to_sync(consumer_group="datahub-usage-event-consumer-job-client")
    
        new_urn = res_data["data"]["createPolicy"]
    
        update_json = {
            "query": """mutation updatePolicy($urn: String!, $input: PolicyUpdateInput!) {\n
                updatePolicy(urn: $urn, input: $input) }""",
            "variables": {
                "urn": new_urn,
                "input": {
                    "type": "METADATA",
                    "state": "ACTIVE",
                    "name": "Test Metadata Policy",
                    "description": "Updated Metadaata Policy",
                    "privileges": ["EDIT_ENTITY_TAGS", "EDIT_ENTITY_GLOSSARY_TERMS"],
                    "actors": {
                        "resourceOwners": False,
                        "allUsers": True,
                        "allGroups": False,
                    },
                },
            },
        }
    
        response = user_session.post(
            f"{get_frontend_url()}....../api/v2/graphql", json=update_json
        )
        response.raise_for_status()
        res_data = response.json()
    
        # Check updated was submitted successfully
        assert res_data
        assert res_data["data"]
        assert res_data["data"]["updatePolicy"]
        assert res_data["data"]["updatePolicy"] == new_urn
    
        wait_for_writes_to_sync(consumer_group="datahub-usage-event-consumer-job-client")
        res_data = searchForAuditEvents(
            user_session,
            3,
            ["CreatePolicyEvent", "UpdatePolicyEvent"],
            ["urn:li:corpuser:datahub", "urn:li:corpuser:admin"],
            [],
        )
        print(res_data)
        assert res_data
        assert res_data["usageEvents"]
>       assert len(res_data["usageEvents"]) == 3
E       AssertionError: assert 2 == 3
E        +  where 2 = len([{'eventType': 'CreatePolicyEvent', 'timestamp': 1746638393082, 'actorUrn': 'urn:li:corpuser:datahub', 'sourceIP': '172.18.0.5', 'eventSource': 'GRAPHQL', 'userAgent': 'python-requests/2.31.0', 'telemetryTraceId': '0006348ef0f826b872bbbe412ce9dbe9', 'rawUsageEvent': {'traceId': '0006348ef0f826b872bbbe412ce9dbe9', 'aspectName': 'dataHubPolicyKey', 'actorUrn': 'urn:li:corpuser:datahub', 'entityType': 'dataHubPolicy', 'eventSource': 'GRAPHQL', 'usageSource': 'backend', 'userAgent': 'python-requests/2.31.0', 'corp_user_username': 'datahub', 'type': 'CreatePolicyEvent', '@timestamp': 1746638393082, 'corp_user_name': 'DataHub', 'sourceIP': '172.18.0.5', 'entityUrn': 'urn:li:dataHubPolicy:158d6068-5bf1-4aae-b2c3-fc65390d7dbb', 'timestamp': 1746638393082}, 'entityUrn': 'urn:li:dataHubPolicy:158d6068-5bf1-4aae-b2c3-fc65390d7dbb', 'entityType': 'dataHubPolicy', 'aspectName': 'dataHubPolicyKey'}, {'eventType': 'UpdatePolicyEvent', 'timestamp': 1746638393081, 'actorUrn': 'urn:li:corpuser:datahub', 'sourceIP': '172.18.0.5', 'eventSource': 'GRAPHQL', 'userAgent': 'python-requests/2.31.0', 'telemetryTraceId': '0006348ef0f826b872bbbe412ce9dbe9', 'rawUsageEvent': {'traceId': '0006348ef0f826b872bbbe412ce9dbe9', 'aspectName': 'dataHubPolicyInfo', 'actorUrn': 'urn:li:corpuser:datahub', 'entityType': 'dataHubPolicy', 'eventSource': 'GRAPHQL', 'usageSource': 'backend', 'userAgent': 'python-requests/2.31.0', 'corp_user_username': 'datahub', 'type': 'UpdatePolicyEvent', '@timestamp': 1746638393081, 'corp_user_name': 'DataHub', 'sourceIP': '172.18.0.5', 'entityUrn': 'urn:li:dataHubPolicy:158d6068-5bf1-4aae-b2c3-fc65390d7dbb', 'timestamp': 1746638393081}, 'entityUrn': 'urn:li:dataHubPolicy:158d6068-5bf1-4aae-b2c3-fc65390d7dbb', 'entityType': 'dataHubPolicy', 'aspectName': 'dataHubPolicyInfo'}])

tests/audit_events/audit_events_test.py:317: AssertionError

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

@datahub-cyborg datahub-cyborg bot added the needs-review Label for PRs that need review from a maintainer. label May 5, 2025
Copy link
Contributor

@sgomezvillamor sgomezvillamor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome!

beyond self-join, what's the coverage for other join variants such as inner/outer/cross joins? they could be explicitely tested

should user-facing doc be updated with the new features?

@datahub-cyborg datahub-cyborg bot added pending-submitter-merge and removed needs-review Label for PRs that need review from a maintainer. labels May 6, 2025
@hsheth2
Copy link
Collaborator Author

hsheth2 commented May 6, 2025

@sgomezvillamor this should work for all join types, including ones that use subqueries or CTEs

There's no user-facing portion of this yet, so I don't want to add any user-facing docs around this until it shows up in the product somewhere

@@ -1383,7 +1390,16 @@ def _gen_lineage_for_downstream(
if self.can_generate_query(query_id)
else None
),
confidenceScore=queries_map[query_id].confidence_score,
confidenceScore=query.confidence_score,
transformOperation=(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we format as a string and not convert it into a JSON object to make it parsable by machine?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is shown in the UI as-is

@@ -26,6 +26,7 @@
"downstreams": [
"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:mssql,demodata.foo.age_dist,PROD),age)"
],
"transformOperation": "COPY: [persons].[age] AS [age]",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct with the square brackets?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - mssql uses square brackets for quotes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ingestion PR or Issue related to the ingestion of metadata pending-submitter-merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants