Docs / Build Workflow

Sources

Sources are step one of content reliability

Models depend on source aliases. If the alias, credentials filename, or dataset reference is wrong, every model, query, and visualization built on top fails. Configure and validate runtime/sources.runtime.yml before writing a single model.

If you have not set up your GCP service account yet, do that first. See BigQuery Dataset Access.

Working example: public BigQuery dataset

Working example using the public bigquery-public-data.thelook_ecommerce dataset. Copy it as your starting point; replace project_id with your own GCP billing project and credentials_file with the filename of your service-account JSON (the file you drop into secrets/). The dataset itself is public, so anyone with a working BigQuery service account can query it.

sources:
  ecommerce:
    name: The Look Ecommerce
    type: bigquery
    project_id: my-gcp-billing-project
    credentials_file: my-workspace-bq.json
    datasets:
      - bigquery-public-data.thelook_ecommerce

Field by field:

  • ecommerce: the alias your Malloy models will use. Pick something short and stable — changing it later breaks every model that references it.
  • project_id: the GCP project that pays for query costs. This is your billing project, not necessarily where the data lives.
  • credentials_file: plain filename of the service-account JSON dropped into the workspace's secrets/ folder. Pattern ^[A-Za-z0-9_-][A-Za-z0-9._-]*$ — no path separators, no leading dot.
  • datasets: the dataset locations the runtime can query. Here the data is in bigquery-public-data, a different GCP project from the billing project — that is normal and expected.

Working example: your own private dataset

When data lives in your own GCP project, both project_id and the dataset project are the same:

sources:
  sales:
    name: Sales Data
    type: bigquery
    project_id: my-company-gcp-project
    credentials_file: my-workspace-bq.json
    datasets:
      - my-company-gcp-project.sales_warehouse

You can define multiple sources in the same file:

sources:
  sales:
    name: Sales Data
    type: bigquery
    project_id: my-company-gcp-project
    credentials_file: my-workspace-bq.json
    datasets:
      - my-company-gcp-project.sales_warehouse
  marketing:
    name: Marketing Data
    type: bigquery
    project_id: my-company-gcp-project
    credentials_file: my-workspace-bq.json
    datasets:
      - my-company-gcp-project.marketing_events

Each alias becomes independently referenceable in models.

Working example: Postgres source

Postgres sources split network coordinates from credentials: the dsn is a standard libpq URI (host, port, database, any TLS or pool flags) without user:password@; the user and password live in a JSON inside secrets/ referenced by credentials_file. The platform reads both, injects the credentials into the DSN in memory, and hands the result to the Postgres driver. runtime/sources.runtime.yml therefore never contains a password.

sources:
  warehouse:
    name: Internal Warehouse
    type: postgres
    dsn: "postgresql://pg.internal.example.com:5432/warehouse?sslmode=require"
    credentials_file: warehouse.json
    schemas:
      - reporting
      - public

Drop the matching JSON into secrets/warehouse.json:

{
  "user": "looky_reader",
  "password": "..."
}

Field by field:

  • type: must be postgres.
  • name: optional human label shown in the UI when the source surfaces in operator tools. Purely cosmetic.
  • dsn: libpq URI without userinfo. Must start with postgres:// or postgresql://; the schema regex rejects any URI containing @ (i.e. user:password@). Any libpq query-string parameter — sslmode, application_name, connect_timeout, target_session_attrs, channel_binding, gssencmode, … — flows through unchanged.
  • credentials_file: plain filename of a JSON inside the workspace's secrets/ folder. Pattern ^[A-Za-z0-9_-][A-Za-z0-9._-]*$ — no path separators, no leading dot. The JSON must declare {"user": "…", "password": "…"}.
  • schemas: optional array narrowing introspection. Defaults to all non-system schemas (pg_catalog and information_schema are always excluded).

Working example: MySQL source

MySQL sources follow the same split as Postgres: the dsn is a mysql://host:port/database URI without user:password@, and the user and password live in a JSON inside secrets/ referenced by credentials_file. Unlike Postgres, a MySQL DSN carries no query parameters — Looky parses the host, port, and database out of the URI and rejects any ?…. runtime/sources.runtime.yml never contains a password.

sources:
  shop:
    name: Online Shop
    type: mysql
    dsn: "mysql://db.internal.example.com:3306/shop"
    credentials_file: shop.json
    schemas:
      - shop

Drop the matching JSON into secrets/shop.json:

{
  "user": "looky_reader",
  "password": "..."
}

Field by field:

  • type: must be mysql.
  • name: optional human label shown in the UI when the source surfaces in operator tools. Purely cosmetic.
  • dsn: mysql://host:port/database URI without userinfo. The schema regex rejects any URI containing @ (i.e. user:password@) or a ? query string. The port is optional and defaults to 3306.
  • credentials_file: plain filename of a JSON inside the workspace's secrets/ folder. Pattern ^[A-Za-z0-9_-][A-Za-z0-9._-]*$ — no path separators, no leading dot. The JSON must declare {"user": "…", "password": "…"}.
  • schemas: optional array of databases to expose for introspection. Defaults to the database named in the dsn.

Two MySQL specifics worth knowing: connections are not encrypted (TLS to MySQL is not configurable yet), so use it over a trusted network; and MySQL has no real boolean type, so boolean-looking columns come back as numbers — add an explicit cast(… as boolean) in your model when you need a true/false filter.

Field matrix per adapter

Quick reference of what each adapter requires:

  • BigQuery — required: type: bigquery, project_id, credentials_file; optional: location, datasets (required at introspection time, not validation time).
  • Postgres — required: type: postgres, dsn (libpq URI without userinfo), credentials_file (filename of a JSON in secrets/ declaring {"user","password"}); optional: schemas.
  • MySQL — required: type: mysql, dsn (mysql://host:port/database, no userinfo, no query string), credentials_file (filename of a JSON in secrets/ declaring {"user","password"}); optional: schemas (defaults to the DSN's database).

For the full per-adapter divergence matrix (auth, parameter binding, NULL handling, dryRun, introspection, caching), see the source adapter comparison.

Source validation workflow

  1. Edit runtime/sources.runtime.yml from your workspace root.
  2. Run:
    looky sources list
    looky sources diff
    looky validate
  3. Fix any alias, credential, or dataset errors before touching models.

sources list shows the currently registered runtime aliases. sources diff shows what would change on push. validate checks structural consistency across the whole workspace.

Common source mistakes

  • Alias mismatch: model references sales but runtime defines ecommerce. Every query using that alias fails.
  • Wrong credentials filename: credentials_file must be a plain filename inside secrets/ — no slashes, no path prefix.
  • Dataset not in list: if a model queries a table whose dataset is not listed under datasets:, the query will be rejected at runtime.
  • Wrong billing project: if project_id does not have BigQuery Job User granted to the service account, queries will fail even if data is readable.

Never work around source problems inside the model. Fix the alias and runtime config once, and keep models clean.