
Commit

Deployed 9643df5 with MkDocs version: 1.1.2
none committed Jul 10, 2023
1 parent fa697f8 commit 59b3ee1
Showing 4 changed files with 12 additions and 12 deletions.
4 changes: 2 additions & 2 deletions index.html
@@ -36,7 +36,7 @@
<span style="color: #f8f8f2">print(result</span><span style="color: #f92672">.</span><span style="color: #f8f8f2">df)</span> <span style="color: #75715e"># Pandas DataFrame</span>
</code></pre></div> <p>When comparing to writing SQL, it's helpful to think of the dimensions as the target columns of a <strong>group by</strong> SQL statement. Think of the metrics as the columns you are <strong>aggregating</strong>. Think of the criteria as the <strong>where clause</strong>. Your criteria are applied in the DataSource Layer SQL queries.</p> <p>The <code>ReportResult</code> has a Pandas DataFrame with the dimensions as the index and the metrics as the columns.</p> <p>A <code>Report</code> is said to have a <code>grain</code>, which defines the dimensions each metric must be able to join to in order to satisfy the <code>Report</code> requirements. The <code>grain</code> is a combination of <strong>all</strong> dimensions, including those referenced in criteria or in metric formulas. In the example above, the <code>grain</code> would be <code>{date, partner}</code>. Both "revenue" and "leads" must be able to join to those dimensions for this report to be possible.</p> <p>These concepts can take time to sink in and obviously vary with the specifics of your data model, but you will become more familiar with them as you start putting together reports against your data warehouses.</p> <p><a name=natural-language-querying></a></p> <h3 id=natural-language-querying><strong>Natural Language Querying</strong><a class=headerlink href=#natural-language-querying title="Permanent link">&para;</a></h3> <p>With the NLP extension <code>Zillion</code> has experimental support for natural language querying of your data warehouse. 
For example:</p> <div class=codehilite style="background: #272822"><pre style="line-height: 125%;"><span></span><code><span style="color: #f8f8f2">result</span> <span style="color: #f92672">=</span> <span style="color: #f8f8f2">warehouse</span><span style="color: #f92672">.</span><span style="color: #f8f8f2">execute_text(</span><span style="color: #e6db74">&quot;revenue and leads by date last month&quot;</span><span style="color: #f8f8f2">)</span>
<span style="color: #f8f8f2">print(result</span><span style="color: #f92672">.</span><span style="color: #f8f8f2">df)</span> <span style="color: #75715e"># Pandas DataFrame</span>
</code></pre></div> <p>This NLP feature require a running instance of Qdrant (vector database) and the following values set in your <code>Zillion</code> config file:</p> <ul> <li>QDRANT_HOST</li> <li>OPENAI_API_KEY</li> </ul> <p>Embeddings will be produced and stored in both Qdrant and a local cache. The vector database will be initialized the first time you try to use this by analyzing all fields in your warehouse. An example docker file to run Qdrant is provided in the root of this repo.</p> <p>You have some control over how fields get embedded. Namely in the configuration for any field you can choose whether to exclude a field from embeddings or override which embeddings map to that field. All fields are included by default. The following example would exclude the <code>net_revenue</code> field from being embedded and map <code>revenue</code> metric requests to the <code>gross_revenue</code> field.</p> <div class=codehilite style="background: #272822"><pre style="line-height: 125%;"><span></span><code><span style="color: #f8f8f2">{</span>
</code></pre></div> <p>This NLP feature requires a running instance of Qdrant (vector database) and the following values set in your <code>Zillion</code> config file:</p> <ul> <li>QDRANT_HOST</li> <li>OPENAI_API_KEY</li> </ul> <p>Embeddings will be produced and stored in both Qdrant and a local cache. The vector database will be initialized the first time you try to use this by analyzing all fields in your warehouse. An example docker file to run Qdrant is provided in the root of this repo.</p> <p>You have some control over how fields get embedded. Namely in the configuration for any field you can choose whether to exclude a field from embeddings or override which embeddings map to that field. All fields are included by default. The following example would exclude the <code>net_revenue</code> field from being embedded and map <code>revenue</code> metric requests to the <code>gross_revenue</code> field.</p> <div class=codehilite style="background: #272822"><pre style="line-height: 125%;"><span></span><code><span style="color: #f8f8f2">{</span>
<span style="color: #f8f8f2"> </span><span style="color: #e6db74">&quot;name&quot;</span><span style="color: #f92672">:</span><span style="color: #f8f8f2"> </span><span style="color: #e6db74">&quot;gross_revenue&quot;</span><span style="color: #f8f8f2">,</span>
<span style="color: #f8f8f2"> </span><span style="color: #e6db74">&quot;type&quot;</span><span style="color: #f92672">:</span><span style="color: #f8f8f2"> </span><span style="color: #e6db74">&quot;numeric(10,2)&quot;</span><span style="color: #f8f8f2">,</span>
<span style="color: #f8f8f2"> </span><span style="color: #e6db74">&quot;aggregation&quot;</span><span style="color: #f92672">:</span><span style="color: #f8f8f2"> </span><span style="color: #e6db74">&quot;sum&quot;</span><span style="color: #f8f8f2">,</span>
@@ -104,7 +104,7 @@
<span style="color: #f8f8f2"> lead_id INTEGER </span><span style="color: #66d9ef">NOT</span><span style="color: #f8f8f2"> </span><span style="color: #66d9ef">NULL</span><span style="color: #f8f8f2">,</span>
<span style="color: #f8f8f2"> created_at </span><span style="color: #66d9ef">TIMESTAMP</span><span style="color: #f8f8f2"> </span><span style="color: #66d9ef">DEFAULT</span><span style="color: #f8f8f2"> </span><span style="color: #66d9ef">CURRENT_TIMESTAMP</span>
<span style="color: #f8f8f2">);</span>
</code></pre></div> <p><a name=example-warehouse-config></a></p> <h3 id=warehouse-configuration><strong>Warehouse Configuration</strong><a class=headerlink href=#warehouse-configuration title="Permanent link">&para;</a></h3> <p>A <code>Warehouse</code> may be created from a from a JSON or YAML configuration that defines its fields, datasources, and tables. The code below shows how it can be done in as little as one line of code if you have a pointer to a JSON/YAML <code>Warehouse</code> config.</p> <div class=codehilite style="background: #272822"><pre style="line-height: 125%;"><span></span><code><span style="color: #f92672">from</span> <span style="color: #f8f8f2">zillion</span> <span style="color: #f92672">import</span> <span style="color: #f8f8f2">Warehouse</span>
</code></pre></div> <p><a name=example-warehouse-config></a></p> <h3 id=warehouse-configuration><strong>Warehouse Configuration</strong><a class=headerlink href=#warehouse-configuration title="Permanent link">&para;</a></h3> <p>A <code>Warehouse</code> may be created from a JSON or YAML configuration that defines its fields, datasources, and tables. The code below shows how it can be done in as little as one line of code if you have a pointer to a JSON/YAML <code>Warehouse</code> config.</p> <div class=codehilite style="background: #272822"><pre style="line-height: 125%;"><span></span><code><span style="color: #f92672">from</span> <span style="color: #f8f8f2">zillion</span> <span style="color: #f92672">import</span> <span style="color: #f8f8f2">Warehouse</span>

<span style="color: #f8f8f2">wh</span> <span style="color: #f92672">=</span> <span style="color: #f8f8f2">Warehouse(config</span><span style="color: #f92672">=</span><span style="color: #e6db74">&quot;https://raw.githubusercontent.com/totalhack/zillion/master/examples/example_wh_config.json&quot;</span><span style="color: #f8f8f2">)</span>
</code></pre></div> <p>This example config uses a <code>data_url</code> in its <code>DataSource</code> <code>connect</code> info that tells <code>Zillion</code> to dynamically download that data and connect to it as a SQLite database. This is useful for quick examples or analysis, though in most scenarios you would put a connection string to an existing database like you see <a href=https://raw.githubusercontent.com/totalhack/zillion/master/tests/test_mysql_ds_config.json>here</a></p> <p>The basics of <code>Zillion's</code> warehouse configuration structure are as follows:</p> <p>A <code>Warehouse</code> config has the following main sections:</p> <ul> <li><code>metrics</code>: optional list of metric configs for global metrics</li> <li><code>dimensions</code>: optional list of dimension configs for global dimensions</li> <li><code>datasources</code>: mapping of datasource names to datasource configs or config URLs</li> </ul> <p>A <code>DataSource</code> config has the following main sections:</p> <ul> <li><code>connect</code>: database connection url or dict of connect params</li> <li><code>metrics</code>: optional list of metric configs specific to this datasource</li> <li><code>dimensions</code>: optional list of dimension configs specific to this datasource</li> <li><code>tables</code>: mapping of table names to table configs or config URLs</li> </ul> <blockquote> <p>Tip: datasource and table configs may also be replaced with a URL that points to a local or remote config file.</p> </blockquote> <p>In this example all four tables in our database are included in the config, two as dimension tables and two as metric tables. The tables are linked through a parent-&gt;child relationship: partners to campaigns, and leads to sales. Some tables also utilize the <code>create_fields</code> flag to automatically create <code>Fields</code> on the datasource from column definitions. 
Other metrics and dimensions are defined explicitly.</p> <p>To view the structure of this <code>Warehouse</code> after init you can use the <code>print_info</code> method which shows all metrics, dimensions, tables, and columns that are part of your data warehouse:</p> <div class=codehilite style="background: #272822"><pre style="line-height: 125%;"><span></span><code><span style="color: #f8f8f2">wh</span><span style="color: #f92672">.</span><span style="color: #f8f8f2">print_info()</span> <span style="color: #75715e"># Formatted print of the Warehouse structure</span>
2 changes: 1 addition & 1 deletion search/search_index.json

Large diffs are not rendered by default.

18 changes: 9 additions & 9 deletions sitemap.xml
@@ -1,39 +1,39 @@
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"><url>
<loc>https://totalhack.github.io/zillion/</loc>
<lastmod>2023-05-03</lastmod>
<lastmod>2023-07-10</lastmod>
<changefreq>daily</changefreq>
</url><url>
<loc>https://totalhack.github.io/zillion/zillion.configs/</loc>
<lastmod>2023-05-03</lastmod>
<lastmod>2023-07-10</lastmod>
<changefreq>daily</changefreq>
</url><url>
<loc>https://totalhack.github.io/zillion/zillion.core/</loc>
<lastmod>2023-05-03</lastmod>
<lastmod>2023-07-10</lastmod>
<changefreq>daily</changefreq>
</url><url>
<loc>https://totalhack.github.io/zillion/zillion.datasource/</loc>
<lastmod>2023-05-03</lastmod>
<lastmod>2023-07-10</lastmod>
<changefreq>daily</changefreq>
</url><url>
<loc>https://totalhack.github.io/zillion/zillion.field/</loc>
<lastmod>2023-05-03</lastmod>
<lastmod>2023-07-10</lastmod>
<changefreq>daily</changefreq>
</url><url>
<loc>https://totalhack.github.io/zillion/zillion.report/</loc>
<lastmod>2023-05-03</lastmod>
<lastmod>2023-07-10</lastmod>
<changefreq>daily</changefreq>
</url><url>
<loc>https://totalhack.github.io/zillion/zillion.sql_utils/</loc>
<lastmod>2023-05-03</lastmod>
<lastmod>2023-07-10</lastmod>
<changefreq>daily</changefreq>
</url><url>
<loc>https://totalhack.github.io/zillion/zillion.warehouse/</loc>
<lastmod>2023-05-03</lastmod>
<lastmod>2023-07-10</lastmod>
<changefreq>daily</changefreq>
</url><url>
<loc>https://totalhack.github.io/zillion/contributing/</loc>
<lastmod>2023-05-03</lastmod>
<lastmod>2023-07-10</lastmod>
<changefreq>daily</changefreq>
</url>
</urlset>
Binary file modified sitemap.xml.gz
Binary file not shown.

