Commit

docs-action committed Jan 3, 2024
1 parent 34b798c commit 21a22dd
Showing 2 changed files with 8 additions and 11 deletions.
2 changes: 1 addition & 1 deletion assets/js/search-data.json
@@ -974,7 +974,7 @@
},"139": {
"doc": "DuckDB",
"title": "Accessing lakeFS from DuckDB",
"content": "Configuration . Querying data in lakeFS from DuckDB is similar to querying data in S3 from DuckDB. It is done using the httpfs extension connecting to the S3 Gateway that lakeFS provides. If not loaded already, install and load the HTTPFS extension: . INSTALL httpfs; LOAD httpfs; . Then run the following to configure the connection. SET s3_region='us-east-1'; SET s3_endpoint='lakefs.example.com'; SET s3_access_key_id='AKIAIOSFODNN7EXAMPLE'; SET s3_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'; SET s3_url_style='path'; -- Uncomment in case the endpoint listen on non-secure, for example running lakeFS locally. -- SET s3_use_ssl=false; . | s3_endpoint is the host (and port, if necessary) of your lakeFS server | s3_access_key_id and s3_secret_access_key are the access credentials for your lakeFS user | s3_url_style needs to be set to path | s3_region is the S3 region on which your bucket resides. If local storage, or not S3, then just set it to us-east-1. | . Querying Data . Once configured, you can query data using the lakeFS S3 Gateway using the following URI pattern: . s3://<REPOSITORY NAME>/<REFERENCE ID>/<PATH TO DATA> . Since the S3 Gateway implemenets all S3 functionality required by DuckDB, you can query using globs and patterns, including support for Hive-partitioned data. Example: . SELECT * FROM parquet_scan('s3://example-repo/main/data/population/by-region/*.parquet', HIVE_PARTITIONING=1) ORDER BY name; . Writing Data . No special configuration required for writing to a branch. Assuming the configuration above and write permissions to a dev branch, a write operation would look like any DuckDB write: . CREATE TABLE sampled_population AS SELECT * FROM parquet_scan('s3://example-repo/main/data/population/by-region/*.parquet', HIVE_PARTITIONING=1) USING SAMPLE reservoir(50000 ROWS) REPEATABLE (100); COPY sampled_population TO 's3://example-repo/main/data/population/sample.parquet'; -- actual write happens here . ",
"content": "Configuration . Querying data in lakeFS from DuckDB is similar to querying data in S3 from DuckDB. It is done using the httpfs extension connecting to the S3 Gateway that lakeFS provides. If not loaded already, install and load the HTTPFS extension: . INSTALL httpfs; LOAD httpfs; . Then run the following to configure the connection. -- \"s3_region\" is the S3 region on which your bucket resides. If local storage, or not S3, then just set it to \"us-east-1\". SET s3_region='us-east-1'; -- the host (and port, if necessary) of your lakeFS server SET s3_endpoint='lakefs.example.com'; -- the access credentials for your lakeFS user SET s3_access_key_id='AKIAIOSFODNN7EXAMPLE'; -- the access credentials for your lakeFS user SET s3_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'; SET s3_url_style='path'; -- Uncomment in case the endpoint listen on non-secure, for example running lakeFS locally. -- SET s3_use_ssl=false; . Querying Data . Once configured, you can query data using the lakeFS S3 Gateway using the following URI pattern: . s3://<REPOSITORY NAME>/<REFERENCE ID>/<PATH TO DATA> . Since the S3 Gateway implemenets all S3 functionality required by DuckDB, you can query using globs and patterns, including support for Hive-partitioned data. Example: . SELECT * FROM parquet_scan('s3://example-repo/main/data/population/by-region/*.parquet', HIVE_PARTITIONING=1) ORDER BY name; . Writing Data . No special configuration required for writing to a branch. Assuming the configuration above and write permissions to a dev branch, a write operation would look like any DuckDB write: . CREATE TABLE sampled_population AS SELECT * FROM parquet_scan('s3://example-repo/main/data/population/by-region/*.parquet', HIVE_PARTITIONING=1) USING SAMPLE reservoir(50000 ROWS) REPEATABLE (100); COPY sampled_population TO 's3://example-repo/main/data/population/sample.parquet'; -- actual write happens here . ",
"url": "/integrations/duckdb.html#accessing-lakefs-from-duckdb",

"relUrl": "/integrations/duckdb.html#accessing-lakefs-from-duckdb"
17 changes: 7 additions & 10 deletions integrations/duckdb.html
@@ -580,22 +580,19 @@ <h3 id="configuration">

<p>Then run the following to configure the connection.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SET</span> <span class="n">s3_region</span><span class="o">=</span><span class="s1">'us-east-1'</span><span class="p">;</span>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- "s3_region" is the S3 region on which your bucket resides. If local storage, or not S3, then just set it to "us-east-1".</span>
<span class="k">SET</span> <span class="n">s3_region</span><span class="o">=</span><span class="s1">'us-east-1'</span><span class="p">;</span>
<span class="c1">-- the host (and port, if necessary) of your lakeFS server</span>
<span class="k">SET</span> <span class="n">s3_endpoint</span><span class="o">=</span><span class="s1">'lakefs.example.com'</span><span class="p">;</span>
<span class="k">SET</span> <span class="n">s3_access_key_id</span><span class="o">=</span><span class="s1">'AKIAIOSFODNN7EXAMPLE'</span><span class="p">;</span>
<span class="k">SET</span> <span class="n">s3_secret_access_key</span><span class="o">=</span><span class="s1">'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'</span><span class="p">;</span>
<span class="c1">-- the access credentials for your lakeFS user</span>
<span class="k">SET</span> <span class="n">s3_access_key_id</span><span class="o">=</span><span class="s1">'AKIAIOSFODNN7EXAMPLE'</span><span class="p">;</span>
<span class="c1">-- the access credentials for your lakeFS user</span>
<span class="k">SET</span> <span class="n">s3_secret_access_key</span><span class="o">=</span><span class="s1">'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'</span><span class="p">;</span>
<span class="k">SET</span> <span class="n">s3_url_style</span><span class="o">=</span><span class="s1">'path'</span><span class="p">;</span>

<span class="c1">-- Uncomment in case the endpoint listen on non-secure, for example running lakeFS locally.</span>
<span class="c1">-- SET s3_use_ssl=false;</span>
</code></pre></div></div>

<ul>
<li><code class="language-plaintext highlighter-rouge">s3_endpoint</code> is the host (and port, if necessary) of your lakeFS server</li>
<li><code class="language-plaintext highlighter-rouge">s3_access_key_id</code> and <code class="language-plaintext highlighter-rouge">s3_secret_access_key</code> are the access credentials for your lakeFS user</li>
<li><code class="language-plaintext highlighter-rouge">s3_url_style</code> needs to be set to <code class="language-plaintext highlighter-rouge">path</code></li>
<li><code class="language-plaintext highlighter-rouge">s3_region</code> is the S3 region on which your bucket resides. If local storage, or not S3, then just set it to <code class="language-plaintext highlighter-rouge">us-east-1</code>.</li>
</ul>
<h3 id="querying-data">


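The "Writing Data" text in this page mentions write permissions to a dev branch while its COPY statement targets main. Assuming a branch named dev exists in the repository (a hypothetical name used only for illustration), the same write directed at that branch would look like the sketch below; the object lands on dev and leaves main untouched.

```sql
-- Sample from main, then write the result to the assumed dev branch.
CREATE TABLE sampled_population AS
    SELECT *
    FROM parquet_scan('s3://example-repo/main/data/population/by-region/*.parquet',
                      HIVE_PARTITIONING=1)
    USING SAMPLE reservoir(50000 ROWS) REPEATABLE (100);

COPY sampled_population TO 's3://example-repo/dev/data/population/sample.parquet';  -- actual write happens here
```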
