If you have heard of interactive query services, then you are likely familiar with Presto. Presto is the open-source SQL query engine that powers the AWS Athena service, making data lakes easy to analyze with columnar formats like Apache Parquet.
While Athena is one of the more visible commercial offerings, it is certainly not the only path for those interested in the software.
Facebook Presto History
Presto has its technical roots in the Hadoop world at Facebook. Performance challenges drove Facebook to develop new software to meet its objectives, and the project was born in 2012. It was rolled out company-wide in 2013, and later that year Facebook open-sourced it under the Apache Software License.
Here is what Facebook said of its pursuit of the project:
For the analysts, data scientists, and engineers who crunch data, derive insights, and work to continuously improve our products, the performance of queries against our data warehouse is important. Being able to run more queries and get results faster improves their productivity.
Facebook noted vital differences in how it approaches certain operations:
In contrast, the Presto engine does not use MapReduce. It employs a custom query and execution engine with operators designed to support SQL semantics. In addition to improved scheduling, all processing is in memory and pipelined across the network between stages. This avoids unnecessary I/O and associated latency overhead.
Facebook also provided a simplified architecture overview:
As a result of this model, Presto is a query engine designed with a lot of data connectors.
It supports querying data in RDBMSs, Hive, and other data stores. This includes non-relational sources like Hadoop, as well as relational sources such as MySQL, PostgreSQL, SQL Server, and others.
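Each of these sources is exposed to Presto through a catalog, which is defined in a properties file naming the connector and its connection details. As a sketch, a MySQL catalog might look like the following (the host, port, and credentials here are hypothetical placeholders):

```properties
# etc/catalog/mysql.properties -- hypothetical connection details
connector.name=mysql
connection-url=jdbc:mysql://example-host:3306
connection-user=presto_user
connection-password=secret
```

Once the server is restarted with this file in place, the source's tables become queryable under the `mysql` catalog name.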
Another goal was to support standard ANSI SQL, including ad hoc aggregations, joins, left/right outer joins, sub-queries, distinct counts, and many others.
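To illustrate, a typical Presto query combines several of these standard SQL features in one statement. The table and column names below are hypothetical and only serve to show the syntax:

```sql
-- Hypothetical schema: demonstrates a left outer join,
-- aggregation, and a distinct count in standard ANSI SQL.
SELECT o.region,
       COUNT(DISTINCT o.customer_id) AS customers,
       SUM(o.total) AS revenue
FROM orders o
LEFT JOIN returns r ON o.order_id = r.order_id
WHERE r.order_id IS NULL          -- orders with no matching return
GROUP BY o.region;
```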
As a result, it can act as a SQL query proxy, allowing you to combine data from multiple sources across your organization using familiar SQL. Depending on your architecture, this can be a complement to data warehouses, especially for organizations that use a federated model where having these connectors adds value.
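In Presto, this federation is expressed by qualifying tables as catalog.schema.table, so a single query can join data living in different systems. The catalog, schema, and table names below are hypothetical:

```sql
-- Hypothetical catalogs: joins clickstream data in Hive
-- with customer records in MySQL, in a single query.
SELECT c.name,
       SUM(f.amount) AS total_spend
FROM hive.web.purchases f
JOIN mysql.crm.customers c ON f.customer_id = c.id
GROUP BY c.name;
```

Because Presto pushes work to each connector and joins the results itself, no ETL step is needed to co-locate the data first.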
What is prestodb or prestosql? Fork or Official?
In 2019, three of the original Facebook Presto team members, Martin Traverso, Dain Sundstrom, and David Phillips, formed the “Presto Software Foundation.” This foundation oversees their fork of the official project. The fork is often referred to as prestosql.
On GitHub, the fork lives under the prestosql organization, while the official project lives under prestodb. As you can imagine, this leads to confusion, as the two projects appear synonymous with each other.
Unfortunately, it is not clear why the fork, or foundation, refers to itself as “official.” They should own the fact that they left Facebook and forked the project rather than cast themselves as the official Presto distribution. The team has the heritage and credentials to tell a great story, so the efforts to package their fork as the official project, including on Wikipedia, are unfortunate. It seems like a missed opportunity to go down that path. This posture contributes to confusion and serves no benefit to the broader Presto community.
For now, we would suggest focusing your development efforts on the core project rather than the fork. Start with the official prestodb website and GitHub repository, the two principal official resources for the project. This will ensure you are not mistakenly investing time and energy in the wrong places.
Facebook Presto Performance
Presto was designed to run interactive analytic queries fast. Query execution runs in parallel, with most results returning in seconds. The expectation is that the query engine will deliver response times ranging from sub-second to minutes. In addition to speed, making it easy to analyze large datasets in standard S3 sources using standard SQL was an important goal.