Note: this post is translated from stackoverflow.com. Original question: How do I configure Cloud Data Fusion pipeline to run against existing Hadoop clusters

google-cloud-data-fusion

How do I configure a Cloud Data Fusion pipeline to run against existing Hadoop clusters?

Posted on 2020-03-27 12:05:41

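Cloud Data Fusion creates a new Dataproc cluster for every pipeline run. I already have a Dataproc cluster set up that runs 24x7, and I would like to use that cluster to run my pipelines.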

Asked by: Sree


1,710 2019-08-29 12:17

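This can be achieved by setting up a new compute profile that uses the Remote Hadoop Provisioner, under System Admin -> Configuration -> System Compute Profiles -> Create New Compute Profile. This feature is only available on the Enterprise edition of Cloud Data Fusion (“Execution environment selection”).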

Here are the detailed steps.

1. SSH setup on the Dataproc cluster

a. Navigate to the Dataproc console on Google Cloud Platform. Go to “Cluster details” by clicking on your Dataproc cluster name.

b. Under “VM Instances”, click on the “SSH” button to connect to the Dataproc VM.

c. Follow the steps here to create a new SSH key, format the public key file to enforce an expiration time, and add the newly created SSH public key at project or instance level.

d. If SSH is set up successfully, you should see the SSH key you just added in the Metadata section of your Compute Engine console, as well as in the authorized_keys file on your Dataproc VM.
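
For step c, the "format the public key file to enforce an expiration time" part can be sketched in Python. This is only a sketch: it assumes the `ssh-keys` metadata entry layout that Compute Engine describes for expiring keys (`USERNAME:KEY_TYPE KEY_VALUE google-ssh {"userName":...,"expireOn":...}`), and the function name, the `cdf` user, and the key material are made up for illustration.

```python
import json
from datetime import datetime, timedelta, timezone


def format_metadata_ssh_key(username, public_key, valid_for, now=None):
    """Build one ssh-keys metadata entry that expires `valid_for` from `now`.

    `public_key` is the content of a .pub file: '<key-type> <base64-key> [comment]'.
    The entry layout is an assumption based on the Compute Engine SSH key docs.
    """
    parts = public_key.split()
    if len(parts) < 2:
        raise ValueError("expected '<key-type> <base64-key> [comment]'")
    key_type, key_value = parts[0], parts[1]

    now = now or datetime.now(timezone.utc)
    if now.tzinfo is None:
        raise ValueError("`now` must be timezone-aware so the offset can be written")

    # ISO 8601 timestamp with a numeric UTC offset, e.g. 2020-01-02T00:00:00+0000
    expire_on = (now + valid_for).strftime("%Y-%m-%dT%H:%M:%S%z")

    # The JSON blob takes the place of the usual free-form key comment.
    meta = json.dumps({"userName": username, "expireOn": expire_on},
                      separators=(",", ":"))
    return f"{username}:{key_type} {key_value} google-ssh {meta}"


if __name__ == "__main__":
    # Hypothetical user name and (truncated, fake) key material.
    print(format_metadata_ssh_key("cdf", "ssh-rsa AAAAB3Nza user@host",
                                  timedelta(days=30)))
```

The printed line is what you would paste as the SSH key value at project or instance level; the private half of the key pair stays on your machine.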

2. Create a customized system compute profile for your Data Fusion instance

a. Navigate to your Data Fusion instance console by clicking on “View Instance”.

b. Click on “System Admin” in the top right corner.

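c. Under the “Configuration” tab, expand “System Compute Profiles”. Click on “Create New Profile”, and choose “Remote Hadoop Provisioner” on the next page.

d. Fill out the general information for the profile.

e. You can find the SSH host IP information on the “VM instance details” page under Compute Engine.

f. Copy the SSH private key created in step 1, and paste it into the “SSH Private Key” field.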

g. Click “Create” to create the profile.
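
Before pasting values into the profile form, a quick sanity check can catch the two most common mistakes: a host that is not an IP address, and a public key pasted where the private key belongs. The sketch below is not part of Data Fusion; `check_profile_input` is a hypothetical helper, and it only looks at the two fields mentioned in steps e and f.

```python
import ipaddress


def check_profile_input(host, private_key):
    """Return a list of problems with the form values; an empty list means they look sane."""
    problems = []

    # Step e: the SSH host should be the IP shown on the VM instance details page.
    try:
        ipaddress.ip_address(host.strip())
    except ValueError:
        problems.append(f"host {host!r} is not a valid IP address")

    # Step f: the field expects the private key, which is a PEM-style block.
    lines = private_key.strip().splitlines()
    first = lines[0] if lines else ""
    last = lines[-1] if lines else ""
    if first.startswith(("ssh-rsa ", "ssh-ed25519 ", "ecdsa-")):
        problems.append("this looks like a public key; the field expects the private key")
    elif not (first.startswith("-----BEGIN ") and first.endswith("PRIVATE KEY-----")):
        problems.append("private key is missing its 'BEGIN ... PRIVATE KEY' line")
    elif not (last.startswith("-----END ") and last.endswith("PRIVATE KEY-----")):
        problems.append("private key is missing its 'END ... PRIVATE KEY' line")

    return problems


if __name__ == "__main__":
    # Placeholder values; substitute your VM's IP and your own key file content.
    fake_key = "-----BEGIN RSA PRIVATE KEY-----\nMIIB\n-----END RSA PRIVATE KEY-----\n"
    for problem in check_profile_input("10.128.0.5", fake_key):
        print("problem:", problem)
```

The check is purely textual; it does not verify that the key actually grants access to the Dataproc VM.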

3. Configure your Data Fusion pipeline to use the customized profile

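a. Click on the pipeline that should run against the remote Hadoop cluster.

b. Click Configure -> Compute config, and select the Remote Hadoop Provisioner profile you created.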
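
If you start pipelines programmatically rather than through the UI, the same choice can probably be expressed as a runtime argument. The sketch below assumes that CDAP's `system.profile.name` argument, with a `SCOPE:name` value, is honored by your Data Fusion version; that is an assumption to verify, not something stated in this answer. The helper name and the profile name `remote-hadoop` are made up.

```python
def profile_runtime_args(profile_name, scope="SYSTEM", existing=None):
    """Return runtime arguments that select a compute profile for one run.

    Assumes the CDAP convention `system.profile.name = <SCOPE>:<name>`,
    where a profile created under System Admin lives in the SYSTEM scope.
    """
    scope = scope.upper()
    if scope not in {"SYSTEM", "USER"}:
        raise ValueError("scope must be 'SYSTEM' or 'USER'")
    args = dict(existing or {})  # copy, so the caller's dict is not mutated
    args["system.profile.name"] = f"{scope}:{profile_name}"
    return args


if __name__ == "__main__":
    # 'remote-hadoop' stands in for whatever name you gave the profile in step 2.
    print(profile_runtime_args("remote-hadoop"))
```

Selecting the profile in the UI as described in step b remains the documented path here; treat the runtime argument as an alternative to try.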
