How to Sync Data From MySQL to Doris Using SeaTunnel

cover
9 May 2024

As data technology rapidly advances, understanding and mastering various tools and techniques becomes increasingly important. To this end, we are preparing to launch a Demo demonstration plan on the Apache SeaTunnel community on how to use connectors, inviting you all who are passionate about data synchronization technology to share your knowledge and practical experience!

In this issue, we have invited Mr. Gao Jun, a PMC member of the community, to record a demo tutorial in the video which themed How to sync data from MySQL to Doris. If you are interested in this plan, feel free to email me(xiyan@whaleops.com) to participate in the Demo recording! Whether you are a data engineer, developer, or technology enthusiast, you are welcome to showcase your technical talents.

If you are a user and want to see synchronization scenarios Demo, please scroll down to the bottom and leave a comment, and we will prioritize producing the Demo with the highest demand for synchronization scenarios!

Demo Goals

Our goal is to create a platform for sharing and learning, helping community members better understand and apply various data connectors through specific Demo demonstrations and corresponding documentation. These demos can help beginners learn quickly and also provide a stage for senior experts to showcase innovative solutions.

https://youtu.be/_xB9BidoNOA?embedable=true

https://youtu.be/T8AZVxuzfeQ?embedable=true

https://youtu.be/kzAvfqnO4yc?embedable=true

MySQL to Doris

JDBC MySQL Connector

Description

Read data from external data sources via JDBC.

Supported MySQL Versions

  • 5.5/5.6/5.7/8.0

Supported Engines

SparkFlinkSeaTunnel Zeta

Dependencies

  • Spark/Flink engine

You need to ensure that the jdbc driver jar is placed in the ${SEATUNNEL_HOME}/plugins/ directory.

  • SeaTunnel Zeta engine

You need to ensure that the jdbc driver jar is placed in the ${SEATUNNEL_HOME}/lib/ directory.

Key Features of MySQL Connector

  • Batch processing
  • Stream processing
  • Exactly once
  • Column projection
  • Parallel processing
  • Support for user-defined sharding
  • Support for reading from multiple tables

Supports query SQL and can achieve projection effect.

Supported Data Source Information

Database Dependencies

Please download the support list corresponding to ‘Maven’ and copy it to ‘SEATNUNNEL_HOME/plugins/jdbc/lib/’.

Data Type Mapping

Source Options

Tips

If partition_column is not set, it will run in single concurrency, and if partition_column is set, it will be executed in parallel according to the concurrency of tasks.

Task Example

Simple:

This example queries type_bin ‘table’ 16 data in your test “database” in single parallel and queries all of its fields. You can also specify which fields to query for final output to the console.

# Defining the runtime environment
env {
  # You can set flink configuration here
  execution.parallelism = 2
  job.mode = "BATCH"
}
source{
    Jdbc {
        url = "jdbc:mysql://localhost:3306/test?serverTimezone=GMT%2b8"
        driver = "com.mysql.cj.jdbc.Driver"
        connection_check_timeout_sec = 100
        user = "root"
        password = "123456"
        query = "select * from type_bin limit 16"
    }
}
transform {
    # If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
    # please go to https://seatunnel.apache.org/docs/transform-v2/sql
}
sink {
    Console {}
}

Parallel:

Read your query table in parallel with the shard field you configured and the shard data You can do this if you want to read the whole table

source {
    Jdbc {
        url = "jdbc:mysql://localhost:3306/test?serverTimezone=GMT%2b8"
        driver = "com.mysql.cj.jdbc.Driver"
        connection_check_timeout_sec = 100
        user = "root"
        password = "123456"
        # Define query logic as required
        query = "select * from type_bin"
        # Parallel sharding reads fields
        partition_column = "id"
        # Number of fragments
        partition_num = 10
    }
}

Parallel Boundary:

It is more efficient to specify the data within the upper and lower bounds of the query It is more efficient to read your data source according to the upper and lower boundaries you configured

source {
    Jdbc {
        url = "jdbc:mysql://localhost:3306/test?serverTimezone=GMT%2b8"
        driver = "com.mysql.cj.jdbc.Driver"
        connection_check_timeout_sec = 100
        user = "root"
        password = "123456"
        # Define query logic as required
        query = "select * from type_bin"
        partition_column = "id"
        # Read start boundary
        partition_lower_bound = 1
        # Read end boundary
        partition_upper_bound = 500
        partition_num = 10
    }
}

Above is the code and documentation explanation shown in the video. You can combine it with the explanations in the video for hands-on practice. If you are interested in recording, please continue reading below.

How to submit your Demo recording

Participating in the Demo recording is simple!

First, choose a connector you are familiar with and prepare a 5 to 10-minute video demonstration. Make sure your video clearly shows how to configure and use the connector, what problems it solves, and any possible best practices.

Once you have your demo video ready, you can email me via debra.chen12@gmail.com. Remember to include relevant descriptive information, such as the connector you’re demonstrating and an overview of the video content.

That’s how you can participate in the Demo recording, take action now!