Reading the Azure Databricks logs JSON file using Azure Databricks

davy.ai

May 31, 2023 at 2:18 pm

It looks like the error you are facing is related to authentication: the code is trying to access the storage account with anonymous credentials and cannot find any credentials for it in the configuration.

To authenticate and access the logs stored in Azure Blob Storage, you need to supply credentials that the code can use to reach the storage account. You can use either the storage account key or a shared access signature (SAS) token.

To use the storage account key, you can modify your code as follows:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType, LongType

storageAccountName = "your_storage_account_name"
containerName = "insights-logs-jobs"
# For wasbs:// paths, setting the account key is enough; no separate auth.type setting is needed
spark.conf.set("fs.azure.account.key." + storageAccountName + ".blob.core.windows.net", "your_storage_account_key")

# Schema definition for the log records
log_schema = StructType([
    StructField("timestamp", LongType(), True),
    StructField("level", StringType(), True),
    StructField("logger", StringType(), True),
    StructField("message", StringType(), True),
    StructField("thread", StringType(), True),
    StructField("fileName", StringType(), True),
    StructField("lineNumber", IntegerType(), True),
    StructField("threadId", LongType(), True),
    StructField("threadPriority", IntegerType(), True),
    StructField("appId", StringType(), True),
    StructField("appName", StringType(), True),
    StructField("workspaceId", StringType(), True),
    StructField("workspaceName", StringType(), True),
    StructField("clusterId", StringType(), True),
    StructField("clusterName", StringType(), True),
    StructField("runId", StringType(), True),
    StructField("attemptId", StringType(), True),
    StructField("jobId", StringType(), True),
    StructField("jobName", StringType(), True),
    StructField("taskType", StringType(), True),
    StructField("contextId", StringType(), True),
    StructField("instanceId", StringType(), True),
])

# Read the log files
df = spark.read.schema(log_schema).json(
    "wasbs://{}@{}.blob.core.windows.net".format(containerName, storageAccountName)
    + "/resourceId=/SUBSCRIPTIONS/xxxBD-3070-4AFD-A44C-3489956CE077/RESOURCEGROUPS/xxxx-xxx-RG/PROVIDERS/MICROSOFT.DATABRICKS/WORKSPACES/xxx-ADB/y=2021/m=12/d=07/h=00/m=00/*.json"
)

The code above supplies the storage account name and key to authenticate access. The schema is also defined explicitly, so Spark does not have to infer it from the files.

Alternatively, to use a shared access signature (SAS) token, you can modify your code as follows:
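The read path is assembled from the container name, the account name, and the blob path, which is easy to mistype when concatenating by hand. Below is a minimal sketch of a helper that builds the wasbs:// URL. The function name `wasbs_url` and the sample account name are hypothetical, chosen only for illustration.

```python
def wasbs_url(container: str, account: str, path: str = "") -> str:
    """Build a wasbs:// URL for a blob path inside a container.

    Hypothetical helper: it only formats a string and does not contact Azure.
    """
    base = "wasbs://{}@{}.blob.core.windows.net".format(container, account)
    if not path:
        return base
    # Strip leading slashes so exactly one slash separates host and path.
    return base + "/" + path.lstrip("/")


# Hypothetical values for illustration only.
url = wasbs_url("insights-logs-jobs", "mystorageaccount", "/y=2021/m=12/d=07/*.json")
print(url)
```

The resulting string can be passed directly to `spark.read.schema(log_schema).json(...)`.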

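from pyspark.sql.types import StructType, StructField, StringType, IntegerType, LongType

storageAccountName = "your_storage_account_name"
containerName = "insights-logs-jobs"

sas_key = "your_SAS_token"

# The SAS setting is scoped to one container within the storage account
spark.conf.set("fs.azure.sas." + containerName + "." + storageAccountName + ".blob.core.windows.net", sas_key)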

# Schema definition for the log records
log_schema = StructType([
    StructField("timestamp", LongType(), True),
    StructField("level", StringType(), True),
    StructField("logger", StringType(), True),
    StructField("message", StringType(), True),
    StructField("thread", StringType(), True),
    StructField("fileName", StringType(), True),
    StructField("lineNumber", IntegerType(), True),
    StructField("threadId", LongType(), True),
    StructField("threadPriority", IntegerType(), True),
    StructField("appId", StringType(), True),
    StructField("appName", StringType(), True),
    StructField("workspaceId", StringType(), True),
    StructField("workspaceName", StringType(), True),
    StructField("clusterId", StringType(), True),
    StructField("clusterName", StringType(), True),
    StructField("runId", StringType(), True),
    StructField("attemptId", StringType(), True),
    StructField("jobId", StringType(), True),
    StructField("jobName", StringType(), True),
    StructField("taskType", StringType(), True),
    StructField("contextId", StringType(), True),
    StructField("instanceId", StringType(), True),
])

# Read the log files
df = spark.read.schema(log_schema).json(
    "wasbs://{}@{}.blob.core.windows.net".format(containerName, storageAccountName)
    + "/resourceId=/SUBSCRIPTIONS/xxxBD-3070-4AFD-A44C-3489956CE077/RESOURCEGROUPS/xxxx-xxx-RG/PROVIDERS/MICROSOFT.DATABRICKS/WORKSPACES/xxx-ADB/y=2021/m=12/d=07/h=00/m=00/*.json"
)

The code above supplies the container name, the storage account name, and the SAS token to authenticate access. As before, the schema is defined explicitly for reading the logs.

Once you have updated the code with the correct authentication details, it should be able to read the logs from Azure Blob Storage.
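The two options differ mainly in which Spark configuration key is set: the account key applies to the whole storage account, while the SAS token is scoped to a single container. The sketch below only builds the two key names so the difference is easy to see. The helper names `account_key_conf` and `sas_conf` and the sample account name are hypothetical.

```python
def account_key_conf(account: str) -> str:
    """Configuration key for account-key authentication (account-wide)."""
    return "fs.azure.account.key.{}.blob.core.windows.net".format(account)


def sas_conf(container: str, account: str) -> str:
    """Configuration key for SAS authentication (scoped to one container)."""
    return "fs.azure.sas.{}.{}.blob.core.windows.net".format(container, account)


# Hypothetical account name for illustration only.
print(account_key_conf("mystorageaccount"))
print(sas_conf("insights-logs-jobs", "mystorageaccount"))
```

In a notebook, either result would be passed as the first argument to `spark.conf.set`, with the key or the SAS token as the second.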
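If the read succeeds but the columns come back as null, the explicit schema may not match the field names in the files: in Spark's default permissive JSON mode, a schema field that is absent from a record is filled with null rather than raising an error. A minimal sketch for checking one sample line against the expected field names follows. The function name `compare_fields`, the sample record, and the shortened field list are all made up for illustration.

```python
import json


def compare_fields(sample_line, expected_fields):
    """Return (missing, extra) field names for one JSON log line.

    missing: fields in the schema that the record does not contain.
    extra:   fields in the record that the schema does not declare.
    """
    record = json.loads(sample_line)
    actual = set(record)
    expected = set(expected_fields)
    return sorted(expected - actual), sorted(actual - expected)


# Made-up sample record and field list, for illustration only.
sample = '{"timestamp": 1638835200000, "level": "INFO", "message": "started", "category": "jobs"}'
missing, extra = compare_fields(sample, ["timestamp", "level", "logger", "message"])
print(missing)  # ['logger']
print(extra)    # ['category']
```

Running this on a line copied from one of your log files shows which schema fields need to be renamed, added, or removed.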
