Para terminar con la saga que hemos venido desarrollando aquí y aquí vamos a ver que muchas veces el uso de ‘TOP’ es incorrecto o innecesario.
Esto es así porque a menudo lo que se anda buscando en sentencias con ‘TOP’ es un número de filas elegidas aleatoriamente, y aquí radica la gran diferencia: ‘TOP’ lleva implícitamente asociado el concepto de ORDEN. Teradata provee de la cláusula ‘SAMPLE’, la cual permite elegir una muestra (aleatoria) de las filas. Este ‘SAMPLE’, a diferencia de ‘TOP’, no lleva consigo ningún concepto de ordenación.
Así pues:
BTEQ -- Enter your DBC/SQL request or BTEQ command: SELECT COUNT(1) FROM MY_DB.TABLA_GRANDE ; *** Query completed. One row found. One column returned. *** Total elapsed time was 11 seconds. Count(1) ----------- 60279016 BTEQ -- Enter your DBC/SQL request or BTEQ command: CREATE VIEW MY_DB.PRE_SAMPLE AS SELECT MY_COLUMN FROM MY_DB.TABLA_GRANDE SAMPLE 10; *** View has been created. *** Total elapsed time was 1 second. BTEQ -- Enter your DBC/SQL request or BTEQ command: SELECT * FROM MY_DB.PRE_SAMPLE ; *** Query completed. 10 rows found. One column returned. *** Total elapsed time was 1 second. MY_COLUMN --------- B15273942 C1A283144 91625314D V10253646 M15273145 N1A223041 91423354G 91829374B B1A203242 B15223845 BTEQ -- Enter your DBC/SQL request or BTEQ command: CREATE VIEW MY_DB.POST_SAMPLE AS SELECT MY_COLUMN FROM MY_DB.TABLA_GRANDE ; *** View has been created. *** Total elapsed time was 1 second. BTEQ -- Enter your DBC/SQL request or BTEQ command: SELECT * FROM MY_DB.POST_SAMPLE SAMPLE 10 ; *** Query completed. 10 rows found. One column returned. *** Total elapsed time was 1 second. MY_COLUMN --------- M10283744 C15223846 M14293146 S1A253646 M17233944 C1S253744 M15233947 O11203747 91329384G H1U263349
Los ‘EXPLAIN’ son parecidos, aunque no iguales (la ‘materialización’ de la vista con SAMPLE supone una diferencia):
BTEQ -- Enter your DBC/SQL request or BTEQ command: EXPLAIN SELECT * FROM MY_DB.PRE_SAMPLE ; *** Help information returned. 20 rows. *** Total elapsed time was 1 second. Explanation -------------------------------------------------------------------------------- 1) First, we lock a distinct MY_DB."pseudo table" for read on a RowHash to prevent global deadlock for MY_DB.TABLA_GRANDE. 2) Next, we lock MY_DB.TABLA_GRANDE in view PRE_SAMPLE for read. 3) We do an all-AMPs SAMPLING step from MY_DB.TABLA_GRANDE in view PRE_SAMPLE by way of an all-rows scan with no residual conditions into Spool 1 (all_amps), which is built locally on the AMPs. Samples are specified as a number of rows. 4) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of an all-rows scan into Spool 4 (group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 4 is estimated with low confidence to be 60,196,840 rows (1,805,905,200 bytes). The estimated time for this step is 12.40 seconds. 5) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 4 are sent back to the user as the result of statement 1. BTEQ -- Enter your DBC/SQL request or BTEQ command: EXPLAIN SELECT * FROM MY_DB.POST_SAMPLE SAMPLE 10 ; *** Help information returned. 14 rows. *** Total elapsed time was 1 second. Explanation -------------------------------------------------------------------------------- 1) First, we lock a distinct MY_DB."pseudo table" for read on a RowHash to prevent global deadlock for MY_DB.TABLA_GRANDE. 2) Next, we lock MY_DB.TABLA_GRANDE in view POST_SAMPLE for read. 3) We do an all-AMPs SAMPLING step from MY_DB.TABLA_GRANDE in view POST_SAMPLE by way of an all-rows scan with no residual conditions into Spool 2 (group_amps), which is built locally on the AMPs. Samples are specified as a number of rows. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 2 are sent back to the user as the result of statement 1.
Como vemos en el ejemplo, el funcionamiento de las dos ‘SELECT’ (con el ‘SAMPLE’ dentro o fuera de la vista) es bastante parecido en rendimiento (aunque no igual, como se ve en los ‘EXPLAIN’). Y mucho mejor que con ‘TOP’, siempre en el caso de que lo único que queramos sea sacar un número de filas aleatoriamente de una tabla sin preocuparnos de otros factores, como el orden o ‘ranking’ de las filas en cuestión.
Saludos.
Carlos.
Sí que ha llegado lejos en análisis. ¡Muchas gracias por el trabajo!